3. $H(p_1, p_2, \ldots, p_n, 0) = H(p_1, p_2, \ldots, p_n)$. In other words, appending an impossible (zero-probability) outcome leaves the entropy unchanged. This reductive relationship, or something like it, is implicitly assumed when describing any system in "isolation."
Note that the above axiomatic derivation is still “weak” in that it assumes the existence of the conditional entropy in property (2).
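This extensibility property is easy to check numerically, since the convention $0 \log 0 = 0$ makes zero-probability outcomes contribute nothing to the entropy sum. A minimal sketch in Python (my own illustration, not from the text):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits; p_i = 0 terms contribute zero by the 0*log(0) = 0 convention."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                     # drop zero-probability outcomes
    return -np.sum(nz * np.log2(nz))

p = [0.5, 0.25, 0.25]
print(shannon_entropy(p))             # 1.5 bits
print(shannon_entropy(p + [0.0]))     # 1.5 bits -- appending an impossible outcome changes nothing
```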
3.1.2 Maximum Entropy Principle
The law of large numbers, and the related central limit theorem, explain the ubiquitous appearance of the Gaussian (a.k.a. normal) distribution in Nature and in statistical analysis. Even when speaking of probability distributions purely in the abstract, the Gaussian still stands out in a singular way: among all distributions with a given mean and variance, it is the one with maximum entropy. This is revealed when seeking the discrete probability distribution that maximizes the Shannon entropy subject to constraints. The Lagrangian optimization method is a mathematical formalism for solving problems of this type, where you want to optimize some quantity but must do so subject to constraints. Lagrangians are described in detail elsewhere in this book. For our purposes here, once you know how to group the terms to create the Lagrangian expression appropriate to your problem, the problem reduces to simple differential calculus and algebra (you take a derivative of the Lagrangian, set it to zero, and solve, the classic way to find an extremum in calculus). I will skip most of the math here and just state the Lagrangians and their solutions in the small examples that follow.
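As a minimal numerical sketch of this kind of constrained maximization (my own illustration, not one of the book's examples; the die outcomes and the target_mean value below are assumptions), consider maximizing the entropy of a distribution over die faces $x_i = 1, \ldots, 6$ subject to normalization and a fixed mean $\mu$. The Lagrangian is $\mathcal{L} = -\sum_i p_i \ln p_i + \lambda_0 (\sum_i p_i - 1) + \lambda_1 (\sum_i p_i x_i - \mu)$, and setting $\partial \mathcal{L} / \partial p_i = 0$ gives the closed-form solution $p_i \propto \exp(\lambda_1 x_i)$. A numerical optimizer recovers the same answer:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: a die-like variable with outcomes 1..6 whose mean is
# constrained to 4.5 (rather than the fair-die value 3.5). The max-entropy
# solution has the exponential (Gibbs) form p_i ~ exp(lam * x_i), with the
# Lagrange multiplier lam fixed by the mean constraint.
x = np.arange(1, 7, dtype=float)
target_mean = 4.5

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)        # avoid log(0) at the boundary
    return np.sum(p * np.log(p))      # minimizing -H(p) maximizes the entropy

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},             # normalization
    {"type": "eq", "fun": lambda p: np.dot(p, x) - target_mean},  # mean constraint
]
p0 = np.full(6, 1.0 / 6.0)            # start from the uniform distribution
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(0.0, 1.0)] * 6, constraints=constraints)
print(np.round(res.x, 4))             # numerically matches exp(lam * x_i) / Z
```

With target_mean = 4.5 the result is tilted toward the higher faces, matching the exponential form; with target_mean = 3.5 (the mean of a fair die) it recovers the uniform distribution, which is the maximum-entropy solution under the normalization constraint alone.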