Numerous prior book, journal, and patent publications by the author are drawn upon throughout the text [1–68]. Almost all of the journal publications are open access. These publications can typically be found online either at the author's personal website (www.meta-logos.com) or from one of the following online publishers: www.m-hikari.com or bmcbioinformatics.biomedcentral.com.
3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information
The degree of randomness in a discrete probability distribution P = {p_k} can be measured in terms of Shannon entropy [106]. The definition of Shannon entropy in this math notation, for the P distribution, is:

H(P) = -\sum_k p_k \log(p_k)
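For concreteness, here is a minimal Python sketch (an illustration added for this definition, not code from the text) that computes the Shannon entropy of a discrete distribution given as a list of probabilities:

import math

def shannon_entropy(p, base=2):
    # H(P) = -sum_k p_k * log(p_k); terms with p_k = 0 contribute zero,
    # consistent with the limit x*log(x) -> 0 as x -> 0.
    return -sum(p_k * math.log(p_k, base) for p_k in p if p_k > 0)

# A fair coin is maximally random for two outcomes: 1 bit of entropy.
print(shannon_entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, hence lower entropy.
print(shannon_entropy([0.9, 0.1]))   # ~0.469

With base 2 the entropy is measured in bits; the natural logarithm (base e) gives nats, the convention more common in statistical physics.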
Shannon entropy appears in fundamental contexts in communications theory and in statistical physics [100]. Early efforts to derive Shannon entropy from some deeper theory produced axiomatic derivations, of which the one due to Khinchine, given in the next section, is the most popular. The axiomatic approach is limited by the assumptions built into its axioms, however, so it was not until the fundamental role of relative entropy was established in an "information geometry" context [113–115] that a path was found showing Shannon entropy to be uniquely qualified as a measure (c. 1999). The fundamental (extremal optimum) character of relative entropy (with Shannon entropy as a special case) follows from differential geometry arguments akin to those of Einstein on Riemannian spaces, here applied to spaces defined by the family of exponential distributions. Whereas the natural local notion of metric and distance is given by the Minkowski metric and Euclidean distance, a similar analysis comparing probability distributions (evaluating their "distance" from each other) indicates that the natural measure is relative entropy, which reduces to Shannon entropy in variational contexts when the relative entropy is taken relative to the uniform probability distribution. Further details on this derivation are given in ssss1.
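To make the reduction to Shannon entropy explicit (a standard identity, spelled out here for clarity rather than taken from the text): for a distribution P = {p_k} over K outcomes and the uniform distribution U with u_k = 1/K, the relative entropy is

D(P \| U) = \sum_k p_k \log\left(\frac{p_k}{1/K}\right) = \log(K) - H(P),

so minimizing the relative entropy to the uniform distribution is the same variational problem as maximizing the Shannon entropy H(P).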