3.1.1 The Khinchin Derivation
In his now-famous 1948 paper [106], Claude Shannon provided a quantitative measure of entropy in connection with communication theory. The Shannon entropy measure was later put on a more formal footing by A. I. Khinchin, in an article where he proved that, under certain assumptions, the Shannon entropy is unique [107]. (Dozens of similar axiomatic proofs have since been made.) A statement of the theorem is as follows:
Let H(p_1, p_2, …, p_n) be a function defined for any integer n and for all values p_1, p_2, …, p_n such that p_k ≥ 0 (k = 1, 2, …, n) and ∑_k p_k = 1. If, for any n, this function is continuous with respect to all its arguments, and if it satisfies properties (1) and (2) below, then it must have the form

H(p_1, p_2, …, p_n) = −λ ∑_k p_k log(p_k),

where λ is a positive constant (λ = 1 is taken by convention in what follows).
1 For given n and for ∑_k p_k = 1, the function takes its largest value for p_k = 1/n (k = 1, 2, …, n). This is equivalent to Laplace's principle of insufficient reason, which says that if you know nothing about the outcomes you should assume the uniform distribution (this also agrees with the Occam's Razor assumption of minimum structure). A numerical check of this maximality property is sketched after this list.
2 H(ab) = H(a) + H_a(b), where H_a(b) = −∑_{a,b} p(a,b) log(p(b|a)) is the conditional entropy. This is consistent with H(ab) = H(a) + H(b) when the probabilities of a and b are independent, with the conditional-probability form applying when they are not. A numerical check of this additivity property is also sketched below.
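As a numerical illustration of property (1), the following sketch (added here for illustration, not from the original text; the NumPy-based helper name shannon_entropy is hypothetical) samples random distributions on the n-outcome simplex and confirms that none exceeds the entropy of the uniform distribution:

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(p) = -sum_k p_k log(p_k), using the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                                  # drop zero-probability terms
    return -np.sum(nz * np.log(nz)) / np.log(base)

n = 4
h_uniform = shannon_entropy(np.full(n, 1.0 / n))   # log2(4) = 2 bits

# Property (1): no distribution on n outcomes exceeds the uniform entropy.
rng = np.random.default_rng(0)
for _ in range(10_000):
    p = rng.dirichlet(np.ones(n))                  # random point on the simplex
    assert shannon_entropy(p) <= h_uniform + 1e-12

print(f"H(uniform over {n} outcomes) = {h_uniform:.4f} bits (the maximum)")
```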
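Property (2) can be checked the same way; the sketch below (again an illustration under the same assumptions, with the joint distribution p(a, b) drawn at random) verifies H(ab) = H(a) + H_a(b):

```python
import numpy as np

def entropy(p):
    """H = -sum p log2(p) over nonzero entries (0 * log(0) = 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

rng = np.random.default_rng(1)
p_ab = rng.dirichlet(np.ones(15)).reshape(3, 5)    # random joint p(a, b)

p_a = p_ab.sum(axis=1)                             # marginal p(a)
p_b_given_a = p_ab / p_a[:, None]                  # conditional p(b|a)

# Conditional entropy H_a(b) = -sum_{a,b} p(a,b) log2 p(b|a).
mask = p_ab > 0
h_cond = -np.sum(p_ab[mask] * np.log2(p_b_given_a[mask]))

# Property (2): H(ab) = H(a) + H_a(b).
assert np.isclose(entropy(p_ab), entropy(p_a) + h_cond)
print(f"H(ab) = {entropy(p_ab):.4f} = H(a) + H_a(b) = {entropy(p_a) + h_cond:.4f}")
```

Base-2 logarithms are used here, so entropies are in bits; the constant λ in the theorem simply absorbs the choice of logarithm base.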