2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics
Expectation, E(X), of random variable (r.v.) X:
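For a discrete r.v. this is the probability-weighted sum over the possible outcomes:

E(X) = Σᵢ xᵢ p(xᵢ).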
Let X be the total of rolling two six‐sided dice: X = 2 can occur in only one way (rolling "snake eyes"), while X = 7 can occur in six ways, etc., and E(X) = 7. For a single die, E(X) = 3.5. Notice that the value of the expectation need not be one of the possible outcomes (it is really hard to roll a 3.5).
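A minimal Python check of these two values (an illustrative sketch, not taken from the book; the brute-force enumeration is just one way to verify them):

from itertools import product

# Total of two fair six-sided dice: enumerate all 36 equally likely outcomes.
faces = range(1, 7)
totals = [a + b for a, b in product(faces, repeat=2)]
print(sum(totals) / len(totals))   # 7.0, i.e. E(X) = 7

# A single fair die: the expectation is 3.5, which is not itself a possible roll.
print(sum(faces) / len(faces))     # 3.5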
The expectation, E(g(X)), of a function g of r.v. X:
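For a discrete distribution this is again a probability-weighted sum, now over the values of g:

E(g(X)) = Σᵢ g(xᵢ) p(xᵢ).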
Consider the special case g(X) where g(xᵢ) = −log(p(xᵢ)):

E(−log(p(X))) = −Σᵢ p(xᵢ) log(p(xᵢ)),

which is Shannon Entropy for the discrete distribution p(xᵢ). For Mutual Information, similarly, use g(X, Y) = log(p(xᵢ, yⱼ)/p(xᵢ)p(yⱼ)):

E(g(X, Y)) = Σᵢ,ⱼ p(xᵢ, yⱼ) log(p(xᵢ, yⱼ)/p(xᵢ)p(yⱼ)),

defined when p(xᵢ), p(yⱼ), and p(xᵢ, yⱼ) are all ∈ ℜ⁺. This is the Relative Entropy between the joint distribution and the same distribution if the r.v.'s were independent: D(p(xᵢ, yⱼ) ‖ p(xᵢ)p(yⱼ)).
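A short numerical sketch of these definitions (illustrative only; the joint distribution values below are made up for the example, and numpy is assumed to be available):

import numpy as np

# Small discrete joint distribution p(x, y); entries sum to 1.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])
p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

# Shannon entropy H(X) = -sum_i p(x_i) log p(x_i)
H_x = -np.sum(p_x * np.log2(p_x))

# Mutual information = D( p(x,y) || p(x)p(y) )
MI = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))

print(H_x, MI)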
Jensen's Inequality:
For a convex function φ and weights a₁, …, aₙ ≥ 0 with a₁ + ⋯ + aₙ = 1:

φ(a₁x₁ + ⋯ + aₙxₙ) ≤ a₁φ(x₁) + ⋯ + aₙφ(xₙ).
Since φ(x) = −log(x) is a convex function:
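One standard application, with the weights taken to be probabilities p(xᵢ) (a sketch of the usual argument, not necessarily the book's own worked example): take the points to be q(xᵢ)/p(xᵢ) for any two distributions p and q. Then

−log(Σᵢ p(xᵢ) · q(xᵢ)/p(xᵢ)) ≤ Σᵢ p(xᵢ) · (−log(q(xᵢ)/p(xᵢ))) = D(p ‖ q),

and the left-hand side is −log(Σᵢ q(xᵢ)) = −log(1) = 0, so D(p ‖ q) ≥ 0. Taking q(xᵢ, yⱼ) = p(xᵢ)p(yⱼ) shows, in particular, that Mutual Information is nonnegative.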
Variance:
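This is the expected squared deviation from the mean:

Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]².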
Chebyshev's Inequality:
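In its standard form, for an r.v. X with mean μ = E(X), variance σ² = Var(X), and any k > 0:

P(|X − μ| ≥ kσ) ≤ 1/k²,

so, for example, no more than 1/9 of the probability mass can lie three or more standard deviations from the mean.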