Recall that the mutual information between two random variables {X, Y} is simply the relative entropy between the joint distribution P(X, Y), denoted "P" in the code below, and the product of marginals P(X)P(Y), denoted "p"; this is implemented in addendum #2 to prog2.py:
------------------- prog2.py addendum 2 ----------------------
import math   # needed for math.log (if not already imported earlier in prog2.py)

# The mutual information between P(X,Y) and p(X)p(Y) requires passing two
# probability arrays: P and p, where |P| = |p|^2 is the relation between
# their numbers of terms:
def mutual_info(P, p):
    Pnum = len(P)
    pnum = len(p)
    if Pnum != pnum*pnum:
        print("error: Pnum != pnum*pnum")
        return -1
    mi = 0
    for index in range(0, Pnum):
        row = index // pnum      # index of the first symbol of the pair
        column = index % pnum    # index of the second symbol of the pair
        mi += P[index]*math.log(P[index]/(p[row]*p[column]))
    return mi

# usage
Prob_EC_2mer = shannon_order(EC_sequence, 2)
Prob_EC_1mer = shannon_order(EC_sequence, 1)
print(mutual_info(Prob_EC_2mer, Prob_EC_1mer))
----------------- prog2.py addendum 2 end --------------------
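In formula form, the quantity accumulated by the loop above is the standard mutual information sum, with P(x, y) the joint (2-mer) probabilities held in P and p(x), p(y) the marginal (1-mer) probabilities held in p:

    I(X;Y) = D( P(X,Y) || P(X)P(Y) ) = SUM over (x,y) of P(x,y) * log[ P(x,y) / ( p(x) * p(y) ) ]

This is consistent with the usage lines above, where P comes from shannon_order(EC_sequence, 2) and p from shannon_order(EC_sequence, 1).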
3.2 Codon Discovery from Mutual Information Anomaly
As mentioned previously, mutual information allows statistical linkages to be discovered that are not otherwise apparent. Consider the mutual information between nucleotides in genomic data when different gap sizes are allowed between the two nucleotides, as illustrated in panel (a) of the accompanying figure. When the MI is evaluated for the different gap sizes (panel (b)), a highly anomalous long‐range statistical linkage is seen, consistent with a three‐element encoding scheme (the codon structure is thereby revealed) [1, 3].
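A minimal sketch of such a gap scan is given below. This is illustrative code only, not part of prog2.py: the function gapped_mutual_info and the chosen gap range are assumptions for the sketch. It estimates the mutual information between a nucleotide and the nucleotide a fixed number of positions downstream, for a range of gap sizes, which is the kind of scan that exposes the codon anomaly described above.

--------------- illustrative gap-scan sketch (not part of prog2.py) ---------------
import math
from collections import Counter

def gapped_mutual_info(seq, gap):
    # Mutual information (in nats) between the nucleotide at position i and the
    # nucleotide at position i + gap + 1, i.e. with 'gap' bases skipped between them.
    pairs = [(seq[i], seq[i + gap + 1]) for i in range(len(seq) - gap - 1)]
    n = float(len(pairs))
    joint = Counter(pairs)                    # counts for the gapped base pairs
    left  = Counter(x for x, y in pairs)      # marginal counts, first base
    right = Counter(y for x, y in pairs)      # marginal counts, second base
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((left[x] / n) * (right[y] / n)))
    return mi

# usage sketch: scan a range of gap sizes on the genomic sequence used above
# (EC_sequence is assumed to already hold the genomic data, as in the earlier usage)
for gap in range(0, 12):
    print(gap, gapped_mutual_info(EC_sequence, gap))
--------------- end sketch ---------------

Anomalously large MI values recurring with a period of three in the separation between the bases are the long‐range signature of the codon structure referred to above.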