Informatics and Machine Learning. From Martingales to Metaheuristics
To do the computation for ssss1, the next subroutine (prog2.py addendum 3) recycles the code for the dinucleotide counter described previously (prog1.py addendum 6), except that the two bases in the sample window are now separated by a fixed gap of the indicated size. The earlier counter, which paired adjacent bases, corresponds to a gap size of zero.
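As a rough illustration of the gapped counter just described, here is a minimal sketch. It is not the book's prog2.py code: the function name `count_gapped_pairs`, its parameters, and the choice to skip non-ACGT symbols are all assumptions made for this example. The gap convention follows the text, so a gap of zero pairs adjacent bases.

```python
from collections import Counter


def count_gapped_pairs(seq, gap=0):
    """Count ordered base pairs (seq[i], seq[i + gap + 1]).

    Hypothetical helper for illustration only. With gap=0 the two bases
    are adjacent, which reproduces an ordinary dinucleotide count.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    seq = seq.upper()
    offset = gap + 1  # distance between the two sampled positions
    counts = Counter()
    # zip stops at the shorter argument, so no index runs off the end.
    for first, second in zip(seq, seq[offset:]):
        # Assumption: pairs containing ambiguity codes such as N are skipped.
        if first in "ACGT" and second in "ACGT":
            counts[first + second] += 1
    return counts
```

Calling `count_gapped_pairs(genome, gap=0)` gives the plain dinucleotide counts, while larger `gap` values sample the same pair statistics at greater separations.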
ssss1 Codon structure is revealed in the V. cholerae genome by the mutual information between nucleotides in the genomic sequence, evaluated at different gap sizes.
In ssss1, the mutual information at first falls off as we look at statistical linkages over greater and greater distances, which makes sense for any information construct that is “local,” i.e. one that should have less linkage with structure further away. After a certain point, however, the mutual information no longer falls off; instead it cycles back to a certain level with a period of three bases. This suggests, among other things, that a long‐range three‐element encoding scheme might exist, a hypothesis that can easily be tested. In doing so we ask Nature “the right question,” and the answer is the rediscovery of the codon encoding scheme, as will be shown in what follows.
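The gap scan discussed above can be sketched as follows. This is an illustrative reconstruction, not the book's code: the names `mutual_information`, `mi_profile`, and `toy_periodic_sequence` are hypothetical, and the toy sequence is a synthetic stand-in (the first position of every triplet is fixed, the other two are uniform random) rather than real genomic data. The mutual information is the standard estimate, the sum over base pairs of p(a,b) log2[p(a,b) / (p(a) p(b))], taken from gapped pair counts.

```python
import math
import random
from collections import Counter


def mutual_information(seq, gap=0):
    """Mutual information (bits) between bases at i and i + gap + 1."""
    seq = seq.upper()
    offset = gap + 1  # gap=0 means adjacent bases
    pairs = Counter(
        (a, b) for a, b in zip(seq, seq[offset:])
        if a in "ACGT" and b in "ACGT"
    )
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    left, right = Counter(), Counter()  # marginal counts of each position
    for (a, b), n in pairs.items():
        left[a] += n
        right[b] += n
    # p(a,b) / (p(a) p(b)) simplifies to n * total / (left[a] * right[b]).
    return sum(
        (n / total) * math.log2(n * total / (left[a] * right[b]))
        for (a, b), n in pairs.items()
    )


def mi_profile(seq, max_gap):
    """Mutual information for every gap size from 0 to max_gap."""
    return [mutual_information(seq, gap) for gap in range(max_gap + 1)]


def toy_periodic_sequence(n_triplets, seed=0):
    """Synthetic sequence with a period-3 composition bias (assumption)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_triplets):
        out.append("A")                 # position 1: always A
        out.append(rng.choice("ACGT"))  # position 2: uniform
        out.append(rng.choice("ACGT"))  # position 3: uniform
    return "".join(out)


if __name__ == "__main__":
    profile = mi_profile(toy_periodic_sequence(10000), max_gap=8)
    for gap, mi in enumerate(profile):
        print(f"gap={gap}  MI={mi:.4f} bits")
```

In this toy model the values at gaps 2, 5, and 8 (separations of 3, 6, and 9 bases) stand out above their neighbours, which is the kind of three-base cycling described above. The toy has no short-range decay, so it only mimics the periodic part of the curve.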