Informatics and Machine Learning. From Martingales to Metaheuristics


Upon running the above code with the codon delimiter set to “tag,” we arrive at ssss1, which shows the distribution of (tag) gap sizes. The bin size is 100, so gap bin 0 holds the count of all gaps with sizes from 1 to 99, bin 1 holds the count of gaps sized 100–199, and so on.

ssss1 (tag) Gap sizes, with bin size 100.

Gap bin  Count
0        2115
1        1428
2        1066
3        829
4        696
5        484
6        399
7        293
8        241
9        222
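The gap counter itself is not reproduced in this section. As a rough illustration only, here is a minimal Python sketch of the idea. It assumes that gaps are measured in codons along a single reading frame (frame 0 by default), and that a gap is the difference between the codon indices of successive delimiter hits. Under that assumption the smallest possible gap is 1, which is consistent with bin 0 covering sizes 1 to 99. The names codon_gaps and gap_histogram are hypothetical, not the book's code.

```python
from collections import Counter

def codon_gaps(seq, delimiters, frame=0):
    """Distances (in codons) between successive delimiter codons.

    Hypothetical helper: assumes gaps are counted in codons along one
    reading frame; the book's own gap counter may measure them differently.
    """
    seq = seq.lower()
    delimiters = {d.lower() for d in delimiters}
    gaps = []
    last = None  # codon index of the previous delimiter hit
    # Step through complete codons only, starting at the chosen frame offset.
    for idx, start in enumerate(range(frame, len(seq) - 2, 3)):
        if seq[start:start + 3] in delimiters:
            if last is not None:
                gaps.append(idx - last)
            last = idx
    return gaps

def gap_histogram(gaps, bin_size=100):
    """Bin b counts gaps g with b*bin_size <= g < (b+1)*bin_size."""
    counts = Counter(g // bin_size for g in gaps)
    n_bins = max(counts) + 1 if counts else 0
    return [counts.get(b, 0) for b in range(n_bins)]

# Toy usage: "tag", 149 x "aaa", "tag", 49 x "aaa", "tag"
toy = "tag" + "aaa" * 149 + "tag" + "aaa" * 49 + "tag"
gaps = codon_gaps(toy, {"tag"})
print(gaps)                 # [150, 50]
print(gap_histogram(gaps))  # [1, 1]
```

On this toy sequence the two gaps are 150 and 50 codons, so bins 0 and 1 each receive one count. Passing a different delimiter set (for example {"aaa"}) gives the corresponding histogram for another codon.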

To gauge how strongly skewed the (tag) distribution is, we compare it with the gap distribution of another codon, “aaa,” the most common codon. The aaa gaps, shown in ssss1, tend to be much smaller, with the standard exponential fall-off expected when there are no long-range encoding linkages:

ssss1 (aaa) Gap sizes, with bin size 100.

Gap bin  Count
0        21256
1        7843
2        3375
3        1665
4        827
5        480
6        287
7        163
8        86
9        70
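The exponential fall-off can be made concrete with a simple null model. This is offered as an illustrative assumption, not as the book's derivation. Suppose each in-frame codon position is the codon of interest independently with probability p. Let G be the gap (in codons) between successive occurrences, w the bin width (w = 100 here), and N_b the expected count in bin b; this notation is ours. The gap is then geometric, and every bin after bin 0 shrinks by the same factor:

```latex
\Pr(G = k) = p\,(1-p)^{k-1}, \qquad k = 1, 2, 3, \dots

\Pr\bigl(bw \le G \le (b+1)w - 1\bigr)
  = \sum_{k=bw}^{(b+1)w-1} p\,(1-p)^{k-1}
  = (1-p)^{bw-1}\bigl[1 - (1-p)^{w}\bigr], \qquad b \ge 1

\frac{N_{b+1}}{N_b} = (1-p)^{w}, \qquad b \ge 1
```

Under this model, log N_b is linear in b for b ≥ 1, so on a log scale the binned counts fall along a straight line.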

Thus the codon tag is clearly very different from aaa: it is as if tag roughly marks the boundaries of regions, while aaa is simply scattered throughout. Are any other codons similar to tag? Frequency analysis blurs the counts, so subtler differences are not obvious, and to see them directly we have to run the gap counter for each codon. How can we easily “see” the difference? Notice how the tag distribution has a long tail. The gap bins shown in the figures only go to 9, but for the full dataset the last nonzero gap bin for “tag” is at a remarkable bin 70. For “aaa” the last nonzero bin comes much earlier, at bin 23 (even though there are 10 times as many (aaa) codons as (tag) codons). For “taa” the last nonzero bin is at 60, and for “tga” it is at 53.

The codons taa, tga, and tag are known as the stop codons, and the gaps between them are known as open reading frames (ORFs). A subtlety in the statistical analysis is that, according to observation, the stop codons bounding such an anomalously large region do not have to match. Thus, a biochemical encoding scheme must exist that treats any of the three stop codons as equivalent, hence the name “stop” codons for this group (and their grouping as such in ssss1). For further nuances of the “stop” codon naming convention, as it arises in the encoding biochemistry, see [1, 3].
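To compare codons by their tails, one can pool the three stop codons into a single delimiter set and record how far the resulting gap distribution reaches. The following is a minimal Python sketch under stated assumptions: gaps are counted in codons along one reading frame, and the helper names stop_to_stop_regions and last_nonzero_bin are hypothetical. Following the text's usage, a region here runs from one in-frame stop codon to the next, and either end may be any of taa, tga, or tag. Some ORF definitions additionally require the region to begin with a start codon, so this is the looser stop-to-stop reading.

```python
# Hypothetical helpers (not the book's code). Gaps are measured in codons
# along a single reading frame.
STOP_CODONS = frozenset({"taa", "tga", "tag"})

def stop_to_stop_regions(seq, frame=0, stops=STOP_CODONS):
    """Return (start_idx, end_idx, left_stop, right_stop) for each stretch
    between successive in-frame stop codons. Any stop codon may close a
    region opened by any other, i.e. the three are treated as equivalent."""
    seq = seq.lower()
    regions = []
    prev_idx, prev_codon = None, None
    for idx, pos in enumerate(range(frame, len(seq) - 2, 3)):
        codon = seq[pos:pos + 3]
        if codon in stops:
            if prev_idx is not None:
                regions.append((prev_idx, idx, prev_codon, codon))
            prev_idx, prev_codon = idx, codon
    return regions

def last_nonzero_bin(gaps, bin_size=100):
    """Highest bin index holding at least one gap, a crude measure of how
    far the tail extends (None if there are no gaps)."""
    return max(g // bin_size for g in gaps) if gaps else None

# Toy usage: regions bounded by mismatched stop codons (taa...tag, tag...tga).
toy = "taa" + "ccc" * 120 + "tag" + "ccc" * 30 + "tga"
regions = stop_to_stop_regions(toy)
print(regions)  # [(0, 121, 'taa', 'tag'), (121, 152, 'tag', 'tga')]
gaps = [end - start for start, end, _, _ in regions]
print(gaps, last_nonzero_bin(gaps))  # [121, 31] 1
```

Applied to a real genome, running last_nonzero_bin on the pooled stop-codon gaps and on the gaps for a single codon such as aaa gives the kind of tail comparison quoted in the text (bin 70 versus bin 23). The exact numbers depend on how the original counter defines a gap.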
