Informatics and Machine Learning. From Martingales to Metaheuristics
Running addendum 4 of prog2.py, we find that the codon “tag” has notably lower counts than the other codons, and similarly for the codon “cta”:
frame 0 have tag with 8970 and cta with 8916
frame 1 have tag with 9407 and cta with 8821
frame 2 have tag with 8877 and cta with 9033
The tag and cta trinucleotides happen to be related: they are reverse complements of each other (the first hint of information encoding via duplex deoxyribonucleic acid (DNA) with Watson–Crick base‐pairing). There are two other notably rare codons, taa and tga (and, in this all‐frame genome‐wide study, their reverse complements as well).
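The reverse‐complement relationship can be checked directly. The following is a minimal sketch and is not taken from prog2.py; the helper name reverse_complement and the lowercase acgt alphabet are assumptions made here for illustration.

```python
# Hypothetical helper (not from prog2.py): reverse complement of a lowercase DNA string.
# Assumes the sequence contains only the characters a, c, g, t.
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Complement each base (a<->t, c<->g), then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

# Print each rare codon next to its reverse complement.
for codon in ("tag", "taa", "tga"):
    print(codon, reverse_complement(codon))
```

Applied to tag this gives cta, so in a study that pools all frames on one strand, a codon that is rare on the opposite strand would show up as a rare reverse complement on the strand being read.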
Now that we have identified an interesting feature, such as “tag,” it is reasonable to ask how this feature is placed across the genome. Having done that, the follow‐up is to identify any anomalously recurring feature in the proximity of the feature of interest. Such an analysis needs a generic subroutine that, given a reference to the genome sequence data, returns counts of sub‐strings of a specified order; that subroutine is provided next as addendum #5 to prog2.py.
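As a rough illustration of what such a subroutine might look like, here is a minimal sketch. It is not the book's addendum #5: the function names count_substrings and find_positions, the frame parameter, and the toy sequence are all hypothetical and chosen here for illustration. Python passes the sequence string by object reference, so a genome‐length string is not copied on each call.

```python
from collections import Counter

def count_substrings(seq, order, frame=None):
    """Count sub-strings of length `order` in `seq` (hypothetical helper).

    frame=None: slide one base at a time, pooling all frames.
    frame=f:    start at offset f and step by `order`, so only the
                non-overlapping windows of that frame are counted.
    """
    if order < 1:
        raise ValueError("order must be a positive integer")
    start, step = (0, 1) if frame is None else (frame, order)
    counts = Counter()
    for i in range(start, len(seq) - order + 1, step):
        counts[seq[i:i + order]] += 1
    return counts

def find_positions(seq, feature):
    """Return every 0-based start index of `feature` in `seq` (overlaps allowed)."""
    positions = []
    i = seq.find(feature)
    while i != -1:
        positions.append(i)
        i = seq.find(feature, i + 1)
    return positions

# Toy sequence for illustration only; this is not genome data.
toy = "atgtagctatagcta"
for frame in range(3):
    counts = count_substrings(toy, 3, frame=frame)
    print("frame", frame, "tag:", counts["tag"], "cta:", counts["cta"])
print("tag positions:", find_positions(toy, "tag"))
```

The positions returned by find_positions could then be used to slice out a window around each occurrence and pass those windows back to count_substrings, which is one way to look for recurring features near the feature of interest.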