Informatics and Machine Learning. From Martingales to Metaheuristics


In the above example we see our first use of a hash variable (“stats”) to keep count of how often each outcome occurs. A hash lets us perform such counts without enumerating all of the possible outcomes beforehand. Enumerating them first leads to what is known as the “enumeration problem,” which is not really a problem, just a poor algorithmic approach. The enumeration “problem,” and how hash variables circumvent it, is discussed further in ssss1.
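To make the contrast concrete, here is a minimal sketch in Python, where a dict plays the role of the hash variable. The function names `count_enumerated` and `count_hashed` and the sample data are illustrative assumptions, not code from the book.

```python
from itertools import product

def count_enumerated(outcomes, alphabet, width):
    """Enumeration approach: create a zero counter for every possible
    outcome up front, then increment the ones that are observed."""
    counts = {"".join(p): 0 for p in product(alphabet, repeat=width)}
    for outcome in outcomes:
        counts[outcome] += 1
    return counts

def count_hashed(outcomes):
    """Hash approach: a key is created only when its outcome is first seen."""
    stats = {}
    for outcome in outcomes:
        stats[outcome] = stats.get(outcome, 0) + 1
    return stats

# Hypothetical observations, used only to exercise both functions.
observed = ["AC", "CG", "AC", "GT"]
enumerated = count_enumerated(observed, "ACGT", 2)
hashed = count_hashed(observed)
```

The enumerated table holds one entry per possible outcome whether or not it ever occurs, so it grows as the alphabet size raised to the window width. The hashed table holds only the outcomes actually seen.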

The sequence is traversed so that the dinucleotides are counted in the order seen: each dinucleotide is extracted as a “window” two bases wide is slid across the genomic sequence, one base at a time. Each dinucleotide is entered into the count hash variable as a “key,” and its associated “value” is the previously stored count incremented by one. These counts are then transferred to an array so that we can reuse our prior subroutines count_to_freq and Shannon.
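The traversal described above might be sketched as follows. This is an illustration under stated assumptions, not the book's listing: the bodies of `count_to_freq` and `Shannon` are stand-ins written for this example (the earlier subroutines are not reproduced here), the entropy is assumed to use base-2 logarithms, and `count_dinucleotides` and the sample sequence are hypothetical.

```python
import math

def count_dinucleotides(sequence):
    """Slide a window of width two across the sequence, one base at a time,
    and count each dinucleotide in a dict (the "hash variable")."""
    counts = {}
    for i in range(len(sequence) - 1):
        dinucleotide = sequence[i:i + 2]
        # Old value (0 if the key is new) plus one becomes the new value.
        counts[dinucleotide] = counts.get(dinucleotide, 0) + 1
    return counts

def count_to_freq(count_array):
    """Stand-in for the earlier subroutine: normalize counts to frequencies."""
    total = sum(count_array)
    if total == 0:
        return []
    return [count / total for count in count_array]

def Shannon(freqs):
    """Stand-in for the earlier subroutine: Shannon entropy in bits
    (base-2 logarithm is an assumption made for this sketch)."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

# Hypothetical toy sequence, used only for illustration.
sequence = "ACGTACGT"
counts = count_dinucleotides(sequence)

# Transfer the hash values to an array so the array-based helpers apply.
count_array = list(counts.values())
freqs = count_to_freq(count_array)
entropy = Shannon(freqs)
```

A sequence of length n yields n - 1 overlapping windows, so the counts for the eight-base toy sequence sum to seven.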
