Informatics and Machine Learning. From Martingales to Metaheuristics
Is genomic DNA random? Let us read through a DNA file, consisting of a sequence of a, c, g, and t characters, and count each base; we can then compare the Shannon entropy of the observed base frequencies with that of a random sequence (uniform distribution, i.e. p = 1/4 for each of the four possibilities). To do this we must first learn file input/output (I/O) in order to “read” the data file:
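------------------ prog1.py addendum 4 -----------------------
fo = open("Norwalk_Virus.txt", "r")  # open the data file for reading only
str = fo.read()                      # whole file as one string (note: the name shadows the built-in str)
# print(str)                         # uncomment to inspect the contents
fo.close()                           # release the file handle
---------------- end prog1.py addendum 4 ---------------------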
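Notes on syntax: the example above shows the standard template for reading a data file, here named Norwalk_Virus.txt. The built-in function “open” handles file I/O in Python: as its name suggests, it “opens” the data file and returns a file object, whose read() method returns the file's contents as a single string and whose close() method releases the file when we are done.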
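Once the file contents are in a string, the base counts and the entropy comparison described above can be sketched as follows. This is a minimal illustration, not the book's own addendum: the helper names read_sequence, base_counts, and shannon_entropy are hypothetical, the toy string stands in for the real file contents, and the entropy is taken in bits, H = -Σ p log2 p, so that the uniform four-letter case gives exactly 2 bits.

```python
import math
from collections import Counter


def read_sequence(path):
    """Read a whole text file into one string (hypothetical helper).

    The with-statement closes the file automatically, even on error.
    """
    with open(path, "r") as fo:
        return fo.read()


def base_counts(seq):
    """Count a, c, g, t in seq, ignoring case and any other characters
    (newlines, spaces, digits, ambiguity codes such as n)."""
    return Counter(ch for ch in seq.lower() if ch in "acgt")


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical distribution in counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total)
        for n in counts.values()
        if n > 0
    )


# Toy sequence (an assumption for illustration); with the real data one
# would instead use: seq = read_sequence("Norwalk_Virus.txt")
seq = "acgtacgtaacc"
counts = base_counts(seq)
h_observed = shannon_entropy(counts)
h_uniform = math.log2(4)  # 2 bits: entropy of p = 1/4 for each base

print("counts:", dict(counts))
print("observed entropy (bits):", h_observed)
print("uniform entropy (bits):", h_uniform)
```

An observed entropy close to 2 bits says only that the single-base frequencies are nearly uniform; it does not by itself show that the sequence is random, since structure can still be present at the level of pairs or longer words.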
ssss1 The Norwalk virus genome (the “cruise ship virus”).
The Norwalk virus file has a nonstandard format and is shown in its entirety in ssss1 (split into two columns). The Escherichia coli genome (ssss1), on the other hand, is in standard FASTA format. (FASTA is the name of a program (~1985) for which a file format convention was adopted; that convention allows “flat‐file” record access and has been used in similar form ever since.)
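In a FASTA file each record begins with a header line starting with “>”, followed by one or more lines of sequence. A reader for that layout might look like the sketch below. The function name parse_fasta and the in-memory toy records are assumptions made for illustration; a real genome file would be passed in as an open file object instead of the StringIO stand-in.

```python
import io


def parse_fasta(handle):
    """Return a list of (header, sequence) pairs from an iterable of lines.

    A header line starts with '>'; all following lines up to the next
    header are joined into one sequence string. Blank lines are skipped.
    """
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Store the final record once the input is exhausted.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example held in memory instead of on disk.
demo = io.StringIO(">toy_record first example\nACGTAC\nGTTA\n>second\nGGCC\n")
for name, seq in parse_fasta(demo):
    print(name, len(seq))
```

Because the parser accepts any iterable of lines, the same function works on a file opened with open(...) and on test strings, which keeps the format handling separate from the file I/O.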