Читать книгу Informatics and Machine Learning. From Martingales to Metaheuristics онлайн
70 страница из 101
4 2.4 Go to genbank (https://www.ncbi.nlm.nih.gov/genbank/) and select the genome of a small virus (~10 kb). Using the Python code shown in ssss1, determine the base frequencies for {a,c,g,t}. What is the shannon entropy (if those frequencies are taken to be the probabilities on the associated outcomes)?
5 2.5 Go to genbank (https://www.ncbi.nlm.nih.gov/genbank/) and select the genome of three medium‐sized viruses (~100 kb). Using the Python code shown in ssss1, determine the trinucleotide frequencies. What is the Shannon entropy of the trinucleotide frequencies for each of the three virus genomes? Using this as a distance measure phylogenetically speaking, which two viruses are most closely related?
6 2.6 Repeat (Exercise 2.5) but now use symmetrized relative entropy between the trinucleotide probability distributions as a distance measure instead (reevaluate pairwise between the three viruses). Using this as a distance measure phylogenetically speaking, which two viruses are most closely related?