Читать книгу Informatics and Machine Learning. From Martingales to Metaheuristics онлайн
86 страница из 101
So, to clarify before proceeding, suppose we want to get information on a three‐element encoding scheme for the Escherichia coli genome (Chromosome 1), say, in file EC_Chr1.fasta.txt. We, therefore, want an order = 3 oligo counting, but on 3‐element windows seen “stepping” across the genome, e.g. a “stepping” window for sampling, not a sliding window, resulting in three choices on stepping, or framing, according to how you take your first step:
case 0: agttagcgcgt ‐‐> (agt)(tag)(cgc)gt
case 1: agttagcgcgt ‐‐> a(gtt)(agc)(gcg)t
case 2: agttagcgcgt ‐‐> ag(tta)(gcg)(cgt)
In the code that follows we get codon counts for a particular frame‐pass (prog2.py addendum 4):
------------------- prog2.py addendum 4 --------------------- # so suspect existence of three-element coding scheme, the codon, # so need stats (anomolous) on codons.... # 'frame' specifies the frame pass as case 0, 1, or 2 in the text. # getting codon counts for a specified framing and specified sequence # will now be shown in two ways, one built from re-use of code blocks, # one from re-use of an entire subroutine: def codon_counter ( seq, frame ): codon_counts = {} pattern = '[acgtACGT]' result = re.findall(pattern, seq) seqlen = len(seq) # probs = np.empty((0)) for index in range(frame,seqlen-2): if (index+3-frame)%3!=0: continue codon = result[index]+result[index+1]+result[index+2] if codon in codon_counts: codon_counts[codon]+=1 else: codon_counts[codon]=1 counts = np.empty((0)) for i in sorted(codon_counts): counts = np.append(counts,codon_counts[i]+0.0) print "codon", i, "count =", codon_counts[i] probs = count_to_freq(counts) return probs codon_counter(EC_sequence,0) # could also get codon counts by shannon_order with modification to step # (and have order=3 for codon: def shannon_codon( seq, frame ): order=3 stats = {} pattern = '[acgtACGT]' result = re.findall(pattern, seq) seqlen = len(seq) for index in range(order-1+frame,seqlen): if index%3!=2: continue xmer = "" for xmeri in range(0,order): xmer+=result[index-(order-1)+frame+xmeri] if xmer in stats: stats[xmer]+=1 else: stats[xmer]=1 for i in sorted(stats): print("%d %s" % (stats[i],i)) counts = np.empty((0)) for i in sorted(stats): counts = np.append(counts,stats[i]+0.0) probs = count_to_freq(counts) return probs shannon_codon(EC_sequence,0) ---------------- prog2.py addendum 4 end -------------------