DNA Language: DNA Cryptic Codex

The Yamanuchi Code

A Universal Language for reading DNA, RNA, and Protein Sequences

Part 1: The DNA & RNA code

Outline

The human genome has now been sequenced along with the complete genomes of several other forms of life.

This yields base sequences such as ucaggugagcug.

But what do these sequences mean?

The correspondence between DNA sequences, via RNA, and protein is well established and is known as the “genetic code”.

The RNA is read as triplet codons, so the sense must reside in the nucleotide triplets.
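Reading a sequence as triplets can be sketched in a few lines of Python. This is only an illustration: the function name `split_codons` is made up here, and it assumes that reading starts at the first base and that any incomplete triplet at the end is dropped.

```python
def split_codons(rna: str) -> list[str]:
    """Split a base sequence into consecutive triplets (codons).

    Assumption: the reading frame starts at the first base and an
    incomplete triplet at the end of the sequence is ignored.
    """
    rna = rna.lower()
    usable = len(rna) - len(rna) % 3  # length rounded down to a multiple of 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


print(split_codons("ucaggugagcug"))  # ['uca', 'ggu', 'gag', 'cug']
```

Shifting the starting position by one or two bases would give a completely different set of triplets, so the choice of reading frame matters.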

However, although the correspondence of triplets to amino acids is known, the meaning of the sequences remains a mystery.

That is, although we know how the language of RNA is translated into protein, we do not know the meaning of the protein sequences or the RNA sequences.

Only a small percentage (about 2%) of DNA is translated into protein. The function of much of the remaining 98% remains unknown, and it is often referred to as “junk DNA”. However, at least some of this DNA is essential for life: longer stretches of DNA than are needed for the protein are transcribed into RNA, which is then spliced into a shorter form for translation into protein.

The genetic code acts as a kind of Rosetta Stone: if we can understand what the protein sequence is saying, then the DNA can be understood in triplet codons.

The protein sequence consists of a series of amino acids, of which there are 20 in the genetic code.

The English alphabet has 26 letters, of which 5 are vowels and 21 are consonants.

So each amino acid can correspond to a consonant, and a language can be formed by assigning each amino acid a consonant that reflects its position in the code.

The array of triplet codons consists of alpha, beta, gamma, and delta quadrants.

So the consonant for each amino acid can be deduced from its position in the codon array, as follows:

Yamanuchi Code Translation

Each entry gives the codon, its Yamanuchi syllable, and the amino acid in three-letter and one-letter form.

UUU Fa  Phe  F    UCU Sa  Ser  S    CUU La  Leu  L    CCU Ra  Pro  P
UUC Fi  Phe  F    UCC Si  Ser  S    CUC Li  Leu  L    CCC Ri  Pro  P
UUA Fo  Leu  L    UCA So  Ser  S    CUA Lo  Leu  L    CCA Ro  Pro  P
UUG Fu  Leu  L    UCG Su  Ser  S    CUG Lu  Leu  L    CCG Ru  Pro  P

UGU Za  Cys  C    UAU Ya  Tyr  Y    CGU Da  Arg  R    CAU Sha His  H
UGC Zi  Cys  C    UAC Yi  Tyr  Y    CGC Di  Arg  R    CAC Shi His  H
UGA Dno Stop      UAA Kno Stop      CGA Do  Arg  R    CAA Qui Gln  Q
UGG Wi  Trp  W    UAG Gno Stop      CGG Du  Arg  R    CAG Qua Gln  Q

GUU Va  Val  V    GCU Ba  Ala  A    AUU Cha Ile  I    ACU Ta  Thr  T
GUC Vi  Val  V    GCC Bi  Ala  A    AUC Chi Ile  I    ACC Ti  Thr  T
GUA Vo  Val  V    GCA Bo  Ala  A    AUA Cho Ile  I    ACA To  Thr  T
GUG Vu  Val  V    GCG Bu  Ala  A    AUG Me  Met  M    ACG Tu  Thr  T

GGU Ja  Gly  G    GAU Ga  Asp  D    AGU Ma  Ser  S    AAU Nu  Asn  N
GGC Ji  Gly  G    GAC Gi  Asp  D    AGC Mi  Ser  S    AAC Neu Asn  N
GGA Jo  Gly  G    GAA Go  Glu  E    AGA Mo  Arg  R    AAA Ki  Lys  K
GGG Ju  Gly  G    GAG Gu  Glu  E    AGG Mu  Arg  R    AAG Ka  Lys  K
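The table can be applied mechanically to a sequence. The sketch below copies the codon-to-syllable pairs from the table and reads a sequence three bases at a time. The names `YAMANUCHI` and `to_yamanuchi` are made up for this illustration. It also assumes that reading starts at the first base, that DNA input is handled by treating T as U, that an incomplete final triplet is ignored, and that syllables are separated by spaces; none of these choices is specified by the code itself.

```python
# Codon -> syllable pairs copied from the Yamanuchi table (RNA alphabet).
PAIRS = """
UUU:Fa UUC:Fi UUA:Fo UUG:Fu   UCU:Sa UCC:Si UCA:So UCG:Su
CUU:La CUC:Li CUA:Lo CUG:Lu   CCU:Ra CCC:Ri CCA:Ro CCG:Ru
UGU:Za UGC:Zi UGA:Dno UGG:Wi  UAU:Ya UAC:Yi UAA:Kno UAG:Gno
CGU:Da CGC:Di CGA:Do CGG:Du   CAU:Sha CAC:Shi CAA:Qui CAG:Qua
GUU:Va GUC:Vi GUA:Vo GUG:Vu   GCU:Ba GCC:Bi GCA:Bo GCG:Bu
AUU:Cha AUC:Chi AUA:Cho AUG:Me  ACU:Ta ACC:Ti ACA:To ACG:Tu
GGU:Ja GGC:Ji GGA:Jo GGG:Ju   GAU:Ga GAC:Gi GAA:Go GAG:Gu
AGU:Ma AGC:Mi AGA:Mo AGG:Mu   AAU:Nu AAC:Neu AAA:Ki AAG:Ka
"""
YAMANUCHI = dict(pair.split(":") for pair in PAIRS.split())


def to_yamanuchi(seq: str, sep: str = " ") -> str:
    """Translate a base sequence into Yamanuchi syllables, codon by codon.

    Assumptions (not part of the original table): the reading frame
    starts at the first base, T is treated as U so DNA can be read
    directly, and an incomplete trailing triplet is ignored.
    """
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return sep.join(YAMANUCHI[rna[i:i + 3]] for i in range(0, usable, 3))


print(to_yamanuchi("ucaggugagcug"))  # So Ja Gu Lu
```

A base other than U, C, A, G (or T) raises a `KeyError`, which seems safer for a sketch than silently skipping it.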

The protein code is much simpler than the DNA code because of redundancy: 64 triplet codons code for only 20 amino acids (plus stop signals), so most amino acids are specified by more than one codon, and several by four.
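The redundancy can be made concrete by counting how many codons specify each amino acid in the standard genetic code. In the sketch below, the compact 64-letter string is the standard code written out in U, C, A, G order with `*` marking stop codons. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG,
# UCU, ... GGG (first base varies slowest); '*' marks a stop codon.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons per amino acid, stop codons excluded.
codons_per_aa = Counter(aa for aa in CODON_TO_AA.values() if aa != "*")

# How many amino acids have 1, 2, 3, 4 or 6 codons.
by_degeneracy = Counter(codons_per_aa.values())
print(sorted(by_degeneracy.items()))
# [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```

So two amino acids (Met, Trp) have a single codon, nine have two, Ile has three, five have four, and three (Leu, Ser, Arg) have six, for 61 coding codons plus 3 stops.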

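If we only have a protein sequence, then we do not know exactly which codon coded for each amino acid, but in most cases we do know the first two letters of the codon, so the corresponding consonant is valid (leucine, serine and arginine are exceptions, since their codons fall into two groups).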

Here we use just one vowel for each consonant.

So glycine is simply “Ja”, and serine is “Si”.

With DNA, the vowel ending is changed to correspond to the third base of the codon,

e.g. Ja, Ji, Jo, Ju correspond to the third base being u, c, a, g respectively.
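The vowel rule can be written as a small lookup. This is a sketch of the regular pattern only: the names `THIRD_BASE_VOWEL` and `syllable` are made up here, and the rule does not reproduce the irregular entries in the table (for example Me, Wi, Qui/Qua, Ki/Ka, Nu/Neu and the stop syllables).

```python
# Regular vowel ending for each third base, as in Ja, Ji, Jo, Ju.
THIRD_BASE_VOWEL = {"u": "a", "c": "i", "a": "o", "g": "u"}


def syllable(consonant: str, third_base: str) -> str:
    """Attach the vowel that corresponds to the codon's third base."""
    return consonant + THIRD_BASE_VOWEL[third_base.lower()]


print([syllable("J", base) for base in "ucag"])  # ['Ja', 'Ji', 'Jo', 'Ju']
```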

Sunday, 25 March 2012
