Sunday, 25 March 2012

DNA Cryptic Codex

The Yamanuchi Code

A Universal Language for reading DNA, RNA, and Protein Sequences
Part 1: The DNA & RNA code

Outline

The human genome has now been sequenced along with the complete genomes of several other forms of life.

This yields base sequences such as ucaggugagcug.

But what do these sequences mean?

The correspondence between DNA sequences and protein sequences, via RNA, is well established and is known as the “genetic code”.

The RNA is read as triplet codons, so the sense must reside in the nucleotide triplets.
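As a rough illustration of reading a sequence in triplets, the Python sketch below splits the example sequence into codons. The function name `split_codons` is hypothetical, and the sketch assumes the reading frame starts at the first base, which the text does not specify.

```python
def split_codons(rna):
    """Split an RNA string into consecutive triplets (codons).

    Assumes the reading frame starts at the first base; any one or two
    leftover bases at the end are dropped.
    """
    rna = rna.strip().upper()
    usable = len(rna) - len(rna) % 3  # length rounded down to a multiple of 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


print(split_codons("ucaggugagcug"))  # ['UCA', 'GGU', 'GAG', 'CUG']
```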

However, although the correspondence of triplets to amino acids is known, the meaning of the sequences remains a mystery.

That is, although we know how the language of RNA is translated into protein, we do not know the meaning of the protein sequences or the RNA sequences.

Only a small percentage (about 2%) of DNA is translated into protein. The function of the remaining 98% is largely unknown, and it is often referred to as “junk DNA”. However, this DNA is essential for life: stretches of DNA longer than are needed for the protein are transcribed into RNA, which is then spliced into a shorter form for protein translation.

The genetic code acts as a kind of Rosetta Stone: if we can understand what the protein sequence is saying, then the DNA can be understood in triplet codons.

The protein sequence consists of a series of amino acids, of which there are 20 in the genetic code.

The alphabet has 26 letters, of which 5 are vowels and 21 are consonants.

So each amino acid can correspond to a consonant, and a language can be formed by assigning each amino acid a consonant that reflects its position in the code.

The array of triplet codons consists of alpha, beta, gamma, and delta quadrants.
So the consonant for each amino acid can be deduced from its position in the codon array, as follows:

Yamanuchi Code Translation


Codon  Syllable  Amino acid

UUU    Fa        Phe F
UUC    Fi        Phe F
UUA    Fo        Leu L
UUG    Fu        Leu L

UCU    Sa        Ser S
UCC    Si        Ser S
UCA    So        Ser S
UCG    Su        Ser S

CUU    La        Leu L
CUC    Li        Leu L
CUA    Lo        Leu L
CUG    Lu        Leu L

CCU    Ra        Pro P
CCC    Ri        Pro P
CCA    Ro        Pro P
CCG    Ru        Pro P

UGU    Za        Cys C
UGC    Zi        Cys C
UGA    Dno       Stop
UGG    Wi        Trp W

UAU    Ya        Tyr Y
UAC    Yi        Tyr Y
UAA    Kno       Stop
UAG    Gno       Stop

CGU    Da        Arg R
CGC    Di        Arg R
CGA    Do        Arg R
CGG    Du        Arg R

CAU    Sha       His H
CAC    Shi       His H
CAA    Qui       Gln Q
CAG    Qua       Gln Q

GUU    Va        Val V
GUC    Vi        Val V
GUA    Vo        Val V
GUG    Vu        Val V

GCU    Ba        Ala A
GCC    Bi        Ala A
GCA    Bo        Ala A
GCG    Bu        Ala A

AUU    Cha       Ile I
AUC    Chi       Ile I
AUA    Cho       Ile I
AUG    Me        Met M

ACU    Ta        Thr T
ACC    Ti        Thr T
ACA    To        Thr T
ACG    Tu        Thr T

GGU    Ja        Gly G
GGC    Ji        Gly G
GGA    Jo        Gly G
GGG    Ju        Gly G

GAU    Ga        Asp D
GAC    Gi        Asp D
GAA    Go        Glu E
GAG    Gu        Glu E

AGU    Ma        Ser S
AGC    Mi        Ser S
AGA    Mo        Arg R
AGG    Mu        Arg R

AAU    Nu        Asn N
AAC    Neu       Asn N
AAA    Ki        Lys K
AAG    Ka        Lys K
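
The translation table can be applied mechanically. The following Python sketch stores the 64 codon-to-syllable pairs from the table and reads a sequence codon by codon. The names `SYLLABLES` and `to_yamanuchi` are hypothetical, and two details are assumptions not stated in the post: the reading frame starts at the first base, and a DNA sequence is handled by treating T as U.

```python
# Codon -> syllable pairs, copied from the translation table.
_TABLE = """
UUU Fa  UUC Fi  UUA Fo  UUG Fu
UCU Sa  UCC Si  UCA So  UCG Su
CUU La  CUC Li  CUA Lo  CUG Lu
CCU Ra  CCC Ri  CCA Ro  CCG Ru
UGU Za  UGC Zi  UGA Dno UGG Wi
UAU Ya  UAC Yi  UAA Kno UAG Gno
CGU Da  CGC Di  CGA Do  CGG Du
CAU Sha CAC Shi CAA Qui CAG Qua
GUU Va  GUC Vi  GUA Vo  GUG Vu
GCU Ba  GCC Bi  GCA Bo  GCG Bu
AUU Cha AUC Chi AUA Cho AUG Me
ACU Ta  ACC Ti  ACA To  ACG Tu
GGU Ja  GGC Ji  GGA Jo  GGG Ju
GAU Ga  GAC Gi  GAA Go  GAG Gu
AGU Ma  AGC Mi  AGA Mo  AGG Mu
AAU Nu  AAC Neu AAA Ki  AAG Ka
"""

_tokens = _TABLE.split()
# Even positions are codons, odd positions are their syllables.
SYLLABLES = dict(zip(_tokens[0::2], _tokens[1::2]))


def to_yamanuchi(sequence):
    """Translate an RNA (or DNA) string into space-separated syllables.

    Assumptions: the frame starts at the first base, T is read as U,
    and trailing bases that do not fill a codon are ignored.
    """
    rna = sequence.strip().upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return " ".join(SYLLABLES[codon] for codon in codons)


print(to_yamanuchi("ucaggugagcug"))  # So Ja Gu Lu
```

Applied to the example sequence from the outline, ucaggugagcug, this reads the codons UCA GGU GAG CUG as "So Ja Gu Lu".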




The protein code is much simpler than the DNA code because the 64 triplet codons are redundant: 61 of them code for only 20 amino acids and the other three are stop codons. An amino acid has between one and six codons, and several have four.
If we only have a protein sequence, then we do not know exactly which codon coded for each amino acid, but in most cases we do know the first two letters of the codon, so the corresponding consonant is valid.
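
The redundancy can be checked directly. The sketch below, in Python, builds the standard genetic code from a compact 64-letter string and counts how many codons each amino acid has, and which amino acids have codons starting with more than one two-letter prefix. All names (`CODON_TO_AA`, `codons_per_aa`, and so on) are hypothetical and chosen only for this illustration.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in the order
# UUU, UUC, UUA, UUG, UCU, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons for each amino acid (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODON_TO_AA.values() if aa != "*")
# How many amino acids have 1, 2, 3, 4 or 6 codons.
degeneracy = Counter(codons_per_aa.values())

# Amino acids whose codons begin with more than one two-letter prefix.
prefixes = defaultdict(set)
for codon, aa in CODON_TO_AA.items():
    if aa != "*":
        prefixes[aa].add(codon[:2])
split_across_boxes = sorted(aa for aa, p in prefixes.items() if len(p) > 1)

print(sorted(degeneracy.items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
print(split_across_boxes)          # ['L', 'R', 'S']
```

Five amino acids have exactly four codons. Leucine, arginine, and serine have six codons spread over two prefixes, which is why each of them appears under two different consonants in the translation table.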

Here we use just one vowel for each consonant,
so glycine is simply “Ja” and serine is “Si”.

With DNA, the vowel ending is changed to correspond to the third base of the codon,
e.g. Ja, Ji, Jo, Ju correspond to the third base being u, c, a, g respectively.
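
The vowel rule can be written as a small lookup. This Python sketch uses hypothetical names (`VOWEL_FOR_THIRD_BASE`, `syllable`) and covers only the regular case described here. Some entries in the translation table, such as the stop codons and the Qui/Qua, Nu/Neu, and Ki/Ka pairs, do not follow this pattern and would need to be looked up individually.

```python
# Third base of the codon -> vowel ending, per the rule above.
VOWEL_FOR_THIRD_BASE = {"U": "a", "C": "i", "A": "o", "G": "u"}


def syllable(consonant, third_base):
    """Combine a consonant with the vowel for the codon's third base."""
    return consonant + VOWEL_FOR_THIRD_BASE[third_base.upper()]


print([syllable("J", base) for base in "ucag"])  # ['Ja', 'Ji', 'Jo', 'Ju']
```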
