Book of Abstracts: Albany 2003
June 17-21, 2003
Open Reading Frames (ORFs) and Codon Bias in the Genome of Streptomyces coelicolor and the Origin and Evolution of the Genetic Code
Examination of the complete genome of S. coelicolor reveals that the antisense strands of 70% of the 7514 genes (5265 genes) contain no stop codons and could in principle be open reading frames (ORFs). Of these, 53% (2805 genes) have a third full-length ORF, and 10% of those (284 genes) have a fourth ORF. Finally, 56 of the 7302 genes have five ORFs (frames with no stop codons). We have previously detected a significant bias in codon usage in the short-chain oxidoreductase (SCOR) enzyme family. Among 1651 predicted or known gene products from bacteria, archaea, and eukaryotes, 81 SCOR genes having triple ORFs (TORFs) were found to be encoded almost exclusively by the 32 of the 64 codons that are GC-only or GC-rich in composition (GC-rich meaning that 2 of the 3 nucleotides in a codon are G or C). Examination of the double ORFs (DORFs), TORFs, quartet ORFs (QORFs), and penta ORFs (PORFs) in S. coelicolor revealed a similar bias in codon usage and DNA triplet distribution, which is most severe in the QORFs and PORFs. The 256 QORF genes vary in length from 22 to 464 amino acids. Among the 170 hypothetical gene products with at least 100 amino acids, 87% of the codons come from the GC-rich half of the genetic code, and 82% of the amino acid composition of these proteins consists of only 10 amino acids (GPASTDLVER). Only 18 of the predicted gene products have a specific functional characterization: 5 dehydrogenases, 3 kinases, 2 esterases, a permease, a deformylase, 2 ABC transport proteins, a two-component regulator, and 3 ribosomal proteins (S12, L18, and L33). The QORF subset also includes 98 proteins characterized as homologous to known proteins (including cysteinyl-tRNA synthetase) and 58 gene products identified only as hypothetical proteins. The QORFs in S. coelicolor appear to identify a subset of the codon system that evolved first and a subset of amino acids that made up the earliest folded proteins, and they provide evidence for a possible two-letter genetic code that preceded the modern genetic code.
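The frame counting and codon classification described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the standard stop codons (TAA, TAG, TGA), a gene sequence given 5' to 3' on the sense strand with its own terminal stop codon removed, and that the "number of ORFs" for a gene is the number of the six reading frames over the gene's span that contain no stop codon (the gene's own frame included, so a PORF gene would score 5). All function names are hypothetical.

```python
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}              # standard-code stop codons (assumption)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Antisense strand of seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def codons(seq: str, frame: int) -> list[str]:
    """Complete codons of seq starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def stop_free_frames(gene: str) -> int:
    """Count reading frames (3 sense + 3 antisense) over the gene span with no stop codon.

    Hypothetical helper: assumes the gene's own stop codon has already been trimmed.
    """
    count = 0
    for strand in (gene, reverse_complement(gene)):
        for frame in range(3):
            if not any(c in STOPS for c in codons(strand, frame)):
                count += 1
    return count

def gc_count(codon: str) -> int:
    """Number of G or C nucleotides in a codon."""
    return sum(base in "GC" for base in codon)

# GC-only (3 of 3 G/C) plus GC-rich (exactly 2 of 3 G/C): 8 + 24 = 32 of the 64 codons.
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
GC_RICH = {c for c in ALL_CODONS if gc_count(c) >= 2}

def gc_rich_fraction(gene: str) -> float:
    """Fraction of the gene's own (frame 0) codons that are GC-only or GC-rich."""
    cs = codons(gene, 0)
    return sum(c in GC_RICH for c in cs) / len(cs)

EARLY_AA = set("GPASTDLVER")  # the 10 amino acids named in the abstract

def early_aa_fraction(protein: str) -> float:
    """Fraction of residues (one-letter codes) drawn from G, P, A, S, T, D, L, V, E, R."""
    return sum(aa in EARLY_AA for aa in protein) / len(protein)
```

For example, `stop_free_frames("TAAGCCGCG")` finds a stop codon only in sense frame 0, so five of the six frames are stop-free. Counting frames over the gene's own span, rather than searching for start codons, mirrors the abstract's definition of an ORF as a frame with no stop codons.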
This work is supported by NIH Grant No. DK26546.
1 Structural Biology Dept.