Book of Abstracts: Albany 2007
June 19–23, 2007
Gene Duplication in the (β/α)8 Barrel and a Primitive Genetic Code
Gene duplication is considered one of the most fundamental mechanisms of protein evolution, but the details of the process are poorly understood. Recent publications provide evidence that the eight-stranded beta-alpha barrel [(β/α)8] arose early in evolution through duplication and fusion of a gene encoding a four-stranded (β/α)4 half barrel (1, 2). The postulated duplication rested on the conservation of 18 identical amino acids out of 120 between the two half barrels of a single protein sequence, and aligning those 18 identities required introducing four insertions and deletions (1, 2). That analysis compared amino-acid sequences only, not the underlying DNA sequences.
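For scale, 18 identities out of 120 residues corresponds to 15% identity. The sketch below is a minimal, hypothetical illustration (not the method of refs. 1 to 4) of an ungapped comparison of two half barrels, in which residue i of one half is compared only with residue i of the other; the function `count_identities` and the toy sequences are invented for illustration.

```python
def count_identities(half_a: str, half_b: str) -> tuple[int, int]:
    """Count positions where two equal-length sequences share a residue.

    The comparison is ungapped: residue i of half_a is compared only with
    residue i of half_b, so no insertions or deletions are introduced.
    Returns (identities, compared_length).
    """
    if len(half_a) != len(half_b):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    identities = sum(1 for x, y in zip(half_a, half_b) if x == y)
    return identities, len(half_a)


# Invented toy "half barrels", not real (β/α)4 sequences.
ident, length = count_identities("MKVLAGDE", "MRVLSGEE")
print(f"{ident}/{length} identical ({100 * ident / length:.1f}%)")  # 5/8 identical (62.5%)
```

An alignment that allows gaps can always raise the identity count, which is why a result obtained without any insertions or deletions is the stricter test.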
In contrast, by applying our techniques for protein and gene sequence analysis (3, 4) to 400 structures containing (β/α)8 barrels, we can show unequivocally that the (β/α)8 barrel arose via gene duplication without any insertions or deletions within the half barrels. We found that (β/α)8 barrels evolved through the sequential introduction of twelve amino acids, one at a time, at a single locus between the two halves of the duplicated β-barrel, giving rise to 12 subgroups. Each subgroup also has a very specific species distribution; for example, the shortest insertions appear in proteins from the earliest species (firmicutes and specific classes of proteobacteria).
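A minimal sketch of how barrels could be sorted into subgroups by the length of the single insertion between the two halves. It assumes, purely for illustration, that both half barrels have the same known length `half_len`, so any residues beyond `2 * half_len` form the inter-half insertion. The functions `split_barrel`, `classify`, and `group_by_insertion` and the toy sequences are hypothetical and are not the authors' procedure.

```python
from collections import defaultdict


def split_barrel(seq: str, half_len: int) -> tuple[str, str, str]:
    """Split a full barrel into (first half, insertion, second half).

    Hypothetical simplification: both half barrels have exactly half_len
    residues, and every extra residue belongs to one insertion between them.
    """
    insertion_len = len(seq) - 2 * half_len
    if insertion_len < 0:
        raise ValueError("sequence is shorter than two half barrels")
    first = seq[:half_len]
    insertion = seq[half_len:half_len + insertion_len]
    second = seq[half_len + insertion_len:]
    return first, insertion, second


def classify(seq: str, half_len: int) -> tuple[int, int]:
    """Return (insertion length, ungapped identities between the halves)."""
    first, insertion, second = split_barrel(seq, half_len)
    identities = sum(a == b for a, b in zip(first, second))
    return len(insertion), identities


def group_by_insertion(seqs: dict[str, str], half_len: int) -> dict[int, list[str]]:
    """Group sequence names into subgroups keyed by insertion length."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name, seq in seqs.items():
        groups[classify(seq, half_len)[0]].append(name)
    return dict(groups)


# Invented toy sequences with 2-, 1-, and 0-residue insertions (half_len = 4).
toy = {"a": "MKVLGSMRVL", "b": "MKVLGMRVL", "c": "MKVLMKVL"}
print(classify(toy["a"], 4))        # (2, 3)
print(group_by_insertion(toy, 4))   # {2: ['a'], 1: ['b'], 0: ['c']}
```

Real half barrels need not have identical lengths, so in practice the boundary between the halves would likely have to come from a structural alignment rather than a fixed `half_len`.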
Next, we examined the genes of the 400 (β/α)8 barrels for alternate full-length open reading frames (ORFs) (4). We found 23 genes with overlapping sense/antisense ORFs and eight genes with triple ORFs. When we compared the two halves of the (β/α)8 barrels encoded by these multiple-ORF genes, an average of 22 amino acids aligned with no insertions, and on average 17 of these also had an identical wobble (third codon position) base, supporting the gene duplication hypothesis. Although these genes come from species with a wide range of average GC content, all of them had a high GC bias. Examination of these 31 genes revealed GC-rich codon and amino-acid biases similar to those we previously found in short-chain oxidoreductase enzymes and heat shock proteins (4). Codon use in the 23 genes with overlapping sense/antisense ORFs is illustrated in Table I. In some cases the bias was so extreme that codons composed only of A and T were completely absent, 96% of the codons used were drawn from just 26 codons (all ending in G or C), and 95% of the protein was composed of just 15 amino acids. The rarity of certain amino acids (Met, Tyr, Gln, Cys, and Trp; MYQCW) is consistent with the frequency of these residues in the proteins of Streptomyces coelicolor, a species with high GC content.
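The ORF and codon-bias measurements described above can be sketched as follows. This is a hypothetical illustration, not the procedure of ref. 4: it approximates a "full-length ORF" as any of the six reading frames (three sense, three antisense) that contains no stop codon (TAA, TAG, TGA) across the gene and ignores start codons. It also computes the fraction of codons ending in G or C, the number of A/T-only codons, and the share of codons covered by the `top_n` most frequent ones (26 in the text). `wobble_matches` takes the positions of identical amino acids as an input rather than translating the sequences. All function names and the toy sequences are invented.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the antisense strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(dna: str, frame: int = 0) -> list[str]:
    """Split dna into complete triplets starting at offset `frame`."""
    return [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]


def stop_free_frames(gene: str) -> list[str]:
    """Label each of the six frames ('+0'..'+2' sense, '-0'..'-2' antisense)
    that contains no stop codon. Assumption: a 'full-length ORF' is
    approximated as a stop-free frame; start codons are ignored."""
    gene = gene.upper()
    if gene[-3:] in STOPS:  # drop the annotated gene's terminal stop codon
        gene = gene[:-3]
    frames = []
    for strand, seq in (("+", gene), ("-", reverse_complement(gene))):
        for offset in range(3):
            if not any(c in STOPS for c in codons(seq, offset)):
                frames.append(f"{strand}{offset}")
    return frames


def gc_codon_stats(gene: str, top_n: int = 26) -> dict:
    """Summarize GC bias in the sense reading frame of `gene`."""
    cods = codons(gene.upper())
    if not cods:
        raise ValueError("gene must contain at least one complete codon")
    counts = Counter(cods)
    total = len(cods)
    return {
        # fraction of codons whose third (wobble) base is G or C
        "gc3_fraction": sum(n for c, n in counts.items() if c[2] in "GC") / total,
        # number of codons built only from A and T
        "at_only_codons": sum(n for c, n in counts.items() if set(c) <= {"A", "T"}),
        # share of all codons accounted for by the top_n most frequent codons
        "top_n_coverage": sum(n for _, n in counts.most_common(top_n)) / total,
    }


def wobble_matches(half_a: str, half_b: str, identical_positions: list[int]) -> int:
    """Among ungapped codon positions already known to encode identical amino
    acids, count those whose third (wobble) bases also match."""
    ca, cb = codons(half_a.upper()), codons(half_b.upper())
    return sum(ca[i][2] == cb[i][2] for i in identical_positions)


# Toy GC-rich gene (invented): every one of the six frames is stop-free.
print(stop_free_frames("ATGGCCGCCGCCTAA"))
print(gc_codon_stats("ATGGCCGCCGCC", top_n=1))
# CTG/TTG (both Leu) share wobble base G; GCC/GCG (both Ala) do not.
print(wobble_matches("CTGGCC", "TTGGCG", [0, 1]))
```

Because every stop codon contains both T and A, a strongly GC-biased gene tends to leave its alternate frames free of stops, which is one way a high GC bias and overlapping ORFs can go together.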
References and Footnotes
W. L. Duax¹,³,*