Book of Abstracts: Albany 2007

Conversation 15
June 19-23, 2007

Molecular Dynamics Based Physicochemical Model for Genome Analysis

In search of a simple, hypothesis-driven molecular model to characterize DNA sequences as genes (coding for proteins) or nongenes, we examined some physicochemical properties of each of the 64 double-helical trinucleotides (codons) intrinsic to DNA sequences. We then constructed a three-dimensional vector for each codon from its hydrogen-bonding energy (x), its stacking energy (y), and a third parameter, which we provisionally identify with protein-nucleic acid interactions (z). The x and y values for each codon are average energies derived from 15 ns Molecular Dynamics simulations of 39 double-stranded DNA sequences containing multiple copies of each codon (3). The z values are assigned on the basis of the conjugate rule (4). As this three-dimensional vector moves along any genome, the net orientation of the resultant vector differs significantly between gene and nongene regions, which makes it feasible to distinguish them.
So far, the success achieved in separating genes from nongenes in 370 genomes (331 prokaryotic, 21 eukaryotic, and 18 viral) is on par with or better than that of some popular genome analysis programs based on sophisticated statistical and mathematical models trained on databases, thus presenting a strong proof of concept for the viability of the physicochemical model. ChemGenome 1.1 classifies sequences into genes and nongenes. The gene prediction program ChemGenome2.0 carries out whole-genome analysis, with the methodology further extended to a molecular-level identification of genes. In a test of this protocol, a prediction rate above 95% was observed for 276 of the 331 prokaryotic genomes tested. These genome analysis tools are publicly available online at www.scfbio-iitd.res.in/chemgenome2.
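To make the vector walk described above concrete, here is a minimal Python sketch. It assumes a lookup table mapping each codon to an (x, y, z) triple; because the MD-derived energies and the conjugate-rule z values are not listed in this abstract, the table below is filled with clearly invented placeholder numbers. The function names (`placeholder_codon_vectors`, `resultant_vector`, `angle_degrees`, `classify`), the reference `gene_direction` and the 30 degree cutoff are all hypothetical; the decision rule actually used in ChemGenome may differ.

```python
import math
from itertools import product

BASES = "ACGT"

# HYPOTHETICAL per-base weights used only to fake a "stacking" term (y).
# These are NOT the MD-derived stacking energies referred to in the abstract.
_TOY_STEP = {"A": 1.0, "C": 1.5, "G": 1.7, "T": 0.9}


def placeholder_codon_vectors():
    """Return an illustrative {codon: (x, y, z)} table for all 64 codons.

    x: minus the Watson-Crick H-bond count (2 per A:T, 3 per G:C), a crude
       stand-in for hydrogen-bonding energy.
    y: a made-up stacking-like score over the two base steps in the codon.
    z: a toy +1/-1 flag; it does NOT implement the conjugate rule.
    """
    table = {}
    for bases in product(BASES, repeat=3):
        codon = "".join(bases)
        x = -float(sum(3 if b in "GC" else 2 for b in codon))
        y = -sum(_TOY_STEP[a] * _TOY_STEP[b] for a, b in zip(codon, codon[1:]))
        z = 1.0 if codon[1] in "GC" else -1.0
        table[codon] = (x, y, z)
    return table


def resultant_vector(seq, table, frame=0):
    """Add codon vectors along one reading frame and return the unit resultant.

    Codons containing symbols other than A, C, G, T are skipped.
    Returns (0.0, 0.0, 0.0) if no valid codon was found.
    """
    seq = seq.upper()
    sx = sy = sz = 0.0
    for i in range(frame, len(seq) - 2, 3):
        vec = table.get(seq[i:i + 3])
        if vec is None:
            continue
        sx, sy, sz = sx + vec[0], sy + vec[1], sz + vec[2]
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    return (sx / norm, sy / norm, sz / norm)


def angle_degrees(u, v):
    """Angle between two non-zero 3-vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (nu * nv)))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def classify(seq, table, gene_direction, max_angle=30.0, frame=0):
    """Label a sequence by how close its resultant points to gene_direction.

    gene_direction and max_angle are HYPOTHETICAL inputs (e.g. a mean direction
    estimated from known genes); this angular cutoff is an illustration only.
    """
    r = resultant_vector(seq, table, frame)
    if r == (0.0, 0.0, 0.0):
        return "undetermined"
    return "gene" if angle_degrees(r, gene_direction) <= max_angle else "nongene"


if __name__ == "__main__":
    codons = placeholder_codon_vectors()
    # Pretend the direction of one known ORF is the "gene" reference direction.
    reference = resultant_vector("ATGGCTGCAAAGCTGTAA", codons)
    print(classify("ATGAAAGCTGCA", codons, reference))
```

Normalizing the summed vector keeps only its net orientation, which is the quantity the abstract identifies as separating genes from nongenes. A whole-genome scan would presumably repeat this over windows in every reading frame, including the reverse strand; that bookkeeping is left out of this sketch.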
The relatively high sensitivity of the method, coupled with the hypothesis-driven nature of the algorithm, makes ChemGenome2.0 a useful tool for confirming the predictions of alternative methods. ChemGenome analyses also provide further physicochemical insight into what constitutes a gene. Work is in progress to improve the specificity of the model and to extend the methodology to eukaryotes.

References and Footnotes
  1. Singhal, P. and Jayaram, B. A novel whole genome analysis method based on DNA energetics. Manuscript in preparation (2007).
  2. Dutta, S., Singhal, P., Agrawal, P., Tomer, R., Kritee, Khurana, E., and Jayaram, B. J Chem Inf Model 46, 78-85 (2006).
  3. Dixit, S. B., Beveridge, D. L., et al. Biophys J 89, 3721-3740 (2005).
  4. Jayaram, B. J Mol Evol 45, 704 (1997).

Poonam Singhal (1, *)
B. Jayaram (1)
Surjit B. Dixit (2)
D. L. Beveridge (2)

(1) Department of Chemistry & Supercomputing Facility for Bioinformatics & Computational Biology
Indian Institute of Technology
Hauz Khas, New Delhi-110016, India
(2) Department of Chemistry
Wesleyan University
Middletown, CT 06459, USA

*Email: poonam@scfbio-iitd.res.in