Book of Abstracts: Albany 2007
June 19-23 2007
Structure based identification of promoters in genomic DNA
Analysis of various predicted structural properties of promoter regions in prokaryotic as well as eukaryotic genomes indicates that they have several common features, such as lower stability, higher curvature, and lower bendability, when compared with their neighboring regions (1). Based on the difference in stability between the neighboring upstream and downstream regions in the vicinity of experimentally determined transcription start sites (TSSs), a promoter prediction algorithm has been developed to identify prokaryotic promoter sequences in whole genomes (2). The average free energy (E) over known promoter sequences, and the difference (D) between E and the average free energy over the entire genome (G), are used as cutoff values to search for promoters in genomic sequences. Using these cutoff values to predict promoter regions across the entire E. coli genome leads to a reliability of 77% when the predicted promoters are cross-verified against the 957 TSSs listed in the EcoCyc database. This compares well with the results of a promoter prediction program based on stress-induced DNA duplex destabilization (SIDD), which attained a reliability of only 37% when this property alone was used as the distinctive structural attribute to identify promoter sequences in the E. coli genome. A web-based database, "ECOPROM", has been created for the predicted promoters in E. coli. Using a similar procedure, a reliability of 61% is achieved in predicting promoters over the whole B. subtilis genome, as verified against the 879 TSSs listed in the DBTBS database. We consider a predicted region a true positive (TP) only if it lies within or overlaps the 200 nt long region spanning from -150 to +50 with respect to a TSS (this condition is more stringent than those used by other programs). If more than one region satisfies the TP condition, only the region nearest to the TSS is counted as a TP.
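The true-positive rule stated above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the inclusive (start, end) coordinate convention, the strand handling, and the use of the closest region edge as the measure of "nearest to the TSS" are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: a region is (start, end) in inclusive genomic coordinates, start <= end.
Region = Tuple[int, int]


def tss_window(tss: int, strand: str = "+",
               upstream: int = 150, downstream: int = 50) -> Region:
    """Return the -150 to +50 window around a TSS as (start, end)."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    # On the minus strand, upstream lies at higher coordinates.
    return (tss - downstream, tss + upstream)


def overlaps(a: Region, b: Region) -> bool:
    """True if the two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def distance_to_tss(region: Region, tss: int) -> int:
    """Distance from the TSS to the closest edge of the region.

    Assumption: 'nearest' is measured this way; a region containing
    the TSS has distance 0.
    """
    start, end = region
    if start <= tss <= end:
        return 0
    return min(abs(start - tss), abs(end - tss))


def true_positive_region(predictions: List[Region], tss: int,
                         strand: str = "+") -> Optional[Region]:
    """Pick the single predicted region counted as TP for this TSS, if any."""
    window = tss_window(tss, strand)
    candidates = [r for r in predictions if overlaps(r, window)]
    if not candidates:
        return None
    # Several regions may satisfy the condition; keep only the nearest one.
    return min(candidates, key=lambda r: distance_to_tss(r, tss))
```

For example, with hypothetical predictions `[(800, 900), (930, 1010), (1200, 1300)]` and a plus-strand TSS at position 1000, two regions overlap the window, and only `(930, 1010)` would be counted as the TP because it is the nearest.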
Since it appears that some genes may have several different promoter regions, which may be active under different conditions, a manual examination of selected genes of this kind has been carried out to consolidate the predictions. The NNPP program for promoter prediction, which uses a neural network model to arrive at different weights to predict promoter regions, has lower sensitivity and precision than the stability method. Since the proposed method is based on a physico-chemical property of the DNA double helix, it is quite general and can be used to annotate the promoter regions of other genomes with different AT/GC contents.
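The comparison with NNPP is stated in terms of sensitivity and precision. Under the usual definitions, which are assumed here, sensitivity is TP / (TP + FN) and precision is TP / (TP + FP). A minimal Python sketch follows; the function name is hypothetical, and the counts in the usage note are arbitrary illustrative numbers, not results from this work.

```python
from typing import Tuple


def sensitivity_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, precision) from raw prediction counts.

    Assumed standard definitions:
      sensitivity = TP / (TP + FN)  (fraction of known TSSs recovered)
      precision   = TP / (TP + FP)  (fraction of predictions that are correct)
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("sensitivity or precision is undefined for these counts")
    return tp / (tp + fn), tp / (tp + fp)
```

With illustrative counts `tp=60, fp=20, fn=40`, the function returns a sensitivity of 0.6 and a precision of 0.75.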
References and Footnotes
Molecular Biophysics Unit,