Book of Abstracts: Albany 2007
June 19-23 2007
Widespread DNA Structural Constraint in the Human Genome
Success in deciphering functional signals from the vast non-coding landscape of the human genome has come from advances in detecting sequences that are conserved between multiple species. However, fundamental questions still exist that are not amenable to traditional genomic analysis. For example, results from the ENCODE Pilot Project show that many functional elements are not evolutionarily constrained at the primary sequence level. We have developed a novel algorithm to assess evolutionary constraint based on DNA structure (as measured by the hydroxyl radical cleavage pattern) rather than primary sequence.
After correcting for false discovery rates, we find that more than twice as much territory in the ENCODE regions of the human genome is covered when the set of structurally constrained regions is merged with the set of sequence-constrained regions. Furthermore, regions that are identified as constrained based on structure but not primary sequence are not distributed at random. Instead, these regions overlap functional elements (e.g., DNase I hypersensitive sites, promoters, transcription start sites, histone acetylation sites). That is, some functional elements that are not constrained at the level of primary sequence are constrained at the level of DNA structure. Surprisingly, the majority of functional elements conform to this paradigm: a larger proportion of elements is constrained based on structure than based on sequence alone. This enrichment is significantly greater than expected at random.
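The coverage comparison described above can be illustrated with a minimal sketch. It assumes that constrained regions are represented as half-open coordinate intervals on a single chromosome and that the baseline for "twice as much territory" is the sequence-constrained set alone. The function names and the toy coordinates are hypothetical and are not data or code from this study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), hypothetical representation


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(intervals: List[Interval]) -> int:
    """Count bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Toy coordinates, invented purely for illustration.
sequence_constrained = [(100, 200), (500, 650)]
structure_constrained = [(150, 300), (700, 900)]

sequence_only = covered_bases(sequence_constrained)
combined = covered_bases(sequence_constrained + structure_constrained)
fold_increase = combined / sequence_only

print(sequence_only, combined, fold_increase)
```

Merging before counting ensures that bases constrained by both criteria are counted once, so the fold increase reflects territory gained rather than double-counted overlap.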
Our new high-resolution DNA structure conservation method reveals that structural constraint is widespread throughout the human genome, and that these regions are informative of known functional sites. That natural selection operates to preserve not only information encoded in the sequence of DNA, but also in its local structure, may be of critical importance to understanding how the human genome functions, and to refining what is meant by "constrained sequence."
Stephen C. J. Parker*1
1Program in Bioinformatics, Boston University, Boston, Massachusetts, USA;