Book of Abstracts: Albany 2007

Conversation 15
June 19-23 2007

Widespread DNA Structural Constraint in the Human Genome

Success in deciphering functional signals from the vast non-coding landscape of the human genome has come from advances in detecting sequences that are conserved between multiple species. However, fundamental questions still exist that are not amenable to traditional genomic analysis. For example, results from the ENCODE Pilot Project show that many functional elements are not evolutionarily constrained at the primary sequence level. We have developed a novel algorithm to assess evolutionary constraint based on DNA structure (as measured by the hydroxyl radical cleavage pattern) rather than primary sequence.

After correcting for false discovery rates, we find that more than twice as much territory in the ENCODE regions of the human genome is covered when the set of structurally constrained regions is merged with the set of sequence-constrained regions. Furthermore, regions that are identified as constrained based on structure but not primary sequence are not distributed at random. Instead, these regions overlap functional elements (e.g., DNase I hypersensitive sites, promoters, transcription start sites, histone acetylation sites). That is, some functional elements that are not constrained at the level of primary sequence are constrained at the level of DNA structure. Surprisingly, the majority of functional element classes conform to this paradigm: a larger proportion of elements is constrained based on structure than based on sequence alone. This increased enrichment is significantly greater than expected by chance.
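The coverage comparison described above can be illustrated with a minimal sketch. It assumes constrained regions are represented as half-open (start, end) coordinate pairs on a single chromosome; the function names and the toy coordinates are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed representation


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(intervals: List[Interval]) -> int:
    """Count bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Toy, made-up coordinates for illustration only.
sequence_constrained = [(100, 200), (500, 650)]
structure_constrained = [(150, 300), (700, 900)]

seq_only = covered_bases(sequence_constrained)
combined = covered_bases(sequence_constrained + structure_constrained)
fold_increase = combined / seq_only
```

Merging before counting ensures that bases constrained by both criteria are counted once, so the ratio reflects territory gained by adding the structure-based set.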

Our new high-resolution DNA structure conservation method reveals that structural constraint is widespread throughout the human genome, and that these regions are informative of known functional sites. That natural selection operates to preserve not only information encoded in the sequence of DNA, but also in its local structure, may be of critical importance to understanding how the human genome functions, and to refining what is meant by "constrained sequence."

Stephen C. J. Parker*1
Loren Hansen1, 2
Eric Bishop1
David Landsman2
NISC Comparative Sequencing Program3
Elliott H. Margulies3 and
Thomas D. Tullius1, 4

1Program in Bioinformatics, Boston University, Boston, Massachusetts, USA;

2National Center for Biotechnology Information and 3National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA;

4Department of Chemistry, Boston University, Boston, Massachusetts, USA.

Phone: 617-353-8810
Email: parker@bu.edu