Albany 2015: Book of Abstracts
June 9-13 2015
©Adenine Press (2012)
TFBSshape: a motif database for DNA shape features of transcription factor binding sites
Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Although these sequence motifs describe the DNA binding specificity of transcription factors (TFs) quite accurately, proteins recognize DNA as a three-dimensional object. DNA structural features therefore refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Motif databases contain large numbers of nucleotide sequences identified in binding experiments based on their selection by a TF. To make use of DNA shape information when analyzing the DNA binding specificities of TFs, we developed a new tool that calculates DNA structural features from the nucleotide sequences provided by motif databases. The resulting TFBSshape database generates heat maps and quantitative data for four DNA structural features (minor groove width, propeller twist, roll, and helix twist) for 729 TF datasets from 23 different species, derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and Hox TFs, TFBSshape can be used to uncover differential DNA binding preferences of closely related TFs. Our approach can also be used to quantify the structural similarity between distinct sequence motifs. The TFBSshape database is freely available at http://rohslab.cmb.usc.edu/TFBSshape/. With DNA structural data available, machine learning methods such as multiple linear regression can be used to construct models of TF-DNA binding specificity that incorporate both DNA sequence and shape information. Such models can improve on the performance of sequence-only models and yield new insights into TF-DNA recognition mechanisms.
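As an illustration of how the structural similarity between two motifs might be quantified, the following minimal sketch averages one shape feature position by position across aligned binding sites and then compares the two mean profiles. The abstract does not state which similarity measure is used, so the Pearson correlation here is an assumption. The function names (mean_profile, pearson) are hypothetical, and the minor groove width values are invented toy numbers, not data from TFBSshape.

```python
from math import sqrt


def mean_profile(site_profiles):
    """Average a shape feature position by position across aligned sites."""
    length = len(site_profiles[0])
    if any(len(profile) != length for profile in site_profiles):
        raise ValueError("all aligned sites must have the same length")
    n_sites = len(site_profiles)
    return [sum(p[i] for p in site_profiles) / n_sites for i in range(length)]


def pearson(x, y):
    """Pearson correlation between two equal-length shape profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Toy minor groove width values (angstroms) per position for two aligned
# sets of binding sites. Invented for illustration only (assumption).
motif_a_sites = [
    [5.0, 4.0, 3.0, 4.0, 5.0],
    [5.2, 4.2, 3.2, 4.2, 5.2],
]
motif_b_sites = [
    [4.0, 5.0, 6.0, 5.0, 4.0],
    [4.4, 5.4, 6.4, 5.4, 4.4],
]

profile_a = mean_profile(motif_a_sites)  # narrow groove at the center
profile_b = mean_profile(motif_b_sites)  # wide groove at the center
similarity = pearson(profile_a, profile_b)
print(f"Pearson correlation of mean profiles: {similarity:.3f}")
```

In this toy case the two profiles are mirror images of each other, so the correlation is strongly negative. The same comparison could be repeated for propeller twist, roll, and helix twist, and a distance such as the Euclidean distance could be substituted if absolute magnitudes matter more than the pattern along the site.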
¹Molecular and Computational Biology Program