
Albany 2015: Book of Abstracts

Albany 2015
Conversation 19
June 9-13 2015
©Adenine Press (2012)

TFBSshape: a motif database for DNA shape features of transcription factor binding sites

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Although these sequence motifs describe the DNA binding specificity of transcription factors (TFs) quite accurately, proteins recognize DNA as a three-dimensional object. DNA structural features can therefore refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Motif databases contain large numbers of nucleotide sequences that were selected by a TF in binding experiments. To use DNA shape information when analyzing the DNA binding specificities of TFs, we developed a new tool that calculates DNA structural features from the nucleotide sequences provided by motif databases. The resulting TFBSshape database provides heat maps and quantitative data for four DNA structural features (minor groove width, propeller twist, roll, and helix twist) for 729 TF datasets from 23 species, derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and Hox TFs, TFBSshape can be used to uncover differential DNA binding preferences of closely related TFs. Our approach can also be used to quantify the structural similarity between distinct sequence motifs. The TFBSshape database is freely available at http://rohslab.cmb.usc.edu/TFBSshape/. With DNA structural data available, machine learning methods such as multiple linear regression can be used to build models of TF-DNA binding specificity that incorporate both DNA sequence and shape information; such models can improve performance relative to sequence-only models and provide new insights into TF-DNA recognition mechanisms.
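The idea of combining sequence and shape features in a multiple linear regression model can be illustrated with a minimal sketch. Several assumptions apply. The shape values come from a hypothetical toy lookup (`toy_mgw`), not from TFBSshape or any real query table. The five-nucleotide sliding window is assumed to follow the pentamer scheme used by DNAshape-style predictors, with each value assigned to the central base. The binding scores are synthetic, and ridge-regularized least squares stands in for whatever fitting procedure the authors actually used. All function names are illustrative.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as four 0/1 indicators (A, C, G, T) per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def toy_mgw(pentamer):
    """HYPOTHETICAL stand-in for a minor groove width (MGW) lookup table.

    The values are invented; they only mimic the qualitative trend that runs
    of A/T (A-tracts) narrow the minor groove. Not real TFBSshape output.
    """
    run = longest = 0
    for base in pentamer:
        run = run + 1 if base in "AT" else 0
        longest = max(longest, run)
    return 6.0 - 0.4 * longest


def shape_profile(seq):
    """Slide a 5-bp window; each value is assigned to the window's central base."""
    return [toy_mgw(seq[i - 2:i + 3]) for i in range(2, len(seq) - 2)]


def features(seq, use_shape):
    """Intercept + one-hot sequence features, optionally followed by shape features."""
    x = [1.0] + one_hot(seq)
    if use_shape:
        x += shape_profile(seq)
    return x


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_mlr(xs, ys, ridge=1e-3):
    """Least-squares fit via the normal equations plus a small ridge term.

    The four one-hot columns at each position sum to the intercept column,
    so X^T X is singular without the ridge term.
    """
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def mse(w, xs, ys):
    """Mean squared error of the linear model w on the data (xs, ys)."""
    preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    return sum((pred - y) ** 2 for pred, y in zip(preds, ys)) / len(ys)


# Synthetic demo: every 5th hexamer, with a made-up "binding score" that
# depends on the toy shape values plus one sequence-specific term.
seqs = ["".join(p) for p in product(BASES, repeat=6)][::5]
ys = [3.0 - shape_profile(s)[0] + 0.5 * shape_profile(s)[1]
      + (1.0 if s[3] == "G" else 0.0) for s in seqs]

results = {}
for use_shape in (False, True):
    xs = [features(s, use_shape) for s in seqs]
    w = fit_mlr(xs, ys)
    results["sequence+shape" if use_shape else "sequence-only"] = mse(w, xs, ys)
print(results)
```

Because the toy shape values depend non-additively on the sequence (through the longest A/T run), the sequence-only model cannot fit the synthetic scores exactly, whereas the combined model can. With real binding data, any gain from adding shape features would have to be measured, for example by cross-validation.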

Lin Yang1
Iris Dror1,2
Tianyin Zhou1
Anthony Mathelier3
Wyeth W. Wasserman3
Raluca Gordan4
Remo Rohs1*

1Molecular and Computational Biology Program
University of Southern California
Los Angeles, CA 90089, USA
2Department of Biology
Technion-Israel Institute of Technology
Technion City
Haifa 32000, Israel
3Centre for Molecular Medicine and Therapeutics
University of British Columbia
Vancouver, BC, Canada
4Institute for Genome Sciences & Policy
Duke University
Durham, NC 27708, USA

yang23@usc.edu