Book of Abstracts: Albany 2007

Conversation 15
June 19-23 2007

Prediction of Nucleosome Positioning Using a Support Vector Machine

Understanding how DNA is packaged into chromatin is a fundamental problem in modern biology. Transcription factors and other biologically active molecules prefer to bind to regions of DNA that are not associated with a nucleosome, the fundamental unit of chromatin. Recent studies suggest that the signal that determines which regions are incorporated into nucleosomes is encoded by the DNA sequence itself (2,3,4). However, detecting this signal has proven difficult. Here, we describe the application of a support vector machine (SVM) that integrates a large number of sequence features to discriminate between sequences bound by nucleosomes and those from which nucleosomes are excluded.

Previous support vector machines designed to discriminate one class of sequence from another (1) have represented each sequence by its distribution of k-mers, derived from all subsequences of length k. This approach discards any information about where in the original sequence each k-mer occurs. Here we incorporate positional information by splitting the original sequence into a small number of large subsequences of equal length. Feature distributions are computed for each of these large subsequences in the same manner as for the full-length sequence. Because position-specific information is retained, training sequences can be longer and can include the flanking context of a sequence known to bind nucleosomes strongly.
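The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `kmer_distribution` and `positional_features`, the choice to normalize counts to frequencies, and the handling of leftover bases are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_distribution(seq, k):
    """Return the normalized frequency of each of the 4**k DNA k-mers in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing characters outside ACGT (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def positional_features(seq, k, n_segments):
    """Concatenate k-mer distributions of n_segments equal-length pieces.

    Assumption: pieces do not overlap, and any remainder bases at the
    end of the sequence are dropped.
    """
    seq = seq.upper()
    seg_len = len(seq) // n_segments
    features = []
    for s in range(n_segments):
        segment = seq[s * seg_len:(s + 1) * seg_len]
        features.extend(kmer_distribution(segment, k))
    return features
```

In this sketch, `AAAACCCC` and `CCCCAAAA` have identical whole-sequence 1-mer distributions, but `positional_features(seq, 1, 2)` differs between them, which is the positional information a single k-mer distribution would lose.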

Sequences were obtained from a tiled microarray experiment covering a portion of the S. cerevisiae genome (2). An SVM trained on these data was then used to predict the relative nucleosome occupancy of sequences derived from regions of the yeast genome absent from the training set. These predictions correlate remarkably well with other experimental results.
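The abstract does not state how agreement between predicted and measured occupancy was quantified. One common choice would be the Pearson correlation coefficient; the sketch below assumes that measure, and the function name `pearson` and its inputs are hypothetical.

```python
import math


def pearson(predicted, measured):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of squared deviations and of cross-deviations from the means.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_p * var_m)
```

Here `predicted` would hold SVM outputs for held-out regions and `measured` the corresponding experimental occupancy values, position by position.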

References and Footnotes
  1. W. S. Noble, S. Kuehn, R. Thurman, M. Yu, and J. Stamatoyannopoulos. Bioinformatics 21, i338-i343 (2005)
  2. G. Yuan, Y. Liu, M. F. Dion, M. D. Slack, L. F. Wu, S. J. Altschuler and O. J. Rando. Science 309, 626-630 (2005)
  3. E. Segal, Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I. K. Moore, J. Z. Wang and J. Widom. Nature 442, 772-778 (2006)
  4. S. M. Johnson, F. J. Tan, H. L. McCullough, D. P. Riordan and A. Z. Fire. Genome Research 16, 1505-1516 (2006)

Eric Bishop*1 and
Thomas D. Tullius 1, 2

1 Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
2 Department of Chemistry, Boston University, Boston, Massachusetts, USA

Phone: 617-353-8810
Fax: 617-353-4814
Email: ebishop@bu.edu