Book of Abstracts: Albany 2007
June 19-23, 2007
Prediction of Nucleosome Positioning Using a Support Vector Machine
Understanding how DNA is packaged into chromatin is a fundamental problem in modern biology. Transcription factors and other biologically active molecules preferentially bind to regions of DNA that are not associated with a nucleosome, the fundamental unit of chromatin. Recent studies suggest that the signal that determines which regions are incorporated into nucleosomes is encoded by the sequence itself (2,3,4). However, detecting this signal has proven difficult. Here, we describe the application of a support vector machine (SVM) that integrates a large number of sequence features to discriminate between sequences bound by nucleosomes and those from which nucleosomes are excluded.
Previous support vector machines designed to discriminate one class of sequence from another (1) have represented each sequence by a distribution of k-mers, derived from all subsequences of length k. This approach discards any information about where in the original sequence each k-mer occurs. Here we incorporate positional information by splitting the original sequence into a small number of large subsequences of equal length. Feature distributions are computed for each of these large subsequences in the same manner as for the full-length sequence. Because position-specific information is retained, training sequences can be longer and can include the context surrounding a sequence known to bind nucleosomes strongly.
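The positional representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of k, the number of segments, the use of relative frequencies, and the handling of leftover bases are all assumptions made for illustration. The sketch also builds the vector from the segments only; whether the full-length distribution is included alongside them is not stated in the abstract.

```python
from collections import Counter
from itertools import product


def kmer_distribution(seq, k):
    """Relative frequency of every DNA k-mer in seq, in a fixed order.

    Assumes an uppercase sequence; k-mers containing characters other
    than A, C, G, T are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def positional_features(seq, k=2, n_segments=3):
    """Concatenate the k-mer distributions of equal-length segments.

    Hypothetical helper: the sequence is cut into n_segments pieces of
    equal length (any trailing remainder is dropped), and the same
    k-mer distribution is computed for each piece.
    """
    seg_len = len(seq) // n_segments
    features = []
    for s in range(n_segments):
        segment = seq[s * seg_len:(s + 1) * seg_len]
        features.extend(kmer_distribution(segment, k))
    return features
```

With k = 1 and two segments, the sequence AAAACCCC yields one distribution dominated by A followed by one dominated by C, whereas the whole-sequence distribution only records equal amounts of A and C. The resulting vector could then be passed to any SVM implementation as the feature representation of the sequence.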
Sequences were obtained from a tiled microarray experiment covering a portion of the S. cerevisiae genome (2). An SVM trained on these data was then used to predict the relative nucleosome occupancy of sequences derived from regions of the yeast genome absent from the training set. These predictions correlate remarkably well with other experimental results.
References and Footnotes
Eric Bishop*1 and
1Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
1Department of Chemistry, Boston University, Boston, Massachusetts, USA