Albany 2013: Book of Abstracts
June 11-15 2013
©Adenine Press (2012)
Discovery of Novel ncRNA by Scanning Multiple Genome Alignments
Non-coding RNAs (ncRNAs) with novel functions have recently been discovered, and transcription is now appreciated to be pervasive. Accurate and efficient computational methods for de novo ncRNA detection are therefore desirable. The purpose of this study is to develop an ncRNA detection method based on structural conservation.
A new method called Multifind, based on Multilign (Xu & Mathews 2011), was developed. It predicts common structures among multiple sequences and then uses a classification support vector machine (SVM) to estimate the probability that the input sequences are ncRNA. Multilign uses Dynalign (Mathews & Turner 2002), which folds and aligns two sequences simultaneously without requiring any sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. In benchmarks, Multifind performed better than RNAz on test sequences extracted from the Rfam database (Gardner et al. 2011), especially on the more diverse sequences. For de novo ncRNA discovery in genomes, Multifind had an advantage in low-similarity regions of genome alignments. Multifind takes about 48 hours to scan the whole yeast genome alignment, compared with about 4 hours for RNAz; although it is slower, its computational requirements do not present a barrier for most users.
The program was implemented in C++ and is included in the RNAstructure package (Reuter & Mathews, 2010): http://rna.urmc.rochester.edu.
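As an illustration of what scanning a genome alignment involves, the following Python sketch slides a fixed-size window along a gapped multiple alignment and reports the mean pairwise identity of each window, which is one simple way to flag the low-similarity regions mentioned above. It is not Multifind's implementation: the function names, the window and step sizes, and the identity measure are all assumptions made for this example, and the structure prediction and SVM classification steps are omitted.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_identity(rows):
    """Average pairwise identity over all pairs of alignment rows."""
    scores = [pairwise_identity(a, b) for a, b in combinations(rows, 2)]
    return sum(scores) / len(scores) if scores else 0.0


def scan_windows(alignment, window=120, step=40):
    """Yield (start, end, mean identity) for each window of alignment columns.

    The defaults for window and step are illustrative assumptions only.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, length)
        rows = [row[start:end] for row in alignment]
        yield start, end, mean_pairwise_identity(rows)


# Toy alignment (hypothetical data): three rows, ten columns.
toy_alignment = ["ACGUACGUAC", "ACGUACGAAC", "ACG-ACGUAC"]
toy_windows = list(scan_windows(toy_alignment, window=5, step=5))
```

In a full pipeline, each window would be passed on to structure prediction and classification; here the identity score only indicates which windows are diverse enough that a method not relying on sequence identity would be expected to help.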
1Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA