Albany 2013: Book of Abstracts

Conversation 18
June 11-15 2013
©Adenine Press (2012)

Discovery of Novel ncRNA by Scanning Multiple Genome Alignments

Recently, non-coding RNAs (ncRNAs) with novel functions have been discovered, and it has become clear that transcription is pervasive. De novo computational ncRNA detection methods that are both accurate and efficient are therefore desirable. The purpose of this study is to develop an ncRNA detection method based on structural conservation.

A new method called Multifind, based on Multilign (Xu & Mathews 2011), was developed. It uses an algorithm that predicts common structures among multiple sequences and estimates the probability that the input sequences are ncRNA using a classification support vector machine (SVM). Multilign uses Dynalign (Mathews & Turner 2002), which folds and aligns two sequences simultaneously without requiring any sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. Benchmarks showed that Multifind performs better than RNAz on test sequences extracted from the Rfam database (Gardner et al. 2011), especially on more diverse sequences. For de novo ncRNA discovery in genomes, Multifind had an advantage in low-similarity regions of genome alignments. Scanning the whole yeast genome alignment takes about 48 hours with Multifind, compared with about 4 hours with RNAz; although Multifind is slower, its computational requirements do not present a barrier for most users.
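
To make the notion of "low-similarity regions" in a genome alignment concrete, the following Python sketch slides a fixed-size window along a multiple alignment and reports the mean pairwise sequence identity of each window. It is an illustration only and is not Multifind's code: the function names, the gap handling, and the default window and step sizes (120 and 40 columns) are assumptions that do not come from this abstract.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical positions between two aligned rows.

    Assumption: columns gapped in both rows are skipped; a gap facing a
    residue counts as a mismatch.
    """
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def mean_pairwise_identity(rows):
    """Average identity over all pairs of rows (needs at least two rows)."""
    pairs = list(combinations(rows, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def scan_windows(alignment, window=120, step=40):
    """Yield (start_column, mean_pairwise_identity) for each window.

    `alignment` is a list of equal-length aligned sequences. The window
    and step defaults are illustrative, not values from the abstract.
    """
    length = len(alignment[0])
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, step):
        rows = [row[start:start + window] for row in alignment]
        yield start, mean_pairwise_identity(rows)
```

Windows with a low mean identity are the ones in which, according to the benchmarks above, a method that does not depend on sequence identity would be expected to have an advantage. In a real scan, each window would be passed on to the structure prediction and classification steps rather than simply scored.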

The program was implemented in C++ and is included in the RNAstructure package (Reuter & Mathews 2010): http://rna.urmc.rochester.edu.

References

  1. Xu, Z.J. and Mathews, D.H. (2011) Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences, Bioinformatics, 27, 626-632.
  2. Mathews, D.H. and Turner, D.H. (2002) Dynalign: An algorithm for finding the secondary structure common to two RNA sequences, Journal of Molecular Biology, 317, 191-203.
  3. Gardner, P.P., et al. (2011) Rfam: Wikipedia, clans and the decimal release, Nucleic Acids Research, 39, D141.
  4. Reuter, J.S. and Mathews, D.H. (2010) RNAstructure: software for RNA secondary structure prediction and analysis, BMC Bioinformatics, 11.

Yinghan Fu1
Zhenjiang Xu1
Zhi J. Lu2
Shan Zhao1
and David H. Mathews1,3*

1Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
2School of Life Sciences, Tsinghua University, Beijing, China
3Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA

*To whom correspondence should be addressed.

yinghan_fu@urmc.rochester.edu