Book of Abstracts: Albany 2003
June 17-21, 2003
RNA Informatics: Discovery of RNA Structural Features by Computation
Increasingly, we see that RNA has functions in important biological processes beyond carrying the code for protein synthesis and contributing to ribosome structure. Included among these functions are catalysis and modulation of gene expression mediated by self-splicing ribozymes, small inhibitory RNAs (RNAi), translational frameshifting, internal ribosome entry sites (IRES), iron response elements, and mRNA localization. The ever-increasing accumulation of genomic sequences poses a major challenge for the discovery of similar and new functional elements. This is an especially inviting challenge for bioinformatics and computational biology. Specialized sequence, structure, and motif databases of selected and annotated tRNAs, ribosomal RNAs, mRNAs, selected 5′- and 3′-untranslated regions of mRNAs, and other RNAs are appearing. Most of the functional elements in RNA involve higher-order structure rather than simple, linear sequence motifs. Computational algorithms that use dynamic programming, genetic algorithms, or other methods to predict secondary structure in folded, self-complementary sequences are well known. We mainly adapt the dynamic programming approach to discover statistically unusual folding regions (UFRs) in RNAs. This is done by scanning successive segments along a sequence for differences between a folding property of the natural segment and the same property of a number of segments of the same size and composition but with shuffled sequence. Another approach to finding well-formed structures is to compare the optimal structure of a segment with a second optimal structure in which all of the base pairings of the first are forbidden. When a particular region is deemed to be of interest, often because it correlates with experimental data on a biological phenomenon, we then attempt to localize the critical sequence and predict its secondary and tertiary structure.
This step is greatly aided when related but different sequences, arising either by mutation or from phylogenetic comparison, are available. We use several manual or automated bioinformatics approaches to find conserved structures. In a unique genetic algorithm developed here, the search for optimal stability and the search for conserved structure are carried out simultaneously. These programs are available by anonymous FTP at ftp.ncifcrf.gov/pub/users/shuyun and ftp.ncifcrf.gov/pub/users/chen. They are also available as web-based servers at http://protein3d.ncifcrf.gov/shuyun/rna2d.html.
Examples will be shown where these tools have discovered correlations between UFRs and the above-mentioned non-coding functions, and the potential and implications of further discoveries will be discussed. Because there is still considerable opportunity for improving the sensitivity, accuracy and precision of computational structure predictions, we will address problems and approaches for their improvement.
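The window-scanning comparison against shuffled segments described above can be illustrated with a minimal sketch. It is not the authors' program: as a stand-in for a thermodynamic folding score it uses a simple Nussinov-style dynamic program that counts the maximum number of nested base pairs, and the function names (`max_pairs`, `window_zscore`, `scan`), the window size, the step, and the number of shuffles are all assumptions chosen for illustration. With this stand-in, a large positive z-score marks a window that pairs more extensively than shuffled segments of the same composition.

```python
import random
import statistics

# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    This is only a crude proxy for folding stability; a real analysis
    would use an energy-based folding program instead.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def window_zscore(segment, n_shuffles=50, rng=None):
    """Z-score of the natural segment's score against shuffled copies.

    Shuffling permutes the letters, so size and base composition are kept.
    """
    rng = rng or random.Random(0)
    native = max_pairs(segment)
    letters = list(segment)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(max_pairs("".join(letters)))
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return 0.0  # shuffles are indistinguishable; no signal
    return (native - mean) / sd


def scan(seq, window=30, step=5, n_shuffles=50, seed=0):
    """Slide a window along seq and return (start, z-score) for each position."""
    rng = random.Random(seed)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        results.append((start, window_zscore(segment, n_shuffles, rng)))
    return results
```

Calling `scan(sequence)` on an RNA string written in the alphabet A, C, G, U returns one z-score per window; windows whose scores stand out from the rest would be the candidate regions to examine further. The choice of score, window size, and number of shuffles all affect sensitivity and would need tuning in practice.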
Laboratory of Experimental and Computational Biology