SPINE-D: Accurate Prediction of Short and Long Disordered Regions by a Single Neural-Network Based Method

Short and long disordered regions of proteins have different preferences for amino acid residues, so separate methods often have to be trained to predict each. In this study, we developed a single neural-network-based technique called SPINE-D that first makes a three-state prediction (ordered residues, disordered residues in short disordered regions, and disordered residues in long disordered regions) and then reduces it to a two-state prediction. SPINE-D was tested on various sets composed of different combinations of DisProt-annotated proteins and proteins taken directly from the PDB and annotated for disorder by missing coordinates in X-ray-determined structures. Although DisProt and X-ray approaches yield different disorder annotations, SPINE-D’s prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed, but depend strongly on the balance in the relative populations of ordered residues and disordered residues in short and long disordered regions in the test set. At greater than 85% overall specificity for detecting residues in both short and long disordered regions, residues in long disordered regions are easier to predict (81% sensitivity) in a balanced test dataset with 56.5% ordered residues, but more challenging (65% sensitivity) in a test dataset with 90% ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Matthews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set of 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in the CASP9 blind prediction and is one of the top servers according to the official ranking.
In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies. The server and databases are available at http://sparks.informatics.iupui.edu/.
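
The reduction from three states to two, and the residue-based measures quoted above (sensitivity, specificity, Matthews correlation coefficient), can be illustrated with a minimal sketch. This is not the SPINE-D implementation: the abstract does not say how the three-state output is reduced, so summing the short- and long-disorder probabilities and applying a 0.5 threshold is an assumption made here for illustration, and all function names and example values are hypothetical.

```python
import math


def reduce_to_two_state(three_state_probs, threshold=0.5):
    """Collapse per-residue (p_ordered, p_short, p_long) into 0/1 labels.

    Assumption: the disorder score is the sum of the short- and
    long-disorder probabilities; the abstract does not specify the rule.
    Returns 1 for predicted disorder and 0 for predicted order.
    """
    labels = []
    for _p_ordered, p_short, p_long in three_state_probs:
        p_disorder = p_short + p_long
        labels.append(1 if p_disorder >= threshold else 0)
    return labels


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives (1 = disordered)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of truly disordered residues predicted as disordered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of truly ordered residues predicted as ordered."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical three-state outputs for six residues (not real predictions).
example_probs = [
    (0.9, 0.05, 0.05),
    (0.2, 0.5, 0.3),
    (0.4, 0.1, 0.5),
    (0.7, 0.2, 0.1),
    (0.6, 0.3, 0.1),
    (0.1, 0.1, 0.8),
]
example_actual = [0, 1, 0, 0, 1, 1]  # hypothetical annotation, 1 = disordered

example_pred = reduce_to_two_state(example_probs)
tp, tn, fp, fn = confusion_counts(example_pred, example_actual)
print("two-state labels:", example_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("MCC:", matthews_cc(tp, tn, fp, fn))
```

Because sensitivity and specificity are computed per class, they are less affected by the ordered/disordered balance of a test set than overall accuracy is, which is one reason results on balanced and unbalanced test sets are reported separately.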

This article can be cited as:
T. Zhang, E. Faraggi, B. Xue, A.K. Dunker, V.N. Uversky, Y. Zhou. SPINE-D: Accurate Prediction of Short and Long Disordered Regions by a Single Neural-Network Based Method. J. Biomol. Struct. Dyn. 29(4), 799-813 (2012).

Tuo Zhang (1,2,5)
Eshel Faraggi (1,2,5)
Bin Xue (3)
A. Keith Dunker (2)
Vladimir N. Uversky (3,4)
Yaoqi Zhou (1,2)*

(1) School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA
(2) Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
(3) Department of Molecular Medicine, University of South Florida, Tampa, FL 33612, USA
(4) Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
(5) Equal contribution.


Open Access Article
The authors, the publisher, and the right holders grant the right to use, reproduce, and disseminate the work in digital form to all users.
