Albany 2015: Book of Abstracts
June 9-13, 2015
©Adenine Press (2012)
Classification of protein sequences with special reference to multi-domain systems
Classification of proteins into families has profoundly influenced the functional annotation of proteins, adding enormous value to the genome sequence data of various model organisms and pathogens. Associating a newly discovered protein sequence with a protein family through homology searches is the starting point for recognizing the functional, structural and regulatory properties of the protein. However, because a protein domain is often characteristic of a specific function, protein families are, understandably, almost always developed at the level of domain families. Popular domain family resources include SCOP, CATH, Pfam, PRODOM and SMART. Associating domain families with a newly discovered multi-domain protein sequence identifies the set of domains in that sequence. While this allows the functions of individual domains to be recognized, it does not reveal how the functions of these domains interplay to confer the function and regulation of the protein as a whole. Unfortunately, the problem is vast, as the majority of proteins in almost all eukaryotic organisms with sequenced genomes are multi-domain.
As a first step towards the long-term objective of understanding the functions of multi-domain proteins on a large scale, we have developed a method that classifies proteins as a whole by simultaneously considering all the domains present in each protein. The method, CLAP (CLassification of Proteins), employs an alignment-free approach that quantifies the similarity between two proteins of known amino acid sequence by matching pentapeptide stretches in a sliding window. Our assessment using single-domain tyrosine phosphatases suggests that CLAP performs nearly as well as alignment-based methods, which are usually ineffective for multi-domain proteins. Applying CLAP to protein kinases produced clusters consistent with the well-established Hanks and Hunter classification scheme for kinases. A case study on immunoglobulins, a highly promiscuous and divergent family, showed that CLAP generated clusters that are pure in domain architecture and highly relevant in function, as quantified by Gene Ontology scores. We have also shown that our method is ~7 times faster than alignment-based methods. Thus, our method can provide biologically meaningful clustering of a set of proteins using only sequence information. CLAP is freely available as a web server at http://nslab.mbu.iisc.ernet.in/clap/
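The idea of scoring two proteins by sliding-window pentapeptide matching, without building an alignment, can be illustrated with a minimal Python sketch. This is not CLAP's implementation and its published scoring may differ. The identity-based window score, the `min_identity` threshold, the averaging that makes the score symmetric, and the single-linkage grouping with a `cutoff` are all hypothetical choices made only for illustration. The sketch compares every pentapeptide window of one sequence against all windows of the other and records the fraction of windows that find a close match.

```python
from itertools import combinations

WINDOW = 5  # pentapeptide window length, as described in the abstract


def windows(seq: str, k: int = WINDOW) -> list[str]:
    """Return all overlapping k-length stretches of seq (sliding window, step 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity_score(a: str, b: str) -> int:
    """Hypothetical scoring: count of identical residues at matching positions."""
    return sum(x == y for x, y in zip(a, b))


def directional_score(query: str, target: str, min_identity: int = 4) -> float:
    """Fraction of query windows whose best match anywhere in target
    reaches min_identity (a hypothetical threshold, not a CLAP parameter)."""
    q_windows = windows(query)
    t_windows = windows(target)
    if not q_windows or not t_windows:
        return 0.0
    hits = 0
    for qw in q_windows:
        best = max(identity_score(qw, tw) for tw in t_windows)
        if best >= min_identity:
            hits += 1
    return hits / len(q_windows)


def similarity(a: str, b: str) -> float:
    """Symmetric similarity: mean of the two directional scores."""
    return (directional_score(a, b) + directional_score(b, a)) / 2


def cluster(seqs: dict[str, str], cutoff: float = 0.5) -> list[set[str]]:
    """Illustrative single-linkage grouping: two proteins end up in the same
    group if a chain of pairs with similarity >= cutoff connects them."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy sequences (made up for illustration): p1 and p2 differ by one residue.
    proteins = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSHFSRE",
        "p3": "GSHMLEDPVDAFQLEWWN",
    }
    print(cluster(proteins))
```

Because each window of one protein is compared with every window of the other, the score in this sketch does not depend on the order of domains along the sequence, which is one way an alignment-free measure can accommodate multi-domain proteins with differing architectures.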
Martin, J., Anamika, K., & Srinivasan, N. (2010). Classification of protein kinases on the basis of both kinase and non-kinase regions. PLoS ONE, 5, e12460.
Bhaskara, R. M., Mehrotra, P., Rakshambikai, R., Gnanavel, M., Martin, J., & Srinivasan, N. (2014). The relationship between classification of multi-domain proteins using an alignment-free approach and their functions: a case study with immunoglobulins. Mol. BioSyst., 10, 1082-1093.
Gnanavel, M., Mehrotra, P., Rakshambikai, R., Martin, J., Srinivasan, N., & Bhaskara, R. M. (2014). CLAP: A web-server for automatic classification of proteins with special reference to multi-domain proteins. BMC Bioinformatics, 15, 343.
¹ IISc Mathematics Initiative