Albany 2019: 20th Conversation - Abstracts

Albany 2019
Conversation 20
June 11-15 2019
Adenine Press (2019)

Genome-wide survey of protein structural domains and analysis of domain architectures

In the past few years, the growth of genome and transcriptome studies has resulted in a plethora of protein sequences being deposited in sequence databases. However, a large gap remains between the number of available sequences and the number of available structures. In this work, we have tried to bridge this gap by using SCOP superfamily domain sequences to identify homologues in the non-redundant (NR) database from NCBI (Iyer, M.S., et al., 2018a). The hits were validated using HMMs built from structure-based sequence alignments of SCOP (v1.75) superfamily members in the PASS2.4 database. Domain architectures were computed for the validated hits at the structure level (SCOP) and the sequence level (Pfam). The associated domains in a domain architecture were found to be involved in the same biological process. A correspondence between Pfam families and SCOP superfamilies was obtained for each superfamily, covering about 61% of Pfam families. The structural census (Chothia, C., 1992; Caetano-Anollés, G., et al., 2009) was revisited, and the distribution of homologues across superfamilies, folds and classes was analysed. About 27% of the NR database and 41% of the taxonomy database (from NCBI) were covered in the study. The results of this analysis are presented in the form of a database named GenDiS+ (Pugalenthi, G., et al., 2005; Iyer, M.S., et al., 2018b). Profiles derived from the alignments of superfamily homologues can be used in sequence searches and for assigning structural domains to sequences.
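
As an illustration of what computing a domain architecture from validated hits can involve, the following is a minimal Python sketch and not the GenDiS+ pipeline itself. The `Hit` record, the greedy overlap rule, the `max_overlap` parameter and the toy superfamily labels are all assumptions made for this example; the abstract does not specify how overlapping assignments were resolved.

```python
from collections import Counter, namedtuple

# Hypothetical record for one validated domain hit on a protein sequence.
# Coordinates are 1-based and inclusive; a lower E-value means a better hit.
Hit = namedtuple("Hit", "seq_id domain start end evalue")


def resolve_overlaps(hits, max_overlap=0):
    """Greedily keep the best hits (lowest E-value first) that do not overlap.

    Assumption: two hits conflict when they share more than `max_overlap`
    residues. This rule is illustrative, not taken from the abstract.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(min(hit.end, k.end) - max(hit.start, k.start) + 1 <= max_overlap
               for k in kept):
            kept.append(hit)
    return kept


def domain_architecture(hits):
    """Return the domains of one sequence in N- to C-terminal order."""
    ordered = sorted(resolve_overlaps(hits), key=lambda h: h.start)
    return tuple(h.domain for h in ordered)


def architectures_by_sequence(hits):
    """Group hits by sequence and compute one architecture per sequence."""
    by_seq = {}
    for h in hits:
        by_seq.setdefault(h.seq_id, []).append(h)
    return {seq_id: domain_architecture(hs) for seq_id, hs in by_seq.items()}


# Toy data with made-up identifiers and superfamily labels.
hits = [
    Hit("seqA", "sf_A", 10, 70, 1e-20),
    Hit("seqA", "sf_B", 80, 170, 1e-30),
    Hit("seqA", "sf_C", 200, 450, 1e-80),
    Hit("seqA", "sf_D", 60, 120, 1e-5),   # overlaps sf_A and sf_B, weaker
    Hit("seqB", "sf_A", 5, 60, 1e-15),
]

architectures = architectures_by_sequence(hits)
architecture_counts = Counter(architectures.values())
```

Counting identical tuples, as `architecture_counts` does, gives the frequency of each architecture across sequences. The same function could be run once on structure-level (SCOP) assignments and once on sequence-level (Pfam) assignments to compare the two views.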

This research has been supported by NCBS, TIFR.


    Caetano-Anollés, G., et al. (2009). The origin, evolution and structure of the protein world. Biochemical Journal, 417(3), 621–637.

    Chothia, C. (1992). One thousand families for the molecular biologist. Nature, 357(6379), 543–544.

    Iyer, M.S., et al. (2018a). Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Molecular Omics, 14(4), 266–280.

    Iyer, M.S., et al. (2018b). GenDiS+: improved sequence search and validation and update of features for homologues of Structural Superfamily members. (Manuscript under review)

    Pugalenthi, G., et al. (2005). GenDiS: Genomic Distribution of protein structural domain Superfamilies. Nucleic Acids Research.

Meenakshi S. Iyer,
Adwait G. Joshi and
R. Sowdhamini

Meenakshi is a post-doctoral student in Prof. Sowdhamini's lab at NCBS, and will present a short oral presentation in the Big Data session.

National Centre for Biological Sciences,
Bellary Road,
Bangalore – 560065, India

Ph: +91 80 2366 6250
Email: mini@ncbs.res.in