Book of Abstracts: Albany 2011

Conversation 17
June 14-18 2011
©Adenine Press (2010)

Physicochemical Features of Protein-DNA Interactions and the Identification of Interface Region in DNA-Binding Proteins

Physicochemical features of protein-protein and protein-DNA interfaces can serve as useful constraints in docking studies (1-3). Amino acid residues important for structure and function are evolutionarily conserved and tend to cluster together at the binding site (4-5). Although there has been substantial recent progress in our understanding of the interactions between DNA and proteins (6-8), we still lack a quantitative or semi-quantitative description of the surfaces involved. Following the protocol used for protein-protein interfaces (5), we analyze here the clustering of conserved residues in protein-DNA interfaces and show how this and other features, such as the propensity of residues to occur in the interface, can be used to discriminate the real interface from other random surface patches.

In 129 non-redundant interfaces from 126 protein-DNA complexes, 81% have the conserved positions clustered within the overall interface region, as indicated by rho (a ratio of parameters representing the degree of clustering of conserved residues relative to the overall interface) being greater than 1. Using rho alone, the real interface is ranked first among randomly generated surface patches in ~46% of the cases. Incorporating the Euclidean distance of the composition of a surface patch from the average composition over all protein-DNA interfaces improves the efficiency by another 6%. While the effectiveness of clustering in discriminating the real interface is rather mediocre, the use of Rp (9) (the number of residues of a given type in a patch multiplied by the propensity of that type to occur in the interface, summed over all the residues in the patch) identifies 81% of the interfaces with rank 1. Another parameter, Dp, the number of potential hydrogen bond donors in the patch, gives an accuracy of ~65%.
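The patch-scoring quantities described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy propensity values, and the tie-handling in the ranking are assumptions made for illustration only, and the propensities are taken here to be log-scale values (positive for residue types favoured in the interface), which the abstract does not specify.

```python
from collections import Counter
from math import sqrt


def rp_score(patch_residues, propensity):
    """Rp: for each residue type, (count in patch) x (interface propensity), summed."""
    counts = Counter(patch_residues)
    # Residue types missing from the propensity table contribute 0 (an assumption).
    return sum(n * propensity.get(res, 0.0) for res, n in counts.items())


def composition_distance(patch_residues, mean_composition):
    """Euclidean distance between a patch's fractional composition and a reference one."""
    if not patch_residues:
        raise ValueError("patch must contain at least one residue")
    counts = Counter(patch_residues)
    total = len(patch_residues)
    residue_types = set(counts) | set(mean_composition)
    return sqrt(sum(
        (counts.get(t, 0) / total - mean_composition.get(t, 0.0)) ** 2
        for t in residue_types
    ))


def rank_of_interface(interface_score, random_scores, higher_is_better=True):
    """Rank 1 means no random patch scores strictly better than the real interface."""
    if higher_is_better:
        better = sum(1 for s in random_scores if s > interface_score)
    else:
        better = sum(1 for s in random_scores if s < interface_score)
    return better + 1


# Hypothetical propensities, invented for this example (not values from reference 9).
TOY_PROPENSITY = {"ARG": 0.5, "LYS": 0.4, "SER": 0.1, "ASP": -0.6, "GLU": -0.5}

interface_patch = ["ARG", "ARG", "LYS", "SER"]
random_patches = [
    ["ASP", "GLU", "SER", "SER"],
    ["LYS", "ASP", "SER", "GLU"],
]

interface_rp = rp_score(interface_patch, TOY_PROPENSITY)
random_rp = [rp_score(p, TOY_PROPENSITY) for p in random_patches]
interface_rank = rank_of_interface(interface_rp, random_rp)
```

For Rp a higher score is treated as better, whereas for the composition distance a smaller value indicates a patch closer to the average interface, so `higher_is_better=False` would be passed when ranking by that measure.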


  1. S. Ahmad, M. M. Gromiha, A. Sarai, Bioinformatics 20, 477–486 (2004).
  2. Y. Mandel-Gutfreund, H. Margalit, Nucleic Acids Res 26, 2306–2312 (1998).
  3. S. Biswas, M. Guharoy, P. Chakrabarti, Proteins 74, 643–654 (2009).
  4. S. Ahmad, O. Keskin, A. Sarai, R. Nussinov, Nucleic Acids Res 36, 5922–5932 (2008).
  5. M. Guharoy, P. Chakrabarti, BMC Bioinformatics 11, 286 (2010).
  6. S. M. West, R. Rohs, R. S. Mann, B. Honig, J Biomol Struct Dyn 27, 861–866 (2010).
  7. D. Wang, N. B. Ulyanov, V. B. Zhurkin, J Biomol Struct Dyn 27, 843–859 (2010).
  8. C. Carra, F. A. Cucinotta, J Biomol Struct Dyn 27, 407–427 (2010).
  9. R. P. Bahadur, P. Chakrabarti, F. Rodier, J. Janin, J Mol Biol 336, 943–955 (2004).

Pinak Chakrabarti,1,2
Sucharita Dey,1
Arumay Pal,2
Mainak Guharoy2

1Bioinformatics Centre
2Department of Biochemistry
Bose Institute
P-1/12 CIT Scheme VIIM
Kolkata 700054, India

ph: (91) (33) 2355-0256
fx: (91) (33) 2355-3886
pinak_chak@yahoo.co.in pinak@bic.boseinst.ernet.in