Book of Abstracts: Albany 2003
June 17-21 2003
Analysis, Prediction and Evolution of Protein-DNA Interactions
An overview will be presented of our work on analysing protein-DNA interactions from a structural perspective. We have grouped the proteins into homologous families and studied both the geometry of their interactions and how the proteins have evolved. We investigate the conservation of amino acid sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acids that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acids that interact with DNA bases show levels of conservation that vary with the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove, where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences (1, 2).
More recently, we have developed sequence and structural motifs to identify DNA-binding proteins using structural data. This is relevant for exploiting the output from structural genomics projects (3). A structural template library of 144 HTH motifs has been created from DNA-binding proteins in the Protein Data Bank. The templates were used to scan complete protein structures using an algorithm that calculated the root mean squared deviation (rmsd) for the optimal superposition of each template on each structure, based on Cα backbone coordinates. The template library and the validated thresholds were used to make predictions for target proteins from a structural genomics project.
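The template scan described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which superposition method was used, so the sketch assumes Horn's quaternion formulation of the optimal-superposition rmsd, and it assumes that each template is compared against every contiguous window of Cα atoms in the target structure. The function names (`superposed_rmsd`, `scan_structure`) are hypothetical, and no rmsd threshold is included because the validated thresholds are not given here.

```python
import math


def _centre(coords):
    """Translate a list of (x, y, z) points so their centroid is the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[k] for p in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _largest_eigenvalue(matrix, max_sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    scale = sum(v * v for row in a for v in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off <= 1e-30 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                apq = a[p][q]
                if apq == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                sign = 1.0 if diff >= 0.0 else -1.0
                # Rotation angle (|theta| <= pi/4) that zeroes element (p, q).
                theta = 0.5 * math.atan2(sign * 2.0 * apq, sign * diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(template, window):
    """rmsd after optimal rigid-body superposition of two equal-length C-alpha sets."""
    if not template or len(template) != len(window):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    a, b = _centre(template), _centre(window)
    # 3x3 correlation matrix between the centred coordinate sets.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best overlap.
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(horn)
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))


def scan_structure(template, ca_coords):
    """Slide the template along the structure; return (start, rmsd) of the best window.

    Returns None when the structure is shorter than the template.
    """
    m = len(template)
    best = None
    for start in range(len(ca_coords) - m + 1):
        r = superposed_rmsd(template, ca_coords[start:start + m])
        if best is None or r < best[1]:
            best = (start, r)
    return best
```

In use, each template in a library would be passed to `scan_structure` together with the Cα coordinates of a target structure, and the lowest rmsd would then be compared against whatever threshold has been validated for that library. Restricting the scan to contiguous windows keeps the sketch short; it does not handle chain breaks or insertions within a motif.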
J. M. Thornton¹,*
¹EMBL-European Bioinformatics Institute