Issue: February 2006, No. 4 (pp. 357-484), ISSN 0739-110
Open Access
Protein Structure Evaluation using an All-Atom Energy Based Empirical Scoring Function (pp. 385-406)
Arriving at the native conformation of a polypeptide chain, characterized by the lowest free energy, is a long-standing problem in protein structure prediction. Because free energy estimates are computationally demanding, scoring functions (energy based or statistical) have received considerable renewed attention in recent years for distinguishing native structures of proteins from non-native structures. Several carefully designed decoy sets, CASP (Critical Assessment of Techniques for Protein Structure Prediction) structures, and web-accessible homology-based three-dimensional model builders are now available for validating scoring functions. We describe here an all-atom energy based empirical scoring function and examine its performance on a wide range of publicly available decoys. Except for two protein sequences where the native structure is ranked second and seventh, the native is identified as the lowest energy structure in 67 protein sequences from among 61,659 decoys belonging to 12 different decoy sets. We further illustrate a potential application of the scoring function in bracketing native-like structures of two small mixed alpha/beta globular proteins, starting from sequence and secondary structural information. The scoring function has been web-enabled at www.scfbio-iitd.res.in/utility/proteomics/energy.jsp
Key words: Biomolecular modeling; Decoys; Physics-based potentials; Protein tertiary structure; Scoring function.
Pooja Narang, Department of Chemistry
Introduction
Generating the three-dimensional structure of a polypeptide chain that resides at the global minimum of the free energy surface is the central theme of most computational protocols for protein folding. Free energy calculations (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) of macromolecular systems in aqueous media are generally computationally intensive. Molecular simulations of proteins that aim to capture the lowest free energy conformation remain intractable at present. A viable option is therefore to generate a multitude of representative structures for a given molecular/biomolecular system. A rapid assay then locates the most preferred conformation under the prescribed external constraints, on the premise that the native structure has the lowest free energy. All-atom energy based empirical scoring functions are a preferred choice in this regard because they capture the physics of the problem while keeping the protocol computationally simple. The need to separate native-like conformations from non-native ones in protein folding attempts (12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39) has prompted intense efforts to devise newer, better, and more efficient force fields (40, 41, 42, 43, 44, 45, 46) and empirical scoring functions (47, 48, 49, 50). Balancing the simplicity and speed of the computational protocol with accuracy remains a challenge. We report here an investigation of an all-atom energy based empirical scoring function for protein structure evaluation and examine its ability to distinguish the native from a series of carefully designed and extensively studied decoy sets. We also illustrate a computational pathway for bracketing native-like structures of small alpha/beta proteins as a potential application of the scoring function.
Background
The empirical energy functions proposed for evaluating the energy of protein molecules fall into three categories: knowledge/statistics based (51, 52, 53, 54, 55, 56), physics based (57), and hybrids of both (58). Table I compiles the scoring functions proposed in the literature and the protein decoy sets prepared and studied for their validation. Statistically derived energy functions are parameterized on experimentally known structures and generally use reduced representations of the protein molecule. Park and Levitt applied such energy functions to decoys and found that a combination of energy terms distinguishes the native structure from the decoys better than any single energy component (59). A knowledge based energy function proposed by Bahar and Jernigan calculates residue-specific potentials for side chain-side chain and side chain-backbone interactions (60). These potentials were shown to discriminate the correct conformations in reverse folding experiments. Fain et al. proposed a hydrophobicity based potential to distinguish correct from incorrect folds (61) (Table I). The statistically derived scoring function introduced by McConkey et al. distinguishes the native from the decoy structures with a contact based potential (62) (Table I). That potential is built from the contact preferences of residue-specific atom types in proteins. Huang et al. proposed a hydrophobic fitness function and tested it on decoy structures generated by molecular dynamics simulations at 298 K and 498 K for five small monomeric proteins (63). For two of the five proteins, the function discriminated the native from the decoy structures. Huang et al. proposed a knowledge based scoring function combining all-atom distances, a residue pair potential, and hydrophobicity to rank structures generated by a distance geometry approach (64).
In five of the 11 cases studied, the best-ranking fold was found to be near native. An energy function, ProVal (Protein Validate), proposed by Berglund et al., uses variables describing 12 different aspects of protein structure (65) (Table I). It distinguished the native structure from a group of decoys in 18 of 28 sequences. Empirical free energy based scoring functions that combine molecular mechanics with solvation and entropic terms have been studied in detail. These studies characterize how well individual energy terms, and their combinations, discriminate between the native and decoys. Lazaridis and Karplus proposed an effective energy function that combines the CHARMM 19 (Chemistry at HARvard Molecular Mechanics) vacuum potential with a Gaussian solvation model (66) (Table I). It distinguished the native from decoy structures with an overall accuracy of 97%. Petrey and Honig used the CHARMM 19 potential with a continuum solvent model to discriminate the native from decoy structures (67) (Table I). Thereafter, an effective energy function was proposed as the sum of the buried hydrophobic surface area and a Coulomb energy term. This sum correctly identified the native in 51 of the 57 protein sequences studied. Gatchell et al. defined two criteria as quantitative measures of how well near-native structures are discriminated from misfolded decoys (68) (Table I). The criteria are the correlation coefficient between RMSD (root mean square deviation) and free energy, and the MDS (minimum discriminatory slope). Lee and Kollman used the MM-PBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) free energy (69) (Table I). This free energy is the sum of an internal energy based on the AMBER (Assisted Model Building with Energy Refinement) force field and a solvation free energy from finite difference PB (Poisson-Boltzmann) calculations.
After single-point energy minimization, the native was distinguishable in nine of the ten X-ray structures and in none of the five NMR structures. After a short 150 ps molecular dynamics simulation, however, all the native structures were predicted to be lowest in free energy. Dominy and Brooks proposed a CHARMM gas phase implicit hydrogen force field combined with a GB (Generalized Born) implicit solvation term as an optional criterion to distinguish the native from decoy structures (70) (Table I). Felts et al. used an effective energy function based on the OPLS (Optimized Potentials for Liquid Simulations) all-atom force field and a GB model for solvent electrostatic effects (71) (Table I). The function discriminated the native from the decoys in 17 of the 19 proteins studied. Zhu et al. proposed an energy function based on the GROMOS96 (GROningen MOlecular Simulation) force field combined with a GB solvation model (72) (Table I). Tsai et al. evaluated different scoring functions for recognizing near-native structures in the modified Rosetta decoy set (73). Based on that analysis, they developed a combined scoring function that performed better over a variety of folds. Lee and Duan tested a new energy function using parameters from the AMBER force field on eight decoy sets (74) (Table I). Its overall accuracy in discriminating the native structure from the decoys was 81%. Hsieh et al. proposed a scoring function combining the AMBER force field for intramolecular interactions with a PB model for solvation. Applied to decoys, it discriminated the native in almost all of the six decoy sets studied (75).
Empirical potential functions and protocols have progressed steadily over the last few years in identifying the native as the lowest energy structure. Even so, designing an energy function that discriminates the native from the decoys in every case, rapidly and accurately, remains a challenge. Here, we present an extensive study of an all-atom empirical energy function on a wide variety of decoy sets. The function combines second generation force field parameters with a hydrophobicity function. It effectively discriminates the native structure from the decoys in nearly all cases. We have also tested the energy function on homology-built models to assess its general applicability. As a further extension of the work, we present an illustrative application of the scoring function in predicting/bracketing native-like structures of two small mixed alpha/beta globular proteins.
Theory and Methodology
The scoring function considers the non-bonded energy of a protein, expressed as a sum of three energy terms: electrostatic, van der Waals, and hydrophobic (76).
$$E = \sum_{i<j} \left( E^{el}_{ij} + E^{vdw}_{ij} + E^{hpb}_{ij} \right) \qquad [1]$$
Here $E^{el}$ is the electrostatic contribution to the energy, $E^{vdw}$ is the van der Waals term, and $E^{hpb}$ is the hydrophobic contribution. The summation in Eq. [1] runs over all pairs of atoms $i$, $j$ of the protein. The electrostatic contribution to the interaction energy between atoms $i$ and $j$ is computed as
$$E^{el}_{ij} = \frac{q_i q_j}{D(r_{ij})\, r_{ij}} \qquad [2]$$
where $q_i$ and $q_j$ are the partial atomic charges of the two interacting atoms, taken from the AMBER force field (40), and $r_{ij}$ is the distance between atoms $i$ and $j$. $D(r)$ is a distance-dependent dielectric function (76-80), taken as the sigmoidal function
$$D(r) = D - \frac{D - D_i}{2}\left[\alpha^2 + 2\alpha + 2\right] e^{-\alpha}, \qquad \alpha = s r$$
with $D = 78$, $D_i = 4$, and $s = 0.395$. The van der Waals interactions were modeled using a (12, 6) Lennard-Jones potential between the atoms of the protein.
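To make the pairwise sum concrete, the following minimal NumPy sketch evaluates the electrostatic term (Eq. [2]) with the sigmoidal dielectric, together with the (12, 6) van der Waals term. It rests on several assumptions:

* Charges are in units of e and distances in Å.
* The conversion factor 332.0636 kcal·Å/(mol·e²) is applied so that energies come out in kcal/mol.
* The function names are hypothetical.
* Pair coefficients are geometric means of per-atom C12/C6 values, as stated for Eq. [3].

Bonded (1-2, 1-3) pairs would need to be excluded in a production code, but that exclusion is deliberately omitted here. The hydrophobic term is also left out.

```python
import numpy as np

# Sigmoidal dielectric parameters quoted in the text (D, D_i, s).
D_BULK = 78.0
D_INNER = 4.0
S = 0.395
# Assumption: charges in e, distances in Angstrom, energies in kcal/mol.
COULOMB_KCAL = 332.0636


def sigmoidal_dielectric(r):
    """D(r) = D - (D - D_i)/2 * (alpha**2 + 2*alpha + 2) * exp(-alpha), alpha = s*r."""
    alpha = S * np.asarray(r, dtype=float)
    return D_BULK - 0.5 * (D_BULK - D_INNER) * (alpha**2 + 2.0 * alpha + 2.0) * np.exp(-alpha)


def pair_energies(coords, charges, c12, c6):
    """Return (E_el, E_vdw) summed over all atom pairs i < j.

    c12 and c6 are per-atom (12, 6) parameters; pair coefficients are their
    geometric means. Bonded exclusions (1-2, 1-3 pairs) are NOT handled here.
    """
    coords = np.asarray(coords, dtype=float)
    q = np.asarray(charges, dtype=float)
    a12 = np.asarray(c12, dtype=float)
    a6 = np.asarray(c6, dtype=float)
    # Upper-triangle index pairs enumerate each i < j pair exactly once.
    i, j = np.triu_indices(len(coords), k=1)
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    e_el = COULOMB_KCAL * q[i] * q[j] / (sigmoidal_dielectric(r) * r)
    e_vdw = np.sqrt(a12[i] * a12[j]) / r**12 - np.sqrt(a6[i] * a6[j]) / r**6
    return float(e_el.sum()), float(e_vdw.sum())
```

The dielectric rises from $D_i = 4$ at contact to the bulk value of 78 at long range. For example, `pair_energies([[0, 0, 0], [3, 0, 0]], [1.0, -1.0], [0.0, 0.0], [0.0, 0.0])` returns an attractive (negative) electrostatic energy and zero van der Waals energy.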
$$E^{vdw}_{ij} = \frac{C^{12}_{ij}}{r_{ij}^{12}} - \frac{C^{6}_{ij}}{r_{ij}^{6}} \qquad [3]$$
$C^{12}_{ij}$ and $C^{6}_{ij}$ are the geometric means of the individual (12, 6) parameters. These are derived from the per-atom parameters $\varepsilon_i$ and $R^*_i$. Here $\varepsilon_i$ is the well-depth parameter and $R^*_i$ is half the separation at the potential minimum ($\sigma_{ii} = 2^{-1/6} R^*_{ii}$ and $R^*_{ii} = 2R^*_i$; equivalently, $\sigma_{ii} = 2^{5/6} R^*_i$). The $R^*_i$ values are adopted from the AMBER force field parameters for amino acids (40). The (12, 6) parameters are then obtained as $C^{12}_{ii} = 4\varepsilon_i \sigma_{ii}^{12}$ and $C^{6}_{ii} = 4\varepsilon_i \sigma_{ii}^{6}$, with $C^{12}_{ij} = \sqrt{C^{12}_{ii} C^{12}_{jj}}$ and $C^{6}_{ij} = \sqrt{C^{6}_{ii} C^{6}_{jj}}$.
The hydrophobic interactions are captured via the Gurney parameter approach (81, 82, 83, 84, 85, 86), which provides a computationally simple means of treating desolvation (Eq. [4]). Here, $f_{ij}$ are the free energy parameters for desolvation; in the present calculations, $f_{ij}$ is set to 1.0 kcal/mol. $V_{excl}$ is the excluded volume. It is computed from the hydration sphere radii $R_{Hi}$ of the atoms involved and the volume of a water molecule, $V_w = \frac{4}{3}\pi r_w^3$. Here $r_w = 1.575$ Å is the radius of a water molecule as represented by the TIP4P model (87). The hydration sphere radii are evaluated as $R_{Hi} = a r_i$, with $a = 0.769$ and $r_i = 2$ Å. These values were chosen to give -1 kcal/mol for the overlap of the hydration spheres of two methyl groups (88). The energy function (Eqs. [1] to [4]) allows the total non-bonded energy of a protein in an aqueous environment to be evaluated from the Cartesian coordinates of all its atoms. We have previously examined the performance of this scoring function on base pairs (76), alpha helices (77), protein-DNA complexes (89), and the ion atmosphere around DNA (90).
The empirical energy function was tested on several publicly available decoy sets spanning 69 protein sequences, as shown in the last column of Table I. The EMBL set (23 sequences) was obtained from the ProStar website (http://prostar.carb.nist.gov). The following sets were obtained from the Decoys 'R' Us web sites (http://dd.stanford.edu and http://dd.compbio.washington.edu): 4state_reduced (7 sequences), lattice_ssfit (8 sequences), Lmds (10 sequences), fisa (4 sequences), fisa_CASP3 (6 sequences), hg_structal (29 sequences), and semfold (6 sequences). The CASP1 decoys (6 sequences) were obtained from the protein structure prediction website (http://predictioncenter.llnl.gov/download_area/). Rosetta decoys (92 structures) were downloaded from the Baker laboratory web site (http://depts.washington.edu/bakerpg). The CASP5 decoys (67 sequences) were downloaded from the Moult group website (http://moult.carb.nist.gov/). The decoy sets differ in their method of generation. Target proteins from each decoy set were selected only if the corresponding native structure had been determined by crystallography. In all cases, native protein structures containing metal ions or prosthetic groups were omitted. Proteins with a mismatch in the number of atoms between the native and the decoys were also excluded, as were proteins that are fragments or multimers. Table II compiles the total number of proteins in each decoy set, the number considered in the present study, and the total number of decoys investigated from each set. In all, 69 protein sequences and 61,974 decoy structures were evaluated.
To validate the energy function further, we built homology models for eight known proteins. A good scoring function is expected to differentiate between the native and homology models constructed in the absence of close homologs. Sequence similarity searches were performed using PSI-BLAST on the NCBI website (www.ncbi.nlm.nih.gov). Templates with more than 60% sequence similarity were excluded so that models were built only from distantly related proteins. With each selected template, models were built using SWISS-MODEL (91, 92, 93), ESyPred3D (94), and 3DJIGSAW (95). The CPHmodels (96) server does not provide an option for template selection, so it generated a single structural model per sequence.
Structures generated with the different modeling programs for the eight protein sequences are listed in Table III.
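The target and template selection rules described above can be expressed as simple filters. This is only a sketch. The `Target` record, its field names, and the `(template_id, percent_similarity)` input format are hypothetical, and parsing of PSI-BLAST output is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical record describing one decoy-set target (names are illustrative)."""
    name: str
    method: str                      # experimental method of the native, e.g. "X-RAY" or "NMR"
    has_metal_or_prosthetic: bool
    is_fragment_or_multimer: bool
    native_atom_count: int
    decoy_atom_counts: list = field(default_factory=list)


def keep_target(t: Target) -> bool:
    """Apply the selection criteria described in the text."""
    return (
        t.method == "X-RAY"
        and not t.has_metal_or_prosthetic
        and not t.is_fragment_or_multimer
        # every decoy must have the same number of atoms as the native
        and all(n == t.native_atom_count for n in t.decoy_atom_counts)
    )


def distant_templates(hits, max_similarity=60.0):
    """Drop templates with more than `max_similarity` percent similarity.

    `hits` is a list of (template_id, percent_similarity) pairs.
    """
    return [tid for tid, sim in hits if sim <= max_similarity]
```

For instance, `distant_templates([("a", 75.0), ("b", 60.0), ("c", 35.5)])` keeps only `"b"` and `"c"`.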
Figure 1 shows a flowchart of the methodology used to rank the decoy structures by energy. All structures, including the native structures, were parameterized and hydrogen atoms were added. The hydrogens alone were then minimized with AMBER (97) for 500 steps [250 steps of SD (steepest descent) + 250 steps of CG (conjugate gradient)]. This was followed by 150 steps (50 steps SD + 100 steps CG) of all-atom minimization. After this initial parameterization and geometry relaxation, energies were calculated with the scoring function and the structures were ranked relative to the native structure.
Figure 1: Flowchart of the protocol followed for determining the energy rank of decoys vis-à-vis the native structure.
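Once every structure has been relaxed and scored, the last step of the flowchart places the native at the origin, as in Figure 2, and reads off its rank. That step reduces to a couple of one-liners. In this hypothetical sketch, ties are resolved in the native's favor, a convention the text does not specify.

```python
def relative_energies(native_energy, decoy_energies):
    """Energies of decoys relative to the native (the native sits at 0, as in Figure 2)."""
    return [e - native_energy for e in decoy_energies]


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native when native and decoys are sorted by energy.

    Rank 1 means no decoy is strictly lower in energy; ties count in the
    native's favour (an assumption made for this sketch).
    """
    return 1 + sum(1 for e in decoy_energies if e < native_energy)
```

For example, with a native at -100 kcal/mol and decoys at -90, -120, and -80 kcal/mol, `native_rank` reports the native as second.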
Additionally, the scoring function is used to select native-like structures for two small alpha/beta proteins. We have previously proposed a computational pathway for bracketing native-like structures of proteins starting from their sequence and secondary structure information. We demonstrated its viability on twelve alpha helical globular proteins comprising three to four helices (98). Here we consider two mixed alpha/beta proteins, 1FME and 1BHI. Starting from the sequence and secondary structural information, an initial structure containing the coordinates of all atoms is built. This uses main chain Ramachandran angles for the helix, sheet, and loop regions derived from database analysis. The values for the helix and loop regions have been reported earlier (98). For the sheet region, the analysis was performed separately for parallel and antiparallel strands on all available globular proteins (∼28,000) from the PDB. Average values for the twenty amino acids are given in Table IV(a). Because the standard deviations of these averages are large, frequency distributions were generated and the most frequently occurring values were used (Table IV(b)). Trial structures are generated by dihedral sampling. Four dihedrals are selected from each loop region, and each is rotated to the g+, g-, trans, and eclipsed conformations. Each loop therefore contributes 4^4 = 256 combinations. For each protein, 256^(n-1) structures are generated, where n is the number of secondary structural elements (so there are n - 1 loops). Each of the two proteins has one helix and two strands (n = 3), giving a total of 256^2 = 65,536 structures. The generated structures are then passed through filters, and close contacts are removed by a Monte Carlo procedure. The structures are further energy minimized using the AMBER suite of programs (97). These two steps relax the structures by removing strain arising from intramolecular repulsions created during structure generation.
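The structure count 256^(n-1) follows from four states for each of four dihedrals per loop and n - 1 loops. The sketch below enumerates the combinations with `itertools.product`. The numerical torsion values assigned to the g+, g-, trans, and eclipsed states are assumptions for illustration, not values taken from the paper.

```python
from itertools import product

# Hypothetical torsion values (degrees) for the four states named in the text;
# the exact angles used by the authors are not given here.
STATES = {"g+": 60.0, "g-": -60.0, "trans": 180.0, "eclipsed": 0.0}
DIHEDRALS_PER_LOOP = 4


def n_trial_structures(n_elements):
    """Number of trial structures: 4 states ** (4 dihedrals * (n - 1) loops) = 256 ** (n - 1)."""
    return len(STATES) ** (DIHEDRALS_PER_LOOP * (n_elements - 1))


def trial_dihedral_sets(n_elements):
    """Yield every combination of state angles for the 4 * (n - 1) loop dihedrals."""
    n_dihedrals = DIHEDRALS_PER_LOOP * (n_elements - 1)
    yield from product(STATES.values(), repeat=n_dihedrals)
```

For one helix and two strands (n = 3), `n_trial_structures(3)` gives 65,536, matching the count in the text.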
The optimized structures are then ranked with the scoring function, and the 100 lowest energy structures are selected. These are further optimized using AMBER (97). Distance constraints based on the secondary structural information are applied during minimization to facilitate formation of hydrogen bonds between the strands. Results on the decoys, the homology-built models, and the tertiary structures generated for the small proteins are presented below.
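Two steps can be sketched with `heapq.nsmallest`: selecting the 100 lowest energy structures, and checking how close the best of the top 50 comes to the native (the bracketing criterion reported in Table V). The `(energy, rmsd, label)` tuple layout and the function names are assumptions made for this illustration.

```python
import heapq


def lowest_energy(structures, k=100):
    """Return the k lowest-energy entries from (energy, rmsd, label) tuples."""
    return heapq.nsmallest(k, structures, key=lambda s: s[0])


def best_rmsd_among_lowest(structures, k=50):
    """Smallest RMSD to the native among the k lowest-energy structures.

    Only meaningful when the native is known (e.g. in benchmarking).
    """
    return min(s[1] for s in lowest_energy(structures, k))
```

`heapq.nsmallest` avoids sorting the whole ensemble, which helps when tens of thousands of trial structures are scored.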
Results
Figure 2(a) to 2(l) shows the performance of the scoring function on the 12 decoy sets investigated here, as plots of relative energy versus RMSD with respect to the native. Each plot shows the energies of all decoys in a particular set relative to their respective native structures, which are placed at the origin. Figure 2(a) shows the energies of ten decoy structures in set A (EMBL decoy set) (99) (Table I) relative to their seven respective natives, as a function of backbone RMSD in Cartesian space. For each of the seven proteins, the native is identified as the lowest energy structure. The decoys in set B (CASP1 decoy set) (Table I) are a collection of proposed structures for a particular target sequence. Figure 2(b) shows relative energy versus RMSD for 12 decoy structures corresponding to two proteins; the native was the lowest energy structure in both cases. Decoy set C (4state_reduced decoy set) (59) (Table I) has been used extensively for validating various energy functions. Figure 2(c) shows relative energy versus backbone RMSD for this set, comprising 3,326 decoys for five protein sequences. The native structure is the lowest in energy for each of the five proteins. Figure 2(d) shows the plot for decoy set D (Lattice_ssfit decoy set) (100) (Table I), consisting of four protein sequences and 8,000 decoys. The native is recovered as the lowest energy structure in each case. Figure 2(e) shows the plot for decoy set E (Lmds decoy set) (101) (Table I), consisting of six sequences and 2,634 structures. The native is the lowest energy structure in each of the six cases.
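The text reports backbone RMSD in Cartesian space but does not state how the structures were superimposed. The sketch below assumes an optimal least-squares superposition, using the Kabsch algorithm, which is a swapped-in choice named here for illustration. It computes the RMSD between matched backbone coordinates; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))
```

A rigidly rotated and translated copy of a structure should give an RMSD of essentially zero, which is a quick sanity check before computing decoy RMSDs.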
With the energy based scoring function, the native was lowest in energy for both proteins of set F (fisa decoy set) (102) (Table I), which comprises two proteins and 1,000 decoy structures, as shown in Figure 2(f). The energy function was applied to the three proteins of decoy set G (fisa_casp3 decoy set) (102) (Table I), comprising 3,098 decoy structures. It ranked the native lowest in each case, as Figure 2(g) shows. Figure 2(h) shows the plot for decoy set H (hg_structal decoy set) (Samudrala et al., unpublished work, http://dd.stanford.edu) (Table I), consisting of two proteins and 58 decoys. For both proteins, the native structure was distinguished as the lowest energy structure.
Figure 2: Relative energy versus RMSD (Å) plots for six decoy sets: (a) EMBL, (b) CASP1, (c) four-state reduced, (d) Lattice_ssfit, (e) Lmds, and (f) Fisa.
Figure 2 (cont.): Relative energy versus RMSD (Å) plots for (g) Fisa_CASP3, (h) Hg_structal, (i) Semfold, (j) Rosetta, (k) CASP5, and (l) the homology model built set. All native structures in each decoy set are made to coincide at the origin and are represented as a black triangle.
Figure 2(i) shows the plot for decoy set I (Semfold decoy set) (103) (Table I), consisting of two proteins and 22,667 decoys. The energy function examined here achieves complete discrimination between the native structure and the decoys. Figure 2(j) shows the plot for decoy set J (Rosetta decoy set) (104) (Table I), comprising 20,484 decoys corresponding to 21 proteins. The native structure is the lowest energy structure in each of the 21 cases. The CASP5 experiment had 67 targets, and the number of predictions for each target varied from 399 to 535.
For these proteins, all-atom tertiary structure models were selected for the present study. Of the 67 proteins, 16 did not have experimentally determined structures, seven were NMR structures, five were multimeric, and 25 contained missing residues, metal ions, or prosthetic groups. Of the remaining 14 proteins, seven had sequence mismatches between the native and the decoys. The remaining seven proteins (Table I), along with 614 decoys, were minimized and ranked energetically. The native was distinguished in five of the seven proteins, as shown in Figure 2(k). For the remaining two proteins, the native was ranked 2nd and 7th (Appendix I, protein sets T0150 and T0177). The energy function was also applied to the homology models in set O (Table I). These were constructed with four widely available programs: CPHmodels, Swiss-Model, 3DJigSaw, and EsyPred3D. The plot of relative energy versus RMSD is shown in Figure 2(l). The native was favored in all cases except for the models constructed by the CPHmodels server (not shown in the figure). Because that server does not allow template selection, its models were generated from the sequence with maximum similarity, which could include the native itself. The success achieved with models generated by Swiss-Model, 3DJigSaw, and EsyPred3D indicates the utility of the energy function in comparative modeling studies. Two mixed alpha/beta proteins demonstrate the applicability of the energy function to selecting native-like structures from a large ensemble of conformations. According to the scoring function, the native was lowest in energy in both cases. In both cases, native-like structures within 3 to 5 Å RMSD are bracketed by the 50 lowest energy structures (Table V, Fig. 3).
Figure 3: The lowest RMSD structure emerging from the proposed computational pathway for structure prediction, superimposed on the corresponding native structure for the two alpha/beta proteins: (a) 1fme; (b) 1bhi. The native is shown in black and the native-like structure in grey.
Discussion
The all-atom empirical energy based scoring function described and examined here identifies the native structure as the energy minimum in eleven of the twelve sets considered. These sets comprise 69 different protein sequences and 61,974 decoys, and the result holds irrespective of how the decoys were generated. Even in the twelfth set (CASP5), the native is ranked lowest in energy for five of the seven sequences; in the other two it is ranked 2nd and 7th. Compared with other energy based scoring functions, the scoring function showed a high degree of accuracy on a large number of decoy sets. Also of particular interest are the attempts to predict native-like structures of two small alpha/beta globular proteins. There, the scoring function captures structures within 3-5 Å of the native among the 50 lowest energy structures. The success of a scoring function in folding and binding studies depends on two things. The first is how well each energy component (van der Waals, electrostatics, hydrophobicity, and so on) reflects the corresponding interactions. The second is whether a proper balance between the components is achieved. To probe this further, we conducted an energy component analysis on all the protein sets where the native is ranked lowest in energy. The electrostatic term alone distinguished the native structure for all but six proteins (i.e., 61 of 67), the exceptions being: 5icb (4, Rosetta); 1bl0 (3, Fisa_casp3); 1hg7 (7, Homology built set); T0154 (2, CASP5); T0156 (2, CASP5); and T0183 (8, CASP5).
The values in parentheses following each PDB code give the energy rank of the native structure and the decoy set. This consistency can be attributed to a more favorable charge distribution in the native structure compared with the decoys. Hydrogen bonding interactions, which contribute significantly to the stability of the native structure, are included in the electrostatic term. The van der Waals component, on the other hand, identifies the native as the lowest energy structure with nine exceptions: 2cro (2, four-state); 1ris (2, Rosetta); 1utg (3, Rosetta); 1vls (2, Rosetta); 1pch (3, CASP1); 2cro (157, Fisa); T0149 (4, CASP5); 1fsf (2, Homology built set); and 1hg7 (2, Homology built set). The van der Waals term's preference for the native can be attributed to better packing of the backbone and side chains in the native protein relative to the decoys. The hydrophobic component, however, distinguishes the native from the decoys for only eight protein sequences: 2cro, 4pti (Lmds); 1rn3, 2i1b, 5pad (EMBL); 1col-A (hg_structal); 1hg7 and 1hzt (Homology built set). This result was surprising at first sight, since hydrophobic interactions are known to be a dominant force in the folding and stabilization of protein structures. However, the CASP meetings have made it clear that no single force dominates protein folding; what matters is the balance of forces. This analysis led us to test whether the sum of any two components could distinguish the native structure. The sum of the electrostatic and van der Waals terms discriminated the native from the decoys in 67 of the 69 protein sequences, the same as the complete function. Overall, the component-wise analysis indicates more favorable intramolecular electrostatics and packing in the native than in the decoys.
The dominance of electrostatics and packing over the hydrophobic term may reflect a property of the decoy structures examined. It may also indicate a need to improve the hydrophobic term. Based on the energy component analysis, we also investigated the two cases where the energy function did not rank the native structure lowest in energy. The probable reason is the absence of solvent during generation of the decoys. Appendix I provides the component-wise energy analysis for the 50 lowest energy structures of all seven proteins in the CASP5 decoy set. In protein set T0150 of the CASP5 decoy set, the native is ranked second. There, the electrostatic component of the more stable decoy is lower than that of the native, and the van der Waals and hydrophobic components do not fully compensate. Detailed inspection of the side chains of the two structures showed a salt bridge between lysine (residue 17) and aspartate (residue 89) in the decoy structure, making it more stable. Resolving the native from among the best 50 structures in structure prediction attempts may therefore require explicit solvent in the computational protocol. In the other case (protein set T0177 of the CASP5 set), the native is ranked seventh. The component-wise energy analysis indicates that the van der Waals component is the key stabilizing factor. All six lower-energy decoys have more favorable van der Waals interactions than the native. Structural inspection indicates better side chain packing between residues at the interfaces. Here again, in the presence of solvent, the van der Waals desolvation cost would probably have made the decoys less stable than the native. Thus the two exceptions among the 69 protein sequences appear attributable to the lack of explicit solvent. Structure prediction can handle such cases by considering a few of the lowest energy structures rather than only the lowest one. Those structures can then be processed further with explicit solvent.
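The component analysis above amounts to re-ranking the native using one, two, or all three energy terms. The following minimal sketch assumes each structure's components are stored in a dict with the hypothetical keys `el`, `vdw`, and `hpb`. As in the ranking step, a decoy counts against the native only if it is strictly lower in energy.

```python
from itertools import combinations

COMPONENTS = ("el", "vdw", "hpb")


def rank_with_terms(native, decoys, terms):
    """1-based native rank using only the chosen energy components."""
    e_native = sum(native[t] for t in terms)
    return 1 + sum(1 for d in decoys if sum(d[t] for t in terms) < e_native)


def component_ranks(native, decoys):
    """Native rank for each single component, each pair, and the full sum."""
    ranks = {}
    for k in range(1, len(COMPONENTS) + 1):
        for terms in combinations(COMPONENTS, k):
            ranks["+".join(terms)] = rank_with_terms(native, decoys, terms)
    return ranks
```

The result has seven entries: three single terms, three pairs, and the full function. A native ranked 1 under `"el+vdw"` but not under `"hpb"` reproduces the pattern discussed in the text.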
Empirical energy functions have both merits and limitations relative to free energy methods. Consider a folded native structure compared with open/unfolded structures and random coils. One would intuitively expect the intramolecular interactions, mainly electrostatic, van der Waals, and hydrophobic, to favor the former. The electrostatic and van der Waals components of solvation, as well as chain entropy, should favor the latter. Thus, if a scoring function captures the intramolecular interactions and hydrophobicity correctly, it should always prefer native and native-like structures, mimicking free energy trends albeit qualitatively. However, limitations surface when one attempts to pin down the native. The energy versus RMSD plots (Fig. 2) of the protein sets studied show that many structures spread across the abscissa have similar energies; the points do not fall along a diagonal. It is desirable to obtain a correlation between energy and RMSD, or any other reaction coordinate measuring the distance from the native. The present performance may suggest that decoys are not true folding intermediates on a smooth funnel. Wang et al. proposed a decoy-dependent discriminatory function called self-RAPDF (Table I), which correlates with Cα RMSD and selects better near-native conformations for 62 of 83 decoy sets (105). Work is in progress in our laboratory to fine-tune the relative weights of the electrostatic, van der Waals, and hydrophobicity terms in the all-atom energy based empirical scoring function. The aim is a better correlation between energy and a distance metric from the native.
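The desired energy-RMSD correlation can be quantified with a Pearson coefficient, for instance per protein set. The function name is hypothetical, and the choice of Pearson over a rank correlation is an assumption.

```python
import numpy as np


def energy_rmsd_correlation(rel_energies, rmsds):
    """Pearson correlation between relative energy and RMSD from the native."""
    e = np.asarray(rel_energies, dtype=float)
    r = np.asarray(rmsds, dtype=float)
    return float(np.corrcoef(e, r)[0, 1])
```

A value near +1 would indicate the funnel-like behaviour the text hopes for, with energy rising steadily away from the native. A value near 0 corresponds to the scattered plots of Figure 2. `scipy.stats.spearmanr` offers a rank-based alternative that is less sensitive to a few outlying decoys.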
At present, the scoring function places native-like structures with optimal packing and appropriate hydrogen bonding and side chain interactions among the 50 lowest energy structures, not at the lowest. This is seen in the structure prediction results (Table V). Beyond this point, Boltzmann averaging followed by free energy analysis of plausible candidates appears promising for identifying the native as the structure with the lowest free energy.
Conclusion
An all-atom empirical energy based scoring function has been assessed for protein structure evaluation. It shows a near universal ability to separate the native from near-native and non-native structures. Preliminary studies on two small alpha/beta globular proteins show that the scoring function can be used with protein structure prediction methodologies, such as ab initio or comparative modeling, for bracketing native-like structures.
Supporting Information
The scoring function for protein structure evaluation has been web-enabled at www.scfbio-iitd.res.in/utility/proteomics/energy.jsp. There, the user can upload a protein structure in PDB format, and the web server reports the energy broken down by component. All the decoys, parameterized and minimized according to Figure 1, are also available at the website. Programs for evaluating the energies of the native and decoy protein structures are available on request from the authors.
Acknowledgements
Funding from the Department of Biotechnology is gratefully acknowledged. Ms. Bhushan is a recipient of a Senior Research Fellowship from the Council of Scientific and Industrial Research. The authors also thank Mr. Vidhu S. Pandey and Mr. Anuj Gupta for their help in web-enabling the programs as a utility.
Appendix I
References and Footnotes