Issue February 2010No. 4 (p 399-572) February 2010 ISSN 0739-110
Open Access Binding Sites of the E. Coli DNA Recombinase Protein to the ssdna: a Computational Study (407-427)Homologous recombination (HR) is a major pathway for the repair of double strand breaks, which is of primary importance for genomic stability and survival of all organisms. The crucial step in HR is the formation of the nucleoprotein filament composed by a single stranded DNA chain surrounded by the recombinases protein. The filament leads the search for an undamaged homologue, a sister-chromatid or homologous chromosome, as a template for the repair process. The theoretical study presented in this work is aimed to increase the understanding of the mechanism of interaction between a trimer of the recombinase enzyme in the Escherichia coli, EcRecA, with a model system of single stranded DNA, dT9. The molecular dynamics (MD) calculations were performed using the Amber 10 computer code. The binding free energies are calculated with the Molecular Mechanics (MM) Poisson-Boltzmann (PB) and Generalized Born (GB) solvent accessible surface area (SA), MM-PB(GB)SA model. The study is extended by the use of the Principal Component Analysis (PCA) to reduce the dimensionality of the conformational space. Several mutants are considered and the corresponding calculated binding free energies are compared. An experimental correlation between the binding free energy and the enzyme activity is absent in the literature, however our results confirm how the wild type RecA has a higher binding affinity compared to the mutants examined. Moreover, for all cases considered, a significant contribution of the binding free energy is due to one amino-acid, R196, located in the binding loop 2 of the enzyme. This is consistent with the complete loss of the enzyme activity for any mutation on that point.
Claudio Carra1 1Universities Space
The dominant role of DNA repair process is underlined by the fact that defects in some components of the pathways often lead to developmental abnormalities, accelerated rates of tissue and organ degeneration, and an increased incidence of cancer (1-3). Thus, an understanding of DNA repair pathways is not only of considerable academic interest but it can also have an impact on our comprehension of human disease, and it may help to understand how these pathologies can be treated or diagnosed more effectively.
Among the possible DNA lesions due to endogenous sources such as free radicals generated by essential metabolic processes or environmental agents like ultraviolet lights (UV), chemical agents, and ionizing radiation, the double strand break (DSB) is particularly genotoxic (4). DSB repair is also of importance in human space exploration as recently outlined by Durante and Cucinotta (5). Since the lesions occur in both DNA strands, the complementary DNA strand cannot be used as a template for the repair. As a consequence, DSB is a strong inducer of mutation and chromosomal aberrations. To combat these deleterious effects, multiple pathways have evolved for the DSB repair. Our interest is on the homologous recombination (HR) because it is the pathway which keeps the genomic integrity by using an undamaged homologue, a sister-chromatid or homologous chromosome, as a template for the repair process (6). While it was traditionally considered as a means for generating genetic diversity, it is now known to be essential for restart of collapsed replication forks that have met a lesion on the DNA template (7). The central stage of this process requires the presence of the DNA recombinase proteins, RecA (8) in bacteria, RadA in archaea (9), and Rad51 in eukaryotes (10), which leads to an ATP-mediated DNA strand-exchange process. A particularly efficient model to describe the molecular mechanism of homologous recombination has been the three-strand exchange reaction between linear double stranded DNA, dsDNA, and circular single-stranded DNA, ssDNA, promoted by the recombinases family of proteins. Detailed biochemical and structural studies have shown that the process proceeds in two distinguishable phases. The first step consists in the presynaptic polymerization of RecA protein on ssDNA forming a helical nucleoprotein filament assembled in the 5’-3’ direction (11). The second is the synaptic homologous alignment of nucleoprotein filament of RecA-ssDNA with a naked dsDNA filament. Finally, an unidirectional strand exchange, driven by ATP hydrolysis, leads to a stretching of heteroduplex DNA or to dissociation of the RecA. In the RecA filament, the monomers are packed head-to-tail with an extensive monomer-monomer chain. Each monomer is formed by three domains, identified for the first time by Story et al., (12). The residues 1–33, N domain, form a α-helix and β-strand that pack against the neighboring subunit, including most of the monomer interface. The core domain, located in the central part of the enzyme, residues 34–268, is formed by eight stranded mixed α-helices and β-sheets and contains the ATP binding site. The position of the loops involved in the interaction with the ssDNA in the presynaptic filament was very elusive, and only recently the X-ray structure was solved by Chen et al., (13). The final part of the protein is represented by the C-terminal domain, residues 269-352, which is rather far from the central body of the enzyme and it is involved in the formation of the synaptic filament (14, 15). It has been observed that ATP increases the affinity of RecA to ssDNA whereas ADP destabilize the binding, but the reason for this behavior are still unknown (16). A peculiar and intriguing characteristic of the L1 and L2 domains is their high degree of conservation in prokaryotic cells. In a sequence of 64 bacteria the RecAs has 11 of the residues from 193 to 212 almost identical, and 6 are highly conserved, thus 17 over 20 residues are either identical or chemical conserved (17, 18). On the other end, the eukaryotic homologues Rad51 and Dmc1 are considered to have an analogue function to the RecA. They share a strong homology sequence, in particular within the 230 amino acids in the core of the structure (19). Even though the loop L2 is in the core of the enzyme, and the residues that attach the loop to the whole enzyme are identical, some of the critical residues in L2 of RecA like Arg196 and Lys198 are absent in the corresponding region of the Rad51 and Dmc1 (20). Rad51, RecA and RadA share a TPBase domain which has a similar structure to the F1 ATPase (21). It has been suggested that part of the activity provided by the missing amino acids can be replaced but other unidentified proteins (22). Because of the high degree of conservation in the family of the recombinase enzymes, we focused our study on the bacterial RecA for which the crystal structure of the binding complex has been determined. To model the interaction between RecA and the ssDNA, the position of the binding domains, L1 and L2 (residues 157–164 and 195–209 respectively) has to be available. Chen et al., (13) by constructing a finite segment composed of 6 monomers, managed to determine the orientation of the loops, and their mechanism of interaction with the ssDNA (pdb id 3cmu). The network of interaction is indeed complex: each nucleotide triplet interacts with three consecutive RecA monomers. The first nucleotide interacts with the closest RecA to the nucleotide triplets (RecA0), and with the next monomer proceeding in direction 5’ of the ssDNA. The second nucleotide binds only with RecA0 and the third nucleotide is bound to RecA0 and with the following monomer in direction 3’ of the filament. A series of hydrogen bonds between the backbone of the nucleotide triplet and the RecA monomers where clearly described, involving Met197, from the first RecA, Asn213 Gly211 Gly212 from the closest RecA monomer and Ser172 and Arg176 with the following RecA monomer. The bases also interact with the RecA monomers. The first base has a van der Waals contact with Met197, Ile199, Gly200 of the first RecA, while the last base is close to Lys198, Ile199 and Thr208 of the third RecA monomer. Also RecA0 is not interacting with any base. Even though Chen et al., (13) provided fundamental insight in the structure of the RecA-p(dT) complex, many questions remain with regards to the activity controlled by a growing network of functional interactions (23–31). The kinetic rates, particularly elusive, were only recently reported by Ha et al., (32) using a single molecule fluorescence assay. Several studies were conducted on mutations of the wild RecA protein to try to depict which is the binding part of the protein to the ssDNA. All data showed that the mutations lead to mild changes in the activity of the enzyme (33), but a strong change in activity is observed by each point mutation in the L2 binding site of the enzyme (22). Such information, however, does not clearly elucidate the binding property of RecA since there is not a clear correlation between binding and enzyme activity of the corresponding mutants. Moreover the data obtained are very sensitive to the nature of the assay used (34, 35). The strong dependence of the experimental results with the techniques used, combined with the complexity and rate of the formation of the nucleoprotein filament (36), challenges the achievement of quantitative theoretical predictions with a method based on all-atom force field. However, a theoretical analysis might elucidate which are the major contributions to the binding site of the protein, which is the flexibility of the complex, and how a mutation might affect the binding property of the enzyme. We aim to model by classical dynamics, the interaction of the trimer of the EcRecA with a dT9 sequence of nucleotides, RecA-T. Our goal is to shed light on the binding mechanism by looking at several aspects of the enzyme interacting with the ssDNA model. The mobility of the complexes is analyzed by comparing the averaged RMSD values as a function of residue number for the RecA-T and mutants. The study is extended to the analysis of the correlated and uncorrelated motions revealed by the cross correlation matrix plotted for our systems. Because of the substantial size of the systems considered, we performed a Principal Component Analysis (PCA) on the molecular dynamics trajectories (37–40). The PCA method extracts the dominant modes in the motion of the molecule from the trajectory generated in the molecular dynamics calculation. This approached is aimed to complement the previous results by providing a more accurate description of the distinct states for the complexes, that is compatible with the inevitable error that a short sampling will imply (41). Our interest, however, is not only in the conformational analysis of the complexes, but also in and estimation of the binding free energy. We adopt the MM-PB(GB)SA protocol, using the Poisson-Boltzmann (PB) and Generalized Born (GB) solvent accessible surface area methods. This approach allowed us to depict the dependence of the binding free energy change due to the mutation and to distinguish which amino acid brings the major contribution to the overall free energy. We estimated the solute binding entropy, by using the quasi-harmonic approximation, restricting the mass weighted covariance matrix to only few atom types, and by employing the harmonic model by considering only the central monomer in the complex. Because of the extension of the time of the simulation, we restricted our study to mutants which showed interesting experimental aspects. The M202Y, according to the work of Camerini-Otero (22) seems having an unchanged reactivity respect the wild type. On the contrary, for any change of the position 196 there is a complete loss of reactivity. We chose the R196K because Lysine is chemically similar to Arginine, thus it is hard to guess the origin of this change in reactivity. Cazaux et al., (33) measured the change of reactivity for several mutants with a significant difference in reactivity was observed in G204S and E207Q (27). Computational Details The geometry of the trimer of EcRecA interacting with the (dT)9 was constructed by considering part of the crystal structure determined by Chen et al., (13) (pdb id 3mcu). The X-Ray structure showed that each nucleotide triplet is bond by three consecutive RecA monomer (ReacA1, RecA2, RecA3) In particular, the first monomer, RecA1, situated in 5´ direction is followed by RecA2 in the central position, proceeded with RecA3 in 3´ direction of the homonucleotide fragment. Because of the size of the system, not all residues were considered in the model. The terminal C was removed for all the models because its role is to recognize the homologue dsDNA once the presinaptic filament is formed (14, 15), thus it is not relevant for the reaction we are considering. The terminal N was already removed in RecA1, to prevent the protein fusion by polymerization (13). The ADP·AlF4 in the crystal structure was replaced by ATP which was put in the same geometry. In the simulation the library from the work of Carlson (42) was used to model the nucleotide. Several Na+ ions were added to the each complex to reach the neutrality of the whole system. The molecular dynamics simulations and the data analysis were carried out with the AMBER 10 (43) package using the parmbsc0 force field (44). Nonbonding interactions were cut off at 10 Å. The (EcRecA-ATP)3 interacting with the (dT)9, RecA-dT, was modeled in a periodic box with a 8 Å buffer of water molecules explicitly described by the TIP3P (45) model. The particle mesh Ewald (PME) method (46) was used to treat the long range electrostatic interactions. To simulate the experimental conditions, the concentration of the salt in the bulk of the solution was set at 50 mM. Computational Details The geometry of the trimer of EcRecA interacting with the (dT)9 was constructed by considering part of the crystal structure determined by Chen et al., (13) (pdb id 3mcu). The X-Ray structure showed that each nucleotide triplet is bond by three consecutive RecA monomer (ReacA1, RecA2, RecA3) In particular, the first monomer, RecA1, situated in 5´ direction is followed by RecA2 in the central position, proceeded with RecA3 in 3´ direction of the homonucleotide fragment. Because of the size of the system, not all residues were considered in the model. The terminal C was removed for all the models because its role is to recognize the homologue dsDNA once the presinaptic filament is formed (14, 15), thus it is not relevant for the reaction we are considering. The terminal N was already removed in RecA1, to prevent the protein fusion by polymerization (13). The ADP·AlF4 in the crystal structure was replaced by ATP which was put in the same geometry. In the simulation the library from the work of Carlson (42) was used to model the nucleotide. Several Na+ ions were added to the each complex to reach the neutrality of the whole system. The molecular dynamics simulations and the data analysis were carried out with the AMBER 10 (43) package using the parmbsc0 force field (44). Nonbonding interactions were cut off at 10 Å. The (EcRecA-ATP)3 interacting with the (dT)9, RecA-dT, was modeled in a periodic box with a 8 Å buffer of water molecules explicitly described by the TIP3P (45) model. The particle mesh Ewald (PME) method (46) was used to treat the long range electrostatic interactions. To simulate the experimental conditions, the concentration of the salt in the bulk of the solution was set at 50 mM. The equilibration of the system was performed as follows. An initial optimization of 1000 steps by steepest descent was performed followed by 3000 steps with the conjugate gradient method. In this first stage, the complex was kept constrained in order to relax the solvent. Then a further optimization of 9000 steps with no constraints on the whole system was carried out to lead to a final relaxed geometry. The first equilibration was carried out with a weak restraint on the complex for 20 ps at constant volume, constantly increasing the temperature from 0 to 300 K. The equilibration continued for 100 ps at a constant pressure of 1 atm, by keeping the temperature constant with the Langevin temperature equilibration scheme (47) using a collision frequency of 1.0 ps-1. Under these conditions the restraints were gradually removed. The production run was carried out without restraints for 20 ns. During the MD calculation, hydrogen stretching motions were removed using SHAKE bond constraints (48), allowing a longer time step of 2 fs without introducing any instability. Free binging energies of the complex were calculated with the Molecular Mechanics Poisson Boltzmann and Generalized Born surface area method, MM-BP(GB)SA, implemented in Amber10. The binding free energy is calculated by taking the average free energy difference between the complex, ![]() ![]() ![]() Where the average free energy, , for the complex, RecA-ATP-Mg2+dT9, receptor, RecA-ATP-Mg2+, and ligand, dT9, is composed by: ![]() where EMM is the molecular mechanics interaction energy, in “gas phase”, within the system, GPB/GB is the component of the electrostatic energy calculated with the Poisson-Boltzmann (PB) (49) or Generalized Born (GB) (50) methods. We used several GB methods to compare the adaptability of different protocols to our systems. We considered the method of Howkins et al., (51), IGB1, the modified GB methods, GBOBC, as in the approach of Onufriev et al., (52), IGB2, a modification of IGB2 (53), IGB5, and the modification of Mongan et al., (54), IGB7. For all the GBOBC approaches, we used the mbondi and the mbondi2 radii definitions, IGB2’, IGB5’ and IGB7’ respectively, whereas in IGB2” we used PBradii, as described in Table II. The δGSA represents the non-polar contribution to the solvation free energy which is determined with a solvent-accessible-surface-area-dependent (SA) (55) approach. The term TΔS(s) is the conformational entropy change of the solute, that will be only estimated in this work accordingly to the method proposed by Orzechowski et al., (56) where only a fragment of the structure was considered to extrapolate the binding entropy of the entire system. We also used a quasi-harmonic method to have an evaluation of the term TΔS(s), but the results were not conclusive (see SI for details). The grid size used to solve the Poisson-Boltzmann equation was 0.5 Å, and the values of interior dielectric constant and exterior dielectric constant were set to 1 and 80, respectively. The gas phase and the solvation free energies were calculated over 400 snapshots taken at 20 ps intervals from the last 8 ns of the MD trajectories. In order to investigate motion between different regions in the protein, as domain-domain communication (57–59), or between ligand and receptor, we calculated the correlation matrix for the binding complex, displayed as a two dimension correlation map. A positive value shows that the atoms are moving in the same direction, whereas a negative value indicates an anti-correlated motion. We used ptraj applet in Amber code, to generate the correlation matrix and Matlab to generate the 3D plots. In a MD calculation the coordinate trajectory matrix has the dimensions of (3N, n) where N is the number of atoms and n the number of snapshots selected for the analysis. The resulting conformational space has a high dimensionality and does not allow a simple description or a visualization analysis. The use of the principal component analysis (PCA), methods that reduces the dimensionality of the conformation space, can allow for the depiction of extreme structures and the major fluctuations of the correlated motions. This method projects the multidimensional conformational space onto a new set of axis which maximizes the variance of the projection along orthogonal directions. Such a projection enables low-dimensional representation of the spatial relationship between conformations. With this approach, the analysis starts by diagonalizing the covariance matrix σm,w whose individual element is: ![]() by projecting the MD trajectory onto the main essential directions, corresponding to the larger eigenvectors, one can visualize the extreme structures and the major fluctuations of the correlated motions. The histograms obtained by the projection of the MD trajectories, where we considered only the Cα and P atoms because of the memory limitations, onto the first two principal eigenvectors are visualized with Matlab. The PCA (38-40, 60-62) analysis was carried out with PCAZIP software (63), and the collective dynamic modes are plotted using the porcupine method developed by Tai et al., (64). The mutations were performed with Swiss pdb viewer 4.0. Results The model was constructed on the crystal structure of the active complex RecA-ATP-p(dT) determined by Chen et al., (13), (PDB id 3cmu). The geometry revealed that each triplet set of dT base interacts with three adjacent RecA monomers. The binding process propagates via a monomer initiation (25, 36, 65), but it continues via a co-operative binding mechanism (36, 65, 66). As a consequence, it is appropriate to design a model formed by more than a single monomer. In our simulation we considered the first three consecutive monomers of the experimental crystal structure of EcRecA interacting with a sequence of nine bases, dT9, to maintain the stoichiometry of the complex, three bases per monomer. This structure, RecA-T, is used as a reference to generate the complexes, labeled G204S, E207Q, M202Y, and R196K of corresponding mutants. For each trajectory, the root mean square deviation, RMSD, of the backbone atoms in function of the residue number of the entire system, 756 residues, is shown in Figure 1. The loops, L1, and L2, (residues 157-164 and 194-209 for each monomer) and dT9 are the residues directly involved in the binding process. The graph shows how mutations lead to an increase in mobility of the N terminal, (residues 1-33), the part of the enzyme supposed to interact with the adjacent monomer in the nucleoprotein filament, present in the second and third monomer. In RecA-T, however, the N-terminal results more stable with a RMSD maximum values of 6 Å for the second monomer and 4Å for the third monomer. Particularly noticeable is the change in the RMSD in the L1 and L2 regions in the mutants. RecA-T maintains a value reasonably low, of 1.5Å and 2.5Å for L1 and L2 respectively, underlying how the geometry change of the binding moieties does not deviate sensibly from the original X-Ray structure. For the third monomer slightly higher deviations are noticeable, with RMSD values of ~3Å for L1 and L2. The complexes with the mutants have a higher flexibility respect RecA-T, in particular L1 and L2 regions belonging to the third monomer, with a RMSD that reaches 5Å. The back-bone of the complex is the most stable part of the complex, RMSD values < 1Å for RecA-T, and slightly higher for the mutants, with values from 1 to 1.5 Å. ![]() Figure 1: RMSD value per residue averaged over 20ns for RecA-T and mutants. In order to investigate motion between different region in the protein, as domain-domain communication (57-59), or between ligand and receptor, we calculated the correlation matrix for the binding complexes, displayed as a two dimension correlation map. A positive value shows that the atoms are moving in the same direction, whereas a negative value indicates an anti-correlated motion. Our data show the correlation values of the Cα for the protein and P atoms for the nucleotide, Figure 2. ![]() Figure 2:Cross correlation matrix positive values show atoms moving in the same direction, in red, whereas negative values indicated anti-correlated motion, in blue. The native structure, RecA-T, shows areas of mild correlation and anti correlation, with values changing from -0.3 to 0.3. The movement of the residues belonging to the same monomer results correlated three main squares along the diagonal, and were anti-correlated with respect the other monomers in the remaining part of the diagram. The dT9 fragment shows a barely noticeable correlation with the central monomer, which is however, increased by mutations. The smallest values in the correlation matrix found for RecA-T is due to a minor change in the geometry of the complex during MD simulation, suggesting how the geometry of the crystal structure is close to corresponding structure in solution. The mutations lead to a general increase in the absolute values of the correlation matrix elements, underlining how the resulting geometries change during the simulation. In particular, the correlation map shows a significant increase in the correlation and anti correlation areas to values closer to ±0.5, for G204S and E207Q. M202Y shows an intermediate correlation pattern, similar to RecA-T, while R196K has a correlation close to the unit. The interaction of the enzyme with the dT9 fragment can be seen by looking at the top and right border of the figures. Those regions are largely uncorrelated with the binding loops of the RecA, confirming how the binding geometry does not change significantly from the X-Ray structure. The mutants reveal an enhance degree of correlation in the binding area, in particular G204S, however R196K has increased the correlation with dT9 mainly in the central region. Because of the size of the systems considered and the potential flexibility that the binding units might show, we extended our analysis to a study based on the principal component analysis (PCA), often called essential dynamics(38), applied to protein dynamics (64). The PCA technique was extensively used (38, 67-70), despite the issues risen on the effective accuracy (71). ![]() Figure 3:Percentage of the cumulative eigenvalues as a function of the number of eigenvalues considered. With this approach, the high dimensional space spanned by 3N-6 degrees of freedom (where N is the number of atoms) is sensibly reduced. The eigenvalues resulting from the principal component projection represent the variation of the original set of data along the corresponding eigenvector. These eigenvalues are normally sorted in terms of their magnitude, and their accumulative sum gives an indication of the quality of the representation for a given number of dimensions. As a limiting case, if all the eigenvectors are considered, the original space is correctly represented in the projection subspace. In Figure 3, the accumulative projection in function of the number of eigenvalues is represented. In general to obtain 99% of the variance, it will be necessarily to consider more than 500 eigenvectors, however, already if the first two eigenvalues are taken into account, more than 60% of the total motion is included for RecA and more than 50% for the remaining mutants. Although a representation over 3D or 2D dimensional subspace might not cover more that 50-60% of the motion of the entire system, each contribution of the remaining dimensions is small. We therefore assume it is appropriate to evaluate qualitatively the properties of the RecA complex by looking at the 2 main components. The representations of the 2 principal component eigenvectors, relative to Cα and P atoms, for RecA-T and the mutants are shown in Figure 4. The eigenvectors, ev1 and ev2, relative to the three RecA subunits are shown in red, violet and pink respectively, while the vectors for the dT9 are shown in green. The binding loops are represented in yellow, for L1, and orange, for L2. The main component of the motion of RecA-T, ev1, is dominated by a concerted motion of the monomers toward the dT9. In particular, L2 has a strong component toward the binding regions. L1, on the contrary, seems showing a minor binding character, with vectors pointing far from the dT9 fragment. The components of ev2 are lower in magnitude compared to ev1, but still they underline the motion of the RecA toward the dT9. The mutation alters the values of ev1 and ev2 but the major aspects are preserved. The general tendency in binding the dT9 fragment is maintained, were clearly the motion of the protein is towards the bases. In all cases there is a strong component in the N-loop, which is toward the rest of the enzyme, in RecA-T, M202Y, but in the opposite direction in E207Q and G204S. These motions, however, reappear as dominant in the ev2. As a general picture we can identify two main motions for all the systems: a combined rotation of the monomer toward the dT9, in particular in ev1 for RecA-T and M202Y, and a motion of the N-terminal toward the rest of the complex, suggesting how a binding cooperatively can be a possible mechanism of RecA reaction. According to the energy landscape model (72), the folding and binding coupling is often followed by conformational changes related to the biological functions of proteins, and as such determine the nature of the energy landscape (73-78). In this model of selection of conformations for the binding process in biomolecules, the flexibility of the binding partners and the binding mechanism can be established by the dynamic equilibrium of the complex conformational states (79). ![]() Figure 4:The first and second component of the principal component analysis plotted for Ca atoms for RecA-T and mutants. The vectors are displayed with porcupine program. The two dimensional projections of the trajectories of the systems considered onto the first and second principal component is shown in Figure 5. This technique allows one to visualize the shape of the potential energy surface in spite of the inevitable distortion introduced by the projection. The model was performed using only the Cα atoms of the protein residues and P atom for the dT9 fragment, thus it describes the interaction with the ssDNA only partially. Nevertheless, it is relevant to observe groups of local minima along the trajectories which are more or less pronounced in the function of the mutations. It is interesting to note how the interpolation of the values of the projection of the trajectory onto the two main principal eigenvectors along the simulation, shown by the black line, crosses the funnel areas of the histograms. The calculations have not been scaled in any way to reflect relative energies because our statistics from direct MD is insufficient to achieve a quantitative accuracy; however the results reveal the effect of mutation of RecA on the change in the basins structure. For RecA-T there are several basins separated by low barriers showing the exploration of the conformational space along the simulation. The first basin, encountered at the initial stage of the simulation, is located in proximity of the point with ev1 and ev2 coordinates (10,15). It is not particularly pronounced and the system evolves after the first 6 ns to a new minimum, (10,-15). From there the final transition is rather smooth, reaching the final basin located in proximity of the point (-15,10). The G204S has a more complex structure of basins, more separated rather then connected by valleys as for RecA-T. The system proceeds through several intermediate states, finishing by lying for 3 ns in proximity of the global minimum (-20,10), in the final part of the simulation. A similar topology is encountered for E207Q, where all the basins result well separated. The first minimum is found at the beginning of the simulation, coordinates (-40,15), then it is followed by a series of microstates, ending to the final basin at coordinates (20,10). The remaining two mutants, M202Y and R196K, have the basins connected by several valleys, suggesting how the transaction is rather smooth, similar to the RecA-T case. R196K, shows a deeper minimum close to the point (-40,-10) which is reached at the end of the simulation. ![]() Figure 5:The “minimum energy envelope” mapping the potential energy surface of Reca-T and corresponding mutants. The black line represents the interpolation of the values of the projection of the trajectory on the first two principal eigenvectors. The binding free energy is obtained as a difference between the free energy of the complex and single species, RecA or mutants, and dT9. The MM-PB(GB)SA approach was used to calculate the average contributions of gas phase energies, the solvation free energy, using the Poisson Boltzman (PB), and the Generalized Born (GB) protocol, for the electrostatic component of the free energy. One of the advantages of using the MM-PB(GB)SA method is that the “nonphysical” annihilation (80, 81) or decupling (82, 83) of the species alone in solution or bounded to a substrate is not required. Moreover, it is not necessarily to model the partially unbound states as demanded using umbrella sampling. Consequently the MM-PB(GB)SA method uses to estimate the binding free energy only physical states. The input for the MM-PB(GB)SA calculations were extracted from the last 8 ns of the MD trajectories, from 400 adjacent snapshots at 20 ps time intervals. Since the correlation time for decay of fluctuation of the free energy is about 1 ps, extracting the snapshots at a time significantly longer should lead to independent series of sample points. Under these conditions, the value of the standard error in the mean (SEM), decreases with the square-root of the number of points, thus by having a large number of samples it is possible to estimate properly the value for the binding free energy. The study conducted by Kollman and Case (84, 85), revealed how a number of snapshots from 100 to 200 is sufficient to calculate the binding free energy with a good accuracy. The values for the binding ΔGgas+solv, calculated by MM-PB(GB)SA, oscillate a lot respect the average value but still in an expected range. Table 1 Table 2 Table 3 In our study we used several protocols to calculate the binding free energy. To the MM-PBSA, PB, method (Table I), we compared several MM-GBSA methods with different effective Born radii definitions (Table II). To verify the stability of the complexes, we plot the MM-PBSA and MM-GBSA values (for this case we considered the IGB2”), for each snapshot for the last 8 ns of the trajectory, Figure 6. In the discussion we use the MM-PBSA energy as a reference to establish the relative stability in within the mutants and wild RecA, then an analysis on the results obtained with different methods follows. The average MM-PBSA energy of RecA-T is -179.4 kcal/mol with a standard error of 6.2 which is similar to values calculated for the other cases. The data come from a reasonably equilibrated structure with a linear regression slope of 4.8·10-2 kcal · mol-1 · ps-1 comparable to previously published results (86). The G204S mutation decreases the binding character with the estimation of binding free energy of -161.4 kcal/mol. The standard error is slightly lower, 5.8, but still comparable to the RecA-T case. The convergence appears to be better achieved as confirmed by a lower slope value, 1.4·10-2 kcal·mol-1·ps-1. The E-Q mutation in position 207 gives a totally different result. The binding free energy is strongly reduced, with a main value of -123.5 kcal/mol even if the standard error and the slope are comparable to the RecA-T system, with values of 6.2 and 5.2·10-2 kcal·mol-1·ps-1 respectively. The stability of the complex is higher but very similar with the mutations in position 202 and 196. The slope of the linear regression for both cases is very small, with values of 2.2·10-2 kcal·mol-1·ps-1 and 2.8·10-3 kcal·mol-1·ps-1 for M202Y and R196K respectively. The binding character is stronger in both cases than the previous mutations, but still lower than the wild system. M202Y has a binding free energy of -155.9 kcal/mol and R196K value change only slightly, -155.4 kcal/mol. The data relative to the MM-PBSA method, including the corresponding SEM are listed in Table I. The two major attractive contributions are the gas phase Coulombic energy, Helec, and van der Waals energy, HvdW, whose sum is labeled as Hgas, whereas in general the antibonding contributions come from the polar solvation component, GBP, while the non-polar part, Gnp, gives a minor contribution. The sum of both quantities is labeled as Gsolv. The solvation part of the free energy is calculated by solving the Poisson-Boltzmann (PB) equation using radii from the PARSE parameter set (87).
Figure 6:MM-PB(GB)SA binding free energies for RecA and mutants, calculated in the last 8 ns of the trajectory. Each snapshot has 20ps step.
Blown-up View: Fig. 6.
The highest binding contribution of the Coulombic energy, Helec, is found in the E207Q mutant -532.9 kcal/mol, whereas the lowest value is found for the M202Y mutant with an absolute value of -230.2 kcal/mol. The Hvdw energies have a similar magnitude along the series of complexes, with an average absolute value of -180 kcal/mol. The polar salvation energies, GPB, give a strong antibonding character, with a high value for E207Q of 429.9 kcal/mol, respect the RecA-T of 193.8 kcal/mol. Much lower values are found for G204S and M202Y 103.6 kcal/mol and 96.6 kcal/mol respectively. We compare the results obtained with the PB method to several Generalized Born (GB) approaches. Contrarily to the PB method, in the GB approach each atom is represented as a sphere with a radius called effective Born radii. In order to increase the computational efficiency, several methods have been proposed to determine the Born radii. The modified CFA method introduced by Hawkins et al., (51), IGB1, was one of the first implemented in Amber. This method, however, gave some issues on the buried atoms. A rescaling approach was introduced to estimate more accurately the Born radii, with the GBOBC methods, namely IGB2 and IGB5. The last developmentwith this approach is the IGB7. In our calculation we compared all the methods available. We used two different radius definition, bondi and mdondi2 for IGB2, 5, 7, and IGB2’, 5’, 7’ respectively. In IGB2” we used PBradii. All the data are summarized in Table II, indicating the mean values for the each MM-GBSA method, the standard error of the mean, sem, and the energy difference respect to the corresponding MM-PBSA values reported at the bottom of Table I. The MM-GBSA results are very sensitive to the method used, even if the order of binding is preserved among the considered species. The methods IGB5 and IGB7, regardless the radii definition, give a unrealistic result with overestimation of the binging free energy. IGB1 and IGB2, however, predict a binding free energy comparable with the PB method. The corresponding SEM values are all comparably and with a value between 3 and 7. IGB1 has a general tendency to overestimate the binding free energy, with an average value of -44 kcal/mol. On the contrary, IGB2, shows a small under binding character by only +5 kcal/mol. The value, however, is very close to the corresponding PB result for the R196K system. The change of the radii definition effects the results, in fact for IGB2’, there is an over binding but not so pronounced as in IGB1. The average values in this case is -16 kcal/mol. In N202Y the value is closer to the PB result, -166.2 kcal/mol, while in G196K the IGB2’ prediction is more overestimated, -187.9 kcal/mol. The IGB2” method provides values that confirm a general over binding, which is not uncommon (88), but still in much better agreement than IGB1. Because of the good agreement with the PB results, we used IGB2” to analyze the residue contribution to the binding free energy, Figure 7. Our attempt in determining the solute binding entropy, TΔS(S), was not to provide a quantitative analysis but a qualitative picture to establish whether there is a systematic change of the entropy in function of the mutation. Because of the periodicity of the system, we considered the approach proposed Orzechowski et al., (56) where only a part of the complex was considered to estimate the binding entropy of the entire system.
Figure 7:Residues decomposition of the binding free energies calculated using the IGB2” method for RecA and mutants, calculated in the last 8 ns of the trajectory. The values show the contributions of the coulombic interaction plus the salvation free energy (coul + GB) the non polar (NP) and van der Waals (vdW). The number in parenthesis shows to which monomer the residue belongs.
Blown-up View: Fig. 7.
The data are calculated over 5 equidistant points at the last 10ns of the trajectory. The resulting standard deviation is rather high >11 Å however, the binding entropy is particularly high for the G204S mutant, -80 kcal/mol, and substantial for E207Q, -53 kcal/mol. For the remaining systems, the value are comparable with values between -43 and -49 kcal/mol. The MM-GBSA method allows for the decomposition of the binding free energy in terms of the contribution from each residue of a considered structure. The data calculated with the IGB2” method (Figure 7), show the three components for the highest 50 contributions to the binding free energy, ΔGgas+sol, for each residue in the complex: the Coulombic interaction summed with the solvation free energy (coul + GB), the non polar (NP) contribution, and the van der Waals (vdW) energy. The results shown in Figure 7 reveal a few intriguing common features. First of all only about fifteen amino acids contribute to the overall binding free energy, with energies from -18 to -2 kcal/mol, confirming the complexity of the binding process and its strong cooperatively character. Even if the number of residues which contribute sensibly to the binding is significantly smaller respect the overall system, there are just few amino acids which provide a significant contribution. For all cases the ARG196 for the first and second monomer give the major binding contribution. In RecA-T the contribution is -13.4 kcal/mol for the Arginine in the first monomer, (1)ARG, and -11.4 kcal/mol for the second monomer (2)ARG. In G207Q and G204S the Arginine contribution is higher respect RecA-T, with values of -15.5 and -15.0 kcal/mol for the former and -17.8 and -17 kcal/mol for the latter. In M202Y the contribution is more heterogeneous, where the value for (1)ARG is -11.8 and (2) ARG is -18.2 kcal/mol. Even when the position 196 has been replaced, the contribution is still high, as in R196 K the Lys 196 in the first and second monomer contribute by -14.6 and by -11.7 kcal/mol. As a general finding, ARG is an amino acid which has a dominant role in the contribution to the binding free energy, as the ARG in position 169, and 176, where the contribution is generally higher that -5 kcal/mol. Equally noticeable is ASN in position 213 in particular for the second and third monomer. Likewise the THR210 and GLY212, present in all systems considered, contribute by more than -6 kcal/mol. There are, however, few antibonding contributions and their weight respect the overall values are essentially negligible. For instance the negatively charged ATP contributes by less than 5 kcal/mol, likely due to its distance from binding regions. The ion Mg2+ gives a negligible contribution on the order of 1 kcal/mol. The nucleoprotein filament is a very flexible structure (13) thus it is conceivable to assume that the central part of the protein filament might bind differently from the extremities. We tried to address this issue considering the data used in Figure 7 but regrouped in terms of binding monomers, ATPs, and ions, Mg2+ with results listed in Table IV. ![]() The contribution for each monomer to the binding free energy reveal substantial differences . The monomer in the central position has the highest weight in the binding free energy, in particular in RecA-T that is -78.9 kcal/mol higher than the remaining mutants. The first monomer has the lowest contribution with an average value of -29.5 kcal/mol in among all the complexes. Not surprisingly the component from the dT9 fragment has a strong weight, in particular for R196K and RecA-T with values of -41.5 and -36.7 kcal/mol respectively. The contribution for all ATP is about 2 kcal/mol for all systems, while the role of Mg2+ is practically negligible. Discussion The mechanism of formation of the nucleoprotein filament is still unclear (23, 65, 89-91), despite numerous years of study; in particular it seems very elusive to determine which is the minimum binding unit involved in the formation of the nucleoprotein filament (36, 92). Several mechanisms were proposed to elucidate the kinetics of the formation of the nucleoprotein filament, by considering the presence of several reactions and functional states in equilibrium (26, 30, 31, 93-97). However a clear model capable of properly describing the RecA binding to ssDNA is still missing. As a consequence the kinetic data obtained so far cannot be used to develop an accurate picture of the nucleation of the ssDNA by RecA. Our purpose is to try to understand in more detail some of the crucial aspects of the binding process by using molecular dynamics (MD) simulations. Our calculations investigated the complex, RecA-T, between a trimer of the wild enzyme, wtRecA, with a fragment of ssDNA, dT9. The model is extended to a selected series of complexes equivalent to RecA-T, but with mutations in the L2 binding site of the enzyme. Several studies have been conducted on the RecA activity in function of different mutants (22) but measurements of binding free energies for RecA and mutants are absent in the literature. The only thermochemical information are on the binding enthalpies (98, 99). Studies conducted on mutation in the last two decades (22, 27, 33-35, 100-105), when the crystal structure of RecA was still undiscovered, were designed to identify possible binding sites. As shown by experiment, mutations changed significantly the RecA activity and the ability of the protein to associate with the ssDNA. Kowalczykowski et al., (34), for instance, performed an extensive study on association rate of the RecA protein to the ssDNA, including RecA441 (E38K), RecA803 (V37M) and RecA730 (G38K). These mutations, however, are not in the binding loops but in the N terminal domain, where each monomer interacts with the adjacent one in the filament. The observed reduced binding activity might be due to a decrease in the packing ability rather than a change in the intrinsic binding property of each monomer, or to a modification in the co-operativity parameter (66). The determination of the number of monomers binding simultaneously to the ssDNA has always been very challenging, especially because it was unachievable to separate the process of nucleation from the filament expansion that follows immediately after the binding. Moreover, the extension of the filament forming on the ssDNA was hard to measure because multi nucleation event occurring simultaneously. There is however evidence that the minimum binding unit is the monomer and the mechanism of nucleation has a high degree of co-operativity (36, 65), even if the oligomeric forms of the protein can take place in the assembly (92). A strong contribution was brought by Joo et al., (36), where, for the first time using very sophisticated single molecule FRET (106-108) technique, the dissociation and binding unit and the corresponding rates were measured in proximity of both ends of the filament near the ss-sdDNA junction. Our study provides a qualitative description of the change of binding free energy relative to mutation in the binding loops, in particular L2, for which some experimental data are available. The absolute binding free energy is beyond our reach because an essential component in its determination, the solute binding entropy, TΔS(s), cannot be calculated with our computer resources. We provided estimations, however, by calculating the binding entropy of a fragment of the trimer with the harmonic approximation (see below for further discussion). The model proposed of the complex is formed by a RecA trimer, obtained from the crystal structure, interacting with 9 bases, dT9, to maintain the stoichiometry of the complex. The co-operativity factor is disregarded in our treatment, thus we considered the mutations occurring only in the binding loops. The point mutation in position 207 was examined by Cazaux et al., (27), to reveal the importance of the positive charge in that position interacting with a p(dT) chain. The experiment suggests how the E207K did not change significantly the stoichiometry, structure, and stability of the nucleoprotein filament with respect to the wtRecA, while the E207Q shows an inhibition in the reactivity. Our data consistently confirmed the strong decrease in binding free energy, by 56 kcal/mol for the trimer, even if the position 207 is irrelevant to contribution to the binding, Figure 7. This is consistent with the experimental X-Ray structure (13) which shows how the point residue 207 does not hydrogen bonds with the ssDNA. The change of reactivity for E207Q can be due to additional factors, has co-operativity effect or a change in solubility of the protein, but unlikely caused by a direct interaction of the amino acid in position 207 with the substrate. Another point mutation considered is the replacement of methionine in position 202. According to Camerini-Otero et al. (22) who performed a systematic mutation in L2, the positions from 197 to 206, except for 204, 205 and 199, show only a minor change in reactivity. We replaced the position 202 with tyrosine, M202Y, to observe how the increase of the polarity in that point can preserve the reactivity, although we suspect that the small change in reactivity is due to the lack of interactions of the residue in position 202 with the ssDNA, as shown by the X-ray structure. The results reveal a decrease in the binding free energy, by 23 kcal/mol, but not so drastically as in the case of E207Q. Another residue which seems to have little interaction with ssDNA is the position 204, Reca430, but contrarily to the previous case, a mutation in that point influences sensibly the binding. This mutation has been extensively studied (35, 109), and the results were consistent in terms of decrease of the binding affinity with a ssDNA, despite the use of different essays. Our calculations confirm that the amino acid in that position does not contribute in a significant way to total the binding free energy, but the overall binding free energy decrease by 18 kcal/mol. However, there is a high value in the binding entropy for MG204S, Table III, suggesting how for this mutation there is a strong entropic contribution that reduces the absolute binding free energy, as expected from experiments. A very critical mutation for the enzyme activity appears to be in position 196 where any change gives a complete inactivity of the enzyme (22). We considered the replacement of the Arginine with Lysine, R196K, because of the close similarity that both the amino acids have, despite the very different effect they induce on the RecA reactivity. Although the X-Ray structure reveals how the R196 makes a hydrogen bond with the phosphate group of the interacting ssDNA, like many other amino acids located in L2, the contribution to the overall binding of Arginine 196 is not clearly determined by experiment. Our calculation showed that the R in position 196 provides the highest value in the biding with a ΔGcoul+GB higher that 15 kcal/mol for all cases examined, significantly more pronounced then the other amino acids involved in the binding, confirming how that position is critical to the enzyme activity. For the mutation, the binding free energy of R196K is reduced sensibly, 24 kcal/mol, and the system went to a strong modification before reaching the equilibrated conformer, as shown by the correlation map in Figure 2 and the energy envelope plot in Figure 5. Beside the general outline that describes the motivation of the choice of the mutants, we examined a series of different methods to analyze and compare the reactivity of RecA-T with the point mutations. The first aspect considered is the flexibility of the complex. The RMSD averaged over 20 ns in function of the residues are plotted in Figure 1. The pattern that reveals the correlation and anti correlation motion of the complexes can be seen in the off diagonal areas of the cross correlation plots relative to Cα and P atoms, Figure 2. In general the indicative values in the cross correlation matrix are above 0.25 and lower than -0.25. According to the plots the three monomers for each complex are not equivalent in terms of dynamics. There is a general correlation pattern along the diagonal part of the plot, showing how the Cα atoms of each monomer are correlated among them, but each monomer, reveals anti-correlation pattern with respect to the remaining fragments. This behavior is particular noticeable in R196K where the correlation map assumes strong correlation value for the central monomer, and anti-correlation patterns on the adjacent structures. The movement of dT9 is rather hard to visualize with this approach because it is depicted by a tiny part of the area representing the last 9 residues over more than 700, Figure 2. There is, however, a general correlated motion for few bases relative to each monomer, top and right border of the plot, showing how the binding character is still maintained along the mutations. This is particular noticeable for R196K, where there is a strong correlation for the central monomer and dT9, and a pronounced anti-correlation is visible for the first and third monomer interacting with the bases. Another tool extensively used which is very versatile to analyze and visualize the protein dynamics is the principal component analysis, PCA. This theory considers the covariance matrix of the atoms, in our case the Cα and P, which results diagnolized by an orthonormal transformation matrix, whose eigenvectors are defined as principal modes. The projection of the trajectory onto the principal modes produces principal components whose eigenvalues represent the mean square fluctuation along that component. We calculated and visualized the first two principal components, Figure 5. For all systems, there is a general motion describing the binding movement, i.e., the movement of the binding loops towards the dT9 segment, confirming how even if the mutations influence the magnitude of the binding free energy, the general behavior is still toward the formation of the complex. One of the major features of PCA is the reduction of dimensionality of the trajectory data, and it underlined the major dynamical features with the first few eigenvectors describing the global motion of the system. The projection of the trajectory onto the principal components gives the energy landscape. In order to facilitate the visualization, we represented the tridimensional histogram of the trajectory projection on the first two principal components, using the minimum energy landscape procedure. The sampling data do not contain barrier information, but when funnels are highly separated, it is an indication that a high barrier might be present. The energy envelopes represented in Figure 5 show the basins all connected by valleys with a rather complex structure of microstates. The systems however, do not have a significant barrier to overcome, but simply an equilibration toward a more suitable conformation for the binding. The presence of deeper basins for the mutants, in particular for R196K, show how the structure must undergo major changes to reach the equilibration, contrarily to RecA-T where the transitions appear to be rather smooth. Conclusion The goal of this study is to shed light on the modality of the interaction of the recombinases protein RecA and a few selected mutants with a ssDNA model. To achieve this, a series of calculations aimed to estimate the absolute free binding free energy were performed using the MM-PB(GB)SA protocol. The technique is based on the Poisson-Boltzmann (PB) and the Generalized Born (GB) solvent accessible surface area methods. The solute entropic contribution is also considered and qualitatively estimated by normal mode analysis. All the modeling was performed using the commercial code Amber 10. The results confirm how the highest binding free energy is found for the wild RecA, and all the point mutations examined lead to a reduction of the binding character. Moreover, the data underline how a strong contribution of the binding free energy is due to the interaction with the ARG196, a critical residue for the functionality of the enzyme. Supplementary Material Supplementray data and figure dealing with binding entropy calculated by the quasiharmonic analysis is available at no charge from the authors directly; the supplementary data can also be purchased from Adenine Press for US $50.00 Acknowledgement We gratefully acknowledge support for this work from the NASA Space Radiation Risk Assessment Project. We would like to extend our gratitude to Dr Shaowen Hu for the useful discussion on the computational methods and to Masani Denmon-Carra for helpful considerations. References and Footnotes
|