Issue June 2010
Current Perspectives on Nucleosome Positioning

Volume 27
No. 6 (713-894)
June 2010
ISSN 0739-110
Open Access

Nucleosome Positioning by Sequence, State of the Art and Apparent Finale (741-746)

All major suggestions about the nucleosome positioning sequence pattern(s) are reviewed. Two basic binary periodical patterns are well established: YRRRRRYYYYYR in the purine/pyrimidine alphabet and SWWWWWSSSSSW in the strong/weak alphabet. Their merger into a four-letter alphabet sequence coincides with the first ever complete matrix of nucleosome DNA bendability, derived from a very large database of nucleosome DNA sequences. Its simplified linear form is CGGAAATTTCCG. Several independent ways of deriving the same pattern are described. It appears that the pattern represents an ultimate solution of the long-standing problem of nucleosome positioning, and provides a simple means for nucleosome mapping on sequences with single-base resolution.

Edward N. Trifonov1,2*

1Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
2Division of Functional Genomics and Proteomics, Faculty of Science, Masaryk University, Kamenice 5, Brno CZ-62500, Czech Republic

trifonov@research.haifa.ac.il

Open Access Article
The authors, the publisher, and the right holders grant the right to use, reproduce, and disseminate the work in digital form to all users.



Results and Discussion

The first complete matrix of nucleosome DNA bendability was recently derived (1) from a large database of nucleosome core DNA sequences (~160,000) generated by MNase digestion of C. elegans chromatin (2). In simplified form the matrix is described by the sequence CGGAAATTTCCG (10 bases from CG to CG), hereafter called the CG/AT motif, with the CG and AT elements five bases apart, at the centers of complementary symmetry of the bendability pattern. The ideal nucleosome positioning pattern would thus be the repetition (GGAAATTTCC)n

with correction for the period of 10.4 bases (see, e.g., (3)) instead of 10 bases. To derive the pattern, an original signal processing procedure was applied: signal regeneration from its parts (1). Essentially, all occurrences of, say, the CGxxxxxxxxCG combination were collected, and the preferences of various dinucleotides for different positions between the CG elements were evaluated. The sequence (GGAAATTTCC)n is a more advanced, detailed version of the very first, 26-year-old bendability patterns (4): (RRRRRYYYYY)n, the RR/YY motif, and its variant (AAAAATTTTT)n, the AA/TT motif. The original assumption that DNA sequence is the major factor in nucleosome positioning, in vivo as well, was confirmed as early as 1984 (5-7), but an exact formulation of the positioning pattern remained elusive all these years. Since the early 1980s, when the problem was first outlined (8, 9) and a very approximate solution suggested (4), many other tentative patterns have been offered. All had some good points behind them, and all should therefore bear some resemblance to an ultimate solution. The solution should probably be unique and universal, since the physics of DNA bendability, though not yet fully described, does not imply any uncertainty and is the same for all species. All the patterns suggested since then, including the CG/AT motif, are combined in Figure 1, aligned to maximize the similarities. It is easily seen, letter by letter, that the simple consensus by highest occurrence of the motifs is (GGAAATTTCC)n, which is identical to the CG/AT motif derived from C. elegans nucleosomes. None of the other suggested motifs scores better when compared to the rest of the set. This alone is a good reason to believe that the CG/AT sequence is a universal DNA bendability pattern.
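The collection step mentioned above (gathering CGxxxxxxxxCG occurrences and tallying dinucleotides between the two CG elements) can be illustrated with a minimal Python sketch. It is not the signal regeneration procedure of (1), only a simplified tally; the function names, the toy input sequences and the majority-vote consensus are assumptions made for illustration.

```python
from collections import Counter

def dinucleotide_profile(sequences, anchor="CG", spacing=10):
    """Tally dinucleotides at each offset between two anchors `spacing` bases apart."""
    # profile[k] counts the dinucleotides that start k bases after the first anchor
    profile = [Counter() for _ in range(spacing + 1)]
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - spacing - 1):
            if seq[i:i + 2] == anchor and seq[i + spacing:i + spacing + 2] == anchor:
                for k in range(spacing + 1):
                    profile[k][seq[i + k:i + k + 2]] += 1
    return profile

def consensus(profile):
    """Most frequent dinucleotide at every offset (None where nothing was seen)."""
    return [c.most_common(1)[0][0] if c else None for c in profile]

# Toy input (hypothetical sequences, not data from the nucleosome database)
toy = ["CGGAAATTTCCG", "ttCGGAAATTTCCGaa", "CGGACATTTCCG"]
print(consensus(dinucleotide_profile(toy)))
```

With real nucleosome sequences the signal is weak, so the counts at each offset would have to be compared with background dinucleotide frequencies rather than read off by simple majority.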


Figure 1: Alignment of suggested nucleosome positioning patterns, 1983-2009. Minor groove positions are indicated on the top by vertical bars.

Another reason is that this motif can be derived from simple DNA deformability considerations, by minimizing the unstacking of bases and base pairs caused by DNA bending on the surface of the histone octamer (21). Indeed, the opening of the rolls (angles between base pairs) at the minor grooves of nucleosome DNA oriented outwards (OUT) is larger than for the rolls at the minor grooves oriented inwards (IN) (22). This means that, in order to minimize the unstacking, the [A,T] dinucleotides should be placed in the OUT positions, while the [C,G] dinucleotides should be placed IN, in accord with their respective lower and higher values of stacking interactions (23). That suggests the optimal arrangement of strong [C,G] and weak [A,T] dinucleotides: (SSxWWWWxSS)n, the SS/WW motif. On the other hand, the purines of RR·YY base pair stacks in the orientation orthogonal to the DNA-histone interface should be closer to the surface, to minimize the unstacking of the purines, which is energetically more expensive than that of pyrimidines. This suggests another binary arrangement: (RRRRRYYYYY)n, the RR/YY motif. Both binary patterns have been suggested before as possible nucleosome positioning patterns (4, 11, 13-15). Derivation of a four-letter combination that satisfies both binary patterns is straightforward (Figure 2).


Figure 2: Merger of two binary nucleosome positioning patterns, SS/WW (15) and RR/YY (11).

Taking into account that eukaryotic genomes are rather A+T rich, one should place A and T in the uncertain R and Y positions, respectively. We thus arrive again at the CG/AT motif. Formally, one can also combine the SS/WW and RR/YY motifs into (CCTTTAAAGG)n, the GC/TA pattern, but this would be inconsistent with the optimization of the deformational unstacking of bases described above. The IN bases, Sn, are followed by bases Rn oriented towards the interface, and the OUT bases Wn are followed by Yn oriented outwards (Figure 3). Note that researchers not dealing closely with the details of DNA structure often do not realize that it is important to assign the 5’ and 3’ ends correctly in schematic presentations of DNA. Indeed, each of the single-stranded ends of a DNA fragment has to be labeled either 3’ or 5’, and there are two ways to place the labels on a simple scheme. However, when the same duplex is drawn with all chemical details, there is only one consistent way to assign the ends. With the ends assigned properly, the 5’-3’ direction vectors exit from the minor grooves, as in Figure 3, following the “rule of fountain”. A wrong assignment of the ends would lead to the direction vectors entering the minor grooves instead. Incidentally, the GC/TA pattern, when read in the wrong direction (3’-5’), turns into the same sequence as the CG/AT pattern.
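The merger of the two binary patterns, and the remark about reading the GC/TA pattern backwards, can be checked with a short Python sketch. It intersects the base classes position by position (R = A or G, Y = C or T, S = C or G, W = A or T) and resolves the two uncertain positions in favor of A or T, as argued above. The function name and the `prefer` parameter are illustrative assumptions, not part of any published program.

```python
# Base classes of the two binary alphabets; "x" stands for an unconstrained position
CLASSES = {"R": set("AG"), "Y": set("CT"), "S": set("CG"), "W": set("AT"), "x": set("ACGT")}

def merge_patterns(sw, ry, prefer="AT"):
    """Intersect two binary patterns position by position.

    Where more than one base remains, keep the one found in `prefer`
    (the A+T-rich assumption made in the text).
    """
    merged = []
    for a, b in zip(sw, ry):
        bases = CLASSES[a] & CLASSES[b]
        if len(bases) > 1:
            bases &= set(prefer)
        if len(bases) != 1:
            raise ValueError(f"position {a}/{b} cannot be resolved to a single base")
        merged.append(next(iter(bases)))
    return "".join(merged)

cg_at = merge_patterns("SSxWWWWxSS", "RRRRRYYYYY")  # SS/WW combined with RR/YY
gc_ta = merge_patterns("SSxWWWWxSS", "YYYYYRRRRR")  # the formal alternative
print(cg_at, gc_ta, gc_ta[::-1])
```

The last printed value is the GC/TA pattern read right to left, which reproduces the CG/AT pattern letter by letter.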

The two binary expressions of the nucleosome positioning pattern are formally satisfied by altogether four 4-letter alphabet solutions (Figure 4).

The CG/AT pattern is the one originally derived from the nucleosomes of C. elegans and confirmed as described. The GC/TA pattern contains a TA-periodical component and is inconsistent with DNA deformability properties (see above). However, TA-periodical sequences have been suggested as strong nucleosome positioning signals, on a rather solid experimental basis (18, 19). From a random sequence pool, those sequences were selected which make stable associations with histone octamers. It turned out that these sequences have in common a 10-11 base periodicity of TA dinucleotides. We believe that the preference of the artificial nucleosomes for the periodical TA steps is actually due to the very low stacking interactions between base pairs in TA·TA stacks (23). They are not only the easiest to deform but also prone to kinking (24), and are thus hot spots for mutations or digestion. Natural nucleosomes with periodical TA would therefore be selected against. Analysis of periodicities in 13 fully sequenced eukaryotic genomes (25) showed that weakly periodically positioned TA dinucleotides are detected only in yeast (25, 26). This rare manifestation of the TA periodicity could well be due to sequence exclusion or to sequence linkage effects (1), as almost all other dinucleotides in the yeast genome are periodical, especially the dinucleotides AA and TT, to which the TA dinucleotides are immediately linked. The motifs TG/GT and (complementary) AC/CA (Figure 4) are only partially supported by dinucleotide distance analyses (25). Periodicities of AC and GT dinucleotides are detected in only one of the 13 genomes, C. elegans, and are perhaps caused, again, by sequence exclusion (almost all other dinucleotides of this genome are strongly periodical as well). The TG and CA dinucleotides, however, are rather periodical (four genomes of thirteen (25)). The last two patterns in Figure 4 are likely to be merely formal, rather degenerate solutions.



Figure 3: One helical turn of DNA bent on the surface of the histone octamer (scheme). Minor grooves in contact with the surface (IN) and oriented toward the exterior (OUT) are indicated on the top by vertical bars.


One a posteriori thought, after finding the presumably universal CG/AT positioning motif, is irritatingly simple.

If there is a certain universal sequence motif in eukaryotic genomes, similarity to which would attract the histone octamers, then some elements (short subsequences) of this motif are probably overrepresented in the oligonucleotide repertoires of eukaryotic genomes. Indeed, the most frequent trinucleotides in eukaryotes are AAA, TTT, AAT and ATT (e.g., (27)). These triplets may be combined in the sequence AAATTT. The most frequent triplet of the series xAA, apart from AAA, is GAA, and among the TTx trinucleotides the dominant one, apart from TTT, is TTC. They fuse together with the above into the sequence GAAATTTC, which is an almost complete CG/AT nucleosome positioning motif. In other words, even the most primitive oligonucleotide frequency approach apparently leads to the universal CG/AT pattern.
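The triplet argument can be verified mechanically. The following Python sketch counts overlapping trinucleotides and confirms that the fused sequence GAAATTTC contains exactly the six triplets named above and is itself contained in the CG/AT motif. The helper name is an assumption, and no genomic frequencies are computed here; the list of frequent triplets is taken from the text.

```python
from collections import Counter

def trinucleotide_counts(seq):
    """Count overlapping trinucleotides in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))

MOTIF = "CGGAAATTTCCG"                                 # CG/AT motif from the text
FREQUENT = {"AAA", "TTT", "AAT", "ATT", "GAA", "TTC"}  # triplets named in the text
FUSED = "GAAATTTC"                                     # their fusion

print(sorted(trinucleotide_counts(FUSED)))
print(FUSED in MOTIF)
```

The same counting function could be applied to a genomic sequence to rank its trinucleotides, which is the frequency analysis the paragraph refers to.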



Figure 4: Four sequences formally derivable from binary nucleosome positioning motifs. The sequences are aligned by RY and YR dinucleotides.

Thus, there are several good reasons to consider the sequence motif (GGAAATTTCC)n as the universal nucleosome DNA positioning pattern: sequence analysis of C. elegans nucleosomes, the consensus of all earlier suggested patterns, minimization of base pair unstacking, the combination of binary positioning patterns, and reconstruction from the most frequent trinucleotides. This pattern can be used for sequence-directed nucleosome mapping. A computer program has been developed for this purpose (28) that maps the nucleosomes with single-base resolution. It is publicly available (29). To verify any accurate nucleosome prediction procedure, it is not appropriate to rely only on the correlation between predicted and experimental maps. One needs high-resolution test cases. Therefore, the nucleosome mapping program was tested on several sequences for which the exact positions of the central bases in the nucleosome DNA are known with atomic resolution (30-33). Predictions by the program fit the nucleosome center positions in the test cases within ±1 base (28). Mapping with the repeating CG/AT motif is, to the best of our knowledge, the first verified single-base resolution nucleosome mapping procedure available that is based on the sequence only.
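To make the idea of sequence-directed mapping concrete, here is a minimal Python sketch that slides a periodic (GGAAATTTCC)n template along a sequence and counts matching bases. It is not the program of (28, 29), whose scoring is based on the full bendability matrix. The way the 10-letter motif is stretched over a 10.4-base period, the window of 104 bases (ten periods) and all function names are assumptions chosen for illustration.

```python
MOTIF = "GGAAATTTCC"            # repeat unit from the text
PERIOD_NUM, PERIOD_DEN = 52, 5  # 10.4 bases written as the exact fraction 52/5

def template(length):
    """Repeat MOTIF with a 10.4-base period (an assumed stretching rule)."""
    letters = []
    for i in range(length):
        phase = (i * PERIOD_DEN) % PERIOD_NUM  # position within one period, in fifths of a base
        letters.append(MOTIF[phase * len(MOTIF) // PERIOD_NUM])
    return "".join(letters)

def match_profile(seq, window=104):
    """Number of bases identical to the template for every window start."""
    seq = seq.upper()
    tmpl = template(window)
    return [sum(a == b for a, b in zip(seq[s:s + window], tmpl))
            for s in range(len(seq) - window + 1)]

def best_start(seq, window=104):
    """Start of the best-matching window; seq must be at least `window` long."""
    scores = match_profile(seq, window)
    return max(range(len(scores)), key=scores.__getitem__)

# Synthetic check: an ideal segment embedded between uninformative flanks
toy = "N" * 20 + template(104) + "N" * 20
print(best_start(toy))
```

Integer arithmetic on the fraction 52/5 is used so that the phase of the template is exact at every position. Natural sequences resemble the ideal pattern only weakly, so a realistic score would weight dinucleotide preferences rather than count identical letters.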

The appearance of the CG dinucleotide in the nucleosome positioning pattern is rather surprising, considering its generally low occurrence in eukaryotic sequences. Recent studies indicate, however, that the CG element does play a special role. First, it displays 10.4 base periodicity almost as often as the AA and TT dinucleotides do (25). Second, in the genome of the honey bee it is actually the major periodical component, while in the case of D. discoideum and H. sapiens the periodicity is displayed by the CG dinucleotides only (25). Finally, in the Alu sequences the CG elements appear at a distance of 31-32 bases (10.4 × 3) from one another (34), suggesting involvement of these sequences in nucleosomes. Methylation/demethylation of CpG would modulate the nucleosome stability, so that the CG-containing nucleosomes could be considered “epigenetic nucleosomes” (34).
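A dinucleotide distance analysis of the kind referred to above can be sketched in a few lines of Python: collect the positions of a dinucleotide and build a histogram of the distances between them. Peaks near multiples of 10.4 (for example 31-32 bases, since 10.4 × 3 = 31.2) would indicate periodicity. This is a simplified illustration, not the procedure of (25) or (34); the function name, the distance cutoff and the toy sequence are assumptions.

```python
from collections import Counter

def distance_histogram(seq, dinuc="CG", max_distance=40):
    """Histogram of distances between all pairs of `dinuc` occurrences up to a cutoff."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    hist = Counter()
    for a, p in enumerate(positions):
        for q in positions[a + 1:]:
            if q - p > max_distance:
                break  # positions are sorted, so later ones are farther still
            hist[q - p] += 1
    return hist

# Toy sequence (hypothetical): CG elements placed 31 and 32 bases apart
toy = "CG" + "A" * 29 + "CG" + "A" * 30 + "CG"
print(sorted(distance_histogram(toy).items()))
```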

One may naturally ask why the CG/AT sequence has not been identified before as a nucleosome positioning motif. The truth is that an ideally bendable sequence, such as the CG/AT pattern, would perhaps cause exceptionally tight binding of DNA to the octamers. A large number of strong nucleosomes would be energetically rather expensive to unfold during template processes. Besides, since genomic sequences carry many different overlapping messages (codes) (35, 36), a strong sequence constraint towards the ideal nucleosome positioning pattern would exclude any other messages from the sequences. The golden mean is therefore just a similarity of natural sequences to the ideal pattern, and a very low similarity at that. This explains why it took so long to finally reveal the pattern. The earlier nucleosome DNA sequence collections (10, 13, 37) were not sufficiently large (of the order of several hundred sequences) to ensure reliable processing of the weak signal. With the latest database of 160,000 sequences (2) this became possible.

The weak signal is not a problem for the histone octamer, though. It may select the most bendable segments in random sequence DNA, in an experimental setting as in (18, 19). It may also find suitable pieces in prokaryotic sequences (38) that have never been in contact with histones in vivo. One analogy comes to mind: nucleosomes are like rain puddles on a country road. Each puddle finds its place, no matter how shallow. As experimental nucleosome mapping indicates, most nucleosomes have only marginal stability (39). This does not mean, however, that their positions are fully uncertain.

Conclusion

The chromatin code is a well-hidden, weak periodical DNA sequence pattern that is recognized by histone octamers.

The repeat unit of the pattern consists of eight dinucleotides (GG, GA, AA, AT, TT, TC, CC and CG) fused together in the sequence CGGAAATTTCCG. Each of the dinucleotides occupies a certain position within the repeat, such that the respective dinucleotide stack in DNA is optimally oriented relative to the surface of the octamer, minimizing the energy of deformation (unstacking) of DNA base pairs.
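That the sequence CGGAAATTTCCG is made of exactly these eight dinucleotides can be confirmed with a small Python sketch, which lists the distinct overlapping dinucleotides in order of first appearance. The function name is an illustrative assumption.

```python
def distinct_dinucleotides(seq):
    """Distinct overlapping dinucleotides, in order of first appearance."""
    seen = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in seen:
            seen.append(step)
    return seen

steps = distinct_dinucleotides("CGGAAATTTCCG")
print(len(steps), steps)
```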

The CG·CG stacks are centered at the minor grooves oriented towards the octamers, while the AT·AT stacks are centered at the minor grooves oriented outwards. Both self-complementary elements are thus positioned at local axes of dyad symmetry. The whole motif is complementarily symmetrical, as required by the deformation properties and helical symmetry of the DNA duplex.
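The complementary symmetry stated here means that the motif reads the same on both strands in the 5’-3’ direction. A brief Python check, with an illustrative helper name, shows this for the whole motif, for the repeat unit and for the two central elements CG and AT.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5'-3'."""
    return seq.translate(COMPLEMENT)[::-1]

for element in ("CGGAAATTTCCG", "GGAAATTTCC", "CG", "AT"):
    print(element, reverse_complement(element) == element)
```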

To ensure preferential bending of DNA in a certain direction, only a rather small proportion of the optimally positioned dinucleotides is necessary, so that any given nucleosome DNA sequence bears only weak resemblance to the ideal repeating pattern.

In short, the chromatin code is the repeat (GGAAATTTCC)n. It is an easy guide, though not readily visible.

This research was reported by the author in part at the 16th Conversation in Biomolecular Structure and Dynamics, Albany 2009 (40).

Acknowledgements

This work has been partially supported by the Israel Science Foundation (grant 222/09). Discussions with Idan Gabdank, Danny Barash and Zakharia Frenkel are highly appreciated.

References and Footnotes

  1. I. Gabdank, D. Barash, and E. N. Trifonov. J Biomol Struct Dyn 26, 403-412 (2009).
  2. S. M. Johnson, F. J. Tan, H. L. McCullough, D. P. Riordan, and A. Z. Fire. Genome Res 16, 1505-1516 (2006).
  3. A. B. Cohanim, Y. Kashi, and E. N. Trifonov. J Biomol Struct Dyn 23, 559-566 (2006).
  4. G. Mengeritsky and E. N. Trifonov. Nucl Acids Res 11, 3833-3851 (1983).
  5. W. Linxweiler and W. Hörz. Nucl Acids Res 12, 9395-9413 (1984).
  6. H. R. Drew and C. R. Calladine. J Mol Biol 195, 143-173 (1987).
  7. N. Kaplan, I. K. Moore, Y. Fondufe-Mittendorf, A. J. Gossett, D. Tillo, Y. Field, E. M. LeProust, T. R. Hughes, J. D. Lieb, J. Widom, and E. Segal. Nature 458, 362-366 (2009).
  8. E. N. Trifonov and J. L. Sussman. Proc Natl Acad Sci USA 77, 3816-3820 (1980).
  9. E. N. Trifonov. Nucl Acids Res 8, 4041-4053 (1980).
  10. I. Ioshikhes, A. Bolshoy, K. Derenshteyn, M. Borodovsky, and E. N. Trifonov. J Mol Biol 262, 129-139 (1996).
  11. F. Salih, B. Salih, and E. N. Trifonov. J Biomol Struct Dyn 26, 273-281 (2008).
  12. V. B. Zhurkin. FEBS Lett 158, 293-297 (1983).
  13. S. C. Satchwell, H. R. Drew, and A. A. Travers. J Mol Biol 191, 659-675 (1986).
  14. T. E. Shrader and D. M. Crothers. Proc Natl Acad Sci USA 86, 7418-7422 (1989).
  15. H. R. Chung and M. Vingron. J Mol Biol 386, 1411-1422 (2009).
  16. A. Bolshoy. Nature Struct Biol 2, 446-448 (1995).
  17. P. Baldi, S. Brunak, Y. Chauvin, and A. Krogh. J Mol Biol 263, 503-510 (1996).
  18. H. R. Widlund, H. Cao, S. Simonsson, E. Magnusson, T. Simonsson, P. E. Nielsen, J. D. Kahn, D. M. Crothers, and M. Kubista. J Mol Biol 267, 807-817 (1997).
  19. P. T. Lowary and J. Widom. J Mol Biol 276, 19-42 (1998).
  20. S. B. Kogan, M. Kato, R. Kiyama, and E. N. Trifonov. J Biomol Struct Dyn 24, 43-48 (2006).
  21. E. N. Trifonov. J Theor Biol 263, 337-339 (2010).
  22. T. J. Richmond and C. A. Davey. Nature 423, 145-150 (2003).
  23. A. Krueger, E. Protozanova, and M. D. Frank-Kamenetskii. Biophys J 90, 3091-3099 (2006).
  24. P. T. McNamara, A. Bolshoy, E. N. Trifonov, and R. E. Harrington. J Biomol Struct Dyn 8, 529-538 (1990).
  25. T. Bettecken and E. N. Trifonov. PLoS ONE 4, e7654 (2009).
  26. A. B. Cohanim, Y. Kashi, and E. N. Trifonov. J Biomol Struct Dyn 22, 687-694 (2005).
  27. M. Costantini and G. Bernardi. Proc Natl Acad Sci USA 105, 13971-13976 (2008).
  28. Abstracts of Albany 2009: 16th Conversation. June 16-20, Albany, New York, USA; I. Gabdank, D. Barash, E. N. Trifonov. Complete Nucleosome DNA Bendability Matrix and Sequence-Directed Nucleosome Mapping (C. elegans), Abstract #201. J Biomol Struct Dyn 26, 787-927 (2009).
  29. http://www.cs.bgu.ac.il/~nucleom/
  30. K. Luger, A. W. Maeder, R. K. Richmond, D. F. Sargent, and T. J. Richmond. Nature 389, 251-260 (1997).
  31. J. M. Harp, B. L. Hanson, D. E. Timm, and G. J. Bunick. Acta Cryst D 56, 1513-1534 (2000).
  32. C. A. Davey, D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond. J Mol Biol 319, 1097-1113 (2002).
  33. M. S. Ong, T. J. Richmond, and C. A. Davey. J Mol Biol 368, 1067-1074 (2007).
  34. F. Salih, B. Salih, S. Kogan, and E. N. Trifonov. J Biomol Struct Dyn 26, 9-15 (2008).
  35. E. N. Trifonov. Bull Math Biol 51, 417-432 (1989).
  36. E. N. Trifonov. In: Encyclopedia of Molecular Biology, (Ed. T. E. Creighton) John Wiley & Sons, Inc., New York, pp. 2324-2326 (1999).
  37. M. Kato, Y. Onishi, Y. Wada-Kiyama, T. Abe, T. Ikemura, S. Kogan, A. Bolshoy, E. N. Trifonov, and R. Kiyama. J Mol Biol 332, 111-125 (2003).
  38. N. Ramsay, G. Felsenfeld, B. M. Rushton, and J. D. McGhee. EMBO J 3, 2605-2611 (1984).
  39. A. Valouev, J. Ichikawa, T. Tonthat, J. Stuart, S. Ranade, H. Peckham, K. Zeng, J. A. Malek, G. Costa, K. McKernan, A. Sidow, A. Fire, and S. M. Johnson. Genome Res 18, 1051-1063 (2008).
  40. Abstracts of Albany 2009: 16th Conversation. June 16-20, Albany, New York, USA; E. N. Trifonov. Nucleosome Positioning by Sequence, State of the Art, Abstract #210. J Biomol Struct Dyn 26, 787-927 (2009).