Issue February 2006, No. 4 (p. 357-484), ISSN 0739-110
Open Access

Evidence for Long Poly(dA).Poly(dT) Tracts in D. discoideum DNA at High Frequencies and Their Preferential Avoidance of Nucleosomal DNA Core Regions (p. 429-446)

The eukaryote Dictyostelium discoideum has one of the most (A+T) rich genomes studied to date. Isolated nuclear D. discoideum DNA (AX3 strain) was used to qualitatively determine the frequency and length distribution of long (dA).(dT) homopolymer tracts in this genome, in comparison to the less (A+T) rich calf thymus and Schistosoma mansoni DNAs, which had few observable long tracts. These experimental data accurately reflect the significantly elevated frequencies of long tracts found computationally within the D. discoideum intron and flanking sequences, but not exons. PCR amplification of sequences containing long (dA).(dT) homopolymer tracts was carried out. Hybridization of a biotinylated (dT)18 probe to the PCR amplified DNA then showed that the long (dA).(dT) homopolymer tracts were enriched in D. discoideum sequences only hundreds of base pairs in length, under conditions where no equivalent hybridization was observed to S. mansoni DNA or calf DNA sequences. Similar probe hybridization to DNA isolated following micrococcal nuclease digestion of D. discoideum chromatin demonstrated that long (dA).(dT) homopolymer tracts were more highly enriched in nucleosomal DNA lengths that included the internucleosomal linker than in shorter, linker-free mononucleosomal lengths. This observation is in agreement with the frequency of tract spacing results calculated from GenBank sequence data. These frequency data indicate that adjacent long tracts plus the intervening spacer DNA are found at peak lengths (average 42bp) exactly characteristic of the internucleosomal spacer region of D. discoideum chromatin, and are in sufficient number to be found in nearly half of all nucleosomes.
Compared to shuffled tract sequence controls, these lengths of adjacent long tracts plus the intervening spacer DNA were found to be significantly enriched. Lesser enrichments are observed at lengths corresponding to adjacent tracts being separated by nucleosomal core length DNA sequences (145-185bp). These data strongly suggest that adjacent long tracts occur spaced at selected lengths so as to avoid the central core regions of nucleosomes and instead are found localized within internucleosomal DNA linker and core edge regions in D. discoideum chromatin.
Key words: Dictyostelium DNA; Frequency distributions; Homopolymer tracts; Length distributions; Nucleosomes; and poly(dA).poly(dT).

Kenneth A. Marx, Ph.D.* Center for Intelligent Biomaterials

Introduction

The slime mold D. discoideum, long studied as a simple developmental model system, is a lower eukaryote for which considerable genetic linkage and DNA sequence information exists. We have studied this organism's DNA because of its unusually high (A+T) base content (76%), which makes it of interest from a physical chemical and information theory perspective. We previously studied the experimental melting of this organism's whole nuclear and fractionated DNA (1, 2). From both the melting of D. discoideum DNA and an examination of GenBank sequences for this organism, it is clear that the genome exhibits a high frequency of long (dA).(dT) homopolymer tracts occurring out to quite long lengths. In a previous study, we reported that long poly(dA).poly(dT) homopolymer tracts occurred at high frequencies in D. discoideum DNA intron and flanking sequences, but not coding sequences (3). Poly(dA).poly(dT) tracts possess unusual physical and structural properties. Compared to non-homopolymer tract sequences, these homopolymer tracts exhibit the following: an unusual propeller twist of the base pairs leading to bifurcated H-bonds within a base pair to adjacent bases, a shorter helical repeat, a narrower and deeper minor groove with a defined spine of hydration, and a straighter and more rigid helix over longer lengths (4, 5, 6, 7). Perhaps as a result of these unusual properties, accumulating evidence demonstrates that long homopolymer tracts are excluded from being complexed within nucleosome core structures, where DNA undergoes maximum interactions with the core histones (8, 9, 10, 11, 12, 13, 14). This restricted location may be functional as well, since long tracts have also been demonstrated to have promoter activity (14, 15, 16, 17).
The higher than expected frequency of long poly(dA).poly(dT) tracts and their greater than expected lengths are a general property of nearly all eukaryotic organisms, as we have documented in a wide survey of 27 eukaryotes (18). In D. discoideum in particular, we have shown that the tracts are more frequent and longer than expected when compared to occurrence frequencies and lengths in randomly generated DNA sequences of equivalent base composition (19). Also, in D. discoideum, the poly(dA).poly(dT) tracts in intron and flanking DNA regions occur in two frequency vs. length N classes. The first is composed of short tracts, N < 8bp, where frequencies are as expected from random occurrence at that base composition. The second class is composed of long tracts, N > 10bp, where frequencies are far greater than expected on a random occurrence basis for that base composition. Within the 8-10bp transition range, the frequencies gradually increase from those characteristic of the first class to those of the second class of highly enriched long tracts. The frequent occurrence of long tracts (> 10bp) in most eukaryotes suggests their origin via the operation of slip-strand events during replication (20, 21).

Long poly(dA).poly(dT) tracts are not spaced randomly in D. discoideum DNA, but occur with an average periodicity of 185-190bp (19), matching the nucleosome size, 187bp, experimentally determined in D. discoideum chromatin (22). Furthermore, in a small study of the then available DNA sequences, adjacent long tracts were found to have a length of spacer sequence plus two adjacent tracts that would fit exactly into the average experimentally determined nucleosomal linker DNA region of size 42bp (19). Both of the above facts suggest that long d(A).d(T) tracts occur preferentially packaged within the nucleosomal linker regions of D. discoideum chromatin. In the two previous poly(dA).poly(dT) tract studies of D.
discoideum DNA mentioned above, we examined only the small fraction (0.34%) of this organism's genome then available as single gene containing files in GenBank. Given the predominance of these tracts in D. discoideum, we decided in the current study to further examine the frequency and length distribution of these (dA).(dT) homopolymer tracts. While large sections of the D. discoideum genome have recently been sequenced, these have not been sufficiently annotated with exon and intron locations for our use in the current study (23, 24). Therefore, we examined the tracts experimentally via hybridization experiments on the whole genome and compared the results to the values calculated from the ∼10-fold larger set of D. discoideum DNA single gene containing sequence files currently available from GenBank (∼3% of the genome). We utilized a total of 1,399,375bp of nuclear DNA in the D. discoideum GenBank files. The tract frequencies of D. discoideum DNA were determined as a function of length for GenBank sequences broken up into total coding, intron, and flanking files. We also determined the length distribution of adjacent tracts and their intervening spacer DNAs and showed that this distribution was enriched at a variety of lengths that would fit tracts within the nucleosomal linker region of D. discoideum chromatin and avoid the central core regions of nucleosomes. This conclusion agreed with the results of the D. discoideum DNA hybridization experiments we carried out using the biotinylated poly(dT)18 probe.

Materials and Methods

D. discoideum DNA Isolation

DNA was isolated from nuclei of log phase D. discoideum AX3 strain cells. The isolation has been described previously in detail (2). In summary, purified nuclei were treated sequentially with NaCl, SDS, and RNase, followed by extraction three times with equal volumes of phenol:chloroform:isoamyl alcohol (100:100:4).
DNA was then ethanol precipitated, washed two times with 70% ethanol, air dried, and resuspended in TE buffer (10mM Tris, 1mM EDTA, pH 8.0). Only DNA with the EcoRI restriction pattern characteristic of D. discoideum was used for further experiments (25).

CsCl Gradient Fractionation

DNA was purified from RNA by ultracentrifugation in CsCl gradients (40,000rpm for 36hr in a 70 Ti Beckman rotor) containing ethidium bromide. The DNA band was removed slowly from the gradient using a 16-gauge needle. Ethidium bromide was extracted four times in buffer-saturated butanol. Fractionation of DNA in netropsin (Boehringer Mannheim) CsCl gradients containing TE buffer was performed at a 2:1 weight ratio of netropsin:DNA. The gradients in quickseal tubes were ultracentrifuged at 45,000rpm for 30hr in a Beckman 70 Ti rotor. Gradients were fractionated from the top using a Buchler Autodensiflow II device. OD260nm and fluorimetric measurements (Hoechst 33258 dye and Hoefer TKO Minifluorometer with base composition correction) were made of the gradient fractions following dilution with buffer. DNA fractions were extracted four times with CsCl-saturated isopropanol and dialyzed into buffer for the following experiments.

Enzymatic Digestion, Blotting, and Hybridization

Slot blots were carried out on CsCl gradient fractions containing DNA with and without netropsin. Using UV spectroscopy to determine concentration, the volume of pooled gradient fractions containing 0.25μg DNA was loaded onto a Millipore Immobilon-S nylon membrane using a slot blot apparatus. The DNA was denatured using 500μl of 1.5M NaCl, 0.5M NaOH, and then neutralized with 500μl neutralization buffer (1M Tris, 1.5M NaCl, pH 8.0). The membrane was air dried and the DNA crosslinked in a BioRad GS Gene linker UV chamber using the C2 program at 50mJ if the membrane was dry or the C3 program at 150mJ if the membrane was damp.
For EcoRI digestion, 20U EcoRI (New England Biolabs) was added to 1μg DNA in 10μl of 1× EcoRI buffer, and the solution was incubated 16hr at 37 °C. For micrococcal nuclease digestion of chromatin in isolated nuclei, cells were harvested at the end of the exponential growth phase (at 5-6 × 10⁷ cells/ml). Nuclei were isolated as described previously (2) and then washed twice in incubation buffer (60mM KCl, 15mM NaCl, 1mM CaCl2, 20mM Tris-HCl, pH 7.8). The nuclei were resuspended in this buffer at 3 × 10⁹ cells/ml and then incubated at 37 °C with varying concentrations of micrococcal nuclease (30U/ml, 60U/ml, 150U/ml), with reaction aliquots taken at times of 3, 6, 12, and 24min. The reactions were stopped with 5mM EDTA and placed on ice. Phenol extractions of each DNA sample were then performed as previously described. Electrophoresis was carried out in agarose gels (IBI, Ultra Pure Molecular Biology Grade) in TE buffer. Staining was carried out with ethidium bromide and gels were photographed under 302nm illumination using an IBI camera system. For Southern blotting, the gel was denatured by slow speed rotary shaking for 1hr in 1.5M NaCl, 0.5M NaOH, followed by neutralization for another 1hr interval in neutralization buffer. DNA was transferred overnight to a Millipore Immobilon-S membrane by capillary action using 10× SSC (1.5M NaCl, 0.15M Na3C6H5O7). Wells were marked on the membrane with a biotinylated DNA and the DNA was UV crosslinked using the C3 program at 150mJ. In experiments where hybridization to nucleosomal DNA was performed, the control probe was D. discoideum total nuclear DNA, which was random primer labelled with a Kirkegaard & Perry Laboratories (KPL) DNA biotinylation kit. This produced a randomly biotinylated cytosine labelled DNA probe. Hybridization reactions were performed as follows. In a small hybridization tube, the membrane was allowed to stick to the side of the tube with the DNA side facing inward.
The tube contained 5ml hybridization fluid (6×SSC, 0.5% SDS, and 5×Denhardt solution), 100μg denatured salmon sperm DNA carrier (Sigma), and probe nucleic acid at 50ng/ml. The probe nucleic acid was either a biotinylated poly(dT)18 or random primer biotinylated cytosine labelled D. discoideum DNA. The former probe was synthesized at the Massachusetts General Hospital DNA Synthesis Core Facility. Hybridization was performed at 55 °C for 20hr in a Techne HB-20 Hybridizer. Washing was then performed for 30min in wash buffer (2× SSC and 0.1% SDS) at various wash temperatures. Hybridization signal detection was carried out using chemiluminescence via the Phototope-Star kit protocol (New England Biolabs). Streptavidin was bound to biotinylated marker DNA or hybridized biotinylated poly(dT)18, followed by binding of biotinylated alkaline phosphatase to the streptavidin. CDP Star substrate was then applied, and 15min later the integrated chemiluminescence was measured with Kodak X-ray film for up to a 30min exposure, followed by film development and fixation.

PCR Reactions and Product Stabilities

PCR was carried out using the Perkin Elmer AmpliTaq DNA polymerase PCR kit. The reaction volume of 100μl contained 1μg poly(dA)18 primer, 5U of AmpliTaq DNA polymerase, 1μg of template DNA, 200μM each of dCTP, dGTP, dTTP and dATP, and a varying 0.5-2.5mM MgCl2, all in a buffer of 50mM KCl, 10mM Tris.HCl, pH 8.3. Three temperatures were used in the thermocycler for the first five cycles: denaturation at 92 °C for 1min; annealing at 38 °C for 1min; and extension at 72 °C for 1min. The annealing temperature is sequence dependent and was determined by the commonly used relationship: [2(A&T number) + 3(G&C number) + 2] °C. For poly(dA)18, 2(18) + 3(0) + 2 = 38 °C. This temperature is not a stringent annealing temperature and was used only during the first five cycles to ensure product formation.
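The annealing temperature arithmetic quoted above can be checked with a short sketch. The function name and its parameters are hypothetical, introduced only for illustration; the weights (2 per A or T, 3 per G or C, plus 2) are taken as stated in the text and are not offered as a general-purpose melting temperature model.

```python
def annealing_temp_c(primer: str, at_weight: int = 2, gc_weight: int = 3, offset: int = 2) -> int:
    """Rule-of-thumb annealing temperature (in degrees C) as quoted in the text.

    Hypothetical helper: the weights and offset are assumptions copied from
    the relationship [2(A&T number) + 3(G&C number) + 2].
    """
    seq = primer.upper()
    n_at = seq.count("A") + seq.count("T")
    n_gc = seq.count("G") + seq.count("C")
    if n_at + n_gc != len(seq):
        raise ValueError("primer must contain only A, C, G, or T")
    return at_weight * n_at + gc_weight * n_gc + offset


# The poly(dA)18 primer used in the first five PCR cycles
print(annealing_temp_c("A" * 18))  # 38
```

For an all-A primer the G&C term vanishes, so the result depends only on primer length, which is why the text reduces the calculation to 2(18) + 2.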
The next 30 cycles were carried out under the following temperature conditions: denaturation at 90 °C for 30sec; annealing at 40 °C for 30sec; and extension at 72 °C for 30sec. The final extension incubation was carried out at 72 °C for 10min to ensure that all PCR products were fully extended. In experiments where the PCR product stabilities were assayed, the following procedure was carried out. Multiple wells of a 1% agarose gel were loaded with identical masses of the PCR product. The samples were electrophoresed and the DNA transferred and crosslinked to a Millipore Immobilon-S membrane as previously described. Hybridization was performed at 38 °C with biotinylated poly(dT)18, and then the membrane was soaked in wash buffer (2×SSC and 0.1% SDS). Each DNA sample in a separate lane of the gel was cut out and washed at a different temperature ranging from 30-55 °C for 30min. Once a membrane was washed, it was soaked in blocking buffer. All samples were processed for chemiluminescence detection simultaneously as described above.

Computational Tract Frequency Determination

The single copy gene sequences for D. discoideum were retrieved from the public sequence databases GenBank, EMBL, and DDBJ (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide). We retrieved and utilized only nuclear DNA sequences for D. discoideum and S. mansoni. We used the program CleanUP (26), as previously described (18), to rid the retrieved sequences of redundancies that could artifactually bias our calculated tract frequencies. Then the COMPILE program (27) was used to extract raw sequences from the GenBank-formatted documents and to sort subsequences from them into the following functional categories: coding, intron, and flanking. These functional categories contained sequences as ASCII text files in which the ends of individual sequences were tagged to prevent artifactual joining, thereby preventing potential artifactual tracts from being tabulated.
Each functional file was subsequently analyzed using the program "Poly" (www.bioinformatics.org/poly, 28), which calculates parameters for non-overlapping homopolymer tracts, including the frequencies of the homopolymer tracts of different types and of different lengths. For the analysis of the length distribution of two adjacent tracts and the DNA spacing those tracts in D. discoideum, we utilized a program called Filter that reads through the FASTA format of the database files and converts them to a long sequence string of output without any formatting. We next used this output in the Spacer-Tract program. In this program we use the terms tract and spacer. A tract refers only to homopolymer tracts of the poly(dA).poly(dT) type, where A base runs and T base runs in the sequence strand being counted are treated identically. The poly(dC).poly(dG) tracts of any length occurring in the sequence strand are counted as non-tract sequences or spacers in these calculations. Tracts occurring at the exact beginnings and ends of sequence strings are not counted as tracts because of the uncertainty in assigning exact lengths. In this way, we avoid what could be considered "end effects" of sequence strings that could introduce artifactual cases of the length quantity we are calculating. A spacer sequence refers to any sequence which lies between any two adjacent poly(dA).poly(dT) tracts. Spacer-Tract is a C language program designed especially for batch-processing sequence files. Spacer-Tract reduces the input sequence string into a sequence of numbers which represent alternating tract lengths N and spacer lengths. From this analysis, the user can select the tract length N that they wish to examine. With this N criterion identified, Spacer-Tract then reads through the numeric string and tabulates all instances of the length of spacer plus two adjacent tracts, where the two adjacent tracts both satisfy the criterion of being N or greater.
Tracts in the string of length < N are considered as spacers in this analysis. After all instances of the spacer plus two adjacent tract lengths have been tabulated, the length instances identified are converted to frequencies by dividing the number of instances at the given length by the total sequence string length that was counted. It should be noted that we avoid concatenation of individual sequence files in tabulating instances of lengths of spacer plus two adjacent tracts, since this would lead to artifactual instances at the ends of adjacent sequence files. Therefore, instances are tabulated only within individual sequences and then instances are summed for all sequences before frequencies of the given lengths are tabulated.

Computational Tract Shuffling

We carried out two types of sequence shuffling of the original sequences used in our study, followed by determination of the length frequencies in these shuffled sequences. The objective was to compare the distribution in the shuffled sequences to that from the original sequence frequencies. The two types of shuffling are random shuffling (Rshuf) and conserved shuffling (Cshuf). Programs with these names were written to carry out the respective types of shuffling. In Rshuf, the position of each base was randomized in the output sequence using a random number generator at each base position to exchange its position with the base at any other possible position in the sequence. This was performed sequentially at each position throughout the sequence. We examined the effect of one Rshuf cycle versus carrying out this process ten times on each successive output sequence from the Rshuf program. The resulting calculated frequencies between the first and the tenth run through Rshuf showed no differences, indicating that the randomization was complete after one Rshuf cycle.
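The tabulation and random shuffling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C programs: the function names are hypothetical, runs of A and runs of T on the counted strand are both treated as tracts, tracts touching either end of a sequence are skipped, and Python's standard `random.shuffle` (a Fisher-Yates shuffle) stands in for the position-by-position exchange performed by Rshuf.

```python
import random
import re
from collections import Counter


def tract_spans(seq, n_min):
    """Return (start, end) spans of A-runs or T-runs with length >= n_min.

    Runs shorter than n_min are left out, so they count as spacer.
    Runs touching either end of the sequence are skipped because their
    true length is uncertain.
    """
    spans = []
    for m in re.finditer(r"A+|T+", seq.upper()):
        if m.end() - m.start() < n_min:
            continue
        if m.start() == 0 or m.end() == len(seq):
            continue
        spans.append((m.start(), m.end()))
    return spans


def spacer_plus_tracts(seqs, n_min):
    """Frequency of each 'spacer plus two adjacent tracts' length.

    Instances are counted within each sequence separately (no
    concatenation), summed, and divided by the total length counted.
    """
    counts = Counter()
    total = 0
    for seq in seqs:
        total += len(seq)
        spans = tract_spans(seq, n_min)
        for (start1, _), (_, end2) in zip(spans, spans[1:]):
            counts[end2 - start1] += 1
    return {length: c / total for length, c in counts.items()}


def rshuf(seq, rng):
    """Randomly permute all base positions (stand-in for Rshuf)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Toy sequence: a 12bp A tract and an 11bp T tract separated by 10bp
example = "GC" + "A" * 12 + "GCGCGCGCGC" + "T" * 11 + "GC"
print(spacer_plus_tracts([example], n_min=10))
print(spacer_plus_tracts([rshuf(example, random.Random(0))], n_min=10))
```

In the toy sequence the single instance has length 12 + 10 + 11 = 33bp. Comparing the frequencies from the original sequences with those from shuffled copies is the kind of control comparison the text describes.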
It should be noted that this Rshuf procedure statistically eliminates all enrichments of the longer length tracts found at high frequencies that we demonstrated to occur in the original D. discoideum sequences. The Cshuf program carries out a sequence randomization at individual base positions in the same way that Rshuf does, except that prior to base position randomization all tracts were removed, conserving the length and type (T or A runs) of the tracts. The sequence was then shuffled randomly, followed by the conserved tracts being added back randomly. This conservative shuffled sequence is a more accurate control with which to judge the tract distribution properties of the D. discoideum DNA, since all length tracts at their exact frequencies are conserved following shuffling. The primary change will be in randomization of the lengths of the spacer DNAs between tracts as well as the tract positions. We repeated this Cshuf procedure ten times, each time feeding the program output back into the program before using the final output in the Spacer-Tract program to determine the distribution of the lengths of the spacer plus two adjacent tracts in these conserved shuffled sequences.

Results and Discussion

D. discoideum DNA Fractionation and dT18 Localization

D. discoideum DNA has been previously studied by fractionation in netropsin-containing CsCl gradients, at a 2:1 netropsin:DNA ratio (1, 2, 25). In the case of the first two references, the equilibrium high resolution melting properties of the different netropsin CsCl gradient fractions were investigated, and those fractions containing the multiple sharp melting subtransitions from two satellite DNAs were identified. The satellite DNAs are comprised of the 10 major repetitive sequence EcoRI fragments (100-200 copies/cell) observed in the total restriction digest of this genome.
Enrichment of the satellite DNA EcoRI bands observed by their equilibrium melting (1, 2) agreed with the enrichment of specific EcoRI bands demonstrated in different regions of the CsCl fractionated genome (25). In the present study, we have applied a CsCl fractionation approach, but also compared it to netropsin-containing CsCl gradient fractionation to aid in the resolution of DNA classes based on (A+T) base composition differences. The aim of this investigation is different: to demonstrate the genomic distribution of the d(A)n and d(T)n homopolymer tract sequences via hybridization of a d(T)18 probe to the genomic fractions. In Figure 1, we display two D. discoideum DNA CsCl gradient fractionations, one in the absence and the other in the presence of netropsin at a 2:1 molar ratio of netropsin to DNA bases. The gradient in the presence of netropsin is shifted to lower CsCl density relative to the gradient lacking netropsin. This is due to the known effect of netropsin binding in the DNA minor groove at AT base pairs (29), decreasing the CsCl density/bp of DNA (30). These fractionations are similar to those reported previously, in which the satellite DNAs are observed at higher densities in gradients both in the absence and presence of netropsin (2, 25). For example, fractions 62-84 in the Figure 1 netropsin-containing gradient correspond to the fraction A melted in our previous study, which contained the most enriched melting subtransitions that we demonstrated to be the satellite DNA/EcoRI repetitive sequence bands (2).
Figure 1: Fractionation of total D. discoideum DNA in CsCl gradients with (+) and without (-) netropsin. The gradient DNA distributions (μg/ml) are presented on the left Y-axis. Selected fractions from the gradients were hybridized to biotinylated (dT)18, and their normalized chemiluminescence hybridization levels are presented (symbols above gradients) for each gradient on the right Y-axis.
Our aim here is to determine the distribution of long homopolymer tracts in the D. discoideum genome. Therefore, we initially carried out slot blot hybridizations of the biotinylated d(T)18 probe to selected individual fractions throughout the CsCl gradient fractions presented in Figure 1. These were performed under stringent hybridization conditions where a control pC1 plasmid DNA exhibited zero hybridization intensity. Since this plasmid contains tracts of length up to but no longer than 8bp, we know that short tracts will not hybridize under our stringency conditions. The results of these hybridizations are presented normalized to the hybridization of total D. discoideum DNA probe to that same fraction. Normalization was performed with the hybridization of total D. discoideum DNA probe because this takes into account individual gradient fraction variations in all of the steps in the hybridization process, where DNA length effects in gel to membrane transfer or hybridization efficiency could seriously skew results. For both gradients, the data clearly reveal that long homopolymer tracts are distributed throughout all gradient fractions. For the netropsin containing gradient, nearly all the fractions assayed exhibit a similar level of homopolymer tracts. However, for the gradient lacking netropsin, there is a clear preferential enrichment of these homopolymer tracts in the lowest density fraction sequences. This enrichment decreases for fractions of higher density, reaching an enrichment value for the highest density fraction assayed, 113, nearly 3-fold lower than that of the lowest density fractions. In the Figure 2 A-C and D-F panels, we present respective results from agarose gel electrophoresis of the (-)netropsin and (+)netropsin gradient fractions from Figure 1. The total EcoRI digested genome is presented in the left most gel lanes of the two gels except for marker DNAs labelled M also on the left in panels A and D. 
In panels A and D we present the ethidium bromide fluorescence distribution from the respective gradient fractions of (-)netropsin and (+)netropsin gradients. The 10 EcoRI fragments are clearly observed above the background fragment fluorescence. DNAs from the gradient fraction numbers displayed are electrophoresed in the other lanes, with increasing density fractions loaded to the right. Most gradient fractions exhibit no significant differences compared to the total DNA distribution. In panels B and E we present autoradiography results following hybridization of biotinylated total D. discoideum DNA to the panels A and D gel contents following their transfer to membranes. Here a distribution of enriched EcoRI bands is observed that is similar to the panels A and D ethidium bromide band distributions. In both panels B and E, the fractions show clear enrichment in lower molecular weight repetitive sequence EcoRI bands. In both gradients, these enriched EcoRI bands are the same sequences that comprise the greatly enriched melting subtransitions we observed previously in the highest CsCl density fraction of the D. discoideum genome (2). When we removed the biotinylated total D. discoideum DNA probe from the membrane and then hybridized the (dT)18 probe, we obtained the respective autoradiography distributions in panels C and F. The one repetitive sequence EcoRI fragment clearly containing homopolymer tract enrichment in both panels is EcoRI band 3 (see arrow). Other than this prominent repetitive sequence band, the hybridization intensity in each well is distributed over a range of molecular weights, indicating that the long tracts occur within all sequence types. These hybridization data for all gradient fractions are in general agreement with the slot blot hybridizations presented in Figure 1, which demonstrate that tracts occur across all gradient fractions.
For example, the decrease in d(T)18 hybridization at increasing density fractions in the (-)netropsin gradient in Figure 1 is reflected in the loss of single copy sequence hybridizations for fractions of increasing density in Figure 2C.
Figure 2: Agarose gel electrophoresis of selected fractions from the (-)netropsin gradient: panels A-C and from the (+)netropsin gradient: panels D-F. Total D. discoideum DNA EcoRI digests are shown in the left most gel lanes labelled T. DNA size markers are also shown in lanes labelled M. Panels A and D present ethidium bromide stained distributions. Panels B and E present autoradiography results following hybridization of biotinylated total D. discoideum DNA. In panels C and F, the total D. discoideum probe was removed and hybridization with biotinylated (dT)18 was carried out followed by streptavidin-Alk. Phos. binding and autoradiography of the chemiluminescence.
Fragment Length Spacing Distributions of Long poly(dA).poly(dT) Tracts by PCR

We next decided to determine the distribution of spacing distances between long d(A).d(T) homopolymer tracts in D. discoideum DNA. Our aim was to use PCR amplification of segments of DNA sequences containing two long tracts, using poly A18 as a primer at a series of different Mg2+ concentrations. The logic was that adjacent tract sites in the genome, of large but varying size, would create amplified PCR fragments which could then be visualized and sized on agarose gels. This method would reveal the size distribution of a subset of all adjacent tracts, those with an adjacent (dA)n and a (dT)n tract on the same strand, plus the spacer DNA between those tracts. No use of tracts as primers could reveal the distribution of all tracts simultaneously. For example, it is not possible to amplify adjacent d(A)n tract or adjacent d(T)n tract neighbor pairs on the same strand, since the required self-complementary (dA)n and (dT)n primers could not be used successfully in the same PCR amplification experiment. The use of these non-traditional PCR primers in our experiments required a number of controls to ensure that the possible primer dimer artifact was not occurring. Therefore, we performed the following controls (data not shown). Blank control reactions containing no DNA at various Mg2+ concentrations ([Mg2+]) were performed and no PCR products were observed on gels. The pcDNA1 plasmid, containing no homopolymer tracts longer than [d(A).d(T)]8 in its sequence, also produced no observable PCR gel bands at any [Mg2+] used in the data presented here. This control serves to show that in our experiments we are detecting the genomic spacing of two homopolymer tracts greater than [d(A).d(T)]8 in size. In addition to the D. discoideum genome, we carried out similar experiments on two other genomes, Schistosoma mansoni and calf. S. mansoni DNA is less (A+T) rich (70.6%) than D.
discoideum DNA (76%) and calf DNA is similar to other mammals in (A+T) base composition (54%) (31). Calf thymus DNA, at all [Mg2+], produced no observable PCR products on gels (data not shown). These results are consistent with the much higher (G+C) base composition of calf thymus DNA and its lower tract frequencies at similar tract lengths (18). This result, along with the two controls mentioned above, prove that potential PCR primer dimer artifacts are not being produced in our experiments. In Figure 3, the results from PCR amplifications of the total D. discoideum and S. mansoni genomes carried out at a series of [Mg2+] are shown. The individual samples were electrophoresed, transferred to nylon hybridization membranes, hybridized to biotinylated (dT)18 and developed for chemiluminescence. In the case of PCR fragments produced from S. mansoni DNA, there appears to be only one well defined band, approximately 1300 bp in length. Since this results from PCR with d(A)18 at the very stringent [Mg2+] conditions of 0.5 and 1.0mM, this fragment must be defined by two long uninterrupted homopolymer tracts, one d(A)n and the other d(T)n, defining the fragment on one single strand. This fragment is totally absent in the gel at [Mg2+] 1.5mM and higher (2.0mM and 2.5mM results not shown). Clearly, PCR amplification is not occurring under the less stringent conditions at these long tracts. The absence of any other fragments or continuous background intensity in the 1.5mM [Mg2+]sample lane strongly suggests that no other homopolymer tracts of shorter lengths (∼n>8), exist in the S. mansoni genome spaced close enough together, say within approximately 3000 bp the effective PCR technique cut-off, to be amplified. This is consistent with the data we present later in Figure 4, where S. mansoni is observed to lack long tracts and where the observed tracts at any given length, ∼n=10 being the maximum observed, occur at significantly lower frequencies than those found in D. 
discoideum.
Figure 3: PCR amplifications of D. discoideum DNA and S. mansoni DNA at varying [Mg2+] using (dA)18 as primer. Following transfer from gel to filter membranes, biotinylated (dT)18 was hybridized, followed by streptavidin-alkaline phosphatase binding and autoradiography of the chemiluminescence.
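The single-primer logic behind these amplifications can be illustrated with a short sketch. A lone (dA)18 primer can only amplify a segment whose strand begins with an A tract and ends with a T tract, because the product must carry the primer sequence at its 5' end and the primer's complement at its 3' end. The function name, the use of full tract-to-tract span as the fragment length, and the pairing of each A tract with its nearest downstream T tract are illustrative assumptions, not the procedure used to analyze the gels.

```python
import re

def amplifiable_fragments(top_strand, min_tract=18, max_len=3000):
    """Approximate lengths of A-tract ... T-tract segments that a lone
    (dA)n primer could amplify. Hypothetical helper for illustration only."""
    seq = top_strand.upper()
    # Maximal A runs and T runs of at least min_tract bases on this strand.
    a_runs = [m.span() for m in re.finditer(r"A{%d,}" % min_tract, seq)]
    t_runs = [m.span() for m in re.finditer(r"T{%d,}" % min_tract, seq)]
    lengths = []
    for a_start, a_end in a_runs:
        # Only a T tract lying downstream (3') of the A tract gives
        # converging primer sites; take the nearest one.
        downstream = [t for t in t_runs if t[0] >= a_end]
        if downstream:
            t_end = downstream[0][1]
            length = t_end - a_start  # both tracts plus the spacer
            # ~3000 bp is the effective PCR cut-off quoted in the text.
            if length <= max_len:
                lengths.append(length)
    return lengths
```

With this sketch, a strand reading A18, spacer, T18 yields one fragment, whereas T18, spacer, A18 yields none, which mirrors the statement that only a subset of adjacent tract pairs is sampled by this primer.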
In Figure 3, D. discoideum DNA PCR fragments are displayed for a series of increasing [Mg2+] from 0.5 to 2.5 mM. The results are very different from those observed with S. mansoni DNA. With increasing [Mg2+] there is a significant decrease in the broad average PCR band size, from around 1.7 kb at 0.5 mM [Mg2+] to 500 bp at 2.5 mM [Mg2+]. This indicates that a significant number of long (N ≥ 18 bp) tracts are spaced on average around 1.7 kb or less apart in the genome. Only at the increasingly less stringent higher [Mg2+] will shorter tract lengths be primed in the PCR reactions. This leads to the observed decrease to an average fragment spacing of 500 bp or less at 2.5 mM [Mg2+]. These shorter fragments are defined on average by two shorter tracts (with N > 8 bp) acting as low stringency PCR primer sites. Overall, these data compare favorably to an estimate of average tract occurrence from an earlier experimental study (25). They are also consistent with our previous detailed observation of the higher than expected frequency of long tracts in D. discoideum flanking DNA calculated from the then available GenBank sequences [0.12% of genome] (3, 18). As we described above, this PCR method amplifies only a subset of the possible adjacent tracts. Those sequence situations where either two long (dA) tracts or two long (dT) tracts occur closest together on the same strand would not be amplified with this primer. Therefore, we expect that these observed tract amplification length distributions represent upper limits to the actual lengths of spacer plus closest adjacent tracts in the D. discoideum genome.

We investigated the melting behavior of PCR fragment populations from the Figure 3 gel. We carried out this analysis to serve as an artifact control for the varying stringency PCR experiments, as well as to understand the nature of the PCR fragments we generated in the Figure 3 experiment.
Another gel, containing multiple samples identical to those in Figure 3, was electrophoresed and transferred to a nylon hybridization membrane. Each identical lane was then cut out and subjected to melting at a particular temperature, followed by washing and detection of the remaining DNA sequences using biotinylated dT18 hybridization and chemiluminescence. We performed this melting analysis on two PCR sequence populations from D. discoideum, the 0.5 mM and 2.5 mM [Mg2+] PCR reactions, as well as the 1.3 kb PCR band from the S. mansoni 0.5 mM [Mg2+] PCR reaction (data not shown). The two D. discoideum PCR populations are rather similar in melting behavior, both possessing Tm values between 33 and 35 °C. This result might be expected since the DNA spacing the tracts can be unrelated to the low melting tract sequences defining the fragment ends. In contrast, the distinct S. mansoni PCR band exhibited a Tm of 47 °C. These values are consistent with the higher average (A+T) composition of the D. discoideum DNA sequences (76%) compared to the S. mansoni DNA (70.6%) (31). This is further evidence that the biotinylated (dA)18 primer did not produce artifacts in the PCR experiments.

Tract Frequencies Calculated from GenBank Sequences Occur at Levels Well Above Random Occurrence

We next examined the occurrence of d(A).d(T) tracts in D. discoideum and S. mansoni single-gene-containing DNA sequences downloaded from GenBank. To eliminate bias in the calculation of tract frequencies, we first removed redundant sequences using the CleanUP program (26). Following this, we had available a total of 1,399,375 D. discoideum nucleotides, or nearly 3% of the genome of this organism, and 125,605 S. mansoni nucleotides. For each organism, these sequences were next separated into coding, intron, and flanking files using the COMPILE program.
In these functional files, each sequence was kept separate from its neighbors to eliminate potential sequence end artifacts in the tract counting and determination of frequencies. The POLY program was then used on each functional file to count tracts of each length N (28). Tract length occurrences were then converted to frequencies, fobs, and the data were plotted in Figure 4 as log(fobs) versus tract length N. In much the same way as we observed previously for a 10-fold smaller D. discoideum DNA sequence sample (3), this figure demonstrates the significant enrichment of long d(A).d(T) tracts in both intron and flanking sequences, but not exons. The significantly lower negative slopes of both the intron and flanking DNA curves, for tracts at sizes N > 8-10 bp, are due to higher frequencies of tract occurrence than for sizes N < 8 bp. Also, as we demonstrated previously (3), the N > 8-10 bp tracts occur at much higher frequencies than would be expected for tract occurrence in randomly generated sequences of equivalent base composition. Behavior similar to the D. discoideum intron and flanking sequences was not observed for the equivalent S. mansoni sequences. In the latter organism, the flanking and intron sequences possess d(A).d(T) tracts at only moderate lengths N, and these occur at frequencies nearly the same as found in the coding sequences (18). The calculated representative genomic tract frequencies for D. discoideum and S. mansoni can be compared with the experimental PCR results in Figure 3 for these same organisms. Clearly, the calculated results agree semi-quantitatively with the PCR experiments. At the highest tract lengths observed in S. mansoni, around N = 13, Figure 4 shows that the D. discoideum genome has a 100-fold greater frequency of tracts. This is the reason that the D. discoideum DNA produced PCR evidence of closely spaced long tracts, but S. mansoni DNA did not, with the exception of the 1.3 kb band. Moreover, for D.
discoideum introns and flanking sequences, the frequencies determined for the 10-20 nucleotide length tracts range from about 1 tract/1000 nucleotides for 10 nucleotide length tracts to about 1 tract/5000 nucleotides for 20 nucleotide length tracts. These frequencies correspond to 50,000 tracts of 10 bp size and 10,000 tracts of 20 bp size in the D. discoideum genome of 5 × 10^7 bp. Put another way, on average there is approximately 1 tract of 10 bp length in every 1 kb of sequence in the genome, and 1 tract of 20 bp length in every 5 kb of sequence. Since A tracts and T tracts occur with equal frequency, these observed frequency magnitudes for 10-20 nucleotide length tracts from Figure 4 can explain qualitatively the PCR tract defined sequence amplification lengths we observed in Figure 3.
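The genome-wide tract counts quoted above follow from simple scaling, shown here as a worked check. The symbols N_10 and N_20 are introduced only for this illustration, and the calculation assumes, as the extrapolation in the text implicitly does, that the intron and flanking frequencies apply across the whole 5 × 10^7 bp genome.

```latex
\begin{align*}
N_{10} &\approx \frac{1\ \text{tract}}{1000\ \text{nt}} \times 5\times10^{7}\ \text{nt} = 5\times10^{4}\ \text{tracts},\\
N_{20} &\approx \frac{1\ \text{tract}}{5000\ \text{nt}} \times 5\times10^{7}\ \text{nt} = 1\times10^{4}\ \text{tracts}.
\end{align*}
```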
Figure 4: Plot of fobs vs. N, tract length, calculated for all (dA).(dT) homopolymer tracts from total and fractionated D. discoideum DNA and S. mansoni DNA sequences retrieved from the public databases.
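The tract counting that underlies a plot of this kind can be sketched as follows. This is not the POLY program, whose internals are not described here; it is a minimal illustration that assumes fobs is the number of tracts of length N divided by the total number of nucleotides examined, and that counts both A runs and T runs on one strand so that every (dA).(dT) tract is counted once. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# A maximal run of A or of T on the given strand is one (dA).(dT) tract.
TRACT = re.compile(r"A+|T+")

def count_tracts(sequences):
    """Count tracts by length N, scanning each sequence separately so that
    no tract is created by joining the ends of neighboring sequences."""
    counts = Counter()
    for seq in sequences:
        for match in TRACT.finditer(seq.upper()):
            counts[len(match.group())] += 1
    return counts

def log_frequencies(sequences):
    """Return {N: log10(fobs)}, with fobs assumed to be tracts per nucleotide."""
    total_nt = sum(len(seq) for seq in sequences)
    counts = count_tracts(sequences)
    return {n: math.log10(c / total_nt) for n, c in sorted(counts.items())}
```

Running `log_frequencies` separately on coding, intron, and flanking sequence sets would give one curve per functional class, which is the comparison the figure makes.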
Location of Long [d(A).d(T)] Homopolymer Tracts in Relation to the Nucleosome Structure

Next, we carried out an experiment designed to understand the relationship between the position of long [d(A).d(T)] tracts and the repeating nucleosome structure of D. discoideum chromatin. In part, this experiment was motivated by our previous observation (19) that the majority of adjacent [d(A).d(T)] tracts of length N > 10 were found to have an average total length for both tracts and the spacer DNA between them that corresponded to the average D. discoideum nucleosome linker DNA size, 42 bp, that had previously been measured experimentally (22). However, these earlier results were based upon analyzing a nearly 10-fold smaller fraction of the sequenced genome (< 0.34%) than we used in the present study. Those results suggested that pairs of the long homopolymer tracts are preferentially localized in the nucleosomal linker regions within D. discoideum chromatin. To validate them, we carried out the DNA hybridization experiments described below, which use the entire D. discoideum genome.

Initially, we carried out a micrococcal nuclease digestion of intact D. discoideum nuclei. This generated an arithmetic series of DNA fragment lengths resulting from random internucleosomal DNA cleavage in the repeating nucleosomal arrays. For a few of the micrococcal nuclease digestion times, the DNA fragment distribution was most appropriate for our experiment, consisting primarily of monomer, dimer, and some trimer nucleosomal DNA sizes. DNA purified from reaction aliquots of two of these digestion time points is displayed in Figure 5A, following electrophoresis on an agarose gel and ethidium bromide staining. Clearly observable in the 3 and 6 min digestion samples are the monomer, dimer, and trimer length nucleosomal DNA fragments, as well as unresolved higher length fragments.
This DNA distribution was transferred from the gel to Immobilon-S membranes and was then sequentially hybridized using two different probes. The first probe was a total D. discoideum DNA probe of nucleosomal monomer size. The purpose of this probe was to act as a DNA mass normalization control for the second, homopolymer tract hybridization probe. When the biotinylated total D. discoideum DNA probe was hybridized to the membrane, the results shown in the middle lanes of Figure 5A were observed. It is clear that the hybridization of this probe is localizing preferentially to, and being driven by, the high DNA mass concentration immobilized in the monomer, dimer, and trimer nucleosomal peaks. Hybridization intensity clearly reflects the underlying nucleosomal DNA mass distribution pattern in the gel lane, as we would expect for this total genomic sequence probe. Following stripping of the total D. discoideum DNA probe from the membrane, we carried out hybridization of the biotinylated (dT)18 probe and observed the chemiluminescence distribution in the right-hand lanes in Figure 5A. There is intense hybridization over the entire size distribution of nucleosomal fragments, which is in agreement with the high level of tract occurrence in D. discoideum that we have already noted. However, the hybridization does not reflect the underlying nucleosomal DNA mass distribution pattern in the gel. In particular, the hybridization is as intense over the internucleosomal DNA containing fragments between monomer and dimer as it is over the monomer and dimer DNA fragments, even though the monomer and dimer fragments are present in significantly higher concentrations on the membrane than the internucleosomal DNA containing fragments, as the biotinylated total D. discoideum DNA probe has shown.
Since the probe hybridization level is driven everywhere on the filter by the local concentration of sequences containing [d(A).d(T)] homopolymer tracts, this hybridization result indicates the existence of a significant enrichment of homopolymer tract sequences in the internucleosomal spacer DNA within D. discoideum chromatin.

Figure 5: DNA from micrococcal nuclease digested D. discoideum chromatin. A) Left to right, (-) the ethidium bromide fluorescence distributions for 3 and 6 min nuclease digestion times following electrophoresis; chemiluminescence distribution of total D. discoideum DNA probe hybridized to the 3 and 6 min samples following transfer and blotting; chemiluminescence distribution of (dT)18 probe hybridized to the same 3 and 6 min digested samples following prior stripping of the total D. discoideum DNA probe. Marker DNA size standards are shown in the left most gel lane. B) The hybridization intensity ratios [(dT)18 probe : total D. discoideum probe, from 5A] of the scanned gel lanes at different gel positions were calculated and are presented along with the positions (underlined) of the trimer (T), dimer (D) and monomer (M) peaks. The arrowhead indicates the DNA size of maximum (dT)18 hybridization between monomer and dimer peaks.

In order to demonstrate this fact more clearly, we have scanned the two 6 min digestion samples hybridized with the total D. discoideum DNA and d(T)18 probes, then calculated their hybridization ratio at a series of gel positions as shown in Figure 5B. Clearly the ratio reflects the relative enrichment of long tracts in the DNA of varying lengths. At sizes above the trimer, the ratio is around 1.0 and is noisy, but does not change significantly with increasing size.
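One way to see why the ratio should flatten for longer fragments is to compute how much of each fragment is linker DNA. The sketch below is a simplified model, not part of the original analysis: it assumes fully trimmed fragment ends, so that an n-mer holds n cores and n-1 internal linkers, and it uses the approximate core (145 bp) and linker (42 bp) sizes quoted in the text. The function and constant names are hypothetical.

```python
CORE_BP = 145    # approximate nucleosome core DNA length quoted in the text
LINKER_BP = 42   # average D. discoideum linker length quoted in the text

def linker_fraction(n_nucleosomes, core=CORE_BP, linker=LINKER_BP):
    """Fraction of linker DNA in an n-mer with fully trimmed ends,
    modeled (as an assumption) as n cores plus n-1 internal linkers."""
    linker_total = (n_nucleosomes - 1) * linker
    return linker_total / (n_nucleosomes * core + linker_total)

# Compare monomer, dimer, trimer and longer oligomers.
for n in (1, 2, 3, 5, 10):
    print(n, round(linker_fraction(n), 3))
```

In this model the linker fraction is zero for the trimmed monomer, changes most between monomer and dimer, and approaches 42/187 for long arrays.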
For these sizes the ratio of linker DNA (∼42 bp) to core DNA (∼145 bp) is relatively constant, since as the fragments lengthen they add one linker and one core DNA region for each additional nucleosome DNA length in the arithmetic fragment series generated by micrococcal nuclease. However, on going from trimer to dimer and then to monomer, the variation in linker to core is greater. In particular, the fully enzymatically trimmed monomer core has no linker remaining. When the dimer is initially cleaved by micrococcal nuclease, the smaller fragment generated has little of the internal linker remaining, while the larger fragment has most of that linker attached to the monomer core. For the latter fragment, the linker to core ratio is relatively high. This scenario agrees with what we observed in Figure 5B, where the minimum hybridization ratio, as low as 0.8, occurs at the lower size edge of the dimer (D) band. The ratio then increases dramatically for fragment sizes below the dimer, reaching its highest values between 1.2 and 1.3, demonstrating maximum tract enrichment. This maximum would correspond to the position of initially enzymatically trimmed monomer possessing nearly a full length linker attached to a core length. The hybridization ratio then decreases before rising again below the monomer size. The ratio decreases to a minimum value probably corresponding to the core fragments without any remaining linker DNA. That the ratio rises again at lower fragment sizes is in agreement with results from a computational study of the length distribution of long adjacent tracts plus the spacer DNA separating them. We describe these results in detail later in Figure 8C. In brief, they show that the tract enriched fragments of sizes from 42 bp to 185 bp, corresponding to the sizes exhibiting the rise in hybridization ratio, occur at frequencies above that of random occurrence. As a result, fragments of these lengths containing adjacent tracts are enriched in D.
discoideum DNA. Consequently, the ratio we observed in Figure 5B is corroborated by computed properties of long tract spacings in the genome. Now we need to ask whether a mechanism exists to enrich these fragments in the micrococcal nuclease generated distribution. Just such a mechanism exists in the form of the known but slight AT base DNA sequence cleavage rate preference of micrococcal nuclease (32). Shorter fragments containing enriched AT sequences would result from micrococcal nuclease cleavage of the DNA in chromatin. Thus, tracts enriched within fragments of shorter lengths would likely be a result of this enzymatic preference, agreeing with our hybridization ratio plot in Figure 5B. Given the known sizes for nucleosomal core and internucleosomal linker DNA, we can estimate from the Figure 5B ratios that the relative tract occurrence in internucleosomal linker DNA is about 2-fold greater than that found in the nucleosomal core. This ratio is not high, but is reasonable since, as we mentioned above, long tracts are not totally excluded from nucleosome core regions but are known to penetrate into the 20-25 bp of DNA at the ends of nucleosome core regions (10). We present further sequence computation based evidence below in Figures 7 and 8 for this penetration of long tracts into the sequence edge regions of the nucleosome core DNA.

Length Distributions of Two Adjacent Homopolymer Tracts Plus Spacer DNA Suggest a Nucleosomal Linker DNA Location for Tracts

In order to corroborate the experimental observation in Figure 5 that the long d(A).d(T) homopolymer tracts are preferentially located in the nucleosomal linker DNA of D. discoideum chromatin, we carried out the following calculations using sequences in GenBank. We determined the distribution of all instances of the following DNA length quantity: the total length of two adjacent long tracts plus spacer DNA, for tracts of different N minimum ranges. For this calculation, we preselected 458 D.
discoideum sequences that were a minimum of 200 nucleotides in length, and we did not apply the COMPILE program to break them into functional regions, but instead used the intact sequences. In Figure 6, we display the logarithm of the observed frequency, fobs, for each spacer plus two adjacent tract length for three different tract N length regions as representative data. In panel A, the results for all tracts N>3 are presented; in panel B, results for all tracts N>5 are presented; and in panel C, the results for all tracts N>9 are presented. The determinations were done for specific N regions, as opposed to individual N values, so that we could accumulate sufficient instances to make the results statistically valid. The overall behavior of the data for each region is dominated by the lowest N length members of that region, as the log(fobs) vs. N dependence in Figure 4 clearly demonstrates. What the data in Figure 6 show is that the real sequence length distributions display a behavior that cannot be represented by a single exponential relationship. In panel A, there is something resembling at least a two exponential function decrease in fobs as length increases. A less obvious, noisier relationship exists between the points in panel B, due to the lower fobs values observed for the N>5 case. In the case of panel C, the fobs data decrease but are too noisy to discern a clear relationship.

Next, we decided to compare the fobs values of the real sequences for the N>3, N>5, and N>9 region cases to the observed frequencies calculated from the same sequences in which all base positions of the sequence had been randomly shuffled except for the N>X selected tracts. We call these new cases conservatively shuffled sequences, and the observed frequencies of occurrence determined from them are designated fcons shuffle. We carried out this conservative shuffling procedure ten consecutive times for each sequence.
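The length quantity and the shuffled control can be sketched in code. The exact shuffling algorithm is not specified in the text, so the version below rests on a stated assumption: each selected tract (N > X) is treated as an intact unit and is permuted together with the remaining single bases, so that tract lengths and base composition are preserved while tract positions are randomized. All function names are hypothetical, and counts are reported rather than normalized frequencies.

```python
import random
import re
from collections import Counter

def find_tracts(seq, min_len):
    """(start, end) of every maximal A run or T run with length N > min_len."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len + 1, min_len + 1))
    return [m.span() for m in pattern.finditer(seq)]

def span_lengths(seq, min_len):
    """Count total lengths of two adjacent tracts plus the spacer between them."""
    tracts = find_tracts(seq, min_len)
    return Counter(b_end - a_start
                   for (a_start, _), (_, b_end) in zip(tracts, tracts[1:]))

def conservative_shuffle(seq, min_len, rng):
    """Shuffle single bases and intact tracts together (assumed procedure)."""
    tokens, pos = [], 0
    for start, end in find_tracts(seq, min_len):
        tokens.extend(seq[pos:start])   # non-tract bases, one token each
        tokens.append(seq[start:end])   # the whole tract as a single token
        pos = end
    tokens.extend(seq[pos:])
    rng.shuffle(tokens)
    return "".join(tokens)

def mean_shuffled_counts(seqs, min_len, n_shuffles=10, seed=0):
    """Average span-length counts over n_shuffles conservative shufflings."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(n_shuffles):
        for seq in seqs:
            total.update(span_lengths(conservative_shuffle(seq, min_len, rng), min_len))
    return {length: count / n_shuffles for length, count in total.items()}
```

Comparing `span_lengths` summed over the real sequences with `mean_shuffled_counts` at each length gives the kind of real versus shuffled contrast described here. Note that in this sketch shuffling can occasionally place bases so that new or longer tracts appear, since the shuffled sequence is rescanned from scratch.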
After each shuffling, we tabulated the number of instances found in all the shuffled sequences for each value of the length of spacer plus two adjacent tracts. The instances from all ten conservative shufflings were then averaged, and fcons shuffle was determined for each value of the length of spacer plus two adjacent tracts. These data were also plotted in the three respective panels of Figure 6. These data display a very different behavior from the real sequence fobs length distributions of two tracts plus spacer DNA. In the 10× conservatively shuffled case, there is a single exponential fobs decrease evident in panels A and B, while panel C is too noisy to determine. However, what is clear from these panels is that at all N ranges the real sequence fobs values are higher than the corresponding 10× conservative shuffled values for low to |