Book of Abstracts: Albany 2011
June 14-18 2011
©Adenine Press (2010)
The Nucleosome Simulator: 100 Nucleosomes; 2 Microseconds and Counting
Molecular simulation is an effective tool for studying structure-function relationships in biomolecules. A modest molecular dynamics simulation today includes ~100,000 atoms and represents over 100 ns of time. Completing such a simulation in less than one week requires nearly one hundred processors. Today's supercomputers contain up to 200,000 processors, too many for a single simulation of modest size. However, it is reasonable to simulate tens to hundreds of structures simultaneously on a single supercomputer or to distribute them to any suitable computing resources as they become available. Such high-throughput, high-performance simulations require careful coordination and strategy. The ManyJobs and BigJobs tools provide this functionality.
Our present efforts are directed toward investigating nucleosome positioning and stability as a function of DNA sequence using all-atom molecular dynamics simulation. Nucleosome positioning is one of the current ‘hot’ issues, and the various approaches and hypotheses about what positions nucleosomes are discussed extensively in recent publications, particularly in an issue of this Journal devoted exclusively to this subject (1-12). One of the major factors affecting nucleosome positioning is DNA sequence. A nucleosome consists of 147 base pairs (bp) of DNA wrapped in ~1.7 left-handed superhelical turns around a histone core. This study requires simulations of hundreds of nucleosomes with different sequences. Our chosen sequences are divided into four broad categories: naturally occurring positioning sequences, artificial positioning sequences, sequences from the Saccharomyces cerevisiae genome, and sequences used for control purposes, e.g. homopolymers. For the S. cerevisiae derived sequences, we model 336 nucleosomes. The collection represents 16 of the most well-positioned nucleosomes and their immediate neighbors in sequence space. Each neighborhood spans two turns of the DNA, one upstream turn and one downstream turn. Each of the 21 individual neighbors contains only 147 bp and is created by threading the appropriate sequence onto the histone core. Thus each neighborhood has a common segment of 126 bp located at 21 successive positions on the histone core. To date, we have simulated over 100 nucleosomes, including 4 separate neighborhoods, and accumulated over 2 microseconds of nucleosome dynamics. Our high-throughput approach requires significant computational power, constant scheduling and monitoring, and efficient utilization of resources in order to achieve the shortest time to completion.
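The construction of a sequence neighborhood can be illustrated with a minimal sketch. It assumes that one turn of DNA corresponds to a shift of 10 bp in each direction and that neighbors are taken at single-bp steps; both are assumptions chosen to reproduce the 21 neighbors per neighborhood stated above, not details confirmed by this abstract. The function name, the toy sequence, and the start coordinate are hypothetical.

```python
import random

NUCLEOSOME_BP = 147    # DNA length per nucleosome (from the text)
N_NEIGHBORHOODS = 16   # well-positioned nucleosomes (from the text)


def neighborhood(genome, center_start, flank=10):
    """Return the 147 bp windows at offsets -flank..+flank around center_start.

    flank=10 is an assumption: roughly one helical turn upstream and one
    downstream, sampled at 1 bp steps, which gives 2 * 10 + 1 = 21 windows.
    """
    windows = []
    for offset in range(-flank, flank + 1):
        start = center_start + offset
        if start < 0 or start + NUCLEOSOME_BP > len(genome):
            raise ValueError("window falls outside the sequence")
        windows.append(genome[start:start + NUCLEOSOME_BP])
    return windows


# Toy random sequence standing in for a genomic region (not real data).
rng = random.Random(0)
toy_genome = "".join(rng.choice("ACGT") for _ in range(1000))

windows = neighborhood(toy_genome, center_start=400)
total_models = N_NEIGHBORHOODS * len(windows)
print(len(windows), total_models)
```

Under these assumptions each neighborhood yields 21 sequences to thread onto the histone core, and 16 neighborhoods give the 336 models mentioned above.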
To manage the workflow we have used two scheduling tools: ManyJobs and BigJobs. ManyJobs is a portable tool written in Python. It maintains a database of all compute tasks and the dependencies between tasks. At the beginning of a run, ManyJobs submits requests for resources to all computers listed by the user. Once a resource is allocated and a job starts, ManyJobs assigns a task to the resource and requests additional resources in anticipation of the next task. Upon job completion, the task is marked complete in the ManyJobs database. The process repeats until all tasks in the database are completed. The current version of ManyJobs uses secure shell for communication between the machine maintaining the task database and the various compute resources. BigJobs is an implementation of the pilot job concept based on the Simple API for Grid Applications (SAGA). SAGA provides methods for authentication and communication other than secure shell. Another distinction of BigJobs is its ability to dynamically bundle individual tasks into a container holding multiple tasks, the pilot job.
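The task-database loop described above can be sketched as follows. This is not the ManyJobs code or its API; it is a simplified, in-memory illustration in which the function name, task names, and resource names are all hypothetical. It runs in synchronous rounds rather than over secure shell, and it assumes each allocated resource runs one task at a time.

```python
def run_tasks(tasks, resources):
    """Assign dependency-ordered tasks to resources, one task per resource per round.

    tasks:     dict mapping task name -> set of prerequisite task names
    resources: list of resource names (hypothetical stand-ins for computers)
    Returns a list of (round, resource, task) assignments.
    """
    if not resources:
        raise ValueError("at least one resource is required")
    pending = dict(tasks)   # plays the role of the task database
    done = set()
    log = []
    round_no = 0
    while pending:
        # A task is ready once all of its prerequisites are complete.
        ready = sorted(t for t, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unsatisfiable dependencies")
        # Hand one ready task to each available resource.
        batch = list(zip(resources, ready))
        for resource, task in batch:
            log.append((round_no, resource, task))
        # Mark the finished tasks complete in the database.
        for _, task in batch:
            done.add(task)
            del pending[task]
        round_no += 1
    return log


# Hypothetical workflow: two nucleosome simulations followed by a joint analysis.
tasks = {
    "equilibrate_A": set(),
    "equilibrate_B": set(),
    "production_A": {"equilibrate_A"},
    "production_B": {"equilibrate_B"},
    "analysis": {"production_A", "production_B"},
}
schedule = run_tasks(tasks, ["cluster1", "cluster2"])
for entry in schedule:
    print(entry)
```

A real scheduler would also have to handle queue wait times, failed jobs, and asynchronous completion, which this sketch omits.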
We will discuss the implementation and proper utilization of these tools, along with their pros and cons. A meta-analysis of simulation results is conducted to identify features of nucleosome positioning and stability. We focus on DNA structural deformations in the form of kinks: their locations, sequence dependencies, and the timescales associated with kink formation and healing.
1Center for Computational Science