Book of Abstracts: Albany 2003
June 17-21, 2003
Promoter-search Algorithm Based on Canonical and Non-canonical Sequence Elements of E. coli Regulatory Regions
The availability of completely sequenced genomes has made it possible to use high-throughput approaches for annotating regulatory gene networks. An initial step is the mapping of promoter sites, which ideally should be based on experimental approaches, but this remains a complex task. Bioinformatics offers an alternative, and a set of promoter-search algorithms for E. coli regulatory regions has been proposed. They exploit the presence of two canonical hexamers and certain sequence preferences in the flanking regions and around the transcription start point, and are capable of identifying ~90% of known promoter sites at a 2% level of false positives, or up to 40% if a more stringent criterion (0.005%) is used (1). To increase the predictive capability of computer algorithms, we employed clustering software to find unknown patterns that occur imperfectly in a set of promoter sequences. Additional elements were searched for using all kinds of degenerate sequence alphabets. A non-random distribution of base pairs was revealed in the region -204/+75, extending the borders of promoter DNA by several helix turns on either side. Besides modules involved in specific interaction with the RNA polymerase sigma subunit, several elements were identified in the putative contact region with the RNA polymerase alpha subunits (-60/-40) (2). Dominant AAAAA or TTTTT may participate in structure-specific interaction with the DNA-binding domain of alpha, while flexible CACA, TGTA, TG, and TA may support adaptive conformational transitions or participate in non-specific interaction with the protein. Easily deformable YR dinucleotides have significant maxima at several other positions (3), favoring proper orientation of the promoter DNA on the interface of the transcription machinery. The least understood feature of the promoter DNA is the presence of strongly preferred sites for 4-5 bp long A/T-tracts.
Maxima in the distribution of these tracts, exceeding the background level by at least 3 standard deviations, are observed at 24 positions, following a periodicity of ~1 helix turn (8-13 bp) upstream of -45 and a 15-18 bp regularity downstream of this position. When taken into account, these additional promoter-specific features increase the discriminating potential of the promoter-search algorithm and allow 85% of natural promoters to be identified at a level at which no false positives were found in a control set composed of random sequences. Results of promoter mapping within the entire genome of E. coli, both for genes involved in protein synthesis and for other genes, will be discussed.
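The positional analysis of A/T-tracts described above can be illustrated with a minimal Python sketch. It is not the software used in this work. It assumes that the promoter sequences are already aligned (equal length, same reference point), that a tract is a maximal run of at least 4 A or T bases, and that the background is the mean and standard deviation of tract counts taken over all positions of the same alignment. The function names `tract_starts`, `positional_profile` and `enriched_positions` are hypothetical.

```python
from statistics import mean, pstdev


def tract_starts(seq, min_len=4):
    """Return 0-based start indices of maximal A/T runs of at least min_len bases."""
    starts = []
    run_start = None
    # A trailing sentinel ("N") closes a run that reaches the end of the sequence.
    for i, base in enumerate(seq.upper() + "N"):
        if base in "AT":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                starts.append(run_start)
            run_start = None
    return starts


def positional_profile(seqs, min_len=4):
    """Count, for every alignment position, how many sequences start a tract there."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    counts = [0] * length
    for s in seqs:
        for i in tract_starts(s, min_len):
            counts[i] += 1
    return counts


def enriched_positions(counts, n_sd=3.0):
    """Return positions whose count exceeds the background mean by more than n_sd SD.

    Assumption: the background is estimated from the same profile (all positions).
    """
    mu = mean(counts)
    sd = pstdev(counts)
    if sd == 0:
        return []  # flat profile, nothing stands out
    return [i for i, c in enumerate(counts) if c > mu + n_sd * sd]
```

To report results in promoter coordinates, the returned indices would have to be shifted by the position of the first aligned base (for example -204). A background estimated from random or non-promoter sequences could replace the within-alignment estimate used here.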
Supported by the Russian Foundation for Basic Research (03-04-48339, 01-04-97006).
O. N. Ozoline1
1Institute of Cell Biophysics