Issue: April 2011, No. 5 (p 675-843), ISSN 0739-1102

An Efficient Binomial Model-Based Measure for Sequence Comparison and its Application

Sequence comparison is one of the major tasks in bioinformatics. It can provide evidence of structural and functional conservation, as well as of evolutionary relationships. Several similarity/dissimilarity measures for sequence comparison exist, but challenges remain. This paper presents a binomial model-based measure for analyzing biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a Bernoulli random variable, and the sum of these occurrences (the word count) then follows a binomial distribution. Using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on relative entropy. The proposed measure was tested in extensive experiments, including classification of HEV genotypes and phylogenetic analysis, and was compared with alignment-based and alignment-free measures. The results demonstrate that the proposed binomial model-based measure is more efficient.
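The ingredients named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (binomial_pmf, count_word, relative_entropy) are hypothetical, the per-position word probability p is taken as given (here 1/16, i.e. a uniform model for a 2-letter DNA word), the recursion shown is the standard textbook one for binomial probabilities, and the exact form of the paper's measure is not reproduced.

```python
from math import isclose, log

def binomial_pmf(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Binomial(n, p) using the standard recursion."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    pmf = [(1.0 - p) ** n]            # P(X = 0)
    ratio = p / (1.0 - p)
    for j in range(n):
        # P(X = j+1) = P(X = j) * (n - j) / (j + 1) * p / (1 - p)
        pmf.append(pmf[-1] * (n - j) / (j + 1) * ratio)
    return pmf

def count_word(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def relative_entropy(p_dist, q_dist):
    """Relative entropy sum(p * log(p / q)); assumes q > 0 wherever p > 0."""
    return sum(a * log(a / b) for a, b in zip(p_dist, q_dist) if a > 0.0)

# Hypothetical toy input, chosen only for illustration.
seq, word = "ACGTACGTAC", "AC"
n = len(seq) - len(word) + 1          # number of positions where the word can start
pmf = binomial_pmf(n, 1.0 / 16.0)     # assumed uniform model: p = (1/4)^2
observed = count_word(seq, word)
prob_of_observed_count = pmf[observed]
```

The recursion avoids evaluating factorials directly, which is presumably why a recursive formula is attractive for long sequences. How the resulting probabilities are combined into a distance between two sequences is left open here, since the abstract states only that the measure is based on relative entropy.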
Key words: Word count; Binomial model; Sequence comparison; Classification; Phylogenetic analysis.

This article can be cited as: X. Liu, Q. Dai, L. Li, Z. He, An Efficient Binomial Model-Based Measure for Sequence Comparison and its Application, J. Biomol Struct Dyn 28(5), 833-843 (2011)

Xiaoqing Liu¹
¹School of Science, Hangzhou Dianzi University, Hangzhou 310018, People’s Republic of China