LOGOS: A Hierarchical Bayesian Markovian Motif Model Capturing Local Site-Dependencies and Global Motif Distributions

Eric P. Xing and Richard M. Karp

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-03-1225
January 2003

We present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences. LOGOS consists of two interacting submodels: HMDM, a model for aligned motif sequences, and HMM, a model for the global distribution of motif instances. HMDM is a hidden Markov Dirichlet-multinomial model that captures rich biological prior knowledge and positional dependence in motif local structure in a principled way. HMM is a standard hidden Markov model that allows formal and efficient inference of motif locations and is potentially capable of capturing their dependencies. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves on existing models that ignore biological priors and positional dependence and that do not adhere strictly to a well-founded probabilistic model for the global motif distribution. It shows higher sensitivity to motifs, a notable ability to distinguish genuine motifs from false recurring patterns, and flexibility for complex detection tasks (e.g., simultaneous multiple motif detection, unknown number of motif instances, unknown motif lengths). LOGOS provides a principled framework for modularizing, extending and computing motif models for complex biopolymer sequence analysis.


BibTeX citation:

@techreport{Xing:CSD-03-1225,
    Author = {Xing, Eric P. and Karp, Richard M.},
    Title = {LOGOS: A Hierarchical Bayesian Markovian Motif Model Capturing Local Site-Dependencies and Global Motif Distributions},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2003},
    Month = {Jan},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/6188.html},
    Number = {UCB/CSD-03-1225},
    Abstract = {We present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences. LOGOS consists of two interacting submodels: HMDM, a model for aligned motif sequences, and HMM, a model for the global distribution of motif instances. HMDM is a hidden Markov Dirichlet-multinomial model that captures rich biological prior knowledge and positional dependence in motif local structure in a principled way. HMM is a standard hidden Markov model that allows formal and efficient inference of motif locations and is potentially capable of capturing their dependencies. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves on existing models that ignore biological priors and positional dependence and that do not adhere strictly to a well-founded probabilistic model for the global motif distribution. It shows higher sensitivity to motifs, a notable ability to distinguish genuine motifs from false recurring patterns, and flexibility for complex detection tasks (e.g., simultaneous multiple motif detection, unknown number of motif instances, unknown motif lengths). LOGOS provides a principled framework for modularizing, extending and computing motif models for complex biopolymer sequence analysis.}
}

EndNote citation:

%0 Report
%A Xing, Eric P.
%A Karp, Richard M.
%T LOGOS: A Hierarchical Bayesian Markovian Motif Model Capturing Local Site-Dependencies and Global Motif Distributions
%I EECS Department, University of California, Berkeley
%D 2003
%@ UCB/CSD-03-1225
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/6188.html
%F Xing:CSD-03-1225