Bayesian Haplotype Inference via the Dirichlet Process

Eric P. Xing and Roded Sharan and Michael I. Jordan

EECS Department, University of California, Berkeley

Technical Report No. UCB/CSD-03-1275

2003

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/CSD-03-1275.pdf

The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. In this paper we present a novel statistical model for haplotype inference. Our model is a Bayesian model based on a prior known as the Dirichlet process, a nonparametric prior which provides control over the size of the unknown pool of population haplotypes. The model also incorporates a likelihood that allows statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference. The overall result is a flexible Bayesian model that is reminiscent of parsimony methods in its preference for small haplotype pools. We apply this new approach to the analysis of both simulated and real genotype data, and compare to extant methods.
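
As a rough illustration of how a Dirichlet process prior can control the size of a pool, the sketch below samples cluster labels from the Chinese restaurant process, a standard representation of the Dirichlet process. It is not the report's sampler and it ignores genotype data entirely; the function names, the concentration parameter `alpha`, and the reading of each cluster as one pooled haplotype are assumptions made here for illustration only.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process.

    Hypothetical reading: each cluster stands for one distinct haplotype
    in the pool, and alpha > 0 is the concentration parameter.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already in cluster k
    labels = []
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def expected_clusters(n, alpha):
    """Expected number of distinct clusters after n draws from the process."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        labels = crp_assignments(200, alpha, seed=1)
        print(alpha, len(set(labels)), round(expected_clusters(200, alpha), 2))
```

Smaller values of `alpha` make new clusters rarer, so the expected pool size shrinks; this is the sense in which such a prior favors small pools, in the spirit of the parsimony preference mentioned in the abstract.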

BibTeX citation:

@techreport{Xing:CSD-03-1275,
    Author= {Xing, Eric P. and Sharan, Roded and Jordan, Michael I.},
    Title= {Bayesian Haplotype Inference via the Dirichlet Process},
    Year= {2003},
    Month= {Sep},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/6358.html},
    Number= {UCB/CSD-03-1275},
    Abstract= {The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. In this paper we present a novel statistical model for haplotype inference. Our model is a Bayesian model based on a prior known as the Dirichlet process, a nonparametric prior which provides control over the size of the unknown pool of population haplotypes. The model also incorporates a likelihood that allows statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference. The overall result is a flexible Bayesian model that is reminiscent of parsimony methods in its preference for small haplotype pools. We apply this new approach to the analysis of both simulated and real genotype data, and compare to extant methods.},
}

EndNote citation:

%0 Report
%A Xing, Eric P. 
%A Sharan, Roded 
%A Jordan, Michael I. 
%T Bayesian Haplotype Inference via the Dirichlet Process
%I EECS Department, University of California, Berkeley
%D 2003
%@ UCB/CSD-03-1275
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/6358.html
%F Xing:CSD-03-1275