Nonparametric Bayesian Models for Machine Learning
Romain Jean Thibaux
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2008-130
October 14, 2008
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-130.pdf
This thesis presents general techniques for inference in various nonparametric Bayesian models, furthers our understanding of the stochastic processes at the core of these models, and develops new models of data based on these findings. In particular, we develop new Monte Carlo algorithms for Dirichlet process mixtures based on a general framework. We extend the vocabulary of processes used for nonparametric Bayesian models by proving many properties of beta and gamma processes. In particular, we show how to perform probabilistic inference in hierarchies of beta and gamma processes, and how this naturally leads to improvements to the well-known naïve Bayes algorithm. We demonstrate the robustness and speed of the resulting methods by applying them to a classification task with 1 million training samples and 40,000 classes.
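As background for the Dirichlet process mixtures mentioned in the abstract, the clustering behavior of the Dirichlet process prior can be illustrated by the standard Chinese restaurant process construction. The sketch below is not taken from the thesis; it is a minimal illustrative sampler, with the function name and parameters chosen here for exposition.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster assignments for n points from a Chinese restaurant
    process with concentration parameter alpha (illustrative sketch)."""
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of points currently in cluster k
    for i in range(n):
        # Point i joins existing cluster k with probability counts[k] / (i + alpha)
        # and starts a new cluster with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                assignments.append(k)
                counts[k] += 1
                break
        else:
            # r fell in the remaining mass of size alpha: open a new cluster.
            assignments.append(len(counts))
            counts.append(1)
    return assignments
```

Larger values of `alpha` tend to produce more clusters; the number of clusters grows roughly logarithmically in `n`, which is what makes such models "nonparametric" in the sense of the abstract.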
Advisor: Michael Jordan
BibTeX citation:
@phdthesis{Thibaux:EECS-2008-130,
    Author = {Thibaux, Romain Jean},
    Title = {Nonparametric Bayesian Models for Machine Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2008},
    Month = {Oct},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-130.html},
    Number = {UCB/EECS-2008-130},
    Abstract = {This thesis presents general techniques for inference in various nonparametric Bayesian models, furthers our understanding of the stochastic processes at the core of these models, and develops new models of data based on these findings. In particular, we develop new Monte Carlo algorithms for Dirichlet process mixtures based on a general framework. We extend the vocabulary of processes used for nonparametric Bayesian models by proving many properties of beta and gamma processes. In particular, we show how to perform probabilistic inference in hierarchies of beta and gamma processes, and how this naturally leads to improvements to the well-known na\"{i}ve Bayes algorithm. We demonstrate the robustness and speed of the resulting methods by applying them to a classification task with 1 million training samples and 40,000 classes.},
}
EndNote citation:
%0 Thesis
%A Thibaux, Romain Jean
%T Nonparametric Bayesian Models for Machine Learning
%I EECS Department, University of California, Berkeley
%D 2008
%8 October 14
%@ UCB/EECS-2008-130
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-130.html
%F Thibaux:EECS-2008-130