Sebastian Prillo

EECS Department, University of California, Berkeley

Technical Report No. UCB/

May 1, 2025

Molecular evolution drives many important biological processes, such as protein evolution under natural selection and DNA mutations in growing cancer cell populations. Unlike in other statistical settings, the data are not IID; they are correlated through an unobserved, latent phylogenetic tree, of which only the tips are observed. In this work, I present new methods for learning models of molecular evolution. For modeling protein evolution, I develop a new framework called CherryML, based on composite likelihoods, which enables scalable learning of rich models of protein evolution. In particular, its scalability allows us to fit deep-learning-based models, which is unprecedented in the field. For tree inference under irreversible mutation processes, such as DNA mutations in single-cell lineage tracing experiments, I propose scalable and accurate algorithms based on moment matching, conservative maximum parsimony, and pseudocounts. These algorithms enable the analysis of larger and more complex single-cell datasets.

Advisors: Yun S. Song and Nir Yosef


BibTeX citation:

@phdthesis{Prillo:31697,
    Author= {Prillo, Sebastian},
    Title= {Scalable and Accurate Models of Molecular Evolution},
    School= {EECS Department, University of California, Berkeley},
    Year= {2025},
    Number= {UCB/},
    Abstract= {Molecular evolution drives many important biological processes, such as protein evolution under natural selection and DNA mutations in growing cancer cell populations. Unlike in other statistical settings, the data are not IID; they are correlated through an unobserved, latent phylogenetic tree, of which only the tips are observed. In this work, I present new methods for learning models of molecular evolution. For modeling protein evolution, I develop a new framework called CherryML, based on composite likelihoods, which enables scalable learning of rich models of protein evolution. In particular, its scalability allows us to fit deep-learning-based models, which is unprecedented in the field. For tree inference under irreversible mutation processes, such as DNA mutations in single-cell lineage tracing experiments, I propose scalable and accurate algorithms based on moment matching, conservative maximum parsimony, and pseudocounts. These algorithms enable the analysis of larger and more complex single-cell datasets.},
}

EndNote citation:

%0 Thesis
%A Prillo, Sebastian
%T Scalable and Accurate Models of Molecular Evolution
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 1
%@ UCB/
%F Prillo:31697