Inferring Perturbation Effects on Gene Regulation and Disease
Ruchir Rastogi
EECS Department, University of California, Berkeley
Technical Report No. UCB/
May 1, 2025
Biological systems are commonly studied by introducing perturbations and observing the resulting responses. However, we have limited experimental ability to make targeted perturbations at scale, and even when such perturbations are feasible, the resulting data are frequently high-dimensional and noisy. In this thesis, we evaluate and develop machine learning methods that either aim to predict perturbation effects or decode complex perturbation readouts, focusing on two key types of perturbations: genetic variants and vaccination.
First, we study sequence-to-function models that predict the impact of noncoding variants on gene expression. These models are never trained on perturbation data, yet they aim to accurately predict variant effects. We show that these models are highly uncertain about the effects of variants, especially those that drive expression differences between individuals. Fine-tuning these models on data from naturally occurring human genetic variation ameliorates, but does not eliminate, this uncertainty.
Second, we assess leading missense variant effect predictors (computational tools that predict how single-nucleotide changes to proteins affect human health) on a new, comprehensive, and clinically oriented benchmark. While many predictors demonstrate strong overall performance, their capabilities can vary markedly across different contexts, which we trace to biases in their observational training data.
Finally, we chart how vaccines remodel the innate immune system (traditionally viewed as unaffected by immunization) by constructing a single-cell atlas from donors given one of six vaccines. Our computational analyses reveal prolonged epigenetic reprogramming of classical monocytes, uncover five classical monocyte subtypes, and connect vaccine-induced shifts in the abundance of specific subtypes to distinct molecular pathways and functional outcomes. Together, these studies demonstrate how machine learning can help both predict and understand the effects of complex biological perturbations.
Advisors: Nir Yosef and Nilah Ioannidis
BibTeX citation:
@phdthesis{Rastogi:31899,
Author= {Rastogi, Ruchir},
Title= {Inferring Perturbation Effects on Gene Regulation and Disease},
School= {EECS Department, University of California, Berkeley},
Year= {2025},
Number= {UCB/},
Abstract= {Biological systems are commonly studied by introducing perturbations and observing the resulting responses. However, we have limited experimental ability to make targeted perturbations at scale, and even when such perturbations are feasible, the resulting data are frequently high-dimensional and noisy. In this thesis, we evaluate and develop machine learning methods that either aim to predict perturbation effects or decode complex perturbation readouts, focusing on two key types of perturbations: genetic variants and vaccination.
First, we study sequence-to-function models that predict the impact of noncoding variants on gene expression. These models are never trained on perturbation data, yet they aim to accurately predict variant effects. We show that these models are highly uncertain about the effects of variants, especially those that drive expression differences between individuals. Fine-tuning these models on data from naturally occurring human genetic variation ameliorates, but does not eliminate, this uncertainty.
Second, we assess leading missense variant effect predictors---computational tools that predict how single-nucleotide changes to proteins affect human health---on a new, comprehensive, and clinically oriented benchmark. While many predictors demonstrate strong overall performance, their capabilities can vary markedly across different contexts, which we trace to biases in their observational training data.
Finally, we chart how vaccines remodel the innate immune system---traditionally viewed as unaffected by immunization---by constructing a single-cell atlas from donors given one of six vaccines. Our computational analyses reveal prolonged epigenetic reprogramming of classical monocytes, uncover five classical monocyte subtypes, and connect vaccine-induced shifts in the abundance of specific subtypes to distinct molecular pathways and functional outcomes. Together, these studies demonstrate how machine learning can help both predict and understand the effects of complex biological perturbations.},
}
EndNote citation:
%0 Thesis
%A Rastogi, Ruchir
%T Inferring Perturbation Effects on Gene Regulation and Disease
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 1
%@ UCB/
%F Rastogi:31899