Predicting and regulating gene expression using sequence-based deep learning models
Aniketh Janardhan Reddy
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-95
May 16, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-95.pdf
Gene expression is a complex, tightly regulated process controlled by cis-acting DNA elements including promoters and enhancers, and trans-acting elements such as transcription and splicing factors. Understanding how these elements coordinate to regulate expression is a key goal in regulatory genomics. Recently, sequence-based deep learning models have emerged as powerful tools for learning the sequence determinants of gene expression. In this work, we both leverage these models for cis-regulatory element design and propose methods to improve their predictive capabilities.
First, we explore transfer learning strategies to improve promoter-driven expression prediction, especially in data-constrained settings. After showing that these strategies can substantially improve prediction performance, we couple our sequence-based models with sequence optimizers in a novel model-based optimization workflow to design cell-type-specific promoters that are crucial for gene therapies. Crucially, this workflow accounts for practical constraints such as adversarial designs, sequence diversity, and prediction uncertainty, and improves the cell-type-specificity of most sequences in a challenging setting.
Next, we explore methods to improve individual-level gene expression prediction, a task on which current sequence-based deep learning models fail. Fine-tuning on paired personal genome and transcriptome data improves predictions on held-out individuals for genes seen during training—matching baselines—but these models still fail to generalize to unseen genes.
Finally, we address limitations of current splicing predictors, which do not generalize to unseen tissues or model the expression levels of trans-acting splicing factors. We propose a new sequence-based model that incorporates splicing factor expression to make tissue-specific splicing predictions, even in tissues not seen during training. This model captures some tissue-specific splicing patterns and achieves performance comparable to existing models trained directly on those tissues, highlighting the utility of incorporating regulatory context for generalization.
Advisor: Nilah Ioannidis
BibTeX citation:
@phdthesis{Reddy:EECS-2025-95,
    Author = {Reddy, Aniketh Janardhan},
    Title = {Predicting and regulating gene expression using sequence-based deep learning models},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-95.html},
    Number = {UCB/EECS-2025-95},
    Abstract = {Gene expression is a complex, tightly regulated process controlled by cis-acting DNA elements including promoters and enhancers, and trans-acting elements such as transcription and splicing factors. Understanding how these elements coordinate to regulate expression is a key goal in regulatory genomics. Recently, sequence-based deep learning models have emerged as powerful tools for learning the sequence determinants of gene expression. In this work, we both leverage these models for cis-regulatory element design and propose methods to improve their predictive capabilities. First, we explore transfer learning strategies to improve promoter-driven expression prediction, especially in data-constrained settings. After showing that these strategies can substantially improve prediction performance, we couple our sequence-based models with sequence optimizers in a novel model-based optimization workflow to design cell-type-specific promoters that are crucial for gene therapies. Crucially, this workflow accounts for practical constraints such as adversarial designs, sequence diversity, and prediction uncertainty, and improves the cell-type-specificity of most sequences in a challenging setting. Next, we explore methods to improve individual-level gene expression prediction, a task on which current sequence-based deep learning models fail. Fine-tuning on paired personal genome and transcriptome data improves predictions on held-out individuals for genes seen during training—matching baselines—but these models still fail to generalize to unseen genes. Finally, we address limitations of current splicing predictors, which do not generalize to unseen tissues or model the expression levels of trans-acting splicing factors. We propose a new sequence-based model that incorporates splicing factor expression to make tissue-specific splicing predictions, even in tissues not seen during training. This model captures some tissue-specific splicing patterns and achieves performance comparable to existing models trained directly on those tissues, highlighting the utility of incorporating regulatory context for generalization.}
}
EndNote citation:
%0 Thesis
%A Reddy, Aniketh Janardhan
%T Predicting and regulating gene expression using sequence-based deep learning models
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 16
%@ UCB/EECS-2025-95
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-95.html
%F Reddy:EECS-2025-95