Conservative Objective Models for Biological Sequence Design

Sathvik Kolli

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-102

May 11, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-102.pdf

The success of deep learning in computational biology has been largely limited to prediction problems, such as protein structure prediction and gene expression prediction. Nevertheless, these successes are a testament to the ability of deep neural networks to extract useful insights from datasets of biological sequences, and this has recently motivated research into applying deep learning to biological sequence design problems. In this paper, we tackle two important synthetic biology problems: (1) the problem of designing promoter sequences that are differentially expressed and (2) the inverse protein folding problem of recovering protein sequences from their three-dimensional structures. We identify both problems as black-box computational design problems, and we adapt conservative objective models (COMs), a data-driven offline model-based optimization (MBO) technique that has been used successfully on a wide range of design problems, to design biological sequences in these settings. On both problems, we demonstrate that our approach significantly outperforms standard offline MBO techniques.

Advisor: Sergey Levine


BibTeX citation:

@mastersthesis{Kolli:EECS-2023-102,
    Author= {Kolli, Sathvik},
    Title= {Conservative Objective Models for Biological Sequence Design},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-102.html},
    Number= {UCB/EECS-2023-102},
    Abstract= {The success of deep learning in computational biology has been largely limited to prediction problems, such as protein structure prediction and gene expression prediction. Nevertheless, these successes serve as a testament to the ability of deep neural networks to extract useful insights from datasets of biological sequences, and this has recently motivated research into the applications of deep learning for biological sequence design problems. In this paper, we tackle two important synthetic biology problems: (1) the problem of designing promoter sequences that are differentially expressed and (2) the inverse-protein folding problem of recovering protein sequences from three-dimensional structure. We identify both problems as black-box computational design problems, and we adapt conservative objective models (COMs), a data-driven offline model-based optimization (MBO) technique that has been used successfully on a wide range of design problems, to design biological sequences in these settings. On both problems, we demonstrate that our approach significantly outperforms standard offline MBO techniques.},
}

EndNote citation:

%0 Thesis
%A Kolli, Sathvik
%T Conservative Objective Models for Biological Sequence Design
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-102
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-102.html
%F Kolli:EECS-2023-102