EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-257

December 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-257.pdf

This thesis examines two constraint-driven, machine learning-based techniques for molecule generation. The first is BBO-SYN, a generative framework based on black-box optimization (BBO) that predicts diverse molecules with desired properties together with their corresponding synthesis pathways. BBO-SYN uses recent advances in Monte Carlo Tree Search-based latent search to locate promising reactants that produce high-scoring products when fed to a pretrained language model for chemical reaction prediction. BBO-SYN is empirically shown to produce high-scoring and diverse synthesis trees while operating over a large continuous reactant space. The second, developed after exploring synthesizability constraints, is CoarsenConf, which generates optimal low-energy 3D conformers in an SE(3)-equivariant fashion. CoarsenConf is a hierarchical graph variational autoencoder that coarsens input molecule graphs based on torsion angles to learn a subgraph-level latent distribution, which is used for efficient autoregressive generation via aggregated attention. CoarsenConf predominantly outperforms state-of-the-art methods with significantly less data and fewer training iterations on more robust benchmarks.


BibTeX citation:

@mastersthesis{EECS-2023-257,
    Editor= {Krishnapriyan, Aditi and Klein, Daniel},
    Title= {Generating Optimal Molecules with Synthesizability and 3D Equivariant Conformational Constraints},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-257.html},
    Number= {UCB/EECS-2023-257},
    Abstract= {This thesis examines two constraint-driven, machine learning-based techniques for molecule generation. The first is BBO-SYN, a generative framework based on black-box optimization (BBO) that predicts diverse molecules with desired properties together with their corresponding synthesis pathways. BBO-SYN uses recent advances in Monte Carlo Tree Search-based latent search to locate promising reactants that produce high-scoring products when fed to a pretrained language model for chemical reaction prediction. BBO-SYN is empirically shown to produce high-scoring and diverse synthesis trees while operating over a large continuous reactant space. The second, developed after exploring synthesizability constraints, is CoarsenConf, which generates optimal low-energy 3D conformers in an SE(3)-equivariant fashion. CoarsenConf is a hierarchical graph variational autoencoder that coarsens input molecule graphs based on torsion angles to learn a subgraph-level latent distribution, which is used for efficient autoregressive generation via aggregated attention. CoarsenConf predominantly outperforms state-of-the-art methods with significantly less data and fewer training iterations on more robust benchmarks.},
}

EndNote citation:

%0 Thesis
%E Krishnapriyan, Aditi 
%E Klein, Daniel 
%T Generating Optimal Molecules with Synthesizability and 3D Equivariant Conformational Constraints
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-257
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-257.html
%F EECS-2023-257