Generating Optimal Molecules with Synthesizability and 3D Equivariant Conformational Constraints
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-257
December 1, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-257.pdf
This thesis examines two constraint-driven, machine learning-based techniques for molecule generation. The first is BBO-SYN, a generative framework based on black-box optimization (BBO) that predicts diverse molecules with desired properties together with their corresponding synthesis pathways. BBO-SYN uses a recent Monte Carlo Tree Search-based latent search algorithm to locate promising reactants that produce high-scoring products when fed to a pretrained language model for chemical reaction prediction. BBO-SYN is shown empirically to produce high-scoring and diverse synthesis trees while operating over a large continuous reactant space. The second, developed after exploring synthesizability constraints, is CoarsenConf, which generates optimal low-energy 3D conformers in an SE(3)-equivariant fashion. CoarsenConf is a hierarchical graph variational autoencoder that coarsens input molecule graphs based on torsion angles to learn a subgraph-level latent distribution, which is then used for efficient autoregressive generation via aggregated attention. CoarsenConf predominantly outperforms state-of-the-art methods with significantly less data and fewer training iterations on more robust benchmarks.
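To make the idea of coarsening a molecule graph based on torsion angles concrete, here is a minimal sketch in plain Python. It rests on an assumption that is not stated in the abstract: that coarsening can be approximated by cutting every rotatable bond (torsion axis) and treating each remaining connected fragment as one coarse-grained bead. The function name `coarsen_by_rotatable_bonds`, its arguments, and the hand-picked rotatable bond are all hypothetical; the thesis itself may define the procedure differently.

```python
from collections import defaultdict


def coarsen_by_rotatable_bonds(num_atoms, bonds, rotatable_bonds):
    """Group atoms into rigid fragments by cutting every rotatable bond.

    bonds: iterable of (i, j) atom-index pairs.
    rotatable_bonds: subset of bonds treated as torsion axes (assumed given;
        detecting them is outside the scope of this sketch).
    Returns a list of sorted atom-index lists, one per coarse-grained bead.
    """
    cut = {frozenset(b) for b in rotatable_bonds}

    # Build adjacency lists from the bonds that are NOT torsion axes.
    adjacency = defaultdict(list)
    for i, j in bonds:
        if frozenset((i, j)) not in cut:
            adjacency[i].append(j)
            adjacency[j].append(i)

    # Connected components of the remaining graph are the rigid fragments.
    seen = set()
    fragments = []
    for start in range(num_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        fragment = []
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        fragments.append(sorted(fragment))
    return fragments


# Heavy-atom skeleton of n-butane, C0-C1-C2-C3, with the central
# C1-C2 bond chosen by hand as the torsion axis.
butane_bonds = [(0, 1), (1, 2), (2, 3)]
fragments = coarsen_by_rotatable_bonds(4, butane_bonds, [(1, 2)])
print(fragments)  # [[0, 1], [2, 3]]
```

Under this reading, each fragment would correspond to one node of the coarse-grained graph, and a subgraph-level latent variable would be learned per fragment rather than per atom.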
BibTeX citation:
@mastersthesis{EECS-2023-257,
    Editor = {Krishnapriyan, Aditi and Klein, Daniel},
    Title = {Generating Optimal Molecules with Synthesizability and 3D Equivariant Conformational Constraints},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-257.html},
    Number = {UCB/EECS-2023-257},
    Abstract = {This thesis examines two constraint-driven, machine learning-based techniques for molecule generation. The first is BBO-SYN, a generative framework based on black-box optimization (BBO) that predicts diverse molecules with desired properties together with their corresponding synthesis pathways. BBO-SYN uses a recent Monte Carlo Tree Search-based latent search algorithm to locate promising reactants that produce high-scoring products when fed to a pretrained language model for chemical reaction prediction. BBO-SYN is shown empirically to produce high-scoring and diverse synthesis trees while operating over a large continuous reactant space. The second, developed after exploring synthesizability constraints, is CoarsenConf, which generates optimal low-energy 3D conformers in an SE(3)-equivariant fashion. CoarsenConf is a hierarchical graph variational autoencoder that coarsens input molecule graphs based on torsion angles to learn a subgraph-level latent distribution, which is then used for efficient autoregressive generation via aggregated attention. CoarsenConf predominantly outperforms state-of-the-art methods with significantly less data and fewer training iterations on more robust benchmarks.},
}
EndNote citation:
%0 Thesis
%E Krishnapriyan, Aditi
%E Klein, Daniel
%T Generating Optimal Molecules with Synthesizability and 3D Equivariant Conformational Constraints
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-257
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-257.html
%F EECS-2023-257