Machine Learning Prediction of TCR-Epitope Binding

Julian Faust and Yun S. Song

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-216

August 17, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-216.pdf

Predicting T-cell receptor (TCR) binding to peptide-MHC (pMHC) complexes remains difficult because of limited data accuracy, data scarcity, and the complexity of the problem. Here, we compare predictions of TCR-pMHC binding across several approaches to featurizing the TCR and several machine learning methods. First, we analyze the available data and discuss how binder and non-binder designations are formulated for our binary classification framework. Next, we compare several featurizations of the TCR across machine learning methods of varying complexity. We provide an ablation study across the region combinations that are common in cases with limited data. We show that simpler machine learning methods trained on binders and non-binders of a single epitope can be used to better understand binding factors. Our attention-based neural network directly incorporates peptide and MHC sequence information and performs similarly on the harder problem of training with binders and non-binders of many epitopes at once. Lastly, we incorporate gene usage data into our prediction framework.

Advisor: Yun S. Song

BibTeX citation:

@mastersthesis{Faust:EECS-2022-216,
    Author= {Faust, Julian and Song, Yun S.},
    Title= {Machine Learning Prediction of TCR-Epitope Binding},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-216.html},
    Number= {UCB/EECS-2022-216},
    Abstract= {Predicting T-cell receptor (TCR) binding to peptide-MHC (pMHC) complexes remains difficult because of limited data accuracy, data scarcity, and the complexity of the problem. Here, we compare predictions of TCR-pMHC binding across several approaches to featurizing the TCR and several machine learning methods. First, we analyze the available data and discuss how binder and non-binder designations are formulated for our binary classification framework. Next, we compare several featurizations of the TCR across machine learning methods of varying complexity. We provide an ablation study across the region combinations that are common in cases with limited data. We show that simpler machine learning methods trained on binders and non-binders of a single epitope can be used to better understand binding factors. Our attention-based neural network directly incorporates peptide and MHC sequence information and performs similarly on the harder problem of training with binders and non-binders of many epitopes at once. Lastly, we incorporate gene usage data into our prediction framework.},
}

EndNote citation:

%0 Thesis
%A Faust, Julian 
%A Song, Yun S. 
%T Machine Learning Prediction of TCR-Epitope Binding
%I EECS Department, University of California, Berkeley
%D 2022
%8 August 17
%@ UCB/EECS-2022-216
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-216.html
%F Faust:EECS-2022-216