A Likelihood-based Deconvolution of Bulk Gene Expression Data Using Single-cell References

Justin Hong, Dan D. Erdmann-Pham, Jonathan Fischer and Yun S. Song

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2021-21
May 1, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-21.pdf

A single bulk gene expression experiment estimates thousands of RNA transcript levels averaged over myriad cells. Unfortunately, direct comparison of different bulk expression profiles is complicated by the mixtures of distinct cell types in each sample, obscuring whether perceived differences are actually due to changes in expression levels themselves or simply cell type composition. Single-cell technology has made it possible to measure gene expression in individual cells, achieving higher resolution at the expense of increased noise. If carefully incorporated, such data can be used as references for the supervised deconvolution of bulk samples to yield accurate estimates of the true cell type proportions. These estimates permit us to disentangle the effects of differential expression and cell type mixtures, both of which are independently relevant to our understanding of aging and disease. We hence propose a generative model which uses asymptotic statistical theory and a robust estimation procedure to perform a supervised deconvolution of bulk RNA-seq samples to produce cell type proportion estimates. We demonstrate the effectiveness of our approach in several scenarios with real data and also discuss several novel extensions made uniquely possible by our paradigm.
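
To make the deconvolution problem concrete, the following is a minimal sketch that treats a bulk profile as a proportion-weighted mixture of per-cell-type reference profiles and recovers the proportions by non-negative least squares with multiplicative updates. This is a deliberately simple stand-in, not the likelihood-based generative model proposed in the report. The function name `deconvolve`, the variables `signature`, `true_props`, and `bulk`, and all numeric values are hypothetical and chosen only for illustration.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate cell type proportions p such that bulk ~= signature * p.

    signature: list of rows, one per gene, each holding the mean expression
               of that gene in every cell type (all entries positive).
    bulk:      list of bulk expression values, one per gene.
    This is plain non-negative least squares via multiplicative updates,
    NOT the report's likelihood-based estimator.
    """
    n_genes = len(signature)
    n_types = len(signature[0])

    # Start from a uniform guess so every proportion stays strictly positive.
    p = [1.0 / n_types] * n_types

    # Precompute S^T b, which does not change across iterations.
    stb = [sum(signature[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]

    for _ in range(n_iter):
        # Bulk profile implied by the current proportions: S p.
        fitted = [sum(signature[g][k] * p[k] for k in range(n_types))
                  for g in range(n_genes)]
        # S^T S p, the denominator of the multiplicative update.
        stsp = [sum(signature[g][k] * fitted[g] for g in range(n_genes))
                for k in range(n_types)]
        # The update keeps p non-negative and does not increase the squared error.
        p = [p[k] * stb[k] / stsp[k] for k in range(n_types)]

    # Rescale so the estimates can be read as proportions.
    total = sum(p)
    return [x / total for x in p]


# Toy reference: 4 genes (rows) by 3 cell types (columns), made-up values.
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 2.0],
    [2.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]

# Made-up ground-truth proportions used to simulate a noiseless bulk sample.
true_props = [0.5, 0.3, 0.2]
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]

estimated = deconvolve(signature, bulk)
print([round(x, 4) for x in estimated])
```

In this noiseless toy setting the estimates should match the simulated proportions closely. Real single-cell references are noisy, which is the difficulty the abstract points to and the reason a least-squares sketch like this one is only a starting point.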

Advisors: Kannan Ramchandran and Yun S. Song


BibTeX citation:

@mastersthesis{Hong:EECS-2021-21,
    Author = {Hong, Justin and Erdmann-Pham, Dan D. and Fischer, Jonathan and Song, Yun S.},
    Title = {A Likelihood-based Deconvolution of Bulk Gene Expression Data Using Single-cell References},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-21.html},
    Number = {UCB/EECS-2021-21},
    Abstract = {A single bulk gene expression experiment estimates thousands of RNA transcript levels averaged over myriad cells. Unfortunately, direct comparison of different bulk expression profiles is complicated by the mixtures of distinct cell types in each sample, obscuring whether perceived differences are actually due to changes in expression levels themselves or simply cell type composition. Single-cell technology has made it possible to measure gene expression in individual cells, achieving higher resolution at the expense of increased noise. If carefully incorporated, such data can be used as references for the supervised deconvolution of bulk samples to yield accurate estimates of the true cell type proportions. These estimates permit us to disentangle the effects of differential expression and cell type mixtures, both of which are independently relevant to our understanding of aging and disease. We hence propose a generative model which uses asymptotic statistical theory and a robust estimation procedure to perform a supervised deconvolution of bulk RNA-seq samples to produce cell type proportion estimates. We demonstrate the effectiveness of our approach in several scenarios with real data and also discuss several novel extensions made uniquely possible by our paradigm.}
}

EndNote citation:

%0 Thesis
%A Hong, Justin
%A Erdmann-Pham, Dan D.
%A Fischer, Jonathan
%A Song, Yun S.
%T A Likelihood-based Deconvolution of Bulk Gene Expression Data Using Single-cell References
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 1
%@ UCB/EECS-2021-21
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-21.html
%F Hong:EECS-2021-21