Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data

Keyan Abou-Nasseri

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2022-167
May 29, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-167.pdf

Machine learning in high-stakes domains, such as healthcare, faces two critical challenges: (1) generalizing to diverse data distributions given limited training data, and (2) maintaining interpretability. To address these challenges, we propose an instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model. Given distinct groups of instances in a dataset (e.g., medical patients of different ages or from different treatment sites), our method first estimates group membership probabilities for each instance. Then, it uses these estimates as instance weights in FIGS [1], an existing greedy tree-sums method, to grow a set of decision trees whose values sum to the final prediction. We call this new method Group Probability-Weighted Tree Sums (G-FIGS). Extensive experiments on datasets for important clinical decision instruments show that G-FIGS achieves state-of-the-art prediction performance; e.g., holding sensitivity fixed at 92%, G-FIGS increases specificity for identifying cervical spine injury (CSI) by up to 10% over CART and up to 3% over FIGS alone, with larger gains at higher sensitivity levels. By keeping the total number of tree splits below 16 in FIGS, the final models remain interpretable, and we find that they match medical domain expertise. All code, data, and models are released on GitHub: Group Probability-Weighted Tree Sums is integrated into the Python package imodels [2] with an sklearn-compatible API, and experiments for reproducing the results here can be found at Yu-Group/imodels-experiments.
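As a concrete illustration of the two-stage procedure described in the abstract, the minimal sketch below first fits a logistic regression to estimate each instance's probability of belonging to a target group, then passes those probabilities as sample weights when fitting FIGS. FIGSClassifier and its max_rules parameter come from the imodels package; the assumption that its fit method accepts an sklearn-style sample_weight argument, and the helper name fit_gfigs, are illustrative rather than taken from the report.

    # Minimal sketch of the G-FIGS two-stage procedure. Assumes
    # FIGSClassifier.fit accepts an sklearn-style sample_weight argument.
    from sklearn.linear_model import LogisticRegression
    from imodels import FIGSClassifier

    def fit_gfigs(X, y, group, target_group, max_rules=16):
        """Fit FIGS for one group, weighting every instance by its
        estimated probability of membership in that group.

        X: (n, d) feature matrix; y: (n,) binary outcomes;
        group: (n,) group labels (e.g., treatment site or age bucket);
        target_group: the group whose model is being fit.
        """
        # Stage 1: estimate P(group = target_group | x) for each instance.
        membership_clf = LogisticRegression(max_iter=1000)
        membership_clf.fit(X, group == target_group)
        weights = membership_clf.predict_proba(X)[:, 1]

        # Stage 2: grow a sum of trees on ALL the data, with each instance
        # weighted by its membership probability, so that data from other
        # groups is pooled in proportion to its estimated relevance.
        model = FIGSClassifier(max_rules=max_rules)
        model.fit(X, y, sample_weight=weights)
        return model

Capping max_rules at 16 mirrors the abstract's constraint on the total number of tree splits, which is what keeps the final model small enough to inspect against domain expertise.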

Advisor: Trevor Darrell


BibTeX citation:

@mastersthesis{Abou-Nasseri:EECS-2022-167,
    Author = {Abou-Nasseri, Keyan},
    Title = {Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-167.html},
    Number = {UCB/EECS-2022-167},
    Abstract = {Machine learning in high-stakes domains, such as healthcare, faces two critical challenges: (1) generalizing to diverse data distributions given limited training data, and (2) maintaining interpretability. To address these challenges, we propose an instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model. Given distinct groups of instances in a dataset (e.g., medical patients of different ages or from different treatment sites), our method first estimates group membership probabilities for each instance. Then, it uses these estimates as instance weights in FIGS [1], an existing greedy tree-sums method, to grow a set of decision trees whose values sum to the final prediction. We call this new method Group Probability-Weighted Tree Sums (G-FIGS). Extensive experiments on datasets for important clinical decision instruments show that G-FIGS achieves state-of-the-art prediction performance; e.g., holding sensitivity fixed at 92%, G-FIGS increases specificity for identifying cervical spine injury (CSI) by up to 10% over CART and up to 3% over FIGS alone, with larger gains at higher sensitivity levels. By keeping the total number of tree splits below 16 in FIGS, the final models remain interpretable, and we find that they match medical domain expertise. All code, data, and models are released on GitHub: Group Probability-Weighted Tree Sums is integrated into the Python package imodels [2] with an sklearn-compatible API, and experiments for reproducing the results here can be found at Yu-Group/imodels-experiments.}
}

EndNote citation:

%0 Thesis
%A Abou-Nasseri, Keyan
%T Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 29
%@ UCB/EECS-2022-167
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-167.html
%F Abou-Nasseri:EECS-2022-167