The Management of Context in the Machine Learning Lifecycle

Rolando Garcia

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-142

June 25, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-142.pdf

We present novel techniques and systems for managing data context within the machine learning (ML) lifecycle. Drawing from a vision laid out in 2018, we present Flor and its evolutions, FlorDB and FlorDB with Build extensions, designed for comprehensive metadata capture and version control in the ML lifecycle. A cornerstone of our approach is the use of an interview study to understand what the ML lifecycle is, and how engineers operationalize machine learning, focusing on MLOps and the iterative model development process. Through the implementation of these systems and their use in real-world applications for lawyers and journalists, we demonstrate the tangible benefits of rich data context in agile model development. In sum, we show how the integration of Application, Build, and Change contexts—The ABCs of Context—enables MLEs to close the loop in the ML lifecycle.

Advisor: Joseph M. Hellerstein


BibTeX citation:

@phdthesis{Garcia:EECS-2024-142,
    Author= {Garcia, Rolando},
    Title= {The Management of Context in the Machine Learning Lifecycle},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {Jun},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-142.html},
    Number= {UCB/EECS-2024-142},
    Abstract= {We present novel techniques and systems for managing data context within the machine learning (ML) lifecycle. Drawing from a vision laid out in 2018, we present Flor and its evolutions, FlorDB and FlorDB with Build extensions, designed for comprehensive metadata capture and version control in the ML lifecycle. A cornerstone of our approach is the use of an interview study to understand what the ML lifecycle is, and how engineers operationalize machine learning, focusing on MLOps and the iterative model development process. Through the implementation of these systems and their use in real-world applications for lawyers and journalists, we demonstrate the tangible benefits of rich data context in agile model development. In sum, we show how the integration of Application, Build, and Change contexts—The ABCs of Context—enables MLEs to close the loop in the ML lifecycle.},
}

EndNote citation:

%0 Thesis
%A Garcia, Rolando
%T The Management of Context in the Machine Learning Lifecycle
%I EECS Department, University of California, Berkeley
%D 2024
%8 June 25
%@ UCB/EECS-2024-142
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-142.html
%F Garcia:EECS-2024-142