Jupyter’s Archive: Searchable Output Histories for Computational Notebooks

Kunal Chaudhary

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-72

May 17, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-72.pdf

When using a computational notebook, programmers tend to run, overwrite, and delete cells many times. These actions, which are core to exploratory programming, create a long history of outputs that becomes fragmented and difficult to track. These outputs are critical for returning to past states when programmers make implementation mistakes. They are also critical for understanding the evolution of a notebook, which can help programmers improve how they code in different situations. To address this problem, this paper introduces the Output Archive, a thumbnail-based output history built into JupyterLab that automatically records all outputs produced over the lifetime of a notebook and makes the code that produced them available. This paper also introduces a new class of grouping filters that lets users navigate large output histories by clustering outputs based on similarities in their underlying code (similar function names, object names, and parameters). To test the tool, a usability study was run with 12 computational notebook users, who found the Output Archive useful and were able to use its accompanying grouping filters to quickly find important outputs.
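The idea of grouping outputs by similarities in their underlying code can be illustrated with a minimal sketch. This is not the Output Archive's implementation, which the abstract does not describe in detail. It assumes a hypothetical history of (output id, cell source) pairs and uses Python's standard `ast` module to extract called function names, object names, and keyword parameter names. The names `code_features` and `group_outputs` are invented for this illustration.

```python
import ast
from collections import defaultdict


def code_features(source):
    """Extract function names, object names, and keyword parameters from cell code."""
    features = {"functions": set(), "objects": set(), "parameters": set()}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                # Method-style call such as df.plot(...): record "plot".
                features["functions"].add(func.attr)
            elif isinstance(func, ast.Name):
                # Plain call such as print(...): record "print".
                features["functions"].add(func.id)
            # Record keyword argument names; **kwargs has arg=None and is skipped.
            features["parameters"].update(kw.arg for kw in node.keywords if kw.arg)
        elif isinstance(node, ast.Name):
            features["objects"].add(node.id)
    # Names that were called as functions are not counted as objects.
    features["objects"] -= features["functions"]
    return features


def group_outputs(history, kind):
    """Group output ids by a shared feature value.

    history: iterable of (output_id, source) pairs (a hypothetical record format).
    kind: one of "functions", "objects", or "parameters".
    """
    groups = defaultdict(list)
    for output_id, source in history:
        for value in sorted(code_features(source)[kind]):
            groups[value].append(output_id)
    return dict(groups)


if __name__ == "__main__":
    history = [
        ("out1", "df.plot(kind='bar')"),
        ("out2", "df.plot(kind='line', title='t')"),
        ("out3", "print(summary)"),
    ]
    for kind in ("functions", "objects", "parameters"):
        print(kind, group_outputs(history, kind))
```

In this sketch an output may fall into several groups at once, for example when its code uses more than one keyword parameter. A real tool would also need to decide how to treat code that fails to parse, which this example does not handle.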

Advisors: Björn Hartmann

BibTeX citation:

@mastersthesis{Chaudhary:EECS-2019-72,
    Author= {Chaudhary, Kunal},
    Editor= {Head, Andrew and Hartmann, Björn},
    Title= {Jupyter’s Archive: Searchable Output Histories for Computational Notebooks},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-72.html},
    Number= {UCB/EECS-2019-72},
    Abstract= {When using a computational notebook, programmers tend to run, overwrite, and delete cells many times. These actions, which are core to exploratory programming, create a long history of outputs that becomes fragmented and difficult to track. These outputs are critical for returning to past states when programmers make implementation mistakes. They are also critical for understanding the evolution of a notebook, which can help programmers improve how they code in different situations. To address this problem, this paper introduces the Output Archive, a thumbnail-based output history built into JupyterLab that automatically records all outputs produced over the lifetime of a notebook and makes the code that produced them available. This paper also introduces a new class of grouping filters that lets users navigate large output histories by clustering outputs based on similarities in their underlying code (similar function names, object names, and parameters). To test the tool, a usability study was run with 12 computational notebook users, who found the Output Archive useful and were able to use its accompanying grouping filters to quickly find important outputs.},
}

EndNote citation:

%0 Thesis
%A Chaudhary, Kunal 
%E Head, Andrew 
%E Hartmann, Björn 
%T Jupyter’s Archive: Searchable Output Histories for Computational Notebooks
%I EECS Department, University of California, Berkeley
%D 2019
%8 May 17
%@ UCB/EECS-2019-72
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-72.html
%F Chaudhary:EECS-2019-72