Richard Lin

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-107

May 14, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-107.pdf

Spreadsheets are easy to learn and provide intuitive controls for exploring data, but they scale poorly and make certain tasks harder to perform than in code. Dataframes are more performant than spreadsheets and can support significantly larger datasets, but they have a steeper learning curve and are less interactive. These complementary strengths and weaknesses lead data scientists to switch between the two tools for different steps of their work. Rather than treating them as separate workflows, we want to integrate them into a single workflow so that users can take advantage of both tools.

We present Modin Spreadsheet, a spreadsheet UI for dataframes that is implemented on top of the popular Modin dataframe system. Modin Spreadsheet builds on Qgrid in modeling spreadsheet data as a dataframe and improves on traditional spreadsheet software in interactivity, scalability, and reproducibility. By integrating spreadsheets into a coding interface, Modin Spreadsheet enables a new form of reproducibility for spreadsheets: spreadsheet changes are represented as dataframe code.

Advisor: Aditya Parameswaran


BibTeX citation:

@mastersthesis{Lin:EECS-2021-107,
    Author= {Lin, Richard},
    Editor= {Parameswaran, Aditya},
    Title= {A Spreadsheet Interface for Dataframes},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-107.html},
    Number= {UCB/EECS-2021-107},
    Abstract= {Spreadsheets are easy to learn and provide intuitive controls for exploring data, but they scale poorly and make certain tasks harder to perform than in code. Dataframes are more performant than spreadsheets and can support significantly larger datasets, but they have a steeper learning curve and are less interactive. These complementary strengths and weaknesses lead data scientists to switch between the two tools for different steps of their work. Rather than treating them as separate workflows, we want to integrate them into a single workflow so that users can take advantage of both tools.

We present Modin Spreadsheet, a spreadsheet UI for dataframes that is implemented on top of the popular Modin dataframe system. Modin Spreadsheet builds on Qgrid in modeling spreadsheet data as a dataframe and improves on traditional spreadsheet software in interactivity, scalability, and reproducibility. By integrating spreadsheets into a coding interface, Modin Spreadsheet enables a new form of reproducibility for spreadsheets: spreadsheet changes are represented as dataframe code.},
}

EndNote citation:

%0 Thesis
%A Lin, Richard 
%E Parameswaran, Aditya 
%T A Spreadsheet Interface for Dataframes
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 14
%@ UCB/EECS-2021-107
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-107.html
%F Lin:EECS-2021-107