Kunal Agarwal

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-80

May 12, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-80.pdf

Exploratory Data Analysis (EDA) is a vital part of data science that usually occurs in computational notebooks with tools such as pandas. One of the most popular tools for EDA is Lux, which visualizes data stored in pandas DataFrames in a dashboard displayed in a Jupyter Notebook. However, as datasets grow larger, the computation necessary to generate these visualizations grows as well, slowing down Lux. We consider the use of sampling to accelerate the computation required for generating visualizations. We analyze how Lux performs on large datasets and determine which parts of Lux can be accelerated using data sampling. We then integrate our sampling method into Lux and demonstrate a significant speedup without compromising the quality of the visualizations produced by Lux.

Advisor: Aditya Parameswaran


BibTeX citation:

@mastersthesis{Agarwal:EECS-2022-80,
    Author= {Agarwal, Kunal},
    Title= {Accelerating Visual Data Exploration via Sampling: A Case Study with Lux},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-80.html},
    Number= {UCB/EECS-2022-80},
    Abstract= {Exploratory Data Analysis (EDA) is a vital part of data science that usually occurs in computational notebooks with tools such as pandas. One of the most popular tools for EDA is Lux, which visualizes data stored in pandas DataFrames in a dashboard displayed in a Jupyter Notebook. However, as datasets grow larger, the computation necessary to generate these visualizations grows as well, slowing down Lux. We consider the use of sampling to accelerate the computation required for generating visualizations. We analyze how Lux performs on large datasets and determine which parts of Lux can be accelerated using data sampling. We then integrate our sampling method into Lux and demonstrate a significant speedup without compromising the quality of the visualizations produced by Lux.},
}

EndNote citation:

%0 Thesis
%A Agarwal, Kunal 
%T Accelerating Visual Data Exploration via Sampling: A Case Study with Lux
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 12
%@ UCB/EECS-2022-80
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-80.html
%F Agarwal:EECS-2022-80