Understanding Data Analysis Activity via Log Analysis

Sara Alspaugh

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2017-137
August 3, 2017

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-137.pdf

The study of user analysis behavior is of interest to the designers of analysis tools. Specific questions studied include: What types of tasks do users perform using this analysis tool? What approaches do users take to gain insights? What interface features help or hinder users in their work? What are the distinguishing characteristics of different types of users? These questions are often investigated through controlled experiments, observational studies, user interviews, or surveys. An alternative avenue of investigation is to analyze the logs – the records of user activity – generated by analysis tools themselves. In this dissertation we present two case studies using log analysis to understand user behavior. In the first, we analyze records of user queries from Splunk, a system for log analysis, as well as a survey of Splunk users. In the second, we analyze detailed event logs and application state from Tableau, a system for visualizing relational data. We focus in particular on methods of identifying higher-level units of activity, which we refer to as tasks. We include a discussion of the particular challenges associated with collecting and analyzing log data from analysis systems. In addition to this discussion, our contributions include the description of two different approaches for identifying higher-level analysis activity from logs and a summary of the tasks represented in our datasets.

Advisors: Randy H. Katz and Marti Hearst


BibTeX citation:

@phdthesis{Alspaugh:EECS-2017-137,
    Author = {Alspaugh, Sara},
    Title = {Understanding Data Analysis Activity via Log Analysis},
    School = {EECS Department, University of California, Berkeley},
    Year = {2017},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-137.html},
    Number = {UCB/EECS-2017-137},
    Abstract = {The study of user analysis behavior is of interest to the designers of analysis tools. Specific questions studied include: What types of tasks do users perform using this analysis tool? What approaches do users take to gain insights? What interface features help or hinder users in their work? What are the distinguishing characteristics of different types of users? These questions are often investigated through controlled experiments, observational studies, user interviews, or surveys. An alternative avenue of investigation is to analyze the logs – the records of user activity – generated by analysis tools themselves. In this dissertation we present two case studies using log analysis to understand user behavior. In the first, we analyze records of user queries from Splunk, a system for log analysis, as well as a survey of Splunk users. In the second, we analyze detailed event logs and application state from Tableau, a system for visualizing relational data. We focus in particular on methods of identifying higher-level units of activity, which we refer to as tasks. We include a discussion of the particular challenges associated with collecting and analyzing log data from analysis systems. In addition to this discussion, our contributions include the description of two different approaches for identifying higher-level analysis activity from logs and a summary of the tasks represented in our datasets.}
}

EndNote citation:

%0 Thesis
%A Alspaugh, Sara
%T Understanding Data Analysis Activity via Log Analysis
%I EECS Department, University of California, Berkeley
%D 2017
%8 August 3
%@ UCB/EECS-2017-137
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-137.html
%F Alspaugh:EECS-2017-137