Pedagogy, Infrastructure, and Analytics for Data Science Education at Scale

Vinitra Swamy

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2018-81

May 19, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-81.pdf

This report presents an educational computing environment for data science education at scale, as used at the University of California, Berkeley. With the rise of online learners in massive open online courses (MOOCs), we detail a technical case study of the decisions made in converting an introductory undergraduate data science course into a series of data science MOOCs on edX. The study focuses on student and instructor workflow, distributed system infrastructure, cost analysis, cloud resource allocation, and autograding integration in the scaling process. We implement an analytics pipeline for collecting data from Jupyter notebooks and propose a modification of Deep Knowledge Tracing to model student progress on coding assignments.

Advisor: David E. Culler

BibTeX citation:

@mastersthesis{Swamy:EECS-2018-81,
    Author= {Swamy, Vinitra},
    Title= {Pedagogy, Infrastructure, and Analytics for Data Science Education at Scale},
    School= {EECS Department, University of California, Berkeley},
    Year= {2018},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-81.html},
    Number= {UCB/EECS-2018-81},
    Abstract= {This report presents an educational computing environment for data science education at scale, as used at the University of California, Berkeley. With the rise of online learners in massive open online courses (MOOCs), we detail a technical case study of the decisions made in converting an introductory undergraduate data science course into a series of data science MOOCs on edX. The study focuses on student and instructor workflow, distributed system infrastructure, cost analysis, cloud resource allocation, and autograding integration in the scaling process. We implement an analytics pipeline for collecting data from Jupyter notebooks and propose a modification of Deep Knowledge Tracing to model student progress on coding assignments.},
}

EndNote citation:

%0 Thesis
%A Swamy, Vinitra
%T Pedagogy, Infrastructure, and Analytics for Data Science Education at Scale
%I EECS Department, University of California, Berkeley
%D 2018
%8 May 19
%@ UCB/EECS-2018-81
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-81.html
%F Swamy:EECS-2018-81