Infusing Parallelism into Introductory Computer Science Curriculum using MapReduce
Matthew Johnson and Robert H. Liao and Alexander Rasmussen and Ramesh Sridharan and Dan Garcia and Brian K. Harvey
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2008-34
April 10, 2008
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-34.pdf
We have incorporated cluster computing fundamentals into the introductory computer science curriculum at UC Berkeley. For the first course, we have developed coursework and programming problems in Scheme centered on Google's MapReduce. To allow students familiar only with Scheme to write and run MapReduce programs, we designed a functional interface in Scheme and implemented software to run those tasks in parallel on a cluster. The streamlined interface lets students focus on the essence of the MapReduce model rather than the potentially cumbersome details of a MapReduce implementation, which delivers a clear pedagogical advantage.
The interface's simplicity and purely functional treatment allow students to tackle data-parallel problems after only the first two-thirds of the first introductory course.
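To make the flavor of such an interface concrete, the following is a minimal, sequential sketch in Scheme of a MapReduce-style call. The procedure names (mapreduce, wc-mapper, wc-reducer) and the key-value representation are illustrative assumptions rather than the course's actual API; the sketch models only the data flow, with no cluster or Hadoop behind it.

;; A sequential model of a MapReduce-style interface (illustrative only).
;; mapper:  key, value         -> list of (key . value) pairs
;; reducer: key, list of vals  -> a single result value

(define (group-by-key kvs)
  ;; Collect a list of (key . value) pairs into (key value ...) groups,
  ;; preserving the order in which keys first appear.
  (define (insert groups kv)
    (cond ((null? groups)
           (list (list (car kv) (cdr kv))))
          ((equal? (car kv) (caar groups))
           (cons (append (car groups) (list (cdr kv)))
                 (cdr groups)))
          (else
           (cons (car groups) (insert (cdr groups) kv)))))
  (let loop ((remaining kvs) (groups '()))
    (if (null? remaining)
        groups
        (loop (cdr remaining) (insert groups (car remaining))))))

(define (mapreduce mapper reducer kv-pairs)
  ;; Apply the mapper to every input pair, group the intermediate pairs
  ;; by key, and reduce each group to one (key . result) pair.
  (let* ((mapped  (apply append
                         (map (lambda (kv) (mapper (car kv) (cdr kv)))
                              kv-pairs)))
         (grouped (group-by-key mapped)))
    (map (lambda (group)
           (cons (car group) (reducer (car group) (cdr group))))
         grouped)))

;; Word count: inputs are (line-number . list-of-words) pairs.
(define (wc-mapper line-number words)
  (map (lambda (word) (cons word 1)) words))

(define (wc-reducer word counts)
  (apply + counts))

(mapreduce wc-mapper wc-reducer
           '((1 . (the quick brown fox))
             (2 . (the lazy dog))))
;; => ((the . 2) (quick . 1) (brown . 1) (fox . 1) (lazy . 1) (dog . 1))

In the cluster setting the report describes, the same purely functional mapper and reducer would instead be distributed across machines by the underlying MapReduce implementation.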
In this paper we describe the system we implemented to interface our Scheme interpreter with a cluster running Hadoop (a Java-based MapReduce implementation). Our design can serve as a prototype for other such interfaces in educational environments that do not use Java and therefore cannot use Hadoop directly. We also outline the MapReduce exercises we have introduced into our introductory course, which allow students in an introductory programming class to begin working with data-parallel programs and designs.
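One plausible way to plug an interpreter into Hadoop, and the style its streaming facility is designed for, is to run map and reduce tasks as ordinary programs that read key-value pairs from standard input and write them to standard output. The report's actual wiring between the Scheme interpreter and Hadoop may differ; the sketch below is only an illustration of that streaming style, and it assumes read-line is available (true of most Scheme implementations, though not of strict R5RS).

;; A streaming-style word-count mapper in Scheme (illustrative assumption):
;; read text lines from standard input and write one "word<TAB>1" pair per
;; word to standard output, the tab-separated convention Hadoop Streaming
;; expects from external map tasks.

(define tab (integer->char 9))   ; portable tab character

(define (split-words str)
  ;; Split a string into words on spaces (a deliberately simple tokenizer).
  (let loop ((chars (string->list str)) (current '()) (words '()))
    (define (flush ws)
      (if (null? current)
          ws
          (cons (list->string (reverse current)) ws)))
    (cond ((null? chars) (reverse (flush words)))
          ((char=? (car chars) #\space)
           (loop (cdr chars) '() (flush words)))
          (else
           (loop (cdr chars) (cons (car chars) current) words)))))

(define (emit key value)
  (display key) (display tab) (display value) (newline))

;; Main loop: runs when the interpreter loads this file as a map task.
(let loop ((line (read-line)))
  (if (eof-object? line)
      'done
      (begin
        (for-each (lambda (word) (emit word 1)) (split-words line))
        (loop (read-line)))))

A matching reducer would read the grouped, tab-separated pairs the framework hands it on standard input and sum the counts for each word.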
BibTeX citation:
@techreport{Johnson:EECS-2008-34,
    Author = {Johnson, Matthew and Liao, Robert H. and Rasmussen, Alexander and Sridharan, Ramesh and Garcia, Dan and Harvey, Brian K.},
    Title = {Infusing Parallelism into Introductory Computer Science Curriculum using MapReduce},
    Year = {2008},
    Month = {Apr},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-34.html},
    Number = {UCB/EECS-2008-34},
    Abstract = {We have incorporated cluster computing fundamentals into the introductory computer science curriculum at UC Berkeley. For the first course, we have developed coursework and programming problems in Scheme centered on Google's MapReduce. To allow students familiar only with Scheme to write and run MapReduce programs, we designed a functional interface in Scheme and implemented software to run those tasks in parallel on a cluster. The streamlined interface lets students focus on the essence of the MapReduce model rather than the potentially cumbersome details of a MapReduce implementation, which delivers a clear pedagogical advantage. The interface's simplicity and purely functional treatment allow students to tackle data-parallel problems after only the first two-thirds of the first introductory course. In this paper we describe the system we implemented to interface our Scheme interpreter with a cluster running Hadoop (a Java-based MapReduce implementation). Our design can serve as a prototype for other such interfaces in educational environments that do not use Java and therefore cannot use Hadoop directly. We also outline the MapReduce exercises we have introduced into our introductory course, which allow students in an introductory programming class to begin working with data-parallel programs and designs.}
}
EndNote citation:
%0 Report
%A Johnson, Matthew
%A Liao, Robert H.
%A Rasmussen, Alexander
%A Sridharan, Ramesh
%A Garcia, Dan
%A Harvey, Brian K.
%T Infusing Parallelism into Introductory Computer Science Curriculum using MapReduce
%I EECS Department, University of California, Berkeley
%D 2008
%8 April 10
%@ UCB/EECS-2008-34
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-34.html
%F Johnson:EECS-2008-34