Designing Education Tools for Undergraduate Data Science Pedagogy at Scale

UC Berkeley has pioneered the use of scalable, open-source EdTech tools that help data science instructors deliver interactive materials. Recently, demand for data science courses at Berkeley has outpaced the university’s ability to offer them in a traditional setting. This has created the need for software that provides an accessible learning environment for over 1,300 undergraduates in a single course while keeping course resources and infrastructure reliable. The main tools are described below:

  1. JupyterHub 

JupyterHub is product-agnostic, open-source software that creates on-demand, cloud-based Jupyter Notebook servers. It has allowed Berkeley’s data science program to deploy scalable Jupyter infrastructure on cloud computing resources. A cloud-based JupyterHub gives students pre-installed software, quicker access to course content, and computing flexibility.
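
As a rough illustration of what such a deployment involves, the fragment below sketches a `jupyterhub_config.py` that caps per-student resources and throttles simultaneous server launches. It is a minimal sketch, not Berkeley's actual configuration: the Kubernetes spawner, the image name `example.org/course-image:latest`, and every numeric limit are assumptions chosen for illustration.

```python
# jupyterhub_config.py (illustrative fragment; JupyterHub loads this file
# and provides get_config(), so it is not meant to be run on its own)
c = get_config()  # noqa: F821

# Assumption: servers are launched as Kubernetes pods via KubeSpawner.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Hypothetical image with the course software pre-installed.
c.KubeSpawner.image = "example.org/course-image:latest"

# Per-student resource caps (placeholder values).
c.Spawner.mem_limit = "1G"
c.Spawner.cpu_limit = 1.0

# Limit how many servers may be starting at once, so a rush at the
# beginning of lecture does not overwhelm the cluster (placeholder values).
c.JupyterHub.concurrent_spawn_limit = 100
c.JupyterHub.active_server_limit = 1500
```

Keeping the software in a single image and the limits in a single file means every student starts from the same environment, which is what makes the setup repeatable across large courses.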

  1. Jupyter Books

Jupyter Books is a subproject of Project Jupyter that creates online textbooks for data-driven courses. Tools such as Binder and Thebelab allow students to run code cells directly within the page and interact with visualizations. The adaptability of Jupyter Books promotes collaboration across disciplines.

  1. Autograding 

Otter Grader is a lightweight, open-source autograder that lets instructors grade student submissions locally in batch or use Gradescope’s proprietary autograding service to grade them on Gradescope’s servers. Otter Service, an optional supplement to the local Otter Grader, further enables instructors to fully customize their grading environments, which is necessary for scaling to classes of thousands of students.
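
The idea of local batch grading can be sketched in a few lines of plain Python. This is a conceptual illustration only and does not use or reproduce Otter Grader's API: the names `grade_submission` and `grade_batch`, the test format, and the sample submissions are all hypothetical, and a real autograder would run student code in an isolated environment rather than with `exec`.

```python
from typing import Callable, Dict

# A test receives the namespace produced by the student's code and
# returns True when the check passes.
Test = Callable[[dict], bool]


def grade_submission(source: str, tests: Dict[str, Test]) -> Dict[str, bool]:
    """Run one submission, then evaluate every test against its namespace."""
    namespace: dict = {}
    try:
        # Hypothetical shortcut: a real grader sandboxes this step.
        exec(source, namespace)
    except Exception:
        # Code that fails to run at all fails every test.
        return {name: False for name in tests}

    results: Dict[str, bool] = {}
    for name, test in tests.items():
        try:
            results[name] = bool(test(namespace))
        except Exception:
            results[name] = False
    return results


def grade_batch(submissions: Dict[str, str], tests: Dict[str, Test]) -> Dict[str, float]:
    """Grade every submission and return the fraction of tests passed."""
    scores: Dict[str, float] = {}
    for student, source in submissions.items():
        results = grade_submission(source, tests)
        scores[student] = sum(results.values()) / len(tests)
    return scores


# Hypothetical assignment: define mean(xs).
tests: Dict[str, Test] = {
    "mean_defined": lambda ns: callable(ns.get("mean")),
    "mean_correct": lambda ns: ns["mean"]([1, 2, 3]) == 2,
}

# Made-up submissions: one correct, one buggy, one with a syntax error.
submissions = {
    "student_a": "def mean(xs):\n    return sum(xs) / len(xs)\n",
    "student_b": "def mean(xs):\n    return sum(xs)\n",
    "student_c": "def mean(xs) return 0\n",
}

scores = grade_batch(submissions, tests)
for student, score in scores.items():
    print(f"{student}: {score:.0%}")
```

Because each submission is graded independently, the same loop can be split across machines or containers, which is the property that matters when a course has thousands of students.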

  1. Curriculum Development

Instructors at Berkeley have also created beginner-friendly packages for students who have no background in programming or statistics, such as datascience and prob140, which are used in the introductory and probability-focused data science courses, respectively. The accessibility of these packages within these courses has extended the reach of data science to students of all backgrounds.

  1. Miscellaneous Tools

Berkeley DSEP has implemented a variety of other open-source tools that make data science education more accessible. nbforms, a combination of a Python package and a Ruby server, embeds attendance tracking and student surveys within Jupyter Notebooks, so that instructors can take attendance and receive real-time feedback from students. Another tool, nb2pdf, makes grading easier by simplifying PDF generation for notebooks that contain visualizations.

These tools have saved professors hours of infrastructure setup, and their effect has been widespread. As of the submission of this proposal, roughly 35 institutions across the country have adopted some form of a data science curriculum from Berkeley.