CIRCL Primer: Data science education


Vahey, P., Finzer, W., Yarnall, L., & Schank, P. (2017). CIRCL Primer: Data science education. In CIRCL Primer Series.


Data science is an interdisciplinary field that seeks to derive insights and knowledge from the analysis of data sets that are typically very large. Although data science education is relatively new, many undergraduate and graduate degree programs in data science are now available. This primer gives an overview of the early state of data science education in grades K-12.

New technology has made it easier than ever to capture, store, and arrange many forms of data about the world. Low-cost sensors capture and store scientific data from various environments. Optical character recognition (OCR) technology converts volumes of text into data for analysis. Image recognition technology permits rapid search of photographic and graphic databases. Portable audio and video recording devices now capture many types of human interaction in different situations and settings.

Data science is the field that attempts to build knowledge from this newly available, massive store of data. While there is no consensus definition of data science, there is widespread agreement that it goes beyond the application of traditional disciplinary or statistical methods. Drew Conway’s Data Science Venn Diagram places data science at the intersection of content expertise, math and statistics knowledge, and hacking (or computer science) skills. Some characteristics of data science work are:

  • The investigator is “awash in data” (the dataset may at first be too overwhelming for there to be a clear path to analysis)
  • The analysis requires “data moves” that go beyond application of known procedures (for instance, one may have to create a completely new visualization)
  • The data are “unruly”, meaning that a single observation may have many pieces of information
  • The data typically cannot be stored easily in a traditional data table format
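
As a rough illustration of the last two characteristics, the sketch below shows one "unruly" observation and a simple "data move" that reshapes it into table rows. Everything in it is hypothetical: the schoolyard sensor, the field names, the values, and the flatten helper are invented for illustration and do not come from any particular curriculum or tool.

```python
# Hypothetical example: a single observation from an imagined schoolyard
# weather sensor. It holds several readings and free-text notes, so it
# does not fit one row of a traditional data table.
observation = {
    "station": "schoolyard",
    "date": "2017-03-01",
    "readings": [
        {"hour": 9, "temp_c": 11.5},
        {"hour": 12, "temp_c": 15.0},
        {"hour": 15, "temp_c": 14.0},
    ],
    "notes": ["cloudy", "light wind"],
}


def flatten(obs):
    """Reshape one nested observation into one flat row per reading."""
    rows = []
    for reading in obs["readings"]:
        rows.append({
            "station": obs["station"],
            "date": obs["date"],
            "hour": reading["hour"],
            "temp_c": reading["temp_c"],
            # Join the notes so they fit in a single table cell.
            "notes": "; ".join(obs["notes"]),
        })
    return rows


rows = flatten(observation)

# Once the data are in rows, familiar summaries become straightforward.
mean_temp = sum(row["temp_c"] for row in rows) / len(rows)
print(len(rows), "rows; mean temperature:", mean_temp)
```

Flattening is only one possible choice. It repeats the station, date, and notes on every row, which is convenient for a spreadsheet-style tool but may not suit every analysis.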

While traditional statistical tools are central to data science, investigators may also use more exploratory techniques, such as machine learning or visualization, to find patterns in data. Data science education introduces students to these tools, dispositions, and techniques through activities such as:

  • Running experiments and collecting data, typically in science class
  • Conducting exploratory data analysis of one’s own data or of others’ data using different visualization tools
  • Statistics, typically in mathematics class

At its core, data science requires students to engage in cross-disciplinary thinking. While it is unrealistic to expect K-12 students to engage in all aspects of data science, especially in the elementary and middle grades, educators are beginning to understand how to incorporate appropriate tools and techniques at each grade level, and how to create and manage engaging data science classroom activities.
