How AI can improve student data literacy


SRI education and AI experts collaborate to help students thrive in a data-driven world.


Over the last decade, scores in Data Analysis and Statistics on the National Assessment of Educational Progress (NAEP) mathematics exam dropped 10 points for fourth-grade students and 17 points for eighth-grade students. The implication is clear: U.S. students risk falling behind in a world where future opportunities will demand advanced data fluency.

Recognizing the pressing need to strengthen student data literacy in the U.S., SRI education researchers began exploring how AI could give students personalized, accessible opportunities to work with data. SRI’s Safe and Accessible Data Interactions in Education (SADIE) project, an internally funded R&D effort, is building an AI-driven solution to help students overcome barriers to data exploration.

“Initially, we were focused on how we could leverage conversational AI to support students with disabilities,” says Shari Dubos, a principal education researcher at SRI and co-principal investigator for the SADIE effort. “What we realized, eventually, is that all students need better data literacy tools. So we’re developing a set of capabilities that will greatly increase the accessibility of data literacy-focused EdTech products for all students, including those with disabilities, allowing them to engage more deeply with data-intensive material.”

Building a system that works for today’s EdTech

EdTech companies recognize this data literacy gap, Dubos points out, and they’re rapidly rolling out new GenAI-native products to address it.

“Where SRI can play a major role is in the area of safety and reliability,” Dubos emphasizes. “We’re already seeing teacher and parent mistrust toward GenAI-driven platforms. Administrators have concerns about data privacy. And anyone who works alongside students with disabilities knows that, often, fine-tuning accessibility features like screen readers and speech inputs is not the first priority for product developers. So if we could help make AI-driven data literacy tools trustworthy and accessible, that would be a huge win.”

The SADIE team recognized that building yet another standalone learning app would not be the most impactful approach. Instead, SADIE is being constructed as a “middle layer” that can sit between large language models (LLMs) and student-facing EdTech products that make use of them.

This middle layer will allow student-facing applications to ensure robust content moderation, multimodal data accessibility, and data privacy. Content moderation means, among other things, constraining LLMs to safe responses that are relevant to data literacy while redirecting inappropriate or off-task queries. SADIE will promote multimodal data accessibility by enabling LLMs to generate high-quality graphs for students who need to visualize data, along with high-quality alternative text for students who use screen readers. Finally, the system will prevent LLMs from accessing sensitive student data and personally identifiable information.
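To make the “middle layer” idea more concrete, here is a minimal, purely illustrative Python sketch of two of its jobs: stripping personally identifiable information from a student’s query before it reaches an LLM, and redirecting off-task questions. It is not SADIE’s implementation, which is not described here; the regex patterns, the keyword list, and the names `mediate`, `redact_pii`, and `is_on_task` are all hypothetical, and a real system would likely use trained classifiers rather than keyword matching.

```python
import re
from dataclasses import dataclass

# Hypothetical PII patterns (illustration only); a production system
# would need a far more thorough detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "STUDENT_ID": re.compile(r"\bID[-\s]?\d{5,}\b", re.IGNORECASE),
}

# Hypothetical on-task vocabulary; a stand-in for a real relevance classifier.
DATA_TERMS = {"data", "graph", "chart", "mean", "median", "average", "table",
              "plot", "survey", "percent", "trend", "statistics", "sample"}

REDIRECT = "Let's get back to the data. What would you like to find out from it?"


@dataclass
class MediatedRequest:
    allowed: bool  # True if the query may be forwarded to the LLM
    prompt: str    # sanitized query for the LLM, or the redirect for the student


def redact_pii(text: str) -> str:
    """Replace every PII match with a placeholder tag such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def is_on_task(text: str) -> bool:
    """Crude relevance check: does the query mention any data-literacy term?"""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & DATA_TERMS)


def mediate(student_query: str) -> MediatedRequest:
    """Sit between the student-facing app and the LLM: redact, then filter."""
    clean = redact_pii(student_query)
    if not is_on_task(clean):
        return MediatedRequest(allowed=False, prompt=REDIRECT)
    return MediatedRequest(allowed=True, prompt=clean)
```

Redacting before the relevance check means that even an allowed query reaches the LLM with placeholders instead of the student’s details. For example, `mediate("My email is sam@school.org, what is the median of this data?")` forwards "My email is [EMAIL], what is the median of this data?", while `mediate("Tell me a joke")` returns the redirect message instead of calling the model.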

“This solution speaks to what is truly unique about SRI: our ability to synthesize the deep contextual expertise of researchers in SRI’s Education Division with cutting-edge AI frameworks from our technical labs,” observes Jennifer Nakamura, SADIE’s other co-principal investigator.

By integrating and customizing automatic speech recognition (ASR) software created by SRI’s Speech Technology and Research Lab, for example, the SADIE team was able to significantly improve the system’s ability to accurately capture speech from middle school-aged students in noisy classroom environments. Compared to standard ASR systems, this new model maintains competitive performance while enabling faster inference because of its smaller footprint.

Moving from the lab to the classroom

Over the past year, the team co-designed and user-tested SADIE with middle school students and teachers, and completed a prototype that includes several of the technical capabilities that will make up the SADIE middle layer.

This year, the research team will build in retrieval-augmented generation to further improve the accuracy and appropriateness of LLM-generated responses during data literacy instruction, refine the data privacy parameters to match the needs of emerging EdTech products, and demonstrate the system to commercial collaborators eager to ensure that AI innovations are designed to empower all students.

“At SRI, we know that AI can play a massively positive role in education,” concludes Dubos. “The biggest risk is that these systems will be rolled out carelessly, which will erode trust. That’s why we’re so excited about bringing SADIE to forward-thinking education innovators who are adamant about delivering solutions that can be trusted by parents, educators, and especially students.”
