The SRI speech-based collaborative learning corpus

C. Richey, C. D’Angelo, N. Alozie, H. Bratt and E. Shriberg, “The SRI speech-based collaborative learning corpus,” in Proc. INTERSPEECH 2016, pp. 1550-1554, September 2016.


We introduce the SRI speech-based collaborative learning corpus, a novel collection designed for investigating and measuring how students collaborate in small groups. This is a multi-speaker corpus containing high-quality audio recordings of middle school students working in groups of three to solve mathematical problems. Each student was recorded via a head-mounted noise-cancelling microphone, and each group was also recorded via a stereo microphone placed nearby. A total of 80 sessions were collected with the participation of 134 students. The average duration of a session was 20 minutes. All students spoke English; for some students, English was a second language. Sessions have been annotated with time stamps to indicate which mathematical problem the students were solving and which student was speaking. Sessions have also been hand-annotated with common indicators of collaboration for each speaker (e.g., inviting others to contribute, planning) and with the overall collaboration quality for each problem. The corpus will be useful to education researchers interested in collaborative learning and to speech researchers interested in children’s speech, speech analytics, and speech diarization. The corpus, both audio and annotation, will be made available to researchers.
