back icon
close icon

Capture phrases in quotes for more specific queries (e.g. "rocket ship" or "Fred Lynn")

Conference Proceeding  April 1, 2018

Crowdsourcing Emotional Speech

SRI Authors Jennifer Smith, Andreas Tsiartas



J. Smith, A. Tsiartas, V. Wagner, E. Shriberg and N. Bassiou, “Crowdsourcing emotional speech,” 2018 IEEE 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Alberta, Canada, 2018, pp. 5139-5143.


We describe the methodology for the collection and annotation of a large corpus of emotional speech data through crowdsourcing. The corpus offers 187 hours of data from 2,965 subjects. Data includes non-emotional recordings from each subject as well as recordings for five emotions: angry, happy-low-arousal, happy-high-arousal, neutral, and sad. The data consist of spontaneous speech elicited from subjects via a web-based tool. Subjects used their own personal recording equipment, resulting in a data set that contains variation in room acoustics, microphone, etc. This offers the advantage of matching the type of variation one would expect to see when exposing speech technology in the wild in a web-based environment. The annotation scheme covers the quality of emotion expressed through the tone of voice and what was said, along with common audioquality issues. We discuss lessons learned in the process of the creation of this corpus.

How can we help?

Once you hit send…

We’ll match your inquiry to the person who can best help you. Expect a response within 48 hours.

Our Privacy Policy