A. Lawson, L. Ferrer, W. Wang and J. Murray, “Detection of Demographics and Identity in Spontaneous Speech and Writing,” Chapter in Multimedia Data Mining and Analytics: Disruptive Innovation, Baughman, A., Gao, J., Pan, J.-Y., Petrushin, V.A. (Eds.), Springer, 2015.
This chapter focuses on the automatic identification of demographic traits and identity in both speech and writing. We address language use in the virtual world of online games and text entry on mobile devices in the form of chat, email and nicknames, and demonstrate text factors that correlate with demographics, such as age, gender, personality and interaction style. We also present work on speaker identification in spontaneous language use, where we describe the state of the art in verification, feature extraction, modeling and calibration across multiple environmental conditions. Finally, we bring speech and writing together to explore approaches to user authentication that span language in general. We discuss how speech-specific factors such as intonation and writing-specific features such as spelling, punctuation and typing correction correlate with and predict one another as a function of users’ sociolinguistic characteristics.