W. Wang, S. Yemen, K. Precoda, and C. Richey, “Automatic identification of speaker role and agreement/disagreement in broadcast conversation,” in Proc. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), pp. 5556–5559.
We present supervised approaches for detecting speaker roles and agreement/disagreement between speakers in broadcast conversation shows in three languages: English, Arabic, and Mandarin. We develop annotation approaches for a variety of linguistic phenomena. Various lexical, structural, and social network analysis based features are explored, and feature importance is analyzed across the three languages. We also compare the performance when using features extracted from automatically generated annotations against that when using human annotations. The algorithms achieve speaker role labeling accuracy of more than 86% for all three languages. For agreement and disagreement detection, the algorithms achieve precision of 63% to 92% and 55% to 85%, respectively, across the three languages.
Keywords: speaker recognition