S. S. Kajarekar, H. Bratt, E. Shriberg and R. de Leon, “A Study of Intentional Voice Modifications for Evading Automatic Speaker Recognition,” 2006 IEEE Odyssey – The Speaker and Language Recognition Workshop, 2006, pp. 1-6, doi: 10.1109/ODYSSEY.2006.248123.
We investigate the effect of intentional voice modifications on a state-of-the-art speaker recognition system. The investigation includes data collection, where normal and changed voices are collected from subjects conversing by telephone. For comparison purposes, it also includes an evaluation framework similar to that for NIST extended-data speaker recognition. Results show that the state-of-the-art system gives nearly perfect recognition performance in a clean condition using normal voices. Using the threshold from this condition, it falsely rejects 39% of subjects who change their voices during testing. However, this can be improved to 9% if a threshold from the changed-voice testing condition is used. We also compare machine performance with human performance from a pilot listening experiment. Results show that machine performance is comparable to human performance when normal voices are used for both training and testing. However, the machine outperforms humans when changed voices are used for testing. In general, the results show vulnerability in both humans and speaker recognition systems to changed voices, and suggest a potential for collaboration between human analysts and automatic speaker recognition systems to address this phenomenon.
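The dependence of the false rejection rate on where the decision threshold was calibrated can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the scores are invented toy values, not data from the paper, the function names are hypothetical, and the threshold is chosen at the point where false rejection and false acceptance rates are closest, which the abstract does not state to be the authors' method.

```python
def false_rejection_rate(target_scores, threshold):
    """Fraction of genuine-speaker trials that score below the threshold."""
    return sum(s < threshold for s in target_scores) / len(target_scores)


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor trials that score at or above the threshold."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def equal_error_threshold(target_scores, impostor_scores):
    """Pick the observed score at which the two error rates are closest.

    Assumption: an equal-error-style operating point stands in for
    whatever calibration the original system used.
    """
    candidates = sorted(set(target_scores) | set(impostor_scores))
    return min(
        candidates,
        key=lambda t: abs(
            false_rejection_rate(target_scores, t)
            - false_acceptance_rate(impostor_scores, t)
        ),
    )


# Invented toy scores (higher means "more likely the claimed speaker").
normal_target = [2.0, 2.5, 3.0, 3.5, 4.0]     # genuine speaker, normal voice
changed_target = [-1.5, 0.5, 1.5, 2.2, 3.0]   # genuine speaker, changed voice
impostor = [-3.0, -2.0, -1.0, 0.0, 1.0]       # different speakers

# Threshold calibrated on the clean (normal-voice) condition.
clean_threshold = equal_error_threshold(normal_target, impostor)
# Threshold recalibrated on the changed-voice condition.
changed_threshold = equal_error_threshold(changed_target, impostor)

frr_with_clean = false_rejection_rate(changed_target, clean_threshold)
frr_with_changed = false_rejection_rate(changed_target, changed_threshold)
far_with_clean = false_acceptance_rate(impostor, clean_threshold)
far_with_changed = false_acceptance_rate(impostor, changed_threshold)

print("clean threshold:", clean_threshold, "changed threshold:", changed_threshold)
print("FRR on changed voices:", frr_with_clean, "->", frr_with_changed)
print("FAR on impostors:", far_with_clean, "->", far_with_changed)
```

With these toy scores, the threshold calibrated on normal voices rejects more changed-voice trials than the recalibrated one. The recalibrated threshold is lower, so it also admits more impostor trials, which is the general trade-off of moving a single decision threshold.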