Wei, K., Liu, Y., Kirchhoff, K., Bartels, C., & Bilmes, J. (2014, 4-9 May). Submodular subset selection for large-scale speech training data. Paper presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), Florence, Italy.
We address the problem of subselecting a large set of acoustic data to train automatic speech recognition (ASR) systems. To this end, we apply a novel data selection technique based on constrained submodular function maximization. Though NP-hard, the combinatorial optimization problem can be approximately solved by a simple and scalable greedy algorithm with constant-factor guarantees. We evaluate our approach by subselecting data from 1300 hours of conversational English telephone data to train two types large vocabulary speech recognizers, one with Gaussian mixture model (GMM) based acoustic models, and another based on deep neural networks (DNNs). We show that training data can be reduced significantly, and that our technique outperforms both random selection and a previously proposed selection method utilizing comparable resources. Notably, using the submodular selection method, the DNN system using only about 5% of the training data is able to achieve performance on par with the GMM system using 100% of the training data — with the baseline subset selection methods, however, the DNN system is unable to accomplish this correspondence.