August 1, 2011

Analysis and comparison of recent MLP features for LVCSR systems

Citation

F. Valente, M. M. Doss, and W. Wang, “Analysis and comparison of recent MLP features for LVCSR systems,” in Proc. Interspeech, 2011, pp. 1245–1248.

Abstract

MLP-based front-ends have evolved in different ways in recent years beyond the seminal TANDEM-PLP features. This paper aims to provide a fair comparison of these recent advances, including the use of different long/short temporal inputs (PLP, MRASTA, wLP-TRAPS, DCT-TRAPS) and the use of complex architectures (bottleneck, hierarchy, multistream) that go beyond the conventional three-layer MLP. Furthermore, the paper identifies which of these actually provide advantages over the conventional TANDEM-PLP. The investigation is carried out on an LVCSR task for recognition of Mandarin broadcast speech, and results are analyzed in terms of Character Error Rate and phonetic confusions. Results reveal that, as stand-alone features, multistream front-ends can outperform conventional MFCC by 10%, while TANDEM-PLP improves on it by only 1%. On the other hand, when used in concatenation with MFCC features, hierarchical/bottleneck front-ends reduce the character error rate by 18% relative, compared to 14% relative for TANDEM-PLP. The various recently developed long-term input representations provide comparable performance.

Index Terms: TANDEM features, Multilayer Perceptron, Acoustic features, GALE project, LVCSR.
