Ferrer, L., Bratt, H., Richey, C., Franco, H., Abrash, V., & Precoda, K. (2014, May). Lexical stress classification for language learning using spectral and segmental features. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7704-7708). IEEE.
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word. The segmental features are computed on the vowels and include normalized energy, pitch, spectral tilt and duration measurements. The spectral features are computed at the frame level and are modeled by one Gaussian Mixture Model (GMM) for each stress class. These GMMs are used to obtain segmental posteriors, which are then appended to the segmental features to obtain a final set of GMMs. The segmental GMMs are used to obtain posteriors for each stress class. The system was tested on English speech from native English-speaking children and from Japanese-speaking children with variable levels of English proficiency. Our algorithm results in an error rate of approximately 13% on native data and 20% on Japanese non-native data.
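The two-stage modeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for brevity: a single diagonal-covariance Gaussian stands in for each GMM, frame log-likelihoods are averaged over the vowel before Bayes' rule is applied, class priors are uniform, and the toy data are synthetic. All function names (`fit_diag_gaussian`, `augment`, `classify`, `toy_vowel`) are hypothetical.

```python
import math

def fit_diag_gaussian(rows):
    """Fit per-dimension mean and variance (one-component stand-in for a GMM)."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dim)]
    # Floor the variance so near-constant dimensions do not cause division by zero.
    var = [max(sum((r[d] - mean[d]) ** 2 for r in rows) / n, 1e-3)
           for d in range(dim)]
    return mean, var

def log_likelihood(x, model):
    """Log density of x under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def posteriors(log_liks, priors=None):
    """Bayes' rule over classes, computed stably from log-likelihoods."""
    k = len(log_liks)
    priors = priors or [1.0 / k] * k  # assumption: uniform priors
    scores = [ll + math.log(p) for ll, p in zip(log_liks, priors)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def frame_posteriors(frames, frame_models):
    """Stage 1: one posterior per stress class from the vowel's frames.
    Assumption: frame log-likelihoods are averaged over the vowel."""
    n = len(frames)
    return posteriors([sum(log_likelihood(f, m) for f in frames) / n
                       for m in frame_models])

def augment(segment_feats, frames, frame_models):
    """Append the stage-1 posteriors to the vowel-level (segmental) features."""
    return list(segment_feats) + frame_posteriors(frames, frame_models)

def classify(segment_feats, frames, frame_models, segment_models):
    """Stage 2: posteriors over stress classes from the augmented features."""
    x = augment(segment_feats, frames, frame_models)
    post = posteriors([log_likelihood(x, m) for m in segment_models])
    return post.index(max(post)), post

def toy_vowel(c, jitter):
    """Synthetic vowel for class c: (segmental features, list of frames)."""
    seg = [3.0 * c + jitter, -3.0 * c - jitter]
    frames = [[5.0 * c + jitter + 0.1 * t, 5.0 * c - jitter] for t in range(4)]
    return seg, frames

# Synthetic training set with three stress classes, labeled 0, 1, 2.
train = {c: [toy_vowel(c, j) for j in (-0.5, -0.2, 0.0, 0.2, 0.5)]
         for c in range(3)}

# Stage 1: one frame-level model per stress class.
frame_models = [
    fit_diag_gaussian([f for _, frames in train[c] for f in frames])
    for c in range(3)
]

# Stage 2: one segment-level model per class, trained on augmented features.
segment_models = [
    fit_diag_gaussian([augment(seg, frames, frame_models)
                       for seg, frames in train[c]])
    for c in range(3)
]

predictions = [classify(*toy_vowel(c, 0.1), frame_models, segment_models)[0]
               for c in range(3)]
print(predictions)
```

In a fuller version, each single Gaussian would be replaced by a multi-component mixture, and the segmental features would be the normalized energy, pitch, spectral tilt and duration measurements named in the abstract.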