Abstract
An acoustic confidence measure for acceptance/rejection of recognition hypotheses for continuous speech utterances is proposed. This measure is useful for rejecting utterances that are out of domain, or contain out-of-vocabulary words or speech disfluencies. A phone-based approach is implemented so that a single global threshold can be applied to hypothesis rejection for any word sequence. Phone confidence is computed for each frame of speech as the posterior phone probability given the acoustic observation. Word sequence confidence is evaluated as the average phone confidence, either by weighting all frames equally or by normalizing by phone duration. The confidence measure is tested on a database of spoken company names. When normalized by phone duration, it achieves, in some cases with less computational expense, rejection performance comparable to a baseline system implementing a common filler-model approach. […]
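The two averaging schemes described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the per-frame posterior of the hypothesized phone is already available as a plain probability (the abstract does not say whether probabilities or log-probabilities are averaged), and the function names, the `(start, end)` segment format, and the threshold value are all hypothetical.

```python
from typing import Sequence, Tuple


def frame_weighted_confidence(frame_posteriors: Sequence[float]) -> float:
    """Average the per-frame phone posteriors, weighting every frame equally."""
    if not frame_posteriors:
        raise ValueError("need at least one frame")
    return sum(frame_posteriors) / len(frame_posteriors)


def duration_normalized_confidence(
    frame_posteriors: Sequence[float],
    segments: Sequence[Tuple[int, int]],
) -> float:
    """Average within each phone first, then average across phones.

    `segments` holds one (start, end) frame range per phone, end exclusive
    (an assumed format). Each phone contributes equally, however long it is.
    """
    if not segments:
        raise ValueError("need at least one phone segment")
    phone_means = []
    for start, end in segments:
        if not 0 <= start < end <= len(frame_posteriors):
            raise ValueError(f"bad segment ({start}, {end})")
        frames = frame_posteriors[start:end]
        phone_means.append(sum(frames) / len(frames))
    return sum(phone_means) / len(phone_means)


def accept(confidence: float, threshold: float) -> bool:
    """Apply a single global threshold to the word-sequence confidence."""
    return confidence >= threshold


# Toy hypothesis: a long, well-matched phone followed by a short, poor one.
posteriors = [0.9] * 6 + [0.1] * 2
segments = [(0, 6), (6, 8)]
THRESHOLD = 0.6  # hypothetical value, chosen only for this illustration

by_frame = frame_weighted_confidence(posteriors)
by_phone = duration_normalized_confidence(posteriors, segments)
print(f"frame-weighted: {by_frame:.2f}, accepted: {accept(by_frame, THRESHOLD)}")
print(f"duration-normalized: {by_phone:.2f}, accepted: {accept(by_phone, THRESHOLD)}")
```

In this toy case the long phone dominates the frame-weighted average, while duration normalization lets the short, poorly matched phone count as much as the long one, so the two schemes can disagree about the same hypothesis under the same threshold.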