A. O. Hatch and A. Stolcke, “Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition,” 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006, pp. V-V, doi: 10.1109/ICASSP.2006.1661343.
In this paper, we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form k(x_1,x_2) = x_1^T R x_2, where R is a positive semidefinite matrix. Our approach for training k(x_1,x_2) involves first constructing a set of upper bounds on the rates of false positives and false negatives at a given score threshold. Under various conditions, minimizing these bounds leads to the closed-form solution R = W^-1, where W is the expected within-class covariance matrix of the data. We tested various parameterizations of R, including a diagonal parameterization that simply performs per-feature variance normalization, on the 1-conversation training condition of the SRE-2003 and SRE-2004 speaker recognition tasks. In experiments on a state-of-the-art MLLR-SVM speaker recognition system, the parameterization R = W_s^-1, where W_s is a smoothed estimate of W, achieves relative reductions in the minimum decision cost function (DCF) of up to 22% compared with the results obtained when R performs per-feature variance normalization.
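As an illustration of the kernel form k(x_1,x_2) = x_1^T R x_2 with R = W^-1, the following is a minimal NumPy sketch, not the authors' implementation. The function names, the synthetic data, and the choice to estimate W as the pooled within-class scatter divided by the number of samples are all assumptions made for this example. The `smoothed` helper interpolates W with the identity matrix purely as a hypothetical stand-in for a smoothed estimate W_s; the abstract does not specify the smoothing used. Likewise, the diagonal variant here normalizes by within-class variances, which is one possible reading of per-feature variance normalization.

```python
import numpy as np

def within_class_covariance(X, labels):
    """Pooled within-class covariance: per-class scatter summed, divided by N.

    Assumption: this equals the class-prior-weighted average of the
    (biased) per-class covariance matrices.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    W = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        centered = Xc - Xc.mean(axis=0)
        W += centered.T @ centered
    return W / n

def smoothed(W, alpha=0.1):
    """Hypothetical smoothing: interpolate W with the identity matrix."""
    return (1.0 - alpha) * W + alpha * np.eye(W.shape[0])

def generalized_linear_kernel(X1, X2, R):
    """Gram matrix with entries k(x_i, x_j) = x_i^T R x_j."""
    return X1 @ R @ X2.T

# Synthetic data: 5 classes, 20 samples each, 3 features on different scales.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 20)
X = rng.normal(size=(100, 3)) * np.array([1.0, 5.0, 0.2]) + labels[:, None]

W = within_class_covariance(X, labels)
R_full = np.linalg.inv(W)                 # R = W^-1
R_diag = np.diag(1.0 / np.diag(W))        # diagonal R (variance normalization)
R_smooth = np.linalg.inv(smoothed(W))     # R = W_s^-1 with hypothetical smoothing

K = generalized_linear_kernel(X, X, R_full)

# Equivalent view: factor R = L L^T and use an ordinary linear kernel on X @ L.
L = np.linalg.cholesky(R_full)
Z = X @ L
```

The Gram matrix `K` could be passed to any SVM implementation that accepts precomputed kernels. Under these assumptions, the transformed features `Z` have identity within-class covariance, which gives one way to see what R = W^-1 does: it whitens the within-class variability before a plain linear kernel is applied.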