Dept. of Computer Science and Engineering
Oregon Graduate Institute of Science & Technology
After decades of research in speech recognition, the technology is finally entering the commercial market. A significant challenge is to downsize research-laboratory recognizers so that they can run on platforms with less computational power: most contemporary laboratory recognizers require too much memory and are too slow for mass applications. This thesis addresses the problem by greatly reducing the number of parameters in the acoustic models. We focus on more compact acoustic models because they constitute a major component of any speech recognizer, and the computation of their likelihoods consumes 50-70% of total recognition time on many typical tasks. The main contribution of this thesis is the formulation of a new acoustic modeling method which we call subspace distribution clustering hidden Markov modeling (SDCHMM). The theory of SDCHMM is based on tying continuous density hidden Markov models (CDHMMs) at a new, finer sub-phonetic unit, namely the subspace distribution. Two methods are presented to implement SDCHMMs. The first implementation requires training a set of intermediate CDHMMs, followed by a model conversion in which the distributions of the CDHMMs are projected onto orthogonal subspaces and similar subspace distributions are then tied across all states and all acoustic models in each subspace. By exploiting the combinatorial effect of subspace distribution encoding, all original full-space distributions can be represented by combinations of a small number of subspace distribution prototypes. Consequently, the number of model parameters is greatly reduced, yielding substantial savings in memory and computation. Furthermore, we demonstrate in the second implementation method that, given prior knowledge of the tying structure of the subspace distributions, SDCHMMs can be trained directly from much less data.
This renders SDCHMMs very attractive for the practical implementation of acoustic models, speaker-specific training, and speaker/environment adaptation. Evaluation on the ATIS (Airline Travel Information System) task shows that, in comparison with a CDHMM system, an SDCHMM system achieves a 7- to 18-fold reduction in the memory required for acoustic models, runs 30-60% faster, and can be trained with 10-20 times less data, all without any loss of recognition accuracy.
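The tying step described above can be sketched in code. The following is a minimal illustration only, not the thesis's actual algorithm: it assumes diagonal-covariance Gaussians, one-dimensional subspaces (one per feature dimension), and plain Euclidean k-means on (mean, variance) pairs as a stand-in for a proper distribution-divergence clustering criterion.

```python
import numpy as np

def cluster_subspace_distributions(means, variances, n_prototypes, n_iter=20, seed=0):
    """Tie 1-D subspace Gaussians across all states of all models.

    means, variances: arrays of shape (n_distributions, n_dims) holding the
    diagonal Gaussians of an intermediate CDHMM system.
    Returns, per dimension, a small codebook of (mean, var) prototypes and
    an index array mapping each original distribution to a prototype.
    """
    rng = np.random.default_rng(seed)
    n_dist, n_dims = means.shape
    codebooks, indices = [], []
    for d in range(n_dims):  # one subspace per feature dimension
        pts = np.stack([means[:, d], variances[:, d]], axis=1)
        # Plain k-means on (mean, variance) pairs -- a simple stand-in for
        # a likelihood- or divergence-based Gaussian clustering criterion.
        centers = pts[rng.choice(n_dist, n_prototypes, replace=False)]
        for _ in range(n_iter):
            dists = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(axis=1)
            for k in range(n_prototypes):
                if (assign == k).any():
                    centers[k] = pts[assign == k].mean(axis=0)
        codebooks.append(centers)
        indices.append(assign)
    return codebooks, np.stack(indices, axis=1)
```

After this step, each original full-space Gaussian is stored as one small prototype index per subspace rather than as full mean and variance vectors; with, say, a few dozen dimensions and a codebook of 64 prototypes per subspace, this combinatorial encoding is the source of the parameter reduction the abstract describes.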
Mak, Brian Kan-Wing, "Towards a compact speech recognizer: subspace distribution clustering hidden Markov model" (1998). Scholar Archive. 110.