Dynamic Time Warping's New Youth
This video was recorded at the IEEE International Conference on Multimedia & Expo (ICME), Melbourne 2012.

Before Hidden Markov Models (HMMs) became ubiquitous in speech-related applications, pattern matching algorithms such as the well-known Dynamic Time Warping (DTW) algorithm [1] were used extensively for applications such as spoken keyword recognition [2]. At the time, the main drawbacks of this technology were its computational cost (given the machinery then available) and its lack of generalization when matching acoustic sequences from different speakers or different acoustic contexts. The availability of labeled datasets for training pushed pattern matching techniques aside in favor of HMMs.

Still, HMMs have several well-known weaknesses that limit their suitability for some speech applications: overgeneralization from the training data, lack of robustness to changing noise conditions, and the need for large corpora of well-labeled training data. For this reason, some research groups have recently started to look again at DTW as a plausible alternative and have worked on mitigating the issues that made it unsuitable in the past. On the one hand, new acoustic features are being researched [3] to make the matching as independent of the speaker as possible while preserving the content. On the other hand, although computing power has improved greatly since the 1970s, several enhancements to DTW have been proposed [4,5] to handle more challenging tasks than in the past.

Some of the tasks where pattern-matching approaches (and DTW in particular) are currently applied are: automatic discovery of repeated patterns in speech, query-by-example voice search, pattern-based speech recognition, and analysis of low-resource languages.

References:
[1] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 43–49, 1978.
[2] A. L. Higgins and R. E. Wohlford, "Keyword recognition using template concatenation," in Proc. ICASSP, 1985.
[3] G. Aradilla, "Using Posterior-Based Features in Template Matching for Speech Recognition," in ICSLP, 2006.
[4] X. Anguera, R. Macrae, and N. Oliver, "Partial Sequence Matching using an Unbounded Dynamic Time Warping Algorithm," in ICASSP, 2010.
[5] A. Jansen and B. Van Durme, "Efficient Spoken Term Discovery Using Randomized Algorithms," in ASRU, 2011.
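
For readers unfamiliar with the DTW recurrence referred to in the abstract, the following is a minimal textbook-style sketch in Python. It is not the talk's implementation, nor the exact step pattern of [1]: the function name `dtw_distance`, the scalar inputs, and the absolute-difference local cost are assumptions made for illustration. Filling the full cost table takes time proportional to the product of the two sequence lengths, which is the computational cost the abstract mentions.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of numbers.

    Illustrative assumptions: scalar frames, absolute difference as the
    local cost, and the plain symmetric step pattern (no slope weights
    or global path constraints).
    """
    n, m = len(a), len(b)
    # D[i][j] holds the cheapest cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of the three allowed predecessor cells.
            D[i][j] = local + min(D[i - 1][j],      # a advances, b repeats
                                  D[i][j - 1],      # b advances, a repeats
                                  D[i - 1][j - 1])  # both advance
    return D[n][m]


if __name__ == "__main__":
    # The second sequence is a time-stretched copy of the first.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

In speech applications each frame would typically be a feature vector rather than a scalar, with a vector distance as the local cost; the recurrence itself stays the same.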