Material Detail

Dynamic Time Warping's New Youth

This video was recorded at the IEEE International Conference on Multimedia & Expo (ICME), Melbourne 2012.

Before Hidden Markov Models (HMMs) became ubiquitous in speech-related applications, pattern matching algorithms such as the well-known Dynamic Time Warping (DTW) algorithm [1] were used extensively for tasks such as spoken keyword recognition [2]. At the time, the main drawbacks of this technology were its computational cost (given the machinery then available) and its lack of generalization when matching acoustic sequences from different speakers or different acoustic contexts. The availability of labeled datasets for training pushed pattern matching techniques aside in favor of HMMs.

Still, HMMs have several well-known weaknesses, such as overgeneralization given the training data, lack of robustness to changing noise conditions, and the need for large corpora of well-labeled training data, which limit their suitability for some speech applications. For this reason, some research groups have recently started to look again at DTW as a plausible alternative and have worked on smoothing out the issues that made it unsuitable in the past. On the one hand, new acoustic features are being researched [3] to make the matching as independent of the speaker as possible while preserving the content. On the other hand, although computing power has improved greatly since the 1970s, several DTW enhancements have been proposed [4,5] to allow for more challenging tasks than in the past.

Some of the tasks where pattern-matching approaches (and DTW in particular) are currently applied are: automatic discovery of repeated patterns in speech, query-by-example voice search, pattern-based speech recognition, and analysis of low-resource languages.

References:
[1] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 43–49, 1978.
[2] Alan L. Higgins and Robert E. Wohlford, "Keyword recognition using template concatenation," in Proc. ICASSP, 1985.
[3] G. Aradilla, "Using Posterior-Based Features in Template Matching for Speech Recognition," in ICSLP, 2006.
[4] X. Anguera, R. Macrae, and N. Oliver, "Partial Sequence Matching using an Unbounded Dynamic Time Warping Algorithm," ICASSP, 2010.
[5] A. Jansen and B. V. Durme, "Efficient Spoken Term Discovery Using Randomized Algorithms," in ASRU, 2011.

Keywords:: videolectures, ocwc, oec

Disciplines:

Science and Technology / Computer Science


More about this material

Material Type:: Presentation
Date Added to MERLOT:: February 8, 2015
Date Modified in MERLOT:: February 8, 2015
Author:: Xavier Anguera Miro, Telefonica R&D, Barcelona
Submitter:: The Open Education Consortium
Primary Audience:: College General Ed, College Lower Division, College Upper Division
Technical Format:: Video

Mobile Compatibility:: Not specified at this time
Language:: English
Cost Involved:: No
Source Code Available:: No
Creative Commons:: This work is licensed under an Attribution-NonCommercial-NoDerivs 3.0 United States license