Discriminative Acoustic Features for Deployable Speech Recognition

Arlo Faria

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2016-199
December 13, 2016

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-199.pdf

This work explores discriminative acoustic features: audio signal representations produced by multilayer perceptrons, such as the Tandem and bottleneck features used in state-of-the-art automatic speech recognition systems. Experimental results highlight the factors that influence performance in terms of accuracy and speed, and novel approaches are introduced to improve both. The overall emphasis is on discovering techniques that are suitable for practical deployment and remain effective beyond a traditional research setting. Applications include real-time, low-latency audio stream processing on mobile devices, as well as systems that are built with low-quality training data and must cope with the diverse complications of "real-world" use cases.

Advisor: Nelson Morgan


BibTeX citation:

@phdthesis{Faria:EECS-2016-199,
    Author = {Faria, Arlo},
    Title = {Discriminative Acoustic Features for Deployable Speech Recognition},
    School = {EECS Department, University of California, Berkeley},
    Year = {2016},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-199.html},
    Number = {UCB/EECS-2016-199},
    Abstract = {This work explores discriminative acoustic features: audio signal representations produced by multilayer perceptrons, such as the Tandem and bottleneck features used in state-of-the-art automatic speech recognition systems. Experimental results highlight the factors that influence performance in terms of accuracy and speed, and novel approaches are introduced to improve both. The overall emphasis is on discovering techniques that are suitable for practical deployment and remain effective beyond a traditional research setting. Applications include real-time, low-latency audio stream processing on mobile devices, as well as systems that are built with low-quality training data and must cope with the diverse complications of "real-world" use cases.}
}

EndNote citation:

%0 Thesis
%A Faria, Arlo
%T Discriminative Acoustic Features for Deployable Speech Recognition
%I EECS Department, University of California, Berkeley
%D 2016
%8 December 13
%@ UCB/EECS-2016-199
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-199.html
%F Faria:EECS-2016-199