Discriminative Acoustic Features for Deployable Speech Recognition

Arlo Faria

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2016-199

December 13, 2016

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-199.pdf

This work explores discriminative acoustic features: audio signal representations produced by multilayer perceptrons, such as Tandem and bottleneck features used in state-of-the-art automatic speech recognition systems. Experimental results highlight the factors that influence performance in terms of accuracy and speed; novel approaches are introduced to provide improvement in both regards. The overall emphasis is on discovering techniques that are suitable for practical deployment, translating effectiveness beyond a traditional research setting. Applications include real-time low-latency audio stream processing on mobile devices – as well as systems that are built with low-quality training data and must encounter the diverse complications of "real-world" use cases.

Advisor: Nelson Morgan

BibTeX citation:

@phdthesis{Faria:EECS-2016-199,
    Author= {Faria, Arlo},
    Title= {Discriminative Acoustic Features for Deployable Speech Recognition},
    School= {EECS Department, University of California, Berkeley},
    Year= {2016},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-199.html},
    Number= {UCB/EECS-2016-199},
    Abstract= {This work explores discriminative acoustic features: audio signal representations produced by multilayer perceptrons, such as Tandem and bottleneck features used in state-of-the-art automatic speech recognition systems. Experimental results highlight the factors that influence performance in terms of accuracy and speed; novel approaches are introduced to provide improvement in both regards. The overall emphasis is on discovering techniques that are suitable for practical deployment, translating effectiveness beyond a traditional research setting. Applications include real-time low-latency audio stream processing on mobile devices – as well as systems that are built with low-quality training data and must encounter the diverse complications of "real-world" use cases.},
}

EndNote citation:

%0 Thesis
%A Faria, Arlo
%T Discriminative Acoustic Features for Deployable Speech Recognition
%I EECS Department, University of California, Berkeley
%D 2016
%8 December 13
%@ UCB/EECS-2016-199
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-199.html
%F Faria:EECS-2016-199