Ensemble Feature Selection for Multi-Stream Automatic Speech Recognition

David Gelbart

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2008-160

December 15, 2008

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-160.pdf

Multi-stream automatic speech recognition (ASR) systems consisting of an ensemble of classifiers working together, each with its own feature vector, are popular in the research literature. Published work on feature selection for such systems has dealt with indivisible blocks of features. I break from this tradition by investigating feature selection at the level of individual features. I use the OGI ISOLET and Numbers speech corpora, including noisy versions I created using a variety of noises and signal-to-noise ratios. I have made these noisy versions available for use by other researchers, along with my ASR and feature selection scripts.
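For readers new to the setup, here is a minimal sketch of a multi-stream ensemble in the sense used above: each member classifier is trained on its own feature vector (a subset of a shared feature pool), and the members' posterior estimates are combined by averaging. The toy data, the particular feature-index splits, and the use of scikit-learn logistic regression are illustrative assumptions, not the systems used in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for acoustic feature frames: 200 frames from a 20-dim feature pool.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 3, size=200)  # three pretend phone classes

# Each stream is a classifier with its own feature vector (a subset of the pool).
streams = [
    (np.arange(0, 7), LogisticRegression(max_iter=1000)),
    (np.arange(7, 14), LogisticRegression(max_iter=1000)),
    (np.arange(14, 20), LogisticRegression(max_iter=1000)),
]
for idx, clf in streams:
    clf.fit(X[:, idx], y)

def ensemble_posteriors(frames):
    """Combine the streams by averaging their class posterior estimates."""
    return np.mean([clf.predict_proba(frames[:, idx]) for idx, clf in streams],
                   axis=0)

print(ensemble_posteriors(X[:5]).argmax(axis=1))  # ensemble class decisions
```

Feature selection for such a system amounts to choosing the index sets above; the thesis explores doing so one feature at a time rather than in fixed blocks.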

I start with the random subspace method of ensemble feature selection, in which each feature vector is simply chosen randomly from the feature pool. Using ISOLET, I obtain performance improvements over baseline in almost every case where there is a statistically significant performance difference, but there are many cases with no such difference.
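As a hedged sketch of the random subspace method just described, where each stream's feature vector is simply drawn at random from the pool (the function name and parameter values here are my own, chosen for illustration):

```python
import numpy as np

def random_subspaces(pool_size, subset_size, n_streams, seed=0):
    """Draw each stream's feature vector at random (without replacement)
    from the feature pool, as in the random subspace method."""
    rng = np.random.default_rng(seed)
    return [rng.choice(pool_size, size=subset_size, replace=False)
            for _ in range(n_streams)]

# e.g. four streams, each using 10 features from a 39-dimensional pool
for subset in random_subspaces(pool_size=39, subset_size=10, n_streams=4):
    print(sorted(subset))
```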

I then try hill-climbing, a wrapper approach that changes a single feature at a time when the change improves a performance score. With ISOLET, hill-climbing gives performance improvements in most cases for noisy data, but no improvement for clean data. I then move to Numbers, for which much more data is available to guide hill-climbing. When using either the clean or noisy Numbers data, hill-climbing gives performance improvements over multi-stream baselines in almost all cases, although it does not improve over the best single-stream baseline. For noisy data, these performance improvements are present even for noise types that were not seen during the hill-climbing process. In mismatched-condition tests, where the training and test data differ between clean and noisy, hill-climbing outperforms all baselines when Opitz's scoring formula is used. I find that this scoring formula, which blends single-classifier accuracy and ensemble diversity, works better for me than ensemble accuracy as a performance score for guiding hill-climbing.
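The sketch below illustrates the single-feature hill-climbing loop described above, guided by an Opitz-style score that blends mean single-classifier accuracy with ensemble diversity. The function names, the additive form with weight lam, and the toy score in the usage example are assumptions for illustration; they are not the thesis scripts or Opitz's exact formulation.

```python
import numpy as np

def opitz_style_score(member_accuracies, diversity, lam=1.0):
    """An Opitz-style blend of single-classifier accuracy and ensemble
    diversity (the additive form and the weight lam are assumptions)."""
    return np.mean(member_accuracies) + lam * diversity

def hill_climb(mask, score_fn, max_passes=5):
    """Change one feature at a time, keeping each change that improves
    the performance score; stop at a local optimum.

    mask     -- boolean vector over the feature pool for one stream
    score_fn -- maps a mask to a score (e.g. held-out ensemble accuracy
                or an Opitz-style score); higher is better
    """
    best = score_fn(mask)
    for _ in range(max_passes):
        improved = False
        for f in range(mask.size):
            trial = mask.copy()
            trial[f] = ~trial[f]           # add or drop this one feature
            s = score_fn(trial)
            if s > best:                   # keep the change only if it helps
                mask, best, improved = trial, s, True
        if not improved:
            break
    return mask, best

# Toy usage: the score rewards agreement with a hidden "good" subset.
target = np.random.default_rng(1).random(20) < 0.5
mask, score = hill_climb(np.zeros(20, dtype=bool),
                         lambda m: -np.sum(m ^ target))
print(score)  # reaches 0 once the hidden subset is recovered
```

In practice, score_fn would retrain or rescore the ensemble on held-out data at each step, which is what makes this a wrapper method rather than a filter method.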

Advisor: Nelson Morgan


BibTeX citation:

@phdthesis{Gelbart:EECS-2008-160,
    Author= {Gelbart, David},
    Title= {Ensemble Feature Selection for Multi-Stream Automatic Speech Recognition},
    School= {EECS Department, University of California, Berkeley},
    Year= {2008},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-160.html},
    Number= {UCB/EECS-2008-160},
Abstract= {Multi-stream automatic speech recognition (ASR) systems consisting of an ensemble of classifiers working together, each with its own feature vector, are popular in the research literature. Published work on feature selection for such systems has dealt with indivisible blocks of features. I break from this tradition by investigating feature selection at the level of individual features. I use the OGI ISOLET and Numbers speech corpora, including noisy versions I created using a variety of noises and signal-to-noise ratios. I have made these noisy versions available for use by other researchers, along with my ASR and feature selection scripts.

I start with the random subspace method of ensemble feature selection, in which each feature vector is simply chosen randomly from the feature pool. Using ISOLET, I obtain performance improvements over baseline in almost every case where there is a statistically significant performance difference, but there are many cases with no such difference.  

I then try hill-climbing, a wrapper approach that changes a single feature at a time when the change improves a performance score. With ISOLET, hill-climbing gives performance improvements in most cases for noisy data, but no improvement for clean data. I then move to Numbers, for which much more data is available to guide hill-climbing. When using either the clean or noisy Numbers data, hill-climbing gives performance improvements over multi-stream baselines in almost all cases, although it does not improve over the best single-stream baseline. For noisy data, these performance improvements are present even for noise types that were not seen during the hill-climbing process. In mismatched-condition tests, where the training and test data differ between clean and noisy, hill-climbing outperforms all baselines when Opitz's scoring formula is used. I find that this scoring formula, which blends single-classifier accuracy and ensemble diversity, works better for me than ensemble accuracy as a performance score for guiding hill-climbing.},
}

EndNote citation:

%0 Thesis
%A Gelbart, David 
%T Ensemble Feature Selection for Multi-Stream Automatic Speech Recognition
%I EECS Department, University of California, Berkeley
%D 2008
%8 December 15
%@ UCB/EECS-2008-160
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-160.html
%F Gelbart:EECS-2008-160