Timothy Tsai

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2016-185

December 1, 2016

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-185.pdf

We rely heavily on search engines like Google to navigate millions of webpages, but much of the content of interest is multimedia, not text. One important class of multimedia data is audio. How can we search a database of audio? One of the main challenges in audio search and retrieval is determining a mapping from a continuous time-series signal to a sequence of discrete symbols suitable for reverse indexing and efficient pairwise comparison. This thesis introduces a method for learning this mapping in an unsupervised, highly adaptive way, yielding a representation we call audio hashprints. We discuss the theoretical underpinnings that determine how useful a particular representation is in a retrieval context, and we show that hashprints are well suited to tasks requiring high adaptivity. We evaluate hashprints on two different audio search tasks: synchronizing consumer recordings of the same live event through audio correspondences, and identifying a song at a live concert. On both tasks, audio hashprints achieve state-of-the-art performance.
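To make the core idea concrete, here is a minimal sketch of mapping a continuous audio signal to a sequence of discrete hash codes. This is not the learned, adaptive method of the thesis; it substitutes random spectral projections for the learned filters purely to illustrate how frame-level features can be binarized into integer symbols suitable for an inverted index. All function and parameter names are invented for this sketch.

```python
import numpy as np

def binary_fingerprints(signal, frame=1024, hop=512, n_bits=16, seed=0):
    """Map an audio signal to one n_bit integer hash code per frame.

    Illustrative only: random projections stand in for the learned
    spectro-temporal filters described in the thesis.
    """
    rng = np.random.default_rng(seed)
    # Frame the signal and compute a log-magnitude spectrum per frame.
    n_frames = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    frames = np.stack([signal[i * hop : i * hop + frame] * window
                       for i in range(n_frames)])
    spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
    # Project each frame onto n_bits random directions.
    proj = rng.standard_normal((spec.shape[1], n_bits))
    feats = spec @ proj
    # Threshold each projection at its mean to obtain one bit per direction.
    bits = feats > feats.mean(axis=0)
    # Pack the bits into integers: one discrete symbol per frame.
    return bits.astype(np.int64) @ (1 << np.arange(n_bits))

# Each returned integer can serve as a key in an inverted index,
# enabling efficient lookup and pairwise comparison of audio segments.
codes = binary_fingerprints(np.random.default_rng(1).standard_normal(22050))
```

With one code per frame, two recordings of the same event can be aligned by matching runs of identical codes, which is the kind of efficient pairwise comparison the abstract refers to.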

Advisor: Nelson Morgan


BibTeX citation:

@phdthesis{Tsai:EECS-2016-185,
    Author= {Tsai, Timothy},
    Title= {Audio Hashprints: Theory \& Application},
    School= {EECS Department, University of California, Berkeley},
    Year= {2016},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-185.html},
    Number= {UCB/EECS-2016-185},
    Abstract= {We rely heavily on search engines like Google to navigate millions of webpages, but much of the content of interest is multimedia, not text. One important class of multimedia data is audio. How can we search a database of audio? One of the main challenges in audio search and retrieval is determining a mapping from a continuous time-series signal to a sequence of discrete symbols suitable for reverse indexing and efficient pairwise comparison. This thesis introduces a method for learning this mapping in an unsupervised, highly adaptive way, yielding a representation we call audio hashprints. We discuss the theoretical underpinnings that determine how useful a particular representation is in a retrieval context, and we show that hashprints are well suited to tasks requiring high adaptivity. We evaluate hashprints on two different audio search tasks: synchronizing consumer recordings of the same live event through audio correspondences, and identifying a song at a live concert. On both tasks, audio hashprints achieve state-of-the-art performance.},
}

EndNote citation:

%0 Thesis
%A Tsai, Timothy 
%T Audio Hashprints: Theory & Application
%I EECS Department, University of California, Berkeley
%D 2016
%8 December 1
%@ UCB/EECS-2016-185
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-185.html
%F Tsai:EECS-2016-185