Audio Hashprints: Theory & Application
Timothy Tsai
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2016-185
December 1, 2016
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-185.pdf
We rely heavily on search engines like Google to navigate millions of webpages, but much of the content of interest is multimedia rather than text. One important class of multimedia data is audio. How can we search a database of audio? One of the main challenges in audio search and retrieval is to determine a mapping from a continuous time-series signal to a sequence of discrete symbols suitable for reverse indexing and efficient pairwise comparison. This thesis introduces a method for learning this mapping in an unsupervised, highly adaptive way, resulting in a representation we call audio hashprints. We discuss the theoretical underpinnings that determine how useful a particular representation is in a retrieval context, and we show that hashprints are well suited to tasks requiring high adaptivity. We investigate the performance of the proposed hashprints on two different audio search tasks: synchronizing consumer recordings of the same live event using audio correspondences, and identifying a song at a live concert. Using audio hashprints, we demonstrate state-of-the-art performance on both tasks.
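To make the mapping concrete, the following is a minimal, hypothetical Python sketch of a hashprint-style pipeline, not the thesis's exact algorithm: spectro-temporal filters are learned without labels (plain PCA stands in here for the adaptively learned filters), each frame's filter responses are thresholded to bits, the bits are packed into 64-bit integers for indexing, and packed prints are compared by Hamming distance. The context length and bit count below are illustrative assumptions.

import numpy as np

N_BITS = 64    # bits per hashprint; fits one uint64 (assumed setting)
CONTEXT = 16   # spectrogram frames stacked per feature vector (assumed setting)

def stack_context(spec):
    # spec: (n_bins, n_frames) log-magnitude spectrogram.
    # Returns (n_frames - CONTEXT + 1, n_bins * CONTEXT) stacked feature vectors.
    n_bins, n_frames = spec.shape
    return np.array([spec[:, t:t + CONTEXT].ravel()
                     for t in range(n_frames - CONTEXT + 1)])

def learn_filters(features):
    # Unsupervised filter learning: top principal components of the stacked
    # features (a simple stand-in for the learned spectro-temporal filters).
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:N_BITS]
    return eigvecs[:, order]                    # shape: (feature_dim, N_BITS)

def compute_hashprints(features, filters):
    # Project each frame onto the filters, then threshold the frame-to-frame
    # change of each projection at zero to get one bit per filter.
    proj = features @ filters                   # (n_frames, N_BITS)
    bits = (np.diff(proj, axis=0) > 0).astype(np.uint64)
    weights = np.uint64(1) << np.arange(N_BITS, dtype=np.uint64)
    return bits @ weights                       # one packed uint64 per frame

def hamming(a, b):
    # Bit-level Hamming distance between two packed hashprints.
    return bin(int(a) ^ int(b)).count("1")

On two degraded recordings of the same audio, matching frames should yield small Hamming distances, so the packed prints can serve both as keys for reverse indexing and for fast pairwise comparison (an XOR followed by a population count).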
Advisor: Nelson Morgan
BibTeX citation:
@phdthesis{Tsai:EECS-2016-185,
    Author = {Tsai, Timothy},
    Title = {Audio Hashprints: Theory \& Application},
    School = {EECS Department, University of California, Berkeley},
    Year = {2016},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-185.html},
    Number = {UCB/EECS-2016-185}
}
EndNote citation:
%0 Thesis
%A Tsai, Timothy
%T Audio Hashprints: Theory & Application
%I EECS Department, University of California, Berkeley
%D 2016
%8 December 1
%@ UCB/EECS-2016-185
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-185.html
%F Tsai:EECS-2016-185