Extremely Lightweight Vocoders for On-device Speech Synthesis
Tianren Gao
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-69
May 13, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-69.pdf
As edge-device applications increasingly interact with users through speech, efficient automatic speech synthesis is becoming more important. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into raw audio waveforms. Most existing vocoders are difficult to parallelize because each generated sample is conditioned on previous samples. Flow-based feed-forward models such as WaveGlow are an alternative to these auto-regressive models. However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This work presents SqueezeWave, an extremely lightweight vocoder that can generate audio of similar quality to WaveGlow with 61x to 214x fewer multiply-accumulate operations (MACs).
Advisors: Kurt Keutzer and Joseph Gonzalez
BibTeX citation:
@mastersthesis{Gao:EECS-2021-69,
    Author = {Gao, Tianren},
    Title = {Extremely Lightweight Vocoders for On-device Speech Synthesis},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-69.html},
    Number = {UCB/EECS-2021-69},
    Abstract = {As edge-device applications increasingly interact with users through speech, efficient automatic speech synthesis is becoming more important. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into raw audio waveforms. Most existing vocoders are difficult to parallelize because each generated sample is conditioned on previous samples. Flow-based feed-forward models such as WaveGlow are an alternative to these auto-regressive models. However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This work presents SqueezeWave, an extremely lightweight vocoder that can generate audio of similar quality to WaveGlow with 61x to 214x fewer MACs.},
}
EndNote citation:
%0 Thesis
%A Gao, Tianren
%T Extremely Lightweight Vocoders for On-device Speech Synthesis
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 13
%@ UCB/EECS-2021-69
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-69.html
%F Gao:EECS-2021-69