Roshan Nagaram

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-23

April 28, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-23.pdf

This paper investigates the application of Reinforcement Learning (RL), particularly Reinforcement Learning with Artificial Intelligence Feedback (RLAIF), to enhance emotional expression in text-to-speech (TTS) models. RLAIF is derived from Reinforcement Learning with Human Feedback (RLHF). Both techniques leverage feedback from humans or other AI models to train and fine-tune AI models, and they are especially powerful when the training objectives are complex, abstract, and difficult to define mathematically. As a result, RLAIF has become crucial in aligning Large Language Models with human values. In this paper, we apply these same capabilities to the text-to-speech domain. Just as RLAIF can align Large Language Models with complex objectives such as mitigating toxicity, this paper uses RLAIF to align text-to-speech models toward expressing target emotions. By exploring different base model architectures together with specific RL techniques such as Reward-Weighted Regression (RWR) and Proximal Policy Optimization (PPO), we train text-to-speech models to better express emotion using feedback from an emotion predictor model. Ultimately, this research seeks to demonstrate, through quantitative and qualitative data, the effectiveness of RLAIF as a method for tuning TTS models to achieve more nuanced emotional expression.
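To make the training setup described in the abstract concrete, the following is a minimal, hypothetical sketch of a Reward-Weighted Regression (RWR) loop in which an emotion predictor model supplies the AI feedback signal. The `TinyTTS` and `TinyEmotionPredictor` classes, the target emotion index, and all hyperparameters are illustrative placeholders, not the architectures or settings used in the report.

```python
# Minimal RWR sketch: reinforce TTS outputs that an emotion predictor scores highly.
# All models and constants below are placeholders for illustration only.
import torch
import torch.nn as nn

class TinyTTS(nn.Module):
    """Stand-in TTS model: maps token ids to a mel-spectrogram-like tensor."""
    def __init__(self, vocab=100, mel_dim=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, 128)
        self.proj = nn.Linear(128, mel_dim)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))          # (batch, time, mel_dim)

class TinyEmotionPredictor(nn.Module):
    """Stand-in emotion classifier providing the AI feedback (reward) signal."""
    def __init__(self, mel_dim=80, n_emotions=4):
        super().__init__()
        self.head = nn.Linear(mel_dim, n_emotions)

    def forward(self, mel):
        return self.head(mel.mean(dim=1))             # (batch, n_emotions) logits

tts, critic = TinyTTS(), TinyEmotionPredictor()
optimizer = torch.optim.Adam(tts.parameters(), lr=1e-4)
target_emotion = 2                                    # placeholder index, e.g. "happy"
beta = 1.0                                            # RWR temperature

tokens = torch.randint(0, 100, (8, 32))               # dummy text batch
for _ in range(10):
    mel_mean = tts(tokens)
    mel_sample = mel_mean + 0.1 * torch.randn_like(mel_mean)   # stochastic exploration
    with torch.no_grad():
        # Reward = predicted probability of the target emotion (no human labels needed).
        reward = critic(mel_sample).softmax(dim=-1)[:, target_emotion]
        weights = torch.exp(beta * (reward - reward.mean()))    # exponentiated advantage
    # RWR: regress the model toward its own samples, weighted by reward,
    # so samples the emotion predictor rates as more expressive are reinforced.
    per_example_mse = ((mel_mean - mel_sample.detach()) ** 2).mean(dim=(1, 2))
    loss = (weights * per_example_mse).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A PPO variant of the same idea would replace the weighted regression step with a clipped policy-gradient update against the same emotion-predictor reward; the report explores both families of methods.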

Advisor: Trevor Darrell


BibTeX citation:

@mastersthesis{Nagaram:EECS-2024-23,
    Author= {Nagaram, Roshan},
    Title= {Enhancing Emotional Expression in Text-to-Speech Models through Reinforcement Learning with AI Feedback},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-23.html},
    Number= {UCB/EECS-2024-23},
    Abstract= {This paper investigates the application of Reinforcement Learning (RL), particularly Reinforcement Learning with Artificial Intelligence Feedback (RLAIF), to enhance emotional expression in text-to-speech (TTS) models. RLAIF is derived from Reinforcement Learning with Human Feedback (RLHF). Both techniques leverage feedback from humans or other AI models to train and fine-tune AI models, and they are especially powerful when the training objectives are complex, abstract, and difficult to define mathematically. As a result, RLAIF has become crucial in aligning Large Language Models with human values. In this paper, we apply these same capabilities to the text-to-speech domain. Just as RLAIF can align Large Language Models with complex objectives such as mitigating toxicity, this paper uses RLAIF to align text-to-speech models toward expressing target emotions. By exploring different base model architectures together with specific RL techniques such as Reward-Weighted Regression (RWR) and Proximal Policy Optimization (PPO), we train text-to-speech models to better express emotion using feedback from an emotion predictor model. Ultimately, this research seeks to demonstrate, through quantitative and qualitative data, the effectiveness of RLAIF as a method for tuning TTS models to achieve more nuanced emotional expression.},
}

EndNote citation:

%0 Thesis
%A Nagaram, Roshan 
%T Enhancing Emotional Expression in Text-to-Speech Models through Reinforcement Learning with AI Feedback
%I EECS Department, University of California, Berkeley
%D 2024
%8 April 28
%@ UCB/EECS-2024-23
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-23.html
%F Nagaram:EECS-2024-23