Training and Analyzing Language Agents in Socially Complex Dialogues

Jessica Lin

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-63
May 15, 2025

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-63.pdf

Advances in large language models (LLMs) have led to their use as conversational partners in social contexts that are often highly nuanced. Improving agent performance in emotionally complex dialogues requires additional training data, but real data for such domains is scarce, ethically sensitive, or hard to obtain and label. Using LLMs to generate synthetic data has therefore become a popular alternative, yet such data is often unrealistic and lacks diversity. In this technical report, we propose two methods for improving the synthetic data used to train dialogue agents with reinforcement learning (RL). In Chapter 1, we introduce a hindsight regeneration pipeline that improves the diversity and quality of existing dialogue data on persuasion and mental health counseling tasks; we demonstrate the method's capability by comparing it against common baselines and by conducting a simulated evaluation and a user study. In Chapter 2, we explore the effects of deception on a language model's ability to negotiate in a real-world business scenario, presenting a simulation engine pipeline for effective conversation generation along with an analysis of model behavior in negotiation tasks.
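As a rough illustration of the Chapter 1 idea, the following is a minimal sketch of what a hindsight regeneration loop could look like. It is an assumption-laden reconstruction, not the report's actual implementation: the Dialogue class, the llm() placeholder, and both prompts are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class Dialogue:
    turns: list[str]   # alternating agent/user utterances
    outcome: float     # scalar task-success score used as the RL reward

def llm(prompt: str) -> str:
    # Placeholder for any chat-completion API call; plug in a real client.
    raise NotImplementedError

def hindsight_regenerate(dialogue: Dialogue) -> Dialogue:
    transcript = "\n".join(dialogue.turns)
    # Step 1: with the final outcome known, ask the model where the agent
    # could have acted better (the "hindsight" step).
    critique = llm(
        f"Transcript:\n{transcript}\nOutcome: {dialogue.outcome}\n"
        "Identify the agent turn that most hurt the outcome and propose "
        "a better utterance."
    )
    # Step 2: regenerate the conversation from the flagged turn onward,
    # yielding a more diverse, higher-quality trajectory.
    revised = llm(
        f"Transcript:\n{transcript}\nSuggested revision: {critique}\n"
        "Rewrite the dialogue from the flagged turn onward, one "
        "utterance per line."
    )
    # In practice the regenerated dialogue's outcome would be re-scored
    # (e.g., by a reward model) before being added to the training set.
    return Dialogue(turns=revised.split("\n"), outcome=dialogue.outcome)

def augment(dataset: list[Dialogue]) -> list[Dialogue]:
    # Keep the originals and add regenerated variants; the combined set
    # can then be used for offline RL training of the dialogue agent.
    return dataset + [hindsight_regenerate(d) for d in dataset]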

Advisor: Sergey Levine

\"Edit"; ?>


BibTeX citation:

@mastersthesis{Lin:EECS-2025-63,
    Author = {Lin, Jessica},
    Title = {Training and Analyzing Language Agents in Socially Complex Dialogues},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-63.html},
    Number = {UCB/EECS-2025-63},
    Abstract = {Advances in large language models (LLMs) have led to their use as conversational partners in social contexts that are often highly nuanced. Improving agent performance in emotionally complex dialogues requires additional training data, but real data for such domains is scarce, ethically sensitive, or hard to obtain and label. Using LLMs to generate synthetic data has therefore become a popular alternative, yet such data is often unrealistic and lacks diversity. In this technical report, we propose two methods for improving the synthetic data used to train dialogue agents with reinforcement learning (RL). In Chapter 1, we introduce a hindsight regeneration pipeline that improves the diversity and quality of existing dialogue data on persuasion and mental health counseling tasks; we demonstrate the method's capability by comparing it against common baselines and by conducting a simulated evaluation and a user study. In Chapter 2, we explore the effects of deception on a language model's ability to negotiate in a real-world business scenario, presenting a simulation engine pipeline for effective conversation generation along with an analysis of model behavior in negotiation tasks.}
}

EndNote citation:

%0 Thesis
%A Lin, Jessica
%T Training and Analyzing Language Agents in Socially Complex Dialogues
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 15
%@ UCB/EECS-2025-63
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-63.html
%F Lin:EECS-2025-63