Binding Large Language Models to Virtual Personas for Human Simulation
Suhong Moon
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-191
December 9, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-191.pdf
This dissertation develops a unified framework for binding large language models (LLMs) to coherent virtual personas through narrative backstories, enabling scalable and valid simulation of human attitudes and behaviors. The central idea is that backstories—synthetic life narratives created by LLMs, which encode demographic information, psychological context, and human beliefs, values, and perspectives, both implicitly and explicitly—can serve as conditioning contexts that stabilize and differentiate LLM behavior. Through this lens, the work investigates how backstory conditioning improves representativeness, consistency, and behavioral realism in simulated populations.
A key assumption underlying this framework is the use of pretrained base models, whose heterogeneous “mixture of voices” enables backstories to bind naturally through prefix conditioning. This reliance on pretrained models distinguishes the approach from much of the related work on LLM conditioning, which often employs instruction-tuned chat models that override narrative cues with safety or normative alignment objectives.
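To make the conditioning mechanism concrete, the sketch below scores survey answer choices under a backstory-prefixed prompt with an off-the-shelf pretrained causal LM. The model name, prompt template, and helper function are illustrative assumptions, not the dissertation's code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in for a larger pretrained base (non-chat) model;
# the dissertation's actual model is not specified here.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def choice_scores(backstory: str, question: str, choices: list[str]) -> dict[str, float]:
    """Score each answer choice under a backstory-prefixed prompt.

    The backstory is prepended as plain text (prefix conditioning): no chat
    template or system prompt, since the framework targets base models whose
    heterogeneous "mixture of voices" the prefix narrows to one persona.
    """
    scores = {}
    for choice in choices:
        prompt = f"{backstory}\n\nQuestion: {question}\nAnswer: {choice}"
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
        scores[choice] = -loss.item()
    return scores

backstory = ("I grew up in a small town in Ohio, worked as a nurse for twenty "
             "years, and raised three kids on my own.")
print(choice_scores(backstory,
                    "How concerned are you about climate change?",
                    ["Very concerned", "Somewhat concerned", "Not at all concerned"]))
```

A fuller implementation would score only the answer tokens conditioned on the prefix rather than the whole sequence; the point here is only that the persona enters through the plain-text prefix.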
Chapter 2 introduces the Anthology framework, which generates diverse first-person backstories through simple prompting (e.g., “Tell me about yourself”) and aligns them to target demographic distributions using a maximum-weight or greedy bipartite matching algorithm. When conditioned on these backstories, LLMs reproduce population-level opinion distributions from the Pew Research Center’s American Trends Panel with smaller distributional shifts between human and model responses and higher internal consistency than existing persona-conditioning methods.
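The demographic-alignment step admits a compact sketch. The version below assumes agreement on demographic variables as the edge weight between a backstory and a target respondent profile; the data layout, weighting, and function names are illustrative, and the exact formulation in the dissertation may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_weight(backstory_traits: dict, target_traits: dict) -> float:
    """Toy weight: number of demographic variables on which the profiles agree."""
    return sum(backstory_traits.get(k) == v for k, v in target_traits.items())

def assign_backstories(backstories: list[dict], targets: list[dict], greedy: bool = False):
    """Match each target respondent profile to one backstory.

    With greedy=False, solves the maximum-weight bipartite matching exactly
    via the Hungarian algorithm; with greedy=True, each target takes the
    best remaining backstory in order.
    """
    W = np.array([[match_weight(b, t) for b in backstories] for t in targets])
    if greedy:
        taken, assignment = set(), []
        for i in range(len(targets)):
            best = next(j for j in np.argsort(-W[i]) if j not in taken)
            taken.add(best)
            assignment.append(best)
        return assignment
    rows, cols = linear_sum_assignment(-W)  # negate weights: SciPy minimizes cost
    return list(cols)

backstories = [{"age": "30-49", "gender": "woman"}, {"age": "65+", "gender": "man"}]
targets = [{"age": "65+", "gender": "man"}, {"age": "30-49", "gender": "woman"}]
print(assign_backstories(backstories, targets))  # -> [1, 0]
```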
Chapter 3 extends the Anthology framework to model social identity and group perception. We test whether LLMs exhibit deep persona binding—responding as true in-group members would—rather than shallow imitation of social stereotypes. Longer and more coherent backstories, generated through multi-turn prompting, enable richer and more consistent virtual personas. These backstory-conditioned LLMs reproduce partisan asymmetries in moral judgment and meta-perception observed in human data, showing that narrative coherence is essential for capturing authentic identity-driven perspectives.
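One plausible realization of the multi-turn generation loop is shown below, under simplified assumptions (placeholder model and follow-up questions; the dissertation's actual prompts are not reproduced here):

```python
from transformers import pipeline

# Placeholder model and follow-up questions; the dissertation's actual
# multi-turn prompts are not reproduced here.
generator = pipeline("text-generation", model="gpt2")

FOLLOW_UPS = [
    "Tell me about yourself.",
    "What communities or groups do you feel you belong to?",
    "How do your values shape how you view people who disagree with you?",
]

def multi_turn_backstory(follow_ups: list[str]) -> str:
    """Chain turns so each answer becomes context for the next question,
    which is what lets the backstory grow long while staying coherent."""
    transcript = ""
    for q in follow_ups:
        prompt = f"{transcript}Q: {q}\nA:"
        out = generator(prompt, max_new_tokens=150,
                        return_full_text=False)[0]["generated_text"]
        transcript = f"{prompt} {out.strip()}\n\n"
    return transcript

print(multi_turn_backstory(FOLLOW_UPS))
```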
Chapter 4 applies backstory conditioning to action prediction in social-dilemma settings, including the Dictator and Trust games. With temporal cues and identity reinforcement incorporated into the conditioning context, LLM personas display cooperative and strategic behaviors aligned with empirical human results.
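As a rough illustration of how such conditioning might be assembled, the sketch below composes a Dictator-game prompt with a temporal cue and an identity-reinforcement line, then parses the elicited allocation. All wording and helper names are hypothetical, not the dissertation's prompts:

```python
import re

def dictator_game_prompt(backstory: str, endowment: int = 10, year: int = 2016) -> str:
    """Compose a Dictator-game prompt with a temporal cue and identity
    reinforcement, the two devices named above (exact wording illustrative)."""
    return (
        f"{backstory}\n\n"
        f"The year is {year}. Remember who you are and what you value.\n"  # temporal cue + identity reinforcement
        f"You have been given ${endowment}. You may give any portion of it "
        f"to an anonymous stranger and keep the rest.\n"
        f"I choose to give $"
    )

def parse_allocation(completion: str, endowment: int = 10) -> int | None:
    """Extract the first integer from the model's completion and clamp it
    to the feasible range [0, endowment]."""
    m = re.search(r"\d+", completion)
    return min(max(int(m.group()), 0), endowment) if m else None

print(dictator_game_prompt("I am a retired teacher from Vermont..."))
print(parse_allocation("7 to the stranger."))  # -> 7
```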
Together, these studies demonstrate that narrative-based persona conditioning provides a general mechanism for aligning LLMs with human psychological realism. By integrating demographic structure, narrative coherence, and contextual grounding, the framework enables LLMs to approximate human attitudes, identities, and actions within a unified modeling paradigm. This work establishes backstory-conditioned LLMs as a principled foundation for scalable and ethically responsible behavioral simulation, offering a new methodological bridge between computational modeling and human behavioral studies.
Advisor: John F. Canny
BibTeX citation:
@phdthesis{Moon:EECS-2025-191,
Author= {Moon, Suhong},
Title= {Binding Large Language Models to Virtual Personas for Human Simulation},
School= {EECS Department, University of California, Berkeley},
Year= {2025},
Month= {Dec},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-191.html},
Number= {UCB/EECS-2025-191},
Abstract= {This dissertation develops a unified framework for binding large language models (LLMs) to coherent virtual personas through narrative backstories, enabling scalable and valid simulation of human attitudes and behaviors. The central idea is that backstories—synthetic life narratives created by LLMs, which encode demographic information, psychological context, and human beliefs, values, and perspectives, both implicitly and explicitly—can serve as conditioning contexts that stabilize and differentiate LLM behavior. Through this lens, the work investigates how backstory conditioning improves representativeness, consistency, and behavioral realism in simulated populations.
A key assumption underlying this framework is the use of pretrained base models, whose heterogeneous “mixture of voices” enables backstories to bind naturally through prefix conditioning. This reliance on pretrained models distinguishes the approach from much of the related work on LLM conditioning, which often employs instruction-tuned chat models that override narrative cues with safety or normative alignment objectives.
Chapter 2 introduces the Anthology framework, which generates diverse first-person backstories through simple prompting (e.g., “Tell me about yourself”) and aligns them to target demographic distributions using a maximum-weight or greedy bipartite matching algorithm. When conditioned on these backstories, LLMs reproduce population-level opinion distributions from the Pew Research Center’s American Trends Panel with smaller distributional shifts between human and model responses and higher internal consistency than existing persona-conditioning methods.
Chapter 3 extends the Anthology framework to model social identity and group perception. We test whether LLMs exhibit deep persona binding—responding as true in-group members would—rather than shallow imitation of social stereotypes. Longer and more coherent backstories, generated through multi-turn prompting, enable richer and more consistent virtual personas. These backstory-conditioned LLMs reproduce partisan asymmetries in moral judgment and meta-perception observed in human data, showing that narrative coherence is essential for capturing authentic identity-driven perspectives.
Chapter 4 applies backstory conditioning to action prediction in social-dilemma settings, including the Dictator and Trust games. With temporal cues and identity reinforcement incorporated into the conditioning context, LLM personas display cooperative and strategic behaviors aligned with empirical human results.
Together, these studies demonstrate that narrative-based persona conditioning provides a general mechanism for aligning LLMs with human psychological realism. By integrating demographic structure, narrative coherence, and contextual grounding, the framework enables LLMs to approximate human attitudes, identities, and actions within a unified modeling paradigm. This work establishes backstory-conditioned LLMs as a principled foundation for scalable and ethically responsible behavioral simulation, offering a new methodological bridge between computational modeling and human behavioral studies.},
}
EndNote citation:
%0 Thesis
%A Moon, Suhong
%T Binding Large Language Models to Virtual Personas for Human Simulation
%I EECS Department, University of California, Berkeley
%D 2025
%8 December 9
%@ UCB/EECS-2025-191
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-191.html
%F Moon:EECS-2025-191