Language Models as Rational Agents
Nicholas Tomlin
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-145
July 31, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-145.pdf
One of the long-standing aims of artificial intelligence is to build agents that can reason and act in order to achieve complex goals. Decades of research in search and planning have focused on the development of rational agents that take actions in order to maximize their expected utilities. This line of work has culminated in systems such as Deep Blue for chess, AlphaZero for Go, and Libratus for poker. These rational agents are goal-oriented by design, using long-horizon planning, learning, and reasoning under uncertainty to perform at an expert level, but their applicability has often been limited to narrow domains.
In recent years, language models have emerged as a promising path toward building more general-purpose artificial intelligence, and they are increasingly being used as agents that take actions in the world. However, in contrast to game-playing agents like AlphaGo, language models often fail to act rationally. In this thesis, I present work that aims to bridge this gap by building language models that can reason, plan, and act in complex environments.
Through a series of multi-agent, utility-maximization games, I first define what it would mean for a language model to communicate and take actions in a rational way. I develop a new evaluation suite of decision-making tasks for language model agents and show that current models make worse decisions than humans, while also taking longer to do so. Then, I discuss a method for building more rational language models across a range of objectives. Finally, I present a new benchmark for evaluating future language models as general-purpose, rational agents and describe potential directions and remaining challenges.
Advisor: Daniel Klein
BibTeX citation:
@phdthesis{Tomlin:EECS-2025-145,
    Author = {Tomlin, Nicholas},
    Title = {Language Models as Rational Agents},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {Jul},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-145.html},
    Number = {UCB/EECS-2025-145},
    Abstract = {One of the long-standing aims of artificial intelligence is to build agents that can reason and act in order to achieve complex goals. Decades of research in search and planning have focused on the development of rational agents that take actions in order to maximize their expected utilities. This line of work has culminated in systems such as Deep Blue for chess, AlphaZero for Go, and Libratus for poker. These rational agents are goal-oriented by design, using long-horizon planning, learning, and reasoning under uncertainty to perform at an expert level, but their applicability has often been limited to narrow domains.
In recent years, language models have emerged as a promising path toward building more general-purpose artificial intelligence, and they are increasingly being used as agents that take actions in the world. However, in contrast to game-playing agents like AlphaGo, language models often fail to act rationally. In this thesis, I present work that aims to bridge this gap by building language models that can reason, plan, and act in complex environments.
Through a series of multi-agent, utility-maximization games, I first define what it would mean for a language model to communicate and take actions in a rational way. I develop a new evaluation suite of decision-making tasks for language model agents and show that current models make worse decisions than humans, while also taking longer to do so. Then, I discuss a method for building more rational language models across a range of objectives. Finally, I present a new benchmark for evaluating future language models as general-purpose, rational agents and describe potential directions and remaining challenges.},
}
EndNote citation:
%0 Thesis
%A Tomlin, Nicholas
%T Language Models as Rational Agents
%I EECS Department, University of California, Berkeley
%D 2025
%8 July 31
%@ UCB/EECS-2025-145
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-145.html
%F Tomlin:EECS-2025-145