Charles Sun

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-98

May 11, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-98.pdf

Large language models (LLMs) have demonstrated remarkable abilities when conversing with humans, answering questions, and responding to requests. Much of this capability comes from training on large datasets drawn from the Internet and from finetuning on human preferences with RLHF. However, LLMs trained this way are not explicitly long-term goal-directed, since they are never optimized against a defined long-term objective. Reinforcement learning (RL) is designed to solve exactly this long-term goal-directed problem and has been extremely successful on a wide variety of non-language tasks, yet progress applying RL to goal-directed language tasks with LLMs has been lacking. A major roadblock to leveraging RL for goal-directed language tasks is the lack of clarity about which tasks it is best suited for. We propose LLM-RL, a diverse suite of tasks and a set of corresponding datasets designed to illustrate the potential of RL algorithms on goal-directed language tasks.
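For readers unfamiliar with the framing the abstract relies on, the following is a minimal sketch of how a goal-directed language task can be cast as an episodic RL problem: the state is the dialogue history, an action is the agent's next utterance, and reward is tied to achieving a long-term goal rather than to per-token likelihood. This sketch is not taken from the report; the environment name, interface, and toy reward are illustrative assumptions.

```python
# Illustrative sketch (not the report's code): a goal-directed language
# task as an episodic RL environment. The agent must elicit a target
# word from an interlocutor within a fixed number of turns.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    observation: str   # full dialogue history so far (the RL state)
    reward: float      # sparse reward: 1.0 only if the goal is achieved
    done: bool         # episode ends on success or after max_turns

@dataclass
class GoalDirectedLanguageEnv:
    target: str
    max_turns: int = 5
    history: list[str] = field(default_factory=list)

    def reset(self) -> str:
        self.history = []
        return ""

    def step(self, utterance: str, respond: Callable[[str], str]) -> Step:
        # An action is a whole utterance; `respond` stands in for a
        # human or a simulated interlocutor.
        self.history.append(f"Agent: {utterance}")
        reply = respond(utterance)
        self.history.append(f"Other: {reply}")
        success = self.target in reply
        done = success or len(self.history) // 2 >= self.max_turns
        return Step("\n".join(self.history), 1.0 if success else 0.0, done)

# Usage with a scripted interlocutor:
env = GoalDirectedLanguageEnv(target="blue")
env.reset()
step = env.step("What color is the sky?", respond=lambda u: "The sky is blue.")
print(step.reward, step.done)  # 1.0 True
```

The point of the sketch is the credit-assignment structure: the reward depends on the whole conversation, which is precisely the long-horizon objective that supervised pretraining and RLHF do not directly optimize.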

Advisor: Sergey Levine


BibTeX citation:

@mastersthesis{Sun:EECS-2023-98,
    Author= {Sun, Charles},
    Title= {Benchmarks for RL on Goal-directed Language Tasks with LLMs},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-98.html},
    Number= {UCB/EECS-2023-98},
    Abstract= {Large language models (LLMs) have demonstrated remarkable abilities when conversing with humans, answering questions, and responding to requests. Much of this capability comes from training on large datasets drawn from the Internet and from finetuning on human preferences with RLHF. However, LLMs trained this way are not explicitly long-term goal-directed, since they are never optimized against a defined long-term objective. Reinforcement learning (RL) is designed to solve exactly this long-term goal-directed problem and has been extremely successful on a wide variety of non-language tasks, yet progress applying RL to goal-directed language tasks with LLMs has been lacking. A major roadblock to leveraging RL for goal-directed language tasks is the lack of clarity about which tasks it is best suited for. We propose LLM-RL, a diverse suite of tasks and a set of corresponding datasets designed to illustrate the potential of RL algorithms on goal-directed language tasks.},
}

EndNote citation:

%0 Thesis
%A Sun, Charles 
%T Benchmarks for RL on Goal-directed Language Tasks with LLMs
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-98
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-98.html
%F Sun:EECS-2023-98