Multi-task Policy Learning with Minimal Human Supervision

Parsa Mahmoudieh

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-200

August 11, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-200.pdf

Multi-task policies let a user adjust the desired objective or task parameters without training a new policy for each new task. To train multi-task policies that generalize to unseen tasks, it is common to train them on a large repository of tasks. Tasks are commonly learned from demonstrations or reward functions. However, collecting human demonstrations or instrumenting reward functions for each new task is expensive and limits the scaling of multi-task policies. How tasks are specified to a multi-task policy is another important dimension, since task communication can itself demand expensive labor. In this thesis, we explore ways to learn and specify new tasks with minimal human supervision to enable more scalable multi-task policies.

Advisor: Trevor Darrell

BibTeX citation:

@phdthesis{Mahmoudieh:EECS-2022-200,
    Author= {Mahmoudieh, Parsa},
    Title= {Multi-task Policy Learning with Minimal Human Supervision},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-200.html},
    Number= {UCB/EECS-2022-200},
    Abstract= {Multi-task policies enable a user to adjust their desired objective or task parameters without having to train a new policy for every new desired task. In order to train multi-task policies that can generalize to unseen tasks it is common to train them on a large repository of tasks. Tasks are commonly learned with demonstrations or reward functions. However, collecting human demonstrations or instrumenting reward functions for each new task is expensive and limits scaling of multi-task policies. How tasks are specified to multi-task policies is also an important dimension that can result in expensive labor during task communication. In this thesis we explore ways to learn and specify new tasks with minimal human supervision to enable more scalable multi-task policies.},
}

EndNote citation:

%0 Thesis
%A Mahmoudieh, Parsa
%T Multi-task Policy Learning with Minimal Human Supervision
%I EECS Department, University of California, Berkeley
%D 2022
%8 August 11
%@ UCB/EECS-2022-200
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-200.html
%F Mahmoudieh:EECS-2022-200