Shrishti Jeswani

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-98

May 29, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-98.pdf

Learning to adapt to new situations in the face of limited experience is a hallmark of human intelligence. Whether in Natural Language Processing (NLP) or Reinforcement Learning (RL), versatility is key for intelligent systems to perform well in the real world. This work proposes and evaluates solutions to salient transfer learning problems in NLP and RL.

Although today's pre-trained language models are considerably more robust to out-of-distribution data than traditional NLP models, they remain notoriously brittle. We present a test-time training technique that lets NLP models adapt to unforeseen distribution shifts at test time, when no data from the shifted distribution is available during training for domain adaptation. Our approach updates the model at test time using an unsupervised masked language modeling (MLM) objective. We ensure that this auxiliary loss is helpful by training with a gradient alignment technique that encourages the gradients of the MLM and supervised losses to point in the same direction. We evaluate our approach on a variety of tasks, such as sentiment analysis and semantic similarity.
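
As a concrete illustration, the sketch below shows one way test-time MLM adaptation and gradient-aligned training could be implemented in PyTorch. It is a minimal sketch under our own assumptions rather than the report's exact implementation: encoder, mlm_head, task_head, and the mask_tokens helper (BERT-style random masking that marks unmasked positions with -100) are hypothetical components, and the inner-product penalty is only one possible surrogate for gradient alignment.

import copy
import torch
import torch.nn.functional as F

def predict_with_test_time_training(encoder, mlm_head, task_head, input_ids,
                                    attention_mask, mask_tokens, steps=3, lr=1e-5):
    # Adapt a copy of the encoder to this test input with the unsupervised MLM
    # loss, then predict with the adapted weights. (Hypothetical modules/helper.)
    adapted = copy.deepcopy(encoder)
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        masked_ids, mlm_labels = mask_tokens(input_ids)      # BERT-style random masking
        hidden = adapted(masked_ids, attention_mask)         # shared representation
        logits = mlm_head(hidden)                            # per-token vocabulary logits
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               mlm_labels.view(-1), ignore_index=-100)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        hidden = adapted(input_ids, attention_mask)
        return task_head(hidden[:, 0])                       # e.g. classify from the [CLS] position

def gradient_aligned_training_step(encoder, mlm_head, task_head, batch,
                                   mask_tokens, optimizer, align_weight=0.1):
    # Supervised loss on the labeled batch.
    hidden = encoder(batch["input_ids"], batch["attention_mask"])
    sup_loss = F.cross_entropy(task_head(hidden[:, 0]), batch["labels"])
    # Unsupervised MLM loss on the same inputs.
    masked_ids, mlm_labels = mask_tokens(batch["input_ids"])
    mlm_logits = mlm_head(encoder(masked_ids, batch["attention_mask"]))
    mlm_loss = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                               mlm_labels.view(-1), ignore_index=-100)
    # Penalize a negative inner product between the two gradients so that
    # descending the MLM loss also tends to descend the supervised loss.
    params = [p for p in encoder.parameters() if p.requires_grad]
    g_sup = torch.autograd.grad(sup_loss, params, create_graph=True)
    g_mlm = torch.autograd.grad(mlm_loss, params, create_graph=True)
    alignment = sum((gs * gm).sum() for gs, gm in zip(g_sup, g_mlm))
    total = sup_loss + mlm_loss - align_weight * alignment
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()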

Although deep RL algorithms enable agents to perform impressive tasks, they often require many trials before an agent develops skills within a given environment. Furthermore, agents struggle to adapt to small changes in the environment, requiring additional samples to rebuild their knowledge of the world. In contrast, humans and animals adapt rapidly to changes by drawing on their prior experience. Our objective is to improve the generalization performance of state-of-the-art meta-RL approaches, where we consider generalization to changes in both environment dynamics and reward structure. We propose and evaluate several novel meta-RL architectures that aim to improve adaptation to new environments by disentangling components of the recurrent policy network.
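
To make "disentangling components of the recurrent policy network" concrete, the following is a minimal sketch under our own assumptions, not the report's exact design: one recurrent stream consumes observations and previous actions (environment dynamics), a second stream consumes previous actions, rewards, and termination flags (reward structure), and the policy and value heads read the concatenation of both hidden states.

import torch
import torch.nn as nn

class DisentangledRecurrentPolicy(nn.Module):
    # Hypothetical recurrent policy with separate hidden states for dynamics
    # and reward structure, combined only at the policy/value heads.
    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.obs_rnn = nn.GRUCell(obs_dim + act_dim, hidden_dim)  # dynamics stream
        self.rew_rnn = nn.GRUCell(act_dim + 2, hidden_dim)        # reward/done stream
        self.policy_head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, act_dim))
        self.value_head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, obs, prev_action, prev_reward, prev_done, h_obs, h_rew):
        # Each stream only sees the signals relevant to what it should capture.
        h_obs = self.obs_rnn(torch.cat([obs, prev_action], dim=-1), h_obs)
        h_rew = self.rew_rnn(torch.cat([prev_action, prev_reward, prev_done], dim=-1), h_rew)
        joint = torch.cat([h_obs, h_rew], dim=-1)
        return self.policy_head(joint), self.value_head(joint), h_obs, h_rew

The intended benefit of this separation is that when only the reward function changes, the hidden state encoding dynamics can be reused, and vice versa.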

Advisors: Joseph Gonzalez and John F. Canny


BibTeX citation:

@mastersthesis{Jeswani:EECS-2020-98,
    Author= {Jeswani, Shrishti},
    Editor= {Gonzalez, Joseph and Canny, John F.},
    Title= {A Study of Transfer Learning Methods within Natural Language Processing and Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-98.html},
    Number= {UCB/EECS-2020-98},
    Abstract= {Learning to adapt to new situations in the face of limited experience is a hallmark of human intelligence. Whether in Natural Language Processing (NLP) or Reinforcement Learning (RL), versatility is key for intelligent systems to perform well in the real world. This work proposes and evaluates solutions to salient transfer learning problems in NLP and RL.

Although today's pre-trained language models are considerably more robust to out-of-distribution data than traditional NLP models, they remain notoriously brittle. We present a test-time training technique that lets NLP models adapt to unforeseen distribution shifts at test time, when no data from the shifted distribution is available during training for domain adaptation. Our approach updates the model at test time using an unsupervised masked language modeling (MLM) objective. We ensure that this auxiliary loss is helpful by training with a gradient alignment technique that encourages the gradients of the MLM and supervised losses to point in the same direction. We evaluate our approach on a variety of tasks, such as sentiment analysis and semantic similarity.

Although deep RL algorithms enable agents to perform impressive tasks, they often require many trials before an agent develops skills within a given environment. Furthermore, agents struggle to adapt to small changes in the environment, requiring additional samples to rebuild their knowledge of the world. In contrast, humans and animals adapt rapidly to changes by drawing on their prior experience. Our objective is to improve the generalization performance of state-of-the-art meta-RL approaches, where we consider generalization to changes in both environment dynamics and reward structure. We propose and evaluate several novel meta-RL architectures that aim to improve adaptation to new environments by disentangling components of the recurrent policy network.},
}

EndNote citation:

%0 Thesis
%A Jeswani, Shrishti 
%E Gonzalez, Joseph 
%E Canny, John F. 
%T A Study of Transfer Learning Methods within Natural Language Processing and Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 29
%@ UCB/EECS-2020-98
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-98.html
%F Jeswani:EECS-2020-98