The Cost of OPs in Reinforcement Learning

Yao Fu

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-128

May 14, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-128.pdf

It is typical to ignore the cost of computing an observation or action in the perception-action loop, such that the agent is free to sense the environment and prepare its decision at length before time steps forward. Decision making and the dynamics of the environment are treated as synchronous. Yet the need to act and react efficiently is a basic constraint in natural environments that shapes the behavior of animals. We consider a setting in which the environment is asynchronous, and the computational components of decision making, such as observation, prediction, and action selection, carry associated costs. These costs affect the policy by inducing a need to trade off between operations, and can be incorporated into the learning setting as an intrinsic reward function. As a first attempt, we develop a simple hierarchical approach that adaptively chooses between OPs – explicitly observing or implicitly predicting – in order to update its hidden state, and analyze the emergent strategies in a number of environments involving partial observability and stochastic dynamics.
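To make the trade-off concrete, here is a minimal Python sketch of the observe-or-predict loop the abstract describes. Everything in it is an illustrative assumption rather than the report's implementation: the toy random-walk environment, the uncertainty-threshold agent, and the specific cost values merely show how per-operation costs can enter learning as a negative intrinsic reward.

import random

OBSERVE_COST = 0.1    # assumed cost of the "observe" OP
PREDICT_COST = 0.01   # assumed (cheaper) cost of the "predict" OP

class RandomWalkEnv:
    # Toy 1-D stochastic environment: the state drifts every step,
    # whether or not the agent chooses to observe it.
    def __init__(self):
        self.state = 0.0

    def observe(self):
        return self.state

    def step(self, action):
        self.state += random.gauss(0.0, 1.0)   # stochastic dynamics
        return -abs(self.state - action)       # reward for tracking the state

class ObserveOrPredictAgent:
    # Maintains a hidden state estimate and a crude uncertainty counter;
    # it pays for an observation only when uncertainty crosses a threshold.
    def __init__(self, threshold=3):
        self.uncertainty = 0
        self.threshold = threshold

    def choose_observe(self):
        return self.uncertainty >= self.threshold   # high-level OP choice

    def update(self, obs):
        self.uncertainty = 0                        # observing resets doubt
        return obs

    def predict(self, hidden):
        self.uncertainty += 1                       # predictions accumulate doubt
        return hidden                               # naive "state persists" model

    def act(self, hidden):
        return hidden                               # act on the current estimate

def run_episode(env, agent, horizon=100):
    hidden, total = 0.0, 0.0
    for _ in range(horizon):
        if agent.choose_observe():
            hidden = agent.update(env.observe())    # explicit observation
            total -= OBSERVE_COST                   # intrinsic penalty
        else:
            hidden = agent.predict(hidden)          # implicit prediction
            total -= PREDICT_COST                   # smaller intrinsic penalty
        total += env.step(agent.act(hidden))        # environment keeps moving
    return total

print(run_episode(RandomWalkEnv(), ObserveOrPredictAgent()))

Raising OBSERVE_COST pushes this toy agent toward longer runs of prediction between observations; in the report's setting the choice is made by a learned hierarchical policy rather than a hand-tuned threshold.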

Advisor: Pieter Abbeel


BibTeX citation:

@mastersthesis{Fu:EECS-2021-128,
    Author= {Fu, Yao},
    Title= {The Cost of OPs in Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-128.html},
    Number= {UCB/EECS-2021-128},
    Abstract= {It is typical to ignore the cost of computing an observation or action in the perception-action loop, such that the agent is free to sense the environment and prepare its decision at length before time steps forward. Decision making and the dynamics of the environment are treated as synchronous. Yet the need to act and react efficiently is a basic constraint in natural environments that shapes the behavior of animals. We consider a setting in which the environment is asynchronous, and the computational components of decision making, such as observation, prediction, and action selection, carry associated costs. These costs affect the policy by inducing a need to trade off between operations, and can be incorporated into the learning setting as an intrinsic reward function. As a first attempt, we develop a simple hierarchical approach that adaptively chooses between OPs – explicitly observing or implicitly predicting – in order to update its hidden state, and analyze the emergent strategies in a number of environments involving partial observability and stochastic dynamics.},
}

EndNote citation:

%0 Thesis
%A Fu, Yao 
%T The Cost of OPs in Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 14
%@ UCB/EECS-2021-128
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-128.html
%F Fu:EECS-2021-128