State Abstraction for Programmable Reinforcement Learning Agents
David Andre and Stuart J. Russell
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-01-1156
October 2001
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2001/CSD-01-1156.pdf
Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. As in Dietterich's MAXQ framework, this paper develops methods for safe state abstraction in the context of hierarchical reinforcement learning, in which a hierarchical partial program is used to constrain the policies that are considered. We extend techniques from MAXQ to the context of programmable hierarchical abstract machines (PHAMs), which express complex parameterized behaviors using a simple extension of the Lisp language. We show that our methods preserve the property of hierarchical optimality, i.e., optimality among all policies consistent with the PHAM program. We also show how our methods allow safe detachment, encapsulation, and transfer of learned "subroutine" behaviors, and demonstrate our methods on Dietterich's taxi domain.
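To make the notion of a hierarchical partial program concrete, the following is a minimal, hypothetical sketch of what a PHAM-style partial program for the taxi domain might look like. The defun/call/choice/action forms and the helpers at?, passenger-source, and passenger-destination are illustrative assumptions for this sketch, not the report's actual PHAM syntax.

;; Hypothetical PHAM-style partial program for the taxi domain (illustrative syntax only).
;; Assumed primitives: `choice` marks a point where the learner selects among the listed
;; alternatives, `action` executes a primitive action, and `call` invokes a subroutine.
(defun root ()
  (loop do (call get-passenger)
           (call put-passenger)))

(defun get-passenger ()
  (call navigate (passenger-source))
  (action 'pickup))

(defun put-passenger ()
  (call navigate (passenger-destination))
  (action 'putdown))

(defun navigate (target)
  ;; The learner chooses a move at each step until the taxi reaches `target`.
  (loop until (at? target)
        do (choice nav-move
             (action 'north)
             (action 'south)
             (action 'east)
             (action 'west))))

Learning fills in only the decisions at the choice points; the kind of state abstraction at issue is, for instance, letting the move choice inside navigate ignore the passenger's eventual destination while the taxi is driving to the pickup location.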
BibTeX citation:
@techreport{Andre:CSD-01-1156,
  Author      = {Andre, David and Russell, Stuart J.},
  Title       = {State Abstraction for Programmable Reinforcement Learning Agents},
  Institution = {EECS Department, University of California, Berkeley},
  Year        = {2001},
  Month       = {Oct},
  Number      = {UCB/CSD-01-1156},
  Url         = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2001/5767.html},
  Abstract    = {Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. Like Dietterich's MAXQ framework, this paper develops methods for safe state abstraction in the context of hierarchical reinforcement learning, in which a hierarchical partial program is used to constrain the policies that are considered. We extend techniques from MAXQ to the context of programmable hierarchical abstract machines (PHAMs), which express complex parameterized behaviors using a simple extension of the Lisp language. We show that our methods preserve the property of hierarchical optimality, i.e., optimality among all policies consistent with the PHAM program. We also show how our methods allow safe detachment, encapsulation, and transfer of learned "subroutine" behaviors, and demonstrate our methods on Dietterich's taxi domain.},
}
EndNote citation:
%0 Report
%A Andre, David
%A Russell, Stuart J.
%T State Abstraction for Programmable Reinforcement Learning Agents
%I EECS Department, University of California, Berkeley
%D 2001
%@ UCB/CSD-01-1156
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2001/5767.html
%F Andre:CSD-01-1156