Practical reinforcement learning in continuous domains

Jeffrey Forbes and David Andre

EECS Department, University of California, Berkeley

Technical Report No. UCB/CSD-00-1109

2000

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/CSD-00-1109.pdf

Many real-world domains have continuous features and actions, whereas the majority of results in the reinforcement learning community are for finite Markov decision processes. Much of the work that addresses continuous domains uses either discretization or simple parametric function approximators. A drawback to some commonly used parametric function approximation techniques, such as neural networks, is that parametric methods can "forget" earlier experience as they concentrate representational power on new examples. In this paper, we propose a practical architecture for model-based reinforcement learning in continuous state and action spaces that avoids the above difficulties by using an instance-based modeling technique. We present a method for learning and maintaining a value function estimate using instance-based learners, and show that our method compares favorably to other function approximation methods, such as neural networks. Furthermore, our reinforcement learning algorithm learns an explicit model of the environment simultaneously with a value function and policy. The use of a model is beneficial, first, because it allows the agent to make better use of its experiences through simulated planning steps. Second, the use of a model makes it straightforward to provide prior information to the system in the form of the structure of the environmental model. We extend a technique called generalized prioritized sweeping to the continuous case in order to focus the agent's planning steps on those states where the current value is most likely to be incorrect. We illustrate our algorithm's effectiveness with results on several control domains.
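
As a rough illustration of the instance-based idea mentioned in the abstract, the sketch below stores (state, value) examples and predicts by a Gaussian-kernel weighted average of nearby examples. It is not the report's method: the class name InstanceBasedValue, the bandwidth parameter, the choice of kernel, and the cutoff for "no nearby example" are all assumptions made for this example. The point it shows is only that adding a new example far from an old one leaves the prediction near the old one essentially unchanged, in contrast to the "forgetting" described for parametric approximators.

```python
import math


class InstanceBasedValue:
    """Hypothetical sketch: kernel-weighted average over stored examples."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # assumed Gaussian kernel width
        self.examples = []          # list of (state tuple, value) pairs

    def add(self, state, value):
        # Storing an example never alters previously stored ones.
        self.examples.append((tuple(state), float(value)))

    def predict(self, state, default=0.0):
        num = 0.0
        den = 0.0
        for s, v in self.examples:
            # Squared Euclidean distance between query and stored state.
            d2 = sum((a - b) ** 2 for a, b in zip(state, s))
            w = math.exp(-d2 / (2.0 * self.bandwidth ** 2))
            num += w * v
            den += w
        # Fall back to the default when no stored example is close enough.
        return num / den if den > 1e-12 else default


# Usage: the estimate near state 0.0 is unaffected by a distant new example.
vf = InstanceBasedValue(bandwidth=0.5)
vf.add((0.0,), 1.0)
before = vf.predict((0.0,))
vf.add((10.0,), 5.0)
after = vf.predict((0.0,))
print(before, after)
```

A full architecture of the kind the abstract describes would also need a learned environment model and a prioritized planning loop; those parts are omitted here.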

BibTeX citation:

@techreport{Forbes:CSD-00-1109,
    Author= {Forbes, Jeffrey and Andre, David},
    Title= {Practical reinforcement learning in continuous domains},
    Year= {2000},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/5792.html},
    Number= {UCB/CSD-00-1109},
    Abstract= {Many real-world domains have continuous features and actions, whereas the majority of results in the reinforcement learning community are for finite Markov decision processes. Much of the work that addresses continuous domains uses either discretization or simple parametric function approximators. A drawback to some commonly used parametric function approximation techniques, such as neural networks, is that parametric methods can "forget" earlier experience as they concentrate representational power on new examples. In this paper, we propose a practical architecture for model-based reinforcement learning in continuous state and action spaces that avoids the above difficulties by using an instance-based modeling technique. We present a method for learning and maintaining a value function estimate using instance-based learners, and show that our method compares favorably to other function approximation methods, such as neural networks. Furthermore, our reinforcement learning algorithm learns an explicit model of the environment simultaneously with a value function and policy. The use of a model is beneficial, first, because it allows the agent to make better use of its experiences through simulated planning steps. Second, the use of a model makes it straightforward to provide prior information to the system in the form of the structure of the environmental model. We extend a technique called generalized prioritized sweeping to the continuous case in order to focus the agent's planning steps on those states where the current value is most likely to be incorrect. We illustrate our algorithm's effectiveness with results on several control domains.},
}

EndNote citation:

%0 Report
%A Forbes, Jeffrey 
%A Andre, David 
%T Practical reinforcement learning in continuous domains
%I EECS Department, University of California, Berkeley
%D 2000
%@ UCB/CSD-00-1109
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/5792.html
%F Forbes:CSD-00-1109