Analyzing the Prediction Accuracy of Trajectory-Based Models with High-Dimensional Control Policies for Long-term Planning in MBRL

Howard Zhang

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-46

May 11, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-46.pdf

Learning effective policies with model-based reinforcement learning (MBRL) is highly dependent on the accuracy of the dynamics model. Recently, a new parametrization called the trajectory-based model was introduced; it takes as input an initial state, a future time index, and control policy parameters, and returns the state at that future time index [3]. This method has demonstrated improved prediction accuracy over long horizons, increased sample efficiency, and the ability to predict the task reward. However, its transferability to MBRL is limited by the low expressivity of its low-dimensional control policy parameter inputs. In this work, we examine how effectively the trajectory-based model predicts environment dynamics when conditioned on higher-dimensional, more expressive neural network control policies. The trajectory-based model shows some capability to learn from these neural network policies, and it still outperforms the traditional state-action one-step model because it suffers less compounding error.
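The abstract contrasts two ways of querying a dynamics model. The sketch below is a toy illustration under stated assumptions, not the report's implementation: it assumes a scalar linear system, a linear feedback policy whose single gain `theta` stands in for the control policy parameters (the report uses neural network policies with far more parameters), and a hypothetical relative model error `eps`. The function names `one_step_rollout` and `trajectory_prediction` are illustrative only. A one-step model is rolled out by feeding its own predictions back in, so its per-step error compounds; a trajectory-based model maps `(s0, t, theta)` directly to `s_t`, so under this assumption the same relative error is incurred only once per query.

```python
"""Toy contrast between one-step and trajectory-based prediction.

All dynamics, policies, and error models below are illustrative assumptions.
"""

# Assumed scalar linear system: s_{t+1} = A * s_t + B * u_t.
A, B = 1.0, 0.5


def true_step(s: float, theta: float) -> float:
    """Ground-truth transition under the assumed linear policy u = -theta * s."""
    u = -theta * s
    return A * s + B * u


def true_state(s0: float, t: int, theta: float) -> float:
    """Ground-truth state after t steps, obtained by rolling out true_step."""
    s = s0
    for _ in range(t):
        s = true_step(s, theta)
    return s


def one_step_rollout(s0: float, t: int, theta: float, eps: float) -> float:
    """Hypothetical one-step model f(s, a) -> s', rolled out recursively.

    The model is assumed to be off by a relative factor (1 + eps) on every
    step; because each prediction is fed back as the next input, the error
    is applied t times.
    """
    s = s0
    for _ in range(t):
        s = (1.0 + eps) * true_step(s, theta)
    return s


def trajectory_prediction(s0: float, t: int, theta: float, eps: float) -> float:
    """Hypothetical trajectory-based model f(s0, t, theta) -> s_t.

    Assumed to carry the same relative error eps, but it is queried once
    for the target time index, so the error is applied only once.
    """
    return (1.0 + eps) * true_state(s0, t, theta)


def relative_error(pred: float, target: float) -> float:
    """Absolute prediction error normalized by the true state's magnitude."""
    return abs(pred - target) / abs(target)


if __name__ == "__main__":
    s0, theta, eps = 1.0, 0.4, 0.01
    for t in (1, 10, 50):
        target = true_state(s0, t, theta)
        e_one = relative_error(one_step_rollout(s0, t, theta, eps), target)
        e_traj = relative_error(trajectory_prediction(s0, t, theta, eps), target)
        print(f"t={t:2d}  one-step: {e_one:.4f}  trajectory-based: {e_traj:.4f}")
```

By construction, the one-step relative error in this toy setting is (1 + eps)^t - 1, which grows with the horizon t, while the trajectory-based sketch stays at eps for every t. Real learned models do not have such a clean error structure; the example only isolates the compounding effect the abstract refers to.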

BibTeX citation:

@mastersthesis{Zhang:EECS-2021-46,
    Author= {Zhang, Howard},
    Title= {Analyzing the Prediction Accuracy of Trajectory-Based Models with High-Dimensional Control Policies for Long-term Planning in MBRL},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-46.html},
    Number= {UCB/EECS-2021-46},
    Abstract= {Learning effective policies with model-based reinforcement learning is highly dependent on the accuracy of the dynamics model. Recently, a new parametrization called the trajectory-based model was introduced, which takes in an initial state, a future time index, and control policy parameters, and returns the state at that future time index [3]. This new method has demonstrated improved prediction accuracy in long horizons, increased sample efficiency, and ability to predict the task reward. However, this model has limited transferability to MBRL due to the limited expressivity of its low-dimensional control policy parameter inputs. In this work, we look at the effectiveness of the trajectory-based model at predicting environment dynamics with higher-dimensional and expressive neural network control policies. The trajectory-based model has demonstrated some capability in learning from these neural network policies, and still outperforms the traditional state-action one-step model due to less compounding error.},
}

EndNote citation:

%0 Thesis
%A Zhang, Howard
%T Analyzing the Prediction Accuracy of Trajectory-Based Models with High-Dimensional Control Policies for Long-term Planning in MBRL
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 11
%@ UCB/EECS-2021-46
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-46.html
%F Zhang:EECS-2021-46