Hierarchical Actor-Critic Exploration with Synchronized, Adversarial, & Knowledge-Based Actions
Ayush Jain
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2022-6
April 17, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-6.pdf
This work builds on the efficiency and robustness advantages of a hierarchical learning structure by introducing HAC-E-SAK, which extends the Hierarchical Actor-Critic framework with an Exploration paradigm driven by Synchronized, Adversarial, and Knowledge-based actions. While Hierarchical Actor-Critic (HAC) emphasizes a strictly defined hierarchical organization for rapid learning through parallelized training of multilevel subtask transition functions, it does not extend this principle to the exploration phase of training, an oversight this work addresses. Further, HAC's exploration strategy consists of simple epsilon-greedy perturbations of the deterministic actions generated by the DDPG algorithm. The approach presented here replaces this with an adversarial strategy that draws on knowledge of prior agent experiences, motivating guided environment discovery in tasks with continuous state and action spaces. HAC-E-SAK repurposes the hierarchical organization used by leading subtask-learning methods for structured exploration, allowing explicit synchronization between levels. Experiments across a number of sparse-reward scenarios in Flow and OpenAI Gym show that HAC-E-SAK consistently outperforms the other tested procedures in both sample efficiency and task success rate.
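Illustrative exploration sketch (not from the report): the full report's exploration machinery is not reproduced on this page, so the following minimal Python sketch only contrasts the two ideas the abstract names. It shows an epsilon-greedy perturbation of a deterministic DDPG action, as in HAC's baseline, alongside a hypothetical knowledge-based choice that favors the least-visited predicted outcome as a count-based stand-in for "knowledge of prior agent experiences". All function names, noise scales, and the candidate/visit-count bookkeeping are assumptions made for illustration, not the report's implementation.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_perturb(action, low, high, epsilon=0.1, noise_scale=0.05):
    """HAC-style baseline: with probability epsilon, resample the action
    uniformly at random; otherwise add small Gaussian noise to the
    deterministic action proposed by the DDPG actor."""
    if rng.random() < epsilon:
        return rng.uniform(low, high, size=action.shape)
    noisy = action + noise_scale * (high - low) * rng.standard_normal(action.shape)
    return np.clip(noisy, low, high)

def least_visited_choice(candidates, visit_counts):
    """Hypothetical knowledge-based alternative: among candidate perturbed
    actions, choose the one whose predicted outcome appears least often in
    prior experience (a count-based proxy for exploration that is
    'adversarial' to the agent's past coverage)."""
    scores = [visit_counts.get(key, 0) for _, key in candidates]
    return candidates[int(np.argmin(scores))][0]

# Example: a 1-D action in [-1, 1]. In a hierarchy, each level would
# perturb its own output (a subgoal or a primitive action) in sync.
a = np.array([0.3])
print(epsilon_greedy_perturb(a, -1.0, 1.0))
print(least_visited_choice([(a, "s1"), (np.array([0.9]), "s2")],
                           {"s1": 12, "s2": 3}))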
Advisor: Alexandre Bayen
BibTeX citation:
@mastersthesis{Jain:EECS-2022-6,
    Author = {Jain, Ayush},
    Title = {Hierarchical Actor-Critic Exploration with Synchronized, Adversarial, & Knowledge-Based Actions},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {Apr},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-6.html},
    Number = {UCB/EECS-2022-6},
    Abstract = {This work builds on the efficiency and robustness advantages of a hierarchical learning structure by introducing HAC-E-SAK, which extends the Hierarchical Actor-Critic framework with an Exploration paradigm driven by Synchronized, Adversarial, and Knowledge-based actions. While Hierarchical Actor-Critic (HAC) emphasizes a strictly defined hierarchical organization for rapid learning through parallelized training of multilevel subtask transition functions, it does not extend this principle to the exploration phase of training, an oversight this work addresses. Further, HAC's exploration strategy consists of simple epsilon-greedy perturbations of the deterministic actions generated by the DDPG algorithm. The approach presented here replaces this with an adversarial strategy that draws on knowledge of prior agent experiences, motivating guided environment discovery in tasks with continuous state and action spaces. HAC-E-SAK repurposes the hierarchical organization used by leading subtask-learning methods for structured exploration, allowing explicit synchronization between levels. Experiments across a number of sparse-reward scenarios in Flow and OpenAI Gym show that HAC-E-SAK consistently outperforms the other tested procedures in both sample efficiency and task success rate.},
}
EndNote citation:
%0 Thesis
%A Jain, Ayush
%T Hierarchical Actor-Critic Exploration with Synchronized, Adversarial, & Knowledge-Based Actions
%I EECS Department, University of California, Berkeley
%D 2022
%8 April 17
%@ UCB/EECS-2022-6
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-6.html
%F Jain:EECS-2022-6