Demystifying Decision-Making of Deep RL through Validated Language Explanations

Ashwin Dara

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-51
May 13, 2025

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-51.pdf

Reinforcement learning (RL) controllers have been shown in both simulation and real-world deployments to significantly improve traffic flow and fuel efficiency, even when only a small fraction of vehicles are autonomous. Despite these benefits, real-world adoption remains limited due to a lack of transparency, which leads human operators to distrust and often override RL policies. In response, we introduce CLEAR (Contextual Language Explanations for Actions from RL), a framework that generates step-by-step natural language explanations of RL decisions using large language models (LLMs). To address the risk of hallucinations in high-stakes settings, CLEAR integrates a multi-stage validation pipeline that verifies explanations against policy outputs, tests robustness under input perturbations, and checks for logical consistency. Unlike static fine-tuning methods, CLEAR adapts online to new scenarios and maintains alignment with the underlying policy. When evaluated on real-world highway data from the VanderTest, CLEAR significantly outperformed few-shot prompting and retrieval-based workflows in both predictive accuracy and explanation quality. This work extends a prior conference submission and demonstrates the potential of validated language-based interpretability for safe and trustworthy RL deployment.
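To make the abstract's multi-stage validation pipeline concrete, here is a minimal, hypothetical sketch of the three checks it names: agreement with the policy's action, robustness under input perturbations, and logical consistency. This is illustrative only; `validate_explanation`, `policy`, `explainer`, and `consistency_check` are assumed placeholder interfaces, not CLEAR's actual API.

```python
# Minimal sketch (not the report's code) of a three-stage explanation
# validation pipeline like the one the abstract describes. All names
# here are hypothetical placeholders.

import random


def validate_explanation(state, policy, explainer, consistency_check,
                         n_perturbations=5, noise=0.05):
    """Return True only if an LLM explanation passes all three checks."""
    action = policy(state)                  # ground-truth RL decision
    explanation = explainer(state, action)  # natural-language rationale

    # Stage 1: the action implied by the explanation must match the policy.
    if explanation.predicted_action != action:
        return False

    # Stage 2: small input perturbations that leave the policy's action
    # unchanged should not flip the explanation's predicted action.
    for _ in range(n_perturbations):
        perturbed = [x + random.uniform(-noise, noise) for x in state]
        if policy(perturbed) == action and \
                explainer(perturbed, action).predicted_action != action:
            return False

    # Stage 3: the explanation's stated reasons must be logically
    # consistent with each other and with the observed state.
    return consistency_check(explanation, state)
```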

Advisor: Alexandre Bayen

\"Edit"; ?>


BibTeX citation:

@mastersthesis{Dara:EECS-2025-51,
    Author = {Dara, Ashwin},
    Title = {Demystifying Decision-Making of Deep RL through Validated Language Explanations},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-51.html},
    Number = {UCB/EECS-2025-51},
    Abstract = {Reinforcement learning (RL) controllers have been shown in both simulation and real-world
deployments to significantly improve traffic flow and fuel efficiency, even when only a small
fraction of vehicles are autonomous. Despite these benefits, real-world adoption remains
limited due to a lack of transparency, which leads human operators to distrust and often
override RL policies. In response, we introduce CLEAR (Contextual Language Explanations
for Actions from RL), a framework that generates step-by-step natural language explanations
of RL decisions using large language models (LLMs). To address the risk of hallucinations
in high-stakes settings, CLEAR integrates a multi-stage validation pipeline that verifies
explanations against policy outputs, tests robustness under input perturbations, and checks
for logical consistency. Unlike static fine-tuning methods, CLEAR adapts online to new
scenarios and maintains alignment with the underlying policy. When evaluated on real-world
highway data from the VanderTest, CLEAR significantly outperformed few-shot prompting
and retrieval-based workflows in both predictive accuracy and explanation quality. This
work extends a prior conference submission and demonstrates the potential of validated
language-based interpretability for safe and trustworthy RL deployment.}
}

EndNote citation:

%0 Thesis
%A Dara, Ashwin
%T Demystifying Decision-Making of Deep RL through Validated Language Explanations
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 13
%@ UCB/EECS-2025-51
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-51.html
%F Dara:EECS-2025-51