This study explores whether LLMs can mentally model decision-making agents by reasoning over their behavior and state transitions from interaction histories. When evaluated on reinforcement learning tasks, LLMs offer some insight but fall short of fully modeling agents without further innovation, highlighting both their potential and their current limitations for explainable RL.
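As a rough illustration of the setup (not the study's actual protocol), an interaction history can be serialized into a prompt so an LLM can be queried about the agent. The transition fields and prompt wording below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    state: str       # human-readable state description
    action: str      # action the agent took
    reward: float    # scalar reward received
    next_state: str  # resulting state

def history_to_prompt(history: list[Transition], question: str) -> str:
    """Render (s, a, r, s') transitions as text and append a question."""
    lines = ["The agent produced the following interaction history:"]
    for t, tr in enumerate(history):
        lines.append(
            f"step {t}: state={tr.state}, action={tr.action}, "
            f"reward={tr.reward:+.1f}, next_state={tr.next_state}"
        )
    lines.append(question)
    return "\n".join(lines)

prompt = history_to_prompt(
    [Transition("at door", "open door", 0.0, "in room"),
     Transition("in room", "pick up key", 1.0, "holding key")],
    "Which action will the agent most likely take next, and why?",
)
```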
We propose reward decomposition methods for more explainable decision-making.
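The core idea of reward decomposition can be sketched as follows, assuming a tabular Q-learning setting with hypothetical named reward channels; this is a minimal illustration, not the proposed method itself:

```python
import numpy as np

COMPONENTS = ["progress", "safety", "energy"]  # assumed reward channels

n_states, n_actions = 10, 4
# One Q-table per reward component; their sum is the ordinary Q-function.
Q = {c: np.zeros((n_states, n_actions)) for c in COMPONENTS}

def update(s, a, r_parts, s_next, alpha=0.1, gamma=0.99):
    """Decomposed Q-learning update: each component is trained on its own
    reward channel but bootstraps on the action that is greedy with
    respect to the summed Q-values, so the components stay consistent."""
    q_total_next = sum(Q[c][s_next] for c in COMPONENTS)
    a_next = int(np.argmax(q_total_next))
    for c in COMPONENTS:
        td = r_parts[c] + gamma * Q[c][s_next, a_next] - Q[c][s, a]
        Q[c][s, a] += alpha * td

def explain(s, a):
    """Per-component contribution of action a in state s."""
    return {c: float(Q[c][s, a]) for c in COMPONENTS}
```

The per-component values returned by `explain` are what makes the decision legible: they attribute the agent's preference to individual reward channels rather than a single opaque scalar.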
Explainable Q-Map improves the transparency of RL agents by combining reward decomposition with abstract action spaces, enabling clear, high-level explanations based on task-relevant object properties. We demonstrate visual and textual explanations in robotic scenarios and show how they can be used with LLMs for reasoning and interactive querying.
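A hedged sketch of how decomposed Q-values over abstract actions could be rendered as text for LLM-based reasoning and querying; the action names, component names, and values are illustrative, not the Explainable Q-Map implementation:

```python
def contrastive_explanation(q_by_action: dict[str, dict[str, float]],
                            chosen: str, alternative: str) -> str:
    """Explain why `chosen` beats `alternative` via per-component deltas."""
    deltas = {
        c: q_by_action[chosen][c] - q_by_action[alternative][c]
        for c in q_by_action[chosen]
    }
    # Sort components by magnitude so the dominant reason comes first.
    parts = [f"{c}: {d:+.2f}"
             for c, d in sorted(deltas.items(), key=lambda kv: -abs(kv[1]))]
    return (f"'{chosen}' is preferred over '{alternative}' because the "
            f"expected return differs by component: " + ", ".join(parts))

q_by_action = {
    "move object to bin": {"task": 0.8, "collision": -0.1},
    "push object aside":  {"task": 0.3, "collision": -0.4},
}
print(contrastive_explanation(q_by_action, "move object to bin",
                              "push object aside"))
```

Text of this form can be embedded directly in an LLM prompt, allowing a user to ask follow-up questions about why one high-level action was chosen over another.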