We propose an explainable reinforcement learning (XRL) framework that analyzes an agent’s history of interaction with the environment to extract interestingness elements that explain its behavior. The framework relies on data readily available from standard RL algorithms, augmented with data that the agent can easily collect while learning. We describe how to create visual explanations of an agent’s behavior in the form of short video clips highlighting key interaction moments, based on the proposed elements. We also report on a user study in which we evaluated the ability of humans to correctly perceive the aptitude of agents with different characteristics, including their capabilities and limitations, given explanations automatically generated by our framework. The results show that the diversity of aspects captured by the different interestingness elements is crucial in helping humans correctly identify the agents’ aptitude in the task and determine when the agents might need adjustments to improve their performance.
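To give a concrete, if simplified, picture of the highlight mechanism, the sketch below scores each timestep of a logged interaction history and cuts short clips around the highest-scoring moments. The `InteractionStep` structure, the decisiveness score, and the window size are illustrative assumptions made here for brevity; the framework's actual interestingness elements capture a richer and more diverse set of aspects.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class InteractionStep:
    frame: np.ndarray      # rendered observation at this timestep
    q_values: np.ndarray   # learned value estimates, one per action

def decisiveness(step: InteractionStep) -> float:
    # One illustrative element: the gap between the best and second-best
    # action values, i.e. how decisive the agent's choice is in this state.
    top_two = np.sort(step.q_values)[-2:]
    return float(top_two[1] - top_two[0])

def extract_highlights(history: list, n_clips: int = 5, window: int = 30):
    """Return short frame sequences centered on the highest-scoring steps."""
    scores = np.array([decisiveness(s) for s in history])
    centers = np.argsort(scores)[-n_clips:]   # indices of top-scoring steps
    clips = []
    for t in sorted(int(c) for c in centers):
        lo, hi = max(0, t - window), min(len(history), t + window)
        clips.append([s.frame for s in history[lo:hi]])
    return clips
```

The returned frame sequences can then be encoded as short videos with any standard tool; in practice, one such set of clips would be produced per interestingness element rather than from a single score.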
Interestingness Elements for Explainable Reinforcement Learning through Introspection
We propose a framework toward more explainable reinforcement learning (RL) agents. The framework uses introspective analysis of an agent’s history of interaction with its environment to extract several interestingness elements regarding its behavior. Introspection operates at three distinct levels: first analyzing characteristics of the task that the agent has to solve, then the agent’s behavior while interacting with the environment, and finally performing a meta-analysis that combines information gathered at the lower levels. The analyses rely on data that standard RL algorithms already collect, and we propose additional statistical data that an RL agent can easily gather while learning to help extract more meaningful aspects of its behavior. We provide insights on how an explanation framework can leverage the elements generated through introspection: they can help convey learned strategies to a human user, justify the agent’s decisions in relevant situations, denote its learned preferences and goals, and identify circumstances in which advice from the user might be needed.
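As a rough illustration of the three levels, the following sketch assumes a tabular agent that logs state visit counts, successor counts, and Q-values while learning. The specific statistics computed here (transition entropy, execution certainty, and a combined score) are stand-ins chosen for brevity, not the paper's exact analyses.

```python
import numpy as np

class Introspection:
    """Toy three-level analysis over logged tabular interaction data.

    Hypothetical inputs: `visits[s]` counts state visits, `successors[(s, a)]`
    maps to a dict counting observed next states, and `q[s]` holds learned
    action values -- all data a tabular RL agent can keep while learning.
    """

    def __init__(self, visits, successors, q):
        self.visits = visits          # dict: s -> visit count
        self.successors = successors  # dict: (s, a) -> {s': count}
        self.q = q                    # dict: s -> np.ndarray of Q-values

    # Level 0 (task analysis): how stochastic is the environment here?
    def transition_entropy(self, s, a):
        counts = np.array(list(self.successors[(s, a)].values()), float)
        p = counts / counts.sum()
        return float(-(p * np.log(p + 1e-12)).sum())

    # Level 1 (behavior analysis): how decisively does the agent act?
    def certainty(self, s):
        q = self.q[s]
        return float(q.max() - np.median(q))

    # Level 2 (meta-analysis): combine lower-level results, e.g. surface
    # states that are visited often yet acted upon with low certainty.
    def advice_candidates(self, k=5):
        score = {s: self.visits[s] / (self.certainty(s) + 1e-8)
                 for s in self.q}
        return sorted(score, key=score.get, reverse=True)[:k]
```

States surfaced by the meta-level in this toy version would be natural points at which the agent requests advice from the user, matching the last use of the elements described above.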