Interestingness Elements for Explainable Reinforcement Learning through Introspection



Sequeira, P., Yeh, E., and Gervasio, M. (2019). Interestingness elements for explainable reinforcement learning through introspection. Joint Proceedings of the ACM IUI 2019 Workshops, Vol. 2327.


We propose a framework toward more explainable reinforcement learning (RL) agents. The framework uses introspective analysis of an agent’s history of interaction with its environment to extract several interestingness elements regarding its behavior. Introspection operates at three distinct levels: it first analyzes characteristics of the task that the agent has to solve, then the behavior of the agent while interacting with the environment, and finally performs a meta-analysis combining information gathered at the lower levels. The analyses rely on data that is already collected by standard RL algorithms. We also propose that an RL agent can easily collect additional statistical data while learning, which helps extract more meaningful aspects. We provide insights on how an explanation framework can leverage the elements generated through introspection. Namely, they can help convey learned strategies to a human user, justify the agent’s decisions in relevant situations, denote its learned preferences and goals, and identify circumstances in which advice from the user might be needed.
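As a rough illustration of introspection over an interaction history, the sketch below derives two simple statistics from logged (state, action) pairs: how often each state was visited, and how evenly the agent spread its action choices there. The abstract does not name the specific interestingness elements, so the function `introspect`, the metrics `visit_frequency` and `action_entropy`, and the toy history are all assumptions made for illustration and are not the framework's actual definitions.

```python
import math
from collections import Counter, defaultdict


def introspect(history):
    """Summarize a list of (state, action) pairs.

    Hypothetical helper: the two metrics below are illustrative
    stand-ins, not the elements defined by the framework.
    """
    state_counts = Counter(state for state, _ in history)
    action_counts = defaultdict(Counter)
    for state, action in history:
        action_counts[state][action] += 1

    total = len(history)
    report = {}
    for state, visits in state_counts.items():
        # Empirical distribution over the actions taken in this state.
        probs = [count / visits for count in action_counts[state].values()]
        # Shannon entropy in bits: 0 means the agent always chose the
        # same action here; higher values mean more varied choices.
        entropy = sum(-p * math.log2(p) for p in probs)
        report[state] = {
            "visit_frequency": visits / total,
            "action_entropy": entropy,
        }
    return report


# Toy history (made up): the agent is consistent in s0, undecided in s1.
history = [
    ("s0", "left"), ("s0", "left"), ("s0", "left"), ("s0", "left"),
    ("s1", "left"), ("s1", "right"),
]
report = introspect(history)
for state, stats in sorted(report.items()):
    print(state, stats)
```

Under these assumptions, a rarely visited state with high action entropy is one plausible candidate for a circumstance in which advice from the user might be needed, while a frequently visited state with low entropy hints at a learned preference.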
