University College London (UCL), Wellcome Trust Centre for Neuroimaging
Retrospective Model-Based Inference Guides Model-Free Credit Assignment
To adapt to their environments, organisms need to learn which actions are rewarding in different states of the world. Extensive research in Reinforcement Learning (RL) has shown that organisms cope efficiently with this credit-assignment problem, even when their actions are executed under challenging conditions of action and/or state uncertainty. However, little is known about credit assignment in the common case where this uncertainty is subsequently resolved. Such is the case, for example, when you ask for a salary raise without knowing whether your employer is in a good or a bad mood, and a few days later you learn that she was in a bad mood when she approved your request. Does this knowledge modulate your credit assignment? Here, I will examine this question from the perspective of the interaction between habitual (model-free; MF) and goal-directed (model-based; MB) systems. Whereas previous research in RL has mostly focused on the prospective-planning function of the MB system, I will present a novel theory of MB retrospective inference and an experimental test of this theory based on a novel bandit task. According to our theory, an MB system retrospectively resolves the uncertainty that prevailed when actions were taken and thereby guides MF credit assignment. In support of our theory, we found that when subjects' momentary uncertainty about which bandit had generated an outcome was resolved by subsequent information, they assigned most of the credit to the bandit they inferred to have been responsible. I will discuss how these findings enrich our knowledge of the variety of MB functions and the scope of MB-MF interactions.