Institut des Systèmes Intelligents et de Robotique

Benoît Girard
Title: Research Director
Address: 4 place Jussieu, CC 173, 75252 Paris cedex 05
Phone: +33 (0) 1 44 27 28 85
Email: girard(at)isir.upmc.fr
Group: AMAC

Research Topics

My current main topics of research are the intermingled functions of decision-making, action selection and reinforcement learning. I am primarily interested in modeling them in animals and humans, but also in their implementation in artificial devices such as autonomous robots.

Multiple-level modeling of the basal ganglia

Context

The basal ganglia (BG) are a set of interconnected subcortical nuclei that constitute a key component of the neural substrate of action selection, reinforcement learning and decision-making (for a deeper introduction, I warmly recommend reading this introduction to the BG). The BG are part of large cortico-baso-thalamo-cortical loops, as well as of subcortical ones; the latter are usually overlooked (see McHaffie et al., 2005 for a smooth introduction to these forgotten loops). In the context of decision-making and action selection, these loops are thought to be subdivided into channels, each of them representing one of the competing options. The BG are interconnected with dopaminergic nuclei, namely the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc); the dopaminergic inputs they provide to the input nucleus of the BG, the striatum, are assumed to carry the reinforcement signals that bias the decisions made by the BG towards rewarding options and away from punishing ones (Schultz et al., 1997).

Numerous models of the basal ganglia have been proposed in the last 20 years; however, most of them are based on the idea, introduced by Albin et al. (1989), that the relatively complex architecture of the BG can be abstracted into a two-pathway feedforward architecture. The first pathway, known as the direct one, would stem from the striatal medium spiny neurons (MSNs) that exclusively express the D1 type of dopaminergic receptors, and project to the output nuclei of the BG (the internal globus pallidus, GPi, and the substantia nigra pars reticulata, SNr). This pathway is inhibitory and probably focused, meaning that each competing channel projects only to itself. The second, indirect pathway would stem from the D2-receptor-expressing MSNs and project in cascade to the external globus pallidus (GPe) and then to the subthalamic nucleus (STN) before reaching the GPi/SNr. This pathway has a globally excitatory effect and is diffuse, as each channel projects to its competitors. This focused inhibitory vs. diffuse excitatory architecture is well suited for selection, as long as the two pathways are well balanced. In this interpretation of BG operation, dopaminergic dysfunctions (for example, the decreased dopamine concentration in Parkinson's disease) affect the balance between these two pathways, and thus degrade the selection capabilities of the circuit (towards either no selection at all, or multiple simultaneous selections).
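The selection argument above can be made concrete with a toy steady-state computation. The Python sketch below is hypothetical and is not one of the published BG models: each channel's input salience inhibits its own GPi/SNr output through the focused direct pathway and excites the outputs of all the other channels through the diffuse indirect pathway, and a channel counts as selected when its output is fully suppressed. The weights `w_direct` and `w_indirect`, the `tonic` output level and the zero threshold are arbitrary illustrative choices.

```python
import numpy as np


def gpi_output(saliences, w_direct=1.0, w_indirect=0.5, tonic=0.2):
    """Steady-state GPi/SNr activity of a toy two-pathway model (one value per channel)."""
    s = np.asarray(saliences, dtype=float)
    direct = w_direct * s                  # focused: channel i inhibits only output i
    indirect = w_indirect * (s.sum() - s)  # diffuse: every other channel excites output i
    return np.maximum(tonic - direct + indirect, 0.0)  # firing rates cannot be negative


def selected_channels(saliences, **weights):
    """Indices of the channels whose output is fully suppressed (disinhibited targets)."""
    return np.flatnonzero(gpi_output(saliences, **weights) <= 0.0).tolist()


inputs = [0.8, 0.3, 0.1]
print(selected_channels(inputs))                  # [0]: balanced pathways
print(selected_channels(inputs, w_direct=0.2))    # []: weakened direct pathway
print(selected_channels(inputs, w_indirect=0.0))  # [0, 1]: weakened indirect pathway
```

Unbalancing the pathways in either direction reproduces the two failure modes mentioned above: a weakened direct pathway yields no selection, and a weakened indirect pathway yields multiple simultaneous selections.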

This strict pathway segregation interpretation is problematic in the case of non-human primates (and probably human ones too). Anatomical studies have repeatedly shown (Parent et al., 1995; Lévesque & Parent, 2005) that most macaque MSNs (around 80%) project to both the GPe and the GPi/SNr, and that the remaining 20% are GPe-specific. There is thus no segregated direct pathway in the macaque BG, and only a poorly segregated indirect one. The selective expression of either D1 or D2 receptors in MSNs has also been questioned, and co-expression of both types could be quite common (Nadjar et al., 2006). Should this be confirmed, it would be an additional nail in the coffin of segregated BG pathways in macaque monkeys.

On the dopamine front, an additional concern is that dopaminergic receptors are not exclusively expressed in the MSNs of the striatum, but in fact in all the BG nuclei (Rommelfanger & Wichmann, 2010).

Contributions

  • Selection without pathways

My current project thus aims at exploring how much of the known BG functions and dysfunctions can be explained without relying on the direct/indirect pathway interpretation.

We proposed a first model of the macaque BG (Liénard & Girard, 2014) at the population level (the individual components of the model represent the average activity of large populations of neurons), constrained by a large set of anatomical and electrophysiological data. Its main characteristics are the following:

  1. there are no segregated pathways: the proportions of overlap between the MSN to GPe and MSN to GPi projections follow the aforementioned results from Parent's team,
  2. it is agnostic with regard to the function of the BG: the model was constrained with electrophysiological data obtained from animals at rest, executing no specific task, so the obtained model parameterizations do not favor any particular function, such as selectivity.

The main obtained results are:

  1. numerous testable anatomical predictions, such as the predicted number of synaptic boutons in projections that have not been counted yet,
  2. the a posteriori emergence of selection capabilities: when sliced into separate channels and fed with inputs of varying intensities (mimicking motor cortex inputs in a target selection task), our BG model performs selection, although it was not a priori optimized for this.

Even if preliminary (selectivity still has to be tested in more depth), these results show that the supposed selection function of the BG is indeed compatible with what we know of BG anatomy and physiology and, importantly, that this selection function does not require segregated pathways.

  • Parkinsonian oscillations without pathways

We extended this model (Liénard et al., 2017), first by adding realistic transmission delays between nuclei, obtained by systematically testing delay combinations against the rich stimulation data from Nambu's team, and second by adding dopaminergic receptors in the GPe and the STN (and not in the striatum!). We then simulated a decreased dopaminergic innervation of the BG, mimicking the effect of Parkinson's disease (PD), and obtained strong oscillations in the beta range, which are typical of the PD condition. These oscillations originate from the STN-GPe loop, a theory that had already been proposed in the 2000s but had been challenged by a number of later proposals. We add some support to this theory here, in a very parsimonious manner.

Interestingly, increasing the effect of dopamine depletion on the D1-like (D5) receptors of the STN only (i.e. selectively increasing the intensity of the PD condition on these receptors) cancels the oscillations, a counter-intuitive effect that fits well with the results of Chetrit et al. (2013).

We do not claim that there are no dopaminergic receptors in the MSNs; our point here is to show that segregated pathways are not necessary for the emergence of typical PD oscillations.
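The STN-GPe loop mechanism can be illustrated with a two-population rate model in which the STN excites the GPe, the GPe inhibits the STN, and each projection is delayed. The Python sketch below is a hypothetical toy, not the Liénard et al. (2017) model: the 10 ms time constants, the 5 ms delays and the sigmoid transfer function are assumed values, and dopamine depletion is represented, purely for illustration, by a larger coupling weight `w`. Linearizing around the fixed point (sigmoid slope 0.25), the loop gain is (0.25 w)^2, and oscillations appear when it exceeds about 2.7; the onset frequency, where the two first-order lags and the 10 ms round-trip delay add up to half a cycle of phase shift, is about 21 Hz, within the beta range.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_stn_gpe(w, tau=10.0, delay=5.0, dt=0.1, t_max=1000.0):
    """Euler integration of a toy delayed STN-GPe rate loop (times in ms).

    tau * dS/dt = -S + sigmoid(w/2 - w * G(t - delay))   (GPe inhibits STN)
    tau * dG/dt = -G + sigmoid(w * S(t - delay) - w/2)   (STN excites GPe)
    The biases +/- w/2 put the unique fixed point at S = G = 0.5, where the
    sigmoid slope is 0.25. Both activities are zero before t = 0.
    """
    n_steps = int(round(t_max / dt))
    lag = int(round(delay / dt))
    stn = np.zeros(n_steps + 1)
    gpe = np.zeros(n_steps + 1)
    for k in range(n_steps):
        gpe_delayed = gpe[k - lag] if k >= lag else 0.0
        stn_delayed = stn[k - lag] if k >= lag else 0.0
        stn[k + 1] = stn[k] + dt / tau * (sigmoid(w / 2 - w * gpe_delayed) - stn[k])
        gpe[k + 1] = gpe[k] + dt / tau * (sigmoid(w * stn_delayed - w / 2) - gpe[k])
    return stn


def oscillation_stats(trace, dt=0.1, window=300.0):
    """Peak-to-peak amplitude and mean-crossing frequency (Hz) over the last `window` ms."""
    x = trace[-int(round(window / dt)):]
    amplitude = float(x.max() - x.min())
    if amplitude < 1e-6:
        return amplitude, 0.0  # no oscillation left: frequency is meaningless
    m = x.mean()
    upward_crossings = int(np.sum((x[:-1] < m) & (x[1:] >= m)))
    return amplitude, upward_crossings / (window / 1000.0)


for w in (4.0, 10.0):  # loop gains (0.25 * w)**2 = 1.0 and 6.25
    amplitude, freq = oscillation_stats(simulate_stn_gpe(w))
    print(f"w = {w}: peak-to-peak STN activity {amplitude:.3f}, about {freq:.0f} Hz")
```

With `w = 4` the activity settles to its fixed point, whereas with `w = 10` a sustained oscillation close to the predicted frequency develops.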

  • Extensions of the model to higher- and lower-level modeling

Preliminary work has been carried out by Jean Bellot (at the end of his PhD) and by Anne Chadoeuf (M.Sc. internship) to develop a leaky-integrator model of the BG whose parameters are derived from the initial model (Liénard & Girard, 2014). The goal is to obtain a model comparable to previous models from the literature built at this level of abstraction. It is also faster to simulate, which makes it more suitable for integration into large-scale models encompassing the various loops in which the BG are embedded.
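For comparison with the population-level model, a leaky-integrator unit simply obeys tau * da/dt = -a + I, followed by a static output function. The Python sketch below is a generic illustration under assumed values (a 10 ms time constant, a 1 ms Euler step, and the positive part of the activation as output); it is not the Bellot/Chadoeuf model, whose parameters are derived from the Liénard & Girard (2014) model. Updating all units is a single vectorized operation per time step, which gives an idea of why this level of abstraction is cheap to simulate.

```python
import numpy as np


def leaky_integrate(inputs, tau=10.0, dt=1.0, t_max=200.0):
    """Euler integration of tau * da/dt = -a + I for a vector of units (times in ms).

    Each unit could stand for one channel of one nucleus; the output is the positive
    part of the activation (an assumed, commonly used transfer function).
    """
    drive = np.asarray(inputs, dtype=float)
    a = np.zeros_like(drive)
    for _ in range(int(round(t_max / dt))):
        a += dt / tau * (drive - a)  # one vectorized update for all units
    return np.maximum(a, 0.0)


print(leaky_integrate([0.5, -0.2, 1.0]))            # settles near [0.5, 0.0, 1.0]
print(leaky_integrate([1.0], dt=0.01, t_max=10.0))  # after one time constant: about 1 - 1/e
```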

In the opposite direction, a spiking version of the model is in development (Girard et al., 2017), using leaky integrate-and-fire neurons as building blocks. Preliminary results show that this new model can reproduce the results obtained with the mean-field model.
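The building block of such a spiking model, a leaky integrate-and-fire neuron, can be sketched as follows: the membrane potential obeys tau_m * dV/dt = -(V - E_L) + R I and is reset after each threshold crossing. The parameter values in this Python sketch (20 ms membrane time constant, -70 mV rest and reset, -50 mV threshold, no refractory period) are generic assumptions, not those of Girard et al. (2017). The simulated spike count is compared with the closed-form rate 1 / (tau_m ln(R I / (R I - (V_th - E_L)))), valid when the reset equals the resting potential.

```python
import math


def lif_spike_count(drive_mv, t_max=1000.0, dt=0.1, tau_m=20.0,
                    e_leak=-70.0, v_thresh=-50.0, v_reset=-70.0):
    """Spikes emitted in t_max ms by one leaky integrate-and-fire neuron (Euler scheme).

    tau_m * dV/dt = -(V - e_leak) + drive_mv, where drive_mv stands for R * I (in mV).
    """
    v = e_leak
    n_spikes = 0
    for _ in range(int(round(t_max / dt))):
        v += dt / tau_m * (-(v - e_leak) + drive_mv)
        if v >= v_thresh:  # threshold crossing: emit a spike, then reset
            n_spikes += 1
            v = v_reset
    return n_spikes


def lif_rate_theory(drive_mv, tau_m=20.0, e_leak=-70.0, v_thresh=-50.0):
    """Closed-form rate (Hz) for constant drive, reset to e_leak and no refractory period."""
    gap = v_thresh - e_leak
    if drive_mv <= gap:
        return 0.0  # the membrane settles below threshold: no spikes
    return 1000.0 / (tau_m * math.log(drive_mv / (drive_mv - gap)))


for drive in (15.0, 30.0):
    print(drive, lif_spike_count(drive), round(lif_rate_theory(drive), 1))
```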

Behavioral models of reinforcement learning

Context

The link established between the activity of dopaminergic neurons during Pavlovian (and later, instrumental) learning in mammals (Schultz et al., 1997) and the reward prediction error, a variable central to the independently developed temporal-difference learning algorithms (Sutton & Barto, 1998), led to the idea that reinforcement learning in mammals can be abstracted by such AI-derived models. These models require very simple computations at each timestep; however, they probably do not explain all the reinforcement learning capabilities of mammals, as they are slow to learn (many repetitions of the same task are necessary to learn the correct course of action) and even slower to re-learn (if something changes in the environment, like the location of the exit in a maze), and thus lack adaptability.
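The temporal-difference idea can be illustrated with tabular TD(0), where the reward prediction error delta = r + gamma V(s') - V(s) drives the update V(s) <- V(s) + alpha delta. The Python sketch below uses a hypothetical five-state chain (a cue state followed by intermediate states, with a reward of 1 on the last transition); the chain length, the discount gamma = 0.9 and the learning rate alpha = 0.1 are illustrative assumptions. Each step only needs a few arithmetic operations, but the value of the cue state needs many episodes to approach its asymptote gamma^4.

```python
import numpy as np


def td0_chain(n_states=5, gamma=0.9, alpha=0.1, n_episodes=2000):
    """Tabular TD(0) on a deterministic chain: state 0 (cue) -> ... -> reward -> end.

    Returns the learned values and the prediction errors (episodes x states).
    """
    values = np.zeros(n_states + 1)  # last entry: terminal state, its value stays 0
    errors = np.zeros((n_episodes, n_states))
    for episode in range(n_episodes):
        for s in range(n_states):
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0  # reward on the last transition
            delta = reward + gamma * values[s_next] - values[s]  # reward prediction error
            values[s] += alpha * delta
            errors[episode, s] = delta
    return values[:n_states], errors


v_early, _ = td0_chain(n_episodes=20)
v_late, errors = td0_chain()
print(v_early.round(3), v_late.round(3))
print(errors[0].round(3), errors[-1].round(3))
```

In the first episode the prediction error occurs entirely at reward time; after learning it has vanished there, as the reward has become predicted.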

These "temporal-difference" models are also known as "model-free", as they do not attempt to learn the causal structure of the environment independently from reward (i.e. learn to predict in which state you end up if you perform a given action in a given state). Learning the causal structure of the environment (a world model) gives the ability to plan future courses of action in a more flexible manner, using model-based reinforcement learning, at the cost of extra computations. It has been argued, since Daw et al. (2005), that mammals have model-based reinforcement learning capabilities, and that while habits would correspond to behavior driven by a model-free system, goal-directed behavior would result from the expression of a model-based system. This of course raises the question of the coordination between these two complementary learning systems (the computationally fast but inflexible one vs. the slow but adaptable one).
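The practical difference between the two systems shows up in outcome devaluation, a classic test used to tell habits from goal-directed behavior. In the Python sketch below (a hypothetical two-route task, not a model from the cited papers), a Q-learning agent caches action values, while a model-based agent stores the transitions and rewards it has observed and replans by value iteration whenever it chooses. After both have learned that route 0 leads to the larger reward, that reward is devalued, and the new outcome is experienced only by placing the agents directly in state 1. The discount factor, learning rate and numbers of episodes are arbitrary.

```python
GAMMA = 0.9
# Hypothetical toy task: from state 0, action 0 leads to state 1 and action 1 to state 2;
# states 1 and 2 offer a single action (0) that ends the episode with a reward.
NEXT = {(0, 0): 1, (0, 1): 2, (1, 0): None, (2, 0): None}
ACTIONS = {0: [0, 1], 1: [0], 2: [0]}


class ModelFree:
    """Q-learning: cached action values, updated only along experienced transitions."""

    def __init__(self, alpha=0.5):
        self.q = {sa: 0.0 for sa in NEXT}
        self.alpha = alpha

    def update(self, s, a, r, s_next):
        future = 0.0 if s_next is None else max(self.q[(s_next, b)] for b in ACTIONS[s_next])
        self.q[(s, a)] += self.alpha * (r + GAMMA * future - self.q[(s, a)])

    def choose(self, s):
        return max(ACTIONS[s], key=lambda a: self.q[(s, a)])


class ModelBased:
    """Learns a deterministic world model and replans by value iteration at choice time."""

    def __init__(self):
        self.next, self.reward = {}, {}

    def update(self, s, a, r, s_next):
        self.next[(s, a)] = s_next
        self.reward[(s, a)] = r  # deterministic world: keep the last observed reward

    def choose(self, s, n_sweeps=10):
        q = {sa: 0.0 for sa in self.next}
        for _ in range(n_sweeps):
            for (s1, a1), s2 in self.next.items():
                future = 0.0 if s2 is None else max(q.get((s2, b), 0.0) for b in ACTIONS[s2])
                q[(s1, a1)] = self.reward[(s1, a1)] + GAMMA * future
        return max(ACTIONS[s], key=lambda a: q.get((s, a), 0.0))


def step(agents, s, a, rewards):
    """Apply action a in state s and let every agent learn from the transition."""
    s_next = NEXT[(s, a)]
    r = rewards.get((s, a), 0.0)
    for agent in agents:
        agent.update(s, a, r, s_next)
    return s_next


mf, mb = ModelFree(), ModelBased()
rewards = {(1, 0): 1.0, (2, 0): 0.5}
for _ in range(50):  # training: both routes are sampled
    for first in (0, 1):
        s = step([mf, mb], 0, first, rewards)
        step([mf, mb], s, 0, rewards)
print(mf.choose(0), mb.choose(0))  # 0 0: both prefer the route to the larger reward

rewards[(1, 0)] = 0.0  # devaluation, experienced from state 1 only
for _ in range(10):
    step([mf, mb], 1, 0, rewards)
print(mf.choose(0), mb.choose(0))  # 0 1: only the model-based agent switches
```

The cached values of the model-free agent keep favoring the devalued route until it is re-experienced from the start state, while the model-based agent adapts immediately, at the cost of replanning at each choice.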

Contributions

  • Navigating with multiple learning systems

We have shown that the behavior of rats in various navigation tasks can be explained by the interaction of model-based and model-free reinforcement learning systems (Dollé et al., 2010). We proposed that they are coordinated by a third system (a model-free one) that learns to choose the right system in the right context.

We have proposed to import this idea of ensemble reinforcement learning into robotics, by showing the efficiency of the Dollé et al. model in a robotic navigation task (Caluwaerts et al., 2012). This extended our older robotic navigation work (Girard et al., 2005), where two navigation systems interacted, but with a fixed and predetermined priority. We now try to keep this ensemble reinforcement learning idea imported from neuroscience, while looking for the most efficient robotic arbitration mechanism rather than sticking to the sometimes inefficient behavior of rats (Chatila et al., 2018).
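A third, model-free system that learns which expert to trust in which context can be sketched as a contextual bandit. The Python sketch below is a hypothetical reduction, not the Dollé et al. (2010) implementation: each expert (e.g. one of the navigation systems) is replaced by a fixed table of the rewards obtained when it drives behavior in each context, and the meta-controller learns one value per context and expert, with epsilon-greedy exploration and a delta-rule update. The outcome table, learning rate and exploration rate are illustrative assumptions.

```python
import numpy as np


def train_meta_controller(outcomes, n_trials=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn, for each context, which expert (learning system) to follow.

    outcomes[c][k] is the reward obtained when expert k drives behavior in context c
    (a hypothetical, deterministic stand-in for full navigation experts).
    """
    rng = np.random.default_rng(seed)
    n_contexts, n_experts = len(outcomes), len(outcomes[0])
    q = np.zeros((n_contexts, n_experts))
    for t in range(n_trials):
        c = t % n_contexts  # contexts presented in alternation
        if rng.random() < epsilon:
            k = int(rng.integers(n_experts))  # explore: try a random expert
        else:
            k = int(np.argmax(q[c]))  # exploit: best expert so far in this context
        reward = outcomes[c][k]
        q[c, k] += alpha * (reward - q[c, k])  # model-free (bandit-like) value update
    return q


# Hypothetical outcome table: expert 0 pays off in context 0, expert 1 in context 1.
q = train_meta_controller([[1.0, 0.0], [0.0, 1.0]])
print(np.argmax(q, axis=1))  # expected: [0 1]
```

In each context, the meta-controller ends up delegating control to the expert that actually pays off there.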

  • Beyond reinforcement learning

We have shown that this habitual vs. goal-directed dichotomy extends beyond model-based vs. model-free reinforcement learning (Viejo et al., 2015). Indeed, we managed to explain the behavior of humans in a task that requires short-term memory capabilities rather than planning into the future, by combining a model-free reinforcement learning algorithm with a newly developed model of working memory. The coordination criterion we proposed, based on a progressive processing of memorized information until a sharp decision can be made, explains both the subjects' choices and their reaction times.
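One way to formalize a progressive processing of memorized information until a sharp decision can be made is an entropy-based stopping rule: memorized items are combined one at a time into a distribution over actions, and processing stops as soon as the entropy of that distribution falls below a threshold, the number of processed items serving as a reaction-time proxy. The Python sketch below is an illustrative assumption, not the Viejo et al. (2015) model: it leaves out the model-free component, and the likelihood vectors and the 0.5-bit threshold are made up.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def process_memory(items, n_actions, threshold=0.5):
    """Combine memorized items one at a time until the action distribution is sharp enough.

    items: sequence of likelihood vectors over actions (hypothetical memory content).
    Returns the chosen action and the number of items processed (a reaction-time proxy).
    """
    belief = np.full(n_actions, 1.0 / n_actions)
    n_processed = 0
    for item in items:
        if entropy_bits(belief) <= threshold:
            break  # sharp enough: stop processing and respond
        belief = belief * np.asarray(item, dtype=float)
        belief /= belief.sum()
        n_processed += 1
    return int(np.argmax(belief)), n_processed


weak = [[1, 1, 4, 1]] * 5      # each item only mildly favors action 2
strong = [[1, 1, 100, 1]] * 5  # each item strongly favors action 2
print(process_memory(weak, 4), process_memory(strong, 4))  # (2, 3) (2, 1)
```

Weakly informative memories require more processing steps before the decision becomes sharp, hence longer predicted reaction times.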

References

  • Albin, R. L., Young, A. B., & Penney, J. B. (1989). The functional anatomy of basal ganglia disorders. Trends in neurosciences, 12(10), 366-375.
  • Caluwaerts, K., Staffa, M., N’Guyen, S., Grand, C., Dollé, L., Favre-Félix, A., ... & Khamassi, M. (2012). A biologically inspired meta-control navigation system for the Psikharpax rat robot. Bioinspiration & biomimetics, 7(2), 025009.
  • Chatila, R., Renaudo, E., Andries, M., Chavez-Garcia, R. O., Luce-Vayrac, P., Gottstein, R., ... & Khamassi, M. (2018). Toward Self-Aware Robots. Frontiers in Robotics and AI, 5, 88.
  • Chetrit, J., Taupignon, A., Froux, L., Morin, S., Bouali-Benazzouz, R., Naudet, F., Kadiri, N., Gross, C. E., Bioulac, B., & Benazzouz, A. (2013). Inhibiting subthalamic D5 receptor constitutive activity alleviates abnormal electrical activity and reverses motor impairment in a rat model of Parkinson’s disease. The Journal of Neuroscience, 33(37), 14840-14849.
  • Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature neuroscience, 8(12), 1704.
  • Girard, B., Filliat, D., Meyer, J. A., Berthoz, A., & Guillot, A. (2005). Integration of navigation and action selection functionalities in a computational model of cortico-basal-ganglia–thalamo-cortical loops. Adaptive Behavior, 13(2), 115-130.
  • Girard, B., Heraiz-Bekkis, D., & Doya, K. (2017). A spiking model of the monkey basal ganglia without segregated pathways, but with emerging selection capabilities and Parkinson-like oscillations. In Seventh International Symposium on Biology of Decision Making. Bordeaux, France. (poster)
  • Lévesque, M., & Parent, A. (2005). The striatofugal fiber system in primates: a reevaluation of its organization based on single-axon tracing studies. Proceedings of the National Academy of Sciences, 102(33), 11888-11893.
  • Liénard, J., & Girard, B. (2014). A biologically constrained model of the whole basal ganglia addressing the paradoxes of connections and selection. Journal of Computational Neuroscience, 36(3), 445-468.
  • Liénard, J., Cos, I., & Girard, B. (2017). Beta-band oscillations without pathways: the opposing roles of D2 and D5 receptors. bioRxiv preprint.
  • McHaffie, J. G., Stanford, T. R., Stein, B. E., Coizet, V., & Redgrave, P. (2005). Subcortical loops through the basal ganglia. Trends in neurosciences, 28(8), 401-407.
  • Nadjar, A., Brotchie, J., Guigoni, C., Li, Q., Zhou, S., Wang, G., Ravenscroft, P., Georges, F., Crossman, A., & Bezard, E. (2006). Phenotype of striatofugal medium spiny neurons in parkinsonian and dyskinetic nonhuman primates: a call for a reappraisal of the functional organization of the basal ganglia. The Journal of Neuroscience, 26(34), 8653-8661.
  • Parent, A., Charara, A., & Pinault, D. (1995). Single striatofugal axons arborizing in both pallidal segments and in the substantia nigra in primates. Brain research, 698(1), 280-284.
  • Rommelfanger, K. S., & Wichmann, T. (2010). Extrastriatal dopaminergic circuits of the basal ganglia. Frontiers in neuroanatomy, 4.
  • Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599.
  • Viejo, G., Khamassi, M., Brovelli, A., & Girard, B. (2015). Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning. Frontiers in behavioral neuroscience, 9, 225.