Thursday, May 03, 2007

Embedded neuromodulation

This post gives a summary of an article I've just read. It is mainly a note for myself.
The article can be downloaded here

The article describes experiments with an autonomous Khepera robot. Sporns et al. have modeled a neuromodulatory system that resembles the dopamine reward system in the brain. This system influences the plasticity of the robot. Whenever an unexpected reward occurs, the dopamine system becomes active, which leads to 'value-dependent learning': the sensorimotor connections are directly affected (normal stimulus-response associations are formed), and the neuromodulatory system itself is affected as well.
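
To make the idea of 'unexpected reward' concrete for myself, here is a toy sketch in Python. It is not the model from the article: the function names, the simple running prediction and the learning rate are all my own assumptions. It only shows how a value signal could respond to reward that exceeds what was predicted, and how the prediction itself could adapt, so that a repeated, predictable reward triggers less and less activity.

```python
def value_signal(reward, predicted_reward):
    """Phasic value signal: nonzero only when reward exceeds the prediction.

    Hypothetical helper, not taken from the article.
    """
    return max(0.0, reward - predicted_reward)


def run_trials(rewards, learning_rate=0.5):
    """Feed a sequence of rewards and track the value signal per trial.

    The prediction is a simple running estimate (an assumption); it moves
    toward each received reward, so expected rewards stop being surprising.
    """
    predicted = 0.0
    signals = []
    for reward in rewards:
        signals.append(value_signal(reward, predicted))
        # The value system itself adapts: update its reward prediction.
        predicted += learning_rate * (reward - predicted)
    return signals, predicted


# Five identical rewards in a row: the signal shrinks on every trial.
signals, predicted = run_trials([1.0] * 5)
print(signals)
```

With identical rewards the signal is largest on the first trial and then fades, which is the sense in which only unexpected reward activates the system in this sketch.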

QUOTE from a related article
Value signals combine temporal specificity
(they are phasic and short-lasting) with spatial uniformity
(they affect widespread projection regions and act as a
single global signal). Value enters into traditional Hebbian-type
synaptic rules as a third factor, in addition to factors
representing pre- and postsynaptic activity. Because of their
phasic nature, value signals effectively gate plasticity, in
addition to influencing its magnitude and direction (see
below). Value affects plasticity more or less uniformly
throughout the widespread cortical and subcortical regions
to which value systems project.
ENDQUOTE
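
The quote describes value as a third factor in a Hebbian rule. As a note to myself, here is a minimal Python sketch of what such a three-factor update could look like. The exact rule used by the authors is not given in this post, so the multiplicative form, the function name and the learning rate are assumptions of mine. The weight change is the product of presynaptic activity, postsynaptic activity and a single global value signal.

```python
def three_factor_update(weights, pre, post, value, learning_rate=0.1):
    """Return new weights after one value-gated Hebbian step.

    weights[i][j] connects presynaptic unit j to postsynaptic unit i.
    Assumed rule: dw[i][j] = learning_rate * value * post[i] * pre[j].
    The same scalar `value` is applied to every synapse (spatial uniformity).
    """
    return [
        [w + learning_rate * value * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]   # only the first input unit is active
post = [0.0, 1.0]  # only the second output unit is active

# Without a value signal nothing changes, even though pre and post are co-active.
unchanged = three_factor_update(weights, pre, post, value=0.0)

# With a value signal, only the co-active synapse (post 1, pre 0) is strengthened.
changed = three_factor_update(weights, pre, post, value=1.0)
print(unchanged, changed)
```

In this sketch a value of zero blocks all learning, which is the 'gating' the quote talks about, while the size and sign of a nonzero value scale the magnitude and direction of the change.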

The interesting thing is that they put this system in a real environment with objects. (I do not completely grasp the system at the moment, but at any rate it resembles a sensorimotor system that gets 'laden' with internal, bodily based 'value' depending on reward, which is a bit like Damasio, i.e. embodiment.) The behavior of the robot in the world influenced its subsequent inputs: at the beginning, the reward-giving objects (the red objects) were dispersed quite homogeneously in the environment, but the robot's behavior led to the formation of clusters of red objects. As a result, at first there was a quite predictable timing/rhythm in which the robot would detect a red object visually and then grab it (feeling it with a touch sensor and thereby receiving the reward, which was supposed to model 'tasting/eating' it). Later on, all red objects were clustered, so after an initial delay the robot would suddenly get massive amounts of reward in short time intervals.

QUOTE from the article:
Our experiments document a progressive alteration of an
environmental variable (the spatial distribution of reward
throughout the environment) due to the behavioral activity
of the robot. This alteration, in turn, has consequences on
synaptic patterns encoding predictions about the
occurrence of future rewards.
It is especially noteworthy that the differences between
early and late phases in experiments with high object
densities are neither the result of purposeful
rearrangements of the environment by either robot or
experimenter, nor are they due to the adjustment of
“internal” variables over time such as learning rates, cell
response functions, or motor variables. Instead they are the
outcome of the coupling between brain, body and
environment. This coupling is strongly reciprocal.
Behavior affects the statistics of reward timings which
drive synaptic plasticity through activation of a
neuromodulatory system. In turn, synaptic changes alter
the coupling between visual and motor units which affects
behavior.
ENDQUOTE

Here they even suggest a possible role for embeddedness (i.e. reshaping your own environment) in the emergence of addiction:

QUOTE from the article
The experiments discussed in this paper may shed light
on the activity and functional role of neuromodulatory
systems (in particular, dopamine) in the course of
“natural”, self-guided behavior. The “attractive force”
exerted by clusters of rewarding objects, resulting in
restricted trajectories of robot movement and navigation as
well as repeated “rapid-fire” sequences of reward
encounters are especially intriguing. Disruptions of the
neurobiological bases of reward processing are thought to
form a major cause for lasting behavioral changes and,
eventually, chronic disease (addiction) in humans. Our
results suggest the hypothesis that a pattern of persistent
reward-seeking behavior may in part be generated as a
result of a progressive reshaping of the environment
coupled with long-lasting synaptic changes in specific
neural structures. Future experiments will investigate this
hypothesis in detail.
ENDQUOTE

For me this article shows that neuromodulation (embodiment) and embeddedness can be part of a larger perspective in which brain, body and world form a tightly coupled system. In such a system the causal work depends on the interaction between world events (behavior that reshapes the environment) and internal modulatory signals (reward leading to changes in synaptic connectivity and hence in the speed/ease of learning). The article shows how this could work out in practice.
