Sunday, November 1, 2009

Operant conditioning, Reward pathway, Reinforcement learning - Beginner's directions to Artificial General Intelligence, Part 2

In Bulgarian - the second part of the Introduction to Strong AI/Universal AI.

Conditioned reflexes, operant conditioning, reinforcement learning, dopamine, addictive behaviour and more - Directions for beginners in Artificial General Intelligence. Psychology, adaptive systems, philosophy, economics. Coming soon.

Part 1 - References to important researchers in Artificial General Intelligence

Part 2 - Classical conditioning, Operant conditioning, Reward pathway, Reinforcement learning


In this article I'll emphasize several important concepts in psychology, human and animal behaviour, adaptive systems and learning. They also have something to do with utilitarianism in philosophy and economics.

1. Conditioning and Operant conditioning

Classical conditioning goes back to I. Pavlov's famous experiment on the conditional reflex in dogs. Operant conditioning, formulated by Skinner within his behaviourism, goes much further by including predictive conditioning, which I would say implies will and goal-driven behaviour. Rats or pigeons learn to press a lever when they notice that after this action they receive a reward - food or even direct electrical stimulation of a pleasure centre in the brain.
Reinforcement - in operant conditioning, reinforcement occurs when an event following a response increases the probability of that response occurring in the future.
I.e. when the rat gets food after pressing the lever, it's more likely to press it again. If it presses the lever many times, expecting food but not receiving any, this behaviour is going to be inhibited and eventually forgotten (extinguished). The rat adapts. First it adapts to the fact that pressing the lever brings it a reward, then it readapts to the fact that this behaviour is no longer rewarding and that it's better to seek or focus on another one.
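The adaptation and readaptation described above can be sketched with a very small learning rule. This is only an illustration under my own assumptions, not a model taken from Pavlov or Skinner: the "response strength" is a single number, the function names and the learning rate of 0.2 are hypothetical, and the update simply moves the strength a fraction of the way toward the reward just received.

```python
def update_strength(strength, reward, learning_rate=0.2):
    """Move the response strength a fraction of the way toward the received reward."""
    return strength + learning_rate * (reward - strength)


def run_trials(strength, rewards, learning_rate=0.2):
    """Apply the update once per lever press and record the strength after each press."""
    history = []
    for reward in rewards:
        strength = update_strength(strength, reward, learning_rate)
        history.append(strength)
    return history


# Phase 1: every lever press is followed by food (reward = 1.0).
acquisition = run_trials(0.0, [1.0] * 20)

# Phase 2: the lever stops delivering food (reward = 0.0).
extinction = run_trials(acquisition[-1], [0.0] * 20)

print("strength after rewarded presses:  ", round(acquisition[-1], 3))
print("strength after unrewarded presses:", round(extinction[-1], 3))
```

In the first phase the strength climbs toward 1, and in the second it decays back toward 0, which mirrors the rat first learning to press the lever and then giving the behaviour up.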

2. Reward pathway

The reward pathway is a mechanism in the brain linked with learning and with repetitive and goal-driven behaviours. It involves the neurotransmitter dopamine.

Addictive, repetitive and goal-driven behaviours often act on this mechanism, either through taking drugs or through dopamine and other "natural drugs" that the brain produces when you put yourself into particular behavioural patterns.

3. Love and post-traumatic disorders in terms of conditioning

For example, love can be seen as an addiction to an operantly conditioned stimulus: the lover finds in the other person a source of very big rewards - sex, care, emotional support, fun, responsiveness to any of his needs, etc.

Strongly conditioned stimuli are not only wanted but also expected to happen, because they have been reinforced many times and have given very big rewards. This is the reason for the tragedy when the other lover lets you down or breaks up suddenly - she is causing a shocking and undesirable change in the expected pattern of rewards.

Post-traumatic stress disorders are also related to conditioning. When somebody suffers a painful experience, especially at a young age, the brain may go wrong in making connections (conditioning) between many stimuli and the bad feelings of the traumatic situation. Every time the sufferer is in a situation with even tiny similarities to it, the brain may start making wrong predictions, expecting undesirable outcomes like those in the traumatic situation. All this results in fear, pain and avoidance of such conditions.

4. High Level

At a high level, these concepts are related to the principle that intelligent systems predict future stimuli through extrapolation/projection/simulation of past stimuli (experiences).

Intelligence is also a mechanism that aims at maximizing the probability of desirable/wanted stimuli and minimizing the probability of unwanted ones. The concepts of will and goal-driven behaviour come in at this point.
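As a rough sketch of these two ideas together, the toy agent below predicts outcomes from past experience and then picks the action most likely to lead to a desirable stimulus. Everything here is an assumption made for illustration: the class name `OutcomePredictor`, the action labels and the choice of simple counting with add-one smoothing are mine, and real intelligent systems would need far richer prediction than counting.

```python
from collections import defaultdict


class OutcomePredictor:
    """Estimates, from past experience, how likely each action is to give a desirable outcome."""

    def __init__(self):
        # action -> [number of desirable outcomes, total number of tries]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, action, desirable):
        """Record one experience: the action taken and whether the result was wanted."""
        record = self.counts[action]
        record[0] += int(desirable)
        record[1] += 1

    def probability(self, action):
        """Predicted chance of a desirable outcome (add-one smoothing for untried actions)."""
        desirable, total = self.counts[action]
        return (desirable + 1) / (total + 2)

    def best_action(self, actions):
        """Goal-driven choice: the action with the highest predicted chance of reward."""
        return max(actions, key=self.probability)


predictor = OutcomePredictor()
for outcome in [True] * 8 + [False] * 2:
    predictor.observe("press_lever", outcome)
for outcome in [False] * 5:
    predictor.observe("groom", outcome)

print(predictor.best_action(["press_lever", "groom"]))
```

The past experiences are projected onto the future as probabilities, and the choice of action is what I call will or goal-driven behaviour in its simplest form.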

5. Self-control and long-term goals vs short-term ones

It is worth thinking about these concepts also when considering your own behaviour and how to change it, for example when weighing longer-term goals against shorter-term ones. Shorter-term goals may give you an immediate rewarding response, thus throwing you into a "local maximum" of pleasure and preventing you from finding a higher local maximum or even the "global maximum". The global maximum may be reachable only after passing through some "minima" - activities that are unrewarding in the short term - which a shorter-term reward strategy may not accept.
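The difference between the two strategies can be shown on a made-up "pleasure landscape". This is a minimal sketch under my own assumptions: the list of numbers, the function name `climb` and the idea of a `horizon` (how far ahead the strategy is willing to look) are hypothetical and only serve to illustrate local versus global maxima.

```python
def climb(landscape, start, horizon=1):
    """Repeatedly move to the best position within `horizon` steps, until nothing better is in view."""
    position = start
    while True:
        lo = max(0, position - horizon)
        hi = min(len(landscape), position + horizon + 1)
        best = max(range(lo, hi), key=lambda i: landscape[i])
        if landscape[best] <= landscape[position]:
            return position  # no visible improvement: stop here
        position = best


# Reward available at each position; the dip at index 3 is the unrewarding "minimum".
landscape = [1, 3, 2, 0, 4, 9, 5]

short_term = climb(landscape, start=0, horizon=1)
long_term = climb(landscape, start=0, horizon=4)

print("short-term strategy stops at reward", landscape[short_term])
print("long-term strategy stops at reward", landscape[long_term])
```

The short-sighted strategy settles on the first small peak because every neighbouring step looks worse, while the strategy with the longer horizon is willing to cross the dip and ends up at the highest peak.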

...To be continued...
