Wednesday, December 16, 2009

Reinforcement learning - comments on Ben Goertzel's blog article

About: "Reinforcement Learning: Some Limitations of the Paradigm"


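Bulgarian readers could also check out my article: Анализ на смисъла на изречение въз основа на базата знания на действаща мислеща машина. Мисли за смисъла и изкуствената мисъл. (In English: "Analysis of the meaning of a sentence based on the knowledge base of an operating thinking machine. Thoughts on meaning and artificial thought.")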

...

You're right that if the system can alter or "re-wire" the circuitry that computes its "reward", this confuses the mechanism, but I believe humans actually can do that: a person's values change during their lifetime, and even during the decision processes themselves.

I think "reward" is viewed in too narrow a sense, because the mind is not so solid and... single-minded as to have one single type of reward. You've spoken about "sub-selves" in the blog; I would say "virtual control units" that take control over the body.

The body is what makes the mind look uniform, even if it's not: the mind can want 1000 things, but the body can't do them all at the same time.

I've speculated on that in an old article of mine, but it's in Bulgarian; I have to translate it into English (and eventually cite it).

E.g. let's consider a boy who is hesitating over whether to eat a chocolate or not. Eating it promises an immediate, high taste reward in the near future, and if the boy plans only 1 minute ahead, this is the right decision.

However, what if the boy widens the prediction period to one year? He remembers the pain from his visits to the dentist and recalls the warning that eating too much chocolate causes bad teeth and pain. So if he plans for one year ahead, recalls this, and decides that it will happen (he can't be sure), then the highly rewarding decision stops being rewarding in the equation and the boy doesn't take it.
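
A minimal sketch of this example in Python may make the "equation" more concrete. All the numbers (the taste reward, the size of the pain, its delay and its probability) are made-up assumptions for illustration only, and the function names are hypothetical; the only point is that the same action changes sign when the planning horizon or the believed probability changes.

```python
# Toy model of the chocolate example. Every constant is an assumed,
# illustrative value, not a measurement of anything.
TASTE_REWARD = 5.0       # immediate pleasure from eating the chocolate
DENTIST_PAIN = -20.0     # pain if the bad-teeth scenario really happens
PAIN_DELAY_DAYS = 180    # assumed delay before the pain arrives
PAIN_PROBABILITY = 0.5   # the boy's belief that it will happen


def expected_reward(horizon_days: float,
                    pain_probability: float = PAIN_PROBABILITY) -> float:
    """Expected reward of eating, counting only events inside the horizon."""
    total = TASTE_REWARD
    if horizon_days >= PAIN_DELAY_DAYS:
        # The pain is only "visible" if the boy predicts far enough ahead.
        total += pain_probability * DENTIST_PAIN
    return total


def decide(horizon_days: float,
           pain_probability: float = PAIN_PROBABILITY) -> str:
    """Eat if the expected reward beats not eating (assumed reward 0)."""
    return "eat" if expected_reward(horizon_days, pain_probability) > 0 else "skip"


one_minute = 1 / (24 * 60)                  # one minute, in days
print(decide(one_minute))                   # eat
print(decide(365))                          # skip
print(decide(365, pain_probability=0.1))    # eat
```

With a one-minute horizon the boy eats; with a one-year horizon he doesn't; and if he plans a year ahead but barely believes the pain will happen, he eats again.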

Overall, the maximum reward depends on the set of predicting sub-units, scenarios and values taken into account at the very moment of decision, and on the period of time they predict ahead.

These change depending on attention, context, mood or even chance; there are so many scenarios that the brain can think of.

The set of predicting sub-units may change, and they can switch between different levels of the hierarchy and different prediction periods, while the behaviour still remains reward-driven. It is reward-driven for the particular "reward-driven virtual unit that took control over the body at this very moment".
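
As a rough illustration, here is a toy sketch of two such "virtual control units". The unit names, the actions, the reward values and the rule for which unit takes control are all hypothetical assumptions of mine, not a model of any real brain or of any existing RL library. Each unit is fully reward-driven on its own terms, yet the chosen action depends on which unit controls the body.

```python
from typing import Dict, List

ACTIONS: List[str] = ["eat_chocolate", "study", "play_outside"]

# Each hypothetical unit scores the same actions by its own values.
UNITS: Dict[str, Dict[str, float]] = {
    "taste_seeker": {"eat_chocolate": 5.0, "study": -1.0, "play_outside": 2.0},
    "long_term_planner": {"eat_chocolate": -5.0, "study": 4.0, "play_outside": 1.0},
}


def unit_in_control(context: str) -> str:
    """Toy rule (an assumption): the context decides which unit takes over."""
    if context == "remembers_dentist":
        return "long_term_planner"
    return "taste_seeker"


def act(context: str) -> str:
    """Pick the action that maximizes reward for the unit currently in control."""
    rewards = UNITS[unit_in_control(context)]
    return max(ACTIONS, key=lambda action: rewards[action])


print(act("hungry"))             # eat_chocolate
print(act("remembers_dentist"))  # study
```

Both calls maximize a reward, but not the same one, so asking for the single best action of the whole system has no answer without saying which unit is in control.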

This implies that the mind is not uniform and that there is no single "greatest reward".

...

And another comment on another comment:

I mean, e.g. if you're 18, you may think that having casual sex right now is "cool". When you are 28, you may think it's not, even though sex would still bring you immediate pleasure; however, higher-level controls would inhibit your urge for lower-level rewards and generally shift your behaviour towards higher-level rewards.

My point is that reward is not only dopamine, endorphins and so on. The higher the intelligence, the more abstract the reward can be. A reward is whatever the one who receives it considers "a reward", and even altruism could be taken as egoism, because one is doing what is "good" according to one's own values and desires, and sometimes that goes against what the other person wants.

E.g. when a lover dies to save his beloved. Is it really altruism? How will she feel seeing him die? Wouldn't she prefer that they die together?

And if you're doing something "against yourself", isn't it to prevent something that you consider worse? When you feel a moral responsibility to do something, the fear of not fulfilling your duty might be bigger than the fear of pain.

...

Keywords (translated from Bulgarian): reinforcement learning, machine learning, artificial intelligence, strong AI, Ben Goertzel
