Tuesday, February 9, 2010


Motivation is dependent on local and specific stimuli, not general ones. Pleasure and displeasure as goal-state indicators. Reinforcement learning.

Analysis of the meaning of a sentence, based on the knowledge base of an operational thinking machine. Reflections on meaning and artificial intelligence.

Part 3 of 4 - Comment #2 continues...

Part 1 (also in Bulgarian): http://artificial-mind.blogspot.com/2010/01/semantic-analysis-of-sentence.html

Part 2 (also in Bulgarian): http://artificial-mind.blogspot.com/2010/02/causes-and-reasons-for-any-particular.html

Part 4: Intelligence: search for the biggest cumulative reward for a given period ahead, based on a given model of the rewards. Reinforcement learning.

One of the milestones of my AGI research. I wrote this particular article and the comments
in Bulgarian as a 19-year-old freshman in Computer Science at Plovdiv University.

By Todor Arnaudov | 13 March 2004 @ 21:49 EET | 340 reads |
First published at bgit.net and the e-zine “Sacred Computer” in Bulgarian

Comment #2 by Todor Arnaudov | 18 March 2004 @ 21:58 EET | CONTINUES...

Let's note that when a mind is being evaluated, a "stop frame" is evaluated: the mind at this very moment. (...) And the person who holds a particular attitude is the one best placed to find out why he likes or dislikes a particular thing.

E.g. only you know, see or imagine the exact circumstances you're talking about. Every detail in a given situation is important, not only the generalized ones that can be stated in a few sentences. E.g. on the wall that you imagine, there could be other particular objects which don't match this one. And this is your taste - your like or dislike.

The purpose of this kind of "will" - liking or disliking something - is to find a reason to decide and choose when a single, unambiguous action has to be executed. If that action doesn't lead to damage (you won't be hurt whether that clock with a thermometer is put on the wall or not), then what we choose has no immediate practical consequence/meaning. For a long period ahead, we cannot predict how exactly this particular action and decision will affect us, because the future inputs are too many and too little known in advance.

So, let's assume that it doesn't matter whether the clock & thermometer combo is on the wall or not, and that the reason is just that "we like it" (for whatever reason). Even then it is possible to find a persuasive, reasonable explanation, provided we can analyze ourselves precisely enough.

It is possible that:

1. You already had a clock and a thermometer (separate) and you're a practical person who doesn't like possessing redundant stuff.
2. You don't need a thermometer anymore. (This is a precomputed reason: you once told yourself "I don't need a thermometer anymore", never questioned it afterwards, and have followed it ever since as a reason not to put a thermometer on the wall.)
3. You're in a bad mood and you would deny anything that anyone would ask you to do.
4. You don't like anybody telling you what to do, and you feel as if you are being given a command (e.g. it's a gift from your mother-in-law).
5. You just don't know why.

In general, of course, if we're searching for a reason, it's good to have as complete a model of the evaluator as possible...

Defining "meaning" in the sense of a goal/purpose or cause/reason, as a "non-contradictory link", expresses the attitude of the evaluator to the item being evaluated (the item for which a cause/reason/purpose/meaning is sought).

And it is very important to note that the evaluator determines whether there is or is not a cause/reason/meaning/purpose for him personally. "Meaning" is subjective.

The actions of thinking machines and persons can't be explained unambiguously by an external evaluator, because externally the information is very scarce [and intelligent agents' behaviour is very complex and non-linear]. The amount of information transferred between the parts/modules of the machine/human is enormously bigger than the data which is output, i.e. the externally visible behaviour.

Besides, the mind is biased and limits itself when it searches. To me, the ultimate cause of anything is the whole past: every single little difference would make the whole different. However, since it is impossible for the mind to compute all causes, it uses greedy algorithms and searches for the most direct and "plausible" explanations/causes/scenarios.

It is like differentiating in mathematics - there is raw data, a graph of values, and the function that has drawn the graph is sought. However, this is an ambiguous operation: we can guess, but we cannot always know - the causes are also ambiguous.

That's why any "differentiated" causes are meaningful only at a given depth or resolution, the one at which the search is interrupted.
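The ambiguity can be illustrated with a small Python sketch. The two generating functions and the sample points below are hypothetical, chosen only for illustration: both candidates reproduce every observed value, so an observer who stops at this resolution cannot tell which one is the "real" cause.

```python
# Hypothetical illustration: two different "causes" (generating functions)
# that reproduce exactly the same observed data.

def cause_a(x: int) -> int:
    return x * x


def cause_b(x: int) -> int:
    # The extra term is zero at x = 0, 1, 2, 3, so it is invisible there.
    return x * x + x * (x - 1) * (x - 2) * (x - 3)


def explains(candidate, xs, ys) -> bool:
    """True if the candidate function reproduces every observed value."""
    return all(candidate(x) == y for x, y in zip(xs, ys))


observed_xs = [0, 1, 2, 3]  # the resolution at which the search stops
observed_ys = [cause_a(x) for x in observed_xs]

# Both candidates fit the observations equally well.
both_fit = explains(cause_a, observed_xs, observed_ys) and explains(
    cause_b, observed_xs, observed_ys
)

# Only a new observation outside the sampled points separates them.
differ_elsewhere = cause_a(4) != cause_b(4)
```

Under these assumptions both explanations are equally "plausible" on the sampled points, and they only diverge once the search is continued past them.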

Now, let's check Konstantin's opinion once more, differently:

Konstantin: People do absurd things, which however appear to be full of deep meaning. Programmers sing - out of tune... A banker who owns millions and visits luxury restaurants once passes next to an old lady who is selling donuts; then he takes one from a dirty bag and buys a donut for 30 cents. No doubt this donut was made by a poor snotty baker, but... For the banker, this is the best food and the best thing in the world!

Where's the absurdity in a programmer singing? "A programmer" means a person, and a person usually can sing - well or badly.
What you express here is just your opinion, your disapproval.

Is there no reason/purpose for a bad singer to sing? Why? Because he's afraid of being criticized for singing, mocked... (...)

[However] This particular programmer may be singing because:
- He has fallen in love and feels happier than a moment before, and singing is a way to express a good mood.
- He is alone, and he had wanted to sing before but was too shy.
- He is drunk and his inhibitions are gone.
- He is in a karaoke bar, lonely; he saw a beautiful girl and decided that this is a meaningful way to attract her attention. He might be drunk or not.


It is impossible to enumerate all possible reasons, because of the combinatorial explosion, but given rich enough information about the circumstances, it is always possible to find plausible concrete reasons, if one wants to find them. (The one who denies usually doesn't want to find reasons.)

Konstantin: For him, this is an act filled with deep meaning, but how would you persuade a computer program? Especially if the program is counting the number of viruses and bacteria that have entered the banker's organism at that very moment. How would the banker explain it - “I felt a thrill, I remembered once when I was a child...”. From the viewpoint of the computer, this is nonsense, just a random association - especially if a serious decision about his life and career eventually grows out of this little moment; how would you explain to a computer what donuts and money have in common?

What does "a program" mean? What your words imply is "a dumb program", or one that reacts as if it doesn't understand. However, then obviously this is not a thinking machine!

I think the reason is actually very purposeful and meaningful. Every Control Unit has goals: the man recalled an event that had made him feel good, and he wanted to feel that pleasure again. He didn't think of the microbes, he didn't include them in the evaluation function, and he never counted them. If a machine is searching for a human's reason, it must put itself in his shoes and evaluate as if it were him, not as if it were a counting machine. (As I noted before, at least according to my research, "pleasure" means achieving the goal of the behaviour of the Control Unit.)

The banker recalled a reachable state that had given him pleasure in the past.
That state is a set of circumstances, feelings and possibilities for action: possible ways of changing the circumstances/perceptions/feelings.

At that very moment there weren't any other possibilities that would give him higher pleasure in the next, say, 5 seconds. Once he has bitten the donut, the search for another way to feel pleasure is cut off. The donut becomes a "master" of the mind and rules the person's current goals and behaviour. [This local reward] rules the hands, the jaw etc. so that the person feels the taste that gave him pleasure in the past, and that pleasure comes back and is felt again.

How does this temporary control over the mind and the effectors (muscles) happen?

The behaviour of a human can be represented as a complex of greedy algorithms which search for states that are local maxima - the biggest possible pleasure and the smallest possible displeasure.

Any chosen action is reasonable for the virtual greedy algorithm that has ruled over the rest in the given situation. (...)
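A minimal Python sketch of such a greedy choice follows. The `Action` type, the action names and the numeric pleasure/displeasure values are all hypothetical, invented only for illustration; the one assumption taken from the text is that the agent picks the best option among the states that are reachable right now.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    name: str
    pleasure: float      # hypothetical expected pleasure
    displeasure: float   # hypothetical expected displeasure


def greedy_choice(reachable: List[Action]) -> Action:
    """Pick the reachable action with the highest net pleasure.

    Only actions available in the current situation are considered;
    the algorithm does not look for a global optimum.
    """
    return max(reachable, key=lambda a: a.pleasure - a.displeasure)


# On the street, the luxury restaurant is simply not among the reachable actions.
on_the_street = [
    Action("keep walking", pleasure=0.1, displeasure=0.0),
    Action("buy a donut", pleasure=0.8, displeasure=0.1),
]

chosen = greedy_choice(on_the_street)
```

With these made-up numbers the donut wins, even though a much better meal exists elsewhere, because the search is limited to what is reachable at the moment.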

E.g. if one starts to eat some food and it's tasty, he doesn't spit it out after the first bite in order to search for something tastier, even if he knows that such food exists and is nearby, in the fridge; the first bite may rule us for a moment and inhibit the urge for another piece of tasty food.

However, what if the mind includes "the fear of caries" in the evaluation of the pleasure? The mixture in the mind is so complex that what exactly would "rule" depends on specific memories and specific stimuli in the recent past, and it may appear random.

If caries is included in the evaluation, the result might be negative - the chocolate is not the maximum pleasure and shouldn't be taken. Different Control Units work in parallel, all fighting for control over the effectors. And if the one which holds this fear wins, it may stop the eating operation and switch to "brush my teeth immediately".
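The competition between Control Units can be sketched as a simple arbitration. This is only one possible reading, under loud assumptions: each unit is modelled as a hypothetical function that returns a bid (urge strength, proposed action), the situation is a dictionary of made-up stimulus intensities, and the strongest bid takes the effectors.

```python
from typing import Callable, Dict, List, Tuple

Situation = Dict[str, float]
Bid = Tuple[float, str]  # (urge strength, proposed action)
ControlUnit = Callable[[Situation], Bid]


def taste_unit(situation: Situation) -> Bid:
    # Hypothetical unit driven by how tasty the current food is.
    return situation["tastiness"], "keep eating"


def caries_unit(situation: Situation) -> Bid:
    # Hypothetical unit driven by the recalled fear of caries.
    return situation["fear_of_caries"], "brush teeth immediately"


def arbitrate(units: List[ControlUnit], situation: Situation) -> str:
    """Return the action of the unit with the strongest urge."""
    bids = [unit(situation) for unit in units]
    _strength, action = max(bids, key=lambda bid: bid[0])
    return action


units = [taste_unit, caries_unit]

# Weak fear: the taste unit keeps control of the effectors.
calm = arbitrate(units, {"tastiness": 0.7, "fear_of_caries": 0.2})

# A fresh memory of the dentist raises the fear, and the other unit takes over.
worried = arbitrate(units, {"tastiness": 0.7, "fear_of_caries": 0.9})
```

The same food and the same person produce different behaviour, depending only on which specific stimuli were active in the recent past.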

Before the caries consideration, the greedy algorithm computed the biggest cumulative pleasure/reward over the next 1 second. However, the caries and the pain at the dentist forced him to look at feelings far ahead, which are assumed to be caused by the teeth and the chocolate. This assumption is important - something else may actually cause them, but the person takes this as the reason/cause!

The dental pain is a much bigger punishment than feeling neither pain nor pleasure (not eating the chocolate), so it is avoided.
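The effect of the planning period can be shown with a short Python sketch. The reward sequences and their values are hypothetical: each list holds the rewards the person predicts for successive time steps, and the predicted dental pain is, as noted above, itself only an assumed consequence.

```python
from typing import Dict, List


def cumulative_reward(predicted: List[float], horizon: int) -> float:
    """Sum the predicted rewards for the first `horizon` time steps."""
    return sum(predicted[:horizon])


def choose(options: Dict[str, List[float]], horizon: int) -> str:
    """Pick the option with the biggest cumulative reward within the horizon."""
    return max(options, key=lambda name: cumulative_reward(options[name], horizon))


# Hypothetical predictions: pleasure now, assumed pain at the dentist later.
options = {
    "eat the chocolate": [1.0, 0.0, 0.0, 0.0, 0.0, -10.0],
    "don't eat it": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}

short_sighted = choose(options, horizon=1)
far_sighted = choose(options, horizon=6)
```

With a horizon of one step the chocolate is the best action; once the horizon is long enough to include the assumed punishment, abstaining becomes the more rewarding choice.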

Turning back to the donut - this "random" link/memory/recall is not random at all!

The apparent triggers that recalled the memory are the sight of the donuts and their smell. There is also the circumstance that the banker was walking alone, thinking of something; not long ago he had visited his grandparents in his village; when he was young he loved donuts...

All these details made him want to feel this pleasure again right then.

Let's analyze the donut once more:

Konstantin: People do absurd things, which however appear to be full of deep meaning. Programmers sing - out of tune... A banker who owns millions and visits luxury restaurants once passes next to an old lady who is selling donuts; then he takes one from a dirty bag and buys a donut for 30 cents. No doubt this donut was made by a poor snotty baker, but... For the banker, this is the best food and the best thing in the world!

The banker may possess millions, but while he's walking past the old lady on the street, those millions are worthless. When one is walking on the street, he is supposed to follow the stimuli around him - reading the signs, watching the cars, traffic lights, passers-by. There isn't a luxury restaurant on every corner, and you can't purchase a Ferrari or an airplane right there.

The action of the banker only seems "meaningless", i.e. inappropriate or impossible to explain, because it was assumed that if an agent is a banker, then he should do this and that, and never this and that. This is another example of artificial self-pruning of the search for a reason/purpose, without explaining why and without an explicit cause other than prejudice.

Every human being, even every humanoid robot, can put a hand in a bag, grab the donut, pay, and is capable of informing the others that "this is the greatest thing I've ever done in my life".

There is always a possible reason to do it, if this is the best/most rewarding action that the agent has found in the current situation/planning period.

CONTINUES with part 4/4...

More keywords: Universal AI, Artificial General Intelligence (AGI), Behaviorism, Psychology, Control Unit

