Wednesday, February 10, 2010

Intelligence: a search for the biggest cumulative reward for a given period ahead, based on a given model of the rewards. Reinforcement learning.

Analysis of the meaning of a sentence, based on the knowledge base of an operational thinking machine. Reflections on meaning and artificial intelligence.

Part 4 of 4 - Comment #3

Part 1 (also in Bulgarian): Semantic analysis of a sentence. Reflections about the meaning of the meaning and the Artificial Intelligence

Part 2 (also in Bulgarian): Causes and reasons for human actions. Searching for causes. Whether higher or lower levels control. Control Units. Reinforcement learning.

Part 3 (also in Bulgarian): Motivation is dependent on local and specific stimuli, not general ones. Pleasure and displeasure as goal-state indicators. Reinforcement learning.

Part 4: Intelligence: a search for the biggest cumulative reward for a given period ahead, based on a given model of the rewards. Reinforcement learning.

One of the milestones of my AGI research. I wrote this particular article and the comments in Bulgarian as a 19-year-old freshman in Computer Science at Plovdiv University.

By Todor Arnaudov | 13 March 2004 @ 21:49 EET | 340 reads |
First published at bgit.net and the e-zine “Sacred Computer” in Bulgarian


Comment # 3 by Todor Arnaudov | 18 March 2004 @ 22:04 EET

(...)

I'm one of the "scientists" who assume that since everything in the Universe does work, there is nothing mysterious in its operation. I think that every information process in the Universe is formal and can be modeled by a von Neumann machine with enough memory, because the Universe works like a computer; and the computer is what it is, perhaps, because it follows a high and universal model - that of the Universe. Why has the essence of computers been, since the start, a CPU, memory, clock [time, synchronization] and input-output? Because this is the simplest complete model of the operation of the Universe.

[As a University freshman it was bold to seriously call myself a scientist; I was more of a philosopher. This quotes my theory of "The Universe Computer", i.e. Digital Physics.]


Everything is "formal" and "external", i.e. it could be written down with formulas. E.g. on what grounds does anyone assume that someone else is intelligent, if not on external ones? And what is "internally intelligent" - something that works like a human? What does it mean - to be built from proteins?...

Actually, a person usually says that something is "formal" if he believes that he understands the formulas, and/or if he thinks that these formulas are intelligible or superficial - just shuffling symbols, as in Searle's Chinese room - even if he himself is incapable of understanding them.

E.g. if a thinking machine is built and then taught [from a seed intelligence, a core, like a human], and it gets as complex as me or you, so that it becomes untraceable for anyone, some humans would still say that it is formal, that they understand it and that it is nothing special.

Why would they think so? Because they assume that they understand how computers work - "1 and 0, NAND, NOR; cycles, reading, writing, shifting... it's so simple, and the computer doesn't realize anything of what it computes, so even if it appears to think, it would just be calculating!"

Does the carbon atom "realize" that it's a part of a neuron?! Does each neuron "realize" that it's a part of the brain, and wouldn't it behave the same way if we took it out of the brain and fed it the same signals as in the brain? Is the neuron aware that it's a part of a mind, and does it know which thought it is a part of?

Bullshit! One thing is certain to me - humans want to feel unintelligible, mysterious, hyper-complex, because people believe that being "formal" or explainable is a bad thing. People don't want to ask hard questions, and more precisely - don't want to get irritating answers. (...)

However, what is it to be a human? I would ask, perhaps as a "self-humiliating" individual.
A newborn is one, not only a 20-30-year-old. A seed intelligence, from which the mind grows and develops.

The mysterious concept of the soul is an example of a helpless "proof" that the human is not a "formal" entity. He has a soul - actually just something that a machine, or anything else, couldn't have simply because it is a machine, which means - not a human, which means - not a member of our precious club. However, those people do not question why most believe that animals have one, but machines don't, while a lot of pretty formal reasons could be found.
(See the Teenage Theory of Universe and Mind/Intelligence and the short novel "The Truth": http://research.twenkid.com )

To me, the human is formal, and the principles in his mind, as information processes, are similar to what they could be in a thinking machine: memory and processor (merged in one), a clock (a way to separate/resolve events), input-output. In the memory there are records of possible actions, from which the person - his body - all the time selects the one that best matches the current goals for a given selected period of time ahead, based on a currently active model for checking whether the goal is reached or not.

The purpose is a GOAL: a state that the Control Unit aims at. The "thirst for a purpose/meaning" is a search for local maxima which indicate to the Control Unit how close the goal is. [E.g. - an orgasm... ;) ]

The machine, as of my current understanding [2/2004], should be made of complex enough subunits - diverse enough in their kinds and goals, and either real or virtual - which would feel "pleasure" in their particular way, i.e. they would seek to achieve their goals. And it should be impossible for all subunits to reach their goals at the same time.


Terminal devices (effectors) are needed - muscles - which can't be controlled in parallel, i.e. only one subunit can control them at a given moment, so that the external behaviour - movement, output data - is clearly defined, as with humans.

The subunits should compete and interact; make contracts and fight; sometimes collaborate in groups and "fight" against other groups, each aiming to reach maximum pleasure.
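As a toy illustration of this subunit competition (my sketch from 2010 hindsight, not from the original text; all names and numbers are invented): each subunit bids for the single shared effector according to how much "pleasure" it predicts for itself, and only the winning subunit acts at each moment, so the external behaviour stays clearly defined.

```python
# Hypothetical sketch: competing subunits bid for one shared effector;
# only the highest bidder controls it at a given moment.
# Subunit names and urge values are invented for illustration.

class Subunit:
    def __init__(self, name, urge):
        self.name = name
        self.urge = urge  # predicted pleasure if this subunit gets control now

    def bid(self):
        return self.urge

def arbitrate(subunits):
    """Grant the single effector to the subunit with the highest bid."""
    winner = max(subunits, key=lambda s: s.bid())
    return winner.name

units = [Subunit("hunger", 3.0), Subunit("curiosity", 5.0), Subunit("rest", 1.5)]
print(arbitrate(units))  # -> "curiosity" drives the effector this step
```

Groups and "contracts" could be modeled on top of this by letting subunits pool their bids, but the essential point is the serialization: one winner per moment.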



I think that's what the human does, and I think [as of 2/2004] this is "mind" [intelligence]: a search for pleasure and avoidance of displeasure by complex enough entities, for a given period ahead [prediction, planning]; the "enough-ness" is defined by the examples we already call "intelligent": humans of different ages, with different IQs, different characters and ambitions.

The behaviour of each of them could be modeled as a search for the biggest sum of pleasure (displeasure enters the sum as a negative term) for a selected period of time ahead, which is itself chosen at the moment of the computation.

"Happiness" is another word for pleasure, and as a friend of mine, a poet, once said:
"Man is a restless seeker of happiness. I don't know why it's so important for everybody to get their dose of happiness, but it is true... and the people caught in the grip of vices and addictions are a kind of deluded seekers... They are also searching for happiness, but in the wrong place".

Addictions - to drugs, but not only - are an example of Control Units finding their goal - the enormous pleasure they feel when taking a given drug - which can lead to them taking control over the whole body. "Happiness" for the body turns into an elementary happiness for the elementary control units, which sense that their goal is reached by detecting a particular kind of molecules.

PS. When I was about to write "a local maximum", I was about to add "in a close enough neighbourhood" (a very short period), influenced by the Mathematical Analysis I've been studying lately; however, the mind is much more complex than Analysis, i.e. it is based on many more linearly independent premises, which cannot be deduced from each other.

The prediction period is not required to be short, because - as in the example with the chocolate, the caries and the dentist - immediate actions can be included in functions that give results shifted far ahead in the future; at the same time, the same action can be regarded in the immediate future: in the example situation, the eater will feel ultimate pleasure within the following second.
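The chocolate example can be made concrete with a small sketch (mine, with invented reward numbers): the same candidate actions are scored as cumulative sums of predicted pleasure over two different horizons, and the preferred choice flips depending on the period selected at the moment of computation.

```python
# Hypothetical numbers: eating the chocolate gives immediate pleasure but
# a large delayed displeasure (caries, the dentist); skipping it gives a
# small immediate displeasure and a delayed benefit.

def cumulative_pleasure(predicted, horizon):
    """Sum predicted pleasure over `horizon` steps; displeasure is negative."""
    return sum(predicted[:horizon])

plans = {
    "eat_chocolate":  [+5, 0, 0, -8],
    "skip_chocolate": [-1, 0, 0, +2],
}

def best_plan(horizon):
    """Pick the plan with the biggest cumulative sum over the chosen period."""
    return max(plans, key=lambda p: cumulative_pleasure(plans[p], horizon))

print(best_plan(1))  # -> "eat_chocolate"  (only the next moment counts)
print(best_plan(4))  # -> "skip_chocolate" (the dentist enters the sum)
```

With horizon 1 the sums are +5 vs -1; with horizon 4 they become -3 vs +1, so the "rational" action depends entirely on the period chosen at decision time.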

One among many functions can be chosen or constructed, and they all can give either negative or positive results, depending on minor, apparently random details - the mood of the mind at the particular moment; which variables are at hand, which ones the mind has recalled, what the mind is thinking of right then...

(...)

Suggested reading - "The Truth", a short novel I wrote in 2002. [not translated as of 10/2/2010]



Comments from 10/2/2010:

Yes, the multi-agent stuff is reminiscent of Minsky's "Society of Mind", but I hadn't read it then and still haven't.

The drug stuff is related to dopamine, endorphins and other neurotransmitters, the "reward pathway" and so on. The discussion is about "virtual" units - Control Units - not about neurons etc., though.

Also, the last discussion is about a reinforcement learning bug; I didn't know the term "reinforcement learning" back then and didn't know what exactly behaviorism was. I knew about utilitarianism, though, and was using my imagination to find explanations.

A version of reinforcement learning was reinvented in my early works from my Teenage Theory of Universe and Mind, where the Control Unit (agent/human/society/state/system) is predicting and aiming at scenarios which would lead to the biggest cumulative pleasure for a given period ahead.

In societies and states, most of the laws have this function, and it's related to another concept - the limited level of pleasure (reward, in standard RL terms) that any control unit/subunit/agent can get in any possible circumstances. This is supposed to prevent one from gaining too much control over the system - the mind/society.
I gave the following example: when somebody robs a bank, he gets rich and can better do what he wants, but his happiness is not unlimited - it is normalized to 1 anyway. On the other hand, he is causing lasting displeasure to many people, and the sum is highly negative. The system aims to avoid robberies. This is related also to doing what one wants and what one doesn't want (another important concept in my theory of Control Units). Control Units aim to control, and losing control and predictability causes displeasure/confusion.
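The bounded-reward idea can be sketched as follows (my illustration; the cap, the number of victims and the reward values are invented): each individual's reward is clipped to a fixed range, so the robber's pleasure saturates at 1 no matter how big the loot, while the victims' bounded displeasures still accumulate in the system-level sum, making the total negative.

```python
# Hypothetical sketch: any single control unit's reward is clipped to
# [-1, 1], so no gain can make one agent "infinitely" happy, while the
# system total sums the bounded rewards of everyone affected.

def bounded(reward, cap=1.0):
    """Clip an individual reward into [-cap, +cap]."""
    return max(-cap, min(cap, reward))

def system_total(individual_rewards):
    """System-level sum of the bounded individual rewards."""
    return sum(bounded(r) for r in individual_rewards)

# A robbery: the robber's raw gain is huge, but bounded to 1;
# each of 100 victims suffers a lasting (bounded) displeasure.
robber = [1_000_000.0]
victims = [-0.3] * 100

print(system_total(robber + victims))  # -> 1.0 - 30.0 = -29.0
```

The clipping is what prevents a single unit's windfall from dominating the sum: the system as a whole "prefers" scenarios without the robbery.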

Recently I found that Marcus Hutter and Shane Legg discuss a concept similar to my limited reward - "Assuming that environments return bounded sum rewards..." - in "Universal Intelligence: A Definition of Machine Intelligence". They also define the universally intelligent agent as a seeker of a sum of rewards. However, their papers were published years later (or around the same time, I haven't checked all), and I didn't know about them at the time; I first heard of Hutter in 2009.

In 2002-2004, when I was building the foundations of my Universal AI theories and understanding, I was too young, busy and "ignorant" - I didn't know about any of the gurus such as Goertzel, Schmidhuber, Hutter, de Garis. Hawkins had just appeared with his "On Intelligence" right after I published the last, 4th part of my first works, which was written mostly in 2003.

I wrote my stuff on my own, and I still believe that "Imagination is more important than knowledge"; I can't afford to be as ignorant as back then, though.

I wasn't sure at the time whether I was a madman, because it's hard to find anybody who understands and appreciates such advanced stuff as AGI - I was 18-19 years old. Years later, starting from 2007, I began to find that my ideas - pieces of them or their directions - are shared by gurus in AGI: first Hawkins with his memory-prediction framework and "On Intelligence", then Boris Kazachenko, Jürgen Schmidhuber, Marcus Hutter.

Of course, this "Teenage Theory of Universe and Mind" is something that will be translated and shared as well.


Keywords: Reinforcement learning, multi-agent systems, control units, pleasure seeking, utilitarianism, local maximum, Todor Arnaudov's Teenage Theory of Universe and Mind, Todor Arnaudov's Teenage Theory of Universe and Intelligence, limited reward, limited pleasure, bounded reward, bounded pleasure, addiction, addiction bug of reinforcement learning, cumulative reward, cumulative pleasure, pleasure seeking, seeker, searcher, intelligent agents, Twenkid Research

http://research.twenkid.com

