Wednesday, November 14, 2007


Faults in Turing Test and Lovelace Test. Introduction of Educational Test.
(For Measuring Intelligence Of Machines)

Todor Arnaudov


This essay criticizes faults in the settings of both the Turing Test and the Lovelace Test for deciding whether a machine is intelligent or, respectively, creative. Ideas for an objective measure of machine intelligence are given, applying the human educational standards used by psychologists and teachers to grade the cognitive performance of children and students.


1. Turing Test is wrong
2. Lovelace Test, learning machines and the causal agents
2.1. Lady Lovelace doesn't believe in creative machines
2.2. Turing response to Ada - a learning machine
2.3. Lovelace Test
3. Lovelace Test is wrong, too
3.1. What is wrong in Lovelace Test
3.2. Why should art be magic?
4. Educational methods for measuring machine intelligence level
5. References

1. Turing Test is wrong

Turing Test is probably the most popular machine intelligence test ever, not least because it was the first one.

However, Turing Test is criticized for being inadequate. The author of this essay has criticized it himself, in his earliest speculative essay on the possibility of creating a thinking machine (Arnaudov, 2001).

What's wrong in Turing Test? Even if the machine behaves in a text dialogue like a human, there are all too simple ways to recognize that it is not one. For example, the machine may be too smart, too fast, too slow... A human may ask personal questions like "where was your home 10 years ago", "when did you kiss a girl for the first time", "what is your favorite food", etc. If the machine is just a box with electronics, it wouldn't have a personal life. It would have to lie in order to pass the exam.

On the other hand, there are a lot of "smart" ways for the machine to avoid answering any questions while still appearing to engage actively in conversation, pretending to be smart even if it is very dumb. ELIZA exploited this "tricky" way to pseudo-intelligence decades ago; unfortunately, current conversational agents seem to do it, too.
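The deflection trick is simple enough to sketch in a few lines. The following is a minimal ELIZA-style responder, a loose illustration rather than Weizenbaum's original rules: the patterns, responses and word list here are my own illustrative assumptions. The point is that it never answers anything; it only reflects the user's own words back or deflects.

```python
import re

# Swap first-person words for second-person ones in reflected fragments.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# Each rule pairs a pattern with a template that echoes the user's words.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def reflect(fragment: str) -> str:
    """Turn 'my program' into 'your program', etc."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    # Fallback: deflect instead of answering -- the core of the trick.
    return "Please, go on."

print(respond("I feel that my program understands me"))
# -> Why do you feel that your program understands you?
```

Nothing here models the conversation topic at all, yet the output can sustain an illusion of attentive dialogue, which is exactly the loophole in a conversation-based test.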

In brief, Turing Test sounds like a recipe for tricking someone into believing that something is true.

Actually, a truly intelligent machine could do the trick the hard way - by its imagination. Young children who lie about the candy in the famous test tend to be more intelligent and adaptive than those who don't. So if a machine is capable of making up a story and pretending to have had human experiences, this would mean that it is more intelligent than the naive machines that cannot.

Indeed, if it is capable of imagining scenarios of its own "human" life, then it is creative. And creativity should be a more appropriate measure of intelligence, shouldn't it?

2. Lovelace Test (LT), learning machines and the causal agents

Bringsjord, Bello and Ferrucci proposed the "Lovelace Test" (2000), which is based on the creative abilities of an artificial agent. At first glance this test sounds better than Turing Test, but it has misleading "mystic" rules, which will be questioned briefly in this chapter and in more detail in the next one.

2.1. Lady Lovelace doesn't believe in creative machines

In the notes that Augusta Ada left about the Analytical Engine, she doubts that computers could ever be creative:

The Analytical Engine has no pretensions whatever to originate any thing. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truth. (Toole, 1992)

This statement sounds close to Searle's Chinese room, denying "real" artificial intelligence on the grounds that machines would just play with symbols, without "really" understanding anything.

This speculation sounds correct about the Analytical Engine, with its limited memory and computing power. However, it does not take into account that future machines might grow so complex that it is impossible for one person to know in detail how they really work and what precisely causes their output. Also, nowadays there are computational systems which are able to evolve in a way that theoretically could be known or predicted by an external system given all the data, but practically this is very hard or impractical - e.g. neural nets, hierarchical temporal memory, search engine data warehouses.

2.2. Turing response to Ada - a learning machine

An important feature of the learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil's behavior. This should apply most strongly to the later education of a machine arising from a child-machine of well-tried design (or program). This is in clear contrast with normal procedure when using a machine to do computations: one's object is then to have a clear mental picture of the state of the machine at each moment in the computation. This object can only be achieved with a struggle. The view that "the machine can only do what we know how to order it to do" appears strange in the face of this. (Turing 1964, p. 29)

Bringsjord et al. (2000) state that Turing's point "can be easily surmounted" and that the learning machine is "a puppet", like the artificial prose author Brutus.1 (Bringsjord & Ferrucci 1998).

Now, suppose that the child-machine Mc, on the strength of ANNs (Artificial Neural Networks), computes some function f. This function is representable in some F. You can think of F in this case as a knowledge-base. But then there is no longer any "thinking itself" going on, for if we assume a computer scientist to be in command of the knowledge-base F and the relevant deduction from it, the reasons for this scientist to declare the child-machine a puppet are isomorphic to the reasons that compel the designer of knowledge-based systems like Brutus to admit that such a system originates nothing. (Bringsjord et al. 2000)

They define the Lovelace Test:

2.3. Lovelace Test

Assume that Jones, a human AInik, attempts to build an artificial computational agent A that doesn't engage in conversation, but rather creates stories - creates in the Lovelacean sense that this system originates stories. Assume that Jones activates A and that a stunningly belletristic story o is produced. We claim that if Jones cannot explain how o was generated by A, and if Jones has no reason whatever to believe that A succeeded on the strength of a fluke hardware error, etc. (which entails that A can produce other equally impressive stories), then A should at least provisionally be regarded genuinely creative. An artificial computational agent passes LT if and only if it stands to its creator as A stands to Jones.

Def (LT) Artificial agent A, designed by H, passes LT if and only if
1. A outputs o;
2. A's outputting o is not the result of a fluke hardware error, but rather the result of processes A can repeat;
3. H (or someone who knows what H knows, and has H's resources) cannot explain how A produced o by appeal to A's architecture, knowledge-base, and core functions. (Bringsjord et al. 2000)

Bringsjord et al. discuss the so-called "Oracle-machines" and find them incapable of passing LT as well.

They conclude that there "may not be a way for a mere information-processing artifact to pass LT, because what Lovelace is looking for may require a kind of autonomy that is beyond the bounds of ordinary causation and mathematics".
The doctrine of agent causation is mentioned, which presumes that decisions of humans are made "directly, with no ordinary physical chain in the picture" (Bringsjord et al. 2000).

3. Lovelace Test is wrong, too

First of all, the test rests on hidden variables. Generally speaking, we cannot extract the mind and body model of someone just by observing him or her, which is why it is impossible to predict his or her behavior precisely. However, that does not prove that the behavior is non-deterministic, nor that human creativity is "creative" by the definition of Lady Lovelace.

Let's take a quick tour through the points:

3.1. What is wrong in Lovelace Test

2. A's outputting o is not the result of a fluke hardware error, but rather the result of processes A can repeat; (Bringsjord et al. 2000)

Speaking about low-level processes, that's OK; but then it is hard to call the low-level processes in humans "creative", as well.

However, looking at that point from a different angle: should the agent really be able to repeat the process of outputting o?

The agent would be able to do so only if it were capable of memorizing the states of its mind precisely enough, and if those states could be "played back"... Humans usually are not able to do so, and can rarely repeat the process of outputting a piece of art they've just created, unless it's a very short one. (Imagine repeating the process of writing a novel.)

And actually, even if the author did repeat the process from memory, it wouldn't correspond to the requirement to repeat the processes of outputting o, because the author would now know that he has already created that piece of art and would just execute the instructions from his memory. Therefore he would be doing what he is ordered to do, even though the orders come from an earlier time-space version of himself...

Well, perhaps this repetition requirement simply means "the system should be deterministic". However, complex deterministic systems lead to chaotic behavior, which may seem impossible for observers to explain, but which is still deterministic anyway.
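A standard textbook illustration of this point is the logistic map, x' = r*x*(1 - x): a fully deterministic one-line rule which, at r = 4, amplifies a microscopic difference in starting conditions until the two trajectories look unrelated to an observer. The sketch below just iterates the map; the starting values and step counts are arbitrary choices for the demonstration.

```python
def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 50) -> list:
    """Iterate the logistic map x' = r*x*(1-x), a deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.300000)
b = logistic_trajectory(0.300001)   # perturbed by one part in a million

# Early on the two trajectories are nearly identical...
print(abs(a[5] - b[5]))    # still a tiny difference
# ...but the gap roughly doubles each step, so after dozens of steps
# the trajectories are typically completely uncorrelated.
print(abs(a[50] - b[50]))
```

An observer shown only the two output sequences could easily conclude the system is random or inexplicable, yet every value follows mechanically from the rule and the seed; unpredictability-in-practice does not imply non-determinism.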

3. H (or someone who knows what H knows, and has H's resources) cannot explain how A produced o by appeal to A's architecture, knowledge-base, and core functions. (Bringsjord et al. 2000)

That's easy... Make the agent's mind intractably complex, and lacking detailed self-reflection or the ability to self-analyze its behavior. Or just include chance in its decisions, coming from its perceptions: if the agent exists in a complex and active enough environment, it can't have full control over its perceptions.

If H can detect the agent's precise perceptions and completely emulate the agent's mind, then yes - H will know that the agent does not create anything new, but just "is doing what it is ordered to do" by its construction and its perceptions... That's right, but why do we think it's not the same about us?

Creation is a play with perceptions and an exhaustive search for possible plausible perceptions which can result from transformations. This is the view of Arnaudov (2002, 2003, 2004), which he is aiming to prove.

Human creators do not really create anything new. They just generate or test combinations which they didn't know before; which they haven't tested for "novelty" before in the particular creative domain; which they have forgotten that they have already tested; or for which they have forgotten, or are unable to trace, the inputs and the knowledge that caused the particular creative decisions to be made.

3.2. Why should art be magic?

I would also like to discuss the conclusion of Bringsjord et al. (2000):

...decisions of humans are made "directly, with no ordinary physical chain in the picture...

I would rewrite it like this:

...some of the decisions of humans appear, to some of the observers, as if the decisions were made directly, with no physical chain in the picture that is visible, simple enough and objective enough for their cognitive abilities and knowledge...

There is not yet a technology capable of extracting a model of a real working brain, a whole body and its immediate environment, and of simulating it, in order to explain its "free will" decisions physically. For machines, on the other hand, the knowledge explaining how they do what they do seems to be known, because machines are made to be deterministic and we are sure that there are schematics and code which drive them.

I agree that the key point in defining "creativity", from the viewpoint of the observer of a creation, is his or her inability to understand, with a sufficient degree of detail or quickly enough, how and why the particular artifact was created precisely the way it was.

Creativity and art astonish us because we do not understand how they are done, just as with chance phenomena.
(Arnaudov 2003)

If we know precisely how a piece could have been done, and we can create it systematically, we tend to call it "science" or "technology" instead of art. And if we can make a science out of an art, the art would become obviously "mechanical"; it would lose the illusion that it can't be explained, the magic would be gone, and we, humans, would be unhappy...

In my view, that's one reason why typical artists, and people in general, prefer not to understand deeply the process by which art is created, and why they need to think that art is something completely different from science.

4. Educational methods for measuring machine intelligence level

I believe Lovelace Test can be redefined in a better manner, but until that happens, I propose the following simple idea, inspired by glancing through primary school teachers' manuals in the library a few years ago.

Standardized, specially designed human intelligence tests exist, but actually all educational standards and requirements are different kinds of graded intelligence tests in different domains.

The machine may be examined like a student. E.g. in its "primary classes", it should take tests in:

1. Composing simple sentences.
2. Composing compound sentences of different types.
3. Composing simple stories (with a given level of complexity).
4. Writing essays expressing personal opinions, with increasing complexity.

Teachers or examiners would inspect detailed educational standards, so it would be possible to objectively compare the machines' performance with that of typical human students - at least by the standards of the educational system.
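To make the idea concrete, here is one possible (entirely hypothetical) way to record such a graded examination: the task names, the 0.0-1.0 grading scale and the pass mark are my own illustrative assumptions, not part of any real educational standard. The machine's "grade level" is simply the highest complexity at which it still passes every task.

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    complexity: int      # 1 = earliest primary-school level, higher = later
    score: float         # examiner's grade, from 0.0 to 1.0

def grade_level_reached(tasks, pass_mark=0.6):
    """Return the highest complexity level at which every task passes."""
    level = 0
    for c in sorted({t.complexity for t in tasks}):
        batch = [t for t in tasks if t.complexity == c]
        if all(t.score >= pass_mark for t in batch):
            level = c
        else:
            break   # the examinee fails at this level; stop climbing
    return level

exam = [
    Task("Compose simple sentences", 1, 0.9),
    Task("Compose compound sentences of different types", 2, 0.7),
    Task("Compose a simple story", 3, 0.5),   # below the pass mark
    Task("Write an opinion essay", 4, 0.2),
]
print(grade_level_reached(exam))  # -> 2
```

Whatever the scoring details, the benefit over Turing-style tests is the same: the result is a graded, domain-specific level that can be compared directly against published human norms, rather than a single pass/fail verdict on an imitation game.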


5. References

Arnaudov, T. (2001) - The Man and the Thinking Machine (An analysis of the possibility of creating a Thinking Machine, and of some shortcomings of humans and organic matter compared to it), "Свещеният сметач" (The Sacred Computer), issue 13, Dec. 2001.

Arnaudov, T. (2002, 2003, 2004) - Theory of the "Universe-Computer", 2002-2004, "Свещеният сметач".

Arnaudov, T. (2003) - Creativity is imitation at the level of algorithms, "Свещеният сметач", issue 23, Apr. 2003.

Arnaudov, T. (2004) - Analysis of the meaning of a sentence based on the knowledge base of an operational thinking machine. Thoughts on meaning and artificial thought. "Свещеният сметач", issue 29, Apr. 2004.

Bringsjord, S.; Ferrucci, D.; Bello, P. (2000) - Creativity, the Turing Test, and the (Better) Lovelace Test

Bringsjord, S.; Noel, R. (2000) - Why Did Evolution Engineer Consciousness?

Bringsjord, S., Ferrucci, D. (1998) - AI and Literary Creativity: Inside the Minds of Brutus, A Storytelling Machine. Lawrence Erlbaum Associates

Toole, B. A. (1992) - Ada: The Enchantress of Numbers

Turing, A. (1964) - Computing machinery and intelligence, in A. R. Anderson, ed., "Minds and Machines", Prentice-Hall, Englewood Cliffs, NJ, pp. 4-30.


ScienceDaily (1998) - A Silicon Hemingway: Artificial Author Brutus.1 Generates Betrayal By Bits, Mar. 12, 1998.


Plovdiv, November 14th, 2007
