Strengths
Tractability
The philosophy of mind, psychology, and modern neuroscience have been unable to provide definitions of "intelligence" and "thinking" that are sufficiently precise and general to be applied to machines. Without such definitions, the central questions of the philosophy of artificial intelligence cannot be answered. The Turing test, even if imperfect, at least provides something that can actually be measured. As such, it is a pragmatic solution to a difficult philosophical question.
Breadth of subject matter
The power of the Turing test derives from the fact that it is possible to talk about anything. Turing wrote that "the question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include." John Haugeland adds that "understanding the words is not enough; you have to understand the topic as well."
In order to pass a well-designed Turing test, the machine must use natural language, reason, have knowledge, and learn. The test can be extended to include video input, as well as a "hatch" through which objects can be passed: this would force the machine to demonstrate skill in vision and robotics as well. Together, these represent almost all of the major problems of artificial intelligence.
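For illustration, a minimal sketch (in Python, not from the original article) of the text-only version of the game might look like the following; ask, machine_reply, and human_reply are placeholder callables standing in for the interrogator and the two hidden participants, not a standard API:

import random

def run_imitation_game(ask, machine_reply, human_reply, rounds=5):
    """Minimal sketch of the text-only imitation game.

    ask(prompt, transcript) stands in for the interrogator;
    machine_reply / human_reply are the two hidden participants.
    All names here are illustrative placeholders.
    """
    booths = {"A": machine_reply, "B": human_reply}
    if random.random() < 0.5:                      # hide which booth holds the machine
        booths["A"], booths["B"] = booths["B"], booths["A"]

    transcript = []
    for _ in range(rounds):
        for booth, reply in booths.items():
            question = ask(f"Question for booth {booth}:", transcript)
            transcript.append((booth, question, reply(question)))  # any topic is allowed

    guess = ask("Which booth holds the machine, A or B?", transcript)
    machine_booth = next(b for b, r in booths.items() if r is machine_reply)
    return guess == machine_booth                  # True: the interrogator caught the machine

The point of the sketch is simply that nothing restricts the questions: the interrogator may probe any field of human endeavour.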
The Feigenbaum test is designed to take advantage of the broad range of topics available to a Turing test. It compares the machine against the abilities of experts in specific fields such as literature or chemistry.
Weaknesses
The Turing test is based on the assumption that human beings can judge a machine's intelligence by comparing its behaviour with human behaviour. Every element of this assumption has been questioned: the human's judgement, the value of comparing only behaviour, and the value of comparing against a human. Because of these and other considerations, some AI researchers have questioned the usefulness of the test. In practice, the test's results can easily be dominated not by the computer's intelligence, real or simulated, but by the attitudes, skill, or naiveté of the questioner.
Human intelligence vs intelligence in general
The Turing test does not directly test whether the computer behaves intelligently; it tests only whether the computer behaves like a human being. Since human behavior and intelligent behavior are not exactly the same thing, the test can fail to accurately measure intelligence in two ways:
Some human behavior is unintelligent
The Turing test requires that the machine be able to execute all human behaviors, regardless of whether they are intelligent. It even tests for behaviors that we may not consider intelligent at all, such as susceptibility to insults, the temptation to lie, or simply a high frequency of typing mistakes. If a machine cannot imitate human behavior in detail, it fails the test.
This objection was raised by The Economist, in an article entitled "Artificial Stupidity" published shortly after the first Loebner prize competition in 1992. The article noted that the first Loebner winner's victory was due, at least in part, to its ability to "imitate human typing errors." Turing himself had suggested that programs add errors into their output, so as to be better "players" of the game.
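As a rough illustration of that suggestion, a program might deliberately corrupt its replies before sending them; the sketch below is an assumption about how this could be done, not Turing's own design:

import random

def add_typing_errors(text, error_rate=0.03):
    """Illustrative sketch only: corrupt a reply with occasional
    adjacent-key substitutions to mimic human typing mistakes.
    The substitution table and rate are arbitrary assumptions."""
    neighbours = {"a": "s", "e": "r", "i": "o", "t": "y", "n": "m"}
    out = []
    for ch in text:
        if ch.lower() in neighbours and random.random() < error_rate:
            out.append(neighbours[ch.lower()])    # hit a nearby key instead
        else:
            out.append(ch)
    return "".join(out)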
Some intelligent behavior is inhuman
The Turing test does not test for highly intelligent behaviors, such as the ability to solve difficult problems or come up with original insights. In fact, it specifically requires deception on the part of the machine: if the machine is more intelligent than a human being it must deliberately avoid appearing too intelligent. If it were to solve a computational problem that is impossible for any human to solve, then the interrogator would know the program is not human, and the machine would fail the test.
Because it cannot measure intelligence that is beyond the ability of humans, the test cannot be used to build or evaluate systems that are more intelligent than humans. For this reason, several alternative tests that could evaluate superintelligent systems have been proposed.
Real intelligence vs simulated intelligence
The Turing test is concerned strictly with how the subject acts — the external behaviour of the machine. In this regard, it takes a behaviourist or functionalist approach to the study of intelligence. The example of ELIZA suggests that a machine passing the test may be able to simulate human conversational behavior by following a simple (but large) list of mechanical rules, without thinking or having a mind at all.
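A toy sketch of that kind of mechanical rule-following, in the spirit of ELIZA but in no way Weizenbaum's actual rule set, could be as simple as a fixed list of pattern-and-template rules:

import re

# A tiny ELIZA-style responder: a fixed list of (pattern, template) rules,
# applied in order, with a canned fallback when nothing matches.
RULES = [
    (re.compile(r"\bI need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bI am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.I),    "Tell me more about your {0}."),
]

def eliza_reply(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."          # default when no rule matches

# e.g. eliza_reply("I am feeling tired") -> "How long have you been feeling tired?"

A large enough list of such rules can sustain a superficially plausible conversation without anything resembling understanding, which is precisely the worry raised above.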
John Searle has argued that external behavior cannot be used to determine if a machine is "actually" thinking or merely "simulating thinking." His Chinese room argument is intended to show that, even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has a mind, consciousness, or intentionality (a philosophical term for the power of thoughts to be "about" something).
Turing anticipated this line of criticism in his original paper, writing:
I do not wish to give the impression that I think there is no mystery about consciousness. There is, for instance, something of a paradox connected with any attempt to localise it. But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper. — Alan Turing, (Turing 1950)
Naiveté of interrogators and the anthropomorphic fallacy
The Turing test assumes that the interrogator is sophisticated enough to determine the difference between the behaviour of a machine and the behaviour of a human being, though critics argue that this is not a skill most people have.
Turing does not specify the precise skills and knowledge required of the interrogator in his description of the test, but he did use the term "average interrogator": "[the] average interrogator would not have more than 70 per cent chance of making the right identification after five minutes of questioning". Shah & Warwick (2009c) show that experts can be fooled, and that interrogator strategy ('power' vs. 'solidarity') affects correct identification, the latter being more successful.
Chatterbot programs such as ELIZA have repeatedly fooled unsuspecting people into believing that they are communicating with human beings. In these cases, the "interrogator" is not even aware of the possibility that they are interacting with a computer. To successfully appear human, there is no need for the machine to have any intelligence whatsoever and only a superficial resemblance to human behaviour is required. Most would agree that a "true" Turing test has not been passed in "uninformed" situations like these.
Early Loebner prize competitions used "unsophisticated" interrogators who were easily fooled by the machines. Since 2004, the Loebner Prize organizers have deployed philosophers, computer scientists, and journalists among the interrogators. However, even some of these experts have been deceived by the machines.
Michael Shermer points out that human beings consistently choose to consider non-human objects as human whenever they are allowed the chance, a mistake called the anthropomorphic fallacy: they talk to their cars, ascribe desire and intention to natural forces (e.g., "nature abhors a vacuum"), and worship the sun as a human-like being with intelligence. If the Turing test is applied to religious objects, Shermer argues, then inanimate statues, rocks, and places have consistently passed the test throughout history. This human tendency towards anthropomorphism effectively lowers the bar for the Turing test, unless interrogators are specifically trained to avoid it.
Impracticality and irrelevance: the Turing test and AI research
Mainstream AI researchers argue that trying to pass the Turing Test is merely a distraction from more fruitful research. Indeed, the Turing test is not an active focus of much academic or commercial effort—as Stuart Russell and Peter Norvig write: "AI researchers have devoted little attention to passing the Turing test." There are several reasons.
First, there are easier ways to test AI programs. Most current research in AI-related fields is aimed at modest and specific goals, such as automated scheduling, object recognition, or logistics. To test the intelligence of the programs that solve these problems, AI researchers simply give them the task directly, rather than going through the roundabout method of posing the question in a chat room populated with computers and people.
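By way of contrast with the chat-room setup, that kind of direct evaluation usually amounts to scoring a program on held-out examples of its specific task; the function below is only an illustrative sketch, with model_predict and the data format assumed for the example:

def task_accuracy(model_predict, labelled_examples):
    """Sketch of the direct evaluation researchers typically prefer:
    score a program on its own task against held-out labelled data,
    with no conversational judge involved. model_predict and the
    (input, label) format are illustrative assumptions."""
    correct = sum(1 for x, label in labelled_examples if model_predict(x) == label)
    return correct / len(labelled_examples)

# e.g. task_accuracy(classifier, test_set) -> 0.94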
Second, creating life-like simulations of human beings is a difficult problem on its own that does not need to be solved to achieve the basic goals of AI research. Believable human characters may be interesting in a work of art, a game, or a sophisticated user interface, but they are not part of the science of creating intelligent machines, that is, machines that solve problems using intelligence. Russell and Norvig suggest an analogy with the history of flight: Planes are tested by how well they fly, not by comparing them to birds. "Aeronautical engineering texts," they write, "do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.'"
Turing, for his part, never intended his test to be used as a practical, day-to-day measure of the intelligence of AI programs; he wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence. As such, it is not surprising that the Turing test has had so little influence on AI research — the philosophy of AI, writes John McCarthy, "is unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science."
Based on http://en.wikipedia.org/wiki/Turing_test licensed under the Creative Commons Attribution-Share-Alike License 3.0