Originally, the Loebner Prize awarded $2,000 for the most human-seeming chatterbot in the competition. The prize was $3,000 in 2005, $2,250 in 2006, and $3,000 in 2008.
In addition, there are two one-time-only prizes that have never been awarded: $25,000 for the first chatterbot that judges cannot distinguish from a real human and that can convince judges the human is the computer program, and $100,000 for the first chatterbot that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input. Once the latter is achieved, the annual competition will end.
In 2006, the contest was organised by Tim Child (CEO of Televirtual) and Huma Shah. On August 30, the four finalists were announced:
• Rollo Carpenter
• Richard Churchill and Marie-Claire Jenkins
• Noah Duncan
• Robert Medeksza
The contest was held on 17 September in the VR theatre, Torrington Place campus of University College London. The judges included Kevin Warwick (professor of cybernetics at the University of Reading), John Barnden (professor of artificial intelligence and specialist in metaphor research at the University of Birmingham), Victoria Butler-Cole (a barrister) and Graham Duncan-Rowe (a journalist); the latter's account of the event appeared in an article in Technology Review. The winner was 'Joan', based on Jabberwacky, both created by Rollo Carpenter.
The 2007 competition was held on 21 October in New York City. The judges were computer science professor Russ Abbott, philosophy professor Hartry Field, psychology assistant professor Clayton Curtis and English lecturer Scott Hutchins.
No bot passed the Turing Test, but the judges ranked the three contestants as follows:
• 1st: Robert Medeksza from Zabaware, creator of Ultra Hal Assistant
• 2nd: Noah Duncan, a private entry, creator of Cletus
• 3rd: Rollo Carpenter from Icogno, creator of Jabberwacky
The winner received $2,250 and the annual medal. The runners-up received $250 each.
The 2008 competition was organised by Professor Kevin Warwick, coordinated by Huma Shah and held on 12 October at the University of Reading, UK. After testing by over one hundred judges during the preliminary phase in June and July 2008, six finalists were selected from the thirteen artificial conversational entities (ACEs) originally entered. Five of those invited competed in the finals:
• Brother Jerome, Peter Cole and Benji Adams
• Elbot, Fred Roberts / Artificial Solutions
• Eugene Goostman, Vladimir Veselov, Eugene Demchenko and Sergey Ulasen
• Jabberwacky, Rollo Carpenter
• Ultra Hal, Robert Medeksza
In the finals, each judge was given five minutes to conduct simultaneous, split-screen conversations with two hidden entities. Elbot of Artificial Solutions won the 2008 Loebner Prize bronze award for the most human-like artificial conversational entity by fooling three of the twelve judges who interrogated it (in the human-parallel comparisons) into believing it was human. This came very close to the 30% threshold traditionally considered necessary to pass the Turing test. Eugene Goostman and Ultra Hal each deceived one judge into believing it was the human.
Will Pavia, a journalist for The Times and a judge in the Loebner finals, has written about his experience; he was deceived by Elbot and Eugene. Kevin Warwick and Huma Shah have reported on the parallel-paired Turing tests.
The 2009 Loebner Prize competition was held on 6 September 2009 at the Brighton Centre, Brighton, UK, in conjunction with the Interspeech 2009 conference. The prize amount for 2009 was US$3,000.
Entrants were David Levy, Rollo Carpenter, and Mohan Embar, who finished in that order.
The 2010 Loebner Prize competition, the 20th running of the contest, was held on 23 October at California State University, Los Angeles.
Official list of winners:
1991 Joseph Weintraub - PC Therapist
1992 Joseph Weintraub - PC Therapist
1993 Joseph Weintraub - PC Therapist
1994 Thomas Whalen - TIPS
1995 Joseph Weintraub - PC Therapist
1996 Jason Hutchens - HeX
1997 David Levy - Converse
1998 Robby Garner - Albert One
1999 Robby Garner - Albert One
2000 Richard Wallace - Artificial Linguistic Internet Computer Entity (A.L.I.C.E.)
2001 Richard Wallace - Artificial Linguistic Internet Computer Entity (A.L.I.C.E.)
2002 Kevin Copple - Ella
2003 Juergen Pirner - Jabberwock
2004 Richard Wallace - Artificial Linguistic Internet Computer Entity (A.L.I.C.E.)
2005 Rollo Carpenter - George
2006 Rollo Carpenter - Joan
2007 Robert Medeksza - Ultra Hal
2008 Fred Roberts - Elbot
2009 David Levy - Do-Much-More
2010 Bruce Wilcox - Suzette
Based on http://en.wikipedia.org/wiki/Loebner_prize licensed under the Creative Commons Attribution-Share-Alike License 3.0
Loebner Prize I - rules and restrictions
The Loebner Prize is an annual competition in artificial intelligence that awards prizes to the chatterbot considered by the judges to be the most human-like. The format of the competition is that of a standard Turing test. In each round, a human judge simultaneously holds textual conversations with a computer program and a human being via computer. Based upon the responses, the judge must decide which is which.
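To make the format concrete, here is a minimal sketch in Python of such a judging round; the stand-in responder functions and the deliberately naive judge are hypothetical illustrations for this post, not code from any actual Loebner entry.

import random

# Hypothetical stand-ins; neither function is part of any real Loebner entry.
def hidden_human(prompt: str) -> str:
    return f"Hmm, '{prompt}' is a hard one. Let me think..."

def chatterbot(prompt: str) -> str:
    return "That's interesting. Tell me more!"

def loebner_round(questions, judge):
    """One round in the contest format: the judge converses with two
    unlabeled channels at once, then names the one it thinks is the bot."""
    # Seat the human and the bot randomly on channels A and B.
    seating = dict(zip("AB", random.sample([hidden_human, chatterbot], 2)))
    transcript = {ch: [(q, respond(q)) for q in questions]
                  for ch, respond in seating.items()}
    guess = judge(transcript)            # judge returns "A" or "B"
    return seating[guess] is chatterbot  # True if the machine was spotted

# A naive judge that guesses at random; real judges read the transcripts.
naive_judge = lambda transcript: random.choice("AB")

spotted = sum(loebner_round(["What is a hammer for?"], naive_judge)
              for _ in range(1000))
print(f"Naive judge spotted the machine in {spotted / 10:.1f}% of rounds")

A judge who cannot do better than the 50% this random strategy achieves is, in effect, the situation a winning bot aims to create.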
The contest was launched in 1990 by Hugh Loebner in conjunction with the Cambridge Center for Behavioral Studies, Massachusetts, United States. It has since been associated with Flinders University, Dartmouth College, the Science Museum in London and, most recently, the University of Reading. In 2004 and 2005, it was held in Loebner's apartment in New York City.
Within the field of artificial intelligence, the Loebner Prize is somewhat controversial; the most prominent critic, Marvin Minsky, has called it a publicity stunt that does not help the field along.
In addition, the five-minute time limit and the use of untrained and unsophisticated judges have resulted in some wins that may be due to trickery rather than plausible intelligence, as one can judge from transcripts of winning conversations.
The rules of the competition have varied over the years; early competitions featured restricted-conversation Turing tests, but since 1995 the discussion has been unrestricted.
For the three entries in 2007, from Robert Medeksza, Noah Duncan and Rollo Carpenter, some basic "screening questions" were used by the sponsor to evaluate the state of the technology. These included simple questions about the time, the current round of the contest, and so on; general knowledge ("What is a hammer for?"); comparisons ("Which is faster, a train or a plane?"); and questions demonstrating memory for preceding parts of the same conversation. "All nouns, adjectives and verbs will come from a dictionary suitable for children or adolescents under the age of 12." Entries did not need to respond "intelligently" to the questions to be accepted.
In 2008, for the first time, the sponsor introduced a preliminary phase, opening the competition to previously disallowed web-based entries judged by a variety of invited interrogators. The available rules do not state how interrogators are selected or instructed. Interrogators (who judge the systems) have limited time: 5 minutes per entity in the 2003 competition, 20+ minutes per pair in the 2004–2007 competitions, and, since 2008, 5 minutes to conduct simultaneous conversations with a human and a program.
Based on http://en.wikipedia.org/wiki/Loebner_prize licensed under the Creative Commons Attribution-Share-Alike License 3.0
The Turing test VI – Variations of the Turing Test
Numerous other versions of the Turing test, including those discussed above, have been mooted through the years.
Reverse Turing test and CAPTCHA
A modification of the Turing test wherein the objective of one or more of the roles has been reversed between machines and humans is termed a reverse Turing test. An example is implied in the work of psychoanalyst Wilfred Bion, who was particularly fascinated by the "storm" that resulted from the encounter of one mind by another. Carrying this idea forward, R. D. Hinshelwood described the mind as a "mind recognizing apparatus," noting that this might be some sort of "supplement" to the Turing test. The challenge would be for the computer to be able to determine if it were interacting with a human or another computer. This is an extension of the original question that Turing attempted to answer, but would, perhaps, offer a high enough standard to define a machine that could "think" in a way that we typically define as characteristically human.
CAPTCHA is a form of reverse Turing test. Before being allowed to perform some action on a website, the user is presented with alphanumerical characters in a distorted graphic image and asked to type them out. This is intended to prevent automated systems from being used to abuse the site. The rationale is that software sufficiently sophisticated to read and reproduce the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human.
Software that can defeat CAPTCHAs with some accuracy, by analyzing patterns in the generating engine, is being actively developed.
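To illustrate the idea (and only the idea; production systems distort far more aggressively), here is a minimal sketch of a distorted-text challenge using the Pillow imaging library. The jitter amounts and layout parameters are arbitrary choices for this example.

import random
import string
from PIL import Image, ImageDraw, ImageFont  # pip install Pillow

def make_captcha(length=5, size=(170, 60)):
    """Generate a crude distorted-text challenge; returns (image, answer)."""
    answer = "".join(random.choices(string.ascii_uppercase + string.digits,
                                    k=length))
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    # Draw each character at a jittered position to hinder naive OCR.
    for i, ch in enumerate(answer):
        x = 12 + i * 30 + random.randint(-4, 4)
        y = 22 + random.randint(-8, 8)
        draw.text((x, y), ch, fill="black", font=font)
    # Cross the text with noise lines, as real generators do more aggressively.
    for _ in range(6):
        draw.line([(random.randint(0, size[0]), random.randint(0, size[1])),
                   (random.randint(0, size[0]), random.randint(0, size[1]))],
                  fill="gray")
    return img, answer

img, answer = make_captcha()
img.save("captcha.png")               # shown to the user
print("Server-side answer:", answer)  # kept secret by the site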
"Fly on the wall" Turing test
The "fly on the wall" variation of the Turing test changes the original Turing-test parameters in three ways. First, parties A and B communicate with each other rather than with party C, who plays the role of a detached observer ("fly on the wall") rather than of an interrogator or other participant in the conversation. Second, party A and party B may each be either a human or a computer of the type being tested. Third, it is specified that party C must not be informed as to the identity (human versus computer) of either participant in the conversation. Party C's task is to determine which of four possible participant combinations (human A/human B, human A/computer B, computer A/human B, computer A/computer B) generated the conversation. At its most rigorous, the test is conducted in numerous iterations, in each of which the identity of each participant is determined at random (e.g., using a fair-coin toss) and independently of the determination of the other participant's identity, and in each of which a new human observer is used (to prevent the discernment abilities of party C from improving through conscious or unconscious pattern recognition over time). The computer passes the test for human-level intelligence if, over the course of a statistically significant number of iterations, the respective parties C are unable to determine with better-than-chance frequency which participant combination generated the conversation.
The "fly on the wall" variation increases the scope of intelligence being tested in that the observer is able to evaluate not only the participants' ability to answer questions but their capacity for other aspects of intelligent communication, such as the generation of questions or comments regarding an existing aspect of a conversation subject ("deepening"), the generation of questions or comments regarding new subjects or new aspects of the current subject ("broadening"), and the ability to abandon certain subject matter in favor of other subject matter currently under discussion ("narrowing") or new subject matter or aspects thereof ("shifting").
The Bion-Hinshelwood extension of the traditional test is applicable to the "fly on the wall" variation as well, enabling the testing of intellectual functions involving the ability to recognize intelligence: If a computer placed in the role of party C (reset after each iteration to prevent pattern recognition over time) can identify conversation participants with a success rate equal to or higher than the success rate of a set of humans in the party-C role, the computer is functioning at a human level with respect to the skill of intelligence recognition.
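The pass criterion, observers doing no better than chance over many iterations, is straightforward to express as a simulation. The sketch below is a hypothetical harness with stub conversers and a stub observer; it merely checks an observer's accuracy against the 25% chance baseline over the four possible combinations.

import random

COMBOS = ("human/human", "human/computer", "computer/human", "computer/computer")

def converse(kind_a, kind_b):
    """Hypothetical stub: produce a transcript between parties A and B."""
    return f"<transcript of a {kind_a} talking with a {kind_b}>"

def observer(transcript):
    """Stub party C. A real observer would analyze the transcript;
    this one guesses uniformly at random among the four combinations."""
    return random.choice(COMBOS)

def run_trials(n=10_000):
    correct = 0
    for _ in range(n):
        # Each participant's identity is an independent fair-coin toss.
        a, b = (random.choice(("human", "computer")) for _ in range(2))
        if observer(converse(a, b)) == f"{a}/{b}":
            correct += 1
    return correct / n

print(f"Observer accuracy: {run_trials():.1%} (chance level = 25%)")
# The computer passes if, over many iterations, observers cannot beat
# the 25% baseline by a statistically significant margin.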
Subject matter expert Turing test
Another variation is the subject matter expert Turing test, in which a machine's responses cannot be distinguished from those of an expert in a given field. This is also known as a "Feigenbaum test" and was proposed by Edward Feigenbaum in a 2003 paper.
Immortality test
The immortality-test variation of the Turing test would determine whether a person's essential character has been reproduced with enough fidelity to make it impossible to distinguish the reproduction from the original person.
Minimum Intelligent Signal Test
The Minimum Intelligent Signal Test, proposed by Chris McKinstry, is another variation of Turing's test, where only binary responses are permitted. It is typically used to gather statistical data against which the performance of artificial intelligence programs may be measured.
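As a rough illustration, the sketch below scores a system against a handful of true/false items; the questions are invented for this example and are not drawn from McKinstry's corpus.

# Hypothetical example items -- not drawn from McKinstry's actual corpus.
MIST_ITEMS = [
    ("Is water wet?", True),
    ("Is the sun a vegetable?", False),
    ("Do dogs have wings?", False),
    ("Is two larger than one?", True),
]

def mist_score(answer_fn):
    """Fraction of true/false items answered correctly. Random guessing
    scores about 0.5, the baseline the statistics are measured against."""
    hits = sum(answer_fn(q) == truth for q, truth in MIST_ITEMS)
    return hits / len(MIST_ITEMS)

# A trivial system that answers "yes" to everything scores at chance
# only if the item set is balanced between true and false.
print("Always-yes baseline:", mist_score(lambda q: True))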
Meta Turing test
Yet another variation is the Meta Turing test, in which the subject being tested (say, a computer) is classified as intelligent if it has created something that the subject itself wants to test for intelligence.
Hutter Prize
The organizers of the Hutter Prize believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test.
The data compression test has some advantages over most versions and variations of a Turing test, including:
• It gives a single number that can be directly used to compare which of two machines is "more intelligent" (see the sketch after these lists).
• It does not require the computer to lie to the judge.
The main disadvantages of using data compression as a test are:
• It is not possible to test humans this way.
• It is unknown what particular "score" on this test—if any—is equivalent to passing a human-level Turing test.
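As a rough illustration of the "single number" advantage, the sketch below compares three Python standard-library compressors on the same text; by the prize's reasoning, whichever "machine" models the text best produces the smallest output.

import bz2
import lzma
import zlib

# Off-the-shelf compressors stand in for competing "machines": the one
# that models the text best compresses it smallest.
text = ("The quick brown fox jumps over the lazy dog. " * 200).encode()

sizes = {
    "zlib": len(zlib.compress(text, 9)),
    "bz2": len(bz2.compress(text, 9)),
    "lzma": len(lzma.compress(text)),
}
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{name:5s} {size:6d} bytes (ratio {size / len(text):.3f})")
# The lowest number wins: a single, directly comparable score,
# with no judge to convince and no deception required.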
Other tests based on compression or Kolmogorov Complexity
A related approach to the Hutter Prize, which appeared in the late 1990s, is the inclusion of compression problems in an extended Turing test. Two major advantages of some of these tests are their applicability to nonhuman intelligences and the absence of a requirement for human testers.
Based on http://en.wikipedia.org/wiki/Turing_test licensed under the Creative Commons Attribution-Share-Alike License 3.0
Chatbots III - Commercial functions for Chatbots
Automated conversational systems have now progressed to the point that large companies such as Lloyds Banking Group, Royal Bank of Scotland, Renault and Citroën use them instead of call centers to provide a first point of contact. Chatbots can also be implemented via Twitter or Windows Live Messenger; for example, the Robocoke chatbot for Coca-Cola Hungary provides users with information about the Coca-Cola brand, but it can also give users party and concert recommendations all over Hungary. Chatbots of this kind are often used for marketing purposes.
Popular online portals like eBay and PayPal also use multilingual virtual agents to offer online support to their customers. For example, PayPal uses the chatterbot Louise to handle queries in English and the chatterbot Léa to handle queries in French. Developed by VirtuOz, the two agents handle 400,000 conversations a month and have been operational on PayPal websites since September 2008.
Malicious chatterbots are frequently used to fill chat rooms with spam and advertising, or to entice people into revealing personal information, such as bank account numbers. They are commonly found on Yahoo! Messenger, Windows Live Messenger, AOL Instant Messenger and other instant messaging protocols. There has also been a published report of a chatterbot used in a fake personal ad on a dating service's website.
Chatbot competitions focus on the Turing test or on more specific goals. Two such annual contests are the Loebner Prize and The Chatterbox Challenge.
Based on http://en.wikipedia.org/wiki/Chatbots licensed under the Creative Commons Attribution-Share-Alike License 3.0