Wednesday, May 27, 2009

HAL is trying to become a two-time champion!

Jeopardy set entrance at CES by Joseph Hunkins (joeduck) used under a Creative Commons License

Back in April, in my Empoprise-NTN blog, I discussed the "Watson" computer that IBM is designing to play Jeopardy. In that post I cited a Technology Review post, which noted that programming a computer to play Jeopardy is a much more complex task than programming a computer to play chess.

Well, over the last several hours, three mentions of Watson have crossed my Google Reader feeds. Two of them are, again, from Technology Review. The first post links to a video that opens by illustrating the issue. A guy named Alex Trebek reads an answer and (probably intentionally) illustrates the complex nature of the problem by noting that the question that goes with the answer is "elementary." The word "elementary" has no direct connection to IBM. The joke works only because IBM's most famous leaders shared a last name with a famous fictional doctor, and that fictional doctor's friend liked to use the word "Elementary." How do you teach a computer to understand that a book/movie reference pertains to a business machine company?

In the second Technology Review post, David Ferrucci gives a high-level explanation of how Watson would solve a human problem:

Ferrucci describes how the technology would handle the following Jeopardy!-style question: "It's the opera mentioned in the lyrics of a 1970 number-one hit by Smokey Robinson and the Miracles."

The Watson engine uses natural-language processing techniques to break the question into structural components. In this case, the pieces include 1) an opera; 2) the opera is mentioned in a song; 3) the song was a hit in 1970; and 4) the hit was by Smokey Robinson and the Miracles.

In searching its databases for information that could be relevant to these segments, the system might find hundreds of passages. These could include the following three:

"Pagliacci," the opera about a clown who tries to keep his feelings "hid";

Smokey Robinson's Motown hit record of the '60s "Tears of a Clown";

"Tears of a Clown" by the Miracles hit #1 in the UK in 1970.

By analyzing these passages, Watson can identify "Pagliacci" as being "an opera," although this on its own would not be much help, since many other passages also identify opera names. The second result identifies a hit record, "The Tears of a Clown," by "Smokey Robinson," which the system judges to be probably the same thing as "Smokey Robinson and the Miracles." However, many other song titles would be generated in a similar manner. The probability that the result is accurate would also be judged low, because the song is associated with "the '60s" and not "1970." The third passage, however, reinforces the idea that "The Tears of a Clown" was a hit in 1970, provided the system determines that "The Miracles" refers to the same thing as "Smokey Robinson and the Miracles."

From the first of these three passages, the Watson engine would know that Pagliacci is an opera about a clown who hides his feelings. To make the connection to Smokey Robinson, the system has to recognize that "tears" are strongly related to "feelings," and since it knows that Pagliacci is about a clown that tries to keep its feelings hid, it guesses--correctly--that Pagliacci is the answer. Of course, the system might still make the wrong choice "depending on how the wrong answers may be supported by the available evidence," says Ferrucci.

And Alex Trebek goes "nyah nyah nyah" to all the chess players out there.
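The evidence-combining steps Ferrucci describes can be roughly sketched in Python. This is a toy illustration under loud assumptions, not IBM's method: the `RELATED` table, the stopword list, the scoring by simple word overlap, and the two decoy entries ("Bridge over Troubled Water" and "Carmen") are all hypothetical additions made for the example. Only the three "Pagliacci" and "Tears of a Clown" passages come from the quoted article.

```python
import re

# Hypothetical helpers: a tiny stopword list and a hand-written relatedness
# table standing in for whatever Watson really uses to link "tears" to "feelings".
STOPWORDS = {"a", "an", "the", "of", "in", "by", "and", "to", "who", "his", "about"}
RELATED = {"tears": "feelings"}

# Terms pulled from the clue: a 1970 hit by Smokey Robinson and the Miracles.
CLUE_TERMS = {"smokey", "robinson", "miracles", "1970", "hit"}

# (song title, supporting passage). The last entry is a decoy added for illustration.
SONG_PASSAGES = [
    ("Tears of a Clown", "Smokey Robinson's Motown hit record of the '60s"),
    ("Tears of a Clown", "by the Miracles hit #1 in the UK in 1970"),
    ("Bridge over Troubled Water", "by Simon and Garfunkel hit #1 in 1970"),
]

# Opera name -> descriptive passage. "Carmen" is a decoy added for illustration.
OPERAS = {
    "Pagliacci": "the opera about a clown who tries to keep his feelings hid",
    "Carmen": "the opera about a soldier and a factory worker",
}


def content_words(text):
    """Lowercase, split into words, map related words together, drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {RELATED.get(w, w) for w in words} - STOPWORDS


def pick_song(passages):
    """Add up clue-term matches across every passage that supports each title."""
    scores = {}
    for title, text in passages:
        scores[title] = scores.get(title, 0) + len(content_words(text) & CLUE_TERMS)
    return max(scores, key=scores.get)


def pick_opera(operas, song_title):
    """Choose the opera whose description overlaps most with the song title."""
    title_words = content_words(song_title)
    return max(operas, key=lambda name: len(content_words(operas[name]) & title_words))


song = pick_song(SONG_PASSAGES)
answer = pick_opera(OPERAS, song)
print(song, "->", answer)
```

Neither passage about the song is convincing on its own, but their scores add up, which is the point of the third passage in the quoted explanation. Swap in different decoys and the toy can easily pick the wrong opera, much as Ferrucci warns.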

The third item that I read didn't come from MIT's Technology Review, but from a blogger who knows a thing or two about Jeopardy - Ken Jennings. He read the description above, and shared some thoughts of his own:

As I told Ed Toutant last week: I actually think you could get pretty good Jeopardy!-winning results with a computer using a very naive algorithm. To wit: programming the computer with the length and breadth of the J! Archive, and just doing some simple matching against key words in the clue. If, for example, the “this” clause (“this country”) and two other key nouns match (“Ayers Rock” or something), then the machine buzzes. I bet that’s a pretty short Perl script that could beat some human players. Maybe I shouldn’t be giving IBM tips, though.
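Jennings has Perl in mind, but his naive matcher is short in Python too. The sketch below is one hedged reading of his description: the two archive entries are made-up sample clues (not actual J! Archive data), and the stopword list, the function names, and the decision to treat every non-stopword as a "key noun" are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list; everything else counts as a "key noun" here.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "to", "this", "also"}

# Made-up sample entries standing in for the J! Archive: (clue, correct response).
ARCHIVE = [
    ("Ayers Rock, also called Uluru, is a sacred site in this country", "Australia"),
    ("The Eiffel Tower is the best-known landmark of this country", "France"),
]


def this_phrase(clue):
    """Return the word after "this" (e.g. "country"), or None if there is none."""
    match = re.search(r"\bthis (\w+)", clue.lower())
    return match.group(1) if match else None


def key_words(clue):
    """Return the set of lowercase non-stopword words in the clue."""
    return set(re.findall(r"[a-z]+", clue.lower())) - STOPWORDS


def naive_answer(clue, archive, min_matches=2):
    """Buzz only if the "this" clause matches and enough other key words match."""
    target = this_phrase(clue)
    if target is None:
        return None
    clue_words = key_words(clue) - {target}
    best, best_overlap = None, 0
    for old_clue, response in archive:
        if this_phrase(old_clue) != target:
            continue  # the "this" clause must match first
        overlap = len(clue_words & (key_words(old_clue) - {target}))
        if overlap >= min_matches and overlap > best_overlap:
            best, best_overlap = response, overlap
    return best  # None means the machine stays silent


print(naive_answer("Tourists climb Ayers Rock in the outback of this country", ARCHIVE))
print(naive_answer("This country is home to the Great Wall", ARCHIVE))
```

The first clue shares "Ayers" and "Rock" with an archived clue about "this country," so the machine buzzes; the second shares nothing beyond the "this" clause, so it stays quiet.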

However, I'd be willing to bet that when Watson shows up in the studio, they'll write some NEW questions. So much for a Jeopardy archive. Here's my vote for a new question:

This fruit took out a full-page ad that read, "Welcome, IBM. Seriously."

If Watson can answer the Smokey question, the fruit question should be a breeze. (Hint.)