Topics for Today
- Types of evaluation
- The Turing Test
- Do we want to model human performance?
  - The bird/airplane argument examined
How can we evaluate AI systems?
- Formal definition of success, when possible
  - E.g., for games
- For tools, such as e-mail interfaces, scheduling tools, MT support tools:
  - Customer satisfaction (questionnaires)
  - Improvement in performance (define formal measures)
- Formal measures are common in research
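To make "formal measures" concrete, here is a minimal sketch of precision, recall, and F1 computed against gold-standard labels (the function name and example tags are illustrative, not from the lecture):

```python
# Minimal sketch: scoring system output against gold annotations
# for one target label (precision / recall / F1).

def precision_recall_f1(system, gold, target):
    """Score a system against gold-standard labels for one target label."""
    tp = sum(1 for s, g in zip(system, gold) if s == target and g == target)
    fp = sum(1 for s, g in zip(system, gold) if s == target and g != target)
    fn = sum(1 for s, g in zip(system, gold) if s != target and g == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: part-of-speech tagging, scoring the "NOUN" label.
gold   = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN"]
system = ["NOUN", "NOUN", "NOUN", "ADJ", "VERB"]
print(precision_recall_f1(system, gold, "NOUN"))
```

Research evaluations commonly report measures of this kind, computed over manually annotated test data.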
Often manual annotation is required for evaluation
- Natural language disambiguation
  - Word senses, parts of speech, syntax, ?
- Information retrieval: which documents do you want?
- Diagnosis: actual diseases; a reasonable diagnosis? (sometimes already exists)
- Summarization, machine translation: are these reasonable? Can't expect exact matches!
- Manual annotations are needed for training/development too
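Because exact matches can't be expected for summarization or MT output, automatic metrics typically score word or n-gram overlap with a reference text instead. A simplified, BLEU-like unigram precision is sketched below (illustrative only; real metrics are considerably more elaborate):

```python
# Rough sketch: overlap-based scoring of candidate output against a
# reference, with clipped counts so repeated words aren't over-credited.
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate words that also appear in the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

print(unigram_precision("the cat sat on the mat", "a cat is on the mat"))
```

Such metrics reward reasonable outputs that share content with a human-written reference, without requiring an exact match.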
The Turing Test
- Judge computers by human behavior

      Room 1                    Room 2
      Person 1 ---------------- Computer responding
      Person 2 ---------------- Person responding

- Can the computer fool Person 1 into thinking it's a person?
The Turing Test: What do you think?
- +: focus on behavior instead of the internal algorithm used
- -: (Allen, keynote address, AAAI-97) defines success only in terms of human intelligence
  (Also not well founded: the computer could act like a crazy person, and a person could act like a computer)
Should we use Humans as our models?
- Pro: they are our best examples!
- Anti: the bird/airplane argument
  - People tried to build machines with flapping wings
  - The Wright brothers ignored flapping wings and solved the problem in a different way
  - Maybe machines must do things differently than people (animals) do them
Current practice: Both and Neither
- Many AI researchers use probability theory and formal logic, without claims of cognitive validity
- But: evaluation and annotation drive the work
  - Equivalent to in-depth analysis and observation of human behavior