Founded in 1966

Distinguished Lecturer Series

Building and verifying a shallow ontology for higher-quality NLP

Eduard Hovy

Information Sciences Institute, University of Southern California

Friday, September 5, 2008
10:30am - SENSQ 5317

Refreshments/meet the speaker at 10:00am

Hosted by Janyce Wiebe

Abstract

Research in natural language processing (NLP) over the past fifteen years has produced impressive practical results using statistical methods. But increasingly there are signs that continued quality improvement in language processing applications (including QA, summarization, information extraction, and machine translation) requires deeper and richer representations, possibly even (shallow) semantics of text meaning. Although theories of semantics (formal and informal) abound, no one has yet built a resource of semantic symbols that effectively supports NLP, that is empirically based, and that has been validated through human agreement scores. Can this be done? This talk describes the construction of the Omega ontology to support various NLP applications, in the context of the OntoNotes project in DARPA’s GALE program. Omega contains an Upper Model of about a hundred manually constructed and organized terms and a Middle Model of several thousand ‘sense pools’. Each sense pool is a collection of word senses from English, Arabic, and Chinese nouns and verbs, and includes one or more associated atomic features to support reasoning, as well as pointers to hundreds of individual sentences containing a word with the appropriate sense. The creation of senses, their pooling, and their integration into Omega are carried out by teams of annotators and are subjected to cross-annotator agreement tests and other semi-automated validation procedures. To our knowledge, this is by far the most extensive ontology-building effort that involves such validation. This work is a collaboration of researchers at USC/ISI and the University of Colorado at Boulder.

Biography of Speaker

Eduard Hovy leads the Natural Language Research Group at the Information Sciences Institute of the University of Southern California. He is also Deputy Director of the Intelligent Systems Division, a research associate professor in the Computer Science Department of USC, and an Advisory Professor at the Beijing University of Posts and Telecommunications. He completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987. His research focuses on information extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, machine translation, question answering, and digital government.
