
Center for the Extraction and Summarization of Events and Opinions in Text  

RESEARCH

University of Pittsburgh
Detecting Opinion and Sentiment Types in the News and on the Web to Improve Automatic Question Answering and Information Extraction:
The general public is now able to voice their opinions, and to debate, understand, and persuade others on important topics, via message boards, discussion forums, and online debates. Automatically extracting these opinions in a timely manner makes it possible to take the pulse of the general public. In the past year, the University of Pittsburgh research group has been studying how different types of opinions are expressed in the news and on the web, in order to support automatic question answering (QA). Finding answers to a question such as "Are you worried about climate change?" should involve searching for sentiment and emotion expressed in text, whereas finding answers to a question such as "What would be the effect of reporting Iran to the Security Council?" should involve finding persuasive argumentation.

We first showed that these opinion types can be reliably identified by humans. To do so, we trained multiple human annotators to detect Sentiment and Arguing in text and performed statistical tests to confirm that they detect these categories reliably. In the second step, we used machine learning methods to detect these opinion categories automatically, drawing on rich lexical resources developed for subjectivity analysis by CERATOPS researchers. Finally, we incorporated this knowledge into a QA system to enhance its ability to answer opinion questions.

In the past, we have shown that opinion extraction techniques can be used to improve the performance of information extraction systems. Currently, we are studying the types of subjectivity that occur in ProMed-mail articles to determine which types of attitude recognition will be most beneficial for syndromic surveillance.

University of Utah

The natural language processing group at the University of Utah has developed several new methods for extracting factual information from unstructured text. Our new IE techniques have achieved good results in the domains of Latin American terrorism, using the MUC-4 corpus, and infectious disease outbreaks, using a text collection of ProMed-mail articles.

We have developed a new method for IE pattern learning that exploits "role-identifying nouns", which are nouns whose semantics reveal the role that they play in an event (e.g., an "assassin" is by definition a perpetrator). Given a few seed nouns, a bootstrapping algorithm automatically learns role-identifying nouns, which are then used to learn extraction patterns for that event role. Using the same approach, we can also learn expanded extraction patterns that recognize "role-identifying expressions", which consist of a role-identifying verb linked to an event noun (e.g., "<subject> participated in the bombing"). Using these techniques, an IE system can be created for a new domain using only unannotated texts and a few seed words for training.
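
The bootstrapping idea can be illustrated with a minimal Python sketch. This is not the group's actual algorithm: the function name `bootstrap_role_nouns`, the `min_overlap` threshold, the scoring rule (a pattern is trusted once it extracts enough already-known nouns), and the toy extractions are all assumptions made for illustration.

```python
from collections import defaultdict


def bootstrap_role_nouns(extractions, seeds, iterations=2, min_overlap=2):
    """Grow a noun lexicon for one event role from a few seed nouns.

    extractions: iterable of (pattern, noun) pairs, assumed to come from
    applying candidate extraction patterns to unannotated text.
    """
    by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)

    lexicon = set(seeds)
    trusted = set()
    for _ in range(iterations):
        # Assumed rule: trust a pattern once it extracts at least
        # min_overlap nouns that are already in the lexicon.
        newly_trusted = {
            p for p, nouns in by_pattern.items()
            if len(nouns & lexicon) >= min_overlap
        } - trusted
        if not newly_trusted:
            break
        trusted |= newly_trusted
        # Every noun extracted by a trusted pattern joins the lexicon.
        for p in newly_trusted:
            lexicon |= by_pattern[p]
    return lexicon, trusted


# Toy (hypothetical) extractions for the "perpetrator" role.
extractions = [
    ("<subject> murdered", "assassin"),
    ("<subject> murdered", "gunman"),
    ("<subject> murdered", "sniper"),
    ("was killed by <np>", "gunman"),
    ("was killed by <np>", "sniper"),
    ("was killed by <np>", "kidnapper"),
    ("<subject> said", "assassin"),
    ("<subject> said", "minister"),
]
lexicon, patterns = bootstrap_role_nouns(extractions, {"assassin", "gunman"})
```

In this toy run, the general pattern "<subject> said" never becomes trusted because it shares only one noun with the lexicon, so "minister" stays out. A real system would need a more careful scoring function to keep such general patterns from polluting the lexicon.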

In a second line of investigation, we have been exploring a different approach to information extraction that decouples the tasks of finding relevant regions of text and applying extraction patterns.  We create a self-trained relevant sentence classifier to identify relevant regions, and use a "semantic affinity" measure to automatically learn domain-relevant extraction patterns.  We can then distinguish primary patterns from secondary patterns and apply the patterns selectively in the relevant regions. This approach requires only a few seed patterns and a collection of relevant and irrelevant documents for training. We have also explored the idea of using the Web to automatically identify domain-specific IE patterns that were not seen during training.  We used IE patterns learned from the training set as anchors to identify domain-specific web pages and then learned new IE patterns from the web texts. The IE patterns learned from the web improved recall with only a small precision loss.

Cornell University

EXTRACTION OF FINE-GRAINED OPINION FRAMES.  The CERATOPS UAC focuses on developing methods for extracting and summarizing expressions of opinion that appear in digital text. In contrast to most approaches to the analysis of subjective text, which aim to determine whether an entire document is positive or negative in tone, we tackle the problem of automating fine-grained interpretation of opinion expressions at the phrase or clause level. In particular, a single article or text will often contain multiple opinion expressions, each possibly associated with a different source (i.e., the opinion holder), related to a different topic (i.e., the target or subject of the opinion), or indicating a different sentiment (e.g., weakly negative vs. strongly positive).

During this past year, we have developed the first system that produces detailed "opinion frames" for each phrase-level opinion expression in a document. Each frame includes (1) the opinion source (i.e., the opinion holder), (2) the opinion polarity (e.g., positive, neutral, negative), (3) the opinion strength or intensity (e.g., low, mild, extreme), and (4) the opinion trigger (i.e., the word or phrase in the text that indicates an opinion is being expressed). (We are currently working on topic extraction.) The end-to-end system builds on previous work from CERATOPS, the UIUC UAC, and the ISI UAC.
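
One way to picture an opinion frame is as a small record with the four fields listed above. The Python sketch below is only an illustration: the class names, the field names, the helper `group_by_source`, and the example sentence are assumptions, not the system's actual data model, and the polarity and strength values are just the examples given in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


class Strength(Enum):
    LOW = "low"
    MILD = "mild"
    EXTREME = "extreme"


@dataclass(frozen=True)
class OpinionFrame:
    source: str         # the opinion holder
    polarity: Polarity  # positive, neutral, or negative
    strength: Strength  # intensity of the opinion
    trigger: str        # word or phrase signalling the opinion


def group_by_source(frames):
    """Collect frames per opinion holder, a first step toward a summary."""
    grouped = defaultdict(list)
    for frame in frames:
        grouped[frame.source].append(frame)
    return dict(grouped)


# Hypothetical sentence: "The senator strongly condemned the plan,
# while analysts cautiously welcomed it."
frames = [
    OpinionFrame("the senator", Polarity.NEGATIVE, Strength.EXTREME,
                 "strongly condemned"),
    OpinionFrame("analysts", Polarity.POSITIVE, Strength.LOW,
                 "cautiously welcomed"),
]
by_source = group_by_source(frames)
```

A topic field is left out here because the text describes topic extraction as work in progress.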

PRODUCTIZATION OF OPINION SUMMARIZATION SOFTWARE. Over the past year, there has been industry interest in CERATOPS research in the area of summarizing fine-grained opinions. Entrepreneur Larry Levy formed and financed a company, Jodange LLC, to develop opinion summarization systems for the financial services industry.  The company is based in Yonkers, NY, has five employees, and was selected to launch its initial product in January 2008 at the DEMO08 conference.