| CERATOPS | |||||
| Center for the Extraction and Summarization of Events and Opinions in Text | |||||
RESEARCH |
|||||
University of Pittsburgh |
|||||
Detecting Opinion and Sentiment Types in the News and on the Web to Improve Automatic Question Answering and Information Extraction: The general public is now able
to voice their opinions, as well as
debate, understand and persuade others on important topics via message
boards, discussion forums and online debates. Being able to
automatically extract these opinions in a timely manner enables one to
take the pulse of the general public. In the past year, the
University of Pittsburgh research group has been studying the
expression of different types of opinions in the news and on the web to support automatic question answering (QA). Finding answers to a question such as "Are you worried about climate change?" should involve searching for sentiment and emotion expressed in text, whereas finding answers to a question such as "What would be the effect of reporting Iran to the Security Council?" should involve finding persuasive argumentation. We first showed that these
opinion types can be reliably identified by
humans. For this purpose, we trained multiple human annotators to
detect Sentiment and Arguing in text and performed statistical tests to
confirm that humans are able to detect these categories reliably. In
the second step, we used machine learning methods to detect these
opinion categories automatically. For this, we used rich lexical
resources developed for subjectivity analysis by CERATOPS researchers.
Finally, we incorporated this knowledge in a QA system to enhance its
capability to answer opinion questions.
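As an illustration of the reliability check described above, the following is a minimal sketch of an inter-annotator agreement computation. The text does not say which statistical test was used; Cohen's kappa is assumed here as one common choice, and the function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: kappa stands in for the unspecified statistical test.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations, invented for illustration only.
annotator_1 = ["sentiment", "arguing", "none", "sentiment", "arguing", "none"]
annotator_2 = ["sentiment", "arguing", "none", "arguing", "arguing", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A value close to 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance.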
In the past, we have shown that
opinion extraction techniques may be
used to improve the performance of information extraction
systems. Currently, we are conducting a study of the types of
subjectivity that occur in ProMed-mail articles to determine what types of
attitude recognition will be most beneficial for syndromic surveillance.
|
|||||
University of Utah |
|||||
The
natural language processing group at the University of Utah has
developed several new methods for extracting factual information from
unstructured text. Our new IE techniques have achieved good results in
the domains of Latin American terrorism, using the MUC-4 corpus, and
infectious disease outbreaks, using a text collection of ProMed-mail
articles.
We have developed a new method for IE pattern learning that exploits "role-identifying nouns", which are nouns whose semantics reveal the role that they play in an event (e.g., an "assassin" is by definition a perpetrator). Given a few seed nouns, a bootstrapping algorithm automatically learns role-identifying nouns, which are then used to learn extraction patterns for that event role. Using the same approach, we can also learn expanded extraction patterns that recognize "role-identifying expressions", which consist of a role-identifying verb linked to an event noun (e.g., "<subject> participated in the bombing"). Using these techniques, an IE system can be created for a new domain using only unannotated texts and a few seed words for training.
In a second line of
investigation, we have been exploring a different
approach to information extraction that decouples the tasks of finding
relevant regions of text and applying extraction patterns. We
create a self-trained relevant sentence classifier to identify relevant
regions, and use a "semantic affinity" measure to automatically learn
domain-relevant extraction patterns. We can then distinguish
primary patterns from secondary patterns and apply the patterns
selectively in the relevant regions. This approach requires only a few
seed patterns and a collection of relevant and irrelevant documents for
training. We have also explored the idea of using the Web to
automatically identify domain-specific IE patterns that were not seen
during training. We used IE patterns learned from the training
set as anchors to identify domain-specific web pages and then learned
new IE patterns from the web texts. The IE patterns learned from the
web improved recall with only a small precision loss.
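The seed-based bootstrapping of role-identifying nouns described earlier in this section can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the group's actual algorithm: the function name `bootstrap_role_nouns`, the `min_precision` threshold, the scoring rule, and the toy (pattern, noun) pairs are all hypothetical.

```python
from collections import defaultdict


def bootstrap_role_nouns(extractions, seeds, iterations=3, min_precision=0.5):
    """Grow a set of role-identifying nouns from a few seed nouns.

    extractions: iterable of (pattern, noun) pairs, i.e. which noun each
    extraction pattern pulled out of unannotated text.
    The trust rule (share of already-known role nouns) is an assumption.
    """
    nouns_by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        nouns_by_pattern[pattern].add(noun)

    role_nouns = set(seeds)
    trusted_patterns = set()
    for _ in range(iterations):
        # Trust a pattern when enough of its nouns are known role nouns.
        for pattern, nouns in nouns_by_pattern.items():
            if len(nouns & role_nouns) / len(nouns) >= min_precision:
                trusted_patterns.add(pattern)
        # Every noun extracted by a trusted pattern becomes a role noun.
        new_nouns = set()
        for pattern in trusted_patterns:
            new_nouns |= nouns_by_pattern[pattern]
        if new_nouns <= role_nouns:
            break  # nothing new was learned, so stop early
        role_nouns |= new_nouns
    return role_nouns, trusted_patterns


# Hypothetical extractions for the perpetrator role, invented for illustration.
toy_extractions = [
    ("<subject> assassinated", "assassin"),
    ("<subject> assassinated", "gunman"),
    ("<subject> kidnapped", "gunman"),
    ("<subject> kidnapped", "kidnapper"),
    ("<subject> said", "president"),
    ("<subject> said", "spokesman"),
    ("<subject> said", "gunman"),
]
perpetrators, patterns = bootstrap_role_nouns(toy_extractions, {"assassin"})
```

In this toy run the generic pattern "<subject> said" is never trusted, because most of the nouns it extracts are not known role nouns. That is the kind of filtering a precision-style threshold is meant to provide.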
|
|||||
Cornell University |
|||||
EXTRACTION OF
FINE-GRAINED OPINION FRAMES. The CERATOPS UAC focuses on the
development of methods for extracting and summarizing expressions of
opinion that appear in digital text. In contrast to most
approaches to the analysis of subjective text, which aim to determine
whether an entire document is positive or negative in tone, we tackle
the problem of automating fine-grained interpretation of opinion
expressions at the phrase-level or clause-level. In particular, a
single article or text will often have multiple opinion expressions ---
with each opinion expression possibly associated with a different
source (e.g., the opinion holder), related to a different topic (e.g.,
the target or subject of the opinion), or indicating a different
sentiment (e.g., weakly negative vs. strongly positive).
During this past year, we have
developed the first system that produces detailed "opinion frames" for
each phrase-level opinion expression in a document. Each frame
includes (1) the opinion source (e.g., the opinion holder), (2) the
opinion polarity (e.g., positive, neutral, negative), (3) the opinion
strength or intensity (e.g., low, mild, extreme), and (4) the opinion
trigger (e.g., the word, words, or phrase from the digital text that
indicate an opinion is being expressed). (We are currently working on
topic extraction.) The end-to-end system builds on previous work from
CERATOPS, the UIUC UAC, and the ISI UAC.
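To make the four frame components concrete, here is a minimal sketch of how an opinion frame might be represented as a data structure. The class and field names are hypothetical and do not reflect the system's actual output format; the polarity and strength values are taken from the examples given in the text, and the sample frames are invented.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


class Strength(Enum):
    LOW = "low"
    MILD = "mild"
    EXTREME = "extreme"


@dataclass(frozen=True)
class OpinionFrame:
    """One phrase-level opinion expression (hypothetical representation)."""
    source: str          # the opinion holder
    polarity: Polarity   # positive, neutral, or negative
    strength: Strength   # intensity of the opinion
    trigger: str         # words in the text that signal the opinion


def frames_by_source(frames):
    """Group frames by opinion holder, keeping document order."""
    grouped = {}
    for frame in frames:
        grouped.setdefault(frame.source, []).append(frame)
    return grouped


# Invented frames for a single document, for illustration only.
frames = [
    OpinionFrame("the senator", Polarity.NEGATIVE, Strength.EXTREME, "denounced"),
    OpinionFrame("analysts", Polarity.POSITIVE, Strength.MILD, "welcomed"),
    OpinionFrame("the senator", Polarity.NEUTRAL, Strength.LOW, "said"),
]
grouped = frames_by_source(frames)
```

Grouping frames by source is one simple step toward the kind of opinion summarization described here, since one document can contain several opinions from the same holder.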
PRODUCTIZATION OF OPINION SUMMARIZATION SOFTWARE. Over the past year, there has been industry interest in CERATOPS research in the area of summarizing fine-grained opinions. Entrepreneur Larry Levy formed and financed a company, Jodange LLC, to develop opinion summarization systems for the financial services industry. The company is based in Yonkers, NY, has five employees, and was selected to launch its initial product in January 2008 at the DEMO08 conference. |
|||||