Center for the Extraction and Summarization of Events and Opinions in Text  
SYSTEMS

OpinionFinder

OpinionFinder is a system that performs subjectivity analysis, automatically identifying when opinions, sentiments, speculations and other private states are present in text. Specifically, OpinionFinder aims to identify subjective sentences and to mark various aspects of the subjectivity in these sentences, including the source of the subjectivity and words that are included in phrases expressing positive or negative sentiments.

Our goal with OpinionFinder is to develop a system capable of supporting other Natural Language Processing (NLP) applications by providing them with information about the subjectivity in documents. Of particular interest are question answering systems that answer opinion-oriented questions, such as the following:

How is Bush's decision not to ratify the Kyoto Protocol looked upon by Japan and other US allies?
How do the Chinese regard the human rights record of the United States?

To answer these types of questions, a system needs to be able to identify when opinions are expressed in text and who is expressing them. Other applications that would benefit from knowledge of subjective language include systems that summarize the various viewpoints in a document or that mine product reviews. Even typical fact-oriented applications, such as information extraction, can benefit from subjectivity analysis by filtering out opinionated sentences.

OpinionFinder operates as one large pipeline. Conceptually, the pipeline can be divided into two parts. The first part performs mostly general-purpose document processing (e.g., tokenization and part-of-speech tagging). The second part performs the subjectivity analysis. The results of the subjectivity analysis are returned to the user in the form of SGML/XML markup of the original documents.
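The two-part structure can be illustrated with a minimal Python sketch. It is not OpinionFinder's code: the cue list, the function names, and the `<sentence class="...">` tag are all assumptions made for illustration, and the real markup and classifiers differ.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical cue list; the real system uses trained classifiers, not a lookup.
SUBJECTIVE_CUES = {"fears", "happy", "hates", "hopes"}

def split_sentences(text):
    """Part 1 (document processing): naive sentence splitting on ., ! and ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_subjective(sentence):
    """Part 2 (subjectivity analysis): toy check for any cue word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(tok in SUBJECTIVE_CUES for tok in tokens)

def annotate(text):
    """Return the text with each sentence wrapped in illustrative markup."""
    lines = []
    for sent in split_sentences(text):
        label = "subj" if is_subjective(sent) else "obj"
        # The tag and attribute names are invented for this sketch.
        lines.append(f'<sentence class="{label}">{escape(sent)}</sentence>')
    return "\n".join(lines)

print(annotate("The treaty was signed in 1997. Japan fears the decision."))
```

The point of the sketch is only the shape of the flow: general processing first, analysis second, and results attached to the original text as markup.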

1. Document Processing

For general document processing, OpinionFinder first runs the Sundance partial parser (Riloff and Phillips, 2004) to provide semantic class tags, identify Named Entities, and match extraction patterns that correspond to subjective language (Riloff and Wiebe, 2003). Next, OpenNLP 1.3 is used to tokenize, sentence split and part-of-speech tag the data, and the Abney stemmer SCOL version 1g is used to stem.

2. Subjectivity Analysis

The subjectivity analysis has four components.

a) Subjective Sentence Classification

The first component is a Naive Bayes classifier that distinguishes between subjective and objective sentences using a variety of lexical and contextual features (Wiebe and Riloff, 2005; Riloff and Wiebe, 2003). The classifier is trained using subjective and objective sentences, which are automatically generated from a large corpus of unannotated data by two high-precision, rule-based classifiers.
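The training idea can be sketched in a few lines of Python: a high-precision rule labels only the clear cases in unannotated text, and a Naive Bayes classifier trained on those labels then covers the sentences the rule abstained on. The clue lexicon, the thresholds, and the bag-of-words features below are assumptions chosen for illustration; they are not the features or rules used by OpinionFinder.

```python
import math
from collections import Counter

# Hypothetical strong-clue lexicon, for illustration only.
STRONG_CLUES = {"hate", "love", "terrible", "wonderful", "outrageous"}

def rule_label(tokens):
    """High-precision rule: label clear cases, abstain (None) otherwise."""
    hits = sum(tok in STRONG_CLUES for tok in tokens)
    if hits >= 2:
        return "subj"
    if hits == 0:
        return "obj"
    return None

def train_nb(labeled):
    """Collect counts for a multinomial Naive Bayes model."""
    doc_counts = Counter()
    word_counts = {"subj": Counter(), "obj": Counter()}
    for tokens, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["subj"]) | set(word_counts["obj"])
    return doc_counts, word_counts, vocab

def classify(model, tokens):
    """Pick the label with the higher log-probability (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for label in ("subj", "obj"):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

unannotated = [
    "i love this wonderful plan",
    "they hate the terrible outrageous proposal",
    "the meeting starts at noon",
    "the report has ten pages",
    "critics hate the plan",  # one clue only: the rule abstains
]
tokenized = [s.split() for s in unannotated]
auto_labeled = [(t, rule_label(t)) for t in tokenized if rule_label(t) is not None]
model = train_nb(auto_labeled)
print(classify(model, "critics hate the plan".split()))
```

The sentence the rule could not label is still classified by the trained model, which is the benefit of training a statistical classifier on automatically generated data.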

b) Direct Subjective Expression and Speech Event (DSESE) Identification

The second component identifies direct subjective expressions (e.g., "fears," "is happy") and speech events (e.g., "said," "according to"). Direct subjective expressions are words or phrases where an opinion, emotion, sentiment, etc. is directly described. Speech events include both speaking and writing events. A Conditional Random Field sequence tagging model (Lafferty et al., 2001) is used to identify DSESEs. The model is trained on a subset of the MPQA Opinion Corpus.
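Sequence tagging of this kind can be pictured as assigning each token a B (begins a DSESE), I (inside one), or O (outside) tag and decoding the best tag sequence with the Viterbi algorithm, as a linear-chain CRF does at prediction time. The sketch below assumes a BIO encoding and uses hand-set scores in place of learned weights; both are assumptions for illustration, not details of the actual model.

```python
NEG = float("-inf")
TAGS = ("O", "B", "I")

# Hand-set, illustrative scores; a trained CRF would learn feature weights.
DEFAULT = {"O": 1.0}
EMIT = {
    "fears": {"B": 3.0},
    "said": {"B": 3.0},
    "according": {"B": 3.0},
    "to": {"O": 0.5, "I": 1.0},
}

def emit(word, tag):
    return EMIT.get(word.lower(), DEFAULT).get(tag, 0.0)

def trans(prev, cur):
    """An I tag may only follow B or I; all other transitions are neutral."""
    if cur == "I":
        return NEG if prev == "O" else 0.5
    return 0.0

def viterbi(words):
    """Return the highest-scoring tag sequence for the given tokens."""
    if not words:
        return []
    prev = {t: (NEG if t == "I" else emit(words[0], t)) for t in TAGS}
    back = []
    for word in words[1:]:
        cur, ptr = {}, {}
        for tag in TAGS:
            best_prev = max(TAGS, key=lambda p: prev[p] + trans(p, tag))
            cur[tag] = prev[best_prev] + trans(best_prev, tag) + emit(word, tag)
            ptr[tag] = best_prev
        back.append(ptr)
        prev = cur
    tag = max(TAGS, key=lambda t: prev[t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]

def spans(words, tags):
    """Collect the token spans tagged B followed by zero or more I."""
    out, cur = [], []
    for word, tag in zip(words, tags):
        if tag == "I" and cur:
            cur.append(word)
            continue
        if cur:
            out.append(" ".join(cur))
        cur = [word] if tag == "B" else []
    if cur:
        out.append(" ".join(cur))
    return out

words = "According to officials the talks failed".split()
print(spans(words, viterbi(words)))
```

Tagging whole sequences rather than single tokens is what lets multiword expressions such as "according to" come out as one span.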

c) Opinion Source Identification

The third component is a source identifier. The source of a speech event is the speaker; the source of a subjective expression is the experiencer of the private state. The source identifier identifies sources with high precision using extraction patterns that were learned automatically (Choi et al., 2005).
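As a rough picture of pattern-based source extraction, the Python sketch below applies two hand-written patterns. They are stand-ins invented for illustration; the patterns used by OpinionFinder are learned automatically and are not reproduced here. The sketch favors precision over recall: it only matches capitalized names in two fixed constructions.

```python
import re

# A capitalized name of one or more words (illustrative approximation).
NAME = r"(?P<source>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"

# Hand-written stand-ins for automatically learned extraction patterns.
PATTERNS = [
    re.compile(NAME + r" (?:said|fears|claimed)\b"),
    re.compile(r"[Aa]ccording to " + NAME),
]

def find_sources(sentence):
    """Return every source matched by any pattern, in pattern order."""
    sources = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            sources.append(match.group("source"))
    return sources

print(find_sources("Prime Minister Koizumi said the decision was regrettable."))
print(find_sources("According to Japan, the treaty matters."))
```

A sentence such as "The minister said so" yields no source here, which is the expected trade-off of narrow patterns.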

d) Sentiment Expression Classification

The final component uses two classifiers to identify words contained in phrases that express positive or negative sentiments (Wilson et al., 2005). The first classifier identifies sentiment expressions. The second classifier takes those sentiment expressions and classifies each one as positive or negative. Both classifiers were developed using BoosTexter (Schapire and Singer, 2000) and trained on the MPQA Opinion Corpus.
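The two-step design can be sketched in Python as follows. Simple rules stand in for the two BoosTexter classifiers, and the prior-polarity lexicon, the negator list, and the two-token negation window are all assumptions made for illustration.

```python
# Hypothetical prior-polarity lexicon and negator list.
PRIOR = {"good": "positive", "happy": "positive", "bad": "negative", "angry": "negative"}
NEGATORS = {"not", "never", "no"}
FLIP = {"positive": "negative", "negative": "positive"}

def is_sentiment(tokens, i):
    """Step 1: decide whether token i belongs to a sentiment expression."""
    return tokens[i] in PRIOR

def polarity(tokens, i):
    """Step 2: decide polarity in context; a nearby negator flips the prior."""
    prior = PRIOR[tokens[i]]
    window = tokens[max(0, i - 2):i]  # the two preceding tokens
    if any(tok in NEGATORS for tok in window):
        return FLIP[prior]
    return prior

def tag_sentiment(sentence):
    """Run step 1, then step 2 only on the tokens step 1 accepted."""
    tokens = sentence.lower().split()
    return [(tok, polarity(tokens, i))
            for i, tok in enumerate(tokens)
            if is_sentiment(tokens, i)]

print(tag_sentiment("Japan is not happy with the bad decision"))
```

Separating the steps means the polarity decision is only made for words already judged to carry sentiment, so the second step can concentrate on context such as negation.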


References on OpinionFinder

Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP 2005).

John Lafferty, Andrew McCallum, and Fernando Pereira (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning.

Ellen Riloff and William Phillips (2004). An Introduction to the Sundance and AutoSlog Systems. Technical Report UUCS-04-015, School of Computing, University of Utah.

Ellen Riloff and Janyce Wiebe (2003). Learning extraction patterns for subjective expressions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003).

Robert E. Schapire and Yoram Singer (2000). BoosTexter: A boosting based system for text categorization. Machine Learning, 39(2/3):135-168.

Janyce Wiebe and Ellen Riloff (2005). Creating subjective and objective sentence classifiers from unannotated texts. In Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2005).

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP 2005).