OpinionFinder
OpinionFinder
is a system that performs subjectivity analysis, automatically
identifying when opinions, sentiments, speculations and other private
states are present in text. Specifically, OpinionFinder aims to
identify subjective sentences and to mark various aspects of
the subjectivity in these sentences, including the source of
the subjectivity and words that are included in phrases expressing
positive or negative sentiments.
Our goal
with OpinionFinder is to develop a system capable of supporting other
Natural Language Processing (NLP) applications by providing them with
information about the subjectivity in documents. Of particular interest
are question answering systems that focus on being able to answer
opinion-oriented questions, such as the following:
How is Bush's decision not to ratify the Kyoto Protocol looked upon by Japan and other US allies?
How do the Chinese regard the human rights record of the United States?
To answer
these types of questions, a system needs to be able to identify when
opinions are expressed in text and who is expressing them. Other
applications that would benefit from knowledge of subjective language
include systems that summarize the various viewpoints in a document or
that mine product reviews. Even typical fact-oriented applications,
such as information extraction, can benefit from subjectivity analysis
by filtering out opinionated sentences.
OpinionFinder
operates as one large pipeline. Conceptually, the pipeline can be
divided into two parts. The first part performs mostly general purpose
document processing (e.g., tokenization and part-of-speech tagging).
The second part performs the subjectivity analysis. The results of the
subjectivity analysis are returned to the user in the form of
SGML/XML markup of the original documents.
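The two-part structure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `<subj>` tag, and the single-word clue test are hypothetical stand-ins and do not reflect OpinionFinder's actual components or output format.

```python
from xml.sax.saxutils import escape

def preprocess(document):
    # Part 1 stand-in: naive sentence split on "." and whitespace tokenization.
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [{"text": s + ".", "tokens": s.lower().split()} for s in sentences]

def analyze(sentences, clues):
    # Part 2 stand-in: flag a sentence as subjective if it contains a clue word.
    for s in sentences:
        s["subjective"] = any(t in clues for t in s["tokens"])
    return sentences

def to_markup(sentences):
    # Wrap subjective sentences in a hypothetical <subj> tag; escape the text
    # so the result stays well-formed.
    out = []
    for s in sentences:
        text = escape(s["text"])
        out.append(f"<subj>{text}</subj>" if s["subjective"] else text)
    return " ".join(out)

doc = "The treaty was signed in 1997. Allies fear the decision."
marked = to_markup(analyze(preprocess(doc), {"fear"}))
print(marked)
```

Each stage only consumes the output of the previous one, which is the property that lets the general-purpose processing and the subjectivity analysis be treated as separate parts of one pipeline.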
1. Document Processing
For general document processing, OpinionFinder first runs the Sundance
partial parser (Riloff and Phillips, 2004) to provide semantic class
tags, identify named entities, and match extraction patterns that
correspond to subjective language (Riloff and Wiebe, 2003). Next,
OpenNLP 1.3 is used to tokenize the data, split it into sentences, and
assign part-of-speech tags, and the Abney stemmer (SCOL version 1g) is
used to stem.
2. Subjectivity Analysis
The subjectivity analysis has four components.
a) Subjective Sentence Classification
The first
component is a Naive Bayes
classifier that distinguishes between subjective and objective
sentences using a
variety of lexical and contextual features (Wiebe and Riloff, 2005;
Riloff and
Wiebe, 2003). The classifier is trained using subjective and objective
sentences, which are automatically generated from a large corpus of
unannotated
data by two high-precision, rule-based classifiers.
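The training scheme described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system's implementation: the clue list, the thresholds in `rule_label`, and the bag-of-words features are all hypothetical, and the real classifier uses a richer set of lexical and contextual features.

```python
import math
from collections import Counter

# Hypothetical clue list; a real subjectivity lexicon is far larger.
STRONG_CLUES = {"hate", "love", "terrible", "wonderful", "fear", "outrageous"}

def rule_label(tokens):
    """High-precision rule stand-in: label only clear cases, else abstain."""
    hits = sum(1 for t in tokens if t in STRONG_CLUES)
    if hits >= 2:
        return "subj"
    if hits == 0:
        return "obj"
    return None  # ambiguous sentence: leave it out of the training set

def train_nb(labeled):
    """Collect per-class word counts and sentence counts."""
    word_counts = {"subj": Counter(), "obj": Counter()}
    doc_counts = Counter()
    for tokens, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["subj"]) | set(word_counts["obj"])
    return word_counts, doc_counts, vocab

def classify_nb(model, tokens):
    """Multinomial Naive Bayes with add-one smoothing; unseen words are skipped."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in ("subj", "obj"):
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

corpus = [
    "i love this wonderful plan".split(),
    "they hate the terrible outrageous delay".split(),
    "the meeting starts at noon on monday".split(),
    "the treaty was signed in kyoto in 1997".split(),
    "critics fear the outcome".split(),  # one clue only: the rules abstain
]
auto_labeled = [(s, rule_label(s)) for s in corpus]
auto_labeled = [(s, y) for s, y in auto_labeled if y is not None]
model = train_nb(auto_labeled)
print(classify_nb(model, "i hate this delay".split()))
```

The point of the design is that the rule-based labelers trade recall for precision: they abstain on unclear sentences, and the statistical classifier trained on their output can then label sentences the rules would skip.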
b) Direct Subjective Expression and Speech Event (DSESE) Identification
The second
component identifies direct
subjective expressions (e.g., "fears," "is happy") and speech
events (e.g.,
"said," "according to"). Direct subjective expressions are words or
phrases
where an opinion, emotion, sentiment, etc. is directly described.
Speech events
include both speaking and writing events. A Conditional Random Field
sequence
tagging model (Lafferty et al., 2001) is used to identify DSESEs. The
model is
trained on a subset of the MPQA
Opinion Corpus.
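A sequence tagger assigns one label per token, so multi-word expressions such as "according to" have to be recovered from the label sequence. The sketch below assumes a BIO-style encoding with hypothetical label names `DSE` and `SE`; the source does not state which encoding the model uses, and the tags here are written by hand rather than produced by a trained CRF.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tagged tokens into (type, start, end, text) spans.

    `end` is exclusive. An I- tag with no open span of the same type
    is treated leniently as the start of a new span.
    """
    spans = []
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((kind, start, i, " ".join(tokens[start:i])))
            start, kind = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans

tokens = ["Japan", "said", "it", "fears", "the", "decision", ",",
          "according", "to", "officials"]
tags = ["O", "B-SE", "O", "B-DSE", "O", "O", "O", "B-SE", "I-SE", "O"]
spans = bio_to_spans(tokens, tags)
for span in spans:
    print(span)
```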
c) Opinion Source Identification
The third
component is a source
identifier. The source of a speech event is the speaker; the source of
a
subjective expression is the experiencer of the private state. This
component identifies sources with high precision using extraction
patterns that were learned automatically (Choi et al., 2005).
d) Sentiment Expression Classification
The final
component uses two
classifiers to identify words contained in phrases that express
positive or
negative sentiments (Wilson et al., 2005). The first classifier focuses
on
identifying sentiment expressions. The second classifier takes the
sentiment
expressions and identifies which are positive and which are negative. Both
classifiers were developed using BoosTexter (Schapire and Singer, 2000)
and
trained on the MPQA
Opinion
Corpus.
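The two-classifier cascade can be sketched as two functions applied in sequence. In this sketch, hand-written rules replace the BoosTexter classifiers, and the lexicon, the negation window, and the capitalization cue are hypothetical simplifications chosen only to make the two stages visible.

```python
# Hypothetical prior-polarity lexicon and negation words.
PRIOR_POLARITY = {
    "good": "positive", "happy": "positive", "trust": "positive",
    "bad": "negative", "fear": "negative",
}
NEGATORS = {"not", "never", "no"}

def is_sentiment_expression(tokens, i):
    """Stage 1 stand-in: is the clue at position i polar in this context?"""
    word = tokens[i]
    if word.lower() not in PRIOR_POLARITY:
        return False
    # Toy contextual cue: a capitalized clue inside a sentence is probably
    # part of a name (e.g. an organization) and is treated as neutral.
    if i > 0 and word[0].isupper():
        return False
    return True

def contextual_polarity(tokens, i):
    """Stage 2 stand-in: start from prior polarity, flip if negated nearby."""
    polarity = PRIOR_POLARITY[tokens[i].lower()]
    window = [t.lower() for t in tokens[max(0, i - 3):i]]
    if any(t in NEGATORS for t in window):
        polarity = "negative" if polarity == "positive" else "positive"
    return polarity

def tag_sentiment(tokens):
    """Run stage 1, then stage 2 only on the words stage 1 accepted."""
    return [
        (i, tok, contextual_polarity(tokens, i))
        for i, tok in enumerate(tokens)
        if is_sentiment_expression(tokens, i)
    ]

sentence = "The National Environmental Trust is not happy with the decision".split()
print(tag_sentiment(sentence))
```

Splitting the decision this way means the polarity stage never sees words that are neutral in context, such as "Trust" in the organization name above.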
References on OpinionFinder
Yejin Choi,
Claire Cardie, Ellen Riloff, and Siddharth Patwardhan (2005).
Identifying sources of opinions with conditional random fields and
extraction patterns. In Proceedings of the Human Language Technology
Conference/Conference on Empirical Methods in Natural Language
Processing (HLT-EMNLP 2005).
John
Lafferty, Andrew McCallum, and Fernando Pereira (2001). Conditional
random fields: Probabilistic models for segmenting and labeling
sequence data. In Proceedings of the 18th International Conference on
Machine Learning.
Ellen Riloff
and William Phillips (2004). An Introduction to the Sundance and
AutoSlog Systems. Technical Report UUCS-04-015, School of Computing,
University of Utah.
Ellen Riloff
and Janyce Wiebe (2003). Learning extraction patterns for subjective
expressions. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP-2003).
Robert E.
Schapire and Yoram Singer (2000). BoosTexter: A boosting based system
for text categorization. Machine Learning, 39(2/3):135-168.
Janyce Wiebe
and Ellen Riloff (2005). Creating subjective and objective sentence
classifiers from unannotated texts. In Proceedings of the Sixth
International Conference on Intelligent Text Processing and
Computational Linguistics (CICLing-2005).
Theresa
Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing contextual
polarity in phrase-level sentiment analysis. In Proceedings of the
Human Language Technology Conference/Conference on Empirical Methods in
Natural Language Processing (HLT-EMNLP 2005).