Opinion Annotation in GATE
Introduction
The instructions below apply to the current style of opinion annotation performed by the Subjectivity Analysis group at the University of Pittsburgh. For a discussion of the annotation scheme, please read the LRE 2005 paper on annotating subjectivity and the ACL 2005 paper (bibtex) on the extension with attitude annotations. The tool that we use for annotation is GATE, which is freely available from the University of Sheffield. See http://gate.ac.uk/ for more information about GATE. These instructions are current as of GATE 6.0.
Getting Started
- Download and install GATE version 6.0 or higher: http://gate.ac.uk/download/
- Download and unpack mpqa-annotation.tar.bz2. This will create a directory called mpqa-annotation, which includes the MPQA schema files and example annotation files.
- Load the MPQA annotation scheme (you should only need to do this once)
- Load an Annotated Document (and look at the annotations)
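
If you prefer to unpack the archive from a script rather than with a desktop archive tool, the following is a minimal sketch using Python's standard tarfile module. It assumes that mpqa-annotation.tar.bz2 has already been downloaded into the current directory; the function name unpack_archive is hypothetical and is not part of GATE or the MPQA distribution.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir="."):
    """Unpack a .tar.bz2 archive into dest_dir and return the member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()
        # Refuse members that would be written outside the destination directory.
        for name in names:
            target = (dest / name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path("mpqa-annotation.tar.bz2")
    if archive.exists():
        for name in unpack_archive(archive):
            print(name)
    else:
        print(f"{archive} not found; download it first")
```

Run from the download directory, this should leave an mpqa-annotation directory next to the archive and print the paths it contains.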

How to Annotate in GATE
In GATE, load the document: sample-documents/examples-untagged.xml
While it's possible to load a plain text file or a web document URL and to just begin annotating, some minimal preprocessing can make the annotation process a bit easier (see XXXX for how we preprocess a document in GATE).
MPQA Annotation Scheme
Annotation Practice
This practice will walk you through a variety of different cases (implicit direct-subjective and objective-speech annotations, implicit sources, etc.) and show how they are annotated in GATE.
Reopen the document sample-documents/examples-untagged.xml in GATE. There are three sections in this document (HR10, HR16, and PNG1), each taken from a different news article. Each link below will take you to explicit instructions for annotating a given section.
There are three additional practice documents in the sample-documents directory:
- 09.53.15-23595.unannotated.xml
- 11.21.37-22256.unannotated.xml
- 21.37.46-9337.unannotated.xml
Annotated versions of these documents are also provided, which you can use to check your practice annotations.
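
Before starting the practice, it can help to confirm that the three unannotated files listed above were actually unpacked. The sketch below is one way to check; it assumes that sample-documents sits inside the mpqa-annotation directory relative to where the script is run, and the helper name missing_practice_documents is hypothetical.

```python
from pathlib import Path

# File names as listed in these instructions.
PRACTICE_DOCUMENTS = [
    "09.53.15-23595.unannotated.xml",
    "11.21.37-22256.unannotated.xml",
    "21.37.46-9337.unannotated.xml",
]


def missing_practice_documents(sample_dir="mpqa-annotation/sample-documents"):
    """Return the practice document names that are not present in sample_dir."""
    folder = Path(sample_dir)  # assumed location; adjust to your installation
    return [name for name in PRACTICE_DOCUMENTS if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_practice_documents()
    if missing:
        print("Missing practice documents:")
        for name in missing:
            print("  " + name)
    else:
        print("All practice documents found.")
```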
This work was (in part) supported by The Northeast Regional Research Center (NRRC), which is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA), a U.S. Government entity, which sponsors and promotes research of import to the Intelligence Community, which includes but is not limited to the CIA, DIA, NSA, NIMA, and NRO.
Annotation instructions by J. Ruppenhofer
