Before you annotate a new document, check its sentence splits. Note
that there are two kinds of splits: the default GATE_Splits, which come
from the text processing platform, and the MPQA splits. The
preprocessing done in GATE sets the MPQA splits to be the same as the
GATE Splits.
Splits can be bad in one of two ways: their extent is too small or too
large, or they are in the wrong place. A split in the wrong place
typically needs to be deleted, for instance because it was introduced
after an abbreviation ending in a period.
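As a rough aid for spotting the second kind of bad split, the check can be sketched in Python. This is only an illustration, not part of GATE or the MPQA tooling: the function name, the abbreviation list, and the convention that a split is given as the character offset right after the triggering period are all assumptions made for the example.

```python
# Hypothetical abbreviation list; extend it for your own documents.
ABBREVIATIONS = {"Mr", "Mrs", "Dr", "Prof", "St"}


def suspicious_splits(text, split_offsets):
    """Return the split offsets that directly follow a known abbreviation.

    Each offset is assumed to be the character position right after the
    period that triggered the split (an assumption of this sketch).
    """
    flagged = []
    for offset in split_offsets:
        before = text[:offset].rstrip()
        # Drop the period itself so the bare abbreviation can be looked up.
        if before.endswith("."):
            before = before[:-1]
        # Last whitespace-delimited token before the split.
        tokens = before.split()
        if tokens and tokens[-1] in ABBREVIATIONS:
            flagged.append(offset)
    return flagged


text = "Mr. Bean ... You'll just have to love him!"
print(suspicious_splits(text, [3, 12]))
```

A flagged offset is only a candidate for deletion; you still need to look at the split in context before removing it.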
When you modify splits, change both the GATE and the MPQA splits. (Since
I am not sure at this point whether one or the other type of split is
crucial to automatic systems, let's adjust both.) Also adjust the
associated GATE_Sentence labels and the MPQA_inside labels.
For instance, if you had a split after "Mr" in:
Mr. Bean ... You'll just have to love him!
you would want to remove it (both the GATE split and the MPQA split).
You then need to merge the two MPQA insides that cover "Mr" and
"... You'll just have to love him!". Likewise, you need to merge the
two GATE_Sentences over the same spans.
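The merge step in this example can be pictured with a small Python sketch. It does not use GATE's API; the Span class, the merge_at function, and the character offsets are hypothetical stand-ins chosen for illustration, and only the label names MPQA_inside and GATE_Sentence come from the instructions above. The sketch covers only the merging of the labels, not the deletion of the split annotations themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled character span; a stand-in for a real annotation."""
    label: str
    start: int
    end: int


def merge_at(spans, label, offset):
    """Merge the two `label` spans that meet around `offset`.

    The left span must end at or before `offset` and the right span must
    start at or after it. All other spans are returned unchanged.
    """
    same = sorted((s for s in spans if s.label == label), key=lambda s: s.start)
    for left, right in zip(same, same[1:]):
        if left.end <= offset <= right.start:
            merged = Span(label, left.start, right.end)
            rest = [s for s in spans if s not in (left, right)]
            return sorted(rest + [merged], key=lambda s: (s.start, s.label))
    return list(spans)


text = "Mr. Bean ... You'll just have to love him!"
# Assumed offsets: one span over "Mr", one over the rest of the sentence.
spans = [
    Span("MPQA_inside", 0, 2),
    Span("MPQA_inside", 4, len(text)),
    Span("GATE_Sentence", 0, 2),
    Span("GATE_Sentence", 4, len(text)),
]
split_offset = 3  # assumed position of the bad split, right after "Mr."

# Both label types are merged, mirroring the instruction to adjust both.
for label in ("MPQA_inside", "GATE_Sentence"):
    spans = merge_at(spans, label, split_offset)
print(spans)
```

After both merges, each label type has a single span covering the whole sentence, which is the state you should end up with in the annotation tool.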