Voting between Multiple Data Representations for Text Chunking

Shen & Sarkar 2005

Hong Shen & Anoop Sarkar, 2005. In Advances in Artificial Intelligence: 18th Conference of the Canadian Society for Computational Studies of Intelligence, Canadian AI 2005, Victoria, Canada, May 9-11, 2005, Proceedings.

Summary

Shen & Sarkar improve chunking performance by using five 2nd-order HMMs, each trained with a different output representation of the same data, and combining their results by majority voting.
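
As a rough illustration of the voting step, the Python sketch below combines per-token predictions by simple majority, assuming each tagger's output has already been converted to one common scheme so the votes are comparable; the function name and tie-breaking behaviour are illustrative, not taken from the paper.

from collections import Counter

def majority_vote(tag_sequences):
    # tag_sequences: one list of tags per tagger, all the same length and
    # all expressed in the same scheme (e.g. IOB2) so per-token votes make sense.
    voted = []
    for token_tags in zip(*tag_sequences):
        # Pick the most frequent tag for this token; ties break arbitrarily here.
        tag, _ = Counter(token_tags).most_common(1)[0]
        voted.append(tag)
    return voted

# Four of five taggers say "B" for the first token, so "B" wins.
print(majority_vote([["B", "I"], ["B", "I"], ["B", "O"], ["O", "I"], ["B", "I"]]))
# -> ['B', 'I']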

Task

Shen & Sarkar looked at two data sets: the "Base NP" data set and the CoNLL-2000 chunking data set. In the Base NP data set, the task is to identify base noun phrase (NP) chunks. In the CoNLL-2000 data set, the task is to identify chunks of several types (noun phrase, verb phrase, etc.) and to label each chunk with its type.

In both cases, performance appears to be evaluated on a per-token basis, using each corpus's native tagging format, and only over tokens contained in chunks. That is, precision is the number of correct non-O tags divided by the number of non-O tags in the output, and recall is the number of correct non-O tags divided by the number of non-O tags in the gold-standard data. (It's worth noting that the article claims at one point that evaluation is done per chunk, but this seems directly contradicted by several other parts of the article.)
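
Under that reading, the per-token scores could be computed roughly as follows (a sketch; the function name and inputs are mine, not the paper's):

def chunk_token_scores(gold_tags, pred_tags):
    # Per-token precision/recall/F1 over non-O tags: a predicted tag counts
    # as correct only if it is non-O and matches the gold tag exactly.
    correct = sum(1 for g, p in zip(gold_tags, pred_tags) if p != "O" and p == g)
    pred_non_o = sum(1 for p in pred_tags if p != "O")
    gold_non_o = sum(1 for g in gold_tags if g != "O")
    precision = correct / pred_non_o if pred_non_o else 0.0
    recall = correct / gold_non_o if gold_non_o else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1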

Output Representations

Shen & Sarkar considered the following output representations:

  • IOB1: O=a token that is "outside" a chunk; B=a token that begins a chunk that immediately follows another chunk; I=a token that is "inside" a chunk, but is not marked B.
  • IOB2: O=a token that is "outside" a chunk; B=a token that begins a chunk; I=a token that is "inside" a chunk, but is not marked B.
  • IOE1: O=a token that is "outside" a chunk; E=a token that ends a chunk that immediately follows another chunk; I=a token that is "inside" a chunk, but is not marked E.
  • IOE2: O=a token that is "outside" a chunk; E=a token that ends a chunk; I=a token that is "inside" a chunk, but is not marked E.
  • O+C: O=a token that is "outside" a chunk; B=a token that begins a multiword chunk; E=a token that ends a multiword chunk; S=a token that forms a single-word chunk; I=a token that is "inside" a chunk, but is not marked B, E, or S.

Examples:

word      IOB1   IOB2   IOE1   IOE2   O+C
In        O      O      O      O      O
early     I      B      I      I      B
trading   I      I      I      E      E
in        O      O      O      O      O
Hong      I      B      I      I      B
Kong      I      I      E      E      E
Monday    B      B      I      E      S
,         O      O      O      O      O
gold      I      B      I      E      S
was       O      O      O      O      O
quoted    O      O      O      O      O
at        O      O      O      O      O
$         I      B      I      I      B
366.50    I      I      E      E      E
an        B      B      I      I      B
ounce     I      I      I      E      E
.         O      O      O      O      O
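
The mapping from chunk structure to any of these schemes is mechanical. The sketch below reproduces the table above from per-token chunk membership; the chunk-id encoding and function name are illustrative choices, not from the paper.

def tag_sentence(chunk_ids, scheme):
    # chunk_ids gives, per token, the id of the chunk the token belongs to,
    # or None for tokens outside any chunk.
    n = len(chunk_ids)
    tags = []
    for i, cid in enumerate(chunk_ids):
        if cid is None:
            tags.append("O")
            continue
        first = i == 0 or chunk_ids[i - 1] != cid        # token starts its chunk
        last = i == n - 1 or chunk_ids[i + 1] != cid     # token ends its chunk
        after_chunk = first and i > 0 and chunk_ids[i - 1] is not None
        before_chunk = last and i < n - 1 and chunk_ids[i + 1] is not None
        if scheme == "IOB1":
            tags.append("B" if after_chunk else "I")
        elif scheme == "IOB2":
            tags.append("B" if first else "I")
        elif scheme == "IOE1":
            tags.append("E" if before_chunk else "I")
        elif scheme == "IOE2":
            tags.append("E" if last else "I")
        elif scheme == "O+C":
            tags.append("S" if first and last else "B" if first else "E" if last else "I")
        else:
            raise ValueError("unknown scheme: %s" % scheme)
    return tags

# The sentence above: chunks are [early trading], [Hong Kong], [Monday],
# [gold], [$ 366.50], [an ounce]. Each scheme reproduces its column of the table.
ids = [None, 0, 0, None, 1, 1, 2, None, 3, None, None, None, 4, 4, 5, 5, None]
print(tag_sentence(ids, "IOB1"))  # O I I O I I B O I O O O I I B I O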

Model

The machine learning model used by Shen & Sarkar is based on a 2nd-order HMM:

  ______________
 /        ______\_______
 |       /      |       \
 |       |      V       V
y1 ---> y2 ---> y3 ---> y4 ---> ...
 |       |       |       |
 V       V       V       V
x1      x2      x3      x4

Here x[i] is the i'th "input" and y[i] is the i'th "output"; because the HMM is 2nd order, each y[i] is conditioned on the two previous outputs y[i-1] and y[i-2] (the skip arcs above). Shen & Sarkar examine several variations on this model, which basically come down to redefining what is treated as the "input" and what is treated as the "output". The two basic models are:

Model          Input   Output
basic trigram  POS     chunk tag
SP             POS     (POS, chunk tag)

(Here "chunk tag" means one of I, O, B, E, S.) What SP does is essentially multiply the number of states in the Markov model, which lets its decisions rely on more history information.

The remaining models are formed by lexicalizing the input and output for specific words. E.g., SP+Lex-WHF uses (POS, word) as input and (POS, word, chunk tag) as output for words that occur at least 100 times in the training data. Again, these lexicalized models are essentially just giving the Markov model more history to use in making decisions.
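
As a rough illustration (not the paper's actual implementation), the sketch below estimates the trigram transition distribution P(y[i] | y[i-2], y[i-1]) by maximum likelihood from training sequences of states; smoothing and Viterbi decoding are omitted, and the helper names are mine.

from collections import defaultdict

def trigram_transitions(state_sequences):
    # Count P(y[i] | y[i-2], y[i-1]) from sequences of states. For the basic
    # model a state is just a chunk tag; for SP it is a (POS, chunk tag) pair;
    # for the lexicalized variants it is (POS, word, chunk tag) for frequent
    # words. Bigger states mean more history available to the transition model.
    counts = defaultdict(lambda: defaultdict(int))
    for states in state_sequences:
        padded = ["<s>", "<s>"] + list(states)
        for i in range(2, len(padded)):
            counts[(padded[i - 2], padded[i - 1])][padded[i]] += 1
    probs = {}
    for hist, nxt in counts.items():
        total = sum(nxt.values())
        probs[hist] = {state: c / total for state, c in nxt.items()}
    return probs

# Basic model: states are chunk tags.
basic = trigram_transitions([["O", "B", "I", "I", "O"]])
# SP model: states pair the POS tag with the chunk tag.
sp = trigram_transitions([[("IN", "O"), ("JJ", "B"), ("NN", "I")]])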

Evaluation

Shen & Sarkar evaluated their system on the NP chunking and CoNLL-2000 data sets. They achieved a modest increase in F-score (95.23 on base NP chunking, vs. a best-previous of 94.22). They also point out that their model trains much faster than the previous best, which was SVM-based.
