A Novel Use of Statistical Parsing to Extract Information from Text

Miller, Fox, Ramshaw, Weischedel 2000

Miller et al. describe an integrated system for NP detection/categorization and relation categorization. It is essentially a generative statistical parser trained on trees that are augmented with semantic information. In particular, NPs are marked with a category where appropriate, and relations are marked on intermediate nodes using slashed categories.
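
As a rough illustration of how entities and relations could be read off such an augmented tree, here is a minimal sketch. The tuple-based tree encoding, the `SYNTACTIC/SEMANTIC` label order, and the label names (`PERSON`, `ORGANIZATION`, `EMPLOYEE-OF`) are assumptions made for this example, not the paper's actual label inventory.

```python
from typing import List, Tuple

# Hypothetical encoding: a tree is either a word (str) or a (label, children) pair.

def words(tree) -> List[str]:
    """Collect the leaf words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    out = []
    for child in children:
        out.extend(words(child))
    return out

def semantic_nodes(tree) -> List[Tuple[str, str, str]]:
    """Return (syntactic label, semantic label, text) for every slashed node."""
    if isinstance(tree, str):
        return []
    label, children = tree
    found = []
    if "/" in label:
        syntactic, semantic = label.split("/", 1)
        found.append((syntactic, semantic, " ".join(words(tree))))
    for child in children:
        found.extend(semantic_nodes(child))
    return found

# Illustrative tree: entity categories on NPs, a relation on an intermediate node.
example = ("S", [
    ("NP/PERSON", ["Smith"]),
    ("VP/EMPLOYEE-OF", [
        "joined",
        ("NP/ORGANIZATION", ["Acme"]),
    ]),
])

for node in semantic_nodes(example):
    print(node)
```

Because the semantic labels live on the tree nodes themselves, a single parse yields both the entity mentions and the relation-bearing constituents.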

To create the training trees, they first ran an existing parser (trained on the treebank) on unlabeled training data, and then manually annotated the NP categories and relations. The parse trees were then automatically merged with the hand annotations.
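
The merge step can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each hand annotation is a token span that lines up exactly with one constituent, and it leaves spans that match no constituent unmerged. The function name `merge` and the span-to-category dictionary are hypothetical.

```python
def merge(tree, annotations, start=0):
    """Return (new_tree, end) with annotated spans folded into node labels.

    tree: a word (str) or a (label, children) pair.
    annotations: maps (start, end) token spans to a semantic category.
    Note: in a unary chain, every node covering the span gets relabeled.
    """
    if isinstance(tree, str):
        return tree, start + 1
    label, children = tree
    pos = start
    new_children = []
    for child in children:
        new_child, pos = merge(child, annotations, pos)
        new_children.append(new_child)
    # The node covers tokens [start, pos); augment its label if annotated.
    category = annotations.get((start, pos))
    if category is not None:
        label = f"{label}/{category}"
    return (label, new_children), pos

parse = ("S", [
    ("NP", ["Smith"]),
    ("VP", ["joined", ("NP", ["Acme", "Corp"])]),
])
annotations = {(0, 1): "PERSON", (2, 4): "ORGANIZATION"}

merged, length = merge(parse, annotations)
print(merged)
```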

The parser model uses interpolated MLE estimates, with separate models for modifier constituents, POS tags, head words, and word features. Decoding uses a CKY-style chart parser that prunes low-probability elements. They report around 83.5% F-score on entities and 71% F-score on relations.
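
To make "interpolated MLE estimates" concrete, here is a minimal sketch that mixes a relative-frequency estimate from a specific context with one from a coarser back-off context. The fixed weight `lam`, the class name, and the choice of contexts are assumptions for illustration; the paper's actual back-off levels and weighting scheme are not reproduced here.

```python
from collections import Counter

class InterpolatedEstimator:
    """Linear interpolation of two maximum-likelihood estimates."""

    def __init__(self, lam=0.7):
        self.lam = lam              # weight on the specific-context estimate (assumed fixed)
        self.joint_full = Counter() # counts of (event, full context)
        self.ctx_full = Counter()   # counts of full context
        self.joint_back = Counter() # counts of (event, back-off context)
        self.ctx_back = Counter()   # counts of back-off context

    def observe(self, event, full_ctx, back_ctx):
        self.joint_full[(event, full_ctx)] += 1
        self.ctx_full[full_ctx] += 1
        self.joint_back[(event, back_ctx)] += 1
        self.ctx_back[back_ctx] += 1

    @staticmethod
    def _mle(joint, ctx_counts, event, ctx):
        total = ctx_counts[ctx]
        return joint[(event, ctx)] / total if total else 0.0

    def prob(self, event, full_ctx, back_ctx):
        p_back = self._mle(self.joint_back, self.ctx_back, event, back_ctx)
        if self.ctx_full[full_ctx] == 0:
            return p_back  # unseen specific context: rely on the back-off alone
        p_full = self._mle(self.joint_full, self.ctx_full, event, full_ctx)
        return self.lam * p_full + (1 - self.lam) * p_back

# Toy head-word model: condition on (POS tag, node label), back off to the tag.
est = InterpolatedEstimator(lam=0.7)
est.observe("Smith", ("NNP", "NP/PERSON"), "NNP")
est.observe("Smith", ("NNP", "NP/PERSON"), "NNP")
est.observe("Acme", ("NNP", "NP/ORGANIZATION"), "NNP")

p_smith = est.prob("Smith", ("NNP", "NP/PERSON"), "NNP")
p_acme = est.prob("Acme", ("NNP", "NP/PERSON"), "NNP")
```

The back-off term gives "Acme" a small nonzero probability under a context where it was never observed, which is the point of interpolating rather than using the specific-context MLE alone.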

This system was constructed at BBN.


  author =       {Scott Miller and Heidi Fox and Lance Ramshaw
                  and Ralph Weischedel},
  title =        {A Novel Use of Statistical Parsing to
                  Extract Information from Text},
  booktitle =    {Proceedings of the 1st Annual Meeting of the
                  North American Chapter of the ACL (NAACL)},
  pages =        {226--233},
  year =         2000