NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Yangarber, Grishman 1998

Yangarber and Grishman describe Proteus, a pipelined, pattern-driven information extraction (IE) system, and PET, a user interface for creating patterns for a new domain.

The Proteus pipeline contains:

  • A tokenizer
  • A regexp-driven named entity recognizer
  • A regexp-driven chunker
  • A regexp-driven scenario detector
  • Reference resolution
  • Discourse analysis
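As a rough illustration of how stages 1-4 cascade, here is a Python sketch in which each stage rewrites the spans it matches as bracketed labels, so a later stage can match on the output of an earlier one. It is a toy under stated assumptions: the patterns, the labels (`ORG`, `VEHICLE`, `VG`, `NG`), and the function names (`tokenize`, `apply_layer`, `extract_launches`) are hypothetical. String rewriting with Python's `re` stands in for Proteus's own pattern machinery, and reference resolution and discourse analysis are omitted.

```python
import re

# Hypothetical toy patterns for a launch-like domain; not taken from the paper.
NAME_PATTERNS = [
    ("ORG", re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc)\b")),
    ("VEHICLE", re.compile(r"\b(?:Ariane|Atlas|Delta) \d+\b")),
]
CHUNK_PATTERNS = [
    ("VG", re.compile(r"\b(?:will |has |have )?launched\b")),
    ("NG", re.compile(r"\b(?:a|an|the) (?:[a-z]+ )*satellite\b")),
]
# Scenario pattern: it matches the bracketed output of the two earlier layers.
LAUNCH = re.compile(r"\[ORG ([^\]]+)\] \[VG [^\]]+\] \[NG ([^\]]+)\]")


def tokenize(text: str) -> str:
    # Split sentence punctuation off the preceding word.
    return re.sub(r"\s*([.,;])", r" \1", text).strip()


def apply_layer(text: str, patterns) -> str:
    # Wrap every match of each pattern as "[LABEL matched text]".
    for label, pattern in patterns:
        text = pattern.sub(lambda m: f"[{label} {m.group(0)}]", text)
    return text


def extract_launches(text: str):
    # Cascade: tokenizer -> name layer -> chunk layer -> scenario pattern.
    annotated = apply_layer(apply_layer(tokenize(text), NAME_PATTERNS), CHUNK_PATTERNS)
    events = [
        {"agent": m.group(1), "payload": m.group(2)}
        for m in LAUNCH.finditer(annotated)
    ]
    return annotated, events


annotated, events = extract_launches(
    "Orbital Sciences Corp launched a small communications satellite."
)
print(annotated)
print(events)
```

The point of the cascade is that the scenario pattern never sees raw words: it is written over `ORG`, `VG`, and `NG` units, which keeps it short and lets the earlier layers be reused across scenarios.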

PET is a tool for rapidly developing the regexps for stages 2-4 (named entity recognition, chunking, and scenario detection). It organizes patterns into three layers: "core patterns" are always used; "libraries" contain patterns for a subdomain, and several libraries can be used at once; and "user patterns" customize Proteus for a specific domain.

To add a new pattern, the user enters a sample sentence, selects the event template, and tunes the pattern. For scenario patterns, the user must fill the template slots explicitly. Finally, PET applies rules that generalize the pattern to related syntactic constructs (e.g., it creates the passive form). Yangarber and Grishman describe using Proteus/PET for the launch scenario and report that it considerably reduced the time needed to customize their system to a new domain.
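The layering and the passive generalization can be sketched in Python as follows. This is an illustration under assumptions, not PET's actual data model: `ClausePattern`, `generalize`, and `effective_patterns` are hypothetical names, the order core, then libraries, then user patterns is assumed, and only the active-to-passive rule is modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClausePattern:
    # Hypothetical clause-level pattern: a verb plus the template slots
    # filled by its logical agent and logical theme.
    name: str
    verb: str
    voice: str          # "active" or "passive"
    agent_slot: str     # slot filled by the logical subject
    theme_slot: str     # slot filled by the logical object

    def surface(self) -> str:
        # Readable sketch of the constituent sequence this pattern matches.
        if self.voice == "active":
            return f"NG({self.agent_slot}) VG({self.verb}) NG({self.theme_slot})"
        return f"NG({self.theme_slot}) VG-passive({self.verb}) by NG({self.agent_slot})"


def generalize(p: ClausePattern) -> list:
    # Assumed rule: every active pattern also yields a passive variant that
    # fills the same slots; other variants are not modeled here.
    if p.voice != "active":
        return [p]
    return [p, replace(p, name=p.name + "-passive", voice="passive")]


def effective_patterns(core, libraries, user) -> list:
    # Assumed layering: core patterns, then each selected library, then the
    # user's own patterns, each expanded by generalize().
    ordered = [*core, *(p for lib in libraries for p in lib), *user]
    return [variant for p in ordered for variant in generalize(p)]


launch = ClausePattern("launch", "launch", "active", "agent", "payload")
for pat in effective_patterns(core=[], libraries=[[launch]], user=[]):
    print(pat.name, "=>", pat.surface())
```

Because the passive variant is derived rather than entered by hand, the user writes one active pattern from a sample sentence and the slot mapping carries over to the passive form unchanged.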


  author =       {Roman Yangarber and Ralph Grishman},
  title =        {NYU: Description of the Proteus/PET System
                  as used for MUC-7 ST},
  booktitle =    {Proceedings of the Seventh Message
                  Understanding Conference (MUC-7)},
  year =         1998