Can Subcategorisation Probabilities Help a Statistical Parser?

Carroll, Minnen 1998

Carroll and Minnen present a parse ranking algorithm that uses subcategorization probabilities to improve performance. In particular, it ranks complete derivations produced by a baseline parser by the product of:

  • The (structural) derivation probability, according to the baseline probabilistic LR model.
  • P(VSUBCAT|V) for each verb in the sentence, where VSUBCAT is the subcategorization frame used by the verb (e.g., NONE, NP, AP, NP_NP, NP_AP, etc.), and V is the actual verb (e.g., "make," "admit," etc.).
Add-1 (a.k.a. Laplace) smoothing is applied to the lexical entries.
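The ranking score can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame inventory, the toy counts, and the function names are all hypothetical, and scores are kept in log space for numerical stability.

```python
import math
from collections import Counter

# Hypothetical (verb, frame) and verb counts standing in for a training corpus.
frame_counts = Counter({("make", "NP"): 120, ("make", "NP_NP"): 30,
                        ("admit", "NONE"): 15, ("admit", "NP"): 45})
verb_counts = Counter({"make": 150, "admit": 60})
FRAMES = ["NONE", "NP", "AP", "NP_NP", "NP_AP"]  # illustrative frame set

def p_frame_given_verb(frame, verb):
    """Add-1 (Laplace) smoothed estimate of P(VSUBCAT | V)."""
    return (frame_counts[(verb, frame)] + 1) / (verb_counts[verb] + len(FRAMES))

def rank_score(log_deriv_prob, verb_frame_pairs):
    """Log of: derivation probability x product of P(VSUBCAT|V) over verbs."""
    return log_deriv_prob + sum(math.log(p_frame_given_verb(f, v))
                                for v, f in verb_frame_pairs)
```

Two derivations of the same sentence that differ only in which frame a verb takes then get different scores, so the derivation using the verb's preferred frame wins even when the structural probabilities are close.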


Initially, they evaluated performance using 4 standard metrics: bracket recall, bracket precision, zero crossings, and mean crossings per sentence. But these metrics are fairly forgiving of incorrect attachments, so the new system's performance did not differ significantly from the baseline's. They then applied a metric based on grammatical relations, which is essentially precision and recall over dependency-graph links. Using this metric, precision increased from 79.2% to 88.2%, with a small decrease in recall (88.6% to 88.1%). See the paper for an error analysis, which breaks the errors into categories.
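The grammatical-relation metric can be sketched as set precision/recall over relation triples. This is my reading of the evaluation, not code from the paper; the triple representation (head, relation, dependent) is an assumption.

```python
def gr_precision_recall(predicted, gold):
    """Precision and recall over grammatical-relation links, where each
    link is a (head, relation, dependent) triple."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)          # links found in both analyses
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    return precision, recall
```

For example, if the parser proposes {(saw, nsubj, I), (saw, dobj, dog)} against gold {(saw, nsubj, I), (saw, dobj, cat)}, one of two links is correct, giving precision and recall of 0.5 each.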


@inproceedings{carroll-minnen-1998,
  author =       {John Carroll and Guido Minnen},
  title =        {Can Subcategorisation Probabilities Help a Statistical Parser?},
  booktitle =    {Proceedings of the 6th ACL/SIGDAT Workshop on Very Large Corpora},
  year =         1998,
  address =      {Montreal, Canada}
}