Foster 2010 - Notes on Papers

Investigating Parser Performance on Discussion Forum Posts

Foster 2010

Jennifer Foster

Baseline parser: Berkeley parser (split/merge). 5th order grammar. POS tagging is done by the grammar.

untokenized w/ spelling errors: F=69.6
gold tokenization w/ spelling errors: F=72.4 (+2.8) - missing apostrophes (e.g. "didnt") cause issues
gold tokenization w/ spelling corrected: F=74.75 (+2.35) - misspelled function words cause significant problems, e.g. "whpo"
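
The deltas in parentheses are each condition's gain over the one before it. A minimal sketch of that arithmetic, using only the three F-scores noted above; the labels and the helper name `incremental_gains` are my own shorthand, not anything from the paper:

```python
# F-scores copied from the notes above. Labels are shorthand for these notes,
# not the paper's own condition names.
results = [
    ("untokenized, spelling errors", 69.6),
    ("gold tokenization, spelling errors", 72.4),
    ("gold tokenization, spelling corrected", 74.75),
]


def incremental_gains(rows):
    """Return (label, f_score, gain over the previous row) for each row.

    The first row has no predecessor, so its gain is None.
    """
    gains = []
    previous = None
    for label, f_score in rows:
        # Round to 2 places to hide floating-point noise in the subtraction.
        delta = None if previous is None else round(f_score - previous, 2)
        gains.append((label, f_score, delta))
        previous = f_score
    return gains


gains = incremental_gains(results)
# Overall gain from the first (hardest) condition to the last (cleanest) one.
total_gain = round(results[-1][1] - results[0][1], 2)

for label, f_score, delta in gains:
    suffix = "" if delta is None else f" ({delta:+.2f})"
    print(f"{label}: F={f_score}{suffix}")
print(f"total gain: {total_gain:+.2f}")
```

This only checks the bookkeeping in the notes. It says nothing about how the F-scores themselves were computed.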

Transform test:

Transform dev set: retrain the parser with modifications to portions of the training data:

Self-training/co-training experiments

Try using a different corpus for training (e.g. Brown, Switchboard)

Try parsing with multiple grammars