Unsupervised Models for Named Entity Classification

Collins, Singer 1999

Collins and Singer present a co-training algorithm for named entity classification. Starting from a small set of hand-written seed rules (seven in the paper), they apply the algorithm to a large amount of unannotated data to produce a classifier. They assume that the noun phrases of interest have already been identified; the algorithm then classifies each one as a person, a location, or an organization. The first view uses internal ("spelling") features of the phrase itself, such as string identity, substrings, and capitalization. The second view uses external ("context") features, such as surrounding words and syntactic position. They only consider NPs that appear in one of two contexts:

  • The NP has an appositive modifier
  • The NP is the complement of a preposition that heads a PP
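
The interplay between the two views can be illustrated with a minimal sketch. It is not the authors' implementation: the feature strings, the toy examples, the function names (induce_rules, apply_rules, co_train), and the threshold and smoothing values are all assumptions chosen for illustration, and every rule above the threshold is kept on each round rather than growing the rule set gradually. Each view's rules label the data, and those labels are then used to induce rules in the other view.

```python
from collections import Counter, defaultdict


def induce_rules(examples, labels, view, threshold=0.8, alpha=0.1, n_labels=3):
    """Learn feature -> (label, confidence) rules in one view from labeled examples."""
    counts = defaultdict(Counter)
    for example, label in zip(examples, labels):
        if label is None:  # skip examples the other view could not label
            continue
        for feature in example[view]:
            counts[feature][label] += 1
    rules = {}
    for feature, label_counts in counts.items():
        label, n = label_counts.most_common(1)[0]
        # Smoothed estimate of how reliably this feature predicts its top label.
        confidence = (n + alpha) / (sum(label_counts.values()) + n_labels * alpha)
        if confidence >= threshold:
            rules[feature] = (label, confidence)
    return rules


def apply_rules(examples, rules, view):
    """Label each example with its most confident matching rule, or None."""
    labels = []
    for example in examples:
        matches = [rules[f] for f in example[view] if f in rules]
        labels.append(max(matches, key=lambda r: r[1])[0] if matches else None)
    return labels


def co_train(examples, seed_rules, rounds=3):
    """Alternate between the spelling view and the context view."""
    spelling_rules = dict(seed_rules)
    context_rules = {}
    for _ in range(rounds):
        # Spelling rules label the data; context rules are induced from those labels.
        labels = apply_rules(examples, spelling_rules, "spelling")
        context_rules = induce_rules(examples, labels, "context")
        # Context rules label the data; spelling rules are induced from those labels.
        labels = apply_rules(examples, context_rules, "context")
        learned = induce_rules(examples, labels, "spelling")
        spelling_rules = {**learned, **seed_rules}  # seed rules are never overwritten
    return spelling_rules, context_rules


# Hypothetical toy data: each NP has a set of spelling and a set of context features.
examples = [
    {"spelling": {"full=Mr. Smith", "contains=Mr."}, "context": {"appos=president"}},
    {"spelling": {"full=Jane Doe"}, "context": {"appos=president"}},
    {"spelling": {"full=New York"}, "context": {"prep=in"}},
    {"spelling": {"full=Boston"}, "context": {"prep=in"}},
    {"spelling": {"full=Acme Incorporated", "contains=Incorporated"}, "context": {"appos=maker"}},
    {"spelling": {"full=Widget Co."}, "context": {"appos=maker"}},
]
seed_rules = {
    "contains=Mr.": ("person", 1.0),
    "full=New York": ("location", 1.0),
    "contains=Incorporated": ("organization", 1.0),
}

spelling_rules, context_rules = co_train(examples, seed_rules)
for feature in sorted(spelling_rules):
    print(feature, "->", spelling_rules[feature][0])
```

In this toy run the seed rule for "Mr." labels one NP as a person, which yields a context rule for the appositive "president", which in turn labels "Jane Doe", a name no seed rule covers. The same redundancy between spelling and context is what lets a handful of seed rules spread over a large unannotated corpus.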


  author =       {Michael Collins and Yoram Singer},
  title =        {Unsupervised Models for Named Entity Classification},
  booktitle =    {Proceedings of the Joint SIGDAT Conference on EMNLP},
  year =         1999,
  url = "citeseer.nj.nec.com/collins99unsupervised.html"