Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
http://citeseer.ist.psu.edu/lafferty01conditional.html
This is the paper that introduced the label bias problem. The core idea is that MEMMs are suboptimal because they are normalized locally (per state) rather than globally (over the whole sequence). Local normalization forces the model to assign a high probability to a transition even when it knows that transition is very unlikely, as long as the transition is more likely than the other options leaving the same state.
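A minimal sketch (not from the paper) of why local normalization causes this: the scores and states below are made up, but they show that a state with a single outgoing transition must give that transition probability 1, no matter how low its raw score is.

```python
import math

def local_softmax(scores):
    # Per-state (local) normalization, as in an MEMM: the probabilities
    # over a state's outgoing transitions must sum to 1.
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

# Hypothetical transition scores. State A has two plausible successors;
# state B has only one successor, with a very low score.
state_a = local_softmax([2.0, 1.0])
state_b = local_softmax([-5.0])

print(state_a)  # mass is split between A's two options
print(state_b)  # [1.0] -- B's unlikely transition still gets all the mass
```

A globally normalized model like a CRF avoids this: the low raw score on B's transition would lower the score of every path through it, instead of being washed out by per-state renormalization.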
They report results on synthetic data and on POS tagging experiments.
Bibtex
@inproceedings{lafferty01conditional,
  author    = "John Lafferty and Andrew McCallum and Fernando Pereira",
  title     = "Conditional Random Fields: {P}robabilistic Models for Segmenting and Labeling Sequence Data",
  booktitle = "Proc. 18th International Conf. on Machine Learning",
  publisher = "Morgan Kaufmann, San Francisco, CA",
  pages     = "282--289",
  year      = "2001",
  url       = "citeseer.ist.psu.edu/lafferty01conditional.html"
}