Many of the natural language processing tasks that we would like to model with machine learning techniques generate structured output values, such as trees, lists, or groupings. These structured output problems can be modeled by decomposing them into a set of simpler sub-problems, with well-defined and well-constrained interdependencies between sub-problems. However, the effectiveness of this approach depends to a large degree on exactly how the problem is decomposed into sub-problems, and on how those sub-problems are divided into equivalence classes.

The notion of *output encoding* can be used to examine the effects of
problem decomposition on learnability for specific tasks. These
effects can be divided into two general classes: local effects and
global effects. Local effects, which influence the difficulty of
learning individual sub-problems, depend primarily on the coherence of
the classes defined by individual output tags. Global effects, which
determine the model's ability to learn long-distance dependencies,
depend on the information content of the output tags.

Using a *canonical encoding* as a reference point, we can define
additional encodings as reversible transformations from canonically
encoded structures to a new set of encoded structures. This allows us
to define a space of potential encodings (and by extension, a space of
potential problem decompositions). Using search methods, we can then
analyze and improve upon existing problem decompositions.
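To make the idea of a reversible transformation concrete, here is a minimal Python sketch for a chunking task. It assumes, purely for illustration, that the canonical encoding is IOB2 (one tag per token), and it shows one alternative encoding, IOE2, together with its inverse. The function names and the `(start, end, label)` chunk representation are hypothetical and are not taken from the systems described in this proposal.

```python
from typing import List, Tuple

# Hypothetical representation: a chunk is (start, end, label), end exclusive.
Chunk = Tuple[int, int, str]


def spans_to_iob2(n: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunks as one tag per token (assumed canonical: IOB2).

    B-X marks the first token of a chunk, I-X any later token, and O a token
    outside every chunk. Predicting each token's tag is one sub-problem.
    """
    tags = ["O"] * n
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob2_to_ioe2(tags: List[str]) -> List[str]:
    """Transform IOB2 to IOE2: mark the last token of each chunk (E-X) instead of the first."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        label = tag[2:]
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # The chunk continues only if the next token is I- with the same label.
        out.append(("I-" if nxt == "I-" + label else "E-") + label)
    return out


def ioe2_to_iob2(tags: List[str]) -> List[str]:
    """Inverse transformation: recover IOB2 tags from IOE2 tags."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        label = tag[2:]
        prev = tags[i - 1] if i > 0 else "O"
        # In IOE2, the previous token is in the same chunk only if it is I- with the same label.
        out.append(("I-" if prev == "I-" + label else "B-") + label)
    return out


if __name__ == "__main__":
    chunks = [(0, 2, "NP"), (2, 3, "NP"), (4, 5, "VP")]
    canonical = spans_to_iob2(5, chunks)
    print(canonical)    # ['B-NP', 'I-NP', 'B-NP', 'O', 'B-VP']
    alternative = iob2_to_ioe2(canonical)
    print(alternative)  # ['I-NP', 'E-NP', 'E-NP', 'O', 'E-VP']
    # Reversibility: mapping back yields exactly the canonical tags.
    assert ioe2_to_iob2(alternative) == canonical
```

Both encodings carry the same information, so either can be decoded back to the same chunks. However, they partition tokens into different tag classes. Under IOB2, a single-token chunk shares `B-NP` with the first token of a longer chunk. Under IOE2, it instead shares `E-NP` with the last token of a longer chunk. Differences of this kind are what the local effects described above would depend on.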

For my dissertation, I plan to apply automatic and semi-automatic methods to the problem of finding optimal problem decompositions, in the context of three specific systems (one chunking system and two semantic role labeling systems). Additionally, I plan to evaluate a novel approach, described in Chapter 7, to voting among multiple models when each model uses a different problem decomposition.

Edward Loper (2007). *Encoding Structured Output Values.*
PhD thesis proposal, University of Pennsylvania.

**Advisor:** Martha Palmer
**Thesis Committee:** Dan Gildea (external), Mitch Marcus, Fernando Pereira, and Ben Taskar