Date: | June 1-5, 2009 |
---|

NAACL 2009

NAACL 2009

### Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction

*Shay Cohen and Noah A. Smith*

Bayesian methods:

- theta = parameters
- z = hidden structure
- x = evidence
- beta = hyperparameters

Assume an underlying generative distribution

- Work in a parametric space.
- Use a logistic normal distribution for prior (not derichlet)

Use the prior:
- to tie parameters in the grammar
- to encode linguistic information
- Used over a *family* of multinomials
- Used for multilingual learning

Encoding prior knowledge: let the model know which part of speech tags are related to one another. (E.g., NN, NNS, NNP get grouped)

Logistic normal prior lets us tie between parameters that are within the same multinomial. But we can't tie between parameters in different multinomials. So instead of learning each multinomial on its now, we learn them in adjoined normal spaces.

Treat the entire parameter space as a single unit, with a single prior, rather than dividing it up.