ACL Main Conference

ACL 2007

Notes from some of the ACL talks I went to (when my laptop battery wasn't dead).

The Brain, Meaning, and Corpus Statistics

Tom Mitchell

An interesting talk about applying machine learning models to fMRI data to try to predict which word a subject is thinking about: show the subject a word/picture, record their fMRI data, and then train a classifier that predicts which word they're looking at, given the fMRI. Can we develop a meaningful theory of meaning from this? One way is to break a word into "features" based on corpus statistics -- how often the word occurs with other words -- and then use those as features. E.g., "apple" co-occurs with "red", "eat", etc., so use those as features. Then when we see a novel word, we have some hope of figuring out what it is, based on what features it looks like it has.
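A minimal sketch of the co-occurrence-feature idea. The tiny corpus, the choice of feature words, and the sentence-level counting scheme here are all invented for illustration, not from the talk:

```python
from collections import Counter

# Hypothetical toy corpus; feature words chosen by hand for illustration.
CORPUS = [
    "eat the red apple", "eat a ripe apple", "drive the fast car",
    "drive a red car", "eat sweet fruit",
]
FEATURE_WORDS = ["eat", "red", "drive", "fast", "ripe", "sweet"]

def cooccurrence_vector(word, corpus, feature_words):
    """Count how often `word` co-occurs with each feature word in a sentence."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            for f in feature_words:
                if f in tokens and f != word:
                    counts[f] += 1
    return [counts[f] for f in feature_words]

# "apple" ends up near "eat"/"red"/"ripe"; "car" near "drive"/"fast".
print(cooccurrence_vector("apple", CORPUS, FEATURE_WORDS))
print(cooccurrence_vector("car", CORPUS, FEATURE_WORDS))
```

A novel word could then be matched against known words by comparing such vectors.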

Guiding Statistical Word Alignment Models With Prior Knowledge

Yonggang Deng and Yuqing Gao

A Discriminative Language Model with Pseudo-negative Samples

Daisuke Okanohara and Jun’ichi Tsujii

Tailoring Word Alignments to Syntactic Machine Translation

John DeNero and Dan Klein

Context: tree-to-string transducer.

Problem: alignment errors can trip up the MT system. For example, if "the" and "le" get aligned with each other in an incorrect place, it can prevent the MT system from learning anything from the sentence. In particular, the MT system can no longer figure out a way to decompose the sentence into a set of steps that can be used productively.

So, if we can get rid of some of these alignment errors, we can make life easier for the MT system.

One way to do this is to use syntactic information during the word alignment, to discourage alignments that look syntactically implausible.
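As a toy illustration of what "trips up" extraction (my own simplification, not the paper's model): a stray alignment link can make a constituent span unextractable as a phrase pair, because some source word aligns both inside and outside the span:

```python
def violates_constituent(links, span):
    """
    Return True if the alignment makes the target constituent span
    unextractable: some source word aligns both inside and outside
    the span, so no clean phrase pair covers it.
    `links` is a set of (src_index, tgt_index) pairs; `span` is an
    inclusive (start, end) range of target indices.
    """
    start, end = span
    inside = {s for s, t in links if start <= t <= end}
    outside = {s for s, t in links if not (start <= t <= end)}
    return bool(inside & outside)

# A stray "the"/"le"-style link reaching outside the span blocks extraction.
links = {(0, 0), (1, 1), (1, 3)}   # source word 1 aligns inside AND outside
print(violates_constituent(links, (0, 1)))  # True
```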

Making Lexical Ontologies Functional and Context-Sensitive

Tony Veale and Yanfen Hao

WordNet has a nice hierarchy of terms, but it doesn't give any meaning to its nodes -- e.g., there's a "skilled worker" node, but what does it mean to be under that node? What's different between the senses of a word W that are under that node vs. senses that are not?

Generalizing Semantic Role Annotations Across Syntactically Similar Verbs

Andrew Gordon and Reid Swanson

One useful feature for SRL is the parse tree path. Parse tree paths work for a wide variety of surface forms: e.g., adding various adjectives, changing the noun, etc., don't affect them too much.

For a given roleset, look at how often each parse tree path occurs with a given argn label.

Simple SRL approach: just walk down the list of parse tree paths, sorted by how often they occur. When one matches, mark that constituent with the corresponding label, and then leave that constituent out of further consideration.
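A minimal sketch of that greedy procedure. The path notation, constituent ids, and argument labels below are made up for illustration:

```python
def label_arguments(constituent_paths, path_stats):
    """
    constituent_paths: dict constituent id -> parse tree path from the predicate.
    path_stats: list of (path, arg_label) pairs sorted by corpus frequency,
    most frequent first. Walk down the list; when a path matches an
    unlabeled constituent, assign the label and remove that constituent.
    """
    labels = {}
    remaining = dict(constituent_paths)
    for path, arg in path_stats:
        for node, p in list(remaining.items()):
            if p == path:
                labels[node] = arg
                del remaining[node]
                break
    return labels

# Hypothetical paths/labels, just to show the walk-down behavior.
paths = {"NP-1": "NP^S^VP", "NP-2": "VP^NP"}
stats = [("VP^NP", "ARG1"), ("NP^S^VP", "ARG0")]
print(label_arguments(paths, stats))
```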

One issue: most verbs don't have enough data. E.g., only 20% of verbs have more than 20 instances.

solution: look for syntactically similar verbs, and share training data between them.

How can we tell if two verbs are syntactically similar? Look at their distributions over parse tree paths.
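One simple way to compare such distributions (my own sketch; the paper's actual similarity measure may differ) is cosine similarity over normalized path counts:

```python
import math
from collections import Counter

def path_distribution(paths):
    """Normalize parse-tree-path counts into a probability distribution."""
    counts = Counter(paths)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def cosine(d1, d2):
    """Cosine similarity between two sparse distributions."""
    dot = sum(d1[p] * d2.get(p, 0.0) for p in d1)
    n1 = math.sqrt(sum(v * v for v in d1.values()))
    n2 = math.sqrt(sum(v * v for v in d2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical path inventories for two verbs.
give = path_distribution(["NP^VP", "NP^VP", "PP^VP"])
hand = path_distribution(["NP^VP", "PP^VP"])
print(round(cosine(give, hand), 3))
```

Verbs with high similarity could then pool their training data.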

Questions from the audience: what about using VerbNet? And what about differences between arg0/1 and arg2-5?

Transforming Projective Bilexical Dependency Grammars into Efficiently-parsable CFGs with Unfold-Fold

Mark Johnson

This talk was fairly interesting -- I'll have to look into the second-order dependency stuff in the actual paper.

Adding Noun Phrase Structure to the Penn Treebank

David Vadas and James Curran

yay! np structure for tb! :)

Optimizing Grammars for Minimum Dependency Length

Daniel Gildea and David Temperley

Dan gives some evidence that one guiding principle in English is to keep dependencies short. This didn't seem that surprising -- e.g., heavy-NP shift.
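The quantity being minimized can be sketched as the summed link lengths of a dependency parse:

```python
def total_dependency_length(heads):
    """
    heads[i] is the index of word i's head (None for the root).
    Each link's length is the distance between word and head; the total
    is what a short-dependency grammar tries to keep small.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

# "the dog barked loudly": the->dog, dog->barked, loudly->barked
print(total_dependency_length([1, 2, None, 2]))  # 1 + 1 + 1 = 3
```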

Frustratingly Easy Domain Adaptation

Hal Daumé III

We have a source domain with lots of training data and a target domain with less training data. Take our original feature set and add two new copies of each original feature: Sf, used only for docs from the source domain; and Tf, used only for docs from the target domain. So now we have 3n features. For things that work the same in both the source and the target domain, the model will use the original (shared) features; but for things that are special in one domain or the other, it will use the domain-specific features.
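A minimal sketch of the feature augmentation (feature names are illustrative):

```python
def augment(features, domain):
    """
    Feature augmentation for domain adaptation: copy each feature into
    a shared version plus a domain-specific version. `features` maps
    feature name -> value; `domain` is "source" or "target".
    """
    out = {}
    for name, value in features.items():
        out["shared:" + name] = value      # active in both domains
        out[domain + ":" + name] = value   # active only in this domain
    return out

print(augment({"word=bank": 1}, "source"))
# {'shared:word=bank': 1, 'source:word=bank': 1}
```

Any standard classifier can then be trained on the union of both domains' augmented data, with no change to the learning algorithm itself.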

Instance Weighting for Domain Adaptation in NLP

Jing Jiang and ChengXiang Zhai

The Infinite Tree

Jenny Rose Finkel, Trond Grenager and Christopher D. Manning

I should read this paper.

Guiding Semi-Supervision with Constraint-Driven Learning

Ming-Wei Chang, Lev Ratinov and Dan Roth

Different Structures for Evaluating Answers to Complex Questions: Pyramids Won’t Topple, and Neither Will Human Assessors

Hoa Trang Dang and Jimmy Lin

Topic: how do we evaluate QA systems

  • one assessor per topic
  • systems must be able to satisfy multiple assessors (multiple users)

Question types: factoid, list, complex. The focus here is on complex questions.

Evaluation technique: the assessor reads the documents and creates an answer key consisting of a list of pieces of information ("nuggets") relevant to the question. Then, each nugget is classified as vital (important) or non-vital.

"Binary" f-score -- an f-score that distinguishes vital from non-vital nuggets. Let a = number of okay (non-vital) nuggets in the response; r = number of vital nuggets in the response; R = number of vital nuggets in the key; l = number of non-whitespace characters in the response; and A = a length allowance proportional to the number of nuggets returned.

Recall = r/R; Precision = 1 if l < A, else 1 - ((l - A)/l).

The combined score is a weighted harmonic mean in which nugget recall has higher importance than precision.
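A sketch of the computation, assuming the standard TREC formulation (100 non-whitespace characters of allowance per returned nugget, and beta = 3 to favor recall):

```python
def nugget_fscore(vital_returned, okay_returned, vital_in_key,
                  response_length, beta=3.0, allowance_per_nugget=100):
    """
    Binary nugget F-score in the TREC QA style. Recall counts only
    vital nuggets; precision is a length-based proxy: responses under
    the character allowance get precision 1.
    """
    recall = vital_returned / vital_in_key if vital_in_key else 0.0
    allowance = allowance_per_nugget * (vital_returned + okay_returned)
    if response_length < allowance:
        precision = 1.0
    else:
        precision = 1.0 - (response_length - allowance) / response_length
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# 2 of 4 vital nuggets, 1 okay nugget, 250 chars (allowance 300) -> precision 1
print(round(nugget_fscore(2, 1, 4, 250), 3))
```

Note how a response with okay nuggets but zero vital nuggets scores exactly 0, which is the first complaint below.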

Complaints:

  • it's easy to get a zero f-score: if you get some okay nuggets but no vital nuggets, your score is 0. So it can't discriminate bad systems from horrible systems.
  • different assessors differ in opinion on what's important

So: still use a single assessor to pick out the list of relevant nuggets, but then have multiple assessors decide whether each nugget is vital. Rather than each nugget being vital or non-vital, it gets a "vitalness" weight ranging from 0 to 1. This gives the "pyramid f-score."
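The pyramid-style recall can be sketched as follows (nugget ids and weights are hypothetical):

```python
def pyramid_recall(matched, key_weights):
    """
    Pyramid nugget recall: each nugget's weight is the fraction of
    assessors who judged it vital; recall is the total weight of the
    matched nuggets over the total weight in the key.
    `key_weights` maps nugget id -> weight in [0, 1].
    """
    total = sum(key_weights.values())
    got = sum(key_weights[n] for n in matched if n in key_weights)
    return got / total if total else 0.0

weights = {"n1": 1.0, "n2": 0.6, "n3": 0.2}   # hypothetical assessor votes
print(pyramid_recall({"n1", "n3"}, weights))  # 1.2 / 1.8
```

With graded weights, a response that misses the unanimous-vital nuggets but catches partially-vital ones no longer bottoms out at zero.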

The overall score with the pyramid f-score is well correlated with the binary score, but on individual questions they differ. This might make it easier for system developers and systems to learn from individual questions.

Another plus: it's an efficient way of using multiple assessors (binary grading is relatively fast)

Automatic Acquisition of Ranked Qualia Structures from the Web

Philipp Cimiano and Johanna Wenderoth

Qualia structures describe words/concepts with four classes of properties: formal (distinguishing properties); agentive (factors involved in creation); constitutive (physical properties/parts); telic (purpose or function). In concrete terms, we usually simplify this somewhat; in particular: formal -> hypernym; agentive -> creation verb; constitutive -> meronymy; telic -> use verbs.
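A toy sketch of pattern-based acquisition in this spirit: instantiate lexico-syntactic patterns per qualia role and send them off as web queries. The patterns below are my own illustrations, not necessarily the paper's:

```python
# Hypothetical pattern set: lexico-syntactic clues, instantiated with the
# target word, that a web search could match to harvest candidate fillers.
QUALIA_PATTERNS = {
    "formal":       ["{w} is a kind of", "{w} and other"],
    "agentive":     ["to make a {w}", "a {w} is produced by"],
    "constitutive": ["a {w} is made of", "a {w} consists of"],
    "telic":        ["a {w} is used to", "the purpose of a {w} is to"],
}

def qualia_queries(word):
    """Instantiate each role's patterns for one word."""
    return {role: [p.format(w=word) for p in patterns]
            for role, patterns in QUALIA_PATTERNS.items()}

print(qualia_queries("knife")["telic"])
# ['a knife is used to', 'the purpose of a knife is to']
```

Candidate fillers scraped from matches could then be ranked by frequency, giving the "ranked" qualia structures of the title.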

A Sequencing Model for Situation Entity Classification

Alexis Palmer, Elias Ponvert, Jason Baldridge and Carlota Smith

Chinese Segmentation with a Word-Based Perceptron Algorithm

Yue Zhang and Stephen Clark

Unsupervised Coreference Resolution in a Nonparametric Bayesian Model

Aria Haghighi and Dan Klein
