Formalisms and Methodology for Learning by Reading
Machine Reading as a Process of Partial QA
Peter Clark and Phil Harrison
Machine reading goal:
- construct an inference-supporting representation from text
- connect what is read with what is known -- reader knows something, and the text elaborates/deepens that knowledge
In QA, the "remainder" that we don't know is a failure; in machine reading, the "remainder" that we don't know is new knowledge.
Interleave interpretation with answering:
- start with logical form
- consider several alternative disambiguations
- compare them with the existing knowledge base; are they provable or (partially) known?
- iterate
- end up with a disambiguated semantics that is consistent with the knowledge base
Example: if we see "joined", it could mean "accompanied by" or "attached to" (WSD)
Create a tree of possible interpretations; interpret and try to prove parts of the logical form.
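A minimal sketch of the interleaving idea, not the authors' system. It assumes the knowledge base is a flat set of triples and that "provable" just means "already present in the KB"; the names `KB`, `SENSES`, `interpretations` and `read` are hypothetical. It enumerates the disambiguations of a logical form, keeps the one that overlaps most with the KB, and treats the unmatched remainder as new knowledge.

```python
from itertools import product

# Hypothetical knowledge base: a set of (relation, arg1, arg2) triples.
KB = {
    ("attached-to", "wing", "fuselage"),
    ("part-of", "wing", "airplane"),
}

# Hypothetical sense inventory: ambiguous word -> candidate relations.
SENSES = {"joined": ["accompanied-by", "attached-to"]}


def interpretations(logical_form):
    """Enumerate every disambiguation of a list of (word, arg1, arg2) literals."""
    choices = [SENSES.get(word, [word]) for word, _, _ in logical_form]
    for combo in product(*choices):
        yield [(rel, a, b) for rel, (_, a, b) in zip(combo, logical_form)]


def read(logical_form, kb):
    """Pick the interpretation with the most literals already in the KB.

    Returns (known, new): the literals found in the KB and the remainder.
    Membership in the KB stands in for real theorem proving (an assumption).
    """
    best = max(
        interpretations(logical_form),
        key=lambda interp: sum(lit in kb for lit in interp),
    )
    known = [lit for lit in best if lit in kb]
    new = [lit for lit in best if lit not in kb]
    return known, new


known, new = read(
    [("joined", "wing", "fuselage"), ("part-of", "engine", "wing")], KB
)
print("known:", known)
print("new:  ", new)
```

Here "joined" resolves to "attached-to" because that reading is already in the KB, and the second literal survives as the new knowledge gained from reading.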
Open question: how sensitive is this approach to the order in which we 'read' documents?
Audience questions mostly focused on "what's new here?" and "does this scale?"
Building an end-to-end text reading system based on a packed representation
Doo Soon Kim, Ken Barker and Bruce Porter
A simple pipeline prunes very aggressively at each stage. The alternative is to pass an n-best list between stages, using a beam, but that leads to combinatorial expansion.
So they used a packed representation through the entire system.
Target representation: graphical representation of dependencies between events/objects. Nodes are things like "has-part" and "object-of-event"
Packed Graphical (PG) Representation
- Base representation plus constraints.
- Base representation is a graph with variables.
- We can then put constraints on those variables (e.g. R1=(foo|bar)), and we can put constraints on the relations between the variables.
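The base-plus-constraints idea can be sketched roughly as follows. This is an illustration under assumptions, not the PG format from the talk: the variable names, the candidate values and the "if A then B" constraint encoding are all hypothetical. The point is that one compact structure stands for several readings, which are only enumerated on demand.

```python
from itertools import product

# Base representation: a graph whose node and edge labels are variables.
edges = [("N1", "R1", "N2")]  # N1 --R1--> N2

# Constraints on single variables: each takes one of several candidates,
# in the spirit of R1=(foo|bar). Values here are made up for illustration.
domains = {
    "N1": ["engine"],
    "N2": ["piston", "pistol"],
    "R1": ["has-part", "object-of-event"],
}

# Constraints between variables: (var_a, value_a, var_b, value_b) means
# "if var_a takes value_a, then var_b must take value_b".
implications = [("N2", "pistol", "R1", "object-of-event")]


def unpack(domains, implications):
    """Yield every variable assignment that satisfies all constraints."""
    names = sorted(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(
            assignment[a] != va or assignment[b] == vb
            for a, va, b, vb in implications
        ):
            yield assignment


readings = list(unpack(domains, implications))
for reading in readings:
    print([(reading[s], reading[r], reading[o]) for s, r, o in edges])
```

The packed form stores 1 + 2 + 2 candidate values and one constraint, while the unpacked form lists each consistent reading separately; the gap grows quickly as more ambiguous variables are added.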
Ambiguity types:
- parsing ambiguity
- type ambiguity
- relation ambiguity
- coref ambiguity
Disambiguation: identify mappings between (hopefully) redundant texts, and then merge the information from them.
To do the merging: convert constraints to Bayesian networks, then merge the networks.
Pruning: discard low-probability candidates. This can propagate, because of the derivation constraints. This is actually done with Bayesian networks.
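How pruning propagates can be sketched with a plain threshold and a dependency table. This deliberately swaps in a much simpler technique than the Bayesian networks used in the talk; the candidate names, scores, `derived_from` table and threshold are all assumptions for illustration.

```python
def prune(probs, derived_from, threshold=0.1):
    """Return the set of candidates that survive pruning.

    probs: candidate -> local score in [0, 1] (hypothetical values).
    derived_from: candidate -> list of candidates it was derived from.
    A candidate is dropped if its own score is below the threshold, or if
    anything it was derived from has been dropped.
    """
    alive = {c for c, p in probs.items() if p >= threshold}
    changed = True
    while changed:  # repeat until no further candidate is removed
        changed = False
        for cand in list(alive):
            parents = derived_from.get(cand, [])
            if any(parent not in alive for parent in parents):
                alive.discard(cand)
                changed = True
    return alive


probs = {
    "parse_a": 0.95,
    "parse_b": 0.05,  # below threshold, pruned directly
    "type_a1": 0.9,
    "type_b1": 0.6,   # fine locally, but derived from parse_b
    "rel_b1": 0.7,    # derived from type_b1
}
derived_from = {
    "type_a1": ["parse_a"],
    "type_b1": ["parse_b"],
    "rel_b1": ["type_b1"],
}

survivors = prune(probs, derived_from)
print(sorted(survivors))
```

Dropping one low-scoring parse removes the type and relation candidates built on top of it, even though their own scores were high.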
Semantic Enrichment of Text with Background Knowledge
Anselmo Peñas and Eduard Hovy
Typically, texts omit important information.
Goal: automatically recover the omitted information. "Enrichment"
Use a domain-specific knowledge base, with counts of patterns, to enrich semantically poor relationships (e.g. noun-noun compounds).
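One rough way to picture count-based enrichment of a noun-noun compound, assuming the knowledge base can be queried as pattern counts per noun pair. The table contents, the domain, and the function name `enrich` are hypothetical and not taken from the talk.

```python
from collections import Counter

# Hypothetical background knowledge: for each (modifier, head) noun pair,
# how often each explicit connecting pattern was seen in domain text.
PATTERN_COUNTS = {
    ("team", "quarterback"): Counter(
        {"quarterback plays for team": 42, "quarterback leads team": 17}
    ),
    ("yard", "pass"): Counter({"pass gains yard": 8}),
}


def enrich(modifier, head, counts=PATTERN_COUNTS):
    """Return the most frequent explicit pattern for a compound, or None."""
    patterns = counts.get((modifier, head))
    if not patterns:
        return None  # no background evidence, leave the compound as-is
    pattern, _ = patterns.most_common(1)[0]
    return pattern


print(enrich("team", "quarterback"))
print(enrich("stadium", "roof"))
```

Returning `None` for unseen pairs keeps the sketch from inventing a relation where the knowledge base has no evidence.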
Large Scale Relation Detection
Chris Welty, James Fan, David Gondek and Andrew Schlaikjer
Mining Script-Like Structures from the Web
Niels Kasch and Tim Oates
Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories
Matthew Gerber, Andrew Gordon and Kenji Sagae
Semantic Role Labeling for Open Information Extraction
Janara Christensen, Mausam, Stephen Soderland and Oren Etzioni
Empirical Studies in Learning to Read
Marjorie Freedman, Edward Loper, Elizabeth Boschee and Ralph Weischedel
Learning Rules from Incomplete Examples: A Pragmatic Approach
Janardhan Rao Do
Unsupervised techniques for discovering ontology elements from Wikipedia article links
Zareen Syed and Tim Finin
Machine Reading at the University of Washington
Hoifung Poon, Janara Christensen, Pedro Domingos, Oren Etzioni, Raphael Hoffmann, Chloe Kiddon, Thomas Lin, Xiao Ling, Mausam, Alan Ritter, Stefan Schoenmackers, Stephen Soderland, Dan Weld, Fei Wu and Congle Zhang
Analogical Dialogue Acts: Supporting Learning by Reading Analogies
David Barbella and Kenneth Forbus
A Hybrid Approach to Unsupervised Relation Discovery Based on Linguistic Analysis and Semantic Typing
Zareen Syed and Evelyne Viegas
Supporting rule-based representations with corpus-derived lexical information
Annie Zaenen, Cleo Condoravdi, Daniel Bobrow and Raphael Hoffmann
PRISMATIC: Inducing Knowledge from a Large Scale Lexicalized Relation Resource
James Fan, David Ferrucci, David Gondek and Aditya Kalyanpur