### Machine Learning Reading Group (MLRG): Machine Reading - Rule Learning, Coreference Resolution, and Learning from Incomplete Examples

#### Meeting time

Every Thursday at 11:30 PM in KEC 2057.

• Some basic ILP and SRL papers
1. FOIL algorithm: notes from Alan's CS532 course are here
2. LOGAN-H: Learning Horn expressions with LOGAN-H (PDF)
3. Overview of SRL models: Sriraam's qualifier paper is here (we will read specific SRL papers if needed)
4. Probabilistic modeling paper: Analysis of multinomial models with unknown index using data augmentation (PDF)

• Learning Inference Rules (papers suggested by Prasad)

1. Discovery of Inference Rules for Question Answering (Lin, D. and Pantel, P.), Natural Language Engineering, 7(4), 343-360, 2001. -- The rules are generated from similarities between templates of dependency paths. The similarities are calculated using a version of "mutual information", and high-ranking path pairs are turned into inference rules. As a rule, the recall is good, but precision is low. Moreover, the inference rules are symmetric here: X eats Y <=> X likes Y.

2. LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules (Bhagat, R., Pantel, P., Hovy, E.), Proceedings of the 2007 Joint Conference on EMNLP-CoNLL, pp. 161-170, Prague, June 2007. -- Learns directional inference rules based on the frequencies of occurrence of each side of the rule, e.g., it learns that X eats Y => X likes Y. Directionality improves, but recognizing valid vs. invalid inferences does not, so precision still suffers: for example, x likes y <=> x hates y might be learned as a rule. The problem, it seems to me, is that x and y are abstracted to "person" before the inference rule is learned. I.e., the learner has not seen any evidence of (x likes y) and (x hates y) for the same x and y! It has only seen someone liking someone and someone else hating someone else. So in fact there is only evidence for believing someone likes someone <=> someone hates someone. This seems reasonable enough, but it is much weaker than the inference rule that is actually learned! Another issue: inference was not used during the learning process to learn additional constraints.

3. Harabagiu, S. and Hickl, A. Methods for Using Textual Entailment in Open-Domain Question Answering. In Proceedings of ACL 2006, pp. 905-912, Sydney, Australia. -- Have not read this. Apparently it showed that directional textual entailment alone can improve question answering without other inference mechanisms (according to Bhagat et al.).

4. Szpektor, I.; Tanev, H.; Dagan, I.; and Coppola, B. 2004. Scaling Web-Based Acquisition of Entailment Relations. In Proceedings of EMNLP 2004, pp. 41-48, Barcelona, Spain.

5. Chklovski, T. and Pantel, P. 2004. VerbOCEAN: Mining the Web for Fine-Grained Semantic Verb Relations. In Proceedings of EMNLP 2004, Barcelona, Spain.

6. Rodrigo de Salvo Braz, Roxana Girju, Vasin Punyakanok, Dan Roth, Mark Sammons: An Inference Model for Semantic Entailment in Natural Language. Lecture Notes in Computer Science, Springer Berlin/Heidelberg, Volume 3944/2006, Book: Machine Learning Challenges. -- This paper treats inference as optimization and does not discuss learning inference rules.

7. Claire Nedellec: Corpus-Based Learning of Semantic Relations by the ILP System, Asium. Learning Language in Logic 1999: 259-278  http://www.eecs.orst.edu/~tadepall/lbr/asium

8. A paper by Ritter, Etzioni et al. on learning functional relationships: http://turing.cs.washington.edu/papers/Ritter_emnlp08.pdf -- e.g., employeeOf(person, company) is a function, but colleagueOf(x, y) is not.

9. Subgroup discovery: Gamberger, D. and Lavrac, N. 2002. Descriptive Induction through Subgroup Discovery: A Case Study in a Medical Domain. In Proceedings of the Nineteenth International Conference on Machine Learning (July 8-12, 2002), C. Sammut and A. G. Hoffmann, Eds. Morgan Kaufmann Publishers, San Francisco, CA, 163-170.

10. Markov Logic Networks paper by Richardson and Domingos. MLNs are schematized versions of undirected graphical models over relational atoms. There is a lot of current work on using these in lifted inference and on comparisons to directed relational models such as probabilistic relational models. This is a basic MLN paper. http://www.springerlink.com/content/w55p98p426l6405q/fulltext.pdf

11. Claudien paper - learning from interpretations. An interpretation is an assignment of truth values to all ground predicates, e.g., author(paper23, JohnDoe). Given a theory, a positive interpretation satisfies the theory; a negative interpretation does not. Claudien learns a clausal theory (a conjunction of Horn clauses) from a set of positive and negative examples. http://www.springerlink.com/content/j30702810h758166/fulltext.pdf

12. Natural Logic for Textual Inference describes the NatLog system, which applies a set of inference rules directly to natural language sentences to derive natural inferences, e.g., John does not work in the US => John does not work in New York. http://www.springerlink.com/content/j30702810h758166/fulltext.pdf

13. COLING 2008 paper on extending NatLog
http://nlp.stanford.edu/~wcmac/papers/natlog-coling08.pdf

14. Bill MacCartney's Stanford thesis on natural language inference
http://nlp.stanford.edu/~wcmac/papers/nli-diss.pdf
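To make the path-similarity idea in item 1 (Lin & Pantel) concrete, here is a toy sketch of DIRT-style similarity over slot fillers. The counts, predicate names, and exact normalization are illustrative assumptions; the real system uses triple counts extracted from a parsed corpus and keeps only positive mutual information:

```python
import math

# Toy filler counts: counts[path][slot][word] = how often `word` fills
# slot X or Y of the dependency path `path` in a (hypothetical) corpus.
counts = {
    "X eats Y":  {"X": {"John": 3, "Mary": 2}, "Y": {"pizza": 4, "soup": 1}},
    "X likes Y": {"X": {"John": 2, "Mary": 3}, "Y": {"pizza": 3, "soup": 2}},
}

def mi(path, slot, word):
    """Pointwise mutual information between a (path, slot) pair and a filler."""
    p_s_w = counts[path][slot].get(word, 0)
    if p_s_w == 0:
        return 0.0
    p_s_star = sum(counts[path][slot].values())
    star_s_w = sum(c[slot].get(word, 0) for c in counts.values())
    star_s_star = sum(sum(c[slot].values()) for c in counts.values())
    return math.log((p_s_w * star_s_star) / (p_s_star * star_s_w))

def slot_sim(p1, p2, slot):
    """Lin-style similarity between two paths' filler distributions at a slot:
    shared-filler MI mass over total MI mass."""
    shared = set(counts[p1][slot]) & set(counts[p2][slot])
    num = sum(mi(p1, slot, w) + mi(p2, slot, w) for w in shared)
    den = (sum(mi(p1, slot, w) for w in counts[p1][slot]) +
           sum(mi(p2, slot, w) for w in counts[p2][slot]))
    return num / den if den else 0.0

def path_sim(p1, p2):
    """Geometric mean of the X-slot and Y-slot similarities."""
    sx = slot_sim(p1, p2, "X")
    sy = slot_sim(p1, p2, "Y")
    return math.sqrt(max(sx, 0.0) * max(sy, 0.0))
```

Note that with these toy counts the two paths share all their fillers, so they come out maximally similar: this is exactly how a symmetric rule like X eats Y <=> X likes Y gets generated, which is the precision problem discussed above.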

• On Learning Temporal Structure of Events
• Coreference Resolution
1. , Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task  (2011) pp. 40--44
2. Understanding the Value of Features for Coreference Resolution, EMNLP - 2008
3. Constraint-Based Entity Matching, Proceedings of the National Conference on Artificial Intelligence (AAAI) - 2005
4. Syntactic Parsing for Ranking-Based Coreference Resolution. Altaf Rahman and Vincent Ng.  Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP-11), 2011.
5. Ensemble-Based Coreference Resolution. Altaf Rahman and Vincent Ng. Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), 2011.
6. Coreference Resolution with World Knowledge. Altaf Rahman and Vincent Ng. Main Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), 2011
7. Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution. Altaf Rahman and Vincent Ng.  Journal of Artificial Intelligence Research 40, pages 469-521, 2011. (This is an expanded version of the Rahman & Ng EMNLP 2009 paper. It proposes the cluster-ranking model, which solidly advances the state of the art in coreference modeling.)
8. Supervised Noun Phrase Coreference Research: The First Fifteen Years. Vincent Ng. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), 2010.
9. Supervised Models for Coreference Resolution. Altaf Rahman and Vincent Ng. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP-09), 2009.
10. Semantic Class Induction and Coreference Resolution. Vincent Ng. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), 2007.
11. Shallow Semantics for Coreference Resolution. Vincent Ng. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), 2007.
12. Machine Learning for Coreference Resolution: From Local Classification to Global Ranking.  Vincent Ng. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05), 2005.
13. Bootstrapping Coreference Classifiers with Multiple Machine Learning Algorithms. Vincent Ng and Claire Cardie. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-03), 2003.
14. Weakly Supervised Natural Language Learning Without Redundant Views. Vincent Ng and Claire Cardie. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2003.
15. Machine Learning for Coreference Resolution: Recent Successes and Future Directions. Vincent Ng. Cornell University Technical Report CUL.CIS/TR2003-1918, 2003
16. Knowledge Base Population: Successful Approaches and Challenges (ACL), 2011.
17. Coreference Resolution in a Modular, Entity-Centered Model, Aria Haghighi and Dan Klein, Proceedings of NAACL 2010.
18. Simple Coreference Resolution with Rich Syntactic and Semantic Features, Aria Haghighi and Dan Klein, Proceedings of EMNLP 2009.
19. Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models.  Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum.  Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011
20. SampleRank: Training Factor Graphs with Atomic Gradients. Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, Andrew McCallum. Proceedings of the International Conference on Machine Learning (ICML), 2011.
21. Advances in Learning and Inference for Partition-wise Models of Coreference Resolution. Michael Wick and Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2009-028 (TR), 2009
22. An Entity Based Model for Coreference Resolution. Michael Wick, Aron Culotta, Khashayar Rohanimanesh, Andrew McCallum. Proceedings of the SIAM International Conference on Data Mining (SDM), Reno, Nevada, 2009
23. Joint Unsupervised Coreference Resolution with Markov Logic. Hoifung Poon and Pedro Domingos. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 649-658), 2008. Honolulu, HI: ACL
24. T. Finley, Supervised Clustering with Structural SVMs, PhD Thesis, Cornell University, Department of Computer Science, 2008. [Download]
• Other Related Papers
1. , Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)  (2011)
2. Abductive Plan Recognition by Extending Bayesian Logic Programs [Details] [PDF] Sindhu Raghavan, Raymond J. Mooney. To Appear In Proceedings of the European Conference on Machine Learning/Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011), September 2011.
3. Extending Bayesian Logic Programs for Plan Recognition and Machine Reading [Details] [PDF] [Slides] Sindhu V. Raghavan Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, May 2011.
4. Learning to Interpret Natural Language Navigation Instructions from Observations [Details] [PDF] David L. Chen and Raymond J. Mooney To Appear In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), 2011.
5. Abductive Markov Logic for Plan Recognition [Details] [PDF] Parag Singla and Raymond J. Mooney To Appear In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), 2011
6. Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback. Sajib Dasgupta and Vincent Ng. Journal of Artificial Intelligence Research 39, 2010.
7. S.R.K. Branavan, David Silver, and Regina Barzilay "Learning to Win by Reading Manuals in a Monte-Carlo Framework",  Proceedings of ACL, 2011.
8. S.R.K. Branavan, David Silver, and Regina Barzilay "Non-Linear Monte-Carlo Search in Civilization II", Proceedings of IJCAI, 2011.
9. S.R.K. Branavan, Luke Zettlemoyer and Regina Barzilay "Reading Between the Lines: Learning to Map High-level Instructions to Commands", Proceedings of ACL, 2010
10. S.R.K. Branavan, Harr Chen, Luke Zettlemoyer and Regina Barzilay "Reinforcement Learning for Mapping Instructions to Actions", Proceedings of ACL, 2009. Best Paper Award

(Below papers are from Nimar Arora's NLP reading list)

## Logical Form Transformation

• Jerry R. Hobbs (1986): Overview of the TACITUS Project gives a brief glimpse of the knowledge representation scheme involving predicates for each word of the sentence and thinking of a derivation as the interpretation. Hints also at issues in temporal reasoning.
• John Bear and Jerry R. Hobbs (1988): Localizing Expression of Ambiguity discusses how to capture attachment and other ambiguities in the logical form of a sentence instead of creating multiple logical forms. The ambiguities are captured as a disjunction of possible entity or action variables that special 'y' variables could be identical to.
• Jerry R. Hobbs, Mark Stickel, Paul Martin, Douglas Edwards (1990): Interpretation as abduction describes how abductive reasoning (an unsound logical inference process) can be used to understand natural language.
• Patrick Blackburn, Johan Bos, Michael Kohlhase (1998): Automated Theorem Proving for Natural Language Understanding shows how to transform sentences in Discourse Representation Theory to first-order logic.
• Ricardo Santos (2000): Donald Davidson On The Logical Form of Action Sentences describes and justifies the Davidsonian view on logical forms. The logical form of a sentence must capture the entailment relation between the sentence and other sentences. Actions (and entities) should be represented by variables and their descriptions by predicates in the logical form - because a single action can have multiple descriptions. Also, prepositions should have their own predicates which modify the action.
• Dan I. Moldovan, Vasile Rus (2001): Logic Form Transformation of WordNet and its Applicability to Question Answering takes glosses found in WordNet, parses them and converts them to a logical form.
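A tiny sketch of the Davidsonian view described above (the event variable `e`, predicate names, and constants are illustrative): an action sentence becomes a conjunction of predicates over an event variable, so dropping a modifier conjunct yields exactly the entailed weaker sentence.

```python
# "John buttered the toast in the bathroom", (neo-)Davidsonian style:
# one event variable e, one predicate per description or modifier.
full = {
    ("butter", "e"),             # there is a buttering event e
    ("agent", "e", "John"),      # its agent is John
    ("patient", "e", "toast1"),  # its patient is the toast
    ("in", "e", "bathroom1"),    # a prepositional modifier of the event
}

def entails(stronger, weaker):
    """A conjunction of event predicates entails any subset of itself --
    the modifier-dropping entailment this representation is meant to capture."""
    return weaker <= stronger

# "John buttered the toast" follows by dropping the ("in", ...) conjunct.
weaker = {("butter", "e"), ("agent", "e", "John"), ("patient", "e", "toast1")}
```

The design point is that entailment between action sentences reduces to set inclusion of conjuncts, which a first-order logical form without an event variable cannot express.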

Some papers from Question Answering literature:

## Systems evaluated on Remedia Corpus

• Lynette Hirschman, Marc Light, Eric Breck, and John D. Burger (1999): Deep Read: A Reading Comprehension System uses a bag of words to find the answer sentence which has the best intersection with the question. The bag of words consists of the stemmed words in the sentence along with semantic labels like :PERSON and :LOCATION, and personal pronouns replaced by the last :PERSON named entity. Some other heuristics include preferring longer matching words and preferring sentences which appear earlier in the document.

Performs at 33% (HumSentAcc) on the Remedia corpus. (36% with perfect name and stem resolution)

• Eugene Charniak, Yasemin Altun, Rodrigo de Salvo Braz, Benjamin Garrett, Margaret Kosmala, Tomer Moscovich, Lixin Pang, Changbee Pyo, Ye Sun, Wei Wu (2000): Reading Comprehension Programs in a Statistical-Language-Processing Class applies the same bag-of-words approach with a few tweaks to push the numbers up a bit: specifically, bags of verbs, tf-idf-based matching instead of set intersection, and special rules for each question type. This work shows that down-weighting stop words is better than removing them altogether. Also, the correct answer is often not in the sentence with the best match but in the preceding or following sentence.

Performs at 41% (HumSentAcc) on the Remedia corpus.
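A minimal sketch of the tf-idf-style matching described above, contrasted with plain set intersection. The toy document and the idf smoothing are my own assumptions; the actual systems also stem words and add per-question-type rules:

```python
import math

def tfidf_score(question, sentence, doc_sentences):
    """Match score: sum of idf weights of words shared by question and
    sentence. Frequent (stop-like) words are down-weighted, not removed."""
    n = len(doc_sentences)
    def idf(w):
        df = sum(1 for s in doc_sentences if w in s)
        return math.log((n + 1) / (df + 1))   # smoothed inverse doc frequency
    return sum(idf(w) for w in set(question) & set(sentence))

doc = [
    "the dog chased the cat".split(),
    "the cat ran up a tree".split(),
    "it was a tall oak tree".split(),
]
q = "what did the dog chase".split()

# The rare word "dog" dominates the score, so the first sentence wins even
# though "the" also matches elsewhere.
best = max(doc, key=lambda s: tfidf_score(q, s, doc))
```

Under plain set intersection the common word "the" counts as much as "dog"; weighting by idf is what makes keeping stop words harmless.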

• Ellen Riloff and Michael Thelen (2000): A Rule-based Question Answering System for Reading Comprehension Tests is also a bag-of-words approach, augmented with semantic classes (HUMAN, LOCATION, MONTH, TIME). Specific scoring rules are hand-constructed for each question type.

Performs at 40% (HumSentAcc) on the Remedia corpus.

• Sanda M. Harabagiu, Steven J. Maiorano, and Marius A. Pasca (2003): Open-Domain Textual Question Answering
• question stem analysis and disambiguation
• uses 24 named-entity categories
• mapping of named-entity to answer type

Performs at 65.3% (HumSentAcc) on the Remedia corpus (76.4% with perfect named entity resolution and coreference resolution). No results are reported for their system's own named entity resolution and coreference resolution -- the first number includes named entity resolution only.

• Eugene Grois and David C. Wilkins (2005): Learning Strategies for Story Comprehension: A Reinforcement Learning Approach

Performs at 48% (HumSentAcc) on the Remedia Corpus.

• Ben Wellner, Lisa Ferro, Warren Greiff and Lynette Hirschman (2006): Reading comprehension tests for computer-based understanding evaluation creates a logical form of the question and answer, and uses abductive reasoning.

Performs at 46% (inexact) on the Remedia Corpus.

## Systems evaluated on TREC

• Dan Moldovan, Christine Clark, Sanda Harabagiu, and Steve Maiorano (2003): COGEX: A Logic Prover for Question Answering uses a theorem prover on the logical form of the question, text, and world-knowledge axioms to justify an answer. The logical form of a sentence is constructed using syntactic information such as "the subject, object, preposition attachment, complex nominals, and adjectival/adverbial adjuncts." Each word and its part-of-speech becomes a predicate in the logical form. Noun predicates have only one argument, while verbs take a subject and an object. The hypernym, instance-of, and part-of relations in WordNet, supplemented by information in the gloss, are used to relate words in the question and answer. NLP axioms effectively help to do anaphora resolution as well.

In TREC 2002, the COGEX system helped boost the LCC system's performance by 30%.

• Vasin Punyakanok, Dan Roth, and Wen-tau Yih (2004): Natural Language Inference via Dependency Tree Mapping: An Application to Question Answering computes weighted edit distance between the question represented as a tree and each candidate answer, also as a tree.
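To illustrate the weighted-matching idea in the Punyakanok et al. paper: below is a sketch of a weighted edit distance, run here on flattened node sequences rather than trees, and with a made-up relatedness table. Their model operates on dependency trees with richer, lexically informed costs:

```python
def weighted_edit_distance(q_nodes, a_nodes, sub_cost):
    """Standard edit-distance DP, except the substitution cost reflects word
    relatedness: sub_cost(u, v) is 0 for identical words, small for closely
    related ones (e.g. WordNet synonyms), and up to 1 otherwise."""
    m, n = len(q_nodes), len(a_nodes)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,        # delete a question node
                          d[i][j - 1] + 1,        # insert an answer node
                          d[i - 1][j - 1] + sub_cost(q_nodes[i - 1],
                                                     a_nodes[j - 1]))
    return d[m][n]

# Hypothetical relatedness table: identical words are free, synonyms cheap.
synonyms = {("chase", "pursue"), ("pursue", "chase")}
def cost(u, v):
    if u == v:
        return 0.0
    if (u, v) in synonyms:
        return 0.2
    return 1.0

d = weighted_edit_distance("dog chase cat".split(),
                           "dog pursue cat".split(), cost)
# d == 0.2 under these costs: one cheap synonym substitution
```

Candidate answers are then ranked by distance to the question, so a candidate that differs only by a synonym substitution outranks one requiring insertions or unrelated substitutions.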