Organizers
Alan Fern, Oregon State University
Liang Huang, Oregon State University
Jana Doppa, Washington State University (Contact Organizer)
Time and Location
1:45 to 5:30pm in Room 2 on July 11, 2016
Description
Structured Prediction (SP) deals with the task of mapping a structured input
(e.g., a sequence of words) to a structured output (e.g., a sequence of
part-of-speech tags). Many important applications in natural language
processing, computer vision, and bioinformatics can be naturally
formulated as structured output prediction problems. Some examples are
as follows: parsing (predicting the parse tree for a given sentence);
information extraction (detecting entity and event mentions in a
given text); co-reference resolution (recognizing and resolving
co-references of entities and events); machine translation (translating
a sentence from one language to another); object detection (detecting
objects in a given image); semantic segmentation (labeling each pixel
in an image with the corresponding semantic label); object tracking in
videos (predicting the tracks of multiple objects in a video);
text-to-speech and speech-to-text mapping; and protein structure
prediction (predicting the three-dimensional structure of a protein
from its amino acid sequence).
Each of these prediction
problems has a huge number of possible interpretations or outputs
(e.g., many possible part of speech taggings for a sentence). A
standard approach to structured prediction is to learn a cost function
for scoring a potential output for each input. Given such a cost
function and a new input, the output computation involves solving the
so-called ``Argmin inference problem,'' which is to find the minimum
cost output for the corresponding input. Unfortunately, exactly solving
the Argmin inference problem is often intractable (computationally very
hard) except for some special cases.
This tutorial will focus on
computational frameworks that integrate "learning" and "search" in a
principled manner to solve structured prediction problems.
Outline of the Tutorial
Part 1: Introduction
Motivation via Diverse Application Domains
- Natural language processing
- Computer vision
- Bioinformatics
- Planning
Classification to Structured Prediction
- Standard machine learning setup w/ simple outputs
- Simple outputs vs. Structured outputs
- Key challenge: large number of candidate structured outputs
Part 2: Cost Function Learning Framework and Argmin Inference Challenge
Cost Function Learning Algorithms: Generalizations of Classification Algorithms
- Perceptron => Structured Perceptron
- SVM => Structured SVM
- Naive Bayes => HMMs
- Logistic Regression => CRFs
Key Elements of Cost Function Learning Framework
- Joint feature function over structured input-output pairs
- (Loss-augmented) Argmin inference solver
- Optimization algorithm to tune the weights of the cost function
Generic Learning Framework and Challenges
- Repeatedly call the Argmin inference solver with the current weights on each training input and update the weights via an online or batch optimization algorithm (a minimal perceptron-style sketch appears after this list)
- Expressiveness of features vs. Time complexity of Argmin inference solver
- Learning w/ "Exact" Inference vs. "Approximate" Inference
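As a minimal sketch of this generic loop, here is a structured-perceptron-style instantiation in Python; the training data, the joint feature function phi, and the argmin_inference solver are placeholder components assumed for illustration, not part of the tutorial materials.

    import numpy as np

    def structured_perceptron(train_data, phi, argmin_inference, dim, num_epochs=10):
        """train_data: list of (x, y) pairs with structured outputs y (e.g., label tuples).
        phi(x, y): joint feature vector of dimension dim (numpy array).
        argmin_inference(x, w): minimum-cost output for x under weights w."""
        w = np.zeros(dim)
        for _ in range(num_epochs):
            for x, y_true in train_data:
                # Solve the Argmin inference problem with the current weights.
                y_pred = argmin_inference(x, w)
                if y_pred != y_true:
                    # Online update: lower the cost of the true output and
                    # raise the cost of the predicted output.
                    w += phi(x, y_pred) - phi(x, y_true)
        return w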
Focus of Tutorial
- Integrating "Learning" and "Search", two fundamental branches of AI, to solve structured prediction problems
- Structured Prediction as a search process and learning to improve the accuracy of search using training data
Part 3: Overview of Search Concepts: Search Space, Search Strategy, Search Control Knowledge etc.
Primitive Search Spaces:
- Initial State Function, Successor Function, and Terminal State Function
- Ordered vs. Unordered Search Spaces: making the decisions in a fixed order (e.g., left-to-right in sequence labeling) vs. no fixed ordering
- Single target path from initial state to goal state vs. Multiple target paths
Search Procedures: Best-First Search
- No pruning at one extreme and greedy search at the other extreme
- Beam search with beam width B \in [1, \infty] to cover the entire spectrum (a small breadth-first sketch follows this list)
- Best-first vs. Breadth-first Beam search
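A small Python sketch of breadth-first beam search over such a primitive search space; initial_state, successors, is_terminal, and the state-scoring function score are assumed interfaces for illustration. Beam width 1 recovers greedy search, and an unbounded width recovers search with no pruning.

    def breadth_first_beam_search(initial_state, successors, is_terminal, score, beam_width):
        """Assumes every path reaches a terminal state after finitely many steps."""
        beam = [initial_state]
        while True:
            candidates = []
            for state in beam:
                if is_terminal(state):
                    candidates.append(state)       # terminal states are kept, not expanded
                else:
                    candidates.extend(successors(state))
            # Prune: keep only the beam_width best-scoring states.
            candidates.sort(key=score)
            beam = candidates[:beam_width]
            if all(is_terminal(s) for s in beam):
                return beam[0]                     # best-scoring terminal state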
Search Control Knowledge
- Policies, Pruning Rules, and Heuristic Functions
Cost Function
- Scoring function to evaluate the terminal states
Part 4: Control Knowledge Learning Framework: Greedy Methods
Control Knowledge Learning Framework
- Given an ordered or unordered search space and a cost function, how to learn a greedy policy that quickly guides the search towards low-cost terminal states?
Greedy Methods
- Ordered Search Space: Classifier-based structured prediction
- Unordered Search Space: Easy-first approach
Classifier-based Structured Prediction
- Connections to Imitation Learning (IL)
- Exact imitation and error propagation (Theoretical results from Khardon, Syed & Schapire, Ross & Bagnell)
- Advanced Imitation Learning Algorithms: SEARN, SMILe, DAgger, AggreVaTe, LOLS
- DAgger algorithm as a representative example (a rough sketch follows this list)
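A rough Python sketch of a DAgger-style loop for this setting; the rollout routine, the oracle (expert) policy, and the base classifier trainer are assumed components, and the usual mixing of expert and learned policies across iterations is omitted for brevity, so this is an illustration rather than the exact published algorithm.

    def dagger(train_inputs, expert_policy, rollout, train_classifier, num_iterations=10):
        """rollout(x, policy): the list of states visited while predicting the
        output for x with policy; expert_policy(state): the oracle's action."""
        dataset = []             # aggregated (state, expert action) pairs
        policy = expert_policy   # the first iteration rolls out the expert itself
        for _ in range(num_iterations):
            for x in train_inputs:
                for state in rollout(x, policy):
                    # Label every visited state with the expert's action.
                    dataset.append((state, expert_policy(state)))
            # Retrain the policy on the aggregated dataset.
            policy = train_classifier(dataset)
        return policy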
Easy-First Approach
- Key idea: learn to make the easy prediction decisions first so that they constrain the harder decisions, akin to constraint satisfaction algorithms (a small decoding sketch follows this list)
- Imitation learning with a non-deterministic oracle policy: ties are broken with the learned policy (scoring function)
- Standard training approach: best-scoring target node vs. best-scoring non-target node
- Optimization-based learning approach: best-scoring target node vs. all violated non-target nodes with a passive-aggressive regularizer
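A small Python sketch of the easy-first decoding loop, under the assumption of hypothetical candidate_actions and apply_action interfaces for an unordered search space; at each step the learned scoring function commits to the most confident remaining decision first.

    def easy_first_decode(x, candidate_actions, apply_action, score):
        state = x
        while True:
            actions = candidate_actions(state)
            if not actions:
                return state                   # all decisions have been made
            # Commit to the highest-scoring ("easiest") decision first; its
            # outcome constrains the remaining, harder decisions.
            best = max(actions, key=lambda a: score(state, a))
            state = apply_action(state, best)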
Part 5: Control Knowledge Learning Framework: Beam Search Methods
Given an ordered or unordered search space and a cost function, how to learn a heuristic function that quickly guides the search procedure towards low-cost terminal states?
General Framework: 3 Choices
- How to define the notion of search error?
- How to learn (update weights) when we encounter a search error?
- What to do after learning?
Different Instantiations
- Early update instantiation (a condensed sketch follows this list)
- Max-violation update instantiation
- LaSO instantiation
- Convergence results
- Rate of convergence is inversely proportional to the beam width (amount of search allowed)
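A condensed Python sketch of the early-update instantiation: run breadth-first beam search with the current weights and, as soon as the target path falls off the beam, perform a perceptron-style update and abandon the example. The search-space interface and the gold_state_at helper are assumptions made for illustration.

    import numpy as np

    def early_update_epoch(train_data, initial_state, successors, phi, gold_state_at, w, beam_width):
        """gold_state_at(x, y_gold, depth): the state on the target path at the
        given depth (a hypothetical helper; assumes a recoverable target path)."""
        for x, y_gold in train_data:
            beam, depth = [initial_state(x)], 0
            while beam:
                depth += 1
                candidates = [s2 for s in beam for s2 in successors(s)]
                if not candidates:
                    break                              # all beam states were terminal
                # Keep the beam_width lowest-cost states under the current weights.
                candidates.sort(key=lambda s: np.dot(w, phi(x, s)))
                beam = candidates[:beam_width]
                gold = gold_state_at(x, y_gold, depth)
                if gold not in beam:
                    # Search error: the target path fell off the beam. Update
                    # the weights and move to the next example (early update).
                    w += phi(x, beam[0]) - phi(x, gold)
                    break
        return w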
Part 6: HC-Search: A Unifying framework for Cost Function Learning and Control Knowledge Learning
Unifying View
- Adding C to control knowledge learning or Adding H to cost function learning
- Without H, becomes cost function learning
- Without C, becomes control knowledge learning
Overview
- Key idea: generate high-quality candidate outputs via a search procedure guided by a learned heuristic H, and score the candidate outputs with a learned cost function C to select the least-cost candidate as the prediction (a minimal prediction-loop sketch follows this list)
- HC-Search supports learning to improve both speed and accuracy of structured prediction
- H can be learned either in a primitive space (see IJCAI'16 paper on incremental parsing) or in a complete output space (see also IJCAI'16 paper on heuristic search for computing M-Best Modes)
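A minimal Python sketch of HC-Search prediction as summarized above; the search routine and the learned H and C models are assumed interfaces, not a fixed API.

    def hc_search_predict(x, search, H, C, search_bound):
        # Generate high-quality candidate outputs via heuristic-guided search.
        candidates = search(x, heuristic=H, bound=search_bound)
        # Score the candidates with the cost function and return the least-cost one.
        return min(candidates, key=lambda y: C(x, y))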
HC-Search: Learning Approach
- How to learn H and C given a complete output space, search procedure, search bound, and loss function over a training set?
- Loss decomposition: generation error and selection error
- Greedy stage-wise loss minimization
- Learning H via Imitation Learning
- Learning C via Rank Learning
Search Space Design
- Quality of a search space: concept of (expected) target depth of a search space
- Flipbit search space (a tiny successor-function sketch follows this list)
- Limited Discrepancy Search (LDS) Space parameterized by a recurrent classifier or greedy policy (see also IJCAI'16 paper on LDS for AND/OR search spaces)
- Optimizing the quality of the LDS space by optimizing the accuracy of the recurrent classifier
- Sparse LDS spaces to trade off speed and accuracy
- Randomized Segmentation Space for computer vision problems: probabilistically sample likely object configurations in the image from a hierarchical segmentation tree
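A tiny Python sketch of the successor function for the flipbit search space over complete outputs: each state is a full labeling, and its successors differ from it in exactly one position. The tuple representation and label set are assumptions for illustration.

    def flipbit_successors(y, label_set):
        """y: a complete structured output represented as a tuple of labels."""
        successors = []
        for i, current in enumerate(y):
            for label in label_set:
                if label != current:
                    # Flip position i to a different label.
                    successors.append(y[:i] + (label,) + y[i + 1:])
        return successors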
Engineering Methodology for Applying HC-Search
Relation to Other Methods
- Inference in HC-Search vs. Inference in CRF/SSVM
- Re-ranking style algorithms (generative models, M-Best diverse modes, DPPs, etc.)
Part 7: Future Directions
- Designing and optimizing search spaces for complex structured prediction problems
- Leveraging deep learning techniques to enhance (search-based) structured prediction approaches
- What architectures are more suitable for "Anytime predictions" and what is the best way to perform learning for making anytime predictions
- Theoretical analysis: sample complexity and generalization bounds
- More ...
Background of Presenters
Alan Fern is a Professor of Computer Science at Oregon State University. He
received his Ph.D. (2004) and M.S. (2000) in Computer Engineering from
Purdue University under the direction of Robert Givan, and his B.S.
(1997) in Electrical Engineering from the University of Maine. He
received a National Science Foundation CAREER award in 2006, is an
associate editor of the Machine Learning Journal, and currently serves
on the editorial boards of the Journal of Artificial Intelligence
Research and the Artificial Intelligence Journal. His research
interests span a range of topics in artificial intelligence, including
machine learning and automated planning, with a particular interest in
work falling at their intersection. He is also interested in
applications of AI, including AI for real-time strategy games and
activity recognition from videos of American football and other sports.
Liang Huang
is an Assistant Professor of Computer Science at Oregon State
University. He was most recently an assistant professor at the City
University of New York (CUNY) and a part-time research scientist at IBM
T. J. Watson Research Center. He graduated in 2008 from the University
of Pennsylvania and has worked as a research scientist at Google and a
research assistant professor at USC/ISI. Most of his work develops fast
algorithms and provable theory to speed up large-scale natural language
processing and structured machine learning. He has received a Best
Paper Award at ACL 2008, several best paper nominations (ACL 2007,
EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and
2013), a Yahoo Faculty Research Award (2015), and a University Teaching
Prize at Penn (2005). His research has been supported by DARPA, NSF,
Google, and Yahoo. He also co-authored a best-selling textbook in China
on algorithms for programming contests.
Jana Doppa is
an Assistant Professor of Computer Science at Washington State
University, Pullman. He earned his PhD working with the Artificial
Intelligence group at Oregon State University (2014) and his MTech
from Indian Institute of Technology (IIT), Kanpur, India (2006). His
general research interests are in the broad field of Artificial
Intelligence (AI) and its applications including planning, natural
language processing, and computer vision. He received an Outstanding
Paper Award for his structured prediction work at the AAAI (2013)
conference and a Google Faculty Research Award (2015). His PhD
dissertation entitled ``Integrating Learning and Search for Structured
Prediction'' was nominated for the ACM Distinguished Dissertation Award
(2015). He has organized a successful workshop at ICML (2013) on
structured prediction and interactions between inference and learning.