CPT_S 580 Spring 2016: Structured Prediction: Algorithms and Applications

  Basic Information

Instructor: Janardhan Rao (Jana) Doppa
Email: jana AT eecs dot wsu dot edu
Office: EME 133
Office hours: Mon 4-5 and Fri 3-4 or by appointment
Class location and time: Sloan 161, Tue and Thu 2:50-4:05
Class email-list: We will use Piazza for all class announcements and discussions.


  Announcements

  Course Information

Contents of the Course:

Structured prediction is a sub-field of machine learning where the goal is to learn a mapping from a structured input to a structured output. Example structures include sequences, trees, and graphs. These types of problems arise in several applied fields.
Each such prediction problem has a huge number of possible outputs (e.g., many possible POS taggings for a sentence) and poses severe learning and inference challenges. In this course, we will study different learning frameworks for solving structured prediction problems.
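
To make the size of the output space concrete, here is a minimal sketch in Python that counts and enumerates the candidate POS taggings of a short sentence. The three-tag inventory, the example sentence, and the 45-tag figure used at the end are illustrative assumptions, not part of the course material.

```python
from itertools import product


def count_taggings(sentence_length: int, num_tags: int) -> int:
    """Number of distinct tag sequences when each word can take any tag."""
    return num_tags ** sentence_length


def enumerate_taggings(words, tags):
    """Explicitly list every candidate tagging (only feasible for tiny inputs)."""
    return [list(zip(words, assignment))
            for assignment in product(tags, repeat=len(words))]


# Hypothetical toy tag set and sentence, chosen only for illustration.
TAGS = ["DET", "NOUN", "VERB"]
words = ["the", "dog", "barks"]

all_taggings = enumerate_taggings(words, TAGS)
print(len(all_taggings))  # 27 candidate outputs for a 3-word sentence

# With an assumed 45-tag inventory and a 20-word sentence, enumeration is
# hopeless; the count alone shows why specialized learning and inference
# methods are needed.
print(count_taggings(20, 45))
```

The count grows exponentially with sentence length, which is why exhaustive enumeration of outputs is not a practical inference strategy.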

Learning Objectives of the Course:

By the end of the course, students will be able to:

Grading Policy:

Late Policy:

All assignments and the project proposal and report are due at the start of class. The late policy is as follows.
If you are late, please slip the assignment under my office door.

Exam Policy:
Collaboration Policy:
Safety on Campus:

 
Accommodation for Students with Disabilities:


   Lecture Schedule


Date Topic Suggested / Optional reading
Tue 1/12
Thu 1/14
Introduction to Structured Prediction
Simple outputs vs. Structured outputs
Diverse applications of structured prediction
Structured Prediction vs. Combinatorial Optimization

Tue 1/19
Thu 1/21
Structured Prediction: The Big Picture

Cost Function Learning Framework 
Control Knowledge Learning Framework
HC-Search Framework

Learning with Inference vs. Learning for Inference
Tue 1/26
Thu 1/28
Basic Search Concepts
Search space; search strategies; and search control knowledge
greedy; breadth-first and best-first beam; and branch-and-bound search

Control Knowledge Learning Framework: Big Picture
Design choices: how to define search error? when and how to perform weight updates?

Overview of Learning as Search Optimization (LaSO) approach
LaSO-BR vs. LaSO-BST and convergence of online weight updates
Beam-width-dependent mistake bound and its implication for the relationship between hardness of learning and amount of search

Structured classification vs. planning
LaSO for learning to plan
Tue 2/2
Thu 2/4
Tue 2/9
Thu 2/11
Tue 2/16
Thu 2/18
Greedy Control Knowledge Learning: Classifier based Structured Prediction

Reductions in machine learning: key ideas and examples

Structured prediction, imitation learning, and reduction to classification
Recurrent classifier learning via exact imitation
Error propagation issue with exact imitation learning

Advanced imitation learning algorithms:
Forward Training
SEARN and connections to Conservative Policy Iteration (CPI)
SMiLe
DAgger
AGGREVATE
LOLS
Suggested list of papers for group presentations:

2/23
2/25
Group paper presentations:

Feb 23:

Feb 25:

3/1
3/5
Easy-First Framework for Structured Prediction

Fixed ordering of decisions vs. learned ordering of decisions
Connections to constraint satisfaction

Learning parameters of the action scoring function:
Best Good vs. Best Bad (BGBB)
Average Good vs. Average Bad (AGAB) -- same as LaSO weight update
Best Good vs. Violated Bad (BGVB)
BGBB and BGVB with Passive-Aggressive regularizer

Applications: within-document coreference resolution, cross-document entity and event coreference resolution

Learning to Improve Combinatorial Optimization:
STAGE algorithm
Reinforcement Learning for job shop scheduling
MIMIC algorithm

3/8
3/10
3/22
3/24
HC-Search Framework for Structured Prediction
A unifying framework for cost function learning and control knowledge learning

Primitive Search Space vs. Search Space over Complete Outputs
Quality of a search space
Search spaces: Flipbit and Limited Discrepancy Search (LDS) space
Quality of Flipbit vs. Quality of LDS space
Randomized Segmentation space for computer vision problems

Loss decomposition in terms of generation and selection loss
Heuristic Learning via Imitation Learning
Reduction to Rank Learning
Characterization of pairwise ranking examples for any rank-based search procedure
Cost function learning via rank learning

Efficiency Issues
Sparse Search Spaces to Improve Efficiency

Engineering methodology for applying HC-Search to new structured prediction problems

3/29
3/31
4/5
4/7
4/12
4/14
Cost Function Learning Framework for Structured Prediction
Learning with fixed Inference algorithm

Naive Bayes to Hidden Markov Models (HMMs)
Perceptron to Structured Perceptron
Logistic Regression to Conditional Random Fields (CRF)
SVM to Structured SVM (SSVM)

Training algorithms: stochastic gradient descent and cutting plane algorithms
Inference algorithms: Viterbi and Belief propagation

Independent Learning
Independent Learning and Inference with constraints (connections to penalty logic)
Global learning
IL vs. ILC vs. GL

Piece-wise training
Pseudo-max training
Decomposed learning framework

Structured prediction cascades
Coarse-to-Fine learning and inference
Generalized A* search architecture
Suggested list of papers for group presentations:
4/19
4/21
4/26
Group paper presentations:
  • Zhila: Yisong Yue, Thomas Finley, Filip Radlinski, Thorsten Joachims: A support vector method for optimizing average precision. SIGIR 2007: 271-278
  • Ehdieh and Namaki: Liang Huang, Suphan Fayong, Yang Guo: Structured Perceptron with Inexact Search. NAACL 2012
  • Qi and Reza: Shay Zakov, Yoav Goldberg, Michael Elhadad, Michal Ziv-Ukelson: Rich Parameterization Improves RNA Structure Prediction. RECOMB 2011: 546-562

  • Rakib: Xiao Cheng, Dan Roth: Relational Inference for Wikification. EMNLP 2013: 1787-1796
  • David and Jin Tao: Thomas Finley, Thorsten Joachims: Training structural SVMs when exact inference is intractable. ICML 2008: 304-311
  • Tao and Yao: Thomas Finley, Thorsten Joachims: Supervised clustering with support vector machines. ICML 2005: 217-224
  • Yunshu: Wei Lu, Dan Roth: Automatic Event Extraction with Structured Preference Modeling. ACL (1) 2012: 835-844
4/27
Structured prediction cascades (contd.)
Review of different learning frameworks for structured prediction

  Textbooks and Additional Resources

We will not follow any fixed textbook for this course. The instructor will provide the lecture slides and notes at the beginning of each class.

An optional list of textbooks is as follows:
A list of machine learning software that you can use as needed: