Instance Based Learning

Key idea: just store all training examples $\langle x_i, f(x_i) \rangle$

Nearest neighbor:

Given query instance $x_{q}$, first locate the nearest training example $x_{n}$, then estimate $\hat{f}(x_{q}) \leftarrow f(x_{n})$.

k-Nearest neighbor:

Given $x_{q}$, take a vote among its k nearest neighbors (if discrete-valued target function); take the mean of the $f$ values of the k nearest neighbors (if real-valued target function):

\begin{displaymath}\hat{f}(x_{q}) \leftarrow \frac{\sum_{i=1}^{k} f(x_{i})}{k} \end{displaymath}
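The discrete-valued case can be sketched in a few lines of Python. The `knn_classify` helper below and its choice of Euclidean distance are illustrative, not part of the original notes:

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among the k nearest stored examples.

    `examples` is a list of (x, label) pairs, where x is a tuple of numbers.
    (Hypothetical helper for illustration.)
    """
    # Sort stored instances by Euclidean distance to the query
    by_dist = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    # Vote among the k nearest labels
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to plain nearest neighbor.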

When To Consider Nearest Neighbor

Instances map to points in $\Re^{n}$

Fewer than 20 attributes per instance

Lots of training data

Advantages: training is very fast; can learn complex target functions; no information is lost

Disadvantages: slow at query time; easily fooled by irrelevant attributes

Voronoi Diagram

The decision surface induced by 1-nearest neighbor partitions the instance space into convex cells: each training example claims the region of queries that lie closer to it than to any other training example.
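A minimal sketch of this partition on a discretized grid, assuming Euclidean distance (the `voronoi_cells` helper is hypothetical):

```python
import math

def voronoi_cells(points, grid):
    """Label each grid location with the index of its nearest point.

    The resulting partition of the plane is exactly the Voronoi diagram
    that 1-nearest neighbor uses as its decision surface.
    """
    return {g: min(range(len(points)), key=lambda i: math.dist(g, points[i]))
            for g in grid}
```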
Behavior in the Limit

Let p(x) denote the probability that instance x will be labeled 1 (positive) versus 0 (negative).

Nearest neighbor:

As the number of training examples $\rightarrow\infty$, approaches the Gibbs algorithm (Gibbs: with probability p(x) predict 1, else 0), whose expected error is at most twice that of Bayes optimal.

k-Nearest neighbor:

As the number of training examples $\rightarrow\infty$ and k gets large, approaches Bayes optimal.

Bayes optimal: if p(x) > .5 then predict 1, else 0
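The Bayes optimal rule itself is a one-liner once p(x) is known; a sketch:

```python
def bayes_optimal(p_x):
    """Bayes optimal prediction given p_x = P(label is 1 | x):
    predict 1 exactly when p(x) > .5, else 0."""
    return 1 if p_x > 0.5 else 0
```

No real learner has access to p(x); this is the unbeatable baseline that kNN approaches in the limit.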

Distance-Weighted kNN

We might want to weight nearer neighbors more heavily...

\begin{displaymath}\hat{f}(x_{q}) \leftarrow \frac{\sum_{i=1}^{k} w_{i} f(x_{i})}{\sum_{i=1}^{k} w_{i}} \end{displaymath}

\begin{displaymath}w_{i} \equiv \frac{1}{d(x_{q}, x_{i})^{2}} \end{displaymath}

and $d(x_{q}, x_{i})$ is the distance between $x_{q}$ and $x_{i}$.

Note that it now makes sense to use all training examples instead of just the k nearest (this variant is known as Shepard's method).
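A sketch of the weighted estimate above, assuming Euclidean distance; passing `k=None` uses all training examples as just noted:

```python
import math

def weighted_knn_predict(x_q, examples, k=None):
    """Distance-weighted kNN for a real-valued target:
    f_hat(x_q) = sum(w_i * f(x_i)) / sum(w_i), with w_i = 1 / d(x_q, x_i)^2.

    `examples` is a list of (x, f(x)) pairs; k=None uses every example
    (Shepard's method).
    """
    by_dist = sorted(examples, key=lambda ex: math.dist(x_q, ex[0]))
    if k is not None:
        by_dist = by_dist[:k]
    num = den = 0.0
    for x_i, f_i in by_dist:
        d = math.dist(x_q, x_i)
        if d == 0:                 # query matches a stored instance exactly
            return f_i             # return its stored value to avoid w = inf
        w = 1.0 / d ** 2
        num += w * f_i
        den += w
    return num / den
```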


Day  Outlook   Temperature  Humidity  Wind    PlayTennis

D1   Sunny     88           High      Weak    No   (4)
D2   Sunny     80           High      Strong  No   (2)
D3   Overcast  92           High      Weak    Yes  (8)
D4   Rain      72           High      Weak    Yes  (6)
D5   Rain      51           Normal    Weak    Yes  (6)
D6   Rain      55           Normal    Strong  No   (2)
D7   Overcast  60           Normal    Strong  Yes  (10)
D8   Sunny     75           High      Weak    No   (9)
D9   Sunny     48           Normal    Weak    Yes  (7)
D10  Rain      68           Normal    Weak    Yes  (6)
D11  Sunny     78           Normal    Strong  Yes  (7)
D12  Overcast  77           High      Strong  Yes  (8)
D13  Overcast  95           Normal    Weak    Yes  (8)
D14  Rain      68           High      Strong  No   (4)
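To make the table concrete, here is kNN run on just the numeric Temperature attribute. This is a deliberate simplification for illustration: the categorical attributes (Outlook, Humidity, Wind) would need a suitable distance, e.g. a 0/1 mismatch count, before kNN could use them.

```python
from collections import Counter

# (Temperature, PlayTennis) pairs from the table above, D1 through D14
data = [(88, 'No'), (80, 'No'), (92, 'Yes'), (72, 'Yes'), (51, 'Yes'),
        (55, 'No'), (60, 'Yes'), (75, 'No'), (48, 'Yes'), (68, 'Yes'),
        (78, 'Yes'), (77, 'Yes'), (95, 'Yes'), (68, 'No')]

def predict_play(temp, k=3):
    """Majority vote among the k days with the closest temperature."""
    nearest = sorted(data, key=lambda ex: abs(temp - ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```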

Curse of Dimensionality

Imagine instances described by 20 attributes, but only 2 are relevant to target function

Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional

One approach:

Stretch the jth axis by weight $z_{j}$, where $z_{1}, \ldots, z_{n}$ are chosen to minimize prediction error.

Use cross-validation to automatically choose the weights $z_{1}, \ldots, z_{n}$.

Note that setting $z_{j}$ to zero eliminates this dimension altogether.
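Axis stretching amounts to a weighted distance function; a minimal sketch, assuming Euclidean distance (the weight vector `z` would in practice come from cross-validation):

```python
import math

def stretched_dist(x, y, z):
    """Euclidean distance after stretching the jth axis by weight z[j].

    Setting z[j] = 0 eliminates dimension j altogether, so irrelevant
    attributes stop misleading the nearest-neighbor search.
    """
    return math.sqrt(sum((zj * (xj - yj)) ** 2
                         for xj, yj, zj in zip(x, y, z)))
```

For example, with z = (1, 0) the second coordinate is ignored entirely.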