Key idea: just store all training examples 
Nearest neighbor:
- Given query instance $x_q$, first locate the nearest training example $x_n$, then estimate $\hat{f}(x_q) \leftarrow f(x_n)$.

k-Nearest neighbor:
- Given $x_q$, take a vote among its k nearest neighbors (if the target function is discrete-valued).
- Take the mean of the $f$ values of its k nearest neighbors (if the target function is real-valued).
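A minimal Python sketch of the discrete-valued case (Euclidean distance and a simple majority vote; the function name `knn_classify` is illustrative, not from these notes):

```python
from collections import Counter

import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """Vote among the k nearest training examples (k = 1 gives plain nearest neighbor)."""
    # Euclidean distance from the query to every stored example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Labels of the k closest examples
    nearest_labels = y_train[np.argsort(dists)[:k]]
    # Majority vote among those labels
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny usage example with two classes in the plane
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["neg", "neg", "pos", "pos"])
print(knn_classify(X, y, np.array([0.8, 0.9]), k=3))  # -> "pos"
```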
 
Advantages:
- Training is very fast: just store the examples.
- Can learn complex target functions.
- No information is lost, since all training data is retained.

Disadvantages:
- Slow at query time: all computation is deferred until classification.
- Easily fooled by irrelevant attributes (see the curse of dimensionality below).
 
 
Consider $p(x)$, the probability that instance $x$ will be labeled 1 (positive) versus 0 (negative).
k-Nearest neighbor:
- As the number of training examples approaches infinity and k grows large, k-nearest neighbor approaches the Bayes optimal classifier (Bayes optimal: predict 1 if $p(x) > 0.5$, else predict 0).
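A short worked comparison of the limiting error rates at a fixed query point (a sketch added for clarity; it assumes that, in the infinite-sample limit, the nearest neighbor's label is an independent draw with the same $p(x)$):

```latex
% Limiting error rates at a fixed query point x; write p = p(x).
\begin{align*}
\text{Bayes optimal error:}\quad
  & \min\bigl(p,\ 1-p\bigr)
  && \text{(predict 1 iff } p > 0.5\text{)} \\
\text{1-NN error in the limit:}\quad
  & p(1-p) + (1-p)p \;=\; 2p(1-p)
  && \text{(neighbor's label is an independent draw with the same } p\text{)} \\
\text{Comparison:}\quad
  & 2p(1-p) \;=\; 2\min(p,1-p)\,\max(p,1-p) \;\le\; 2\min(p,1-p)
\end{align*}
% So even plain nearest neighbor is, in the limit, within a factor of two of the
% Bayes optimal error, and letting k grow large closes the remaining gap.
```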
 
Might want to weight nearer neighbors more heavily:

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i\, f(x_i)}{\sum_{i=1}^{k} w_i}$$

where

$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$

and $d(x_q, x_i)$ is the distance between $x_q$ and $x_i$.

Note that it now makes sense to use all training examples instead of just the k nearest (Shepard's method).
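A minimal Python sketch of the distance-weighted rule (assuming a real-valued target and Euclidean distance; for a discrete-valued target, the weighted mean would be replaced by a weighted vote):

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x_query, k=5, eps=1e-12):
    """Distance-weighted k-NN estimate of f(x_query) with weights w_i = 1 / d(x_q, x_i)^2."""
    # Euclidean distances from the query to every stored example
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # If the query coincides with a training example, return its target value directly
    if np.any(dists < eps):
        return y_train[np.argmin(dists)]

    # Indices of the k nearest neighbors and their inverse-squared-distance weights
    nearest = np.argsort(dists)[:k]
    w = 1.0 / dists[nearest] ** 2

    # Weighted mean of the neighbors' target values
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Usage example: noisy samples of f(x) = x^2 on [0, 1]
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.01, size=50)
print(distance_weighted_knn(X, y, np.array([0.5]), k=5))  # roughly 0.25
```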
 
| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | 88          | High     | Weak   | No (4)     |
| D2  | Sunny    | 80          | High     | Strong | No (2)     |
| D3  | Overcast | 92          | High     | Weak   | Yes (8)    |
| D4  | Rain     | 72          | High     | Weak   | Yes (6)    |
| D5  | Rain     | 51          | Normal   | Weak   | Yes (6)    |
| D6  | Rain     | 55          | Normal   | Strong | No (2)     |
| D7  | Overcast | 60          | Normal   | Strong | Yes (10)   |
| D8  | Sunny    | 75          | High     | Weak   | No (9)     |
| D9  | Sunny    | 48          | Normal   | Weak   | Yes (7)    |
| D10 | Rain     | 68          | Normal   | Weak   | Yes (6)    |
| D11 | Sunny    | 78          | Normal   | Strong | Yes (7)    |
| D12 | Overcast | 77          | High     | Strong | Yes (8)    |
| D13 | Overcast | 95          | Normal   | Weak   | Yes (8)    |
| D14 | Rain     | 68          | High     | Strong | No (4)     |
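A small sketch of k-NN applied to this table (the distance here is an illustrative assumption: a mismatch count on the categorical attributes plus a range-scaled Temperature difference; the parenthesized numbers in the PlayTennis column are not used):

```python
from collections import Counter

# Examples from the table above: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny",    88, "High",   "Weak",   "No"),
    ("Sunny",    80, "High",   "Strong", "No"),
    ("Overcast", 92, "High",   "Weak",   "Yes"),
    ("Rain",     72, "High",   "Weak",   "Yes"),
    ("Rain",     51, "Normal", "Weak",   "Yes"),
    ("Rain",     55, "Normal", "Strong", "No"),
    ("Overcast", 60, "Normal", "Strong", "Yes"),
    ("Sunny",    75, "High",   "Weak",   "No"),
    ("Sunny",    48, "Normal", "Weak",   "Yes"),
    ("Rain",     68, "Normal", "Weak",   "Yes"),
    ("Sunny",    78, "Normal", "Strong", "Yes"),
    ("Overcast", 77, "High",   "Strong", "Yes"),
    ("Overcast", 95, "Normal", "Weak",   "Yes"),
    ("Rain",     68, "High",   "Strong", "No"),
]

TEMP_RANGE = 95 - 48  # scale Temperature differences into roughly [0, 1]

def dist(a, b):
    """Mismatch count over the categorical attributes plus a scaled Temperature gap."""
    mismatches = sum(a[i] != b[i] for i in (0, 2, 3))
    return mismatches + abs(a[1] - b[1]) / TEMP_RANGE

def classify(query, k=3):
    """Majority PlayTennis vote among the k most similar days."""
    neighbors = sorted(data, key=lambda row: dist(query, row))[:k]
    return Counter(row[4] for row in neighbors).most_common(1)[0][0]

# Classify a new day: Sunny outlook, temperature 70, Normal humidity, Weak wind
print(classify(("Sunny", 70, "Normal", "Weak")))  # -> "Yes" under this illustrative distance
```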
 
Imagine instances described by 20 attributes, but only 2 are relevant to the target function.
Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional, since the distance is dominated by the many irrelevant attributes.
One approach: stretch the j-th axis by a weight $z_j$, with $z_1, \ldots, z_n$ chosen to minimize prediction error (e.g., selected by cross-validation); setting $z_j = 0$ eliminates attribute j altogether, as in the sketch below.
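A sketch of that weighting idea (assuming, purely for illustration, binary candidate weights $z_j \in \{0, 1\}$, leave-one-out error, and a 2-attribute toy problem; with 20 attributes the weights would be optimized rather than enumerated exhaustively):

```python
from itertools import product

import numpy as np

def weighted_nn_predict(X, y, x_query, z, k=1):
    """k-NN vote under the stretched distance d(a, b) = ||z * (a - b)||."""
    dists = np.linalg.norm((X - x_query) * z, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def choose_weights(X, y, candidates=(0.0, 1.0), k=1):
    """Pick axis weights z_j by leave-one-out error; z_j = 0 drops attribute j entirely."""
    best_z, best_err = None, np.inf
    for z in product(candidates, repeat=X.shape[1]):
        z = np.array(z)
        if not z.any():
            continue  # skip the degenerate all-zero weighting
        # Leave-one-out error: predict each example from all the others
        err = sum(
            weighted_nn_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], z, k) != y[i]
            for i in range(len(X))
        )
        if err < best_err:
            best_z, best_err = z, err
    return best_z

# Toy problem: only attribute 0 is relevant, attribute 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
y = (X[:, 0] > 0).astype(int)
print(choose_weights(X, y))  # typically [1. 0.]: the irrelevant axis gets weight 0
```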