Key idea: just store all training examples 
Nearest neighbor:
- Given query instance $x_q$, first locate the nearest training example $x_n$, then estimate $\hat{f}(x_q) \leftarrow f(x_n)$.

k-Nearest neighbor:
- Given $x_q$, take a vote among its k nearest neighbors (if the target function is discrete-valued).
- Take the mean of the $f$ values of its k nearest neighbors (if the target function is real-valued).
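A minimal Python sketch of the discrete-valued case (Euclidean distance and a simple majority vote; the function name `knn_classify` is illustrative, not from these notes):

```python
from collections import Counter

import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """Vote among the k nearest training examples (k = 1 gives plain nearest neighbor)."""
    # Euclidean distance from the query to every stored example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Labels of the k closest examples
    nearest_labels = y_train[np.argsort(dists)[:k]]
    # Majority vote among those labels
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny usage example with two classes in the plane
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["neg", "neg", "pos", "pos"])
print(knn_classify(X, y, np.array([0.8, 0.9]), k=3))  # -> "pos"
```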
 
Advantages:
- Training is very fast: just store the examples.
- Can learn complex target functions.
- No information is lost, since all training data is retained.

Disadvantages:
- Slow at query time: all computation is deferred until classification.
- Easily fooled by irrelevant attributes (see the curse of dimensionality below).
 
 
Consider $p(x)$, the probability that instance $x$ will be labeled 1 (positive) versus 0 (negative).
k-Nearest neighbor:
- As the number of training examples approaches infinity and k grows large, k-nearest neighbor approaches the Bayes optimal classifier (Bayes optimal: predict 1 if $p(x) > 0.5$, else predict 0).
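A short worked comparison of the limiting error rates at a fixed query point (a sketch added for clarity; it assumes that, in the infinite-sample limit, the nearest neighbor's label is an independent draw with the same $p(x)$):

```latex
% Limiting error rates at a fixed query point x; write p = p(x).
\begin{align*}
\text{Bayes optimal error:}\quad
  & \min\bigl(p,\ 1-p\bigr)
  && \text{(predict 1 iff } p > 0.5\text{)} \\
\text{1-NN error in the limit:}\quad
  & p(1-p) + (1-p)p \;=\; 2p(1-p)
  && \text{(neighbor's label is an independent draw with the same } p\text{)} \\
\text{Comparison:}\quad
  & 2p(1-p) \;=\; 2\min(p,1-p)\,\max(p,1-p) \;\le\; 2\min(p,1-p)
\end{align*}
% So even plain nearest neighbor is, in the limit, within a factor of two of the
% Bayes optimal error, and letting k grow large closes the remaining gap.
```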
 
Might want to weight nearer neighbors more heavily:

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i\, f(x_i)}{\sum_{i=1}^{k} w_i}$$

where

$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$

and $d(x_q, x_i)$ is the distance between $x_q$ and $x_i$.

Note that it now makes sense to use all training examples instead of just the k nearest (Shepard's method).
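A minimal Python sketch of the distance-weighted rule (assuming a real-valued target and Euclidean distance; for a discrete-valued target, the weighted mean would be replaced by a weighted vote):

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x_query, k=5, eps=1e-12):
    """Distance-weighted k-NN estimate of f(x_query) with weights w_i = 1 / d(x_q, x_i)^2."""
    # Euclidean distances from the query to every stored example
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # If the query coincides with a training example, return its target value directly
    if np.any(dists < eps):
        return y_train[np.argmin(dists)]

    # Indices of the k nearest neighbors and their inverse-squared-distance weights
    nearest = np.argsort(dists)[:k]
    w = 1.0 / dists[nearest] ** 2

    # Weighted mean of the neighbors' target values
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Usage example: noisy samples of f(x) = x^2 on [0, 1]
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.01, size=50)
print(distance_weighted_knn(X, y, np.array([0.5]), k=5))  # roughly 0.25
```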
 
| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | 88          | High     | Weak   | No (4)     |
| D2  | Sunny    | 80          | High     | Strong | No (2)     |
| D3  | Overcast | 92          | High     | Weak   | Yes (8)    |
| D4  | Rain     | 72          | High     | Weak   | Yes (6)    |
| D5  | Rain     | 51          | Normal   | Weak   | Yes (6)    |
| D6  | Rain     | 55          | Normal   | Strong | No (2)     |
| D7  | Overcast | 60          | Normal   | Strong | Yes (10)   |
| D8  | Sunny    | 75          | High     | Weak   | No (9)     |
| D9  | Sunny    | 48          | Normal   | Weak   | Yes (7)    |
| D10 | Rain     | 68          | Normal   | Weak   | Yes (6)    |
| D11 | Sunny    | 78          | Normal   | Strong | Yes (7)    |
| D12 | Overcast | 77          | High     | Strong | Yes (8)    |
| D13 | Overcast | 95          | Normal   | Weak   | Yes (8)    |
| D14 | Rain     | 68          | High     | Strong | No (4)     |
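A small sketch of k-NN applied to this table (the distance here is an illustrative assumption: a mismatch count on the categorical attributes plus a range-scaled Temperature difference; the parenthesized numbers in the PlayTennis column are not used):

```python
from collections import Counter

# Examples from the table above: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny",    88, "High",   "Weak",   "No"),
    ("Sunny",    80, "High",   "Strong", "No"),
    ("Overcast", 92, "High",   "Weak",   "Yes"),
    ("Rain",     72, "High",   "Weak",   "Yes"),
    ("Rain",     51, "Normal", "Weak",   "Yes"),
    ("Rain",     55, "Normal", "Strong", "No"),
    ("Overcast", 60, "Normal", "Strong", "Yes"),
    ("Sunny",    75, "High",   "Weak",   "No"),
    ("Sunny",    48, "Normal", "Weak",   "Yes"),
    ("Rain",     68, "Normal", "Weak",   "Yes"),
    ("Sunny",    78, "Normal", "Strong", "Yes"),
    ("Overcast", 77, "High",   "Strong", "Yes"),
    ("Overcast", 95, "Normal", "Weak",   "Yes"),
    ("Rain",     68, "High",   "Strong", "No"),
]

TEMP_RANGE = 95 - 48  # scale Temperature differences into roughly [0, 1]

def dist(a, b):
    """Mismatch count over the categorical attributes plus a scaled Temperature gap."""
    mismatches = sum(a[i] != b[i] for i in (0, 2, 3))
    return mismatches + abs(a[1] - b[1]) / TEMP_RANGE

def classify(query, k=3):
    """Majority PlayTennis vote among the k most similar days."""
    neighbors = sorted(data, key=lambda row: dist(query, row))[:k]
    return Counter(row[4] for row in neighbors).most_common(1)[0][0]

# Classify a new day: Sunny outlook, temperature 70, Normal humidity, Weak wind
print(classify(("Sunny", 70, "Normal", "Weak")))  # -> "Yes" under this illustrative distance
```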
 
Imagine instances described by 20 attributes, but only 2 are relevant to the target function.
Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional, since the distance is dominated by the many irrelevant attributes.
One approach: stretch the j-th axis by a weight $z_j$, with $z_1, \ldots, z_n$ chosen to minimize prediction error (e.g., selected by cross-validation); setting $z_j = 0$ eliminates attribute j altogether, as in the sketch below.
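A sketch of that weighting idea (assuming, purely for illustration, binary candidate weights $z_j \in \{0, 1\}$, leave-one-out error, and a 2-attribute toy problem; with 20 attributes the weights would be optimized rather than enumerated exhaustively):

```python
from itertools import product

import numpy as np

def weighted_nn_predict(X, y, x_query, z, k=1):
    """k-NN vote under the stretched distance d(a, b) = ||z * (a - b)||."""
    dists = np.linalg.norm((X - x_query) * z, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def choose_weights(X, y, candidates=(0.0, 1.0), k=1):
    """Pick axis weights z_j by leave-one-out error; z_j = 0 drops attribute j entirely."""
    best_z, best_err = None, np.inf
    for z in product(candidates, repeat=X.shape[1]):
        z = np.array(z)
        if not z.any():
            continue  # skip the degenerate all-zero weighting
        # Leave-one-out error: predict each example from all the others
        err = sum(
            weighted_nn_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], z, k) != y[i]
            for i in range(len(X))
        )
        if err < best_err:
            best_z, best_err = z, err
    return best_z

# Toy problem: only attribute 0 is relevant, attribute 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
y = (X[:, 0] > 0).astype(int)
print(choose_weights(X, y))  # typically [1. 0.]: the irrelevant axis gets weight 0
```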