Provides practical learning algorithms:
* Naive Bayes learning
* Bayesian belief network learning
* Combine prior knowledge (prior probabilities) with observed data
Provides useful conceptual framework
Generally want the most probable hypothesis given the training data
Maximum a posteriori hypothesis $h_{MAP}$:
$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\, P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\, P(h)$
If we assume $P(h_i) = P(h_j)$ for all $i, j$, we can simplify further and choose the maximum likelihood (ML) hypothesis
$h_{ML} = \arg\max_{h_i \in H} P(D \mid h_i)$
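A minimal sketch of picking $h_{MAP}$ and $h_{ML}$ by brute force over a small hypothesis space (the priors and likelihoods below are made-up illustrative numbers):

# Brute-force MAP and ML hypothesis selection over a tiny hypothesis space.
# The priors P(h) and likelihoods P(D|h) are illustrative placeholders.
priors = {"h1": 0.7, "h2": 0.3}        # P(h)
likelihoods = {"h1": 0.5, "h2": 0.9}   # P(D|h) for the observed data D

h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])  # argmax P(D|h) P(h)
h_ml = max(priors, key=lambda h: likelihoods[h])               # argmax P(D|h)
print(h_map, h_ml)  # h1 h2 -- with a strong enough prior, MAP and ML can disagree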
Does patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer.
P(cancer) = .008          P(¬cancer) = .992
P(+ | cancer) = .98       P(- | cancer) = .02
P(+ | ¬cancer) = .03      P(- | ¬cancer) = .97
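Applying $h_{MAP}$ to a positive test result, using the probabilities above:
P(+ | cancer) P(cancer) = .98 × .008 = .0078
P(+ | ¬cancer) P(¬cancer) = .03 × .992 = .0298
So $h_{MAP}$ = ¬cancer; the posterior probability of cancer given the positive result is .0078 / (.0078 + .0298) ≈ .21.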
So far we've sought the most probable hypothesis given the data D (i.e., $h_{MAP}$)
Given new instance x, what is its most probable classification?
Consider three possible hypotheses with posteriors P(h1|D) = .4, P(h2|D) = .3, P(h3|D) = .3. Given a new instance x, suppose h1(x) = +, h2(x) = -, h3(x) = -. What is the most probable classification of x? ($h_{MAP}$ = h1 says +, but the hypotheses that say - carry combined posterior weight .6.)
Bayes optimal classification:
$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)$
Example:
P(h1|D) = .4,   P(-|h1) = 0,   P(+|h1) = 1
P(h2|D) = .3,   P(-|h2) = 1,   P(+|h2) = 0
P(h3|D) = .3,   P(-|h3) = 1,   P(+|h3) = 0
$\sum_{h_i \in H} P(+ \mid h_i)\, P(h_i \mid D) = .4$
$\sum_{h_i \in H} P(- \mid h_i)\, P(h_i \mid D) = .6$
and therefore
$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) = -$
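The same computation as a short Python sketch, using the posteriors and per-hypothesis predictions from the example above:

# Bayes optimal classification for the three-hypothesis example.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h|D)
predictions = {"h1": "+", "h2": "-", "h3": "-"}  # label each hypothesis assigns to x

# For each label v, sum P(v|h) P(h|D); here P(v|h) is 1 if h predicts v, else 0.
scores = {v: sum(p for h, p in posteriors.items() if predictions[h] == v)
          for v in ("+", "-")}
print(max(scores, key=scores.get), scores)   # '-' {'+': 0.4, '-': 0.6}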
Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.
When to use:
* Moderate or large training set available
* Attributes that describe instances are conditionally independent given the classification
Successful applications:
* Diagnosis
* Classifying text documents
Assume target function $f: X \to V$, where each instance x is described by attributes $\langle a_1, a_2, \ldots, a_n \rangle$.
Most probable value of f(x) is:
$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)$
$v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)$
Naive Bayes assumption:
$P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$
which gives the Naive Bayes classifier:
$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$
Naive_Bayes_Learn(examples)
  For each target value vj: estimate P̂(vj); for each value ai of each attribute a, estimate P̂(ai | vj)
Classify_New_Instance(x)
  Return vNB = argmax_{vj ∈ V} P̂(vj) ∏_{ai ∈ x} P̂(ai | vj)
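A minimal Python sketch of these two procedures, using simple relative-frequency estimates and no smoothing; the function and variable names are mine, not from the text:

from collections import Counter, defaultdict

def naive_bayes_learn(examples):
    """examples: list of (attribute_tuple, target_value) pairs.
    Returns estimates of P(v) and P(a_i = value | v)."""
    class_counts = Counter(v for _, v in examples)
    p_v = {v: n / len(examples) for v, n in class_counts.items()}
    value_counts = defaultdict(Counter)        # value_counts[(i, v)][a] = count
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            value_counts[(i, v)][a] += 1
    p_a_given_v = {(i, a, v): c / class_counts[v]
                   for (i, v), counter in value_counts.items()
                   for a, c in counter.items()}
    return p_v, p_a_given_v

def classify_new_instance(x, p_v, p_a_given_v):
    """Return v_NB = argmax_v P(v) * prod_i P(a_i | v)."""
    def score(v):
        s = p_v[v]
        for i, a in enumerate(x):
            s *= p_a_given_v.get((i, a, v), 0.0)   # unseen value -> estimate of 0
        return s
    return max(p_v, key=score)

Run on the PlayTennis table below, this classifies the new instance ⟨Sunny, Cool, High, Strong⟩ as No, consistent with the worked computation after the table.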
Day | Outlook | Temperature | Humidity | Wind | PlayTennis
D1 | Sunny | Hot | High | Weak | No
D2 | Sunny | Hot | High | Strong | No |
D3 | Overcast | Hot | High | Weak | Yes |
D4 | Rain | Mild | High | Weak | Yes |
D5 | Rain | Cool | Normal | Weak | Yes |
D6 | Rain | Cool | Normal | Strong | No |
D7 | Overcast | Cool | Normal | Strong | Yes |
D8 | Sunny | Mild | High | Weak | No |
D9 | Sunny | Cool | Normal | Weak | Yes |
D10 | Rain | Mild | Normal | Weak | Yes |
D11 | Sunny | Mild | Normal | Strong | Yes |
D12 | Overcast | Mild | High | Strong | Yes |
D13 | Overcast | Hot | Normal | Weak | Yes |
D14 | Rain | Mild | High | Strong | No |
Consider PlayTennis again, and new instance ⟨Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong⟩
Want to compute:
$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$
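Reading the required estimates off the table above and multiplying:
P(Yes) = 9/14, P(No) = 5/14
P(Sunny|Yes) = 2/9, P(Sunny|No) = 3/5
P(Cool|Yes) = 3/9, P(Cool|No) = 1/5
P(High|Yes) = 3/9, P(High|No) = 4/5
P(Strong|Yes) = 3/9, P(Strong|No) = 3/5
so
P(Yes) P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes) ≈ .005
P(No) P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No) ≈ .021
→ vNB = No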
If none of the training examples with target value vj have attribute value ai, the estimate P̂(ai|vj) = 0 drives the whole product $P(v_j) \prod_i \hat{P}(a_i \mid v_j)$ to zero. Typical solution is a Bayesian estimate for $\hat{P}(a_i \mid v_j)$:
$\hat{P}(a_i \mid v_j) \leftarrow \frac{n_c + m p}{n + m}$
where n is the number of training examples for which v = vj, nc is the number of those for which a = ai, p is a prior estimate for P̂(ai|vj), and m is the weight given to the prior (a number of "virtual" examples).
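A one-function sketch of this estimate (the names and the example numbers at the end are mine):

def m_estimate(n_c, n, p, m):
    """Bayesian (m-)estimate of P(a_i | v_j):
    n   : number of training examples with target value v_j
    n_c : number of those that also have attribute value a_i
    p   : prior estimate of P(a_i | v_j)
    m   : weight given to the prior (number of "virtual" examples)."""
    return (n_c + m * p) / (n + m)

# e.g. Wind = Strong among the 5 PlayTennis = No days: n_c = 3, n = 5;
# with uniform prior p = 0.5 and m = 1 this gives (3 + 0.5) / 6 ≈ .58
# instead of the raw relative frequency 3/5 = .6.
print(m_estimate(3, 5, 0.5, 1))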
Why learn to classify text? For example:
* Learn which news articles are of interest
* Learn to classify web pages by topic
Naive Bayes is among the most effective algorithms for this task.
What attributes shall we use to represent text documents?
Target concept Interesting?: Document → {+, -}
Represent each document by a vector of words: one attribute per word position in the document.
Naive Bayes conditional independence assumption:
$P(doc \mid v_j) = \prod_{i=1}^{\text{length}(doc)} P(a_i = w_k \mid v_j)$
where $P(a_i = w_k \mid v_j)$ is the probability that the word in position i is $w_k$, given $v_j$
one more assumption: word position does not matter, i.e. $P(a_i = w_k \mid v_j) = P(a_m = w_k \mid v_j)$ for all i, m
LEARN_NAIVE_BAYES_TEXT(Examples, V)
CLASSIFY_NAIVE_BAYES_TEXT(Doc)
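A condensed Python sketch of both procedures under the assumptions above (bag-of-words positions, add-one smoothing over the vocabulary); the function names mirror the pseudocode, but the implementation details are mine:

from collections import Counter
from math import log

def learn_naive_bayes_text(examples, vocabulary):
    """examples: list of (list_of_words, target_value) pairs."""
    docs_per_class = Counter(v for _, v in examples)
    p_v = {v: n / len(examples) for v, n in docs_per_class.items()}
    p_w_given_v = {}
    for v in p_v:
        # Text_v: all word positions in documents of class v, restricted to the vocabulary.
        counts = Counter(w for words, t in examples if t == v
                           for w in words if w in vocabulary)
        n = sum(counts.values())
        for w in vocabulary:
            p_w_given_v[(w, v)] = (counts[w] + 1) / (n + len(vocabulary))  # add-one smoothing
    return p_v, p_w_given_v

def classify_naive_bayes_text(doc, p_v, p_w_given_v, vocabulary):
    """Return argmax_v P(v) * prod over word positions of P(w_i | v),
    using log probabilities to avoid underflow; unknown words are skipped."""
    def log_score(v):
        return log(p_v[v]) + sum(log(p_w_given_v[(w, v)]) for w in doc if w in vocabulary)
    return max(p_v, key=log_score)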
Given 1000 training documents from each group
Learn to classify new documents according to which newsgroup each came from
comp.graphics | misc.forsale |
comp.os.ms-windows.misc | rec.autos |
comp.sys.ibm.pc.hardware | rec.motorcycles |
comp.sys.mac.hardware | rec.sport.baseball |
comp.windows.x | rec.sport.hockey |
alt.atheism | sci.space |
soc.religion.christian | sci.crypt |
talk.religion.misc | sci.electronics |
talk.politics.mideast | sci.med |
talk.politics.misc | |
talk.politics.guns |
Naive Bayes: 89% classification accuracy
Example article from rec.sport.hockey:
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!ogicse!uwm.edu
From: xxx@yyy.zzz.edu (John Doe)
Subject: Re: This year's biggest and worst (opinion)...
Date: 5 Apr 93 09:53:39 GMT
I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But, at best, he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided
Bayesian belief networks (also called Bayes nets) are interesting because the Naive Bayes assumption of conditional independence is too restrictive, yet some such assumption is needed for tractability; belief networks instead describe conditional independence among subsets of variables, combining prior knowledge about (in)dependencies among variables with observed training data.
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if
$(\forall x_i, y_j, z_k)\; P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$
more compactly, we write
P(X | Y,Z) = P(X | Z)
Example: Thunder is conditionally independent of Rain, given Lightning:
P(Thunder | Rain, Lightning) = P(Thunder | Lightning)
Naive Bayes uses conditional independence to justify
P(X,Y | Z) = P(X | Y,Z) P(Y | Z)
           = P(X | Z) P(Y | Z)
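A quick numerical check on a tiny made-up joint distribution that is constructed so X and Y are conditionally independent given Z (all numbers are illustrative):

from itertools import product

p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}   # p_y_given_z[z][y]

# Joint distribution built from the factorization P(X,Y,Z) = P(X|Z) P(Y|Z) P(Z),
# so it satisfies P(X,Y|Z) = P(X|Z) P(Y|Z) by construction.
joint = {(x, y, z): p_x_given_z[z][x] * p_y_given_z[z][y] * p_z[z]
         for x, y, z in product((0, 1), repeat=3)}

# Verify the definition above: P(X | Y, Z) = P(X | Z) for every assignment.
for (x, y, z), p_xyz in joint.items():
    p_yz = sum(joint[(xx, y, z)] for xx in (0, 1))
    assert abs(p_xyz / p_yz - p_x_given_z[z][x]) < 1e-12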
Network represents a set of conditional independence assertions: each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors. (The network is a directed acyclic graph.)
A belief network represents the dependence between variables.
* nodes: one node per random variable
* links: directed links represent direct dependence between variables
* conditional probability tables: each node stores P(node | its parents)
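A minimal sketch of these three ingredients for a three-node chain Rain → Lightning → Thunder, echoing the earlier conditional-independence example; the CPT numbers are made up:

# Nodes: one Boolean variable each.  Links: Rain -> Lightning -> Thunder.
# Conditional probability tables: one per node, conditioned on its parents.
P_rain = {True: 0.2, False: 0.8}
P_lightning_given_rain = {True: {True: 0.6, False: 0.4},
                          False: {True: 0.05, False: 0.95}}    # [rain][lightning]
P_thunder_given_lightning = {True: {True: 0.95, False: 0.05},
                             False: {True: 0.01, False: 0.99}}  # [lightning][thunder]

def joint(rain, lightning, thunder):
    """P(Rain, Lightning, Thunder) = product over nodes of P(node | parents(node))."""
    return (P_rain[rain]
            * P_lightning_given_rain[rain][lightning]
            * P_thunder_given_lightning[lightning][thunder])

print(joint(True, True, True))   # 0.2 * 0.6 * 0.95 = 0.114

# Because Thunder's only parent is Lightning, this network encodes
# P(Thunder | Rain, Lightning) = P(Thunder | Lightning), as in the example above.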