Why prefer short hypotheses?
Argument in favor:
- There are fewer short hypotheses than long hypotheses.
- A short hypothesis that fits the data is unlikely to be a coincidence.
- A long hypothesis that fits the data might be a coincidence.
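
The counting argument above can be sketched numerically. This is only an illustration under assumptions that are not in the original notes: hypotheses are encoded as bit strings, training labels behave like independent fair coin flips, and the helper names `num_hypotheses` and `coincidence_bound` are hypothetical. The bound used is the simple union bound, (number of hypotheses) x (1/2)^(number of examples).

```python
def num_hypotheses(length_bits: int) -> int:
    """Count the distinct hypotheses describable in exactly `length_bits` bits.

    Assumption: each hypothesis is a bit string, so there are 2**length_bits.
    """
    return 2 ** length_bits


def coincidence_bound(num_hyps: int, num_examples: int) -> float:
    """Union bound on the chance that at least one of `num_hyps` fixed
    hypotheses matches `num_examples` fair-coin labels purely by luck."""
    return min(1.0, num_hyps * 0.5 ** num_examples)


short_count = num_hypotheses(5)   # few short hypotheses
long_count = num_hypotheses(30)   # vastly more long hypotheses
m = 20                            # number of training examples

short_bound = coincidence_bound(short_count, m)
long_bound = coincidence_bound(long_count, m)

print("short hypotheses:", short_count, "bound:", short_bound)
print("long hypotheses: ", long_count, "bound:", long_bound)
```

With these toy numbers the bound for the short class is tiny, while for the long class it is vacuous (capped at 1), which is the sense in which a long hypothesis that fits the data might be a coincidence.
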
Argument opposed:
- There are many ways to define small sets of hypotheses.
- e.g., all trees with a prime number of nodes that use attributes beginning with "Z"
- What is so special about small sets based on the size of the hypothesis?
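
The objection can be made concrete with a small sketch: a counting-style bound depends only on how many hypotheses are in the set, not on what property defines the set. The setup below is a toy assumption, not part of the original notes: hypotheses are identified by a size from 1 to 99, the "short" set is sizes 1 to 25, and the "odd" set is the prime sizes below 100. The names `is_prime` and `coincidence_bound` are hypothetical helpers.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def coincidence_bound(num_hyps: int, num_examples: int) -> float:
    """Union bound on a chance fit, assuming fair-coin labels (toy assumption)."""
    return min(1.0, num_hyps * 0.5 ** num_examples)


sizes = range(1, 100)

# Set A: the "short" hypotheses (sizes 1..25).
short_set = [k for k in sizes if k <= 25]

# Set B: an arbitrary small set (prime sizes below 100), many of them large.
prime_set = [k for k in sizes if is_prime(k)]

m = 10  # number of training examples
short_bound = coincidence_bound(len(short_set), m)
prime_bound = coincidence_bound(len(prime_set), m)

print(len(short_set), short_bound)
print(len(prime_set), prime_bound)
```

Both sets contain the same number of hypotheses, so the bound is identical for each. On this argument alone, nothing singles out "short" as the preferred way to pick a small set.
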