Clustering

Group similar items together
Example: sorting laundry
Similar items may have important attributes in common, functionality in common
Group customers together with similar interests and spending patterns
Form of unsupervised learning
Cluster objects into classes using the following rule: maximize intraclass similarity and minimize interclass similarity
Probability-based vs. distance-based clustering

Commercial Examples of Clustering

Claritas PRIZM system
Equifax MicroVision system
Group population by demographic information
Used for marketing and sales

Once clusters are formed, analyze for distinguishing features

Name                 Income      Age                     Education     Vendor
Blue Blood Estates   High        35-54                   College       PRIZM
Shotguns and Pickup  Middle      35-64                   High school   PRIZM
Southside City       Low         Mix                     Grade school  MicroVision
Living off Land      Middle-Low  Families with children  Low           MicroVision
University USA       Very Low    Young-Mix               Medium-High   MicroVision
Sunset Years         Medium      Seniors                 Medium        MicroVision

Another use: deviation detection (finding items that do not fit in any cluster)

Example

ID  Name   Age  Balance ($)  Income  Eyes   Gender
1   Amy    62   0            Medium  Brown  F
2   Al     53   1,800        Medium  Green  M
3   Betty  47   16,543       High    Brown  F
4   Bob    32   45           Medium  Green  M
5   Carla  21   2,300        High    Blue   F
6   Carl   27   5,400        High    Brown  M
7   Donna  50   165          Low     Blue   F
8   Don    46   0            High    Blue   F
9   Edna   27   500          Low     Blue   F
10  Ed     68   1,200        Low     Blue   M

How would you cluster this data?

Financial: (3,5,6,8), (1,2,4), (7,9,10)
Romantic: (4,5,6,9), (7,8,10), (1,2,3)

Clustering Techniques

1. Partitioning Based
- Enumerate partitions and score by some criteria
- K-means

2. Hierarchy Based
- Create hierarchical decomposition of data

3. Model Based
- Model is hypothesized for each cluster
- Find models that best fit data and each other
- Bayesian classification (AutoClass), Cobweb

Partitioning the Space

Partitioning in n-dimensional space
n is number of features
How is distance calculated?
- Manhattan distance
- Euclidean distance
- Can be customized for each type of feature
- Distance between entries 5 and 6: 6 + 3,100 + 0 + 1 + 1 = 3,108
Dimensions can be weighted separately
Distance usually calculated from center of mass of the cluster to a point
How many clusters?

[Figure: figures/c1.ps]
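The distance between entries 5 and 6 can be reproduced with a small sketch. It assumes a Manhattan-style sum in which numeric features contribute their absolute difference and nominal features contribute 0 when equal and 1 otherwise. The function names and the optional `weights` argument are illustrative, not part of the original notes.

```python
# Entries 5 and 6 from the example table: (age, balance, income, eyes, gender).
carla = (21, 2300, "High", "Blue", "F")
carl = (27, 5400, "High", "Brown", "M")

def feature_distance(a, b):
    """Numeric features: absolute difference. Nominal features: 0 if equal, else 1."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    return 0 if a == b else 1

def manhattan(x, y, weights=None):
    """Weighted sum of per-feature distances (every weight defaults to 1)."""
    weights = weights or [1] * len(x)
    return sum(w * feature_distance(a, b) for w, a, b in zip(weights, x, y))

print(manhattan(carla, carl))  # 6 + 3100 + 0 + 1 + 1 = 3108
```

With unit weights the balance dominates the result. Passing, for example, `weights=[1, 0.001, 1, 1, 1]` shows how weighting each dimension separately changes which features drive the clustering.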

K-means Clustering

1. Determine desired number of clusters
2. Randomly pick items to become "seed" of each cluster
3. Assign each entry to the nearest cluster
4. Recalculate centers of clusters
5. Repeat steps 3 and 4 until number of moves below threshold
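The five steps above can be sketched in plain Python. This is a minimal illustration, assuming numeric points, squared Euclidean distance, and a stopping rule of at most `max_moves` reassignments per pass. The function and parameter names are chosen for this example only.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_moves=0, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # step 2: random items become seeds
    assignment = [None] * len(points)
    while True:
        moves = 0
        for i, p in enumerate(points):         # step 3: assign to nearest center
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                moves += 1
        for c in range(k):                     # step 4: recalculate centers
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                        # keep the old center if a cluster empties
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
        if moves <= max_moves:                 # step 5: stop when few entries moved
            return centers, assignment

points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, labels = kmeans(points, k=2)          # step 1: k chosen up front
print(centers, labels)
```

The result depends on the random seeds, so in practice the procedure is often run several times and the best partition kept.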

Hierarchical Clustering

Create hierarchy of clusters from small to big
Can choose desired number of clusters after seeing results
Agglomerative algorithm
- Start with as many clusters as items
- Iteratively merge closest clusters to form next level
- Stop with single cluster
- More popular method
Divisive
- Start with one cluster
- Iteratively split until clusters each contain one item (or threshold of cluster size is reached)
- More expensive method

How to merge clusters?

Single-link method
- Merge clusters whose nearest records are the closest
- Can create long clusters [Figure: figures/c2.ps]
Complete-link method
- Merge clusters whose farthest records are the closest
- Creates very compact clusters
Group-average-link method: average locations are closest
Ward's method: merge the pair of clusters that gives the smallest increase in total within-cluster squared distance to the cluster centers
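A small sketch of the agglomerative algorithm with the single-link and complete-link rules. It assumes one-dimensional numeric records so that distance is just an absolute difference, and it stops at a requested number of clusters rather than building the full tree. The names and data are illustrative. Group-average and Ward's method are not covered here.

```python
def agglomerate(points, n_clusters, linkage="single"):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]           # start with one cluster per item
    link = min if linkage == "single" else max # nearest vs. farthest records

    def cluster_distance(a, b):
        return link(abs(x - y) for x in a for y in b)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]],
                                                          clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

points = [0, 1, 2.1, 3.3, 4.6, 7, 7.5]
print(agglomerate(points, 3, "single"))
print(agglomerate(points, 3, "complete"))
```

On this data single-link chains the first four points into one long cluster and leaves 4.6 alone, while complete-link produces the more compact grouping {0, 1}, {2.1, 3.3, 4.6}, {7, 7.5}.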

Cobweb

Builds tree of probability-based concept descriptions
Builds tree incrementally as observations are processed
With each observation
- Adds new data to existing node or
- Adds new node to the tree
- Take action that produces best partition
Split and merge to improve chosen node or its children
Discovered concepts maximize number of features that can be predicted
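The notes do not name the score used to pick the "best partition". Cobweb as published uses category utility, which rewards partitions whose clusters make attribute values easier to predict than they are in the data as a whole. The sketch below assumes nominal attributes and non-empty clusters, and the toy records are hypothetical.

```python
from collections import Counter

def expected_correct_guesses(records):
    """Sum over attributes and values of P(attribute = value)^2 within records."""
    n = len(records)
    total = 0.0
    for column in zip(*records):               # one column per attribute
        total += sum((count / n) ** 2 for count in Counter(column).values())
    return total

def category_utility(partition):
    """Average gain in predictable attribute values from knowing the cluster."""
    all_records = [r for cluster in partition for r in cluster]
    n = len(all_records)
    baseline = expected_correct_guesses(all_records)
    gain = sum(len(c) / n * (expected_correct_guesses(c) - baseline)
               for c in partition)
    return gain / len(partition)

# Hypothetical (eyes, gender) records grouped two different ways.
good = [[("Blue", "F"), ("Blue", "F")], [("Brown", "M"), ("Brown", "M")]]
poor = [[("Blue", "F"), ("Brown", "M")], [("Blue", "F"), ("Brown", "M")]]
print(category_utility(good), category_utility(poor))
```

The first grouping scores higher because each cluster fully predicts both attributes. The second is no better than not clustering at all.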

Bayesian Clustering: AutoClass

Fit one of a set of possible probabilistic models to the data
Select theory that best fits the data
- Produce class descriptions that maximize likelihood of data
- Classifications provide best representations of observed data
- Representations calculate probability that each observation is in a given class
- Same mechanism can calculate probability that a new observation is in each class
Databases
- Infrared Astronomical Satellite (IRAS) (77 classes, some relevant unknown patterns)
- DNA Intron (3 classes of patterns in protein donor/acceptor sites)
- LandSat (93 classes corresponding to image features such as road pixels)
- Database of all USA airports

AutoClass

Instead of splitting data into clusters, search for clusters that predict characteristics of all observations
Classes (clusters) provide probabilities for all attribute values
One data point has a probability of membership in all classes
Searching all worlds takes too long
Instead, search a model space
Model defined by V (for continuous parameters) and T (for discrete parameters)
Assume cases are independent
Class definitions can overlap
For any set of class assignments, calculate maximally likely values for parameters in V
A number of models can be used for each type of attribute
Example: Gaussian Normal
- Model location attribute
- Location is real-valued number
- Gaussian Normal distribution of probabilities for the attribute value
- Can calculate likelihood of the value (in general) by integrating function over a limited range centered at point value
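The last point can be illustrated with the normal cumulative distribution function: the probability mass in a small interval centered at the observed value. This is a sketch only. The interval `width` and the function names are assumptions made for the example, not AutoClass parameters.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def interval_likelihood(value, mean, std, width):
    """Probability that the attribute falls within width/2 of the observed value."""
    half = width / 2.0
    return normal_cdf(value + half, mean, std) - normal_cdf(value - half, mean, std)

# A value at the class mean is far more likely than one three sigmas away.
print(interval_likelihood(0.0, 0.0, 1.0, 0.1))
print(interval_likelihood(3.0, 0.0, 1.0, 0.1))
```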

Pseudocode

For i in NumberOfClusters
   Randomly initialize i clusters
   Do
      Compute class likelihood vectors
      Compute normalized probabilities for each data point
      Update class model parameters
         Analyze new parameters that will maximize probabilities
         (For normal function, recalculate mean, variance, skewness, kurtosis)
   Until convergence (sum of classes' log marginal probability > threshold or
                      no change)
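The inner loop of the pseudocode resembles expectation-maximization for a mixture model. The sketch below is not AutoClass itself. It assumes a single real-valued attribute, one normal distribution per class, a fixed number of classes `k`, and convergence measured by the change in log likelihood. It omits the outer loop over different numbers of clusters and the higher moments, and all names are illustrative.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture(data, k, tol=1e-6, max_cycles=500, seed=0):
    rng = random.Random(seed)
    means = rng.sample(data, k)                # randomly initialize k classes
    variances = [1.0] * k
    weights = [1.0 / k] * k
    previous = -math.inf
    for _ in range(max_cycles):
        memberships = []
        log_likelihood = 0.0
        for x in data:
            # class likelihood vector for this data point
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            # normalized probabilities of membership in each class
            memberships.append([j / total for j in joint])
        for c in range(k):                     # update class model parameters
            mass = sum(m[c] for m in memberships)
            weights[c] = mass / len(data)
            means[c] = sum(m[c] * x for m, x in zip(memberships, data)) / mass
            spread = sum(m[c] * (x - means[c]) ** 2
                         for m, x in zip(memberships, data)) / mass
            variances[c] = max(spread, 1e-6)   # floor avoids a zero variance
        if log_likelihood - previous < tol:    # until convergence
            break
        previous = log_likelihood
    return weights, means, variances, memberships

data = [1.0, 1.2, 0.8, 1.1, 8.0, 8.3, 7.9, 8.1]
weights, means, variances, memberships = fit_mixture(data, k=2)
print(sorted(means))
```

Each row of `memberships` sums to 1, which matches the idea that one data point has a probability of membership in every class rather than a single hard assignment.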

Example

[Figure: figures/ac1.ps]

Sample Run: Auto Imports Database

imports-85c.hd2

num_db2_format_defs 2
number_of_attributes 26
separator_char  ','     ; Can also supply comment char and unknown token
0 discrete nominal "symboling" range 7
1 real scalar "normalized-loses" zero_point 0.0 rel_error 0.01
2 discrete nominal "make" range 22
3 discrete nominal "fuel-type" range 2
4 discrete nominal "aspiration" range 2
5 discrete nominal "num-of-doors" range 2
6 discrete nominal "body-style" range 5
7 discrete nominal "drive-wheels" range 3
8 discrete nominal "engine-location" range 2
9 real scalar "wheel-base" zero_point 0.0 rel_error 0.001
10 real scalar "length" zero_point 0.0 rel_error 0.001
11 real scalar "width" zero_point 0.0 rel_error 0.001
12 real scalar "height" zero_point 0.0 rel_error 0.001
13 real scalar "curb-weight" zero_point 0.0 rel_error 0.0002
14 discrete nominal "engine-type" range 7
15 discrete nominal "num-of-cylinders" range 7
16 real scalar "engine-size" zero_point 0.0 rel_error 0.01
17 discrete nominal "fuel-system" range 8
18 real scalar "bore" zero_point 0.0 rel_error 0.003
19 real scalar "stroke" zero_point 0.0 rel_error 0.003
20 real scalar "compression-ratio" zero_point 0.0 rel_error 0.003
21 real scalar "horse-power" zero_point 0.0 rel_error 0.01
22 real scalar "peak-rpm" zero_point 0.0 rel_error 0.02
23 real scalar "city-mpg" zero_point 0.0 rel_error 0.04
24 real scalar "highway-mpg" zero_point 0.0 rel_error 0.04
25 real scalar "price" zero_point 0.0 rel_error 0.001

Sample Run: Auto Imports Database

imports-85c.db2

3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,
   dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495
3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,
   dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,16500
1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.50,171.20,65.50,52.40,2823,
   ohcv,six,152,mpfi,2.68,3.47,9.00,154,5000,19,26,16500
2,164,audi,gas,std,four,sedan,fwd,front,99.80,176.60,66.20,54.30,2337,ohc,four,
   109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950
2,164,audi,gas,std,four,sedan,4wd,front,99.40,176.60,66.40,54.30,2824,ohc,five,
   136,mpfi,3.19,3.40,8.00,115,5500,18,22,17450
2,?,audi,gas,std,two,sedan,fwd,front,99.80,177.30,66.30,53.10,2507,ohc,five,136,
   mpfi,3.19,3.40,8.50,110,5500,19,25,15250
1,158,audi,gas,std,four,sedan,fwd,front,105.80,192.70,71.40,55.70,2844,ohc,five,
   136,mpfi,3.19,3.40,8.50,110,5500,19,25,17710
1,?,audi,gas,std,four,wagon,fwd,front,105.80,192.70,71.40,55.70,2954,ohc,five,
   136,mpfi,3.19,3.40,8.50,110,5500,19,25,18920
1,158,audi,gas,turbo,four,sedan,fwd,front,105.80,192.70,71.40,55.90,3086,ohc,
   five,131,mpfi,3.13,3.40,8.30,140,5500,17,20,23875
0,?,audi,gas,turbo,two,hatchback,4wd,front,99.50,178.20,67.90,52.00,3053,ohc,
   five,131,mpfi,3.13,3.40,7.00,160,5500,16,22,?
2,192,bmw,gas,std,two,sedan,rwd,front,101.20,176.80,64.80,54.30,2395,ohc,four,
   108,mpfi,3.50,2.80,8.80,101,5800,23,29,16430
...

Sample Run: Auto Imports Database

imports-85c.model

model_index 0 4
ignore 0
single_normal_cm 1 18 19 21 22 25
single_normal_cn 9 10 11 12 13 16 20 23 24
single_multinomial default

imports-85c.s-params (abbreviated)

# start_j_list = 2, 3, 5, 7, 10, 15, 25
# min_report_period = 30
# max_duration = 0
# max_n_tries = 0
# n_save = 2
...

Run

imports-85c.log

AUTOCLASS C (version 2.5) STARTING at Mon Jun 26 16:30:39 1995

AUTOCLASS -SEARCH default parameters:
...

WELCOME TO AUTOCLASS.
  1) Each time I have finished a new 'trial', or attempt to find a good
     classification, I will print the number of classes that trial
     started and ended with, such as 9->7.
  2) If that trial results in a duplicate of a previous run, I will print
     print 'dup' first.
  3) If that trial results in a classification better than any previous,
     I will print 'best' first.
  4) If more than 30 seconds have passed since the last report, and a new
     classification has been found which is better than any previous ones,
     I will report on that classification and on the status of the search
     so far.
  5) This report will include an estimate of the time it will take to find
     another even better classification, and how much better that will be.
     In addition, I will estiamte a lower bound on how long it might take to
     find the very best classification, and how much better that might be.
  6) If you are warned about too much time in overhead, you may want to
     change the parameters n_save, min_save_period, min_report_period, or
     min_checkpoint_period.
  7) To quit searching, type a 'q', hit <return>, and wait.  Otherwise I'll
     go on until I complete trial number (12).
  8) If needed, every 30 minutes I will save the best 2 classifications
     so far to file:
     /home/tove/p/autoclass-c/sample/imports-85c.results-bin
     and a description of the search to file:
     /home/tove/p/autoclass-c/sample/imports-85c.search
  9) A record of this search will be printed to file:
     /home/tove/p/autoclass-c/sample/imports-85c.log

BEGINNING SEARCH at Mon Jun 26 16:30:40 1995

[j_in=2]  [cs-3: cycles 15] best2->2(1) [j_in=3]  [cs-3: cycles 49] best3->3(2) [j_in=5]  [cs-3: cycles 12] best5->5(3) [j_in=7]  [cs-3: cycles 11] best7->7(4) [j_in=10]  [cs-3: cycles 14] best10->10(5) [j_in=15]  [cs-3: cycles 28] 15->15(6) [j_in=25]  [cs-3: cycles 10] 25->22(7)

----------------  NEW BEST CLASSIFICATION FOUND on try 5  -------------
It has 10 CLASSES with WEIGHTS 32 30 28 24 21 21 20 11 10 8
PROBABILITY of both the data and the classification = exp(-16368.367)
(Also found 4 other better than last report.)

-----------  SEARCH STATUS as of Mon Jun 26 16:31:12 1995  -----------
It just took 32 seconds since beginning.
Estimate < 28 seconds to find a classification
  exp(61.7) [= 6.0e+26] times more probable.
Estimate >> 1 minute 6 seconds to find the very best classification,
 which may be exp(28.6) to exp(11764.5) times more probable.
Have seen 7 of the estimated > 21 possible classifications (based on 0
 duplicates do far).
Log-Normal fit to classifications probabilities has M(ean) -16598.5,
 S(igma) 154.9
Choosing initial n-classes randomly from a log_normal [M-S, M, M+S] =
 [2.9, 7.0, 16.9]
Overhead time is 3.0 % of total search time

[j_in=9]  [cs-3: cycles 10] 9->9(8) [j_in=3]  [cs-3: cycles 11] 3->3(9) [j_in=5]  [cs-3: cycles 48] 5->5(10) [j_in=3]  [cs-3: cycles 18] 3->3(11) [j_in=5]  [cs-3: cycles 35] 5->5(12)


ENDING SEARCH because max number of tries reached at Mon Jun 26 16:31:32 1995
  after a total of 12 tries over 53 seconds
A log of this search is in file:
 /home/tove/p/autoclass-c/sample/imports-85c.log
The search results are stored in file:
 /home/tove/p/autoclass-c/sample/imports-85c.results-bin
This search can be restarted by having "force_new_search_p = false" in file:
 /home/tove/p/autoclass-c/sample/imports-85c.s-params
 and reinvoking the "autoclass -search ..." form

------------------  SUMMARY OF 10 BEST RESULTS  ------------------
PROBABILITY: exp(-16368.367) N_CLASSES: 10 FOUND ON TRY:   5 *SAVED*
PROBABILITY: exp(-16477.345) N_CLASSES:  9 FOUND ON TRY:   8 *SAVED*
PROBABILITY: exp(-16537.556) N_CLASSES: 15 FOUND ON TRY:   6
PROBABILITY: exp(-16542.413) N_CLASSES:  7 FOUND ON TRY:   4
PROBABILITY: exp(-16590.504) N_CLASSES:  5 FOUND ON TRY:  10
PROBABILITY: exp(-16617.452) N_CLASSES:  5 FOUND ON TRY:   3
PROBABILITY: exp(-16632.595) N_CLASSES:  5 FOUND ON TRY:  12
PROBABILITY: exp(-16673.545) N_CLASSES: 22 FOUND ON TRY:   7
PROBABILITY: exp(-16759.053) N_CLASSES:  3 FOUND ON TRY:   2
PROBABILITY: exp(-16898.385) N_CLASSES:  3 FOUND ON TRY:   9
...
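The probabilities in the summary are reported as exponents because values such as exp(-16368.367) are too small for ordinary floating point. Comparing two classifications therefore means subtracting their log probabilities. A small sketch using the first three rows of the summary above (the variable names are illustrative):

```python
import math

# (log probability, number of classes) copied from the search summary.
results = [(-16368.367, 10), (-16477.345, 9), (-16537.556, 15)]

best_log_p = results[0][0]
for log_p, n_classes in results[1:]:
    log_ratio = best_log_p - log_p             # log of the probability ratio
    print(f"best vs. {n_classes:2d} classes: exp({log_ratio:.3f}) "
          f"= {math.exp(log_ratio):.3g} times more probable")

print(math.exp(best_log_p))                    # underflows to 0.0 as a float
```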

Results

imports-85c.class-text-1

      CROSS REFERENCE: CLASS => CASE NUMBER MEMBERSHIP


      AutoClass CLASSIFICATION for the 205 cases in:
        /home/centauri/cook/projects/ac/sample/imports-85c.db2
        /home/centauri/cook/projects/ac/sample/imports-85c.hd2
      with log-A<X/H> (approximate marginal likelihood) = -16564.197
      from classification results file:
        /home/centauri/cook/projects/ac/sample/imports-85c.results-bin
      and using models:
        /home/centauri/cook/projects/ac/sample/imports-85c.model - index = 0



                                 CLASS = 0



Case #   make            num-of-doors   body-style    (Cls  Prob)
--------------------------------------------------------------------------------

     5   audi            four           sedan               0.99
     7   audi            four           sedan               1.00
     8   audi            four           wagon               1.00
     9   audi            four           sedan               1.00
    10   audi            two            hatchback           1.00
    15   bmw             four           sedan               1.00
    16   bmw             four           sedan               1.00
    17   bmw             two            sedan               1.00
    18   bmw             four           sedan               1.00
    48   jaguar          four           sedan               1.00
    49   jaguar          four           sedan               1.00
    50   jaguar          two            sedan               1.00
    68   mercedes-benz   four           sedan               1.00
    69   mercedes-benz   four           wagon               1.00
    70   mercedes-benz   two            hardtop             1.00
    71   mercedes-benz   four           sedan               1.00
...


                                 CLASS = 1



Case #   make            num-of-doors   body-style    (Cls  Prob)
--------------------------------------------------------------------------------

     1   alfa-romero     two            convertible         1.00
     2   alfa-romero     two            convertible         1.00
     3   alfa-romero     two            hatchback           1.00
    11   bmw             two            sedan               1.00
    12   bmw             four           sedan               1.00
    13   bmw             two            sedan               1.00
    14   bmw             four           sedan               0.99
                                                        0   0.01
    30   dodge           two            hatchback           1.00
    47   isuzu           two            hatchback           1.00
    56   mazda           two            hatchback           1.00
    57   mazda           two            hatchback           1.00
    58   mazda           two            hatchback           1.00
    59   mazda           two            hatchback           1.00
    66   mazda           four           sedan               0.99
    76   mercury         two            hatchback           1.00
    83   mitsubishi      two            hatchback           1.00
    84   mitsubishi      two            hatchback           1.00
    85   mitsubishi      two            hatchback           1.00
   105   nissan          two            hatchback
...

                                 CLASS = 2



Case #   make            num-of-doors   body-style    (Cls  Prob)
--------------------------------------------------------------------------------

    19   chevrolet       two            hatchback           1.00
    20   chevrolet       two            hatchback           1.00
    21   chevrolet       four           sedan               1.00
    22   dodge           two            hatchback           1.00
    23   dodge           two            hatchback           1.00
    31   honda           two            hatchback           1.00
    32   honda           two            hatchback           1.00
    33   honda           two            hatchback           1.00
    34   honda           two            hatchback           1.00
    35   honda           two            hatchback           1.00
    36   honda           four           sedan               1.00
    37   honda           four           wagon               1.00
    45   isuzu           two            sedan               1.00
    46   isuzu           four           sedan               1.00
    51   mazda           two            hatchback           1.00
...

                                 CLASS = 9 (continued)



Case #   make            num-of-doors   body-style    (Cls  Prob)
--------------------------------------------------------------------------------

    81   mitsubishi      two            hatchback           1.00
    88   mitsubishi      four           sedan               1.00
    89   mitsubishi      four           sedan               1.00
   120   plymouth        two            hatchback           1.00
   190   volkswagen      two            convertible         0.99

Results

imports-85c.case-text-1

      CROSS REFERENCE: CASE NUMBER => MOST PROBABLE CLASS


      AutoClass CLASSIFICATION for the 205 cases in:
        /home/centauri/cook/projects/ac/sample/imports-85c.db2
        /home/centauri/cook/projects/ac/sample/imports-85c.hd2
      with log-A<X/H> (approximate marginal likelihood) = -16564.197
      from classification results file:
        /home/centauri/cook/projects/ac/sample/imports-85c.results-bin
      and using models:
        /home/centauri/cook/projects/ac/sample/imports-85c.model - index = 0



     Case #  Class  Prob         Case #  Class  Prob         Case #  Class  Prob
--------------------------------------------------------------------------------
          1    1    1.00             47    1    0.99             93    2    1.00
          2    1    1.00             48    0    1.00             94    2    0.99
          3    1    1.00             49    0    1.00             95    2    1.00
          4    3    0.99             50    0    1.00             96    2    0.99
          5    0    0.99             51    2    0.99             97    2    0.99
          6    4    0.99             52    2    0.99             98    2    0.99
          7    0    1.00             53    2    0.99             99    2    0.99
          8    0    1.00             54    2    0.99            100    3    0.99
          9    0    1.00             55    2    0.99            101    3    0.99
         10    0    0.99             56    1    1.00            102    0    0.99
...

Results

imports-85c.influ-o-text-1

...
CLASSIFICATION HAS 10 POPULATED CLASSES:  (max global influence value = 7.063)

  We give below a heuristic measure of class strength: the approximate
  geometric mean probability for instances belonging to each class,
  computed from the class parameters and statistics.  This approximates
  the contribution made, by any one instance "belonging" to the class,
  to the log probability of the data set w.r.t. the classification.  It
  thus provides a heuristic measure of how strongly each class predicts
  "its" instances.

   Class     Log of class       Relative         Class     Normalized
    num        strength       class strength     weight    class weight

     0        -8.25e+01          1.64e-10          51         0.249
     1        -8.01e+01          1.69e-09          39         0.190
     2        -6.99e+01          4.89e-05          29         0.141
     3        -6.86e+01          1.75e-04          18         0.088
     4        -7.25e+01          3.58e-06          16         0.078
     5        -6.86e+01          1.68e-04          14         0.068
     6        -7.11e+01          1.43e-05          12         0.059
     7        -5.99e+01          1.00e+00           9         0.044
     8        -6.95e+01          7.20e-05           9         0.044
     9        -6.95e+01          6.73e-05           8         0.039
...

ORDERED LIST OF NORMALIZED ATTRIBUTE INFLUENCE VALUES SUMMED OVER ALL CLASSES:

  This gives a rough heuristic measure of relative influence of each
  attribute in differentiating the classes from the overall data set.
  Note that "influence values" are only computable with respect to the
  model terms.  When multiple attributes are modeled by a single
  dependent term (e.g. multi_normal_cn), the term influence value is
  distributed equally over the modeled attributes.

   num                        description                          I-*k

    38: Log compression-ratio                                      1.000
    36: Log curb-weight                                            0.607
    29: Log horse-power                                            0.604
     2: make                                                       0.589
    37: Log engine-size                                            0.582
    32: Log wheel-base                                             0.550
    28: Log stroke                                                 0.515
    33: Log length                                                 0.496
    31: Log price                                                  0.487
    34: Log width                                                  0.437
    17: fuel-system                                                0.414
    27: Log bore                                                   0.408
    26: Log normalized-loses                                       0.305
    35: Log height                                                 0.292
    39: Log city-mpg                                               0.222
     7: drive-wheels                                               0.209
    40: Log highway-mpg                                            0.191
    14: engine-type                                                0.160
     6: body-style                                                 0.130
     3: fuel-type                                                  0.121
     5: num-of-doors                                               0.106
    30: Log peak-rpm                                               0.106
    15: num-of-cylinders                                           0.089
     4: aspiration                                                 0.075
     8: engine-location                                            0.009
     0: symboling                                                  -----
     1: normalized-loses                                           -----
...

Applications: IRAS Data

5,425 mean spectra of IRAS point sources
Each spectrum consists of 100 "blue" and 100 "red" channels
Spectra cover a wide range of intensities
Treat each channel as independent normally distributed single real value
Many difficulties in interacting with scientists
- Scientists released pre-processed data
- Pre-processing removed some interesting data
- Method of pre-processing not initially revealed
- Changed reference point from Vega to Tau partway through collection, did not mention this change

IRAS Results

Generated 77 classes
Significantly different classification than with human analysis
AutoClass found many subtle distinctions between spectra that superficially look similar (not previously known)
Example, two subgroups of stars distinguished (not previously known to be different)
Analyzed classes containing known carbon stars, thereby tripling number of known (or suspected) carbon stars
Revealed blackbody stars with significant IR excess (dust surrounding star)

Application: DNA Intron Data

Database of 3,000 donor and acceptor sites from human DNA
Coding DNA (exons) is interspersed with non-coding parts (introns) that are spliced out of the messenger RNA
Beginning of splice point is donor site, end is acceptor site
Intron length (between donor and acceptor) can vary from 80 to thousands of bases
Donor database consists of ordered lists of bases: the 10 bases before the splice site followed by 40 bases of the intron
Bases are A (adenine), C (cytosine), G (guanine), and T (thymine)

Results

First generated many classes with one unique base sequence per class
There are many duplicated splice sites in human DNA
Analysis showed many duplicates appear in a sequence in the same gene
If duplicates occur in different genes, they are usually the result of gene duplication

Results after removing gene duplication

Found three classes
First class, every position was dominated by C (C rich)
Other two classes were TA rich and G rich
Class of donor site correlated with class of acceptor site
Donor, acceptor, and entire intron are C-rich
Similar pattern observed for all classes
If one intron is rich in a particular base, high probability that neighboring introns will be rich in same bases

Application: LandSat Data

Analyze 1024x1024 array of satellite image pixels
Each pixel records seven spectral intensity values
About 1,000,000 cases (1024 x 1024 = 1,048,576 pixels)
Big enough to need parallel algorithm
Parallel AutoClass, C AutoClass developed with UTA

Results

Discovered 93 classes
Classes used to discover meta-classes
Classes were roads, rivers, valley bottoms, valley edges, fields of crops

Assessment

Clustering good first tool when classes are unknown
Results can be used as is or used to determine classes
Structure information may yield better results
