Supervised Learning of Sign Language Characters
Project Description

For this project, you will need to download and install the
Weka machine learning software from www.cs.waikato.ac.nz/ml/weka/.  Weka can
run many different learning algorithms on the same input data.
The learning algorithms that you will test are Naive Bayes, a multilayer
perceptron, the IB1 instance-based learner, and the J4.8 decision tree
algorithm.  You will find details on specifying the input data and attributes,
selecting classifiers, and interpreting the output in the Weka tutorial
downloaded with the code.  More extensive details are found in the "Data Mining"
book written by Witten and Frank.

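Weka accepts its input as an ARFF (Attribute-Relation File Format) file.
Example input files are in the data directory of the Weka distribution.  An
ARFF file first names the learning problem (the relation), then declares the
attributes (each either nominal or numeric), and finally lists the data, one
instance per line, using the @relation, @attribute, and @data declarations.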
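As a rough illustration of that layout, the Python sketch below builds ARFF text from pixel vectors. The helper name to_arff, the relation name, and the attribute names (pixel0, pixel1, ...) are illustrative assumptions, not anything Weka requires; each pixel is declared numeric and the class is declared nominal.

```python
def to_arff(relation, num_features, instances, classes):
    """Build ARFF text for pixel-feature instances (hypothetical helper).

    instances: list of (pixel_values, class_label) pairs.
    classes:   the nominal class values, e.g. ["c", "d"].
    """
    lines = [f"@relation {relation}", ""]
    # One numeric attribute per pixel; the names are an arbitrary choice.
    for i in range(num_features):
        lines.append(f"@attribute pixel{i} numeric")
    # The class attribute is nominal, listing its allowed values in braces.
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines.append("")
    lines.append("@data")
    # Each instance is one comma-separated line: pixel values, then the label.
    for pixels, label in instances:
        if len(pixels) != num_features:
            raise ValueError("instance has the wrong number of pixels")
        lines.append(",".join(str(p) for p in pixels) + "," + label)
    return "\n".join(lines) + "\n"


# Tiny example with 4 "pixels" per instance instead of a full image.
example = to_arff("sign_c_vs_d", 4,
                  [([0, 255, 12, 40], "c"), ([255, 0, 3, 9], "d")],
                  ["c", "d"])
print(example)
```

Writing one file per learning problem (for example, "c" versus "d") keeps each ARFF file matched to a single two-class task.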

We will use the learning algorithms to recognize sign language letters.
This type of learning problem has potential uses not only in automatically
recognizing and understanding sign language but also in gesture recognition
and other image-based recognition tasks.
I have downloaded six 25x25 grayscale (black-and-white) images for each of the
letters "c", "d", and "e".  These are stored in ASCII PGM format.  Each of the
625 pixels (features) is represented by a value in the range 0-255.
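One way to turn these images into feature vectors is to parse each PGM file directly. The sketch below assumes the plain "P2" variant of PGM (magic number, width, height, maximum gray value, then whitespace-separated pixel values, with # comments allowed); the function name read_ascii_pgm is hypothetical. A 25x25 image yields 625 values, read row by row.

```python
def read_ascii_pgm(text):
    """Parse an ASCII ("P2") PGM image and return (width, height, pixels).

    pixels is a flat, row-major list of ints in the range 0..maxval.
    """
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        tokens.extend(line.split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not an ASCII PGM (P2) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    pixels = [int(t) for t in tokens[4:]]
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(pixels)}")
    if any(p < 0 or p > maxval for p in pixels):
        raise ValueError("pixel value outside 0..maxval")
    return width, height, pixels


# A 3x2 example image; a real image would hold 25 * 25 values.
sample = "P2\n# tiny example\n3 2\n255\n0 128 255\n10 20 30\n"
print(read_ascii_pgm(sample))
```

The flat pixel list can then be written as one ARFF data line, followed by the image's letter as the class value.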

   a) For your first step, you will use the specified machine learning
algorithms implemented in Weka to learn a two-class concept that distinguishes
the sign language "c" letters from the "d" letters.  Test the learned models on
the training data, and submit the input files that you used and the output
concept that each algorithm generated.
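Testing on the training data tends to overstate accuracy. The minimal 1-nearest-neighbour sketch below, in the spirit of IB1, makes this concrete: when a training image is classified, its nearest neighbour is itself at distance 0, so training-set accuracy is 100% unless identical images carry different labels. It uses plain Euclidean distance, which is a simplification; Weka's IB1 normalizes attribute values first. The function names are hypothetical.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training instance closest to query.

    train: list of (feature_vector, label) pairs.
    Uses plain Euclidean distance with no attribute normalization.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def training_set_accuracy(train):
    """Evaluate 1-NN on the same instances it was 'trained' on."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in train)
    return correct / len(train)


# Toy two-feature data standing in for pixel vectors.
train = [([0, 0], "c"), ([0, 1], "c"), ([5, 5], "d"), ([6, 5], "d")]
print(training_set_accuracy(train))
```

The perfect score says nothing about how the classifier handles an unseen image, which is what held-out evaluation measures.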

   b) Next, test the performance using 3-fold cross-validation.  How does this
affect the performance results, and why?  Comment on the performance of
each algorithm: why do you think some algorithms outperformed others?
Why are the results poorer here than when the training data was used for
testing?
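The sketch below shows how 3-fold cross-validation can split the 12 two-class images (6 "c", 6 "d") so that each image is tested exactly once by a model that never saw it during training. Instances of each class are dealt round-robin across folds to keep class proportions similar (stratification). Weka's cross-validation also randomizes the instance order using a seed; this sketch skips that step so its output is deterministic. The function names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k=3):
    """Assign instance indices to k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0  # continues across classes so fold sizes stay balanced
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validation_splits(labels, k=3):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k)
    for test_fold in folds:
        test = set(test_fold)
        train = [i for i in range(len(labels)) if i not in test]
        yield train, sorted(test)


labels = ["c"] * 6 + ["d"] * 6
for train_idx, test_idx in cross_validation_splits(labels):
    print("train:", train_idx, "test:", test_idx)
```

Each of the three models here trains on only 8 of the 12 images, so it also sees less data than a model built from the full training set.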

   c) Devise a method that uses these learning algorithms to solve the
three-class problem of distinguishing the "c", "d", and "e" letters from
each other by decomposing it into a set of two-class problems.
Explain the method you used, submit the input and output
files, and summarize the results.
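One possible decomposition, shown only as an example and not as the required method, is one-vs-one (pairwise) voting: train a binary classifier for each pair of letters, run all of them on a test image, and predict the letter that receives the most votes. The helper names and the alphabetical tie-break are arbitrary choices for this sketch; the binary predictions would come from whatever Weka models you train.

```python
from collections import Counter
from itertools import combinations


def pairwise_problems(classes):
    """List the two-class problems needed for a one-vs-one decomposition."""
    return list(combinations(sorted(classes), 2))


def vote(pairwise_predictions):
    """Combine two-class predictions for one test image by majority vote.

    pairwise_predictions maps a class pair, e.g. ("c", "d"), to the label
    that pair's binary classifier predicted. Ties are broken alphabetically,
    which is an arbitrary choice.
    """
    counts = Counter(pairwise_predictions.values())
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


print(pairwise_problems(["c", "d", "e"]))
print(vote({("c", "d"): "c", ("c", "e"): "c", ("d", "e"): "e"}))
```

With three letters every pair gives one vote, so a three-way tie (each letter winning once) is possible, and the tie-break rule is worth discussing. A one-vs-rest scheme ("c" versus not-"c", and so on) is another common alternative.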

   d) Compare and contrast the concept representations that the alternative
learning algorithms provide.  Note that Weka provides visualization options
that can help you interpret the generated concepts.  What are some of the
advantages and disadvantages of the alternative representations?

   e) Finally, test one mechanism for improving the classification accuracy of
the learning algorithms.  This mechanism could be adding more training data,
thresholding the images (pixel values below a cutoff x are mapped to 0 and the
rest are mapped to 255), or another improvement that you design.  Discuss your
enhancement and summarize the cross-validation results.
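The thresholding rule described above is a simple mapping over an image's pixel values. The cutoff x is a free parameter that the sketch does not prescribe; one reasonable approach is to compare several cutoffs by cross-validation. The function name threshold is hypothetical.

```python
def threshold(pixels, x):
    """Binarize pixel values: values below x become 0, all others 255."""
    return [0 if p < x else 255 for p in pixels]


print(threshold([0, 127, 128, 255], 128))
```

Applying the same cutoff to every image before writing the ARFF file keeps training and test data consistent.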