PROGRAM #5
Fall 2012
Due: November 28, 2012, 11:59pm on Angel Dropbox
General Guidelines:
· All source code must be in C++. You can use Windows or Unix environments at your discretion.
· Each program project should be accompanied by a COVER SHEET (see details below). Assignments without cover page will NOT be graded.
·
For this programming
assignment, you have the option of either work as a TEAM if you prefer, or as
an individual. A TEAM should contain at
most 2 students including yourself.
·
If working as a team,
submit only one copy of your assignment into the Drop box and indicate
in the text box of the drop-box while submission, both names in the
project. If you specify only one name then only that person will get the grade
and the other will not. So please make sure you fill in both names.
·
If working as a team,
remember to include both your last names in the zip file of your submission.
·
In the rest of this
document, the term “YOU” is used either to refer to you (if you are working
individually) or your team (if you work as a team).
· The final material for submission should be entirely written by you. If you decide to consult with others or refer materials online, you MUST give due credits to these sources (people, books, webpages, etc.) by listing them on the cover sheet mentioned below. Note that no points will be deducted for referencing these sources. However, your discussion/consultation should be limited to the initial design level. Sharing or even showing your source code/assignment verbiage to anyone else in the class, or direct reproduction of source code/verbiage from online resources, will all be considered plagiarism, and therefore will be awarded ZERO points and subject to the WSU Academic Dishonesty policy. (Reproducing from the Weiss textbook is an exception to this rule, and such reproduction is encouraged wherever possible.)
· Grading will be based on correctness, coding style, implementation efficiency, exception handling, source code documentation, and the written report. Try to also follow the good coding practices discussed in class during the C++ review lecture.
How to submit using Angel?
All assignments should be submitted to the PA3 Dropbox within Angel.
1) Once you login to Angel, click on the "Lessons" tab.
2) You will see "Programming Assignment #5".
Click on that and you will see the submission website.
Under Title, just write your first and last name and say PA 5 - e.g., "Ananth Kalyanaraman,
PA5".
If you are submitting as a team, only one of you
should submit, and put both your names on the Title:
e.g., "Firstname1 Lastname1 & Firstname2 Lastname2, PA5"
Leave the message textbox empty, unless you want to add a
special comment to us.
Then click on the "Attachments" button and attach your ZIP
file there. And then press the SUBMIT button
EMAIL
METHOD (backup only):
Only if dropbox doesn't work for
some reason, then you can submit by email to "All course faculty"
on Angel. Same time deadlines apply.
Late submission policy: A late penalty of 10% will be assessed for late submissions within the next 24-hour.
Any submission sent by any other means than specified above will be discarded and not graded.
PROBLEM Hashing and Hash tables:
For this assignment you will be comparing the performance of different hash functions and different collision resolution implementations. More specifically, you will be testing hash table implementations to store and retrieve a set of strings. The input is a set of n strings, and for all practical purposes, you can assume that the length of each string can be bound by a small constant. You need to build and maintain a searchable hash table for this input. Obviously there are many ways you can design the hash table, depending on what hash function you pick, and what collision resolution strategy you choose.
Aim 1. Comparing different collision resolution strategies keeping the hash function fixed
For this aim, you will be implementing and experimentally comparing three collision resolution mechanisms: chaining, linear probing, and quadratic probing. For doing this, you don't have to write the code from scratch. The source code for chaining and quadratic probing are already downloadable from the Weiss book's webpage: http://users.cis.fiu.edu/~weiss/dsaa_c++/Code/. tables. For your convenience, I have created zipped archive of the main relevant files in this zip file (note: this code may *not* be STL compliant).
In fact
I modified the Weiss source code to be STL compliant. Here is the
ZIP file. If you use this version, you don't need the
prior zip file.
Disclaimer: I spent only about 30 minutes working on this
modification and so haven't had a chance to test it thoroughly.
All I can guarantee is that the code compiles and runs the test
driver code (TestHashTable.cpp) successfully on my Linux box. So
what I suggest you do is eye-ball every line of this code (its not
very long!) and make sure it looks correct before you modify it for
your assignment. You might even want to think of adding your own
lines of code to the test driver code for some additional testing.
In the process if you notice any issues with the code, let me know
so that I can fix it.
Notice that you
need to implement the code to do linear probing. This should
be a very simple modification of the quadratic probing code (more
specifically, to the function that finds the next position of
probe). Call the resulting source files for linear probing as LinearProbing.h and LinearProbing.cpp.
You will also notice that the hash function for converting a string key into a hash table index used by all these three implementations is the same, and is defined "int hash( const string & key, int tableSize )" inside the three .cpp files. For this aim, we will keep this hash function as-is, and just experiment by varying the collision resolution implementations, viz. chaining, linear probing and quadratic probing.
You will also notice that the quadratic probing code initializes the hash tablesize to a default size of 101, and then does a rehash anytime the number of elements in the hash table reaches half of the hash tablesize. And rehashing doubles the tablesize. Leave these things as they are and keep the same setup for linear probing as well.
You will also notice that the chaining implementation also initializes the hash tablesize to a default size of 101. However it does NOT do rehashing. Again, leave this as is.
Experimental plan:
Input files: There are two input files provided for your testing and performance measurements.
For the sake of simplicity of your code, we will assume that the names of these two input files won't change and that these two input files are always made available in the current working directory of execution. Of course you need to ensure this when you run and test the code.
Write a test driver function, call it TestAim1.cpp, whose main() function performs the following sequence of operations in order:
At the end of this experiment, you should generate the following table:
Collision strategy | Insert | Search | |||
Total time | Avg. time | Total number of collisions | Total time | Avg. time | |
Chaining | |||||
Linear Probing | |||||
Quadratic Probing |
Make sure you specify the unit for the run-times - i.e., hours or minutes or seconds or milliseconds or microseconds, as appropriate.
Aim 2. Comparing different hash functions keeping the collision strategy fixed
For this aim, you will compare three different string
hashing functions (given in the book) keeping the collision strategy fixed
at Quadratic Probing. The three hash functions to use are all given in the
book, Figure 5.2, Figure 5.3 and Figure 5.4. Note that Figure 5.4 is same as
the default hash function used in the source code for Aim 1. For the purpose
of analysis, let us refer to these three hash functions as follows:
1) Figure
5.2: "Simple hash function"
2) Figure
5.3: "Prefix hash function"
3) Figure
5.4: "Full-length hash function"
For comparing these three hash functions, first add the other two hash functions into the source code for Quadratic Probing (obviously using different names to refer to those functions). Then, repeat the same procedure that you used to analyze Quadratic Probing method in Aim 1, with the only exception that now you should keep track of separate timer variables and collision variables for the three different hash functions. As a result of this experiment, you should generate the following table:
Hash function | Insert | Search | |||
Total time | Avg. time | Total number of collisions | Total time | Avg. time | |
Simple | |||||
Prefix | |||||
Full-length |
Note that you don't have to repeat the experiment for the row corresponding to the Full-length. The results should already be available from the corresponding entry in Aim 1.
REPORT:
In a separate document (Word or PDF), not more than 2 pages, compile the following sections:
· A: Problem statement. In 1-2 sentences state the goal(s) of this exercise.
· C: Experimental Results: In this section, include the following:
o The two tables as described above.
o Are the observations made in the above plots as per your theoretical expectations? Justify.
FINAL CHECKLIST FOR SUBMISSION:
___ Cover sheet
___ A separate folder containing all your source code (including the main function you used for testing)
___ Report
___ Both the above zipped into another folder archive called Program5<YourLastName(s)>.zip
GRADING RUBRICS:
This assignment is mainly about empirical testing and analysis. There is some design component but not much. So for grading this PA, we will primarily look at how well you have designed experiments, what the plots look like, and have you offered satisfactory/convincing rationale to explain your observations. Therefore the points during grading will be distributed as follows:
The whole assignment is worth a total of 100 points.
----------------------------------------
CODING (45 pts):
(10 pts): Are all versions of the hash tables and functions implemnted correctly?
(10 pts): Is the code implemented in an efficient way? i.e., are there parts in the code that appear redundant, or implemented in ways that can be easily improved? Does the code conform to good coding practices of Objected Oriented programming?
(15 pts): Does the code compile and run successfully on a couple of test cases?
(10 pts): Is the code documented well and generally easy to read (with helpful comments and pointers)?
REPORT (55 pts):
(15 pts): Is the experimental plan technically sound? Is the experimental setup specified clearly?
(10 pts): Are results shown as plots in a well annotated manner and are general trends in the results visible?
(30 pts): Are the justifications/reasons provided to explain the observations analytically sound? Is there a reasonable attempt to explain anomalies (i.e., results that go against analytical expectations), if any?
----------------------------------------
Obviously to come up with the above evaluation, the TAs are going to both read and run your code.