Machine Learning
Class Project
Project due December 17, 2008 (midnight)
No late submissions will be accepted.
Intermediate deadlines:
Team Registration: October 24, 2008
Initial Entry: November 7, 2008
For the class project, you will form 1-2 person teams to compete in the
Netflix Prize, a machine
learning challenge to predict how Netflix users will rate movies. The
best entries each year can win $50,000, and anyone achieving Netflix's
target performance increase of 10% over its current approach will
win $1,000,000. Of course, the main goal is for you to learn more
about applying machine learning techniques to real problems.
Below are the specific requirements for the class project.
- Read over the material at
www.netflixprize.com, especially
the rules and the frequently asked questions.
- You may choose to compete individually or as a two-person team.
Some portion of the grading will be based on the difficulty of your
approach and your team's ranking within the class, so I recommend
you pair up. If you need help finding a teammate, let me know. Once
you have your team finalized, you should follow the instructions on
the website to register your team. By October 24, send me your team
name, team members and password. I need this information to monitor
your progress and the class rankings by checking the performance of
your entries maintained at the Netflix Prize website.
- Next, you will need to download the data, which is about 700MB
compressed and about 2GB uncompressed. Let me know if you need help
storing the data. More information about the data is available in
the README file contained in the download. You should read this
carefully.
- For your first entry, every team should submit the same predictions;
namely, predict 3.8 (the global average rating) for every customer-id/date
pair of every movie. This should result in an RMSE score of 1.1357.
By November 7, send me this prediction file after it has been
successfully submitted to the Netflix Prize site.
- The remainder of your effort on the project should involve designing,
implementing and testing one or more machine learning approaches to
achieve better predictions for the Netflix Prize data. Note the following:
- We will be maintaining an up-to-date ranking of the teams based on
their submissions to the Netflix Prize. So, if you make a submission that
improves your current best RMSE score, let me or the TA know so that we
can update the class ranking.
- Be aware that the Netflix Prize only allows one submission per team,
per day, so you will need to make steady progress on this project.
You will not be able to perform many last minute submissions.
- By December 17 you should email me
(holder@wsu.edu) the following:
- A report describing all your attempts (at least those that show up
under your Netflix Prize team information), the methods used for each,
enough detail on your best submission that the results can be reproduced,
and a general discussion of your experience (what worked, what didn't,
why, and what you would try next).
- All code and instructions necessary for reproducing your best result
from the training data. That is, we will need to be able to input the
training data to your software and get out your best prediction file. You
can assume we have Weka, but there is no requirement that you use Weka.
- Your best prediction file successfully submitted to the Netflix
Prize site.
- For 2-person teams, each team member should send me a
separate email describing each team member's contribution
to the project. These emails will be considered confidential between me
and you.
- Your project will be graded according to the following criteria.
- The difficulty, number and creativity of your successful submissions
to the Netflix Prize.
- The relevance to machine learning of your approach(es) to the
problem.
- Your team's ranking within the class based on the RMSE score of your
best successful Netflix Prize submission.
- The relative contribution of each team member.
- Whether you met the intermediate deadlines above.
- The quality of your report based on presentation, coverage, detail
and general discussion.
- The efficiency, understandability and correctness of your instructions
and code for reproducing your best result.
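As an illustration of the first-entry requirement above, the following is a minimal Python sketch, not a required implementation. It assumes the qualifying file lists each movie as a line of the form `movie-id:` followed by one `customer-id,date` line per requested prediction, and that the prediction file keeps the `movie-id:` lines and replaces every other line with a rating; check both assumptions against the README in the download. The function names and the sample IDs are made up for the example. The `rmse` helper only shows how the score is defined; you cannot compute the official score yourself because the true qualifying ratings are withheld.

```python
import math
from typing import Iterable, Iterator, Sequence

GLOBAL_MEAN = 3.8  # constant rating given in the assignment text


def constant_predictions(qualifying_lines: Iterable[str],
                         rating: float = GLOBAL_MEAN) -> Iterator[str]:
    """Yield prediction-file lines for a constant-rating baseline.

    Assumed input format (verify against the README):
        "<movie-id>:" followed by "<customer-id>,<date>" lines.
    """
    for raw in qualifying_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            yield line  # movie header lines are copied unchanged
        else:
            yield f"{rating:.1f}"  # one rating per customer-id/date line


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one rating")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real data set.
    sample = ["1:", "111,2005-01-01", "222,2005-01-02", "2:", "333,2005-01-03"]
    for out_line in constant_predictions(sample):
        print(out_line)
```

To produce a real submission you would read the qualifying file from the download line by line, pass the lines to `constant_predictions`, and write the output to a text file. A local estimate of RMSE for later approaches can be obtained by holding out part of the training ratings and calling `rmse` on your predictions for them.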
Team Rankings
| Rank | Team Name | Best RMSE |
|------|-----------|-----------|
| 1 | Narimus | 0.9028 |
| 2 | Jack_Fisher | 0.9171 |
| 3 | Barbarossa | 1.0075 |
| 4 | Winnie | 1.0150 |
| 5 | VirtualMan | 1.0533 |
Last updated: December 19, 2008 at 10:44am