Machine Learning: Foundations
(2010/11)
Final Project due 3.3.2013
REMARK:
To submit the project, create a small web site with the document and
pointers to any relevant material (data, code, etc.). You probably do not want to
include your student ID; your name is enough.
For the final project you need to select a project that makes use of what you
learned about machine learning.
There is great flexibility in the topic of the project, but you first need to
get it approved!
The possible projects fall into a few general types:
The project can be mainly an empirical project, involving experimentation
(hopefully with real data) using the methodologies (algorithms) introduced in
the course.
The project can be theoretical in nature. This again has two distinct flavors.
It can be a summary: reading a few related papers and writing a critical summary
of them (a critical summary means that you try to point out weaknesses in the
model/results, and what the goal should be).
It can be a research project, which usually involves trying to define an
interesting open problem and (hopefully) solving it. At the very least you should
be able to explain what you tried in order to solve it, and why it failed (or
where you got stuck).
If you have an idea which you think is reasonable and does not fall into any of
these categories, simply ask if it can be a project.
At the end of the project you will need to write a report (ideally 7-10 pages,
and no more than 12) that summarizes what you did in the project.
The project can be done in pairs.
REMEMBER:
You need to first get your project approved!
The deadline for the project is Dec 23, 2012.
In the following I will be more precise, and try to give you pointers for each
type of project.
Empirical data project
For this kind of project you will need two elements. First, you need the data
that you are interested in, and second, you need to define what you would like
to do (learn) with the data.
DATA:
The best kind of data is any data that you have access to and are interested in
analyzing. (If this is part of your graduate studies, even better.)
Here are a few more standard pointers to open data sets:
UC Irvine Machine Learning Repository
MNIST and procedures to handle it (Hinton's webpage)
Also, you can try to think “out of the box”:
Yahoo! Finance has a large variety of financial data about stock prices.
Google Trends has information about queries.
Finally, there is a huge variety of user reviews of products and services on
the web.
LEARNING:
Try to propose what you would like to do with the data. You can either try to
simply learn a given task, or try to learn something about a learning algorithm,
or try to combine two (or more) learning algorithms.
Here are a few examples:
TRIPADVISOR: Try to reconstruct their proprietary Popularity Index algorithm. (see http://www.tripadvisor.nl/pages/owner_faq.html)
SENTIMENT ANALYSIS: Given user reviews, determine whether each review is positive, negative or
neutral. (see http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
SPAM ANALYSIS: http://csmining.org/index.php/data.html
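To make the sentiment analysis example concrete, here is a minimal sketch of a possible baseline: a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The six reviews in TOY_REVIEWS are invented purely for illustration, and the function names (tokenize, train_naive_bayes, predict) are hypothetical. A real project would use an actual review data set and a proper train/test split.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def train_naive_bayes(examples):
    """examples: list of (text, label) pairs. Returns the counts used by predict."""
    label_counts = Counter()            # number of reviews per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, for illustration only (not a real data set).
TOY_REVIEWS = [
    ("great hotel friendly staff", "positive"),
    ("lovely room great view", "positive"),
    ("terrible service dirty room", "negative"),
    ("awful food terrible staff", "negative"),
    ("average hotel okay location", "neutral"),
    ("okay room average food", "neutral"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(predict(model, "great friendly staff"))
```

A baseline of this kind is mainly useful as a reference point: a project would then compare it against the algorithms introduced in the course on held-out reviews.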
Empirical algorithm project
Select one algorithm, and some works that show how to optimize the performance
of the algorithm.
Do an empirical evaluation of the proposed methodologies for setting the
parameters, and of their influence on the performance.
Still, you need to decide in advance on which data sets you will perform the
empirical evaluation.
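As an illustration of what such an evaluation might look like, here is a minimal sketch that sweeps one parameter (the number of neighbors k in k-nearest-neighbors) and estimates accuracy by 5-fold cross-validation. Everything in it is an assumption made for the example: the choice of k-NN, the synthetic two-class data from make_toy_data, and the list of k values. In an actual project the data sets and the parameter-setting methods would come from the works you selected.

```python
import random
from collections import Counter

def make_toy_data(n=200, seed=0):
    """Synthetic 2-D, two-class data (a stand-in for a real data set)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_accuracy(data, k, folds=5):
    """Average held-out accuracy of k-NN over `folds` disjoint splits."""
    accuracies = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th example, starting at index f
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds

def sweep(data, ks=(1, 3, 5, 9, 15)):
    """Map each candidate k to its cross-validated accuracy."""
    return {k: cross_val_accuracy(data, k) for k in ks}

if __name__ == "__main__":
    for k, acc in sweep(make_toy_data()).items():
        print(f"k={k:2d}  cv accuracy={acc:.3f}")
```

The point of the sketch is the structure: the data sets and the candidate parameter values are fixed before any results are seen, and each setting is scored only on data that was not used to fit it.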
Summary project
You first need to find a number of related papers (2-3) on the same topic (one
that interests you).
A good starting place is surveys and tutorials that explain the area and have
many references.
Another place to find papers on machine learning is in the conferences (COLT
(more theoretical), ICML (more empirical) or NIPS) or journals (JMLR or Machine
Learning Journal).
Here are a few leads on some of the topics:
Boosting: page, also tutorial, tutorial2
Semi-supervised learning: survey, tutorial
Active learning: Sanjoy Dasgupta and tutorial
Online Learning and Regret Minimization: Survey, tutorial, tutorial2, Elad Hazan, Nicolo Cesa-Bianchi
Domain Adaptation: tutorial and Mehryar Mohri
Structured Prediction: tutorial
Agnostic Learning: tutorial
Research project
Basically, this is very similar to the summary project in its planning and
preparation. The main difference is that you might select just a single paper,
and that your goal would be to identify a research project that is related to
this work. In the proposal stage, try to give a general outline of what you
would like to do.
Multi-Class Labels
An overview can be found in Chapter 8 of:
Foundations of Machine Learning, Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar,
MIT Press, 2012.
The slides are available here