Machine Learning: Foundations
(2010/11)
Final Project
For the final project you need to select a project that uses what you have
learned about machine learning.
There is great flexibility in the choice of topic, but you need to
first get it approved!
Possible projects fall into two general types (in practice, it will
be closer to three).
The project can be mainly empirical, involving experimentation
(hopefully with real data) with the
methodologies (algorithms) introduced in the course.
The project can be theoretical in nature. This again has two
distinct flavors. It can be a summary: reading a few related papers and
writing a critical summary of them
(a critical summary means that you try to point out weaknesses
in the model/results, and what the goal should be).
It can be a research project, which usually involves defining an
interesting open problem and trying to solve it (hopefully successfully).
At the very least, you should be able to explain what you tried and
why it failed (or where you got stuck).
If you have an idea that you think is reasonable but does not fall into any of
these categories, simply ask whether it can be a project.
At the end of the project you will need to write a report (ideally 7-10 pages)
that summarizes what you did in the project.
The
project can be done in pairs.
REMEMBER:
You need to first get your project approved!
The
deadline for the project is April 27, 2010.
In the following I will be more precise and try to give you pointers for each
type of project.
Empirical project
For this kind of project you will need two elements. First, you need the data
that you are interested in, and second, you need to define what you would
like to do (learn) with the data.
DATA:
The best kind of data is any data
that you have access to and you are interested in analyzing.
(If this is
part of your graduate studies, even better.)
Here are a few more standard
pointers to open data sets:
UC Irvine Machine Learning Repository
MNIST and
procedures to handle it
Hinton's
webpage
Also, you can try to think “out of the box”:
Yahoo! Finance has a
large variety of financial data about stock prices.
Google Trends has
information about search queries.
Finally, there is a huge variety of user reports on the web
about products and services.
LEARNING:
Try to propose what you would like to do with the data. You can try to
simply learn a given task, try to learn something about a learning algorithm,
or try to combine two (or more) learning algorithms.
Here
are a few examples:
BOOSTING
USING REGRET MINIMIZATION: In class we showed a general reduction from regret
minimization to boosting.
Try
to utilize this reduction with some of the many regret minimization algorithms
and compare them to AdaBoost.
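As a starting point, here is a minimal sketch of one standard way to turn a regret minimization algorithm into a booster. It may differ in its details from the reduction shown in class. It assumes Hedge (multiplicative weights) is run over the training examples, that decision stumps serve as the weak learner, and that the final hypothesis is an unweighted majority vote. The function names, the learning rate eta, the number of rounds, and the synthetic data are all illustrative choices, not part of the course material.

```python
import numpy as np

def best_stump(X, y, p):
    """Weak learner: the threshold stump with the lowest p-weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = p[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]

def stump_predict(stump, X):
    j, thr, sign = stump
    return sign * np.where(X[:, j] <= thr, 1, -1)

def boost_with_hedge(X, y, rounds=25, eta=0.5):
    """Boosting by running Hedge over the training examples (assumed setup)."""
    w = np.ones(len(y))
    stumps = []
    for _ in range(rounds):
        p = w / w.sum()                  # Hedge's current distribution
        stump = best_stump(X, y, p)      # weak learner trained against p
        stumps.append(stump)
        # An example suffers loss 1 when the stump classifies it correctly,
        # so Hedge shifts weight toward the examples that are still hard.
        loss = (stump_predict(stump, X) == y).astype(float)
        w = w * np.exp(-eta * loss)
    return stumps

def majority_vote(stumps, X):
    votes = sum(stump_predict(s, X) for s in stumps)
    return np.where(votes >= 0, 1, -1)

# Synthetic data with a diagonal boundary that no single stump can fit.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

stumps = boost_with_hedge(X, y)
train_acc = (majority_vote(stumps, X) == y).mean()
print(f"training accuracy of the majority vote: {train_acc:.3f}")
```

To compare regret minimization algorithms, one could replace the exponential weight update with another algorithm's update and compare the resulting training and test error curves against AdaBoost on the same data.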
SENTIMENT ANALYSIS: Given user reviews, determine whether each review is
positive, negative, or neutral.
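For the sentiment analysis task, a simple baseline could look like the sketch below. It assumes a bag-of-words multinomial naive Bayes classifier with add-one smoothing, which is only one of many reasonable choices and is not prescribed by the course. The six training reviews are made up purely for illustration, and a real project would use actual user reviews.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(reviews, labels):
    """Collect per-class word counts and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(reviews, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up toy reviews, for illustration only.
reviews = [
    "great product works perfectly",
    "love it great value",
    "terrible quality broke quickly",
    "awful waste of money",
    "it arrived on monday",
    "the box is brown",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = train_naive_bayes(reviews, labels)
print(predict(model, "great value"))
```

A project along these lines would hold out part of the reviews for testing and compare this baseline with the algorithms introduced in the course.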
Summary project
You first need to find a number of related papers (2-3) on a single topic
that interests you.
One place to find papers on machine learning is the conferences (COLT (more
theoretical), ICML (more empirical), or
NIPS) or the journals (JMLR or Machine Learning Journal). Other
good starting places are surveys and tutorials that explain the area and have
many references.
Here
are a few leads on some of the topics:
Boosting: page, tutorial, tutorial2
Semi-supervised learning: survey, tutorial
Active
learning: Sanjoy Dasgupta and tutorial
Online
Learning and Regret Minimization: Survey, tutorial, tutorial2,
Elad Hazan, Nicolo
Cesa-Bianchi
Domain
Adaptation: tutorial and Mehryar Mohri
Structured
Prediction: tutorial
Agnostic
Learning: tutorial
Research project
Basically, this is very similar to the summary project in its planning and
preparation. The main difference is that you might select just a single paper,
and that your goal would be to identify a research project related to that
work. In the proposal stage, try to give a general outline of what you would
like to do.