Machine Learning: Foundations (2010/11)

Students' Projects

Final Project

For the final project, you need to select a project that makes use of what you have learned about machine learning.

There will be great flexibility in the topic of the project, but you first need to get it approved!

Possible projects fall into two general types (in practice closer to three, since the second type comes in two flavors).

The project can be mainly empirical, involving experimentation (hopefully with real data) using the methodologies (algorithms) introduced in the course.

The project can be theoretical in nature. This again comes in two distinct flavors.

It can be a summary: reading a few related papers and writing a critical summary of them (a critical summary means that you try to point out weaknesses in the model or results, and what the goal should be).

It can be a research project, which will usually involve trying to define an interesting open problem and (hopefully) solving it. At the very least, you should be able to explain what you tried in order to solve it, and why it failed (or where you got stuck).

If you have an idea that you think is reasonable but does not fall into any of these categories, simply ask whether it can be a project.

At the end of the project you will need to write a report (ideally 7-10 pages) that summarizes what you did in the project.

The project can be done in pairs.

REMEMBER: You need to first get your project approved!

The deadline for the project is April 27, 2010.

In the following I will be more precise, and try to give you pointers for each type of project.

Empirical project

For this kind of project you will need two elements. First, you need the data that you are interested in, and second, you need to define what you would like to do (learn) with the data.

DATA:

The best kind of data is data that you have access to and are interested in analyzing.

(If this is part of your graduate studies, even better.)

Here are a few more standard pointers to open data sets:

Delve Datasets

UC Irvine Machine Learning Repository

ICDAR

MNIST, and procedures to handle it: Hinton's webpage

Also, you can try to think “out of the box”

Yahoo! Finance has a large variety of financial data about stock prices.

Google Trends has information about queries.

Finally, there is a huge variety of user reports on products and services on the web.

LEARNING:

Try to propose what you would like to do with the data. You can simply try to learn a given task, try to learn something about a learning algorithm, or try to compose two (or more) learning algorithms.

Here are a few examples:

BOOSTING USING REGRET MINIMIZATION: In class we showed a general reduction from regret minimization to boosting.

Try to utilize this reduction with some of the many regret minimization algorithms and compare them to AdaBoost.

SENTIMENT ANALYSIS: Given user reviews, determine whether each review is positive, negative, or neutral.
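For the boosting example above, the following is a minimal Python sketch of one common form of the reduction, which may differ in details from the version shown in class. Hedge (multiplicative weights) treats each training example as an expert and charges an example a loss of 1 whenever the current weak hypothesis classifies it correctly, so weight shifts toward the examples that are still misclassified. The final predictor is an unweighted majority vote. The decision-stump weak learner, the toy 1-D data set, the learning rate, and all function names are assumptions chosen for illustration.

```python
import math


def stump_predict(stump, x):
    """A stump (theta, sign) predicts sign if x > theta, and -sign otherwise."""
    theta, sign = stump
    return sign if x > theta else -sign


def best_stump(xs, ys, dist):
    """Weak learner (assumed): the stump with the smallest weighted error under dist."""
    pts = sorted(xs)
    # One threshold below all points (a constant predictor) plus all midpoints.
    thetas = [pts[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    best, best_err = None, float("inf")
    for theta in thetas:
        for sign in (1, -1):
            err = sum(d for x, y, d in zip(xs, ys, dist)
                      if stump_predict((theta, sign), x) != y)
            if err < best_err:
                best, best_err = (theta, sign), err
    return best


def boost_by_hedge(xs, ys, rounds):
    """Run Hedge over the examples and collect one weak hypothesis per round."""
    n = len(xs)
    eta = math.sqrt(8.0 * math.log(n) / rounds)  # a standard Hedge learning rate
    dist = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        h = best_stump(xs, ys, dist)
        stumps.append(h)
        # Loss 1 for correctly classified examples: their weight shrinks.
        weights = [d * math.exp(-eta) if stump_predict(h, x) == y else d
                   for d, x, y in zip(dist, xs, ys)]
        total = sum(weights)
        dist = [w / total for w in weights]
    return stumps


def majority_vote(stumps, x):
    """Unweighted majority vote of the collected weak hypotheses."""
    score = sum(stump_predict(h, x) for h in stumps)
    return 1 if score > 0 else -1


# Toy data: label +1 inside the interval (0.3, 0.7), -1 outside.
xs = [(i + 0.5) / 20 for i in range(20)]
ys = [1 if 0.3 < x < 0.7 else -1 for x in xs]

stumps = boost_by_hedge(xs, ys, rounds=200)
mistakes = sum(majority_vote(stumps, x) != y for x, y in zip(xs, ys))
print("training mistakes:", mistakes)
```

Replacing Hedge with another regret minimization algorithm should only require changing the weight update inside the loop, which makes it convenient to compare several variants against AdaBoost on the same data.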

Summary project

You first need to find a number of related papers (2-3) on a single topic that interests you.

One place to find papers on machine learning is in the conferences (COLT (more theoretical), ICML (more empirical) or NIPS) or journals (JMLR or Machine Learning Journal). Other good starting places are surveys and tutorials that explain the area and have many references.

Here are a few leads on some of the topics:

Boosting: page, also tutorial, tutorial2

Semi-supervised learning: survey, tutorial

Active learning: Sanjoy Dasgupta and tutorial

Online Learning and Regret Minimization: Survey, tutorial, tutorial2, Elad Hazan, Nicolo Cesa-Bianchi

Domain Adaptation: tutorial and Mehryar Mohri

Structured Prediction: tutorial

Agnostic Learning: tutorial

Research project

Basically, this is very similar to the summary project in its planning and preparation. The main difference is that you might select just a single paper, and your goal would be to identify a research project that is related to that work. In the proposal stage, try to give a general outline of what you would like to do.