
School of Computer Science
Learning and Neural Computation
0368-4034-01
Semester I, 1999-2000
Wednesday 9-12, Dan David 211
The course will be given in English!
Submission of the project by Thursday, February 24, 8pm.
Presentation of the project in mid-March.
Updated April 10, 2000
This page will be updated throughout the course.
Please send me an email with a list of 19 numbers between zero and one,
representing your prediction for each pattern.
Please indicate the architecture and parameters used to obtain these values.
The course is intended for third-year and Master's students. The first part of
the course will deal with classical topics in computational learning
theory and neural network learning. The second part will be geared
more towards applications: students will apply the
acquired methods to a Seismic Data Classification task and a few
other real-world problems. Part of the course will therefore be
devoted to topics related to detection in time-series and acoustic
signals. This part will be taught by Dr. Manfred Joswig, who is visiting
this year and is a world expert on Seismic Data Analysis.
Suggested Reading
Course Outline
- Some background and notation from Statistics
- Calma1, Calma2, Faces, Mines
- Curse of Dimensionality and Over-Fitting
- Estimation via MSE and Maximum Likelihood
- Linear regression: Numerical estimation problems
- Bayes' Theorem: Prior & Posterior probabilities.
- Class Boundaries
- Topics from Non-parametric statistics
- Learning goals: Introduction to Entropy
Additional material:
- Maximization of the number of possible configurations
- Minimization of the average bit length (both views are captured by the entropy formula below)
- Motivation from learning and model estimation
- Error functions
- Parameter Optimization
- Maximum likelihood
- Infomax
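For reference (our notation, not from the course handouts), the entropy of a discrete distribution p is
H(p) = -\sum_i p_i \log_2 p_i ,
which is at once the asymptotic log of the number of typical configurations and the minimal average code length in bits, tying together the two motivations above.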
- Single Layer Perceptrons
- Probabilistic interpretation of Perceptron outputs
- Perceptron learning rule
- Multi-layer perceptrons
- Introduction to Seismology
- Preprocessing
- Feature extraction
- Some properties of Principal components
- General properties of other orthonormal bases
- Information theory, Projection pursuit and neuronal coding
- Feature extraction by minimization of entropy
- Mutual information from MDL perspective
- Search for non-Gaussian distributions
- Skewness, Kurtosis, BCM rule
- Some approximations to the entropy
- Independent components analysis
- Model complexity via MDL: Introduction (Hinton's paper)
- Short discussion about matlab (Project presentation)
- Radial Basis Functions
- Description of the EM algorithm and its application to RBF.
- Orr's code
- Visualization software
- Linear Multi-Dimensional scaling
- Non-linear MDS via Neural Networks.
- Parallel coordinates
- Hard and soft clustering
- Mahalanobis clustering
- Prediction confidence
- The bias/variance problem & ensemble of experts
- Self Organizing maps and Recurrent networks
- Network Ensembles
Description of the final project
The purpose of the final project this year is to develop expertise in using
regularization constraints during training to reduce the chance of
over-fitting. We shall thus train with different bias
constraints, using the general framework of
Intrator (1993).
In addition, we shall master the use of object preprocessing,
ensemble training, and
prediction confidence as essential tools for a practical neural
network expert system.
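Schematically (our notation; the precise formulation in Intrator (1993) may differ), training with a bias constraint means minimizing a penalized cost
E(w) = \frac{1}{N}\sum_{n=1}^{N} \| y(x_n; w) - t_n \|^2 + \lambda\, P(w) ,
where the first term is the usual data misfit, the penalty P(w) encodes the chosen constraint (reconstruction error, kurtosis, entropy, and so on), and \lambda controls its strength.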
We shall be using the following preprocessing methods (a small PCA sketch follows the list):
- Principal components
- Dimensionality reduction via entropy constraints
- K-means Clustering
- Your choice.
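As an illustration of the first item, a minimal principal-components step in matlab could look as follows (the variable names are ours, not part of the required interface; k, the number of components to keep, is assumed given):

% Minimal PCA preprocessing sketch. X is N-by-d (one pattern per row).
mu = mean(X);                          % column means
Xc = X - repmat(mu, size(X,1), 1);     % center the data
[V, D] = eig(cov(Xc));                 % eigen-decomposition of the covariance
[ignore, idx] = sort(-diag(D));        % order eigenvalues, largest first
V = V(:, idx(1:k));                    % keep the k leading eigenvectors
Z = Xc * V;                            % reduced-dimension data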
We shall use the following training constraints (a kurtosis-penalty sketch follows the list):
- Reconstruction constraint
- Kurtosis constraint
- BCM constraint
- Minimal entropy constraint
- Maximal entropy constraint
- Mixture of Gaussians constraint
- Your choice.
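To make one constraint concrete, a kurtosis penalty on the hidden-layer activations could be coded along the following lines (a sketch under our own naming; the sign is chosen so that minimizing the total cost pushes the projections away from Gaussianity):

function p = kurtpen(H)
% KURTPEN  Kurtosis-based penalty (sketch). H is N-by-m activations.
Hc = H - repmat(mean(H), size(H,1), 1);   % center each hidden unit
s2 = mean(Hc.^2);                         % per-unit variance
k  = mean(Hc.^4) ./ (s2.^2) - 3;          % excess kurtosis per unit
p  = -sum(k.^2);                          % negative: rewards non-Gaussianity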
We shall use the following network architectures and training algorithms (a toolbox training sketch follows the list):
- Orr's Radial Basis Function code.
- Bishop's Radial Basis Function code.
- Neural networks toolbox back-prop code with Levenberg-Marquardt training.
- Neural networks toolbox RBF code.
- Your choice.
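For the toolbox option, a minimal training run with Levenberg-Marquardt might look like this (assuming the toolbox's newff/train interface; the layer sizes and epoch count are placeholders to be tuned):

% P is d-by-N (patterns in columns), T is 1-by-N with targets in [0,1].
net = newff(minmax(P), [10 1], {'tansig','logsig'}, 'trainlm');
net.trainParam.epochs = 100;   % placeholder; tune on validation data
net = train(net, P, T);        % Levenberg-Marquardt back-prop
Y = sim(net, P);               % network outputs in [0,1]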
We shall use the following prediction-confidence methods (a sketch follows the list):
- Variance between different experts weighted by their past performance.
- Weighted reconstruction error.
- Your choice.
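As an illustration of the first method, disagreement among experts can be turned into a confidence score roughly as follows (our own naming and mapping to [0,1], not a prescribed formula):

% preds is E-by-N (E experts, N patterns); perf is 1-by-E past performance.
w = perf / sum(perf);                              % normalized expert weights
m = w * preds;                                     % weighted ensemble output
v = w * (preds - repmat(m, size(preds,1), 1)).^2;  % weighted expert variance
confidence = 1 ./ (1 + v);                         % low variance -> high confidence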
Description of the data and past projects (from which code can be
taken) can be found here.
You will need to supply three functions:
- [preproc, netarch] = findarch(train_data, train_label, options)
This function finds an optimal preprocessing and a set of architectures
with their corresponding parameters that are best for the given
training data. It internally calls findpreproc.
- [processed, preproc] = findpreproc(train_data, train_label, procoptions, preproc)
This function has two tasks: first, it finds an optimal preprocessing,
and second, it generates the preprocessed data. If preproc is
supplied, the function runs the data through the preprocessing
methodology, which should be kept entirely in preproc.
- [label_pred, confidence] = netpred(data, preproc, netarch, options)
This function performs the preprocessing on the given data, based on
the preprocessing scheme that was chosen by findarch and stored in
preproc, and then predicts the class labels based on the
architectures that were found by findarch and stored in
netarch. Note that if findarch returns a collection of several experts,
then the algorithm for fusing the experts should
also be stored in netarch so that netpred can use it.
You should suggest a way to calculate the confidence and justify in
your written report why this confidence method is useful.
Each function should print once on the screen the names of the
students, a short description of the methods used, the
type of preprocessing, and the architectures found.
In the prediction code the names should again be displayed, along with the type
of expert fusion and the method used to calculate the confidence.
There should also be a clear description of the possible options to
the functions, and of the default values (those found optimal) used
when options are not supplied.
All other functions should be internal to the above three
functions.
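For orientation only, skeletons of the three required functions (each in its own m-file) might be organized as follows; all bodies are placeholders to be filled in by each group:

function [preproc, netarch] = findarch(train_data, train_label, options)
% FINDARCH  Choose preprocessing and network architectures for the data.
if nargin < 3, options = struct; end       % defaults found optimal go here
disp('Students: ...; methods, preprocessing and architectures: ...');
[processed, preproc] = findpreproc(train_data, train_label, options);
netarch.experts = {};                      % trained experts (placeholder)
netarch.fusion  = 'weighted average';      % fusion rule used by netpred

function [processed, preproc] = findpreproc(train_data, train_label, procoptions, preproc)
% FINDPREPROC  Find an optimal preprocessing, or apply a supplied one.
disp('Students: ...; preprocessing: ...');
if nargin < 4                              % no preproc given: search for one
    preproc.method = 'pca';                % placeholder choice
    preproc.params = [];                   % everything needed to reapply it
end
processed = train_data;                    % placeholder: apply preproc here

function [label_pred, confidence] = netpred(data, preproc, netarch, options)
% NETPRED  Preprocess, predict labels, and report prediction confidence.
disp('Students: ...; fusion and confidence method: ...');
processed  = findpreproc(data, [], struct, preproc);  % reuse stored scheme
label_pred = zeros(size(data,1), 1);       % placeholder: fuse netarch.experts
confidence = ones(size(label_pred));       % placeholder confidence in [0,1]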
Note:
You are allowed to use code written for last year's projects or code
taken from the web, as long as you clearly state this at the beginning
of the code, integrate it into your own code, make sure that you
understand what the code is doing, and ensure that it is well documented.
Project submission
There should be an HTML page describing the project, with links
to the functions. It should include the following:
- Clear description of the type of algorithms
that were used and the way optimal architectures were chosen.
- Clear justification, based on what was learned in the
course, for your decisions. (This can include graphs.)
- Clear description and justification of how the confidence is
calculated.
In the code itself, there should be a help header, similar to those
found in matlab code, with a clear description of the structure of your
variables (the network architectures, the options, and the preprocessing
methods and parameters).
Project presentation is scheduled for Tuesday, February 8, 2000, at 9am.
Specific-group tasks
- Project: Akavia, Gindi, Erez
- Preprocessing: Clustering and your choice.
- Training constraints: Kurtosis and your choice.
- Architecture and code: RBF with Orr and Back-prop with LM.
- Confidence: Reconstruction and your choice.
- Data: Sonl and Sono.
- Project: Lifchitz, Weisman, Arnold
- Preprocessing: Minimal entropy and clustering.
- Training constraints: BCM and Kurtosis.
- Architecture and code: Back-prop with LM.
- Confidence: Variance and your choice.
- Data: Sono and WVlet.
- Project: Hoffman, Stav
- Preprocessing: Maximal entropy and clustering.
- Training constraints: Reconstruction and minimal entropy.
- Architecture and code: RBF with Orr and Bishop.
- Confidence: Variance and your choice.
- Data: Sonl and Psd.
- Project: Tobias
- Preprocessing: Clustering and your choice.
- Training constraints: Reconstruction and Kurtosis.
- Architecture and code: RBF with Bishop.
- Confidence: Reconstruction and your choice.
- Data: Sonl and WVlet.
- Project: Sidi, Bogodlov
- Preprocessing: Clustering and your choice.
- Training constraints: BCM and Kurtosis.
- Architecture and code: RBF with Orr.
- Confidence: Reconstruction and your choice.
- Data: Sonl and Sono.
Data Sets
Sonl
Sono
Psd
WVlet
Analog Data
Download using the right mouse button.
Copyright © 1997-1999 Nathan Intrator. All rights reserved.