Recitations: Thursdays
Contact Info:
|
|
Email |
Phone |
Office |
OfficeHours |
Instructor |
benny AT cs.tau.ac.il |
640-5977 |
Schreiber 223 |
e-appointment |
|
Instructor |
ruppin AT
cs.tau.ac.il |
640-6528 |
Schreiber 118 |
e-appointment |
|
Teaching Assistant |
shlomito AT post.tau.ac.il |
640-5369 |
Schreiber 19מ |
e-appointment |
Course Outline
The Human Genome Project has completed the sequencing of the human DNA. Other genome projects of various organisms (bacteria, fungi, worms, rice, wheat, flies, chicken, mice and primates) are either completed or in the process of advanced sequencing. New bio-technologies produce huge amounts of biological data, whose manual analysis not possible. These advances have pushed for a new scientific discipline of bioinformatics, which integrates computer science, statistics, mathematics, and various biological fields. In this introductory course we concentrate on the algorithmic aspects of various computational tasks with biological origins. We will explore their computational complexity, and ways to solve them even if they are computationally intractable, such as approximation algorithms and heuristics.
Course topics include comparisons and alignments of biological sequences; Suffix trees and their uses for analyzing very long sequences (like complete genomes); Reconstruction of evolutionary trees using various optimization criteria, such as maximum parsimony, maximum likelihood, and distance criteria; Physical mapping of genomes; Genome alignment; Probabilistic models, such as PSSM (Position Specific Scoring Matrix) and HMM(Hidden Markov Models); HMM algorithms (Viterby, Expectation Maximization);Biological applications of the probabilistic models (e.g. gene prediction, protein classification, and sequence alignment); Analyzing biological networks; DNA microarrays and their uses; Clustering gene expression data; Classification using SVM (Support Vector Machines), and applications to clinical diagnosis.
Final exam: Closed book except for 4 double-sided pages of your choice.
Tantative Course Plan (subject to changes):
Lecture |
Date |
Subject |
Reference(s) |
Recitation |
1 |
|
Introduction and administrativia. |
Kanehisa, ch. 3 Durbin et. al.,
ch. 2. |
______ |
2 |
9/11 |
Pairwise Sequence Alignment: Linear space algorithm; FASTA and BLAST heuristics; PAM and BLOSSUM |
Durbin et. al., 2.5, 2.7 |
|
3 |
16/11 |
Genome Alignment (no slides) Introduction to suffix trees and suffix
arrays: Dan Geiger (pdf), Fernndez-Baca slides
(ppt). |
Gusfield, ch. 15 Gusfield, ch. 5,6,7 |
|
4 |
23/11 |
Durbin et al., 7.1, 7.2 Gusfield,
ch. 17. |
Multiple Sequence Alignment: Family
Representations and
|
|
5 |
30/11 |
Reconstructing phylogenetic (evolutionary) trees: |
Cancelled |
|
6 |
7/12 |
Maximum likelihood estimate (MLE); ML inference of trees. ML and AML properties.
Large, small, and tiny versions. Hardness results. |
|
Solutions to assignment 1 (recitation in Schreiber 309) |
7 |
14/12 |
1) Maximum likelihood (ML and AML): Felsenstein "tiny ML" algorithm, Pupko et. al "tiny AML" algorithm. |
|
Weighted parsimony. Factor 2 approximation to parsimony. TSP: Approximation and simulated annealing heuristic. |
8 |
21/12 |
Durbin et al. , ch. 3 |
ML vs. MP. ML examples. |
|
9 |
28/12 |
Hidden Markov
models (HMM): Parameter estimation using the EM algorithm. ML vs MAP (maximum apostriori probability); |
Profile HMM to identify protein families. |
|
10 |
4/1/2006 |
Introduction to biological networks 1 |
|
Solutions:
Assignment 2 |
11 |
11/1 |
|
Pair HMM |
|
12 |
19/1 |
Intro to DNA chips and an animation;
|
|
Network Motives |
13 |
26/1 |
Pluto's Introduction to Machine Learning. Support vector machines and kernels. SVMs for string problems and protein classification. Disease diagnosis by SVM on gene expression data. Linear separability. |
Solutions: Assignment 3. Linear programming. |
|
14 |
3/2 |
. |
|
Exam examples |
The course is "officially'' opened to 3rd year
students of the bioinformatics track, to 3rd year and graduate Computer Science
students. The algorithms course (“efficiency of computations”) is a
prerequisite, and the computability course (“computational models”) should at
least be taken concurrently with this course. Students from other disciplines
are welcome to take the course, but are encouraged to first contact the
instructor or teaching assistant. Some background in algorithms, probability,
and programming is required.(In the previous times
this course was given by this instructor, a number of life science students
completed it successfully).
Administratrivia
BTW this is also the case
regarding the lectures: If you can understand the material, answer the home
assignments (on your own) and pass the exam in flying colors without attending
classes, then surely windsurfing or boogy-boarding is
a better use of your time.)
Homework submission in singles or pairs (no triplets, quartets, quintets etc.). There will be a total of 5-6 problem sets, involving both "dry" assignments and "wet" ones. The wet parts will require understanding and running existing software, but not writing any code. Homework should be done independently. External sources (books, journal articles, web pages) can be used but should be clearly quoted. Research done at the Technion has shown a significant correlation between independent work and final performance.
Assignment
1 (due
Assignment
2 (due
Assignment
3 (due
Assignment 4 (not there yet)
Books
Other On-line Courses
Lior
Pachter course: Algebraic Statistics for
Computational Biology, UC Berkeley, Fall 2004.
Ron Shamir course notes: Computational Genomics,
Tel-Aviv UniversityFall2003/4
Martin Tompa's course notes: Computational Biology, (CSE 527),
University of
Nir
Friedman's course slides: Computational Molecular Biology the
Karp, Ruzzo and Tompa's course notes:
Algorithms in Molecular Biology
Other Interesting Resources (not course related)