Tel-Aviv University
School of Computer Science
Lectures:
Wednsdays
Recitations: Thursdays
Contact Info:
|
|
Email |
Phone |
Office |
OfficeHours |
Instructor |
benny AT cs.tau.ac.il |
640-5977 |
Schreiber 223 |
e-appointment |
|
Teaching Assistant |
Ulitskyi AT gmail.com |
640-5394 |
Schreiber 011 |
e-appointment |
Course Bulletin Board
Date posted |
|
---|---|
March 22 |
Assignment 1 has been posted (see link below) |
April 11 |
Assignment 2 has been posted (see link below) |
May 19 |
Assignment 3 has been posted (see link below) |
Course Outline
The Human Genome
Project has completed the sequencing of the human DNA. Other genome
projects of various organisms (bacteria, fungi, worms, rice, wheat, flies,
chicken, mice and primates) are either completed or in the process of advanced
sequencing. New bio-technologies produce huge amounts of biological data,
whose manual analysis not possible. These advances have pushed for a new
scientific discipline of bioinformatics, which integrates computer science,
statistics, mathematics, and various biological fields. In
this introductory course we concentrate on the algorithmic aspects of
various computational tasks with biological origins. We will explore their
computational complexity, and ways to solve them even if they are
computationally intractable, such as approximation algorithms and heuristics.
Course topics
include comparisons and alignments of biological sequences; Suffix trees and
their uses for analyzing very long sequences (like complete genomes);
Reconstruction of evolutionary trees using various optimization criteria, such
as maximum parsimony, maximum likelihood, and distance criteria; Physical
mapping of genomes; Genome alignment; Probabilistic models, such as PSSM
(Position Specific Scoring Matrix) and HMM (Hidden Markov Models); HMM
algorithms (Viterby, Expectation Maximization); Biological applications of the
probabilistic models (e.g. gene prediction, protein classification, and
sequence alignment); Analyzing biological networks; DNA microarrays and their
uses; Clustering gene expression data; Classification using SVM (Support Vector
Machines), and applications to clinical diagnosis; Metagenomics.
Tantative Course Plan (subject to changes):
Lecture |
Date |
Subject |
Reference(s) |
Recitation
|
1
|
|
Introduction and
administrativia. |
Kanehisa, ch.
3 Durbin et. al., ch. 2.
|
no
recitation this week
|
2 |
11/3 |
Similarity and distance; Affine gap penalties; Fasta and Blast heuristics. |
Durbin et. al., ch. 2.
Gusfield, ch. 11 |
More pairwise sequence allignments |
3 |
18/3 |
Scoring functions: PAM and BLOSUM |
Durbin et. al., 2.7 Gusfield, ch. 11,15 |
|
4 |
25/3 |
Suffix trees - Introduction (Dan Geiger pdf) |
Gusfield, ch. 5,6,7 | |
5 |
1/4 (no kidding) |
Phylogenetic trees reconstruction: | Durbin et. al.,7.1, 7.2, 7.4 Gusfield, ch. 17 |
ML and AML in phylogenetic trees |
6 |
21/4 |
Maximum parsimony cont., including proof of hardness. Maximum likelihood (2) - a status report |
Durbin et. al., ch. 7 and 8 Gusfield, ch. 17 |
|
7 |
6/5 |
1) Introduction to discrete Markov Models 2) Hidden Markov models
(HMM): the three problems
3) Baum's Forward-Backwards Algorithm (board presentation, following Rabiner's tutorial). |
Rabiner's Tutorial Durbin et al. , ch. 3 |
|
8 |
13/5 |
Viterbi's algorithm (board presentation, following Rabiner's tutorial).. Using EM to optimize HMM parameters (board presentation, following Rabiner's tutorial). Determining protein families by domains: PSSMs, HMMs, and the Pfam database |
Dan Geiger (Technion) EM slides Bilmes' gentle tutorial to the EM algorithm Durbin et al., ch. 3 |
|
9 |
20/5 |
Introduction to biological networks |
No recitation: Student's day | |
10 |
27/5 |
Intro to DNA microarrays |
No recitation: Shavuot vacation |
|
11 |
3/6 |
More microarray analysis: Clustering; Diagnosis. |
|
|
12 |
10/6 |
Pluto's intro to machine learning. Differential gene expression and class discovery (based on a presentation of Dr. Zohar Yakhini) Linear separability of gene expression pairs (based on joint work with (Giora Unger) |
Pluto's intro to machine learning (3.5GB file): Based on ch. 1 from Schlkopf
and Smola's book |
|
13 |
17/6 |
DNA sequencing Pizza, pasta, and beer |
The course is officially opened to 3rd year students of the bioinformatics track, to 3rd year and graduate Computer Science students. The algorithms course (formerly efficiency of computations) is a prerequisite, and the computability course (computational
models) should at least be taken concur8rently with this course. Students from
other disciplines are welcome to take the course, but are encouraged to first
contact the instructor or teaching assistant. Some background in algorithms,
probability, and programming is required.(In the previous times this course was
given by this instructor, a number of life science students completed it
successfully).
Administratrivia
Following last years’
successful experience, in addition to a 3 hours weekly lecture, there will also
be a 1 hour weekly recitation, given
by the teaching assistant. Attending this recitation is absolutely optional,
but is strongly encouraged.
BTW this is also the case
regarding the lectures: If you can understand the material, answer the home
assignments (on your own) and pass the exam in flying colors without attending
classes, then surely windsurfing or boogy-boarding is a better use of your
time.)
Grades in the course will be based on five-six problem sets
(20-30% of total grade) and a final exam (70-80%). Intelligent participation in
class may yield a small (unspecified, but surely non-negative) bonus.
Homework
submission in singles or pairs (no triplets, quartets, quintets etc.). There
will be a total of 3-5 problem sets, involving both "dry" assignments
and "wet" ones. The wet parts will require understanding and running
existing software, but not writing any code. Homework should be done and
written independently. External sources (books,
journal articles, web pages) can be used but should be clearly
quoted. Research done at the
Technion has shown a significant correlation between independent work and final
performance.
Moed A from 2005/6 can be found here
Problem Sets
Assignment 2
Assignment 3
Assignment 4
Books
Biological Sequence
Analysis, R. Durbin et al. ,
Algorithms on Strings,
Trees, and Sequences: Computer Science and Computational Biology, D. Gusfield,
Post-genome Informatics, M. Kanehisa ,
Introduction to
Computational Molecular Biology, J.
Meidanis and J. Setubal, Brooks/Cole Pub Co., 1997.
Computational Molecular
Biology : An Algorithmic Approach, P.
Pevzner, MIT Press, 2000.
Introduction to
Bioinformatics Algorithims,
Neil C. Jones, Pavel A. Pevzner, MIT Press, 2004.
Learning with Kernels: Support
Vector Machines, Regularization, Optimization and Beyond, Bernhard Schlkopf
and Alex Smola, MIT Press,
2002. Available on-line: Preface, Section 1.1,
Section 1.2,
and more.
Biology Background Books
The Cartoon Guide to
Genetics, by Larry Gonick and Mark
Wheelis, Harper Perennial, 1991
(I have not read this book, but I was
told it is gives a good popular overview. Some students complained the book
spends way too much time before getting down to relevant business).
Other On-line Courses
Dan Geiger course: Algorithms in Computational Biology , Technion, Spring 2008.
Lior Pachter course: Algebraic
Statistics for Computational Biology, UC Berkeley, Fall 2004.
Ron Shamir course notes: Computational Genomics, Tel-Aviv University, Fall 2003/4
Nir Friedman's course slides:
Computational Molecular Biology the
Ruzzo and
Tompa's course notes: Algorithms in Molecular Biology
Other Interesting
Resources (not
directly course related)
Mother Jones: Alternative views on some world affairs
Snails
are faster than ADSL (see also here, and pay special
attention to the apparatus)