Tel Aviv University, Blavatnik School of Computer Science
0368-4190-01
Algorithms for Big Data
Analysis in Biology and Medicine
Hebrew title: אלגוריתמים לניתוח נתוני עתק במדעי החיים וברפואה (Algorithms for Big Data Analysis in the Life Sciences and Medicine)
http://www.cs.tau.ac.il/~rshamir/abdbm
Prof. Ron Shamir
TA: Nimrod Rappoport
Fall 2017
Tuesday 16-19 Dan David 003
We shall describe and analyze algorithmic and statistical methods for modern large-scale data. The methods are generic, but we motivate them and also demonstrate their applications on biomedical data. The course combines topics and ideas from genomics*, precision medicine**, machine learning and big data*.
Curriculum (tentative):
· Introduction
· Statistical toolbox: Enrichment analysis: GO, TANGO, GSEA, KM plots, LogRank, Cox, ROC, PR curves
· Motif finding: PRIMA, MEME, Amadeus, DREME
· Clustering: graph formulations, k-means, SOM, hierarchical, CLICK, Newman's algorithm, Consensus, FPF, K-Boost, PCA
· Biclustering: ISA, Samba, Bimax
· Classification: Introduction, dimension reduction, KNN, SVM, Regression, feature selection BHASIC
· Biological networks: Matisse, Cezanne, Network propagation
· Drugs and personalized medicine
· Integrated analysis: Paradigm, iCluster, CoC, Hotnet, SNF, spectral methods
Audience: The course is open to graduate and undergraduate students. Students in the MSc and BSc bioinformatics tracks can take this as a core course.
Prerequisites: Statistics for CS and Algorithms. No background in biology, machine learning or bioinformatics is required. The basic background in biology will be given in the first meetings.
Requirements:
Non-Scribers:
· (70%) Homework assignments involving theory and implementation (can be done in pairs)
· (30%) final exam
Scribers:
· (60%) Homework assignments
· (30%) final exam
· (15%) scribe

· Lecture notes of my course on Gene Expression analysis (covering about half of the material).
· Class presentations and new scribes will be added during the semester.
· Note: Homework assignments are available on Moodle.
Plan (tentative):
Lec. | Date | Topic | Scribe |
1 | 24/10 | Introduction | - |
2 | 31/10 | Statistical toolbox | |
3 | 7/11 | Motif discovery | - |
4 | 14/11 | Clustering 1 | - |
5 | 21/11 | Clustering 2 | Tomer Wolfson |
6 | 28/11 | Biclustering | - |
7 | 5/12 | Classification 1 | - |
8 | 12/12 | Classification 2 | - |
9 | 19/12 | Integration 1 (Nimrod Rappoport) | Shahar Segal |
10 | 26/12 | Integration 2 | Dan Coster |
11 | 2/1 | Systems genetics (Prof. Irit Gat-Viks) | Itay Levy |
12 | 9/1 | Biological networks / EMRs | Itay Harel, Omri Lifshitz |
13 | 16/1 | Drug targets (Prof. Roded Sharan) | David Pellow |
*Genomics and Big Data: Biotechnology today makes it possible to measure many aspects of cellular life on the scale of the whole genome: the DNA, the RNA, proteins, interactions and many more. A typical single measurement ('profile') can produce 10^4 to 10^5 values. A typical medical study can generate multiple profiles for each of 100-1000 patients. Advanced computational methods are being developed to analyze such data, combining algorithms, machine learning, graph theory and statistics.
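To get a rough sense of scale, the figures above can be multiplied out. The following is a minimal sketch, not part of the course material; the function name `study_size`, the default of one profile per patient, and the 8 bytes per value are all assumptions made for illustration.

```python
def study_size(values_per_profile, patients, profiles_per_patient=1, bytes_per_value=8):
    """Return (number of values, size in bytes) for a hypothetical study.

    Assumes every value is stored as one 8-byte float (an assumption,
    not something stated in the course description).
    """
    n_values = values_per_profile * patients * profiles_per_patient
    return n_values, n_values * bytes_per_value


# Smallest and largest cases quoted in the text, one profile per patient.
small = study_size(10**4, 100)
large = study_size(10**5, 1000)
print("small study:", small)
print("large study:", large)
```

Under these assumptions a study ranges from about a million to about a hundred million values per profile type, and the total grows linearly with the number of profiles measured per patient.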
**Precision Medicine: The combination of cheap and accessible biotechnology, advanced computation and big data is expected to change medical practice: rather than one-size-fits-all treatment and drug prescription, care will be tailored to the particular properties of a group of individuals, or even to a single individual. These properties can be based on the patients' genomes (via DNA deep sequencing), their metagenomes (skin, gut and other microbial community genomes, also measured by deep sequencing), their lifestyle (monitored online by wearable devices) and their medical history (available as electronic medical records). Large projects have been initiated with this vision. For example, the US Precision Medicine Initiative, Genomics England's 100,000 Genomes Project, Denmark's GenomeDenmark platform, and commercial projects (e.g. 23andMe, and Regeneron and Geisinger) are collecting genetic and clinical data from hundreds of thousands of patients. The determination of the best treatment based on these data raises major computational challenges, and we shall study some of them.
Contact info: email: rshamir AT tau dot ac dot il; phone: 640-5383; office: Schreiber 014; office hours: by appointment
Picture credits:
· http://bioinformaticsreview.com/20151005/biominer-intro/
· Time magazine
· https://www.linkedin.com/pulse/20140923215637-5241481-artificial-intelligence-to-deliver-personalised-medicine
· https://www.whitehouse.gov/blog/2015/01/30/precision-medicine-initiative-data-driven-treatments-unique-your-own-body