Seminar – Data Science
Tova Milo, 2016
Meetings: Tuesdays 16-18
Seminar Information
The seminar focuses on managing, analyzing, sharing, and integrating big data. We shall read recent papers in this area, focusing on several specific issues, and then explore possible future directions. A tentative list of topics and papers appears below.
Tentative list of papers
Management of Big Data
1. Minimal MapReduce Algorithms. Yufei Tao, Wenqing Lin, Xiaokui Xiao. SIGMOD'13. Tuval Rotem, 15/3. Slides
2. Upper and Lower Bounds on the Cost of a Map-Reduce Computation. Foto Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey Ullman. VLDB'13. Ilia Shevrin, 22/3. Slides
3. Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud'10. Omri Zimbler, 29/3. Slides
4. Shark: SQL and Rich Analytics at Scale. Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica. SIGMOD'13. Jacob Komarovski, 29/3. Slides
5. Spark SQL: Relational Data Processing in Spark. Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia. SIGMOD'15
Data Exploration
6. Explore-by-Example: An Automatic Query Steering Framework for Interactive Data Exploration. Kyriaki Dimitriadou, Olga Papaemmanouil, Yanlei Diao. SIGMOD'14. Adi Berger, 5/4. Slides
7. SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics. Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, Alkis Polyzotis. VLDB'16. Gilad Rubin, 3/5
8. Discovering Queries based on Example Tuples. Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, Lev Novik. SIGMOD'14. Uri Berge, 10/5. Slides
9. Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff. Souvik Bhattacherjee, Amit Chavan, Silu Huang, Amol Deshpande, Aditya Parameswaran. VLDB'15. Julya Yaroslavski, 10/5
10. Interactive Data Exploration Using Semantic Windows. Alexander Kalinin, Ugur Cetintemel, Stan Zdonik. SIGMOD'14
Access, Analysis and Interaction
11. Explaining Query Answers with Explanation-Ready Databases. Sudeepa Roy, Laurel Orr, Dan Suciu. VLDB'15. Tamar Shevach, 17/5
12. Mining Subjective Properties on the Web. Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul Gupta. SIGMOD'15. Amir Taubenfeld, 24/5. Slides
13. Automatic Enforcement of Data Use Policies with DataLawyer. Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu. SIGMOD'15. Ariela Naftulishen, 7/6
14. Wrangler: Interactive Visual Specification of Data Transformation Scripts. Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, Jeffrey Heer. CHI'11
Data Cleaning
15. Data X-Ray: A Diagnostic Tool for Data Errors. Xiaolan Wang, Xin Luna Dong, Alexandra Meliou. SIGMOD'15. Tomer Amir, 7/6. Slides
16. BigDansing: A System for Big Data Cleansing. Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin. SIGMOD'15