Seminar – Data Science

Tova Milo, 2016

Meetings: Tuesdays 16-18

Seminar Information

The seminar focuses on managing, analyzing, sharing, and integrating big data.

We shall read recent papers in this area, focusing on several specific issues, and then explore possible future directions. A tentative list of topics and papers appears below.

Tentative list of papers

Management of Big Data

1. Minimal MapReduce Algorithms. Yufei Tao, Wenqing Lin, Xiaokui Xiao. SIGMOD'13. Presenter: Tuval Rotem, 15/3 (slides)

2. Upper and Lower Bounds on the Cost of a Map-Reduce Computation. Foto Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey Ullman. VLDB'13. Presenter: Ilia Shevrin, 22/3 (slides)

3. Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud'10. Presenter: Omri Zimbler, 29/3 (slides)

4. Shark: SQL and Rich Analytics at Scale. Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica. SIGMOD'13. Presenter: Jacob Komarovski, 29/3 (slides)

5. Spark SQL: Relational Data Processing in Spark. Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia. SIGMOD'15

Data Exploration

6. Explore-by-Example: An Automatic Query Steering Framework for Interactive Data Exploration. Kyriaki Dimitriadou, Olga Papaemmanouil, Yanlei Diao. SIGMOD'14. Presenter: Adi Berger, 5/4 (slides)

7. SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics. Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, Alkis Polyzotis. VLDB'16. Presenter: Gilad Rubin, 3/5

8. Discovering Queries Based on Example Tuples. Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, Lev Novik. SIGMOD'14. Presenter: Uri Berge, 10/5 (slides)

9. Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff. Souvik Bhattacherjee, Amit Chavan, Silu Huang, Amol Deshpande, Aditya Parameswaran. VLDB'15. Presenter: Julya Yaroslavski, 10/5

10. Interactive Data Exploration Using Semantic Windows. Alexander Kalinin, Ugur Cetintemel, Stan Zdonik. SIGMOD'14

Access, Analysis and Interaction

11. Explaining Query Answers with Explanation-Ready Databases. Sudeepa Roy, Laurel Orr, Dan Suciu. VLDB'15. Presenter: Tamar Shevach, 17/5

12. Mining Subjective Properties on the Web. Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul Gupta. SIGMOD'15. Presenter: Amir Taubenfeld, 24/5 (slides)

13. Automatic Enforcement of Data Use Policies with DataLawyer. Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu. SIGMOD'15. Presenter: Ariela Naftulishen, 7/6

14. Wrangler: Interactive Visual Specification of Data Transformation Scripts. Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, Jeffrey Heer. CHI'11

Data Cleaning

15. Data X-Ray: A Diagnostic Tool for Data Errors. Xiaolan Wang, Xin Luna Dong, Alexandra Meliou. SIGMOD'15. Presenter: Tomer Amir, 7/6 (slides)

16. BigDansing: A System for Big Data Cleansing. Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin. SIGMOD'15