Sunday, Dec 18, 2005, 11:15-12:15
Room 309
--------------------------------------------------------------------------------
Robert Krauthgamer
IBM Research
Title:
Unstructured data: practical algorithms with rigorous
analysis
Abstract:
The plethora of data available nowadays has tremendous
potential, but
analyzing it presents a host of
algorithmic challenges. Current data
sets are mostly unstructured and
noisy, requiring relatively
complicated computational tasks, such
as sequence alignment and
similarity search. Furthermore, the data's enormous size might
severely restrict the computational
paradigm, e.g., to streaming or
sublinear-time algorithms.
I will present some recent rigorous algorithmic approaches
aimed at
explaining and/or predicting
success in practice. In particular, for
problems that are hard in the worst-case,
one may design a plausible
model for real-life data, and then
exploit this ``additional
structure'' to devise improved
algorithms. My running example will be
near neighbor searching in regimes
such as the Euclidean distance and
the edit distance.