Deep Learning Master Class
Tel Aviv University | Wednesday-Thursday, November 5-6, 2014
NOV.5 (WEDNESDAY)
9:30-10:30: Yann LeCun (Facebook, NYU)
TBD
10:30-11:30: Tomaso Poggio (Massachusetts
Institute of Technology) Learning of invariant
representations in visual cortex: M-theory
The theory shows how a hierarchical architecture of
simple-complex cell modules can learn, in an unsupervised way, to
be automatically invariant to transformations of a new object,
achieving the goal of recognition with very few labeled
examples. M-theory makes specific predictions about the
architecture of the ventral stream, including the dependence of
the magnification factor on eccentricity in various areas, and
the tuning properties of its neurons, from early generic,
Gabor-like tuning to class-specific tuning in AIT. This approach
is an example of what could be the next phase in the theory of
learning: how to learn, in an unsupervised way, good
representations that allow a supervised classifier to learn from
very few labeled examples, much as children do.
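As a minimal numerical sketch of the pooling idea behind such
invariant signatures (the 1-D circular shifts below stand in for
a transformation group, and all names and parameters are
illustrative, not details from the talk): the input is projected
onto every transformed copy of a stored template, and the
responses are pooled into a histogram that does not change when
the input itself is transformed.

    import numpy as np

    def invariant_signature(x, template, n_bins=10):
        """Toy M-theory-style signature for one template: dot the input
        with every circular shift of the template (the stand-in
        transformation group), then pool the responses into a histogram."""
        responses = [float(x @ np.roll(template, s)) for s in range(len(x))]
        hist, _ = np.histogram(responses, bins=n_bins, range=(-1.0, 1.0))
        return hist / len(responses)

    # A shifted input yields the same signature as the original input.
    rng = np.random.default_rng(0)
    t = rng.standard_normal(64); t /= np.linalg.norm(t)
    x = rng.standard_normal(64); x /= np.linalg.norm(x)
    print(np.allclose(invariant_signature(x, t),
                      invariant_signature(np.roll(x, 7), t)))  # True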
12:00-13:00: Yaniv Taigman (Facebook) Web-Scale Training
for Face Identification
Scaling machine learning methods to massive datasets has
attracted considerable attention in recent years, thanks to easy
access to ubiquitous sensing and data from the web. Face
recognition is a task of great practical interest for which (i)
very large labeled datasets exist, containing billions of
images; (ii) the number of classes can reach billions; and (iii)
complex features are necessary in order to encode subtle
differences between subjects, while maintaining invariance to
factors such as pose, illumination, and aging. In this talk I
will present an elaborate pipeline and several customized deep
architectures that learn representations which generalize well
to the tasks of face verification and identification.
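Purely as an illustration of how such learned representations
are used downstream (the cosine similarity, threshold, and names
below are generic assumptions, not the specific pipeline of the
talk): verification compares two embeddings against a threshold,
while identification picks the nearest gallery identity.

    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two embedding vectors."""
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def verify(emb_a, emb_b, threshold=0.6):
        """Toy verification rule: same identity if similarity exceeds a
        threshold tuned on a validation set (0.6 is a placeholder)."""
        return cosine(emb_a, emb_b) > threshold

    def identify(query_emb, gallery):
        """Toy identification: return the gallery identity whose stored
        embedding is most similar to the query embedding."""
        return max(gallery, key=lambda name: cosine(query_emb, gallery[name]))

    # gallery maps identity names to embeddings produced by the network.
    gallery = {"alice": np.array([0.9, 0.1]), "bob": np.array([0.1, 0.9])}
    print(identify(np.array([0.8, 0.2]), gallery))  # "alice"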
14:00-15:00: Amnon Shashua (HUJI) SimNets: A
Generalization of Convolutional Networks
We present a deep layered architecture that generalizes
classical convolutional neural networks (ConvNets). The
architecture, called SimNets, is driven by two operators: one is
a similarity function whose family contains the convolution
operator used in ConvNets, and the other is a new
"soft max-min-mean" operator called MEX that realizes classical
operators like ReLU and max-pooling, but has additional
capabilities that make SimNets a powerful generalization of
ConvNets. Two interesting properties emerge from the
architecture: (i) the basic input-to-hidden-units-to-output-nodes
machinery contains a kernel machine as a special case, and
(ii) initializing networks using unsupervised learning is
natural. Experiments demonstrate the capability of achieving
state-of-the-art accuracy with networks that are 1/8 the size of
comparable ConvNets.
This is joint work with Nadav Cohen.
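For concreteness, a small sketch of a MEX-style operator as a
log-mean-exp with an inverse-temperature parameter (the full
SimNets operator also carries learned offsets, omitted here as a
simplifying assumption), showing how a single parameter sweeps
between max, mean, and min:

    import numpy as np

    def mex(x, beta):
        """Soft max-min-mean: (1/beta) * log(mean(exp(beta * x))).
        beta -> +inf gives max, beta -> -inf gives min, and beta -> 0
        gives the arithmetic mean (handled analytically to avoid 0/0)."""
        x = np.asarray(x, dtype=float)
        if beta == 0:
            return x.mean()
        z = beta * x
        m = z.max()  # shift for numerical stability
        return (m + np.log(np.exp(z - m).mean())) / beta

    x = [0.2, -1.0, 3.5]
    print(round(mex(x, 1000.0), 2))   # ~ 3.5  (max)
    print(round(mex(x, -1000.0), 2))  # ~ -1.0 (min)
    print(round(mex(x, 1e-6), 2))     # ~ 0.9  (mean)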
NOV.6 (THURSDAY)
9:30-10:30: Rob Fergus (Facebook) Learning to
Discover Efficient Mathematical Identities
In this talk, I will describe how machine learning
techniques can be applied to the discovery of efficient
mathematical identities. We introduce an attribute grammar
framework for representing symbolic expressions. Given a set of
grammar rules, we build trees that combine different rules,
looking for branches which yield compositions that are
analytically equivalent to a target expression, but of lower
computational complexity. However, as the size of the trees
grows exponentially with the complexity of the target
expression, brute force search is impractical for all but the
simplest of expressions. Consequently, we explore two learning
approaches that are able to learn from simpler expressions to
guide the tree search. The first is a simple n-gram model; the
second is a recursive neural network. We show how these
approaches enable us to derive complex identities, beyond the
reach of brute-force search or human derivation.
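A schematic sketch of this kind of guided search (the grammar
rules, the scorer, and the equivalence test are placeholders,
not the paper's actual attribute grammar or learned models):
candidates are grown by applying rules, and a learned scorer
over the sequence of applied rules decides which branches survive.

    import heapq

    def guided_search(start, rules, score_fn, is_target,
                      beam_width=50, max_depth=10):
        """Beam search over rule applications, guided by a learned scorer.
        `rules` maps rule names to functions expr -> list of rewritten
        exprs; `score_fn` scores a sequence of applied rule names (e.g.
        an n-gram model or a recursive neural network would supply it);
        `is_target` checks whether a candidate matches the target."""
        beam = [(0.0, start, [])]  # (negated score, expression, rule history)
        for _ in range(max_depth):
            candidates = []
            for _, expr, history in beam:
                for name, apply_rule in rules.items():
                    for new_expr in apply_rule(expr):
                        hist = history + [name]
                        candidates.append((-score_fn(hist), new_expr, hist))
            if not candidates:
                return None
            beam = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
            for _, expr, hist in beam:
                if is_target(expr):
                    return expr, hist
        return None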
10:30-11:30: Shai Shalev-Shwartz (HUJI) Accelerating
Stochastic Optimization
Stochastic optimization techniques such as SGD and its
variants are currently the method of choice for shallow and deep
learning from big data. The two main advantages of SGD are the
constant cost of each iteration (which does not depend on the
number of examples) and the ability to overcome local minima.
However, a major disadvantage of SGD is its slow convergence. I
will describe new stochastic optimization algorithms that
converge exponentially faster than SGD.
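The abstract does not name the algorithms; one representative
member of this family of linearly convergent ("exponentially
faster") stochastic methods is variance-reduced SGD in the style
of SVRG, sketched below as an illustration rather than as the
method of the talk (which may instead be a dual coordinate
ascent variant).

    import numpy as np

    def svrg(grad_i, w0, n, step, epochs=30):
        """Variance-reduced SGD sketch: once per epoch, compute the full
        gradient at a reference point; in the inner loop, correct each
        stochastic gradient with the reference gradients so its variance
        vanishes as the iterates approach the optimum."""
        w_ref = np.asarray(w0, dtype=float).copy()
        for _ in range(epochs):
            full_grad = np.mean([grad_i(w_ref, i) for i in range(n)], axis=0)
            w = w_ref.copy()
            for _ in range(2 * n):
                i = np.random.randint(n)
                g = grad_i(w, i) - grad_i(w_ref, i) + full_grad
                w -= step * g
            w_ref = w
        return w_ref

    # Usage on noiseless least squares: f(w) = mean_i 0.5 * (x_i.w - y_i)^2
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true
    grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]
    w_hat = svrg(grad_i, np.zeros(5), n=200, step=0.01)
    print(np.linalg.norm(w_hat - w_true))  # close to 0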
12:00-13:00: Ilya Sutskever (Google) Supervised
Learning with Deep Neural Networks
Deep neural networks have achieved great results in speech,
vision, and language problems. But why do they work so well? I
will argue that large deep neural networks can solve almost any
problem, no matter how difficult, provided we have a large yet
often feasible number of input-output examples. I will then
present my recent work, together with Oriol Vinyals and Quoc Le,
on applying recurrent neural networks to generic
sequence-to-sequence problems, and report its performance on a
large-scale machine translation task.
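A compact sketch of the encoder-decoder idea behind such
sequence-to-sequence models (the corresponding published work
uses multi-layer LSTMs; the GRUs, dimensions, and toy inputs
below are illustrative stand-ins):

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        """Minimal encoder-decoder: an RNN compresses the source sequence
        into a fixed-size state, and a second RNN, initialized with that
        state, predicts the target sequence one token at a time."""
        def __init__(self, src_vocab, tgt_vocab, dim=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, dim)
            self.encoder = nn.GRU(dim, dim, batch_first=True)
            self.decoder = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, tgt_vocab)

        def forward(self, src_tokens, tgt_tokens):
            _, h = self.encoder(self.src_emb(src_tokens))       # encode source
            dec, _ = self.decoder(self.tgt_emb(tgt_tokens), h)  # condition decoder
            return self.out(dec)  # per-position logits over the target vocabulary

    # Usage: a batch of 2 source sequences (length 7) and target prefixes (length 5).
    model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
    src = torch.randint(0, 1000, (2, 7))
    tgt = torch.randint(0, 1200, (2, 5))
    print(model(src, tgt).shape)  # torch.Size([2, 5, 1200])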
15:00-16:00: Yoshua Bengio (Université de
Montréal) Fundamentals
of Deep Learning of Representations
Deep learning has become very popular because of successes
in speech recognition and computer vision, but it remains
unclear to many why it works. To help build that understanding,
this lecture focuses on the basic motivations behind
representation learning, in particular distributed
representations of the kind used in deep learning, and the
idea of depth, i.e., composing features at multiple levels to
capture a hierarchy of abstractions. It will use manifold
learning and the geometric point of view on statistical learning
and generalization to illustrate what deep learning of
representations can bring, backed up by theoretical results
about the potentially exponential gain of deeper networks. This
view aims to shed some light on the question of what makes a good
representation, based on the notion of unfolding manifolds and
disentangling the underlying factors of variation. It is
anchored in the perspective of broad priors as the fundamental
enablers of generalization in high-dimensional spaces, helping
to tackle the curse of dimensionality.
Supported by: Intel's ICRI-CI; the I-CORE Program of the
Planning and Budgeting Committee and the Israel Science Foundation
(grant No. 4/11); and the Raymond and Beverly Sackler Faculty of
Exact Sciences of Tel Aviv University.