This part of the book describes the more ambitious and advanced approaches
to deep learning currently pursued by the research community.
In the previous parts of the book, we have shown how to solve supervised
learning problems—how to learn to map one vector to another, given enough
examples of the mapping.
Not all problems we might want to solve fall into this category. We may
wish to generate new examples, determine how likely some point is, handle
missing values, or take advantage of a large set of unlabeled examples or examples
from related tasks. A shortcoming of the current state of the art for industrial
applications is that our learning algorithms require large amounts of supervised
data to achieve good accuracy. In this part of the book, we discuss some of
the speculative approaches to reducing the amount of labeled data necessary
for existing models to work well and be applicable across a broader range of
tasks. Accomplishing these goals usually requires some form of unsupervised or
semi-supervised learning.
Many deep learning algorithms have been designed to tackle unsupervised
learning problems, but none have truly solved the problem in the same way that
deep learning has largely solved the supervised learning problem for a wide variety of
tasks. In this part of the book, we describe the existing approaches to unsupervised
learning and some of the popular thought about how we can make progress in this
field.
A central cause of the difficulties with unsupervised learning is the high
dimensionality of the random variables being modeled. This brings two distinct
challenges: a statistical challenge and a computational challenge. The statistical
challenge regards generalization: the number of configurations we may want to
distinguish can grow exponentially with the number of dimensions of interest, and
this quickly becomes much larger than the number of examples one can possibly
have (or use with bounded computational resources). The computational challenge
associated with high-dimensional distributions arises because many algorithms for
learning or using a trained model (especially those based on estimating an explicit
probability function) involve intractable computations that grow exponentially
with the number of dimensions.
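To make the statistical challenge concrete, consider the simplest case of d binary variables (a minimal illustration of our own, not an example from the text): the joint distribution has 2^d possible configurations, so the number of configurations a model might need to distinguish outgrows any dataset almost immediately. The same count reappears in the computational challenge, since normalizing an explicit probability function means summing over all of those configurations.

    # Illustration: the number of joint configurations of d binary
    # variables is 2^d.
    for d in (10, 30, 100):
        print(f"d={d}: {2 ** d:.3e} configurations")
    # d=100 already gives ~1.27e30 configurations, vastly more than
    # the number of training examples any dataset could supply, and
    # equally many terms in the sum that computes a normalization
    # constant for an explicit probability function.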
With probabilistic models, this computational challenge arises from the need to
perform intractable inference or simply from the need to normalize the distribution.
• Intractable inference: inference is discussed mostly in Chapter 19. It
regards the question of guessing the probable values of some variables a,
given other variables b, with respect to a model that captures the joint
distribution over all of the variables; a minimal sketch of this computation
follows below.
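The sketch below is our own hedged illustration, using a hypothetical toy joint distribution rather than anything from the text. It shows why such inference becomes intractable: even computing p(a | b) from an explicit table requires summing out every other variable and then normalizing, and the table itself has one entry per joint configuration, a number that grows exponentially with the number of variables.

    import numpy as np

    # Hypothetical toy example: an explicit joint p(a, b, c) over three
    # binary variables, stored as a 2x2x2 table indexed [a, b, c].
    # Inferring p(a | b=1) requires summing out c and then normalizing,
    # the operations that become exponential when there are many
    # variables instead of three.
    rng = np.random.default_rng(0)
    joint = rng.random((2, 2, 2))
    joint /= joint.sum()                     # normalize to a distribution

    b_observed = 1
    unnormalized = joint[:, b_observed, :].sum(axis=-1)  # sum over c
    p_a_given_b = unnormalized / unnormalized.sum()      # normalize over a
    print(p_a_given_b)                       # p(a | b=1) for a in {0, 1}

With three binary variables these sums have a handful of terms, but with d variables the marginalization and the normalization each range over a number of configurations exponential in d, which is what makes exact inference and normalization intractable for high-dimensional models.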