Many variants of ICA are not generative models in the sense that we use the
phrase. In this book, a generative model either represents p(x) or can draw
samples from it. Many variants of ICA only know how to transform between x
and h, but have no way of representing p(h), and thus do not impose a
distribution over x. For example, many ICA variants aim to increase the sample
kurtosis of h = W^{-1} x, because high kurtosis indicates that p(h) is
non-Gaussian, but this is accomplished without explicitly representing p(h).
This is acceptable because ICA is more often used as an analysis tool for
separating signals than for generating data or estimating its density.
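As a rough illustration of the kurtosis criterion (a minimal NumPy sketch under
our own conventions, not the objective of any particular ICA algorithm), one can
measure the excess kurtosis of each candidate component and seek an unmixing
matrix that makes it large in magnitude:

import numpy as np

def excess_kurtosis(h):
    # h: recovered components, shape (num_components, num_samples).
    # Standardize each row, then compute E[h^4] - 3, which is zero
    # for a Gaussian and far from zero for heavy-tailed signals.
    h = (h - h.mean(axis=1, keepdims=True)) / h.std(axis=1, keepdims=True)
    return (h ** 4).mean(axis=1) - 3.0

# With observations x of shape (num_components, num_samples) and a
# candidate unmixing matrix W_inv (hypothetical names), the quantity
# an ICA variant of this kind drives up is:
#     np.abs(excess_kurtosis(W_inv @ x)).sum()

Note that nothing here represents p(h) itself; the score is only a statistic of
the transformed samples, which is exactly the sense in which such variants are
not generative models.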
Just as PCA can be generalized to the nonlinear autoencoders described in
Chapter 14, ICA can be generalized to a nonlinear generative model, in which
we use a nonlinear function f to generate the observed data. See Hyvärinen
and Pajunen (1999) for the initial work on nonlinear ICA; it was later applied
successfully with ensemble learning by Roberts and Everson (2001) and
Lappalainen et al. (2000). Another nonlinear extension of ICA is nonlinear
independent components estimation, or NICE (Dinh et al., 2014), which stacks a
series of invertible transformations (encoder stages), each with a Jacobian
whose determinant can be computed efficiently. This makes it possible to
compute the likelihood exactly. Like ICA, NICE attempts to transform the data
into a space where it has a factorized marginal distribution, but it is more
likely to succeed thanks to the nonlinear encoder. Because the encoder is
paired with a decoder that is its perfect inverse, it is straightforward to
generate samples from the model: first sample h from p(h), then apply the
decoder.
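To make the likelihood computation concrete, here is a minimal sketch of one
additive coupling step in the spirit of NICE, with a hypothetical shift
function m and a factorized standard Gaussian prior p(h) (our own simplifying
assumptions, not Dinh et al.'s exact architecture). The Jacobian of this step
is triangular with unit diagonal, so its determinant is 1 and the change of
variables log p(x) = log p(h) + log |det J| reduces to the prior term alone:

import numpy as np

def coupling_forward(x, m):
    # Leave the first half of the input unchanged and shift the
    # second half by an arbitrary function m of the first half.
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + m(x1)])

def coupling_inverse(h, m):
    # Exact inverse: subtract the same shift.
    h1, h2 = np.split(h, 2)
    return np.concatenate([h1, h2 - m(h1)])

def log_likelihood(x, m):
    # log p(x) = log p(h) + log |det J|; for additive coupling the
    # determinant is 1, so the second term vanishes exactly.
    h = coupling_forward(x, m)
    return -0.5 * np.sum(h ** 2) - 0.5 * h.size * np.log(2 * np.pi)

# m is a hypothetical nonlinearity; sampling applies the decoder
# (the inverse) to h drawn from the factorized prior p(h).
m = lambda a: np.tanh(a)
x = np.random.randn(4)
assert np.allclose(coupling_inverse(coupling_forward(x, m), m), x)

Stacking several such steps (alternating which half is shifted) keeps both the
exact inverse and the tractable determinant, which is what makes the overall
likelihood exact.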
Another generalization of ICA is to learn groups of features, with statistical
dependence allowed within a group but discouraged between groups (Hyvärinen
and Hoyer, 1999; Hyvärinen et al., 2001b). When the groups of related units are
chosen to be non-overlapping, this is called independent subspace analysis. It is also
possible to assign spatial coordinates to each hidden unit and form overlapping
groups of spatially neighboring units. This encourages nearby units to learn similar
features. When applied to natural images, this topographic ICA approach learns
Gabor filters, such that neighboring features have similar orientation, location, or
frequency. Many different phase offsets of similar Gabor functions occur within
each region, so that pooling over small regions yields translation invariance.
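A minimal sketch of that pooling step, assuming the hidden units are already
arranged into contiguous groups (hypothetical layout and names): the pooled
response is the energy (root sum of squares) over each group, which is
unchanged when activation shifts among the phase offsets within a group.

import numpy as np

def group_energy(h, group_size):
    # h: feature activations, length divisible by group_size.
    # Pool each contiguous group by its L2 norm (its "energy"),
    # discarding within-group phase information.
    groups = h.reshape(-1, group_size)
    return np.sqrt((groups ** 2).sum(axis=1))

# Two responses of a quadrature pair (cos/sin phases of one Gabor):
# translating the stimulus rotates activation within the pair but
# leaves the pooled energy unchanged.
theta = 0.3
h_a = np.array([np.cos(theta), np.sin(theta)])
h_b = np.array([np.cos(theta + 0.5), np.sin(theta + 0.5)])
assert np.allclose(group_energy(h_a, 2), group_energy(h_b, 2))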
13.3 Slow Feature Analysis
Slow feature analysis (SFA) is a linear factor model that uses information from
time signals to learn invariant features (Wiskott and Sejnowski, 2002).