CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
Of course, the utility of a graphical model is that the graph implies that some
variables do not interact directly. The complete graph is not very useful because it
does not imply any independences.
When we represent a probability distribution with a graph, we want to choose
a graph that implies as many independences as possible, without implying any
independences that do not actually exist.
From this point of view, some distributions can be represented more efficiently
using directed models, while other distributions can be represented more efficiently
using undirected models. In other words, directed models can encode some
independences that undirected models cannot encode, and vice versa.
Directed models are able to use one specific kind of substructure that undirected
models cannot represent perfectly. This substructure is called an immorality. The
structure occurs when two random variables
a
and
b
are both parents of a third
random variable
c
, and there is no edge directly connecting
a
and
b
in either
direction. (The name “immorality” may seem strange; it was coined in the graphical
models literature as a joke about unmarried parents.) To convert a directed model
with graph
D
into an undirected model, we need to create a new graph
U
. For
every pair of variables
x
and
y
, we add an undirected edge connecting
x
and
y
to
U
if there is a directed edge (in either direction) connecting
x
and
y
in
D
or if
x
and
y
are both parents in
D
of a third variable
z
. The resulting
U
is known as
a moralized graph. See Fig. 16.11 for examples of converting directed models to
undirected models via moralization.
Likewise, undirected models can include substructures that no directed model
can represent perfectly. Specifically, a directed graph D cannot capture all of the
conditional independences implied by an undirected graph
U
if
U
contains a loop
of length greater than three, unless that loop also contains a chord. A loop is
a sequence of variables connected by undirected edges, with the last variable in
the sequence connected back to the first variable in the sequence. A chord is a
connection between any two non-consecutive variables in the sequence defining a
loop. If
U
has loops of length four or greater and does not have chords for these
loops, we must add the chords before we can convert it to a directed model. Adding
these chords discards some of the independence information that was encoded in
U
. The graph formed by adding chords to
U
is known as a chordal or triangulated
graph, because all the loops can now be described in terms of smaller, triangular
loops. To build a directed graph
D
from the chordal graph, we need to also assign
directions to the edges. When doing so, we must not create a directed cycle in
D
, or the result does not define a valid directed probabilistic model. One way
to assign directions to the edges in
D
is to impose an ordering on the random
580