$f'(x + \epsilon) > 0$ for small enough $\epsilon$. In other words, as we move right, the slope begins
to point uphill to the right, and as we move left, the slope begins to point uphill
to the left. Thus, when
$f'(x) = 0$ and $f''(x) > 0$, we can conclude that $x$ is a local
minimum. Similarly, when
$f'(x) = 0$ and $f''(x) < 0$, we can conclude that $x$ is a
local maximum. This is known as the second derivative test. Unfortunately, when
$f''(x) = 0$, the test is inconclusive. In this case $x$ may be a saddle point, or a part
of a flat region.
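To make the test concrete, here is a minimal sketch in Python that applies the univariate second derivative test numerically, approximating $f'$ and $f''$ by central finite differences. The helper name second_derivative_test, the step h, and the tolerance tol are illustrative assumptions, not anything specified in the text.

def second_derivative_test(f, x, h=1e-5, tol=1e-6):
    """Classify a critical point x of f as 'minimum', 'maximum', or 'inconclusive'."""
    f1 = (f(x + h) - f(x - h)) / (2 * h)            # approximate f'(x)
    f2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2  # approximate f''(x)
    assert abs(f1) < 1e-4, "x does not appear to be a critical point"
    if f2 > tol:
        return "minimum"       # f'(x) = 0 and f''(x) > 0
    if f2 < -tol:
        return "maximum"       # f'(x) = 0 and f''(x) < 0
    return "inconclusive"      # f''(x) = 0: saddle point or flat region

print(second_derivative_test(lambda x: x ** 2, 0.0))   # minimum
print(second_derivative_test(lambda x: -x ** 2, 0.0))  # maximum
print(second_derivative_test(lambda x: x ** 3, 0.0))   # inconclusive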
In multiple dimensions, we need to examine all of the second derivatives of the
function. Using the eigendecomposition of the Hessian matrix, we can generalize
the second derivative test to multiple dimensions. At a critical point, where
$\nabla_{\boldsymbol{x}} f(\boldsymbol{x}) = 0$, we can examine the eigenvalues of the Hessian to determine whether
the critical point is a local maximum, local minimum, or saddle point. When the
Hessian is positive definite (all its eigenvalues are positive), the point is a local
minimum. This can be seen by observing that the directional second derivative
in any direction must be positive, and making reference to the univariate second
derivative test. Likewise, when the Hessian is negative definite (all its eigenvalues
are negative), the point is a local maximum. In multiple dimensions, it is actually
possible to find positive evidence of saddle points in some cases. When at least
one eigenvalue is positive and at least one eigenvalue is negative, we know that
$\boldsymbol{x}$ is a local maximum on one cross section of $f$ but a local minimum on another
cross section. See Fig. 4.5 for an example. Finally, the multidimensional second
derivative test can be inconclusive, just like the univariate version. The test is
inconclusive whenever all of the non-zero eigenvalues have the same sign, but at
least one eigenvalue is zero. This is because the univariate second derivative test is
inconclusive in the cross section corresponding to the zero eigenvalue.
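This classification can be written down directly from the eigenvalues of the Hessian. The sketch below, assuming NumPy and a hypothetical helper classify_critical_point, applies the multidimensional second derivative test to two-dimensional examples; the Hessians are written out analytically for simple quadratic and saddle-shaped functions.

import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eigvals = np.linalg.eigvalsh(hessian)
    if np.all(eigvals > tol):
        return "local minimum"   # positive definite
    if np.all(eigvals < -tol):
        return "local maximum"   # negative definite
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"    # at least one positive and one negative eigenvalue
    return "inconclusive"        # all non-zero eigenvalues share a sign, but one is zero

# f(x, y) = x^2 + y^2: Hessian [[2, 0], [0, 2]] at the origin -> local minimum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))
# f(x, y) = x^2 - y^2: Hessian [[2, 0], [0, -2]] at the origin -> saddle point
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
# f(x, y) = x^2: Hessian [[2, 0], [0, 0]] -> inconclusive (one zero eigenvalue)
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 0.0]])))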
In multiple dimensions, there can be a wide variety of different second derivatives
at a single point, because there is a different second derivative for each direction.
The condition number of the Hessian measures how much the second derivatives
vary. When the Hessian has a poor condition number, gradient descent performs
poorly. This is because in one direction, the derivative increases rapidly, while in
another direction, it increases slowly. Gradient descent is unaware of this change
in the derivative, so it does not know that it needs to explore preferentially in
the direction where the derivative remains negative for longer. Poor conditioning also makes it
difficult to choose a good step size. The step size must be small enough to avoid
overshooting the minimum and going uphill in directions with strong positive
curvature. This usually means that the step size is too small to make significant
progress in other directions with less curvature. See Fig. 4.6 for an example.
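As a small numerical illustration of this effect (the matrix, step size, and iteration count below are assumptions chosen for the example, not values from the text), consider gradient descent on the quadratic $f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^\top \boldsymbol{H}\boldsymbol{x}$ with a Hessian whose condition number is 100. The step size must be roughly $1/\lambda_{\max}$ to avoid overshooting along the high-curvature direction, which leaves progress along the low-curvature direction very slow.

import numpy as np

H = np.diag([1.0, 100.0])            # Hessian with eigenvalues 1 and 100
print(np.linalg.cond(H))             # condition number: 100.0

x = np.array([10.0, 10.0])
step = 1.0 / 100.0                   # limited by the largest eigenvalue
for _ in range(100):
    grad = H @ x                     # gradient of f(x) = 0.5 * x^T H x
    x = x - step * grad

print(x)  # about [3.66, 0.0]: the low-curvature coordinate has barely moved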
This issue can be resolved by using information from the Hessian matrix to guide the search.