Real evolution finds the right direction by trying and eliminating many wrong ones. It is not known to keep a memory of the changes that were previously successful. Instead, many individuals are generated with particular characteristics in different dimensions, and the individuals that have moved in the wrong direction are simply eliminated. Nature has an excessive number of individuals at its disposal, and there is nothing wrong with wasting a few. For an optimization algorithm of practical value, however, every time step is valuable. Instead of wasting resources on a large population, I propose gaining efficiency with a short-term memory.
Note that in Figure , the right direction to move could be
discovered by summing a few successful steps. This is exactly the approach
that will be taken.
Adding up too many vectors, however, is dangerous. In Figure ,
things would have worked fine, because there is one right direction to go, but
for a function like Rosenbrock's saddle in Figure
, the right
direction is constantly changing. Experiments with high-dimensional
optimization problems suggest that the turning-ridge problem is a common one.
Thus, the algorithm should be able to adapt itself to the local terrain it is
moving through.
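As a toy illustration of summing successful steps (the numbers below are made up for illustration, not taken from any figure), adding a handful of short uphill steps cancels most of their sideways error and leaves a vector pointing along the true direction:

    # Three short successful steps, each slightly off the true direction.
    steps = [(0.10, 0.02), (0.09, -0.01), (0.11, 0.01)]
    direction = tuple(map(sum, zip(*steps)))  # ~(0.30, 0.02): sideways errors cancel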
Figure:
Rosenbrock's saddle. The function has a ridge along y = x^2 that rises
slowly as it is traversed from (-1, 1) to (1, 1). The arrows show the steps
taken by the improved Local-Optimize. Note how the steps shrink in
order to discover a new direction, and grow to speed up once the direction is
established.
Figure:
An illustration of how Procedure 4-1 uses the vectors u and v.
In (A) the step vector v keeps growing in the same direction as long as
better points are found. u accumulates the moves made in this
direction. Once v fails, the algorithm tries new directions. In (B) a
new direction has been found. The combination u+v is tried and
found to give a better point. In this case u starts accumulating moves
in this new direction. In (C) u+v did not give a better point.
Thus we start moving in the last direction known to have worked, in this case
v.
The Local-Optimize
in Procedure 3-1 of the previous chapter is able to
adapt its step sizes to the local terrain. Procedure 4-1 gives an improved
version that can adapt its direction as well. The local optimizer moves
along its best estimate of the gradient direction. Vector u is used to
accumulate moves made along this direction. When this direction fails, vector
v enters its random probing mode, and searches until it can find a new
direction that works. When a new direction is found, the algorithm tries
u+v as a new estimate of the gradient direction. Either this
succeeds, and u starts accumulating moves along u+v, or
it fails and the new v becomes the new estimate. Figure
explains the algorithm in pictures. As shown in the figure, the algorithm
keeps adapting its estimate of the gradient vector according to the local
terrain.
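To make the mechanism concrete, here is a minimal Python sketch of such a direction-adapting local optimizer. It is not a verbatim reproduction of Procedure 4-1: the growth and shrink factors, the Gaussian random probes, the exact bookkeeping of u, and the fact that u+v is tried after every successful step are simplifying assumptions made for the sketch.

    import math
    import random

    def local_optimize(f, x, iterations=10000, init_step=0.1, min_step=1e-12,
                       grow=2.0, shrink=0.5):
        """Maximize f by hill climbing with an adaptive step vector v and an
        accumulator u that serves as the estimate of the gradient direction.
        A sketch of the mechanism described in the text, not the original
        procedure."""
        n = len(x)
        fx = f(x)
        v = [random.uniform(-init_step, init_step) for _ in range(n)]  # step vector
        u = [0.0] * n                                                  # accumulated moves

        for _ in range(iterations):
            xv = [xi + vi for xi, vi in zip(x, v)]
            fxv = f(xv)
            if fxv > fx:
                # The step worked; also try the combination u + v.
                xuv = [xi + ui for xi, ui in zip(xv, u)]
                fxuv = f(xuv)
                if fxuv > fxv:
                    x, fx = xuv, fxuv                      # u + v gave a better point
                    u = [ui + vi for ui, vi in zip(u, v)]  # keep accumulating along it
                else:
                    x, fx = xv, fxv                        # fall back to v alone
                    u = list(v)                            # restart the accumulator
                v = [grow * vi for vi in v]                # grow the step to speed up
            else:
                # The step failed: shrink the step size and probe a random direction.
                size = shrink * math.sqrt(sum(vi * vi for vi in v))
                if size < min_step:
                    break
                d = [random.gauss(0.0, 1.0) for _ in range(n)]
                dn = math.sqrt(sum(di * di for di in d)) or 1.0
                v = [size * di / dn for di in d]
        return x, fx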
This change improved the performance of the algorithm by
an order of magnitude in complex search spaces. Figure shows
the result of running this algorithm on Rosenbrock's saddle. The function
has a ridge along the y=x^2 parabola that gradually rises from (-1,1)
to (1,1). The global maximum is at (1,1), and the algorithm was started at
the other end of the ridge, so that it had to traverse the whole parabola.
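The text does not give the exact formula, so the following encoding is an assumption: the conventional Rosenbrock function (with the usual scaling constant of 100), negated so that the task becomes maximization with the global maximum at (1, 1). It reuses the local_optimize sketch above.

    def rosenbrock_saddle(p):
        # Negated Rosenbrock function: a ridge along y = x**2 rising toward
        # the global maximum at (1, 1).
        x, y = p
        return -(100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2)

    # Start at the far end of the ridge, as in the experiment described above.
    best_x, best_f = local_optimize(rosenbrock_saddle, [-1.0, 1.0])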
As illustrated in Figure
, the more adaptive Procedure 4-1 is
more than an order of magnitude more efficient than Procedure 3-1 of the
previous chapter.
Figure:
Comparison of the two versions of Local-Optimize
from chapter 3 and
chapter 4 on Rosenbrock's saddle. Note that both axes are logarithmic. The
old version converges in 8782 iterations, while the new version finds a better
value in 197 iterations.