5 Algorithm
5.1 Multistart overview
A pseudo-code description of the MSNLP algorithm follows, in which
SP(xt) denotes the starting point generator and xt is the candidate
starting point produced. We refer to the local NLP solver as L(xs,
xf), where xs is the starting point and xf the final point. The
function UPDATE LOCALS(xs, xf, w) processes and stores solver output
xf, using the starting point xs to compute the distance from xs to
xf, and produces updated penalty weights, w.
MSNLP Algorithm

STAGE 1
  x0 = user initial point
  Call L(x0, xf)
  Call UPDATE LOCALS(x0, xf, w)
  FOR i = 1, n1 DO
    Call SP(xt(i))
    Evaluate P(xt(i), w)
  ENDDO
  xt* = point yielding the best value of P(xt(i), w)
        over all stage 1 points (i = 1, 2, ..., n1)
  Call L(xt*, xf)
  Call UPDATE LOCALS(xt*, xf, w)
  threshold = P(xt*, w)

STAGE 2
  FOR i = 1, n2 DO
    Call SP(xt(i))
    Evaluate P(xt(i), w)
    Perform merit and distance filter tests:
      Call distance filter(xt(i), dstatus)
      Call merit filter(xt(i), threshold, mstatus)
    IF (dstatus = 'accept' AND mstatus = 'accept') THEN
      Call L(xt(i), xf)
      Call UPDATE LOCALS(xt(i), xf, w)
    ENDIF
  ENDDO
After an initial call to L at the user-provided initial point, x0,
stage 1 of the algorithm performs n1 iterations in which SP(xt) is
called, and the L1 exact penalty value P(xt,w) is calculated. The
user can set n1 through the MSNLP options structure using the
STAGE1_ITERATIONS keyword. The point with the smallest of these P
values is chosen as the starting point for the next call to L, which
begins stage 2. In this stage, n2 iterations are performed in which
candidate starting points are generated, and L is started at any one
that passes the distance and merit filter tests.
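For concreteness, a minimal Python sketch of this two-stage logic
follows. The names local_solve, sample_point, penalty, update_locals,
and the two filter predicates stand in for L, SP, P, UPDATE LOCALS,
and the filter tests; they are illustrative placeholders, not part of
the MSNLP interface, and the merit filter's threshold bookkeeping is
elaborated separately below.

import math

def msnlp(x0, n1, n2, local_solve, sample_point, penalty,
          update_locals, distance_filter, merit_filter):
    # Initial solver call from the user-provided point.
    xf = local_solve(x0)
    w = update_locals(x0, xf)

    # Stage 1: sample n1 points and keep the one with the best penalty.
    best_xt, best_p = None, math.inf
    for _ in range(n1):
        xt = sample_point()
        p = penalty(xt, w)
        if p < best_p:
            best_xt, best_p = xt, p
    xf = local_solve(best_xt)
    w = update_locals(best_xt, xf)
    threshold = best_p

    # Stage 2: start the solver only at points passing both filters.
    for _ in range(n2):
        xt = sample_point()
        p = penalty(xt, w)
        if distance_filter(xt) and merit_filter(p, threshold):
            xf = local_solve(xt)
            w = update_locals(xt, xf)
    return w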
The distance filter helps ensure that the starting points for L are
diverse, in the sense that they are not too close to any previously
found local solution. Its goal is to prevent L from starting more
than once within the basin of attraction of any local optimum. When
a local solution is found, it is stored in a linked list, ordered by
its objective value, as is the Euclidean distance between it and the
starting point that led to it. If a local solution is located more
than once, the maximum of these distances, maxdist, is updated and
stored. For each trial point, t, if the distance between t and any
local solution already found is less than DISTANCE_FACTOR*maxdist,
L is not started from that point, and the next trial point is
obtained from the generator.
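A minimal sketch of the distance filter in Python, assuming each
stored local solution is paired with its maxdist value; the data
layout and names are illustrative.

import math

def distance_filter(t, local_solutions, distance_factor=1.0):
    # local_solutions: list of (x, maxdist) pairs for solutions found so far.
    for x, maxdist in local_solutions:
        dist = math.dist(t, x)   # Euclidean distance (Python 3.8+)
        if dist < distance_factor * maxdist:
            return False         # inside an estimated basin: reject
    return True                  # diverse enough: accept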
This distance filter implicitly assumes that the attraction basins
are spherical, with radii at least maxdist. The default value of
DISTANCE_FACTOR is 1.0, and it can be set to any positive value. As
DISTANCE_FACTOR approaches zero, the filtering effect vanishes, as
would be appropriate if there were many closely spaced local
solutions. As it becomes larger than 1, the filtering effect
increases until eventually L is never started.
The merit filter helps ensure that the starting points for L have
high quality, by not starting from candidate points whose exact
penalty function value P1 is greater than a threshold. This
threshold is set initially to the P1 value of the best candidate
point found in the first stage of the algorithm. If trial points are
rejected by this test for more than WAITCYCLE consecutive
iterations, the threshold is increased by the updating rule:
threshold ← threshold + THRESHOLD_INCREASE_FACTOR*(1.0 + abs(threshold))
where the default value of THRESHOLD_INCREASE_FACTOR is 0.2 and
that for WAITCYCLE is 20. The additive 1.0 term is included so that
threshold increases by at least THRESHOLD_INCREASE_FACTOR when its
current value is near zero. When a trial point is accepted by the
merit filter, threshold is decreased by setting it to the P1 value
of that point.
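The merit filter's threshold bookkeeping can be sketched as a small
stateful class, assuming the rejection counter resets whenever the
threshold changes; the class and attribute names are illustrative.

class MeritFilter:
    def __init__(self, threshold, waitcycle=20, increase_factor=0.2):
        self.threshold = threshold               # initialized from stage 1
        self.waitcycle = waitcycle               # WAITCYCLE
        self.increase_factor = increase_factor   # THRESHOLD_INCREASE_FACTOR
        self.rejections = 0

    def accept(self, p1):
        if p1 <= self.threshold:
            self.threshold = p1                  # tighten to the accepted P1
            self.rejections = 0
            return True
        self.rejections += 1
        if self.rejections > self.waitcycle:
            # Updating rule from the text.
            self.threshold += self.increase_factor * (1.0 + abs(self.threshold))
            self.rejections = 0
        return False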
The combined effect of these two filters is that L is started at only
a few percent of the trial points, yet global optimal solutions are
found for a very high percentage of the test problems. However, the
chances of finding a global optimum are increased by increasing
ITERATION_LIMIT (which we recommend trying first) or by 'loosening'
either or both filters, although this is rarely necessary if the
dynamic filters and basin overlap fix are used, as they are by
default. If the ratio of stage 2 iterations to solver calls is more
than 20 using the current filter parameters, and computation times
with the default filter parameters are reasonable, one can try
loosening the filters. This is achieved for the merit filter either
by decreasing WAITCYCLE or by increasing THRESHOLD_INCREASE_FACTOR
(or doing both), and for the distance filter by decreasing
DISTANCE_FACTOR. Either or both filters may be turned off, by
setting USE_DISTANCE_FILTER and/or USE_MERIT_FILTER to 0.
Turning off both causes an NLP solver call at every stage 2 trial
point. This is the best way to ensure that all local optima are
found, but it can take a long time.
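As an illustration only, loosening both filters and raising the
iteration limit might look like the following; the keyword names are
taken from the text, but the syntax of the actual MSNLP options
structure depends on the installation, so treat this dictionary form
as hypothetical.

options = {
    "STAGE1_ITERATIONS": 200,           # n1
    "ITERATION_LIMIT": 5000,            # raise this first
    "WAITCYCLE": 10,                    # smaller: threshold relaxes sooner
    "THRESHOLD_INCREASE_FACTOR": 0.4,   # larger: bigger threshold jumps
    "DISTANCE_FACTOR": 0.5,             # smaller: weaker distance filtering
    "USE_DISTANCE_FILTER": 1,           # 0 turns the filter off
    "USE_MERIT_FILTER": 1,              # 0 turns the filter off
}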
5.2 Pure and smart random drivers
The 'pure' random (PR) driver generates uniformly distributed points
within the hyper-rectangle S defined by the variable bounds.
However, this rectangle is often very large, because users often set
bounds to (-inf,+inf), (0,+inf), or to large positive and/or
negative numbers, particularly in problems with many variables. This
usually has little adverse impact on a good local solver, as long as
the starting point is chosen well inside the bounds. But the PR
generator will often generate starting points with very large
absolute component values when some bounds are very large, and this
sharply degrades solver performance. This motivated the development
of random generators that control the likelihood of generating
candidate points with large components, and that intensify the search
by focusing points into promising regions. We present two variants, one
using normal, the other triangular distributions. Pseudo-code for
this 'smart random' generator using normal distributions follows,
where w is the set of penalty weights determined by the 'update
locals' logic discussed above, after the first solver call at the
user-specified initial point.
Smart Random Generator with Normal Distributions, SRN(xt)

IF (first call) THEN
  Generate k1 (default 400) diverse points in S and
  evaluate the exact penalty function P(x, w) at each point.
  B = subset of S with the k2 (default 10) best P values
  FOR i = 1, nvars DO
    xmax(i) = max of component i over points in B
    xmin(i) = min of component i over points in B
    mu(i) = (xmax(i) + xmin(i))/2
    ratio(i) = (xmax(i) - xmin(i))/(1 + buvar(i) - blvar(i))
    sigfactor = 2.0
    IF (ratio(i) > 0.7) sigfactor = f(ratio(i))
    sigma(i) = (xmax(i) - xmin(i))/sigfactor
  ENDDO
ENDIF
FOR i = 1, nvars DO
  Generate a normally distributed random variable rv(i)
  with mean mu(i) and standard deviation sigma(i)
  IF (blvar(i) <= rv(i) <= buvar(i)), xt(i) = rv(i)
  IF (rv(i) < blvar(i)), generate xt(i) uniformly
    between blvar(i) and xmin(i)
  IF (rv(i) > buvar(i)), generate xt(i) uniformly
    between xmax(i) and buvar(i)
ENDDO
Return xt
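A self-contained Python sketch of SRN follows. The sampler for the k1
diverse points and the function f are assumptions: only f's values
near 0.7-0.8 (2.56) and above 0.999 (6.2) are given in the text, so
the interpolation between them is illustrative.

import random

def f(ratio):
    # Known values: 2.56 for ratio in (0.7, 0.8], 6.2 for ratio > 0.999.
    # The intermediate 'steps' are unspecified; linear growth is assumed
    # here only to keep the sketch self-contained.
    if ratio <= 0.8:
        return 2.56
    if ratio > 0.999:
        return 6.2
    return 2.56 + (6.2 - 2.56) * (ratio - 0.8) / (0.999 - 0.8)

def srn_setup(sample_diverse, penalty, w, blvar, buvar, k1=400, k2=10):
    # Sample k1 diverse points, keep the k2 best by penalty value, and
    # fit one normal distribution per variable to their bounding box.
    pts = sorted((sample_diverse() for _ in range(k1)),
                 key=lambda x: penalty(x, w))
    best = pts[:k2]
    n = len(blvar)
    xmin = [min(p[i] for p in best) for i in range(n)]
    xmax = [max(p[i] for p in best) for i in range(n)]
    mu, sigma = [], []
    for i in range(n):
        width = xmax[i] - xmin[i]
        ratio = width / (1.0 + buvar[i] - blvar[i])
        sigfactor = f(ratio) if ratio > 0.7 else 2.0
        mu.append((xmax[i] + xmin[i]) / 2.0)
        sigma.append(width / sigfactor)
    return xmin, xmax, mu, sigma

def srn_sample(xmin, xmax, mu, sigma, blvar, buvar):
    # Draw each component from its normal; project out-of-bounds draws
    # uniformly between the violated bound and the nearest interval edge.
    xt = []
    for i in range(len(mu)):
        rv = random.gauss(mu[i], sigma[i])
        if rv < blvar[i]:
            xt.append(random.uniform(blvar[i], xmin[i]))
        elif rv > buvar[i]:
            xt.append(random.uniform(xmax[i], buvar[i]))
        else:
            xt.append(rv)
    return xt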
This SRN generator attempts to find a subset, B, of k2 'good'
points, and generates most of its trial points, xt, within the
smallest rectangle containing B. It first generates a set of k1
diverse points within the bounds using a stratified random sampling
procedure with frequency-based memory (sketched after this
paragraph). For each variable x(i), this
divides the interval [blvar(i), buvar(i)] into 4 equal segments,
chooses a segment with probability inversely proportional to the
frequency with which it has been chosen thus far, then generates a
random point in this segment. We choose k2 of these points having
the best P(x,w) penalty values, and use the smallest rectangle
containing these, intersecting the ith axis at points [xmin(i),
xmax(i)], to define n univariate normal distributions (driver SRN)
or n univariate triangular distributions (driver SRT). The mean of
the ith normal distribution, mu(i), is the midpoint of the interval
[xmin(i), xmax(i)], and this point is also the mode of the ith
triangular distribution, whose lower and upper limits are blvar(i)
and buvar(i). The standard deviation of the ith normal distribution
is selected as described below. The trial point xt is generated by
sampling n times independently from these distributions. For the
driver using normals, if the generated point lies within the bounds,
it is accepted. Otherwise, we generate a uniformly distributed point
between the violated bound and the nearest edge of the interval
[xmin(i), xmax(i)].
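The stratified sampling procedure with frequency-based memory can be
sketched as follows; the closure-based interface is illustrative, and
finite bounds are assumed.

import random

def make_stratified_sampler(blvar, buvar, nseg=4):
    n = len(blvar)
    counts = [[0] * nseg for _ in range(n)]  # per-variable segment frequencies

    def sample():
        x = []
        for i in range(n):
            # Choose a segment with probability inversely proportional
            # to how often it has been chosen so far.
            weights = [1.0 / (c + 1) for c in counts[i]]
            seg = random.choices(range(nseg), weights=weights)[0]
            counts[i][seg] += 1
            width = (buvar[i] - blvar[i]) / nseg
            lo = blvar[i] + seg * width
            x.append(random.uniform(lo, lo + width))
        return x

    return sample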
To determine the standard deviation of the normal distributions, we
compute ratio, roughly the ratio of interval width to distance
between bounds, where the factor 1.0 is included to avoid division
by zero when the bounds are equal (fixed variables). If the interval
width is small relative to the distance between bounds for variable
i (ratio ≤ 0.7), then the standard deviation sigma(i) is half
the interval width, so about 1/3 of the xt(i) values fall outside
the interval, providing diversity when the interval does not contain
an optimal value for x(i). If the bounds are large, then ratio
should be small, say less than 0.1, so xt(i) values near the bounds
are very unlikely. If ratio > 0.7, the function f sets sigfactor
equal to 2.56 if ratio is between 0.7 and 0.8, increasing in steps
to 6.2 if ratio > 0.999. Thus if ratio is near 1.0, more
than 99% of the values fall within the interval, and few have to be
projected back within the bounds. This projection process avoids
undesirable clustering of trial points at a bound, by generating
points uniformly between the violated bound and the nearest edge of
the interval [xmin(i), xmax(i)]. When the interval [xmin(i),
xmax(i)] is sharply skewed toward one of the variable bounds and is
much narrower than the distance between the bounds, a symmetric
distribution like the normal, combined with our projection
procedure, generates too many points between the interval and its
nearest bound. A quick scan of the test results indicates that this
happens rarely, but an asymmetric distribution like the triangular
overcomes this difficulty, and needs no projection.
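For the triangular variant, Python's standard library already
provides the needed distribution, so a sketch of SRT is short; as in
the text, the mode is mu(i) and the limits are the variable bounds,
so no projection step is required.

import random

def srt_sample(mu, blvar, buvar):
    # random.triangular(low, high, mode): samples always lie within
    # [blvar(i), buvar(i)], so there is nothing to project back.
    return [random.triangular(blvar[i], buvar[i], mu[i])
            for i in range(len(mu))]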