5 Algorithm
5.1 Multistart overview
A pseudo-code description of the MSNLP algorithm follows, in which
SP(xt) denotes the starting point generator and xt is the candidate
starting point produced. We refer to the local NLP solver as L(xs,
xf), where xs is the starting point and xf the final point. The
function UPDATE LOCALS(xs, xf, w) processes and stores solver output
xf, using the starting point xs to compute the distance from xs to
xf, and produces updated penalty weights, w.
MSNLP Algorithm

STAGE 1
  x0 = user initial point
  Call L(x0, xf)
  Call UPDATE LOCALS(x0, xf, w)
  FOR i = 1, n1 DO
    Call SP(xt(i))
    Evaluate P(xt(i), w)
  ENDDO
  xt* = point yielding the best value of P(xt(i), w)
        over all stage 1 points (i = 1, 2, ..., n1)
  Call L(xt*, xf)
  Call UPDATE LOCALS(xt*, xf, w)
  threshold = P(xt*, w)

STAGE 2
  FOR i = 1, n2 DO
    Call SP(xt(i))
    Evaluate P(xt(i), w)
    Perform merit and distance filter tests:
      Call distance filter(xt(i), dstatus)
      Call merit filter(xt(i), threshold, mstatus)
    IF (dstatus = 'accept' AND mstatus = 'accept') THEN
      Call L(xt(i), xf)
      Call UPDATE LOCALS(xt(i), xf, w)
    ENDIF
  ENDDO
After an initial call to L at the user-provided initial point, x0,
stage 1 of the algorithm performs n1 iterations in which SP(xt) is
called, and the L1 exact penalty value P(xt,w) is calculated. The
user can set n1 through the MSNLP options structure using the
STAGE1_ITERATIONS keyword. The point with the smallest of these P
values is chosen as the starting point for the next call to L, which
begins stage 2. In this stage, n2 iterations are performed in which
candidate starting points are generated, and L is started at any one
that passes the distance and merit filter tests.
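For concreteness, a minimal Python sketch of this two-stage logic
follows. The names local_solve, sample_point, penalty, update_locals,
and the two filter predicates stand in for L, SP, P, UPDATE LOCALS,
and the filter tests; they are illustrative placeholders, not part of
the MSNLP interface, and the merit filter's threshold bookkeeping is
elaborated separately below.

import math

def msnlp(x0, n1, n2, local_solve, sample_point, penalty,
          update_locals, distance_filter, merit_filter):
    # Initial solver call from the user-provided point.
    xf = local_solve(x0)
    w = update_locals(x0, xf)

    # Stage 1: sample n1 points and keep the one with the best penalty.
    best_xt, best_p = None, math.inf
    for _ in range(n1):
        xt = sample_point()
        p = penalty(xt, w)
        if p < best_p:
            best_xt, best_p = xt, p
    xf = local_solve(best_xt)
    w = update_locals(best_xt, xf)
    threshold = best_p

    # Stage 2: start the solver only at points passing both filters.
    for _ in range(n2):
        xt = sample_point()
        p = penalty(xt, w)
        if distance_filter(xt) and merit_filter(p, threshold):
            xf = local_solve(xt)
            w = update_locals(xt, xf)
    return w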
The distance filter helps ensure that the starting points for L are
diverse, in the sense that they are not too close to any previously
found local solution. Its goal is to prevent L from starting more
than once within the basin of attraction of any local optimum. When
a local solution is found, it is stored in a linked list, ordered by
its objective value, as is the Euclidean distance between it and the
starting point that led to it. If a local solution is located more
than once, the maximum of these distances, maxdist, is updated and
stored. For each trial point, t, if the distance between t and any
local solution already found is less than DISTANCE_FACTOR*maxdist,
L is not started from that point, and the next trial point is
obtained from the generator.
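A minimal sketch of the distance filter in Python, assuming each
stored local solution is paired with its maxdist value; the data
layout and names are illustrative.

import math

def distance_filter(t, local_solutions, distance_factor=1.0):
    # local_solutions: list of (x, maxdist) pairs for solutions found so far.
    for x, maxdist in local_solutions:
        dist = math.dist(t, x)   # Euclidean distance (Python 3.8+)
        if dist < distance_factor * maxdist:
            return False         # inside an estimated basin: reject
    return True                  # diverse enough: accept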
This distance filter implicitly assumes that the attraction basins
are spherical, with radii at least maxdist. The default value of
DISTANCE_FACTOR is 1.0, and it can be set to any positive value. As
DISTANCE_FACTOR approaches zero, the filtering effect vanishes, as
would be appropriate if there were many closely spaced local
solutions. As it becomes larger than 1, the filtering effect
increases until eventually L is never started.
The merit filter helps ensure that the starting points for L have
high quality, by not starting from candidate points whose exact
penalty function value P1 is greater than a threshold. This
threshold is set initially to the P1 value of the best candidate
point found in the first stage of the algorithm. If trial points are
rejected by this test for more than WAITCYCLE consecutive
iterations, the threshold is increased by the updating rule:
threshold ← threshold + THRESHOLD_INCREASE_FACTOR*(1.0 + abs(threshold))
where the default value of THRESHOLD_INCREASE_FACTOR is 0.2 and
that for WAITCYCLE is 20. The additive 1.0 term is included so that
threshold increases by at least THRESHOLD_INCREASE_FACTOR when its
current value is near zero. When a trial point is accepted by the
merit filter, threshold is decreased by setting it to the P1 value
of that point.
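The merit filter's threshold bookkeeping can be sketched as a small
stateful class, assuming the rejection counter resets whenever the
threshold changes; the class and attribute names are illustrative.

class MeritFilter:
    def __init__(self, threshold, waitcycle=20, increase_factor=0.2):
        self.threshold = threshold               # initialized from stage 1
        self.waitcycle = waitcycle               # WAITCYCLE
        self.increase_factor = increase_factor   # THRESHOLD_INCREASE_FACTOR
        self.rejections = 0

    def accept(self, p1):
        if p1 <= self.threshold:
            self.threshold = p1                  # tighten to the accepted P1
            self.rejections = 0
            return True
        self.rejections += 1
        if self.rejections > self.waitcycle:
            # Updating rule from the text.
            self.threshold += self.increase_factor * (1.0 + abs(self.threshold))
            self.rejections = 0
        return False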
The combined effect of these two filters is that L is started at only
a few percent of the trial points, yet global optimal solutions are
found for a very high percentage of the test problems. However, the
chances of finding a global optimum are increased by increasing
ITERATION_LIMIT (which we recommend trying first) or by 'loosening'
either or both filters, although this is rarely necessary if the
dynamic filters and basin overlap fix are used, as they are by
default. If the ratio of stage 2 iterations to solver calls is more
than 20 using the current filter parameters, and computation times
with the default filter parameters are reasonable, one can try
loosening the filters. This is achieved for the merit filter either
by decreasing WAITCYCLE or by increasing THRESHOLD_INCREASE_FACTOR
(or doing both), and for the distance filter by decreasing
DISTANCE_FACTOR. Either or both filters may be turned off, by
setting USE_DISTANCE_FILTER and/or USE_MERIT_FILTER to 0.
Turning off both causes an NLP solver call at every stage 2 trial
point. This is the best way to ensure that all local optima are
found, but it can take a long time.
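As an illustration only, loosening both filters and raising the
iteration limit might look like the following; the keyword names are
taken from the text, but the syntax of the actual MSNLP options
structure depends on the installation, so treat this dictionary form
as hypothetical.

options = {
    "STAGE1_ITERATIONS": 200,           # n1
    "ITERATION_LIMIT": 5000,            # raise this first
    "WAITCYCLE": 10,                    # smaller: threshold relaxes sooner
    "THRESHOLD_INCREASE_FACTOR": 0.4,   # larger: bigger threshold jumps
    "DISTANCE_FACTOR": 0.5,             # smaller: weaker distance filtering
    "USE_DISTANCE_FILTER": 1,           # 0 turns the filter off
    "USE_MERIT_FILTER": 1,              # 0 turns the filter off
}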
5.2 Pure and smart random drivers
The 'pure' random (PR) driver generates uniformly distributed points
within the hyper-rectangle S defined by the variable bounds.
However, this rectangle is often very large, because users often set
bounds to (-inf,+inf), (0,+inf), or to large positive and/or
negative numbers, particularly in problems with many variables. This
usually has little adverse impact on a good local solver, as long as
the starting point is chosen well inside the bounds. But the PR
generator will often generate starting points with very large
absolute component values when some bounds are very large, and this
sharply degrades solver performance. This motivated the development
of random generators that control the likelihood of generating
candidate points with large components, and that intensify the search
by focusing points into promising regions. We present two variants, one
using normal, the other triangular distributions. Pseudo-code for
this 'smart random' generator using normal distributions follows,
where w is the set of penalty weights determined by the 'update
locals' logic discussed above, after the first solver call at the
user-specified initial point.
Smart Random Generator with Normal Distributions, SRN(xt)

IF (first call) THEN
  Generate k1 (default 400) diverse points in S and
  evaluate the exact penalty function P(x, w) at each point.
  B = subset of S with the k2 (default 10) best P values
  FOR i = 1, nvars DO
    xmax(i) = max of component i over points in B
    xmin(i) = min of component i over points in B
    mu(i) = (xmax(i) + xmin(i))/2
    ratio(i) = (xmax(i) - xmin(i))/(1 + buvar(i) - blvar(i))
    sigfactor = 2.0
    IF (ratio(i) > 0.7) sigfactor = f(ratio(i))
    sigma(i) = (xmax(i) - xmin(i))/sigfactor
  ENDDO
ENDIF
FOR i = 1, nvars DO
  Generate a normally distributed random variable rv(i)
  with mean mu(i) and standard deviation sigma(i)
  IF (blvar(i) <= rv(i) <= buvar(i)), xt(i) = rv(i)
  IF (rv(i) < blvar(i)), generate xt(i) uniformly
    between blvar(i) and xmin(i)
  IF (rv(i) > buvar(i)), generate xt(i) uniformly
    between xmax(i) and buvar(i)
ENDDO
Return xt
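A self-contained Python sketch of SRN follows. The sampler for the k1
diverse points and the function f are assumptions: only f's values
near 0.7-0.8 (2.56) and above 0.999 (6.2) are given in the text, so
the interpolation between them is illustrative.

import random

def f(ratio):
    # Known values: 2.56 for ratio in (0.7, 0.8], 6.2 for ratio > 0.999.
    # The intermediate 'steps' are unspecified; linear growth is assumed
    # here only to keep the sketch self-contained.
    if ratio <= 0.8:
        return 2.56
    if ratio > 0.999:
        return 6.2
    return 2.56 + (6.2 - 2.56) * (ratio - 0.8) / (0.999 - 0.8)

def srn_setup(sample_diverse, penalty, w, blvar, buvar, k1=400, k2=10):
    # Sample k1 diverse points, keep the k2 best by penalty value, and
    # fit one normal distribution per variable to their bounding box.
    pts = sorted((sample_diverse() for _ in range(k1)),
                 key=lambda x: penalty(x, w))
    best = pts[:k2]
    n = len(blvar)
    xmin = [min(p[i] for p in best) for i in range(n)]
    xmax = [max(p[i] for p in best) for i in range(n)]
    mu, sigma = [], []
    for i in range(n):
        width = xmax[i] - xmin[i]
        ratio = width / (1.0 + buvar[i] - blvar[i])
        sigfactor = f(ratio) if ratio > 0.7 else 2.0
        mu.append((xmax[i] + xmin[i]) / 2.0)
        sigma.append(width / sigfactor)
    return xmin, xmax, mu, sigma

def srn_sample(xmin, xmax, mu, sigma, blvar, buvar):
    # Draw each component from its normal; project out-of-bounds draws
    # uniformly between the violated bound and the nearest interval edge.
    xt = []
    for i in range(len(mu)):
        rv = random.gauss(mu[i], sigma[i])
        if rv < blvar[i]:
            xt.append(random.uniform(blvar[i], xmin[i]))
        elif rv > buvar[i]:
            xt.append(random.uniform(xmax[i], buvar[i]))
        else:
            xt.append(rv)
    return xt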
This SRN generator attempts to find a subset, B, of k2 'good'
points, and generates most of its trial points, xt, within the
smallest rectangle containing B. It first generates a set of k1
diverse points within the bounds using a stratified random sampling
procedure with frequency-based memory (sketched after this
paragraph). For each variable x(i), this
divides the interval [blvar(i), buvar(i)] into 4 equal segments,
chooses a segment with probability inversely proportional to the
frequency with which it has been chosen thus far, then generates a
random point in this segment. We choose k2 of these points having
the best P(x,w) penalty values, and use the smallest rectangle
containing these, intersecting the ith axis at points [xmin(i),
xmax(i)], to define n univariate normal distributions (driver SRN)
or n univariate triangular distributions (driver SRT). The mean of
the ith normal distribution, mu(i), is the midpoint of the interval
[xmin(i), xmax(i)], and this point is also the mode of the ith
triangular distribution, whose lower and upper limits are blvar(i)
and buvar(i). The standard deviation of the ith normal distribution
is selected as described below. The trial point xt is generated by
sampling n times independently from these distributions. For the
driver using normals, if the generated point lies within the bounds,
it is accepted. Otherwise, we generate a uniformly distributed point
between the violated bound and the nearest edge of the interval
[xmin(i), xmax(i)].
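The stratified sampling procedure with frequency-based memory can be
sketched as follows; the closure-based interface is illustrative, and
finite bounds are assumed.

import random

def make_stratified_sampler(blvar, buvar, nseg=4):
    n = len(blvar)
    counts = [[0] * nseg for _ in range(n)]  # per-variable segment frequencies

    def sample():
        x = []
        for i in range(n):
            # Choose a segment with probability inversely proportional
            # to how often it has been chosen so far.
            weights = [1.0 / (c + 1) for c in counts[i]]
            seg = random.choices(range(nseg), weights=weights)[0]
            counts[i][seg] += 1
            width = (buvar[i] - blvar[i]) / nseg
            lo = blvar[i] + seg * width
            x.append(random.uniform(lo, lo + width))
        return x

    return sample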
To determine the standard deviation of the normal distributions, we
compute ratio, roughly the ratio of interval width to distance
between bounds, where the factor 1.0 is included to avoid division
by zero when the bounds are equal (fixed variables). If the interval
width is small relative to the distance between bounds for variable
i (ratio ≤ 0.7), then the standard deviation sigma(i) is half
the interval width, so about 1/3 of the xt(i) values fall outside
the interval, providing diversity when the interval does not contain
an optimal value for x(i). If the bounds are large, then ratio
should be small, say less than 0.1, so xt(i) values near the bounds
are very unlikely. If ratio > 0.7, the function f sets sigfactor
equal to 2.56 if ratio is between 0.7 and 0.8, increasing in steps
to 6.2 if ratio > 0.999. Thus if ratio is near 1.0, more
than 99% of the values fall within the interval, and few have to be
projected back within the bounds. This projection process avoids
undesirable clustering of trial points at a bound, by generating
points uniformly between the violated bound and the nearest edge of
the interval [xmin(i), xmax(i)]. When the interval [xmin(i),
xmax(i)] is sharply skewed toward one of the variable bounds and is
much narrower than the distance between the bounds, a symmetric
distribution like the normal, combined with our projection
procedure, generates too many points between the interval and its
nearest bound. A quick scan of the test results indicates that this
happens rarely, but an asymmetric distribution like the triangular
overcomes this difficulty, and needs no projection.
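For the triangular variant, Python's standard library already
provides the needed distribution, so a sketch of SRT is short; as in
the text, the mode is mu(i) and the limits are the variable bounds,
so no projection step is required.

import random

def srt_sample(mu, blvar, buvar):
    # random.triangular(low, high, mode): samples always lie within
    # [blvar(i), buvar(i)], so there is nothing to project back.
    return [random.triangular(blvar[i], buvar[i], mu[i])
            for i in range(len(mu))]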