« Previous « Start » Next »
6 MINOS details
6.1 Introduction
TOMLAB /MINOS (hereafter referred to as MINOS) is a linear and
nonlinear programming system, designed to solve large-scale
constrained optimization problems of the following form:
|
|
|
|
F(x) + cT x + dT y |
|
| s/t |
b1 |
< f(x) + A1 y < |
b2 |
| |
b3 |
< A2x + A3 y < |
b4 |
| |
l |
< (x, y) < |
u |
|
(23) |
where the vectors
bi,
c,
d,
l,
u and the matrices
Ai
are constant,
F(
x) is a nonlinear function of some of the
variables, and
f(
x) is a vector of nonlinear functions. The
nonlinearities (if present) may be of a general nature but must be
smooth and preferably “almost linear", in the sense that they
should not change radically with small changes in the variables. We
make the following definitions:
| x |
the nonlinear variables |
| y |
the linear variables |
| (x, y) |
the vector x y |
| (1.1) |
the objective function |
| (1.2) |
the nonlinear constraints |
| (1.3) |
the linear constraints |
| (1.4) |
the bounds on the variables |
| m |
the total number of general constraints in (2) and (3) |
| n |
the total number of variables in x and y |
| m1 |
the number of nonlinear constraints (the dimension of f(x)) |
| n1 |
the number of nonlinear variables (the dimension of x) |
| n1′ |
the number of nonlinear objective variables (in F(x)) |
| n1′′ |
the number of nonlinear Jacobian variables (in f(x)) |
A large-scale problem is one in which
m and
n are several
hundred or several thousand. MINOS takes advantage of the fact
that the constraint matrices
Ai and the partial derivatives
∂
fi(
x) / ∂
xj are typically sparse (contain many
zeros).
The dimensions
n1′ and
n1′′ allow for the fact that
F(
x)
and
f(
x) may involve different sets of nonlinear variables “
x".
The two sets of variables
always overlap, in the sense that
the shorter “
x" is always the same as the beginning of the other.
If
x is the same in both cases, we have
n1 =
n1′ =
n1′′.
Otherwise, we define the number of nonlinear variables to be
n1 =
max(
n1′,
n1′′).
In the following sections we introduce more terminology and give an
overview of the MINOS optimization algorithms and the main system
features.
6.1.1 Linear Programming
When
F(
x) and
f(
x) are absent, the problem becomes a
linear program. Since there is no need to distinguish between
linear and nonlinear variables, we use
x rather than
y. We also
convert all general constraints into equalities with the aid of
slack variables
s, so that the only inequalities are simple bounds
on the variables. Thus, we write linear programs in the form
| |
|
|
cT x subject to Ax + s = b,
l ≤ (x, s) ≤ u.
(24) |
When the constraints are linear, the bounds on the slacks are
defined so that
b = 0. When there are nonlinear constraints, some
elements of
b are nonzero.
In the mathematical programming world,
x and
s are sometimes
called
structural variables and
logical variables.
Their upper and lower bounds are fundamental to problem formulations
and solution algorithms. Some of the components of
l may be
−∞ and those of
u may be +∞. If
lj =
uj, a
variable is said to be
fixed, and if its bounds are −∞
and +∞, the variable is called
free.
Within MINOS, a point (
x,
s) is said to be
feasible if the
following are true:
-
The constraints Ax + s = b are satisfied to within
machine precision ≈ 10−15.
- The bounds l ≤ (x, s) ≤ u are satisfied to within a
feasibility tolerance δfea ≈ 10−6.
- The nonlinear constraints (23) are satisfied to within a
row tolerance δrow ≈ 10−6.
Tolerances such as δ
fea and δ
row may be specified by
setting
Feasibility tolerance and
Row tolerance.
MINOS solves linear programs using a reliable implementation of
the
primal simplex method,
in which the
constraints
Ax +
s =
b are partitioned into the form
where the
basis matrix B is a square and nonsingular
submatrix of (
A I ). The elements of
x B and
x N are called the
basic and nonbasic variables respectively. Together, they are a
permutation of the vector (
x,
s). Certain
dual variables
π and
reduced costs d N are defined by the equations
BTπ =
c B,
d N =
c N −
NTπ,
(26)
where (
c B,
c N) is a permutation of the objective vector (
c,
0).
At a feasible point, nonbasic variables are typically equal to one
of their bounds, and basic variables are somewhere between their
bounds. To reach an optimal solution, the simplex method performs a
sequence of
iterations of the following general nature. With
guidance from
d N, a nonbasic variable is chosen to move from its
current value, and the basic variables are adjusted to satisfy the
constraints in (
24). Usually one of the basic variables
reaches a bound. The basis partition is then redefined with a column
of
B being replaced by a column of
N. When no such interchange
can be found to reduce the value of
cT x, the current solution is
optimal.
The simplex method
For convenience, let
x denote the variables (
x,
s). The main
steps in a simplex iteration are as follows:
-
Compute dual variables:
-
Solve BTπ = c B.
- Price:
-
Compute some or all of the reduced costs d N = c N − NTπ
to determine if a favorable nonbasic column aq exists.
- Compute search direction:
-
Solve Bp B = ± aq to determine the basic components of a
search direction p along which the objective is improved.
(The nonbasic elements of p are p N = 0, except for ± 1
for the element corresponding to aq.)
- Find maximum steplength:
-
Find the largest steplength αmax such that
x + αmax p continues to
satisfy the bounds on the variables. The steplength may be
determined by the new nonbasic variable reaching its opposite
bound, but normally some basic variable will reach a bound first.
- Update:
-
Take the step αmax. If this was determined by a
basic variable, interchange the corresponding column
of B with column aq from N.
When a starting basis is chosen and the basic variables
x B are
first computed, if any components of
x B lie significantly outside
their bounds, we say that the current point is
infeasible. In
this case, the simplex method uses a “Phase 1" procedure to reduce
the sum of infeasibilities. This is similar to the subsequent
“Phase 2" procedure just described.
The feasibility tolerance δ
fea is used to determine which Phase
is in effect. A similar
optimality tolerance δ
opt is
used during pricing to judge whether any reduced costs are
significantly large. (This tolerance is scaled by

π

, a
measure of the size of the current π.)
If the solution procedures are interrupted, some of the nonbasic
variables may lie strictly
between their bounds:
lj <
xj <
uj. In addition, at a “feasible" or “optimal" solution, some of
the basic variables may lie slightly outside their bounds:
lj −
δ
fea ≤
xj ≤
uj + δ
fea. In rare cases, even a few
nonbasic variables might lie outside their bounds by as much as
δ
fea.
MINOS maintains a sparse
LU factorization of the basis matrix
B, using a Markowitz ordering scheme and Bartels-Golub updates, as
implemented in the Fortran package
LUSOL
The basis factorization is central
to the efficient handling of sparse linear and nonlinear
constraints.
6.1.2 Problems with a Nonlinear Objective
When nonlinearities are confined to the term
F(
x) in the objective
function, the problem is a linearly constrained nonlinear program.
MINOS solves such problems using a
reduced-gradient method
combined with a
quasi-Newton method
that generally leads to superlinear convergence.
The implementation follows that described in
Murtagh and Saunders.
As a slight generalization of (
25), the constraints
Ax +
s
=
b are partitioned into the form
Bx B +
Sx S +
Nx N =
b,
(27)
where
x S is a set of
superbasic variables. As before, the
nonbasic variables are normally equal to one of their bounds, while
the basic
and superbasic variables lie somewhere between
their bounds (to within δ
fea). Let the number of superbasic
variables be
n S, the number of columns in
S. At a solution,
n S will be no more than
n1, the number of nonlinear variables,
and it is often much smaller than this. In many real-life cases we
have found that
n S remains reasonably small, say 200 or less,
regardless of the size of the problem. This is one reason why
MINOS has proved to be a practical tool.
In the reduced-gradient method,
x S is regarded as a set of
“independent variables" that are allowed to move in any desirable
direction to reduce the objective function (or the sum of
infeasibilities). The basic variables are then adjusted in order to
continue satisfying the linear constraints. If it appears that no
improvement can be made with the current definition of
B,
S and
N, one of the nonbasic variables is selected to be added to
S,
and the process is repeated with an increased value of
n S. At all
stages, if a basic or superbasic variable encounters one of its
bounds, that variable is made nonbasic and the value of
n S is
reduced by one.
For linear programs, we may interpret the simplex method as being
the same as the reduced-gradient method, with the number of
superbasic variables oscillating between 0 and 1. (In general, a
step of the simplex method
or the reduced-gradient method is
called a
minor iteration.)
A certain matrix
Z is needed for descriptive purposes. It takes
the form
though it is never computed explicitly. Given
LU factors of the
basis matrix
B, it is possible to compute products of the form
Z
q and
ZT g by solving linear equations involving
B or
BT.
This in turn allows optimization to be performed on the superbasic
variables, while the basic variables are adjusted to satisfy the
general linear constraints. (In the description below, the
reduced-gradient vector satisfies
d S =
ZT g, and the search
direction satisfies
p =
Zp S.)
An important part of MINOS is the quasi-Newton method used to
optimize the superbasic variables. This can achieve superlinear
convergence during any sequence of iterations for which the
B,
S,
N partition remains constant. It requires a dense
upper-triangular matrix
R of dimension
n S, which is updated in
various ways to approximate the
reduced Hessian:
where
H is the
Hessian of the objective function, i.e. the
matrix of second derivatives of
F(
x). As for unconstrained
optimization, the storage required for
R is sometimes a limiting
factor.
The reduced-gradient method
Let
g be the gradient of the nonlinear objective (
23).
The main steps in a reduced-gradient iteration are as follows:
-
Compute dual variables and reduced gradient:
-
Solve BTπ = g B and compute the reduced-gradient vector
d S = g S − STπ.
- Price:
-
If
d S
is sufficiently small,
compute some or all of the reduced costs d N = g N − NTπ
to determine if a favorable nonbasic column aq exists.
If so, move that column from N into S, expanding R accordingly.
- Compute search direction:
-
Solve RT R p S = − d S and Bp B = − Sp S
to determine the superbasic and basic components of a
search direction p along which the objective is improved.
(The nonbasic elements of p are p N = 0.)
- Find maximum steplength:
-
Find the largest steplength αmax such that
x + αmax p continues to
satisfy the bounds on the variables.
- Perform linesearch:
-
Find an approximate solution to the one-dimensional problem
|
F(x+α p)
subject to 0 ≤ α ≤ αmax.
|
- Update (quasi-Newton):
-
Take the step α. Apply a quasi-Newton update to R
to account for this step.
- Update (basis change):
-
If a superbasic variable reached a bound, move it from S into N.
If a basic variable reached a bound, find a suitable
superbasic variable to move from S into B,
and move the basic variable into N.
Update R if necessary.
At an optimum, the reduced gradient
d S should be zero. MINOS
terminates when
d S
≤ δ
opt
π

and the reduced
costs (component of
d N) are all sufficiently positive or
negative, as judged by the same quantity δ
opt
π

.
In the linesearch,
F(
x+α
p) really means the objective
function (
23) evaluated at the point (
x,
y,
s) + α
p. This steplength procedure is another important part of MINOS.
Two different procedures are used, depending on whether or not all
gradients are known
analytically.
The
number of nonlinear function evaluations required may be influenced
by setting the
Linesearch tolerance in the SPECS file.
Normally, the objective function
F(
x) will never be evaluated at a
point
x unless that point satisfies the linear constraints and the
bounds on the variables. An exception is during a finite-difference
check on the calculation of gradient elements. This check is
performed at the
starting point x0, which takes default
values or may be specified. MINOS ensures that the bounds
l ≤
x0 ≤
u are satisfied, but in general the starting point will not
satisfy the general linear constraints. If
F(
x0) is undefined,
the gradient check should be suppressed (
Verify level -1), or
the starting point should be redefined.
6.1.3 Problems with Nonlinear Constraints
If any of the constraints are nonlinear, MINOS employs a
projected Lagrangian algorithm, based on a method due to
Robinson, see Murtagh and Saunders.
This
involves a sequence of
major iterations, each of which
requires the solution of a
linearly constrained subproblem.
Each subproblem contains linearized versions of the nonlinear
constraints, as well as the original linear constraints and bounds.
At the start of the
k-th major iteration, let (
xk,
yk) be an
estimate of the variables, and let λ
k be an estimate of the
Lagrange multipliers (dual variables) associated with the nonlinear
constraints. The constraints are linearized by changing
f(
x) in
Equation (2) to its linear approximation:
f(x, xk) = f(xk) + J(xk)(x − xk),
or more briefly
f =
fk +
Jk(
x −
xk), where
J(
xk) is the
Jacobian matrix evaluated at
xk. (The
i-th row of the
Jacobian is the gradient vector for the
i-th nonlinear constraint
function.) The subproblem to be solved during the
k-th major
iteration is then
|
|
|
| F(x) + cT x + dT y − λkT fd + |
|
ρ fd 2 |
|
| s/t b1 |
≤ f+ A1y ≤ |
b2 |
| b3 |
≤ A2x + A3y ≤ |
b4, |
| l |
≤ (x, y) ≤ |
u, |
|
(30) |
where
fd =
f −
f is the difference between
f(
x) and its
linearization. The objective function (
30) is called an
augmented Lagrangian. The scalar ρ
k is a
penalty
parameter, and the term involving ρ
k is a modified
quadratic penalty function.
MINOS uses the reduced-gradient method to solve each subproblem.
As before, slack variables are introduced and the vectors
bi are
incorporated into the bounds on the slacks. The linearized
constraints take the form
We refer to this system as
Ax +
s =
b as in the linear case. The
Jacobian
Jk is treated as a sparse matrix, the same as the
matrices
A1,
A2 and
A3. The quantities
Jk,
b,
λ
k and ρ
k change each major iteration.
The projected Lagrangian method
For convenience, suppose that all variables and constraints are
nonlinear. The main steps in a major iteration are as follows:
-
Solve subproblem:
-
Find an approximate solution (2.8 x, λ) to the
kth subproblem (30)–(30).
- Compute search direction:
-
Adjust the elements of λ if necessary (if they
have the wrong sign).
Define a search direction
(Δ x, Δλ) = (2.8 x − xk, λ − λk).
- Find steplength:
-
Choose a steplength σ such that some merit
function M(x,λ) has a suitable value at the point
(xk + σΔ x, λk + σΔλ).
- Update:
-
Take the step σ to obtain (xk+1, λk+1).
In some cases, adjust ρk.
For the first major iteration, the nonlinear constraints are ignored
and minor iterations are performed until the original linear
constraints are satisfied.
The initial Lagrange multiplier estimate is typically λ
k =
0 (though it can be provided by the user). If a subproblem
terminates early, some elements of the new estimate λ may be
changed to zero.
The penalty parameter initially takes a certain default value
ρ
k = 100.0/
m1, where
m1 is the number of nonlinear
constraints. (A value
r times as big is obtained by specifying
Penalty parameter r.) For later major iterations, ρ
k
is reduced in stages when it appears that the sequence
{
xk,λ
k} is converging. In many cases it is safe to
specify
Penalty parameter 0.0 at the beginning, particularly
if a problem is only mildly nonlinear. This may improve the overall
efficiency.
In the output from MINOS, the term
Feasible subproblem
indicates that the
linearized constraints
(
30)–(
30) have been satisfied. In general, the
nonlinear constraints are satisfied only in the limit, so that
feasibility and
optimality occur at essentially the
same time. The nonlinear constraint violation is printed every
major iteration. Even if it is zero early on (say at the initial
point), it may increase and perhaps fluctuate before tending to
zero. On “well behaved" problems, the constraint violation will
decrease quadratically (i.e., very quickly) during the final few
major iterations.
For certain rare classes of problem it is safe to request the values
λ
k = 0 and ρ
k=0 for all subproblems by specifying
Lagrangian = No (in which case the nonlinear constraint
functions are evaluated only once per major iteration). However for
general problems, convergence is much more likely with the default
setting,
Lagrangian = Yes.
The merit function
Unfortunately, it is not known how to define a merit function
M(
x,λ) that can be
reduced at every major iteration.
As a result, there is no guarantee that the projected Lagrangian
method described above will converge from an arbitrary starting
point. This has been the principal theoretical gap in MINOS,
finally resolved by the PhD research of
Michael Friedlander.
The main features needed to stabilize MINOS are:
-
To relax the linearized constraints via an ℓ1 penalty function.
- To repeat a major iteration with increased ρk (and more
relaxed linearized constraints) if the nonlinear constraint
violation would increase too much.
In practice, the method of MINOS 5.51 often does converge and a
good
rate of convergence is often achieved in the final major
iterations, particularly if the constraint functions are “nearly
linear". As a precaution, MINOS prevents radical changes from one
major iteration to the next. Where possible, the steplength is
chosen to be σ = 1, so that each new estimate of the solution
is (
xk+1, λ
k+1) = (2.8
x, λ), the solution of
the subproblem. If this point is “too different", a shorter
steplength σ < 1 is chosen.
If the major iterations for a particular model do not appear to be
converging, some of the following actions may help:
-
Specify initial activity levels for the nonlinear variables
as carefully as possible.
- Include sensible upper and lower bounds on all variables.
- Specify a Major damping parameter that is lower than the
default value. This tends to make σ smaller.
- Specify a Penalty parameter that is higher than the
default value. This tends to prevent excessive departures
from the constraint linearization.
6.1.4 Problem Formulation
In general, it is worthwhile expending considerable prior analysis
to make the constraints completely linear if at all possible.
Sometimes a simple transformation will suffice. For example, a
pipeline optimization problem has pressure drop constraints of the
form
|
+ |
|
+ ≤
PT2 − P02,
|
where
di are the design variables (pipe diameters) and the other
terms are constant. These constraints are highly nonlinear, but by
redefining the decision variables to be
xi = 1/
di4.814 we can
make the constraints linear. Even if the objective function becomes
more nonlinear by such a transformation (and this usually happens),
the advantages of having linear constraints greatly outweigh this.
Similarly, it is usually best not to move nonlinearities from the
objective function into the constraints. For example, we should
not replace “
minimize F(
x)" by
minimize z subject to F(x) − z = 0.
Scaling is a very important matter during problem formulation.
A general rule is to scale both the data and the variables to be as
close to 1.0 as possible. In general we suggest the range 1.0 to
10.0. When conflicts arise, one should sacrifice the objective
function in favor of the constraints. Real-world problems tend to
have a natural scaling within each constraint, as long as the
variables are expressed in consistent physical units. Hence it is
often sufficient to apply a scale factor to each row. MINOS has
options to scale the rows and columns of the constraint matrix
automatically. By default, only the linear rows and columns are
scaled, and the procedure is reliable. If you request that the
nonlinear constraints and variables be scaled, bear in mind that the
scale factors are determined by the initial Jacobian
J(
x0), which
may differ considerably from
J(
x) at a solution.
Finally,
upper and lower bounds on the variables (and on the
constraints) are extremely useful for confining the region over
which optimization has to be performed. If sensible values are
known, they should always be used. They are also important for
avoiding singularities in the nonlinear functions. Note that bounds
may be violated slightly by as much as the feasibility tolerance
δ
fea. Hence, if √
x2 or log
x2 appear (for
example) and if δ
fea = 10
−6, the lower bound on
x2 would
normally have to be at least 10
−5. If it is
known that
x2 will be at least 0.5 (say) at a solution, then its lower bound
should be 0.5.
For a detailed discussion of many aspects of numerical optimization,
see
Gill, Murray and Wright for much invaluable advice on problem
formulation and assessment of results.
6.1.5 Restrictions
MINOS is designed to find solutions that are
locally
optimal. The nonlinear functions in a problem must be
smooth (i.e., their first derivatives must exist), especially
near the desired solution. The functions need not be separable.
A certain “feasible" region is defined by the general constraints
and the bounds on the variables. If the objective is convex within
this region and if the feasible region itself is convex, any optimal
solution obtained will be a
global optimum. Otherwise there
may be several local optima, and some of these may not be global.
In such cases the chances of finding a global optimum are usually
increased by choosing a starting point that is “sufficiently
close", but there is no general procedure for determining what
“close" means, or for verifying that a given local optimum is
indeed global.
Integer restrictions cannot be imposed directly. If a variable
xj is required to be 0 or 1, a common ploy is to include a
quadratic term
xj(1 −
xj) in the objective function. MINOS
will indeed terminate with
xj = 0 or 1, but inevitably the final
solution will just be a local optimum. (Note that the quadratic is
negative definite. MINOS will find a global minimum for quadratic
functions that are positive definite or positive semidefinite,
assuming the constraints are linear.)
6.2 Solver Options
The following sections describes some of the solver options
depending on problem type.
6.2.1 Options for Linear Programming
The following options apply specifically to linear programs.
|
|
|
|
|
Crash option |
i |
Default = 3 |
|
Crash tolerance |
t |
Default = 0.1 |
Except on restarts, a Crash procedure is used to select an initial
basis from certain rows and columns of the constraint matrix (
A I ).
The
Crash option i determines which rows and columns of
A
are eligible initially, and how many times Crash is called. Columns
of
I are used to pad the basis where necessary.
-
i=0 The initial basis contains only slack variables: B = I.
- 1 Crash is called once, looking for a triangular
basis in all rows and columns of A.
- 2 Crash is called twice (if there are nonlinear constraints).
The first call looks for a triangular basis in linear rows,
and the first major iteration proceeds with simplex iterations
until the linear constraints are satisfied.
The Jacobian is then evaluated for the second major iteration
and Crash is called again to find a triangular basis in the
nonlinear rows (retaining the current basis for linear
rows).
- 3 Crash is called up to three times
(if there are nonlinear constraints).
The first two calls treat linear equalities and
linear inequalities separately. As before, the last
call treats nonlinear rows at the start of the second major
iteration.
If
i ≥ 1, certain slacks on inequality rows are selected for the
basis first. (If
i ≥ 2, numerical values are used to exclude
slacks that are close to a bound.) Crash then makes several passes
through the columns of
A, searching for a basis matrix that is
essentially triangular. A column is assigned to “pivot" on a
particular row if the column contains a suitably large element in a
row that has not yet been assigned. (The pivot elements ultimately
form the diagonals of the triangular basis.) For remaining
unassigned rows, slack variables are inserted to complete the basis.
The
Crash tolerance allows Crash to ignore certain “small"
nonzeros in each column of
A. If
amax is the largest
element in column
j, other nonzeros
aij in the column are
ignored if |
aij| ≤
amax ×
t. (To be meaningful,
t
should be in the range 0 ≤
t < 1.)
When
t > 0.0, the bases obtained by Crash may not be strictly
triangular, but are likely to be nonsingular and almost triangular.
The intention is to choose a basis containing more columns of
A
and fewer (arbitrary) slacks. A feasible solution may be reached
sooner on some problems.
For example, suppose the first
m columns of
A are the matrix
shown under
LU factor tolerance; i.e., a tridiagonal matrix
with entries −1, 4, −1. To help Crash choose all
m columns for
the initial basis, we would specify
Crash tolerance t for
some value of
t > 1/4.
6.2.2 Options for All Problems
The following options have the same purpose for all problems,
whether they linear or nonlinear.
|
|
|
|
|
Check frequency |
k |
Default = 60 |
Every
k-th minor iteration after the most recent basis
factorization, a numerical test is made to see if the current
solution
x satisfies the general linear constraints (including
linearized nonlinear constraints, if any). The constraints are of
the form
Ax +
s =
b, where
s is the set of slack variables. To
perform the numerical test, the residual vector
r =
b −
Ax −
s is
computed. If the largest component of
r is judged to be too large,
the current basis is refactorized and the basic variables are
recomputed to satisfy the general constraints more accurately.
Check frequency 1 is useful for debugging purposes, but
otherwise this option should not be needed.
|
|
|
|
|
Cycle limit |
l |
Default = 1 |
|
Cycle print |
p |
Default = 1 |
|
Cycle tolerance |
t |
Default = 0.0 |
|
Phantom columns |
c |
Default = 0 |
|
Phantom elements |
e |
Default = 0 |
|
|
|
|
|
Debug level |
l |
Default = 0 |
This causes various amounts of information to be output to the Print
file. Most debug levels are not helpful to normal users, but they
are listed here for completeness.
-
l = 0
- No debug output.
- l = 2
- (or more) Output from m5setx showing the maximum
residual after a row check.
- l = 40
- Output from lu8rpc (which updates the LU factors
of the basis matrix), showing the position of the
last nonzero in the transformed incoming column.
- l = 50
- Output from lu1mar (which computes the LU factors
each refactorization), showing each pivot row and column
and the dimensions of the dense matrix involved in the associated
elimination.
- l = 100
- Output from m2bfac and m5log listing the basic
and superbasic variables and their values at every iteration.
|
|
|
|
|
Expand frequency |
k |
Default = 10000 |
This option is part of an anti-cycling procedure designed to
guarantee progress even on highly degenerate problems.
“Cycling" can occur only if zero steplengths are allowed. Here, the
strategy is to force a positive step at every iteration, at the
expense of violating the bounds on the variables by a small amount.
Suppose that the
Feasibility tolerance is δ. Over a
period of
k iterations, the tolerance actually used by MINOS
increases from 0.5δ to δ (in steps of 0.5δ/
k).
Every
k iterations, or when feasibility and optimality are first
detected, a resetting procedure eliminates any infeasible nonbasic
variables. Some additional iterations may be needed to restore
feasibility and optimality. Increasing
k reduces that likelihood,
but it gives less freedom to choose large pivot elements during
basis changes. (See
Pivot tolerance.)
|
|
|
|
|
Factorization frequency |
k |
Default = 100 (LP) or 50
(NLP) |
With linear programs, most iterations cause a basis change, in which
one column of the basis matrix
B is replaced by another. The
LU
factors of
B must be updated accordingly. At most
k updates are
performed before the current
B is factorized directly.
Each update tends to add nonzeros to the
LU factors. Since the
updating method is stable,
k mainly affects the efficiency of
minor iterations, rather than stability.
High values of
k (such as 100 or 200) may be more efficient on
“dense" problems, when the factors of
B tend to have two or three
times as many nonzeros as
B itself. Lower values of
k may be
more efficient on problems that are very sparse.
|
|
|
|
|
Feasibility tolerance |
t |
Default = 1.0e-6 |
This sets the feasibility tolerance δ
fea =
t (see
§
6.2.1). A variable or constraint is considered
feasible if it does not lie outside its bounds by more than
δ
fea.
MINOS first attempts to satisfy the linear constraints and bounds.
If the sum of infeasibilities cannot be reduced to zero, the problem
is declared infeasible. Let
sinf be the corresponding sum of
infeasibilities. If
sinf is quite small, it may be appropriate to
raise
t by a factor of 10 or 100. Otherwise, some error in the
data should be suspected. If
sinf is not small, there may be other
points that have a significantly smaller sum of infeasibilities.
MINOS does not attempt to find a solution that minimizes the sum.
For
Scale option 1 or 2, feasibility is defined in terms of
the
scaled problem (since it is then more likely to be
meaningful). The final unscaled solution can therefore be infeasible
by an unpredictable amount, depending on the size of the scale
factors. Precautions are taken so that in a “feasible solution" the
original variables will never be infeasible by more than 0.1.
Values that large are very unlikely.
|
|
|
|
|
Iterations limit |
k |
Default = 3m |
MINOS stops after
k iterations even if the simplex method has
not yet reached a solution. If
k = 0, no iterations are performed,
but the starting point is tested for both feasibility and
optimality.
|
|
|
|
|
LU factor tolerance |
t1 |
Default = 100.0 (LP) or 5.0 (NLP) |
|
LU update tolerance |
t2 |
Default = 110.0 (LP)
or 5.0 (NLP) |
These tolerances affect the stability and sparsity of the basis
factorization
B =
LU during refactorization and updating,
respectively. They must satisfy
t1,
t2 ≥ 1.0. The matrix
L
is a product of matrices of the form
where the multipliers μ satisfy |μ| ≤
ti. Values near
1.0 favor stability, while larger values favor sparsity. The default
values usually strike a good compromise. For large and relatively
dense problems,
t1 = 10.0 or 5.0 (say) may give a useful
improvement in stability without impairing sparsity to a serious
degree.
For certain very regular structures (e.g., band matrices) it may be
necessary to reduce
t1 and/or
t2 in order to achieve
stability. For example, if the columns of
A include a submatrix
of the form





 |
|
| 4 |
−1 |
| −1 |
4 |
−1 |
| |
 |
 |
 |
| |
|
−1 |
4 |
−1 |
| |
|
|
−1 |
4 |
|
|





 |
, |
one should set both
t1 and
t2 to values in the range
1.0 ≤
ti < 4.0.
|
|
|
|
|
LU density tolerance |
t3 |
Default = 0.5 |
|
LU singularity tolerance |
t4 |
Default = є0.67
≈ 3.25e−11 |
The density tolerance
t3 is used during
LU factorization of
the basis matrix. Columns of
L and rows of
U are formed
one at a time, and the remaining rows and columns of the
basis are altered appropriately. At any stage, if the density
of the remaining matrix exceeds
t3,
the Markowitz strategy for choosing pivots is altered to
reduce the time spent searching for each remaining pivot.
Raising the density tolerance towards 1.0 may give slightly
sparser
LU factors, with a slight increase in factorization time.
The singularity tolerance
t4 helps guard against ill-conditioned
basis matrices. When the basis is refactorized, the diagonal
elements of
U are tested as follows: if |
Ujj| ≤
t4 or
|
Ujj| <
t4 max
i |
Uij|, the
j-th column of the basis is
replaced by the corresponding slack variable. (This is most likely
to occur after a restart, or at the start of a major iteration.)
In some cases, the Jacobian matrix may converge to values that make
the basis exactly singular. (For example, a whole row of the
Jacobian could be zero at an optimal solution.) Before exact
singularity occurs, the basis could become very ill-conditioned and
the optimization could progress very slowly (if at all). Setting
t4 =
1.0e-5, say, may help cause a judicious change of
basis.
|
|
|
|
|
Maximize |
|
Minimize |
|
Default |
This specifies the required direction of optimization.
|
|
|
|
|
Multiple price |
k |
Default = 1 |
It is not normal to set
k>1 for linear programs, as it causes
MINOS to use the reduced-gradient method rather than the simplex
method. The number of iterations, and the total work, are likely to
increase.
The reduced-gradient iterations do
not correspond to the very
efficient multiple pricing “minor iterations" carried out by
certain commercial linear programming systems. Such systems require
storage for
k dense vectors of dimension
m, so that
k is
usually limited to 5 or 6. In MINOS, the total storage requirements
increase only slightly with
k. (The
Superbasics limit must
be at least
k.)
|
|
|
|
|
Optimality tolerance |
t |
Default = 1.0e-6 |
This is used to judge the size of the reduced gradients
dj =
gj −
π
T aj, where
gj is the gradient of the objective function
corresponding to the
j-th variable,
aj is the associated column
of the constraint matrix (or Jacobian), and π is the set of dual
variables.
By construction, the reduced gradients for basic variables are
always zero. Optimality is declared if the reduced gradients for
nonbasic variables at their lower or upper bounds satisfy
dj /

π

≥ −
t or
dj /

π

≤
t respectively, and if
|
dj| /

π

≤
t for superbasic variables.
In those tests,

π

is a measure of the size of the dual
variables. It is included to make the tests independent of a large
scale factor on the objective function. The quantity actually used
is defined by σ = Σ
i=1m |π
i|,

π

=
max{σ / √
m, 1.0} , so that only scale factors larger
than 1.0 are allowed for. If the objective is scaled down to be very
small, the optimality test reduces to comparing
dj against
t.
|
|
|
|
|
Partial price |
p |
Default = 10 (LP) or 1 (NLP) |
This parameter is recommended for large problems that have
significantly more variables than constraints. It reduces the work
required for each “pricing" operation (when a nonbasic variable is
selected to become basic or superbasic).
When
p=1, all columns of the constraint matrix (
A I ) are searched.
Otherwise,
A and
I are partitioned to give
p roughly equal
segments
Aj,
Ij (
j = 1 to
p). If the previous pricing
search was successful on
Aj,
Ij, the next search begins on
the segments
Aj+1,
Ij+1. (Subscripts are modulo
p.)
If a reduced gradient is found that is larger than some dynamic
tolerance, the variable with the largest such reduced gradient (of
appropriate sign) is selected to become superbasic. (Several may be
selected if multiple pricing has been specified.) If nothing is
found, the search continues on the next segments
Aj+2,
Ij+2, and so on.
Partial price t (or
t/2 or
t/3) may be appropriate for
time-stage models having
t time periods.
|
|
|
|
|
Pivot tolerance |
t |
Default = є2/3 ≈
10−11 |
Broadly speaking, the pivot tolerance is used to prevent columns
entering the basis if they would cause the basis to become almost
singular. When
x changes to
x + α
p for some search
direction
p, a “ratio test" is used to determine which component
of
x reaches an upper or lower bound first. The corresponding
element of
p is called the pivot element.
For linear problems, elements of
p are ignored (and therefore
cannot be pivot elements) if they are smaller than the pivot
tolerance
t. For nonlinear problems, elements smaller than
t
p
are ignored.
It is common for two or more variables to reach a bound at
essentially the same time. In such cases, the
Feasibility
tolerance provides some freedom to maximize the pivot element and
thereby improve numerical stability. Excessively small Feasibility
tolerances should therefore not be specified.
To a lesser extent, the
Expand frequency also provides some
freedom to maximize the pivot element. Excessively
large
Expand frequencies should therefore not be specified.
|
|
|
|
|
Scale option |
l |
Default = 2 (LP) or 1 (NLP) |
|
Scale |
Yes |
|
Scale |
No |
|
Scale linear variables |
|
Same as Scale option 1 |
|
Scale nonlinear variables |
|
Same as Scale option 2 |
|
Scale all variables |
|
Same as Scale option 2 |
|
Scale tolerance |
t |
Default = 0.9 |
|
Scale, Print |
|
Scale, Print, Tolerance |
t |
Three scale options are available as follows:
-
l = 0
- No scaling. If storage is at a premium, this option
saves m+n words of workspace.
- l = 1
- Linear constraints and variables are scaled by an
iterative procedure that attempts to make the
matrix coefficients as close as possible to 1.0 (see
Fourer, 1982). This sometimes improves the performance
of the solution procedures.
- l = 2
- All constraints and variables are scaled by the iterative
procedure. Also, a certain additional scaling is performed
that may be helpful if the right-hand side b or the
solution x is large. This takes into
account columns of ( A I ) that are fixed or have positive
lower bounds or negative upper bounds.
Scale Yes sets the default scaling. (
Caution: If all
variables are nonlinear,
Scale Yes unexpectedly does
nothing, because there are no linear variables to scale.)
Scale No suppresses scaling (equivalent to
Scale option
0).
If nonlinear constraints are present,
Scale option 1 or
0 should generally be tried at first.
Scale option 2 gives
scales that depend on the initial Jacobian, and should therefore be
used only if (a) a good starting point is provided, and (b) the
problem is not highly nonlinear.
Scale, Print causes the row-scales
r(
i) and column-scales
c(
j) to be printed. The scaled matrix coefficients are
aij =
aij c(
j) /
r(
i), and the scaled bounds on the
variables and slacks are
lj =
lj /
c(
j),
uj =
uj /
c(
j),
where
c(
j) ≡
r(
j−
n) if
j>
n.
All forms except
Scale option may specify a tolerance
t,
where 0 <
t < 1 (for example:
Scale, Print, Tolerance =
0.99). This affects how many passes might be needed through the
constraint matrix. On each pass, the scaling procedure computes the
ratio of the largest and smallest nonzero coefficients in each
column:
| ρj = |
|
|aij| / |
|
|aij| (aij≠ 0). |
If max
j ρ
j is less than
t times its previous value,
another scaling pass is performed to adjust the row and column
scales. Raising
t from 0.9 to 0.99 (say) usually increases the
number of scaling passes through
A. At most 10 passes are made.
If a
Scale option has not already been specified,
Scale,
Print or
Scale tolerance both set the default scaling.
|
|
|
|
|
Weight on linear objective |
w |
Default = 0.0 |
This keyword invokes the so-called
composite objective
technique, if the first solution obtained is infeasible, and if the
objective function contains linear terms. While trying to reduce the
sum of infeasibilities, the method also attempts to optimize the
linear objective. At each infeasible iteration, the objective
function is defined to be
|
σ w(cT x) + (sum of infeasibilities), |
where σ=1 for minimization, σ= −1 for maximization,
and
c is the linear objective. If an “optimal" solution is
reached while still infeasible,
w is reduced by a factor of 10.
This helps to allow for the possibility that the initial
w is too
large. It also provides dynamic allowance for the fact that the sum
of infeasibilities is tending towards zero.
The effect of
w is disabled after 5 such reductions, or if a
feasible solution is obtained.
The
Weight option is intended mainly for linear programs. It is
unlikely to be helpful on nonlinear problems.
6.2.3 Options for Nonlinear Objectives
The following options apply to nonlinear programs whose constraints
are linear.
|
|
|
|
|
Crash option |
l |
Default = 3 |
|
Crash tolerance |
t |
Default = 0.1 |
These options are the same as for linear programs.
6.2.4 Options for All Nonlinear problems
|
|
|
|
|
Expand frequency |
k |
Default = 10000 |
This option is used the same as for linear programs, but takes
effect only when there is just one superbasic variable. (Cycling can
occur only when the current solution is at a vertex of the feasible
region. Thus, zero steps are allowed if there is more than one
superbasic variable, but otherwise positive steps are enforced.)
Increasing
k helps reduce the number of slightly infeasible
nonbasic basic variables (most of which are eliminated during a
resetting procedure). However, it also diminishes the freedom to
choose a large pivot element (see
Pivot tolerance).
|
|
|
|
|
Feasibility tolerance |
t |
Default = 1.0e-6 |
When the constraints are linear, a
feasible solution is one
in which all variables, including slacks, satisfy their upper and
lower bounds to within the absolute tolerance
t. (Since slacks
are included, this means that the general linear constraints are
also satisfied to within
t.)
When nonlinear constraints are present, a
feasible
subproblem is one in which the linear constraints and bounds, as
well as the current linearization of the nonlinear constraints, are
satisfied to within the tolerance
t.
MINOS first attempts to satisfy the linear constraints and bounds.
If the sum of infeasibilities cannot be reduced to zero, the problem
is declared infeasible.
Normally, the nonlinear functions
F(
x) and
f(
x) are evaluated
only at points
x that satisfy the linear constraints and bounds.
If the functions are undefined in certain regions, every attempt
should be made to eliminate these regions from the problem. For
example, for a function
F(
x) = √
x1 + log
x2, it would be
essential to place lower bounds on both variables. If
Feasibility tolerance = 10
−6, the bounds
x1 ≥ 10
−5 and
x2 ≥ 10
−4 might be appropriate. (The log singularity is
more serious; in general, keep variables as far away from
singularities as possible.)
An exception is during optional gradient checking (see
Verify), which occurs before any optimization takes place. The
points at which the functions are evaluated satisfy the bounds but
not necessarily the general constraints. If this causes difficulty,
gradient checking should be suppressed by setting
Verify level
-1.
If a subproblem is infeasible, the bounds on the linearized
constraints are relaxed in several stages until the subproblem
appears feasible. (The true bounds are restored for the next
subproblem.) This approach sometimes allows the optimization to
proceed successfully. In general, infeasible subproblems are a
symptom of difficulty and it may be necessary to increase the
Penalty parameter or alter the starting point.
Note: Feasibility with respect to the nonlinear constraints
is measured against the
Row tolerance, not the
Feasibility tolerance.
|
|
|
|
|
Hessian dimension |
r |
Default = 50 |
This specifies that an
r ×
r triangular matrix
R is to be
available for use by the quasi-Newton algorithm (to approximate the
reduced Hessian matrix according to
ZT H Z ≈
RT R).
Suppose there are
s superbasic variables at a particular
iteration.
Whenever possible, r should be greater than s.
If
r ≥
s, the first
s columns of
R are used to approximate
the reduced Hessian in the normal manner. If there are no further
changes to the set of superbasic variables, the rate of convergence
is usually superlinear. If
r<
s, a matrix of the form
is used to approximate the reduced Hessian, where
Rr is an
r×
r upper triangular matrix and
D is a
diagonal
matrix of order
s−
r. The rate of convergence is no longer
superlinear (and may be arbitrarily slow).
The storage required is of order 1/2
r2, which is
substantial if
r is as large as 1000 (say). In general,
r
should be a slight over-estimate of the final number of superbasic
variables, whenever storage permits. It need not be larger than
n1 + 1, where
n1 is the number of nonlinear variables. For
many problems it can be much smaller than
n1.
|
|
|
|
|
Iterations limit |
k |
Default = 3m + 10n1 |
If the constraints are linear, this is the maximum number of
iterations allowed for the simplex method or the reduced-gradient
method. Otherwise, it is the maximum number of
minor
iterations, summed over all major iterations.
If
k = 0, no minor iterations are performed, but the starting
point is tested for both feasibility and optimality.
|
|
|
|
|
Linesearch tolerance |
t |
Default = 0.1 |
For nonlinear problems, this controls the accuracy with which a
steplength α is located during one-dimensional searches of
the form
|
F(x+α p)
subject to 0 < α ≤ β. |
A linesearch occurs on most minor iterations for which
x is
feasible. (If the constraints are nonlinear, the function being
minimized is the augmented Lagrangian in equation (5).)
The value of
t must satisfy 0.0 ≤
t < 1.0. The default value
t = 0.1 requests a moderately accurate search, and should be
satisfactory in most cases. If the nonlinear functions are cheap to
evaluate, a more accurate search may be appropriate; try
t = 0.01
or
t = 0.001. The number of iterations should decrease, and this
will reduce total run time if there are many linear or nonlinear
constraints. If the nonlinear functions are
expensive to
evaluate, a less accurate search may be appropriate; try
t = 0.5
or perhaps
t = 0.9. (The number of iterations will probably
increase, but the total number of function evaluations may decrease
enough to compensate.)
|
|
|
|
|
LU singularity tolerance |
t3 |
Default =
є0.67
≈ 3.25e−11 |
|
LU swap tolerance |
t4 |
Default = є1/4
≈ 10−4 |
The singularity tolerance
t3 helps guard against ill-conditioned
basis matrices. When the basis is refactorized, the diagonal
elements of
U are tested as follows: if |
Ujj| ≤
t3 or
|
Ujj| <
t3 max
i |
Uij|, the
j-th column of the basis is
replaced by the corresponding slack variable. (This is most likely
to occur after a restart, or at the start of a major iteration.)
In some cases, the Jacobian matrix may converge to values that make
the basis exactly singular. (For example, a whole row of the
Jacobian could be zero at an optimal solution.) Before exact
singularity occurs, the basis could become very ill-conditioned and
the optimization could progress very slowly (if at all). Setting
t3 =
1.0e-5, say, may help cause a judicious change of
basis.
The
LU swap tolerance is somewhat similar but can take effect
more
easily. It is again used only after a basis factorization, and
normally just at the start of a major iteration. If a diagonal
of
U seems to be rather small (as measured by
t4)
relative to the biggest diagonal of
U, a basis change is made
in which the basic variable associated with the small diagonal of
U
is swapped with a carefully chosen superbasic variable (if there
are any). The number of superbasic variables stays the same.
A message is printed to advise that a swap has occurred.
In practice this tends to help problems whose basis is becoming
ill-conditioned. If the number of swaps becomes excessive, set
LU swap tolerance 1.0e-6, say, or perhaps smaller.
|
|
|
|
|
Minor damping parameter |
d |
Default = 2.0 |
This parameter limits the change in
x during a linesearch. It
applies to all nonlinear problems, once a “feasible solution" or
“feasible subproblem" has been found.
A linesearch of the form
minimizeα F(
x+α
p) is
performed over the range 0 < α ≤ β, where β is
the step to the nearest upper or lower bound on
x. Normally, the
first steplength tried is α
1 = min(1,β), but in some
cases, such as
F(
x) =
aebx or
F(
x) =
axb, even a moderate
change in the components of
x can lead to floating-point overflow.
The parameter
d is therefore used to define a limit β =
d(1 +
x
) /
p
, and the first evaluation of
F(
x) is
at the potentially smaller steplength α
1 = min(1, β,
β).
Wherever possible, upper and lower bounds on
x should be used to
prevent evaluation of nonlinear functions at meaningless points. The
Minor damping parameter provides an additional safeguard. The
default value
d = 2.0 should not affect progress on well behaved
problems, but setting
d = 0.1 or 0.01 may be helpful when rapidly
varying functions are present. A “good" starting point may be
required. An important application is to the class of nonlinear
least-squares problems.
In cases where several local optima exist, specifying a small value
for
d may help locate an optimum near the starting point.
|
|
|
|
|
Multiple price |
k |
Default = 1 |
“Pricing" refers to a scan of the current nonbasic variables to
determine if any should be changed from their current value (by
allowing them to become superbasic or basic).
If multiple pricing is in effect, the
k best nonbasic variables
are selected for admission to the superbasic set. (“Best" means
the variables with largest reduced gradients of appropriate sign.)
If partial pricing is also in effect, the
k best variables are
selected from the current partition of
A and
I.
On large nonlinear problems it may help to set
k > 1 if there are
not many superbasic variables at the starting point but many at the
optimal solution.
|
|
|
|
|
Optimality tolerance |
t |
Default = 1.0e-6 |
|
|
|
|
|
Partial price |
p |
Default = 10 (LP) or 1 (NLP) |
This parameter may be useful for large problems that have
significantly more variables than constraints. Larger values reduce
the work required for each “pricing" operation (when a nonbasic
variable is selected to become basic or superbasic).
|
|
|
|
|
Scale option |
l |
Default = 2 (LP) or 1 (NLP) |
|
Scale |
Yes |
|
Scale |
No |
|
Scale linear variables |
|
Same as Scale option 1 |
|
Scale nonlinear variables |
|
Same as Scale option 2 |
|
Scale all variables |
|
Same as Scale option 2 |
|
Scale tolerance |
t |
Default = 0.9 |
|
Scale, Print |
|
Scale, Print, Tolerance |
t |
Three scale options are available as follows:
-
l = 0
- No scaling. If storage is at a premium, this option
saves m+n words of workspace.
- l = 1
- If some of the variables are linear, the constraints
and linear variables are scaled by an
iterative procedure that attempts to make the
matrix coefficients as close as possible to 1.0.
- l = 2
- The constraints and variables are scaled by the iterative
procedure. Also, a certain additional scaling is performed
that may be helpful if the right-hand side b or the
solution x is large. This takes into
account columns of ( A I ) that are fixed or have positive
lower bounds or negative upper bounds.
Scale option 1 is the default for nonlinear problems. (Only
linear variables are scaled.)
Scale Yes sets the default. (
Caution: If all variables
are nonlinear,
Scale Yes unexpectedly does
nothing,
because there are no linear variables to scale.)
Scale No
suppresses scaling (equivalent to
Scale option 0).
The
Scale tolerance and
Scale, Print options are the
same as for linear programs.
|
|
|
|
|
Subspace tolerance |
t |
Default = 0.5 |
This controls the extent to which optimization is confined to the
current set of basic and superbasic variables (Phase 4 iterations),
before one or more nonbasic variables are added to the superbasic
set (Phase 3). The value specified must satisfy 0 <
t ≤ 1.
When a nonbasic variable
xj is made superbasic, the norm of the
reduced-gradient vector (for all superbasics) is recorded. Let this
be
ZT g0
. (In fact, the norm is |
dj|, the size of the
reduced gradient for the new superbasic variable
xj.)
Subsequent Phase 4 iterations continue at least until the norm of
the reduced-gradient vector satisfies
ZT g
≤
t ×
ZT g0
. (
ZT g
is the size of the largest
reduced-gradient among the superbasic variables.)
A smaller value of
t is likely to increase the total number of
iterations, but may reduce the number of basis changes. A larger
value such as
t = 0.9 may sometimes lead to improved overall
efficiency, if the number of superbasic variables is substantially
larger at the optimal solution than at the starting point.
Other convergence tests on the change in the function being
minimized and the change in the variables may prolong Phase 4
iterations. This helps to make the overall performance insensitive
to larger values of
t.
|
|
|
|
|
Superbasics limit |
s |
Default = 50 |
This places a limit on the storage allocated for superbasic
variables. Ideally,
s should be set slightly larger than the
“number of degrees of freedom" expected at an optimal solution.
For nonlinear problems, the number of degrees of freedom is often
called the “number of independent variables". Normally,
s need
not be greater than
n1+1, where
n1 is the number of nonlinear
variables. For many problems,
s may be considerably smaller than
n1. This saves storage if
n1 is very large.
If
Hessian dimension r is specified, the default value of
s is the same number (and conversely). This is a safeguard to
ensure superlinear convergence wherever possible. Otherwise, the
default for both
r and
s is 50.
|
|
|
|
|
Unbounded objective value |
r1 |
Default = 1.0e+20 |
|
Unbounded step size |
r2 |
Default = 1.0e+10 |
These parameters are intended to detect unboundedness in nonlinear
problems. During a linesearch of the form
min
α F(
x + α
p) ,
if |
F | exceeds
r1 or if α exceeds
r2, iterations
are terminated with the exit message
Problem is unbounded (or
badly scaled).
If singularities are present, unboundedness in
F(
x) may be
manifested by a floating-point overflow (during the evaluation of
F(
x + α
p)), before the test against
r1 can be made.
Unboundedness in
x is best avoided by placing finite upper and
lower bounds on the variables. See also the
Minor damping
parameter.
|
|
|
|
|
Verify level |
l |
Default = 0 |
|
Verify objective gradients |
|
Same as Verify level 1 |
|
Verify constraint gradients |
|
Same as Verify level 2 |
|
Verify |
|
Same as Verify level 3 |
|
Verify gradients |
|
Same as Verify level 3 |
|
Verify |
Yes |
Same as Verify level 3 |
|
Verify |
No |
Same as Verify level 0 |
These options refer to a check on the gradients computed by your
nonlinear function routines
funobj and
funcon at the starting
point (the initial value of the nonlinear variables
x(*)).
Values output in the gradient array
g(*) are compared with
estimates obtained by finite differences.
-
l = 0
- Only a “cheap" test is performed, requiring
three evaluations of the nonlinear objective (if any) and two
evaluations of the nonlinear constraints.
- l = 1
- A more reliable check is made on each component
of the objective gradient.
- l = 2
- A check is made on each column of the Jacobian matrix
associated with the nonlinear constraints.
- l = 3
- A detailed check is made on both the objective and
the Jacobian.
- l = −1
- No checking is performed. This may be necessary
if the functions are undefined at the starting point.
Verify level 3 is recommended for a new function routine,
particularly if the “cheap" test indicates error. Missing gradients
are not checked (so there is no overhead). If there are many
nonlinear variables, the
Start and
Stop keywords may be used
to limit the check to a subset.
As noted, gradient verification occurs at the starting point, before
a problem is scaled, and before the first basis is factorized. The
bounds on
x will be satisfied, but the general linear constraints
may not. If the nonlinear objective or constraint functions are
undefined, you could initially specify
Objective = NONE
Nonlinear objective variables 0
Major iterations 1
New basis file 11 (say)
Verify level -1
to obtain a point that satisfies the linear constraints, and then
restart with the correct linear and nonlinear objective, along with
Old basis file 11
Verify level 3
6.2.5 Options for Nonlinear Constraints
The following options apply to problems with nonlinear constraints.
|
|
|
|
|
Completion |
Partial |
Default |
|
Completion |
Full |
When there are nonlinear constraints, this determines whether
subproblems should be solved to moderate accuracy (partial
completion), or to full accuracy (full completion). MINOS
implements the option by using two sets of convergence tolerances
for the subproblems.
Use of partial completion may reduce the work during early major
iterations, unless the
Minor iterations limit is active. The
optimal set of basic and superbasic variables will probably be
determined for any given subproblem, but the reduced gradient may be
larger than it would have been with full completion.
An automatic switch to full completion occurs when it appears that
the sequence of major iterations is converging. The switch is made
when the nonlinear constraint error is reduced below 100
* (
Row tolerance), the relative change in λ
k
is 0.1 or less, and the previous subproblem was solved to
optimality.
Full completion tends to give better Lagrange-multiplier estimates.
It may lead to fewer major iterations, but may result in more minor
iterations.
|
|
|
|
|
Crash option |
l |
Default = 3 |
|
Crash tolerance |
t |
Default = 0.1 |
Let
A refer to the linearized constraint matrix.
-
l = 0
- The initial basis contains only slack variables: B = I.
- l = 1
- A is evaluated at the starting point.
Crash is called once, looking for a triangular basis
in all rows and columns of A.
- l = 2
- A is evaluated only after the linear constraints are satisfied.
Crash is called twice. The first call looks
for a triangular basis in linear rows.
The first major iteration proceeds with simplex-type iterations
until the linear constraints are satisfied.
A is then evaluated for the second major iteration
and Crash is called again to find a triangular basis in the
nonlinear rows (retaining the current basis for linear
rows).
- l = 3
- Crash is called three times, treating
linear equalities and linear inequalities separately,
with simplex-type iterations in between.
As before, the last call treats nonlinear rows
at the start of the second major iteration.
|
|
|
|
|
Feasibility tolerance |
t |
Default = 1.0e-6 |
A “feasible subproblem" is one in which the linear constraints and
bounds, as well as the current linearization of the nonlinear
constraints, are satisfied to within
t.
Note that feasibility with respect to the nonlinear constraints is
determined by the
Row tolerance (not the
Feasibility
tolerance).
MINOS first attempts to satisfy the linear constraints and bounds.
If the sum of infeasibilities cannot be reduced to zero, the problem
is declared infeasible.
If
Scale option = 1 or 2, feasibility is defined in terms of
the scaled problem (since it is then more likely to be meaningful).
Normally, the nonlinear functions
F(
x) and
f(
x) are evaluated
only at points
x that satisfy the linear constraints and bounds.
If the functions are undefined in certain regions, every attempt
should be made to eliminate these regions from the problem. For
example, for a function
F(
x) = √
x1 + log
x2, it would be
essential to place lower bounds on both variables. If
Feasibility tolerance = 10
−6, the bounds
x1 ≥ 10
−5 and
x2 ≥ 10
−4 might be appropriate. (The log singularity is
more serious; in general, keep variables as far away from
singularities as possible.)
An exception is during optional gradient checking (see
Verify), which occurs before any optimization takes place. The
points at which the functions are evaluated satisfy the bounds but
not necessarily the general constraints. If this causes difficulty,
gradient checking should be suppressed by setting
Verify level
-1.
If a subproblem is infeasible, the bounds on the linearized
constraints are relaxed in several stages until the subproblem
appears feasible. (The true bounds are restored for the next
subproblem.) This approach sometimes allows the optimization to
proceed successfully. In general, infeasible subproblems are a
symptom of difficulty and it may be necessary to increase the
Penalty parameter or alter the starting point.
|
|
|
|
|
Lagrangian |
Yes |
Default |
|
Lagrangian |
No |
This determines the form of the objective function used for the
linearized subproblems. The default value
Yes is highly
recommended. The
Penalty parameter value is then also
relevant.
If
No is specified, the nonlinear constraint functions are
evaluated only twice per major iteration. Hence this option may be
useful if the nonlinear constraints are very expensive to evaluate.
However, in general there is a great risk that convergence may not
be achieved.
|
|
|
|
|
Major damping parameter |
d |
Default = 2.0 |
This parameter may assist convergence on problems that have highly
nonlinear constraints. It is intended to prevent large relative
changes between subproblem solutions (
xk,λ
k) and
(
xk+1,λ
k+1). For example, the default value 2.0
prevents the relative change in either
xk or λ
k from
exceeding 200 per cent. It will not be active on well behaved
problems. (If all components of
xk or λ
k are small, the
norms of those vectors will not be allowed to increase beyond about
2.0.)
The parameter is used to interpolate between the solutions at the
beginning and end of each major iteration. Thus,
xk+1 and
λ
k+1 are changed to
xk + σ ( xk+1 − xk) and
λk + σ (λk+1 − λk)
for some steplength σ < 1. In the case of nonlinear
equations (where the number of constraints is the same as the number
of variables) this gives a
damped Newton method.
This is a very crude control. If the sequence of major iterations
does not appear to be converging, one should first re-run the
problem with a higher
Penalty parameter (say 2, 4 or 10).
(Skip this re-run in the case of a square system of nonlinear
equations: there are no degrees of freedom and the
Penalty
parameter value has essentially no effect.)
If the subproblem solutions continue to change violently, try
reducing
d to 0.2 or 0.1 (say).
|
|
|
|
|
Major iterations |
k |
Default = 50 |
This is the maximum number of major iterations allowed. It is
intended to guard against an excessive number of linearizations of
the nonlinear constraints, since in some cases the sequence of major
iterations may not converge.
The progress of the major iterations can be best monitored using
Print level 0 (the default).
|
|
|
|
|
Minor iterations |
k |
Default = 40 |
This is the maximum number of minor iterations allowed during a
major iteration,
after the linearized constraints for that
subproblem have been satisfied. (An arbitrary number of minor
iterations may be needed to find a feasible point for each
subproblem.) The
Iterations limit provides an independent
limit on the total minor iterations (across all subproblems).
A moderate value (e.g., 30 ≤
k ≤ 200) prevents excessive
effort being expended on early major iterations, but allows later
subproblems to be solved to completion.
The first major iteration is special: it terminates as soon as the
linear constraints and bounds are satisfied (if possible), ignoring
the nonlinear constraints.
In general it is unsafe to specify a value as small as
k = 1 or 2.
(Even when an optimal solution has been reached, a few minor
iterations may be needed for the corresponding subproblem to be
recognized as optimal.)
|
|
|
|
|
Optimality tolerance |
t |
Default = 1.0e-6 |
|
|
|
|
|
Penalty parameter |
r |
Default = 1.0 |
This specifies that the initial value of ρ
k in the augmented
Lagrangian (5) should be
r times a certain default value
100/
m1, where
m1 is the number of nonlinear constraints. It is
used when
Lagrangian = Yes (the default setting).
For early runs on a problem with unknown characteristics, the
default value should be acceptable. If the problem is highly
nonlinear and the major iterations do not converge, a larger value
such as 2 or 5 may help. In general, a positive
r may be necessary
to ensure convergence,
even for convex programs.
On the other hand, if
r is too large, the rate of convergence may
be unnecessarily slow. If the functions are not highly nonlinear or
a good starting point is known, it is often safe to specify
Penalty parameter 0.0.
If several related problems are to be solved, the following strategy
for setting the
Penalty parameter may be useful:
-
Initially, use a moderate value for r (such as the default) and
a reasonably low Iterations and/or Major iterations limit.
- If successive major iterations appear to be terminate with radically
different solutions, try increasing the penalty parameter.
(See also the Major damping parameter.)
- If there appears to be little progress between major iterations,
it may help to reduce the penalty parameter.
|
|
|
|
|
Radius of convergence |
r |
Default = 0.01 |
This determines when the penalty parameter ρ
k is reduced (if
initialized to a positive value). Both the nonlinear constraint
violation (see
rowerr below) and the relative change in
consecutive Lagrange multiplier estimates must be less than
r at
the start of a major iteration before ρ
k is reduced or set to
zero.
A few major iterations later, full completion is requested if not
already set, and the remaining sequence of major iterations should
converge quadratically to an optimum.
|
|
|
|
|
Row tolerance |
t |
Default = 1.0e-6 |
This specifies how accurately the nonlinear constraints should be
satisfied at a solution. The default value is usually small enough,
since model data is often specified to about that accuracy.
Let
viol be the maximum violation of the nonlinear constraints
(2), and let
rowerr =
viol / (1 +
xnorm), where
xnorm is a
measure of the size of the current solution (
x,
y). The solution is
regarded as acceptably feasible if
rowerr ≤
t.
If the problem functions involve data that is known to be of low
accuracy, a larger
Row tolerance may be appropriate. On the
other hand, nonlinear constraints are often satisfied with rapidly
increasing accuracy during the last few major iterations. It is
common for the final solution to satisfy
rowerr =
O(є).
|
|
|
|
|
Scale option |
l |
Default = 2 (LP) or 1 (NLP) |
|
Scale |
Yes |
|
Scale |
No |
|
Scale linear variables |
|
Same as Scale option 1 |
|
Scale nonlinear variables |
|
Same as Scale option 2 |
|
Scale all variables |
|
Same as Scale option 2 |
|
Scale tolerance |
r |
Default = 0.9 |
|
Scale, Print |
|
Scale, Print, Tolerance |
r |
Three scale options are available as follows:
-
l = 0
- No scaling. If storage is at a premium, this option
saves m+n words of workspace.
- l = 1
- Linear constraints and variables are scaled by an
iterative procedure that attempts to make the
matrix coefficients as close as possible to 1.0.
- l = 2
- All constraints and variables are scaled by the iterative
procedure. Also, a certain additional scaling is performed
that may be helpful if the right-hand side b or the
solution x is large. This takes into
account columns of ( A I ) that are fixed or have positive
lower bounds or negative upper bounds.
Scale option 1 is the default for nonlinear problems. (Only
linear variables are scaled.)
Scale Yes sets the default scaling.
Caution: If all
variables are nonlinear,
Scale Yes unexpectedly does nothing,
because there are no linear variables to scale.)
Scale No
suppresses scaling (equivalent to
Scale option 0).
With nonlinear constraints,
Scale option 1 or
0 should
generally be tried first.
Scale option 2 gives scales that
depend on the initial Jacobian, and should therefore be used only if
(a) a good starting point is provided, and (b) the problem is not
highly nonlinear.
|
|
|
|
|
Verify level |
l |
Default = 0 |
|
Verify objective gradients |
|
Same as Verify level 1 |
|
Verify constraint gradients |
|
Same as Verify level 2 |
|
Verify |
|
Same as Verify level 3 |
|
Verify gradients |
|
Same as Verify level 3 |
|
Verify |
Yes |
Same as Verify level 3 |
|
Verify |
No |
Same as Verify level 0 |
This option refers to a finite-difference check on the gradients
(first derivatives) of each nonlinear function. It occurs before a
problem is scaled, and before the first basis is factorized.
(Hence, the variables may not yet satisfy the general linear
constraints.)
-
l = 0
- Only a “cheap" test is performed, requiring
three evaluations of the nonlinear objective (if any) and two
evaluations of the nonlinear constraints.
- l = 1
- A more reliable check is made on each component
of the objective gradient.
- l = 2
- A check is made on each column of the Jacobian matrix
associated with the nonlinear constraints.
- l = 3
- A detailed check is made on both the objective and
the Jacobian.
- l = −1
- No checking is performed. This may be necessary
if the functions are undefined at the starting point.
6.2.6 Options for Input and Output
The following options specify various files to be used, and the
amount of printed output required.
|
|
|
|
|
Print level |
l |
Default = 0 |
|
Print frequency |
k |
Default = 100 |
The PRINT file provides more complete information than the
SUMMARY file. It includes a listing of the main options,
statistics about the problem, scaling information, the iteration
log, the exit condition, and (optionally) the final solution. It
also includes error messages.
The files are specified in
Prob.SOL.PrintFile and
Prob.SOL.SummFile.
For problems with linear con