# TOMNET  
# REGISTER (TOMNET)
# LOGIN  
# myTOMNET
TOMLAB LOGO

« Previous « Start » Next »

4  MINOS details

4.1  Introduction

TOMNET /MINOS (hereafter referred to as MINOS) is a linear and nonlinear programming system, designed to solve large-scale constrained optimization problems of the following form:
 
min
x, y
  F(x) + cT x + dT y  
s/t b1 < f(x) + A1 y < b2
  b3 < A2x + A3 y < b4
  l < (x, y) < u
    (7)


where the vectors bi, c, d, l, u and the matrices Ai are constant, F(x) is a nonlinear function of some of the variables, and f(x) is a vector of nonlinear functions. The nonlinearities (if present) may be of a general nature but must be smooth and preferably “almost linear", in the sense that they should not change radically with small changes in the variables. We make the following definitions:

x the nonlinear variables
y the linear variables
(x, y) the vector (x,y)T .
(1.1) the objective function
(1.2) the nonlinear constraints
(1.3) the linear constraints
(1.4) the bounds on the variables
m the total number of general constraints in (2) and (3)
n the total number of variables in x and y
m1 the number of nonlinear constraints (the dimension of f(x))
n1 the number of nonlinear variables (the dimension of x)
n1 the number of nonlinear objective variables (in F(x))
n1′′ the number of nonlinear Jacobian variables (in f(x))


A large-scale problem is one in which m and n are several hundred or several thousand. MINOS takes advantage of the fact that the constraint matrices Ai and the partial derivatives ∂ fi(x) / ∂ xj are typically sparse (contain many zeros).

The dimensions n1 and n1′′ allow for the fact that F(x) and f(x) may involve different sets of nonlinear variables “x". The two sets of variables always overlap, in the sense that the shorter “x" is always the same as the beginning of the other. If x is the same in both cases, we have n1 = n1 = n1′′. Otherwise, we define the number of nonlinear variables to be n1 = max(n1,n1′′).

In the following sections we introduce more terminology and give an overview of the MINOS optimization algorithms and the main system features.

4.1.1  Linear Programming

When F(x) and f(x) are absent, the problem becomes a linear program. Since there is no need to distinguish between linear and nonlinear variables, we use x rather than y. We also convert all general constraints into equalities with the aid of slack variables s, so that the only inequalities are simple bounds on the variables. Thus, we write linear programs in the form
 
minimize
x, s
cT x subject to Ax + s = b,    l ≤ (x, s) ≤ u.     (8)


When the constraints are linear, the bounds on the slacks are defined so that b = 0. When there are nonlinear constraints, some elements of b are nonzero.

In the mathematical programming world, x and s are sometimes called structural variables and logical variables. Their upper and lower bounds are fundamental to problem formulations and solution algorithms. Some of the components of l may be −∞ and those of u may be +∞. If lj = uj, a variable is said to be fixed, and if its bounds are −∞ and +∞, the variable is called free.

Within MINOS, a point (x, s) is said to be feasible if the following are true:
  • The constraints Ax + s = b are satisfied to within machine precision ≈ 10−15.
  • The bounds l ≤ (x, s) ≤ u are satisfied to within a feasibility tolerance δfea ≈ 10−6.
  • The nonlinear constraints (7) are satisfied to within a row tolerance δrow ≈ 10−6.
Tolerances such as δfea and δrow may be specified by setting Feasibility tolerance and Row tolerance.





MINOS solves linear programs using a reliable implementation of the primal simplex method, in which the constraints Ax + s = b are partitioned into the form
Bx B + Nx N = b,     (9)
where the basis matrix B is a square and nonsingular submatrix of ( A I ). The elements of x B and x N are called the basic and nonbasic variables respectively. Together, they are a permutation of the vector (x, s). Certain dual variables π and reduced costs d N are defined by the equations
BTπ = c B,      d N = c NNTπ,     (10)
where (c B, c N) is a permutation of the objective vector (c, 0).

At a feasible point, nonbasic variables are typically equal to one of their bounds, and basic variables are somewhere between their bounds. To reach an optimal solution, the simplex method performs a sequence of iterations of the following general nature. With guidance from d N, a nonbasic variable is chosen to move from its current value, and the basic variables are adjusted to satisfy the constraints in (8). Usually one of the basic variables reaches a bound. The basis partition is then redefined with a column of B being replaced by a column of N. When no such interchange can be found to reduce the value of cT x, the current solution is optimal.

The simplex method

For convenience, let x denote the variables (x, s). The main steps in a simplex iteration are as follows:
Compute dual variables:
Solve BTπ = c B.
Price:
Compute some or all of the reduced costs d N = c NNTπ to determine if a favorable nonbasic column aq exists.
Compute search direction:
Solve Bp B = ± aq to determine the basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are p N = 0, except for ± 1 for the element corresponding to aq.)
Find maximum steplength:
Find the largest steplength αmax such that x + αmax p continues to satisfy the bounds on the variables. The steplength may be determined by the new nonbasic variable reaching its opposite bound, but normally some basic variable will reach a bound first.
Update:
Take the step αmax. If this was determined by a basic variable, interchange the corresponding column of B with column aq from N.
When a starting basis is chosen and the basic variables x B are first computed, if any components of x B lie significantly outside their bounds, we say that the current point is infeasible. In this case, the simplex method uses a “Phase 1" procedure to reduce the sum of infeasibilities. This is similar to the subsequent “Phase 2" procedure just described.

The feasibility tolerance δfea is used to determine which Phase is in effect. A similar optimality tolerance δopt is used during pricing to judge whether any reduced costs are significantly large. (This tolerance is scaled by ∥π∥, a measure of the size of the current π.)

If the solution procedures are interrupted, some of the nonbasic variables may lie strictly between their bounds: lj < xj < uj. In addition, at a “feasible" or “optimal" solution, some of the basic variables may lie slightly outside their bounds: lj − δfeaxjuj + δfea. In rare cases, even a few nonbasic variables might lie outside their bounds by as much as δfea.

MINOS maintains a sparse LU factorization of the basis matrix B, using a Markowitz ordering scheme and Bartels-Golub updates, as implemented in the Fortran package LUSOL The basis factorization is central to the efficient handling of sparse linear and nonlinear constraints.

4.1.2  Problems with a Nonlinear Objective

When nonlinearities are confined to the term F(x) in the objective function, the problem is a linearly constrained nonlinear program. MINOS solves such problems using a reduced-gradient method combined with a quasi-Newton method that generally leads to superlinear convergence. The implementation follows that described in Murtagh and Saunders.

As a slight generalization of (9), the constraints Ax + s = b are partitioned into the form
Bx B + Sx S + Nx N = b,     (11)
where x S is a set of superbasic variables. As before, the nonbasic variables are normally equal to one of their bounds, while the basic and superbasic variables lie somewhere between their bounds (to within δfea). Let the number of superbasic variables be n S, the number of columns in S. At a solution, n S will be no more than n1, the number of nonlinear variables, and it is often much smaller than this. In many real-life cases we have found that n S remains reasonably small, say 200 or less, regardless of the size of the problem. This is one reason why MINOS has proved to be a practical tool.

In the reduced-gradient method, x S is regarded as a set of “independent variables" that are allowed to move in any desirable direction to reduce the objective function (or the sum of infeasibilities). The basic variables are then adjusted in order to continue satisfying the linear constraints. If it appears that no improvement can be made with the current definition of B, S and N, one of the nonbasic variables is selected to be added to S, and the process is repeated with an increased value of n S. At all stages, if a basic or superbasic variable encounters one of its bounds, that variable is made nonbasic and the value of n S is reduced by one.

For linear programs, we may interpret the simplex method as being the same as the reduced-gradient method, with the number of superbasic variables oscillating between 0 and 1. (In general, a step of the simplex method or the reduced-gradient method is called a minor iteration.)

A certain matrix Z is needed for descriptive purposes. It takes the form
Z = ⎛
⎜
⎜
⎝
B−1 S
I
0
⎞
⎟
⎟
⎠
,     (12)
though it is never computed explicitly. Given LU factors of the basis matrix B, it is possible to compute products of the form Z q and ZT g by solving linear equations involving B or BT. This in turn allows optimization to be performed on the superbasic variables, while the basic variables are adjusted to satisfy the general linear constraints. (In the description below, the reduced-gradient vector satisfies d S = ZT g, and the search direction satisfies p = Zp S.)

An important part of MINOS is the quasi-Newton method used to optimize the superbasic variables. This can achieve superlinear convergence during any sequence of iterations for which the B, S, N partition remains constant. It requires a dense upper-triangular matrix R of dimension n S, which is updated in various ways to approximate the reduced Hessian:
RT RZT H Z,     (13)
where H is the Hessian of the objective function, i.e. the matrix of second derivatives of F(x). As for unconstrained optimization, the storage required for R is sometimes a limiting factor.

The reduced-gradient method

Let g be the gradient of the nonlinear objective (7). The main steps in a reduced-gradient iteration are as follows:
Compute dual variables and reduced gradient:
Solve BTπ = g B and compute the reduced-gradient vector d S = g SSTπ.
Price:
If ∥d S∥ is sufficiently small, compute some or all of the reduced costs d N = g NNTπ to determine if a favorable nonbasic column aq exists. If so, move that column from N into S, expanding R accordingly.
Compute search direction:
Solve RT R p S = − d S and Bp B = − Sp S to determine the superbasic and basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are p N = 0.)
Find maximum steplength:
Find the largest steplength αmax such that x + αmax p continues to satisfy the bounds on the variables.
Perform linesearch:
Find an approximate solution to the one-dimensional problem
 
minimize
α
F(xp) subject to 0 ≤ α ≤ αmax.
Update (quasi-Newton):
Take the step α. Apply a quasi-Newton update to R to account for this step.
Update (basis change):
If a superbasic variable reached a bound, move it from S into N. If a basic variable reached a bound, find a suitable superbasic variable to move from S into B, and move the basic variable into N. Update R if necessary.
At an optimum, the reduced gradient d S should be zero. MINOS terminates when ∥d S∥ ≤ δopt∥π∥ and the reduced costs (component of d N) are all sufficiently positive or negative, as judged by the same quantity δopt∥π∥.

In the linesearch, F(xp) really means the objective function (7) evaluated at the point (x, y, s) + α p. This steplength procedure is another important part of MINOS. Two different procedures are used, depending on whether or not all gradients are known analytically. The number of nonlinear function evaluations required may be influenced by setting the Linesearch tolerance in the SPECS file.

Normally, the objective function F(x) will never be evaluated at a point x unless that point satisfies the linear constraints and the bounds on the variables. An exception is during a finite-difference check on the calculation of gradient elements. This check is performed at the starting point x0, which takes default values or may be specified. MINOS ensures that the bounds lx0u are satisfied, but in general the starting point will not satisfy the general linear constraints. If F(x0) is undefined, the gradient check should be suppressed (Verify level -1), or the starting point should be redefined.

4.1.3  Problems with Nonlinear Constraints

If any of the constraints are nonlinear, MINOS employs a projected Lagrangian algorithm, based on a method due to Robinson, see Murtagh and Saunders. This involves a sequence of major iterations, each of which requires the solution of a linearly constrained subproblem. Each subproblem contains linearized versions of the nonlinear constraints, as well as the original linear constraints and bounds.

At the start of the k-th major iteration, let (xk, yk) be an estimate of the variables, and let λk be an estimate of the Lagrange multipliers (dual variables) associated with the nonlinear constraints. The constraints are linearized by changing f(x) in Equation (2) to its linear approximation:
f
(x, xk) = f(xk) + J(xk)(xxk),
or more briefly f = fk + Jk(xxk), where J(xk) is the Jacobian matrix evaluated at xk. (The i-th row of the Jacobian is the gradient vector for the i-th nonlinear constraint function.) The subproblem to be solved during the k-th major iteration is then
 
min
x, y
F(x) + cT x + dT y − λkT fd +
1
2
ρ ∥ fd ∥2
s/t b1
f
+ A1y
b2
b3 A2x + A3y b4,
l ≤ (x, y) ≤ u,
    (14)
where fd = ff is the difference between f(x) and its linearization. The objective function (14) is called an augmented Lagrangian. The scalar ρk is a penalty parameter, and the term involving ρk is a modified quadratic penalty function. MINOS uses the reduced-gradient method to solve each subproblem. As before, slack variables are introduced and the vectors bi are incorporated into the bounds on the slacks. The linearized constraints take the form
⎛
⎝
Jk A1
A2 A3
⎞
⎠
⎛
⎝
x
y
⎞
⎠
+ ⎛
⎝
s1
s2
⎞
⎠
= ⎛
⎝
Jk xkfk
0
⎞
⎠
.
We refer to this system as Ax + s = b as in the linear case. The Jacobian Jk is treated as a sparse matrix, the same as the matrices A1, A2 and A3. The quantities Jk, b, λk and ρk change each major iteration.

The projected Lagrangian method

For convenience, suppose that all variables and constraints are nonlinear. The main steps in a major iteration are as follows:
Solve subproblem:
Find an approximate solution (x, λ) to the kth subproblem (14)–(14).
Compute search direction:
Adjust the elements of λ if necessary (if they have the wrong sign). Define a search direction (Δ x, Δλ) = (xxk, λ − λk).
Find steplength:
Choose a steplength σ such that some merit function M(x,λ) has a suitable value at the point (xk + σΔ x, λk + σΔλ).
Update:
Take the step σ to obtain (xk+1, λk+1). In some cases, adjust ρk.
For the first major iteration, the nonlinear constraints are ignored and minor iterations are performed until the original linear constraints are satisfied.

The initial Lagrange multiplier estimate is typically λk = 0 (though it can be provided by the user). If a subproblem terminates early, some elements of the new estimate λ may be changed to zero.

The penalty parameter initially takes a certain default value ρk = 100.0/m1, where m1 is the number of nonlinear constraints. (A value r times as big is obtained by specifying Penalty parameter r.) For later major iterations, ρk is reduced in stages when it appears that the sequence {xkk} is converging. In many cases it is safe to specify Penalty parameter 0.0 at the beginning, particularly if a problem is only mildly nonlinear. This may improve the overall efficiency.

In the output from MINOS, the term Feasible subproblem indicates that the linearized constraints (14)–(14) have been satisfied. In general, the nonlinear constraints are satisfied only in the limit, so that feasibility and optimality occur at essentially the same time. The nonlinear constraint violation is printed every major iteration. Even if it is zero early on (say at the initial point), it may increase and perhaps fluctuate before tending to zero. On “well behaved" problems, the constraint violation will decrease quadratically (i.e., very quickly) during the final few major iterations.

For certain rare classes of problem it is safe to request the values λk = 0 and ρk=0 for all subproblems by specifying Lagrangian = No (in which case the nonlinear constraint functions are evaluated only once per major iteration). However for general problems, convergence is much more likely with the default setting, Lagrangian = Yes.

The merit function

Unfortunately, it is not known how to define a merit function M(x,λ) that can be reduced at every major iteration. As a result, there is no guarantee that the projected Lagrangian method described above will converge from an arbitrary starting point. This has been the principal theoretical gap in MINOS, finally resolved by the PhD research of Michael Friedlander. The main features needed to stabilize MINOS are:
  • To relax the linearized constraints via an ℓ1 penalty function.
  • To repeat a major iteration with increased ρk (and more relaxed linearized constraints) if the nonlinear constraint violation would increase too much.
In practice, the method of MINOS 5.51 often does converge and a good rate of convergence is often achieved in the final major iterations, particularly if the constraint functions are “nearly linear". As a precaution, MINOS prevents radical changes from one major iteration to the next. Where possible, the steplength is chosen to be σ = 1, so that each new estimate of the solution is (xk+1, λk+1) = (x, λ), the solution of the subproblem. If this point is “too different", a shorter steplength σ < 1 is chosen.

If the major iterations for a particular model do not appear to be converging, some of the following actions may help:
  1. Specify initial activity levels for the nonlinear variables as carefully as possible.
  2. Include sensible upper and lower bounds on all variables.
  3. Specify a Major damping parameter that is lower than the default value. This tends to make σ smaller.
  4. Specify a Penalty parameter that is higher than the default value. This tends to prevent excessive departures from the constraint linearization.

4.1.4  Problem Formulation

In general, it is worthwhile expending considerable prior analysis to make the constraints completely linear if at all possible. Sometimes a simple transformation will suffice. For example, a pipeline optimization problem has pressure drop constraints of the form
K1
d14.814
+
K2
d24.814
+ ...  ≤ PT2P02,
where di are the design variables (pipe diameters) and the other terms are constant. These constraints are highly nonlinear, but by redefining the decision variables to be xi = 1/di4.814 we can make the constraints linear. Even if the objective function becomes more nonlinear by such a transformation (and this usually happens), the advantages of having linear constraints greatly outweigh this.

Similarly, it is usually best not to move nonlinearities from the objective function into the constraints. For example, we should not replace “minimize F(x)" by
minimize z    subject to    F(x) − z = 0.


Scaling is a very important matter during problem formulation. A general rule is to scale both the data and the variables to be as close to 1.0 as possible. In general we suggest the range 1.0 to 10.0. When conflicts arise, one should sacrifice the objective function in favor of the constraints. Real-world problems tend to have a natural scaling within each constraint, as long as the variables are expressed in consistent physical units. Hence it is often sufficient to apply a scale factor to each row. MINOS has options to scale the rows and columns of the constraint matrix automatically. By default, only the linear rows and columns are scaled, and the procedure is reliable. If you request that the nonlinear constraints and variables be scaled, bear in mind that the scale factors are determined by the initial Jacobian J(x0), which may differ considerably from J(x) at a solution.

Finally, upper and lower bounds on the variables (and on the constraints) are extremely useful for confining the region over which optimization has to be performed. If sensible values are known, they should always be used. They are also important for avoiding singularities in the nonlinear functions. Note that bounds may be violated slightly by as much as the feasibility tolerance δfea. Hence, if √x2 or logx2 appear (for example) and if δfea = 10−6, the lower bound on x2 would normally have to be at least 10−5. If it is known that x2 will be at least 0.5 (say) at a solution, then its lower bound should be 0.5.




For a detailed discussion of many aspects of numerical optimization, see Gill, Murray and Wright for much invaluable advice on problem formulation and assessment of results.

4.1.5  Restrictions

MINOS is designed to find solutions that are locally optimal. The nonlinear functions in a problem must be smooth (i.e., their first derivatives must exist), especially near the desired solution. The functions need not be separable.

A certain “feasible" region is defined by the general constraints and the bounds on the variables. If the objective is convex within this region and if the feasible region itself is convex, any optimal solution obtained will be a global optimum. Otherwise there may be several local optima, and some of these may not be global. In such cases the chances of finding a global optimum are usually increased by choosing a starting point that is “sufficiently close", but there is no general procedure for determining what “close" means, or for verifying that a given local optimum is indeed global.

Integer restrictions cannot be imposed directly. If a variable xj is required to be 0 or 1, a common ploy is to include a quadratic term xj(1 − xj) in the objective function. MINOS will indeed terminate with xj = 0 or 1, but inevitably the final solution will just be a local optimum. (Note that the quadratic is negative definite. MINOS will find a global minimum for quadratic functions that are positive definite or positive semidefinite, assuming the constraints are linear.)

4.2  Solver Options

The following sections describes some of the solver options depending on problem type.

4.2.1  Options for Linear Programming

The following options apply specifically to linear programs.



                         
Crash option i Default = 3
Crash tolerance t Default = 0.1

Except on restarts, a Crash procedure is used to select an initial basis from certain rows and columns of the constraint matrix ( A I ). The Crash option i determines which rows and columns of A are eligible initially, and how many times Crash is called. Columns of I are used to pad the basis where necessary.
  • i=0 The initial basis contains only slack variables: B = I.

  • 1 Crash is called once, looking for a triangular basis in all rows and columns of A.

  • 2 Crash is called twice (if there are nonlinear constraints). The first call looks for a triangular basis in linear rows, and the first major iteration proceeds with simplex iterations until the linear constraints are satisfied. The Jacobian is then evaluated for the second major iteration and Crash is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).

  • 3 Crash is called up to three times (if there are nonlinear constraints). The first two calls treat linear equalities and linear inequalities separately. As before, the last call treats nonlinear rows at the start of the second major iteration.
If i ≥ 1, certain slacks on inequality rows are selected for the basis first. (If i ≥ 2, numerical values are used to exclude slacks that are close to a bound.) Crash then makes several passes through the columns of A, searching for a basis matrix that is essentially triangular. A column is assigned to “pivot" on a particular row if the column contains a suitably large element in a row that has not yet been assigned. (The pivot elements ultimately form the diagonals of the triangular basis.) For remaining unassigned rows, slack variables are inserted to complete the basis.

The Crash tolerance allows Crash to ignore certain “small" nonzeros in each column of A. If amax is the largest element in column j, other nonzeros aij in the column are ignored if |aij| ≤ amax × t. (To be meaningful, t should be in the range 0 ≤ t < 1.)

When t > 0.0, the bases obtained by Crash may not be strictly triangular, but are likely to be nonsingular and almost triangular. The intention is to choose a basis containing more columns of A and fewer (arbitrary) slacks. A feasible solution may be reached sooner on some problems.

For example, suppose the first m columns of A are the matrix shown under LU factor tolerance; i.e., a tridiagonal matrix with entries −1, 4, −1. To help Crash choose all m columns for the initial basis, we would specify Crash tolerance t for some value of t > 1/4.

4.2.2  Options for All Problems

The following options have the same purpose for all problems, whether they linear or nonlinear.



                         
Check frequency k Default = 60

Every k-th minor iteration after the most recent basis factorization, a numerical test is made to see if the current solution x satisfies the general linear constraints (including linearized nonlinear constraints, if any). The constraints are of the form Ax + s = b, where s is the set of slack variables. To perform the numerical test, the residual vector r = bAxs is computed. If the largest component of r is judged to be too large, the current basis is refactorized and the basic variables are recomputed to satisfy the general constraints more accurately.

Check frequency 1 is useful for debugging purposes, but otherwise this option should not be needed.



                         
Cycle limit l Default = 1
Cycle print p Default = 1
Cycle tolerance t Default = 0.0
Phantom columns c Default = 0
Phantom elements e Default = 0



                         
Debug level l Default = 0

This causes various amounts of information to be output to the Print file. Most debug levels are not helpful to normal users, but they are listed here for completeness.
l = 0
No debug output.
l = 2
(or more) Output from m5setx showing the maximum residual after a row check.
l = 40
Output from lu8rpc (which updates the LU factors of the basis matrix), showing the position of the last nonzero in the transformed incoming column.
l = 50
Output from lu1mar (which computes the LU factors each refactorization), showing each pivot row and column and the dimensions of the dense matrix involved in the associated elimination.
l = 100
Output from m2bfac and m5log listing the basic and superbasic variables and their values at every iteration.


                         
Expand frequency k Default = 10000

This option is part of an anti-cycling procedure designed to guarantee progress even on highly degenerate problems. “Cycling" can occur only if zero steplengths are allowed. Here, the strategy is to force a positive step at every iteration, at the expense of violating the bounds on the variables by a small amount. Suppose that the Feasibility tolerance is δ. Over a period of k iterations, the tolerance actually used by MINOS increases from 0.5δ to δ (in steps of 0.5δ/k).

Every k iterations, or when feasibility and optimality are first detected, a resetting procedure eliminates any infeasible nonbasic variables. Some additional iterations may be needed to restore feasibility and optimality. Increasing k reduces that likelihood, but it gives less freedom to choose large pivot elements during basis changes. (See Pivot tolerance.)



                         
Factorization frequency k Default = 100 (LP) or 50 (NLP)

With linear programs, most iterations cause a basis change, in which one column of the basis matrix B is replaced by another. The LU factors of B must be updated accordingly. At most k updates are performed before the current B is factorized directly.

Each update tends to add nonzeros to the LU factors. Since the updating method is stable, k mainly affects the efficiency of minor iterations, rather than stability.

High values of k (such as 100 or 200) may be more efficient on “dense" problems, when the factors of B tend to have two or three times as many nonzeros as B itself. Lower values of k may be more efficient on problems that are very sparse.



                         
Feasibility tolerance t Default = 1.0e-6

This sets the feasibility tolerance δfea = t (see §4.2.1). A variable or constraint is considered feasible if it does not lie outside its bounds by more than δfea.

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible. Let sinf be the corresponding sum of infeasibilities. If sinf is quite small, it may be appropriate to raise t by a factor of 10 or 100. Otherwise, some error in the data should be suspected. If sinf is not small, there may be other points that have a significantly smaller sum of infeasibilities. MINOS does not attempt to find a solution that minimizes the sum.

For Scale option 1 or 2, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful). The final unscaled solution can therefore be infeasible by an unpredictable amount, depending on the size of the scale factors. Precautions are taken so that in a “feasible solution" the original variables will never be infeasible by more than 0.1. Values that large are very unlikely.



                         
Iterations limit k Default = 3m

MINOS stops after k iterations even if the simplex method has not yet reached a solution. If k = 0, no iterations are performed, but the starting point is tested for both feasibility and optimality.



                         
LU factor tolerance t1 Default = 100.0 (LP) or 5.0 (NLP)
LU update tolerance t2 Default = 110.0 (LP) or 5.0 (NLP)

These tolerances affect the stability and sparsity of the basis factorization B = LU during refactorization and updating, respectively. They must satisfy t1, t2 ≥ 1.0. The matrix L is a product of matrices of the form
⎛
⎝
1
μ 1
⎞
⎠
,
where the multipliers μ satisfy |μ| ≤ ti. Values near 1.0 favor stability, while larger values favor sparsity. The default values usually strike a good compromise. For large and relatively dense problems, t1 = 10.0 or 5.0 (say) may give a useful improvement in stability without impairing sparsity to a serious degree.

For certain very regular structures (e.g., band matrices) it may be necessary to reduce t1 and/or t2 in order to achieve stability. For example, if the columns of A include a submatrix of the form
⎛
⎜
⎜
⎜
⎜
⎝
4 −1
−1 4 −1
  ⋰ ⋰ ⋰
    −1 4 −1
      −1 4
⎞
⎟
⎟
⎟
⎟
⎠
,
one should set both t1 and t2 to values in the range 1.0 ≤ ti < 4.0.



                         
LU density tolerance t3 Default = 0.5
LU singularity tolerance t4 Default = є0.67 ≈ 3.25e−11

The density tolerance t3 is used during LU factorization of the basis matrix. Columns of L and rows of U are formed one at a time, and the remaining rows and columns of the basis are altered appropriately. At any stage, if the density of the remaining matrix exceeds t3, the Markowitz strategy for choosing pivots is altered to reduce the time spent searching for each remaining pivot. Raising the density tolerance towards 1.0 may give slightly sparser LU factors, with a slight increase in factorization time.

The singularity tolerance t4 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if |Ujj| ≤ t4 or |Ujj| < t4 maxi |Uij|, the j-th column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting t4 = 1.0e-5, say, may help cause a judicious change of basis.



                         
Maximize
Minimize   Default

This specifies the required direction of optimization.



                         
Multiple price k Default = 1

It is not normal to set k>1 for linear programs, as it causes MINOS to use the reduced-gradient method rather than the simplex method. The number of iterations, and the total work, are likely to increase.

The reduced-gradient iterations do not correspond to the very efficient multiple pricing “minor iterations" carried out by certain commercial linear programming systems. Such systems require storage for k dense vectors of dimension m, so that k is usually limited to 5 or 6. In MINOS, the total storage requirements increase only slightly with k. (The Superbasics limit must be at least k.)



                         
Optimality tolerance t Default = 1.0e-6

This is used to judge the size of the reduced gradients dj = gj − πT aj, where gj is the gradient of the objective function corresponding to the j-th variable, aj is the associated column of the constraint matrix (or Jacobian), and π is the set of dual variables.

By construction, the reduced gradients for basic variables are always zero. Optimality is declared if the reduced gradients for nonbasic variables at their lower or upper bounds satisfy dj / ∥π∥ ≥ −t or dj / ∥π∥t respectively, and if |dj| / ∥π∥t for superbasic variables.

In those tests, ∥π∥ is a measure of the size of the dual variables. It is included to make the tests independent of a large scale factor on the objective function. The quantity actually used is defined by σ = Σi=1mi|, ∥π∥ = max{σ / √m, 1.0} , so that only scale factors larger than 1.0 are allowed for. If the objective is scaled down to be very small, the optimality test reduces to comparing dj against t.



                         
Partial price p Default = 10 (LP) or 1 (NLP)

This parameter is recommended for large problems that have significantly more variables than constraints. It reduces the work required for each “pricing" operation (when a nonbasic variable is selected to become basic or superbasic).

When p=1, all columns of the constraint matrix ( A I ) are searched. Otherwise, A and I are partitioned to give p roughly equal segments Aj, Ij (j = 1 to p). If the previous pricing search was successful on Aj, Ij, the next search begins on the segments Aj+1, Ij+1. (Subscripts are modulo p.)

If a reduced gradient is found that is larger than some dynamic tolerance, the variable with the largest such reduced gradient (of appropriate sign) is selected to become superbasic. (Several may be selected if multiple pricing has been specified.) If nothing is found, the search continues on the next segments Aj+2, Ij+2, and so on.

Partial price t (or t/2 or t/3) may be appropriate for time-stage models having t time periods.



                         
Pivot tolerance t Default = є2/3 ≈ 10−11

Broadly speaking, the pivot tolerance is used to prevent columns entering the basis if they would cause the basis to become almost singular. When x changes to x + α p for some search direction p, a “ratio test" is used to determine which component of x reaches an upper or lower bound first. The corresponding element of p is called the pivot element.

For linear problems, elements of p are ignored (and therefore cannot be pivot elements) if they are smaller than the pivot tolerance t. For nonlinear problems, elements smaller than t∥p∥ are ignored.

It is common for two or more variables to reach a bound at essentially the same time. In such cases, the Feasibility tolerance provides some freedom to maximize the pivot element and thereby improve numerical stability. Excessively small Feasibility tolerances should therefore not be specified.

To a lesser extent, the Expand frequency also provides some freedom to maximize the pivot element. Excessively large Expand frequencies should therefore not be specified.



                         
Scale option l Default = 2 (LP) or 1 (NLP)
Scale Yes
Scale No
Scale linear variables   Same as Scale option 1
Scale nonlinear variables   Same as Scale option 2
Scale all variables   Same as Scale option 2
Scale tolerance t Default = 0.9
Scale, Print
Scale, Print, Tolerance t

Three scale options are available as follows:
l = 0
No scaling. If storage is at a premium, this option saves m+n words of workspace.
l = 1
Linear constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0 (see Fourer, 1982). This sometimes improves the performance of the solution procedures.
l = 2
All constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.
Scale Yes sets the default scaling. (Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

If nonlinear constraints are present, Scale option 1 or 0 should generally be tried at first. Scale option 2 gives scales that depend on the initial Jacobian, and should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.

Scale, Print causes the row-scales r(i) and column-scales c(j) to be printed. The scaled matrix coefficients are aij = aij c(j) / r(i), and the scaled bounds on the variables and slacks are lj = lj / c(j), uj = uj / c(j), where c(j) ≡ r(jn) if j>n.

All forms except Scale option may specify a tolerance t, where 0 < t < 1 (for example: Scale, Print, Tolerance = 0.99). This affects how many passes might be needed through the constraint matrix. On each pass, the scaling procedure computes the ratio of the largest and smallest nonzero coefficients in each column:
ρj =
 
max
i
|aij| /
 
min
i
|aij|      (aij≠ 0).
If maxj ρj is less than t times its previous value, another scaling pass is performed to adjust the row and column scales. Raising t from 0.9 to 0.99 (say) usually increases the number of scaling passes through A. At most 10 passes are made.

If a Scale option has not already been specified, Scale, Print or Scale tolerance both set the default scaling.



                         
Weight on linear objective w Default = 0.0

This keyword invokes the so-called composite objective technique, if the first solution obtained is infeasible, and if the objective function contains linear terms. While trying to reduce the sum of infeasibilities, the method also attempts to optimize the linear objective. At each infeasible iteration, the objective function is defined to be
 
minimize
x
σ w(cT x) + (sum of infeasibilities),
where σ=1 for minimization, σ= −1 for maximization, and c is the linear objective. If an “optimal" solution is reached while still infeasible, w is reduced by a factor of 10. This helps to allow for the possibility that the initial w is too large. It also provides dynamic allowance for the fact that the sum of infeasibilities is tending towards zero.

The effect of w is disabled after 5 such reductions, or if a feasible solution is obtained.

The Weight option is intended mainly for linear programs. It is unlikely to be helpful on nonlinear problems.

4.2.3  Options for Nonlinear Objectives

The following options apply to nonlinear programs whose constraints are linear.



                         
Crash option l Default = 3
Crash tolerance t Default = 0.1

These options are the same as for linear programs.

4.2.4  Options for All Nonlinear problems



                         
Expand frequency k Default = 10000

This option is used the same as for linear programs, but takes effect only when there is just one superbasic variable. (Cycling can occur only when the current solution is at a vertex of the feasible region. Thus, zero steps are allowed if there is more than one superbasic variable, but otherwise positive steps are enforced.) Increasing k helps reduce the number of slightly infeasible nonbasic basic variables (most of which are eliminated during a resetting procedure). However, it also diminishes the freedom to choose a large pivot element (see Pivot tolerance).



                         
Feasibility tolerance t Default = 1.0e-6

When the constraints are linear, a feasible solution is one in which all variables, including slacks, satisfy their upper and lower bounds to within the absolute tolerance t. (Since slacks are included, this means that the general linear constraints are also satisfied to within t.)

When nonlinear constraints are present, a feasible subproblem is one in which the linear constraints and bounds, as well as the current linearization of the nonlinear constraints, are satisfied to within the tolerance t.

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible.

Normally, the nonlinear functions F(x) and f(x) are evaluated only at points x that satisfy the linear constraints and bounds. If the functions are undefined in certain regions, every attempt should be made to eliminate these regions from the problem. For example, for a function F(x) = √x1 + logx2, it would be essential to place lower bounds on both variables. If Feasibility tolerance = 10−6, the bounds x1 ≥ 10−5 and x2 ≥ 10−4 might be appropriate. (The log singularity is more serious; in general, keep variables as far away from singularities as possible.)

An exception is during optional gradient checking (see Verify), which occurs before any optimization takes place. The points at which the functions are evaluated satisfy the bounds but not necessarily the general constraints. If this causes difficulty, gradient checking should be suppressed by setting Verify level -1.

If a subproblem is infeasible, the bounds on the linearized constraints are relaxed in several stages until the subproblem appears feasible. (The true bounds are restored for the next subproblem.) This approach sometimes allows the optimization to proceed successfully. In general, infeasible subproblems are a symptom of difficulty and it may be necessary to increase the Penalty parameter or alter the starting point.

Note: Feasibility with respect to the nonlinear constraints is measured against the Row tolerance, not the Feasibility tolerance.



                         
Hessian dimension r Default = 50

This specifies that an r × r triangular matrix R is to be available for use by the quasi-Newton algorithm (to approximate the reduced Hessian matrix according to ZT H ZRT R). Suppose there are s superbasic variables at a particular iteration. Whenever possible, r should be greater than s.

If rs, the first s columns of R are used to approximate the reduced Hessian in the normal manner. If there are no further changes to the set of superbasic variables, the rate of convergence is usually superlinear. If r<s, a matrix of the form
R = ⎛
⎝
Rr 0
  D
⎞
⎠
is used to approximate the reduced Hessian, where Rr is an r× r upper triangular matrix and D is a diagonal matrix of order sr. The rate of convergence is no longer superlinear (and may be arbitrarily slow).

The storage required is of order 1/2r2, which is substantial if r is as large as 1000 (say). In general, r should be a slight over-estimate of the final number of superbasic variables, whenever storage permits. It need not be larger than n1 + 1, where n1 is the number of nonlinear variables. For many problems it can be much smaller than n1.



                         
Iterations limit k Default = 3m + 10n1

If the constraints are linear, this is the maximum number of iterations allowed for the simplex method or the reduced-gradient method. Otherwise, it is the maximum number of minor iterations, summed over all major iterations.

If k = 0, no minor iterations are performed, but the starting point is tested for both feasibility and optimality.



                         
Linesearch tolerance t Default = 0.1

For nonlinear problems, this controls the accuracy with which a steplength α is located during one-dimensional searches of the form
 
minimize
α
F(xp) subject to 0 < α ≤ β.
A linesearch occurs on most minor iterations for which x is feasible. (If the constraints are nonlinear, the function being minimized is the augmented Lagrangian in equation (5).)

The value of t must satisfy 0.0 ≤ t < 1.0. The default value t = 0.1 requests a moderately accurate search, and should be satisfactory in most cases. If the nonlinear functions are cheap to evaluate, a more accurate search may be appropriate; try t = 0.01 or t = 0.001. The number of iterations should decrease, and this will reduce total run time if there are many linear or nonlinear constraints. If the nonlinear functions are expensive to evaluate, a less accurate search may be appropriate; try t = 0.5 or perhaps t = 0.9. (The number of iterations will probably increase, but the total number of function evaluations may decrease enough to compensate.)



                         
LU singularity tolerance t3 Default = є0.67 ≈ 3.25e−11
LU swap tolerance t4 Default = є1/4 ≈ 10−4

The singularity tolerance t3 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if |Ujj| ≤ t3 or |Ujj| < t3 maxi |Uij|, the j-th column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting t3 = 1.0e-5, say, may help cause a judicious change of basis.

The LU swap tolerance is somewhat similar but can take effect more easily. It is again used only after a basis factorization, and normally just at the start of a major iteration. If a diagonal of U seems to be rather small (as measured by t4) relative to the biggest diagonal of U, a basis change is made in which the basic variable associated with the small diagonal of U is swapped with a carefully chosen superbasic variable (if there are any). The number of superbasic variables stays the same. A message is printed to advise that a swap has occurred.

In practice this tends to help problems whose basis is becoming ill-conditioned. If the number of swaps becomes excessive, set LU swap tolerance 1.0e-6, say, or perhaps smaller.



                         
Minor damping parameter d Default = 2.0

This parameter limits the change in x during a linesearch. It applies to all nonlinear problems, once a “feasible solution" or “feasible subproblem" has been found.

A linesearch of the form minimizeα F(xp) is performed over the range 0 < α ≤ β, where β is the step to the nearest upper or lower bound on x. Normally, the first steplength tried is α1 = min(1,β), but in some cases, such as F(x) = aebx or F(x) = axb, even a moderate change in the components of x can lead to floating-point overflow.

The parameter d is therefore used to define a limit β = d(1 + ∥x∥) / ∥p∥ , and the first evaluation of F(x) is at the potentially smaller steplength α1 = min(1, β, β).

Wherever possible, upper and lower bounds on x should be used to prevent evaluation of nonlinear functions at meaningless points. The Minor damping parameter provides an additional safeguard. The default value d = 2.0 should not affect progress on well behaved problems, but setting d = 0.1 or 0.01 may be helpful when rapidly varying functions are present. A “good" starting point may be required. An important application is to the class of nonlinear least-squares problems.

In cases where several local optima exist, specifying a small value for d may help locate an optimum near the starting point.



                         
Multiple price k Default = 1

“Pricing" refers to a scan of the current nonbasic variables to determine if any should be changed from their current value (by allowing them to become superbasic or basic).

If multiple pricing is in effect, the k best nonbasic variables are selected for admission to the superbasic set. (“Best" means the variables with largest reduced gradients of appropriate sign.) If partial pricing is also in effect, the k best variables are selected from the current partition of A and I.

On large nonlinear problems it may help to set k > 1 if there are not many superbasic variables at the starting point but many at the optimal solution.



                         
Optimality tolerance t Default = 1.0e-6




                         
Partial price p Default = 10 (LP) or 1 (NLP)

This parameter may be useful for large problems that have significantly more variables than constraints. Larger values reduce the work required for each “pricing" operation (when a nonbasic variable is selected to become basic or superbasic).



                         
Scale option l Default = 2 (LP) or 1 (NLP)
Scale Yes
Scale No
Scale linear variables   Same as Scale option 1
Scale nonlinear variables   Same as Scale option 2
Scale all variables   Same as Scale option 2
Scale tolerance t Default = 0.9
Scale, Print
Scale, Print, Tolerance t

Three scale options are available as follows:
l = 0
No scaling. If storage is at a premium, this option saves m+n words of workspace.
l = 1
If some of the variables are linear, the constraints and linear variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0.
l = 2
The constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.
Scale option 1 is the default for nonlinear problems. (Only linear variables are scaled.)

Scale Yes sets the default. (Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

The Scale tolerance and Scale, Print options are the same as for linear programs.



                         
Subspace tolerance t Default = 0.5

This controls the extent to which optimization is confined to the current set of basic and superbasic variables (Phase 4 iterations), before one or more nonbasic variables are added to the superbasic set (Phase 3). The value specified must satisfy 0 < t ≤ 1.

When a nonbasic variable xj is made superbasic, the norm of the reduced-gradient vector (for all superbasics) is recorded. Let this be ∥ZT g0∥. (In fact, the norm is |dj|, the size of the reduced gradient for the new superbasic variable xj.)

Subsequent Phase 4 iterations continue at least until the norm of the reduced-gradient vector satisfies ∥ZT g∥t × ∥ZT g0∥. (∥ZT g∥ is the size of the largest reduced-gradient among the superbasic variables.)

A smaller value of t is likely to increase the total number of iterations, but may reduce the number of basis changes. A larger value such as t = 0.9 may sometimes lead to improved overall efficiency, if the number of superbasic variables is substantially larger at the optimal solution than at the starting point.

Other convergence tests on the change in the function being minimized and the change in the variables may prolong Phase 4 iterations. This helps to make the overall performance insensitive to larger values of t.



                         
Superbasics limit s Default = 50

This places a limit on the storage allocated for superbasic variables. Ideally, s should be set slightly larger than the “number of degrees of freedom" expected at an optimal solution.

For nonlinear problems, the number of degrees of freedom is often called the “number of independent variables". Normally, s need not be greater than n1+1, where n1 is the number of nonlinear variables. For many problems, s may be considerably smaller than n1. This saves storage if n1 is very large.

If Hessian dimension r is specified, the default value of s is the same number (and conversely). This is a safeguard to ensure superlinear convergence wherever possible. Otherwise, the default for both r and s is 50.



                         
Unbounded objective value r1 Default = 1.0e+20
Unbounded step size r2 Default = 1.0e+10

These parameters are intended to detect unboundedness in nonlinear problems. During a linesearch of the form minα F(x + α p) , if | F | exceeds r1 or if α exceeds r2, iterations are terminated with the exit message Problem is unbounded (or badly scaled).

If singularities are present, unboundedness in F(x) may be manifested by a floating-point overflow (during the evaluation of F(x + α p)), before the test against r1 can be made.

Unboundedness in x is best avoided by placing finite upper and lower bounds on the variables. See also the Minor damping parameter.



                         
Verify level l Default = 0
Verify objective gradients   Same as Verify level 1
Verify constraint gradients   Same as Verify level 2
Verify   Same as Verify level 3
Verify gradients   Same as Verify level 3
Verify Yes Same as Verify level 3
Verify No Same as Verify level 0

These options refer to a check on the gradients computed by your nonlinear function routines funobj and funcon at the starting point (the initial value of the nonlinear variables x(*)). Values output in the gradient array g(*) are compared with estimates obtained by finite differences.
l = 0
Only a “cheap" test is performed, requiring three evaluations of the nonlinear objective (if any) and two evaluations of the nonlinear constraints.
l = 1
A more reliable check is made on each component of the objective gradient.
l = 2
A check is made on each column of the Jacobian matrix associated with the nonlinear constraints.
l = 3
A detailed check is made on both the objective and the Jacobian.
l = −1
No checking is performed. This may be necessary if the functions are undefined at the starting point.
Verify level 3 is recommended for a new function routine, particularly if the “cheap" test indicates error. Missing gradients are not checked (so there is no overhead). If there are many nonlinear variables, the Start and Stop keywords may be used to limit the check to a subset.

As noted, gradient verification occurs at the starting point, before a problem is scaled, and before the first basis is factorized. The bounds on x will be satisfied, but the general linear constraints may not. If the nonlinear objective or constraint functions are undefined, you could initially specify
   Objective = NONE
   Nonlinear objective variables   0
   Major iterations                1
   New basis file                 11  (say)
   Verify level                   -1
to obtain a point that satisfies the linear constraints, and then restart with the correct linear and nonlinear objective, along with
   Old basis file                 11
   Verify level                    3

4.2.5  Options for Nonlinear Constraints

The following options apply to problems with nonlinear constraints.



                         
Completion Partial Default
Completion Full

When there are nonlinear constraints, this determines whether subproblems should be solved to moderate accuracy (partial completion), or to full accuracy (full completion). MINOS implements the option by using two sets of convergence tolerances for the subproblems.

Use of partial completion may reduce the work during early major iterations, unless the Minor iterations limit is active. The optimal set of basic and superbasic variables will probably be determined for any given subproblem, but the reduced gradient may be larger than it would have been with full completion.

An automatic switch to full completion occurs when it appears that the sequence of major iterations is converging. The switch is made when the nonlinear constraint error is reduced below 100 * (Row tolerance), the relative change in λk is 0.1 or less, and the previous subproblem was solved to optimality.

Full completion tends to give better Lagrange-multiplier estimates. It may lead to fewer major iterations, but may result in more minor iterations.



                         
Crash option l Default = 3
Crash tolerance t Default = 0.1

Let A refer to the linearized constraint matrix.
l = 0
The initial basis contains only slack variables: B = I.
l = 1
A is evaluated at the starting point. Crash is called once, looking for a triangular basis in all rows and columns of A.
l = 2
A is evaluated only after the linear constraints are satisfied. Crash is called twice. The first call looks for a triangular basis in linear rows. The first major iteration proceeds with simplex-type iterations until the linear constraints are satisfied. A is then evaluated for the second major iteration and Crash is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).
l = 3
Crash is called three times, treating linear equalities and linear inequalities separately, with simplex-type iterations in between. As before, the last call treats nonlinear rows at the start of the second major iteration.


                         
Feasibility tolerance t Default = 1.0e-6

A “feasible subproblem" is one in which the linear constraints and bounds, as well as the current linearization of the nonlinear constraints, are satisfied to within t.

Note that feasibility with respect to the nonlinear constraints is determined by the Row tolerance (not the Feasibility tolerance).

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible.

If Scale option = 1 or 2, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful).

Normally, the nonlinear functions F(x) and f(x) are evaluated only at points x that satisfy the linear constraints and bounds. If the functions are undefined in certain regions, every attempt should be made to eliminate these regions from the problem. For example, for a function F(x) = √x1 + logx2, it would be essential to place lower bounds on both variables. If Feasibility tolerance = 10−6, the bounds x1 ≥ 10−5 and x2 ≥ 10−4 might be appropriate. (The log singularity is more serious; in general, keep variables as far away from singularities as possible.)

An exception is during optional gradient checking (see Verify), which occurs before any optimization takes place. The points at which the functions are evaluated satisfy the bounds but not necessarily the general constraints. If this causes difficulty, gradient checking should be suppressed by setting Verify level -1.

If a subproblem is infeasible, the bounds on the linearized constraints are relaxed in several stages until the subproblem appears feasible. (The true bounds are restored for the next subproblem.) This approach sometimes allows the optimization to proceed successfully. In general, infeasible subproblems are a symptom of difficulty and it may be necessary to increase the Penalty parameter or alter the starting point.



                         
Lagrangian Yes Default
Lagrangian No

This determines the form of the objective function used for the linearized subproblems. The default value Yes is highly recommended. The Penalty parameter value is then also relevant.

If No is specified, the nonlinear constraint functions are evaluated only twice per major iteration. Hence this option may be useful if the nonlinear constraints are very expensive to evaluate. However, in general there is a great risk that convergence may not be achieved.



                         
Major damping parameter d Default = 2.0

This parameter may assist convergence on problems that have highly nonlinear constraints. It is intended to prevent large relative changes between subproblem solutions (xkk) and (xk+1k+1). For example, the default value 2.0 prevents the relative change in either xk or λk from exceeding 200 per cent. It will not be active on well behaved problems. (If all components of xk or λk are small, the norms of those vectors will not be allowed to increase beyond about 2.0.)

The parameter is used to interpolate between the solutions at the beginning and end of each major iteration. Thus, xk+1 and λk+1 are changed to
xk + σ ( xk+1xk) and λk + σ (λk+1 − λk)
for some steplength σ < 1. In the case of nonlinear equations (where the number of constraints is the same as the number of variables) this gives a damped Newton method.

This is a very crude control. If the sequence of major iterations does not appear to be converging, one should first re-run the problem with a higher Penalty parameter (say 2, 4 or 10). (Skip this re-run in the case of a square system of nonlinear equations: there are no degrees of freedom and the Penalty parameter value has essentially no effect.)

If the subproblem solutions continue to change violently, try reducing d to 0.2 or 0.1 (say).



                         
Major iterations k Default = 50

This is the maximum number of major iterations allowed. It is intended to guard against an excessive number of linearizations of the nonlinear constraints, since in some cases the sequence of major iterations may not converge.

The progress of the major iterations can be best monitored using Print level 0 (the default).



                         
Minor iterations k Default = 40

This is the maximum number of minor iterations allowed during a major iteration, after the linearized constraints for that subproblem have been satisfied. (An arbitrary number of minor iterations may be needed to find a feasible point for each subproblem.) The Iterations limit provides an independent limit on the total minor iterations (across all subproblems).

A moderate value (e.g., 30 ≤ k ≤ 200) prevents excessive effort being expended on early major iterations, but allows later subproblems to be solved to completion.

The first major iteration is special: it terminates as soon as the linear constraints and bounds are satisfied (if possible), ignoring the nonlinear constraints.

In general it is unsafe to specify a value as small as k = 1 or 2. (Even when an optimal solution has been reached, a few minor iterations may be needed for the corresponding subproblem to be recognized as optimal.)



                         
Optimality tolerance t Default = 1.0e-6




                         
Penalty parameter r Default = 1.0

This specifies that the initial value of ρk in the augmented Lagrangian (5) should be r times a certain default value 100/m1, where m1 is the number of nonlinear constraints. It is used when Lagrangian = Yes (the default setting).

For early runs on a problem with unknown characteristics, the default value should be acceptable. If the problem is highly nonlinear and the major iterations do not converge, a larger value such as 2 or 5 may help. In general, a positive r may be necessary to ensure convergence, even for convex programs.

On the other hand, if r is too large, the rate of convergence may be unnecessarily slow. If the functions are not highly nonlinear or a good starting point is known, it is often safe to specify Penalty parameter 0.0.

If several related problems are to be solved, the following strategy for setting the Penalty parameter may be useful:
  1. Initially, use a moderate value for r (such as the default) and a reasonably low Iterations and/or Major iterations limit.

  2. If successive major iterations appear to be terminate with radically different solutions, try increasing the penalty parameter. (See also the Major damping parameter.)

  3. If there appears to be little progress between major iterations, it may help to reduce the penalty parameter.


                         
Radius of convergence r Default = 0.01

This determines when the penalty parameter ρk is reduced (if initialized to a positive value). Both the nonlinear constraint violation (see rowerr below) and the relative change in consecutive Lagrange multiplier estimates must be less than r at the start of a major iteration before ρk is reduced or set to zero.

A few major iterations later, full completion is requested if not already set, and the remaining sequence of major iterations should converge quadratically to an optimum.



                         
Row tolerance t Default = 1.0e-6

This specifies how accurately the nonlinear constraints should be satisfied at a solution. The default value is usually small enough, since model data is often specified to about that accuracy.

Let viol be the maximum violation of the nonlinear constraints (2), and let rowerr = viol / (1 + xnorm), where xnorm is a measure of the size of the current solution (x,y). The solution is regarded as acceptably feasible if rowerrt.

If the problem functions involve data that is known to be of low accuracy, a larger Row tolerance may be appropriate. On the other hand, nonlinear constraints are often satisfied with rapidly increasing accuracy during the last few major iterations. It is common for the final solution to satisfy rowerr = O(є).



                         
Scale option l Default = 2 (LP) or 1 (NLP)
Scale Yes
Scale No
Scale linear variables   Same as Scale option 1
Scale nonlinear variables   Same as Scale option 2
Scale all variables   Same as Scale option 2
Scale tolerance r Default = 0.9
Scale, Print
Scale, Print, Tolerance r


Three scale options are available as follows:
l = 0
No scaling. If storage is at a premium, this option saves m+n words of workspace.
l = 1
Linear constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0.
l = 2
All constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.
Scale option 1 is the default for nonlinear problems. (Only linear variables are scaled.)

Scale Yes sets the default scaling. Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

With nonlinear constraints, Scale option 1 or 0 should generally be tried first. Scale option 2 gives scales that depend on the initial Jacobian, and should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.



                         
Verify level l Default = 0
Verify objective gradients   Same as Verify level 1
Verify constraint gradients   Same as Verify level 2
Verify   Same as Verify level 3
Verify gradients   Same as Verify level 3
Verify Yes Same as Verify level 3
Verify No Same as Verify level 0

This option refers to a finite-difference check on the gradients (first derivatives) of each nonlinear function. It occurs before a problem is scaled, and before the first basis is factorized. (Hence, the variables may not yet satisfy the general linear constraints.)
l = 0
Only a “cheap" test is performed, requiring three evaluations of the nonlinear objective (if any) and two evaluations of the nonlinear constraints.
l = 1
A more reliable check is made on each component of the objective gradient.
l = 2
A check is made on each column of the Jacobian matrix associated with the nonlinear constraints.
l = 3