## 4MINOS details

### 4.1  Introduction

TOMNET /MINOS (hereafter referred to as MINOS) is a linear and nonlinear programming system, designed to solve large-scale constrained optimization problems of the following form:
 min x, y
F(x) + cT x + dT y
s/t b1 < f(x) + A1 y < b2
b3 < A2x + A3 y < b4
l < (x, y) < u
(7)

where the vectors bi, c, d, l, u and the matrices Ai are constant, F(x) is a nonlinear function of some of the variables, and f(x) is a vector of nonlinear functions. The nonlinearities (if present) may be of a general nature but must be smooth and preferably “almost linear", in the sense that they should not change radically with small changes in the variables. We make the following definitions:

 x the nonlinear variables y the linear variables (x, y) the vector (x,y)T . (1.1) the objective function (1.2) the nonlinear constraints (1.3) the linear constraints (1.4) the bounds on the variables m the total number of general constraints in (2) and (3) n the total number of variables in x and y m1 the number of nonlinear constraints (the dimension of f(x)) n1 the number of nonlinear variables (the dimension of x) n1′ the number of nonlinear objective variables (in F(x)) n1′′ the number of nonlinear Jacobian variables (in f(x))

A large-scale problem is one in which m and n are several hundred or several thousand. MINOS takes advantage of the fact that the constraint matrices Ai and the partial derivatives ∂ fi(x) / ∂ xj are typically sparse (contain many zeros).

The dimensions n1 and n1′′ allow for the fact that F(x) and f(x) may involve different sets of nonlinear variables “x". The two sets of variables always overlap, in the sense that the shorter “x" is always the same as the beginning of the other. If x is the same in both cases, we have n1 = n1 = n1′′. Otherwise, we define the number of nonlinear variables to be n1 = max(n1,n1′′).

In the following sections we introduce more terminology and give an overview of the MINOS optimization algorithms and the main system features.

#### 4.1.1  Linear Programming

When F(x) and f(x) are absent, the problem becomes a linear program. Since there is no need to distinguish between linear and nonlinear variables, we use x rather than y. We also convert all general constraints into equalities with the aid of slack variables s, so that the only inequalities are simple bounds on the variables. Thus, we write linear programs in the form
 minimize x, s
cT x subject to Ax + s = b,    l ≤ (x, s) ≤ u.     (8)

When the constraints are linear, the bounds on the slacks are defined so that b = 0. When there are nonlinear constraints, some elements of b are nonzero.

In the mathematical programming world, x and s are sometimes called structural variables and logical variables. Their upper and lower bounds are fundamental to problem formulations and solution algorithms. Some of the components of l may be −∞ and those of u may be +∞. If lj = uj, a variable is said to be fixed, and if its bounds are −∞ and +∞, the variable is called free.

Within MINOS, a point (x, s) is said to be feasible if the following are true:
• The constraints Ax + s = b are satisfied to within machine precision ≈ 10−15.
• The bounds l ≤ (x, s) ≤ u are satisfied to within a feasibility tolerance δfea ≈ 10−6.
• The nonlinear constraints (7) are satisfied to within a row tolerance δrow ≈ 10−6.
Tolerances such as δfea and δrow may be specified by setting Feasibility tolerance and Row tolerance.

MINOS solves linear programs using a reliable implementation of the primal simplex method, in which the constraints Ax + s = b are partitioned into the form
Bx B + Nx N = b,     (9)
where the basis matrix B is a square and nonsingular submatrix of ( A I ). The elements of x B and x N are called the basic and nonbasic variables respectively. Together, they are a permutation of the vector (x, s). Certain dual variables π and reduced costs d N are defined by the equations
BTπ = c B,      d N = c NNTπ,     (10)
where (c B, c N) is a permutation of the objective vector (c, 0).

At a feasible point, nonbasic variables are typically equal to one of their bounds, and basic variables are somewhere between their bounds. To reach an optimal solution, the simplex method performs a sequence of iterations of the following general nature. With guidance from d N, a nonbasic variable is chosen to move from its current value, and the basic variables are adjusted to satisfy the constraints in (8). Usually one of the basic variables reaches a bound. The basis partition is then redefined with a column of B being replaced by a column of N. When no such interchange can be found to reduce the value of cT x, the current solution is optimal.

#### The simplex method

For convenience, let x denote the variables (x, s). The main steps in a simplex iteration are as follows:
Compute dual variables:
Solve BTπ = c B.
Price:
Compute some or all of the reduced costs d N = c NNTπ to determine if a favorable nonbasic column aq exists.
Compute search direction:
Solve Bp B = ± aq to determine the basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are p N = 0, except for ± 1 for the element corresponding to aq.)
Find maximum steplength:
Find the largest steplength αmax such that x + αmax p continues to satisfy the bounds on the variables. The steplength may be determined by the new nonbasic variable reaching its opposite bound, but normally some basic variable will reach a bound first.
Update:
Take the step αmax. If this was determined by a basic variable, interchange the corresponding column of B with column aq from N.
When a starting basis is chosen and the basic variables x B are first computed, if any components of x B lie significantly outside their bounds, we say that the current point is infeasible. In this case, the simplex method uses a “Phase 1" procedure to reduce the sum of infeasibilities. This is similar to the subsequent “Phase 2" procedure just described.

The feasibility tolerance δfea is used to determine which Phase is in effect. A similar optimality tolerance δopt is used during pricing to judge whether any reduced costs are significantly large. (This tolerance is scaled by π, a measure of the size of the current π.)

If the solution procedures are interrupted, some of the nonbasic variables may lie strictly between their bounds: lj < xj < uj. In addition, at a “feasible" or “optimal" solution, some of the basic variables may lie slightly outside their bounds: lj − δfeaxjuj + δfea. In rare cases, even a few nonbasic variables might lie outside their bounds by as much as δfea.

MINOS maintains a sparse LU factorization of the basis matrix B, using a Markowitz ordering scheme and Bartels-Golub updates, as implemented in the Fortran package LUSOL The basis factorization is central to the efficient handling of sparse linear and nonlinear constraints.

#### 4.1.2  Problems with a Nonlinear Objective

When nonlinearities are confined to the term F(x) in the objective function, the problem is a linearly constrained nonlinear program. MINOS solves such problems using a reduced-gradient method combined with a quasi-Newton method that generally leads to superlinear convergence. The implementation follows that described in Murtagh and Saunders.

As a slight generalization of (9), the constraints Ax + s = b are partitioned into the form
Bx B + Sx S + Nx N = b,     (11)
where x S is a set of superbasic variables. As before, the nonbasic variables are normally equal to one of their bounds, while the basic and superbasic variables lie somewhere between their bounds (to within δfea). Let the number of superbasic variables be n S, the number of columns in S. At a solution, n S will be no more than n1, the number of nonlinear variables, and it is often much smaller than this. In many real-life cases we have found that n S remains reasonably small, say 200 or less, regardless of the size of the problem. This is one reason why MINOS has proved to be a practical tool.

In the reduced-gradient method, x S is regarded as a set of “independent variables" that are allowed to move in any desirable direction to reduce the objective function (or the sum of infeasibilities). The basic variables are then adjusted in order to continue satisfying the linear constraints. If it appears that no improvement can be made with the current definition of B, S and N, one of the nonbasic variables is selected to be added to S, and the process is repeated with an increased value of n S. At all stages, if a basic or superbasic variable encounters one of its bounds, that variable is made nonbasic and the value of n S is reduced by one.

For linear programs, we may interpret the simplex method as being the same as the reduced-gradient method, with the number of superbasic variables oscillating between 0 and 1. (In general, a step of the simplex method or the reduced-gradient method is called a minor iteration.)

A certain matrix Z is needed for descriptive purposes. It takes the form
Z =

 −B−1 S I 0

,     (12)
though it is never computed explicitly. Given LU factors of the basis matrix B, it is possible to compute products of the form Z q and ZT g by solving linear equations involving B or BT. This in turn allows optimization to be performed on the superbasic variables, while the basic variables are adjusted to satisfy the general linear constraints. (In the description below, the reduced-gradient vector satisfies d S = ZT g, and the search direction satisfies p = Zp S.)

An important part of MINOS is the quasi-Newton method used to optimize the superbasic variables. This can achieve superlinear convergence during any sequence of iterations for which the B, S, N partition remains constant. It requires a dense upper-triangular matrix R of dimension n S, which is updated in various ways to approximate the reduced Hessian:
RT RZT H Z,     (13)
where H is the Hessian of the objective function, i.e. the matrix of second derivatives of F(x). As for unconstrained optimization, the storage required for R is sometimes a limiting factor.

#### The reduced-gradient method

Let g be the gradient of the nonlinear objective (7). The main steps in a reduced-gradient iteration are as follows:
Compute dual variables and reduced gradient:
Solve BTπ = g B and compute the reduced-gradient vector d S = g SSTπ.
Price:
If d S is sufficiently small, compute some or all of the reduced costs d N = g NNTπ to determine if a favorable nonbasic column aq exists. If so, move that column from N into S, expanding R accordingly.
Compute search direction:
Solve RT R p S = − d S and Bp B = − Sp S to determine the superbasic and basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are p N = 0.)
Find maximum steplength:
Find the largest steplength αmax such that x + αmax p continues to satisfy the bounds on the variables.
Perform linesearch:
Find an approximate solution to the one-dimensional problem
 minimize α
F(xp) subject to 0 ≤ α ≤ αmax.
Update (quasi-Newton):
Take the step α. Apply a quasi-Newton update to R to account for this step.
Update (basis change):
If a superbasic variable reached a bound, move it from S into N. If a basic variable reached a bound, find a suitable superbasic variable to move from S into B, and move the basic variable into N. Update R if necessary.
At an optimum, the reduced gradient d S should be zero. MINOS terminates when d S ≤ δoptπ and the reduced costs (component of d N) are all sufficiently positive or negative, as judged by the same quantity δoptπ.

In the linesearch, F(xp) really means the objective function (7) evaluated at the point (x, y, s) + α p. This steplength procedure is another important part of MINOS. Two different procedures are used, depending on whether or not all gradients are known analytically. The number of nonlinear function evaluations required may be influenced by setting the Linesearch tolerance in the SPECS file.

Normally, the objective function F(x) will never be evaluated at a point x unless that point satisfies the linear constraints and the bounds on the variables. An exception is during a finite-difference check on the calculation of gradient elements. This check is performed at the starting point x0, which takes default values or may be specified. MINOS ensures that the bounds lx0u are satisfied, but in general the starting point will not satisfy the general linear constraints. If F(x0) is undefined, the gradient check should be suppressed (Verify level -1), or the starting point should be redefined.

#### 4.1.3  Problems with Nonlinear Constraints

If any of the constraints are nonlinear, MINOS employs a projected Lagrangian algorithm, based on a method due to Robinson, see Murtagh and Saunders. This involves a sequence of major iterations, each of which requires the solution of a linearly constrained subproblem. Each subproblem contains linearized versions of the nonlinear constraints, as well as the original linear constraints and bounds.

At the start of the k-th major iteration, let (xk, yk) be an estimate of the variables, and let λk be an estimate of the Lagrange multipliers (dual variables) associated with the nonlinear constraints. The constraints are linearized by changing f(x) in Equation (2) to its linear approximation:
f
(x, xk) = f(xk) + J(xk)(xxk),
or more briefly f = fk + Jk(xxk), where J(xk) is the Jacobian matrix evaluated at xk. (The i-th row of the Jacobian is the gradient vector for the i-th nonlinear constraint function.) The subproblem to be solved during the k-th major iteration is then
 min x, y
F(x) + cT x + dT y − λkT fd +
1
2
ρ fd 2
s/t b1
f
+ A1y
b2
b3 A2x + A3y b4,
l ≤ (x, y) ≤ u,
(14)
where fd = ff is the difference between f(x) and its linearization. The objective function (14) is called an augmented Lagrangian. The scalar ρk is a penalty parameter, and the term involving ρk is a modified quadratic penalty function. MINOS uses the reduced-gradient method to solve each subproblem. As before, slack variables are introduced and the vectors bi are incorporated into the bounds on the slacks. The linearized constraints take the form

 Jk A1 A2 A3

 x y

+
 s1 s2

=
 Jk xk − fk 0

.
We refer to this system as Ax + s = b as in the linear case. The Jacobian Jk is treated as a sparse matrix, the same as the matrices A1, A2 and A3. The quantities Jk, b, λk and ρk change each major iteration.

#### The projected Lagrangian method

For convenience, suppose that all variables and constraints are nonlinear. The main steps in a major iteration are as follows:
Solve subproblem:
Find an approximate solution (x, λ) to the kth subproblem (14)–(14).
Compute search direction:
Adjust the elements of λ if necessary (if they have the wrong sign). Define a search direction (Δ x, Δλ) = (xxk, λ − λk).
Find steplength:
Choose a steplength σ such that some merit function M(x,λ) has a suitable value at the point (xk + σΔ x, λk + σΔλ).
Update:
Take the step σ to obtain (xk+1, λk+1). In some cases, adjust ρk.
For the first major iteration, the nonlinear constraints are ignored and minor iterations are performed until the original linear constraints are satisfied.

The initial Lagrange multiplier estimate is typically λk = 0 (though it can be provided by the user). If a subproblem terminates early, some elements of the new estimate λ may be changed to zero.

The penalty parameter initially takes a certain default value ρk = 100.0/m1, where m1 is the number of nonlinear constraints. (A value r times as big is obtained by specifying Penalty parameter r.) For later major iterations, ρk is reduced in stages when it appears that the sequence {xkk} is converging. In many cases it is safe to specify Penalty parameter 0.0 at the beginning, particularly if a problem is only mildly nonlinear. This may improve the overall efficiency.

In the output from MINOS, the term Feasible subproblem indicates that the linearized constraints (14)–(14) have been satisfied. In general, the nonlinear constraints are satisfied only in the limit, so that feasibility and optimality occur at essentially the same time. The nonlinear constraint violation is printed every major iteration. Even if it is zero early on (say at the initial point), it may increase and perhaps fluctuate before tending to zero. On “well behaved" problems, the constraint violation will decrease quadratically (i.e., very quickly) during the final few major iterations.

For certain rare classes of problem it is safe to request the values λk = 0 and ρk=0 for all subproblems by specifying Lagrangian = No (in which case the nonlinear constraint functions are evaluated only once per major iteration). However for general problems, convergence is much more likely with the default setting, Lagrangian = Yes.

#### The merit function

Unfortunately, it is not known how to define a merit function M(x,λ) that can be reduced at every major iteration. As a result, there is no guarantee that the projected Lagrangian method described above will converge from an arbitrary starting point. This has been the principal theoretical gap in MINOS, finally resolved by the PhD research of Michael Friedlander. The main features needed to stabilize MINOS are:
• To relax the linearized constraints via an ℓ1 penalty function.
• To repeat a major iteration with increased ρk (and more relaxed linearized constraints) if the nonlinear constraint violation would increase too much.
In practice, the method of MINOS 5.51 often does converge and a good rate of convergence is often achieved in the final major iterations, particularly if the constraint functions are “nearly linear". As a precaution, MINOS prevents radical changes from one major iteration to the next. Where possible, the steplength is chosen to be σ = 1, so that each new estimate of the solution is (xk+1, λk+1) = (x, λ), the solution of the subproblem. If this point is “too different", a shorter steplength σ < 1 is chosen.

If the major iterations for a particular model do not appear to be converging, some of the following actions may help:
1. Specify initial activity levels for the nonlinear variables as carefully as possible.
2. Include sensible upper and lower bounds on all variables.
3. Specify a Major damping parameter that is lower than the default value. This tends to make σ smaller.
4. Specify a Penalty parameter that is higher than the default value. This tends to prevent excessive departures from the constraint linearization.

#### 4.1.4  Problem Formulation

In general, it is worthwhile expending considerable prior analysis to make the constraints completely linear if at all possible. Sometimes a simple transformation will suffice. For example, a pipeline optimization problem has pressure drop constraints of the form
K1
d14.814
+
K2
d24.814
+   ≤ PT2P02,
where di are the design variables (pipe diameters) and the other terms are constant. These constraints are highly nonlinear, but by redefining the decision variables to be xi = 1/di4.814 we can make the constraints linear. Even if the objective function becomes more nonlinear by such a transformation (and this usually happens), the advantages of having linear constraints greatly outweigh this.

Similarly, it is usually best not to move nonlinearities from the objective function into the constraints. For example, we should not replace “minimize F(x)" by
minimize z    subject to    F(x) − z = 0.

Scaling is a very important matter during problem formulation. A general rule is to scale both the data and the variables to be as close to 1.0 as possible. In general we suggest the range 1.0 to 10.0. When conflicts arise, one should sacrifice the objective function in favor of the constraints. Real-world problems tend to have a natural scaling within each constraint, as long as the variables are expressed in consistent physical units. Hence it is often sufficient to apply a scale factor to each row. MINOS has options to scale the rows and columns of the constraint matrix automatically. By default, only the linear rows and columns are scaled, and the procedure is reliable. If you request that the nonlinear constraints and variables be scaled, bear in mind that the scale factors are determined by the initial Jacobian J(x0), which may differ considerably from J(x) at a solution.

Finally, upper and lower bounds on the variables (and on the constraints) are extremely useful for confining the region over which optimization has to be performed. If sensible values are known, they should always be used. They are also important for avoiding singularities in the nonlinear functions. Note that bounds may be violated slightly by as much as the feasibility tolerance δfea. Hence, if √x2 or logx2 appear (for example) and if δfea = 10−6, the lower bound on x2 would normally have to be at least 10−5. If it is known that x2 will be at least 0.5 (say) at a solution, then its lower bound should be 0.5.

For a detailed discussion of many aspects of numerical optimization, see Gill, Murray and Wright for much invaluable advice on problem formulation and assessment of results.

#### 4.1.5  Restrictions

MINOS is designed to find solutions that are locally optimal. The nonlinear functions in a problem must be smooth (i.e., their first derivatives must exist), especially near the desired solution. The functions need not be separable.

A certain “feasible" region is defined by the general constraints and the bounds on the variables. If the objective is convex within this region and if the feasible region itself is convex, any optimal solution obtained will be a global optimum. Otherwise there may be several local optima, and some of these may not be global. In such cases the chances of finding a global optimum are usually increased by choosing a starting point that is “sufficiently close", but there is no general procedure for determining what “close" means, or for verifying that a given local optimum is indeed global.

Integer restrictions cannot be imposed directly. If a variable xj is required to be 0 or 1, a common ploy is to include a quadratic term xj(1 − xj) in the objective function. MINOS will indeed terminate with xj = 0 or 1, but inevitably the final solution will just be a local optimum. (Note that the quadratic is negative definite. MINOS will find a global minimum for quadratic functions that are positive definite or positive semidefinite, assuming the constraints are linear.)

### 4.2  Solver Options

The following sections describes some of the solver options depending on problem type.

#### 4.2.1  Options for Linear Programming

The following options apply specifically to linear programs.

 Crash option i Default = 3 Crash tolerance t Default = 0.1

Except on restarts, a Crash procedure is used to select an initial basis from certain rows and columns of the constraint matrix ( A I ). The Crash option i determines which rows and columns of A are eligible initially, and how many times Crash is called. Columns of I are used to pad the basis where necessary.
• i=0 The initial basis contains only slack variables: B = I.

• 1 Crash is called once, looking for a triangular basis in all rows and columns of A.

• 2 Crash is called twice (if there are nonlinear constraints). The first call looks for a triangular basis in linear rows, and the first major iteration proceeds with simplex iterations until the linear constraints are satisfied. The Jacobian is then evaluated for the second major iteration and Crash is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).

• 3 Crash is called up to three times (if there are nonlinear constraints). The first two calls treat linear equalities and linear inequalities separately. As before, the last call treats nonlinear rows at the start of the second major iteration.
If i ≥ 1, certain slacks on inequality rows are selected for the basis first. (If i ≥ 2, numerical values are used to exclude slacks that are close to a bound.) Crash then makes several passes through the columns of A, searching for a basis matrix that is essentially triangular. A column is assigned to “pivot" on a particular row if the column contains a suitably large element in a row that has not yet been assigned. (The pivot elements ultimately form the diagonals of the triangular basis.) For remaining unassigned rows, slack variables are inserted to complete the basis.

The Crash tolerance allows Crash to ignore certain “small" nonzeros in each column of A. If amax is the largest element in column j, other nonzeros aij in the column are ignored if |aij| ≤ amax × t. (To be meaningful, t should be in the range 0 ≤ t < 1.)

When t > 0.0, the bases obtained by Crash may not be strictly triangular, but are likely to be nonsingular and almost triangular. The intention is to choose a basis containing more columns of A and fewer (arbitrary) slacks. A feasible solution may be reached sooner on some problems.

For example, suppose the first m columns of A are the matrix shown under LU factor tolerance; i.e., a tridiagonal matrix with entries −1, 4, −1. To help Crash choose all m columns for the initial basis, we would specify Crash tolerance t for some value of t > 1/4.

#### 4.2.2  Options for All Problems

The following options have the same purpose for all problems, whether they linear or nonlinear.

 Check frequency k Default = 60

Every k-th minor iteration after the most recent basis factorization, a numerical test is made to see if the current solution x satisfies the general linear constraints (including linearized nonlinear constraints, if any). The constraints are of the form Ax + s = b, where s is the set of slack variables. To perform the numerical test, the residual vector r = bAxs is computed. If the largest component of r is judged to be too large, the current basis is refactorized and the basic variables are recomputed to satisfy the general constraints more accurately.

Check frequency 1 is useful for debugging purposes, but otherwise this option should not be needed.

 Cycle limit l Default = 1 Cycle print p Default = 1 Cycle tolerance t Default = 0.0 Phantom columns c Default = 0 Phantom elements e Default = 0

 Debug level l Default = 0

This causes various amounts of information to be output to the Print file. Most debug levels are not helpful to normal users, but they are listed here for completeness.
l = 0
No debug output.
l = 2
(or more) Output from m5setx showing the maximum residual after a row check.
l = 40
Output from lu8rpc (which updates the LU factors of the basis matrix), showing the position of the last nonzero in the transformed incoming column.
l = 50
Output from lu1mar (which computes the LU factors each refactorization), showing each pivot row and column and the dimensions of the dense matrix involved in the associated elimination.
l = 100
Output from m2bfac and m5log listing the basic and superbasic variables and their values at every iteration.

 Expand frequency k Default = 10000

This option is part of an anti-cycling procedure designed to guarantee progress even on highly degenerate problems. “Cycling" can occur only if zero steplengths are allowed. Here, the strategy is to force a positive step at every iteration, at the expense of violating the bounds on the variables by a small amount. Suppose that the Feasibility tolerance is δ. Over a period of k iterations, the tolerance actually used by MINOS increases from 0.5δ to δ (in steps of 0.5δ/k).

Every k iterations, or when feasibility and optimality are first detected, a resetting procedure eliminates any infeasible nonbasic variables. Some additional iterations may be needed to restore feasibility and optimality. Increasing k reduces that likelihood, but it gives less freedom to choose large pivot elements during basis changes. (See Pivot tolerance.)

 Factorization frequency k Default = 100 (LP) or 50 (NLP)

With linear programs, most iterations cause a basis change, in which one column of the basis matrix B is replaced by another. The LU factors of B must be updated accordingly. At most k updates are performed before the current B is factorized directly.

Each update tends to add nonzeros to the LU factors. Since the updating method is stable, k mainly affects the efficiency of minor iterations, rather than stability.

High values of k (such as 100 or 200) may be more efficient on “dense" problems, when the factors of B tend to have two or three times as many nonzeros as B itself. Lower values of k may be more efficient on problems that are very sparse.

 Feasibility tolerance t Default = 1.0e-6

This sets the feasibility tolerance δfea = t (see §4.2.1). A variable or constraint is considered feasible if it does not lie outside its bounds by more than δfea.

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible. Let sinf be the corresponding sum of infeasibilities. If sinf is quite small, it may be appropriate to raise t by a factor of 10 or 100. Otherwise, some error in the data should be suspected. If sinf is not small, there may be other points that have a significantly smaller sum of infeasibilities. MINOS does not attempt to find a solution that minimizes the sum.

For Scale option 1 or 2, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful). The final unscaled solution can therefore be infeasible by an unpredictable amount, depending on the size of the scale factors. Precautions are taken so that in a “feasible solution" the original variables will never be infeasible by more than 0.1. Values that large are very unlikely.

 Iterations limit k Default = 3m

MINOS stops after k iterations even if the simplex method has not yet reached a solution. If k = 0, no iterations are performed, but the starting point is tested for both feasibility and optimality.

 LU factor tolerance t1 Default = 100.0 (LP) or 5.0 (NLP) LU update tolerance t2 Default = 110.0 (LP) or 5.0 (NLP)

These tolerances affect the stability and sparsity of the basis factorization B = LU during refactorization and updating, respectively. They must satisfy t1, t2 ≥ 1.0. The matrix L is a product of matrices of the form

 1 μ 1

,
where the multipliers μ satisfy |μ| ≤ ti. Values near 1.0 favor stability, while larger values favor sparsity. The default values usually strike a good compromise. For large and relatively dense problems, t1 = 10.0 or 5.0 (say) may give a useful improvement in stability without impairing sparsity to a serious degree.

For certain very regular structures (e.g., band matrices) it may be necessary to reduce t1 and/or t2 in order to achieve stability. For example, if the columns of A include a submatrix of the form

 4 −1 −1 4 −1 −1 4 −1 −1 4

,
one should set both t1 and t2 to values in the range 1.0 ≤ ti < 4.0.

 LU density tolerance t3 Default = 0.5 LU singularity tolerance t4 Default = є0.67 ≈ 3.25e−11

The density tolerance t3 is used during LU factorization of the basis matrix. Columns of L and rows of U are formed one at a time, and the remaining rows and columns of the basis are altered appropriately. At any stage, if the density of the remaining matrix exceeds t3, the Markowitz strategy for choosing pivots is altered to reduce the time spent searching for each remaining pivot. Raising the density tolerance towards 1.0 may give slightly sparser LU factors, with a slight increase in factorization time.

The singularity tolerance t4 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if |Ujj| ≤ t4 or |Ujj| < t4 maxi |Uij|, the j-th column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting t4 = 1.0e-5, say, may help cause a judicious change of basis.

 Maximize Minimize Default

This specifies the required direction of optimization.

 Multiple price k Default = 1

It is not normal to set k>1 for linear programs, as it causes MINOS to use the reduced-gradient method rather than the simplex method. The number of iterations, and the total work, are likely to increase.

The reduced-gradient iterations do not correspond to the very efficient multiple pricing “minor iterations" carried out by certain commercial linear programming systems. Such systems require storage for k dense vectors of dimension m, so that k is usually limited to 5 or 6. In MINOS, the total storage requirements increase only slightly with k. (The Superbasics limit must be at least k.)

 Optimality tolerance t Default = 1.0e-6

This is used to judge the size of the reduced gradients dj = gj − πT aj, where gj is the gradient of the objective function corresponding to the j-th variable, aj is the associated column of the constraint matrix (or Jacobian), and π is the set of dual variables.

By construction, the reduced gradients for basic variables are always zero. Optimality is declared if the reduced gradients for nonbasic variables at their lower or upper bounds satisfy dj / π ≥ −t or dj / πt respectively, and if |dj| / πt for superbasic variables.

In those tests, π is a measure of the size of the dual variables. It is included to make the tests independent of a large scale factor on the objective function. The quantity actually used is defined by σ = Σi=1mi|, π = max{σ / √m, 1.0} , so that only scale factors larger than 1.0 are allowed for. If the objective is scaled down to be very small, the optimality test reduces to comparing dj against t.

 Partial price p Default = 10 (LP) or 1 (NLP)

This parameter is recommended for large problems that have significantly more variables than constraints. It reduces the work required for each “pricing" operation (when a nonbasic variable is selected to become basic or superbasic).

When p=1, all columns of the constraint matrix ( A I ) are searched. Otherwise, A and I are partitioned to give p roughly equal segments Aj, Ij (j = 1 to p). If the previous pricing search was successful on Aj, Ij, the next search begins on the segments Aj+1, Ij+1. (Subscripts are modulo p.)

If a reduced gradient is found that is larger than some dynamic tolerance, the variable with the largest such reduced gradient (of appropriate sign) is selected to become superbasic. (Several may be selected if multiple pricing has been specified.) If nothing is found, the search continues on the next segments Aj+2, Ij+2, and so on.

Partial price t (or t/2 or t/3) may be appropriate for time-stage models having t time periods.

 Pivot tolerance t Default = є2/3 ≈ 10−11

Broadly speaking, the pivot tolerance is used to prevent columns entering the basis if they would cause the basis to become almost singular. When x changes to x + α p for some search direction p, a “ratio test" is used to determine which component of x reaches an upper or lower bound first. The corresponding element of p is called the pivot element.

For linear problems, elements of p are ignored (and therefore cannot be pivot elements) if they are smaller than the pivot tolerance t. For nonlinear problems, elements smaller than tp are ignored.

It is common for two or more variables to reach a bound at essentially the same time. In such cases, the Feasibility tolerance provides some freedom to maximize the pivot element and thereby improve numerical stability. Excessively small Feasibility tolerances should therefore not be specified.

To a lesser extent, the Expand frequency also provides some freedom to maximize the pivot element. Excessively large Expand frequencies should therefore not be specified.

 Scale option l Default = 2 (LP) or 1 (NLP) Scale Yes Scale No Scale linear variables Same as Scale option 1 Scale nonlinear variables Same as Scale option 2 Scale all variables Same as Scale option 2 Scale tolerance t Default = 0.9 Scale, Print Scale, Print, Tolerance t

Three scale options are available as follows:
l = 0
No scaling. If storage is at a premium, this option saves m+n words of workspace.
l = 1
Linear constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0 (see Fourer, 1982). This sometimes improves the performance of the solution procedures.
l = 2
All constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.
Scale Yes sets the default scaling. (Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

If nonlinear constraints are present, Scale option 1 or 0 should generally be tried at first. Scale option 2 gives scales that depend on the initial Jacobian, and should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.

Scale, Print causes the row-scales r(i) and column-scales c(j) to be printed. The scaled matrix coefficients are aij = aij c(j) / r(i), and the scaled bounds on the variables and slacks are lj = lj / c(j), uj = uj / c(j), where c(j) ≡ r(jn) if j>n.

All forms except Scale option may specify a tolerance t, where 0 < t < 1 (for example: Scale, Print, Tolerance = 0.99). This affects how many passes might be needed through the constraint matrix. On each pass, the scaling procedure computes the ratio of the largest and smallest nonzero coefficients in each column:
ρj =
 max i
|aij| /
 min i
|aij|      (aij≠ 0).
If maxj ρj is less than t times its previous value, another scaling pass is performed to adjust the row and column scales. Raising t from 0.9 to 0.99 (say) usually increases the number of scaling passes through A. At most 10 passes are made.

If a Scale option has not already been specified, Scale, Print or Scale tolerance both set the default scaling.

 Weight on linear objective w Default = 0.0

This keyword invokes the so-called composite objective technique, if the first solution obtained is infeasible, and if the objective function contains linear terms. While trying to reduce the sum of infeasibilities, the method also attempts to optimize the linear objective. At each infeasible iteration, the objective function is defined to be
 minimize x
σ w(cT x) + (sum of infeasibilities),
where σ=1 for minimization, σ= −1 for maximization, and c is the linear objective. If an “optimal" solution is reached while still infeasible, w is reduced by a factor of 10. This helps to allow for the possibility that the initial w is too large. It also provides dynamic allowance for the fact that the sum of infeasibilities is tending towards zero.

The effect of w is disabled after 5 such reductions, or if a feasible solution is obtained.

The Weight option is intended mainly for linear programs. It is unlikely to be helpful on nonlinear problems.

#### 4.2.3  Options for Nonlinear Objectives

The following options apply to nonlinear programs whose constraints are linear.

 Crash option l Default = 3 Crash tolerance t Default = 0.1

These options are the same as for linear programs.

#### 4.2.4  Options for All Nonlinear problems

 Expand frequency k Default = 10000

This option is used the same as for linear programs, but takes effect only when there is just one superbasic variable. (Cycling can occur only when the current solution is at a vertex of the feasible region. Thus, zero steps are allowed if there is more than one superbasic variable, but otherwise positive steps are enforced.) Increasing k helps reduce the number of slightly infeasible nonbasic basic variables (most of which are eliminated during a resetting procedure). However, it also diminishes the freedom to choose a large pivot element (see Pivot tolerance).

 Feasibility tolerance t Default = 1.0e-6

When the constraints are linear, a feasible solution is one in which all variables, including slacks, satisfy their upper and lower bounds to within the absolute tolerance t. (Since slacks are included, this means that the general linear constraints are also satisfied to within t.)

When nonlinear constraints are present, a feasible subproblem is one in which the linear constraints and bounds, as well as the current linearization of the nonlinear constraints, are satisfied to within the tolerance t.

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible.

Normally, the nonlinear functions F(x) and f(x) are evaluated only at points x that satisfy the linear constraints and bounds. If the functions are undefined in certain regions, every attempt should be made to eliminate these regions from the problem. For example, for a function F(x) = √x1 + logx2, it would be essential to place lower bounds on both variables. If Feasibility tolerance = 10−6, the bounds x1 ≥ 10−5 and x2 ≥ 10−4 might be appropriate. (The log singularity is more serious; in general, keep variables as far away from singularities as possible.)

An exception is during optional gradient checking (see Verify), which occurs before any optimization takes place. The points at which the functions are evaluated satisfy the bounds but not necessarily the general constraints. If this causes difficulty, gradient checking should be suppressed by setting Verify level -1.

If a subproblem is infeasible, the bounds on the linearized constraints are relaxed in several stages until the subproblem appears feasible. (The true bounds are restored for the next subproblem.) This approach sometimes allows the optimization to proceed successfully. In general, infeasible subproblems are a symptom of difficulty and it may be necessary to increase the Penalty parameter or alter the starting point.

Note: Feasibility with respect to the nonlinear constraints is measured against the Row tolerance, not the Feasibility tolerance.

 Hessian dimension r Default = 50

This specifies that an r × r triangular matrix R is to be available for use by the quasi-Newton algorithm (to approximate the reduced Hessian matrix according to ZT H ZRT R). Suppose there are s superbasic variables at a particular iteration. Whenever possible, r should be greater than s.

If rs, the first s columns of R are used to approximate the reduced Hessian in the normal manner. If there are no further changes to the set of superbasic variables, the rate of convergence is usually superlinear. If r<s, a matrix of the form
R =
 Rr 0 D

is used to approximate the reduced Hessian, where Rr is an r× r upper triangular matrix and D is a diagonal matrix of order sr. The rate of convergence is no longer superlinear (and may be arbitrarily slow).

The storage required is of order 1/2r2, which is substantial if r is as large as 1000 (say). In general, r should be a slight over-estimate of the final number of superbasic variables, whenever storage permits. It need not be larger than n1 + 1, where n1 is the number of nonlinear variables. For many problems it can be much smaller than n1.

 Iterations limit k Default = 3m + 10n1

If the constraints are linear, this is the maximum number of iterations allowed for the simplex method or the reduced-gradient method. Otherwise, it is the maximum number of minor iterations, summed over all major iterations.

If k = 0, no minor iterations are performed, but the starting point is tested for both feasibility and optimality.

 Linesearch tolerance t Default = 0.1

For nonlinear problems, this controls the accuracy with which a steplength α is located during one-dimensional searches of the form
 minimize α
F(xp) subject to 0 < α ≤ β.
A linesearch occurs on most minor iterations for which x is feasible. (If the constraints are nonlinear, the function being minimized is the augmented Lagrangian in equation (5).)

The value of t must satisfy 0.0 ≤ t < 1.0. The default value t = 0.1 requests a moderately accurate search, and should be satisfactory in most cases. If the nonlinear functions are cheap to evaluate, a more accurate search may be appropriate; try t = 0.01 or t = 0.001. The number of iterations should decrease, and this will reduce total run time if there are many linear or nonlinear constraints. If the nonlinear functions are expensive to evaluate, a less accurate search may be appropriate; try t = 0.5 or perhaps t = 0.9. (The number of iterations will probably increase, but the total number of function evaluations may decrease enough to compensate.)

 LU singularity tolerance t3 Default = є0.67 ≈ 3.25e−11 LU swap tolerance t4 Default = є1/4 ≈ 10−4

The singularity tolerance t3 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if |Ujj| ≤ t3 or |Ujj| < t3 maxi |Uij|, the j-th column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting t3 = 1.0e-5, say, may help cause a judicious change of basis.

The LU swap tolerance is somewhat similar but can take effect more easily. It is again used only after a basis factorization, and normally just at the start of a major iteration. If a diagonal of U seems to be rather small (as measured by t4) relative to the biggest diagonal of U, a basis change is made in which the basic variable associated with the small diagonal of U is swapped with a carefully chosen superbasic variable (if there are any). The number of superbasic variables stays the same. A message is printed to advise that a swap has occurred.

In practice this tends to help problems whose basis is becoming ill-conditioned. If the number of swaps becomes excessive, set LU swap tolerance 1.0e-6, say, or perhaps smaller.

 Minor damping parameter d Default = 2.0

This parameter limits the change in x during a linesearch. It applies to all nonlinear problems, once a “feasible solution" or “feasible subproblem" has been found.

A linesearch of the form minimizeα F(xp) is performed over the range 0 < α ≤ β, where β is the step to the nearest upper or lower bound on x. Normally, the first steplength tried is α1 = min(1,β), but in some cases, such as F(x) = aebx or F(x) = axb, even a moderate change in the components of x can lead to floating-point overflow.

The parameter d is therefore used to define a limit β = d(1 + x) / p , and the first evaluation of F(x) is at the potentially smaller steplength α1 = min(1, β, β).

Wherever possible, upper and lower bounds on x should be used to prevent evaluation of nonlinear functions at meaningless points. The Minor damping parameter provides an additional safeguard. The default value d = 2.0 should not affect progress on well behaved problems, but setting d = 0.1 or 0.01 may be helpful when rapidly varying functions are present. A “good" starting point may be required. An important application is to the class of nonlinear least-squares problems.

In cases where several local optima exist, specifying a small value for d may help locate an optimum near the starting point.

 Multiple price k Default = 1

“Pricing" refers to a scan of the current nonbasic variables to determine if any should be changed from their current value (by allowing them to become superbasic or basic).

If multiple pricing is in effect, the k best nonbasic variables are selected for admission to the superbasic set. (“Best" means the variables with largest reduced gradients of appropriate sign.) If partial pricing is also in effect, the k best variables are selected from the current partition of A and I.

On large nonlinear problems it may help to set k > 1 if there are not many superbasic variables at the starting point but many at the optimal solution.

 Optimality tolerance t Default = 1.0e-6

 Partial price p Default = 10 (LP) or 1 (NLP)

This parameter may be useful for large problems that have significantly more variables than constraints. Larger values reduce the work required for each “pricing" operation (when a nonbasic variable is selected to become basic or superbasic).

 Scale option l Default = 2 (LP) or 1 (NLP) Scale Yes Scale No Scale linear variables Same as Scale option 1 Scale nonlinear variables Same as Scale option 2 Scale all variables Same as Scale option 2 Scale tolerance t Default = 0.9 Scale, Print Scale, Print, Tolerance t

Three scale options are available as follows:
l = 0
No scaling. If storage is at a premium, this option saves m+n words of workspace.
l = 1
If some of the variables are linear, the constraints and linear variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0.
l = 2
The constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.
Scale option 1 is the default for nonlinear problems. (Only linear variables are scaled.)

Scale Yes sets the default. (Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

The Scale tolerance and Scale, Print options are the same as for linear programs.

 Subspace tolerance t Default = 0.5

This controls the extent to which optimization is confined to the current set of basic and superbasic variables (Phase 4 iterations), before one or more nonbasic variables are added to the superbasic set (Phase 3). The value specified must satisfy 0 < t ≤ 1.

When a nonbasic variable xj is made superbasic, the norm of the reduced-gradient vector (for all superbasics) is recorded. Let this be ZT g0. (In fact, the norm is |dj|, the size of the reduced gradient for the new superbasic variable xj.)

Subsequent Phase 4 iterations continue at least until the norm of the reduced-gradient vector satisfies ZT gt × ZT g0. (ZT g is the size of the largest reduced-gradient among the superbasic variables.)

A smaller value of t is likely to increase the total number of iterations, but may reduce the number of basis changes. A larger value such as t = 0.9 may sometimes lead to improved overall efficiency, if the number of superbasic variables is substantially larger at the optimal solution than at the starting point.

Other convergence tests on the change in the function being minimized and the change in the variables may prolong Phase 4 iterations. This helps to make the overall performance insensitive to larger values of t.

 Superbasics limit s Default = 50

This places a limit on the storage allocated for superbasic variables. Ideally, s should be set slightly larger than the “number of degrees of freedom" expected at an optimal solution.

For nonlinear problems, the number of degrees of freedom is often called the “number of independent variables". Normally, s need not be greater than n1+1, where n1 is the number of nonlinear variables. For many problems, s may be considerably smaller than n1. This saves storage if n1 is very large.

If Hessian dimension r is specified, the default value of s is the same number (and conversely). This is a safeguard to ensure superlinear convergence wherever possible. Otherwise, the default for both r and s is 50.

 Unbounded objective value r1 Default = 1.0e+20 Unbounded step size r2 Default = 1.0e+10

These parameters are intended to detect unboundedness in nonlinear problems. During a linesearch of the form minα F(x + α p) , if | F | exceeds r1 or if α exceeds r2, iterations are terminated with the exit message Problem is unbounded (or badly scaled).

If singularities are present, unboundedness in F(x) may be manifested by a floating-point overflow (during the evaluation of F(x + α p)), before the test against r1 can be made.

Unboundedness in x is best avoided by placing finite upper and lower bounds on the variables. See also the Minor damping parameter.

 Verify level l Default = 0 Verify objective gradients Same as Verify level 1 Verify constraint gradients Same as Verify level 2 Verify Same as Verify level 3 Verify gradients Same as Verify level 3 Verify Yes Same as Verify level 3 Verify No Same as Verify level 0

These options refer to a check on the gradients computed by your nonlinear function routines funobj and funcon at the starting point (the initial value of the nonlinear variables x(*)). Values output in the gradient array g(*) are compared with estimates obtained by finite differences.
l = 0
Only a “cheap" test is performed, requiring three evaluations of the nonlinear objective (if any) and two evaluations of the nonlinear constraints.
l = 1
A more reliable check is made on each component of the objective gradient.
l = 2
A check is made on each column of the Jacobian matrix associated with the nonlinear constraints.
l = 3
A detailed check is made on both the objective and the Jacobian.
l = −1
No checking is performed. This may be necessary if the functions are undefined at the starting point.
Verify level 3 is recommended for a new function routine, particularly if the “cheap" test indicates error. Missing gradients are not checked (so there is no overhead). If there are many nonlinear variables, the Start and Stop keywords may be used to limit the check to a subset.

As noted, gradient verification occurs at the starting point, before a problem is scaled, and before the first basis is factorized. The bounds on x will be satisfied, but the general linear constraints may not. If the nonlinear objective or constraint functions are undefined, you could initially specify
```   Objective = NONE
Nonlinear objective variables   0
Major iterations                1
New basis file                 11  (say)
Verify level                   -1
```
to obtain a point that satisfies the linear constraints, and then restart with the correct linear and nonlinear objective, along with
```   Old basis file                 11
Verify level                    3
```

#### 4.2.5  Options for Nonlinear Constraints

The following options apply to problems with nonlinear constraints.

 Completion Partial Default Completion Full

When there are nonlinear constraints, this determines whether subproblems should be solved to moderate accuracy (partial completion), or to full accuracy (full completion). MINOS implements the option by using two sets of convergence tolerances for the subproblems.

Use of partial completion may reduce the work during early major iterations, unless the Minor iterations limit is active. The optimal set of basic and superbasic variables will probably be determined for any given subproblem, but the reduced gradient may be larger than it would have been with full completion.

An automatic switch to full completion occurs when it appears that the sequence of major iterations is converging. The switch is made when the nonlinear constraint error is reduced below 100 * (Row tolerance), the relative change in λk is 0.1 or less, and the previous subproblem was solved to optimality.

Full completion tends to give better Lagrange-multiplier estimates. It may lead to fewer major iterations, but may result in more minor iterations.

 Crash option l Default = 3 Crash tolerance t Default = 0.1

Let A refer to the linearized constraint matrix.
l = 0
The initial basis contains only slack variables: B = I.
l = 1
A is evaluated at the starting point. Crash is called once, looking for a triangular basis in all rows and columns of A.
l = 2
A is evaluated only after the linear constraints are satisfied. Crash is called twice. The first call looks for a triangular basis in linear rows. The first major iteration proceeds with simplex-type iterations until the linear constraints are satisfied. A is then evaluated for the second major iteration and Crash is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).
l = 3
Crash is called three times, treating linear equalities and linear inequalities separately, with simplex-type iterations in between. As before, the last call treats nonlinear rows at the start of the second major iteration.

 Feasibility tolerance t Default = 1.0e-6

A “feasible subproblem" is one in which the linear constraints and bounds, as well as the current linearization of the nonlinear constraints, are satisfied to within t.

Note that feasibility with respect to the nonlinear constraints is determined by the Row tolerance (not the Feasibility tolerance).

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible.

If Scale option = 1 or 2, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful).

Normally, the nonlinear functions F(x) and f(x) are evaluated only at points x that satisfy the linear constraints and bounds. If the functions are undefined in certain regions, every attempt should be made to eliminate these regions from the problem. For example, for a function F(x) = √x1 + logx2, it would be essential to place lower bounds on both variables. If Feasibility tolerance = 10−6, the bounds x1 ≥ 10−5 and x2 ≥ 10−4 might be appropriate. (The log singularity is more serious; in general, keep variables as far away from singularities as possible.)

An exception is during optional gradient checking (see Verify), which occurs before any optimization takes place. The points at which the functions are evaluated satisfy the bounds but not necessarily the general constraints. If this causes difficulty, gradient checking should be suppressed by setting Verify level -1.

If a subproblem is infeasible, the bounds on the linearized constraints are relaxed in several stages until the subproblem appears feasible. (The true bounds are restored for the next subproblem.) This approach sometimes allows the optimization to proceed successfully. In general, infeasible subproblems are a symptom of difficulty and it may be necessary to increase the Penalty parameter or alter the starting point.

 Lagrangian Yes Default Lagrangian No

This determines the form of the objective function used for the linearized subproblems. The default value Yes is highly recommended. The Penalty parameter value is then also relevant.

If No is specified, the nonlinear constraint functions are evaluated only twice per major iteration. Hence this option may be useful if the nonlinear constraints are very expensive to evaluate. However, in general there is a great risk that convergence may not be achieved.

 Major damping parameter d Default = 2.0

This parameter may assist convergence on problems that have highly nonlinear constraints. It is intended to prevent large relative changes between subproblem solutions (xkk) and (xk+1k+1). For example, the default value 2.0 prevents the relative change in either xk or λk from exceeding 200 per cent. It will not be active on well behaved problems. (If all components of xk or λk are small, the norms of those vectors will not be allowed to increase beyond about 2.0.)

The parameter is used to interpolate between the solutions at the beginning and end of each major iteration. Thus, xk+1 and λk+1 are changed to
xk + σ ( xk+1xk) and λk + σ (λk+1 − λk)
for some steplength σ < 1. In the case of nonlinear equations (where the number of constraints is the same as the number of variables) this gives a damped Newton method.

This is a very crude control. If the sequence of major iterations does not appear to be converging, one should first re-run the problem with a higher Penalty parameter (say 2, 4 or 10). (Skip this re-run in the case of a square system of nonlinear equations: there are no degrees of freedom and the Penalty parameter value has essentially no effect.)

If the subproblem solutions continue to change violently, try reducing d to 0.2 or 0.1 (say).

 Major iterations k Default = 50

This is the maximum number of major iterations allowed. It is intended to guard against an excessive number of linearizations of the nonlinear constraints, since in some cases the sequence of major iterations may not converge.

The progress of the major iterations can be best monitored using Print level 0 (the default).

 Minor iterations k Default = 40

This is the maximum number of minor iterations allowed during a major iteration, after the linearized constraints for that subproblem have been satisfied. (An arbitrary number of minor iterations may be needed to find a feasible point for each subproblem.) The Iterations limit provides an independent limit on the total minor iterations (across all subproblems).

A moderate value (e.g., 30 ≤ k ≤ 200) prevents excessive effort being expended on early major iterations, but allows later subproblems to be solved to completion.

The first major iteration is special: it terminates as soon as the linear constraints and bounds are satisfied (if possible), ignoring the nonlinear constraints.

In general it is unsafe to specify a value as small as k = 1 or 2. (Even when an optimal solution has been reached, a few minor iterations may be needed for the corresponding subproblem to be recognized as optimal.)

 Optimality tolerance t Default = 1.0e-6

 Penalty parameter r Default = 1.0

This specifies that the initial value of ρk in the augmented Lagrangian (5) should be r times a certain default value 100/m1, where m1 is the number of nonlinear constraints. It is used when Lagrangian = Yes (the default setting).

For early runs on a problem with unknown characteristics, the default value should be acceptable. If the problem is highly nonlinear and the major iterations do not converge, a larger value such as 2 or 5 may help. In general, a positive r may be necessary to ensure convergence, even for convex programs.

On the other hand, if r is too large, the rate of convergence may be unnecessarily slow. If the functions are not highly nonlinear or a good starting point is known, it is often safe to specify Penalty parameter 0.0.

If several related problems are to be solved, the following strategy for setting the Penalty parameter may be useful:
1. Initially, use a moderate value for r (such as the default) and a reasonably low Iterations and/or Major iterations limit.

2. If successive major iterations appear to be terminate with radically different solutions, try increasing the penalty parameter. (See also the Major damping parameter.)

3. If there appears to be little progress between major iterations, it may help to reduce the penalty parameter.

 Radius of convergence r Default = 0.01

This determines when the penalty parameter ρk is reduced (if initialized to a positive value). Both the nonlinear constraint violation (see rowerr below) and the relative change in consecutive Lagrange multiplier estimates must be less than r at the start of a major iteration before ρk is reduced or set to zero.

A few major iterations later, full completion is requested if not already set, and the remaining sequence of major iterations should converge quadratically to an optimum.

 Row tolerance t Default = 1.0e-6

This specifies how accurately the nonlinear constraints should be satisfied at a solution. The default value is usually small enough, since model data is often specified to about that accuracy.

Let viol be the maximum violation of the nonlinear constraints (2), and let rowerr = viol / (1 + xnorm), where xnorm is a measure of the size of the current solution (x,y). The solution is regarded as acceptably feasible if rowerrt.

If the problem functions involve data that is known to be of low accuracy, a larger Row tolerance may be appropriate. On the other hand, nonlinear constraints are often satisfied with rapidly increasing accuracy during the last few major iterations. It is common for the final solution to satisfy rowerr = O(є).

 Scale option l Default = 2 (LP) or 1 (NLP) Scale Yes Scale No Scale linear variables Same as Scale option 1 Scale nonlinear variables Same as Scale option 2 Scale all variables Same as Scale option 2 Scale tolerance r Default = 0.9 Scale, Print Scale, Print, Tolerance r

Three scale options are available as follows:
l = 0
No scaling. If storage is at a premium, this option saves m+n words of workspace.
l = 1
Linear constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0.
l = 2
All constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.
Scale option 1 is the default for nonlinear problems. (Only linear variables are scaled.)

Scale Yes sets the default scaling. Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

With nonlinear constraints, Scale option 1 or 0 should generally be tried first. Scale option 2 gives scales that depend on the initial Jacobian, and should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.

 Verify level l Default = 0 Verify objective gradients Same as Verify level 1 Verify constraint gradients Same as Verify level 2 Verify Same as Verify level 3 Verify gradients Same as Verify level 3 Verify Yes Same as Verify level 3 Verify No Same as Verify level 0

This option refers to a finite-difference check on the gradients (first derivatives) of each nonlinear function. It occurs before a problem is scaled, and before the first basis is factorized. (Hence, the variables may not yet satisfy the general linear constraints.)
l = 0
Only a “cheap" test is performed, requiring three evaluations of the nonlinear objective (if any) and two evaluations of the nonlinear constraints.
l = 1
A more reliable check is made on each component of the objective gradient.
l = 2
A check is made on each column of the Jacobian matrix associated with the nonlinear constraints.
l = 3
A detailed check is made on both the objective and the Jacobian.
l = −1
No checking is performed. This may be necessary if the functions are undefined at the starting point.

#### 4.2.6  Options for Input and Output

The following options specify various files to be used, and the amount of printed output required.

 Print level l Default = 0 Print frequency k Default = 100

The PRINT file provides more complete information than the SUMMARY file. It includes a listing of the main options, statistics about the problem, scaling information, the iteration log, the exit condition, and (optionally) the final solution. It also includes error messages.

For problems with linear constraints, Print level 0 gives most of the normal output. Print level 1 produces statistics for the basis matrix B and its LU factors each time the basis is factorized. Print frequency k produces one line of the iteration log every k minor iterations. Print frequency 1 causes every minor iteration to be logged. Print frequency 0 is shorthand for k = 99999.

For problems with nonlinear constraints, Print level 0 produces just one line of output per major iteration. This provides a short summary of the progress of the optimization. The Print frequency is ignored. If Print level > 0, certain quantities are printed at the start of each major iteration, and minor iterations are logged according to the Print frequency.

In the latter case, the value of l is best thought of as a binary number of the form
```      Print level   JFLXB
```
where each letter stands for a digit that is either 0 or 1. The quantities referred to are:
• B Basis statistics, as mentioned above.

• X xk, the nonlinear variables involved in the objective function or the constraints.

• L λk, the Lagrange-multiplier estimates for the nonlinear constraints. (Suppressed if Lagrangian = No, since then λk = 0.)

• F f(xk), the values of the nonlinear constraint functions.

• J J(xk), the Jacobian matrix.
To obtain output of any item, set the corresponding digit to 1, otherwise to 0. For example, Print level 10 sets X = 1 and the other digits equal to zero. The nonlinear variables will be printed each major iteration.

If J = 1, the Jacobian is output column-wise. Column j is preceded by the value of the corresponding variable xj and a key to indicate whether the variable is basic, superbasic or nonbasic. (Hence if J = 1, there is no reason to specify X = 1 unless the objective contains more nonlinear variables than the Jacobian.) A typical line of output is
```   3  1.250000D+01 BS      1  1.00000D+00      4  2.00000D+00
```
which means that x3 is basic at value 12.5, and the third column of the Jacobian has elements of 1.0 and 2.0 in rows 1 and 4.

 Solution Yes Default Solution No

The Yes and No options control whether the final solution is output to the PRINT file. Numerical values are printed in f16.5 format where possible.

The special values 0, 1 and −1 are printed as ., 1.0 and -1.0. Bounds outside the range (−1020,1020) appear as the word None.

 Summary file f Default = 6 (typically) Summary level l Default = 0 Summary frequency k Default = 100

The SUMMARY file provides a brief form of the iteration log and the exit condition. It also includes error messages. In an interactive environment, the output normally appears at the screen and allows a run to be monitored.

For problems with linear constraints, Summary level 0 produces brief output. Summary level 1 gives a few additional messages. Summary frequency k produces one line of the iteration log every k minor iterations. Summary frequency 1 causes every minor iteration to be logged. Summary frequency 0 is shorthand for k = 99999.

For problems with nonlinear constraints, Summary level 0 produces one line of output per major iteration. This provides a short summary of the progress of the optimization. The Summary frequency is ignored. If Summary level > 0, certain quantities are printed at the start of each major iteration, and minor iterations are logged according to the Summary frequency.

### 4.3  File Output

#### 4.3.1  The PRINT file

The following information is output to the PRINT file during the solution process. The longest line of output is 124 characters.

 • A listing of the SPECS file, if any. • The selected options. • An estimate of the storage needed and the amount available. • Some statistics about the problem data. • The storage available for the LU factors of the basis matrix. • A log from the scaling procedure, if Scale option > 0. • Notes about the initial basis obtained from CRASH or a BASIS file. • The major iteration log. • The minor iteration log. • Basis factorization statistics. • The EXIT condition and some statistics about the solution obtained. • The printed solution, if requested.

The last five items are described in the following sections.

#### 4.3.2  The major iteration log

Problems with nonlinear constraints require several major iterations to reach a solution, each involving the solution of an LC subproblem (a linearly constrained subproblem that generates search directions for x and λ). If Print level = 0, one line of information is output to the PRINT file each major iteration. An example log is shown in Figure 1.

``` Major minor   total ninf step     objective     Feasible Optimal  nsb   ncon     LU penalty BSwap
1     1T      1    0 0.0E+00  0.00000000E+00 0.0E+00 1.2E+01    8      4     31 1.0E-01     0
2    13      14    0 1.0E+00  2.67011596E+00 4.4E-06 2.8E-03    7     23     56 1.0E-01     8
Completion Full    now requested
3     3      17    0 1.0E+00  2.67009870E+00 3.1E-08 1.4E-06    7     29     41 1.0E-01     0
4     0      17    0 1.0E+00  2.67009863E+00 5.6E-17 1.4E-06    7     30     41 1.0E-02     0

```

Figure 1: The Major Iteration log

Label
Description
Major
The current major iteration number.
minor
is the number of iterations required by both the feasibility and optimality phases of the QP subproblem. Generally, Mnr will be 1 in the later iterations, since theoretical analysis predicts that the correct active set will be identified near the solution.
total
The total number of minor iterations.
ninf
The number of infeasibilities in the LC subproblem. Normally 0, because the bounds on the linearized constraints are relaxed in several stages until the constraints are “feasible”.
step
The step length α taken along the current search direction p. The variables x have just been changed to x + α p. On reasonably well-behaved problems, step=1.0 as the solution is approached, meaning the new estimate of (x,λ) is the solution of the LC subproblem.
objective
The value of true objective function.
Feasible
The value of rowerr, the maximum component of the scaled nonlinear constraint residual. The solution is regarded as acceptably feasible if Feasbl is less than the Row tolerance.
Optimal
The value of maxgap, the maximum complementarity gap. It is an estimate of the degree of nonoptimality of the reduced costs. Both Feasible and Optimal are small in the neighborhood of a solution.
nsb
The current number of superbasic variables.
ncon
The number of times subroutine funcon has been called to evaluate the nonlinear constraint functions. The Jacobian has been evaluated or approximated essentially the same number of times. (Function evaluations needed to estimate the Jacobian by finite differences are not included.)
LU
The number of nonzeros in the sparse LU factors of the basis matrix on completion of the LC subproblem. (The factors are computed at the start of each major iteration, and updated during minor iterations whenever a basis change occurs.)
As the solution is approached and the minor iterations decrease towards zero, LU reflects the number of nonzeros in the LU factors at the start of the LC subproblem.
penalty
The penalty parameter ρk used in the modified augmented Lagrangian that defines the objective function for the LC subproblem.
BSwap
The number of columns of the basis matrix B that were swapped with columns of S to improve the condition of B. The swaps are determined by an LU factorization of the rectangular matrix BS = (  B S  )T with stability being favored more than sparsity.

```    Itn ph pp     rg    +sbs  -sbs   -bs  step    pivot   ninf  sinf,objective     L     U ncp  nobj  ncon  nsb Hmod cond(H) conv
1  1  1 -1.0E+00     2     2     1 3.0E+01  1.0E+00    1  1.35000000E+02     0    19   0
2  1  1 -1.0E+00    27    27   102 7.0E+01  1.0E+00    1  1.05000000E+02     0    19   0
3  1  1 -1.0E+00     3     3    27 3.0E+01 -1.0E+00    1  3.50000000E+01     1    19   0
4  1  1 -1.0E+00    28    28    26 4.9E-11  1.0E+00    1  5.00000000E+00     1    20   0
5  1  1 -1.0E+00    47    47     2 4.9E-11  1.0E+00    1  5.00000000E+00     1    20   0
6  1  1  1.0E+00    27    27   101 5.0E+00 -1.0E+00    1  5.00000000E+00     2    20   0

Itn      6 -- feasible solution.  Objective =  -1.818044887E+02

7  3  1 -1.7E+01    87     0     0 1.0E+00  0.0E+00    0 -2.77020571E+02     4    21   0     6     0    1  1 0 1.0E+00 FFTT
8  3  1 -1.7E+01    72     0     0 1.9E-01  0.0E+00    0 -3.05336895E+02     4    21   0     8     0    2  1 0 5.5E+00 FFTT
9  3  1 -2.3E+01    41     0     0 1.0E+00  0.0E+00    0 -4.43743832E+02     4    21   0     9     0    3  1 0 6.5E+00 FFFF
10  4  1  6.6E-01     0     0     0 6.0E+00  0.0E+00    0 -5.64075338E+02     4    21   0    11     0    3  1 0 3.5E+00 FFTT
...

Itn ph pp     rg    +sbs  -sbs   -bs  step    pivot   ninf  sinf,objective     L     U ncp  nobj  ncon  nsb Hmod cond(H) conv
161  4  1  8.8E-03     0    73    71 4.2E+00  1.0E+00    0 -1.73532497E+03     4    20   0   340     0   17  1 1 9.6E+00 TTTF
162  3  1 -3.5E-02     6     0     0 1.5E+00  0.0E+00    0 -1.73533264E+03     4    20   0   342     0   18  1 0 1.3E+02 TTFF
163  4  1  2.9E-02     0     0     0 4.5E+00  0.0E+00    0 -1.73533617E+03     4    20   0   344     0   18  1 0 2.0E+01 TTFF
164  4  1  2.1E-02     0     0     0 2.3E+01  0.0E+00    0 -1.73538331E+03     4    20   0   347     0   18  1 0 9.8E+00 TTFF
165  4  1  3.0E-02     0     0     0 5.0E+00  0.0E+00    0 -1.73552261E+03     4    20   0   349     0   18  1 0 2.1E+01 TTFF
166  4  1  1.2E-02     0     0     0 1.0E+00  0.0E+00    0 -1.73556089E+03     4    20   0   350     0   18  1 0 2.2E+01 TTTF
tolrg  reduced to  1.162E-03      lvltol = 1
167  4  1  2.3E-03     0     0     0 1.0E+00  0.0E+00    0 -1.73556922E+03     4    20   0   351     0   18  1 0 2.2E+01 TTFF
168  4  1  1.2E-03     0     0     0 7.9E-01  0.0E+00    0 -1.73556953E+03     4    20   0   353     0   18  1 0 2.1E+01 TTFF
169  4  1  1.0E-04     0     0     0 1.0E+00  0.0E+00    0 -1.73556958E+03     4    20   0   354     0   18  1 0 2.0E+01 TTTT
tolrg  reduced to  1.013E-05      lvltol = 1
170  4  1  2.9E-05     0     0     0 1.1E+00  0.0E+00    0 -1.73556958E+03     4    20   0   356     0   18  1 0 1.7E+01 TTFF
171  4  1  1.0E-05     0     0     0 1.0E+00  0.0E+00    0 -1.73556958E+03     4    20   0   357     0   18  1 0 1.7E+01 TTFF
172  4  1  1.5E-06     0     0     0 1.2E+00  0.0E+00    0 -1.73556958E+03     4    20   0   359     0   18  1 0 1.7E+01 TTTF
tolrg  reduced to  1.000E-06      lvltol = 2
173  4  1  2.4E-07     0     0     0 1.0E+00  0.0E+00    0 -1.73556958E+03     4    20   0   360     0   18  1 0 1.7E+01 TTTF

Biggest dj =  3.583E-03 (variable     25)   norm rg =  2.402E-07   norm pi =  1.000E+00

```

Figure 2: The Minor Iteration log

#### 4.3.3  The minor iteration log

If Print level ≥ 1, one line of information is output to the PRINT file every kth minor iteration, where k is the specified Print frequency (default k=100). A heading is printed periodically. Problem t5weapon gives the log shown in Figure 2.
Label
Description
Itn
The current minor iteration number.
ph
The current phase of the solution procedure:
• 1 Phase 1 simplex method, trying to satisfy the linear constraints. The current solution is an infeasible vertex.
• 2 Phase 2 simplex method, solving a linear program.
• 3 Reduced-gradient method. A nonbasic variable has just become superbasic.
• 4 Reduced-gradient method, optimizing the current set of superbasic variables.
pp
The Partial Price indicator. The variable selected by the last PRICE operation came from the ppth partition of A and I. pp is set to zero when the basis is refactored.
A PRICE operation is defined to be the process by which a nonbasic variable is selected to become a new superbasic. The selected variable is denoted by jq. Variable jq often becomes basic immediately. Otherwise it remains superbasic, unless it reaches its opposite bound and becomes nonbasic again. If Partial price is in effect, variable jq is selected from App or Ipp, the ppth segments of the constraint matrix (  A I  ).
rg
In Phase 1, 2 or 3, this is dj, the reduced cost (reduced gradient) of the variable jq selected by PRICE at the start of the present iteration. Algebraically, dj=gj − πT aj for j=jq, where gj is the gradient of the current objective function, π is the vector of dual variables for the problem (or LC subproblem), and aj is the jth column of the current (  A I  ).
In Phase 4, rg is the largest reduced gradient among the superbasic variables.
+sbs
The variable jq selected by PRICE to be added to the superbasic set.
-sbs
The variable chosen to leave the set of superbasics. It has become basic if the entry under -bs is nonzero; otherwise it has become nonbasic.
-bs
The variable removed from the basis (if any) to become nonbasic.
step
The step length α taken along the current search direction p. The variables x have just been changed to x + α p.
pivot
If column aq replaces the rth column of the basis B, pivot is the rth element of a vector y satisfying By = aq. Wherever possible, step is chosen to avoid extremely small values of pivot (because they cause the basis to be nearly singular). In rare cases, it may be necessary to increase the Pivot tolerance to exclude very small elements of y from consideration during the computation of step.
ninf
The number of infeasibilities before the present iteration. This number decreases monotonically.
sinf,objective
If ninf>0, this is sinf, the sum of infeasibilities before the present iteration. It usually decreases at each nonzero step, but if ninf decreases by 2 or more, sinf may occasionally increase. Otherwise it is the value of the current objective function after the present iteration. For linear programs, it means the true linear objective function. For problems with linear constraints, it means the sum of the linear objective and the value returned by subroutine funobj. For problems with nonlinear constraints, it is the quantity just described if Lagrangian = No; otherwise it is the value of the augmented Lagrangian for the current major iterations (which tends to the true objective as convergence is approached).
L
The number of nonzeros representing the basis factor L. Immediately after a basis factorization B = LU, this is lenL, the number of subdiagonal elements in the columns of a lower triangular matrix. Further nonzeros are added to L when various columns of B are later replaced. (Thus, L increases monotonically.)
U
The number of nonzeros in the basis factor U. Immediately after a basis factorization, this is lenU, the number of diagonal and superdiagonal elements in the rows of an upper-triangular matrix. As columns of B are replaced, the matrix U is maintained explicitly (in sparse form). The value of U may fluctuate up or down; in general it will tend to increase.
ncp
The number of compressions required to recover storage in the data structure for U. This includes the number of compressions needed during the previous basis factorization. Normally ncp should increase very slowly. If not, the amount of workspace available to MINOS should be increased by a significant amount. As a suggestion, the work array z(*) should be extended by 2(L+U) elements.
The following items are printed if the problem is nonlinear or if the superbasic set is non-empty (i.e., if the current solution is not a vertex).
Label
Description
nobj
The number of times subroutine funobj has been called.
ncon
The number of times subroutine funcon has been called.
nsb
The current number of superbasic variables.
Hmod
An indication of the type of modifications made to the triangular matrix R that is used to approximate the reduced Hessian matrix. Two integers i1 and i2 are shown. They will remain zero for linear problems. If i1=1, a BFGS quasi-Newton update has been made to R, to account for a move within the current subspace. (This will not occur if the solution is infeasible.) If i2=1, R has been modified to account for a change in basis. This will sometimes occur even if the solution is infeasible (if a feasible point was obtained at some earlier stage).
Both updates are implemented by triangularizing the matrix R + vwT for some vectors v and w. If an update fails for numerical reasons, i1 or i2 will be set to 2, and the resulting R will be nearly singular. (However, this is highly unlikely.)
cond(H)
An estimate of the condition number of the reduced Hessian. It is the square of the ratio of the largest and smallest diagonals of the upper triangular matrix R—a lower bound on the condition number of the matrix RTR that approximates the reduced Hessian. cond(H) gives a rough indication of whether or not the optimization procedure is having difficulty. The reduced-gradient algorithm will make slow progress if cond(H) becomes as large as 108, and will probably fail to find a better solution if cond(H) reaches 1012 or more.
To guard against high values of cond(H), attention should be given to the scaling of the variables and the constraints. In some cases it may be necessary to add upper or lower bounds to certain variables to keep them a reasonable distance from singularities in the nonlinear functions or their derivatives.
conv
A set of four logical variables C1, C2, C3, C4 that are used to determine when to discontinue optimization in the current subspace (Phase 4) and consider releasing a nonbasic variable from its bound (the PRICE operation of Phase 3). Let rg be the norm of the reduced gradient, as described above. The meaning of the variables Cj is as follows:
 C1 is true if the change in x was sufficiently small; C2 is true if the change in the objective was sufficiently small; C3 is true if rg is smaller than some loose tolerance TOLRG; C4 is true if rg is smaller than some tighter tolerance.
The test used is of the form
if (C1 and C2 and C3) or C4 then go to Phase 3.
At present, tolrg = t |dj|, where t is the Subspace tolerance (nominally 0.5) and dj is the reduced-gradient norm at the most recent Phase 3 iteration. The “tighter tolerance" is the maximum of 0.1 tolrg and 10−7π. Only the tolerance t can be altered at run-time.

#### 4.3.4  Crash statistics

The following items are output to the PRINT file when no warm start is used. They refer to the number of columns that the CRASH procedure selects during several passes through A while searching for a triangular basis matrix.
Label
Description
Slacks
is the number of slacks selected initially.
Free cols
is the number of free columns in the basis, including those whose bounds are rather far apart.
Preferred
is the number of “preferred” columns in the basis (i.e., hs(j) = 3 for some jn). It will be a subset of the columns for which hs(j) = 3 was specified.
Unit
is the number of unit columns in the basis.
Double
is the number of columns in the basis containing 2 nonzeros.
Triangle
is the number of triangular columns in the basis with 3 or more nonzeros.
is the number of slacks used to pad the basis (to make it a nonsingular triangle).

#### 4.3.5  Basis factorization statistics

If Print level ≥ 1, the following items are output to the PRINT file whenever the basis B or the rectangular matrix BS = (  B S  )T is factorized. Note that BS may be factorized at the start of just some of the major iterations. It is immediately followed by a factorization of B itself.

Gaussian elimination is used to compute a sparse LU factorization of B or BS, where PLPT and PUQ are lower and upper triangular matrices for some permutation matrices P and Q.
Label
Description
Factorize
The number of factorizations since the start of the run.
Demand
A code giving the reason for the present factorization.
Itn
The current iteration number.
Nonlin
The number of nonlinear variables in the current basis B.
Linear
The number of linear variables in B.
Slacks
The number of slack variables in B.
m
The number of rows in the matrix being factorized (B or B S).
n
The number of columns in the matrix being factorized. Preceded by “=” if the matrix is B; by “>” if it is B S.
Elems
The number of nonzero elements in B or B S.
Amax
The largest nonzero in B or B S.
Density
The density of the matrix (percentage of nonzeros).
Merit
The average Markowitz merit count for the elements chosen to be the diagonals of PUQ. Each merit count is defined to be (c−1)(r−1) where c and r are the number of nonzeros in the column and row containing the element at the time it is selected to be the next diagonal. Merit is the average of m such quantities. It gives an indication of how much work was required to preserve sparsity during the factorization.
lenL
The number of nonzeros in the factor L.
L+U
The number of nonzeros in both L and U.
The number of times the data structure holding the partially factored matrix needed to be compressed to recover unused storage. Ideally this number should be zero. If it is more than 3 or 4, the amount of workspace available to MINOS should be increased for efficiency.
Incres
The percentage increase in the number of nonzeros in L and U relative to the number of nonzeros in B or B S.
Utri
The size of the “backward triangle” in B or B S. These top rows of U come directly from the matrix.
lenU
The number of nonzeros in the factor U.
Ltol
The maximum allowed size of nonzeros in L. Usually equal to the LU factor tolerance.
Umax
The maximum nonzero in U.
Ugrwth
The ratio Umax / Amax.
Ltri
The size of the “forward triangle” in B or B S. These initial columns of L come directly from the matrix.
dense1
is the number of columns remaining when the density of the basis matrix being factorized reached 0.3.
Lmax
The maximum nonzero in L (no larger than Ltol).
Akmax
The maximum nonzero arising during the factorization. (Printed only if Theshold Complete Pivoting is in effect.)
Agrwth
The ratio Akmax / Amax. (Printed only if Theshold Complete Pivoting is in effect.)
bump
The number of columns of B or B S excluding Utri and Ltri.
dense2
The number of columns remaining when the density of the basis matrix being factorized reached 0.6.
DUmax
The largest diagonal of U (really PUQ).
DUmin
The smallest diagonal of U.
condU
The ratio DUmax / DUmin. As long as Ltol is not large (say 10.0 or less), condU is an estimate of the condition number of B. If this number is extremely large, the basis is nearly singular and some numerical difficulties might occur. (However, an effort is made to avoid near-singularity by using slacks to replace columns of B that would have made Umin extremely small. Messages are issued to this effect, and the modified basis is refactored.)

#### 4.3.6  EXIT conditions

When the solution procedure terminates, an EXIT – message is printed to summarize the final result. Here we describe each message and suggest possible courses of action.

The number associated with each EXIT is the output value of the integer variable inform.

 The following messages arise when the SPECS file is found to contain no further problems.

-2 EXIT – input error.
MINOS encountered end-of-file or an endrun card before finding a SPECS file.

Otherwise, the SPECS file may be empty, or cards containing the keywords Skip or Endrun may imply that all problems should be ignored.

-1 ENDRUN
This message is printed at the end of a run if MINOS terminates of its own accord. Otherwise, the operating system will have intervened for one of many possible reasons (excess time, missing file, arithmetic error in user routines, etc.).

 The following messages arise when a solution exists (though it may not be optimal).

0 EXIT – optimal solution found
This is the message we all hope to see! It is certainly preferable to every other message, and we naturally want to believe what it says, because this is surely one situation where the computer knows best. There may be cause for celebration if the objective function has reached an astonishing new high (or low).

In all cases, a distinct level of caution is in order, even if it can wait until next morning. For example, if the objective value is much better than expected, we may have obtained an optimal solution to the wrong problem! Almost any item of data could have that effect if it has the wrong value. Verifying that the problem has been defined correctly is one of the more difficult tasks for a model builder. It is good practice in the function subroutines to print any data that is input during the first entry.

If nonlinearities exist, one must always ask the question: could there be more than one local optimum? When the constraints are linear and the objective is known to be convex (e.g., a sum of squares) then all will be well if we are minimizing the objective: a local minimum is a global minimum in the sense that no other point has a lower function value. (However, many points could have the same objective value, particularly if the objective is largely linear.) Conversely, if we are maximizing a convex function, a local maximum cannot be expected to be global, unless there are sufficient constraints to confine the feasible region.

Similar statements could be made about nonlinear constraints defining convex or concave regions. However, the functions of a problem are more likely to be neither convex nor concave. Always specify a good starting point if possible, especially for nonlinear variables, and include reasonable upper and lower bounds on the variables to confine the solution to the specific region of interest. We expect modelers to know something about their problem, and to make use of that knowledge as well as they can.

One other caution about “Optimal solution”s. Some of the variables or slacks may lie outside their bounds more than desired, especially if scaling was requested. Max Primal infeas refers to the largest bound infeasibility and which variable (or slack) is involved. If it is too large, consider restarting with a smaller Feasibility tolerance (say 10 times smaller) and perhaps Scale option 0.

Similarly, Max Dual infeas indicates which variable is most likely to be at a non-optimal value. Broadly speaking, if
Max Dual infeas/Norm of pi = 10d,
then the objective function would probably change in the dth significant digit if optimization could be continued. If d seems too large, consider restarting with smaller Optimality tolerances.

Finally, Nonlinear constraint violn shows the maximum infeasibility for nonlinear rows. If it seems too large, consider restarting with a smaller Row tolerance.

1 EXIT – the problem is infeasible
When the constraints are linear, this message can probably be trusted. Feasibility is measured with respect to the upper and lower bounds on the variables and slacks. Among all the points satisfying the general constraints Ax + s = 0, there is apparently no point that satisfies the bounds on x and s. Violations as small as the Feasibility tolerance are ignored, but at least one component of x or s violates a bound by more than the tolerance.

When nonlinear constraints are present, infeasibility is much harder to recognize correctly. Even if a feasible solution exists, the current linearization of the constraints may not contain a feasible point. In an attempt to deal with this situation, when solving each linearly constrained (LC) subproblem, MINOS is prepared to relax the bounds on the slacks associated with nonlinear rows.

If an LC subproblem proves to be infeasible or unbounded (or if the Lagrange multiplier estimates for the nonlinear constraints become large), MINOS enters so-called “nonlinear elastic” mode. The subproblem includes the original QP objective and the sum of the infeasibilities—suitably weighted using the Elastic weight parameter. In elastic mode, some of the bounds on the nonlinear rows “elastic”—i.e., they are allowed to violate their specified bounds. Variables subject to elastic bounds are known as elastic variables. An elastic variable is free to violate one or both of its original upper or lower bounds. If the original problem has a feasible solution and the elastic weight is sufficiently large, a feasible point eventually will be obtained for the perturbed constraints, and optimization can continue on the subproblem. If the nonlinear problem has no feasible solution, MINOS will tend to determine a “good” infeasible point if the elastic weight is sufficiently large. (If the elastic weight were infinite, MINOS would locally minimize the nonlinear constraint violations subject to the linear constraints and bounds.)

Unfortunately, even though MINOS locally minimizes the nonlinear constraint violations, there may still exist other regions in which the nonlinear constraints are satisfied. Wherever possible, nonlinear constraints should be defined in such a way that feasible points are known to exist when the constraints are linearized.

2 EXIT – the problem is unbounded (or badly scaled)
EXIT – violation limit exceeded – the problem may be unbounded
For linear problems, unboundedness is detected by the simplex method when a nonbasic variable can apparently be increased or decreased by an arbitrary amount without causing a basic variable to violate a bound. A message prior to the EXIT message will give the index of the nonbasic variable. Consider adding an upper or lower bound to the variable. Also, examine the constraints that have nonzeros in the associated column, to see if they have been formulated as intended.

Very rarely, the scaling of the problem could be so poor that numerical error will give an erroneous indication of unboundedness. Consider using the Scale option.

For nonlinear problems, MINOS monitors both the size of the current objective function and the size of the change in the variables at each step. If either of these is very large (as judged by the Unbounded parameters, the problem is terminated and declared UNBOUNDED. To avoid large function values, it may be necessary to impose bounds on some of the variables in order to keep them away from singularities in the nonlinear functions.

The second message indicates an abnormal termination while enforcing the limit on the constraint violations. This exit implies that the objective is not bounded below in the feasible region defined by expanding the bounds by the value of the Violation limit.

3 EXIT – major iteration limit exceeded
EXIT – minor iteration limit exceeded
EXIT – too many iterations
Either the Iterations limit or the Major iterations limit was exceeded before the required solution could be found. Check the iteration log to be sure that progress was being made. If so, restart the run using a basis file that was saved (or should have been saved!) at the end of the run.

4 EXIT – requested accuracy could not be achieved
A feasible solution has been found, but the requested accuracy in the dual infeasibilities could not be achieved. An abnormal termination has occurred, but MINOS is within 10−2 of satisfying the Major optimality tolerance. Check that the Major optimality tolerance is not too small.

5 EXIT – the superbasics limit is too small: nnn
The problem appears to be more nonlinear than anticipated. The current set of basic and superbasic variables have been optimized as much as possible and a PRICE operation is necessary to continue, but there are already nnn superbasics (and no room for any more).

In general, raise the Superbasics limit s by a reasonable amount, bearing in mind the storage needed for the reduced Hessian (about 1/2 s2 double words).

6 EXIT – constraint and objective values could not be calculated
This exit occurs if a value mode ≤ −1 is set during some call to funobj or funcon. MINOS assumes that you want the problem to be abandoned forthwith.

In some environments, this exit means that your subroutines were not successfully linked to MINOS. If the default versions of funobj and funcon are ever called, they issue a warning message and then set mode to terminate the run.

7 EXIT – subroutine funobj seems to be giving incorrect gradients
A check has been made on some individual elements of the objective gradient array at the first point that satisfies the linear constraints. At least one component gObj(j) is being set to a value that disagrees markedly with a forward-difference estimate of ∂ f / ∂ xj. (The relative difference between the computed and estimated values is 1.0 or more.) This exit is a safeguard, since MINOS will usually fail to make progress when the computed gradients are seriously inaccurate. In the process it may expend considerable effort before terminating with EXIT 9 below.

Check the function and gradient computation very carefully in funobj. A simple omission (such as forgetting to divide fObj by 2) could explain everything. If fObj or gObj(j) is very large, then give serious thought to scaling the function or the nonlinear variables.

If you feel certain that the computed gObj(j) is correct (and that the forward-difference estimate is therefore wrong), you can specify Verify level 0 to prevent individual elements from being checked. However, the optimization procedure may have difficulty.

8 EXIT – subroutine funcon seems to be giving incorrect gradients
This is analogous to the preceding exit. At least one of the computed Jacobian elements is significantly different from an estimate obtained by forward-differencing the constraint vector F(x). Follow the advice given above, trying to ensure that the arrays fCon and gCon are being set correctly in funcon.

9 EXIT – the current point cannot be improved upon
Several circumstances could lead to this exit.
1. Subroutines funobj or funcon could be returning accurate function values but inaccurate gradients (or vice versa). This is the most likely cause. Study the comments given for EXIT 7 and 8, and do your best to ensure that the coding is correct.

2. The function and gradient values could be consistent, but their precision could be too low. For example, accidental use of a real data type when double precision was intended would lead to a relative function precision of about 10−6 instead of something like 10−15. The default Optimality tolerance of 10−6 would need to be raised to about 10−3 for optimality to be declared (at a rather suboptimal point). Of course, it is better to revise the function coding to obtain as much precision as economically possible.

3. If function values are obtained from an expensive iterative process, they may be accurate to rather few significant figures, and gradients will probably not be available. One should specify
 Function precision t Major optimality tolerance √t
but even then, if t is as large as 10−5 or 10−6 (only 5 or 6 significant figures), the same exit condition may occur. At present the only remedy is to increase the accuracy of the function calculation.
10 EXIT – cannot satisfy the general constraints
An LU factorization of the basis has just been obtained and used to recompute the basic variables x B, given the present values of the superbasic and nonbasic variables. A step of “iterative refinement” has also been applied to increase the accuracy of x B. However, a row check has revealed that the resulting solution does not satisfy the current constraints Axs = 0 sufficiently well.

This probably means that the current basis is very ill-conditioned. Try Scale option 1 if scaling has not yet been used and there are some linear constraints and variables.

For certain highly structured basis matrices (notably those with band structure), a systematic growth may occur in the factor U. Consult the description of Umax, Umin and Growth in §4.3.5, and set the LU factor tolerance to 2.0 (or possibly even smaller, but not less than 1.0).

12 EXIT – terminated from subroutine s1User
The user has set the value iAbort = 1 in subroutine s1User. MINOS assumes that you want the problem to be abandoned forthwith.

 If the following exits occur during the first basis factorization, the primal and dual variables x and pi will have their original input values. BASIS files will be saved if requested, but certain values in the printed solution will not be meaningful.

20 EXIT – not enough integer/real storage for the basis factors
The main integer or real storage array iw(*) or rw(*) is apparently not large enough for this problem. The routine declaring iw and rw should be recompiled with a larger dimensions for those arrays. The new values should also be assigned to leniw and lenrw.

An estimate of the additional storage required is given in messages preceding the exit.

21 EXIT – error in basis package
A preceding message will describe the error in more detail. One such message says that the current basis has more than one element in row i and column j. This could be caused by a corresponding error in the input parameters a(*), ha(*), and ka(*).

22 EXIT – singular basis after nnn factorization attempts
This exit is highly unlikely to occur. The first factorization attempt will have found the basis to be structurally or numerically singular. (Some diagonals of the triangular matrix U were respectively zero or smaller than a certain tolerance.) The associated variables are replaced by slacks and the modified basis is refactorized, but singularity persists. This must mean that the problem is badly scaled, or the LU factor tolerance is too much larger than 1.0.

 If the following messages arise, either an OLD BASIS file could not be loaded properly, or some fatal system error has occurred. New BASIS files cannot be saved, and there is no solution to print. The problem is abandoned.

30 EXIT – the basis file dimensions do not match this problem
On the first line of the OLD BASIS file, the dimensions labeled m and n are different from those associated with the problem that has just been defined. You have probably loaded a file that belongs to another problem.

Remember, if you have added rows or columns to a(*), ha(*) and ka(*), you will have to alter m and n and the map beginning on the third line (a hazardous operation). It may be easier to restart with a PUNCH or DUMP file from an earlier version of the problem.

31 EXIT – the basis file state vector does not match this problem
For some reason, the OLD BASIS file is incompatible with the present problem, or is not consistent within itself. The number of basic entries in the state vector (i.e., the number of 3's in the map) is not the same as m on the first line, or some of the 2's in the map did not have a corresponding “j   xj” entry following the map.

32 EXIT – system error.  Wrong no. of basic variables:  nnn
This exit should never happen. It may indicate that the wrong MINOS source files have been compiled, or incorrect parameters have been used in the call to subroutine minoss.

Check that all integer variables and arrays are declared integer in your calling program (including those beginning with h!), and that all “real” variables and arrays are declared consistently. (They should be double precision on most machines.)

 The following messages arise if additional storage is needed to allow optimization to begin. The problem is abandoned.

42 EXIT – not enough 8-character storage to start solving the problem
The main character storage array cw(*) is not large enough.

43 EXIT – not enough integer storage to start solving the problem
The main integer storage array iw(*) is not large enough to provide workspace for the optimization procedure. See the advice given for Exit 20.

44 EXIT – not enough real storage to start solving the problem
The main storage array rw(*) is not large enough to provide workspace for the optimization procedure. Be sure that the Superbasics limit is not unreasonably large. Otherwise, see the advice for EXIT 20.

#### 4.3.7  Solution output

At the end of a run, the final solution is output to the PRINT file in accordance with the Solution keyword. Some header information appears first to identify the problem and the final state of the optimization procedure. A ROWS section and a COLUMNS section then follow, giving one line of information for each row and column. The format used is similar to certain commercial systems, though there is no industry standard.

An example of the printed solution is given in §4.3. In general, numerical values are output with format f16.5. The maximum record length is 111 characters, including the first (carriage-control) character.

To reduce clutter, a dot “.” is printed for any numerical value that is exactly zero. The values ±1 are also printed specially as 1.0 and -1.0. Infinite bounds (±1020 or larger) are printed as None.

Note: If two problems are the same except that one minimizes an objective f(x) and the other maximizes −f(x), their solutions will be the same but the signs of the dual variables πi and the reduced gradients dj will be reversed.

#### The ROWS section

General linear constraints take the form lAxu. The ith constraint is therefore of the form
α ≤ aT x ≤ β,
and the value of aT x is called the row activity. Internally, the linear constraints take the form Axs = 0, where the slack variables s should satisfy the bounds lsu. For the ith “row", it is the slack variable si that is directly available, and it is sometimes convenient to refer to its state. Slacks may be basic or nonbasic (but not superbasic).

Nonlinear constraints α ≤ Fi(x) + aT x ≤ β are treated similarly, except that the row activity and degree of infeasibility are computed directly from Fi(x) + aT x rather than from si.
Label
Description
Number
The value n+i. This is the internal number used to refer to the ith slack in the iteration log.
Row
The name of the ith row.
State
The state of the ith row relative to the bounds α and β. The various states possible are as follows.
State = LL
The row is at its lower limit, α.
State = UL
The row is at its upper limit, β.
State = EQ
The limits are the same (α = β).
State = BS
The constraint is not binding. si is basic.
A key is sometimes printed before the State to give some additional information about the state of the slack variable.
State key = A
Alternative optimum possible. The slack is nonbasic, but its reduced gradient is essentially zero. This means that if the slack were allowed to start moving from its current value, there would be no change in the objective function. The values of the basic and superbasic variables might change, giving a genuine alternative solution. The values of the dual variables might also change.
State key = D
Degenerate. The slack is basic, but it is equal to (or very close to) one of its bounds.
State key = I
Infeasible. The slack is basic and is currently violating one of its bounds by more than the Feasibility tolerance.
State key = N
Not precisely optimal. The slack is nonbasic. Its reduced gradient is larger than the Optimality tolerance .
Note: If Scale option > 0, the tests for assigning A, D, I, N are made on the scaled problem, since the keys are then more likely to be meaningful.
Activity
The row value aT x (or Fi(x) + aT x for nonlinear rows).
Slack activity
The amount by which the row differs from its nearest bound. (For free rows, it is taken to be minus the Activity.)
Lower limit
α, the lower bound on the row.
Upper limit
β, the upper bound on the row.
Dual activity
The value of the dual variable πi, often called the shadow price (or simplex multiplier) for the ith constraint. The full vector π always satisfies BT π = gB, where B is the current basis matrix and gB contains the associated gradients for the current objective function.
I
The constraint number, i.

#### The COLUMNS section

Here we talk about the “column variables" xj, j = 1:n. We assume that a typical variable has bounds α ≤ xj ≤ β.
Label
Description
Number
The column number, j. This is the internal number used to refer to xj in the iteration log.
Column
The name of xj.
State
The state of xj relative to the bounds α and β. The various states possible are as follows.
State = LL
xj is nonbasic at its lower limit, α.
State = UL
xj is nonbasic at its upper limit, β.
State = EQ
xj is nonbasic and fixed at the value α = β.
State = FR
xj is nonbasic at some value strictly between its bounds: α < xj < β.
State = BS
xj is basic. Usually α < xj < β.
State = SBS
xj is superbasic. Usually α < xj < β.
A key is sometimes printed before the State to give some additional information about the state of xj.
State key = A
Alternative optimum possible. The variable is nonbasic, but its reduced gradient is essentially zero. This means that if xj were allowed to start moving from its current value, there would be no change in the objective function. The values of the basic and superbasic variables might change, giving a genuine alternative solution. The values of the dual variables might also change.
State key = D
Degenerate. xj is basic, but it is equal to (or very close to) one of its bounds.
State key = I
Infeasible. xj is basic and is currently violating one of its bounds by more than the Feasibility tolerance.
State key = N
Not precisely optimal. xj is nonbasic. Its reduced gradient is larger than the Optimality tolerance .
Note: If Scale option > 0, the tests for assigning A, D, I, N are made on the scaled problem, since the keys are then more likely to be meaningful.
Activity
The value of the variable xj.
gj, the jth component of the gradient of the (linear or nonlinear) objective function. (If any xj is infeasible, gj is the gradient of the sum of infeasibilities.)
Lower limit
α, the lower bound on xj.
Upper limit
β, the upper bound on xj.
The reduced gradient dj = gj − πT aj, where aj is the jth column of the constraint matrix (or the jth column of the Jacobian at the start of the final major iteration).
M+J
The value m+j.

#### 4.3.8  The SUMMARY file

If Summary file > 0, the following information is output to the SUMMARY file. (It is a brief form of the PRINT file.) All output lines are less than 72 characters.

 • The Begin line from the SPECS file, if any. • The basis file loaded, if any. • A brief Major iteration log. • A brief Minor iteration log. • The EXIT condition and a summary of the final solution.

The following SUMMARY file is from example problem t6wood using Print level 0 and Major damping parameter 0.5.

```
==============================
M I N O S  5.51     (Nov 2002)
==============================

Begin t6wood   (WOPLANT test problem; optimal obj = -15.55716)

Name   WOPLANT
===>  Note:  row  OBJ       selected as linear part of objective.
Rows          9
Columns      12
Elements     73

Scale option  2,    Partial price   1

Itn      0 -- linear constraints satisfied.

This is problem t6wood.  Derivative level =  3

funcon  sets      36   out of      50   constraint gradients.

Major minor  step     objective  Feasible Optimal  nsb   ncon penalty BSwap
1     0T 0.0E+00  0.00000E+00 5.9E-01 1.1E+01    0      4 1.0E+00     0
2    22  5.0E-01 -1.56839E+01 2.7E-01 1.6E+01    3     47 1.0E+00     0
3    10  6.0E-01 -1.51527E+01 1.5E-01 9.9E+00    2     68 1.0E+00     2
4    21  5.7E-01 -1.53638E+01 6.4E-02 3.6E+00    3    113 1.0E+00     1
5    15  1.0E+00 -1.55604E+01 2.7E-02 1.4E-01    3    144 1.0E+00     0
6     5  1.0E+00 -1.55531E+01 6.4E-03 2.2E-01    3    154 1.0E+00     0
7     4  1.0E+00 -1.55569E+01 3.1E-04 7.0E-04    3    160 1.0E-01     0
8     2  1.0E+00 -1.55572E+01 1.6E-08 1.1E-04    3    163 1.0E-02     0
9     1  1.0E+00 -1.55572E+01 5.1E-14 2.2E-06    3    165 1.0E-03     0

EXIT -- optimal solution found

Problem name                 WOPLANT
No. of iterations                  80   Objective value     -1.5557160112E+01
No. of major iterations             9   Linear objective    -1.5557160112E+01
Penalty parameter            0.000100   Nonlinear objective  0.0000000000E+00
No. of calls to funobj              0   No. of calls to funcon            165
No. of superbasics                  3   No. of basic nonlinears             6
No. of degenerate steps             0   Percentage                       0.00
Norm of x   (scaled)          9.8E-01   Norm of pi  (scaled)          1.8E+02
Norm of x                     3.2E+01   Norm of pi                    1.6E+01
Max Prim inf(scaled)        0 0.0E+00   Max Dual inf(scaled)        1 2.2E-06
Max Primal infeas           0 0.0E+00   Max Dual infeas             1 5.8E-08
Nonlinear constraint violn    5.1E-14

Solution printed on file   9

funcon called with nstate =   2

Time for MPS input                           0.00 seconds
Time for solving problem                     0.04 seconds
Time for solution output                     0.00 seconds
Time for constraint functions                0.00 seconds
Time for objective function                  0.00 seconds
Endrun
```