AD Forward Mode - Basic Use in MATLAB

4 Basic Use of Forward Mode for First Derivatives

4.1 Differentiating Matlab Expressions

Using forward mode AD, Mad can easily compute first derivatives of functions defined via expressions built-up of the arithmetic operations and intrinsic functions of Matlab. For the moment let us simply consider the trivial operation y=f(x)=x² for which the derivative is 2x. Let us determine this derivative at x=3 for which f(3)=9 and df(3)/dx=6. We will make use of the fmad constructor function fmad, and the functions getvalue and getderivs for extracting values and derivatives from fmad objects.

Example 1

MADEXForward1: Forward Mode Example y=x²

This example demonstrates how to differentiate simple arithmetic expressions in Matlab.

See also MADEXForward2

Contents

Differentiating y=f(x)=x²
Using fmad
Calculating y
Extracting the value
Extracting the derivatives

Differentiating y=f(x)=x²

MAD can easily compute first derivatives of functions defined via expressions built-up of the arithmetic operations and intrinsic functions of Matlab using forward mode AD. For the moment let us simply consider the trivial operation

y=f(x)=x²

for which the derivative is

=2x.

Let us determine this derivative at x=3 for which f(3)=9 and df(3)/dx=6.

Using fmad

We first define x to be of fmad class and have value 3 and derivative 1 (since the derivative with respect to the independent variable is one, dx/dx=1). The command and Matlab output is as below.


format compact % so there are no blank lines in the output
x=fmad(3,1) % creates the fmad object with value 3 and derivative 1

value =
     3
derivatives =
     1

Calculating y

The value of the object x is 3 and its derivatives correspond to an object of size [1 1] (a scalar) with one derivative of value 1. We now calculate y exactly as we would normally, allowing the fmad function libraries to take care of the arithmetic associated with values and derivatives.


y=x^2

value =
     9
derivatives =
     6

Extracting the value

We extract the value of y using the MAD function getvalue.


yvalue=getvalue(y)

yvalue =
     9

Extracting the derivatives

Similarly we extract the derivatives using getderivs.


yderivs=getderivs(y)

yderivs =
     6

4.2 Vector-Valued Functions

Let us now consider a simple, linear vector-valued function for which we may use fmad to calculate Jacobian-vector products or the function's Jacobian. We also see, for the first time fmad's getinternalderivs function.

Example 2

MADEXForward2: Forward Mode Example y=A*x

This example demonstrates how to calculate Jacobian-vector products and how to calculate the Jacobian for a simple vector-valued function.

See also MADEXForward1

Contents

Differentiating y=f(x)=A*x
Directional derivative in [1;0;0] direction
Directional derivative in [0;1;0] direction
Calculating the Jacobian
But dy is a 3-D array not a matrix!

Differentiating y=f(x)=A*x

MAD easily deals with vector-valued functions. Let us consider the function

y=f(x)=A*x

at x=[1; 1 ;1 ] where A is the matrix,

A=[ 1  4  7
    2  5  8
    3  6  9]

for which the Jacobian is the matrix A and directional derivatives in the 3 respective coordinate directions are the 3 respective columns of A.


format compact;
A=[1 4 7;2 5 8; 3 6 9];
xval=[1;1;1];

Directional derivative in [1;0;0] direction

Using the fmad constructor function we initialise x with its directional derivative, calculate y and extract the derivative. We find the correct directional derivative [1;2;3].


x=fmad(xval,[1;0;0])
y=A*x;
dy=getderivs(y)

fmad object x
value =
     1
     1
     1
derivatives =
     1
     0
     0
dy =
     1
     2
     3

Directional derivative in [0;1;0] direction

Similarly in the [0;1;0] direction we get [4;5;6].


x=fmad(xval,[0;1;0]);
y=A*x;
dy=getderivs(y)

Calculating the Jacobian

To obtain the full Jacobian we initialise the derivative with the 3 x 3 identity matrix.


x=fmad(xval,eye(3))
y=A*x
dy=getderivs(y)

fmad object x
value =
     1
     1
     1
derivvec object derivatives
Size = 3  1
No. of derivs = 3
derivs(:,:,1) =
     1
     0
     0
derivs(:,:,2) =
     0
     1
     0
derivs(:,:,3) =
     0
     0
     1
fmad object y
value =
    12
    15
    18
derivvec object derivatives
Size = 3  1
No. of derivs = 3
derivs(:,:,1) =
     1
     2
     3
derivs(:,:,2) =
     4
     5
     6
derivs(:,:,3) =
     7
     8
     9
dy(:,:,1) =
     1
     2
     3
dy(:,:,2) =
     4
     5
     6
dy(:,:,3) =
     7
     8
     9

But dy is a 3-D array not a matrix!

Notice now that we have a 3-dimensional array returned for a result we expect to be a matrix. This is easily remedied using the Matlab intrinsic squeeze, which squeezes out singleton dimensions, or more conveniently using getinternalderivs to extract the derivatives from the fmad object y.


dy=squeeze(dy)
dy=getinternalderivs(y)

dy =
     1     4     7
     2     5     8
     3     6     9
dy =
     1     4     7
     2     5     8
     3     6     9

This example brings us to an important topic, how are the derivatives stored in Mad?

4.3 Storage of Derivatives in Mad

Mad stores derivatives in two ways depending upon whether a variable of fmad class possesses single, or multiple directional derivatives.

4.3.1 Single Directional Derivatives

For a variable of fmad class with a single directional derivatives the derivatives are stored as an array of class double with the same shape as the variable's value. The getderivs function will therefore return this array. We have seen this behaviour in Example 1 and Example 2.

4.3.2 Multiple Directional Derivatives

For a variable of fmad class with multiple directional derivatives, the derivatives are stored in an object of class derivvec. Mad has been written so that multiple directional derivatives may be thought of as an array of one-dimension higher than that of the corresponding value using, what we term, the derivative's external representation. Mad also has an internal representation of derivatives that may often be of use, particularly when interfacing Mad with another Matlab package. Accessing the internal representation is covered in section 5.1. Consider an N-dimensional Matlab array A with size(A)=[n1 n2 n3 ... nN], such that (n1 n2 ... nN) are positive integers. Then the external representation of A's derivatives, which we shall call dA is designed to have size(dA)=[n1 n2 n3 ... nN nD], where nD>1 is the number of directional derivatives. An example or two may make things clearer.

Example 3

MADEXExtRep1: External representation of a matrix

This example illustrates MAD's external representation of derivatives for a matrix.

See also MADEXExtRep2, MADEXIntRep1

Contents

Define a 2 x 2 matrix A.
Multiple directional derivatives for A
The fmad object
The derivs component

Define a 2 x 2 matrix A.


format compact
A=[1 2; 3 4]

A =
     1     2
     3     4

Multiple directional derivatives for A

Define a 2 x 2 x 3 matrix (note the transpose) to provide 3 directional derivatives for dA.


dA=reshape([1 2 3 4;5 6 7 8;9 10 11 12]',[2 2 3])

dA(:,:,1) =
     1     3
     2     4
dA(:,:,2) =
     5     7
     6     8
dA(:,:,3) =
     9    11
    10    12

The fmad object

We create the fmad object,


A=fmad(A,dA)

value =
     1     2
     3     4
Derivatives
Size = 2  2
No. of derivs = 3
derivs(:,:,1) =
     1     3
     2     4
derivs(:,:,2) =
     5     7
     6     8
derivs(:,:,3) =
     9    11
    10    12

The derivs component

The last index of the A's derivs component refers to the directional derivatives and we see that, as expected, we have a 2 x 2 x 3 array which makes up 3 sets of 2 x 2 directional derivative matrices.

Note that fmad (like Matlab and Fortran) assumes a column major (or first index fastest) ordering for array storage. The derivatives supplied to the constructor function fmad are therefore assumed to be in column major order and the external representation reshapes them appropriately. We therefore needn't bother reshaping the derivatives before passing them to fmad as the example below shows.

Example 4

MADEXExtRep2: External Representation Example

This example shows the use of column-major ordering for an input derivative.

See also MADEXExtRep1, MADEXIntRep1

Contents

Define a 2 x 2 matrix A.
Multiple directional derivatives for A
The fmad object
The derivs component

Define a 2 x 2 matrix A.


format compact
A=[1 2; 3 4]

A =
     1     2
     3     4

Multiple directional derivatives for A

Define a vector to provide directional derivatives for dA.


dA=[1 2 3 4 5 6 7 8 9 10 11 12];

The fmad object

We create the fmad object,


A=fmad(A,dA)

value =
     1     2
     3     4
Derivatives
Size = 2  2
No. of derivs = 3
derivs(:,:,1) =
     1     3
     2     4
derivs(:,:,2) =
     5     7
     6     8
derivs(:,:,3) =
     9    11
    10    12

The derivs component

Notice how the elements of dA are taken in column major order to provide the 3 directional derivatives of A in turn.

4.4 Use of Intrinsics (Operators and Functions)

We have produced overloaded fmad functions for most commonly used Matlab operators and functions. For example addition of two fmad objects is determined by the function @fmad\plus, subtraction @fmad\minus, etc. To see which overloaded intrinsic operators and functions are provided type

help fmad.Contents

at the Matlab prompt. The help information for individual functions describes the function and any additional information such as restrictions, e.g.

help fmad.plus

4.5 Additional Functions

We have also coded a small number of additional functions which are not part of Matlab but application specific. For example, for a matrix A, logdet(A) calculates log(abs(det(A))) a function used in statistical parameter estimation. This composition of functions may be differentiated very efficiently by considering it in this composite form [Kub94, For01]. Note that a version of the function valid for arguments of classes double and sparse is provided in the MAD/madutil.

4.6 Differentiating User Defined Functions

In section 4.1 we saw how simple Matlab expressions could be differentiated using Mad. It is usually straightforward to apply the same methodology to differentiating user defined functions.

Example 5 Consider the function which defines the Brusselator stiff ODE problem from the Matlab ODE toolbox, supplied as brussode_f.m and shown in Figure 1. For a given scalar time t (unused) and vector y of 2N rows, this function brussode_f will supply the right-hand-side function for an ODE of the form,

=F(t,y).

The Jacobian of the function is required to solve the ODE efficiently via an implicit stiff ODE solver (e.g., ode15s in Matlab). The function has actually been written in so-called vectorised form, i.e. if y is a matrix then the dydt supplied will also be a matrix with each column of dydt corresponding to the appropriate column of y. This approach allows for the efficient approximation of the function's Jacobian via finite-differences.

It is trivial to calculate the Jacobian of the function of figure 1 using Mad. The commands to do so are given in the script MADEXbrussodeJac.m. The script also includes a simple one-sided finite-difference (FD) calculation of the Jacobian.

MADEXbrussodeJac: Full Jacobian of Brusselator Problem

This example shows how to use the fmad class to calculate the Jacobian of the Brusselator problem and compare the result with those from finite-differencing. More efficient AD approaches are available as shown in MADEXbrussodeJacSparse, MADEXbrussodeJacComp

See also: brussode_f, brussode_S, MADEXbrussodeJacSparse, MADEXbrussodeJacComp, brussode

Contents

Initialise variables
Use fmad to calculate the Jacobian
Using finite-differencing
Comparing the AD and FD Jacobian

Initialise variables

We define the input variables.


N=3;
t=0;
y0=ones(2*N,1);

Use fmad to calculate the Jacobian

We initialise y with the value y0 and derivatives given by the identity matrix of size equal to the length of y0. We may then evaluate the function and extract the value and Jacobian.


y=fmad(y0,eye(length(y0)));
F=brussode_f(t,y,N);
F_fmad=getvalue(F); % grab value
JF_fmad=getinternalderivs(F) % grab Jacobian

JF_fmad =
   -2.6400    1.0000    0.3200         0         0         0
    1.0000   -1.6400         0    0.3200         0         0
    0.3200         0   -2.6400    1.0000    0.3200         0
         0    0.3200    1.0000   -1.6400         0    0.3200
         0         0    0.3200         0   -2.6400    1.0000
         0         0         0    0.3200    1.0000   -1.6400

Using finite-differencing

We make 2N extra copies of y, add a perturbation to the copies, perform all function evaluations taking advantage of the vectorization of the function brussode_f, and approximate the derivatives using 1-sided finite-differences.


y=repmat(y0,[1 2*N+1]);
y(:,2:end)=y(:,2:end)+sqrt(eps)*eye(2*N);
F=brussode_f(t,y,N);
F_fd=F(:,1);
JF_fd=(F(:,2:end)-repmat(F_fd,[1 2*N]))./sqrt(eps)

JF_fd =
   -2.6400    1.0000    0.3200         0         0         0
    1.0000   -1.6400         0    0.3200         0         0
    0.3200         0   -2.6400    1.0000    0.3200         0
         0    0.3200    1.0000   -1.6400         0    0.3200
         0         0    0.3200         0   -2.6400    1.0000
         0         0         0    0.3200    1.0000   -1.6400

Comparing the AD and FD Jacobian

The function values computed are, of course, identical. The Jacobians disagree by about 1e-8, this is due to the truncation error of the finite-difference approximation.


disp(['Function values norm(F_fd-F_fmad)  = ',num2str(norm(F_fd-F_fmad,inf))])
disp(['Function Jacobian norm(JF_fd-JF_fmad)= ',num2str(norm(JF_fd-JF_fmad,inf))])

Function values norm(F_fd-F_fmad)  = 0
Function Jacobian norm(JF_fd-JF_fmad)= 2.861e-008

function dydt=brussode_f(t,y,N)
% brussode_f: defines ODE for the Brusselator problem.
%            Taken from matlab 6.5 brussode.
%
% See also MADEXbrussode, brussode_S, brussode

% AUTHOR: S.A.Forth
% DATE: 30/5/06
% Copyright 2006: S.A.Forth, Cranfield University
%
% REVISIONS:
% DATE  WHO   WHAT
c = 0.02 * (N+1)^2;
dydt = zeros(2*N,size(y,2));      % preallocate dy/dt

% Evaluate the 2 components of the function at one edge of the grid
% (with edge conditions).
i = 1;
dydt(i,:) = 1 + y(i+1,:).*y(i,:).^2 - 4*y(i,:) + c*(1-2*y(i,:)+y(i+2,:));
dydt(i+1,:) = 3*y(i,:) - y(i+1,:).*y(i,:).^2 + c*(3-2*y(i+1,:)+y(i+3,:));

% Evaluate the 2 components of the function at all interior grid points.
i = 3:2:2*N-3;
dydt(i,:) = 1 + y(i+1,:).*y(i,:).^2 - 4*y(i,:) + ...
    c*(y(i-2,:)-2*y(i,:)+y(i+2,:));
dydt(i+1,:) = 3*y(i,:) - y(i+1,:).*y(i,:).^2 + ...
    c*(y(i-1,:)-2*y(i+1,:)+y(i+3,:));

% Evaluate the 2 components of the function at the other edge of the grid
% (with edge conditions).
i = 2*N-1;
dydt(i,:) = 1 + y(i+1,:).*y(i,:).^2 - 4*y(i,:) + c*(y(i-2,:)-2*y(i,:)+1);
dydt(i+1,:) = 3*y(i,:) - y(i+1,:).*y(i,:).^2 + c*(y(i-1,:)-2*y(i+1,:)+3);

Figure 1: Brusselator function supplied with Matlab ODE Toolbox

We should note that the above approach is not efficient either for FD or AD for large N. The Jacobian is sparse and exploiting the sparsity greatly increases the efficiency of both approaches. Exploiting sparsity will be covered in Sections 5.3 and 5.4.

4.7 Possible Problems Using the Forward Mode

In many cases fmad will be able to differentiate fairly involved functions as shown in Figure 1. However some problems in terms of robustness and efficiency may occur.

Assigning an fmad object to a variable of class double. For example,
```
>> x=fmad(1,1);
>> y=zeros(2,1);
>> y(1)=x;
??? Conversion to double from fmad is not possible.
```
See Section 5.2 and Example 4 for a solution to this problem.
Differentiation for a particular function is not supported. For example,
```
>> x=fmad(1,1);
>> y=tanh(x)
??? Error using ==> tanh
Function 'tanh' not defined for variables of class 'fmad'.
```
Please contact the author to arrange for the fmad class to be extended for your case.
Derivatives are not accurate when differentiating an iteratively defined function. See section 5.5 for an approach to this issue.
Differentiation is slow compared to finite-differences. The fmad and derivvec class are designed to be competitive with finite-differencing in terms of speed of execution. They will of course return derivatives correct to within machine rounding. If this is not the case then please contact the author.
Derivatives are incorrect. Then you have discovered a bug, please contact the author.

« Previous « Start » Next »