Pivoting in LU Factorization
At an intermediate stage of LU factorization, the 2D array A that
originally contained the matrix
A has a mix of L, U, and partially processed parts of A.
Because of that, mixing the terminology of array versus matrix is almost
unavoidable. At the start, array A contains the matrix A. At the
end, it is a data structure representing the matrices L and U.
All LU factorization algorithms for dense matrices work in-place, so in general
assume that A refers to the array.
Another common nomenclature: at an intermediate stage of LU factorization, the parts of the
L and U factors that have already been computed are called
the reduced part. The rest of the array, usually its trailing principle submatrix,
is called the unreduced part.
When using any of the LU factorization methods (rank-1 update or matrix-vector
product) of LU factorization, dividing by A(k,k) in the scaling operation will cause
problems if the absolute value |A(k,k)| is small. To avoid this, algorithms
search for a large value in the unreduced part of the array A, then swap
rows and sometimes columns to move it to the (k,k) diagonal position.
Full pivoting means bringing the largest element in absolute value
into that position by swapping rows and columns in the matrix to put the largest
element in the (k,k) diagonal position. Partial pivoting means
searching for the largest only among the entries in the k-th column, and so only
require row swaps. Full pivoting is rarely used because it requires
O(n3) searching and data movement operations1.
Also, LU factorization is "usually stable" when partial pivoting is used.
Some classes of
problems (e.g., two point boundary value problems in ODEs) can make it fail, but
most often partial pivoting suffices. In general when the term "pivoting" is used,
it refers to partial pivoting, not full pivoting. After pivoting is done to move a
larger (or largest) value into the (k,k) diagonal position, it is called
the "pivot" or "pivot entry" or "pivot element". In any case it is the diagonal
element A(k,k), chosen in an attempt to avoid dividing by a too-small value
when scaling the subdiagonal part of column k.
Including partial pivoting in the
rank-1 version of LU factorization
gives:
for k = 1:n
p = index of max element in |A(k:n,k)|
piv(k) = p
swap rows k and p of A: A(k,:) ↔ A(p,:)
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n) − A(k+1:n,k) * A(k,k+1:n)
end for
The first step defines p as the index of the largest element in absolute
value in the diagonal and subdiagonal elements in column k. The outermost
loop on k runs to n, so that piv(n) will be defined. If
the LU factorization is then used to solve Ax = b, that last entry should
never be referenced; correct indexing of loops should lead to it never being used.
Mathematically, for a square matrix piv(n) = n must hold (why?). Also,
exchange of rows k and p of the array A means the entire
rows are swapped, not just the parts of the remaining trailing submatrix. The
corresponding parts of the previously computed portions of L and U
must also be swapped, since since they are overwriting A. It is another
magic feature of LU factorization that this makes the factors come out in the right
order as well.
The Matlab function LU1p.m implements this
with different indexing variables. However, notice three important features:
- swapping two rows does
not require a temporary variable the way that C or C++ does. Instead the swap is
done in one line: A([pivot(r) r], 1:n) = A([r pivot(r)], 1:n). The same syntax
can be used in Fortran, but it requires a comma to separate the row numbers, viz.,
A([pivot(r), r], 1:n) = A([r, pivot(r)], 1:n).
- when finding the pivot element using max() and abs(), it is possible that
more than one entry in the column has the maximal absolute value. In that case,
Matlab will return all such elements and their indices to s and I:
[s, I] = max(abs(A(r:m, r))). To guard against this, the Matlab function
uses indexing s(1) and I(1) to reference the the first maximal element and its
index.
- when A(r:m,r) is passed to the max() function, it does not know that the
indexing into that vector starts at r and goes to m. Instead it assumes the
indexing starts at 1 and goes to m-r. So when the index I is returned, we
have to shift it by adding r-1 so that it will be in the correct range:
pivot(r) = I(1)+r-1.
This is not specific to Matlab, and the same shift has to be done in C, C++,
or Fortran.
Including pivoting with the
matrix-vector product version
and letting the matrix be m × n with m ≥ n gives:
for k = 1:n−1
B(k:m,k) = B(k:m,k) − B(k:m,1:k−1) * B(1:k−1,k)
p = index of max element in |B(k:m,k)|
piv(k) = p
swap rows k and p of B: B(k,:) ↔ B(p,:)
B(k,k+1:n) = B(k,k+1:n) − B(k,1:k−1) * B(1:k−1,k+1:n)
B(k+1:m,k) = B(k+1:m,k)/B(k,k)
end for
B(n:m,n) = B(n:m,n) − B(n:m,1:n−1) * B(1:n−1,n)
B(n+1:m,n) = B(n+1:m,n)/B(n,n)
The array name B instead of A is used to distinguish this from
the usual square matrix case where m = n. However, B will actually be a
subarray of the array A in most cases. The last two steps are the new
ingredients - they apply the remaining updates to the rest of the matrix. Also,
another pivot step can be taken if |B(n,n)| is too small. That is one case
where piv(n) must be defined.
The Matlab function LU3.m implements this
algorithm without pivoting. The Matlab script
testLU3.m creates a 10 x 7 matrix and then
calls LU3.m on it, checking the resulting LU factors.
Obviously, the three lines for implementing pivoting in the algorithms above are
simple, and simple to implement. Here are some tips:
- If k = p, don't call a function to swap a row with itself (well, duh!).
Since swapping rows constitutes data movement without any computation occurring,
it has a load/store ratio of ∞. This is the worst possible ratio of memory
references to flops, so should be avoided when possible.
- The goal of pivoting is not to get the largest possible element into the
pivot position, but to avoid dividing by too-small of an entry. So the choice
of the pivot element can be relaxed. Markovitz pivoting with parameter
s searches for the subdiagonal element with largest absolute value,
but only swaps the rows if |A(k,p)| > s*|A(k,k)|. The value
s = 10 is a common choice. In other words, if the current diagonal
element A(k,k) is within a factor of 10 of the largest possible pivot, use
it as the pivot.
- Always choosing the largest subdiagonal element as the pivot can be seen
as setting the "Markovitz parameter" s = 1. Never pivoting can be seen
as setting s = ∞. This is a case where actually using one of
the IEEE special floating point values can make the coding simpler.
- Choosing the Markovitz parameter s too large can lead to numerical
instability, which is what pivoting is supposed to avoid. Part of the issue is
just how much time is spent in swapping rows for a give value of s.
At this point, you have the skills needed to determine that ... and it would be
foolish not to insert the necessary timing calls in advance.
- Pivoting does mean that LU factorization is not completely in-place, using
only the original array A for everything. However, it introduces only an integer
vector of length n and so is minor compared to copying or replicating the
original data for A which requires n2 words of storage.
- The BLAS level-1 function dswap() will swap two vectors in-place,
and was in large part defined to be used for LU factorization pivoting. This
also explains why dswap has the signature dswap(n, x, incx, y, incy):
the entries of the vectors x and y are rows of the 2D array A. When A is in
column-major order, the strides incx and incy are greater
than 1. What should those strides be?
- The Matlab idiom for swapping rows k and p in an array A is
A([k p], :) = A([p k], :) which is more straightforward than the
usual loop with a temporary approach required by C/C++ or Java.
Just beware when using Fortran: it requires commas separate the two row
indices: A([k, p], :) = A([p, k], :) . Using brackets [k, p] was
introduced in Fortran 2008; before that the notation had to use
(\k, p \) which is still valid but just plain ugly.
1 a version of full pivoting has been developed that does not require
O(n3) work, but is more complicated than what we need here.
Next: BLAS-3 version of LU factorization
- Last Modified: Mon 04 Nov 2019, 07:01 AM