Numerical Properties of LU Factorization
Deriving error bounds for LU factorization is not done in P573; that is the task of
a numerical analysis course. The practical computational implications of the
error analyses are stressed here.
Matrix Norms
Expressing the error properties of LU factorization uses matrix norms.
Let ε be machine precision; for 64-bit double precision numbers this means
ε ≈ 2.22 × 10^{-16}.
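If numpy is available (an assumption for the small examples in these notes, not a course requirement), the value can be checked directly:

    # Machine epsilon for IEEE double precision: 2^{-52}, the gap between
    # 1.0 and the next larger representable double.
    import numpy as np
    print(np.finfo(np.float64).eps)    # 2.220446049250313e-16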
Recall the various norms of an n-vector x:
- ||x||_1 = ∑_{i=1:n} |x(i)|
- ||x||_∞ = max_{i=1:n} |x(i)|
- ||x||_2 = ( ∑_{i=1:n} x(i)^2 )^{1/2}
The norm of a matrix is also defined, but can be more
complex. For an m × n matrix A,
- ||A||_F = ( ∑_{i=1:m} ∑_{j=1:n} A(i,j)^2 )^{1/2}
- ||A||_1 = max_{j=1:n} ∑_{i=1:m} |A(i,j)|
- ||A||_∞ = max_{i=1:m} ∑_{j=1:n} |A(i,j)|
- ||A||_2 = max |y^T A x| / (||x||_2 ||y||_2) over all nonzero vectors y and x
- ||A||_2 = square root of the largest eigenvalue of A^T A
As mentioned earlier, ||AB|| ≤ ||A|| ||B|| does not hold for all matrix
norms. If a norm does satisfy that property, it is called a consistent norm.
The matrix 2-norm above is defined using vector norms, and the 1- and ∞-norms
can be equivalently defined the same way. More generally, for p = 1, 2,
and ∞ define
||A||_p = max ||Ax||_p / ||x||_p over all x ≠ 0.
Heuristically, ||A||_p is the maximum amount by which A can "stretch" a vector.
Slight digression: the p-norm for both vectors and matrices
can be defined for any p with 1 ≤ p ≤ ∞, so
there are uncountably many different norms.
Norms defined as above are called operator norms in functional analysis.
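As a sanity check on these definitions, here is a minimal sketch (assuming numpy, which these notes do not otherwise use) that computes the norms above and tests the consistency property for the 2-norm:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((3, 5))
    x = rng.standard_normal(3)

    # Vector norms: 1, inf, 2
    print(np.linalg.norm(x, 1), np.linalg.norm(x, np.inf), np.linalg.norm(x, 2))

    # Matrix norms: Frobenius, 1 (max column sum), inf (max row sum), 2
    print(np.linalg.norm(A, 'fro'), np.linalg.norm(A, 1),
          np.linalg.norm(A, np.inf), np.linalg.norm(A, 2))

    # The 2-norm is also the square root of the largest eigenvalue of A^T A
    print(np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))

    # Consistency of the 2-norm: ||AB|| <= ||A|| ||B||
    print(np.linalg.norm(A @ B, 2) <=
          np.linalg.norm(A, 2) * np.linalg.norm(B, 2))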
Condition numbers
LU factorization was the original motivation for numerical analysis
back before Egypt built the pyramids (or around then).
Results generally use the condition number k(A) of A, defined
for nonsingular matrices as
k(A) = ||A|| ||A^{-1}||
Notes about k(A):
- The condition number definition depends on the norm used; generally
k(A) = k_2(A) is used, and the convention is that without a subscript,
k(A) means the 2-norm based condition number.
- k(A) ≥ 1 (prove this!).
- When A is symmetric, k(A) = |λ_1|/|λ_n|, where
λ_1 is an eigenvalue of A with maximal absolute
value and λ_n is an eigenvalue of A with minimal
absolute value. The phrasing "an eigenvalue" is needed since eigenvalues can be
repeated, or two different eigenvalues may have the same absolute value. The term
"maximum" implies a unique eigenvalue achieves that value, while "maximal" allows it
to be achieved by multiple eigenvalues.
- An ill-conditioned problem means one with a large condition number.
[But what is "large" in this context?]
- Conditioning is a property of the problem; numerical stability is a
property of an algorithm and its implementation.
- Conditioning is a broader concept than what is defined here. Generally,
a condition number measures how close the given problem is to one that is singular
or unsolvable in the given context.
- Actually it is the reciprocal of the condition number that gives the
distance, and most software will return it instead of k(A) in a variable
called rcond for "reciprocal condition number".
- A condition number can be defined for a singular or rectangular matrix, but
then the context is usually solving least-squares problems. One definition of a condition
number for a matrix is the ratio of its largest to smallest nonzero singular
values, which is well-defined for every matrix except the zero matrix (the one
matrix with no nonzero singular values).
That definition also coincides with the definition
k(A) = ||A|| ||A^{-1}|| when A
is square and nonsingular, making it a robust extension to singular or rectangular
matrices.
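A small sketch (again assuming numpy) showing the equivalent ways of getting k(A) described above, along with its reciprocal rcond:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))

    k = np.linalg.cond(A)          # 2-norm condition number by default
    print(k, 1.0 / k)              # k(A) and rcond

    # Same value from the definition k(A) = ||A|| ||A^{-1}|| ...
    print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))

    # ... and from the ratio of largest to smallest singular values,
    # which also makes sense for rectangular matrices
    s = np.linalg.svd(A, compute_uv=False)
    print(s[0] / s[-1])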
LU factorization error analysis: perturbation result:
For this error bound, a perturbation to a matrix A is denoted
ΔA. This occurs as a result of representing real
numbers as finite precision machine numbers, or from non-exact arithmetic operations.
ΔA is a single matrix, not the product Δ*A
(or the Laplace operator applied to A). So think of
ΔA as a small change in the matrix A, of order
10^{-16} in double precision arithmetic. The same holds for
Δb as a small perturbation of the vector b.
Let x be the exact solution to Ax = b,
where A is nonsingular and b is given.
If ||ΔA|| ||A^{-1}|| = r < 1,
then the perturbed system (A + ΔA)y = b + Δb is nonsingular
with a unique solution y.
Furthermore, if h is chosen so that ||ΔA|| < h ||A|| and
||Δb|| < h ||b||, then
||x − y|| < 2 h k(A) ||x|| / (1 − r)
Big Fat Hairy Deal.
An equation (OK, inequality) like this isn't worth diddly-squat unless you can
understand and use it. So mentally, let's take this apart: k(A) ≥ 1, and
is much larger than 1 for an ill-conditioned matrix. The term r is the
product of the norm of the perturbation of A and the norm
||A^{-1}||, and that norm measures how much the inverse of
A can stretch a vector. Requiring r < 1 implies that the
perturbation is small enough to overcome any tendency of A^{-1}
to blow up a vector it multiplies. If r is small enough, the
denominator 1 − r is close to 1 and can be neglected. h bounds the
sizes of the perturbations (||ΔA|| and
||Δb||) relative to the quantities ||A|| and ||b||. So the
perturbation result gives an upper bound on
||x − y|| / ||x||,
which is the relative error in the computed solution y from
the exact solution x. If things aren't too perverse, that bound boils down
to 2 h k(A).
So the perturbation bound depends on
k(A), and if that is large then the relative error in
the computed solution may be large — unless the perturbations on the matrix
A and right-hand side vector b can be made small to
offset k(A).
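To make the bound concrete, here is a sketch of a numerical experiment (assuming numpy; the sizes n and h below are arbitrary choices): perturb A and b at relative level h, solve the perturbed system, and compare the observed relative error against 2 h k(A) / (1 − r).

    import numpy as np

    rng = np.random.default_rng(2)
    n, h = 50, 1e-10
    A = rng.standard_normal((n, n))
    x = rng.standard_normal(n)
    b = A @ x                      # make x the known exact solution

    # Random perturbations scaled so ||dA|| = h ||A|| and ||db|| = h ||b||
    dA = rng.standard_normal((n, n))
    dA *= h * np.linalg.norm(A, 2) / np.linalg.norm(dA, 2)
    db = rng.standard_normal(n)
    db *= h * np.linalg.norm(b) / np.linalg.norm(db)

    y = np.linalg.solve(A + dA, b + db)   # solution of the perturbed system
    r = np.linalg.norm(dA, 2) * np.linalg.norm(np.linalg.inv(A), 2)

    rel_err = np.linalg.norm(x - y) / np.linalg.norm(x)
    bound = 2 * h * np.linalg.cond(A) / (1 - r)
    print(rel_err, bound, rel_err <= bound)

In practice the observed error sits well below the bound; the bound is a worst case, not a prediction.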
But Wait, There's More (Part I)
This gives an upper bound on the relative error, but a computed solution may be much
more accurate than the inequality implies. In some cases (not LU factorization,
however) a computed solution can be accurate to machine tolerance even when the
condition number is infinite and the matrix is actually singular. For solving linear
systems of equations, "the" error vector is x − y, where
x and y are the exact and computed solutions. If the matrix is
singular, there is no single exact solution. In this case, the accuracy of a
computed or approximate solution can be measured by the residual vector
b − Ay. If y nearly solves the linear system, then ||b
− Ay|| should be near zero. So the perturbation result is not the final
complete One True Bound.
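A tiny illustration of the residual idea just mentioned (numpy assumed): even for a singular matrix with a consistent right-hand side, where no unique exact solution exists, a least-squares solve still produces a residual near zero.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])     # singular: second row is twice the first
    b = np.array([3.0, 6.0])       # consistent right-hand side

    y, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(b - A @ y))   # residual norm: essentially zero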
But Wait, There's More (Part II)
The bound above depends on the condition number of the matrix. That in turn requires
knowing ||A^{-1}||. How can you compute that without knowing
A^{-1}? And if you know A^{-1}, then
why not just compute x = A^{-1}*b and not waste time and
computation fooling around with LU factorization, triangular solves, etc.?
One answer is that you can get estimates of the condition number from some numerical
linear algebra libraries. But in typical Ouroboros fashion, they often base the
estimate of ||A^{-1}|| on computing the LU factorization of
the matrix A and then dynamically choosing right-hand side vectors
u to get maximal blow-up in the solution v to LUv = u.
But if the computed L and U are wildly inaccurate, then doesn't
that mean the estimated ||A^{-1}|| is also wildly inaccurate?
That means some bound needs to be provided not only on the computed solution vector, but
also on the computed LU factors ....
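The flavor of such an estimator can be sketched as follows (a crude randomized stand-in assuming scipy, not the real LAPACK estimator, which chooses the vectors u adaptively rather than at random):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(3)
    n = 100
    A = rng.standard_normal((n, n))

    lu, piv = lu_factor(A)         # one LU factorization, reused below

    # Probe with random unit vectors u, keeping the worst blow-up in
    # v = A^{-1} u; each ||v|| is a lower bound on ||A^{-1}||_2.
    est = 0.0
    for _ in range(10):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)
        v = lu_solve((lu, piv), u)
        est = max(est, np.linalg.norm(v))

    print(est * np.linalg.norm(A, 2))   # condition number estimate
    print(np.linalg.cond(A))            # exact value, for comparison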
LU factorization accuracy result:
Define |G| as the matrix G with all of its entries replaced by their absolute
values (the Matlab function abs(G) does this element-by-element absolute value).
|G| is not a norm, because it returns a matrix, not a real number. Unfortunately
some (mostly older) textbooks and papers use the notation |G| for the norm
of G.
In these post-pyramidal times, that notation is mostly deprecated, but beware of
it in publications, textbooks, or online materials for other courses.
Let ε be machine tolerance; for double precision arithmetic this is
approximately 2.22 × 10^{-16}, as noted above.
Let L and U be the factors computed by any of the algorithms covered,
and let x be the solution computed using the permutation matrix and
the two triangular solves.
Then (A + E)x = b, where
|E| ≤ n ε (3|A| + 5|L| |U|) + O(ε^2)
This is a different creature than the perturbation result.
- First, the O(ε^2) term is eminently
discardable. For ε ≈ 2.22 × 10^{-16},
ε^2 ≈ 4.9 × 10^{-32}, so it can
be safely ignored.
- This gives bounds on each element in the "error matrix" E, not on
the kind of sum totals that come from norms on matrices. This means bounds can
be found on individual elements, e.g., the last diagonal element
E(n,n).
- The bound does not say how close x is to the true
solution, but instead says x exactly solves a "nearby" system - with
explicit componentwise bounds on just how "nearby" that system is to the exact
system Ax = b.
- The condition number is not explicitly used in the bound. Of course, the
condition number will implicitly rear its ugly head in the size of the L
and U factors, and keep in mind that the (reciprocal) condition number
measures how close a singular system is to the given one.
- Everything on the right-hand side (A, L, and U) is
known once the factorization has been computed.
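That last point means the bound can be checked directly. Since (A + E)x = b implies |b − Ax| = |Ex| ≤ |E| |x|, a sketch (assuming numpy/scipy; note the pivoting permutation P gets folded into |L| here) can compare the computed residual componentwise against the bound:

    import numpy as np
    from scipy.linalg import lu, solve

    rng = np.random.default_rng(4)
    n = 200
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    x = solve(A, b)                # LU-based solve
    P, L, U = lu(A)                # A = P L U

    eps = np.finfo(np.float64).eps
    # Componentwise bound on |E|, with the O(eps^2) term dropped
    E_bound = n * eps * (3 * np.abs(A) + 5 * np.abs(P @ L) @ np.abs(U))

    lhs = np.abs(b - A @ x)        # |b - Ax| = |Ex| <= |E| |x|
    rhs = E_bound @ np.abs(x)
    print(np.all(lhs <= rhs))      # expected: True, usually with lots of room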
Big Fat Hairy Deal Redux
Again, try to make sense of the inequality instead of treating it as a dead skunk to
be routed to someone you don't like. This bound has a factor of n, so if
the linear system is of order one billion = 10^9, potentially 9 significant
digits are lost. [Linear systems of order 10^{10} and larger are solved
routinely in scientific computing]. The factor ε ≈
10^{-16} compensates for this enormous growth of the error bound,
but the numbers mean a computation with double precision numbers and
arithmetic can potentially give results with only about 7 significant digits, roughly
the same as single precision floating point numbers.
Even if E is small, that does not mean the computed solution
x is close to the exact solution. If the condition number of
A is large, that means the matrix is close to a singular matrix.
That in turn means A+E could be singular, and the computed
x may have zero correct digits. This is the distinction between
the
condition of a problem and the stability of an algorithm and its implementation.
- A badly conditioned problem may give wildly wrong results even when a stable
algorithm is used.
- A well conditioned problem may give wildly wrong results if a numerically
unstable algorithm is used.
- The only case where accurate results are guaranteed is when a numerically
stable algorithm is applied to a well conditioned problem.
The good news is:
- An unstable method applied to a badly conditioned problem may give highly
accurate results.
But that would be an astonishing stroke of luck, vastly less likely than
buying a winning lottery ticket every day until the heat death of the universe
brings it all to a stop.
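To see this distinction in action, here is a final sketch (assuming numpy/scipy; the Hilbert matrix is a standard badly conditioned example): a stable LU-based solver yields a tiny residual, yet most digits of the computed solution are lost.

    import numpy as np
    from scipy.linalg import hilbert, solve

    n = 12
    A = hilbert(n)                 # condition number around 10^{16}
    x_true = np.ones(n)
    b = A @ x_true                 # so x_true solves the system (up to rounding)

    x = solve(A, b)                # numerically stable algorithm
    print(np.linalg.cond(A))                    # huge
    print(np.linalg.norm(b - A @ x))            # residual: tiny
    print(np.linalg.norm(x - x_true)
          / np.linalg.norm(x_true))             # relative error: large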