P573: Linear Algebra Basics
This short review is primarily a listing of some of the terminology we will need. In general the
vector space used is restricted to the space ℜn, the set of
n-vectors with elements drawn from the real numbers ℜ. One obvious exception
is the vector space of all m × n matrices with elements drawn from the real
numbers. By default, x ∈ ℜn is a column vector (an n × 1
matrix). A row vector is indicated by transposing a column vector:
xT. In Matlab a transpose is indicated by a single quote mark, e.g. x'.
If x is a complex vector in Matlab, is x' the transpose or the Hermitian transpose
(AKA the conjugate transpose)?
Answer this for yourself: set up a random vector of length n = 5 with complex
numbers as its components:
x = randn(5,1) + sqrt(-1)*randn(5,1)
and then see what x' is:
x'
[If you want to use Matlab's help facility, doing "help ' " won't work. Instead use
"help punct"]
All of the vectors and matrices used in P573 will be real-valued unless otherwise
explicitly stated. The only exception is for eigenvalues of nonsymmetric matrices.
Sometimes I will denote multiplication by
juxtaposition, e.g. Ba. Other times, particularly when emphasis is needed on
the multiplication operation, I'll use B*a.
Vector Norms
There are several "norms" or measures of the sizes of vectors. The reason for
specifying these as vector norms is that there are also matrix
norms, defined later on this page.
For a vector v with components
v1, v2, ..., vn,
the three most common norms used are
- 2-norm: sqrt( ∑ from i = 1 to n of (vi * vi))
- 1-norm: ∑ from i = 1 to n of | vi |
- ∞-norm: the maximum value for i = 1:n of | vi |
In Matlab, those vector norms are given by functions
- norm(v) = norm(v, 2) = 2-norm of v
- norm(v, 1) = 1-norm of v
- norm(v, inf) = norm(v, 'inf') = ∞-norm of v
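As a quick check (a small sketch you can paste into Matlab; the vector length and variable names are arbitrary), the built-in functions agree with the three formulas above:
    v = randn(6,1);                             % an arbitrary test vector
    two_norm = sqrt(sum(v.*v));                 % 2-norm from the formula
    one_norm = sum(abs(v));                     % 1-norm from the formula
    inf_norm = max(abs(v));                     % infinity-norm from the formula
    [two_norm - norm(v), one_norm - norm(v,1), inf_norm - norm(v,inf)]
All three differences should be zero, or at roundoff level.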
Linear Dependence
Let x1, x2, ..., xm be vectors in ℜn,
and let α1, α2, ..., αm be real numbers.
Then the vector
y = α1 x1 + α2 x2 + ... + αm xm
is called a linear combination of the vectors; the scalars αi
are called the coefficients of the linear combination. If all the αi
are zero the linear combination is the trivial one. A set of vectors
x1, x2, ..., xm in ℜn is linearly
independent if the only linear combination of them that adds up to zero
is the trivial one. Otherwise they are linearly dependent. If a set
of vectors x1, x2, ..., xm is linearly dependent,
there is a nontrivial (i.e., not all equal to 0) set of scalar coefficients
α1, α2, ..., αm such that
α1 x1 + α2 x2 + ... + αm xm = 0
In the above equation, the αi are scalars and the xi
are vectors.
At least one of the coefficients is nonzero. As an example, suppose that
α2 ≠ 0. Dividing everything by α2 and solving for x2,
the linear dependence relation can be written as
x2 = − ( (α1/α2) x1 + (α3/α2) x3 + ... + (αm/α2) xm )
where the term involving x2 itself does not appear on the right-hand side.
This says the vector x2 contributes no additional information as far as linear
combinations are concerned, because it can be represented using some linear
combination of the other vectors. This information-theoretic
point of view is often helpful in making sense of linear
algebra properties and operations.
A more concise way of stating a linear dependence relationship is to use matrix
notation. Define X as the n × m matrix with the ith vector xi
as its ith column. Define the m-vector
a = [α1 α2 ... αm]T.
The equation
y = α1 x1 + α2 x2 + ... + αm xm
can be written as
y = X*a
Linear dependence says some nonzero vector a will give y = X*a = 0. The advantage of
the matrix version goes beyond saving ink and electrons; it is the crucial step in
being able to identify BLAS operations and to try to replace lower level BLAS with
higher level ones.
This gives another good heuristic: a matrix*vector product is just a
succinct way of expressing a linear combination.
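For example (a sketch; the sizes and the random data are arbitrary), writing out the linear combination term by term and computing X*a give the same vector:
    X = randn(4,3);                             % columns play the role of x1, x2, x3 in R^4
    a = randn(3,1);                             % the coefficients alpha_1, alpha_2, alpha_3
    y1 = a(1)*X(:,1) + a(2)*X(:,2) + a(3)*X(:,3);   % explicit linear combination
    y2 = X*a;                                   % the same thing as a matrix*vector product
    norm(y1 - y2)                               % zero, up to roundoff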
Subspaces
S is a subspace of ℜn if x+y and α*x are in S
whenever α ∈ ℜ is a real scalar and x, y ∈ S. A subspace can be
generated by a set of vectors V, by defining S to be the set of all possible
linear combinations of vectors in V. A basis for a subspace S is a set of
vectors in S such that the vectors are linearly independent, and they generate S.
For a given subspace S, any two bases B1 and B2 must have the
same number of vectors; that number is the dimension of the subspace.
Matrices
An m × n matrix is often defined as a rectangular array of numbers with m
rows and n columns. Extreme danger lurks here for computer scientists. The
problem is that a matrix is a mathematical entity, while
an array is a computer data structure that can be used to hold the
matrix. So get it straight now: a matrix can be stored in any data structure
(linked list, tree, two queues, etc).
Except in some digital signal processing applications,
matrices are always indexed from 1 by scientists,
mathematicians, and other creatures that roamed the earth centuries before
computer science was invented. However, a 5 × 3 matrix that is indexed
starting from 1 can be stored as a 2D array with rows indexed from 0 to 4 and columns
indexed from 0 to 2. Or it can have rows indexed from -17 to -13 (at least
in Fortran or Blitz++), or be stored in a 1D array indexed from 0 to 14, or
stored in a graph data structure, .... etc.
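As a small illustration in Matlab (which itself always indexes from 1 and stores 2D arrays by columns; this is just a sketch of the idea), the same 5 × 3 matrix can be viewed either as a 2D array or as a 1D array of length 15:
    A = reshape(1:15, 5, 3);                    % a 5 x 3 matrix, stored column by column
    v = A(:);                                   % the same 15 numbers as a 1D array
    i = 4; j = 2;
    [A(i,j), v((j-1)*5 + i)]                    % 2D indexing and the equivalent 1D index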
Structured Matrices
The term "structure" refers to many things in linear algebra, but the
most common use in numerical computing involves the nonzero structure of the
matrix. The reason we
concentrate on the "nonzero pattern" is that many of the matrices that
occur in solving partial differential equations (PDEs) and ordinary
differential equations (ODEs) have all but a few of their entries equal to
zero. Terms you should know include
- a diagonal matrix: Aij = 0 for i ≠ j
- lower triangular matrix: Aij = 0 for j > i
- upper triangular matrix: Aij = 0 for i > j
- strictly lower triangular matrix: Aij = 0 for j >= i
- strictly upper triangular matrix: Aij = 0 for i >= j
- upper Hessenberg matrix: Aij = 0 for i > j+1
- tridiagonal matrix: Aij = 0 for |i - j| > 1
Other terms that are commonly used include
- the superdiagonal part of a matrix: the entries strictly above the
main diagonal
- the superdiagonal of a matrix: the first diagonal above the main diagonal
of a matrix
- the subdiagonal part of a matrix: the entries strictly below the main
diagonal
- the subdiagonal of a matrix: the first diagonal below the main diagonal
of a matrix
A symmetric matrix A is one where Aij = Aji or equivalently,
A = AT.
Although it sounds petty, being able to index from any given starting
integer is extremely helpful in coding some numerical linear algebra
operations. For example, a "banded matrix" is typically stored by
diagonals (instead of by rows or columns). By convention, the main
diagonal is numbered as the 0-th diagonal, the first superdiagonal is
numbered as diagonal 1, etc. The first subdiagonal is numbered
-1, the diagonal below that is numbered -2, etc.
It is much easier to read and debug a code using banded matrices
if the arrays storing the matrix values can be indexed from -n to n.
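Matlab uses the same numbering convention in its diag, tril, and triu functions (a short sketch; the matrix here is just an arbitrary example):
    A = magic(5);
    d0 = diag(A, 0);                            % the main (0-th) diagonal
    d1 = diag(A, 1);                            % the first superdiagonal (diagonal 1)
    dm1 = diag(A, -1);                          % the first subdiagonal (diagonal -1)
    L = tril(A);                                % the lower triangular part (diagonals <= 0)
    U = triu(A, 1);                             % the strictly upper triangular part (diagonals >= 1)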
Submatrices
Given a matrix A and a set of row/column indices {i1, i2,
..., ik}, the corresponding principal submatrix of A is
the k × k matrix consisting of elements with row and column numbers in that
index set. A leading principal submatrix has i1 = 1, i2
= 2, ..., ik = k, for some k. A trailing principal submatrix
is similarly defined, for the last k indices. Submatrices are also
obtained by partitioning a matrix. Often it is useful to consider an m
× n
matrix A as a row vector of n column vectors, each from ℜm:
A = [c1 c2 ... cn]
and other times it is convenient to consider A as a column vector of m row
vectors, each from ℜn:
A = [ r1T
      r2T
      ⋮
      rmT ]
In general the vectors ri from a row partitioning
are different from the ci given by a column partitioning.
When will they be the same?
More generally
a matrix can be partitioned into blocks, where each block is itself a matrix.
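In Matlab, colon notation gives all of these partitionings directly (a minimal sketch with an arbitrary 4 × 3 matrix):
    A = reshape(1:12, 4, 3);
    c2 = A(:, 2);                               % the 2nd column of A, a vector in R^4
    r3 = A(3, :);                               % the 3rd row of A, i.e. r3 is r_3 transposed
    B = A(1:2, 2:3);                            % a 2 x 2 block from a partitioning of A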
Matrix Operations
Adding matrices, multiplying them by a scalar, and similar operations are
easily defined in linear algebra. Make sure that you are familiar with
multiplying matrices: in the operation C = A*B, the number of columns of A
must equal the number of rows of B. If A is m × n and B is n × k, then C
is m × k and its (r,c) element is the sum from i = 1 to i = n of Ari*Bic.
That is, the rth row of A is multiplied elementwise against the corresponding
elements in the cth column of B, and the products are summed. This holds even when Ari, Bic,
etc. are subblocks of A and B rather than just real numbers. One special
case is y = Ax, where A is m × n and y is m × 1, x is n × 1.
By partitioning A by columns, the equation y = A*x expresses y as a linear
combination of the columns of A; the ith column has coefficient
xi, the ith component of the vector x. Transpose:
if A is an m × n matrix, then AT is the n × m matrix with element
(i,j) the same as the (j,i) element of A. Recall that
(AB)T = BT AT
More generally, taking one or both of m and n to 1 gives four special cases of
matrix multiplication
that account for a surprisingly large proportion of scientific computations
(a short Matlab sketch of all four follows this list).
- In C = A*B, if A is 1 × n and B is n × 1, then C is a scalar and
the operation is called a dot product, inner product, or the scalar
product of the vectors A and B.
- In C = C + A*B, if A is n × 1 and B is 1 × 1, then C is an
n-vector and the operation is called a saxpy or daxpy operation. It
is most commonly formulated as y = y + alpha*x, where x and y are n-vectors and
alpha is a scalar.
- In C = C + A*B, if A is m × 1 and B is 1 × n, the operation is
called a rank-one update of C. The reason is because the rank of the
matrix being added to C is equal to one. The product A*B in this case is called
the outer product of A and B. Any matrix with rank 1 can be written as the
outer product of two vectors. Rank-one updates will be common in solving linear
systems numerically.
- If A is m × n and B is n × 1, the operation is called a
matrix-vector product.
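Here are the four special cases in Matlab notation (a sketch; the sizes, alpha, and the random data are arbitrary):
    m = 5; n = 3;
    x = randn(n,1); z = randn(n,1); y = randn(m,1);
    A = randn(m,n); alpha = 2.5;

    s = x'*z;                                   % dot (inner, scalar) product: (1 x n)*(n x 1)
    z = z + alpha*x;                            % saxpy/daxpy: z <- z + alpha*x
    A = A + y*x';                               % rank-one update: (m x 1)*(1 x n) added to A
    y = A*x;                                    % matrix-vector product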
More Subspaces
The range of an m × n matrix A is the set of all possible linear combinations
of the columns of A. Another way to express this is
{y: y = Ax for some vector x}, where braces indicate a set.
The range of A, range(A), is a subspace (reality
check: it is a subspace of which space, ℜm or ℜn?) The
null space of A is the set of all vectors that A maps to the zero
vector: {x: Ax = 0}. Stated otherwise, a vector is in the null space of A
if its components give a linear combination of the columns of A that sums
up to zero. Note that range(A) and null(A) live in different worlds; one
is in ℜm, the other in ℜn. Also note that if A is square
and of order n, then the dimensions of range(A) and null(A) sum to n.
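Matlab can hand you orthonormal bases for both subspaces (a sketch; the product below is just one way to manufacture a square matrix that is deliberately rank-deficient, so the null space is nontrivial):
    A = randn(4,2) * randn(2,4);                % a square matrix of order 4, rank (almost surely) 2
    R = orth(A);                                % orthonormal basis for range(A)
    N = null(A);                                % orthonormal basis for null(A)
    size(R,2) + size(N,2)                       % the dimensions sum to n = 4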
Inverses
The prototypical problem in numerical computing is to "solve" the equation
Ax = b for x, given coefficient matrix A and right-hand side vector
b. When A is square n × n and has full column rank, a unique
inverse matrix A-1 exists such that A-1A = I = AA-1,
where I is the identity matrix of order n. The solution can be written as
x = A-1b. However, in practice you never want to compute or form
the inverse matrix A-1 in solving a linear system. Doing so requires
more computation and has lower numerical accuracy than Gaussian elimination
does. A square matrix is nonsingular if any of the following hold:
- An inverse matrix A-1 exists.
- A has full column rank (columns are a linearly independent set of vectors)
- A has full row rank (rows are a linearly independent set of vectors)
- The only solution to Ax = 0 is x = 0.
- null(A) = {0}, the zero vector of order n.
- dim(range(A)) = n
If matrices A and B are both invertible and the same size, then
(AB)-1 = B-1 A-1
When A is not square, we typically want a "least squares" solution
to Ax = b. That is, find the vector x that minimizes the 2-norm of the vector Ax
- b. This is how you experimentally find the
(n1/2, r∞) parameters for the Hockney-Jessup
model of a pipelined operation.
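As a sketch of how that fit can be set up (assuming the usual form of the model, time(n) = (n + n1/2)/r∞, and using made-up timing data; the variable names here are arbitrary), Matlab's backslash operator computes the least squares solution:
    len = (1000:1000:10000)';                   % vector lengths that were "timed"
    t   = (len + 150) / 2.0e8;                  % made-up timings, just for illustration
    A   = [len, ones(size(len))];               % model: t = (1/r_inf)*len + (n_half/r_inf)
    c   = A \ t;                                % least squares solution minimizing ||A*c - t||
    r_inf  = 1/c(1);                            % recover the model parameters
    n_half = c(2)*r_inf;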
The norm of a vector gives its length, while the norm of a matrix gives the
maximum (relative) amount by which it can stretch a vector:
||A||2 = max{||Ax||2/||x||2, over all x ≠ 0}
This may seem to be a circular definition, but it is not. It defines a
matrix norm by using vector norms, which are already
well-defined.
A can be rectangular m × n, so the norm ||Ax||2 in the numerator
is for a vector of length m, while the norm ||x||2
in the denominator is for a vector
of length n. Mutatis mutandis, you can similarly define the 1-norm
and ∞-norm of a matrix, but they have an equivalent and easier computation:
- ||A||1 = max{||aj||1, j = 1, 2, ...
n}, where aj is the j-th column of A
- ||A||∞ = max{||ai||1, i = 1, 2, ... m}, where ai
is the i-th row of A
This is not a complete definition of matrix norms; with one exception we are
going to use the three above, which are examples of
operator norms. One norm that is not defined
as the maximum of a ratio of vector norms is the Frobenius norm:
- ||A||F = sqrt[ sum( A(i,j)2, over i = 1, 2, ...
m and j = 1, 2, ... n ) ]
Like any norm, matrix norms must satisfy the properties
- Positivity: ||A|| >= 0, with equality only when A = 0
- Linearity: ||sA|| = |s|*||A||, for any scalar s
- Triangle inequality: ||A + B|| ≤ ||A|| + ||B|| for any conformant A and B
A property that does not hold for all matrix norms but does for the four
listed above is
||AB|| ≤ ||A||*||B||.
[Note the difference between
this and the triangle inequality, which deals with the sum (not the product)
of matrices.] If the matrix norm does satisfy this additional property, it
is called a consistent norm.
Matrix norms are most commonly used in numerical analysis, to measure how close a
computed matrix is to the ideal (in the Platonic sense) matrix. For example, a
common approach to solving a linear system involves factoring it into the product of
a lower and an upper triangular matrix: A = L*U. After performing LU factorization
the L and U factors are likely to have some errors from finite precision arithmetic.
In that case, ||A-L*U|| provides a measure of how far off the computed factorization
is from the exact one. Here, A, L, and U refer to matrices. (Slight digression:
in LU factorization the array A is usually overwritten with the combined L and
U factors.) A less common use for matrix norms is to define information compression
approximations to large operators. That is used in the context of text retrieval or
web search engines.
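In Matlab the residual check looks like this (a sketch; note that Matlab's lu pivots rows, so the computed factors satisfy P*A ≈ L*U for a permutation matrix P, and that is the residual to measure):
    A = randn(6);
    [L, U, P] = lu(A);                          % LU factorization with row pivoting
    norm(P*A - L*U)                             % size of the factorization error (roundoff level)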
For large matrices A, avoid using the 2-norm and instead use almost any other norm.
The 2-norm of a matrix A is its largest singular value (for symmetric A, this equals
the largest absolute value of its eigenvalues), and it is expensive to
compute. Use the matrix 1 or matrix ∞ norm, or the "Frobenius" norm, which is the 2-norm of the vector of
length m*n obtained by stacking all the elements of A into one vector.
Throughout the remainder of this course, all norms (matrix or vector)
will be the 2-norm, unless explicitly stated otherwise. So
the subscript 2 will be dropped hereafter.
However, when numerically computing a matrix norm, only the 1, ∞, or
Frobenius norms will be used, for efficiency.
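For example (a sketch with an arbitrary random matrix), all four matrix norms are one function call, but only the last one requires computing singular values:
    A = randn(200, 100);
    [norm(A,1), norm(A,inf), norm(A,'fro'), norm(A)]   % 1, infinity, Frobenius, and 2-norm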
Orthogonality and Orthogonal Matrices
- Two vectors x and y are orthogonal if their inner product is
0.
- The Cauchy relation is xTy = ||x||*||y||*cos(θ),
where θ is the angle between x and y. So two vectors are orthogonal
if at least one of them is the zero vector, or if the angle between them
is 90 degrees. This also implies that xTx = ||x||2,
since the angle a vector makes with itself is 0 degrees.
- A set of vectors is orthogonal if the vectors are pairwise
orthogonal.
- A set of vectors is orthonormal if it is orthogonal, and
each vector in it has norm 1 ( ||xi|| = 1, for all i ).
- An m × n matrix has orthonormal columns if the set of
vectors formed by its columns is an orthonormal set.
- An m × n matrix Q with orthonormal columns satisfies
QTQ = I, but in general QQT ≠ I. A matrix with
orthonormal columns necessarily has m ≥ n. (Why? Answer that before
continuing, or the rest will not be of any use to you).
- A matrix Q is orthogonal if it is square and has orthonormal columns.
This also implies it has orthonormal rows. This in turn means
that the inverse of an orthogonal matrix is its transpose. Computationally,
that makes it darned easier to solve a linear system with an orthogonal matrix
as a coefficient matrix.
- If Q is orthogonal, then ||QA|| = ||A|| for any conformant matrix A,
including the case where A is n × 1 (a vector). This equality holds for both
the 2-norm and the Frobenius-norm, but not the one-norm or ∞-norm.
- If Q is orthogonal, ||Q|| = 1.
An orthonormal basis for a subspace is especially convenient to work with.
Let {qi, i = 1, 2, ..., k} be an orthonormal basis for a
subspace S. Then if x is a vector in S, its representation as a linear
combination of the qis is easily computed as
x = (q1Tx)q1
+ (q2Tx)q2 + ... + (qkTx)qk.
If you create the n × k matrix Q with columns consisting of
those vectors qi, then the coefficients of the linear combination
are the entries in the vector QTx. Finding the representation
in a non-orthonormal basis is much more difficult computationally.
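A small sketch of this, using Matlab's economy-size qr factorization just to manufacture a matrix Q with orthonormal columns (n, k, and the subspace are arbitrary):
    n = 6; k = 3;
    [Q, R] = qr(randn(n,k), 0);                 % Q is n x k with orthonormal columns
    x = Q*randn(k,1);                           % some vector in the subspace spanned by Q's columns
    c = Q'*x;                                   % the coefficients of x in the orthonormal basis
    norm(x - Q*c)                               % zero up to roundoff: x is recovered from c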
Because orthogonal matrices don't change the norm, multiplying vectors and
matrices by them is highly stable numerically.
In floating point arithmetic you can multiply a vector by millions of orthogonal
matrices and have only
mild loss of precision at the end.
Multiplying by an orthogonal matrix is
sometimes called "applying an orthogonal transformation".
Eigenvalues
Eigenvalues are only defined for square matrices. If there is a scalar λ
and a nonzero vector x such that Ax = λ x, then λ is an
eigenvalue of A and x is the corresponding eigenvector. Even if A has
all real-valued entries, it may have complex-valued eigenvalues. Geometrically the
relation Ax = λ x says that in the direction given by the vector x, the action of A
is to stretch or shrink the vector by a factor of |λ|.
A (square, of course) matrix of order n has n
eigenvalues, but those may not be distinct values - some eigenvalues may be repeated
with different eigenvectors. Numerically you will almost never encounter
eigenvalues with this kind of multiplicity; floating point arithmetic invariably
perturbs the array values representing the matrices and vectors, and so the
eigenvalues will seem distinct.
Some properties of eigenvalues:
- A symmetric matrix has all real eigenvalues (no complex-valued ones).
This is easy to prove - try it.
- Define X = [x1 x2 ... xn],
the matrix whose columns are the eigenvectors of A. Let S be the diagonal
matrix with the corresponding eigenvalues along its main diagonal. Then
AX = XS from the definition of eigenvectors. If X is nonsingular,
then A = XSX-1
- If A is symmetric, it has a full set of n eigenvectors which can be chosen
to be orthonormal. In that case, A = QSQT, where Q is orthogonal.
This is called the Schur decomposition of A. Because
orthogonal transformations are so friendly to numerical computing,
a symmetric matrix's eigenvectors are almost always chosen to be
orthonormal this way.
- When A is symmetric, you can represent any n-vector x as a linear combination
of the (orthonormal) eigenvectors of A. This is called the eigendecomposition
of x (with respect to A).
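A sketch of these facts using Matlab's eig (the matrix below is an arbitrary random symmetric one):
    B = randn(5);  A = (B + B')/2;              % an arbitrary symmetric matrix
    [Q, S] = eig(A);                            % columns of Q are eigenvectors, diag(S) the eigenvalues
    norm(A*Q - Q*S)                             % A*Q = Q*S, up to roundoff
    norm(Q'*Q - eye(5))                         % for symmetric A the eigenvectors are orthonormal
    norm(A - Q*S*Q')                            % A = Q*S*Q'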
When A is not symmetric, the eigenvalues may be complex and the eigenvectors
may not form a basis for ℜn. This suggests that maybe eigenvalues are
not the right thing to look at for nonsymmetric matrices, and there is a
better approach, the singular value decomposition, for factoring a
nonsymmetric or even a rectangular matrix.
Next: LU Factorization
- Last Modified: Mon 04 Nov 2019, 06:59 AM