P573: Linear Algebra Basics
This short review is primarily a listing of some of the terminology we will need. In general the
vector space used is restricted to the space ℜn, the set of
n-vectors with elements drawn from the real numbers ℜ. One obvious exception
is the vector space of all m × n matrices with elements drawn from the real
numbers. By default, x ∈ ℜn is a column vector (an n × 1
matrix). A row vector is indicated by transposing a column vector:
xT. In Matlab a transpose is indicated by a single quote mark, e.g. x'.
If x is a complex vector in Matlab, is x' the transpose or the Hermitian transpose
(AKA the conjugate transpose)?
Answer this for yourself: set up a random vector of length n = 5 with complex
numbers as its components:
x = randn(5,1) + sqrt(-1)*randn(5,1)
and then see what x' is:
x'
[If you want to use Matlab's help facility, doing "help ' " won't work. Instead use
"help punct"]
All of the vectors and matrices used in P573 will be real-valued unless otherwise
explicitly stated. The only exception is for eigenvalues of nonsymmetric matrices.
Sometimes I will denote multiplication by
juxtaposition, e.g. Ba. Other times, particularly when emphasis is needed on
the multiplication operation, I'll use B*a.
Vector Norms
There are several "norms" or measures of the sizes of vectors. The reason for
specifying these as vector norms is that there are also matrix
norms, defined later on this page.
For a vector v with components
v1, v2, ..., vn,
the three most common norms used are
- 2-norm: sqrt( ∑ from i = 1 to n of (vi * vi))
- 1-norm: ∑ from i = 1 to n of | vi |
- ∞-norm: the maximum value for i = 1:n of | vi |
In Matlab, those vector norms are given by functions
- norm(v) = norm(v, 2) = 2-norm of v
- norm(v, 1) = 1-norm of v
- norm(v, inf) = norm(v, 'inf') = ∞-norm of v
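As a quick check (a small sketch you can paste into Matlab; the vector length and variable names are arbitrary), the built-in functions agree with the three formulas above:
    v = randn(6,1);                             % an arbitrary test vector
    two_norm = sqrt(sum(v.*v));                 % 2-norm from the formula
    one_norm = sum(abs(v));                     % 1-norm from the formula
    inf_norm = max(abs(v));                     % infinity-norm from the formula
    [two_norm - norm(v), one_norm - norm(v,1), inf_norm - norm(v,inf)]
All three differences should be zero, or at roundoff level.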
Linear Dependence
Let x1, x2, ..., xm be vectors in ℜn,
and let α1, α2, ..., αm be real numbers.
Then the vector
y = α1 x1 + α2 x2 + ... + αm xm
is called a linear combination of the vectors; the scalars αi
are called the coefficients of the linear combination. If all the αi
are zero the linear combination is the trivial one. A set of vectors
x1, x2, ..., xm in ℜn is linearly
independent if the only linear combination of them that adds up to zero
is the trivial one. Otherwise they are linearly dependent. If a set
of vectors x1, x2, ..., xm is linearly dependent,
there is a nontrivial (i.e., not all equal to 0) set of scalar coefficients
α1, α2, ..., αm such that
α1 x1 + α2 x2 + ... + αm xm = 0
In the above equation, the αi are scalars and the xi
are vectors.
At least one of the coefficients is nonzero. As an example, suppose that
α2 ≠ 0. Dividing everything by α2 and solving for x2,
the linear dependence relation can be written as
x2 = − ( (α1/α2) x1 + (α3/α2) x3 + ... + (αm/α2) xm )
where the term involving x2 itself does not appear on the right-hand side.
This says the vector x2 contributes no additional information as far as linear
combinations are concerned, because it can be represented using some linear
combination of the other vectors. This information-theoretic
point of view is often helpful in making sense of linear
algebra properties and operations.
A more concise way of stating a linear dependence relationship is to use matrix
notation. Define X as the n × m matrix with the ith vector xi
as its ith column. Define the m-vector
a = [α1 α2 ... αm]T.
The equation
y = α1 x1 + α2 x2 + ... + αm xm
can be written as
y = X*a
Linear dependence says some nonzero vector a will give y = X*a = 0. The advantage of
the matrix version goes beyond saving ink and electrons; it is the crucial step in
being able to identify BLAS operations and to try to replace lower level BLAS with
higher level ones.
This gives another good heuristic: a matrix*vector product is just a
succinct way of expressing a linear combination.
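For example (a sketch; the sizes and the random data are arbitrary), writing out the linear combination term by term and computing X*a give the same vector:
    X = randn(4,3);                             % columns play the role of x1, x2, x3 in R^4
    a = randn(3,1);                             % the coefficients alpha_1, alpha_2, alpha_3
    y1 = a(1)*X(:,1) + a(2)*X(:,2) + a(3)*X(:,3);   % explicit linear combination
    y2 = X*a;                                   % the same thing as a matrix*vector product
    norm(y1 - y2)                               % zero, up to roundoff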
Subspaces
S is a subspace of ℜn if x+y and α*x are in S
whenever α ∈ ℜ is a real scalar and x, y ∈ S. A subspace can be
generated by a set of vectors V, by defining S to be the set of all possible
linear combinations of vectors in V. A basis for a subspace S is a set of
vectors in S such that the vectors are linearly independent, and they generate S.
For a given subspace S, any two bases B1 and B2 must have the
same number of vectors; that number is the dimension of the subspace.
Matrices
An m × n matrix is often defined as a rectangular array of numbers with m
rows and n columns. Extreme danger lurks here for computer scientists. The
problem is that a matrix is a mathematical entity, while
an array is a computer data structure that can be used to hold the
matrix. So get it straight now: a matrix can be stored in any data structure
(linked list, tree, two queues, etc).
Except in some digital signal processing applications,
matrices are always indexed from 1 by scientists,
mathematicians, and other creatures that roamed the earth centuries before
computer science was invented. However, a 5 × 3 matrix that is indexed
starting from 1 can be stored as a 2D array with rows indexed from 0 to 4 and columns
indexed from 0 to 2. Or it can have rows indexed from -17 to -13 (at least
in Fortran or Blitz++), or be stored in a 1D array indexed from 0 to 14, or
stored in a graph data structure, .... etc.
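As a small illustration in Matlab (which itself always indexes from 1 and stores 2D arrays by columns; this is just a sketch of the idea), the same 5 × 3 matrix can be viewed either as a 2D array or as a 1D array of length 15:
    A = reshape(1:15, 5, 3);                    % a 5 x 3 matrix, stored column by column
    v = A(:);                                   % the same 15 numbers as a 1D array
    i = 4; j = 2;
    [A(i,j), v((j-1)*5 + i)]                    % 2D indexing and the equivalent 1D index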
Structured Matrices
The term "structure" refers to many things in linear algebra, but the
most common use in numerical computing involves the nonzero structure of the
matrix. The reason we
concentrate on the "nonzero pattern" is that many of the matrices that
occur in solving partial differential equations (PDEs) and ordinary
differential equations (ODEs) have all but a few of their entries equal to
zero. Terms you should know include
- a diagonal matrix: Aij = 0 for i ≠ j
- lower triangular matrix: Aij = 0 for j > i
- upper triangular matrix: Aij = 0 for i > j
- strictly lower triangular matrix: Aij = 0 for j >= i
- strictly upper triangular matrix: Aij = 0 for i >= j
- upper Hessenberg matrix: Aij = 0 for i > j+1
- tridiagonal matrix: Aij = 0 for |i - j| > 1
Other terms that are commonly used include
- the superdiagonal part of a matrix: the entries strictly above the
main diagonal
- the superdiagonal of a matrix: the first diagonal above the main diagonal
of a matrix
- the subdiagonal part of a matrix: the entries strictly below the main
diagonal
- the subdiagonal of a matrix: the first diagonal below the main diagonal
of a matrix
A symmetric matrix A is one where Aij = Aji or equivalently,
A = AT.
Although it sounds petty, being able to index from any given starting
integer is extremely helpful in coding some numerical linear algebra
operations. For example, a "banded matrix" is typically stored by
diagonals (instead of by rows or columns). By convention, the main
diagonal is numbered as the 0-th diagonal, the first superdiagonal is
numbered as diagonal 1, etc. The first subdiagonal is numbered
-1, the diagonal below that is numbered -2, etc.
It is much easier to read and debug a code using banded matrices
if the arrays storing the matrix values can be indexed from -n to n.
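Matlab uses the same numbering convention in its diag, tril, and triu functions (a short sketch; the matrix here is just an arbitrary example):
    A = magic(5);
    d0 = diag(A, 0);                            % the main (0-th) diagonal
    d1 = diag(A, 1);                            % the first superdiagonal (diagonal 1)
    dm1 = diag(A, -1);                          % the first subdiagonal (diagonal -1)
    L = tril(A);                                % the lower triangular part (diagonals <= 0)
    U = triu(A, 1);                             % the strictly upper triangular part (diagonals >= 1)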
Submatrices
Given a matrix A and a set of row/column indices {i1, i2,
..., ik}, the corresponding principal submatrix of A is
the k × k matrix consisting of elements with row and column numbers in that
index set. A leading principal submatrix has i1 = 1, i2
= 2, ..., ik = k, for some k. A trailing principal submatrix
is similarly defined, for the last k indices. Submatrices are also
obtained by partitioning a matrix. Often it is useful to consider an m
× n
matrix A as a row vector of n column vectors, each from ℜm:
A = [c1 c2 ... cn]
and other times it is convenient to consider A as a column vector of m row
vectors, each from ℜn:
A = [ r1T
      r2T
      ⋮
      rmT ]
In general the vectors ri from a row partitioning
are different from the ci given by a column partitioning.
When will they be the same?
More generally
a matrix can be partitioned into blocks, where each block is itself a matrix.
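In Matlab, colon notation gives all of these partitionings directly (a minimal sketch with an arbitrary 4 × 3 matrix):
    A = reshape(1:12, 4, 3);
    c2 = A(:, 2);                               % the 2nd column of A, a vector in R^4
    r3 = A(3, :);                               % the 3rd row of A, i.e. r3 is r_3 transposed
    B = A(1:2, 2:3);                            % a 2 x 2 block from a partitioning of A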
Matrix Operations
Adding matrices, multiplying them by a scalar, and similar operations are
easily defined in linear algebra. Make sure that you are familiar with
multiplying matrices: in the operation C = A*B, the number of columns of A
must equal the number of rows of B. If A is m × n and B is n × k, then C
is m × k and its (r,c) element is the sum from i = 1 to i = n of Ari*Bic.
That is, the rth row of A is multiplied elementwise against the corresponding
elements in the cth column of B, and the products are summed. This holds even when Ari, Bic,
etc. are subblocks of A and B rather than just real numbers. One special
case is y = Ax, where A is m × n and y is m × 1, x is n × 1.
By partitioning A by columns, the equation y = A*x expresses y as a linear
combination of the columns of A; the ith column has coefficient
xi, the ith component of the vector x. Transpose:
if A is an m × n matrix, then AT is the n × m matrix with element
(i,j) the same as the (j,i) element of A. Recall that
(AB)T = BT AT
More generally, taking one or both of m and n to 1 gives four special cases of
matrix multiplication
that account for a surprisingly large proportion of scientific computations
(a short Matlab sketch of all four follows this list).
- In C = A*B, if A is 1 × n and B is n × 1, then C is a scalar and
the operation is called a dot product, inner product, or the scalar
product of the vectors A and B.
- In C = C + A*B, if A is n × 1 and B is 1 × 1, then C is an
n-vector and the operation is called a saxpy or daxpy operation. It
is most commonly formulated as y = y + alpha*x, where x and y are n-vectors and
alpha is a scalar.
- In C = C + A*B, if A is m × 1 and B is 1 × n, the operation is
called a rank-one update of C. The reason is because the rank of the
matrix being added to C is equal to one. The product A*B in this case is called
the outer product of A and B. Any matrix with rank 1 can be written as the
outer product of two vectors. Rank-one updates will be common in solving linear
systems numerically.
- If A is m × n and B is n × 1, the operation is called a
matrix-vector product.
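Here are the four special cases in Matlab notation (a sketch; the sizes, alpha, and the random data are arbitrary):
    m = 5; n = 3;
    x = randn(n,1); z = randn(n,1); y = randn(m,1);
    A = randn(m,n); alpha = 2.5;

    s = x'*z;                                   % dot (inner, scalar) product: (1 x n)*(n x 1)
    z = z + alpha*x;                            % saxpy/daxpy: z <- z + alpha*x
    A = A + y*x';                               % rank-one update: (m x 1)*(1 x n) added to A
    y = A*x;                                    % matrix-vector product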
More Subspaces
The range of an m × n matrix A is the set of all possible linear combinations
of the columns of A. Another way to express this is
{y: y = Ax for some vector x}, where braces indicate a set.
The range of A, range(A), is a subspace (reality
check: it is a subspace of which space, ℜm or ℜn?) The
null space of A is the set of all vectors that A maps to the zero
vector: {x: Ax = 0}. Stated otherwise, a vector is in the null space of A
if its components give a linear combination of the columns of A that sums
up to zero. Note that range(A) and null(A) live in different worlds; one
is in ℜm, the other in ℜn. Also note that if A is square
and of order n, then the dimensions of range(A) and null(A) sum to n.
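Matlab can hand you orthonormal bases for both subspaces (a sketch; the product below is just one way to manufacture a square matrix that is deliberately rank-deficient, so the null space is nontrivial):
    A = randn(4,2) * randn(2,4);                % a square matrix of order 4, rank (almost surely) 2
    R = orth(A);                                % orthonormal basis for range(A)
    N = null(A);                                % orthonormal basis for null(A)
    size(R,2) + size(N,2)                       % the dimensions sum to n = 4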
Inverses
The prototypical problem in numerical computing is to "solve" the equation
Ax = b for x, given coefficient matrix A and right-hand side vector
b. When A is square n × n and has full column rank, a unique
inverse matrix A-1 exists such that A-1A = I = AA-1,
where I is the identity matrix of order n. The solution can be written as
x = A-1b. However, in practice you never want to compute or form
the inverse matrix A-1 in solving a linear system. Doing so requires
more computation and has lower numerical accuracy than Gaussian elimination
does. A square matrix is nonsingular if any of the following hold:
- An inverse matrix A-1 exists.
- A has full column rank (columns are a linearly independent set of vectors)
- A has full row rank (rows are a linearly independent set of vectors)
- The only solution to Ax = 0 is x = 0.
- null(A) = {0}, the zero vector of order n.
- dim(range(A)) = n
If matrices A and B are both invertible and the same size, then
(AB)-1 = B-1 A-1
When A is not square, we typically want a "least squares" solution
to Ax = b. That is, find the vector x that minimizes the 2-norm of the vector Ax
- b. This is how you experimentally find the
(n1/2, r∞) parameters for the Hockney-Jessup
model of a pipelined operation.
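As a sketch of how that fit can be set up (assuming the usual form of the model, time(n) = (n + n1/2)/r∞, and using made-up timing data; the variable names here are arbitrary), Matlab's backslash operator computes the least squares solution:
    len = (1000:1000:10000)';                   % vector lengths that were "timed"
    t   = (len + 150) / 2.0e8;                  % made-up timings, just for illustration
    A   = [len, ones(size(len))];               % model: t = (1/r_inf)*len + (n_half/r_inf)
    c   = A \ t;                                % least squares solution minimizing ||A*c - t||
    r_inf  = 1/c(1);                            % recover the model parameters
    n_half = c(2)*r_inf;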
The norm of a vector gives its length, while the norm of a matrix gives the
maximum (relative) amount by which it can stretch a vector:
||A||2 = max{||Ax||2/||x||2, over all x ≠ 0}
This may seem to be a circular definition, but it is not. It defines a
matrix norm by using vector norms, which are already
well-defined.
A can be rectangular m × n, so the norm ||Ax||2 in the numerator
is for a vector of length m, while the norm ||x||2
in the denominator is for a vector
of length n. Mutatis mutandis, you can similarly define the 1-norm
and ∞-norm of a matrix, but they have an equivalent and easier computation:
- ||A||1 = max{||aj||1, j = 1, 2, ...
n}, where aj is the j-th column of A
- ||A||∞ = max{||ai||1, i = 1, 2, ... m}, where ai
is the i-th row of A
This is not a complete definition of matrix norms; with one exception we are
going to use the three above, which are examples of
operator norms. One norm that is not defined
as the maximum of a ratio of vector norms is the Frobenius norm:
- ||A||F = sqrt[ sum( A(i,j)2, over i = 1, 2, ...
m and j = 1, 2, ... n ) ]
Like any norm, matrix norms must satisfy the properties
- Positivity: ||A|| >= 0, with equality only when A = 0
- Linearity: ||sA|| = |s|*||A||, for any scalar s
- Triangle inequality: ||A + B|| ≤ ||A|| + ||B|| for any conformant A and B
A property that does not hold for all matrix norms but does for the four
listed above is
||AB|| ≤ ||A||*||B||.
[Note the difference between
this and the triangle inequality, which deals with the sum (not the product)
of matrices.] If the matrix norm does satisfy this additional property, it
is called a consistent norm.
Matrix norms are most commonly used in numerical analysis, to measure how close a
computed matrix is to the ideal (in the Platonic sense) matrix. For example, a
common approach to solving a linear system involves factoring it into the product of
a lower and an upper triangular matrix: A = L*U. After performing LU factorization
the L and U factors are likely to have some errors from finite precision arithmetic.
In that case, ||A-L*U|| provides a measure of how far off the computed factorization
is from the exact one. Here, A, L, and U refer to matrices. (Slight digression:
in LU factorization the array A is usually overwritten with the combined L and
U factors.) A less common use for matrix norms is to define information compression
approximations to large operators. That is used in the context of text retrieval or
web search engines.
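In Matlab the residual check looks like this (a sketch; note that Matlab's lu pivots rows, so the computed factors satisfy P*A ≈ L*U for a permutation matrix P, and that is the residual to measure):
    A = randn(6);
    [L, U, P] = lu(A);                          % LU factorization with row pivoting
    norm(P*A - L*U)                             % size of the factorization error (roundoff level)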
For large matrices A, avoid using the 2-norm and instead use almost any other norm.
The 2-norm of a matrix A is its largest singular value (for symmetric A, this equals
the largest absolute value of its eigenvalues), and it is expensive to
compute. Use the matrix 1 or matrix ∞ norm, or the "Frobenius" norm, which is the 2-norm of the vector of
length m*n obtained by stacking all the elements of A into one vector.
Throughout the remainder of this course, all norms (matrix or vector)
will be the 2-norm, unless explicitly stated otherwise. So
the subscript 2 will be dropped hereafter.
However, when numerically computing a matrix norm, only the 1, ∞, or
Frobenius norms will be used, for efficiency.
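For example (a sketch with an arbitrary random matrix), all four matrix norms are one function call, but only the last one requires computing singular values:
    A = randn(200, 100);
    [norm(A,1), norm(A,inf), norm(A,'fro'), norm(A)]   % 1, infinity, Frobenius, and 2-norm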
Orthogonality and Orthogonal Matrices
- Two vectors x and y are orthogonal if their inner product is
0.
- The Cauchy relation is xTy = ||x||*||y||*cos(θ),
where θ is the angle between x and y. So two vectors are orthogonal
if at least one of them is the zero vector, or if the angle between them
is 90 degrees. This also implies that xTx = ||x||2,
since the angle a vector makes with itself is 0 degrees.
- A set of vectors is orthogonal if the vectors are pairwise
orthogonal.
- A set of vectors is orthonormal if it is orthogonal, and
each vector in it has norm 1 ( ||xi|| = 1, for all i ).
- An m × n matrix has orthonormal columns if the set of
vectors formed by its columns is an orthonormal set.
- An m × n matrix Q with orthonormal columns satisfies
QTQ = I, but in general QQT ≠ I. A matrix with
orthonormal columns necessarily has m ≥ n. (Why? Answer that before
continuing, or the rest will not be of any use to you).
- A matrix Q is orthogonal if it is square and has orthonormal columns.
This also implies it has orthonormal rows. This in turn means
that the inverse of an orthogonal matrix is its transpose. Computationally,
that makes it darned easier to solve a linear system with an orthogonal matrix
as a coefficient matrix.
- If Q is orthogonal, then ||QA|| = ||A|| for any conformant matrix A,
including the case where A is n × 1 (a vector). This equality holds for both
the 2-norm and the Frobenius-norm, but not the one-norm or ∞-norm.
- If Q is orthogonal, ||Q|| = 1.
An orthonormal basis for a subspace is especially convenient to work with.
Let {qi, i = 1, 2, ..., k} be an orthonormal basis for a
subspace S. Then if x is a vector in S, its representation as a linear
combination of the qis is easily computed as
x = (q1Tx)q1
+ (q2Tx)q2 + ... + (qkTx)qk.
If you create the n × k matrix Q with columns consisting of
those vectors qi, then the coefficients of the linear combination
are the entries in the vector QTx. Finding the representation
in a non-orthonormal basis is much more difficult computationally.
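A small sketch of this, using Matlab's economy-size qr factorization just to manufacture a matrix Q with orthonormal columns (n, k, and the subspace are arbitrary):
    n = 6; k = 3;
    [Q, R] = qr(randn(n,k), 0);                 % Q is n x k with orthonormal columns
    x = Q*randn(k,1);                           % some vector in the subspace spanned by Q's columns
    c = Q'*x;                                   % the coefficients of x in the orthonormal basis
    norm(x - Q*c)                               % zero up to roundoff: x is recovered from c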
Because orthogonal matrices don't change the norm, multiplying vectors and
matrices by them is highly stable numerically.
In floating point arithmetic you can multiply a vector by millions of orthogonal
matrices and have only
mild loss of precision at the end.
Multiplying by an orthogonal matrix is
sometimes called "applying an orthogonal transformation".
Eigenvalues
Eigenvalues are only defined for square matrices. If there is a scalar λ
and a nonzero vector x such that Ax = λ x, then λ is an
eigenvalue of A and x is the corresponding eigenvector. Even if A has
all real-valued entries, it may have complex-valued eigenvalues. Geometrically the
relation Ax = λ x says that in the direction given by the vector x, the action of A
is to stretch or shrink the vector by a factor of |λ|.
A (square, of course) matrix of order n has n
eigenvalues, but those may not be distinct values - some eigenvalues may be repeated
with different eigenvectors. Numerically you will almost never encounter
eigenvalues with this kind of multiplicity; floating point arithmetic invariably
perturbs the array values representing the matrices and vectors, and so the
eigenvalues will seem distinct.
Some properties of eigenvalues:
- A symmetric matrix has all real eigenvalues (no complex-valued ones).
This is easy to prove - try it.
- Define X = [x1 x2 ... xn],
the matrix whose columns are the eigenvectors of A. Let S be the diagonal
matrix with the corresponding eigenvalues along its main diagonal. Then
AX = XS from the definition of eigenvectors. If X is nonsingular,
then A = XSX-1
- If A is symmetric, it has a full set of n eigenvectors which can be chosen
to be orthonormal. In that case, A = QSQT, where Q is orthogonal.
This is called the Schur decomposition of A. Because
orthogonal transformations are so friendly to numerical computing,
a symmetric matrix's eigenvectors are almost always chosen to be
orthonormal this way.
- When A is symmetric, you can represent any n-vector x as a linear combination
of the (orthonormal) eigenvectors of A. This is called the eigendecomposition
of x (with respect to A).
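A sketch of these facts using Matlab's eig (the matrix below is an arbitrary random symmetric one):
    B = randn(5);  A = (B + B')/2;              % an arbitrary symmetric matrix
    [Q, S] = eig(A);                            % columns of Q are eigenvectors, diag(S) the eigenvalues
    norm(A*Q - Q*S)                             % A*Q = Q*S, up to roundoff
    norm(Q'*Q - eye(5))                         % for symmetric A the eigenvectors are orthonormal
    norm(A - Q*S*Q')                            % A = Q*S*Q'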
When A is not symmetric, the eigenvalues may be complex and the eigenvectors
may not form a basis for ℜn. This suggests that maybe eigenvalues are
not the right thing to look at for nonsymmetric matrices, and there is a
better approach, the singular value decomposition, for factoring a
nonsymmetric or even a rectangular matrix.
Next: LU Factorization
- Last Modified: Mon 04 Nov 2019, 06:59 AM