Linear Least Squares: Householder Refectors

QR Factorization by Householder Reflectors

One way to derive a BLAS level 2 QR factorization uses "Householder tranforms". A Household transform is also called a "Householder reflector" because geometrically it does a reflection of a vector through a plane. Givens rotations zero out one entry at a time in A, while Householder reflectors can zero out an entire subdiagonal of A in a single step. One useful notation is needed now: the vector e_i has all of its entries equal to zero, except entry i which is set to 1. Other ways of defining e_i is as the i-th column of the identity matrix, or as the i^th unit vector in Cartesian coordinates.

Let x be a vector of length m. For the QR factorization via Householder transformations x will be a part of a column of the matrix A, but it is easier to define first without the indexing in A. Set

σ = ± sqrt(x^Tx)
v = x + σ e₁ [this just adds σ to the first element of the vector x]
γ = 2/||v||²
H = I − γ v v^T

Then H*x = (*, 0, ..., 0)^T, and the Householder matrix H is orthogonal. Although the sign of the nonzero entry * is not yet known, its absolute value abs(*) must be ||x|| (why?).

Properties of H:

H^TH = I, and if L is the subspace {y: y^Tv = 0}, then H*v gives the mirror reflection of the vector v through L
Choose the sign in the definition of σ to match that of x₁. That will make ||v|| larger than if σ and x₁ have different signs. The definition of γ above is the only place where a divide occurs, and so it is the most likely place to encounter numerical inaccuracies
The scalar γ need not be stored, by scaling the vector v with sqrt(γ) to get w = sqrt(γ)*v. Then H = I − ww^T.
Applying H*A = A − γ*v*(v^T*A) is a matrix*vector product followed by a rank-1 update, both of which are BLAS-2 operations

To get a complete QR factorization of a matrix A, proceed column-by-column. At the k-th step, use H_k chosen to zero out all but first component in A(k:m,k), by treating A(k:m,k) as the "x" in the defintion of H. On the k-th step, rows 1:k-1 of A are not affected, so the "vector" x is decreasing in size as the algorithm proceeds.

Representations of H and storage schemes:

Several ways of handling the data generated during the factorization exist, corresponding to different ways chosen to normalize (or not) the vectors v that define the matrix H. Those include

H = I − γ*v*v^T as shown above
H = I − v*v^T, with v replaced by sqrt(γ)*v in item (1)
H = I − β v*v^T where v is normalized so that its first entry is 1.0
H = I − 2*v*v^T where v is replaced by (sqrt(γ)/2)*v in item (1)

The last one comes from the geometric interpretation of H as a mirror reflector. If you are standing 2.3 meters in front of a mirror, your image seems to be 2*2.3 = 4.6 meters away from you. An uncountably infinite number of scalings and representations of a single Householder matrix H exist. E.g., why not scale v by 1.21473081? (The answer to the last question is "what justification do you possibly have in your alleged mind for it?", or more succintly, "you're doing it wrong.") All of the ways listed (should) give the exact same matrix H, so choose one that is convienent or better yet, gives a better memory reference to flop ratio.

Without scaling, the vector v that defines the H-reflector form representation (1) is the same as A(k:m,k) except possibly for its first entry. All of the others possibly involve some scaling of the subdiagonal of A(k:m,k), or about (mn - n(n+1)/2) flops. The overall algorithm will involve O(m*n²) flops, so the scaling is comparatively minor in cost. Using Householder transforms represented as H_k = I − v_kv_k^T avoids the need to store γ_k, but altogether the values γ_k will need only n doubles, a minor overhead.

Vitally important: regardless of the representation chosen, never explicitly form the matrix H. It will be m×m and a complete QR factorization would require n of them, for a total storage cost of O(n*m²). Instead use the factored from of H, which requires storage of only O(m*n). Put in the example values from class of m = 6M and n = 200k to see what a difference that makes.

Even using the form H = I − v*v^T still leaves some storage implementation issues. For example: put each v_k into the diagonal and subdiagonal part of the array A. Put the strictly upper triangular part of R in corresponding part of A, and the diagonal of R in an auxillary vector d. At the end of the factorization, a 8 by 6 matrix would have

$\begin{displaymath}A = \left[\begin{array}{cccccc}v_1^1 & r_{12} & r_{13} & r...... v_7^2 & v_6^3 & v_5^4 & v_4^5 & v_3^6 \\\end{array}\right],\end{displaymath}$

and the auxillary vector d = (r₁₁, r₂₂, ... , r₆₆ )^T. Or the diagonal elements of R can be put on the main diagonal of the array A, and the values vⁱ_i, i = 1:n, could be relegated to the auxillary vector. Different software packages take different approaches, so know what they have chosen before trying to use them.

Householder QR Factorization Algorithm

At long last, here is a Matlab implementation of QR factorization using Householder reflectors:

   [m,n] = size(A);
   for k = 1:min(m−1,n),
       t = norm( A(k:m,k) );
       s = sign( A(k,k) ) * t;
       t = t^2 + s*(2*A(k,k) + s);
       A(k,k) = A(k,k) + s;
       d(k) = −s;
       gamma = sqrt(2/t);
       A(k:m,k) = gamma*A(k:m,k);
       if (k < n),
         A(k:m,k+1:n) = A(k:m,k+1:n) 
           − A(k:m,k)* (A(k:m,k)'*A(k:m,k+1:n));
       end
   end  
   if (m <= n),
      d(m) = A(m,m);
   end

What representation form for H does this implementation use? What does it do with any extra storage needed, like putting in diag(R), the scaling factors, or my shoe size in inches? In other words, at the end do you have the array A looking like

$\begin{displaymath}A = \left[\begin{array}{cccccc}v_1^1 & r_{12} & r_{13} & r...... v_7^2 & v_6^3 & v_5^4 & v_4^5 & v_3^6 \\\end{array}\right],\end{displaymath}$

or something else? Don't bother going any further until you can nail that down and give justification for it. Even if you assume my algorithm is correct (a big mistake, BTW), solving a least squares problem still requires computing x = R^-1*Q^T*b, so you need to know how to compute the product and solve the triangular system. As a subtle reminder of what can go wrong, the formula x = R^-1*Q^T*b is almost always wrong - there is a dimension mismatch in it. Track that down and fix it.

After the fact(orization)

To solve a least squares problems using the QR factorization above, let H_k be the k^th Householder reflector generated. Then Q^T = H_n * H_n−1 * … * H₁

Solving the full rank least squares problem then only requires solving Rx = Q^Tb. Because on step k the factorization only needs to work on the entries in rows k through m, each H_k can be treated as a (m−k)×(m−k) matrix, operating on the trailing part of b. In that case, the algorithm for computing Q^Tb is:

for k = 1:min(m−1,n)
      b(k:m) = H_k*b(k:m) 
end for

The exact form depends on what representation of H_k was chosen. E.g., for one of them the algorithm is

for k = 1: min(m−1,n)
        t = A(k:m,k)' * b(k:m)
        b(k:m) = b(k:m) − t*A(k:m,k)
end for

Which one of the representations for H does this version assume, (1), (2), (3), (4), or none of the above? If it helps, here is a hint: it is not a version that scales by 1.21473081 or by my shoe size.

Usually, all which is needed is the least squares solution. But if needed, how can the matrix Q (or Q^T) be explicitly created? A hint (slightly more helpful than the shoe sizae one): you can feed in a sequence of vectors into the function that computes Q^Tb to build Q.

Fortunately the entire matrix Q is rarely needed since it is m×m. What if you need to form the matrix Q₁, an m×n matrix whose columns form an orthonormal basis for range(A)? Recall that generally m >> n, so computing the entire m×m Q and then extracting the first n columns of it is impractical or even impossible.

Using Householder transformations moves the QR factorization from the BLAS-1 operations of the Givens method to the BLAS-2 level. But A = QR is entirely a matrix computation, so there must be a BLAS-3 version lurking about ....

Next: block QR factorization

Started : Mon 23 Aug 2010, 06:00 PM to merge multiple versions

Last Modified: Mon 04 Nov 2019, 07:19 AM