Let x be a vector of length m. For the QR factorization via Householder transformations x will be a part of a column of the matrix A, but it is easier to define first without the indexing in A. Set
Without scaling, the vector v that defines the H-reflector form representation (1) is the same as A(k:m,k) except possibly for its first entry. All of the others possibly involve some scaling of the subdiagonal of A(k:m,k), or about (mn - n(n+1)/2) flops. The overall algorithm will involve O(m*n2) flops, so the scaling is comparatively minor in cost. Using Householder transforms represented as Hk = I − vkvkT avoids the need to store γk, but altogether the values γk will need only n doubles, a minor overhead.
Vitally important: regardless of the representation chosen, never explicitly form the matrix H. It will be m×m and a complete QR factorization would require n of them, for a total storage cost of O(n*m2). Instead use the factored from of H, which requires storage of only O(m*n). Put in the example values from class of m = 6M and n = 200k to see what a difference that makes.
Even using the form H = I − v*vT still leaves some storage implementation issues. For example: put each vk into the diagonal and subdiagonal part of the array A. Put the strictly upper triangular part of R in corresponding part of A, and the diagonal of R in an auxillary vector d. At the end of the factorization, a 8 by 6 matrix would have
and the auxillary vector d = (r11, r22, ... , r66 )T. Or the diagonal elements of R can be put on the main diagonal of the array A, and the values vii, i = 1:n, could be relegated to the auxillary vector. Different software packages take different approaches, so know what they have chosen before trying to use them.
[m,n] = size(A); for k = 1:min(m−1,n), t = norm( A(k:m,k) ); s = sign( A(k,k) ) * t; t = t^2 + s*(2*A(k,k) + s); A(k,k) = A(k,k) + s; d(k) = −s; gamma = sqrt(2/t); A(k:m,k) = gamma*A(k:m,k); if (k < n), A(k:m,k+1:n) = A(k:m,k+1:n) − A(k:m,k)* (A(k:m,k)'*A(k:m,k+1:n)); end end if (m <= n), d(m) = A(m,m); end
What representation form for H does this implementation use? What does it do with any extra storage needed, like putting in diag(R), the scaling factors, or my shoe size in inches? In other words, at the end do you have the array A looking like
or something else? Don't bother going any further until you can nail that down and give justification for it. Even if you assume my algorithm is correct (a big mistake, BTW), solving a least squares problem still requires computing x = R-1*QT*b, so you need to know how to compute the product and solve the triangular system. As a subtle reminder of what can go wrong, the formula x = R-1*QT*b is almost always wrong - there is a dimension mismatch in it. Track that down and fix it.
Solving the full rank least squares problem then only requires solving Rx = QTb. Because on step k the factorization only needs to work on the entries in rows k through m, each Hk can be treated as a (m−k)×(m−k) matrix, operating on the trailing part of b. In that case, the algorithm for computing QTb is:
for k = 1:min(m−1,n) b(k:m) = Hk*b(k:m) end for
The exact form depends on what representation of Hk was chosen. E.g., for one of them the algorithm is
for k = 1: min(m−1,n) t = A(k:m,k)' * b(k:m) b(k:m) = b(k:m) − t*A(k:m,k) end for
Which one of the representations for H does this version assume, (1), (2), (3), (4), or none of the above? If it helps, here is a hint: it is not a version that scales by 1.21473081 or by my shoe size.
Usually, all which is needed is the least squares solution. But if needed, how can the matrix Q (or QT) be explicitly created? A hint (slightly more helpful than the shoe sizae one): you can feed in a sequence of vectors into the function that computes QTb to build Q.
Fortunately the entire matrix Q is rarely needed since it is m×m. What if you need to form the matrix Q1, an m×n matrix whose columns form an orthonormal basis for range(A)? Recall that generally m >> n, so computing the entire m×m Q and then extracting the first n columns of it is impractical or even impossible.
Using Householder transformations moves the QR factorization from the BLAS-1 operations of the Givens method to the BLAS-2 level. But A = QR is entirely a matrix computation, so there must be a BLAS-3 version lurking about ....
Next: block QR factorization