Jacobi method for heat diffusion problem
The codes covered in class are available; you can download
the tar file
with the files. When unpacked, it will create a directory named
"diffusion", with two subdirectories "c" and "f", for C and Fortran. Choose either, or,
if you prefer, write your own code.
In addition, there are files
- ccc, which can be sourced (not executed) to set up environment variables
for OMP (and MKL and Fortran co-arrays). It uses c-shell syntax, so you will
need to edit it if you use bash instead.
- plothistory.m which will read in a file named "history" as specified in the
code, and plot the convergence of the iterative method.
- showplate.m which reads in a file named "plate" and will give a 3D image of
the plate and its boundary conditions.
- showresults.m which reads in a file named "results" and creates three plots
comparing the four methods you need to implement: one each for the number of
iterations, elapsed time, and Gflop/sec computational rate.
The C/Fortran codes solve a heat diffusion problem on a 2D metal plate, discretized into a
(N+2) x (N+2) domain where N is a given mesh parameter. The plate is represented
by a 2D array T, although in C it is stored in a 1D array. The update to T on
each iteration is in an array dT, standing for "delta T". The outer nodes on the
plate, indexed by T(0,:), T(:,0), T(N+1,:), and T(:,N+1), are boundary conditions
set by the function BC.c or subroutine BC.f90. The interior nodes, indexed by T(1:N,1:N),
are initialized to all zeros.
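As a concrete sketch, one sweep of the Jacobi method in C might look like the following. The IDX macro, the function signature, and the return value are assumptions for illustration, not the actual interfaces in the course code; only the names T, dT, N, and tempsum come from the assignment.

```c
/* Index macro for the assumed row-major 1D storage of the
 * (N+2) x (N+2) plate; the course code's macro may differ. */
#define IDX(i, j, n) ((i) * ((n) + 2) + (j))

/* One Jacobi sweep: three double-nested loops compute every update dT
 * from the current T, apply them all at once, and accumulate the squared
 * update norm that a driver would test against a tolerance. */
double jacobi_sweep(int N, double *T, double *dT)
{
    double tempsum = 0.0;

    for (int i = 1; i <= N; i++)          /* compute all updates */
        for (int j = 1; j <= N; j++)
            dT[IDX(i, j, N)] =
                0.25 * (T[IDX(i-1, j, N)] + T[IDX(i+1, j, N)]
                      + T[IDX(i, j-1, N)] + T[IDX(i, j+1, N)])
                - T[IDX(i, j, N)];

    for (int i = 1; i <= N; i++)          /* then apply them */
        for (int j = 1; j <= N; j++)
            T[IDX(i, j, N)] += dT[IDX(i, j, N)];

    for (int i = 1; i <= N; i++)          /* accumulate ||dT||^2 */
        for (int j = 1; j <= N; j++)
            tempsum += dT[IDX(i, j, N)] * dT[IDX(i, j, N)];

    return tempsum;
}
```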
Your tasks
First, pick C/C++ or Fortran to work with.
Make sure you can compile and run the code as is; this may require some changes to
the make.inc file to match the platform you are using. If you want, run the code
to generate the file "plate" that has a solution in it that you can use to check
versions and modifications that you make. To do this, set N = 360, tolerance =
1.0e-14, maxits = 500000, and showplate = 1 (for C) or showplate = .true. (for
Fortran). That will take some time (I found it took 416909 iterations) and the
plate file will be 5 Mbytes. Then save
the file as something like "true_solution" or whatever else you want to name it.
Then you can use it to check your variants of the code. But don't hand in that file;
I can generate the correct solution myself.
The main task is to
create three modifications of the function/subroutine jacobi.c or jacobi.f90:
- gauss_seidel.c or gauss_seidel.f90, implementing the Gauss-Seidel variant.
That is the version which merges the three double-nested loops in the Jacobi
version into a single double-nested loop. That will mean the nodes are updated
as soon as their update values are known, instead of computing all of the
update values in dT and then adding them to T.
- omp_jacobi.c or omp_jacobi.f90, where you insert OpenMP directives to
make the Jacobi method as fast as possible.
- omp_gauss_seidel.c or omp_gauss_seidel.f90, which uses OpenMP to make
the Gauss-Seidel method as fast as possible.
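The merged-loop Gauss-Seidel update described in the first item above can be sketched as follows in C. The IDX macro and function signature are illustrative assumptions, not the course code's actual interfaces.

```c
#define IDX(i, j, n) ((i) * ((n) + 2) + (j))   /* assumed row-major layout */

/* One Gauss-Seidel sweep: a single double-nested loop computes each
 * node's update and applies it immediately, so later nodes in the sweep
 * already see the new values. */
double gauss_seidel_sweep(int N, double *T)
{
    double tempsum = 0.0;
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++) {
            double d = 0.25 * (T[IDX(i-1, j, N)] + T[IDX(i+1, j, N)]
                             + T[IDX(i, j-1, N)] + T[IDX(i, j+1, N)])
                     - T[IDX(i, j, N)];
            T[IDX(i, j, N)] += d;     /* applied as soon as it is known */
            tempsum += d * d;
        }
    return tempsum;
}
```

Note there is no dT array: because each update lands in T right away, nodes later in the sweep use a mix of old and new values, which is what changes the iteration count relative to Jacobi.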
Here "as fast as possible" means taking the least amount of wall-clock time.
According to math theory, the Gauss-Seidel method takes fewer iterations than
the Jacobi method does. That does not imply that it will take less time, however.
The main questions are: how much impact does changing to Gauss-Seidel
have, how much impact does OpenMP have, which of Jacobi or Gauss-Seidel
benefits more from using OpenMP, and which kind of OpenMP scheduling
(static, dynamic, guided) is best. For the last, you could run an endless
number of tests varying the chunk size for the scheduling. To avoid that,
just use the scheduling clauses without specifying a chunk size.
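For reference, a schedule clause with no chunk size looks like the snippet below. The loop body here is a placeholder, not part of the course code; swap in schedule(dynamic) or schedule(guided) to compare the three kinds.

```c
/* Toy reduction loop showing a schedule clause without a chunk size;
 * omitting the chunk size leaves it at the implementation's default. */
double harmonic_sum(int n)
{
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += 1.0 / i;
    return sum;
}
```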
Parameter settings
While testing and debugging your versions it will probably suffice to use
the input file "parameters" with the line
128 1234567 1.0e-6 100 1 1
or for Fortran,
128 1234567 1.0e-6 100 .true. 1
Those specify N = 128, maxits = 1234567, tolerance = 1.0e-6, printfrequency = 100,
showplate = true, and method = 1 (Jacobi iteration).
That will create the convergence and final solution files which you can check
against the original version. Once you're confident things are running correctly,
use
1000 1234567 1.0e-6 -100 0 1
or for Fortran,
1000 1234567 1.0e-6 -100 .false. 1
Those settings will be suitable for timings and drawing conclusions. You can
try larger values of N for fun, but make sure that the run does not terminate
from reaching the maximum allowed number of iterations. If that happens, then
you cannot draw any conclusions about the relative speed, especially if
comparing to a version that did not reach maxits.
Notes and warnings
Most but not all of these points were covered in class:
- The Gauss-Seidel version should be easy to create, since it requires little
modification to the code.
- Generally it is more efficient with OpenMP to start the parallel region
outside of all loops, rather than inside a loop. For the Jacobi method, doing
it inside the while-loop can mean starting and stopping the parallel region
hundreds of thousands of times. However, starting the parallel region before
the while loop means you will need to make sure that variables like the inner
loop index i are declared to be private.
The variable tempsum will need a reduction(+:tempsum) clause.
This is a case where you should consider using the
default(none) clause, since it forces you to consider what
status each variable should have in the parallel region.
- You will also need to make sure only one thread updates the counter variable
iterations (should it be a shared variable?).
- Generally it is more efficient with OpenMP to make the outermost loop
in a loop nest the parallel one. As covered in class, be aware that
the number of iterations, execution time, and even the solution found
by the Gauss-Seidel method may differ from one run to another, even with all
the problem parameters kept identical.
- If you decide to time multiple versions in a single execution, don't
forget to re-initialize the array T to be all zeros for the interior nodes
before calling a different solver.
- In Fortran, there are many ways to code the loops that update T and/or dT:
- Use nested do-loops as in the original jacobi.f90
- Use array operations with no do-loops: T = T + dT
or T(1:N,1:N) = T(1:N,1:N) + dT(1:N,1:N)
- Use a single forall loop: forall(i=1:N,j=1:N) ...
- Use do-concurrent loops
- Use some combination, e.g., replace the innermost loop
with an array operation like T(1:N,j) = T(1:N,j) + dT(1:N,j)
You are free to use any variant you want, but beware that if you use a forall
loop, gfortran will not spread the work across multiple threads, even if you
specify the "workshare" construct. You can also replace the associations
(left,right,up,down) I used with direct array references, or use
Fortran pointers in place of them. My recommendation is
to use double-nested loops like the ones in jacobi.f90; the other
ways of expressing the computation either have no effect or (like the forall)
obviate any OpenMP efforts. Do not use co-arrays.
How and what to hand in
Submit your results through Canvas. In the directory c or f (depending on what
language you choose to use) add in a plain text file
named "analysis" with your results summarized - this should not take more than one
page. Do not submit MS Word, LibreOffice Writer, LaTeX, or PDF files. Nothing
here requires that kind of formatting or effort.
Make sure the directory also has your gauss_seidel.*, omp_jacobi.*, and
omp_gauss_seidel.* files, where * = c or f90. Also make sure to
delete all executable or object files. My makefiles create an executable
named "go", and the object files have a .o suffix. The makefiles provided will
remove those files if you run "make clean" in the directory.
Put your name in every file you hand in, as a comment line in the code and
as text in your analysis page.
Pack the folder using
tar cf c.tar c
or
tar cf f.tar f
and then submit it via Canvas.
- Last Modified: Tue 06 Feb 2018, 06:33 PM