Jacobi method for heat diffusion problem
The codes covered in class are available; you can download
the tar file
with the files. When unpacked, it will create a directory named
"diffusion", with two subdirectories "c" and "f", for C and Fortran. Choose either, or,
if you prefer, write your own code.
In addition, there are files
- ccc, which can be sourced (not executed) to set up environment variables
for OMP (and MKL and Fortran co-arrays). It uses c-shell syntax, so you will
need to edit it if you use bash instead.
- plothistory.m which will read in a file named "history" as specified in the
code, and plot the convergence of the iterative method.
- showplate.m which reads in a file named "plate" and will give a 3D image of
the plate and its boundary conditions.
- showresults.m which reads in a file named "results" and creates three plots
comparing the four methods you need to implement: one each for the number of
iterations, elapsed time, and Gflop/sec computational rate.
The C/Fortran codes solve a heat diffusion problem on a 2D metal plate, discretized into a
(N+2) x (N+2) domain where N is a given mesh parameter. The plate is represented
by a 2D array T, although in C it is stored in a 1D array. The update to T on
each iteration is in an array dT, standing for "delta T". The outer nodes on the
plate, indexed by T(0,:), T(:,0), T(N+1,:), and T(:,N+1), are boundary conditions
set by the function BC.c or subroutine BC.f90. The interior nodes, indexed by T(1:N,1:N),
are initialized to all zeros.
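As a concrete sketch, one sweep of the Jacobi method in C might look like the following. The IDX macro, the function signature, and the return value are assumptions for illustration, not the actual interfaces in the course code; only the names T, dT, N, and tempsum come from the assignment.

```c
/* Index macro for the assumed row-major 1D storage of the
 * (N+2) x (N+2) plate; the course code's macro may differ. */
#define IDX(i, j, n) ((i) * ((n) + 2) + (j))

/* One Jacobi sweep: three double-nested loops compute every update dT
 * from the current T, apply them all at once, and accumulate the squared
 * update norm that a driver would test against a tolerance. */
double jacobi_sweep(int N, double *T, double *dT)
{
    double tempsum = 0.0;

    for (int i = 1; i <= N; i++)          /* compute all updates */
        for (int j = 1; j <= N; j++)
            dT[IDX(i, j, N)] =
                0.25 * (T[IDX(i-1, j, N)] + T[IDX(i+1, j, N)]
                      + T[IDX(i, j-1, N)] + T[IDX(i, j+1, N)])
                - T[IDX(i, j, N)];

    for (int i = 1; i <= N; i++)          /* then apply them */
        for (int j = 1; j <= N; j++)
            T[IDX(i, j, N)] += dT[IDX(i, j, N)];

    for (int i = 1; i <= N; i++)          /* accumulate ||dT||^2 */
        for (int j = 1; j <= N; j++)
            tempsum += dT[IDX(i, j, N)] * dT[IDX(i, j, N)];

    return tempsum;
}
```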
Your tasks
First, pick C/C++ or Fortran to work with.
Make sure you can compile and run the code as is; this may require some changes to
the make.inc file to match the platform you are using. If you want, run the code
to generate the file "plate" that has a solution in it that you can use to check
versions and modifications that you make. To do this, set N = 360, tolerance =
1.0e-14, maxits = 500000, and showplate = 1 (for C) or showplate = .true. (for
Fortran). That will take some time (I found it took 416909 iterations) and the
plate file will be 5 Mbytes. Then save
the file as something like "true_solution" or whatever else you want to name it.
Then you can use it to check your variants of the code. But don't hand in that file;
I can generate the correct solution myself.
The main task is to
create three modifications of the function/subroutine jacobi.c or jacobi.f90:
- gauss_seidel.c or gauss_seidel.f90, implementing the Gauss-Seidel variant.
That is the version which merges the three double-nested loops in the Jacobi
version into a single double-nested loop. That will mean the nodes are updated
as soon as their update values are known, instead of computing all of the
update values in dT and then adding them to T.
- omp_jacobi.c or omp_jacobi.f90, where you insert OpenMP directives to
make the Jacobi method as fast as possible.
- omp_gauss_seidel.c or omp_gauss_seidel.f90, which uses OpenMP to make
the Gauss-Seidel method as fast as possible.
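The merged-loop Gauss-Seidel update described in the first item above can be sketched as follows in C. The IDX macro and function signature are illustrative assumptions, not the course code's actual interfaces.

```c
#define IDX(i, j, n) ((i) * ((n) + 2) + (j))   /* assumed row-major layout */

/* One Gauss-Seidel sweep: a single double-nested loop computes each
 * node's update and applies it immediately, so later nodes in the sweep
 * already see the new values. */
double gauss_seidel_sweep(int N, double *T)
{
    double tempsum = 0.0;
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++) {
            double d = 0.25 * (T[IDX(i-1, j, N)] + T[IDX(i+1, j, N)]
                             + T[IDX(i, j-1, N)] + T[IDX(i, j+1, N)])
                     - T[IDX(i, j, N)];
            T[IDX(i, j, N)] += d;     /* applied as soon as it is known */
            tempsum += d * d;
        }
    return tempsum;
}
```

Note there is no dT array: because each update lands in T right away, nodes later in the sweep use a mix of old and new values, which is what changes the iteration count relative to Jacobi.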
Here "as fast as possible" means taking the least amount of wall-clock time.
According to math theory, the Gauss-Seidel method takes fewer iterations than
the Jacobi method does. That does not imply that it will take less time, however.
The main questions are: how much impact does changing to Gauss-Seidel
have, how much impact does OpenMP have, which of Jacobi or Gauss-Seidel
benefits more from using OpenMP, and which kind of OpenMP scheduling
(static, dynamic, guided) is best. For the last, you could run an endless
number of tests varying the chunk size for the scheduling. To avoid that,
just use the scheduling clauses without specifying a chunk size.
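For reference, a schedule clause with no chunk size looks like the snippet below. The loop body here is a placeholder, not part of the course code; swap in schedule(dynamic) or schedule(guided) to compare the three kinds.

```c
/* Toy reduction loop showing a schedule clause without a chunk size;
 * omitting the chunk size leaves it at the implementation's default. */
double harmonic_sum(int n)
{
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += 1.0 / i;
    return sum;
}
```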
Parameter settings
While testing and debugging your versions it will probably suffice to use
the input file "parameters" with the line
128 1234567 1.0e-6 100 1 1
or for Fortran,
128 1234567 1.0e-6 100 .true. 1
Those specify N = 128, maxits = 1234567, tolerance = 1.0e-6, printfrequency = 100,
showplate = true, and method = 1 (Jacobi iteration).
That will create the convergence and final solution files which you can check
against the original version. Once you're confident things are running correctly,
use
1000 1234567 1.0e-6 -100 0 1
or for Fortran,
1000 1234567 1.0e-6 -100 .false. 1
Those settings will be suitable for timings and drawing conclusions. You can
try larger values of N for fun, but make sure that the run does not terminate
from reaching the maximum allowed number of iterations. If that happens, then
you cannot draw any conclusions about the relative speed, especially if
comparing to a version that did not reach maxits.
Notes and warnings
Most but not all of these points were covered in class:
- The Gauss-Seidel version should be easy to create, since it requires little
modification to the code.
- Generally it is more efficient with OpenMP to start the parallel region
outside of all loops, rather than inside a loop. For the Jacobi method, doing
it inside the while-loop can mean starting and stopping the parallel region
hundreds of thousands of times. However, starting the parallel region before
the while loop means you will need to make sure that variables like the inner
loop index i are declared to be private.
The variable tempsum will need a reduction(+:tempsum) clause.
This is a case where you should consider using the
default(none) clause, since it forces you to consider what
status each variable should have in the parallel region.
- You will also need to make sure only one thread updates the counter variable
iterations (should it be a shared variable?).
- Generally it is more efficient with OpenMP to make the outermost loop
in a loop nest the parallel one. As covered in class, be aware that
the number of iterations, execution time, and even the solution found
by the Gauss-Seidel method may differ from one run to another, even with all
the problem parameters kept identical.
- If you decide to time multiple versions in a single execution, don't
forget to re-initialize the array T to be all zeros for the interior nodes
before calling a different solver.
- In Fortran, there are many ways to code the loops that update T and/or dT:
- Use nested do-loops as in the original jacobi.f90
- Use array operations with no do-loops: T = T + dT
or T(1:N,1:N) = T(1:N,1:N) + dT(1:N,1:N)
- Use a single forall loop: forall(i=1:N,j=1:N) ...
- Use do-concurrent loops
- Use some combination, e.g., replace the innermost loop
with an array operation like T(1:N,j) = T(1:N,j) + dT(1:N,j)
You are free to use any variant you want, but beware that if you use a forall
loop, gfortran will not spread the work across multiple threads, even if you
specify the "workshare" construct. You can also replace the associations
(left,right,up,down) I used with direct array references, or use
Fortran pointers in place of them. My recommendation is
to use double-nested loops like the ones in jacobi.f90; the other
ways of expressing the computation either have no effect or (like the forall)
obviate any OpenMP efforts. Do not use co-arrays.
How and what to hand in
Submit your results through Canvas. In the directory c or f (depending on what
language you choose to use) add in a plain text file
named "analysis" with your results summarized - this should not take more than one
page. Do not submit MS Word, LibreOffice Writer, LaTeX, or PDF files. Nothing
here requires that kind of formatting or effort.
Make sure the directory also has your gauss_seidel.*, omp_jacobi.*, and
omp_gauss_seidel.* files, where * = c or f90. Also make sure to
delete all executable or object files. My makefiles create an executable
named "go", and the object files have a .o suffix. The makefiles provided will
remove those files if you run "make clean" in the directory.
Put your name in every file you hand in, as a comment line in the code and
as text in your analysis page.
Pack the folder using
tar cf c.tar c
or
tar cf f.tar f
and then submit it via Canvas.
- Last Modified: Tue 06 Feb 2018, 06:33 PM