Timings on Computers
Introduction and the Unix time Command
Accurately timing things on computers is essential for performance analysis,
identification of bottlenecks in programs, and experimental determination
of computational efficiency. For large-scale problems it is also important
to track down
- memory activity such as page faults, which occur when the program
accesses data that is not in the computer's main memory. In that case,
the data must be read in from disk (or tape on HPSS), which can take
10-1000 times as long to access.
- I/O activity such as periodic output of the state of the computation. Although
not always avoidable, this can sometimes be made more efficient by writing
output files in binary format, or by spawning a separate thread to handle
the output.
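As a rough illustration of the binary-output point above, here is a sketch in Python using only the standard library; the file names and array size are illustrative, not taken from any example in this page:

```python
import array
import time

# A million double-precision values standing in for "the state of the computation".
data = array.array("d", range(1_000_000))

# Text output: format every value and write it as a line.
t0 = time.perf_counter()
with open("state.txt", "w") as f:
    for v in data:
        f.write(f"{v}\n")
text_s = time.perf_counter() - t0

# Binary output: dump the raw bytes in one call.
t0 = time.perf_counter()
with open("state.bin", "wb") as f:
    data.tofile(f)
bin_s = time.perf_counter() - t0

print(f"text: {text_s:.3f} s   binary: {bin_s:.3f} s")
```

On a typical machine the binary write finishes in a small fraction of the time the formatted text write takes, in addition to producing a smaller file.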
You may think you know just where memory and I/O activity is going on,
but the late-1980s Pablo project at UIUC showed that users are typically unaware
of the actual effects I/O has on their programs. That has changed in the
past 30 years: computational scientists are now even more unaware of the
effects that I/O imposes.
For timing, the Unix command time in your default shell
gives data for an entire program
(or more precisely, an entire Unix process). If the executable is called
"a.out", then a typical result using an sh, csh, or tcsh shell would look like
% time a.out
2648.1u 330.0s 52:43 94% 0+12372k 347+21447io 347pf+0w
The breakdown of the output line is:
- 2648.1u: seconds of user time devoted to the process. User time is the time
the CPU spent executing your code (and not other processes that may have
been swapped in and out during your code's execution).
- 330.0s: seconds of system time devoted to the process. System time is time
spent executing system calls. A major source of system time is I/O:
a memory access that goes to or from a disk requires calling the operating
system to perform the action, and that is charged to system time. Another
source is calling a timing routine within your code.
- 52:43: elapsed wall-clock time. This means 52 minutes, 43 seconds, and is
what would be recorded if you used a stopwatch.
- 94%: percent of the machine's CPU resources devoted to your process. This
roughly means that during the 52:43 of wall-clock time, your code was being
executed 94% of the time, and other code was running 6%. If this number
is 50% or less, the timings may not be reliable, because your
program was being swapped in and out of the CPU to give time slices to
other users' programs.
- 0+: average amount of shared memory in kbytes (for shared libraries).
- 12372k: average amount of unshared memory in kbytes. This indicates the
program used about 12.4 Mbytes of memory.
- 347+: number of block input operations.
- 21447io: number of block output operations. Output is buffered; the operating
system waits until you have accumulated enough output to a file to make
it worthwhile to actually execute a write to disk (which is relatively
expensive in terms of time).
- 347pf+: number of page faults. This indicates how many times during execution
the program tried to access data that was not available in main memory
and instead resided on a disk. For this example, most of the page faults
probably occurred in the initialization phase of the program. Note that
the number matches the number of block input operations, which makes that
conjecture even more likely.
- 0w: number of times your job was swapped out so that another job could
run on the CPU. This can indicate a heavily loaded machine (for
which the only cure is to buy a more modern one), or it can indicate that
your program is making several system calls. A system call occurs when
your program has to request that the operating system execute some command,
such as an I/O operation or reading/setting a system variable like the clock.
In that case, the OS regards it as a great opportunity to move your job
out and another one in. Why is this shown as 0 when the code had the
CPU for only 94% of the time? The answer involves ....
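The user, system, and elapsed quantities that time reports can also be read from inside a running process. A minimal sketch using the Python standard library's os.times(), which wraps the underlying POSIX call; the workload here is just a placeholder:

```python
import os

start = os.times()
_ = sum(i * i for i in range(2_000_000))   # some purely CPU-bound work
end = os.times()

# Differences give the resources consumed between the two snapshots.
print(f"user    {end.user - start.user:.2f} s")
print(f"system  {end.system - start.system:.2f} s")
print(f"elapsed {end.elapsed - start.elapsed:.2f} s")
```

For a CPU-bound loop like this one, essentially all the elapsed time shows up as user time and almost none as system time, mirroring the breakdown above.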
Also, be aware that just typing "time" will generally invoke your shell's
builtin timer. The same information can be obtained using /usr/bin/time
instead, and with the -v option it displays the fields in a more
understandable format.
Problems With Using Unix time
First, not all of the fields may be implemented on a given computer.
This is not a problem that the user can deal with; it is up to the vendor.
Obviously this can lead to misleading results. In the time example above,
were there really no swaps, or does the OS interrupt your job and
swap it out so often (even on a lightly loaded machine) that the vendor
finds it too embarrassing to let a user find out easily? [BTW: that actually
was the case for the example above.]
Another weakness: while the time command may show you got 99.9% utilization of
the CPU during the execution, that is just the fraction of time spent
on your process, not the fraction of the machine's theoretical peak
performance you achieved. A vector update operation can get
99.9% CPU utilization over 217 seconds ... and yet be running at just
5% of the possible peak rate.
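That distinction can be made concrete with a small sketch. The Python loop below runs the CPU at essentially full utilization, yet achieves only a tiny fraction of peak; the peak rate used here is an assumed figure for illustration, not a measurement of any particular machine:

```python
import time

n = 2_000_000
x = [0.0] * n
y = [1.0] * n
a = 2.0

t0 = time.process_time()
for i in range(n):              # vector update x[i] += a*y[i]: 2 flops per element
    x[i] += a * y[i]
t1 = time.process_time()

achieved = 2 * n / (t1 - t0)    # floating-point operations per second achieved
assumed_peak = 1.0e10           # illustrative peak rate, NOT a measured value
print(f"achieved {achieved:.2e} flop/s, "
      f"{100 * achieved / assumed_peak:.3f}% of the assumed peak")
```

The loop gets the CPU nearly 100% of the time, but interpreter overhead (or, in compiled code, memory traffic) keeps the achieved flop rate far below any realistic peak.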
More deadly, the numbers are for the complete process. You are typically
interested in an algorithm's time and want to exclude the time it takes
to read data in, to output diagnostics and results, etc. Or you want those
sections to be timed separately, and need to
isolate the individual segments. Even more typically, if a code is taking
a long time, you want to know why and where it is taking a long time. That
means finding where the time gets spent, in some cases all the way down
to which individual loops or function calls are the guilty parties.
So we need to look at timers which can be invoked from within your program.
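As a preview of the next page, much of the same per-process data is already available from within a program via getrusage. A minimal sketch using Python's standard-library wrapper (POSIX systems only; field units and meanings vary slightly between operating systems):

```python
import resource  # POSIX only; not available on Windows

usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"user time    {usage.ru_utime:.3f} s")
print(f"system time  {usage.ru_stime:.3f} s")
print(f"max RSS      {usage.ru_maxrss} kB on Linux (bytes on macOS)")
print(f"page faults  {usage.ru_minflt} minor, {usage.ru_majflt} major")
print(f"swaps        {usage.ru_nswap}")
```

Calls like this, placed around individual code segments, are what let you isolate an algorithm's time from the rest of the process.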
Next page:
getrusage()
- Started change tracking: Mon 14 Sep 2009, 01:58 PM
- Modified: Mon 29 Aug 2016, 07:32 AM to correct links
- Modified: Fri 27 Sep 2019, 02:31 PM for /usr/bin/time, fix hpss link
- Last Modified: Fri 27 Sep 2019, 02:33 PM