Nevertheless, it is instructive to figure out how to time really small, fast things like simple operations. When developing high-performance code you will sometimes want to know whether to trade the multiplication 2*x for the addition x+x, which in turn means knowing the relative costs of a multiply and an add. [In practice this kind of "operation reduction" of 2*x into x+x has not been worth doing for many, many years now. But myth and legend lay a heavy weight on the area of scientific computing, often propagated by CS professors who really should retire and yell at kids to stay off their lawn. However, you'll still see it done in some codes because it was useful at the time they were written.]
A simple operation like a floating point addition will invariably take much less time than the clock resolution and timing overhead. If not, update your computer from the pre-1990 dinosaur you're apparently using. Repeating a small operation in a loop has its own problem: many other things are going on in the loop, and their cost is significant compared to that of the simple operation.
For example, the loop index must be initialized, then incremented and tested on each iteration. Furthermore, there is a "branch" instruction at the heart of every loop (typically buried in the assembly language generated from your code), and a branch can be far more expensive than a floating point add.
The key idea is to time two loops, each with lots of repetitions and each with a loop body containing the same statement replicated several times. If the two loop bodies differ in the number of replications, then subtracting one loop's time from the other's also subtracts off the overhead of setting up and managing the loop, branches and all. Here is the technique for addition:
time_1 = get_current_time()
for k = 1, ... , repetitions
    x = x + y
    .
    .
    .
    x = x + y
end for
time_2 = get_current_time()
for k = 1, ... , repetitions
    x = x + y
    .
    .
    .
    x = x + y
end for
time_3 = get_current_time()

where the first loop body has the operation replicated k1 times and the second has it replicated k2 times. Subtracting gives the number of operations:
number_ops = repetitions*(k2 - k1), where k2 > k1. The cost (in time) of doing one operation can then be found with some simple arithmetic on time_1, time_2, and time_3, but you should work out the formula yourself to help understand this methodology.
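As a concrete sketch of the two-loop technique, here is a small Python version (the names make_loop and time_loop are my own; in an interpreted language the numbers mostly reflect interpreter overhead, so a compiled language gives cleaner results, but the methodology is identical):

```python
import time

def make_loop(replications):
    """Build a function whose loop body contains `replications` literal
    copies of x = x + y, mirroring the replicated pseudocode body."""
    body = "\n".join(["        x = x + y"] * replications)
    src = ("def f(repetitions):\n"
           "    x, y = 0.0, 1.0\n"
           "    for k in range(repetitions):\n"
           f"{body}\n"
           "    return x\n")
    namespace = {}
    exec(src, namespace)
    return namespace["f"]

def time_loop(repetitions, replications):
    """Return elapsed wall-clock time for one replicated loop."""
    f = make_loop(replications)
    f(repetitions)                    # warm-up run
    t0 = time.perf_counter()
    f(repetitions)
    t1 = time.perf_counter()
    return t1 - t0

repetitions, k1, k2 = 100_000, 4, 16
t_small = time_loop(repetitions, k1)  # plays the role of time_2 - time_1
t_large = time_loop(repetitions, k2)  # plays the role of time_3 - time_2
number_ops = repetitions * (k2 - k1)
# Subtracting cancels the shared loop overhead (counter updates, branches),
# leaving only the time for the extra (k2 - k1) adds per iteration.
seconds_per_add = (t_large - t_small) / number_ops
print(f"estimated cost per add: {seconds_per_add * 1e9:.2f} ns")
```

On a noisy machine the subtraction can even come out negative for a single run; averaging several runs of each loop helps.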
Problems to consider when doing this include:
The material on this page was originally written in 1992, but has been updated several times. Over that time, CPU hardware and run-time systems have gotten much better at handling the branching that occurs in any loop, using speculative execution, branch prediction, etc. So the claim that a branch takes far longer than updating the loop counter may well be completely wrong by now. The techniques covered in this material about timing give you the knowledge you need to write a small toolkit that can experimentally determine the tradeoff costs. Don't trust vendor documentation or claims - it is far more reliable to write and run a small test code.
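As one example of such a test code, a few lines of Python with the standard-library timeit module will compare 2*x against x+x (in an interpreted language the ratio mostly measures dispatch overhead, so treat it only as a rough indicator; a compiled-language version of the same test is more trustworthy):

```python
import timeit

# Time one million multiplies and one million adds on the same value.
# y is held fixed in the timed statement so the operand stays a normal
# float throughout instead of growing toward overflow.
N = 1_000_000
mul_time = timeit.timeit("x = 2.0 * y", setup="y = 1.5", number=N)
add_time = timeit.timeit("x = y + y", setup="y = 1.5", number=N)
print(f"2*x: {mul_time:.3f} s   x+x: {add_time:.3f} s   "
      f"ratio (mul/add): {mul_time / add_time:.2f}")
```

Run it a few times and on more than one machine before drawing any conclusion; single runs are at the mercy of whatever else the system is doing.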