## Overhead of the modulo operator

While optimizing a number-crunching algorithm, I noticed (almost by accident) that a large amount of time was spent performing modulo/remainder operations (%) used for wrapping around the ends of arrays.  I found that changing

`i = (i + 1) % n;`

to

```i++;
while(i >= n) {
i -= n;
}```

sped up my code.  I decided to investigate how fast the common integer arithmetic operators are.  I ran some tests using a microbenchmarking tool I wrote.  (It takes care of warming up the code for JIT compiling, randomizing test order, and repeating tests to calculate variability.)  For comparison, I wrote a similar tool in C++ and tested the operations with it, too.  I included three ways of incrementing with wraparound:

• `i = (i + 1) % n;`
• `i++; while(i > n) i -= n;`
• `i++; if(i > n) i -= n;`

Here are the results.  Times are the time to perform a single operation:

Operation Java C++ C++ (optimization enabled)
Addition 0.3993 +/- 0.0005 ns 2.5030 +/- 0.0020 ns 0.1530 +/- 0.0000 ns
Multiplication 1.2647 +/- 0.0075 ns 3.1280 +/- 0.0070 ns 1.2880 +/- 0.0020 ns
Division 11.1591 +/- 0.0807 ns 13.2260 +/- 0.0050 ns 11.3220 +/- 0.0020 ns
Modulo 12.6602 +/- 0.2044 ns 14.9750 +/- 0.2280 ns 12.0850 +/- 0.2150 ns
Increment with modulo 13.4542 +/- 0.0289 ns 15.3370 +/- 0.0150 ns 13.0320 +/- 0.0040 ns
Increment with while 0.7853 +/- 0.0553 ns 3.5040 +/- 0.3080 ns 1.4430 +/- 0.0810 ns
Increment with if 1.1490 +/- 0.0611 ns 3.2240 +/- 0.2540 ns 1.7230 +/- 0.0000 ns

Why are there differences between the Java and C++ results?  I’m not sure, but they are reproducible and consistent.  Almost certainly, it boils down to the exact native opcodes the code is (just-in-time) compiled to.

Also surprising is that incrementing with a “while” check is faster than incrementing with an “if” check.  I expected the “while” to be slower, because it requires an additional comparison after `i` is decremented.  Whatever the cause is, it is reproducible and fairly consistent across Java and C++.

Of course, results will vary by compiler, JVM, and architecture, so this should all be taken with a grain of salt.