Writing optimizer-friendly code

An optimizing compiler tries to generate code that is both small and fast. It does so by repeatedly applying a number of transformations to the source program. Most optimizations follow mathematical or logical rules based on a sound theoretical foundation. Other transformations are based on heuristics, where experience has shown that some transformations often result in good code or open up opportunities for further optimization.

There are very few cases where the optimizer relies on black magic, so the way you write your source code can determine whether an optimization can be applied to your program or not. Sometimes small changes in the source code can have a significant impact on how efficient the code generated by the compiler is.

This article will list a few things to keep in mind while writing your code, but let's start by making one thing clear. Trying to write your code on as few lines as possible, using ?:-expressions, postincrements, and comma expressions to squeeze a lot of side effects into a single expression, will not make the compiler generate more efficient code. It will just make your source code convoluted and hard to maintain. It is easy to overlook a postincrement or an assignment when it is done in the middle of a complex expression. Write your code in a style that is easy to read.

Loops

What can go wrong in a simple loop like this?

for (i = 0; i != n; ++i)
{
    a[i] = b[i];
}

Well, nothing can go wrong, but there are several things that affect how efficient the code generated by the compiler can be.

The type of the index variable should match the pointer type.

An array expression like a[i] really means *(&a[0] + i), or in plain English: add the offset of the i-th element to a pointer to the first element of a. In terms of raw bytes, the pointer is advanced by i * sizeof(a[0]). For pointer arithmetic it is generally a good idea if the index expression has the same size as the pointer (except for __far pointers, where the pointer and the index expression have different sizes). If the type of the index expression is smaller than the pointer type, the index expression has to be cast to the correct size before it is added to the pointer. If stack space is more precious than code size, it might be worth choosing a smaller type for the index variable, but it will often cost both code size and execution time. The casts can also prevent several loop optimizations.

The loop condition is important, too. Many loop optimizations can only be performed if the number of iterations can be computed before the loop is entered. Unfortunately, it isn't always as simple as subtracting the initial value from the final value and dividing by the increment. What happens if i is an unsigned char, n is an int, and n has the value 1000? The variable i will overflow long before it reaches 1000. It is unlikely that the programmer wanted an infinite loop that repeatedly copies 256 elements from b to a, but the compiler cannot divine the programmer's intentions. It has to assume the worst and cannot apply any of the optimizations that need the trip count prior to entering the loop.

You should also avoid the relational operators <= and >= in loop conditions where the final value is a variable. If the loop condition is i <= n, there is a possibility that n holds the highest value representable in its type, so the compiler has to assume this is a potentially infinite loop.

Aliasing

Global variables are usually considered bad practice. A global variable can potentially be modified from anywhere in the program, and any part of the program may depend on its value. This can result in complex dependencies, making it hard to understand the program and to identify which parts of it might be affected if the value of a global variable is changed. From the optimizer's point of view it is even worse: a store through a pointer can potentially change the value of any global variable. When a variable can be accessed in more ways than one it is called aliasing, and aliasing makes the code much harder to optimize.

char *buf;

void clear_buf()
{
    int i;
    for (i = 0; i < 128; ++i)
    {
        buf[i] = 0;
    }
}

Even though the programmer might know that writing to the buffer which buf points to will never change the variable itself, the compiler has to assume the worst and reload buf from memory on every iteration of the loop.

Passing the address of the buffer as an argument instead of using a global variable will remove the aliasing:

void clear_buf(char *buf)
{
    int i;
    for (i = 0; i < 128; ++i)
    {
        buf[i] = 0;
    }
}

After that small change, the stores through buf can no longer affect the pointer itself. The pointer buf is invariant in the loop, so its value can be loaded once, before the loop, instead of being reloaded on each iteration.

Global variables can be useful when passing information between sections of code that don't share a caller/callee relation. If you do, use them only to pass the information. It is better to use auto variables for anything computationally intensive, especially when the computations also involve pointer operations.

Try to avoid postincrement and postdecrement

In the following, everything said about postincrements applies to postdecrements as well. The C standard text for the semantics of postincrement states that "The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented". While it is fairly common for microcontrollers to have an addressing mode that increments a pointer after a load or store operation, very few microcontrollers can handle postincrement of other types with the same efficiency. To comply with the standard, the compiler may have to copy the operand to a temporary variable before doing the increment. For straight-line code the increment can be lifted out of the expression and placed after it. An expression like

foo = a[i++];

can be executed as

foo = a[i];
i = i + 1;

But what happens when the postincrement is a part of the condition in a while loop? There is no place after the condition where the increment can be inserted so the increment must be done before the test.

A simple loop like

i = 0;
while (a[i++] != 0)
 {
 ... 
}

has to be executed as

loop:
    temp = i;          /* save the value of the operand */
    i = temp + 1;      /* increment the operand */
    if (a[temp] == 0)  /* use the saved value */
        goto no_loop;
    ...
    goto loop;
no_loop:

or

loop:
    temp = a[i];       /* use the value of the operand */
    i = i + 1;         /* increment the operand */
    if (temp == 0)
        goto no_loop;
    ...
    goto loop;
no_loop:

If the value of i after the loop is irrelevant, it is better to place the increment inside the loop body. The almost identical loop

i = 0;
while (a[i] != 0)
{
    ++i;
    ...
}

can be executed without temporary variables:

loop:
    if (a[i] == 0)
        goto no_loop;
    i = i + 1;
    ...
    goto loop;
no_loop:

Developers of optimizing compilers are well aware of the complexities introduced by postincrements. Even though we try our very best to identify these patterns and eliminate as many of the temporaries as possible, there will always be cases where it is hard to generate efficient code, especially when the loop conditions grow more complex than in the loop above. It is often better to split up a complex expression into several simpler expressions, as when the loop condition above was divided into a test and an increment.

The choice between preincrement and postincrement is even more important in C++. Both operator++ and operator-- can be overloaded in both their prefix and postfix forms. When operators are overloaded for class objects it isn't necessary to emulate the behavior of the operators on the basic types, but it is certainly a good idea to keep the behavior as close to that of the basic types as possible. So classes where it is intuitive to increment and decrement objects, e.g. iterators, typically provide both the prefix forms (operator++() and operator--()) and the postfix forms (operator++(int) and operator--(int)).

To emulate the behavior of the prefix ++ for a basic type, operator++() can modify the object and return a reference to the modified object. What about emulating the behavior of the postfix ++ for a basic type? Do you remember? "The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented." Just as for the non-straight-line code above, the implementer of operator++(int) has to copy the original object, modify the original, and return the copy by value. The copying makes operator++(int) more costly than operator++().

For the basic types the optimizer can often remove unnecessary copying if the result of i++ is ignored, but the optimizer cannot change a call to one overloaded operator into another. If you, out of habit, write i++ instead of ++i you will call the more costly increment operator.

We have been arguing against the use of postincrement, but we have to admit that there are times when postincrement is useful. If postincrementing a variable would do exactly what you want to do, go ahead, use postincrement. But don't postincrement a variable in an expression just to avoid writing a separate expression to increment the variable. Every time you add an unnecessary postincrement to a loop condition, an if condition, a switch expression, a ?:-expression, or function call arguments, there is a possibility that the compiler has to generate larger and slower code. Was that list too long to remember? Start a new habit today! Write ++i instead of i++ when you don't use the result of the incrementation. When you do use the result, ask yourself if you can't do the increment as the next statement.