I was recently discussing aliasing problems when it comes to code optimization. And something stroke me : most people aren’t aware of very simple cases where aliasing just kill your performance. Before studying some simple code, let’s explain what aliasing is. And how it annoys your tiny precious compiler when it comes to optimization.

Aliasing refers to a simple fact : the same piece of memory can be accessed by different symbolic names. ( http://en.wikipedia.org/wiki/Aliasing_%28computing%29 ). So when converting your code into assembly, the compiler must ensure that the code executes exactly what is written. A simple case :

void foo( int * a, int * b)
{
    a[ 0 ] = b[ 0 ];
    a[ 1 ] = b[ 1 ];
    a[ 2 ] = b[ 2 ];
}

But what happen if this function is called like this :

foo( array + 1, array );

b[ 1 ] points to the same memory a[ 0 ] points to. To prevent the code from not behaving like it should, the compiler generates code that will work in this situation.

The pseudo assembly is then :
 load r1, b+0
 store a+0, r1
 load r1, b+1
 store a+1, r1
 load r1, b+2
 store a+2, r1
 
While you might think this code is ok, it’s pretty slow. Hardware will stall waiting for r1 to be loaded before storing it. And it happens three times.

In this case, we would have hopped this code :
 load r1, b+0
 load r2, b+1
 load r3, b+2
 store a+0, r1
 store a+1, r2
 store a+2, r3
 
This way, we only pay the latency once.

That’s an easy case and it’s easy to fix. One would recommend the use of __restrict :

void foo( int * __restrict a, int * __restrict b)

But there is another way, especially if your compiler does not support __restrict keyword.

void foo( int * a, int * b)
{
    int
        b0 = b[ 0 ],
        b1 = b[ 1 ],
        b2 = b[ 2 ];
       
    a[ 0 ] = b0;
    a[ 1 ] = b1;
    a[ 2 ] = b2;
}

It should give the same result ( at least in release mode ).

Rule of thumb to prevent aliasing : use local variables. The compiler knows their scope, and so knows nothing aliases with it.

While the first example is easy to spot, look at this one :

bool findItem( int & item_index, const string & value, const string_table & table )
{
    int count = table.size();
   
    for( item_index = 0; item_index < count; ++item_index )
    {
        if( table[ item_index ] == value )
        {
            return true;
        }
    }

    return false;
}

All arguments are passed as references, i.e. pointers. The compiler needs to apply aliasing rule. Moreover, operator == is a function, operator [] is also a function. The compiler does not know what happens there. Have you found what can go wrong here? What about item_index? It’s a pointer. What might happen if item_index is modified by any call in this function? You might expect the compiler will cache the index into a register.

But no, aliasing rules apply!! :

for( item_index = 0; item_index < count; ++item_index ) convert to :

load r2, count
store item_index, 0
start:
    load r1, item_index
    jump to end if ! (r1 < r2 )
    //…loop code
    load r1, item_index
    inc r1
    store item_index, r1
jump start
end:

The compiler must re-read the variable after the loop code. It might have been changed somewhere else. This cause a “Load hits Store” (More about it here : http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/). The memory containing item_index is written just before the same memory is read. The CPU just stalls a lot of cycle waiting for the store and the read to be finished. Fixing this problem is easy : use a local variable. The same code again, but fixed :

bool findItem( int & item_index, const string & value, const string_table & table )
{
    int count = table.size();
   
    for( int local_item_index = 0; local_item_index < count; ++local_item_index )
    {
        if( table[ local_item_index ] == value )
        {
            item_index = local_item_index;
            return true;
        }
    }

    return false;
}

Aliasing can impact performances even if the piece of code does not seem critical. But for critical code, it can be worst. Let’s image a 4X4 matrix multiplication :

r->xx = a->xx * b->xx + a->xy * b->yx + ….
r->xy = a->xx * b->xy + a->xy * b->yy + ….

If no care is taken, a->xx will be read four times. So will be all other elements. After having wrote r->xx, any elements of any matrix can be changed, as r->xx might have point to the same memory. Once again, __restrict or local variable can be used to fix the problem.

Consider aliasing early in your code. It can improve performance without much work. If you use the __restrict keyword, you must respect the contract you made with the compiler (asserts are your friends to ensure condition are ok ). Using local variables, the code will still be valid, but might be a little less optimal. But whatever technique you decide to use, I recommend you read the generated code, you might be surprised!