Hello humans. Welcome to the 7th part of the C/C++ Low Level Curriculum series I’ve been writing. This post covers the conditional operator, and switch statements. As per usual I will be showing snippets of C++ code and throwing the corresponding x86 assembler at you (as produced by VS2010) to show you what your high level code is actually doing at the assembler level.

Disclaimer: in an ideal world I’d like to try to avoid assumed knowledge, but keeping up the level of detail in each post that this entails is, frankly, too much work. Consequently I will from now on point you at post 6 as a “how to” and then get on with it…

Here are the backlinks for preceding articles in the series (warning: it might take you a while, the first few are quite long):

  1. http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/
  2. http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/
  3. http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
  4. http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/
  5. http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/
  6. http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/ [see near the top of this post for details on compiling & running the code snippets]

The conditional operator

I assume that everyone’s familiar with the conditional operator, also known as the “question mark”, or the ternary operator (“ternary” because it’s the only C/C++ operator that takes three operands).

If you’re not, here’s a link so you can catch up (I predict that you will be so stoked to find out about it that you will be over-using it within the week).

Personally I heartily approve of the conditional operator when used judiciously, but it’s not always great for source level debugging because it’s basically a single line if-else and can be hard to follow in the debugger (in fact I’ve heard of it being banned under the coding standards at more than one company, but there you are, we can’t all be sane, can we?).

Anyway, let’s have a quick look at it with some code:

#include "stdafx.h"
 
int main(int argc, char* argv[])
{
    // the line after this comment is logically equivalent to the following line of code:
    // int iLocal; if( argc > 2 ){ iLocal = 3; }else{ iLocal = 7; }
    int iLocal = (argc > 2) ? 3 : 7;
 
    return 0;
}

If you remember the assembler that a basic if-else generated in the last article, then the assembler generated here will probably bust your mind gaskets…

Note:

  1. I’ve deliberately left the function prologue and epilogue out of the asm below, and just left the assembler involved with the conditional assignment
  2. if your disassembly view doesn’t show the variable names, then you need to right click the window and check “Show Symbol Names”

     5:     int iLocal = (argc > 2) ? 3 : 7;
01311249  xor         eax,eax
0131124B  cmp         dword ptr [argc],2
0131124F  setle       al
01311252  lea         eax,[eax*4+3]
01311259  mov         dword ptr [iLocal],eax

Clearly this is not very much like the code for the simple if-else that we looked at previously.

This is because there is trickery afoot and the compiler has chosen to do sneaky branchless code to implement the logic specified by the C++ code.

So, let’s examine it line by line (counting only the five assembler instructions, not the C++ source line shown above them):

  • line 1 (xor) – uses the xor instruction to set eax to 0. Anything XORed with itself is 0.
  • line 2 (cmp) – as in the previous if examples this uses cmp to test the condition, setting flags in a special purpose CPU register based on the result of the comparison.
  • line 3 (setle) – this is a new one! The instruction setle (set if less than or equal) sets its operand to 1 if the 1st operand of the preceding cmp was less than or equal to the 2nd operand, and to 0 if it was greater. We’ve not seen the operand al before; it’s a legacy register name (it dates back to the 8086) which now maps to the lowest byte of the eax register. If you’re a sensible person and are stepping through this code in your debugger with the register window open, you will see that this instruction causes the eax register to be set to 1 when argc is 2 or less (e.g. when run with no command line arguments) – also note that this only works because eax has already been set to 0, since setle only writes the lowest byte.
  • line 4 (lea) – uses the load effective address instruction to do some sneaky maths that relies on the value of eax set by setle in line 3.
  • line 5 (mov) – moves the value from eax into the memory address storing the value of iLocal

That’s all fine, but how does it work?

Firstly, note that at the assembler level the comparative instruction setle is (as in the previous post’s examples) testing the opposite condition to the conditional specified in the C++ code.

This means that the eax register will be set to 0 in line 3 if argc is greater than 2, which in turn means that the eax*4+3 part of line 4 will evaluate to (0*4)+3 - i.e. 3.

Conversely, if argc is less than or equal to 2, the eax register will be set to 1 which in turn means that line 4 will evaluate to (1*4)+3 - i.e. 7.

So, as you can see, the assembler is doing the same branchless set of instructions regardless of the condition, but using the 0 or 1 result of the conditional instruction in the maths to cancel out or include one of the terms and give what I like to call a “mathematical if”. Clever.
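If it helps to see the “mathematical if” written back out as C++, here is a minimal sketch. The function name SelectBranchless is made up for illustration, and this is my hand translation of the five instructions rather than anything the compiler emits; the assert simply checks it against the conditional operator.

```cpp
#include <cassert>

// Hypothetical helper mirroring the xor / cmp / setle / lea / mov sequence.
int SelectBranchless(int argc)
{
    // xor + cmp + setle: a comparison converted to int is exactly 0 or 1,
    // and note that it tests the opposite condition to (argc > 2)
    const int isLessOrEqual = (argc <= 2);

    // lea eax,[eax*4+3]: (0*4)+3 == 3, or (1*4)+3 == 7
    const int iLocal = isLessOrEqual * 4 + 3;

    // sanity check against the original C++ expression
    assert(iLocal == ((argc > 2) ? 3 : 7));
    return iLocal;
}
```

The multiplier 4 is just the difference between the two constants (7 - 3) and the 3 is the value for the “condition true” case, which is presumably why this trick is only available when both operands are compile time constants.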

Incidentally this sort of branchless-but-still-conditional code has been a bit (or a lot) of a hot topic over the last few years, especially on consoles, since their CPUs are particularly sensitive to branch mis-prediction.

Judicious use of the “branchless conditional” idiom is a tool that can be used to combat branch (mis-)prediction related performance issues – for an example of this, see the use of the fsel PPU instruction in this ADBAD post by Tony Albrecht, and for a brief discussion of branch prediction issues (primarily PC related) see this article by Igor Ostrovsky (who works for Microsoft).

The conditional operator (part deux)

So, clearly our above super-simple-sample resulted in the compiler generating clever assembler because of the constant values in it; interesting certainly, but not necessarily representative of most “real world” assembler.

Let’s see what happens if we use variables with the conditional operator…

#include "stdafx.h"
 
int main(int argc, char* argv[])
{
    int iOperandTwo = 3;
    int iOperandThree = 7;
    int iLocal = (argc > 2) ? iOperandTwo : iOperandThree;
 
    return 0;
}

And, here’s the relevant disassembly:

     5:     int iOperandTwo = 3;
00CF1619  mov         dword ptr [iOperandTwo],3
     6:     int iOperandThree = 7;
00CF1620  mov         dword ptr [iOperandThree],7
     7:     int iLocal = (argc > 2) ? iOperandTwo : iOperandThree;
00CF1627  cmp         dword ptr [argc],2
00CF162B  jle         main+25h (0CF1635h)
00CF162D  mov         eax,dword ptr [iOperandTwo]
00CF1630  mov         dword ptr [ebp-50h],eax
00CF1633  jmp         main+2Bh (0CF163Bh)
00CF1635  mov         ecx,dword ptr [iOperandThree]
00CF1638  mov         dword ptr [ebp-50h],ecx
00CF163B  mov         edx,dword ptr [ebp-50h]
00CF163E  mov         dword ptr [iLocal],edx

Since the conditional operator is now assigning from variables we’d expect it to generate something that looks more like the sort of code we saw from the basic if-else we looked at last time, which it has.

We have the expected cmp followed by a conditional jump testing against the opposite of the conditional, then two blocks of assembler, the first of which (the three instructions from 00CF162D to 00CF1633) unconditionally jumps over the second (the two instructions at 00CF1635 and 00CF1638) if it executes, so essentially it’s behaving more or less as expected; however there’s clearly some interesting stuff happening in there:

  1. the two branches use different registers to store their intermediate values; the first uses eax, the second uses ecx
  2. both branches store their result to the same memory address in the Stack (see this post if you don’t know or can’t remember about Stack Frames) – i.e. [ebp-50h]
  3. the code that assigns the value to iLocal (the last two instructions, at 00CF163B and 00CF163E) only exists once and is executed regardless of which branch was taken; it takes the value from [ebp-50h] and writes it into iLocal using a third register (edx)

The use of different registers for the different branches in step 1 looks like it might be significant but (according to several expert sources) this is apparently perfectly normal compiler behaviour and not anything to read into.

Steps 2 and 3 show that the assembler generated from the conditional operator (at least with VS2010) isn’t directly equivalent to the intuitively equivalent if-else statement:

// intuitively equivalent if-else of
// int iLocal = (argc > 2 ) ? iOperandTwo : iOperandThree;
int iLocal;
if( argc > 2 )
{
    iLocal = iOperandTwo;
}
else
{
    iLocal = iOperandThree;
}

Rather than choosing between one of two assignments like this if-else, the assembler generated for our use of the conditional operator does exactly what we told it to: choose one of two values (store it temporarily in the Stack) and assign iLocal from it.
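To make the difference concrete, this is roughly the C++ you would have to write to get the “choose a value, then assign it” shape of the disassembly. It’s only a sketch: the function AssignViaTemporary and the variable iTemp are names I’ve invented, with iTemp standing in for the stack slot at [ebp-50h].

```cpp
#include <cassert>

// Hypothetical function reproducing the structure of the disassembly.
int AssignViaTemporary(int argc, int iOperandTwo, int iOperandThree)
{
    int iTemp; // plays the role of the temporary at [ebp-50h]

    if (argc > 2)
    {
        iTemp = iOperandTwo;   // first block: load via eax, store to temporary
    }
    else
    {
        iTemp = iOperandThree; // second block: load via ecx, store to temporary
    }

    // the single, shared assignment (via edx in the disassembly)
    int iLocal = iTemp;

    assert(iLocal == ((argc > 2) ? iOperandTwo : iOperandThree));
    return iLocal;
}
```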

A few final notes on the ? operator:

  1. You can see that fewer lines of C++ code do not equate to less assembler
  2. It can be nested, but don’t do it! It’s hideous and will also be very hard to follow when source-level debugging
  3. Be very careful with operator precedence when using it. Use brackets to ensure it will resolve the way you intend.
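As a quick illustration of the precedence point, here is a small sketch (the function names are invented for this example). The conditional operator binds less tightly than the arithmetic operators, so without brackets the + is evaluated first and becomes part of the condition.

```cpp
#include <cassert>

// Parses as (1 + flag) ? 10 : 20, and (1 + flag) is never zero.
int WithoutBrackets(bool flag)
{
    return 1 + flag ? 10 : 20;
}

// Brackets make the intended grouping explicit.
int WithBrackets(bool flag)
{
    return 1 + (flag ? 10 : 20);
}

// Small demonstration of the difference.
void ShowPrecedencePitfall()
{
    assert(WithoutBrackets(true) == 10);
    assert(WithoutBrackets(false) == 10); // probably not what was intended
    assert(WithBrackets(true) == 11);
    assert(WithBrackets(false) == 21);
}
```

Some compilers will warn about the first version if you have the relevant warnings switched on, but it is perfectly legal code.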

Switch Statements

The final type of conditional statement we’ll be looking at is the switch statement. Like the conditional operator, the switch statement is an often abused and maligned construct that you wouldn’t want to live without.

To be fair to the switch statement it’s not the fault of the switch statement that it’s possible for maniacs to write brittle and insane code using them.

An aside about switch statements

Where I have consistently found really horrific examples of switch statements is when an originally stateless synchronous system has been forced to become asynchronous and state driven under time pressure. This specific situation seems always to somehow spawn the kind of monolithic, hard to follow, difficult to change, architecturally brittle switch statements that have given the switch statement a bad rep over the years.

Code that has had network functionality retrofitted to it is (in my experience) an extremely common place to find problem switch statements. It’s always better to fix a system properly if it starts to look systemically broken than it is to soldier on regardless, and if it looks like you need to introduce a set of states into a system then (in my experience) it’s architecturally more sensible to use polymorphic behaviour (e.g. a state class with one or more virtual functions) than a switch statement.
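For anyone who hasn’t seen the state class approach, here is a deliberately tiny sketch of the idea. Everything in it (State, Connected, Transition, and the idea of Update() returning the next state) is hypothetical and chosen for brevity; a real system would carry data and decide its transitions based on events.

```cpp
#include <cassert>

// Hypothetical base class: each state updates itself and returns
// whichever state should be current afterwards.
class State
{
public:
    virtual ~State() {}
    virtual const char* Name() const = 0;
    virtual State* Update() = 0;
};

// A terminal state: once reached, it stays current.
class Connected : public State
{
public:
    virtual const char* Name() const { return "Connected"; }
    virtual State* Update() { return this; }
};

// A state that simply hands over to the state it was given.
class Transition : public State
{
public:
    Transition(const char* name, State* next) : m_name(name), m_next(next)
    {
        assert(next != nullptr);
    }
    virtual const char* Name() const { return m_name; }
    virtual State* Update() { return m_next; }

private:
    const char* m_name;
    State* m_next;
};
```

The calling code then just does something like current = current->Update(); each frame, and adding a new state means adding a class rather than another case to an ever growing switch.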

Where were we?

Sorry, let’s get on and take a look at a switch statement…

#include "stdafx.h"
 
int main(int argc, char* argv[])
{
    int iLocal = 0;
 
    // n.b. no "break" in case 1 so we can
    // see what "fall through" looks like
    switch( argc )
    {
    case 1:
        iLocal = 6;
    case 3:
        iLocal = 7;
        break;
    case 5:
        iLocal = 8;
        break;
    default:
        iLocal = 9;
        break;
    }
 
    return 0;
}

And here’s the disassembly…

     9:     switch( argc )
00C61620  mov         eax,dword ptr [argc]
00C61623  mov         dword ptr [ebp-48h],eax
00C61626  cmp         dword ptr [ebp-48h],1
00C6162A  je          main+2Ah (0C6163Ah)
00C6162C  cmp         dword ptr [ebp-48h],3
00C61630  je          main+31h (0C61641h)
00C61632  cmp         dword ptr [ebp-48h],5
00C61636  je          main+3Ah (0C6164Ah)
00C61638  jmp         main+43h (0C61653h)
    10:     {
    11:     case 1:
    12:         iLocal = 6;
00C6163A  mov         dword ptr [iLocal],6
    13:     case 3:
    14:         iLocal = 7;
00C61641  mov         dword ptr [iLocal],7
    15:         break;
00C61648  jmp         main+4Ah (0C6165Ah)
    16:     case 5:
    17:         iLocal = 8;
00C6164A  mov         dword ptr [iLocal],8
    18:         break;
00C61651  jmp         main+4Ah (0C6165Ah)
    19:     default:
    20:         iLocal = 9;
00C61653  mov         dword ptr [iLocal],9
    21:         break;
    22:     }

This is more or less exactly what you’d expect:

  • the first two instructions (at 00C61620 and 00C61623) store argc into the Stack at [ebp-48h]
  • the block of instructions that follows (from 00C61626 to 00C61638) implements the logic of the switch by a series of comparisons of this value against the constants specified in the case statements, each followed by a conditional jump to the assembler generated by the code in the corresponding case statement
  • if none of the conditional jumps are triggered, the logic causes an unconditional jump to the default: case.
  • in particular, note that:
      1. wherever the break keyword is used this causes an unconditional jump past the end of the assembler generated by the switch
      2. the “drop through” from case 1: into case 3: in the high level code happens at the assembler level as a by product of the organisation of the adjacent blocks of instructions generated for the switch by the compiler, and the lack of unconditional jump at the end of the assembler for case 1:

If you look at the assembler from the sample if-else-if-else in the last article, you should be able to see that the assembler generated for this switch is (more or less) what would happen if we had written the switch as an if-else-if-else and then re-organised the assembler so all the logic was in one place at the top, and the assembler generated for each code block was left where it was.
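One way to picture that layout is to write it out in C++ with gotos standing in for the jumps. This is purely an illustrative sketch (please don’t write real code like this); SwitchOriginal and SwitchAsCompareChain are names I’ve made up, and the labels correspond to the case blocks in the disassembly.

```cpp
#include <cassert>

// The switch from the sample, wrapped in a function for comparison.
int SwitchOriginal(int argc)
{
    int iLocal = 0;
    switch (argc)
    {
    case 1:  iLocal = 6; // no break: falls through
    case 3:  iLocal = 7; break;
    case 5:  iLocal = 8; break;
    default: iLocal = 9; break;
    }
    return iLocal;
}

// Hand-written equivalent of the generated layout: all of the
// comparisons first, then the code blocks in source order.
int SwitchAsCompareChain(int argc)
{
    int iLocal = 0;

    if (argc == 1) goto case_1;   // cmp / je
    if (argc == 3) goto case_3;   // cmp / je
    if (argc == 5) goto case_5;   // cmp / je
    goto case_default;            // jmp

case_1:
    iLocal = 6;                   // no jump here, so execution drops through
case_3:
    iLocal = 7;
    goto end_of_switch;           // break
case_5:
    iLocal = 8;
    goto end_of_switch;           // break
case_default:
    iLocal = 9;
end_of_switch:
    assert(iLocal == SwitchOriginal(argc));
    return iLocal;
}
```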

So other than the fact that the switch statement is a very useful C/C++ language convenience for managing what would often otherwise be messy looking and error prone chains of if-else-if-else statements, based on this example it doesn’t appear to be doing anything which might offer a significant advantage at the assembler level – so why would I have claimed that the compiler might generate “pretty cool assembler” for a switch?

Before we assume we’ve seen it all, let’s try using a contiguous range of values for the constants in the cases of the switch. You know, just for fun – and for the sake of simplicity let’s start at 0.

#include "stdafx.h"
 
int main(int argc, char* argv[])
{
    int iLocal = 0;
 
    switch( argc )
    {
    case 0:
        iLocal = 4;
        break;
    case 1:
        iLocal = 5;
        break;
    case 2:
        iLocal = 6;
        break;
    case 3:
        iLocal = 7;
        break;
    }
 
    return 0;
}

And here’s the disassembly it generates…

Ok, so this time something more interesting is definitely going on – n.b. I’ve used a screenshot rather than just pasting the text because we need to look in a memory window to make sense of it.

So what exactly is it doing?

  • it moves argc into eax, then stores it into the Stack at [ebp-48h]
  • it then compares the value stored in the address [ebp-48h] with 3 (i.e. our maximum case constant)
  • if this value is greater than 3 then ja (jump above) on the next line will cause execution to jump to 8D1658h – the 1st instruction after the code generated by the case blocks, skipping the switch
  • if the value is less than or equal to 3 then the value is moved into ecx, and we then have an unconditional jump to … somewhere :-/

Ok, so that final unconditional jump has some syntax we’ve not yet seen for its address operand, and which clearly isn’t a constant:

jmp    dword ptr    (8D1664h)[ecx*4]

This says “jump to the location stored in the memory address at an offset of 4 times the value of ecx from the memory address 8D1664h”, so how is this implementing the logic of the C++ switch statement?

To answer this question we need to look in a memory window at the address 8D1664h (n.b. to open a memory window from the menu in VS2010 when debugging go Debug -> Windows -> Memory -> … and choose one of the memory windows. To set the address just copy and paste it from the disassembly into the “Address:” input box. You will also need to right click and choose “4-byte integer” and set the “Columns:” list box to 1 to have it look like the screenshot above).

So, if you cast your eyes up to the memory window on the left of the screenshot above, you will see that the top 4 rows are highlighted, these values start at address 8D1664h and are 4 byte integers (hence the ecx*4 in the operand) – which specifically in this case are pointers.

The instruction jmp dword ptr (8D1664h)[ecx*4] will jump to the value stored in the address:

  • 8D1664h + 0h = 8D1664h if the value in ecx is 0
  • 8D1664h + 4h = 8D1668h if the value in ecx is 1
  • 8D1664h + 8h = 8D166Ch if the value in ecx is 2
  • 8D1664h + Ch = 8D1670h if the value in ecx is 3
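The address arithmetic in that list is just “base plus index times element size”. As a small sketch (the function and constant names are invented, and the base address is simply the one from this particular debugging session, so yours will differ):

```cpp
#include <cassert>
#include <cstdint>

// Address of the jump table in this particular run of the program.
const std::uint32_t kJumpTableBase = 0x8D1664;

// Computes the address that "(8D1664h)[ecx*4]" reads the jump target from.
std::uint32_t JumpTableSlotAddress(std::uint32_t ecx)
{
    // the cmp / ja before the jump guarantees ecx is in the range 0..3
    assert(ecx <= 3);

    // each table entry is a 4 byte pointer, hence the multiply by 4
    return kJumpTableBase + ecx * 4;
}
```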

So, the four highlighted rows make up a jump table – since our case constant’s range is from 0 to 3 it is an array of 4 pointers – with each element of the array pointing to the execution address of the case block matching its array index.

You can verify this by checking the addresses of the first instruction generated for each case against the 4 values stored in the array.
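If you wanted to build something similar by hand in C++, the nearest portable equivalent is an array of function pointers indexed by the switch value. This is a sketch under that assumption: the compiler’s table holds addresses of code inside main rather than separate functions, and the names Case0 to Case3 and DispatchViaTable are made up.

```cpp
#include <cassert>

// One small function per case block (hypothetical stand-ins for the
// blocks of assembler generated for each case).
int Case0() { return 4; }
int Case1() { return 5; }
int Case2() { return 6; }
int Case3() { return 7; }

typedef int (*CaseHandler)();

int DispatchViaTable(int argc)
{
    // the jump table: element N is the code for case N
    static const CaseHandler kTable[4] = { Case0, Case1, Case2, Case3 };

    int iLocal = 0;

    // cmp / ja: ja is an unsigned comparison, so a single test rejects
    // both values above 3 and negative values
    if (static_cast<unsigned>(argc) > 3u)
    {
        return iLocal; // no default case, so the switch is skipped
    }

    assert(argc >= 0 && argc <= 3);
    iLocal = kTable[argc](); // jmp dword ptr (table)[ecx*4]
    return iLocal;
}
```

Note that however many cases there are, getting to the right one costs one comparison, one table read, and one indirect jump.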

Maybe it’s just me, but I think this is some pretty cool assembler. It’s certainly more elegant than the assembler generated by the first switch we looked at, but what – if anything – is the advantage of this over the assembler that was generated for the previous switch statement?

In theory this jump table form reaches the code in constant time for all cases, whereas in the if-else-if-else form the time to reach the code corresponding to each case will be proportional to the number of previous cases in the switch statement.

You’re pretty unlikely to find that a switch statement is a performance bottleneck in your code (unless you’ve done something silly) but, all things being equal, the jump table approach uses fewer instructions to get to the code for the matching case, which is normally A Good Thing and – in theory – should make it faster on average.

One final note on switch statements; I am reliably informed that in addition to the if-else-if-else-like linear search behaviour for resolving the correct case to execute, most modern compilers are also capable of generating a binary search for the cases of switch statements with appropriate ranges of case constant values.

Using a binary search rather than a linear search will improve average search time from linear to logarithmic (i.e. O(n) to O(log n)). However, in the average case a binary searched switch will still almost always take more instructions and branches to reach the correct case than a jump table switch.
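I haven’t shown any disassembly for the binary search form, so as a rough sketch of the idea here it is written by hand in C++ for a switch with seven widely spaced case constants (10, 20, … 70). All of the names and values are invented for illustration, and a real compiler may well arrange its comparisons differently.

```cpp
#include <cassert>

// The switch as you would write it; sparse constants like these are
// a poor fit for a jump table.
int LinearSwitch(int value)
{
    switch (value)
    {
    case 10: return 1;
    case 20: return 2;
    case 30: return 3;
    case 40: return 4;
    case 50: return 5;
    case 60: return 6;
    case 70: return 7;
    default: return 0;
    }
}

// Hand-written binary search over the same constants: test the middle
// constant first, then halve the remaining range each time.
int BinarySearchSwitch(int value)
{
    if (value == 40) return 4;
    if (value < 40)
    {
        if (value == 20) return 2;
        if (value < 20) return (value == 10) ? 1 : 0;
        return (value == 30) ? 3 : 0;
    }
    if (value == 60) return 6;
    if (value < 60) return (value == 50) ? 5 : 0;
    return (value == 70) ? 7 : 0;
}

// Checks that both forms agree over a range of inputs.
void CheckBothAgree(int first, int last)
{
    for (int value = first; value <= last; ++value)
    {
        assert(BinarySearchSwitch(value) == LinearSwitch(value));
    }
}
```

Reaching case 70 takes seven equality tests in the linear form, but only a handful of comparisons here; with more cases the gap widens.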

It’s also possible that the compiler might choose to use one or more of these methods in a single switch, though this would probably require a large number of cases in the switch and ranges of case constants with very specific properties so it’s not likely you will come across these very often.

A couple of final things to note about switch statements:

  1. the compiler should be able to generate a jump table regardless of the order of the constants in your code (e.g. case 2: … case 1: … case 3: … should still work fine)
  2. having a range of case constants that starts at 0 makes the conditional code around a jump table simpler, as it removes the lower bounds check
  3. a jump table should get created as long as the overall range of constants is large enough and/or closely packed enough for the compiler to decide it’s worthwhile even if they’re not completely contiguous. Look at the disassembly if you want to check.
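To illustrate point 2, here is a sketch of how a range that doesn’t start at 0 can still be handled cheaply. A common approach (which I’m assuming here rather than showing from VS2010’s output) is to subtract the lowest case constant and then do a single unsigned comparison. The case range 10 to 13 and the name DispatchWithOffset are invented, and for brevity the table holds results rather than code addresses.

```cpp
#include <cassert>

// Behaves like a switch with cases 10, 11, 12, 13 and no default.
int DispatchWithOffset(int value)
{
    static const int kResults[4] = { 4, 5, 6, 7 };

    // re-base the value so the lowest case constant maps to index 0
    const unsigned index = static_cast<unsigned>(value) - 10u;

    // values below 10 wrap around to huge unsigned numbers, so this one
    // test covers both the lower and the upper bound
    if (index > 3u)
    {
        return 0;
    }

    assert(index <= 3u);
    return kResults[index];
}
```

When the range starts at 0 the subtraction simply disappears, which is the simplification referred to in the list above.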

Summary

So, this concludes our look at conditionals, hopefully you’ve found it interesting and illuminating ;)

A final point to take away from our look at conditionals is that whilst the compiler could generate the same assembler for an if-else as for the conditional operator, it doesn’t. Similarly, it could generate the same assembler from an if-else-if-else as for a switch statement but it doesn’t do that either.

In part this shows the limits of the compiler, but it also shows the importance of choosing the appropriate conditional for the purpose – with the added benefit that your choice makes your intent clearer to human readers of your code.

We’ve now covered enough ground that you should be finding that you can apply the information I’ve given you to everyday programming problems such as debugging release code, or code you don’t have debugging information for.

The main things I’d like you to take away from our look at conditionals are all things that will help you when debugging without symbols:

  1. anytime you see cmp followed by a jxx to a nearby address in the disassembly you’re probably looking at code generated by a conditional statement in the C/C++ code
  2. if the address operand to the jump instruction is lower than the current instruction’s address (i.e. it’s jumping backwards) you’re most likely looking at a loop
  3. assembler generated from conditionals generally tests the opposite of the test being done in the C / C++ code

By using these heuristics, looking at the values in the registers, the values in the Stack that have been written by the assembler, and by looking up your current address in the symbol file to tell you which function you’re in (if you’re not generating a symbol file for all your builds you should be – look in the documentation for your platform’s compiler toolchain to find out how) you should be able to make an educated guess at what variables in the C/C++ code are likely to be causing the current issue and this will usually tell you why it crashed, or give you a lead so you can Sherlock Holmes your way to the root of the problem – it’s certainly a lot quicker than the ubiquitous insertion of the many printf()…

Our next topic will be loops, which obviously also use conditional jumps (which is why we covered conditionals first…)

One final thing…

Thanks to Tony, Bruce, and Fabian for extra information, advice, and proof reading.

And, for those of you who like to go off and look for yourselves (hopefully most of you!), I’ve recently discovered this wiki book on x86 Assembler http://en.wikibooks.org/wiki/X86_Assembly. It has a large overlap with this series of articles and also covers programming in x86 assembler. Highly recommended – I’ve certainly found it pretty useful.

A final, final nugget of wisdom from Bruce Dawson:

Another problem I’ve seen with ?: is with people who have ‘cleverly’ created string classes that both have a const char* constructor and a const char* conversion operator. This has potential to be exquisitely dangerous and exquisitely inefficient, by allowing lots of hidden conversions. These come to their fruition with ?:. Imagine this:

return bFlag ? mStringObject : "Hello world";

The question is, what is the type of this expression? Does mStringObject get converted to a const char*, or does “Hello world” get converted to a string object? I have no idea, and nobody should memorize the relevant rules. Such code is too fragile and dangerous.

Some people might assume that the type of the ?: will depend on the return type of the function but that is not true. They are independent. Thus, it is quite possible that “Hello world” will be converted to a string object and then (if the return type of the function is const char*) this (temporary!!!) string object will be converted back to a const char*. In addition to being inefficient this leads to undefined behavior, since we are returning a pointer to memory owned by an object that is destroyed as the function exits.

Using the ?: operator with mismatched types is evil evil evil. String classes with a const char* conversion operator are evil evil evil. Putting them together… priceless.