<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>#AltDevBlogADay &#187; Alex Darby</title>
	<atom:link href="http://www.altdevblogaday.com/author/alex-darby/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.altdevblogaday.com</link>
	<description>Each day a little more #gamedev love</description>
	<lastBuildDate>Wed, 22 May 2013 11:16:20 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>C/C++ Low Level Curriculum Part 12: Multiple Inheritance</title>
		<link>http://www.altdevblogaday.com/2013/05/22/cc-low-level-curriculum-part-12-multiple-inheritance/</link>
		<comments>http://www.altdevblogaday.com/2013/05/22/cc-low-level-curriculum-part-12-multiple-inheritance/#comments</comments>
		<pubDate>Wed, 22 May 2013 11:16:20 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[gamedev]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29405</guid>
		<description><![CDATA[<p>Hello, and welcome to the 12th part of the C / C++ low level curriculum. Really soon after part 11! (No, of course part 11 didn&#8217;t get too big and need to be split. Why would you ask?)</p>
<p><a href="http://www.altdevblogaday.com/2013/05/22/cc-low-level-curriculum-part-12-multiple-inheritance/" class="more-link">Read more on C/C++ Low Level Curriculum Part 12: Multiple Inheritance&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Hello, and welcome to the 12th part of the C / C++ low level curriculum. Really soon after part 11! (No, of course part 11 didn&#8217;t get too big and need to be split. Why would you ask?)</p>
<p>Last time we looked at the basics of how inheritance is implemented at the low level; this time we&#8217;re going to examine how using multiple inheritance affects this picture (<em><strong>note:</strong> </em>we&#8217;re leaving the keyword <em><strong>virtual</strong></em> until next time).</p>
<p><span id="more-29405"></span></p>
<h2>Before We Begin</h2>
<p>I will assume that you have already read the previous posts in the series, but I will also put in-line links to any important terms or concepts that you might need to know about to make sense of what you&#8217;re reading. I&#8217;m helpful like that.</p>
<p>Another big assumption I&#8217;m going to make is that you&#8217;re already very familiar with C++ and comfortable using the language features we&#8217;re discussing. If I need to demonstrate anything out of the ordinary I&#8217;ll explain it &#8211; or at least link to an explanation.</p>
<p>In this series I discuss what happens with vanilla unoptimised win32 debug code generated by the VS 2010 compiler. Whilst the specifics will differ on other platforms (and probably with other compilers), the general sweep of the code should be basically the same, because it is assembly that has been generated by a C++ compiler. Following the same examples with a source / disassembly debugger on your platform of choice should therefore provide you with the same insights we get here.</p>
<p>With this in mind, in case you missed them, here are the backlinks to the previous posts in the series:</p>
<ol>
<li><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/">http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/">http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/</a></li>
<li><a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/">http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/</a></li>
</ol>
<p>I won&#8217;t lie &#8211; it&#8217;s not light reading :)</p>
<p>&nbsp;</p>
<h2>I&#8217;ve just read post 11 and it all seemed pretty obvious&#8230;</h2>
<p>Good! It should do!</p>
<p>In my experience, most good solutions to problems appear obvious when explained well ;)</p>
<p>Now we&#8217;ve got an understanding of how single inheritance behaves, let&#8217;s examine how multiple inheritance affects this picture&#8230;</p>
<p>&nbsp;</p>
<h2>Multiple Inheritance</h2>
<p>Sample time!</p>
<p>Like last time I have <a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/ADBAD_CCPPLLC_12_00.zip">lovingly zipped up a hand-crafted VS2010 solution / project / source code combo</a> to go with this sample which contains the following code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">class CTestBaseOne
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iA<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iB<span style="color: #339933;">;</span>
&nbsp;
    CTestBaseOne<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iA<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">:</span> _iA<span style="color: #009900;">&#40;</span> iA <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iB<span style="color: #009900;">&#40;</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #993333;">int</span> SumBase<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">return</span> _iA <span style="color: #339933;">+</span> _iB<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
class CTestBaseTwo
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iC<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iD<span style="color: #339933;">;</span>
&nbsp;
    CTestBaseTwo<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iC<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">:</span> _iC<span style="color: #009900;">&#40;</span> iC <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iD<span style="color: #009900;">&#40;</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #993333;">int</span> SumBaseTwo<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">return</span> _iC <span style="color: #339933;">+</span> _iD<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
class CTestDerived
<span style="color: #339933;">:</span> public CTestBaseOne
<span style="color: #339933;">,</span> public CTestBaseTwo
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iE<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iF<span style="color: #339933;">;</span>
&nbsp;
    CTestDerived<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iA<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iB<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iC<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">:</span> CTestBaseOne <span style="color: #009900;">&#40;</span> iA<span style="color: #339933;">,</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> CTestBaseTwo <span style="color: #009900;">&#40;</span> iC<span style="color: #339933;">,</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iE <span style="color: #009900;">&#40;</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iF <span style="color: #009900;">&#40;</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #993333;">int</span> SumDerived<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">return</span> SumBase<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> SumBaseTwo<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> _iE <span style="color: #339933;">+</span> _iF<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    CTestBaseOne    cTestBaseOne<span style="color: #009900;">&#40;</span> argc<span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    CTestBaseTwo    cTestBaseTwo<span style="color: #009900;">&#40;</span> argc<span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    CTestDerived    cTestDerived<span style="color: #009900;">&#40;</span> argc<span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">3</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span>      cTestBaseOne.<span style="color: #202020;">SumBase</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>  <span style="color: #339933;">+</span> cTestBaseTwo.<span style="color: #202020;">SumBaseTwo</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> 
            <span style="color: #339933;">+</span>   cTestDerived.<span style="color: #202020;">SumBase</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>  <span style="color: #339933;">+</span> cTestDerived.<span style="color: #202020;">SumBaseTwo</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> cTestDerived.<span style="color: #202020;">SumDerived</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Once you&#8217;ve unzipped it, go ahead and build it.</p>
<p>Don&#8217;t forget to pay attention to the build output &#8211; it shows the memory layout which we&#8217;re going to talk about next.</p>
<p>&nbsp;</p>
<h2>Memory Layout</h2>
<p>As you can see, we now have two base classes, and one class that derives from both of them.</p>
<p>When you build the project, you should see that the memory layout of these classes looks like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="txt" style="font-family:monospace;">1&gt;  class CTestBaseOne	size(8):
1&gt;  	+---
1&gt;   0	| _iA
1&gt;   4	| _iB
1&gt;  	+---
1&gt;  
1&gt;  class CTestBaseTwo	size(8):
1&gt;  	+---
1&gt;   0	| _iC
1&gt;   4	| _iD
1&gt;  	+---
1&gt;   
1&gt;  class CTestDerived	size(24):
1&gt;  	+---
1&gt;  	| +--- (base class CTestBaseOne)
1&gt;   0	| | _iA
1&gt;   4	| | _iB
1&gt;  	| +---
1&gt;  	| +--- (base class CTestBaseTwo)
1&gt;   8	| | _iC
1&gt;  12	| | _iD
1&gt;  	| +---
1&gt;  16	| _iE
1&gt;  20	| _iF
1&gt;  	+---</pre></td></tr></table></div>

<p>This is all as we might expect &#8211; given what we found out <a href="http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/">last time</a>.</p>
<p>In particular note that:</p>
<ul>
<li>the memory layout of both base classes is embedded into <em><strong>CTestDerived</strong></em></li>
<li><em><strong>CTestBaseOne</strong></em> and <em><strong>CTestBaseTwo</strong></em> appear in the memory layout <em>in the same order they are declared in the base-specifier-list of</em> <em><strong>CTestDerived</strong></em>.</li>
</ul>
<p>n.b. the base-specifier-list is the part of a class declaration that lists its base classes; for <em><strong>CTestDerived</strong></em> it is <em>public CTestBaseOne, public CTestBaseTwo</em>.</p>
<p>In the simple case of single inheritance we considered in the <a href="http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/">last post</a>, we saw that the functions of a base class <em><strong>B</strong></em> could be called on instances of a derived class <em><strong>D</strong></em> because:</p>
<ul>
<li>the memory layout of <em><strong>D</strong></em> contains a literal instance of <em><strong>B</strong></em> at an offset of 0 bytes within itself and&#8230;</li>
<li>&#8230;this means that the member data of an instance of <em><strong>B</strong></em> is at the same offset relative to the memory layout of an instance of <em><strong>D</strong></em>&#8230;</li>
<li>&#8230;and so the hard coded offsets used to access these members within functions belonging to <em><strong>B</strong></em> are also valid for instances of <em><strong>D</strong></em></li>
</ul>
<p>Looking at the memory layout for this multiply inherited class we can see that:</p>
<ol>
<li>this relationship still holds for <em><strong>CTestBaseOne</strong></em> and <em><strong>CTestDerived</strong></em>: <em><strong>CTestBaseOne</strong></em> is at an offset of 0 bytes within the memory layout of <em><strong>CTestDerived</strong></em></li>
<li>however, this same relationship is not true of <em><strong>CTestBaseTwo</strong></em> and <em><strong>CTestDerived</strong></em></li>
</ol>
<p>Given this situation, how do functions of <em><strong>CTestBaseTwo</strong></em> work with instances of <em><strong>CTestDerived</strong></em>?</p>
<p>As usual the best thing to do is take a look&#8230;</p>
<p>&nbsp;</p>
<h2>Calling a function of CTestBaseTwo on CTestDerived</h2>
<p>Put a breakpoint on the return statement from <em><strong>main()</strong></em>, run the code, and when it stops, right-click and choose &#8216;Go To Disassembly&#8217;.</p>
<p>Rather than paste the disassembly as text this time, I&#8217;ve inserted a screenshot of my debugger window &#8211; this allows more formatting and highlighting options.</p>
<p><em>N.B. in this screenshot I have &#8220;Show symbol names&#8221; checked under viewing options. Whilst this typically makes it easier to relate disassembly to C or C++ code, it does hide detail (i.e. the addresses of the symbols).</em></p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/CCPPLLC_12Inheritance_P2_MultipleInheritance_00.png"><img class="alignnone  wp-image-29432" alt="CCPPLLC_12Inheritance_P2_MultipleInheritance_00" src="http://www.altdevblogaday.com/wp-content/uploads/2013/05/CCPPLLC_12Inheritance_P2_MultipleInheritance_00.png" width="632" height="484" /></a></p>
<p>Let&#8217;s pick this apart then, starting at the current line indicator where the breakpoint is and working down:</p>
<ul>
<li>we can see that (following the x86 <em>thiscall</em> convention) before each function is called, the address of the corresponding object is stored into <em><strong>ecx</strong></em> using <em><strong>lea</strong></em>.</li>
<li>First it loads the address of <em><strong>cTestDerived</strong> </em>into <em><strong>ecx</strong> </em>and then calls <em><strong>CTestDerived::SumDerived()</strong></em>&#8230;</li>
<li>then it&#8230;</li>
<li>Oh, wait&#8230; it&#8217;s loading the address <em><strong>[ebp-20h]</strong></em> into <em><strong>ecx&#8230;</strong></em></li>
<li>that symbol isn&#8217;t resolving in the disassembly window, so what witchcraft is this!?</li>
</ul>
<p>I have helpfully highlighted the most salient areas of the screenshot with red boxes :)</p>
<p>If you look at the function calls made in the disassembly and compare them to the calls in the C++ code, you will see that every high level function call has an analogue at the assembly level except for <em><strong>cTestDerived.SumBaseTwo()</strong></em>.</p>
<p><em><strong>CTestBaseTwo::SumBaseTwo </strong></em>is getting called, but with <em><strong>[ebp-20h]</strong></em> as the <em><strong>this</strong></em> pointer in <em><strong>ecx</strong></em>, not <em><strong>[cTestDerived] </strong></em>(n.b. see the top red box in the screenshot).</p>
<p>So, the question is: how does the address <em><strong>[ebp-20h]</strong></em> relate to the address of <em><strong>cTestDerived</strong></em>?</p>
<p>This would be a good time to reiterate that the watch window is your friend. We can use the watch window to Sherlock Holmes our way to an answer.</p>
<p>If you look in the watch window below the disassembly view (shown again below by itself for those of you who are vertical resolution challenged) you can see that I have used watch window expression evaluation to find out some information about these values:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/CCPPLLC_12Inheritance_P2_MultipleInheritance_WatchCasting_00.png"><img class="alignnone  wp-image-29434" alt="CCPPLLC_12Inheritance_P2_MultipleInheritance_WatchCasting_00" src="http://www.altdevblogaday.com/wp-content/uploads/2013/05/CCPPLLC_12Inheritance_P2_MultipleInheritance_WatchCasting_00.png" width="782" height="112" /></a></p>
<p>This shows us that:</p>
<ul>
<li>the address of <em><strong>cTestDerived</strong></em> is 0x0048fa84&#8230;</li>
<li>&#8230;and the address of <em><strong>cTestDerived</strong></em> cast to a pointer to <em><strong>CTestBaseOne</strong></em> is the same address&#8230;</li>
<li>&#8230;but when the address of <em><strong>cTestDerived</strong></em> is cast to a pointer to <em><strong>CTestBaseTwo</strong></em> we get 0x0048fa8c&#8230;</li>
<li>&#8230;which is the same value as <em><strong>[ebp-20h]</strong></em>&#8230;</li>
<li>&#8230;or an 8 byte offset from the address of <em><strong>cTestDerived</strong></em>&#8230;</li>
<li>&#8230;which is the offset of <em><strong>CTestBaseTwo</strong></em> within <em><strong>CTestDerived</strong></em></li>
</ul>
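<p>If you want to reproduce the watch window experiment without a debugger, the following minimal sketch performs the same casts in code and measures how far each derived-to-base conversion moves the address. It assumes cut-down copies of the sample classes (constructors omitted for brevity); <em>BaseOffset()</em> and <em>PrintBaseAddresses()</em> are hypothetical helpers, and the exact offsets are a property of the compiler&#8217;s layout rather than something the C++ standard guarantees.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Same data layout as the sample classes (constructors omitted for brevity).
class CTestBaseOne { public: int _iA; int _iB; };
class CTestBaseTwo { public: int _iC; int _iD; };
class CTestDerived : public CTestBaseOne, public CTestBaseTwo { public: int _iE; int _iF; };

// Hypothetical helper: how many bytes the implicit Derived* -> Base* conversion
// moves the pointer, i.e. the offset of the Base sub-object inside Derived.
template <typename Base, typename Derived>
std::ptrdiff_t BaseOffset( Derived* pDerived )
{
    Base* pBase = pDerived; // the compiler inserts any pointer adjustment here
    return reinterpret_cast<char*>( pBase ) - reinterpret_cast<char*>( pDerived );
}

// Hypothetical helper: prints the same three values as the watch window.
void PrintBaseAddresses( CTestDerived& rDerived )
{
    std::printf( "&cTestDerived                = %p\n", static_cast<void*>( &rDerived ) );
    std::printf( "(CTestBaseOne*)&cTestDerived = %p\n",
                 static_cast<void*>( static_cast<CTestBaseOne*>( &rDerived ) ) );
    std::printf( "(CTestBaseTwo*)&cTestDerived = %p\n",
                 static_cast<void*>( static_cast<CTestBaseTwo*>( &rDerived ) ) );
}
```

<p>On typical x86 compilers <em>BaseOffset&lt;CTestBaseTwo&gt;()</em> returns 8, because <em><strong>CTestBaseTwo</strong></em> sits directly after the 8 bytes of <em><strong>CTestBaseOne</strong></em>; this matches the 0x0048fa84 / 0x0048fa8c pair above.</p>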
<p>&nbsp;</p>
<h2>Should this be surprising?</h2>
<p>Here&#8217;s the memory layout of <em><strong>CTestDerived </strong></em>again:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="txt" style="font-family:monospace;">1&gt;  class CTestDerived	size(24):
1&gt;  	+---
1&gt;  	| +--- (base class CTestBaseOne)
1&gt;   0	| | _iA
1&gt;   4	| | _iB
1&gt;  	| +---
1&gt;  	| +--- (base class CTestBaseTwo)
1&gt;   8	| | _iC
1&gt;  12	| | _iD
1&gt;  	| +---
1&gt;  16	| _iE
1&gt;  20	| _iF
1&gt;  	+---</pre></td></tr></table></div>

<p>Since we know that:</p>
<ul>
<li>(within non-static member functions) member variables are accessed via constant offsets from the <em><strong>this</strong></em> pointer</li>
<li>the memory for <em><strong>CTestBaseTwo</strong></em> starts at an offset of 8 bytes from the start of the memory layout of an instance of <em><strong>CTestDerived</strong></em></li>
</ul>
<p>it follows that <em><strong>CTestBaseTwo::SumBaseTwo()</strong></em> wouldn&#8217;t work if the compiler passed the address of an instance of <em><strong>CTestDerived </strong></em>because the constant offsets used to access the members of <em><strong>CTestBaseTwo</strong></em> would be off by 8 bytes.</p>
<p>Consequently, any time a <em><strong>CTestBaseTwo</strong></em> member function is called on an instance of <em><strong>CTestDerived</strong></em>, the compiler must generate a compatible <em><strong>this</strong></em> pointer to pass to the function, i.e. one pointing at the start address of <em><strong>CTestBaseTwo</strong></em> within the instance of <em><strong>CTestDerived</strong></em>.</p>
<p>Frighteningly obvious once you know, isn&#8217;t it?</p>
<p>I honestly don&#8217;t think it should be surprising though &#8211; given the way that we know data within user defined types is accessed at the assembly level (see <a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">part 10</a>), it pretty much had to work like this.</p>
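<p>One way to make the pointer adjustment concrete is to write out by hand what a member function call boils down to. The sketch below is an illustration under simplifying assumptions, not the compiler&#8217;s actual output: <em>SumBaseTwo_Explicit()</em> is a hypothetical free function that plays the role of <em><strong>CTestBaseTwo::SumBaseTwo()</strong></em> with its <em><strong>this</strong></em> pointer made explicit, and the hypothetical <em>CallSumBaseTwo()</em> shows where the adjusted pointer comes from.</p>

```cpp
#include <cassert>

// Cut-down copies of the sample classes (constructors omitted for brevity).
class CTestBaseOne { public: int _iA; int _iB; };
class CTestBaseTwo { public: int _iC; int _iD; };
class CTestDerived : public CTestBaseOne, public CTestBaseTwo { public: int _iE; int _iF; };

// Hypothetical stand-in for CTestBaseTwo::SumBaseTwo(): the member reads are
// constant offsets from pThis, so pThis must point at the start of a CTestBaseTwo.
int SumBaseTwo_Explicit( const CTestBaseTwo* pThis )
{
    return pThis->_iC + pThis->_iD;
}

// Hypothetical caller: the implicit CTestDerived* -> CTestBaseTwo* conversion is
// where the compiler adds the offset of the CTestBaseTwo sub-object (8 bytes in
// the layout above), as it does before calling cTestDerived.SumBaseTwo().
int CallSumBaseTwo( const CTestDerived* pDerived )
{
    const CTestBaseTwo* pThisForBaseTwo = pDerived;
    return SumBaseTwo_Explicit( pThisForBaseTwo );
}
```

<p>For example, with <em><strong>_iA</strong></em> to <em><strong>_iF</strong></em> set to 1 to 6, <em>CallSumBaseTwo()</em> returns 3 + 4 = 7; reading the same offsets from the unadjusted address would instead have picked up <em><strong>_iA</strong></em> and <em><strong>_iB</strong></em>.</p>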
<p>&nbsp;</p>
<h2>&#8230;one more little thing</h2>
<p>In the example above, <em><strong>cTestDerived</strong> </em>is a <a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">Stack</a> variable &#8211; so the compiler knows exactly where it is located in the current <a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">Stack Frame</a>.</p>
<p>This means that the compiler can calculate the address of the instance of <em><strong>CTestBaseTwo</strong></em> within <em><strong>cTestDerived</strong></em> at compile time, and can therefore access it at no extra cost compared to any other Stack variable.</p>
<p>We should probably check whether this is any different when we&#8217;re dealing with a pointer to a <em><strong>CTestDerived</strong></em> at an arbitrary point in memory, just to be thorough.</p>
<p>Luckily I have already thought of this :)</p>
<p>If you place a breakpoint on the return statement of <em><strong>CTestDerived::SumDerived()</strong></em> you can check the disassembly yourself, but here are the relevant lines from my disassembly window:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">   <span style="color: #ff0000;">52</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> SumDerived<span style="color: #009900; font-weight: bold;">&#40;</span> void <span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">53</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
<span style="color: #adadad; font-style: italic;">001010A0</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">esi</span>  
<span style="color: #adadad; font-style: italic;">001010A1</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">esi</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>  
<span style="color: #adadad; font-style: italic;">001010A3</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">edi</span>  
    <span style="color: #ff0000;">54</span><span style="color: #339933;">:</span>         return SumBase<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #339933;">+</span> SumBaseTwo<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #339933;">+</span>_iE <span style="color: #339933;">+</span> _iF<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">001010A4</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esi</span><span style="color: #339933;">+</span><span style="color: #ff0000;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">001010A7</span>  <span style="color: #00007f; font-weight: bold;">call</span>        CTestBaseTwo<span style="color: #339933;">::</span>SumBaseTwo <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">101050h</span><span style="color: #009900; font-weight: bold;">&#41;</span></pre></td></tr></table></div>

<p>As you should be able to see by now, the code in this function adds a constant offset of 8 bytes onto the <em><strong>this</strong></em> pointer it was passed (copied into <em><strong>esi</strong></em>) to generate the <em><strong>this</strong></em> pointer it passes to <em><strong>CTestBaseTwo::SumBaseTwo()</strong></em>.</p>
<p>If you&#8217;re having trouble seeing it, remember that the win32 &#8216;<em>thiscall</em>&#8217; member function calling convention uses <em><strong>ecx</strong></em> to pass the <em><strong>this</strong></em> pointer.</p>
<p>Most significantly, looking back to <a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">part 10</a>, we can see that this is essentially the same way that member variables are accessed through a pointer to an instance of a user defined type. In fact, <em>at the level of the assembly code</em>, <em>there is really no difference between a member variable and a base class</em>; this distinction is only meaningful at the level of the C++ code.</p>
<p>We now also know that multiple inheritance can add a small cost in pointer arithmetic when calling member functions of any base class that sits at a non-zero offset within the derived class&#8217;s memory layout.</p>
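<p>To convince yourself that this adjustment is the same constant wherever the object lives, here is a minimal sketch comparing a Stack instance with one allocated on the heap. <em>BaseTwoOffset()</em> and <em>OffsetIsSameOnStackAndHeap()</em> are hypothetical helpers, and the 8 byte value seen with typical x86 compilers is a layout detail rather than a language guarantee.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Cut-down copies of the sample classes (constructors omitted for brevity).
class CTestBaseOne { public: int _iA; int _iB; };
class CTestBaseTwo { public: int _iC; int _iD; };
class CTestDerived : public CTestBaseOne, public CTestBaseTwo { public: int _iE; int _iF; };

// Hypothetical helper: the byte offset the compiler applies when converting
// pDerived to a CTestBaseTwo* (the "lea ecx,[esi+8]" in the disassembly above).
std::ptrdiff_t BaseTwoOffset( CTestDerived* pDerived )
{
    CTestBaseTwo* pBaseTwo = pDerived;
    return reinterpret_cast<char*>( pBaseTwo ) - reinterpret_cast<char*>( pDerived );
}

// Hypothetical check: the offset does not depend on where the object lives.
bool OffsetIsSameOnStackAndHeap()
{
    CTestDerived cOnStack{};
    std::unique_ptr<CTestDerived> pOnHeap( new CTestDerived() );
    return BaseTwoOffset( &cOnStack ) == BaseTwoOffset( pOnHeap.get() );
}
```

<p>Because the offset depends only on the class layout, the compiler can bake it into the instruction stream as an immediate value, which is exactly what the <em><strong>lea</strong></em> above does.</p>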
<p>&nbsp;</p>
<h2>What was that earlier about declaration order?</h2>
<p>If you&#8217;ve been paying attention, you should have noticed that when we looked at the memory layout of <em><strong>CTestDerived</strong></em> I mentioned in passing that the ordering of <em><strong>CTestBaseOne</strong></em> and <em><strong>CTestBaseTwo</strong></em> within it matches the textual order they were listed in its base-specifier-list.</p>
<p>This is obviously significant, since it implies that if the textual order in which <em><strong>CTestBaseOne</strong></em> and <em><strong>CTestBaseTwo </strong></em>are listed changes, then the memory layout of <em><strong>CTestDerived </strong></em>will change to reflect this.</p>
<p>If you swap the order of <em><strong>CTestBaseOne</strong></em> and <em><strong>CTestBaseTwo</strong></em> around, here&#8217;s the memory layout printed during the build process:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="txt" style="font-family:monospace;">1&gt;  class CTestDerived	size(24):
1&gt;  	+---
1&gt;  	| +--- (base class CTestBaseTwo)
1&gt;   0	| | _iC
1&gt;   4	| | _iD
1&gt;  	| +---
1&gt;  	| +--- (base class CTestBaseOne)
1&gt;   8	| | _iA
1&gt;  12	| | _iB
1&gt;  	| +---
1&gt;  16	| _iE
1&gt;  20	| _iF
1&gt;  	+---
1&gt;</pre></td></tr></table></div>

<p>Given what we have discovered so far, we can see that with this new memory layout a pointer to a <em><strong>CTestDerived</strong></em> can be used as a pointer to its <em><strong>CTestBaseTwo</strong></em> without any adjustment.</p>
<p>We can also see that with this new layout, the compiler would need to adjust <em><strong>CTestDerived</strong></em> pointers by 8 bytes in order to call <em><strong>CTestBaseOne</strong></em> functions.</p>
<p>I leave it as an exercise for you, o budding expert reader of x86 disassembly, to check this for yourself :)</p>
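<p>If you would also like to check the swapped layout from code rather than from the build output, the sketch below declares a hypothetical <em>CTestDerivedSwapped</em> with the base-specifier-list reversed and measures where each base sub-object lands. As before, the specific offsets reflect how typical compilers lay out these classes rather than a guarantee from the standard.</p>

```cpp
#include <cassert>
#include <cstddef>

// Cut-down copies of the sample base classes (constructors omitted for brevity).
class CTestBaseOne { public: int _iA; int _iB; };
class CTestBaseTwo { public: int _iC; int _iD; };

// Hypothetical variant of CTestDerived with the base classes listed in the
// opposite order, so CTestBaseTwo comes first in the memory layout.
class CTestDerivedSwapped : public CTestBaseTwo, public CTestBaseOne
{
public:
    int _iE;
    int _iF;
};

// Byte offset of the Base sub-object within a CTestDerivedSwapped.
template <typename Base>
std::ptrdiff_t OffsetInSwapped( CTestDerivedSwapped* pDerived )
{
    Base* pBase = pDerived; // implicit derived-to-base conversion
    return reinterpret_cast<char*>( pBase ) - reinterpret_cast<char*>( pDerived );
}
```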
<p>&nbsp;</p>
<h2>Aside: construction and destruction with single inheritance</h2>
<p>Something we entirely skipped past in the last post was construction and destruction of inherited types.</p>
<p>This was intentional &#8211; construction and destruction behaviour is straightforward with single inheritance.</p>
<p>We should all know the expected high-level behaviour for single inheritance (of arbitrary depth); in summary:</p>
<ul>
<li>each constructor calls the constructor of its base class before executing its own body &#8211; i.e. classes are constructed in order &#8220;<em>from inside to out</em>&#8221; or &#8220;<em>least to most derived</em>&#8221;.</li>
<li>destructors do the opposite: each destructor executes its own body before calling the destructor of its base class &#8211; i.e. classes are destroyed in order &#8220;<em>from outside to in</em>&#8221; or &#8220;<em>most to least derived</em>&#8221;.</li>
</ul>
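<p>If you&#8217;d like to see this order without stepping through disassembly, here is a small sketch using a hypothetical three-level chain (<em><strong>CLeast</strong></em>, <em><strong>CMiddle</strong></em>, <em><strong>CMost</strong></em>). Each constructor and destructor appends to a log. Note that it records when each <em>body</em> runs, which is the behaviour described above.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-level single inheritance chain whose constructors and
// destructors record when their bodies run.
std::vector<std::string> g_aLog;

class CLeast
{
public:
    CLeast()  { g_aLog.push_back("CLeast()"); }
    ~CLeast() { g_aLog.push_back("~CLeast()"); }
};

class CMiddle : public CLeast
{
public:
    CMiddle()  { g_aLog.push_back("CMiddle()"); }
    ~CMiddle() { g_aLog.push_back("~CMiddle()"); }
};

class CMost : public CMiddle
{
public:
    CMost()  { g_aLog.push_back("CMost()"); }
    ~CMost() { g_aLog.push_back("~CMost()"); }
};

// Creates and destroys one CMost and returns the recorded call order.
std::vector<std::string> ConstructAndDestroy()
{
    g_aLog.clear();
    {
        CMost cMost;
    }   // cMost goes out of scope here, running the destructors
    return g_aLog;
}
```

<p><em><strong>ConstructAndDestroy()</strong></em> should return CLeast(), CMiddle(), CMost(), ~CMost(), ~CMiddle(), ~CLeast(), in that order. This ordering is required by the language, so unlike the memory layout it does not depend on the compiler.</p>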
<p>The disassembly matches the high level behaviour in a very straightforward way and I leave it as an exercise for the reader to step through the disassembly of construction &amp; destruction in some test code to see this in action.</p>
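<p>Like the rest of the behaviour we&#8217;ve discovered so far, when you think about it, it&#8217;s pretty obvious that this sort of &#8216;stack-like&#8217; construction / destruction behaviour is required to make inheritance work correctly. A derived class&#8217;s constructor and destructor are allowed to use the members of its base class, so the base part of the object must be fully constructed before the derived constructor body runs. For the same reason, it must not be destroyed until the derived destructor body has finished.</p>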
<p>&nbsp;</p>
<h2>Construction and destruction with multiple inheritance</h2>
<p>It was pretty obvious that we were coming to this, right?</p>
<p>What happens with construction and destruction when multiple inheritance is involved is less simple.</p>
<p>For example, what order do the constructors of multiple base classes get called in? &#8230; and what order do their destructors get called in?</p>
<p>We can also assume that, since constructors and destructors are member functions, there must be some fiddling with <em><strong>this</strong></em> pointers during this process too.</p>
<p>Luckily, this is very easy to determine empirically. We can just add some text output to the constructors and destructors of the sample classes that prints the name of the function and the value of its <em><strong>this</strong></em> pointer.</p>
<p>Here&#8217;s a <a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/ADBAD_CCPPLLC_12_01.zip">link to a VS2010 project I prepared earlier to do just that</a>; it just adds a little extra code to the original example.</p>
<p>Below is the command line output produced when it is run:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/CCPPLLC_12Inheritance_P2_ConDestructionThisPointerFixing00.png"><img class="alignnone  wp-image-29450" alt="CCPPLLC_12Inheritance_P2_ConDestructionThisPointerFixing00" src="http://www.altdevblogaday.com/wp-content/uploads/2013/05/CCPPLLC_12Inheritance_P2_ConDestructionThisPointerFixing00.png" width="569" height="287" /></a></p>
<p>You can see that:</p>
<ul>
<li>the base class constructors are called in the textual order in which the base classes appear in the base-specifier-list of <strong><em>CTestDerived</em></strong> (i.e. <strong><em>CTestBaseOne</em></strong> and then <strong><em>CTestBaseTwo</em></strong>), and the body of the most derived class&#8217;s constructor (<strong><em>CTestDerived</em></strong>) runs last.</li>
<li>destructors are called in the opposite order &#8211; this ensures that work done in the constructors is undone in reverse order.</li>
<li>the <em><strong>this</strong></em> pointers are adjusted for the constructor and destructor of <em><strong>CTestBaseTwo</strong></em>, just as they were when we were calling regular member functions.</li>
</ul>
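<p>If you don&#8217;t have VS2010 to hand, the same experiment can be approximated portably. This is a minimal stand-in sketch, not the code in the linked project. <em><strong>Record</strong></em>, <em><strong>g_aEvents</strong></em> and <em><strong>RunConstructionTest</strong></em> are hypothetical names, and the classes are assumed to hold two <em><strong>int</strong></em>s each. Each constructor and destructor logs its <em><strong>this</strong></em> pointer, and the results are converted to offsets from the start of the <em><strong>CTestDerived</strong></em> object so they can be compared from run to run.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Each entry records which constructor / destructor ran and the raw value of
// its 'this' pointer (hypothetical logging helpers, not the linked project).
struct SEvent
{
    std::string    strName;
    std::uintptr_t uThis;
};

std::vector<SEvent> g_aEvents;

void Record(const char* pszName, const void* pThis)
{
    g_aEvents.push_back({ pszName, reinterpret_cast<std::uintptr_t>(pThis) });
}

class CTestBaseOne
{
public:
    int _iA = 0;
    int _iB = 0;
    CTestBaseOne()  { Record("CTestBaseOne::CTestBaseOne", this); }
    ~CTestBaseOne() { Record("CTestBaseOne::~CTestBaseOne", this); }
};

class CTestBaseTwo
{
public:
    int _iC = 0;
    int _iD = 0;
    CTestBaseTwo()  { Record("CTestBaseTwo::CTestBaseTwo", this); }
    ~CTestBaseTwo() { Record("CTestBaseTwo::~CTestBaseTwo", this); }
};

class CTestDerived : public CTestBaseOne, public CTestBaseTwo
{
public:
    int _iE = 0;
    int _iF = 0;
    CTestDerived()  { Record("CTestDerived::CTestDerived", this); }
    ~CTestDerived() { Record("CTestDerived::~CTestDerived", this); }
};

// Builds and destroys one CTestDerived, then returns each call paired with the
// offset of its 'this' pointer from the start of the CTestDerived object.
std::vector<std::pair<std::string, std::ptrdiff_t>> RunConstructionTest()
{
    g_aEvents.clear();
    std::uintptr_t uStart = 0;
    {
        CTestDerived cDerived;
        uStart = reinterpret_cast<std::uintptr_t>(&cDerived);
    }
    std::vector<std::pair<std::string, std::ptrdiff_t>> aResult;
    for (const SEvent& rEvent : g_aEvents)
    {
        aResult.emplace_back(rEvent.strName,
                             static_cast<std::ptrdiff_t>(rEvent.uThis - uStart));
    }
    return aResult;
}
```

<p>With 4-byte ints on a compiler that lays the bases out in declaration order, the <em><strong>CTestBaseTwo</strong></em> entries should show an offset of 8 and everything else 0. The call order itself is mandated by the standard (see the appendix); the offsets are not.</p>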
<p>At this point you should feel free to swap around the order of <em><strong>CTestBaseOne</strong></em> and <em><strong>CTestBaseTwo</strong></em> in <em><strong>CTestDerived</strong></em>&#8217;s base-specifier-list to check that construction and destruction follow the same rules as the ordering of base types in the derived type&#8217;s memory layout (they do, I promise).</p>
<p>&nbsp;</p>
<h2>Summary</h2>
<p>That&#8217;s it for this time and it was massive! I bet you&#8217;re glad I split this off from post 11 now :)</p>
<p>The astute amongst you will have noticed that we have not looked at any code using the keyword <em><strong>virtual</strong></em>. This is entirely deliberate: that&#8217;s for next time.</p>
<p>So, let&#8217;s recap what we&#8217;ve discovered so far about inheritance&#8230;</p>
<p>First, what we learned about single inheritance in <a href="http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/">part 11</a>:</p>
<p style="padding-left: 30px;">1) We know that the memory layout of user defined types is fixed at compile time&#8230;</p>
<p style="padding-left: 30px;">2) &#8230;and so code accessing a data member of a user defined type can use a constant offset relative to the start address of an instance of the type (see <a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">part 10</a>).</p>
<p style="padding-left: 30px;">3) In single inheritance, the memory layout of a derived type <em><strong>D</strong></em> literally extends that of its base type <em><strong>B.</strong></em></p>
<p style="padding-left: 30px;">4) This ensures that the inherited members of <em><strong>B</strong></em> are at the same constant offsets relative to the start address of an instance of <em><strong>D</strong></em> as they would be relative to the start address of an instance of <em><strong>B</strong></em>&#8230;</p>
<p style="padding-left: 30px;"><em><strong></strong></em>5) &#8230;which means that a pointer to an instance of type <em><strong>D</strong> </em>can safely be treated as a pointer to an instance of type <em><strong>B</strong></em>&#8230;</p>
<p style="padding-left: 30px;">6) &#8230;which in turn guarantees that member functions of type <em><strong>B</strong></em> can safely be called on instances of type <em><strong>D.</strong></em></p>
<p>&nbsp;</p>
<p>We&#8217;ve also discovered that if a derived class <em><strong>D</strong></em> inherits from multiple base types <em><strong>A</strong></em> and <em><strong>B</strong></em>, then this multiple inheritance somewhat breaks the convenience of the single inheritance approach:</p>
<p style="padding-left: 30px;">7) As with single inheritance, the memory layout of an instance of <em><strong>D</strong></em> contains the member data of both <em><strong>A</strong></em> and <em><strong>B</strong></em>, laid out exactly as it was in each base class.</p>
<p style="padding-left: 30px;">8) Member functions of both type <em><strong>A</strong> </em>and type <em><strong>B</strong> </em>will use constant offsets relative to the their this pointers to access their data members.</p>
<p style="padding-left: 30px;">9) Logically; only either <em><strong>A</strong></em> or <em><strong>B</strong></em> may have an offset of 0 bytes within the memory layout of an instance of <em><strong>D</strong></em>&#8230;</p>
<p style="padding-left: 30px;">10) &#8230; consequently a pointer to an instance of type <em><strong>D</strong></em> can only be safely treated as a pointer to whichever of <em><strong>A</strong> </em>or <em><strong>B</strong></em> has a 0 byte offset within its memory layout</p>
<p style="padding-left: 30px;">11) Which base type has a 0-byte offset is determined by the textual ordering of the <em><strong>A</strong></em> and <em><strong>B</strong></em> types within the base-specifier-list of <em><strong>D</strong></em>&#8216;s class declaration</p>
<p style="padding-left: 30px;">12) &#8230;if <em><strong>A</strong></em> were at the 0 byte offset within <em><strong>D</strong></em>, the compiler would need to calculate a compatible &#8216;this&#8217; pointer whenever a member function of <em><strong>B</strong></em> called on an instance of <em><strong>D </strong></em>(and vice versa)</p>
<p style="padding-left: 30px;">13) &#8230;when an instance of <strong><em>D</em></strong> is created, the instances of <em><strong>A</strong></em> and <em><strong>B</strong></em> contained within its memory layout will be constructed by their own constructors before the constructor for <em><strong>D</strong> </em>is called, and&#8230;</p>
<p style="padding-left: 30px;">14) &#8230;the order in which <em><strong>A</strong> </em>and <em><strong>B</strong></em> are constructed depends on their textual ordering within the base-specifier-list in <em><strong>D</strong></em>&#8216;s class declaration (i.e. they will be constructed in memory offset order).</p>
<p>&nbsp;</p>
<h3 style="padding-left: 30px;">Disclaimer</h3>
<p style="padding-left: 30px;">The above numbered bullet points are facts we have discovered empirically by examining the behaviour of win32 x86 code created by the Visual Studio 2010 compiler.</p>
<p style="padding-left: 30px;">Do not assume that code generated by other platforms / compilers will behave identically. It should behave very similarly, but you should check.</p>
<p style="padding-left: 30px;">In general, for any <a href="http://en.wikipedia.org/wiki/C%2B%2B11#Modification_to_the_definition_of_plain_old_data">POD type</a> you should be able to save out its memory to file as binary and load it back into the memory of a different instance of the same type.</p>
<p style="padding-left: 30px;">Obviously, this will hold true <a href="http://en.wikipedia.org/wiki/If_and_only_if">iff</a> you don&#8217;t change target platform, compiler, compiler options, alignment specifications, or the declaration of the type.</p>
<h3 style="padding-left: 60px;"></h3>
<h3 style="padding-left: 30px;">Thanks</h3>
<p style="padding-left: 30px;">Thanks for peer review go out to Bruce Dawson and Amir Embrahimi; and for general #altdevblogaday admin assistance to Luke Dicken.</p>
<p>&nbsp;</p>
<h2>Appendix: What does the C++ standard have to say about all this?</h2>
<p>I spent some time reading the C++11 ISO standard (or a near-final revision of it, at least), but even after consulting the source it is not 100% clear to me exactly what is and what isn&#8217;t guaranteed by the standard &#8211; see below for more information.</p>
<p>Below is a collection of snippets of information I found in <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf">this near-final draft of the ISO C++11 standard</a> when I was looking up various bits for this post (you have to pay ANSI for the actual one!).</p>
<p>If you want more information on any of the points below, click the link above to download the .pdf and search for the indented text &#8211; that will get you to the page it was on. This document does not make for light reading!</p>
<p>&nbsp;</p>
<h3><strong>1. Ordering of base classes in memory within a multiply inherited class.</strong></h3>
<p>As far as I can tell, this is not guaranteed by the standard &#8211; in fact the draft standard says this:</p>
<p style="padding-left: 30px;"><strong>10.1 Multiple base classes [class.mi]</strong><br />
1 A class can be derived from any number of base classes. [Note: The use of more than one direct base class<br />
is often called multiple inheritance. — end note ] [Example:<br />
class A { /∗ ... ∗/ };<br />
class B { /∗ ... ∗/ };<br />
class C { /∗ ... ∗/ };<br />
class D : public A, public B, public C { /∗ ... ∗/ };<br />
— end example ]</p>
<p style="padding-left: 30px;">2 [Note: The order of derivation is not signiﬁcant except as speciﬁed by the semantics of initialization by<br />
constructor (12.6.2), cleanup (12.4), and storage layout (9.2, 11.1). — end note ]</p>
<p>which more or less says that it&#8217;s not guaranteed.</p>
<p>&nbsp;</p>
<h3><strong>2. Ordering of data members within a class.</strong></h3>
<p>So, it appears that a C++ compiler <em><strong>is</strong> </em>allowed to reorder data members of a class in memory vs. their textual declaration order if (and only if) their access control (i.e. public, private, protected) is different:</p>
<p style="padding-left: 30px;"><strong>15</strong> Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with diﬀerent access control is unspeciﬁed (11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other;</p>
<p>I can&#8217;t imagine this would ever be a problem for you unless you&#8217;re writing a reflection library or similar.</p>
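<p>Here is a small sketch of what is and isn&#8217;t promised by the wording above, using a hypothetical class <em><strong>CMixedAccess</strong></em>. The two public members must be in declaration order. The placement of the private member is up to the implementation; in practice compilers generally keep declaration order, but nothing in the quoted text requires it.</p>

```cpp
#include <cassert>

// Hypothetical class mixing access specifiers.
class CMixedAccess
{
public:
    int _iPublicFirst = 0;
    int _iPublicSecond = 0;

    // Guaranteed by the quoted rule: same access control, so the later
    // member has the higher address.
    bool PublicMembersInOrder() const { return &_iPublicFirst < &_iPublicSecond; }

    // Not guaranteed by the quoted wording: where _iPrivate sits relative to
    // the public members is unspecified, so this only reports what your
    // compiler happened to do.
    bool PrivateAfterPublic() const { return &_iPrivate > &_iPublicSecond; }

private:
    int _iPrivate = 0;
};
```

<p><em><strong>PublicMembersInOrder()</strong></em> must return true on a conforming compiler. Treat the result of <em><strong>PrivateAfterPublic()</strong></em> as an observation about your compiler rather than a rule.</p>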
<p>&nbsp;</p>
<h3><strong>3. Ordering of constructor calls when constructing types with base types</strong></h3>
<p>Thankfully, there does appear to be some sanity remaining in the universe, as I managed to find the part of the standard that specifies the order in which constructors are called for types that use inheritance:</p>
<p style="padding-left: 30px;">In a non-delegating constructor, initialization proceeds in the following order:<br />
— First, and only for the constructor of the most derived class (1.8), virtual base classes are initialized in the order they appear on a depth-ﬁrst left-to-right traversal of the directed acyclic graph of base classes, where “left-to-right” is the order of appearance of the base classes in the derived class base-speciﬁer-list.<br />
— Then, direct base classes are initialized in declaration order as they appear in the base-speciﬁer-list (regardless of the order of the mem-initializers).<br />
— Then, non-static data members are initialized in the order they were declared in the class deﬁnition (again regardless of the order of the mem-initializers).<br />
— Finally, the compound-statement of the constructor body is executed.<br />
[Note: The declaration order is mandated to ensure that base and member subobjects are destroyed in the<br />
reverse order of initialization. — end note ]</p>
<p>TL;DR &#8211; (if you are not using virtual base classes) each constructor initialises its base classes in declaration order, then the class members in declaration order, and then the constructor&#8217;s body is executed.</p>
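<p>The &#8220;regardless of the order of the mem-initializers&#8221; part is easy to demonstrate. In this sketch, where all names are hypothetical, the mem-initializer list is written in reverse order, yet the recorded order still follows the declarations.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_aOrder;

// Hypothetical types that record when they are initialised.
class CLogged
{
public:
    explicit CLogged(const char* pszName) { g_aOrder.push_back(pszName); }
};

class CBaseOne { public: CBaseOne() { g_aOrder.push_back("CBaseOne"); } };
class CBaseTwo { public: CBaseTwo() { g_aOrder.push_back("CBaseTwo"); } };

class CDerived : public CBaseOne, public CBaseTwo
{
public:
    CLogged _first;
    CLogged _second;

    // The mem-initializers are deliberately listed in reverse; initialisation
    // still follows declaration order (GCC and Clang warn about this with -Wreorder).
    CDerived()
    : _second("_second")
    , _first("_first")
    , CBaseTwo()
    , CBaseOne()
    {
        g_aOrder.push_back("CDerived body");
    }
};

// Constructs one CDerived and returns the order in which things were initialised.
std::vector<std::string> ConstructDerived()
{
    g_aOrder.clear();
    CDerived cDerived;
    (void)cDerived;
    return g_aOrder;
}
```

<p><em><strong>ConstructDerived()</strong></em> should return CBaseOne, CBaseTwo, _first, _second, then CDerived body. Because the written order is ignored, it&#8217;s good practice to keep mem-initializer lists in declaration order so the code reads the way it runs.</p>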
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/05/22/cc-low-level-curriculum-part-12-multiple-inheritance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C/C++ Low Level Curriculum Part 11: Inheritance</title>
		<link>http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/</link>
		<comments>http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/#comments</comments>
		<pubDate>Fri, 03 May 2013 21:06:20 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[gamedev]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29256</guid>
		<description><![CDATA[<p>Hello, and welcome to the 11th part of the C / C++ low level curriculum. About time? Definitely!</p>
<p><a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">Last time</a> we looked at the basics of User Defined Types: how structs, classes, and unions are laid out in memory; and (some of) the implications of memory alignment on this picture.</p>
<p><a href="http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/" class="more-link">Read more on C/C++ Low Level Curriculum Part 11: Inheritance&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Hello, and welcome to the 11th part of the C / C++ low level curriculum. About time? Definitely!</p>
<p><a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">Last time</a> we looked at the basics of User Defined Types: how structs, classes, and unions are laid out in memory; and (some of) the implications of memory alignment on this picture.</p>
<p>In part 11 we&#8217;re going to look at how inheritance affects this picture, in particular the implications for memory layout of derived types and also for their behaviour during construction and destruction (<em><strong>note:</strong> </em>we&#8217;re leaving multiple inheritance and the keyword <em><strong>virtual</strong></em> out of this picture to start with).<span id="more-29256"></span></p>
<p>&nbsp;</p>
<h2>Before We Begin</h2>
<p>I will assume that you have already read the previous posts in the series, but I will also put in-line links to any important terms or concepts that you might need to know about to make sense of what you&#8217;re reading. I&#8217;m helpful like that.</p>
<p>Another big assumption I&#8217;m going to make is that you&#8217;re already very familiar with the C++ language and  comfortable using the language features we&#8217;re discussing, as well as the accepted usage limitations of those features etc. If I need to demonstrate anything out of the ordinary I&#8217;ll explain it &#8211; or at least link to an explanation.</p>
<p>In this series I discuss what happens with vanilla unoptimised win32 debug code generated by the VS 2010 compiler. Whilst the specifics will differ on other platforms (and probably with other compilers), the general sweep of the code should be basically the same, because it&#8217;s assembly that has been generated by a C++ compiler. Following the same examples given here with a source / disassembly debugger on your platform of choice should therefore provide you with the same insights we get here.</p>
<p>With this in mind, in case you missed them, here are the backlinks to the previous posts in the series:</p>
<ol>
<li><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/">http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/">http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/</a></li>
<li><a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/">http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/</a></li>
</ol>
<p>I won&#8217;t lie &#8211; it&#8217;s not light reading :)</p>
<p>&nbsp;</p>
<h2>Class vs. Struct: a Gentle Reminder</h2>
<p>The C++ keywords struct and class define types that are identical in implementation detail and in what you can do with them. The only difference is at the language level: if no access specifier is given, the <em>default</em> access (for both members and base classes) is <em>private</em> for class and <em>public</em> for struct.</p>
<p>So, whilst I will be using the keyword <em><strong>class</strong></em> throughout this article, please take it as read that anything we talk about here applies equally to types defined using the keyword <em><strong>struct</strong></em>.</p>
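<p>To make the &#8220;only the default differs&#8221; point concrete, here is a compile-time sketch with hypothetical types. A struct and a class with the same members have the same size, and the default access also applies to base classes: public for struct, private for class.</p>

```cpp
#include <type_traits>

// Identical layouts: only the default access differs between the keywords.
struct SPair { int _iA; int _iB; };            // members public by default
class  CPair { public: int _iA; int _iB; };    // 'public:' needed, class defaults to private

static_assert(sizeof(SPair) == sizeof(CPair), "same members, same size");

// The same default applies to base classes.
struct SDerivedPair : SPair {};   // public inheritance by default
class  CDerivedPair : CPair {};   // private inheritance by default

static_assert(std::is_convertible<SDerivedPair*, SPair*>::value,
              "public base: derived-to-base conversion is accessible");
static_assert(!std::is_convertible<CDerivedPair*, CPair*>::value,
              "private base: conversion is not accessible from outside");
```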
<p>&nbsp;</p>
<h2>What happens when we derive from another type?</h2>
<p>So, what <em>does</em> happen when you derive a user defined type from another non built-in type?</p>
<p>Clearly the data members you specify in the declarations have to go somewhere, and so do all those specified in the type(s) you are deriving from.</p>
<p>At the level of C++ there is nothing other than the standard to tell you how this works &#8211; and nothing other than looking at what happens with the code generated by the compiler you are using will tell you for definite.</p>
<p>As in the last post, we will be relying heavily on the frankly awesome secret 007 compiler flag /d1reportSingleClassLayout in order to tell us exactly how the (Visual Studio 2010 win32 x86) compiler has decided to lay our example structures out in memory.</p>
<p>It&#8217;s about time to look at some example code so, rather than have you go through the usual rigmarole of setting up your project, I have kindly set one up for you.</p>
<p>The zip file <a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/ADBAD_CCPPLLC_11_00.zip">in this link</a> contains a VS2010 solution with a single project and .cpp file lovingly set up to run the code shown below, which is in <em><strong>00_Inheritance.cpp</strong></em>:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="code"><pre class="c" style="font-family:monospace;">class CTestBase
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iA<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iB<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
class CTestDerived
<span style="color: #339933;">:</span> public CTestBase
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iC<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iD<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>When you compile this project you should get the following in your &#8220;Build&#8221; output window (the magic of /d1reportSingleClassLayout!):</p>
<pre>1&gt; class CTestBase size(8):
1&gt;   +---
1&gt; 0 | _iA
1&gt; 4 | _iB
1&gt;   +---
1&gt; 
1&gt; class CTestDerived size(16):
1&gt;    +---
1&gt;    | +--- (base class CTestBase)
1&gt;  0 | | _iA
1&gt;  4 | | _iB
1&gt;    | +---
1&gt;  8 | _iC
1&gt; 12 | _iD
1&gt;    +---</pre>
<p>Looking at this, it should be fairly obvious that the data members of <em><strong>CTestDerived</strong> </em>have just been concatenated onto the end of the memory layout of <em><strong>CTestBase </strong></em>- and, more importantly, that the memory layout of <em><strong>CTestBase</strong></em> within <em><strong>CTestDerived</strong></em> is identical to that when it&#8217;s not a base class.</p>
<p>It&#8217;s that simple!  (for certain definitions of &#8216;it&#8217; and &#8216;simple&#8217;&#8230;)</p>
<p>Armed with this information from the last post:</p>
<p style="padding-left: 30px;">&#8220;A guarantee is given in both the C and C++ language specifications that memory address of each member will be higher than that of the one declared before it (see this post on <a href="http://stackoverflow.com/questions/281045/do-class-struct-members-always-get-created-in-memory-in-the-order-they-were-decl">Stack Overflow</a> for more detail of the wording).&#8221;</p>
<p>it is obvious that, since <em><strong>CTestDerived</strong></em> inherits all of the members of <em><strong>CTestBase</strong></em> and must keep them at the same offsets, its own members must appear after those of <em><strong>CTestBase</strong></em> in memory.</p>
<p>I remember when I had this explained to me, not long after having started my first job in the industry as a fresh-faced graduate. I did the internal equivalent of a double-take, because the information I had just received was so bleedingly obvious that I couldn&#8217;t believe I&#8217;d ever not known it.</p>
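<p>You can also check the layout above from code rather than from the build output. This sketch measures member offsets on live objects with a hypothetical helper, <em><strong>OffsetIn</strong></em>. With 4-byte <em><strong>int</strong></em>s it should reproduce the 0, 4, 8 and 12 reported by /d1reportSingleClassLayout, and it shows that <em><strong>_iA</strong></em> and <em><strong>_iB</strong></em> sit at the same offsets in <em><strong>CTestBase</strong></em> and <em><strong>CTestDerived</strong></em>. The exact numbers are implementation-specific.</p>

```cpp
#include <cassert>
#include <cstddef>

class CTestBase
{
public:
    int _iA;
    int _iB;
};

class CTestDerived : public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Byte offset of a member, measured on a live object. (offsetof is only
// guaranteed for standard-layout types, and CTestDerived is not one because
// data members are declared in both classes of the hierarchy.)
template <typename TObject, typename TMember>
std::ptrdiff_t OffsetIn(const TObject& rObject, const TMember& rMember)
{
    return reinterpret_cast<const char*>(&rMember)
         - reinterpret_cast<const char*>(&rObject);
}
```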
<p>&nbsp;</p>
<h2>If it&#8217;s that easy, why post about it?</h2>
<p>Good question!</p>
<p>The fact that the memory layout of a type is identical in all situations is required by the standard, and also by logic; let&#8217;s see why&#8230;</p>
<p>First, download and open the <a href="http://www.altdevblogaday.com/wp-content/uploads/2013/05/ADBAD_CCPPLLC_11_01.zip">second zipped VS2010 project file</a> - this contains the code below in <em><strong>01_InheritanceWithFunctions.cpp</strong></em>:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
</pre></td><td class="code"><pre class="c" style="font-family:monospace;">class CTestBase
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iA<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iB<span style="color: #339933;">;</span>
&nbsp;
    CTestBase<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iA<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">:</span> _iA<span style="color: #009900;">&#40;</span> iA <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iB<span style="color: #009900;">&#40;</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #993333;">int</span> SumBase<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">return</span> _iA <span style="color: #339933;">+</span> _iB<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
class CTestDerived
<span style="color: #339933;">:</span> public CTestBase
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #993333;">int</span> _iC<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> _iD<span style="color: #339933;">;</span>
&nbsp;
    CTestDerived<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iA<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iB<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iC<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">:</span> CTestBase <span style="color: #009900;">&#40;</span> iA<span style="color: #339933;">,</span> iB <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iC <span style="color: #009900;">&#40;</span> iC <span style="color: #009900;">&#41;</span>
    <span style="color: #339933;">,</span> _iD <span style="color: #009900;">&#40;</span> iD <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #993333;">int</span> SumDerived<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">return</span> _iA <span style="color: #339933;">+</span> _iB <span style="color: #339933;">+</span> _iC <span style="color: #339933;">+</span> _iD<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    CTestBase       cTestBase   <span style="color: #009900;">&#40;</span> argc<span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    CTestDerived    cTestDerived<span style="color: #009900;">&#40;</span> argc<span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> argc <span style="color: #339933;">+</span> <span style="color: #0000dd;">3</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> cTestBase.<span style="color: #202020;">SumBase</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> cTestDerived.<span style="color: #202020;">SumBase</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> cTestDerived.<span style="color: #202020;">SumDerived</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Put a breakpoint on the return statement from main, and then compile and run the <em><strong>release</strong> </em>build configuration.</p>
<p>The first thing to note is that the memory layouts printed to the output window during the build are unaffected by the addition of these functions.</p>
<p>This is what you would expect, as we know that non-virtual member function calls are resolved at compile time just like regular non-member and static member functions.</p>
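<p>A quick way to convince yourself of this without looking at the build output is a <em><strong>static_assert</strong></em>. In this sketch the two classes are hypothetical variants of <em><strong>CTestBase</strong></em>. The standard doesn&#8217;t literally promise equal sizes, but with no virtual functions there is nothing for the compiler to add per instance. The <em><strong>this</strong></em> pointer is passed to member functions as a hidden argument instead; you can see it loaded into <em><strong>ecx</strong></em> in the disassembly below.</p>

```cpp
#include <cassert>

// Hypothetical pair of classes holding the same data as CTestBase,
// one with and one without non-virtual member functions.
class CDataOnly
{
public:
    int _iA;
    int _iB;
};

class CDataAndFunctions
{
public:
    int _iA;
    int _iB;

    CDataAndFunctions(int iA, int iB) : _iA(iA), _iB(iB) {}
    int SumBase() const { return _iA + _iB; }
};

// Non-virtual member functions are ordinary functions with a hidden 'this'
// parameter, so they add nothing to the per-instance memory layout.
static_assert(sizeof(CDataOnly) == sizeof(CDataAndFunctions),
              "non-virtual member functions should not change an object's size");
```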
<p>Since <em><strong>CTestDerived</strong> </em>is derived from <em><strong>CTestBase</strong></em>, we know from our high level knowledge about C++ that we can call both of these functions on an instance of <em><strong>CTestDerived</strong></em> &#8211; what we&#8217;re looking at right now is how this is implemented.</p>
<p>When the breakpoint is hit, right click and choose &#8220;Go To Disassembly&#8221;.</p>
<p>I&#8217;ve pasted the part I&#8217;d like to discuss below&#8230;</p>
<p>(N.B. to get the same disassembly as this you should have the following Viewing Options checked in the disassembly window: &#8216;Show source code&#8217;, &#8216;Show line numbers&#8217;, &#8216;Show address&#8217;, and &#8216;Show symbol names&#8217;)</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">44</span><span style="color: #339933;">:</span>     return cTestBase<span style="color: #339933;">.</span>SumBase<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #339933;">+</span> cTestDerived<span style="color: #339933;">.</span>SumBase<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #339933;">+</span> cTestDerived<span style="color: #339933;">.</span>SumDerived<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">0129109A</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span>cTestDerived<span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">0129109D</span>  <span style="color: #00007f; font-weight: bold;">call</span>        CTestDerived<span style="color: #339933;">::</span>SumDerived <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">1291060h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
<span style="color: #adadad; font-style: italic;">012910A2</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span>cTestDerived<span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">012910A5</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">esi</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">012910A7</span>  <span style="color: #00007f; font-weight: bold;">call</span>        CTestBase<span style="color: #339933;">::</span>SumBase <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">1291020h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
<span style="color: #adadad; font-style: italic;">012910AC</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span>cTestBase<span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">012910AF</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">esi</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">012910B1</span>  <span style="color: #00007f; font-weight: bold;">call</span>        CTestBase<span style="color: #339933;">::</span>SumBase <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">1291020h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
<span style="color: #adadad; font-style: italic;">012910B6</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">edi</span>  
<span style="color: #adadad; font-style: italic;">012910B7</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">esi</span></pre></td></tr></table></div>

<p>We&#8217;ve <a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">previously covered</a> that the win32 calling convention for member functions (&#8216;thiscall&#8217;) passes <em><strong>this</strong></em> to member functions in the <em><strong>ecx</strong></em> register.</p>
<p>Correspondingly, you&#8217;ll notice that the addresses of <em><strong>cTestBase</strong></em> and <em><strong>cTestDerived</strong></em> are loaded into <em><strong>ecx</strong></em> using <em><strong>lea</strong></em> (&#8216;load effective address&#8217;) immediately before each call to one of their member functions.</p>
<p>Specifically, note that the address of <em><strong>cTestDerived</strong></em> is passed unmodified in <em><strong>ecx</strong></em> when calling the base class function <em><strong>CTestBase::SumBase</strong></em>. Remember this for later (and for the next post!).</p>
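<p>One way to picture what the compiler is doing is to write the hidden <em><strong>this</strong></em> argument out by hand. This sketch assumes stand-in versions of <em><strong>CTestBase</strong></em> and <em><strong>CTestDerived</strong></em> with public members (the real ones are defined earlier in the post), and <em>SumBase_Explicit</em> and <em>CheckThisPassing</em> are hypothetical names used only for illustration. The address check reflects what MSVC, GCC and Clang do for single, non-virtual inheritance:</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for the article's classes (public data for brevity).
struct CTestBase
{
    int _iA;
    int _iB;

    CTestBase(int iA, int iB) : _iA(iA), _iB(iB) {}

    // Conceptually compiled like SumBase_Explicit(this) below; with MSVC's
    // win32 'thiscall' convention the hidden 'this' argument travels in ecx.
    int SumBase() const { return _iA + _iB; }
};

struct CTestDerived : CTestBase
{
    int _iC;
    int _iD;

    CTestDerived(int iA, int iB, int iC, int iD)
        : CTestBase(iA, iB), _iC(iC), _iD(iD) {}

    int SumDerived() const { return _iA + _iB + _iC + _iD; }
};

// Hypothetical free function with 'this' written out as an explicit parameter.
int SumBase_Explicit(const CTestBase* pThis)
{
    return pThis->_iA + pThis->_iB;
}

void CheckThisPassing()
{
    CTestDerived cTestDerived(1, 2, 3, 4);

    // Single, non-virtual inheritance: the CTestBase part starts at the same
    // address as the whole object, so no pointer adjustment is needed.
    const CTestBase* pBase = &cTestDerived;
    assert(static_cast<const void*>(pBase) == static_cast<const void*>(&cTestDerived));

    // The member function and the explicit-'this' version compute the same thing.
    assert(cTestDerived.SumBase() == SumBase_Explicit(&cTestDerived));
}
```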
<p>So, let&#8217;s look at the disassembly for <em><strong>CTestBase::SumBase</strong></em> and <em><strong>CTestDerived::SumDerived</strong></em>. I tend to single step through the disassembly and step into them, but putting breakpoints in them is more reliable :)</p>
<p><em><strong>CTestBase::SumBase </strong></em></p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> SumBase<span style="color: #009900; font-weight: bold;">&#40;</span> void <span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span>         return _iA <span style="color: #339933;">+</span> _iB<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01291020</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">+</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">01291023</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
<span style="color: #adadad; font-style: italic;">01291025</span>  <span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

<p><em><strong>CTestDerived::SumDerived </strong></em></p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">33</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> SumDerived<span style="color: #009900; font-weight: bold;">&#40;</span> void <span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">34</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">35</span><span style="color: #339933;">:</span>         return _iA <span style="color: #339933;">+</span> _iB <span style="color: #339933;">+</span> _iC <span style="color: #339933;">+</span> _iD<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01291060</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">+</span><span style="color: #ff0000;">0Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">01291063</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">+</span><span style="color: #ff0000;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">01291066</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">+</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">01291069</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
    <span style="color: #ff0000;">36</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
<span style="color: #adadad; font-style: italic;">0129106B</span>  <span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

<p>We can see that every offset from <em><strong>ecx</strong></em> used in these two functions corresponds to the memory layout reported in the build output for the class that the function belongs to.</p>
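<p>To make the &#8216;offset from <em><strong>this</strong></em>&#8217; idea concrete, here is a rough C++ rendering of what the disassembly of <em><strong>CTestBase::SumBase</strong></em> does. It assumes a standard-layout stand-in struct (so that <em>offsetof</em> is well defined), <em>ReadIntAtOffset</em> and <em>SumBase_ByOffset</em> are hypothetical helpers, and the assembly in the comments assumes a 4 byte <em>int</em>:</p>

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy

// Hypothetical standard-layout stand-in for CTestBase.
struct SBase
{
    int _iA;
    int _iB;
};

// Reads the int stored uOffset bytes after pThis: a C++ rendering of an
// operand like 'dword ptr [ecx+4]', where ecx holds 'this'.
int ReadIntAtOffset(const void* pThis, std::size_t uOffset)
{
    int iValue;
    std::memcpy(&iValue, static_cast<const char*>(pThis) + uOffset, sizeof(iValue));
    return iValue;
}

// Mirrors the disassembly of SumBase: the offsets are compile time constants,
// only the base address (the 'this' pointer) changes from call to call.
int SumBase_ByOffset(const SBase* pThis)
{
    assert(pThis != nullptr);  // ecx must hold a valid 'this'
    return ReadIntAtOffset(pThis, offsetof(SBase, _iB))    // mov eax,dword ptr [ecx+4]
         + ReadIntAtOffset(pThis, offsetof(SBase, _iA));   // add eax,dword ptr [ecx]
}
```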
<p>Since <em><strong>_iA</strong></em> and <em><strong>_iB</strong></em> are at the same offsets within both <em><strong>CTestBase</strong></em> and <em><strong>CTestDerived</strong></em> (0 and 4 bytes respectively), <em><strong>CTestBase::SumBase</strong></em> can safely be called on instances of <em><strong>CTestDerived</strong></em>.</p>
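<p>You can also check the &#8216;same offsets&#8217; claim directly by measuring the byte distance from the start of each object to its members. This sketch uses hypothetical stand-in types and helper names, and it assumes the <em>CTestBase</em> part sits at the start of a <em>CTestDerived</em>, which is what mainstream compilers do for single, non-virtual inheritance:</p>

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t

// Hypothetical stand-ins for the article's classes, with public data so that
// member addresses can be taken directly.
struct CTestBase
{
    int _iA;
    int _iB;
};

struct CTestDerived : CTestBase
{
    int _iC;
    int _iD;
};

// Byte distance from the start of an object to one of its members.
template <typename TObject, typename TMember>
std::ptrdiff_t ByteOffset(const TObject& rObject, const TMember& rMember)
{
    return reinterpret_cast<const char*>(&rMember)
         - reinterpret_cast<const char*>(&rObject);
}

// _iA and _iB must sit at the same offsets in a stand-alone CTestBase and in a
// CTestDerived, or the offsets baked into CTestBase::SumBase would read the
// wrong memory when it is called on a CTestDerived.
void CheckBaseMemberOffsets()
{
    CTestBase    cBase    = {};
    CTestDerived cDerived = {};
    assert(ByteOffset(cBase, cBase._iA) == ByteOffset(cDerived, cDerived._iA));
    assert(ByteOffset(cBase, cBase._iB) == ByteOffset(cDerived, cDerived._iB));
}
```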
<p>We already knew that this was possible from our high level understanding of C++, but now we know the implementation detail that makes it possible.</p>
<p>Whilst the specifics of the disassembly will probably differ from platform to platform, the principles underlying its operation should not.</p>
<p>&nbsp;</p>
<h2>Summary</h2>
<p>To summarise what we&#8217;ve established so far:</p>
<p style="padding-left: 30px;"><em><strong>1)</strong></em> in member functions, member data of a class is accessed via specific offsets from the <em><strong>this</strong></em> pointer</p>
<p style="padding-left: 30px;"><em><strong>2)</strong></em> these offsets are constants at compile time and are baked into the assembly code for the member functions</p>
<p style="padding-left: 30px;"><em><strong>3)</strong></em> this means that the memory layout of the members of a given class must always be identical or the member functions won&#8217;t work</p>
<p>&nbsp;</p>
<p>If we follow this logic through, we can see that:</p>
<p style="padding-left: 30px;"><em><strong>4)</strong></em> the memory layout of a class <em><strong>B</strong> </em>that inherits from another class <em><strong>A</strong> </em>must contain class <em><strong>A</strong></em>&#8216;s members in the same memory layout as class <em><strong>A</strong></em></p>
<p style="padding-left: 30px;"><em><strong>5)</strong></em> the memory layout of any given class <em><strong>A</strong></em> is identical regardless of whether it is an instance of <em><strong>A</strong></em> or is included in the memory of some type derived from <em><strong>A</strong></em>.</p>
<p style="padding-left: 30px;"><em><strong>6)</strong></em> <em>note:</em> this behaviour is required by the standard, and (more significantly) by <em><strong>logic</strong></em>.</p>
<p>&nbsp;</p>
<p>Finally, it follows that (because each member of a struct must have a higher address than those declared before it):</p>
<p style="padding-left: 30px;"><em><strong>7)</strong></em> the extra memory required by derived class <em><strong>B</strong> </em>will be concatenated onto the end of the memory layout of its base class <em><strong>A</strong></em></p>
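<p>Points 4, 5 and 7 can be checked in a few lines. This sketch again uses hypothetical stand-in types, <em>CheckDerivedLayout</em> is an illustrative name, and the exact-size checks assume no padding is inserted, which should hold for these int-only classes on common platforms:</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for the article's classes.
struct CTestBase
{
    int _iA;
    int _iB;
};

struct CTestDerived : CTestBase
{
    int _iC;
    int _iD;
};

// 7) CTestDerived adds exactly its own two ints to CTestBase's footprint.
static_assert(sizeof(CTestDerived) == sizeof(CTestBase) + 2 * sizeof(int),
              "derived members are appended to the base class layout");

void CheckDerivedLayout()
{
    CTestDerived cDerived = {};
    const char* pStart = reinterpret_cast<const char*>(&cDerived);

    // 4) The CTestBase part comes first in the CTestDerived object...
    const CTestBase* pBase = &cDerived;
    assert(reinterpret_cast<const char*>(pBase) == pStart);

    // 7) ...and CTestDerived's own members follow it, in declaration order.
    assert(reinterpret_cast<const char*>(&cDerived._iC) == pStart + sizeof(CTestBase));
    assert(reinterpret_cast<const char*>(&cDerived._iD) == pStart + sizeof(CTestBase) + sizeof(int));
}
```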
<p>&nbsp;</p>
<p>That&#8217;s all for now &#8211; next time we&#8217;ll look at how multiple inheritance affects this picture.</p>
<p>I know it&#8217;s pretty short, but this just means the next one will get here more quickly :)</p>
<p>&nbsp;</p>
<h2>Epilogue &#8211; for those who wondered what I changed in the project settings</h2>
<p>There are quite a few changes to the default VS2010 win32 console app project properties in the projects I&#8217;ve zipped up for this post.</p>
<p>The changes are intended to make the optimised Release build configuration leave the code structure alone (i.e. not strip out or &#8216;fold&#8217; functions to save exe size, and not inline functions), and to prevent extraneous &#8216;debug checking&#8217; code from being inserted (it makes function calls slower and the code harder to follow in disassembly):</p>
<ul>
<li>turning off &#8216;Whole Program Optimization&#8217; (Configuration Properties -&gt; General)</li>
<li>turning off &#8216;Inline Function Expansion&#8217; (Configuration Properties -&gt; C/C++ -&gt; Optimization)</li>
<li>turning off &#8216;Basic Runtime Checks&#8217; (Configuration Properties -&gt; C/C++ -&gt; Code Generation)</li>
<li>getting rid of precompiled headers to reduce the number of files (Configuration Properties -&gt; C/C++ -&gt; Precompiled Headers)</li>
<li>turning off &#8216;Enable COMDAT Folding&#8217; (Configuration Properties -&gt; Linker -&gt; Optimization)</li>
</ul>
<p>Essentially, this gives the assembly for the Release configuration the same structure as the Debug one with respect to function calls.</p>
<p>Also, I use the <em><strong>argc</strong></em> parameter of main as the input to the code, and return a value computed from it, so that the optimiser can&#8217;t assume constant input or output values.</p>
<p>If you use constant inputs, or don&#8217;t output a value computed from the inputs, then it&#8217;s pretty hard to convince the optimiser not to optimise the entire .exe down to &#8216;return 0;&#8217;&#8230; ;)</p>
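<p>Here is a minimal sketch of that pattern. <em>SPair</em> and <em>RunExperiment</em> are hypothetical names; in the real projects the equivalent code lives directly in <em>main</em>, and the small check function is only there so the sketch can be sanity-tested:</p>

```cpp
#include <cassert>

// Hypothetical stand-in for the test classes used in the post.
struct SPair
{
    int _iA;
    int _iB;
    int Sum() const { return _iA + _iB; }
};

// Every input is derived from argc and the result is handed back to the
// caller, so the optimiser cannot replace the work with a constant or
// throw it away as dead code.
int RunExperiment(int argc)
{
    SPair sPair = { argc, argc + 1 };
    return sPair.Sum();
}

// In the real program this is simply:
// int main(int argc, char* argv[]) { return RunExperiment(argc); }

// Quick sanity check: with no command line arguments argc is normally 1.
void CheckRunExperiment()
{
    assert(RunExperiment(1) == 3);
}
```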
<p>&nbsp;</p>
<h2>Shout out</h2>
<p>Thanks (again) to Bruce &#8211; king (or at the very least duke) of advice and peer review.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C/C++ Low Level Curriculum Part 10: User Defined Types</title>
		<link>http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/</link>
		<comments>http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/#comments</comments>
		<pubDate>Sat, 05 Jan 2013 15:25:05 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28818</guid>
		<description><![CDATA[<p>Hello again peoples of the interweb. It has been <em>quite</em> a while since the last one (probably even longer than the gap between part 8 and part 9) so I thought I ought to pull my finger out and get the next post in the C/C++ Low Level Curriculum done.</p>
<p><a href="http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/" class="more-link">Read more on C/C++ Low Level Curriculum Part 10: User Defined Types&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Hello again peoples of the interweb. It has been <em>quite</em> a while since the last one (probably even longer than the gap between part 8 and part 9) so I thought I ought to pull my finger out and get the next post in the C/C++ Low Level Curriculum done.</p>
<p>In the previous posts we&#8217;ve covered the structural aspects of the language (flow control, functions, and so forth), so now we move on to looking in detail at user defined types in C/C++ (i.e. struct, class, and associated keywords), which I naively expected to make up the bulk of this potentially never-ending series when I started it. D&#8217;oh!</p>
<p><span id="more-28818"></span>Before we start, dear reader, I&#8217;m going to assume that you&#8217;re the kind of person (or recently self-aware, Google-web-trawling AI entity) who likes to understand the jargon, so I will include links (probably mostly to Wikipedia or other ADBAD articles) where appropriate.</p>
<p>You may also want to read the previous posts in this series (though I don&#8217;t think this one will particularly rely on older posts) so, in case you missed them, here are the back-links for preceding articles in the series (warning: reading these might take a while&#8230;) :</p>
<ol>
<li><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/">http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/">http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/</a></li>
</ol>
<p>&nbsp;</p>
<h2>Data Types and Enums</h2>
<p>We covered <a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/">fundamental and intrinsic types in the second post</a> in the series, which also touched on the <em><strong>enum</strong></em> keyword. I deliberately didn&#8217;t cover the use of the keywords <em><strong>struct</strong></em> or <em><strong>class</strong></em> in that post, but we did cover some facts about the behaviour of values defined using the <em><strong>enum</strong></em> keyword (i.e. that it is up to the compiler to decide which intrinsic type to use to represent each <em><strong>enum</strong></em>erated type you declare, based on the range required by its values).</p>
<p>Helpfully, the <a href="http://en.wikipedia.org/wiki/C%2B%2B11#Strongly_typed_enumerations">C++11 standard made some sweeping changes to the behaviour of enums</a>, amongst which was the ability to specify the underlying integral type used to represent the values of each <em><strong>enum</strong></em>. Tasty.</p>
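<p>For example, with a C++11 compiler you can write something like the following (the enum names are made up for illustration). Compiler support for these features arrived gradually, so an older compiler such as VS2010 may reject parts of it:</p>

```cpp
#include <cstdint>
#include <type_traits>

// Scoped enum (C++11) with an explicitly specified underlying type.
enum class EColour : std::uint8_t
{
    Red,
    Green,
    Blue
};

// C++11 also lets unscoped enums specify their underlying type.
enum EFlags : std::uint32_t
{
    eFlags_None = 0u,
    eFlags_Top  = 0x80000000u
};

// The compiler no longer gets to pick: the sizes follow from our choices.
static_assert(sizeof(EColour) == sizeof(std::uint8_t), "EColour is stored in a single byte");
static_assert(std::is_same<std::underlying_type<EFlags>::type, std::uint32_t>::value,
              "EFlags uses exactly the type we asked for");
```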
<p>Mentioning this welcome change is the extent of our discussion of <em><strong>enum</strong></em>, so let&#8217;s get on with looking at <em><strong>struct</strong></em>, <em><strong>class</strong></em>, and <em><strong>union</strong></em>.</p>
<p>&nbsp;</p>
<h2>Thank You, Visual Studio Devteam</h2>
<p>If you have been <em>really</em> paying attention to the older posts, you might remember that I mentioned some undocumented (and unsupported!) command line options for Microsoft&#8217;s Visual Studio C++ compiler which can be used to print out the memory layout of data types defined using the <em><strong>struct</strong></em>, <em><strong>class</strong></em>, or <em><strong>union</strong></em> keywords.</p>
<p>These secret compiler options are <strong>/d1reportAllClassLayout</strong>, which reports the layout of all classes in the current project, and its more user friendly sibling <strong>/d1reportSingleClassLayout</strong><em><strong>xxx</strong></em> (where <em><strong>xxx</strong></em> is a string that is substring-matched against the names of the classes you want reported).</p>
<p>I will be leaning pretty heavily on this compiler option for the next few posts, so we may as well cover how to use it. It definitely works in VS2010 and VS2012; it even works with the Express versions. Woo!</p>
<p>Here&#8217;s where you type in the command line option in the property pages (n.b. this is the &#8216;single&#8217; version and matches any class or struct with the string &#8216;Test&#8217; in its name):</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/CCPPLLC_010_MagicCompilerOptions.png"><img class="alignnone  wp-image-28825" alt="" src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/CCPPLLC_010_MagicCompilerOptions.png" width="664" height="469" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h2>Output from /d1reportSingleClassLayout</h2>
<p>So far so froody.</p>
<p>Now, it&#8217;s about time we looked at a code snippet defining a simple <a href="http://en.wikipedia.org/wiki/Plain_old_data_structure">POD</a> <em><strong>struct</strong> </em>(POD types being the simplest cases of aggregate data types) and the output produced by <strong>/d1reportSingleClassLayout</strong> when we build it&#8230;</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">struct</span> STest
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iA<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iB<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>When we compile this with the fancy secret compiler switch, as expected we find an extra bit of information in amongst the usual Visual Studio compiler&#8217;s output:</p>
<pre>1&gt; class STest size(8):
1&gt;   +---
1&gt; 0 | iA
1&gt; 4 | iB
1&gt;   +---</pre>
<p>Hopefully this should appear pretty much self explanatory to you, but in case it doesn&#8217;t &#8211; rest assured we&#8217;re about to look at it in a little more detail.</p>
<p>The first line contains the name of the class and its size in bytes &#8211; <em><strong>STest</strong></em> is a <em><strong>struct</strong></em>, but it is reported as a <em><strong>class</strong></em> &#8211; don&#8217;t worry about this for now.</p>
<p>The <em><strong>struct</strong></em>&#8216;s name contains the string &#8216;Test&#8217; which is the substring we specified to match against in the compiler option in order to get class layout information.</p>
<p>The rest of the information details the member-by-member memory layout of the struct organised by the name of the data members &#8211; the number at the start of the line is the memory offset in bytes of that member relative to the start of the struct.</p>
<p>The first thing to note is that the member variables are laid out in memory in the order specified in the class declaration.</p>
<p>Both the C and C++ language specifications guarantee that the memory address of each member will be higher than that of the one declared before it (see this post on <a href="http://stackoverflow.com/questions/281045/do-class-struct-members-always-get-created-in-memory-in-the-order-they-were-decl">Stack Overflow</a> for more detail of the wording).</p>
<p>In the case of <em><strong>STest</strong></em>, the first member <em><strong>iA</strong></em> is at an offset of 0 bytes from the start of the struct, and the second member <em><strong>iB</strong></em> is at an offset of 4 bytes from the start of the struct.</p>
<p>Importantly, by doing a little maths with the offsets and the size of the struct, we can also tell that <em><strong>iA</strong></em> takes up 4 bytes and <em><strong>iB</strong></em> takes up 4 bytes &#8211; since <em><strong>sizeof(int) == 4</strong></em> on this platform, this matches up with what we would expect.</p>
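<p>The same facts can be checked from code with <em>offsetof</em> and <em>sizeof</em>, which is handy on compilers that have no equivalent of <strong>/d1reportSingleClassLayout</strong>. In this sketch the first two checks follow from the language rules described above; the last two encode what we expect on a platform like VS2010 win32, where <em>int</em> is 4 bytes and no padding is inserted:</p>

```cpp
#include <cstddef>  // offsetof

struct STest
{
    int iA;
    int iB;
};

// Guaranteed by the language: the first member of a standard-layout struct is
// at offset 0, and later members have higher addresses.
static_assert(offsetof(STest, iA) == 0, "iA is at the start of STest");
static_assert(offsetof(STest, iB) > offsetof(STest, iA), "iB comes after iA");

// Platform expectations, matching the /d1reportSingleClassLayout output above.
static_assert(offsetof(STest, iB) == sizeof(int), "no padding between iA and iB");
static_assert(sizeof(STest) == 2 * sizeof(int), "no padding after iB");
```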
<h2>Accessing the members of a struct in assembly</h2>
<p>We all knew this was coming, right?</p>
<p>Woo! I know you all live for hexadecimal numbers and assembler mnemonics.</p>
<p>As always, the main thing I want you to take away from this is not so much the understanding of the specific assembly code itself (though clearly it has its benefits&#8230;), but more of a generalised appreciation for the combinations of assembly instructions that &#8216;smell like&#8217; the compiler accessing the members of a struct or class.</p>
<p>Getting used to the assembly level &#8216;smells&#8217; of the various high level constructs in compiler generated assembly code will enable you to find your bearings much more quickly in the disassembly window. Most importantly, assuming you are lucky enough to have a valid callstack (and, like a sensible person, you have symbols for your release build), you should quickly develop the ability to work out which bit of the high level code corresponds to the assembly you&#8217;re currently looking at. Win.</p>
<p>Here&#8217;s a code snippet that accesses the data members of the struct we just defined:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">struct</span> STest
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iA<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iB<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    STest sOnStack<span style="color: #339933;">;</span>
    sOnStack.<span style="color: #202020;">iA</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
    sOnStack.<span style="color: #202020;">iB</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
&nbsp;
    STest<span style="color: #339933;">*</span> psOnHeap <span style="color: #339933;">=</span> new STest<span style="color: #339933;">;</span>
    psOnHeap<span style="color: #339933;">-&gt;</span>iA <span style="color: #339933;">=</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">;</span>
    psOnHeap<span style="color: #339933;">-&gt;</span>iB <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">;</span>
    delete psOnHeap<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Before we look at the disassembly we should explain a little about the snippet.</p>
<p>Two instances of <em><strong>STest</strong></em> are created:</p>
<ul>
<li><em><strong>sOnStack</strong> </em>on the Stack &#8211; i.e. automatically allocated by the compiler as a local variable</li>
<li><em><strong>psOnHeap</strong></em> on the Heap &#8211; i.e. dynamically allocated.</li>
</ul>
<p>The reasons for doing this will become clear once we&#8217;ve inspected the assembly.</p>
<p style="padding-left: 30px;"><em><strong>Aside</strong>: technically the area of dynamic memory managed by <strong>new</strong> and <strong>delete</strong> in C++ is called the Free Store, but almost everyone calls it the Heap. I&#8217;m pretty sure this is because the dynamic memory in C managed by <a href="http://en.wikipedia.org/wiki/C_dynamic_memory_allocation">malloc and free</a> has colloquially and historically been known as &#8220;the Heap&#8221;, and a lot of C++ implementations define new and delete using malloc and free (and most if not all used to).</em></p>
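<p>To illustrate the point about <em>new</em> and <em>delete</em> being built on top of <em>malloc</em> and <em>free</em>, here is a hypothetical class-specific <em>operator new</em> / <em>operator delete</em> pair for <em><strong>STest</strong></em> that simply forwards to them. It is a simplified sketch rather than how any particular runtime implements the global versions (a real one also deals with things like the new-handler), and the allocation counter and <em>CheckHeapAllocation</em> exist only so the sketch can be checked:</p>

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstdlib>  // std::malloc, std::free
#include <new>      // std::bad_alloc

struct STest
{
    int iA;
    int iB;

    static int s_iLiveAllocations;  // hypothetical bookkeeping, for illustration only

    static void* operator new(std::size_t uSize)
    {
        void* pMemory = std::malloc(uSize);  // C's heap allocator underneath
        if (pMemory == nullptr)
        {
            throw std::bad_alloc();          // operator new must not return null
        }
        ++s_iLiveAllocations;
        return pMemory;
    }

    static void operator delete(void* pMemory)
    {
        if (pMemory != nullptr)
        {
            --s_iLiveAllocations;
        }
        std::free(pMemory);
    }
};

int STest::s_iLiveAllocations = 0;

void CheckHeapAllocation()
{
    STest* psOnHeap = new STest;  // calls STest::operator new, then (trivial) construction
    assert(STest::s_iLiveAllocations == 1);
    psOnHeap->iA = 3;
    psOnHeap->iB = 4;
    delete psOnHeap;              // (trivial) destruction, then STest::operator delete
    assert(STest::s_iLiveAllocations == 0);
}
```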
<p>So here&#8217;s the disassembly generated by the VS2010 debug compiler:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span> STest sOnStack<span style="color: #666666; font-style: italic;">;</span>
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span> sOnStack<span style="color: #339933;">.</span>iA = <span style="color: #ff0000;">1</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00A01269</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span> 
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span> sOnStack<span style="color: #339933;">.</span>iB = <span style="color: #ff0000;">2</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00A01270</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">2</span> 
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">18</span><span style="color: #339933;">:</span> STest<span style="color: #339933;">*</span> psOnHeap = new STest<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00A01277</span> <span style="color: #00007f; font-weight: bold;">push</span> <span style="color: #ff0000;">8</span> 
<span style="color: #adadad; font-style: italic;">00A01279</span> <span style="color: #00007f; font-weight: bold;">call</span> 00A010F5 
<span style="color: #adadad; font-style: italic;">00A0127E</span> <span style="color: #00007f; font-weight: bold;">add</span> <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #ff0000;">4</span> 
<span style="color: #adadad; font-style: italic;">00A01281</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">54h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span> 
<span style="color: #adadad; font-style: italic;">00A01284</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">54h</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">00A01287</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">0Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span> 
    <span style="color: #ff0000;">19</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;iA = <span style="color: #ff0000;">3</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00A0128A</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">0Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">00A0128D</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">3</span> 
    <span style="color: #ff0000;">20</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;iB = <span style="color: #ff0000;">4</span><span style="color: #666666; font-style: italic;">; </span>
<span style="color: #adadad; font-style: italic;">00A01293</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">0Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">00A01296</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">+</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">4</span></pre></td></tr></table></div>

<p>Looking at lines 3 and 5 (and remembering what we learned in <a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/">post 2</a> about how variables in memory are accessed in assembly), we can see that both <em><strong>sOnStack.iA</strong></em> and <em><strong>sOnStack.iB</strong></em> are being directly accessed by their memory addresses as offsets from <em><strong>ebp</strong></em> (<em><strong>[ebp-8]</strong></em> and <em><strong>[ebp-4]</strong></em> respectively).</p>
<p>Looking at <em>lines 15-16</em> and <em>lines 18-19</em>, we can see that <em><strong>psOnHeap-&gt;iA</strong></em> and <em><strong>psOnHeap-&gt;iB</strong></em> are being accessed differently.</p>
<p>Since this is different to what we have seen before, let&#8217;s break it down a little:</p>
<ul>
<li>For each of these assignments, first the pointer <em><strong>psOnHeap</strong></em> (i.e. the memory address of the instance of <em><strong>STest</strong></em> created at line 7) is loaded into <em><strong>eax</strong></em> (<em>line 15 and line 18</em>), and&#8230;</li>
<li>&#8230; then the member is accessed via the memory address stored in <em><strong>eax</strong></em> (<em>line 16</em> and <em>line 19</em> &#8211; via <em><strong>[eax]</strong></em> and <em><strong>[eax+4]</strong></em> respectively).</li>
</ul>
<p>In particular, note that when <em><strong>STest::iB</strong></em> is accessed (at address <em><strong>[eax+4]</strong></em> &#8211; <em>line 19</em>) a 4 byte offset is added, which is exactly the offset that the output from <strong>/d1reportSingleClassLayout</strong> gave us.</p>
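<p>If you&#8217;d rather not rely on a compiler-specific switch, the standard <em><strong>offsetof</strong></em> macro from <em><strong>&lt;cstddef&gt;</strong></em> reports the same member offsets in portable C++. This is a minimal sketch that re-declares the article&#8217;s <em><strong>STest</strong></em>; <em><strong>kOffsetOfA</strong></em> and <em><strong>kOffsetOfB</strong></em> are names invented for illustration.</p>

```cpp
#include <cstddef>  // offsetof, std::size_t

// Same members, same order as the article's STest.
struct STest
{
    int iA;
    int iB;
};

// Hypothetical constants: the byte offset of each member from the start
// of an STest instance, i.e. the N in the [eax+N] addressing seen above.
constexpr std::size_t kOffsetOfA = offsetof(STest, iA);
constexpr std::size_t kOffsetOfB = offsetof(STest, iB);

// The first member always starts at the instance's base address...
static_assert(kOffsetOfA == 0, "iA is at offset 0");
// ...and iB must come after iA (4 bytes later on win32, matching [eax+4]).
static_assert(kOffsetOfB >= sizeof(int), "iB is laid out after iA");
```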
<p>Hopefully it should now be pretty obvious why the instances of <em><strong>STest</strong> </em>are accessed differently like this &#8211; and by extension why I showed code accessing an instance on the Stack and on the Heap (via a pointer):</p>
<ul>
<li>When an instance of a user defined type is on the Stack, the compiler is in charge of where the instance is stored (relative to the stack frame); and so it can access its members by their direct offsets within the stack frame.</li>
<li>When an instance is stored in a memory location that is not known at compile time (e.g. accessed via a pointer) the compiler can&#8217;t do this and has to access it via offsets from the instance&#8217;s base address (i.e. the memory address the instance starts at).</li>
</ul>
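<p>To make the pointer case concrete, here is a minimal sketch of what <em><strong>psOnHeap-&gt;iB = 4</strong></em> boils down to when the instance&#8217;s address is only known at run time: take the base address and write at a fixed byte offset from it. <em><strong>SetBViaOffset</strong></em> and <em><strong>DemoSetB</strong></em> are hypothetical helpers; in real code you would just write <em><strong>psOnHeap-&gt;iB</strong></em>, and the point is only that the compiler emits the equivalent of this arithmetic.</p>

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof
#include <cstring>  // std::memcpy

struct STest
{
    int iA;
    int iB;
};

// Hypothetical helper mirroring the debug disassembly:
//   mov eax,dword ptr [ebp-0Ch]   ; load the base address (the pointer)
//   mov dword ptr [eax+4],4       ; write at base address + offset of iB
void SetBViaOffset(STest* psInstance, int value)
{
    unsigned char* pBase = reinterpret_cast<unsigned char*>(psInstance);
    std::memcpy(pBase + offsetof(STest, iB), &value, sizeof(value));
}

// Usage sketch: has the same effect as psOnHeap->iB = 4.
void DemoSetB()
{
    STest* psOnHeap = new STest{ 3, 0 };
    SetBViaOffset(psOnHeap, 4);
    assert(psOnHeap->iA == 3);  // iA untouched
    assert(psOnHeap->iB == 4);  // iB written via base + offset
    delete psOnHeap;
}
```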
<p><em><strong>NOTE</strong></em>: this is debug disassembly, so please do not attempt to infer anything about the relative efficiency of Stack vs. Heap memory from it! As far as I am aware, on every machine I&#8217;ve ever used the Stack and the Heap are stored in the same memory and accessed via the same physical systems, so in terms of theoretical minimum access speed Stack == Heap.</p>
<p>&nbsp;</p>
<h2>What about class?</h2>
<p>The short answer to this question is that there <em><strong>is</strong></em> no difference whatsoever between <em><strong>class</strong></em>es and <em><strong>struct</strong></em>s at the implementation level of C++.</p>
<p>The long answer is that, in the C++ language, <em><strong>struct</strong></em> is actually a special case of <em><strong>class</strong></em> with one specific difference: for <em><strong>class</strong></em>es any unsupplied access specifier (i.e. <em><strong>public</strong></em>, <em><strong>protected</strong></em>, or <em><strong>private</strong></em> anywhere in the type declaration, including on base classes) defaults to <em><strong>private</strong></em>, but for <em><strong>struct</strong></em>s it defaults to <em><strong>public</strong></em>.</p>
<p>That&#8217;s it.  The only difference.  Honest.</p>
<p>Access specifiers are ultimately just language-level rules, checked at compile time, that let us control the way our classes are used; under the hood struct and class are implemented the same way &#8211; even with regard to stuff like inheritance and virtual functions.</p>
<p>If you do a search and replace of <em><strong>struct</strong></em> for <em><strong>class</strong></em> in the snippet and add a <em><strong>public:</strong></em> to the top of each class declaration (so it compiles) you will get the exact same output from the class layout information and the same disassembly.</p>
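<p>If you&#8217;d like the compiler to check this for you, something along the lines of this sketch should do it (<em><strong>CTest</strong></em>, <em><strong>SDerived</strong></em> and <em><strong>CDerived</strong></em> are names made up for the example). The last few lines show that the default access also applies when inheriting: a <em><strong>struct</strong></em> inherits publicly by default, a <em><strong>class</strong></em> privately.</p>

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_convertible

struct STest        // members are public by default
{
    int iA;
    int iB;
};

class CTest         // members are private by default, hence public:
{
public:
    int iA;
    int iB;
};

// Same members in the same order: same size and same member offsets.
static_assert(sizeof(STest) == sizeof(CTest), "same size");
static_assert(offsetof(STest, iB) == offsetof(CTest, iB), "same layout");

// Default access applies to base classes too.
struct SDerived : STest {};   // equivalent to struct SDerived : public STest
class  CDerived : CTest {};   // equivalent to class CDerived : private CTest

// A public base is reachable by implicit pointer conversion; a private one is not.
static_assert(std::is_convertible<SDerived*, STest*>::value, "public base");
static_assert(!std::is_convertible<CDerived*, CTest*>::value, "private base");
```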
<p>&nbsp;</p>
<h2>Union</h2>
<p>As well as <em><strong>class</strong></em> and <em><strong>struct</strong></em>, there is another keyword that can define a type &#8211; the keyword <em><strong>union</strong></em>.</p>
<p>It&#8217;s not a frequently seen or used language feature, which makes it all the more worth discussing here: it can be very useful, but because it is so rarely used a lot of people don&#8217;t really know what it&#8217;s for, let alone how it works.</p>
<p>Let&#8217;s look at this with another example code snippet. This has had two new types added to it:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">struct</span> STest
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iA<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iB<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">struct</span> STestTwo
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iC<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iD<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iE<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">union</span> UTestUnion
<span style="color: #009900;">&#123;</span>
    STest sTest<span style="color: #339933;">;</span>
    STestTwo sTestTwo<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    UTestUnion<span style="color: #339933;">*</span> psOnHeap <span style="color: #339933;">=</span> new UTestUnion<span style="color: #339933;">;</span>
&nbsp;
    psOnHeap<span style="color: #339933;">-&gt;</span>sTest.<span style="color: #202020;">iA</span>    <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
    psOnHeap<span style="color: #339933;">-&gt;</span>sTestTwo.<span style="color: #202020;">iC</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
&nbsp;
    psOnHeap<span style="color: #339933;">-&gt;</span>sTest.<span style="color: #202020;">iB</span>    <span style="color: #339933;">=</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">;</span> 
    psOnHeap<span style="color: #339933;">-&gt;</span>sTestTwo.<span style="color: #202020;">iD</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">;</span>
&nbsp;
    psOnHeap<span style="color: #339933;">-&gt;</span>sTestTwo.<span style="color: #202020;">iE</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">;</span>
&nbsp;
    delete psOnHeap<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Compiling this with <strong>/d1reportSingleClassLayout</strong> we get the following output for the layouts:</p>
<pre>1&gt; class STest	size(8):
1&gt;  	+---
1&gt;   0	| iA
1&gt;   4	| iB
1&gt;  	+---
1&gt;  
1&gt; class STestTwo	size(12):
1&gt;  	+---
1&gt;   0	| iC
1&gt;   4	| iD
1&gt;   8	| iE
1&gt;  	+---
1&gt;
1&gt; class UTestUnion	size(12):
1&gt;     +---
1&gt;   0 | STest sTest
1&gt;   0 | STestTwo sTestTwo
1&gt;     +---</pre>
<p>The first thing to note is that <em><strong>UTestUnion</strong></em> is the same size as <em><strong>STestTwo</strong></em>. This is exactly as one would expect.</p>
<p>The second thing to note is that both <em><strong>UTestUnion::sTest</strong></em> and <em><strong>UTestUnion::sTestTwo</strong></em> have an offset of 0 bytes within <em><strong>UTestUnion</strong></em>. Again, exactly as one would expect.</p>
<p>So, why is this the case?</p>
<p>The keyword <em><strong>union</strong></em> allows you to specify multiple layouts for a chunk of memory. When we declare the union of <em><strong>STest</strong></em> and <em><strong>STestTwo</strong></em> within <em><strong>UTestUnion</strong></em>, we declare our intent to be able to treat the memory of type <em><strong>UTestUnion</strong></em> as either an instance of <em><strong>STest</strong></em> or an instance of <em><strong>STestTwo</strong></em> at our discretion.</p>
<p>This means that, within the type <em><strong>UTestUnion</strong></em>, an instance of <em><strong>STest</strong></em> and an instance of <em><strong>STestTwo</strong></em> exist overlaid on each other. Since the union can be treated as either type, it must be at least as large as the larger of the two types (here it is exactly the same size, because no extra padding is needed).</p>
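<p>The overlap and the size can both be checked at compile time. This is a minimal sketch re-declaring the three types from the snippet above; <em><strong>kLargestMember</strong></em> is a name invented for the example. In general a union&#8217;s size is its largest member&#8217;s size rounded up to the union&#8217;s alignment, which for <em><strong>UTestUnion</strong></em> adds nothing, so the sketch only asserts &#8220;at least as large&#8221;.</p>

```cpp
#include <cstddef>  // offsetof, std::size_t

struct STest    { int iA; int iB; };
struct STestTwo { int iC; int iD; int iE; };

union UTestUnion
{
    STest    sTest;
    STestTwo sTestTwo;
};

// Hypothetical constant: the size of the largest member.
constexpr std::size_t kLargestMember =
    sizeof(STest) > sizeof(STestTwo) ? sizeof(STest) : sizeof(STestTwo);

// Every member of a union starts at offset 0, i.e. they are overlaid.
static_assert(offsetof(UTestUnion, sTest) == 0, "overlaid");
static_assert(offsetof(UTestUnion, sTestTwo) == 0, "overlaid");

// The union must be big enough to hold its largest member.
static_assert(sizeof(UTestUnion) >= kLargestMember, "fits the largest member");
```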
<p>Let&#8217;s back this up by looking at the disassembly:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">29</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;sTest<span style="color: #339933;">.</span>iA = <span style="color: #ff0000;">1</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">0122127C</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">0122127F</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span> 
    <span style="color: #ff0000;">30</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;sTestTwo<span style="color: #339933;">.</span>iC = <span style="color: #ff0000;">2</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01221285</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">01221288</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">2</span> 
    <span style="color: #ff0000;">31</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">32</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;sTest<span style="color: #339933;">.</span>iB = <span style="color: #ff0000;">3</span><span style="color: #666666; font-style: italic;">; </span>
<span style="color: #adadad; font-style: italic;">0122128E</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">01221291</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">+</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">3</span> 
    <span style="color: #ff0000;">33</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;sTestTwo<span style="color: #339933;">.</span>iD = <span style="color: #ff0000;">4</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01221298</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">0122129B</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">+</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">4</span> 
    <span style="color: #ff0000;">34</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">35</span><span style="color: #339933;">:</span> psOnHeap<span style="color: #339933;">-</span>&gt;sTestTwo<span style="color: #339933;">.</span>iE = <span style="color: #ff0000;">5</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">012212A2</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">012212A5</span> <span style="color: #00007f; font-weight: bold;">mov</span> <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">+</span><span style="color: #ff0000;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">5</span></pre></td></tr></table></div>

<p>There it is, clear as day :)</p>
<p>In case you&#8217;re not seeing it, here&#8217;s a quick breakdown:</p>
<ul>
<li>We can see that <em><strong>psOnHeap</strong></em> is stored at <em><strong>[ebp-4]</strong></em>.</li>
<li>(<em>line 3</em>) <strong><em>UTestUnion::sTest::iA</em></strong> and (<em>line 6</em>) <strong><em>UTestUnion::sTestTwo::iC</em></strong> are both being accessed directly via the value loaded into <strong>eax</strong> from <em><strong>[ebp-4]</strong></em> &#8211; i.e. at an offset of 0 bytes; the same as their offsets within their respective types as shown in the class layout information.</li>
<li>(<em>line 10</em>) <em><strong>UTestUnion::sTest::iB</strong></em> and (<em>line 13</em>) <em><strong>UTestUnion::sTestTwo::iD</strong></em> are both being accessed via <em><strong>[eax+4]</strong></em>, at an offset of 4 bytes from the value loaded into <em><strong>eax</strong></em> from <em><strong>[ebp-4]</strong></em>. Again, the same as their offsets within their respective types as shown in the class layout information.</li>
<li>(<em>line 17</em>) <strong><em>UTestUnion::sTestTwo::iE</em></strong> is accessed via <em><strong>[eax+8]</strong></em> &#8211; an offset of 8 bytes as specified in the class layout information.</li>
</ul>
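<p>The same overlap is visible from C++ without reading the disassembly: corresponding members of the two structs share an address. In this minimal sketch <em><strong>ByteOffset</strong></em> and <em><strong>DemoUnionOffsets</strong></em> are hypothetical helpers; they only compare addresses and never read a member value.</p>

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::ptrdiff_t

struct STest    { int iA; int iB; };
struct STestTwo { int iC; int iD; int iE; };

union UTestUnion
{
    STest    sTest;
    STestTwo sTestTwo;
};

// Hypothetical helper: byte distance from the start of the union to a member.
// Only addresses are used; no member value is read.
std::ptrdiff_t ByteOffset(const UTestUnion& u, const int* pMember)
{
    return reinterpret_cast<const char*>(pMember) -
           reinterpret_cast<const char*>(&u);
}

// Usage sketch: the offsets line up with the [eax], [eax+4] and [eax+8] accesses.
void DemoUnionOffsets()
{
    UTestUnion u{};
    assert(ByteOffset(u, &u.sTest.iA) == ByteOffset(u, &u.sTestTwo.iC));
    assert(ByteOffset(u, &u.sTest.iB) == ByteOffset(u, &u.sTestTwo.iD));
    assert(ByteOffset(u, &u.sTestTwo.iE) ==
           2 * static_cast<std::ptrdiff_t>(sizeof(int)));
}
```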
<p>A more &#8216;real world&#8217; example of the use of <em><strong>union</strong></em> might be a data structure used in a vector maths library similar to the one below:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="code"><pre class="c" style="font-family:monospace;">class CTestVec4
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">union</span> 
    <span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">struct</span> 
        <span style="color: #009900;">&#123;</span>
            <span style="color: #993333;">float</span> x<span style="color: #339933;">;</span>
            <span style="color: #993333;">float</span> y<span style="color: #339933;">;</span>
            <span style="color: #993333;">float</span> z<span style="color: #339933;">;</span>
            <span style="color: #993333;">float</span> w<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
        <span style="color: #993333;">struct</span>  
        <span style="color: #009900;">&#123;</span>
            <span style="color: #993333;">float</span> vec<span style="color: #009900;">&#91;</span> <span style="color: #0000dd;">4</span> <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>The above code declares a vector structure whose data can be accessed either via its components or like an array &#8211; e.g. <em><strong>CTestVec4::z</strong></em> occupies the same memory as <em><strong>CTestVec4::vec[ 2 ]</strong></em>.</p>
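<p>The code looks like it should be illegal, but compilers accept it &#8211; leaving all the names out is entirely deliberate. The unnamed <em><strong>union</strong></em> is an &#8220;anonymous union&#8221;, which makes the syntax for accessing its members &#8216;less cumbersome&#8217; (i.e. basically just less typing). Strictly speaking, anonymous unions are standard C++, but the anonymous <em><strong>struct</strong></em>s nested inside this one are a compiler extension (supported by Visual Studio, GCC and Clang) rather than part of the C++ standard.</p>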
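<p>Here is a minimal compilable sketch of the same idea, with two assumptions beyond the original snippet: a <em><strong>public:</strong></em> has been added (otherwise, being a <em><strong>class</strong></em>, all the members would be private), and <em><strong>ZAliasesVec2</strong></em> / <em><strong>DemoVec4</strong></em> are hypothetical helpers. The demo deliberately only compares addresses: writing <em><strong>z</strong></em> and then reading <em><strong>vec[ 2 ]</strong></em> is type punning through a union, which standard C++ leaves undefined even though mainstream compilers support it in practice.</p>

```cpp
#include <cassert>  // assert

// Anonymous union: standard C++. Anonymous structs inside it: a compiler
// extension (MSVC, GCC, Clang); -pedantic builds will warn about them.
class CTestVec4
{
public:  // added for this sketch so the members are usable outside the class
    union
    {
        struct
        {
            float x;
            float y;
            float z;
            float w;
        };
        struct
        {
            float vec[ 4 ];
        };
    };
};

// Hypothetical helper: true if z and vec[ 2 ] occupy the same memory.
bool ZAliasesVec2(const CTestVec4& v)
{
    return &v.z == &v.vec[ 2 ];
}

// Usage sketch: write through the array view, then check the overlap.
void DemoVec4()
{
    CTestVec4 v{};
    v.vec[ 2 ] = 3.0f;
    assert(ZAliasesVec2(v));
}
```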
<p>If you weren&#8217;t sure how <em><strong>union</strong></em> worked, or indeed what it was for, now you know :)</p>
<p>&nbsp;</p>
<h2>Surely we&#8217;re due a spanner in the works about now?</h2>
<p>We most certainly are! Well spotted.</p>
<p>There is an incredibly important low level aspect of the way memory is laid out within <em><strong>class</strong></em>es and <em><strong>struct</strong></em>s that I have deliberately skimmed over until now.</p>
<p>Consider the following snippet containing an innocent looking <em><strong>struct</strong> </em>declaration:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">struct</span> STestSpanner
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span>     iOne<span style="color: #339933;">;</span>
    <span style="color: #993333;">double</span>  fdTwo<span style="color: #339933;">;</span>
    <span style="color: #993333;">char</span>    chThree<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span>     iFour<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>When we compile this we get the following output from the class layout information:</p>
<pre>1&gt;  class STestSpanner    size(24):
1&gt;  	+---
1&gt;   0	| iOne
1&gt;    	| &lt;alignment member&gt; (size=4)
1&gt;   8	| fdTwo
1&gt;  16	| chThree
1&gt;    	| &lt;alignment member&gt; (size=3)
1&gt;  20	| iFour
1&gt;  	+---</pre>
<p>What witchcraft is this!?!</p>
<p>Compared with <em><strong>STest</strong></em>, we&#8217;ve added a <em><strong>char</strong></em> and a <em><strong>double</strong></em> to our <em><strong>struct</strong></em> &#8211; requiring a total of an extra <em><strong>9</strong></em> bytes (<em><strong>sizeof(char) == 1</strong></em> by definition, and on win32 <em><strong>sizeof(double) == 8</strong></em>).</p>
<p>However, the total size of the struct has increased by <em><strong>16</strong></em> bytes &#8211; in addition to the <strong>9</strong> bytes we know we asked for, the compiler has added <strong>7</strong> bytes of invisible &#8216;<em>alignment member</em>&#8217; fields.</p>
<p>What is going on? Padding &#8211; that&#8217;s what.</p>
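<p>You can measure that padding from inside C++ too. This is a minimal sketch re-declaring <em><strong>STestSpanner</strong></em>; the <em><strong>kDataBytes</strong></em>, <em><strong>kPaddingBytes</strong></em> and <em><strong>kPadAfter&#8230;</strong></em> constant names are made up. Under VS2010 win32 you would expect <em><strong>kPaddingBytes</strong></em> to be 7 (4 bytes after <em><strong>iOne</strong></em> and 3 after <em><strong>chThree</strong></em>, matching the layout output); other compilers and platforms may differ.</p>

```cpp
#include <cstddef>  // offsetof, std::size_t

struct STestSpanner
{
    int     iOne;
    double  fdTwo;
    char    chThree;
    int     iFour;
};

// Bytes of data we actually asked for.
constexpr std::size_t kDataBytes =
    sizeof(int) + sizeof(double) + sizeof(char) + sizeof(int);

// Everything else in the struct is padding added by the compiler.
constexpr std::size_t kPaddingBytes = sizeof(STestSpanner) - kDataBytes;

// Padding after a member = next member's offset - (member's offset + size).
constexpr std::size_t kPadAfterOne =
    offsetof(STestSpanner, fdTwo) - (offsetof(STestSpanner, iOne) + sizeof(int));
constexpr std::size_t kPadAfterTwo =
    offsetof(STestSpanner, chThree) - (offsetof(STestSpanner, fdTwo) + sizeof(double));
constexpr std::size_t kPadAfterThree =
    offsetof(STestSpanner, iFour) - (offsetof(STestSpanner, chThree) + sizeof(char));

// Any padding after the last member ("tail" padding).
constexpr std::size_t kTailPad =
    sizeof(STestSpanner) - (offsetof(STestSpanner, iFour) + sizeof(int));
```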
<p>&nbsp;</p>
<h2>Padding!?!</h2>
<p>As long as the layout of the data members of a type meets the ordering requirement of the language standard (as covered earlier), they do not have to be immediately adjacent in memory.</p>
<p>The compiler is free, encouraged even, to insert additional padding into your <em><strong>struct</strong></em>s / <em><strong>class</strong></em>es at its discretion.</p>
<p>Why might the compiler wish to adjust the layout like this?</p>
<p>The short answer is: to optimise for speed of memory access for the intrinsic types stored in that structure.</p>
<p>The longer answer goes something like this&#8230;</p>
<p>On each platform, the various intrinsic types have different sizes and typically also different memory alignment requirements.</p>
<p>In the best case, accessing an intrinsic type with an alignment requirement from an incorrectly aligned memory address will make the access slower than usual (on x86 the penalty is typically relatively small, but on other platforms the access can be an order of magnitude slower); in the worst case it can actually cause the CPU to raise a hardware exception and crash your program (yes, really &#8211; on some platforms unaligned access makes the CPU freak out).</p>
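<p>When you genuinely need a value from an address that might not be suitably aligned (for example, a field inside a tightly packed file or network buffer), a common portable approach is to copy the bytes with <em><strong>std::memcpy</strong></em> rather than dereference a cast pointer, which is undefined behaviour in C++ when the address is misaligned. This is a minimal sketch; <em><strong>ReadIntAt</strong></em>, <em><strong>WriteIntAt</strong></em> and <em><strong>DemoUnalignedRoundTrip</strong></em> are hypothetical helpers. The compiler is then free to pick an instruction sequence that is safe for the target: on x86 with optimisation on this typically becomes a single ordinary load.</p>

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy

// Hypothetical helper: read an int starting at any byte offset in a buffer.
// Avoids *reinterpret_cast<const int*>(pBytes + offset), which may be a
// misaligned access (slow, or a hardware fault on some CPUs).
int ReadIntAt(const unsigned char* pBytes, std::size_t offset)
{
    int value;
    std::memcpy(&value, pBytes + offset, sizeof(value));
    return value;
}

// Hypothetical helper: the matching write.
void WriteIntAt(unsigned char* pBytes, std::size_t offset, int value)
{
    std::memcpy(pBytes + offset, &value, sizeof(value));
}

// Usage sketch: round-trip an int stored at an odd (likely misaligned) offset.
void DemoUnalignedRoundTrip()
{
    unsigned char buffer[ 1 + sizeof(int) ] = {};
    WriteIntAt(buffer, 1, 12345);
    assert(ReadIntAt(buffer, 1) == 12345);
}
```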
<p>There are three separate factors at play in determining the size of <em><strong>STestSpanner:</strong></em></p>
<ol>
<li>The logical number of bytes required by its constituent types &#8211; this is the minimum possible size of the type.</li>
<li>The ordering of the constituent types within the class declaration.</li>
<li>The individual alignment requirements of those constituent types.</li>
</ol>
<p>The compiler honours both the ordering of the constituent types and their individual alignment requirements, and this interaction determines the amount of padding bytes that get added.</p>
<p>Since they affect each other and are closely related, the distinction between alignment, packing, and padding often causes confusion:</p>
<ul>
<li>Alignment is a constraint on the start address of instances of a type in memory</li>
<li>Packing is a constraint on the alignment of adjacent members within the memory of a struct or class</li>
<li>Padding is the bytes added within a <em><strong>class</strong> </em>or <em><strong>struct</strong> </em>to maintain its packing constraints</li>
</ul>
<p>This <a href="http://stackoverflow.com/questions/548164/mis-aligned-pointers-on-x86">Stack Overflow question</a> has a good discussion of the implications of unaligned access on x86 for those of you who are interested.</p>
<p>&nbsp;</p>
<h2>Sorry, why the padding?</h2>
<p>As we have covered before, compiler writers are wily.</p>
<p>You must be able to declare an array of any type that you define, and so the memory layout of a struct or class you define must not only maintain the alignment constraints of its constituent intrinsic types within an instance, but also across an array of instances laid out contiguously in memory.</p>
<p>The simplest way to ensure this is to give the type as a whole the largest alignment constraint of its constituent types, and to round its size up to a multiple of that alignment; this means that the size and internal padding of any structure you declare will usually end up being determined by the largest alignment requirement of its constituent types.</p>
<p>In the case of our struct this would be <em><strong>double</strong></em>, which has a default alignment of 8 bytes (at least under the Visual Studio x86 compiler; it varies with other compilers), and consequently so does <em><strong>STestSpanner</strong></em>. This is why the compiler adds 7 bytes of padding to take the struct from the 17 bytes of data we asked it to contain (2 <em><strong>int</strong></em>s at 4 bytes each, 1 <em><strong>double</strong></em> at 8 bytes, and 1 <em><strong>char</strong></em> at 1 byte) up to 24 bytes, i.e. the next multiple of 8 that maintains the alignment of <em><strong>double</strong></em>.</p>
<p>You should find that,  no matter how you shuffle the  members of <em><strong>STestSpanner</strong> </em>around, you still end up with a 24 byte structure that includes 7 bytes of padding.</p>
<p>On the plus side, if we needed to add 3 extra <em><strong>char</strong> </em>and an extra <em><strong>int</strong> </em>into <em><strong>STestSpanner</strong> </em>we would get that storage space for free as long as we put them in the correct positions in the type declaration :)</p>
<p>&nbsp;</p>
<h2>But what about all the wasted bytes?</h2>
<p>The compiler knows what it is doing, and 99% of the time you should not worry about the wasted space.</p>
<p>Get a cup of tea and a biscuit and make peace with it &#8211; it&#8217;s not wasted space, it&#8217;s space invested in making your memory access more efficient.</p>
<p>However, you should worry about it a little &#8211; because it is entirely possible to cause the compiler to introduce padding into a type that is <em>actually</em> a total waste of memory.</p>
<p>Consider this struct:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> STestSpannerTwo
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span>     iOne<span style="color: #339933;">;</span>
    <span style="color: #993333;">double</span>  fdTwo<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span>     iThree<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>which produces this class layout:</p>
<pre>1&gt;  class STestSpannerTwo	size(24):
1&gt;     +---
1&gt;   0 | iOne
1&gt;     | &lt;alignment member&gt; (size=4)
1&gt;   8 | fdTwo
1&gt;  16 | iThree
1&gt;     | &lt;alignment member&gt; (size=4)
1&gt;     +---</pre>
<p>A quick look at the class layout above tells us that there are 4 bytes of padding between <em><strong>STestSpannerTwo::iOne</strong></em> and <em><strong>STestSpannerTwo::fdTwo</strong></em>, and another 4 bytes of padding after <em><strong>STestSpannerTwo::iThree</strong></em>.</p>
<p>The IEEE 754 64-bit floating point format used to represent <em><strong>double</strong></em> on x86 is 8 bytes long and, under Visual Studio&#8217;s compiler, clearly has a default alignment of 8 bytes.</p>
<p>The constraints of our type declaration combined with the 8 byte alignment constraint for <em><strong>double</strong></em> have resulted in:</p>
<ul>
<li>4 padding bytes after <em><strong>STestSpannerTwo::iOne</strong></em> and before <em><strong>STestSpannerTwo::fdTwo </strong></em>to maintain the alignment within the memory of a single instance of <em><strong>STestSpannerTwo&#8230;</strong></em></li>
<li>&#8230;and 4 padding bytes after <em><strong>STestSpannerTwo::iThree</strong></em> to maintain the 8 byte alignment across instances of <em><strong>STestSpannerTwo</strong></em> that are contiguous in memory (i.e. an array of <em><strong>STestSpannerTwo</strong></em>).</li>
</ul>
<p>However, we can also see that <em><strong>STestSpannerTwo::iThree</strong></em> is a 4 byte <em><strong>int</strong></em>, so if we declare it immediately after <em><strong>STestSpannerTwo::iOne</strong></em> it will fit into the first block of padding, eliminating the need for all 8 padding bytes.</p>
<p>Re-ordering the members by hand like this takes 8 bytes off the total size of the struct, so in this case we can save 33% of the memory used by the struct (24 bytes down to 16) basically for free. Don&#8217;t take my word for it, try it!</p>
<p>Whilst this isn&#8217;t something you should lose sleep over, you should now be able to see the benefit in always taking a second to consider the most appropriate place to insert a new data member into an existing type ;)</p>
<p>&nbsp;</p>
<h2>&#8230;but what if I <em><strong>really</strong></em> need those padding bytes?</h2>
<p>Unsurprisingly, this being C/C++, it is entirely possible to ask the compiler to change its default alignment and packing behaviour.</p>
<p>This is usually accomplished by use of command line compiler options and/or compiler specific commands that are inserted inline in your code.</p>
<p>In Visual Studio, for example, there is the <em><strong>/Zp</strong></em> compiler option, and there are another 2 ways to affect the alignment of data structures and the packing of their members using compiler commands in the code itself: <a href="http://msdn.microsoft.com/en-us/library/83ythb65(v=vs.110).aspx"><em><strong>__declspec( align( x ) )</strong></em></a> and <a href="http://msdn.microsoft.com/en-us/library/2e70t5y1(v=vs.110).aspx"><em><strong>#pragma pack (x)</strong></em></a>. There may also be others I&#8217;ve never seen or used, but a quick search on t&#8217;interwebs didn&#8217;t find them.</p>
<p>For example, using <em><strong>#pragma pack</strong></em> to tell the compiler to pack <em><strong>STestSpanner </strong></em>to a 1 byte boundary like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#pragma pack(push, 1) // pack to 1 byte boundaries from here...</span>
<span style="color: #993333;">struct</span> STestSpanner
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span>     iOne<span style="color: #339933;">;</span>
    <span style="color: #993333;">double</span>  fdTwo<span style="color: #339933;">;</span>
    <span style="color: #993333;">char</span>    chThree<span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span>     iFour<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #339933;">#pragma pack(pop)     // ...and restore the previous packing afterwards</span></pre></td></tr></table></div>

<p>Gives this class layout output:</p>
<pre>1&gt;  class STestSpanner	size(17):
1&gt;  	+---
1&gt;   0	| iOne
1&gt;   4	| fdTwo
1&gt;  12	| chThree
1&gt;  13	| iFour
1&gt;  	+---</pre>
<p><em><strong>Be warned!</strong></em></p>
<p>Adjusting the packing of a type can (and probably will) break the alignment constraints of its constituent types. In the example above there are no &#8216;wasted&#8217; bytes, but <em><strong>STestSpanner</strong></em> no longer honours the alignment requirements of its constituent types, so it may well be slower to access than it would have been had we not fiddled about with it (how much slower depends on the platform).</p>
<p>This means that when you ask the compiler to change the packing of a type you need to be <em><strong>very</strong></em> careful.</p>
<p>My advice is that &#8211; in general &#8211; you shouldn&#8217;t mess with alignment or padding unless you have a <em>very</em> good reason to; altering padding and alignment can get you into a whole world of pain, especially with bigger projects and when using code libraries &#8211; here&#8217;s a link to <a href="http://blogs.msdn.com/b/vcblog/archive/2007/05/17/diagnosing-hidden-odr-violations-in-visual-c-and-fixing-lnk2022.aspx">a post on the Visual C++ Team Blog</a> that goes into it in some detail.</p>
<p>Decisions to change alignment or padding will typically come down to platform-specific trade-offs based on data from runtime profiling, balancing issues like data sizes vs. memory resource constraints, worst case access times of individual parts of the data, and issues relating to system cache sizes and alignments.</p>
<p>&nbsp;</p>
<h2>Summary</h2>
<p>In summary here are the main points I&#8217;d like you to take away from this post:</p>
<ul>
<li>The undocumented Visual C++ compiler options <strong>/d1reportSingleClassLayout</strong> and <strong>/d1reportAllClassLayout</strong> are awesome, and can help you to understand the memory use implications of code you write, as well as being very useful debugging tools</li>
<li>We now know that, when an instance of a structure is accessed via pointer,  its members are accessed via an offset from the instance&#8217;s base address in the assembly, and &#8230;</li>
<li>&#8230; that, logically, we can use this in the disassembly view to work out which member is being accessed</li>
<li>The difference between <em><strong>struct</strong></em> and <em><strong>class</strong></em>, i.e. that there is no low level representational difference</li>
<li>What the keyword <em><strong>union</strong></em> does and how it works</li>
<li>What padding and alignment are and some of their implications</li>
</ul>
<p>Next time we will look at how (simple) inheritance affects this picture&#8230;</p>
<p>&nbsp;</p>
<h2>Epilogue: Debugger Trick 17a</h2>
<p>Here&#8217;s an alternative way to find out the offset of a member within a user defined data type, a way that you can happily use in the debugger rather than having to compile the code.</p>
<p>This method works with the vast majority of debuggers I&#8217;ve used in the last 5 years or so on both PC and console; and it relies on the fact that the standard C style cast syntax works in watch windows (try it! It&#8217;s awesome).</p>
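<p>One upshot of this is that you can use casting to calculate the byte offset of any member of any user defined type (the same expression also compiles as C/C++ code, although strictly speaking the language standards leave its behaviour undefined; in code, the standard <em><strong>offsetof</strong></em> macro is the portable way to do this):</p>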
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/CLLC_10UserDefinedTypes_DebuggerTrick7a.png"><img alt="CLLC_10UserDefinedTypes_DebuggerTrick7a" src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/CLLC_10UserDefinedTypes_DebuggerTrick7a.png" width="545" height="107" /></a></p>
<p>That code looks horrible, not to mention dangerous, but what it&#8217;s doing is actually very simple and, in the watch window, totally safe.</p>
<p>We have seen that, when using a pointer to an instance of a type, the compiler accesses members of that user defined type by adding an offset to the memory address the instance is stored at (its &#8216;base address&#8217;).</p>
<p>Note that the values we&#8217;re seeing in the watch window are identical to those seen in the output from <strong>/d1reportSingleClassLayout</strong>.</p>
<p>Here&#8217;s how this works:</p>
<ul>
<li><em><strong>(STest*) 0</strong></em> - tells the debugger to treat 0 as the value of a pointer to an instance of <em><strong>STest</strong></em>. If you&#8217;re thinking &#8220;but 0 is NULL!&#8221;, remember that 0 is <em>only &#8216;NULL&#8217; by convention</em> (in fact, on some consoles, 0 is a valid memory address and can be accessed&#8230;). In any case, this expression isn&#8217;t accessing the memory; it&#8217;s just asking the debugger to treat 0 as the value of a pointer.</li>
<li><em><strong>&amp;(((STest*)0)-&gt;iB)</strong></em> - tells the debugger to calculate the address of <em><strong>STest::iB</strong></em>. Again, since this is just calculating an address and not attempting to access it, it is fine.</li>
</ul>
<p>This is possibly my favourite thing I have ever found out about debuggers, and has come in incredibly useful over the years :)</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C/C++ Low Level Curriculum Part 9: Loops</title>
		<link>http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/</link>
		<comments>http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/#comments</comments>
		<pubDate>Tue, 04 Sep 2012 18:00:25 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24837</guid>
		<description><![CDATA[<p>Welcome to the 9th post in this C/C++ low level curriculum series I&#8217;ve been doing. It&#8217;s been a long time since post 8 (way longer than I thought it was), a fact I can only apologise for. My 3 year old son stopped having a nap in the afternoon in late April and it&#8217;s totally ruined my productivity&#8230;</p>
<p><a href="http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/" class="more-link">Read more on C/C++ Low Level Curriculum Part 9: Loops&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Welcome to the 9th post in this C/C++ low level curriculum series I&#8217;ve been doing. It&#8217;s been a long time since post 8 (way longer than I thought it was), a fact I can only apologise for. My 3 year old son stopped having a nap in the afternoon in late April and it&#8217;s totally ruined my productivity&#8230;</p>
<p>This post covers the 3 built-in looping control structures <em><strong>while</strong></em>, <em><strong>do-while</strong></em>, and <em><strong>for</strong></em>, as well as the manual <strong>if-goto</strong> loop (old school!); as usual, we look in some detail at what the assembly generated by the compiler looks like. Did I forget about the new <a href="http://en.wikipedia.org/wiki/C%2B%2B11#Range-based_for-loop">range-based-for</a> loop that was added in the C++11 standard? Nope. If you have access to a C++11 compliant compiler you&#8217;re more than welcome to look at that yourself &#8211; think of it as homework&#8230;</p>
<p><span id="more-24837"></span>Here are the backlinks for preceding articles in the series (warning: it might take you a while, the first few are quite long):</p>
<ol>
<li><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/">http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/</a></li>
</ol>
<p>&nbsp;</p>
<h1>A brief history of looping</h1>
<p>It occurred to me that a sensible order to cover the looping constructs of the C/C++ language might be to address them in the order in which they were introduced into the language.</p>
<p>A couple of years back a friend showed me a brilliant website / article that covered the evolution of the C programming language. It was very interesting, and from what I can remember, contained information on the order in which the various features of the C compiler were added &#8211; including which looping construct came first. I tried to find it on t&#8217; internet, but failed. Feel free to link me up in a comment if you happen to know where it is&#8230;</p>
<p>Since I couldn&#8217;t find the article /website in question I&#8217;ve decided to cover them in the order of the amount of work they do automatically for the programmer, which in my opinion is: <strong>if-goto</strong>, <strong>while</strong>, <strong>do-while</strong>, and finally <strong>for</strong>.</p>
<p>This seems to me to be a sensible order for 2 reasons: firstly because it&#8217;s likely to be the order in which they were introduced into programming languages, and secondly because the concepts encapsulated by these constructs sort of build on each other in that order.</p>
<p>&nbsp;</p>
<h2>if-goto</h2>
<p>From our previous excursions into the land of assembly we are already familiar with the concept of <em><strong>j</strong></em>u<em><strong>mp</strong></em>ing the execution address, and with the concept of &#8216;conditional jumping&#8217; (i.e. conditionally changing the execution address). The most direct way to loop the execution of a piece of code several times (as opposed to the simplest to type) is to use the high level keywords that correspond to these assembly level concepts.</p>
<p>We are already familiar with the keyword <em><strong>if</strong></em>, but we&#8217;ve not really covered <em><strong>goto</strong></em>: possibly the most maligned of all the language features of C/C++, and almost certainly the most banned by corporate coding standards.</p>
<p>Personally I don&#8217;t think that <em><strong>goto</strong></em> is inherently <a href="http://xkcd.com/292/">more dangerous</a> than (for example) operator overloading; if you&#8217;re interested, here&#8217;s <a href="http://en.wikipedia.org/wiki/Goto">the Wikipedia page</a>, which contains a fair amount of detail on (and links to) the arguments for and against it.</p>
<p>The purpose of this article is not to discuss the merits of <strong>goto</strong> or, for that matter, operator overloading, so let&#8217;s get on with it.</p>
<p>Here&#8217;s the first code snippet (see the previous article for <a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">how to set up a project</a> that will just accept this code&#8230;)</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #339933;">#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> k_aiData<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">8</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSum       <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iLoop      <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
LoopStart<span style="color: #339933;">:</span>
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> iLoop <span style="color: #339933;">&lt;</span> ARRAY_SIZE<span style="color: #009900;">&#40;</span>k_aiData<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iSum <span style="color: #339933;">+=</span> k_aiData<span style="color: #009900;">&#91;</span> iLoop <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        <span style="color: #339933;">++</span>iLoop<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">goto</span> LoopStart<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>You should be able to see that this code is simply looping over the values in the array <em><strong>k_aiData</strong></em> and summing them, other than the use of <em><strong>if</strong></em> and <em><strong>goto</strong></em> it&#8217;s essentially a standard loop to iterate an array.</p>
<p>The pre-processor macro <strong>ARRAY_SIZE</strong> that I&#8217;ve used here is a simple way to make dealing with statically allocated arrays less error prone. Essentially we could initialise the array k_aiData with any number of elements we wanted to and the rest of the code would still just work. There are simple ways to achieve this in a type safe manner using templates too, but I chose to use a macro here because a readable version of the code takes up less vertical space than the template.</p>
<p>If you are wondering why I am not incrementing <strong>iLoop</strong> inside the square brackets, this is so that the high level code that is doing the work of the loop is identical across all code snippets.</p>
<p>If you are also wondering why I am using the prefix as opposed to postfix version of <strong>operator++</strong> then well done to you &#8211; award yourself 6.29 paying attention points. In this case it makes no difference to the assembly generated, but in these days of operator overloading it&#8217;s generally better to use the prefix version as a point of good practice, unless of course you require postfix behaviour (the first comment on the first answer to <a href="http://stackoverflow.com/questions/3846296/overloading-of-the-operator">this question on Stack Overflow</a> should prove illuminating if you don&#8217;t know what the implications of the different behaviours are).</p>
<p>Since we&#8217;re using two keywords that have a very clear relationship to assembly level concepts, it&#8217;s reasonable to assume that the disassembly for this code will be pretty much as we wrote it at the high level. As we all know, we should never assume; so let&#8217;s check our assumptions.</p>
<p>Here is the debug x86 disassembly for the looping section:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span> LoopStart<span style="color: #339933;">:</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>     if<span style="color: #009900; font-weight: bold;">&#40;</span> iLoop &lt; ARRAY_SIZE<span style="color: #009900; font-weight: bold;">&#40;</span>k_aiData<span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00BB1299</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">8</span>  
<span style="color: #adadad; font-style: italic;">00BB129D</span>  <span style="color: #00007f; font-weight: bold;">jae</span>         LoopStart<span style="color: #339933;">+</span><span style="color: #ff0000;">1Eh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0BB12B7h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>         iSum <span style="color: #339933;">+</span>= k_aiData<span style="color: #009900; font-weight: bold;">&#91;</span> iLoop <span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00BB129F</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00BB12A2</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00BB12A5</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #339933;">-</span><span style="color: #ff0000;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00BB12A9</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>  
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>         <span style="color: #339933;">++</span>iLoop<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00BB12AC</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00BB12AF</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>  
<span style="color: #adadad; font-style: italic;">00BB12B2</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span>         goto LoopStart<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00BB12B5</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         LoopStart <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0BB1299h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">18</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">19</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00BB12B7</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

<p>As expected, the disassembly for this is very straightforward, and you should be familiar with almost all of it from previous posts.</p>
<p>As we saw in the first <a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">article on conditionals</a>, the assembly code (lines 3 &amp; 4) that maps to the <strong>if</strong> statement (line 2) tests the logical opposite of the high level code. This is because the high level <strong>if</strong> conceptually &#8216;steps into&#8217; the curly brackets it controls if its test passes, whereas the assembly has to jump past the assembly code generated by the content of the <strong>if</strong> in order to <em><strong>not</strong></em> execute it (<em>remember:</em> curly brackets are a high level convenience for programmers!).</p>
<p>In this case, <em>line 3</em> compares <strong>iLoop</strong> (at address <em><strong>[ebp-2Ch]</strong></em>) to 8 (the size of the array obtained from <strong>ARRAY_SIZE</strong> is a compile time constant), and (<em>line 4</em>) uses <em><strong>jae</strong></em> (<em><strong>j</strong></em>ump if <em><strong>a</strong></em>bove or <em><strong>e</strong></em>qual) to conditionally jump execution to <strong>LoopStart+1Eh (0BB12B7h) &#8211; </strong>which is the memory address immediately after the assembly generated by the content of the curly brackets controlled by the <strong>if</strong> statement.</p>
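<p>One detail worth drawing out is why the jump is <em><strong>jae</strong></em>, an unsigned condition, rather than the signed <em><strong>jge</strong></em>. <strong>ARRAY_SIZE</strong> is built from <strong>sizeof</strong>, which yields a <strong>size_t</strong>. That is an unsigned type, so the usual arithmetic conversions turn <strong>iLoop</strong> into an unsigned value before the comparison is made. The sketch below shows this. It assumes a C11 compiler for <strong>static_assert</strong>. The file-scope array and the helper <strong>loop_test</strong> are hypothetical and exist only for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))

static const int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* sizeof is evaluated at compile time, so ARRAY_SIZE( k_aiData ) is the
   constant 8 and can even be checked at compile time (C11 static_assert) */
static_assert( ARRAY_SIZE( k_aiData ) == 8, "expected 8 elements" );

/* The loop test from the article (hypothetical helper). ARRAY_SIZE yields
   a size_t, which is unsigned, so iLoop is converted to size_t before the
   comparison. An unsigned comparison is consistent with the unsigned jumps
   (jae / jb) seen in the disassembly, rather than the signed ones (jge / jl). */
static int loop_test( int iLoop )
{
    return iLoop < ARRAY_SIZE( k_aiData );
}
```

<p>One consequence is that <strong>loop_test( -1 )</strong> returns 0 rather than 1. Converted to <strong>size_t</strong>, -1 becomes the largest representable value, which is not below 8.</p>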
<p>The next block of assembly adds the <strong>iLoop</strong>-th element of <strong>k_aiData</strong> to <strong>iSum</strong>. By this point, we should all be familiar with the assembly for adding two integers, and the way in which the elements of <em><strong>k_aiData</strong></em> are accessed is the only real new assembly code idiom that we&#8217;re seeing in this disassembly.</p>
<p>The instruction that accesses the <strong>iLoop</strong>-th element from the array is doing a surprising amount of work for an assembly instruction; certainly this is the first time that we&#8217;ve seen any significant computation being performed within a single line of assembly code, and it&#8217;s all occurring in the square brackets in the place that usually contains the address of the value we wish to access.</p>
<p>So, let&#8217;s look at it in detail:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>9
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">add</span> <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #339933;">-</span><span style="color: #ff0000;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span></pre></td></tr></table></div>

<p>When <em>line 9</em> is executed, the <strong>eax</strong> register holds the value of <strong>iLoop</strong> and <em><strong>[ebp-24h]</strong></em> is the address of the array <strong>k_aiData</strong>.</p>
<p>Since <strong>k_aiData</strong> is an array of <strong>int</strong>, the address of <strong>k_aiData[ 0 ]</strong> is <em><strong>[ebp-24h]</strong></em> and <strong>sizeof( int )</strong> is <strong>4</strong> on the x86, it should be pretty obvious that the computation <em><strong>[ebp+eax*4-24h]</strong></em> on line 9 equates to the memory address of the <strong>iLoop</strong>-th element of <strong>k_aiData</strong>.</p>
<p>If you&#8217;re having trouble seeing it, here is the address computation seen in the disassembly rearranged step by step so that we can swap out the registers and memory addresses for the high level variables:</p>
<p style="padding-left: 30px"><em><strong>ebp+eax*4-24h</strong></em></p>
<p style="padding-left: 30px"><em><strong>= ebp + ( eax * 4 ) + ( -24h )</strong></em></p>
<p style="padding-left: 30px"><em><strong>= ebp + ( -24h ) + ( eax * 4 )</strong></em></p>
<p style="padding-left: 30px"><em><strong>= ( ebp - 24h ) + ( eax * 4 )</strong></em></p>
<p style="padding-left: 30px"><em><strong>= &amp;k_aiData[ 0 ] + ( iLoop * sizeof( int ) ) bytes</strong></em></p>
<p style="padding-left: 30px"><em><strong>= &amp;k_aiData[ iLoop ]</strong></em></p>
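<p>The same address computation can be written out in C. This is a minimal sketch, and <strong>element_address</strong> is a hypothetical helper. The arithmetic goes through a <strong>const char*</strong> because it is done in bytes. C pointer arithmetic on an <strong>int*</strong> already scales the index by <strong>sizeof( int )</strong>, so in C, <strong>&amp;k_aiData[ 0 ] + iLoop</strong> is the same address as the byte computation above.</p>

```c
#include <stddef.h>

/* Mirrors the addressing mode [ebp+eax*4-24h]: a base address plus an
   index scaled by the element size. The sum is done in bytes, hence the
   casts through const char*; writing base + index * sizeof( int ) on an
   int* would scale by sizeof( int ) twice. */
static const int *element_address( const int *base, size_t index )
{
    const char *bytes = (const char *)base;                /* &k_aiData[ 0 ]              */
    return (const int *)( bytes + index * sizeof( int ) );  /* + ( iLoop * sizeof( int ) ) */
}
```

<p>For any valid index, <strong>element_address( k_aiData, iLoop )</strong> should compare equal to <strong>&amp;k_aiData[ iLoop ]</strong>.</p>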
<p>Now that we&#8217;ve examined the only new element of this disassembly, the rest of this post should clip along fairly quickly :)</p>
<p>So, after the value stored in the <strong>iLoop</strong>-th element of <strong>k_aiData</strong> has been added to <strong>iSum</strong>, all that remains is to increment <strong>iLoop</strong> (lines 12-14) and then jump back to the label at the start of the loop (line 16).</p>
<p>Clearly this will continue until <strong>iLoop</strong> &gt;= 8, and so we can see that the assembly is <a href="http://en.wikipedia.org/wiki/Isomorphism">isomorphic</a> with the high level code.</p>
<p>&nbsp;</p>
<h3><strong>Why add Looping Constructs?</strong></h3>
<p>Since looping behaviour can be achieved simply by using <strong>if-goto</strong>, this raises the question &#8220;Why did <a href="http://en.wikipedia.org/wiki/Dennis_Ritchie">Dennis Ritchie</a> (sadly no longer with us) bother with the rest of the looping constructs available in C?&#8221;</p>
<p>Three main reasons spring to mind. The first is efficiency (of typing rather than execution), the second is robustness, and the third is clarity of intent.</p>
<p>Writing a loop using the <strong>if-goto</strong> idiom involves a fair amount of typing, and loops are very common in most code bases. No-one likes to type more than they have to &#8211; especially programmers. Since the programmers using the language were probably originally the programmers <em>of</em> the language, it was more or less inevitable that a more textually terse way of writing loops would come about.</p>
<p>Secondly, and more importantly, the code involved in writing any two <strong>if-goto</strong> loops is very similar. Writing it out by hand each time is more error prone (as well as more tedious) than using a construct made specifically for looping, which removes the need for the explicit <strong>goto</strong> and its associated label.</p>
<p>Thirdly, and possibly even more importantly, an explicit looping construct makes the <em>intent</em> of the code far clearer. <strong>if</strong> and <strong>goto</strong> both have plenty of uses other than looping. Any programmer who later reads code containing an <strong>if-goto</strong> loop would have to expend significant mental effort just to work out that the code is in fact a loop, which would clearly be very bad.</p>
<p>Taken together, these three reasons mean that you will almost certainly never write a loop using <strong>if-goto</strong> other than for fun, and you certainly won&#8217;t need to. I&#8217;m only covering it because I feel it&#8217;s worth considering as a step in the evolution of looping constructs in languages.</p>
<p>&nbsp;</p>
<h2>while</h2>
<p>So, we come to <strong>while</strong>. The while loop is basically an automatic <strong>if-goto</strong>, and we will see this when we look at the disassembly (which is essentially why I covered the <strong>if-goto</strong> in the first place).</p>
<p>Here&#8217;s the code snippet upgraded to use <strong>while</strong>:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #339933;">#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> k_aiData<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">8</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSum       <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iLoop      <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span> iLoop <span style="color: #339933;">&lt;</span> ARRAY_SIZE<span style="color: #009900;">&#40;</span>k_aiData<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iSum <span style="color: #339933;">+=</span> k_aiData<span style="color: #009900;">&#91;</span> iLoop <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        iLoop<span style="color: #339933;">++;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Clearly the high level code looks neater already. More importantly, the manual work of putting the <strong>if</strong> and <strong>goto</strong> in the right places has been removed. That makes it much harder to get something wrong through human error, and it&#8217;s instantly obvious that the code is looping over the content of the array <strong>k_aiData</strong>.</p>
<p>Much better &#8211; well done programming language designers of yesteryear!</p>
<p>Now let&#8217;s have a look at the (dis)assembly that it generates&#8230;</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>     while<span style="color: #009900; font-weight: bold;">&#40;</span> iLoop &lt; ARRAY_SIZE<span style="color: #009900; font-weight: bold;">&#40;</span>k_aiData<span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">013E1299</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">8</span>  
<span style="color: #adadad; font-style: italic;">013E129D</span>  <span style="color: #00007f; font-weight: bold;">jae</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">77h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">13E12B7h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>         iSum <span style="color: #339933;">+</span>= k_aiData<span style="color: #009900; font-weight: bold;">&#91;</span> iLoop <span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">013E129F</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">013E12A2</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">013E12A5</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #339933;">-</span><span style="color: #ff0000;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">013E12A9</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>  
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>         iLoop<span style="color: #339933;">++</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">013E12AC</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">013E12AF</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>  
<span style="color: #adadad; font-style: italic;">013E12B2</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
<span style="color: #adadad; font-style: italic;">013E12B5</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">59h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">13E1299h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">013E12B7</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

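<p>Almost entirely unsurprisingly, the assembly generated from the <strong>while</strong> is essentially identical to that generated for the <strong>if-goto</strong> we just looked at. Only the absolute addresses have changed, and their low-order digits (such as <strong>&#8230;1299h</strong> and <strong>&#8230;12B7h</strong>) are the same in both listings. The jump targets are also now shown relative to <strong>main</strong> rather than to the <strong>LoopStart</strong> label.</p>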
<p>This is the sort of thing that restores my faith in humanity; well, in compiler programmers specifically but they&#8217;re still human. I assume.</p>
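<p>The equivalence is also visible at the source level. Below is a sketch of the <strong>if-goto</strong> loop (reconstructed from its disassembly listing) and the <strong>while</strong> loop, each rewritten as a function over a pointer and an element count. The names <strong>sum_if_goto</strong> and <strong>sum_while</strong> and their signatures are hypothetical, chosen so the two can be called side by side. For the same input they should return the same sum.</p>

```c
#include <stddef.h>

/* The if-goto loop: the test, the jump back and the label are all manual */
static int sum_if_goto( const int *data, size_t count )
{
    int iSum = 0;
    size_t iLoop = 0;
LoopStart:
    if( iLoop < count )
    {
        iSum += data[ iLoop ];
        ++iLoop;
        goto LoopStart;
    }
    return iSum;
}

/* The while loop: the compiler supplies the test and the jump back */
static int sum_while( const int *data, size_t count )
{
    int iSum = 0;
    size_t iLoop = 0;
    while( iLoop < count )
    {
        iSum += data[ iLoop ];
        iLoop++;
    }
    return iSum;
}
```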
<p>&nbsp;</p>
<h2>do-while</h2>
<p>Let&#8217;s move swiftly on with the code snippet for the next type of loop, the <strong>do-while</strong>.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #339933;">#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> k_aiData<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">8</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSum       <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iLoop      <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">do</span> 
    <span style="color: #009900;">&#123;</span>
        iSum <span style="color: #339933;">+=</span> k_aiData<span style="color: #009900;">&#91;</span> iLoop <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        <span style="color: #339933;">++</span>iLoop<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span> 
    <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span> iLoop <span style="color: #339933;">&lt;</span> ARRAY_SIZE<span style="color: #009900;">&#40;</span>k_aiData<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Essentially the same code, but now we&#8217;re testing the loop&#8217;s exit condition at the end of each iteration rather than at the beginning. This also means the body of a <strong>do-while</strong> always executes at least once.</p>
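<p>The practical consequence of testing at the end is easy to demonstrate. A <strong>while</strong> body can run zero times, but a <strong>do-while</strong> body always runs at least once. In the sketch below, each function simply counts how many times its loop body executes for a given element count. The helpers <strong>passes_while</strong> and <strong>passes_do_while</strong> are hypothetical. For the article&#8217;s eight-element array this makes no difference. However, a <strong>do-while</strong> over an empty array would read past its end.</p>

```c
#include <stddef.h>

/* while: the test runs before the first pass, so the body may never run */
static int passes_while( size_t count )
{
    int iPasses = 0;
    size_t iLoop = 0;
    while( iLoop < count )
    {
        ++iPasses;
        ++iLoop;
    }
    return iPasses;
}

/* do-while: the test runs after each pass, so the body runs at least once */
static int passes_do_while( size_t count )
{
    int iPasses = 0;
    size_t iLoop = 0;
    do
    {
        ++iPasses;
        ++iLoop;
    }
    while( iLoop < count );
    return iPasses;
}
```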
<p>Assuming all is sane in the universe, it would be reasonable to expect the assembly generated for this code to turn out very similar to that of the previous two loops. The only expected difference is that the testing code will probably come after the body of the loop rather than before it&#8230;</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>     <span style="color: #0000ff; font-weight: bold;">do</span> 
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>         iSum <span style="color: #339933;">+</span>= k_aiData<span style="color: #009900; font-weight: bold;">&#91;</span> iLoop <span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CC1299</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00CC129C</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00CC129F</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #339933;">-</span><span style="color: #ff0000;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00CC12A3</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>  
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>         <span style="color: #339933;">++</span>iLoop<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CC12A6</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00CC12A9</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>  
<span style="color: #adadad; font-style: italic;">00CC12AC</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span> 
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span>     while<span style="color: #009900; font-weight: bold;">&#40;</span> iLoop &lt; ARRAY_SIZE<span style="color: #009900; font-weight: bold;">&#40;</span>k_aiData<span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CC12AF</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">8</span>  
<span style="color: #adadad; font-style: italic;">00CC12B3</span>  <span style="color: #00007f; font-weight: bold;">jb</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">59h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0CC1299h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">18</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CC12B5</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

<p>As expected then, the code doing the work of the loop and incrementing <strong>iLoop</strong> is basically identical.</p>
<p>Also as expected, the conditional jump that keeps the loop going is a little different. It uses the jump instruction <em><strong>jb</strong></em> (<em><strong>j</strong></em>ump if <em><strong>b</strong></em>elow), so it tests the same condition as the high level code. This is unlike almost all the other assembly we&#8217;ve looked at that was generated from high level conditionals. But why?</p>
<p>As discussed earlier, the high level language concept of &#8216;curly bracket scope&#8217; doesn&#8217;t exist at the assembly level. Despite this, the compiler has to generate assembly code that is logically <a href="http://en.wikipedia.org/wiki/Isomorphism">isomorphic</a> with the high level code; so in order to satisfy the high level behavioural constraint of &#8216;stepping into&#8217; the curly bracketed code if a pre-condition <em><strong>is</strong></em> met, the assembly skips over the code within the curly brackets if the condition <em><strong>isn&#8217;t</strong></em> met.</p>
<p>In a <strong>do-while</strong> loop, the looping condition is a post-condition, i.e. it sits at the end of the &#8216;curly bracket scope&#8217; it controls. Both the high level code and the assembly therefore need to jump back to the start of the loop if the looping condition is met, and so the test in the assembly is the same as the one in the high level code.</p>
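<p>The shape of the <strong>do-while</strong> disassembly can be sketched in C with <strong>goto</strong>. This mirrors the structure of the listing rather than anything a compiler literally emits. The function name and the <strong>LoopStart</strong> label are hypothetical. Note that the body comes first, and the test keeps the same sense as the source, jumping backwards while it holds (the role <em><strong>jb</strong></em> plays) instead of jumping forwards past the body.</p>

```c
#include <assert.h>
#include <stddef.h>

/* do { body } while( iLoop < count ); lowered by hand with goto.
   The body comes first; the test keeps the same sense as the source
   and jumps backwards while it holds (like jb in the disassembly). */
static int sum_do_while_lowered( const int *data, size_t count )
{
    /* the body runs before the first test, so there must be an element */
    assert( data != NULL && count > 0 );

    int iSum = 0;
    size_t iLoop = 0;
LoopStart:
    iSum += data[ iLoop ];
    ++iLoop;
    if( iLoop < count )
    {
        goto LoopStart;
    }
    return iSum;
}
```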
<p>&nbsp;</p>
<h2>for</h2>
<p>So, we come to the <strong>for</strong> loop, the loop you probably use the most often.</p>
<p>The <strong>for</strong> loop was the looping construct that worked hardest for you until the C++11 ISO standard introduced the &#8216;range-based&#8217; <strong>for</strong> to the language this time last year (not counting the various template based solutions). Range-based <strong>for</strong> is supported in the recently released VC2012. Unfortunately, support for the C++11 standard is patchy at best on most video game platforms, so the <strong>for</strong> loop is still the default solution.</p>
<p>Let&#8217;s take a second to look at the &#8216;anatomy of a loop&#8217;. More or less any loop has 3 responsibilities in addition to the work it does on each iteration:</p>
<p style="padding-left: 30px">a) declare and/or initialise loop state variables<br />
b) test loop exit condition<br />
c) update state variables for the next loop</p>
<p>These 3 responsibilities define the scope and manner of the iteration the loop is doing, and therefore can be seen as the &#8216;fingerprint&#8217; of that iteration.</p>
<p>The <strong>for</strong> loop is a &#8216;language level refactoring&#8217; that gathers these three responsibilities into one construct, giving them textual adjacency and thus making the entire fingerprint visible in one place.</p>
<p>Whilst this is pretty obvious when you stop to examine it, the importance of explicitly stating this should not be underestimated.</p>
<p>Why? Let&#8217;s look at <strong>for</strong> compared to <strong>while</strong>, replacing the code with the corresponding <em><strong>a</strong></em>, <em><strong>b</strong></em>, or <em><strong>c</strong> </em>from the list above.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span> a<span style="color: #339933;">;</span> b<span style="color: #339933;">;</span> c<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">//do work</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>as opposed to:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
</pre></td><td class="code"><pre class="c" style="font-family:monospace;">a<span style="color: #339933;">;</span>
<span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span> b <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">//do work</span>
    c<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>So, the <strong>for</strong> loop takes up less vertical space than the <strong>while</strong> (in this instance at least), but what, if anything, are its other advantages?</p>
<ul>
<li>variables declared by <em><strong>a</strong></em> in the <strong>for</strong> are scoped to the loop. Smaller scope == less entropy == fewer bugs.</li>
<li><em><strong>c</strong></em> is obviously distinct from the work code of the loop in the <strong>for</strong>, but not so in the <strong>while</strong> (be honest; how many times have you accidentally written an infinite <strong>while</strong> loop because you forgot to increment at the end?)</li>
<li>the adjacency of <em><strong>a</strong></em>,<em><strong> b</strong></em>, and <em><strong>c</strong></em> in the <strong>for</strong> allows possible bugs with loop conditions to be spotted more easily</li>
</ul>
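<p>As a small illustration, here is the same sum written both ways (a sketch; <strong>SumWithFor</strong> and <strong>SumWithWhile</strong> are hypothetical names, not the functions used later in this post). The comments mark <em><strong>a</strong></em>, <em><strong>b</strong></em>, and <em><strong>c</strong></em>; note how the <strong>while</strong> version spreads them over three different places, and how its <strong>iLoop</strong> outlives the loop.</p>

```c
#include <assert.h>

/* a = initialise, b = test exit condition, c = update for next iteration */
int SumWithFor(const int* piData, int iCount)
{
    int iSum = 0;
    assert(iCount >= 0);
    for (int iLoop = 0; iLoop < iCount; ++iLoop)   /* a; b; c together */
    {
        iSum += piData[iLoop];
    }
    /* iLoop is out of scope here */
    return iSum;
}

int SumWithWhile(const int* piData, int iCount)
{
    int iSum  = 0;
    int iLoop = 0;              /* a: still in scope after the loop      */
    assert(iCount >= 0);
    while (iLoop < iCount)      /* b                                      */
    {
        iSum += piData[iLoop];
        ++iLoop;                /* c: forget this and the loop never ends */
    }
    return iSum;
}
```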
<p>Whoever invented the <strong>for</strong> loop deserves a pat on the back, because <strong>for</strong> takes the improvements made by the <strong>while</strong> and <strong>do-while</strong> loops to the next level, reducing human error and increasing clarity of intent even further.</p>
<p>I looked it up, and the earliest equivalent to <strong>for</strong> that I could find by googling is the <a href="http://en.wikibooks.org/wiki/Fortran/Fortran_control#Loops">DO loop in FORTRAN</a>, which was introduced in 1957 by a team at IBM led by the late <a href="http://en.wikipedia.org/wiki/John_Backus">John Backus</a>. Since that&#8217;s about as close to an answer as I feel I need to get, I now invite you to join me in a posthumous air high-five to John to celebrate his team&#8217;s sterling work.</p>
<p>Let&#8217;s look at one now, shall we? Here&#8217;s the code snippet:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #339933;">#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> k_aiData<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">8</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSum       <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iLoop <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span> iLoop <span style="color: #339933;">&lt;</span> ARRAY_SIZE<span style="color: #009900;">&#40;</span>k_aiData<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #339933;">++</span>iLoop <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iSum <span style="color: #339933;">+=</span> k_aiData<span style="color: #009900;">&#91;</span> iLoop <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span> 
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>&#8230;and here&#8217;s the disassembly (n.b. I un-ticked the &#8216;Show symbol names&#8217; check box in the disassembly display options for this&#8230;)</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     for<span style="color: #009900; font-weight: bold;">&#40;</span> <span style="color: #00007f; font-weight: bold;">int</span> iLoop = <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">; iLoop &lt; ARRAY_SIZE(k_aiData); ++iLoop )</span>
<span style="color: #adadad; font-style: italic;">00DC1292</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>  
<span style="color: #adadad; font-style: italic;">00DC1299</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         00DC12A4  
<span style="color: #adadad; font-style: italic;">00DC129B</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00DC129E</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>  
<span style="color: #adadad; font-style: italic;">00DC12A1</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">00DC12A4</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">8</span>  
<span style="color: #adadad; font-style: italic;">00DC12A8</span>  <span style="color: #00007f; font-weight: bold;">jae</span>         00DC12B9  
    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>         iSum <span style="color: #339933;">+</span>= k_aiData<span style="color: #009900; font-weight: bold;">&#91;</span> iLoop <span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00DC12AA</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00DC12AD</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00DC12B0</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #339933;">-</span><span style="color: #ff0000;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00DC12B4</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>  
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span> 
<span style="color: #adadad; font-style: italic;">00DC12B7</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         00DC129B  
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span> 
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00DC12B9</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

<p>Sooooo &#8230; this one looks a little different, right? It&#8217;s not very different though, just re-organised a little:</p>
<ol>
<li><em>Lines 2-3</em>: are initialising <strong>iLoop</strong> (i.e. <em><strong>[ebp-2Ch]</strong></em>) to 0, and then jumping over <em>lines 4-6</em></li>
<li><em>Lines 4-6</em>: are incrementing <strong>iLoop</strong></li>
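<li><em>Lines 7-8</em>: are comparing <strong>iLoop</strong> with 8 and exiting the loop by jumping to <em>line 19</em> if <strong>iLoop</strong> &gt;= 8 (n.b. this is a pre-condition check, so it tests the opposite of the high level condition; the jump is the unsigned <em><strong>jae</strong></em>, jump if above or equal, because <strong>ARRAY_SIZE</strong> evaluates to an unsigned <strong>size_t</strong>, so <strong>iLoop</strong> is converted to unsigned for the comparison)</li>
<li><em>Lines 11-14</em>: are indexing the array and accumulating the sum of element values (this should look very familiar by now)</li>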
<li><em>Line 16</em>: loops back to <em>line 4</em></li>
</ol>
<p>So, the assembly in each of steps 1, 2, and 3 implements one of the semi-colon separated parts of the <strong>for</strong> loop&#8217;s &#8216;parameters&#8217;; in fact, steps 1 to 3 correspond to <em><strong>a</strong></em> (initialise), <em><strong>c</strong></em> (increment), and <em><strong>b</strong></em> (test exit condition) respectively in our &#8216;anatomy of a loop&#8217; list above.</p>
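<p>Of these three steps, only 1 and 3 are executed before the first iteration of the loop, and only 2 and 3 before each subsequent iteration.</p>
<p>Also note that steps 2 and 3 appear in the opposite order in the assembly compared to the high level code. This is, again, down to the disparity between high level nicety and low level execution: placing the increment directly above the test means the single backward jump at the end of the loop body runs the increment followed by the test, while step 1 simply jumps over the increment on the way in.</p>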
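<p>If it helps to see that layout in C, here&#8217;s a rough hand translation of the debug assembly using <strong>goto</strong> (a sketch; <strong>SumForAsGoto</strong> is a hypothetical name, and the count is passed as a plain <strong>int</strong> rather than coming from <strong>ARRAY_SIZE</strong>). The labels line up with steps 1 to 5 above: initialise and jump over the increment, increment, test with the inverted condition, do the work, and jump back.</p>

```c
#include <assert.h>

/* Hand translation of the debug layout of
       for( int iLoop = 0; iLoop < iCount; ++iLoop ) { iSum += piData[ iLoop ]; } */
int SumForAsGoto(const int* piData, int iCount)
{
    int iSum = 0;
    int iLoop;

    assert(iCount >= 0);

    iLoop = 0;                  /* step 1: a (initialise)...            */
    goto test;                  /* ...then jump over the increment      */
increment:
    ++iLoop;                    /* step 2: c (update)                   */
test:
    if (iLoop >= iCount)        /* step 3: b, inverted, exits the loop  */
        goto done;
    iSum += piData[iLoop];      /* step 4: the loop body                */
    goto increment;             /* step 5: back to the increment        */
done:
    return iSum;
}
```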
<p>So, the assembly that is generated from a <strong>for</strong> loop is more or less as you might expect. We&#8217;ve covered all the (non-templated-non-C++11) looping constructs now, end of story &#8211; next article. Move along please.</p>
<p>&nbsp;</p>
<h2>Wait! I&#8217;m not quite finished!</h2>
<p>Hold on! The main reason the <a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">last post</a> was about how to look at optimised assembly is that I wanted to look at the optimised assembly generated by the C++ looping constructs in this post.</p>
<p>So, rather than re-compile all the snippets one by one, let&#8217;s set up the project <a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/">just like in post 8</a>, and then download and paste in this code (a massive &#8216;snippet&#8217;): <a href="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CPPLLC_Part9MoreLoops.txt" target="_blank">CPPLLC_Part9MoreLoops</a>.</p>
<p>This file contains a simple program that has 4 functions in addition to <strong>main</strong> &#8211; they are:</p>
<ul>
<li><strong>SumGoto</strong> &#8211; sums the elements of an array using an <strong>if-goto</strong> loop</li>
<li><strong>SumWhile</strong> &#8211; sums the elements of an array using a <strong>while</strong> loop</li>
<li><strong>SumDo</strong> &#8211; sums the elements of an array using a <strong>do-while</strong> loop, and</li>
<li><strong>SumFor</strong> &#8211; sums the elements of an array using a <strong>for</strong> loop</li>
</ul>
<p>All very straightforward really. The only unusual thing you might notice is that <strong>main</strong> looks like this:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">// array and a nice const for the size</span>
    <span style="color: #993333;">const</span> <span style="color: #993333;">int</span> k_iArraySize <span style="color: #339933;">=</span> <span style="color: #0000dd;">8</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span>       k_aiData<span style="color: #009900;">&#91;</span> k_iArraySize <span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">7</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #993333;">int</span> iSumGoto  <span style="color: #339933;">=</span> SumGoto <span style="color: #009900;">&#40;</span> k_aiData<span style="color: #339933;">,</span> <span style="color: #000066;">atoi</span><span style="color: #009900;">&#40;</span> argv<span style="color: #009900;">&#91;</span> <span style="color: #0000dd;">1</span> <span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSumWhile <span style="color: #339933;">=</span> SumWhile<span style="color: #009900;">&#40;</span> k_aiData<span style="color: #339933;">,</span> <span style="color: #000066;">atoi</span><span style="color: #009900;">&#40;</span> argv<span style="color: #009900;">&#91;</span> <span style="color: #0000dd;">2</span> <span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSumDo    <span style="color: #339933;">=</span> SumDo   <span style="color: #009900;">&#40;</span> k_aiData<span style="color: #339933;">,</span> <span style="color: #000066;">atoi</span><span style="color: #009900;">&#40;</span> argv<span style="color: #009900;">&#91;</span> <span style="color: #0000dd;">3</span> <span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iSumFor   <span style="color: #339933;">=</span> SumFor  <span style="color: #009900;">&#40;</span> k_aiData<span style="color: #339933;">,</span> <span style="color: #000066;">atoi</span><span style="color: #009900;">&#40;</span> argv<span style="color: #009900;">&#91;</span> <span style="color: #0000dd;">4</span> <span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    std<span style="color: #339933;">::</span><span style="color: #202020;">cout</span> <span style="color: #339933;">&lt;&lt;</span> iSumGoto <span style="color: #339933;">&lt;&lt;</span> iSumWhile <span style="color: #339933;">&lt;&lt;</span> iSumDo <span style="color: #339933;">&lt;&lt;</span> iSumFor<span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>So it&#8217;s using command line arguments as input, and printing to stdout for output. This is a relatively simple way to stop the overzealous optimising compiler from removing all the code: because the inputs aren&#8217;t known until runtime it can&#8217;t precompute the sums, and because the results are printed it can&#8217;t discard the code that computes them as dead code.</p>
<p>Before we compile and run it, you&#8217;ll also need to make a couple of changes in your project&#8217;s property pages &#8211; make sure you have the &#8216;Release&#8217; build configuration selected&#8230;</p>
<p>The first is to pass some command line arguments to the code; apart from anything else, this is shockingly naive code and will crash if it doesn&#8217;t get the arguments it expects, so add the following (which will make each function iterate over the whole of <strong>k_aiData</strong>):</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CLLC_9Loops_CommandLineArguments.png"><img class="alignnone size-full wp-image-27334" src="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CLLC_9Loops_CommandLineArguments.png" alt="screenshot of adding command line parameters in the project's property pages" width="842" height="202" /></a></p>
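<p>As an aside, if you&#8217;d rather the program didn&#8217;t crash when it&#8217;s run without arguments, a slightly more defensive replacement for <strong>atoi( argv[ n ] )</strong> could look something like this. It&#8217;s a sketch built on the standard <strong>strtol</strong>; <strong>ParseCountArg</strong> and its parameters are hypothetical, and the fallback value and range limit are choices made for illustration rather than anything the original code does.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: a defensive stand-in for atoi( argv[ iIndex ] ).
   Returns iFallback if the argument is missing, isn't a complete base-10
   integer, or lies outside the range [0, iMax]. */
int ParseCountArg(int argc, char* argv[], int iIndex, int iFallback, int iMax)
{
    char* pEnd = 0;
    long  lValue;

    assert(iIndex > 0 && iMax >= 0);

    if (iIndex >= argc || argv[iIndex] == 0)
        return iFallback;              /* argument not supplied          */

    errno  = 0;
    lValue = strtol(argv[iIndex], &pEnd, 10);
    if (errno != 0 || pEnd == argv[iIndex] || *pEnd != '\0')
        return iFallback;              /* not a clean base-10 integer    */

    if (lValue < 0 || lValue > iMax)
        return iFallback;              /* would index outside the array  */

    return (int)lValue;
}
```

<p>In <strong>main</strong>, a call could then read something like <strong>SumGoto( k_aiData, ParseCountArg( argc, argv, 1, 0, k_iArraySize ) )</strong>, so a missing or malformed argument gives an empty sum rather than a crash. Because the count is still only known at runtime, this shouldn&#8217;t change what the optimiser is able to do with the loops themselves.</p>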
<p>&nbsp;</p>
<p>We also need to turn off function inlining, or the compiler will optimise away all the function calls, making the disassembly much harder to follow:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CLLC_9Loops_TurnOffInlining.png"><img class="alignnone size-full wp-image-27333" src="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CLLC_9Loops_TurnOffInlining.png" alt="screenshot of properties page showing how to turn off function inlining" width="851" height="265" /></a></p>
<p>Final pre-launch check: add a breakpoint to the C++ line in each loop that sums the loop&#8217;s elements (i.e. &#8216;iSum += xxxx&#8217;), and off we go!</p>
<p>&nbsp;</p>
<h2>Optimised Disassembly O&#8217;clock!</h2>
<p>Build and run the code and you should end up with your debugger stopped on the breakpoint you have put in <strong>SumGoto</strong>.</p>
<p>Right click and choose &#8216;Go To Disassembly&#8217;; you should see something like the image below, but before we look at it in detail, a brief aside is needed:</p>
<p style="padding-left: 30px">The code in <strong>main</strong> that calls <strong>SumGoto</strong> looks like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">00DB191A</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">00DB191B</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         <span style="color: #46aa03; font-weight: bold;">esi</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00DB191E</span>  <span style="color: #00007f; font-weight: bold;">call</span>        SumGoto <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0DB1880h</span><span style="color: #009900; font-weight: bold;">&#41;</span></pre></td></tr></table></div>

<p style="padding-left: 30px"><strong>eax</strong> (which holds the value returned by <strong>atoi( argv[ 1 ] )</strong>, i.e. the element count) is pushed onto the stack, but the address of <strong>k_aiData[ 0 ]</strong> (which is located at <em><strong>[ebp-24h]</strong></em>) is loaded into <strong>esi</strong> with <strong>lea</strong> rather than being pushed onto the stack.</p>
<p style="padding-left: 30px"><em>&#8220;Wait!&#8221; </em>I hear you say<em> &#8220;They just did who in a whatnow? I thought we covered calling conventions, and no-one said anything about using <strong>esi</strong> for parameter passing!</em>&#8220;</p>
<p style="padding-left: 30px">Don&#8217;t worry about this for now, just accept that &#8211; for whatever reason &#8211; in this case the address of <strong>k_aiData[ 0 ]</strong> is being passed via the <strong>esi</strong> register (I investigate this in the article&#8217;s epilogue if you&#8217;re really interested).</p>
<p>So, here&#8217;s the disassembly for SumGoto:</p>
<div id="attachment_27335" class="wp-caption alignnone" style="width: 727px"><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CLLC_9Loops_OptimisedDisassembly_SumGoto.png"><img class="size-full wp-image-27335" src="http://www.altdevblogaday.com/wp-content/uploads/2012/03/CLLC_9Loops_OptimisedDisassembly_SumGoto.png" alt="" width="717" height="606" /></a><p class="wp-caption-text"><span style="color: #ff0000"><em><strong>make sure you have the same view options checked in the context menu, or your disassembly may look very different!</strong></em></span></p></div>
<p>Interestingly this bears little visible relation to the debug disassembly we looked at for the <strong>if-goto</strong> earlier. So let&#8217;s pick it apart to see what it&#8217;s doing differently:</p>
<ol>
<li><em>00DB1880 to 00DB1884</em> &#8211; function prologue of <strong>SumGoto.</strong></li>
<li><em>00DB1885</em> &#8211; <em><strong>mov</strong></em>ing function parameter <strong>iDataCount</strong> (i.e. the number of elements to sum) into the <strong>edi</strong> register.</li>
<li><em>00DB1888 to 00DB188E</em> &#8211; initialising registers <strong>ecx</strong>, <strong>edx</strong>, <strong>ebx</strong>, and <strong>eax</strong> to 0 (n.b. anything XOR itself is 0).</li>
<li><em>00DB1890 to 00DB1893</em> &#8211; compare <strong>edi</strong> (<strong>iDataCount</strong>) with 2; if it is less, there isn&#8217;t a pair of elements to sum, so jump to 00DB18A7 (the 2nd instruction in step 9), otherwise continue.</li>
<li><em>00DB1895</em> &#8211; another new assembly instruction; <strong>dec</strong> decreases its register operand by 1 &#8211; in this case <strong>edi</strong> (<strong>iDataCount</strong>).</li>
<li><em>00DB1896 to 00DB1899</em> &#8211; we know that the address of <strong>k_aiData[0]</strong> is in <strong>esi</strong>, so from the address calculation in the square brackets it is pretty obvious that these two lines are indexing into <strong>k_aiData</strong> and summing the odd and even elements into <strong>edx</strong> and <strong>ecx</strong> respectively.</li>
<li><em>00DB189D</em> &#8211; is incrementing eax by two. <strong>eax</strong> clearly contains the count of elements that have been looped over so far &#8211; because&#8230;</li>
<li><em>00DB18A0 to 00DB18A2</em> &#8211; &#8230;are comparing <strong>eax</strong> to <strong>edi</strong>. If <strong>eax</strong> &lt; <strong>edi</strong>, execution jumps back to step 6.</li>
<li><em>00DB18A4 to 00DB18AB</em> &#8211; this ties in with the decrement to <strong>edi</strong> made at step 5. Since the code is looping and summing 2 elements at a time, this code checks whether <strong>iDataCount</strong> was odd or even. If odd, it jumps to step 11; if even, it jumps to step 12.</li>
<li><em>00DB18AD</em> &#8211; leaves <strong>ecx</strong> unchanged. What is it for? It&#8217;s essentially a <strong>nop</strong> instruction (<em><strong>n</strong></em>o <em><strong>op</strong></em>eration). <strong>nop</strong> instructions are used in assembly code for various reasons, such as maintaining the memory alignment of certain instructions (the 1st answer to <a href="http://stackoverflow.com/questions/2703394/whats-the-point-of-lea-eax-eax">this question on Stack Overflow</a> explains enough for our requirements at this point). In any case, both possible code paths through step 9 skip this instruction entirely.</li>
<li><em>00DB18B0</em> &#8211; if <strong>iDataCount</strong> was odd, this code <strong>mov</strong>es the value of the array element that would have been missed by iterating 2 elements at a time into <strong>ebx</strong>.</li>
<li><em>00DB18B3</em> &#8211; this uses <strong>lea</strong> to add the sums of the odd and even elements of <strong>k_aiData</strong> that have been accumulating in <strong>edx</strong> and <strong>ecx</strong>, and stores the result in <strong>eax</strong> (remember, <strong>eax</strong> is used to return integer values from functions).</li>
<li><em>00DB18B6</em> &#8211; this is actually the start of the epilogue of <strong>SumGoto</strong>, restoring <strong>edi</strong> to the value it held before <strong>SumGoto</strong> was called. There&#8217;s no particular reason for this to have been put before the next instruction; optimising compilers do this sort of thing relatively often, and as long as the code they generate is correct it&#8217;s not worth worrying about too much.</li>
<li><em>00DB18B7</em> &#8211; this line adds the value from <strong>ebx</strong> (see step 11) to the sum to be returned in <strong>eax</strong>.</li>
<li><em>00DB18B9 to 00DB18BB</em> &#8211; function epilogue of <strong>SumGoto</strong>.</li>
</ol>
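<p>Reading that listing back into C gives something like the sketch below. It&#8217;s my reconstruction rather than anything the compiler produced, and <strong>SumPairwise</strong> and its local names are hypothetical; the comments map the pieces to the step numbers above. The key points are the two separate running totals, the loop limit of <strong>iDataCount - 1</strong> (the <strong>dec</strong> in step 5) so that every pair it reads is in bounds, and the single leftover element picked up afterwards when the count is odd.</p>

```c
#include <assert.h>

/* Hypothetical C reading of the optimised SumGoto listing:
   two accumulators, two elements per trip round the loop, plus a
   fix-up for a leftover element when the count is odd. */
int SumPairwise(const int* piData, int iDataCount)
{
    int iSumEven = 0;   /* running total of elements 0, 2, 4, ...       */
    int iSumOdd  = 0;   /* running total of elements 1, 3, 5, ...       */
    int iTail    = 0;   /* leftover element, if any (step 11)           */
    int iLoop    = 0;   /* elements summed so far (eax in the listing)  */

    assert(iDataCount >= 0);

    if (iDataCount >= 2)                  /* step 4: at least one pair?  */
    {
        int iLast = iDataCount - 1;       /* step 5: dec edi             */
        do
        {
            iSumEven += piData[iLoop];    /* step 6: sum a pair          */
            iSumOdd  += piData[iLoop + 1];
            iLoop    += 2;                /* step 7                      */
        } while (iLoop < iLast);          /* step 8: loop while in range */
    }

    if (iLoop < iDataCount)               /* steps 9 and 11: odd count   */
    {
        iTail = piData[iLoop];
    }

    return iSumEven + iSumOdd + iTail;    /* steps 12 and 14             */
}
```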
<p>Ouch. That seems far more complex than the debug assembly code for the <strong>if-goto</strong> loop. You may have to read through it a few times before you satisfy yourself about how it works &#8211; I recommend stepping through it in the debugger looking at the registers in a watch window.</p>
<p>Somewhat surprisingly, <strong>SumWhile</strong> and <strong>SumFor</strong> look pretty much exactly like <strong>SumGoto</strong>, but <strong>SumDo</strong> is way smaller:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">SumDo<span style="color: #339933;">:</span>
<span style="color: #adadad; font-style: italic;">00DB1830</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">00DB1832</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>  
<span style="color: #adadad; font-style: italic;">00DB1834</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esi</span><span style="color: #339933;">+</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">00DB1837</span>  <span style="color: #00007f; font-weight: bold;">inc</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span>  
<span style="color: #adadad; font-style: italic;">00DB1838</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">edx</span>  
<span style="color: #adadad; font-style: italic;">00DB183A</span>  <span style="color: #00007f; font-weight: bold;">jl</span>          SumDo<span style="color: #339933;">+</span><span style="color: #ff0000;">4</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0DB1834h</span><span style="color: #009900; font-weight: bold;">&#41;</span>  
<span style="color: #adadad; font-style: italic;">00DB183C</span>  <span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

<p>This is incredibly simple to follow, and much more the sort of thing I would have intuitively expected to see for all of the looping constructs, but there is method to the compiler&#8217;s seeming madness&#8230;</p>
<p>Summing two elements per iteration of the loop, as the assembly of <strong>SumGoto</strong>, <strong>SumWhile</strong>, and <strong>SumFor</strong> does, is actually a form of loop unrolling. Although (in this code) the compiler doesn&#8217;t know how many iterations the loop will end up doing, this pairwise unrolling still reduces the number of &#8216;looping&#8217; instructions (increment, compare, and branch) executed per &#8216;working&#8217; instruction. Over a large enough array it should be faster than code that is not unrolled in the same way.</p>
<p>By changing the compiler options (under C/C++ -&gt; Optimization) from &#8216;Maximize Speed (/O2)&#8217; to &#8216;Minimize Size (/O1)&#8217; you can generate assembly that looks much more like you would expect. Since /O2 is the default for release build configurations under Visual Studio 2010 I thought I should explain this assembly, and I leave looking at the assembly generated by /O1 as an exercise for you, dear reader :)</p>
<h1>Conclusions</h1>
<p>So, there we have it: looping constructs, and a genuine taste of the differences between optimised and debug assembly &#8211; albeit in a massively simplified scenario compared to real code.</p>
<p>What should we take away from this? Well, I guess the main point was to demonstrate that whilst the optimising compiler is constrained to generate assembly code with the same observable behaviour as your high level code, you should never take it for granted that the code it generates will look how you expect it to.</p>
<p>This should, I think, just about finish up the program control / structural aspects of C/C++ and leave us free to move on to look at the way other mechanics of the language work at the assembly level.</p>
<p>I feel that there might possibly be a post on the range-based for loop and on recursion at some point, but we&#8217;ll see &#8211; feel free to leave a comment if you think there&#8217;s something glaring that I&#8217;ve left out and I&#8217;ll try to rectify that before moving on&#8230;</p>
<p>Finally, a hearty thank you to all the AltDevAuthors who chipped in with sage advice on this post &#8211; Tony, Paul, Ted, Bruce, Ignacio, and Rich.</p>
<p>&nbsp;</p>
<h1>Epilogue</h1>
<h2 style="padding-left: 30px">Notes on [ebp+eax*4-XXh]</h2>
<p style="padding-left: 30px">Whilst this addressing mode seems like magic, there are limitations on the computations that can be performed within the square brackets in this way &#8211; see <a href="http://en.wikipedia.org/wiki/X86#Addressing_modes">this article on Wikipedia</a> for a summary of the limits.</p>
<p style="padding-left: 30px">Regardless of this, it is commonly seen used in conjunction with another x86 assembly instruction called <em><strong>lea</strong></em> (<em><strong>l</strong></em>oad <em><strong>e</strong></em>ffective <em><strong>a</strong></em>ddress) (as seen in the optimised <strong>SumGoto</strong> assembly) which will load the result of the address computation (rather than the value at that address) into a specific register.</p>
<p style="padding-left: 30px">When I&#8217;ve seen the mnemonic <strong>lea</strong> in the disassembly window it has most often been used for this purpose &#8211; though don&#8217;t assume that it is! Since we&#8217;re not (necessarily) assembly programmers, we don&#8217;t need to worry about this too much but I thought I&#8217;d mention it.</p>
<h2 style="padding-left: 30px">Notes on Using esi to pass parameters to functions</h2>
<p style="padding-left: 30px">So, this is certainly not what we&#8217;d expect given the coverage of calling conventions we did earlier in the series.</p>
<p style="padding-left: 30px">I googled for at least 10 minutes (clearly not exhaustive, but usually long enough to find a trail to an answer) and couldn&#8217;t find any specific information pertaining to the use of <strong>esi</strong> to pass parameters in a <em>documented</em> calling convention; however I did find several other people who had observed this behaviour and were looking for answers about it.</p>
<p style="padding-left: 30px">So, in the spirit of discovery I decided to see what happened if I compiled the looping functions (<strong>SumGoto</strong>, <strong>SumWhile</strong>, <strong>SumDo</strong>, and <strong>SumFor</strong>) into a separate library and then linked to that library instead of having them compile inside the same logical compilation unit as <strong>main</strong>. As anticipated, this sorted out the parameter passing so that it conformed to the <strong>cdecl</strong> calling convention, no more kooky use of <strong>esi</strong> to pass the array.</p>
<p style="padding-left: 30px">What do we conclude from this then? Well, it seems that if the compiler knows that the code it&#8217;s generating isn&#8217;t going in a library (or you have Link Time Code Generation enabled) &#8211; and so code only has to conform to the &#8216;local&#8217; calling conventions of the executable it&#8217;s generating &#8211; then the compiler takes liberties with the calling conventions in order to optimise function parameter passing &#8211; here&#8217;s a couple of links from Bruce on the matter: from <a href="http://msdn.microsoft.com/en-US/library/xbf3tbeh%28v=vs.100%29.aspx">MSDN (mentions it, but no specifics to speak of)</a> and from <a href="http://stackoverflow.com/questions/2320665/what-are-custom-calling-conventions">StackOverflow</a>.</p>
<p style="padding-left: 30px">Final take away point: if something makes no sense when you&#8217;re debugging, don&#8217;t assume anything &#8211; put on your Deerstalker and Sherlock Holmes your way to the bottom of it.</p>
<h2 style="padding-left: 30px">A final note on the genesis of loops</h2>
<p style="padding-left: 30px">I&#8217;ve already mentioned that I didn&#8217;t find the page on the history of C that I was looking for, so I can&#8217;t say with any degree of certainty which order the various looping constructs were actually added to the language.</p>
<p style="padding-left: 30px">However, in my searching I did find this interesting little nugget of information on Stack Exchange about the <a href="http://programmers.stackexchange.com/questions/149465/who-created-the-ideas-of-the-first-loop-constructs">history of looping</a> &#8211; my personal gut feeling on this matter is that whoever first coined the use of Sigma in mathematical notation is probably the father (or mother) of programmatic looping, but whoever invented knitting is the true originator ;)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/09/04/cc-low-level-curriculum-part-9-loops/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C/C++ Low Level Curriculum Part 8: looking at optimised assembly</title>
		<link>http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/</link>
		<comments>http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/#comments</comments>
		<pubDate>Mon, 07 May 2012 22:35:39 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26021</guid>
		<description><![CDATA[<p>It&#8217;s that time again where I have managed to find a few spare hours to squoze out an article for the Low Level Curriculum. This is the 8th post in this series, which is not in any way significant except that I like the number 8. As well as being a power of two, it is also the maximum number of unarmed people who can simultaneously get close enough to attack you (according to a martial arts book I once read).</p>
<p><a href="http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/" class="more-link">Read more on C/C++ Low Level Curriculum Part 8: looking at optimised assembly&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s that time again where I have managed to find a few spare hours to squoze out an article for the Low Level Curriculum. This is the 8th post in this series, which is not in any way significant except that I like the number 8. As well as being a power of two, it is also the maximum number of unarmed people who can simultaneously get close enough to attack you (according to a martial arts book I once read).</p>
<p>This post covers how to set up Visual Studio so that you can easily look at the optimised assembly code generated for simple code snippets like the ones we deal with in this series. If you wonder why I feel this is worth a post of its own, here&#8217;s the reason: optimising compilers are good, and given code with constants as input and no external output (like the snippets I give as examples in this series) the compiler will generally optimise the code away to nothing &#8211; which I find makes it pretty hard to look at. This should prove immensely useful, both to refer back to and for your own experimentation.</p>
<p><span id="more-26021"></span>Here are the backlinks for preceding articles in the series in case you want to refer back to any of them (warning: the first few are quite long):</p>
<ol>
<li><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/">http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals</a>/</li>
</ol>
<h1>Assumptions</h1>
<p>Strictly speaking, dear reader, I am making tonnes of assumptions about you as I write this &#8211; that you read English, that you like to program, etc. &#8211; but we&#8217;ll be here all day if I try to list those, so let&#8217;s stick to the ones that might be immediately inconvenient if they were incorrect.</p>
<p>I will be assuming that you have access to some sub-species of Visual Studio 2010 on a Windows PC, and that you are familiar with using it to do all the everyday basics like change build configurations, open files, edit, compile, run, and debug C/C++.</p>
<h1>Creating a project</h1>
<p>Open Visual Studio and from the menu choose &#8220;File -&gt; New -&gt; Project&#8230;&#8221;.</p>
<p>Once the new project wizard window opens (see below):</p>
<ul>
<li>go to the tree view on the left of the window and select &#8220;Other Languages -&gt; Visual C++&#8221;</li>
<li>in the main pane select &#8220;Win32 Console Application&#8221; (Visual C++)</li>
<li>give it a name in the <strong><em>Name</em></strong> edit box</li>
<li>browse for a location of your choosing on your PC</li>
<li>click OK to create the project</li>
</ul>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_NewProjectWizard.png"><img class="alignnone size-full wp-image-26025" src="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_NewProjectWizard.png" alt="" width="810" height="450" /></a></p>
<p>Once you have clicked OK, just click &#8220;Finish&#8221; on the next stage of the wizard. In case you&#8217;re wondering, the options available when you click next don&#8217;t matter for our purposes (and unchecking the &#8220;Precompiled header&#8221; check box makes no difference; it still generates a console app that uses a precompiled header&#8230;).</p>
<h1>Changing the Project Properties</h1>
<p>The next step is to use the menu to select &#8220;Project -&gt; <em>&lt;YourProjectName&gt;</em> Properties&#8221;, which will bring up the properties dialog for the project.</p>
<p>When the properties dialog appears (see image below):</p>
<ul>
<li>select &#8220;All Configurations&#8221; from the Configuration drop list</li>
<li>select &#8220;Configuration Properties -&gt; General&#8221; in the tree view at the left of the window</li>
<li>in the main pane change &#8220;Whole Program Optimization&#8221; to &#8220;No Whole Program Optimization&#8221;.</li>
</ul>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_ProjectProperties.png"><img class="alignnone size-full wp-image-26026" src="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_ProjectProperties.png" alt="" width="851" height="604" /></a></p>
<p>Next (see image below):</p>
<ul>
<li>in the tree view, navigate to &#8220;C/C++ -&gt; Code Generation&#8221;</li>
<li>in the main pane, change &#8220;Basic Runtime Checks&#8221; to &#8220;Default&#8221; (i.e. off)</li>
</ul>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_ProjProps_RTChecks.png"><img class="alignnone size-full wp-image-26103" src="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_ProjProps_RTChecks.png" alt="" width="851" height="604" /></a></p>
<p>Finally (see image below):</p>
<ul>
<li>in the tree view, go to &#8220;C/C++ -&gt; Output Files&#8221;</li>
<li>in the main pane change &#8220;Assembler Output&#8221; to &#8220;Assembly With Source Code (/FAs)&#8221;</li>
<li>once you&#8217;ve done that click &#8220;OK&#8221;</li>
</ul>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_ProjProps_AsmListing.png"><img class="alignnone size-full wp-image-26028" src="http://www.altdevblogaday.com/wp-content/uploads/2012/05/CLLC_8OptAsm_ProjProps_AsmListing.png" alt="" width="851" height="604" /></a></p>
<p>Now, when you compile, the Visual Studio compiler will generate an <em><strong>.asm</strong></em> file as well as an <em><strong>.exe</strong></em> file. This file will contain the intermediate assembly code generated by the compiler, with the source code inserted inline as comments.</p>
<p>You could alternatively choose the &#8220;Assembly, Machine Code and Source (/FAcs)&#8221; option if you like &#8211; this will generate a .cod file that contains the machine code as well as the asm and source.</p>
<p>I prefer the regular .asm because it&#8217;s less visually noisy and the assembler mnemonics are all aligned on the same column, so that&#8217;s what I&#8217;ll assume you&#8217;re using if you&#8217;re following the article, but the .cod file is fine.</p>
<h1>So, what did we do there?</h1>
<p>Well, first we turned off link time code generation. Amongst other things, this will prevent the linker stripping the .asm generated for functions that are compiled but not called anywhere.</p>
<p>Secondly, we turned off the basic runtime checks (which are already off in Release). These checks make the generated function prologues and epilogues do significant amounts of (basically unnecessary) extra work, causing a worst case 5x slowdown (see <a href="http://randomascii.wordpress.com/2011/07/22/visual-c-debug-buildsfast-checks-cause-5x-slowdowns/">this post</a> by Bruce Dawson on his personal blog for an in depth explanation).</p>
<p>Finally, we asked the compiler not to throw away the assembly code it generates for our program. This is produced whenever you compile but is usually discarded; we&#8217;re just asking Visual Studio to write it into an <em><strong>.asm</strong></em> file so we can take a look at it.</p>
<p>Since we made these changes for &#8220;All Configurations&#8221; this means we will have access to <em><strong>.asm</strong></em> files containing the assembly code generated by both the Debug and Release build configurations.</p>
<h1>Let&#8217;s try it out</h1>
<p>So in the spirit of discovery, let&#8217;s try it out (for the sake of familiarity) with a language feature we looked at last time &#8211; the conditional operator:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> ConditionalTest<span style="color: #009900;">&#40;</span> bool bFlag<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iOnTrue<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> iOnFalse <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span> bFlag <span style="color: #339933;">?</span> iOnTrue <span style="color: #339933;">:</span> iOnFalse <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> a <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> b <span style="color: #339933;">=</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
    bool bFlag <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">false</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> c <span style="color: #339933;">=</span> ConditionalTest<span style="color: #009900;">&#40;</span> bFlag<span style="color: #339933;">,</span> a<span style="color: #339933;">,</span> b <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>The question you have in your head at this moment should be &#8220;why have we put the code into a function?&#8221;. Rest assured that this will become apparent soon enough.</p>
<p>Now we have to build the code and look in the <em><strong>.asm</strong></em> files generated to see what the compiler has been up to&#8230;</p>
<p>First build the Debug build configuration &#8211; this should already be selected in the solution configuration drop-down (at the top of your Visual Studio window unless you&#8217;ve moved it).</p>
<p>Next build the Release configuration.</p>
<p>Now we need to open the .asm files. Unless you have changed project settings that I didn&#8217;t tell you to, these will be in the following paths:</p>
<p style="padding-left: 30px">&lt;<em>path where you put the project</em>&gt;/Debug/&lt;<em>projectName</em>&gt;.asm</p>
<p style="padding-left: 30px">&lt;<em>path where you put the project</em>&gt;/Release/&lt;<em>projectName</em>&gt;.asm</p>
<h1>.asm files</h1>
<p>I&#8217;m not going to go into any significant detail about how <em><strong>.asm</strong></em> files are laid out here; if you want to find out more, here&#8217;s a link to the <a href="http://msdn.microsoft.com/en-us/library/afzk3475.aspx">Microsoft documentation for their assembler</a>.</p>
<p>The main thing you should note is that we can find the C/C++ functions in the <em><strong>.asm</strong></em> file by looking for their names; and that &#8211; once we find them &#8211; the mixture of source code and assembly code looks basically the same as it does in the disassembly view of Visual Studio in the debugger.</p>
<h1>main()</h1>
<p>Let&#8217;s look at main() first. This is where I explain why the code snippet we wanted to look at was put in a function. I can tell you&#8217;re excited.</p>
<p>Here&#8217;s main() from the Debug .asm (I&#8217;ve reformatted it slightly to make it take up less vertical space):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">_TEXT    <span style="color: #0000ff; font-weight: bold;">SEGMENT</span>
_c<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #339933;">-</span><span style="color: #ff0000;">16</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_bFlag<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #339933;">-</span><span style="color: #ff0000;">9</span>                        <span style="color: #666666; font-style: italic;">; size = 1</span>
_b<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #339933;">-</span><span style="color: #ff0000;">8</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_a<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #339933;">-</span><span style="color: #ff0000;">4</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_argc<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #ff0000;">8</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_argv<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #ff0000;">12</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_main    PROC                        <span style="color: #666666; font-style: italic;">; COMDAT</span>
<span style="color: #666666; font-style: italic;">; 9    : {</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">ebp</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">esp</span>
    <span style="color: #00007f; font-weight: bold;">sub</span>    <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">80</span>                    <span style="color: #666666; font-style: italic;">; 00000050H</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">ebx</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">esi</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">edi</span>
<span style="color: #666666; font-style: italic;">; 10   :     int a = 1, b = 2;</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _a<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">1</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _b<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">2</span>
<span style="color: #666666; font-style: italic;">; 11   :     bool bFlag = false;</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #0000ff; font-weight: bold;">BYTE</span> PTR _bFlag<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">0</span>
<span style="color: #666666; font-style: italic;">; 12   :     int c = ConditionalTest( bFlag, a, b );</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _b<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">eax</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _a<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">ecx</span>
    <span style="color: #00007f; font-weight: bold;">movzx</span>    <span style="color: #46aa03; font-weight: bold;">edx</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">BYTE</span> PTR _bFlag<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">push</span>    <span style="color: #46aa03; font-weight: bold;">edx</span>
    <span style="color: #00007f; font-weight: bold;">call</span>    ?ConditionalTest@@YAH_NHH@Z        <span style="color: #666666; font-style: italic;">; ConditionalTest</span>
    <span style="color: #00007f; font-weight: bold;">add</span>    <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">12</span>                    <span style="color: #666666; font-style: italic;">; 0000000cH</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _c<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">eax</span>
<span style="color: #666666; font-style: italic;">; 13   :     return 0;</span>
    <span style="color: #00007f; font-weight: bold;">xor</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">eax</span>
<span style="color: #666666; font-style: italic;">; 14   : }</span>
    <span style="color: #00007f; font-weight: bold;">pop</span>    <span style="color: #46aa03; font-weight: bold;">edi</span>
    <span style="color: #00007f; font-weight: bold;">pop</span>    <span style="color: #46aa03; font-weight: bold;">esi</span>
    <span style="color: #00007f; font-weight: bold;">pop</span>    <span style="color: #46aa03; font-weight: bold;">ebx</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">ebp</span>
    <span style="color: #00007f; font-weight: bold;">pop</span>    <span style="color: #46aa03; font-weight: bold;">ebp</span>
    <span style="color: #00007f; font-weight: bold;">ret</span>    <span style="color: #ff0000;">0</span>
_main    ENDP
_TEXT    ENDS</pre></td></tr></table></div>

<p>As long as you&#8217;ve read the previous posts, this should mostly look pretty familiar.</p>
<p>It breaks down as follows:</p>
<ul>
<li>lines 1-8: open the code segment, define the offsets of main()&#8217;s Stack variables and parameters from <em><strong>[ebp]</strong></em> within its Stack Frame, and mark the start of main()</li>
<li>lines 10-15: function prologue of main()</li>
<li>lines 17-20: initialise the Stack variables</li>
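<li>lines 22-30: push the parameters to ConditionalTest() onto the Stack (right to left), call it, remove the parameters from the Stack again with <em><strong>add esp, 12</strong></em>, and store its return value in c</li>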
<li>line 32: sets up main()&#8217;s return value</li>
<li>lines 34-38: function epilogue of main()</li>
<li>line 39: return from main()</li>
</ul>
<p>Nothing unexpected there really; the only new things to take in are the declarations of the Stack variable offsets from <em><strong>[ebp]</strong></em>.</p>
<p>I feel these tend to make the assembly code easier to follow than the code in the disassembly window in the Visual Studio debugger.</p>
<p>And, for comparison, here&#8217;s main() for the Release .asm:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">_TEXT    <span style="color: #0000ff; font-weight: bold;">SEGMENT</span>
_argc<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #ff0000;">8</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_argv<span style="color: #0000ff; font-weight: bold;">$</span> = <span style="color: #ff0000;">12</span>                        <span style="color: #666666; font-style: italic;">; size = 4</span>
_main    PROC                        <span style="color: #666666; font-style: italic;">; COMDAT</span>
<span style="color: #666666; font-style: italic;">; 10   :     int a = 1, b = 2;</span>
<span style="color: #666666; font-style: italic;">; 11   :     bool bFlag = false;</span>
<span style="color: #666666; font-style: italic;">; 12   :     int c = ConditionalTest( bFlag, a, b );</span>
<span style="color: #666666; font-style: italic;">; 13   :     return 0;</span>
    <span style="color: #00007f; font-weight: bold;">xor</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">eax</span>
<span style="color: #666666; font-style: italic;">; 14   : }</span>
    <span style="color: #00007f; font-weight: bold;">ret</span>    <span style="color: #ff0000;">0</span>
_main    ENDP
_TEXT    ENDS</pre></td></tr></table></div>

<p>The astute amongst you will have noticed that the Release assembly code is significantly smaller than the Debug.</p>
<p>In fact, it&#8217;s clearly doing nothing at all other than returning 0. Good optimising! High five!</p>
<p>As I alluded to earlier, the optimising compiler is great at spotting code that evaluates to a compile time constant and will happily replace any code it can with the equivalent constant.</p>
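<p>Here is a minimal sketch that makes the constant-folding point concrete. It assumes a C++14-capable compiler, and the function name ConstantSnippet is hypothetical. It mirrors the body of main() above with every input spelled out as a constant. Marking it <em><strong>constexpr</strong></em> makes the compiler prove the result at compile time. The Release main() above shows the optimiser reaching a similar conclusion unaided: the call to ConditionalTest() and the unused c vanish, and only the return 0 remains.</p>

```cpp
// Sketch: every input is a compile-time constant, so the whole computation
// can be replaced by its result (needs C++14 for a multi-statement constexpr).
constexpr int ConstantSnippet()
{
    int a = 1, b = 2;
    bool bFlag = false;
    return bFlag ? a : b;   // bFlag is always false, so this is always b, i.e. 2
}

// If this line compiles, the compiler evaluated the snippet at compile time.
static_assert(ConstantSnippet() == 2, "snippet folds to the constant 2");
```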
<h1>So that&#8217;s why we put the code snippet in a function</h1>
<p>It should hopefully be relatively clear by this point why we might have put the code snippet into a function, and then asked the linker not to remove code for functions that aren&#8217;t called.</p>
<p>Even if it can optimise away calls to a function, the compiler can&#8217;t optimise away the function itself before link time (as long as it has external linkage, i.e. it isn&#8217;t declared <em><strong>static</strong></em>), because some code outside of the object file it lives in <em>might</em> call it. Incidentally, the same effect usually keeps non-static variables defined at global scope from being optimised away before linkage.</p>
<p>I&#8217;m going to call this <a href="http://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat"><em><strong>Schrödinger linkage</strong></em></a> (catchy, right?). If we want our simple code snippet to stay around after optimising we only need to make sure that it takes advantage of Schrödinger linkage to cheat the optimiser.</p>
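<p>The sketch below shows the difference linkage makes. It assumes a typical optimising compiler and linker, and both function names are hypothetical. The <em><strong>static</strong></em> function can only be called from within its own .cpp file, so once every call to it has been inlined or discarded, nothing stops the compiler dropping it altogether. The non-static function might be called from another object file, so its code has to survive at least until link time.</p>

```cpp
// Internal linkage: only code in this translation unit can call it, so the
// compiler may remove it entirely once all of its calls are optimised away.
static int DoubleInternal(int iValue)
{
    return iValue * 2;
}

// External linkage: another object file *might* call this, so the compiler
// must emit its code. Its parameter and return value are unknown at compile
// time, so the multiply (inlined from DoubleInternal) can't be folded away.
int DoubleExternal(int iValue)
{
    return DoubleInternal(iValue);
}
```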
<p>If the compiler can&#8217;t tell whether the function will be called, then it certainly can&#8217;t tell what the values of its parameters will be during one of these potential calls, or what its return value might be used for. So it can&#8217;t optimise away any code that relies on those inputs or contributes to that output either.</p>
<p>The upshot of this is that if we put our code snippet in a function, make sure that it uses the function parameters as its inputs, and return its output from the function, then it should survive optimisation.</p>
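<p>Applied to the snippet in this article, the pattern looks something like the sketch below. The body and parameter names are reconstructed from the source lines quoted in the .asm listings (source lines 5 and 12), so treat this as an approximation of the original code rather than a copy of it.</p>

```cpp
// The inputs come from the parameters and the result leaves via the return
// value, so when this is compiled the compiler can't know any of their values.
int ConditionalTest(bool bFlag, int iOnTrue, int iOnFalse)
{
    return( bFlag ? iOnTrue : iOnFalse );
}
```

<p>Calling it from main() with constant arguments, as the listings above do, still lets the optimiser fold the call away inside main(). The out-of-line copy of ConditionalTest() should nevertheless be compiled against unknown inputs, and that copy is the one worth reading.</p>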
<p>It&#8217;s really a testament to all the compiler programmers over the years that it takes so much effort to get at the optimised assembly code generated by a simple code snippet &#8211; compiler programmers we salute you!</p>
<h1>ConditionalTest()</h1>
<p>So, here&#8217;s the Debug .asm for ConditionalTest() (ignoring the prologue / epilogue):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">; 5    :     return( bFlag ? iOnTrue : iOnFalse );</span>
    <span style="color: #00007f; font-weight: bold;">movzx</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">BYTE</span> PTR _bFlag<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">test</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">eax</span>
    <span style="color: #00007f; font-weight: bold;">je</span>    SHORT <span style="color: #0000ff; font-weight: bold;">$</span>LN3@Conditiona
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _iOnTrue<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR tv66<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">ecx</span>
    <span style="color: #00007f; font-weight: bold;">jmp</span>    SHORT <span style="color: #0000ff; font-weight: bold;">$</span>LN4@Conditiona
<span style="color: #0000ff; font-weight: bold;">$</span>LN3@Conditiona<span style="color: #339933;">:</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">edx</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _iOnFalse<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR tv66<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">edx</span>
<span style="color: #0000ff; font-weight: bold;">$</span>LN4@Conditiona<span style="color: #339933;">:</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR tv66<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #666666; font-style: italic;">; 6    : }</span></pre></td></tr></table></div>

<p>As you should be able to see, this is doing basically the same thing as the code we looked at in the Debug disassembly in the <a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/">previous article</a>:</p>
<ul>
<li>branching based on the result of <em><strong>test</strong></em>ing the value of bFlag (the mnemonic <em><strong>test</strong></em> does a bitwise logical AND of its two operands and sets the flags based on the result, which it then discards)</li>
<li>both branches set a Stack variable at an offset of <em><strong>tv66</strong></em> from <em><strong>[ebp]</strong></em></li>
<li>and both branches then execute the last line, which copies the content of that address into <em><strong>eax</strong></em></li>
</ul>
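<p>If it helps to see the Debug version&#8217;s shape in C++ terms, here is a hand translation of the listing above. It is a sketch for illustration only: the function name is hypothetical, and the labels and the tv66 variable simply mirror the names the compiler chose. Note how both paths park the result in a Stack temporary, and it is only loaded into the return register at the very end.</p>

```cpp
// Hand translation of the Debug .asm for ConditionalTest() (illustrative only).
int ConditionalTestDebugShape(bool bFlag, int iOnTrue, int iOnFalse)
{
    int tv66;                          // Stack temporary, named after the compiler's tv66
    unsigned int eax = bFlag;          // movzx eax, BYTE PTR _bFlag$[ebp]
    if ((eax & eax) == 0) goto LN3;    // test eax, eax / je SHORT $LN3@Conditiona
    tv66 = iOnTrue;                    // mov ecx, iOnTrue / mov tv66[ebp], ecx
    goto LN4;                          // jmp SHORT $LN4@Conditiona
LN3:
    tv66 = iOnFalse;                   // mov edx, iOnFalse / mov tv66[ebp], edx
LN4:
    return tv66;                       // mov eax, tv66[ebp]
}
```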
<p>Again, the assembly code is arguably easier to follow than the corresponding disassembly, because the <em><strong>jmp</strong></em> instructions jump to labels visibly defined in the code, whereas in the disassembly view in Visual Studio you generally have to cross-reference the operand of <em><strong>jmp</strong></em> with the memory addresses in the disassembly view to see where it&#8217;s <em><strong>j</strong></em>u<em><strong>mp</strong></em>ing to&#8230;</p>
<p>Let&#8217;s compare this with the Release assembler (again not showing the function prologue or epilogue):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">; 5    :     return( bFlag ? iOnTrue : iOnFalse );</span>
    <span style="color: #00007f; font-weight: bold;">cmp</span>    <span style="color: #0000ff; font-weight: bold;">BYTE</span> PTR _bFlag<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">0</span>
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _iOnTrue<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
    <span style="color: #00007f; font-weight: bold;">jne</span>    SHORT <span style="color: #0000ff; font-weight: bold;">$</span>LN4@Conditiona
    <span style="color: #00007f; font-weight: bold;">mov</span>    <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR _iOnFalse<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #0000ff; font-weight: bold;">$</span>LN4@Conditiona<span style="color: #339933;">:</span>
<span style="color: #666666; font-style: italic;">; 6    : }</span></pre></td></tr></table></div>

<p>You will note that the work of this function is now done in 4 instructions as opposed to 9 in the Debug:</p>
<ul>
<li>it <em><strong>c</strong></em>o<em><strong>mp</strong></em>ares the value of bFlag against 0</li>
<li>unconditionally <em><strong>mov</strong></em>es the value of iOnTrue into <em><strong>eax</strong></em></li>
<li>if the value of bFlag was not equal to 0 (i.e. it was true) it jumps past the next instruction&#8230;</li>
<li>&#8230;otherwise this <em><strong>mov</strong></em>es the value of iOnFalse into <em><strong>eax</strong></em></li>
</ul>
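<p>For comparison, here is the same kind of hand translation for the Release listing. It is again only a sketch, and the function name is hypothetical. The C++ puts the unconditional load before the test, whereas the .asm does the <em><strong>cmp</strong></em> first. That makes no difference, because <em><strong>mov</strong></em> doesn&#8217;t modify the flags, so the result of the <em><strong>cmp</strong></em> is still there for <em><strong>jne</strong></em>. There is no Stack temporary, and the result is built directly in the return register.</p>

```cpp
// Hand translation of the Release .asm for ConditionalTest() (illustrative only).
int ConditionalTestReleaseShape(bool bFlag, int iOnTrue, int iOnFalse)
{
    int eax = iOnTrue;           // mov eax, iOnTrue (done whatever the value of bFlag)
    if (bFlag != 0) goto LN4;    // cmp bFlag, 0 / jne SHORT $LN4@Conditiona
    eax = iOnFalse;              // mov eax, iOnFalse (only reached if bFlag was 0)
LN4:
    return eax;                  // the result is already where the caller expects it
}
```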
<p>As I&#8217;ve stated before I&#8217;m not an assembly code programmer and I&#8217;m not an optimisation expert. Consequently, I&#8217;m not going to offer my opinion on the significance of the ordering of the instructions in this Release assembly code.</p>
<p>I am, however, prepared to go out on a limb and say it&#8217;s a pretty safe bet that the Release version with 4 instructions is going to execute significantly faster than the Debug version with 9.</p>
<p>So, why such a big difference between Debug and Release for something that is a single step when debugging at source level?</p>
<p>Essentially this is because the unoptimised assembly code generated by the compiler must be amenable to single-step debugging at the source level:</p>
<ul>
<li>it almost always does the exact logical equivalent of what the high level code asked it to do and, specifically, in the same order</li>
<li>it also has to frequently write values from CPU registers back into memory so that the debugger can show them updating</li>
</ul>
<h1>Summary</h1>
<p>What&#8217;s the main point I&#8217;d like you to take away from this article? Optimising compilers are feisty!</p>
<p>You have to know how to stop them optimising away your isolated C/C++ code snippets if you want to easily be able to see the optimised assembly code they generate.</p>
<p>This article shows a simple boilerplate way to short-circuit the Visual Studio optimising compiler &#8211; mileage will vary on other platforms.</p>
<p>There are other strategies to stop the optimiser optimising away your code, but they basically all come down to utilising the Schrödinger linkage effect; in general:</p>
<ul>
<li>use global variables, function parameters, or function call results as inputs to the code</li>
<li>use global variables, function return values, or function call parameters as outputs from the code</li>
<li>if you&#8217;re not using Visual Studio&#8217;s compiler you may also need to turn off inlining</li>
</ul>
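<p>Here is a minimal sketch of the global-variable variant of these strategies, combined with a request not to inline. The names are hypothetical. The macro relies on <em><strong>__declspec(noinline)</strong></em> for Visual Studio and <em><strong>__attribute__((noinline))</strong></em> for GCC / Clang, and it falls back to nothing on other compilers. As noted above, your mileage may vary from platform to platform.</p>

```cpp
// Hypothetical macro: ask the compiler not to inline the snippet function.
#if defined(_MSC_VER)
#define SNIPPET_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define SNIPPET_NOINLINE __attribute__((noinline))
#else
#define SNIPPET_NOINLINE
#endif

// Global inputs: code in another object file might change them before the call.
int g_iInputA = 1;
int g_iInputB = 2;

// Global output: code in another object file might read it after the call.
int g_iOutput = 0;

// The snippet under test reads only globals and writes only a global,
// so the compiler can neither fold it to a constant nor discard the result.
SNIPPET_NOINLINE void AddSnippet()
{
    g_iOutput = g_iInputA + g_iInputB;
}
```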
<p>A final extreme method I have been told about is to insert <em><strong>nop</strong></em> instructions via inline assembly around / within the code you want to isolate. Note that you should use this approach with caution, as it interferes directly with the optimiser and can easily affect the output to the point where it is no longer representative.</p>
<h1>Epilogue</h1>
<p>So, I hope you found this interesting &#8211; I certainly expect you will find it useful :)</p>
<p>The next article (as promised last time!) is about looping, which is another reason why it seemed like a good time to cover getting at optimised assembly code for simple C/C++ snippets.</p>
<p>I will be referring back to this in future articles in situations where looking at the optimised assembly code is particularly relevant.</p>
<p>If you&#8217;re wondering what you should look at first to see how Debug and Release code differ, and want to get practice at beating the optimiser, I&#8217;d suggest starting with something straightforward like adding a few numbers together.</p>
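<p>As a possible starting point for that exercise, here is a sketch that follows the same pattern as ConditionalTest(), with inputs coming from parameters and the output returned. The function name is hypothetical. Build it in both Debug and Release, compare the .asm, then try calling it from main() with constant arguments and see what survives.</p>

```cpp
// Hypothetical starter exercise: add a few numbers together.
// Parameters in, return value out, so the optimiser can't fold it to a constant.
int AddTest(int a, int b, int c)
{
    return a + b + c;
}
```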
<p>Lastly, but by no means leastly, thanks to Rich, Ted, and Bruce for their input and proofreading; and Bruce for supplying me with the tip that made this post possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/05/07/cc-low-level-curriculum-part-8-looking-at-optimised-assembly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C/C++ Low Level Curriculum Part 7: More Conditionals</title>
		<link>http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/</link>
		<comments>http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 15:46:57 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[gamedev]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=25391</guid>
		<description><![CDATA[<p>Hello humans. Welcome to the 7th part of the C/C++ Low Level Curriculum series I&#8217;ve been writing. This post covers the conditional operator, and switch statements. As per usual I will be showing snippets of C++ code and throwing the corresponding x86 assembler at you (as produced by VS2010) to show you what your high level code is actually doing at the assembler level.</p>
<p><a href="http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/" class="more-link">Read more on C/C++ Low Level Curriculum Part 7: More Conditionals&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Hello humans. Welcome to the 7th part of the C/C++ Low Level Curriculum series I&#8217;ve been writing. This post covers the conditional operator, and switch statements. As per usual I will be showing snippets of C++ code and throwing the corresponding x86 assembler at you (as produced by VS2010) to show you what your high level code is actually doing at the assembler level.</p>
<p>Disclaimer: in an ideal world I&#8217;d like to try to avoid assumed knowledge, but keeping up the level of detail in each post that this entails is, frankly, too much work. Consequently I will from now on point you at post 6 as a &#8220;how to&#8221; and then get on with it&#8230;</p>
<p>Here are the backlinks for preceding articles in the series (warning: it might take you a while, the first few are quite long):</p>
<ol>
<li><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
<li><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</a> [see near the top of this post for details on compiling &amp; running the code snippets]</li>
</ol>
<h1>The conditional operator</h1>
<p>I assume that everyone&#8217;s familiar with the conditional operator, also known as the &#8220;question mark&#8221;, or the ternary operator (&#8220;ternary&#8221; because it&#8217;s the only C/C++ operator that takes three operands).</p>
<p>If you&#8217;re not, here&#8217;s a <a href="http://en.wikipedia.org/wiki/%3F:#C.2B.2B">link so you can catch up</a> (I predict that you will be so stoked to find out about it that you will be over-using it within the week).</p>
<p>Personally I heartily approve of the conditional operator when used judiciously, but it&#8217;s not always great for source level debugging: it&#8217;s basically a single-line <em>if-else</em>, so both outcomes live on the same line and it can be hard to follow in the debugger (in fact I&#8217;ve heard of it being banned under the coding standards at more than one company, but there you are: we can&#8217;t all be sane, can we?).</p>
<p>Anyway, let&#8217;s have a quick look at it with some code:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">// the line after this comment is logically equivalent to the following line of code:</span>
    <span style="color: #666666; font-style: italic;">// int iLocal; if( argc &gt; 2 ){ iLocal = 3; }else{ iLocal = 7; }</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>argc <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">?</span> <span style="color: #0000dd;">3</span> <span style="color: #339933;">:</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>If you remember the assembler that a basic <em><strong>if-else</strong></em> generated <a href="http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">in the last article</a>, then the assembler generated here will probably bust your mind gaskets&#8230;</p>
<p>Note:</p>
<ol>
<li>I&#8217;ve deliberately left the function prologue and epilogue out of the asm below, and just left the assembler involved with the conditional assignment</li>
<li>if your disassembly view doesn&#8217;t show the variable names, then you need to right click the window and check &#8220;Show Symbol Names&#8221;</li>
</ol>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">5</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iLocal = <span style="color: #009900; font-weight: bold;">&#40;</span>argc &gt; <span style="color: #ff0000;">2</span><span style="color: #009900; font-weight: bold;">&#41;</span> ? <span style="color: #ff0000;">3</span> <span style="color: #339933;">:</span> <span style="color: #ff0000;">7</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311249</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>
<span style="color: #adadad; font-style: italic;">0131124B</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">2</span>
<span style="color: #adadad; font-style: italic;">0131124F</span>  <span style="color: #00007f; font-weight: bold;">setle</span>       <span style="color: #46aa03; font-weight: bold;">al</span>
<span style="color: #adadad; font-style: italic;">01311252</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #339933;">+</span><span style="color: #ff0000;">3</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">01311259</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

<p>Clearly this is not very much like the code for the simple if-else that we looked at <a href="http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">previously</a>.</p>
<p>This is because there is trickery afoot and the compiler has chosen to do sneaky branchless code to implement the logic specified by the C++ code.</p>
<p>So, let&#8217;s examine it line by line:</p>
<ul>
<li><em>line 1</em> &#8211; uses the <em><strong>xor</strong></em> instruction to set <strong><em>eax</em></strong> to 0. Anything <a href="http://en.wikipedia.org/wiki/Exclusive_or">XORed </a>with itself is 0.</li>
<li><em>line 2</em> &#8211; as in the previous <strong><em>if</em></strong> examples this uses <em><strong>cmp</strong></em> to test the condition, setting flags in a special purpose CPU register based on the result of the comparison.</li>
<li><em>line 3</em> &#8211; this is a new one! The instruction <em><strong>set</strong></em> if <em><strong>l</strong></em>ess or <em><strong>e</strong></em>qual sets its operand to 1 if the 1st operand of the preceding <em><strong>cmp</strong></em> was less than or equal to the 2nd operand (as a signed comparison), and to 0 if it was greater. We&#8217;ve not seen the register <em><strong>al</strong></em> before; it&#8217;s a legacy register name dating back to the 16-bit 8086, and since the 386 it has referred to the lowest byte of the 32-bit <em><strong>eax</strong></em> register. <em><strong>setle</strong></em> only writes that one byte, so this only works because <em><strong>eax</strong></em> has already been set to 0 (if you&#8217;re a sensible person and are stepping through this code in your debugger with the register window open, you will see that when the program is run without command line arguments, so that <em><strong>argc</strong></em> is 1, this instruction causes the <em><strong>eax</strong></em> register to be set to 1).</li>
<li><em>line 4</em> &#8211; uses the <em><strong>l</strong></em>oad <em><strong>e</strong></em>ffective <em><strong>a</strong></em>ddress instruction to do some sneaky maths that relies on the value of <em><strong>eax</strong></em> set by <em><strong>setle</strong></em> in <em>line 3</em>.</li>
<li><em>line 5</em> &#8211; <em><strong>mov</strong></em>es the value from <em><strong>eax</strong></em> into the memory address storing the value of iLocal</li>
</ul>
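<p>Here is a small sketch of why the <em><strong>xor</strong></em> in line 1 matters. It models <em><strong>eax</strong></em> as a 32-bit unsigned int, which assumes that unsigned int is 32 bits wide, as it is for VS2010 targeting x86. The function name is hypothetical. Writing <em><strong>al</strong></em> replaces only the low byte, so any stale bits in the upper 24 bits would survive and wreck the maths in line 4.</p>

```cpp
// Models "setle al": only the lowest byte of eax is replaced, the rest is untouched.
// Assumes a 32-bit unsigned int, matching the 32-bit eax register.
unsigned int SetLowByte(unsigned int eax, unsigned int al)
{
    return (eax & 0xFFFFFF00u) | (al & 0xFFu);
}
```

<p>With a stale value such as 0x12345678 in <em><strong>eax</strong></em>, setting <em><strong>al</strong></em> to 1 leaves 0x12345601 rather than 1. Zeroing <em><strong>eax</strong></em> first is what guarantees a clean 0 or 1.</p>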
<p>That&#8217;s all fine, but how does it work?</p>
<p>Firstly, note that at the assembler level the comparative instruction <em><strong>setle</strong></em> is (as in the previous post&#8217;s examples) testing the opposite condition to the conditional specified in the C++ code.</p>
<p>This means that the <em><strong>eax</strong></em> register will be set to 0 in <em>line 3</em> if <em><strong>argc</strong></em> is greater than 2, which in turn means that the <em><strong>eax*4+3</strong></em> part of <em>line 4</em> will evaluate to <em><strong>(0*4)+3</strong></em> &#8211; i.e. 3.</p>
<p>Conversely, if <em><strong>argc</strong></em> is less than or equal to 2, the <em><strong>eax</strong></em> register will be set to 1, which in turn means that <em>line 4</em> will evaluate to <em><strong>(1*4)+3</strong></em> &#8211; i.e. 7.</p>
<p>So, as you can see, the assembler executes the same branchless set of instructions regardless of the condition, but uses the 0 or 1 result of the conditional instruction in the maths to cancel out or include one of the terms, giving what I like to call a &#8220;mathematical if&#8221;. Clever.</p>
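<p>The same &#8220;mathematical if&#8221; can be written out in C++ as a sketch. It is a hand translation of lines 2 to 4 of the listing, and the function name is hypothetical. The multiplier 4 is the difference between the two constants (7 &#8211; 3). <em><strong>lea</strong></em> can only scale a register by 1, 2, 4 or 8, so the compiler was presumably helped by that difference being 4.</p>

```cpp
// Branchless hand translation of: int iLocal = (argc > 2) ? 3 : 7;
int MathematicalIf(int argc)
{
    int eax = (argc <= 2);   // cmp [argc], 2 / setle al: 1 if argc <= 2, otherwise 0
    return eax * 4 + 3;      // lea eax, [eax*4+3]: 3 when argc > 2, 7 otherwise
}
```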
<p>Incidentally, this sort of branchless-but-still-conditional code has been a bit / lot of a hot topic over the last few years, especially on consoles, since their CPUs are particularly sensitive to branch misprediction.</p>
<p>Judicious use of the &#8220;branchless conditional&#8221; idiom is a tool that can be used to combat branch (mis-)prediction related performance issues &#8211; for an example of this, see the use of the <em><strong>fsel</strong></em> PPU instruction in <a href="http://altdevblogaday.com/2011/11/10/optimisation_lessons/">this ADBAD post by Tony Albrecht</a>, and for a brief discussion of branch prediction issues (primarily PC related) see <a href="http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/">this article</a> by Igor Ostrovsky (who works for Microsoft).</p>
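<p>For selecting between two arbitrary values, as opposed to two constants that happen to differ by a valid <em><strong>lea</strong></em> scale, one common hand-written branchless idiom uses a bit mask. This is a sketch of that general technique, not the sequence the compiler generated above. The function name is hypothetical. It assumes two&#8217;s complement integers, which C++20 guarantees and mainstream compilers use anyway. Whether it actually beats a branch depends on the compiler (which may emit a conditional move or even a branch for it) and on how predictable the condition is, so look at the generated assembly and measure.</p>

```cpp
// Branchless select via a bit mask: no jump, whatever the value of bCondition.
int BranchlessSelect(bool bCondition, int iOnTrue, int iOnFalse)
{
    const int iMask = -int(bCondition);   // all bits set (-1) if true, 0 if false
    return (iOnTrue & iMask) | (iOnFalse & ~iMask);
}
```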
<h1>The conditional operator (part deux)</h1>
<p>So, clearly our above super-simple-sample resulted in the compiler generating clever assembler because of the constant values in it; interesting certainly, but not necessarily representative of most &#8220;real world&#8221; assembler.</p>
<p>Let&#8217;s see what happens if we use variables with the conditional operator&#8230;</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iOperandTwo <span style="color: #339933;">=</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iOperandThree <span style="color: #339933;">=</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>argc <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">?</span> iOperandTwo <span style="color: #339933;">:</span> iOperandThree<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>And, here&#8217;s the relevant disassembly:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">5</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iOperandTwo = <span style="color: #ff0000;">3</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CF1619</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iOperandTwo<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">3</span>
     <span style="color: #ff0000;">6</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iOperandThree = <span style="color: #ff0000;">7</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CF1620</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iOperandThree<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">7</span>
     <span style="color: #ff0000;">7</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iLocal = <span style="color: #009900; font-weight: bold;">&#40;</span>argc &gt; <span style="color: #ff0000;">2</span><span style="color: #009900; font-weight: bold;">&#41;</span> ? iOperandTwo <span style="color: #339933;">:</span> iOperandThree<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00CF1627</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">2</span>
<span style="color: #adadad; font-style: italic;">00CF162B</span>  <span style="color: #00007f; font-weight: bold;">jle</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">25h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0CF1635h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00CF162D</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iOperandTwo<span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00CF1630</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">50h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>
<span style="color: #adadad; font-style: italic;">00CF1633</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">2Bh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0CF163Bh</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00CF1635</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iOperandThree<span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00CF1638</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">50h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ecx</span>
<span style="color: #adadad; font-style: italic;">00CF163B</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">edx</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">50h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00CF163E</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">edx</span></pre></td></tr></table></div>

<p>Since the conditional operator is now assigning from variables, we&#8217;d expect it to generate something that looks more like the sort of code we saw from the basic<em><strong> if-else</strong></em> we looked at <a href="http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">last time</a>, and it does.</p>
<p>We have the expected <em><strong>cmp</strong></em> followed by a conditional jump that tests the opposite of the condition, then two blocks of assembler. The first of these (<em>lines 8 to 10</em>) ends with an unconditional jump over the second (<em>lines 11 and 12</em>), so only one of them executes. Essentially it&#8217;s behaving more or less as expected; however, there&#8217;s clearly some interesting stuff happening in there:</p>
<ol>
<li>the two branches use different registers to store their intermediate values; the first uses <em><strong>eax</strong></em>, the second uses <em><strong>ecx</strong></em></li>
<li>both branches store their result to the same memory address in the Stack (see <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">this post if you don&#8217;t know or can&#8217;t remember about Stack Frames</a>) &#8211; i.e. <em><strong>[ebp-50h]</strong></em></li>
<li>the code that assigns the value to iLocal (<em>lines 13 and 14</em>) only exists once and is executed regardless of which branch was taken; it takes the value from <em><strong>[ebp-50h]</strong></em> and writes it into iLocal using a third register (<em><strong>edx</strong></em>)</li>
</ol>
<p>The use of different registers for the two branches in point 1 looks like it might be significant, but (according to several expert sources) this is perfectly normal compiler behaviour and nothing to read into.</p>
<p>Points 2 and 3 show that the code generated from the conditional operator (at least with VS2010) doesn&#8217;t map directly onto what looks like the equivalent <em><strong>if-else</strong></em> statement:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// intuitively equivalent if-else of</span>
<span style="color: #666666; font-style: italic;">// int iLocal = (argc &gt; 2 ) ? iOperandTwo : iOperandThree;</span>
<span style="color: #993333;">int</span> iLocal<span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">2</span> <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    iLocal <span style="color: #339933;">=</span> iOperandTwo<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">else</span>
<span style="color: #009900;">&#123;</span>
    iLocal <span style="color: #339933;">=</span> iOperandThree<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Rather than choosing between one of two assignments like this <em><strong>if-else</strong></em>, the assembler generated for our use of the conditional operator does exactly what we told it to: choose one of two values (storing it temporarily in the Stack) and then assign iLocal from it.</p>
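<p>To make the shape of that disassembly explicit, here is a hand-written C++ transliteration of it. This is a sketch for illustration, not anything the compiler actually produces: it uses <em><strong>goto</strong></em> to mimic the <em><strong>jle</strong></em> and <em><strong>jmp</strong></em> instructions, and a hypothetical local <em><strong>iTemp</strong></em> stands in for the compiler&#8217;s temporary at <em><strong>[ebp-50h]</strong></em>. The function name is also made up.</p>

```cpp
#include <cassert>

// Hand-written transliteration of the VS2010 debug disassembly for:
//     int iLocal = (argc > 2) ? iOperandTwo : iOperandThree;
// ConditionalOperatorLowered and iTemp are hypothetical names; iTemp plays
// the role of the compiler's Stack temporary at [ebp-50h].
int ConditionalOperatorLowered(int argc, int iOperandTwo, int iOperandThree)
{
    int iTemp;
    int iLocal;

    if (argc <= 2)              // cmp + jle: jump on the opposite condition
        goto second_branch;
    iTemp = iOperandTwo;        // mov eax,[iOperandTwo] / mov [ebp-50h],eax
    goto assign;                // jmp over the second branch
second_branch:
    iTemp = iOperandThree;      // mov ecx,[iOperandThree] / mov [ebp-50h],ecx
assign:
    iLocal = iTemp;             // mov edx,[ebp-50h] / mov [iLocal],edx (shared by both paths)

    // Sanity check: the lowered version agrees with the real conditional operator.
    assert(iLocal == ((argc > 2) ? iOperandTwo : iOperandThree));
    return iLocal;
}
```

<p>Both paths write to <em><strong>iTemp</strong></em> and then share a single copy into <em><strong>iLocal</strong></em>, which is the shape described in points 2 and 3 above.</p>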
<p>A few final notes on the ? operator:</p>
<ol>
<li>You can see that fewer lines of C++ code do not necessarily mean less assembler</li>
<li>It can be nested, but don&#8217;t do it! It&#8217;s hideous and will also be very hard to follow when source-level debugging</li>
<li>Be very careful with operator precedence when using it. Use brackets to ensure it will resolve the way you intend.</li>
</ol>
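<p>As an illustration of the precedence point, consider this hypothetical scoring function (the names are made up for this example). Because <em><strong>+</strong></em> binds more tightly than <em><strong>?:</strong></em>, the unbracketed version doesn&#8217;t do what it looks like it does:</p>

```cpp
#include <cassert>

// Hypothetical example: a score of 10, plus a 1 point bonus when bBonus is set.
int ScoreWithoutBrackets(bool bBonus)
{
    // + binds more tightly than ?:, so this parses as (10 + bBonus) ? 1 : 0.
    // 10 + bBonus is never zero, so this always returns 1.
    return 10 + bBonus ? 1 : 0;
}

int ScoreWithBrackets(bool bBonus)
{
    // Brackets force the intended grouping: 10 + (bBonus ? 1 : 0).
    return 10 + (bBonus ? 1 : 0);
}

// Usage: these all hold.
void PrecedenceExample()
{
    assert(ScoreWithoutBrackets(false) == 1);   // not 10!
    assert(ScoreWithoutBrackets(true) == 1);    // not 11!
    assert(ScoreWithBrackets(false) == 10);
    assert(ScoreWithBrackets(true) == 11);
}
```

<p>Some compilers will warn about the first form, but it is still legal code, so the brackets are the reliable fix.</p>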
<h1>Switch Statements</h1>
<p>The final type of conditional statement we&#8217;ll be looking at is the switch statement. Like the conditional operator, the switch statement is an often-abused and maligned construct that you wouldn&#8217;t want to live without.</p>
<p>To be fair to the switch statement, it&#8217;s not its fault that maniacs can use it to write brittle and insane code.</p>
<h2>An aside about switch statements</h2>
<p>Where I have consistently found really horrific examples of switch statements is when an originally stateless synchronous system has been forced to become asynchronous and state driven under time pressure. This specific situation always seems to spawn the kind of monolithic, hard-to-follow, difficult-to-change, architecturally brittle switch statements that have given the switch statement a bad rep over the years.</p>
<p>Code that has had network functionality retrofitted to it is (in my experience) an extremely common place to find problem switch statements. If a system starts to look systemically broken, it&#8217;s always better to fix it properly than to soldier on regardless; and if it looks like you need to introduce a set of states into a system, then (in my experience) it&#8217;s architecturally more sensible to use polymorphic behaviour (e.g. a state class with one or more virtual functions) than a switch statement.</p>
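<p>A minimal sketch of that idea, assuming a hypothetical connection system with three states: each state class picks its successor in a virtual function, rather than one function switching on an enum. All of the class and function names, and the <em><strong>bLinkUp</strong></em> input, are made up for illustration; a real system would do far more in each state.</p>

```cpp
#include <cassert>

// Hypothetical states for a system with retrofitted networking. Each state
// picks its successor in a virtual function instead of one big switch on an enum.
// bLinkUp is a made-up input meaning "the network link is currently up".
class ConnectionState
{
public:
    virtual ~ConnectionState() {}
    virtual const ConnectionState& Update(bool bLinkUp) const = 0;
};

// Accessors for one shared instance of each state (defined below).
const ConnectionState& GetDisconnected();
const ConnectionState& GetConnecting();
const ConnectionState& GetConnected();

class Disconnected : public ConnectionState
{
public:
    virtual const ConnectionState& Update(bool bLinkUp) const
    {
        return bLinkUp ? GetConnecting() : GetDisconnected();
    }
};

class Connecting : public ConnectionState
{
public:
    virtual const ConnectionState& Update(bool bLinkUp) const
    {
        return bLinkUp ? GetConnected() : GetDisconnected();
    }
};

class Connected : public ConnectionState
{
public:
    virtual const ConnectionState& Update(bool bLinkUp) const
    {
        return bLinkUp ? GetConnected() : GetDisconnected();
    }
};

const ConnectionState& GetDisconnected() { static Disconnected s_State; return s_State; }
const ConnectionState& GetConnecting()   { static Connecting s_State;   return s_State; }
const ConnectionState& GetConnected()    { static Connected s_State;    return s_State; }

// Usage: drive the state machine through a connect / disconnect cycle.
void ConnectionExample()
{
    const ConnectionState* pState = &GetDisconnected();
    pState = &pState->Update(true);
    assert(pState == &GetConnecting());
    pState = &pState->Update(true);
    assert(pState == &GetConnected());
    pState = &pState->Update(false);
    assert(pState == &GetDisconnected());
}
```

<p>Adding a new state means adding a new class rather than editing a case into every switch that handles the state, which is where the brittleness described above tends to come from.</p>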
<h2>Where were we?</h2>
<p>Sorry, let&#8217;s get on and take a look at a switch statement&#8230;</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// n.b. no &quot;break&quot; in case 1 so we can</span>
    <span style="color: #666666; font-style: italic;">// see what &quot;fall through&quot; looks like</span>
    <span style="color: #b1b100;">switch</span><span style="color: #009900;">&#40;</span> argc <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">8</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">default</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">9</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>And here&#8217;s the disassembly&#8230;</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">9</span><span style="color: #339933;">:</span>     switch<span style="color: #009900; font-weight: bold;">&#40;</span> argc <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00C61620</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00C61623</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">48h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>
<span style="color: #adadad; font-style: italic;">00C61626</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">48h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>
<span style="color: #adadad; font-style: italic;">00C6162A</span>  <span style="color: #00007f; font-weight: bold;">je</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">2Ah</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0C6163Ah</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00C6162C</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">48h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">3</span>
<span style="color: #adadad; font-style: italic;">00C61630</span>  <span style="color: #00007f; font-weight: bold;">je</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">31h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0C61641h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00C61632</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">48h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">5</span>
<span style="color: #adadad; font-style: italic;">00C61636</span>  <span style="color: #00007f; font-weight: bold;">je</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">3Ah</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0C6164Ah</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00C61638</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">43h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0C61653h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>     case <span style="color: #ff0000;">1</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">6</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00C6163A</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">6</span>
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>     case <span style="color: #ff0000;">3</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">7</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00C61641</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">7</span>
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>         break<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00C61648</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">4Ah</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0C6165Ah</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span>     case <span style="color: #ff0000;">5</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">8</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00C6164A</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">8</span>
    <span style="color: #ff0000;">18</span><span style="color: #339933;">:</span>         break<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00C61651</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">4Ah</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0C6165Ah</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">19</span><span style="color: #339933;">:</span>     <span style="color: #0000ff; font-weight: bold;">default</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">20</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">9</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00C61653</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">9</span>
    <span style="color: #ff0000;">21</span><span style="color: #339933;">:</span>         break<span style="color: #666666; font-style: italic;">;</span>
    <span style="color: #ff0000;">22</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span></pre></td></tr></table></div>

<p>This is more or less exactly what you&#8217;d expect:</p>
<ul>
<li><em>lines 2 and 3</em> copy <em><strong>argc</strong></em> (via <em><strong>eax</strong></em>) into the Stack at <em><strong>[ebp-48h]</strong></em></li>
<li>then the block from <em>lines 4 to 9</em> implements the logic of the <em><strong>switch</strong></em> as a series of comparisons of this value against the constants specified in the <em><strong>case</strong></em> statements, each followed by a conditional jump to the assembler generated for the code in the corresponding <em><strong>case</strong></em></li>
<li>if none of the conditional jumps are triggered, <em>line 10</em> jumps unconditionally to the <em><strong>default:</strong></em> case.</li>
<li>in particular, note that:</li>
</ul>
<blockquote>
<ol>
<li>wherever the <em><strong>break</strong></em> keyword is used this causes an unconditional jump past the end of the assembler generated by the switch</li>
<li>the &#8220;fall through&#8221; from <em><strong>case 1:</strong></em> into <em><strong>case 3:</strong></em> in the high level code happens at the assembler level as a by-product of the compiler placing the blocks of instructions generated for each <em><strong>case</strong></em> of the <em><strong>switch</strong></em> next to each other, combined with the lack of an unconditional jump at the end of the assembler for <em><strong>case 1:</strong></em></li>
</ol>
</blockquote>
<p>If you look at the assembler from the sample <em><strong>if-else-if-else</strong></em> in the <a href="http://altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/">last article</a>, you should be able to see that the assembler generated for this switch is (more or less) what would happen if we had written the switch as an <em><strong>if-else-if-else</strong></em> and then re-organised the assembler so all the logic was in one place at the top, and the assembler generated for each code block was left where it was.</p>
<p>So, other than being a very useful C/C++ language convenience for managing what would otherwise often be messy-looking and error-prone chains of <em><strong>if-else-if-else</strong></em> statements, based on this example the <em><strong>switch</strong></em> statement doesn&#8217;t appear to be doing anything that offers a significant advantage at the assembler level &#8211; so why would I have claimed that the compiler might generate &#8220;pretty cool assembler&#8221; for a <em><strong>switch</strong></em>?</p>
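<p>That reorganisation can be sketched by hand in C++ using <em><strong>goto</strong></em>: all the comparisons come first, then the <em><strong>case</strong></em> blocks are laid out in source order, so simply leaving out a jump is what produces fall through. This is an illustration of the structure of the debug disassembly above, with made-up function and label names, not code the compiler generates:</p>

```cpp
#include <cassert>

// Hand-lowered version of the first switch (cases 1, 3, 5 and default),
// mirroring the debug disassembly: every comparison first, then the case
// blocks laid out in source order. Function and label names are made up.
int SwitchAsCompareChain(int argc)
{
    int iLocal = 0;
    const int iSwitchValue = argc;      // mov eax,[argc] / mov [ebp-48h],eax

    if (iSwitchValue == 1) goto case_1; // cmp [ebp-48h],1 / je
    if (iSwitchValue == 3) goto case_3; // cmp [ebp-48h],3 / je
    if (iSwitchValue == 5) goto case_5; // cmp [ebp-48h],5 / je
    goto case_default;                  // jmp to default:

case_1:
    iLocal = 6;                         // no jump here, so we fall through into case 3
case_3:
    iLocal = 7;
    goto end_switch;                    // break
case_5:
    iLocal = 8;
    goto end_switch;                    // break
case_default:
    iLocal = 9;                         // last block: its break needs no jump
end_switch:
    return iLocal;
}

// Usage: these all hold.
void SwitchExample()
{
    assert(SwitchAsCompareChain(1) == 7);   // case 1 falls through into case 3
    assert(SwitchAsCompareChain(3) == 7);
    assert(SwitchAsCompareChain(5) == 8);
    assert(SwitchAsCompareChain(2) == 9);   // default
}
```
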
<p>Before we assume we&#8217;ve seen it all, let&#8217;s try using a contiguous range of values for the constants in the <em><strong>case</strong></em>s of the <em><strong>switch</strong></em>. You know, just for fun &#8211; and for the sake of simplicity let&#8217;s start at 0.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">switch</span><span style="color: #009900;">&#40;</span> argc <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">5</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">6</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">3</span><span style="color: #339933;">:</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>And here&#8217;s the disassembly it generates&#8230;</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/04/CLLC_7ConditionalsTwo_ContiguousCaseSwitch.png"><img class="alignnone size-full wp-image-25447" src="http://www.altdevblogaday.com/wp-content/uploads/2012/04/CLLC_7ConditionalsTwo_ContiguousCaseSwitch.png" alt="Disassembly of the contiguous-case switch alongside a memory window showing its jump table" width="784" height="573" /></a></p>
<p>Ok, so this time something more interesting is definitely going on &#8211; n.b. I&#8217;ve used a screenshot rather than just pasting the text because we need to look in a memory window to make sense of it.</p>
<p>So what exactly is it doing?</p>
<ul>
<li>it <em><strong>mov</strong></em>es <em><strong>argc</strong></em> into <em><strong>eax</strong></em>, then stores it into the Stack at <em><strong>[ebp-48h]</strong></em></li>
<li>it then <em><strong>c</strong></em>o<em><strong>mp</strong></em>ares the value stored in the address <em><strong>[ebp-48h]</strong></em> with 3 (i.e. our maximum case constant)</li>
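<li>if this value is greater than 3 then <em><strong>ja</strong></em> (jump above) on the next line will cause execution to jump to <em><strong>8D1658h</strong></em> &#8211; the first instruction after the code generated by the case blocks, skipping the switch. Because <em><strong>ja</strong></em> is an unsigned comparison, a negative value (which looks like a very large number when treated as unsigned) also takes this jump, so this single comparison checks both ends of the 0 to 3 range</li>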
<li>if the value is less than or equal to 3 then the value is <em><strong>mov</strong></em>ed into <em><strong>ecx</strong></em>, and we then have an unconditional <em><strong>j</strong></em>u<em><strong>mp</strong></em> to &#8230; somewhere :-/</li>
</ul>
<p>Ok, so that final unconditional jump has some syntax we&#8217;ve not seen before for its address operand, which clearly isn&#8217;t a constant:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">jmp</span>    <span style="color: #0000ff; font-weight: bold;">dword</span> ptr    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">8D1664h</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ecx</span><span style="color: #339933;">*</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span></pre></td></tr></table></div>

<p>This says &#8220;jump to the location stored in the memory address at an offset of 4 times the value of <em><strong>ecx</strong></em> from the memory address <em><strong>8D1664h</strong></em>&#8220;, so how is this implementing the logic of the C++ switch statement?</p>
<p>To answer this question we need to look in a memory window at the address <em><strong>8D1664h</strong></em> (n.b. to open a memory window from the menu in VS2010 when debugging go Debug -&gt; Windows -&gt; Memory -&gt; &#8230; and choose one of the memory windows. To set the address just copy and paste it from the disassembly into the &#8220;Address:&#8221; input box. You will also need to right click and choose &#8220;4-byte integer&#8221; and set the &#8220;Columns:&#8221; list box to 1 to have it look like the screenshot above).</p>
<p>So, if you cast your eyes up to the memory window on the left of the screenshot above, you will see that the top 4 rows are highlighted; these values start at address <em><strong>8D1664h</strong></em> and are 4-byte integers (hence the <em><strong>ecx*4</strong></em> in the operand), which in this case specifically are pointers.</p>
<p>The instruction <em><strong>jmp dword ptr (8D1664h)[ecx*4]</strong></em> will <em><strong>j</strong></em>u<em><strong>mp</strong></em> to the value stored in the address:</p>
<ul>
<li><em><strong>8D1664h + 0</strong></em> = <em><strong>8D1664h</strong></em> if the value of <em><strong>ecx</strong></em> is 0</li>
<li><em><strong>8D1664h + 4</strong></em> = <em><strong>8D1668h</strong></em> if the value of <em><strong>ecx</strong></em> is 1</li>
<li><em><strong>8D1664h + 8</strong></em> = <em><strong>8D166Ch</strong></em> if the value of <em><strong>ecx</strong></em> is 2</li>
<li><em><strong>8D1664h + Ch</strong></em> = <em><strong>8D1670h</strong></em> if the value of <em><strong>ecx</strong></em> is 3</li>
</ul>
<p>So, the four highlighted rows make up a jump table. Since our <em><strong>case</strong></em> constants range from 0 to 3 it is an array of 4 pointers, and each element of the array points to the code for the <em><strong>case</strong></em> block whose constant matches its array index.</p>
<p>You can verify this by checking the addresses of the first instruction generated for each <em><strong>case</strong></em> against the 4 values stored in the array.</p>
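<p>If it helps to see the same mechanism expressed in C++ rather than assembler, here is a minimal sketch of a hand-rolled jump table. It assumes a switch whose case constants run from 0 to 3, and all of the names are made up for illustration. It is an approximation: the compiler doesn&#8217;t build an array of function pointers, it stores the addresses of code inside the same function and jumps to them directly. The shape of the logic is the same, though: one bounds check, then one indexed load of the target address, however many cases there are.</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for the code generated for each case block.
int Case0()       { return 10; }
int Case1()       { return 11; }
int Case2()       { return 12; }
int Case3()       { return 13; }
int CaseDefault() { return -1; }

typedef int (*CaseCode)();

// The jump table: element i points at the code for case i, in the same way
// that the 4 pointers highlighted in the memory window do.
static const CaseCode kJumpTable[] = { Case0, Case1, Case2, Case3 };
static const unsigned kNumCases = sizeof(kJumpTable) / sizeof(kJumpTable[0]);

int Dispatch(int value)
{
    // Converting to unsigned makes negative values huge, so one comparison
    // sends anything outside 0..3 to the default case.
    unsigned index = static_cast<unsigned>(value);
    if (index >= kNumCases)
    {
        return CaseDefault();
    }
    // Roughly what jmp dword ptr (8D1664h)[ecx*4] does: load the pointer at
    // table + index * sizeof(pointer), then transfer control to it.
    return kJumpTable[index]();
}
```

<p>For example, <em><strong>Dispatch(2)</strong></em> returns 12, and both <em><strong>Dispatch(4)</strong></em> and <em><strong>Dispatch(-1)</strong></em> fall through to the default and return -1.</p>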
<p>Maybe it&#8217;s just me, but I think this is some pretty cool assembler. It&#8217;s certainly more elegant than the assembler generated by the first <em><strong>switch</strong></em> we looked at, but what &#8211; if anything &#8211; is the advantage of this over the assembler that was generated for the previous <em><strong>switch</strong></em> statement?</p>
<p>In theory this jump table form reaches the code in constant time for all <em><strong>case</strong></em>s, whereas in the <em><strong>if-else-if-else</strong></em> form the time to reach the code corresponding to each <em><strong>case</strong></em> will be proportional to the number of previous <em><strong>case</strong></em>s in the switch statement.</p>
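<p>To make the cost difference concrete, the following sketch mimics the <em><strong>if-else-if-else</strong></em> form and counts one comparison per case tested. The function <em>LinearDispatch</em> is hypothetical and is not code the compiler emits. Reaching the last of N cases, or the default, costs N comparisons with this approach. The jump table needs a single bounds check whichever case is taken.</p>

```cpp
#include <cassert>

// Mimics the if-else-if-else form: test each case constant in turn.
// Returns the index of the matching case (or -1 for the default case) and
// writes the number of comparisons performed to *comparisons.
int LinearDispatch(int value, const int* caseConstants, int numCases, int* comparisons)
{
    *comparisons = 0;
    for (int i = 0; i < numCases; ++i)
    {
        ++*comparisons;   // one cmp + conditional jump per case tested
        if (value == caseConstants[i])
        {
            return i;
        }
    }
    return -1;            // fell through every test: default case
}
```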
<p>You&#8217;re pretty unlikely to find that a switch statement is a performance bottleneck in your code (unless you&#8217;ve done <a href="http://stackoverflow.com/questions/927403/overhead-of-a-switch-statement-in-c">something silly</a>) but, all things being equal, the jump table approach uses fewer instructions to reach the code for the matching <em><strong>case</strong></em>, which is normally A Good Thing and &#8211; in theory &#8211; should make it faster on average.</p>
<p>One final note on <em><strong>switch</strong></em> statements: I am reliably informed that, in addition to the <em><strong>if-else-if-else</strong></em>-style linear search for the correct case to execute, most modern compilers are also capable of generating a <a href="http://en.wikipedia.org/wiki/Binary_search">binary search</a> for the <em><strong>case</strong></em>s of switch statements with appropriate ranges of case constant values.</p>
<p>Using a binary search rather than a linear search will improve average search time from linear to logarithmic (i.e. O(n) to O(log n)). However, in the average case a binary searched <em><strong>switch</strong></em> will still almost always take more instructions and branches to reach the correct case than a jump table switch.</p>
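<p>Here is a sketch of what binary-search dispatch does, written as ordinary C++. The function <em>BinaryDispatch</em> is illustrative only. A compiler would typically emit a tree of <em><strong>cmp</strong></em> and conditional jumps with the case constants baked in, rather than a loop over an array. Each probe halves the remaining range, so with 8 case constants no value needs more than 4 probes. The linear chain can need up to 8.</p>

```cpp
#include <cassert>

// Binary search over case constants sorted in ascending order.
// Returns the index of the matching case, or -1 for the default case, and
// writes the number of probes (comparison steps) to *probes.
int BinaryDispatch(int value, const int* sortedCases, int numCases, int* probes)
{
    *probes = 0;
    int low = 0;
    int high = numCases - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        // One probe: in generated code this could be a single cmp whose
        // flags feed both a "jump if equal" and a "jump if less".
        ++*probes;
        if (value == sortedCases[mid])
        {
            return mid;              // found the matching case
        }
        if (value < sortedCases[mid])
        {
            high = mid - 1;          // discard the upper half
        }
        else
        {
            low = mid + 1;           // discard the lower half
        }
    }
    return -1;                       // no match: default case
}
```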
<p>It&#8217;s also possible that the compiler might combine more than one of these methods in a single switch. This would probably require a large number of cases and ranges of case constants with very specific properties, though, so you&#8217;re not likely to come across it very often.</p>
<p>A couple of final things to note about switch statements:</p>
<ol>
<li>the compiler should be able to generate a jump table regardless of the order of the case constants in your code (e.g. case 2: &#8230; case 1: &#8230; case 3: &#8230; should still work fine)</li>
<li>having a range of case constants that starts at 0 makes the conditional code around a jump table simpler, as it removes the lower bounds check</li>
<li>a jump table should get created as long as the overall range of constants is large enough and/or closely packed enough for the compiler to decide it&#8217;s worthwhile even if they&#8217;re not completely contiguous. Look at the disassembly if you want to check.</li>
</ol>
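<p>The three points above can be pulled together in a small C++ sketch. It assumes a hypothetical <em><strong>switch</strong></em> over the case constants 10, 11, 12 and 14 plus a <em><strong>default</strong></em>. The names (<em>Target</em>, <em>kTable</em>, <em>SelectCase</em>) are made up, and the table holds enum values rather than code addresses so that the example is easy to test. The hole at 13 is filled with the default target. The range is rebased by subtracting the lowest constant before the bounds check, which is one common way of dealing with a range that doesn&#8217;t start at 0. That subtraction is exactly the step that disappears when the range does start at 0.</p>

```cpp
#include <cassert>

// Illustrative hand-written equivalent of a switch with case constants
// 10, 11, 12 and 14 (13 is missing) plus a default.
enum Target { kCase10, kCase11, kCase12, kCase14, kDefault };

static const int kLowestCase = 10;

// One entry per value in the range 10..14; the hole at 13 is filled with
// the default target so the table can still be indexed directly.
static const Target kTable[] = { kCase10, kCase11, kCase12, kDefault, kCase14 };
static const unsigned kTableSize = sizeof(kTable) / sizeof(kTable[0]);

Target SelectCase(int value)
{
    // Rebase so the lowest case constant maps to index 0. Doing this in
    // unsigned arithmetic makes values below 10 wrap around to very large
    // numbers, so one upper-bound check rejects both ends of the range.
    unsigned index = static_cast<unsigned>(value) - static_cast<unsigned>(kLowestCase);
    if (index >= kTableSize)
    {
        return kDefault;
    }
    return kTable[index];
}
```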
<h1>Summary</h1>
<p>So, this concludes our look at conditionals, hopefully you&#8217;ve found it interesting and illuminating ;)</p>
<p>A final point to take away from our look at conditionals is that, whilst the compiler <em>could</em> generate the same assembler for an <em><strong>if-else</strong></em> as for the <em><strong>conditional</strong></em> operator, it doesn&#8217;t (at least not in the unoptimised debug builds we&#8217;ve been looking at). Similarly, it could generate the same assembler for an <em><strong>if-else-if-else</strong></em> as for a <em><strong>switch</strong></em> statement, but it doesn&#8217;t do that either.</p>
<p>In part this shows the limits of the compiler, but it also shows the importance of using the conditional appropriate to your purpose. Which one you choose makes your intent clearer to <em><strong>human</strong></em> readers of your code.</p>
<p>We&#8217;ve now covered enough ground that you should find you can apply the information I&#8217;ve given you to everyday programming problems such as debugging release code, or code you don&#8217;t have debugging information for.</p>
<p>The main things I&#8217;d like you to take away from our look at conditionals are all things that will help you when debugging without symbols:</p>
<ol>
<li>any time you see <em><strong>cmp</strong></em> followed by a <em><strong>jxx</strong></em> to a nearby address in the disassembly you&#8217;re probably looking at code generated by a conditional statement in the C/C++ code</li>
<li>if the address operand to the jump instruction is lower than the current instruction&#8217;s address (i.e. it&#8217;s jumping backwards) you&#8217;re most likely looking at a loop</li>
<li>assembler generated from conditionals generally tests the opposite of the test being done in the C / C++ code</li>
</ol>
<p>By using these heuristics, looking at the values in the registers and the values the assembler has written to the Stack, and looking up your current address in the symbol file to find out which function you&#8217;re in, you should be able to make an educated guess at which variables in the C/C++ code are likely to be causing the current issue. (If you&#8217;re not generating a symbol file for all your builds you should be &#8211; look in the documentation for your platform&#8217;s compiler toolchain to find out how.) That educated guess will usually tell you why it crashed, or at least give you a lead so you can Sherlock Holmes your way to the root of the problem &#8211; it&#8217;s certainly a lot quicker than the ubiquitous insertion of many printf()s&#8230;</p>
<p>Our next topic will be loops, which obviously also use conditional jumps (which is why we covered conditionals first&#8230;)</p>
<h1>One final thing&#8230;</h1>
<p>Thanks to Tony, Bruce, and Fabian for extra information, advice, and proofreading.</p>
<p>And, for those of you who like to go off and look for yourselves (hopefully most of you!), I&#8217;ve recently discovered this wiki book on x86 Assembler <a href="http://en.wikibooks.org/wiki/X86_Assembly">http://en.wikibooks.org/wiki/X86_Assembly</a>. It has a large overlap with this series of articles and also covers programming in x86 assembler. Highly recommended &#8211; I&#8217;ve certainly found it pretty useful.</p>
<p>A final, final nugget of wisdom from Bruce Dawson:</p>
<blockquote><p>Another problem I&#8217;ve seen with ?: is with people who have &#8216;cleverly&#8217; created string classes that both have a const char* constructor and a const char* conversion operator. This has potential to be exquisitely dangerous and exquisitely inefficient, by allowing lots of hidden conversions. These come to their fruition with ?:. Imagine this:</p>
<p><em><strong>return bFlag ? mStringObject : &quot;Hello world&quot;;</strong></em></p>
<p>The question is, what is the type of this expression? Does mStringObject get converted to a const char*, or does &#8220;Hello world&#8221; get converted to a string object? I have no idea, and nobody should memorize the relevant rules. Such code is too fragile and dangerous.</p>
<p>Some people might assume that the type of the ?: will depend on the return type of the function but that is not true. They are independent. Thus, it is quite possible that &#8220;Hello world&#8221; will be converted to a string object and then (if the return type of the function is const char*) this (temporary!!!) string object will be converted back to a const char*. In addition to being inefficient this leads to undefined behavior, since we are returning a pointer to memory owned by an object that is destroyed as the function exits.</p>
<p>Using the ?: operator with mismatched types is evil evil evil. String classes with a const char* conversion operator are evil evil evil. Putting them together&#8230; priceless.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/10/cc-low-level-curriculum-part-7-more-conditionals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C / C++ Low Level Curriculum Part 6: Conditionals</title>
		<link>http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/</link>
		<comments>http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/#comments</comments>
		<pubDate>Wed, 07 Mar 2012 07:06:41 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[gamedev]]></category>
		<category><![CDATA[low level]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24724</guid>
		<description><![CDATA[<p>Hello interwebs! As the title suggests this is the 6th part of the C / C++ Low Level Curriculum series I&#8217;ve been doing. In this installment we&#8217;ll be starting to look at conditional statements, and what the code that you&#8217;re asking the compiler to generate when you use them looks like (at least before the optimiser gets to it&#8230;).</p>
<p><a href="http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/" class="more-link">Read more on C / C++ Low Level Curriculum Part 6: Conditionals&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Hello interwebs! As the title suggests this is the 6th part of the C / C++ Low Level Curriculum series I&#8217;ve been doing. In this installment we&#8217;ll be starting to look at conditional statements, and what the code that you&#8217;re asking the compiler to generate when you use them looks like (at least before the optimiser gets to it&#8230;).</p>
<p>Just in case anyone is unclear about what they are, conditionals are the language features that allow us control over which parts of our code get executed. At face value, the subject of conditionals might seem a simple one, but it is precisely because it seems simple &#8211; and because so much else builds on top of it &#8211; that it is the first topic that I&#8217;ve chosen to look at in detail after function calls.</p>
<p>Though we won&#8217;t get around to all of them in this post, our look at conditionals will take us on a tour through a representative sample of x86 disassembly generated by <em><strong>if</strong></em> statements, the <em><strong>conditional operator</strong></em> (or &#8220;ternary operator&#8221;, or &#8220;question mark&#8221;), and <em><strong>switch</strong></em> statements. Whilst we look at all of these we&#8217;ll also be looking at disassembly generated by the (built in!) relational and logical operators that are used with them (i.e. ==, !=, &lt;=, &gt;=, &gt;, &lt;, !, &amp;&amp;, and ||).</p>
<h1>Prologue</h1>
<p>Firstly, I&#8217;d like to apologise to anyone who reads these posts regularly for the fact that my rate of posting has slowed down &#8211; I will hopefully speed up again to the regular 2 week posting cycle in the near future.</p>
<p>Secondly, here are the backlinks for anyone who wants to start from the beginning of the series (warning: it might take you a while, the first few are quite long):</p>
<ol>
<li><a href="../2012/02/07/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="../2012/02/07/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="../2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="../2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
<li><a href="http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/">http://altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</a></li>
</ol>
<p>Generally I will try to avoid too much assumed knowledge; but if something comes up that I&#8217;ve explained previously, or that I know another ADBAD author has covered already then I will just link to it; this implies that you, dear reader, should assume that I assume you will read anything I link to if you want to make complete sense of the article :)</p>
<h1>Compiling and running code from this article</h1>
<p>I assume that you are using Windows, are familiar with the VS2010 IDE, and comfortable writing, running, and debugging C++ programs.</p>
<p>As with the previous posts in this series, I’m using a win32 console application made by the “new project” wizard in VS2010 with the default options (VS2010 express edition is fine).</p>
<p>The only change I make from the default project setup is to turn off “Basic Runtime Checks” to make the generated assembler more legible (and significantly faster…). See this <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">previous post</a> for details on how to do this.</p>
<p>To run code from this article in a VS2010 project created this way, open the .cpp file that isn’t <em>stdafx.cpp</em> and replace everything in it with text copied and pasted from the code box.</p>
<p>The disassembly we look at is from the debug build configuration, which generates “vanilla” unoptimised win32 x86 code.</p>
<h1>Instructions and Mnemonics: an aside</h1>
<p>I&#8217;ve just realised that so far in this series I have typically been using the term <em><strong>instruction</strong> </em>when referring to an assembler <em><strong>mnemonic</strong></em>.</p>
<p>I felt that I should point out that this isn&#8217;t 100% accurate, because whilst assembler mnemonics are normally thought of as having a 1:1 correspondence to binary CPU instructions, they are <em><strong>not</strong></em> actually instructions.</p>
<p>In fact, in x86 assembler the mnemonics often have a 1:many relationship with the corresponding opcodes, because multiple variants of each mnemonic exist that differ in the types and sizes of their operands.</p>
<p>This is not something you should worry yourself about too much, as it&#8217;s a fairly harmless Kenobiism, but I still felt I should point it out if I was going to carry on doing it ;)</p>
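<p>As a concrete illustration of that 1:many relationship, here are a few 32-bit x86 encodings that all appear as <em><strong>mov</strong></em> in a disassembly window, written out as a C++ table. The byte values follow the standard Intel opcode encodings, and the struct and table names are hypothetical. The last two entries are two different encodings of exactly the same operation. This is why a mnemonic alone does not pin down the bytes that end up in memory.</p>

```cpp
#include <cassert>
#include <cstddef>

// A few 32-bit x86 machine code encodings that all disassemble as "mov".
struct MovEncoding
{
    const char*   text;     // how the instruction appears in a disassembly
    unsigned char bytes[8]; // machine code bytes (unused entries are zero)
    std::size_t   length;   // number of machine code bytes actually used
};

static const MovEncoding kMovEncodings[] =
{
    // mov r32, imm32: opcode B8 + register number (eax = 0), then imm32
    { "mov eax,1",               { 0xB8, 0x01, 0x00, 0x00, 0x00 },             5 },
    // mov r/m32, imm32: opcode C7, ModRM 45h ([ebp+disp8]), disp8 FCh (-4), imm32
    { "mov dword ptr [ebp-4],1", { 0xC7, 0x45, 0xFC, 0x01, 0x00, 0x00, 0x00 }, 7 },
    // register to register using opcode 8B (mov r32, r/m32) ...
    { "mov eax,ebx",             { 0x8B, 0xC3 },                               2 },
    // ... or opcode 89 (mov r/m32, r32): same operation, different bytes
    { "mov eax,ebx",             { 0x89, 0xD8 },                               2 },
};

static const std::size_t kNumMovEncodings = sizeof(kMovEncodings) / sizeof(kMovEncodings[0]);
```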
<h1>Conditionals</h1>
<p>The best place to start is, as someone or other famously once remarked, at the beginning; so let&#8217;s start with the most basic form of the <em><strong>if</strong></em> statement.</p>
<p>Before anyone mentions it, I know I could have omitted the curly braces around <em><strong>iLocal = 1;</strong></em> on <em><strong>line 9</strong></em>. If you&#8217;re the kind of person who&#8217;s so lazy that you like to leave out curly braces in these situations then that&#8217;s up to you; but I would just like to point out that there is probably a special place in one of the deeper and less pleasant circles of the Hell I don&#8217;t believe in that is reserved for your sort &#8211; just a couple of floors up from those who do the same thing with loops.</p>
<p>Also, I&#8217;ve left the <em><strong>#include &quot;stdafx.h&quot;</strong></em> in the code box so that your line numbers match mine if you&#8217;re working through this yourself.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">&lt;</span> <span style="color: #0000dd;">0</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Anyway, as usual if you&#8217;re looking at this in VS2010 then copy and paste the above code over whichever is your project&#8217;s main .cpp file, put a breakpoint on <em><strong>line 7</strong></em>, tell Visual Studio to compile and run, wait for the breakpoint to be hit, then right click in the source window and choose &#8220;Go To Disassembly&#8221;. You should now be seeing something like this:</p>
<div id="attachment_24853" class="wp-caption alignnone" style="width: 665px"><a href="http://altdevblogaday.com/wp-content/uploads/2012/03/6Conditionals_SimpleIfDisasm.png"><img class="size-full wp-image-24853" src="http://altdevblogaday.com/wp-content/uploads/2012/03/6Conditionals_SimpleIfDisasm.png" alt="Simple If Disassembly" width="655" height="431" /></a><p class="wp-caption-text">n.b. right-click and check you have the same options checked as me...</p></div>
<p>As we <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/" target="_blank">already know</a> the assembler above <em><strong>int iLocal = 0;</strong></em> is the function prologue (or preamble) and the assembler after the closing brace of <em><strong>main()</strong></em> is function epilogue.</p>
<p>The specific disassembly we&#8217;re interested in is between <em><strong>lines 7 and 13</strong></em> of the source code shown inline with the disassembly, so here it is pasted into a code window (N.B. the addresses corresponding to the disassembly instructions will almost certainly differ on your screen if you&#8217;re running this yourself):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">7</span><span style="color: #339933;">:</span>     if<span style="color: #009900; font-weight: bold;">&#40;</span> argc &lt; <span style="color: #ff0000;">0</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">010D20B0</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>
<span style="color: #adadad; font-style: italic;">010D20B4</span>  <span style="color: #00007f; font-weight: bold;">jge</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">1Dh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">10D20BDh</span><span style="color: #009900; font-weight: bold;">&#41;</span>
     <span style="color: #ff0000;">8</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
     <span style="color: #ff0000;">9</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">1</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">010D20B6</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>
    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">010D20BD</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#125;</span></pre></td></tr></table></div>

<p>Straight away, there are a couple of new assembler mnemonics we&#8217;ve not come across so far in this series of posts. We&#8217;ll cover these as we come to them.</p>
<p><em><strong>line 2</strong></em> is <em><strong>c</strong></em>o<em><strong>mp</strong></em>aring <em><strong>argc</strong></em> against <em><strong>0</strong></em>. The instruction <em><strong>cmp</strong></em> doesn&#8217;t have an immediate effect on code execution. It compares its first and second operands (by subtracting the second from the first and discarding the result) and stores the outcome of the comparison as a set of flags in a special CPU register known as <em><strong>EFLAGS</strong></em>.</p>
<p><em><strong>line 3</strong></em> uses the mnemonic <em><strong>jge</strong></em>, which means <em><strong>j</strong></em>ump if <em><strong>g</strong></em>reater or <em><strong>e</strong></em>qual (a signed comparison). It will cause a jump to the address <em><strong>0x010D20BD</strong></em> supplied as its operand if the previous <em><strong>cmp</strong></em> instruction set the flags in <em><strong>EFLAGS</strong></em> to indicate that its first operand was greater than or equal to its second operand. In other words, if <em><strong>argc</strong></em> is greater than or equal to <em><strong>0</strong></em>, execution will jump past the instructions generated for the block of code controlled by the <em><strong>if</strong></em>.</p>
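<p>To make the relationship between <em><strong>cmp</strong></em> and the conditional jumps a little more concrete, here is a simplified C++ model. It is a sketch, not a description of the real hardware. It tracks only the zero, sign, and overflow flags; the real <em><strong>EFLAGS</strong></em> register holds several more, such as the carry flag used by the unsigned jumps. The names <em>Flags</em>, <em>Cmp</em>, <em>JgeTaken</em>, <em>JneTaken</em> and <em>JeTaken</em> are made up. The condition modelled for <em><strong>jge</strong></em> (sign flag equals overflow flag) is the one Intel documents, and it is what makes <em><strong>jge</strong></em> a signed comparison.</p>

```cpp
#include <cassert>
#include <cstdint>

// A simplified model of cmp: subtract the second operand from the first,
// throw the result away, and keep only some status flags.
struct Flags
{
    bool zf; // zero flag: result was zero (operands equal)
    bool sf; // sign flag: top bit of the result was set
    bool of; // overflow flag: the signed subtraction overflowed
};

Flags Cmp(std::int32_t a, std::int32_t b)
{
    // Subtract in unsigned arithmetic to get the same 32-bit wrap-around
    // the CPU performs (signed overflow is undefined behaviour in C++).
    std::uint32_t ua = static_cast<std::uint32_t>(a);
    std::uint32_t ub = static_cast<std::uint32_t>(b);
    std::uint32_t r = ua - ub;

    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    // Overflow: the operands have different signs and the result's sign
    // differs from the sign of the first operand.
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;
    return f;
}

// Conditions tested by the jumps that appear in this article.
bool JgeTaken(const Flags& f) { return f.sf == f.of; } // jump if greater or equal (signed)
bool JneTaken(const Flags& f) { return !f.zf; }        // jump if not equal
bool JeTaken(const Flags& f)  { return f.zf; }         // jump if equal
```

<p>For the code above, <em><strong>JgeTaken(Cmp(argc, 0))</strong></em> is true whenever <em><strong>argc</strong></em> is 0 or more, which is exactly the case in which the assignment to <em><strong>iLocal</strong></em> is skipped.</p>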
<h1>Hold on a minute&#8230;</h1>
<p>So, we&#8217;ve only covered the most basic form of an <em><strong>if</strong></em> statement and we&#8217;ve already encountered a major difference between what we might think we&#8217;re asking the compiler to do, and the code it&#8217;s generating.</p>
<p>The intuitive way to think about an <em><strong>if</strong></em> block in a high level language is that if the condition of the <em><strong>if</strong></em> is met, execution will step into the curly-brace-delimited block of code it controls.</p>
<p>However, the assembler is clearly testing the logical opposite of what we&#8217;ve asked it to, and if that condition is met then it is skipping over the code block controlled by the <em><strong>if</strong></em>.</p>
<p>This is because, at the assembler level, instructions are executed in sequential order unless a jump instruction tells it to do otherwise &#8211; and so assembler has no equivalent to the high level concept of a curly brace delimited &#8220;code block&#8221;. The upshot of this is that the high level notion of &#8220;stepping into&#8221; a code block is implemented at the assembler level by &#8220;not skipping over&#8221; the code the block has generated.</p>
<p>Clearly these two behaviours are logically equivalent (i.e. they produce the same output given the same input), but the high level version is easier for the human mind to cope with intuitively, and the version generated by the compiler better suits the sequential-execution-unless-tampered-with behaviour of the underlying machine.</p>
<p>Just for the sake of clarity let&#8217;s re-write the C++ code in a form that matches what the assembler we just looked at does, using the C++ keyword <em><strong>goto</strong></em>:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// corresponding original code in comments to the right...</span>
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">&gt;=</span> <span style="color: #0000dd;">0</span> <span style="color: #009900;">&#41;</span> <span style="color: #b1b100;">goto</span> GreaterEqualZero<span style="color: #339933;">;</span>   <span style="color: #666666; font-style: italic;">//if( argc &lt; 0 )</span>
                                             <span style="color: #666666; font-style: italic;">//{</span>
    iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>                              <span style="color: #666666; font-style: italic;">//    iLocal = 1;</span>
                                             <span style="color: #666666; font-style: italic;">//}</span>
    GreaterEqualZero<span style="color: #339933;">:</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p><em><strong>NOTE:</strong></em> Ironically (though unsurprisingly) this C++ code generates different assembler to the original code. Please don&#8217;t worry about this.</p>
<h1>if &#8230; else if &#8230; else</h1>
<p>So let&#8217;s take a look at a more complicated<em> <strong>if</strong></em> construct:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">==</span> <span style="color: #0000dd;">0</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">13</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">else</span> <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">!=</span> <span style="color: #0000dd;">42</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #0000dd;">6</span> <span style="color: #339933;">*</span> <span style="color: #0000dd;">9</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">else</span>
    <span style="color: #009900;">&#123;</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">1066</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>This code generates the following assembler, which given what we saw in the previous example is more or less exactly what you&#8217;d expect:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">7</span><span style="color: #339933;">:</span>     if<span style="color: #009900; font-weight: bold;">&#40;</span> argc == <span style="color: #ff0000;">0</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">002020B0</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>
<span style="color: #adadad; font-style: italic;">002020B4</span>  <span style="color: #00007f; font-weight: bold;">jne</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">1Fh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">2020BFh</span><span style="color: #009900; font-weight: bold;">&#41;</span>
     <span style="color: #ff0000;">8</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
     <span style="color: #ff0000;">9</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">13</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">002020B6</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0Dh</span>
<span style="color: #adadad; font-style: italic;">002020BD</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">35h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">2020D5h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>     else if<span style="color: #009900; font-weight: bold;">&#40;</span> argc != <span style="color: #ff0000;">42</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">002020BF</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">2Ah</span>
<span style="color: #adadad; font-style: italic;">002020C3</span>  <span style="color: #00007f; font-weight: bold;">je</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">2Eh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">2020CEh</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">6</span> <span style="color: #339933;">*</span> <span style="color: #ff0000;">9</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">002020C5</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">36h</span>
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>     else
<span style="color: #adadad; font-style: italic;">002020CC</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">35h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">2020D5h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">1066</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">002020CE</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">42Ah</span>
    <span style="color: #ff0000;">18</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">19</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">20</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">002020D5</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

<p>The main things to note about this code are:</p>
<ol>
<li>Each <em><strong>if</strong></em> and <em><strong>else if</strong></em> condition is implemented as a <em><strong>cmp</strong> followed by a <strong>jxx</strong></em> &#8211; there are two new ones in here: <em><strong>je</strong></em> (<em><strong>j</strong></em>ump <em><strong>e</strong></em>qual) and <em><strong>jne</strong></em> (<em><strong>j</strong></em>ump <em><strong>n</strong></em>ot <em><strong>e</strong></em>qual)</li>
<li>As in the first example, each <em><strong>if</strong></em> and <em><strong>else if</strong></em> condition causes the compiler to generate the logical opposite of the high level test, and when that opposite test succeeds it skips over the assembler generated for the controlled block of code</li>
<li>The test for the first <em><strong>if</strong></em> jumps to the condition of the <em><strong>else if</strong></em> when its condition is not met. If there were more chained <em><strong>else if</strong></em> statements then this pattern would continue through them.</li>
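<li>Every block of code except the one controlled by the <em><strong>else</strong></em> ends with an unconditional <em><strong>jmp</strong></em> that takes execution past the code block controlled by the <em><strong>else</strong></em>; the <em><strong>else</strong></em> block comes last, so it simply falls through to the code after it</li>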
</ol>
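<p>To make the pattern concrete, here is a minimal C++ sketch (not from the original post) that rewrites the <em><strong>if &#8230; else if &#8230; else</strong></em> chain from the disassembly above using <em><strong>goto</strong></em>, so that each C++ statement lines up with one of the instructions. The function names <strong>ChainStructured()</strong>, <strong>ChainAsGotos()</strong> and <strong>CheckChainsAgree()</strong> are hypothetical, and both versions return <strong>iLocal</strong> rather than 0 so their results can be compared.</p>

```cpp
#include <cassert>

// The chain from the example, wrapped in a hypothetical function so it can be
// called; it returns iLocal (rather than 0) so the result can be checked.
int ChainStructured( int argc )
{
    int iLocal = 0;
    if( argc == 0 )
    {
        iLocal = 13;
    }
    else if( argc != 42 )
    {
        iLocal = ( 6 * 9 );
    }
    else
    {
        iLocal = 1066;
    }
    return iLocal;
}

// The same logic laid out the way the debug disassembly lays it out: each
// test is the logical opposite of the C++ condition, and when that opposite
// test succeeds it jumps past the block the C++ condition controls.
int ChainAsGotos( int argc )
{
    int iLocal = 0;

    if( argc != 0 ) goto ElseIfTest;    // cmp [argc],0   / jne
    iLocal = 13;                        // mov [iLocal],0Dh
    goto EndOfChain;                    // jmp past the else block

ElseIfTest:
    if( argc == 42 ) goto ElseBlock;    // cmp [argc],2Ah / je
    iLocal = ( 6 * 9 );                 // mov [iLocal],36h
    goto EndOfChain;                    // jmp past the else block

ElseBlock:
    iLocal = 1066;                      // mov [iLocal],42Ah, then falls through

EndOfChain:
    return iLocal;
}

// Both versions should agree for any input.
void CheckChainsAgree()
{
    const int aiInputs[] = { -1, 0, 1, 41, 42, 43, 69 };
    for( int iInput : aiInputs )
    {
        assert( ChainStructured( iInput ) == ChainAsGotos( iInput ) );
    }
}
```

<p>Nobody would write the <em><strong>goto</strong></em> version by hand; the point is only that the label layout matches the order of the instructions in the debug build.</p>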
<p>That was all pretty straightforward for once. Joy.</p>
<p>Next, let&#8217;s take a look at the effects of the <em><strong>&amp;&amp;</strong></em> and <em><strong>||</strong></em> operators:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &quot;stdafx.h&quot;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">*</span> argv<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">&gt;=</span> <span style="color: #0000dd;">7</span> <span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">&lt;=</span> <span style="color: #0000dd;">13</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">1024</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">else</span> <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">||</span> <span style="color: #009900;">&#40;</span> <span style="color: #339933;">!</span>argc <span style="color: #009900;">&#41;</span> <span style="color: #339933;">||</span> <span style="color: #009900;">&#40;</span> argc <span style="color: #339933;">==</span> <span style="color: #0000dd;">69</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span> <span style="color: #666666; font-style: italic;">// deliberately nonsensical test</span>
    <span style="color: #009900;">&#123;</span>
        iLocal <span style="color: #339933;">=</span> <span style="color: #0000dd;">666</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>This generates the following assembler, which is much more interesting than the first <em><strong>if &#8230; else if</strong></em> example:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">7</span><span style="color: #339933;">:</span>     if<span style="color: #009900; font-weight: bold;">&#40;</span> <span style="color: #009900; font-weight: bold;">&#40;</span> argc &gt;= <span style="color: #ff0000;">7</span> <span style="color: #009900; font-weight: bold;">&#41;</span> &amp;&amp; <span style="color: #009900; font-weight: bold;">&#40;</span> argc &lt;= <span style="color: #ff0000;">13</span> <span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00F120B0</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">7</span>
<span style="color: #adadad; font-style: italic;">00F120B4</span>  <span style="color: #00007f; font-weight: bold;">jl</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">25h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0F120C5h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00F120B6</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0Dh</span>
<span style="color: #adadad; font-style: italic;">00F120BA</span>  <span style="color: #00007f; font-weight: bold;">jg</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">25h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0F120C5h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
     <span style="color: #ff0000;">8</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
     <span style="color: #ff0000;">9</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">1024</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00F120BC</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">400h</span>
<span style="color: #adadad; font-style: italic;">00F120C3</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">3Eh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0F120DEh</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">11</span><span style="color: #339933;">:</span>     else if<span style="color: #009900; font-weight: bold;">&#40;</span> argc || <span style="color: #009900; font-weight: bold;">&#40;</span> !argc <span style="color: #009900; font-weight: bold;">&#41;</span> || <span style="color: #009900; font-weight: bold;">&#40;</span> argc == <span style="color: #ff0000;">69</span> <span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00F120C5</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>
<span style="color: #adadad; font-style: italic;">00F120C9</span>  <span style="color: #00007f; font-weight: bold;">jne</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">37h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0F120D7h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00F120CB</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>
<span style="color: #adadad; font-style: italic;">00F120CF</span>  <span style="color: #00007f; font-weight: bold;">je</span>          main<span style="color: #339933;">+</span><span style="color: #ff0000;">37h</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0F120D7h</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00F120D1</span>  <span style="color: #00007f; font-weight: bold;">cmp</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>argc<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">45h</span>
<span style="color: #adadad; font-style: italic;">00F120D5</span>  <span style="color: #00007f; font-weight: bold;">jne</span>         main<span style="color: #339933;">+</span><span style="color: #ff0000;">3Eh</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #ff0000;">0F120DEh</span><span style="color: #009900; font-weight: bold;">&#41;</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#123;</span>
    <span style="color: #ff0000;">13</span><span style="color: #339933;">:</span>         iLocal = <span style="color: #ff0000;">666</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00F120D7</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span>iLocal<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">29Ah</span>
    <span style="color: #ff0000;">14</span><span style="color: #339933;">:</span>     <span style="color: #009900; font-weight: bold;">&#125;</span>
    <span style="color: #ff0000;">15</span><span style="color: #339933;">:</span>
    <span style="color: #ff0000;">16</span><span style="color: #339933;">:</span>     return <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00F120DE</span>  <span style="color: #00007f; font-weight: bold;">xor</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>
    <span style="color: #ff0000;">17</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#125;</span></pre></td></tr></table></div>

<p>Now, I don&#8217;t know about you, but the first time I saw the assembler generated by <em><strong>&amp;&amp;</strong></em> and <em><strong>||</strong></em> I was amazed by the sheer simplistic audacity of it. I suspect this is because I&#8217;m not an assembler programmer, but I had expected it to be rather more complicated and fiddly than this.</p>
<p>Looking in detail at the code generated for the <em><strong>if</strong></em> statement using <em><strong>&amp;&amp;</strong></em> (<em><strong>lines 2 to 5</strong></em>), we can see that it uses two more conditional jump instructions we haven&#8217;t seen yet: <em><strong>jl</strong></em> (<em><strong>j</strong></em>ump <em><strong>l</strong></em>ess) and <em><strong>jg</strong></em> (<em><strong>j</strong></em>ump <em><strong>g</strong></em>reater). As before, each one tests the logical opposite of the condition specified by the high level code.</p>
<p>More interestingly, in order to implement <em><strong>&amp;&amp;</strong></em> the compiler simply concatenates the separate tests: if either test fails, execution jumps past the block of code controlled by the <em><strong>if</strong></em> statement. This means that the controlled block is only executed if both tests pass, which is exactly a logical AND.</p>
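<p>Written back out in C++ with <em><strong>goto</strong></em>, so that each statement mirrors one <em><strong>cmp</strong></em> / <em><strong>jxx</strong></em> pair, the <em><strong>&amp;&amp;</strong></em> test looks something like the sketch below. <strong>AndAsGotos()</strong> and <strong>CheckAndAsGotos()</strong> are hypothetical names. In the article&#8217;s code both jumps land on the <em><strong>else if</strong></em> test; here there is no <em><strong>else</strong></em>, so they simply land after the block.</p>

```cpp
#include <cassert>

// The article's && test written with goto so each statement mirrors one
// cmp / jxx pair (hypothetical function; returns iLocal so it can be checked).
int AndAsGotos( int argc )
{
    int iLocal = 0;
    if( argc < 7 )  goto PastBlock;     // cmp [argc],7   / jl  (opposite of >= 7)
    if( argc > 13 ) goto PastBlock;     // cmp [argc],0Dh / jg  (opposite of <= 13)
    iLocal = 1024;                      // mov [iLocal],400h
PastBlock:
    return iLocal;
}

// Compare against the && expression itself across a range of inputs.
void CheckAndAsGotos()
{
    for( int iArg = -5; iArg <= 20; ++iArg )
    {
        const int iExpected = ( ( iArg >= 7 ) && ( iArg <= 13 ) ) ? 1024 : 0;
        assert( AndAsGotos( iArg ) == iExpected );
    }
}
```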
<p>If we now turn our attention to the code generated for the <em><strong>else if</strong></em> statement using <em><strong>||</strong></em> (<em><strong>lines 12 to 17</strong></em>), we see a similar pattern of consecutive conditional tests, though it must clearly behave differently since it implements conditions joined by <em><strong>||</strong></em>.</p>
<p>The first thing to notice is that the first two tests done by the assembler are logically the same as their high level equivalents. This bucks the trend we have seen so far, but why?</p>
<p>Well, the addresses passed as operands to the conditional jumps on <em><strong>lines 13</strong></em> and <em><strong>15</strong></em> send execution past the rest of the tests, straight to the start of the controlled code block. Unsurprisingly, though, the last test of the <em><strong>||</strong></em> condition (<em><strong>lines 16 &amp; 17</strong></em>) follows the standard test-the-opposite-and-jump-past idiom we&#8217;ve come to expect from an <em><strong>if</strong></em> statement.</p>
<p>The jump-into-controlled-block behaviour of all but the last <em><strong>||</strong></em> conditional means that as soon as any one of the tests is passed the controlled code will be executed, which clearly implements a logical OR.</p>
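<p>Because the article&#8217;s <em><strong>||</strong></em> condition is deliberately always true, here is a C++ sketch using a hypothetical condition of the same shape, <strong>( iArg == 1 ) || ( iArg == 5 ) || ( iArg == 9 )</strong>, written with <em><strong>goto</strong></em> so that each statement mirrors one <em><strong>cmp</strong></em> / <em><strong>jxx</strong></em> pair. The first two tests jump into the block in the same sense as the C++, and only the last one is inverted. <strong>OrAsGotos()</strong> and <strong>CheckOrAsGotos()</strong> are hypothetical names.</p>

```cpp
#include <cassert>

// Hypothetical condition with the same shape as the article's || test:
//     if( ( iArg == 1 ) || ( iArg == 5 ) || ( iArg == 9 ) ) { iLocal = 666; }
// laid out the way the debug disassembly lays out a chain of || operands.
int OrAsGotos( int iArg )
{
    int iLocal = 0;
    if( iArg == 1 ) goto ControlledBlock;   // same sense as the C++: jump INTO the block
    if( iArg == 5 ) goto ControlledBlock;   // same sense as the C++: jump INTO the block
    if( iArg != 9 ) goto PastBlock;         // last operand: inverted, jump PAST the block
ControlledBlock:
    iLocal = 666;
PastBlock:
    return iLocal;
}

// Compare against the || expression itself across a range of inputs.
void CheckOrAsGotos()
{
    for( int iArg = -5; iArg <= 20; ++iArg )
    {
        const int iExpected = ( ( iArg == 1 ) || ( iArg == 5 ) || ( iArg == 9 ) ) ? 666 : 0;
        assert( OrAsGotos( iArg ) == iExpected );
    }
}
```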
<h1>Aside: Lazy Evaluation</h1>
<p>I&#8217;m sure that most (if not all) of you will have heard that C and C++ have &#8220;lazy evaluation&#8221; (more commonly called short-circuit evaluation) of &amp;&amp; and ||. If you&#8217;ve never been 100% sure what this means, you&#8217;ve just seen it in action in this block of assembler!</p>
<p>The &amp;&amp; will fail if either of its operands fails; so if the first test fails it will never do the second (or third, or fourth &#8230;).</p>
<p>Similarly, the || will succeed if either of its operands succeeds; so if the first test passes it will never do the second (or third, or fourth &#8230;).</p>
<p>Since neither necessarily evaluates all of its operands, this makes them technically &#8220;lazy&#8221;, which in this circumstance you can read as awesome, elegant, and efficient (for certain definitions of efficient).</p>
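<p>You can also observe this behaviour from C++ without looking at the disassembly, by giving the operands a side effect. In this minimal sketch, <strong>CountedOperand()</strong> and the <strong>g_iEvaluations</strong> counter are hypothetical helpers that record how many operands were actually evaluated.</p>

```cpp
#include <cassert>

// Hypothetical helper: counts how many times it is called, then returns bValue.
static int g_iEvaluations = 0;

bool CountedOperand( bool bValue )
{
    ++g_iEvaluations;
    return bValue;
}

// Returns how many operands of && were actually evaluated.
int EvaluationsForAnd( bool bFirst, bool bSecond )
{
    g_iEvaluations = 0;
    const bool bResult = CountedOperand( bFirst ) && CountedOperand( bSecond );
    ( void )bResult;
    return g_iEvaluations;
}

// Returns how many operands of || were actually evaluated.
int EvaluationsForOr( bool bFirst, bool bSecond )
{
    g_iEvaluations = 0;
    const bool bResult = CountedOperand( bFirst ) || CountedOperand( bSecond );
    ( void )bResult;
    return g_iEvaluations;
}

void CheckShortCircuit()
{
    assert( EvaluationsForAnd( false, true ) == 1 );  // first operand false: second skipped
    assert( EvaluationsForAnd( true,  true ) == 2 );
    assert( EvaluationsForOr ( true, false ) == 1 );  // first operand true: second skipped
    assert( EvaluationsForOr ( false, true ) == 2 );
}
```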
<h1>Summary</h1>
<p>The main points to take away from the assembler we&#8217;ve looked at in this post are that:</p>
<ol>
<li>The conditional test that you see in the disassembly is likely to be the logical opposite of the test the high level code is asking for&#8230;</li>
<li>&#8230;and the conditional jump will typically be jumping over the assembler that is generated by the &#8220;code block&#8221; controlled by the conditional in the high level code.</li>
<li>This is because there is no concept of a &#8220;code block&#8221; at the level of assembler.</li>
</ol>
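<p>As a quick reference for trying out the other relational operators, here is a hypothetical C++ lookup table (not from the article) pairing each signed comparison with the x86 jump that is taken when it is true, and the inverted jump that a debug build typically uses to skip an <em><strong>if</strong></em> block. The four rows that appear in this article&#8217;s disassembly are checked against it; treat the rest as a guide to confirm in your own disassembly, since the compiler is free to arrange its tests differently (and unsigned comparisons use <em><strong>jb</strong></em> / <em><strong>jbe</strong></em> / <em><strong>ja</strong></em> / <em><strong>jae</strong></em> instead).</p>

```cpp
#include <cassert>
#include <cstring>

// Hypothetical lookup table for comparisons of SIGNED ints (as in these examples):
// the x86 jump taken when the comparison is true, and the inverted jump an
// if statement typically uses to skip its block when the comparison is false.
// Unsigned comparisons use the jb / jbe / ja / jae family instead.
struct SJumpForOperator
{
    const char* pszOperator;
    const char* pszJumpIfTrue;
    const char* pszJumpIfFalse;
};

const SJumpForOperator g_aJumps[] =
{
    { "==", "je",  "jne" },
    { "!=", "jne", "je"  },
    { "<",  "jl",  "jge" },
    { "<=", "jle", "jg"  },
    { ">",  "jg",  "jle" },
    { ">=", "jge", "jl"  },
};

// Returns the jump an if statement would typically use to skip its block,
// or nullptr if the operator is not in the table.
const char* InvertedJumpFor( const char* pszOperator )
{
    for( const SJumpForOperator& rEntry : g_aJumps )
    {
        if( std::strcmp( rEntry.pszOperator, pszOperator ) == 0 )
        {
            return rEntry.pszJumpIfFalse;
        }
    }
    return nullptr;
}

// The four rows that appear in the disassembly in this article.
void CheckAgainstArticle()
{
    assert( std::strcmp( InvertedJumpFor( "==" ), "jne" ) == 0 );  // argc == 0
    assert( std::strcmp( InvertedJumpFor( "!=" ), "je"  ) == 0 );  // argc != 42
    assert( std::strcmp( InvertedJumpFor( ">=" ), "jl"  ) == 0 );  // argc >= 7
    assert( std::strcmp( InvertedJumpFor( "<=" ), "jg"  ) == 0 );  // argc <= 13
}
```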
<p>More or less all control flow boils down to various combinations of conditionals and jumps at the assembly level. Being familiar with the assembler mnemonics used to implement these C / C++ features, and with the various ways they are used, will almost certainly prove invaluable when you find yourself in the unenviable situation of a crash deep within some library code that you don&#8217;t have symbols for (or that your debugger can&#8217;t find symbols for).</p>
<p>Incidentally if you find yourself lost in code that you should have symbols for but your machine refuses to find them, you might try <a href="http://altdevblogaday.com/2011/11/08/source-indexing-is-underused-awesomeness/">this post</a> by Bruce Dawson to see if it helps ;)</p>
<p>Next time we&#8217;ll continue looking at conditionals with the <em><strong>conditional</strong></em> operator (also known as the &#8220;ternary operator&#8221; or more commonly the question mark), and the <em><strong>switch</strong></em> statement.</p>
<p>Also, thanks to Fabian and Bruce for giving this a once-over and offering sage advice on content.</p>
<p>&nbsp;</p>
<p><em><strong>Disclaimers</strong></em></p>
<p>I am pretty sure that the code in this article doesn&#8217;t demonstrate all the relational operators; so I&#8217;m leaving it to you, dear reader, to try out the ones I left out to see what they do :)</p>
<p>I also avoided writing any conditions for the <em><strong>if</strong></em> statements that contained function calls. These would clearly make the assembler generated for the tests significantly more complex, but assuming that you have also read the previous posts on the assembler generated when calling functions, you should be able to make sense of it by yourself. I have to admit that I also partly avoided doing this so I could steer clear of operator overloading. That&#8217;s for later. Probably.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C / C++ Low Level Curriculum Part 5: Even More Stack</title>
		<link>http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/</link>
		<comments>http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 07:06:59 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=23824</guid>
		<description><![CDATA[<p>Welcome to the 5th installment of the series I&#8217;m doing on a C/C++ Low-Level Curriculum. This is the 3rd post about the Stack. The fundamentals were covered a couple of posts ago, and the previous post and this one are really just extra information to round out the picture of the ways the Stack is used in win32 x86 function calls; after this we can move on to other low level aspects of the C/C++ languages.</p>
<p><a href="http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/" class="more-link">Read more on C / C++ Low Level Curriculum Part 5: Even More Stack&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Welcome to the 5th installment of the series I&#8217;m doing on a C/C++ Low-Level Curriculum. This is the 3rd post about the Stack. The fundamentals were covered a couple of posts ago, and the previous post and this one are really just extra information to round out the picture of the ways the Stack is used in win32 x86 function calls; after this we can move on to other low level aspects of the C/C++ languages.</p>
<p>The last two (win32 x86) function calling conventions we&#8217;re going to look at are <strong>thiscall</strong>, which is used for calling non-static member functions of classes, and <strong>fastcall</strong>, which emphasises register use over stack use for parameters. As with the previous posts about the Stack, the point of this isn&#8217;t so much the specific calling conventions we&#8217;re examining, but rather to see the different ways that the Stack and registers are used to pass information around when functions are called.</p>
<p>&nbsp;</p>
<h2><strong>Previously on #AltDevBlogADay&#8230;</strong></h2>
<p>If you missed the previous C/C++ Low Level Curriculum posts, here are some backlinks:</p>
<ol>
<li><a href="../2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="../2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
<li><a href="http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</a></li>
</ol>
<p>Generally I will try to avoid too much assumed knowledge, but this post does assume that you have read the posts linked above as 3 and 4 (or have a working knowledge of how the Stack works in vanilla x86 assembler, in which case why are you reading this!?).</p>
<p>&nbsp;</p>
<h2>Compiling and running code from this article</h2>
<p>I assume that you are familiar with the VS2010 IDE, and comfortable writing, running, and debugging C++ programs.</p>
<p>As with the previous posts in this series, I&#8217;m using a win32 console application made by the &#8220;new project&#8221; wizard in VS2010 with the default options (express edition is fine).</p>
<p>The only change I make from the default project setup is to turn off &#8220;Basic Runtime Checks&#8221; to make the generated assembler more legible (and significantly faster&#8230;); see this <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">previous post</a> for details on how to do this.</p>
<p>To run code from this article in a VS2010 project created this way, open the .cpp file that isn&#8217;t <em>stdafx.cpp</em> and replace everything below the line #include &#8220;stdafx.h&#8221; with text copied and pasted from the code box.</p>
<p>The disassembly we look at is from the debug build configuration, which will generate &#8220;vanilla&#8221; unoptimised win32 x86 code.</p>
<p>&nbsp;</p>
<h2>The &#8220;thiscall&#8221; calling convention</h2>
<p>As I&#8217;m sure you&#8217;re aware, in any non-static class member function it is possible to access a pointer to the instance of the class that the function was called on via the C++ keyword <strong>this</strong>.</p>
<p>The presence of the <strong>this</strong> pointer is often explained away by saying that it is an invisible &#8220;0th parameter to member functions&#8221;. This isn&#8217;t necessarily incorrect, but it <em>is</em> the same kind of truth that Obi-Wan Kenobi might have dealt in if he had been a computer science professor rather than a retired Jedi Knight; that is to say &#8220;true, from a certain point of view&#8221;.</p>
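<p>To make the &#8220;0th parameter&#8221; point of view concrete, here is a minimal C++ sketch in which a member function and a free function that takes the object pointer explicitly do exactly the same work. The class <strong>CAccumulator</strong> and the functions <strong>Add()</strong>, <strong>Self()</strong> and <strong>Accumulator_Add()</strong> are hypothetical, not from the article. Bear in mind that this is only an analogy: as described below, the <strong>thiscall</strong> convention doesn&#8217;t actually pass <strong>this</strong> like an ordinary parameter.</p>

```cpp
#include <cassert>

// Hypothetical class: Add() uses the implicit this pointer.
class CAccumulator
{
public:
    int m_iTotal = 0;

    void Add( int iValue )
    {
        m_iTotal += iValue;             // really this->m_iTotal += iValue;
    }

    const CAccumulator* Self() const
    {
        return this;                    // the address of the object this was called on
    }
};

// The "0th parameter" view of Add(): the same body, with the object pointer
// made explicit. Analogy only: with MSVC's x86 thiscall convention the real
// this pointer travels in ecx rather than being pushed onto the Stack.
void Accumulator_Add( CAccumulator* pThis, int iValue )
{
    pThis->m_iTotal += iValue;
}

void CheckBothFormsAgree()
{
    CAccumulator cViaMember;
    CAccumulator cViaFreeFunction;

    cViaMember.Add( 5 );
    cViaMember.Add( 7 );
    Accumulator_Add( &cViaFreeFunction, 5 );
    Accumulator_Add( &cViaFreeFunction, 7 );

    assert( cViaMember.m_iTotal == cViaFreeFunction.m_iTotal );
    assert( cViaMember.Self() == &cViaMember );
}
```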
<p>The <strong>thiscall</strong> calling convention is more or less exactly the same as the <strong>stdcall</strong> calling convention we have already looked at in some detail in the last two posts (<a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">this-&gt;pPrevious-&gt;pPrevious</a>, <a href="http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">this-&gt;pPrevious</a>). Though it is the default calling convention used by the VS2010 compiler for non-static member functions, it&#8217;s worth noting that there are situations where the compiler won&#8217;t use it, e.g. if your function uses an ellipsis (&#8230;) to take a variable number of arguments.</p>
<p>As we have seen in the last two posts, the unoptimised win32 x86 <strong>stdcall</strong> calling convention passes its parameters on the Stack. The <strong>thiscall</strong> convention must obviously pass the <strong>this</strong> pointer to member functions somehow, but rather than storing it as an extra parameter on the Stack, it uses a register (<strong>ecx</strong>) to pass it to the called function.</p>
<p>The code below demonstrates this&#8230;</p>
<pre>class CSumOf
{
public:
    int m_iSumOf;

    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }
};

int main( int argc, char** argv )
{
    int iValOne        = 1;
    int iValTwo        = 2;
    CSumOf cMySumOf;
    cMySumOf.SumOf( iValOne, iValTwo );
    return 0;
}</pre>
<p>Paste this into VS2010, and put a breakpoint on the line</p>
<pre>cMySumOf.SumOf( iValOne, iValTwo );</pre>
<p>Run the debug build configuration; when the breakpoint is hit, right click and choose &#8220;Go To Disassembly&#8221;, and you should see something like this (n.b. the addresses in the leftmost column of the disassembly will almost certainly differ):</p>
<div id="attachment_23833" class="wp-caption alignnone" style="width: 718px"><a href="http://altdevblogaday.com/wp-content/uploads/2012/02/EvenMoreStack_thiscall_main1.png"><img class="size-full wp-image-23833" src="http://altdevblogaday.com/wp-content/uploads/2012/02/EvenMoreStack_thiscall_main1.png" alt="" width="708" height="508" /></a><p class="wp-caption-text">Make sure that the check boxes in your right-click context menu match those shown in this screenshot, or your disassembly will not match mine!</p></div>
<p>The block of assembler that we&#8217;re interested in for the purposes of illustrating how the <strong>thiscall</strong> convention works is shown below:</p>
<pre>    14:     int iValOne        = 1;
00EE1259  mov         dword ptr [iValOne],1
    15:     int iValTwo        = 2;
00EE1260  mov         dword ptr [iValTwo],2
    16:     CSumOf cMySumOf;
    17:     cMySumOf.SumOf( iValOne, iValTwo );
00EE1267  mov         eax,dword ptr [iValTwo]
00EE126A  push        eax
00EE126B  mov         ecx,dword ptr [iValOne]
00EE126E  push        ecx
00EE126F  lea         ecx,[cMySumOf]
00EE1272  call        CSumOf::SumOf (0EE112Ch)</pre>
<p>The assembler involved with calling <strong>CSumOf::SumOf()</strong> runs from <em>line 7</em> to <em>line 12</em>.</p>
<p><em>Lines 7 to 10</em> are pushing the parameters to the function onto the stack in reverse order of declaration, exactly as with the <strong>stdcall</strong> convention we looked at in the <a href="http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">previous article</a>.</p>
<p><em>Line 11</em> is storing the address of <strong>cMySumOf</strong> in <strong>ecx</strong> using the instruction <strong>lea</strong>. If you right click and un-check &#8220;Show Symbol Names&#8221; you can see that <strong>lea</strong> is computing the address of <strong>cMySumOf</strong> from its offset relative to the <strong>ebp</strong> register.</p>
<p><em>Line 12</em> is obviously calling the function.</p>
<p>Stepping into the function call in the disassembly you should see the following: <em>(not forgetting that we have to step through an additional <strong>jmp</strong> instruction before we get there because of VS2010 incremental linking &#8211; see approx. half way through <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">this post </a>for the details)</em></p>
<pre>     6:     void SumOf( int iParamOne, int iParamTwo )
     7:     {
00EE1280  push        ebp
00EE1281  mov         ebp,esp
00EE1283  sub         esp,44h
00EE1286  push        ebx
00EE1287  push        esi
00EE1288  push        edi
00EE1289  mov         dword ptr [ebp-4],ecx
     8:         m_iSumOf = iParamOne + iParamTwo;
00EE128C  mov         eax,dword ptr [iParamOne]
00EE128F  add         eax,dword ptr [iParamTwo]
00EE1292  mov         ecx,dword ptr [this]
00EE1295  mov         dword ptr [ecx],eax
     9:     }</pre>
<p>The calling code stored the address of the local variable <strong>cMySumOf</strong> (the instance the function is being called on) in the <strong>ecx</strong> register before calling this function. If we examine <em>line 9</em> in the code box above, you can see that &#8211; compared to the <strong>stdcall</strong> assembler &#8211; the function prologue has an extra step: it <strong>mov</strong>es the value in <strong>ecx</strong> into a memory address within the function&#8217;s stack frame (i.e. [<strong>ebp-4</strong>]). The upshot of this is that after <em>line 9</em>, [<strong>ebp-4</strong>] stores the function&#8217;s <strong>this</strong> pointer.</p>
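<p>One way to picture this is to write the hidden parameter out by hand. The sketch below is only a conceptual C++ equivalent, not what the compiler literally generates, and the free function <strong>CSumOf_SumOf()</strong> is a made-up name for illustration: the member function behaves like a free function whose first parameter is the address of the object, which <strong>thiscall</strong> happens to pass in <strong>ecx</strong> rather than on the Stack.</p>

```cpp
#include <cassert>

class CSumOf
{
public:
    int m_iSumOf;

    // The member function as written above.
    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }
};

// Hypothetical hand-written equivalent: 'pThis' plays the role of the
// hidden 'this' pointer that thiscall passes in ecx.
void CSumOf_SumOf( CSumOf* pThis, int iParamOne, int iParamTwo )
{
    assert( pThis != nullptr );
    pThis->m_iSumOf = iParamOne + iParamTwo;
}

// Calls both versions on separate objects and reports whether they agree.
bool BothVersionsAgree( int iValOne, int iValTwo )
{
    CSumOf cViaMember;
    CSumOf cViaFreeFunction;
    cViaMember.SumOf( iValOne, iValTwo );                 // compiler passes &cViaMember implicitly
    CSumOf_SumOf( &cViaFreeFunction, iValOne, iValTwo );  // we pass the address explicitly
    return cViaMember.m_iSumOf == cViaFreeFunction.m_iSumOf;
}
```

<p>The only real difference is who supplies the address: in the member call the compiler does it for you, which is exactly the <strong>lea ecx,[cMySumOf]</strong> we saw before the <strong>call</strong>.</p>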
<p>The function then proceeds exactly as you might expect from the disassembly we&#8217;ve examined in previous articles up until <em>line 13</em>.</p>
<p><em>Line 13</em> <strong>mov</strong>es the <strong>this</strong> pointer (previously stored in the function&#8217;s stack frame) into <strong>ecx</strong>, then <em>line 14</em> stores the value of <strong>eax</strong> into the address specified by <strong>ecx</strong> (remember: in the VS2010 disassembly view, values in <strong>[</strong>square brackets<strong>]</strong> are memory accesses, taking the address to access from the value in the brackets). If you right click in the disassembly window and un-check &#8220;Show Symbol Names&#8221; you will see that the symbol <strong>this</strong> corresponds to <strong>ebp-4</strong>, which is where the value of <strong>ecx</strong> was stored at the end of the function prologue.</p>
<p>The astute amongst you will have noticed that the assembler stores the <strong>this</strong> pointer from <strong>ecx</strong> into the Stack only to reload it into <strong>ecx</strong> later, without having used the register in the meantime. This is exactly the kind of odd thing that unoptimised compiler-generated assembler will do; try not to let it bother you :)</p>
<p>So the sum of the two parameters is stored using the <strong>this</strong> pointer, and then we hit the function epilogue and the function returns; end of story &#8211; or is it?</p>
<p>&nbsp;</p>
<h2>Nothing to see here. Move along.</h2>
<p>This might not be what you were expecting because &#8211; based on what we&#8217;ve seen so far &#8211; the assembler that sets <strong>CSumOf::m_iSumOf</strong> in the member function doesn&#8217;t obviously match the C++ code we wrote.</p>
<p>What we&#8217;re seeing looks like it might have been generated by the code</p>
<pre>*((int*) this) = iParamOne + iParamTwo;</pre>
<p>And in fact if you substitute that line it will generate exactly the same assembler &#8211; so how does that work?!?</p>
<pre>// Here's what we wrote. Since m_iSumOf is a class member the language syntax allows us to
// "access it directly" (another Professor Kenobiism) in the member function
m_iSumOf = iParamOne + iParamTwo;

// in fact, what happens is that the compiler evaluates the code as if it was written like this
this-&gt;m_iSumOf = iParamOne + iParamTwo;</pre>
<p>Ok, so there&#8217;s invisible pointer access in the C++ code, but that still doesn&#8217;t explain what we&#8217;re seeing &#8211; exactly how is</p>
<pre>*((int*) this)</pre>
<p>equivalent to</p>
<pre>this-&gt;m_iSumOf</pre>
<p>The answer has to do with the memory layout of C++ classes (and structs), which is a topic for another entire article (probably several).</p>
<p>For now we&#8217;ll keep the explanation simple whilst trying not to channel our friend Professor Kenobi more than absolutely necessary&#8230;</p>
<p>First let&#8217;s take it as read that the member data for an instance of a class must be stored somewhere in memory, and take a high level look at how the &#8220;pointing to&#8221; operator (<strong>-&gt;</strong>) works with another code snippet:</p>
<pre>this-&gt;m_iSumOf = 0;</pre>
<p>This basically tells the compiler to generate assembler that:</p>
<ul>
<li>gets the value of <strong>this</strong> (a memory address)</li>
<li>looks up the offset of <strong>m_iSumOf</strong> relative to the start of the data making up an instance of <strong>CSumOf</strong> (which is known at compile time, so it&#8217;s constant at run time)</li>
<li>adds the offset to the value of <strong>this</strong> to get the memory address storing <strong>m_iSumOf</strong>, and then sets the value at the resulting memory address to 0</li>
</ul>
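<p>The three steps above can be written out by hand in C++. This is a sketch using a hypothetical two-member class <strong>CTwoInts</strong> (not from the article), chosen so that the member being set has a non-zero offset; it uses the standard <strong>offsetof</strong> macro to get the compile-time offset, which is the classic idiom for standard-layout types rather than anything the compiler literally emits.</p>

```cpp
#include <cassert>
#include <cstddef>   // offsetof

// Hypothetical class with two members, so the second one is not at offset 0.
struct CTwoInts
{
    int m_iFirst;
    int m_iSecond;
};

// Hand-written version of 'pThis->m_iSecond = iValue', following the steps
// listed above: take the value of the pointer, add the member's constant
// offset, then store through the resulting address.
void SetSecondByHand( CTwoInts* pThis, int iValue )
{
    assert( pThis != nullptr );
    char* pBytes  = reinterpret_cast<char*>( pThis );          // first byte of the object's data
    char* pMember = pBytes + offsetof( CTwoInts, m_iSecond );  // offset is known at compile time
    *reinterpret_cast<int*>( pMember ) = iValue;               // store through the computed address
}
```

<p>With a 4-byte <strong>int</strong> the offset of <strong>m_iSecond</strong> will typically be 4, so in a debug build you would expect to see something like <strong>mov dword ptr [ecx+4],eax</strong> for the equivalent member access.</p>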
<p>The <strong>this</strong> pointer holds the address of the first byte of the data in an instance of <strong>CSumOf</strong>.</p>
<p>The first (and only) member variable in <strong>CSumOf</strong> is <strong>m_iSumOf</strong>, which puts it at an offset of 0 relative to the <strong>this</strong> pointer &#8211; and clearly even a debug build knows better than to add an offset of 0, so it accesses the memory at the address <strong>this</strong>.</p>
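<p>We can check the offset-of-zero claim for <strong>CSumOf</strong> at compile time. The sketch below adds a hypothetical member function <strong>SumOfViaCast()</strong> written the way the disassembly looks; the cast is only valid because <strong>CSumOf</strong> is a standard-layout class whose first member is <strong>m_iSumOf</strong>, so the object and that member share the same address.</p>

```cpp
#include <cassert>
#include <cstddef>   // offsetof

class CSumOf
{
public:
    int m_iSumOf;

    // What we wrote in the article.
    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }

    // Hypothetical: the form the debug disassembly looks like, storing
    // straight to the address in 'this'. Only valid because m_iSumOf is
    // the first (and only) member.
    void SumOfViaCast( int iParamOne, int iParamTwo )
    {
        *( (int*) this ) = iParamOne + iParamTwo;
    }
};

// The compiler knows this offset at compile time, so we can check it there too.
static_assert( offsetof( CSumOf, m_iSumOf ) == 0, "m_iSumOf should be at offset 0" );

// Uses the cast version and checks it matches the plain arithmetic.
int SumViaCast( int iParamOne, int iParamTwo )
{
    CSumOf cSum;
    cSum.SumOfViaCast( iParamOne, iParamTwo );
    assert( cSum.m_iSumOf == iParamOne + iParamTwo );
    return cSum.m_iSumOf;
}
```

<p>If you add a second member before <strong>m_iSumOf</strong>, the <strong>static_assert</strong> will fail to compile, which is a handy way of seeing that the offset really is a compile-time property of the class layout.</p>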
<p>So, again, we can see that even in seemingly innocuous everyday C++ code there is hidden stuff going on &#8211; which is a big part of why I&#8217;m doing this series :)</p>
<p>Incidentally, I have recently been made aware of an unbelievably useful (and undocumented!) feature of the VS2010 compiler which prints the memory layout of classes to the build output during compilation: here&#8217;s the link I was sent, I hope you find it useful: <a href="http://thetweaker.wordpress.com/2010/11/07/d1reportallclasslayout-dumping-object-memory-layout/">http://thetweaker.wordpress.com/2010/11/07/d1reportallclasslayout-dumping-object-memory-layout/</a></p>
<p>&nbsp;</p>
<h2>fastcall (last one, I promise)</h2>
<p>At last we come to the win32 x86 calling convention excitingly named <strong>fastcall</strong>, so named because in theory it makes function calls faster (than the more common <strong>stdcall</strong> or <strong>cdecl</strong> conventions).</p>
<p>So why is it faster than the other calling conventions that we&#8217;ve looked at? To answer this, we&#8217;ll need to examine the assembler generated by a function call that uses the <strong>fastcall</strong> convention.</p>
<p>To demonstrate this we&#8217;ll use the code below:</p>
<pre>int __fastcall SumOf( int iParamOne, int iParamTwo, int iParamThree )
{
    int iLocal = iParamOne + iParamTwo + iParamThree;
    return iLocal;
}

int main( int argc, char** argv )
{
    int iValOne   = 1;
    int iValTwo   = 2;
    int iValThree = 4;
    int iResult   = SumOf( iValOne, iValTwo, iValThree );
    return 0;
}</pre>
<p>This is basically the same as the code used in the <a href="http://altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/">previous post </a>in the series to show how the <strong>stdcall</strong> calling convention stores multiple parameters on the stack, except the function <strong>SumOf</strong> has got an extra keyword between the return type and the name of the function.</p>
<p>The <strong>__fastcall</strong> keyword is a compiler-specific C++ extension (Microsoft&#8217;s spelling, although other x86 compilers support an equivalent convention) that changes the calling convention used to call the function it is applied to (<a href="http://en.wikipedia.org/wiki/X86_calling_conventions#fastcall">http://en.wikipedia.org/wiki/X86_calling_conventions#fastcall</a>).</p>
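<p>Because the keyword isn&#8217;t standard C++, code that wants fastcall on more than one compiler usually hides it behind a macro. Here is a minimal sketch, assuming 32-bit x86 targets; the macro name <strong>LLC_FASTCALL</strong> is made up for this example. MSVC spells it <strong>__fastcall</strong>, and GCC and Clang accept <strong>__attribute__((fastcall))</strong> when targeting 32-bit x86. On any other target the macro expands to nothing, so the function just uses the default convention.</p>

```cpp
#include <cassert>

// Hypothetical helper macro (not from any standard header).
#if defined( _MSC_VER ) && defined( _M_IX86 )
    #define LLC_FASTCALL __fastcall                   // MSVC, 32-bit x86
#elif defined( __GNUC__ ) && defined( __i386__ )
    #define LLC_FASTCALL __attribute__(( fastcall ))  // GCC / Clang, 32-bit x86
#else
    #define LLC_FASTCALL                              // other targets: default convention
#endif

int LLC_FASTCALL SumOf( int iParamOne, int iParamTwo, int iParamThree )
{
    int iLocal = iParamOne + iParamTwo + iParamThree;
    return iLocal;
}

// The calling convention changes how the arguments travel, not the result.
int SumOfOneTwoFour()
{
    int iResult = SumOf( 1, 2, 4 );
    assert( iResult == 7 );
    return iResult;
}
```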
<p>If you follow the usual drill to make a runnable project from this snippet, put a breakpoint on <em>line 12</em>, then compile and run the debug configuration, wait for the breakpoint to get hit, and go to disassembly you should see something like this:</p>
<pre>     8: int main( int argc, char** argv )
     9: {
010F1280  push        ebp
010F1281  mov         ebp,esp
010F1283  sub         esp,50h
010F1286  push        ebx
010F1287  push        esi
010F1288  push        edi
    10:     int iValOne   = 1;
010F1289  mov         dword ptr [iValOne],1
    11:     int iValTwo   = 2;
010F1290  mov         dword ptr [iValTwo],2
    12:     int iValThree = 4;
010F1297  mov         dword ptr [iValThree],4
    13:     int iResult   = SumOf( iValOne, iValTwo, iValThree );
010F129E  mov         eax,dword ptr [iValThree]
010F12A1  push        eax
010F12A2  mov         edx,dword ptr [iValTwo]
010F12A5  mov         ecx,dword ptr [iValOne]
010F12A8  call        SumOf (10F1136h)
010F12AD  mov         dword ptr [iResult],eax
    14:     return 0;
010F12B0  xor         eax,eax
    15: }</pre>
<p>You should by this point be pretty familiar with function prologues, and the assembler that precedes a function call in the other conventions we&#8217;ve examined, so we&#8217;ll just look at the differences with <strong>__fastcall</strong>.</p>
<p>Looking at <em>lines 16 to 20</em>, we can see that of the three parameters passed to <strong>SumOf()</strong>:</p>
<ul>
<li>the 3rd (<strong>iValThree</strong>) is being <strong>push</strong>ed onto the stack,</li>
<li>the 2nd (<strong>iValTwo</strong>) is being <strong>mov</strong>ed into the <strong>edx</strong> register, and</li>
<li>the 1st (<strong>iValOne</strong>) is being <strong>mov</strong>ed into the <strong>ecx</strong> register</li>
</ul>
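<p>This matches Microsoft&#8217;s documented rule for x86 <strong>__fastcall</strong>: the first two DWORD-or-smaller arguments, taken left to right, go in <strong>ecx</strong> and <strong>edx</strong>, and any remaining arguments are pushed right to left (the called function then removes them from the Stack). The sketch below models only that rule, assuming every argument is a 32-bit <strong>int</strong>; the names <strong>FastcallPlacement</strong> and <strong>PlaceFastcallArgs()</strong> are made up for illustration.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Where each argument ends up, according to the simplified model below.
struct FastcallPlacement
{
    std::string              ecxArg;     // 1st argument (empty if none)
    std::string              edxArg;     // 2nd argument (empty if none)
    std::vector<std::string> pushOrder;  // remaining arguments, in the order they are pushed
};

// Simplified model of x86 __fastcall for 32-bit int arguments only.
FastcallPlacement PlaceFastcallArgs( const std::vector<std::string>& args )
{
    FastcallPlacement placement;
    if ( args.size() > 0 ) placement.ecxArg = args[ 0 ];
    if ( args.size() > 1 ) placement.edxArg = args[ 1 ];

    // Walk from the last argument back to the third: right-to-left pushes.
    for ( std::size_t i = args.size(); i > 2; --i )
    {
        placement.pushOrder.push_back( args[ i - 1 ] );
    }
    return placement;
}
```

<p>For <strong>SumOf( iValOne, iValTwo, iValThree )</strong> this gives <strong>ecx</strong> = iValOne, <strong>edx</strong> = iValTwo and a single push of iValThree, which is what <em>lines 16 to 20</em> of the disassembly do.</p>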
<p>Stepping into the disassembly of <strong>SumOf()</strong> you should see something like this (N.B. I unchecked &#8220;Show Symbol Names&#8221; before grabbing this text from the disassembly view, so the raw <strong>ebp</strong>-relative addresses are visible instead of symbol names):</p>
<pre>     2: int __fastcall SumOf( int iParamOne, int iParamTwo, int iParamThree )
     3: {
010F1250  push        ebp
010F1251  mov         ebp,esp
010F1253  sub         esp,4Ch
010F1256  push        ebx
010F1257  push        esi
010F1258  push        edi
010F1259  mov         dword ptr [ebp-8],edx
010F125C  mov         dword ptr [ebp-4],ecx
     4:     int iLocal = iParamOne + iParamTwo + iParamThree;
010F125F  mov         eax,dword ptr [ebp-4]
010F1262  add         eax,dword ptr [ebp-8]
010F1265  add         eax,dword ptr [ebp+8]
010F1268  mov         dword ptr [ebp-0Ch],eax
     5:     return iLocal;
010F126B  mov         eax,dword ptr [ebp-0Ch]
     6: }</pre>
<p>The assembly making up the function prologue is doing extra work compared to a <strong>stdcall</strong> function: it takes the values of <strong>ecx</strong> and <strong>edx</strong> and stores them into the function&#8217;s Stack frame (<em>lines 9 &amp; 10</em>).</p>
<p><em>Lines 12 to 14</em> then add the three values passed to it using <strong>eax</strong> &#8211; iParamOne (passed via <strong>ecx</strong>, now in [<strong>ebp-4</strong>]), iParamTwo (passed via <strong>edx</strong>, now in [<strong>ebp-8</strong>]), and iParamThree (passed via the Stack, in [<strong>ebp+8</strong>]).</p>
<p><em>Line 15</em> sets <strong>iLocal</strong> from the sum calculated in <strong>eax</strong>, and then <em>line 17</em> <strong>mov</strong>es the return value of the function into <strong>eax</strong>, where the calling code will expect to find it (as previously established in <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">this post</a>).</p>
<p>That&#8217;s all well and good, but how is <strong>fastcall</strong> faster than the alternative calling conventions?</p>
<p>In <em>theory</em>, passing the arguments via registers should save two operations per parameter:</p>
<ol>
<li>not writing the value into the Stack (i.e. memory access) before the function is called, and</li>
<li>not reading it from the Stack (i.e. memory access) when it is needed inside the function.</li>
</ol>
<p>As a rule of thumb, performing fewer operations, and avoiding those that involve accessing memory, should result in faster code, but this is not always the case. I don&#8217;t want to get into why this is, because it is a subject for many posts on its own, and one better explained by people more qualified than me (e.g. <a href="http://altdevblogaday.com/author/bruce-dawson/">Bruce Dawson</a>, <a href="http://altdevblogaday.com/author/mike-acton/">Mike Acton</a>, <a href="http://altdevblogaday.com/author/tony-albrecht/">Tony Albrecht</a>, <a href="http://altdevblogaday.com/author/jaymin-kessler/">Jaymin Kessler</a>, or <a href="http://altdevblogaday.com/author/john-mccutchan/">John McCutchan</a>).</p>
<p>In all honesty I would be <em>extremely</em> surprised if the unoptimised code we&#8217;ve looked at runs any faster at all when using <strong>fastcall</strong>. As you can see by examining the disassembly above, the first of these potentially saved operations is being undone by storing the contents of <strong>ecx</strong> and <strong>edx</strong> into the Stack frame in the function prologue, and the second is being undone by reading the parameter values back from the Stack in <em>lines 12 &amp; 13</em>.</p>
<p>I assume that, like the other instances of unoptimised compiler-generated assembler performing redundant operations that we have come across, these unnecessary instructions would happily optimise away in a release build; however, the sad fact is that it is pretty hard to examine the disassembly of trivial programs like this one meaningfully in a release build configuration.</p>
<p>Why? Because the optimising compiler is so good that any simple program (like this one) which uses compile time constants for input and produces no output will pretty much compile to &#8220;return 0;&#8221;.</p>
<p>I leave it as an exercise for you, dear reader, to work out the smallest number of changes to this code that will result in disassembly that actually calls SumOf() :)</p>
<p>&nbsp;</p>
<h2>Summary</h2>
<p>So, we have now seen how <strong>thiscall</strong> and <strong>fastcall</strong> differ from the other x86 calling conventions we&#8217;ve looked at, and we have seen yet again that even in simple code there is black magic going on behind the scenes of the language syntax.</p>
<p>Also, I want to point out that &#8211; whilst non-x86 platforms will do things slightly differently &#8211; this information is more generally useful than it may appear. The more different ways you&#8217;ve seen assembler doing similar tasks (like calling functions using the Stack), the more likely you are to be able to make sense of an assembly language you&#8217;ve never seen before (e.g. PowerPC assembler). Sure, the mnemonics may be very different, but you should be able to guess at a lot of them, and the documentation is out there to let you put the rest of the picture together given time.</p>
<p>No doubt we will revisit the Stack from time to time as this (Potentially neverending! Help!) series of articles continues, but I&#8217;ve now covered it in as much detail as I feel is appropriate until we&#8217;ve covered some other aspects of the Low Level view of C/C++ (for example, we will definitely be coming back to the Stack when we examine structs &amp; classes and their memory layout to discuss pass by value).</p>
<p>Next time we&#8217;ll be looking at the disassembly from common C / C++ language constructs like loops and control statements, which are very useful things to be familiar with if you find yourself staring at a bunch of disassembly as a result of a crash in code you don&#8217;t have symbols for.</p>
<p>In case you missed it whilst reading the main body of the post, here&#8217;s that link again concerning the undocumented VS2010 compiler feature that dumps memory layouts of classes to the build output: <a href="http://thetweaker.wordpress.com/2010/11/07/d1reportallclasslayout-dumping-object-memory-layout/">http://thetweaker.wordpress.com/2010/11/07/d1reportallclasslayout-dumping-object-memory-layout/</a></p>
<p>Also, thanks to Fabian and Bruce for their help reviewing this post.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C / C++ Low Level Curriculum Part 4: More Stack</title>
		<link>http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/</link>
		<comments>http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/#comments</comments>
		<pubDate>Sat, 24 Dec 2011 16:41:40 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=21926</guid>
		<description><![CDATA[<p>Welcome to the 4th part of my C/C++ Low Level Curriculum &#8211; more Stack!</p>
<p>The last post was a mammoth and took me ages so this post is going to be significantly shorter, and will consequently cover less ground. Specifically we&#8217;re going to look at how more than one parameter is passed in compiler generated unoptimised x86 assembler using the stdcall calling convention.</p>
<p><a href="http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/" class="more-link">Read more on C / C++ Low Level Curriculum Part 4: More Stack&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Welcome to the 4th part of my C/C++ Low Level Curriculum &#8211; more Stack!</p>
<p>The last post was a mammoth and took me ages so this post is going to be significantly shorter, and will consequently cover less ground. Specifically we&#8217;re going to look at how more than one parameter is passed in compiler generated unoptimised x86 assembler using the stdcall calling convention.</p>
<p>This post assumes that you have read the previous post on <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">the Stack </a>(or know how the Stack works in &#8220;vanilla&#8221; x86 assembler already).</p>
<p>If you missed the previous posts here are backlinks:</p>
<ol>
<li><a href="http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" rel="nofollow">http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</a></li>
<li><a href="http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" rel="nofollow">http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</a></li>
<li><a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</a></li>
</ol>
<p>I&#8217;ve also dropped in a good link to some IBM resources on the PowerPC ABI which explain in detail (and at the assembler level) how the Stack is used on PowerPC based CPUs as it is actually more different from x86 than I remembered. You may find this particularly useful if you work on Current Gen consoles and want to understand how they use the Stack, call functions, and pass parameters.</p>
<p>&nbsp;</p>
<h2>More than one function parameter</h2>
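<p>As before, I&#8217;m using a win32 console application made by the &#8220;new project&#8221; wizard in VS2010 with the default options. The disassembly we&#8217;ll be looking at is from the debug build configuration, which generates vanilla unoptimised x86 code using the compiler&#8217;s default calling convention. (Strictly speaking that default is <strong>cdecl</strong> rather than <strong>stdcall</strong>: both push parameters onto the Stack in the same way, but with <strong>cdecl</strong> the caller removes them afterwards, which is what the <strong>add esp,0Ch</strong> after the <strong>call</strong> in main() is doing; a <strong>stdcall</strong> function would instead clean up with <strong>ret 0Ch</strong>.)</p>
<p>The only change I make is to turn off &#8220;Basic Runtime Checks&#8221; to make the generated assembler more legible (and significantly faster&#8230;); see the <a href="http://altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/">previous post</a> for details on how to do this.</p>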
<p>We&#8217;re going to update the very simple program used for the last article so that the function it calls requires 3 parameters.</p>
<pre>int SumOf( int iParamOne, int iParamTwo, int iParamThree )
{
    int iLocal = iParamOne + iParamTwo + iParamThree;
    return iLocal;
}

int main( int argc, char** argv )
{
    int iValOne   = 1;
    int iValTwo   = 2;
    int iValThree = 4;
    int iResult   = SumOf( iValOne, iValTwo, iValThree );
    return 0;
}</pre>
<p>and here&#8217;s the assembler it generates for main() (as before the addresses of the instructions will almost certainly differ for you):</p>
<pre>     7: int main( int argc, char** argv )
     8: {
00401280  push        ebp
00401281  mov         ebp,esp
00401283  sub         esp,50h
00401286  push        ebx
00401287  push        esi
00401288  push        edi
     9:     int iValOne        = 1;
00401289  mov         dword ptr [ebp-4],1
    10:     int iValTwo        = 2;
00401290  mov         dword ptr [ebp-8],2
    11:     int iValThree    = 4;
00401297  mov         dword ptr [ebp-0Ch],4
    12:     int iResult        = SumOf( iValOne, iValTwo, iValThree );
0040129E  mov         eax,dword ptr [ebp-0Ch]
004012A1  push        eax
004012A2  mov         ecx,dword ptr [ebp-8]
004012A5  push        ecx
004012A6  mov         edx,dword ptr [ebp-4]
004012A9  push        edx
004012AA  call        00401127
004012AF  add         esp,0Ch
004012B2  mov         dword ptr [ebp-10h],eax
    13:     return 0;
004012B5  xor         eax,eax
    14: }
004012B7  pop         edi
004012B8  pop         esi
004012B9  pop         ebx
004012BA  mov         esp,ebp
004012BC  pop         ebp
004012BD  ret</pre>
<p>&nbsp;</p>
<h2>Calling SumOf()</h2>
<p>As we saw in the last article, we can safely ignore the function prologue and epilogue (<em>lines 3-8</em> and <em>lines 28-33</em>; also known as the preamble and postamble respectively), which set up and tear down the function&#8217;s Stack Frame, as they&#8217;re not involved in passing the parameters to SumOf().</p>
<p>A quick look at the disassembly initialising the local variables tells us that iValOne, iValTwo, and iValThree are stored at [<strong>ebp-4</strong>], [<strong>ebp-8</strong>], and [<strong>ebp-0Ch</strong>] respectively.</p>
<p>The disassembly relevant to the function call and the assignment of its return value is this part:</p>
<pre>    12:     int iResult        = SumOf( iValOne, iValTwo, iValThree );
0040129E  mov         eax,dword ptr [ebp-0Ch]
004012A1  push        eax
004012A2  mov         ecx,dword ptr [ebp-8]
004012A5  push        ecx
004012A6  mov         edx,dword ptr [ebp-4]
004012A9  push        edx
004012AA  call        00401127
004012AF  add         esp,0Ch
004012B2  mov         dword ptr [ebp-10h],eax</pre>
<p>As in the case with a single argument, copies of the function parameters&#8217; values are <strong>push</strong>ed onto the Stack &#8211; but note that they are pushed on <em>in the opposite order</em> to the order in which the function&#8217;s parameter list expects them in the C++ code.</p>
<p>The final thing to note is that following the <strong>call</strong> instruction on <em>line 22</em> (i.e. immediately before the assembler for SumOf() is executed) the copy of iValOne that was pushed onto the Stack in <em>line 21</em> is at [<strong>esp+4</strong>], because <strong>call</strong> <strong>push</strong>es the return address onto the Stack.</p>
<p>Just in case, here&#8217;s what the stack looks like immediately after <em>line 22</em> is executed, but before any code in SumOf() is executed:</p>
<div id="attachment_21993" class="wp-caption alignnone" style="width: 306px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/MoreStack_Shapshot0.png"><img class="size-full wp-image-21993" src="http://altdevblogaday.com/wp-content/uploads/2011/12/MoreStack_Shapshot0.png" alt="" width="296" height="460" /></a><p class="wp-caption-text">Stack after line 22, before any of SumOf() is executed</p></div>
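<p>If it helps to see the arithmetic, here is a toy model of the pushes above. It is a sketch under simplifying assumptions: every slot is 4 bytes, <strong>esp</strong> starts at an arbitrary made-up address, and memory is just a map from address to value. The struct <strong>ToyStack</strong> and the function <strong>FirstArgAfterCall()</strong> are hypothetical names for illustration.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the x86 Stack: 4-byte slots, and 'esp' moves towards lower
// addresses on every push. Only the offsets between addresses matter.
struct ToyStack
{
    std::uint32_t esp = 0x0018FF00;                 // arbitrary starting value
    std::map<std::uint32_t, std::uint32_t> memory;  // address -> 4-byte value

    void Push( std::uint32_t uValue )
    {
        esp -= 4;                // make room first...
        memory[ esp ] = uValue;  // ...then store at the new top of the Stack
    }

    std::uint32_t Read( std::uint32_t uAddress ) const
    {
        auto it = memory.find( uAddress );
        assert( it != memory.end() );
        return it->second;
    }
};

// Mimics the call sequence in main(): push the arguments last-to-first,
// then let 'call' push the return address. Returns the value at [esp+4].
std::uint32_t FirstArgAfterCall( std::uint32_t uOne, std::uint32_t uTwo, std::uint32_t uThree )
{
    ToyStack stack;
    stack.Push( uThree );      // push eax (iValThree)
    stack.Push( uTwo );        // push ecx (iValTwo)
    stack.Push( uOne );        // push edx (iValOne)
    stack.Push( 0x004012AF );  // call pushes the return address (the next instruction)
    return stack.Read( stack.esp + 4 );
}
```

<p>[<strong>esp</strong>] holds the return address and [<strong>esp+4</strong>] holds iValOne, the last value pushed before the <strong>call</strong>.</p>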
<p>&nbsp;</p>
<h2>Accessing the parameters</h2>
<p>Here&#8217;s the disassembly for SumOf():</p>
<pre>     1: int SumOf( int iParamOne, int iParamTwo, int iParamThree )
     2: {
00401250  push        ebp
00401251  mov         ebp,esp
00401253  sub         esp,44h
00401256  push        ebx
00401257  push        esi
00401258  push        edi
     3:     int iLocal = iParamOne + iParamTwo + iParamThree;
00401259  mov         eax,dword ptr [ebp+8]
0040125C  add         eax,dword ptr [ebp+0Ch]
0040125F  add         eax,dword ptr [ebp+10h]
00401262  mov         dword ptr [ebp-4],eax
     4:     return iLocal;
00401265  mov         eax,dword ptr [ebp-4]
     5: }
00401268  pop         edi
00401269  pop         esi
0040126A  pop         ebx
0040126B  mov         esp,ebp
0040126D  pop         ebp
0040126E  ret</pre>
<p>We can see that the function prologue code <strong>push</strong>es <strong>ebp</strong> which moves <strong>esp</strong> on another 4 bytes, then <strong>mov</strong>es <strong>esp</strong> to <strong>ebp</strong> &#8211; so after <em>line 4</em> the copy of iValOne&#8217;s value is now at [<strong>ebp+8</strong>].</p>
<p>Here&#8217;s another Stack snapshot showing the state after the function prologue (i.e. after<em> line 8</em>):</p>
<div id="attachment_21997" class="wp-caption alignnone" style="width: 301px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/MoreStack_Shapshot11.png"><img class="size-full wp-image-21997" src="http://altdevblogaday.com/wp-content/uploads/2011/12/MoreStack_Shapshot11.png" alt="apshot1" width="291" height="538" /></a><p class="wp-caption-text">state of the Stack after the prologue of SumOf()</p></div>
<p>Looking at <em>lines 10-12</em> we can see that the assembler is accessing the function parameters as follows:</p>
<ul>
<li>iParamOne (iValOne) from [<strong>ebp+8</strong>]</li>
<li>iParamTwo (iValTwo) from [<strong>ebp+0Ch</strong>]</li>
<li>iParamThree (iValThree) from [<strong>ebp+10h</strong>]</li>
</ul>
<p>Which, unsurprisingly, is exactly where the values main() <strong>push</strong>ed onto the Stack before calling this function ended up after the function prologue.</p>
<p>Now we can see why main() <strong>push</strong>es the function parameters onto the Stack in reverse order: the called function expects them to be stored in the Stack in parameter list order, starting at [<strong>ebp+8</strong>] and with the offset from <strong>ebp</strong> increasing by 4 bytes for each subsequent parameter.</p>
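<p>The offsets follow directly from what sits between <strong>ebp</strong> and the arguments: the 4-byte saved <strong>ebp</strong> pushed by the prologue, and the 4-byte return address pushed by <strong>call</strong>. Here is a small sketch, assuming every parameter occupies exactly one 4-byte slot (true for the <strong>int</strong> parameters here); <strong>ParamOffsetFromEbp()</strong> is a hypothetical helper, and the <strong>static_assert</strong>s check it against the offsets in the disassembly.</p>

```cpp
// Offset from ebp of parameter number iIndex (0-based, in parameter-list
// order) once the prologue has run, assuming one 4-byte slot per parameter.
constexpr int ParamOffsetFromEbp( int iIndex )
{
    return 4 /* saved ebp */ + 4 /* return address */ + 4 * iIndex;
}

static_assert( ParamOffsetFromEbp( 0 ) == 0x08, "iParamOne at [ebp+8]"     );
static_assert( ParamOffsetFromEbp( 1 ) == 0x0C, "iParamTwo at [ebp+0Ch]"   );
static_assert( ParamOffsetFromEbp( 2 ) == 0x10, "iParamThree at [ebp+10h]" );
```

<p>Pushing the last parameter first is what makes this simple formula work: the first parameter ends up at the lowest address, closest to <strong>ebp</strong>.</p>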
<p>As before the return value (iLocal, stored at [<strong>ebp-4</strong>]) is <strong>mov</strong>ed into <strong>eax</strong> before the function&#8217;s epilogue code in order to return it to main(), and since we know how the epilogue and return work from the last article we&#8217;re done with vanilla stdcall with multiple parameters. Joy.</p>
<p>&nbsp;</p>
<h2>Summary</h2>
<p>We&#8217;ve looked in some detail at how the Stack is used to call functions in vanilla unoptimised compiler-generated stdcall x86 assembler. This should leave you in a pretty good place to go mooching about in disassembly windows with a fair idea of which parts of the disassembly for each function are most likely to be relevant.</p>
<p>For extra information, and to show you how different the Stack use can be (whilst still being basically the same in principle), here&#8217;s a link to the 4th in a series of articles on the IBM Technical Library site dealing with PowerPC assembler, and in particular with the 64 bit PowerPC ABI:</p>
<p><a href="http://www.ibm.com/developerworks/linux/library/l-powasm4/index.html">http://www.ibm.com/developerworks/linux/library/l-powasm4/index.html</a></p>
<p>In all likelihood you&#8217;ll need to read the first three articles to make sense of the 4th, but the 4th one is where most of the juicy info is :)</p>
<p>&nbsp;</p>
<h2>Next Time</h2>
<p>Next time, I&#8217;m going to look at the x86 thiscall calling convention used to call C++ member functions (where the &#8216;this&#8217; pointer is passed in <strong>ecx</strong>), and we&#8217;ll also have an overview of how the exciting-sounding &#8216;fastcall&#8217; x86 calling convention uses the Stack.</p>
<p>Oh, and Merry Christmas!</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C / C++ Low Level Curriculum Part 3: The Stack</title>
		<link>http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/</link>
		<comments>http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 23:10:21 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[low level]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=21166</guid>
		<description><![CDATA[<p>Welcome to the 3rd part of the series I&#8217;m doing on a Low Level Curriculum for C/C++.</p>
<p>This one is about the Stack, which is arguably the most important component of the underlying &#8220;engine&#8221; of C/C++. If you only ever bother to learn about one aspect of the low level behaviour of C/C++, then my advice is to make that one thing be the Stack.</p>
<p><a href="http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/" class="more-link">Read more on C / C++ Low Level Curriculum Part 3: The Stack&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Welcome to the 3rd part of the series I&#8217;m doing on a Low Level Curriculum for C/C++.</p>
<p>This one is about the Stack, which is arguably the most important component of the underlying &#8220;engine&#8221; of C/C++. If you only ever bother to learn about one aspect of the low level behaviour of C/C++, then my advice is to make that one thing be the Stack.</p>
<p>You probably won&#8217;t need to have read the first two parts of this series to follow this article, but I will assume that you have &#8211; and I definitely assume that you do not fear the disassembly window.</p>
<p>In case you want to go back and read them here are links to the first two articles:</p>
<ol>
<li>http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</li>
<li>http://altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</li>
</ol>
<p>&nbsp;</p>
<h2>Prologue</h2>
<p>If you are a C++ programmer and you&#8217;re not 100% sure what the Stack is or how it works, then you are not alone.</p>
<div class="wp-caption alignleft" style="width: 170px"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/ADBAD_TheBjible.png" alt="" width="160" height="202" /><p class="wp-caption-text">The Bjible</p></div>
<p>Bjarne Stroustrup&#8217;s book &#8220;The C++ Programming Language (3rd edition)&#8221; &#8211; which is pretty much the standard text on C++ (at least until the update for the C++11 standard is published&#8230;) &#8211; does not discuss what the stack is or how it works, although it does refer to data or objects being &#8220;on the stack&#8221; as if the reader knows what this means.</p>
<p>The closest you get to concrete information about the Stack in the Bjible is the following paragraph in a section in Appendix C entitled &#8220;C.9 Memory Management&#8221;&#8230;</p>
<p style="padding-left: 30px"><em>&#8220;<strong>Automatic memory</strong>, in which function arguments and local variables are allocated. Each entry</em><em> into a function or a block gets its own copy. This kind of memory is automatically created</em><em> and destroyed; hence the name automatic memory. Automatic memory is also said <strong>‘‘to be</strong></em><em><strong> on the stack.’’</strong> If you absolutely must be explicit about this, C++ provides the redundant</em><em> keyword auto .&#8221;</em></p>
<p>Don&#8217;t get me wrong, this is still a very very good book (and one I refer to fairly often), but the fact that the standard text on C++ all but ignores something as core to the internal operation of C++ as the Stack is telling indeed. In my experience, this is symptomatic of the disconnect between programming language and underlying implementation that exists in the academic mindset.</p>
<p>On my Computer Science degree the concept of the Stack was covered in a couple of slides during a compulsory 1st year module called &#8220;Computer Systems and Architecture&#8221;, but never specifically with relation to the programming languages we were learning &#8211; and this, dear reader, is why I feel compelled to write about it&#8230;</p>
<p>&nbsp;</p>
<h2>What is the Stack?</h2>
<p>Unsurprisingly, the Stack is a <a href="http://en.wikipedia.org/wiki/Stack_%28abstract_data_type%29">stack data structure</a>. For the sake of clarity I&#8217;m going to capitalise the Stack to discriminate it from just any old instance of a stack data structure.</p>
<p>In a single threaded program the Stack contains the vast majority of the data relating to the current execution state of the program and all non-global &#8220;automatic&#8221; memory under the control of the compiler &#8211; i.e. local variables, function parameters etc.</p>
<p>When you put a breakpoint in your code in your IDE of choice and use the Call Stack window to discover the stack of function calls that got you to your breakpoint, the data used to populate that window was almost certainly derived by the debugger from examining the state of the Stack.</p>
<p>The detailed specifics of the way the Stack functions vary from CPU to CPU, machine to machine, compiler to compiler, and even with the same compiler and different compiler options (more about that in the next post!).</p>
<p>Broadly speaking, each time a new function is called:</p>
<ul>
<li>the current CPU execution state (i.e. the instantaneous values of CPU registers) is backed up into the Stack so it can be reinstated later,</li>
<li>some or all of the parameters expected by the function being called are put into the Stack (many implementations use registers for parameter passing when possible), and</li>
<li>the CPU jumps to a new location in memory to execute the code of the called function.</li>
</ul>
<p>In the generalised case then, the Stack includes the following information:</p>
<ul>
<li>all of the local variables and function parameters of the functions below the top of the call stack,</li>
<li>copies of the contents of all the CPU registers that were in use in each function below the top of the call stack when the call to the function above it was made,</li>
<li>the memory address of the next instruction to execute in each of the functions below the top of the call stack when the function above them returns (the &#8220;return address&#8221;).</li>
</ul>
<p>The area of Stack that contains the local variables belonging to a function (and any other data it might happen to put there) is said to be that function&#8217;s Stack Frame. This is an important term, don&#8217;t forget it!</p>
<p>Clearly the operation of the Stack is massively important to the execution of your code, and hopefully it is now obvious why something as easily done as writing outside of the bounds of an array declared as a local inside a function can cause such an epic fail &#8211; the out of bounds write is very likely to overwrite a function&#8217;s return address or some other value crucial to the correct running of the program once the current function returns.</p>
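<p>To make that failure concrete without actually invoking undefined behaviour, here is a minimal C sketch that models one Stack Frame as an array of 32-bit words. The layout (a local array buf[4] directly below the saved <strong>ebp</strong>, which sits directly below the return address) is an assumption modelled on the x86 frames we&#8217;ll look at later in this article, and the names fill_local_buf, BUF0, SAVED_EBP and RET_ADDR are made up for illustration:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Word indices into a simulated Stack Frame (hypothetical layout). A higher
   index means a higher address, as on x86 where the Stack grows downwards. */
enum {
    BUF0 = 4,      /* buf[0] at [ebp-16]; buf[3] is at [ebp-4]  */
    SAVED_EBP = 8, /* [ebp]   : the caller's saved ebp           */
    RET_ADDR = 9   /* [ebp+4] : return address pushed by call    */
};

/* Stores 'count' copies of 'value' into the simulated local array buf[4]
   with no bounds check, like a buggy loop over a real local array.
   Indexing one larger array keeps this simulation well-defined C; a real
   out-of-bounds write would be undefined behaviour.
   The caller must keep BUF0 + count <= 10 for the 10-word 'stack' used here. */
static void fill_local_buf(uint32_t stack[], int count, uint32_t value)
{
    for (int i = 0; i < count; ++i) {
        stack[BUF0 + i] = value;
    }
}
```

<p>Filling four elements stays inside buf, but an off-by-two loop that fills six walks straight over the saved <strong>ebp</strong> and then the return address, and nothing goes visibly wrong until the function tries to return.</p>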
<p>&nbsp;</p>
<h2>How does the Stack work in practice?</h2>
<p>To help answer this question, let&#8217;s consider a (very simple) C/C++ program:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> AddOneTo<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> iParameter <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iLocal <span style="color: #339933;">=</span> iParameter <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> iLocal<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> argc<span style="color: #339933;">,</span> <span style="color: #993333;">char</span><span style="color: #339933;">**</span> argv <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> iResult <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    iResult <span style="color: #339933;">=</span> AddOneTo<span style="color: #009900;">&#40;</span> iResult <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>I will be stepping through this program as a win32 console application built using VS2010 in a debug configuration with more or less the default compiler and linker settings, and the screenshots in this article will reflect this.</p>
<p>I would definitely advise stepping through this yourself by hand after reading this article, because it&#8217;s always much more instructive to muddle through something like this yourself than it is to just read about it&#8230;</p>
<p>As I mentioned earlier, the detailed specifics of the operation of the Stack &#8211; especially with regards to passing parameters to functions &#8211; will depend on the compiler options you use. The differences are mostly down to a set of code generation standards called &#8220;calling conventions&#8221;; each convention has its own rules about how parameters are passed to functions and how values are returned. Some tend to be faster, some tend to be slower, but most are designed to meet specific requirements for passing data &#8211; such as C++ member function calls, or variable numbers of parameters (e.g. printf()).</p>
<p>The default calling convention used by VS2010 with C-style functions (i.e. no &#8216;this&#8217; pointer) is known as <strong>cdecl</strong>, and since we&#8217;re looking at a debug build the disassembly will be using entirely un-optimised cdecl. This calling convention passes all parameters on the Stack, and values are returned from functions via the <strong>eax</strong> register. Under cdecl it is the calling function that removes the parameters from the Stack once the call returns (the related <strong>stdcall</strong> convention, used by most of the win32 API, instead has the called function remove them).</p>
<p>If you are running this code on non-Wintel hardware, then the operation and organisation of the Stack in the code generated by your compiler, and the way parameters are passed, will almost certainly be subtly &#8211; or even markedly &#8211; different from what I&#8217;m showing you in my debugger here, but the fundamental mechanisms by which it works should be basically the same.</p>
<p>&nbsp;</p>
<h3>Setting up</h3>
<p>To start with, make an empty win32 console project, create a new .cpp file in it (I&#8217;d call it main.cpp if I were you), and then paste the above code into it.</p>
<p>Next, open up the project settings and make sure you have the &#8220;Basic Runtime Checks&#8221; option set to &#8220;Default&#8221;. This not only makes the debug code (a lot) faster, but also substantially simplifies the assembler it generates &#8211; especially in the case of our very simple program. The image below shows what the options dialog should look like after you make the change.</p>
<div id="attachment_21554" class="wp-caption alignnone" style="width: 848px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_ProjectSettings.png"><img class="size-full wp-image-21554" src="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_ProjectSettings.png" alt="" width="838" height="360" /></a><p class="wp-caption-text">important areas highlighted in red for good measure!</p></div>
<p>Put a breakpoint on line 7 (yes, I know this line is the definition of main()) and then compile and run a debug build. When the breakpoint is hit, right-click in the source window and choose &#8220;Go To Disassembly&#8221;.</p>
<p>You should now be seeing something like this (n.b. the addresses of the instructions down the left edge of the window will almost certainly be different in your disassembly window):</p>
<div id="attachment_21555" class="wp-caption alignnone" style="width: 679px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/SampleMainDisassembly4.png"><img class="size-full wp-image-21555" src="http://altdevblogaday.com/wp-content/uploads/2011/12/SampleMainDisassembly4.png" alt="disassembly of main()" width="669" height="478" /></a><p class="wp-caption-text">n.b. make sure you have the same display options checked in the right-click menu</p></div>
<h1><strong><span style="color: #ff00ff">DON&#8217;T PANIC</span></strong></h1>
<p>Clearly this is significantly more daunting than the disassembly we have looked at before so, before going any further, let&#8217;s cover a little background on the way that the Stack is managed in compiler generated 32bit x86 assembler (at least by the VS2010 C++ compiler with the default compiler and linker settings).</p>
<p>&nbsp;</p>
<h3>Before we begin&#8230;</h3>
<p>The first piece of information that will start to make sense of this is that two key CPU registers are usually involved in the management of Stack Frames in 32bit x86 assembler:</p>
<ul>
<li><strong>esp</strong> &#8211; or the <em>Stack Pointer</em> register which always points to the &#8220;top&#8221; of the Stack, and</li>
<li><strong>ebp</strong> &#8211; or the <em>Base Pointer</em> register which points to the start (or base) of the current Stack Frame.</li>
</ul>
<p>Local variables are typically accessed via fixed offsets from the <strong>ebp</strong> register; in this case iResult is stored at the address [<strong>ebp-4</strong>].</p>
<p>If you want to see the local variable names rather than offsets from <strong>ebp</strong> you can right-click and check the &#8220;Show Symbol Names&#8221; option, but it is worth understanding why these offsets from <strong>ebp</strong> are negative.</p>
<p>Why are the offsets from <strong>ebp</strong> of local variables negative?  This is because the x86 Stack grows downwards in memory &#8211; i.e. the &#8220;top&#8221; of the stack is stored in a lower memory address than the &#8220;bottom&#8221;.  Consequently, the address stored in <strong>ebp</strong> is higher than the address in <strong>esp</strong>, and so local variables within the stack frame have negative offsets from <strong>ebp</strong> (and would have a positive offset from <strong>esp</strong>).</p>
<p>I&#8217;m pretty sure that every machine I&#8217;ve worked with has had a stack that grows down rather than up in address space, but as far as I know there&#8217;s no Universal Law of Computers that says that the Stack must do this &#8211; I&#8217;m sure there must be machines that have stacks which grow the opposite way.</p>
<p>Whilst this sounds counter-intuitive, having a stack that grows downward in memory address makes sense when considered in terms of the traditional overall memory layout of C / C++ programs, which we&#8217;ll cover in a later article in the series dealing with memory.</p>
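<p>The negative-offset arithmetic is easier to see with numbers. This C sketch plugs a made-up value for <strong>ebp</strong> into the frame size (44h) and the three saved registers from main()&#8217;s preamble, which we&#8217;ll look at shortly; the helper names local_address and offset_from_esp are hypothetical:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from main()'s preamble in this article's VS2010 debug build. */
enum {
    FRAME_BYTES = 0x44,   /* reserved by "sub esp,44h"                         */
    SAVED_REG_BYTES = 12  /* ebx, esi and edi pushed after that (3 x 4 bytes) */
};

/* Address of a local from its offset relative to ebp (negative for locals,
   because the Stack grows downwards). Converting the negative offset to
   uint32_t and adding relies on well-defined unsigned wrap-around. */
static uint32_t local_address(uint32_t ebp, int32_t offset_from_ebp)
{
    return ebp + (uint32_t)offset_from_ebp;
}

/* The same address expressed relative to esp: positive, because esp < ebp. */
static uint32_t offset_from_esp(uint32_t esp, uint32_t address)
{
    return address - esp;
}
```

<p>With <strong>ebp</strong> at a hypothetical 0018FF40h, <strong>esp</strong> ends up 50h bytes lower at 0018FEF0h, and iResult at [<strong>ebp-4</strong>] is 0018FF3Ch: 4 bytes below <strong>ebp</strong> but 4Ch (76) bytes above <strong>esp</strong>.</p>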
<h3>Push and Pop</h3>
<p>As I&#8217;m sure you all already know, the two key operations of the abstract data structure known as a stack are to <strong>push</strong> something onto the top of it (covering the previous top), or to <strong>pop</strong> whatever is on the top off it (exposing the previous top).</p>
<p>Unsurprisingly, the x86 instruction set has a <strong>push</strong> and a <strong>pop</strong> instruction, each of which takes a single operand (always a register in the code we&#8217;ll be looking at):</p>
<ul>
<li><strong>push</strong> decrements <strong>esp</strong> by the size of its operand and then stores that operand at the address pointed to by <strong>esp</strong> (i.e. on the top of the Stack).</li>
<li>This means that after a <strong>push</strong> instruction, the value at the address in <strong>esp</strong> is whatever was <strong>push</strong>ed onto the Stack.</li>
<li><strong>pop</strong> copies the value from the address contained in <strong>esp</strong> into its operand and then increments <strong>esp</strong> by the size of its operand, so that the value is effectively removed from the Stack.</li>
</ul>
<p>These behaviours are key to the way in which the Stack operates.</p>
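<p>The behaviour of <strong>push</strong> and <strong>pop</strong> described above can be modelled in a few lines of C. This is a sketch of the semantics, not of how a CPU implements them: the simulated Stack is an array of 32-bit words indexed by byte address, there is no bounds checking, and all of the names are hypothetical:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Size of the simulated Stack in bytes (an arbitrary choice for this sketch). */
#define STACK_BYTES 256u

/* A toy Stack: 32-bit words addressed by byte address, plus a simulated esp. */
typedef struct {
    uint32_t memory[STACK_BYTES / 4];
    uint32_t esp;
} ToyStack;

/* An empty Stack: esp starts just past the highest word and moves down. */
static void toy_init(ToyStack *s)
{
    s->esp = STACK_BYTES;
}

/* push: decrement esp by the operand size (4 bytes), then store at [esp]. */
static void toy_push(ToyStack *s, uint32_t value)
{
    s->esp -= 4u;
    s->memory[s->esp / 4u] = value;
}

/* pop: read the value at [esp], then increment esp by 4 bytes.
   The word is left in memory; it is simply no longer part of the Stack. */
static uint32_t toy_pop(ToyStack *s)
{
    uint32_t value = s->memory[s->esp / 4u];
    s->esp += 4u;
    return value;
}
```

<p>Note that toy_pop() does not erase anything: the popped word is still in memory, it is just below <strong>esp</strong> and therefore no longer considered part of the Stack.</p>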
<p>&nbsp;</p>
<h3>How the Stack looks before our code executes</h3>
<p>As I&#8217;m sure most of you know there is code that runs before main(). This code is responsible for all sorts of system initialisation, and when it&#8217;s finished it calls main() passing the command line arguments as parameters.</p>
<p>Let&#8217;s look at the layout of the Stack at the point just before the first instruction in main() is executed &#8211; in the diagram below I have called the function that calls main() &#8220;pre-main()&#8221; &#8211; you will probably find that the name of the actual function in your program&#8217;s call stack is a scary looking combination of underscores and capitalised acronyms.</p>
<p>This diagram will make more sense when you come back to look at it after reading the rest of the article.</p>
<div id="attachment_21988" class="wp-caption alignnone" style="width: 297px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_01.png"><img class="size-full wp-image-21988" src="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_01.png" alt="TheStack_Snapshot_0" width="287" height="242" /></a><p class="wp-caption-text">the state of the Stack before our code executes</p></div>
<p>&nbsp;</p>
<h3>The Function Preamble (or Function Prologue)</h3>
<p>Before the disassembly even gets as far as assigning 0 to iResult we have a fair amount of assembler for what looks like nothing at the C/C++ level; so let&#8217;s make sense of it.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">7</span><span style="color: #339933;">:</span> <span style="color: #00007f; font-weight: bold;">int</span> main<span style="color: #009900; font-weight: bold;">&#40;</span> <span style="color: #00007f; font-weight: bold;">int</span> argc<span style="color: #339933;">,</span> char<span style="color: #339933;">**</span> argv <span style="color: #009900; font-weight: bold;">&#41;</span>
     <span style="color: #ff0000;">8</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#123;</span>
<span style="color: #adadad; font-style: italic;">01311280</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">ebp</span>
<span style="color: #adadad; font-style: italic;">01311281</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">esp</span>
<span style="color: #adadad; font-style: italic;">01311283</span>  <span style="color: #00007f; font-weight: bold;">sub</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #ff0000;">44h</span>
<span style="color: #adadad; font-style: italic;">01311286</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">ebx</span>
<span style="color: #adadad; font-style: italic;">01311287</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">esi</span>
<span style="color: #adadad; font-style: italic;">01311288</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">edi</span></pre></td></tr></table></div>

<p>A block of assembler very similar to this will be at the start of every compiler generated function, and is usually referred to as the function&#8217;s &#8220;preamble&#8221; or &#8220;prologue&#8221;.</p>
<p>The purpose of this code is to create the stack frame for the function, and to store the content of any registers the function is going to use so that those values can be reinstated before the function returns:</p>
<p><em>[n.b. line numbers used in points below refer to the numbers in the code box above]</em></p>
<ul>
<li><em>line 3:</em> is storing the current value of <strong>ebp</strong> on the Stack using the instruction <strong>push</strong>.</li>
<li><em>line 4:</em> is <strong>mov</strong>ing the current value of <strong>esp</strong> to <strong>ebp</strong>. <strong>ebp</strong> now points straight to the old value of <strong>ebp</strong> that was just <strong>push</strong>ed onto the stack.</li>
<li><em>line 5:</em> is <strong>sub</strong>tracting 44h (68 in decimal) in place from the value in <strong>esp</strong>. This has the effect of allocating 68 bytes on the Stack, immediately below the address held in <strong>ebp</strong>, for the function&#8217;s Stack Frame.</li>
<li><em>lines 6, 7, and 8:</em> are storing the values contained in the <strong>ebx</strong>, <strong>esi</strong>, and <strong>edi</strong> registers respectively by <strong>push</strong>ing them onto the stack. This is because the assembler in main() makes use of these registers and needs to be able to restore them to their old states before it returns. Note that each <strong>push</strong> instruction will decrease the value of <strong>esp</strong> by 4 bytes.</li>
</ul>
<p><em>N.B. If you&#8217;re following this in the debugger, then I would advise that you open a &#8220;Registers&#8221; window in your debugger to watch the values of the registers change as you single step through the disassembly. You would probably also do well to have memory windows open so you can point them at <strong>esp</strong> and <strong>ebp</strong> to watch the values change in memory (to get a memory window in VS2010 to track a register you will need to click the &#8220;Reevaluate Automatically&#8221; button to the right of the &#8220;Address:&#8221; edit box and then type the register&#8217;s name into the edit box).</em></p>
<p>At this point, the Stack looks like this:</p>
<div id="attachment_21989" class="wp-caption alignnone" style="width: 298px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_11.png"><img class="size-full wp-image-21989" src="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_11.png" alt="TheStack_Snapshot_1" width="288" height="321" /></a><p class="wp-caption-text">state of the Stack after preamble of main()</p></div>
<p>A couple of things to note about these Stack snapshot diagrams:</p>
<ul>
<li>The value T in the top left of these snapshots is used to identify old values of ebp stored in the Stack.</li>
<li>Different colours are used to show which function is responsible for putting the data onto the Stack (and therefore for taking it off&#8230;).</li>
<li>Different shades of the same colour represent different logical types of data put onto the stack by each function (i.e. base pointer, stack frame (locals), values of saved registers, and parameters /return addresses).</li>
</ul>
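<p>A toy model of the Stack in C lets us replay the six instructions of main()&#8217;s preamble one at a time and check where <strong>esp</strong> and <strong>ebp</strong> end up. The Cpu struct, the 256-byte Stack, and the function names are assumptions made for this sketch; each line is commented with the instruction it mirrors:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state: 64 words (256 bytes) of Stack memory indexed by byte
   address, plus the registers that main()'s preamble touches. */
typedef struct {
    uint32_t mem[64];
    uint32_t esp, ebp, ebx, esi, edi;
} Cpu;

/* push: esp -= 4, then store the value at [esp]. No bounds checking. */
static void push(Cpu *c, uint32_t v)
{
    c->esp -= 4u;
    c->mem[c->esp / 4u] = v;
}

/* Replays main()'s preamble; frame_bytes is 0x44 in this article's build. */
static void preamble(Cpu *c, uint32_t frame_bytes)
{
    push(c, c->ebp);       /* push ebp    : save the caller's base pointer    */
    c->ebp = c->esp;       /* mov ebp,esp : ebp now points at that saved copy */
    c->esp -= frame_bytes; /* sub esp,44h : reserve the Stack Frame           */
    push(c, c->ebx);       /* push ebx    : save registers main() will use    */
    push(c, c->esi);       /* push esi                                        */
    push(c, c->edi);       /* push edi                                        */
}
```

<p>After the preamble, <strong>ebp</strong> points at the saved copy of the caller&#8217;s <strong>ebp</strong>, and <strong>esp</strong> sits 44h + 12 bytes below it, pointing at the saved value of <strong>edi</strong>.</p>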
<p>&nbsp;</p>
<h3>The function Postamble (or Epilogue)</h3>
<p>Once the body of the function has finished executing, the Stack and registers need to be put back in the same state they were in before the function preamble so that the calling function can happily carry on executing where it left off.</p>
<p>This is the job of the function postamble, or epilogue. The postamble simply does the logical opposite of the preamble &#8211; it <strong>pop</strong>s whatever the preamble <strong>push</strong>ed, and reinstates the values that <strong>esp</strong> and <strong>ebp</strong> had before the preamble code executed.</p>
<p>In the code box below I&#8217;ve deleted the body of the function so that the preamble and postamble code are adjacent. Looking at it like this, it&#8217;s very clear that the postamble is doing the opposite of the preamble.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">7</span><span style="color: #339933;">:</span> <span style="color: #00007f; font-weight: bold;">int</span> main<span style="color: #009900; font-weight: bold;">&#40;</span> <span style="color: #00007f; font-weight: bold;">int</span> argc<span style="color: #339933;">,</span> char<span style="color: #339933;">**</span> argv <span style="color: #009900; font-weight: bold;">&#41;</span>
     <span style="color: #ff0000;">8</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#123;</span>
<span style="color: #adadad; font-style: italic;">01311280</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">ebp</span>
<span style="color: #adadad; font-style: italic;">01311281</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">esp</span>
<span style="color: #adadad; font-style: italic;">01311283</span>  <span style="color: #00007f; font-weight: bold;">sub</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #ff0000;">44h</span>
<span style="color: #adadad; font-style: italic;">01311286</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">ebx</span>
<span style="color: #adadad; font-style: italic;">01311287</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">esi</span>
<span style="color: #adadad; font-style: italic;">01311288</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">edi</span>
<span style="color: #339933;">...</span> <span style="color: #0000ff; font-weight: bold;">code</span> representing the body of main<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span> removed <span style="color: #339933;">...</span>
    <span style="color: #ff0000;">12</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#125;</span>
<span style="color: #adadad; font-style: italic;">013112A1</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">edi</span>
<span style="color: #adadad; font-style: italic;">013112A2</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">esi</span>
<span style="color: #adadad; font-style: italic;">013112A3</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">ebx</span>
<span style="color: #adadad; font-style: italic;">013112A4</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ebp</span>
<span style="color: #adadad; font-style: italic;">013112A6</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">ebp</span>
<span style="color: #adadad; font-style: italic;">013112A7</span>  <span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

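<p>The only line in the preamble that doesn&#8217;t have a direct opposite in the postamble is <em>line 5</em> (<strong>sub esp, 44h</strong>). <em>Line 14</em> (<strong>mov esp, ebp</strong>) is the mirror of <em>line 4</em>, and because <strong>ebp</strong> still holds the value that <strong>esp</strong> had immediately after <em>line 3</em>, copying it back into <strong>esp</strong> also undoes the subtraction in <em>line 5</em>, discarding the whole Stack Frame in one go.</p>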
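<p>Because the postamble mirrors the preamble, running one after the other should leave every register they touch exactly as it was. The C sketch below checks that on a toy model; the Cpu struct and function names are hypothetical, and the return address that <strong>ret</strong> pops is deliberately not modelled:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state: Stack memory as 32-bit words indexed by byte address. */
typedef struct {
    uint32_t mem[64];
    uint32_t esp, ebp, ebx, esi, edi;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->esp -= 4u; c->mem[c->esp / 4u] = v; }
static uint32_t pop(Cpu *c) { uint32_t v = c->mem[c->esp / 4u]; c->esp += 4u; return v; }

/* Lines 3 to 8 of the listing above. */
static void preamble(Cpu *c, uint32_t frame_bytes)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= frame_bytes;
    push(c, c->ebx);
    push(c, c->esi);
    push(c, c->edi);
}

/* Lines 11 to 15 of the listing above (ret on line 16 is not modelled). */
static void postamble(Cpu *c)
{
    c->edi = pop(c);  /* pop edi                                        */
    c->esi = pop(c);  /* pop esi                                        */
    c->ebx = pop(c);  /* pop ebx                                        */
    c->esp = c->ebp;  /* mov esp,ebp : discards the frame (undoes sub)  */
    c->ebp = pop(c);  /* pop ebp     : restores the caller's ebp        */
}
```

<p>Clobbering <strong>ebx</strong>, <strong>esi</strong> and <strong>edi</strong> between the two calls stands in for the body of main(); after the postamble all five registers are back to their starting values.</p>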
<p>Since all functions have a preamble and postamble, covering these two sections of the disassembly up front means we can essentially ignore the preamble and postamble from now on, and focus on the code that is different in each function.</p>
<p>&nbsp;</p>
<h3>And now: actual code we wrote!</h3>
<p>Now we&#8217;ve covered the preamble and postamble we can concentrate on the body of main(), and how it calls the function AddOneTo().</p>
<p>This is the first bit of disassembly that directly correlates with the code that we can see at the C/C++ level.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">9</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iResult = <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311289</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>  
    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     iResult = AddOneTo<span style="color: #009900; font-weight: bold;">&#40;</span> iResult <span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311290</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">01311293</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">01311294</span>  <span style="color: #00007f; font-weight: bold;">call</span>        0131101E  
<span style="color: #adadad; font-style: italic;">01311299</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #ff0000;">4</span>  
<span style="color: #adadad; font-style: italic;">0131129C</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span></pre></td></tr></table></div>

<p>So, as we should all be familiar with from <a href="http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/">getting over our fear of disassembly</a>, <em>line 2</em> in the above code is setting iResult, which we can see is within our current stack frame at address [<strong>ebp-4</strong>] in the Stack.</p>
<p>The rest of the instructions are setting up to call the function AddOneTo(), calling it, and assigning to iResult from its return value.</p>
<ul>
<li><em>line 4</em>: is <strong>mov</strong>ing the value of iResult into the <strong>eax</strong> register.</li>
<li><em>line 5</em>: is <strong>push</strong>ing the value of iResult from <strong>eax</strong> onto the top of the stack (which also decrements <strong>esp</strong>). This is storing a copy of the value of iResult on the stack as the function parameter to AddOneTo().</li>
<li><em>line 6</em>: is <strong>call</strong>ing to address 0131101Eh. This instruction causes the function AddOneTo() to be called. It first pushes the address 01311299h onto the Stack (which is the memory address of the instruction at <em>line 7</em>), and then it jumps program execution to the instruction at 0131101Eh.</li>
<li><em>line 7</em>: When the function called by <em>line 6</em> returns, the function parameter pushed onto the Stack in <em>line 5</em> must be removed so the Stack state is as it was before AddOneTo() was called. To achieve this we <strong>add</strong> 4 to <strong>esp</strong> &#8211; this has the same effect on <strong>esp</strong> as <strong>pop</strong>, but we don&#8217;t care about the value so it makes sense to adjust <strong>esp</strong> directly. I assume this is also more efficient, but I&#8217;ve never looked into it.</li>
<li><em>line 8</em>: moves the value in <strong>eax</strong> into [<strong>ebp-4</strong>] where we know that iResult is stored. The standard win32 x86 calling conventions (including both cdecl and stdcall) return integer values from functions in <strong>eax</strong>, so this line is assigning the return value of AddOneTo() to iResult.</li>
</ul>
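<p>The whole call sequence, including the caller removing the parameter with <strong>add esp,4</strong>, can be traced with a toy model in C. This sketch makes some simplifying assumptions: AddOneTo() is reduced to a stand-in that builds its frame with just <strong>push ebp</strong> and <strong>mov ebp,esp</strong> (the real debug build also reserves space and saves registers), the addresses are invented, and the names are hypothetical. It does show where the callee finds its parameter: at [<strong>ebp+8</strong>], just above the saved <strong>ebp</strong> and the return address.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: Stack memory as 32-bit words indexed by byte address, plus
   the registers used by the call sequence. No bounds checking. */
typedef struct {
    uint32_t mem[64];
    uint32_t esp, ebp, eax;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->esp -= 4u; c->mem[c->esp / 4u] = v; }
static uint32_t pop(Cpu *c) { uint32_t v = c->mem[c->esp / 4u]; c->esp += 4u; return v; }

/* Minimal stand-in for AddOneTo(): with a standard frame, [ebp+4] holds
   the return address and [ebp+8] holds the first parameter. */
static void add_one_to(Cpu *c)
{
    push(c, c->ebp);                        /* push ebp                    */
    c->ebp = c->esp;                        /* mov ebp,esp                 */
    c->eax = c->mem[(c->ebp + 8u) / 4u] + 1u; /* iParameter + 1, into eax  */
    c->esp = c->ebp;                        /* mov esp,ebp                 */
    c->ebp = pop(c);                        /* pop ebp                     */
    (void)pop(c);                           /* ret: pop the return address */
}

/* Mirrors lines 4 to 8 of main()'s body above; iResult lives at [ebp-4]. */
static void call_add_one_to(Cpu *c, uint32_t return_address)
{
    c->eax = c->mem[(c->ebp - 4u) / 4u];  /* mov eax,dword ptr [ebp-4]      */
    push(c, c->eax);                      /* push eax: the parameter        */
    push(c, return_address);              /* call: push the return address  */
    add_one_to(c);                        /*       and jump to the callee   */
    c->esp += 4u;                         /* add esp,4: remove the parameter */
    c->mem[(c->ebp - 4u) / 4u] = c->eax;  /* mov dword ptr [ebp-4],eax      */
}
```

<p>After the call, <strong>esp</strong> and <strong>ebp</strong> are back where main() left them, and iResult holds the value AddOneTo() returned in <strong>eax</strong>.</p>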
<p>&nbsp;</p>
<p><strong>Calling AddOneTo()</strong></p>
<p>Let’s just review the instructions involved in calling AddOneTo() in detail:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">9</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iResult = <span style="color: #ff0000;">0</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311289</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0</span>  
    <span style="color: #ff0000;">10</span><span style="color: #339933;">:</span>     iResult = AddOneTo<span style="color: #009900; font-weight: bold;">&#40;</span> iResult <span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311290</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">01311293</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">eax</span>  
<span style="color: #adadad; font-style: italic;">01311294</span>  <span style="color: #00007f; font-weight: bold;">call</span>        0131101E</pre></td></tr></table></div>

<ol>
<li>A copy of iResult’s value is pushed onto the stack (via <strong>eax</strong>) as the parameter for AddOneTo().</li>
<li><strong>push</strong> decrements <strong>esp</strong> by 4 bytes (i.e. 32 bits) and then stores its operand at the new address; after the push instruction, the value of iResult is stored at the address [<strong>esp</strong>].</li>
<li><strong>call</strong> pushes the address of the next instruction to execute after the function returns (the <strong><em>return address</em></strong> &#8211; 01311299h) onto the Stack, and then jumps execution to the address 0131101Eh.</li>
<li>The copy of iResult’s value is now at [<strong>esp+4</strong>], and the return address is at [<strong>esp</strong>].</li>
</ol>
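<p>If it helps to see those four steps as data, here is a minimal C++ sketch that models the Stack as an array of 32 bit slots with a downward-growing <strong>esp</strong>. The SimStack type and CallAddOneTo() function are hypothetical illustrations, not anything the compiler generates; only the return address 01311299h comes from the disassembly above.</p>

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a 32-bit x86 stack: 64 four-byte slots, addressed in
// bytes. Like the real Stack, it grows downwards towards lower addresses.
struct SimStack {
    std::array<std::uint32_t, 64> mem{};
    std::uint32_t esp = 64 * 4;  // empty: esp points just past the highest slot

    void push(std::uint32_t value) {
        esp -= 4;              // push first moves esp down by 4 bytes...
        mem[esp / 4] = value;  // ...then stores its operand at [esp]
    }
    std::uint32_t at(std::uint32_t address) const { return mem[address / 4]; }
};

// Replays the two instructions main() uses to call AddOneTo( iResult ).
// The jump to 0131101Eh that call also performs is not modelled.
SimStack CallAddOneTo(std::uint32_t iResult) {
    SimStack stack;
    stack.push(iResult);      // push eax        (copy of iResult)
    stack.push(0x01311299u);  // call 0131101Eh  (pushes the return address)
    return stack;
}
```

<p>After CallAddOneTo( 7 ), the slot at esp holds 01311299h and the slot at esp + 4 holds 7, matching points 3 and 4 in the list.</p>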
<p>At this point, the Stack looks like this:</p>
<div id="attachment_21990" class="wp-caption alignnone" style="width: 306px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_21.png"><img class="size-full wp-image-21990" src="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_21.png" alt="TheStack_Snapshot_2" width="296" height="360" /></a><p class="wp-caption-text">state of the Stack immediately after the call to AddOneTo() is made from main()</p></div>
<p>The code at the address 0131101Eh that has been <strong>call</strong>ed looks like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">AddOneTo<span style="color: #339933;">:</span>
<span style="color: #adadad; font-style: italic;">0131101E</span>  <span style="color: #00007f; font-weight: bold;">jmp</span>         <span style="color: #ff0000;">01311250</span></pre></td></tr></table></div>

<p>I have to confess this is confusing. This instruction simply makes the code execution jump again, this time to the disassembly that represents the actual function body of AddOneTo(), which is at 01311250h. Why call to an instruction that does another jump?</p>
<p>If you step through this yourself, you’ll notice that the disassembly around this function appears to be a collection of <strong>goto</strong> style labels. This is because they are. You’ll also notice that the instructions associated with each label are jumping elsewhere. Clearly we&#8217;re looking at some sort of &#8220;jump table&#8221;.</p>
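<p>The reason for this is that I used the default debug configuration build settings, in which the linker option &#8220;Enable Incremental Linking&#8221; is set to &#8220;Yes&#8221;. With incremental linking enabled, calls go via a table of small <strong>jmp</strong> stubs, so that when a function is rebuilt the linker can move its body and patch a single stub rather than every call site.</p>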
<p>Incremental linking makes linking faster (apparently), but it clearly introduces a small overhead to every function call. This is the sort of thing you might want to consider turning off in the build options, but you&#8217;d want to profile it to make an informed decision about its impact either way before doing so.</p>
<p>The <strong>jmp</strong> instruction doesn’t disturb the Stack, so no harm done really (other than the likely instruction cache miss introduced by the incremental link&#8217;s jump table).</p>
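<p>The jump table idea can be imitated in plain C++ with a function pointer that every call site goes through. This is only an analogy under my own assumptions (AddOneTo_Old, AddOneTo_New, g_addOneToStub, CallSite() and Relink() are all hypothetical names): the real incremental link table is a list of direct <strong>jmp</strong> instructions patched by the linker, not a pointer patched at run time.</p>

```cpp
// Hypothetical analogy of an incremental link table: every call site goes
// through a stub (here, a function pointer), so the body it reaches can be
// replaced without editing any caller.
using AddFn = int (*)(int);

int AddOneTo_Old(int iParameter) { return iParameter; }      // buggy first build
int AddOneTo_New(int iParameter) { return iParameter + 1; }  // fixed rebuild

// The single table entry that call sites use instead of the function itself.
AddFn g_addOneToStub = &AddOneTo_Old;

// A call site: one extra hop through the stub, much like call -> jmp.
int CallSite(int iResult) { return g_addOneToStub(iResult); }

// "Relinking" patches only the stub; CallSite() is untouched.
void Relink(AddFn newBody) { g_addOneToStub = newBody; }
```

<p>CallSite( 1 ) returns 1 with the first build and 2 after relinking to AddOneTo_New, while CallSite() itself never changes: an extra hop on every call in exchange for cheaper relinking.</p>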
<p>&nbsp;</p>
<p><strong>Getting at the parameter passed to AddOneTo()</strong></p>
<p>So, at last, we come to the disassembly of the body of AddOneTo():</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">1</span><span style="color: #339933;">:</span> <span style="color: #00007f; font-weight: bold;">int</span> AddOneTo<span style="color: #009900; font-weight: bold;">&#40;</span> <span style="color: #00007f; font-weight: bold;">int</span> iParameter <span style="color: #009900; font-weight: bold;">&#41;</span>
     <span style="color: #ff0000;">2</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#123;</span>
<span style="color: #adadad; font-style: italic;">01311250</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">ebp</span>  
<span style="color: #adadad; font-style: italic;">01311251</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">esp</span>  
<span style="color: #adadad; font-style: italic;">01311253</span>  <span style="color: #00007f; font-weight: bold;">sub</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #ff0000;">44h</span>  
<span style="color: #adadad; font-style: italic;">01311256</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">ebx</span>  
<span style="color: #adadad; font-style: italic;">01311257</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">esi</span>  
<span style="color: #adadad; font-style: italic;">01311258</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #46aa03; font-weight: bold;">edi</span>  
     <span style="color: #ff0000;">3</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iLocal = iParameter <span style="color: #339933;">+</span> <span style="color: #ff0000;">1</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311259</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">0131125C</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>  
<span style="color: #adadad; font-style: italic;">0131125F</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
     <span style="color: #ff0000;">4</span><span style="color: #339933;">:</span>     return iLocal<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311262</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
     <span style="color: #ff0000;">5</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#125;</span>
<span style="color: #adadad; font-style: italic;">01311265</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">edi</span>  
<span style="color: #adadad; font-style: italic;">01311266</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">esi</span>  
<span style="color: #adadad; font-style: italic;">01311267</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">ebx</span>  
<span style="color: #adadad; font-style: italic;">01311268</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ebp</span>  
<span style="color: #adadad; font-style: italic;">0131126A</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">ebp</span>  
<span style="color: #adadad; font-style: italic;">0131126B</span>  <span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

<p>We&#8217;re already familiar with the preamble (<em>lines 3 to 8</em>) and postamble (<em>lines 16 to 20</em>), which are identical to those of main().</p>
<p><em>Line 10</em> is much more interesting, as it reads the function parameter iParameter from the Stack. Note the positive offset from <strong>ebp</strong> – this means that the address it is <strong>mov</strong>ing a value from into <strong>eax</strong> is outside this function’s stack frame.</p>
<p>As we established earlier, when execution jumped to the address of this function, the address <strong>esp</strong> pointed to contained the return address, and a <em>copy</em> of the local variable iResult was stored at [<strong>esp+4</strong>].</p>
<p>The first instruction in the preamble is a <strong>push</strong>, which moves <strong>esp</strong> down by a further 4 bytes, so immediately after <em>line 3</em> the value of iResult &#8211; or iParameter as it is referred to in this function &#8211; is at [<strong>esp+8</strong>].</p>
<p>The next instruction <strong>mov</strong>es the value of <strong>esp</strong> into <strong>ebp</strong>, so the value of iResult passed as a parameter to this function is now also at [<strong>ebp+8</strong>] – which is exactly where <em>line 10</em> is accessing it from.</p>
<p>So now we know how arguments are passed to functions. Win!</p>
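<p>Here is the same idea extended to the preamble of AddOneTo(), as a hedged sketch rather than a model of real hardware: a hypothetical Frame type holds the Stack plus <strong>esp</strong> and <strong>ebp</strong>, and EnterAddOneTo() replays the <strong>push</strong> and <strong>call</strong> from main() followed by <strong>push ebp</strong>, <strong>mov ebp,esp</strong> and <strong>sub esp,44h</strong>. The callerEbp argument is an arbitrary stand-in for the frame pointer of main().</p>

```cpp
#include <array>
#include <cstdint>

// Hypothetical 32-bit stack model with esp and ebp "registers".
struct Frame {
    std::array<std::uint32_t, 64> mem{};
    std::uint32_t esp = 64 * 4;
    std::uint32_t ebp = 0;

    void push(std::uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    std::uint32_t load(std::uint32_t address) const { return mem[address / 4]; }
};

// Replays the call sequence in main() and the preamble of AddOneTo(), then
// returns the frame so it can be inspected.
Frame EnterAddOneTo(std::uint32_t iResult, std::uint32_t callerEbp) {
    Frame f;
    f.ebp = callerEbp;
    f.push(iResult);      // main(): push eax
    f.push(0x01311299u);  // main(): call (pushes the return address)
    f.push(f.ebp);        // push ebp     -> iParameter is now at [esp+8]
    f.ebp = f.esp;        // mov ebp,esp  -> so it is also at [ebp+8]
    f.esp -= 0x44;        // sub esp,44h  (locals area; ebp stays put)
    // push ebx / esi / edi are omitted: they move esp but not ebp.
    return f;
}
```

<p>Whatever value you pass as iResult, load( ebp + 8 ) returns it, while [ebp+4] holds the return address and [ebp] holds the saved <strong>ebp</strong>, matching the walkthrough above.</p>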
<p>Since this is the most data that this program ever puts on the Stack, we should take a look at a snapshot of the Stack so we can see exactly where everything is:</p>
<div id="attachment_21991" class="wp-caption alignnone" style="width: 300px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_31.png"><img class="size-full wp-image-21991" src="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_31.png" alt="TheStack_Snapshot_3" width="290" height="440" /></a><p class="wp-caption-text">state of the Stack immediately after AddOneTo() preamble</p></div>
<p>&nbsp;</p>
<h3>Returning the result of AddOneTo()</h3>
<p>Ignoring the function preamble, we are left with:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code"><pre class="asm" style="font-family:monospace;">     <span style="color: #ff0000;">3</span><span style="color: #339933;">:</span>     <span style="color: #00007f; font-weight: bold;">int</span> iLocal = iParameter <span style="color: #339933;">+</span> <span style="color: #ff0000;">1</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311259</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
<span style="color: #adadad; font-style: italic;">0131125C</span>  <span style="color: #00007f; font-weight: bold;">add</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #ff0000;">1</span>  
<span style="color: #adadad; font-style: italic;">0131125F</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">eax</span>  
     <span style="color: #ff0000;">4</span><span style="color: #339933;">:</span>     return iLocal<span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">01311262</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff; font-weight: bold;">dword</span> ptr <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #339933;">-</span><span style="color: #ff0000;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>  
     <span style="color: #ff0000;">5</span><span style="color: #339933;">:</span> <span style="color: #009900; font-weight: bold;">&#125;</span>
<span style="color: #adadad; font-style: italic;">01311265</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">edi</span>  
<span style="color: #adadad; font-style: italic;">01311266</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">esi</span>  
<span style="color: #adadad; font-style: italic;">01311267</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">ebx</span>  
<span style="color: #adadad; font-style: italic;">01311268</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">,</span><span style="color: #46aa03; font-weight: bold;">ebp</span>  
<span style="color: #adadad; font-style: italic;">0131126A</span>  <span style="color: #00007f; font-weight: bold;">pop</span>         <span style="color: #46aa03; font-weight: bold;">ebp</span>  
<span style="color: #adadad; font-style: italic;">0131126B</span>  <span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

<ul>
<li><em>Line 2</em> <strong>mov</strong>es the value of iParameter from the Stack into <strong>eax</strong>.</li>
</ul>
<ul>
<li><em>Lines 3 &amp; 4</em> are <strong>add</strong>ing one to the parameter value in <strong>eax</strong> and then <strong>mov</strong>ing the content of <strong>eax</strong> into the address at [<strong>ebp-4</strong>] which is the address of the local variable iLocal.</li>
</ul>
<ul>
<li><em>Line 6</em> sets up the function’s return value by <strong>mov</strong>ing the value of iLocal into <strong>eax</strong> – which is where the cdecl calling convention (like the other common x86 conventions) specifies that integer return values go. If you remember, the code in main() that accesses the return value expects it in <strong>eax</strong>.</li>
</ul>
<p>If you’re paying attention, you should have noticed that <em>lines 4 and 6</em> are essentially redundant, since the return value was already in <strong>eax</strong> after <em>line 3</em>.</p>
<p>You will see this sort of thing all the time when looking at completely unoptimised disassembly, but it&#8217;s not a bad thing: when it isn&#8217;t asked to optimise, the compiler’s task is to do exactly what your code has asked it to.</p>
<p>So the final piece of the puzzle in all this is actually returning from one function to another.</p>
<p>We know that the postamble puts the Stack back into the same state it was in immediately before the function&#8217;s preamble executed, and we already know what that looks like, because we have a diagram of it at T=2:</p>
<div id="attachment_21562" class="wp-caption alignnone" style="width: 306px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_2.png"><img class="size-full wp-image-21562" src="http://altdevblogaday.com/wp-content/uploads/2011/12/TheStack_Snapshot_2.png" alt="" width="296" height="361" /></a><p class="wp-caption-text">state of the Stack immediately after the call to AddOneTo() is made from main()</p></div>
<p>The final <strong>ret</strong> on <em>line 13</em> in the code box above pops the return address that the <strong>call</strong> instruction in main() left on the top of the Stack (adding 4 to <strong>esp</strong>), and resumes execution at that address – i.e. at the instruction immediately after the <strong>call</strong> in main().</p>
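<p>To tie the whole round trip together, the hypothetical C++ sketch below replays every instruction we have looked at, from the <strong>push</strong> in main() to its <strong>add esp,4</strong>, on a simulated Stack. It is an illustration under simplifying assumptions (the Cpu type is invented for this example, <strong>ebp</strong> starts at 0, and the <strong>ebx</strong>/<strong>esi</strong>/<strong>edi</strong> saves are left out), but it shows <strong>esp</strong> and <strong>ebp</strong> ending up exactly where they started, with the result in <strong>eax</strong>.</p>

```cpp
#include <array>
#include <cstdint>

// Hypothetical 32-bit stack model with the registers this example touches.
struct Cpu {
    std::array<std::uint32_t, 64> mem{};
    std::uint32_t esp = 64 * 4, ebp = 0, eax = 0, eip = 0;

    void push(std::uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    std::uint32_t pop() { std::uint32_t v = mem[esp / 4]; esp += 4; return v; }
    std::uint32_t& at(std::uint32_t address) { return mem[address / 4]; }
};

// Runs the call from main(), the body of AddOneTo() and the return,
// mirroring the disassembly instruction by instruction.
Cpu RoundTrip(std::uint32_t iResult) {
    Cpu c;
    c.push(iResult);          // push eax
    c.push(0x01311299u);      // call: push the return address
    c.push(c.ebp);            // push ebp
    c.ebp = c.esp;            // mov ebp,esp
    c.esp -= 0x44;            // sub esp,44h
    c.eax = c.at(c.ebp + 8);  // mov eax,dword ptr [ebp+8]
    c.eax += 1;               // add eax,1
    c.at(c.ebp - 4) = c.eax;  // mov dword ptr [ebp-4],eax  (iLocal)
    c.eax = c.at(c.ebp - 4);  // mov eax,dword ptr [ebp-4]  (return value)
    c.esp = c.ebp;            // mov esp,ebp
    c.ebp = c.pop();          // pop ebp
    c.eip = c.pop();          // ret: pop the return address and jump to it
    c.esp += 4;               // main(): add esp,4 discards the argument
    return c;
}
```

<p>RoundTrip( 41 ) leaves eax == 42, eip == 01311299h, and esp and ebp back at their starting values.</p>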
<p>Phew. There you have it. See, it wasn&#8217;t that bad was it?</p>
<p>&nbsp;</p>
<h2>Summary</h2>
<p>In this article I’ve taken you through the disassembly of a simple program with a single function call with one parameter and a return value to demonstrate how the stack works.</p>
<p>The sample disassembly we looked at used the x86 cdecl calling convention (the MSVC default for C and C++ functions). Whilst the specifics of the disassembly generated to manage the Stack will vary between calling conventions, the way the Stack works on any other machine, or with any other calling convention, should be very similar in principle.</p>
<p>If, for totally random example, you’re working with a platform that uses some variant of the IBM PowerPC CPU, just fire up one of the simple SDK demos and step through the disassembly in a debug build. Whilst the disassembler mnemonics will be (very) different, spend a little while with a hardware manual and you should find that it’s doing pretty much the same thing as the x86 code we&#8217;ve just looked at.</p>
<p>You’re likely to find some significant variation relative to the x86 assembler we looked at, since the platform you’re working with almost certainly uses a different function calling convention – for example, your compiler might mostly pass arguments via registers and only use the Stack when a function requires a large number of parameters, or uses var args (e.g. printf() ).</p>
<p>Either way, a quick flick through the hardware manuals and / or a search or two through the hardware / compiler manufacturer’s forums and newsgroups should get you the details of the calling convention you’re dealing with and enable you to get your head around it.</p>
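<p>As a concrete case of the var args point above, here is a minimal C-style variadic function using the standard &lt;cstdarg&gt; macros; SumInts() is a hypothetical example, not part of any library. On x86, the cdecl rule that the caller removes the arguments is what makes functions like this (and printf()) workable, since only the caller knows how many bytes it pushed; a convention like stdcall, where the callee removes a fixed number of bytes, can&#8217;t support them.</p>

```cpp
#include <cstdarg>

// Hypothetical helper: sums `count` int arguments passed through a C-style
// variable argument list. The callee cannot tell from its signature how many
// arguments were passed, so the caller states it explicitly in `count`.
int SumInts(int count, ...) {
    va_list args;
    va_start(args, count);  // start reading after the last named parameter
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // each call fetches the next argument
    }
    va_end(args);
    return total;
}
```

<p>Stepping into a call such as SumInts( 3, 1, 2, 3 ) in a debug build is a quick way to see where your platform&#8217;s calling convention actually puts variadic arguments.</p>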
<p>On top of this, the understanding you now have of the Stack’s mechanisms and data layouts should enable you to properly appreciate why out of bounds array access with local arrays can be so dangerous, and precisely why you should think very carefully before passing pointers to local variables…</p>
<p>&nbsp;</p>
<h2><strong>Next Time…</strong></h2>
<p>Believe it or not, we&#8217;re not finished with the Stack &#8211; we have plenty more to cover about it!</p>
<p>For example:</p>
<ul>
<li>What happens when passing &gt; 1 parameter?</li>
<li>More detail on how local variables use Stack memory.</li>
<li>How pass by value and return by value work.</li>
<li>How some of the different x86 function calling conventions work – particularly __fastcall which is more similar to the calling conventions typically used by the ABIs (Application Binary Interface) for console platforms.</li>
</ul>
<p>Given how long this article has become (and the time it took to write…), I will probably split these into more than one post.</p>
<p>&nbsp;</p>
<h2><strong>Epilogue…</strong></h2>
<p>Assuming that you’ve run this code yourself in VS2010: if you have some spare time and fancy a bit of a shock, try running it in a release build configuration with the same breakpoint and look at the disassembly window when the breakpoint is hit.</p>
<p>I found the optimised code generated pretty interesting.</p>
<p>You may also find it fun to see what changes you need to make to the code to force the optimised code to keep the function call and local variables. I certainly did.</p>
<p>&nbsp;</p>
<h3>Thanks</h3>
<p>I&#8217;d like to thank Bruce, Tiffany, Jonathan, Fabian, and Darren for their feedback on this article. It is definitely much better for it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C / C++ Low Level Curriculum part 2: Data Types</title>
		<link>http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/</link>
		<comments>http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 12:00:30 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[level]]></category>
		<category><![CDATA[low]]></category>
		<category><![CDATA[low level]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=20391</guid>
		<description><![CDATA[<h2>Prologue</h2>
<p>Hello and welcome to the 2nd part of the C / C++ low level curriculum series of posts that I&#8217;m currently doing.</p>
<p>Here&#8217;s a link to the first one if you missed it: http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</p>
<p><a href="http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/" class="more-link">Read more on C / C++ Low Level Curriculum part 2: Data Types&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<h2>Prologue</h2>
<p>Hello and welcome to the 2nd part of the C / C++ low level curriculum series of posts that I&#8217;m currently doing.</p>
<p>Here&#8217;s a link to the first one if you missed it: http://altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</p>
<p>This post is going to be a little lighter than most of the other posts in the series, primarily because this post is vying for my spare time with my urge to save a blonde girl with pointy ears from the skinny androgynous Demon Lord of extended monologue in a virtual universe powered by three equilateral triangles.</p>
<p>Before we continue, I&#8217;d like to quickly bring to public note a book that has now been recommended to me many times as a result of the first post: http://www1.idc.ac.il/tecs</p>
<p>I can&#8217;t personally vouch for it, but I fully intend to buy it and grok its face off as soon as I get some spare time in my schedule. This book looks awesome, and if it is half as good as it looks to be then reading it should be an extremely worthwhile investment of your time&#8230;</p>
<h3>Assumptions</h3>
<p>The next thing on my agenda is to discuss assumptions.</p>
<p>Assumptions are dangerous. Even by writing this I am making many assumptions &#8211; that you have a computer, that you can read and understand The Queen&#8217;s English, and that on some level you care about understanding the low-level of C++ to name but a few.</p>
<p>Consequently, dear reader, I feel that it&#8217;s worth mentioning what I assume about you before I go any further.</p>
<p>The important thing, I guess, that I should mention is that I assume that you are already familiar with and comfortable using C and/or C++. If you&#8217;re not, then I&#8217;d advise you to go and get comfortable before you read any more of this :)</p>
<h2>Data Types?</h2>
<p>So, again, I find myself almost instantly qualifying the title of the post and explaining what I mean when I say <em>data types</em>.</p>
<p>What I am talking about is the &#8220;Fundamental&#8221; types of C++ and what you should know about how they relate to the machine level &#8211; even this seemingly straightforward aspect of C++ is not necessarily what you would expect; especially when dealing with multiple target platforms.</p>
<p>Whilst this isn&#8217;t the kind of information that will suddenly improve your code by an order of magnitude, it is (in my opinion) one of the key building blocks of understanding C / C++ at the low level, as it has tonnes of potential knock-on effects in terms of speed of execution, memory layout of complex types etc.</p>
<p>Certainly, no-one ever sat me down and explained this to me, I just sort of absorbed it or looked it up over the years.</p>
<p>&nbsp;</p>
<h2>Fundamental and Intrinsic Types</h2>
<p>The<em> fundamental</em> types of C/C++ are all the types that have a language keyword.</p>
<p>These are not to be confused with the <em>intrinsic types</em> which are the types that are natively handled by some given CPU (i.e. the data types that the machine instructions of that CPU operate on).</p>
<p>Whenever you use new hardware you should check how the compiler for your platform is representing your fundamental types. The best way to do this is (can you guess?) to look at the disassembly window.</p>
<p>These days all the fundamental types of C++ can be represented by an intrinsic type on most platforms, but you definitely shouldn&#8217;t take this for granted: it has only really been the case since the current console hardware generation.</p>
<p>There are 3 categories of fundamental type: integer, floating, and void.</p>
<p>As we all know, the void type cannot be used to store values. It is used to specify &#8220;no type&#8221;.</p>
<p>For both integral and floating point types there is a progression of types that can hold larger values and/or have more numerical precision.</p>
<p>For integers this progression is (from least to most precision) char, short, int, long, long long; and for floats: float, double, long double.</p>
<p>Clearly, the numerical value limits that a given type must be able to store mandate a certain minimum data size for that type (i.e. the number of bits needed to store the prescribed values in binary).</p>
<p>&nbsp;</p>
<h2>Sizes of Fundamental types</h2>
<p>As far as I have been able to discover, the C and C++ standards make no explicit guarantee about the <em>specific</em> size of any of the Fundamental types other than char.</p>
<p>There are, however, several key rules about the sizes of the various types which I have paraphrased below:</p>
<ol>
<li>A char must be a minimum of 8 bits.</li>
<li>sizeof( char ) == 1.</li>
<li>If a pointer of type char* points at the very first address of a contiguous block of memory, then every single address in that block of memory must be traversable by simply incrementing that pointer.</li>
<li>The C standard specifies a minimum range of values that each of the integer types must be able to represent (see page 33 in <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf" target="_blank">this .pdf of the C standard</a> if you want the values; see the header of a standard conformant C++ implementation for details of the values used by your compiler).</li>
<li>The C++ standard says nothing about size, only that &#8220;There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list.&#8221; (see page 75 <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf" target="_blank">in this .pdf of the latest C++ standard</a> I could find).</li>
<li>Rules 4 &amp; 5 have counterparts in the C and C++ standards for the progression of floating point types (float, double, long double).</li>
</ol>
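<p>Those rules can be written down as compile-time checks. The following is a sketch assuming a C++11 compiler (for static_assert and LLONG_MAX); every conforming implementation should accept it, which is precisely the point: none of these checks pins a type to an exact size.</p>

```cpp
#include <climits>
#include <limits>

// Rules 1 and 2: a char is at least 8 bits, and sizeof(char) is always 1.
static_assert(CHAR_BIT >= 8, "a char is at least 8 bits");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// Each signed integer type provides at least as much storage as the previous one.
static_assert(sizeof(signed char) <= sizeof(short), "");
static_assert(sizeof(short) <= sizeof(int), "");
static_assert(sizeof(int) <= sizeof(long), "");
static_assert(sizeof(long) <= sizeof(long long), "");

// The minimum ranges required by the C standard, via the <climits> macros.
static_assert(SHRT_MAX >= 32767, "short holds at least 16 bits");
static_assert(INT_MAX >= 32767, "int is only guaranteed 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long holds at least 32 bits");
static_assert(LLONG_MAX >= 9223372036854775807LL, "long long holds at least 64 bits");

// The floating point progression: each type is at least as precise as the last.
static_assert(std::numeric_limits<float>::digits <= std::numeric_limits<double>::digits, "");
static_assert(std::numeric_limits<double>::digits <= std::numeric_limits<long double>::digits, "");
```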
<p>Helpfully, MSDN has a useful <a href="http://msdn.microsoft.com/en-us/library/cc953fe1%28v=VS.100%29.aspx" target="_blank">summary of this information</a> (though it&#8217;s partly MSVC specific, it&#8217;s a good starting point).</p>
<p>Despite all this leeway in the standard, the size of the fundamental types across PC and current gen console platforms is (to the best of my knowledge) relatively consistent.</p>
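<p>If you want to check your own compiler (alongside looking at the disassembly), a quick sketch like the one below reports how many bits each fundamental type occupies. FundamentalTypeSizes() is a hypothetical helper; the numbers it reports are whatever your implementation chose, so expect them to differ between platforms.</p>

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: lists the size in bits of each fundamental type (plus
// void*) as chosen by the compiler that builds it. CHAR_BIT is the number of
// bits in a char, i.e. in one sizeof() unit.
std::string FundamentalTypeSizes() {
    struct Entry { const char* name; std::size_t bytes; };
    const Entry entries[] = {
        {"char", sizeof(char)},
        {"short", sizeof(short)},
        {"int", sizeof(int)},
        {"long", sizeof(long)},
        {"long long", sizeof(long long)},
        {"float", sizeof(float)},
        {"double", sizeof(double)},
        {"long double", sizeof(long double)},
        {"bool", sizeof(bool)},
        {"void*", sizeof(void*)},
    };
    std::string report;
    for (const Entry& e : entries) {
        report += e.name;
        report += ": ";
        report += std::to_string(e.bytes * CHAR_BIT);
        report += " bits\n";
    }
    return report;
}
```

<p>Print the returned string with printf() or std::cout to see the table; for instance, long is 64 bits on a typical 64 bit Linux build but stays at 32 bits on 64 bit Windows.</p>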
<p>The C++ standard also defines bool as an integral type. It has two values, true and false, which implicitly convert to the integer values 1 and 0 respectively (and any non-zero integer converts to true); bool is also the result type of the comparison and logical operators (==, !=, &gt;, &lt;, &amp;&amp;, || etc.).</p>
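<p>Here is a small sketch of those conversions, assuming C++11 for static_assert and decltype; CountPositive() is a hypothetical helper. Note that in C, by contrast, the comparison and logical operators yield int rather than a boolean type.</p>

```cpp
#include <type_traits>

// bool converts to the integers 1 and 0...
static_assert(static_cast<int>(true) == 1, "true converts to 1");
static_assert(static_cast<int>(false) == 0, "false converts to 0");
// ...and any non-zero arithmetic value converts to true.
static_assert(static_cast<bool>(42), "non-zero converts to true");
static_assert(!static_cast<bool>(0), "zero converts to false");
// In C++ the comparison and logical operators yield bool.
static_assert(std::is_same<decltype(1 < 2), bool>::value, "");
static_assert(std::is_same<decltype(1 == 2 || 3 != 4), bool>::value, "");

// Hypothetical helper: counts positive entries by adding each comparison
// result (promoted to the int 0 or 1) to a running total.
int CountPositive(const int* values, int count) {
    int n = 0;
    for (int i = 0; i < count; ++i) {
        n += (values[i] > 0);
    }
    return n;
}
```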
<p>As far as I have been able to ascertain, the standard only specifies that bool must be able to represent a binary state. Consequently, the size of bool can vary dramatically according to compiler implementation, and even within code generated by the same compiler &#8211; I have seen it vary between 1 and 4 bytes on platforms I&#8217;ve used &#8211; I have always assumed that this was down to speed of execution vs. storage size tradeoffs.</p>
<p>This &#8216;size of bool&#8217; issue resulted in bool being banned from use in complex data structures at at least one company that I have worked at. I should clarify that this was a &#8216;proactive&#8217; ban, based on the fact that it <em>might</em> cause trouble, rather than one that resulted from trouble actually having been caused.</p>
<p>We should also mention enums at this point (thanks John!) &#8211; the standard gives the underlying type of an enumerated type the liberty to vary in size depending on the range of values represented by each specific enum &#8211; even within the same codebase &#8211; so an enum with values &lt;= 255 (or &lt;= 256 members with no values assigned) may well have sizeof() == 1, and one which has to represent 32 bit values would typically have sizeof() == 4.</p>
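<p>A related C++11 feature lets you take this decision away from the compiler: an enum can declare a fixed underlying type. The sketch below assumes a C++11 compiler, and the enum names are hypothetical. The unfixed enums are shown for contrast; their sizes are up to your compiler.</p>

```cpp
#include <cstdint>
#include <type_traits>

// No fixed underlying type: the compiler chooses an integral type big enough
// for the enumerator values, so the standard allows these sizes to differ.
enum Small { kSmallA = 0, kSmallB = 255 };
enum Large { kLargeA = 0, kLargeB = 0x7FFFFFFF };

// Since C++11 the underlying type can be pinned explicitly.
enum Flags8 : std::uint8_t { kFlagA = 1, kFlagB = 2 };
enum class Id32 : std::uint32_t { kNone = 0, kMax = 0xFFFFFFFFu };

static_assert(sizeof(Flags8) == 1, "fixed 8-bit underlying type");
static_assert(sizeof(Id32) == 4, "fixed 32-bit underlying type");
static_assert(std::is_same<std::underlying_type<Flags8>::type, std::uint8_t>::value, "");
```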
<p>This brings us onto pointers. Strictly speaking pointers are not defined as one of the fundamental types, but the value of a pointer clearly has a corresponding data size so we&#8217;re covering them here.</p>
<p>The first thing to note about pointers is that the numeric limits required for a pointer on any given platform are determined by the size of the addressable memory on that platform.</p>
<p>If you have 1 GB of memory that must be accessible in 1 byte increments, then a pointer needs to be able to hold values up to ((1024 * 1024 * 1024) &#8211; 1), which is (2^30 &#8211; 1), so it needs 30 bits. 4GB (2^32 bytes) is the most that can be addressed with a 32 bit value, which is why a win32 process can&#8217;t directly address more than 4GB.</p>
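<p>The same arithmetic as a small sketch: the number of address bits needed is the smallest n with 2^n &gt;= the number of bytes (address_bits_needed is just an illustrative name).</p>

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to hold every byte address in [0, bytes - 1],
// i.e. the smallest n such that 2^n >= bytes.
unsigned address_bits_needed(std::uint64_t bytes) {
    unsigned bits = 0;
    // Grow the addressable range (2^bits) until it covers 'bytes'.
    // The bits < 64 check comes first so we never shift by 64.
    while (bits < 64 && (std::uint64_t(1) << bits) < bytes) {
        ++bits;
    }
    return bits;
}

void check_address_bits() {
    const std::uint64_t GB = 1024ull * 1024 * 1024;
    assert(address_bits_needed(1 * GB) == 30);      // max address is 2^30 - 1
    assert(address_bits_needed(4 * GB) == 32);      // the 32 bit limit
    assert(address_bits_needed(4 * GB + 1) == 33);  // one more byte needs 33
}
```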
<p>For example, when compiling for win32 with VS2010, pointers are 32 bit (i.e. sizeof() == 4), and when compiling for OSX with Xcode (on the MacBook Pro I use at work for iOS development) pointers are 64 bit (sizeof() == 8).</p>
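<p>You can check this for any target with a couple of lines. A sketch (the function names are illustrative; std::uintptr_t is optional in the standard, though mainstream compilers provide it):</p>

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Size of a data pointer, in bits, for whatever target this is compiled for.
std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}

void print_pointer_size() {
    // e.g. 32 bits when building for win32, 64 bits for a 64 bit OS X target
    std::printf("sizeof(void*) = %zu (%zu bits)\n", sizeof(void*), pointer_bits());
}

void check_pointer_size() {
    // std::uintptr_t (where provided) is wide enough to hold a data pointer.
    assert(sizeof(std::uintptr_t) >= sizeof(void*));
}
```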
<p>One thing that is definitely worth noting is that, on the platforms you&#8217;re likely to be using, all <em><strong>data</strong></em> pointers produced by a given compiler will be the same size (n.b. this is not true of function pointers, and strictly speaking the standard doesn&#8217;t guarantee it for data pointers either). The type of a pointer is, after all, a language level abstraction &#8211; under the hood they are all just memory addresses. This is also why they can all be happily converted to and from void* &#8211; void* being a &#8216;typeless pointer&#8217; (n.b. the standard does not guarantee that function pointers can be converted to or from void*).</p>
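<p>A sketch of the void* round trip for data pointers, which is the conversion the standard does guarantee (round_trip is an illustrative name):</p>

```cpp
#include <cassert>

// Any object (data) pointer can be converted to void* and back to its
// original type without losing information.
int* round_trip(int* p) {
    void* typeless = p;                  // implicit conversion to void*
    return static_cast<int*>(typeless);  // explicit conversion back
}

void check_void_pointer_round_trip() {
    int value = 42;
    int* p = round_trip(&value);
    assert(p == &value && *p == 42);

    double d = 1.5;
    void* vd = &d;  // works for any object type...
    assert(*static_cast<double*>(vd) == 1.5);
    // ...but not for function pointers: void* fn = &round_trip; does not
    // compile, and even reinterpret_cast is only conditionally-supported.
}
```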
<p>That said, knowing the type of the pointer is absolutely crucial to the low level implementation of many of the higher level language mechanisms &#8211; as we shall see in later posts.</p>
<h3>Addendum</h3>
<p>So, following on from a couple of the comments, I need to cover function pointers as separate from data pointers.</p>
<p>I made an incorrect assertion that all pointers were the same size. This is only true of data pointers.</p>
<p>Function pointers (in particular pointers to member functions) can be of different sizes precisely because they are not necessarily just memory addresses &#8211; where multiple inheritance or virtual functions are involved they are typically structures.</p>
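<p>To see the difference on your own compiler, compare the size of a plain function pointer with a pointer to a member function of a class that uses multiple inheritance. The classes here are made up for illustration. On the ABI used by GCC and Clang a pointer to member function is typically two pointer-sized words; with MSVC the size depends on the kind of inheritance involved.</p>

```cpp
#include <cassert>
#include <cstdio>

struct A { virtual ~A() {} virtual void f() {} };
struct B { virtual ~B() {} virtual void g() {} };
// Multiple inheritance: a pointer to a member function of C may need to
// carry a 'this' adjustment as well as a code address.
struct C : A, B { void h() {} };

typedef void (*PlainFn)();
typedef void (C::*MemberFn)();

void free_function() {}

void print_function_pointer_sizes() {
    std::printf("plain function pointer:     %zu bytes\n", sizeof(PlainFn));
    std::printf("pointer to member function: %zu bytes\n", sizeof(MemberFn));
}

void check_member_function_pointer() {
    MemberFn m = &C::g;  // g comes from the second base class, B
    C c;
    (c.*m)();            // calls B::g with 'this' adjusted to the B subobject
    PlainFn p = &free_function;
    p();
    assert(sizeof(MemberFn) >= sizeof(PlainFn));
}
```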
<p>I recommend the blog that Bryan Robertson linked me to, as it gives a concrete example of why pointers to member functions often need to be more than a memory address: <a href="http://blogs.msdn.com/b/oldnewthing/archive/2004/02/09/70002.aspx">http://blogs.msdn.com/b/oldnewthing/archive/2004/02/09/70002.aspx</a></p>
<p>I also found these useful links relating to function pointers and void* (this whole site is pretty damn useful in fact):</p>
<p><a href="http://www.parashift.com/c++-faq-lite/pointers-to-members.html#faq-33.10">http://www.parashift.com/c++-faq-lite/pointers-to-members.html#faq-33.10</a></p>
<p><a href="http://www.parashift.com/c++-faq-lite/pointers-to-members.html#faq-33.11">http://www.parashift.com/c++-faq-lite/pointers-to-members.html#faq-33.11</a></p>
<p>Thanks to Bryan and Jalf for pushing me to find out more :)</p>
<p>&nbsp;</p>
<h2>Intrinsic Types used by Fundamental Types</h2>
<p>As I mentioned up front, this varies between platforms &#8211; and even then there&#8217;s no guarantee that the compiler will do the &#8220;sensible&#8221; thing and use all the intrinsic types supported by the platform you&#8217;re using.</p>
<p>Here is a screenshot of a win32 console app I knocked up that prints the sizes of the Fundamental types as created by compiling for win32 under VS2010:</p>
<div id="attachment_20405" class="wp-caption alignnone" style="width: 687px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/11/builtinTypes.png"><img class="size-full wp-image-20405" src="http://altdevblogaday.com/wp-content/uploads/2011/11/builtinTypes.png" alt="C++ Built In Types" width="677" height="342" /></a><p class="wp-caption-text">C++ Fundamental types (win32 compiled on Windows 7 with VS2010)</p></div>
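<p>The source of that app isn&#8217;t included in the post; a minimal reconstruction of something similar might look like this (PRINT_SIZE is a hypothetical helper macro, not the original code). The static_asserts list size relationships the standard does guarantee, whatever the actual numbers turn out to be.</p>

```cpp
#include <cstdio>

// Prints the name of a type and its size in bytes for the current target.
#define PRINT_SIZE(type) std::printf("%-20s %zu\n", #type, sizeof(type))

void print_fundamental_type_sizes() {
    PRINT_SIZE(bool);
    PRINT_SIZE(char);
    PRINT_SIZE(wchar_t);
    PRINT_SIZE(short);
    PRINT_SIZE(int);
    PRINT_SIZE(long);
    PRINT_SIZE(long long);
    PRINT_SIZE(float);
    PRINT_SIZE(double);
    PRINT_SIZE(long double);
    PRINT_SIZE(void*);  // not a fundamental type, but handy to see
}

// Relationships the standard guarantees regardless of the actual sizes.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(short) <= sizeof(int), "short is no larger than int");
static_assert(sizeof(int) <= sizeof(long), "int is no larger than long");
static_assert(sizeof(long) <= sizeof(long long), "long is no larger than long long");
```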
<p>My home machine is a 64 bit Intel thing of some description, about a year old.</p>
<p>Since the processor is 64 bit, I&#8217;d hope that all of these sizes correspond to intrinsic types (8 bytes being the size of a 64 bit CPU register). However, since I&#8217;m compiling for win32 (which can only fit 4 bytes in a general purpose CPU register), I&#8217;m guessing that it won&#8217;t be using intrinsics for integer types &gt; 32 bit (floating point types such as double are handled separately, by the x87 / SSE registers).</p>
<div id="attachment_20598" class="wp-caption alignleft" style="width: 418px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/11/longlongadd.png"><img class="size-full wp-image-20598" src="http://altdevblogaday.com/wp-content/uploads/2011/11/longlongadd.png" alt="adding 2 long long values and storing the result in a 3rd long long" width="408" height="214" /></a><p class="wp-caption-text">adding 2 long long values and storing the result in a 3rd long long</p></div>
<p>In any event, I can&#8217;t be sure without looking at the disassembly.</p>
<p><em>&lt;&#8230;pause to add some simple test code with long long and run it&#8230;&gt;</em></p>
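<p>The exact test code only appears in the screenshot, so this is an approximation. llTest and llTest2 are the names from the disassembly, while llResult and the values are made up, chosen so that both 32 bit halves are non-zero and the low halves carry into the high halves. Build it in the debug configuration, put a breakpoint on the addition, and look at the disassembly.</p>

```cpp
#include <cassert>

// Two long long values initialised, added, and the result stored in a
// third. The values and the name llResult are hypothetical.
long long add_long_longs() {
    long long llTest   = 0x0000000100000002LL;  // upper and lower halves non-zero
    long long llTest2  = 0x00000003FFFFFFFFLL;  // low half will carry on addition
    long long llResult = llTest + llTest2;
    return llResult;
}

void check_add_long_longs() {
    assert(add_long_longs() == 0x0000000500000001LL);
}
```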
<p>Sure enough, these 8 byte long long values are being handled as two 32 bit values.</p>
<p>Ignoring the actual addition, you can clearly see this because the code initialising llTest and llTest2 is setting them in two separate steps for the upper and lower 32 bits of the 64 bit values.</p>
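<p>What the 32 bit code has to do can be sketched in plain C++: split each 64 bit value into two 32 bit halves, add the low halves, and feed any carry into the sum of the high halves. On 32 bit x86 this is typically an <strong>add</strong> followed by an <strong>adc</strong> (add with carry); the function below only models that behaviour, it isn&#8217;t what the compiler emits.</p>

```cpp
#include <cassert>
#include <cstdint>

// Adds two 64 bit values using only 32 bit arithmetic on their halves.
std::uint64_t add64_via_32bit_halves(std::uint64_t a, std::uint64_t b) {
    std::uint32_t aLo = static_cast<std::uint32_t>(a);
    std::uint32_t aHi = static_cast<std::uint32_t>(a >> 32);
    std::uint32_t bLo = static_cast<std::uint32_t>(b);
    std::uint32_t bHi = static_cast<std::uint32_t>(b >> 32);

    std::uint32_t lo    = aLo + bLo;             // like 'add': wraps modulo 2^32
    std::uint32_t carry = (lo < aLo) ? 1u : 0u;  // wrapped means carry out
    std::uint32_t hi    = aHi + bHi + carry;     // like 'adc'

    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

void check_add64() {
    std::uint64_t a = 0x00000001FFFFFFFFull;
    std::uint64_t b = 0x0000000000000001ull;
    assert(add64_via_32bit_halves(a, b) == a + b);  // carry crosses the halves
}
```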
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>So now I know, and it wasn&#8217;t even scary &#8211; really I should go and check the rest of them&#8230;</p>
<p>&nbsp;</p>
<h2>Fancy Intrinsics</h2>
<p>Most modern CPUs have fancy intrinsics &#8211; e.g. 128 bit vector registers that can store and operate on four-32bit-floats-in-one-value sort of stuff.</p>
<p>In theory these sorts of extra intrinsics can provide big wins in certain situations &#8211; e.g. heavy duty chunks of vector maths, or non-vector maths that can be parallelised into vectors.</p>
<p>The chances are that your compiler won&#8217;t ever use these without you asking it nicely. There are plenty of good reasons why this is the case (apparently), but you should find that support for these hardware specific intrinsics will be mentioned in your hardware / compiler manuals.</p>
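<p>As an illustration, here is a sketch using the SSE intrinsics that MSVC, GCC and Clang provide on x86 / x64 via xmmintrin.h; other hardware has its own equivalents (e.g. AltiVec / VMX on PowerPC). The preprocessor check and the scalar fallback are assumptions added so the example still builds on other platforms.</p>

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics (x86 / x64 only)
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Adds four pairs of floats. With SSE this is one vector add on a
// 128 bit register holding four 32 bit floats; otherwise a scalar loop.
void add4(const float* a, const float* b, float* out) {
#if HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             // load 4 floats (unaligned is OK)
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // 4 additions in 1 instruction
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}

void check_add4() {
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float out[4];
    add4(a, b, out);
    assert(out[0] == 11.0f && out[1] == 22.0f && out[2] == 33.0f && out[3] == 44.0f);
}
```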
<p>&nbsp;</p>
<h2>Summary</h2>
<p>So, what would I like you to take away from this?</p>
<p>Firstly, that there is a difference between the data types of the C++ language and the hardware data types.</p>
<p>Secondly, don&#8217;t just trust that your compiler is doing what would intuitively seem sensible to you. Check its work.</p>
<p>Thirdly, it&#8217;s not rocket science! You can find out by just modifying one of the sample programs for your new hardware and then looking at the disassembly in the debugger.</p>
<p>Finally, I thought I might insert a few points of note here:</p>
<ol>
<li>Almost all CPUs have 8 bit bytes. Any CPU with more than 8 bits per byte was probably designed by a maniac / genius (n.b. I find that there is a particularly fine line between the two in Computer Science circles).</li>
<li>One thing you need to watch out for with numerical types is that the C standard specifies the same <em>minimum</em> numerical limit for int and short: unsigned int and unsigned short are both only required to reach at least 0xFFFF (i.e. 16 bits). I&#8217;ve never had a problem with it, but an int <em>could</em> be represented as 16 bit.</li>
<li>If you want to know the size of any given type just use the sizeof operator. Your compiler knows these things.</li>
</ol>
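<p>If your code quietly assumes a 32 bit int, it costs very little to make that assumption explicit. A sketch (the assertion messages and function names are illustrative):</p>

```cpp
#include <cassert>
#include <climits>
#include <limits>

// The standard's MINIMUM limits: short and int only have to reach 16 bits,
// so both of these hold on every conforming compiler...
static_assert(USHRT_MAX >= 0xFFFF, "unsigned short is at least 16 bits");
static_assert(UINT_MAX >= 0xFFFF, "unsigned int is at least 16 bits");

// ...whereas this is NOT guaranteed. If your code relies on a 32 bit int,
// say so, so that a new platform fails to compile instead of misbehaving.
static_assert(sizeof(int) * CHAR_BIT >= 32, "this code assumes int is at least 32 bits");

// sizeof and std::numeric_limits tell you what your compiler actually uses.
bool int_wider_than_short() {
    return std::numeric_limits<int>::max() > std::numeric_limits<short>::max();
}

void check_limits() {
    assert(sizeof(short) * CHAR_BIT >= 16 && sizeof(int) >= sizeof(short));
}
```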
<p>&nbsp;</p>
<h2>Epilogue</h2>
<p>If you are hungry for more information on this level (i.e. fundamental and intrinsic types) I recommend searching #AltDevBlogADay, because there are loads to choose from&#8230;</p>
<p>Here are a few articles I found when doing a quick search (apologies to those whose articles I missed as a result of less than thorough searching!):</p>
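<p><a href="http://altdevblogaday.com/2011/08/21/practical-flt-point-tricks/">http://altdevblogaday.com/2011/08/21/practical-flt-point-tricks/</a></p>
<p><a href="http://altdevblogaday.com/2011/08/06/demise-low-level-programmer/">http://altdevblogaday.com/2011/08/06/demise-low-level-programmer/</a></p>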
<p><a href="http://altdevblogaday.com/2011/11/10/optimisation_lessons/">http://altdevblogaday.com/2011/11/10/optimisation_lessons/</a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/11/24/c-c-low-level-curriculum-part-2-data-types/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Low Level Curriculum for C and C++</title>
		<link>http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/</link>
		<comments>http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 04:00:23 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[curriculum]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[level]]></category>
		<category><![CDATA[low]]></category>
		<category><![CDATA[low level]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=19829</guid>
		<description><![CDATA[<h1>Background</h1>
<p>In my last post, <a href="http://altdevblogaday.com/2011/10/25/why-i-became-an-educator/">Why I became an Educator</a>, I was bemoaning the lack of focus on Low Level understanding that seems to have afflicted Computer Science degree courses in recent times (at least in the UK&#8230;).</p>
<p><a href="http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/" class="more-link">Read more on A Low Level Curriculum for C and C++&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<h1>Background</h1>
<p>In my last post, <a href="http://altdevblogaday.com/2011/10/25/why-i-became-an-educator/">Why I became an Educator</a>, I was bemoaning the lack of focus on Low Level understanding that seems to have afflicted Computer Science degree courses in recent times (at least in the UK&#8230;).</p>
<p>As a result, I received a comment from someone called Travis Woodward who said:</p>
<p style="padding-left: 30px"><em>There are plenty of students out there who are more than willing to dive into low level stuff, but its hard to know where to start or even what to learn (the old &#8216;you don&#8217;t know what you don&#8217;t know&#8217; problem). </em></p>
<p style="padding-left: 30px"><em>I&#8217;ve looked around for something approaching a low level curriculum, but they tend to just be lists of topics which aren&#8217;t actually that helpful without context and suggested resources to start you off. The best intro I&#8217;ve found so far is a course called CS107: Programming Paradigms from Stanford on iTunesU, which has a good section on how C and C++ look to a compiler.</em></p>
<p style="padding-left: 30px"><em>So if any low level programmers want to put together a low level curriculum with suggested resources, then please do! :)</em></p>
<p>This is of course a commendable idea, and so I decided to get started on it&#8230;</p>
<p>&nbsp;</p>
<h1>Low Level Curriculum?</h1>
<p>Before I go any further, I&#8217;d like to clarify what I mean by &#8220;Low Level Curriculum&#8221;.</p>
<p>During my time in the industry I&#8217;ve helped architect and build a multi-platform once-Next-Gen-now-current-gen engine and tool chain, I&#8217;ve written plenty of shaders, tracked down countless hideous bugs by looking at disassembly and memory windows, hunted down the odd submission blocking threaded race condition, and on several occasions had to hand-unpick the broken stacks of core dumps from PS3 / X360 test decks to find bugs that only occur &#8220;in the wild&#8221;; but that doesn&#8217;t make me a low level programmer &#8211; this is the kind of thing I&#8217;d expect anyone with my sort of experience to have done.</p>
<p>I&#8217;ve never sat for hours poring over GPad or Pix captures, I&#8217;ve never really had to worry about stuff like patching fragment shaders or batched physics calculations on SPUs, or how to get the most from my AltiVecs, and I&#8217;ve certainly never had to rewrite large chunks of code in assembler, taking advantage of caching or sneaky DMA modes to get a few extra FPS out of anything. This is what low level programmers do: write platform specific, hardware optimised code, usually to get the best performance out of a machine.</p>
<p>This curriculum is not about learning to be a low level programmer.</p>
<p>What it <em><strong>is</strong></em> about is gaining a solid understanding of the low level implementational underpinnings of C and C++ * &#8211; an understanding that I strongly feel should be the baseline for any programmer working in games.</p>
<p>Over the course of however many posts this eventually takes up I&#8217;ll be covering:</p>
<ol>
<li>Data types</li>
<li>Pointers, Arrays, and References</li>
<li>Functions and the Stack</li>
<li>The C++ object model (several posts)</li>
<li>Memory (again, several posts)</li>
<li>Caches</li>
</ol>
<p>Assuming you read and understand all of the posts in this series &#8211; and that I manage to communicate the information effectively &#8211; you should end up in a place where for any given &#8220;foible&#8221; of the language you understand not only <em>that</em> it exists but also &#8211; and most crucially &#8211; <em><strong>why</strong></em> it exists.</p>
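<p>For example, you may (or may not) know <em>that</em> virtual function calls don&#8217;t work polymorphically in constructors (they call the version belonging to the class whose constructor is running, not the most derived override); before the end of this series of posts you will understand <em><strong>why</strong></em> they can&#8217;t work in constructors.</p>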
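<p>Here is the <em>that</em> part in code; the <em>why</em> is what the later posts are for. Base, Derived and g_calledFromBaseCtor are made-up names for this sketch.</p>

```cpp
#include <cassert>
#include <string>

// Records which override of name() the Base constructor actually reached.
std::string g_calledFromBaseCtor;

struct Base {
    Base() { g_calledFromBaseCtor = name(); }  // virtual call during construction
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

void check_virtual_call_in_constructor() {
    Derived d;
    // While Base's constructor runs, the object is still 'a Base',
    // so Base::name() is called rather than Derived::name().
    assert(g_calledFromBaseCtor == "Base");
    // Once construction has finished, the call is polymorphic as usual.
    const Base& b = d;
    assert(b.name() == "Derived");
}
```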
<p>Again just to be clear, I&#8217;m not necessarily talking about the same level of understanding of this as someone who writes compilers for their day job; but certainly a level of understanding that gives you a much better idea of what is likely to be going on at the level of the underlying engine that C++ sits on top of, and which consequently enables you to far better understand the implications of the code you write.</p>
<p>* N.B. to be 100% clear, C will be covered strictly as a subset of C++.</p>
<p>&nbsp;</p>
<h1>There is no source code available for the current location.</h1>
<div id="attachment_19885" class="wp-caption alignleft" style="width: 310px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/11/NoSourceCodeAvailable.png"><img class="size-medium wp-image-19885 " src="http://altdevblogaday.com/wp-content/uploads/2011/11/NoSourceCodeAvailable-300x120.png" alt="There is no source code available for the current location" width="300" height="120" /></a><p class="wp-caption-text">Aieeee! Spare me the hexadecimal!</p></div>
<p>I&#8217;m sure the vast majority of programmers who use Visual Studio freak out the first few times they see this dialog.</p>
<p>I know I did.</p>
<p>I learned to program primarily in a green (or orange if the green screens were taken) screen dumb terminal Unix mainframe environment. You know, like they have in old films like Alien. The second and third year students had priority use of the XWindows machines (and the few Silicon Graphics workstations were for 3rd year graphics projects only), so dumb terminals were where I learned my trade.</p>
<p>Even on the XWindows machines there was no programming IDE that I was aware of &#8211; I used EMACS and GNU make files, and the only debugger I had use of was command line GDB, which is not what you&#8217;d call user-friendly. I got by with std::cout.</p>
<p>When I graduated I went from this world of bakelite keyboards, screen burn, and command lines into the bright world of Windows 95 development using Visual Studio 4 (slightly before DirectX and hardware accelerated graphics).</p>
<p>When I first saw this dialog box you can bet your life I freaked out &#8211; and why wouldn&#8217;t I?</p>
<p>Thanks to the language syntax and code architecture focussed high level teaching methods employed by my university I had no more idea of what went on behind the veil of the compiler than my brief forays into debugging with GDB had afforded me.</p>
<p>I&#8217;d just got a degree from a well-respected University where they had altogether avoided teaching me about assembler as part of the main syllabus, and I had assumed it was because they were worried it was too much for my puny mind to deal with.</p>
<p>Suffice to say, I got over the freaking out part &#8211; but I still saw this dialog as a &#8220;No Entry&#8221; sign for far longer than I&#8217;d like to admit.</p>
<p>I only really started to get over it a few years later, when I was working closely with someone who had got a job in games on the strength of their assembler programming.</p>
<p>I had a crash, and they just casually leant over and clicked the &#8220;Show Disassembly&#8221; button. Then, equally casually, showed me exactly why my code was crashing &#8211; explaining it in terms of how C++ maps to assembler &#8211; and told me how to fix it.</p>
<p>This blew my mindgaskets three times because:</p>
<ol>
<li>this person was so casual about it</li>
<li>disassembly clearly wasn&#8217;t the black magic it appeared to be</li>
<li>given it was so simple I couldn&#8217;t believe I hadn&#8217;t been taught about the low level innards of C++ at University</li>
</ol>
<p>&nbsp;</p>
<h1>Rending the Veil of Disassembly</h1>
<p>I really didn&#8217;t realise how incredibly important this was until I had the pleasure of meeting a guy called Andy Yelland. If you already know Andy, then you will know exactly what I mean, but for those of you who have not met him I will explain.</p>
<p>Andy is one of those people who changes your perspective. He is more or less the polar opposite of the stereotypical ninja-level video game programmer: well dressed, professional, endlessly well-informed, friendly, funny, and socially adept.</p>
<p>However, the most amazing thing about Andy is the speed with which he can dissect a console core dump. He just sits there and calmly unpicks the stack, occasionally keeping a few notes about which register some value is in, or looking up the address of a function in a symbol file as he goes, and then in somewhere between 5 minutes and a few hours (depending on how tricky the issue is) he&#8217;ll turn around and tell you exactly what the problem was.</p>
<p>Not only that, but he&#8217;ll happily do this in a codebase he&#8217;s never even seen before &#8211; and even better, he&#8217;s totally happy to sit and explain it all to you as he does it.</p>
<p>After sitting with Andy for a few dissections I realised that whilst what he does seems like black magic, it is in fact anything but.</p>
<p>It&#8217;s about having an expert understanding of how C++ works at the assembly level, and bloody-mindedly applying that knowledge to reverse engineer the state of the system backwards from the current stack state (i.e. when the crash happened) to the point where the bad data was introduced.</p>
<p>Clearly this takes a lot of practice, and to get anywhere near as good as Andy at it will take anyone (who isn&#8217;t Rain Man) years of their life.</p>
<p>I&#8217;m not saying that I think everyone should be able to casually decipher disassembly representing code they didn&#8217;t write &#8211; I certainly can&#8217;t do that.</p>
<p>What I <em><strong>am</strong></em> saying is that I think all game programmers should be able to look at disassembly and at least make an educated guess at what is going on; and that by leveraging their understanding of how C++ is implemented at the low level (and given time, possibly with some hardware manuals) they should be able to work it out.</p>
<h1>The first rule of the Low Level Curriculum for C++: Don&#8217;t fear Disassembly</h1>
<p>Ok, so assuming that you agree with me, where do you start?</p>
<p>I think the best way to start making sense of it is to look at some simple code in the disassembly window, so let&#8217;s do that.</p>
<p>Make yourself a test project in a C++ programming IDE of your choice and write some simple code in your main() function.</p>
<p>For the sake of argument, let&#8217;s say you&#8217;re using my weapon of choice &#8211; Visual Studio 2010 on a Windows PC.</p>
<p>The code I&#8217;m suggesting we look at is this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> x <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
<span style="color: #993333;">int</span> y <span style="color: #339933;">=</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
<span style="color: #993333;">int</span> z <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
z <span style="color: #339933;">=</span> x <span style="color: #339933;">+</span> y<span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>Make sure you&#8217;re in the &#8220;debug&#8221; configuration, and put a breakpoint on the line</p>
<p style="padding-left: 30px"><strong>z = x + y;</strong></p>
<p>then run the program.</p>
<p>When the breakpoint gets hit, right-click in the text editor and choose &#8220;Go To Disassembly&#8221; from the context menu.</p>
<h2><strong><span style="color: #ff00ff">DON&#8217;T PANIC!</span></strong></h2>
<div id="attachment_19895" class="wp-caption alignnone" style="width: 693px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/11/YouveBeenDisassembled2.png"><img class="size-full wp-image-19895 " src="http://altdevblogaday.com/wp-content/uploads/2011/11/YouveBeenDisassembled2.png" alt="YouveBeenDisassembled" width="683" height="332" /></a><p class="wp-caption-text">NOTE: ensure that you have the same options checked in the right-click context menu!</p></div>
<p>You should now see something like the image above. Your text will almost certainly be scrolled differently, because I&#8217;ve messed about with the window sizes and text position for clarity.</p>
<p>The black text with line numbers is the code we compiled; the grey text below each line of code shows the assembler that line generated.</p>
<p>&nbsp;</p>
<h3>So what does it all mean?</h3>
<p>The hexadecimal number at the start of each line of assembler is the memory address of that line of assembler &#8211; remember code is really just a stream of data that tells the CPU what to do, so logically it must be at an address in memory. Don&#8217;t worry about these too much, I just wanted to make the point that the instructions are in memory too.</p>
<p><strong>mov</strong> and <strong>add</strong> are assembler mnemonics &#8211; each represents a CPU instruction, one per line with its arguments.</p>
<p><strong>eax</strong> and <strong>ebp</strong> are two of the registers in the x86 CPU architecture. Registers are the &#8220;working area&#8221; of a CPU: fragments of memory that are built into the CPU itself, and which the CPU can access far faster than main memory. Rather than having addresses like memory, registers are named in assembler because there are usually only a (relatively) small number of them.</p>
<p>The <strong>eax</strong> register is a &#8220;general purpose&#8221; register, but is primarily used for mathematical operations.</p>
<p>The <strong>ebp</strong> register is the &#8220;base pointer&#8221; register. In x86 assembler, local variables will typically be accessed via an offset from this register. We will cover the purpose of <strong>ebp</strong> in later posts.</p>
<p>As I alluded to above, <strong>ebp-8</strong>, <strong>ebp-14h</strong>, and <strong>ebp-20h</strong> are the memory addresses (as offsets from the <strong>ebp</strong> register) storing the values of the local variables <strong>x</strong>, <strong>y</strong>, and <strong>z</strong> respectively (the 12 byte gaps between them are most likely padding added by the debug build&#8217;s runtime stack checks).</p>
<p><strong>dword ptr [ ... ]</strong> means &#8220;the 32 bit value stored at the address in the square brackets&#8221; (this is the same in 64 bit x86 assembler, where 64 bit values are accessed with <strong>qword ptr</strong> instead).</p>
<p>&nbsp;</p>
<h3>How does it work?</h3>
<p>Now, we know that the assembler generated by the C++ code we&#8217;ve written will initialise the three variables x, y, and z; then add x to y and store the result in z.</p>
<p>Let&#8217;s look at each line of assembler in isolation (ignoring the address).</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">mov    dword ptr <span style="color: #009900;">&#91;</span>ebp<span style="color: #339933;">-</span><span style="color: #0000dd;">8</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span></pre></td></tr></table></div>

<p>This assembler instruction sets the value of the variable <strong>x</strong> by <strong>mov</strong>ing the value <strong>1</strong> into the memory at address <strong>ebp-8</strong>.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">mov    dword ptr <span style="color: #009900;">&#91;</span>ebp<span style="color: #339933;">-</span>14h<span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #0000dd;">2</span></pre></td></tr></table></div>

<p>This assembler instruction sets the value of the variable <strong>y</strong> by <strong>mov</strong>ing the value <strong>2 </strong>into the memory at address <strong>ebp-14h</strong> &#8211; n.b. the &#8216;h&#8217; is necessary because 14 in decimal is a different value from 14 in hexadecimal &#8211; this wasn&#8217;t necessary when specifying the offset for the value of <strong>x</strong> because 8 is the same value in decimal and hexadecimal.</p>
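<p>The same offsets written as C++ hexadecimal literals make the difference concrete (MASM&#8217;s trailing h and C++&#8217;s 0x prefix mean the same thing). Note that the three offsets turn out to be 12 bytes apart:</p>

```cpp
#include <cstdio>

// The offsets from the disassembly as C++ hexadecimal literals.
static_assert(0x8  == 8,  "8h and 8 are the same value");
static_assert(0x14 == 20, "14h is 20 in decimal, not 14");
static_assert(0x20 == 32, "20h is 32 in decimal");
static_assert(0x14 - 0x8 == 12 && 0x20 - 0x14 == 12, "12 bytes apart");

void print_offsets() {
    const int offsets[] = {0x8, 0x14, 0x20};
    for (int offset : offsets) {
        // %X prints hexadecimal, %d prints decimal.
        std::printf("ebp-%Xh = ebp-%d\n", static_cast<unsigned>(offset), offset);
    }
}
```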

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">mov    dword ptr <span style="color: #009900;">&#91;</span>ebp<span style="color: #339933;">-</span>20h<span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #0000dd;">0</span></pre></td></tr></table></div>

<p>This instruction is, unsurprisingly, setting the value of the variable <strong>z</strong>.</p>
<p>Now we&#8217;re up to the interesting part, doing the arithmetic and assigning the result to <strong>z</strong>.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">mov    eax<span style="color: #339933;">,</span> dword ptr <span style="color: #009900;">&#91;</span>ebp<span style="color: #339933;">-</span><span style="color: #0000dd;">8</span><span style="color: #009900;">&#93;</span></pre></td></tr></table></div>

<p>This instruction <strong>mov</strong>es the value of the memory at address <strong>ebp-8</strong> (i.e. <strong>x</strong>) into the <strong>eax</strong> register&#8230;</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">add    eax<span style="color: #339933;">,</span> dword ptr <span style="color: #009900;">&#91;</span>ebp<span style="color: #339933;">-</span>14h<span style="color: #009900;">&#93;</span></pre></td></tr></table></div>

<p>&#8230;this instruction <strong>add</strong>s the value of the memory at address <strong>ebp-14h</strong> (i.e. <strong>y</strong>) to the <strong>eax</strong> register&#8230;</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">mov    dword ptr <span style="color: #009900;">&#91;</span>ebp<span style="color: #339933;">-</span>20h<span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span>eax</pre></td></tr></table></div>

<p>&#8230;and this instruction moves the value from <strong>eax</strong> into the memory at address <strong>ebp-20h</strong> (i.e. <strong>z</strong>).</p>
<p>So, as you can see, whilst the assembler looks very different, it is logically isomorphic with the C++ code that it was generated from (i.e. whilst its behaviour may be slightly different, it will give the same output for any given input).</p>
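<p>To make the mapping explicit, here is a toy model of those six instructions in C++. Everything in it is hypothetical: a byte array stands in for memory, ebp is just an index into it, and load / store play the role of dword ptr [ ... ]. It illustrates what the instructions do, not how a CPU works.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A toy stack frame: 'frame' stands in for memory, 'ebp' indexes into it.
struct ToyFrame {
    unsigned char frame[64] = {};
    std::size_t   ebp = 40;  // arbitrary base that leaves room below it
    std::uint32_t eax = 0;

    // dword ptr [ebp - offset]: read the 32 bit value at that address
    std::uint32_t load(std::size_t offset) const {
        std::uint32_t v;
        std::memcpy(&v, &frame[ebp - offset], sizeof v);
        return v;
    }
    // dword ptr [ebp - offset]: write a 32 bit value to that address
    void store(std::size_t offset, std::uint32_t v) {
        std::memcpy(&frame[ebp - offset], &v, sizeof v);
    }
};

std::uint32_t run_add_example() {
    ToyFrame f;
    f.store(0x8, 1);        // mov dword ptr [ebp-8],1      (x = 1)
    f.store(0x14, 2);       // mov dword ptr [ebp-14h],2    (y = 2)
    f.store(0x20, 0);       // mov dword ptr [ebp-20h],0    (z = 0)
    f.eax = f.load(0x8);    // mov eax,dword ptr [ebp-8]
    f.eax += f.load(0x14);  // add eax,dword ptr [ebp-14h]
    f.store(0x20, f.eax);   // mov dword ptr [ebp-20h],eax  (z = x + y)
    return f.load(0x20);
}

void check_toy_frame() {
    assert(run_add_example() == 3);
}
```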
<p>&nbsp;</p>
<h1>Hold on, why did we look at that again?</h1>
<p>Those of you with brains connected to your eyes will have noticed that the intro to disassembly I just gave was &#8211; to use a British colloquialism &#8211; &#8220;a bit noddy&#8221;.</p>
<p>In all honesty, that was the whole point of choosing such a simple example. The intention was to show how something as simple as adding two integers and storing the result in a third maps from C++ to assembler.</p>
<p>You can use this exact technique to look at the vast majority of the C++ language constructs and see what they actually generate, and the purpose of this was to show you that it&#8217;s simple enough to do.</p>
<p>Obviously this example showed only two of the x86 assembler mnemonics, of which there are many more.</p>
<p>If you want to make sense of assembler code that is using mnemonics you don&#8217;t know, it&#8217;s usually as simple as googling them. That&#8217;s all I&#8217;ve ever done, and there is so much information about x86 assembler floating about on the interweb that you should have little trouble deciphering it.</p>
<p>I found a super helpful webpage that covers the x86 registers in some detail: <a href="http://www.swansontec.com/sregisters.html">http://www.swansontec.com/sregisters.html</a></p>
<p>Here&#8217;s a link to a page to download a .pdf x86 &#8220;cheat sheet&#8221;: <a href="http://www.jegerlehner.ch/intel/">http://www.jegerlehner.ch/intel/</a></p>
<p>And the obvious wikipedia page: <a href="http://en.wikipedia.org/wiki/X86_instruction_listings">http://en.wikipedia.org/wiki/X86_instruction_listings</a></p>
<p>And a <em><strong>beefy</strong></em> link also linked from wikipedia: <a href="http://home.comcast.net/~fbui/intel.html">http://home.comcast.net/~fbui/intel.html</a></p>
<p>&nbsp;</p>
<h1>Summary</h1>
<p>Whilst very few programmers will ever need to write assembler, all game programmers will &#8211; sooner or later &#8211; find it to their advantage to be able to read and make some sense of it. It&#8217;s amazing what you can figure out with only a partial knowledge of assembler and how it maps to C++ code.</p>
<p>The example code we looked at was, as I&#8217;ve already admitted, very simple.</p>
<p>The point of this first post wasn&#8217;t to give you answers, but to show that the disassembly window is only daunting if you let it be; and to encourage you to explore what your compiler is doing with the code you give it.</p>
<p>Don&#8217;t give up just because you don&#8217;t understand what you&#8217;re seeing yet; google it or post a specific question somewhere like http://stackoverflow.com/.</p>
<p>&nbsp;</p>
<h2>Epilogue</h2>
<p>There are a few other things that I think are worth taking away from this tiny snippet of disassembly:</p>
<ol>
<li>the C++ concept of a variable (or any other language&#8217;s concept of a variable for that matter) doesn&#8217;t exist at the assembler level. In assembler the values of x, y, and z are stored in specific memory addresses, and the CPU gets at them by explicit use of their respective memory addresses. The high level language concept of a variable, whilst easier to think about, is actually already an abstracted concept identifying a value in a memory address.</li>
<li>note that in order to do anything &#8220;interesting&#8221; (e.g. add a value to another) the CPU needs to have at least part of the data it is operating on in a register (some CPUs can operate directly on memory, and x86 can for some instructions, e.g. <strong>add dword ptr [ebp-8],1</strong>, but it&#8217;s certainly not the usual way of doing things).</li>
</ol>
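<p>You can see the first point from C++ itself by printing the addresses of the variables. A sketch (the order and spacing of the addresses are up to the compiler, and will differ between debug and release builds):</p>

```cpp
#include <cassert>
#include <cstdio>

// The names x, y and z only exist in the source; at run time each one
// is just a value stored at some address.
void print_local_addresses() {
    int x = 1;
    int y = 2;
    int z = 0;
    z = x + y;
    std::printf("&x = %p\n&y = %p\n&z = %p\n",
                static_cast<void*>(&x), static_cast<void*>(&y), static_cast<void*>(&z));
    assert(z == 3);
    assert(&x != &y && &y != &z && &x != &z);  // distinct objects, distinct addresses
}
```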
<p>Finally, I feel that this extremely simple example illustrates what I think is one of the most important facts about programming:</p>
<p>High level languages exist only to make life easy for humans; they&#8217;re not any kind of accurate reflection of how CPUs actually work &#8211; in fact even assembler is a human convenience compared to the actual binary opcodes that the mnemonics (e.g. mov, add etc.) represent.</p>
<p>My advice is don&#8217;t think about the actual opcodes too much, and definitely don&#8217;t worry about the electrons or the silicon :)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why I became an Educator</title>
		<link>http://www.altdevblogaday.com/2011/10/25/why-i-became-an-educator/</link>
		<comments>http://www.altdevblogaday.com/2011/10/25/why-i-became-an-educator/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 04:00:17 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[General Interest]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=19173</guid>
		<description><![CDATA[<p>I’ve always felt that my education left me woefully underprepared for the realities of the workplace, let alone the realities of the workplace in the Video Game Industry.</p>
<p>I&#8217;ve learned almost all of what I consider to be the core knowledge required by game programmers out of necessity whilst doing my job; and each time I&#8217;ve made one of these fundamental discoveries, I&#8217;ve felt that there was no good reason why I couldn’t have been told about it during my education.</p>
<p><a href="http://www.altdevblogaday.com/2011/10/25/why-i-became-an-educator/" class="more-link">Read more on Why I became an Educator&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I’ve always felt that my education left me woefully underprepared for the realities of the workplace, let alone the realities of the workplace in the Video Game Industry.</p>
<p>I&#8217;ve learned almost all of what I consider to be the core knowledge required by game programmers out of necessity whilst doing my job; and each time I&#8217;ve made one of these fundamental discoveries, I&#8217;ve felt that there was no good reason why I couldn’t have been told about it during my education.</p>
<div id="attachment_19171" class="wp-caption alignright" style="width: 302px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/ADBAD_SpeccyPlus.png"><img class="size-full wp-image-19171" src="http://altdevblogaday.com/wp-content/uploads/2011/10/ADBAD_SpeccyPlus.png" alt="Sinclair ZX Spectrum +" width="292" height="172" /></a><p class="wp-caption-text">Kneel before the Sinclair ZX Spectrum+</p></div>
<p>I had no formal training in Computer Science before I studied for a degree in Computer Science, but &#8211; like most programmers of my generation &#8211; I learned to program in BASIC on one of the 1980s&#8217; many cheap personal computers &#8211; in my case a Sinclair ZX Spectrum+.</p>
<p>Eventually, I ended up studying Computer Science at University (Jt Hons Comp Sci / AI &amp; Psychology).</p>
<p>When I went to University, C++ was still a new language and the template keyword wasn&#8217;t part of any language standard. In fact, according to Wikipedia, there wasn&#8217;t a language standard until 1998, which is 5 years after I started university.</p>
<p>As I assume most other people were back then, I was taught language syntax in lectures, and set a lot of coursework to force me to apply that syntax and allow my learning to be assessed.</p>
<p>Unfairly or fairly, I genuinely don&#8217;t feel that I learned anything significant about programming directly from the people who ostensibly educated me in it.</p>
<p>I may have learned how to program whilst solving the problems that the lecturers had set; but I was taught far, far more by one or two specific  people around me (thanks Nigel and Matt) when I was solving them than by the lecturers who had read language syntax at me.</p>
<div id="attachment_19174" class="wp-caption alignleft" style="width: 170px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/ADBAD_TheBjible.png"><img class="size-full wp-image-19174 " src="http://altdevblogaday.com/wp-content/uploads/2011/10/ADBAD_TheBjible.png" alt="The Bjible" width="160" height="202" /></a><p class="wp-caption-text">The Bjible</p></div>
<p>Even back then in 1993 &#8211; before what we now think of as the internet was really in existence &#8211; the language syntax of C++ could be easily learned from a book (e.g. &#8220;The C++ Programming Language&#8221; by Bjarne Stroustrup &#8211; also known as the &#8220;Bjible&#8221;), but the syntax of the language isn&#8217;t the code that executes &#8211; so why did no-one explain those parts of it to me?</p>
<p>Ironically, I managed to sail through university and get a job at Codemasters with the functional level of knowledge that I had been given by a year&#8217;s worth of C++ syntax lectures and coursework, but I was in for a massive shock when I actually started work.</p>
<p>Two of the guys I knew at Codies were doing a Sega Saturn port of the PSOne version of Micro Machines V3. They had reverse engineered the opcodes of the Saturn&#8217;s sound chip using a multimeter and an oscilloscope and, from what I recall, were using their findings to get extra texture memory by storing textures in sound RAM.</p>
<p>At that point in time I literally didn&#8217;t know what the stack was, and heap was just another word for a badly organised pile. What those two guys were doing was literally and figuratively completely beyond me.</p>
<p>Needless to say, I learned fast; I had to. I was also lucky enough to be around programmers who had that level of knowledge and were more than happy to share it.</p>
<p>Over the years I have seen a consistent worsening of understanding of the low-level underpinnings of programming languages from graduates &#8211; which is, I suppose, to be expected given the ongoing trend toward teaching only Java on traditional Computer Science Degrees (at least in the UK).</p>
<p>However, more worryingly, I have also seen a constantly surprising lack of low-level knowledge and understanding demonstrated by theoretically experienced programmers.</p>
<p>One of my main motivations in becoming an educator is to try to address this issue, as most of the really horrible, hard-to-track-down bugs I&#8217;ve seen over the years have arisen from some subtle disconnect between what a programmer thinks they&#8217;re asking the compiler to do and what they&#8217;re actually asking it to do.</p>
<p>Without a thorough understanding of the underpinnings of C++ &#8211; i.e. how the engine of the C++ language is implemented &#8211; you can&#8217;t possibly properly understand the implications of the C++ code that you&#8217;re writing.</p>
<p>Even if you understand the way that C++ works internally then, sooner or later, you&#8217;re still going to have to worry about the implications of any given implementational choice for the way the code executes on the specific hardware architecture it will execute on.</p>
<p>It&#8217;s a lot to take in. Very few people get it right all of the time &#8211; and I&#8217;m not foolish enough to claim to be one of them &#8211; but that&#8217;s not a get-out-of-jail-free card&#8230;</p>
<p>If you&#8217;re only thinking about what you want the compiler to do, and not worrying about what you&#8217;re <em>actually</em> telling the compiler to do, then you&#8217;re much more likely to get it wrong.</p>
<p>Remember:</p>
<p><strong>All code is bad, especially your own</strong> &#8211; <strong>and if it compiles first time and seems to work then it&#8217;s probably wrong.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/10/25/why-i-became-an-educator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making your own type id is fun!</title>
		<link>http://www.altdevblogaday.com/2011/10/10/making-your-own-type-id-is-fun/</link>
		<comments>http://www.altdevblogaday.com/2011/10/10/making-your-own-type-id-is-fun/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 22:18:52 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[RTTI]]></category>
		<category><![CDATA[type id]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=18381</guid>
		<description><![CDATA[<h1>To RTTI, or not to RTTI?</h1>
<p>Someone once told me &#8220;If you need to use RTTI then your design is wrong.&#8221;</p>
<p>I find that when you meet someone who says stuff like this, the best thing to do is tell them that &#8211; having tried both approaches &#8211; you totally prefer inheritance to composition; and then sneak away under cover of  the resulting self-righteousness mushroom cloud.</p>
<p><a href="http://www.altdevblogaday.com/2011/10/10/making-your-own-type-id-is-fun/" class="more-link">Read more on Making your own type id is fun!&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<h1>To RTTI, or not to RTTI?</h1>
<p>Someone once told me &#8220;If you need to use RTTI then your design is wrong.&#8221;</p>
<p>I find that when you meet someone who says stuff like this, the best thing to do is tell them that &#8211; having tried both approaches &#8211; you totally prefer inheritance to composition; and then sneak away under cover of  the resulting self-righteousness mushroom cloud.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/2331839040_02de6e4cac.jpg"><img class="alignnone size-thumbnail wp-image-18388" src="http://altdevblogaday.com/wp-content/uploads/2011/10/2331839040_02de6e4cac-150x150.jpg" alt="" width="150" height="150" /></a></p>
<p>&nbsp;</p>
<h1>Pragmatic? Moi?!?</h1>
<p>In game programming we often end up inhabiting an interesting middle ground between engineering best practice and the practicality of getting the job done so we can hit the milestone and keep the company afloat.</p>
<p>In addition to this issue, the Object Oriented goal of localising the logical behaviour of an object into a class that contains the data that it operates on is sometimes at odds with sensibility as well as the other goals of OOP.</p>
<p>For example; I&#8217;ve lost track of the number of <em>seemingly</em> well thought out state machine manager classes I&#8217;ve seen which, when put into use, produce a system of state classes whose logic about state transitions is so decentralised amongst the various states that it actively works against the maintainability of the code.</p>
<p>For me the issues surrounding use of RTTI are less about philosophy, and more about pragmatics.</p>
<p>There are plenty of situations where you can entirely work around a problem without using typeid or dynamic_cast; but there are also situations where the solution that doesn&#8217;t require them involves:</p>
<ol>
<li>too many layers of abstraction to be readily digested at a later date by anyone who didn&#8217;t write the code in the first place, and / or</li>
<li>too much refactoring of existing code given the time available, and / or</li>
<li>too much risk given the current situation (e.g. in the final run-up to mastering)</li>
</ol>
<p>In any of these situations I say: &#8220;Go for your life; but for the love of Bob please don&#8217;t use typeid or dynamic_cast&lt;&gt;.&#8221;</p>
<p>&nbsp;</p>
<h1>What&#8217;s wrong with typeid and dynamic_cast&lt;&gt;?</h1>
<p>Well, there&#8217;s nothing actually <em>wrong</em> with them as such; they work as intended, and it&#8217;s not like they go around stamping on kittens or anything&#8230;</p>
<p>However, you need to leave RTTI turned on to use them; this costs extra memory for typeid (up to 40 bytes per class with virtual functions, according to Wikipedia, so not actually that bad), and dynamic_cast&lt;&gt; &#8220;can be pretty slow&#8221; and definitely isn&#8217;t constant time.</p>
<p>To be honest, I &#8211; and the other programmers who were founding members of FreeStyleGames &#8211; always felt the RTTI functionality was (is)  just another little thing that people will tend to misuse if left to their own devices &#8211; so we always turned it off. I still do so now.</p>
<p>At the end of the day, using it has some small fixed and not so fixed penalties, and not using it doesn&#8217;t. That was enough to make up my mind for me.</p>
<p>Either way, I&#8217;m not writing this to argue coherently about whether you should or shouldn&#8217;t use the default C++ RTTI; that&#8217;s well trodden ground, people have their opinions and let&#8217;s just accept that I&#8217;ve decided to live on the side of the fence where we turn it off and use the extra memory for something more useful like storing user defined 1s and 0s.</p>
<p>&nbsp;</p>
<h1>So, what about this type id then?</h1>
<p>So, assuming that you&#8217;re someone who&#8217;s sold on not using RTTI; and that you have a situation where you feel that your best-available-solution requires it (or something like it) then I hope what follows is useful &#8211; let&#8217;s get on with it shall we?</p>
<p>&nbsp;</p>
<h2 style="padding-left: 30px">What do we want from our type id?</h2>
<p style="padding-left: 30px">Firstly, our type id &#8211; let&#8217;s call it altTypeId &#8211; needs to resolve to a unique value for each type that we ask it to give us an altTypeId for.</p>
<p style="padding-left: 30px">Secondly, the value it gives for each type needs to be constant throughout the duration of each instance of the execution of our program.</p>
<p style="padding-left: 30px">Finally, I think it&#8217;s fair to say that it should be usable in a typeid-like fashion.</p>
<h2 style="padding-left: 30px">template and #define are your friends</h2>
<p style="padding-left: 30px">Unsurprisingly, given that this problem requires a unique value per type we&#8217;re going to use a template.</p>
<p style="padding-left: 30px">Even more unsurprisingly, since the solution uses a template, we&#8217;re going to need a #define macro to wrap it so your code doesn&#8217;t become a hideous mess of angle brackets and scope operators.</p>
<h2 style="padding-left: 30px">Voilà! Here&#8217;s the code!</h2>
<p style="padding-left: 30px">Let&#8217;s start at the beginning, we need a type to hold altTypeId.</p>
<p style="padding-left: 30px">It&#8217;s pretty unlikely that any code base is going to have more than 0xFFFFFFFF types in total, and the number that might need to be resolved using a type id should be smaller than the total number. This makes an unsigned int (assuming it to be 32bit) a sensible choice.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// typedef for altTypeId as unsigned int</span>
<span style="color: #993333;">typedef</span> <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> altTypeId<span style="color: #339933;">;</span></pre></td></tr></table></div>
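<p style="padding-left: 30px">If you would rather have the compiler enforce the &#8220;assuming it to be 32bit&#8221; caveat than rely on it silently, a C++11 static_assert can do the job. This is a minimal sketch; the check and its message are additions for illustration rather than part of the original code, and std::uint32_t from &lt;cstdint&gt; would be an equally reasonable choice of underlying type.</p>

```cpp
#include <climits>

// typedef for altTypeId as unsigned int, as above
typedef unsigned int altTypeId;

// Compile-time check of the 32-bit assumption (C++11 and later):
// the build fails on any platform where unsigned int is not 32 bits wide.
static_assert( sizeof( altTypeId ) * CHAR_BIT == 32,
               "altTypeId is assumed to be a 32-bit unsigned integer" );
```

<p style="padding-left: 30px">On a platform where unsigned int is some other width, the build stops with that message instead of quietly producing an id type with a different range.</p>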

<p style="padding-left: 30px">Ok, so the next part is we need some way to generate unique values for these altTypeIds. My solution to this is a class that contains a single static function which returns 0 the first time it&#8217;s called, and then 1 the next, then 2, and so on.</p>
<p style="padding-left: 30px">This is only a class because I want to make all of its functionality protected and hide the ability to generate altTypeIds away from anything not derived from it.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">class CAltTypeIdGen_Base
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #000000; font-weight: bold;">enum</span><span style="color: #009900;">&#123;</span> ALTTYPEID_INVALID <span style="color: #339933;">=</span> <span style="color: #208080;">0xFFffFFff</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
protected<span style="color: #339933;">:</span>
    <span style="color: #666666; font-style: italic;">// generates type Id values starting from 0</span>
    <span style="color: #666666; font-style: italic;">// you would probably want to put this in a .cpp in case you want to use it in a library</span>
    <span style="color: #993333;">static</span> altTypeId GenerateAltTypeId<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">static</span> altTypeId s_uNextClassID <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">return</span> s_uNextClassID<span style="color: #339933;">++;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// only constructible from derived types</span>
    CAltTypeIdGen_Base<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p style="padding-left: 30px">I&#8217;m assuming that you&#8217;re thinking &#8220;What?&#8221; at this point. The above class doesn&#8217;t make a lot of sense without the template that turns it into a one-number-per-type system. Here it is&#8230;</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">template<span style="color: #339933;">&lt;</span> typename T <span style="color: #339933;">&gt;</span>
class TAltTypeIdGen <span style="color: #339933;">:</span> public CAltTypeIdGen_Base
<span style="color: #009900;">&#123;</span>
public<span style="color: #339933;">:</span>
    <span style="color: #666666; font-style: italic;">// this generates a typeID for each class that instantiates the template</span>
    <span style="color: #993333;">static</span> altTypeId GetAltTypeId<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">static</span> altTypeId s_uClassId <span style="color: #339933;">=</span> GenerateAltTypeId<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">return</span> s_uClassId<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    private<span style="color: #339933;">:</span>
    <span style="color: #666666; font-style: italic;">// no instance of this class can be created.</span>
    TAltTypeIdGen<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p style="padding-left: 30px">Now we have a way of generating a unique altTypeId for each type in the code base &#8211; this template type cannot be constructed, and the internal static data of its only accessible function can only be initialised once. Joy.</p>
<p style="padding-left: 30px">To finish it off I&#8217;ve added a macro that makes it less visually traumatic to use, and a template function that extracts the altTypeId from an instance of a type which is pretty helpful too.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// define to make CAltTypeIdGen_Base::ALTTYPEID_INVALID less visually traumatic</span>
<span style="color: #339933;">#define ALTTYPEID_INVALID CAltTypeIdGen_Base::ALTTYPEID_INVALID </span>
&nbsp;
<span style="color: #666666; font-style: italic;">// macro for getting hold of a type's altTypeId</span>
<span style="color: #339933;">#define GetAltTypeIdOf( TYPENAME ) ( TAltTypeIdGen&lt; TYPENAME &gt;::GetAltTypeId() ) </span>
&nbsp;
<span style="color: #666666; font-style: italic;">// resolves to the correct form of TAltTypeIdGen&lt; T &gt;::GetAltTypeId()</span>
template<span style="color: #339933;">&lt;</span> typename T <span style="color: #339933;">&gt;</span>
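altTypeId GetAltTypeIdOfInstance<span style="color: #009900;">&#40;</span> <span style="color: #993333;">const</span> T<span style="color: #339933;">&amp;</span> <span style="color: #666666; font-style: italic;">/* instance: only its type is used, so take it by reference rather than copying it */</span> <span style="color: #009900;">&#41;</span>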
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span> TAltTypeIdGen<span style="color: #339933;">&lt;</span> T <span style="color: #339933;">&gt;::</span><span style="color: #202020;">GetAltTypeId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p style="padding-left: 30px">That was easy, right? Now we can use the altTypeId to make decisions based on the type of variables e.g.:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> iDemo <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> GetAltTypeIdOf<span style="color: #009900;">&#40;</span> <span style="color: #993333;">int</span> <span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> GetAltTypeIdOfInstance<span style="color: #009900;">&#40;</span> iDemo <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">// do stuff</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>
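<p style="padding-left: 30px">For reference, here is a minimal sketch that assembles the pieces above into one self-contained, compilable unit and checks the three properties asked of altTypeId earlier (unique per type, constant for the run, usable in a typeid-like way). The CheckAltTypeIdProperties helper is hypothetical and only there for illustration, and GetAltTypeIdOfInstance takes its argument by const reference in this sketch so the instance is never copied.</p>

```cpp
#include <cassert>

typedef unsigned int altTypeId;

class CAltTypeIdGen_Base
{
public:
    enum { ALTTYPEID_INVALID = 0xFFffFFff };

protected:
    // hands out 0, 1, 2, ... one new value per call
    static altTypeId GenerateAltTypeId()
    {
        static altTypeId s_uNextClassID = 0;
        return s_uNextClassID++;
    }

    // only constructible from derived types
    CAltTypeIdGen_Base() {}
};

template< typename T >
class TAltTypeIdGen : public CAltTypeIdGen_Base
{
public:
    // the function-local static is initialised once per instantiation,
    // so each distinct T gets its own id the first time it is asked for
    static altTypeId GetAltTypeId()
    {
        static altTypeId s_uClassId = GenerateAltTypeId();
        return s_uClassId;
    }

private:
    // no instance of this class can be created
    TAltTypeIdGen() {}
};

#define GetAltTypeIdOf( TYPENAME ) ( TAltTypeIdGen< TYPENAME >::GetAltTypeId() )

template< typename T >
altTypeId GetAltTypeIdOfInstance( const T& )
{
    return TAltTypeIdGen< T >::GetAltTypeId();
}

// Hypothetical helper: checks the properties listed earlier.
inline void CheckAltTypeIdProperties()
{
    int iDemo = 0;
    float fDemo = 0.0f;

    // unique per type
    assert( GetAltTypeIdOf( int ) != GetAltTypeIdOf( float ) );
    // constant for the duration of the run
    assert( GetAltTypeIdOf( int ) == GetAltTypeIdOf( int ) );
    // usable from an instance, typeid-style
    assert( GetAltTypeIdOfInstance( iDemo ) == GetAltTypeIdOf( int ) );
    assert( GetAltTypeIdOfInstance( fDemo ) == GetAltTypeIdOf( float ) );
    // it would take 0xFFFFFFFF distinct types to reach the reserved value
    assert( GetAltTypeIdOf( double ) != CAltTypeIdGen_Base::ALTTYPEID_INVALID );
}
```

<p style="padding-left: 30px">Note that the values handed out depend on the order in which types are first asked for their altTypeId, so they can change whenever the code changes; they are only guaranteed to be consistent within a single run.</p>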

<h2 style="padding-left: 30px">But&#8230; that&#8217;s not really useful for anything&#8230; is it?</h2>
<p style="padding-left: 30px">Busted! Ok, so the astute amongst you will have noticed that so far altTypeId is not really mega useful&#8230;</p>
<p style="padding-left: 30px">So, the first thing to note about what I&#8217;ve shown here is that it isn&#8217;t a straight replacement for typeid, and it definitely isn&#8217;t a straight replacement for dynamic_cast&lt;&gt;. It does, however, let you solve the majority of problems whose alternative solutions might involve these two C++ language features.</p>
<p style="padding-left: 30px">The main limitation of altTypeId is that, without a little extra stitching into your code, it can&#8217;t be used to do any reasoning about the actual type of an object pointed to by a pointer to a base type.</p>
<p style="padding-left: 30px">This is for two reasons:</p>
<blockquote>
<ol>
<li>templates can only operate on what you instantiate them with at compile time &#8211; so a pointer to a type will give you a different altTypeId than an instance of the type would, and</li>
<li>a pointer to a base class will always yield the altTypeId of a pointer to a base class &#8211; there&#8217;s no polymorphic cleverness in altTypeId</li>
</ol>
</blockquote>
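<p style="padding-left: 30px">Both limitations can be seen directly in code. This is a sketch using a compact, counter-based stand-in for TAltTypeIdGen (without the protected base class) and a hypothetical Base / Derived hierarchy; GetAltTypeIdOfInstance takes its argument by const reference here.</p>

```cpp
#include <cassert>

typedef unsigned int altTypeId;

// Compact counter-based id generator in the style of TAltTypeIdGen.
inline altTypeId NextAltTypeId()
{
    static altTypeId s_uNext = 0;
    return s_uNext++;
}

template< typename T >
struct TAltTypeIdGen
{
    static altTypeId GetAltTypeId()
    {
        static altTypeId s_uClassId = NextAltTypeId();
        return s_uClassId;
    }
};

template< typename T >
altTypeId GetAltTypeIdOfInstance( const T& )
{
    return TAltTypeIdGen< T >::GetAltTypeId();
}

// Hypothetical hierarchy for illustration only.
struct Base { virtual ~Base() {} };
struct Derived : public Base {};

inline void ShowAltTypeIdLimitations()
{
    Derived derived;
    Base* pBase = &derived;

    // 1. T and T* are different types, so they get different ids.
    assert( TAltTypeIdGen< Derived >::GetAltTypeId() != TAltTypeIdGen< Derived* >::GetAltTypeId() );

    // 2. The id comes from the static (compile-time) type of the argument:
    //    pBase points at a Derived, but T is deduced as Base*.
    assert( GetAltTypeIdOfInstance( pBase ) == TAltTypeIdGen< Base* >::GetAltTypeId() );
    assert( GetAltTypeIdOfInstance( pBase ) != TAltTypeIdGen< Derived* >::GetAltTypeId() );

    // Dereferencing does not help either: T is deduced as Base, not Derived.
    assert( GetAltTypeIdOfInstance( *pBase ) == TAltTypeIdGen< Base >::GetAltTypeId() );
}
```

<p style="padding-left: 30px">Taking the argument by const reference matters for the last check: a by-value parameter would copy (slice) *pBase into a Base, and would not compile at all if Base were abstract.</p>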
<p style="padding-left: 30px">To be honest, I&#8217;m pretty sure that some template black magic could fix the first issue, but I&#8217;ve never bothered, because there are lots of reasons a type id should distinguish between the type of an instance and the type of a pointer to that instance. Apart from anything else, type and type* <em><strong>are</strong></em> actually different types&#8230;</p>
<p style="padding-left: 30px">The second issue is definitely fixable &#8211; you could either:</p>
<blockquote>
<ol>
<li>have a (pure) virtual function in the base type that returns an altTypeId and implement it in all deriving classes to return their actual altTypeId, or</li>
<li>have a base class that has an altTypeId as a member which is set by an argument to its constructor when derived classes are constructed. This way is more tedious to write, as you have to chain the constructors down the hierarchy if there is more than one level of derivation from the base class. However, it has no virtual function overhead for getting at the altTypeId.</li>
</ol>
</blockquote>
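<p style="padding-left: 30px">Here is a minimal sketch of both fixes listed above. The class names (CShapeV, CCircleV, CShapeM, CCircleM, CUnitCircleM) and the GetDynamicAltTypeId function are hypothetical, and a compact counter-based TAltTypeIdGen (without the protected base class) is repeated so the block compiles on its own.</p>

```cpp
#include <cassert>

typedef unsigned int altTypeId;

// Compact counter-based id generator in the style of TAltTypeIdGen.
inline altTypeId NextAltTypeId()
{
    static altTypeId s_uNext = 0;
    return s_uNext++;
}

template< typename T >
struct TAltTypeIdGen
{
    static altTypeId GetAltTypeId()
    {
        static altTypeId s_uClassId = NextAltTypeId();
        return s_uClassId;
    }
};

// Fix 1: a pure virtual in the base, implemented by every derived class.
struct CShapeV
{
    virtual ~CShapeV() {}
    virtual altTypeId GetDynamicAltTypeId() const = 0;
};

struct CCircleV : public CShapeV
{
    virtual altTypeId GetDynamicAltTypeId() const
    {
        return TAltTypeIdGen< CCircleV >::GetAltTypeId();
    }
};

// Fix 2: the base stores the id, which each constructor passes down the chain.
class CShapeM
{
public:
    // plain member access: no virtual call needed
    altTypeId GetDynamicAltTypeId() const { return m_uAltTypeId; }

protected:
    explicit CShapeM( altTypeId uAltTypeId ) : m_uAltTypeId( uAltTypeId ) {}

private:
    altTypeId m_uAltTypeId;
};

class CCircleM : public CShapeM
{
public:
    CCircleM() : CShapeM( TAltTypeIdGen< CCircleM >::GetAltTypeId() ) {}

protected:
    // lets classes deriving from CCircleM pass their own id through
    explicit CCircleM( altTypeId uAltTypeId ) : CShapeM( uAltTypeId ) {}
};

class CUnitCircleM : public CCircleM
{
public:
    CUnitCircleM() : CCircleM( TAltTypeIdGen< CUnitCircleM >::GetAltTypeId() ) {}
};

inline void CheckDynamicAltTypeIds()
{
    CCircleV circleV;
    const CShapeV& rShapeV = circleV;
    assert( rShapeV.GetDynamicAltTypeId() == TAltTypeIdGen< CCircleV >::GetAltTypeId() );

    CUnitCircleM unitCircle;
    const CShapeM& rShapeM = unitCircle;
    assert( rShapeM.GetDynamicAltTypeId() == TAltTypeIdGen< CUnitCircleM >::GetAltTypeId() );
    assert( rShapeM.GetDynamicAltTypeId() != TAltTypeIdGen< CCircleM >::GetAltTypeId() );
}
```

<p style="padding-left: 30px">CCircleM needs a second, protected constructor purely so that CUnitCircleM can pass its own id down the chain; that extra plumbing at every level is the &#8220;more tedious to write&#8221; part of the second approach.</p>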
<p style="padding-left: 30px">I&#8217;ve used both of these approaches myself, and they&#8217;re both fine &#8211; though the second one takes more explaining to get people to do it correctly.</p>
<h1>Summary</h1>
<p>The code is here if you want to just download it: <a href="http://codepad.org/uqoOP7WW">http://codepad.org/uqoOP7WW</a>.</p>
<p>Hopefully this was of some interest and not so bleedingly obvious that you, dear reader, were wondering why I bothered to post it.</p>
<p>I&#8217;ve had occasion to use this code maybe 4 or 5 times over the years since I originally wrote it and each time I dig it out I usually make a couple of tweaks so this is relatively mature code (I know it&#8217;s tiny, but it&#8217;s changed quite a lot from the original implementation!).</p>
<p>I&#8217;m 99.9% sure that with some extra template magic it could be made into something a lot cleverer &#8211; in fact I know it can because I built a reflection system off it, but I can&#8217;t give you the source to that since it belongs to Activision.</p>
<p>Incidentally, if you&#8217;re interested in templates I highly recommend the book &#8220;Modern C++ Design&#8221; by Andrei Alexandrescu, the reflection code I wrote drew heavily on this book. Ironically it&#8217;s 10 years old now and will probably still blow your mind gaskets unless you&#8217;re the kind of person who programs entirely in templates and contributes to boost. Some of the stuff in there is pure genius, and yet it&#8217;s almost entirely unused in games where its use would <em>make things good &#8482;</em> (e.g. policy based design).</p>
<p>Final thought; solutions 1 &amp; 2 in the previous section would also work with an enum containing a value for all derived types of the base class instead of a fancy templatey solution, but if you do it that way you will make Bjarne Stroustrup cry.</p>
<h1>Epilogue, or The Benefit of Peer Criticism</h1>
<p>Following on from comments made on this by Unai Lander, Michael Nicolella, and Nicolas Silva I have added this epilogue.</p>
<p>Unai suggested a more elegant solution, Michael pointed out a couple of possible issues with the run-time efficiency of the solution I posted originally, and Nicolas spotted what I was getting at with the phrase &#8220;template black magic&#8221;: specifically, if you want const int, volatile int, and int to return the same altTypeId, you will need to use a technique called &#8220;qualifier stripping&#8221;, which uses partial template specialisation and is covered in detail in the Alexandrescu book &#8220;Modern C++ Design&#8221;, page 44 [2.10.4 Stripping Qualifiers].</p>
<p>Unai&#8217;s solution removes the need to have the base type for the template Id generator type. Like many really elegant solutions it seems bleedingly obvious when you see it :)</p>
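<p>For the qualifier-stripping idea, a minimal sketch follows. It is written in the spirit of the technique described in Modern C++ Design rather than copied from the book, and TStripQualifiers, TAltTypeIdGenImpl and CheckQualifierStripping are hypothetical names; from C++11 onwards, std::remove_cv in &lt;type_traits&gt; does the same stripping for you.</p>

```cpp
#include <cassert>

typedef unsigned int altTypeId;

inline altTypeId NextAltTypeId()
{
    static altTypeId s_uNext = 0;
    return s_uNext++;
}

// Primary template: T has no top-level const/volatile, so leave it alone.
template< typename T > struct TStripQualifiers                     { typedef T Type; };
// Partial specialisations peel off top-level const, volatile, or both.
// (For const volatile T, the last one is the most specialised match.)
template< typename T > struct TStripQualifiers< const T >          { typedef T Type; };
template< typename T > struct TStripQualifiers< volatile T >       { typedef T Type; };
template< typename T > struct TStripQualifiers< const volatile T > { typedef T Type; };

// Only unqualified types ever get an id of their own.
template< typename T >
struct TAltTypeIdGenImpl
{
    static altTypeId GetAltTypeId()
    {
        static altTypeId s_uClassId = NextAltTypeId();
        return s_uClassId;
    }
};

// Public entry point: forwards to the id of the unqualified type.
template< typename T >
struct TAltTypeIdGen
{
    static altTypeId GetAltTypeId()
    {
        return TAltTypeIdGenImpl< typename TStripQualifiers< T >::Type >::GetAltTypeId();
    }
};

inline void CheckQualifierStripping()
{
    const altTypeId uInt = TAltTypeIdGen< int >::GetAltTypeId();
    assert( TAltTypeIdGen< const int >::GetAltTypeId() == uInt );
    assert( TAltTypeIdGen< volatile int >::GetAltTypeId() == uInt );
    assert( TAltTypeIdGen< const volatile int >::GetAltTypeId() == uInt );
    // only top-level qualifiers are stripped: pointer-to-const stays distinct
    assert( TAltTypeIdGen< const int* >::GetAltTypeId() != TAltTypeIdGen< int* >::GetAltTypeId() );
}
```

<p>Only top-level qualifiers are removed, so const int* (a pointer to const int) still gets a different altTypeId from int*, just as it would with std::remove_cv.</p>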

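<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;cstdint&gt;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// pointer-sized, so the address below fits on 64-bit as well as 32-bit targets</span>
<span style="color: #666666; font-style: italic;">// (reinterpret_cast from a pointer to a smaller integer type does not compile)</span>
<span style="color: #993333;">typedef</span> std<span style="color: #339933;">::</span><span style="color: #202020;">uintptr_t</span> altTypeId<span style="color: #339933;">;</span>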
&nbsp;
template<span style="color: #339933;">&lt;</span> typename T <span style="color: #339933;">&gt;</span>
class TAltTypeIdGenerator
<span style="color: #009900;">&#123;</span>
private<span style="color: #339933;">:</span>
    <span style="color: #666666; font-style: italic;">// no instance of this class can be created.</span>
    TAltTypeIdGenerator<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
public<span style="color: #339933;">:</span>
    <span style="color: #666666; font-style: italic;">// this generates a typeID for each class that instantiates the template</span>
    <span style="color: #993333;">static</span> altTypeId GetAltTypeId<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #666666; font-style: italic;">// I wasn't sure if the char would take up &gt; 1 byte because of alignment.</span>
        <span style="color: #666666; font-style: italic;">// With VS2010 on win32 they take up exactly 1 byte each (and in the pastebin too...)</span>
        <span style="color: #993333;">static</span> <span style="color: #993333;">char</span> chAddressOfThisIsTheTypeId<span style="color: #339933;">;</span>
        <span style="color: #b1b100;">return</span> reinterpret_cast<span style="color: #339933;">&lt;</span> altTypeId <span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span> <span style="color: #339933;">&amp;</span>chAddressOfThisIsTheTypeId <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

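<p>Michael pointed out that a function-local static which is initialised by calling a function (as s_uClassId is in TAltTypeIdGen) introduces the overhead of a conditional branch into the function: every call has to check whether the static has been initialised yet, which is pure overhead after the first call.</p>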
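<p>To make that branch visible without reading disassembly, the sketch below shows a hand-written equivalent of what the compiler has to do for a static initialised from a function call. It is a simplified, hypothetical illustration (pre-C++11 style and not thread-safe; since C++11 the compiler must also make this first-time initialisation thread-safe, which adds further work), not what any particular compiler literally emits. SetValue here is a stand-in that counts its calls.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for SetValue(): counts how many times it runs.
static int g_iSetValueCalls = 0;

int SetValue( void )
{
    ++g_iSetValueCalls;
    return 42;
}

// What you write: initialised by a function call on the first pass.
int functionInitFunction( void )
{
    static int iInitFromFunc = SetValue();
    return iInitFromFunc;
}

// Roughly what the compiler has to generate for you (simplified, not thread-safe):
// a hidden flag, tested with a conditional branch on every call.
int functionInitFunctionByHand( void )
{
    static bool s_bInitialised = false; // constant-initialised: no branch needed for this
    static int  s_iValue;               // zero-initialised storage for the "real" static
    if( !s_bInitialised )               // the branch paid on every call
    {
        s_iValue = SetValue();
        s_bInitialised = true;
    }
    return s_iValue;
}

// Assumes nothing else has called either function yet.
inline void CheckInitialiseOnce()
{
    assert( functionInitFunction() == 42 );
    assert( functionInitFunctionByHand() == 42 );
    assert( functionInitFunction() == 42 );
    assert( functionInitFunctionByHand() == 42 );
    // SetValue() ran once per function, not once per call
    assert( g_iSetValueCalls == 2 );
}
```

<p>The flag and its test are exactly what a zero-initialised or constant-initialised static does not need, because its value is already in place before the program starts running.</p>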
<p>So, I did a little empirical experimentation (in VS2010) to see what the assembler did for various cases of function local statics and here&#8217;s what I found:</p>

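<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// SetValue() is assumed to be defined in another .cpp, so its result is not known at compile time</span>
<span style="color: #993333;">int</span> SetValue<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> functionNotInit<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>     <span style="color: #666666; font-style: italic;">// no branches</span>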
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">static</span> <span style="color: #993333;">int</span> iNotInit<span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> iNotInit<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> functionNotInitInc<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span> <span style="color: #666666; font-style: italic;">// no branches</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">static</span> <span style="color: #993333;">int</span> iNotInit<span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> iNotInit<span style="color: #339933;">++;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> functionInitZero<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span> <span style="color: #666666; font-style: italic;">// no branches</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">static</span> <span style="color: #993333;">int</span> iInit <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> iInit<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> functionInitZeroInc<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span>    <span style="color: #666666; font-style: italic;">// no branches</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">static</span> <span style="color: #993333;">int</span> iInit <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> iInit<span style="color: #339933;">++;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> functionInitFunction<span style="color: #009900;">&#40;</span> <span style="color: #993333;">void</span> <span style="color: #009900;">&#41;</span> <span style="color: #666666; font-style: italic;">// conditional branch to call SetValue on first time through</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">static</span> <span style="color: #993333;">int</span> iInitFromFunc <span style="color: #339933;">=</span> SetValue<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> iInitFromFunc<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>So, the upshot is that, since an uninitialised function-local static appears to have no branch overhead associated with it, I have revised the code to use Unai&#8217;s method and whacked the new version into a pastebin for your pleasure here:</p>
<p><a href="http://codepad.org/RXZfzCpo">http://codepad.org/RXZfzCpo</a></p>
<p>Thanks to Unai, Michael, and Nicolas :)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/10/10/making-your-own-type-id-is-fun/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How I discovered my favourite templated list class</title>
		<link>http://www.altdevblogaday.com/2011/09/25/my-favourite-templated-list/</link>
		<comments>http://www.altdevblogaday.com/2011/09/25/my-favourite-templated-list/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 00:42:23 +0000</pubDate>
		<dc:creator>Alex Darby</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[fragmentation]]></category>
		<category><![CDATA[list]]></category>
		<category><![CDATA[template]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=16746</guid>
		<description><![CDATA[<h1>Templated list class?!?</h1>
<p>Really? This isn&#8217;t 1996, dude. We have the STL and the fancy new ISO-approved C++11 standard. We don&#8217;t need your stupid list class.</p>
<p>That&#8217;s fine. Sorry. How stupid of me &#8211; please feel free to ignore the rest of this blog.</p>
<p><a href="http://www.altdevblogaday.com/2011/09/25/my-favourite-templated-list/" class="more-link">Read more on How I discovered my favourite templated list class&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<h1>Templated list class?!?</h1>
<p>Really? This isn&#8217;t 1996, dude. We have the STL and the fancy new ISO-approved C++11 standard. We don&#8217;t need your stupid list class.</p>
<p>That&#8217;s fine. Sorry. How stupid of me &#8211; please feel free to ignore the rest of this blog.</p>
<p>I can only apologise for having wasted your time.</p>
<h1>My favourite templated list class</h1>
<p>If you&#8217;re still reading, well done &#8211; you have passed the test.</p>
<p>You receive the +1 Spectacles of Second-Hand Perspective, and the +2 Underpants of Questionable Enlightenment.</p>
<p>Please continue reading.</p>
<h2>Prelude</h2>
<p style="text-align: left">As other ADBAD posts about data structures have already said, there&#8217;s no such thing as a one-size-fits-all data structure.</p>
<p>Obviously you should avoid spending time writing new data structures unless you genuinely need to, and you shouldn&#8217;t optimise until you find a bottleneck. Sometimes, though, there are cases where you don&#8217;t strictly <em>need</em> to write your own, but if you don&#8217;t, you might sleep a little less soundly at night wondering what that generic class you used is up to when you&#8217;re not looking at it.</p>
<p>The templated list class I&#8217;m eventually going to give you the code for <em>definitely</em> doesn&#8217;t fit all, and I guess the absolute need for it is questionable, but it&#8217;s still my favourite templated list class.</p>
<p>Why, dear reader?  That&#8217;s what I&#8217;m about to tell you.</p>
<h2>The Backstory</h2>
<p>It was the last console hardware transition. Our middleware provider had been bought by a large publisher and had essentially gone out of business. New consoles were appearing daily. The company decided to do a couple of work-for-hire type projects whilst we planned how to hit the next-gen.</p>
<p>We looked at the other middleware available at the time. We decided that we didn&#8217;t want to use Unreal and that most other extant providers were no more likely to be around forever than our previous one. After much deliberation it was decided that a small group of us should roll our own in-house engine.</p>
<h2>What we were doing</h2>
<p>Going from mature, bleeding-edge middleware to a self-built engine is a big shock to the system.</p>
<p>Our initial platforms were PS3 / X360, but we made a conscious choice to leave the door open for Wii because even the early indications were that it might end up having the biggest install base. This meant that the multi-platform architecture had to be very flexible and modular, leaving room for the platforms to differ underneath while our libs worked the same on all of them.</p>
<p>Several of the guys involved were old hands at engine code and we knew what we were doing, but even working as fast as we could, we knew that the most we&#8217;d be able to get done in any sensible timescale was the basics, and that any features we added beyond that would need to be directly applicable to the target game to make them worth our while.</p>
<h2>What was on our mind</h2>
<p>A lot of our core concerns in writing this engine were about mitigating problems we&#8217;d experienced on our previous projects.</p>
<p>There was big picture stuff &#8211; like making sure the tool chain enabled assets to be added without needing code to be written so that the art and design teams could work effectively and iterate content without code issues slowing them up.</p>
<p>There was also nitty-gritty stuff &#8211; for example, our previous game had suffered no end of out-of-memory (OOM) problems caused by heap fragmentation. It was no-one&#8217;s fault directly, more a symptom of the fact that team sizes were growing, games were getting more complex, and tools hadn&#8217;t quite caught up. Everyone was doing their own thing to get the game done, and because the game was fine until it had been running for some time, none of the programmers had noticed it.</p>
<h2>Must&#8230; Allocate&#8230; Memory&#8230; Fragmenting.</h2>
<p>At the time we were relying on a slightly tweaked version of the basic memory manager that our middleware provider had given us. It had no way of checking for fragmentation.</p>
<p>Heap fragmentation sucks, especially if you don&#8217;t know it&#8217;s happening. We only really found out that it was happening between alpha and beta &#8211; when QA started hammering it.</p>
<p>A couple of us spent a very long time tidying it all up. It was a mess. So much code was blithely doing temporary allocations in amongst long-term ones. Some of it was just ill-considered decisions made at 2am under time pressure, but it turned out that a lot of it came down to either:</p>
<ul>
<li>Unexpected interactions between the asset management and graphics subsystems of the middleware when used in the way that our game was using them.</li>
<li>Insanely naive / inappropriate use of STL and other library code in the front end, in the code that wrote the persistent data that the gameplay relied on for settings etc. This was the killer.</li>
</ul>
<p>It was a fairly big job to sort out, and involved several large architectural changes to the loading system of the game, plus tearing out all the STL use in our game side code.</p>
<p>Just to be clear, the problem wasn&#8217;t caused directly by STL but by the way it was being used; it was simply quicker to tear it out than to fix it. Imagine a situation where someone had used STL wherever possible rather than where it made sense: for example, a std::vector&lt; std::tuple&lt; int, float &gt; &gt; instead of an array of structs each holding an int and a float, with the vector not pre-allocated even when its size was fixed.</p>
<p>I&#8217;m honestly not trying to start an STL fight. This code was so insane that it managed to read the data out of a 15 KB XML file and take up 750 KB in memory. That was nothing to do with STL itself, but the way it had been (ab)used really didn&#8217;t help.</p>
<p>The upshot of all this? Our Technical Director ended up as an anti-STL extremist, and he and several others (myself included) ended up borderline paranoid about memory fragmentation.</p>
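<p>As a rough illustration of the difference (the function names, types and placeholder values below are hypothetical, not from the code base in question), here is the tuple-in-a-growing-vector pattern next to a plain struct with the known size reserved up front, plus a plain array for when the count is fixed at compile time:</p>

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// The pattern described above: a vector of tuples, grown one push_back at a
// time even though the final count is known, so it may reallocate (and copy
// its contents) several times along the way. Values are placeholders.
std::vector<std::tuple<int, float>> LoadSettingsNaive(std::size_t uCount)
{
    std::vector<std::tuple<int, float>> settings;  // no reserve()
    for (std::size_t i = 0; i < uCount; ++i)
        settings.push_back(std::make_tuple(static_cast<int>(i), static_cast<float>(i) * 0.5f));
    return settings;
}

// A plain struct says what the fields actually mean...
struct SSetting
{
    int   iId;
    float fValue;
};

// ...and reserving the known size up front means one allocation, after which
// push_back never needs to reallocate.
std::vector<SSetting> LoadSettingsReserved(std::size_t uCount)
{
    std::vector<SSetting> settings;
    settings.reserve(uCount);
    for (std::size_t i = 0; i < uCount; ++i)
        settings.push_back(SSetting{ static_cast<int>(i), static_cast<float>(i) * 0.5f });
    return settings;
}

// If the count is fixed at compile time, a plain array needs no heap at all.
constexpr std::size_t kNumSettings = 16;  // hypothetical fixed count
SSetting g_aSettings[kNumSettings];
```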
<h2>What we did about it</h2>
<p>When we started writing our new in-house engine, the first thing done to address these issues was that the technical director banned STL and we wrote our own templated container classes. Sure, there were other things that could have been done, but that&#8217;s what <em>was</em> done.</p>
<p>Another thing was the creation of a memory management subsystem that had all sorts of nice features including fragmentation detection, multiple heaps for different sizes and types of allocs etc. etc.</p>
<p>We also made it one of our prime directives to do whatever we could to prevent fragmentation at the architectural level, so that it was harder for people to cause fragmentation by accident.</p>
<h2>So, about this template list class you mentioned&#8230;</h2>
<p>I&#8217;ve not forgotten. Honestly.</p>
<p>The asset management system was on my schedule. Sure, it&#8217;s pretty dry stuff, not everyone&#8217;s cup of tea, and definitely not a glory area like implementing a deferred bipolar thrumble edging render pass or whatever, but someone had to do it.</p>
<p>I&#8217;ll be honest, I like this sort of stuff. You get to stroke your chin and consider the &#8220;best way to do it ™&#8221;, and also have a good chance of getting the rare treat of writing code once and never having to change it again &#8211; other than the odd bug fix for a real life edge case you missed in your test runs.</p>
<p>With a prime directive of &#8220;prevent memory fragmentation&#8221;, I found it became very interesting. I stroked my chin and contemplatively nommed the arm of my glasses many times, discussed it over many cups of tea with other programmers, and eventually I came up with a plan that would work.</p>
<p>It did work, and it&#8217;s possibly the only bit of code I&#8217;ve ever written that I&#8217;ve been 100% happy with.</p>
<p>Simple, elegant, and completely bulletproof. Anti-fragmentation asset handling. The future.</p>
<p>There were a couple of teeny weeny little caveats on usage &#8211; much of the underlying architectural resistance to fragmentation came from the fact that it was sort of stack-like, so it always unloaded assets in the opposite order to loading them &#8211; but it was a small price to pay for being awesome and fragmentation-proof.</p>
<p>That&#8217;s another story though. The important thing is that my main architectural concern was preventing memory fragmentation, and the best way I know of to eliminate the risk of fragmenting memory entirely is not to allocate or deallocate anything.</p>
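<p>The stack-like idea is easy to sketch in isolation. Below is a generic LIFO allocator over a fixed buffer, written for illustration rather than taken from the engine (the class and method names are hypothetical, and alignment is ignored): freeing back to a marker releases everything allocated since that marker, so memory always comes back in the opposite order to which it went out and no holes can form.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical LIFO ("stack") allocator over a caller-supplied buffer.
// Allocations are carved off the top; the only way to free is to roll the
// top back to an earlier marker. Alignment is ignored for brevity.
class CStackAllocator
{
public:
    CStackAllocator(unsigned char* pBuffer, std::size_t uSize)
        : m_pBuffer(pBuffer), m_uSize(uSize), m_uTop(0) {}

    // Returns nullptr if the request doesn't fit in the remaining space.
    void* Alloc(std::size_t uBytes)
    {
        if (uBytes > m_uSize - m_uTop)
            return nullptr;
        void* pResult = m_pBuffer + m_uTop;
        m_uTop += uBytes;
        return pResult;
    }

    // Remember the current top before loading a group of assets...
    std::size_t GetMarker() const { return m_uTop; }

    // ...and free everything allocated since then in one go.
    void FreeToMarker(std::size_t uMarker)
    {
        assert(uMarker <= m_uTop);  // can only unwind, never skip ahead
        m_uTop = uMarker;
    }

    std::size_t BytesUsed() const { return m_uTop; }

private:
    unsigned char* m_pBuffer;
    std::size_t    m_uSize;
    std::size_t    m_uTop;
};
```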
<p>So, in an ideal world, this asset manager needed to store the managed assets in a data structure that didn&#8217;t have a fixed size, and which also didn&#8217;t allocate or deallocate memory.</p>
<h2>Just one more tangent.</h2>
<p>After I had been made paranoid about memory fragmentation and unintended allocations, I started to worry about the STL container classes.</p>
<p>Once I found out that you can mostly handle that sort of thing with an appropriate STL allocator, I got over it (it still bothers me that so few programmers I&#8217;ve known seem to worry about this stuff, but that&#8217;s largely out of my control).</p>
<p>However, I eventually came up against a use-case with generic containers that still bothered me, and it was essentially a property of the way generic containers have to work in order to be generic &#8211; so AFAIK there&#8217;s no workaround other than taking a different approach.</p>
<p>Consider a common situation where you&#8217;ve got a known number of pre-allocated objects and you&#8217;re using a dead list and an active list to manage which are being used and which are not.</p>
<p>What happens when you take something out of one list and put it into another?</p>
<p>The link element used to store the object in the list doesn&#8217;t move with the object, but it still has to go somewhere.</p>
<p>I reasoned that what&#8217;s actually going on in any given templated list implementation is likely to be somewhere between the following two cases:</p>
<p style="padding-left: 30px"><strong>Worst case:</strong> a list element is newed every time I insert something into a list, and deleted every time I take it out. This is potentially one free store alloc + constructor and one destructor + free store dealloc per object moved between lists.</p>
<p style="padding-left: 30px"><strong>Best case:</strong> list elements are pre-allocated for each list, so elements are used when an object is inserted and recycled when it&#8217;s removed. This still has an overhead for internal list element management, and also has to pre-allocate the maximum number of elements in both lists, which is wasteful.</p>
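<p>The worst case is easy to observe with a standard container. The sketch below assumes a std::list of object pointers and plugs in a hypothetical counting allocator (TCountingAllocator, written only for this illustration) so that the allocations become visible: moving an object from one list to the other with pop_front() and push_back() frees a node in the first list and allocates a fresh one in the second.</p>

```cpp
#include <cstddef>
#include <list>
#include <new>

// Global tally of allocate() calls (illustration only).
static std::size_t g_uAllocCount = 0;

// Hypothetical minimal C++11-style allocator that just counts allocations.
template <typename T>
struct TCountingAllocator
{
    using value_type = T;

    TCountingAllocator() = default;
    template <typename U>
    TCountingAllocator(const TCountingAllocator<U>&) {}  // needed for rebinding

    T* allocate(std::size_t n)
    {
        ++g_uAllocCount;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const TCountingAllocator<T>&, const TCountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const TCountingAllocator<T>&, const TCountingAllocator<U>&) { return false; }

struct SObject { int iId; };  // hypothetical pre-allocated object

using ObjectList = std::list<SObject*, TCountingAllocator<SObject*>>;

// Move the front object between lists the "generic" way: the node holding
// the pointer stays behind and is freed, and a new node is allocated.
void MoveFront(ObjectList& from, ObjectList& to)
{
    SObject* pObject = from.front();
    from.pop_front();       // deallocates a node in 'from'
    to.push_back(pObject);  // allocates a new node in 'to'
}
```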
<p>I honestly doubt there are many situations where this has become a bottleneck, but I just couldn&#8217;t bring myself to feel happy about the list element not moving with the object as it moves from one list to the other &#8211; it just seems wasteful and sloppy.</p>
<p>The way to work around this is bleedingly obvious, and is how most &#8211; if not all &#8211; heaps keep track of the memory they manage. You just put the list element into the object you&#8217;re going to store in the list.</p>
<p>This, as it happens, is also where the allocationless data structure comes in.</p>
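<p>In its very simplest form, the idea looks something like this: a singly linked free list of pooled objects, where the link pointer lives inside each object, much like a heap&#8217;s list of free blocks. The SBullet type and CBulletPool class are hypothetical and far simpler than a general list class, but they show the key property: Acquire() and Release() never allocate, they only rewrite pointers.</p>

```cpp
#include <cstddef>

// Hypothetical pooled object: its "next" link lives inside the object
// itself, so moving it on or off a list never needs a separate node.
struct SBullet
{
    SBullet* pNext = nullptr;  // intrusive link, used by whichever list holds it
    float    fX    = 0.0f;
    float    fY    = 0.0f;
};

// A singly linked free list threaded through the objects it manages.
class CBulletPool
{
public:
    static constexpr std::size_t kCapacity = 8;  // hypothetical pool size

    CBulletPool()
    {
        // Thread every pre-allocated bullet onto the free list once.
        for (std::size_t i = 0; i < kCapacity; ++i)
            Release(&m_aStorage[i]);
    }

    // Unlink the first free bullet; returns nullptr when the pool is empty.
    SBullet* Acquire()
    {
        SBullet* pBullet = m_pFreeHead;
        if (pBullet)
        {
            m_pFreeHead    = pBullet->pNext;
            pBullet->pNext = nullptr;
        }
        return pBullet;
    }

    // Push a bullet back onto the free list: two pointer writes, no allocation.
    void Release(SBullet* pBullet)
    {
        pBullet->pNext = m_pFreeHead;
        m_pFreeHead    = pBullet;
    }

private:
    SBullet  m_aStorage[kCapacity];
    SBullet* m_pFreeHead = nullptr;
};
```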
<h2>Ladies and Gentlemen&#8230;</h2>
<p>It is my profound pleasure to present to you my favourite template list class.</p>
<p>(<strong>Note: 12/10/2011</strong> &#8211; I found a bug in the code and have put the fixed code into the pastebin&#8230;)</p>
<p>In fact, it&#8217;s in a pastebin: <a href="http://codepad.org/hK6xO2bO">http://codepad.org/hK6xO2bO</a></p>
<p>Codepad is ace. It runs the code too, and so you can see the output as well as the nicely highlighted code. Yay!</p>
<p>Broadly speaking, the code works like this:</p>
<ul>
<li>There is a class for the templated list called CNoAllocList which does all the donkey work.</li>
<li>Instances of CNoAllocList can&#8217;t be created directly; it&#8217;s wrapped by a template called TNoAllocList, which is how you create one.</li>
<li>There is a base class called CNoAllocListable from which you derive the type you want to store.</li>
<li>You create an instance of TNoAllocList&lt;&gt; with your type as the template parameter&#8230;</li>
<li>&#8230;and then you can do all the normal sorts of list operations on it with instances of your CNoAllocListable-derived type.</li>
</ul>
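<p>To make that structure concrete, here is a compact sketch in the same shape. It is not the pastebin code: the three class names come from the list above, but the members, the method names (PushBack, PushFront, PopFront, Remove, IsEmpty) and the example type CParticle are assumptions made for illustration. It also assumes the head and tail are stored as the links of a sentinel node, which is one way of getting a single insert case and a single remove case. The typed TNoAllocList&lt;T&gt; wrapper only casts and forwards; all of the linking lives in the untyped CNoAllocList.</p>

```cpp
#include <cassert>
#include <type_traits>

// Sketch only: class names follow the post, all members are hypothetical.
// Derive the type you want to store from this; the links live in the object.
class CNoAllocListable
{
public:
    bool IsInList() const { return m_pNext != nullptr; }

protected:
    CNoAllocListable() = default;
    CNoAllocListable(const CNoAllocListable&) {}  // a copy starts unlinked
    CNoAllocListable& operator=(const CNoAllocListable&) { return *this; }  // keep own links
    ~CNoAllocListable() { assert(!IsInList()); }  // never destroy while linked

private:
    friend class CNoAllocList;
    CNoAllocListable* m_pPrev = nullptr;
    CNoAllocListable* m_pNext = nullptr;
};

// Untyped list that does the donkey work. Head and tail are the links of a
// sentinel node, so the list is circular and every element has neighbours.
class CNoAllocList
{
public:
    CNoAllocList(const CNoAllocList&) = delete;  // sentinel points at itself
    CNoAllocList& operator=(const CNoAllocList&) = delete;
    bool IsEmpty() const { return m_sentinel.m_pNext == &m_sentinel; }

protected:
    CNoAllocList() { m_sentinel.m_pPrev = m_sentinel.m_pNext = &m_sentinel; }
    ~CNoAllocList()
    {
        while (!IsEmpty()) Remove(m_sentinel.m_pNext);      // unlink leftovers
        m_sentinel.m_pPrev = m_sentinel.m_pNext = nullptr;  // sentinel "unlinked"
    }

    CNoAllocListable* Sentinel() { return &m_sentinel; }
    CNoAllocListable* First()    { return m_sentinel.m_pNext; }  // sentinel if empty

    // The only insertion case: link pItem in directly before pBefore.
    void InsertBefore(CNoAllocListable* pItem, CNoAllocListable* pBefore)
    {
        assert(!pItem->IsInList());  // an object can be in one list at a time
        pItem->m_pNext = pBefore;
        pItem->m_pPrev = pBefore->m_pPrev;
        pBefore->m_pPrev->m_pNext = pItem;
        pBefore->m_pPrev = pItem;
    }

    // The only removal case: no null checks, no head / tail special cases.
    void Remove(CNoAllocListable* pItem)
    {
        assert(pItem->IsInList());
        pItem->m_pPrev->m_pNext = pItem->m_pNext;
        pItem->m_pNext->m_pPrev = pItem->m_pPrev;
        pItem->m_pPrev = pItem->m_pNext = nullptr;
    }

private:
    CNoAllocListable m_sentinel;
};

// Typed wrapper: the way you actually create a list. It only casts/forwards.
template <typename T>
class TNoAllocList : public CNoAllocList
{
    static_assert(std::is_base_of<CNoAllocListable, T>::value,
                  "T must derive from CNoAllocListable");
public:
    void PushBack(T* pItem)  { InsertBefore(pItem, Sentinel()); }
    void PushFront(T* pItem) { InsertBefore(pItem, First()); }
    void Remove(T* pItem)    { CNoAllocList::Remove(pItem); }

    T* PopFront()
    {
        if (IsEmpty()) return nullptr;
        T* pItem = static_cast<T*>(First());
        Remove(pItem);
        return pItem;
    }
};

// Hypothetical stored type for the usage example.
struct CParticle : public CNoAllocListable
{
    int m_iId = 0;
};
```

<p>With this shape, moving a particle from a dead list to an active list is just deadList.PopFront() followed by activeList.PushBack(pParticle): the links travel with the object, and nothing is allocated or freed.</p>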
<p>A few little things to bear in mind:</p>
<ol>
<li>This probably isn&#8217;t code you would want to drop straight into an existing code base &#8211; it&#8217;s intended to be more illustrative than instantly usable.</li>
<li>I couldn&#8217;t post the original code as:
<ol>
<li> I don&#8217;t have it, and</li>
<li>(more importantly) it belongs to Activision.</li>
</ol>
</li>
<li>The template isn&#8217;t even vaguely STL container compliant, but it&#8217;s not supposed to be.</li>
<li>I&#8217;ve left writing an iterator as an exercise for you, but it is usable as is.</li>
<li>I&#8217;ve put lots of comments in the code which &#8211; whether you love or hate that &#8211; should explain it nicely.</li>
<li>I also like Hungarian notation, and tend to go a bit mad with access specifiers, so for that I apologise.</li>
</ol>
<h2>So, why is it my favourite?</h2>
<p>I&#8217;m sure the amount of fuzzy contentedness I get from this approach to lists is not normal, but I <em><strong>love</strong></em> it.</p>
<p>There are two reasons why:</p>
<p style="padding-left: 30px">1) It doesn&#8217;t allocate.</p>
<p style="padding-left: 30px">2) It only has one case for inserting links and one case for deleting links. This is possibly my overall favourite thing.  The same approach to storing the head &amp; tail that gives this property could be used in any list class.</p>
<p>Oh wait there&#8217;s three reasons:</p>
<p style="padding-left: 30px">3) The links move about between lists with the objects.</p>
<p style="padding-left: 30px">It&#8217;s not really important in the &#8220;generic list&#8221; case, but if you&#8217;re using a free list and an active list this saves all of the overhead of managing the list elements that the lists are built of.</p>
<p>Sorry, four reasons (&#8220;hold on, I&#8217;ll come in again&#8230;&#8221;):</p>
<p style="padding-left: 30px">4) The only real tradeoff for this awesomeness is that you can&#8217;t put a given object in more than one list simultaneously.</p>
<p style="padding-left: 30px">This might be fixable, but I&#8217;d be surprised if it didn&#8217;t need a significantly different implementation.</p>
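<p>How the head and tail are stored to get reason 2 isn&#8217;t spelled out here. One common way to get that property (an assumption, and not necessarily what the pastebin code does) is to make the list circular around a sentinel node whose own links act as the head and tail. The sketch below contrasts that with the usual pair of head / tail pointers; the struct names are hypothetical.</p>

```cpp
// Hypothetical link type; normally it would be embedded in the stored object.
struct SNode
{
    SNode* pPrev = nullptr;
    SNode* pNext = nullptr;
};

// Conventional head / tail pointers: null ends mean extra cases.
struct SHeadTailList
{
    SNode* pHead = nullptr;
    SNode* pTail = nullptr;

    void PushBack(SNode* pNode)
    {
        pNode->pPrev = pTail;
        pNode->pNext = nullptr;
        if (pTail) pTail->pNext = pNode;
        else       pHead = pNode;                // case: list was empty
        pTail = pNode;
    }

    void Remove(SNode* pNode)
    {
        if (pNode->pPrev) pNode->pPrev->pNext = pNode->pNext;
        else              pHead = pNode->pNext;  // case: removing the head
        if (pNode->pNext) pNode->pNext->pPrev = pNode->pPrev;
        else              pTail = pNode->pPrev;  // case: removing the tail
        pNode->pPrev = pNode->pNext = nullptr;
    }
};

// Sentinel: head and tail are the sentinel's links in a circular list, so
// every real node always has neighbours and each operation has one case.
struct SSentinelList
{
    SNode sentinel;

    SSentinelList() { sentinel.pPrev = sentinel.pNext = &sentinel; }
    SSentinelList(const SSentinelList&) = delete;             // holds self-pointers,
    SSentinelList& operator=(const SSentinelList&) = delete;  // so no copying

    bool IsEmpty() const { return sentinel.pNext == &sentinel; }

    void PushBack(SNode* pNode)
    {
        pNode->pPrev = sentinel.pPrev;
        pNode->pNext = &sentinel;
        sentinel.pPrev->pNext = pNode;
        sentinel.pPrev = pNode;
    }

    void Remove(SNode* pNode)
    {
        pNode->pPrev->pNext = pNode->pNext;
        pNode->pNext->pPrev = pNode->pPrev;
        pNode->pPrev = pNode->pNext = nullptr;
    }
};
```

<p>With null-terminated ends, both PushBack() and Remove() need branches for the empty / first / last cases; with the sentinel, each is straight-line code.</p>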
<h2>Parting thoughts</h2>
<p>As I mentioned at the start, TNoAllocList  is not a one size fits all data structure.</p>
<p>I think it&#8217;s probably most suited to the use case that inspired me to write it in the first place &#8211; managing free / active lists of objects.</p>
<p>I hope this wasn&#8217;t a waste of your time, and ideally I hope it was useful &#8211; or at least interesting.</p>
<p>The final thing to remember is, just because your list <em>can&#8217;t</em> fragment memory it doesn&#8217;t mean <em>you</em> won&#8217;t.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/09/25/my-favourite-templated-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
