<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>#AltDevBlogADay &#187; Niklas Frykholm</title>
	<atom:link href="http://www.altdevblogaday.com/author/niklas-frykholm/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.altdevblogaday.com</link>
	<description>Each day a little more #gamedev love</description>
	<lastBuildDate>Wed, 22 May 2013 11:16:20 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Code Share: Source Censoring, Part 2</title>
		<link>http://www.altdevblogaday.com/2013/04/30/code-share-source-censoring-part-2/</link>
		<comments>http://www.altdevblogaday.com/2013/04/30/code-share-source-censoring-part-2/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 15:38:24 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29389</guid>
		<description><![CDATA[<p>A while ago I shared the tool we use for <a class="link" href="http://www.altdevblogaday.com/2011/11/22/code-share-source-censoring/">censoring source code</a> in the Bitsquid engine.</p>
<p>Quick recap: We need to censor the code in our source distributions because there are parts of the code that are covered by NDAs to third parties and cannot be publicly disclosed. We do this with a tool that strips out the secret code and replaces it with blank lines, based on preprocessor definitions.</p>
<p><a href="http://www.altdevblogaday.com/2013/04/30/code-share-source-censoring-part-2/" class="more-link">Read more on Code Share: Source Censoring, Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>A while ago I shared the tool we use for <a class="link" href="http://www.altdevblogaday.com/2011/11/22/code-share-source-censoring/">censoring source code</a> in the Bitsquid engine.</p>
<p>Quick recap: We need to censor the code in our source distributions because there are parts of the code that are covered by NDAs to third parties and cannot be publicly disclosed. We do this with a tool that strips out the secret code and replaces it with blank lines, based on preprocessor definitions.</p>
<p>The stripping tool is only part of the solution, though. It works well if you only distribute code drops. You take a snapshot of the code, run the stripping tool to strip out secrets, zip it up. Done!</p>
<p>But frankly this is a <em>terrible</em> way of distributing source code. There is no history, no indication of what has changed from version to version, and keeping local changes merged with the mainline is a constant pain in the ass.</p>
<p>The only sane way of distributing source code is to expose a <em>mercurial</em> (or <em>git</em>) source repository that you can pull changes from. This lets customers examine the history, find out which version introduced a particular bug, maintain their own branches that they merge with the mainline at their convenience, etc.</p>
<p>But of course, we cannot just share our own internal repository (because it contains secrets).</p>
<h2>hg-clone.rb</h2>
<p>We handle this with another tool that we have inventively decided to call <em>hg-clone.rb</em>.</p>
<p>What <em>hg-clone.rb</em> does is pretty straightforward. Given two repositories as arguments, a <em>SOURCE</em> and a <em>DESTINATION</em>, it checks out each revision in the <em>SOURCE</em> repository, runs a <em>filter</em> program (to strip out any secrets) and checks the result into the destination repository.</p>
<pre>SRC:    0  --&gt; 1  --&gt; 2  --&gt; 3  --&gt; 4  --&gt; 5  --&gt; ...
    	|      |      |      |      |      |
    	F      F      F      F      F      F
    	|      |      |      |      |      |
        v      v      v      v      v      v
DST:    0' --&gt; 1' --&gt; 2' --&gt; 3' --&gt; 4' --&gt; 5' --&gt; ...</pre>
<p>You call the program as</p>
<pre>hg-clone SOURCE DESTINATION --filter FILTER --target TARGET-REV --cutoff CUTOFF-REV</pre>
<p><em>SOURCE</em> and <em>DESTINATION</em> are the source and destination repositories. <em>DESTINATION</em> does not need to exist; if it doesn&#8217;t, it will be created. <em>FILTER</em> is the filter program. It will be run once in the destination directory before each revision is committed.</p>
<p><em>TARGET-REV</em> is the target revision that should be copied from the source to the destination. <em>hg-clone</em> will first transfer the parent(s) of the target revision to the destination repository (if they haven&#8217;t already been transferred), then it will transfer the target revision. This process is applied recursively, so if the parents&#8217; parents haven&#8217;t been transferred, they will be transferred first, etc. Only the revisions that are ancestors of <em>TARGET-REV</em> will be transferred, so you can have secret development branches that won&#8217;t be copied to the destination until they have been merged with your release branch.</p>
<p>If you don&#8217;t specify a <em>TARGET-REV</em>, the last revision in the source repository will be used.</p>
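<p>The ancestors-first transfer can be sketched in a few lines of Python. This is only an illustration of the recursion, not the actual <em>hg-clone.rb</em> code: the <em>parents</em> table, the <em>transfer</em> function and the revision numbers are made up for the example, and the checkout, filter and commit steps are reduced to a comment.</p>

```python
def transfer(rev, parents, transferred, log):
    """Copy rev to the destination, but only after all of its ancestors."""
    if rev in transferred:
        return  # already copied by an earlier run or an earlier branch
    for parent in parents.get(rev, []):
        transfer(parent, parents, transferred, log)
    # A real tool would check out rev, run the filter and commit here.
    transferred.add(rev)
    log.append(rev)


# Hypothetical history: revision 3 is a secret branch that is not yet merged.
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2]}
transferred = set()
log = []
transfer(4, parents, transferred, log)
print(log)
```

<p>Revision 3 is not an ancestor of the target revision 4, so it is never copied. For a long history the recursion would have to be replaced with an explicit stack.</p>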
<p><em>CUTOFF-REV</em> can be used to cut off the recursive parent transfer at a specific revision. If you set the cutoff to revision 1000, then any revision that has a parent before revision 1000 will be reparented to revision 1000 in the destination repository. Essentially, in the destination repository, history will start at revision 1000. This can be used to hide a shady past.</p>
<p><em>hg-clone</em> tries its best to preserve authors, dates, messages, branches, etc. between the source and destination repositories. It cannot however preserve version numbers, since those are based on a content hash, which changes when the filter is applied. What it does instead is to insert a marker <em>[clonedfrom:116:91fe33c1a569]</em> in the commit message that specifies which revision in the source repository the current revision in the destination repository was cloned from. This marker is also used to determine the mapping between revisions in the source and the destination and whether a particular revision has already been copied over or not.</p>
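<p>As a rough sketch of what the reparenting rule means, here is a small Python function that rewrites a parent table. It assumes that revisions are identified by local revision numbers and that &#8220;before&#8221; means a smaller number. The function name <em>apply_cutoff</em> and the sample history are hypothetical and not taken from the tool.</p>

```python
def apply_cutoff(parents, cutoff):
    """Return the parent table as it would look in the destination repository."""
    result = {}
    for rev in sorted(parents):
        if rev < cutoff:
            continue  # history before the cutoff is not copied at all
        if rev == cutoff:
            result[rev] = []  # the cutoff revision becomes the new root
            continue
        kept = []
        for parent in parents[rev]:
            new_parent = max(parent, cutoff)  # a parent before the cutoff becomes the cutoff
            if new_parent not in kept:
                kept.append(new_parent)
        result[rev] = kept
    return result


# Hypothetical history with a branch (3) that forked off before the cutoff.
history = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
print(apply_cutoff(history, 2))
```

<p>In the result, revision 3 hangs off revision 2 instead of revision 1, and revisions 0 and 1 are gone.</p>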
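<p>Reading the marker back is a simple pattern match. The sketch below assumes the marker always has the form shown above (a revision number followed by a hexadecimal hash) and that it is appended to the end of the message. Where the real tool puts it is not something I have verified, and the helper names are made up for the example.</p>

```python
import re

# Matches markers such as [clonedfrom:116:91fe33c1a569].
MARKER = re.compile(r"\[clonedfrom:(\d+):([0-9a-f]+)\]")


def add_marker(message, rev, node):
    """Append a marker to a commit message (placement is an assumption)."""
    return "%s [clonedfrom:%d:%s]" % (message, rev, node)


def source_revision(message):
    """Return (rev, node) of the source revision, or None if there is no marker."""
    match = MARKER.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def build_mapping(destination_messages):
    """Map source revision numbers to positions in the destination log."""
    mapping = {}
    for dest_rev, message in enumerate(destination_messages):
        found = source_revision(message)
        if found is not None:
            mapping[found[0]] = dest_rev
    return mapping
```

<p>A source revision that does not show up in the mapping has not been copied yet.</p>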
<p>To use this in practice, you would typically set up one external repository for each customer with a corresponding filter program for stripping out the things that customer is not allowed to see. Then you would set up a <em>cron</em> job to run <em>hg-clone</em> and copy revisions from the main repository to the customer&#8217;s.</p>
<p>Instead of having one repository per customer, you could alternatively have one repository for each possible NDA combination (e.g., +PS3 +PS4 -X360). However, this can be problematic, because if a customer is cleared for an additional platform you will have to switch them over to a new repository, which might be painful. If you have one repository per customer you can just change the filter program.</p>
<p>The <em>hg-clone</em> program is available from <a class="link" href="https://bitbucket.org/bitsquid/hg_clone">our bitbucket repository</a>.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/04/30/code-share-source-censoring-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is gimbal lock and why do we still have to worry about it?</title>
		<link>http://www.altdevblogaday.com/2013/03/15/what-is-gimbal-lock-and-why-do-we-still-have-to-worry-about-it/</link>
		<comments>http://www.altdevblogaday.com/2013/03/15/what-is-gimbal-lock-and-why-do-we-still-have-to-worry-about-it/#comments</comments>
		<pubDate>Fri, 15 Mar 2013 14:17:48 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[euler angles]]></category>
		<category><![CDATA[gimbal lock]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29247</guid>
		<description><![CDATA[<p>If you have ever worked with rotations and Euler angles you are probably at least somewhat familiar with the phrase <em>&#8220;gimbal lock&#8221;</em>. But like many things concerning rotations, angles and spaces it can be tricky to visualize and get a good grasp of.</p>
<p><a href="http://www.altdevblogaday.com/2013/03/15/what-is-gimbal-lock-and-why-do-we-still-have-to-worry-about-it/" class="more-link">Read more on What is gimbal lock and why do we still have to worry about it?&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>If you have ever worked with rotations and Euler angles you are probably at least somewhat familiar with the phrase <em>&#8220;gimbal lock&#8221;</em>. But like many things concerning rotations, angles and spaces it can be tricky to visualize and get a good grasp of.</p>
<p>Sometimes it feels like every time I need to think about gimbal lock I have forgotten everything about it and have to go back to the beginning and ask myself: OK, but what is it <em>really</em> that is happening?</p>
<p>Hopefully, this article will take care of that problem.</p>
<p>The <a class="link" href="http://en.wikipedia.org/wiki/Gimbal_lock">Wikipedia page</a> shows how gimbal lock can happen in a mechanical system. But it isn&#8217;t necessarily self-evident how this translates to the computer game world. In the computer there are no mechanical limitations; we can rotate an object however we like. How can anything be &#8220;locked&#8221;?</p>
<h2>Euler angles</h2>
<p>When we are using <a class="link" href="http://en.wikipedia.org/wiki/Euler_angles">Euler angles</a>, we represent an object&#8217;s orientation as three consecutive rotations around the object&#8217;s axes. We can choose the axes and the order in which we apply the rotations arbitrarily, and depending on what we choose we get different Euler representations. So XYZ is the Euler representation where the first angle rotates the object around its X-axis, the second around its (new) Y-axis and the third around its (new) Z-axis. YZX gives us a different representation. We can even have representations with repeated axes, such as XZX.</p>
<p>So if we want to talk about the &#8220;Euler angles&#8221; of an object, we really must also talk about what axes we are rotating around and in what order. Otherwise we have no idea at all what we are talking about. Unfortunately, many articles about Euler angles are pretty sloppy with this and throw around terms like <em>yaw</em>, <em>pitch</em> and <em>roll</em> as if they had completely well-defined and unambiguous meanings. I prefer to use more wordy, but descriptive names, such as <em>euler_xyz[0]</em> that unambiguously state the axis rotation order and the index of the angle we are talking about.</p>
<p>An object has three rotational degrees of freedom and it is quite easy to see that the three Euler angles for a particular axis order (XYZ) are enough to define any possible orientation of an object. Note though that the representation is not unique. There are many possible Euler angles that represent the same orientation. For example, adding 360 degrees to any of the three angles will give us a different representation that results in the same object orientation.</p>
<p>So &#8220;gimbal lock&#8221; doesn&#8217;t mean that there are rotations that can&#8217;t be expressed as Euler angles. We can express any rotation in Euler angle form. Given an object, we can convert its orientation to Euler angles, and from that orientation we can rotate the object however we like and convert the new orientation to other Euler angles.</p>
<p>So what exactly is it that is &#8220;locked&#8221;? It seems we can do whatever we like.</p>
<h2>The &#8220;lock&#8221; in gimbal lock</h2>
<p>The term &#8220;gimbal lock&#8221; comes from the mechanical world. If the problem had originated in the world of computers, it would probably have been called something less confusing, such as &#8220;Euler angle flip&#8221; or &#8220;coordinate singularity&#8221;.</p>
<p>Because, in the computer world, there is really nothing that gets &#8220;locked&#8221;. Instead, the problem is this: when the Euler angles have particular values, there are orientations that are very similar to the current orientation which can&#8217;t be achieved by just making small changes to the Euler angles. In particular, for an axis order such as XYZ, this happens when the second angle is at 90 degrees, so that the first and third rotation axes coincide.</p>
<p>So even though the orientations are &#8220;close&#8221; in the real world, they are not close in the Euler representation. In fact, at least one of the Euler angles will have to flip 180 degrees in order for us to represent the new orientation.</p>
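<p>A small numerical check makes the singularity concrete. The sketch below assumes the XYZ order with rotations around the object&#8217;s own axes, which corresponds to multiplying the three axis matrices from left to right. The helper names are my own. With the middle angle at 90 degrees, the first and third angles turn the object around the same axis, so only their sum matters and one degree of freedom is gone.</p>

```python
import math


def rot(axis, degrees):
    """3x3 rotation matrix around the x, y or z axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_xyz(angles):
    """Orientation matrix for euler_xyz angles given in degrees."""
    return mul(mul(rot("x", angles[0]), rot("y", angles[1])), rot("z", angles[2]))


def close(a, b, eps=1e-9):
    """True if two matrices are equal up to rounding errors."""
    return all(abs(a[i][j] - b[i][j]) < eps for i in range(3) for j in range(3))


# At the singularity only the sum of the first and third angle matters.
locked = close(euler_xyz([10, 90, 20]), euler_xyz([30, 90, 0]))
# Away from it, the same change gives a different orientation.
free = not close(euler_xyz([10, 45, 20]), euler_xyz([30, 45, 0]))
print(locked, free)
```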
<p>So one of the angles has to flip? What is the big deal? Can&#8217;t we just flip it and get on with our stuff?</p>
<p>We can, as long as the angles only represent instantaneous &#8220;snapshots&#8221; of the object&#8217;s orientation. However, if the angles represent key frames in an animation and we want to interpolate between those key frames we run into trouble. If one of the angles flips 180 degrees between two key frames and we interpolate between those values, we will see the object animating through all those 180 degrees. In the viewport, we will see the object doing a &#8220;flip&#8221; or &#8220;roll&#8221; that shouldn&#8217;t be there.</p>
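<p>Here is an extreme case of that flip, again as a sketch that assumes the XYZ order around the object&#8217;s own axes and uses helper names of my own. The two key frames below are different Euler triples for exactly the same orientation, so nothing should move at all, yet interpolating the angles linearly swings the object far away and back.</p>

```python
import math


def rot(axis, degrees):
    """3x3 rotation matrix around the x, y or z axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_xyz(angles):
    """Orientation matrix for euler_xyz angles given in degrees."""
    return mul(mul(rot("x", angles[0]), rot("y", angles[1])), rot("z", angles[2]))


def angle_between(a, b):
    """Angle in degrees of the single rotation that takes orientation a to b."""
    trace = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
    cosine = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cosine))


def lerp(a, b, t):
    """Component-wise linear interpolation of two angle triples."""
    return [x + (y - x) * t for x, y in zip(a, b)]


key0 = [0.0, 80.0, 0.0]
key1 = [180.0, 100.0, 180.0]  # same orientation as key0, with flipped angles
middle = lerp(key0, key1, 0.5)

same_ends = angle_between(euler_xyz(key0), euler_xyz(key1))
detour = angle_between(euler_xyz(key0), euler_xyz(middle))
print(same_ends, detour)
```

<p>The first number is zero up to rounding errors, while the second one is roughly a half turn.</p>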
<p>Note that it is only the interpolation that creates this unwanted behavior. If we just displayed the actual key frames and didn&#8217;t interpolate between them &#8212; everything would look right. We could work in Euler angles as much as we liked and be as close to the gimbal lock position as we wanted and no-one would ever know.</p>
<p>So the only thing we need to fix to get rid of gimbal lock is the interpolation. If you have done any work with 3D graphics you probably already know the answer &#8212; to use quaternions instead of Euler angles to represent rotations. Quaternions don&#8217;t have the weird singularity points that Euler angles have and we can interpolate between any keyframes by just lerping the quaternions.</p>
<p>It doesn&#8217;t matter if the animation package is using Euler angles internally, as long as we convert everything to quaternions before we do the interpolation. Note that interpolation in quaternion space is not the same as interpolation in Euler space though, so to get as close as possible to what the animator intended, we probably want to sample the animation at our target frame rate and generate our quaternion key frames from those samples, rather than directly converting the animator&#8217;s key frames (which may be further apart).</p>
<p>Well, there is one caveat actually. If we have more than 180 degrees of rotation in a single frame we can&#8217;t represent that nicely with quaternions. Quaternions always lerp the shortest path between two orientations and you can&#8217;t represent several &#8220;laps&#8221; of rotation with quaternions as you can do with Euler angles (by setting one of the angles to 9000 degrees, for example). But you can fix that by sampling at a higher frame rate if you need to represent really fast rotations with quaternions.</p>
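<p>The shortest-path behavior is easy to see in code. The sketch below uses a normalized lerp (often called nlerp) on quaternions stored as (w, x, y, z). This is one common way of lerping quaternions and not necessarily what any particular engine does, and the function names are made up for the example.</p>

```python
import math


def nlerp(q0, q1, t):
    """Normalized linear interpolation between two unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q describe the same orientation, so pick the nearer one.
        q1 = [-c for c in q1]
    q = [a + (b - a) * t for a, b in zip(q0, q1)]
    length = math.sqrt(sum(c * c for c in q))
    return [c / length for c in q]


def about_z(degrees):
    """Unit quaternion for a rotation around the z axis."""
    half = math.radians(degrees) / 2.0
    return [math.cos(half), 0.0, 0.0, math.sin(half)]


def same(q0, q1, eps=1e-9):
    """True if two quaternions are equal up to rounding errors."""
    return all(abs(a - b) < eps for a, b in zip(q0, q1))


# Halfway between 0 and 90 degrees is 45 degrees, as expected.
short = nlerp(about_z(0), about_z(90), 0.5)
# Halfway between 0 and 270 degrees is -45 degrees, not 135:
# the interpolation goes the short way around.
wrapped = nlerp(about_z(0), about_z(270), 0.5)
print(same(short, about_z(45)), same(wrapped, about_z(-45)))
```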
<p>So with that we can say good bye and good riddance to Euler angles and never have to worry about their sorry gimbal locking asses ever again.</p>
<p>Or so you may think&#8230;</p>
<h2>The return of gimbal lock</h2>
<p>I certainly thought so, until I started working on the new cutscene animation system for our level editor.</p>
<p>You see, animators really like to work with <em>curves</em>. They like to see a visual representation of what the animation will do to an object over time with key points that can be moved and handles that can be adjusted to change the slope of the curve.</p>
<div id="attachment_29248" class="wp-caption aligncenter" style="width: 970px"><a href="http://www.altdevblogaday.com/wp-content/uploads/2013/03/curves.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2013/03/curves.png" alt="Curves! Animators love them!" width="960" height="587" class="size-full wp-image-29248" /></a><p class="wp-caption-text">Curves! Animators love them!</p></div>
<p>Quaternions are great for interpolation, but they are no good for curve editing.</p>
<p>Sure, you could probably draw some curves that represented a quaternion (the laziest thing would be to just draw the x, y and z components of the quaternion), but those curves wouldn&#8217;t <em>mean</em> anything to an animator, the way the Euler angle curves do. They wouldn&#8217;t be able to <em>do</em> anything with them.</p>
<p>So, animators want curves with keyframe interpolation. Curves need Euler angles. But what happens when we mix Euler angles with keyframe interpolation? Presto! Our old friend the gimbal lock is back again! Haven&#8217;t we missed him.</p>
<p>That&#8217;s it. We&#8217;re stuck. Gimbal lock is here to stay and the animators will just have to work around it.</p>
<p>And we have to add support for all the usual tricks and workarounds that animators use to get around gimbal lock, such as changing the axis order (from XYZ to YZX, XZX or another of the twelve possible axis orders), converting to quaternion and back again, applying an &#8220;Euler filter&#8221;, etc.</p>
<p>But who said this game engine gig should be easy?</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/03/15/what-is-gimbal-lock-and-why-do-we-still-have-to-worry-about-it/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Bug in Object Replication and Message Reordering</title>
		<link>http://www.altdevblogaday.com/2013/03/01/a-bug-in-object-replication-and-message-reordering/</link>
		<comments>http://www.altdevblogaday.com/2013/03/01/a-bug-in-object-replication-and-message-reordering/#comments</comments>
		<pubDate>Fri, 01 Mar 2013 16:13:59 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[migration]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[peer to peer]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29212</guid>
		<description><![CDATA[<p>The Bitsquid network system supports a peer-to-peer model with <em>migration</em> of network objects &#8212; i.e., changing the owner of a network object from one peer to another. This recently led to a rare race condition.</p>
<p><a href="http://www.altdevblogaday.com/2013/03/01/a-bug-in-object-replication-and-message-reordering/" class="more-link">Read more on A Bug in Object Replication and Message Reordering&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>The Bitsquid network system supports a peer-to-peer model with <em>migration</em> of network objects &#8212; i.e., changing the owner of a network object from one peer to another. This recently led to a rare race condition.</p>
<p>To understand this bug you must first understand a little bit about how our network system works.</p>
<h2>Background</h2>
<p>The entire network model is based on a packet delivery system (on top of UDP) that provides ACKs for unreliable packets as well as a reliable (and ordered) packet stream between any two network endpoints. At the next layer we have implemented a remote-procedure-call service for Lua as well as an object replication system.</p>
<p>Games can use these services however they like, but our recommendation is to do as much as possible with the object replication system and as little as possible with RPC calls, since using explicit RPC messages tends to require more bandwidth and be more error prone.</p>
<p>The network can be run in both client-server and peer-to-peer mode. The only difference is that in client-server mode, the server relays all messages (clients never talk directly to each other) and owns most of the network objects. (Clients <em>can</em> own objects in client-server mode; in that case the changes to the objects are relayed by the server.)</p>
<p>Objects are replicated with a message stream that looks something like this:</p>
<pre>A: CREATE [wait for ack] UPDATE_1 UPDATE_2 ... UPDATE_n DESTROY</pre>
<p>Here, <em>A</em> (the owner of the object) first sends a reliable message that creates the object. When it has received an <em>ACK</em> for that message, it starts sending updates, informing the other players about changes to the object. (By monitoring ACKs, it knows which changes the other players have received, so it only sends updates when necessary and it will resend updates if the messages are lost.) Finally, at some future point, the object is destroyed, through another reliable message.</p>
<p>The <em>UPDATE</em> messages are sent on the unreliable stream (for maximum performance), so they can potentially arrive before <em>CREATE</em> or after <em>DESTROY</em>. But this is not a problem, because we simply ignore <em>UPDATE</em> messages that arrive out of order.</p>
<p>This approach ensures that everybody that participates in the game session will see the same set of objects with the same properties (at least eventually, keeping in mind that messages can be delayed).</p>
<h2>Migration</h2>
<p>Migration complicates this picture somewhat.</p>
<p>Migrating a network object means changing the owner of the object from one peer to another. There are a number of reasons why you might want to do that. First, if a player drops out of the game, the objects owned by that player may need to be taken over by somebody else. Second, in a peer-to-peer game we may want to load balance, so that each peer is managing about the same number of objects. Finally, sometimes a particular player is interacting directly with a particular object (picking up a rock, etc). It can then be beneficial to make that player owner of the object, so that the interaction is not affected by network latency.</p>
<p>In our network, migration is implemented with a reliable <em>MIGRATE</em> message that tells everybody in the session about the object&#8217;s new owner. The migration message is always sent by a special peer, the <em>HOST</em> of the game session. (To ensure that peers do not compete for the ownership of an object.)</p>
<p>So if we look at a message stream with migration involved, it looks something like this:</p>
<pre>   A:  C Ua Ua Ua Ua Ua
HOST:                    M_ab
   B:                          Ub Ub Ub Ub Ub Ub Ub Ub D</pre>
<p>If you are an experienced network programmer you should start to smell trouble at this point.</p>
<p>The problem is that while the message system provides an ordered stream of messages between any two endpoints, there is no ordering of messages between <em>different</em> endpoints.</p>
<p>Consider an additional network peer <em>X</em>. There is an ordered stream of messages <em>A → X</em>. There is also an ordered stream of messages <em>B → X</em>. But there is no guaranteed ordering between the messages sent from <em>A</em> and the messages sent from <em>B</em> and <em>HOST</em>. So, suppose the messages from <em>A → X</em> are delayed. Then <em>X</em> could see the following message stream:</p>
<pre>M_ab Ub Ub Ub D C Ua Ua Ua</pre>
<p>So <em>X</em> gets a request to migrate the object before it has been created. And the creation message arrives after <em>DESTROY</em>. In other words, a complete mess.</p>
<p>To be sure, this only happens if the object gets migrated <em>really</em> close to being created or deleted and if there are asymmetric network delays on top of that. But of course, it always happens to <em>someone</em>.</p>
<h2>The Fix</h2>
<p>There are many possible ways of fixing this. Here are some:</p>
<ul>
<li>
<p><em>We could impose a global message ordering.</em> We could make sure that the reliable message streams are globally ordered to prevent &#8220;paradoxes&#8221; of this kind. I.e., if <em>HOST</em> sends <em>M_ab</em> after receiving <em>C</em>, no peer should receive <em>M_ab</em> before <em>C</em>. Unfortunately, this is not as easy as it sounds. For example, what if <em>A</em> dies before it has sent <em>C</em> to <em>X</em>? In that case, that failed delivery will also block the channels <em>HOST → X</em> and <em>B → X</em>, since they are not allowed to deliver any messages before <em>X</em> has received <em>C</em>.</p>
</li>
<li>
<p><em>We could use a migration handshake.</em> We could do some kind of handshake procedure to make sure that everybody has received <em>M_ab</em>, before <em>B</em> takes over ownership. But this would require a lot of extra messages and temporarily put the object in limbo.</p>
</li>
<li>
<p><em>We could fix the ACKs.</em> We could make it so that <em>X</em> doesn&#8217;t <em>ACK</em> <em>M_ab</em> until <em>C</em> has arrived, thus forcing <em>HOST</em> to keep resending it, until we are ready to receive it. This would work, but would require us to implement ACKing of individual messages. Currently, we just ACK an entire UDP packet (containing many messages) on reception, which is simpler and more performant.</p>
</li>
<li>
<p><em>We could create an internal message queue.</em> We could queue up migration, create and delete messages in some sort of internal queue if they arrive out of order and try to fix things up later. This is a truly horrible &#8220;solution&#8221; that increases code complexity and is likely to cause lots of confusing bugs in the future.</p>
</li>
</ul>
<p>All these solutions are probably workable, but they all have the drawback of increasing complexity. And I <em>really</em> don&#8217;t like to increase the complexity of network code. Reasoning about network code is hard enough as it is; we should always strive for the simplest solution possible.</p>
<p>So, instead, the first thing I did was to simplify the problem by eliminating the host from the equation. I simply let the new owner send out the migration message instead of the host:</p>
<pre>   A:  C Ua Ua Ua Ua Ua
   B:                    M_ab Ub Ub Ub Ub Ub Ub Ub Ub D</pre>
<p>This is already a lot better. Now we only have two parties to worry about (apart from <em>X</em>), instead of three.</p>
<p>We still want the host to be <em>in charge</em> of migration. Otherwise we run into tricky problems of what should happen if several peers try to assume ownership of an object at the same time. So we let the host initiate the migration by sending a message to the new owner (<em>B</em>). Then, <em>B</em> is responsible for notifying everybody else about this.</p>
<p>With this approach, we can use the same &#8220;wait for ack&#8221; trick that we used during creation to make sure that <em>B</em> doesn&#8217;t send any updates to peers that haven&#8217;t acked the migration:</p>
<pre>   A:  C [wait] Ua Ua Ua Ua Ua
   B:                            M_ab [wait] Ub Ub Ub Ub Ub Ub Ub Ub D</pre>
<p>We still haven&#8217;t completely solved the problem. <em>X</em> can still see weird message orderings such as:</p>
<pre>M_ab   C   D
M_ab   D   C</pre>
<p>But this won&#8217;t be a problem as long as we do two things:</p>
<ul>
<li>
<p>We let <em>MIGRATE</em> act as a <em>CREATE</em> message, if we get <em>MIGRATE</em> for an object that doesn&#8217;t exist.</p>
</li>
<li>
<p>We ignore &#8220;old&#8221; <em>CREATE</em> messages. (The <em>C</em> that arrives after <em>M</em>.)</p>
</li>
</ul>
<p>To be able to distinguish old messages I introduced a <em>migration counter</em>. This is just a number that starts at zero when the object is created and is increased (by <em>HOST</em>) every time the object is migrated.</p>
<p>We tag all <em>CREATE</em>, <em>DESTROY</em> and <em>MIGRATE</em> messages with the migration counter and simply ignore &#8220;old&#8221; messages. With this approach, the message streams will look like this:</p>
<pre>   A:  C_0 [wait] Ua Ua Ua Ua Ua
   B:                             M_ab_1 [wait] Ub Ub Ub Ub Ub Ub Ub Ub D_1</pre>
<p>We can now verify that all possible message orderings that <em>X</em> can see work correctly:</p>
<pre>C_0      M_ab_1  D_1  -- ok, the expected order
M_ab_1   C_0     D_1  -- ok, M_ab_1 creates the object with migration counter 1 and C_0 is ignored
M_ab_1   D_1     C_0  -- ok, M_ab_1 creates the object with migration counter 1 and C_0 is ignored</pre>
<p>The system works equally well if there are multiple migration steps:</p>
<pre>   A:  C_0 [wait] Ua Ua 
   B:                   M_ab_1 [wait] Ub Ub Ub
   C:                                            M_bc_2 [wait] Uc Uc Uc D_2</pre>
<p>No matter in which order the messages arrive we will end up in the correct state.</p>
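<p>This claim can be checked by brute force. The sketch below is not our engine code. It models what <em>X</em> remembers about one object, enumerates every order in which the per-sender streams can be merged, and feeds each order to the receiver. It relies on one assumption that is not spelled out above: <em>X</em> keeps the highest migration counter it has seen even after the object is destroyed, so that a late <em>C_0</em> can still be recognized as old. All names are made up for the example.</p>

```python
class Replica:
    """What peer X remembers about one network object."""

    def __init__(self):
        self.exists = False
        self.owner = None
        self.counter = -1  # highest migration counter seen, kept after DESTROY

    def receive(self, kind, counter, owner=None):
        if kind == "DESTROY":
            if counter >= self.counter:
                self.exists = False
                self.counter = counter
            return
        # CREATE and MIGRATE are treated alike: MIGRATE for an unknown
        # object creates it, and anything with an old counter is ignored.
        if counter <= self.counter:
            return
        self.exists = True
        self.owner = owner
        self.counter = counter


def interleavings(streams):
    """Yield every merge of the streams that keeps each stream in order."""
    if all(len(s) == 0 for s in streams):
        yield []
        return
    for i, s in enumerate(streams):
        if s:
            rest = streams[:i] + [s[1:]] + streams[i + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail


def run(order):
    """Feed one message ordering to a fresh replica and return it."""
    replica = Replica()
    for message in order:
        replica.receive(*message)
    return replica


# One reliable stream per sender: A creates, B migrates, C migrates and destroys.
from_a = [("CREATE", 0, "A")]
from_b = [("MIGRATE", 1, "B")]
from_c = [("MIGRATE", 2, "C"), ("DESTROY", 2)]

orders = list(interleavings([from_a, from_b, from_c]))
all_destroyed = all(not run(order).exists for order in orders)
print(len(orders), all_destroyed)
```

<p>Every ordering ends with the object destroyed, and cutting an ordering short right before the <em>DESTROY</em> leaves the object owned by the peer with the highest counter received so far.</p>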
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/03/01/a-bug-in-object-replication-and-message-reordering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Lua?</title>
		<link>http://www.altdevblogaday.com/2013/02/19/why-lua/</link>
		<comments>http://www.altdevblogaday.com/2013/02/19/why-lua/#comments</comments>
		<pubDate>Tue, 19 Feb 2013 08:23:00 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[lua]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29194</guid>
		<description><![CDATA[<p>A question that I get asked regularly is why we have chosen <a class="link" href="http://www.lua.org/">Lua</a> as our engine scripting language. I guess as opposed to more well-known languages, such as JavaScript or C#. The short answer is that Lua is lighter and more elegant than both those languages. It is also faster than JavaScript and more dynamic than C#.</p>
<p><a href="http://www.altdevblogaday.com/2013/02/19/why-lua/" class="more-link">Read more on Why Lua?&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>A question that I get asked regularly is why we have chosen <a class="link" href="http://www.lua.org/">Lua</a> as our engine scripting language. I guess as opposed to more well-known languages, such as JavaScript or C#. The short answer is that Lua is lighter and more elegant than both those languages. It is also faster than JavaScript and more dynamic than C#.</p>
<p>When we started Bitsquid, we set out four key design principles for the engine:</p>
<ul>
<li>
<p><strong>Simplicity</strong>. (A small, manageable codebase with a minimalistic, modular design.)</p>
</li>
<li>
<p><strong>Flexibility</strong>. (A completely data-driven engine that is not tied to any particular game type.)</p>
</li>
<li>
<p><strong>Dynamism</strong>. (Fast iteration times, with hot reload of everything on real target platforms.)</p>
</li>
<li>
<p><strong>Speed</strong>. (Excellent multicore performance and cache-friendly data-oriented layouts.)</p>
</li>
</ul>
<p>Whenever we design new systems for the engine, we always keep these four goals in mind. As we shall see below, Lua does very well on all four counts, which makes it a good fit for our engine.</p>
<h2>Simplicity in Lua</h2>
<p>As I grow older (and hopefully more experienced) I find myself appreciating simplicity more and more. My favorite scripting language has gone from &#8220;Swiss army chainsaw&#8221; <a class="link" href="http://www.perl.org/">Perl</a> (I claim youthful ignorance!) to &#8220;kitchen-drawer-esque&#8221; <a class="link" href="http://www.ruby-lang.org/en/">Ruby</a>, to minimalistic <a class="link" href="http://www.lua.org/">Lua</a>.</p>
<p>Lua is <em>really small</em> for a programming language. The entire Lua syntax fits on a single page. In fact, here it is:</p>
<pre>chunk ::= {stat [`;´]} [laststat [`;´]]
block ::= chunk
stat ::=  varlist `=´ explist | 
     functioncall | 
     do block end | 
     while exp do block end | 
     repeat block until exp | 
     if exp then block {elseif exp then block} [else block] end | 
     for Name `=´ exp `,´ exp [`,´ exp] do block end | 
     for namelist in explist do block end | 
     function funcname funcbody | 
     local function Name funcbody | 
     local namelist [`=´ explist] 
laststat ::= return [explist] | break
funcname ::= Name {`.´ Name} [`:´ Name]
varlist ::= var {`,´ var}
var ::=  Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name 
namelist ::= Name {`,´ Name}
explist ::= {exp `,´} exp
exp ::=  nil | false | true | Number | String | `...´ | function | 
     prefixexp | tableconstructor | exp binop exp | unop exp 
prefixexp ::= var | functioncall | `(´ exp `)´
functioncall ::=  prefixexp args | prefixexp `:´ Name args 
args ::=  `(´ [explist] `)´ | tableconstructor | String 
function ::= function funcbody
funcbody ::= `(´ [parlist] `)´ block end
parlist ::= namelist [`,´ `...´] | `...´
tableconstructor ::= `{´ [fieldlist] `}´
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp
fieldsep ::= `,´ | `;´
binop ::= `+´ | `-´ | `*´ | `/´ | `^´ | `%´ | `..´ | 
     `&lt;´ | `&lt;=´ | `&gt;´ | `&gt;=´ | `==´ | `~=´ | 
     and | or
unop ::= `-´ | not | `#´</pre>
<p>The same minimalistic philosophy is applied across the entire language, from the standard libraries to the C interface to the actual language implementation. You can understand all of Lua by just understanding a few key concepts.</p>
<p>Lua&#8217;s simplicity and small size do not mean that it lacks features. Rather, it is just <em>really well</em> designed. It comes with a small set of orthogonal features that can be combined in lots of interesting ways. This gives the language a feeling of elegance, which is quite rare in the programming world. It is not a perfect language (perfect languages don&#8217;t exist), but it is a little gem that fits very well into its particular niche. In that way, Lua is similar to C (the original, not the C++ monstrosity): it has a nice small set of features that fit very well together. (I suspect that Smalltalk and LISP also have this feeling of minimalistic elegance, but I haven&#8217;t done enough real-world programming in those languages to really be able to tell.)</p>
<p>As an example of how powerful Lua&#8217;s minimalism can be, consider this: Lua does not have a class or object system, but that doesn&#8217;t matter, because you can implement a class system in about 20 lines or so of Lua code. In fact, here is one:</p>
<pre>function class(klass, super)
    if not klass then
        klass = {}
        
        local meta = {}
        meta.__call = function(self, ...)
            local object = {}
            setmetatable(object, klass)
            if object.init then object:init(...) end
            return object
        end
        setmetatable(klass, meta)
    end
    
    if super then
        for k,v in pairs(super) do
            klass[k] = v
        end
    end
    klass.__index = klass
    
    return klass
end</pre>
<p>If you prefer prototype based languages &#8212; no problem &#8212; you can make a prototype object system in Lua too. </p>
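<p>As a rough sketch of what that could look like (the names <em>Object</em> and <em>clone</em> are made up for this example, they are not part of Lua), a prototype system needs little more than a metatable whose <em>__index</em> points back at the object being cloned:</p>
<pre>-- A minimal prototype-based object system (illustrative sketch).
local Object = {}

-- Create a new object that delegates missing fields to its prototype.
function Object:clone(fields)
    local child = fields or {}
    setmetatable(child, { __index = self })
    return child
end

local animal = Object:clone({ sound = "..." })
function animal:speak()
    return self.name .. " says " .. self.sound
end

local dog = animal:clone({ name = "Rex", sound = "woof" })
local puppy = dog:clone({ name = "Bit" })

print(dog:speak())    -- Rex says woof
print(puppy:speak())  -- Bit says woof</pre>
<p>Here <em>puppy</em> has no <em>sound</em> or <em>speak</em> field of its own. Both are found by walking the prototype chain through <em>dog</em> and <em>animal</em>.</p>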
<p>Smallness and simplicity make everything easier. They make Lua easier to learn, read, understand, port, master and optimize. A project such as LuaJIT, created by a single developer, would not have been possible for a more complicated language.</p>
<h2>Flexibility in Lua</h2>
<p>Lua is a fully featured language, and in the Bitsquid engine, Lua is not just used as an extension language; it has direct control over the gameplay loop. This means that you have complete control over the engine from Lua. You can create completely different games by just changing the Lua code. (Examples: First person medieval combat <a class="link" href="http://www.waroftherosesthegame.com/">War of the Roses</a>, top-down RTS <a class="link" href="http://www.kratergame.com/">Krater</a>, beat-em-up platformer <a class="link" href="http://www.theshowdowneffect.com/">Showdown</a> and hand-held puzzler <a class="link" href="http://www.tegrazone.com/games/hamiltons">Hamilton</a>.)</p>
<h2>Dynamism in Lua</h2>
<p>Unlike C#, which only has limited support for <em>Edit and Continue</em>, Lua makes it possible to reload <em>everything</em> &#8212; the entire program &#8212; on all target platforms, including consoles, mobiles and tablets.</p>
<p>This means that gameplay programmers can work on the code, tweak constants, fix bugs and add features without having to restart the game. And they can do this while running on the real target hardware, so that they know exactly what performance they get, how the controls feel and how much memory they are using. This enables fast iterations which is the key to increasing productivity and improving quality in game development.</p>
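<p>To give a feel for why reloading is so natural in Lua, here is a small sketch. It is not the actual Bitsquid reload mechanism, and the <em>game</em> table and <em>reload</em> function are invented for the example. The idea is simply that functions are ordinary values, so new code can be compiled at runtime and swapped in while the game state stays where it is:</p>
<pre>-- Illustrative hot-reload sketch; all names here are hypothetical.
local compile = loadstring or load  -- loadstring in Lua 5.1/LuaJIT, load in 5.2+

local game = {
    state = { score = 10 },  -- survives reloads
    logic = {},              -- functions replaced on reload
}

-- Compile new source and swap its functions into game.logic.
-- On a syntax or runtime error the old code is left untouched.
local function reload(source)
    local chunk, err = compile(source)
    if not chunk then return false, err end
    local ok, new_logic = pcall(chunk)
    if not ok then return false, new_logic end
    for name, fn in pairs(new_logic) do
        game.logic[name] = fn
    end
    return true
end

reload("return { bonus = function(state) return state.score + 1 end }")
print(game.logic.bonus(game.state))  -- 11

-- The programmer tweaks a constant; the state is preserved.
reload("return { bonus = function(state) return state.score + 5 end }")
print(game.logic.bonus(game.state))  -- 15

-- A broken edit is rejected and the previous version keeps running.
print(reload("return { bonus = function("))  -- false, plus an error message
print(game.logic.bonus(game.state))  -- 15</pre>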
<h2>Speed of Lua</h2>
<p>Measuring the performance of a language is always tricky, but by most accounts, <a class="link" href="http://luajit.org/">LuaJIT 2</a> is one of the fastest dynamic language implementations in the world. It outperforms other dynamic languages on many benchmarks, often by a substantial margin.</p>
<p>On the platforms where JITting isn&#8217;t allowed, LuaJIT can be run in interpreter mode. The interpreter mode of LuaJIT is very competitive with other non-JITed language implementations.</p>
<p>Furthermore, Lua has a very simple C interoperability interface (simplified further by LuaJIT FFI). This means that in performance critical parts of the code it is really easy to drop into C and get maximum performance.</p>
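<p>As a small illustration of the FFI route, the sketch below declares a C function with its normal prototype and calls it directly. It assumes the code runs under LuaJIT and that the C runtime exports <em>abs</em>. Under plain Lua the <em>ffi</em> module does not exist, so the example guards for that:</p>
<pre>-- Requires LuaJIT; under plain Lua the ffi module is not available.
local has_ffi, ffi = pcall(require, "ffi")

if has_ffi then
    -- Declare the C function we want to call.
    ffi.cdef("int abs(int x);")
    -- ffi.C is the default namespace for symbols from the C runtime.
    print(ffi.C.abs(-42))
else
    print("ffi not available; run this under LuaJIT")
end</pre>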
<h2>Lua&#8217;s weak points</h2>
<p>As I said above, no language is perfect. The things I miss most when programming in Lua don&#8217;t have that much to do with the actual language, but rather with the ecosystem around it. C# has spoiled me with things like an integrated debugger, Intellisense, a very active <a class="link" href="http://stackoverflow.com/">StackOverflow</a> community and the wonderfully helpful <a class="link" href="http://www.jetbrains.com/resharper/">ReSharper</a>. Lua has no &#8220;official&#8221; debugger, and not much in the way of autocompletion or refactoring tools.</p>
<p>Some people would argue that this shouldn&#8217;t be counted as an argument against Lua, since it doesn&#8217;t really concern the <em>language</em> Lua. I disagree. A language is not a singular, isolated thing. It is part of a bigger programming experience. When we judge a language we must take that entire experience into account: Can you find help in online forums? Are there any good free-to-use development tools? Is the user base fragmented? Can you easily create GUIs with native look-and-feel? Etc.</p>
<p>The lack of an official debugger is not a huge issue. Lua has an excellent debugging API that can be used to communicate with external debuggers. Using that API you can quite easily write your own debugger (we have) or integrate a debugger into your favorite text editor. Also, quite recently, the <a class="link" href="http://unknownworlds.com/blog/lua-ide-decoda-open-source/">Decoda IDE</a> was open sourced, which means there is now a good open source debugger available.</p>
<p>Getting autocompletion and refactoring to work well with Lua is trickier. Since Lua is dynamically typed the IDE doesn&#8217;t know the type of variables, parameters or return values. So it doesn&#8217;t know what methods to suggest. And when doing refactoring operations, it can&#8217;t distinguish between methods that have the same name, but operate on different types.</p>
<p>But I don&#8217;t think it necessarily <em>has</em> to be this way. An IDE could do type inference and try to guess the type of variables. For example, if a programmer started to write something like this:</p>
<pre>local car = Car()
car:</pre>
<p>the IDE could infer that the variable <em>car</em> was of type <em>Car</em>. It could then display suitable autocompletion information for the <em>Car</em> class.</p>
<p>Lua&#8217;s dynamic nature makes it tricky to write type inference code that is guaranteed to be 100 % correct. For example, a piece of Lua code could dynamically access the global <em>_G</em> table and change the <em>math.sin()</em> function so that it returned a string instead of a number. But such examples are probably not <em>that</em> common in regular Lua code. Also, autocompletion backed by type inference could still be very useful to the end user even if it wasn&#8217;t always 100 % correct.</p>
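<p>To make the <em>math.sin()</em> case concrete, this is the kind of perfectly legal code that would defeat a purely static analysis:</p>
<pre>local original_sin = math.sin

-- Replace math.sin through the global table at runtime.
_G.math.sin = function(x)
    return "not a number"
end

print(type(math.sin(0)))  -- string

-- Restore the real function.
_G.math.sin = original_sin
print(type(math.sin(0)))  -- number</pre>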
<p>Type inference could be combined with explicit type hinting to cover the cases where the IDE was not able to make a correct guess (such as for functions exposed through the C API). Hinting could be implemented with a specially formatted comment that specified the type of a variable or a function:</p>
<pre>-- @type Car -&gt; number
function top_speed(car)
    ...
end</pre>
<p>In the example above, the comment would indicate that <em>top_speed</em> is a function that takes a <em>Car</em> argument and returns a <em>number</em>.</p>
<p>Type hinting and type inference could also be used to detect &#8220;type errors&#8221; in Lua code. For example, if the IDE saw something like this:</p>
<pre>local bike = Bicycle()
local s = top_speed(bike)</pre>
<p>it could conclude that since <em>bike</em> is probably a <em>Bicycle</em> object and since <em>top_speed</em> expects a <em>Car</em> object, this call will probably result in a runtime error. It could indicate this with a squiggly red line in the source code.</p>
<p>I don&#8217;t know of any Lua IDE that really explores this possibility. I might try it for my next hack day.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/02/19/why-lua/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Garbage Collection and Memory Allocation Sizes</title>
		<link>http://www.altdevblogaday.com/2013/01/31/garbage-collection-and-memory-allocation-sizes/</link>
		<comments>http://www.altdevblogaday.com/2013/01/31/garbage-collection-and-memory-allocation-sizes/#comments</comments>
		<pubDate>Thu, 31 Jan 2013 15:21:52 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[lua]]></category>
		<category><![CDATA[memory management]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29153</guid>
		<description><![CDATA[<p>As a performance conscious programmer in a soft-realtime environment I&#8217;ve never been too fond of garbage collection.</p>
<p>Incremental garbage collectors (like the one in Lua) make it tolerable (you get rid of the horrible garbage collection stalls), but there is still something unsettling about it. I keep looking at the garbage collection time in the profiler, and I can&#8217;t shake the feeling that all that time is wasted, because <em>it doesn&#8217;t really do anything</em>.</p>
<p><a href="http://www.altdevblogaday.com/2013/01/31/garbage-collection-and-memory-allocation-sizes/" class="more-link">Read more on Garbage Collection and Memory Allocation Sizes&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>As a performance conscious programmer in a soft-realtime environment I&#8217;ve never been too fond of garbage collection.</p>
<p>Incremental garbage collectors (like the one in Lua) make it tolerable (you get rid of the horrible garbage collection stalls), but there is still something unsettling about it. I keep looking at the garbage collection time in the profiler, and I can&#8217;t shake the feeling that all that time is wasted, because <em>it doesn&#8217;t really do anything</em>.</p>
<p>Of course that isn&#8217;t true. Garbage collection frees the programmers from a lot of busywork. And the time they gain can go into optimizing other systems, which leads to a net performance win.</p>
<p>It also simplifies some of the hairy ownership questions that arise when data is transferred between systems. Without garbage collection, those questions must be solved in some other way. Either by reference counting (error-prone) or by making local copies of the data to assume ownership (ugly and costly).</p>
<p>But still, there is that annoying performance hit. </p>
<p>I was pretty surprised to see that the developers of <a class="link" href="http://golang.org/">Go</a>, a language that looks well-designed and targets low-level programmers, decided to go with garbage collection rather than manual memory management. It seemed like a strange choice.</p>
<p>But recently I&#8217;ve started to see things differently.</p>
<p>One thing I&#8217;ve noticed as I delve deeper and deeper into data-oriented design is that I tend to allocate memory in much larger chunks than before. It&#8217;s a natural consequence of trying to keep things contiguous in memory, treating resources as large memory blobs and managing arrays of similar objects together.</p>
<p>This has interesting consequences for garbage collection, because when the garbage collector only has to keep track of a small number of large chunks, rather than a large number of small chunks, it can perform a lot better.</p>
<p>Let&#8217;s look at a simple example in Lua. Say we want to write a class for managing bullets. In the non-data-oriented solution, we allocate each bullet as a separate object:</p>
<pre>function Bullet:update(dt)
    self.position = self.position + self.velocity * dt
end

function Bullets:update(dt)
    for i,bullet in ipairs(self.bullets) do
        bullet:update(dt)
    end
end</pre>
<p>In the data-oriented solution, we instead use two big arrays to store the position and velocity of <em>all</em> the bullets:</p>
<pre>function Bullets:update(dt)
    for i=1,#self.pos do
        self.pos[i] = self.pos[i] + dt * self.vel[i]
    end
end</pre>
<p>I tested these two solutions with a large number of bullets and got two interesting results:</p>
<ul>
<li>
<p>The data-oriented solution runs <strong>50 times</strong> faster.</p>
</li>
<li>
<p>The data-oriented solution only needs <strong>half</strong> as much time for garbage collection.</p>
</li>
</ul>
<p>That the data-oriented solution runs so much faster shows what cache coherence can do for you. It is also a testament to how awesome LuaJIT is when you give it tight inner loops to work with.</p>
<p>Note that in this test, the <em>Bullet</em> code itself did not create any garbage. The speed-up comes from being faster at collecting the garbage created by <em>other systems</em>. And the reason for this is simply that with fewer, larger memory allocations, there is less stuff that the garbage collector has to trawl through. If we add in the benefit that the data-oriented solution will create fewer objects and generate less garbage, the benefits will be even greater.</p>
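<p>A rough way to see the difference in what the collector has to keep track of is to compare the two layouts directly. The sketch below is not the benchmark described above. It assumes one-dimensional positions stored as plain numbers, and the helper names are made up. It only measures heap size with <em>collectgarbage("count")</em>, not collection time:</p>
<pre>-- One table per bullet: n small objects for the collector to traverse.
local function make_bullet_objects(n)
    local bullets = {}
    for i = 1, n do
        bullets[i] = { pos = i, vel = 1 }
    end
    return bullets
end

-- Data-oriented layout: two large arrays of plain numbers.
local function make_bullet_arrays(n)
    local pos, vel = {}, {}
    for i = 1, n do
        pos[i] = i
        vel[i] = 1
    end
    return { pos = pos, vel = vel }
end

-- Returns the heap growth in KB caused by calling make(n).
local function kb_used_by(make, n)
    collectgarbage("collect")
    local before = collectgarbage("count")
    local data = make(n)
    collectgarbage("collect")
    local after = collectgarbage("count")
    return after - before, data
end

local n = 100000
local object_kb = kb_used_by(make_bullet_objects, n)
local array_kb = kb_used_by(make_bullet_arrays, n)
print(string.format("one table per bullet: %.0f KB", object_kb))
print(string.format("two big arrays:       %.0f KB", array_kb))</pre>
<p>The exact numbers depend on the Lua implementation, but the first layout creates 100000 separate tables that the collector must visit, while the second creates three.</p>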
<p>So maybe the real culprit isn&#8217;t garbage collection, but rather having many small memory allocations. And having many small memory allocations does not just hurt the garbage collector, it is bad for other reasons as well. It leads to bad cache usage, high overhead in the memory allocator, fragmentation and bad allocator performance. It also makes all kinds of memory problems harder to deal with: memory leaks, dangling pointers, tracking how much memory is used by each system, etc.</p>
<p>So it is not just garbage-collected languages like Lua that would benefit from allocating memory in larger chunks, but manually managed languages like C++ as well.</p>
<p>Recently, I&#8217;ve come to think that the best solution to memory management issues in C++ is to avoid the kitchen-sink global memory allocator as much as possible and instead let each subsystem take a much more hands-on approach to managing its own memory.</p>
<p>What I mean by this is that instead of having the sound system (for example) send lots of memory requests to the kitchen-sink memory manager, it would only request a few large memory blocks. Then, it would be the responsibility of the system to divide that up into smaller, more manageable pieces that it can make practical use of.</p>
<p>This approach has a number of advantages:</p>
<ul>
<li>
<p>Since the system knows the usage patterns for its data, it can arrange the memory efficiently. A global memory allocator has no such knowledge.</p>
</li>
<li>
<p>It becomes much easier to track memory use by system. There will be a relatively small number of global memory allocations, each tagged by system. It becomes obvious how much memory each system is consuming.</p>
</li>
<li>
<p>Memory <em>inside</em> a system can be easily tracked, since the system knows what the memory <em>means</em> and can thus give useful information about it (such as the name of the object that owns it).</p>
</li>
<li>
<p>When a system shuts down it can quickly and efficiently free all of its memory.</p>
</li>
<li>
<p>Fragmentation problems are reduced.</p>
</li>
<li>
<p>It actively encourages good memory behavior. It makes it easier to do good things (achieve cache locality, etc) and harder to do bad things (lots of small memory allocations).</p>
</li>
<li>
<p>Buffer overflows will tend to overwrite data within the same system or cause page faults, which will make them easier to find.</p>
</li>
<li>
<p>Dangling pointer access will tend to cause page faults, which will make them easier to find.</p>
</li>
</ul>
<p>I&#8217;m tempted to go so far as to <em>only</em> allow <em>whole page allocations</em> on the global level. I.e., a system would only be allowed to request memory from the global manager in chunks of whole system pages. Then it would be up to the system to divide that up into smaller pieces. For example, if we did the bullet example in C++, we might use one such chunk to hold our array of <em>Bullet</em> structs.</p>
<p>This has the advantage of completely eliminating external fragmentation. (Since everything is allocated in chunks of whole pages and they can be remapped by the memory manager.) We can still get address space fragmentation, but using a 64-bit address space should take care of that. And with this approach using 64-bit pointers is less expensive, because we have fewer individually allocated memory blocks and thus fewer pointers.</p>
<p>Instead we get internal fragmentation. If we allocate the bullet array as a multiple of the page size (say 4 K), we will on average have 2 K of wasted space at the end of the array (assuming the number of bullets is random).</p>
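<p>The arithmetic is easy to sketch. The bullet size of 24 bytes below is a made-up figure for illustration, and the page size is the 4 K from the text:</p>
<pre>local PAGE_SIZE = 4096  -- assumed page size (4 K)

-- Returns the number of whole pages needed for `bytes` and the unused
-- space at the end of the last page (the internal fragmentation).
local function page_allocation(bytes)
    local pages = math.ceil(bytes / PAGE_SIZE)
    return pages, pages * PAGE_SIZE - bytes
end

local BULLET_SIZE = 24  -- hypothetical: six 4-byte floats per bullet

print(page_allocation(1000 * BULLET_SIZE))  -- 6 pages, 576 bytes wasted
print(page_allocation(1024 * BULLET_SIZE))  -- 6 pages, 0 bytes wasted</pre>
<p>The waste is always less than one page per allocation, and it averages out to half a page when the sizes are evenly spread.</p>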
<p>But internal fragmentation is a <em>much nicer</em> problem to deal with than external fragmentation. When we have internal fragmentation, it is one particular system that is having trouble. We can go into that system and do all kinds of things to optimize how it&#8217;s handling memory and solve the problem. With external fragmentation, the problem is <em>global</em>. There is no particular system that owns it and no clear way to fix it other than to try lots of things that we hope might &#8220;improve&#8221; the fragmentation situation.</p>
<p>The same goes for out-of-memory problems. With this approach, it is very clear which system is using too much memory and easy to fix that by reducing the content or doing optimizations to that system.</p>
<p>Dealing with bugs and optimizations on a system-by-system basis simplifies things enormously. It is quite easy to get a good grasp of everything that happens in a particular system. Grasping everything that happens in the entire engine is a superhuman task.</p>
<p>Another nice thing about this approach is that it is quite easy to introduce it on a system-by-system basis. All we have to do is to change one system at a time so that it allocates its memory using the page allocator, rather than the kitchen-sink allocator. </p>
<p>And if we have some messy systems left that are too hard to convert to this approach we can just let them keep using the kitchen-sink allocator. Or, even better, we can give them their own private heaps in memory that they allocate from the page allocator. Then they can make whatever mess they want there, without disturbing the other systems.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/01/31/garbage-collection-and-memory-allocation-sizes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Four meditations on bad design decisions</title>
		<link>http://www.altdevblogaday.com/2012/12/11/four-meditations-on-bad-design-decisions/</link>
		<comments>http://www.altdevblogaday.com/2012/12/11/four-meditations-on-bad-design-decisions/#comments</comments>
		<pubDate>Tue, 11 Dec 2012 15:25:12 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[rewrite]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28847</guid>
		<description><![CDATA[<p>I&#8217;ve recently been doing a major rewrite of one of our core engine systems, the graph that we use for our visual scripting language <em>Flow</em>. Taking it from something that looks like this:</p>
<p><a href="http://www.altdevblogaday.com/2012/12/11/four-meditations-on-bad-design-decisions/" class="more-link">Read more on Four meditations on bad design decisions&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve recently been doing a major rewrite of one of our core engine systems, the graph that we use for our visual scripting language <em>Flow</em>. Taking it from something that looks like this:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-old-flow.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-old-flow-300x164.png" alt="" width="300" height="164" class="aligncenter size-medium wp-image-28852" /></a></p>
<p>To something that looks like this:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-new-flow.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-new-flow-300x154.png" alt="" width="300" height="154" class="aligncenter size-medium wp-image-28851" /></a></p>
<p>A major rewrite like this is always a humbling experience. When you have to rewrite your own code, every bad decision you made comes back to haunt you. And you don&#8217;t have anybody else to blame them on.</p>
<p>As if facing your own inadequacy wasn&#8217;t enough &#8212; rewriting an existing system is always harder than writing one from scratch. When you write a new system you start with a blank slate and can do whatever you want. When you rewrite, you are constrained by what the old system did &#8212; at least if you want to maintain any kind of backwards compatibility.</p>
<p>In addition, a new system can be written iteratively. You can start with a very small, simple system, release early and get feedback. Based on that feedback you can tweak the system. You don&#8217;t have to think about adding features until you have a good stable base.</p>
<p>When you are doing a rewrite you can&#8217;t release the new system until it is <em>at least as good</em> as the old one. Otherwise, your users will question why you have spent all that time working on a system that is <em>worse</em> than what you had before. And they will be right.</p>
<p>So a rewrite forces you away from the comfortable land of early releases and quick iterations and into the ugly old waterfall model.</p>
<p>With the power of hindsight, I&#8217;d like to reflect a bit on four design mistakes I made when I wrote the first version of the system that made this rewrite a lot harder than it could have been.</p>
<h2>Don&#8217;t use strings for non-text things</h2>
<p>Strings have one really good use: to hold pieces of text that either get displayed to or entered by the user. All other uses of strings should be regarded as suspicious.</p>
<p>Strings are scary because they are both ambiguous and powerful. Do &#8220;a/b.txt&#8221; and &#8220;A//b.txt&#8221; represent the same path? Hard to tell. But maybe you can use case conversion, search and replace and some regular expression monstrosity to figure that out.</p>
<p>If you are doing that kind of string manipulation in any part of the code that is not directly related to user input or output, it is a clear warning sign that your code might be too &#8220;stringified&#8221;.</p>
<p>The most obvious example of stringified code is the use of &#8220;stringly typed&#8221; data, for example, storing a date as the string &#8220;2012-12-09&#8221;. But the problem with strings can also manifest in more subtle ways.</p>
<p>The original version of Flow used strings to identify connectors, both internally (as a representation of the connection) and visually (to show the name of the connector):</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-connector-name.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-connector-name-300x260.png" alt="" width="300" height="260" class="aligncenter size-medium wp-image-28849" /></a></p>
<p>As a consequence, a Flow node couldn&#8217;t have two connectors with the same name, and a connector couldn&#8217;t be renamed (even visually) without breaking all existing connections.</p>
<p>In retrospect, rather than having a single <em>Name</em> property, it would be much better to have separate <em>Id</em> and <em>DisplayName</em> properties. The <em>Id</em> would be a <em>GUID</em> that uniquely identified the property, and the <em>DisplayName</em> would be a (localizable) name, suitable for displaying to the end user.</p>
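<p>A minimal sketch of that split, written in Lua for brevity rather than the C# the editor uses. All names are hypothetical, and a simple counter stands in for a real GUID generator:</p>
<pre>-- A counter stands in for a real GUID generator in this sketch.
local next_id = 0
local function new_id()
    next_id = next_id + 1
    return "id-" .. next_id
end

local function new_connector(display_name)
    return { id = new_id(), display_name = display_name }
end

-- Two connectors may share a display name, since identity lives in the id.
local out_a = new_connector("Out")
local out_b = new_connector("Out")
local target = new_connector("In")

-- Connections store ids only, never names.
local connections = { { from = out_a.id, to = target.id } }

-- Renaming is purely cosmetic and leaves the connection intact.
out_a.display_name = "Spawned"
print(connections[1].from == out_a.id)  -- true
print(out_a.id ~= out_b.id)             -- true</pre>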
<p>Using names/strings as identifiers has bitten me in other ways as well. In one system I knew that the names had to be unique (because that is how the script would refer to the objects) so I thought it would be safe to use them as identifiers. What I didn&#8217;t consider was that there could be situations when there <em>temporarily</em> were two objects that had the same name. For example, if the user had created a <em>rock</em> object, and wanted to create a <em>rock_small</em> object &#8212; as she was half-way through typing that name, there would suddenly be two objects named <em>rock</em>. This created problems for the system.</p>
<p>Lesson learned, I now avoid using strings as identifiers.</p>
<h2>When in doubt, you should opt-out</h2>
<p>Every system acquires features over time. That is good of course. Those features make the system more powerful and easier to work with.</p>
<p>But among the good features there are usually a few that don&#8217;t feel quite right. That don&#8217;t really fit into the design of the system. You can <em>do</em> them of course. You can do anything.</p>
<p>But usually it is best not to. Most of the time when I have added a feature that didn&#8217;t quite feel right, I have regretted it later. In retrospect it would have been better to try to find a different way of doing what the users wanted that was more natural to the ideas behind the system.</p>
<p>An example: Users of Flow wanted some way to specify the order in which events were triggered, when multiple connections are connected to the same <em>Out</em> connector.  This is needed in some situations, for example you may want to make sure that a unit is spawned before it is used.</p>
<p>In the old version of Flow, this was implemented with a context menu on the connection where you could select if it should be a &#8220;Do First&#8221;, &#8220;Do Last&#8221; or &#8220;Do Normal&#8221; connection.</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-do-first.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-do-first-300x230.png" alt="" width="300" height="230" class="aligncenter size-medium wp-image-28850" /></a></p>
<p>This solution never felt 100 % right to me. It was hard to find a good intuitive way to visually represent the &#8220;Do First&#8221; and &#8220;Do Last&#8221; connections, and as a result the Flow graphs became harder to understand. </p>
<p>In retrospect, it would have been much better to avoid this feature and wait until I had come up with the more elegant alternative: a sequence node that triggers each of its outputs sequentially:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-sequence.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-sequence-300x241.png" alt="" width="300" height="241" class="aligncenter size-medium wp-image-28853" /></a></p>
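<p>In code, the idea behind such a node is tiny. The following is only a sketch of the concept with made-up names, not the Flow implementation. The order lives in the node itself, so nothing special has to be attached to the connections:</p>
<pre>-- Hypothetical sequence node: calling it fires each output in array order.
local function sequence_node(outputs)
    return function(...)
        for _, output in ipairs(outputs) do
            output(...)
        end
    end
end

local log = {}
local spawn_then_use = sequence_node({
    function(unit) log[#log + 1] = "spawn " .. unit end,
    function(unit) log[#log + 1] = "use " .. unit end,
})

spawn_then_use("knight")
print(table.concat(log, ", "))  -- spawn knight, use knight</pre>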
<h2>Be explicit or you&#8217;ll miss it</h2>
<p>Writing code where a lot of things happen implicitly feels great &#8212; to begin with. It is amazing how much you are able to do with just a few lines of code.</p>
<p>But in my experience, implicit code almost always ends up more costly in the long run. It is harder to understand, harder to debug and harder to change. It tends to lock you down in a &#8220;local minimum&#8221; that can be tricky to come out of.</p>
<p>In Flow, a lot of things are done implicitly. The definition of a Flow node is just a simple C# class:</p>
<pre>[Category("Animation")]
public class AnimationEvent : Node
{
    public InVariableUnit Unit;
    public StringVariable Event;
    public InEvent In;
    public OutEvent Out;
}</pre>
<p>Through reflection, Flow finds out the members in the class and their types and automatically generates Flow nodes for them. This process involves some ugly string processing (bad decision #1), such as stripping <em>In</em> and <em>Variable</em> from the type name to find the underlying type of members. Reflection is also used to serialize the graphs.</p>
<p>While it is nice to be able to express a node so concisely, there are also a lot of problematic consequences. For example, since the class names get serialized, we can&#8217;t change the names of classes or properties without breaking the ability to load old files. Also, we have to use some <em>really</em> ugly C# hacks to make sure that the reflection system always returns the members of a class in the order they are declared in the file (so that we can control the order of the connectors).</p>
<p>In retrospect, it would have been much better to avoid all this clever reflection stuff and instead just define the node types in configuration files.</p>
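<p>What such a definition might look like is sketched below as a Lua table. The format, the field names and the connector kinds are all assumptions made for illustration. The point is only that ids, display names, types and connector order are stated explicitly instead of being derived from class and member names:</p>
<pre>-- Hypothetical explicit definition of the AnimationEvent node.
local animation_event = {
    id = "animation_event",  -- stable id written to saved files (could be a GUID)
    display_name = "Animation Event",
    category = "Animation",
    -- Connector order is simply the array order.
    connectors = {
        { id = "unit",  display_name = "Unit",  kind = "in_variable", type = "unit" },
        { id = "event", display_name = "Event", kind = "variable",    type = "string" },
        { id = "in",    display_name = "In",    kind = "in_event" },
        { id = "out",   display_name = "Out",   kind = "out_event" },
    },
}

for index, connector in ipairs(animation_event.connectors) do
    print(index, connector.display_name, connector.kind)
end</pre>
<p>With a definition like this, a class or a connector can be renamed in the UI without touching the ids that old files refer to.</p>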
<h2>Avoid the road of complex code</h2>
<p>There is some code that needs to be complex, because it is dealing with fundamentally tricky stuff (like computational geometry) or because it needs to run <em>really</em>, <em>really</em> fast. But in all other cases, complexity is just a cost.</p>
<p>If your code starts to feel complex and hard to keep track of, it is a sign that you are probably doing something wrong. And if you are not careful, you may lock yourself in, so that when you write the next version of the system, you have to recreate all that complex behavior in your new, simpler system. You have to deliberately make your code uglier.</p>
<p>The old version of Flow had a way of &#8220;folding&#8221; nodes. You could select a number of nodes, group them, and then &#8220;fold&#8221; the group, collapse it to a single node.</p>
<p>The system had a lot of really hairy code for dealing with this. The code takes a bunch of nodes and creates a new node from them, with connectors matching only the external connectors of the collapsed nodes. It also keeps track of the internal nodes and their connections so they can be recreated if the node is later &#8220;expanded&#8221;.</p>
<p>As you might imagine, this was complicated further by the need for connector names to be unique (see bad decision #1), which meant that some of the external connectors in the new node had to be renamed (since they could come from different internal nodes that had connectors with the same name). So a mapping table was needed to keep track of these renames. Obviously a bad idea, but once you have started down the path of wrongness, it can be hard to turn around.</p>
<p>The new version handles this a lot better. Collapse and expansion are purely visual features. No new nodes are created and nothing strange happens to the data. The visualizer just chooses to draw the data in a different way when it is collapsed. In retrospect, that is a much better choice.</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-collapsed.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/12/meditations-collapsed-300x104.png" alt="" width="300" height="104" class="aligncenter size-medium wp-image-28848" /></a></p>
<pre>That is all, four simple lessons
to guide your future coding sessions
now let your code be light and merry
until it's time for Charon's ferry</pre>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/12/11/four-meditations-on-bad-design-decisions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Formal Language for Data Definitions</title>
		<link>http://www.altdevblogaday.com/2012/11/17/a-formal-language-for-data-definitions/</link>
		<comments>http://www.altdevblogaday.com/2012/11/17/a-formal-language-for-data-definitions/#comments</comments>
		<pubDate>Sat, 17 Nov 2012 14:19:31 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[binary]]></category>
		<category><![CDATA[blobs]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[definition]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[layout]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28719</guid>
		<description><![CDATA[<p>Lately, I&#8217;ve started to think again about the irritating problem that there is no formal language for describing binary data layouts (at least not that I know of). So when people attempt to describe a file format or a network protocol they have to resort to vague and nondescript things like:</p>
<p><a href="http://www.altdevblogaday.com/2012/11/17/a-formal-language-for-data-definitions/" class="more-link">Read more on A Formal Language for Data Definitions&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Lately, I&#8217;ve started to think again about the irritating problem that there is no formal language for describing binary data layouts (at least not that I know of). So when people attempt to describe a file format or a network protocol they have to resort to vague and nondescript things like:</p>
<pre>Each section in the file starts with a header with the format:

4 bytes			header identifier
2 bytes			header length
0--20 bytes		extra data in header

The extra data is described below.</pre>
<p>As anyone who has tried to decipher such descriptions can testify, they are not always clear-cut, which leads to a lot of unnecessary work when trying to coax data out of a document.</p>
<p>It is even worse when I create my own data formats (for our engine&#8217;s runtime data). I would like to document those formats in a clear and unambiguous way, so that others can understand them. But since I have no standardized way of doing that, I too have to resort to ad-hoc methods.</p>
<p>This whole thing reminds me of the state of mathematics before formal algebraic notation was introduced. When you had to write things like: <em>the sum of the square of these two numbers equals the square of the previous number</em>. Formal notation can bring a lot of benefits (just look at what it has done for mathematics, music, and chess).</p>
<p>For data layouts, a formal definition language would allow us to write a tool that could open any binary file (that we had a data definition for) and display its content in a human-readable way:</p>
<pre>height = 128
width = 128
comment = "A funny cat animation"
frames = [
	{display_time = 0.1 image_data = [100 120 25 ...]}
	...
]</pre>
<p>The tool could even allow us to edit the readable data and save it back out as a binary file.</p>
<p>A formal language would also allow debuggers to display more useful information. By writing data definition files, we could make the debugger understand all our types and display them nicely. And it would be a lot cleaner than the hackery that is <em>autoexp.dat</em>.</p>
<p>Just to toss something out there, here&#8217;s an idea of what a data definition might look like:</p>
<pre>typedef uint32_t StringHash;

struct Light
{
	StringHash	name;
	Vector3		color;
	float		falloff_start;
	float 		falloff_end;
};

struct Level
{
	uint32_t version;
	uint32_t num_lights;
	uoffset32_t light_data_offset;

	...

light_data_offset:
	Light lights[num_lights];
};</pre>
<p>This is a C-inspired approach, with some additions. Array lengths can be parametrized on earlier data in the file, and labels can be used to generate offsets to different sections in the file.</p>
<p>I&#8217;m still tossing around ideas in my head about what the best way would be to make a language like this a reality. Some of the things I&#8217;m thinking about are:</p>
<h2>Use Case</h2>
<p>I don&#8217;t think it would do much good to just define a language. I want to couple it with something that makes it immediately useful. First, for my own motivation. Second, to provide a &#8220;reality check&#8221; to make sure that the choices I make for the language are the right ones. And third, as a reference implementation for anyone else who might want to make use of the language.</p>
<p>My current idea is to write a binary-to-JSON converter. I.e., a program that given a data definition file can automatically convert back and forth between a binary and a JSON-representation of that same data.</p>
<h2>Syntax</h2>
<p>The syntax in the example is very &#8220;C like&#8221;. The advantage of that is that it will automatically understand C structs if you just paste them into the data definition file, which reduces the work required to set up a file.</p>
<p>The disadvantage is that it can be confusing with a language that is very similar to C, but not exactly C. It is easy to make mistakes. Also, C++ (we probably want some kind of template support) is quite tricky to parse. If we want to add our own enhancements on top of that, we might just make a horrible mess.</p>
<p>So maybe it would be better to go for something completely different. Something Lisp-like perhaps. (Because: Yay, Lisp! But also: Ugh, Lisp.)</p>
<p>I&#8217;m still not 100% decided, but I&#8217;m leaning towards a restricted variant of C. Something that retains the basic syntactic elements, but is easier to parse.</p>
<h2>Completeness</h2>
<p>Should this system be able to describe <em>any</em> possible binary format out there?</p>
<p>Completeness would be nice of course. It is kind of annoying to have gone through all the trouble of defining a language and creating the tools and <em>still</em> not be able to handle all forms of binary data.</p>
<p>On the other hand, there are a lot of different formats out there and some of them have a complexity that is borderline insane. The only way to be able to describe <em>everything</em> is to have a data definition language that is Turing complete and procedural (in other words, a detailed list of the instructions required to pack and unpack the data).</p>
<p>But if we go down that route, we haven&#8217;t really raised the abstraction level. In that case, why even bother creating a new language? The format description could just be a list of the C instructions needed to unpack the data. That doesn&#8217;t feel like a step forward.</p>
<p>Perhaps some middle ground could be found. Maybe we could make a language that is simple and readable for &#8220;normal&#8221; data, but still has the power to express more esoteric constructs. One approach would be to regard the &#8220;declarative statements&#8221; as syntactic sugar in a procedural language. With this approach, the declaration:</p>
<pre>struct LightCollection
{
	unsigned num_lights;
	LightData lights[num_lights];
};</pre>
<p>Would just be syntactic sugar for:</p>
<pre>function unpack_light_collection(stream)
	local res = {}
	res.num_lights = unpack_unsigned(stream)
	res.lights = {}
	for i=1,res.num_lights do
		res.lights[i] = unpack_light_data(stream)
	end
	return res
end</pre>
<p>This would allow the declarative syntax to be used in most places, but we could drop out to full-featured Turing complete code whenever needed.</p>
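<p>To make the procedural reading concrete, here is a minimal C++ sketch of the same unpacking step. It is an illustration under stated assumptions, not a proposed implementation: <em>ByteReader</em> and <em>unpack()</em> are hypothetical helpers, the two-float <em>LightData</em> layout is invented for the example, and the data is assumed to be tightly packed in the byte order of the host.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;cstring&gt;
#include &lt;vector&gt;

// Hypothetical cursor over a binary blob (not part of any existing library).
struct ByteReader
{
    const unsigned char *data;
    size_t size;
    size_t pos;
};

// Reads one trivially copyable value at the cursor and advances past it.
template &lt;typename T&gt;
T unpack(ByteReader &amp;r)
{
    assert(r.pos + sizeof(T) &lt;= r.size);
    T value;
    std::memcpy(&amp;value, r.data + r.pos, sizeof(T));
    r.pos += sizeof(T);
    return value;
}

// Assumed layout, for illustration only.
struct LightData
{
    float falloff_start;
    float falloff_end;
};

struct LightCollection
{
    uint32_t num_lights;
    std::vector&lt;LightData&gt; lights;
};

// Procedural equivalent of the declaration "LightData lights[num_lights]".
LightCollection unpack_light_collection(ByteReader &amp;r)
{
    LightCollection res;
    res.num_lights = unpack&lt;uint32_t&gt;(r);
    for (uint32_t i = 0; i &lt; res.num_lights; ++i) {
        LightData light;
        light.falloff_start = unpack&lt;float&gt;(r);
        light.falloff_end = unpack&lt;float&gt;(r);
        res.lights.push_back(light);
    }
    return res;
}</pre>
<p>The array length is read from the stream before the array itself, which is exactly the dependency the declarative form expresses with <em>lights[num_lights]</em>.</p>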
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/11/17/a-formal-language-for-data-definitions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bitsquid Foundation Library</title>
		<link>http://www.altdevblogaday.com/2012/11/01/bitsquid-foundation-library/</link>
		<comments>http://www.altdevblogaday.com/2012/11/01/bitsquid-foundation-library/#comments</comments>
		<pubDate>Thu, 01 Nov 2012 21:42:31 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[bitsquid]]></category>
		<category><![CDATA[collections]]></category>
		<category><![CDATA[foundation]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[memory management]]></category>
		<category><![CDATA[stl]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28508</guid>
		<description><![CDATA[<p>Today I want to talk a bit about the <a class="link" href="https://bitbucket.org/bitsquid/foundation/overview">Bitsquid Foundation Library</a> that we recently released on Bitbucket (under the permissive MIT license).</p>
<p>It&#8217;s a minimalistic &#8220;foundation&#8221; library with things like memory management and collection classes. The idea is to have something that can be used as a reasonable starting-off point for other open source projects.</p>
<p><a href="http://www.altdevblogaday.com/2012/11/01/bitsquid-foundation-library/" class="more-link">Read more on Bitsquid Foundation Library&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Today I want to talk a bit about the <a class="link" href="https://bitbucket.org/bitsquid/foundation/overview">Bitsquid Foundation Library</a> that we recently released on Bitbucket (under the permissive MIT license).</p>
<p>It&#8217;s a minimalistic &#8220;foundation&#8221; library with things like memory management and collection classes. The idea is to have something that can be used as a reasonable starting-off point for other open source projects.</p>
<p>The library makes some interesting design choices that touch on topics that I have already talked about in this blog and that I think are worth elaborating on a bit further. It also serves as an example of how these techniques can be used in &#8220;real world&#8221; code.</p>
<h2>Separation of data and code</h2>
<p>The foundation library implements the idea of separating data definitions from function implementations that I talked about in <a class="link" href="http://www.altdevblogaday.com/2012/09/03/a-new-way-of-organizing-header-files/">this article</a>.</p>
<p>Data is stored in structs with public members (prefixed with an underscore to indicate that you should not mess with them unless you know what you are doing) that are found in <em>*_types.h</em> files. Functions that operate on the data are written outside of the struct, in separate <em>*.h</em> files (and organized into namespaces).</p>
<p>For example, the data definition for the dynamic <em>Array&lt;T&gt;</em> class is found in <em>collection_types.h:</em></p>
<pre>template&lt;typename T&gt; struct Array
{
    Array(Allocator &amp;a);
    ~Array();
    Array(const Array &amp;other);
    Array &amp;operator=(const Array &amp;other);
    
    T &amp;operator[](uint32_t i);
    const T &amp;operator[](uint32_t i) const;

    Allocator *_allocator;
    uint32_t _size;
    uint32_t _capacity;
    T *_data;
};</pre>
<p>The struct contains only the data used by the array and the functions (constructors, destructor, and operators) that C++ forces us to implement as members.</p>
<p>The implementation of these functions, as well as the declaration and definition of all other functions that operate on the arrays are found in the <em>array.h</em> file. It contains things like:</p>
<pre>namespace array
{
    template&lt;typename T&gt;
    inline uint32_t size(const Array&lt;T&gt; &amp;a)
    {return a._size;}
    
    template&lt;typename T&gt;
    inline bool any(const Array&lt;T&gt; &amp;a)
    {return a._size != 0;}
    
    template&lt;typename T&gt;
    inline bool empty(const Array&lt;T&gt; &amp;a)
    {return a._size == 0;}
}

template &lt;typename T&gt;
inline Array&lt;T&gt;::Array(Allocator &amp;allocator) :
    _allocator(&amp;allocator), _size(0), _capacity(0), _data(0)
{}</pre>
<p>This way of arranging data and code serves two purposes.</p>
<p>First, it improves compile times by reducing header inclusion. Header files that want to make use of arrays only need to include <em>collection_types.h</em>, which just contains a few struct definitions. They don&#8217;t have to drag in <em>array.h</em>, with all its inline code.</p>
<p>Headers including other headers indiscriminately because they need their types is what leads to exploding compile times. By only including the minimal thing we need (the type definitions), compile times are minimized.</p>
<p>Second, and more importantly, this design allows the collection types to be freely extended. Is there anything you miss in the <em>array</em> interface? Perhaps you would like <em>shift()</em> and <em>unshift()</em> methods? Or <em>binary_search()?</em></p>
<p>No problem. If you want them you can just add them, and you don&#8217;t even need to modify <em>array.h</em>. Just create your own file <em>array_extensions.h</em> or whatever, and add some new functions to the <em>array</em> namespace, that manipulate the data in the <em>Array&lt;T&gt;</em> interface. The functions you create will be just as good as the functions I have created.</p>
<p>Note that this isn&#8217;t true for traditional class designs, where you have first-class citizens (methods) and second-class citizens (external functions).</p>
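<p>As a concrete illustration of such an extension, a hypothetical <em>array_extensions.h</em> could add a <em>binary_search()</em> along the lines of the sketch below. The <em>Array&lt;T&gt;</em> here is a cut-down stand-in for the struct in <em>collection_types.h</em> so that the example compiles on its own, and both the function name and its &#8220;return the size when not found&#8221; convention are assumptions, not part of the library.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Stand-in for the struct in collection_types.h, reduced to the fields used here.
template &lt;typename T&gt; struct Array
{
    uint32_t _size;
    T *_data;
};

// What a hypothetical array_extensions.h might contain.
namespace array
{
    // Returns the index of value in a sorted array, or a._size if it is absent.
    template &lt;typename T&gt;
    inline uint32_t binary_search(const Array&lt;T&gt; &amp;a, const T &amp;value)
    {
        assert(a._size == 0 || a._data != 0);
        uint32_t lo = 0, hi = a._size;
        while (lo &lt; hi) {
            uint32_t mid = lo + (hi - lo) / 2;
            if (a._data[mid] &lt; value)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (lo &lt; a._size &amp;&amp; !(value &lt; a._data[lo])) ? lo : a._size;
    }
}</pre>
<p>Because the function only reads the public <em>_size</em> and <em>_data</em> members, it needs no cooperation from <em>array.h</em> and is called the same way as the built-in functions: <em>array::binary_search(a, x)</em>.</p>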
<p>The foundation library has some interesting examples of this. For example, the <em>string_stream</em> functions don&#8217;t operate on any special <em>StringStream</em> class, they just directly use an <em>Array&lt;char&gt;</em>. Also, the <em>hash</em> and <em>multi_hash</em> interfaces both work on the same underlying <em>Hash&lt;T&gt;</em> struct.</p>
<p>I believe that this design leads to simpler, more orthogonal code that is easier to extend and reuse.</p>
<h2>Memory management</h2>
<p>The library implements the allocator system mentioned in <a class="link" href="http://bitsquid.blogspot.se/2010/09/custom-memory-allocation-in-c.html">this article</a>. There is an abstract <em>Allocator</em> interface, and implementations of that interface can provide different allocation strategies (e.g. ArenaAllocator, HeapAllocator, SlotAllocator, etc). </p>
<p>Since I want to keep the library platform independent, I haven&#8217;t implemented a <em>PageAllocator</em>. Instead, the <em>MallocAllocator</em> is used as the lowest allocator level. If you want to, you can easily add a <em>PageAllocator</em> for your target platform.</p>
<p>For the same reason, I haven&#8217;t added any critical section locking to the allocators, so they aren&#8217;t thread safe. (I&#8217;m thinking about adding an interface for that though, so that you can plug in a critical section implementation if needed.)</p>
<p>The system for temporary allocations is kind of interesting and deserves a bit further explanation.</p>
<p>Most games have a need for temporary memory. For example, you might need some temporary memory to hold the result of a computation until it is done, or to allow a function to return an array of results.</p>
<p>Allocating such memory using the ordinary allocation system (i.e., malloc) puts a lot of unnecessary stress on the allocators. It can also create fragmentation, when long lived allocations that need to stay resident in memory are mixed with short lived temporary allocations.</p>
<p>The foundation library has two allocators for dealing with such temporary allocations, the <em>ScratchAllocator</em> and the <em>TempAllocator</em>.</p>
<p>The <em>ScratchAllocator</em> services allocation requests using a fixed size ring buffer. An <em>allocate</em> pointer advances through the buffer as memory is allocated, and a corresponding <em>free</em> pointer advances as memory is freed. Memory can thus be allocated and deallocated with simple pointer arithmetic. No calls need to be made to the underlying memory management system.</p>
<p>If the scratch buffer is exhausted (the <em>allocate</em> pointer wraps around and catches up with the <em>free</em> pointer), the <em>ScratchAllocator</em> will revert to using the ordinary <em>MallocAllocator</em> to service requests. So it won&#8217;t crash or run out of memory. But it will run slower, so try to avoid this by making sure that your scratch buffer is large enough.</p>
<p>If you forget to free something allocated with the <em>ScratchAllocator</em>, or if you accidentally mix in a long-lived allocation among the short-lived ones, that allocation will block the <em>free</em> pointer from advancing, which will eventually exhaust your scratch buffer, so keep an eye out for such situations.</p>
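<p>The behavior described above can be sketched in a few dozen lines. This is a simplified illustration and not the code in the library: the class name <em>ScratchSketch</em>, the word-based bookkeeping, the explicit <em>_used</em> counter, and the direct use of <em>malloc()</em> as the fallback are all assumptions made to keep the example short. Returned memory is only 4-byte aligned.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;cstdlib&gt;
#include &lt;vector&gt;

// Simplified ring-buffer allocator. Sizes are counted in 4-byte words and
// each block starts with a one-word header that stores the block size.
class ScratchSketch
{
    static const uint32_t FREED = 0x80000000u; // header flag: block has been freed
    std::vector&lt;uint32_t&gt; _buf;
    uint32_t _allocate; // index where the next block header will be written
    uint32_t _free;     // index of the oldest block still in the ring
    uint32_t _used;     // words occupied between _free and _allocate

public:
    explicit ScratchSketch(uint32_t words)
        : _buf(words), _allocate(0), _free(0), _used(0)
    { assert(words &gt; 1 &amp;&amp; words &lt; FREED); }

    uint32_t used_words() const { return _used; }

    bool owns(const void *p) const
    {
        uintptr_t a = (uintptr_t)p, lo = (uintptr_t)_buf.data();
        return a &gt;= lo &amp;&amp; a &lt; lo + _buf.size() * sizeof(uint32_t);
    }

    void *allocate(uint32_t bytes)
    {
        uint32_t n = (uint32_t)_buf.size();
        uint32_t payload = bytes / 4 + (bytes % 4 ? 1 : 0);
        if (payload == 0) payload = 1;
        if (payload &gt;= n) return std::malloc(bytes); // can never fit in the ring
        uint32_t words = payload + 1;                // header + payload
        uint32_t tail = n - _allocate;               // words left before the end
        uint32_t pad = words &gt; tail ? tail : 0;      // words skipped when wrapping
        if (_used + pad + words &gt; n)
            return std::malloc(bytes);               // ring exhausted: fall back
        if (pad) {
            _buf[_allocate] = pad | FREED;           // dummy block, already free
            _allocate = 0;
        }
        uint32_t header = _allocate;
        _buf[header] = words;
        _allocate = (_allocate + words) % n;
        _used += pad + words;
        return &amp;_buf[header + 1];
    }

    void deallocate(void *p)
    {
        if (!p) return;
        if (!owns(p)) { std::free(p); return; }
        uint32_t header = (uint32_t)((uint32_t *)p - _buf.data()) - 1;
        assert((_buf[header] &amp; FREED) == 0);
        _buf[header] |= FREED;
        // Advance the free pointer past every block that has been freed.
        while (_used &gt; 0 &amp;&amp; (_buf[_free] &amp; FREED)) {
            uint32_t words = _buf[_free] &amp; ~FREED;
            _used -= words;
            _free = (_free + words) % (uint32_t)_buf.size();
        }
    }
};</pre>
<p>Note how <em>deallocate()</em> only marks the block and then advances <em>_free</em> as far as it can. A block that is never freed stops that loop, which is the &#8220;blocked free pointer&#8221; situation described above.</p>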
<p><em>TempAllocator&lt;BYTES&gt;</em> is a scoped allocator that automatically frees all its allocated memory when it is destroyed (meaning you don&#8217;t have to explicitly call <em>deallocate()</em>, you can just let the allocator fall out of scope). This means you can use it everywhere where you need a little extra memory in a function scope:</p>
<pre>void test()
{
     TempAllocator1024 ta;
     Array&lt;char&gt; message(ta);
     ...
}</pre>
<p>The <em>BYTES</em> argument to <em>TempAllocator&lt;BYTES&gt;</em> specifies how much stack space the allocator should reserve. The <em>TempAllocator</em> contains a <em>char buffer[BYTES]</em> that gets allocated on the stack together with the <em>TempAllocator</em>.</p>
<p>Allocation requests are first serviced from the stack buffer, then (if the stack buffer is exhausted) from the <em>ScratchAllocator</em>.</p>
<p>This means that <em>TempAllocator</em> gives you an allocator that can be used by all collection classes and will use the fastest allocation method possible (local stack memory, followed by scratch buffer memory, followed by malloc() if all else fails).</p>
<h2>Minimalistic collection types</h2>
<p>The collection classes in the library are distinctly anti-STL. Some of the important differences:</p>
<ul>
<li>
<p>They use the allocation system described above (taking an <em>Allocator</em> as argument to the constructor). They can thus be used sensibly with different allocators (unlike STL types).</p>
</li>
<li>
<p>They use the data/function separation also described above, which means that the headers are cheap to include, and that you can extend them with your own functionality.</p>
</li>
<li>
<p>They use a minimalistic design. They assume that the stored data consists of plain-old-data objects (PODs). Constructors and destructors are not called for the stored objects and they are moved with raw <em>memmove()</em> operations rather than with copy constructors.</p>
</li>
</ul>
<p>This simplifies the code and improves the performance (calling constructors and destructors is not free). It also saves us the headache of dealing with storing objects that must be constructed with Allocators.</p>
<p>Personally I like this minimalistic approach. If I want to keep non-POD data in a collection, I prefer to store it as pointers anyway, so I have control over when and how the data is constructed and destroyed. I don&#8217;t like those things happening &#8220;behind my back&#8221;. You may disagree of course, but in that case you are probably happy to use STL (or boost).</p>
<p>Another example of choosing minimalism is the <em>Hash&lt;T&gt;</em> class. The hash uses a fixed key type which is a <em>uint64_t</em>. If you want to use a key that doesn&#8217;t fit into 64 bits, you have to hash it yourself before using it to access the data.</p>
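<p>For example, a string key has to be reduced to a <em>uint64_t</em> before it can be used. The sketch below does that with the 64-bit FNV-1a hash. FNV-1a is chosen here only because it is short and well known, and <em>hash_string()</em> is a hypothetical helper. It is not the hash function the library itself provides.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// FNV-1a, 64-bit variant: one way to reduce a string key to a uint64_t.
inline uint64_t hash_string(const char *s)
{
    assert(s != 0);
    uint64_t h = 0xcbf29ce484222325ull;   // FNV offset basis
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ull;            // FNV prime
    }
    return h;
}</pre>
<p>The value returned by <em>hash_string("player")</em> would then be passed wherever the hash interface expects a key. Two different strings can hash to the same 64-bit value, so this scheme accepts a very small collision risk in exchange for the simpler container.</p>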
<h2>And more?</h2>
<p>I&#8217;m planning to add some basic math code to the library, but haven&#8217;t gotten around to it yet.</p>
<p>Is there anything else you&#8217;d like to see in a library like this?</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.se/">the Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/11/01/bitsquid-foundation-library/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Data-Oriented, Data-Driven System for Vector Fields &#8212; Part 3</title>
		<link>http://www.altdevblogaday.com/2012/10/17/a-data-oriented-data-driven-system-for-vector-fields-part-3/</link>
		<comments>http://www.altdevblogaday.com/2012/10/17/a-data-oriented-data-driven-system-for-vector-fields-part-3/#comments</comments>
		<pubDate>Wed, 17 Oct 2012 19:54:51 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[bytecode]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[physics]]></category>
		<category><![CDATA[vector field]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28410</guid>
		<description><![CDATA[<p>In this post, I&#8217;ll finish my series on <a class="link" href="http://en.wikipedia.org/wiki/Vector_field">vector fields</a> (see <a class="link" href="http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/">part 1</a> and <a class="link" href="http://www.altdevblogaday.com/2012/10/02/a-data-oriented-data-driven-system-for-vector-fields-part-2/">part 2</a>) by tying up some loose ends.</p>
<p>Quick recap of what has happened so far:</p>
<ul>
<li>
<p>I&#8217;ve decided to represent my vector fields in functional form, as a superposition of individual effect functions <em>G_i(p)</em>.</p>
</li>
</ul>
<p><a href="http://www.altdevblogaday.com/2012/10/17/a-data-oriented-data-driven-system-for-vector-fields-part-3/" class="more-link">Read more on A Data-Oriented, Data-Driven System for Vector Fields &#8212; Part 3&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In this post, I&#8217;ll finish my series on <a class="link" href="http://en.wikipedia.org/wiki/Vector_field">vector fields</a> (see <a class="link" href="http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/">part 1</a> and <a class="link" href="http://www.altdevblogaday.com/2012/10/02/a-data-oriented-data-driven-system-for-vector-fields-part-2/">part 2</a>) by tying up some loose ends.</p>
<p>Quick recap of what has happened so far:</p>
<ul>
<li>
<p>I&#8217;ve decided to represent my vector fields in functional form, as a superposition of individual effect functions <em>G_i(p)</em>.</p>
</li>
<li>
<p>I represent these functions in bytecode format, as a piece of bytecode that given an input position <em>p</em> computes a vector field strength <em>F_i</em>.</p>
</li>
<li>
<p>By running each step of the virtual machine over thousands of input points, the cost of decoding and interpreting the bytecode instructions is amortized over all those points.</p>
</li>
<li>
<p>This means that we get the bytecode decoding &#8220;for free&#8221; &#8212; the bytecode can run at nearly native speed.</p>
</li>
</ul>
<h2>Bytecode format</h2>
<p>In the last article I didn&#8217;t say much about what format I used for the bytecode. Generally speaking, designing a bytecode format can be tricky, because you have to balance the <em>compactness</em> (keeping programs short) against the <em>decoding cost</em> (keeping bytecode fast).</p>
<p>Lucky for us, we don&#8217;t care about either of these things. <em>Compactness</em> doesn&#8217;t matter, because our programs will be very short anyway (just a few instructions). <em>Decoding cost</em> doesn&#8217;t matter (much), because it is amortized.</p>
<p>When it doesn&#8217;t really matter I always pick the simplest thing I can think of. In this case it is something like:</p>
<pre>(instruction) (result) (argument-1) (argument-2) </pre>
<p>Here, <em>instruction</em> is a 4-byte instruction identifier. <em>result</em> is a 4-byte channel identifier that tells us which channel the result should be written to. <em>argument-1</em> and <em>argument-2</em> are either channel identifiers or Vector4s with constant arguments. (Instructions of higher arity would have more arguments.)</p>
<p>Note that using 4 bytes for instructions and registers is beyond overkill, but it is the simplest option.</p>
<p>One annoyance with this representation is that I need different instructions depending on whether <em>argument-1</em> or <em>argument-2</em> is constant. For a 2-arity instruction, I need four variants to cover all cases. For a 4-arity instruction (such as <em>select)</em>, I would need 16 variants.</p>
<p>There are two ways of dealing with this. First, I could make the code that executes each instruction a bit more complex, so that it can handle both constant and register arguments. Second, I could make all instructions operate only on registers and have a single instruction for loading constants into registers.</p>
<p>Unfortunately, both of these options result in significantly slower bytecode. In the first case, the extra logic in each bytecode executor makes it slower. In the second case, we need extra instructions for loading constants, which increases the execution time.</p>
<p>So at least for two argument functions, the best option seems to be to have separate code for handling each argument combination. For four argument functions, it might be better to use one of the other options.</p>
<p>Just to give you some example of how the bytecode works, here is some raw byte code and the corresponding disassembled bytecode instructions:</p>
<pre>05000000 02000000 00000000 00000000000020410000000000000000
r2 = sub          r0       (0,10,0,0)

16000000 03000000 00000000000000000000803f00000000 02000000
r3 = cross        (0,0,1,0)                        r2

0a000000 04000000 00002041000020410000204100002041 03000000
r4 = mul          (10,10,10,10)                    r3

10000000 03000000 02000000 02000000
r3 = dot          r2       r2

0c000000 05000000 04000000 03000000
r5 = div          r4       r3

09000000 03000000 05000000 0000a0400000a0400000a0400000a040
r3 = mul          r5       (5,5,5,5)

00000000  01000000  01000000  03000000
r1 = add            r1        r3</pre>
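<p>To show how this layout is consumed, here is a minimal sketch of an interpreter loop for two of the instructions above. It is an illustration under assumptions rather than the actual virtual machine: the opcode values are simply read off the disassembly, the bytecode is treated as a stream of 32-bit words with constants stored as four packed floats, and the names <em>Channel</em> and <em>run()</em> are made up for the example.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;cstring&gt;
#include &lt;vector&gt;

struct Vector4 { float x, y, z, w; };
static_assert(sizeof(Vector4) == 16, "constants are read as four packed floats");

typedef std::vector&lt;Vector4&gt; Channel; // one Vector4 per input point

// Opcode values as they appear in the disassembly above.
enum { OP_ADD_RR = 0x00, OP_SUB_RC = 0x05 };

inline Vector4 add(const Vector4 &amp;a, const Vector4 &amp;b)
{ Vector4 r = {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w}; return r; }

inline Vector4 sub(const Vector4 &amp;a, const Vector4 &amp;b)
{ Vector4 r = {a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w}; return r; }

// Each instruction is decoded once and then applied to every point, so the
// decoding cost is shared by all num_points elements.
void run(const std::vector&lt;uint32_t&gt; &amp;code, std::vector&lt;Channel&gt; &amp;channels,
         size_t num_points)
{
    for (size_t pc = 0; pc &lt; code.size();) {
        assert(pc + 4 &lt;= code.size());
        const uint32_t op = code[pc];
        Channel &amp;result = channels[code[pc + 1]];
        if (op == OP_ADD_RR) {
            // (instruction) (result) (channel) (channel)
            const Channel &amp;a = channels[code[pc + 2]];
            const Channel &amp;b = channels[code[pc + 3]];
            for (size_t i = 0; i &lt; num_points; ++i)
                result[i] = add(a[i], b[i]);
            pc += 4;
        } else if (op == OP_SUB_RC) {
            // (instruction) (result) (channel) (four-float constant)
            assert(pc + 7 &lt;= code.size());
            const Channel &amp;a = channels[code[pc + 2]];
            Vector4 c;
            std::memcpy(&amp;c, &amp;code[pc + 3], sizeof(c));
            for (size_t i = 0; i &lt; num_points; ++i)
                result[i] = sub(a[i], c);
            pc += 7;
        } else {
            assert(!"unknown opcode");
            return;
        }
    }
}</pre>
<p>The register-constant variant gets its own branch with its own inner loop, which mirrors the decision above to keep one code path per argument combination instead of testing the argument kind inside the loop.</p>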
<h2>High-level language</h2>
<p>You can&#8217;t really expect people to author their effects in raw bytecode, or even in our &#8220;bytecode assembly language&#8221;. Effect authors will be a lot more productive if they can use a more comfortable language.</p>
<p>I decided to create such a language and model it after <a class="link" href="http://en.wikipedia.org/wiki/High_Level_Shader_Language">HLSL</a>, since it serves a similar purpose (fast processing of vectorized data). Programmers interested in writing vector field effects are probably already used to working with HLSL. Plus, if at some point we want to move some of this work to the GPU we can reuse the code.</p>
<p>To show what the high level language looks like, here is an implementation of a whirl effect:</p>
<pre>const float4 center = float4(0,10,0,0);
const float4 up = float4(0,0,1,0);
const float4 speed = float4(10,10,10,10);
const float4 radius = float4(5,5,5,5);

struct vf_in
{
    float4 position : CHANNEL0;
    float4 wind : CHANNEL1;
};

struct vf_out
{
    float4 wind : CHANNEL1;
};

void whirl(in vf_in in, out vf_out out)
{
    float4 r = in.position - center;
    out.wind = in.wind + speed * cross(up, r) / dot(r,r) * radius;
}</pre>
<p>If you squint, you may notice that this high level code exactly corresponds to the low level bytecode in the previous example.</p>
<p>Just as with HLSL, although this <em>looks</em> like C it actually <em>isn&#8217;t</em> C. Things that work in C may not work in this language and vice versa. I&#8217;m quite strict when I parse this. I figure it is better to start by being strict rather than permissive. This gives you more leeway to extend or modify the language later while keeping backwards compatibility. A strict syntax can always be loosened later, but if you design the language with a too permissive syntax you can paint yourself into a corner (case in point: Ruby).</p>
<p>I usually don&#8217;t bother with <a class="link" href="http://en.wikipedia.org/wiki/Lex_(software)">Lex</a> or <a class="link" href="http://en.wikipedia.org/wiki/Yacc">Yacc</a> when I write a parser. They are OK tools, I guess, but if I can get by without them I prefer not to have the extra precompile step and to have code that is a bit more straightforward to read and debug.</p>
<p>Instead I tend to use a <a class="link" href="http://en.wikipedia.org/wiki/Recursive_descent_parser">recursive descent parser</a> (a predictive variant, with no backtracking) or some variation of <a class="link" href="http://en.wikipedia.org/wiki/Shunting-yard_algorithm">Dijkstra&#8217;s shunting yard algorithm</a>. Or sometimes a combination of both.</p>
<p>For this language I parse the overall structure with recursive descent, and then use Dijkstra&#8217;s algorithm to process each statement in the function body.</p>
<p>I generate the bytecode directly from the shunting yard algorithm. When I pop an operator from the operator stack I generate the bytecode for computing that operator and storing the result in a temporary register. I then push that register to the value stack so that the result can be used in other computations. Temporary channels are recycled after they are popped off the value stack to minimize the channel count.</p>
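<p>The following is a minimal sketch of that idea, assuming a toy expression syntax: single-letter operands that map to input registers, the four binary operators, and parentheses. The <em>Instr</em> and <em>Compiler</em> types are hypothetical, and well-formed input is assumed. The sketch only shows how popping an operator emits one instruction and how temporaries are recycled.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical three-address instruction: dst = lhs op rhs.
struct Instr { char op; int dst, lhs, rhs; };

struct Compiler
{
    std::vector&lt;Instr&gt; code;
    std::vector&lt;int&gt; values;     // value stack, holds register numbers
    std::vector&lt;char&gt; ops;       // operator stack
    std::vector&lt;int&gt; free_temps; // temporaries available for reuse
    int first_temp, next_temp;

    // Registers below num_inputs hold inputs and are never recycled.
    explicit Compiler(int num_inputs)
        : first_temp(num_inputs), next_temp(num_inputs) {}

    static int precedence(char op) { return (op == '*' || op == '/') ? 2 : 1; }

    int pop_value()
    {
        assert(!values.empty());
        int r = values.back();
        values.pop_back();
        return r;
    }

    void release(int r) { if (r &gt;= first_temp) free_temps.push_back(r); }

    // Pops one operator, emits its instruction and pushes the result register.
    void reduce()
    {
        assert(!ops.empty());
        char op = ops.back();
        ops.pop_back();
        int rhs = pop_value();
        int lhs = pop_value();
        int dst;
        if (free_temps.empty()) {
            dst = next_temp++;
        } else {
            dst = free_temps.back();
            free_temps.pop_back();
        }
        Instr instr = {op, dst, lhs, rhs};
        code.push_back(instr);
        release(rhs); // operands are recycled only after dst has been chosen
        release(lhs);
        values.push_back(dst);
    }

    // Operands are single letters: 'a' is register 0, 'b' is register 1, ...
    // Returns the register that holds the value of the expression.
    int compile(const std::string &amp;expr)
    {
        for (size_t i = 0; i &lt; expr.size(); ++i) {
            char c = expr[i];
            if (c == ' ') continue;
            if (c &gt;= 'a' &amp;&amp; c &lt;= 'z') {
                assert(c - 'a' &lt; first_temp);
                values.push_back(c - 'a');
            } else if (c == '(') {
                ops.push_back(c);
            } else if (c == ')') {
                while (!ops.empty() &amp;&amp; ops.back() != '(') reduce();
                assert(!ops.empty());
                ops.pop_back();
            } else {
                while (!ops.empty() &amp;&amp; ops.back() != '('
                       &amp;&amp; precedence(ops.back()) &gt;= precedence(c))
                    reduce();
                ops.push_back(c);
            }
        }
        while (!ops.empty()) reduce();
        return pop_value();
    }
};</pre>
<p>Choosing the destination before releasing the operands means an instruction never writes to a register it is still reading, at the cost of sometimes using one more temporary than strictly necessary.</p>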
<h2>Constant patching</h2>
<p>Constants in the bytecode can be changed when an effect is played. I do this by directly patching the bytecode with the new constant values.</p>
<p>When I generate the bytecode I keep track of where in the bytecode different global constants can be found. This <em>patch</em> list is a simple array of entries like:</p>
<pre>(hashed constant name) (offset in bytecode)</pre>
<p>When playing a vector field effect, the gameplay programmer specifies the constant values with a table:</p>
<pre>VectorField.add(vf, "whirl", {radius = 10})</pre>
<p>I look through the patch list, find all the offsets of constants named &#8220;radius&#8221; and replace them with the value(s) supplied by the gameplay programmer.</p>
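<p>A minimal sketch of that patching step could look as follows. The names <em>PatchEntry</em> and <em>patch_constant()</em>, the 32-bit width of the name hash, and the choice to measure offsets in bytes are assumptions made for the illustration. The caller is expected to have expanded a scalar such as <em>radius = 10</em> into four floats first.</p>
<pre>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;cstring&gt;
#include &lt;vector&gt;

// One entry of the patch list. The 32-bit hash width is an assumption.
struct PatchEntry
{
    uint32_t name_hash; // hashed constant name
    uint32_t offset;    // byte offset of the constant in the bytecode
};

// Overwrites every occurrence of the named constant in the bytecode and
// returns the number of places that were patched.
size_t patch_constant(std::vector&lt;unsigned char&gt; &amp;bytecode,
                      const std::vector&lt;PatchEntry&gt; &amp;patches,
                      uint32_t name_hash, const float (&amp;value)[4])
{
    size_t count = 0;
    for (size_t i = 0; i &lt; patches.size(); ++i) {
        if (patches[i].name_hash != name_hash)
            continue;
        assert(patches[i].offset + sizeof(value) &lt;= bytecode.size());
        std::memcpy(&amp;bytecode[patches[i].offset], value, sizeof(value));
        ++count;
    }
    return count;
}</pre>
<p>Since the patch overwrites the shared bytecode in place, each playing effect presumably needs its own copy of the bytecode if two instances are to run with different constant values.</p>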
<p>Since globals can be patched later, I can&#8217;t do constant folding when I generate the bytecode. (Without global patching, I could just check if both arguments were constants when I popped an operator, and in that case, compute the constant result and push that directly to the value stack, instead of generating a bytecode instruction.)</p>
<p>I could reduce the instruction count somewhat and improve performance by doing a constant folding pass on the bytecode <em>after</em> the globals have been patched, but I haven&#8217;t implemented that yet.</p>
<h2>Physics integration</h2>
<p>In my physics system I maintain a list of all awake (non-sleeping) actors. I apply wind from a vector field with an explicit call:</p>
<pre>void apply_wind(const VectorField &amp;field, const CollisionFilter &amp;filter);</pre>
<p>This extracts the position of every awake actor that matches the collision filter and sends that list to the vector field for evaluation. It then does a second loop through the actors to apply wind forces from the returned wind velocities.</p>
<p>I&#8217;ve chosen to have an explicit step for applying wind, so that you don&#8217;t have to pay anything for the wind support unless you actually use it. Having an explicit step also opens up the possibility to have other types of vector fields. For example, there could be a vector field representing gravity forces and a corresponding function:</p>
<pre>void apply_acceleration(const VectorField &amp;field, const CollisionFilter &amp;filter);</pre>
<p>The fact that the wind is only applied to <em>awake</em> actors is important. Without that check, the wind forces would keep every actor in the world awake all the time, which would be really expensive for the physics engine. Just as with gravity, we want physics objects to come to rest and go to &#8220;sleep&#8221; when the wind forces are in balance with other forces on the actor.</p>
<p>This of course creates a problem when the wind forces are varying. An actor may be in balance now, but a change in the wind direction could change that. A leaf that is resting on the ground may be lifted by a sudden updraft. Since we don&#8217;t apply the wind forces to sleeping objects we can&#8217;t get that behavior. Once a leaf has come to rest, it will stay put.</p>
<p>This problem is most noticeable when you have drastic effects like explosions in the vector field. It looks really strange when actors are completely immobile and &#8220;sleep through&#8221; a big explosion.</p>
<p>I deal with this by having a function for explicitly waking actors in an AABB:</p>
<pre>wake_actors(const Vector3 &amp;min, const Vector3 &amp;max, const CollisionFilter &amp;filter)</pre>
<p>If you want to play a drastic wind effect (like an explosion), you should first wake the nearby actors with a call to <em>wake_actors()</em>. This ensures that all nearby actors will get the wind forces from the explosion (since they are now awake).</p>
<p>I apply the wind force with the standard formula:</p>
<pre>F = 1/2 r v^2 C A</pre>
<p>Where <em>r</em> is the density of air, <em>v</em> is the relative velocity of the air with respect to the object (so <em>v = v_wind &#8211; v_object</em>, where <em>v_wind</em> is the wind velocity and <em>v_object</em> is the object&#8217;s velocity). <em>C</em> is a drag coefficient that depends on the object&#8217;s shape and <em>A</em> is the object&#8217;s reference area. The force acts along the direction of <em>v</em>.</p>
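<p>As a rough sketch, the formula translates to code like this. The <em>Vec3</em> struct and the <em>wind_force</em> function are made up for the example, and the vector form (magnitude from the formula, direction along the relative velocity) is the usual interpretation rather than something taken from the engine source.</p>

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Drag force F = 1/2 r v^2 C A, directed along the relative air velocity
// v = v_wind - v_object. Units are assumed to be SI (kg/m^3, m/s, m^2).
inline Vec3 wind_force(Vec3 v_wind, Vec3 v_object, float air_density,
                       float drag_coefficient, float area) {
    const Vec3 v = {v_wind.x - v_object.x, v_wind.y - v_object.y, v_wind.z - v_object.z};
    const float speed = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    // |F| = 0.5 * r * speed^2 * C * A and the direction is v / speed, so one
    // factor of speed cancels and no division (or zero check) is needed.
    const float k = 0.5f * air_density * speed * drag_coefficient * area;
    return {k * v.x, k * v.y, k * v.z};
}
```

<p>With zero wind and a moving object the result points against the object&#8217;s motion, which is plain air resistance.</p>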
<p>For <em>C</em> and <em>A</em>, I actually loop through all the physics shapes in the actor and estimate <em>C</em> and <em>A</em> based on those shapes. This is by no means a perfect approach. There are many situations where <em>C</em> might be really different from what such an estimation gives. For example, an object that is heavily perforated would receive much less wind force.</p>
<p>However, I want to have something in place that gives decent behavior in <em>most</em> cases, so that it only very rarely has to be changed. The less artists have to mess around with physical parameters, the smaller the chance that anything gets messed up.</p>
<p>Note that the wind force is just air resistance with a velocity for the air. So by implementing wind you get the &#8220;air resistance&#8221; behavior &#8220;for free&#8221;.</p>
<h2>Rotation</h2>
<p>If you compute the drag force using the formula above and apply it to a physics actor, it won&#8217;t add any rotation to the actor. This is actually correct. The drag force, as we compute it here, has no rotational component.</p>
<p>Yet it feels counter-intuitive. We expect objects to rotate when they are blown about by the wind. Leaves and papers certainly swirl around a lot when the wind blows.</p>
<p>What happens in that case is actually a second-order effect. When the wind blows around an object you get zones of high and low pressure as well as turbulence, and it is the forces from these interactions that affect the object&#8217;s rotation.</p>
<p>These interactions are tricky to model accurately and they depend a lot on the object&#8217;s shape. Right now, I&#8217;m not even trying. Instead I use a much simpler approach: I apply the drag force a bit above the object&#8217;s actual center of mass so that it produces a torque and makes the object rotate. This is a complete hack that has no basis at all in physical reality, but it does add some rotation. At least it looks a lot better than applying the wind force without any rotation.</p>
<p>It should be possible to do better: to make some kind of estimate of what rotational forces wind induces when it blows against typical physics shapes such as boxes, spheres and capsules. Just give me a couple of days in a wind tunnel and I&#8217;ll try to come up with something.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.se/">the Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/10/17/a-data-oriented-data-driven-system-for-vector-fields-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Data-Oriented, Data-Driven System for Vector Fields &#8212; Part 2</title>
		<link>http://www.altdevblogaday.com/2012/10/02/a-data-oriented-data-driven-system-for-vector-fields-part-2/</link>
		<comments>http://www.altdevblogaday.com/2012/10/02/a-data-oriented-data-driven-system-for-vector-fields-part-2/#comments</comments>
		<pubDate>Tue, 02 Oct 2012 20:44:16 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[bytecode]]></category>
		<category><![CDATA[vector field]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28305</guid>
		<description><![CDATA[<p>In <a class="link" href="http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/">Part 1</a> we decided to represent a vector field as a superposition of individual effects:</p>
<pre>G(p) = G_0(p) + G_1(p) + ... + G_n(p)</pre>
<p>Here, each <em>G_i(p)</em> is a function that represents some effect, such as wind, an explosion or the updraft from an air vent.</p>
<p><a href="http://www.altdevblogaday.com/2012/10/02/a-data-oriented-data-driven-system-for-vector-fields-part-2/" class="more-link">Read more on A Data-Oriented, Data-Driven System for Vector Fields &#8212; Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In <a class="link" href="http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/">Part 1</a> we decided to represent a vector field as a superposition of individual effects:</p>
<pre>G(p) = G_0(p) + G_1(p) + ... + G_n(p)</pre>
<p>Here, each <em>G_i(p)</em> is a function that represents some effect, such as wind, an explosion or the updraft from an air vent.</p>
<p>The next step is to find a way of quickly evaluating the function <em>G(p)</em>, a general function that could be almost anything, for lots of different positions <em>p_i</em>. This is quite tricky to do well in C++.</p>
<p>Of course, evaluating <em>specific</em> functions is not hard. If we want to evaluate a specific function, such as:</p>
<pre>Vector3(sin(p.x), sin(p.y), 0);</pre>
<p>we can just type it up:</p>
<pre>inline Vector3 f(const Vector3 &amp;p)
{
	return Vector3(sin(p.x), sin(p.y), 0);
}</pre>
<p>But if we don&#8217;t know beforehand what <em>G(p)</em> will be, we don&#8217;t have that option.</p>
<p>We could write our system so that it supported a limited set of specific effects, with hardcoded C++ implementations. For example, there could be an &#8220;explosion&#8221; effect with some parameters (radius, strength, etc), an &#8220;updraft&#8221; effect, a &#8220;whirl&#8221; effect, etc. Similarly we could have support for a variety of standard shapes, such as &#8220;sphere&#8221;, &#8220;cylinder&#8221;, &#8220;capsule&#8221;, etc. And perhaps some different types of falloffs (&#8220;linear&#8221;, &#8220;quadratic&#8221;). Perhaps also some temporal effects (&#8220;attack-sustain-release&#8221;, &#8220;ease-in-ease-out&#8221;).</p>
<p>But it is hard to know where to draw the limit with this approach. Exactly what effects and shapes and falloffs and time curves should the system support? The more things we add, the more cluttered the system becomes. And the system is still not completely general. No matter how much we add, there will still be some things that the user just <em>can&#8217;t</em> do without disturbing a programmer and getting her to add a new effect to the system. This means that the system is not <em>truly</em> data-driven.</p>
<p>Whether this is a problem or not depends a lot on your development style. If you are a single artist-programmer working on a single game you may not even care. To you, code and data are the same thing. Who cares if you have to add something to the code to make a special effect? That is what the code is for.</p>
<p>At Bitsquid, however, we are in a different position. We are making a general purpose engine to be used on multiple platforms for all kinds of tasks. We can&#8217;t put game specific code in the engine or everything will end up a total mess. Sure, our licensees could modify their cloned copy of the source to add their own effects. But that is not an ideal solution. It forces them to learn our code, it makes it harder for us to reproduce their bugs (since our code bases have now diverged), and it makes it harder for us to modify and optimize the source code without putting our licensees in merge hell.</p>
<p>So our aim is always to be completely data-driven.</p>
<p>But how can we represent a general function as data? There are really only two possibilities:</p>
<ul>
<li>
<p>As a piece of executable machine code.</p>
</li>
<li>
<p>As a piece of <em>bytecode</em> that gets executed by a virtual machine.</p>
</li>
</ul>
<p>The first approach is the fastest of course, but it has two drawbacks. First, machine code is platform dependent. Writing a system that can dynamically generate machine code for a lot of different targets is no small undertaking (though it could be simplified by using LLVM). Second, and more serious, many systems simply don&#8217;t <em>allow</em> us to execute dynamically generated machine code.</p>
<p>The inevitable conclusion is that we have to use bytecode (perhaps coupled with a machine code compiler on the platforms where that is feasible).</p>
<p>Unfortunately, as everybody who has used a dynamic language without a JIT compiler knows, bytecode is slow. Usually, it is at least a factor of 10 slower than machine code. And remember that one of our design goals for this system was that it should be fast. We said in the beginning that it should be able to handle at least 10 000 queries per frame.</p>
<p>So what can we do?</p>
<h2>The Massively Vectorized Virtual Machine</h2>
<p>At this point it makes sense to stop and think a bit about <em>why</em> bytecode is slow. If you look at the code of a virtual machine, it is essentially a tight loop that repeatedly does three things:</p>
<ul>
<li>
<p>Decode the next bytecode instruction into operation + arguments.</p>
</li>
<li>
<p>Jump to the code that performs the operation.</p>
</li>
<li>
<p>Execute the operation.</p>
</li>
</ul>
<p>The third step is usually just as fast as handwritten machine code would be. Computing <em>a+b</em> is not more expensive because it was triggered by an <em>OP_ADD</em> bytecode instruction.</p>
<p>So all the overhead of bytecode, the thing that makes it &#8220;slow&#8221;, is found in the first two steps.</p>
<p>Well then here is an idea: what if we could reuse the computations that we make in those two steps?</p>
<p>Remember that our goal is to compute <em>G(p)</em> for a lot of points <em>p_i</em>. We want to evaluate the same function, the same <em>bytecode</em> instructions, for a lot of different data points. In that case, why repeat the expensive operation of decoding the bytecode instructions again and again for each point? Why not just decode the instruction <em>once</em> and then execute it for <em>all</em> data points?</p>
<p>So, with that change, our virtual machine loop now becomes:</p>
<ul>
<li>
<p>Decode the next bytecode instruction.</p>
</li>
<li>
<p>Jump to the code that executes it.</p>
</li>
<li>
<p>Execute that single instruction for <em>all</em> the input data.</p>
</li>
</ul>
<p>With this change, the cost of decoding the bytecode is now amortized over all the query points. The more query points we have, the less time (proportionally) we will spend on decoding bytecode. With enough points (&gt;1024) that time should be nearly negligible. In other words, our bytecode should be able to run <strong>at nearly the same speed as native machine code</strong>.</p>
<p>In a quick test I made, the overhead of a bytecode implementation compared to native code was just 16 % &#8212; a far cry from the 10x slowdown we have come to expect.</p>
<h2>Fleshing out the Details</h2>
<p>Since we are computing a vector function on vector input and we want it to run as fast as possible, it makes sense to use SSE (or its equivalent on other platforms) and represent all our data as vector4 intrinsics.</p>
<p>Virtual machines can be stack-based or register-based. Stack-based machines produce more compact bytecode since the arguments are implicit. Register-based machines need fewer instructions to accomplish a task, since they don&#8217;t have to juggle things around on the stack. In our case, compact bytecode doesn&#8217;t buy us much, since our programs are short and the decoding cost is amortized. On the other hand, accomplishing the same thing with fewer instructions means less code to execute for <em>each</em> query point. So a register-based virtual machine seems to be a clear win.</p>
<p>Here is what the code for an explosion effect could look like in a made-up intermediate language for our virtual machine. The effect produces a wind of 50 m/s outwards from the center of a sphere of radius 5 m located at (2,4,0):</p>
<pre>direction = sub position, (2,4,0,0)
lensqr = dot direction, direction
direction = normalize direction
direction = mul direction, (50,50,50,50)
direction = select_lt lensqr, (25,25,25,25), direction, (0,0,0,0)
output = add output, direction</pre>
<p>Here <em>position</em> is the input query position and <em>output</em> is the output result of the function. <em>direction</em> and <em>lensqr</em> are temporary variables.</p>
<p>Note that the final operation adds the result to the <em>output</em> register instead of overwriting it. This allows us to merge multiple effects by simply concatenating their bytecode. So to evaluate <em>G(p)</em> for a large number of points, we can first intersect the AABB of the points with the AABB of each individual effect <em>G_i(p)</em>. Then we merge the bytecodes of each intersecting effect into a single bytecode function <em>G&#8217;(p)</em> that we finally evaluate for each point.</p>
<p>We can feed <em>position</em> and <em>output</em> to the virtual machine as arrays of intrinsics:</p>
<pre>void evaluate(void *bytecode, unsigned n, Vector4I *positions, Vector4I *output)</pre>
<p>Note that since we are running the bytecode one instruction at a time for all the data, the local variables (<em>direction</em> and <em>lensqr</em>) need to be arrays too, since we need to remember their value for each of the input positions.</p>
<p>We could allocate arrays for these local variables and pass them to <em>evaluate</em> just as we do for <em>positions</em> and <em>output</em>. But that seems a bit wasteful. A complicated function could have twenty local variables or more, meaning that with 10 000 particles we would need to allocate 3.2 MB of temporary memory (20 variables x 10 000 points x 16 bytes). The amount needed will vary widely, depending on how complicated the function is, which is driven by the data. This makes it hard to do a memory budget for the system.</p>
<p>So let&#8217;s use an alternative approach. We allocate all local variable buffers from a &#8220;scratch space&#8221; which is provided by the caller:</p>
<pre>void evaluate(void *bytecode, unsigned n, Vector4I *positions, Vector4I *output, unsigned scratch_bytes, void *scratch_space)</pre>
<p>Now the caller has complete control over the amount of temporary memory the system uses. It is predictable and can be made to fit any desired memory budget.</p>
<p>To make this work, we need to chop this scratch memory up into areas for each local variable. The size of those buffers then determines how many input positions we can process at a time.</p>
<p>For example, suppose we have 256 K of scratch memory and 8 local variables. Each local variable then gets 32 K of memory, which can hold 2 K Vector4I&#8217;s. So this means that instead of processing all 10 000 particles at the same time when we execute an opcode, we process the particles in 5 chunks, handling up to 2 048 particles each time (the last chunk is smaller). The cost of decoding the bytecode gets amortized over 2 048 particles, instead of over 10 000, but it is still negligible.</p>
<p>The nice thing about this approach is that we always use a constant, predictable amount of scratch space, regardless of how many query points we process and how complicated the function is. Instead we scale down how many particles we process at a time.</p>
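<p>The chunking arithmetic can be sketched as below. The helper names (<em>chunk_size</em>, <em>for_each_chunk</em>) are hypothetical, and the 16 byte element size assumes a four-float vector as in the example above.</p>

```cpp
#include <algorithm>
#include <cstddef>

// Number of points that fit in one pass when the scratch space is split
// evenly between the local variables. element_bytes is 16 for a 4-float vector.
inline unsigned chunk_size(std::size_t scratch_bytes, unsigned num_locals,
                           std::size_t element_bytes = 16) {
    return static_cast<unsigned>(scratch_bytes / (num_locals * element_bytes));
}

// Calls f(first, count) for each chunk of at most `chunk` points out of n.
// The last chunk may be smaller than the others.
template <class F>
void for_each_chunk(unsigned n, unsigned chunk, F f) {
    if (chunk == 0) return;  // not enough scratch space for even one point
    for (unsigned first = 0; first < n; first += chunk)
        f(first, std::min(chunk, n - first));
}
```

<p>With 256 K of scratch space and 8 locals, <em>chunk_size</em> gives 2 048, and 10 000 points are visited in five passes of 2 048, 2 048, 2 048, 2 048 and 1 808 points.</p>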
<p>Since both input data and local variables are now Vector4I buffers, the inner loop of the virtual machine is simple to write. It will look something like:</p>
<pre>void run_vm(const void *bytecode, unsigned n, Vector4I **registers)
{
	const void *pc = bytecode;
	while (true) {
		// Decode once per instruction, not once per data point.
		unsigned op = DECODE_OP(pc);
		switch(op) {
			case OP_ADD: {
				// Braces give the locals their own scope, as C++ requires inside a switch.
				Vector4I *a = registers[DECODE_REGISTER(pc)];
				Vector4I *b = registers[DECODE_REGISTER(pc)];
				Vector4I *c = registers[DECODE_REGISTER(pc)];
				Vector4I *ae = a + n;
				// Execute the single decoded instruction for all n elements.
				while (a != ae) {
					*a++ = addi(*b++, *c++);
				}
				break;
			}
			...
		}
	}
}</pre>
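<p>For readers who want something they can compile, here is a stripped-down, scalar version of the same idea. It is only a sketch under several assumptions: registers are plain float arrays instead of SSE vectors, every instruction is four unsigned words (op, destination, two sources), and an invented <em>OP_END</em> opcode terminates the program.</p>

```cpp
#include <cstddef>

enum Op : unsigned { OP_END, OP_ADD, OP_MUL };

// Runs the bytecode over n elements. Each instruction is decoded once and
// then applied to every element before the next instruction is decoded.
inline void run_vm(const unsigned *bytecode, std::size_t n, float **registers) {
    const unsigned *pc = bytecode;
    while (true) {
        const unsigned op = *pc++;
        if (op == OP_END) return;
        float *dst = registers[*pc++];
        const float *lhs = registers[*pc++];
        const float *rhs = registers[*pc++];
        switch (op) {
            case OP_ADD:
                for (std::size_t i = 0; i < n; ++i) dst[i] = lhs[i] + rhs[i];
                break;
            case OP_MUL:
                for (std::size_t i = 0; i < n; ++i) dst[i] = lhs[i] * rhs[i];
                break;
            default:
                return;  // unknown opcode: stop instead of running off the end
        }
    }
}
```

<p>A program such as <em>{OP_MUL, 2, 0, 0, OP_ADD, 2, 2, 1, OP_END}</em> computes <em>r2 = r0*r0 + r1</em> for all elements with just two instruction decodes, no matter how many elements there are.</p>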
<h2 class="title">An Example</h2>
<p>Here is a <a class="link" href="http://www.youtube.com/watch?v=HkYvvEUXhcw&amp;feature=g-upl">YouTube video</a> that shows a vector field implemented using this method. Unfortunately, the YouTube compression is not very nice to a video that contains this much high-frequency information. But at least it gives some idea of the effect.</p>
<p>The video shows 20 000 particles being animated by the vector field at a query cost of about 0.4 ms on a single thread (of course, parallelization is trivial, so you can divide that by the number of available cores).</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.se/">the Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/10/02/a-data-oriented-data-driven-system-for-vector-fields-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Data-Oriented, Data-Driven System for Vector Fields &#8211; Part 1</title>
		<link>http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/</link>
		<comments>http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/#comments</comments>
		<pubDate>Mon, 17 Sep 2012 20:45:22 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[data driven]]></category>
		<category><![CDATA[data oriented]]></category>
		<category><![CDATA[vector field]]></category>
		<category><![CDATA[wind]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27906</guid>
		<description><![CDATA[<p>A <em>vector field</em> is a function that assigns a vector value to each point in 3D space. Vector fields can be used to represent things like <em>wind</em> (the vector field specifies the wind velocity at each point in space), water, magnetism, etc.</p>
<p><a href="http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/" class="more-link">Read more on A Data-Oriented, Data-Driven System for Vector Fields &#8211; Part 1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>A <em>vector field</em> is a function that assigns a vector value to each point in 3D space. Vector fields can be used to represent things like <em>wind</em> (the vector field specifies the wind velocity at each point in space), water, magnetism, etc.</p>
<p>To me, wind is the most interesting use case. I want a system that can be used for physics (trees, tumbleweed, paper cups), particles (leaves, sparks, smoke) and graphics (grass). I also want the system to be capable of handling both global effects (wind blowing through the entire level) and local effects (explosions, air vents, landing helicopters, rising hot air from fires, etc). But I don&#8217;t want to limit the system to <em>only</em> handling wind. I imagine that once the system is in place, it could be put to other interesting uses as well.</p>
<p>There are a number of things that make this an interesting non-trivial design challenge:</p>
<ul>
<li>
<p>Vector fields represent a global shared state. All systems (particles, physics, etc) should react to the same wind. This can create strong couplings between unrelated systems, which we want to avoid.</p>
</li>
<li>
<p>The system must be fast. We want to be able to make large particle effects that are affected by wind. As a design goal, let&#8217;s say that it should be able to handle at least 10 000 queries / frame.</p>
</li>
<li>
<p>As stated above, the system must be flexible enough to handle both global wind and a large variety of different local effects (air vents, fans, etc).</p>
</li>
</ul>
<p>I&#8217;ll outline the system in a series of articles. Let&#8217;s start by thinking a bit about how we can represent the vector field in a way that allows for fast queries.</p>
<h2>1. Use a functional representation</h2>
<p>Storing the vector value for every point in 3D space at a decent resolution would require huge amounts of memory. It would also be very expensive to update. If we wanted to change the global wind direction, we would have to loop over all those points and change the value.</p>
<p>So, instead, we will use a functional representation. We will express the field as some closed function <em>F(p, t)</em> that gives us the field vector at point <em>p</em> in space at the time <em>t</em>.</p>
<p>For example, we could express a global wind that oscillates in the x-direction as:</p>
<pre>F(p, t) = Vector3(sin(t), 0, 0)</pre>
<p>The closed function form allows us to evaluate the vector field at any point in space and time.</p>
<p>Note that even with a functional form as the main representation, we can still interact with grid based representations. For example, we can render some section of the <em>F(p, t)</em> function to a texture for use on a GPU. Similarly, if we have some grid based wind data that we want to add to the simulation, we could use that as part of the <em>F(p, t)</em> expression:</p>
<pre>F(p, t) = Vector3(sin(t), 0, 0) + sample_grid(grid, p)</pre>
<h2>2. Ignore the time coordinate</h2>
<p>The vector field function <em>F(p, t)</em> is a function of both space <em>and</em> time. The wind varies throughout the level and if we look at any one point, the wind at that point varies over time.</p>
<p>But in practice, we treat the <em>p</em> and <em>t</em> coordinates very differently. We start at some time <em>t_0</em> and then evaluate <em>F(p, t_0)</em> for thousands of different <em>p</em> values. Then we move on to <em>t_1</em> and do the same thing. </p>
<p>We can make use of the fact that <em>t</em> remains constant for a large number of evaluations to simplify the function. For example at <em>t=0.5</em> the function:</p>
<pre>F(p, t) = sin(p.x) * sin(p.y) * cos(t)</pre>
<p>simplifies to:</p>
<pre>G(p) = sin(p.x) * sin(p.y) * 0.8776</pre>
<p>which is cheaper to evaluate.</p>
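<p>In code, this amounts to evaluating the time-dependent part once and capturing the result. The sketch below uses made-up names (<em>FieldAtTime</em>, <em>bind_time</em>) and the scalar example function from above. It is meant to show the idea, not how the engine represents its functions.</p>

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// G(p) for one fixed time: the cos(t) factor has already been evaluated.
struct FieldAtTime {
    float time_factor;
    float operator()(const Vec3 &p) const {
        return std::sin(p.x) * std::sin(p.y) * time_factor;
    }
};

// Turns F(p, t) = sin(p.x) * sin(p.y) * cos(t) into G(p) for a given t.
// This runs once per frame; the returned G runs once per query point.
inline FieldAtTime bind_time(float t) {
    return {std::cos(t)};
}
```

<p>Calling <em>bind_time(0.5f)</em> stores a factor of roughly 0.8776, so each of the thousands of per-point evaluations saves one cosine.</p>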
<p>Taking this approach a step further, it makes sense to split our system in two parts &#8212; a high level system that knows about time and every frame produces a new <em>G(p)</em> for the current time, and a low level system that ignores time completely and just computes <em>G(p)</em>. Since the high level system only runs once per frame it can afford to do all kinds of complicated but interesting stuff, like constant folding, optimization, etc.</p>
<p>For the low level system we have reduced the problem to evaluating <em>G(p)</em>.</p>
<h2>3. Express the field as a superposition of individual effects</h2>
<p>To make it possible for the field to contain both global effects (world wind) and local effects (air vents, explosions) we express it as a superposition of individual effect functions:</p>
<pre>G(p) = G_0(p) + G_1(p) + ... + G_n(p)</pre>
<p>Here <em>G_i(p)</em> represents each individual effect. A base wind could be expressed as just a constant:</p>
<pre>G_0(p) = Vector3(2.1, 1.4, 0)</pre>
<p>A turbulence function could add a random component:</p>
<pre>G_1(p) = turbulence(seed, p, 4)</pre>
<p>An explosion effect could create a wind with a speed of 100 m/s outwards from the center of the explosion in a sphere with a radius of 4.0 meters around the explosion center:</p>
<pre>G_2(p) = sphere(p,c,4) * normalize(p-c) * 100</pre>
<p>Here <em>sphere(p,c,4)</em> is a spherical support function that defines the range of the effect. It is <em>1</em> if <em>||p &#8211; c|| &lt;= 4.0</em> and <em>0</em> otherwise.</p>
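<p>Written out in C++, the explosion effect might look like the sketch below. The <em>Vec3</em> type and the handling of the exact center (where the direction is undefined and the sketch returns zero) are assumptions added for the example.</p>

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Spherical support function: 1 if ||p - c|| <= radius, 0 otherwise.
// Comparing squared lengths avoids a square root.
inline float sphere(Vec3 p, Vec3 c, float radius) {
    const Vec3 d = {p.x - c.x, p.y - c.y, p.z - c.z};
    const float len_sqr = d.x * d.x + d.y * d.y + d.z * d.z;
    return len_sqr <= radius * radius ? 1.0f : 0.0f;
}

// G_2(p) = sphere(p,c,4) * normalize(p-c) * 100
inline Vec3 explosion(Vec3 p, Vec3 c) {
    const Vec3 d = {p.x - c.x, p.y - c.y, p.z - c.z};
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (len == 0.0f) return {0.0f, 0.0f, 0.0f};  // no direction at the center
    const float s = sphere(p, c, 4.0f) * 100.0f / len;
    return {d.x * s, d.y * s, d.z * s};
}
```

<p>A point 2 m from the center along x gets a wind of (100, 0, 0), while a point 5 m away gets zero because it lies outside the support.</p>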
<p>Note again that we have stripped out the time component. At the higher level, this might be an expanding sphere with decreasing wind speeds, but at the low level we only care what it looks like at this instant.</p>
<p>Similar functions can be added for other local effects.</p>
<h2>4. Use the AABB to cull local fields</h2>
<p>If we have a lot of local effects (explosions, etc), evaluating <em>G(p)</em> will be pretty expensive.</p>
<p>We can reduce the cost by only evaluating the local effects that are close enough to our particle system to matter.</p>
<p>I.e., instead of evaluating the full <em>G(p)</em> for every particle, we first intersect the AABB of each <em>G_i(p)</em>&#8217;s support with the AABB of our particle system and keep only the effects that overlap it.</p>
<p>That gives us a simpler function <em>G&#8217;(p)</em> that we can then evaluate for each particle.</p>
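<p>A minimal sketch of that culling step, assuming each effect stores an AABB for its support (the <em>AABB</em> struct and function names below are hypothetical):</p>

```cpp
#include <vector>

struct AABB { float min[3], max[3]; };

// Two boxes overlap unless they are separated along at least one axis.
inline bool overlaps(const AABB &a, const AABB &b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// Returns the indices of the effects whose support overlaps the AABB of
// the particle system. Only these effects need to be part of G'(p).
inline std::vector<unsigned> cull_effects(const std::vector<AABB> &effect_bounds,
                                          const AABB &particles) {
    std::vector<unsigned> kept;
    for (unsigned i = 0; i < effect_bounds.size(); ++i) {
        if (overlaps(effect_bounds[i], particles)) kept.push_back(i);
    }
    return kept;
}
```

<p>The test is conservative: an effect that passes may still contribute nothing to a particular particle, but an effect that fails can safely be skipped for the whole system.</p>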
<p>If we wanted to, we could use the wavelength of the field for further simplifications. If the scale at which a field effect changes is much larger than our AABB, we can replace that effect with a Taylor series expansion. Similarly, if an effect oscillates at a scale much smaller than the size of our particles, we can replace it with its average value.</p>
<h2>Next time</h2>
<p>Next time I will look at how we can efficiently evaluate arbitrary functions, such as:</p>
<pre>G(p) = Vector3(1,1,0) + turbulence(seed, p, 2) + sphere(p, c, 4)</pre>
<p>for a huge number of particle positions <em>p</em>.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/09/17/a-data-oriented-data-driven-system-for-vector-fields-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A new way of organizing header files</title>
		<link>http://www.altdevblogaday.com/2012/09/03/a-new-way-of-organizing-header-files/</link>
		<comments>http://www.altdevblogaday.com/2012/09/03/a-new-way-of-organizing-header-files/#comments</comments>
		<pubDate>Mon, 03 Sep 2012 19:58:18 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[code organization]]></category>
		<category><![CDATA[headers]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27508</guid>
		<description><![CDATA[<p>Recently, I&#8217;ve become increasingly dissatisfied with the standard C++ way of organizing header files (one <em>.h</em> file and one <em>.cpp</em> file per class) and started experimenting with alternatives.</p>
<p>I have two main problems with the way headers are usually organized.</p>
<p><a href="http://www.altdevblogaday.com/2012/09/03/a-new-way-of-organizing-header-files/" class="more-link">Read more on A new way of organizing header files&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Recently, I&#8217;ve become increasingly dissatisfied with the standard C++ way of organizing header files (one <em>.h</em> file and one <em>.cpp</em> file per class) and started experimenting with alternatives.</p>
<p>I have two main problems with the way headers are usually organized.</p>
<p>First, it leads to long compile times, especially when templates and inline functions are used. Fundamental headers like <em>array.h</em> and <em>vector3.h</em> get included by a lot of other header files that need to use the types they define. These, in turn, get included by other files that need <em>their</em> types. Eventually you end up with a messy nest of header files that get included in a lot more translation units than necessary.</p>
<p>Sorting out such a mess once it has taken root can be surprisingly difficult. You remove an <em>#include</em> statement somewhere and are greeted by 50 compile errors. You have to fix these one by one by inserting missing <em>#include</em> statements and forward declarations. Then you notice that the Android release build is broken and needs additional fixes. This introduces a circular header dependency that needs to be resolved. Then it is on to the next <em>#include</em> line &#8212; remove it, rinse and repeat. After a day of this mind-numbingly boring activity you might have reduced your compile time by four seconds. Hooray!</p>
<p>Compile times have an immediate and important effect on programmer productivity and through general bit rot they tend to grow over time. There are many things that can increase compile times, but relatively few forces that work in the opposite direction.</p>
<p>It would be a lot better if we could change the way we work with headers, so that we didn&#8217;t get into this mess to begin with.</p>
<p>My second problem is more philosophical. The basic idea behind object-oriented design is that data and the functions that operate on it should be grouped together (in the same class, in the same file). This idea has some merits &#8212; it makes it easier to verify that class constraints are not broken &#8212; but it also leads to problems. Classes get coupled tightly with concepts that are not directly related to them &#8212; for example things like serialization, endian-swapping, network synchronization and script access. This pollutes the class interface and makes reuse and refactoring harder.</p>
<p>Class interfaces also tend to grow indefinitely, because there is always &#8220;more useful stuff&#8221; that can be added. For example, a string class (one of my pet peeves) could be extended with functionality for tokenization, path manipulation, number parsing, etc. To prevent &#8220;class bloat&#8221;, you could write this code as external functions instead, but this leads to a slightly strange situation where a class has some &#8220;canonized&#8221; members and some second-class citizens. It also means that the class must export enough information to allow any kind of external function to be written, which kind of breaks the whole <em>encapsulation</em> idea.</p>
<p>In my opinion, it is much cleaner to organize things by <em>functionality</em> than by type. Put the serialization code in one place, the path manipulation code in another place, etc.</p>
<p>My latest idea about organization is to put all type declarations for all structs and classes in a single file (say <em>types.h</em>):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> Vector3 <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">float</span> x, y, z<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">class</span> T<span style="color: #000080;">&gt;</span>
<span style="color: #0000ff;">class</span> Array <span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
	Array<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008080;">:</span> _capacity<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span>, _size<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span>, _data<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span>
	~Array<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span><span style="color: #0000dd;">free</span><span style="color: #008000;">&#40;</span>_data<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span><span style="color: #008000;">&#125;</span>
	<span style="color: #0000ff;">unsigned</span> _capacity<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> _size<span style="color: #008080;">;</span>
	T <span style="color: #000040;">*</span>_data<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">class</span> IFileSystem<span style="color: #008080;">;</span>
<span style="color: #0000ff;">class</span> INetwork<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that <em>types.h</em> has no function declarations, but it includes the full data specification of any struct or class that we want to use &#8220;by value&#8221;. It also has forward declarations for classes that we want to use &#8220;by reference&#8221;. (These classes are assumed to have pure virtual interfaces. They can only be created by factory functions.)</p>
<p>Since <em>types.h</em> only contains type definitions and not a ton of inline code, it ends up small and fast to compile, even if we put all our types there.</p>
<p>Since it contains all type definitions, it is usually the only file that needs to be included by external headers. This means we avoid the hairy problem with a big nest of headers that include other headers. We also don’t have to bother with inserting forward declarations in every header file, since the types we need are already forward declared for us in <em>types.h</em>.</p>
<p>We put the function declarations (along with any inline code) in the usual header files. So <em>vector3.h</em> would have things like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">inline</span> Vector3 operator<span style="color: #000040;">+</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> Vector3 <span style="color: #000040;">&amp;</span>a, <span style="color: #0000ff;">const</span> Vector3 <span style="color: #000040;">&amp;</span>b<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	Vector3 res<span style="color: #008080;">;</span>
	res.<span style="color: #007788;">x</span> <span style="color: #000080;">=</span> a.<span style="color: #007788;">x</span> <span style="color: #000040;">+</span> b.<span style="color: #007788;">x</span><span style="color: #008080;">;</span>
	res.<span style="color: #007788;">y</span> <span style="color: #000080;">=</span> a.<span style="color: #007788;">y</span> <span style="color: #000040;">+</span> b.<span style="color: #007788;">y</span><span style="color: #008080;">;</span>
	res.<span style="color: #007788;">z</span> <span style="color: #000080;">=</span> a.<span style="color: #007788;">z</span> <span style="color: #000040;">+</span> b.<span style="color: #007788;">z</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">return</span> res<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p><em>.cpp</em> files that wanted to use these operations would include <em>vector3.h</em>. But <em>.h</em> files and other <em>.cpp</em> files would not need to include the file. The file gets included where it is needed and not anywhere else.</p>
<p>Similarly, <em>array.h</em> would contain things like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">class</span> T<span style="color: #000080;">&gt;</span>
<span style="color: #0000ff;">void</span> push_back<span style="color: #008000;">&#40;</span>Array<span style="color: #000080;">&lt;</span>T<span style="color: #000080;">&gt;</span> <span style="color: #000040;">&amp;</span>a, <span style="color: #0000ff;">const</span> T <span style="color: #000040;">&amp;</span>item<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>a._size <span style="color: #000040;">+</span> <span style="color: #0000dd;">1</span> <span style="color: #000080;">&gt;</span> a._capacity<span style="color: #008000;">&#41;</span>
		grow<span style="color: #008000;">&#40;</span>a<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	a._data<span style="color: #008000;">&#91;</span>a._size<span style="color: #000040;">++</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> item<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>Note that <em>types.h</em> only contains the constructor and the destructor for <em>Array&lt;T&gt;</em>, not any other member functions.</p>
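<p>The <em>grow()</em> call in <em>push_back()</em> is not shown above. As a minimal sketch of how the pieces could fit together, here is one way to write it. The doubling policy, the starting capacity of 8 and the use of <em>realloc()</em> are my assumptions for illustration, and the <em>realloc()</em> approach is only valid for element types that can be moved as raw bytes.</p>

```cpp
#include <cassert>
#include <cstdlib>

// Same data layout as the Array in types.h: constructor, destructor, data.
template <class T>
class Array {
public:
	Array() : _capacity(0), _size(0), _data(0) {}
	~Array() {std::free(_data);}
	unsigned _capacity;
	unsigned _size;
	T *_data;
};

// Hypothetical growth policy: double the capacity, starting at 8 items.
// realloc() copies raw bytes, so this sketch assumes a trivially copyable T.
template <class T>
void grow(Array<T> &a)
{
	const unsigned new_capacity = a._capacity ? a._capacity * 2 : 8;
	T *new_data = (T *)std::realloc(a._data, new_capacity * sizeof(T));
	assert(new_data);
	a._data = new_data;
	a._capacity = new_capacity;
}

// The push_back() from array.h, repeated so the sketch is self-contained.
template <class T>
void push_back(Array<T> &a, const T &item)
{
	if (a._size + 1 > a._capacity)
		grow(a);
	a._data[a._size++] = item;
}
```

<p>Both functions live outside the class, so a <em>.cpp</em> file that only stores an <em>Array&lt;T&gt;</em> never has to compile them.</p>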
<p>Furthermore, I prefer to design classes so that the &#8220;zero-state&#8221; where all members are zeroed is always a valid empty state for the class. That way, the constructor becomes trivial: it just needs to zero all member variables. We can also construct arrays of objects with a simple <em>memset()</em>.</p>
<p>If a class needs a more complicated empty state, then perhaps it should be an abstract interface-class instead of a value class.</p>
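<p>To make the zero-state idea concrete, here is a small sketch. <em>IntBuffer</em>, <em>make_buffers()</em> and <em>is_empty()</em> are hypothetical names made up for this example, and the code assumes a platform where a null pointer is stored as all-zero bits, which is the case on mainstream targets.</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical value type: all members zero means "empty".
struct IntBuffer {
	unsigned _capacity;
	unsigned _size;
	int *_data;
};

// Allocates `count` buffers (count > 0) and puts every one of them in the
// zero-state with a single memset(). Assumes null pointers are all-zero bits.
inline IntBuffer *make_buffers(unsigned count)
{
	assert(count > 0);
	IntBuffer *buffers = (IntBuffer *)std::malloc(count * sizeof(IntBuffer));
	assert(buffers);
	std::memset(buffers, 0, count * sizeof(IntBuffer));
	return buffers;
}

// A zeroed buffer is a valid empty buffer, so no extra "initialized" flag is needed.
inline bool is_empty(const IntBuffer &b)
{
	return b._size == 0;
}
```

<p>The caller releases the block with <em>free()</em>. Nothing has to be run per element to set the buffers up.</p>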
<p>For <em>IFileSystem</em>, <em>file_system.h</em> defines the virtual interface:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> IFileSystem
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
	<span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">bool</span> exists<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>path<span style="color: #008000;">&#41;</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">virtual</span> IFile <span style="color: #000040;">*</span>open_read<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>path<span style="color: #008000;">&#41;</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">virtual</span> IFile <span style="color: #000040;">*</span>open_write<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>path<span style="color: #008000;">&#41;</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
	...
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
IFileSystem <span style="color: #000040;">*</span>make_file_system<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>root<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">void</span> destroy_file_system<span style="color: #008000;">&#40;</span>IFileSystem <span style="color: #000040;">*</span>fs<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Since the “open structs” in <em>types.h</em> can be accessed from anywhere, we can group operations by what they do rather than by what types they operate on. For example, we can put all the serialization code in <em>serialization.h</em> and <em>serialization.cpp</em>. We can create a file <em>path.h</em> that provides path manipulation functions for strings.</p>
<p>An external project can also &#8220;extend&#8221; any of our classes by just writing new methods for it. These methods will have the same access to the <em>Vector3</em> data and be called in exactly the same way as our built-in ones.</p>
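<p>As a sketch of what such an extension could look like, an external project might add its own header with a few more <em>Vector3</em> operations. The functions <em>dot()</em>, <em>length()</em> and <em>normalize()</em> below are hypothetical examples, not part of the Bitsquid code.</p>

```cpp
#include <cassert>
#include <cmath>

// From types.h in the example above.
struct Vector3 {
	float x, y, z;
};

// Hypothetical additions that live in the external project's own header.
inline float dot(const Vector3 &a, const Vector3 &b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline float length(const Vector3 &v)
{
	return std::sqrt(dot(v, v));
}

inline Vector3 normalize(const Vector3 &v)
{
	const float len = length(v);
	assert(len > 0.0f); // precondition: v is not the zero vector
	Vector3 res = {v.x / len, v.y / len, v.z / len};
	return res;
}
```

<p>Callers write <em>length(v)</em> exactly as they would write a built-in operation, and nothing in <em>types.h</em> or <em>vector3.h</em> has to change.</p>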
<p>The main drawback of this model is that internal state is not as &#8220;protected&#8221; as in standard object-oriented design. External code can &#8220;break&#8221; our objects by manipulating members directly instead of using methods. For example, a stupid programmer might try to change the size of an array by manipulating the <em>_size</em> field directly, instead of using the <em>resize()</em> method.</p>
<p>Naming conventions can be used to mitigate this problem. In the example above, if a type is declared with <em>class</em> and the members are preceded by an underscore, the user should not manipulate them directly. If the type is declared as a <em>struct</em>, and the members do not start with an underscore, it is OK to manipulate them directly. Of course, a stupid programmer can still ignore this and go ahead and manipulate the members directly anyway. On the other hand, there is no end to the things a stupid programmer can do to destroy code. The best way to protect against stupid programmers is to not hire them.</p>
<p>I haven’t yet written anything really big in this style, but I&#8217;ve started to nudge some files in the Bitsquid codebase in this direction, and so far the experience has been positive.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/09/03/a-new-way-of-organizing-header-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cleaning bad code</title>
		<link>http://www.altdevblogaday.com/2012/08/18/cleaning-bad-code/</link>
		<comments>http://www.altdevblogaday.com/2012/08/18/cleaning-bad-code/#comments</comments>
		<pubDate>Sat, 18 Aug 2012 05:20:41 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[clean code]]></category>
		<category><![CDATA[code cleaning]]></category>
		<category><![CDATA[refactoring]]></category>
		<category><![CDATA[simple code]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27238</guid>
		<description><![CDATA[<p>Guess what! You&#8217;ve just inherited a stinking, steaming pile of messy old code. Congratulations! It&#8217;s all yours.</p>
<p>Bad code can come from all kinds of places. Middleware, the internet, perhaps even your own company.</p>
<p><a href="http://www.altdevblogaday.com/2012/08/18/cleaning-bad-code/" class="more-link">Read more on Cleaning bad code&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Guess what! You&#8217;ve just inherited a stinking, steaming pile of messy old code. Congratulations! It&#8217;s all yours.</p>
<p>Bad code can come from all kinds of places. Middleware, the internet, perhaps even your own company.</p>
<p>You know that nice guy in the corner that nobody had time to check up on? Guess what he was doing all that time. Churning out bad code.</p>
<p>Or remember that module someone wrote years ago, just before she left the company. That module that twenty different people have then added hacks, patches and bug fixes to, without really understanding what they were doing. Yup, that one.</p>
<p>Or what about that open source thing you downloaded that you knew was horrible, but it solved a very specific and quite hairy problem that would have taken you ages to do by yourself.</p>
<p>Bad code doesn&#8217;t have to be a problem, as long as it&#8217;s not misbehaving, and nobody pokes their bloody nose in it. Unfortunately, that state of ignorant bliss rarely lasts. A bug will be discovered. A feature requested. A new platform released. Now you have to dig into that horrible mess and try to clean it up. This article offers some humble advice for that unfortunate situation.</p>
<h2>0. Is it worth doing?</h2>
<p>The first thing you need to ask yourself is whether the code is <em>worth</em> cleaning. I&#8217;m of the opinion that when it comes to code cleaning you should either karate do &#8220;yes&#8221;, or karate do &#8220;no&#8221;. Either you assume full responsibility for the code and rework it until you end up with something that you are actually happy to maintain and proud to have in your codebase.</p>
<p><em>Or</em> you decide that even though the code looks horrible, it isn&#8217;t cost-effective to take time out of your busy schedule to fix it. So instead you just do the smallest change possible that solves your current problem.</p>
<p>In other words, you either regard the code as <em>yours</em> or <em>theirs</em>.</p>
<p>There are merits to both alternatives. Good programmers get an itch when they see bad code. They bring out their torches and pitchforks and chant: &#8220;Unclean! Unclean!&#8221; And that is a good instinct.</p>
<p>But cleaning code is also a lot of work. It is easy to underestimate the time it takes. It can be nearly as time consuming as rewriting the whole thing from scratch. And it doesn&#8217;t bring any short term benefits. Two weeks cleaning code won&#8217;t add any new features to the game, but it might give you some new bugs.</p>
<p>On the other hand, the long term effects of <em>never</em> cleaning your code can be devastating. Entropy is the code-killer.</p>
<p>So, never an easy choice. Some things to consider are:</p>
<ul>
<li>
<p>How many changes do you expect to make to the code?</p>
<p>Is it just this one small bug that you need to fix, or is this code that you expect to return to many times to tweak and tune and add new features? If it&#8217;s just this one bug, then perhaps it is best to let sleeping dogs lie. However, if this is a module that you will need to mess around with a lot, then spending some time to clean it up now will save a lot of headaches later.</p>
</li>
<li>
<p>Will you need/want to import upstream changes?</p>
<p>Is this an open source project that is under active development? If so, and you want to pull the changes made upstream you can&#8217;t make any big changes to the code or you will be in merge hell every time you pull. So just be a nice team player, accept its idiosyncrasies and send patches with your bug fixes to the maintainer.</p>
</li>
<li>
<p>How much work is it?</p>
<p>How many lines of code can you realistically clean in a day? An order of magnitude estimate says more than 100 and less than 10 000, so let&#8217;s say 1 000. So if the module has 30 000 lines, you might be looking at a month of work. Can you spend that? Is it worth it?</p>
</li>
<li>
<p>Is it a part of your core functionality?</p>
<p>If what the module does is something peripheral, like say font rendering or image loading, you might not care that it is messy. You might swap out the whole thing for something else in the future, who knows. But you should <em>own</em> the code that relates to your core competence.</p>
</li>
<li>
<p>How bad is it?</p>
<p>If the code is just slightly bad, then perhaps you can live with it. If it is mind-numbingly, frustratingly incomprehensibly bad, then perhaps something needs to be done.</p>
</li>
</ul>
<h2>1. Get a test case</h2>
<p>Seriously cleaning a piece of code means messing around with it a lot. You will break things.</p>
<p>If you have a decent test case with good coverage you will immediately know what has broken and you can usually quite quickly figure out what stupid mistake you just made. The time and anxiety this saves over the course of the cleaning process is just ridiculous. Get a test case. It&#8217;s the first thing you should do.</p>
<p>Unit tests are best, but not all code is amenable to unit testing. (Test fanatics, send your hate mail now!) If unit tests are too cumbersome, use an integration test instead. For example, fire up a game level and run the character through a specific set of actions related to the code you are cleaning.</p>
<p>Since such tests are more time consuming, it might not make sense to run them after <em>every</em> change you make, which would be ideal. But as you put every single change you make into source control, it&#8217;s not so bad. Run the test every once in a while (e.g., every five changes). When it discovers a problem you can do a binary search of those last few commits to find out which one caused the problem.</p>
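<p>The binary search over commits can be sketched as below. This is an illustration under stated assumptions: commits are numbered in history order, and the hypothetical <em>passes</em> callback stands in for &#8220;check out that commit and run the integration test&#8221;. In practice <em>hg bisect</em> and <em>git bisect</em> automate the same procedure.</p>

```cpp
#include <cassert>
#include <functional>

// Returns the index of the first commit for which `passes` returns false.
// Assumptions of this sketch: commit `good` passes, commit `bad` fails,
// good < bad, and once a commit fails every later commit fails too.
inline int first_bad_commit(int good, int bad, const std::function<bool(int)> &passes)
{
	assert(good < bad);
	while (bad - good > 1) {
		const int mid = good + (bad - good) / 2;
		if (passes(mid))
			good = mid; // the breakage was introduced after mid
		else
			bad = mid;  // the breakage was introduced at or before mid
	}
	return bad;
}
```

<p>With five untested commits between the last known good state and the failing one, this needs at most three test runs instead of five.</p>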
<p>If you discover an issue that wasn&#8217;t detected by your test, make sure that you add that to the test, so that you capture it in the future.</p>
<h2>2. Use source control</h2>
<p>Do people still have to be told to use source control? I sure hope not.</p>
<p>For cleaning work it is absolutely crucial. You will be making lots and lots of small changes to the code. If something breaks you want to be able to look back in the revision history and find out where it broke.</p>
<p>Also, if you are anything like me, you will sometimes start down a refactoring path (like removing a stupid class) and realize after a while that it wasn&#8217;t such a good idea, or, that it <em>was</em> a good idea, but that everything would be a lot simpler if you did something else first. So you want to be able to quickly revert everything you just did and begin anew.</p>
<p>Your company should have a source control system in place that allows you to do these changes in a separate branch and commit as much as you like without disturbing anybody else.</p>
<p>But even if it doesn&#8217;t, you should <em>still</em> use source control. In that case, download mercurial (or git), create a new repository and put the code that you checked out of your company&#8217;s stupid system there. Do your changes in that repository, committing as you go. When you are done you can merge everything back into the stupid system.</p>
<p>Cloning the repository into a sensible source control system only takes a few minutes. It is <em>absolutely</em> worth it. If you don&#8217;t know mercurial, spend an hour to learn it. You will be happy you did. Or if you prefer, spend 30 hours to learn git instead. (I kid! Not really. Nerd fight now!)</p>
<h2>3. Make one (small) change at a time</h2>
<p>There are two ways of improving bad code: revolution and reform. The revolution method is to burn everything with fire and rewrite it from scratch. The reform method is to refactor the code with one small change at a time without ever breaking it.</p>
<p>This article is about the reform method. I&#8217;m not saying that revolutions are never necessary. Sometimes things are so bad that they just need to go. But people who get frustrated with the slow pace of reform and advocate revolution often fail to realize the full complexity of the problem and thus don&#8217;t give the existing system enough credit for the things it does.</p>
<p>Joel Spolsky has <a class="link" href="http://www.joelonsoftware.com/articles/fog0000000069.html">written a classic article about this</a> without falling into the trap of making strained political metaphors.</p>
<p>The best way of reforming code is to make one minimal change at a time, test it and commit it. When the change is small it is easier to understand its consequences and make sure that it doesn&#8217;t affect the existing functionality. If something goes wrong, you only have a small amount of code that you need to check. If you start doing a change and realize that it is bad, you won&#8217;t lose much work by reverting to the last commit. If you notice after a while that something has gone subtly wrong, a binary search in the revision history will let you find the small change that introduced the problem.</p>
<p>A common mistake is to do more than one thing at the same time. For example, while getting rid of an unnecessary level of inheritance you might notice that the API methods are not as orthogonal as you would like them to be and start to rearrange them. Don&#8217;t! Get rid of the inheritance first, commit that and <em>then</em> fix the API.</p>
<p>Smart programmers organize the way they work so that they don&#8217;t have to be that smart.</p>
<p>Try to find a path that takes you from what the code is now to what you want it to be in a sequence of small steps. For example, in one step you might rename the methods to give them more sane names. In the next, you might change some member variables to function parameters. Then you reorder some algorithms so that they are clearer. And so on.</p>
<p>If you start doing a change and realize that it was a bigger change than you originally thought, don&#8217;t be afraid to revert and find a way of doing the same thing in smaller, simpler steps.</p>
<h2>4. Don&#8217;t clean and fix at the same time</h2>
<p>This is a corollary to (3), but important enough to get its own point.</p>
<p>It is a common problem. You start to look at a module because you want to add some new functionality. Then you notice that the code is really badly organized, so you start reorganizing it at the same time as you are adding the new functionality.</p>
<p>The problem with this is that cleaning and fixing have diametrically opposite goals. When you clean, you want to make the code look better without changing its functionality. When you fix, you want to change its functionality to something better. If you clean and fix at the same time it becomes very hard to make sure that your cleaning didn&#8217;t inadvertently change something.</p>
<p>Do the cleaning first. <em>Then</em>, when you have a nice clean base to work with, add the new functionality.</p>
<h2>5. Remove any functionality that you are not using</h2>
<p>The time it takes to clean is proportional to the amount of code, its complexity and its messiness. </p>
<p>If there is any functionality in the code that you are currently not using and don&#8217;t plan to be using in the foreseeable future &#8212; get rid of it. That will both reduce the amount of code you will have to go through and its complexity (by getting rid of unnecessary concepts and dependencies). You will be able to clean faster and the end result will be simpler.</p>
<p>Don&#8217;t save code because &#8220;who knows, you might need it some day&#8221;. Code is costly &#8212; it needs to be ported, bug checked, read and understood. The less code you have, the better. In the unlikely event that you do need the old code, you can always find it in the source repository.</p>
<h2>6. Delete most of the comments</h2>
<p>Bad code rarely has good comments. Instead, they are often:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #666666;">// Pointless:</span>
&nbsp;
	<span style="color: #666666;">// Set x to 3</span>
	x <span style="color: #000080;">=</span> <span style="color: #0000dd;">3</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">// Incomprehensible:</span>
&nbsp;
	<span style="color: #666666;">// Fix for CB (aug)</span>
	pos <span style="color: #000040;">+</span><span style="color: #000080;">=</span> vector3<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #000040;">-</span><span style="color:#800080;">0.007</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">// Sowing fear and doubt:</span>
&nbsp;
	<span style="color: #666666;">// Really we shouldn't be doing this</span>
	t <span style="color: #000080;">=</span> get_latest_time<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">// Downright lying:</span>
&nbsp;
	<span style="color: #666666;">// p cannot be NULL here</span>
	p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>set_speed<span style="color: #008000;">&#40;</span><span style="color:#800080;">0.7</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Read through the code. If a comment doesn&#8217;t make sense to you and doesn&#8217;t further your understanding of the code &#8212; get rid of it. Otherwise you will just waste mental energy on trying to understand that comment on each future reading of the code.</p>
<p>The same goes for dead code that has been commented or #ifdef&#8217;ed out. Get rid of it. It&#8217;s there in the source repository if you need it.</p>
<p>Even when comments are correct and useful, remember that you will be doing a lot of refactoring of the code. The comments may no longer be correct when you are done. And there is no unit test in the world that can tell you if you have &#8220;broken the comments&#8221;.</p>
<p>Good code needs few comments because the code itself is clearly written and easy to understand. Variables with good names do not need comments explaining their purpose. Functions with clear inputs and outputs and no special cases or gotchas require little explanation. Simple, well written algorithms can be understood without comments. Asserts document expectations and preconditions.</p>
<p>In many cases the best thing to do is just to get rid of all old comments, focus on making the code clear and readable, and then add back whatever comments are needed &#8212; now reflecting the new API and your own understanding of the code.</p>
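<p>As a small sketch of an assert doing the job of a comment, take the &#8220;p cannot be NULL here&#8221; example from above. The <em>Mover</em> struct and the free function <em>set_speed()</em> are hypothetical stand-ins for whatever <em>p</em> points to in the real code.</p>

```cpp
#include <cassert>

// Hypothetical type standing in for whatever `p` points to.
struct Mover {
	float speed;
};

// The precondition is stated in code instead of in a comment, so a debug
// build stops right here if a caller ever breaks it.
inline void set_speed(Mover *p, float speed)
{
	assert(p != 0);
	p->speed = speed;
}
```

<p>Unlike the comment, the assert cannot silently become a lie: if a refactoring makes a null pointer possible, the first debug run that hits it will say so.</p>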
<h2>7. Get rid of shared mutable state</h2>
<p>Shared mutable state is the single biggest problem when it comes to understanding code, because it allows for spooky &#8220;action at a distance&#8221;, where one piece of code changes how a completely different piece of code behaves. People often say that multithreading is difficult. But really, it is the fact that the threads share mutable state that is the problem. If you get rid of that, multithreading is not so complex.</p>
<p>Since your goal is to write high-performance software, you won&#8217;t be able to get rid of all mutable state, but your code can still benefit enormously from reducing it as much as possible. Strive for programs that are &#8220;almost functional&#8221; and make sure you know exactly what state you are mutating where and why.</p>
<p>Shared mutable state can come from several different places:</p>
<ul>
<li>
<p>Global variables. The classic example. By now everybody surely knows that global variables are bad. But note (and this is a distinction that people sometimes fail to make), that it is only shared <em>mutable</em> state that is problematic. Global <em>constants</em> are not bad. Pi is not bad. Sprintf is not bad.</p>
</li>
<li>
<p>Objects &#8212; big bags of fun. Objects are a way for a large number of functions (the methods) to implicitly share a big bag of mutable state (the members). If a lazy programmer needs to pass some information around between methods, she can just make a new member that they can read and write as they please. It&#8217;s almost like a global variable. How fun! The more members and the more methods an object has, the bigger this problem is.</p>
</li>
<li>
<p>Megafunctions. You have heard about them. These mythic creatures that dwell in the deepest recesses of the darkest codebases. Broken programmers talk about them in dusky bars, their sanity shattered by their encounters: &#8220;I just kept scrolling and scrolling. I couldn&#8217;t believe my eyes. It was 12 000 lines long.&#8221;</p>
<p>When functions are big enough, their local variables are almost as bad as global variables. It becomes impossible to tell what effect a change to a local variable might have 2 000 lines further down in the code.</p>
</li>
<li>
<p>Reference and pointer parameters. Reference and pointer parameters that are passed without <em>const</em> can be used to subtly share mutable state between the caller, the callee and anyone else who might be passed the same pointer.</p>
</li>
</ul>
<p>Here are some practical ideas for getting rid of shared mutable state:</p>
<ul>
<li>
<p>Split big functions into smaller ones.</p>
</li>
<li>
<p>Split big objects into smaller ones by grouping members that belong together.</p>
</li>
<li>
<p>Make members private.</p>
</li>
<li>
<p>Change methods to be <em>const</em> and return the result instead of mutating state.</p>
</li>
<li>
<p>Change methods to be <em>static</em> and take their arguments as parameters instead of reading them from shared state.</p>
</li>
<li>
<p>Get rid of objects entirely and implement the functionality as pure functions without side effects.</p>
</li>
<li>
<p>Make local variables <em>const</em>.</p>
</li>
<li>
<p>Change pointer and reference arguments to <em>const</em>.</p>
</li>
</ul>
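<p>A couple of the ideas above can be sketched with a before-and-after pair. The <em>StatsBefore</em> class and the <em>average()</em> function are made-up examples, not code from any real project.</p>

```cpp
#include <cassert>

// Before: two methods communicate through members, so what average()
// returns depends on which calls to accumulate() happened earlier.
class StatsBefore {
public:
	StatsBefore() : _sum(0.0f), _count(0) {}
	void accumulate(const float *values, unsigned n)
	{
		for (unsigned i = 0; i < n; ++i)
			_sum += values[i];
		_count += n;
	}
	float average() { return _count ? _sum / _count : 0.0f; }
private:
	float _sum;
	unsigned _count;
};

// After: a pure function. Everything it reads comes in through const
// parameters and everything it produces is the return value.
inline float average(const float *values, unsigned n)
{
	assert(values || n == 0);
	if (n == 0)
		return 0.0f;
	float sum = 0.0f;
	for (unsigned i = 0; i < n; ++i)
		sum += values[i];
	return sum / n;
}
```

<p>The second version can be read, tested and called from several threads without knowing anything about what happened before the call.</p>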
<h2>8. Get rid of unnecessary complexity</h2>
<p>Unnecessary complexity is often a result of over-engineering &#8212; where the support structures (for serialization, reference counting, virtualized interfaces, abstract factories, visitors, etc) dwarf the code that performs the actual functionality.</p>
<p>Sometimes over-engineering occurs because software projects start out with a lot more ambitious goals than what actually gets implemented. More often, I think, it reflects the ambitions/esthetics of a programmer who has read books on design patterns and the waterfall model and believes that over-engineering makes a product &#8220;solid&#8221; and &#8220;high-quality&#8221;.</p>
<p>Often, the heavy, rigid, overly complex model that results is unable to adapt to feature requests that were not anticipated by the designer. Those features are then implemented as hacks, bolt-ons and backdoors on top of the ivory tower resulting in a schizophrenic mix of absolute order and utter chaos.</p>
<p>The cure against over-engineering is YAGNI &#8212; you are not gonna need it! Only build the things that you <em>know</em> you need. Add more complicated stuff <em>when</em> you need it, not before.</p>
<p>Some practical ideas for cleaning out unnecessary complexity:</p>
<ul>
<li>
<p>Remove the functionality you are not using (as suggested above).</p>
</li>
<li>
<p>Simplify necessary concepts, and get rid of unneeded ones.</p>
</li>
<li>
<p>Remove unnecessary abstractions, replace with concrete implementations.</p>
</li>
<li>
<p>Remove unnecessary virtualization and simplify object hierarchies.</p>
</li>
<li>
<p>If only one setting is ever used, get rid of the possibility of running the module in other configurations.</p>
</li>
</ul>
<h2>9. That is all</h2>
<p>Now go clean your room!</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/08/18/cleaning-bad-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A simpler design for asynchronous APIs</title>
		<link>http://www.altdevblogaday.com/2012/08/03/a-simpler-design-for-asynchronous-apis/</link>
		<comments>http://www.altdevblogaday.com/2012/08/03/a-simpler-design-for-asynchronous-apis/#comments</comments>
		<pubDate>Fri, 03 Aug 2012 22:33:54 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[asynchronous]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27115</guid>
		<description><![CDATA[<p>Accessing Internet services, e.g. to fetch a web page or to store data on a leaderboard, requires an asynchronous API. You send a request and then, at some later point, you receive a reply.</p>
<p><a href="http://www.altdevblogaday.com/2012/08/03/a-simpler-design-for-asynchronous-apis/" class="more-link">Read more on A simpler design for asynchronous APIs&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Accessing Internet services, e.g. to fetch a web page or to store data on a leaderboard, requires an asynchronous API. You send a request and then, at some later point, you receive a reply.</p>
<p>Asynchronous APIs are trickier to design than synchronous ones. You can&#8217;t simply return the result of the operation, since it isn&#8217;t ready yet. Instead you have to wait until it is done and then send it to the caller through some other channel. This often results in designs that are needlessly complicated and cumbersome to work with.</p>
<h2>Callbacks</h2>
<p>The most common approach is perhaps to use callbacks. You make the asynchronous request and when it completes the callback is called. The callback can either be a global system-wide callback, or (which is nicer) a callback that you supply when you make the asynchronous call.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">leaderboard<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>set_score<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">100</span>, set_score_cb, my_user_data<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">void</span> set_score_cb<span style="color: #008000;">&#40;</span>SetScoreResult <span style="color: #000040;">*</span>result, <span style="color: #0000ff;">void</span> <span style="color: #000040;">*</span>user_data<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
   ...
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>I have already mentioned in a <a class="link" href="http://www.altdevblogaday.com/2011/02/11/managing-coupling-part-2-%E2%80%94-polling-callbacks-and-events/">previous article</a> that I&#8217;m not too fond of callbacks and that I prefer polling in most cases. Badly designed polling can be expensive, but in the case of asynchronous network operations we wouldn&#8217;t expect to have more than a dozen or so in-flight at any one time, which means the cost of polling is negligible.</p>
<p>Callbacks tend to make code worse. There are several reasons.</p>
<p>First, you usually have little control over <em>when</em> a callback happens. This means that it can happen at a time that isn&#8217;t very suitable to you. For cleanliness, you may want to do all your leaderboard processing in your <tt>update_leaderboard()</tt> function. But the callback might be called outside <tt>update_leaderboard()</tt>, messing up all your carefully laid plans.</p>
<p>Second, it can be tricky to know what you can and cannot do in a callback. The code that calls you might make some assumptions that you inadvertently violate. These things can sometimes be really tricky to spot. Consider something as simple as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> n <span style="color: #000080;">=</span> _leaderboard_operations.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000040;">!</span><span style="color: #000080;">=</span>n<span style="color: #008080;">;</span> <span style="color: #000040;">++</span>i<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>done<span style="color: #008000;">&#40;</span>_leaderboard_operations<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
		do_callback<span style="color: #008000;">&#40;</span>_leaderboard_operations<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>This looks perfectly innocent. But if the callback happens to do something that changes the <tt>_leaderboard_operations</tt> vector, for example by posting a new request or removing an old one, the code can blow up with memory access errors. I have been bitten by things like this many times. By now, every time I see a callback a warning clock goes off in my head: &#8220;danger, danger &#8212; there is a callback here, remember that when you make a callback <em>anything</em> can happen&#8221;.</p>
<p>Sometimes it can be necessary to double buffer data to get rid of bugs like this.</p>
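<p>One way to double buffer the loop above is to move the operations into a local buffer before making any callbacks. The following is only a sketch under assumptions: the <tt>Operation</tt> and <tt>Leaderboard</tt> types and the <tt>post()</tt> and <tt>update()</tt> functions are hypothetical names made up for the illustration.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical operation record; all names here are illustrative only.
struct Operation {
    bool done;
    void (*callback)(Operation &op, void *user_data);
    void *user_data;
};

struct Leaderboard {
    std::vector<Operation> operations;

    // Callbacks are allowed to call this while update() is running.
    void post(const Operation &op) { operations.push_back(op); }

    // Makes callbacks for finished operations and returns how many were made.
    std::size_t update()
    {
        // Move the live list into a local buffer. A callback that posts a
        // new operation now appends to the (empty) member vector and cannot
        // invalidate the buffer we are iterating over.
        std::vector<Operation> processing;
        processing.swap(operations);

        std::size_t calls = 0;
        std::vector<Operation> unfinished;
        for (std::size_t i = 0; i != processing.size(); ++i) {
            if (processing[i].done) {
                processing[i].callback(processing[i], processing[i].user_data);
                ++calls;
            } else {
                unfinished.push_back(processing[i]);
            }
        }

        // Keep the unfinished operations, followed by whatever the
        // callbacks posted during this update.
        unfinished.insert(unfinished.end(), operations.begin(), operations.end());
        operations.swap(unfinished);
        return calls;
    }
};
```

<p>The price is an extra copy of the list every update, which is negligible for a dozen operations, but it is one more thing you have to remember to do whenever callbacks are involved.</p>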
<p>Third, callbacks always happen in the wrong <em>context</em>. You get the callback in some &#8220;global&#8221;, &#8220;top-level&#8221; context, and from there you have to drill down to the code that actually knows what to do with the information. (Typically by casting the <tt>user_data</tt> pointer to some class and calling a member function on it.) This makes the code hard to follow.</p>
<p>In other words, callbacks lead to hard-to-read code, hard-to-follow code flow, subtle bugs, redundant boilerplate forwarding stubs and instruction cache misses. Bleh!</p>
<h2>Request objects</h2>
<p>Another common approach is to have some sort of <em>request object</em> that represents the asynchronous operation. Something like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">SetScoreRequest <span style="color: #000040;">*</span>request <span style="color: #000080;">=</span> _leaderboard<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>set_score<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">100</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
...
<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>request<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>is_done<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">bool</span> success <span style="color: #000080;">=</span> request<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>result<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000dd;">delete</span> request<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>Or perhaps, using the C++11 concepts of promises and futures (I have only a passing acquaintance with C++11, so forgive me if I mess something up):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">std<span style="color: #008080;">::</span><span style="color: #007788;">promise</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">bool</span><span style="color: #000080;">&gt;</span> <span style="color: #000040;">*</span>promise <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> std<span style="color: #008080;">::</span><span style="color: #007788;">promise</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">bool</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
_leaderboard<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>set_score<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">100</span>, promise<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
...
<span style="color: #007788;">std</span><span style="color: #008080;">::</span><span style="color: #007788;">future</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">bool</span><span style="color: #000080;">&gt;</span> future <span style="color: #000080;">=</span> promise<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>get_future<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>future.<span style="color: #007788;">valid</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">bool</span> success <span style="color: #000080;">=</span> future.<span style="color: #007788;">get</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	_leaderboard<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>forget_promise<span style="color: #008000;">&#40;</span>promise<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000dd;">delete</span> promise<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>
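<p>A caveat about the sketch above: in standard C++11, <tt>std::future::valid()</tt> only tells you whether the future refers to a shared state, not whether a value has been stored in it, and <tt>get_future()</tt> may only be called once per promise. A non-blocking readiness check is usually written with <tt>wait_for()</tt> and a zero timeout. The helper name <tt>is_ready()</tt> below is my own, not a standard library function.</p>

```cpp
#include <chrono>
#include <future>

// Returns true if the result can be read with get() without blocking.
// is_ready() is a hypothetical helper, not part of the standard library.
template <typename T>
bool is_ready(const std::future<T> &f)
{
    return f.valid() &&
        f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

<p>The caller would obtain the future once, right after creating the promise, and then poll it with <tt>is_ready()</tt> each frame before calling <tt>get()</tt>.</p>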

<p>This is a lot better than the callback approach, but still in my view, overly complicated. It is clearly a design based on the object-oriented philosophy of &#8212; when in doubt, make more objects.</p>
<p>But these extra objects don&#8217;t really <em>do</em> much. They just act as pointless intermediaries that pass some information back and forth between our code and the <tt>_leaderboard</tt> object. And they are a hassle for the caller to keep track of. She must store them somewhere and make sure to delete them when she is done to avoid memory leaks.</p>
<p>Furthermore, if we want to expose this API to a scripting language, such as Lua, we have to expose these extra objects as well.</p>
<h2>ID tokens</h2>
<p>As readers of <a class="link" href="http://www.altdevblogaday.com/2011/01/26/managing-decoupling/">my previous articles</a> know, I&#8217;m a big fan of using IDs. Instead of exposing internal system objects to the caller of an API, I prefer to give the caller IDs that uniquely identify the objects and provide functions for obtaining information about them.</p>
<p>That way, I am free to organize my internal data however I like. And it is easier to see when the state of my objects might mutate, since all calls go through a single API.</p>
<p>With this approach the interface would look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">unsigned</span> set_score<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> value<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">enum</span> SetScoreResult <span style="color: #008000;">&#123;</span>SSR_IN_PROGRESS, SSR_SUCCESS, SSR_FAILURE, SSR_NO_INFORMATION<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
SetScoreResult set_score_result<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that there are no objects that the user must maintain and release. The ID can easily be manipulated by a scripting layer. If the user doesn&#8217;t need to know if the operation succeeded, she can just throw away the returned ID.</p>
<p>In this API I don&#8217;t have any method for freeing tokens. I don&#8217;t want to force the user to do that, since it is both a hassle (the user must track all IDs and decide who <em>owns</em> them) and error prone (easy to forget to release an ID).</p>
<p>But obviously, we must free tokens <em>somehow</em>. We can&#8217;t store the results of the <em>set_score()</em> operations forever. If we did, we would eventually run out of memory.</p>
<p>There are several ways you could approach this problem. My preferred solution in this particular case is to just have a fixed limit on the number of operations that we remember. Since we don&#8217;t expect more than a dozen simultaneous operations, if we make room for 64, we have plenty of slack and still use only 64 bytes of memory. A modest amount by any standard.</p>
<p>We can keep the results in a round-robin buffer:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #666666;">/// Maximum number of requests whose result we remember.</span>
<span style="color: #0000ff;">static</span> <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">int</span> MAX_IN_FLIGHT <span style="color: #000080;">=</span> <span style="color: #0000dd;">64</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">/// The result of the last MAX_IN_FLIGHT requests.</span>
<span style="color: #0000ff;">char</span> results<span style="color: #008000;">&#91;</span>MAX_IN_FLIGHT<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">/// Number of requests that have been made.</span>
<span style="color: #0000ff;">unsigned</span> num_requests<span style="color: #008080;">;</span>
&nbsp;
SetScoreResult set_score_result<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> id<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">// If more than MAX_IN_FLIGHT requests have been made after this one,</span>
	<span style="color: #666666;">// the information about it is lost.</span>
	<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>num_requests <span style="color: #000040;">-</span> id <span style="color: #000080;">&gt;</span> MAX_IN_FLIGHT<span style="color: #008000;">&#41;</span>
		<span style="color: #0000ff;">return</span> SSR_NO_INFORMATION<span style="color: #008080;">;</span>
&nbsp;
	<span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span>SetScoreResult<span style="color: #008000;">&#41;</span>results<span style="color: #008000;">&#91;</span>id <span style="color: #000040;">%</span> MAX_IN_FLIGHT<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>This means that you can only ask about the result of the last 64 operations. On the other hand, this solution uses very little memory, does not allocate anything, has very quick lookups and doesn&#8217;t require the user to explicitly free tokens. </p>
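<p>For completeness, here is a sketch of how the rest of the bookkeeping around that buffer could look. It assumes that IDs are handed out sequentially starting from zero. The <tt>ScoreRequests</tt> struct and the <tt>begin()</tt> and <tt>complete()</tt> functions are hypothetical names, and <tt>complete()</tt> stands in for whatever the network layer does when the server replies.</p>

```cpp
enum SetScoreResult {SSR_IN_PROGRESS, SSR_SUCCESS, SSR_FAILURE, SSR_NO_INFORMATION};

/// Maximum number of requests whose result we remember.
static const unsigned MAX_IN_FLIGHT = 64;

// Hypothetical container for the round-robin bookkeeping.
struct ScoreRequests {
    char results[MAX_IN_FLIGHT];
    unsigned num_requests;

    ScoreRequests() : num_requests(0) {}

    // Starts a request and returns its ID. The slot of the request made
    // MAX_IN_FLIGHT requests ago is reused.
    unsigned begin()
    {
        unsigned id = num_requests++;
        results[id % MAX_IN_FLIGHT] = SSR_IN_PROGRESS;
        return id;
    }

    // Records the server reply, unless the slot has already been reused.
    void complete(unsigned id, bool success)
    {
        if (num_requests - id > MAX_IN_FLIGHT)
            return;
        results[id % MAX_IN_FLIGHT] = success ? SSR_SUCCESS : SSR_FAILURE;
    }

    // An ID that was never issued makes the unsigned subtraction wrap
    // around to a huge value, so it also reports SSR_NO_INFORMATION.
    SetScoreResult result(unsigned id) const
    {
        if (num_requests - id > MAX_IN_FLIGHT)
            return SSR_NO_INFORMATION;
        return static_cast<SetScoreResult>(results[id % MAX_IN_FLIGHT]);
    }
};
```

<p>Note that nothing here allocates memory or needs to be released by the caller; an old ID simply stops returning useful information once its slot has been reused.</p>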
<p>To me, this added simplicity and flexibility outweighs the disadvantage of having a limit on the maximum number of in-flight operations that we support.</p>
<h2>Implicit APIs</h2>
<p>In many cases, the best solution to asynchronous conundrums is to redesign the API to abstract away the entire concept of <em>asynchronous operations</em>, so that the user doesn&#8217;t even have to bother with it.</p>
<p>This can require some creative rethinking in order to focus on what it is the user <em>really</em> wants to do. For our leaderboard example, we might come up with this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #666666;">/// Sets the score to the specified value. This is an asynchronous operation.</span>
<span style="color: #666666;">/// You can use acknowledged_score() to find out when it has completed.</span>
<span style="color: #0000ff;">void</span> set_score<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> score<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">/// Returns the last score that has been acknowledged by the server.</span>
<span style="color: #0000ff;">int</span> acknowledged_score<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>This is probably all that the user needs to know.</p>
<p>Now we have <em>really</em> simplified the API. The user still needs to be aware that <tt>set_score()</tt> isn&#8217;t propagated to the server immediately, but she doesn&#8217;t at all have to get involved in what asynchronous operations are performed and how they progress.</p>
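<p>Behind such an interface, the implementation could be as small as the sketch below. It keeps at most one request in flight and always sends the most recent score. The <tt>poll_send()</tt> and <tt>on_reply()</tt> functions are hypothetical hooks that stand in for the actual network layer, and the member names are made up for the illustration.</p>

```cpp
// Hypothetical implementation sketch of the implicit leaderboard API.
struct Leaderboard {
    int desired;       // last value passed to set_score()
    int acknowledged;  // last value confirmed by the server
    int in_flight;     // value of the outstanding request, if any
    bool sending;      // true while a request is outstanding

    Leaderboard() : desired(0), acknowledged(0), in_flight(0), sending(false) {}

    void set_score(int score) { desired = score; }
    int acknowledged_score() const { return acknowledged; }

    // Called once per frame. Returns true and writes the score to transmit
    // when a new request should be issued.
    bool poll_send(int &score_to_send)
    {
        if (sending || desired == acknowledged)
            return false;
        sending = true;
        in_flight = desired;
        score_to_send = in_flight;
        return true;
    }

    // Called when the server replies to the outstanding request. After a
    // failure the next poll_send() retries with the latest score.
    void on_reply(bool success)
    {
        if (success)
            acknowledged = in_flight;
        sending = false;
    }
};
```

<p>A nice side effect of this design is that intermediate scores that are set while a request is outstanding are never sent at all; only the latest value matters.</p>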
<p>This kind of radical rewrite might not be possible (or even desirable) for all asynchronous systems. You have to balance the value of high-level abstractions and simplifications against the need for low-level control. But it is almost always worth exploring the possibility since it can lead to interesting ideas and dramatically simplified APIs.</p>
<p>For example, the interface for an asynchronous web fetcher could be as simple as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>fetch<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>url<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>If called with a URL that hadn&#8217;t been fetched yet, the function would issue a request for the URL and return <em>NULL</em>. Once the data was available, the function would return it. On the next call, the data would be freed. To fetch a web page, you would just repeatedly call the function with a URL until you got a reply.</p>
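<p>A minimal sketch of the bookkeeping behind such a function might look like this. It is written under assumptions: I interpret &#8220;the next call&#8221; as the next call to <tt>fetch()</tt> with any URL, the actual network request is left out, and <tt>Fetcher</tt> and <tt>deliver()</tt> are hypothetical names, with <tt>deliver()</tt> standing in for the network layer handing over a reply.</p>

```cpp
#include <map>
#include <string>

// Hypothetical sketch; the real network code is left out.
class Fetcher {
    struct Entry {
        Entry() : ready(false) {}
        bool ready;
        std::string data;
    };
    typedef std::map<std::string, Entry> Entries;

public:
    // Returns nullptr until the data for `url` has arrived, then returns it
    // once. The returned pointer is valid until the next call to fetch().
    const char *fetch(const char *url)
    {
        _returned.clear();  // free the data handed out by the previous call
        Entries::iterator it = _entries.find(url);
        if (it == _entries.end()) {
            _entries[url] = Entry();  // a real version would issue the request here
            return nullptr;
        }
        if (!it->second.ready)
            return nullptr;
        _returned.swap(it->second.data);
        _entries.erase(it);
        return _returned.c_str();
    }

    // Called by the (hypothetical) network layer when a reply arrives.
    void deliver(const char *url, const char *data)
    {
        Entries::iterator it = _entries.find(url);
        if (it == _entries.end())
            return;  // nobody asked for this URL
        it->second.data = data;
        it->second.ready = true;
    }

private:
    Entries _entries;
    std::string _returned;
};
```

<p>Because the entry is erased when the data is handed out, asking for the same URL again simply starts a new request.</p>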
<p>Quite fetching, wouldn&#8217;t you say?</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/08/03/a-simpler-design-for-asynchronous-apis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Matrices, Rotation, Scale and Drifting</title>
		<link>http://www.altdevblogaday.com/2012/07/03/matrices-rotation-scale-and-drifting/</link>
		<comments>http://www.altdevblogaday.com/2012/07/03/matrices-rotation-scale-and-drifting/#comments</comments>
		<pubDate>Tue, 03 Jul 2012 12:10:46 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[floating-point math]]></category>
		<category><![CDATA[matrices]]></category>
		<category><![CDATA[transforms]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26814</guid>
		<description><![CDATA[<p>If you are using Matrix4x4s to store your node transforms and want to support scaling you are facing an annoying numerical problem: rotating a node causes its scale to drift from the original value.</p>
<p><a href="http://www.altdevblogaday.com/2012/07/03/matrices-rotation-scale-and-drifting/" class="more-link">Read more on Matrices, Rotation, Scale and Drifting&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>If you are using Matrix4x4s to store your node transforms and want to support scaling you are facing an annoying numerical problem: rotating a node causes its scale to drift from the original value.</p>
<h2>The cause of drifting</h2>
<p>Drifting happens because in a Matrix4x4 the rotation and the scale are stored together in the upper left 3&#215;3 part of the matrix:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/07/scale1.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/07/scale1-300x300.png" alt="" width="300" height="300" class="aligncenter size-medium wp-image-26815" /></a></p>
<p>This means that if we want to change the rotation of a Matrix4x4 without affecting the scale we must extract the scale and reapply it:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">void set_rotation(Matrix4x4 &amp;pose, const Quaternion &amp;rot)
{
     Vector3 s = scale(pose);
     Matrix3x3 rotm = matrix3x3(rot);
     scale(rotm, s);
     set_3x3(pose, rotm);
}</pre></td></tr></table></div>

<p>The problem here is that since floating point computation is imprecise, <tt>scale(pose)</tt> is not guaranteed to be exactly the same before this operation as after. Numerical errors will cause a very small difference. So even though we only intended to rotate the node we have inadvertently made it ever so slightly bigger (or smaller).</p>
<p>Does it matter? Sure, it is annoying that an object that we didn&#8217;t want to have any scaling at all suddenly has a scale of 1.0000001, but surely such a small change would be imperceptible and couldn&#8217;t affect gameplay.</p>
<p>True, if we only rotated the object once. However, if we are dealing with an animated or spinning object we will be changing its rotation every frame. So if the error is 0.0000001 the first frame, it might be 0.0000002 the second frame and 0.0000003 the third frame.</p>
<p>Note that the error growth is linear rather than geometric because the error added in each iteration is proportional to the current <em>scale</em>, not to the current error. I.e., it is proportional to <em>(1 + e)</em> rather than <em>e</em>. We can assume that <em>1 &gt;&gt; e</em>, because otherwise we already have a clearly visible error.</p>
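<p>As a rough worked version of that argument, assume (as a simplification, since real rounding errors vary from frame to frame) that every call multiplies the scale by the same factor <em>1 + &delta;</em>, where <em>&delta;</em> is a tiny relative rounding error. The symbols <em>s<sub>n</sub></em> (scale after <em>n</em> frames) and <em>&delta;</em> are my own notation:</p>

```latex
s_{n+1} = s_n\,(1+\delta)
\quad\Longrightarrow\quad
s_n = s_0\,(1+\delta)^n
    = s_0\left(1 + n\delta + \binom{n}{2}\delta^2 + \cdots\right)
    \approx s_0\,(1 + n\delta)
\qquad (n\delta \ll 1)
```

<p>So as long as the accumulated error <em>n&delta;</em> is much smaller than 1, the higher-order terms are negligible and the error grows by roughly the same amount every frame.</p>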
<p>I ran a test using our existing math code. Rotating a transform using the method described above yields the following result:</p>
<table>
<tr>
<th>Error </th>
<th>Frames </th>
<th>Time (at 60 Hz)</th>
</tr>
<tr>
<td>0.000001</td>
<td>202 </td>
<td>3 s</td>
</tr>
<tr>
<td>0.000002 </td>
<td>437 </td>
<td>7 s</td>
</tr>
<tr>
<td>0.000005 </td>
<td>897 </td>
<td>15 s</td>
</tr>
<tr>
<td>0.000010 </td>
<td>1654 </td>
<td>28 s</td>
</tr>
<tr>
<td>0.000020 </td>
<td>3511 </td>
<td>58 s</td>
</tr>
<tr>
<td>0.000050 </td>
<td>8823 </td>
<td>2 min</td>
</tr>
<tr>
<td>0.000100 </td>
<td>14393 </td>
<td>4 min</td>
</tr>
<tr>
<td>0.000200 </td>
<td>24605</td>
<td>7 min</td>
</tr>
<tr>
<td>0.000500 </td>
<td>52203 </td>
<td>15 min</td>
</tr>
<tr>
<td>0.001000 </td>
<td>100575 </td>
<td>28 min</td>
</tr>
</table>
<p>As you can see, after 28 minutes we have an error of 0.1 %. At this point, it starts to get noticeable.</p>
<p>You could debate whether this is something that needs fixing. Maybe you can live with the fact that objects grow by 0.1 % every half hour, because your game sessions are short and the small scale differences will never be noticed. However, since Bitsquid is a general purpose engine, we need a better solution to the problem.</p>
<p>At this point, you might be asking yourself why this problem only happens when we introduce scaling. Don&#8217;t we have the same issue with just translation and rotation? No, because translation and rotation are stored in completely separate parts of the matrix:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/07/scale2.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/07/scale2-300x300.png" alt="" width="300" height="300" class="aligncenter size-medium wp-image-26816" /></a></p>
<p>Setting the rotation doesn&#8217;t touch any of the position elements and can&#8217;t introduce errors in them, and vice versa.</p>
<h2>Solutions to scale drifting</h2>
<p>I can think of four possible solutions to this problem:</p>
<blockquote>
<ul>
<li>
<p>Store rotation and scale separately</p>
</li>
<li>
<p>Always set rotation and scale together</p>
</li>
<li>
<p>Quantize the scale values</p>
</li>
<li>
<p>Prevent systematic errors</p>
</li>
</ul>
</blockquote>
<h2>Solution 1: Store rotation and scale separately</h2>
<p>The root cause of the problem is that rotation and scale are jumbled together in the Matrix4x4. We can fix that by separating them. So instead of using a Matrix4x4 we would store our pose as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">struct Pose {
      Vector3 translation;
      Matrix3x3 rotation;
      Vector3 scale;
};</pre></td></tr></table></div>

<p>With the pose stored like this, changing the rotation does not touch the scale values, so we have eliminated the problem of drifting.</p>
<p>Note that this representation is actually using slightly less memory than a Matrix4x4 &#8212; 15 floats instead of 16. (We could reduce the storage space even further by storing the rotation as a quaternion, but then it would be more expensive to convert it to matrix form.)</p>
<p>However, the representation is not as convenient as a Matrix4x4. We can&#8217;t compose it or compute its inverse with regular matrix operations, as we can do for a Matrix4x4. We could write custom operations for that, or we could just convert this representation to a temporary Matrix4x4 whenever we needed those operations.</p>
<p>Converting to a Matrix4x4 requires initializing the 16 floats (some with values from the pose) and 9 floating point multiplications (to apply the scale). What kind of a performance impact would this have?</p>
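<p>To make the cost concrete, here is a sketch of what the conversion could look like. The type definitions are minimal stand-ins, not our actual math library, and the layout (row-vector convention with the translation in the last row and the scale applied along each rotated axis) is an assumption made for the illustration.</p>

```cpp
// Minimal stand-in types; a real math library would have richer ones.
struct Vector3 { float x, y, z; };
struct Matrix3x3 { float m[3][3]; };  // row i is local axis i (assumed convention)
struct Matrix4x4 { float m[4][4]; };

struct Pose {
    Vector3 translation;
    Matrix3x3 rotation;
    Vector3 scale;
};

// Builds a temporary Matrix4x4 from a Pose. to_matrix4x4() is a
// hypothetical name.
Matrix4x4 to_matrix4x4(const Pose &p)
{
    const float s[3] = {p.scale.x, p.scale.y, p.scale.z};
    Matrix4x4 r;
    for (int i = 0; i != 3; ++i) {
        // Scale each rotation axis: 3 x 3 = 9 multiplications in total.
        for (int j = 0; j != 3; ++j)
            r.m[i][j] = p.rotation.m[i][j] * s[i];
        r.m[i][3] = 0.0f;
    }
    r.m[3][0] = p.translation.x;
    r.m[3][1] = p.translation.y;
    r.m[3][2] = p.translation.z;
    r.m[3][3] = 1.0f;
    return r;
}
```

<p>Everything except the nine multiplications is plain copying and constant initialization.</p>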
<p>I would guess that the part of the codebase that would be most affected would be the scene graph local-to-world transformation. With this solution, you would want to store the local transform as a Pose and the world transform as a Matrix4x4. The local-to-world transform requires about 36 multiplications and 36 additions (says my quick estimate). So adding a temp Matrix4x4 conversion would take you from 72 to 81 floating point operations.</p>
<p>So a very rough estimate is that this change would make your scene graph transforms about 12 % more expensive. Likely, the real value is less than that since you probably have additional overhead costs that are the same for both methods. And of course, the scene graph transforms are just one small (and parallelizable) part of what your engine does. We rarely spend more than 2 % of our frame time there, meaning the total performance hit is something like 0.2 %.</p>
<p>I think that is a quite reasonable price to pay for a neat solution to the problem of drifting, but you may disagree of course. Also, perhaps the use of Matrix4x4s is so ingrained in your code base that it is simply not possible to change it. So let&#8217;s look at the other possible solutions.</p>
<h2>Solution 2: Always set rotation and scale together</h2>
<p>The fundamental problem with <em>set_rotation()</em> is that we try to change just the orientation of the node without affecting the scale. Extracting the scale and reapplying it is what causes the drifting.</p>
<p>If we don&#8217;t allow the user to just change the rotation, but force her to always set the scale and the rotation together, the problem disappears:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">void set_rotation_and_scale(Matrix4x4 &amp;pose, const Quaternion &amp;rot, const Vector3 &amp;s)
{
    Matrix3x3 rotm = matrix3x3(rot);
    scale(rotm, s);
    set_3x3(pose, rotm);
}</pre></td></tr></table></div>

<p>Since we have eliminated the step where we extract the scale and feed it back, we have rid ourselves of the feedback loop that caused runaway drifting. Of course, we haven&#8217;t completely eliminated the problem, because nothing prevents the user from emulating what we did in <em>set_rotation()</em> and recreating the feedback loop:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">Vector3 s = scale(pose);
set_rotation_and_scale(pose, new_rotation, s);</pre></td></tr></table></div>

<p>Now the drifting problem is back with a vengeance, reintroduced by the user.</p>
<p>To prevent drifting the user must take care not to create such feedback loops. I.e., she can never extract the scale from the matrix. Instead she must store the scale at some other place (separate from the matrix) so that she can always feed the matrix with the correct scale value.</p>
<p>What we have done is essentially to move the burden of keeping track of the scale of objects from the transform (the Matrix4x4) to the user of the transform. This prevents drifting and doesn&#8217;t have any performance costs, but it is pretty inconvenient for the user to have to track the scale of objects manually. Also, it is error prone, since the user who is not 100 % certain of what she is doing can accidentally recreate the feedback loop that causes drifting.</p>
<h2>Solution 3: Quantize the scale values</h2>
<p>If neither of the two options presented so far seems palatable to you, there is actually a third possibility.</p>
<p>Consider what would happen if we changed the <em>Vector3 scale(const Matrix4x4 &amp;)</em> function so that it always returned integer values.</p>
<p>Calling <em>set_rotation()</em> as before would introduce an error to the scale and set it to, say 1.0000001. But the next time we ran <em>set_rotation()</em> and asked for the scale it would be rounded to the nearest integer value, so it would be returned as 1 &#8212; the correct value. Applying the new rotation would again introduce an error and change the value to 1.0000001, but then again, the next time the function ran, the value returned would be snapped back to 1.</p>
<p>So by rounding the returned scale to fixed discrete values we prevent the feedback loop. We still get small errors in the scale, but without the runaway effect they are unnoticeable. (Small errors occur everywhere, for example in the scene graph transforms. That&#8217;s the nature of floating point computation. It is not the small errors that are the problem but the mechanisms that can cause them to result in visible effects.)</p>
<p>Of course, if we round to integer values we can only scale an object by 1, 2, 3, etc. Not by 0.5, for instance. But we can fix that by using some other set of discrete numbers for the scale. For example, we could round to the nearest 0.0001. This would let us have scales of 0.9998, 0.9999, 1.0000, 1.0001, 1.0002, … Hopefully that is enough precision to cover all the different scales that our artists might want to use.</p>
<p>Drifting won&#8217;t happen in this scheme, because the floating point errors will never be big enough to change the number to the next discrete value. (Unless you used really large scale values. If you want to support that &#8212; typically not interesting, because things like texture and vertex resolution start to look wonky &#8212; you could use a geometric quantization scheme instead of an arithmetic one.)</p>
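<p>As a minimal sketch of the quantization step, rounding to the nearest 0.0001 could look like this. The helper name <em>quantize_scale()</em> and its default step are illustrative assumptions, not engine code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">#include &lt;cmath&gt;

// Hypothetical helper: snap one scale component to the nearest multiple of step.
inline float quantize_scale(float s, float step = 0.0001f)
{
	return std::round(s / step) * step;
}</pre></td></tr></table></div>

<p>In this sketch, <em>scale()</em> would apply the helper to each component of the extracted scale before returning it, while <em>set_scale()</em> would be left untouched.</p>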
<p>Snapping the scale values in this way might be OK for static scaling. But what if you want to smoothly change the scaling with an animation? Won&#8217;t the discrete steps cause visible jerks in the movement?</p>
<p>Actually not. Remember that it is only the value returned by <em>scale()</em> that is quantized; the user is still free to call <em>set_scale()</em> with any non-quantized value. When the scale is driven by an animation, it is fed from an outside source. We don&#8217;t need to read it from the matrix and reapply it. So the quantization that happens in <em>scale()</em> never comes into play.</p>
<p>So amazingly enough, this hacky solution of snapping the scale to a fixed set of discrete values actually seems to work for most real world problems. There might be situations where it would cause trouble, but I can&#8217;t really come up with any.</p>
<h2>Solution 4: Prevent systematic errors</h2>
<p>A final approach is to try to address <em>how</em> the numbers are drifting instead of stopping them from drifting. If you look at the table above you see that the errors are growing linearly. That is not what you would expect if the errors were completely random.</p>
<p>If the errors in each iteration were completely random, you would get a <a class="link" href="http://en.wikipedia.org/wiki/Random_walk">random walk</a> process where the total error would be <em>e * sqrt(N)</em> rather than <em>e * N</em>, where <em>e</em> is the error from one iteration and <em>N</em> the number of iterations. The fact that the error grows linearly tells us that our computation has a systematic bias &#8212; the error is always pushed in one particular direction.</p>
<p>If we could get rid of this systematic bias and get a truly random error, the accumulated error would grow much more slowly. The square root makes all the difference. For example, for the error to grow to 0.1 % it would take 5.2 years rather than 28 minutes. At that point, we might be OK with the drifting.</p>
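<p>To see where a figure of that magnitude comes from, here is a rough back-of-the-envelope calculation. It assumes one <em>set_rotation()</em> call per frame at 60 frames per second (an assumption made for this sketch), and both function names are made up for illustration:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">// If a systematic error e needs n_linear steps to reach a threshold
// (e * n_linear == threshold), a random-walk error needs n_random steps with
// e * sqrt(n_random) == threshold, so n_random == n_linear * n_linear.
inline double random_walk_steps(double n_linear)
{
	return n_linear * n_linear;
}

// Converts a number of frames to years at the given frame rate.
inline double frames_to_years(double frames, double frames_per_second)
{
	const double seconds_per_year = 365.25 * 24.0 * 60.0 * 60.0;
	return frames / frames_per_second / seconds_per_year;
}</pre></td></tr></table></div>

<p>Under those assumptions, 28 minutes is 100 800 frames. Squaring that gives roughly 10<sup>10</sup> frames, which <em>frames_to_years()</em> converts to a little over five years, the same ballpark as the figure quoted above.</p>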
<p>I haven&#8217;t thought that much about what would be needed to get rid of the systematic bias in the <em>set_rotation()</em> function. It&#8217;s a pretty tricky problem that requires a deep understanding of what happens to all the floating point numbers as they travel through the equations.</p>
<h2>Conclusion</h2>
<p>In the Bitsquid engine we have so far gone with #2, as a makeshift until we decide on the best permanent solution to this problem. After reviewing the options in this article I think we will most likely go with #1. #3 is an interesting hack and I think it would work almost everywhere, but I&#8217;m willing to pay the slight performance price for the cleaner and clearer solution of #1.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/07/03/matrices-rotation-scale-and-drifting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hack Day Report</title>
		<link>http://www.altdevblogaday.com/2012/06/19/hack-day-report/</link>
		<comments>http://www.altdevblogaday.com/2012/06/19/hack-day-report/#comments</comments>
		<pubDate>Tue, 19 Jun 2012 21:36:21 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[collision]]></category>
		<category><![CDATA[hack-day]]></category>
		<category><![CDATA[particles]]></category>
		<category><![CDATA[physics]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26673</guid>
		<description><![CDATA[<p>Last Friday, we had our second <em>hack day</em> (aka <em>do-what-you-want day</em>, aka <em>google day</em>) at the office.</p>
<p>Different companies seem to take different approaches to hack days. At some places it just means that you can spend a certain percentage of your working week on your own projects. We wanted something that was a bit more focused and felt more like a special event, so we used the following approach:</p>
<p><a href="http://www.altdevblogaday.com/2012/06/19/hack-day-report/" class="more-link">Read more on Hack Day Report&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Last Friday, we had our second <em>hack day</em> (aka <em>do-what-you-want day</em>, aka <em>google day</em>) at the office.</p>
<p>Different companies seem to take different approaches to hack days. At some places it just means that you can spend a certain percentage of your working week on your own projects. We wanted something that was a bit more focused and felt more like a special event, so we used the following approach:</p>
<ul>
<li>
<p>People were encouraged to pick tasks that could be completed, or taken to a &#8220;proof-of-concept&#8221; level in a single day. The goal was that at the end of the day you should have something interesting to <em>show/tell</em> your colleagues.</p>
</li>
<li>
<p>It is ok to fail of course. Failure is often interesting. Trying crazy ideas with a significant risk of spectacular failure is part of the charm of a hack day.</p>
</li>
<li>
<p>A couple of days before the event, everybody presented their projects. The idea was to get everybody to start thinking about the topics, so that we could help each other with ideas and suggestions.</p>
</li>
<li>
<p>We ate breakfast together in the morning to start the discussions and get everybody in the spirit of the event. At the end of the day, we rounded off with a couple of beers.</p>
</li>
<li>
<p>We avoided Skype, email and meetings during the day, so that we could focus 100 % on the projects.</p>
</li>
<li>
<p>A couple of days after the event we had a small show &amp; tell, where everybody could present what they had learned.</p>
</li>
</ul>
<h2>Results</h2>
<p>A number of interesting projects came out of this hack day:</p>
<ul>
<li>
<p>Tobias and Mats created an improved highlighting system for indicating selected objects in the level editor. (Highlighting the OOBB works well for small objects, but for big things like landscapes and sub-levels, it is just confusing.)</p>
</li>
<li>
<p>Jim looked into a cross-platform solution for capturing screen shots and videos on target machines and transmitting them over the network.</p>
</li>
<li>
<p>Andreas created a Lua profiling tool that can dynamically enable and disable profiling for any Lua function by hot-patching the code with profiler calls.</p>
</li>
<li>
<p>Finally, I rewrote the collision algorithm for our particle systems.</p>
</li>
</ul>
<p>Being an egotistical bastard, I will focus on my own project.</p>
<p>Particle collision is one of those annoying things for which it is difficult to find a good general solution, for two reasons:</p>
<ul>
<li>
<p>It ties together two completely different systems (particles and physics), creating an ugly coupling between them. Since the solution must have decent performance, the coupling must be done at a fairly low level, which makes it even worse.</p>
</li>
<li>
<p>Particles can have <em>very</em> different collision requirements. Some effects need a massive number of particles (e.g., sparks), but don&#8217;t care that much about collision quality. As long as <em>most</em> of them bounce <em>somewhat</em> accurately, it is OK. Other effects may have just a single particle (e.g., a bullet casing). Performance doesn&#8217;t matter at all, but if it doesn&#8217;t bounce right you will surely notice. Handling both effects in the same system is a challenge. Having different systems for different effects is another kind of challenge.</p>
</li>
</ul>
<p>My previous attempts at implementing particle collision have all been based on first cutting out a slice of the physics world around the particle effect and then trying to find a fast representation of the collision shapes in that world slice.</p>
<p>The problem with this approach is that there are a lot of variables to tweak and tune:</p>
<ul>
<li>
<p>How big should the world slice be?</p>
</li>
<li>
<p>How much detail should there be in the simplified representation? More detail is slower, but gives better collision results.</p>
</li>
<li>
<p>What kind of representation should we use?</p>
</li>
<li>
<p>How should we handle dynamic/moving objects? How often should the world slice be updated?</p>
</li>
</ul>
<p>I&#8217;ve tried a lot of different representations: a triangle soup, a collection of half-spheres, a height field, but none of them has given completely satisfactory results. Often, parameters that work for one effect at one location fail for a different effect at a different location. Both performance and behavior are hard to predict.</p>
<p>The main idea for the new approach came from a Naughty Dog presentation at GDC. Instead of trying to create a shared collision model for all particles, we give each particle <em>its own</em> collision model, and we store it inside the particle itself, together with the other particle data.</p>
<p>Of course, it would be expensive to store a complicated collision model inside every particle, so we use the simplest model possible: a plane. We can represent that by a normal and an offset from origin. So with this approach, the data for a particle might look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">struct Particle {
	Vector3 position;
	Vector3 velocity;
	Color8 color;
	Vector3 collision_plane_normal;
	float collision_plane_offset;
};</pre></td></tr></table></div>

<p>(Side note: Our particle data doesn&#8217;t actually look like this. We use a &#8220;structure-of-arrays&#8221; approach rather than an &#8220;array-of-structures&#8221;, and we don&#8217;t have a fixed set of fields; each effect has its own set.)</p>
<p>Note that we don&#8217;t bother with any flag for indicating whether there is a plane or not. If there is no collision, we just put the collision plane far enough below the origin.</p>
<p>With this approach the collision test is super fast &#8212; just a dot product and a compare. It is also really easy to parallelize the test or run it off-CPU, since it just uses local particle data and doesn&#8217;t need to access any shared memory.</p>
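<p>A minimal sketch of that test is shown below. The plane convention (points on the plane satisfy <em>dot(normal, p) == offset</em>, with the normal pointing away from the surface), the stand-in <em>Vector3</em> type and the function names are assumptions made for this example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">// Stand-in vector type for this sketch.
struct Vector3 { float x, y, z; };

inline float dot(const Vector3 &amp;a, const Vector3 &amp;b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true if the particle has passed through its collision plane.
// One dot product and one compare, using only the particle's own data.
inline bool is_behind_plane(const Vector3 &amp;position, const Vector3 &amp;normal, float offset)
{
	return dot(normal, position) &lt; offset;
}</pre></td></tr></table></div>

<p>With this convention, &#8220;no collision&#8221; can be expressed as an upward-facing normal and a very large negative offset, so that the test never triggers for any reasonable particle position.</p>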
<p>With this method we have divided the original collision problem into two simpler ones:</p>
<ul>
<li>
<p>Collision test against a plane. (Trivial.)</p>
</li>
<li>
<p>Finding a suitable collision plane for each particle.</p>
</li>
</ul>
<p>This means that if we want to, we can use different approaches for finding the collision planes for different effects. E.g., for static effects we could hard code the collision plane and avoid collision queries completely.</p>
<p>Generally, we can find a suitable collision plane for a particle by raycasting along its trajectory. If we didn&#8217;t have any performance constraints, we could do a raycast for every particle every frame. That way we would always know what surface the particle would hit next, which means that we would get perfect collision behavior.</p>
<p>Of course, we can&#8217;t <em>actually</em> do that. Raycasts are comparatively expensive and we want to be able to support large numbers of particles.</p>
<p>To control the performance, I exposed a parameter that lets the effect designer control how many raycasts per frame an effect is allowed to make. A typical value of 1.0 means that every frame, one particle in the effect is picked at random, a raycast is performed along that particle&#8217;s trajectory and its collision plane is updated with the result.</p>
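<p>One way to implement such a budget, sketched here under the assumption that the parameter may be fractional (so 0.5 would mean one raycast every other frame), is to carry the remainder from frame to frame. The struct and function names are hypothetical:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">#include &lt;cstdlib&gt;

// Hypothetical per-effect budget. rate is the designer's raycasts-per-frame
// value; accumulated holds the fractional remainder between frames.
struct RaycastBudget {
	float rate;
	float accumulated;
};

// Returns how many raycasts the effect may perform this frame.
inline unsigned raycasts_this_frame(RaycastBudget &amp;b)
{
	b.accumulated += b.rate;
	unsigned n = (unsigned)b.accumulated;
	b.accumulated -= (float)n;
	return n;
}

// Picks a random particle index in [0, particle_count).
// particle_count must be greater than zero.
inline unsigned pick_random_particle(unsigned particle_count)
{
	return (unsigned)std::rand() % particle_count;
}</pre></td></tr></table></div>

<p>Each frame, the effect would call <em>raycasts_this_frame()</em> and, for each raycast it is granted, pick a particle, cast a ray along its velocity and store the resulting plane in that particle.</p>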
<p>Note that with this solution, the work is always evenly distributed over the duration of the effect. That is a lot nicer than what you typically get with the &#8220;world slice&#8221; approach where there is a big chunk of work in the beginning when you cut out the world slice. </p>
<p>Astute readers will have noticed a fatal flaw with the design as it has been presented so far: it can&#8217;t possibly work for very many particles. If we have an effect with 1 000 particles and do one raycast per frame, it will take about 33 seconds at 30 frames per second before every particle has found its collision plane. By then, they will long since have fallen through the floor.</p>
<p>So, if we want to use this approach for large numbers of particles we must be able to somehow reuse the collision results. Typically, an effect will have bundles of particles traveling in approximately the same direction. When one such particle has done a raycast and found a collision, we want to be able to share the result with its neighbors somehow.</p>
<p>I wanted to find a solution to this without having to create a complicated collision representation, because that would bring back many of the problems I had with the &#8220;world slice&#8221; approach. Eventually, I decided that since what we want to do is to cache a collision query of the form:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">(position, direction) -&gt; collision_plane</pre></td></tr></table></div>

<p>the simplest possible thing would be to store the results in a hash table. Hash tables are nice, predictable data structures with well-known performance characteristics.</p>
<p>To be able to hash on position and direction we must quantize them to integer values. We can quantize the position by dividing the world into cells of a certain width and height:</p>

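<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">	const float cell_side = 0.5f;   // cell size along x and y
	const float cell_height = 2.0f; // cell size along z
	// floorf() keeps all cells the same size. A plain float-to-int cast
	// truncates toward zero and would merge the two cells around the origin.
	int ix = (int)floorf(position.x / cell_side);
	int iy = (int)floorf(position.y / cell_side);
	int iz = (int)floorf(position.z / cell_height);
	uint64 key = HASH_3(ix, iy, iz);</pre></td></tr></table></div>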

<p>In this example, I use a higher resolution in the xy-plane than along the z-axis, because that is typically where the more interesting features are. <tt>HASH_3()</tt> is a macro that performs the first three rounds of the <em>murmur_hash</em> algorithm.</p>
<p>To quantize the direction we can use a similar approach. I decided to quantize the direction to just six different values, depending on which principal axis the particle is mostly traveling along:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">	unsigned id;
	if (fabsf(dir.x) &gt;= fabsf(dir.y) &amp;&amp; fabsf(dir.x) &gt;= fabsf(dir.z))
		id = dir.x &gt; 0 ? 0 : 1;
	else if (fabsf(dir.y) &gt;= fabsf(dir.z))
		id = dir.y &gt; 0 ? 2 : 3;
	else
		id = dir.z &gt; 0 ? 4 : 5;
	key = key ^ id;</pre></td></tr></table></div>

<p>Now that we have computed a quantized representation of <em>(position, direction)</em>, we can use that as a lookup key into our hash, both for storing and fetching values:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">	struct CollisionPlane {
		Vector3 normal;
		float offset;
	};
	HashMap&lt;uint64, CollisionPlane&gt; _cache;</pre></td></tr></table></div>

<p>(Side note: Unless I&#8217;m worried about hash function collisions, I prefer to hash my keys <em>before</em> I insert them in the <em>HashMap</em> and just use a <tt>HashMap&lt;uint64,T&gt;</tt> instead of <tt>HashMap&lt;MyComplicatedKeyStruct,T&gt;</tt>. That way the hash map uses less memory and lookups can be done with a simple modulo operation.)</p>
<p>Whenever I do a particle raycast I store the result in the cache. When particles are spawned they look up their collision plane in the cache. Particles also query the cache every time they bounce, since that typically means they will be traveling in a new direction.</p>
<p>I have a maximum size that the cache is allowed to use. When the cache reaches the maximum size, older entries are thrown out.</p>
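<p>A bounded cache along these lines could be sketched as follows. This is not the engine&#8217;s implementation: it uses <em>std::unordered_map</em> instead of the engine&#8217;s <em>HashMap</em>, stores the plane as four floats, and assumes that &#8220;older entries&#8221; means oldest by insertion order (FIFO). The class name <em>PlaneCache</em> is made up for the example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;deque&gt;
#include &lt;unordered_map&gt;

struct CollisionPlane { float nx, ny, nz, offset; };

// Maps pre-hashed (position, direction) keys to planes, with FIFO eviction.
class PlaneCache {
public:
	explicit PlaneCache(std::size_t max_entries) : _max_entries(max_entries) {}

	// Stores or overwrites the plane for key, evicting the oldest entry if full.
	void store(std::uint64_t key, const CollisionPlane &amp;plane)
	{
		if (_max_entries == 0) return;
		auto it = _planes.find(key);
		if (it != _planes.end()) { it-&gt;second = plane; return; }
		if (_planes.size() &gt;= _max_entries) {
			_planes.erase(_order.front());
			_order.pop_front();
		}
		_planes[key] = plane;
		_order.push_back(key);
	}

	// Returns the cached plane for key, or nullptr on a cache miss.
	const CollisionPlane *lookup(std::uint64_t key) const
	{
		auto it = _planes.find(key);
		return it == _planes.end() ? nullptr : &amp;it-&gt;second;
	}

private:
	std::size_t _max_entries;
	std::unordered_map&lt;std::uint64_t, CollisionPlane&gt; _planes;
	std::deque&lt;std::uint64_t&gt; _order; // keys in insertion order
};</pre></td></tr></table></div>

<p>On a cache miss, a particle would simply keep its current plane until one of the budgeted raycasts fills in the cell.</p>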
<h2>Results</h2>
<p>The system gives high quality results for effects with few particles (because you get lots of raycasts per particle) and is still able to handle massive amounts of particles. The performance load is evenly distributed and it doesn&#8217;t need any special cases for dynamic objects.</p>
<p>There are some drawbacks. The cache requires some tweaking. Since it can only store one collision plane for each quantization cell it will miss important features if the cells are too big. On the other hand, if the cells are too small, we need lots of entries in the cache to represent the world, which means more memory and slower lookups.</p>
<p>Since we only have one collision normal per particle, there are some things that the particles just can&#8217;t do. For example, they can never come to rest at the bottom of a V-shape, because they will always only be colliding with one of the planes in the V. Overall, they will behave pretty badly in corners, where several collision planes with different normals meet. Some of these issues could be fixed by storing more than one collision plane in the particle, but I don&#8217;t think it is worth it. I prefer the simpler approach and having particles that in some tricky situations can fall through the ground.</p>
<p>Compared to the old collision code, the new code is simpler, runs faster and looks better.</p>
<p>All in all, I would say that the hack day was a success. We had great fun and produced some useful stuff. We will definitely do more days like this in the future. Not too often though. I think it is important that these days feel like a special treat. If they become too mundane, something important is lost. Once a month or so would be ideal, I think.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/06/19/hack-day-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Read my lips: No more loading screens</title>
		<link>http://www.altdevblogaday.com/2012/06/04/read-my-lips-no-more-loading-screens/</link>
		<comments>http://www.altdevblogaday.com/2012/06/04/read-my-lips-no-more-loading-screens/#comments</comments>
		<pubDate>Mon, 04 Jun 2012 23:27:19 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[resource management]]></category>
		<category><![CDATA[resources]]></category>
		<category><![CDATA[streaming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26495</guid>
		<description><![CDATA[<p>I&#8217;ve already talked a bit about how we manage resources in the Bitsquid engine, for example in:</p>
<ul>
<li>
<p><a class="link" href="http://www.bitsquid.se/files/resource_management.html">http://www.bitsquid.se/files/resource_management.html</a> (a bit out of date now)</p>
</li>
<li>
<p><a class="link" href="http://bitsquid.blogspot.se/2012/03/cutting-pipe-achieving-sub-second.html">http://bitsquid.blogspot.se/2012/03/cutting-pipe-achieving-sub-second.html</a> (my GDC talk)</p>
</li>
</ul>
<p>Today I want to focus on a particular aspect of resource management that I haven&#8217;t discussed in any great detail before: <em>streaming</em>.</p>
<p><a href="http://www.altdevblogaday.com/2012/06/04/read-my-lips-no-more-loading-screens/" class="more-link">Read more on Read my lips: No more loading screens&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve already talked a bit about how we manage resources in the Bitsquid engine, for example in:</p>
<ul>
<li>
<p><a class="link" href="http://www.bitsquid.se/files/resource_management.html">http://www.bitsquid.se/files/resource_management.html</a> (a bit out of date now)</p>
</li>
<li>
<p><a class="link" href="http://bitsquid.blogspot.se/2012/03/cutting-pipe-achieving-sub-second.html">http://bitsquid.blogspot.se/2012/03/cutting-pipe-achieving-sub-second.html</a> (my GDC talk)</p>
</li>
</ul>
<p>Today I want to focus on a particular aspect of resource management that I haven&#8217;t discussed in any great detail before: <em>streaming</em>.</p>
<p>Streaming is good for two things:</p>
<ul>
<li>
<p>Getting rid of loading screens.</p>
</li>
<li>
<p>Showing more detail/variety than we can fit in memory.</p>
</li>
</ul>
<p>This is 2012, and according to the Mayans it is the last year anyone should have to be forced to look at a loading screen. Since I&#8217;m not directly involved in games production, but only make an engine, I can afford to make such broad, sweeping statements.</p>
<h2>A quick recap</h2>
<p>Since the two links above are quite long reads, let&#8217;s start with a quick summary of how our resource system works.</p>
<p>A <em>resource</em> in the Bitsquid engine is a piece of data uniquely identified by its <em>name</em> and <em>type</em>. For example:</p>
<table>
<tr>
<th>Type</th>
<th>Name</th>
</tr>
<tr>
<td>unit</td>
<td>units/beings/player</td>
</tr>
<tr>
<td>texture</td>
<td>vegetation/grass/tall_grass_01</td>
</tr>
<tr>
<td>wav</td>
<td>music/soft_jazz</td>
</tr>
</table>
<p>The resource files are created by our tools. They are human readable files written in a JSON-like format. Before they can be used in the runtime, they need to be <em>compiled</em>. The data compiler compiles each resource into a platform specific optimized binary blob:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming1.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming1.png" alt="" width="600" height="215" class="aligncenter size-full wp-image-26496" /></a></p>
<p>For loading and unloading, the resources are grouped into <em>packages</em>. A package (itself a resource) is a list of resources that can be loaded and unloaded together.</p>
<p>In a very small game, the entire game could be a single package. In a larger game a package could contain all the resources for a particular level, or a tile set that is used in multiple levels. The gameplay programmers decide when to load and unload packages.</p>
<p>For the final release of the game, the packages are converted to <em>bundles</em>. A bundle contains all the compiled resources in a package, concatenated into a single file that is compressed by a stream compression filter. The engine loads the entire bundle in one go, without seeking. This is crucial for optical media, but it also really speeds up hard drive performance.</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming2.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming2.png" alt="" width="600" height="125" class="aligncenter size-full wp-image-26497" /></a></p>
<h2>Getting rid of loading screens</h2>
<p>To get rid of loading screens we can use <em>package streaming</em>. By this I simply mean the ability to load new packages in the background while the engine is doing other things. This means that when the player is approaching a new area in the game, we can start downloading that data in the background, while she continues playing. When she arrives at the area, the data is already in memory and she can proceed without having to wait for a loading screen.</p>
<p>In the Bitsquid engine, packages are <em>always</em> loaded in the background, so package streaming is enabled by default. The gameplay programmer can tell the engine to start loading a package. The loading is handled by a separate background thread and the gameplay programmer can poll the engine every frame to determine if the loading has completed or not. When the loading is done she can start to use the new data.</p>
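<p>The start-then-poll pattern can be illustrated with standard C++ as below. This is only a sketch of the idea, not the Bitsquid API: <em>PackageLoad</em>, <em>begin_load()</em> and <em>has_loaded()</em> are hypothetical names, and <em>std::async</em> stands in for the engine&#8217;s background loading thread:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">#include &lt;chrono&gt;
#include &lt;future&gt;
#include &lt;string&gt;

// Hypothetical handle for a package load running on a background thread.
struct PackageLoad {
	std::future&lt;bool&gt; result;

	// Non-blocking poll, intended to be called once per frame.
	bool has_loaded() const
	{
		return result.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
	}
};

// Starts loading in the background. load_fn is whatever actually reads the
// package; it takes the package name and returns true on success.
template &lt;typename LoadFn&gt;
PackageLoad begin_load(const std::string &amp;name, LoadFn load_fn)
{
	PackageLoad load;
	load.result = std::async(std::launch::async,
		[name, load_fn] { return load_fn(name); });
	return load;
}</pre></td></tr></table></div>

<p>The game loop keeps running and checks <em>has_loaded()</em> each frame. Only when it returns true does the gameplay code start using the new resources.</p>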
<p>It is up to the gameplay programmer to decide what the engine should do during the loading time. She can choose to stall the engine (ugh), show a loading screen (semi-ugh) or do something more interesting (yes, 2012!).</p>
<p>There are many different ways of organizing and structuring packages for successful streaming and the engine lets you use whichever method you prefer. Different solutions might work for different games. A game with linear progression could have a package for each &#8220;stage&#8221; and trigger download of the next one when the current is almost complete. An open world game could have a package for each section of the map, together with additional packages for &#8220;special&#8221; locations. A game with random encounters could have separate packages for the different encounters you can run into.</p>
<p>The fact that the engine doesn&#8217;t have a hard-wired streaming model gives the designer a lot of power and flexibility. But with that power also comes a greater responsibility. The designer must bear the entire burden of setting up the packages correctly and deciding when to load them and unload them.</p>
<p>Perhaps in the future, we will pick one or two streaming models as &#8220;standard solutions&#8221; and provide some convenience functionality for them. You will of course still have the option to drop into &#8220;manual&#8221; mode for full flexibility if needed.</p>
<h2>Showing more detail</h2>
<p>For showing more detail than can fit in memory we use something that I, for lack of a better term, call <em>resource streaming</em>. What resource streaming means is simply that for certain resources, we don&#8217;t load the entire resource into memory when we load its package. We only keep the resource partially in memory and stream bits and pieces of it in and out as needed.</p>
<p>The simplest example of a streaming resource is perhaps a video file. Video files can contain hundreds of megabytes of data and we don&#8217;t want to load all that into memory as one big blob. Instead we want to stream it in, frame by frame, as the video is being played.</p>
<p>Remember what I said in the recap, that each resource gets compiled into a platform specific binary blob. Actually, I lied. The truth is that each resource gets compiled into <em>two</em> blobs:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming3.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming3.png" alt="" width="600" height="203" class="aligncenter size-full wp-image-26498" /></a></p>
<p>The first one is the <em>memory-resident</em> blob. That is the one we already know about. It goes into the bundle, gets loaded into memory by the background thread, etc.</p>
<p>The second one is the <em>streaming</em> blob. It contains data that shouldn&#8217;t go straight into memory but instead be accessible to a streaming manager that will move the data in and out of memory as necessary.</p>
<p>Not all data types produce a <em>streaming</em> blob when compiled. In fact, most don&#8217;t. Only types that use resource streaming (such as video) output any streaming data.</p>
<p>It is up to the data compiler for the specific content type to decide what goes into the memory-resident blob and what goes into the streaming blob. For example, the video compiler puts all header information (number of frames, content size, frame rate, etc.) in the memory-resident blob, and just the raw frame data in the streaming blob. That way we can know the size of the video and other useful information without pulling in any stream data.</p>
<p>If you wanted to build a mega-texture solution on top of this (we haven&#8217;t) you would put the lowest MIP-level in the memory-resident data, perhaps together with some kind of index that tells you how to quickly locate data in the stream.</p>
<p>For voice data, you could put the first 100 ms or so of the sound into the memory-resident area and the rest into the stream. That way you can start playing the sound immediately when it is triggered, without waiting for any data to be streamed in.</p>
<p>All the streaming blobs for a particular package are put into a <em>streaming bundle</em> that ends up next to the ordinary bundle for the package. The offset and size of the streaming data for each resource gets stored in the ordinary bundle together with the memory-resident data for that resource. That way, each resource always knows where to find its streaming data in the streaming bundle:</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming4.png"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/06/streaming4.png" alt="" width="600" height="171" class="aligncenter size-full wp-image-26499" /></a></p>
<p>Unlike the ordinary bundle, we expect the stream bundle to be accessed randomly. For that reason it should preferably reside on a hard drive rather than on optical media. For the same reason, we do not compress the stream bundle. Most of the resource formats that you would want to stream already have built-in compression (video, sound, texture). For other resources, you can always add whatever kind of compression you want when you compile the data.</p>
<p>When a bundle is loaded, we open the corresponding stream bundle and keep the file handle around for future reference. Any system that wants to access the stream data can use this handle to (asynchronously) read from the stream bundle.</p>
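<p>As an illustration of how a system might use the stored offset and size, here is a small sketch. The <em>StreamLocation</em> struct and <em>read_stream_data()</em> function are hypothetical, and the read is synchronous <em>stdio</em> for brevity, whereas the engine reads asynchronously:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">#include &lt;cstdint&gt;
#include &lt;cstdio&gt;
#include &lt;vector&gt;

// Hypothetical record stored next to a resource's memory-resident data.
struct StreamLocation {
	std::uint64_t offset; // byte offset into the streaming bundle
	std::uint64_t size;   // size of the resource's streaming data in bytes
};

// Reads count bytes starting local_offset bytes into the resource's streaming
// data. Returns false if the range lies outside the data or the read fails.
// Note: fseek() takes a long, so very large bundles need a 64-bit seek instead.
inline bool read_stream_data(std::FILE *bundle, const StreamLocation &amp;loc,
	std::uint64_t local_offset, std::uint64_t count,
	std::vector&lt;unsigned char&gt; &amp;out)
{
	if (local_offset &gt; loc.size || count &gt; loc.size - local_offset)
		return false;
	if (std::fseek(bundle, (long)(loc.offset + local_offset), SEEK_SET) != 0)
		return false;
	out.resize((std::size_t)count);
	return count == 0 ||
		std::fread(out.data(), 1, out.size(), bundle) == out.size();
}</pre></td></tr></table></div>

<p>A video player, for example, could use its header data to compute <em>local_offset</em> and <em>count</em> for the next frame and request just that range.</p>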
<p>We don&#8217;t provide any general system for deciding when to stream in this data, how much to cache in-memory and when to throw it out. Instead we leave that up to each individual system that supports resource streaming. A video streaming solution will have very different ideas about how to cache data than a texture streaming solution. Forcing them both to adhere to the same model would just complicate and suboptimize things.</p>
<p>With the combination of package streaming and resource streaming you can cover almost any imaginable streaming scenario with a model that is both simple and flexible.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/06/04/read-my-lips-no-more-loading-screens/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing (with) Video</title>
		<link>http://www.altdevblogaday.com/2012/05/20/playing-with-video/</link>
		<comments>http://www.altdevblogaday.com/2012/05/20/playing-with-video/#comments</comments>
		<pubDate>Sun, 20 May 2012 06:49:06 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[bink]]></category>
		<category><![CDATA[H.264]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[VP8]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26301</guid>
		<description><![CDATA[<p>So you want to play some video? Shouldn&#8217;t be too hard, right? Just download some video playing library and call the <em>play_video()</em> function. Easy-peasy-lemon-squeezy.</p>
<p>Well, you have to make sure that the video is encoded correctly, that the library works on all platforms and plays nice with your memory, file, sound and streaming abstractions, and that the audio and video don&#8217;t desynchronize, which for some inexplicable reason seems to be a huge problem.</p>
<p><a href="http://www.altdevblogaday.com/2012/05/20/playing-with-video/" class="more-link">Read more on Playing (with) Video&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>So you want to play some video? Shouldn&#8217;t be too hard, right? Just download some video playing library and call the <em>play_video()</em> function. Easy-peasy-lemon-squeezy.</p>
<p>Well, you have to make sure that the video is encoded correctly, that the library works on all platforms and plays nice with your memory, file, sound and streaming abstractions, and that the audio and video don&#8217;t desynchronize, which for some inexplicable reason seems to be a huge problem.</p>
<p>But this is just technical stuff. We can deal with that. What is worse is that video playback is also a legal morass.</p>
<p>There are literally <em>thousands</em> of broad patents covering different aspects of video decompression. If you want to do some video coding experiments of your own you will have to read, understand and memorize all these patents so that you can carefully tip-toe your code and algorithms around them.</p>
<p>Of course, if you had a big enough pool of patents of your own you might not have to care as much, since if someone sued you, you could sue them right back with something from your own stockpile. Mutually assured destruction through lawyers. Ah, the wonderful world of software patents.</p>
<p>So, creating your own solution is pretty much out of the question. You have to pick one of the existing alternatives and do the best you can with it. In this article I&#8217;m going to look at some different options and discuss the advantages and drawbacks of each one:</p>
<ul>
<li>
<p>Just say no</p>
</li>
<li>
<p>Bink</p>
</li>
<li>
<p>Platform specific</p>
</li>
<li>
<p>H.264</p>
</li>
<li>
<p>WebM</p>
</li>
</ul>
<p>There are other alternatives that didn&#8217;t make it to this list, such as Dirac, Theora, and DivX. I&#8217;ve decided to focus on these five, since in my view H.264 is the best of the commercial formats and WebM the most promising of the &#8220;free&#8221; ones.</p>
<p>An initial idea might be: Why not just do whatever it is <a class="link" href="http://www.videolan.org/vlc/">VLC</a> does? Everybody&#8217;s favorite video player plays pretty much whatever you throw at it and is open source software.</p>
<p>Unfortunately that doesn&#8217;t work, for two reasons. First, VLC&#8217;s code is a mix of GPL and LGPL stuff. Even if you just use the LGPL parts you will run into trouble on platforms that don&#8217;t support dynamic linking. Second, the VLC team doesn&#8217;t really care about patents and just infringes away. You probably cannot afford to do the same. (As a result, there is a very <a class="link" href="http://www.videolan.org/press/patents.html">real threat</a> that VLC might be sued out of existence.)</p>
<h2>A quick introduction</h2>
<p>Before we start looking at the alternatives I want to say something short about what a video file <em>is</em>, since there is some confusion in the matter, even among educated people.</p>
<p>A video file has three main parts:</p>
<ul>
<li>
<p>Video data (H.264, DivX, Theora, VP8, &#8230;)</p>
</li>
<li>
<p>Audio data (MP3, AAC, Vorbis, &#8230;)</p>
</li>
<li>
<p>A container format (AVI, MKV, MP4, Ogg, &#8230;)</p>
</li>
</ul>
<p>The container format is just a way of packing together the audio and video data in a single file, together with some additional information.</p>
<p>The simplest possible container format would be to just concatenate the audio data to the video data and be done with it. But typically we want more functionality. We want to be able to <em>stream</em> the content, i.e., start playing it before we have downloaded the whole file, which means that audio and video data must be multiplexed. We also want to be able to quickly seek to specific time codes, so we may need an index for that. We might also want things like audio tracks in different languages, subtitling, commentary, DVD menus, etc. Container formats can become quite intricate once you start to add all this stuff.</p>
<p>A common source of confusion is that the extension of a video file (.avi, .mkv, .mp4, .ogg) only tells you the container format, not the codecs used for the audio and video data <em>in</em> the container. So a video player may fail to play a file even though it understands the container format (because it doesn&#8217;t understand what&#8217;s inside it).</p>
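<p>To make the distinction concrete, here is a minimal Lua sketch that identifies the container from the first bytes of a file instead of from its extension. The names <em>sniff_container</em> and <em>container_of_file</em> are made up for this example, the list of signatures is far from complete, and the result says nothing about the codecs inside; finding those requires parsing the container itself.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">-- Hypothetical helper: guess the container format from the first 12 bytes.
-- Only a handful of well-known signatures are checked.
local function sniff_container(header)
	if header:sub(1, 4) == "RIFF" and header:sub(9, 12) == "AVI " then
		return "AVI"
	elseif header:sub(1, 4) == "OggS" then
		return "Ogg"
	elseif header:sub(1, 4) == "\26\69\223\163" then
		-- EBML magic 1A 45 DF A3, shared by Matroska and WebM
		return "Matroska/WebM"
	elseif header:sub(5, 8) == "ftyp" then
		return "MP4"
	end
	return "unknown"
end

-- Reads the first 12 bytes of a file and sniffs them.
local function container_of_file(path)
	local file = assert(io.open(path, "rb"))
	local header = file:read(12) or ""
	file:close()
	return sniff_container(header)
end</pre></td></tr></table></div>

<p>A file renamed from <em>.mkv</em> to <em>.avi</em> would still be reported as Matroska/WebM here, which is the point: the extension is only a hint.</p>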
<h2>Option 1: Just say no</h2>
<p>Who says there has to be video in a game? The alternative is to do all cut scenes, splash screens, logos, etc in-game and use the regular renderer for everything. As technology advances and real-time visuals come closer and closer in quality to offline renders, this becomes an increasingly attractive option. It also has a number of advantages:</p>
<ul>
<li>
<p>You can re-use the in-game content.</p>
</li>
<li>
<p>Production is simpler. If you change something you don&#8217;t have to re-render the entire movie.</p>
</li>
<li>
<p>You don&#8217;t have to decide on resolution and framerate; everything is rendered at the user&#8217;s settings.</p>
</li>
<li>
<p>You can dynamically adapt the content, for example dress the players in their customized gear.</p>
</li>
<li>
<p>Having everything be &#8220;in-game visuals&#8221; is good marketing.</p>
</li>
</ul>
<p>If I were making a game I would do everything in-game. But I&#8217;m not, I&#8217;m making an engine. And I can&#8217;t really tell my customers what they can and cannot do. The fact is that there are a number of legitimate reasons for using video:</p>
<ul>
<li>
<p>Some scenes are too complex to be rendered in-game.</p>
</li>
<li>
<p>Producing videos <em>can</em> be simpler than making in-game content, since it is easier to outsource. Anybody can make a video, but only the core team can make in-game content and they may not have much time left on their hands.</p>
</li>
<li>
<p>Playing a video while streaming in content can be used to hide loading times. An in-game scene could be used in the same way, but a high-fidelity in-game scene might require too much memory, not leaving enough for the content that is streaming in.</p>
</li>
</ul>
<p>As engine developers it seems we should at least provide <em>some</em> way of playing video, even if we <em>recommend</em> to our customers to do their cutscenes in-game.</p>
<h2>Option 2: Bink</h2>
<p><a class="link" href="http://www.radgametools.com/bnkmain.htm">Bink</a> from RAD Game Tools is as close as you can get to a de facto standard in the games industry, being used in more than 5800 games on 14 different platforms.</p>
<p>The main drawback of Bink is the pricing. At $8,500 per platform per game it is not exactly expensive, but for a smaller game targeting multiple platforms that is still a noticeable sum.</p>
<p>Many games have quite modest video needs. Perhaps they will just use the video player for a 30-second splash screen at the start of the game and nothing more. Paying $34,000 (4 &#215; $8,500) to get that on four platforms seems excessive.</p>
<p>At Bitsquid our goal has always been to develop an engine that works for both big budget and small budget titles. This means that all the essential functionality of an engine (animation, sound, GUI, video, etc) should be available to the licensees without any additional licensing costs (above what they are already paying for an engine). Licensees who have special interest in one particular area may very well choose to integrate a special middleware package to fulfill their needs, but we don&#8217;t want to force <em>everybody</em> to do that.</p>
<p>So, in terms of video, this means that we want to include a basic video player without the $8,500 price tag of Bink. That video player may not be as performant as Bink in terms of memory and processor use, but it should work well enough for anyone who just wants to play a full screen cutscene or splash screen when the CPU isn&#8217;t doing much else. People who want to play a lot of video in CPU taxing situations can still choose to integrate Bink. For them, the price and effort will be worth it.</p>
<h2>Option 3: Platform specific</h2>
<p>One approach to video playing is to not develop a platform-independent library but instead use the video playing capabilities inherent in each platform. For example, Windows has <em>Windows Media Foundation</em>, MacOS has <em>QuickTime</em>, etc.</p>
<p>Using the platform&#8217;s own library has several advantages. It is free to use, even for proprietary formats, because the platform manufacturers have already paid the license fees for the codecs. (Note though, that for some formats you need a license not just for the player, but for the distribution of content as well.) The implementation is already there, even if the APIs are not the easiest to use.</p>
<p>The biggest advantage is that on low-end platforms, using the built-in platform libraries can give you access to special video decoding hardware. For example, many phones have built-in H.264 decoding hardware. This means you can play video nearly for free, something that otherwise would be very costly on a low-end CPU.</p>
<p>But going platform specific also has a lot of drawbacks. If you target many platforms you have your work cut out for you in integrating all their different video playing backends. It adds an additional chunk of work that you need to do whenever you want to add a new platform. Furthermore, it may be tricky to support the same capabilities on all different platforms. Do they all support the same codecs, or do you have to encode the videos specifically for each platform? Do all platforms support &#8220;play to texture&#8221; or can you only play the videos full screen? What about the sound? Can you extract that from the video and position it as a regular source that reverbs through your 3D sound world? Some platforms (e.g., Windows Vista) have almost no codecs installed by default, forcing you to distribute codecs together with your content.</p>
<p>Since we are developing a generic engine we want to cover as many platforms as possible and minimize the effort required to move a project from one platform to another. For that reason, we need a platform independent library as the primary implementation. But we might want to complement it with platform specific libraries for low end platforms that have built-in decoding hardware.</p>
<h2>Option 4: H.264 (MPEG-4 AVC)</h2>
<p>Over the last few years H.264 has emerged as the most popular commercial codec. It is used in Blu-ray players, video cameras, on iTunes, YouTube, etc. If you want a codec with good tool support and high quality, H.264 is the best choice.</p>
<p>However, H.264 is covered by patents. Patents that need to be licensed if you want to use H.264 without risking a lawsuit.</p>
<p>The H.264 patents are managed by an entity known as MPEG LA. They have gathered all the patents that they believe pertain to H.264 in a &#8220;patent pool&#8221; that you can license all at once, with a single agreement. That patent pool contains 1700 patents. Yes, you read that right. The act of encoding/decoding an H.264 file is covered by 1700 patents. You can find the list in all its 97 page glory at <a class="link" href="http://www.mpegla.com/main/programs/avc/Documents/avc-att1.pdf">http://www.mpegla.com/main/programs/avc/Documents/avc-att1.pdf</a>.</p>
<p>I am not a lawyer, as they say on Slashdot, but this is my best understanding of how this patent game works:</p>
<ul>
<li>
<p>Buying a license from MPEG LA gives you the right to use the 1700 patents in the pool.</p>
</li>
<li>
<p>This doesn&#8217;t mean you can&#8217;t be sued for patent infringement. Anyone who holds a patent that is not one of the 1700 in the pool could claim that H.264 infringes on it and sue you. That seems unlikely, since MPEG LA has made an effort to gather all relevant patents, but there is no way to be certain.</p>
</li>
<li>
<p>MPEG LA doesn&#8217;t by itself go after people who use H.264 without a license; that is up to the holders of the 1700 patents in the pool.</p>
</li>
</ul>
<p>The licensing terms of H.264 are irritating, but not necessarily a big financial burden:</p>
<ul>
<li>
<p>If you distribute an encoder or decoder, the first 100,000 copies are free; after that you have to pay $0.20 per unit.</p>
</li>
<li>
<p>If you distribute an H.264 encoded movie, it is free if it is shorter than 12 minutes; otherwise you have to pay $0.02 per copy.</p>
</li>
</ul>
<p>Note that unlike the case with other popular codecs such as MP3, it is not just the decoder/encoder that you need to license, you also need a license just for distributing H.264 content.</p>
<p>From what I&#8217;ve been able to discern, but don&#8217;t take my word for it, a game that only plays a fixed set of movies/cutscenes would not be regarded as a general decoder (even though it contains decoding software), but rather as content, which means you would pay $0.02 per copy sold if you had more than 12 minutes of video in your game and nothing otherwise (you would still need to obtain a license though).</p>
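<p>As a rough illustration of the arithmetic, here is a small Lua sketch that uses only the figures quoted above. The function names are made up, treating a video of exactly 12 minutes as free is my assumption, and the real license agreement has more terms than this, so take it as a back-of-the-envelope estimate.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">-- Figures as quoted in this article; the actual terms are more detailed.
local FREE_UNITS = 100000    -- encoder/decoder copies free of charge
local FEE_PER_UNIT = 0.20    -- dollars per copy beyond the free units
local FREE_MINUTES = 12      -- content up to this length is free (assumed inclusive)
local FEE_PER_COPY = 0.02    -- dollars per copy of longer content

-- Royalty for distributing an encoder or decoder.
local function codec_royalty(units)
	return math.max(units - FREE_UNITS, 0) * FEE_PER_UNIT
end

-- Royalty for distributing H.264 encoded content, such as a game with cutscenes.
local function content_royalty(minutes_of_video, copies)
	if minutes_of_video > FREE_MINUTES then
		return copies * FEE_PER_COPY
	end
	return 0
end

-- A game with 45 minutes of cutscenes that sells 500,000 copies:
print(content_royalty(45, 500000))
-- The same game with only a 30 second splash video:
print(content_royalty(0.5, 500000))</pre></td></tr></table></div>

<p>Under these assumptions the first game would owe roughly $10,000 and the second nothing, which is why the terms are more of an irritation than a financial burden.</p>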
<p>Of course, if you support H.264, you may also want to support AAC, the standard audio format that accompanies it. AAC is covered by a separate licensing body (Via Licensing) that has its own licensing terms. I haven&#8217;t investigated them in any great detail.</p>
<p>You have to decide for yourself how well these terms sit with you. At Bitsquid we finally decided that if we should have a standard video playing facility, it should be one that people could use without thinking too much about patents and licensing (to the extent that is possible).</p>
<h2>Option 5: VP8 (WebM)</h2>
<p>VP8 is a &#8220;free&#8221; video codec owned by Google. It is covered by patents, but Google has granted free use of those patents and also provides a BSD licensed library <em>libvpx</em> for encoding and decoding video files. The format is also endorsed by the Free Software Foundation.</p>
<p>It is generally acknowledged that when it comes to quality, VP8 is not quite as good as H.264, though the differences are not enormous. So you are trading some visual quality for the convenience of a license free format.</p>
<p>There has been some discussion (most of it about a year ago) about whether VP8 really is unencumbered by patents. MPEG LA claimed that it knew several patents that VP8 was infringing and showed interest in creating a &#8220;patent pool&#8221; for VP8 similar to the one it holds for H.264. Nothing has come of that yet and MPEG LA has not disclosed which patents it thinks VP8 is infringing, which means that Google cannot really respond to the allegations. It is hard to know how much stock to put in this and whether anything will come of it.</p>
<p>You could argue that with this potential &#8220;threat&#8221; against VP8, it would be better to use another &#8220;free&#8221; alternative such as Dirac or Theora. However, there is not much evidence that they would fare better. Everyone who makes a &#8220;free&#8221; codec tries their best to make sure that they don&#8217;t infringe on any patents. But with thousands of these patents around, each open to legal interpretation, there is just no way to be sure.</p>
<p>This is just the sad state of affairs of software patents. And you are not safe with the commercial formats either. Even if you have licensed the 1700 patents in the MPEG LA patent pool, someone can still sue you for violating patent number 1701. <em>No one</em> in this business offers indemnification against patent suits. Not Google. Not MPEG LA. Not Bink (I think).</p>
<p>It all becomes a question of risk. Bink has been around a long time without being sued, which is reassuring. VP8 hasn&#8217;t been around that long.</p>
<p>Will there be patent claims made against VP8? Maybe. Who knows. Similar threats were made against Theora, but nothing happened there. If it does happen, Google will most likely fight back and the whole thing will drag on in the courts. Will there be patent claims against <em>you</em> for using VP8? Seems extremely unlikely. Games are not interesting enough targets. Video decoding is not our main business, and we can easily switch technology if needed. Phone manufacturers, YouTube and TV companies are more interesting targets.</p>
<p>Do you have to care about any of this at all? Up to you to decide.</p>
<h2>Our conclusion</h2>
<p>None of these alternatives are really attractive. It is more about picking the least worst than finding the best, which is frustrating for a perfectionist with self-worth issues. Good cases can be made for all of them. This is what we have decided:</p>
<ul>
<li>
<p>We will provide VP8 decoding on all platforms through libvpx. All other things equal, the &#8220;most free&#8221; format will give us most flexibility in the long run.</p>
</li>
<li>
<p>We will not (at least not right now) support Matroska or other advanced container formats. Instead, we will play the videos from simple IVF streams. Sound will be played as Vorbis files through our regular sound system so we get positioning, reverb, etc.</p>
</li>
<li>
<p>If needed, we will complement this basic approach with platform-specific libraries that take advantage of the hardware decoders on low-end platforms (phones and tablets).</p>
</li>
<li>
<p>Customers that need to play a lot of movies while doing other CPU intense tasks and that aren&#8217;t happy with the performance of libvpx are recommended to look into Bink.</p>
</li>
<li>
<p>Customers that are worried about &#8220;patent risk&#8221; with VP8 are recommended to do whatever their lawyers tell them to do. (Use Bink, a platform specific library, obtain an H.264 license or avoid video altogether.)</p>
</li>
</ul>
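<p>For reference, the IVF container mentioned in the list above is simple enough to parse by hand. The sketch below reads the 32-byte file header as I understand the layout written by the libvpx tools (a &quot;DKIF&quot; signature followed by little-endian fields). The function and field names are made up for this example, and this is not the engine&#8217;s actual loader.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">-- Reads an unsigned little-endian integer of `size` bytes at 1-based `offset`.
local function read_le(data, offset, size)
	local value = 0
	for i = size, 1, -1 do
		value = value * 256 + data:byte(offset + i - 1)
	end
	return value
end

-- Hypothetical helper: parses the 32-byte IVF file header.
local function parse_ivf_header(data)
	assert(#data >= 32, "IVF header is 32 bytes")
	assert(data:sub(1, 4) == "DKIF", "not an IVF file")
	return {
		version      = read_le(data, 5, 2),
		header_size  = read_le(data, 7, 2),
		fourcc       = data:sub(9, 12),       -- "VP80" for VP8
		width        = read_le(data, 13, 2),
		height       = read_le(data, 15, 2),
		timebase_den = read_le(data, 17, 4),  -- e.g. 30 for 30 frames per second
		timebase_num = read_le(data, 21, 4),
		frame_count  = read_le(data, 25, 4),
	}
end</pre></td></tr></table></div>

<p>Each frame that follows has a 12-byte header of its own (a 4-byte frame size and an 8-byte timestamp, both little-endian) and then the compressed frame data, which can be handed to the decoder as-is.</p>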
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/05/20/playing-with-video/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Embracing Dynamism</title>
		<link>http://www.altdevblogaday.com/2012/05/05/embracing-dynamism/</link>
		<comments>http://www.altdevblogaday.com/2012/05/05/embracing-dynamism/#comments</comments>
		<pubDate>Sat, 05 May 2012 21:14:25 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dynamic]]></category>
		<category><![CDATA[lua]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26033</guid>
		<description><![CDATA[<p>Are you stuck in static thinking? Do you see your program as a fixed collection of classes and functions with unchanging behavior?</p>
<p>While that view is mostly true for old school languages such as C++ and Java, the game is different for dynamic languages: Lua, JavaScript, Python, etc. That can be easy to forget if you spend most of your time in the static world, so in this article I&#8217;m going to show some of the tricks you can apply when everything is fluid and malleable.</p>
<p><a href="http://www.altdevblogaday.com/2012/05/05/embracing-dynamism/" class="more-link">Read more on Embracing Dynamism&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Are you stuck in static thinking? Do you see your program as a fixed collection of classes and functions with unchanging behavior?</p>
<p>While that view is mostly true for old school languages such as C++ and Java, the game is different for dynamic languages: Lua, JavaScript, Python, etc. That can be easy to forget if you spend most of your time in the static world, so in this article I&#8217;m going to show some of the tricks you can apply when everything is fluid and malleable.</p>
<p>At Bitsquid our dynamic language of choice is Lua. Lua has the advantage of being fast, fully dynamic, small, simple and having a traditional (i.e. non-LISP-y) syntax. We use Lua for most gameplay code and it interfaces with the engine through an API with exposed C functions, such as <em>World.render()</em> or <em>Unit.set_position()</em>.</p>
<p>I will use Lua in all the examples below, but the techniques can be used in most dynamic languages.</p>
<h2>1. Read-eval-print-loop</h2>
<p>Dynamic languages can compile and execute code at runtime. In Lua, it is as simple as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #0000aa;">loadstring</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff6666;">&quot;print(10*10)&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>This can be used to implement a command console where you can type Lua code and directly execute it in the running game. This can be an invaluable debugging and tuning tool. For example if you need to debug some problem with the bazooka:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">World<span style="color: #66cc66;">.</span>spawn_unit<span style="color: #66cc66;">&#40;</span><span style="color: #ff6666;">&quot;bazooka&quot;</span><span style="color: #66cc66;">,</span> Unit<span style="color: #66cc66;">.</span>position<span style="color: #66cc66;">&#40;</span>player<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>Or tune the player&#8217;s run speed:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">Unit<span style="color: #66cc66;">.</span>set_data<span style="color: #66cc66;">&#40;</span>player<span style="color: #66cc66;">,</span> <span style="color: #ff6666;">&quot;run_speed&quot;</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">4.3</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>
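
<p>A console built on this idea mostly needs error handling around the compile and run steps. Here is a minimal sketch; <em>console_eval</em> is a name made up for this example, and the <em>loadstring or load</em> line is only there so the snippet runs on both Lua 5.1 and later versions.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">-- loadstring exists in Lua 5.1; later versions use load for strings.
local compile = loadstring or load

-- Hypothetical console entry point: compiles and runs one line of Lua.
-- Returns true plus the chunk's results, or false plus an error message.
local function console_eval(line)
	local chunk, compile_error = compile(line, "=console")
	if not chunk then
		return false, compile_error
	end
	-- pcall keeps a runtime error from taking down the game.
	return pcall(chunk)
end

print(console_eval("return 10*10"))      -- success: true and the result
print(console_eval("this is not lua"))   -- syntax error: false and a message
print(console_eval("error('boom')"))     -- runtime error: false and a message</pre></td></tr></table></div>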

<h2>2. Reload code</h2>
<p>The console can be used for more than giving commands; you can also use it to redefine functions. If the gameplay code defines a scoring rule for kills:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> Player<span style="color: #66cc66;">.</span>register_kill<span style="color: #66cc66;">&#40;</span>self<span style="color: #66cc66;">,</span> enemy<span style="color: #66cc66;">&#41;</span>
	self<span style="color: #66cc66;">.</span>score <span style="color: #66cc66;">=</span> self<span style="color: #66cc66;">.</span>score <span style="color: #66cc66;">+</span> <span style="color: #cc66cc;">10</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>you can use the console to redefine the function and change the rules:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> Player<span style="color: #66cc66;">.</span>register_kill<span style="color: #66cc66;">&#40;</span>self<span style="color: #66cc66;">,</span> enemy<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">if</span> enemy<span style="color: #66cc66;">.</span><span style="color: #0000aa;">type</span> <span style="color: #66cc66;">==</span> <span style="color: #ff6666;">&quot;boss&quot;</span> <span style="color: #aa9900; font-weight: bold;">then</span>
		self<span style="color: #66cc66;">.</span>score <span style="color: #66cc66;">=</span> self<span style="color: #66cc66;">.</span>score <span style="color: #66cc66;">+</span> <span style="color: #cc66cc;">100</span>
	<span style="color: #aa9900; font-weight: bold;">else</span>
		self<span style="color: #66cc66;">.</span>score <span style="color: #66cc66;">=</span> self<span style="color: #66cc66;">.</span>score <span style="color: #66cc66;">+</span> <span style="color: #cc66cc;">10</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>Executing this code will replace the existing <em>Player.register_kill</em> function with the new one. All code that previously called the old function will now call the new one and the new scoring rules will apply immediately.</p>
<p>If you take some care with how you use the global namespace you can write your Lua code so that <em>all</em> of it is reloadable using this technique. Then the gameplay programmer can just edit the Lua files on disk and press a key to reload them in-game. The game will continue to run with the new gameplay code, without any need for a reboot. Pretty cool.</p>
<p>You can even get this to work for script errors. If there is an error in the Lua code, don&#8217;t crash the game, just freeze it and allow the gameplay programmer to fix the error, reload the code and continue running.</p>
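<p>What that care looks like depends on the project, but one common pattern is to create each class table only if it does not already exist, so that re-running a file replaces the functions while the live data survives. A minimal sketch of the idea, where <em>reload</em> is a function made up for this example and the source strings stand in for files that a real game would read from disk:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">local compile = loadstring or load   -- Lua 5.1 / later versions

-- Hypothetical reload entry point: recompiles a chunk of gameplay code.
-- On a syntax error the old functions stay in place and the message is returned.
function reload(source)
	local chunk, err = compile(source, "=gameplay")
	if not chunk then return false, err end
	return pcall(chunk)
end

-- Version 1 of the gameplay code. "Player = Player or {}" keeps the
-- existing table (and any state stored in it) when the code is reloaded.
reload([=[
	Player = Player or {}
	Player.score = Player.score or 0
	function Player.register_kill(self, enemy)
		self.score = self.score + 10
	end
]=])
Player:register_kill({type = "grunt"})

-- Version 2, edited while the game is running.
reload([=[
	Player = Player or {}
	Player.score = Player.score or 0
	function Player.register_kill(self, enemy)
		self.score = self.score + (enemy.type == "boss" and 100 or 10)
	end
]=])
Player:register_kill({type = "boss"})
print(Player.score)   -- 110: the 10 points from before the reload were kept</pre></td></tr></table></div>

<p>Had the file started with a plain <em>Player = {}</em>, the second reload would have thrown away the score along with the old function.</p>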
<h2>3. Override system functions</h2>
<p>The functions in the engine API don&#8217;t have any special privileges; they can be redefined just like other Lua functions. This can be used to add custom functionality or for debugging purposes.</p>
<p>Say, for example, that you have some units that are mysteriously popping up all over the level. You know they are being spawned somewhere in the gameplay code, but you can&#8217;t find where. One solution would be to override the <em>World.spawn_unit</em> function and print a stack trace whenever the offending unit is spawned:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">old_spawn_unit <span style="color: #66cc66;">=</span> World<span style="color: #66cc66;">.</span>spawn_unit
<span style="color: #aa9900; font-weight: bold;">function</span> World<span style="color: #66cc66;">.</span>spawn_unit<span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">type</span><span style="color: #66cc66;">,</span> position<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">if</span> <span style="color: #0000aa;">type</span> <span style="color: #66cc66;">==</span> <span style="color: #ff6666;">&quot;tribble&quot;</span> <span style="color: #aa9900; font-weight: bold;">then</span>
		<span style="color: #0000aa;">print</span> <span style="color: #ff6666;">&quot;Tribble spawned by:&quot;</span>
		print_stack_trace<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
	<span style="color: #aa9900; font-weight: bold;">return</span> old_spawn_unit<span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">type</span><span style="color: #66cc66;">,</span> position<span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>Now, whenever a <em>tribble</em> is spawned by the script, a call stack will be printed and we can easily find who is doing the spawning.</p>
<p>Note that before we replace <em>World.spawn_unit</em>, we save the original function in the variable <em>old_spawn_unit</em>. This enables us to call <em>old_spawn_unit()</em> to do the actual spawning.</p>
<p>This technique could also be used to find all (potentially expensive) raycasts being done by the script.</p>
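<p>The raycast case could look something like the sketch below. <em>PhysicsWorld.raycast</em> and its arguments are hypothetical stand-ins, since the real engine function is not shown in this article, and <em>count_calls</em> is a helper made up for this example.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">-- Stand-in for an engine function; the real API is not shown here.
PhysicsWorld = PhysicsWorld or {}
function PhysicsWorld.raycast(from, to)
	return false
end

-- Hypothetical helper: wraps namespace[name] so that every call is counted.
-- Returns one function that reports the count and one that restores the original.
local function count_calls(namespace, name)
	local original = namespace[name]
	local calls = 0
	namespace[name] = function(...)
		calls = calls + 1
		return original(...)   -- pass the results through unchanged
	end
	local function get_count() return calls end
	local function restore() namespace[name] = original end
	return get_count, restore
end

local raycast_count, restore_raycast = count_calls(PhysicsWorld, "raycast")
for i = 1, 3 do
	PhysicsWorld.raycast({0, 0, 0}, {0, 0, i})
end
print("raycasts so far:", raycast_count())
restore_raycast()</pre></td></tr></table></div>

<p>Returning a restore function makes it easy to switch the instrumentation off again from the console once you have the numbers you need.</p>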
<h2>4. Handle deprecated functions</h2>
<p>Sometimes we need to deprecate functions in the engine API. It can be annoying to the people using the engine of course, but backwards compatibility is the mother of stagnation. If you never throw away old code, you will eventually have a huge ugly code mess on your hands.</p>
<p>Luckily, since the script can create functions in the engine namespace, the script can provide the backwards compatibility when needed.</p>
<p>For example, we used to have a function <em>PhysicsWorld.clear_kinematic(world, actor)</em>. That naming was inconsistent with some of our other functions so we changed it to <em>Actor.set_kinematic(actor, false)</em>.</p>
<p>One way of dealing with this change would be to go through all the code in the project, find all uses of <em>PhysicsWorld.clear_kinematic</em> and change them to use <em>Actor.set_kinematic</em> instead. But <em>another</em> way would be to just implement <em>PhysicsWorld.clear_kinematic</em> in the script:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> PhysicsWorld<span style="color: #66cc66;">.</span>clear_kinematic<span style="color: #66cc66;">&#40;</span>world<span style="color: #66cc66;">,</span> actor<span style="color: #66cc66;">&#41;</span>
	Actor<span style="color: #66cc66;">.</span>set_kinematic<span style="color: #66cc66;">&#40;</span>actor<span style="color: #66cc66;">,</span> <span style="color: #aa9900;">false</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>Now the rest of the code can go on using <em>PhysicsWorld.clear_kinematic</em> without even caring that the function has been removed from the engine API. You could even use a combination of the two strategies &#8212; implementing the deprecated function in Lua for a quick fix, and then looking into removing the uses of it.</p>
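<p>If you want the shim to nag a little, it can log a warning the first time it is used. A sketch of that idea follows; <em>deprecate</em> is a helper made up for this example, and the <em>Actor</em> and <em>PhysicsWorld</em> tables are stubs standing in for the engine API.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">-- Stubs standing in for the engine API.
Actor = Actor or {}
PhysicsWorld = PhysicsWorld or {}
function Actor.set_kinematic(actor, enabled)
	actor.kinematic = enabled
end

-- Hypothetical helper: installs namespace[name] as a shim that warns once
-- and then forwards every call to the replacement function.
local function deprecate(namespace, name, message, replacement)
	local warned = false
	namespace[name] = function(...)
		if not warned then
			warned = true
			print("DEPRECATED: " .. name .. ": " .. message)
		end
		return replacement(...)
	end
end

deprecate(PhysicsWorld, "clear_kinematic", "use Actor.set_kinematic(actor, false)",
	function(world, actor)
		return Actor.set_kinematic(actor, false)
	end)

local actor = {kinematic = true}
PhysicsWorld.clear_kinematic(nil, actor)   -- warns, then clears the flag
PhysicsWorld.clear_kinematic(nil, actor)   -- silent the second time</pre></td></tr></table></div>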
<h2>5. Dynamically inserting profiling</h2>
<p>Top-down profiling with explicit profiler scopes is a good way of finding out where a game is spending most of its time. However, to be useful, explicit profiler scopes need to be inserted in all the &#8220;right&#8221; places (all potentially expensive functions).</p>
<p>In C we need to guess where these right places are before compiling the program. In Lua, we can just insert the profiler scopes dynamically. We can even create a function that adds profiling to any function we want:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> profile<span style="color: #66cc66;">&#40;</span>class_name<span style="color: #66cc66;">,</span> method_name<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> f <span style="color: #66cc66;">=</span> <span style="color: #0000aa;">_G</span><span style="color: #66cc66;">&#91;</span>class_name<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#91;</span>method_name<span style="color: #66cc66;">&#93;</span>
	<span style="color: #0000aa;">_G</span><span style="color: #66cc66;">&#91;</span>class_name<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#91;</span>method_name<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #aa9900; font-weight: bold;">function</span> <span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">...</span><span style="color: #66cc66;">&#41;</span>
		Profiler<span style="color: #66cc66;">.</span>start<span style="color: #66cc66;">&#40;</span>class_name <span style="color: #66cc66;">..</span> <span style="color: #ff6666;">&quot;.&quot;</span> <span style="color: #66cc66;">..</span> method_name<span style="color: #66cc66;">&#41;</span>
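		<span style="color: #aa9900; font-weight: bold;">local</span> results <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span>f<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">...</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#125;</span>
		Profiler<span style="color: #66cc66;">.</span>stop<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
		<span style="color: #aa9900; font-weight: bold;">return</span> <span style="color: #0000aa;">unpack</span><span style="color: #66cc66;">&#40;</span>results<span style="color: #66cc66;">&#41;</span>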
	<span style="color: #aa9900; font-weight: bold;">end</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>When we call this function as <em>profile(&#39;Player&#39;, &#39;update&#39;)</em> it will first save the existing <em>Player.update</em> function and then replace it with a function that calls <em>Profiler.start(&quot;Player.update&quot;)</em> before calling the original function and <em>Profiler.stop()</em> before returning.</p>
<p>Using this technique, we can dynamically add profiling to any function we want during our optimization session.</p>
<h2>6. Tab completion</h2>
<p>If you implement an interactive Lua console, it is nice to support tab completion, so the user doesn&#8217;t have to remember all function names. But how do you build the list of callable functions to use with tab completion?</p>
<p>Using Lua of course! Just find all tables (i.e., classes) in the global namespace and all functions stored in those tables:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">t <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
&nbsp;
<span style="color: #aa9900; font-weight: bold;">for</span> class_name<span style="color: #66cc66;">,</span>class <span style="color: #aa9900; font-weight: bold;">in</span> <span style="color: #0000aa;">pairs</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">_G</span><span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">do</span>
	<span style="color: #aa9900; font-weight: bold;">if</span> <span style="color: #0000aa;">type</span><span style="color: #66cc66;">&#40;</span>class<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">==</span> <span style="color: #ff6666;">'table'</span> <span style="color: #aa9900; font-weight: bold;">then</span>
		<span style="color: #aa9900; font-weight: bold;">for</span> function_name<span style="color: #66cc66;">,</span>func <span style="color: #aa9900; font-weight: bold;">in</span> <span style="color: #0000aa;">pairs</span><span style="color: #66cc66;">&#40;</span>class<span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">do</span>
			<span style="color: #aa9900; font-weight: bold;">if</span> <span style="color: #0000aa;">type</span><span style="color: #66cc66;">&#40;</span>func<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">==</span> <span style="color: #ff6666;">'function'</span> <span style="color: #aa9900; font-weight: bold;">then</span>
				t<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">#</span>t<span style="color: #66cc66;">+</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> class_name <span style="color: #66cc66;">..</span> <span style="color: #ff6666;">'.'</span> <span style="color: #66cc66;">..</span> function_name
			<span style="color: #aa9900; font-weight: bold;">end</span>
		<span style="color: #aa9900; font-weight: bold;">end</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>After running this, <em>t</em> will contain the full list of function names.</p>
<h2>7. Looping through all objects</h2>
<p>By recursing through <em>_G</em> you can enumerate all reachable objects in the Lua runtime.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> enumerate<span style="color: #66cc66;">&#40;</span>f<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> seen <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> <span style="color: #aa9900; font-weight: bold;">function</span> recurse<span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span>
		<span style="color: #aa9900; font-weight: bold;">if</span> <span style="color: #0000aa;">type</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">~=</span> <span style="color: #ff6666;">'table'</span> <span style="color: #aa9900; font-weight: bold;">then</span> <span style="color: #aa9900; font-weight: bold;">return</span> <span style="color: #aa9900; font-weight: bold;">end</span>
		<span style="color: #aa9900; font-weight: bold;">if</span> seen<span style="color: #66cc66;">&#91;</span>t<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">==</span> <span style="color: #aa9900;">true</span> <span style="color: #aa9900; font-weight: bold;">then</span> <span style="color: #aa9900; font-weight: bold;">return</span> <span style="color: #aa9900; font-weight: bold;">end</span>
		f<span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span>
		seen<span style="color: #66cc66;">&#91;</span>t<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #aa9900;">true</span>
		recurse<span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">getmetatable</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
		<span style="color: #aa9900; font-weight: bold;">for</span> k<span style="color: #66cc66;">,</span>v <span style="color: #aa9900; font-weight: bold;">in</span> <span style="color: #0000aa;">pairs</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">do</span>
			recurse<span style="color: #66cc66;">&#40;</span>k<span style="color: #66cc66;">&#41;</span>
			recurse<span style="color: #66cc66;">&#40;</span>v<span style="color: #66cc66;">&#41;</span>
		<span style="color: #aa9900; font-weight: bold;">end</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
	recurse<span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">_G</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>Calling <em>enumerate(f)</em> will call <em>f(o)</em> on all objects <em>o</em> in the runtime. (Assuming they are reachable from <em>_G</em>. Potentially, there could also be objects only reachable through Lua references held in C.)</p>
<p>Such an enumeration could be used for many things. For example, you could use it to print the health of every object in the game.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> print_health<span style="color: #66cc66;">&#40;</span>o<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">if</span> o<span style="color: #66cc66;">.</span>health <span style="color: #aa9900; font-weight: bold;">then</span> <span style="color: #0000aa;">print</span><span style="color: #66cc66;">&#40;</span>o<span style="color: #66cc66;">.</span>health<span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">end</span>
<span style="color: #aa9900; font-weight: bold;">end</span>
enumerate<span style="color: #66cc66;">&#40;</span>print_health<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>The technique could also be used for memory optimizations. You could loop through all Lua objects and find the memory used by each object type. Then you could focus your optimization efforts on the resource hogs.</p>
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid Blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/05/05/embracing-dynamism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inheriting Velocity in Ragdolls</title>
		<link>http://www.altdevblogaday.com/2012/04/20/inheriting-velocity-in-ragdolls/</link>
		<comments>http://www.altdevblogaday.com/2012/04/20/inheriting-velocity-in-ragdolls/#comments</comments>
		<pubDate>Fri, 20 Apr 2012 21:29:39 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[animation]]></category>
		<category><![CDATA[ragdolls]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25654</guid>
		<description><![CDATA[<p>After a slew of abstract articles about C++ and code structuring I&#8217;d like to get back to some more meaty game engine stuff. So today I&#8217;ll talk about ragdolls. In particular, how to preserve the momentum of animated objects, so that when you switch over to the ragdoll it continues to stumble forward in the same direction that the animation was moving, before crashing to a gruesome death.</p>
<p><a href="http://www.altdevblogaday.com/2012/04/20/inheriting-velocity-in-ragdolls/" class="more-link">Read more on Inheriting Velocity in Ragdolls&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>After a slew of abstract articles about C++ and code structuring I&#8217;d like to get back to some more meaty game engine stuff. So today I&#8217;ll talk about ragdolls. In particular, how to preserve the momentum of animated objects, so that when you switch over to the ragdoll it continues to stumble forward in the same direction that the animation was moving, before crashing to a gruesome death.</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/04/ragdoll_velocities_1.jpg"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/04/ragdoll_velocities_1.jpg" alt="" width="500" height="182" class="aligncenter size-full wp-image-25655" /></a></p>
<p>So this is a small, but important problem. We want to somehow get the velocities of the animated objects and then apply them to the bodies in the ragdoll. The only snag is that animated objects typically don&#8217;t know anything about velocities. Also, we need some way of matching up the physics bodies with the animated objects.</p>
<p>First, some background information. In the Bitsquid engine, physics, scene graph and animation are completely separate systems. We strongly believe in minimizing the couplings between different systems since that makes the engine easier to understand, reason about, modify, optimize and rewrite.</p>
<p><a href="http://www.altdevblogaday.com/wp-content/uploads/2012/04/ragdoll_velocities_2.jpg"><img src="http://www.altdevblogaday.com/wp-content/uploads/2012/04/ragdoll_velocities_2.jpg" alt="" width="498" height="236" class="aligncenter size-full wp-image-25656" /></a></p>
<ul>
<li>
<p>The physics system simulates a number of bodies, possibly connected by joints.</p>
</li>
<li>
<p>The scene graph handles local-to-world transforms for a collection of nodes in a hierarchy.</p>
</li>
<li>
<p>The animation system evaluates and blends animation curves for bones.</p>
</li>
</ul>
<p>Bones and bodies hold references (just integer indices, really) to nodes in the scene graph and this is how the systems communicate. After the animation has been evaluated, the resulting local transforms are written to the bones&#8217; nodes in the scene graph.</p>
<p>For keyframed physics (animated hit bodies), the animation drives the physics, which means the physics&#8217; bodies will read their world transforms from the corresponding nodes in the scene graph. For ragdolled physics, the world transforms of the bodies are written to the scene graph after the simulation has completed.</p>
<p>For partial ragdolls (such as a non-functioning, but still attached limb) or powered ragdolls (ragdolls driven by motors to achieve animation poses) it gets a little more involved (perhaps a topic for a future post), but the basic setup is the same.</p>
<p>Given this setup there are two ways of calculating the animation velocities:</p>
<ul>
<li>
<p>We can calculate the velocities directly by differentiating the animation curves.</p>
</li>
<li>
<p>We can record a node&#8217;s transform at two different time steps and compute the velocity from the difference.</p>
</li>
</ul>
<p>The first approach is doable, but not very practical. Not only do we have to differentiate all the animation curves, we must also take into account how those velocities are affected by the blend tree and local-to-world transforms. And even if we do all that, we still don&#8217;t account for movements from other sources than animation, such as scripted movements, IK or interactions with the character controller.</p>
<p>The second option is the more reasonable one. Now all we need is a way of obtaining the transforms from two different time steps. There are a number of possible options:</p>
<ul>
<li>
<p>We could add an array of Matrix4x4:s, <em>last_world</em>, to our scene graph, where we store the last world transform of every object. So whenever we want to go to ragdoll we always have a <em>last_world</em> transform to calculate velocities from.</p>
</li>
<li>
<p>We could simulate the character backwards in time when we want to go to ragdoll and obtain a <em>last_world</em> transform that way.</p>
</li>
<li>
<p>We could delay the transition to ragdoll one frame, so that we have enough time to gather two world transforms for computing the velocity.</p>
</li>
</ul>
<p>The first approach is conceptually simple, but costly. We are increasing the size of all our scene graphs by about 50 % (previously they contained <em>local</em> and <em>world</em> transforms, now they will also need <em>last_world</em>). In addition we must <em>memcpy(last_world, world)</em> before we compute new world transforms. That&#8217;s a significant cost to pay all the time for something that happens very seldom (transition to ragdoll).</p>
<p>The second approach sounds a bit crazy, but some games actually already have this functionality. Servers in competitive multi-player FPS games often need to rewind players in time in order to accurately determine if they were able to hit each other. Still, I find the approach to be a bit too complicated and involved just to get a velocity.</p>
<p>The third approach seems simple and cheap, but it violates one of our Bitsquid principles: <em>Thou Shalt Not Have Any Frame Delays</em>. Delaying something a frame can be a quick fix to many hairy problems, but it puts your game in a very weird transitional state where it at the same time both is and isn&#8217;t (yet) something. The character isn&#8217;t <em>really</em> a ragdoll yet, but it will be the next frame, whether I want it to or not.</p>
<p>This new slightly self-contradictory state invites a host of bugs and before you know it, little logic pieces will start to seep into the code base <em>&#8220;do this</em> unless <em>you are in the special transition-to-ragdoll state&#8221;</em>. Congratulations, you have just made your codebase a lot more complicated and bug prone.</p>
<p>If this is not enough, consider the poor sucker who just wants to write a routine that does A, B, C and D, when A, B and C require frame delays. Suddenly what was supposed to be a simple function got turned into a state machine that needs to run for four frames to produce its result.</p>
<p>The simple rule that actions should take place immediately protects against such insanity.</p>
<p>So three options, none of them especially palatable.</p>
<p>I actually went with the first one, to always compute and store <em>last_world</em> in the scene graph, but with a flag so that this is only used for units that actually need it (characters that can go to ragdoll). When there is no clear winner, I always pick the simplest solution, because it is a lot easier to optimize later if the need should arise. (We could for example track <em>last_world</em> only for the nodes which have a corresponding ragdoll actor. Also we could store <em>last_world</em> as <em>(p,q)</em> instead of as a matrix.)</p>
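<p>As a rough sketch of what such a flag-guarded <em>last_world</em> could look like (the <em>SceneGraph</em> layout, the <em>track_last_world</em> flag and <em>update_world()</em> are made-up names for illustration, not the actual Bitsquid scene graph):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for the engine's matrix type.
struct Matrix4x4 { float m[16]; };

// Sketch of a scene graph that keeps last_world only when asked to.
struct SceneGraph {
    std::vector&lt;Matrix4x4&gt; world;
    std::vector&lt;Matrix4x4&gt; last_world;   // stays empty unless tracking is enabled
    bool track_last_world;

    SceneGraph(std::size_t num_nodes, bool track)
        : world(num_nodes), last_world(track ? num_nodes : 0), track_last_world(track) {}

    // Call once per frame with the newly computed world transforms.
    void update_world(const std::vector&lt;Matrix4x4&gt; &amp;new_world) {
        if (track_last_world)
            last_world = world;          // the memcpy(last_world, world) step
        world = new_world;
    }
};</pre></td></tr></table></div>

<p>Units that can never go to ragdoll would be created with the flag off, so they pay neither the extra memory nor the per-frame copy.</p>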
<p>For completeness, given the two transforms, the code for computing the velocities will look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">Vector3 p0 = translation(tm_0);
Vector3 p1 = translation(tm_1);
Vector3 velocity = (p1 - p0) / dt;
&nbsp;
Quaternion q0 = rotation(tm_0);
Quaternion q1 = rotation(tm_1);
Quaternion q = q1 * inverse(q0);
AxisAngle aa = q.decompose();
Vector3 angular_velocity = aa.axis * aa.angle / dt;</pre></td></tr></table></div>
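
<p>To make the snippet above concrete, here is a self-contained sketch of the same computation in plain C++. The <em>Vec3</em> and <em>Quat</em> types and the helper names are assumptions made for this example, not engine types. It also adds two details the pseudocode leaves out: taking the shortest arc, and returning zero when there is no measurable rotation.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cmath&gt;

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };   // assumed to be unit length

Vec3 linear_velocity(Vec3 p0, Vec3 p1, float dt) {
    Vec3 v = { (p1.x - p0.x) / dt, (p1.y - p0.y) / dt, (p1.z - p0.z) / dt };
    return v;
}

// Hamilton product a * b.
Quat mul(Quat a, Quat b) {
    Quat r = {
        a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
        a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
        a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w,
        a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z
    };
    return r;
}

// For a unit quaternion the conjugate is the inverse.
Quat conjugate(Quat q) { Quat r = { -q.x, -q.y, -q.z, q.w }; return r; }

Vec3 angular_velocity(Quat q0, Quat q1, float dt) {
    Quat q = mul(q1, conjugate(q0));                     // rotation taking q0 to q1
    if (q.w &lt; 0.0f) { q.x = -q.x; q.y = -q.y; q.z = -q.z; q.w = -q.w; }  // shortest arc
    float s = std::sqrt(q.x*q.x + q.y*q.y + q.z*q.z);    // sin(angle / 2)
    Vec3 zero = { 0.0f, 0.0f, 0.0f };
    if (s &lt; 1e-6f) return zero;                          // no measurable rotation
    float angle = 2.0f * std::atan2(s, q.w);
    float k = angle / (s * dt);                          // normalizes the axis and scales it
    Vec3 w = { q.x * k, q.y * k, q.z * k };
    return w;
}</pre></td></tr></table></div>

<p>With <em>q0</em> as the identity, <em>q1</em> as a 90 degree rotation about the z axis and <em>dt</em> = 0.5, this gives an angular velocity of about pi radians per second around z.</p>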

<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid Blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/20/inheriting-velocity-in-ragdolls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PIMPL vs Pure Virtual Interfaces</title>
		<link>http://www.altdevblogaday.com/2012/03/21/pimpl-vs-pure-virtual-interfaces/</link>
		<comments>http://www.altdevblogaday.com/2012/03/21/pimpl-vs-pure-virtual-interfaces/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 19:33:46 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[pimpl]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=25109</guid>
		<description><![CDATA[<p>In C++, separating the <em>interface</em> (public declarations) of a class from its <em>implementation</em> (private methods, members and definitions) serves several useful purposes:</p>
<ul>
<li>
<p>Implementation details are hidden, making interfaces easier to read and understand.</p>
</li>
</ul>
<p><a href="http://www.altdevblogaday.com/2012/03/21/pimpl-vs-pure-virtual-interfaces/" class="more-link">Read more on PIMPL vs Pure Virtual Interfaces&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In C++, separating the <em>interface</em> (public declarations) of a class from its <em>implementation</em> (private methods, members and definitions) serves several useful purposes:</p>
<ul>
<li>
<p>Implementation details are hidden, making interfaces easier to read and understand.</p>
</li>
<li>
<p>Smaller header files with fewer dependencies mean faster compile times.</p>
</li>
<li>
<p>A weaker coupling between the interface and the implementation gives greater freedom in reorganizing and refactoring the implementation internals.</p>
</li>
</ul>
<p>In pure C, we can achieve this separation by using a pointer to a forward declared struct:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> SoundWorld<span style="color: #008080;">;</span>
<span style="color: #0000ff;">typedef</span> <span style="color: #0000ff;">unsigned</span> SoundInstanceId<span style="color: #008080;">;</span>
&nbsp;
SoundWorld <span style="color: #000040;">*</span>make_sound_world<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">void</span> destroy_sound_world<span style="color: #008000;">&#40;</span>SoundWorld <span style="color: #000040;">*</span>world<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
SoundInstanceId play<span style="color: #008000;">&#40;</span>SoundWorld <span style="color: #000040;">*</span>world, SoundResource <span style="color: #000040;">*</span>sound<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">void</span> stop<span style="color: #008000;">&#40;</span>SoundWorld <span style="color: #000040;">*</span>world, SoundInstanceId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>The struct is opaque to the users of the API. The actual content is defined in the <em>.cpp</em> file:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> SoundWorld <span style="color: #008000;">&#123;</span>
    SoundInstance playing_instances<span style="color: #008000;">&#91;</span>MAX_PLAYING_INSTANCES<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    Matrix4x4 listener_pose<span style="color: #008080;">;</span>
    ...
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>C++ programmers are often recommended to use the <a class="link" href="http://en.wikipedia.org/wiki/Opaque_pointer">PIMPL idiom</a> (pointer to implementation) to achieve the same thing:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SoundWorldImplementation<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">class</span> SoundWorld
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
    <span style="color: #0000ff;">typedef</span> <span style="color: #0000ff;">unsigned</span> InstanceId<span style="color: #008080;">;</span>
&nbsp;
    SoundWorld<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    ~SoundWorld<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    InstanceId play<span style="color: #008000;">&#40;</span>SoundResource <span style="color: #000040;">*</span>sound<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">void</span> stop<span style="color: #008000;">&#40;</span>InstanceId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">private</span><span style="color: #008080;">:</span>
    SoundWorldImplementation <span style="color: #000040;">*</span>_impl<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Here, <em>SoundWorld</em> is the external interface of the class. All the messy stuff (instance variables, private methods, etc.) is found in the <em>SoundWorldImplementation</em> class, which is in the <em>.cpp</em> file.</p>
<p>The <em>_impl</em> pointer is created in the constructor and calls to the methods in <em>SoundWorld</em> are forwarded to the implementation object via method stubs:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">SoundWorld<span style="color: #008080;">::</span><span style="color: #007788;">SoundWorld</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    _impl <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> SoundWorldImplementation<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
SoundWorld<span style="color: #008080;">::</span><span style="color: #007788;">InstanceId</span> SoundWorld<span style="color: #008080;">::</span><span style="color: #007788;">play</span><span style="color: #008000;">&#40;</span>SoundResource <span style="color: #000040;">*</span>sound<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">return</span> _impl<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>play<span style="color: #008000;">&#40;</span>sound<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>Another solution to the same problem is to write the interface as an abstract, pure virtual class in the <em>.h</em> file and then create the implementation as a subclass in the <em>.cpp</em> file.</p>
<p>You don&#8217;t see this solution recommended as much (at least not as a solution to this particular problem), but I actually like it better. With this approach, the header file will look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SoundWorld
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
    <span style="color: #0000ff;">typedef</span> <span style="color: #0000ff;">unsigned</span> InstanceId<span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">virtual</span> ~SoundWorld<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span>
    <span style="color: #0000ff;">virtual</span> InstanceId play<span style="color: #008000;">&#40;</span>SoundResource <span style="color: #000040;">*</span>sound<span style="color: #008000;">&#41;</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">void</span> stop<span style="color: #008000;">&#40;</span>InstanceId id<span style="color: #008000;">&#41;</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">static</span> SoundWorld <span style="color: #000040;">*</span>make<span style="color: #008000;">&#40;</span>Allocator <span style="color: #000040;">&amp;</span>a<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">void</span> destroy<span style="color: #008000;">&#40;</span>Allocator <span style="color: #000040;">&amp;</span>a, SoundWorld <span style="color: #000040;">*</span>sw<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that since the class is now abstract, we cannot create instances of it directly. To create and destroy objects we need the factory functions <em>make()</em> and <em>destroy()</em>. I&#8217;ve added an <em>allocator</em> parameter for good measure, because I always want to specify <a class="link" href="http://bitsquid.blogspot.se/2010/09/custom-memory-allocation-in-c.html">explicit allocators</a> for all memory operations.</p>
<p>The corresponding <em>.cpp</em> file looks something like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SoundWorldImplementation <span style="color: #008080;">:</span> <span style="color: #0000ff;">public</span> SoundWorld
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">friend</span> <span style="color: #0000ff;">class</span> SoundWorld<span style="color: #008080;">;</span>
&nbsp;
    SoundInstance _playing_instances<span style="color: #008000;">&#91;</span>MAX_PLAYING_INSTANCES<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    Matrix4x4 _listener_pose<span style="color: #008080;">;</span>
&nbsp;
    SoundWorldImplementation<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        ...
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0000ff;">virtual</span> InstanceId play<span style="color: #008000;">&#40;</span>SoundResource <span style="color: #000040;">*</span>sound<span style="color: #008000;">&#41;</span> 
    <span style="color: #008000;">&#123;</span>
        ...
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">void</span> stop<span style="color: #008000;">&#40;</span>InstanceId<span style="color: #008000;">&#41;</span> 
    <span style="color: #008000;">&#123;</span>
        ...
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
SoundWorld <span style="color: #000040;">*</span>SoundWorld<span style="color: #008080;">::</span><span style="color: #007788;">make</span><span style="color: #008000;">&#40;</span>Allocator <span style="color: #000040;">&amp;</span>a<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">return</span> a.<span style="color: #007788;">make</span><span style="color: #000080;">&lt;</span>SoundWorldImplementation<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">void</span> SoundWorld<span style="color: #008080;">::</span><span style="color: #007788;">destroy</span><span style="color: #008000;">&#40;</span>Allocator <span style="color: #000040;">&amp;</span>a, SoundWorld <span style="color: #000040;">*</span>sw<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    a.<span style="color: #007788;">destroy</span><span style="color: #000080;">&lt;</span>SoundWorldImplementation<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>sw<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>The reason why most people recommend the PIMPL approach is that it has some distinct advantages:</p>
<ul>
<li>
<p>Factory functions are not needed; you can use <em>new()</em>, <em>delete()</em> or create objects on the stack.</p>
</li>
<li>
<p>The <em>SoundWorld</em> class can be subclassed.</p>
</li>
<li>
<p>The interface methods are not virtual, so calling them <em>might be</em> faster. (On the other hand, we need an extra memory fetch to get to the implementation object.)</p>
</li>
<li>
<p>PIMPL can be introduced in an existing class without changing its external interface or its relation to other classes.</p>
</li>
</ul>
<p>For my use cases, none of these advantages matter that much. Since I want to supply my own allocators, I&#8217;m not interested in <em>new</em> and <em>delete</em>. And I only use this for &#8220;big&#8221; objects, that are always heap (rather than stack) allocated.</p>
<p>I don&#8217;t make much use of implementation inheritance. In my opinion, it is almost always a bad design decision that leads to strongly coupled code and hard to follow code paths. Inheritance should be limited to interface inheritance.</p>
<p>The performance issue of virtual calls is not a huge issue, since I only use this for &#8220;big&#8221; objects (<em>Systems</em> and <em>Managers</em>). Also, I design the API so that the number of API calls is minimized. I.e., instead of a function:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> set_sound_position<span style="color: #008000;">&#40;</span>InstanceId id, <span style="color: #0000ff;">const</span> Vector3 <span style="color: #000040;">&amp;</span>pos<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>I have:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> set_sound_positions<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> count, <span style="color: #0000ff;">const</span> InstanceId <span style="color: #000040;">*</span>ids, <span style="color: #0000ff;">const</span> Vector3 <span style="color: #000040;">*</span>positions<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>This reduces the virtual call overhead, but also has additional benefits, such as being DMA friendly and allowing for parallelization and batch optimizations.</p>
<p>In the words of Mike Acton: <em>Where there&#8217;s one, there&#8217;s more than one.</em></p>
<p>The abstract class method has some advantages of its own:</p>
<ul>
<li>
<p>Cleaner code and a lot less typing, since we don&#8217;t have to write forwarding stubs for the methods in the public interface.</p>
</li>
<li>
<p>Multiple classes can implement the same interface. We can statically or dynamically select which particular implementation we want to use, which gives us more flexibility.</p>
</li>
</ul>
<p>To me, not having to write a ton of stupid boilerplate cruft is actually kind of a big deal. I know some people don&#8217;t mind boilerplate. It&#8217;s just a little extra typing, they say. Since there is nothing complicated or difficult in the boilerplate code, it doesn&#8217;t pose a problem. Programmers are not limited by typing speed, so how much you have to type doesn&#8217;t matter.</p>
<p>I don&#8217;t agree at all. In my view, <em>every line</em> of code is a burden. It comes with a cost that you pay again and again as you write, read, debug, optimize, improve, extend and refactor your code. For me, the main benefit of &#8220;higher-level&#8221; languages is that they let me do more with less code. So I&#8217;m happy to pay the overhead of a virtual call if it saves me from having 150 lines of idiotic boilerplate.</p>
<p>A nice thing about the interface and implementation separation is that it gets rid of another piece of hateful C++ boilerplate: <em>method declarations</em> (hands up everybody who enjoys keeping their <em>.h</em> and <em>.cpp</em> files synchronized).</p>
<p>Methods defined inside a C++ class do not have to be declared and can be written in any order. So if we want to add helper methods to our implementation class, that are not part of the public interface, we can just write them anywhere in the class:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SoundWorldImplementation <span style="color: #008080;">:</span> <span style="color: #0000ff;">public</span> SoundWorld
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">virtual</span> InstanceId play<span style="color: #008000;">&#40;</span>SoundResource <span style="color: #000040;">*</span>resource<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        InstanceId id <span style="color: #000080;">=</span> allocate_id<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        ...
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #666666;">// A private method - no declaration necessary.</span>
    InstanceId allocate_id<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        ...
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>It&#8217;s interesting that this small, purely syntactical change (getting rid of method declarations) makes a significant difference in how the language &#8220;feels&#8221;. At least to me.</p>
<p>With this approach, adding a helper method feels like &#8220;less work&#8221; and so I&#8217;m more inclined to do it. This favors better structured code that is decomposed into a larger number of functions. More like Smalltalk than traditional C (home of the mega-method). The <a class="link" href="http://en.wikipedia.org/wiki/Linguistic_relativity">Sapir-Whorf hypothesis</a> appears to hold some merit, at least in the realm of programming languages.</p>
<p>Another interesting thing to note is that the pure C implementation of opaque pointers stacks up pretty well against the C++ variants. It is simple, terse and fast (no virtual calls, no forwarding functions).</p>
<p>Every year I&#8217;m a little more impressed by C and a little more depressed by C++.</p>
<p>(This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid Blog</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/03/21/pimpl-vs-pure-virtual-interfaces/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Caring by Sharing: The Bitsquid Documentation System</title>
		<link>http://www.altdevblogaday.com/2012/03/04/the-bitsquid-documentation-system/</link>
		<comments>http://www.altdevblogaday.com/2012/03/04/the-bitsquid-documentation-system/#comments</comments>
		<pubDate>Sun, 04 Mar 2012 16:46:40 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24850</guid>
		<description><![CDATA[<p>In a <a class="link" href="http://bitsquid.blogspot.com/2011/09/simple-roll-your-own-documentation.html">previous article</a> I talked a bit about our documentation system. It has now solidified into something interesting enough to be worth sharing.</p>
<p>The system consists of a collection of Ruby files that read input files (with extension <em>.bsdoc</em>) written in a simple markup language:</p>
<p><a href="http://www.altdevblogaday.com/2012/03/04/the-bitsquid-documentation-system/" class="more-link">Read more on Caring by Sharing: The Bitsquid Documentation System&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In a <a class="link" href="http://bitsquid.blogspot.com/2011/09/simple-roll-your-own-documentation.html">previous article</a> I talked a bit about our documentation system. It has now solidified into something interesting enough to be worth sharing.</p>
<p>The system consists of a collection of Ruby files that read input files (with extension <em>.bsdoc</em>) written in a simple markup language:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;"># Header
&nbsp;
Some text.
&nbsp;
* And
* A list</pre></td></tr></table></div>

<p>and converts them to HTML:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">h1</span>&gt;</span>Header<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">h1</span>&gt;</span>
&nbsp;
<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>Some text.<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;</span>
&nbsp;
<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">ul</span>&gt;</span>
	<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">li</span>&gt;&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>And<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">li</span>&gt;</span>
	<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">li</span>&gt;&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>A list<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">li</span>&gt;</span>
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">ul</span>&gt;</span></pre></td></tr></table></div>

<p>We then use the HTML Help Compiler to convert the help files to <em>.chm</em>.</p>
<p>You can find the repository at:</p>
<ul>
<li>
<p><a class="link" href="https://bitbucket.org/bitsquid">https://bitbucket.org/bitsquid</a></p>
</li>
</ul>
<h2>Motivation</h2>
<p>Why have we created our own markup system instead of just using an existing one? (Markdown, Textile, RDoc, POD, reStructuredText, Doxygen, BBDoc, MediaWiki, DocBook, etc.)</p>
<p>For two reasons. First, none of these existing systems work exactly the way that we want. </p>
<p>An example. A large part of our documentation consists of Lua interface documentation. To make that as easy as possible to write, we use a special tag <tt>@api</tt> to enter an API documentation mode. In that mode, each unindented line documents a new Lua function. The indented lines that follow contain the documentation for the function.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">## Application (singleton)
&nbsp;
Interface to access global application functionality. Note that since the
application is a singleton (there is only one application), you don’t need
to pass any %r Application object to the application functions. All the
functions operate on the application singleton.
&nbsp;
@api
&nbsp;
resolution() : width, height
	Returns the screen resolution.
&nbsp;
argv() : arg1, arg2, arg3, ...
	Returns the command line arguments supplied to the application.</pre></td></tr></table></div>

<p>The documentation system recognizes the Lua function definitions and formats them appropriately. It also creates index entries for the functions in the <em>.chm</em> file. In addition, it can create cross-references between classes and functions (with the <tt>%r</tt> marker).</p>
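<p>To give a feel for how little code such a mode needs, here is a minimal sketch in Ruby. It is not the actual <em>bsdoc.rb</em> code: the <tt>parse_api</tt> function and the hash layout are made up for illustration, and it assumes that signatures and return values are separated by <tt> : </tt> as in the sample above.</p>

```ruby
# Hypothetical sketch of an @api mode: an unindented line starts a new
# function entry, indented lines are appended to that entry's documentation.
def parse_api(lines)
  entries = []
  lines.each do |line|
    next if line.strip.empty?
    if line.start_with?("\t", " ")
      # Indented line: documentation for the most recent function.
      entries.last[:doc] << line.strip unless entries.empty?
    else
      # Unindented line: "signature : return values" (return part optional).
      signature, returns = line.split(" : ", 2)
      entries << { signature: signature.strip, returns: returns&.strip, doc: [] }
    end
  end
  entries
end
```

<p>Each entry keeps the signature, the return values and the documentation apart, so the same data could be used for formatting, index entries and cross-references.</p>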
<p>No out-of-the-box system can provide the same level of convenience.</p>
<p>In any documentation system, the documentation files are the most valuable resource. What really matters is that documentation is easy to write and easy to modify. In particular, my main concerns are:</p>
<ul>
<li>
<p>Preserving semantic information.</p>
</li>
<li>
<p>Avoiding unnecessary markup and clutter.</p>
</li>
</ul>
<p>By preserving semantic information I mean that we should be able to say, for example, that something is a Lua function definition, or a piece of sample C++ code, rather than just saying that something is <em>italic</em> or <em>preformatted</em>. If we have enough semantic information, we can do all kinds of things to the data in post-processing. We can parse the function definition using a Lua parser, or run the C++ code through a syntax highlighter. We can convert the files to some other format if we ever decide to switch documentation systems.</p>
<p>If the documentation format <em>doesn&#8217;t</em> preserve semantic data, there is no way of getting that data back, except by going through all the documentation and adding it manually. That&#8217;s painful.</p>
<p>Avoiding markup and clutter is all about making the documents easy to write and easy to modify. That&#8217;s the whole point of using a markup language (instead of plain HTML) in the first place.</p>
<p>Our custom markup language lets us achieve both these goals in a way that no off-the-shelf solution could.</p>
<p>The second reason for writing our own system is that there is no fundamentally hard problem that the existing systems solve. If they did something really advanced that would take us months to duplicate, then it might be better to use an existing system even if it wasn&#8217;t perfectly adapted to our needs. But parsing some text and converting it to HTML isn&#8217;t hard. The entire documentation system is just a few hundred lines of Ruby code.</p>
<p>(In contrast, Doxygen actually <em>does</em> solve a hard problem. Parsing general C++ code is tricky. That&#8217;s why we use Doxygen to document our C++ code, but our own system for stand-alone documentation.)</p>
<h2>The System Design</h2>
<p>If I&#8217;ve done my job and convinced you that the best thing to do is to write your own documentation system, then what&#8217;s the point of sharing my code with you?</p>
<p>Well, the system we use consists of two parts. One part (the bulk of it) is generic and can be used to implement <em>any</em> markup language. The rules that are specific to <em>our</em> markup language are all kept in a single file (<em>bsdoc.rb</em>). To write your own documentation system, you could re-use the generic parts and just write your own markup definition.</p>
<p>The generic part of the system consists of four files:</p>
<dl>
<dt><em>paragraph_parser.rb</em></dt>
<dd>
<p>Parses the paragraphs of a document into block-level HTML code.</p>
</dd>
<dt><em>span_parser.rb</em></dt>
<dd>
<p>Does span-level parsing inside an HTML block.</p>
</dd>
<dt><em>generator.rb</em></dt>
<dd>
<p>Generates the output HTML.</p>
</dd>
<dt><em>toc.rb</em></dt>
<dd>
<p>Adds section numbering and a table of contents to an HTML file.</p>
</dd>
</dl>
<p>Most of the code is pretty straightforward. A rule set is a collection of regular expressions. The expressions are tested in turn against the content and the first one that matches is applied. There are separate rules for parsing the document on the block level (the <em>ParagraphParser</em>) and inside each line (the <em>SpanParser</em>).</p>
<p>There are some ideas in the system that I think are interesting enough to mention though:</p>
<h3>Line-by-line parsing</h3>
<p>On the paragraph level, the document is parsed line-by-line. Each rule regex is tested in turn and the first one that matches is applied. This ensures that the process is speedy for all kinds of input (<em>O(N)</em> in the number of lines). It also makes the system simpler to reason about.</p>
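<p>To make the first-match idea concrete, here is a minimal sketch in Ruby. It is not the code from <em>paragraph_parser.rb</em>; the <tt>RULES</tt> table, its handlers and the <tt>convert_line</tt> function are made up for illustration, and it writes HTML directly just to keep the example short.</p>

```ruby
# Hypothetical rule table: each entry pairs a regex with a handler.
# Order matters, because the first matching rule wins.
RULES = [
  [/^# (.*)$/,  ->(m) { "<h1>#{m[1]}</h1>" }],
  [/^\* (.*)$/, ->(m) { "<li><p>#{m[1]}</p></li>" }],
  [/^(.+)$/,    ->(m) { "<p>#{m[1]}</p>" }]
]

# Convert one line by testing each rule in turn.
def convert_line(line)
  RULES.each do |regex, handler|
    match = regex.match(line)
    return handler.call(match) if match
  end
  nil # blank lines match no rule
end

# One pass over the input, one line at a time.
def convert(text)
  text.lines.map { |line| convert_line(line.chomp) }.compact
end
```

<p>Since every line is visited once and the number of rules is fixed, the work grows linearly with the number of lines.</p>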
<h3>No intermediate representation</h3>
<p>The system does not build any intermediate representation of the document. It is converted directly from the <em>.bsdoc</em> source format to HTML. This again simplifies the system, because we don&#8217;t have to devise an intermediate representation for all kinds of data that we want to handle.</p>
<h3>HTML &#8220;contexts&#8221; for lines</h3>
<p>When a rule is applied, it doesn&#8217;t write raw HTML code to the output. Instead, it gives the generator a piece of text and a list of tags that should be applied to it. I call this the &#8220;context&#8221; of the text.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;">env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li <span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;Hi!&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span></pre></td></tr></table></div>

<p>The generator will add tags as appropriate to ensure that the line is printed in the right context:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">ul</span>&gt;&lt;<span style="color: #000000; font-weight: bold;">li</span>&gt;&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>Hi!<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">li</span>&gt;&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">ul</span>&gt;</span></pre></td></tr></table></div>

<p>When several lines are printed, the generator only opens and closes the minimum number of tags that are necessary to give each line the right context. It does this by matching the list of contexts for neighboring lines:</p>
<p>This:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;">env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li <span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;First item!&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li <span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;First paragraph!&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li<span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#0000FF; font-weight:bold;">nil</span><span style="color:#006600; font-weight:bold;">&#41;</span>
env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li <span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;First item, second paragraph!&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul<span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#0000FF; font-weight:bold;">nil</span><span style="color:#006600; font-weight:bold;">&#41;</span>
env.<span style="color:#9900CC;">write</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li <span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;Second item!&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span></pre></td></tr></table></div>

<p>ends up as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">ul</span>&gt;</span>
	<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">li</span>&gt;</span>
		<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>
			First item!
			First paragraph!
		<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;</span>
		<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>First item, second paragraph!<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;</span>
	<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">li</span>&gt;</span>
	<span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">li</span>&gt;&lt;<span style="color: #000000; font-weight: bold;">p</span>&gt;</span>Second item!<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">p</span>&gt;&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">li</span>&gt;</span>
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">ul</span>&gt;</span></pre></td></tr></table></div>

<p>Note the trick of writing <em>nil</em> to explicitly close a scope.</p>
<p>Since I really, really hate badly formatted HTML documents, I&#8217;ve made sure that the output from the generator looks (almost) as good as hand-written HTML.</p>
<p>Using contexts in this way gets rid of a lot of the complexities of HTML generation. When we write our rules we don&#8217;t have to think about opening and closing tags, we just have to make sure that we use an appropriate context for each line.</p>
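<p>The matching itself can be sketched in a few lines of Ruby. This is a simplified illustration rather than the real <em>generator.rb</em>: the <tt>ContextGenerator</tt> class and its <tt>finish</tt> method are hypothetical names, and the sketch skips the pretty-printing of the output.</p>

```ruby
# Minimal sketch of context matching; class and method names are hypothetical.
class ContextGenerator
  def initialize
    @open = [] # tags that are currently open, outermost first
    @out = []  # emitted tags and text, in order
  end

  # Emit text inside the given context (a list of tag names).
  # Passing nil as text only adjusts the open tags.
  def write(context, text)
    # Length of the shared prefix between the open tags and the new context.
    common = 0
    common += 1 while common < @open.size && common < context.size &&
                      @open[common] == context[common]
    # Close whatever is open beyond the shared prefix, innermost first.
    @out << "</#{@open.pop}>" while @open.size > common
    # Open whatever the new context needs beyond the shared prefix.
    context[common..-1].each do |tag|
      @out << "<#{tag}>"
      @open << tag
    end
    @out << text if text
  end

  # Close every tag that is still open and return the output.
  def finish
    @out << "</#{@open.pop}>" while @open.size > 0
    @out.join("\n")
  end
end
```

<p>Only the tags that differ between two neighboring contexts are closed and opened, which is what keeps the rules free from tag bookkeeping.</p>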
<h3>Nested scopes</h3>
<p>The final idea is to automatically handle nested markup by applying the rules recursively. Consider this input document:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">* Caught in the jungle
	* By a bear 
	* By a lion
	* By something else
* Caught in the woods</pre></td></tr></table></div>

<p>I don&#8217;t have any special parsing rules for dealing with nested lists. Instead, the first line of this document creates a scope with the context <tt>%w(ul li)</tt>. That scope is applied to all indented lines that follow it. The system strips the indentation from the line, processes it using the normal rule set, and then prepends <tt>%w(ul li)</tt> to its context. When it reaches a line without indentation, it drops the scope. Scopes can be stacked for multiple levels of nesting.</p>
<p>This way we can deal with arbitrarily complex nested structures (a code sample in a list in a blockquote) without any special processing rules.</p>
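<p>As a rough sketch of the recursion (with made-up names, a single list rule and the assumption that one tab equals one level of indentation), the scope handling could look something like this in Ruby. It returns the context and text pairs that would be handed to the generator:</p>

```ruby
# Hypothetical rule set with a single list rule. Each result holds the
# writes for the line and the scope (if any) that the line opens.
def parse_line(line)
  if (m = /^\* (.*)$/.match(line))
    # Close any previous item first, then write the new one.
    { writes: [[%w(ul), nil], [%w(ul li p), m[1]]], scope: %w(ul li) }
  else
    { writes: [[%w(p), line]], scope: nil }
  end
end

# Returns a list of [context, text] pairs for the given lines.
def parse(lines)
  result = []
  i = 0
  while i < lines.size
    parsed = parse_line(lines[i])
    result.concat(parsed[:writes])
    i += 1
    next unless parsed[:scope]
    # Collect the indented lines that follow and strip one level of indentation.
    inner = []
    while i < lines.size && lines[i].start_with?("\t")
      inner << lines[i][1..-1]
      i += 1
    end
    # Process them with the same rules, then prepend the scope to each context.
    parse(inner).each do |context, text|
      result << [parsed[:scope] + context, text]
    end
  end
  result
end
```

<p>The nested list items come out with the context <tt>%w(ul li ul li p)</tt> even though no rule mentions nesting.</p>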
<h2>A Bonus for AltDevBlogADay Writers</h2>
<p>As a bonus for my fellow AltDevBlogADay writers I&#8217;ve added a syntax module for writing AltDevBlogADay articles. It converts source documents to a format suitable for publishing on AltDevBlogADay. (This includes taking care of the tricky <tt>&lt;pre&gt;</tt> tags.)</p>
<p>There is also a package for <em>Sublime Text 2</em> (my favorite text editor) that gives you syntax highlighting and a build command for converting a document to HTML and previewing it in a browser. I&#8217;m currently writing all my AltDevBlogADay articles in this way.</p>
<p>(This article has also been posted to <a class="link" href="http://bitsquid.blogspot.com/">The Bitsquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/03/04/the-bitsquid-documentation-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sensible Error Handling &#8212; Part 3</title>
		<link>http://www.altdevblogaday.com/2012/02/20/sensible-error-handling-part-3/</link>
		<comments>http://www.altdevblogaday.com/2012/02/20/sensible-error-handling-part-3/#comments</comments>
		<pubDate>Mon, 20 Feb 2012 22:22:38 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Errors]]></category>
		<category><![CDATA[Warnings]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24419</guid>
		<description><![CDATA[<p>In my epic trilogy on sensible error handling I&#8217;ve arrived at the third and final category of errors &#8212; warnings.</p>
<p>Warnings happen when the user does something that is kinda sorta bad, but not exactly wrong per se. It can be things like having two nodes with the same name in an entity&#8217;s scene graph, a particle effect with 1 000 000 particles or a 4096 x 4096 texture mapped to a 3 x 3 cm flower.</p>
<p><a href="http://www.altdevblogaday.com/2012/02/20/sensible-error-handling-part-3/" class="more-link">Read more on Sensible Error Handling &#8212; Part 3&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In my epic trilogy on sensible error handling I&#8217;ve arrived at the third and final category of errors &#8212; warnings.</p>
<p>Warnings happen when the user does something that is kinda sorta bad, but not exactly wrong per se. It can be things like having two nodes with the same name in an entity&#8217;s scene graph, a particle effect with 1 000 000 particles or a 4096 x 4096 texture mapped to a 3 x 3 cm flower.</p>
<p>Not necessarily <em>wrong</em> &#8212; perhaps there will be a sequence where the player is miniaturized and has to walk on the surface of the flower, fighting off hostile pollen &#8212; but definitely fishy.</p>
<p>The problem with warnings is that they are so easy to ignore. When a project has hundreds of warnings that scroll by every time you start it, no one will pay any particular attention to them and no one will notice a new one.</p>
<p>But then of course, if warnings are <em>not</em> easy to ignore, everyone on the project will have to spend a lot of their valuable time ignoring them.</p>
<p>So the <em>real</em> problem, as in so many cases, is that we don&#8217;t really know what we want. We want warnings to be <em>both</em> hard to ignore <em>and</em> easy to ignore. We can&#8217;t get good warnings in our tools without resolving this conflict in our minds.</p>
<h2>Types of warnings</h2>
<p>To progress we need more information. We need to think about what kind of warnings there are and how we want to handle them.</p>
<p>In the Bitsquid engine, our warnings can be classified into three basic types:</p>
<ul>
<li>
<p>Performance warnings</p>
</li>
<li>
<p>Suspicion warnings</p>
</li>
<li>
<p>Deprecation warnings</p>
</li>
</ul>
<p><em>Performance warnings</em> occur when the user does something that is potentially bad for performance, such as using a texture without a MIP chain or loading a 300 MB sound file into memory.</p>
<p><em>Suspicion warnings</em> occur when we detect other kinds of suspicious behavior and want to ask the user <em>&#8220;did you really mean to do X?&#8221;</em>. An example might be defining a font without any glyphs. It is not exactly an error, but it is not very useful either, and most likely, not what the user wanted.</p>
<p><em>Deprecation warnings</em>, finally, are warnings that really should be errors. We want all our data to follow a particular rule, but we have too much legacy data to be able to strictly enforce it.</p>
<p>A typical example might be naming conventions. We may want to force all nodes in a scene graph to have unique names or all mesh names to start with <em>mesh_</em>, but unless we laid down that rule at the start of the project it might be too much work to fix all the old data.</p>
<p>Another example of deprecation is when a script function is deprecated. We may want to get rid of the function <tt>AudioWorld.set_listener(pos)</tt> (because it assumes that there is only one listener in the world) and replace it with <tt>AudioWorld.set_listeners(table)</tt>, but there is a lot of script code that already uses <tt>set_listener</tt> and no time to rewrite it.</p>
<p>As for <em>when</em> warnings should be shown, I think there are only two times when you really care about warnings:</p>
<ul>
<li>
<p>When you are working with a particular object (unit, mesh, level, sound, etc), you want to see all the warnings pertaining to that object.</p>
</li>
<li>
<p>When you are doing a review of the game (e.g., a performance review), you want to look through all warnings pertaining to the aspect that you are reviewing (in this case, all performance warnings).</p>
</li>
</ul>
<p>Armed with this information, we can come up with some useful strategies for dealing with warnings.</p>
<h2>Treat warnings as errors</h2>
<p>Errors are a lot easier to deal with than warnings, at least if you adhere to the philosophy of &#8220;asserting on errors&#8221; that was outlined in the first part of this series. An error is always an error; it doesn&#8217;t require a judgement call to determine whether it is right or wrong. And since the engine doesn&#8217;t proceed until the error has been fixed, errors get fixed as soon as possible (usually before the content is checked in, or, on the rare occasions when someone checks in bad data without test-running it, as soon as someone else pulls). Once the error is fixed it will never bother us again.</p>
<p>In contrast, warnings linger and accumulate into an ever expanding morass of hopelessness.</p>
<p>So, one of the best strategies for dealing with warnings is to make them errors. If there is any way you can convert the warning into an error, do that. Instead of warning if two nodes in a scene graph have the same name, make it an error. Instead of warning when an object is set to be driven by both animation and physics, make it an error.</p>
<p>Of course, when we want to make an error of something that was previously just a warning, we run into the deprecation problem.</p>
<h2>Ideas for deprecation warnings</h2>
<p>The strategy for deprecation warnings is clear. We want to get rid of them and treat them as &#8220;real errors&#8221; instead. This gives us cleaner data, better long term maintainability and cleaner engine code (since we can get rid of legacy code paths for backward compatibility).</p>
<p>Here are some approaches for dealing with deprecation, in descending order of niceness:</p>
<p><strong>1. Write a conversion script</strong></p>
<p>Write a conversion script that converts all the old/deprecated data into the new format. (An argument for keeping your source data in a nice, readable, script-friendly format, such as JSON.)</p>
<p>This is by far the nicest solution, because it means you can just run the script on the content to patch it up, and then immediately turn the warning into an error. But it does require some programming effort. (And we programmers are so overworked, couldn&#8217;t an artist/slave spend three weeks renaming the 12 000 objects by hand instead?)</p>
<p>Of course, sometimes this approach isn&#8217;t possible, namely when there is no nice one-to-one mapping from the current (bad) state to the desired (good) state.</p>
<p>One thing I&#8217;ve noticed though, is that we programmers can have a tendency to get caught up in binary thinking. If a problem can&#8217;t be solved for every possible edge case we might declare it <em>&#8220;theoretically unsolvable&#8221;</em> and move on to other things. When building stable systems with multiple levels of abstractions, that is a very sound instinct (a sort function that works 98 % of the time is worse than useless &#8212; it&#8217;s dangerous). But when it comes to improving artist workflows it can lead us astray.</p>
<p>For example, if our script manages to rename 98 % of our resources automatically and leaves 2 % tricky cases to be done by hand, that means we&#8217;ve reduced the workload on the artist from three weeks (about 120 working hours) to roughly 2.5 hours. Quite significant.</p>
<p>So even if you can&#8217;t write a perfect conversion script, a <em>pretty good one</em> can still be very helpful.</p>
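<p>A &#8220;pretty good&#8221; script might look something like the Ruby sketch below. Everything about the data is an assumption made for the example: that a unit is a JSON object with a <tt>nodes</tt> array, that each node has a <tt>name</tt>, and that nothing else refers to nodes by name. The function <tt>fix_duplicate_names</tt> is hypothetical. It renames the duplicates it can handle and returns the indices of the nodes that still need a human.</p>

```ruby
require "json"
require "set"

# Hypothetical layout: { "nodes": [ { "name": "..." }, ... ] }.
# Returns [fixed_json, indices_needing_manual_fix].
def fix_duplicate_names(json_text)
  unit = JSON.parse(json_text)
  used = Set.new(unit["nodes"].map { |node| node["name"] })
  seen = Set.new
  manual = []
  unit["nodes"].each_with_index do |node, index|
    name = node["name"]
    if name.nil? || name.empty?
      # No sensible automatic name, so leave it for a human.
      manual << index
    elsif seen.include?(name)
      # Duplicate: append the first numeric suffix that is not taken.
      suffix = 2
      suffix += 1 while used.include?("#{name}_#{suffix}")
      node["name"] = "#{name}_#{suffix}"
      used << node["name"]
    else
      seen << name
    end
  end
  [JSON.generate(unit), manual]
end
```

<p>The automatic part handles the common case, and the returned list is the short to-do list for the tricky remainder.</p>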
<p><strong>2. Implement a script override</strong></p>
<p>This is something I&#8217;ve found quite useful for dealing with deprecated script functions. The idea is that when we want to remove a function from the engine API, we replace it with a scripted implementation.</p>
<p>So when we replace <em>AudioWorld.set_listener()</em> with <em>AudioWorld.set_listeners()</em>, we implement <em>AudioWorld.set_listener()</em> as a pure Lua function, using the new engine API:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> AudioWorld<span style="color: #66cc66;">.</span>set_listener<span style="color: #66cc66;">&#40;</span>pos<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> t <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span>pos<span style="color: #66cc66;">&#125;</span>
	AudioWorld<span style="color: #66cc66;">.</span>set_listeners<span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>This leaves it up to the gameplay programmers to decide if they want to replace all calls to <em>set_listener()</em> with <em>set_listeners()</em> or if they want to continue to use the script implementation of <em>set_listener()</em>.</p>
<p>This technique can be used whenever the old, deprecated interface can be implemented in terms of the new one.</p>
<p><strong>3. Use a doomsday clock</strong></p>
<p>Sometimes you are out of luck and there simply is no other way of converting the data than to fix it by hand. You need the data to be fixed so that you can change the warnings to errors, but it is a fair amount of work and unless you put some pressure on the artists, it just never happens. That&#8217;s when you bring out the doomsday clock.</p>
<p>The doomsday clock is a visible warning message that says something like:</p>
<blockquote><p><em>Inconsistent naming. The unit &#8216;larch_03&#8217; uses the same name &#8216;branch&#8217; for two different scene graph nodes. This warning will become a hard error on the 1st of May, 2012. Fix your errors before then.</em></p>
</blockquote>
<p>This gives the team ample time to address the issue, but also sets a hard deadline for when it needs to be fixed.</p>
<p>For the doomsday clock to work you need a producer that is behind the idea and sees the value of turning warnings into errors. If you have that, it can be a good way of gradually cleaning up a project. If not, the warnings will never get fixed and instead you&#8217;ll just be asked again and again to move the doomsday deadline forward.</p>
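<p>As a minimal sketch of the mechanism, the compiler only needs to compare today&#8217;s date with the deadline to decide how loudly to complain. The <em>Date</em> struct, the <em>DoomsdayLevel</em> enum and the function names below are invented for this illustration and are not part of any real engine API:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">// Hypothetical calendar date; a real tool would get "today" from the system clock.
struct Date {int year, month, day;};

// Packs a date into one integer (for example 20120501) so that
// two dates can be ordered with a plain integer comparison.
int date_key(Date d)
{
	return d.year * 10000 + d.month * 100 + d.day;
}

enum DoomsdayLevel {DOOMSDAY_WARNING, DOOMSDAY_ERROR};

// Before the deadline the problem is reported as a warning.
// On the deadline itself, and on every later day, it is a hard error.
DoomsdayLevel doomsday_level(Date today, Date deadline)
{
	return date_key(today) &lt; date_key(deadline) ? DOOMSDAY_WARNING : DOOMSDAY_ERROR;
}</pre></td></tr></table></div>

<p>With the deadline from the message above (the 1st of May, 2012), the 30th of April still gives a warning and the 1st of May gives a hard error.</p>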
<p><strong>4. Surrender</strong></p>
<p>Sometimes you just have to surrender to practicality. There might be too much bad data and just not enough time to fix it. Which means you just can&#8217;t turn that warning into an error.</p>
<p>But even if you can&#8217;t do anything about the old data, you can at least prevent any <em>new</em> bad data from entering the project and polluting it further.</p>
<p>One way of doing that is to patch up your tools so that they add a new field to the source data (another argument for using an easily <em>extensible</em> source data format, such as JSON):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">bad_name_is_error = true</pre></td></tr></table></div>

<p>In the data compiler, you check the <em>bad_name_is_error</em> flag. If it is set, a bad name generates a hard error; if not, it generates a warning. This means that for all new data (created with the latest version of the tool) you get the hard error check that you want, but the old data continues to work as before.</p>
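<p>A rough sketch of what that check could look like follows. The <em>Resource</em> struct, the <em>Severity</em> enum and the naming rule are assumptions made up for this example; they are not the actual data compiler code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;string&gt;

// Hypothetical severity levels for the data compiler's diagnostics.
enum Severity {SEV_OK, SEV_WARNING, SEV_ERROR};

// Hypothetical view of one piece of source data. The flag is false for old
// data that predates the tool update and true for data saved by the new tool.
struct Resource {
	std::string name;
	bool bad_name_is_error;
};

// Stand-in naming rule, assumed for this sketch: names may not contain spaces.
bool is_bad_name(const std::string &amp;name)
{
	return name.find(' ') != std::string::npos;
}

// A good name always passes. A bad name is a hard error for new data
// and only a warning for old data.
Severity check_name(const Resource &amp;r)
{
	if (!is_bad_name(r.name))
		return SEV_OK;
	return r.bad_name_is_error ? SEV_ERROR : SEV_WARNING;
}</pre></td></tr></table></div>

<p>Because old data has no such field, the reader should treat a missing flag as <em>false</em>, so that only data written by the updated tools gets the strict check.</p>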
<h2>Design the tools to avoid warnings</h2>
<p>Warnings are generated when the users do stuff they did not intend to. The warnings we see thus tell us something of the mistakes that users typically make, using our tools.</p>
<p>One way of reducing the number of warnings is to use this information to guide the design of the tools. When we see a warning get triggered we should ask ourselves why the user wasn&#8217;t able to express her intent and how we could improve our tools to make that easier.</p>
<p>For example, if there are a lot of warnings about particle system overdraw, perhaps our particle system editor could have on screen indicators that showed the amount of overdraw.</p>
<p>There are a lot of other ways in which we can improve our tools so that they help users to do the right thing, instead of blaming them for doing wrong.</p>
<h2>Put the warnings in the tools</h2>
<p>The most useful time to get a warning is when you are working on an object. At that time, you know exactly what you want to achieve, and it is easy to make changes.</p>
<p>It follows then that the best place to show warnings is in the tools, rather than during game play. You may have that as well, to catch any strays that don&#8217;t get vetted by the tools, but it should not be the first line of defense.</p>
<p>For every tool where it makes sense, there should be a visible warning icon displaying the number of warnings for the currently edited object. For added protection, you could also require the user to check off these warnings before saving/exporting the object to indicate: &#8220;yes I really want to do this&#8221;.</p>
<h2>Make a review tool for warnings</h2>
<p>Apart from when a particular object is edited, the other time when displaying warnings is really useful is when doing a project review in order to improve performance or quality.</p>
<p>I haven&#8217;t yet implemented it, but the way I see it, such a tool would analyze all the content in the project and organize the warnings by type. One category might be &#8220;Potentially expensive particle systems&#8221; &#8212; it would list all particle systems with, say, &gt; 2000 particles, ordered by size. Another category could be: &#8220;Possibly invisible units&#8221; &#8212; a list of all the units placed below the ground in the levels.</p>
<p>The tool would allow a producer to &#8220;tick off&#8221; warnings for things that are really OK. Perhaps, the super duper effect really needs to have 50 000 particles. The producer can mark that as valid which means the warning is hidden in all future reviews.</p>
<p>Hiding could be implemented quite simply. We could just hash the object name together with the warning message and make sure we don&#8217;t show that particular message for that particular object again.</p>
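<p>Since the review tool has not been written, the following is only a sketch of the hashing idea. The <em>HiddenWarnings</em> class is hypothetical, and 64-bit FNV-1a is used here merely as one possible hash that gives the same value on every run:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;set&gt;
#include &lt;string&gt;

typedef unsigned long long Hash64;

// 64-bit FNV-1a. Unlike std::hash, the result is the same on every run and
// every platform, so the hashes could be saved together with the project.
Hash64 fnv1a(const std::string &amp;s)
{
	Hash64 h = 14695981039346656037ULL;
	for (std::string::size_type i = 0; i != s.size(); ++i) {
		h ^= (unsigned char)s[i];
		h *= 1099511628211ULL;
	}
	return h;
}

// Hypothetical store of the warnings that a producer has ticked off.
class HiddenWarnings
{
public:
	void hide(const std::string &amp;object, const std::string &amp;message) {
		_hidden.insert(key(object, message));
	}
	bool is_hidden(const std::string &amp;object, const std::string &amp;message) const {
		return _hidden.count(key(object, message)) != 0;
	}
private:
	// Hashes the object name together with the message. The separator keeps
	// ("ab", "c") and ("a", "bc") from producing the same input string.
	static Hash64 key(const std::string &amp;object, const std::string &amp;message) {
		return fnv1a(object + '\n' + message);
	}
	std::set&lt;Hash64&gt; _hidden;
};</pre></td></tr></table></div>

<p>One consequence of hashing the message text is that a warning reappears if its wording changes, for example if the message includes the current particle count. Whether that is desirable depends on the warning.</p>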
<p>This has also been posted to <a class="link" href="http://bitsquid.blogspot.com">The Bitsquid blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/02/20/sensible-error-handling-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sensible Error Handling &#8211; Part 2</title>
		<link>http://www.altdevblogaday.com/2012/02/05/sensible-error-handling-part-2/</link>
		<comments>http://www.altdevblogaday.com/2012/02/05/sensible-error-handling-part-2/#comments</comments>
		<pubDate>Sun, 05 Feb 2012 15:13:06 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Errors]]></category>
		<category><![CDATA[exceptions]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=23878</guid>
		<description><![CDATA[<p>In my <a href="http://altdevblogaday.com/2012/01/22/sensible-error-handling-part-1/">last post</a> I wrote that there are three kinds of errors that we game programmers need to deal with:</p>
<ul>
<li>Unexpected errors</li>
<li>Expected errors</li>
<li>Warnings</li>
</ul>
<p>An <i>unexpected error</i> is an error that is unlikely to happen and that the caller of our API has no sensible way of handling, such as a corrupted internal state, a failed memory allocation, a bad parameter supplied to a function or a file missing from a game disc. I also argued that the best way of dealing with such errors was to crash fast and hard with an assert, to expose the error and avoid &#8220;exporting&#8221; it in the API.</p>
<p><a href="http://www.altdevblogaday.com/2012/02/05/sensible-error-handling-part-2/" class="more-link">Read more on Sensible Error Handling &#8211; Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In my <a href="http://altdevblogaday.com/2012/01/22/sensible-error-handling-part-1/">last post</a> I wrote that there are three kinds of errors that we game programmers need to deal with:</p>
<ul>
<li>Unexpected errors</li>
<li>Expected errors</li>
<li>Warnings</li>
</ul>
<p>An <i>unexpected error</i> is an error that is unlikely to happen and that the caller of our API has no sensible way of handling, such as a corrupted internal state, a failed memory allocation, a bad parameter supplied to a function or a file missing from a game disc. I also argued that the best way of dealing with such errors was to crash fast and hard with an assert, to expose the error and avoid &#8220;exporting&#8221; it in the API.</p>
<p>In this post I&#8217;m going to look at the expected errors.</p>
<h2>Expected errors</h2>
<p>An <i>expected error</i> is an error that we expect to happen and that the caller <i>must</i> have a plan for dealing with. A typical example is an error when fetching a web page or saving data to a memory card (which can be yanked at any moment).</p>
<p>If you are familiar with Java, the distinction between &#8220;expected&#8221; and &#8220;unexpected&#8221; errors matches quite closely Java&#8217;s concept of &#8220;checked&#8221; and &#8220;unchecked&#8221; exceptions. Checked exceptions are errors that the caller <i>must</i> deal with (or explicitly rethrow). Unchecked exceptions are errors that the caller is not expected to deal with. They will typically cause a crash or a long jump out to the main loop, for the applications where that makes sense.</p>
<p>My main rule for dealing with expected errors is:</p>
<ul>
	<li><i>Minimize the points and types of failures</i></li>
</ul>
<p>In other words, just as our APIs abstract functionality &#8212; replacing low-level calls with high-level concepts &#8212; they should also abstract <i>dysfunctionality</i> and replace a large number of low-level failure states with a few high-level ones.</p>
<p>Minimizing the <i>points</i> of failure means that instead of having every function (<i>enumerate()</i>, <i>open()</i>, <i>read()</i>, <i>close()</i>, etc.) return an error code, we design the API so that errors occur in as few places as possible. This reduces the checks that the caller needs to do and the number of different possible paths through her code.</p>
<p>Minimizing the <i>types</i> of failure means that when we fail we only do it in one of a very small number of well-defined ways. We don&#8217;t return an <i>int</i> error code that can take on 4 billion different values with vaguely defined, ambiguous and overlapping meanings (quick: what is the difference between EWOULDBLOCK and EAGAIN?).</p>
<p>In most cases <i>true/false</i> is enough (together with a log entry with more details). If the caller needs more information, we can use an <i>enum</i> for that specific function, with a very specific small range of values.</p>
<p>Again, the idea behind all this is to reduce the burden on the caller. If there is only a small number of errors that can happen, it is easy for her to verify that she has all the bases covered.</p>
<p>As an example, a (partial) save game interface may look like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SaveSystem
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">struct</span> Data <span style="color: #008000;">&#123;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>p<span style="color: #008080;">;</span> <span style="color: #0000ff;">unsigned</span> len<span style="color: #008080;">;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">enum</span> LoadResult <span style="color: #008000;">&#123;</span>IN_PROGRESS, COMPLETED, FAILED<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
	<span style="color: #0000ff;">unsigned</span> num_saved_games<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	LoadId start_loading_game<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	LoadResult load_result<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	Data loaded_data<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">void</span> free_data<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that there is only a single place where the caller needs to check for errors (in the reply to <i>load_result()</i>). And there is only one possible failure state: the load either completes successfully or it fails.</p>
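<p>To illustrate what this means for the caller, here is a sketch of a per-frame polling function written against that interface. The <i>SaveSystem</i> below is a fake stand-in that I have made up so the example is self-contained (it pretends that a load finishes on the third poll), and <i>LoadId</i>, <i>LoadState</i> and <i>update_load()</i> are likewise assumptions, not part of the interface above:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">typedef unsigned LoadId; // assumption: the interface above leaves LoadId undefined

// Fake save system with the same shape as the interface above.
class SaveSystem
{
public:
	struct Data {const char *p; unsigned len;};
	enum LoadResult {IN_PROGRESS, COMPLETED, FAILED};

	// final_result decides whether the fake load ends in success or failure.
	explicit SaveSystem(LoadResult final_result) : _final(final_result), _polls(0) {}

	LoadId start_loading_game(unsigned i) {_polls = 0; return i;}
	// Reports IN_PROGRESS for the first two polls, then the final result.
	LoadResult load_result(LoadId) {return ++_polls &lt; 3 ? IN_PROGRESS : _final;}
	Data loaded_data(LoadId) {Data d = {"save", 4}; return d;}
	void free_data(LoadId) {}

private:
	LoadResult _final;
	unsigned _polls;
};

enum LoadState {LOAD_PENDING, LOAD_OK, LOAD_FAILED};

// Called once per frame by the game. On success the size of the loaded
// data is written to bytes.
LoadState update_load(SaveSystem &amp;ss, LoadId id, unsigned &amp;bytes)
{
	switch (ss.load_result(id)) {
	case SaveSystem::IN_PROGRESS:
		return LOAD_PENDING;
	case SaveSystem::FAILED:
		return LOAD_FAILED; // the only error check in the whole load path
	case SaveSystem::COMPLETED:
		break;
	}
	SaveSystem::Data data = ss.loaded_data(id);
	bytes = data.len;
	ss.free_data(id);
	return LOAD_OK;
}</pre></td></tr></table></div>

<p>The caller has exactly three cases to cover, and the compiler can warn if the <i>switch</i> misses one of them.</p>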
<h3>To except or not to except</h3>
<p>Exceptions are often touted as the latest and greatest in error handling, but as you know from my previous post I am not too fond of them.</p>
<p>Exceptions can work for unexpected errors. I still prefer to use asserts, but if you are writing a program that cannot crash, an exception can be a reasonable way to get back to the main loop if you reach an unexpected failure state. (It&#8217;s not the only option though. Lua&#8217;s <i>pcall()</i> mechanism is an elegant and minimalistic alternative.)</p>
<p>But for the <i>expected</i> errors, the errors that are a part of the API, exceptions have a number of serious problems.</p>
<p>The first is that exceptions do not have to be declared in the API, so if you encounter an API that looks like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SaveSystem
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">struct</span> Data <span style="color: #008000;">&#123;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>p<span style="color: #008080;">;</span> <span style="color: #0000ff;">unsigned</span> len<span style="color: #008080;">;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">class</span> LoadException <span style="color: #008080;">:</span> <span style="color: #0000ff;">public</span> Exception <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
	<span style="color: #0000ff;">unsigned</span> num_saved_games<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	LoadId start_loading_game<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">bool</span> load_completed<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	Data loaded_data<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">void</span> free_data<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>you are immediately faced with a number of questions. Which functions in the API can throw a <i>LoadException</i>? All of them or just some? Do I need to check for it everywhere? Are there any other exceptions that can be thrown, like <i>FileNotFoundException</i> or <i>IJustMadeUpThisException</i>? Should I just <i>catch</i> everything everywhere to be safe?</p>
<p>In my view, this is unacceptable. The errors are an important part of the API. If you don&#8217;t know what errors can occur and where, you have an incomplete picture of the API. Fine, we can address that with throw-declarations:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> SaveSystem
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">struct</span> Data <span style="color: #008000;">&#123;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>p<span style="color: #008080;">;</span> <span style="color: #0000ff;">unsigned</span> len<span style="color: #008080;">;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">class</span> LoadException <span style="color: #008080;">:</span> <span style="color: #0000ff;">public</span> Exception <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
	<span style="color: #0000ff;">unsigned</span> num_saved_games<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">throw</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	LoadId start_loading_game<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i<span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">throw</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">bool</span> load_completed<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">throw</span><span style="color: #008000;">&#40;</span>LoadException<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	Data loaded_data<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">throw</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">void</span> free_data<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">throw</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Now the interface is at least well-defined, if a bit cluttered. Note that if you go down this route <i>every single function</i> in your code base should have a <i>throw</i> declaration. Otherwise you are back in no man&#8217;s land, without any clue about which functions throw exceptions and which don&#8217;t. </p>
<p>But declaring exceptions can have its drawbacks too. If you require all functions to declare exceptions, a function that just wants to &#8220;pass along&#8221; some exceptions up the call stack must declare them. This gives the exceptions an infectious tendency. Unless you are careful with your design the high level functions will gather longer and longer lists of exceptions that become harder and harder to maintain. Templates cause additional problems, because you can&#8217;t know what exceptions a templated object might throw.</p>
<p>These issues have sparked a heated debate in the Java community about whether checked (declared) exceptions are a good idea or not. C# has chosen not to support exception declarations.</p>
<p>At the heart of the debate is (I think) a confusion about what exceptions are for. Are they for diagnosing and recovering from unforeseen errors, or are they a convenient control structure for dealing with expected errors? By explicitly distinguishing &#8220;unexpected errors&#8221; from &#8220;expected errors&#8221; we make these two roles clearer and can thus avoid a lot of the confusion.</p>
<p>Anyway, the declarations are not my only gripe with exceptions. My second issue is that they introduce additional &#8220;hidden&#8221; code paths, which makes the code harder to read, understand and reason about.</p>
<p>Consider the following piece of code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>ss<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>load_completed<span style="color: #008000;">&#40;</span>id<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	Data data <span style="color: #000080;">=</span> ss<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>loaded_data<span style="color: #008000;">&#40;</span>id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	...
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>By just glancing at this code, it is pretty hard to tell that an error in <i>load_completed()</i> will cause it to leave the current function and jump to some other location higher up in the call stack.</p>
<p>When exceptions are used you can&#8217;t just read the code straight up. You have to consider that at every single line you are looking at, an exception might be raised and the code flow changed.</p>
<p>This leads me to the concept of <i>exception safety</i>. Is your code &#8220;exception safe&#8221;? I&#8217;ll go out on a limb and say: probably not. Writing &#8220;exception safe&#8221; code requires having a mindset where you view every single function in your code base as a &#8220;transaction&#8221; that can be fully or partially rolled back in the case of an exception. That is a lot of extra effort, especially if you need to do it in <i>every single line</i> in your code base.</p>
<p>It might still be worth it, of course, if exceptions had many other advantages. But as a method for dealing with expected errors, I just don&#8217;t see those advantages, so I&#8217;d rather use my brain cycles for something else.</p>
<p>So what do I propose instead? Error codes!</p>
<p>Yes, yes I know, we all hate error codes, but <i>why</i> do we hate them? As I see it, there are three main problems with using error codes for error reporting:</p>
<ol>
<li>The code gets littered with error checks, making it hard to read.</li>
<li>Undescriptive error codes lead to confusion about what errors a function can return and what they mean.</li>
<li>Since C functions cannot return multiple values, we cannot both return an error code and a result. If we use error codes, the result must be returned in a parameter, which is inelegant.</li>
</ol>
<p>I have already addressed the first two points. By designing our API so that errors only happen in a few places, we minimize the checks that are needed. And instead of returning an undescriptive generic error code, we should return a function-specific enum that exactly describes the errors that the function can generate:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">enum</span> LoadResult <span style="color: #008000;">&#123;</span>IN_PROGRESS, COMPLETED, FILE_NOT_FOUND, FILE_COULD_NOT_BE_READ, FILE_CORRUPTED<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
LoadResult load_result<span style="color: #008000;">&#40;</span>LoadId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>As for the third problem, I don&#8217;t know why C programmers are so averse to just putting two values in a struct and returning that. In my opinion, this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> Data <span style="color: #008000;">&#123;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>p<span style="color: #008080;">;</span> <span style="color: #0000ff;">unsigned</span> len<span style="color: #008080;">;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
Data loaded_data<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>is a lot nicer than this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>loaded_data<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> <span style="color: #000040;">&amp;</span>len<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Maybe in them olden days, returning 8 bytes on the stack was such a horribly inefficient operation that it caused your vacuum tubes to explode. But clearly, it is time to move on. If you want to return multiple values &#8212; just do it! The &#8220;return in parameter&#8221; idiom should only be used for types where returning on the stack would cause memory allocation, such as strings or vectors.</p>
<p>This is how you return an error code in 2012:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> SaveResult <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">enum</span> <span style="color: #008000;">&#123;</span>NO_ERROR, DISK_FULL, WRITE_ERROR<span style="color: #008000;">&#125;</span> error<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> saved_bytes<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
SaveResult save_result<span style="color: #008000;">&#40;</span>SaveId id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>
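
<p>On the calling side this could look something like the sketch below. The body of <i>save_result()</i> is a fake that I have invented so that the example is self-contained, and <i>SaveId</i> and <i>save_message()</i> are assumptions as well:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">typedef unsigned SaveId; // assumption: the declaration above leaves SaveId undefined

struct SaveResult {
	enum {NO_ERROR, DISK_FULL, WRITE_ERROR} error;
	unsigned saved_bytes;
};

// Fake implementation for this sketch: the save with id 0 succeeded and
// every other save ran out of disk space.
SaveResult save_result(SaveId id)
{
	SaveResult r;
	r.error = (id == 0) ? SaveResult::NO_ERROR : SaveResult::DISK_FULL;
	r.saved_bytes = (id == 0) ? 1024 : 0;
	return r;
}

// Caller side: turns the result into a message for the player. Both the
// error code and the byte count come back in a single return value.
const char *save_message(SaveId id)
{
	SaveResult r = save_result(id);
	switch (r.error) {
	case SaveResult::NO_ERROR:    return "Game saved.";
	case SaveResult::DISK_FULL:   return "Not enough free space to save.";
	case SaveResult::WRITE_ERROR: return "The save could not be written.";
	}
	return "Unknown save error."; // not reached for the three values above
}</pre></td></tr></table></div>

<p>Since the enum is specific to this one function, the <i>switch</i> lists every error the caller can possibly see.</p>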

<p>In the next and final part of this series I&#8217;ll look at warnings.</p>
<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">Bitsquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/02/05/sensible-error-handling-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sensible Error Handling: Part 1</title>
		<link>http://www.altdevblogaday.com/2012/01/22/sensible-error-handling-part-1/</link>
		<comments>http://www.altdevblogaday.com/2012/01/22/sensible-error-handling-part-1/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 09:47:34 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=23290</guid>
		<description><![CDATA[<p>To err is human. But it is also quite computery. Unfortunately, error handling tends to bring out the worst in APIs.</p>
<p>Error handling is what makes your program go from something nice, clear and readable such as:</p>
<p><a href="http://www.altdevblogaday.com/2012/01/22/sensible-error-handling-part-1/" class="more-link">Read more on Sensible Error Handling: Part 1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>To err is human. But it is also quite computery. Unfortunately, error handling tends to bring out the worst in APIs.</p>
<p>Error handling is what makes your program go from something nice, clear and readable such as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">Stuff s <span style="color: #000080;">=</span> open_something<span style="color: #008000;">&#40;</span>x<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">int</span> len <span style="color: #000080;">=</span> get_size<span style="color: #008000;">&#40;</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
...</pre></td></tr></table></div>

<p>To some horrible monstrosity such as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">Stuff s<span style="color: #008080;">;</span>
<span style="color: #0000ff;">int</span> err <span style="color: #000080;">=</span> open_something<span style="color: #008000;">&#40;</span>x, <span style="color: #000040;">&amp;</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>err <span style="color: #000080;">==</span> E_STUFFNOTFOUND<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, "Something was not found"<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">goto</span> <span style="color: #0000dd;">exit</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span> <span style="color: #0000ff;">else</span> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>err <span style="color: #000080;">==</span> E_INVAL<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, "Something was invalid"<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">goto</span> <span style="color: #0000dd;">exit</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span> <span style="color: #0000ff;">else</span> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>err <span style="color: #000080;">==</span> E_RETRY <span style="color: #000040;">||</span> err <span style="color: #000080;">==</span> E_COMPUTERNOTINTHEMOOD<span style="color: #008000;">&#41;</span>
    <span style="color: #0000ff;">goto</span> <span style="color: #0000dd;">exit</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">int</span> len <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
err <span style="color: #000080;">=</span> get_size<span style="color: #008000;">&#40;</span>s, <span style="color: #000040;">&amp;</span>len<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>err <span style="color: #000080;">==</span> E_HULLNOTPOLARIZED<span style="color: #008000;">&#41;</span>
    <span style="color: #0000ff;">goto</span> close_and_exit<span style="color: #008080;">;</span>
...</pre></td></tr></table></div>

<p>In this article (and the follow-up) I’m going to discuss how I think you should design systems so that the error handling is as sensible as possible and the burden on the callers is minimized.</p>
<p>Note that I’m discussing this from the perspective of game development where errors will never cause serious damage to humans or property (I’m disregarding the keyboards smashed in frustration when a game crashes during the final minutes of a three hour boss fight).</p>
<h2>Types of Errors</h2>
<p>There are three main types of errors that we need to deal with:</p>
<ul>
<li>Expected errors</li>
<li>Unexpected errors</li>
<li>Warnings</li>
</ul>
<p>By an <i>expected error</i> I mean any kind of error that happens in a situation where the caller can reasonably expect that something might go wrong and has a plan for dealing with that. The most typical example is network code. Since the network may die at any time, the caller cannot just call <i>fetch_web_page()</i> and assume that she will get a valid result. She must always check for and be prepared to handle errors.</p>
<p>An <i>unexpected error</i> is an error that happens when the caller has no reason to assume that something might go wrong. A typical example might be a <i>NULL</i> pointer returned by an allocator that is out of memory or a corrupted internal state caused by a buffer overflow problem.</p>
<p>What errors can be considered ”expected” depends on context. When opening a saved game or a user config file, <i>File Not Found</i> might be an expected error, because we can expect the user to muck around with those files. When opening our main .<i>pak</i> bundles, <i>File Not Found</i> is an unexpected error, because we don’t expect the user to partially delete an installed game. And besides, there is not much we can do beyond displaying an error message if our data isn’t there.</p>
<p>A <i>warning</i> happens when someone has done something that is kind-of sort-of bad, probably, but we are able to continue running without any ill effects. An example might be a call to a deprecated function.</p>
<h2>Unexpected Errors</h2>
<p>The unexpected errors are the most common ones. Expected errors only happen in a few well-defined places, such as network code. Unexpected errors can happen everywhere. It is always safe to assume that your program contains lots of bugs that you have no idea about.</p>
<p>My policy for handling unexpected errors is simple:</p>
<blockquote><p>
Crash the engine as soon as possible with an informative error message.
</p></blockquote>
<p>This may seem like a totally irresponsible thing to do. Crashing is&#8230; bad, right?</p>
<p>Actually it is exactly the opposite.</p>
<p>If we didn’t crash it would be up to the caller to handle the error. So the programmer writing that code wouldn’t only have to think about what she wanted to achieve with our API, but also in what ways our code might fail and how she would have to handle that. That is more work and leads to cluttered code, as in the example above. It is also nearly impossible to do in a good way. Remember, these are unexpected errors. Anything might happen.</p>
<p>By crashing, the API is <i>taking full responsibility</i> for performing what the caller asks of it. We are saying: either we will do what you wanted or, if there is a problem with that, we will deal with that too. In either case, you don’t have to worry about it.</p>
<p>Crashing makes APIs simpler and reduces the mental burden of the caller. Here is what a file API might look like if designed with the ”crash”-philosophy in mind.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">bool</span> exists<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>path<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
Archive open<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>path<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note the curious absence of any error codes. If the caller passes a malformed path, we crash; we do not return an <i>E_INVALIDARGUMENT</i> error. If the file doesn’t exist, we crash. The caller is responsible for using <i>exists()</i> to check for files that might not exist. There are no errors for the caller to handle and the code will be clean and readable.</p>
<p>Since life is so much simpler for the caller when she doesn’t have to think about errors, we write our code with that in mind. Instead of functions returning error codes, such as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #666666;">/// Returns E_PARSE_ERROR on badly formatted Json, E_NULL if</span>
<span style="color: #666666;">/// passed a null pointer, E_OVERFLOW if too big, etc.</span>
<span style="color: #0000ff;">int</span> parse_json_number<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s, <span style="color: #0000ff;">double</span> <span style="color: #000040;">&amp;</span>number<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>we have functions that crash on errors:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">double</span> parse_json_number<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>In most cases this is all we need, because we expect the Json to be well formed. If it isn&#8217;t, some other part of our tech has made an error that needs to be fixed. If we had any situations where we could expect bad Json (perhaps hand-entered through the in-game console), we would add a validating function:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">bool</span> is_valid_json_number<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Now we can have some code that deals with bad data without forcing error handling into all our code.</p>
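<p>To make the split concrete, here is a minimal sketch of how the two functions might fit together. The bodies are my assumptions, not the engine’s code: the validator leans on <i>strtod</i>, which accepts more than the Json grammar does (leading whitespace, hex, ”inf”), and <i>set_gravity_from_console()</i> is a made-up caller standing in for the console case.</p>

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical validator: true if the whole string is a number strtod accepts.
// A real Json validator would be stricter than this.
bool is_valid_json_number(const char *s)
{
    if (s == nullptr || *s == '\0')
        return false;
    char *end = nullptr;
    std::strtod(s, &end);
    return end != s && *end == '\0';
}

// Crashing parser: no error code, bad input is treated as a bug elsewhere.
double parse_json_number(const char *s)
{
    if (!is_valid_json_number(s)) {
        std::fprintf(stderr, "Malformed Json number: %s\n", s ? s : "(null)");
        std::abort();
    }
    return std::strtod(s, nullptr);
}

// Hypothetical console handler: the one place where bad input is expected,
// so it validates first and never reaches the crashing path.
bool set_gravity_from_console(const char *arg, double &gravity)
{
    if (!is_valid_json_number(arg))
        return false; // tell the user, leave the old value alone
    gravity = parse_json_number(arg);
    return true;
}
```

<p>Only the console handler pays for validation. Every other caller uses <i>parse_json_number()</i> directly and has nothing to check.</p>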
<h3>But do we really need to crash?</h3>
<p>At this point, some people will probably agree with most of the things I say, but still feel uneasy about crashing. Because crashing is&#8230; bad, right? Nobody wants to be the programmer that crashed the engine. Surely it is better to write a really serious, really super-stern error message that can’t be ignored but then try to patch things up and soldier on so that we don’t crash. If a file doesn’t exist perhaps we can pretend that it did exist but was empty. If the Json we tried to parse was malformed, perhaps we can just return the part of it that we managed to parse. If the caller wants to access data beyond the end of an array, perhaps we can just return the last element.</p>
<p>No thanks.</p>
<p>I have two problems with this.</p>
<p>First, this makes programmers expend a lot of mental energy thinking about how to patch up an erroneous state. Most likely, this work is completely futile. They won’t be able to think about all the errors that might possibly occur. Attempts at patching things up will probably just cause a cascade of other errors and a more serious (and confusing) crash later on. And the ”error fixing” code will be strange and ugly. More code is always a burden, a cost. Let’s not spend it on magically patching up errors in ways that won’t work. Let’s focus on fixing the errors instead.</p>
<p>Second, I don’t care how stern your error message is, I promise you it will be ignored. If it happens infrequently, if it is just on one machine, if it is in a new system, if we just need to send these screen shots off today, if a deadline is coming up, if we’re past the deadline, if there’s another deadline. It will be ignored. Your code will gather more and more errors that don’t get fixed, until it is a glitchy, horrible mess.</p>
<p>That’s why I love crashing. It is an error that can’t be ignored. Of course it is unacceptable for an engine to crash. And that’s why the error will be fixed. Which will make everybody happier in the long run. Crashes improve the production process and lead to better quality code.</p>
<p>Nobody wants the game to crash for the end user, but the way to achieve that is with testing and bug fixing, not by finding ways of ignoring the errors that you detect.</p>
<h3>Exceptions</h3>
<p>Rather than crashing, isn’t it better to throw an exception? If the exception isn’t caught we get a crash, just as before. But we also have the option, if we really want to, to catch the exception and handle the error. It would seem that by using exceptions we can have our cake and eat it too.</p>
<p>Low-level programmers tend to abhor exceptions because they come with some performance overheads, even when they aren’t thrown. I’m not actually sure what the current status is, whether this is something that you still have to worry about or if exceptions are ”fast enough” on all current compilers and platforms.</p>
<p>I haven’t needed to care about that, because I dislike exceptions for the complexity they add. The crash model is dead simple, the code either works or not. The caller knows that she is not responsible for any error handling.</p>
<p>With exceptions, this clear and useful distinction between expected and unexpected errors is muddled and the caller is faced with a number of questions:</p>
<p>This function throws exceptions. Do I need to handle those? What kind of exceptions might it throw? Even if I don’t catch the exception, might someone higher up in the call hierarchy do it? Does this mean that I need to write all my code so that the state is valid if an exception is thrown somewhere (might be anywhere, really) by one of the functions I call? What if I’m in a constructor? What if I’m in a destructor?</p>
<p>By using exceptions instead of just crashing we are creating a more complicated API (the API now includes all the different exceptions that the different functions might throw) and significantly increasing the mental burden on the caller for very little gain.</p>
<h3>Good error reports</h3>
<p>When we crash, we try to create an error message and a log report that is as informative as possible to facilitate debugging of the problem. Our reports always include:</p>
<ul>
<li>A description of the error</li>
<li>The call stack</li>
<li>The error context</li>
</ul>
<p>We use printf-formatting to create the error message. Note that the C preprocessor supports variadic macros, so you can create macros that work like printf:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#if defined(DEVELOPMENT)</span>
    <span style="color: #339900;">#define XASSERT(test, msg, ...) do {if (!(test)) error(__LINE__, __FILE__, \
        &quot;Assertion failed: %s\n\n&quot; msg, #test,  __VA_ARGS__);} while (0)</span>
<span style="color: #339900;">#else</span>
    <span style="color: #339900;">#define XASSERT(test, msg, ...) ((void)0)</span>
<span style="color: #339900;">#endif</span>
&nbsp;
XASSERT<span style="color: #008000;">&#40;</span>exists<span style="color: #008000;">&#40;</span>file<span style="color: #008000;">&#41;</span>, <span style="color: #FF0000;">&quot;File %s does not exist&quot;</span>, file<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>
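
<p>The macro hands its arguments to an <i>error()</i> function that the listing doesn’t show. A minimal sketch of what such a function might look like follows. The argument order is taken from the macro, but the buffer sizes, the ”file:line: message” layout and the <i>format_report</i> helpers are my assumptions, and a real version would also append the call stack and error context before crashing.</p>

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: builds "file:line: message" from a printf-style
// format and an already started va_list. Long messages are truncated.
std::string format_report_v(int line, const char *file, const char *format, va_list args)
{
    char message[1024];
    std::vsnprintf(message, sizeof(message), format, args);
    char report[1200];
    std::snprintf(report, sizeof(report), "%s:%d: %s", file, line, message);
    return report;
}

// Variadic convenience wrapper around format_report_v.
std::string format_report(int line, const char *file, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    std::string report = format_report_v(line, file, format, args);
    va_end(args);
    return report;
}

// Same argument order as the call in XASSERT: print the report, then crash.
void error(int line, const char *file, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    std::string report = format_report_v(line, file, format, args);
    va_end(args);
    std::fprintf(stderr, "%s\n", report.c_str());
    std::abort();
}
```

<p>Keeping the formatting in its own function means the report text can be checked without actually taking the process down.</p>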

<p>Call stack generation and translation from raw addresses to file names and line numbers is platform specific and a lot more cumbersome than it ought to be. But it is still well worth doing. Call stacks let you diagnose many errors at a glance. It is a lot faster than loading up crash dumps in the debugger.</p>
<p>On Windows, use <i>StackWalk64</i> to generate the call stack and the <i>Sym*</i> functions to translate it.</p>
<p>The error context is our way of providing contexts for error messages. The problem is that sometimes crashes happen in deeply nested code that doesn’t have all the information we would like to give to the user. For example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">double</span> parse_json_number<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>If there is a parse error, it would be very helpful for the user to know in which file the error occurred. But the <i>parse_json_number</i> function doesn’t know that. It doesn’t even know if there is a file. It might have been asked to parse data from network or memory.</p>
<p>If we were using exceptions we could handle this by catching the exception at a higher level, adding some information to it (such as the file name) and rethrowing it. But that is rather tedious and also tricky to do in a good way. If we want to add the information to the original exception, then it must already have members for all the possible information that all higher level functions might want to add. That’s a bit strange. Should we throw a new exception? Then the exception gets thrown from the ”wrong place”. The result of all this is that people seldom bother ”decorating” their exceptions in this way. At least I’ve never seen a code base that does it systematically.</p>
<p>What we do instead, is to allow the programmer to define error contexts using scope variables:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> init<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>file<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    ErrorContext ec<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Parsing JSON:&quot;</span>, file<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    JsonDoc <span style="color: #000040;">*</span>doc <span style="color: #000080;">=</span> parse_json<span style="color: #008000;">&#40;</span>file<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>The error contexts get stored on a stack:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">__THREAD Array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> <span style="color: #000040;">*</span>_error_context_name<span style="color: #008080;">;</span>
__THREAD Array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> <span style="color: #000040;">*</span>_error_context_data<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">class</span> ErrorContext
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
    ErrorContext<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>name, <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>data<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        _error_context_name<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>push_back<span style="color: #008000;">&#40;</span>name<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        _error_context_data<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>push_back<span style="color: #008000;">&#40;</span>data<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
    ~ErrorContext<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        _error_context_name<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>pop_back<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        _error_context_data<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>pop_back<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that we only store string pointers, not the full string data. We assume that whatever string the user gives us lives in the same scope as the error context and is valid as long as the error context is. This means that setting the error context just requires pushing 8 bytes to a stack, so the performance overhead is very small.</p>
<p>Note also that the stack uses thread local storage, so we have separate error context stacks for our different execution threads.</p>
<p>When an error occurs, we print all the contexts in the stack, giving the user a good idea of where the error occurred:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">When spawning level: big_world
When spawning unit: big_bird
When applying material: feathers
Assertion failed: texture != NULL
    Texture not loaded: yellow_feathers
    In material_manager.cpp:1337</pre></td></tr></table></div>
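
<p>For readers who want to try the idea outside the engine, here is a self-contained sketch of the same pattern. It is not our code: it assumes C++11 <i>thread_local</i> and <i>std::vector</i> in place of <i>__THREAD</i> and <i>Array</i>, keeps name and data in one stack of pairs, and <i>describe_error_contexts()</i> is a made-up name for the step that prints the stack.</p>

```cpp
#include <string>
#include <vector>

// One name/data pair per active context. Only pointers are stored, so the
// strings must outlive the ErrorContext that refers to them.
struct ContextEntry {
    const char *name;
    const char *data;
};

// One stack per thread (assumption: C++11 thread_local instead of __THREAD).
thread_local std::vector<ContextEntry> error_context_stack;

class ErrorContext
{
public:
    ErrorContext(const char *name, const char *data) {
        error_context_stack.push_back({name, data});
    }
    ~ErrorContext() {
        error_context_stack.pop_back();
    }
    // Copying would pop the same entry twice, so forbid it.
    ErrorContext(const ErrorContext &) = delete;
    ErrorContext &operator=(const ErrorContext &) = delete;
};

// Builds the context part of an error report, outermost context first.
std::string describe_error_contexts()
{
    std::string out;
    for (const ContextEntry &entry : error_context_stack) {
        out += entry.name;
        out += ' ';
        out += entry.data;
        out += '\n';
    }
    return out;
}
```

<p>With an <i>ErrorContext(”When spawning level:”, ”big_world”)</i> in scope, the returned text starts with the same kind of line as the report above, and the entry disappears again when the scope ends.</p>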

<h3>Next time</h3>
<p>Next time, I’ll look at the other kinds of errors: expected errors and warnings.</p>
<p><i>(This has also been posted to <a href="http://bitsquid.blogspot.com/">the Bitsquid blog</a>.)</i></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/01/22/sensible-error-handling-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 Tips for Programmer Productivity</title>
		<link>http://www.altdevblogaday.com/2012/01/06/5-tips-for-programmer-productivity/</link>
		<comments>http://www.altdevblogaday.com/2012/01/06/5-tips-for-programmer-productivity/#comments</comments>
		<pubDate>Fri, 06 Jan 2012 12:17:26 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Game design]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[communication]]></category>
		<category><![CDATA[distractions]]></category>
		<category><![CDATA[interruption solutions]]></category>
		<category><![CDATA[maintaining productivity]]></category>
		<category><![CDATA[problem solving]]></category>
		<category><![CDATA[staying focused]]></category>
		<category><![CDATA[time management]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=22474</guid>
		<description><![CDATA[<h2>1. Embrace the now-principle</h2>
<p>If something takes less than five minutes to do, do it immediately.</p>
<p>It seems like the lazy option, but postponing something actually takes a lot of effort. The task needs to be written down somewhere. Then you need to track it and prioritize it with respect to other tasks. You will probably think <em>about</em> doing it lots of times, before you actually get down to <em>doing</em> it. And then you have to understand what you meant when you wrote it down and try to get back in that same mindset.</p>
<p><a href="http://www.altdevblogaday.com/2012/01/06/5-tips-for-programmer-productivity/" class="more-link">Read more on 5 Tips for Programmer Productivity&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<h2>1. Embrace the now-principle</h2>
<p>If something takes less than five minutes to do, do it immediately.</p>
<p>It seems like the lazy option, but postponing something actually takes a lot of effort. The task needs to be written down somewhere. Then you need to track it and prioritize it with respect to other tasks. You will probably think <em>about</em> doing it lots of times, before you actually get down to <em>doing</em> it. And then you have to understand what you meant when you wrote it down and try to get back in that same mindset.</p>
<p>For small tasks it is just not worth it.</p>
<p>Instead, just do it. Fix the issue right now while you are already thinking about it. It is faster, simpler and saves you the agony of an ever growing todo-list.</p>
<h2>2. Fix the cause, not just the symptom</h2>
<p>Don’t just fix problems. Fix the <em>processes</em> that allowed the problems to occur, so that the same problems never occur again. See bugs not as nuisances, but as chances to improve your processes and increase the quality of your code.</p>
<p>If an artist tells you that she gets an ”Error when compiling unit” error, don’t just diagnose it and tell her: ”It’s because you have two nodes with the same name, that is not allowed.” At the <em>very least</em> fix the error message so that it says ”Error when compiling unit ‘bed’. Two nodes have the same name ‘pillow’. One of them must be renamed so that names are unique.” Even better, fix the exporter or the tool, so that it is <em>impossible</em> for the artist to create two nodes with the same name.</p>
<p>If you find an error that could have been caught by an assert, then add that assert so that it will find the error next time.</p>
<p>If someone asks you ”How can I configure the animation compression?”, don’t just answer them. Also write a short text that explains how it is done, and <em>add that text to the documentation</em>.</p>
<p>In this way, you are not just patching holes and fixing leaks, you are actively making things better. This not only pleases the people who come to you with problems, it also makes your work feel more meaningful.</p>
<h2>3. Try not to break concentration while ”the computer is working”</h2>
<p>The job of programming is filled with a lot of weird little micro pauses. The code is compiling. The console is rebooting. The level is loading. The client is connecting. Etc.</p>
<p>In the best of worlds, these pauses would not exist, and by all means, do all you can to get rid of them. Make your code build faster. Hot reload data and scripts. Make a tool for quickly setting up a bunch of PS3s for a network test. Etc.</p>
<p>But even with the best of efforts, some pauses will likely remain. The question is what to do with them. The temptation is to take a short break from programming and do something else: check mail, answer a Skype message, read two paragraphs of an interesting article, update the Twitter feed, etc.</p>
<p>For me, these constant mental context switches can put a real damper on productivity, since they make it impossible to maintain concentration and flow.</p>
<p>Nor are these micro-excursions particularly relaxing. Reading two paragraphs of a web page while constantly glancing at a progress bar on the other monitor is not something that soothes my mind. Quite the contrary, it is much more stressful than remaining in the zen-like state of concentrated work. It is much better to take one real break than a hundred micro breaks.</p>
<p>So for both productivity and peace of mind I now make a conscious effort to stay focused on the problem at hand while ”the computer is working”. I have a <a href="http://www.sublimetext.com/2">separate text editor</a>, unaffected by IDE freezes, where I can work on related tasks, such as:</p>
<ul>
<li>Adding documentation</li>
<li>Refactoring and code review</li>
<li>Planning the next stage of implementation</li>
<li>Writing script code that tests the system</li>
</ul>
<p>It is still an ongoing battle against the Lure of the Internet, but I find that when I manage to stay focused I am both more productive and more relaxed.</p>
<h2>4. Use source control even more than you think you should</h2>
<p>Source control is not just for source code. With modern distributed source control systems such as Mercurial and Git it is dead simple to create a source repository anywhere and then later (if needed) push it to a server for backup/sharing.</p>
<p>Do you have configuration and settings files for your text editor, IDE, etc? Put them in source control so that you can easily share them between your different machines. Do they need to be installed in special locations? Put them in source control together with a script that installs them in the right place.</p>
<p>Do you use any third party libraries such as zlib, LuaJIT or stb_vorbis? Check them into source control. That way, if you have to do any modifications (bug fixes, fixes for compiler warnings, platform fixes, your own personal optimizations, etc) you will know exactly what you have changed. If a new version of the library is released you can use the source control diff to see what has changed upstream and merge it with your own local changes.</p>
<p>Does an API come with sample code? Before you start playing with that sample code, check it into source control. That way, you can always revert the samples back to their pristine state, without having to reinstall the API. And if you find a bug in the API and manage to reproduce it in one of the samples, you can use the source control tool to produce a .patch file for the sample that you can send to the API manufacturers as part of your bug report. That will keep both you and them happy.</p>
<h2>5. Monitor your builds</h2>
<p>Set up a build server that continuously builds all your executables (engine, tools, exporters, &#8230;) in all configurations (debug, development, release) for all platforms, so you know as soon as possible if something breaks. Fixing a problem right away is much easier than doing it two months down the line.</p>
<p>The build server doesn’t have to be a complicated thing. It is more important that it exists than that it has all the bells and whistles. If you don’t have time to do something advanced <a href="http://altdevblogaday.com/2011/05/11/write-a-script-for-it/">just write a script</a> that compiles everything and reports the result. You can expand on that later.</p>
<p>Do the same for content. Write a script that loads all levels and spawns all units.</p>
<p>Use the report system that works best for you. We use Skype for internal real-time communication, so it makes sense to report broken builds over Skype. If e-mail or IRC works better for you, use that instead.</p>
<p>(Also posted to <a href="http://bitsquid.blogspot.com/2012/01/5-tips-for-programmer-productivity.html">the BitSquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/01/06/5-tips-for-programmer-productivity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Platform Specific Resources</title>
		<link>http://www.altdevblogaday.com/2011/12/22/platform-specific-resources/</link>
		<comments>http://www.altdevblogaday.com/2011/12/22/platform-specific-resources/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 21:37:17 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[resources]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=21886</guid>
		<description><![CDATA[<p>I recently added a new feature to the BitSquid tool chain – support for source and destination platforms in the data compiler. What it means is that you can take the data for one platform (the source) and compile it to run on a different platform (the destination). So you can take the data for the mobile version of a game (with all its content optimizations) and compile it so that it runs on your development PC.</p>
<p><a href="http://www.altdevblogaday.com/2011/12/22/platform-specific-resources/" class="more-link">Read more on Platform Specific Resources&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I recently added a new feature to the BitSquid tool chain – support for source and destination platforms in the data compiler. What it means is that you can take the data for one platform (the source) and compile it to run on a different platform (the destination). So you can take the data for the mobile version of a game (with all its content optimizations) and compile it so that it runs on your development PC.</p>
<p>This is nice for two reasons. First, access to target hardware can be limited. In a perfect world, every artist would have a dev kit for every target platform. In practice, this might not be economically possible. It might not even be <em>electrically</em> possible (those main fuses can only take so much). Being able to preview and play console/handheld content on PC is better than nothing, in this less-than-perfect world.</p>
<p>Second, since all our editors use the engine for visualization, if we have specified a handheld device as our source platform, all the editors will automatically show the resources as they will appear on that device.</p>
<p>This new feature gives me a chance to talk a little bit about how we have implemented support for platform specific resources, something I haven’t touched on before in this blog.</p>
<p>The BitSquid Tech uses the regular file system for its source data. A resource is identified by its name and type, both of which are determined from the path to the source file:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/properties_1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/12/properties_1.png" alt="" width="400" class="aligncenter size-full wp-image-21887" /></a></p>
<p>Note that even though the name <em>is</em> a path, it is not treated as one, but as a unique identifier. It is hashed to a 64-bit integer by the engine, so to refer to a resource you must always specify its full name (and get the same hash result). In the compiled data, the raw names don’t even exist anymore; the files are stored in flat directories indexed by the hash values.</p>
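<p>As an illustration of what ”hashed to a 64-bit integer” means in practice, here is a minimal sketch. The hash function is an assumption on my part: I use 64-bit FNV-1a because it fits in a few lines, not because it is what the engine uses, and <i>hash_resource_name()</i> is a made-up name.</p>

```cpp
#include <cstdint>

// Illustrative 64-bit FNV-1a hash of a resource name. The choice of FNV-1a
// is an assumption for this sketch, not the engine's actual hash function.
std::uint64_t hash_resource_name(const char *name)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV-1a offset basis
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(name); *p; ++p) {
        hash ^= *p;
        hash *= 1099511628211ULL; // FNV-1a prime
    }
    return hash;
}
```

<p>Because only the hash survives compilation, the same resource is found only when the caller spells out exactly the same full name. A partial or relative path hashes to something else entirely.</p>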
<p>In addition to name and type a resource can also have a number of properties. Properties are dot-separated strings that appear before the type in the file name:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/properties_2.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/12/properties_2.png" alt="" width="500" class="aligncenter size-full wp-image-21888" /></a></p>
<p>Properties are used to indicate different variants of the same resource. So all these files represent variants of the same resource:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">buttons<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>ps3<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>en<span style="color: #66cc66;">.</span>x360<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>fr<span style="color: #66cc66;">.</span>x360<span style="color: #66cc66;">.</span>texture</pre></td></tr></table></div>
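
<p>To show how a file name like the ones above breaks down, here is a small sketch that splits a path into name, properties and type. The rules are my reading of the description, not the data compiler’s code: the name is everything up to the first dot in the file name (directories included), the last dot-separated part is the type, and whatever sits between them is a property. <i>ResourceId</i> and <i>parse_resource_path()</i> are made-up names.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical result of splitting a source file path.
struct ResourceId {
    std::string name;                    // e.g. "gui/buttons"
    std::vector<std::string> properties; // e.g. {"en", "x360"}
    std::string type;                    // e.g. "texture"
};

ResourceId parse_resource_path(const std::string &path)
{
    ResourceId id;
    // Only dots in the file name count, not dots in directory names.
    std::size_t slash = path.find_last_of('/');
    std::size_t start = (slash == std::string::npos) ? 0 : slash + 1;
    std::size_t dot = path.find('.', start);
    if (dot == std::string::npos) {
        id.name = path; // no type at all
        return id;
    }
    id.name = path.substr(0, dot);

    // Split the rest on dots: properties first, type last.
    std::vector<std::string> parts;
    std::size_t begin = dot + 1;
    for (;;) {
        std::size_t next = path.find('.', begin);
        if (next == std::string::npos) {
            parts.push_back(path.substr(begin));
            break;
        }
        parts.push_back(path.substr(begin, next - begin));
        begin = next + 1;
    }
    id.type = parts.back();
    parts.pop_back();
    id.properties = parts;
    return id;
}
```

<p>With these rules all four files in the list share the name <i>buttons</i> and the type <i>texture</i>, and differ only in their property lists.</p>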

<p>The two most important forms of properties are <em>platforms</em> and <em>languages</em>. </p>
<p><em>Platform properties</em> (x360, ps3, android, win32, etc) are used to provide platform specific versions of resources. This can be used for platform optimized versions of units and levels. Another use is for controller and button images that differ from platform to platform. Since BitSquid is scripted in Lua and Lua files are just a resource like any other, this can also be used for platform specific gameplay code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">PlayerController<span style="color: #66cc66;">.</span>android<span style="color: #66cc66;">.</span>lua</pre></td></tr></table></div>

<p><em>Language properties</em> (en, fr, jp, it, sv, etc) are used for localization. Since all resources can have properties, all resources can be localized.</p>
<p>But the property system is not limited to platforms and languages. A developer can make up whatever properties she needs and use them to provide different variants of resources:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">bullet_hit<span style="color: #66cc66;">.</span>noblood<span style="color: #66cc66;">.</span>particle_effect
foliage<span style="color: #66cc66;">.</span>withkittens<span style="color: #66cc66;">.</span>texture</pre></td></tr></table></div>

<p>Properties can be resolved either at data compile time or at runtime.</p>
<p>Platform properties are resolved at compile time. When we compile for PS3 and a resource has <em>ps3</em> specific variants, only those variants are included in the compiled data. (If the resource doesn’t have any ps3 variants, we include all variants that do not have a specified platform.)</p>
<p>Language properties and other custom properties are resolved at runtime. All variants are compiled to the runtime data. When running, the game can specify what resource variants it wants with a <em>property preference order</em>. The property preference order specifies the variants it wants to use, in order of preference.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">Application<span style="color: #66cc66;">.</span>set_property_preference_order <span style="color: #66cc66;">&#123;</span>&quot;withkittens&quot;<span style="color: #66cc66;">,</span> &quot;noblood&quot;<span style="color: #66cc66;">,</span> &quot;fr&quot;<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>This means that the game would prefer to get a resource that has lots of kittens, no blood and is in French. But if it can’t get all that, it would rather have something that is kitten-full than blood-free. And it prefers a bloodless English resource to a bloody French one.</p>
<p>In other words, if we requested the resource <em>buttons.texture</em> with these settings, the engine would look for variants in the order:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">buttons<span style="color: #66cc66;">.</span>withkittens<span style="color: #66cc66;">.</span>noblood<span style="color: #66cc66;">.</span>fr<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>withkittens<span style="color: #66cc66;">.</span>noblood<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>withkittens<span style="color: #66cc66;">.</span>fr<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>withkittens<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>noblood<span style="color: #66cc66;">.</span>fr<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>noblood<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>fr<span style="color: #66cc66;">.</span>texture
buttons<span style="color: #66cc66;">.</span>texture</pre></td></tr></table></div>
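<p>One way to produce a search order like the one above is to treat every subset of the preferred properties as a bitmask, with the most preferred property as the highest bit, and count downwards. The C++ function below is a minimal sketch of that idea. <em>variant_lookup_order()</em> is a hypothetical name, and the engine may well implement the lookup differently (for example on hashes rather than strings).</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the variant names to try, most preferred first.
// `preference` lists the properties from most to least important.
inline std::vector<std::string> variant_lookup_order(
	const std::string &name, const std::string &type,
	const std::vector<std::string> &preference)
{
	assert(preference.size() < 32); // the subsets must fit in an unsigned mask
	const unsigned n = static_cast<unsigned>(preference.size());
	std::vector<std::string> result;

	// Each mask is a subset of the properties. The most preferred property
	// is the highest bit, so counting down from 2^n - 1 to 0 visits the
	// subsets in order of preference.
	for (unsigned mask = 1u << n; mask-- > 0;) {
		std::string variant = name;
		for (unsigned i = 0; i < n; ++i)
			if (mask & (1u << (n - 1 - i)))
				variant += "." + preference[i];
		result.push_back(variant + "." + type);
	}
	return result;
}
```

<p>Called with the name <em>buttons</em>, the type <em>texture</em> and the preference order <em>withkittens</em>, <em>noblood</em>, <em>fr</em>, this yields the eight names in the list above, in the same order.</p>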

<p>To add support for different source and destination platforms to this system all I had to do was to add a feature that lets the data compiler use <em>one</em> platform for resolving properties and a <em>different</em> platform as the format for the runtime files it produces.</p>
<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/22/platform-specific-resources/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Pragmatic Approach to Performance</title>
		<link>http://www.altdevblogaday.com/2011/12/08/a-pragmatic-approach-to-performance/</link>
		<comments>http://www.altdevblogaday.com/2011/12/08/a-pragmatic-approach-to-performance/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 00:23:46 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithmic complexity]]></category>
		<category><![CDATA[lua]]></category>
		<category><![CDATA[memory access]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[parallelization]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[pragmatic]]></category>
		<category><![CDATA[sampling profiler]]></category>
		<category><![CDATA[time management]]></category>
		<category><![CDATA[top down]]></category>
		<category><![CDATA[top-down profiler]]></category>
		<category><![CDATA[top-down profiling]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=21278</guid>
		<description><![CDATA[<p>Is premature optimization the root of all evil? Or is the fix-it-later attitude to performance turning programmers from proud ”computer scientists” to despicable ”script kiddies”?</p>
<p>These are questions without definite answers, but in this article I’ll try to describe my own approach to performance: how I go about ensuring that my systems run decently without compromising other goals, such as modularity, maintainability and flexibility.</p>
<p><a href="http://www.altdevblogaday.com/2011/12/08/a-pragmatic-approach-to-performance/" class="more-link">Read more on A Pragmatic Approach to Performance&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Is premature optimization the root of all evil? Or is the fix-it-later attitude to performance turning programmers from proud ”computer scientists” to despicable ”script kiddies”?</p>
<p>These are questions without definite answers, but in this article I’ll try to describe my own approach to performance: how I go about ensuring that my systems run decently without compromising other goals, such as modularity, maintainability and flexibility.</p>
<p><strong>§1 Programmer time is a finite resource</strong></p>
<p>If you are writing a big program, some parts of the code will not be as fast as theoretically possible. Sorry, let me rephrase. If you are writing a big program, <em>no part</em> of the code will be as fast as theoretically possible. Yes, I think it is reasonable to assume that every single line of your code could be made to run a little tiny bit faster.</p>
<p>Writing fast software is not about maximum performance all the time. It is about <em>good performance where it matters</em>. If you spend three weeks optimizing a small piece of code that only gets called once a frame, then that’s three weeks of work you could have spent doing something more meaningful. If you had spent it on optimizing code that actually mattered, you could even have made a significant improvement to the game’s frame rate.</p>
<p>There is never enough time to add all the features, fix all the bugs and optimize all the code, so the goal should always be maximum performance for minimum effort.</p>
<p><strong>§2 Don’t underestimate the power of simplicity</strong></p>
<p>Simple solutions are easier to implement than complex solutions. But that’s only the tip of the iceberg. The real benefits of simple solutions come in the long run. Simple solutions are easier to understand, easier to debug, easier to maintain, easier to port, easier to profile, easier to optimize, easier to parallelize and easier to replace. Over time, all these savings add up.</p>
<p>Using a simple solution can save so much time that even if it is slower than a more complex solution, as a whole your program will run faster, because you can use the time you saved to optimize other parts of the code. The parts that really matter.</p>
<p>I only use complex solutions when it is really justified, i.e., when the complex solution is significantly faster than the simple one (a factor of 2 or so) and when it is in a system that matters (one that consumes a significant percentage of the frame time).</p>
<p>Of course simplicity is in the eyes of the beholder. I think arrays are simple. I think POD data types are simple. I think blobs are simple. I don’t think class structures with 12 levels of inheritance are simple. I don’t think classes templated on 8 policy class parameters are simple. I don’t think geometric algebra is simple.</p>
<p><strong>§3 Take advantage of the system design opportunity</strong></p>
<p>Some people seem to think that to avoid ”premature optimization” you should design your systems without any regard to performance whatsoever. You should just slap something together and fix it later when you ”optimize” the code.</p>
<p>I wholeheartedly disagree. Not because I love performance for its own sake, but for purely pragmatic reasons.</p>
<p>When you design a system you have a clear picture in your head of how the different pieces fit together, what the requirements are and how often different functions get called. At that point, it is not much extra effort to take a few moments to think about how the system will perform and how you can set up the data structures so that it runs as fast as possible.</p>
<p>In contrast, if you build your system without considering performance and have to come in and ”fix it” at some later point, that will be much harder. If you have to rearrange the fundamental data structures or add multithreading support, you may have to rewrite the entire system almost from scratch. Only now the system is in production, so you may be restricted by the published API and dependencies on other systems. Also, you cannot break any of the projects that are using the system. And since it has been several months since you (or someone else) wrote the code, you have to start by understanding all the thoughts that went into it. And all the little bug fixes and feature tweaks that have been added over time will most likely be lost in the rewrite. You will start again with a fresh batch of bugs.</p>
<p>So by just following our general guideline ”maximum performance for minimum effort”, we see that it is better to consider performance up front, simply because that requires a lot less effort than fixing it later.</p>
<p>Within reason, of course. The performance improvements we do up front are easier, but we are less sure that they matter in the big picture. Later, profile-guided fixes require more effort, but we know better where to focus our attention. As in all of life, balance is important.</p>
<p>When I design a system, I do a rough estimate of how many times each piece of code will be executed per frame and use that to guide the design:</p>
<ul>
<li>1-10 Performance doesn’t matter. Do whatever you want.</li>
<li>100 Make sure it is O(n), data-oriented and cache friendly</li>
<li>1000 Make sure it is multithreaded</li>
<li>10000 Think really hard about what you are doing</li>
</ul>
<p>I also have a few general guidelines that I try to follow when writing new systems:</p>
<ul>
<li>Put static data in immutable, single-allocation memory blobs</li>
<li>Allocate dynamic data in big contiguous chunks</li>
<li>Use as little memory as possible</li>
<li>Prefer arrays to complex data structures</li>
<li>Access memory linearly (in a cache friendly way)</li>
<li>Make sure procedures run in O(n) time</li>
<li>Avoid ”do nothing” updates &#8212; instead, keep track of active objects</li>
<li>If the system handles many objects, support data parallelism</li>
</ul>
<p>By now I have written so many systems in this ”style” that it doesn’t require much effort to follow these guidelines. And I know that by doing so I get a decent baseline performance. The guidelines focus on the most important low-hanging fruit: algorithmic complexity, memory access and parallelization and thus give good performance for a relatively small effort.</p>
<p>Of course it is not always possible to follow all guidelines. For example, some algorithms really require more than O(n) time. But I know that when I go outside the guidelines I need to stop and think things through, to make sure I don’t trash the performance.</p>
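<p>As an illustration of a few of these guidelines (arrays, linear memory access, O(n) updates and tracking only the active objects), here is a minimal C++ sketch. The <em>Timer</em> struct and <em>update_timers()</em> are made up for this example and do not come from the engine. Only timers that are still running live in the array, and expired ones are removed with swap-and-pop so the array stays contiguous.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical POD object: a countdown timer identified by an integer id.
struct Timer
{
	int id;
	float remaining;
};

// Updates only the timers that are still running. The cost is O(n) in the
// number of active timers, no matter how many timers have ever existed.
// The ids of expired timers are appended to `finished`.
inline void update_timers(std::vector<Timer> &active, float dt, std::vector<int> &finished)
{
	assert(dt >= 0.0f);
	for (std::size_t i = 0; i < active.size();) {
		active[i].remaining -= dt;
		if (active[i].remaining <= 0.0f) {
			finished.push_back(active[i].id);
			// Swap-and-pop: overwrite with the last element and shrink.
			// Do not advance i, the moved element still needs its update.
			active[i] = active.back();
			active.pop_back();
		} else {
			++i;
		}
	}
}
```

<p>The swap-and-pop removal does not preserve the order of the array, which is usually fine for this kind of bookkeeping. If the order matters, a different removal strategy is needed.</p>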
<p><strong>§4 Use top-down profiling to find bottlenecks</strong></p>
<p>No matter how good your up front design is, your code will be spending time in unexpected places. The content people will use your system in crazy ways and expose bottlenecks that you’ve never thought about. There will be bugs in your code. Some of these bugs will not result in outright crashes, just bad performance. There will be things you haven’t really thought through.</p>
<p>To understand where your program is <em>actually</em> spending its time, a top down profiler is an invaluable tool. We use explicit profiler scopes in our code and pipe the data live over the network to an external tool that can visualize it in various ways:</p>
<div id="attachment_21280" class="wp-caption alignnone" style="width: 878px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/12/Pragmatic-Performance-1.jpg"><img class="size-full wp-image-21280" src="http://altdevblogaday.com/wp-content/uploads/2011/12/Pragmatic-Performance-1.jpg" alt="" width="868" height="386" /></a><p class="wp-caption-text">An (old) screenshot of the BitSquid Profiler</p></div>
<p>The top-down profiler tells you where your optimization efforts need to be focused. Do you spend 60 % of the frame time in the animation system and 0.5 % in the Gui? Then any optimizations you can make to the animations will really pay off, but what you do with the Gui won’t matter one iota.</p>
<p>With a top-down profiler you can insert narrower and narrower profiler scopes in the code to get to the root of a performance problem &#8212; where the time is actually being spent.</p>
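<p>Explicit profiler scopes are commonly implemented as small RAII objects that record the time between their construction and destruction. The C++ sketch below shows the general idea. The <em>Profiler</em>, <em>ProfilerSample</em> and <em>ProfilerScope</em> types are hypothetical and much simpler than a real profiler, which would, for example, stream the samples over the network instead of storing them in a vector.</p>

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// One finished scope: its name, nesting depth and duration.
struct ProfilerSample
{
	std::string name;
	int depth;
	double milliseconds;
};

// Hypothetical collector. A real tool would send the samples to a viewer.
struct Profiler
{
	std::vector<ProfilerSample> samples;
	int depth = 0;
};

// RAII scope: starts timing in the constructor and records a sample in the
// destructor. Nested scopes get an increasing depth.
class ProfilerScope
{
public:
	ProfilerScope(Profiler &profiler, const char *name)
		: profiler_(profiler), name_(name), depth_(profiler.depth++),
		  start_(std::chrono::steady_clock::now()) {}

	~ProfilerScope()
	{
		const std::chrono::duration<double, std::milli> elapsed =
			std::chrono::steady_clock::now() - start_;
		assert(profiler_.depth > 0);
		--profiler_.depth;
		profiler_.samples.push_back({name_, depth_, elapsed.count()});
	}

	ProfilerScope(const ProfilerScope &) = delete;
	ProfilerScope &operator=(const ProfilerScope &) = delete;

private:
	Profiler &profiler_;
	const char *name_;
	int depth_;
	std::chrono::steady_clock::time_point start_;
};
```

<p>Drilling down into a problem is then a matter of declaring a new <em>ProfilerScope</em> inside the block you are curious about. Inner scopes finish first, so they appear before their parents in the sample list.</p>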
<p>I use the general design guidelines to get a good baseline performance for all systems and then drill down with the top-down profiler to find those systems that need a little bit of extra optimization attention.</p>
<p><strong>§5 Use bottom-up profiling to find low-level optimization targets</strong></p>
<p>I find that as a general tool, interactive top-down profiling with explicit scopes is more useful than a bottom-up sampling profiler.</p>
<p>But sampling profilers still have their uses. They are good at finding hotspot functions that are called from many different places and thus don’t necessarily show up in a top-down profiler. Such hotspots can be a target for low-level, instruction-by-instruction optimizations. Or they can be an indication that you are doing something bad.</p>
<p>For example if strcmp() is showing up as a hotspot, then your program is being very very naughty and should be sent straight to bed without any cocoa.</p>
<p>A hotspot that often shows up in our code is luaV_execute(). This is not surprising. That is the main Lua VM function, a big switch statement that executes most of Lua’s opcodes. But it does tell us that some low level, platform specific optimizations of that function might actually result in real measurable performance benefits.</p>
<p><strong>§6 Beware of synthetic benchmarks</strong></p>
<p>I don’t do much synthetic benchmarking, i.e., looping the code 10 000 times over some made-up piece of data and measuring the execution time.</p>
<p>If I’m at a point where I don’t know whether a change will make the code faster or not, then I want to verify that with data from an actual game. Otherwise, how can I be sure that I’m not just optimizing the benchmark in ways that won’t carry over to real world cases?</p>
<p>A benchmark with 500 instances of the same entity, all playing the same animation is quite different from the same scene with 50 different unit types, all playing different animations. The data access patterns are completely different. Optimizations that improve one case may not matter at all in the other.</p>
<p><strong>§7 Optimization is gardening</strong></p>
<p>Programmers optimize the engine. Artists put in more stuff. It has always been thus. And it is good.</p>
<p>Optimization is not an isolated activity that happens at a specific time. It is a part of the whole life cycle: design, maintenance and evolution. Optimization is an ongoing dialog between artists and programmers about what the capabilities of the engine should be.</p>
<p>Managing performance is like tending a garden, checking that everything is ok, rooting out the weeds and finding ways for the plants to grow better.</p>
<p>It is the job of the artists to push the engine to its knees. And it is the programmers’ job to bring it back up again, only stronger. In the process, a middle ground will be found where the games can shine as bright as possible.</p>
<p>(Also posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/08/a-pragmatic-approach-to-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Code Share: Source Censoring</title>
		<link>http://www.altdevblogaday.com/2011/11/22/code-share-source-censoring/</link>
		<comments>http://www.altdevblogaday.com/2011/11/22/code-share-source-censoring/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 21:19:29 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[ifdef]]></category>
		<category><![CDATA[nda]]></category>
		<category><![CDATA[preprocessor]]></category>
		<category><![CDATA[stripping]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=20498</guid>
		<description><![CDATA[<p>A tricky problem when distributing source code is how to deal with code that is under various forms of NDAs.</p>
<p>For example, in the BitSquid engine we have a fair amount of code that is PS3 specific, but some of our licensees are not registered PS3 developers and are thus not allowed to see that code. Before we can send them a new version of the engine we need to strip out all the PS3 specific stuff.</p>
<p><a href="http://www.altdevblogaday.com/2011/11/22/code-share-source-censoring/" class="more-link">Read more on Code Share: Source Censoring&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>A tricky problem when distributing source code is how to deal with code that is under various forms of NDAs.</p>
<p>For example, in the BitSquid engine we have a fair amount of code that is PS3 specific, but some of our licensees are not registered PS3 developers and are thus not allowed to see that code. Before we can send them a new version of the engine we need to strip out all the PS3 specific stuff.</p>
<p>That includes not only the code that runs on the PS3 PPU, but also the SPU code, parts of the Win32 data compiler that cross-compiles data for the PS3 (using libraries from the PS3 SDK) and parts of the documentation.</p>
<p>Since we need to do that <em>every time</em> we distribute a new engine version, doing it manually is not really an option. We need some kind of automated method.</p>
<p>One approach would be to put all the platform specific code in separate subdirectories. You could then easily write a script that excluded all those directories from the code base.</p>
<p>The drawback of this method is that it can become <em>really</em> messy. If you target a large number of platforms (Windows, PS3, X360, MacOS, iOS, Android, more?) you get a huge number of files and directories to keep track of. And sometimes the implementations for different platforms are almost the same. Perhaps you just need one extra variable or to change a type from <em>int</em> to <em>int64_t</em>. Having to create a new subdirectory, an extra file and some kind of abstract platform independent interface just for that is total overkill. And it also makes the code harder to read and harder to maintain.</p>
<p>I don’t want to be forced to organize my code to comply with NDA requirements. If it makes sense to put the platform specific stuff in a separate file, I want to do that. But if it is something that is better handled with a couple of lines in an <em>#ifdef</em>, I want to be able to do that instead.</p>
<p>So, to that end, I&#8217;ve decided to write a code stripping tool that not only can strip specific files and directories, but <em>also</em> individual code lines inside an <em>#ifdef</em>. What that means is that it will take code that looks like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span>&quot;Hello &quot;<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #339900;">#ifdef PS3</span>
	<span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span>&quot;PS3 World<span style="color: #000040;">!</span>\n&quot;<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #339900;">#else</span>
	<span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span>&quot;Some Other World<span style="color: #000040;">!</span>\n&quot;<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #339900;">#endif</span></pre></td></tr></table></div>

<p>And convert it to:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span>&quot;Hello &quot;<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #339900;">#ifdef STRIPPED_PS3</span>
<span style="color: #666666;">//	........... ...........</span>
<span style="color: #339900;">#else</span>
	<span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span>&quot;Some Other World<span style="color: #000040;">!</span>\n&quot;<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #339900;">#endif</span></pre></td></tr></table></div>

<p>As you can see the secret PS3 code has been ”censored”, while the non-secret code has been left intact. The tool has also changed the name of the tag to show to the reader that content has been stripped out.</p>
<p>The reason why the tool ”blanks out” the secret lines instead of just removing them is to preserve line numbers between the stripped and non-stripped versions of the code. So if someone reports bugs or sends us patches for the stripped code, we can immediately apply them to the non-stripped code.</p>
<p>Since having tons of completely blank lines can be a bit confusing when scrolling through the code, we replace the characters with dots to preserve the ”visual impression” of the source. (Cryptographically inclined readers will note that this leaks some information about the original code, but, I would postulate, not enough to be of any practical consequence.)</p>
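<p>The blanking step itself is simple. As a rough C++ illustration (the actual tool lives in the repository linked at the end of this post and may differ in its details), a hypothetical <em>censor_line()</em> function could replace every non-whitespace character with a dot and put a comment marker in front of the line, which reproduces the output shown above:</p>

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Censors a single source line: whitespace is kept so the "shape" of the
// code survives, everything else becomes a dot, and the result is prefixed
// with "//" so that the stripped file still compiles.
inline std::string censor_line(const std::string &line)
{
	assert(line.find('\n') == std::string::npos); // one line at a time
	std::string out = "//";
	for (char c : line)
		out += std::isspace(static_cast<unsigned char>(c)) ? c : '.';
	return out;
}
```

<p>Since every input line produces exactly one output line, the line numbers of the stripped and non-stripped files stay in sync.</p>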
<p>The tool tries to be reasonably clever when evaluating the macros, so if you have written something like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#if defined(PS3) || defined(X360)</span></pre></td></tr></table></div>

<p>the code will get stripped if you have told it to strip both the PS3 and X360 tags, but not if you just strip one of them.</p>
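<p>The rule for <em>||</em> can be sketched in C++ as a small recursive check over the parsed condition. The <em>Condition</em> type and <em>guards_only_secret_code()</em> below are hypothetical names for this example. The <em>&amp;&amp;</em> case is my assumption about what a reasonable rule would be, it is not described in this post, and the sketch ignores negation and <em>#else</em> branches entirely.</p>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A parsed preprocessor condition: either defined(TAG) or a combination
// of sub-conditions joined by || or &&.
struct Condition
{
	enum Kind { DEFINED, OR, AND } kind;
	std::string tag;                 // used when kind == DEFINED
	std::vector<Condition> operands; // used when kind is OR or AND
};

// Returns true if the guarded code can only be compiled when at least one
// stripped tag is defined, which means it must be censored.
inline bool guards_only_secret_code(const Condition &c, const std::set<std::string> &stripped)
{
	switch (c.kind) {
	case Condition::DEFINED:
		return stripped.count(c.tag) > 0;
	case Condition::OR:
		// The code is visible to a public platform unless every
		// alternative is secret.
		assert(!c.operands.empty());
		for (const Condition &operand : c.operands)
			if (!guards_only_secret_code(operand, stripped))
				return false;
		return true;
	case Condition::AND:
		// Assumption: a single secret operand is enough to hide the code.
		assert(!c.operands.empty());
		for (const Condition &operand : c.operands)
			if (guards_only_secret_code(operand, stripped))
				return true;
		return false;
	}
	return false;
}
```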
<p>However, the tool doesn’t attempt to be a ”real” preprocessor with all that entails. It only strips expressions that explicitly involve the tags you have told it to strip. I.e., if you give it something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#ifdef PS3</span>
	<span style="color: #339900;">#define MY_SPECIAL_DEFINE</span>
<span style="color: #339900;">#endif</span>
<span style="color: #339900;">#ifdef MY_SPECIAL_DEFINE</span>
	<span style="color: #666666;">// What happens here</span>
<span style="color: #339900;">#endif</span></pre></td></tr></table></div>

<p>You will get:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#ifdef STRIPPED_PS3</span>
<span style="color: #666666;">//	....... .................</span>
<span style="color: #339900;">#endif</span>
<span style="color: #339900;">#ifdef MY_SPECIAL_DEFINE</span>
	<span style="color: #666666;">// What happens here</span>
<span style="color: #339900;">#endif</span></pre></td></tr></table></div>

<p>The code comes with an extensive set of unit tests that show how it works in different situations. I’ve pushed it to <a href="https://bitbucket.org/bitsquid/code_censor">our bitbucket repository</a>.</p>
<p>Feel free to use it for whatever purpose you see fit, perhaps as a way to make it easier for you to share your own code. If you find any problems with it, please report them to me, together with a unit test that exposes the issue.</p>
<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/11/22/code-share-source-censoring/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Example in Data-Oriented Design: Sound Parameters</title>
		<link>http://www.altdevblogaday.com/2011/11/07/an-example-in-data-oriented-design-sound-parameters/</link>
		<comments>http://www.altdevblogaday.com/2011/11/07/an-example-in-data-oriented-design-sound-parameters/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 22:30:24 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[data oriented programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=19902</guid>
		<description><![CDATA[<p>The BitSquid sound system allows arbitrary parameters to be set on playing sounds:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">force <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">35.3</span>
material <span style="color: #66cc66;">=</span> <span style="color: #ff6666;">&#34;wood&#34;</span>
weapon <span style="color: #66cc66;">=</span> <span style="color: #ff6666;">&#34;axe&#34;</span></pre></td></tr></table></div>

<p>In the sound editor the sound designer can set up curves and switches that depend on these parameters. So, for example, the designer can choose to play different wav files for a weapon impact, depending on the weapon that was used and the material it hit. In addition, the volume and pitch of the sound can be controlled by a curve connected to the force of the impact.</p>
<p><a href="http://www.altdevblogaday.com/2011/11/07/an-example-in-data-oriented-design-sound-parameters/" class="more-link">Read more on An Example in Data-Oriented Design: Sound Parameters&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>The BitSquid sound system allows arbitrary parameters to be set on playing sounds:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">force <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">35.3</span>
material <span style="color: #66cc66;">=</span> <span style="color: #ff6666;">&quot;wood&quot;</span>
weapon <span style="color: #66cc66;">=</span> <span style="color: #ff6666;">&quot;axe&quot;</span></pre></td></tr></table></div>

<p>In the sound editor the sound designer can set up curves and switches that depend on these parameters. So, for example, the designer can choose to play different wav files for a weapon impact, depending on the weapon that was used and the material it hit. In addition, the volume and pitch of the sound can be controlled by a curve connected to the force of the impact.</p>
<p>To implement this behavior, we need a way of representing such parameter sets in the engine. Since there can potentially be lots of playing sounds, we need a representation that is as efficient as possible.</p>
<p>If you did a by-the-book C++ design of this problem, you might end up with an abomination like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> ParameterValue
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">enum</span> Type <span style="color: #008000;">&#123;</span>STRING_TYPE, NUMERIC_TYPE<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	Type type<span style="color: #008080;">;</span>
	std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> string_value<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">float</span> numeric_value<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">typedef</span> std<span style="color: #008080;">::</span><span style="color: #007788;">map</span><span style="color: #000080;">&lt;</span>std<span style="color: #008080;">::</span><span style="color: #007788;">string</span>, ParameterValue<span style="color: #000080;">&gt;</span> Parameters<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> SoundInstance
<span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">// Other members...</span>
	Parameters <span style="color: #000040;">*</span>parameters<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
std<span style="color: #008080;">::</span><span style="color: #007788;">vector</span><span style="color: #000080;">&lt;</span>SoundInstance<span style="color: #000080;">&gt;</span> playing_sounds<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>which would result in tons of pointer chasing, memory allocation and data copying.</p>
<p>So let’s fix it!</p>
<p>First, let’s get rid of the strings. Strings should be used almost exclusively for text that is <em>displayed to the end user</em>. For everything else, they are usually a bad idea. In this case, since the only thing we need to do is match strings that are equal (find the parameter named ”material”, check if its value is ”wood”, etc) we can use a hash instead of the full string value:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> ParameterValue
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">enum</span> Type <span style="color: #008000;">&#123;</span>STRING_TYPE, NUMERIC_TYPE<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	Type type<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">union</span> <span style="color: #008000;">&#123;</span>
		IdString32 string_value<span style="color: #008080;">;</span>
		<span style="color: #0000ff;">float</span> numeric_value<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">typedef</span> std<span style="color: #008080;">::</span><span style="color: #007788;">map</span><span style="color: #000080;">&lt;</span>IdString32, ParameterValue<span style="color: #000080;">&gt;</span> Parameters<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>IdString32 is our type for representing hashed strings. It just stores a 4-byte string hash. Since it is a POD-type, we can put it in a union together with the numeric value. This takes the ParameterValue struct down to a manageable 8 bytes with no dynamic data allocation.</p>
<p>But we can actually make it even smaller, by just getting rid of the type:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">union</span> ParameterValue <span style="color: #008000;">&#123;</span>
	IdString32 string_value<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">float</span> numeric_value<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>We can do this because when we access the parameter we know which type we want. If we are evaluating a curve, we want a numeric value. If we want to compare it to a hash, we want a string value. Getting rid of the type means we can’t <em>assert()</em> on type errors (if someone has done something silly like setting the ”material” to 3.5 or the ”force” to ”banana”). But other than that everything will work as before.</p>
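<p>To make the hashed-string idea concrete, here is a minimal sketch of what a type like IdString32 could look like. The FNV-1a hash, the <em>make_id()</em> helper and the <em>id</em> member are assumptions made for illustration; the text above only says that IdString32 stores a 4-byte string hash.</p>

```cpp
#include <cstdint>

// Hypothetical stand-in for IdString32: just a 4-byte hash of the string.
struct IdString32 {
	std::uint32_t id;
};

// Assumption: FNV-1a is used here only because it is short and well known.
inline IdString32 make_id(const char *s)
{
	std::uint32_t h = 2166136261u;           // FNV-1a 32-bit offset basis
	for (; *s; ++s) {
		h ^= static_cast<unsigned char>(*s);
		h *= 16777619u;                      // FNV-1a 32-bit prime
	}
	return IdString32{h};
}

inline bool operator==(IdString32 a, IdString32 b) { return a.id == b.id; }

// The untyped value from the text: the caller knows which member to read.
union ParameterValue {
	IdString32 string_value;
	float numeric_value;
};

static_assert(sizeof(ParameterValue) == 4, "hash and float share four bytes");

// Example query: "is the material wood?" becomes a single integer compare.
inline bool is_wood(ParameterValue material)
{
	return material.string_value == make_id("wood");
}
```

<p>In a real engine the hashes of constant strings would typically be computed once rather than on every comparison.</p>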
<p>Next, let’s attack the map:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">typedef</span> std<span style="color: #008080;">::</span><span style="color: #007788;">map</span><span style="color: #000080;">&lt;</span>IdString32, ParameterValue<span style="color: #000080;">&gt;</span> Parameters<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Just like std::string, std::map should set off all kinds of warning bells in your head. std::map is almost never a good choice. Better alternatives are: linear search in a std::vector (for smallish maps), binary search in a sorted array (for larger, static maps) or hash_map.</p>
<p>In this case, we don’t expect there to be that many parameters set on a sound (&lt;10 in the typical case), so linear search is fine:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> Parameter <span style="color: #008000;">&#123;</span>
	IdString32 key<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">union</span> <span style="color: #008000;">&#123;</span>
		IdString32 string_value<span style="color: #008080;">;</span>
		<span style="color: #0000ff;">float</span> numeric_value<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">typedef</span> std<span style="color: #008080;">::</span><span style="color: #007788;">vector</span><span style="color: #000080;">&lt;</span>Parameter<span style="color: #000080;">&gt;</span> Parameters<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> SoundInstance
<span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">// Other members...</span>
	Parameters <span style="color: #000040;">*</span>parameters<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
std<span style="color: #008080;">::</span><span style="color: #007788;">vector</span><span style="color: #000080;">&lt;</span>SoundInstance<span style="color: #000080;">&gt;</span> _playing_sounds<span style="color: #008080;">;</span></pre></td></tr></table></div>
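<p>The lookup itself could then be sketched roughly like this. The <em>find_parameter()</em> name is made up for the example, and IdString32 is replaced by a plain 32-bit integer so that the snippet stands on its own.</p>

```cpp
#include <cstdint>
#include <vector>

typedef std::uint32_t IdString32;   // assumption: stand-in for the 4-byte hash type

struct Parameter {
	IdString32 key;
	union {
		IdString32 string_value;
		float numeric_value;
	};
};

typedef std::vector<Parameter> Parameters;

// Linear search: returns the parameter with the given key,
// or nullptr if no such parameter has been set.
inline const Parameter *find_parameter(const Parameters &params, IdString32 key)
{
	for (const Parameter &p : params)
		if (p.key == key)
			return &p;
	return nullptr;
}
```

<p>With fewer than ten 8-byte entries, the whole search touches one or two cache lines.</p>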

<p>A lot better than what we started with. But I’m still not 100 % satisfied.</p>
<p>I don’t like the fact that we have a vector of sound instances, and each of those contains a vector of parameters. Vectors-in-vectors raise performance warning flags for me. I like it when my data structures are just arrays of POD structs. Then I know that they are cache friendly and don’t put much strain on the memory system. 512 parameter vectors allocated on the heap for 512 playing sounds make me uneasy.</p>
<p>So what can we do? We could go to a fixed number of parameters:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> SoundInstance
<span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">// Other members...</span>
	<span style="color: #0000ff;">unsigned</span> num_parameters<span style="color: #008080;">;</span>
	Parameter parameters<span style="color: #008000;">&#91;</span>MAX_INSTANCE_PARAMETERS<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Now the SoundInstance is a POD and all the data is just one big happy blob.</p>
<p>The drawback of this approach is that you might need to set <em>MAX_INSTANCE_PARAMETERS</em> pretty high to be able to handle the most complicated sounds. This would waste some memory for all the sounds that use just one or two parameters.</p>
<p>Say you have 512 sounds and MAX_INSTANCE_PARAMETERS = 32. With 8 bytes in the Parameter struct, that totals 512 × 32 × 8 = 131,072 bytes, or 131 K. Not terrible, but not a tuppence either.</p>
<p>There should be some way of doing better. But if we can’t use a dynamic vector, nor a static array, what can we then possibly use?</p>
<p>A linked list!</p>
<p>Regular linked lists have horrible cache behavior and are best stayed away from. But we can achieve the benefits of linked lists while still having decent cache performance by putting the list in an array:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> ParameterNode <span style="color: #008000;">&#123;</span>
	IdString32 key<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">union</span> <span style="color: #008000;">&#123;</span>
		IdString32 string_value<span style="color: #008080;">;</span>
		<span style="color: #0000ff;">float</span> numeric_value<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	ParameterNode <span style="color: #000040;">*</span>next<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
ParameterNode nodes<span style="color: #008000;">&#91;</span>MAX_PARAMETERS<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> SoundInstance
<span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">// Other members...</span>
	ParameterNode <span style="color: #000040;">*</span>parameters<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
std<span style="color: #008080;">::</span><span style="color: #007788;">vector</span><span style="color: #000080;">&lt;</span>SoundInstance<span style="color: #000080;">&gt;</span> playing_sounds<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Now we have all the parameters stored in a single memory blob. And instead of having a maximum number of parameters per sound, we have a total limit on the number of set parameters (which works much better when most sounds have few parameters). We could get rid of that limit as well if we needed to, by using a vector instead of an array to store the nodes and indices instead of pointers for the ”links”.</p>
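<p>The vector-plus-indices variant mentioned above might look something like the following sketch. <em>NO_NODE</em>, <em>set_numeric()</em> and <em>find_parameter()</em> are hypothetical names, IdString32 is again a plain integer, and for brevity the sketch always appends new nodes instead of reusing freed ones.</p>

```cpp
#include <cstdint>
#include <vector>

typedef std::uint32_t IdString32;            // assumption: 4-byte hash stand-in
const std::uint32_t NO_NODE = 0xffffffffu;   // hypothetical "end of list" marker

struct ParameterNode {
	IdString32 key;
	union {
		IdString32 string_value;
		float numeric_value;
	};
	std::uint32_t next;   // index into the node vector instead of a pointer
};

struct SoundInstance {
	// Other members...
	std::uint32_t parameters = NO_NODE;      // index of the first node
};

// Pushes a numeric parameter onto the front of a sound's list.
// Indices stay valid when the vector reallocates; pointers would not.
inline void set_numeric(std::vector<ParameterNode> &nodes, SoundInstance &sound,
	IdString32 key, float value)
{
	ParameterNode n;
	n.key = key;
	n.numeric_value = value;
	n.next = sound.parameters;
	sound.parameters = static_cast<std::uint32_t>(nodes.size());
	nodes.push_back(n);
}

// Walks one sound's list and returns the node with the given key, or nullptr.
inline const ParameterNode *find_parameter(const std::vector<ParameterNode> &nodes,
	const SoundInstance &sound, IdString32 key)
{
	for (std::uint32_t i = sound.parameters; i != NO_NODE; i = nodes[i].next)
		if (nodes[i].key == key)
			return &nodes[i];
	return nullptr;
}
```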
<p>You can use many different strategies for allocating nodes from the array. My favorite method is to walk over the array until the next free node is found:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">unsigned</span> last_allocated <span style="color: #000080;">=</span> MAX_PARAMETERS<span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
&nbsp;
ParameterNode <span style="color: #000040;">*</span>allocate_node<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">true</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		last_allocated <span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span>last_allocated <span style="color: #000040;">+</span> <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span> <span style="color: #000040;">%</span> MAX_PARAMETERS<span style="color: #008080;">;</span>
		<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>nodes<span style="color: #008000;">&#91;</span>last_allocated<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">key</span> <span style="color: #000080;">==</span> <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span>
			<span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
	<span style="color: #0000ff;">return</span> <span style="color: #000040;">&amp;</span>nodes<span style="color: #008000;">&#91;</span>last_allocated<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>Here, an empty key is used to indicate free nodes.</p>
<p>The advantage of this method is that nodes that are allocated at the same time end up in adjacent array slots. This means that all the parameters of a particular sound (which tend to get set at the same time) get stored next to each other in memory, which means they can be accessed without cache misses.</p>
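<p>Note that the loop above assumes that a free node always exists; with a full array it would spin forever. A slightly more defensive sketch, which gives up after one lap and adds a matching <em>free_node()</em>, could look like this. The tiny MAX_PARAMETERS value, the integer IdString32 and <em>free_node()</em> itself are assumptions for the example.</p>

```cpp
#include <cstdint>

typedef std::uint32_t IdString32;       // assumption: 0 means "empty key"
const unsigned MAX_PARAMETERS = 8;      // deliberately small, for illustration

struct ParameterNode {
	IdString32 key;
	union {
		IdString32 string_value;
		float numeric_value;
	};
	ParameterNode *next;
};

ParameterNode nodes[MAX_PARAMETERS];    // static storage: all keys start out as 0
unsigned last_allocated = MAX_PARAMETERS - 1;

// Returns the next free node after the last allocation, or nullptr if
// every node is in use. The caller must set a non-zero key on the node,
// since a zero key is what marks a node as free.
ParameterNode *allocate_node()
{
	for (unsigned tries = 0; tries < MAX_PARAMETERS; ++tries) {
		last_allocated = (last_allocated + 1) % MAX_PARAMETERS;
		if (nodes[last_allocated].key == 0)
			return &nodes[last_allocated];
	}
	return nullptr;
}

// Marks a node as free again by clearing its key.
void free_node(ParameterNode *n)
{
	n->key = 0;
	n->next = nullptr;
}
```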
<p><em>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/11/07/an-example-in-data-oriented-design-sound-parameters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Low-Level Animation &#8212; Part 2</title>
		<link>http://www.altdevblogaday.com/2011/10/23/low-level-animation-part-2/</link>
		<comments>http://www.altdevblogaday.com/2011/10/23/low-level-animation-part-2/#comments</comments>
		<pubDate>Sun, 23 Oct 2011 18:45:25 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=19137</guid>
		<description><![CDATA[<p>Some time ago I wrote an <a href="http://bitsquid.blogspot.com/2009/11/bitsquid-low-level-animation-system.html">article</a> describing how animation compression is implemented in the BitSquid engine. In that article I made a vague promise that I would follow up with a description of how to pack the data in a cache-friendly way. Now, the time has come to deliver on that vague promise.</p>
<p><a href="http://www.altdevblogaday.com/2011/10/23/low-level-animation-part-2/" class="more-link">Read more on Low-Level Animation &#8212; Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Some time ago I wrote an <a href="http://bitsquid.blogspot.com/2009/11/bitsquid-low-level-animation-system.html">article</a> describing how animation compression is implemented in the BitSquid engine. In that article I made a vague promise that I would follow up with a description of how to pack the data in a cache-friendly way. Now, the time has come to deliver on that vague promise.</p>
<p>A quick recap: After curve fitting, each track of our animation consists of a number of curve points that describe the curve for each animation track:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_1.png" alt="" width="600" class="alignnone size-full wp-image-19138" /></a></p>
<p>By an <em>animation track</em> I mean the animation of a single parameter, typically the position or rotation of a bone.</p>
<p>The data for the track is a sequence of times and curve data:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_2.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_2.png" alt="" width="600" class="alignnone size-full wp-image-19139" /></a></p>
<p>Here t_i is the time of a curve point and A_i is the corresponding curve data.</p>
<p>To evaluate the curve at any particular point t we need the curve points both before and after the time t:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_3.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_3.png" alt="" width="600" class="alignnone size-full wp-image-19140" /></a></p>
<p>Depending on what curve type you use (hermite, bezier, b-spline, etc) you might actually need more than two curve points to evaluate a segment, but that doesn’t really affect the discussion in this article, so for the sake of simplicity, let’s stick with two.</p>
<p>Note that the time points for the different tracks in the animation typically do not match up. For example, one curve may be completely flat and only require one sample at the start and one sample at the end. Another curve may be complicated and require lots of samples.</p>
<p>To simplify the discussion further, assume that the animation only contains two tracks (it is easy to generalize the solution to more tracks). We will call the curve points of one (t_i,  A_i) and the curve points of the other (s_i, B_i):</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_4.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_4.png" alt="" width="600" class="alignnone size-full wp-image-19141" /></a></p>
<p>How can we organize this data to be as cache friendly as possible?</p>
<p>The most natural approach is perhaps to sort the data first by track and then by time. Let’s see what this means for the cache. To evaluate the animation for some particular time t, we have to go into the data for each track at that time to look up the two neighboring curve points. Even if we assume that we have somehow cached our current position in each track, so that we don’t have to search for it, we will still have at least one cache miss for each track. A modern character can have over 100 bones, with two tracks per bone. That’s 200 cache misses for just a single frame of a single animation.</p>
<p>To do better, we need to organize the data by time somehow. But it is not immediately clear how. Just sorting the data by time won’t help, because then a flat curve with just two curve points, one at the beginning and one at the end, will have them at complete opposite ends of the data and no matter what we do we will get cache misses when touching them.</p>
<p>Let’s consider all the data we need to evaluate the tracks at time t. We need (t_i, A_i), (t_i+1, A_i+1) and (s_j, B_j), (s_j+1, B_j+1) where t_i &lt;= t &lt;= t_i+1 and s_j &lt;= t &lt;= s_j+1. This is our ”hot” data, because we will need to refer to it several times as we evaluate the curve at different points in time. In fact, we can keep using this same data until we reach whichever is smaller of t_i+1 and s_j+1.</p>
<p>A general rule in memory access optimization is to keep the ”hot” data together, so let’s create an additional data structure, an array with the currently active curve points for a playing animation instance.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_5.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_5.png" alt="" width="361" height="259" class="alignnone size-full wp-image-19142" /></a></p>
<p>Now we’re getting somewhere. Not only have we significantly improved the cache behavior; as long as we don’t need to fetch new curve points we only need to refer to the active array, a single memory access. We have also decomposed our animation evaluation problem into two simpler tasks: evaluating curves and fetching new curve points. This makes our code both simpler and more flexible.</p>
<p>Let’s look at the second issue, fetching new curve points. In the example above, when we reach the time t_i+1 we will need to fetch the new curve point (t_i+2, A_i+2) and when we reach the time s_j+1 we will need to fetch (s_j+2, B_j+2).</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_6.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_6.png" alt="" width="600" class="alignnone size-full wp-image-19143" /></a></p>
<p>Generalizing, we always need to fetch the point (t_i, A_i) at the time t_i-1, and we always need to fetch the point (s_i, B_i) at the time s_i-1. This is excellent, because since we know the time when each of our curve points will be needed, we can put them all in a single stream of data which is sorted by the time when they will be needed.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_7.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_7.png" alt="" width="600" class="alignnone size-full wp-image-19144" /></a></p>
<p>This means that our animation player only needs to keep a single pointer into the animation stream. That pointer will always point to the next curve point that needs to be moved to the <em>active</em> list. As time is advanced, curve points are copied from the animation data into the <em>active</em> list and then the curve is evaluated.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_8.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/animation_8.png" alt="" width="600" class="alignnone size-full wp-image-19145" /></a></p>
<p>Note the excellent cache behavior this gives us. To fetch new curve points, we just move a pointer forward in memory. And then, to evaluate the curves, we just need to access our <em>active</em> array, a single contiguous memory block. This gives us a grand total of just two memory accesses.</p>
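<p>As a rough sketch of such a player: the stream below stores, for each curve point, the track it belongs to and the time at which it is needed, and the <em>active</em> array keeps one segment per track. All type and function names are invented for the example, each track is a single float, and linear interpolation stands in for whatever curve type is actually used.</p>

```cpp
#include <cstddef>
#include <vector>

struct CurvePoint { float time; float value; };

// One stream entry: a curve point, the track it belongs to, and the time
// at which the player needs it. The stream is sorted by needed_at.
struct StreamEntry {
	float needed_at;
	unsigned track;
	CurvePoint point;
};

// The "hot" data for one track: the curve points before and after now.
struct ActiveSegment { CurvePoint before, after; };

struct AnimationPlayer {
	const std::vector<StreamEntry> *stream;
	std::size_t cursor;                  // next stream entry to fetch
	std::vector<ActiveSegment> active;   // one segment per track
};

inline AnimationPlayer make_player(const std::vector<StreamEntry> &stream, unsigned num_tracks)
{
	AnimationPlayer p;
	p.stream = &stream;
	p.cursor = 0;
	p.active.assign(num_tracks, ActiveSegment{{0.0f, 0.0f}, {0.0f, 0.0f}});
	return p;
}

// Copies every curve point that is needed at time t into the active array.
inline void advance_time(AnimationPlayer &p, float t)
{
	const std::vector<StreamEntry> &s = *p.stream;
	while (p.cursor < s.size() && s[p.cursor].needed_at <= t) {
		ActiveSegment &seg = p.active[s[p.cursor].track];
		seg.before = seg.after;
		seg.after = s[p.cursor].point;
		++p.cursor;
	}
}

// Evaluates one track at time t using only the active array.
inline float evaluate(const AnimationPlayer &p, unsigned track, float t)
{
	const ActiveSegment &seg = p.active[track];
	float span = seg.after.time - seg.before.time;
	if (span <= 0.0f)
		return seg.after.value;
	float u = (t - seg.before.time) / span;
	return seg.before.value + u * (seg.after.value - seg.before.value);
}
```

<p>In this sketch the first two points of every track are given a needed time of zero, so that they are loaded on the first call to <em>advance_time()</em>.</p>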
<p>Another nice property is that since we are now accessing the animation data as a stream (strictly linearly, from beginning to end) we can gzip it and get another factor two of compression. We can also easily stream it from disk.</p>
<p>One drawback of this system is that it only supports playing an animation forward. You cannot jump to a particular time in an animation without ”fast forwarding” through all intermediate curve points.</p>
<p>If you need support for jumping, the easiest way to achieve it is perhaps to add a separate index with <em>jump frames</em>. A <em>jump frame</em> consists of the state of the <em>active</em> array at some point in time, together with an offset into the data stream. In other words, all the state information that the animation player needs to jump to that time point and resume playing.</p>
<p>Using jump frames lets you balance performance and memory use. If you add more jump frames you will use more memory but, on the other hand, you will be able to find a jump frame closer to the time you <em>actually</em> want to go to, which means less fast forwarding.</p>
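<p>A jump-frame index could be sketched as follows. The <em>JumpFrame</em> layout and <em>find_jump_frame()</em> are assumptions for illustration: the index is taken to be sorted by time, and a binary search picks the last frame at or before the target time. The player would then restore that state and fast forward from there.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct CurvePoint { float time; float value; };
struct ActiveSegment { CurvePoint before, after; };

// Hypothetical layout: everything the player needs to resume at `time`.
struct JumpFrame {
	float time;
	std::size_t stream_offset;           // where to resume reading the stream
	std::vector<ActiveSegment> active;   // state of the active array at `time`
};

// Returns the last jump frame at or before `target`, or nullptr if there
// is none. `frames` must be sorted by time.
inline const JumpFrame *find_jump_frame(const std::vector<JumpFrame> &frames, float target)
{
	auto it = std::upper_bound(frames.begin(), frames.end(), target,
		[](float t, const JumpFrame &f) { return t < f.time; });
	if (it == frames.begin())
		return nullptr;
	return &*(it - 1);
}
```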
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/10/23/low-level-animation-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Caring by Sharing: Header Hero</title>
		<link>http://www.altdevblogaday.com/2011/10/08/caring-by-sharing-header-hero/</link>
		<comments>http://www.altdevblogaday.com/2011/10/08/caring-by-sharing-header-hero/#comments</comments>
		<pubDate>Sat, 08 Oct 2011 16:21:37 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[compile time]]></category>
		<category><![CDATA[headers]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=18136</guid>
		<description><![CDATA[<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_1.png" alt="" width="600" class="alignnone size-full wp-image-18141" /></a><br />
<br />
Compile times get worse over time, that is the second law of C++ programming dynamics. There are many small day-to-day changes that each exacerbate the problem slightly: The project grows. New header files get included. Clever templates get written. And so on. There are comparatively few forces that work in the other direction. Once an <em>#include</em> has been added, it stays.</p>
<p><a href="http://www.altdevblogaday.com/2011/10/08/caring-by-sharing-header-hero/" class="more-link">Read more on Caring by Sharing: Header Hero&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_1.png" alt="" width="600" class="alignnone size-full wp-image-18141" /></a><br />
<br />
Compile times get worse over time, that is the second law of C++ programming dynamics. There are many small day-to-day changes that each exacerbate the problem slightly: The project grows. New header files get included. Clever templates get written. And so on. There are comparatively few forces that work in the other direction. Once an <em>#include</em> has been added, it stays.</p>
<p>The only exception is when some hero steps up, says <em>Enough!</em> and starts to crunch down on those header files. It is thankless menial work that offers few rewards, save the knowledge that you are contributing to the public good. </p>
<p>Today, I want to give something back to these unsung heroes, so I’ve made a small tool to make their drudgery a bit less&#8230; drudgery-ish? It is called <em>Header Hero</em>:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_2.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_2.png" alt="" width="397" height="285" class="alignnone size-full wp-image-18138" /></a></p>
<p>To run <em>Header Hero</em> you specify the directories where your <em>.cpp</em> files can be found as well as the directories to search for included headers. The program scans your <em>.h</em> and <em>.cpp</em> files to find all the include links. It presents the result in a summarized report that shows you what the worst headers are. You can think of it as a header file profiler.</p>
<p>You don’t need to specify all your include directories, but only the ones you have specified will be scanned.</p>
<p>I’ve focused on making the tool <em>fast</em> by caching as much information as possible and using a simple parser that just looks for <em>#include</em> patterns rather than running the real C preprocessor. The downside is that if you are using any fancy preprocessor tricks, they will most likely be missed. On the other hand, the tool can scan a huge project in seconds. And after the initial scan, new scans can be done in a fraction of that time.</p>
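<p>To give an idea of what ”just looking for <em>#include</em> patterns” can mean, here is a small sketch of such a line scanner. It is written in C++ for this article and is not taken from the tool, which is a C# program; <em>parse_include()</em> is a made-up name.</p>

```cpp
#include <cstddef>
#include <string>

// Returns the file named by an #include line, or an empty string if the
// line is not an include. No preprocessing is done: macros, conditional
// compilation and comments are all ignored.
inline std::string parse_include(const std::string &line)
{
	std::size_t i = line.find_first_not_of(" \t");
	if (i == std::string::npos || line[i] != '#')
		return "";
	i = line.find_first_not_of(" \t", i + 1);
	if (i == std::string::npos || line.compare(i, 7, "include") != 0)
		return "";
	std::size_t open = line.find_first_of("\"<", i + 7);
	if (open == std::string::npos)
		return "";
	char close_char = line[open] == '<' ? '>' : '"';
	std::size_t close = line.find(close_char, open + 1);
	if (close == std::string::npos)
		return "";
	return line.substr(open + 1, close - open - 1);
}
```

<p>A scanner like this will happily report includes inside disabled <em>#if</em> blocks, which is the kind of inaccuracy you accept in exchange for speed.</p>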
<p>The program produces a report that looks something like this:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_3.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_3.png" alt="" width="593" height="473" class="alignnone size-full wp-image-18139" /></a></p>
<p>At the top are some statistics, such as the total number of files and lines in the project. <em>Total Parsed</em> counts how many lines would actually be parsed in a full recompile of the project. So, a header that is included by several <em>.cpp</em> files adds to that number every time. The <em>Blowup Factor</em> is the quotient of the last two items. It specifies how many times, on average, each line gets parsed. A value of 35 means that on average, each line in our project is parsed 35 times. That seems quite a lot.</p>
<p>Below the summary is a list of the header files sorted by how many lines they contributed to the <em>Total Parsed</em> number. In other words, the size of that file multiplied by the number of times it was included.</p>
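<p>The arithmetic behind these numbers can be sketched like this. The <em>SourceFile</em> struct and both functions are hypothetical, and the sketch assumes that each file is parsed at most once per <em>.cpp</em> file, as it would be with include guards; the tool itself may count differently.</p>

```cpp
#include <cstddef>
#include <vector>

struct SourceFile {
	unsigned lines;
	bool is_cpp;                         // true for .cpp files, false for headers
	std::vector<std::size_t> includes;   // indices of directly included files
};

// Adds up the lines of `file` and everything it includes, directly or
// indirectly, counting each file once per translation unit.
inline unsigned long parsed_lines(const std::vector<SourceFile> &files,
	std::size_t file, std::vector<bool> &visited)
{
	if (visited[file])
		return 0;
	visited[file] = true;
	unsigned long total = files[file].lines;
	for (std::size_t inc : files[file].includes)
		total += parsed_lines(files, inc, visited);
	return total;
}

// Total Parsed divided by the total number of lines in the project.
inline double blowup_factor(const std::vector<SourceFile> &files)
{
	unsigned long total_lines = 0, total_parsed = 0;
	for (std::size_t i = 0; i < files.size(); ++i) {
		total_lines += files[i].lines;
		if (files[i].is_cpp) {
			std::vector<bool> visited(files.size(), false);
			total_parsed += parsed_lines(files, i, visited);
		}
	}
	return total_lines ? double(total_parsed) / double(total_lines) : 0.0;
}
```

<p>For example, a 100-line header included by two 50-line <em>.cpp</em> files gives 300 parsed lines out of 200, a blowup factor of 1.5.</p>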
<p>Looking at the sample report above, it seems pretty reasonable. At the top we find big templated collection classes (<em>map, set, string, vector</em>) that have big header files and are used in a lot of places. Math (<em>matrix4x4, vector3</em>) and utility (<em>critical_section, file_system</em>) files also end up high on the list.</p>
<p>But when you dig into it, there are also things that seem a bit fishy. <em>Set&lt;T&gt;</em> is not a very popular collection class. Sets are used less than maps, and <em>HashSet</em> is usually preferable to <em>Set</em>. Why does it end up so high on the list? What is <em>shader.h</em> doing there? That seems too specialized to end up so high. And <em>file_system.h</em>? There shouldn’t be that much code that directly accesses the file system, only the resource loader needs to do that.</p>
<p>To answer those questions, you can click on any file in the report to get a detailed view of its relations:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_4.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_4.png" alt="" width="593" height="473" class="alignnone size-full wp-image-18140" /></a></p>
<p>In the middle we find the file we are looking at. To the left are the files that directly include it. The number after each file name specifies how many files directly or indirectly include <em>that</em> file. To the right are the files included by the file. The numbers are all the files directly or indirectly included by <em>those</em> files. You can double click on any file name in the view to refocus on it.</p>
<p>Here we clearly see that the main culprit is <em>data_compiler.h</em>. It includes <em>set.h</em> and is in turn included by 316 other files. To fix the compile times we can make <em>data_compiler.h</em> not include <em>set.h</em> or we can try to reduce the number of files that include <em>data_compiler.h</em> (that number also seems high). If we also fix <em>scene_graph.h</em> we can really make a difference.</p>
<p>Breaking dependencies is a whole topic in itself, especially when it comes to templates and inlined code. Here are some quick tips though:</p>
<p>1) Predeclare the structs and classes that you use instead of including the header file. Don’t forget that you can predeclare templates and typedefs as well as regular classes:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> MyClass<span style="color: #008080;">;</span>
<span style="color: #0000ff;">typedef</span> <span style="color: #0000ff;">int</span> Id<span style="color: #008080;">;</span>
<span style="color: #0000ff;">template</span> <span style="color: #000080;">&lt;</span><span style="color: #0000ff;">class</span> T<span style="color: #000080;">&gt;</span> <span style="color: #0000ff;">class</span> Vector<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>2) Predeclared types can only be used as pointers and references. You can’t have a member variable of a type whose actual size is unknown. So you may have to change your member variables to pointers in order to get rid of the header dependency. You can also use the <a href="http://en.wikipedia.org/wiki/Opaque_pointer">pimpl idiom</a>, if you can live with the extra indirection and lack of inlining.</p>
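<p>For reference, a bare-bones pimpl might look like the sketch below. <em>Widget</em> and its members are invented for the example, and the header and source parts are shown in one listing so that it can be compiled as is.</p>

```cpp
// --- widget.h (hypothetical): no heavy includes needed here ---
class Widget {
public:
	Widget();
	~Widget();
	Widget(const Widget &) = delete;
	Widget &operator=(const Widget &) = delete;
	int value() const;
private:
	struct Impl;    // predeclared; the definition lives in the .cpp file
	Impl *_impl;    // the extra indirection mentioned above
};

// --- widget.cpp (hypothetical): the only place that pays for <vector> ---
#include <vector>

struct Widget::Impl {
	std::vector<int> data;
};

Widget::Widget() : _impl(new Impl) { _impl->data.push_back(42); }
Widget::~Widget() { delete _impl; }
int Widget::value() const { return _impl->data.front(); }
```

<p>Files that include <em>widget.h</em> never see <em>std::vector</em>, so changing the implementation does not force them to recompile.</p>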
<p>3) Switching from in-place variables to pointers can lead to bad memory access patterns. One way of fixing that is to placement new the object directly into a raw memory buffer.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #666666;">// a.h</span>
&nbsp;
<span style="color: #0000ff;">class</span> B<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">class</span> A <span style="color: #008000;">&#123;</span>
    A<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    B <span style="color: #000040;">*</span>_b<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">int</span> SIZE_OF_B <span style="color: #000080;">=</span> <span style="color: #0000dd;">20</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">char</span> _b_storage<span style="color: #008000;">&#91;</span>SIZE_OF_B<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #666666;">// a.cpp</span>
&nbsp;
<span style="color: #339900;">#include "b.h"</span>
&nbsp;
A<span style="color: #008080;">::</span><span style="color: #007788;">A</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    XASSERT<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span>B<span style="color: #008000;">&#41;</span> <span style="color: #000080;">==</span> SIZE_OF_B<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    _b <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> <span style="color: #008000;">&#40;</span>_b_storage<span style="color: #008000;">&#41;</span> B<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>With this technique, you get the data for <em>B</em> stored inside <em>A</em>, without having to include the <em>b.h</em> header in <em>a.h</em>. But the code isn’t exactly easy to read, so you should only use this in desperate situations.</p>
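<p>If your compiler supports C++11, a variation on the same idea is to check the size at compile time and to give the buffer an explicit alignment, since a plain <em>char</em> array is not guaranteed to be suitably aligned for <em>B</em>. The sketch below is one way to do that; the contents of <em>B</em>, the constants and the <em>y()</em> accessor are made up, and the three files are shown in one listing.</p>

```cpp
#include <new>

// --- a.h: does not include b.h ---
struct B;

class A {
public:
	A();
	~A();
	A(const A &) = delete;
	A &operator=(const A &) = delete;
	int y() const;
private:
	static const int SIZE_OF_B = 16;    // assumption: enough room for B
	static const int ALIGN_OF_B = 8;    // assumption: at least alignof(B)
	B *_b;
	alignas(ALIGN_OF_B) char _b_storage[SIZE_OF_B];
};

// --- b.h (hypothetical contents) ---
struct B {
	double x;
	int y;
	B() : x(1.5), y(7) {}
};

// --- a.cpp: includes both a.h and b.h ---
A::A()
{
	// Fails to compile, instead of asserting at runtime, if B outgrows the buffer.
	static_assert(sizeof(B) <= SIZE_OF_B, "increase SIZE_OF_B");
	static_assert(alignof(B) <= ALIGN_OF_B, "increase ALIGN_OF_B");
	_b = new (_b_storage) B();
}

// Placement new needs a matching explicit destructor call.
A::~A() { _b->~B(); }

int A::y() const { return _b->y; }
```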
<p>4) For files with small type definitions, but lots of inlined methods (e.g., <em>matrix4x4.h</em>), a good strategy is to split the file, so you have just the type in one file and all the methods in the other. Header files can then include just the type definition, while <em>.cpp</em> files pull in the whole shebang.</p>
<p>Using these techniques you can get rid of the header dependencies one by one, until you are back at reasonable compile times. Since a rescan takes just a fraction of a second it is easy to see how your changes affect the compile time. Just make sure you have your integration test running, because it is easy to break build configurations when you are fiddling around with the headers.</p>
<p>Here is the result of about a day and a half of header optimization in our code base:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_5.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/10/header_hero_5.png" alt="" width="571" height="474" class="alignnone size-full wp-image-18141" /></a></p>
<p>From 6 million to 4.3 million lines, that’s not too shabby. We can now do a complete rebuild in 37 seconds on a reasonably modern machine. With this tool we can hopefully keep that number.</p>
<p>You can download the C# source code here. Feel free to do whatever you like with it:</p>
<ul>
<li><a href="https://bitbucket.org/bitsquid/header_hero">https://bitbucket.org/bitsquid/header_hero</a></li>
</ul>
<p><em>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid Blog</a>.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/10/08/caring-by-sharing-header-hero/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Managing Decoupling Part 4 &#8212; The ID Lookup Table</title>
		<link>http://www.altdevblogaday.com/2011/09/23/managing-decoupling-part-4-the-id-lookup-table/</link>
		<comments>http://www.altdevblogaday.com/2011/09/23/managing-decoupling-part-4-the-id-lookup-table/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 20:32:30 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[data oriented programming]]></category>
		<category><![CDATA[decoupling]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=17165</guid>
		<description><![CDATA[<p>Today I am going to dig deeper into an important and versatile data structure that pops up all the time in the BitSquid engine &#8212; the ID lookup table.</p>
<p>I have <a href="http://altdevblogaday.com/2011/01/26/managing-decoupling/">already talked</a> about the advantages of using IDs to refer to objects owned by other systems, but let me just quickly recap.</p>
<p><a href="http://www.altdevblogaday.com/2011/09/23/managing-decoupling-part-4-the-id-lookup-table/" class="more-link">Read more on Managing Decoupling Part 4 &#8212; The ID Lookup Table&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Today I am going to dig deeper into an important and versatile data structure that pops up all the time in the BitSquid engine &#8212; the ID lookup table.</p>
<p>I have <a href="http://altdevblogaday.com/2011/01/26/managing-decoupling/">already talked</a> about the advantages of using IDs to refer to objects owned by other systems, but let me just quickly recap.</p>
<p>IDs are better than direct pointers because we don’t get dangling references if the other system decides that the object needs to be destroyed.</p>
<p>IDs are better than <em>shared_ptr&lt;&gt;</em> and <em>weak_ptr&lt;&gt;</em> because they allow the other system to reorganize its objects in memory and delete them at will, and they don’t require thread synchronization to maintain a reference count. They are also POD (plain old data) structures, so they can be copied and moved in memory freely, passed back and forth between C++ and Lua, etc.</p>
<p>By an ID I simply mean an opaque data structure of <em>n</em> bits. It has no particular meaning to us, we just use it to refer to an object. The system provides the mechanism for looking up an object based on it. Since we seldom create more than 4 billion objects, 32 bits is usually enough for the ID, so we can just use a standard integer. If a system needs a lot of objects, we can go to 64 bits.</p>
<p>In this post I’m going to look at what data structures a system might use to do the lookup from ID to system object. There are some requirements that such data structures need to fulfill:</p>
<ul>
<li>There should be a 1-1 mapping between live objects and IDs.</li>
<li>If the system is supplied with an ID to an old object, it should be able to detect that the object is no longer alive.</li>
<li>Lookup from ID to object should be very fast (this is the most common operation).</li>
<li>Adding and removing objects should be fast.</li>
</ul>
<p>Let’s look at three different ways of implementing this data structure, with increasing degrees of sophistication.</p>
<h2>The STL Method</h2>
<p>The by-the-book object oriented approach is to allocate objects on the heap and use a <em>std::map</em> to map from ID to object.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">typedef</span> <span style="color: #0000ff;">unsigned</span> ID<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> System
<span style="color: #008000;">&#123;</span>
	ID _next_id<span style="color: #008080;">;</span>
	std<span style="color: #008080;">::</span><span style="color: #007788;">map</span><span style="color: #000080;">&lt;</span>ID, Object <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> _objects<span style="color: #008080;">;</span>
&nbsp;
	System<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>_next_id <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span><span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> <span style="color: #0000ff;">bool</span> has<span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		<span style="color: #0000ff;">return</span> _objects.<span style="color: #007788;">count</span><span style="color: #008000;">&#40;</span>id<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> Object <span style="color: #000040;">&amp;</span>lookup<span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		<span style="color: #0000ff;">return</span> <span style="color: #000040;">*</span>_objects<span style="color: #008000;">&#91;</span>id<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> ID add<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		ID id <span style="color: #000080;">=</span> _next_id<span style="color: #000040;">++</span><span style="color: #008080;">;</span>
		Object <span style="color: #000040;">*</span>o <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> Object<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		o<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>id <span style="color: #000080;">=</span> id<span style="color: #008080;">;</span>
		_objects<span style="color: #008000;">&#91;</span>id<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> o<span style="color: #008080;">;</span>
		<span style="color: #0000ff;">return</span> id<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> <span style="color: #0000ff;">void</span> <span style="color: #0000dd;">remove</span><span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		Object <span style="color: #000040;">&amp;</span>o <span style="color: #000080;">=</span> lookup<span style="color: #008000;">&#40;</span>id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		_objects.<span style="color: #007788;">erase</span><span style="color: #008000;">&#40;</span>id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		<span style="color: #0000dd;">delete</span> <span style="color: #000040;">&amp;</span>o<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that if we create more than four billion objects, the <em>_next_id</em> counter will wrap around and we risk getting two objects with the same ID.</p>
<p>Apart from that, the only problem with this solution is that it is really inefficient. All objects are allocated individually on the heap, which gives bad cache behavior and the map lookup results in tree walking which is also bad for the cache. We can switch the map to a <em>hash_map</em> for slightly better performance, but that still leaves a lot of unnecessary pointer chasing.</p>
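<p>As a rough sketch of the hash map variant, here is the same interface on top of <em>std::unordered_map</em>. This assumes a C++11 compiler. The name <em>HashSystem</em> and the one-field <em>Object</em> are placeholders for illustration. Unlike the listing above, <em>lookup()</em> returns a pointer and uses <em>find()</em>, so a stale ID yields a null pointer instead of inserting an empty entry:</p>

```cpp
#include <unordered_map>

typedef unsigned ID;

struct Object { ID id; }; // placeholder object type for this sketch

struct HashSystem
{
    ID _next_id;
    std::unordered_map<ID, Object *> _objects;

    HashSystem() : _next_id(0) {}
    ~HashSystem() { for (auto &kv : _objects) delete kv.second; }

    // The system owns raw pointers, so copying is disabled.
    HashSystem(const HashSystem &) = delete;
    HashSystem &operator=(const HashSystem &) = delete;

    bool has(ID id) const { return _objects.count(id) > 0; }

    // Returns nullptr for unknown or removed IDs; never inserts.
    Object *lookup(ID id) {
        auto it = _objects.find(id);
        return it == _objects.end() ? nullptr : it->second;
    }

    ID add() {
        ID id = _next_id++;
        Object *o = new Object();
        o->id = id;
        _objects[id] = o;
        return id;
    }

    void remove(ID id) {
        auto it = _objects.find(id);
        if (it == _objects.end())
            return;
        delete it->second;
        _objects.erase(it);
    }
};
```

<p>The objects are still allocated one by one on the heap, so this only removes the tree walk, not the pointer chasing.</p>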
<h2>Array With Holes</h2>
<p>What we really want to do is to store our objects linearly in memory, because that will give us the best possible cache behavior. We can either use a fixed size array <em>Object[MAX_SIZE]</em> if we know the maximum number of objects that will ever be used, or we can be more flexible and use a <em>std::vector</em>.</p>
<p><strong>Note:</strong> If you care about performance and use <em>std::vector&lt;T&gt;</em> you should make a variant of it (call it <em>array&lt;T&gt;</em> for example) that doesn’t call constructors or initialize memory. Use that for simple types, when you don’t care about initialization. A dynamic <em>vector&lt;T&gt;</em> buffer that grows and shrinks a lot can spend a huge amount of time doing completely unnecessary constructor calls.</p>
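<p>A minimal sketch of what such a container might look like, assuming C++11. It is called <em>RawArray</em> here only to avoid clashing with <em>std::array</em>. The growth policy and the member names are illustrative assumptions, not the engine’s actual container:</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

template <typename T>
struct RawArray
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "RawArray is only meant for simple POD-like types");

    T *_data;
    unsigned _size;
    unsigned _capacity;

    RawArray() : _data(nullptr), _size(0), _capacity(0) {}
    ~RawArray() { std::free(_data); }
    RawArray(const RawArray &) = delete;
    RawArray &operator=(const RawArray &) = delete;

    unsigned size() const { return _size; }

    T &operator[](unsigned i) {
        assert(i < _size);
        return _data[i];
    }

    // Grows or shrinks without constructing or zeroing elements.
    // New elements hold whatever bytes realloc() handed back.
    void resize(unsigned n) {
        if (n > _capacity) {
            unsigned cap = _capacity ? _capacity * 2 : 16;
            if (cap < n)
                cap = n;
            void *p = std::realloc(_data, cap * sizeof(T));
            assert(p != nullptr);
            _data = static_cast<T *>(p);
            _capacity = cap;
        }
        _size = n;
    }

    // Takes t by value so it stays valid if the buffer moves.
    void push_back(T t) {
        resize(_size + 1);
        _data[_size - 1] = t;
    }
};
```

<p>Because the element type is restricted to trivially copyable types, <em>realloc()</em> can move the buffer with a plain memory copy and shrinking never has to run destructors.</p>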
<p>To find an object in the array, we need to know its index. But just using the index as ID is not enough, because the object might have been destroyed and a new object might have been created at the same index. To check for that, we also need an id value, as before. So we make the ID type a combination of both:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> ID <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">unsigned</span> index<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> inner_id<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Now we can use the index to quickly look up the object and the <em>inner_id</em> to verify its identity.</p>
<p>Since the object index is stored in the ID which is exposed externally, once an object has been created it cannot move. When objects are deleted they will leave holes in the array.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/09/id_lookup_1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/09/id_lookup_1.png" alt="" width="600" class="alignnone size-full wp-image-17166" /></a></p>
<p>When we create new objects we don’t just want to add them to the end of the array. We want to make sure that we fill the holes in the array first.</p>
<p>The standard way of doing that is with a free list. We store a pointer to the first hole in a variable. In each hole we store a pointer to the next hole. These pointers thus form a linked list that enumerates all the holes.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/09/id_lookup_2.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/09/id_lookup_2.png" alt="" width="600" class="alignnone size-full wp-image-17167" /></a></p>
<p>An interesting thing to note is that we usually don’t need to allocate any memory for these pointers. Since the pointers are only used for holes (i.e., dead objects) we can reuse the objects’ own memory for storing them. The objects don’t need that memory, since they are dead.</p>
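<p>One way to sketch that memory reuse is an anonymous union, where the free list link overlays the object’s payload. The names <em>Payload</em>, <em>Slot</em> and <em>SlotPool</em> are made up for this illustration, and the sketch assumes the payload is a simple type without constructors:</p>

```cpp
#include <cassert>
#include <climits>
#include <vector>

struct Payload { float x, y, z; }; // hypothetical data of a live object

struct Slot {
    unsigned inner_id;      // UINT_MAX marks a hole
    union {
        Payload live;       // meaningful while the slot holds an object
        unsigned next_free; // meaningful while the slot is a hole
    };
};

struct SlotPool
{
    std::vector<Slot> _slots;
    unsigned _freelist; // index of the first hole, UINT_MAX if none

    SlotPool() : _freelist(UINT_MAX) {}

    // Returns the index of the slot now holding the new object.
    unsigned allocate(unsigned inner_id) {
        unsigned index;
        if (_freelist == UINT_MAX) {
            index = (unsigned)_slots.size();
            _slots.push_back(Slot());
        } else {
            index = _freelist;
            _freelist = _slots[index].next_free; // read the link before overwriting it
        }
        _slots[index].inner_id = inner_id;
        _slots[index].live = Payload();
        return index;
    }

    void release(unsigned index) {
        assert(index < _slots.size());
        _slots[index].inner_id = UINT_MAX;
        _slots[index].next_free = _freelist; // the link overwrites the dead payload
        _freelist = index;
    }
};
```

<p>Here a <em>Slot</em> is no bigger than it would be without the free list, as long as the payload is at least as large as the link.</p>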
<p>Here is an implementation. For clarity, I have used an explicit member <em>next</em> in the object for the free list rather than reusing the object’s memory:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> System
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">unsigned</span> _next_inner_id<span style="color: #008080;">;</span>
	std<span style="color: #008080;">::</span><span style="color: #007788;">vector</span><span style="color: #000080;">&lt;</span>Object<span style="color: #000080;">&gt;</span> _objects<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> _freelist<span style="color: #008080;">;</span>
&nbsp;
	System<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		_next_inner_id <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
		_freelist <span style="color: #000080;">=</span> <span style="color: #0000ff;">UINT_MAX</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> <span style="color: #0000ff;">bool</span> has<span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		<span style="color: #0000ff;">return</span> _objects<span style="color: #008000;">&#91;</span>id.<span style="color: #007788;">index</span><span style="color: #008000;">&#93;</span>.<span style="color: #007788;">id</span>.<span style="color: #007788;">inner_id</span> <span style="color: #000080;">==</span> id.<span style="color: #007788;">inner_id</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> Object <span style="color: #000040;">&amp;</span>lookup<span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		<span style="color: #0000ff;">return</span> _objects<span style="color: #008000;">&#91;</span>id.<span style="color: #007788;">index</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> ID add<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		ID id<span style="color: #008080;">;</span>
		id.<span style="color: #007788;">inner_id</span> <span style="color: #000080;">=</span> _next_inner_id<span style="color: #000040;">++</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>_freelist <span style="color: #000080;">==</span> <span style="color: #0000ff;">UINT_MAX</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
			Object o<span style="color: #008080;">;</span>
			id.<span style="color: #007788;">index</span> <span style="color: #000080;">=</span> _objects.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
			o.<span style="color: #007788;">id</span> <span style="color: #000080;">=</span> id<span style="color: #008080;">;</span>
			_objects.<span style="color: #007788;">push_back</span><span style="color: #008000;">&#40;</span>o<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		<span style="color: #008000;">&#125;</span> <span style="color: #0000ff;">else</span> <span style="color: #008000;">&#123;</span>
			id.<span style="color: #007788;">index</span> <span style="color: #000080;">=</span> _freelist<span style="color: #008080;">;</span>
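			_freelist <span style="color: #000080;">=</span> _objects<span style="color: #008000;">&#91;</span>_freelist<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">next</span><span style="color: #008080;">;</span>
			_objects<span style="color: #008000;">&#91;</span>id.<span style="color: #007788;">index</span><span style="color: #008000;">&#93;</span>.<span style="color: #007788;">id</span> <span style="color: #000080;">=</span> id<span style="color: #008080;">;</span>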
		<span style="color: #008000;">&#125;</span>
		<span style="color: #0000ff;">return</span> id<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> <span style="color: #0000ff;">void</span> <span style="color: #0000dd;">remove</span><span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		Object <span style="color: #000040;">&amp;</span>o <span style="color: #000080;">=</span> lookup<span style="color: #008000;">&#40;</span>id<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		o.<span style="color: #007788;">id</span>.<span style="color: #007788;">inner_id</span> <span style="color: #000080;">=</span> <span style="color: #0000ff;">UINT_MAX</span><span style="color: #008080;">;</span>
		o.<span style="color: #007788;">next</span> <span style="color: #000080;">=</span> _freelist<span style="color: #008080;">;</span>
		_freelist <span style="color: #000080;">=</span> id.<span style="color: #007788;">index</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>This is a lot better than the STL solution. Insertion and removal are O(1). Lookup is just array indexing, which means it is very fast. In a quick-and-dirty-don’t-take-it-too-seriously test this was 40 times faster than the STL solution. In real life it all depends on the actual usage patterns, of course.</p>
<p>The only part of this solution that is not an improvement over the STL version is that our ID structs have increased from 32 to 64 bits.</p>
<p>There are things that can be done about that. For example, if you never have more than 64 K objects live at the same time, you can get by with 16 bits for the index, which leaves 16 bits for the <em>inner_id</em>. Note that the <em>inner_id</em> doesn’t have to be globally unique, it is enough if it is unique for that index slot. So a 16 bit <em>inner_id</em> is fine if we never create more than 64 K objects in the same index slot.</p>
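<p>A minimal sketch of the 16/16 split, with the index in the low bits and the <em>inner_id</em> in the high bits. The type and function names are assumptions made for this example:</p>

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t PackedID;

const uint32_t kIndexBits = 16;
const uint32_t kIndexMask = (1u << kIndexBits) - 1; // 0xffff

// Combines a slot index and a per-slot inner_id into one 32 bit value.
// Only the low 16 bits of inner_id survive, so it wraps after 64 K uses.
inline PackedID make_id(uint32_t index, uint32_t inner_id) {
    assert(index <= kIndexMask);
    return (inner_id << kIndexBits) | index;
}

inline uint32_t id_index(PackedID id) { return id & kIndexMask; }
inline uint32_t id_inner(PackedID id) { return id >> kIndexBits; }
```

<p>For example, <em>make_id(513, 7)</em> gives <em>0x00070201</em>: the index <em>0x0201</em> sits in the low half and the <em>inner_id</em> 7 in the high half.</p>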
<p>If you want to go down that road you probably want to change the implementation of the free list slightly. The code above uses a standard free list implementation that acts as a LIFO stack. This means that if you create and delete objects in quick succession they will all be assigned to the same index slot which means you quickly run out of <em>inner_ids</em> for that slot. To prevent that, you want to make sure that you always have a certain number of elements in the free list (allocate more if you run low) and rewrite it as a FIFO. If you always have <em>N</em> free objects and use a FIFO free list, then you are guaranteed that you won’t see an inner_id collision until you have created at least <em>N</em> * 64 K objects.</p>
<p>Of course you can slice and dice the 32 bits in other ways if you have different limits on the maximum number of objects. You have to crunch the numbers for your particular case to see if you can get by with a 32 bit ID.</p>
<h2>Packed Array</h2>
<p>One drawback with the approach sketched above is that since the index is exposed externally, the system cannot reorganize its objects in memory for maximum performance. </p>
<p>The holes are especially troubling. At some point the system probably wants to loop over all its objects and update them. If the object array is nearly full, no problem. But if the array has 50 % objects and 50 % holes, the loop will touch twice as much memory as necessary. That seems suboptimal.</p>
<p>We can get rid of that by introducing an extra level of indirection, where the IDs point to an array of indices that point to the objects themselves:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/09/id_lookup_3.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/09/id_lookup_3.png" alt="" width="600" class="alignnone size-full wp-image-17168" /></a></p>
<p>This means that we pay the cost of an extra array lookup whenever we resolve the ID. On the other hand, the system objects are packed tight in memory which means that they can be updated more efficiently. Note that the system update doesn’t have to touch or care about the index array. Whether this is a net win depends on how the system is used, but my guess is that in most cases more items are touched internally than are referenced externally.</p>
<p>To remove an object with this solution we use the standard trick of swapping it with the last item in the array. Then we update the index so that it points to the new location of the swapped object.</p>
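<p>Stripped of the ID validation and the free list, the swap-and-fix-up step can be sketched like this. <em>Item</em>, <em>PackedItems</em> and the <em>slot</em> back-reference are illustrative names. To keep the sketch short, slots are never reused, so a removed slot’s index entry is simply left stale:</p>

```cpp
#include <cassert>
#include <vector>

struct Item {
    unsigned slot; // which index entry refers to this item
    int value;
};

struct PackedItems
{
    std::vector<Item> items;     // tightly packed, no holes
    std::vector<unsigned> index; // slot -> position in items

    unsigned add(int value) {
        unsigned slot = (unsigned)index.size();
        index.push_back((unsigned)items.size());
        Item it = { slot, value };
        items.push_back(it);
        return slot;
    }

    void remove(unsigned slot) {
        assert(slot < index.size());
        unsigned pos = index[slot];
        Item last = items.back();
        items[pos] = last;      // move the last item into the gap
        index[last.slot] = pos; // repoint the moved item's index entry
        items.pop_back();
    }

    int value(unsigned slot) {
        unsigned pos = index[slot];
        return items[pos].value;
    }
};
```

<p>After adding three items and removing the first, the third item sits at position 0 and both survivors still resolve through their original slots.</p>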
<p>Here is an implementation. To keep things interesting, this time with a fixed array size, a 32 bit ID and a FIFO free list.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">typedef</span> <span style="color: #0000ff;">unsigned</span> ID<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #339900;">#define MAX_OBJECTS 64*1024</span>
<span style="color: #339900;">#define INDEX_MASK 0xffff</span>
<span style="color: #339900;">#define NEW_OBJECT_ID_ADD 0x10000</span>
&nbsp;
<span style="color: #0000ff;">struct</span> Index <span style="color: #008000;">&#123;</span>
	ID id<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">short</span> index<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">short</span> next<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> System
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">unsigned</span> _num_objects<span style="color: #008080;">;</span>
	Object _objects<span style="color: #008000;">&#91;</span>MAX_OBJECTS<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	Index _indices<span style="color: #008000;">&#91;</span>MAX_OBJECTS<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">short</span> _freelist_enqueue<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">short</span> _freelist_dequeue<span style="color: #008080;">;</span>
&nbsp;
	System<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		_num_objects <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000080;">&lt;</span>MAX_OBJECTS<span style="color: #008080;">;</span> <span style="color: #000040;">++</span>i<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
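			_indices<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">id</span> <span style="color: #000080;">=</span> i<span style="color: #008080;">;</span>
			_indices<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">index</span> <span style="color: #000080;">=</span> <span style="color: #0000ff;">USHRT_MAX</span><span style="color: #008080;">;</span>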
			_indices<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">next</span> <span style="color: #000080;">=</span> i<span style="color: #000040;">+</span><span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
		<span style="color: #008000;">&#125;</span>
		_freelist_dequeue <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
		_freelist_enqueue <span style="color: #000080;">=</span> MAX_OBJECTS<span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> <span style="color: #0000ff;">bool</span> has<span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		Index <span style="color: #000040;">&amp;</span>in <span style="color: #000080;">=</span> _indices<span style="color: #008000;">&#91;</span>id <span style="color: #000040;">&amp;</span> INDEX_MASK<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">return</span> in.<span style="color: #007788;">id</span> <span style="color: #000080;">==</span> id <span style="color: #000040;">&amp;&amp;</span> in.<span style="color: #007788;">index</span> <span style="color: #000040;">!</span><span style="color: #000080;">=</span> <span style="color: #0000ff;">USHRT_MAX</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> Object <span style="color: #000040;">&amp;</span>lookup<span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		<span style="color: #0000ff;">return</span> _objects<span style="color: #008000;">&#91;</span>_indices<span style="color: #008000;">&#91;</span>id <span style="color: #000040;">&amp;</span> INDEX_MASK<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">index</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> ID add<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		Index <span style="color: #000040;">&amp;</span>in <span style="color: #000080;">=</span> _indices<span style="color: #008000;">&#91;</span>_freelist_dequeue<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
		_freelist_dequeue <span style="color: #000080;">=</span> in.<span style="color: #007788;">next</span><span style="color: #008080;">;</span>
		in.<span style="color: #007788;">id</span> <span style="color: #000040;">+</span><span style="color: #000080;">=</span> NEW_OBJECT_ID_ADD<span style="color: #008080;">;</span>
		in.<span style="color: #007788;">index</span> <span style="color: #000080;">=</span> _num_objects<span style="color: #000040;">++</span><span style="color: #008080;">;</span>
		Object <span style="color: #000040;">&amp;</span>o <span style="color: #000080;">=</span> _objects<span style="color: #008000;">&#91;</span>in.<span style="color: #007788;">index</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
		o.<span style="color: #007788;">id</span> <span style="color: #000080;">=</span> in.<span style="color: #007788;">id</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">return</span> o.<span style="color: #007788;">id</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">inline</span> <span style="color: #0000ff;">void</span> <span style="color: #0000dd;">remove</span><span style="color: #008000;">&#40;</span>ID id<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		Index <span style="color: #000040;">&amp;</span>in <span style="color: #000080;">=</span> _indices<span style="color: #008000;">&#91;</span>id <span style="color: #000040;">&amp;</span> INDEX_MASK<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
		Object <span style="color: #000040;">&amp;</span>o <span style="color: #000080;">=</span> _objects<span style="color: #008000;">&#91;</span>in.<span style="color: #007788;">index</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
		o <span style="color: #000080;">=</span> _objects<span style="color: #008000;">&#91;</span><span style="color: #000040;">--</span>_num_objects<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
		_indices<span style="color: #008000;">&#91;</span>o.<span style="color: #007788;">id</span> <span style="color: #000040;">&amp;</span> INDEX_MASK<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">index</span> <span style="color: #000080;">=</span> in.<span style="color: #007788;">index</span><span style="color: #008080;">;</span>
&nbsp;
		in.<span style="color: #007788;">index</span> <span style="color: #000080;">=</span> <span style="color: #0000ff;">USHRT_MAX</span><span style="color: #008080;">;</span>
		_indices<span style="color: #008000;">&#91;</span>_freelist_enqueue<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">next</span> <span style="color: #000080;">=</span> id <span style="color: #000040;">&amp;</span> INDEX_MASK<span style="color: #008080;">;</span>
		_freelist_enqueue <span style="color: #000080;">=</span> id <span style="color: #000040;">&amp;</span> INDEX_MASK<span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p><em>(This has also been posted to <a href="http://bitsquid.blogspot.com/">the BitSquid Blog</a>.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/09/23/managing-decoupling-part-4-the-id-lookup-table/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Simple Roll-Your-Own Documentation System</title>
		<link>http://www.altdevblogaday.com/2011/09/08/a-simple-roll-your-own-documentation-system/</link>
		<comments>http://www.altdevblogaday.com/2011/09/08/a-simple-roll-your-own-documentation-system/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 21:39:27 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=16104</guid>
		<description><![CDATA[<p>I like to roll my own documentation systems. There, I’ve said it. Not for inline documentation, mind you. For that there is Doxygen and for that I am grateful. Because while I love coding, there is fun coding and not-so-fun coding, and writing C++ parsers tends to fall in the latter category.</p>
<p><a href="http://www.altdevblogaday.com/2011/09/08/a-simple-roll-your-own-documentation-system/" class="more-link">Read more on A Simple Roll-Your-Own Documentation System&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I like to roll my own documentation systems. There, I’ve said it. Not for inline documentation, mind you. For that there is Doxygen and for that I am grateful. Because while I love coding, there is fun coding and not-so-fun coding, and writing C++ parsers tends to fall in the latter category.</p>
<p>So for inline documentation I use Doxygen, but for everything else, I roll my own. Why?</p>
<p>I don’t want to use Word or Pages or any other word processing program because I want my documents to be plain text that can be diffed and merged when necessary. And I want to be able to output it as <em>clean</em> HTML or in any other format I may like.</p>
<p>I don’t want to use HTML or LaTeX or any other presentation-oriented language, because I want to be able to massage the content in various ways before presenting it. Reordering it, adding an index or a glossary, removing deprecated parts, etc. Also, writing &lt;p&gt; gets boring very quickly.</p>
<p>I don’t want to use a Wiki, because I want to check in my documents together with the code, so that code versions and document versions match in the repository. I definitely <em>don’t</em> want to manage five different Wikis, corresponding to different engine release versions. Also, Wiki markup languages tend to be verbose and obtuse.</p>
<p>I <em>could</em> use an existing markup language, such as DocBook, Markdown or ReStructured Text. But all of them contain lots of stuff that I don’t need and lack some stuff that I <em>do</em> need. For example I want to include snippets of syntax highlighted Lua code, margin notes and math formulas. And I want to do it in a way that is easy to read and easy to write. Because I want there to be as few things as possible standing in the way of writing good documentation.</p>
<p>So I roll my own. But as you will see, it is not that much work.</p>
<p>I’ve written a fair number of markup systems over the years (perhaps one too many, but hey, that is how you learn) and I’ve settled on a pretty minimalistic structure that can be implemented in a few hundred lines of Ruby. In general, I tend to favor simple, minimalistic systems over big frameworks that try to “cover everything”. Covering everything is usually impossible and when you discover that you need new functionality, the lightweight systems are a lot easier to extend than the behemoths.</p>
<p>There are two basic components to the system. Always two there are, a <em>parser</em> and a <em>generator</em>. The parser reads the source document and converts it to some kind of structured representation. The generator takes the structured representation and converts it to an output format. Here I’ll only consider HTML, because to me that is the only output format that really matters.</p>
<p>To have something concrete to talk about, let’s use this source document, written in a syntax that I just made up:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">@h1 Flavors of ice cream
&nbsp;
My favorite ice cream flavors are:
&nbsp;
@li Strawberry
@li Seagull</pre></td></tr></table></div>

<h2>The Parser</h2>
<p>The most crucial point of the system is what the structured representation should look like. How should the parser communicate with the generator? My minimalistic solution is to just let the representation be a list of lines, with each line consisting of a type marker and some text.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">(:h1, &quot;Flavors of...&quot;)
(:empty, &quot;&quot;)
(:text, &quot;My favorite...&quot;)
(:empty, &quot;&quot;)
(:li, &quot;Strawberry&quot;)
(:li, &quot;Seagull&quot;)</pre></td></tr></table></div>

<p>To some this will probably seem like complete heresy. Surely I need some kind of hierarchical representation. How can I otherwise represent things like a list-in-a-list-in-a-cat-in-a-hat?</p>
<p>No problem, to represent a list item nested in another list, I just use a <em>@li_li</em> tag and a corresponding <em>:li_li</em> type marker. If someone wants three or more levels of nesting I suggest that they rewrite their document. This is supposed to be <em>readable</em> documentation, not Tractatus Logico-Philosophicus. I simply don’t think that deep nesting is important enough to warrant a complicated hierarchical design. As I said, I prefer the simple things in life.</p>
<p>So, now that we know the output format, we can write the parser in under 20 lines:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> Parser
  attr_reader <span style="color:#ff3333; font-weight:bold;">:lines</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> initialize<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0066ff; font-weight:bold;">@lines</span> = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> parse<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#9966CC; font-weight:bold;">case</span> line
    <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#006600; font-weight:bold;">/</span>^$<span style="color:#006600; font-weight:bold;">/</span>
      <span style="color:#0066ff; font-weight:bold;">@lines</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#006600; font-weight:bold;">&#123;</span>:type <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:empty</span>, <span style="color:#ff3333; font-weight:bold;">:line</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">&quot;&quot;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
    <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#006600; font-weight:bold;">/</span>^@<span style="color:#006600; font-weight:bold;">&#40;</span>\S<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>\s<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#40;</span>.<span style="color:#006600; font-weight:bold;">*</span><span style="color:#006600; font-weight:bold;">&#41;</span>$<span style="color:#006600; font-weight:bold;">/</span>
      <span style="color:#0066ff; font-weight:bold;">@lines</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#006600; font-weight:bold;">&#123;</span>:type <span style="color:#006600; font-weight:bold;">=&gt;</span> $1.<span style="color:#9900CC;">intern</span>, <span style="color:#ff3333; font-weight:bold;">:line</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> $2<span style="color:#006600; font-weight:bold;">&#125;</span>
    <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#006600; font-weight:bold;">/</span>^<span style="color:#006600; font-weight:bold;">&#40;</span>.<span style="color:#006600; font-weight:bold;">*</span><span style="color:#006600; font-weight:bold;">&#41;</span>$<span style="color:#006600; font-weight:bold;">/</span>
      <span style="color:#0066ff; font-weight:bold;">@lines</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#006600; font-weight:bold;">&#123;</span>:type <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:text</span>, <span style="color:#ff3333; font-weight:bold;">:line</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> line<span style="color:#006600; font-weight:bold;">&#125;</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></td></tr></table></div>

<p>Of course you can go a lot fancier with the parser than this. For example, you can make a more Markdown-like syntax where you create lists by just starting lines with bullet points. But this doesn’t really change the basic structure; you just need to add more <em>when</em> clauses to your case statement.</p>
<p>One useful approach, as you make more advanced parsers, is to have markers that put the parser in a particular state. For example, you could have a marker <em>@lua</em> that made the parser consider all the lines following it to be of type <em>:lua</em> until the marker <em>@endlua</em> was reached.</p>
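<p>A minimal sketch of such a state marker, assuming a hypothetical <em>StatefulParser</em> class (the class name and the <em>@mode</em> variable are mine, chosen for illustration) and the <em>@lua</em>/<em>@endlua</em> markers described above:</p>

```ruby
# Sketch: a parser with a "mode" that @lua / @endlua switch on and off.
# StatefulParser and @mode are illustrative names, not from the original code.
class StatefulParser
  attr_reader :lines

  def initialize
    @lines = []
    @mode = nil   # nil = normal parsing, :lua = inside a @lua block
  end

  def parse(line)
    line = line.chomp
    if @mode == :lua
      if line.strip == "@endlua"
        @mode = nil                                   # leave the Lua state
      else
        @lines.push({:type => :lua, :line => line})   # keep the line verbatim
      end
      return
    end

    case line
    when /^@lua\s*$/
      @mode = :lua                                    # enter the Lua state
    when /^$/
      @lines.push({:type => :empty, :line => ""})
    when /^@(\S+)\s+(.*)$/
      @lines.push({:type => $1.intern, :line => $2})
    else
      @lines.push({:type => :text, :line => line})
    end
  end
end
```

<p>While the parser is in the <em>:lua</em> state, even lines that start with <em>@</em> are passed through untouched, so Lua code never has to be escaped.</p>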
<h2>The Generator</h2>
<p>A useful trick when writing HTML generators is to always keep track of the HTML tags that you have currently opened. This lets you write a method <em>context(tags)</em> which takes a list of tags as arguments and closes and opens tags so that exactly the tags specified in the list are open.</p>
<p>With such a method available, it is simple to write the code for outputting tags:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> <span style="color:#CC00FF; font-weight:bold;">Generator</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> h1<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span>
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>W<span style="color:#006600; font-weight:bold;">&#40;</span>h1 <span style="color:#008000; font-style:italic;">#{&quot;a name=\&quot;#{line}\&quot;&quot;}))</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> line
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> text<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span>
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0066; font-weight:bold;">p</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> line
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> empty<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span>
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> line
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> li<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span>
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> line
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></td></tr></table></div>

<p>Notice how this works. The <em>li()</em> method makes sure that we are in a <em>&lt;ul&gt; &lt;li&gt;</em> context, so it closes all other open tags and opens the right ones. Then, after printing its content, it says that the context should just be <em>&lt;ul&gt;</em> which forces the closure of the <em>&lt;li&gt;</em> tag. If we wanted to support the <em>:li_li</em> tag, mentioned above, we could write it simply as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> <span style="color:#CC00FF; font-weight:bold;">Generator</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> li_li<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span>
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li ul li<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> line
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>ul li ul<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></td></tr></table></div>

<p>Notice also that this approach allows us to just step through the lines in the data structure and print them. We don’t have to look back and forward in the data structure to find out where a <em>&lt;ul&gt;</em> should begin and end.</p>
<p>The rest of the Generator class implements the <em>context()</em> function and handles indentation:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> <span style="color:#CC00FF; font-weight:bold;">Generator</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> initialize<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0066ff; font-weight:bold;">@out</span> = <span style="color:#996600;">&quot;&quot;</span>
    <span style="color:#0066ff; font-weight:bold;">@context</span> = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    <span style="color:#0066ff; font-weight:bold;">@indent</span> = <span style="color:#006666;">0</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#CC0066; font-weight:bold;">print</span><span style="color:#006600; font-weight:bold;">&#40;</span>s<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0066ff; font-weight:bold;">@out</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;  &quot;</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#0066ff; font-weight:bold;">@indent</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> s <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#996600;">&quot;<span style="color:#000099;">\n</span>&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#CC0066; font-weight:bold;">open</span><span style="color:#006600; font-weight:bold;">&#40;</span>ci<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> <span style="color:#996600;">&quot;&lt;#{ci}&gt;&quot;</span>
    <span style="color:#0066ff; font-weight:bold;">@indent</span> <span style="color:#006600; font-weight:bold;">+</span>= <span style="color:#006666;">1</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> close<span style="color:#006600; font-weight:bold;">&#40;</span>ci<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0066ff; font-weight:bold;">@indent</span> <span style="color:#006600; font-weight:bold;">-</span>= <span style="color:#006666;">1</span>
    <span style="color:#CC0066; font-weight:bold;">print</span> <span style="color:#996600;">&quot;&lt;/#{ci[/^<span style="color:#000099;">\S</span>*/]}&gt;&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> context<span style="color:#006600; font-weight:bold;">&#40;</span>c<span style="color:#006600; font-weight:bold;">&#41;</span>
    i = <span style="color:#006666;">0</span>
    <span style="color:#9966CC; font-weight:bold;">while</span> <span style="color:#0066ff; font-weight:bold;">@context</span><span style="color:#006600; font-weight:bold;">&#91;</span>i<span style="color:#006600; font-weight:bold;">&#93;</span> != <span style="color:#0000FF; font-weight:bold;">nil</span> <span style="color:#006600; font-weight:bold;">&amp;&amp;</span> <span style="color:#0066ff; font-weight:bold;">@context</span><span style="color:#006600; font-weight:bold;">&#91;</span>i<span style="color:#006600; font-weight:bold;">&#93;</span> == c<span style="color:#006600; font-weight:bold;">&#91;</span>i<span style="color:#006600; font-weight:bold;">&#93;</span>
      i <span style="color:#006600; font-weight:bold;">+</span>= <span style="color:#006666;">1</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">while</span> <span style="color:#0066ff; font-weight:bold;">@context</span>.<span style="color:#9900CC;">size</span> <span style="color:#006600; font-weight:bold;">&gt;</span> i
      close<span style="color:#006600; font-weight:bold;">&#40;</span>@context.<span style="color:#9900CC;">last</span><span style="color:#006600; font-weight:bold;">&#41;</span>
      <span style="color:#0066ff; font-weight:bold;">@context</span>.<span style="color:#9900CC;">pop</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">while</span> c.<span style="color:#9900CC;">size</span> <span style="color:#006600; font-weight:bold;">&gt;</span> <span style="color:#0066ff; font-weight:bold;">@context</span>.<span style="color:#9900CC;">size</span>
      <span style="color:#0066ff; font-weight:bold;">@context</span>.<span style="color:#9900CC;">push</span><span style="color:#006600; font-weight:bold;">&#40;</span> c<span style="color:#006600; font-weight:bold;">&#91;</span>@context.<span style="color:#9900CC;">size</span><span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&#41;</span>
      <span style="color:#CC0066; font-weight:bold;">open</span><span style="color:#006600; font-weight:bold;">&#40;</span>@context.<span style="color:#9900CC;">last</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#CC0066; font-weight:bold;">format</span><span style="color:#006600; font-weight:bold;">&#40;</span>lines<span style="color:#006600; font-weight:bold;">&#41;</span>
    lines.<span style="color:#9900CC;">each</span> <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">send</span><span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:type</span><span style="color:#006600; font-weight:bold;">&#93;</span>, line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:line</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
    context<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0000FF; font-weight:bold;">return</span> <span style="color:#0066ff; font-weight:bold;">@out</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></td></tr></table></div>

<p>Used as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ruby" style="font-family:monospace;">parser = Parser.<span style="color:#9900CC;">new</span>
text.<span style="color:#9900CC;">each_line</span> <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span> parser.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#CC00FF; font-weight:bold;">Generator</span>.<span style="color:#9900CC;">new</span>.<span style="color:#CC0066; font-weight:bold;">format</span><span style="color:#006600; font-weight:bold;">&#40;</span>parser.<span style="color:#9900CC;">lines</span><span style="color:#006600; font-weight:bold;">&#41;</span></pre></td></tr></table></div>

<p>So there you have it: the start of a custom documentation system in under 100 lines of Ruby code, easy to extend with new tags.</p>
<p>There are some things I haven’t touched on here, like TOC generation or inline formatting (bold and emphasized text). But it is easy to write them as extensions of this basic system. For example, the TOC could be generated with an additional pass over the structured data. If there is enough interest I could show an example in a follow-up post.</p>
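<p>As a rough idea of what such a TOC pass could look like, here is a sketch. The <em>toc_lines</em> helper is hypothetical, and it assumes that every heading has type <em>:h1</em> and is anchored under its own text, as the <em>h1()</em> method above does:</p>

```ruby
# Sketch: build a table of contents with one extra pass over the parsed lines.
# toc_lines is a hypothetical helper; it assumes headings have type :h1 and
# that the generator anchors each heading under its own text, as h1() does.
def toc_lines(lines)
  headings = lines.select { |l| l[:type] == :h1 }
  headings.map do |h|
    {:type => :li, :line => "<a href=\"##{h[:line]}\">#{h[:line]}</a>"}
  end
end

lines = [
  {:type => :h1, :line => "Flavors of ice cream"},
  {:type => :text, :line => "My favorite ice cream flavors are:"},
  {:type => :li, :line => "Strawberry"}
]

# Prepend the TOC entries; the generator can then format the result as usual.
with_toc = toc_lines(lines) + lines
```

<p>Because the TOC is expressed as ordinary <em>:li</em> lines, the generator needs no changes to render it.</p>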
<p><span style="font-size: 80%">(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>. Check out <a href="http://www.bitsquid.se/">our new website</a>.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/09/08/a-simple-roll-your-own-documentation-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An idea for better watch windows</title>
		<link>http://www.altdevblogaday.com/2011/08/24/an-idea-for-better-watch-windows/</link>
		<comments>http://www.altdevblogaday.com/2011/08/24/an-idea-for-better-watch-windows/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 21:48:42 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[Visual Studio]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=15045</guid>
		<description><![CDATA[<p>Watch windows suck. I’ve spent a large part of my career looking at them (that’s how those bugs get fixed) and it’s often a frustrating experience.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/08/image1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/08/image1.png" alt="Visual Studio Watch Window" width="460" height="110" class="alignnone size-full wp-image-15047" /></a></p>
<p>Visual Studio’s watch window is one of the better ones, but it still has many issues that make the debugging experience a lot less pleasant than it could be.</p>
<p><a href="http://www.altdevblogaday.com/2011/08/24/an-idea-for-better-watch-windows/" class="more-link">Read more on An idea for better watch windows&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Watch windows suck. I’ve spent a large part of my career looking at them (that’s how those bugs get fixed) and it’s often a frustrating experience.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/08/image1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/08/image1.png" alt="Visual Studio Watch Window" width="460" height="110" class="alignnone size-full wp-image-15047" /></a></p>
<p>Visual Studio’s watch window is one of the better ones, but it still has many issues that make the debugging experience a lot less pleasant than it could be.</p>
<ul>
<li>Custom data types such as <em>MyTree</em>, <em>MyHashSet</em> and <em>MyLinkedList</em> are difficult to look at. To get to the content you have to understand the internal data layout and expand the links by hand.</li>
<li>I like to pack my resource data into <a href="http://bitsquid.blogspot.com/2010/02/blob-and-i.html" title="The Blob and I">tight static blobs</a> &#8212; <em>file formats for memory</em>. A simple such blob might have a header with a variable number of offsets into a buffer of tightly packed strings. Such memory layouts cannot be described with just C structs and the watch window can’t inspect them. You have to cast pointers by hand or use the <em>Memory</em> view.</li>
</ul>
<div id="attachment_15048" class="wp-caption alignnone" style="width: 603px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/08/image2.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/08/image2.png" alt="Visual Studio Memory Window" width="593" height="283" class="size-full wp-image-15048" /></a><p class="wp-caption-text">I don’t even see the code. All I see is a hermite curve fitted, time key sorted, zlib compressed reload animation.</p></div>
<ul>
<li>If I have an array with 10 000 floats and one of them is a <em>#NaN</em>, I have no way of finding out except to expand it and scroll through the numbers until I find the bad one.</li>
<li>The watch window can’t do reverse lookup of string hashes, so when I see a hash value in the data I have no idea what it refers to.</li>
</ul>
<p>Yes, I know that some of these things can be fixed. I know that you can get the Visual Studio Debugger to understand your own data types by editing <em>autoexp.dat</em>. And since I’ve done that for all our major collection types (<em>Vector</em>, <em>Deque</em>, <em>Map</em>, <em>SortMap</em>, <em>HashMap</em>, <em>Set</em>, <em>SortSet</em>, <em>HashSet</em>, <em>ConstConfigValue</em> and <em>DynamicConfigValue</em>) I know what a pain it is, and I know I don’t want to do it any more. Also, it doesn’t help the debuggers for the other platforms.</p>
<p>I also know that you can do some tricks with Visual Studio extensions. At my previous company we had reverse hash lookup through a Visual Studio extension. That was also painful to write, and a single platform solution.</p>
<p>So yes, you can fix some things, and that will make your work environment a little better. But I think we should aim higher.</p>
<p>Consider this: The variable watcher has access to the entire game memory <em>and</em> plenty of time to analyze it. (Variable watching is not a time critical task.)</p>
<p>Imagine what a well-written C program that knew the layout of all your data structures could do with that information. It could expand binary trees and display them in a nice view, reverse lookup your hashes, highlight uninitialized <em>0xdeadbeef</em> variables, spell check your strings, etc.</p>
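<p>As a rough illustration of the kind of analysis this allows, here is a sketch (in Ruby rather than C, for brevity) that scans a raw memory dump for NaN floats and <em>0xdeadbeef</em> markers. It assumes little-endian 32-bit values, and <em>find_nans</em> and <em>find_marker</em> are hypothetical helper names:</p>

```ruby
# Sketch: scan a raw memory dump for suspicious values.
# Assumes little-endian 32-bit floats and words; both helpers are hypothetical.

# Returns the indices of all NaN values in a buffer of packed floats.
def find_nans(bytes)
  floats = bytes.unpack("e*")
  floats.each_index.select { |i| floats[i].nan? }
end

# Returns the indices of all 32-bit words equal to the given marker.
def find_marker(bytes, marker = 0xdeadbeef)
  words = bytes.unpack("V*")
  words.each_index.select { |i| words[i] == marker }
end
```

<p>Finding the one bad float among 10 000 then becomes a single call instead of a scrolling session.</p>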
<h2>The idea</h2>
<p>So this is my idea: instead of writing plug-ins and extensions for all the IDEs and platforms in the world, we write the watcher as a separate external program. The user starts the program, connects to a process, enters a memory address and a variable type and gets presented with a nice view of the data:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/08/image3.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/08/image3.png" alt="Variable Watcher" width="660" height="520" class="alignnone size-full wp-image-15049" /></a><br />
The connection backend would be customizable so that we could use it both for local processes and remote devices (Xbox/PS3). The front end sends an <em>(address, size)</em> request and the backend replies with a bunch of data. So the platform doesn’t matter. As long as there is some way of accessing the memory of the device we can connect it to the watcher.</p>
<p>We can even use it to look at file contents. All we need is a backend that can return data from different offsets in the file. This works especially well for <a href="http://bitsquid.blogspot.com/2010/02/blob-and-i.html" title="The Blob and I">data blobs</a>, where the file and memory formats are identical. The watcher would function as a general data viewer that could be used for both files and memory.</p>
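<p>To give a feel for how small a backend can be, here is a sketch of one that serves <em>(address, size)</em> requests from a file. The <em>FileBackend</em> class and its <em>read</em> method are hypothetical names. The assumption is that a process or console backend would expose the same method:</p>

```ruby
# Sketch: the smallest possible backend. The front end only ever asks for
# (address, size) and gets raw bytes back. FileBackend and read are
# hypothetical names; other backends would expose the same method.
class FileBackend
  def initialize(path)
    @path = path
  end

  # Returns up to `size` bytes starting at byte offset `address`,
  # or an empty string when the offset is past the end of the file.
  def read(address, size)
    File.open(@path, "rb") do |f|
      f.seek(address)
      f.read(size) || ""
    end
  end
end
```

<p>The front end never needs to know whether the bytes came from a file, a local process or a remote device.</p>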
<p>For this to work, we need a way to describe our data structures to the program. It should understand regular C structs, of course, but we also need some way of describing more complex data, such as variable length objects, offsets, choices, etc. Essentially, what we need is a generic way to describe blobs of structured data, no matter what the format and layout.</p>
<p>I’m not sure what such a description language might look like (or if one already exists), but it might be something loosely based on C structs and then extended to cover more cases. Perhaps something like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> Data
<span style="color: #009900;">&#123;</span>
	zero_terminated <span style="color: #993333;">char</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> name<span style="color: #339933;">;</span>
	pad_to_4_bytes_alignment<span style="color: #339933;">;</span>
	platform_endian <span style="color: #993333;">unsigned</span> count<span style="color: #339933;">;</span>
	Entry entries<span style="color: #009900;">&#91;</span>count<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>
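<p>To make the layout concrete, here is a sketch that decodes such a blob by hand in Ruby. The details the description leaves open are filled in with assumptions: little-endian byte order, a 32-bit <em>count</em>, and an <em>Entry</em> that is a single 32-bit unsigned integer. <em>decode_data</em> is a hypothetical helper:</p>

```ruby
# Sketch: decode the hypothetical Data layout above from a byte string.
# Assumptions (not from the description): little-endian, 32-bit count,
# and each Entry is a single 32-bit unsigned integer.
def decode_data(blob)
  name = blob.unpack("Z*").first           # zero-terminated string
  offset = name.bytesize + 1               # skip the string and its terminator
  offset += (4 - offset % 4) % 4           # pad to 4-byte alignment
  count = blob.byteslice(offset, 4).unpack("V").first
  offset += 4
  entries = blob.byteslice(offset, 4 * count).unpack("V*")
  {:name => name, :count => count, :entries => entries}
end

# "hello" + terminator is 6 bytes, so 2 padding bytes precede the count.
blob = "hello\0".b + "\0\0".b + [2, 10, 20].pack("V3")
data = decode_data(blob)
```

<p>A description language would let the watcher derive this kind of decoding automatically instead of having it written out per format.</p>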

<p>The program also needs an extension mechanism so that we can write custom code for processing objects that can’t be described using even this more advanced syntax. This could be used for things like reverse hash lookups, or other queries that depend on external data.</p>
<p>Going further, the program could be extended with more visualizers that could allow you to view and edit complex objects in lots of interesting ways:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/08/image4.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/08/image4.png" alt="Custom Watchers" width="660" height="520" class="alignnone size-full wp-image-15050" /></a></p>
<p>I think this could be a really useful tool, both for debugging and for inspecting files (as a sort of beefed up hex editor). All I need is some time to write it.</p>
<p>What do you think?</p>
<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com/" title="BitSquid Blog">BitSquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/08/24/an-idea-for-better-watch-windows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fixing memory issues in Lua</title>
		<link>http://www.altdevblogaday.com/2011/08/09/fixing-memory-issues-in-lua/</link>
		<comments>http://www.altdevblogaday.com/2011/08/09/fixing-memory-issues-in-lua/#comments</comments>
		<pubDate>Tue, 09 Aug 2011 07:15:17 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[lua]]></category>
		<category><![CDATA[memory]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=14013</guid>
		<description><![CDATA[<p>Garbage collection can be both a blessing and a curse. On the one hand, it frees you from manually managing memory. This saves development time, reduces bugs, and avoids tricky decisions about objects&#8217; ownerships and lifetimes.</p>
<p><a href="http://www.altdevblogaday.com/2011/08/09/fixing-memory-issues-in-lua/" class="more-link">Read more on Fixing memory issues in Lua&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Garbage collection can be both a blessing and a curse. On the one hand, it frees you from manually managing memory. This saves development time, reduces bugs, and avoids tricky decisions about objects&#8217; ownerships and lifetimes.</p>
<p>On the other hand, when you <i>do</i> run into memory issues (and you most likely will), they can be a lot harder to diagnose and fix, because you don&#8217;t have detailed control over how memory is allocated and freed.</p>
<p>In this post I&#8217;ll show some techniques that you can use to address memory issues in Lua (and by extension, in other garbage collected languages).</p>
<p>All Lua memory issues essentially boil down to one of two things:</p>
<dl>
<dt>Lua uses too much memory</dt>
<dd>On consoles memory is a precious resource and sometimes Lua is just using too much of it. The root cause can either be memory leaks or badly constructed/bloated data structures.</dd>
<dt>Garbage collection is taking too long</dt>
<dd>Too much garbage collection is (not surprisingly) caused by having too much garbage. The code must be rewritten so that it generates less garbage.</dd>
</dl>
<p>Let&#8217;s look at each issue in turn and see how we can address it.</p>
<h2>1. Lua uses too much memory</h2>
<p>The first step towards plugging leaks and reducing memory use is to find out where the memory is going. Once we know that, the problems are usually quite easy to fix.</p>
<p>So how do we find out where the memory is going? One way would be to add tracing code to the <i>lua_Alloc</i> allocator function (the one passed to <i>lua_newstate()</i>), but there is a much simpler method that doesn&#8217;t require any C code and is more in line with Lua&#8217;s dynamic nature. We can just use Lua to count all the objects in the runtime image:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> count_all<span style="color: #66cc66;">&#40;</span>f<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> seen <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> count_table
	count_table <span style="color: #66cc66;">=</span> <span style="color: #aa9900; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span>
		<span style="color: #aa9900; font-weight: bold;">if</span> seen<span style="color: #66cc66;">&#91;</span>t<span style="color: #66cc66;">&#93;</span> <span style="color: #aa9900; font-weight: bold;">then</span> <span style="color: #aa9900; font-weight: bold;">return</span> <span style="color: #aa9900; font-weight: bold;">end</span>
		f<span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span>
		seen<span style="color: #66cc66;">&#91;</span>t<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #aa9900;">true</span>
		<span style="color: #aa9900; font-weight: bold;">for</span> k<span style="color: #66cc66;">,</span>v <span style="color: #aa9900; font-weight: bold;">in</span> <span style="color: #0000aa;">pairs</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">do</span>
			<span style="color: #aa9900; font-weight: bold;">if</span> <span style="color: #0000aa;">type</span><span style="color: #66cc66;">&#40;</span>v<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">==</span> <span style="color: #ff6666;">&quot;table&quot;</span> <span style="color: #aa9900; font-weight: bold;">then</span>
				count_table<span style="color: #66cc66;">&#40;</span>v<span style="color: #66cc66;">&#41;</span>
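			<span style="color: #aa9900; font-weight: bold;">elseif</span> <span style="color: #0000aa;">type</span><span style="color: #66cc66;">&#40;</span>v<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">==</span> <span style="color: #ff6666;">&quot;userdata&quot;</span> <span style="color: #aa9900; font-weight: bold;">and</span> <span style="color: #aa9900; font-weight: bold;">not</span> seen<span style="color: #66cc66;">&#91;</span>v<span style="color: #66cc66;">&#93;</span> <span style="color: #aa9900; font-weight: bold;">then</span>
				<span style="color: #808080; font-style: italic;">-- Mark userdata as seen too, so shared userdata is only counted once</span>
				seen<span style="color: #66cc66;">&#91;</span>v<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #aa9900;">true</span>
				f<span style="color: #66cc66;">&#40;</span>v<span style="color: #66cc66;">&#41;</span>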
			<span style="color: #aa9900; font-weight: bold;">end</span>
		<span style="color: #aa9900; font-weight: bold;">end</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
	count_table<span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">_G</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>Here we just start with the global table <i>_G</i> and recursively enumerate all subtables and userdata. For each object that we haven&#8217;t seen before, we call the enumeration function <i>f</i>. This will enumerate all the objects in the Lua runtime that can be reached from <i>_G</i>. Depending on how you use Lua you may also want to add some code for enumerating objects stored in the registry, and recurse over metatables and function upvalues to make sure that you really count all the objects in the runtime.</p>
<p>Once you have a function for enumerating all your Lua objects, there are lots of useful things you can do. When it comes to plugging leaks and reducing memory usage I find one of the most useful things is to count the number of objects of each type:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">function</span> type_count<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> counts <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> enumerate <span style="color: #66cc66;">=</span> <span style="color: #aa9900; font-weight: bold;">function</span> <span style="color: #66cc66;">&#40;</span>o<span style="color: #66cc66;">&#41;</span>
		<span style="color: #aa9900; font-weight: bold;">local</span> t <span style="color: #66cc66;">=</span> type_name<span style="color: #66cc66;">&#40;</span>o<span style="color: #66cc66;">&#41;</span>
		counts<span style="color: #66cc66;">&#91;</span>t<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#40;</span>counts<span style="color: #66cc66;">&#91;</span>t<span style="color: #66cc66;">&#93;</span> <span style="color: #aa9900; font-weight: bold;">or</span> <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> <span style="color: #cc66cc;">1</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
	count_all<span style="color: #66cc66;">&#40;</span>enumerate<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">return</span> counts
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>Here <i>type_name()</i> is a function that returns the name of an object&#8217;s type. This function will depend on what kind of class/object system you use in your Lua runtime. One common approach is to have global class objects that also act as metatables for objects:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">-- A class</span>
Car <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
Car<span style="color: #66cc66;">.</span>__index <span style="color: #66cc66;">=</span> Car
&nbsp;
<span style="color: #808080; font-style: italic;">-- A method</span>
<span style="color: #aa9900; font-weight: bold;">function</span> Car<span style="color: #66cc66;">.</span>honk<span style="color: #66cc66;">&#40;</span>self<span style="color: #66cc66;">&#41;</span>
	<span style="color: #0000aa;">print</span> <span style="color: #ff6666;">&quot;toot&quot;</span>
<span style="color: #aa9900; font-weight: bold;">end</span>
&nbsp;
<span style="color: #808080; font-style: italic;">-- An object</span>
<span style="color: #aa9900; font-weight: bold;">local</span> my_car <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
<span style="color: #0000aa;">setmetatable</span><span style="color: #66cc66;">&#40;</span>my_car<span style="color: #66cc66;">,</span> Car<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>In this case, the <i>type_name()</i> function could look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">global_type_table <span style="color: #66cc66;">=</span> <span style="color: #aa9900;">nil</span>
<span style="color: #aa9900; font-weight: bold;">function</span> type_name<span style="color: #66cc66;">&#40;</span>o<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">if</span> global_type_table <span style="color: #66cc66;">==</span> <span style="color: #aa9900;">nil</span> <span style="color: #aa9900; font-weight: bold;">then</span>
		global_type_table <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>
		<span style="color: #aa9900; font-weight: bold;">for</span> k<span style="color: #66cc66;">,</span>v <span style="color: #aa9900; font-weight: bold;">in</span> <span style="color: #0000aa;">pairs</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">_G</span><span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">do</span>
			global_type_table<span style="color: #66cc66;">&#91;</span>v<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> k
		<span style="color: #aa9900; font-weight: bold;">end</span>
		global_type_table<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> <span style="color: #ff6666;">&quot;table&quot;</span>
	<span style="color: #aa9900; font-weight: bold;">end</span>
	<span style="color: #aa9900; font-weight: bold;">return</span> global_type_table<span style="color: #66cc66;">&#91;</span><span style="color: #0000aa;">getmetatable</span><span style="color: #66cc66;">&#40;</span>o<span style="color: #66cc66;">&#41;</span> <span style="color: #aa9900; font-weight: bold;">or</span> <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#93;</span> <span style="color: #aa9900; font-weight: bold;">or</span> <span style="color: #ff6666;">&quot;Unknown&quot;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>The object count usually gives you a good idea of where your memory problems lie. For example, if the number of <i>AiPathNode</i> objects constantly rises, you can conclude that you are somehow leaking those objects. If you have 200&nbsp;000 <i>GridCell</i> objects you should write a smarter grid implementation.</p>
<p>You can also use this enumeration technique to pinpoint problems further if necessary. For example, if you are hunting for leaks, you can rewrite the <i>count_all()</i> function so that it keeps track of the chain of keys through which each object was found. In this way, you might see that the <i>AiPathNode</i> objects can be accessed through paths like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #0000aa;">_G</span><span style="color: #66cc66;">.</span>managers<span style="color: #66cc66;">.</span>ai_managers<span style="color: #66cc66;">.</span>active_paths<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">2027</span><span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<p>Then you know that the source of the leak is that paths never get removed from the <i>active_paths</i> table.</p>
<h2>2. Garbage collection is taking too long</h2>
<p>Garbage collection is a very cache unfriendly task that can have a significant performance impact. This is especially frustrating since garbage collection doesn&#8217;t really <i>do</i> anything. Well, it lets your gameplay programmers work faster and with fewer bugs, but when you have reached the optimization phase you tend to forget about that and just swear at the slow collector.</p>
<p>Lua&#8217;s default garbage collection scheme is not adapted for realtime software and if you just run it straight up you will get lots of disturbing frame rate hitches. As has already been mentioned in previous #AltDevBlogADay articles, it is better to use a step size of 0 and just run the garbage collector for a certain number of milliseconds every frame:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">OpaqueTimeValue start <span style="color: #000080;">=</span> <span style="color: #0000dd;">time</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span>milliseconds_elapsed_since<span style="color: #008000;">&#40;</span>start<span style="color: #008000;">&#41;</span> <span style="color: #000080;">&lt;</span> milliseconds_to_run<span style="color: #008000;">&#41;</span>
	lua_gc<span style="color: #008000;">&#40;</span>L, LUA_GCSTEP, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Note that you can run this garbage collection on any thread, as long as Lua is not running at the same time, so you might be able to offset some of the cost by running the garbage collection on a background thread while your main thread is doing something non-Lua related.</p>
<p>How much time should you spend on garbage collection? A tricky question. If you spend too little, the garbage will grow and you will eventually run out of memory. If you spend too much, you are wasting precious milliseconds.</p>
<p>My preferred solution is to use a feedback mechanism. I dynamically adjust the garbage collection time so that the amount of garbage always stays below 10 % of the total Lua memory. If the garbage goes above that, I increase the collection time. If the garbage goes below, I decrease the collection time. As with all feedback mechanisms, it is a good idea to plot the curves for memory use and garbage collection time as you tweak the feedback parameters. That way you can verify that the system behaves nicely and that the curves settle down in a stable state rather than going into oscillation.</p>
<p>Choosing the figure 10 % is a balance between memory use and performance. If you choose a higher value, your program will use more memory (because of the increased amount of garbage). On the other hand, you can give the garbage collection a smaller time slice. I&#8217;ve chosen a pretty low number, because on consoles, memory is always precious. If you are targeting a platform with more memory, you can go higher.</p>
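<p>As an illustration of the feedback idea, here is a minimal sketch of a proportional controller for the per-frame collection time. Everything in it is an assumption made for the example: the name <i>GcBudgetController</i>, the gain, and the clamping limits are hypothetical, and how you estimate the current amount of garbage is left to the caller (this post does not prescribe a method).</p>

```cpp
#include <algorithm>

// Hypothetical controller; the names, gain and limits are illustrative only.
struct GcBudgetController {
    double target_garbage_fraction = 0.10; // the 10 % figure from the text
    double gain = 5.0;     // ms of budget change per unit of error (assumed)
    double min_ms = 0.1;   // never starve the collector completely (assumed)
    double max_ms = 4.0;   // never spend more than this per frame (assumed)
    double budget_ms = 1.0;

    // Call once per frame (or once per measurement) with an estimate of the
    // current garbage and the total Lua memory, both in bytes.
    // Returns the new per-frame collection time in milliseconds.
    double update(double garbage_bytes, double total_bytes)
    {
        if (total_bytes <= 0.0) return budget_ms;
        double fraction = garbage_bytes / total_bytes;
        // Positive error: too much garbage, so collect for longer.
        // Negative error: less garbage than allowed, so collect for less time.
        double error = fraction - target_garbage_fraction;
        budget_ms = std::min(std::max(budget_ms + gain * error, min_ms), max_ms);
        return budget_ms;
    }
};
```

<p>The returned value would be used as the time limit for the per-frame collection loop. A larger gain reacts faster but is more likely to oscillate, which is exactly what plotting the curves is meant to reveal.</p>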
<p>Let&#8217;s compute how much time we need to spend on garbage collection to stay below a certain fraction <img src='http://s0.wp.com/latex.php?latex=0+%5Cle+%5Calpha+%5Cle+1+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='0 &#92;le &#92;alpha &#92;le 1 ' title='0 &#92;le &#92;alpha &#92;le 1 ' class='latex' /> of garbage. Assume that we complete a full garbage collection cycle (scan all Lua memory) in time <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' />. The amount of garbage generated in that time will be:</p>
<p><img src='http://s0.wp.com/latex.php?latex=t+g+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t g ' title='t g ' class='latex' /></p>
<p>Where <img src='http://s0.wp.com/latex.php?latex=g&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='g' title='g' class='latex' /> is the garbage/s created by the program. To make sure that we stay below a fraction <img src='http://s0.wp.com/latex.php?latex=%5Calpha&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;alpha' title='&#92;alpha' class='latex' /> we must have (where <img src='http://s0.wp.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='m' title='m' class='latex' /> is the total memory used by the program, including the garbage):</p>
<p><img src='http://s0.wp.com/latex.php?latex=t+g+%5Cle+%5Calpha+m+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t g &#92;le &#92;alpha m ' title='t g &#92;le &#92;alpha m ' class='latex' /></p>
<p>Assume that we sweep <img src='http://s0.wp.com/latex.php?latex=s+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s ' title='s ' class='latex' /> bytes/s. Then the time <img src='http://s0.wp.com/latex.php?latex=t+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t ' title='t ' class='latex' /> required to sweep the entire memory <img src='http://s0.wp.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='m' title='m' class='latex' /> will be:</p>
<p><img src='http://s0.wp.com/latex.php?latex=t+%3D+%5Cfrac%7Bm%7D%7Bs%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t = &#92;frac{m}{s} ' title='t = &#92;frac{m}{s} ' class='latex' /></p>
<p>Combining the two equations we get:</p>
<p><img src='http://s0.wp.com/latex.php?latex=s+%5Cge+%5Cfrac%7Bg%7D%7B%5Calpha%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s &#92;ge &#92;frac{g}{&#92;alpha} ' title='s &#92;ge &#92;frac{g}{&#92;alpha} ' class='latex' /></p>
<p>So the amount of garbage collection work we need to do per frame is directly proportional to the amount of garbage / s generated by the program and inversely proportional to the fraction of garbage we are willing to accept. (Note that interestingly, <img src='http://s0.wp.com/latex.php?latex=m+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='m ' title='m ' class='latex' /> cancels out of the equation.)</p>
<p>So, if we are willing to spend more memory, we can address garbage collection problems by increasing <img src='http://s0.wp.com/latex.php?latex=%5Calpha+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;alpha ' title='&#92;alpha ' class='latex' />. But since <img src='http://s0.wp.com/latex.php?latex=%5Calpha+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;alpha ' title='&#92;alpha ' class='latex' /> can never be higher than 1, there are limits to what we can achieve in this way. A better option, that doesn&#8217;t cost any memory, is to reduce <img src='http://s0.wp.com/latex.php?latex=g+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='g ' title='g ' class='latex' /> &#8212; the amount of garbage generated.</p>
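<p>To put some numbers on the relation above, here is a small worked example. The function names and the input figures are made up for illustration, and the conversion to milliseconds per frame assumes that you have measured how fast the collector sweeps while it is actually running, which is an extra assumption that is not part of the derivation.</p>

```cpp
// Minimum average sweep rate (bytes/s) needed to keep the garbage below the
// fraction alpha of total memory, from s >= g / alpha.
double required_sweep_rate(double garbage_per_second, double alpha)
{
    return garbage_per_second / alpha;
}

// Collection time needed per frame, in milliseconds, given the collector's
// throughput while running (bytes/s, a measured and here hypothetical input)
// and the frame length in milliseconds.
double gc_ms_per_frame(double required_rate, double collector_throughput, double frame_ms)
{
    return frame_ms * required_rate / collector_throughput;
}
```

<p>With 2 MB/s of garbage and a 10 % limit, the collector must sweep about 20 MB/s on average. If it manages 400 MB/s while running, that is 5 % of the time, or about 1 ms of a 20 ms frame. Halving the garbage rate halves that cost, whereas raising the limit from 10 % to 20 % buys the same saving at the price of more memory.</p>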
<p>In my experience, most garbage generation problems are &#8220;easy mistakes&#8221; from sloppy and thoughtless programming. Once you know where the problems are, it is usually not hard to rewrite the code so that garbage generation is avoided. Some useful refactoring techniques are:</p>
<ul>
<li>Update the fields in an existing table instead of creating a new one.</li>
<li>Return a reference to an object member rather than a copy. Copy only when needed.</li>
<li>Write functions so that they take and return values rather than tables to avoid temporary tables. I.e., <i>make_point(2,3)</i> rather than <i>make_point({2,3})</i>.</li>
<li>If you need temporary objects, find a way of reusing them so you don&#8217;t need to create so many of them.</li>
<li>Avoid excessive string concatenation.</li>
</ul>
<p>Of course a key requirement for this to work is that your Lua-to-C bindings are written so that they don&#8217;t generate garbage. Otherwise your poor gameplay programmer has no chance. In my opinion, it should be possible to call any C function in a &#8220;garbage free&#8221; way (though you may choose to also have a more convenient path that <i>does</i> generate garbage). For tips on how to write garbage free bindings, see my previous posts on <a href="http://altdevblogaday.com/2011/06/26/lightweight-lua-bindings/">Lightweight Lua Bindings</a>.</p>
<p>To reduce garbage generation, you need to be able to pinpoint where in the program the garbage is being generated. Luckily, that is not difficult.</p>
<p>Once the game has reached a stable state (total Lua memory doesn&#8217;t grow or shrink) any allocation made can be considered garbage, because it will soon be freed again (otherwise the Lua memory would keep growing). So to find the garbage all you have to do is to add some tracing code to <i>lua_Alloc</i> that you can trigger when you have reached a stable state.</p>
<p>You can use <i>lua_getstack()</i> (together with <i>lua_getinfo()</i>) to get the current Lua stack trace from inside your <i>lua_Alloc</i> allocator function and use a <i>HashMap</i> to count the number of allocations associated with each stack trace. If you then sort this data by the number of allocations it is easy to identify the &#8220;hotspots&#8221; that are generating the most garbage. A gameplay programmer can go through this list and reduce the amount of garbage generation using the tips above.</p>
<p>The code may look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> TraceEntry <span style="color: #008000;">&#123;</span>
	TraceEntry<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008080;">:</span> alloc_count<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span>, alloc_bytes<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span>
	String trace<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> alloc_count<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> alloc_bytes<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
HashMap<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint64</span>, TraceEntry<span style="color: #000080;">&gt;</span> _traces<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>_tracing_allocs<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	lua_Debug stack<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #008000;">&#123;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">int</span> count <span style="color: #000080;">=</span> lua_debugger<span style="color: #008080;">::</span><span style="color: #007788;">stack_dump</span><span style="color: #008000;">&#40;</span>L, stack, <span style="color: #0000dd;">5</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">uint64</span> hash <span style="color: #000080;">=</span> murmur_hash_64<span style="color: #008000;">&#40;</span><span style="color: #000040;">&amp;</span>stack<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span>, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span>lua_Debug<span style="color: #008000;">&#41;</span><span style="color: #000040;">*</span>count<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	TraceEntry <span style="color: #000040;">&amp;</span>te <span style="color: #000080;">=</span> _traces<span style="color: #008000;">&#91;</span>hash<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	te.<span style="color: #007788;">alloc_count</span> <span style="color: #000040;">+</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
	te.<span style="color: #007788;">alloc_bytes</span> <span style="color: #000040;">+</span><span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span>new_size <span style="color: #000040;">-</span> old_size<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>te.<span style="color: #007788;">trace</span>.<span style="color: #007788;">empty</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
		lua_debugger<span style="color: #008080;">::</span><span style="color: #007788;">stack_dump_to_string</span><span style="color: #008000;">&#40;</span>L, te.<span style="color: #007788;">trace</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>
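<p>The sorting step could be sketched as follows. This is a standalone illustration, not the engine code: it uses <i>std::unordered_map</i> and <i>std::string</i> as stand-ins for the <i>HashMap</i> and <i>String</i> types above, and <i>top_hotspots()</i> is a hypothetical name.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Same fields as the TraceEntry above, with standard library types.
struct TraceEntry {
    std::string trace;
    unsigned alloc_count = 0;
    unsigned alloc_bytes = 0;
};

// Returns the n entries with the most allocations, largest first.
std::vector<TraceEntry> top_hotspots(
    const std::unordered_map<uint64_t, TraceEntry> &traces, size_t n)
{
    std::vector<TraceEntry> sorted;
    sorted.reserve(traces.size());
    for (const auto &kv : traces)
        sorted.push_back(kv.second);

    // Only the first n positions need to be in order.
    n = std::min(n, sorted.size());
    std::partial_sort(sorted.begin(), sorted.begin() + n, sorted.end(),
        [](const TraceEntry &a, const TraceEntry &b) {
            return a.alloc_count > b.alloc_count;
        });
    sorted.resize(n);
    return sorted;
}
```

<p>Sorting by <i>alloc_bytes</i> instead gives a different view of the same data, which can be useful when a few large allocations matter more than many small ones.</p>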

<p>In my experience, spending a few hours on fixing the worst hot spots indicated by the trace can reduce the garbage collection time by an order of magnitude.</p>
<p><i>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</i></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/08/09/fixing-memory-issues-in-lua/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An in-place parsing experiment</title>
		<link>http://www.altdevblogaday.com/2011/07/25/an-in-place-parsing-experiment/</link>
		<comments>http://www.altdevblogaday.com/2011/07/25/an-in-place-parsing-experiment/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 12:00:04 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=11341</guid>
		<description><![CDATA[<p>Our engine recently ran into an out-of-memory crash when parsing a 150 MB JSON file (foiled by the artists again). The file caused our parser to create a DOM-like tree representation that was over 2 GB in size. No worries, I put the DOM tree on a strict memory diet and reduced the size to 120 MB &#8212; slightly less than the size of the original JSON data.</p>
<p><a href="http://www.altdevblogaday.com/2011/07/25/an-in-place-parsing-experiment/" class="more-link">Read more on An in-place parsing experiment&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Our engine recently ran into an out-of-memory crash when parsing a 150 MB JSON file (foiled by the artists again). The file caused our parser to create a DOM-like tree representation that was over 2 GB in size. No worries, I put the DOM tree on a strict memory diet and reduced the size to 120 MB &#8212; slightly less than the size of the original JSON data.</p>
<p>But it made me think of something I&#8217;ve always wanted to try: in-place parsing.</p>
<h2>In-place parsing</h2>
<p>The idea behind in-place parsing is simple. Instead of copying the entire JSON document into a DOM tree representation, we let the data in the JSON file represent itself. We represent an object simply by a pointer to the particular position in the JSON data where that object starts. And then we lazily parse out the properties as we need them.</p>
<p>Let&#8217;s look at an example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="json" style="font-family:monospace;">{
   &quot;people&quot; : [
      {&quot;name&quot; : &quot;Niklas&quot;, &quot;occupation&quot; : &quot;programmer&quot;},
      {&quot;name&quot; : &quot;Obama&quot;, &quot;occupation&quot; : &quot;president&quot;}
   ]
}</pre></td></tr></table></div>

<p>If I wanted to represent the first person in this list, I would use a pointer into the JSON string where that object starts:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="json" style="font-family:monospace;">{&quot;name&quot; : &quot;Niklas&quot;, &quot;occupation&quot; : &quot;programmer&quot;},  {&quot;name&quot; : &quot;Obama&quot;, ...</pre></td></tr></table></div>

<p>Similarly, if I wanted to talk about my name I would use a pointer to where that string starts:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="json" style="font-family:monospace;">&quot;Niklas&quot;, &quot;occupation&quot; : &quot;programmer&quot;},  {&quot;name&quot; : &quot;Obama&quot;, &quot;occupatio...</pre></td></tr></table></div>

<p>An in-place parsing API provides functions for parsing values, given pointers to the values&#8217; locations in the JSON document. I use something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">enum</span> ValueType <span style="color: #008000;">&#123;</span>NIL, BOOL, INTEGER, FLOAT, STRING, ARRAY, OBJECT<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
ValueType type<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">bool</span> to_bool<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">int</span> to_int<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">float</span> to_float<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">void</span> to_string<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s, Vector<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">char</span><span style="color: #000080;">&gt;</span> <span style="color: #000040;">&amp;</span>str<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">void</span> to_array<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s, Vector<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> <span style="color: #000040;">&amp;</span>array<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">void</span> to_object<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>s, Map<span style="color: #000080;">&lt;</span>DynamicString, <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> <span style="color: #000040;">&amp;</span>object<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>As you can see from this interface, when parsing a JSON array or object I do not recursively parse all sub-objects (doing that would be the same as creating a DOM tree). Instead I represent the object as a map from a string (the property name) to the location in the JSON file where the value of that property is found.</p>
<p>When I need to recurse into the sub-objects I do further parsing of their <i>const char *</i> pointers.</p>
<p>So the <i>people</i> array in the example above will be represented as an array of string pointers to the locations of the objects in the array. Something like this:</p>
<pre escaped="true" lang="">
[0]: '{"name" : "Niklas", "occu...'
[1]: '{"name" : "Obama", "occup...'
</pre>
<p>The appealing thing about this approach is that it allocates almost no memory in addition to what the original JSON file uses. We only need a few bytes for the <i>Vectors</i> and <i>Maps</i> of the objects we are currently parsing.</p>
<p>This means that an in-place parser can handle very large files. A 2 GB JSON file is no problem: we can just mmap() it and run the in-place parser.</p>
<p>Since the bulk of the data is in the immutable JSON file, in-place parsers are also easy to multi-thread. Different threads can parse different branches of the JSON file without stepping on each other&#8217;s toes.</p>
<p>But what about speed? Since we avoid the busy work of allocating and shuffling nodes in a DOM tree, we might be faster than a DOM parser. On the other hand, we have to make multiple passes through the file. First to get the root object, then a second pass when we parse out its properties, etc.</p>
<h2>Performance analysis</h2>
<p>How bad are the multiple passes of the in-place parser? Let&#8217;s see how much memory they touch all together.</p>
<p>In the first pass we parse the entire file to find the keys and values of the root object. The cost of this is about <img src='http://s0.wp.com/latex.php?latex=O%28N%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O(N)' title='O(N)' class='latex' />, where <img src='http://s0.wp.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='N' title='N' class='latex' /> is the size of the JSON file.</p>
<p>In the second pass we parse the objects at the next level of nesting to find the location of their keys and values. The cost of this step is <img src='http://s0.wp.com/latex.php?latex=c_1+N+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='c_1 N ' title='c_1 N ' class='latex' />, where <img src='http://s0.wp.com/latex.php?latex=c_1+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='c_1 ' title='c_1 ' class='latex' /> is the fraction of the file covered by objects at this level, probably a value quite close to 100 %.</p>
<p>Continuing, we find the total parse cost to be:</p>
<p><img src='http://s0.wp.com/latex.php?latex=O%28N+%2B+c_1+N+%2B+c_2+N+%2B+c_3+N+%2B+...%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O(N + c_1 N + c_2 N + c_3 N + ...) ' title='O(N + c_1 N + c_2 N + c_3 N + ...) ' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=O%28%281+%2B+c_1+%2B+c_2+%2B+c_3+%2B+...%29+N%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O((1 + c_1 + c_2 + c_3 + ...) N) ' title='O((1 + c_1 + c_2 + c_3 + ...) N) ' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=O%28kN%29%2C+k+%3D+%281+%2B+c_1+%2B+c_2+%2B+c_3+%2B+...%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O(kN), k = (1 + c_1 + c_2 + c_3 + ...) ' title='O(kN), k = (1 + c_1 + c_2 + c_3 + ...) ' class='latex' /></p>
<p>Where <img src='http://s0.wp.com/latex.php?latex=k+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='k ' title='k ' class='latex' /> is the average &#8220;depth&#8221; or &#8220;level of nesting&#8221; of data in the file. </p>
<p>Let&#8217;s compare this with how much memory the DOM parser touches. The DOM parser touches the <img src='http://s0.wp.com/latex.php?latex=N+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='N ' title='N ' class='latex' /> bytes of the JSON file as it parses it. In the process it builds a DOM tree of comparable size, touching another <img src='http://s0.wp.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='N' title='N' class='latex' /> bytes. Then, when the DOM tree is read to build the application specific objects, it touches the <img src='http://s0.wp.com/latex.php?latex=N&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='N' title='N' class='latex' /> bytes again. And finally, when it tears down the DOM tree, it touches it again giving us a total cost of <img src='http://s0.wp.com/latex.php?latex=O%284N%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O(4N) ' title='O(4N) ' class='latex' />.</p>
<p>For a typical file that is not very deeply nested you would expect <img src='http://s0.wp.com/latex.php?latex=k+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='k ' title='k ' class='latex' /> to be maybe in the range <img src='http://s0.wp.com/latex.php?latex=4+%5Cle+k+%5Cle+8+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='4 &#92;le k &#92;le 8 ' title='4 &#92;le k &#92;le 8 ' class='latex' /> which means that <img src='http://s0.wp.com/latex.php?latex=O%28kN%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O(kN) ' title='O(kN) ' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=O%284N%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O(4N) ' title='O(4N) ' class='latex' /> are in the same ballpark. The devil then is in the constants. Does the tree building and node shuffling of the DOM method make it more expensive, or is the extra parsing performed by the in-place method more expensive?</p>
<h2>A test run</h2>
<p>To test this, I ran both parsers on the 150 MB JSON file that started all this. I created a lightweight &#8220;helper interface&#8221; for the in-place parser that gave it the same interface as our DOM nodes, so that I could switch parsers without modifying the data compiler.</p>
<p>The result:</p>
<pre escaped="true" lang="">
Original DOM-parsing:   4420.2 ms
In-place parsing:       5073.5 ms
</pre>
<p>Roughly the same, in other words; the in-place parser is actually about 15 % slower here. Damn, that&#8217;s disappointing! So while the in-place method saves a lot of memory, that saving does not translate to an immediate speed increase. At least not after the memory optimizations I made on the DOM tree.</p>
<p>The test JSON file has the bulk of its data in nodes with the path:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">root.<span style="color: #007788;">animations</span><span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span>.<span style="color: #007788;">stream</span>.<span style="color: #007788;">data</span><span style="color: #008000;">&#91;</span>t<span style="color: #008000;">&#93;</span></pre></td></tr></table></div>

<p>So <img src='http://s0.wp.com/latex.php?latex=k+%5Capprox+5+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='k &#92;approx 5 ' title='k &#92;approx 5 ' class='latex' /> in this case.</p>
<p>From here, I could go on and try various optimizations to improve the in-place parser, but to be fair I would have to spend an equal amount of time optimizing the tree parser, because neither has been extensively low-level optimized.</p>
<p>But frankly, this problem does not warrant that much attention. JSON files are never parsed by our runtime, only in our data compile step, and big files are rare, so this has no huge effect on our end users. I was mostly curious to see if moving to an in-place parser would give an order-of-magnitude improvement. Seems it doesn&#8217;t, at least not with the approach I have been using.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/07/25/an-in-place-parsing-experiment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lightweight Lua Bindings &#8212; Part 2</title>
		<link>http://www.altdevblogaday.com/2011/07/09/lightweight-lua-bindings-part-2/</link>
		<comments>http://www.altdevblogaday.com/2011/07/09/lightweight-lua-bindings-part-2/#comments</comments>
		<pubDate>Sat, 09 Jul 2011 18:25:56 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=10993</guid>
		<description><![CDATA[<p>In my <a href="http://altdevblogaday.com/2011/06/26/lightweight-lua-bindings/">last article</a> I showed how you could create lightweight and garbage collection friendly Lua bindings by using raw C pointers stored as <i>light user data</i>. Unfortunately, this approach doesn&#8217;t work for temporary objects, because if the objects don&#8217;t have permanent life on the C side, there is nothing for the raw pointers to point to.</p>
<p><a href="http://www.altdevblogaday.com/2011/07/09/lightweight-lua-bindings-part-2/" class="more-link">Read more on Lightweight Lua Bindings &#8212; Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>In my <a href="http://altdevblogaday.com/2011/06/26/lightweight-lua-bindings/">last article</a> I showed how you could create lightweight and garbage collection friendly Lua bindings by using raw C pointers stored as <i>light user data</i>. Unfortunately, this approach doesn&#8217;t work for temporary objects, because if the objects don&#8217;t have permanent life on the C side, there is nothing for the raw pointers to point to.</p>
<p>By far the most common source of temporary objects is mathematical computations involving vector3s and quaternions, such as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">local</span> p <span style="color: #66cc66;">=</span> Camera<span style="color: #66cc66;">.</span>get_position<span style="color: #66cc66;">&#40;</span>cam<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> velocity<span style="color: #66cc66;">*</span>dt</pre></td></tr></table></div>

<p>Here, temporary vector3s are generated as intermediate results in each step of the computation.</p>
<p>Note that computations with simple floats are not a problem, because they live on the stack rather than on the heap. They are passed by value and do not require garbage collection. It is only computations with bigger objects: vector3s, quaternions and matrices that cause us trouble.</p>
<p>The best thing would be if we could tell Lua to treat vector3s just as floats, allocate them on the stack and pass them by value. Unfortunately, Lua does not allow that. Only the built-in types (nil, bool, number, light user data) have value semantics. All other objects are references, heap allocated and garbage collected. (Note though, that there <i>are</i> garbage collected languages that support complex types with value semantics. For example, in C# you can create your own value types by using the <i>struct</i> keyword.)</p>
<p>If you are willing to modify Lua you can extend its value type and make room for vector3s. Just add a <i>Vector3</i> to the <i>Value</i> union in <i>lobject.h</i> and a corresponding new Lua type <i>LUA_TVECTOR3</i>. However, note that this will significantly increase the memory used by Lua, since the <i>Value</i> union is used everywhere, so it&#8217;s not something that I would recommend.</p>
<p>Two other techniques commonly used by garbage collected languages are <i>escape analysis</i> and <i>generational garbage collection</i>. With escape analysis the compiler tries to detect which objects can escape the current scope. If the compiler can detect that an object is truly temporary and doesn&#8217;t leave the scope, it can convert the heap allocation to a stack allocation. Generational garbage collection means that the garbage collector spends more time looking at newly created objects. This means that it&#8217;s less bad to create a lot of temporary garbage, because it will probably be cleaned up pretty quickly. It&#8217;s still not <i>good</i> though. And both approaches are a bit scary in that you are putting your fate completely in the hands of the compiler.</p>
<p>Lua has no escape analysis. There is an experimental generational garbage collector in Lua 5.2. </p>
<p>An unconventional approach that is possible in Lua is to represent a vector3, not as a compound object, but as three individual float values on the stack. This is possible since Lua allows functions to return multiple values:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">local</span> cx<span style="color: #66cc66;">,</span> cy<span style="color: #66cc66;">,</span> cz <span style="color: #66cc66;">=</span> Camera<span style="color: #66cc66;">.</span>get_position<span style="color: #66cc66;">&#40;</span>cam<span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">local</span> px<span style="color: #66cc66;">,</span> py<span style="color: #66cc66;">,</span> pz <span style="color: #66cc66;">=</span> Vector3<span style="color: #66cc66;">.</span>add<span style="color: #66cc66;">&#40;</span>cx<span style="color: #66cc66;">,</span> cy<span style="color: #66cc66;">,</span> cz<span style="color: #66cc66;">,</span> Vector3<span style="color: #66cc66;">.</span>multiply<span style="color: #66cc66;">&#40;</span>vx<span style="color: #66cc66;">,</span> vy<span style="color: #66cc66;">,</span> vz<span style="color: #66cc66;">,</span> t<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>Possible, and completely garbage free, but tedious and hard to read. I wouldn&#8217;t want to write a lot of code in this way. But it is a technique that can be good to remember if you want to trim your garbage. For example, if you write a Lua function that takes a rect, write it so that it takes <i>(x,y,w,h)</i> rather than <i>({x,y,w,h})</i> and your callers will generate a lot less garbage.</p>
<p>So, having exhausted all other possibilities, we are back where we started. We have to store the vector3s as user data. But we don&#8217;t want to be constantly creating and destroying them on the heap. So what can we do? We recycle them!</p>
<p>Instead of allocating new vector3s every time, we keep a pool of them around for use as temporary objects. And whenever we need a new one, we just get it from the pool.</p>
<p>Note that this approach works with both heavy and light user data. With heavy user data, the pool is a Lua table of heavy user data vector3s. With light user data, the pool is a permanent buffer on the C side and we represent vector3s as light user data pointers into that buffer. Either approach is possible, but the light user data approach has better performance. We can pack the data more tightly, and the Lua garbage collector doesn&#8217;t have to sweep the objects.</p>
<p>There is one final part of the puzzle and that is how we recycle the vector3s. When we generate a vector3 we don&#8217;t know where the user might store it and how long it might be in use. When is it safe to reuse a slot in the pool for a new vector3?</p>
<p>The ideal thing would be if we could detect exactly which vector3s Lua was using, so that we could recycle only those vector3s not currently in use. But if you think about it, that is exactly the garbage collection problem. And garbage collection was what we were trying to get away from! We don&#8217;t want to replace one kind of garbage collection with another. We need something more efficient.</p>
<p>There are a lot of approaches you could take to recycling. For example, you could ask the user to mark in various ways which objects can and cannot be recycled. Since this system is already quite complex and hard to understand for gameplay programmers, I&#8217;ve decided to go with the simplest possible rule:</p>
<blockquote><p>
At each frame flip, all of the vector3s in Lua are recycled.
</p></blockquote>
<p>I call this &#8220;The Great Death&#8221;. This means that you can do whatever you like with the vector3s in the current frame: add them, multiply them, etc. They will work just as vector3s in C++ and they won&#8217;t require garbage collection. But you can&#8217;t save them in a variable and use them the next frame. If you want to do that you will have to &#8220;box&#8221; them, like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">self<span style="color: #66cc66;">.</span>box <span style="color: #66cc66;">=</span> Vector3Box<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">...</span>
self<span style="color: #66cc66;">.</span>box<span style="color: #66cc66;">:</span>store<span style="color: #66cc66;">&#40;</span>Camera<span style="color: #66cc66;">.</span>get_position<span style="color: #66cc66;">&#40;</span>cam<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> t<span style="color: #66cc66;">*</span>velocity<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>A Vector3Box is a heap allocated heavy user data object that stores a vector3. You can extract it in some later frame by calling <i>self.box:retrieve()</i>. This gives you back a temporary vector3 valid for the current frame, that you can use for more computations. Since <i>Vector3Box()</i> allocates memory you want to minimize those calls. Create your boxes up front as member variables and then use <i>store()</i> and <i>retrieve()</i> to set and get data.</p>
<h2>The implementation</h2>
<p>Now that we have all the pieces in place, let&#8217;s see what the actual implementation looks like. At the C side we have a buffer that holds all our temporary vector3s.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">const int LUA_VECTOR3_BUFFER_SIZE <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">4096</span><span style="color: #66cc66;">;</span>
Vector3 lua_vector3_buffer<span style="color: #66cc66;">&#91;</span>LUA_VECTOR3_BUFFER_SIZE<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">;</span>
unsigned used_lua_vector3s <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">;</span></pre></td></tr></table></div>

<p>This buffer is statically allocated, so we can check if a light user data is a vector3 by checking if the pointer lies within the buffer&#8217;s range.</p>
<p>Whenever we need to return a new vector3 from a function in the Lua interface we just return <i>lua_vector3_buffer[used_lua_vector3s++]</i> (we check that the buffer doesn&#8217;t overflow) and each frame we reset <i>used_lua_vector3s = 0</i>.</p>
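<p>Put together, the C side of this could look roughly like the sketch below. The helper names <i>new_temp_vector3()</i>, <i>is_temp_vector3()</i> and <i>reset_temp_vector3s()</i> are hypothetical, the <i>Vector3</i> struct is a stand-in for the engine&#8217;s own type, and the Lua calls that would push the pointer as light user data are left out so that the example stays self-contained:</p>

```cpp
#include <cassert>
#include <cstdint>

struct Vector3 { float x, y, z; };   // stand-in for the engine's vector type

const unsigned LUA_VECTOR3_BUFFER_SIZE = 4096;
Vector3 lua_vector3_buffer[LUA_VECTOR3_BUFFER_SIZE];
unsigned used_lua_vector3s = 0;

// Hypothetical helper: hands out the next free slot in the temporary buffer.
// The returned pointer is what would be pushed to Lua as light user data.
Vector3 *new_temp_vector3(float x, float y, float z) {
    assert(used_lua_vector3s < LUA_VECTOR3_BUFFER_SIZE &&
           "too many temporary vector3s this frame");
    Vector3 *v = &lua_vector3_buffer[used_lua_vector3s++];
    v->x = x; v->y = y; v->z = z;
    return v;
}

// Hypothetical helper: a light user data pointer is a temporary vector3
// exactly when its address lies inside the statically allocated buffer.
bool is_temp_vector3(const void *p) {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(lua_vector3_buffer);
    const std::uintptr_t end = begin + sizeof(lua_vector3_buffer);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < end;
}

// Hypothetical helper: called once per frame flip to recycle every slot.
void reset_temp_vector3s() {
    used_lua_vector3s = 0;
}
```

<p>Note that recycling is just resetting a counter. Nothing is freed or cleared, which is why a pointer kept from an earlier frame still points at valid memory but may now hold somebody else&#8217;s data.</p>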
<p>It is nice to be able to use mathematical operators such as *, +, &#8211; etc on vector3s. Lua doesn&#8217;t allow light user data objects to have individual meta tables, <i>but</i> we can set one common meta table for all light user data objects. So we can get *, + and &#8211; for vector3s by putting them in that table.</p>
<p>If you want mathematical operators for other types, such as quaternions and matrices, you have to write the <i>__mul</i> function in the meta table so that it checks the objects&#8217; types and calls the right function. I don&#8217;t bother, because in my experience, it is only the operators for vector3 that really matter in terms of making the scripts readable. For the other classes I just use the functional style <i>Matrix4x4.multiply(m1, m2)</i>.</p>
<p>From the perspective of the gameplay programmer there are two drawbacks with this solution. The first is that if you use too many temporary vector3s in one frame you will overflow the buffer and assert. The second is that if you mess up and store a vector3 in a variable and use it in a later frame when it has been recycled for some other purpose, the variable will have silently &#8220;mutated&#8221; under your feet. A really confusing and hard-to-debug experience.</p>
<p>The first issue can be solved by giving the gameplay programmer some control over the <i>used_lua_vector3s</i> variable. So if she knows that she is going to do a bunch of computations in a scope and that no vector3 will escape that scope, she can manually save and restore the variable:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">local</span> c <span style="color: #66cc66;">=</span> Script<span style="color: #66cc66;">.</span>used_lua_vector3s<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">for</span> i<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1000</span> <span style="color: #aa9900; font-weight: bold;">do</span>
	Script<span style="color: #66cc66;">.</span>set_used_lua_vector3s<span style="color: #66cc66;">&#40;</span>c<span style="color: #66cc66;">&#41;</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> t <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">*</span><span style="color: #0000aa;">math.pi</span><span style="color: #66cc66;">*</span>i<span style="color: #66cc66;">/</span><span style="color: #cc66cc;">500</span>
	<span style="color: #aa9900; font-weight: bold;">local</span> pos <span style="color: #66cc66;">=</span> Vector3<span style="color: #66cc66;">&#40;</span><span style="color: #0000aa;">math.cos</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #0000aa;">math.sin</span><span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
	Unit<span style="color: #66cc66;">.</span>set_position<span style="color: #66cc66;">&#40;</span>unit<span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">,</span> pos<span style="color: #66cc66;">&#41;</span>
<span style="color: #aa9900; font-weight: bold;">end</span></pre></td></tr></table></div>

<p>To address the issue of using stale vector3s, note that all vector3 pointers are 12-byte aligned with respect to the start of the vector3 buffer. This means that we can add an arbitrary number 0&#8211;11 to the pointer and strip it out again (it is the pointer&#8217;s offset into the buffer modulo 12) when we access the vector3. So every frame, we can generate a random number in the range 0&#8211;11 and store that in the pointer to all vector3s that we create that frame. When a vector3 is passed to us from Lua, we check that its stored number matches the current random number. This means that every time a stale vector3 is used, we have an 11/12 chance of detecting that. Over time, QA and programmers should be able to detect all bad uses of stale vector3s and convert them to proper &#8220;boxed&#8221; values.</p>
<p>Of course, we only run this test in development builds. In release mode we can skip all this.</p>
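<p>A minimal sketch of that check, assuming a tightly packed 12-byte <i>Vector3</i>. The names <i>frame_tag</i>, <i>encode_temp_vector3()</i> and <i>decode_temp_vector3()</i> are hypothetical. Because 12 is not a power of two, the sketch recovers the stored number as the byte offset modulo 12 rather than with a bit mask:</p>

```cpp
#include <cstddef>

struct Vector3 { float x, y, z; };   // stand-in for the engine's vector type
static_assert(sizeof(Vector3) == 12, "sketch assumes tightly packed 12-byte vector3s");

const unsigned LUA_VECTOR3_BUFFER_SIZE = 4096;
Vector3 lua_vector3_buffer[LUA_VECTOR3_BUFFER_SIZE];

// Hypothetical: set to a new random value in the range 0..11 at each frame flip.
unsigned frame_tag = 0;

// Adds the current frame's tag to the byte address before the pointer is
// handed to Lua as light user data.
void *encode_temp_vector3(Vector3 *v) {
    return reinterpret_cast<char *>(v) + frame_tag;
}

// Returns the vector3 if the pointer carries the current frame's tag, or
// nullptr if it is stale. Assumes p has already been range-checked against
// the buffer, so the offset is non-negative.
Vector3 *decode_temp_vector3(void *p) {
    char *base = reinterpret_cast<char *>(lua_vector3_buffer);
    char *addr = static_cast<char *>(p);
    const std::ptrdiff_t offset = addr - base;
    const unsigned tag = static_cast<unsigned>(offset % 12);
    if (tag != frame_tag) return nullptr;   // created in an earlier frame
    return reinterpret_cast<Vector3 *>(addr - tag);
}
```

<p>A stale pointer slips through only when the new frame happens to roll the same tag, which is the 1/12 case.</p>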
<p>So there you have it. While this solution has its quirks it shows a perfectly workable way of doing decent performance vector3 calculations in Lua without involving the garbage collector.</p>
<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/07/09/lightweight-lua-bindings-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lightweight Lua Bindings</title>
		<link>http://www.altdevblogaday.com/2011/06/26/lightweight-lua-bindings/</link>
		<comments>http://www.altdevblogaday.com/2011/06/26/lightweight-lua-bindings/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 00:03:10 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=9709</guid>
		<description><![CDATA[<p>A scripting language, such as Lua, can bring huge productivity gains to a game project. Quick iterations, immediate code reloads and an in-game console with a <a href="http://en.wikipedia.org/wiki/Read-eval-print_loop">read-eval-print-loop</a> are invaluable tools. A less obvious benefit is that introducing a scripting language creates a clear dividing line between &#8220;engine&#8221; and &#8220;gameplay&#8221; code with a well defined API between them. This is often good for the structure of the engine, at least if you intend to use it for more than one game.</p>
<p><a href="http://www.altdevblogaday.com/2011/06/26/lightweight-lua-bindings/" class="more-link">Read more on Lightweight Lua Bindings&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>A scripting language, such as Lua, can bring huge productivity gains to a game project. Quick iterations, immediate code reloads and an in-game console with a <a href="http://en.wikipedia.org/wiki/Read-eval-print_loop">read-eval-print-loop</a> are invaluable tools. A less obvious benefit is that introducing a scripting language creates a clear dividing line between &#8220;engine&#8221; and &#8220;gameplay&#8221; code with a well defined API between them. This is often good for the structure of the engine, at least if you intend to use it for more than one game.</p>
<p>The main drawback is of course performance. It is a scary thing to discover late in a project that the game is slow because the script is doing too much. Especially since bad script performance cannot always be traced back to bugs or bad algorithms. Sure, you get those as well, but you can also get problems with &#8220;overall slowness&#8221; that cannot easily be traced back to specific bottlenecks or hot spots. There are two reasons for this. First, the slowness of script code compared to C, which means that everything just takes more time. And second, the fact that gameplay code tends to be &#8220;connection&#8221; rather than &#8220;compute&#8221; heavy which means there is less to gain from algorithmic improvements.</p>
<p>Part of this is a management issue. It is important to monitor the script performance (on the slowest target platform) throughout the production so that measures can be taken early if it looks like it will become a problem. But in this article I will focus on the technical aspects, specifically the C-to-Lua bindings.</p>
<p>It is important to note that when I am talking about performance in this article I mean performance on current generation consoles, because that is where performance problems occur. PC processors are much more powerful (especially when running virtual machines, which tend to be brutal to the cache). The extra cores on the consoles don&#8217;t help much with script execution (since scripts are connection heavy, they are hard to multithread). <i>And</i> the PC can run LuaJIT which <a href="http://luajit.org/performance_x86.html">changes the game completely</a>.</p>
<p>This may of course change in future generation consoles. If anyone from Sony or Microsoft is reading this, <i>please</i> add support for JITting to your next generation ventures.</p>
<h2>Lua bindings</h2>
<p>Apart from optimizing the Lua interpreter itself, optimizing the bindings between Lua and C is the best way of achieving a general performance improvement, since the bindings are used whenever Lua calls some function in the C code which in a typical game happens constantly.</p>
<p>The standard way of binding an object on the C side to Lua is to use a <i>full userdata</i> object. This is a heap allocated data blob with an associated <i>metatable</i> that can be used to store the methods of the object. This allows the user to make a call like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">game_world<span style="color: #66cc66;">:</span>get_camera<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">:</span>set_position<span style="color: #66cc66;">&#40;</span>Vector3<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>In many ways, this is the easiest and most convenient way of using objects in Lua, but it comes with several performance problems:</p>
<ul>
<li>Any time an object is passed from C to Lua, such as the camera in <i>get_camera()</i> or the vector created by <i>Vector3(0,0,0)</i>, memory for the object must be allocated on the heap. This can be costly.</li>
<li>All the heap objects must be garbage collected by Lua. Calls such as <tt>get_camera()</tt> create temporary objects that must be collected at some later time. The more garbage we create, the more time we need to spend in garbage collection.</li>
<li>Making use of many heap allocated objects can lead to bad cache performance. When the C side wants to use an object from Lua, it must first fetch it from Lua&#8217;s heap, then (in most cases) extract an object pointer from its data and look up the object in the game heap. So each time there is an extra cache miss.</li>
<li>The colon method call syntax <i>world:get_camera()</i> actually translates to something like (I&#8217;ve simplified this a bit, see the Lua documentation for details) <i>world._meta_table["get_camera"](world)</i>. I.e., it creates an extra table lookup operation for every call.</li>
</ul>
<p>We can get rid of the first two issues by caching the Lua objects. I.e. instead of creating a new Lua object every time <i>get_camera()</i> is called, we keep a reference to the object on the Lua side and just look it up and return it every time it is requested. But this has other disadvantages. Managing the cache can be tricky and it creates a lot more objects in the Lua heap, since the heap will now hold every object that has ever been touched by Lua. This makes garbage collection take longer and the heap can grow uncontrollably during the play of a level, depending on which objects the player interacts with. Also, this doesn&#8217;t solve the issue with objects that are truly temporary, such as <i>Vector3(0,0,0)</i>.</p>
<p>A better option is to use what Lua calls <i>light userdata</i>. A light userdata is essentially just a C pointer stored in Lua, with no additional information. It is stored by value, like a number (on the Lua stack or directly in a table slot, not as a separate heap object), does not require any memory allocations, does not participate in garbage collection and does not have a metatable of its own. This addresses all our performance problems, but introduces new (not performance-related) issues:</p>
<ul>
<li>Since the objects don&#8217;t have metatables we cannot use the convenient colon syntax for calling their methods.</li>
<li>Light userdata objects do not carry any type information; they are just raw pointers. So on the C side we have no way of telling if we have been called with an object of the right type.</li>
<li>Lifetime management is trickier since objects do not have destructors and are not garbage collected. How do we manage dangling pointers in Lua?</li>
</ul>
<h3>Colon syntax</h3>
<p>With light userdata we cannot use the colon syntax to look up methods. Instead we must call global functions and pass in the objects as parameters. But we can still make sure to organize our functions nicely, i.e., put all the functions that operate on <i>World</i> objects in a table called <i>World</i>. It might then look something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">Camera<span style="color: #66cc66;">.</span>set_position<span style="color: #66cc66;">&#40;</span>World<span style="color: #66cc66;">.</span>get_camera<span style="color: #66cc66;">&#40;</span>game_world<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> Vector3<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>If you are used to the object-oriented style, this way of writing can feel awkward at first. But in my experience you get accustomed to it quite quickly. It does have some implications which are not purely syntactical though. On the plus side, this style of writing makes it easy to cache the method lookups for better performance:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;"><span style="color: #aa9900; font-weight: bold;">local</span> camera_set_position <span style="color: #66cc66;">=</span> Camera<span style="color: #66cc66;">.</span>set_position
<span style="color: #aa9900; font-weight: bold;">local</span> world_get_camera <span style="color: #66cc66;">=</span> World<span style="color: #66cc66;">.</span>get_camera
&nbsp;
camera_set_position<span style="color: #66cc66;">&#40;</span>world_get_camera<span style="color: #66cc66;">&#40;</span>game_world<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> Vector3<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>This transformation is so simple that you can easily write a script that performs it on your entire code base.</p>
<p>The main drawback is that we are no longer doing dynamic method lookup; we are calling one specific C function. So we can&#8217;t do inheritance with method overrides on the Lua side. To me that is not a big problem because, firstly, I think inheritance is vastly overrated as a design concept, and secondly, if you really need virtual calls you can always do the virtual method resolution on the C side and get the benefits while still having a static call in Lua.</p>
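<p>A minimal sketch of that last point, assuming hypothetical <i>Unit</i> and <i>Boss</i> classes (none of these names come from the engine). Lua would only ever see the single function <i>unit_max_health()</i>, registered as something like <i>Unit.max_health</i>, while the override is resolved by the C++ virtual call:</p>

```cpp
#include <cassert>

// Hypothetical class hierarchy, for illustration only.
struct Unit {
    virtual ~Unit() {}
    virtual int max_health() const { return 100; }
};

struct Boss : Unit {
    virtual int max_health() const { return 1000; }
};

// The one function exposed to Lua. The argument is the raw pointer taken
// from a light userdata; it must have been stored as a Unit *.
inline int unit_max_health(void *p)
{
    assert(p != nullptr);
    return static_cast<Unit *>(p)->max_health();
}
```

<p>The Lua side keeps its cached, static function reference, and the choice between <i>Unit</i> and <i>Boss</i> behavior is made entirely in C++.</p>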
<h3>Type checking</h3>
<p>For full userdata we can check the type by looking at the metatable. The Lua library function <i>luaL_checkudata</i> provides this service. Since light userdata is just a raw pointer to Lua, no corresponding functionality is offered. So we need to provide the type checking ourselves. But how can we know the type of an arbitrary C pointer?</p>
<p>An important thing to notice is that type checking is only used for debugging. We only need to know if a function has been called with the right arguments or not. So we don&#8217;t actually need to know the exact type of the pointer, we just need to know if it points to the thing we expect. And since this is only used for bug detection, it doesn&#8217;t matter if we get a few false positives. And it is fine if the test takes a few cycles since we can strip it from our release builds.</p>
<p>Since we just need to know &#8220;is the object of this type&#8221; we can make the test different for each type. So for each type, we can just pick whatever test fits that type best. Some possibilities are:</p>
<ul>
<li>Store a known four byte type marker at the start of the object&#8217;s memory. To verify the type, just dereference the pointer and check that the first four bytes match the expected marker. (This is the method I use most frequently.)</li>
<li>Keep a hash table of all objects of the specified type and check if it is there.</li>
<li>For objects that are allocated from a pool, check that the pointer lies within the range of the pool.</li>
</ul>
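<p>The first option can be sketched roughly as follows. The marker values, the struct layouts and the helper names <i>has_marker()</i> and <i>check_camera()</i> are assumptions made for this example, not the engine&#8217;s actual code:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical four-byte markers. The values are arbitrary; they only need
// to differ between types.
const uint32_t CAMERA_MARKER = 0x43414d21u;
const uint32_t WORLD_MARKER  = 0x574c4421u;

struct Camera {
    uint32_t marker;      // must be the first member
    float position[3];
};

struct World {
    uint32_t marker;      // must be the first member
    Camera *camera;
};

// Returns true if the first four bytes at p match the expected marker.
// memcpy avoids dereferencing p as a type it may not actually have.
inline bool has_marker(const void *p, uint32_t expected)
{
    if (p == nullptr)
        return false;
    uint32_t found;
    std::memcpy(&found, p, sizeof(found));
    return found == expected;
}

// What a binding function might do with a pointer it received from Lua.
// The assert disappears when NDEBUG is defined, i.e. in release builds.
inline Camera *check_camera(void *p)
{
    assert(has_marker(p, CAMERA_MARKER) && "expected a Camera");
    return static_cast<Camera *>(p);
}
```

<p>As noted above, this can give false positives: any memory that happens to start with the same four bytes passes the test. For bug detection that is usually good enough.</p>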
<h3>Object lifetimes</h3>
<p>There are two approaches you can take to ownership of objects in the Lua interface. They can either be Lua owned and destroyed by the garbage collector or they can be owned by the C side and destroyed by explicit function calls. Both approaches have their advantages, but I usually lean towards the latter one. To me it feels more natural that Lua explicitly creates and destroys cameras with <i>World.destroy_camera()</i> rather than cameras just popping out of existence when the garbage collector feels they are no longer used. Also, since in our engine, Lua is an option, not a requirement, it makes more sense to have the ownership on the C side.</p>
<p>With this approach you have the problem that Lua can hold &#8220;dangling pointers&#8221; to C objects, which can lead to nasty bugs. (If you took the other approach, you would have the opposite problem, which is equally nasty.)</p>
<p>Again, for debugging purposes, we would want to do something similar to what we did with the type information. We would like to know, in debug builds, if the programmer has passed us a pointer to a dead object, so that we can display an error message rather than exhibit undefined behavior.</p>
<p>This is a trickier issue and I haven&#8217;t found a clear-cut solution, but here are some of the techniques I have used:</p>
<ul>
<li>Clear out the marker field of the object when it is freed. That way if you attempt to use it later you will get a type error. Of course, checking this can cause an access violation if the memory has been returned to the system.</li>
<li>For objects that get created and destroyed a lot, such as particles or sound instances, let Lua manage them by IDs rather than by raw pointers.</li>
<li>Keep a hash table of all known live objects of the type.</li>
<li>Let Lua point to the object indirectly through a handle. Use some bits of the pointer to locate the handle and match the rest to a counter in the handle so that you can detect if the handle has been released and repurposed for something else.</li>
</ul>
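<p>The last technique can be sketched like this. The 16/16 bit split, the <i>HandleTable</i> class and its method names are assumptions chosen for the example; a real implementation would pick the split to match its object counts:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: the low 16 bits select a slot, the high 16 bits hold the
// generation counter the slot had when the handle was created.
const uint32_t INDEX_BITS = 16;
const uint32_t INDEX_MASK = (1u << INDEX_BITS) - 1;

struct Slot {
    void *object;         // nullptr while the slot is free
    uint16_t generation;  // bumped every time the slot is released
};

class HandleTable {
public:
    // Stores the object in a free slot and returns a handle to it.
    uint32_t acquire(void *object)
    {
        uint32_t index;
        if (!_free.empty()) {
            index = _free.back();
            _free.pop_back();
        } else {
            index = static_cast<uint32_t>(_slots.size());
            assert(index <= INDEX_MASK);  // out of slots for this bit split
            Slot s = { nullptr, 0 };
            _slots.push_back(s);
        }
        _slots[index].object = object;
        return (static_cast<uint32_t>(_slots[index].generation) << INDEX_BITS) | index;
    }

    // Releases the slot. Old handles to it stop resolving from now on.
    void release(uint32_t handle)
    {
        if (lookup(handle) == nullptr)
            return;
        uint32_t index = handle & INDEX_MASK;
        _slots[index].object = nullptr;
        ++_slots[index].generation;
        _free.push_back(index);
    }

    // Returns the object, or nullptr if the handle is stale or invalid.
    void *lookup(uint32_t handle) const
    {
        uint32_t index = handle & INDEX_MASK;
        if (index >= _slots.size())
            return nullptr;
        const Slot &s = _slots[index];
        if (s.generation != static_cast<uint16_t>(handle >> INDEX_BITS))
            return nullptr;
        return s.object;
    }

private:
    std::vector<Slot> _slots;
    std::vector<uint32_t> _free;
};
```

<p>A 32-bit handle like this fits in the pointer-sized payload of a light userdata. Note that the generation counter wraps around after 65536 releases of the same slot, so a very old handle could in principle resolve again; for debugging purposes that is usually an acceptable risk.</p>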
<h2>Conclusions</h2>
<p>Using light instead of full userdata does make things more inconvenient. But as we have seen, there are tricks that help overcome many of these inconveniences.</p>
<p>We still haven&#8217;t looked at the truly temporary objects, such as <i>Vector3(0,0,0)</i>. In my next article I will discuss what can be done about them.</p>
<p><span style="font-size:small">(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/06/26/lightweight-lua-bindings/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Strings Redux</title>
		<link>http://www.altdevblogaday.com/2011/06/10/strings-redux/</link>
		<comments>http://www.altdevblogaday.com/2011/06/10/strings-redux/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 06:56:47 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=8098</guid>
		<description><![CDATA[<p>Simpler programs are better programs. Today&#8217;s target: strings. In this post I will show you three ways of improving your code by simplifying your strings.</p>
<h2>1. Use UTF-8 everywhere</h2>
<p>When I issue programming tests I always have some question about different string encodings. It is a good way of testing if a candidate can distinguish what data represents from how it is represented. But when I write code I just use UTF-8 everywhere, both in memory and on disk. Why? UTF-8 has many advantages and no serious disadvantages.</p>
<p><a href="http://www.altdevblogaday.com/2011/06/10/strings-redux/" class="more-link">Read more on Strings Redux&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Simpler programs are better programs. Today&#8217;s target: strings. In this post I will show you three ways of improving your code by simplifying your strings.</p>
<h2>1. Use UTF-8 everywhere</h2>
<p>When I issue programming tests I always have some question about different string encodings. It is a good way of testing if a candidate can distinguish what data represents from how it is represented. But when I write code I just use UTF-8 everywhere, both in memory and on disk. Why? UTF-8 has many advantages and no serious disadvantages.</p>
<p>Advantages:</p>
<ul>
<li>Using the same encoding everywhere means there is never any confusion about what encoding a certain string or file should be in. If it is not in UTF-8, then it is <i>wrong</i>. Period.</li>
<li>UTF-8 uses the standard C data types for strings: <i>char *</i> and <i>char []</i>.</li>
<li>ASCII strings look like ASCII strings and all functions, parsers, etc that operate on ASCII strings work on UTF-8 strings without modification.</li>
</ul>
<p>The most common disadvantages claimed for UTF-8 are:</p>
<ul>
<li>UTF-8 can waste memory.</li>
<li>Finding the i’th glyph in a UTF-8 string is expensive (O(n) rather than O(1)).</li>
</ul>
<p>There is some truth to the first point. Yes, if your text is in Japanese, UTF-8 probably uses more memory than Shift-JIS. But I don’t think that is a major issue. First, while UTF-8 is worse than other encodings for some languages, it does pretty well on average. Second, strings aren’t a big part of a game’s memory usage anyway (if they are, you are most likely doing something wrong). And third, if you care that much about string memory usage you should probably compress your string data.</p>
<p>Compression will pretty much nullify any differences in memory usage caused by using different encodings, since the entropy of the underlying data is the same regardless of how it is encoded. (At least in theory. It would be interesting to see someone test it in practice.)</p>
<p>The second point is true but also moot, since accessing glyphs at random indices in a string is a much rarer operation than you might think. For most string operations (concatenation, parsing, etc.) you never have to access individual glyphs. You can just use the same implementation as you would use for an ASCII string and it will work without modification.</p>
<p>In the few cases where you <i>do</i> need to convert to glyphs (for example for rendering) you typically do that <i>sequentially</i>, from the start to the end. This is still a fast operation; it is only <i>random access</i> of glyphs that is significantly slower with UTF-8 than with UTF-32. Another interesting thing to note is that since all continuation bytes in UTF-8 follow the pattern 10xxxxxx, and no other bytes do, you can quickly find the start and end of the next or previous glyph given a <i>char *</i> to anywhere within a UTF-8 string.</p>
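<p>A small sketch of that continuation-byte trick. The function names are made up for this example, the input is assumed to be valid, zero-terminated UTF-8, and &#8220;glyph&#8221; is used in the same loose sense as above (one encoded code point):</p>

```cpp
#include <cassert>
#include <cstddef>

// A continuation byte has the bit pattern 10xxxxxx.
inline bool is_continuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Start of the glyph that contains *p. `begin` is the start of the string.
inline const char *glyph_start(const char *begin, const char *p)
{
    assert(p >= begin);
    while (p > begin && is_continuation(*p))
        --p;
    return p;
}

// Start of the glyph after the one that contains *p, or the terminating
// zero if there is no next glyph.
inline const char *next_glyph(const char *p)
{
    if (*p)
        ++p;
    while (is_continuation(*p))
        ++p;
    return p;
}

// Sequential pass over the string: every byte that is not a continuation
// byte starts a new glyph.
inline std::size_t glyph_count(const char *s)
{
    std::size_t n = 0;
    for (; *s; ++s)
        if (!is_continuation(*s))
            ++n;
    return n;
}
```

<p>None of these functions needs to decode the code point itself; they only look at the top two bits of each byte.</p>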
<p>In fact I can&#8217;t think of any string operation that requires fast random access to glyphs other than completely contrived examples (given 10000 long strings, find the 1000th glyph in each). I urge my readers to try to come up with something.</p>
<h2>2. You do not need a string class</h2>
<p>String classes are highly overrated.</p>
<p>Generally speaking, code that deals with strings can be divided into two categories: code that looks at static strings (parsers, data compilers, script callbacks, etc) and code that builds dynamic strings (template formatters, debug logging, etc). In a typical game project there is a lot more of the former than the latter. Ironically, string classes don’t do a very good job with <i>either</i>!</p>
<p>For code that deals with static strings you should always use <i>const char *</i> rather than <i>const string &amp;</i>. The former is more flexible. It allows the caller to store her strings however she likes rather than adhering to some memory model imposed by the string class. It also means that if you call the function with a static string it doesn’t get pointlessly converted to a <i>string</i> object.</p>
<p>But string classes aren’t very good for dynamic strings either, as anyone who has written something like this can attest to:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">string a<span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">10000</span><span style="color: #008080;">;</span> <span style="color: #000040;">++</span>i<span style="color: #008000;">&#41;</span>
    a <span style="color: #000040;">+</span><span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;xxx&quot;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Depending on how your string class is implemented this can be horribly inefficient, reallocating and copying the string memory for every iteration of the loop. There are various ways of addressing this: reserving memory for the string up front or using some kind of &#8220;rope&#8221; or &#8220;stringstream&#8221; class.</p>
<p>The simpler approach is to just use:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">vector<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">char</span><span style="color: #000080;">&gt;</span> a<span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">10000</span><span style="color: #008080;">;</span> <span style="color: #000040;">++</span>i<span style="color: #008000;">&#41;</span>
	string<span style="color: #008080;">::</span><span style="color: #007788;">append</span><span style="color: #008000;">&#40;</span>a, <span style="color: #FF0000;">&quot;xxx&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>We represent the string as a vector of chars and provide a library of functions for performing &#8220;common string operations&#8221; on that representation.</p>
<p>The advantage of this over using a regular string class is that it provides a clear distinction between strings that can grow (vector&lt;char&gt;) and strings that can&#8217;t (char *) and emphasizes what the cost of growing is (amortized linear time in the number of characters appended). Do you know the cost of growing in your string class?</p>
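<p>Such a function library can be very small. Here is a minimal sketch of an append function on top of <i>std::vector&lt;char&gt;</i>. The namespace is called <i>str</i> here only to avoid clashing with <i>std::string</i>, and the choice not to store a zero terminator in the vector is an assumption of this example:</p>

```cpp
#include <cassert>
#include <cstring>
#include <vector>

namespace str {

// Appends the zero-terminated string s to a. The vector holds the raw
// bytes only; no terminator is stored.
inline void append(std::vector<char> &a, const char *s)
{
    assert(s != nullptr);
    a.insert(a.end(), s, s + std::strlen(s));
}

// Appends a single byte.
inline void append(std::vector<char> &a, char c)
{
    a.push_back(c);
}

} // namespace str
```

<p>Since the vector grows geometrically, the loop above performs only a handful of reallocations instead of one per iteration, and a call to <i>reserve()</i> up front removes even those.</p>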
<h2>3. You should almost never use strings in your runtime</h2>
<p>The variable length nature of strings makes them slow, memory consuming and unwieldy (memory for them must be allocated and freed). If you use fixed length strings you will either use even more memory or annoy the content creators because they can&#8217;t make their resource names as descriptive as they would like to.</p>
<p>For these reasons I think that strings in the runtime should be reserved for two purposes:</p>
<ul>
<li>User interface text</li>
<li>Debugging</li>
</ul>
<p>In particular, you shouldn&#8217;t use strings for object/resource/parameter names in the runtime. Instead use string hashes. This lets you use user-friendly names (strings) in your tools and fast ints in your runtime. It is also a lot easier to use than enums. Enums require global cooperation to avoid collisions. String hashes just require that you hash into a large enough key space.</p>
<p>We hash names during our data compile stage into either 32-bit or 64-bit ints depending on the risk of collision. If it is a global object name (such as the name of a texture) we use 64-bit ints. If it is a local name (such as the name of a bone in a character) we use 32-bit ints. Hash collision is considered a compile error. (It hasn&#8217;t happened yet.)</p>
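<p>The hash function itself is not important for the argument. As a stand-in, this sketch uses 32-bit and 64-bit FNV-1a (an assumption of the example, not necessarily what our data compiler uses), together with a hypothetical <i>register_name()</i> helper that shows how a collision can be detected at compile time:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// 32-bit FNV-1a, for local names such as bones.
inline uint32_t hash32(const char *s)
{
    assert(s != nullptr);
    uint32_t h = 2166136261u;
    for (; *s; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 16777619u;
    }
    return h;
}

// 64-bit FNV-1a, for global names such as resources.
inline uint64_t hash64(const char *s)
{
    assert(s != nullptr);
    uint64_t h = 14695981039346656037ull;
    for (; *s; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ull;
    }
    return h;
}

typedef std::unordered_map<uint64_t, std::string> SeenNames;

// Records the name under its hash. Returns false if a *different* name
// has already produced the same hash, which the data compiler would
// report as an error.
inline bool register_name(SeenNames &seen, const char *name)
{
    std::pair<SeenNames::iterator, bool> r =
        seen.insert(std::make_pair(hash64(name), std::string(name)));
    return r.second || r.first->second == name;
}
```

<p>The runtime only ever sees the integer results; the strings and the <i>SeenNames</i> table live in the tools.</p>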
<p>Since user interface text should always be localized, all user interface strings are managed by the localizer. The localized text is fetched from the localizer with a string lookup key, such as &#8220;menu_file_open&#8221; (hashed to a 64-bit int of course).</p>
<p>This only leaves debugging. We use formatted strings for informative assert messages when something goes wrong. Our profiler and monitoring tools use <a href="http://altdevblogaday.com/2011/05/26/monitoring-your-game/">interned strings</a> to identify data. Our game programmers use debug-prints to root out problems. Of course, none of this affects the end user, since the debugging strings are only used in debug builds.</p>
<p>Hashes can be problematic when debugging. If there is an error in the resource 0x3e728af10245bc71 it is not immediately obvious that it is the object <i>vegetation/trees/larch_3.mesh</i> that is at fault.</p>
<p>We handle this with a lookup table. When we compile our data we also create a reverse lookup table that converts from a hash value back to the original string that generated it. This table is not loaded by the runtime, but it can be accessed by our tools. So our game console, for instance, uses this table to automatically translate any hash IDs that are printed by the game.</p>
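<p>On the tool side, such a table can be as simple as a map from hash to string. The sketch below is an illustration under that assumption; <i>ReverseLookup</i>, <i>add_name()</i> and <i>describe()</i> are hypothetical names, and the hash value is the made-up one from the example above:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

typedef std::unordered_map<uint64_t, std::string> ReverseLookup;

// Called by the data compiler for every name it hashes.
inline void add_name(ReverseLookup &table, uint64_t hash, const char *name)
{
    ReverseLookup::const_iterator it = table.find(hash);
    // Two different names with the same hash would be a compile error.
    assert(it == table.end() || it->second == name);
    table[hash] = name;
}

// Called by a tool such as the console: returns the original name, or the
// hash formatted as hex if the table does not know it.
inline std::string describe(const ReverseLookup &table, uint64_t hash)
{
    ReverseLookup::const_iterator it = table.find(hash);
    if (it != table.end())
        return it->second;
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "0x%016llx",
                  static_cast<unsigned long long>(hash));
    return buffer;
}
```

<p>Because the lookup happens in the tool, the game itself never pays for the strings, neither in memory nor in load time.</p>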
<p>However, recently I&#8217;ve started to also add small fixed-size debug strings to the resources themselves. Something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">HashMap<span style="color: #000080;">&lt;</span>IdString64, MeshResource <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> _meshes<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> MeshResource
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">char</span> debug_name<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">32</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
	…
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>As you can see, all the lookup tables etc, still use the 64-bit hash to identify the resource. But inside the resource is a 32-byte human friendly name (typically, the last 32 characters of the resource name), which is only used for debugging. This doesn&#8217;t add much to the resource size (most resources are a lot bigger than 32 bytes) but it allows us to quickly identify a resource in the debugger or in a raw memory dump without having to open up a tool to convert hashes back to strings. I think the time saved by this is worth those extra bytes.</p>
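<p>Filling in the debug name at compile time might look like the sketch below. The helper <i>set_debug_name()</i> is hypothetical, and it keeps only the last 31 characters so that the field is always zero-terminated, which is an assumption made here to keep the name printable as a C string:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct MeshResource
{
    char debug_name[32];
    // ... the rest of the resource data
};

// Copies the tail of `name` into `dest`, always leaving room for the
// zero terminator. The tail is kept because it is usually the most
// specific part of a resource path.
template <std::size_t N>
inline void set_debug_name(char (&dest)[N], const char *name)
{
    assert(name != nullptr);
    std::size_t len = std::strlen(name);
    const char *tail = len >= N ? name + (len - (N - 1)) : name;
    std::memcpy(dest, tail, std::strlen(tail) + 1);
}
```

<p>With this, <i>vegetation/trees/larch_3.mesh</i> is stored unchanged, while a longer path loses its leading directories rather than its file name.</p>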
<p><span style="font-size:small">(This has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/06/10/strings-redux/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Monitoring your game</title>
		<link>http://www.altdevblogaday.com/2011/05/26/monitoring-your-game/</link>
		<comments>http://www.altdevblogaday.com/2011/05/26/monitoring-your-game/#comments</comments>
		<pubDate>Thu, 26 May 2011 19:52:44 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=6772</guid>
		<description><![CDATA[<p>Many bugs are easy to fix with debuggers, stack traces and printf-statements. But some are hard to even <i>see</i> with such tools. I&#8217;m thinking of things like frame rate hitches, animation glitches and camera stutters. You can&#8217;t put a breakpoint on the glitch because what constitutes a glitch is only defined in relation to what happened in the frame before or what will happen in the next frame. And even if you are able to break exactly when the glitch occurs, you might not be able to tell what is going on from the call stack.</p>
<p><a href="http://www.altdevblogaday.com/2011/05/26/monitoring-your-game/" class="more-link">Read more on Monitoring your game&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Many bugs are easy to fix with debuggers, stack traces and printf-statements. But some are hard to even <i>see</i> with such tools. I&#8217;m thinking of things like frame rate hitches, animation glitches and camera stutters. You can&#8217;t put a breakpoint on the glitch because what constitutes a glitch is only defined in relation to what happened in the frame before or what will happen in the next frame. And even if you are able to break exactly when the glitch occurs, you might not be able to tell what is going on from the call stack.</p>
<p>In these situations, some way of monitoring and visualizing your game&#8217;s behavior can be invaluable. Indeed, if we graph the delta time for each frame, the hitches stand out clear as day.</p>
<div id="attachment_6773" class="wp-caption aligncenter" style="width: 673px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor1.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor1.png" alt="" width="663" height="650" class="size-full wp-image-6773" /></a><p class="wp-caption-text">Delta-time graph with frame rate drops.</p></div>
<p>A graph like this opens up many new ways of attacking glitch bugs. You can play the game with the graph displayed and try to see what game actions trigger the glitches. Do they happen when a certain enemy is spawned? When a particular weapon is fired? Another approach is to draw the total frame time together with the time spent in all the different subsystems. This immediately shows you which subsystem is causing the frame rate to spike. You can constrain the problem further by graphing the time spent in narrower and narrower profiler scopes.</p>
<p>Visualization tools like these can help with many other issues as well. Want to find out where a weird camera stutter comes from? Plot the camera position, the position of its look-at target and any other variables that may influence its behavior to pin down the source of the problem. Draw a <a href="http://altdevblogaday.com/2011/05/17/a-birds-eye-view-of-your-memory-map/">graph representing your memory fragmentation</a> to find problematic allocations and get an overall feeling for how bad the situation is. Does something look slightly off with the animations? Graph the bone rotations to make sure that you don&#8217;t have any vibrations or discontinuities. Graph your network usage to make sure you stay below the bandwidth cap.</p>
<div id="attachment_6774" class="wp-caption aligncenter" style="width: 664px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor2.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor2.png" alt="" width="654" height="642" class="size-full wp-image-6774" /></a><p class="wp-caption-text">Rotation of a bone during a jump animation.</p></div>
<p>When you study your game in this way, you will most likely learn things that surprise you. Games are highly complex systems built by a large number of people over a long period of time. Like all complex systems, they show emergent behavior. You can be quite certain that someone has done at least something that is <i>completely unexpected and totally weird</i>. You can&#8217;t hope to discover these things using just a bottom-up approach. There is too much code and too much data. Instead you must study your game as if it were an alien organism. Prod it and see how it reacts. Keep the graphs on screen and make sure that they look sane.</p>
<p>There are many different kinds of data that can be interesting and many ways of visualizing them &#8211; graphs, bars, charts, etc. But in all cases the pattern is pretty much the same. We have some data that we record from the game and then we have a visualizer that takes this data and draws it in some interesting way. Schematically, we can represent it like this:</p>
<div id="attachment_6775" class="wp-caption aligncenter" style="width: 665px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor3.jpeg"><img src="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor3.jpeg" alt="" width="655" height="548" class="size-full wp-image-6775" /></a><p class="wp-caption-text">Basic monitoring system schematic.</p></div>
<p>I will refine this picture shortly, but first let&#8217;s do a little data-oriented design and ask ourselves how we can best store and process this data.</p>
<p>If you have read any of my earlier blog posts you will know that I&#8217;m a fan of big dumb contiguous memory buffers and data structures that look like &#8220;file formats for memory&#8221;. And this approach works perfectly for this problem. We can just store the data as a big block of concatenated structs, where each struct represents some recorded data. We begin each record with an enum specifying the type of recorded event and follow that with a variable sized struct with data for that particular event.</p>
<div id="attachment_6776" class="wp-caption aligncenter" style="width: 654px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor4.jpeg"><img src="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor4.jpeg" alt="" width="644" height="180" class="size-full wp-image-6776" /></a><p class="wp-caption-text">Data buffer layout.</p></div>
<p>The event types might be things such as ENTER_PROFILER_SCOPE, LEAVE_PROFILER_SCOPE, ALLOCATE_MEMORY, FREE_MEMORY, RECORD_GLOBAL_FLOAT, etc.</p>
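<p>To make the layout concrete, here is a rough sketch of writing and walking such a buffer. The payload structs, their fields and the numeric tag values are invented for the example; only the event type names come from the list above:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum { ALLOCATE_MEMORY = 1, FREE_MEMORY = 2 };  // arbitrary tag values

// Hypothetical payloads, one struct per event type.
struct AllocateMemoryEvent { uint64_t address; uint64_t size; };
struct FreeMemoryEvent     { uint64_t address; };

// Appends a four-byte type tag followed by the raw bytes of the payload.
template <class T>
inline void write_event(std::vector<char> &buffer, uint32_t type, const T &event)
{
    const char *t = reinterpret_cast<const char *>(&type);
    const char *e = reinterpret_cast<const char *>(&event);
    buffer.insert(buffer.end(), t, t + sizeof(type));
    buffer.insert(buffer.end(), e, e + sizeof(T));
}

// The tag tells the reader how many payload bytes follow.
inline std::size_t payload_size(uint32_t type)
{
    switch (type) {
        case ALLOCATE_MEMORY: return sizeof(AllocateMemoryEvent);
        case FREE_MEMORY:     return sizeof(FreeMemoryEvent);
        default:              return 0;  // unknown event type
    }
}

// Walks the buffer from start to end and counts events of one type.
inline unsigned count_events(const std::vector<char> &buffer, uint32_t wanted)
{
    unsigned n = 0;
    std::size_t pos = 0;
    while (pos + sizeof(uint32_t) <= buffer.size()) {
        uint32_t type;
        std::memcpy(&type, &buffer[pos], sizeof(type));
        std::size_t size = payload_size(type);
        if (size == 0)
            break;  // cannot skip an event of unknown size
        assert(pos + sizeof(type) + size <= buffer.size());
        if (type == wanted)
            ++n;
        pos += sizeof(type) + size;
    }
    return n;
}
```

<p>A visualizer is then just a loop of this shape with a switch on the event type, which is what makes the format feel like a file format for memory.</p>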
<p>RECORD_GLOBAL_FLOAT is the event type used for all kinds of data that we want to draw in graphs. We record the data with calls like these:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">record_global_float<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;application.delta_time&quot;</span><span style="color: #339933;">,</span> dt<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
record_global_float<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;application.frame_rate&quot;</span><span style="color: #339933;">,</span> <span style="color:#800080;">1.0f</span> <span style="color: #339933;">/</span> dt<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>The corresponding data struct is just:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> RecordGlobalFloatEvent <span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>name<span style="color: #339933;">;</span>
    <span style="color: #993333;">float</span> value<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>Note that there is an interesting little trick being used here. When we record the events, we just record the string <i>pointers</i>, not the complete string data. This saves memory, makes the struct fixed size and gives us faster string compares. This works because <i>record_global_float()</i> is called with static string data that is always at the same address and kept in memory throughout the lifetime of the application. (In the rare case where you want to call <i>record_global_float()</i> with a dynamic string, you must allocate a copy of that string at some permanent location, i.e. do a form of <a href="http://en.wikipedia.org/wiki/String_interning">string interning</a>.)</p>
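<p>For the dynamic string case, interning can be sketched in a few lines. The <i>intern()</i> function and the use of <i>std::unordered_set</i> as the pool are assumptions of this example; the pool is expected to live for the rest of the application, and the sketch is not thread-safe:</p>

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Returns a pointer that is identical for equal strings and that stays
// valid for as long as the pool exists. Elements of an unordered_set are
// never moved, not even when the set rehashes.
inline const char *intern(std::unordered_set<std::string> &pool, const char *s)
{
    assert(s != nullptr);
    return pool.emplace(s).first->c_str();
}
```

<p>The returned pointer can then be passed to <i>record_global_float()</i> like any static string, and two events refer to the same name exactly when their pointers are equal.</p>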
<p>Now, let&#8217;s refine the picture slightly. There is a problem with recording all data to a single memory buffer and that is multithreading. If all threads record their data to the same memory buffer then we need lots of mutex locking to make sure they don&#8217;t step on each other&#8217;s toes.</p>
<p>We might also want to add support for some kind of off-line (i.e., not in-game) visualization. Off-line visualizers can take advantage of the full power of your development PC to implement more powerful visualization algorithms. And since they have near unlimited memory, they can record the entire data history so that you can explore it back and forth after the game session has ended.</p>
<p>With these refinements our monitoring system now looks like this:</p>
<div id="attachment_6777" class="wp-caption aligncenter" style="width: 705px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor5.jpeg"><img src="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor5-1024x567.jpg" alt="" width="695" height="384" class="size-large wp-image-6777" /></a><p class="wp-caption-text">Advanced monitoring system schematic.</p></div>
<p>Each thread has a small TLS (thread-local-storage) cache with 64 K or so of debug memory where it records its events. When the cache gets full or we reach the end of the frame, the thread acquires the lock to the global event buffer and flushes its data there.</p>
<p>The active on-line visualizers process the events in the buffer and visualize them. Simultaneously, we send the data over TCP so that it can be processed by any off-line visualizers. In the process we consume the buffer data and the buffer can be filled with new data from the threads.</p>
<p>(We allocate all the buffers we use on a special debug heap, so that we separate the allocations which we only do for debugging purposes from the allocations done by the main game.)</p>
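<p>The flush step can be sketched as follows. This is an illustration only: the names are hypothetical, it uses <i>std::vector</i> and <i>std::mutex</i> instead of a dedicated debug heap, and it assumes that a single event is never larger than the thread cache:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

const std::size_t THREAD_CACHE_SIZE = 64 * 1024;

struct GlobalEventBuffer {
    std::mutex mutex;
    std::vector<char> data;
};

struct ThreadCache {
    char data[THREAD_CACHE_SIZE];
    std::size_t count;
};

inline GlobalEventBuffer &global_buffer()
{
    static GlobalEventBuffer buffer;
    return buffer;
}

// One cache per thread; zero-initialized, so count starts at 0.
inline ThreadCache &thread_cache()
{
    static thread_local ThreadCache cache;
    return cache;
}

// Moves this thread's events to the global buffer. This is the only place
// where the lock is taken.
inline void flush_thread_cache()
{
    ThreadCache &cache = thread_cache();
    if (cache.count == 0)
        return;
    GlobalEventBuffer &global = global_buffer();
    std::lock_guard<std::mutex> lock(global.mutex);
    global.data.insert(global.data.end(), cache.data, cache.data + cache.count);
    cache.count = 0;
}

// Records raw event bytes without locking, flushing first if they
// would not fit.
inline void record_bytes(const void *bytes, std::size_t size)
{
    assert(size <= THREAD_CACHE_SIZE);
    ThreadCache &cache = thread_cache();
    if (cache.count + size > THREAD_CACHE_SIZE)
        flush_thread_cache();
    std::memcpy(cache.data + cache.count, bytes, size);
    cache.count += size;
}
```

<p>Each thread would also call <i>flush_thread_cache()</i> at the end of the frame, so the lock is taken once per thread per frame in the common case rather than once per event.</p>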
<p>Recording float data requires just a few lines of code.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">enum</span> <span style="color: #009900;">&#123;</span> RECORD_GLOBAL_FLOAT_EVENT <span style="color: #339933;">=</span> <span style="color: #0000dd;">17</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">enum</span> <span style="color: #009900;">&#123;</span> THREAD_BUFFER_SIZE <span style="color: #339933;">=</span> <span style="color: #0000dd;">64</span><span style="color: #339933;">*</span><span style="color: #0000dd;">1024</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
__thread <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>_thread_buffer<span style="color: #339933;">;</span>
__thread <span style="color: #993333;">unsigned</span> _thread_buffer_count<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">inline</span> <span style="color: #993333;">void</span> record_global_float<span style="color: #009900;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>name<span style="color: #339933;">,</span> <span style="color: #993333;">float</span> value<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
     <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>_thread_buffer_count <span style="color: #339933;">+</span> <span style="color: #0000dd;">12</span> <span style="color: #339933;">&gt;</span> THREAD_BUFFER_SIZE<span style="color: #009900;">&#41;</span>
         flush_thread_buffer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
     <span style="color: #666666; font-style: italic;">// Event layout: 4-byte type tag, then the RecordGlobalFloatEvent payload (12 bytes in total with 32-bit pointers).</span>
     <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>p <span style="color: #339933;">=</span> _thread_buffer <span style="color: #339933;">+</span> _thread_buffer_count<span style="color: #339933;">;</span>
     <span style="color: #339933;">*</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">unsigned</span> <span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>p <span style="color: #339933;">=</span> RECORD_GLOBAL_FLOAT_EVENT<span style="color: #339933;">;</span>
     <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>RecordGlobalFloatEvent <span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span>p<span style="color: #339933;">+</span><span style="color: #0000dd;">4</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">-&gt;</span>name <span style="color: #339933;">=</span> name<span style="color: #339933;">;</span>
     <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>RecordGlobalFloatEvent <span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span>p<span style="color: #339933;">+</span><span style="color: #0000dd;">4</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">-&gt;</span>value <span style="color: #339933;">=</span> value<span style="color: #339933;">;</span>
     _thread_buffer_count <span style="color: #339933;">+=</span> <span style="color: #0000dd;">12</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>When you have the data, writing the graph visualizer is not much work. Just save the data over a couple of frames and plot it using a line drawer.</p>
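<p>As a rough, language-neutral illustration of "save the data and plot it", here is a minimal Python sketch. The class name, the default history length and the value range are all assumptions made for the example; the engine&#8217;s actual visualizer is not shown in this post.</p>

```python
from collections import deque


class GraphHistory:
    """Keeps the most recent samples of one recorded variable (hypothetical helper)."""

    def __init__(self, frames=128, lo=-20.0, hi=20.0):
        self.samples = deque(maxlen=frames)  # oldest samples fall off automatically
        self.lo, self.hi = lo, hi            # assumed vertical range of the graph

    def record(self, value):
        self.samples.append(value)

    def points(self, width, height):
        """Map the stored samples to (x, y) positions for a line drawer."""
        step = width / max(len(self.samples) - 1, 1)
        scale = height / (self.hi - self.lo)
        return [
            (i * step, (min(max(v, self.lo), self.hi) - self.lo) * scale)
            for i, v in enumerate(self.samples)
        ]
```

<p>Each frame you would call record() with the newest value and hand the result of points() to whatever line drawing routine you already have.</p>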
<p>In the BitSquid engine, we also expose all the data recording functions to Lua scripting. This makes it possible to dynamically create graphs for all kinds of data while the game is running.</p>
<p>As an example of this, a couple of days ago a game programmer suspected that some problematic behavior was caused by a low update frequency in the mouse driver. We quickly bashed out a couple of lines in the game console to produce a graph of the mouse data and could immediately confirm that this indeed was the case:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="lua" style="font-family:monospace;">Core<span style="color: #66cc66;">.</span>Debug<span style="color: #66cc66;">.</span>add_updator<span style="color: #66cc66;">&#40;</span>
  <span style="color: #aa9900; font-weight: bold;">function</span> <span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
    Profiler<span style="color: #66cc66;">.</span>record_statistics<span style="color: #66cc66;">&#40;</span><span style="color: #ff6666;">&quot;mouse&quot;</span><span style="color: #66cc66;">,</span> Mouse<span style="color: #66cc66;">.</span>axis<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #aa9900; font-weight: bold;">end</span> 
<span style="color: #66cc66;">&#41;</span>
graph make mousegraph
graph add_vector3 mousegraph mouse
graph range mousegraph <span style="color: #66cc66;">-</span><span style="color: #cc66cc;">20</span> <span style="color: #cc66cc;">20</span></pre></td></tr></table></div>

<div id="attachment_6778" class="wp-caption aligncenter" style="width: 645px"><a href="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor6.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/05/monitor6.png" alt="" width="635" height="636" class="size-full wp-image-6778" /></a><p class="wp-caption-text">Graph of mouse input showing frames with no input.</p></div>
<p><span style="font-size: small;color: #777">(This article has also been posted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/05/26/monitoring-your-game/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Write A Script For It</title>
		<link>http://www.altdevblogaday.com/2011/05/11/write-a-script-for-it/</link>
		<comments>http://www.altdevblogaday.com/2011/05/11/write-a-script-for-it/#comments</comments>
		<pubDate>Wed, 11 May 2011 20:50:29 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=5656</guid>
		<description><![CDATA[<p>I&#8217;m a programmer. I don&#8217;t do repetitive tasks. My computer does them for me.</p>
<p>I make a conscious effort to use scripts more. Before I start banging away on the keyboard on some repetitive or not very creative task, I stop for a second and ask myself: would it make sense to write a script for this? Mostly the answer is yes.</p>
<p><a href="http://www.altdevblogaday.com/2011/05/11/write-a-script-for-it/" class="more-link">Read more on Write A Script For It&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m a programmer. I don&#8217;t do repetitive tasks. My computer does them for me.</p>
<p>I make a conscious effort to use scripts more. Before I start banging away on the keyboard on some repetitive or not very creative task, I stop for a second and ask myself: would it make sense to write a script for this? Mostly the answer is yes.</p>
<p>I&#8217;m thinking of things like:</p>
<ul>
<li>Generating chm and pdf documentation</li>
<li>Setting up a new build machine</li>
<li>Counting the number of lines of source code</li>
<li>Updating a web page</li>
<li>Making a DLC package</li>
<li>Doing the taxes</li>
<li>Etc.</li>
</ul>
<p>Writing a script that performs a task instead of doing it manually has several advantages:</p>
<p><b>It saves time</b></p>
<p>The most obvious benefit. Instead of doing the task again and again, you can just run the script. You will quickly earn back the time you spent on writing it. And even if you never have to do the <i>exact</i> same thing ever again, you will often need to do <i>very similar</i> things that let you reuse much of the script.</p>
<p>And every time you write a script you get better at it, and you will be able to do it faster and faster.</p>
<p><b>It formalizes procedural knowledge</b></p>
<p>Who keeps track of all the different versions of software that you need to install to set up a build machine? It is a task that is infrequent enough that every time you have to perform it you have forgotten the details. You could write it down in a list somewhere, but such lists have a tendency to get out of date.</p>
<p>With a script you can keep that list in an <i>executable form</i>. The script is the list and running the script installs all the necessary software. A script is easier to keep up-to-date, because it saves so much time that people have a vested interest in keeping it accurate. Also psychologically, a &#8220;broken program&#8221; <i>seems</i> more important to fix than a &#8220;not completely up-to-date list&#8221;.</p>
<p>&#8220;Executable lists&#8221; are a good way of making sure that you are not forgetting something when a process (such as content submission) consists of a bunch of small steps. You can use them even if some of the steps cannot be automated. For example, you can have one step where the script says: <i>Fetch the papers from the printer and put them in an envelope. Press return when you are done.</i></p>
<p>Having the knowledge in a script also allows you to save that knowledge by checking it into version control. You should <i>always</i> check your scripts into version control no matter how trivial they seem. Have a special folder for such &#8220;one-off&#8221; scripts.</p>
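<p>A minimal sketch of such an executable list, written in Python. The function name and the step format are made up for this example: a step is either a function the script runs by itself, or a sentence the script shows to the human before waiting for return.</p>

```python
def run_checklist(steps, confirm=input):
    """Run (name, action) steps in order and return the names that completed.

    An action is either a callable (automated step) or a string
    (manual step: the text is shown and the script waits for return).
    """
    done = []
    for name, action in steps:
        if callable(action):
            action()
        else:
            confirm(f"{action} Press return when you are done. ")
        done.append(name)
    return done


# Example list; the lambdas are stand-ins for real work.
SUBMISSION_STEPS = [
    ("build", lambda: None),
    ("papers", "Fetch the papers from the printer and put them in an envelope."),
    ("upload", lambda: None),
]
```

<p>Calling run_checklist(SUBMISSION_STEPS) walks through the list and pauses at the manual step. The confirm parameter exists only so the waiting can be replaced, for instance when testing the script.</p>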
<p><b>It makes things more regular and organized</b></p>
<p>An interesting thing happens when you know that a task is performed by a script. You start to organize things so that it is <i>easier for the computer</i>. For example, you might put all the software that is needed in a single directory on the server instead of having it spread out all over the disk. You might put configuration data in a shared Google Spreadsheet where the script can fetch it.</p>
<p>When you approach a task, from the beginning, with the attitude that it is going to be performed by a script you will automatically create naming conventions and put data in well thought-out places, so that the script can do its work. And each time you run the script it will verify that everything is in place. So the script will impose a structure on the task and force you to keep that structure. If you later want to promote that simple script to something that is a part of your regular tool chain, you will already have the structure and configuration data in place.</p>
<p>This is another reason why it is good to think about scripting from the beginning. It will force the procedure to become more regular and standardized and thus more suited for scripting. Automating an already established manual procedure can be much messier.</p>
<h2>Getting Started</h2>
<p>To get into the habit of scripting you need a scripting language with &#8220;batteries included&#8221; (i.e. libraries for performing the most common tasks). I&#8217;m partial to <a href="http://ruby-lang.org/">Ruby</a> myself, but <a href="http://www.python.org/">Python</a> is also a good alternative.</p>
<p>If you do go with Ruby, here are some recommendations for modules to get acquainted with:</p>
<dl>
<dt><a href="http://www.ruby-doc.org/stdlib/libdoc/find/rdoc/classes/Find.html">Find</a></dt>
<dd>Walks the directory tree recursively. Very useful when you want to perform something on a bunch of files.</dd>
<dt><a href="http://www.ruby-doc.org/stdlib/libdoc/fileutils/rdoc/index.html">FileUtils</a></dt>
<dd>For copying, moving, renaming files.</dd>
<dt><a href="http://www.ruby-doc.org/core/classes/Kernel.html#M001408">Backticks or %x operator</a></dt>
<dd>For running a system command and getting the output back.</dd>
<dt><a href="http://www.ruby-doc.org/core/classes/Regexp.html">Regular expressions</a></dt>
<dd>For quickly and hackishly parsing and reformatting data.</dd>
</dl>
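<p>If you go with Python instead, the same building blocks exist in its standard library. As a small sketch, here is one of the tasks from the list at the top, counting lines of source code, using os.walk (directory traversal) and re (regular expressions). The function name and the set of file extensions are assumptions for the example.</p>

```python
import os
import re

# Assumed set of extensions that count as "source code".
SOURCE_FILE = re.compile(r"\.(c|cpp|h|hpp|lua|rb|py)$")


def count_source_lines(root):
    """Return {path: line_count} for every source file below root."""
    counts = {}
    for directory, _subdirs, files in os.walk(root):
        for filename in files:
            if SOURCE_FILE.search(filename):
                path = os.path.join(directory, filename)
                with open(path, encoding="utf-8", errors="replace") as f:
                    counts[path] = sum(1 for _line in f)
    return counts
```

<p>Summing the values of the returned dictionary gives the total; keeping the per-file counts around makes it easy to reuse the script for <i>very similar</i> questions later.</p>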
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/05/11/write-a-script-for-it/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Universal Undo, Copy and Paste</title>
		<link>http://www.altdevblogaday.com/2011/04/26/universal-undo-copy-and-paste/</link>
		<comments>http://www.altdevblogaday.com/2011/04/26/universal-undo-copy-and-paste/#comments</comments>
		<pubDate>Tue, 26 Apr 2011 20:47:22 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=4827</guid>
		<description><![CDATA[<p>(Crossposted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</p>
<p>Undo, Copy and Paste are the bane of any tools programmer. Especially when they are bolted on to an already existing program. But even when they are properly planned from the start, these small (but essential) features can consume a lot of development time and be the source of many bugs.</p>
<p><a href="http://www.altdevblogaday.com/2011/04/26/universal-undo-copy-and-paste/" class="more-link">Read more on Universal Undo, Copy and Paste&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>(Crossposted to the <a href="http://bitsquid.blogspot.com/">BitSquid blog</a>.)</p>
<p>Undo, Copy and Paste are the bane of any tools programmer. Especially when they are bolted on to an already existing program. But even when they are properly planned from the start, these small (but essential) features can consume a lot of development time and be the source of many bugs.</p>
<p>Wouldn&#8217;t it be nice if all that could be eliminated?</p>
<p>In an <a href="http://altdevblogaday.com/2011/03/27/collaboration-and-merging/">earlier post</a> I presented a generic model for storing data: objects-with-properties. Like any model, it consists of a combination of generalizations and restrictions. The generalizations make the model broadly applicable. The restrictions let us reason about it and prevent it from becoming an <a href="http://en.wikipedia.org/wiki/Inner-platform_effect">&#8220;inner platform&#8221;</a>.</p>
<p>To quickly recap, here is the gist of the model:</p>
<ul>
<li>The data consists of a set of objects-with-properties.</li>
<li>Each object is identified by a GUID.</li>
<li>Each property is identified by a string.</li>
<li>The property value can be null, a bool, a double, a vector3, a quaternion, a string, a data blob, a GUID or a set of GUIDs.</li>
<li>The data has a root object with GUID 0.</li>
</ul>
<p>We need only five operations to manipulate data stored using this model:</p>
<dl>
<dt>create(guid)</dt>
<dd>creates the object with the specified GUID</dd>
<dt>destroy(guid)</dt>
<dd>destroys the object with the specified GUID</dd>
<dt>set_property(guid, key, value)</dt>
<dd>sets the specified property of the object to the value (set to null to remove the property)</dd>
<dt>add_to_set(guid, key, item_guid)</dt>
<dd>adds the item to the GUID set property identified by the key</dd>
<dt>remove_from_set(guid, key, item_guid)</dt>
<dd>removes the item from the GUID set property identified by the key</dd>
</dl>
<p>The interesting thing about this model is that it is generic enough to represent almost any kind of data, yet restricted enough to make it possible to define and perform a variety of interesting operations on the data. For example, in the previous post we saw that it was possible to define a property-based merge operation on the data (which for content files is much more useful than the line-based merge used by most version control systems).</p>
<p>Other operations that are easy to perform on this data are:</p>
<ul>
<li>referential integrity checks (check that all GUIDs used exist in the database)</li>
<li>checks for &#8220;dangling&#8221; objects</li>
<li>object replacement (replace all references to an object&#8217;s GUID with references to another object)</li>
</ul>
<p>And, of course, the topic for the day: Undo, Copy and Paste.</p>
<h2>Undo</h2>
<p>To implement undo in this model, note that each of the five mutating operations we can perform on the data has a simple inverse:</p>
<table>
<tr>
<th>Operation</th>
<th>Inverse</th>
</tr>
<tr>
<td>create(guid)</td>
<td>destroy(guid)</td>
</tr>
<tr>
<td>destroy(guid)</td>
<td>create(guid)</td>
</tr>
<tr>
<td>set_property(guid, key, value)</td>
<td>set_property(guid, key, old_value)</td>
</tr>
<tr>
<td>add_to_set(guid, key, item_guid)</td>
<td>remove_from_set(guid, key, item_guid)</td>
</tr>
<tr>
<td>remove_from_set(guid, key, item_guid)</td>
<td>add_to_set(guid, key, item_guid)</td>
</tr>
</table>
<p>&nbsp;</p>
<p>To implement Undo, all we have to do is to make sure that whenever the user performs one of the mutating operations, we save the corresponding inverse operation to a stack. To undo the latest action, we pop that last action from the stack and perform it. (We also save <em>its</em> inverse operation to a redo stack, so the user can redo it.)</p>
<p>Since the Undo operation is implemented on the low-level data model, all high-level programs that use it will automatically get &#8220;Undo&#8221; for free.</p>
<p>In the high-level program you typically want to group together all the mutations that resulted from a single user action as one &#8220;undo item&#8221;, so the user can undo them with a single operation. You can do that by recording &#8220;restore points&#8221; in the undo stack whenever your program is idle. To undo an action, you undo all operations up to the last restore point.</p>
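<p>The undo scheme can be sketched in a few lines of Python. This is a toy illustration, not the BitSquid implementation: the class, the method names beyond the five operations, and the use of None as a restore point marker are all assumptions. It also assumes that an object&#8217;s properties are cleared before it is destroyed, so that create really is the inverse of destroy.</p>

```python
class Database:
    """Toy objects-with-properties store with undo and redo (illustrative only)."""

    def __init__(self):
        self.objects = {0: {}}  # guid -> {key: value}; guid 0 is the root
        self.undo_stack = []    # inverse operations; None marks a restore point
        self.redo_stack = []    # groups of operations that can be redone

    # The five mutating operations. Each one returns its inverse.
    def create(self, guid):
        self.objects[guid] = {}
        return ("destroy", guid)

    def destroy(self, guid):
        del self.objects[guid]
        return ("create", guid)

    def set_property(self, guid, key, value):
        old_value = self.objects[guid].get(key)
        if value is None:
            self.objects[guid].pop(key, None)
        else:
            self.objects[guid][key] = value
        return ("set_property", guid, key, old_value)

    def add_to_set(self, guid, key, item_guid):
        self.objects[guid].setdefault(key, set()).add(item_guid)
        return ("remove_from_set", guid, key, item_guid)

    def remove_from_set(self, guid, key, item_guid):
        self.objects[guid][key].discard(item_guid)
        return ("add_to_set", guid, key, item_guid)

    def _perform(self, op):
        name, *args = op
        return getattr(self, name)(*args)

    def apply(self, *op):
        """Perform a user operation and remember how to undo it."""
        self.undo_stack.append(self._perform(op))
        self.redo_stack.clear()

    def restore_point(self):
        self.undo_stack.append(None)

    def undo(self):
        """Undo all operations back to the previous restore point."""
        while self.undo_stack and self.undo_stack[-1] is None:
            self.undo_stack.pop()
        group = []
        while self.undo_stack and self.undo_stack[-1] is not None:
            group.append(self._perform(self.undo_stack.pop()))
        if group:
            self.redo_stack.append(group)

    def redo(self):
        """Redo the most recently undone group of operations."""
        if self.redo_stack:
            for op in reversed(self.redo_stack.pop()):
                self.undo_stack.append(self._perform(op))
            self.undo_stack.append(None)
```

<p>A tool would call apply("set_property", guid, "name", "player") and friends while handling a user action, then restore_point() when it goes idle. Nothing in undo() or redo() knows what kind of objects are stored, which is the whole point.</p>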
<h2>Copy</h2>
<p>To copy a set of objects, create a new database that holds just the copied objects. Copy the objects with their keys and values to the new database. Also copy all the objects they reference. (Use a set to remember the GUIDs of the objects you have already copied.)</p>
<p>In the root object of the new database, store the GUIDs of all the copied objects under some suitable key (for example: &#8220;copied-models&#8221;).</p>
<p>Then serialize the database copy to the clipboard (using your standard method for serialization).</p>
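<p>Here is a minimal Python sketch of the copy step, under some assumptions of mine: a database is a plain dictionary from GUID to a property dictionary, GUIDs are uuid.UUID values (so references can be told apart from other values), and serialization is left out. The function names are hypothetical.</p>

```python
import uuid

ROOT = uuid.UUID(int=0)  # the root object has GUID 0


def references(properties):
    """Yield every GUID referenced by an object's property values."""
    for value in properties.values():
        if isinstance(value, uuid.UUID):
            yield value
        elif isinstance(value, (set, frozenset)):
            yield from value


def copy_objects(database, guids, key="copied-models"):
    """Return a new database with the given objects and everything they reference."""
    copied = {}  # also serves as the set of GUIDs already copied
    pending = list(guids)
    while pending:
        guid = pending.pop()
        if guid in copied or guid == ROOT:
            continue  # already handled; the new database gets its own root
        copied[guid] = {
            k: set(v) if isinstance(v, set) else v
            for k, v in database[guid].items()
        }
        pending.extend(references(database[guid]))
    copied[ROOT] = {key: set(guids)}
    return copied
```

<p>The returned dictionary is what you would hand to your serializer before putting it on the clipboard.</p>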
<h2>Paste</h2>
<p>To paste data, first unserialize it from the clipboard to a new temporary database. Then rename all the objects (give them new GUIDs) to make sure they don&#8217;t collide with existing objects.</p>
<p>Renaming is simple: just generate a new GUID for every object in the database. Use a dictionary to record the mapping from an object&#8217;s old GUID to the new GUID. Then, using that dictionary, translate all the references in the object properties from the old GUIDs to the new ones.</p>
<p>Finally, copy the objects from the temporary database to your main database.</p>
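<p>The paste steps can be sketched the same way. As before this is only an illustration with assumed names and an assumed representation (a dictionary from uuid.UUID to a property dictionary); unserializing from the clipboard is taken as already done.</p>

```python
import uuid

ROOT = uuid.UUID(int=0)  # the root object has GUID 0


def rename_objects(database):
    """Return a copy of the database where every non-root object has a new GUID."""
    mapping = {old: uuid.uuid4() for old in database if old != ROOT}

    def translate(value):
        # Translate references; leave all other values untouched.
        if isinstance(value, uuid.UUID):
            return mapping.get(value, value)
        if isinstance(value, (set, frozenset)):
            return {mapping.get(item, item) for item in value}
        return value

    return {
        mapping.get(guid, guid): {k: translate(v) for k, v in props.items()}
        for guid, props in database.items()
    }


def paste(main, clipboard, key="copied-models"):
    """Copy the renamed clipboard objects into main; return the pasted GUIDs."""
    renamed = rename_objects(clipboard)
    for guid, props in renamed.items():
        if guid != ROOT:
            main[guid] = props
    return renamed[ROOT].get(key, set())
```

<p>The returned set tells the high-level program which of the new objects were the ones the user actually selected, so it can for example link them into the level or select them.</p>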
<p>Again, since Copy and Paste were implemented on the underlying data model and don&#8217;t depend on the high-level data (what kind of objects you actually store) you get them for free in all programs that use the data model.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/04/26/universal-undo-copy-and-paste/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Extreme Bug Hunting</title>
		<link>http://www.altdevblogaday.com/2011/04/11/extreme_bug_hunting/</link>
		<comments>http://www.altdevblogaday.com/2011/04/11/extreme_bug_hunting/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 22:45:20 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=3710</guid>
		<description><![CDATA[<p><em>Cross-posted to <a href="http://bitsquid.blogspot.com">bitsquid.blogspot.com</a>.</em></p>
<p>Put on your camouflage vest and step out onto the hot motherboard plains. Squint against the searing rays of burning processor cycles and feel the warm wind of chassis fans fill the air with anticipation. Today we go bug hunting.</p>
<p><a href="http://www.altdevblogaday.com/2011/04/11/extreme_bug_hunting/" class="more-link">Read more on Extreme Bug Hunting&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><em>Cross-posted to <a href="http://bitsquid.blogspot.com">bitsquid.blogspot.com</a>.</em></p>
<p>Put on your camouflage vest and step out onto the hot motherboard plains. Squint against the searing rays of burning processor cycles and feel the warm wind of chassis fans fill the air with anticipation. Today we go bug hunting.</p>
<p>Our prey: the worst kind. Crashes only in release builds. Onl</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/04/11/extreme_bug_hunting/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Collaboration and Merging</title>
		<link>http://www.altdevblogaday.com/2011/03/27/collaboration-and-merging/</link>
		<comments>http://www.altdevblogaday.com/2011/03/27/collaboration-and-merging/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 21:38:16 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=2596</guid>
		<description><![CDATA[<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com">BitSquid blog</a>. We are looking for a tools programmer.)</p>
<p>Games are huge collaborative efforts, but usually they are not developed that way. Mostly, assets can only be worked on by one person at a time and need to be locked in version control to prevent conflicting changes. This can be a real time sink, especially for level design, but all assets would benefit from more collaborative workflows. As tool developers, it is time we start thinking seriously about how to support that.</p>
<p><a href="http://www.altdevblogaday.com/2011/03/27/collaboration-and-merging/" class="more-link">Read more on Collaboration and Merging&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>(This has also been posted to the <a href="http://bitsquid.blogspot.com">BitSquid blog</a>. We are looking for a tools programmer.)</p>
<p>Games are huge collaborative efforts, but usually they are not developed that way. Mostly, assets can only be worked on by one person at a time and need to be locked in version control to prevent conflicting changes. This can be a real time sink, especially for level design, but all assets would benefit from more collaborative workflows. As tool developers, it is time we start thinking seriously about how to support that.</p>
<p>Recently I faced this issue while doing some work on our localization tools. (Localization is interesting in this context because it involves collaboration over long distances &#8212; a game studio in one country and a translation shop in another.) In the process I had a small epiphany: the key to collaboration is merging. When data merges nicely, collaborative work is easy. If you can&#8217;t merge changes it is really hard to do collaboration well, no matter what methods you use.</p>
<p><b>Why databases aren&#8217;t a magic solution</b></p>
<p>A central database can act as backend storage for a collaborative effort. But that, by itself, does not solve all issues of synchronization and collaboration.</p>
<p>Consider this: if you are going to use a database as your <em>only</em> synchronization mechanism then all clients will have to run in lockstep with the database. If you change something, you have to verify with the database that the change hasn&#8217;t been invalidated by something done by somebody else, perform the change as a single transaction and then wait for the database to acknowledge it before continuing. Every time you change something, you will have to wait for this round trip to the database and the responsiveness of your program is now completely at its mercy.</p>
<p>Web applications have faced this issue for a long time and they all use the same solution. Instead of synchronizing every little change with the database, they gather up their changes and send them to the database asynchronously. This change alone is what has made &#8220;web 2.0&#8221; applications competitive with desktop software.</p>
<p>But once you start talking to the database asynchronously, you have already entered &#8220;merge territory&#8221;. You send your updates to the server, they arrive at some later point, potentially after changes made by other users. When you get a reply back from the server you may already have made other, potentially conflicting, changes to your local data. Both at the server and in the clients, changes made by different users must be merged.</p>
<p>So you need merging. But you don&#8217;t necessarily need a database. If your merges are robust you can just use an ordinary version control system as the backend instead of a database. Or you can work completely disconnected and send your changes as patch files. The technology you use for the backend storage doesn&#8217;t matter that much; it is the ability to merge that is crucial.</p>
<p>A merge-based solution has another nice property that you don&#8217;t get with a &#8220;lockstep database&#8221;: the possibility of keeping a local changeset and only submitting it to others when it is &#8220;done&#8221;. This is of course crucial for code (imagine keeping all your source files in constantly mutating Google Documents). But I think it applies to other assets as well. You don&#8217;t want half-finished, broken assets all over your levels. An update/commit workflow is useful here as well.</p>
<p><strong>Making assets mergable</strong></p>
<p>If you have tried to merge assets in regular version control systems you will know that they usually don&#8217;t do so well. The merge tool can mess up the JSON/XML structure, mangle the file in other ways or just plain fail (because of a merge conflict). All of these problems arise because the merge tool treats the data as &#8220;source code&#8221; &#8212; a line-oriented text document with no additional structure. The reason for this is of course historical: version control systems emerged as a way of managing source code and then grew into other areas.</p>
<p>The irony of this is that source code is one of the hardest things to merge. It has complicated syntax and even more complicated semantics. Source code is so hard to merge that even humans with all their intelligency goodness find it taxing. In contrast, most assets are easy to merge, at least conceptually.</p>
<p>Take localization, for instance. The localization data is just a bunch of strings with translations for different languages. If one person has made a bunch of German translations, another person has made some Swedish translations and a third person has added some new source strings, we can merge all that without a hitch. The only time when we have any problem at all is if two people have provided different translations for the same string in the same language. We can solve such standoffs by just picking the most recent value. (Optionally, we could notify the user that this happened by highlighting the string in the tool.)</p>
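<p>To make the localization case concrete, here is a small Python sketch of such a merge. The data layout is an assumption for the example: each person&#8217;s edits are a dictionary from (string id, language) to (timestamp, text), and the timestamp is what decides which value is &#8220;most recent&#8221;.</p>

```python
def merge_translations(*edits):
    """Merge edit sets of the form {(string_id, language): (timestamp, text)}.

    Returns (merged, conflicts). When two edits disagree about the same
    string in the same language, the most recent one wins and the key is
    reported so the tool can flag it for the user.
    """
    merged, conflicts = {}, set()
    for edit in edits:
        for key, (stamp, text) in edit.items():
            if key in merged and merged[key][1] != text:
                conflicts.add(key)
            if key not in merged or stamp > merged[key][0]:
                merged[key] = (stamp, text)
    return merged, conflicts
```

<p>Edits that touch different strings or different languages never interact at all, and as long as the timestamps are distinct the result does not depend on the order in which the edit sets are merged.</p>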
<p>Many other assets have a similar structure. They can be described as &#8220;objects-with-properties&#8221;. For example, in a level asset the objects are the entities placed in the level and their properties are position, rotation, color, etc. All data that has this structure is easy to merge, because there are essentially just three types of operations you can perform on it: create an object, destroy an object and change a property of an object. All these operations are easy to merge. Again, the only problem is if two different users have changed the same property of the same object.</p>
<p>So when we try to merge assets using regular merge tools we are doing something rather silly. We are taking something that is conceptually very easy to merge, completely ignoring that and trying to merge it using rather complex algorithms that were designed for something completely different, something that is conceptually very hard to merge. Silly, when you think about it.</p>
<p>The solution to this sad state of affairs is of course to write custom merge tools that take advantage of the fact that assets are very easy to merge. Tools that understand the objects-with-properties model and know how to merge that.</p>
<p>A first step might be to write a merge program that understands XML or <a href="http://bitsquid.blogspot.com/2010/06/avoiding-content-locks-and-conflicts-3.html">JSON</a> files (the program in the link has some performance issues &#8212; I will deal with that in my next available time slot) and can interpret them as objects-with-properties.</p>
<p>This only goes half the way though, because you will need some kind of extra markup in the file for the tool to understand it as a set of objects-with-properties. For example, you probably need some kind of id field to mark object identity. Otherwise you can&#8217;t tell if a user has changed some properties of an old object or deleted the old object and created a new one. And that matters when you do the merge.</p>
<p>Instead of adding this extra markup, which can be a bit fragile, I think it is better to explicitly represent your data as objects-with-properties. <a href="http://bitsquid.blogspot.com/2010/08/new-data-storage-model.html">I&#8217;ve blogged about this before</a>, but since then I feel my thoughts on the subject have clarified and I&#8217;ve also had the opportunity to try it out in practice (with the localization tool). Such a representation could have the following key elements.</p>
<ul>
<li> The data consists of a set of objects-with-properties.</li>
<li> Each object is identified by a GUID.</li>
<li> Each property is identified by a string.</li>
<li> The property value can be null, a bool, a double, a vector3, a quaternion, a string, a data blob, a GUID or a set of GUIDs.</li>
<li>The data has a root object with GUID 0.</li>
</ul>
<p>We use a GUID to identify the object, since that means the ids of objects created by different users won&#8217;t collide. GUID values are used to make links between objects. Note that we don&#8217;t allow arrays, only sets. That is because array operations (move object from 5th place to 3rd place) are hard to merge. Set operations (insert object, remove object) are easy to merge.</p>
<p>Here is what a change set for creating a player entity in a level might look like using this model. (I have shortened the GUIDs to 2 bytes to make the example more readable.)</p>
<p>create #f341<br />
change_key #f341 &#8220;entity-type&#8221; &#8220;player&#8221;<br />
change_key #f341 &#8220;position&#8221; vector3(0,0,0)<br />
add_to_set #0000 &#8220;entities&#8221; #f341</p>
<p>Note that the root object (which represents the level) has a property &#8220;entities&#8221; that contains the set of all entities in the level.</p>
<p>To merge two such change sets, you could just append one to the other. You could even use the change set itself as your data format, if you don&#8217;t want to use a database backend (that is actually what I did for the localization tool).</p>
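<p>As a sketch of how little machinery this needs, here is a toy Python interpreter for such change sets. The tuple encoding, the function name and the second GUID (#a27c) are invented for the example, and dropping edits to an object that has been destroyed is my assumption about how to treat that case, not something the model prescribes.</p>

```python
def apply_change_set(changes, database=None):
    """Replay a list of change operations and return the resulting database."""
    db = {"#0000": {}} if database is None else database
    for op, guid, *args in changes:
        if op == "create":
            db.setdefault(guid, {})
        elif op == "destroy":
            db.pop(guid, None)
        elif guid not in db:
            continue  # assumption: edits to a destroyed object are dropped
        elif op == "change_key":
            key, value = args
            db[guid][key] = value
        elif op == "add_to_set":
            key, item = args
            db[guid].setdefault(key, set()).add(item)
        elif op == "remove_from_set":
            key, item = args
            db[guid].get(key, set()).discard(item)
        else:
            raise ValueError(f"unknown operation: {op}")
    return db


# The change set from the text, plus a second one from another user.
player = [
    ("create", "#f341"),
    ("change_key", "#f341", "entity-type", "player"),
    ("change_key", "#f341", "position", (0, 0, 0)),
    ("add_to_set", "#0000", "entities", "#f341"),
]
enemy = [
    ("create", "#a27c"),
    ("change_key", "#a27c", "entity-type", "enemy"),
    ("add_to_set", "#0000", "entities", "#a27c"),
]

# Merging is concatenation; the two users never touch the same property.
level = apply_change_set(player + enemy)
```

<p>Because the two change sets touch different objects and only add to the shared set, appending them in either order gives the same level. Order only matters when two users change the same property of the same object, in which case the later entry wins.</p>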
<p>I think most assets can be represented in the objects-with-properties model and it is a rather powerful way of making sure that they are mergable and collaboration-friendly. I will write all the new BitSquid tools with the objects-with-properties model in mind and retrofit it into our older tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/03/27/collaboration-and-merging/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A Tiny Expression Language</title>
		<link>http://www.altdevblogaday.com/2011/03/13/tiny-expression-language/</link>
		<comments>http://www.altdevblogaday.com/2011/03/13/tiny-expression-language/#comments</comments>
		<pubDate>Sun, 13 Mar 2011 09:46:53 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=1891</guid>
		<description><![CDATA[<p>(Also posted to <a href="http://bitsquid.blogspot.com/">bitsquid.blogspot.com</a>.)</p>
<p>Putting some of the power of programming into the hands of artists and designers can be a great thing. When they can customize the behavior of an object directly, without making the roundtrip through a programmer, there is a lot more room for experimentation and iteration. As a result you get better looking things with more interesting interactions.</p>
<p><a href="http://www.altdevblogaday.com/2011/03/13/tiny-expression-language/" class="more-link">Read more on A Tiny Expression Language&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>(Also posted to <a href="http://bitsquid.blogspot.com/">bitsquid.blogspot.com</a>.)</p>
<p>Putting some of the power of programming into the hands of artists and designers can be a great thing. When they can customize the behavior of an object directly, without making the roundtrip through a programmer, there is a lot more room for experimentation and iteration. As a result you get better looking things with more interesting interactions.</p>
<p>Plus, if the artists do their own damn programming it means less work for me, so everybody wins.</p>
<p>Of course I don’t expect artists to actually program, but rather to use tools that expose that power, such as shader graphs, <a href="http://bitsquid.blogspot.com/2010/09/visual-scripting-data-oriented-way.html">visual scripting systems</a>, or — the topic of this post — expression languages.</p>
<p>By an expression language I mean a tiny little programming language that can be used to (and only used to) write one-line mathematical expressions, such as:</p>
<p>sin(t) + 0.1 * cos(10 * t)</p>
<p>So it is a really simple little calculator language. Simpler than <a href="http://en.wikipedia.org/wiki/Lisp_(programming_language)">Lisp</a>. Simpler than <a href="http://en.wikipedia.org/wiki/Forth_(programming_language)">Forth</a>. (Well maybe not, but simpler than trying to teach artists Lisp or Forth.) This simplicity has two advantages. First, it makes it easier to write and understand the expressions. Second, it makes it possible to compute the expressions efficiently, which is important, because it allows us to use them in more places without worrying too much about the performance or memory costs.</p>
<p>The expression language can be used to replace static values where we want the artist to be able to specify more unique behaviors. Some examples:</p>
<ul>
<li>In the particle system it can be used to script complicated custom particle behaviors that are hard to produce with other types of controllers.</li>
<li>In the animation system it can be used to compute the play speed and blend values of animations based on controller variables. </li>
<li>In the physics system it can be used to define custom force fields to achieve special effects, such as tornados, explosions or whirlwinds.</li>
</ul>
<p><strong>Computing the Expressions</strong></p>
<p>Since the expressions are so simple, usually not more than a few operators, we need to be able to evaluate them with as little overhead as possible. Otherwise, the overhead will dominate the execution cost. This means that we should use a simple design, such as a <a href="http://en.wikipedia.org/wiki/Stack_machine">stack-based virtual machine</a>. That may sound complicated, but the concepts are really quite simple. What it means is that we convert our expression to a sequence of operations that push data to or pop data from a computation stack. So our example from above:</p>
<p>sin(t) + 0.1 * cos(10 * t)</p>
<p>Gets converted into:</p>
<p>t sin 0.1 10 t * cos * +</p>
<p>Here <em>t</em> pushes the value of the variable <em>t</em> to the stack. <em>sin</em> pops the top value from the stack, computes its sine and pushes the result to the stack. 0.1 pushes the value 0.1 to the stack. + pops two values from the stack, adds them together and pushes the result to the stack. * works the same way. If you go through the operations in the example you see that it computes the same result as the original expression.</p>
<p>This way of writing expressions is called <a href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse Polish notation</a> (RPN) or postfix notation and it’s the basis for the programming language <a href="http://en.wikipedia.org/wiki/Forth_(programming_language)">Forth</a>.</p>
<p>If we examine the issue, we see that we really just need three types of operations in our byte code, plus a marker for the end of the code:</p>
<dl>
<dt>PUSH_VARIABLE</dt>
<dd>pushes the content of a variable to the stack</dd>
<dt>PUSH_FLOAT</dt>
<dd>pushes a floating point number to the stack</dd>
<dt>COMPUTE_FUNCTION</dt>
<dd>pops the arguments off the stack, computes the result and pushes it to the stack</dd>
<dt>END</dt>
<dd>marks the end of the byte code</dd>
</dl>
<p>For simplicity I use 32 bits for each bytecode word. The upper 8 bits specify the type of the operation and the lower 24 bits hold the data. For a variable the data is the index of the variable in a variable list. When compiling the bytecode you specify a list of variable names: {“t”, “x”}. And when executing you specify a corresponding list of variable values: {0.5, 20.1}. Similarly, for COMPUTE_FUNCTION, the data is an index into a function table. For PUSH_FLOAT we need an extra code word to hold the data, since we want 32-bit floats.</p>
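<p>As a rough sketch, the encoding could be packed and unpacked like this. The enum values and the helper names (<em>pack</em>, <em>op_of</em>, <em>data_of</em>, <em>float_to_unsigned</em>) are made up for the illustration, and the code assumes that both <em>unsigned</em> and <em>float</em> are 32 bits wide:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;cstring&gt;

static_assert(sizeof(unsigned) == 4 &amp;&amp; sizeof(float) == 4, "sketch assumes 32-bit words");

// Hypothetical operation types. The numeric values are arbitrary.
enum ByteCode { BC_PUSH_FLOAT = 1, BC_PUSH_VAR, BC_FUNCTION, BC_END };

// Packs the operation type into the upper 8 bits and the data into the lower 24.
inline unsigned pack(ByteCode op, unsigned data)
{
	assert(data &lt;= 0xffffffu); // the data must fit in 24 bits
	return (unsigned(op) &lt;&lt; 24) | data;
}

inline unsigned op_of(unsigned word) { return word &gt;&gt; 24; }
inline unsigned data_of(unsigned word) { return word &amp; 0xffffffu; }

// Copies the bits of a float to and from a code word, so that a 32-bit float
// can be stored in the word that follows a BC_PUSH_FLOAT operation.
inline unsigned float_to_unsigned(float f)
{
	unsigned u;
	std::memcpy(&amp;u, &amp;f, sizeof u);
	return u;
}

inline float unsigned_to_float(unsigned u)
{
	float f;
	std::memcpy(&amp;f, &amp;u, sizeof f);
	return f;
}</pre></td></tr></table></div>

<p>With these helpers, the postfix sequence “t 0.1 *” with the variable list {“t”} would become the words pack(BC_PUSH_VAR, 0), pack(BC_PUSH_FLOAT, 0), float_to_unsigned(0.1f), pack(BC_FUNCTION, i) and pack(BC_END, 0), where i is the index of * in the function table.</p>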
<p>We can now write the function that runs the virtual machine. It is not much code at all:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> Stack
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">float</span> <span style="color: #000040;">*</span>data<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> size<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> capacity<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>	
&nbsp;
<span style="color: #0000ff;">bool</span> run<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> <span style="color: #0000ff;">unsigned</span> <span style="color: #000040;">*</span>byte_code, <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">float</span> <span style="color: #000040;">*</span>variables, Stack <span style="color: #000040;">&amp;</span>stack<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">const</span> <span style="color: #0000ff;">unsigned</span> <span style="color: #000040;">*</span>p <span style="color: #000080;">=</span> byte_code<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">true</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		<span style="color: #0000ff;">unsigned</span> bc <span style="color: #000080;">=</span> <span style="color: #000040;">*</span>p<span style="color: #000040;">++</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">unsigned</span> op <span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span>bc <span style="color: #000080;">&gt;&gt;</span> <span style="color: #0000dd;">24</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">int</span> i <span style="color: #000080;">=</span> bc <span style="color: #000040;">&amp;</span> <span style="color: #208080;">0xffffff</span><span style="color: #008080;">;</span>
		<span style="color: #0000ff;">switch</span> <span style="color: #008000;">&#40;</span>op<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
			<span style="color: #0000ff;">case</span> BC_PUSH_FLOAT<span style="color: #008080;">:</span>
				<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>stack.<span style="color: #007788;">size</span> <span style="color: #000080;">==</span> stack.<span style="color: #007788;">capacity</span><span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">false</span><span style="color: #008080;">;</span>
				stack.<span style="color: #007788;">data</span><span style="color: #008000;">&#91;</span>stack.<span style="color: #007788;">size</span><span style="color: #000040;">++</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> unsigned_to_float<span style="color: #008000;">&#40;</span><span style="color: #000040;">*</span>p<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
				<span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
			<span style="color: #0000ff;">case</span> BC_PUSH_VAR<span style="color: #008080;">:</span>
				<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>stack.<span style="color: #007788;">size</span> <span style="color: #000080;">==</span> stack.<span style="color: #007788;">capacity</span><span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">false</span><span style="color: #008080;">;</span>
				stack.<span style="color: #007788;">data</span><span style="color: #008000;">&#91;</span>stack.<span style="color: #007788;">size</span><span style="color: #000040;">++</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> variables<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
				<span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
			<span style="color: #0000ff;">case</span> BC_FUNCTION<span style="color: #008080;">:</span>
				compute_function<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span>OpCode<span style="color: #008000;">&#41;</span>i, stack<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
				<span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
			<span style="color: #0000ff;">case</span> BC_END<span style="color: #008080;">:</span>
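				<span style="color: #0000ff;">return</span> <span style="color: #0000ff;">true</span><span style="color: #008080;">;</span>
			<span style="color: #0000ff;">default</span><span style="color: #008080;">:</span>
				<span style="color: #0000ff;">return</span> <span style="color: #0000ff;">false</span><span style="color: #008080;">;</span> // unknown operation, treat the byte code as corrupt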
		<span style="color: #008000;">&#125;</span>
	<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>
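<p>The <em>compute_function()</em> call is not shown above. A minimal sketch of what it could look like follows, assuming a small hypothetical function table. The <em>OP_</em> names and the <em>pop()</em> and <em>push()</em> helpers are inventions for this example, and the <em>Stack</em> struct is repeated so that the snippet stands on its own:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;cmath&gt;

struct Stack
{
	float *data;
	unsigned size;
	unsigned capacity;
};

// Hypothetical function table. The real set of functions may differ.
enum OpCode { OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_SIN, OP_COS };

inline float pop(Stack &amp;stack)
{
	assert(stack.size &gt; 0); // the compiler is assumed to emit balanced code
	return stack.data[--stack.size];
}

inline void push(Stack &amp;stack, float f)
{
	assert(stack.size &lt; stack.capacity);
	stack.data[stack.size++] = f;
}

// Pops the arguments of the function, computes it and pushes the result.
// Binary operators pop the right-hand operand first, since it was pushed last.
void compute_function(OpCode op, Stack &amp;stack)
{
	float a, b;
	switch (op) {
		case OP_ADD: b = pop(stack); a = pop(stack); push(stack, a + b); break;
		case OP_SUB: b = pop(stack); a = pop(stack); push(stack, a - b); break;
		case OP_MUL: b = pop(stack); a = pop(stack); push(stack, a * b); break;
		case OP_DIV: b = pop(stack); a = pop(stack); push(stack, a / b); break;
		case OP_SIN: a = pop(stack); push(stack, std::sin(a)); break;
		case OP_COS: a = pop(stack); push(stack, std::cos(a)); break;
	}
}</pre></td></tr></table></div>

<p>Every function in this sketch pops at least as many values as it pushes, so it can never grow the stack. That is presumably why only the two push operations need to check the capacity.</p>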

<p><strong>Compiling the Byte Code</strong></p>
<p>Compiling an expression involves three phases: tokenizing the data into a stream of input symbols, transforming that stream from infix to postfix notation, and finally generating the byte code from that.</p>
<p>Tokenization means matching the identifiers in the expressions against a list of variable names and function names. We can also support constants that get converted to floats directly in the tokenization process. That is useful for things like <em>pi</em>.</p>
<p>The tokenization process converts our sample expression to something like this:</p>
<p>{ sin, (, t, ), +, 0.1, *, cos, (, 10, *, t, ) }</p>
<p>Now we need to convert this to postfix notation. One way would be to write a full blown yacc parser with all that entails, but for expressions this simple we can get away with something simpler, such as Dijkstra&#8217;s <a href="http://en.wikipedia.org/wiki/Shunting-yard_algorithm">Shunting Yard algorithm</a>.</p>
<p>I actually use an even simpler variant that doesn&#8217;t support right-associative operators, where I just process the input tokens one by one. If the token is a value or a variable I put it directly in the output. If the token is a function or an operator I push it to a function stack. But before I do that, I pop all functions with higher precedence from the function stack and put them in the output. Precedence takes parenthesis level into account, so a + nested in three parentheses has higher precedence than a * nested in two.</p>
<p>Let us see how this works for our simple example:</p>
<table>
<tr>
<th>Input</th>
<th>Output</th>
<th>Stack</th>
</tr>
<tr>
<td>sin ( t ) + 0.1 * cos ( 10 * t )</td>
<td></td>
<td></td>
</tr>
<tr>
<td>( t ) + 0.1 * cos ( 10 * t )</td>
<td></td>
<td>sin</td>
</tr>
<tr>
<td>+ 0.1 * cos ( 10 * t )</td>
<td>t</td>
<td>sin</td>
</tr>
<tr>
<td>0.1 * cos ( 10 * t )</td>
<td>t sin</td>
<td>+</td>
</tr>
<tr>
<td>* cos ( 10 * t )</td>
<td>t sin 0.1</td>
<td>+</td>
</tr>
<tr>
<td>cos ( 10 * t )</td>
<td>t sin 0.1</td>
<td>+ *</td>
</tr>
<tr>
<td>( 10 * t )</td>
<td>t sin 0.1</td>
<td>+ * cos</td>
</tr>
<tr>
<td>* t )</td>
<td>t sin 0.1 10</td>
<td>+ * cos</td>
</tr>
<tr>
<td>t )</td>
<td>t sin 0.1 10</td>
<td>+ * cos (*)</td>
</tr>
<tr>
<td></td>
<td>t sin 0.1 10 t</td>
<td>+ * cos (*)</td>
</tr>
<tr>
<td></td>
<td>t sin 0.1 10 t *</td>
<td>+ * cos</td>
</tr>
<tr>
<td></td>
<td>t sin 0.1 10 t * cos</td>
<td>+ *</td>
</tr>
<tr>
<td></td>
<td>t sin 0.1 10 t * cos *</td>
<td>+</td>
</tr>
<tr>
<td></td>
<td>t sin 0.1 10 t * cos * +</td>
<td></td>
</tr>
</table>
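<p>The steps in the table can be sketched in code roughly as follows. This is an illustration of the simplified variant described above, not the actual compiler: tokens are plain strings, the token set and the names <em>base_precedence()</em> and <em>to_postfix()</em> are made up, and only single-argument functions are handled. Note that the sketch also pops operators of <em>equal</em> precedence, which is what makes operators such as - left-associative.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Returns 0 for values and variables, otherwise the binding strength.
static int base_precedence(const std::string &amp;token)
{
	if (token == "+" || token == "-") return 1;
	if (token == "*" || token == "/") return 2;
	if (token == "sin" || token == "cos") return 3;
	return 0;
}

std::vector&lt;std::string&gt; to_postfix(const std::vector&lt;std::string&gt; &amp;tokens)
{
	struct Pending { std::string token; int precedence; };
	const int PAREN_BONUS = 10; // must be larger than any base precedence
	std::vector&lt;Pending&gt; stack;
	std::vector&lt;std::string&gt; output;
	int depth = 0;

	for (const std::string &amp;token : tokens) {
		if (token == "(") { ++depth; continue; }
		if (token == ")") { assert(depth &gt; 0); --depth; continue; }

		int base = base_precedence(token);
		if (base == 0) {
			// Values and variables go directly to the output.
			output.push_back(token);
			continue;
		}

		// Each parenthesis level outranks everything outside of it.
		int precedence = base + depth * PAREN_BONUS;
		while (!stack.empty() &amp;&amp; stack.back().precedence &gt;= precedence) {
			output.push_back(stack.back().token);
			stack.pop_back();
		}
		stack.push_back(Pending{token, precedence});
	}

	// Flush the functions that are still waiting on the stack.
	while (!stack.empty()) {
		output.push_back(stack.back().token);
		stack.pop_back();
	}
	return output;
}</pre></td></tr></table></div>

<p>Feeding it the token stream for the sample expression produces the same postfix sequence as the last row of the table.</p>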
<p>&nbsp;</p>
<p><strong>Constant Folding</strong></p>
<p>To further improve efficiency we may want to distinguish the cases where the users have actually written an expression (such as “sin x”) from the cases where they have just written a constant (“0.5”) or a constant valued expression (“2*sin(pi)”). Luckily, constant folding is really easy to do in an RPN expression.</p>
<p>After tokenizing and RPN conversion, the expression “2 * sin(pi)” has been converted to:</p>
<p>2 3.14159265 sin *</p>
<p>We can constant fold a function of arity n if the n arguments that precede it are constants. So in the sample above we can constant fold <em>sin</em> to:</p>
<p>2 <strong>3.14159265 sin</strong> *<br />
2 0 *</p>
<p>Continuing, we can fold *:</p>
<p><strong>2 0 *</strong><br />
0</p>
<p>If we end up with a constant expression, the byte code will be a single PUSH_FLOAT operation. We can detect that and bypass the expression evaluation altogether for that case.</p>
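<p>A single left-to-right pass over the RPN stream is enough to do the folding, because a folded result is itself a constant that later functions can consume. The sketch below assumes a hypothetical <em>Token</em> representation and a three-entry function table, neither of which is taken from the actual source:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical token representation for the RPN stream.
struct Token
{
	enum Type { VALUE, VARIABLE, FUNCTION } type;
	float value; // used when type == VALUE
	int id;      // variable index or function id otherwise
};

enum Function { F_ADD, F_MUL, F_SIN };

static std::size_t arity(int f) { return f == F_SIN ? 1 : 2; }

static float apply(int f, const float *args)
{
	switch (f) {
		case F_ADD: return args[0] + args[1];
		case F_MUL: return args[0] * args[1];
		default:    return std::sin(args[0]);
	}
}

// Replaces every function whose arguments are all constants with its value.
std::vector&lt;Token&gt; fold_constants(const std::vector&lt;Token&gt; &amp;rpn)
{
	std::vector&lt;Token&gt; out;
	for (const Token &amp;t : rpn) {
		if (t.type != Token::FUNCTION) { out.push_back(t); continue; }

		// Check whether the n tokens before the function are all constants.
		std::size_t n = arity(t.id);
		bool constant = out.size() &gt;= n;
		for (std::size_t i = 0; constant &amp;&amp; i &lt; n; ++i)
			constant = out[out.size() - 1 - i].type == Token::VALUE;
		if (!constant) { out.push_back(t); continue; }

		// Replace the arguments with the computed result.
		float args[2];
		assert(n &lt;= 2);
		for (std::size_t i = 0; i &lt; n; ++i)
			args[i] = out[out.size() - n + i].value;
		out.resize(out.size() - n);
		Token folded = { Token::VALUE, apply(t.id, args), 0 };
		out.push_back(folded);
	}
	return out;
}</pre></td></tr></table></div>

<p>If the returned stream consists of a single VALUE token, the whole expression is constant. Note that with 32-bit floats sin(3.14159265) comes out as a very small number rather than exactly 0.</p>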
<p><strong>Source Code</strong></p>
<p>If you want to start playing with these things you can start with my <a href="https://bitbucket.org/bitsquid/expression_language/src">expression language source code</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/03/13/tiny-expression-language/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Managing Decoupling Part 3 &#8211; C++ Duck Typing</title>
		<link>http://www.altdevblogaday.com/2011/02/25/managing-decoupling-part-3-c-duck-typing/</link>
		<comments>http://www.altdevblogaday.com/2011/02/25/managing-decoupling-part-3-c-duck-typing/#comments</comments>
		<pubDate>Fri, 25 Feb 2011 22:24:04 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/?p=1229</guid>
		<description><![CDATA[<p>(Also posted in the <a href="http://bitsquid.blogspot.com/" target="_blank">BitSquid blog</a>.)</p>
<p>Some systems need to manipulate objects whose exact nature is not known. For example, a particle system has to manipulate particles that sometimes have mass, sometimes a full 3D rotation, sometimes only 2D rotation, etc. (A <em>good</em> particle system anyway, a bad particle system could use the same struct for all particles in all effects. And the struct could have some fields called <em>custom_1</em>, <em>custom_2</em> used for different purposes in different effects. And it would be inefficient, inflexible and messy.)</p>
<p><a href="http://www.altdevblogaday.com/2011/02/25/managing-decoupling-part-3-c-duck-typing/" class="more-link">Read more on Managing Decoupling Part 3 &#8211; C++ Duck Typing&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>(Also posted in the <a href="http://bitsquid.blogspot.com/" target="_blank">BitSquid blog</a>.)</p>
<p>Some systems need to manipulate objects whose exact nature is not known. For example, a particle system has to manipulate particles that sometimes have mass, sometimes a full 3D rotation, sometimes only 2D rotation, etc. (A <em>good</em> particle system anyway, a bad particle system could use the same struct for all particles in all effects. And the struct could have some fields called <em>custom_1</em>, <em>custom_2</em> used for different purposes in different effects. And it would be inefficient, inflexible and messy.)</p>
<p>Another example is a networking system tasked with synchronizing game objects between clients and servers. A very general such system might want to treat the objects as open JSON-like structs, with arbitrary fields and values:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #009900;">&#123;</span>
    <span style="color: #3366CC;">&quot;score&quot;</span> <span style="color: #339933;">:</span> <span style="color: #CC0000;">100</span><span style="color: #339933;">,</span>
    <span style="color: #3366CC;">&quot;name&quot;</span> <span style="color: #339933;">:</span> <span style="color: #3366CC;">&quot;Player 1&quot;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>We want to be able to handle such “general” or “open” objects in C++ in a nice way. Since we care about structure we don’t want the system to be strongly coupled to the layout of the objects it manages. And since we are performance junkies, we would like to do it in a way that doesn’t completely kill performance. I.e., we <em>don’t</em> want everything to inherit from a base class Object and define our JSON-like objects as:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">typedef</span> std<span style="color: #008080;">::</span><span style="color: #007788;">map</span><span style="color: #000080;">&lt;</span>std<span style="color: #008080;">::</span><span style="color: #007788;">string</span>,Object <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span> OpenStruct<span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Generally speaking, there are three possible levels of flexibility with which we can work with objects and types in a programming language:</p>
<p><strong>1. Exact typing &#8211; Only ducks are ducks</strong></p>
<p style="padding-left: 30px">We require the object to <em>be</em> of a specific type. This is the typing method used in C and for classes without inheritance in C++.</p>
<p><strong>2. Interface typing &#8211; If it says it’s a duck</strong></p>
<p style="padding-left: 30px">We require the object to inherit from and implement a specific interface type. This is the typing method used by default in Java and C# and in C++ when inheritance and virtual methods are used. It is more flexible than the exact approach, but still introduces a coupling, because it forces the objects we manage to inherit a type defined by us.</p>
<p style="padding-left: 30px">Side rant: My general opinion is that while inheriting <em>interfaces</em> (abstract classes) is a valid and useful design tool, inheriting <em>implementations</em> is usually little more than a glorified “hack”, a way of patching parent classes by inserting custom code here and there. You almost always get a cleaner design when you build your objects with composition instead of with implementation inheritance.</p>
<p><strong>3. Duck typing &#8211; If it quacks like a duck</strong></p>
<p style="padding-left: 30px">We don’t care about the type of the object at all, as long as it has the fields and methods that we need. An example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> integrate_position<span style="color: black;">&#40;</span>o<span style="color: #66cc66;">,</span> dt<span style="color: black;">&#41;</span>:
    o.<span style="color: black;">position</span> <span style="color: #66cc66;">=</span> o.<span style="color: black;">position</span> + o.<span style="color: black;">velocity</span> * dt</pre></td></tr></table></div>

<p style="padding-left: 30px">This method integrates the position of the object <em>o</em>. It doesn’t care what the type of o is, as long as it has a “position” field and a “velocity” field.</p>
<p style="padding-left: 30px">Duck typing is the default in many “scripting” languages such as Ruby, Python, Lua and JavaScript. The reflection interface of Java and C# can also be used for duck typing, but unfortunately the code tends to become far less elegant than in the scripting languages:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="csharp" style="font-family:monospace;">o<span style="color: #008000;">.</span><span style="color: #0000FF;">GetType</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">GetProperty</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;Position&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">SetValue</span><span style="color: #008000;">&#40;</span>o, o<span style="color: #008000;">.</span><span style="color: #0000FF;">GetType</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span>
  <span style="color: #0000FF;">GetProperty</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;Position&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">GetValue</span><span style="color: #008000;">&#40;</span>o, <span style="color: #0600FF; font-weight: bold;">null</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">+</span> o<span style="color: #008000;">.</span><span style="color: #0000FF;">GetType</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span>
  <span style="color: #0000FF;">GetProperty</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;Velocity&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">GetValue</span><span style="color: #008000;">&#40;</span>o, <span style="color: #0600FF; font-weight: bold;">null</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">*</span> dt, <span style="color: #0600FF; font-weight: bold;">null</span><span style="color: #008000;">&#41;</span></pre></td></tr></table></div>

<p>What we want is some way of doing “duck typing” in C++.</p>
<p>Let’s look at inheritance and virtual functions first, since that is the standard way of “generalizing” code in C++. It is true that you could do general objects using the inheritance mechanism. You would create a class structure looking something like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> Object <span style="color: #008000;">&#123;</span>...<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">class</span> Int <span style="color: #008080;">:</span> <span style="color: #0000ff;">public</span> Object <span style="color: #008000;">&#123;</span>...<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">class</span> Float <span style="color: #008080;">:</span> <span style="color: #0000ff;">public</span> Object<span style="color: #008000;">&#123;</span>...<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>and then use <em>dynamic_cast</em> or perhaps your own hand-rolled RTTI system to determine an object’s class.</p>
<p>But there are a number of drawbacks with this approach. It is quite verbose. The virtual function model requires objects to be accessed through pointers, so they (probably) have to be heap allocated. This makes it tricky to get a good memory layout. And that hurts performance. Also, they are not PODs so we will have to do extra work if we want to move them to a co-processor or save them to disk.</p>
<p>So I prefer something much simpler. A generic object is just a type enum followed by the data for the object:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_1.png"><img class="aligncenter size-medium wp-image-1231" src="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_1-300x72.png" alt="" width="300" height="72" /></a></p>
<p>To pass the object you just pass its pointer. To make a copy, you make a copy of the memory block. You can also write it straight to disk and read it back, send it over network or to an SPU for off-core processing.</p>
<p>To extract the data from the object you would do something like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">unsigned</span> type <span style="color: #000080;">=</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span>o<span style="color: #008080;">;</span>
<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>type <span style="color: #000080;">==</span> FLOAT_TYPE<span style="color: #008000;">&#41;</span>
    <span style="color: #0000ff;">float</span> f <span style="color: #000080;">=</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#40;</span>o <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>
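<p>A slightly more defensive version of the same idea is sketched below. It assumes that <em>o</em> is a <em>char *</em> to the start of the object and uses <em>memcpy</em> instead of pointer casts, which sidesteps alignment and aliasing concerns; compilers typically turn this into the same plain loads and stores. The tag values and the function names are hypothetical:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;cstring&gt;

enum { FLOAT_TYPE = 1, INT_TYPE = 2 }; // hypothetical type tags

// Writes a float object (type tag followed by the data) to a raw buffer.
// Returns the number of bytes written.
inline unsigned write_float(char *o, float f)
{
	assert(o);
	unsigned type = FLOAT_TYPE;
	std::memcpy(o, &amp;type, sizeof type);
	std::memcpy(o + sizeof type, &amp;f, sizeof f);
	return unsigned(sizeof type + sizeof f);
}

// Reads a float object. Returns false if the object has another type.
inline bool read_float(const char *o, float &amp;f)
{
	assert(o);
	unsigned type;
	std::memcpy(&amp;type, o, sizeof type);
	if (type != FLOAT_TYPE) return false;
	std::memcpy(&amp;f, o + sizeof type, sizeof f);
	return true;
}</pre></td></tr></table></div>

<p>The caller never needs to know where the object came from. It only has to check the return value to find out if the object quacks like a float.</p>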

<p>You don’t really need that many different object types: <em>bool</em>, <em>int</em>, <em>float</em>, <em>vector3</em>, <em>quaternion</em>, <em>string</em>, <em>array</em> and <em>dictionary</em> are usually enough. You can build more complicated types as aggregates of those, just as you do in JSON.</p>
<p>For a dictionary object we just store the name/key and type of each object:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_2.png"><img class="aligncenter size-large wp-image-1232" src="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_2-1024x138.png" alt="" width="725" height="97" /></a></p>
<p>I tend to use a four byte value for the name/key and not care if it is an integer, float or a 32-bit string hash. As long as the data is queried with the same key that it was stored with, the right value will be returned. I only use this method for small structs, so the probability of a hash collision is close to zero and can be handled by “manual resolution”.</p>
<p>If we have many objects with the same “dictionary type” (i.e. the same set of fields, just different values) it makes sense to break out the definition of the type from the data itself to save space:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_3.png"><img class="aligncenter size-large wp-image-1233" src="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_3-1024x312.png" alt="" width="725" height="220" /></a></p>
<p>Here the <em>offset</em> field stores the offset of each field in the data block. Now we can efficiently store an array of such data objects with just one copy of the dictionary type information:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_4.png"><img class="aligncenter size-large wp-image-1234" src="http://altdevblogaday.com/wp-content/uploads/2011/02/duck_typing_4-1024x162.png" alt="" width="725" height="114" /></a></p>
<p>Note that the storage space (and thereby the cache and memory performance) is exactly the same as if we were using an array of regular C structs, even though we are using a completely open free form JSON-like struct. And extracting or changing data just requires a little pointer arithmetic and a cast.</p>
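<p>To make the pointer arithmetic concrete, here is a rough sketch of a shared type description and a duck-typed version of the <em>integrate_position</em> example running over an array of data blocks. The struct layouts, key values and function names are assumptions made for the illustration, and the positions are one-dimensional floats to keep it short:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cassert&gt;
#include &lt;cstring&gt;

enum FieldType { FLOAT_TYPE, INT_TYPE };                    // hypothetical type tags
enum { MASS_KEY = 1, POSITION_KEY = 2, VELOCITY_KEY = 3 };  // hypothetical keys

// One entry in the shared "dictionary type" description.
struct Field
{
	unsigned key;    // four byte name/key, for example a string hash
	unsigned type;   // type tag of the value
	unsigned offset; // byte offset of the value inside each data block
};

struct DictionaryType
{
	const Field *fields;
	unsigned num_fields;
	unsigned size;   // size in bytes of one data block
};

// Returns a pointer to the field inside `object`, or nullptr if the type
// has no field with that key and type.
inline char *find(const DictionaryType &amp;dt, char *object, unsigned key, unsigned type)
{
	for (unsigned i = 0; i &lt; dt.num_fields; ++i)
		if (dt.fields[i].key == key &amp;&amp; dt.fields[i].type == type)
			return object + dt.fields[i].offset;
	return nullptr;
}

// Integrates the position of `count` tightly packed objects. It works for any
// dictionary type that has a float position and a float velocity.
bool integrate_positions(const DictionaryType &amp;dt, char *objects, unsigned count, float dt_seconds)
{
	assert(dt.size &gt; 0);
	char *p = find(dt, objects, POSITION_KEY, FLOAT_TYPE);
	char *v = find(dt, objects, VELOCITY_KEY, FLOAT_TYPE);
	if (!p || !v) return false;
	for (unsigned i = 0; i &lt; count; ++i, p += dt.size, v += dt.size) {
		float position, velocity;
		std::memcpy(&amp;position, p, sizeof position);
		std::memcpy(&amp;velocity, v, sizeof velocity);
		position += velocity * dt_seconds;
		std::memcpy(p, &amp;position, sizeof position);
	}
	return true;
}</pre></td></tr></table></div>

<p>The field lookup happens once per array rather than once per object, so the inner loop is just strided loads and stores.</p>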
<p>This would be a good way of storing particles in a particle system. (Note: This is an array-of-structures approach, you can of course also use duck typing with a structure-of-arrays approach. I leave that as an exercise to the reader.)</p>
<p>If you are a graphics programmer all of this should look pretty familiar. The “dictionary type description” is very much like a “vertex data description” and the “dictionary data” is awfully similar to “vertex data”. This should come as no big surprise. Vertex data is generic flexible data that needs to be processed fast in parallel on in-order processing units. It is not strange that with the same design criteria we end up with a similar solution.</p>
<p><strong>Moral and musings</strong></p>
<p>It is OK to manipulate blocks of raw memory! Pointer arithmetic does not destroy your program! Type casts are not “dirty”! Let your freak flag fly!</p>
<p>Data-oriented-design and object-oriented design are not polar opposites. As this example shows, a data-oriented design can in a sense be “more object-oriented” than a standard C++ virtual function design, i.e., more similar to how objects work in high level languages such as Ruby and Lua.</p>
<p>On the other hand, data-oriented-design and inheritance <em>are</em> enemies. Because designs based on base class pointers and virtual functions want objects to live individually allocated on the heap. Which means you cannot control the memory layout. Which is what DOD is all about. (Yes, you can probably do clever tricks with custom allocators and patching of vtables for moving or deserializing objects, but why bother, DOD is simpler.)</p>
<p>You could also store function pointers in these open structs. Then you would have something very similar to Ruby/Lua objects. This could probably be used for something great. This is left as an exercise to the reader.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/02/25/managing-decoupling-part-3-c-duck-typing/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Managing Coupling Part 2 — Polling, Callbacks and Events</title>
		<link>http://www.altdevblogaday.com/2011/02/11/managing-coupling-part-2-%e2%80%94-polling-callbacks-and-events/</link>
		<comments>http://www.altdevblogaday.com/2011/02/11/managing-coupling-part-2-%e2%80%94-polling-callbacks-and-events/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 01:13:00 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.org/2011/02/11/managing-coupling-part-2-%e2%80%94-polling-callbacks-and-events/</guid>
		<description><![CDATA[<p>(Also posted in <a href="http://bitsquid.blogspot.com/">http://bitsquid.blogspot.com/</a>)</p>
<p>In my last post, I talked a bit about the importance of decoupling and how one of the fundamental challenges in system design is to keep systems decoupled while still allowing the necessary interactions to take place.</p>
<p><a href="http://www.altdevblogaday.com/2011/02/11/managing-coupling-part-2-%e2%80%94-polling-callbacks-and-events/" class="more-link">Read more on Managing Coupling Part 2 — Polling, Callbacks and Events&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>(Also posted in <a href="http://bitsquid.blogspot.com/">http://bitsquid.blogspot.com/</a>)</p>
<p>In my last post, I talked a bit about the importance of decoupling and how one of the fundamental challenges in system design is to keep systems decoupled while still allowing the necessary interactions to take place.</p>
<p>This time I will look at one specific such challenge: when a low level system needs to notify a high level system that something has happened. For example, the animation system may want to notify the gameplay system that the character’s foot has touched the ground, so that a footstep sound can be played.</p>
<p>(Note that the reverse is not a problem. The high level system knows about the low level system and can call it directly. But the low level system shouldn’t know or care about the high level system.)</p>
<p>There are three common techniques for handling such notifications: polling, callbacks and events.</p>
<h3>Polling</h3>
<p>A polling system calls some function every frame to check if the event it is interested in has occurred. Has the file been downloaded yet? What about now? Are we there yet?</p>
<p>Polling is often considered “ugly” or “inefficient”. And indeed, in the desktop world, polling is very impolite, since it means busy-waiting and tying up 100% of the CPU in doing nothing.</p>
<p>But in game development the situation is completely different. We are already doing a ton of stuff every 33 ms (or half a ton of stuff every 17 ms). As long as we don’t poll a huge number of objects, polling won’t have any impact on the framerate.</p>
<p>And code that uses polling is often easier to write and ends up better designed than code that uses callbacks or events. For example, it is much easier to just check if the A key is pressed inside the character controller, than to write a callback that gets notified if A is pressed and somehow forward that information to the character controller.</p>
<p>So, in my opinion, you should actually prefer to use polling whenever possible (i.e., when you don’t have to monitor a huge number of objects).</p>
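<p>As a rough sketch of the character controller case (the InputSystem and CharacterController types below are made up for illustration and not taken from any real engine), polling can be as simple as this:</p>

```cpp
#include <cassert>

// Hypothetical low level input system: it only records the current key state.
struct InputSystem {
    bool key_down[256] = {};

    bool is_down(unsigned key) const {
        assert(key < 256);
        return key_down[key];
    }
};

// Hypothetical high level character controller that polls once per frame.
struct CharacterController {
    int jumps = 0;

    void update(const InputSystem &input) {
        // Poll: ask the question every frame instead of registering a callback.
        if (input.is_down('A'))
            ++jumps;
    }
};
```

<p>The input system knows nothing about the controller, and the controller needs no forwarding code to find out about the key press.</p>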
<p>Some areas where polling works well are: file downloads, server browsing, game saving, controller input, etc.</p>
<p>An area less suited for polling is physics collisions, since there are N*N possible collisions that you would have to poll for. (You could argue that rather than polling for a collision between two specific objects, you could poll for a collision between any two objects. My reply would be that in that case you are no longer strictly polling; you are in fact using a rudimentary event system.)</p>
<h3>Callbacks</h3>
<p>In a callback solution, the low level system stores a list of high level functions to call when certain events occur.</p>
<p>An important question when it comes to callbacks is if the callback should be called immediately when the event occurs, or if it should be queued up and scheduled for execution later in the frame.</p>
<p>I much prefer the latter approach. If you do callbacks immediately, you not only trash your instruction and data caches, you also prevent multithreading (unless you use locks everywhere to prevent the callbacks from stepping on each other). And you open yourself up to the nasty bug where a callback, through a chain of events, ends up destroying the very objects you are looping over.</p>
<p>It is much better to queue up all callbacks and only execute them when the high level system asks for it (with an execute_callbacks() call). That way you always know when the callbacks occur. Side effects can be minimized and the code flow is clearer. Also, with this approach there is no problem with generating callbacks on the SPU and merging the queue with other callback queues later.</p>
<p>The only thing you need to worry about with delayed callbacks is that the objects that the callback refers to might have been destroyed between the time when the callback was generated and the time when it was actually called. But this is neatly handled by using the ID reference system that I talked about in the previous post. Using that technique, the callback can always determine if the objects still exist.</p>
<p>Note that the callback system outlined here has some similarities with the polling system — in that the callbacks only happen when we explicitly poll for them.</p>
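<p>A minimal sketch of such a delayed callback queue could look like the following. The AnimationSystem class, the simulate_footstep() function and the callback signature are assumptions made for the example; only the execute_callbacks() name comes from the text above.</p>

```cpp
#include <cassert>
#include <vector>

typedef void (*FootstepCallback)(void *user_data, unsigned foot);

// Hypothetical low level system that queues notifications instead of
// calling into high level code in the middle of its own update.
class AnimationSystem {
public:
    AnimationSystem() : _callback(nullptr), _user_data(nullptr) {}

    void set_footstep_callback(FootstepCallback f, void *user_data) {
        assert(f != nullptr);
        _callback = f;
        _user_data = user_data;
    }

    // Stands in for the real animation update: it only records what happened.
    void simulate_footstep(unsigned foot) { _pending.push_back(foot); }

    // The high level system decides when the queued callbacks run.
    void execute_callbacks() {
        std::vector<unsigned> pending;
        pending.swap(_pending);   // callbacks may safely queue new footsteps
        for (unsigned foot : pending)
            if (_callback)
                _callback(_user_data, foot);
    }

private:
    FootstepCallback _callback;
    void *_user_data;
    std::vector<unsigned> _pending;
};

// Example callback: counts footsteps in an int supplied as user data.
void count_footstep(void *user_data, unsigned /*foot*/) {
    ++*static_cast<int *>(user_data);
}
```

<p>Nothing happens when the footstep is recorded. The callback only runs when the high level code calls execute_callbacks(), so it is always clear where in the frame the side effects occur.</p>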
<p>It is not self-evident how to represent a callback in C++. You might be tempted to use a member function pointer. Don’t. The casting and typing rules make it near impossible to use them for any kind of generic callback mechanism. Also, don’t use an “observer pattern”, where the callback must be some object that inherits from an AnimationEventObserver class and overrides handle_animation_event(). That just leads to tons of typing and unnecessary heap allocation.</p>
<p>There is an interesting article about fast and efficient C++ delegates at http://www.codeproject.com/KB/cpp/FastDelegate.aspx. It looks solid, but personally I’m not comfortable with making something that requires so many platform specific tricks one of the core mechanisms of my engine.</p>
<p>So instead I use regular C function pointers for callbacks. This means that if I want to call a member function, I have to make a little static function that calls the member function. That is a bit annoying, but better than the alternatives.</p>
<p>(Isn’t it interesting that when you try to design a clean and flexible C++ API it often ends up as pure C?)</p>
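<p>The little static function can be sketched like this. FootstepPlayer, on_event() and the EventCallback signature are hypothetical names chosen for the example.</p>

```cpp
#include <cassert>

// Generic C style callback type: the first argument is opaque user data.
typedef void (*EventCallback)(void *user_data, unsigned event_id);

// Hypothetical high level class that wants to be notified.
class FootstepPlayer {
public:
    FootstepPlayer() : last_event(0), calls(0) {}

    void on_event(unsigned event_id) {
        last_event = event_id;
        ++calls;
    }

    // Static function that matches EventCallback and forwards to the member.
    static void on_event_thunk(void *user_data, unsigned event_id) {
        assert(user_data != nullptr);
        static_cast<FootstepPlayer *>(user_data)->on_event(event_id);
    }

    unsigned last_event;
    int calls;
};
```

<p>The low level system only ever sees a plain function pointer and a void *, so it does not need the FootstepPlayer header at all.</p>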
<p>When you use C callbacks you typically also want to pass some data to them. The typical approach in the C world is to use a void * to “user data” that is passed to the callback function. I actually prefer a slightly different approach. Since I sometimes want to pass more data than a single void * I use something like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> Callback16
<span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">void</span> <span style="color: #008000;">&#40;</span><span style="color: #000040;">*</span>f<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">char</span> data<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>There aren’t a huge number of callbacks, so using 16 bytes instead of 8 to store them doesn’t matter. You could go to Callback32 if you want the option to store even more data.</p>
<p>When calling the callback, I cast the function pointer to the appropriate type and pass a pointer to its data as the first parameter.</p>
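
<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">typedef void (*AnimationEventCallback)(void *, unsigned);
AnimationEventCallback f = (AnimationEventCallback)callback.f;
f(callback.data, event_id);</pre></td></tr></table></div>
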
<p>I’m not worried about casting the function pointer back and forth between a generic type and a specific one or about casting the data in and out of a raw buffer. Type safety is nice, but there is an awful lot of power in juggling blocks of raw memory. And you don’t have to worry that much about someone casting the data to the wrong type, because doing so will 99% of the time cause a huge spectacular crash, and the error will be fixed immediately.</p>
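<p>Putting the pieces together, a self-contained sketch might look like this. FootstepData, count_event(), make_callback() and invoke() are hypothetical helpers added for the example. Note also that the struct is only 16 bytes when function pointers are 4 bytes; on a 64-bit target it will be larger.</p>

```cpp
#include <cassert>
#include <cstring>

struct Callback16 {
    void (*f)(void);
    char data[12];
};

typedef void (*AnimationEventCallback)(void *, unsigned);

// Hypothetical payload. It must fit in the 12 byte data buffer.
struct FootstepData {
    int *counter;   // where to record the events
};
static_assert(sizeof(FootstepData) <= 12, "payload must fit in Callback16::data");

// Example callback: adds the event id to the counter stored in the payload.
void count_event(void *data, unsigned event_id) {
    FootstepData d;
    std::memcpy(&d, data, sizeof(d));   // copy out of the raw buffer
    *d.counter += static_cast<int>(event_id);
}

// Stores the function pointer as a generic type and the payload as raw bytes.
Callback16 make_callback(AnimationEventCallback f, const FootstepData &d) {
    Callback16 c;
    c.f = reinterpret_cast<void (*)(void)>(f);
    std::memset(c.data, 0, sizeof(c.data));
    std::memcpy(c.data, &d, sizeof(d));
    return c;
}

// Casts back to the specific type and passes the data buffer as first argument.
void invoke(Callback16 &c, unsigned event_id) {
    assert(c.f != nullptr);
    AnimationEventCallback f = reinterpret_cast<AnimationEventCallback>(c.f);
    f(c.data, event_id);
}
```

<p>Copying the payload in and out with memcpy() sidesteps any alignment concerns with the char buffer, at the cost of a few extra bytes moved per call.</p>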
<h3>Events</h3>
<p>Event systems are in many ways similar to callback systems. The only difference is that instead of storing a direct pointer to a callback function, they store an event enum. The high level system that polls the events decides what action to take for each enum.</p>
<p>In my opinion, callbacks work better when you want to listen to specific notifications: “Tell me when this sound has finished playing.” Events work better when you process them in bulk: “Check all collision notifications to see if the forces involved are strong enough to break the objects.” But much of it is a matter of taste.</p>
<p>For storing the event queues (or callback queues) I just use a raw buffer (Vector&lt;char&gt; or char[FIXED_SIZE]) where I concatenate all events and their data:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="txt" style="font-family:monospace;">[event_1_enum] [event_1_data] [event_2_enum] [event_2_data] …</pre></td></tr></table></div>

<p>The high level system just steps through this buffer, processing each event in turn. Note that event queues like this are easy to move, copy, merge and transfer between cores. (Again, the power of raw data buffers.)</p>
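<p>Here is one way such a buffer could be written and read, using std::vector as the raw buffer. The event types, payload structs and function names are all invented for the example, and the layout assumes that writer and reader run on the same platform.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical event types and payloads for a physics system.
enum EventType { COLLISION = 1, SOUND_FINISHED = 2 };
struct CollisionData { unsigned a, b; float force; };
struct SoundFinishedData { unsigned sound_id; };

typedef std::vector<char> EventBuffer;

// Appends [enum][data] to the end of the buffer.
template <class T>
void push_event(EventBuffer &buf, EventType type, const T &data) {
    const char *t = reinterpret_cast<const char *>(&type);
    const char *d = reinterpret_cast<const char *>(&data);
    buf.insert(buf.end(), t, t + sizeof(type));
    buf.insert(buf.end(), d, d + sizeof(data));
}

// Steps through the buffer and counts the collisions above a force threshold.
int count_strong_collisions(const EventBuffer &buf, float threshold) {
    int count = 0;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        EventType type;
        std::memcpy(&type, &buf[pos], sizeof(type));
        pos += sizeof(type);
        switch (type) {
        case COLLISION: {
            CollisionData c;
            std::memcpy(&c, &buf[pos], sizeof(c));
            pos += sizeof(c);
            if (c.force > threshold)
                ++count;
            break;
        }
        case SOUND_FINISHED:
            pos += sizeof(SoundFinishedData);   // not interesting here, skip payload
            break;
        default:
            assert(false && "unknown event type");
            return count;
        }
    }
    return count;
}
```

<p>Merging two such queues is just a matter of appending one buffer to the other, which is what makes them convenient to pass between cores.</p>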
<p>In this design there is only a single high level system that polls the events of a particular low level system. It understands what all the events mean, what data they use and knows how to act on them. The sole purpose of the event system (it is not even much of a “system”, just a stream of data) is to pass notifications from the low level to the high.</p>
<p>This is in my opinion exactly what an event system should be. It should not be a magic global switchboard that dispatches events from all over the code to whoever wants to listen to them. Because that would be horrid!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/02/11/managing-coupling-part-2-%e2%80%94-polling-callbacks-and-events/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Managing Decoupling</title>
		<link>http://www.altdevblogaday.com/2011/01/26/managing-decoupling/</link>
		<comments>http://www.altdevblogaday.com/2011/01/26/managing-decoupling/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 21:26:00 +0000</pubDate>
		<dc:creator>Niklas Frykholm</dc:creator>
		
		<guid isPermaLink="false">http://altdevblogaday.org/2011/01/26/managing-decoupling/</guid>
		<description><![CDATA[<div>The only way of staying sane while writing a large complex software system is to regard it as a collection of smaller, simpler systems. And this is only possible if the systems are properly decoupled.</div>
<p><a href="http://www.altdevblogaday.com/2011/01/26/managing-decoupling/" class="more-link">Read more on Managing Decoupling&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<div>The only way of staying sane while writing a large complex software system is to regard it as a collection of smaller, simpler systems. And this is only possible if the systems are properly decoupled.</div>
<p />
<div>Ideally, each system should be completely isolated. The effect system should be the only system manipulating effects and it shouldn&rsquo;t do anything else. It should have its own <span style="font-family: courier new,monospace;">update()</span> call just for updating effects. No other system should care how the effects are stored in memory or what parts of the update happen on the CPU, SPU or GPU. A new programmer wanting to understand the system should only have to look at the files in the <span style="font-family: courier new,monospace;">effect_system</span> directory. It should be possible to optimize, rewrite or drop the entire system without affecting any other code.</div>
<p />
<div>Of course, complete isolation is not possible. If anything interesting is going to happen, different systems will at some point have to talk to one another, whether we like it or not.</div>
<p />
<div>The main challenge in keeping an engine &ldquo;healthy&rdquo; is to keep the systems as decoupled as possible while still allowing the necessary interactions to take place. If a system is properly decoupled, adding features is simple. Want a wind effect in your particle system? Just write it. It&rsquo;s just code. It shouldn&rsquo;t take more than a day. But if you are working in a tightly coupled project, such seemingly simple changes can stretch out into nightmarish day-long debugging marathons.</div>
<p />
<div>If you ever get the feeling that you would prefer to test an idea out in a simple toy project rather than in &ldquo;the real engine&rdquo;, that&rsquo;s a clear sign that you have too much coupling.</div>
<p />
<div>Sometimes, engines start out decoupled, but then as deadlines approach and features are requested that don&rsquo;t fit the well-designed APIs, programmers get tempted to open back doors between systems and introduce couplings that shouldn&rsquo;t really be there. Slowly, through this &ldquo;coupling creep&rdquo; the quality of the code deteriorates and the engine becomes less and less pleasant to work with.</div>
<p />
<div>Still, programmers cannot lock themselves in their ivory towers. &ldquo;That feature doesn&rsquo;t fit my API,&rdquo; is never an acceptable answer to give a budding artist. Instead, we need to find ways of handling the challenges of coupling without destroying our engines. Here are four quick ideas to begin with:</div>
<p />
<div><strong>1. Be wary of &ldquo;frameworks&rdquo;.</strong></div>
<p />
<div>By a &ldquo;framework&rdquo; I mean any kind of system that requires all your other code to conform to a specific world view. For example, a scripting system that requires you to add a specific set of macro tags to all your class declarations.</div>
<p />
<div>Other common culprits are:</div>
<div>
<ul>
<li>Root classes that every object must inherit from</li>
<li>RTTI/reflection systems</li>
<li>Serialization systems</li>
<li>Reference counting systems</li>
</ul>
</div>
<div>Such global systems introduce a coupling across the entire engine. They rudely enforce certain design choices on all subsystems, design choices which might not be appropriate for them. Sometimes the consequences are serious. A badly thought out reference system may prevent subsystems from multithreading. A less than stellar serialization system can make linear loading impossible.</div>
<p />
<div>Often, the motivation given for such global systems is that they increase maintainability. With a global serialization system, we just have to make changes at a single place. So refactoring is much easier, it is claimed.</div>
<p />
<div>But in practice, the reverse is often true. After a while, the global system has infested so much of the code base that making any significant change to it is virtually impossible. There are just too many things that would have to be changed, all at the same time.</div>
<p />
<div>You would be much better off if each system just defined its own <span style="font-family: courier new,monospace;">save()</span> and <span style="font-family: courier new,monospace;">load()</span> functions.</div>
<p />
<div><strong>2. Use high level systems to mediate between low level systems.</strong></div>
<p />
<div>Instead of directly coupling low level systems, use a high level system to shuffle data between them. For example, handling footstep sounds might involve the animation system, the sound system and the material system. But none of these systems should know about the others.</div>
<p />
<div>So instead of directly coupling them, let the gameplay system handle their interactions. Since the gameplay system knows about all three systems, it can poll the animation system for events defined in the animation data, sample the ground material from the material system and then ask the sound system to play the appropriate sound.</div>
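<p />
<div>A toy version of that arrangement might look like the sketch below. All the types and functions are placeholders invented for the example; the point is only that the three low level systems never refer to each other.</div>

```cpp
#include <cassert>
#include <vector>

enum Material { GRASS, GRAVEL };
enum Sound { FOOTSTEP_GRASS, FOOTSTEP_GRAVEL };

const unsigned MAX_UNITS = 4;

// Three hypothetical low level systems. None of them knows about the others.
struct AnimationSystem {
    std::vector<unsigned> footstep_events;   // units whose foot hit the ground
};

struct MaterialSystem {
    Material ground[MAX_UNITS];   // material under each unit, filled in elsewhere

    Material ground_material(unsigned unit) const {
        assert(unit < MAX_UNITS);
        return ground[unit];
    }
};

struct SoundSystem {
    std::vector<Sound> played;
    void play(Sound s) { played.push_back(s); }
};

// The gameplay layer knows about all three and shuffles data between them.
void update_footsteps(AnimationSystem &anim, const MaterialSystem &mat,
                      SoundSystem &sound) {
    for (unsigned unit : anim.footstep_events) {
        Material m = mat.ground_material(unit);
        sound.play(m == GRAVEL ? FOOTSTEP_GRAVEL : FOOTSTEP_GRASS);
    }
    anim.footstep_events.clear();
}
```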
<p />
<div>Make sure that you have a clear separation between this messy gameplay layer, that can poke around in all other systems, and your clean engine code that is isolated and decoupled. Otherwise there is always a risk that the mess propagates downwards and infects your clean systems.</div>
<p />
<div>In the BitSquid Tech we put the messy stuff either in Lua or in Flow (our visual scripting tool, similar to Unreal&rsquo;s Kismet). The language barrier acts as a firewall, preventing the spread of the messiness.</div>
<p />
<div><strong>3. Duplicating code is sometimes OK!</strong></div>
<p />
<div>Avoiding duplicated code is one of the fundamentals of software design. Entities should not be needlessly multiplied. But there are instances when you are better off breaking this rule.</div>
<p />
<div>I&rsquo;m not advocating copy-paste-programming or writing complicated algorithms twice. I&rsquo;m saying that sometimes people can get a little overzealous with their code reuse. Code sharing has a price that is not always recognized, in that it increases system coupling. Sometimes a little judiciously applied code duplication can be a better solution.</div>
<p />
<div>A typical example is the <span style="font-family: courier new,monospace;">String</span> class (or <span style="font-family: courier new,monospace;">std::string</span> if you are thusly inclined). In some projects you see the <span style="font-family: courier new,monospace;">String</span> class used almost everywhere. If something is a string, it should use the <span style="font-family: courier new,monospace;">String</span> class, the reasoning seems to be. But many systems that handle strings do not need all the features that you find in your typical <span style="font-family: courier new,monospace;">String</span> class: locales, <span style="font-family: courier new,monospace;">find_first_of()</span>, etc. They are fine with just a <span style="font-family: courier new,monospace;">const char *</span>, <span style="font-family: courier new,monospace;">strcmp()</span> and maybe one custom written (potentially duplicated) three-line function. So why not use that? The code will be much simpler and easier to move to SPUs.</div>
<p />
<div>Another culprit is <span style="font-family: courier new,monospace;">FixedArray&lt;int, 5&gt; a</span>. Sure, if you write <span style="font-family: courier new,monospace;">int a[5]</span> instead you will have to duplicate the code for bounds checking if you want that. But your code can be understood and compiled without <span style="font-family: courier new,monospace;">fixed_array.h</span> and template instantiation.</div>
<p />
<div>And if you have any method that takes a <span style="font-family: courier new,monospace;">const Vector&lt;T&gt; &amp;v</span> as argument you should probably take <span style="font-family: courier new,monospace;">const T *begin, const T *end</span> instead. Now you don&rsquo;t need the <span style="font-family: courier new,monospace;">vector.h</span> header, and the caller is not forced to use a particular <span style="font-family: courier new,monospace;">Vector</span> class for storage.</div>
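<p />
<div>For instance, a function written against a pointer range (the <span style="font-family: courier new,monospace;">sum()</span> function here is just an illustration) needs no container header at all:</div>

```cpp
#include <cassert>

// Takes a range of T rather than a const Vector<T> &, so the caller is free
// to store the data in a plain array, a vector or anything else contiguous.
template <class T>
T sum(const T *begin, const T *end) {
    assert(begin <= end);
    T total = T();
    for (const T *p = begin; p != end; ++p)
        total += *p;
    return total;
}
```

<div>A caller with a plain <span style="font-family: courier new,monospace;">int a[5]</span> passes <span style="font-family: courier new,monospace;">a</span> and <span style="font-family: courier new,monospace;">a + 5</span>; a caller with some vector class passes pointers to its first and one-past-last elements.</div>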
<p />
<div>A final example: I just wrote a patching tool that manipulates our bundles (aka pak-files). That tool duplicates the code for parsing the bundle headers, which is already in the engine. Why? Well, the tool is written in C# and the engine in C++, but in this case that is kind of beside the point. The point is that sharing that code would have been a significant effort.</div>
<p />
<div>First, it would have had to be broken out into a separate library, together with the related parts of the engine. Then, since the tool requires some functionality that the engine doesn&rsquo;t (to parse bundles with foreign endianness) I would have to add a special function for the tool, and probably a <span style="font-family: courier new,monospace;">#define TOOL_COMPILE</span> since I don&rsquo;t want that function in the regular builds. This means I need a special build configuration for the tool. And the engine code would forever be dirtied with the <span style="font-family: courier new,monospace;">TOOL_COMPILE</span> flag. And I wouldn&rsquo;t be able to rearrange the engine code as I wanted in the future, since that might break the tool compile.</div>
<p />
<div>In contrast, rewriting the code for parsing the headers was only 10 minutes of work. It just reads a vector of string hashes. It&#8217;s not rocket science. Sure, if I ever decide to change the bundle format, I might have to spend another 10 minutes rewriting that code. I think I can live with that.</div>
<p />
<div>Writing code is not the problem. The messy, complicated couplings that prevent you from writing code are the problem.</div>
<p />
<div><strong>4. Use IDs to refer to external objects.</strong></div>
<p />
<div>At some point one of your systems will have to refer to objects belonging to another system. For example, the gameplay layer may have to move an effect around or change its parameters.</div>
<p />
<div>I find that the most decoupled way of doing that is by using an ID. Let&rsquo;s consider the alternatives.</div>
<p />
<div><span style="font-family: courier new,monospace;">Effect *, shared_ptr&lt;Effect&gt;</span></div>
<p />
<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div>A direct pointer is no good, because it will become invalid if the target object is deleted and the effect system should have full control over when and how its objects are deleted. A standard <span style="font-family: courier new,monospace;">shared_ptr</span> won&rsquo;t work for the same reason: it puts the lifetime of <span style="font-family: courier new,monospace;">Effect</span> objects out of the control of the effect system.</div>
</blockquote>
<p />
<div><span style="font-family: courier new,monospace;">weak_ptr&lt;Effect&gt;, handle&lt;Effect&gt;</span></div>
<p />
<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div>By this I mean some kind of reference-counted, indirect pointer to the object. This is better, but still too strongly coupled for my taste. The indirect pointer will be accessed both by the external system (for dereferencing and changing the reference count) and by the effect system (for deleting the <span style="font-family: courier new,monospace;">Effect</span> object or moving it in memory). This has the potential for creating threading problems.</div>
</blockquote>
<p />
<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div>Also, this construct kind of implies that external systems can dereference and use the <span style="font-family: courier new,monospace;">Effect</span> whenever they want to. Perhaps the effect system only allows that when its <span style="font-family: courier new,monospace;">update()</span> loop is not running and wants to <span style="font-family: courier new,monospace;">assert()</span> that. Or perhaps the effect system doesn&rsquo;t want to allow direct access to its objects at all, but instead double buffer all changes.</div>
</blockquote>
<p />
<div>So, in order to allow the effect system to freely reorganize its data and processing in any way it likes, I use IDs to identify objects externally. The IDs are just integers, each uniquely identifying an object, that the user can throw away when she is done with them. They don&rsquo;t have to be &ldquo;released&rdquo; like a <span style="font-family: courier new,monospace;">weak_ptr</span>, which removes a point of interaction between the systems. It also means that the IDs are PODs. We can copy and move them freely in memory, juggle them in Lua and DMA them back-and-forth to our heart&rsquo;s content. All of this would be a lot more complicated if we had to keep reference counts.</div>
<p />
<div>In the system we need a fast way of mapping IDs back to objects. Note that <span style="font-family: courier new,monospace;">std::map&lt;unsigned, Object *&gt;</span> is not a fast way! But there are a number of possibilities. The simplest is to just use a fixed size array with object pointers:</div>
<p />
<div><span style="font-family: courier new,monospace;">Object *lookup[MAX_OBJECTS];</span></div>
<p />
<div>If your system has a maximum of 4096 objects, use 12 bits from the key to store an index into this array and the remaining 20 bits as a unique identifier (i.e., to detect the case when the original object has been deleted and a new object has been created at the same index). If you need lots of objects, you can go to a 64 bit ID.</div>
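<p />
<div>A bare-bones sketch of this scheme, assuming 32 bit IDs, could look as follows. The <span style="font-family: courier new,monospace;">ObjectTable</span> class and its linear search for a free slot are simplifications made for the example, and wraparound of the 20 bit counter is ignored.</div>

```cpp
#include <cassert>

const unsigned INDEX_BITS = 12;
const unsigned MAX_OBJECTS = 1u << INDEX_BITS;   // 4096
const unsigned INDEX_MASK = MAX_OBJECTS - 1;

struct Object {
    unsigned id;   // full ID of the object currently stored in this slot
    int value;     // placeholder payload
};

class ObjectTable {
public:
    ObjectTable() : _next_unique(1) {
        for (unsigned i = 0; i < MAX_OBJECTS; ++i)
            _lookup[i] = nullptr;
    }
    ~ObjectTable() {
        for (unsigned i = 0; i < MAX_OBJECTS; ++i)
            delete _lookup[i];
    }
    ObjectTable(const ObjectTable &) = delete;
    ObjectTable &operator=(const ObjectTable &) = delete;

    // Creates an object in the first free slot and returns its ID:
    // low 12 bits are the slot index, high 20 bits a unique counter.
    unsigned create(int value) {
        for (unsigned i = 0; i < MAX_OBJECTS; ++i) {
            if (!_lookup[i]) {
                unsigned id = (_next_unique++ << INDEX_BITS) | i;
                _lookup[i] = new Object{id, value};
                return id;
            }
        }
        assert(false && "object table is full");
        return 0;
    }

    void destroy(unsigned id) {
        if (Object *o = lookup(id)) {
            _lookup[id & INDEX_MASK] = nullptr;
            delete o;
        }
    }

    // Returns nullptr if the object is gone, even if its slot has been reused.
    Object *lookup(unsigned id) const {
        Object *o = _lookup[id & INDEX_MASK];
        return (o && o->id == id) ? o : nullptr;
    }

private:
    Object *_lookup[MAX_OBJECTS];
    unsigned _next_unique;
};
```

<div>An external system that holds a stale ID simply gets a null pointer back from <span style="font-family: courier new,monospace;">lookup()</span>, so it never has to be told that the object was deleted.</div>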
<p />
<p />
<div>That&#8217;s it for today, but this post really just scratches the surface of decoupling. There are a lot of other interesting techniques to look at, such as events, callbacks and &ldquo;duck typing&rdquo;. Maybe something for a future entry&#8230;</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/01/26/managing-decoupling/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>
