<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>#AltDevBlogADay &#187; John-Carmack</title>
	<atom:link href="http://www.altdevblogaday.com/author/john-carmack/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.altdevblogaday.com</link>
	<description>Each day a little more #gamedev love</description>
	<lastBuildDate>Wed, 08 May 2013 03:07:50 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Latency Mitigation Strategies</title>
		<link>http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/</link>
		<comments>http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/#comments</comments>
		<pubDate>Fri, 22 Feb 2013 17:25:46 +0000</pubDate>
		<dc:creator>John-Carmack</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29198</guid>
		<description><![CDATA[
<p><b>Abstract</b></p>
<p>Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint.  The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.</p>
<p><a href="http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/" class="more-link">Read more on Latency Mitigation Strategies&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[
<p><b>Abstract</b></p>
<p>Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint.  The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.</p>
<p>Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible.  Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.</p>
<p>A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.</p>
<p><b>Introduction</b></p>
<p>Updating the imagery in a head mounted display (HMD) based on a head tracking sensor is a subtly different challenge than most human / computer interactions.  With a conventional mouse or game controller, the user is consciously manipulating an interface to complete a task, while the goal of virtual reality is to have the experience accepted at an unconscious level.</p>
<p>Users can adapt to control systems with a significant amount of latency and still perform challenging tasks or enjoy a game; many thousands of people enjoyed playing early network games, even with 400+ milliseconds of latency between pressing a key and seeing a response on screen.</p>
<p>If large amounts of latency are present in the VR system, users may still be able to perform tasks, but it will be by the much less rewarding means of using their head as a controller, rather than accepting that their head is naturally moving around in a stable virtual world.  Perceiving latency in the response to head motion is also one of the primary causes of simulator sickness.  Other technical factors that affect the quality of a VR experience, like head tracking accuracy and precision, may interact with the perception of latency, or, like display resolution and color depth, be largely orthogonal to it.</p>
<p>A total system latency of 50 milliseconds will feel responsive, but still subtly lagging.  One of the easiest ways to see the effects of latency in a head mounted display is to roll your head side to side along the view vector while looking at a clear vertical edge.  Latency will show up as an apparent tilting of the vertical line with the head motion; the view feels “dragged along” with the head motion.  When the latency is low enough, the virtual world convincingly feels like you are simply rotating your view of a stable world.</p>
<p>Extrapolation of sensor data can be used to mitigate some system latency, but even with a sophisticated model of the motion of the human head, there will be artifacts as movements are initiated and changed.  It is always better to not have a problem than to mitigate it, so true latency reduction should be aggressively pursued, leaving extrapolation to smooth out sensor jitter issues and perform only a small amount of prediction.</p>
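<p>As a minimal sketch of the small amount of prediction described above, assume a single rotation axis and a constant angular velocity over the prediction interval.  The function name and parameters are hypothetical; a real predictor would work on full orientations and filtered rates.</p>

```python
def predict_yaw(yaw_deg, yaw_rate_deg_per_s, lookahead_ms):
    """Linearly extrapolate a yaw angle a short time into the future.

    Assumes the angular velocity stays constant over the lookahead, which
    is exactly what breaks when a movement is initiated or changed.
    """
    return yaw_deg + yaw_rate_deg_per_s * (lookahead_ms / 1000.0)


def worst_case_overshoot_deg(yaw_rate_deg_per_s, lookahead_ms):
    """Error if the head stops dead right after the sample was taken."""
    return abs(yaw_rate_deg_per_s) * (lookahead_ms / 1000.0)
```

<p>The overshoot grows linearly with the lookahead, which is one way to see why prediction is best kept to a few milliseconds on top of true latency reduction.</p>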
<p><b>Data collection</b></p>
<p>It is not usually possible to introspectively measure the complete system latency of a VR system, because the sensors and display devices external to the host processor make significant contributions to the total latency.  An effective technique is to record high speed video that simultaneously captures the initiating physical motion and the eventual display update.  The system latency can then be determined by single stepping the video and counting the number of video frames between the two events.</p>
<p>In most cases there will be a significant jitter in the resulting timings due to aliasing between sensor rates, display rates, and camera rates, but conventional applications tend to display total latencies in the dozens of 240 fps video frames.</p>
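<p>The frame counting arithmetic can be sketched as follows.  Each event is only known to have happened somewhere within the camera frame before it first appears, so a count of N frames bounds the true delay to within one frame period either way.  The function name is hypothetical.</p>

```python
def latency_bounds_ms(frames_between, camera_fps=240.0):
    """Bound the latency implied by a frame count from high speed video.

    frames_between is the number of video frames from the frame where the
    physical motion is first visible to the frame where the display update
    is first visible.  Both events occurred at an unknown point in the frame
    period before they were captured, so the true delay lies within one
    frame period of the counted value.
    """
    frame_ms = 1000.0 / camera_fps
    low = max(frames_between - 1, 0) * frame_ms
    high = (frames_between + 1) * frame_ms
    return low, high
```

<p>For example, <code>latency_bounds_ms(12)</code> brackets a latency of roughly 46 to 54 milliseconds, and a count of zero (both events in the same frame) bounds the latency by a single frame period.</p>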
<p>On an unloaded Windows 7 system with the compositing Aero desktop interface disabled, a gaming mouse dragging a window displayed on a 180 Hz CRT monitor can show a response on screen in the same 240 fps video frame that the mouse was seen to first move, demonstrating an end to end latency below roughly four milliseconds.  Many systems need to cooperate for this to happen: The mouse updates 500 times a second, with no filtering or buffering.  The operating system immediately processes the update, and immediately performs GPU accelerated rendering directly to the framebuffer without any page flipping or buffering.  The display accepts the video signal with no buffering or processing, and the screen phosphors begin emitting new photons within microseconds.</p>
<p>In a typical VR system, many things go far less optimally, sometimes resulting in end to end latencies of over 100 milliseconds.</p>
<p><b>Sensors</b></p>
<p>Detecting a physical action can be as simple as watching a circuit close for a button press, or as complex as analyzing a live video feed to infer position and orientation.</p>
<p>In the old days, executing an IO port input instruction could directly trigger an analog to digital conversion on an ISA bus adapter card, giving a latency on the order of a microsecond and no sampling jitter issues.  Today, sensors are systems unto themselves, and may have internal pipelines and queues that need to be traversed before the information is even put on the USB serial bus to be transmitted to the host.</p>
<p>Analog sensors have an inherent tension between random noise and sensor bandwidth, and some combination of analog and digital filtering is usually done on a signal before returning it.  Sometimes this filtering is excessive, which can contribute significant latency and remove subtle motions completely.</p>
<p>Communication bandwidth delay on older serial ports or wireless links can be significant in some cases.  If the sensor messages occupy the full bandwidth of a communication channel, latency equal to the repeat time of the sensor is added simply for transferring the message.  Video data streams can stress even modern wired links, which may encourage the use of data compression; compression usually adds another full frame of latency unless it is explicitly implemented in a pipelined manner.</p>
<p>Filtering and communication are constant delays, but the discretely packetized nature of most sensor updates introduces a variable latency, or “jitter”, as the sensor data is used for a video frame rate that differs from the sensor frame rate.  This latency ranges from close to zero if the sensor packet arrived just before it was queried, up to the repeat time for sensor messages.  Most USB HID devices update at 125 samples per second, giving a jitter of up to 8 milliseconds, but it is possible to receive 1000 updates a second from some USB hardware.  The operating system may impose an additional random delay of up to a couple of milliseconds between the arrival of a message and a user mode application getting the chance to process it, even on an unloaded system.</p>
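<p>A small sketch of the sampling jitter, assuming sensor packets arrive at perfectly regular intervals starting at time zero and ignoring the operating system delay.  The function names are hypothetical.</p>

```python
def max_jitter_ms(sensor_hz):
    """Worst-case age of the newest sample: one sensor repeat time."""
    return 1000.0 / sensor_hz


def sample_age_ms(query_time_ms, sensor_hz):
    """Age of the newest sample at query time.

    Assumes samples arrive at t = 0, T, 2T, ... where T is the repeat time.
    """
    return query_time_ms % max_jitter_ms(sensor_hz)


if __name__ == "__main__":
    frame_ms = 1000.0 / 60.0
    # Age of the newest 125 Hz sample at the start of six 60 Hz frames.
    ages = [sample_age_ms(frame * frame_ms, 125.0) for frame in range(6)]
    print([round(age, 2) for age in ages])
```

<p>Because the 60 Hz frame time is not a multiple of the 8 millisecond repeat time, the age of the data drifts from frame to frame, which is the jitter described above.</p>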
<p><b>Displays</b></p>
<p>On old CRT displays, the voltage coming out of the video card directly modulated the voltage of the electron gun, which caused the screen phosphors to begin emitting photons a few microseconds after a pixel was read from the frame buffer memory.</p>
<p>Early LCDs were notorious for “ghosting” during scrolling or animation, still showing traces of old images many tens of milliseconds after the image was changed, but significant progress has been made in the last two decades.  The transition times for LCD pixels vary based on the start and end values being transitioned between, but a good panel today will have a switching time around ten milliseconds, and optimized displays for active 3D and gaming can have switching times less than half that.</p>
<p>Modern displays are also expected to perform a wide variety of processing on the incoming signal before they change the actual display elements.  A typical Full HD display today will accept 720p or interlaced composite signals and convert them to the 1920&#215;1080 physical pixels.  24 fps movie footage will be converted to 60 fps refresh rates.  Stereoscopic input may be converted from side-by-side, top-down, or other formats to frame sequential for active displays, or interlaced for passive displays.  Content protection may be applied.  Many consumer oriented displays have started applying motion interpolation and other sophisticated algorithms that require multiple frames of buffering.</p>
<p>Some of these processing tasks could be handled by only buffering a single scan line, but some of them fundamentally need one or more full frames of buffering, and display vendors have tended to implement the general case without optimizing for the cases that could be done with low or no delay.  Some consumer displays wind up buffering three or more frames internally, resulting in 50 milliseconds of latency even when the input data could have been fed directly into the display matrix.</p>
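<p>The buffering figure follows directly from the refresh rate.  A minimal sketch, with a hypothetical function name:</p>

```python
def buffering_latency_ms(buffered_frames, refresh_hz=60.0):
    """Latency added by a display that holds whole frames before showing them."""
    return buffered_frames * 1000.0 / refresh_hz


if __name__ == "__main__":
    for frames in (1, 2, 3):
        print(frames, "frame(s):", round(buffering_latency_ms(frames), 1), "ms")
```

<p>Three buffered frames at 60 Hz come to 50 milliseconds, which is already more than twice the 20 millisecond target before any sensor or host latency is counted.</p>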
<p>Some less common display technologies have speed advantages over LCD panels; OLED pixels can have switching times well under a millisecond, and laser displays are as instantaneous as CRTs.</p>
<p>A subtle latency point is that most displays present an image incrementally as it is scanned out from the computer, which has the effect that the bottom of the screen changes 16 milliseconds later than the top of the screen on a 60 fps display.  This is rarely a problem on a static display, but on a head mounted display it can cause the world to appear to shear left and right, or “waggle” as the head is rotated, because the source image was generated for an instant in time, but different parts are presented at different times.  This effect is usually masked by switching times on LCD HMDs, but it is obvious with fast OLED HMDs.</p>
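<p>The size of the shear can be estimated with a short sketch.  It assumes a constant yaw rate, a uniform number of pixels per degree across the display, and no vertical blanking interval; all names and the example numbers are hypothetical.</p>

```python
def row_delay_ms(row, total_rows, refresh_hz=60.0):
    """Time after the start of scanout at which a given row is presented."""
    frame_ms = 1000.0 / refresh_hz
    return frame_ms * row / total_rows


def shear_pixels(yaw_rate_deg_per_s, pixels_per_degree, total_rows, refresh_hz=60.0):
    """Horizontal offset between the top and bottom rows during a steady yaw."""
    delay_s = row_delay_ms(total_rows, total_rows, refresh_hz) / 1000.0
    return yaw_rate_deg_per_s * delay_s * pixels_per_degree


if __name__ == "__main__":
    # A 120 degree per second head turn on a display with 10 pixels per degree.
    print(round(shear_pixels(120.0, 10.0, 1080), 1), "pixels of shear")
```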
<p><b>Host processing</b></p>
<p>The classic processing model for a game or VR application is:</p>
<p>Read user input -&gt; run simulation -&gt; issue rendering commands -&gt; graphics drawing -&gt; wait for vsync -&gt; scanout</p>
<p>I = Input sampling and dependent calculation<br />
S = simulation / game execution<br />
R = rendering engine<br />
G = GPU drawing time<br />
V = video scanout time</p>
<p>All latencies are based on a frame time of roughly 16 milliseconds, a progressively scanned display, and zero sensor and pixel latency.</p>
<p>If the performance demands of the application are well below what the system can provide, a straightforward implementation with no parallel overlap will usually provide fairly good latency values.  However, if running synchronized to the video refresh, the minimum latency will still be 16 ms even if the system is infinitely fast.   This rate feels good for most eye-hand tasks, but it is still a perceptible lag that can be felt in a head mounted display, or in the responsiveness of a mouse cursor.</p>
<pre>Ample performance, vsync:
ISRG------------|VVVVVVVVVVVVVVVV|
.................. latency 16 – 32 milliseconds</pre>
<p>Running without vsync on a very fast system will deliver better latency, but only over a fraction of the screen, and with visible tear lines.  The impact of the tear lines is related to the disparity between the two frames that are being torn between, and the amount of time that the tear lines are visible.  Tear lines look worse on a continuously illuminated LCD than on a CRT or laser projector, and worse on a 60 fps display than a 120 fps display.  Somewhat counteracting that, slow switching LCD panels blur the impact of the tear line relative to the faster displays.</p>
<p>If enough frames were rendered such that each scan line had a unique image, the effect would be of a “rolling shutter”, rather than visible tear lines, and the image would feel continuous.  Unfortunately, even rendering 1000 frames a second, giving approximately 15 bands on screen separated by tear lines, is still quite objectionable on fast switching displays, and few scenes are capable of being rendered at that rate, let alone 60x higher for a true rolling shutter on a 1080P display.</p>
<pre>Ample performance, unsynchronized:
ISRG
VVVVV
..... latency 5 – 8 milliseconds at ~200 frames per second</pre>
<p>In most cases, performance is a constant point of concern, and a parallel pipelined architecture is adopted to allow multiple processors to work in parallel instead of sequentially.  Large command buffers on GPUs can hold an entire frame of drawing commands, allowing the GPU work to overlap the work on the CPU; this generally gives a significant frame rate boost at the expense of added latency.</p>
<pre>CPU:ISSSSSRRRRRR----|
GPU:                |GGGGGGGGGGG----|
VID:                |               |VVVVVVVVVVVVVVVV|
    .................................. latency 32 – 48 milliseconds</pre>
<p>When the CPU load for the simulation and rendering no longer fits in a single frame, multiple CPU cores can be used in parallel to produce more frames.  It is possible to reduce frame execution time without increasing latency in some cases, but the natural split of simulation and rendering has often been used to allow effective pipeline parallel operation.  Work queue approaches buffered for maximum overlap can cause an additional frame of latency if they are on the critical user responsiveness path.</p>
<pre>CPU1:ISSSSSSSS-------|
CPU2:                |RRRRRRRRR-------|
GPU :                |                |GGGGGGGGGG------|
VID :                |                |                |VVVVVVVVVVVVVVVV|
     .................................................... latency 48 – 64 milliseconds</pre>
<p>Even if an application is running at a perfectly smooth 60 fps, it can still have host latencies of over 50 milliseconds, and an application targeting 30 fps could have twice that.   Sensor and display latencies can add significant additional amounts on top of that, so the goal of 20 milliseconds motion-to-photons latency is challenging to achieve.</p>
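<p>The latency ranges in the timelines above follow a simple pattern: every pipeline stage that occupies a whole frame before scanout adds one frame time, and the scanout itself spreads the result over one more.  A minimal sketch of that model, using the same simplifications as the timelines (16 millisecond frames, zero sensor and pixel latency); the names are hypothetical:</p>

```python
FRAME_MS = 16  # roughly one 60 Hz frame, as in the timelines above


def latency_range_ms(frames_before_scanout, frame_ms=FRAME_MS):
    """Latency range when input is sampled at the start of the pipeline.

    frames_before_scanout is the number of whole frames between the input
    sample and the start of scanout.  The first scan line is shown after
    that delay and the last scan line one frame later.
    """
    low = frames_before_scanout * frame_ms
    return low, low + frame_ms


CONFIGS = {
    "ample performance, vsync": 1,
    "CPU and GPU pipelined": 2,
    "simulation, rendering, and GPU pipelined": 3,
}

if __name__ == "__main__":
    for name, frames in CONFIGS.items():
        low, high = latency_range_ms(frames)
        print(f"{name}: {low} to {high} ms")
```

<p>Passing <code>frame_ms=33</code> for a 30 fps application roughly doubles every figure, which is the point made above.</p>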
<p><b>Latency Reduction Strategies</b></p>
<p><b>Prevent GPU buffering</b></p>
<p>The drive to win frame rate benchmark wars has led driver writers to aggressively buffer drawing commands, and there have even been cases where drivers ignored explicit calls to glFinish() in the name of improved “performance”.  Today’s fence primitives do appear to be reliably observed for drawing primitives, but the semantics of buffer swaps are still worryingly imprecise.  A recommended sequence of commands to synchronize with the vertical retrace and idle the GPU is:</p>
<p>SwapBuffers();<br />
DrawTinyPrimitive();<br />
InsertGPUFence();<br />
BlockUntilFenceIsReached();</p>
<p>While this should always prevent excessive command buffering on any conformant driver, it could conceivably fail to provide an accurate vertical sync timing point if the driver was transparently implementing triple buffering.</p>
<p>To minimize the performance impact of synchronizing with the GPU, it is important to have sufficient work ready to send to the GPU immediately after the synchronization is performed.  The details of exactly when the GPU can begin executing commands are platform specific, but execution can be explicitly kicked off with glFlush() or equivalent calls.  If the code issuing drawing commands does not proceed fast enough, the GPU may complete all the work and go idle with a “pipeline bubble”.  Because the CPU time to issue a drawing command may have little relation to the GPU time required to draw it, these pipeline bubbles may cause the GPU to take noticeably longer to draw the frame than if it were completely buffered.  Ordering the drawing so that larger and slower operations happen first will provide a cushion, as will pushing as much preparatory work as possible before the synchronization point.</p>
<pre>Run GPU with minimal buffering:
CPU1:ISSSSSSSS-------|
CPU2:                |RRRRRRRRR-------|
GPU :                |-GGGGGGGGGG-----|
VID :                |                |VVVVVVVVVVVVVVVV|
     ................................... latency 32 – 48 milliseconds</pre>
<p>Tile based renderers, as are found in most mobile devices, inherently require a full scene of command buffering before they can generate their first tile of pixels, so synchronizing before issuing any commands will destroy far more overlap.  In a modern rendering engine there may be multiple scene renders for each frame to handle shadows, reflections, and other effects, but increased latency is still a fundamental drawback of the technology.</p>
<p>High end, multiple GPU systems today are usually configured for AFR, or Alternate Frame Rendering, where each GPU is allowed to take twice as long to render a single frame, but the overall frame rate is maintained because there are two GPUs producing frames.</p>
<pre>Alternate Frame Rendering dual GPU:
CPU1:ISSSSSSSS-------|ISSSSSSSS-------|
CPU2:                |RRRRRRRRR-------|RRRRRRRRR-------|
GPU1:                | GGGGGGGGGGGGGGGGGGGGGGGG--------|
GPU2:                |                | GGGGGGGGGGGGGGGGGGGGGGG---------|
VID :                |                |                |VVVVVVVVVVVVVVVV|
     .................................................... latency 48 – 64 milliseconds</pre>
<p>Similarly to the case with CPU workloads, it is possible to have two or more GPUs cooperate on a single frame in a way that delivers more work in a constant amount of time, but it increases complexity and generally delivers a lower total speedup.</p>
<p>An attractive direction for stereoscopic rendering is to have each GPU on a dual GPU system render one eye, which would deliver maximum performance and minimum latency, at the expense of requiring the application to maintain buffers across two independent rendering contexts.</p>
<p>The downside to preventing GPU buffering is that throughput performance may drop, resulting in more dropped frames under heavily loaded conditions.</p>
<p><b>Late frame scheduling</b></p>
<p>Much of the work in the simulation task does not depend directly on the user input, or would be insensitive to a frame of latency in it.  If the user processing is done last, and the input is sampled just before it is needed, rather than stored off at the beginning of the frame, the total latency can be reduced.</p>
<p>It is very difficult to predict the time required for the general simulation work on the entire world, but the work just for the player’s view response to the sensor input can be made essentially deterministic.  If this is split off from the main simulation task and delayed until shortly before the end of the frame, it can remove nearly a full frame of latency.</p>
<pre>Late frame scheduling:
CPU1:SSSSSSSSS------I|
CPU2:                |RRRRRRRRR-------|
GPU :                |-GGGGGGGGGG-----|
VID :                |                |VVVVVVVVVVVVVVVV|
                    .................... latency 18 – 34 milliseconds</pre>
<p>Adjusting the view is the most latency sensitive task; actions resulting from other user commands, like animating a weapon or interacting with other objects in the world, are generally insensitive to an additional frame of latency, and can be handled in the general simulation task the following frame.</p>
<p>The drawback to late frame scheduling is that it introduces a tight scheduling requirement that usually requires busy waiting to meet, wasting power.  If your frame rate is determined by the video retrace rather than an arbitrary time slice, assistance from the graphics driver in accurately determining the current scanout position is helpful.</p>
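<p>A minimal sketch of the busy wait, assuming the end of the frame is known as a timestamp on the same clock the application reads.  The function name, the margin, and the <code>read_input</code> callback are hypothetical; a real implementation would derive the deadline from the video retrace.</p>

```python
import time


def sample_input_late(frame_deadline, margin_s, read_input):
    """Spin until margin_s before the deadline, then read the input.

    frame_deadline is a time.perf_counter() timestamp.  Spinning instead of
    sleeping avoids scheduler wake-up delays, at the cost of burning power.
    margin_s must cover the deterministic view response work that follows.
    """
    target = frame_deadline - margin_s
    while target > time.perf_counter():
        pass  # busy wait
    return read_input()


if __name__ == "__main__":
    start = time.perf_counter()
    deadline = start + 0.016  # end of a roughly 16 ms frame
    sampled_at = sample_input_late(deadline, 0.002, time.perf_counter)
    print(f"input sampled {(sampled_at - start) * 1000.0:.2f} ms into the frame")
```

<p>The general simulation work runs first; only the short, predictable view update is left for the time between the late sample and the end of the frame.</p>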
<p><b>View bypass</b></p>
<p>An alternate way of accomplishing a similar, or slightly greater, latency reduction is to allow the rendering code to modify the parameters delivered to it by the game code, based on a newer sampling of user input.</p>
<p>At the simplest level, the user input can be used to calculate a delta from the previous sampling to the current one, which can be used to modify the view matrix that the game submitted to the rendering code.</p>
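<p>One way to sketch the delta calculation is with unit quaternions.  This assumes the game built its view orientation as a body orientation multiplied by the head orientation it sampled, so the delta is expressed in the head's local frame and applied on the right; the function names and that convention are assumptions for illustration.</p>

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def axis_angle_quat(axis, degrees):
    """Unit quaternion for a rotation of `degrees` about a unit-length axis."""
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def bypass_view(game_view, head_at_sim, head_now):
    """Apply the head rotation that happened since the game sampled input."""
    delta = quat_mul(quat_conj(head_at_sim), head_now)
    return quat_mul(game_view, delta)
```

<p>If the head has not moved since the game sampled it, the delta is the identity and the submitted view is used unchanged.</p>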
<p>Delta processing in this way is minimally intrusive, but there will often be situations where the user input should not affect the rendering, such as cinematic cut scenes or when the player has died.  It can be argued that a game designed from scratch for virtual reality should avoid those situations, because a non-responsive view in a HMD is disorienting and unpleasant, but conventional game design has many such cases.</p>
<p>A binary flag could be provided to disable the bypass calculation, but it is useful to generalize such that the game provides an object or function with embedded state that produces rendering parameters from sensor input data instead of having the game provide the view parameters themselves.  In addition to handling the trivial case of ignoring sensor input, the generator function can incorporate additional information such as a head/neck positioning model that modifies position based on orientation, or lists of other models to be positioned relative to the updated view.</p>
<p>If the game and rendering code are running in parallel, it is important that the parameter generation function does not reference any game state to avoid race conditions.</p>
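<p>A generator object of this kind might look like the following sketch.  It is restricted to yaw for brevity, and the class name, fields, and the 0.1 meter neck-to-eye distance are hypothetical.  Everything it needs is copied in at construction time, so the render thread can call it without touching live game state.</p>

```python
import math


class ViewGenerator:
    """Game-supplied object that turns sensor input into view parameters."""

    def __init__(self, body_position, body_yaw_deg, neck_to_eye=0.1, use_sensor=True):
        self.body_position = body_position  # (x, y, z) of the neck pivot
        self.body_yaw_deg = body_yaw_deg
        self.neck_to_eye = neck_to_eye      # meters forward of the pivot
        self.use_sensor = use_sensor        # False for cut scenes or a dead player

    def __call__(self, sensor_yaw_deg):
        head_yaw = sensor_yaw_deg if self.use_sensor else 0.0
        yaw_deg = self.body_yaw_deg + head_yaw
        yaw = math.radians(yaw_deg)
        # Head/neck model: the eyes swing around the neck pivot as the head turns.
        x, y, z = self.body_position
        eye = (
            x + self.neck_to_eye * math.cos(yaw),
            y + self.neck_to_eye * math.sin(yaw),
            z,
        )
        return {"eye": eye, "yaw_deg": yaw_deg}
```

<p>The game would build a fresh generator each frame and hand it to the renderer along with the scene; the renderer calls it with the newest sensor reading just before building the view matrix.</p>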
<pre>View bypass:
CPU1:ISSSSSSSSS------|
CPU2:                |IRRRRRRRRR------|
GPU :                |--GGGGGGGGGG----|
VID :                |                |VVVVVVVVVVVVVVVV|
                      .................. latency 16 – 32 milliseconds</pre>
<p>The input is only sampled once per frame, but it is simultaneously used by both the simulation task and the rendering task.  Some input processing work is now duplicated by the simulation task and the render task, but it is generally minimal.</p>
<p>The latency for parameters produced by the generator function is now reduced, but other interactions with the world, like muzzle flashes and physics responses, remain at the same latency as the standard model.</p>
<p>A modified form of view bypass could allow tile based GPUs to achieve similar view latencies to non-tiled GPUs, or allow non-tiled GPUs to achieve 100% utilization without pipeline bubbles by the following steps:</p>
<ol>
<li>Inhibit the execution of GPU commands, forcing them to be buffered.  OpenGL has only the deprecated display list functionality to approximate this, but a control extension could be formulated.</li>
<li>All calculations that depend on the view matrix must reference it independently from a buffer object, rather than from inline parameters or as a composite model-view-projection (MVP) matrix.</li>
<li>After all commands have been issued and the next frame has started, sample the user input, run it through the parameter generator, and put the resulting view matrix into the buffer object for referencing by the draw commands.</li>
<li>Kick off the draw command execution.</li>
</ol>
<pre>Tiler optimized view bypass:
CPU1:ISSSSSSSSS------|
CPU2:                |IRRRRRRRRRR-----|I
GPU :                |                |-GGGGGGGGGG-----|
VID :                |                |                |VVVVVVVVVVVVVVVV|
                                       .................. latency 16 – 32 milliseconds</pre>
<p>Any view frustum culling that was performed to avoid drawing some models may be invalid if the new view matrix has changed substantially enough from what was used during the rendering task.  This can be mitigated at some performance cost by using a larger frustum field of view for culling, and hardware clip planes based on the culling frustum limits can be used to guarantee a clean edge if necessary.  Occlusion errors from culling, where a bright object is seen that should have been occluded by an object that was incorrectly culled, are very distracting, but a temporary clean encroaching of black at a screen edge during rapid rotation is almost unnoticeable.</p>
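<p>How much larger the culling frustum needs to be can be estimated from the fastest head rotation to be tolerated and the time between culling and the late input sample.  A rough sketch, considering yaw only; the function name and the example numbers are assumptions:</p>

```python
def culling_fov_deg(display_fov_deg, max_yaw_rate_deg_per_s, bypass_window_ms):
    """Field of view to cull with so models stay drawn after a late view update.

    The view can rotate by at most rate * window in either direction between
    culling and the late input sample, so both edges are padded by that angle.
    """
    margin = max_yaw_rate_deg_per_s * bypass_window_ms / 1000.0
    return display_fov_deg + 2.0 * margin


if __name__ == "__main__":
    # 90 degree display, 300 degrees per second, one 16 ms frame of bypass.
    print(round(culling_fov_deg(90.0, 300.0, 16.0), 1), "degrees")
```

<p>Rotations faster than the chosen limit would briefly expose the black edge described above rather than producing occlusion errors, provided the clip planes are in place.</p>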
<p><b>Time warping</b></p>
<p>If you had perfect knowledge of how long the rendering of a frame would take, some additional amount of latency could be saved by late frame scheduling the entire rendering task, but this is not practical due to the wide variability in frame rendering times.</p>
<pre>Late frame input sampled view bypass:
CPU1:ISSSSSSSSS------|
CPU2:                |----IRRRRRRRRR--|
GPU :                |------GGGGGGGGGG|
VID :                |                |VVVVVVVVVVVVVVVV|
                          .............. latency 12 – 28 milliseconds</pre>
<p>However, a post processing task on the rendered image can be counted on to complete in a fairly predictable amount of time, and can be late scheduled more easily.  Any pixel on the screen, along with the associated depth buffer value, can be converted back to a world space position, which can be re-transformed to a different screen space pixel location for a modified set of view parameters.</p>
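<p>The round trip from pixel and depth to world space and back to a new pixel can be sketched with a simple pinhole camera.  The sketch assumes pixel coordinates measured from the image center, a linear view space depth (a real depth buffer value would need to be linearized first), and a view that can only yaw and translate; all names are hypothetical.</p>

```python
import math


def rotate_y(point, degrees):
    """Rotate a point about the vertical (y) axis."""
    x, y, z = point
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return (c * x + s * z, y, -s * x + c * z)


def unproject(pixel, depth, view, focal_px):
    """Pixel plus view space depth to a world space point."""
    u, v = pixel
    cam_point = (u * depth / focal_px, v * depth / focal_px, depth)
    wx, wy, wz = rotate_y(cam_point, view["yaw_deg"])
    px, py, pz = view["position"]
    return (wx + px, wy + py, wz + pz)


def project(world_point, view, focal_px):
    """World space point to a pixel in the given view (point must be in front)."""
    px, py, pz = view["position"]
    rel = (world_point[0] - px, world_point[1] - py, world_point[2] - pz)
    x, y, z = rotate_y(rel, -view["yaw_deg"])
    return (focal_px * x / z, focal_px * y / z)


def reproject(pixel, depth, old_view, new_view, focal_px):
    """Where a rendered pixel lands on screen for updated view parameters."""
    world = unproject(pixel, depth, old_view, focal_px)
    return project(world, new_view, focal_px)
```

<p>When the two views differ only in direction, the depth cancels out and the result is the same for every depth value; depth only matters once the view position changes.</p>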
<p>After drawing a frame with the best information at your disposal, possibly with bypassed view parameters, instead of displaying it directly, fetch the latest user input, generate updated view parameters, and calculate a transformation that warps the rendered image into a position that approximates where it would be with the updated parameters.  Using that transform, warp the rendered image into an updated form on screen that reflects the new input.  If there are two dimensional overlays present on the screen that need to remain fixed, they must be drawn or composited in after the warp operation, to prevent them from incorrectly moving as the view parameters change.</p>
<pre>Late frame scheduled time warp:
CPU1:ISSSSSSSSS------|
CPU2:                |RRRRRRRRRR----IR|
GPU :                |-GGGGGGGGGG----G|
VID :                |                |VVVVVVVVVVVVVVVV|
                                    .... latency 2 – 18 milliseconds</pre>
<p>If the difference between the view parameters at the time of the scene rendering and the time of the final warp is only a change in direction, the warped image can be almost exactly correct within the limits of the image filtering.  Effects that are calculated relative to the screen, like depth based fog (versus distance based fog) and billboard sprites, will be slightly different, but not in a manner that is objectionable.</p>
<p>If the warp involves translation as well as direction changes, geometric silhouette edges begin to introduce artifacts where internal parallax would have revealed surfaces not visible in the original rendering.  A scene with no silhouette edges, like the inside of a box, can be warped significant amounts and display only changes in texture density, but translation warping realistic scenes will result in smears or gaps along edges.  In many cases these are difficult to notice, and they always disappear when motion stops, but first person view hands and weapons are a prominent case.  This can be mitigated by limiting the amount of translation warp, compressing or making constant the depth range of the scene being warped to limit the dynamic separation, or rendering the disconnected near field objects as a separate plane, to be composited in after the warp.</p>
<p>If an image is being warped to a destination with the same field of view, most warps will leave some corners or edges of the new image undefined, because none of the source pixels are warped to their locations.  This can be mitigated by rendering a larger field of view than the destination requires; but simply leaving unrendered pixels black is surprisingly unobtrusive, especially in a wide field of view HMD.</p>
<p>A forward warp, where source pixels are deposited in their new positions, offers the best accuracy for arbitrary transformations.  At the limit, the frame buffer and depth buffer could be treated as a height field, but millions of half pixel sized triangles would have a severe performance cost.  Using a grid of triangles at some fraction of the depth buffer resolution can bring the cost down to a very low level, and the trivial case of treating the rendered image as a single quad avoids all silhouette artifacts at the expense of incorrect pixel positions under translation.</p>
<p>Reverse warping, where the pixel in the source rendering is estimated based on the position in the warped image, can be more convenient because it is implemented completely in a fragment shader.  It can produce identical results for simple direction changes, but additional artifacts near geometric boundaries are introduced if per-pixel depth information is considered, unless considerable effort is expended to search a neighborhood for the best source pixel.</p>
<p>If desired, it is straightforward to incorporate motion blur in a reverse mapping by taking several samples along the line from the pixel being warped to the transformed position in the source image.</p>
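<p>A sketch of the sampling pattern, written on the CPU for clarity although it would normally live in a fragment shader.  The <code>image_lookup</code> callback stands in for a filtered texture fetch and, like the other names, is hypothetical.</p>

```python
def blur_sample_positions(dest_pixel, source_pixel, samples):
    """Evenly spaced points from dest_pixel to source_pixel, inclusive."""
    if samples == 1:
        return [source_pixel]
    (dx, dy), (sx, sy) = dest_pixel, source_pixel
    return [
        (dx + (sx - dx) * i / (samples - 1), dy + (sy - dy) * i / (samples - 1))
        for i in range(samples)
    ]


def blurred_value(image_lookup, dest_pixel, source_pixel, samples=4):
    """Average the source image along the warp line for one output pixel."""
    points = blur_sample_positions(dest_pixel, source_pixel, samples)
    return sum(image_lookup(p) for p in points) / len(points)
```

<p>With a single sample this reduces to the plain reverse warp; more samples trade fetches for a smoother streak along the direction of motion.</p>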
<p>Reverse mapping also allows the possibility of modifying the warp through the video scanout.  The view parameters can be predicted ahead in time to when the scanout will read the bottom row of pixels, which can be used to generate a second warp matrix.  The warp to be applied can be interpolated between the two of them based on the pixel row being processed.  This can correct for the “waggle” effect on a progressively scanned head mounted display, where the 16 millisecond difference in time between the display showing the top line and bottom line results in a perceived shearing of the world under rapid rotation on fast switching displays.</p>
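<p>The per-row interpolation can be sketched as an element-wise blend of the two warp matrices.  A linear blend of two rotations is not exactly a rotation, but for the small difference accumulated over one scanout it is assumed here to be close enough; the names are hypothetical.</p>

```python
def lerp_matrix(a, b, t):
    """Element-wise blend of two equally sized matrices (lists of rows)."""
    return [
        [x + (y - x) * t for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def warp_for_row(row, total_rows, warp_top, warp_bottom):
    """Warp matrix for one scan line, blended by its position in the scanout.

    warp_top is built from the view predicted for the first scan line and
    warp_bottom from the view predicted for the last one.
    """
    t = row / (total_rows - 1)
    return lerp_matrix(warp_top, warp_bottom, t)
```

<p>In a fragment shader the same blend would be driven by the fragment's row coordinate, so the whole frame can still be warped in a single pass.</p>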
<p><b>Continuously updated time warping</b></p>
<p>If the necessary feedback and scheduling mechanisms are available, instead of predicting what the warp transformation should be at the bottom of the frame and warping the entire screen at once, the warp to screen can be done incrementally while continuously updating the warp matrix as new input arrives.</p>
<pre>Continuous time warp:
CPU1:ISSSSSSSSS------|
CPU2:                |RRRRRRRRRRR-----|
GPU :                |-GGGGGGGGGGGG---|
WARP:                |               W| W W W W W W W W|
VID :                |                |VVVVVVVVVVVVVVVV|
                                     ... latency 2 – 3 milliseconds for 500hz sensor updates</pre>
<p>The ideal interface for doing this would be some form of “scanout shader” that would be called “just in time” for the video display.  Several video game systems like the Atari 2600, Jaguar, and Nintendo DS have had buffers ranging from half a scan line to several scan lines that were filled up in this manner.</p>
<p>Without new hardware support, it is still possible to incrementally perform the warping directly to the front buffer being scanned for video, and not perform a swap buffers operation at all.</p>
<p>A CPU core could be dedicated to the task of warping scan lines at roughly the speed they are consumed by the video output, updating the time warp matrix each scan line to blend in the most recently arrived sensor information.</p>
<p>GPUs can perform the time warping operation much more efficiently than a conventional CPU can, but the GPU will be busy drawing the next frame during video scanout, and GPU drawing operations cannot currently be scheduled with high precision due to the difficulty of task switching the deep pipelines and extensive context state.  However, modern GPUs are beginning to allow compute tasks to run in parallel with graphics operations, which may allow a fraction of a GPU to be dedicated to performing the warp operations as a shared parameter buffer is updated by the CPU.</p>
<p><b>Discussion</b></p>
<p>View bypass and time warping are complementary techniques that can be applied independently or together.  Time warping can warp from a source image at an arbitrary view time / location to any other one, but artifacts from internal parallax and screen edge clamping are reduced by using the most recent source image possible, which view bypass rendering helps provide.</p>
<p>Actions that require simulation state changes, like flipping a switch or firing a weapon, still need to go through the full pipeline, for 32 – 48 milliseconds of latency depending on which scan line the result winds up displaying on.  Translational information may not be completely faithfully represented below the 16 – 32 milliseconds of the view bypass rendering.  The critical head orientation feedback, however, can be provided in 2 – 18 milliseconds on a 60 hz display.  In conjunction with low latency sensors and displays, this will generally be perceived as immediate.  Continuous time warping opens up the possibility of latencies below 3 milliseconds, which may cross largely unexplored thresholds in human / computer interactivity.</p>
<p>Conventional computer interfaces are generally not as latency demanding as virtual reality, but sensitive users can tell the difference in mouse response down to the same 20 milliseconds or so, making it worthwhile to apply these techniques even in applications without a VR focus.</p>
<p>A particularly interesting application is in “cloud gaming”, where a simple client appliance or application forwards control information to a remote server, which streams back real time video of the game.  This offers significant convenience benefits for users, but the inherent network and compression latencies make it a lower quality experience for action oriented titles.  View bypass and time warping can both be performed on the server, regaining a substantial fraction of the latency imposed by the network.  If the cloud gaming client were made more sophisticated, time warping could be performed locally, which could theoretically reduce the latency to the same levels as local applications, but it would probably be prudent to restrict the total amount of time warping to perhaps 30 or 40 milliseconds to limit the distance from the source images.</p>
<p><b>Acknowledgements</b></p>
<p>Zenimax for allowing me to publish this openly.</p>
<p>Hillcrest Labs for inertial sensors and experimental firmware.</p>
<p>Emagin for access to OLED displays.</p>
<p>Oculus for a prototype Rift HMD.</p>
<p>Nvidia for an experimental driver with access to the current scan line number.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Functional Programming in C++</title>
		<link>http://www.altdevblogaday.com/2012/04/26/functional-programming-in-c/</link>
		<comments>http://www.altdevblogaday.com/2012/04/26/functional-programming-in-c/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 22:13:37 +0000</pubDate>
		<dc:creator>John-Carmack</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25786</guid>
		<description><![CDATA[<p class="MsoNormal">Probably everyone reading this has heard “functional programming” put forth as something that is supposed to bring benefits to software development, or even heard it touted as a silver bullet.  However, a trip to <a href="http://en.wikipedia.org/wiki/Functional_programming">Wikipedia</a> for some more information can be initially off-putting, with early references to <a href="http://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a> and <a href="http://en.wikipedia.org/wiki/Formal_system">formal systems</a>.  It isn’t immediately clear what that has to do with writing better software.</p>
<p><a href="http://www.altdevblogaday.com/2012/04/26/functional-programming-in-c/" class="more-link">Read more on Functional Programming in C++&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p class="MsoNormal">Probably everyone reading this has heard “functional programming” put forth as something that is supposed to bring benefits to software development, or even heard it touted as a silver bullet.  However, a trip to <a href="http://en.wikipedia.org/wiki/Functional_programming">Wikipedia</a> for some more information can be initially off-putting, with early references to <a href="http://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a> and <a href="http://en.wikipedia.org/wiki/Formal_system">formal systems</a>.  It isn’t immediately clear what that has to do with writing better software.</p>
<p class="MsoNormal">My pragmatic summary:  A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in.  In a multithreaded environment, the lack of understanding and the resulting problems are greatly amplified, almost to the point of panic if you are paying attention.  Programming in a functional style makes the state presented to your code explicit, which makes it much easier to reason about, and, in a completely pure system, makes thread race conditions impossible.</p>
<p class="MsoNormal">I do believe that there is real value in pursuing functional programming, but it would be irresponsible to exhort everyone to abandon their C++ compilers and start coding in <a href="http://en.wikipedia.org/wiki/Lisp_%28programming_language%29">Lisp</a>, <a href="http://en.wikipedia.org/wiki/Haskell_%28programming_language%29">Haskell</a>, or, to be blunt, any other fringe language.  To the eternal chagrin of language designers, there are plenty of externalities that can overwhelm the benefits of a language, and game development has more than most fields.  We have cross platform issues, proprietary tool chains, certification gates, licensed technologies, and stringent performance requirements on top of the issues with legacy codebases and workforce availability that everyone faces.</p>
<p class="MsoNormal">If you are in circumstances where you can undertake significant development work in a non-mainstream language, I’ll cheer you on, but be prepared to take some hits in the name of progress.  For everyone else: <em><strong>No matter what language you work in,</strong> <strong>programming in a functional style provides benefits.  You should do it whenever it is convenient, and you should think hard about the decision when it isn’t convenient</strong>.</em>  You can learn about lambdas, monads, currying, composing lazily evaluated functions on infinite sets, and all the other aspects of explicitly functionally oriented languages later if you choose.</p>
<p class="MsoNormal">C++ doesn’t encourage functional programming, but it doesn’t prevent you from doing it, and you retain the power to drop down and apply SIMD intrinsics to hand laid out data backed by memory mapped files, or whatever other nitty-gritty goodness you find the need for.</p>
<p>&nbsp;</p>
<p class="MsoNormal"><span style="font-size: 18pt">Pure Functions</span></p>
<p class="MsoNormal">A pure function only looks at the parameters passed in to it, and all it does is return one or more computed values based on the parameters.  It has no logical <em>side effects</em>.  This is an abstraction of course; every function has side effects at the CPU level, and most at the heap level, but the abstraction is still valuable.</p>
<p class="MsoNormal">It doesn’t look at or update global state.  It doesn’t maintain internal state.  It doesn’t perform any IO.  It doesn’t mutate any of the input parameters.  Ideally, it isn’t passed any extraneous data – getting an <span style="font-family: 'Courier New'">allMyGlobals</span> pointer passed in defeats much of the purpose.</p>
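<p class="MsoNormal">A small sketch of the difference, using made-up names (<code>g_frameCount</code>, <code>ImpureScaledTime</code>, and <code>ScaledTime</code> are illustrative only): the first function depends on and updates hidden state, while the second sees nothing but its parameters.</p>

```cpp
// Impure: reads and updates a global, so the result depends on how many
// times the function has been called before.
static int g_frameCount = 0;

float ImpureScaledTime(float seconds) {
    ++g_frameCount;
    return seconds * static_cast<float>(g_frameCount);
}

// Pure: everything it looks at arrives as a parameter, and the return
// value is its only effect.
float ScaledTime(float seconds, int frameCount) {
    return seconds * static_cast<float>(frameCount);
}
```

<p class="MsoNormal">Calling <code>ScaledTime</code> twice with the same arguments gives the same answer; calling <code>ImpureScaledTime</code> twice does not.</p>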
<p class="MsoNormal">Pure functions have a lot of nice properties.</p>
<p class="MsoNormal">Thread safety.  A pure function with value parameters is completely thread safe.  With reference or pointer parameters, even if they are const, you do need to be aware of the danger that another thread doing non-pure operations might mutate or free the data, but it is still one of the most powerful tools for writing safe multithreaded code.</p>
<p class="MsoNormal">You can trivially switch them out for <a href="http://altdevblogaday.com/2011/11/22/parallel-implementations">parallel implementations</a>, or run multiple implementations to compare the results.  This makes it much safer to experiment and evolve.</p>
<p class="MsoNormal">Reusability.  It is much easier to transplant a pure function to a new environment.  You still need to deal with type definitions and any called pure functions, but there is no snowball effect.  How many times have you known there was some code that does what you need in another system, but extricating it from all of its environmental assumptions was more work than just writing it over?</p>
<p class="MsoNormal">Testability.  A pure function has <em>referential transparency</em>, which means that it will always give the same result for a set of parameters no matter when it is called, which makes it much easier to exercise than something interwoven with other systems.   I have never been very responsible about writing test code;  a lot of code interacts with enough systems that it can require elaborate harnesses to exercise, and I could often convince myself (probably incorrectly) that it wasn’t worth the effort.  Pure functions are trivial to test; the tests look like something right out of a textbook, where you build some inputs and look at the output.  Whenever I come across a finicky looking bit of code now, I split it out into a separate pure function and write tests for it.  Frighteningly, I often find something wrong in these cases, which means I’m probably not casting a wide enough net.</p>
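<p class="MsoNormal">To make the “right out of a textbook” point concrete, here is what such a test can look like.  <code>ClampToRange</code> is a hypothetical stand-in for whatever finicky bit of code gets split out; the point is only that the test needs no harness, no setup, and no teardown.</p>

```cpp
#include <cassert>

// Pure: the result depends only on the three arguments.
int ClampToRange(int value, int low, int high) {
    if (value < low) return low;
    if (value > high) return high;
    return value;
}

// Build some inputs, look at the outputs.
void TestClampToRange() {
    assert(ClampToRange(5, 0, 10) == 5);    // inside the range
    assert(ClampToRange(-3, 0, 10) == 0);   // below the range
    assert(ClampToRange(42, 0, 10) == 10);  // above the range
    assert(ClampToRange(0, 0, 10) == 0);    // lower boundary
    assert(ClampToRange(10, 0, 10) == 10);  // upper boundary
}
```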
<p class="MsoNormal">Understandability and maintainability.  The bounding of both input and output makes pure functions easier to re-learn when needed, and there are fewer places for undocumented requirements regarding external state to hide.</p>
<p class="MsoNormal">Formal systems and automated reasoning about software will be increasingly important in the future.  <a href="http://altdevblogaday.com/2011/12/24/static-code-analysis/">Static code analysis</a> is important today, and transforming your code into a more functional style aids analysis tools, or at least lets the faster local tools cover the same ground as the slower and more expensive global tools.  We are a “Get ‘er done” sort of industry, and I do not see formal proofs of whole program “correctness” becoming a relevant goal, but being able to prove that certain classes of flaws are not present in certain parts of a codebase will still be very valuable.  We could use some more science and math in our process.</p>
<p class="MsoNormal">Someone taking an introductory programming class might be scratching their head and thinking “aren’t all programs supposed to be written like this?”  The reality is that far more programs are <a href="http://en.wikipedia.org/wiki/Big_ball_of_mud">Big Balls of Mud</a> than not.  Traditional imperative programming languages give you escape hatches, and they get used all the time.  If you are just writing throwaway code, do whatever is most convenient, which often involves global state.  If you are writing code that may still be in use a year later, balance the convenience factor against the difficulties you will inevitably suffer later.  Most developers are not very good at predicting the future time integrated suffering their changes will result in.</p>
<p>&nbsp;</p>
<p class="MsoNormal"><span style="font-size: 18pt">Purity In Practice</span></p>
<p class="MsoNormal">Not everything can be pure; unless the program is only operating on its own source code, at some point you need to interact with the outside world.  It can be fun in a puzzly sort of way to try to push purity to great lengths, but the pragmatic break point acknowledges that side effects are necessary at some point, and manages them effectively.</p>
<p class="MsoNormal">It doesn’t even have to be all-or-nothing in a particular function.  There is a continuum of value in how pure a function is, and the value step from almost-pure to completely-pure is smaller than that from spaghetti-state to mostly-pure.  Moving a function towards purity improves the code, even if it doesn’t reach full purity.  A function that bumps a global counter or checks a global debug flag is not pure, but if that is its only detraction, it is still going to reap most of the benefits.</p>
<p class="MsoNormal">Avoiding the worst in a broader context is generally more important than achieving perfection in limited cases.  If you consider the most toxic functions or systems you have had to deal with, the ones that you know have to be handled with tongs and a face shield, it is an almost sure bet that they have a complex web of state and assumptions that their behavior relies on, and it isn’t confined to their parameters.  Imposing some discipline in these areas, or at least fighting to prevent more code from turning into similar messes, is going to have more impact than tightening up some low level math functions.</p>
<p class="MsoNormal">The process of refactoring towards purity generally involves disentangling computation from the environment it operates in, which almost invariably means more parameter passing.  This seems a bit curious – greater verbosity in programming languages is broadly reviled, and functional programming is often associated with code size reduction.  The factors that allow programs in functional languages to sometimes be more concise than imperative implementations are pretty much orthogonal to the use of pure functions &#8212; garbage collection, powerful built in types, pattern matching, list comprehensions, function composition, various bits of syntactic sugar, etc.  For the most part, these size reducers don’t have much to do with being functional, and can also be found in some imperative languages.</p>
<p class="MsoNormal">You <em>should</em> be getting irritated if you have to pass a dozen parameters into a function; you may be able to refactor the code in a manner that reduces the parameter complexity.</p>
<p class="MsoNormal">The lack of any language support in C++ for maintaining purity is not ideal.  If someone modifies a widely used foundation function to be non-pure in some evil way, everything that uses the function also loses its purity.  This sounds disastrous from a formal systems point of view, but again, it isn’t an all-or-nothing proposition where you fall from grace with the first sin.  Large scale software development is unfortunately statistical.</p>
<p class="MsoNormal">It seems like there is a sound case for a pure keyword in future C/C++ standards.  There are close parallels with const – an optional qualifier that allows compile time checking of programmer intention and will never hurt, and could often help, code generation.  The D programming language does offer a pure keyword:  <a href="http://www.d-programming-language.org/function.html">http://www.d-programming-language.org/function.html</a>  Note their distinction between weak and strong purity – you need to also have const input references and pointers to be strongly pure.</p>
<p class="MsoNormal">In some ways, a language keyword is over-restrictive &#8212; a function can still be pure even if it calls impure functions, as long as the side effects don’t escape the outer function.  Entire programs can be considered pure functional units if they only deal with command line parameters instead of random file system state.</p>
<p class="MsoNormal"><span style="font-size: 18pt">Object Oriented Programming</span></p>
<p class="MsoNormal"><em><a href="https://twitter.com/#%21/mfeathers"><strong><span style="font-family: 'Calibri','sans-serif';color: blue">Michael Feathers</span></strong> <span class="username"><s><span style="color: blue">@</span></s></span><span class="username"><strong><span style="color: blue">mfeathers</span></strong></span> </a>  OO makes code understandable by encapsulating moving parts. FP makes code understandable by minimizing moving parts.</em></p>
<p class="MsoNormal">The “moving parts” are mutating states.  Telling an object to change itself is lesson one in a basic object oriented programming book, and it is deeply ingrained in most programmers, but it is anti-functional behavior.  Clearly there is some value in the basic OOP idea of grouping functions with the data structures they operate on, but if you want to reap the benefits of functional programming in parts of your code, you have to back away from some object oriented behaviors in those areas.</p>
<p class="MsoNormal">Class methods that can’t be const are not pure by definition, because they mutate some or all of the potentially large set of state in the object.  They are not thread safe, and the ability to incrementally poke and prod objects into unexpected states is indeed a significant source of bugs.</p>
<p class="MsoNormal">Const object methods can still be technically pure if you don’t count the implicit <em>const this</em> pointer against them, but many objects are large enough to constitute a sort of global state all their own, blunting some of the clarity benefits of pure functions.  Constructors can be pure functions, and generally should strive to be – they take arguments and return an object.</p>
<p class="MsoNormal">At the tactical programming level, you can often work with objects in a more functional manner, but it may require changing the interfaces a bit.  At id we went over a decade with an idVec3 class that had a self-mutating <span style="font-family: 'Courier New'">void Normalize() </span>method, but no corresponding <span style="font-family: 'Courier New'">idVec3 Normalized() const</span> method.  Many string methods were similarly defined as working on themselves, rather than returning a new copy with the operation performed on it – <span style="font-family: 'Courier New'">ToLowerCase(), StripFileExtension(),</span> etc.</p>
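<p class="MsoNormal">The two styles side by side, as a minimal sketch.  This <code>Vec3</code> is a hypothetical stand-in written for illustration, not the actual idVec3 code.</p>

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;

    float Length() const { return std::sqrt(x * x + y * y + z * z); }

    // Self-mutating style: changes the object it is called on.
    void Normalize() {
        const float len = Length();
        if (len > 0.0f) {
            x /= len;
            y /= len;
            z /= len;
        }
    }

    // Functional style: leaves *this untouched and returns a new value,
    // so it can be called on const objects.
    Vec3 Normalized() const {
        Vec3 result = *this;
        result.Normalize();
        return result;
    }
};
```

<p class="MsoNormal">With the const version available, a caller can write <code>const Vec3 dir = v.Normalized();</code> and keep both the original and the result immutable.</p>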
<p class="MsoNormal"><span style="font-size: 18pt">Performance Implications</span></p>
<p class="MsoNormal">In almost all cases, directly mutating blocks of memory is the speed-of-light optimal case, and avoiding this is spending some performance.  Most of the time this is of only theoretical interest; we trade performance for productivity all the time.</p>
<p class="MsoNormal">Programming with pure functions will involve more copying of data, and in some cases this clearly makes it the incorrect implementation strategy due to performance considerations.  As an extreme example, you can write a pure <span style="font-family: 'Courier New'">DrawTriangle()</span> function that takes a framebuffer as a parameter and returns a completely new framebuffer with the triangle drawn into it as a result.  Don’t do that.</p>
<p class="MsoNormal">Returning everything by value is the natural functional programming style, but relying on compilers to always perform <a href="http://en.wikipedia.org/wiki/Return_value_optimization">return value optimization</a> can be hazardous to performance, so passing a reference parameter for the output of complex data structures is often justifiable.  It does have the unfortunate effect of preventing you from declaring the returned value as const to enforce <a href="http://en.wikipedia.org/wiki/Single_assignment#Single_assignment">single assignment</a>.</p>
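<p class="MsoNormal">A short sketch of the tradeoff, with hypothetical function names.  The by-value result can be declared const at the call site; the output-parameter version needs a mutable variable that exists before the call.</p>

```cpp
#include <vector>

// Functional style: the result is the return value.
std::vector<int> SquaresByValue(int count) {
    std::vector<int> result;
    for (int i = 0; i < count; ++i) {
        result.push_back(i * i);
    }
    return result;
}

// Output-parameter style: the caller supplies the storage to fill in.
void SquaresByReference(int count, std::vector<int> &out) {
    out.clear();
    for (int i = 0; i < count; ++i) {
        out.push_back(i * i);
    }
}

bool BothStylesAgree(int count) {
    const std::vector<int> byValue = SquaresByValue(count);  // single assignment enforced
    std::vector<int> byReference;  // cannot be const, it has to be written to
    SquaresByReference(count, byReference);
    return byValue == byReference;
}
```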
<p class="MsoNormal">There will be a strong urge in many cases to just update a value in a complex structure passed in rather than making a copy of it and returning the modified version, but doing so throws away the thread safety guarantee and should not be done lightly.  List generation is often a case where it is justified.  The pure functional way to append something to a list is to return a completely new copy of the list with the new element at the end, leaving the original list unchanged.  Actual functional languages are implemented in ways that make this not as disastrous as it sounds, but if you do this with typical C++ containers you will die.</p>
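<p class="MsoNormal">For reference, this is roughly what the pure append looks like with a standard container (<code>Appended</code> is a made-up name).  Every call copies the whole list, so building an n element list this way does work proportional to n squared.</p>

```cpp
#include <vector>

// Pure append: the caller's list is never modified.  The whole list is
// copied on every call.
std::vector<int> Appended(const std::vector<int> &list, int value) {
    std::vector<int> result;
    result.reserve(list.size() + 1);
    result.insert(result.end(), list.begin(), list.end());
    result.push_back(value);
    return result;
}
```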
<p class="MsoNormal">A significant mitigating factor is that performance today means parallel programming, which usually requires more copying and combining than in a single threaded environment even in the optimal performance case, so the penalty is smaller, while the complexity reduction and correctness benefits are correspondingly larger.  When you start thinking about running, say, all the characters in a game world in parallel, it starts sinking in that the object oriented approach of updating objects has some deep difficulties in parallel environments.  Maybe if all of the objects just referenced a read only version of the world state, and we copied over the updated version at the end of the frame…  Hey, wait a minute…</p>
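<p class="MsoNormal">One way to read that last thought, as a deliberately tiny sketch.  The <code>World</code> and <code>Character</code> types are hypothetical, and the loop here is sequential; the point is that each <code>UpdateCharacter</code> call only reads the previous state, so the calls are independent of each other and could be handed to separate threads.</p>

```cpp
#include <vector>

struct Character {
    float position;
    float velocity;
};

struct World {
    float timeStep;
    std::vector<Character> characters;
};

// Pure: reads the previous world, returns this character's next state.
Character UpdateCharacter(const Character &c, const World &previous) {
    return Character{c.position + c.velocity * previous.timeStep, c.velocity};
}

// Builds the next frame's world without touching the previous one.
World StepWorld(const World &previous) {
    World next{previous.timeStep, {}};
    next.characters.reserve(previous.characters.size());
    for (const Character &c : previous.characters) {
        next.characters.push_back(UpdateCharacter(c, previous));
    }
    return next;
}
```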
<p>&nbsp;</p>
<p class="MsoNormal"><span style="font-size: 18pt">Action Items</span></p>
<p class="MsoNormal">Survey some non-trivial functions in your codebase and track down every bit of external state they can reach, and all possible modifications they can make.  This makes great documentation to stick in a comment block, even if you don’t do anything with it.  If the function can trigger, say, a screen update through your render system, you can just throw your hands up in the air and declare the set of all effects beyond human understanding.</p>
<p class="MsoNormal">The next task you undertake, try from the beginning to think about it in terms of the real computation that is going on.  Gather up your input, pass it to a pure function, then take the results and do something with it.</p>
<p class="MsoNormal">As you are debugging code, make yourself more aware of the part mutating state and hidden parameters play in obscuring what is going on.</p>
<p class="MsoNormal">Modify some of your utility object code to return new copies instead of self-mutating, and try throwing const in front of practically every non-iterator variable you use.</p>
<p>&nbsp;</p>
<p class="MsoNormal">Additional references:</p>
<p><a href="http://www.haskell.org/haskellwiki/Introduction">http://www.haskell.org/haskellwiki/Introduction</a></p>
<p><a href="http://lisperati.com/">http://lisperati.com/</a></p>
<p><a href="http://www.johndcook.com/blog/tag/functional-programming/">http://www.johndcook.com/blog/tag/functional-programming/</a></p>
<p><a href="http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf">http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf</a></p>
<p><a href="http://channel9.msdn.com/Shows/Going+Deep/Lecture-Series-Erik-Meijer-Functional-Programming-Fundamentals-Chapter-1">http://channel9.msdn.com/Shows/Going+Deep/Lecture-Series-Erik-Meijer-Functional-Programming-Fundamentals-Chapter-1</a></p>
<p><a href="http://www.cs.utah.edu/%7Ehal/docs/daume02yaht.pdf">http://www.cs.utah.edu/~hal/docs/daume02yaht.pdf</a></p>
<p><a href="http://www.cs.cmu.edu/%7Ecrary/819-f09/Backus78.pdf">http://www.cs.cmu.edu/~crary/819-f09/Backus78.pdf</a></p>
<p><a href="http://fpcomplete.com/the-downfall-of-imperative-programming/">http://fpcomplete.com/the-downfall-of-imperative-programming/</a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/26/functional-programming-in-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Static Code Analysis</title>
		<link>http://www.altdevblogaday.com/2011/12/24/static-code-analysis/</link>
		<comments>http://www.altdevblogaday.com/2011/12/24/static-code-analysis/#comments</comments>
		<pubDate>Sat, 24 Dec 2011 05:04:46 +0000</pubDate>
		<dc:creator>John-Carmack</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[/Analyze]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[code analysis]]></category>
		<category><![CDATA[code optimization]]></category>
		<category><![CDATA[Coverity]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[PC-Lint]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[PVS-Studio]]></category>
		<category><![CDATA[static analysis]]></category>
		<category><![CDATA[static code]]></category>
		<category><![CDATA[Visual Studio]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=21879</guid>
		<description><![CDATA[<p>The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis.  Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.</p>
<p><a href="http://www.altdevblogaday.com/2011/12/24/static-code-analysis/" class="more-link">Read more on Static Code Analysis&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis.  Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.</p>
<p>It is important to say right up front that quality isn’t everything, and acknowledging it isn’t some sort of moral failing.  <em>Value</em> is what you are trying to produce, and quality is only one aspect of it, intermixed with cost, features, and other factors.  There have been plenty of hugely successful and highly regarded titles that were filled with bugs and crashed a lot; pursuing a Space Shuttle style code development process for game development would be idiotic.  Still, quality does matter.</p>
<p>I have always cared about writing good code; one of my important internal motivations is that of the craftsman, and I always want to improve.  I have read piles of books with dry chapter titles like “Policies, Standards, and Quality Plans”, and my work with Armadillo Aerospace has put me in touch with the very different world of safety critical software development.</p>
<p>Over a decade ago, during the development of Quake 3, I bought a license for PC-Lint and tried using it – the idea of automatically pointing out flaws in my code sounded great.  However, running it as a command line tool and sifting through the reams of commentary that it produced didn’t wind up winning me over, and I abandoned it fairly quickly.</p>
<p>Both programmer count and codebase size have grown by an order of magnitude since then, and the implementation language has moved from C to C++, all of which contribute to a much more fertile ground for software errors.  A few years ago, after reading a number of research papers about modern static code analysis, I decided to see how things had changed in the decade since I had tried PC-Lint.</p>
<p>At this point, we had been compiling at warning level 4 with only a very few specific warnings disabled, and warnings-as-errors forced programmers to abide by it.  While there were some dusty reaches of the code that had years of accumulated cruft, most of the code was fairly modern.  We thought we had a pretty good codebase.</p>
<p><strong>Coverity</strong></p>
<p>Initially, I contacted <a href="http://www.coverity.com/">Coverity</a> and signed up for a demo run.  This is serious software, with the licensing cost based on total lines of code, and we wound up with a quote well into five figures.  When they presented their analysis, they commented that our codebase was one of the cleanest of its size they had seen (maybe they tell all customers that to make them feel good), but they presented a set of about a hundred issues that were identified.  This was very different than the old PC-Lint run.  It had a very high signal to noise ratio – most of the issues highlighted were clearly incorrect code that could have serious consequences.</p>
<p>This was eye opening, but the cost was high enough that it gave us pause.  Maybe we wouldn’t introduce that many new errors for it to catch before we ship.</p>
<p><strong>Microsoft /analyze </strong></p>
<p>I probably would have talked myself into paying Coverity eventually, but while I was still debating it, Microsoft preempted the debate by incorporating their <a href="http://msdn.microsoft.com/en-us/library/d3bbz7tz%28v=VS.100%29.aspx">/analyze</a> functionality into the 360 SDK.  /Analyze was previously available as part of the top-end, ridiculously expensive version of Visual Studio, but it was now available to every 360 developer at no extra charge.  I read into this that Microsoft feels that game quality on the 360 impacts them more than application quality on Windows does. :-)</p>
<p>Technically, the Microsoft tool only performs local analysis, so it should be inferior to Coverity’s global analysis, but enabling it poured out <em>mountains</em> of errors, far more than Coverity reported.  True, there were lots of false positives, but there was also a lot of scary, scary stuff.</p>
<p>I started slowly working my way through the code, fixing up first my personal code, then the rest of the system code, then the game code.  I would work on it during odd bits of free time, so the entire process stretched over a couple months.  One of the side benefits of having it stretch out was that it conclusively showed that it was pointing out some very important things – during that time there was an epic multi-programmer, multi-day bug hunt that wound up being traced to something that /analyze had flagged, but I hadn’t fixed yet.  There were several other, less dramatic cases where debugging led directly to something already flagged by /analyze.  These were real issues.</p>
<p>Eventually, I had all the code used to build the 360 executable compiling without warnings with /analyze enabled, so I checked it in as the default behavior for 360 builds.  Every programmer working on the 360 was then getting the code analyzed every time they built, so they would notice the errors themselves as they were making them, rather than having me silently fix them at a later time.  This did slow down compiles somewhat, but /analyze is by far the fastest analysis tool I have worked with, and it is oh so worth it.</p>
<p>We had a period where one of the projects accidentally got the static analysis option turned off for a few months, and when I noticed and re-enabled it, there were piles of new errors that had been introduced in the interim.  Similarly, programmers working just on the PC or PS3 would check in faulty code and not realize it until they got a “broken 360 build” email report.  These were demonstrations that the normal development operations were continuously producing these classes of errors, and /analyze was effectively shielding us from a lot of them.</p>
<p>Bruce Dawson has blogged about working with /analyze a number of times: <a href="http://randomascii.wordpress.com/category/code-reliability/">http://randomascii.wordpress.com/category/code-reliability/</a></p>
<p><strong>PVS-Studio</strong></p>
<p>Because we were only using /analyze on the 360 code, we still had a lot of code not covered by analysis – the PC and PS3 specific platform code, and all the utilities that only ran on the PC.</p>
<p>The next tool I looked at was <a href="http://www.viva64.com/en/pvs-studio/">PVS-Studio</a>.  It has good integration with Visual Studio, and a convenient demo mode (try it!).  Compared to /analyze, PVS-Studio is painfully slow, but it pointed out a number of additional important errors, even on code that was already completely clean to /analyze.  In addition to pointing out things that are logically errors, PVS-Studio also points out a number of things that are common patterns of programmer error, even if it is still completely sensible code.  This is almost guaranteed to produce some false positives, but damned if we didn’t have instances of those common error patterns that needed fixing.</p>
<p>There are a number of good articles on the PVS-Studio <a href="http://www.viva64.com/en/developers-resources/">site</a>, most with code examples drawn from open source projects demonstrating exactly what types of things are found.   I considered adding some representative code analysis warnings to this article, but there are already better documented examples present there.  Go look at them, and don&#8217;t smirk and think &#8220;I would never write that!&#8221;</p>
<p><strong>PC-Lint</strong></p>
<p>Finally, I went back to <a href="http://www.gimpel.com/html/pcl.htm">PC-Lint</a>, coupled with <a href="http://www.riverblade.co.uk/products/visual_lint/index.html">Visual Lint</a> for IDE integration.  In the grand Unix tradition, it can be configured to do just about anything, but it isn’t very friendly, and generally doesn’t “just work”.  I bought a five-pack of licenses, but it has been problematic enough that I think all the other developers who tried it gave up on it.  The flexibility does have benefits – I was able to configure it to analyze all of our PS3 platform specific code, but that was a tedious bit of work.</p>
<p>Once again, even in code that had been cleaned by both /analyze and PVS-Studio, new errors of significance were found.  I made a real effort to get our codebase lint clean, but I didn’t succeed.  I made it through all the system code, but I ran out of steam when faced with all the reports in the game code.  I triaged it by hitting the classes of reports that I worried most about, and ignored the bulk of the reports that were more stylistic or potential concerns.</p>
<p>Trying to retrofit a substantial codebase to be clean at maximum levels in PC-Lint is probably futile.  I did some “green field” programming where I slavishly made every picky lint comment go away, but it is more of an adjustment than most experienced C/C++ programmers are going to want to make.  I still need to spend some time trying to determine the right set of warnings to enable to let us get the most benefit from PC-Lint.</p>
<p><strong>Discussion</strong></p>
<p>I learned a lot going through this process.  I fear that some of it may not be easily transferable, that without personally going through hundreds of reports in a short amount of time and getting that sinking feeling in the pit of your stomach over and over again, “we’re doing OK” or “it’s not so bad” will be the default responses.</p>
<p>The first step is fully admitting that the code you write is riddled with errors.  That is a bitter pill to swallow for a lot of people, but without it, most suggestions for change will be viewed with irritation or outright hostility.  You have to <em>want</em> criticism of your code.</p>
<p>Automation is necessary.  It is common to take a sort of smug satisfaction in reports of colossal failures of automatic systems, but for every failure of automation, the failures of humans are legion.  Exhortations to “write better code”, plans for more code reviews, pair programming, and so on just don’t cut it, especially in an environment with dozens of programmers under a lot of time pressure.  The value in catching even the small subset of errors that are tractable to static analysis <em>every single time</em> is huge.</p>
<p>I noticed that each time PVS-Studio was updated, it found something in our codebase with the new rules.  This seems to imply that if you have a large enough codebase, any class of error that is syntactically legal probably exists there.  In a large project, code quality is every bit as statistical as physical material properties – flaws exist all over the place, and you can only hope to minimize the impact they have on your users.</p>
<p>The analysis tools are working with one hand tied behind their back, being forced to infer information from languages that don’t necessarily provide what they want, and generally making very conservative assumptions.  You should cooperate as much as possible – favor indexing over pointer arithmetic, try to keep your call graph inside a single source file, use explicit annotations, etc.  Anything that isn’t crystal clear to a static analysis tool probably isn’t clear to your fellow programmers, either.  The classic hacker disdain for “bondage and discipline languages” is short sighted – the needs of large, long-lived, multi-programmer projects are just different than the quick work you do for yourself.</p>
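<p>A minimal sketch of the “favor indexing over pointer arithmetic” advice, using made-up function names rather than any real engine code.  Both functions compute the same sum; the indexed form keeps the element count next to the array, so the loop bound is explicit to a tool and to a reader.</p>

```cpp
#include <cassert>
#include <cstddef>

// Pointer-arithmetic form: the valid range is implied by two raw
// pointers, so an analyzer has to reconstruct the bounds on its own.
int SumWithPointers( const int *begin, const int *end ) {
	int total = 0;
	for ( const int *p = begin; p != end; ++p ) {
		total += *p;
	}
	return total;
}

// Indexed form: the count travels with the array, and the loop bound
// is stated directly in terms of it.
int SumWithIndex( const int *values, std::size_t count ) {
	assert( values != NULL || count == 0 );
	int total = 0;
	for ( std::size_t i = 0; i < count; ++i ) {
		total += values[i];
	}
	return total;
}
```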
<p>NULL pointers are the biggest problem in C/C++, at least in our code.  The dual use of a single value as both a flag and an address causes an incredible number of fatal issues.  C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL.  Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.  There are a lot of deeply ingrained game programming patterns that are just dangerous, but I’m not sure how to gently migrate away from all the NULL checking.</p>
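<p>The “check once, then pass a reference” idea can be sketched roughly as follows.  The Entity type and both function names are hypothetical, chosen only for illustration.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical entity type, used only for this sketch.
struct Entity {
	int health;
};

// Interior code takes a reference, so it carries the implicit
// not-NULL contract and needs no checks of its own.
void ApplyDamage( Entity &target, int amount ) {
	target.health -= amount;
	if ( target.health < 0 ) {
		target.health = 0;
	}
}

// Boundary code: the one place where a possibly-NULL pointer is
// checked and turned into a reference.  Returns false if there was
// nothing to damage.
bool TryApplyDamage( Entity *maybeTarget, int amount ) {
	if ( maybeTarget == NULL ) {
		return false;
	}
	ApplyDamage( *maybeTarget, amount );
	return true;
}
```

<p>Everything downstream of TryApplyDamage() can then be written against Entity references, and the NULL question is answered in exactly one place.</p>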
<p>Printf format string errors were the second biggest issue in our codebase, heightened by the fact that passing an idStr instead of idStr::c_str() almost always results in a crash.  Annotating all our variadic functions with /analyze annotations so they are properly type checked kills this problem dead.  There were dozens of these hiding in informative warning messages that would turn into crashes when some odd condition triggered the code path, which is also a comment about how the code coverage of our general testing was lacking.</p>
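<p>As a hedged sketch of what annotating a variadic function can look like: Microsoft's SAL provides the _Printf_format_string_ annotation, and GCC and Clang provide a format( printf, ... ) function attribute.  The macro names and the FormatWarning() function below are invented for this example, and std::string stands in for idStr.</p>

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Compiler-specific format-string annotations.  FORMAT_STRING and
// FORMAT_ATTRIBUTE are names made up for this sketch.
#if defined( _MSC_VER )
	#include <sal.h>
	#define FORMAT_STRING _Printf_format_string_
	#define FORMAT_ATTRIBUTE( fmtIndex, firstArg )
#elif defined( __GNUC__ ) || defined( __clang__ )
	#define FORMAT_STRING
	#define FORMAT_ATTRIBUTE( fmtIndex, firstArg ) \
		__attribute__(( format( printf, fmtIndex, firstArg ) ))
#else
	#define FORMAT_STRING
	#define FORMAT_ATTRIBUTE( fmtIndex, firstArg )
#endif

// Hypothetical variadic warning formatter.  The annotated declaration
// is what lets the tools check arguments against the format string.
std::string FormatWarning( FORMAT_STRING const char *fmt, ... ) FORMAT_ATTRIBUTE( 1, 2 );

std::string FormatWarning( const char *fmt, ... ) {
	char buffer[256];
	va_list args;
	va_start( args, fmt );
	std::vsnprintf( buffer, sizeof( buffer ), fmt, args );
	va_end( args );
	return std::string( buffer );
}
```

<p>With a declaration like this, a call such as FormatWarning( "missing target '%s'", name.c_str() ) passes the check, while passing the string object itself for a %s is the kind of mismatch the annotation is meant to expose at build time instead of in a rarely executed warning path.</p>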
<p>A lot of the serious reported errors are due to modifications of code long after it was written.  An incredibly common error pattern is to have some perfectly good code that checks for NULL before doing an operation, but a later code modification changes it so that the pointer is used again without checking.  Examined in isolation, this is a comment on code path complexity, but when you look back at the history, it is clear that it was more a failure to communicate preconditions clearly to the programmer modifying the code.</p>
<p>By definition, you can’t focus on everything, so focus on the code that is going to ship to customers, rather than the code that will be used internally.  Aggressively migrate code from shipping to isolated development projects.  There was a paper recently that noted that all of the various code quality metrics correlated at least as strongly with code size as with error rate, so code size alone gives essentially the same error-predicting ability.  Shrink your important code.</p>
<p>If you aren’t deeply frightened about all the additional issues raised by concurrency, you aren’t thinking about it hard enough.</p>
<p>It is impossible to do a true control test in software development, but I feel the success that we have had with code analysis has been clear enough that I will say plainly <strong>it is irresponsible to not use it</strong>.  There is objective data in automatic console crash reports showing that Rage, despite being bleeding edge in many ways, is remarkably more robust than most contemporary titles.  The PC launch of Rage was unfortunately tragically flawed due to driver problems &#8212; I’ll wager AMD does not use static code analysis on their graphics drivers.</p>
<p>The takeaway action should be:  If your version of Visual Studio has /analyze available, turn it on and give it a try.  If I had to pick one tool, I would choose the Microsoft option.  Everyone else working in Visual Studio, at least give the PVS-Studio demo a try.  If you are developing commercial software, buying static analysis tools is money well spent.</p>
<p>A final parting comment from twitter:</p>
<p><a href="http://twitter.com/#%21/dave_revell"><strong>Dave Revell</strong> @<strong>dave_revell</strong></a>: The more I push code through static analysis, the more I&#8217;m amazed that computers boot at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/24/static-code-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel Implementations</title>
		<link>http://www.altdevblogaday.com/2011/11/22/parallel-implementations/</link>
		<comments>http://www.altdevblogaday.com/2011/11/22/parallel-implementations/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 18:27:17 +0000</pubDate>
		<dc:creator>John-Carmack</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[game development]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=20465</guid>
		<description><![CDATA[<p>I used to <a href="http://cam.ly/blog/2010/12/code-fearlessly/">Code Fearlessly</a> all the time, tearing up everything whenever I had a thought about a better way of doing something.  There was even a bit of pride there &#8212; &#8220;I&#8217;m not afraid to suffer consequences in the quest to Do The Right Thing!&#8221;  Of course, to be honest, the consequences usually fell on a more junior programmer who had to deal with an irate developer that had something unexpectedly stop working when I tore up the code to make it &#8220;better&#8221;.</p>
<p><a href="http://www.altdevblogaday.com/2011/11/22/parallel-implementations/" class="more-link">Read more on Parallel Implementations&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I used to <a href="http://cam.ly/blog/2010/12/code-fearlessly/">Code Fearlessly</a> all the time, tearing up everything whenever I had a thought about a better way of doing something.  There was even a bit of pride there &#8212; &#8220;I&#8217;m not afraid to suffer consequences in the quest to Do The Right Thing!&#8221;  Of course, to be honest, the consequences usually fell on a more junior programmer who had to deal with an irate developer that had something unexpectedly stop working when I tore up the code to make it &#8220;better&#8221;.</p>
<p>Sure, with everything in source control you can roll back the changes if it catastrophically breaks, but if you did succeed in making some aspect better, there is an incentive to keep pushing forward, even if there is a bit of suffering involved.  Somewhat more subtly, there are all sorts of opportunities to avoid making honest comparisons between the new way and the old way.  Rolling back code and rebuilding to run a test is a pain, and you aren’t going to do it very often, even if you have a suspicion that things aren’t working quite as well in a particular case you hadn’t considered during the rewrite.</p>
<p>What I try to do nowadays is to implement new ideas in parallel with the old ones, rather than mutating the existing code.  This allows easy and honest comparison between them, and makes it trivial to go back to the old reliable path when the spiffy new one starts showing flaws.  The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.</p>
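<p>A minimal sketch of the console-variable switch, assuming a plain global bool in place of a real cvar system.  The variable and all three function names are hypothetical, and the clamp is only a placeholder for whatever is actually being rewritten.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for a console variable; a real engine would
// expose this through its own cvar system.
static bool r_useNewClamp = false;

// The old, reliable path: left untouched.
int ClampOld( int value, int lo, int hi ) {
	if ( value < lo ) {
		return lo;
	}
	if ( value > hi ) {
		return hi;
	}
	return value;
}

// The new path, written alongside the old one rather than over it.
int ClampNew( int value, int lo, int hi ) {
	return value < lo ? lo : ( value > hi ? hi : value );
}

// Callers go through this; flipping the variable at run time swaps
// implementations without a rebuild.
int Clamp( int value, int lo, int hi ) {
	return r_useNewClamp ? ClampNew( value, lo, hi ) : ClampOld( value, lo, hi );
}
```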
<p>For some tasks, this is pretty obvious.  If you have a ray tracer, it isn&#8217;t hard to see an interface that allows you to have the Trace() function use various kD tree / BVH / BSP back ends, and a similar case can be made for the processing code that builds accelerator structures for them.  Missing some pixels?  Change over to the other implementation and check it there.</p>
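<p>The shape of such an interface can be sketched with a deliberately tiny stand-in.  This is not a ray tracer: the “scene” is a set of wall positions on the x axis, a brute-force scan plays the role of the simple back end, and a sorted array with binary search plays the role of the accelerator structure.  All type and function names here are assumptions made for the example.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a trace: whether anything was hit, and where.
struct Hit {
	bool valid;
	float x;
};

// Common interface.  A "ray" starts at origin and travels toward +x;
// Trace reports the first wall strictly beyond origin.
class TraceBackend {
public:
	virtual ~TraceBackend() {}
	virtual Hit Trace( float origin ) const = 0;
};

// Back end 1: brute-force scan over every wall.
class LinearBackend : public TraceBackend {
public:
	explicit LinearBackend( const std::vector<float> &wallPositions ) : walls( wallPositions ) {}
	virtual Hit Trace( float origin ) const {
		Hit best = { false, 0.0f };
		for ( std::size_t i = 0; i < walls.size(); ++i ) {
			if ( walls[i] > origin && ( !best.valid || walls[i] < best.x ) ) {
				best.valid = true;
				best.x = walls[i];
			}
		}
		return best;
	}
private:
	std::vector<float> walls;
};

// Back end 2: the build step sorts once, queries use binary search.
class SortedBackend : public TraceBackend {
public:
	explicit SortedBackend( const std::vector<float> &wallPositions ) : sorted( wallPositions ) {
		std::sort( sorted.begin(), sorted.end() );
	}
	virtual Hit Trace( float origin ) const {
		std::vector<float>::const_iterator it =
			std::upper_bound( sorted.begin(), sorted.end(), origin );
		const bool found = ( it != sorted.end() );
		Hit hit = { found, found ? *it : 0.0f };
		return hit;
	}
private:
	std::vector<float> sorted;
};
```

<p>Because callers only hold a TraceBackend pointer, checking a suspicious result against the other implementation is a one-line change.</p>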
<p>However, some of my most effective uses of this strategy have been more aggressive.  Over the years, I have done a number of hardware acceleration conversions from software rendering engines.  In the old days, I would basically start from scratch, first implementing the environment rendering, then the characters, then the special effects.  There were always lots of little features that got forgotten, and comparing against the original meant playing through the game on two systems at once.</p>
<p>The last two times I did this, I got the software rendering code running on the new platform first, so everything could be tested out at low frame rates, then implemented the hardware accelerated version in parallel, setting things up so you could instantly switch between the two at any time.  For a mobile OpenGL ES application being developed on a Windows simulator, I opened a completely separate window for the accelerated view, letting me see it simultaneously with the original software implementation.  This was a <em>very</em> significant development win.</p>
<p>If the task you are working on can be expressed as a pure function that simply processes input parameters into a return structure, it is easy to switch it out for different implementations.  If it is a system that maintains internal state or has multiple entry points, you have to be a bit more careful about switching it in and out.  If it is a gnarly mess with lots of internal callouts to other systems to maintain parallel state changes, then you have some cleanup to do before trying a parallel implementation.</p>
<p>There are two general classes of parallel implementations I work with:  The reference implementation, which is much smaller and simpler, but will be maintained continuously, and the experimental implementation, where you expect one version to “win” and consign the other implementation to source control in a couple weeks after you have some confidence that it is both fully functional and a real improvement.</p>
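<p>A small sketch of the reference-implementation style, using bit counting as a stand-in problem.  The function names are hypothetical; the fast version is the well-known parallel bit-count trick, and the reference version exists to be obviously correct and to check the fast one against.</p>

```cpp
#include <cassert>
#include <stdint.h>

// Reference implementation: small, slow, and easy to verify by eye.
int CountBitsReference( uint32_t v ) {
	int count = 0;
	for ( int i = 0; i < 32; ++i ) {
		count += static_cast<int>( ( v >> i ) & 1u );
	}
	return count;
}

// Production implementation: the standard parallel bit-count trick.
int CountBitsFast( uint32_t v ) {
	v = v - ( ( v >> 1 ) & 0x55555555u );
	v = ( v & 0x33333333u ) + ( ( v >> 2 ) & 0x33333333u );
	v = ( v + ( v >> 4 ) ) & 0x0F0F0F0Fu;
	return static_cast<int>( ( v * 0x01010101u ) >> 24 );
}

// Cross-check over the inclusive range [first, last]; returns the
// number of inputs on which the two implementations disagree.
int CountMismatches( uint32_t first, uint32_t last ) {
	int mismatches = 0;
	for ( uint32_t v = first; ; ++v ) {
		if ( CountBitsFast( v ) != CountBitsReference( v ) ) {
			++mismatches;
		}
		if ( v == last ) {
			break;
		}
	}
	return mismatches;
}
```

<p>Keeping CountMismatches() in a test or debug path is what makes the reference worth maintaining: any later change to the fast version is checked against it automatically.</p>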
<p>It is completely reasonable to violate some generally good coding rules while building an experimental implementation – copy, paste, and find-replace rename is actually a good way to start.  Code fearlessly on the copy, while the original remains fully functional and unmolested.  It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation.  It is a grey area, but I have tended to find that the extra path complexity of the flag approach often leads to messing up both versions as you work, and you usually compromise both implementations to some degree.</p>
<p>Every single time I have undertaken a parallel implementation approach, I have come away feeling that it was beneficial, and I now tend to code in a style that favors it.  Highly recommended.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/11/22/parallel-implementations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
