<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>#AltDevBlogADay &#187; Computer Graphics</title>
	<atom:link href="http://www.altdevblogaday.com/category/programming-2/computer-graphics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.altdevblogaday.com</link>
	<description>Each day a little more #gamedev love</description>
	<lastBuildDate>Mon, 17 Jun 2013 14:45:09 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Latency Mitigation Strategies</title>
		<link>http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/</link>
		<comments>http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/#comments</comments>
		<pubDate>Fri, 22 Feb 2013 17:25:46 +0000</pubDate>
		<dc:creator>John-Carmack</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29198</guid>
		<description><![CDATA[<p>&#160;</p>
<p><b>Abstract</b></p>
<p>Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint.  The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.</p>
<p><a href="http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/" class="more-link">Read more on Latency Mitigation Strategies&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p><b>Abstract</b></p>
<p>Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint.  The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.</p>
<p>Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible.  Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.</p>
<p>A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.</p>
<p><b>Introduction</b></p>
<p>Updating the imagery in a head mounted display (HMD) based on a head tracking sensor is a subtly different challenge than most human / computer interactions.  With a conventional mouse or game controller, the user is consciously manipulating an interface to complete a task, while the goal of virtual reality is to have the experience accepted at an unconscious level.</p>
<p>Users can adapt to control systems with a significant amount of latency and still perform challenging tasks or enjoy a game; many thousands of people enjoyed playing early network games, even with 400+ milliseconds of latency between pressing a key and seeing a response on screen.</p>
<p>If large amounts of latency are present in the VR system, users may still be able to perform tasks, but it will be by the much less rewarding means of using their head as a controller, rather than accepting that their head is naturally moving around in a stable virtual world.  Perceiving latency in the response to head motion is also one of the primary causes of simulator sickness.  Other technical factors that affect the quality of a VR experience, like head tracking accuracy and precision, may interact with the perception of latency, or, like display resolution and color depth, be largely orthogonal to it.</p>
<p>A total system latency of 50 milliseconds will feel responsive, but still subtly lagging.  One of the easiest ways to see the effects of latency in a head mounted display is to roll your head side to side along the view vector while looking at a clear vertical edge.  Latency will show up as an apparent tilting of the vertical line with the head motion; the view feels “dragged along” with the head motion.  When the latency is low enough, the virtual world convincingly feels like you are simply rotating your view of a stable world.</p>
<p>Extrapolation of sensor data can be used to mitigate some system latency, but even with a sophisticated model of the motion of the human head, there will be artifacts as movements are initiated and changed.  It is always better to not have a problem than to mitigate it, so true latency reduction should be aggressively pursued, leaving extrapolation to smooth out sensor jitter issues and perform only a small amount of prediction.</p>
<p><b>Data collection</b></p>
<p>It is not usually possible to introspectively measure the complete system latency of a VR system, because the sensors and display devices external to the host processor make significant contributions to the total latency.  An effective technique is to record high speed video that simultaneously captures the initiating physical motion and the eventual display update.  The system latency can then be determined by single stepping the video and counting the number of video frames between the two events.</p>
<p>In most cases there will be a significant jitter in the resulting timings due to aliasing between sensor rates, display rates, and camera rates, but conventional applications tend to display total latencies in the dozens of 240 fps video frames.</p>
<p>On an unloaded Windows 7 system with the compositing Aero desktop interface disabled, a gaming mouse dragging a window displayed on a 180 Hz CRT monitor can show a response on screen in the same 240 fps video frame that the mouse was seen to first move, demonstrating an end to end latency below four milliseconds.  Many systems need to cooperate for this to happen: The mouse updates 500 times a second, with no filtering or buffering.  The operating system immediately processes the update, and immediately performs GPU accelerated rendering directly to the framebuffer without any page flipping or buffering.  The display accepts the video signal with no buffering or processing, and the screen phosphors begin emitting new photons within microseconds.</p>
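<p>As a rough illustration of the frame counting measurement, the sketch below converts a count of high speed video frames into a latency interval.  The function name and the one frame uncertainty on each side are assumptions made for illustration, not part of any particular measurement tool.</p>

```python
def latency_bounds_ms(frames_between, camera_fps=240.0):
    """Return (low, high) latency bounds in milliseconds.

    frames_between is the number of video frames separating the frame that
    first shows the physical motion from the frame that first shows the
    display response.  Each event happened at an unknown time within the
    frame interval before it was first seen, so the true latency lies
    within one frame time of the counted difference.
    """
    frame_ms = 1000.0 / camera_fps
    low = max(0.0, (frames_between - 1) * frame_ms)
    high = (frames_between + 1) * frame_ms
    return low, high
```

<p>With both events visible in the same 240 fps frame, <code>latency_bounds_ms(0)</code> gives an upper bound of one frame time, a little over four milliseconds.</p>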
<p>In a typical VR system, many things go far less optimally, sometimes resulting in end to end latencies of over 100 milliseconds.</p>
<p><b>Sensors</b></p>
<p>Detecting a physical action can be as simple as watching a circuit close for a button press, or as complex as analyzing a live video feed to infer position and orientation.</p>
<p>In the old days, executing an IO port input instruction could directly trigger an analog to digital conversion on an ISA bus adapter card, giving a latency on the order of a microsecond and no sampling jitter issues.  Today, sensors are systems unto themselves, and may have internal pipelines and queues that need to be traversed before the information is even put on the USB serial bus to be transmitted to the host.</p>
<p>Analog sensors have an inherent tension between random noise and sensor bandwidth, and some combination of analog and digital filtering is usually done on a signal before returning it.  Sometimes this filtering is excessive, which can contribute significant latency and remove subtle motions completely.</p>
<p>Communication bandwidth delay on older serial ports or wireless links can be significant in some cases.  If the sensor messages occupy the full bandwidth of a communication channel, latency equal to the repeat time of the sensor is added simply for transferring the message.  Video data streams can stress even modern wired links, which may encourage the use of data compression, which usually adds another full frame of latency if not explicitly implemented in a pipelined manner.</p>
<p>Filtering and communication are constant delays, but the discretely packetized nature of most sensor updates introduces a variable latency, or “jitter”, as the sensor data is used for a video frame rate that differs from the sensor frame rate.  This latency ranges from close to zero if the sensor packet arrived just before it was queried, up to the repeat time for sensor messages.  Most USB HID devices update at 125 samples per second, giving a jitter of up to 8 milliseconds, but it is possible to receive 1000 updates a second from some USB hardware.  The operating system may impose an additional random delay of up to a couple milliseconds between the arrival of a message and a user mode application getting the chance to process it, even on an unloaded system.</p>
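<p>The jitter figures above are just the sensor repeat time.  A minimal sketch, assuming evenly spaced samples and ignoring the operating system delay (both function names are hypothetical):</p>

```python
def sensor_jitter_ms(sensor_hz):
    """Worst-case age of the newest sample when it is queried, in milliseconds."""
    return 1000.0 / sensor_hz


def sample_age_ms(query_time_ms, sensor_hz, first_sample_ms=0.0):
    """Age of the most recent sample at query_time_ms.

    Assumes samples arrive exactly every 1000 / sensor_hz milliseconds
    starting at first_sample_ms, and that query_time_ms is not earlier
    than first_sample_ms.
    """
    period_ms = 1000.0 / sensor_hz
    return (query_time_ms - first_sample_ms) % period_ms
```

<p>For a 125 samples per second device, <code>sensor_jitter_ms(125)</code> is 8 milliseconds, and a query made 20 milliseconds after the first sample sees data that is 4 milliseconds old.</p>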
<p><b>Displays</b></p>
<p>On old CRT displays, the voltage coming out of the video card directly modulated the voltage of the electron gun, which caused the screen phosphors to begin emitting photons a few microseconds after a pixel was read from the frame buffer memory.</p>
<p>Early LCDs were notorious for “ghosting” during scrolling or animation, still showing traces of old images many tens of milliseconds after the image was changed, but significant progress has been made in the last two decades.  The transition times for LCD pixels vary based on the start and end values being transitioned between, but a good panel today will have a switching time around ten milliseconds, and optimized displays for active 3D and gaming can have switching times less than half that.</p>
<p>Modern displays are also expected to perform a wide variety of processing on the incoming signal before they change the actual display elements.  A typical Full HD display today will accept 720p or interlaced composite signals and convert them to the 1920&#215;1080 physical pixels.  24 fps movie footage will be converted to 60 fps refresh rates.  Stereoscopic input may be converted from side-by-side, top-down, or other formats to frame sequential for active displays, or interlaced for passive displays.  Content protection may be applied.  Many consumer oriented displays have started applying motion interpolation and other sophisticated algorithms that require multiple frames of buffering.</p>
<p>Some of these processing tasks could be handled by only buffering a single scan line, but some of them fundamentally need one or more full frames of buffering, and display vendors have tended to implement the general case without optimizing for the cases that could be done with low or no delay.  Some consumer displays wind up buffering three or more frames internally, resulting in 50 milliseconds of latency even when the input data could have been fed directly into the display matrix.</p>
<p>Some less common display technologies have speed advantages over LCD panels; OLED pixels can have switching times well under a millisecond, and laser displays are as instantaneous as CRTs.</p>
<p>A subtle latency point is that most displays present an image incrementally as it is scanned out from the computer, which has the effect that the bottom of the screen changes 16 milliseconds later than the top of the screen on a 60 fps display.  This is rarely a problem on a static display, but on a head mounted display it can cause the world to appear to shear left and right, or “waggle” as the head is rotated, because the source image was generated for an instant in time, but different parts are presented at different times.  This effect is usually masked by switching times on LCD HMDs, but it is obvious with fast OLED HMDs.</p>
<p><b>Host processing</b></p>
<p>The classic processing model for a game or VR application is:</p>
<p>Read user input -&gt; run simulation -&gt; issue rendering commands -&gt; graphics drawing -&gt; wait for vsync -&gt; scanout</p>
<p>I = Input sampling and dependent calculation<br />
S = simulation / game execution<br />
R = rendering engine<br />
G = GPU drawing time<br />
V = video scanout time</p>
<p>All latencies are based on a frame time of roughly 16 milliseconds, a progressively scanned display, and zero sensor and pixel latency.</p>
<p>If the performance demands of the application are well below what the system can provide, a straightforward implementation with no parallel overlap will usually provide fairly good latency values.  However, if running synchronized to the video refresh, the minimum latency will still be 16 ms even if the system is infinitely fast.   This rate feels good for most eye-hand tasks, but it is still a perceptible lag that can be felt in a head mounted display, or in the responsiveness of a mouse cursor.</p>
<pre>Ample performance, vsync:
ISRG------------|VVVVVVVVVVVVVVVV|
.................. latency 16 – 32 milliseconds</pre>
<p>Running without vsync on a very fast system will deliver better latency, but only over a fraction of the screen, and with visible tear lines.  The impact of the tear lines is related to the disparity between the two frames that are being torn between, and the amount of time that the tear lines are visible.  Tear lines look worse on a continuously illuminated LCD than on a CRT or laser projector, and worse on a 60 fps display than a 120 fps display.  Somewhat counteracting that, slow switching LCD panels blur the impact of the tear line relative to the faster displays.</p>
<p>If enough frames were rendered such that each scan line had a unique image, the effect would be of a “rolling shutter”, rather than visible tear lines, and the image would feel continuous.  Unfortunately, even rendering 1000 frames a second, giving approximately 15 bands on screen separated by tear lines, is still quite objectionable on fast switching displays, and few scenes are capable of being rendered at that rate, let alone 60x higher for a true rolling shutter on a 1080P display.</p>
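<p>The arithmetic behind those figures can be sketched as follows.  This version ignores the vertical blanking interval, which is an assumption, so it slightly overestimates the number of bands actually visible on screen.</p>

```python
def tear_bands(render_fps, refresh_hz):
    """Approximate number of bands per refresh when rendering without vsync.

    Ignores vertical blanking, so the visible count is somewhat lower.
    """
    return render_fps / refresh_hz


def rolling_shutter_fps(visible_lines, refresh_hz):
    """Render rate needed for every scan line to come from its own frame."""
    return visible_lines * refresh_hz
```

<p>At 1000 frames per second on a 60 Hz display this gives just under 17 bands before blanking is accounted for, and a 1080 line display would need 64,800 frames per second for one image per scan line.</p>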
<pre>Ample performance, unsynchronized:
ISRG
VVVVV
..... latency 5 – 8 milliseconds at ~200 frames per second</pre>
<p>In most cases, performance is a constant point of concern, and a parallel pipelined architecture is adopted to allow multiple processors to work in parallel instead of sequentially.  Large command buffers on GPUs can buffer an entire frame of drawing commands, which allows them to overlap the work on the CPU, which generally gives a significant frame rate boost at the expense of added latency.</p>
<pre>CPU:ISSSSSRRRRRR----|
GPU:                |GGGGGGGGGGG----|
VID:                |               |VVVVVVVVVVVVVVVV|
    .................................. latency 32 – 48 milliseconds</pre>
<p>When the CPU load for the simulation and rendering no longer fits in a single frame, multiple CPU cores can be used in parallel to produce more frames.  It is possible to reduce frame execution time without increasing latency in some cases, but the natural split of simulation and rendering has often been used to allow effective pipeline parallel operation.  Work queue approaches buffered for maximum overlap can cause an additional frame of latency if they are on the critical user responsiveness path.</p>
<pre>CPU1:ISSSSSSSS-------|
CPU2:                |RRRRRRRRR-------|
GPU :                |                |GGGGGGGGGG------|
VID :                |                |                |VVVVVVVVVVVVVVVV|
     .................................................... latency 48 – 64 milliseconds</pre>
<p>Even if an application is running at a perfectly smooth 60 fps, it can still have host latencies of over 50 milliseconds, and an application targeting 30 fps could have twice that.   Sensor and display latencies can add significant additional amounts on top of that, so the goal of 20 milliseconds motion-to-photons latency is challenging to achieve.</p>
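<p>The latency ranges in the diagrams above follow a simple pattern: each pipelined stage ahead of scanout costs one frame time, and scanout itself adds between zero and one more frame depending on the scan line.  A minimal sketch of that model, using the same roughly 16 millisecond frame and assuming zero sensor and pixel latency (the function name is hypothetical):</p>

```python
def host_latency_ms(frames_before_scanout, frame_ms=16.0):
    """Return (low, high) host latency for a vsynced pipeline.

    frames_before_scanout counts the frame-long stages between input
    sampling and the start of scanout.  The low bound is for the top scan
    line and the high bound is for the bottom scan line.
    """
    low = frames_before_scanout * frame_ms
    high = low + frame_ms
    return low, high
```

<p>One stage gives 16 – 32 milliseconds, two give 32 – 48, and three give 48 – 64.  Doubling <code>frame_ms</code> for a 30 fps application doubles every figure.</p>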
<p><b>Latency Reduction Strategies</b></p>
<p><b>Prevent GPU buffering</b></p>
<p>The drive to win frame rate benchmark wars has led driver writers to aggressively buffer drawing commands, and there have even been cases where drivers ignored explicit calls to glFinish() in the name of improved “performance”.  Today’s fence primitives do appear to be reliably observed for drawing primitives, but the semantics of buffer swaps are still worryingly imprecise.  A recommended sequence of commands to synchronize with the vertical retrace and idle the GPU is:</p>
<p>SwapBuffers();<br />
DrawTinyPrimitive();<br />
InsertGPUFence();<br />
BlockUntilFenceIsReached();</p>
<p>While this should always prevent excessive command buffering on any conformant driver, it could conceivably fail to provide an accurate vertical sync timing point if the driver was transparently implementing triple buffering.</p>
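<p>The ordering of that sequence can be sketched against a stand-in driver object.  Everything here is hypothetical scaffolding that only records calls; in OpenGL 3.2 and later the two fence steps would presumably map to glFenceSync() and glClientWaitSync(), and the swap call is platform specific.</p>

```python
class RecordingDriver:
    """Stand-in for a graphics API; records calls instead of touching a GPU."""

    def __init__(self):
        self.calls = []
        self._next_fence = 0

    def swap_buffers(self):
        self.calls.append("swap_buffers")

    def draw_tiny_primitive(self):
        self.calls.append("draw_tiny_primitive")

    def insert_fence(self):
        self._next_fence += 1
        self.calls.append("insert_fence")
        return self._next_fence

    def block_until_fence(self, fence):
        self.calls.append("block_until_fence")


def sync_to_vertical_retrace(driver):
    """Idle the GPU at the retrace so no stale commands remain queued."""
    driver.swap_buffers()
    # The tiny primitive cannot be drawn until the swap has really happened.
    driver.draw_tiny_primitive()
    # The fence is reached only after the tiny primitive has been drawn.
    fence = driver.insert_fence()
    driver.block_until_fence(fence)
```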
<p>To minimize the performance impact of synchronizing with the GPU, it is important to have sufficient work ready to send to the GPU immediately after the synchronization is performed.  The details of exactly when the GPU can begin executing commands are platform specific, but execution can be explicitly kicked off with glFlush() or equivalent calls.  If the code issuing drawing commands does not proceed fast enough, the GPU may complete all the work and go idle with a “pipeline bubble”.  Because the CPU time to issue a drawing command may have little relation to the GPU time required to draw it, these pipeline bubbles may cause the GPU to take noticeably longer to draw the frame than if it were completely buffered.  Ordering the drawing so that larger and slower operations happen first will provide a cushion, as will pushing as much preparatory work as possible before the synchronization point.</p>
<pre>Run GPU with minimal buffering:
CPU1:ISSSSSSSS-------|
CPU2:                |RRRRRRRRR-------|
GPU :                |-GGGGGGGGGG-----|
VID :                |                |VVVVVVVVVVVVVVVV|
     ................................... latency 32 – 48 milliseconds</pre>
<p>Tile based renderers, as are found in most mobile devices, inherently require a full scene of command buffering before they can generate their first tile of pixels, so synchronizing before issuing any commands will destroy far more overlap.  In a modern rendering engine there may be multiple scene renders for each frame to handle shadows, reflections, and other effects, but increased latency is still a fundamental drawback of the technology.</p>
<p>High end, multiple GPU systems today are usually configured for AFR, or Alternate Frame Rendering, where each GPU is allowed to take twice as long to render a single frame, but the overall frame rate is maintained because there are two GPUs producing frames.</p>
<pre>Alternate Frame Rendering dual GPU:
CPU1:ISSSSSSSS-------|ISSSSSSSS-------|
CPU2:                |RRRRRRRRR-------|RRRRRRRRR-------|
GPU1:                | GGGGGGGGGGGGGGGGGGGGGGGG--------|
GPU2:                |                | GGGGGGGGGGGGGGGGGGGGGGG---------|
VID :                |                |                |VVVVVVVVVVVVVVVV|
     .................................................... latency 48 – 64 milliseconds</pre>
<p>Similarly to the case with CPU workloads, it is possible to have two or more GPUs cooperate on a single frame in a way that delivers more work in a constant amount of time, but it increases complexity and generally delivers a lower total speedup.</p>
<p>An attractive direction for stereoscopic rendering is to have each GPU on a dual GPU system render one eye, which would deliver maximum performance and minimum latency, at the expense of requiring the application to maintain buffers across two independent rendering contexts.</p>
<p>The downside to preventing GPU buffering is that throughput performance may drop, resulting in more dropped frames under heavily loaded conditions.</p>
<p><b>Late frame scheduling</b></p>
<p>Much of the work in the simulation task does not depend directly on the user input, or would be insensitive to a frame of latency in it.  If the user processing is done last, and the input is sampled just before it is needed, rather than stored off at the beginning of the frame, the total latency can be reduced.</p>
<p>It is very difficult to predict the time required for the general simulation work on the entire world, but the work just for the player’s view response to the sensor input can be made essentially deterministic.  If this is split off from the main simulation task and delayed until shortly before the end of the frame, it can remove nearly a full frame of latency.</p>
<pre>Late frame scheduling:
CPU1:SSSSSSSSS------I|
CPU2:                |RRRRRRRRR-------|
GPU :                |-GGGGGGGGGG-----|
VID :                |                |VVVVVVVVVVVVVVVV|
                    .................... latency 18 – 34 milliseconds</pre>
<p>Adjusting the view is the most latency sensitive task; actions resulting from other user commands, like animating a weapon or interacting with other objects in the world, are generally insensitive to an additional frame of latency, and can be handled in the general simulation task the following frame.</p>
<p>The drawback to late frame scheduling is that it introduces a tight scheduling requirement that usually requires busy waiting to meet, wasting power.  If your frame rate is determined by the video retrace rather than an arbitrary time slice, assistance from the graphics driver in accurately determining the current scanout position is helpful.</p>
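<p>A minimal sketch of the scheduling involved, assuming the view work has a known worst case cost and that a small safety margin is acceptable.  The function names, the margin, and the injectable clock are assumptions for illustration.</p>

```python
import time


def late_sample_deadline(frame_start, frame_period, view_work_budget, safety_margin):
    """Time at which input should be sampled so the view work ends with the frame.

    All arguments must be in the same units as the clock used for waiting.
    """
    return frame_start + frame_period - view_work_budget - safety_margin


def busy_wait_until(deadline, clock=time.perf_counter):
    """Spin until the clock reaches the deadline and return the time observed.

    Spinning burns power, but sleeping is usually too coarse for this job.
    """
    now = clock()
    while now < deadline:
        now = clock()
    return now
```

<p>With a 16 millisecond frame, 1 millisecond of view work, and a half millisecond margin, the input would be sampled 14.5 milliseconds into the frame.</p>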
<p><b>View bypass</b></p>
<p>An alternate way of accomplishing a similar, or slightly greater, latency reduction is to allow the rendering code to modify the parameters delivered to it by the game code, based on a newer sampling of user input.</p>
<p>At the simplest level, the user input can be used to calculate a delta from the previous sampling to the current one, which can be used to modify the view matrix that the game submitted to the rendering code.</p>
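<p>A minimal sketch of the delta calculation, using unit quaternions in (w, x, y, z) order for the orientation instead of a full view matrix.  That representation and all of the function names are assumptions made to keep the example short.</p>

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def yaw_quat(degrees):
    """Rotation about the vertical (y) axis."""
    half = math.radians(degrees) / 2.0
    return (math.cos(half), 0.0, math.sin(half), 0.0)


def bypass_orientation(game_view, sensor_at_game_time, sensor_now):
    """Apply the sensor change since the game sampled it to the game's view."""
    delta = quat_mul(sensor_now, quat_conj(sensor_at_game_time))
    return quat_mul(delta, game_view)
```

<p>If the game built its view from a sensor reading of 30 degrees of yaw and the sensor now reads 35 degrees, the renderer ends up drawing with 35 degrees of yaw.</p>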
<p>Delta processing in this way is minimally intrusive, but there will often be situations where the user input should not affect the rendering, such as cinematic cut scenes or when the player has died.  It can be argued that a game designed from scratch for virtual reality should avoid those situations, because a non-responsive view in a HMD is disorienting and unpleasant, but conventional game design has many such cases.</p>
<p>A binary flag could be provided to disable the bypass calculation, but it is useful to generalize such that the game provides an object or function with embedded state that produces rendering parameters from sensor input data instead of having the game provide the view parameters themselves.  In addition to handling the trivial case of ignoring sensor input, the generator function can incorporate additional information such as a head/neck positioning model that modifies position based on orientation, or lists of other models to be positioned relative to the updated view.</p>
<p>If the game and rendering code are running in parallel, it is important that the parameter generation function does not reference any game state to avoid race conditions.</p>
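<p>One possible shape for such a generator is sketched below.  It is limited to yaw for brevity, and the class name, the neck offsets, and the returned dictionary are all assumptions.  The object copies the values it needs when the game creates it, so the render task never reads live game state.</p>

```python
import math


class ViewGenerator:
    """Turns sensor input into view parameters using only its own snapshot."""

    def __init__(self, neck_pivot, neck_to_eye, fixed_yaw_deg=None):
        self.neck_pivot = tuple(neck_pivot)    # world position of the neck pivot
        self.neck_to_eye = tuple(neck_to_eye)  # eye offset from the pivot, head space
        self.fixed_yaw_deg = fixed_yaw_deg     # set to ignore the sensor (cut scenes)

    def __call__(self, sensor_yaw_deg):
        if self.fixed_yaw_deg is not None:
            yaw = self.fixed_yaw_deg
        else:
            yaw = sensor_yaw_deg
        a = math.radians(yaw)
        ox, oy, oz = self.neck_to_eye
        # Rotate the eye offset about the vertical (y) axis.
        rx = ox * math.cos(a) + oz * math.sin(a)
        rz = -ox * math.sin(a) + oz * math.cos(a)
        px, py, pz = self.neck_pivot
        return {"yaw_deg": yaw, "eye_position": (px + rx, py + oy, pz + rz)}
```

<p>The game would hand a fresh generator to the renderer each frame, and the renderer would call it with the newest sensor reading just before building the view matrix.</p>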
<pre>View bypass:
CPU1:ISSSSSSSSS------|
CPU2:                |IRRRRRRRRR------|
GPU :                |--GGGGGGGGGG----|
VID :                |                |VVVVVVVVVVVVVVVV|
                      .................. latency 16 – 32 milliseconds</pre>
<p>The input is only sampled once per frame, but it is simultaneously used by both the simulation task and the rendering task.  Some input processing work is now duplicated by the simulation task and the render task, but it is generally minimal.</p>
<p>The latency for parameters produced by the generator function is now reduced, but other interactions with the world, like muzzle flashes and physics responses, remain at the same latency as the standard model.</p>
<p>A modified form of view bypass could allow tile based GPUs to achieve similar view latencies to non-tiled GPUs, or allow non-tiled GPUs to achieve 100% utilization without pipeline bubbles by the following steps:</p>
<p>Inhibit the execution of GPU commands, forcing them to be buffered.  OpenGL has only the deprecated display list functionality to approximate this, but a control extension could be formulated.</p>
<p>All calculations that depend on the view matrix must reference it independently from a buffer object, rather than from inline parameters or as a composite model-view-projection (MVP) matrix.</p>
<p>After all commands have been issued and the next frame has started, sample the user input, run it through the parameter generator, and put the resulting view matrix into the buffer object for referencing by the draw commands.</p>
<p>Kick off the draw command execution.</p>
<pre>Tiler optimized view bypass:
CPU1:ISSSSSSSSS------|
CPU2:                |IRRRRRRRRRR-----|I
GPU :                |                |-GGGGGGGGGG-----|
VID :                |                |                |VVVVVVVVVVVVVVVV|
                                       .................. latency 16 – 32 milliseconds</pre>
<p>Any view frustum culling that was performed to avoid drawing some models may be invalid if the new view matrix has changed substantially enough from what was used during the rendering task.  This can be mitigated at some performance cost by using a larger frustum field of view for culling, and hardware clip planes based on the culling frustum limits can be used to guarantee a clean edge if necessary.  Occlusion errors from culling, where a bright object is seen that should have been occluded by an object that was incorrectly culled, are very distracting, but a temporary clean encroaching of black at a screen edge during rapid rotation is almost unnoticeable.</p>
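<p>How much wider the culling frustum needs to be depends on how far the head can turn between the culling pass and the late view update.  A rough sketch, assuming a bound on head rotation rate is known; the 300 degrees per second used in the note below is only an illustrative figure, not a measured one.</p>

```python
def culling_fov_deg(render_fov_deg, max_head_rate_deg_per_s, bypass_delay_ms):
    """Field of view to cull with so a late view change exposes no culled models.

    The slack is added on both sides.  The result is capped because a planar
    perspective frustum cannot reach 180 degrees.
    """
    slack_deg = max_head_rate_deg_per_s * bypass_delay_ms / 1000.0
    return min(179.0, render_fov_deg + 2.0 * slack_deg)
```

<p>For a 90 degree view, a 16 millisecond gap, and an assumed 300 degrees per second limit, culling would use a field of view of about 99.6 degrees.</p>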
<p><b>Time warping</b></p>
<p>If you had perfect knowledge of how long the rendering of a frame would take, some additional amount of latency could be saved by late frame scheduling the entire rendering task, but this is not practical due to the wide variability in frame rendering times.</p>
<pre>Late frame input sampled view bypass:
CPU1:ISSSSSSSSS------|
CPU2:                |----IRRRRRRRRR--|
GPU :                |------GGGGGGGGGG|
VID :                |                |VVVVVVVVVVVVVVVV|
                          .............. latency 12 – 28 milliseconds</pre>
<p>However, a post processing task on the rendered image can be counted on to complete in a fairly predictable amount of time, and can be late scheduled more easily.  Any pixel on the screen, along with the associated depth buffer value, can be converted back to a world space position, which can be re-transformed to a different screen space pixel location for a modified set of view parameters.</p>
<p>After drawing a frame with the best information at your disposal, possibly with bypassed view parameters, instead of displaying it directly, fetch the latest user input, generate updated view parameters, and calculate a transformation that warps the rendered image into a position that approximates where it would be with the updated parameters.  Using that transform, warp the rendered image into an updated form on screen that reflects the new input.  If there are two dimensional overlays present on the screen that need to remain fixed, they must be drawn or composited in after the warp operation, to prevent them from incorrectly moving as the view parameters change.</p>
<pre>Late frame scheduled time warp:
CPU1:ISSSSSSSSS------|
CPU2:                |RRRRRRRRRR----IR|
GPU :                |-GGGGGGGGGG----G|
VID :                |                |VVVVVVVVVVVVVVVV|
                                    .... latency 2 – 18 milliseconds</pre>
<p>If the difference between the view parameters at the time of the scene rendering and the time of the final warp is only a change in direction, the warped image can be almost exactly correct within the limits of the image filtering.  Effects that are calculated relative to the screen, like depth based fog (versus distance based fog) and billboard sprites will be slightly different, but not in a manner that is objectionable.</p>
<p>If the warp involves translation as well as direction changes, geometric silhouette edges begin to introduce artifacts where internal parallax would have revealed surfaces not visible in the original rendering.  A scene with no silhouette edges, like the inside of a box, can be warped significant amounts and display only changes in texture density, but translation warping realistic scenes will result in smears or gaps along edges.  In many cases these are difficult to notice, and they always disappear when motion stops, but first person view hands and weapons are a prominent case.  This can be mitigated by limiting the amount of translation warp, compressing or making constant the depth range of the scene being warped to limit the dynamic separation, or rendering the disconnected near field objects as a separate plane, to be composited in after the warp.</p>
<p>If an image is being warped to a destination with the same field of view, most warps will leave some corners or edges of the new image undefined, because none of the source pixels are warped to their locations.  This can be mitigated by rendering a larger field of view than the destination requires; but simply leaving unrendered pixels black is surprisingly unobtrusive, especially in a wide field of view HMD.</p>
<p>A forward warp, where source pixels are deposited in their new positions, offers the best accuracy for arbitrary transformations.  At the limit, the frame buffer and depth buffer could be treated as a height field, but millions of half pixel sized triangles would have a severe performance cost.  Using a grid of triangles at some fraction of the depth buffer resolution can bring the cost down to a very low level, and the trivial case of treating the rendered image as a single quad avoids all silhouette artifacts at the expense of incorrect pixel positions under translation.</p>
<p>Reverse warping, where the pixel in the source rendering is estimated based on the position in the warped image, can be more convenient because it is implemented completely in a fragment shader.  It can produce identical results for simple direction changes, but additional artifacts near geometric boundaries are introduced if per-pixel depth information is considered, unless considerable effort is expended to search a neighborhood for the best source pixel.</p>
<p>If desired, it is straightforward to incorporate motion blur in a reverse mapping by taking several samples along the line from the pixel being warped to the transformed position in the source image.</p>
<p>Reverse mapping also allows the possibility of modifying the warp through the video scanout.  The view parameters can be predicted ahead in time to when the scanout will read the bottom row of pixels, which can be used to generate a second warp matrix.  The warp to be applied can be interpolated between the two of them based on the pixel row being processed.  This can correct for the “waggle” effect on a progressively scanned head mounted display, where the 16 millisecond difference in time between the display showing the top line and bottom line results in a perceived shearing of the world under rapid rotation on fast switching displays.</p>
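<p>The per row interpolation can be sketched with a single yaw angle standing in for the warp matrix.  That simplification, the constant rotation rate prediction, and the function names are assumptions.</p>

```python
def predict_bottom_yaw_deg(yaw_top_deg, yaw_rate_deg_per_s, scanout_ms=16.0):
    """Predict the yaw at the time the bottom row is scanned out.

    Assumes the head keeps turning at a constant rate during scanout.
    """
    return yaw_top_deg + yaw_rate_deg_per_s * scanout_ms / 1000.0


def row_warp_yaw_deg(row, total_rows, yaw_top_deg, yaw_bottom_deg):
    """Yaw to warp a given pixel row with; row 0 is scanned out first.

    total_rows must be at least 2.
    """
    t = row / (total_rows - 1)
    return yaw_top_deg + t * (yaw_bottom_deg - yaw_top_deg)
```

<p>At 100 degrees per second the bottom row is warped with 1.6 degrees more yaw than the top row, which is the shear that would otherwise be seen as “waggle”.</p>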
<p><b>Continuously updated time warping</b></p>
<p>If the necessary feedback and scheduling mechanisms are available, instead of predicting what the warp transformation should be at the bottom of the frame and warping the entire screen at once, the warp to screen can be done incrementally while continuously updating the warp matrix as new input arrives.</p>
<pre>Continuous time warp:
CPU1:ISSSSSSSSS------|
CPU2:                |RRRRRRRRRRR-----|
GPU :                |-GGGGGGGGGGGG---|
WARP:                |               W| W W W W W W W W|
VID :                |                |VVVVVVVVVVVVVVVV|
                                     ... latency 2 – 3 milliseconds for 500hz sensor updates</pre>
<p>The ideal interface for doing this would be some form of “scanout shader” that would be called “just in time” for the video display.  Several video game systems like the Atari 2600, Jaguar, and Nintendo DS have had buffers ranging from half a scan line to several scan lines that were filled up in this manner.</p>
<p>Without new hardware support, it is still possible to incrementally perform the warping directly to the front buffer being scanned for video, and not perform a swap buffers operation at all.</p>
<p>A CPU core could be dedicated to the task of warping scan lines at roughly the speed they are consumed by the video output, updating the time warp matrix each scan line to blend in the most recently arrived sensor information.</p>
<p>GPUs can perform the time warping operation much more efficiently than a conventional CPU can, but the GPU will be busy drawing the next frame during video scanout, and GPU drawing operations cannot currently be scheduled with high precision due to the difficulty of task switching the deep pipelines and extensive context state.  However, modern GPUs are beginning to allow compute tasks to run in parallel with graphics operations, which may allow a fraction of a GPU to be dedicated to performing the warp operations as a shared parameter buffer is updated by the CPU.</p>
<p><b>Discussion</b></p>
<p>View bypass and time warping are complementary techniques that can be applied independently or together.  Time warping can warp from a source image at an arbitrary view time / location to any other one, but artifacts from internal parallax and screen edge clamping are reduced by using the most recent source image possible, which view bypass rendering helps provide.</p>
<p>Actions that require simulation state changes, like flipping a switch or firing a weapon, still need to go through the full pipeline for 32 – 48 milliseconds of latency based on what scan line the result winds up displaying on the screen, and translational information may not be completely faithfully represented below the 16 – 32 milliseconds of the view bypass rendering, but the critical head orientation feedback can be provided in 2 – 18 milliseconds on a 60 hz display.  In conjunction with low latency sensors and displays, this will generally be perceived as immediate.  Continuous time warping opens up the possibility of latencies below 3 milliseconds, which may cross largely unexplored thresholds in human / computer interactivity.</p>
<p>Conventional computer interfaces are generally not as latency demanding as virtual reality, but sensitive users can tell the difference in mouse response down to the same 20 milliseconds or so, making it worthwhile to apply these techniques even in applications without a VR focus.</p>
<p>A particularly interesting application is in “cloud gaming”, where a simple client appliance or application forwards control information to a remote server, which streams back real time video of the game.  This offers significant convenience benefits for users, but the inherent network and compression latencies make it a lower quality experience for action oriented titles.  View bypass and time warping can both be performed on the server, regaining a substantial fraction of the latency imposed by the network.  If the cloud gaming client were made more sophisticated, time warping could be performed locally, which could theoretically reduce the latency to the same levels as local applications, but it would probably be prudent to restrict the total amount of time warping to perhaps 30 or 40 milliseconds to limit the distance from the source images.</p>
<p><b>Acknowledgements</b></p>
<p>Zenimax for allowing me to publish this openly.</p>
<p>Hillcrest Labs for inertial sensors and experimental firmware.</p>
<p>Emagin for access to OLED displays.</p>
<p>Oculus for a prototype Rift HMD.</p>
<p>Nvidia for an experimental driver with access to the current scan line number.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/02/22/latency-mitigation-strategies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Implementing Voxel Cone Tracing</title>
		<link>http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/</link>
		<comments>http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/#comments</comments>
		<pubDate>Thu, 31 Jan 2013 14:22:12 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[global illumination]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sparse voxel octree]]></category>
		<category><![CDATA[voxel cone tracing]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29117</guid>
		<description><![CDATA[<p>[<b>Updated on 25-2-2013</b>: added a paragraph about lowering the voxel update frequency for faster dynamic updates. A combo box for choosing the update frequency was also added to the <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a>.]</p>
<p><b><span class="Apple-style-span" style="font-size: large">Introduction</span></b></p>
<p>At last year's SIGGRAPH, Epic Games presented their <a href="http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf">real time GI solution</a>, which is based on <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing</a>. They showed some nice results, which prompted me to implement the technique. My implementation runs at around 22~30fps (updated every frame) at 1024&#215;768 screen resolution, using a 256x256x256 voxel volume on my GTX460 graphics card. The demo program, which requires a DX11 GPU to run, can be downloaded <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">here</a>.</p>
<p><a href="http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/" class="more-link">Read more on Implementing Voxel Cone Tracing&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>[<b>Updated on 25-2-2013</b>: added a paragraph about lowering the voxel update frequency for faster dynamic updates. A combo box for choosing the update frequency was also added to the <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a>.]</p>
<p><b><span class="Apple-style-span" style="font-size: large">Introduction</span></b></p>
<p>At last year's SIGGRAPH, Epic Games presented their <a href="http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf">real time GI solution</a>, which is based on <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing</a>. They showed some nice results, which prompted me to implement the technique. My implementation runs at around 22~30fps (updated every frame) at 1024&#215;768 screen resolution, using a 256x256x256 voxel volume on my GTX460 graphics card. The demo program, which requires a DX11 GPU to run, can be downloaded <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">here</a>.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://3.bp.blogspot.com/-XaUDldta9LI/UQVHxULKiOI/AAAAAAAAAhQ/-4Sb64oMMRU/s1600/gi0.png"><img alt="" src="http://3.bp.blogspot.com/-XaUDldta9LI/UQVHxULKiOI/AAAAAAAAAhQ/-4Sb64oMMRU/s1600/gi0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">With GI</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-tjaW2p5zK5o/UQVH2qHmiSI/AAAAAAAAAhY/C84ROUJDIzk/s1600/gi1.png"><img alt="" src="http://1.bp.blogspot.com/-tjaW2p5zK5o/UQVH2qHmiSI/AAAAAAAAAhY/C84ROUJDIzk/s1600/gi1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Without GI</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Overview</span></b></p>
<p>There are 5 major steps in voxel cone tracing:</p>
<blockquote><p>0. Given a scene with direct lighting only<br />
1. Voxelize the triangle meshes<br />
2. Construct sparse voxel octree<br />
3. Inject direct lighting into the octree<br />
4. Filter the direct lighting to generate mip-map<br />
5. Sample the mip-mapped values by cone tracing</p></blockquote>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-MO4f_YWl9wY/UQVH9eQDFkI/AAAAAAAAAhg/5_DCCzljCrE/s1600/ov0.png"><img alt="" src="http://2.bp.blogspot.com/-MO4f_YWl9wY/UQVH9eQDFkI/AAAAAAAAAhg/5_DCCzljCrE/s1600/ov0.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 0</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-P2BVPwilQsE/UQVIEgGhRoI/AAAAAAAAAho/XdXtwZ_EMmw/s1600/ov1.png"><img alt="" src="http://4.bp.blogspot.com/-P2BVPwilQsE/UQVIEgGhRoI/AAAAAAAAAho/XdXtwZ_EMmw/s1600/ov1.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 1</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-IOAxp7niW_s/UQVII8ur4OI/AAAAAAAAAhw/OJdMZ6trHjk/s1600/ov2.png"><img alt="" src="http://1.bp.blogspot.com/-IOAxp7niW_s/UQVII8ur4OI/AAAAAAAAAhw/OJdMZ6trHjk/s1600/ov2.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 2</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-_Y2TWCOlTxs/UQVINT1T_iI/AAAAAAAAAh4/4NGeItw8SIs/s1600/ov3.png"><img alt="" src="http://1.bp.blogspot.com/-_Y2TWCOlTxs/UQVINT1T_iI/AAAAAAAAAh4/4NGeItw8SIs/s1600/ov3.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 3</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-f-2JVZ5PLiU/UQVIRhWUIpI/AAAAAAAAAiA/2njRLi2e19c/s1600/ov4.png"><img alt="" src="http://2.bp.blogspot.com/-f-2JVZ5PLiU/UQVIRhWUIpI/AAAAAAAAAiA/2njRLi2e19c/s1600/ov4.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 4</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-6oxrpji5wd4/UQVIW5WFmQI/AAAAAAAAAiI/QiA2Zv9viZc/s1600/ov5.png"><img alt="" src="http://4.bp.blogspot.com/-6oxrpji5wd4/UQVIW5WFmQI/AAAAAAAAAiI/QiA2Zv9viZc/s1600/ov5.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 5</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Steps 1 and 2 need to be done only once for static geometry, while steps 3 to 5 need to be done every frame. The following sections briefly describe these steps; you may want to take a look at the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">original paper</a> first, as some of the details are not repeated here.</p>
<div>
<p><b><span class="Apple-style-span" style="font-size: large">Voxelization pass</span></b></p>
<p>The first step is to voxelize the scene. I first output all the voxels from the triangle meshes into a big voxel fragment queue buffer. Each voxel fragment uses 16 bytes:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-bilhjuv67FA/UQVKbBjDjiI/AAAAAAAAAis/Oq_3yEp_C4g/s1600/voxelDataFragmentLayout.png"><img alt="" src="http://4.bp.blogspot.com/-bilhjuv67FA/UQVKbBjDjiI/AAAAAAAAAis/Oq_3yEp_C4g/s1600/voxelDataFragmentLayout.png" width="197" height="200" border="0" /></a></div>
<p>The first 4 bytes store the position of the voxel inside the voxel volume. The volume is at most 512 voxels along each axis, so each coordinate needs only 9 bits and the three XYZ coordinates (27 bits in total) fit in the 4 bytes.</p>
<p>Voxel fragments are created using the conservative rasterization described in this <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights</a> chapter, which requires only 1 geometry pass by enlarging the triangle a bit in the geometry shader. I therefore modified my <a href="http://simonstechblog.blogspot.hk/2012/08/shader-generator.html">shader generator</a> to generate the shaders for creating voxel fragments based on the material.</p>
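<p>To make the bit budget concrete, here is a small Python sketch that packs three coordinates below 512 into one 32-bit word and unpacks them again. The bit order (X in the lowest 9 bits, then Y, then Z) and the function names are assumptions for illustration only; the actual layout used in the demo is the one shown in the figure above.</p>

```python
COORD_BITS = 9                       # 2**9 = 512 positions per axis
COORD_MASK = (1 << COORD_BITS) - 1   # 0x1FF


def pack_voxel_position(x, y, z):
    """Pack three 9-bit coordinates into the low 27 bits of a 32-bit word."""
    for coord in (x, y, z):
        if not 0 <= coord <= COORD_MASK:
            raise ValueError("coordinate out of range for a 512^3 volume")
    return x | (y << COORD_BITS) | (z << (2 * COORD_BITS))


def unpack_voxel_position(packed):
    """Recover the (x, y, z) coordinates from a packed word."""
    x = packed & COORD_MASK
    y = (packed >> COORD_BITS) & COORD_MASK
    z = (packed >> (2 * COORD_BITS)) & COORD_MASK
    return x, y, z
```

<p>The largest packed value uses 27 bits, which leaves 5 bits of the word free for other uses.</p>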
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-NRyrOzHslBw/UQVKjsrh-QI/AAAAAAAAAi0/cn2y1bzi8u8/s1600/vp0.png"><img alt="" src="http://1.bp.blogspot.com/-NRyrOzHslBw/UQVKjsrh-QI/AAAAAAAAAi0/cn2y1bzi8u8/s1600/vp0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Scene before voxelization</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-LihpATI6lKk/UQVKn_Kr5mI/AAAAAAAAAi8/CwrITUtsKUs/s1600/vp1.png"><img alt="" src="http://4.bp.blogspot.com/-LihpATI6lKk/UQVKn_Kr5mI/AAAAAAAAAi8/CwrITUtsKUs/s1600/vp1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Scene after voxelization</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Octree building pass</span></b></p>
<p>I use a data structure similar to the one described in the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing paper</a> and the <a href="http://maverick.inria.fr/Publications/2011/Cra11/">giga voxel paper</a>: a large octree node buffer stores the octree/voxel node data (with 8 nodes grouped into 1 tile, as described in the paper), and a 3D texture stores the reflected radiance and alpha from the direct lighting at each voxel. I assume all surfaces reflect light diffusely, so only one 3D texture is used; this differs from the paper, which stores incoming radiance.</p>
<p>Each octree node is 28 bytes: the first 4 bytes store the child tile index, and the other 24 bytes store the byte offsets of the neighbor nodes from the start of the octree node buffer:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-s_YGclOlD4M/UQVLH2S0e8I/AAAAAAAAAjE/kLbz3nVoffQ/s1600/octreeNodeDataLayout.png"><img alt="" src="http://3.bp.blogspot.com/-s_YGclOlD4M/UQVLH2S0e8I/AAAAAAAAAjE/kLbz3nVoffQ/s1600/octreeNodeDataLayout.png" width="187" height="320" border="0" /></a></div>
<p>The child node tile index is also used to index the 3D texture brick that stores the reflected radiance of that octree node tile (more details about the texture brick can be found in the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing paper</a>). Since I use 5 bits to store bit flags for dynamic update, my voxel volume can be at most 512x512x512.</p>
<p>An octree leaf node stores the voxel data directly, which uses only 16 bytes:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-23Op1U3D4sQ/UQVLPPlwgmI/AAAAAAAAAjM/eg7h4EHjbpk/s1600/octreeLeafNodeDataLayout.png"><img alt="" src="http://4.bp.blogspot.com/-23Op1U3D4sQ/UQVLPPlwgmI/AAAAAAAAAjM/eg7h4EHjbpk/s1600/octreeLeafNodeDataLayout.png" width="199" height="320" border="0" /></a></div>
<p>The first 4 bytes store the bit flag that indicates whether the voxel was created from a static mesh, so that it will not be overwritten by dynamic geometry. The 1 byte counter stored along with the voxel normal is used to perform averaging when different voxel fragments fall into the same voxel. The steps to perform the atomic average can be found in the <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights chapter</a>.</p>
<p>So, given the voxel fragment queue output from the previous step, we can build the octree using the steps described in the <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights chapter</a> and average the octree leaf node values when different voxels fall into the same node.</p>
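<p>The counter-based averaging mentioned above can be illustrated with a sequential Python sketch. On the GPU the same update is done atomically (the OpenGL Insights chapter describes a compare-and-swap loop); the version below only shows the arithmetic of the running average and is not the shader code. The function name and the saturation at 255 (the largest value a 1 byte counter can hold) are assumptions of this sketch.</p>

```python
def accumulate_average(average, count, value, max_count=255):
    """Fold one more value into a running average kept with a small counter."""
    if count >= max_count:
        # The counter is full; ignore further contributions.
        return average, count
    new_count = count + 1
    new_average = tuple(
        (a * count + v) / new_count for a, v in zip(average, value)
    )
    return new_average, new_count
```

<p>Starting from a zero average with a zero counter, feeding in each voxel fragment's normal in turn yields the mean of all fragments that fell into the voxel.</p>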
<div>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-x5iweSNn5AE/UQVLpqY6fPI/AAAAAAAAAjU/yROoR5S8Jjw/s1600/op0.png"><img alt="" src="http://2.bp.blogspot.com/-x5iweSNn5AE/UQVLpqY6fPI/AAAAAAAAAjU/yROoR5S8Jjw/s1600/op0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">showing highest octree level</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-gaewbHxL1YA/UQVLuId7ykI/AAAAAAAAAjc/C4CXQRKQe0o/s1600/op1.png"><img alt="" src="http://1.bp.blogspot.com/-gaewbHxL1YA/UQVLuId7ykI/AAAAAAAAAjc/C4CXQRKQe0o/s1600/op1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">showing the octree with the voxels</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Inject direct lighting pass</span></b></p>
<p>After building the voxelized scene, we need to add the lighting data into the data structure to calculate global illumination. First, we render the shadow map from the light&#8217;s point of view. Then, for each texel of the shadow map, we re-construct its world position and traverse down the octree data structure to calculate the reflected radiance (assuming diffuse reflection) and write it to the 3D texture bricks.</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-vbpJe1TgxS4/UQVMStDZM2I/AAAAAAAAAjk/BTCl1_Kjcdo/s1600/ip0.png"><img alt="" src="http://2.bp.blogspot.com/-vbpJe1TgxS4/UQVMStDZM2I/AAAAAAAAAjk/BTCl1_Kjcdo/s1600/ip0.png" width="320" height="248" border="0" /></a></div>
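<p>The octree descent used during light injection can be sketched in Python as below, assuming the re-constructed world position has already been converted to integer voxel coordinates. The octant numbering (X in bit 0, Y in bit 1, Z in bit 2) and the function name are assumptions for this sketch; the demo follows its own node and tile layout.</p>

```python
def octant_path(x, y, z, levels):
    """Return the child octant chosen at each level, from the root down.

    The volume is 2**levels voxels wide, so a 512^3 volume has 9 levels.
    """
    path = []
    for level in range(levels - 1, -1, -1):
        # The bit at this level says which half of the current node
        # the voxel lies in along each axis.
        bit_x = (x >> level) & 1
        bit_y = (y >> level) & 1
        bit_z = (z >> level) & 1
        path.append(bit_x | (bit_y << 1) | (bit_z << 2))
    return path
```

<p>Each entry selects one of the 8 children in the current node's tile, so following the list from the root reaches the leaf that receives the injected light.</p>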
<p>In my engine, I use cascaded shadow maps with 4 cascades. The last cascade is used to perform the light injection, with the slope scale depth bias disabled; otherwise, the re-constructed world position may not be located exactly inside the voxel. It is also better to stabilize the shadow map (in the demo, the position of the directional light is computed from the view camera) in order to avoid flickering during the cone tracing step when the camera moves. But after stabilizing the shadow map, not all the voxels are filled with lighting data at 512x512x512 voxel resolution, as the shadow map is not fully utilized&#8230;</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-Yh_KS8-tplc/UQVMYUaxPeI/AAAAAAAAAjs/t2Gkjd8OqK4/s1600/ip1.png"><img alt="" src="http://1.bp.blogspot.com/-Yh_KS8-tplc/UQVMYUaxPeI/AAAAAAAAAjs/t2Gkjd8OqK4/s1600/ip1.png" width="320" height="248" border="0" /></a></div>
<div class="separator" style="clear: both;text-align: center"></div>
<p>This artifact also occurs when the shadow map resolution is low. Consider the figure below, where a directional light injects lighting into a scene consisting of a wall and a floor.</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-SKgY0-mICw8/UQaSfPcnP6I/AAAAAAAAAoM/8GJL_MwD6yw/s1600/smLowRes.png"><img alt="" src="http://4.bp.blogspot.com/-SKgY0-mICw8/UQaSfPcnP6I/AAAAAAAAAoM/8GJL_MwD6yw/s1600/smLowRes.png" width="157" height="200" border="0" /></a></div>
<p>As we launch one thread for each shadow map texel to determine which voxel gets light injected, we can only inject light into 3 voxels (the white squares in the figure) in the above case. If the shadow map resolution were high enough, all 5 vertical voxels would be injected with light.</p>
<p><b><span class="Apple-style-span" style="font-size: large">Filtering pass</span></b></p>
<p>In order to perform the cone tracing step, we need to filter the lighting data at the leaf nodes of the octree. The voxel lighting data is filtered anisotropically along the positive and negative XYZ directions. For example, in the 2D case (which is easier to explain and very similar to the 3D case), each node tile has 5&#215;5 voxels, which are filtered to 3&#215;3 voxels in the upper mip level like this:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-aBit6SzNDeM/UQVMeL4G6eI/AAAAAAAAAj0/IdCtwuWiRV0/s1600/filterVoxel25to9.png"><img alt="" src="http://2.bp.blogspot.com/-aBit6SzNDeM/UQVMeL4G6eI/AAAAAAAAAj0/IdCtwuWiRV0/s1600/filterVoxel25to9.png" width="320" height="150" border="0" /></a></div>
<p>To filter the center voxel (texel E in the figure) of the node tile in the upper mip level along the +X direction, we need to consider the 9 texels (texels g, h, i, l, m, n, q, r, s in the figure) in the lower level. The texels are divided into 4 groups, as in the figure below:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-EjBColCh7AI/UQVMlIEbo3I/AAAAAAAAAj8/aegaQqszbuY/s1600/divide3x3into4gp.png"><img alt="" src="http://2.bp.blogspot.com/-EjBColCh7AI/UQVMlIEbo3I/AAAAAAAAAj8/aegaQqszbuY/s1600/divide3x3into4gp.png" width="320" height="159" border="0" /></a></div>
<p>In each group, we filter along the +X direction. For example, in the upper left group, we first alpha blend the values in the +X direction and then average them to get the value for that group.</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-tTuW5WNU9vM/UQVMsbmYOAI/AAAAAAAAAkE/9XljOpicSc0/s1600/filterBlending.png"><img alt="" src="http://3.bp.blogspot.com/-tTuW5WNU9vM/UQVMsbmYOAI/AAAAAAAAAkE/9XljOpicSc0/s1600/filterBlending.png" width="200" height="175" border="0" /></a></div>
<p>After calculating the values for all 4 groups, we can compute the filtered center voxel value by repeating the above steps on the 4 group values.</p>
<p>But for the corner values (e.g. texels a, b, f, g) in a node tile, we cannot access all 9 texels to compute a filtered value, as some of those values are in the neighbor node tile. So the group values are partially computed and stored in the 3D texture first, and then we rely on the transfer step described in the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">cone tracing paper</a> to complete the calculation. In other words, during the transfer step, some of the filtered values are computed by first alpha blending the neighbor group values and then averaging them, while other filtered values are computed by averaging first and then alpha blending. Although these two operations do not commute, this reduces the number of dispatch passes needed to compute the filtered values while giving a similar result.</p>
<p>So we have 6 directional values for each filtered voxel. To get a sample for a particular direction, we can sample it in a way similar to the <a href="http://www.valvesoftware.com/publications/2006/SIGGRAPH06_Course_ShadingInValvesSourceEngine.pdf">ambient cube</a> with 3 texture reads (the following code is simplified):</p>
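<p>The blend-then-average step for one group can be sketched in Python for the 2D case. The sketch assumes each texel is a pair of (premultiplied intensity, alpha), that a group holds two rows of two texels, and that the first texel in each row is the one in front when filtering along +X; these conventions and the function names are assumptions, not taken from the demo.</p>

```python
def blend_over(front, back):
    """Composite 'front' over 'back'; both are (premultiplied value, alpha)."""
    front_value, front_alpha = front
    back_value, back_alpha = back
    value = front_value + (1.0 - front_alpha) * back_value
    alpha = front_alpha + (1.0 - front_alpha) * back_alpha
    return value, alpha


def filter_group_plus_x(group):
    """Alpha blend each row along +X, then average the blended rows."""
    blended = [blend_over(row[0], row[1]) for row in group]
    value = sum(v for v, _ in blended) / len(blended)
    alpha = sum(a for _, a in blended) / len(blended)
    return value, alpha
```

<p>Applying the same function to the 4 group values (arranged again as two rows of two) gives the filtered value of the center voxel for that direction.</p>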
<ol>
<li>// Brick values are indexed as +X, -X, +Y, -Y, +Z, -Z (0 to 5).</li>
<li>float3 sampleAnisotropic(float3 direction)</li>
<li>{</li>
<li>    // Weight each axis by the squared component of the direction.</li>
<li>    float3 nSquared = direction * direction;</li>
<li>    // 1 selects the negative-facing value of an axis, 0 the positive one.</li>
<li>    uint3 isNegative = ( direction &lt; 0.0 );</li>
<li>    float3 filteredColor=</li>
<li>        nSquared.x * anisotropicFilteredBrickValue[isNegative.x] +</li>
<li>        nSquared.y * anisotropicFilteredBrickValue[isNegative.y+2] +</li>
<li>        nSquared.z * anisotropicFilteredBrickValue[isNegative.z+4];</li>
<li>    return filteredColor;</li>
<li>}</li>
</ol>
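<p>For readers who want to check the weighting on the CPU, here is a Python port of the simplified shader function above. It assumes the six directional values are passed in as a list ordered +X, -X, +Y, -Y, +Z, -Z, matching the indices used in the shader code; the Python function name and argument layout are otherwise hypothetical.</p>

```python
def sample_anisotropic(direction, brick_values):
    """Blend three of the six directional values, ambient cube style.

    direction    -- (x, y, z), expected to be normalized
    brick_values -- six RGB tuples ordered +X, -X, +Y, -Y, +Z, -Z
    """
    result = [0.0, 0.0, 0.0]
    for axis in range(3):
        # The squared components of a unit vector sum to 1.
        weight = direction[axis] * direction[axis]
        is_negative = 1 if direction[axis] < 0.0 else 0
        color = brick_values[2 * axis + is_negative]
        for channel in range(3):
            result[channel] += weight * color[channel]
    return tuple(result)
```

<p>Because the weights are the squared components of a unit vector, they sum to 1 and an axis-aligned direction returns exactly one of the six stored values.</p>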
<p><b><span class="Apple-style-span" style="font-size: large">Voxel Cone Tracing pass</span></b></p>
</div>
<div>
<p>After the above steps, we can finally compute our global illumination. We first consider the simple case of ambient occlusion, which requires computing the AO integral. We approximate it by partitioning the integral into several cones:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-IRkYoeGcs_M/UQVQGSuEMrI/AAAAAAAAAko/0K2iOTp5O5s/s1600/integral_ao.png"><img alt="" src="http://1.bp.blogspot.com/-IRkYoeGcs_M/UQVQGSuEMrI/AAAAAAAAAko/0K2iOTp5O5s/s1600/integral_ao.png" width="320" height="149" border="0" /></a></div>
<p>Each partition needs to be multiplied by a weight W. In my implementation, 6 cones with a 60 degree angle are traced over the hemisphere in the following directions (with the Y-axis as the up vector):</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-FWGnacfrQ2o/UQVQNYhEj9I/AAAAAAAAAkw/99C0J8G-1Rk/s1600/coneTraceDir.png"><img alt="" src="http://1.bp.blogspot.com/-FWGnacfrQ2o/UQVQNYhEj9I/AAAAAAAAAkw/99C0J8G-1Rk/s1600/coneTraceDir.png" width="200" height="101" border="0" /></a></div>
<p>each with a weight W:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-L5C8sxl5rYE/UQVQUCzd2KI/AAAAAAAAAk4/cXLoT4UvvGM/s1600/coneTraceWeight.png"><img alt="" src="http://3.bp.blogspot.com/-L5C8sxl5rYE/UQVQUCzd2KI/AAAAAAAAAk4/cXLoT4UvvGM/s1600/coneTraceWeight.png" width="58" height="200" border="0" /></a></div>
<p>Then to calculate the visibility inside one cone, we take multiple samples from the filtered 3D texture bricks along the cone direction:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-HhHImwIQbcU/UQVQt62qCFI/AAAAAAAAAlA/ezXJXNgsJB4/s1600/coneTraceSampleLoc.png"><img alt="" src="http://3.bp.blogspot.com/-HhHImwIQbcU/UQVQt62qCFI/AAAAAAAAAlA/ezXJXNgsJB4/s1600/coneTraceSampleLoc.png" width="200" height="155" border="0" /></a></div>
<p>The remaining problem is to determine the sampling positions. Since we filter the 3D texture bricks manually, the hardware quadrilinear interpolation will not work. To avoid performing the quadrilinear interpolation manually, it is better to place one sampling position at each mip level, where the voxel size equals the cone width at that position. This position is calculated by assuming the voxel is a sphere rather than a cube (which is easier to calculate), as follows:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-4Ch8UtZb0WE/UQVQz-ftlaI/AAAAAAAAAlI/NY9GZw5Ik7I/s1600/coneTraceSampleSphereLoc.png"><img alt="" src="http://4.bp.blogspot.com/-4Ch8UtZb0WE/UQVQz-ftlaI/AAAAAAAAAlI/NY9GZw5Ik7I/s1600/coneTraceSampleSphereLoc.png" width="200" height="155" border="0" /></a></div>
<p>Then the sampling location can be calculated given the cone origin, cone angle, trace direction and voxel radius with some simple geometry. And here is the voxel AO result:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://3.bp.blogspot.com/-M0QCsel-EYk/UQVQ5N27v7I/AAAAAAAAAlQ/FandFHx7I2k/s1600/vctp_ao.png"><img alt="" src="http://3.bp.blogspot.com/-M0QCsel-EYk/UQVQ5N27v7I/AAAAAAAAAlQ/FandFHx7I2k/s1600/vctp_ao.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">AO result using 512x512x512 voxel volume</td>
</tr>
</tbody>
</table>
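<p>The sampling positions discussed above can be approximated with a little trigonometry. The Python sketch below assumes that the sphere standing in for a voxel has a radius of half the voxel size and sits tangent to the inside of the cone, so its center lies at a distance of radius / sin(half angle) from the cone apex. This is one plausible reading of the geometry, not necessarily the exact formula used in the demo, and the function names are hypothetical.</p>

```python
import math


def cone_sample_distances(cone_angle_deg, leaf_voxel_size, num_levels):
    """Distance from the cone apex to one sample per mip level."""
    half_angle = math.radians(cone_angle_deg) / 2.0
    distances = []
    for level in range(num_levels):
        # The voxel size doubles with every mip level.
        radius = 0.5 * leaf_voxel_size * (2 ** level)
        distances.append(radius / math.sin(half_angle))
    return distances


def sample_position(origin, direction, distance):
    """Point at 'distance' along the normalized cone direction."""
    return tuple(o + d * distance for o, d in zip(origin, direction))
```

<p>For a 60 degree cone the half angle is 30 degrees, so each sample lies at a distance of twice the sphere radius, and the distances double from one mip level to the next.</p>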
<p>Next, for diffuse indirect illumination, the calculation is very similar to AO:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-I1Nt67FQwlE/UQVREmeYB6I/AAAAAAAAAlY/TwNxb0k3Pss/s1600/integral_diffuse.png"><img alt="" src="http://1.bp.blogspot.com/-I1Nt67FQwlE/UQVREmeYB6I/AAAAAAAAAlY/TwNxb0k3Pss/s1600/integral_diffuse.png" width="400" height="180" border="0" /></a></div>
<p>The indirect diffuse calculation is done using the same cones as the AO so that both the indirect diffuse and AO can be calculated at the same time:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-a-W0YCegQlE/UQVRKrvszeI/AAAAAAAAAlg/N794saJ2IEY/s1600/vctp_diffuse.png"><img alt="" src="http://2.bp.blogspot.com/-a-W0YCegQlE/UQVRKrvszeI/AAAAAAAAAlg/N794saJ2IEY/s1600/vctp_diffuse.png" width="320" height="248" border="0" /></a></div>
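<p>Gathering AO and indirect diffuse from the same cones can be sketched as a front-to-back accumulation, as in the Python below. It assumes each sample along a cone is a (premultiplied RGB, alpha) pair ordered from near to far, and that the cone weights are normalized to sum to 1; the early-out threshold and all names are assumptions for this sketch.</p>

```python
def trace_cone(samples, opaque_threshold=0.99):
    """Accumulate color and occlusion along one cone, front to back."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for sample_color, sample_alpha in samples:
        # Only the light not yet blocked by nearer samples gets through.
        transmittance = 1.0 - alpha
        for channel in range(3):
            color[channel] += transmittance * sample_color[channel]
        alpha += transmittance * sample_alpha
        if alpha >= opaque_threshold:
            break
    return tuple(color), alpha


def gather_cones(cone_results, weights):
    """Combine per-cone results into indirect diffuse and an AO factor."""
    indirect = [0.0, 0.0, 0.0]
    occlusion = 0.0
    for (color, alpha), weight in zip(cone_results, weights):
        for channel in range(3):
            indirect[channel] += weight * color[channel]
        occlusion += weight * alpha
    # AO is returned as visibility: 1 means fully unoccluded.
    return tuple(indirect), 1.0 - occlusion
```

<p>Since the accumulated alpha of a cone is its occlusion, one pass over the samples yields both quantities, which is why the two effects can share the same cones.</p>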
<p>Finally, we calculate the indirect specular, where a cone is traced in the view direction reflected about the surface normal. The cone angle depends on the glossiness, <i><b>g</b></i>, of the material; currently I use the following equation to calculate the angle:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-wTO1GBD_KoA/UQVRSqalOnI/AAAAAAAAAlo/GMIiPBUjA0o/s1600/specConeAngle.png"><img alt="" src="http://4.bp.blogspot.com/-wTO1GBD_KoA/UQVRSqalOnI/AAAAAAAAAlo/GMIiPBUjA0o/s1600/specConeAngle.png" width="320" height="54" border="0" /></a></div>
<p>The glossiness is clamped to a range so that the cone angle never becomes too narrow, which avoids stepping through thin walls. Here is the result of indirect specular:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-5QDIN_rV0b4/UQVRieAprQI/AAAAAAAAAlw/mpQVJe6-0jA/s1600/vctp_specular.png"><img alt="" src="http://1.bp.blogspot.com/-5QDIN_rV0b4/UQVRieAprQI/AAAAAAAAAlw/mpQVJe6-0jA/s1600/vctp_specular.png" width="320" height="248" border="0" /></a></div>
<p>In my engine, I use a light pre-pass renderer, so for opaque objects I calculate the indirect lighting by rendering a full-screen quad after the lighting pass, which is then blended on top of the direct lighting buffer using the following blend state:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-IFxVbID7h9Q/UQVRo3fvcbI/AAAAAAAAAl4/Ml0nc3U5RfA/s1600/blendStateSameAO.png"><img alt="" src="http://2.bp.blogspot.com/-IFxVbID7h9Q/UQVRo3fvcbI/AAAAAAAAAl4/Ml0nc3U5RfA/s1600/blendStateSameAO.png" width="200" height="32" border="0" /></a></div>
<p>With the AO value stored in the alpha channel, this applies AO to both the direct and indirect lighting. When a different AO intensity is needed for the direct and indirect lighting, this blend state can be used instead:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-R4tJaQQAy9k/UQVRtaYZ12I/AAAAAAAAAmA/5_E46w_rfIM/s1600/blendStateDifferentAO.png"><img alt="" src="http://2.bp.blogspot.com/-R4tJaQQAy9k/UQVRtaYZ12I/AAAAAAAAAmA/5_E46w_rfIM/s1600/blendStateDifferentAO.png" width="200" height="30" border="0" /></a></div>
<p>which applies the alpha blending only to the direct lighting, while the AO for the indirect lighting is applied inside the shader.</p>
<p>When filtering the mip maps directionally from the leaf nodes, the normal direction is not taken into account, which results in more light leaking through thin objects, as shown below. This artifact can be hidden slightly with the AO value calculated during cone tracing:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-opltKhwHevA/UQVRzj3jn-I/AAAAAAAAAmI/skfb8vZDpD4/s1600/lightLeak0.png"><img alt="" src="http://1.bp.blogspot.com/-opltKhwHevA/UQVRzj3jn-I/AAAAAAAAAmI/skfb8vZDpD4/s1600/lightLeak0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">light leak through thin geometry</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://3.bp.blogspot.com/-bVruWs-TAzg/UQVR4cUArWI/AAAAAAAAAmQ/2AmuZwzDhQU/s1600/lightLeak1.png"><img alt="" src="http://3.bp.blogspot.com/-bVruWs-TAzg/UQVR4cUArWI/AAAAAAAAAmQ/2AmuZwzDhQU/s1600/lightLeak1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">applying the AO hides the leaking only a bit</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Dynamic update</span></b></p>
<p>To make the indirect illumination calculation faster, I use 5 optimizations.</p>
<p>First, when the scene is initialized, the first 4 steps described in the overview section are performed for all the static geometry. Then, in every frame, all 5 steps are re-calculated only for the voxels affected by dynamic objects. So only dynamic objects are voxelized every frame; static voxels are already stored in the octree, are identified by a bit flag stored in the node, and are not overwritten. The dynamic voxel fragments are appended to the end of the octree node buffer, so they can be cleared easily. We also need to reset the neighbor offsets of static octree nodes that point to dynamic nodes, and the static nodes affected by dynamic voxels in the previous or current frame need to be filtered again. Those nodes are found by dispatching threads for all the nodes and queuing the affected node indices into another buffer, so only the nodes whose values changed are re-calculated.</p>
<p>Second, when voxelizing the dynamic geometry, the InterlockedMax() function is used to compute both the diffuse and normal values of the octree voxel, which is faster than performing an atomic average. Note that using InterlockedMax() for the normal decreases the lighting quality if the voxel volume is small (e.g. 256x256x256). You can see the artifact in the figure below:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-MsDVhrYUW24/UQVSfesq4MI/AAAAAAAAAmY/tAxrLB6uhGc/s1600/duMax0.png"><img alt="" src="http://4.bp.blogspot.com/-MsDVhrYUW24/UQVSfesq4MI/AAAAAAAAAmY/tAxrLB6uhGc/s1600/duMax0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Reflect light using an average voxel normal</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-pdyte6aypYk/UQVSjiC57dI/AAAAAAAAAmg/iRv5cvNOav8/s1600/duMax1.png"><img alt="" src="http://1.bp.blogspot.com/-pdyte6aypYk/UQVSjiC57dI/AAAAAAAAAmg/iRv5cvNOav8/s1600/duMax1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Reflect light using InterlockedMax() voxel normal.</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>So I only use InterlockedMax() for the normal of the dynamic geometry and keep computing an average normal for the static geometry. That is why the octree voxel data structure (in the octree building pass section) stores a 1-byte counter along with the normal but not with the other attributes.</p>
<p>Third, I perform view-frustum culling when injecting the direct lighting into the octree, because there is no point in filtering light that is far from the camera and is never sampled from the current point of view. An extended frustum is calculated from the current camera for culling (refer to the figure below).</p>
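<p>To see why a max-resolved normal differs from an averaged one, here is a small CPU-side sketch. The 8-bit-per-channel packing is an assumption made for illustration (the actual voxel layout may differ), and all function names are hypothetical. An atomic max on a packed value simply lets one fragment win, while the average blends all fragments.</p>

```python
import math

def pack_normal(n):
    """Pack a normal with components in [-1, 1] into one 24-bit integer (assumed layout)."""
    r, g, b = (int(round((c * 0.5 + 0.5) * 255.0)) for c in n)
    return r * 65536 + g * 256 + b

def unpack_normal(p):
    """Inverse of pack_normal, up to 8-bit quantization."""
    return tuple(((p // s) % 256) / 255.0 * 2.0 - 1.0 for s in (65536, 256, 1))

def resolve_max(normals):
    """Mimics an atomic max on the packed value: the largest packed fragment wins."""
    return unpack_normal(max(pack_normal(n) for n in normals))

def resolve_average(normals):
    """Mimics an atomic average: sum the fragments, then normalize."""
    summed = [sum(n[i] for n in normals) for i in range(3)]
    length = math.sqrt(sum(c * c for c in summed))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in summed)
```

<p>For two fragments facing +X and +Y, the max version returns a normal facing (almost exactly) +X, while the average points halfway between them. The coarser the voxel, the more fragments with different normals share it, which matches the quality loss seen at small volume sizes.</p>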
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-EpQcL8m2LUE/UQVTPk2wMoI/AAAAAAAAAmo/ZB5I02PpSTc/s1600/frustumCulling.png"><img alt="" src="http://4.bp.blogspot.com/-EpQcL8m2LUE/UQVTPk2wMoI/AAAAAAAAAmo/ZB5I02PpSTc/s1600/frustumCulling.png" width="196" height="200" border="0" /></a></div>
<p>This frustum is obtained by simply moving the camera backward a bit and increasing the far plane while keeping the same field of view. The extended distance is calculated from the maximum cone tracing distance (I limited the cone tracing distance in the demo, which loses the ability to sample from far objects like the sky) and the filtered voxel size.</p>
<p>Fourth, I perform the cone tracing pass at half the screen resolution. However, this results in visible artifacts when up-scaling to full resolution, especially at the edges of the geometry (the strength of the indirect lighting in the following screenshots is increased to show the artifacts more clearly).</p>
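<p>A rough sketch of building such an extended frustum is shown below. How the two lengths are combined is not spelled out here, so adding them together, and growing the far plane by twice that amount (once for the backward move, once for the extra reach beyond the original far plane), are assumptions of this sketch. The function and parameter names are hypothetical.</p>

```python
def extended_frustum(camera_pos, camera_forward, far_plane,
                     max_cone_distance, filtered_voxel_size):
    """Return (new_camera_pos, new_far_plane) for light-injection culling.

    Assumptions: camera_forward is unit length, and the extension is the sum
    of the maximum cone tracing distance and the filtered voxel size.
    """
    extend = max_cone_distance + filtered_voxel_size
    # Move the camera backward along its view direction.
    new_pos = tuple(p - extend * f for p, f in zip(camera_pos, camera_forward))
    # Cover the original far plane plus the extension, measured from the new position.
    new_far = far_plane + 2.0 * extend
    return new_pos, new_far
```

<p>The field of view is left unchanged, so the culling test itself can reuse the normal frustum code with the new position and far plane.</p>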
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-Uxm44Xox4sc/UQVTrCqnbjI/AAAAAAAAAmw/tcdlFaA9QUg/s1600/duUpScaleHalfResNone.png"><img alt="" src="http://2.bp.blogspot.com/-Uxm44Xox4sc/UQVTrCqnbjI/AAAAAAAAAmw/tcdlFaA9QUg/s1600/duUpScaleHalfResNone.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">up-sample with only bilinear filtering</td>
</tr>
</tbody>
</table>
<p>So I first decided to fix just those pixels by finding them with an edge detection filter on the depth buffer.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-9gFKRv9kzvw/UQVTzJNxcSI/AAAAAAAAAm4/g6aPih3XPg4/s1600/duUpScaleHalfResEdgeDetection.png"><img alt="" src="http://4.bp.blogspot.com/-9gFKRv9kzvw/UQVTzJNxcSI/AAAAAAAAAm4/g6aPih3XPg4/s1600/duUpScaleHalfResEdgeDetection.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">perform an edge detection pass</td>
</tr>
</tbody>
</table>
<p>And then perform cone tracing at those pixels again at full resolution:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-UWt20sb14Vs/UQVT73f6jRI/AAAAAAAAAnA/6v6v2B6Y0Vw/s1600/duUpScaleHalfResReTrace.png"><img alt="" src="http://4.bp.blogspot.com/-UWt20sb14Vs/UQVT73f6jRI/AAAAAAAAAnA/6v6v2B6Y0Vw/s1600/duUpScaleHalfResReTrace.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">perform the cone tracing again at the edge pixels at full resolution</td>
</tr>
</tbody>
</table>
<p>Some of the artifacts are gone, but the frame rate drops a lot again&#8230;</p>
<p>So my second attempt is to simply blur those pixels (averaging with the neighboring pixels). The quality of blurring is not as good as re-computing the cone tracing, but it is much faster.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-xLV_g8cuF44/UQVUJyJ6h5I/AAAAAAAAAnI/cKMmzn5IU7g/s1600/duUpScaleHalfResBlur.png"><img alt="" src="http://4.bp.blogspot.com/-xLV_g8cuF44/UQVUJyJ6h5I/AAAAAAAAAnI/cKMmzn5IU7g/s1600/duUpScaleHalfResBlur.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">up sample with blurring at edge</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-9gLIX8e_wes/UQVUPYxPPFI/AAAAAAAAAnQ/cQlHN2JRNUk/s1600/duUpScaleFullRes.png"><img alt="" src="http://4.bp.blogspot.com/-9gLIX8e_wes/UQVUPYxPPFI/AAAAAAAAAnQ/cQlHN2JRNUk/s1600/duUpScaleFullRes.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">cone tracing at full resolution for reference</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<div></div>
<div>Lastly, the voxel world can be updated at a different frequency than the render frame rate. Assuming the light sources and dynamic models are not moving very fast, we can take advantage of temporal coherence and perform the voxelization at a lower frequency, say once every 5 frames. This trades voxel accuracy for update speed, and may result in the artifact shown below:</div>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-gmnKqXtcuw4/USuEm-2gxLI/AAAAAAAAApU/Bf9je4_cogk/s1600/duInterval.png"><img alt="" src="http://1.bp.blogspot.com/-gmnKqXtcuw4/USuEm-2gxLI/AAAAAAAAApU/Bf9je4_cogk/s1600/duInterval.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">The voxel model lags behind the triangle<br />
mesh due to the lower update frequency</td>
</tr>
</tbody>
</table>
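<p>Decoupling the voxel update from the render loop can be as simple as a frame counter. The sketch below is only an illustration; the class name and the default interval of 5 frames (taken from the example above) are assumptions.</p>

```python
class VoxelUpdateScheduler:
    """Decides on which render frames the dynamic voxelization should run."""

    def __init__(self, interval=5):
        self.interval = interval  # re-voxelize once every `interval` frames
        self.frame = 0

    def should_update(self):
        """Call once per rendered frame; True means re-voxelize this frame."""
        update = (self.frame % self.interval) == 0
        self.frame += 1
        return update
```

<p>The render loop calls should_update() once per frame and skips the dynamic voxelization and re-filtering when it returns False, while cone tracing keeps using the last voxel data. That stale data is what produces the lag shown in the screenshot.</p>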
<p><b><span class="Apple-style-span" style="font-size: large">Conclusion</span></b></p>
<p>The advantage of voxel cone tracing is that it computes GI in real time with both dynamic lighting and dynamic geometry. The specular indirect lighting also gives a very nice glossy effect. However, it uses a lot of processing power and memory, and the lighting quality is not as good as a baked solution. My implementation can only use 1 directional light to compute single-bounce indirect lighting. There is still room for improvement: fewer cones could be used for tracing, the depth buffer could be used for view-frustum culling, the up-sampling of the cone tracing result could be better, and the voxels could be divided into several regions to handle a larger scene, like Unreal Engine does. But there is never enough time to implement all that stuff&#8230; So I decided to release the <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a> at this stage first. In the demo, I have added a simple interface (for changing things like the light direction and indirect lighting strength) for you to play around with. Hope you all enjoy the <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a>.</p>
<p>Finally, I would like to thank Kevin Gadd and Luke Hutchinson for reviewing this article.</p>
<table>
<tbody>
<tr>
<td>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-3XIunxXIcFI/UQVU6EscbcI/AAAAAAAAAnY/MppCCKxFpec/s1600/demo0.png"><img alt="" src="http://1.bp.blogspot.com/-3XIunxXIcFI/UQVU6EscbcI/AAAAAAAAAnY/MppCCKxFpec/s1600/demo0.png" width="320" height="248" border="0" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-slFJ6rttJco/UQVU-PxzkbI/AAAAAAAAAng/mAelRkhNNfA/s1600/demo1.png"><img alt="" src="http://1.bp.blogspot.com/-slFJ6rttJco/UQVU-PxzkbI/AAAAAAAAAng/mAelRkhNNfA/s1600/demo1.png" width="320" height="248" border="0" /></a></div>
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
</div>
<div><b>References</b></div>
<div><span class="Apple-style-span" style="font-size: x-small">[1] The Technology Behind the “Unreal Engine 4 Elemental demo” <a href="http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf">http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[2] Interactive Indirect Illumination Using Voxel Cone Tracing <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[3] Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[4] GigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes</span><br />
<a href="http://maverick.inria.fr/Publications/2011/Cra11/"><span class="Apple-style-span" style="font-size: x-small">http://maverick.inria.fr/Publications/2011/Cra11/</span></a></div>
<div><span class="Apple-style-span" style="font-size: x-small">[5] GPU Gems 2: Conservative Rasterization <a href="http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html">http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[6] Shading in Valve’s Source Engine <a href="http://www.valvesoftware.com/publications/2006/SIGGRAPH06_Course_ShadingInValvesSourceEngine.pdf">http://www.valvesoftware.com/publications/2006/SIGGRAPH06_Course_ShadingInValvesSourceEngine.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[7] Perpendicular Possibilities <a href="http://blog.selfshadow.com/2011/10/17/perp-vectors/">http://blog.selfshadow.com/2011/10/17/perp-vectors/</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[8] A couple of notes about Z <a href="http://www.humus.name/index.php?ID=255">http://www.humus.name/index.php?ID=255</a></span></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Angle based SSAO</title>
		<link>http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/</link>
		<comments>http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/#comments</comments>
		<pubDate>Fri, 12 Oct 2012 02:41:30 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[SSAO]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28316</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p><a href="http://en.wikipedia.org/wiki/Screen_space_ambient_occlusion">SSAO (Screen space ambient occlusion)</a> is a common post-processing effect that approximates how much light arriving at a given surface is occluded by the surrounding objects. At this year&#8217;s SIGGRAPH, there were a few slides in <a href="http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx">&#8220;The Technology behind the Unreal Engine 4 Elemental Demo&#8221;</a> about how they implement SSAO. Their technique can use either the depth buffer alone or the depth buffer together with the per-pixel normal. I tried to implement both versions with a slight modification:</p>
<p><a href="http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/" class="more-link">Read more on Angle based SSAO&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p><a href="http://en.wikipedia.org/wiki/Screen_space_ambient_occlusion">SSAO (Screen space ambient occlusion)</a> is a common post-processing effect that approximates how much light arriving at a given surface is occluded by the surrounding objects. At this year&#8217;s SIGGRAPH, there were a few slides in <a href="http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx">&#8220;The Technology behind the Unreal Engine 4 Elemental Demo&#8221;</a> about how they implement SSAO. Their technique can use either the depth buffer alone or the depth buffer together with the per-pixel normal. I tried to implement both versions with a slight modification:</p>
<table>
<tbody>
<tr>
<td>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s1600/finalSSAO.png"><img src="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s320/finalSSAO.png" alt="" width="320" height="248" border="0" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-x-P_nUAliGw/UGsSBcsZ21I/AAAAAAAAAa8/SmW8SF16170/s1600/final.png"><img src="http://2.bp.blogspot.com/-x-P_nUAliGw/UGsSBcsZ21I/AAAAAAAAAa8/SmW8SF16170/s320/final.png" alt="" width="320" height="248" border="0" /></a></div>
</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Using only the depth buffer</span></strong></p>
<p><a href="http://en.wikipedia.org/wiki/Ambient_occlusion">Ambient occlusion</a> is defined as the visibility integral over the hemisphere above a given surface point:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-NKUjkkDPW7I/UGuZuxHDEvI/AAAAAAAAAck/uGrCy3QKh8g/s1600/aoIntegral.png"><img src="http://2.bp.blogspot.com/-NKUjkkDPW7I/UGuZuxHDEvI/AAAAAAAAAck/uGrCy3QKh8g/s320/aoIntegral.png" alt="" width="320" height="66" border="0" /></a></div>
<p>To approximate this in screen space, we design our sampling pattern as paired samples:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-lzDuL7ywx5Y/UGuZ0YZL6rI/AAAAAAAAAcs/Q1JVSBOJMKA/s1600/pattern.png"><img src="http://1.bp.blogspot.com/-lzDuL7ywx5Y/UGuZ0YZL6rI/AAAAAAAAAcs/Q1JVSBOJMKA/s200/pattern.png" alt="" width="200" height="156" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">paired sample pattern</td>
</tr>
</tbody>
</table>
<p>So for each pair of samples, we can approximate how much the shading point is occluded in 2D instead of integrating over the hemisphere:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-Me10fR56gHs/UGuZ5pLGQoI/AAAAAAAAAc0/hZhdPr1GIpU/s1600/ao2D.png"><img src="http://2.bp.blogspot.com/-Me10fR56gHs/UGuZ5pLGQoI/AAAAAAAAAc0/hZhdPr1GIpU/s200/ao2D.png" alt="" width="200" height="144" border="0" /></a></div>
<p>The AO term for each given pair of samples will be min( (θ<span class="Apple-style-span" style="font-size: xx-small">left</span> + θ<span class="Apple-style-span" style="font-size: xx-small">right</span>)/π, 1). Then by averaging the AO terms of all the sample pairs (in my case, there are 6 pairs), we achieve the following result:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-L2X8QQuRm-k/UGsSq_xSkwI/AAAAAAAAAbE/79HVZD726Ro/s1600/ssaoDepthOnlyNoDisAtten.png"><img src="http://1.bp.blogspot.com/-L2X8QQuRm-k/UGsSq_xSkwI/AAAAAAAAAbE/79HVZD726Ro/s320/ssaoDepthOnlyNoDisAtten.png" alt="" width="320" height="248" border="0" /></a></div>
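<p>The per-pair term and the averaging step can be written down directly from the formula above. This is a minimal sketch with hypothetical function names. It assumes the two angles (in radians) have already been measured for each sample pair, as in the 2D figure.</p>

```python
import math

def pair_ao(theta_left, theta_right):
    """AO term of one sample pair: min((theta_left + theta_right) / pi, 1)."""
    return min((theta_left + theta_right) / math.pi, 1.0)

def average_ao(pairs):
    """Average the AO terms over all sample pairs (6 pairs in the article)."""
    return sum(pair_ao(left, right) for left, right in pairs) / len(pairs)
```

<p>For example, a pair with both angles equal to π/4 gives a term of 0.5, and any pair whose angles sum to π or more is clamped to 1.</p>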
<p><strong><span class="Apple-style-span" style="font-size: large">Dealing with large depth differences</span></strong></p>
<p>As seen in the above screenshot, there are dark halos around the knight. But the knight should not contribute AO to the castle, as he is too far away. To deal with large depth differences, I adopt the approach used in <a href="http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf">Toy Story 3</a>. If one of the paired samples is too far away from the shading point, say the red point in the following figure, it is replaced by the pink point, which is on the same plane as the other, valid sample of the pair:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-850b-rBaKp0/UGuaMkQJodI/AAAAAAAAAc8/M0oUukbYqcE/s1600/largeDepth.png"><img src="http://1.bp.blogspot.com/-850b-rBaKp0/UGuaMkQJodI/AAAAAAAAAc8/M0oUukbYqcE/s200/largeDepth.png" alt="" width="200" height="180" border="0" /></a></div>
<p>So we can interpolate between the red point and the pink point to deal with the large depth difference. Now the dark halo is gone:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-9O5WWj6WrsI/UGsS46doFPI/AAAAAAAAAbM/Wv2lgq9TC6M/s1600/ssaoDepthOnlyDisAttenNoWeight.png"><img src="http://3.bp.blogspot.com/-9O5WWj6WrsI/UGsS46doFPI/AAAAAAAAAbM/Wv2lgq9TC6M/s320/ssaoDepthOnlyDisAttenNoWeight.png" alt="" width="320" height="248" border="0" /></a></div>
<p>The above treatment only handles the case where one of the paired samples is far away from the shading point. What if both samples have large depth differences?</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-BQPnvxeS5M4/UHd-KFLHVtI/AAAAAAAAAd8/ECofPtuQldo/s1600/ssaoDepthOnlyDisAttenNoWeightArtifact.png"><img src="http://3.bp.blogspot.com/-BQPnvxeS5M4/UHd-KFLHVtI/AAAAAAAAAd8/ECofPtuQldo/s1600/ssaoDepthOnlyDisAttenNoWeightArtifact.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">dark halo artifact is shown around the sword</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-TMdmEecY7MM/UHd-RhtjJmI/AAAAAAAAAeE/G6r2Byc7ObU/s1600/ssaoDepthOnlyDisAttenNoWeightArtifactCombine.png"><img src="http://2.bp.blogspot.com/-TMdmEecY7MM/UHd-RhtjJmI/AAAAAAAAAeE/G6r2Byc7ObU/s1600/ssaoDepthOnlyDisAttenNoWeightArtifactCombine.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">AO strength of this picture is increased to highlight the artifact</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>In this case, it results in the dark halo around the sword in the above screenshot. Remember that we average all the paired samples to compute the final AO value. So to deal with this artifact, we assign a weight to each sample pair and then re-normalize the final result. For each sample pair, if both samples are within a small depth difference, the pair has a weight of 1. If only 1 sample is far away, the pair has a weight of 0.5. And if both samples are far away, the weight is 0. This eliminates most (but not all) of the artifacts:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-ReDzjqkwwJc/UHd-UkD-qGI/AAAAAAAAAeM/s0uqq8e7gU4/s1600/ssaoDepthOnlyDisAttenWithWeight.png"><img src="http://4.bp.blogspot.com/-ReDzjqkwwJc/UHd-UkD-qGI/AAAAAAAAAeM/s0uqq8e7gU4/s1600/ssaoDepthOnlyDisAttenWithWeight.png" alt="" width="320" height="248" border="0" /></a></div>
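<p>The weighting and re-normalization described above can be sketched like this. The function names are hypothetical, and the fallback value returned when every pair has zero weight is an assumption, since that case is not covered above.</p>

```python
def pair_weight(left_is_far, right_is_far):
    """1 if both samples are near, 0.5 if one is far, 0 if both are far."""
    return 1.0 - 0.5 * (int(left_is_far) + int(right_is_far))

def weighted_ao(samples, fallback=1.0):
    """Weighted, re-normalized average of the per-pair AO terms.

    samples: list of (ao_term, left_is_far, right_is_far) tuples.
    fallback: value used when all pairs are rejected (an assumption).
    """
    total = 0.0
    total_weight = 0.0
    for ao_term, left_is_far, right_is_far in samples:
        weight = pair_weight(left_is_far, right_is_far)
        total += weight * ao_term
        total_weight += weight
    if total_weight == 0.0:
        return fallback
    return total / total_weight
```

<p>Dividing by the summed weights rather than by the pair count is what keeps rejected pairs from darkening the result.</p>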
<p><strong><span class="Apple-style-span" style="font-size: large">Approximating arc-cos function</span></strong></p>
<p>In this approach, the AO is calculated from the angle between the paired samples, which requires evaluating the arc-cos function, which is a bit expensive. We can approximate acos(x) with a linear function: π(1-x)/2.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-z9JX5f6oJqc/UGsT1nYSp6I/AAAAAAAAAbs/F7YCjq9Sc_Q/s1600/acos_graphLinear.png"><img src="http://1.bp.blogspot.com/-z9JX5f6oJqc/UGsT1nYSp6I/AAAAAAAAAbs/F7YCjq9Sc_Q/s320/acos_graphLinear.png" alt="" width="320" height="178" border="0" /></a></div>
<p>And the resulting AO looks much darker with this approximation:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s1600/acos.png"><img src="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s320/acos.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center"><span class="Apple-style-span">computed </span>with<span class="Apple-style-span"> the arc-cos function</span></td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-iqa85VVFz34/UGsUPuYwwhI/AAAAAAAAAb8/K1A3NIlQRMQ/s1600/acos_linearApprox.png"><img src="http://3.bp.blogspot.com/-iqa85VVFz34/UGsUPuYwwhI/AAAAAAAAAb8/K1A3NIlQRMQ/s320/acos_linearApprox.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center"><span class="Apple-style-span">computed </span>with<span class="Apple-style-span"> the linear approximation</span></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Note that the maximum error between the two functions is around 18.946 degrees.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-yUoq9kw8lUw/UGxS9nYRZzI/AAAAAAAAAdc/H_yeOXzQOxc/s1600/error_linearProve.png"><img src="http://2.bp.blogspot.com/-yUoq9kw8lUw/UGxS9nYRZzI/AAAAAAAAAdc/H_yeOXzQOxc/s200/error_linearProve.png" alt="" width="200" height="196" border="0" /></a></div>
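<p>The error figure for the linear approximation can be checked numerically with a brute-force scan over [-1, 1]. This is only a verification sketch; the function names and the number of steps are arbitrary choices.</p>

```python
import math

def acos_linear(x):
    """Linear approximation of acos(x): pi * (1 - x) / 2."""
    return math.pi * (1.0 - x) / 2.0

def max_error_degrees(approx, steps=200000):
    """Largest absolute difference to math.acos over [-1, 1], in degrees."""
    worst = 0.0
    for i in range(steps + 1):
        x = -1.0 + 2.0 * i / steps
        worst = max(worst, abs(math.acos(x) - approx(x)))
    return math.degrees(worst)
```

<p>Calling max_error_degrees(acos_linear) gives about 18.946 degrees, reached near x = ±0.771, where the slope of acos(x) equals the slope of the line.</p>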
<p>This may affect the AO in areas of a curved surface with low tessellation. You may need to either increase the bias angle threshold or switch to a more accurate function. So my second attempt is to approximate it with a quadratic function: π(1 - sign(x) * x * x)/2.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-dTQBBBeHSWQ/UGsV5G2y7ZI/AAAAAAAAAcM/L0MREzz0z6I/s1600/acos_graphQuadratic.png"><img src="http://2.bp.blogspot.com/-dTQBBBeHSWQ/UGsV5G2y7ZI/AAAAAAAAAcM/L0MREzz0z6I/s400/acos_graphQuadratic.png" alt="" width="400" height="197" border="0" /></a></div>
<p>And this approximation gives a result much closer to the one using the arc-cos function.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s1600/acos.png"><img src="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s320/acos.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">computed with the arc-cos function</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-aNjP2IxIZ0U/UGsUV_iiqgI/AAAAAAAAAcE/lZhdcZtB16o/s1600/acos_quadraticApprox.png"><img src="http://2.bp.blogspot.com/-aNjP2IxIZ0U/UGsUV_iiqgI/AAAAAAAAAcE/lZhdcZtB16o/s320/acos_quadraticApprox.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center"><span class="Apple-style-span">computed </span>with<span class="Apple-style-span"> the quadratic approximation</span></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>The maximum error of this function is around 9.473 degrees.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-KfygFiMxqno/UGxTO5004AI/AAAAAAAAAdk/9hsnSWl_Ddk/s1600/error_quadraticProve.png"><img src="http://2.bp.blogspot.com/-KfygFiMxqno/UGxTO5004AI/AAAAAAAAAdk/9hsnSWl_Ddk/s320/error_quadraticProve.png" alt="" width="320" height="265" border="0" /></a></div>
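<p>As a quick sanity check on that figure, here is a minimal Python sketch (not part of the original shader; the function names are mine) that samples the arc-cos function and the quadratic approximation over [-1, 1] and reports the largest difference in degrees:</p>
<pre><code>import math

def acos_quadratic(x):
    """Quadratic approximation of acos(x): pi * (1 - sign(x) * x * x) / 2."""
    return math.pi * (1.0 - math.copysign(x * x, x)) / 2.0

def max_error_degrees(approx, steps=200001):
    """Largest absolute error of approx against math.acos on an even grid over [-1, 1]."""
    worst = 0.0
    for i in range(steps):
        x = -1.0 + 2.0 * i / (steps - 1)
        error = abs(math.degrees(math.acos(x) - approx(x)))
        worst = max(worst, error)
    return worst

quadratic_error = max_error_degrees(acos_quadratic)
</code></pre>
<p>With this sampling density the result should land at about 9.47 degrees, in line with the value quoted above. The same helper can be pointed at any other candidate approximation to compare errors before committing to it in a shader.</p>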
<p><strong><span class="Apple-style-span" style="font-size: large">Using per-pixel normal</span></strong></p>
<p>We can enhance the details of the AO by making use of the per-pixel normal. The per-pixel normal further restricts the angle used to compute the AO: the angles θ<span class="Apple-style-span" style="font-size: xx-small">left</span> and θ<span class="Apple-style-span" style="font-size: xx-small">right</span> are clamped to the tangent plane:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-13mf9GAdRuU/UGuaT96AoxI/AAAAAAAAAdE/wRhp4A-mMUo/s1600/restrictByNormal.png"><img src="http://4.bp.blogspot.com/-13mf9GAdRuU/UGuaT96AoxI/AAAAAAAAAdE/wRhp4A-mMUo/s200/restrictByNormal.png" alt="" width="200" height="147" border="0" /></a></div>
<p>And here is the final result:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s1600/finalSSAO.png"><img src="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s320/finalSSAO.png" alt="" width="320" height="248" border="0" /></a></div>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong></p>
<p>The result of this AO is pleasant, taking a total of 12 samples per pixel with 16 rotations in a 4&#215;4 pixel block at half resolution. I did not apply a bilateral blur to the AO result, but applying the blur may give a softer AO look. Also, although approximating the arc-cos function with a linear function is not accurate, it gives a good enough result for me. Finally, more time needs to be spent on generating the sampling pattern in the future, as the pattern I currently use is nearly uniformly distributed (with some jittering).</p>
<p><strong>References</strong><br />
[1] The Technology behind the Unreal Engine 4 Elemental Demo <a href="http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx">http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx</a><br />
[2] Rendering techniques in Toy Story 3 <a href="http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf">http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf</a><br />
[3] Image-Space Horizon-Based Ambient Occlusion <a href="http://www.nvidia.com/object/siggraph-2008-HBAO.html">http://www.nvidia.com/object/siggraph-2008-HBAO.html</a><br />
[4] <a href="http://www.wolframalpha.com/">http://www.wolframalpha.com/</a><br />
[5] The models are exported from UDK and extracted from Infinity Blade using umodel.exe</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building an HTML5 Game? Don’t Shrug Off Atlases.</title>
		<link>http://www.altdevblogaday.com/2012/09/17/building-an-html5-game-dont-shrug-off-atlases/</link>
		<comments>http://www.altdevblogaday.com/2012/09/17/building-an-html5-game-dont-shrug-off-atlases/#comments</comments>
		<pubDate>Mon, 17 Sep 2012 14:16:19 +0000</pubDate>
		<dc:creator>Colt McAnlis</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27879</guid>
		<description><![CDATA[<p>HTML5 is an amazing technology for designing web sites. The general flexibility of HTML5 markup and JavaScript often leads web developers to create their content using individual image elements. This approach works well for small sites with low overhead, but for games or other high-load websites, using droves of single image elements leads to long load times and slow performance, resulting in a poor end-user experience.  In an ecosystem where <a href="http://newmediaandmarketing.com/a-1-second-delay-in-webpage-load-time-could-cost-you-2-5-million/website-basics/">3 seconds</a> may cause you to lose half your users, it’s important to use the proper tool to address this issue: texture atlasing.</p>
<p><a href="http://www.altdevblogaday.com/2012/09/17/building-an-html5-game-dont-shrug-off-atlases/" class="more-link">Read more on Building an HTML5 Game? Don’t Shrug Off Atlases&#8230;.</a></p>
]]></description>
				<content:encoded><![CDATA[<p>HTML5 is an amazing technology for designing web sites. The general flexibility of HTML5 markup and JavaScript often leads web developers to create their content using individual image elements. This approach works well for small sites with low overhead, but for games or other high-load websites, using droves of single image elements leads to long load times and slow performance, resulting in a poor end-user experience.  In an ecosystem where <a href="http://newmediaandmarketing.com/a-1-second-delay-in-webpage-load-time-could-cost-you-2-5-million/website-basics/">3 seconds</a> may cause you to lose half your users, it’s important to use the proper tool to address this issue: texture atlasing.</p>
<p style="text-align: center"><img class="aligncenter" src="https://lh4.googleusercontent.com/XOjzZEE1NPCrZ35rrfQdRvKoZpPC1vYPVFOH9ERiUAOAys_ccwIeNYs1C3HUaxA3ofPdjmKoeKtXcPHECSu0PozWJnOecwgHFIQ-FVBARlzL7PyuGhE" alt="" width="512" height="512" /></p>
<p style="text-align: center" dir="ltr"><em>A texture atlas from the HTML5 game <a href="http://gritsgame.appspot.com/">GRITS</a>. This single texture holds all the smaller images from the game that contain alpha values. Loading these images in a single texture reduces load time and improves runtime performance.</em></p>
<h1><strong>What is texture atlasing?</strong></h1>
<p>In real-time graphics, a texture atlas is a single large image that contains many smaller sub-images, each of which is referenced independently. Texture atlases sprang up with the advent of 3D games, and have been around as long as we’ve had such games (for instance, <a href="http://fd.fabiensanglard.net/quake2/openGL/quake2-Context1-Texture1025level0.png">here’s</a> one of the original <a href="http://en.wikipedia.org/wiki/Lightmap">lightmaps</a> used in <a href="http://www.idsoftware.com/games/quake/quake2/">Quake 2</a>). Atlasing was originally used to accelerate the performance of 3D games, where there is significant overhead when swapping out or referencing new textures during the rasterization stages of a 3D pipeline. By combining all the smaller textures into one larger one, the graphics pipeline incurs less overhead from swapping, resulting in better performance.</p>
<p>A side note on terminology: Many of you fancy HTML5 folks out there might say that a texture atlas is the same thing as a sprite sheet, but I don’t believe that’s correct. A sprite sheet is an atlas that contains only sprites. In contrast, an atlas can contain many charts; each chart is a large image that holds a single type of graphic resource – e.g., sprites (for animation), UI textures (which are not animated), <a href="http://en.wikipedia.org/wiki/Glyph">glyphs</a> (for font-processing), and so on. An atlas is a <a href="http://en.wikipedia.org/wiki/Texture_atlas">technical concept</a>, while a sprite sheet has a specific <a href="http://blogs.msdn.com/b/davrous/archive/2012/03/16/html5-gaming-animating-sprites-in-canvas-with-easeljs.aspx">functional notion</a>.</p>
<p>&nbsp;</p>
<h1><strong>The benefits of using atlases in HTML5 games</strong></h1>
<p>Using a texture atlas in your HTML5 game can help your game load faster and run faster, and can also reduce your bandwidth cost.</p>
<h3><strong>Faster HTTP load times</strong></h3>
<p style="text-align: left">As I explained in my <a href="http://www.youtube.com/watch?v=Prkyd5n0P7k&amp;utm_source=altdevblog">Google I/O talk</a>, a 4k x 4k texture fetched from a server can take around 241 ms to download, which is pretty fast. If you were to chop up that single download into 4096 separate requests, at 64&#215;64 pixels each, the total load time changes drastically for the same number of pixels: it would increase from 241 ms to 4.3 seconds, an increase by a factor of roughly 18. The timing charts below illustrate this difference graphically:</p>
<p style="text-align: center"><img class="aligncenter" src="https://lh6.googleusercontent.com/MT8j1tO9OqEglm_qMhNKC4JICJID0AHkrfQh5HRasvPX0BV5TXNSnnOxIUB8nmKgJDE-8JNWG8gY5DsyuvDdrfTmjj94ePEcgv5hm1T8oqCIaUJNjew" alt="" width="687" height="85" /><br />
<em>Fetching a 4096&#215;4096 texture from a server requires a single HTTP request, and the request resolves quickly.</em></p>
<p style="text-align: center"><img class="aligncenter" src="https://lh6.googleusercontent.com/ABumX5UQ3iNHGlE94WVOyyVx9r188ORTBnrx4wNL5jWoAHaHN4DO8y_wSdZmPCnmMCCOppbhTT18eN4yj79et2HVKfA9ZLDz5HJPUzJB1LgXQ5b49Dg" alt="" width="683" height="436" /><br />
<em>Fetching assets individually with multiple HTTP requests takes a long time. The long, lighter-shaded left-hand portion of each duration bar represents time during which the browser blocked the connection because too many requests were being issued.</em></p>
<p>It’s important to understand that there exists an upper limit on the number of requests that a browser can make to a single server. The browser itself sets this upper limit, and when the limit is reached, the browser blocks subsequent requests until an open connection becomes available. This is the primary reason for the performance difference we see with individual assets versus atlased assets. If you have 4,000 pending HTTP requests, and only six connections are available, all the requests get stacked. In the figure above, the lighter-shaded portion of each duration bar is where Chrome blocked, waiting for an active connection to come along.</p>
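<p>To make the connection limit concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names are mine, and the assumption that every request occupies one of a fixed pool of connections for one "round" ignores everything real browsers do to overlap work.</p>
<pre><code>import math

def tile_count(atlas_size, tile_size):
    """Number of square tiles of tile_size pixels that cover a square atlas."""
    per_side = atlas_size // tile_size
    return per_side * per_side

def request_rounds(requests, connections):
    """Rounds needed when only `connections` requests can be in flight at once."""
    return math.ceil(requests / connections)

tiles = tile_count(4096, 64)        # a 4096 x 4096 atlas cut into 64 x 64 tiles
rounds = request_rounds(tiles, 6)   # six connections, as in the example above
</code></pre>
<p>Under these assumptions the chopped-up texture needs 4096 requests served in 683 rounds, versus a single round for the atlas. Even with a tiny per-request latency, that queueing alone adds up quickly.</p>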
<p>In addition to reducing overall load time, atlasing also helps reduce the number of HTTP requests from your app. This is a hyper-critical issue that the developers of the <a href="http://www.adityaravishankar.com/2011/11/command-and-conquer-programming-an-rts-game-in-html5-and-javascript/">HTML5 version</a> of <a href="http://www.commandandconquer.com/">Command &amp; Conquer</a> found out the <a href="http://www.adityaravishankar.com/2012/01/html5-game-development-using-sprite-sheets-for-better-performance/">hard way</a>: During development their app went viral, and their hosting service suspended their account until their request load dropped.</p>
<p>&nbsp;</p>
<h3><strong>Reduced browser runtime overhead</strong></h3>
<p>Using individual texture assets can also have a very large impact on your game’s runtime performance.<br />
Web browsers have two main components: a JavaScript VM and a DOM/Layout engine (usually <a href="http://www.webkit.org/">WebKit</a>).  To optimize reload times, WebKit keeps an <a href="http://en.wikipedia.org/wiki/Cache_(computing)">in-memory cache</a> of resources that are loaded from the network, so that the resources do not have to be re-downloaded (or reloaded from the disk cache) when they are requested in the future. For example, when a <a href="http://en.wikipedia.org/wiki/Document_Object_Model">DOM element</a> such as an image is deleted, the <a href="http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)">garbage collection</a> process in WebKit can optionally keep the element in memory even though the JavaScript engine no longer needs it. To use the cache properly, WebKit can detect when the system is resource-constrained (i.e., when the amount of available RAM is low), and instantiate a cache-eviction process to remove unneeded resources in an effort to improve performance. For applications with a small number of DOM elements, the cache-eviction process is generally not an issue, but it can become a problem as the number of cacheable objects increases and the garbage collection algorithm spends more time looking for dead objects to reclaim. For 2D, image-based games this can be a considerable problem, e.g.,  when 2,000+ animated sprites and background textures are loaded and referenced individually in a browser. For these apps, images are generally the biggest offenders, and atlasing can really help:  By combining the individual images into larger atlases, the number of unique cacheable resources decreases, allowing WebKit to spend less time in its cache-eviction process in resource-constrained environments.</p>
<p>(To be clear, the WebKit in-memory caching behavior described here has nothing to do with the <a href="https://developers.google.com/v8/design">JavaScript garbage collection algorithm</a> and the problems you can get from incorrectly managing JavaScript objects.)</p>
<p>&nbsp;</p>
<h3><strong>Reduced GPU overhead</strong></h3>
<p>In Chrome, the 2D canvas element on a page has access to hardware acceleration if it’s available. That means all of your draws, images, and transforms are handled by the GPU, which significantly improves performance. The catch, however, is that there are a few abstraction layers between your API calls and what the hardware is doing.</p>
<p>It’s been known for quite some time that <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/bb172234(v=vs.85).aspx">reducing state changes</a> in your graphics application <a href="http://developer.apple.com/library/ios/#documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/OpenGLESApplicationDesign/OpenGLESApplicationDesign.html">yields higher performance</a> (see for example Chapter 3 in this old <a href="http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf">NVIDIA GPU Programming Guide</a> (pdf)). Working with a hardware-accelerated canvas is no different. Each texture must be bound to the GPU before the primitive quad can be drawn, and with a limited number of texture slots for a given GPU, there’s quite a bit of time spent swapping textures in and out of the proper sampling units. Atlasing reduces this overhead by allowing the graphics driver to bind a single texture (or a smaller number of textures) to the GPU, eliminating the extra overhead of the swap.</p>
<p>&nbsp;</p>
<h3><strong>Reduced memory footprint</strong></h3>
<p>With the exception of dumping the pixel data directly (i.e., a <a href="http://en.wikipedia.org/wiki/Raw_image_format">.RAW file</a>), most file formats used for game development come with additional header data that needs to be transferred with the file. This data is there regardless of the dimensions of the file or the type of data inside it. For instance, a <a href="http://en.wikipedia.org/wiki/DirectDraw_Surface">DDS file</a>, typically used for <a href="http://en.wikipedia.org/wiki/DirectX">DirectX</a> implementations of compressed textures, has a general overhead of 128 bytes for the header, which may not seem huge when compared to the data within it. However, consider that 2000 loose texture files (not uncommon for a AAA game) would incur an extra overhead of 250 KB that needs to be transferred across the wire for each user loading your game. Packing these textures into a single atlas removes the redundant overhead of these header bytes, reducing the overall size of your program.</p>
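<p>The arithmetic behind that figure is easy to reproduce. This is a small Python sketch using only the numbers from the paragraph above; the function name is mine.</p>
<pre><code>def header_overhead_bytes(file_count, header_bytes):
    """Total bytes spent on per-file headers when assets ship as loose files."""
    return file_count * header_bytes

loose = header_overhead_bytes(2000, 128)   # 2000 loose DDS files
atlas = header_overhead_bytes(1, 128)      # one atlas, one header
loose_kib = loose / 1024
saved_bytes = loose - atlas
</code></pre>
<p>Here <code>loose</code> comes to 256,000 bytes, which is exactly 250 KiB, and a single atlas saves all but one of those 128-byte headers.</p>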
<p>&nbsp;</p>
<h1><strong>Packing charts into an atlas</strong></h1>
<p style="text-align: center"><img class="aligncenter" src="https://lh3.googleusercontent.com/bAgzyoZhRq2cTZtEkLybZYyburnhhcau0PV8s9ZbQUh4khJDXJhMZ4oldMs1XRZ7Xd9WOrImyfdZ8ty3v6pAOifw0CUDsBE86jQ2yy66tP8Gr0cbj4s" alt="" width="651" height="362" /></p>
<p>Creating a texture atlas is a tricky engineering task. Texture packing is a type of <a href="http://en.wikipedia.org/wiki/Bin_packing_problem">bin packing</a> problem, which has been proven to be <a href="http://en.wikipedia.org/wiki/NP-hard">NP-hard</a>. The problem is so challenging that I frequently use a variant of this algorithm as an interview question to evaluate the skills of potential hires.</p>
<p><a href="http://www.codeandweb.com/texturepacker">TexturePacker</a> is a great off-the-shelf tool that fits into most content pipelines quickly; it will generate the atlas data given a list of textures, alongside a data file that maps the individual source images to their final locations in the atlas.</p>
<p>If you’re going to roll your own texture packer, I suggest you begin with some research. Here are some great places to start:</p>
<ul>
<li><a href="http://clb.demon.fi/files/RectangleBinPack.pdf">A Thousand Ways to Pack the Bin &#8211; A Practical Approach to Two-Dimensional Rectangle Bin Packing (pdf)</a>;  comes with full <a href="https://github.com/juj/RectangleBinPack">source code</a></li>
<li><a href="http://www.blackpawn.com/texts/lightmaps/">Blackpawn.com</a> posted one of the first open bin-packing algorithms I remember seeing; there’s now a <a href="http://incise.org/2d-bin-packing-with-javascript-and-canvas.html">great JavaScript example</a> that runs in real time, so you can see the algorithm in motion</li>
</ul>
<p>If you’re looking to do highly complex packing (e.g., font glyph packing), most font rasterization libraries come with very aggressive atlas packers. <a href="http://code.google.com/p/freetype-gl/">Freetype-gl</a> has one of the best ones I’ve seen; it uses the <a href="http://clb.demon.fi/projects/rectangle-bin-packing">Rectangle Bin Pack</a> algorithm.</p>
<p>Once your data has been packed into charts, you’ll also need to output a mapping file that lists where each leaf-image (individual image) is in the atlas.</p>
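<p>A mapping file can be as simple as a JSON dictionary keyed by image name. The Python sketch below assumes, for simplicity, that every sub-image has the same square size, so a plain grid stands in for a real bin packer. The function name, the image names and the JSON field names are all illustrative and are not taken from TexturePacker or any other tool.</p>
<pre><code>import json

def grid_atlas_mapping(names, tile_size, atlas_size):
    """Map each image name to its pixel rectangle in a uniform-grid atlas."""
    columns = atlas_size // tile_size
    overflow = max(0, len(names) - columns * columns)
    if overflow:
        raise ValueError(f"{overflow} images do not fit in the atlas")
    mapping = {}
    for index, name in enumerate(names):
        row, col = divmod(index, columns)  # fill left to right, top to bottom
        mapping[name] = {"x": col * tile_size, "y": row * tile_size,
                         "w": tile_size, "h": tile_size}
    return mapping

# Hypothetical image names; the JSON text is what a game would load at startup.
mapping = grid_atlas_mapping(["idle", "walk", "jump"], 64, 128)
mapping_json = json.dumps(mapping)
</code></pre>
<p>With a 128-pixel atlas and 64-pixel tiles there are two columns, so the third image wraps onto the second row. A real packer would emit the same kind of name-to-rectangle table, just with varying sizes.</p>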
<h1><strong>Using atlases in your game</strong></h1>
<p>Use of atlases in HTML5 games varies by game type:</p>
<ul>
<li>A pure DOM game can use a <a href="http://css-tricks.com/css-sprites/">CSS sprite sheet</a>, and set CSS properties on a DOM image after loading the atlas data file.</li>
<li>A canvas game can use the 2D context's <a href="http://www.w3schools.com/html5/canvas_drawimage.asp">drawImage</a> API, specifying a subsection of an image to be drawn to the canvas location.</li>
<li>A <a href="http://en.wikipedia.org/wiki/WebGL">WebGL</a> game can <a href="http://learningwebgl.com/blog/?p=507">modify the UV-Coordinates</a> of a vertex to reference a subsection of a larger image using normalized floating point values.</li>
</ul>
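<p>All three approaches start from the same pixel rectangle stored in the mapping file. For canvas, that rectangle is what you would pass as the source-rectangle arguments of the nine-argument form of drawImage. For WebGL it has to be normalized first, which the Python sketch below illustrates. The function name is mine, and whether v runs top-down or bottom-up depends on how your texture was uploaded, so treat the orientation as an assumption.</p>
<pre><code>def uv_rect(x, y, w, h, atlas_w, atlas_h):
    """Convert a pixel rectangle in the atlas to normalized (u0, v0, u1, v1)."""
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)

# Second 64 x 64 tile on the top row of a 4096 x 4096 atlas.
u0, v0, u1, v1 = uv_rect(64, 0, 64, 64, 4096, 4096)
</code></pre>
<p>The four values are then assigned to the corners of the quad, so that the sampler only ever reads from that sub-image.</p>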
<h1><strong>Some perspective on game development in HTML5</strong></h1>
<p>Web browsers are amazing programs: They abstract out the underlying complexities of loading and rendering web pages, hiding the under-the-hood machinery from the average web developer. For high-performance web games, however, this is less than ideal. The layers of abstraction make it generally unclear how to organize data for maximum performance, and can lead to increased overhead. As with most development platforms, the key to high performance is a thoughtful understanding of what’s really going on, and knowing what tools are available to aid in developing real-time applications.</p>
<p>Stepping back a bit, it’s clear that HTML5 games have great promise. The ability to iterate quickly and deploy content across the Internet is a powerful incentive for developers. But it’s important to approach HTML5 game development with the proper perspective. Regardless of the unicorn dust of the web, working in HTML5 has many of the same pitfalls as console, mobile, and desktop development. Game developers need to continually explore and find great solutions for difficult problems.</p>
<p>For a take on HTML5 game development from the point of view of a traditional game developer, take a look at the following videos:</p>
<ul>
<li><a href="http://www.youtube.com/watch?v=Prkyd5n0P7k&amp;utm_source=altdevblog">GRITS: PvP gaming in HTML5</a></li>
<li><a href="http://www.youtube.com/watch?v=huXucPChX3g&amp;utm_source=altdevblog">Best practices in developing HTML5 games</a></li>
<li><a href="http://www.youtube.com/watch?v=XAqIpGU8ZZk&amp;utm_source=altdevblog">From console to Chrome</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/09/17/building-an-html5-game-dont-shrug-off-atlases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shader Generator</title>
		<link>http://www.altdevblogaday.com/2012/08/01/shader-generator/</link>
		<comments>http://www.altdevblogaday.com/2012/08/01/shader-generator/#comments</comments>
		<pubDate>Wed, 01 Aug 2012 15:33:43 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[shader generator]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27083</guid>
		<description><![CDATA[<p><strong>Introduction</strong></p>
<p>Over the last few weeks, I have been busy rewriting my iPhone engine so that it can also run on the Windows platform (so that I can use Visual Studio instead of Xcode~) and, most importantly, so that I can play around with D3D11. During the rewrite, I wanted to improve the process of writing shaders so that I don&#8217;t need to write similar shaders multiple times for each shader permutation (say, for each surface, I would have to write a shader for static mesh, skinned mesh, instanced static mesh&#8230; multiplied by the number of render passes), and can instead focus on coding what the surface looks like. So I decided to write a shader generator that generates those shaders, similar to the <a href="http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaders.html">surface shader in Unity</a>. I chose the surface shader approach instead of a graph-based approach like <a href="http://udn.epicgames.com/Three/MaterialEditorUserGuide.html">Unreal Engine</a>&#8217;s because, being a programmer, I feel more comfortable (and am faster) writing code than dragging tree nodes in a GUI. The current implementation of the shader generator can only generate vertex and pixel shaders for the light pre-pass renderer, which is the lighting model I used before.</p>
<p><a href="http://www.altdevblogaday.com/2012/08/01/shader-generator/" class="more-link">Read more on Shader Generator&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong>Introduction</strong></p>
<p>Over the last few weeks, I have been busy rewriting my iPhone engine so that it can also run on the Windows platform (so that I can use Visual Studio instead of Xcode~) and, most importantly, so that I can play around with D3D11. During the rewrite, I wanted to improve the process of writing shaders so that I don&#8217;t need to write similar shaders multiple times for each shader permutation (say, for each surface, I would have to write a shader for static mesh, skinned mesh, instanced static mesh&#8230; multiplied by the number of render passes), and can instead focus on coding what the surface looks like. So I decided to write a shader generator that generates those shaders, similar to the <a href="http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaders.html">surface shader in Unity</a>. I chose the surface shader approach instead of a graph-based approach like <a href="http://udn.epicgames.com/Three/MaterialEditorUserGuide.html">Unreal Engine</a>&#8217;s because, being a programmer, I feel more comfortable (and am faster) writing code than dragging tree nodes in a GUI. The current implementation of the shader generator can only generate vertex and pixel shaders for the light pre-pass renderer, which is the lighting model I used before.</p>
<p><strong>Defining the surface</strong></p>
<p>To generate the target vertex and pixel shaders with the shader generator, we need to define what the surface looks like by writing a surface shader. In my version of the surface shader, I need to define 3 functions: a vertex function, a surface function and a lighting function. The vertex function defines the vertex properties like position and texture coordinates.</p>
<blockquote class="tr_bq">
<ol>
<li>VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)</li>
<li>{</li>
<li>    VTX_FUNC_OUTPUT output;</li>
<li>    output.position = mul( float4(input.position, 1), worldViewProj  );</li>
<li>    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;</li>
<li>    output.uv0 = input.uv0;</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>The surface function describes what the surface looks like by defining the diffuse color, the glossiness and the surface normal.</p>
<blockquote class="tr_bq">
<ol>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>Finally, the lighting function decides which lighting model is used to calculate the reflected color of the surface.</p>
<blockquote class="tr_bq">
<ol>
<li>LIGHT_FUNC_OUTPUT lightFuncLPP(LIGHT_FUNC_INPUT input)</li>
<li>{</li>
<li>    LIGHT_FUNC_OUTPUT output;</li>
<li>    float4 lightColor = lightBuffer.Sample(samplerLinear, input.pxPos.xy * renderTargetSizeInv.xy );</li>
<li>    output.color = float4(input.diffuse * lightColor.rgb, 1);</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>By defining the above functions, the writer of the surface shader only needs to fill in the output structure of each function, using the input structure together with some auxiliary functions and shader constants provided by the engine.</p>
<p><strong>Generating the shaders</strong></p>
<p>As you can see in the above code snippets, my surface shader just defines normal HLSL functions with fixed input and output structures. So to generate the vertex and pixel shaders, we just need to copy these functions into the target shader code, which then invokes the functions defined in the surface shader. Taking the above vertex function as an example, the generated vertex shader would look like:</p>
<blockquote class="tr_bq">
<ol>
<li>#include "include.h"</li>
<li>struct VS_INPUT</li>
<li>{</li>
<li>    float3 position : POSITION0;</li>
<li>    float3 normal : NORMAL0;</li>
<li>    float2 uv0 : UV0;</li>
<li>};</li>
<li>struct VS_OUTPUT</li>
<li>{</li>
<li>    float4 position : SV_POSITION0;</li>
<li>    float3 normal : NORMAL0;</li>
<li>    float2 uv0 : UV0;</li>
<li>};</li>
<li>typedef VS_INPUT VTX_FUNC_INPUT;</li>
<li>typedef VS_OUTPUT VTX_FUNC_OUTPUT;</li>
<li>/********************* User Defined Content ********************/</li>
<li>VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)</li>
<li>{</li>
<li>    VTX_FUNC_OUTPUT output;</li>
<li>    output.position = mul( float4(input.position, 1), worldViewProj  );</li>
<li>    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;</li>
<li>    output.uv0 = input.uv0;</li>
<li>    return output;</li>
<li>}</li>
<li>/******************** End User Defined Content *****************/</li>
<li>VS_OUTPUT main(VS_INPUT input)</li>
<li>{</li>
<li>    return vtxFunc(input);</li>
<li>}</li>
</ol>
</blockquote>
<p>During code generation, the shader generator needs to figure out which input and output structures are needed to feed the user-defined functions. This task is simple and can be accomplished with some string functions.</p>
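<p>One possible reading of "some string functions" is a regular-expression scan over the function source that collects every member accessed on the input and output variables. The Python sketch below illustrates that idea only; it is not the generator's actual code, and the helper name is made up.</p>
<pre><code>import re

def referenced_fields(source, struct_var):
    """Return the distinct members accessed on struct_var, in first-use order."""
    pattern = r"\b" + re.escape(struct_var) + r"\.(\w+)"
    seen = []
    for name in re.findall(pattern, source):
        if name not in seen:
            seen.append(name)
    return seen

VTX_FUNC = """
VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)
{
    VTX_FUNC_OUTPUT output;
    output.position = mul( float4(input.position, 1), worldViewProj );
    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;
    output.uv0 = input.uv0;
    return output;
}
"""

input_fields = referenced_fields(VTX_FUNC, "input")
output_fields = referenced_fields(VTX_FUNC, "output")
</code></pre>
<p>For the vertex function above, both lists come out as position, normal and uv0, which is exactly the set of members that VS_INPUT and VS_OUTPUT declare in the generated shader.</p>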
<p><strong>Simplifying the shader</strong></p>
<p>As I mentioned before, my shader generator is used for generating shaders for the light pre-pass renderer. There are 2 passes in a light pre-pass renderer, and they need different shader inputs and outputs. For example, in the G-buffer pass the shaders are only interested in the surface normal data, not the diffuse color, while the data needed by the second geometry pass is the opposite. However, all the surface information (surface normal and diffuse color) is defined in the surface function inside the surface shader. If we simply generate shaders as in the last section, we will generate some redundant code that cannot be optimized away by the shader compiler. For example, the pixel shader in the G-buffer pass would sample the diffuse texture, which requires the texture coordinates as input from the vertex shader, even though the diffuse color is not actually needed in this pass; the compiler may not be able to figure out that the vertex shader does not need to output the texture coordinates. Of course we could force the writer to add some #if preprocessor directives inside the surface function for each particular render pass to eliminate the useless output, but this would complicate the surface shader authoring process: writing a surface shader is about describing what the surface looks like and, ideally, the writer should not need to worry about the output of a render pass.</p>
<p>So the problem is to figure out which output data are actually needed in a given pass and to eliminate those outputs that are not needed. For example, suppose we are generating shaders for the G-buffer pass from this surface function:</p>
<blockquote class="tr_bq">
<ol>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>We only want to keep the variables <em>output.normal</em> and <em>output.glossiness</em>. The variable <em>output.diffuse</em>, and the other variables that are referenced by <em>output.diffuse</em> (<em>diffuseTex</em>, <em>samplerLinear</em>, <em>input.uv0</em>), are going to be eliminated. To find such variable dependencies, we need to teach the shader generator to understand HLSL grammar and to find all the assignment statements and branching conditions, from which the variable dependencies can be derived.</p>
<p>To do this, we need to generate an abstract syntax tree from the shader source code. Of course we could write our own LALR parser to achieve this goal, but I chose to use <a href="http://dinosaur.compilertools.net/">lex&amp;yacc (or flex&amp;bison)</a> to generate the parse tree. Luckily we are working on a subset of the HLSL syntax (we only need to define functions and don&#8217;t need pointers), and HLSL syntax is similar to the C language, so modifying the ANSI C grammar rules for <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-l.html">lex</a>&amp;<a href="http://www.lysator.liu.se/c/ANSI-C-grammar-y.html">yacc</a> does the job. Here are my modified <a href="https://sites.google.com/site/simontechblog/home/lex_yacc/lex.l">grammar</a> <a href="https://sites.google.com/site/simontechblog/home/lex_yacc/rule.y">rules</a> used to generate the parse tree. By traversing the parse tree, we can obtain the variable dependencies, so we know which variables need to be eliminated. We eliminate them by taking out their assignment statements, and the compiler will do the rest. Below is the simplified pixel shader generated for the previous example:</p>
<blockquote class="tr_bq">
<ol>
<li>#include &#8220;include.h&#8221;</li>
<li>cbuffer _materialParam : register( MATERIAL_CONSTANT_BUFFER_SLOT_0 )</li>
<li>{</li>
<li>    float glossiness;</li>
<li>};</li>
<li>Texture2D diffuseTex: register( MATERIAL_SHADER_RESOURCE_SLOT_0 );</li>
<li>struct PS_INPUT</li>
<li>{</li>
<li>    float4 position : SV_POSITION0;</li>
<li>    float3 normal : NORMAL0;</li>
<li>};</li>
<li>struct PS_OUTPUT</li>
<li>{</li>
<li>    float4 gBuffer : SV_Target0;</li>
<li>};</li>
<li>struct SUF_FUNC_OUTPUT</li>
<li>{</li>
<li>    float3 normal;</li>
<li>    float glossiness;</li>
<li>};</li>
<li>typedef PS_INPUT SUF_FUNC_INPUT;</li>
<li>/********************* User Defined Content ********************/</li>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>                                                                 ;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
<li>/******************** End User Defined Content *****************/</li>
<li>PS_OUTPUT main(PS_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT sufOut= sufFunc(input);</li>
<li>    PS_OUTPUT output;</li>
<li>    output.gBuffer= normalToGBuffer(sufOut.normal, sufOut.glossiness);</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
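<p>The dependency-based elimination described above can be sketched in a few lines of Python. This is only a minimal illustration, not the generator itself: it assumes the parse tree has already been reduced to a hypothetical list of <em>(target, variables read)</em> pairs for straight-line assignments, and it ignores branching conditions. The names <em>find_live_statements</em>, <em>statements</em> and <em>required_outputs</em> are made up for this example.</p>

```python
# Hypothetical, simplified model: each statement is (target, [variables it reads]).
# A real generator would obtain these pairs by walking the yacc parse tree.
def find_live_statements(statements, required_outputs):
    """Return (indices of assignments to keep, external inputs still needed)."""
    live_vars = set(required_outputs)
    keep = []
    # Walk backwards so that a kept assignment makes its own inputs live in turn.
    for index in range(len(statements) - 1, -1, -1):
        target, reads = statements[index]
        if target in live_vars:
            keep.append(index)
            live_vars.discard(target)
            live_vars.update(reads)
    keep.reverse()
    return keep, live_vars


# The surface function from the example, reduced to its three assignments.
statements = [
    ("output.normal", ["input.normal"]),
    ("output.diffuse", ["diffuseTex", "samplerLinear", "input.uv0"]),
    ("output.glossiness", ["glossiness"]),
]
keep, inputs = find_live_statements(
    statements, ["output.normal", "output.glossiness"]
)
print(keep)            # indices of the assignments that survive
print(sorted(inputs))  # shader inputs and constants that are still referenced
```

<p>Any statement whose index is not in <em>keep</em> can be blanked out of the generated source, and any input that does not appear in <em>inputs</em> is a candidate for removal from the input struct.</p>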
<p><strong>Extending the surface shader syntax</strong></p>
<p>As I use lex&amp;yacc to parse the surface shader, I can extend the surface shader syntax by adding more grammar rules, so that the writer of a surface shader can declare which shader constants and textures the surface function needs, and the generator can emit the matching constant buffer and shader resources in the source code. My surface shader syntax also permits users to define their own structs and functions besides the 3 main functions (vertex, surface and lighting function); these are copied into the generated source code as well. Here is a sample of what my surface shader looks like:</p>
<blockquote class="tr_bq">
<ol>
<li>RenderType{</li>
<li>    opaque;</li>
<li>};</li>
<li>ShaderConstant</li>
<li>{</li>
<li>    float glossiness: ui_slider_0_255_Glossiness;</li>
<li>};</li>
<li>TextureResource</li>
<li>{</li>
<li>    Texture2D diffuseTex;</li>
<li>};</li>
<li>VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)</li>
<li>{</li>
<li>    VTX_FUNC_OUTPUT output;</li>
<li>    output.position = mul( float4(input.position, 1), worldViewProj  );</li>
<li>    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;</li>
<li>    output.uv0 = input.uv0;</li>
<li>    return output;</li>
<li>}</li>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
<li>LIGHT_FUNC_OUTPUT lightFuncLPP(LIGHT_FUNC_INPUT input)</li>
<li>{</li>
<li>    LIGHT_FUNC_OUTPUT output;</li>
<li>    float4 lightColor = lightBuffer.Sample(samplerLinear, input.pxPos.xy * renderTargetSizeInv.xy );</li>
<li>    output.color = float4(input.diffuse * lightColor.rgb, 1);</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
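<p>To illustrate how a <em>ShaderConstant</em> block like the one above could turn into the constant buffer seen in the generated pixel shader, here is a toy Python sketch. It uses a regular expression as a stand-in for the real lex&amp;yacc rules, so it is an assumption about the mapping and not the actual implementation; <em>generate_cbuffer</em> is a hypothetical helper, and the UI hint after the colon is simply dropped.</p>

```python
import re


def generate_cbuffer(surface_shader_source):
    """Emit an HLSL cbuffer from a ShaderConstant block (UI hints are dropped)."""
    block = re.search(
        r"ShaderConstant\s*\{(.*?)\}\s*;", surface_shader_source, re.S
    )
    if block is None:
        return ""
    lines = [
        "cbuffer _materialParam : register( MATERIAL_CONSTANT_BUFFER_SLOT_0 )",
        "{",
    ]
    for declaration in block.group(1).split(";"):
        declaration = declaration.strip()
        if not declaration:
            continue
        # "float glossiness: ui_slider_0_255_Glossiness" -> "float glossiness"
        type_and_name = declaration.split(":")[0].strip()
        lines.append("    " + type_and_name + ";")
    lines.append("};")
    return "\n".join(lines)


example_source = """
ShaderConstant
{
    float glossiness: ui_slider_0_255_Glossiness;
};
"""
generated = generate_cbuffer(example_source)
print(generated)
```

<p>A grammar-based parser is still the better choice for the real generator, because the same parse tree is needed for the dependency analysis.</p>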
<p><strong>Conclusions</strong></p>
<p>This post described how I generate vertex and pixel shader source code for different render passes from a single surface shader, which saves me from writing similar shaders multiple times and from worrying about the particular shader inputs and outputs of each render pass. Currently, the shader generator can only generate HLSL vertex and pixel shaders for static meshes in the light pre-pass renderer. The shader generator is still in progress: generating shader source code for the forward pass is not done yet, and domain, hull and geometry shaders are not implemented. GLSL support is also missing, but it could be added (in theory&#8230;) by building a more sophisticated abstract syntax tree while parsing the surface shader, or by defining some new grammar rules in the surface shader (using lex&amp;yacc) that make it easier to generate both HLSL and GLSL source code. These will be left for the future, as I still need to rewrite my engine and get it running again&#8230;</p>
<p><strong>References</strong><br />
[1] Unity &#8211; Surface Shader Examples <a href="http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderExamples.html">http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderExamples.html</a><br />
[2] Lex &amp; Yacc Tutorial <a href="http://epaperpress.com/lexandyacc/">http://epaperpress.com/lexandyacc/</a><br />
[3] ANSI C grammar, Lex specification <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-l.html">http://www.lysator.liu.se/c/ANSI-C-grammar-l.html</a><br />
[4] ANSI C Yacc grammar <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-y.html">http://www.lysator.liu.se/c/ANSI-C-grammar-y.html</a><br />
[5] <a href="http://www.ibm.com/developerworks/opensource/library/l-flexbison/index.html">http://www.ibm.com/developerworks/opensource/library/l-flexbison/index.html</a><br />
[6] <a href="http://www.gamedev.net/topic/200275-yaccbison-locations/">http://www.gamedev.net/topic/200275-yaccbison-locations/</a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/08/01/shader-generator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Photon Mapping Part 2</title>
		<link>http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/</link>
		<comments>http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/#comments</comments>
		<pubDate>Thu, 28 Jun 2012 14:39:09 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[GI]]></category>
		<category><![CDATA[global illumination]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[light map]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26786</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.hk/2012/06/photon-mapping-part-1.html">previous post</a>, this post describes how the <a href="http://en.wikipedia.org/wiki/Lightmap">light map</a> is calculated from the photon map. My light map stores the incoming radiance of indirect lighting on a surface, projected into the <a href="http://en.wikipedia.org/wiki/Spherical_harmonics">Spherical Harmonics (SH)</a> basis. 4 SH coefficients are used for each color channel, so 3 textures are used for the RGB channels (12 coefficients in total).</p>
<p><a href="http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/" class="more-link">Read more on Photon Mapping Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.hk/2012/06/photon-mapping-part-1.html">previous post</a>, this post describes how the <a href="http://en.wikipedia.org/wiki/Lightmap">light map</a> is calculated from the photon map. My light map stores the incoming radiance of indirect lighting on a surface, projected into the <a href="http://en.wikipedia.org/wiki/Spherical_harmonics">Spherical Harmonics (SH)</a> basis. 4 SH coefficients are used for each color channel, so 3 textures are used for the RGB channels (12 coefficients in total).</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Baking the light map</span></strong></p>
<p>To bake the light map, the scene must have a set of unique, non-overlapping texture coordinates (UVs), each corresponding to a unique world space position, so that the incoming radiance at a world position can be represented. This set of UVs can be generated inside a modeling package or with <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/bb206321(v=vs.85).aspx">UVAtlas</a>. In my simple case, the UVs are mapped manually.</p>
<p>To generate the light map, given a mesh with unique UVs and the light map resolution, we rasterize the mesh (using scan-line or half-space rasterization) into texture space, interpolating the world space position across each triangle, so that each light map texel is associated with a world space position. Then, for each texel, we sample the photon map at the corresponding world space position by performing a final gather step, just like in the previous post for offline rendering. This gives the incoming radiance at that world space position, and hence at that texel of the light map. The data is then projected into SH coefficients and stored in three 16-bit floating point textures. Below is a light map showing the dominant light color extracted from the SH coefficients:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-CrVk0YjGgJ8/T9ntBgbKVpI/AAAAAAAAAYo/RY02Z1TS1DM/s1600/sh_lightMap_dominantColor.png"><img src="http://1.bp.blogspot.com/-CrVk0YjGgJ8/T9ntBgbKVpI/AAAAAAAAAYo/RY02Z1TS1DM/s200/sh_lightMap_dominantColor.png" alt="" width="200" height="200" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">The baked light map showing the dominant light color from SH coefficients</td>
</tr>
</tbody>
</table>
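<p>The rasterization step described above can be sketched as follows. This is a minimal half-space rasterizer for a single triangle, written in Python for clarity and not for speed; the function names and the square <em>resolution</em> parameter are assumptions made for this example. A production baker would usually also dilate or conservatively rasterize the chart borders, which is left out here.</p>

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p) in UV space."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(uvs, world_positions, resolution):
    """Map each covered light map texel (x, y) to an interpolated world position."""
    texels = {}
    area = edge(uvs[0], uvs[1], uvs[2])
    if area == 0.0:
        return texels  # degenerate triangle in UV space
    for y in range(resolution):
        for x in range(resolution):
            # Sample at the texel centre, expressed in [0, 1] UV space.
            p = ((x + 0.5) / resolution, (y + 0.5) / resolution)
            # Barycentric weights from the three half-space (edge) functions.
            # Dividing by the signed area makes the test winding-independent.
            w0 = edge(uvs[1], uvs[2], p) / area
            w1 = edge(uvs[2], uvs[0], p) / area
            w2 = edge(uvs[0], uvs[1], p) / area
            if min(w0, w1, w2) >= 0.0:
                texels[(x, y)] = tuple(
                    w0 * world_positions[0][i]
                    + w1 * world_positions[1][i]
                    + w2 * world_positions[2][i]
                    for i in range(3)
                )
    return texels


# One triangle covering the lower-left half of a 4x4 light map.
texels = rasterize_triangle(
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    4,
)
```

<p>Each entry of <em>texels</em> then gives the world space position at which the photon map is sampled for that texel.</p>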
<p><strong><span class="Apple-style-span" style="font-size: large">Using the light map</span></strong></p>
<p>After baking the light map, the direct lighting is rendered at run time in the usual way. A point light is used to approximate the area light of the ray traced version; the difference is most noticeable at the shadow edges.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-GF5MfBalGVU/T9nup6cp4sI/AAAAAAAAAYw/7wxAHirwzV8/s1600/sh_lightMap_direct_only.png"><img src="http://4.bp.blogspot.com/-GF5MfBalGVU/T9nup6cp4sI/AAAAAAAAAYw/7wxAHirwzV8/s320/sh_lightMap_direct_only.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct lighting only, real time version</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s1600/d_only.png"><img src="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s200/d_only.png" alt="" width="200" height="200" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct lighting only, ray traced version</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Then we sample the SH coefficients from the light map to calculate the indirect lighting:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-LYrMnBJgpRU/T9nvty0YlOI/AAAAAAAAAY4/Z-m1B8XpP2w/s1600/sh_lightMap_indirect_only.png"><img src="http://2.bp.blogspot.com/-LYrMnBJgpRU/T9nvty0YlOI/AAAAAAAAAY4/Z-m1B8XpP2w/s320/sh_lightMap_indirect_only.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">indirect lighting only, real time version</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s1600/id_fg.png"><img src="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s200/id_fg.png" alt="" width="200" height="200" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">indirect lighting only, ray traced version</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
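<p>As a rough sketch of what projecting radiance into 4 SH coefficients and sampling them back involves, the Python below handles a single color channel. The post does not show its exact formulas, so this is an assumption: it uses the standard real SH basis for bands 0 and 1, uniform sampling over the sphere for the projection, and the usual cosine-lobe convolution factors (pi for band 0, 2*pi/3 for band 1) for the diffuse lookup. All function names are made up for this example.</p>

```python
import math

Y0 = 0.282095   # l=0 basis constant, sqrt(1 / (4*pi))
Y1 = 0.488603   # l=1 basis constant, sqrt(3 / (4*pi))


def sh_basis(d):
    """First 4 real SH basis functions for a unit direction d = (x, y, z)."""
    x, y, z = d
    return (Y0, Y1 * y, Y1 * z, Y1 * x)


def project_radiance(samples):
    """Project (direction, radiance) samples, uniform over the sphere, into 4 SH coefficients."""
    coeffs = [0.0, 0.0, 0.0, 0.0]
    for direction, radiance in samples:
        for i, basis in enumerate(sh_basis(direction)):
            coeffs[i] += radiance * basis
    weight = 4.0 * math.pi / len(samples)   # Monte Carlo weight, pdf = 1 / (4*pi)
    return [c * weight for c in coeffs]


def irradiance(coeffs, normal):
    """Cosine-convolved SH lookup: band 0 scaled by pi, band 1 by 2*pi/3."""
    basis = sh_basis(normal)
    result = math.pi * coeffs[0] * basis[0]
    for i in range(1, 4):
        result += (2.0 * math.pi / 3.0) * coeffs[i] * basis[i]
    return result


# Constant unit radiance sampled along the six axis directions.
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
coeffs = project_radiance([(d, 1.0) for d in axes])
print(irradiance(coeffs, (0.0, 0.0, 1.0)))  # close to pi for constant radiance
```

<p>At run time the same lookup would be done per pixel for each of the three channels, using the surface normal (or the normal from a normal map) as the lookup direction.</p>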
<p>Combining the direct and indirect lighting, the final result becomes:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-Y0EF5oGqHAY/T9nwoST7TFI/AAAAAAAAAZA/v3B3vZe-vJE/s1600/sh_lightMap.png"><img src="http://3.bp.blogspot.com/-Y0EF5oGqHAY/T9nwoST7TFI/AAAAAAAAAZA/v3B3vZe-vJE/s320/sh_lightMap.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct + indirect lighting, real time version</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s1600/d_id_fg.png"><img src="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s200/d_id_fg.png" alt="" width="200" height="199" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct + indirect lighting, ray traced version</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<div>Because the light map is stored in SH, we can apply a normal map to the mesh to change the reflected radiance.</div>
<div>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-wKt_4DgyH-I/T9nxjyWAqiI/AAAAAAAAAZI/AD3L13yYkfw/s1600/sh_lightMap_normal.png"><img src="http://1.bp.blogspot.com/-wKt_4DgyH-I/T9nxjyWAqiI/AAAAAAAAAZI/AD3L13yYkfw/s320/sh_lightMap_normal.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Rendered with normal map</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-4hsM8An5JvY/T9nyDJ_KG1I/AAAAAAAAAZQ/4JGIlEUh8HY/s1600/sh_lightMap_normal_indirect.png"><img src="http://4.bp.blogspot.com/-4hsM8An5JvY/T9nyDJ_KG1I/AAAAAAAAAZQ/4JGIlEUh8HY/s320/sh_lightMap_normal_indirect.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Indirect lighting with normal map</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>We can also apply some tessellation and add some ambient occlusion (AO) to make the result more interesting:</p>
</div>
<div>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-SpgqBKORBXY/T9nzEaQJi3I/AAAAAAAAAZY/wmhNQnu-kYw/s1600/sh_lightMap_normal_tess_ao.png"><img src="http://2.bp.blogspot.com/-SpgqBKORBXY/T9nzEaQJi3I/AAAAAAAAAZY/wmhNQnu-kYw/s320/sh_lightMap_normal_tess_ao.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Rendered with light map, normal map, tessellation and AO</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-bc1z8O_PAFk/T9nzL77umpI/AAAAAAAAAZg/WeXFHuxPYyA/s1600/sh_lightMap_normal_tess_ao2.png"><img src="http://2.bp.blogspot.com/-bc1z8O_PAFk/T9nzL77umpI/AAAAAAAAAZg/WeXFHuxPYyA/s320/sh_lightMap_normal_tess_ao2.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Rendered with light map, normal map, tessellation and AO</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong></p>
<p>This post gives an overview of how to bake a light map of indirect lighting data by sampling the photon map. I use SH to store the incoming radiance, but other data could be stored instead, such as the reflected diffuse radiance of the surface, which reduces texture storage and doesn&#8217;t require floating point textures. Besides, the SH coefficients can be stored per vertex in the static mesh instead of in a light map. Lastly, by sampling the photon map with final gather rays, light probes for dynamic objects can also be baked using similar methods.</p>
<p><strong>References</strong></p>
<p><span class="Apple-style-span" style="font-size: x-small">March of the Froblins: <a href="http://developer.amd.com/samples/demos/pages/froblins.aspx">http://developer.amd.com/samples/demos/pages/froblins.aspx</a></span></p>
<p><span class="Apple-style-span" style="font-size: x-small">Lighting and Material of HALO 3: <a href="http://www.bungie.net/images/Inside/publications/presentations/lighting_material.zip">http://www.bungie.net/images/Inside/publications/presentations/lighting_material.zip</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Photon Mapping Part 1</title>
		<link>http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/</link>
		<comments>http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/#comments</comments>
		<pubDate>Thu, 14 Jun 2012 02:29:27 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[GI]]></category>
		<category><![CDATA[global illumination]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26588</guid>
		<description><![CDATA[<p><strong>Introduction</strong></p>
<p>In this generation of computer graphics, <a href="http://en.wikipedia.org/wiki/Global_illumination">global illumination</a> (GI) is an important technique which calculates indirect lighting within a scene. <a href="http://en.wikipedia.org/wiki/Photon_mapping">Photon mapping</a> is one of the GI techniques, using particle tracing to compute images in offline rendering. Photon mapping is easy to implement, so I chose to learn it; my target is to bake a light map storing indirect diffuse lighting information using the photon map. Photon mapping consists of 2 passes, the photon map pass and the render pass, which are described below.</p>
<p><a href="http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/" class="more-link">Read more on Photon Mapping Part 1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong>Introduction</strong></p>
<p>In this generation of computer graphics, <a href="http://en.wikipedia.org/wiki/Global_illumination">global illumination</a> (GI) is an important technique which calculates indirect lighting within a scene. <a href="http://en.wikipedia.org/wiki/Photon_mapping">Photon mapping</a> is one of the GI techniques, using particle tracing to compute images in offline rendering. Photon mapping is easy to implement, so I chose to learn it; my target is to bake a light map storing indirect diffuse lighting information using the photon map. Photon mapping consists of 2 passes, the photon map pass and the render pass, which are described below.</p>
<p><strong>Photon Map Pass</strong></p>
<p>In this pass, photons are cast into the scene from the position of the light source. Each photon stores a packet of energy. When a photon hits a surface of the scene, it is either reflected (diffusely or specularly), transmitted or absorbed, which is determined by Russian roulette.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-9JpihE-_5DE/T9f4DYhYuFI/AAAAAAAAAYI/VIRqKYHHUgE/s1600/photonPath.png"><img src="http://2.bp.blogspot.com/-9JpihE-_5DE/T9f4DYhYuFI/AAAAAAAAAYI/VIRqKYHHUgE/s400/photonPath.png" alt="" width="400" height="295" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Photons are traced in the scene to simulate light transport</td>
</tr>
</tbody>
</table>
<div>Each hit event represents incoming energy at that surface and is stored in a <a href="http://en.wikipedia.org/wiki/K-d_tree">k-d tree</a> (known as the photon map) for lookup in the render pass. Each hit event stores the photon energy, the incoming direction and the hit position.</div>
<p>However, it is more convenient to store radiance than energy in a photon, because with a punctual light source (e.g. a point light) it is hard to compute the emitted energy from the light source radiance. So I use the method described in <a href="http://www.pbrt.org/">Physically Based Rendering</a>, where a weight of radiance is stored in each photon:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-KhcyefbkX-I/T9fxHIj6NgI/AAAAAAAAAW0/xXmdUhJVEbI/s1600/alpha_emit.png"><img src="http://3.bp.blogspot.com/-KhcyefbkX-I/T9fxHIj6NgI/AAAAAAAAAW0/xXmdUhJVEbI/s320/alpha_emit.png" alt="" width="320" height="148" border="0" /></a></div>
<p>When a photon hits a surface, the probability used in Russian roulette for reflecting it in a new random direction is:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-2uau5SERimg/T9fxNXmevlI/AAAAAAAAAW8/bx2m2a3Sjis/s1600/prob_reflect.png"><img src="http://1.bp.blogspot.com/-2uau5SERimg/T9fxNXmevlI/AAAAAAAAAW8/bx2m2a3Sjis/s320/prob_reflect.png" alt="" width="320" height="143" border="0" /></a></div>
<p>This probability is chosen because a photon has a higher chance of being reflected if it is brighter. If the photon is reflected, its radiance is updated to:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-wLu8xN8rO6A/T9fxS3Q_qcI/AAAAAAAAAXE/vX5CQ94kjnE/s1600/alphaReflect.png"><img src="http://1.bp.blogspot.com/-wLu8xN8rO6A/T9fxS3Q_qcI/AAAAAAAAAXE/vX5CQ94kjnE/s320/alphaReflect.png" alt="" width="320" height="69" border="0" /></a></div>
<p>The photon then continues to be traced in the newly reflected direction.</p>
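<p>The Russian roulette step can be sketched in Python as below. The equations in this post are images, so the exact form used here is an assumption that follows the formulation in Physically Based Rendering: the survival probability is the ratio of the luminance after and before the bounce, and a surviving photon is divided by that probability to keep the estimate unbiased. The sketch also assumes a purely diffuse surface sampled with a cosine-weighted direction, so the bounce factor reduces to the albedo; <em>scatter_photon</em> and <em>luminance</em> are hypothetical names.</p>

```python
import random


def luminance(rgb):
    """Relative luminance of a linear RGB triple (Rec. 709 weights)."""
    return 0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2]


def scatter_photon(alpha, albedo, rng=random):
    """Return the photon's new weight, or None if Russian roulette absorbs it."""
    if luminance(alpha) == 0.0:
        return None  # a photon that carries nothing is not worth tracing
    # Weight after a diffuse bounce; assumes a cosine-weighted sample direction,
    # so the BRDF * cos / pdf factor reduces to the albedo.
    new_alpha = tuple(a * r for a, r in zip(alpha, albedo))
    survive_probability = min(1.0, luminance(new_alpha) / luminance(alpha))
    if rng.random() >= survive_probability:
        return None  # absorbed
    # Divide by the probability so the estimate stays unbiased.
    return tuple(a / survive_probability for a in new_alpha)
```

<p>With a grey albedo of 0.5, for example, about half of the photons are absorbed and the survivors keep their original weight, which keeps the stored photons at similar brightness.</p>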
<p><strong>Render Pass</strong></p>
<p>In the render pass, the direct and indirect lighting are computed separately. The direct lighting is computed using ray tracing.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s1600/d_only.png"><img src="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s320/d_only.png" alt="" width="320" height="320" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Direct light only</td>
</tr>
</tbody>
</table>
<p>The indirect lighting is computed by sampling the photon map. When calculating the indirect lighting at a given position (in this case, the pixel being shaded), we locate the N nearest photons in the photon map and estimate the incoming radiance using <a href="http://en.wikipedia.org/wiki/Kernel_density_estimation">kernel density estimation</a>. A <a href="http://en.wikipedia.org/wiki/Kernel_(statistics)">kernel function</a> needs to satisfy the conditions:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-kqw71KJDmyg/T9fxYqiRdKI/AAAAAAAAAXM/1DHND31RjAM/s1600/kernel.png"><img src="http://2.bp.blogspot.com/-kqw71KJDmyg/T9fxYqiRdKI/AAAAAAAAAXM/1DHND31RjAM/s200/kernel.png" alt="" width="200" height="56" border="0" /></a></div>
<p>&nbsp;</p>
<div style="margin: 0px">I use Simpson&#8217;s kernel (also known as Silverman&#8217;s second order kernel) suggested in the book Physically Based Rendering:</div>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-Ak52BhXbAwY/T9fxfemYUsI/AAAAAAAAAXU/EefVZobxOn8/s1600/simpsonKernel.png"><img src="http://2.bp.blogspot.com/-Ak52BhXbAwY/T9fxfemYUsI/AAAAAAAAAXU/EefVZobxOn8/s320/simpsonKernel.png" alt="" width="320" height="112" border="0" /></a></div>
<p>Then the density can be computed using the kernel estimator for N samples within a distance d (i.e. the distance to the farthest of the N photons):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-EVIqO2i0ajw/T9fxmyjV79I/AAAAAAAAAXc/8esrs_WX7YM/s1600/kernelEstimator.png"><img src="http://4.bp.blogspot.com/-EVIqO2i0ajw/T9fxmyjV79I/AAAAAAAAAXc/8esrs_WX7YM/s320/kernelEstimator.png" alt="" width="320" height="96" border="0" /></a></div>
<p>Then the reflected radiance at the shading position can be computed with:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-xNS0AtGd8Bk/T9fxuJ1RCyI/AAAAAAAAAXk/z7vOb8ZhVFY/s1600/L_relfected.png"><img src="http://3.bp.blogspot.com/-xNS0AtGd8Bk/T9fxuJ1RCyI/AAAAAAAAAXk/z7vOb8ZhVFY/s320/L_relfected.png" alt="" width="320" height="82" border="0" /></a></div>
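<p>The density estimate above can be sketched in Python as follows. Since the formulas in this post are images, the details here are assumptions: the kernel is written as 3/pi * (1 - x^2)^2 on the unit disk, the photons are (position, weight) pairs with a scalar weight, the nearest photons are found by brute force where a k-d tree would normally be used, and the normalising count <em>n_emitted</em> is taken to be the number of emitted photons as in Physically Based Rendering. A Lambertian surface is assumed for the reflected radiance.</p>

```python
import math


def simpson_kernel(x):
    """Simpson's kernel on the unit disk: 3/pi * (1 - x^2)^2, zero for |x| beyond 1."""
    s = max(0.0, 1.0 - x * x)
    return 3.0 / math.pi * s * s


def estimate_reflected_radiance(point, photons, n_nearest, n_emitted, albedo):
    """Diffuse reflected radiance at point from the n_nearest photons (brute-force search)."""
    def dist2(photon):
        return sum((a - b) ** 2 for a, b in zip(photon[0], point))

    nearest = sorted(photons, key=dist2)[:n_nearest]
    if not nearest:
        return 0.0
    d2 = dist2(nearest[-1])   # squared distance to the farthest photon used
    if d2 == 0.0:
        return 0.0
    total = sum(simpson_kernel(math.sqrt(dist2(p) / d2)) * p[1] for p in nearest)
    # Lambertian BRDF = albedo / pi; the density is normalised by n_emitted * d^2.
    return (albedo / math.pi) * total / (n_emitted * d2)
```

<p>Note that the farthest photon always receives a kernel weight of zero, so photons near the shading point dominate the estimate.</p>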
<p>However, the result shows some circular artifacts:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-Xny1lJG_Cgc/T9f0mR2R73I/AAAAAAAAAX8/cJTSfOVvFvs/s1600/d_id_no_fg.png"><img src="http://3.bp.blogspot.com/-Xny1lJG_Cgc/T9f0mR2R73I/AAAAAAAAAX8/cJTSfOVvFvs/s320/d_id_no_fg.png" alt="" width="320" height="319" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Using the photon map directly for indirect diffuse light shows artifacts</td>
</tr>
</tbody>
</table>
<p>To tackle this problem, we can either increase the number of photons to a very high number or perform a final gather step. In the final gather step, we shoot a number of final gather rays from the point being shaded in random directions over the hemisphere above that point.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-vgb5P7NDD4c/T9f7JbTPF4I/AAAAAAAAAYc/Enpi3xO-AKs/s1600/finalGather.png"><img src="http://3.bp.blogspot.com/-vgb5P7NDD4c/T9f7JbTPF4I/AAAAAAAAAYc/Enpi3xO-AKs/s320/finalGather.png" alt="" width="320" height="294" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Final gather rays are cast from every shading position</td>
</tr>
</tbody>
</table>
<p>When a final gather ray hits another surface, the photon map is queried just like before, and the reflected radiance from that surface becomes the incoming radiance at the shading pixel. Using Monte Carlo integration, the reflected radiance at the shading pixel can be calculated by sampling the final gather rays. Here is the final result:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s1600/d_id_fg.png"><img src="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s320/d_id_fg.png" alt="" width="320" height="319" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Direct light + Indirect light, with final gather</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s1600/id_fg.png"><img src="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s320/id_fg.png" alt="" width="320" height="320" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Indirect light only, with final gather</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
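<p>The Monte Carlo integration over the final gather rays can be sketched in Python as below. This is a minimal illustration under stated assumptions: the surface is Lambertian, directions are drawn with cosine-weighted sampling in local shading space (the post only says random directions, so the sampling scheme is my choice here), and <em>incoming_radiance</em> is a hypothetical callback standing in for tracing the ray and querying the photon map at the hit point.</p>

```python
import math
import random


def cosine_sample_hemisphere(rng):
    """Cosine-weighted direction about +z in local shading space (pdf = cos / pi)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))


def final_gather(albedo, incoming_radiance, n_rays, rng=random):
    """Monte Carlo estimate of diffuse reflected radiance from n_rays gather rays."""
    total = 0.0
    for _ in range(n_rays):
        direction = cosine_sample_hemisphere(rng)
        # incoming_radiance stands in for: trace the ray, then query the photon map at the hit.
        total += incoming_radiance(direction)
    # (albedo / pi) * L * cos / (cos / pi) simplifies to albedo * L per sample.
    return albedo * total / n_rays
```

<p>With cosine-weighted sampling the cosine and pi terms cancel, so each sample contributes just the albedo times the incoming radiance. More rays reduce the noise at the cost of more photon map queries.</p>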
<p><strong>Conclusion</strong></p>
<p>In this post, the steps to implement photon mapping are briefly described. It is a 2-pass approach: the photon map pass builds a photon map, stored as a k-d tree, representing the indirect lighting data, and the render pass uses the photon map to compute the final image. In the next part, I will describe how to use the photon map to bake a light map for real time applications.</p>
<p><strong>References</strong></p>
<p>A Practical Guide to Global Illumination using Photon Maps: <a href="http://nameless.cis.udel.edu/class_data/cg/jensen_photon_mapping_tutorial.pdf">http://nameless.cis.udel.edu/class_data/cg/jensen_photon_mapping_tutorial.pdf</a></p>
<p>Physically Based Rendering: <a href="http://www.pbrt.org/">http://www.pbrt.org/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why I went back into the studio&#8230;&#8230;</title>
		<link>http://www.altdevblogaday.com/2012/05/11/why-i-went-back-into-the-studio/</link>
		<comments>http://www.altdevblogaday.com/2012/05/11/why-i-went-back-into-the-studio/#comments</comments>
		<pubDate>Fri, 11 May 2012 08:26:29 +0000</pubDate>
		<dc:creator>Kevin Dent</dc:creator>
				<category><![CDATA[#AltDev Updates]]></category>
		<category><![CDATA[#AltDevConf]]></category>
		<category><![CDATA[#AltDevPodcast]]></category>
		<category><![CDATA[#AltDevUnconference]]></category>
		<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Audio]]></category>
		<category><![CDATA[Bizdev]]></category>
		<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Game design]]></category>
		<category><![CDATA[General Interest]]></category>
		<category><![CDATA[Guest Post]]></category>
		<category><![CDATA[Production]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Track]]></category>
		<category><![CDATA[Protected]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[UI and UX]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Visual Arts]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26204</guid>
		<description><![CDATA[<p>I LOVE working in the studio, I really do. I love the freedom it affords me. I love trying to create games that I want to play!</p>
<p>I also really love having a cerebral cortex, so I left the studio life and learned biz magic.</p>
<p><a href="http://www.altdevblogaday.com/2012/05/11/why-i-went-back-into-the-studio/" class="more-link">Read more on Why I went back into the studio&#8230;&#8230;&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I LOVE working in the studio, I really do. I love the freedom it affords me. I love trying to create games that I want to play!</p>
<p>I also really love having a cerebral cortex, so I left the studio life and learned biz magic.</p>
<p>I make a really good living in biz, I LOVE doing what I do now; I get to work with amazing people, I get to talk to amazing people. For the love of god, I spoke to the creator of Fruit Ninja tonight to have &#8220;chat&#8221;. How cool is that? Dude totally rocks btw.</p>
<p>My gig allows me to talk to my heroes. Seriously, I love playing video games that much.</p>
<p>The most common thing people say to me is &#8220;&#8230;.wow, you are a business guy and you love playing video games?&#8221;</p>
<p>When I stepped out of the studio, I made myself the pledge that I would only work on titles that touched me deeply. I would only work on games that I personally wanted to play. I would be way richer if I just kissed ass and decided to suck it up for the lord, god, almighty dollar!</p>
<p>If I do say so myself, I was pretty decent on the creative side of things too, but to be brutally honest; I was running a studio that sucked at business.</p>
<p>So one day I stopped. One day I decided that I would step out of the studio. I put a 22 year old in charge of it and for 7 mobile games -feature phones- he was shit. Then he just flipped the page and was brilliant.</p>
<p>Recently, I met a guy called Jason Brice. He sent me @ messages constantly, he was really cool and then one day I saw the maps he made in an FPS.</p>
<p>OMFG</p>
<p>They were great!</p>
<p>I looked at them -I hated the crane on the harbor map- BUT I loved the game itself.</p>
<p>Jason was creating the game that I wanted to play.</p>
<p>His view was that there were way too many layers between the player and AK47&#8242;ing another guy in the face.</p>
<p>He had me at &#8220;hello&#8221;.</p>
<p>So as I play this game, I talk about it, I reveal things about it and then I talk about it again.</p>
<p>Simply put, THIS IS GAMING!</p>
<p>I listen to an average of 14 game pitches a week, this was one of the first game pitches that I have seen that melted my resolve. Here I thought that I knew everything and these noobz are teaching me how to love video gaming again.</p>
<p>I want to be fair, I want to be honest and I do not want to be a prick.</p>
<p>BUT I am sick of the modern day FPS titles or as I like to call them &#8220;We just came up with another feature that allows us to get you to buy another sequel&#8221;&#8230;&#8230;&#8230;&#8230;.. ok that is a long title.</p>
<p>I am sick that on NPD day that we all shiver with anticipation at how many people bought our game. Here is a novel fucking idea, I am sick with anticipation at the thought of how many people enjoyed our game.</p>
<p>The game is ReKoil BTW. BUY IT PLUX</p>
<p>I am my most vulnerable when I am sitting in front of a metrics screen looking at the numbers, asking, wanting, no begging video gamers to like my title.</p>
<p>That is weak sauce, I want them to like me. I want them, no I crave that they like me!</p>
<p>It is the vulnerable essence of every video game maker. It is that vulnerability that allows me to exist.</p>
<p>This very insecurity, allows me to make games that I want to play, this insecurity allows Jason and the team to jump off a cliff and trust the fact that someone on the team will catch him.</p>
<p>Guess what? When someone that you consider a blood friend jumps off a cliff; you always catch them.</p>
<p>I am proud of what we are making today, I love that we are participating in the conversation! Will we beat Black-Op&#8217;s 2? God no! Their budget is 60X what people say that we are worth, but we will be participating in the conversation.</p>
<p>We are basically fighting the good fight, we are throwing punches and getting the shit kicked out of us. BUT we are Marty McFly and we will knock Biff the fuck out.</p>
<p>Am I desperate? FUCK YEAH!</p>
<p>Just tonight I sent this email:</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>From:</strong> Kevin Dent [mailto:<a href="mailto:kevin@XXX.XXXX">kevin@XXX.XXXX</a>]<br />
<strong>Sent:</strong> Thursday, May 10, 2012 8:50 PM<br />
<strong>To:</strong> &#8216;Andy McNamara&#8217;<br />
<strong>Subject:</strong> Front cover</p>
<p>Hi Andy,</p>
<p>Who do I have to screw to get the front cover for ReKoil?</p>
<p>Cheers,</p>
<p>Kevin</p>
<p>&nbsp;</p>
<p>As of today, we are not on Kickstarter, we are totally self-funded and we are totally throwing ourselves under the bus in an attempt to make a better game.</p>
<p>Let me be clear, Andy is an amazing person and I love him dearly. He is smart, conscientious and endearing. BUT Game Informer is owned by Game Stop and those dudes are hardcore publisher fuckers, there is zero chance of us getting the cover.</p>
<p>There is no way that, me, you or your freak parents will get us on the front cover of GI and nor should it. Brilliant games get on there 12 a year at least.</p>
<p>That said, it was worth a shot, we are living in the era of the indie.</p>
<p>There have been so many amazing titles in the last twelve months made by people with way more talent than me.</p>
<p>I rejoice at the next gen of game creators.</p>
<p>I am humbled by them.</p>
<p>But the truth is, that Jason Brice, asked me to jump off a cliff and I made the leap.</p>
<p>Who will catch me?</p>
<p>Kevin</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/05/11/why-i-went-back-into-the-studio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generating Uniformly Distributed Points on Sphere</title>
		<link>http://www.altdevblogaday.com/2012/05/03/generating-uniformly-distributed-points-on-sphere/</link>
		<comments>http://www.altdevblogaday.com/2012/05/03/generating-uniformly-distributed-points-on-sphere/#comments</comments>
		<pubDate>Thu, 03 May 2012 13:47:31 +0000</pubDate>
		<dc:creator>Jaewon Jung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[points on sphere]]></category>
		<category><![CDATA[random sampling]]></category>
		<category><![CDATA[sphere]]></category>
		<category><![CDATA[stratified sampling]]></category>
		<category><![CDATA[uniform distribution]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25939</guid>
		<description><![CDATA[<p>Recently, while I was working on a screen-space shader effect, I had to do some <strong>random sampling over the surface of a sphere</strong>. Effective sampling requires <strong>a uniform distribution of samples</strong>. After some quick googling, I found a way to generate uniformly distributed samples ([<em>1</em>]), and it gave decent results for my application. But, still unsure whether that was the ideal way, I did some proper research on it later. The following is the result of that short research.</p>
<p><a href="http://www.altdevblogaday.com/2012/05/03/generating-uniformly-distributed-points-on-sphere/" class="more-link">Read more on Generating Uniformly Distributed Points on Sphere&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Recently, while I was working on a screen-space shader effect, I had to do some <strong>random sampling over the surface of a sphere</strong>. Effective sampling requires <strong>a uniform distribution of samples</strong>. After some quick googling, I found a way to generate uniformly distributed samples ([<em>1</em>]), and it gave decent results for my application. But, still unsure whether that was the ideal way, I did some proper research on it later. The following is the result of that short research.</p>
<p>Usually, in graphics applications, one can limit the problem to three-dimensional space. In that case, there are four possible approaches, all of which guarantee a uniform distribution (BTW, as for what &#8216;uniform distribution&#8217; exactly means, [6] has some explanations). If n-dimensional support is required, one is out, so three remain. Let&#8217;s take stock of each.</p>
<h4>Rejection sampling ([<em>2</em>][<em>4</em>][<em>5</em>])</h4>
<p>One simple way is something called &#8216;rejection sampling&#8217;. For each of the x, y and z coordinates, choose a random value uniformly distributed in [-1, 1]. If the length of the resulting vector is greater than one, reject it and try again; otherwise, normalize it so that it lies on the surface of the sphere. Obviously, this method can be generalized to n dimensions. But <strong>the higher the dimension gets, the higher the rejection rate gets, so the less efficient the technique becomes</strong>.</p>
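<p>As a rough illustration of the steps above, here is a minimal Python sketch (not the C++ code linked later in this post). The function name <code>sample_sphere_rejection</code> is made up for this example. In 3D the acceptance rate is the ratio of the unit ball volume to the cube volume, pi/6, or roughly 52%.</p>

```python
import math
import random


def sample_sphere_rejection(rng=random):
    """Return one point on the unit sphere using rejection sampling.

    Hypothetical helper for illustration only.
    """
    while True:
        # Pick a point uniformly inside the cube [-1, 1]^3.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        z = rng.uniform(-1.0, 1.0)
        length_sq = x * x + y * y + z * z
        # Reject points outside the unit ball (and the degenerate origin).
        if length_sq > 1.0 or length_sq == 0.0:
            continue
        # Project the accepted point onto the sphere surface.
        inv_len = 1.0 / math.sqrt(length_sq)
        return (x * inv_len, y * inv_len, z * inv_len)
```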
<h4>Normal deviate ([<em>2</em>][<em>5</em>])</h4>
<p>This technique chooses x, y and z from <strong>a normal distribution of mean 0 and variance 1</strong>. Then normalize the resulting vector and that&#8217;s it. [<em>2</em>] shows why this method can generate a uniform distribution over a sphere. In short,</p>
<blockquote><p><em>It works because the vector chosen (before normalization) has a density that depends only on the distance from the origin.</em></p></blockquote>
<p>as [<em>5</em>] explains. This also generalizes to n dimensions without a hassle.</p>
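<p>A possible sketch of the normal deviate technique in Python, assuming the standard library&#8217;s <code>random.gauss</code> as the source of normal deviates. The function name and the <code>dim</code> parameter are illustrative choices, added to show that the same code covers n dimensions.</p>

```python
import math
import random


def sample_sphere_normal(dim=3, rng=random):
    """Return one point on the unit sphere in `dim` dimensions.

    Hypothetical helper for illustration only.
    """
    while True:
        # Each coordinate is drawn from a normal distribution (mean 0, variance 1).
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        length = math.sqrt(sum(c * c for c in v))
        # A zero vector is practically impossible, but it cannot be normalized.
        if length > 0.0:
            return [c / length for c in v]
```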
<h4>Trigonometry method ([<em>1</em>][<em>3</em>][<em>4</em>][<em>5</em>])</h4>
<p>This one works <strong>only for a sphere in three-dimensional space</strong> (called a 2-sphere in the literature, which means it has two degrees of freedom), but it is the easiest one to grasp intuitively. [<em>1</em>] nicely explains why it works from Archimedes&#8217; theorem:</p>
<blockquote><p><em>The area of a sphere equals the area of every right circular cylinder circumscribed about the sphere excluding the bases.</em></p></blockquote>
<p>The exact steps are as below:</p>
<ul>
<li>Choose z uniformly distributed in [-1,1].</li>
<li>Choose t uniformly distributed on [0, 2*pi).</li>
<li>Let r = sqrt(1-z^2).</li>
<li>Let x = r * cos(t).</li>
<li>Let y = r * sin(t).</li>
</ul>
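<p>The steps listed above translate almost line by line into code. This is a small Python sketch with an illustrative function name, not the shader code I actually used.</p>

```python
import math
import random


def sample_sphere_trig(rng=random):
    """Return one point on the unit sphere using the trigonometry method."""
    z = rng.uniform(-1.0, 1.0)           # z uniformly distributed in [-1, 1]
    t = rng.uniform(0.0, 2.0 * math.pi)  # angle around the z axis
    r = math.sqrt(1.0 - z * z)           # radius of the circle at height z
    return (r * math.cos(t), r * math.sin(t), z)
```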
<p>This is the one I used for my shader effect. Since I had to use a very small number of samples for the sake of performance, I used stratified sampling with this method. <strong>A straightforward extension to stratified sampling is another advantage of this technique</strong>.</p>
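<p>One way the stratified variant could look is sketched below. It assumes a simple jittered grid over the (z, t) parameter rectangle; the grid sizes <code>n_z</code> and <code>n_t</code> and the function name are assumptions made for this example, not details of my shader.</p>

```python
import math
import random


def stratified_sphere_samples(n_z, n_t, rng=random):
    """Return n_z * n_t points on the unit sphere, one per (z, t) grid cell."""
    samples = []
    for i in range(n_z):
        for j in range(n_t):
            # Jitter the sample inside cell (i, j) of the parameter grid.
            z = -1.0 + 2.0 * (i + rng.random()) / n_z
            t = 2.0 * math.pi * (j + rng.random()) / n_t
            # max() guards against tiny negative values from rounding.
            r = math.sqrt(max(0.0, 1.0 - z * z))
            samples.append((r * math.cos(t), r * math.sin(t), z))
    return samples
```

<p>Because equal steps in z cover equal areas on the sphere (Archimedes&#8217; theorem again), every cell of the grid maps to a patch of the same area, so each patch receives exactly one sample.</p>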
<h4>Coordinate approach ([<em>2</em>][<em>3</em>][<em>5</em>])</h4>
<p>The last one is applicable to general n dimensions, and [<em>2</em>] explains its quite math-heavy derivation in detail. This technique first gets the distribution of a single coordinate of a uniformly distributed point on the N-sphere. Then, it recursively gets the distribution of the next coordinate over the (N-1)-sphere, and so on. Fortunately, <strong>for the usual 3D space (i.e. the 2-sphere), the distribution of a coordinate is uniform, and one can do rejection sampling in 2D for the remaining 1-sphere (i.e. a circle).</strong> The exact way is explained in [<em>5</em>] as a variation of the trigonometry method.</p>
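<p>For the 3D case, my reading of the description above can be sketched as follows: pick z uniformly, then pick a direction on the remaining circle by 2D rejection sampling instead of calling sin and cos. This is an interpretation for illustration and may differ in detail from the version given in [<em>5</em>]; the function name is made up.</p>

```python
import math
import random


def sample_sphere_coordinate(rng=random):
    """Return one point on the unit sphere: uniform z plus a 2D rejection step."""
    z = rng.uniform(-1.0, 1.0)
    r = math.sqrt(1.0 - z * z)  # radius of the circle at height z
    while True:
        # Rejection sampling in the square [-1, 1]^2 for a direction on the circle.
        a = rng.uniform(-1.0, 1.0)
        b = rng.uniform(-1.0, 1.0)
        s = a * a + b * b
        if s > 1.0 or s == 0.0:
            continue
        # Scale the accepted 2D vector to length r.
        scale = r / math.sqrt(s)
        return (a * scale, b * scale, z)
```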
<h3>Code and Pictures</h3>
<p>Even if you haven&#8217;t fully got it all up to this point, don&#8217;t worry. The source code will fill in the gaps in your understanding. You can find my <strong>naive C++ implementations</strong> of the techniques above here: <a href="http://ideone.com/oYEVR">http://ideone.com/oYEVR</a></p>
<p>And some pretty pictures of random points generated by each method:</p>
<div class="wp-caption alignnone" style="width: 510px"><img src="http://i1089.photobucket.com/albums/i342/all2one/scriptogram_images/rejection_sampling.jpg" alt="500 points by rejection sampling" width="500" height="500" /><p class="wp-caption-text">500 points by rejection sampling</p></div>
<div class="wp-caption alignnone" style="width: 510px"><img src="http://i1089.photobucket.com/albums/i342/all2one/scriptogram_images/normal_deviate.jpg" alt="500 points by normal deviate" width="500" height="500" /><p class="wp-caption-text">500 points by normal deviate</p></div>
<div class="wp-caption alignnone" style="width: 510px"><img src="http://i1089.photobucket.com/albums/i342/all2one/scriptogram_images/trig_method.jpg" alt="500 points by trigonometry method" width="500" height="500" /><p class="wp-caption-text">500 points by trigonometry method</p></div>
<div class="wp-caption alignnone" style="width: 510px"><img src="http://i1089.photobucket.com/albums/i342/all2one/scriptogram_images/coordinate_approach.jpg" alt="500 points by coordinate approach" width="500" height="500" /><p class="wp-caption-text">500 points by coordinate approach</p></div>
<p>BTW, all the images above were plotted by <a href="http://math.exeter.edu/rparris/winplot.html"><strong>Winplot</strong></a>.</p>
<h3><a href="http://www.urbandictionary.com/define.php?term=tl%3Bdr" target="_blank">TL;DR</a></h3>
<p>Just use the trigonometry method above and add stratified sampling, if necessary. That&#8217;ll be enough for most cases. ;)</p>
<h3>References</h3>
<ol>
<li><a href="http://repository.upenn.edu/cgi/viewcontent.cgi?article=1188&amp;context=cis_reports">http://repository.upenn.edu/cgi/viewcontent.cgi?article=1188&amp;context=cis_reports</a></li>
<li><a href="http://www-alg.ist.hokudai.ac.jp/~jan/randsphere.pdf">http://www-alg.ist.hokudai.ac.jp/~jan/randsphere.pdf</a></li>
<li><a href="http://mathworld.wolfram.com/SpherePointPicking.html">http://mathworld.wolfram.com/SpherePointPicking.html</a></li>
<li><a href="http://cgafaq.info/wiki/Uniform_random_points_on_sphere">http://cgafaq.info/wiki/Uniform_random_points_on_sphere</a></li>
<li><a href="http://www.math.niu.edu/~rusin/known-math/96/sph.rand">http://www.math.niu.edu/~rusin/known-math/96/sph.rand</a></li>
<li><a href="http://www.math.niu.edu/~rusin/known-math/95/sphere.faq">http://www.math.niu.edu/~rusin/known-math/95/sphere.faq</a></li>
</ol>
<p>(This article has also been posted to <a href="http://scriptogr.am/jj">my personal blog</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/05/03/generating-uniformly-distributed-points-on-sphere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Rasterizer Part 2</title>
		<link>http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/</link>
		<comments>http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/#comments</comments>
		<pubDate>Sun, 29 Apr 2012 12:48:13 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[maths]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software rasterizer]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25741</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html">previous post</a>: after filling the triangle with the scan line or half-space algorithm, we also need to interpolate the vertex attributes across the triangle so that we have texture coordinates or depth at every pixel. However, we cannot directly interpolate those attributes in screen space, because the projection transform followed by the perspective division is not an <a href="http://en.wikipedia.org/wiki/Affine_transformation">affine transformation</a> (i.e. after the transformation, the mid-point of a line segment is no longer the mid-point). Doing so results in distortion, and this artifact is even more noticeable when the triangle is large:</p>
<p><a href="http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/" class="more-link">Read more on Software Rasterizer Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html">previous post</a>: after filling the triangle with the scan line or half-space algorithm, we also need to interpolate the vertex attributes across the triangle so that we have texture coordinates or depth at every pixel. However, we cannot directly interpolate those attributes in screen space, because the projection transform followed by the perspective division is not an <a href="http://en.wikipedia.org/wiki/Affine_transformation">affine transformation</a> (i.e. after the transformation, the mid-point of a line segment is no longer the mid-point). Doing so results in distortion, and this artifact is even more noticeable when the triangle is large:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-XOR93SRDUew/T4loMyobapI/AAAAAAAAAVc/bBYSu-0cIrg/s1600/interpolateInScrSpace.png"><img src="http://4.bp.blogspot.com/-XOR93SRDUew/T4loMyobapI/AAAAAAAAAVc/bBYSu-0cIrg/s320/interpolateInScrSpace.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">interpolate in screen space</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-zIhlNAMPUU4/T4loScmVUgI/AAAAAAAAAVk/GzOogFZwhnM/s1600/interpolatePerspectiveCorrect.png"><img src="http://3.bp.blogspot.com/-zIhlNAMPUU4/T4loScmVUgI/AAAAAAAAAVk/GzOogFZwhnM/s320/interpolatePerspectiveCorrect.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">perspective correct interpolation</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">Condition for linear interpolation</span></p>
<p>When interpolating the attributes in a linear way, we are saying that given a set of vertices, <em>v</em><span class="Apple-style-span" style="font-size: xx-small"><em>i</em></span> (where i is any integer&gt;=0) with a set of attributes <em>a</em><em><span class="Apple-style-span" style="font-size: xx-small">i</span></em> (such as texture coordinates), we have a function mapping a vertex to the corresponding attributes, i.e.</p>
<div style="text-align: center"><em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">i</span></em>)= <em>a</em><em><span class="Apple-style-span" style="font-size: xx-small">i</span></em></div>
<p>Say, to interpolate a vertex inside a triangle in a linear way, the function <em>f</em> needs to have the following property:</p>
<div style="text-align: center"><em>f</em>(<em>t<span class="Apple-style-span" style="font-size: xx-small">0</span></em> *<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">0</span></em> + <em>t<span class="Apple-style-span" style="font-size: xx-small">1</span></em> *<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">1</span></em> + <em>t<span class="Apple-style-span" style="font-size: xx-small">2</span></em> *<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">2</span></em> ) = <em>t<span class="Apple-style-span" style="font-size: xx-small">0</span></em> * <em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">0</span></em>) + <em>t<span class="Apple-style-span" style="font-size: xx-small">1</span></em> * <em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">1</span></em>) + <em>t<span class="Apple-style-span" style="font-size: xx-small">2</span></em> * <em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">2</span></em>)</div>
<div style="text-align: right"><span class="Apple-style-span" style="font-size: x-small"><span class="Apple-style-span">, for any </span><em>t0</em>, <em>t1</em>, <em>t2</em><span class="Apple-style-span"> where <em>t0 </em></span>+ <em>t1 </em>+ <em>t2</em>=1</span></div>
<p>which means that we can calculate the interpolated attributes using the same weights <em>t<span class="Apple-style-span" style="font-size: xx-small">i</span></em> used for interpolating the vertex position. A function with the above property is an <a href="http://en.wikipedia.org/wiki/Affine_transformation">affine function</a> of the following form:</p>
<div style="text-align: center"><em>f</em>(<em>x</em>)= <em>Ax</em> + <em>b</em></div>
<div style="text-align: right"><span class="Apple-style-span" style="font-size: x-small">, where <em>A</em> is a matrix, <em>x</em> and <em>b</em> are vectors</span></div>
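<p>To make the condition concrete, here is a small numeric check in Python. The matrix, offset, vertices and weights used with it are arbitrary values chosen for illustration, and the helper names <code>affine</code> and <code>combine</code> are made up for this sketch.</p>

```python
def affine(A, b, x):
    """Evaluate f(x) = A x + b for a matrix A (list of rows) and vectors b, x."""
    return [sum(A[r][c] * x[c] for c in range(len(x))) + b[r]
            for r in range(len(b))]


def combine(weights, vectors):
    """Return the weighted sum t0*v0 + t1*v1 + ... of the given vectors."""
    size = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors))
            for k in range(size)]
```

<p>With weights that sum to one, <code>affine(A, b, combine(t, verts))</code> matches <code>combine(t, [affine(A, b, v) for v in verts])</code>. If the weights do not sum to one, the two sides differ by a multiple of <em>b</em>, which is why the constraint on the weights matters.</p>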
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">Depth interpolation</span></p>
<p>When a vertex is projected from view space to normalized device coordinates (NDC), we have the following relations (from the ratios of similar triangles) between view space and NDC space:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-cI7nS4ZOEXY/T4mDh9YNMNI/AAAAAAAAAVs/s0NYz7z_VME/s1600/eqt1_2.png"><img src="http://2.bp.blogspot.com/-cI7nS4ZOEXY/T4mDh9YNMNI/AAAAAAAAAVs/s0NYz7z_VME/s320/eqt1_2.png" alt="" width="320" height="88" border="0" /></a></div>
<p>Substituting equations 1 and 2 into the equation of the plane that the triangle lies on gives:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-39mf9Y9Nq7E/T4mDn7EyE6I/AAAAAAAAAV0/jd-K65xWHmU/s1600/eqt3.png"><img src="http://2.bp.blogspot.com/-39mf9Y9Nq7E/T4mDn7EyE6I/AAAAAAAAAV0/jd-K65xWHmU/s400/eqt3.png" alt="" width="400" height="90" border="0" /></a></div>
<p>&nbsp;</p>
<div>
<div>
<p>So, 1/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> is an affine function of <em>x<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em> and <em>y<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em>, which can be interpolated linearly across screen space (the transform from NDC space to screen space is an affine transform).</p>
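<p>For readers who cannot see the equation images, the argument can be written out roughly as below. The constants c_x and c_y (scale factors coming from the projection) and the plane coefficients A, B, C, D are notation assumed for this sketch and may not match the symbols used in the images.</p>

```latex
\begin{aligned}
x_{ndc} &= c_x \frac{x_{view}}{z_{view}}, \qquad
y_{ndc} = c_y \frac{y_{view}}{z_{view}} \\
0 &= A\,x_{view} + B\,y_{view} + C\,z_{view} + D \\
  &= \frac{A}{c_x}\,x_{ndc}\,z_{view} + \frac{B}{c_y}\,y_{ndc}\,z_{view} + C\,z_{view} + D \\
\frac{1}{z_{view}} &= -\frac{A}{c_x D}\,x_{ndc} - \frac{B}{c_y D}\,y_{ndc} - \frac{C}{D}
\qquad (D \neq 0)
\end{aligned}
```

<p>The last line has the form of a constant plus constant multiples of the NDC coordinates, which is exactly an affine function. The condition D &#8800; 0 just says that the plane of the triangle does not pass through the eye.</p>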
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">Attributes interpolation</span></p>
<p>From the last section, we know how to interpolate the depth of a pixel linearly in screen space. The next problem is to interpolate the vertex attributes (e.g. texture coordinates). In view space, we know that those attributes can be interpolated linearly, so they can be calculated by an affine function with the vertex position as its parameter, e.g.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-YN4TbKnFFRo/T4mDwJdsZlI/AAAAAAAAAV8/9KUBzFV_q2A/s1600/uEqt.png"><img src="http://3.bp.blogspot.com/-YN4TbKnFFRo/T4mDwJdsZlI/AAAAAAAAAV8/9KUBzFV_q2A/s320/uEqt.png" alt="" width="320" height="44" border="0" /></a></div>
<p>Similar to interpolating depth, substitute equations 1 and 2 into the above equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-A7XmZcn0wjY/T4mD3I0_Q9I/AAAAAAAAAWE/2Z7GLe-jNWA/s1600/uOverZ.png"><img src="http://4.bp.blogspot.com/-A7XmZcn0wjY/T4mD3I0_Q9I/AAAAAAAAAWE/2Z7GLe-jNWA/s400/uOverZ.png" alt="" width="400" height="121" border="0" /></a></div>
<p>Therefore, <em>u</em>/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> is another affine function of <em>x<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em> and <em>y<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em>, which can be interpolated linearly across screen space. Hence we can interpolate <em>u</em> correctly by first interpolating <em>1</em>/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> and <em>u</em>/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> linearly across screen space, and then dividing the latter by the former per pixel.</p>
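<p>The per-pixel divide can be sketched in a few lines of Python. This is a simplified illustration along a single edge with a screen-space parameter <code>t</code>, not the code of the demo; the function names are made up for this example.</p>

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return a + (b - a) * t


def perspective_correct(u0, z0, u1, z1, t):
    """Interpolate attribute u at screen-space parameter t in [0, 1].

    z0 and z1 are the view-space depths of the two end points.
    """
    inv_z = lerp(1.0 / z0, 1.0 / z1, t)    # 1/z is affine in screen space
    u_over_z = lerp(u0 / z0, u1 / z1, t)   # u/z is affine in screen space
    return u_over_z / inv_z                # divide per pixel to recover u
```

<p>For example, with u going from 0 at depth 1 to 1 at depth 3, the screen-space mid-point gets u = 0.25 rather than the 0.5 that naive screen-space interpolation would give, because the mid-point on screen corresponds to a point nearer to the closer vertex.</p>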
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">The last problem&#8230;</span></p>
</div>
<div>
<p>Now, we know that the reciprocal of the view space depth and the vertex attributes divided by the view space depth can be interpolated linearly across screen space. But during the rasterization stage, we only have vertices in homogeneous coordinates (the vertices have already been transformed by the projection matrix), so how can we get <em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> to do the perspective correct interpolation?</p>
<p>Consider the projection matrix (I use the D3D one, but the same idea applies to OpenGL):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-KuQCpcDZdw0/T4mD-DIDFKI/AAAAAAAAAWM/HAckGRg5sDg/s1600/proj.png"><img src="http://4.bp.blogspot.com/-KuQCpcDZdw0/T4mD-DIDFKI/AAAAAAAAAWM/HAckGRg5sDg/s200/proj.png" alt="" width="200" height="116" border="0" /></a></div>
<div class="separator" style="clear: both;text-align: -webkit-auto"></div>
<p>After transforming the vertex position, the <em>w</em>-coordinate will be the view space depth!</p>
<div style="text-align: center">i.e. <em><strong>w-</strong></em><em>homogenous </em>=<em> </em><em><strong>z</strong><span class="Apple-style-span" style="font-size: xx-small">view </span></em></div>
<p>Looking at the matrix again and considering the transformed <em>z</em>-coordinate, it will be of the form:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-_GrTrCVKp-s/T4mEEDH6IFI/AAAAAAAAAWU/4V5nVfHFB68/s1600/zHomo.png"><img src="http://2.bp.blogspot.com/-_GrTrCVKp-s/T4mEEDH6IFI/AAAAAAAAAWU/4V5nVfHFB68/s1600/zHomo.png" alt="" border="0" /></a></div>
<p>After transforming to the NDC,</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-dBLPwyegfKA/T4mEJ30EpMI/AAAAAAAAAWc/xHMvLbRdMwg/s1600/zNDC.png"><img src="http://3.bp.blogspot.com/-dBLPwyegfKA/T4mEJ30EpMI/AAAAAAAAAWc/xHMvLbRdMwg/s1600/zNDC.png" alt="" border="0" /></a></div>
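<p>So <strong>z-</strong><em>NDC</em> is an affine function of 1/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em>, which means the depth value used for the depth test can be interpolated linearly in screen space directly from <strong>z-</strong><em>NDC</em>.</p>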
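<p>A quick numeric check of this claim, assuming the usual D3D left-handed perspective matrix, where the transformed z is <code>z_view * far / (far - near) - near * far / (far - near)</code> and w is <code>z_view</code>. The near and far parameters and the function names are illustrative.</p>

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return a + (b - a) * t


def z_ndc_d3d(z_view, near, far):
    """NDC depth for a D3D-style projection: a constant plus a constant / z_view."""
    a = far / (far - near)
    b = -near * far / (far - near)
    return a + b / z_view
```

<p>Since <code>z_ndc_d3d</code> is a constant plus a constant times 1/z_view, linearly interpolating it in screen space gives the same result as interpolating 1/z_view first and converting afterwards. It maps the near plane to 0 and the far plane to 1.</p>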
<p><strong><span class="Apple-style-span" style="font-size: large">Demo</span></strong></p>
<p>A JavaScript demo that rasterizes the triangles can be viewed <a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html#softwareRasterizerDemo">here</a> (although it is not optimized&#8230;). The source code can be downloaded <a href="https://sites.google.com/site/simontechblog/home/softwarerasterizer/softwareRasterizer.js">here</a>.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html#softwareRasterizerDemo"><img src="http://2.bp.blogspot.com/-riodEfJnzi4/T5gNhTswj8I/AAAAAAAAAWk/6mmx8r7Ri7I/s320/scrShot.png" alt="" width="320" height="232" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Screen shot of the demo</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
In this post, the steps to linearly interpolate vertex attributes in screen space are described. For rasterizing the depth buffer only (e.g. for occlusion culling), the depth value can be linearly interpolated directly using the z coordinate in NDC space, which is even simpler.</p>
<p><strong>References</strong><br />
[1] <a href="http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/">http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/</a><br />
[2] <a href="http://www.gamedev.net/topic/581732-perspective-correct-depth-interpolation/">http://www.gamedev.net/topic/581732-perspective-correct-depth-interpolation/</a><br />
[3] <a href="http://chrishecker.com/Miscellaneous_Technical_Articles">http://chrishecker.com/Miscellaneous_Technical_Articles</a><br />
[4] <a href="http://en.wikipedia.org/wiki/Affine_transformation">http://en.wikipedia.org/wiki/Affine_transformation</a></p>
<p>&nbsp;</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Rasterizer Part 1</title>
		<link>http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/</link>
		<comments>http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/#comments</comments>
		<pubDate>Sat, 14 Apr 2012 10:05:37 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software rasterizer]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25433</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
A software rasterizer can be used for occlusion culling; some games such as <a href="http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3">Killzone 3</a> use one to cull objects. So I decided to write one myself. The steps are first to transform the vertices to homogeneous coordinates, then clip the triangles to the viewport, and then fill the triangles with interpolated parameters. Note that the clipping should be done in homogeneous coordinates before the perspective division; otherwise, a lot of extra work is needed to clip the triangles properly. This post will explain why clipping should be done before the perspective division.</p>
<p><a href="http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/" class="more-link">Read more on Software Rasterizer Part 1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
A software rasterizer can be used for occlusion culling; some games such as <a href="http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3">Killzone 3</a> use one to cull objects. So I decided to write one myself. The steps are first to transform the vertices to homogeneous coordinates, then clip the triangles to the viewport, and then fill the triangles with interpolated parameters. Note that the clipping should be done in homogeneous coordinates before the perspective division; otherwise, a lot of extra work is needed to clip the triangles properly. This post will explain why clipping should be done before the perspective division.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Points in Homogeneous coordinates</span></strong><br />
In our usual Cartesian coordinate system, we can represent any point in 3D space in the form (<em>X</em>, <em>Y</em>, <em>Z</em>). In homogeneous coordinates, a redundant component <em>w</em> is added, resulting in the form (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>). Multiplying that 4-component vector by any non-zero constant still represents the same point in homogeneous coordinates. To convert a homogeneous point back to our usual Cartesian coordinates, we divide the point by its <em>w</em> component so that <em>w</em> equals one:</p>
<div style="text-align: center">(<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) -&gt; (<em>x/</em><em>w </em>, <em>y/</em><em>w </em>, <em>z/</em><em>w, 1</em>) -&gt; (<em>X</em>, <em>Y</em>, <em>Z</em>)</div>
<p>In the following figure, which shows the <em>x</em>-<em>w</em> plane, a point (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) is transformed back to the Cartesian coordinates (<em>X</em>, <em>Y</em>, <em>Z</em>) by projecting it onto the <em>w</em>=1 plane:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-dfDLBMv6VG8/T3-_4ZA_SYI/AAAAAAAAAT0/yiY9uJC2Skw/s1600/projectTow1.png"><img src="http://2.bp.blogspot.com/-dfDLBMv6VG8/T3-_4ZA_SYI/AAAAAAAAAT0/yiY9uJC2Skw/s320/projectTow1.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 1. projecting point to <em>w</em>=1 plane</td>
</tr>
</tbody>
</table>
<p>The interesting case is when the <em>w</em> component equals zero. Imagine the <em>w</em> component getting smaller and smaller, approaching zero: the coordinates of the point (<em>x/</em><em>w </em>, <em>y/</em><em>w </em>, <em>z/</em><em>w, 1</em>) get larger and larger. When <em>w</em> equals zero, the homogeneous point represents a point at infinity.</p>
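<p>As a rough illustration of the conversion above, here is a minimal Python sketch. The function name <code>to_cartesian</code> and the sample points are made up for this example; it only shows that scaling a homogeneous point leaves the Cartesian result unchanged and that the coordinates blow up as <em>w</em> approaches zero.</p>

```python
def to_cartesian(p):
    """Project a homogeneous point (x, y, z, w) onto the w=1 plane."""
    x, y, z, w = p
    if w == 0:
        # A point with w == 0 lies at infinity and has no Cartesian form.
        raise ValueError("w == 0: point at infinity")
    return (x / w, y / w, z / w)


# Scaling by any non-zero constant still gives the same Cartesian point.
p = (2.0, 4.0, 6.0, 2.0)
q = tuple(-3.0 * c for c in p)
print(to_cartesian(p), to_cartesian(q))

# As w shrinks toward zero, the projected coordinates grow without bound.
for w in (1.0, 0.1, 0.001):
    print(w, to_cartesian((1.0, 0.0, 0.0, w)))
```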
<p><strong><span class="Apple-style-span" style="font-size: large">Line Segments in Homogeneous coordinates</span></strong><br />
In homogeneous coordinates, we can still represent a line segment between two points P<span class="Apple-style-span" style="font-size: xx-small">0</span>= (<em>x</em><span class="Apple-style-span" style="font-size: xx-small">0</span>, <em>y</em><span class="Apple-style-span" style="font-size: xx-small">0</span>, <em>z</em><span class="Apple-style-span" style="font-size: xx-small">0</span>, <em>w</em><span class="Apple-style-span" style="font-size: xx-small">0</span>) and  P<span class="Apple-style-span" style="font-size: xx-small">1</span>= (<em>x</em><span class="Apple-style-span" style="font-size: xx-small">1</span>, <em>y</em><span class="Apple-style-span" style="font-size: xx-small">1</span>, <em>z</em><span class="Apple-style-span" style="font-size: xx-small">1</span>, <em>w</em><span class="Apple-style-span" style="font-size: xx-small">1</span>) in parametric form:</p>
<div style="text-align: center">L= P<span class="Apple-style-span" style="font-size: xx-small">0</span> + t * (P<span class="Apple-style-span" style="font-size: xx-small">1</span>-P<span class="Apple-style-span" style="font-size: xx-small">0</span>),   <span class="Apple-style-span" style="font-size: x-small">where <em>t</em> is within [0, 1]</span></div>
<p>Projected onto the <em>w</em>=1 plane, this gives a line segment like the following:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-hDsxTEcslcM/T3_EqJVVdXI/AAAAAAAAAT8/KmSDSPQQUYg/s1600/internalLine.png"><img src="http://3.bp.blogspot.com/-hDsxTEcslcM/T3_EqJVVdXI/AAAAAAAAAT8/KmSDSPQQUYg/s320/internalLine.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 2. internal line segment</td>
</tr>
</tbody>
</table>
<p>In the above case, the line projected onto <em>w</em>=1 is called an internal line segment.<br />
But what if P<span class="Apple-style-span" style="font-size: xx-small">0</span> and P<span class="Apple-style-span" style="font-size: xx-small">1</span> have coordinates where <em>w</em><span class="Apple-style-span" style="font-size: xx-small">0</span> &lt; 0 and <em>w</em><span class="Apple-style-span" style="font-size: xx-small">1</span> &gt; 0?</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-IS6paGPft8k/T3_GvjeA4uI/AAAAAAAAAUE/Ntx_8hcJ-BE/s1600/externalLine.png"><img src="http://4.bp.blogspot.com/-IS6paGPft8k/T3_GvjeA4uI/AAAAAAAAAUE/Ntx_8hcJ-BE/s320/externalLine.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 3. external line segment</td>
</tr>
</tbody>
</table>
<p>In this case, the result is the external line segment shown in the above figure. The homogeneous line segment has the form L= P<span class="Apple-style-span" style="font-size: xx-small">0</span> + t * (P<span class="Apple-style-span" style="font-size: xx-small">1</span>-P<span class="Apple-style-span" style="font-size: xx-small">0</span>). As the parameter moves from <em>t</em>=0 to <em>t</em>=1, since <em>w</em><span class="Apple-style-span" style="font-size: xx-small">0</span> &lt; 0 and <em>w</em><span class="Apple-style-span" style="font-size: xx-small">1</span> &gt; 0, there exists a point on the homogeneous line where <em>w</em>=0. This point is at infinity when projected onto the <em>w</em>=1 plane, so the projected line segment joining P<span class="Apple-style-span" style="font-size: xx-small">0</span> and P<span class="Apple-style-span" style="font-size: xx-small">1</span> passes through the point at infinity, forming an external line segment.</p>
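<p>To make the external case concrete, the following Python sketch walks along a homogeneous segment whose end points have <em>w</em> of opposite sign. The helper names (<code>lerp</code>, <code>w_zero_parameter</code>) and the sample end points are assumptions made for this illustration only.</p>

```python
def lerp(p0, p1, t):
    """Evaluate L = P0 + t * (P1 - P0) component-wise."""
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


def w_zero_parameter(p0, p1):
    """Return the t in [0, 1] where w crosses zero, or None if it never does."""
    w0, w1 = p0[3], p1[3]
    if w0 == w1:
        return None
    t = w0 / (w0 - w1)
    return t if 0.0 <= t <= 1.0 else None


p0 = (1.0, 0.0, 0.0, -1.0)  # w0 is negative: behind the eye
p1 = (1.0, 0.0, 0.0, 1.0)   # w1 is positive: in front of the eye

print("w = 0 at t =", w_zero_parameter(p0, p1))

# The projected X runs off to minus infinity before the crossing and
# comes back from plus infinity after it: an external line segment.
for t in (0.0, 0.25, 0.75, 1.0):
    x, y, z, w = lerp(p0, p1, t)
    print(t, x / w)
```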
<p>The figure below shows how regions of space are mapped by the perspective projection followed by the division by <em>w</em>:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-g1H9j5WDenc/T3_gSsZyoII/AAAAAAAAAUU/mtunCiIXOVQ/s1600/regionMapping.png"><img src="http://4.bp.blogspot.com/-g1H9j5WDenc/T3_gSsZyoII/AAAAAAAAAUU/mtunCiIXOVQ/s400/regionMapping.png" alt="" width="400" height="153" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 4. region mapping</td>
</tr>
</tbody>
</table>
<div style="margin: 0px">The blue line shows the viewing frustum; there is nothing unusual about the region in front of the eye. The unusual part is the points behind the eye. After the perspective transformation and the projection onto the <em>w</em>=1 plane, those points end up in front of the eye too. So a line segment with one end point in front of the eye and the other behind it is transformed into an external line segment after the perspective division.</div>
<div style="margin: 0px"><strong><br />
</strong></div>
<div style="margin: 0px"><strong><span class="Apple-style-span" style="font-size: large">Triangles in Homogeneous coordinates</span></strong></div>
<div style="margin: 0px">In the last section, we saw that there are internal and external line segments after the perspective division. Likewise, there are internal and external triangles. Internal triangles are the ones we usually see. An external triangle must be formed by 1 internal line segment and 2 external line segments:</div>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-T9LhG5LGq9Q/T3_T528h-TI/AAAAAAAAAUM/bNTggLtACBg/s1600/externalTri.png"><img src="http://4.bp.blogspot.com/-T9LhG5LGq9Q/T3_T528h-TI/AAAAAAAAAUM/bNTggLtACBg/s320/externalTri.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 5. external triangle</td>
</tr>
</tbody>
</table>
<p>In the above figure, the shaded area represents the external triangle formed by the points P<span class="Apple-style-span" style="font-size: xx-small">0</span>, P<span class="Apple-style-span" style="font-size: xx-small">1</span> and P<span class="Apple-style-span" style="font-size: xx-small">2</span>. External triangles of this kind may appear after the perspective projection transform, and they occur in the real world too:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-OlJYIeUl3cs/T4BbwEDiLoI/AAAAAAAAAUk/v9IdMFQoKTM/s1600/clipTriPhoto.JPG"><img src="http://1.bp.blogspot.com/-OlJYIeUl3cs/T4BbwEDiLoI/AAAAAAAAAUk/v9IdMFQoKTM/s320/clipTriPhoto.JPG" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">an external triangle in real world</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-S5jR7k5tLLU/T4Bb1Zp1OCI/AAAAAAAAAUs/ygOz9t773xE/s1600/clipTriPhotoFull.JPG"><img src="http://3.bp.blogspot.com/-S5jR7k5tLLU/T4Bb1Zp1OCI/AAAAAAAAAUs/ygOz9t773xE/s320/clipTriPhotoFull.JPG" alt="" width="240" height="320" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">the full triangle of the left photo</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>The left photo shows an external triangle with one of its vertices far behind the camera. The right photo shows the full view of the triangle, and the cross marks the position of the camera from which the left photo was taken.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Triangle clipping</span></strong><br />
To avoid external triangles, lines and triangles should be clipped in homogeneous coordinates before dividing by the <em>w</em> component. The homogeneous point (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) is tested against the following inequalities:</p>
<div style="text-align: center">(-<em>w </em>&lt;= <em>x </em>&lt;= <em>w</em>) &amp;&amp;   &#8212;&#8212; inequality. 1</div>
<div style="text-align: center">(-<em>w </em>&lt;= <em>y </em>&lt;= <em>w</em>) &amp;&amp;   &#8212;&#8212; inequality. 2</div>
<div style="text-align: center">(-<em>w </em>&lt;= <em>z </em>&lt;= <em>w</em> ) &amp;&amp;   &#8212;&#8212; inequality. 3<br />
<em>w </em>&gt; 0    &#8212;&#8212; inequality. 4</div>
<div>
<div style="text-align: center"></div>
<p>(For D3D the <em>z</em> clipping inequality is 0<em> </em>&lt;= <em>z </em>&lt;= <em>w</em>; it depends on how the normalized device coordinates are defined.) Clipping by inequalities 1, 2 and 3 effectively removes all points with <em>w </em>&lt; 0, because if <em>w </em>&lt; 0, say <em>w </em>= -3, inequality 1 becomes:</p>
<div style="text-align: center">3 &lt;= x &lt;= -3     =&gt;     3 &lt;= -3</div>
<div style="text-align: center"></div>
<div style="text-align: left">which is impossible. However, the point (0, 0, 0, 0) still satisfies the first 3 inequalities and can form external cases, so inequality 4 is added. Consider a homogeneous line with one end at (0, 0, 0, 0); it is equal to:</div>
<div style="text-align: left"></div>
<div style="text-align: center">L= (0, 0, 0, 0) + t * [ (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) - (0, 0, 0, 0) ] = t * (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>)</div>
<p>which represents only a single point in homogeneous coordinates. So a triangle (after being clipped by inequalities 1, 2 and 3) that has one or two vertices with <em>w</em>=0 degenerates into either a line or a point, which can be discarded. Hence, after clipping, no external triangles are produced when dividing by the <em>w</em> component. Clipping a triangle against a plane yields either 1 or 2 triangles, depending on whether 2 or 1 of its vertices lie outside the clipping plane:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-C4wcagL7YSQ/T3_4q_4-PQI/AAAAAAAAAUc/x3uXbV63UFU/s1600/clipInternalTri.png"><img src="http://3.bp.blogspot.com/-C4wcagL7YSQ/T3_4q_4-PQI/AAAAAAAAAUc/x3uXbV63UFU/s320/clipInternalTri.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 6. clipping internal triangles</td>
</tr>
</tbody>
</table>
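<p>Clipping against a single plane can be sketched as follows. This is a Sutherland-Hodgman style loop written in Python for illustration, not the code of the rasterizer described here; the names <code>clip_polygon</code>, <code>fan_triangulate</code> and <code>right_plane</code> are hypothetical. The plane used in the example is the <em>x</em> &lt;= <em>w</em> half of inequality 1, expressed as <em>w</em> - <em>x</em> &gt;= 0.</p>

```python
def clip_polygon(vertices, distance):
    """Clip a convex polygon of homogeneous points against one plane.

    distance(v) must be >= 0 for vertices on the visible side of the plane.
    """
    out = []
    for i, cur in enumerate(vertices):
        prev = vertices[i - 1]  # index -1 wraps around to the last vertex
        d_prev, d_cur = distance(prev), distance(cur)
        if (d_prev >= 0) != (d_cur >= 0):
            # The edge crosses the plane: emit the intersection point,
            # interpolated linearly in homogeneous space.
            t = d_prev / (d_prev - d_cur)
            out.append(tuple(a + t * (b - a) for a, b in zip(prev, cur)))
        if d_cur >= 0:
            out.append(cur)
    return out


def fan_triangulate(polygon):
    """Split a convex polygon into a fan of triangles around its first vertex."""
    return [(polygon[0], polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]


def right_plane(v):
    """Signed distance for the plane x = w (visible side: w - x >= 0)."""
    return v[3] - v[0]


# One vertex outside: the clipped shape is a quad, i.e. two triangles.
one_out = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)]
print(len(fan_triangulate(clip_polygon(one_out, right_plane))))

# Two vertices outside: a single smaller triangle remains.
two_out = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (2.0, 1.0, 0.0, 1.0)]
print(len(fan_triangulate(clip_polygon(two_out, right_plane))))
```

<p>Running the same loop once per clipping plane, feeding each result into the next pass, clips against the whole set of inequalities.</p>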
<p>Then the clipped triangles can be passed to the next stage to be rasterized either by a <a href="http://en.wikipedia.org/wiki/Scanline_rendering">scan line</a> algorithm or by a <a href="http://devmaster.net/forums/topic/1145-advanced-rasterization/">half-space</a> algorithm.</p>
<p>Below is the result of clipping an external triangle with 1 vertex behind the camera.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-0sruXroYV4E/T4Bd8iwNoZI/AAAAAAAAAU8/N4mkoQLP6KY/s1600/sampleClipExtTri.png"><img src="http://1.bp.blogspot.com/-0sruXroYV4E/T4Bd8iwNoZI/AAAAAAAAAU8/N4mkoQLP6KY/s320/sampleClipExtTri.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">clipping external triangle in software rasterizer</td>
</tr>
</tbody>
</table>
<p>Below is another rasterized result:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-cQ3S0rT6wI0/T4Beace2MUI/AAAAAAAAAVE/EQYVbccwg5Q/s1600/sampleDuckFill.png"><img src="http://2.bp.blogspot.com/-cQ3S0rT6wI0/T4Beace2MUI/AAAAAAAAAVE/EQYVbccwg5Q/s320/sampleDuckFill.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">rasterized duck model</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-gfqOLmsJe1o/T4Beejh4EgI/AAAAAAAAAVM/h7a0yanEe3o/s1600/sampleDuckRef.png"><img src="http://2.bp.blogspot.com/-gfqOLmsJe1o/T4Beejh4EgI/AAAAAAAAAVM/h7a0yanEe3o/s320/sampleDuckRef.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">reference of the duck model</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
In this post, the maths behind triangle clipping was explained. Clipping should be done before projecting the homogeneous points onto the <em>w</em>=1 plane, to avoid having to take special care of external triangles. In the next post, I will talk about perspective interpolation and provide the source code (written in JavaScript, drawing to an HTML canvas).</p>
<p>Lastly, special thanks to Fabian Giesen for giving feedback on a draft of this post.</p>
<p><strong>References</strong><br />
[1] <a href="http://research.microsoft.com/pubs/73937/p245-blinn.pdf">http://research.microsoft.com/pubs/73937/p245-blinn.pdf</a><br />
[2] <a href="http://medialab.di.unipi.it/web/IUM/Waterloo/node51.html">http://medialab.di.unipi.it/web/IUM/Waterloo/node51.html</a><br />
[3] <a href="http://kriscg.blogspot.com/2010/09/software-occlusion-culling.html">http://kriscg.blogspot.com/2010/09/software-occlusion-culling.html</a><br />
[4] <a href="http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3">http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3</a><br />
[5] <a href="http://www.slideshare.net/repii/parallel-graphics-in-frostbite-current-future-siggraph-2009-1860503">http://www.slideshare.net/repii/parallel-graphics-in-frostbite-current-future-siggraph-2009-1860503</a><br />
[6] <a href="http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/">http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/</a><br />
[7] <a href="http://devmaster.net/forums/topic/1145-advanced-rasterization/">http://devmaster.net/forums/topic/1145-advanced-rasterization/</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oxel: A Tool for Occluder Generation</title>
		<link>http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/</link>
		<comments>http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 23:30:27 +0000</pubDate>
		<dc:creator>Nick Darnell</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[Hierarchical Z-Buffer]]></category>
		<category><![CDATA[Occlusion Volumes]]></category>
		<category><![CDATA[Rendering]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=25415</guid>
		<description><![CDATA[<p>It’s been a while since I’ve done an update to my research into occluder generation.  If you need a refresher, take a look at:</p>
<ul>
<li><a href="http://www.nickdarnell.com/2011/06/hierarchical-z-buffer-occlusion-culling-generating-occlusion-volumes/" target="_blank">Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes</a></li>
</ul>
<p><a href="http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/" class="more-link">Read more on Oxel: A Tool for Occluder Generation&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>It’s been a while since I’ve done an update to my research into occluder generation.  If you need a refresher, take a look at:</p>
<ul>
<li><a href="http://www.nickdarnell.com/2011/06/hierarchical-z-buffer-occlusion-culling-generating-occlusion-volumes/" target="_blank">Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes</a></li>
<li><a href="http://www.nickdarnell.com/2011/09/robust-inside-and-outside-solid-voxelization/" target="_blank">Robust Inside and Outside Solid Voxelization</a></li>
</ul>
<p>Let’s start with the newest and most important information&#8230;</p>
<h3>Update 4/13/2012</h3>
<p>The project is now hosted on BitBucket at <a href="http://bitbucket.org/NickDarnell/oxel/overview" target="_blank">http://bitbucket.org/NickDarnell/oxel/overview</a>. Feel free to contribute or just follow along :)</p>
<h3>It’s a Tool Now!</h3>
<p><img style="margin: 5px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/oxel_v1.png" alt="oxel_v1" width="590" height="592" border="0" /></p>
<h4>Download</h4>
<p><a href="http://bitbucket.org/NickDarnell/oxel/downloads" target="_blank">Oxel 1.0.0 &#8211; Win32.zip</a></p>
<h4>Requirements</h4>
<ul>
<li><a href="http://www.microsoft.com/download/en/details.aspx?id=17718" target="_blank">.Net 4.0</a></li>
<li><a href="http://www.microsoft.com/download/en/details.aspx?id=5555" target="_blank">VS2010 C++ Runtime</a></li>
</ul>
<h4>Description</h4>
<p>Oxel is a tool for generating occluders – primarily for use with the Hierarchical Z-Buffer method of occlusion culling.  Open an .obj file then go to Build &gt; Voxelize to generate the proxy.  Try it out, let me know what you think.</p>
<p>Several further improvements have been made over the method laid out in the original <a href="http://www.nickdarnell.com/2011/06/hierarchical-z-buffer-occlusion-culling-generating-occlusion-volumes/" target="_blank">Generating Occlusion Volumes</a> post:</p>
<ul>
<li>Retriangulation</li>
<li>Evaluating Occlusion-ness</li>
<li>Filtering Polygons</li>
</ul>
<h3>Retriangulation</h3>
<p>I went back to the CSG article <a href="http://sandervanrossen.blogspot.com/">Sander van Rossen</a> and Matthew Baranowski wrote.  I noticed near the end that they recommended retriangulating the final CSG mesh because their library doesn’t handle it.</p>
<p>The easiest method I found to do the retriangulation was to just use David Eberly’s Wild Magic math library which can do it, <a href="http://www.geometrictools.com/SampleMathematics/Triangulation/Triangulation.html" target="_blank">see here</a>.  Before performing the retriangulation you need to collect all the external/internal edge loops on each surface and then merge collinear edges.</p>
<p>It was definitely worth doing: performing the retriangulation saved about 30% of the triangles.</p>
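<p>The collinear-edge merge that precedes the retriangulation can be sketched roughly as below. This is not Oxel’s code and does not use Wild Magic; it is a hypothetical Python helper (<code>merge_collinear</code>) that assumes each edge loop is a closed list of 2D points on the surface plane with no duplicate or back-tracking vertices.</p>

```python
def merge_collinear(loop, eps=1e-9):
    """Drop vertices of a closed 2D loop that lie on a straight line
    between their two neighbours, merging the adjacent edges."""
    result = []
    n = len(loop)
    for i in range(n):
        ax, ay = loop[i - 1]          # previous vertex (wraps around)
        bx, by = loop[i]              # current vertex
        cx, cy = loop[(i + 1) % n]    # next vertex (wraps around)
        # 2D cross product of the incoming and outgoing edge directions;
        # zero means the three points are collinear.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if abs(cross) > eps:
            result.append(loop[i])
    return result


# A rectangle whose long edges were split by extra vertices.
loop = [(0, 0), (1, 0), (2, 0), (2, 2), (1, 2), (0, 2)]
print(merge_collinear(loop))
```

<p>Fewer vertices per loop means fewer triangles after triangulation, which is where the savings come from.</p>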
<h3>Evaluating Occlusion-ness</h3>
<p>When you’re generating occluder geometry you need to ensure that the volumes you’re adding are useful enough to pay the additional polygon tax they will incur.</p>
<p>Oxel achieves this by measuring the number of pixels written to the color buffer by rendering the original visual mesh into the stencil buffer, and then using the stencil buffer to clip a full screen quad while a “samples passed” hardware query is performed to record the number of pixels written.  This is performed from about 64 different camera angles – then when all the samples are summed together this defines the <strong>ground truth</strong> occlusion of the mesh.</p>
<p>Knowing that information, we can evaluate every new box we plan to add to the final occlusion proxy geometry and ask: does this increase the coverage enough to make it worth it?  The default threshold is set to 3%.  If a new volume does not cover at least 3% of new silhouette pixels, we don’t include it in the final occluder mesh.</p>
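<p>The acceptance test can be sketched like this. It is only an illustration: real coverage comes from the hardware queries described above, whereas here each candidate box is represented by a plain Python set of covered pixel ids, and the 3% is assumed to be measured against the ground truth pixel count. The function name <code>select_boxes</code> and the data layout are hypothetical.</p>

```python
def select_boxes(candidates, ground_truth, threshold=0.03):
    """Greedily keep boxes that add enough new silhouette coverage.

    candidates: list of (name, set_of_pixel_ids) in the order they are tried.
    ground_truth: set of pixel ids covered by the original visual mesh.
    """
    covered = set()
    kept = []
    for name, pixels in candidates:
        # Only pixels inside the silhouette that are not already covered count.
        new_pixels = pixels.intersection(ground_truth) - covered
        if len(new_pixels) >= threshold * len(ground_truth):
            kept.append(name)
            covered.update(new_pixels)
    return kept


ground_truth = set(range(100))
candidates = [
    ("big", set(range(60))),         # 60 new pixels: kept
    ("overlap", set(range(58, 62))), # only 2 new pixels: rejected
    ("side", set(range(70, 80))),    # 10 new pixels: kept
]
print(select_boxes(candidates, ground_truth))
```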
<h3>Filtering Polygons</h3>
<p>Not every game needs the occluders to be visible from all angles.  For example, most racing games will never have the cars jumping so high that they see the top of a building.  In those circumstances you may not want the occluders to have polygons on top.</p>
<table width="596" border="0" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top" width="298"><a href="http://altdevblogaday.com/wp-content/uploads/2012/04/with.png"><img style="margin: 1px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/with_thumb.png" alt="with" width="275" height="168" border="0" /></a></td>
<td valign="top" width="296"><a href="http://altdevblogaday.com/wp-content/uploads/2012/04/without.png"><img style="margin: 1px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/without_thumb.png" alt="without" width="275" height="168" border="0" /></a></td>
</tr>
</tbody>
</table>
<p>So the tool offers the ability to remove Top and Bottom polygons from the final occluder mesh after everything has finished being processed.</p>
<p>If you decide to filter out all top polygons, they’re all removed.  For bottom surfaces this has to be handled slightly differently, see below:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2012/04/bottom_removed.png"><img style="margin: 5px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/bottom_removed_thumb.png" alt="bottom_removed" width="580" height="331" border="0" /></a></p>
<p>Imagine a bridge structure or a large overhang on a building.  You wouldn’t want the bottom portion of the occluder to be removed from the overhang since it would allow seeing into and through the occluder &#8211; because presumably you wouldn’t have double-sided rendering enabled.  So to distinguish between bottom surfaces not meant to ever be seen by the player, and bottom surfaces that may be important to higher up portions of the structure, only bottom surfaces within 1 voxel’s height from the base of the mesh are removed.  (See picture above)</p>
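<p>A rough sketch of that bottom-surface rule, in Python. The face layout (a normal plus the height of the face, with z as the up axis) and the name <code>filter_bottom_faces</code> are assumptions for this example and not how Oxel stores its data.</p>

```python
def filter_bottom_faces(faces, voxel_height):
    """Remove downward-facing faces within one voxel of the mesh base.

    faces: list of (normal, height) pairs, where normal is (x, y, z) with
    z pointing up and height is the z coordinate of the axis-aligned face.
    """
    base = min(height for _, height in faces)
    kept = []
    for normal, height in faces:
        faces_down = normal[2] < 0
        near_base = height - base <= voxel_height
        if faces_down and near_base:
            continue  # never seen by the player, so drop it
        kept.append((normal, height))
    return kept


faces = [
    ((0, 0, -1), 0.0),  # underside at the base: removed
    ((0, 0, 1), 4.0),   # top face: untouched by this filter
    ((0, 0, -1), 3.0),  # underside of an overhang: kept
    ((1, 0, 0), 2.0),   # side face: kept
]
print(filter_bottom_faces(faces, voxel_height=1.0))
```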
<h3>Work Continues</h3>
<p>I’ve got some more ideas I need to test out but I wanted to go ahead and cut a version of where I’m at and let others take a look and give me some feedback.</p>
<p>One area I need to investigate next is a better way to generate the boxes.  Currently they are just expanded in all directions equally, but that’s not ideal.  I’m thinking about trying a parallel brute force method that would test all possible boxes at a specific origin point to find the best shaped box to generate from that point.</p>
<p><a href="http://www.nickdarnell.com/2012/04/oxel-a-tool-for-occluder-generation/" target="_blank">Oxel: A Tool for Occluder Generation @ nickdarnell.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Seamless Cube Map Filtering</title>
		<link>http://www.altdevblogaday.com/2012/03/03/seamless-cube-map-filtering/</link>
		<comments>http://www.altdevblogaday.com/2012/03/03/seamless-cube-map-filtering/#comments</comments>
		<pubDate>Sat, 03 Mar 2012 06:53:11 +0000</pubDate>
		<dc:creator>Ignacio Castaño</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24628</guid>
		<description><![CDATA[<p>Modern GPUs filter seamlessly across cube map faces. This feature is enabled automatically when using Direct3D 10 and 11 and in OpenGL when using the <a href="http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt">ARB_seamless_cube_map</a> extension. However, it&#8217;s not exposed through Direct3D 9 and it&#8217;s just not available in any of the current generation consoles.</p>
<p><a href="http://www.altdevblogaday.com/2012/03/03/seamless-cube-map-filtering/" class="more-link">Read more on Seamless Cube Map Filtering&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Modern GPUs filter seamlessly across cube map faces. This feature is enabled automatically when using Direct3D 10 and 11 and in OpenGL when using the <a href="http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt">ARB_seamless_cube_map</a> extension. However, it&#8217;s not exposed through Direct3D 9 and it&#8217;s just not available in any of the current generation consoles.</p>
<p>There are several solutions for this problem. Texture borders solve it elegantly, but are not available on all hardware, and only exposed through the OpenGL API (and proprietary APIs in some consoles).</p>
<p>When textures are static a common solution is to pre-process them in an attempt to eliminate the edge seams. In a short SIGGRAPH sketch, John Isidoro <a href="http://developer.amd.com/media/gpu_assets/Isidoro-CubeMapFiltering-Sketch-SIG05.pdf">proposed averaging cube map edge texels</a> across edges and obscuring the effect of the averaging by adjusting the intensity of the nearby texels using various methods. These methods are implemented in <a href="http://developer.amd.com/archive/gpu/cubemapgen/pages/default.aspx">AMD&#8217;s CubeMapGen</a>, whose source code is now <a href="http://code.google.com/p/cubemapgen/">publicly available online</a>. While this seems like a good idea, a few minutes experimenting with CubeMapGen make it obvious that it does not always work very well!</p>
<h2>Embedded Texture Borders</h2>
<p>A very simple solution that even works for dynamic cube maps is to slightly increase the FOV of the perspective projection so that the edges of adjacent faces match up exactly. <a href="http://www.gamedev.net/blog/73/entry-2005516-seamless-filtering-across-faces-of-dynamic-cube-map/">Ysaneya shows</a> that in order to achieve that, the FOV needs to be tweaked as follows:</p>
<p><code>
<pre>
fov = 2.0 * atan(n / (n - 1))
</pre>
<p></code></p>
<p>where <code>n</code> is the resolution of the cube map.</p>
<p>What this is essentially doing is to scale down the face images by one texel and padding them with a border of texels that is shared between adjacent faces. Since the texels at the face edges are now identical the seams are gone. </p>
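<p>A quick numeric check of that idea, as a Python sketch. The derivation here is my own reading of the goal stated above: for a face <code>size</code> texels wide, the centers of the border texels land exactly on the face edges when tan(fov / 2) = size / (size - 1). Written in terms of half the face resolution s = size / 2, the same ratio is s / (s - 0.5). The function names are hypothetical.</p>

```python
import math


def seamless_fov(size):
    """FOV in radians that puts the border texel centers on the face edges."""
    return 2.0 * math.atan(size / (size - 1.0))


def texel_center(i, size, half_extent):
    """Image-plane coordinate of the center of texel i, for a projection
    whose image plane spans [-half_extent, half_extent]."""
    return ((i + 0.5) * 2.0 / size - 1.0) * half_extent


size = 64
fov = seamless_fov(size)
half_extent = math.tan(fov / 2.0)

print(math.degrees(fov))                         # slightly more than 90
print(texel_center(0, size, half_extent))        # approximately -1
print(texel_center(size - 1, size, half_extent)) # approximately +1
```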
<p>In practice this is much trickier than it sounds. While the fragments at the adjacent face borders should sample the scene in the same direction, rasterization rules do not guarantee that in both cases the rasterized fragments will match.</p>
<p>However, if we take this idea to the realm of offline cube map generation, we can easily guarantee exact results. Cube maps are often used to store directional functions. Each texel has an associated uv coordinate within the cube map face, from which we derive a direction vector that is then used to sample our directional function. Examples of such functions include expensive BRDFs that we would like to precompute, or an environment map sampled using angular extent filtering.</p>
<p>Usually these uv coordinates are computed so that the resulting direction vectors point to the texel centers. For an integer texel coordinate <code>x</code> in the <code>[0,n-1]</code> range we map it to a floating point coordinate <code>u</code> in the <code>[-1, 1]</code> range as follows:</p>
<p><code>
<pre>
map_1(x) = (x + 0.5) * 2 / n - 1
</pre>
<p></code></p>
<p>We then obtain the corresponding direction vector as follows:</p>
<p><code>
<pre>
dir = normalize(faceVector + faceU * map_1(x) + faceV * map_1(y))
</pre>
<p></code></p>
<p>When doing that, the texels at the borders do not map to <code>-1</code> and <code>1</code> exactly, but to:</p>
<p><code>
<pre>
map_1(0) = -1 + 1/n
map_1(n-1) = 1 - 1/n
</pre>
<p></code></p>
<p>In our case we want the edges of each face to match up exactly so they result in the same direction vectors. That can be achieved with a function like this:</p>
<p><code>
<pre>
map_2(x) = 2 * x / (n - 1) - 1
</pre>
<p></code></p>
<p>If we use this map to sample our directional function, the resulting cube map is seamless, but the face images are scaled down uniformly. In the first case the slope of the map is:</p>
<p><code>
<pre>
map_1'(x) = 2 / n
</pre>
<p></code></p>
<p>but in the second case it is slightly different:</p>
<p><code>
<pre>
map_2'(x) = 2 / (n - 1)
</pre>
<p></code></p>
<p>This technique works very well at high resolutions. When n is sufficiently high, the change in slope between map_1 and map_2 becomes minimal. However, at low resolutions the stretching on the interior of the face can become noticeable.</p>
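<p>The two mappings are easy to compare side by side. The Python sketch below simply transcribes <code>map_1</code> and <code>map_2</code> from the formulas above (passing <code>n</code> explicitly is my own choice) and prints the border values and slopes for a small and a large face.</p>

```python
def map_1(x, n):
    """Texel-center mapping: border texels land at -1 + 1/n and 1 - 1/n."""
    return (x + 0.5) * 2.0 / n - 1.0


def map_2(x, n):
    """Edge-matching mapping: border texels land exactly on -1 and 1."""
    return 2.0 * x / (n - 1) - 1.0


for n in (4, 256):
    print(n, "map_1 borders:", map_1(0, n), map_1(n - 1, n))
    print(n, "map_2 borders:", map_2(0, n), map_2(n - 1, n))
    # The relative stretch between the two slopes shrinks as n grows.
    print(n, "slope ratio:", (2.0 / (n - 1)) / (2.0 / n))
```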
<p>A better solution is to stretch the image only in the proximity of the edges. That can be achieved by warping the uv face coordinates with a cubic polynomial of this form:</p>
<p><img src="http://altdevblogaday.com/wp-content/uploads/2012/02/warp.png" alt="" title="warp3" width="360" height="222" class="alignright size-full wp-image-24718" /><code>
<pre>
warp3(x) = ax^3 + x
</pre>
<p></code></p>
<p>We can compose this function with our original mapping. The result around the origin is close to a linear identity, but we can adjust <code>a</code> to stretch the function closer to the face edges. In our case we want the values at <code>1-1/n</code> to produce <code>1</code> instead, so we can easily determine the value of <code>a</code> by solving:</p>
<p><code>
<pre>
warp3(x) = ax^3 + x = 1,  with x = 1 - 1/n
</pre>
<p></code></p>
<p>which gives us:</p>
<p><code>
<pre>
a = n^2 / (n-1)^3
</pre>
<p></code></p>
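<p>As a sanity check on that value of <code>a</code>, here is a small Python sketch of the warped mapping. The formulas are taken from the text; the function names <code>warp_coefficient</code> and <code>map_warp</code> are made up for this example, and this is not the NVTT implementation.</p>

```python
def map_1(x, n):
    """Texel-center mapping from [0, n-1] to (-1, 1)."""
    return (x + 0.5) * 2.0 / n - 1.0


def warp_coefficient(n):
    """Solve a * x^3 + x = 1 for a, with x = 1 - 1/n."""
    return n ** 2 / (n - 1) ** 3


def warp3(x, a):
    return a * x ** 3 + x


def map_warp(x, n):
    """Texel-center mapping followed by the cubic warp."""
    return warp3(map_1(x, n), warp_coefficient(n))


n = 8
print(map_warp(0, n), map_warp(n - 1, n))  # border texels: close to -1 and 1
print(map_1(4, n), map_warp(4, n))         # near the center the warp is small
```

<p>Because the warp is an odd function, both borders are fixed by the same coefficient.</p>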
<p>I implemented the linear stretch and cubic warping methods in <a href="http://code.google.com/p/nvidia-texture-tools/">NVTT</a> and they often produce better results than the methods available in AMD&#8217;s CubeMapGen. However, I was not entirely satisfied. While this removed the zero-order discontinuity, it introduced a first-order discontinuity that in some cases was even more noticeable than the artifacts it was intended to remove.</p>
<p>You can hover the cursor over the following image to show how the warp edge fixup method eliminates the discontinuities, but sometimes still results in visible artifacts:</p>
<style type="text/css">
img.hover {
  display:none
}
a:hover img.hover {
  display:inline
}
a:hover img.nohover {
  display:none
}
</style>
<p><a><img src="http://altdevblogaday.com/wp-content/uploads/2012/02/seamless_warp.png" alt="" title="" width="560" height="273" class="hover aligncenter size-full wp-image-24672" /><img src="http://altdevblogaday.com/wp-content/uploads/2012/02/seams.png" alt="" title="" width="560" height="273" class="nohover aligncenter size-full wp-image-24673" /></a></p>
<p>Any edge fixup method is going to force the slope of the color gradient across the edge to be zero, because it needs to duplicate the border texels. The eye seems to be very sensitive to this form of discontinuity and it&#8217;s questionable whether this is better than the original artifact. Maybe other warp functions would make the discontinuity less obvious, or maybe it could be smoothed like Isidoro&#8217;s method does. At the time I implemented this I thought the remaining artifacts did not deserve more attention and moved on to other tasks.</p>
<h2>Modified Texture Lookup</h2>
<p>However, a few days ago <a href="https://twitter.com/#!/SebLagarde">Sebastien Lagarde</a> integrated these methods in AMD&#8217;s CubeMapGen. See <a href="http://seblagarde.wordpress.com/2012/02/26/amd-cubemapgen-for-physically-based-rendering/">this post</a> for more results and comparisons against other methods. That got me thinking again about this and then I realized that the only thing that needs to be done to avoid the seams is to modify the texture coordinates at runtime the same way we modify them during the offline cube map evaluation. At first I thought that would be impractical, because it would require projecting the texture coordinates onto the cube map faces, but it turns out that the resulting math is very simple. In the case of the uniform stretch that I first suggested, the transform required at runtime is just a conditional per-component multiplication:</p>
<p><code>
<pre>
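// cube_size is the face width in texels of the sampled level (assumed to be a float).
float3 fix_cube_lookup(float3 v) {
   // The component with the largest magnitude selects the cube face.
   float M = max(max(abs(v.x), abs(v.y)), abs(v.z));
   float scale = (cube_size - 1) / cube_size;
   // Scale the other two components so the face border maps to the outermost texel centers.
   if (abs(v.x) != M) v.x *= scale;
   if (abs(v.y) != M) v.y *= scale;
   if (abs(v.z) != M) v.z *= scale;
   return v;
}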
</pre>
<p></code></p>
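<p>To see what this does to a concrete direction, here is a rough CPU-side transcription in Python (an illustrative sketch only; the explicit <code>cube_size</code> parameter and the tuple representation are assumptions of this example, not part of the shader):</p>
<pre>
def fix_cube_lookup(v, cube_size):
    # v is a direction (x, y, z); cube_size is the face width in texels.
    m = max(abs(v[0]), abs(v[1]), abs(v[2]))
    scale = (cube_size - 1) / cube_size
    # The major axis (largest magnitude) is left alone, the other two are scaled.
    return tuple(c if abs(c) == m else c * scale for c in v)


# A direction that hits the +X face, with minor components 0.5 and -0.25.
print(fix_cube_lookup((1.0, 0.5, -0.25), 4))   # (1.0, 0.375, -0.1875)
</pre>
<p>After the hardware divides by the major axis, the face coordinates end up multiplied by <code>(n - 1) / n</code>, so a lookup at the border of the face lands on the center of the outermost texel.</p>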
<p>One problem is that we need to know the size of the cube map face in advance, but every mipmap has a different size and we may not know what mipmap is going to be sampled in advance. So, this method only works when explicit LOD is used.</p>
<p>Another issue is that with trilinear filtering enabled, the hardware samples from two contiguous mipmap levels. Ideally we would have to use a different scale factor for each mipmap level. That could be achieved by sampling them separately and combining the result manually, but in practice, using the same scale for both levels seems to produce fairly good results. We can easily find a scale factor that works well for fractional LODs as a function of the LOD value and the size of the top level mipmap:</p>
<p><code>
<pre>
float scale = 1 - exp2(lod) / cube_size;
if (abs(v.x) != M) v.x *= scale;
if (abs(v.y) != M) v.y *= scale;
if (abs(v.z) != M) v.z *= scale;
</pre>
<p></code></p>
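<p>The following Python sketch (illustrative only; <code>lod_scale</code> and <code>mip_scale</code> are hypothetical helper names and the face size of 64 is an arbitrary choice) compares this expression with the per-level factor <code>(n - 1) / n</code>. At integer LODs the two agree, and fractional LODs fall in between:</p>
<pre>
def lod_scale(lod, cube_size):
    # Same expression as the shader: 1 - exp2(lod) / cube_size.
    return 1.0 - 2.0 ** lod / cube_size


def mip_scale(level, cube_size):
    # Per-level factor (n - 1) / n, with n the face size of that mip level.
    n = cube_size / 2 ** level
    return (n - 1) / n


cube_size = 64  # assumed top-level face size
for level in range(4):
    print(level, lod_scale(level, cube_size), mip_scale(level, cube_size))
print(0.5, lod_scale(0.5, cube_size))
</pre>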
<p>If you are using cube maps to store prefiltered environment maps, chances are you are computing the cube map LOD from the specular power using <code>log2(specular_power)</code>. If that&#8217;s the case, the two transcendental instructions cancel out and the scale becomes a linear function of the specular power.</p>
<p>The images below show the results using the warp filtering method (these were chosen to highlight the artifacts of the warp method). Hover the cursor over the images to visualize the results of the new approach:</p>
<p><a><img src="http://altdevblogaday.com/wp-content/uploads/2012/02/examples_warp.png" alt="" title="" width="566" height="582" class="nohover alignnone size-full wp-image-24694" /><img src="http://altdevblogaday.com/wp-content/uploads/2012/02/examples_seamless.png" alt="" title="" width="566" height="582" class="hover alignnone size-full wp-image-24693" /></a></p>
<p>I&#8217;d like to thank Sebastien Lagarde for his valuable feedback while testing these ideas and for providing the nice images accompanying this article.</p>
<p><em>Note: This article is also published at <a href="http://the-witness.net/news/2012/02/seamless-cube-map-filtering/">The Witness blog</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/03/03/seamless-cube-map-filtering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Light Pre Pass Renderer on iPhone</title>
		<link>http://www.altdevblogaday.com/2012/03/01/light-pre-pass-renderer-on-iphone/</link>
		<comments>http://www.altdevblogaday.com/2012/03/01/light-pre-pass-renderer-on-iphone/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 16:26:11 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[light pre pass]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24684</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
About a month ago, I bought an iPhone 4s, so I wrote some code on my new toy. Although this device does not support multiple render targets (MRT), it does support rendering to a floating point render target (only available on iPhone 4s and iPad2). So I tested it with a <a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html">light pre pass renderer</a>:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-uMlCThLlIpM/T0-cA4vQyWI/AAAAAAAAARY/M-CZsZsTi-0/s1600/srcShot.png"><img src="http://2.bp.blogspot.com/-uMlCThLlIpM/T0-cA4vQyWI/AAAAAAAAARY/M-CZsZsTi-0/s320/srcShot.png" alt="" width="320" height="213" border="0" /></a></div>
<p>In the test, HDR lighting is done (gamma = 2.0 instead of 2.2, without adaptation) with 3 post processing filters (<a href="http://filmicgames.com/archives/75">filmic tone mapping</a>, bloom and <a href="http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt">photo filter</a>). The test scene uses 3 directional lights (1 of them casts shadows with 4 cascades) and 30 point lights, with 2 skinned models and Bullet physics running at the same time, and runs at around 28~32fps.</p>
<p><a href="http://www.altdevblogaday.com/2012/03/01/light-pre-pass-renderer-on-iphone/" class="more-link">Read more on Light Pre Pass Renderer on iPhone&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
About a month ago, I bought an iPhone 4s, so I wrote some code on my new toy. Although this device does not support multiple render targets (MRT), it does support rendering to a floating point render target (only available on iPhone 4s and iPad2). So I tested it with a <a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html">light pre pass renderer</a>:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-uMlCThLlIpM/T0-cA4vQyWI/AAAAAAAAARY/M-CZsZsTi-0/s1600/srcShot.png"><img src="http://2.bp.blogspot.com/-uMlCThLlIpM/T0-cA4vQyWI/AAAAAAAAARY/M-CZsZsTi-0/s320/srcShot.png" alt="" width="320" height="213" border="0" /></a></div>
<p>In the test, HDR lighting is done (gamma = 2.0 instead of 2.2, without adaptation) with 3 post processing filters (<a href="http://filmicgames.com/archives/75">filmic tone mapping</a>, bloom and <a href="http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt">photo filter</a>). The test scene uses 3 directional lights (1 of them casts shadows with 4 cascades) and 30 point lights, with 2 skinned models and Bullet physics running at the same time, and runs at around 28~32fps.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">G-buffer layout</span></strong><br />
I have tried 2 different layouts for the G-buffer. My first attempt was to use one 16-bit render target with the R channel storing the depth value, the G and B channels storing the view space normal using the encoding method from &#8220;<a href="http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt">A bit more deferred-CryEngine 3</a>&#8221; and the A channel storing the glossiness for the specular lighting calculation. But later I discovered that this device supports the OpenGL ES extension <a href="http://www.khronos.org/registry/gles/extensions/OES/OES_depth_texture.txt">GL_OES_depth_texture</a>, which allows the depth buffer to be rendered into a texture. So my second attempt was to switch the G-buffer layout to use the RGB channels to store the view space normal without encoding and the A channel to store the glossiness, while the depth is sampled directly from the depth texture.</p>
<table border="0">
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-kru75sEFTXw/T02Pxn4vvwI/AAAAAAAAAP4/WLRktFxP3Uw/s1600/GBuffer.png"><img src="http://3.bp.blogspot.com/-kru75sEFTXw/T02Pxn4vvwI/AAAAAAAAAP4/WLRktFxP3Uw/s320/GBuffer.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">G-buffer storing view space normal and glossiness</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-b1WgDqDTWrg/T02P4H366zI/AAAAAAAAAQA/RISKKEvSdug/s1600/depthBuffer.png"><img src="http://4.bp.blogspot.com/-b1WgDqDTWrg/T02P4H366zI/AAAAAAAAAQA/RISKKEvSdug/s320/depthBuffer.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Depth buffer</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Switching to this layout gives a boost in the frame rate, as the normal no longer needs to be encoded to and decoded from the texture. However, changing the render target that stores the normal and glossiness from 16-bit to 8-bit does not give any performance improvement, probably because the test scene is not bandwidth bound.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Stencil optimization</span></strong><br />
The second optimization is for the deferred lights, using the <a href="http://www.insomniacgames.com/tech/articles/0409/files/GDC09_Lee_Prelighting.pdf">stencil trick</a> of <a href="http://altdevblogaday.com/2011/08/08/stencil-buffer-optimisation-for-deferred-lights/">drawing a convex light polygon</a> to cull the pixels that do not need to be lit.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-uaZ--IZ_BOc/T02QCtiO3KI/AAAAAAAAAQI/ZblDin2h_W4/s1600/lightBound.png"><img src="http://3.bp.blogspot.com/-uaZ--IZ_BOc/T02QCtiO3KI/AAAAAAAAAQI/ZblDin2h_W4/s320/lightBound.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">drawing the bounding volume of the point lights</td>
</tr>
</tbody>
</table>
<p>However, after implementing the stencil trick, the frame rate dropped&#8230; This is because when filling the stencil buffer, I used the same shader as the one used for performing lighting. Even though color writes are disabled while filling the stencil buffer, the GPU is still doing redundant work. So a simple shader is used in the stencil pass instead, which improves the performance.<br />
Also, drawing out the shape of the point lights made me discover that the attenuation factor I used (i.e. 1/(1+k.d+k.d^2) ) has a large area that does not get lit, so I switched to a simpler linear falloff model (e.g. 1 - lightDistance/lightRange, which can be raised to an exponent to control the falloff) to give a tighter bound.</p>
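<p>The difference between the two falloff models can be sketched in a few lines of Python (for illustration only; the function names, the clamp to zero beyond the light range and the sample values are assumptions of this sketch, not code from the renderer):</p>
<pre>
def linear_falloff(light_distance, light_range, exponent=1.0):
    # Clamp to [0, 1] so the light contributes nothing beyond light_range.
    t = min(max(1.0 - light_distance / light_range, 0.0), 1.0)
    return t ** exponent


def quadratic_attenuation(light_distance, k):
    # The attenuation mentioned above: 1 / (1 + k*d + k*d^2). It never reaches zero.
    return 1.0 / (1.0 + k * light_distance + k * light_distance ** 2)


for d in (0.0, 5.0, 10.0, 20.0):
    print(d, linear_falloff(d, 10.0), quadratic_attenuation(d, 0.1))
</pre>
<p>The linear model reaches exactly zero at the light range, so the bounding volume can match the lit area, while the quadratic model only approaches zero and needs a larger, mostly dark volume.</p>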
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-53WtniKcrCc/T0-cIh4s19I/AAAAAAAAARg/2dQKm4ZPP7M/s1600/lightBuffer.png"><img src="http://4.bp.blogspot.com/-53WtniKcrCc/T0-cIh4s19I/AAAAAAAAARg/2dQKm4ZPP7M/s320/lightBuffer.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">light buffer</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Combining post-processing passes</span></strong><br />
Combining the full screen render passes can help performance. In the test scene, the bloom result was originally additively blended with the tone-mapped scene render target, followed by a photo filter pass that rendered to the back buffer. These passes are now combined by computing the additive blend with the tone-mapped scene inside the photo filter shader, which is faster than before.</p>
<p><span class="Apple-style-span" style="font-size: large"><strong>Resolution</strong></span><br />
The program runs at a low resolution with a back buffer of 480x320 pixels. Also, the G-buffer and the post processing textures are further scaled down to 360x300 pixels. This reduces the number of fragments that need to be shaded by the pixel shaders.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Shadow</span></strong><br />
In the scene, a cascaded shadow map is used with 4 cascades (resolution= 256&#215;256). I tried using the <a href="http://www.khronos.org/registry/gles/extensions/EXT/EXT_shadow_samplers.txt">GL_EXT_shadow_samplers</a> extension, hoping that it would help the frame rate. But the result is disappointing, as the extension is no faster than performing the comparison inside the shader&#8230;</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-CCKJW02KpEw/T0-SyPMrzNI/AAAAAAAAAQw/SYOuiHcOHac/s1600/shadowMaps.png"><img src="http://4.bp.blogspot.com/-CCKJW02KpEw/T0-SyPMrzNI/AAAAAAAAAQw/SYOuiHcOHac/s320/shadowMaps.png" alt="" width="320" height="213" border="0" /></a></div>
<p>It takes around 8ms to calculate the shadow and blur it. If a basic shadow map (i.e. without cascades) is used instead with blurring, it gives some or only a little performance boost, depending on how many point lights are on screen. Of course, switching off the blur speeds up the shadow calculation a lot.</p>
<table border="0">
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-cJ_IwJ4jtWQ/T0-camn3C-I/AAAAAAAAARo/tgxtCrHsfgc/s1600/shadow.png"><img src="http://2.bp.blogspot.com/-cJ_IwJ4jtWQ/T0-camn3C-I/AAAAAAAAARo/tgxtCrHsfgc/s1600/shadow.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">basic shadow map</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-dyKy81ILMbg/T0-c9NXhG7I/AAAAAAAAAR0/AxJOFdC2fVQ/s1600/shadowBlur.png"><img src="http://3.bp.blogspot.com/-dyKy81ILMbg/T0-c9NXhG7I/AAAAAAAAAR0/AxJOFdC2fVQ/s1600/shadowBlur.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">basic shadow map with blur</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-VLwYJrCIdBU/T0-dbc_y6qI/AAAAAAAAASA/VbntvAtRcV8/s1600/cascaded.png"><img src="http://1.bp.blogspot.com/-VLwYJrCIdBU/T0-dbc_y6qI/AAAAAAAAASA/VbntvAtRcV8/s1600/cascaded.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Cascaded shadow map</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-Ajh3NL7SJ-I/T0-dncFTXfI/AAAAAAAAASM/bQPD9ZQP5-U/s320/cascadedBlur.png"><img src="http://2.bp.blogspot.com/-Ajh3NL7SJ-I/T0-dncFTXfI/AAAAAAAAASM/bQPD9ZQP5-U/s320/cascadedBlur.png" alt="" width="320" height="213" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Cascaded shadow map with blur</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<div><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
In this post, I described the methods used to make a light pre pass renderer run on the iPhone at around 30fps with 30 dynamic lights. However, high resolution is sacrificed in order to keep the dynamic lights, HDR lighting and the post processing filters. Also, no anti-aliasing is done in the test, as the frame rate is not good enough. Maybe MSAA can be done if the basic shadow map is used instead of the cascaded one. But this is left for future investigation.</div>
<div></div>
<div>
<p><strong><span class="Apple-style-span" style="font-size: x-small">References</span></strong><br />
<span class="Apple-style-span" style="font-size: x-small">[1] Light Pre Pass Renderer: <a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html">http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[2] A bit more deferred &#8211; CryEngine 3: <a href="http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt">http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[3] Filmic tone mapping operators: <a href="http://filmicgames.com/archives/75">http://filmicgames.com/archives/75</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[4] Crysis Next Gen Effects: <a href="http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt">http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[5] Position From Depth 3: Back In The Habit: <a href="http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/">http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[6] Fast Mobile Shaders: <a href="http://blogs.unity3d.com/wp-content/uploads/2011/08/FastMobileShaders_siggraph2011.pdf">http://blogs.unity3d.com/wp-content/uploads/2011/08/FastMobileShaders_siggraph2011.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[7] GLSL Optimizer: <a href="http://aras-p.info/blog/2010/09/29/glsl-optimizer/">http://aras-p.info/blog/2010/09/29/glsl-optimizer/</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[8] Deferred Cascaded Shadow Maps: <a href="http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/">http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/</a></span></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/03/01/light-pre-pass-renderer-on-iphone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting the Projected Extent of a Sphere to the Near Plane</title>
		<link>http://www.altdevblogaday.com/2012/03/01/getting-the-projected-extent-of-a-sphere-to-the-near-plane/</link>
		<comments>http://www.altdevblogaday.com/2012/03/01/getting-the-projected-extent-of-a-sphere-to-the-near-plane/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 13:47:40 +0000</pubDate>
		<dc:creator>Jaewon Jung</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[binning]]></category>
		<category><![CDATA[geometry]]></category>
		<category><![CDATA[sphere]]></category>
		<category><![CDATA[tile]]></category>
		<category><![CDATA[trigonometry]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24732</guid>
		<description><![CDATA[<p>Recently I had a chance to implement a toy ray-tracer for a scene with spheres only. Once a brute-force approach had been coded, I tried to optimize it by pre-sorting spheres into screen tiles (i.e. a uniform grid on the near view plane) so that only the spheres binned to a tile need to be considered when tracing the ray for a pixel in that tile. This wasn&#8217;t as simple as I had originally expected, because in a perspective projection a sphere projected to the 2D screen is in general not a circle (it is an ellipse whose axes are usually not aligned with the screen axes). But with a little bit of geometry and trigonometry, it was possible to compute a tight rectangular bound of the projected sphere so that it can be binned to all relevant tiles.</p>
<p><a href="http://www.altdevblogaday.com/2012/03/01/getting-the-projected-extent-of-a-sphere-to-the-near-plane/" class="more-link">Read more on Getting the Projected Extent of a Sphere to the Near Plane&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Recently I had a chance to implement a toy ray-tracer for a scene with spheres only. Once a brute-force approach had been coded, I tried to optimize it by pre-sorting spheres into screen tiles (i.e. a uniform grid on the near view plane) so that only the spheres binned to a tile need to be considered when tracing the ray for a pixel in that tile. This wasn&#8217;t as simple as I had originally expected, because in a perspective projection a sphere projected to the 2D screen is in general not a circle (it is an ellipse whose axes are usually not aligned with the screen axes). But with a little bit of geometry and trigonometry, it was possible to compute a tight rectangular bound of the projected sphere so that it can be binned to all relevant tiles.</p>
<p>The picture below, which I drew by hand, shows the situation and the geometric parameters involved. Definitely not fine art, but this was the best possible with my drawing skill :)</p>
<div id="attachment_24736" class="wp-caption aligncenter" style="width: 628px"><a href="http://altdevblogaday.com/wp-content/uploads/2012/03/sphere_binning.jpg"><img class="size-full wp-image-24736 " src="http://altdevblogaday.com/wp-content/uploads/2012/03/sphere_binning.jpg" alt="" width="618" height="874" /></a><p class="wp-caption-text">Geometry of a &#039;sphere in a frustum projected to the near plane&#039;</p></div>
<p>As you can see, this shows only the situation in the vertical (y) dimension, but the same derivation applies to the horizontal (x) dimension as well. A sphere of radius <em><strong>r</strong></em> is being projected here. The downscaled radius, <em><strong>r&#8217;</strong></em>, at the near plane can be easily calculated using a proportional expression (<em><strong>Eq. 1</strong></em>). In the zoomed-in diagram at the bottom, you can see how the final extent in the y dimension should be arranged. In the end, it will be like:</p>
<p style="text-align: center"><em><strong>r&#8217;/cos(theta)-Ydown &lt; (Y &#8211; projected_sphere_center) &lt; r&#8217;/cos(theta) +Yup</strong></em></p>
<p> Here, <em><strong>projected_sphere_center</strong></em> means the projection of the sphere center onto the near plane, and <em><strong>theta</strong></em> is the angle between the view vector and the vector from the eye to the projected sphere center. To get <em><strong>Yup</strong></em> &amp; <em><strong>Ydown</strong></em>, two proportional relations are used again, one for the up part, the other for the down part (refer to the <em>blue/purple arcs</em> in the zoomed-in part). I hope the proportional relations are obvious from my hand-drawing. Once <em><strong>Yup</strong></em> and <em><strong>Ydown</strong></em> are calculated using <em><strong>Eq. 2</strong></em> and <em><strong>Eq. 3</strong></em>, the extent of <em><strong>Y</strong></em> is given by the inequality above (note that with the convention that a positive <em><strong>theta</strong></em> is a counter-clockwise rotation, the same pair of equations can be used to get the extent even when the sphere is below the view vector rather than above). One can get the extent of <em><strong>X</strong></em> in the same way, and with this 2D extent, the screen tiles that overlap this rectangular area can be easily identified.</p>
<p>In hindsight, the result is quite basic and requires only basic knowledge of geometry &amp; trigonometry. Nonetheless I thought it would be meaningful to document the derivation. I hope this is useful for some people. If you find any error or happen to know a better way, please enlighten me!</p>
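<p>For readers who want something to experiment with, here is a minimal Python sketch of the vertical extent. It does not reproduce Eq. 1 to Eq. 3 from the drawing. Instead it uses a tangent-line formulation in the 2D plane of the view vector and the y axis, which should give the same tight bound. The function name and parameters are hypothetical, and the sketch assumes the eye is at the origin, outside the sphere, with the whole sphere in front of the eye:</p>
<pre>
import math


def projected_extent(center_y, center_z, radius, near):
    # Work in the 2D plane spanned by the view direction (z) and the y axis,
    # with the eye at the origin. The sphere becomes a circle in this plane.
    dist = math.hypot(center_y, center_z)
    theta = math.atan2(center_y, center_z)   # angle of the center from the view vector
    alpha = math.asin(radius / dist)         # half-angle between the two tangent lines
    # Each tangent line hits the near plane z = near at y = near * tan(angle).
    return near * math.tan(theta - alpha), near * math.tan(theta + alpha)


y_min, y_max = projected_extent(2.0, 10.0, 1.0, 1.0)
print(y_min, y_max)
</pre>
<p>Calling the same function with the x coordinate of the sphere center gives the horizontal extent. Note that <code>math.asin</code> raises a <code>ValueError</code> when the eye is inside the sphere, so that case has to be handled separately.</p>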
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/03/01/getting-the-projected-extent-of-a-sphere-to-the-near-plane/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting dominant light from Spherical Harmonics</title>
		<link>http://www.altdevblogaday.com/2012/02/14/extracting-dominant-light-from-spherical-harmonics/</link>
		<comments>http://www.altdevblogaday.com/2012/02/14/extracting-dominant-light-from-spherical-harmonics/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 01:59:38 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[dominant light]]></category>
		<category><![CDATA[spherical harmonics]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=24309</guid>
		<description><![CDATA[Introduction
Spherical Harmonics (SH) functions can represent low frequency data such as diffuse lighting, but high frequency details are lost after projection to SH. Luckily we can extract a dominant directional light from the SH coefficients to fake specular lighting. We can also extract more than 1 directional light from the SH coefficients, but this post will only focus on extracting 1 dominant light; those interested can read Stupid Spherical Harmonics (SH) Tricks for the details. A WebGL demo that extracts only 1 directional light is provided in the last section.]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
<a href="http://simonstechblog.blogspot.com/2011/12/spherical-harmonic-lighting.html">Spherical Harmonics (SH)</a> functions can represent low frequency data such as diffuse lighting, but high frequency details are lost after projection to SH. Luckily we can extract a dominant directional light from the SH coefficients to fake specular lighting. We can also extract more than 1 directional light from the SH coefficients, but this post will only focus on extracting 1 dominant light; those interested can read <a href="http://www.ppsloan.org/publications/StupidSH36.pdf">Stupid Spherical Harmonics (SH) Tricks</a> for the details. <a href="http://simonstechblog.blogspot.com/2012/02/extracting-dominant-light-from.html#sh_extractDominantLight">A WebGL demo</a> that extracts only 1 directional light is provided in the last section.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-uIOZ0lTTZxs/TzkvykLo2FI/AAAAAAAAAPY/l-9rh1nhc7c/s1600/demoSrcShot1.png"><img src="http://1.bp.blogspot.com/-uIOZ0lTTZxs/TzkvykLo2FI/AAAAAAAAAPY/l-9rh1nhc7c/s400/demoSrcShot1.png" alt="" width="400" height="225" border="0" /></a></div>
<div class="separator" style="clear: both;text-align: center"></div>
<p><strong><span class="Apple-style-span" style="font-size: large">Extracting dominant light direction</span></strong><br />
We can get a single dominant light direction from the SH projected environment lighting, <em><strong>Le</strong></em>. Suppose we approximate the environment light up to band 1 (i.e. <em>l</em>=1):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-6P9zwcsfGjw/Tzm3gi6Q0eI/AAAAAAAAAPo/BI7NfRIa4PM/s1600/LeLightDir.png"><img src="http://4.bp.blogspot.com/-6P9zwcsfGjw/Tzm3gi6Q0eI/AAAAAAAAAPo/BI7NfRIa4PM/s320/LeLightDir.png" alt="" width="319" height="320" border="0" /></a></div>
<p>Finding the dominant light direction is equivalent to choosing an incoming direction, <em><strong>ω</strong></em>, so that <em><strong>Le(</strong></em><em><strong>ω)</strong></em> is maximized. In other words, cos<em>θ</em> should equal 1:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-tf8XQPDlnNw/Tzda0w2Nm-I/AAAAAAAAAN4/7mg4ZsukgHk/s1600/LeDir.png"><img src="http://4.bp.blogspot.com/-tf8XQPDlnNw/Tzda0w2Nm-I/AAAAAAAAAN4/7mg4ZsukgHk/s320/LeDir.png" alt="" width="308" height="320" border="0" /></a></div>
<p>So we can extract the dominant light direction for a single color channel. Finally, the dominant light direction can be calculated by scaling the dominant direction of each RGB channel by the ratio that converts color to gray scale:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-8LYkqfaguAg/TzdcSjnzCfI/AAAAAAAAAOA/oHxeyEv6etE/s1600/LeDirRGB.png"><img src="http://3.bp.blogspot.com/-8LYkqfaguAg/TzdcSjnzCfI/AAAAAAAAAOA/oHxeyEv6etE/s320/LeDirRGB.png" alt="" width="320" height="89" border="0" /></a></div>
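<p>A minimal Python sketch of this step is given below. It is not the code of the demo: the function name, the coefficient ordering, the luminance weights (0.3, 0.59, 0.11) and the sign convention of the band-1 basis functions are all assumptions of this example. If your SH basis carries negative signs on the x and y terms, negate those components accordingly:</p>
<pre>
import math

# Weights used to convert RGB to gray scale (an assumed choice).
LUMA = (0.3, 0.59, 0.11)


def dominant_direction(sh_rgb):
    # sh_rgb[channel] holds the three band-1 coefficients in the order
    # (l=1,m=-1), (l=1,m=0), (l=1,m=1), assumed proportional to y, z, x.
    x = sum(w * c[2] for w, c in zip(LUMA, sh_rgb))
    y = sum(w * c[0] for w, c in zip(LUMA, sh_rgb))
    z = sum(w * c[1] for w, c in zip(LUMA, sh_rgb))
    # Normalize; an all-zero band 1 has no dominant direction and will fail here.
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


# White light whose band-1 coefficients point along (0.6, 0.0, 0.8).
print(dominant_direction([(0.0, 0.4, 0.3)] * 3))
</pre>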
<p><strong><span class="Apple-style-span" style="font-size: large">Extracting dominant light intensity</span></strong><br />
After extracting the light direction, the remaining problem is to calculate the light intensity. That means we want to calculate an intensity <em><strong>s</strong></em>, so that the error between the extracted light and the light environment is at a minimum (<em><strong>Le</strong></em> is the original environment light while <em><strong>Ld</strong></em> is the directional light):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-NDgBOPpG9rI/TzddKItCAZI/AAAAAAAAAOI/KOEyUWRkOiM/s1600/errFunc.png"><img src="http://2.bp.blogspot.com/-NDgBOPpG9rI/TzddKItCAZI/AAAAAAAAAOI/KOEyUWRkOiM/s1600/errFunc.png" alt="" border="0" /></a></div>
<p>To minimize the error, differentiate the error function with respect to <em><strong>s</strong></em> and set the derivative to zero:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-2w-5w4dnnOY/TzdhtuDkU_I/AAAAAAAAAOQ/3OT5t_4EO2Q/s1600/intensity.png"><img src="http://2.bp.blogspot.com/-2w-5w4dnnOY/TzdhtuDkU_I/AAAAAAAAAOQ/3OT5t_4EO2Q/s320/intensity.png" alt="" width="320" height="209" border="0" /></a></div>
<p>If both lighting functions are projected into SH, the intensity can be simplified to:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-wBwhba3uhh8/Tzfef4fPDDI/AAAAAAAAAO4/-t03nqM1tE8/s1600/lightIntensity.png"><img src="http://4.bp.blogspot.com/-wBwhba3uhh8/Tzfef4fPDDI/AAAAAAAAAO4/-t03nqM1tE8/s1600/lightIntensity.png" alt="" border="0" /></a></div>
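<p>Assuming an orthonormal SH basis, the integrals turn into dot products of coefficient vectors, so the least-squares intensity for one color channel can be sketched in Python as follows (the function name and the sample coefficients are made up for illustration):</p>
<pre>
def light_intensity(le, ld):
    # Least-squares scale s minimizing sum((le[i] - s * ld[i])^2):
    # setting the derivative to zero gives s = dot(le, ld) / dot(ld, ld).
    num = sum(a * b for a, b in zip(le, ld))
    den = sum(b * b for b in ld)
    return num / den


ld = [1.0, 2.0, 3.0, 4.0]   # SH coefficients of the directional light (example values)
le = [2.0, 4.0, 6.0, 8.0]   # SH coefficients of the environment light (example values)
print(light_intensity(le, ld))   # 2.0
</pre>
<p>The same function would be called once per RGB channel with that channel&#8217;s environment coefficients.</p>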
<p>The next step is to project the directional light (with unit intensity) into the SH basis (<em><strong>c<span class="Apple-style-span" style="font-size: xx-small">i</span></strong></em> is the SH coefficient of the projected directional light):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-lMXDIbgP8E8/TzdnopA8foI/AAAAAAAAAOg/rRZ1hseyyTI/s1600/shProjDirLight.png"><img src="http://4.bp.blogspot.com/-lMXDIbgP8E8/TzdnopA8foI/AAAAAAAAAOg/rRZ1hseyyTI/s320/shProjDirLight.png" alt="" width="320" height="87" border="0" /></a></div>
<p>Therefore the SH coefficients of projected directional light can be calculated by substituting the light direction into the corresponding SH basis function.</p>
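<p>As a rough sketch of this substitution, the Python function below evaluates the first 3 bands (9 coefficients) of the real SH basis at a unit direction. The function name and the coefficient ordering are hypothetical, and the sign convention (no Condon-Shortley phase) is an assumption that should be matched to the SH basis used elsewhere in your code.</p>

```python
import math


def project_direction_to_sh(direction):
    """SH coefficients (bands 0..2) of a unit-intensity directional light.

    direction must be a unit vector (x, y, z). Coefficients are ordered
    band by band, with m running from -l to l inside each band.
    """
    x, y, z = direction
    k0 = 0.5 * math.sqrt(1.0 / math.pi)         # band 0
    k1 = math.sqrt(3.0 / (4.0 * math.pi))       # band 1
    k2a = 0.5 * math.sqrt(15.0 / math.pi)       # band 2, m = -2, -1, 1
    k2b = 0.25 * math.sqrt(5.0 / math.pi)       # band 2, m = 0
    k2c = 0.25 * math.sqrt(15.0 / math.pi)      # band 2, m = 2
    return [
        k0,
        k1 * y, k1 * z, k1 * x,
        k2a * x * y, k2a * y * z, k2b * (3.0 * z * z - 1.0),
        k2a * x * z, k2c * (x * x - y * y),
    ]
```

<p>A handy sanity check is that the squared coefficients of bands 0 to 2 sum to 9/(4π) for any unit direction, regardless of the sign convention.</p>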
<p>As the SH projected directional light has unit intensity, we want to scale it by a factor so that the extracted light intensity <strong><em>s</em></strong> is a light color that is ready for use in the direct lighting equation, where the light color is defined as follows (a detailed explanation can be found in [4]):</p>
<blockquote class="tr_bq"><p><em>For artist convenience, c<span class="Apple-style-span" style="font-size: xx-small">light</span> does not correspond to a direct radiometric measure of the light’s intensity; it is specified as the color a white Lambertian surface would have when illuminated by the light from a direction parallel to the surface normal (l<span class="Apple-style-span" style="font-size: xx-small">c</span> = n).</em></p></blockquote>
<p>So we need to calculate a scaling factor, <em><strong>c</strong></em>, that scales the SH projected directional light such that:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-tTOIK1N3lBg/TzdppdcqWHI/AAAAAAAAAOo/HdMQCLFDhPw/s1600/noramlizationFactor.png"><img src="http://2.bp.blogspot.com/-tTOIK1N3lBg/TzdppdcqWHI/AAAAAAAAAOo/HdMQCLFDhPw/s1600/noramlizationFactor.png" alt="" border="0" /></a></div>
<p>We can project both L(<em><strong>ω</strong></em>) and (<strong><em>n</em></strong> . <em><strong>ω</strong></em>) into SH to calculate the integral. To project the transfer function (<strong><em>n</em></strong> . <em><strong>ω</strong></em>) into SH, we can first align <em><strong>n</strong></em> with the +Z axis, where the projection reduces to zonal harmonics (ZH), and then rotate the ZH coefficients into any direction using the equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-axoY2LI1vtA/TtdpG0yaspI/AAAAAAAAAIk/GlRcZO6d8Do/s1600/zh2sh.png"><img src="http://4.bp.blogspot.com/-axoY2LI1vtA/TtdpG0yaspI/AAAAAAAAAIk/GlRcZO6d8Do/s320/zh2sh.png" alt="" width="320" height="141" border="0" /></a></div>
<p>The ZH coefficients of (<strong><em>n</em></strong> . <em><strong>ω</strong></em>) are: (note that the result is different from <a href="http://www.ppsloan.org/publications/StupidSH36.pdf">Stupid Spherical Harmonics (SH) Tricks</a> in the Normalization section as we have taken the π term outside the integral)</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-EuA6C2gvEOU/TzfgQ7ht0DI/AAAAAAAAAPA/duKfPbPcauM/s1600/zhT.png"><img src="http://4.bp.blogspot.com/-EuA6C2gvEOU/TzfgQ7ht0DI/AAAAAAAAAPA/duKfPbPcauM/s640/zhT.png" alt="" width="305" height="640" border="0" /></a></div>
<p>Then, rotating the ZH coefficients so that the normal direction equals the light direction, <em>l<span class="Apple-style-span" style="font-size: xx-small">d</span></em> (because we need <em>l<span class="Apple-style-span" style="font-size: xx-small">d</span> = n </em>as stated above), we have:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-eoi1Kx_7Hcg/TzfjxOl-tXI/AAAAAAAAAPI/79OiG8Os0aI/s1600/shT.png"><img src="http://2.bp.blogspot.com/-eoi1Kx_7Hcg/TzfjxOl-tXI/AAAAAAAAAPI/79OiG8Os0aI/s320/shT.png" alt="" width="191" height="320" border="0" /></a></div>
<p>Finally we can go back to compute the scaling factor, <em><strong>c</strong></em>,  for the SH projected directional light (we calculate up to band=2):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-68zIMB7xDsE/TzkuJUKDebI/AAAAAAAAAPQ/V_KpAeTK0Aw/s1600/normalizeFactor.png"><img src="http://4.bp.blogspot.com/-68zIMB7xDsE/TzkuJUKDebI/AAAAAAAAAPQ/V_KpAeTK0Aw/s640/normalizeFactor.png" alt="" width="640" height="403" border="0" /></a></div>
<p>Therefore, to extract the dominant light intensity, we first project the directional light into SH with the scaling factor <em><strong>c</strong></em>, and then the light color, <em><strong>s</strong></em>, can be calculated by:</p>
<div class="separator" style="clear: both;text-align: center;margin: 0px"><a href="http://4.bp.blogspot.com/-wBwhba3uhh8/Tzfef4fPDDI/AAAAAAAAAO4/-t03nqM1tE8/s1600/lightIntensity.png"><img style="cursor: move" src="http://4.bp.blogspot.com/-wBwhba3uhh8/Tzfef4fPDDI/AAAAAAAAAO4/-t03nqM1tE8/s1600/lightIntensity.png" alt="" border="0" /></a></div>
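<p>A minimal sketch of this last step, assuming the simplified expression is the usual least-squares ratio between the two sets of SH coefficients (the function name is hypothetical, <code>ld</code> is assumed to be already scaled by <em><strong>c</strong></em>, and the function would be called once per color channel):</p>

```python
def light_intensity(le, ld):
    """Scale s that minimizes sum((le[i] - s * ld[i]) ** 2).

    le: SH coefficients of the environment light for one color channel.
    ld: SH coefficients of the scaled, projected directional light.
    """
    numerator = sum(a * b for a, b in zip(le, ld))
    denominator = sum(b * b for b in ld)
    return numerator / denominator
```

<p>Because the SH basis is orthonormal, the integrals in the least-squares solution reduce to these dot products of coefficients.</p>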
<p><strong><span class="Apple-style-span" style="font-size: large">WebGL Demo</span></strong><br />
<a href="http://simonstechblog.blogspot.com/2012/02/extracting-dominant-light-from.html#sh_extractDominantLight">A WebGL demo</a> (requires a WebGL-enabled browser such as Chrome) is provided to illustrate how to extract a single directional light from the SH coefficients to fake specular lighting. For simplicity, the specular lighting is calculated using the basic Blinn-Phong specular term; other specular lighting equations can be used, such as <a href="http://simonstechblog.blogspot.com/2011/12/microfacet-brdf.html">physically plausible</a> ones. (The source code can be downloaded from <a href="https://sites.google.com/site/simontechblog/home/shextractdominantlight/shLightingExtractDLight.js">here</a>.)</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://simonstechblog.blogspot.com/2012/02/extracting-dominant-light-from.html#sh_extractDominantLight"><img src="http://3.bp.blogspot.com/-gnLqbfq1ZDQ/Tzkv6dC37cI/AAAAAAAAAPg/0jQdXnrqy1U/s400/demoScrShot2.png" alt="" width="400" height="298" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Screen Shot of the demo</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
Extracting the dominant directional light from SH projected light is easy to compute with the following steps: First, calculate the dominant light direction. Second, project the dominant light into SH with a normalization factor. Third, calculate the light color. The extracted light can be used for specular lighting to give an impression of high frequency lighting.</p>
<p><strong><span class="Apple-style-span" style="font-size: x-small">References</span></strong></p>
<div style="margin: 0px"><span class="Apple-style-span" style="font-size: x-small">[1] Stupid Spherical Harmonics (SH) Tricks: <a href="http://www.ppsloan.org/publications/StupidSH36.pdf">http://www.ppsloan.org/publications/StupidSH36.pdf</a></span></div>
<div style="margin: 0px"><span class="Apple-style-span" style="font-size: x-small">[2] Light Factorization for Mixed-Frequency Shadows in Augmented Reality: <a href="http://zurich.disneyresearch.com/~wjarosz/publications/nowrouzezahrai11light.pdf">http://zurich.disneyresearch.com/~wjarosz/publications/nowrouzezahrai11light.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[3] <a href="http://www.amazon.com/gp/product/0123750792/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=1278548962&amp;pf_rd_s=lpo-top-stripe-1&amp;pf_rd_t=201&amp;pf_rd_i=012553180X&amp;pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_r=0EB7X30AM5YVZFYJXYZR">Physically Based Rendering, Second Edition: From Theory To Implementation</a> Ch.17.2.2</span><br />
<span class="Apple-style-span" style="font-size: x-small">[4] Physically-Based Shading Models in Film and Game Production: <a href="http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf">http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf</a></span></div>
<div style="margin: 0px"><span class="Apple-style-span" style="font-size: x-small">[5] PI or not to PI in game lighting equation: <a href="http://seblagarde.wordpress.com/2012/01/08/pi-or-not-to-pi-in-game-lighting-equation/">http://seblagarde.wordpress.com/2012/01/08/pi-or-not-to-pi-in-game-lighting-equation/</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[6] March of the Froblins: Simulation and Rendering Massive Crowds of Intelligent and Detailed Creatures on GPU: <a href="http://developer.amd.com/documentation/presentations/legacy/Chapter03-SBOT-March_of_The_Froblins.pdf">http://developer.amd.com/documentation/presentations/legacy/Chapter03-SBOT-March_of_The_Froblins.pdf</a><br />
[7] Pick dominant light from sh coeffs: <a href="http://sourceforge.net/mailarchive/message.php?msg_id=28778827">http://sourceforge.net/mailarchive/message.php?msg_id=28778827</a></span></div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/02/14/extracting-dominant-light-from-spherical-harmonics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microfacet BRDF</title>
		<link>http://www.altdevblogaday.com/2011/12/19/microfacet-brdf/</link>
		<comments>http://www.altdevblogaday.com/2011/12/19/microfacet-brdf/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 17:33:01 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Beckmann distribution]]></category>
		<category><![CDATA[Blinn Phong]]></category>
		<category><![CDATA[BRDF]]></category>
		<category><![CDATA[diffuse]]></category>
		<category><![CDATA[Distribution term]]></category>
		<category><![CDATA[Fresnel term]]></category>
		<category><![CDATA[geometry]]></category>
		<category><![CDATA[Geometry term]]></category>
		<category><![CDATA[Lighting]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[microfacet]]></category>
		<category><![CDATA[physics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[specular]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=21771</guid>
		<description><![CDATA[<p><span class="Apple-style-span" style="font-size: large"><strong>Introduction</strong></span><br />
In recent years, <a href="http://renderwonk.com/publications/s2010-shading-course/">more</a> and <a href="http://advances.realtimerendering.com/s2011/Lazarov-Physically-Based-Lighting-in-Black-Ops%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx">more</a> papers have discussed applying physically based <a href="http://en.wikipedia.org/wiki/Bidirectional_reflectance_distribution_function">BRDF</a>s in games, so I decided to spend some time investigating them. For a BRDF to be physically plausible, it should satisfy 2 conditions:</p>
<p><a href="http://www.altdevblogaday.com/2011/12/19/microfacet-brdf/" class="more-link">Read more on Microfacet BRDF&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><span class="Apple-style-span" style="font-size: large"><strong>Introduction</strong></span><br />
In recent years, <a href="http://renderwonk.com/publications/s2010-shading-course/">more</a> and <a href="http://advances.realtimerendering.com/s2011/Lazarov-Physically-Based-Lighting-in-Black-Ops%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx">more</a> papers have discussed applying physically based <a href="http://en.wikipedia.org/wiki/Bidirectional_reflectance_distribution_function">BRDF</a>s in games, so I decided to spend some time investigating them. For a BRDF to be physically plausible, it should satisfy 2 conditions:</p>
<ol>
<li>Reciprocity: The value of the BRDF (<em>f</em>) is unchanged when the incident light direction (<em>l</em>) and the reflected light direction (<em>r</em>) are swapped, i.e. <em>f</em>(<em>l</em>, <em>r</em>) = <em>f</em>(<em>r</em>, <em>l</em>)</li>
<li>Energy Conservation: The total energy of reflected light is less than or equal to the energy of the incident light. i.e.</li>
</ol>
<p><a href="http://1.bp.blogspot.com/-XCi7SombX6Y/Tu4SSVmuYEI/AAAAAAAAAJE/FTaLRIHcfwg/s1600/energyConservation.png"><img src="http://1.bp.blogspot.com/-XCi7SombX6Y/Tu4SSVmuYEI/AAAAAAAAAJE/FTaLRIHcfwg/s1600/energyConservation.png" alt="" border="0" /></a></p>
<div>
<p>A physically based specular BRDF is based on micro-facet theory, which models a surface as being composed of many micro-facets, each of which reflects light in a single direction according to its normal (<em>m</em>):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-csjzgQpnreY/Tu4bBr1Ot7I/AAAAAAAAAJU/kksU0wuEiPQ/s1600/microfacet.png"><img src="http://2.bp.blogspot.com/-csjzgQpnreY/Tu4bBr1Ot7I/AAAAAAAAAJU/kksU0wuEiPQ/s320/microfacet.png" alt="" width="320" height="176" border="0" /></a></div>
<p>So, in the above diagram, for light coming from direction <em>l</em> to be reflected to viewing direction <em>v</em>, the micro-facet normal <em>m</em> must equal the half vector between <em>l</em> and <em>v</em>.<br />
A micro-facet BRDF has the following form:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-rzsRa3w1Gc0/Tu4e1feQ8CI/AAAAAAAAAJc/EF1eO5frBd8/s1600/microBRDF.png"><img src="http://3.bp.blogspot.com/-rzsRa3w1Gc0/Tu4e1feQ8CI/AAAAAAAAAJc/EF1eO5frBd8/s400/microBRDF.png" alt="" width="400" height="185" border="0" /></a></div>
<p>which consists of 3 terms: the Fresnel term (<em>F</em>), the Distribution term (<em>D</em>) and the Geometry term (<em>G</em>). Their meaning is explained in the <a href="http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf">background talk</a> presented by Naty Hoffman at SIGGRAPH 2010. These 3 terms can be chosen independently, as stated in the talk <a href="http://advances.realtimerendering.com/s2011/Lazarov-Physically-Based-Lighting-in-Black-Ops%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx">Physically-based lighting in Call of Duty:Black Ops</a> (<span style="font-size: x-small">although &#8220;<a href="http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf">Microfacet Models for Refraction through Rough Surfaces</a>&#8221; states that some <em>G</em> depend on <em>D</em> to maintain energy conservation, some <em>G</em> are extended to handle arbitrary distributions, so in this blog post I assume that the <em>G</em> function is independent of <em>D</em></span>). So I decided to find some distribution functions <em>D</em> and geometry functions <em>G</em> and play with different combinations to see how they affect the rendering result. You can also play around with different combinations using the WebGL demo (requires a WebGL-enabled browser such as Chrome) <a href="http://simonstechblog.blogspot.com/2011/12/microfacet-brdf.html">here (at the last section)</a>.</p>
</div>
<div><strong><span class="Apple-style-span" style="font-size: large">Fresnel Term</span></strong><br />
In this test, I use the common Schlick approximation to the Fresnel equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-fMV_Hkwk7H0/Tu4pwpyTWLI/AAAAAAAAAJk/aX7XXaB0eWQ/s1600/fresnel.png"><img src="http://4.bp.blogspot.com/-fMV_Hkwk7H0/Tu4pwpyTWLI/AAAAAAAAAJk/aX7XXaB0eWQ/s1600/fresnel.png" alt="" border="0" /></a></div>
<p>and <em>f<span style="font-size: xx-small">0</span> </em>is found by using the following equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-f8A5MIjL-Zw/Tu4rUBbeFjI/AAAAAAAAAJs/ZWHkNKv1HSY/s1600/f0.png"><img src="http://3.bp.blogspot.com/-f8A5MIjL-Zw/Tu4rUBbeFjI/AAAAAAAAAJs/ZWHkNKv1HSY/s1600/f0.png" alt="" border="0" /></a></div>
<p>where the refractive index <em>n</em> can be tuned in the demo.</p>
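<p>For reference, here is a small Python sketch of the Fresnel term. It assumes the standard forms of both equations, F = f0 + (1 - f0)(1 - l·h)^5 and f0 = ((1 - n) / (1 + n))^2, with the surface in air; the function names are hypothetical.</p>

```python
def f0_from_refractive_index(n):
    """Reflectance at normal incidence for a surface in air (index 1)."""
    return ((1.0 - n) / (1.0 + n)) ** 2


def fresnel_schlick(f0, l_dot_h):
    """Schlick's approximation to the Fresnel reflectance.

    l_dot_h is the cosine of the angle between the light direction
    and the half vector, clamped to [0, 1] by the caller.
    """
    return f0 + (1.0 - f0) * (1.0 - l_dot_h) ** 5
```

<p>With <em>n</em> = 1.5 this gives <em>f<span style="font-size: xx-small">0</span></em> of about 0.04; the reflectance equals <em>f<span style="font-size: xx-small">0</span></em> when the light hits the micro-facet head on and rises to 1 at grazing angles.</p>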
<p><strong><span class="Apple-style-span" style="font-size: large">Distribution Term</span></strong><br />
The distribution term describes how the microfacet normals are distributed around a given direction. In <a href="http://simonstechblog.blogspot.com/2011/12/microfacet-brdf.html">the demo</a>, I used two distribution functions: the Blinn-Phong and the Beckmann distribution functions.</p>
<p>For Blinn-Phong distribution, we can derive the distribution function by satisfying the equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-kfzVQrVJYdw/Tu6aWgqSniI/AAAAAAAAAJ8/VgfGoC9O2G4/s1600/D_prove.png"><img src="http://4.bp.blogspot.com/-kfzVQrVJYdw/Tu6aWgqSniI/AAAAAAAAAJ8/VgfGoC9O2G4/s200/D_prove.png" alt="" width="200" height="68" border="0" /></a></div>
<p>which means that the projected microfacet area is equal to the macro surface area for any projection direction <em>v</em>. So we choose <em>v</em>=<em>n</em>, which simplifies the equation to:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-C0fU28TO0mc/Tu6bb_OGmQI/AAAAAAAAAKE/RBEoHeapR98/s1600/D_prove2.png"><img src="http://3.bp.blogspot.com/-C0fU28TO0mc/Tu6bb_OGmQI/AAAAAAAAAKE/RBEoHeapR98/s200/D_prove2.png" alt="" width="200" height="45" border="0" /></a></div>
<p>To derive the Blinn-Phong distribution function from the original Blinn-Phong specular term, we just need to multiply it by a constant <em>K</em> chosen to satisfy the equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-HowddMUstM8/Tu6kgebQE3I/AAAAAAAAAKM/cDTht9gfR0w/s1600/D_proveBlinnPhong.png"><img src="http://1.bp.blogspot.com/-HowddMUstM8/Tu6kgebQE3I/AAAAAAAAAKM/cDTht9gfR0w/s320/D_proveBlinnPhong.png" alt="" width="307" height="320" border="0" /></a></div>
<p>The Beckmann distribution has the following form:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-7A_4giMNYR4/Tu6W40znbFI/AAAAAAAAAJ0/FXOHMKJesOw/s1600/D_beckmann.png"><img src="http://1.bp.blogspot.com/-7A_4giMNYR4/Tu6W40znbFI/AAAAAAAAAJ0/FXOHMKJesOw/s320/D_beckmann.png" alt="" width="320" height="159" border="0" /></a></div>
<p>To convert between the roughness <em>m</em> in Beckmann distribution and shininess <span class="s1"><em>α</em> in Blinn-Phong distribution, the following formula is used:</span></p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-Ah7bH41WmD0/Tu6lVTACoOI/AAAAAAAAAKU/cgL8mhf6YmU/s1600/Screen+shot+2011-12-19+at+10.44.18+AM.png"><img src="http://3.bp.blogspot.com/-Ah7bH41WmD0/Tu6lVTACoOI/AAAAAAAAAKU/cgL8mhf6YmU/s200/Screen+shot+2011-12-19+at+10.44.18+AM.png" alt="" width="160" height="99" border="0" /></a></div>
<p><span class="s1">which gives a very similar result when both the refractive index <em>n</em> and the roughness <em>m</em> are small. When <em>n</em>&gt;10 and <em>m</em>&gt;0.5, the 2 distributions start to show a difference, and the difference gets larger as both <em>m</em> and <em>n</em> get larger.</span></p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-8rjfQ1TTWKc/Tu84q-TX3oI/AAAAAAAAAKc/8WhFDjW4BM4/s1600/distributionDiff.png"><img src="http://4.bp.blogspot.com/-8rjfQ1TTWKc/Tu84q-TX3oI/AAAAAAAAAKc/8WhFDjW4BM4/s640/distributionDiff.png" alt="" width="640" height="281" border="0" /></a></div>
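<p>The two distributions and the conversion between their parameters can be sketched in Python as below. This assumes the commonly used normalized forms, D = (α + 2) / (2π) · (n·h)^α for Blinn-Phong, the Beckmann form with a 1 / (π m² (n·h)⁴) factor, and α = 2 / m² - 2 for the conversion. All function names are hypothetical, and <code>projected_area</code> is only a numerical check of the normalization condition for <em>v</em>=<em>n</em>.</p>

```python
import math


def blinn_phong(n_dot_h, alpha):
    """Normalized Blinn-Phong distribution with shininess alpha."""
    return (alpha + 2.0) / (2.0 * math.pi) * n_dot_h ** alpha


def beckmann(n_dot_h, m):
    """Beckmann distribution with roughness m."""
    if n_dot_h <= 0.0:
        return 0.0
    c2 = n_dot_h * n_dot_h
    return math.exp((c2 - 1.0) / (m * m * c2)) / (math.pi * m * m * c2 * c2)


def shininess_from_roughness(m):
    """Convert Beckmann roughness m to Blinn-Phong shininess alpha."""
    return 2.0 / (m * m) - 2.0


def projected_area(distribution, steps=20000):
    """Midpoint-rule integral of D(m) * (n . m) over the hemisphere.

    distribution takes the cosine of the angle to the normal.
    A correctly normalized distribution returns a value close to 1.
    """
    h = 0.5 * math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        c = math.cos(theta)
        total += distribution(c) * c * math.sin(theta) * h
    return 2.0 * math.pi * total
```

<p>For instance, <code>projected_area(lambda c: beckmann(c, 0.5))</code> and <code>projected_area(lambda c: blinn_phong(c, shininess_from_roughness(0.5)))</code> should both come out close to 1.</p>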
<p><strong><span class="Apple-style-span" style="font-size: large">Geometry Term</span></strong><br />
The geometry term describes how much of a microfacet is blocked by other microfacets. In <a href="http://simonstechblog.blogspot.com/2011/12/microfacet-brdf.html">the demo</a>, 4 geometry terms have been tested: implicit, Cook-Torrance, Schlick&#8217;s approximation to Smith&#8217;s shadowing function and Walter&#8217;s approximation to Smith&#8217;s shadowing function.</p>
<p>The first one is implicit geometry function which has the form:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-YK7YYmM-Mv4/Tu9QmklVruI/AAAAAAAAAKk/J6iO9S1rYzI/s1600/G_implicit.png"><img src="http://4.bp.blogspot.com/-YK7YYmM-Mv4/Tu9QmklVruI/AAAAAAAAAKk/J6iO9S1rYzI/s200/G_implicit.png" alt="" width="200" height="28" border="0" /></a></div>
<p>It is called implicit because, when it is used, the microfacet BRDF depends only on the Fresnel equation and the distribution function.</p>
<p>The second one used for testing is Cook-Torrance geometry function:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-jKJ8XTOq47I/Tu9RO0gjurI/AAAAAAAAAKs/c1pf502FRI8/s1600/G_cookTorrance.png"><img src="http://2.bp.blogspot.com/-jKJ8XTOq47I/Tu9RO0gjurI/AAAAAAAAAKs/c1pf502FRI8/s400/G_cookTorrance.png" alt="" width="400" height="50" border="0" /></a></div>
<p>The other 2 geometry functions both approximate Smith&#8217;s shadowing function, which decomposes the geometry function into the product of 2 simpler geometry functions as below:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-26u0qlpdlU0/Tu9RrueXaRI/AAAAAAAAAK0/iZi9l7DLy6I/s1600/G_smith.png"><img src="http://1.bp.blogspot.com/-26u0qlpdlU0/Tu9RrueXaRI/AAAAAAAAAK0/iZi9l7DLy6I/s320/G_smith.png" alt="" width="320" height="36" border="0" /></a></div>
<p>With Schlick&#8217;s approximation, the following G1 is used:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-cnonUB4CZ3g/Tu9R8USkvBI/AAAAAAAAAK8/7AA7JPueKvs/s1600/G_Schlick.png"><img src="http://2.bp.blogspot.com/-cnonUB4CZ3g/Tu9R8USkvBI/AAAAAAAAAK8/7AA7JPueKvs/s320/G_Schlick.png" alt="" width="320" height="112" border="0" /></a></div>
<p>while Walter&#8217;s approximation uses the following G1:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-u0IsIOGo3rk/Tu9SKgmOQlI/AAAAAAAAALE/Hs_kkaOxn_Y/s1600/G_walter.png"><img src="http://1.bp.blogspot.com/-u0IsIOGo3rk/Tu9SKgmOQlI/AAAAAAAAALE/Hs_kkaOxn_Y/s400/G_walter.png" alt="" width="400" height="170" border="0" /></a></div>
<p>Among the 4 geometry terms, the implicit one always shows a darker specular color, while the other 3 geometry functions have a similar appearance when the roughness <em>m</em> is small. As <em>m</em> gets larger, the Schlick function becomes slightly darker than the Cook-Torrance and Walter geometry functions. The Cook-Torrance and Walter functions give very similar results:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-u5m0xliPB0U/Tu9ZayRw1wI/AAAAAAAAALM/6WR3fyu01yo/s1600/geometryDiff.png"><img src="http://4.bp.blogspot.com/-u5m0xliPB0U/Tu9ZayRw1wI/AAAAAAAAALM/6WR3fyu01yo/s640/geometryDiff.png" alt="" width="640" height="470" border="0" /></a></div>
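<p>For readers who want to experiment outside the demo, the 4 geometry terms can be sketched in Python as follows. The formulas are the commonly published forms and are assumed to match the equations above: Schlick's G1 uses k = m·sqrt(2/π), and Walter's G1 is the rational approximation for the Beckmann distribution with a = 1 / (m·tan θ). The function names are hypothetical.</p>

```python
import math


def g_implicit(n_dot_l, n_dot_v):
    """Implicit geometry term: cancels the BRDF's foreshortening factors."""
    return n_dot_l * n_dot_v


def g_cook_torrance(n_dot_h, n_dot_v, n_dot_l, v_dot_h):
    """Cook-Torrance geometry term."""
    return min(1.0,
               2.0 * n_dot_h * n_dot_v / v_dot_h,
               2.0 * n_dot_h * n_dot_l / v_dot_h)


def g1_schlick(n_dot_x, m):
    """Schlick's approximation to Smith's G1 for roughness m."""
    k = m * math.sqrt(2.0 / math.pi)
    return n_dot_x / (n_dot_x * (1.0 - k) + k)


def g1_walter(n_dot_x, m):
    """Walter's approximation to Smith's G1 for the Beckmann distribution."""
    if n_dot_x <= 0.0:
        return 0.0
    sin_theta = math.sqrt(max(0.0, 1.0 - n_dot_x * n_dot_x))
    if sin_theta == 0.0:
        return 1.0
    a = n_dot_x / (m * sin_theta)  # a = 1 / (m * tan(theta))
    if a >= 1.6:
        return 1.0
    return (3.535 * a + 2.181 * a * a) / (1.0 + 2.276 * a + 2.577 * a * a)


def g_smith(g1, n_dot_l, n_dot_v, m):
    """Smith geometry term: product of G1 for the light and view directions."""
    return g1(n_dot_l, m) * g1(n_dot_v, m)
```

<p>For example, <code>g_smith(g1_schlick, 0.8, 0.6, 0.5)</code> and <code>g_smith(g1_walter, 0.8, 0.6, 0.5)</code> can be compared directly for the same light and view angles.</p>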
<p><span class="Apple-style-span" style="font-size: large"><strong>Energy Conservation between Diffuse and Specular BRDF</strong></span><br />
Energy conservation is important for a physically based BRDF, but most papers only discuss conservation within the specular BRDF. What about energy conservation between the diffuse term and the specular term? I could only find 2 ways to do this, both from the <a href="http://renderwonk.com/publications/s2010-shading-course/gotanda/course_note_practical_implementation_at_triace.pdf">paper provided by Tri-Ace</a>. They multiply the diffuse reflection term by a diffuse Fresnel term:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-r8vh1lpu824/Tu9cnU5la_I/AAAAAAAAALU/vZV9h6DPZYU/s1600/Screen+shot+2011-12-19+at+11.46.22+PM.png"><img src="http://3.bp.blogspot.com/-r8vh1lpu824/Tu9cnU5la_I/AAAAAAAAALU/vZV9h6DPZYU/s200/Screen+shot+2011-12-19+at+11.46.22+PM.png" alt="" width="200" height="63" border="0" /></a></div>
<p>They later discovered that this term can be approximated by (1- <em>f</em><span style="font-size: xx-small">0</span>), which gives very similar results. However, using this term violates the reciprocity of the BRDF. If diffuse energy conservation is enabled, the ratio between the diffuse and specular reflection changes along with the refractive index.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-O19KFtfCMVQ/Tu9gPQqBkOI/AAAAAAAAALc/0iX3smYLmKY/s1600/energyDiff.png"><img src="http://4.bp.blogspot.com/-O19KFtfCMVQ/Tu9gPQqBkOI/AAAAAAAAALc/0iX3smYLmKY/s640/energyDiff.png" alt="" width="640" height="184" border="0" /></a></div>
<p><strong><span class="Apple-style-span" style="font-size: large">WebGL Demo</span></strong><br />
I provide a <a href="http://simonstechblog.blogspot.com/2011/12/microfacet-brdf.html">WebGL program</a> so that you can play around with the settings I described above. The model is illuminated by a single white directional light, and the red color is the diffuse color. The diffuse BRDF is just a Lambertian surface, which can be turned off in the demo. Dragging inside the viewport rotates the camera. The source code can be downloaded from <a href="https://sites.google.com/site/simontechblog/home/microfacetbrdf/material.js">here</a>.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://simonstechblog.blogspot.com/2011/12/microfacet-brdf.html"><img src="http://4.bp.blogspot.com/-Tt7abLxeFco/Tu9rkoihlFI/AAAAAAAAALk/I3Sji_j0Hgw/s400/demoScrShot.png" alt="" width="400" height="225" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Screen shot of the demo</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
A physically plausible BRDF can give a surface a different material appearance compared to a traditional lighting model. However, in this post I only use 1 microfacet BRDF for all 3 RGB channels; using different BRDF settings for different channels is also possible, as some materials like copper and gold have different <em>f</em><span style="font-size: xx-small">0</span> terms in the RGB channels. Also, only direct lighting is investigated; BRDFs for secondary lighting are left for a future blog post.</p>
<p><strong>Reference</strong><br />
<span class="Apple-style-span" style="font-size: x-small">[1] Background: Physically-Based Shading (Naty Hoffman): <a href="http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf">http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[2] Practical Implementation of Physically-Based Shading Models at tri-Ace (Yoshiharu Gotanda): <a href="http://renderwonk.com/publications/s2010-shading-course/gotanda/course_note_practical_implementation_at_triace.pdf">http://renderwonk.com/publications/s2010-shading-course/gotanda/course_note_practical_implementation_at_triace.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[3] Crafting Physically Motivated Shading Models for Game Development (Naty Hoffman): <a href="http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_b_notes.pdf">http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_b_notes.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[4] Physically-based lighting in Call of Duty: Black Ops: <a href="http://advances.realtimerendering.com/s2011/Lazarov-Physically-Based-Lighting-in-Black-Ops%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx">http://advances.realtimerendering.com/s2011/Lazarov-Physically-Based-Lighting-in-Black-Ops%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[5] <span class="Apple-style-span" style="font-family: inherit">Microfacet Models for Refraction through Rough Surfaces:</span><a href="http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf">http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[6] </span><span class="Apple-style-span" style="font-family: Helvetica;font-size: 12px"><a href="http://www.rorydriscoll.com/2009/01/25/energy-conservation-in-games/">http://www.rorydriscoll.com/2009/01/25/energy-conservation-in-games/</a></span></p>
<div style="font: 12.0px Helvetica;margin: 0.0px 0.0px 0.0px 0.0px">[7] <a href="http://seblagarde.wordpress.com/2011/08/17/hello-world/">http://seblagarde.wordpress.com/2011/08/17/hello-world/</a></div>
<p><span class="Apple-style-span" style="font-size: x-small"><br />
</span></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/19/microfacet-brdf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spherical Harmonic Lighting</title>
		<link>http://www.altdevblogaday.com/2011/12/01/spherical-harmonic-lighting/</link>
		<comments>http://www.altdevblogaday.com/2011/12/01/spherical-harmonic-lighting/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 15:22:08 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Cartesian]]></category>
		<category><![CDATA[coordinates]]></category>
		<category><![CDATA[imaginary numbers]]></category>
		<category><![CDATA[Lighting]]></category>
		<category><![CDATA[lighting calculations]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[maths]]></category>
		<category><![CDATA[Monte Carlo Integration]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[SH]]></category>
		<category><![CDATA[SH functions]]></category>
		<category><![CDATA[spherical]]></category>
		<category><![CDATA[spherical harmonics]]></category>
		<category><![CDATA[webgl]]></category>
		<category><![CDATA[zonal harmonics]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=20946</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
<span class="Apple-style-span">Spherical Harmonics (SH) functions are a set of orthogonal basis functions defined in spherical coordinates using imaginary numbers. In this post, we use the following conversion between spherical and Cartesian coordinates:</span></p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-o7g5DTcEPZA/TtcdA1bF1OI/AAAAAAAAAH0/TudAxLXFxv8/s1600/coordinates2.png"><img src="http://3.bp.blogspot.com/-o7g5DTcEPZA/TtcdA1bF1OI/AAAAAAAAAH0/TudAxLXFxv8/s200/coordinates2.png" alt="" width="200" height="200" border="0" /></a></div>
<p>Since we are dealing with real-valued functions, we only need the real spherical harmonics functions, which have the form:</p>
<div>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-xPuL_vDiisU/TtcIpEx0HUI/AAAAAAAAAHk/8UEf80cs0rI/s1600/shFunc.png"><img src="http://3.bp.blogspot.com/-xPuL_vDiisU/TtcIpEx0HUI/AAAAAAAAAHk/8UEf80cs0rI/s400/shFunc.png" alt="" width="400" height="150" border="0" /></a></div>
<p>The index <em><strong>l</strong></em> of an SH function is called the band index and is an integer <span class="Apple-style-span" style="font-size: x-small">&#62;=</span> 0, while the index <em><strong>m</strong></em> is an integer in the range -<em><strong>l</strong></em><span class="Apple-style-span" style="font-size: x-small">&#60;=</span><em><strong>m</strong></em><span class="Apple-style-span" style="font-size: x-small">&#60;=</span><em><strong>l</strong></em>, so there are (2<em><strong>l</strong></em> + 1) functions in a given band. Appendix A2 of <a href="http://www.ppsloan.org/publications/StupidSH36.pdf">Stupid Spherical Harmonics (SH) Tricks</a> lists the evaluated SH basis function for each pair (<em><strong>l</strong></em>, <em><strong>m</strong></em>).</div>
<p><a href="http://www.altdevblogaday.com/2011/12/01/spherical-harmonic-lighting/" class="more-link">Read more on Spherical Harmonic Lighting&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
<span class="Apple-style-span">Spherical Harmonics (SH) functions are a set of orthogonal basis functions defined in spherical coordinates using complex numbers. In this post, we use the following conversion between spherical and Cartesian coordinates:</span></p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-o7g5DTcEPZA/TtcdA1bF1OI/AAAAAAAAAH0/TudAxLXFxv8/s1600/coordinates2.png"><img src="http://3.bp.blogspot.com/-o7g5DTcEPZA/TtcdA1bF1OI/AAAAAAAAAH0/TudAxLXFxv8/s200/coordinates2.png" alt="" width="200" height="200" border="0" /></a></div>
<p>Since we are dealing with real-valued functions, we only need the real spherical harmonics functions, which take the form:</p>
<div>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-xPuL_vDiisU/TtcIpEx0HUI/AAAAAAAAAHk/8UEf80cs0rI/s1600/shFunc.png"><img src="http://3.bp.blogspot.com/-xPuL_vDiisU/TtcIpEx0HUI/AAAAAAAAAHk/8UEf80cs0rI/s400/shFunc.png" alt="" width="400" height="150" border="0" /></a></div>
<p>The index <em><strong>l</strong></em> of an SH function is called the band index and is an integer <span class="Apple-style-span" style="font-size: x-small">&gt;=</span> 0, while the index <em><strong>m</strong></em> is an integer in the range -<em><strong>l</strong></em><span class="Apple-style-span" style="font-size: x-small">&lt;=</span><em><strong>m</strong></em><span class="Apple-style-span" style="font-size: x-small">&lt;=</span><em><strong>l</strong></em>, so there are (2<em><strong>l</strong></em> + 1) functions in a given band. Appendix A2 of <a href="http://www.ppsloan.org/publications/StupidSH36.pdf">Stupid Spherical Harmonics (SH) Tricks</a> lists the evaluated SH basis function for each pair (<em><strong>l</strong></em>, <em><strong>m</strong></em>).</p>
<p>A linear combination of SH basis functions with scalar coefficients can be used to approximate a function as follows:</p>
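<p>To make the band layout concrete, here is a minimal Python sketch that evaluates the 9 real SH basis functions of bands 0 to 2 for a unit direction. The function names are hypothetical, the numeric constants are the usual closed-form values for these bands, and two conventions are assumed rather than taken from this post: signs are written without the Condon-Shortley phase (some references flip the sign of odd <em>m</em>), and θ is measured from the +z axis.</p>

```python
import math

def sh_basis(x, y, z):
    """Return the 9 real SH basis values (bands l = 0..2) for a unit direction.

    The entry at index l*(l+1)+m holds the function for the pair (l, m).
    Sign convention (assumption): no Condon-Shortley phase.
    """
    return [
        0.282095,                        # l=0, m= 0
        0.488603 * y,                    # l=1, m=-1
        0.488603 * z,                    # l=1, m= 0
        0.488603 * x,                    # l=1, m= 1
        1.092548 * x * y,                # l=2, m=-2
        1.092548 * y * z,                # l=2, m=-1
        0.315392 * (3.0 * z * z - 1.0),  # l=2, m= 0
        1.092548 * x * z,                # l=2, m= 1
        0.546274 * (x * x - y * y),      # l=2, m= 2
    ]

def spherical_to_cartesian(theta, phi):
    # Assumed convention: theta is the angle from +z, phi rotates around z from +x.
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))
```

<p>Calling <code>sh_basis(*spherical_to_cartesian(theta, phi))</code> gives all (2<em>l</em> + 1) values of each band in one flat list.</p>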
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-UhYEAwYu_Bc/TtcXJGBniQI/AAAAAAAAAHs/LrG6k3-NCQg/s1600/shApprox.png"><img src="http://1.bp.blogspot.com/-UhYEAwYu_Bc/TtcXJGBniQI/AAAAAAAAAHs/LrG6k3-NCQg/s400/shApprox.png" alt="" width="400" height="146" border="0" /></a></div>
<p>An approximation up to band <em><strong>l</strong></em> = <em><strong>n</strong></em> - 1 needs <em><strong>n</strong></em><span class="Apple-style-span" style="line-height: 19px"><span class="Unicode"><span class="Apple-style-span" style="font-family: inherit">×</span></span></span><em><strong>n</strong></em> coefficients (the sum of 2<em><strong>l</strong></em> + 1 over bands 0 to <em><strong>n</strong></em> - 1).<br />
The remaining problem is to compute the coefficients <span class="Apple-style-span" style="font-style: italic"><strong>c</strong></span><span class="Apple-style-span">, which can be done either analytically or numerically by </span>Monte Carlo Integration.</p>
<div style="margin: 0px">
<div style="margin: 0px"><strong><span class="Apple-style-span" style="font-size: large">Monte Carlo Integration</span></strong></div>
</div>
<div style="margin: 0px">
<div style="margin: 0px">To compute a definite integral numerically, we can consider the Monte Carlo Estimator:</div>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-jLB6gE7aN1w/TtclS4awbzI/AAAAAAAAAH8/uLajXY_a2Lg/s1600/mcEstimator.png"><img src="http://1.bp.blogspot.com/-jLB6gE7aN1w/TtclS4awbzI/AAAAAAAAAH8/uLajXY_a2Lg/s320/mcEstimator.png" alt="" width="320" height="110" border="0" /></a></div>
<div style="margin: 0px">The estimator F is unbiased, because its expected value equals the definite integral:</div>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-hVE2eI57BJM/TtcrKXqe_PI/AAAAAAAAAIE/5IY8IbGXxFU/s1600/mcEstimatorProve.png"><img src="http://2.bp.blogspot.com/-hVE2eI57BJM/TtcrKXqe_PI/AAAAAAAAAIE/5IY8IbGXxFU/s200/mcEstimatorProve.png" alt="" width="169" height="200" border="0" /></a></div>
<div style="margin: 0px">By the law of large numbers, the estimator F converges to the definite integral when the number of samples, N, is large enough. Therefore, we can calculate the coefficients of the SH basis functions with the Monte Carlo Estimator.</div>
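<p>As an illustration, the following Python sketch projects a function onto the SH basis with the Monte Carlo Estimator. It is a sketch under assumptions, not the demo's code: directions are sampled uniformly on the sphere (so the pdf is 1/(4π)), only bands 0 and 1 are used to keep it short, and all function names are hypothetical.</p>

```python
import math
import random

def sh_basis_l1(x, y, z):
    # Real SH basis for bands 0 and 1, indexed by l*(l+1)+m.
    return [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x]

def uniform_sphere_sample(rng):
    # Uniform direction on the unit sphere; the pdf is 1 / (4*pi).
    z = 1.0 - 2.0 * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * rng.random()
    return (r * math.cos(phi), r * math.sin(phi), z)

def project_to_sh(f, num_samples=40000, seed=1):
    """Estimate the coefficients c_i = integral of f * Y_i over the sphere."""
    rng = random.Random(seed)
    coeffs = [0.0] * 4
    for _ in range(num_samples):
        d = uniform_sphere_sample(rng)
        value = f(*d)
        for i, basis in enumerate(sh_basis_l1(*d)):
            coeffs[i] += value * basis
    # Dividing every sample by the pdf 1/(4*pi) becomes a factor of 4*pi.
    scale = 4.0 * math.pi / num_samples
    return [c * scale for c in coeffs]

# Projecting a basis function onto the basis should give roughly 1 for its
# own coefficient and roughly 0 for the others, up to Monte Carlo noise.
coeffs = project_to_sh(lambda x, y, z: 0.488603 * z)
```

<p>The noise in each coefficient shrinks with the square root of the sample count, so quadrupling the number of samples roughly halves the error.</p>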
<div style="margin: 0px"><span class="Apple-style-span" style="font-size: large"><strong>Properties of </strong><strong>Spherical Harmonics Function</strong></span></div>
</div>
<div style="margin: 0px">
<div style="margin: 0px">There are 2 important properties of SH functions:</div>
<div style="margin: 0px">First, they are rotationally invariant.</div>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-CZMq0sYWp54/Ttcy92B1MqI/AAAAAAAAAIM/K8GAE4s8r3I/s1600/rotateInvar.png"><img src="http://3.bp.blogspot.com/-CZMq0sYWp54/Ttcy92B1MqI/AAAAAAAAAIM/K8GAE4s8r3I/s200/rotateInvar.png" alt="" width="200" height="81" border="0" /></a></div>
<div style="margin: 0px">Here the rotated function <em><strong>g</strong></em> is still an SH function whose coefficients can be computed from the coefficients of <em><strong>f</strong></em>. For details on rotating general SH functions, refer to the section &#8216;Rotating Spherical Harmonics&#8217; in <a href="http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf">Spherical Harmonics Lighting: The Gritty Details</a>.</div>
<div style="margin: 0px">Second, integrating the product of 2 SH projected functions over the spherical domain gives the dot product of their SH coefficients (because the SH basis functions are orthonormal):</div>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-gXaFDuf-e_Q/Ttc5c8VNVgI/AAAAAAAAAIU/ejvJPrxvIIE/s1600/shIntegrate.png"><img src="http://3.bp.blogspot.com/-gXaFDuf-e_Q/Ttc5c8VNVgI/AAAAAAAAAIU/ejvJPrxvIIE/s200/shIntegrate.png" alt="" width="200" height="65" border="0" /></a></div>
<div style="margin: 0px">This is a nice property: an integral over the spherical domain reduces to a dot product of SH coefficients.</div>
<div style="margin: 0px"><strong><span class="Apple-style-span" style="font-size: large">Lighting with SH functions</span></strong></div>
<div style="margin: 0px">When performing lighting calculations, we need to solve the <a href="http://en.wikipedia.org/wiki/Rendering_equation">rendering equation</a>:</div>
</div>
<div>
<div class="separator" style="clear: both;text-align: center;margin: 0px"><a href="http://1.bp.blogspot.com/-NBT7pFTwG74/TtcCtW1doDI/AAAAAAAAAHc/Zldx8So309I/s1600/renderingEqt.png"><img src="http://1.bp.blogspot.com/-NBT7pFTwG74/TtcCtW1doDI/AAAAAAAAAHc/Zldx8So309I/s400/renderingEqt.png" alt="" width="400" height="143" border="0" /></a></div>
</div>
<p>For shading a Lambertian diffuse surface without shadows, the rendering equation simplifies to:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-QZFI5kd20ck/TtdB89VrGEI/AAAAAAAAAIc/qbvQWYivbzY/s1600/shDiffuse.png"><img src="http://4.bp.blogspot.com/-QZFI5kd20ck/TtdB89VrGEI/AAAAAAAAAIc/qbvQWYivbzY/s400/shDiffuse.png" alt="" width="400" height="148" border="0" /></a></div>
<p>To solve this integral, we can project the functions <em>L</em>(<em><strong>x</strong></em>, <em><strong>ω</strong></em>) and max(N<strong>.</strong><em><strong>ω</strong></em>, 0) into SH functions using Monte Carlo Integration; then, by the second property described above, the integral equals the dot product of the SH coefficients of the 2 projected functions.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Zonal Harmonics</span></strong><br />
If an SH projected function is rotationally symmetric about a fixed axis, it is called a Zonal Harmonics (ZH) function. If this axis is the z-axis, the ZH function depends only on <em><strong>θ</strong></em>, which leaves only one non-zero coefficient in each band, the one with <em><strong>m</strong></em> = 0. Rotating a ZH function is then greatly simplified: when the ZH function is rotated to a new axis <em><strong>d</strong></em>, the coefficients of the rotated SH function equal:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-axoY2LI1vtA/TtdpG0yaspI/AAAAAAAAAIk/GlRcZO6d8Do/s1600/zh2sh.png"><img src="http://4.bp.blogspot.com/-axoY2LI1vtA/TtdpG0yaspI/AAAAAAAAAIk/GlRcZO6d8Do/s320/zh2sh.png" alt="" width="320" height="140" border="0" /></a></div>
<p>This is faster than the general SH rotation. The ZH function is well suited to approximating the function max(N<strong>.</strong><em><strong>ω</strong></em>, 0) in the diffuse surface rendering equation above, since the SH projection of <em>L</em>(<em><strong>x</strong></em>, <em><strong>ω</strong></em>) is usually done in world space, while the cosine lobe of the shaded surface can be re-oriented into the same space to perform the lighting calculation.</p>
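<p>The two steps above (rotating the ZH lobe to the surface normal, then taking a dot product with the light coefficients) can be sketched in Python as follows. This is an assumption-laden sketch, not the demo's shader: the rotation uses the standard ZH formula, where each band's ZH coefficient is scaled by sqrt(4π/(2<em>l</em>+1)) and multiplied by the basis evaluated at the new axis; the ZH coefficients of max(cos θ, 0) are their analytic values for bands 0 to 2; the function names are hypothetical.</p>

```python
import math

def sh_basis(x, y, z):
    # Real SH basis for bands 0..2, indexed by l*(l+1)+m (no Condon-Shortley phase).
    return [
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ]

# Analytic ZH coefficients of max(cos(theta), 0) for bands 0, 1 and 2.
CLAMPED_COSINE_ZH = [
    math.sqrt(math.pi) / 2.0,
    math.sqrt(math.pi / 3.0),
    math.sqrt(5.0 * math.pi) / 8.0,
]

def rotate_zh(zh, direction):
    """Rotate a z-aligned ZH function so that its axis points along direction."""
    basis = sh_basis(*direction)
    coeffs = []
    for l, z_l in enumerate(zh):
        scale = math.sqrt(4.0 * math.pi / (2 * l + 1)) * z_l
        for m in range(-l, l + 1):
            coeffs.append(scale * basis[l * (l + 1) + m])
    return coeffs

def diffuse_irradiance(light_coeffs, normal):
    # The integral of L * max(N.w, 0) becomes a dot product of SH coefficients.
    cosine_coeffs = rotate_zh(CLAMPED_COSINE_ZH, normal)
    return sum(a * b for a, b in zip(light_coeffs, cosine_coeffs))
```

<p>For a constant environment with <em>L</em> = 1, the only non-zero light coefficient is sqrt(4π) in band 0, and <code>diffuse_irradiance</code> returns approximately π for any normal, which matches the integral of the clamped cosine over the sphere.</p>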
<p><strong><span class="Apple-style-span" style="font-size: large">WebGL Demo</span></strong></p>
<div>
<p><a href="http://simonstechblog.blogspot.com/2011/12/spherical-harmonic-lighting.html"> Here</a> is a WebGL demo (which needs a WebGL enabled browser such as Chrome) that uses the cube map below as the light source, projected to SH functions using Monte Carlo Integration.</p>
<div class="separator" style="clear: both;text-align: center"><img src="http://2.bp.blogspot.com/-OjplNQ6PpHI/TtdxkHYE8sI/AAAAAAAAAI0/lCwklmJWFaI/s200/cubeMap.png" alt="" width="200" height="151" border="0" /></div>
<p>Both the white and the blue colors on the model are light from the sun and the blue sky, computed using the SH coefficients generated from the cube map and the ZH coefficients projected from max(N<strong>.</strong><em><strong>ω</strong></em>, 0), which are rotated to world space according to the surface normal. The approximation goes up to band <em><strong>l</strong></em>=2. You can drag in the viewport to rotate the camera. Below is a screenshot of the demo:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://simonstechblog.blogspot.com/2011/12/spherical-harmonic-lighting.html"><img src="http://4.bp.blogspot.com/-HX6HT2oGYt4/TteICsLw5gI/AAAAAAAAAI8/rk5eaaOb4s4/s320/scrShot.png" alt="" width="320" height="180" border="0" /></a></div>
<p>The source code of the WebGL demo can be downloaded <a href="https://sites.google.com/site/simontechblog/home/sh-lighting/shLighting.js">here</a>.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
SH functions can approximate the rendering equation with only a few coefficients, and evaluating the lighting at run time takes only a simple dot product. The disadvantage is that SH can only approximate low frequency functions well, since a large number of bands is needed to represent high frequency details.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Reference</span></strong><br />
<span class="Apple-style-span" style="font-size: x-small">[1] Spherical Harmonics Lighting: The Gritty Details: <a href="http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf">http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[2] Stupid Spherical Harmonics (SH) Tricks: <a href="http://www.ppsloan.org/publications/StupidSH36.pdf">http://www.ppsloan.org/publications/StupidSH36.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[3] Physically Based Rendering: <a href="http://www.amazon.com/gp/product/0123750792/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=486539851&amp;pf_rd_s=lpo-top-stripe-1&amp;pf_rd_t=201&amp;pf_rd_i=012553180X&amp;pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_r=09AG8FQQWKJHC2AEFPD1">http://www.amazon.com/gp/product/0123750792/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=486539851&amp;pf_rd_s=lpo-top-stripe-1&amp;pf_rd_t=201&amp;pf_rd_i=012553180X&amp;pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_r=09AG8FQQWKJHC2AEFPD1</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[4] Sky box texture downloaded from: <a href="http://www.codemonsters.de/home/content.php?show=cubemaps">http://www.codemonsters.de/home/content.php?show=cubemaps</a></span></p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/12/01/spherical-harmonic-lighting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSAO using Line Integrals</title>
		<link>http://www.altdevblogaday.com/2011/06/19/ssao-using-line-integrals/</link>
		<comments>http://www.altdevblogaday.com/2011/06/19/ssao-using-line-integrals/#comments</comments>
		<pubDate>Sun, 19 Jun 2011 08:59:32 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SSAO]]></category>

		<guid isPermaLink="false">http://altdevblogaday.org/?p=8935</guid>
		<description><![CDATA[<p>Hi everyone, this is my first post in <a href="http://altdevblogaday.com/">#AltDevBlogADay</a>. Let me introduce myself first, I am Simon Yeung, currently working as a game programmer. I like graphics programming and sometimes write iPhone apps.</p>
<p><a href="http://www.altdevblogaday.com/2011/06/19/ssao-using-line-integrals/" class="more-link">Read more on SSAO using Line Integrals&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Hi everyone, this is my first post in <a href="http://altdevblogaday.com/">#AltDevBlogADay</a>. Let me introduce myself first, I am Simon Yeung, currently working as a game programmer. I like graphics programming and sometimes write iPhone apps.</p>
<p>This time, I would like to talk about the SSAO implemented in my little demo program. I wrote this demo because I spend most of my time using OpenGL and know little about DirectX, so I decided to learn DX by writing it. As a result, it is not well optimized.</p>
<p>SSAO, short for Screen Space Ambient Occlusion, is a technique for approximating the indirect shadows cast by surrounding scene geometry; it works in screen space by sampling from the depth buffer.</p>
<p>The SSAO is implemented using the line integrals from <a href="http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf">&#8220;Rendering techniques in Toy Story 3&#8221;</a><span style="font-size: xx-small">[1]</span>. Here are my results:</p>
<p style="text-align: center"><a href="http://1.bp.blogspot.com/-L_P7MjGNzmI/TfezQ122MuI/AAAAAAAAAAc/sHrX50aPMfU/s1600/withSSAO.PNG"><img class="aligncenter" src="http://1.bp.blogspot.com/-L_P7MjGNzmI/TfezQ122MuI/AAAAAAAAAAc/sHrX50aPMfU/s320/withSSAO.PNG" alt="" width="320" height="180" border="0" /></a> With SSAO</p>
<p style="text-align: center"><a href="http://2.bp.blogspot.com/-_aWeoUPLBOY/TfezoubCsYI/AAAAAAAAAAg/Ch7rc6RXqSI/s1600/withoutSSAO.PNG"><img class="aligncenter" src="http://2.bp.blogspot.com/-_aWeoUPLBOY/TfezoubCsYI/AAAAAAAAAAg/Ch7rc6RXqSI/s320/withoutSSAO.PNG" alt="" width="320" height="180" border="0" /></a> Without SSAO</p>
<p style="text-align: center"><a href="http://2.bp.blogspot.com/-PpbvzDfJFz0/Tfe0fx2kCNI/AAAAAAAAAAk/wytvMvMRrdQ/s1600/SSAOTexture.PNG"><img class="aligncenter" src="http://2.bp.blogspot.com/-PpbvzDfJFz0/Tfe0fx2kCNI/AAAAAAAAAAk/wytvMvMRrdQ/s320/SSAOTexture.PNG" alt="" width="320" height="180" border="0" /></a> SSAO texture</p>
<p style="text-align: left">Their method calculates the volume occluded by other objects inside a sphere at each fragment by sampling from the depth buffer.</p>
<p style="text-align: center"><a href="http://4.bp.blogspot.com/-RJc2HBD_5LA/Tfe226fyzeI/AAAAAAAAAAo/SJLGMt3uHgU/s1600/slide22.png"><img src="http://4.bp.blogspot.com/-RJc2HBD_5LA/Tfe226fyzeI/AAAAAAAAAAo/SJLGMt3uHgU/s320/slide22.png" alt="" width="320" height="181" border="0" /></a><br />
From Slide 22 of the paper</p>
<p>The volume of the sphere is found by using the equation:</p>
<p style="text-align: center"><a href="http://1.bp.blogspot.com/-LtzNOc1bTQw/Tfe4pQcitxI/AAAAAAAAAAs/xbVqDIckhQM/s1600/slide51.png"><img src="http://1.bp.blogspot.com/-LtzNOc1bTQw/Tfe4pQcitxI/AAAAAAAAAAs/xbVqDIckhQM/s320/slide51.png" alt="" width="320" height="180" border="0" /></a><br />
From Slide 51 of the paper</p>
<p>They use a Voronoi diagram to determine the fraction of the volume associated with each sample point of their predefined sampling pattern.</p>
<p>In my implementation, I didn&#8217;t use the Voronoi diagram; instead, I tried to calculate the volume occupied by each depth sample using that equation in the pixel shader. However, due to the perspective projection, that equation no longer holds: the ray does not form a right-angled triangle as in the figure above, which results in the artifact below (on the wall on the right side):<br />
<a href="http://1.bp.blogspot.com/-UN_rSIo5iBg/Tfe9lczyFwI/AAAAAAAAAAw/Rpvw6HMCQVU/s1600/a1.PNG"><img class="aligncenter" src="http://1.bp.blogspot.com/-UN_rSIo5iBg/Tfe9lczyFwI/AAAAAAAAAAw/Rpvw6HMCQVU/s320/a1.PNG" alt="" width="320" height="180" border="0" /></a></p>
<p style="text-align: center">The artifact on the wall on the right side</p>
<p>So, I tried to solve the problem by using ray-sphere intersection to calculate more accurate line integrals.</p>
<p><a href="http://1.bp.blogspot.com/-nkNjFo6_Hxk/TfzsBQU0EuI/AAAAAAAAABE/9kGr_Mbc2GY/s1600/raySphere.png"><img class="aligncenter" src="http://1.bp.blogspot.com/-nkNjFo6_Hxk/TfzsBQU0EuI/AAAAAAAAABE/9kGr_Mbc2GY/s320/raySphere.png" alt="" width="320" height="320" border="0" /></a></p>
<p>For example, when calculating the occlusion volume for the black cross in the diagram above (using 2 depth samples for ease of explanation), I need to compute the lengths L1 and L2 by solving ray-sphere intersections. The lengths O1 and O2 can be computed by sampling from the depth buffer. The volume of the sphere can then be approximated by L1+L2 and the occlusion volume by O1+O2. (I also apply a distance attenuation factor to O1 and O2 if the depth difference is too large, so that the tank does not occlude the wall in my demo program.) The AO value is then (O1+O2)/(L1+L2). This eliminates the artifacts:</p>
<p><a href="http://4.bp.blogspot.com/-kWcU4aI8gMc/Tfu6NGYOC-I/AAAAAAAAAA0/u-dG68JvPyQ/s1600/a2.PNG"><img class="aligncenter" src="http://4.bp.blogspot.com/-kWcU4aI8gMc/Tfu6NGYOC-I/AAAAAAAAAA0/u-dG68JvPyQ/s320/a2.PNG" alt="" width="320" height="180" border="0" /></a></p>
<p style="text-align: center">Solving ray-sphere intersection to eliminate the artifacts</p>
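<p>The ray-sphere version of the line integral can be sketched in Python roughly as below. This is not the demo's pixel shader; it is a minimal sketch under stated assumptions: the camera sits at the origin, each sample is a unit ray direction plus the depth-buffer distance along that ray, the occluded length is taken as the part of the chord that lies behind the depth-buffer surface, and the distance attenuation factor is left out. All names are hypothetical.</p>

```python
import math

def ray_sphere_chord(direction, center, radius):
    """Intersect a ray from the origin (unit direction) with a sphere.

    Returns (t_enter, t_exit) along the ray, or None if the ray misses.
    """
    b = sum(d * c for d, c in zip(direction, center))
    c = sum(v * v for v in center) - radius * radius
    disc = b * b - c
    if not disc > 0.0:
        return None
    root = math.sqrt(disc)
    return (b - root, b + root)

def ambient_occlusion(center, radius, samples):
    """samples: list of (unit ray direction, depth-buffer distance along the ray)."""
    total_length = 0.0    # sum of the chord lengths L_i
    total_occluded = 0.0  # sum of the occluded lengths O_i
    for direction, surface_t in samples:
        hit = ray_sphere_chord(direction, center, radius)
        if hit is None:
            continue
        t_enter, t_exit = hit
        length = t_exit - t_enter
        # Assumption: everything behind the depth-buffer surface is solid.
        occluded = min(max(t_exit - surface_t, 0.0), length)
        total_length += length
        total_occluded += occluded
    return total_occluded / total_length if total_length > 0.0 else 0.0
```

<p>Because each chord length comes from the actual ray through the sample, the ratio stays consistent under perspective projection, which is the point of replacing the closed-form sphere equation.</p>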
<p>The demo program uses 8 depth samples for each fragment. To fake a higher sample count, I also tried rotating the sample points as suggested by the paper, which gives the AO a softer look:</p>
<p style="text-align: center"><a href="http://3.bp.blogspot.com/-GTix-kovkDA/Tfu7Jn213lI/AAAAAAAAAA4/SGpwgIJeGZU/s1600/a3.PNG"><img class="aligncenter" src="http://3.bp.blogspot.com/-GTix-kovkDA/Tfu7Jn213lI/AAAAAAAAAA4/SGpwgIJeGZU/s320/a3.PNG" alt="" width="320" height="180" border="0" /></a> Rotating the sample points</p>
<p style="text-align: left">Then, a bilateral blur is applied to smooth out the noise. Although a bilateral blur is not separable, it is faster to divide it into 2 passes (i.e. 1 horizontal and 1 vertical, just like a Gaussian blur), with 5 samples for each pass, which gives a softer result:<br />
<a href="http://1.bp.blogspot.com/-MFsmpMl0QYM/Tfu7cbHbY9I/AAAAAAAAAA8/16REii6b6FY/s1600/a4.PNG"><img class="aligncenter" src="http://1.bp.blogspot.com/-MFsmpMl0QYM/Tfu7cbHbY9I/AAAAAAAAAA8/16REii6b6FY/s320/a4.PNG" alt="" width="320" height="180" border="0" /></a></p>
<p style="text-align: center">After bilateral blur</p>
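<p>One pass of such a blur could look like the Python sketch below, run once along rows and once along columns. The weights are assumptions, since the post does not give them: a Gaussian on the tap offset multiplied by a Gaussian on the depth difference, with hypothetical sigma values. A radius of 2 gives the 5 samples per pass mentioned above.</p>

```python
import math

def bilateral_blur_1d(ao, depth, radius=2, sigma_space=1.5, sigma_depth=0.1):
    """Blur one row (or column) of AO values while respecting depth edges."""
    n = len(ao)
    result = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for offset in range(-radius, radius + 1):
            j = min(max(i + offset, 0), n - 1)  # clamp taps to the border
            w_space = math.exp(-(offset * offset) / (2.0 * sigma_space ** 2))
            dz = depth[j] - depth[i]
            # Taps across a depth discontinuity get almost no weight.
            w_depth = math.exp(-(dz * dz) / (2.0 * sigma_depth ** 2))
            total += ao[j] * w_space * w_depth
            weight_sum += w_space * w_depth
        # The center tap always has weight 1, so weight_sum is never zero.
        result.append(total / weight_sum)
    return result
```

<p>Running the two 1D passes is only an approximation of the full 2D bilateral filter, but it needs 10 taps per pixel instead of 25.</p>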
<p>Finally, the SSAO texture is blended with the scene:</p>
<p><a href="http://3.bp.blogspot.com/-gMDVEAinYUk/Tfu7zAEGaXI/AAAAAAAAABA/yPibLK2pMx8/s1600/a5.PNG"><img class="aligncenter" src="http://3.bp.blogspot.com/-gMDVEAinYUk/Tfu7zAEGaXI/AAAAAAAAABA/yPibLK2pMx8/s320/a5.PNG" alt="" width="320" height="180" border="0" /></a></p>
<p style="text-align: center">Applying SSAO to the scene</p>
<p>In conclusion, I finished the SSAO, but it is not optimized, and several places can be improved. For example, when calculating the line integrals, I used several branches in the pixel shader, which slows it down a lot. I also rotate the sample points by calculating a rotation angle from the fragment position in the pixel shader, which could be replaced by a pre-computed rotation angle texture as the paper suggests. Since the main purpose of this demo was to become familiar with DirectX, I left these optimizations as future enhancements.</p>
<p>&nbsp;</p>
<p>References:</p>
<p style="text-align: left">[1]: Rendering techniques in Toy Story 3: http://advances.realtimerendering.com/s2010/index.html</p>
<p style="text-align: left">[2]: The brick texture is obtained from Crytek&#8217;s Sponza Model: http://crytek.com/cryengine/cryengine3/downloads</p>
<p style="text-align: left">[3]: The tank model is obtained from an XNA demo project: http://create.msdn.com/en-US/education/catalog/?contenttype=0&amp;devarea=0&amp;platform=21&amp;sort=1</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/06/19/ssao-using-line-integrals/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
