<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>#AltDevBlogADay &#187; graphics</title>
	<atom:link href="http://www.altdevblogaday.com/tag/graphics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.altdevblogaday.com</link>
	<description>Each day a little more #gamedev love</description>
	<lastBuildDate>Mon, 20 May 2013 21:33:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Implementing Voxel Cone Tracing</title>
		<link>http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/</link>
		<comments>http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/#comments</comments>
		<pubDate>Thu, 31 Jan 2013 14:22:12 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[global illumination]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sparse voxel octree]]></category>
		<category><![CDATA[voxel cone tracing]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=29117</guid>
		<description><![CDATA[<p>[<b>Updated on 25-2-2013</b>: added a paragraph about lowering the voxel update frequency for faster dynamic updates. The <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a> now also has a combo box for choosing the update frequency.]</p>
<p><b><span class="Apple-style-span" style="font-size: large">Introduction</span></b></p>
<p>At last year&#8217;s SIGGRAPH, Epic Games presented their <a href="http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf">real-time GI solution</a>, which is based on <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing</a>. The results looked good enough that I decided to implement the technique myself. My implementation runs at around 22~30fps (updated every frame) at 1024&#215;768 screen resolution using a 256x256x256 voxel volume on my GTX460 graphics card. The demo program, which requires a DX11 GPU to run, can be downloaded <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">here</a>.</p>
<p><a href="http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/" class="more-link">Read more on Implementing Voxel Cone Tracing&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>[<b>Updated on 25-2-2013</b>: added a paragraph about lowering the voxel update frequency for faster dynamic updates. The <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a> now also has a combo box for choosing the update frequency.]</p>
<p><b><span class="Apple-style-span" style="font-size: large">Introduction</span></b></p>
<p>At last year&#8217;s SIGGRAPH, Epic Games presented their <a href="http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf">real-time GI solution</a>, which is based on <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing</a>. The results looked good enough that I decided to implement the technique myself. My implementation runs at around 22~30fps (updated every frame) at 1024&#215;768 screen resolution using a 256x256x256 voxel volume on my GTX460 graphics card. The demo program, which requires a DX11 GPU to run, can be downloaded <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">here</a>.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://3.bp.blogspot.com/-XaUDldta9LI/UQVHxULKiOI/AAAAAAAAAhQ/-4Sb64oMMRU/s1600/gi0.png"><img alt="" src="http://3.bp.blogspot.com/-XaUDldta9LI/UQVHxULKiOI/AAAAAAAAAhQ/-4Sb64oMMRU/s1600/gi0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">With GI</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-tjaW2p5zK5o/UQVH2qHmiSI/AAAAAAAAAhY/C84ROUJDIzk/s1600/gi1.png"><img alt="" src="http://1.bp.blogspot.com/-tjaW2p5zK5o/UQVH2qHmiSI/AAAAAAAAAhY/C84ROUJDIzk/s1600/gi1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Without GI</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Overview</span></b></p>
<p>There are 5 major steps in voxel cone tracing:</p>
<blockquote><p>0. Given a scene with direct lighting only<br />
1. Voxelize the triangle meshes<br />
2. Construct the sparse voxel octree<br />
3. Inject direct lighting into the octree<br />
4. Filter the direct lighting to generate the mip-maps<br />
5. Sample the mip-mapped values by cone tracing</p></blockquote>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-MO4f_YWl9wY/UQVH9eQDFkI/AAAAAAAAAhg/5_DCCzljCrE/s1600/ov0.png"><img alt="" src="http://2.bp.blogspot.com/-MO4f_YWl9wY/UQVH9eQDFkI/AAAAAAAAAhg/5_DCCzljCrE/s1600/ov0.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 0</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-P2BVPwilQsE/UQVIEgGhRoI/AAAAAAAAAho/XdXtwZ_EMmw/s1600/ov1.png"><img alt="" src="http://4.bp.blogspot.com/-P2BVPwilQsE/UQVIEgGhRoI/AAAAAAAAAho/XdXtwZ_EMmw/s1600/ov1.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 1</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-IOAxp7niW_s/UQVII8ur4OI/AAAAAAAAAhw/OJdMZ6trHjk/s1600/ov2.png"><img alt="" src="http://1.bp.blogspot.com/-IOAxp7niW_s/UQVII8ur4OI/AAAAAAAAAhw/OJdMZ6trHjk/s1600/ov2.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 2</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-_Y2TWCOlTxs/UQVINT1T_iI/AAAAAAAAAh4/4NGeItw8SIs/s1600/ov3.png"><img alt="" src="http://1.bp.blogspot.com/-_Y2TWCOlTxs/UQVINT1T_iI/AAAAAAAAAh4/4NGeItw8SIs/s1600/ov3.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 3</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-f-2JVZ5PLiU/UQVIRhWUIpI/AAAAAAAAAiA/2njRLi2e19c/s1600/ov4.png"><img alt="" src="http://2.bp.blogspot.com/-f-2JVZ5PLiU/UQVIRhWUIpI/AAAAAAAAAiA/2njRLi2e19c/s1600/ov4.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 4</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-6oxrpji5wd4/UQVIW5WFmQI/AAAAAAAAAiI/QiA2Zv9viZc/s1600/ov5.png"><img alt="" src="http://4.bp.blogspot.com/-6oxrpji5wd4/UQVIW5WFmQI/AAAAAAAAAiI/QiA2Zv9viZc/s1600/ov5.png" width="200" height="155" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Step 5</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Steps 1 and 2 need to be done only once for static geometry, while steps 3 to 5 need to be done every frame. The following sections briefly describe these steps. You may want to take a look at the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">original paper</a> first, as some of its details are not repeated here.</p>
<div>
<p><b><span class="Apple-style-span" style="font-size: large">Voxelization pass</span></b></p>
<p>The first step is to voxelize the scene. I first output all the voxels from the triangle meshes into a big voxel fragment queue buffer. Each voxel fragment uses 16 bytes:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-bilhjuv67FA/UQVKbBjDjiI/AAAAAAAAAis/Oq_3yEp_C4g/s1600/voxelDataFragmentLayout.png"><img alt="" src="http://4.bp.blogspot.com/-bilhjuv67FA/UQVKbBjDjiI/AAAAAAAAAis/Oq_3yEp_C4g/s1600/voxelDataFragmentLayout.png" width="197" height="200" border="0" /></a></div>
<p>The first 4 bytes store the position of the voxel inside the voxel volume. The volume is at most 512 voxels along each axis, so each coordinate needs 9 bits and 4 bytes are enough to store the XYZ coordinates.</p>
<p>Voxel fragments are created using the conservative rasterization described in this <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights</a> chapter, which requires only 1 geometry pass by enlarging each triangle a bit in the geometry shader. So I modified my <a href="http://simonstechblog.blogspot.hk/2012/08/shader-generator.html">shader generator</a> to generate the shaders for creating voxel fragments based on the material.</p>
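<p>As a rough illustration of why 4 bytes are enough, here is a minimal sketch in Python (not the HLSL used in the demo). The bit order, with X in the lowest 9 bits, and the function names are assumptions made for this example; the actual layout is the one shown in the figure above.</p>

```python
# Hypothetical packing: 9 bits per axis (values 0..511), so 27 of the 32 bits are used.
AXIS_BITS = 9
AXIS_MASK = (1 << AXIS_BITS) - 1  # 511


def pack_voxel_position(x, y, z):
    """Pack three voxel coordinates (each 0..511) into one 32-bit integer."""
    for value in (x, y, z):
        if value < 0 or value > AXIS_MASK:
            raise ValueError("coordinate out of range for a 512^3 volume")
    return x | (y << AXIS_BITS) | (z << (2 * AXIS_BITS))


def unpack_voxel_position(packed):
    """Inverse of pack_voxel_position: returns (x, y, z)."""
    return (packed & AXIS_MASK,
            (packed >> AXIS_BITS) & AXIS_MASK,
            (packed >> (2 * AXIS_BITS)) & AXIS_MASK)
```

<p>Packing the largest coordinate (511, 511, 511) this way still leaves the top 5 bits of the 32-bit word unused.</p>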
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-NRyrOzHslBw/UQVKjsrh-QI/AAAAAAAAAi0/cn2y1bzi8u8/s1600/vp0.png"><img alt="" src="http://1.bp.blogspot.com/-NRyrOzHslBw/UQVKjsrh-QI/AAAAAAAAAi0/cn2y1bzi8u8/s1600/vp0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Scene before voxelization</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-LihpATI6lKk/UQVKn_Kr5mI/AAAAAAAAAi8/CwrITUtsKUs/s1600/vp1.png"><img alt="" src="http://4.bp.blogspot.com/-LihpATI6lKk/UQVKn_Kr5mI/AAAAAAAAAi8/CwrITUtsKUs/s1600/vp1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Scene after voxelization</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Octree building pass</span></b></p>
<p>I use a data structure similar to the one described in the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing paper</a> and the <a href="http://maverick.inria.fr/Publications/2011/Cra11/">giga voxel paper</a>: a large octree node buffer storing the octree/voxel node data (with 8 nodes grouped into 1 tile, as described in the paper) and a 3D texture storing the reflected radiance and alpha from the direct lighting at each voxel. I assume all surfaces reflect light diffusely, so only one 3D texture is used, unlike the paper, which stores incoming radiance.</p>
<p>Each octree node is 28 bytes: the first 4 bytes store the child tile index, and the other 24 bytes store the byte offsets of the neighbor nodes from the start of the octree node buffer:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-s_YGclOlD4M/UQVLH2S0e8I/AAAAAAAAAjE/kLbz3nVoffQ/s1600/octreeNodeDataLayout.png"><img alt="" src="http://3.bp.blogspot.com/-s_YGclOlD4M/UQVLH2S0e8I/AAAAAAAAAjE/kLbz3nVoffQ/s1600/octreeNodeDataLayout.png" width="187" height="320" border="0" /></a></div>
<p>The child node tile index is also used to index the 3D texture brick (more details about texture bricks can be found in the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">voxel cone tracing paper</a>), which stores the reflected radiance of that octree node tile. Since I use 5 bits to store various bit flags for dynamic updates, my voxel volume can be at most 512x512x512.</p>
<p>An octree leaf node stores the voxel data directly, which uses only 16 bytes:</p>
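<p>To make the 28-byte node concrete, here is a minimal Python sketch of one possible encoding. It assumes seven little-endian 32-bit words, six neighbor offsets (one per face, inferred from the 24 bytes), and the 5 flag bits sitting in the top bits of the child word. These choices and all names are assumptions for illustration only; the real layout is the one in the figure above.</p>

```python
import struct

# Assumed layout: 1 child word + 6 neighbor byte offsets, 4 bytes each = 28 bytes.
NODE_STRUCT = struct.Struct("<7I")
FLAG_BITS = 5
INDEX_BITS = 32 - FLAG_BITS            # 27 bits remain for the child tile index
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_node(child_tile_index, flags, neighbor_offsets):
    """Encode one octree node as 28 bytes."""
    if len(neighbor_offsets) != 6:
        raise ValueError("expected one byte offset per face neighbor (6)")
    if child_tile_index > INDEX_MASK or flags >= (1 << FLAG_BITS):
        raise ValueError("child tile index or flags do not fit")
    child_word = (flags << INDEX_BITS) | child_tile_index
    return NODE_STRUCT.pack(child_word, *neighbor_offsets)


def unpack_node(data):
    """Decode 28 bytes into (child_tile_index, flags, neighbor_offsets)."""
    child_word, *neighbors = NODE_STRUCT.unpack(data)
    return child_word & INDEX_MASK, child_word >> INDEX_BITS, neighbors
```

<p>With 5 bits taken by flags, 27 bits remain for the index, which matches the 9 bits per axis of a 512x512x512 volume.</p>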
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-23Op1U3D4sQ/UQVLPPlwgmI/AAAAAAAAAjM/eg7h4EHjbpk/s1600/octreeLeafNodeDataLayout.png"><img alt="" src="http://4.bp.blogspot.com/-23Op1U3D4sQ/UQVLPPlwgmI/AAAAAAAAAjM/eg7h4EHjbpk/s1600/octreeLeafNodeDataLayout.png" width="199" height="320" border="0" /></a></div>
<p>The first 4 bytes store the bit flag that indicates whether the voxel was created from a static mesh, so that it will not be overwritten by dynamic geometry. The 1-byte counter stored along with the voxel normal is used to perform averaging when different voxel fragments fall into the same voxel. The steps to perform the atomic average can be found in the <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights chapter</a>.</p>
<p>So, given the voxel fragment queue output by the previous step, we can build the octree using the steps described in the <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights chapter</a> and average the octree leaf node values when different voxel fragments fall into the same node.</p>
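<p>The role of the counter can be sketched as a running average. The Python below is a sequential, CPU-side illustration only: on the GPU the same update has to be done atomically, as the chapter explains. The function name and the choice to stop accumulating once the 8-bit counter saturates are assumptions made for this example.</p>

```python
def accumulate_average(current, count, new_value):
    """Fold one more voxel fragment value into a running average.

    current:   averaged value so far (tuple of floats)
    count:     number of fragments already averaged (fits in 1 byte)
    new_value: value of the incoming fragment
    Returns the updated (average, count) pair.
    """
    if count >= 255:
        # Assumption: once the 1-byte counter is full, ignore further fragments.
        return current, count
    new_count = count + 1
    averaged = tuple((c * count + v) / new_count
                     for c, v in zip(current, new_value))
    return averaged, new_count
```

<p>Feeding the fragments (1, 0, 0) and (0, 1, 0) into an empty voxel one after the other leaves it with the average (0.5, 0.5, 0) and a counter of 2.</p>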
<div>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-x5iweSNn5AE/UQVLpqY6fPI/AAAAAAAAAjU/yROoR5S8Jjw/s1600/op0.png"><img alt="" src="http://2.bp.blogspot.com/-x5iweSNn5AE/UQVLpqY6fPI/AAAAAAAAAjU/yROoR5S8Jjw/s1600/op0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">showing highest octree level</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-gaewbHxL1YA/UQVLuId7ykI/AAAAAAAAAjc/C4CXQRKQe0o/s1600/op1.png"><img alt="" src="http://1.bp.blogspot.com/-gaewbHxL1YA/UQVLuId7ykI/AAAAAAAAAjc/C4CXQRKQe0o/s1600/op1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">showing the octree with the voxels</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Inject direct lighting pass</span></b></p>
<p>After building the voxelized scene, we need to add the lighting data to the data structure in order to calculate global illumination. First, we render the shadow map from the light&#8217;s point of view. Then, for each texel of the shadow map, we reconstruct its world position and traverse down the octree to calculate the reflected radiance (assuming the surface reflects diffusely) and write it to the 3D texture bricks.</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-vbpJe1TgxS4/UQVMStDZM2I/AAAAAAAAAjk/BTCl1_Kjcdo/s1600/ip0.png"><img alt="" src="http://2.bp.blogspot.com/-vbpJe1TgxS4/UQVMStDZM2I/AAAAAAAAAjk/BTCl1_Kjcdo/s1600/ip0.png" width="320" height="248" border="0" /></a></div>
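<p>The lookup from a reconstructed world position to a voxel can be sketched as follows. This is a minimal Python illustration, assuming a cubic, axis-aligned voxel volume and an octant numbering of x + 2y + 4z; the function names and parameters are hypothetical and are not taken from the demo.</p>

```python
def world_to_voxel(world_pos, volume_min, volume_size, resolution):
    """Map a world position to integer voxel coordinates, or None if outside."""
    coords = []
    for p, lo in zip(world_pos, volume_min):
        t = (p - lo) / volume_size      # normalized position inside the volume
        if t < 0.0 or t >= 1.0:
            return None                 # this shadow map texel misses the volume
        coords.append(int(t * resolution))
    return tuple(coords)


def octant_path(voxel, levels):
    """Child octant (0..7) to follow at each octree level, root first."""
    x, y, z = voxel
    path = []
    for shift in range(levels - 1, -1, -1):
        bx, by, bz = (x >> shift) & 1, (y >> shift) & 1, (z >> shift) & 1
        path.append(bx | (by << 1) | (bz << 2))
    return path
```

<p>For a 256x256x256 volume the octree has 8 levels below the root, so each shadow map texel follows a path of 8 octants before it reaches the leaf that receives the light.</p>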
<p>In my engine, I use a cascaded shadow map with 4 cascades. The last cascade is used to perform the light injection, with the slope-scaled depth bias disabled; otherwise, the reconstructed world position may not be located exactly inside the voxel. It is also better to stabilize the shadow map (in the demo, the position of the directional light is computed from the view camera) in order to avoid flickering during the cone tracing step when the camera moves. But after stabilizing the shadow map, not all the voxels are filled with lighting data at 512x512x512 voxel resolution, as the shadow map is not fully utilized&#8230;</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-Yh_KS8-tplc/UQVMYUaxPeI/AAAAAAAAAjs/t2Gkjd8OqK4/s1600/ip1.png"><img alt="" src="http://1.bp.blogspot.com/-Yh_KS8-tplc/UQVMYUaxPeI/AAAAAAAAAjs/t2Gkjd8OqK4/s1600/ip1.png" width="320" height="248" border="0" /></a></div>
<div class="separator" style="clear: both;text-align: center"></div>
<p>This artifact also occurs when the shadow map resolution is low. Consider the figure below, where a directional light injects lighting into a scene consisting of a wall and a floor.</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-SKgY0-mICw8/UQaSfPcnP6I/AAAAAAAAAoM/8GJL_MwD6yw/s1600/smLowRes.png"><img alt="" src="http://4.bp.blogspot.com/-SKgY0-mICw8/UQaSfPcnP6I/AAAAAAAAAoM/8GJL_MwD6yw/s1600/smLowRes.png" width="157" height="200" border="0" /></a></div>
<p>As we launch one thread per shadow map texel to determine which voxel gets light injected, we can only inject light into 3 voxels (the white squares in the figure) in the above case. If the shadow map resolution were high enough, all 5 vertical voxels would be injected with light.</p>
<p><b><span class="Apple-style-span" style="font-size: large">Filtering pass</span></b></p>
<p>To perform the cone tracing step, we need to filter the lighting data stored at the leaf nodes of the octree. The voxel lighting data is filtered anisotropically along the positive and negative XYZ directions. For example, in the 2D case (which is easier to explain and very similar to the 3D case), each node tile has 5&#215;5 voxels, which are filtered to 3&#215;3 voxels in the upper mip level like this:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-aBit6SzNDeM/UQVMeL4G6eI/AAAAAAAAAj0/IdCtwuWiRV0/s1600/filterVoxel25to9.png"><img alt="" src="http://2.bp.blogspot.com/-aBit6SzNDeM/UQVMeL4G6eI/AAAAAAAAAj0/IdCtwuWiRV0/s1600/filterVoxel25to9.png" width="320" height="150" border="0" /></a></div>
<p>To filter the center voxel (texel E in the figure) of the node tile in the upper mip level along the +X direction, we need to consider 9 texels (texels g, h, i, l, m, n, q, r, s in the figure) in the lower level. The texels are divided into 4 groups, as in the figure below:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-EjBColCh7AI/UQVMlIEbo3I/AAAAAAAAAj8/aegaQqszbuY/s1600/divide3x3into4gp.png"><img alt="" src="http://2.bp.blogspot.com/-EjBColCh7AI/UQVMlIEbo3I/AAAAAAAAAj8/aegaQqszbuY/s1600/divide3x3into4gp.png" width="320" height="159" border="0" /></a></div>
<p>In each group, we filter along the +X direction. For example, in the upper left group, we first alpha blend the values along the +X direction and then average the results to get the value for that group.</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-tTuW5WNU9vM/UQVMsbmYOAI/AAAAAAAAAkE/9XljOpicSc0/s1600/filterBlending.png"><img alt="" src="http://3.bp.blogspot.com/-tTuW5WNU9vM/UQVMsbmYOAI/AAAAAAAAAkE/9XljOpicSc0/s1600/filterBlending.png" width="200" height="175" border="0" /></a></div>
<p>After calculating the values for all 4 groups, we compute the filtered center voxel value by repeating the above steps on the 4 group values.</p>
<p>But for the corner values (e.g. texels a, b, f, g) in a node tile, we cannot access all 9 texels to compute a filtered value, as some of those values are in the neighbor node tiles. So the group values are partially computed and stored in the 3D texture first, and then we rely on the transfer step described in the <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">cone tracing paper</a> to complete the calculation. In other words, during the transfer step, some of the filtered values are computed by first alpha blending the neighbor group values and then averaging them, while other filtered values are computed by averaging first and then alpha blending. The two operations do not commute, so the results differ slightly, but this reduces the number of dispatch passes needed to compute the filtered values while giving a similar result.</p>
<p>So we have 6 directional values for each filtered voxel. To get a sample from it for a particular direction, we can sample it like an <a href="http://www.valvesoftware.com/publications/2006/SIGGRAPH06_Course_ShadingInValvesSourceEngine.pdf">ambient cube</a> with 3 texture reads (the following code is simplified):</p>
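<p>The blend-then-average step for one 2D group can be sketched like this. It is a minimal Python illustration which assumes premultiplied (r, g, b, a) values and assumes that, when filtering along +X, the texel with the smaller X is the one in front. Both are assumptions for this example, and the helper names are hypothetical.</p>

```python
def blend_over(front, back):
    """Front-to-back alpha blend of two premultiplied (r, g, b, a) values."""
    transmittance = 1.0 - front[3]
    return tuple(f + transmittance * b for f, b in zip(front, back))


def average(values):
    """Component-wise mean of a list of (r, g, b, a) values."""
    n = len(values)
    return tuple(sum(channel) / n for channel in zip(*values))


def filter_group_plus_x(group):
    """Filter a 2x2 group along +X; group[row][col], col increasing along +X."""
    # Blend the two texels of each row along X first, then average the rows.
    blended_rows = [blend_over(row[0], row[1]) for row in group]
    return average(blended_rows)
```

<p>Applying <code>filter_group_plus_x</code> again to the 4 group results, arranged as a 2x2 group, gives the filtered center value in this 2D setting.</p>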
<pre>
// Samples the 6 directional values of a filtered voxel like an ambient cube,
// weighting each axis by the squared component of the (assumed normalized) direction.
float3 sampleAnisotropic(float3 direction)
{
    float3 nSquared = direction * direction;
    // 0 selects the positive-axis value, 1 the negative-axis value
    uint3 isNegative = ( direction &lt; 0.0 );
    float3 filteredColor =
        nSquared.x * anisotropicFilteredBrickValue[isNegative.x] +
        nSquared.y * anisotropicFilteredBrickValue[isNegative.y+2] +
        nSquared.z * anisotropicFilteredBrickValue[isNegative.z+4];
    return filteredColor;
}
</pre>
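<p>The same lookup can be tried on the CPU with a small Python sketch. It assumes the six directional values are ordered +X, -X, +Y, -Y, +Z, -Z, which is what the indexing in the shader code above suggests, and that the direction is normalized; the function name is hypothetical.</p>

```python
def sample_anisotropic(direction, brick_values):
    """Ambient-cube style lookup.

    direction:    normalized (x, y, z) tuple
    brick_values: six RGB tuples, assumed ordered +X, -X, +Y, -Y, +Z, -Z
    """
    result = [0.0, 0.0, 0.0]
    for axis, component in enumerate(direction):
        weight = component * component            # squared component of the direction
        index = 2 * axis + (1 if component < 0.0 else 0)
        for channel in range(3):
            result[channel] += weight * brick_values[index][channel]
    return tuple(result)
```

<p>Because the squared components of a normalized direction sum to 1, the result is a weighted average of at most three of the six values.</p>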
<p><b><span class="Apple-style-span" style="font-size: large">Voxel Cone Tracing pass</span></b></p>
</div>
<div>
<p>After the above steps, we can finally compute our global illumination. We first consider the simple case of ambient occlusion, which requires computing the AO integral. We approximate it by partitioning the integral into several cones:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-IRkYoeGcs_M/UQVQGSuEMrI/AAAAAAAAAko/0K2iOTp5O5s/s1600/integral_ao.png"><img alt="" src="http://1.bp.blogspot.com/-IRkYoeGcs_M/UQVQGSuEMrI/AAAAAAAAAko/0K2iOTp5O5s/s1600/integral_ao.png" width="320" height="149" border="0" /></a></div>
<p>where each partition is multiplied by a weight W. In my implementation, 6 cones with a 60 degree angle are traced over the hemisphere in the following directions (with the Y-axis as the up vector):</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-FWGnacfrQ2o/UQVQNYhEj9I/AAAAAAAAAkw/99C0J8G-1Rk/s1600/coneTraceDir.png"><img alt="" src="http://1.bp.blogspot.com/-FWGnacfrQ2o/UQVQNYhEj9I/AAAAAAAAAkw/99C0J8G-1Rk/s1600/coneTraceDir.png" width="200" height="101" border="0" /></a></div>
<p>each with a weight W:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-L5C8sxl5rYE/UQVQUCzd2KI/AAAAAAAAAk4/cXLoT4UvvGM/s1600/coneTraceWeight.png"><img alt="" src="http://3.bp.blogspot.com/-L5C8sxl5rYE/UQVQUCzd2KI/AAAAAAAAAk4/cXLoT4UvvGM/s1600/coneTraceWeight.png" width="58" height="200" border="0" /></a></div>
<p>Then to calculate the visibility inside one cone, we take multiple samples from the filtered 3D texture bricks along the cone direction:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://3.bp.blogspot.com/-HhHImwIQbcU/UQVQt62qCFI/AAAAAAAAAlA/ezXJXNgsJB4/s1600/coneTraceSampleLoc.png"><img alt="" src="http://3.bp.blogspot.com/-HhHImwIQbcU/UQVQt62qCFI/AAAAAAAAAlA/ezXJXNgsJB4/s1600/coneTraceSampleLoc.png" width="200" height="155" border="0" /></a></div>
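<p>Accumulating the samples along one cone can be sketched as a front-to-back blend of the sampled alpha values. The Python below is a simplified illustration: it assumes the alpha of each sample has already been fetched, ignores any distance falloff, and assumes the cone weights sum to 1. The function names and the early-out threshold are hypothetical.</p>

```python
def trace_cone_occlusion(sample_alphas):
    """Accumulate opacity front to back along one cone; returns a value in [0, 1]."""
    occlusion = 0.0
    for alpha in sample_alphas:
        # Each sample only occludes what the earlier samples let through.
        occlusion += (1.0 - occlusion) * alpha
        if occlusion >= 0.999:
            break                       # the cone is effectively fully blocked
    return min(occlusion, 1.0)


def ambient_visibility(cones, weights):
    """Weighted sum of per-cone visibility; the weights are assumed to sum to 1."""
    return sum(w * (1.0 - trace_cone_occlusion(alphas))
               for alphas, w in zip(cones, weights))
```

<p>Two half-transparent samples in a row, for example, block three quarters of the light in that cone.</p>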
<p>The remaining problem is to determine the sampling positions. Since we filter the 3D texture bricks manually, hardware quadrilinear interpolation will not work. To avoid performing the quadrilinear interpolation manually, it is better to place one sampling position at each mip level, where the voxel size equals the cone width at that position. This position is calculated by assuming the voxel is a sphere rather than a cube (which is easier to calculate) as follows:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-4Ch8UtZb0WE/UQVQz-ftlaI/AAAAAAAAAlI/NY9GZw5Ik7I/s1600/coneTraceSampleSphereLoc.png"><img alt="" src="http://4.bp.blogspot.com/-4Ch8UtZb0WE/UQVQz-ftlaI/AAAAAAAAAlI/NY9GZw5Ik7I/s1600/coneTraceSampleSphereLoc.png" width="200" height="155" border="0" /></a></div>
<p>The sampling location can then be calculated from the cone origin, cone angle, trace direction and voxel radius with some simple geometry. Here is the voxel AO result:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://3.bp.blogspot.com/-M0QCsel-EYk/UQVQ5N27v7I/AAAAAAAAAlQ/FandFHx7I2k/s1600/vctp_ao.png"><img alt="" src="http://3.bp.blogspot.com/-M0QCsel-EYk/UQVQ5N27v7I/AAAAAAAAAlQ/FandFHx7I2k/s1600/vctp_ao.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">AO result using 512x512x512 voxel volume</td>
</tr>
</tbody>
</table>
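<p>One way to work out the sampling positions mentioned above is to fit a sphere inside the cone: a sphere of radius r touches the cone surface when its center is at distance r / sin(angle / 2) from the cone origin. The Python sketch below assumes that construction, takes the sphere radius to be half the voxel size, and lets the voxel size double at each mip level. These are assumptions for illustration, and the demo may compute the positions differently.</p>

```python
import math


def cone_sample_distances(cone_angle_deg, leaf_voxel_size, mip_levels):
    """Distance from the cone origin to the sample position at each mip level."""
    half_angle = math.radians(cone_angle_deg) / 2.0
    distances = []
    for level in range(mip_levels):
        radius = 0.5 * leaf_voxel_size * (2 ** level)   # sphere standing in for the voxel
        distances.append(radius / math.sin(half_angle))
    return distances


def cone_sample_position(origin, direction, distance):
    """Point at the given distance from the origin along a normalized direction."""
    return tuple(o + distance * d for o, d in zip(origin, direction))
```

<p>For a 60 degree cone the half angle is 30 degrees, so each sample sits two sphere radii from the cone origin, and the samples move twice as far away with each coarser mip level.</p>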
<p>Next, for diffuse indirect illumination, the calculation is very similar to AO:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-I1Nt67FQwlE/UQVREmeYB6I/AAAAAAAAAlY/TwNxb0k3Pss/s1600/integral_diffuse.png"><img alt="" src="http://1.bp.blogspot.com/-I1Nt67FQwlE/UQVREmeYB6I/AAAAAAAAAlY/TwNxb0k3Pss/s1600/integral_diffuse.png" width="400" height="180" border="0" /></a></div>
<p>The indirect diffuse calculation is done using the same cones as the AO so that both the indirect diffuse and AO can be calculated at the same time:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-a-W0YCegQlE/UQVRKrvszeI/AAAAAAAAAlg/N794saJ2IEY/s1600/vctp_diffuse.png"><img alt="" src="http://2.bp.blogspot.com/-a-W0YCegQlE/UQVRKrvszeI/AAAAAAAAAlg/N794saJ2IEY/s1600/vctp_diffuse.png" width="320" height="248" border="0" /></a></div>
<p>Finally, we calculate the indirect specular, where a cone is traced along the view direction reflected about the surface normal. The cone angle depends on the glossiness, <i><b>g</b></i>, of the material. Currently I use the following equation to calculate the angle:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-wTO1GBD_KoA/UQVRSqalOnI/AAAAAAAAAlo/GMIiPBUjA0o/s1600/specConeAngle.png"><img alt="" src="http://4.bp.blogspot.com/-wTO1GBD_KoA/UQVRSqalOnI/AAAAAAAAAlo/GMIiPBUjA0o/s1600/specConeAngle.png" width="320" height="54" border="0" /></a></div>
<p>The glossiness is limited to a range so that the cone angle does not become too narrow, which avoids stepping through thin walls. Here is the result of indirect specular:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-5QDIN_rV0b4/UQVRieAprQI/AAAAAAAAAlw/mpQVJe6-0jA/s1600/vctp_specular.png"><img alt="" src="http://1.bp.blogspot.com/-5QDIN_rV0b4/UQVRieAprQI/AAAAAAAAAlw/mpQVJe6-0jA/s1600/vctp_specular.png" width="320" height="248" border="0" /></a></div>
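<p>The direction of the specular cone is the standard mirror reflection of the incoming view direction about the normal. A minimal Python sketch, assuming the incident vector points from the camera towards the surface and the normal is normalized (the function name mirrors the HLSL intrinsic, but this is only an illustration):</p>

```python
def reflect(incident, normal):
    """Mirror the incident direction about a normalized surface normal."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))
```

<p>A ray travelling down and to the right that hits an upward-facing floor, for example, leaves travelling up and to the right.</p>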
<p>In my engine, I use a light pre-pass renderer, so for opaque objects I calculate the indirect lighting by rendering a full screen quad after the lighting pass, which is then blended on top of the direct lighting buffer using the following blend state:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-IFxVbID7h9Q/UQVRo3fvcbI/AAAAAAAAAl4/Ml0nc3U5RfA/s1600/blendStateSameAO.png"><img alt="" src="http://2.bp.blogspot.com/-IFxVbID7h9Q/UQVRo3fvcbI/AAAAAAAAAl4/Ml0nc3U5RfA/s1600/blendStateSameAO.png" width="200" height="32" border="0" /></a></div>
<p>with the value of AO stored in the alpha channel. This applies AO to both the direct and indirect lighting. Sometimes a different AO intensity is needed for the direct and indirect lighting; in that case this blend state can be used instead:</p>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://2.bp.blogspot.com/-R4tJaQQAy9k/UQVRtaYZ12I/AAAAAAAAAmA/5_E46w_rfIM/s1600/blendStateDifferentAO.png"><img alt="" src="http://2.bp.blogspot.com/-R4tJaQQAy9k/UQVRtaYZ12I/AAAAAAAAAmA/5_E46w_rfIM/s1600/blendStateDifferentAO.png" width="200" height="30" border="0" /></a></div>
<p>which applies the alpha blending to the direct lighting only, while the indirect AO is applied inside the shader.</p>
<p>When filtering the mip maps directionally from the leaf nodes, the normal direction is not taken into account, which results in more light leaking through thin objects, as shown below. This artifact can be hidden slightly with the AO value calculated in cone tracing:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-opltKhwHevA/UQVRzj3jn-I/AAAAAAAAAmI/skfb8vZDpD4/s1600/lightLeak0.png"><img alt="" src="http://1.bp.blogspot.com/-opltKhwHevA/UQVRzj3jn-I/AAAAAAAAAmI/skfb8vZDpD4/s1600/lightLeak0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">light leak through thin geometry</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://3.bp.blogspot.com/-bVruWs-TAzg/UQVR4cUArWI/AAAAAAAAAmQ/2AmuZwzDhQU/s1600/lightLeak1.png"><img alt="" src="http://3.bp.blogspot.com/-bVruWs-TAzg/UQVR4cUArWI/AAAAAAAAAmQ/2AmuZwzDhQU/s1600/lightLeak1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">applying the AO hides the leaking only a bit</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><b><span class="Apple-style-span" style="font-size: large">Dynamic update</span></b></p>
<p>Five techniques are used to speed up the indirect illumination calculation.</p>
<p>First, when the scene is initialized, the first 4 steps described in the overview section are performed for all the static geometry. Then, in every frame, all 5 steps are re-calculated for the voxels that are affected by dynamic objects. So only dynamic objects are voxelized every frame (static voxels are already stored in the octree, and we don&#8217;t overwrite their data, which can be identified by a bit flag stored in the node). The dynamic voxel fragments are appended to the end of the octree node buffer, so they can be cleared easily. We also need to reset the neighbor offsets of static octree nodes that point to dynamic nodes, and the static nodes that are affected by dynamic voxels in the previous or current frame need to be re-filtered. Those nodes are found by dispatching threads for all the nodes and queuing the indices of the affected nodes into another buffer, so only the nodes whose values have changed are re-calculated.</p>
<p>Second, when voxelizing the dynamic geometry, the InterlockedMax() function is used to compute both the diffuse and the normal values of each octree voxel, which is faster than performing an atomic average. Note that using InterlockedMax() for the normal decreases the lighting quality if the voxel volume is small (e.g. 256x256x256). You can see the artifact in the figure below:</p>
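<p>The static/dynamic split of the node buffer described above can be sketched on the CPU as follows. This is only a minimal illustration in Python, not the GPU code of the demo: the NodeBuffer class, the STATIC_FLAG value and the node layout are hypothetical names chosen for the example.</p>

```python
from dataclasses import dataclass, field

STATIC_FLAG = 1  # hypothetical bit flag marking a node as static


@dataclass
class NodeBuffer:
    """Octree node pool: static nodes first, dynamic nodes appended after."""
    nodes: list = field(default_factory=list)
    static_count: int = 0

    def add_static(self, payload):
        # Static nodes are written once, at scene initialization,
        # before any dynamic node exists.
        assert self.static_count == len(self.nodes), "add static nodes first"
        self.nodes.append({"flags": STATIC_FLAG, "data": payload})
        self.static_count += 1
        return len(self.nodes) - 1

    def add_dynamic(self, payload):
        # Dynamic nodes always land past the end of the static region.
        self.nodes.append({"flags": 0, "data": payload})
        return len(self.nodes) - 1

    def clear_dynamic(self):
        # Clearing the dynamic nodes is a truncation to the static region.
        del self.nodes[self.static_count:]


buf = NodeBuffer()
buf.add_static("wall")
buf.add_static("floor")
buf.add_dynamic("moving crate")
buf.clear_dynamic()  # only the two static nodes remain
```

<p>Because the dynamic nodes form one contiguous tail, resetting them does not require touching any static node.</p>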
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-MsDVhrYUW24/UQVSfesq4MI/AAAAAAAAAmY/tAxrLB6uhGc/s1600/duMax0.png"><img alt="" src="http://4.bp.blogspot.com/-MsDVhrYUW24/UQVSfesq4MI/AAAAAAAAAmY/tAxrLB6uhGc/s1600/duMax0.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Reflect light using an average voxel normal</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-pdyte6aypYk/UQVSjiC57dI/AAAAAAAAAmg/iRv5cvNOav8/s1600/duMax1.png"><img alt="" src="http://1.bp.blogspot.com/-pdyte6aypYk/UQVSjiC57dI/AAAAAAAAAmg/iRv5cvNOav8/s1600/duMax1.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Reflect light using InterlockedMax() voxel normal.</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>So I use InterlockedMax() for the normal only in the dynamic geometry and keep computing an average normal for the static geometry. That is why the octree voxel data structure (in the octree building pass section) stores a 1-byte counter along with the normal but not with the other attributes.</p>
<p>Third, I perform view-frustum culling when injecting the direct lighting into the octree, because there is no point in filtering light that is far from the camera and is never sampled from the current point of view. An extended frustum is calculated from the current camera for culling (refer to the figure below).</p>
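<p>To see why a maximum behaves differently from an average, here is a small CPU emulation in Python. It is a sketch under assumptions: the 8-bits-per-component packing below is made up for the example and is not the voxel layout of the demo, and Python&#8217;s max() merely stands in for a chain of InterlockedMax() calls on the packed value.</p>

```python
import math


def pack_normal(n):
    """Pack a unit normal into a 24-bit integer, 8 bits per component."""
    r, g, b = (int(round((c * 0.5 + 0.5) * 255.0)) for c in n)
    return (r << 16) | (g << 8) | b


def unpack_normal(p):
    """Inverse of pack_normal, re-normalized after quantization."""
    comps = ((p >> 16) & 255, (p >> 8) & 255, p & 255)
    n = [c / 255.0 * 2.0 - 1.0 for c in comps]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]


def max_normal(normals):
    # Emulates InterlockedMax(): the largest packed integer wins.
    return unpack_normal(max(pack_normal(n) for n in normals))


def average_normal(normals):
    # Reference result: normalized sum of all fragment normals.
    s = [sum(n[i] for n in normals) for i in range(3)]
    length = math.sqrt(sum(c * c for c in s))
    return [c / length for c in s]


fragments = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(max_normal(fragments))      # one input normal wins outright
print(average_normal(fragments))  # points halfway between both inputs
```

<p>With the maximum, the winning normal depends on the bit pattern rather than on how many fragments share a direction, which is consistent with the quality loss shown in the figure above.</p>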
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://4.bp.blogspot.com/-EpQcL8m2LUE/UQVTPk2wMoI/AAAAAAAAAmo/ZB5I02PpSTc/s1600/frustumCulling.png"><img alt="" src="http://4.bp.blogspot.com/-EpQcL8m2LUE/UQVTPk2wMoI/AAAAAAAAAmo/ZB5I02PpSTc/s1600/frustumCulling.png" width="196" height="200" border="0" /></a></div>
<p>This frustum is obtained by simply moving the camera backward a bit and increasing the far plane distance while keeping the same field of view. The extension distance is calculated from the maximum cone tracing distance (I limited the cone tracing distance in the demo, which loses the ability to sample from far objects like the sky) and the filtered voxel size.</p>
<p>Fourth, I perform the cone tracing pass at half of the screen resolution. However, this results in visible artifacts when up-scaling to full resolution, especially at the edges of the geometry (the strength of the indirect lighting in the following screen shots is increased to show the artifacts more clearly).</p>
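<p>One possible way to build such an extended frustum is sketched below in Python. The exact formula used in the demo is not given here, so two things are assumptions: the safety margin is taken as the sum of the maximum cone tracing distance and the filtered voxel size, and the function and parameter names are hypothetical. The geometry itself is simple: moving the apex back by a distance d along the view axis pushes every side plane outward by d * sin(half field of view).</p>

```python
import math


def extended_frustum(cam_pos, forward, near, far, half_fov_rad,
                     max_trace_dist, voxel_size):
    """Return (position, near, far) of a frustum grown by a safety margin.

    half_fov_rad should be the smallest half field-of-view angle so that
    every side plane moves outward by at least the margin.
    """
    margin = max_trace_dist + voxel_size       # assumed combination
    back = margin / math.sin(half_fov_rad)     # apex shift along -forward
    new_pos = tuple(p - f * back for p, f in zip(cam_pos, forward))
    # Distances are measured from the new apex, so add the shift back in.
    new_near = max(near + back - margin, 1e-3)
    new_far = far + back + margin
    return new_pos, new_near, new_far


pos, near, far = extended_frustum(
    (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),  # camera at origin looking down +z
    0.1, 100.0, math.radians(30.0),    # near, far, half field of view
    10.0, 2.0)                         # max trace distance, voxel size
print(pos, near, far)
```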
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://2.bp.blogspot.com/-Uxm44Xox4sc/UQVTrCqnbjI/AAAAAAAAAmw/tcdlFaA9QUg/s1600/duUpScaleHalfResNone.png"><img alt="" src="http://2.bp.blogspot.com/-Uxm44Xox4sc/UQVTrCqnbjI/AAAAAAAAAmw/tcdlFaA9QUg/s1600/duUpScaleHalfResNone.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">up-sample with only bilinear filtering</td>
</tr>
</tbody>
</table>
<p>So, I first decided to just fix those pixels by finding them with an edge detection filter on the depth buffer.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-9gFKRv9kzvw/UQVTzJNxcSI/AAAAAAAAAm4/g6aPih3XPg4/s1600/duUpScaleHalfResEdgeDetection.png"><img alt="" src="http://4.bp.blogspot.com/-9gFKRv9kzvw/UQVTzJNxcSI/AAAAAAAAAm4/g6aPih3XPg4/s1600/duUpScaleHalfResEdgeDetection.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">perform an edge detection pass</td>
</tr>
</tbody>
</table>
<p>And then perform cone tracing again at those pixels at full resolution:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-UWt20sb14Vs/UQVT73f6jRI/AAAAAAAAAnA/6v6v2B6Y0Vw/s1600/duUpScaleHalfResReTrace.png"><img alt="" src="http://4.bp.blogspot.com/-UWt20sb14Vs/UQVT73f6jRI/AAAAAAAAAnA/6v6v2B6Y0Vw/s1600/duUpScaleHalfResReTrace.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">perform the cone tracing again at the edge pixels at full resolution</td>
</tr>
</tbody>
</table>
<p>Some of the artifacts are gone, but the frame rate drops a lot again&#8230;</p>
<p>So, my second attempt is to simply blur those pixels (averaging each with its neighboring pixels). The quality of blurring is not as good as re-computing the cone tracing, but it is much faster.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-xLV_g8cuF44/UQVUJyJ6h5I/AAAAAAAAAnI/cKMmzn5IU7g/s1600/duUpScaleHalfResBlur.png"><img alt="" src="http://4.bp.blogspot.com/-xLV_g8cuF44/UQVUJyJ6h5I/AAAAAAAAAnI/cKMmzn5IU7g/s1600/duUpScaleHalfResBlur.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">up-sample with blurring at the edges</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://4.bp.blogspot.com/-9gLIX8e_wes/UQVUPYxPPFI/AAAAAAAAAnQ/cQlHN2JRNUk/s1600/duUpScaleFullRes.png"><img alt="" src="http://4.bp.blogspot.com/-9gLIX8e_wes/UQVUPYxPPFI/AAAAAAAAAnQ/cQlHN2JRNUk/s1600/duUpScaleFullRes.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">cone tracing at full resolution for reference</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
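<p>The two steps above (depth-based edge detection, then blurring only the detected pixels) can be sketched on the CPU as follows. This is a minimal Python illustration, not the shader from the demo: the 4-neighbor kernel, the threshold value and the function names are assumptions made for the example.</p>

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def detect_edges(depth, threshold):
    """Mark pixels whose depth differs from a 4-neighbor by > threshold."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dx, dy in NEIGHBORS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    if abs(depth[y][x] - depth[ny][nx]) > threshold:
                        edges[y][x] = True
    return edges


def blur_edges(gi, edges):
    """Replace each edge pixel by the mean of itself and its 4-neighbors."""
    h, w = len(gi), len(gi[0])
    out = [row[:] for row in gi]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue  # non-edge pixels keep the up-sampled value
            total, count = gi[y][x], 1
            for dx, dy in NEIGHBORS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    total += gi[ny][nx]
                    count += 1
            out[y][x] = total / count
    return out


depth = [[1.0, 1.0, 5.0, 5.0]]  # a depth discontinuity in the middle
gi = [[0.0, 0.0, 1.0, 1.0]]     # up-sampled indirect lighting
edges = detect_edges(depth, threshold=0.5)
smoothed = blur_edges(gi, edges)
```

<p>Only the two pixels next to the discontinuity are touched, which is why this is much cheaper than tracing cones again for them.</p>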
<div></div>
<div>Lastly, the voxel world can be updated at a different frequency than the render frame rate. Assuming the light source and dynamic models are not moving very fast, we can take advantage of temporal coherence and perform the voxelization at a lower frequency, say once every 5 frames. This trades the accuracy of the voxels for update speed, but it may result in the artifact shown below:</div>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a style="margin-left: auto;margin-right: auto" href="http://1.bp.blogspot.com/-gmnKqXtcuw4/USuEm-2gxLI/AAAAAAAAApU/Bf9je4_cogk/s1600/duInterval.png"><img alt="" src="http://1.bp.blogspot.com/-gmnKqXtcuw4/USuEm-2gxLI/AAAAAAAAApU/Bf9je4_cogk/s1600/duInterval.png" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">The voxel model lags behind the triangle<br />
mesh due to the lower update frequency</td>
</tr>
</tbody>
</table>
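<p>Decoupling the voxel update from the render loop can be as simple as a frame counter. The Python sketch below is only an illustration; the VoxelUpdateScheduler class is a hypothetical name and the interval of 5 frames is the example value from the text.</p>

```python
class VoxelUpdateScheduler:
    """Decides on which rendered frames the voxel world is rebuilt."""

    def __init__(self, interval):
        self.interval = interval  # re-voxelize once every `interval` frames
        self.frame = 0

    def should_update(self):
        due = self.frame % self.interval == 0
        self.frame += 1
        return due


scheduler = VoxelUpdateScheduler(interval=5)
# Frames where this is False reuse the voxels of the last update,
# which is where the lag in the figure above comes from.
updates = [scheduler.should_update() for _ in range(10)]
```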
<p><b><span class="Apple-style-span" style="font-size: large">Conclusion</span></b></p>
<p>The advantage of voxel cone tracing is that it computes GI in real time with both dynamic lighting and dynamic geometry. The specular indirect lighting also gives a very nice glossy effect. However, it uses a lot of processing power and memory, and the lighting quality is not as good as a baked solution. My implementation can only use 1 directional light to compute single-bounce indirect lighting. There is still room for improvement: fewer cones could be used for tracing, the depth buffer could be used for view frustum culling, the up-sampling of the cone tracing result could be better, and the voxels could be divided into several regions to handle a larger scene like Unreal Engine does. But there is never enough time to implement all that stuff&#8230; So I decided to release the <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a> at this stage first. In the demo, I have added a simple interface (for changing things like the light direction and indirect lighting strength) for you to play around with. Hope you all enjoy the <a href="https://docs.google.com/file/d/0B_CrrCOiha-VdWp4cXNZcllWRmM/edit">demo</a>.</p>
<p>Finally, I would like to thank Kevin Gadd and Luke Hutchinson for reviewing this article.</p>
<table>
<tbody>
<tr>
<td>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-3XIunxXIcFI/UQVU6EscbcI/AAAAAAAAAnY/MppCCKxFpec/s1600/demo0.png"><img alt="" src="http://1.bp.blogspot.com/-3XIunxXIcFI/UQVU6EscbcI/AAAAAAAAAnY/MppCCKxFpec/s1600/demo0.png" width="320" height="248" border="0" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;text-align: center"><a style="margin-left: 1em;margin-right: 1em" href="http://1.bp.blogspot.com/-slFJ6rttJco/UQVU-PxzkbI/AAAAAAAAAng/mAelRkhNNfA/s1600/demo1.png"><img alt="" src="http://1.bp.blogspot.com/-slFJ6rttJco/UQVU-PxzkbI/AAAAAAAAAng/mAelRkhNNfA/s1600/demo1.png" width="320" height="248" border="0" /></a></div>
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
</div>
<div><b>References</b></div>
<div><span class="Apple-style-span" style="font-size: x-small">[1] The Technology Behind the “Unreal Engine 4 Elemental demo” <a href="http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf">http://www.unrealengine.com/files/misc/The_Technology_Behind_the_Elemental_Demo_16x9_(2).pdf</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[2] Interactive Indirect Illumination Using Voxel Cone Tracing <a href="http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf">http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[3] Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[4] GigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes</span><br />
<a href="http://maverick.inria.fr/Publications/2011/Cra11/"><span class="Apple-style-span" style="font-size: x-small">http://maverick.inria.fr/Publications/2011/Cra11/</span></a></div>
<div><span class="Apple-style-span" style="font-size: x-small">[5] GPU Gems 2: Conservative Rasterization <a href="http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html">http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[6] Shading in Valve’s Source Engine <a href="http://www.valvesoftware.com/publications/2006/SIGGRAPH06_Course_ShadingInValvesSourceEngine.pdf">http://www.valvesoftware.com/publications/2006/SIGGRAPH06_Course_ShadingInValvesSourceEngine.pdf</a></span><br />
<span class="Apple-style-span" style="font-size: x-small">[7] Perpendicular Possibilities <a href="http://blog.selfshadow.com/2011/10/17/perp-vectors/">http://blog.selfshadow.com/2011/10/17/perp-vectors/</a></span></div>
<div><span class="Apple-style-span" style="font-size: x-small">[8] A couple of notes about Z <a href="http://www.humus.name/index.php?ID=255">http://www.humus.name/index.php?ID=255</a></span></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2013/01/31/implementing-voxel-cone-tracing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Angle based SSAO</title>
		<link>http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/</link>
		<comments>http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/#comments</comments>
		<pubDate>Fri, 12 Oct 2012 02:41:30 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[SSAO]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=28316</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p><a href="http://en.wikipedia.org/wiki/Screen_space_ambient_occlusion">SSAO (Screen space ambient occlusion)</a> is a common post-processing effect that approximates how much light is occluded at a given surface point by the surrounding objects. At this year&#8217;s SIGGRAPH, there were a few slides in <a href="http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx">&#8220;The Technology behind the Unreal Engine 4 Elemental Demo&#8221;</a> about how they implement SSAO. Their technique can use either the depth buffer alone or the depth buffer together with per-pixel normals. I tried to implement both versions with a slight modification:</p>
<p><a href="http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/" class="more-link">Read more on Angle based SSAO&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p><a href="http://en.wikipedia.org/wiki/Screen_space_ambient_occlusion">SSAO (Screen space ambient occlusion)</a> is a common post-processing effect that approximates how much light is occluded at a given surface point by the surrounding objects. At this year&#8217;s SIGGRAPH, there were a few slides in <a href="http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx">&#8220;The Technology behind the Unreal Engine 4 Elemental Demo&#8221;</a> about how they implement SSAO. Their technique can use either the depth buffer alone or the depth buffer together with per-pixel normals. I tried to implement both versions with a slight modification:</p>
<table>
<tbody>
<tr>
<td>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s1600/finalSSAO.png"><img src="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s320/finalSSAO.png" alt="" width="320" height="248" border="0" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-x-P_nUAliGw/UGsSBcsZ21I/AAAAAAAAAa8/SmW8SF16170/s1600/final.png"><img src="http://2.bp.blogspot.com/-x-P_nUAliGw/UGsSBcsZ21I/AAAAAAAAAa8/SmW8SF16170/s320/final.png" alt="" width="320" height="248" border="0" /></a></div>
</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Using only the depth buffer</span></strong></p>
<p><a href="http://en.wikipedia.org/wiki/Ambient_occlusion">Ambient occlusion</a> is defined as the visibility integral over the hemisphere above a given surface point:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-NKUjkkDPW7I/UGuZuxHDEvI/AAAAAAAAAck/uGrCy3QKh8g/s1600/aoIntegral.png"><img src="http://2.bp.blogspot.com/-NKUjkkDPW7I/UGuZuxHDEvI/AAAAAAAAAck/uGrCy3QKh8g/s320/aoIntegral.png" alt="" width="320" height="66" border="0" /></a></div>
<p>To approximate this in screen space, we design our sampling pattern as paired samples:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-lzDuL7ywx5Y/UGuZ0YZL6rI/AAAAAAAAAcs/Q1JVSBOJMKA/s1600/pattern.png"><img src="http://1.bp.blogspot.com/-lzDuL7ywx5Y/UGuZ0YZL6rI/AAAAAAAAAcs/Q1JVSBOJMKA/s200/pattern.png" alt="" width="200" height="156" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">paired sample pattern</td>
</tr>
</tbody>
</table>
<p>So for each pair of samples, we can approximate how much the shading point is occluded in 2D instead of integrating over the hemisphere:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-Me10fR56gHs/UGuZ5pLGQoI/AAAAAAAAAc0/hZhdPr1GIpU/s1600/ao2D.png"><img src="http://2.bp.blogspot.com/-Me10fR56gHs/UGuZ5pLGQoI/AAAAAAAAAc0/hZhdPr1GIpU/s200/ao2D.png" alt="" width="200" height="144" border="0" /></a></div>
<p>The AO term for each given pair of samples will be min( (θ<span class="Apple-style-span" style="font-size: xx-small">left</span> + θ<span class="Apple-style-span" style="font-size: xx-small">right</span>)/π, 1). Then by averaging the AO terms of all the sample pairs (in my case, there are 6 pairs), we achieve the following result:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-L2X8QQuRm-k/UGsSq_xSkwI/AAAAAAAAAbE/79HVZD726Ro/s1600/ssaoDepthOnlyNoDisAtten.png"><img src="http://1.bp.blogspot.com/-L2X8QQuRm-k/UGsSq_xSkwI/AAAAAAAAAbE/79HVZD726Ro/s320/ssaoDepthOnlyNoDisAtten.png" alt="" width="320" height="248" border="0" /></a></div>
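<p>As a quick numeric check of the formula above, here is a small Python sketch. It assumes the angles are in radians and that a sum of π corresponds to an unoccluded flat surface; the function names and the sample angles are made up for the example.</p>

```python
import math


def pair_term(theta_left, theta_right):
    """AO term of one sample pair: min((left + right) / pi, 1)."""
    return min((theta_left + theta_right) / math.pi, 1.0)


def average_ao(pairs):
    """Average the AO terms of all sample pairs."""
    return sum(pair_term(l, r) for l, r in pairs) / len(pairs)


# 6 sample pairs, as in the text.
flat = [(math.pi / 2, math.pi / 2)] * 6    # angles open to a full pi
crease = [(math.pi / 4, math.pi / 4)] * 6  # angles open to only pi / 2
print(average_ao(flat), average_ao(crease))
```

<p>The min() clamp keeps convex configurations, where the two angles add up to more than π, from brightening the result beyond the unoccluded value.</p>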
<p><strong><span class="Apple-style-span" style="font-size: large">Dealing with large depth differences</span></strong></p>
<p>As seen in the above screen shot, there are dark halos around the knight. But the knight should not contribute AO to the castle, as he is too far away. To deal with large depth differences, I adopt the approach used in <a href="http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf">Toy Story 3</a>. If one of the paired samples is too far away from the shading point, say the red point in the following figure, it is replaced by the pink point, which is on the same plane as the other, valid sample of the pair:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-850b-rBaKp0/UGuaMkQJodI/AAAAAAAAAc8/M0oUukbYqcE/s1600/largeDepth.png"><img src="http://1.bp.blogspot.com/-850b-rBaKp0/UGuaMkQJodI/AAAAAAAAAc8/M0oUukbYqcE/s200/largeDepth.png" alt="" width="200" height="180" border="0" /></a></div>
<p>So we can interpolate between the red point and the pink point to deal with the large depth difference. Now the dark halo is gone:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-9O5WWj6WrsI/UGsS46doFPI/AAAAAAAAAbM/Wv2lgq9TC6M/s1600/ssaoDepthOnlyDisAttenNoWeight.png"><img src="http://3.bp.blogspot.com/-9O5WWj6WrsI/UGsS46doFPI/AAAAAAAAAbM/Wv2lgq9TC6M/s320/ssaoDepthOnlyDisAttenNoWeight.png" alt="" width="320" height="248" border="0" /></a></div>
<p>The above treatment only handles the case where one sample of a pair is far away from the shading point. What if both samples have large depth differences?</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-BQPnvxeS5M4/UHd-KFLHVtI/AAAAAAAAAd8/ECofPtuQldo/s1600/ssaoDepthOnlyDisAttenNoWeightArtifact.png"><img src="http://3.bp.blogspot.com/-BQPnvxeS5M4/UHd-KFLHVtI/AAAAAAAAAd8/ECofPtuQldo/s1600/ssaoDepthOnlyDisAttenNoWeightArtifact.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">dark halo artifact is shown around the sword</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-TMdmEecY7MM/UHd-RhtjJmI/AAAAAAAAAeE/G6r2Byc7ObU/s1600/ssaoDepthOnlyDisAttenNoWeightArtifactCombine.png"><img src="http://2.bp.blogspot.com/-TMdmEecY7MM/UHd-RhtjJmI/AAAAAAAAAeE/G6r2Byc7ObU/s1600/ssaoDepthOnlyDisAttenNoWeightArtifactCombine.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">AO strength of this pic is increased to highlight the artifact</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>In this case, it results in the dark halo around the sword in the above screen shot. Remember that we average all the sample pairs to compute the final AO value. So to deal with this artifact, we assign a weight to each sample pair and then re-normalize the final result. For each pair, if both samples are within a small depth difference, the pair has a weight of 1. If only 1 sample is far away, the pair has a weight of 0.5. And if both samples are far away, the weight is 0. This eliminates most (but not all) of the artifacts:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-ReDzjqkwwJc/UHd-UkD-qGI/AAAAAAAAAeM/s0uqq8e7gU4/s1600/ssaoDepthOnlyDisAttenWithWeight.png"><img src="http://4.bp.blogspot.com/-ReDzjqkwwJc/UHd-UkD-qGI/AAAAAAAAAeM/s0uqq8e7gU4/s1600/ssaoDepthOnlyDisAttenWithWeight.png" alt="" width="320" height="248" border="0" /></a></div>
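<p>The weighting scheme above amounts to a weighted average. Here is a minimal Python sketch; the function names and the tuple layout are hypothetical, and returning 1 (no occlusion) when every pair is rejected is my assumption, since the text does not say what happens in that case.</p>

```python
def pair_weight(left_far, right_far):
    """1 if both samples are near, 0.5 if one is far, 0 if both are far."""
    if left_far and right_far:
        return 0.0
    if left_far or right_far:
        return 0.5
    return 1.0


def weighted_ao(pairs):
    """pairs: list of (ao_term, left_far, right_far) tuples."""
    total = 0.0
    weight_sum = 0.0
    for ao_term, left_far, right_far in pairs:
        w = pair_weight(left_far, right_far)
        total += w * ao_term
        weight_sum += w
    if weight_sum == 0.0:
        return 1.0  # assumption: no usable pair means no occlusion
    return total / weight_sum  # re-normalize by the sum of weights


pairs = [
    (0.4, False, False),  # both samples near: weight 1
    (0.8, True, False),   # one sample far: weight 0.5
    (0.0, True, True),    # both samples far: ignored
]
print(weighted_ao(pairs))
```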
<p><strong><span class="Apple-style-span" style="font-size: large">Approximating arc-cos function</span></strong></p>
<p>In this approach, the AO is calculated from the angle between the paired samples, which requires evaluating the arc-cos function, and that is a bit expensive. We can approximate acos(x) with a linear function: π(1-x)/2.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-z9JX5f6oJqc/UGsT1nYSp6I/AAAAAAAAAbs/F7YCjq9Sc_Q/s1600/acos_graphLinear.png"><img src="http://1.bp.blogspot.com/-z9JX5f6oJqc/UGsT1nYSp6I/AAAAAAAAAbs/F7YCjq9Sc_Q/s320/acos_graphLinear.png" alt="" width="320" height="178" border="0" /></a></div>
<p>And the resulting AO looks much darker with this approximation:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s1600/acos.png"><img src="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s320/acos.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center"><span class="Apple-style-span">computed </span>with<span class="Apple-style-span"> the arc-cos function</span></td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-iqa85VVFz34/UGsUPuYwwhI/AAAAAAAAAb8/K1A3NIlQRMQ/s1600/acos_linearApprox.png"><img src="http://3.bp.blogspot.com/-iqa85VVFz34/UGsUPuYwwhI/AAAAAAAAAb8/K1A3NIlQRMQ/s320/acos_linearApprox.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center"><span class="Apple-style-span">computed </span>with<span class="Apple-style-span"> the linear approximation</span></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Note that the maximum error between the two functions is around 18.946 degrees.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-yUoq9kw8lUw/UGxS9nYRZzI/AAAAAAAAAdc/H_yeOXzQOxc/s1600/error_linearProve.png"><img src="http://2.bp.blogspot.com/-yUoq9kw8lUw/UGxS9nYRZzI/AAAAAAAAAdc/H_yeOXzQOxc/s200/error_linearProve.png" alt="" width="200" height="196" border="0" /></a></div>
<p>This may affect the AO in areas of a curved surface with low tessellation. You may need to either increase the bias angle threshold or switch to a more accurate function. So my second attempt is to approximate it with a quadratic function: π(1- sign(x) * x * x)/2.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-dTQBBBeHSWQ/UGsV5G2y7ZI/AAAAAAAAAcM/L0MREzz0z6I/s1600/acos_graphQuadratic.png"><img src="http://2.bp.blogspot.com/-dTQBBBeHSWQ/UGsV5G2y7ZI/AAAAAAAAAcM/L0MREzz0z6I/s400/acos_graphQuadratic.png" alt="" width="400" height="197" border="0" /></a></div>
<p>And this approximation gives a result much closer to the one using the arc-cos function.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s1600/acos.png"><img src="http://1.bp.blogspot.com/-8vM4blit0E0/UGsUFhjEw6I/AAAAAAAAAb0/srJ_UKXueyE/s320/acos.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">computed with the arc-cos function</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-aNjP2IxIZ0U/UGsUV_iiqgI/AAAAAAAAAcE/lZhdcZtB16o/s1600/acos_quadraticApprox.png"><img src="http://2.bp.blogspot.com/-aNjP2IxIZ0U/UGsUV_iiqgI/AAAAAAAAAcE/lZhdcZtB16o/s320/acos_quadraticApprox.png" alt="" width="320" height="248" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center"><span class="Apple-style-span">computed </span>with<span class="Apple-style-span"> the quadratic approximation</span></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>And the maximum error of this function is around 9.473 degrees.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-KfygFiMxqno/UGxTO5004AI/AAAAAAAAAdk/9hsnSWl_Ddk/s1600/error_quadraticProve.png"><img src="http://2.bp.blogspot.com/-KfygFiMxqno/UGxTO5004AI/AAAAAAAAAdk/9hsnSWl_Ddk/s320/error_quadraticProve.png" alt="" width="320" height="265" border="0" /></a></div>
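<p>Both error figures can be cross-checked numerically. The Python sketch below scans x over [-1, 1] and reports the largest deviation of each approximation from acos(x); the function names and the step count are arbitrary choices for the example. The two printed values should land close to the 18.946 and 9.473 degrees quoted above.</p>

```python
import math


def acos_linear(x):
    """Linear approximation: pi * (1 - x) / 2."""
    return math.pi * (1.0 - x) / 2.0


def acos_quadratic(x):
    """Quadratic approximation: pi * (1 - sign(x) * x * x) / 2."""
    return math.pi * (1.0 - math.copysign(x * x, x)) / 2.0


def max_error_degrees(approx, steps=20000):
    """Largest |acos(x) - approx(x)| on a uniform grid, in degrees."""
    worst = 0.0
    for i in range(steps + 1):
        x = -1.0 + 2.0 * i / steps
        worst = max(worst, abs(math.acos(x) - approx(x)))
    return math.degrees(worst)


print(round(max_error_degrees(acos_linear), 3))
print(round(max_error_degrees(acos_quadratic), 3))
```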
<p><strong><span class="Apple-style-span" style="font-size: large">Using per-pixel normal</span></strong></p>
<p>We can enhance the details of the AO by making use of the per-pixel normal. The per-pixel normal further restricts the angles used to compute the AO: θ<span class="Apple-style-span" style="font-size: xx-small">left</span> and θ<span class="Apple-style-span" style="font-size: xx-small">right</span> are clamped to the tangent plane:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-13mf9GAdRuU/UGuaT96AoxI/AAAAAAAAAdE/wRhp4A-mMUo/s1600/restrictByNormal.png"><img src="http://4.bp.blogspot.com/-13mf9GAdRuU/UGuaT96AoxI/AAAAAAAAAdE/wRhp4A-mMUo/s200/restrictByNormal.png" alt="" width="200" height="147" border="0" /></a></div>
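<p>One way to read this clamp in the 2D slice of a sample pair is sketched below in Python. This is my interpretation rather than the shader of the demo: I assume each angle is measured from the direction toward the camera, and that normal_tilt is the signed angle of the per-pixel normal from that direction (positive toward the right sample). All names are hypothetical.</p>

```python
import math


def clamp_to_tangent_plane(theta_left, theta_right, normal_tilt):
    """Limit each angle so that it cannot dip below the tangent plane."""
    limit_left = math.pi / 2 - normal_tilt
    limit_right = math.pi / 2 + normal_tilt
    return min(theta_left, limit_left), min(theta_right, limit_right)


def pair_term_with_normal(theta_left, theta_right, normal_tilt):
    """AO term of one sample pair after the tangent plane clamp."""
    left, right = clamp_to_tangent_plane(theta_left, theta_right,
                                         normal_tilt)
    return min((left + right) / math.pi, 1.0)


# The depth buffer reports 100 and 70 degrees, but the per-pixel normal
# faces the camera, so the left angle is clamped to 90 degrees.
term = pair_term_with_normal(math.radians(100.0), math.radians(70.0), 0.0)
print(term)
```

<p>Under these assumptions the two limits always add up to π, so the clamp can only darken a pair, never brighten it.</p>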
<p>And here is the final result:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s1600/finalSSAO.png"><img src="http://1.bp.blogspot.com/-drIom4UieQk/UGsR9DsCXSI/AAAAAAAAAa0/GTInkJX0d_Q/s320/finalSSAO.png" alt="" width="320" height="248" border="0" /></a></div>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong></p>
<p>The result of this AO is pleasing, taking a total of 12 samples per pixel with 16 rotations in a 4&#215;4 pixel block at half resolution. I did not apply a bilateral blur to the AO result, but applying one may give a softer AO look. Also, although approximating the arc-cos function with a linear function is not accurate, it gives a good enough result for me. Finally, more time needs to be spent on generating the sampling pattern in the future, as the pattern I currently use is nearly uniformly distributed (with some jittering).</p>
<p><strong>References</strong><br />
[1] The Technology behind the Unreal Engine 4 Elemental Demo <a href="http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx">http://advances.realtimerendering.com/s2012/Epic/The%20Technology%20Behind%20the%20Elemental%20Demo%2016x9.pptx</a><br />
[2] Rendering techniques in Toy Story 3 <a href="http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf">http://advances.realtimerendering.com/s2010/Ownby,Hall%20and%20Hall%20-%20Toystory3%20(SIGGRAPH%202010%20Advanced%20RealTime%20Rendering%20Course).pdf</a><br />
[3] Image-Space Horizon-Based Ambient Occlusion <a href="http://www.nvidia.com/object/siggraph-2008-HBAO.html">http://www.nvidia.com/object/siggraph-2008-HBAO.html</a><br />
[4] <a href="http://www.wolframalpha.com/">http://www.wolframalpha.com/</a><br />
[5] The models are exported from UDK and extracted from Infinity Blade using umodel.exe</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/10/12/angle-based-ssao/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shader Generator</title>
		<link>http://www.altdevblogaday.com/2012/08/01/shader-generator/</link>
		<comments>http://www.altdevblogaday.com/2012/08/01/shader-generator/#comments</comments>
		<pubDate>Wed, 01 Aug 2012 15:33:43 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[shader generator]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=27083</guid>
		<description><![CDATA[<p><strong>Introduction</strong></p>
<p>In the last few weeks, I was busy rewriting my iPhone engine so that it can also run on the Windows platform (so that I can use Visual Studio instead of Xcode~) and, most importantly, so that I can play around with D3D11. During the rewrite, I wanted to improve the process of writing shaders so that I don&#8217;t need to write similar shaders multiple times for each shader permutation (say, for each surface, I have to write a shader for static mesh, skinned mesh, instanced static mesh&#8230; multiplied by the number of render passes), and can instead focus on coding how the surface looks. So I decided to write a shader generator that generates those shaders, similar to the <a href="http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaders.html">surface shader in Unity</a>. I chose the surface shader approach instead of a graph-based approach like <a href="http://udn.epicgames.com/Three/MaterialEditorUserGuide.html">Unreal Engine</a> because, being a programmer, I feel more comfortable (and faster) writing code than dragging tree nodes in a GUI. The current implementation of the shader generator can only generate vertex and pixel shaders for the light pre pass renderer, which is the lighting model I used before.</p>
<p><a href="http://www.altdevblogaday.com/2012/08/01/shader-generator/" class="more-link">Read more on Shader Generator&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong>Introduction</strong></p>
<p>In the last few weeks, I was busy rewriting my iPhone engine so that it can also run on the Windows platform (so that I can use Visual Studio instead of Xcode~) and, most importantly, so that I can play around with D3D11. During the rewrite, I wanted to improve the process of writing shaders so that I don&#8217;t need to write similar shaders multiple times for each shader permutation (say, for each surface, I have to write a shader for static mesh, skinned mesh, instanced static mesh&#8230; multiplied by the number of render passes), and can instead focus on coding how the surface looks. So I decided to write a shader generator that generates those shaders, similar to the <a href="http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaders.html">surface shader in Unity</a>. I chose the surface shader approach instead of a graph-based approach like <a href="http://udn.epicgames.com/Three/MaterialEditorUserGuide.html">Unreal Engine</a> because, being a programmer, I feel more comfortable (and faster) writing code than dragging tree nodes in a GUI. The current implementation of the shader generator can only generate vertex and pixel shaders for the light pre pass renderer, which is the lighting model I used before.</p>
<p><strong>Defining the surface</strong></p>
<p>To generate the target vertex and pixel shaders with the shader generator, we need to define how the surface looks by writing a surface shader. In my version of the surface shader, I need to define 3 functions: a vertex function, a surface function and a lighting function. The vertex function defines the vertex properties like position and texture coordinates.</p>
<blockquote class="tr_bq">
<ol>
<li>VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)</li>
<li>{</li>
<li>    VTX_FUNC_OUTPUT output;</li>
<li>    output.position = mul( float4(input.position, 1), worldViewProj  );</li>
<li>    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;</li>
<li>    output.uv0 = input.uv0;</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>The surface function describes how the surface looks by defining its diffuse color, glossiness and surface normal.</p>
<blockquote class="tr_bq">
<ol>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>Finally the lighting function will decide which lighting model is used to calculate the reflected color of the surface.</p>
<blockquote class="tr_bq">
<ol>
<li>LIGHT_FUNC_OUTPUT lightFuncLPP(LIGHT_FUNC_INPUT input)</li>
<li>{</li>
<li>    LIGHT_FUNC_OUTPUT output;</li>
<li>    float4 lightColor = lightBuffer.Sample(samplerLinear, input.pxPos.xy * renderTargetSizeInv.xy );</li>
<li>    output.color = float4(input.diffuse * lightColor.rgb, 1);</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>With the above functions, the writer of the surface shader only needs to fill in the output structure of each function, using the input structure together with some auxiliary functions and shader constants provided by the engine.</p>
<p><strong>Generating the shaders</strong></p>
<p>As you can see in the above code snippets, my surface shader just defines normal HLSL functions with fixed input and output structures. So to generate the vertex and pixel shaders, we just need to copy these functions into the target shader code, which invokes the functions defined in the surface shader. Taking the above vertex function as an example, the generated vertex shader would look like:</p>
<blockquote class="tr_bq">
<ol>
<li>#include "include.h"</li>
<li>struct VS_INPUT</li>
<li>{</li>
<li>    float3 position : POSITION0;</li>
<li>    float3 normal : NORMAL0;</li>
<li>    float2 uv0 : UV0;</li>
<li>};</li>
<li>struct VS_OUTPUT</li>
<li>{</li>
<li>    float4 position : SV_POSITION0;</li>
<li>    float3 normal : NORMAL0;</li>
<li>    float2 uv0 : UV0;</li>
<li>};</li>
<li>typedef VS_INPUT VTX_FUNC_INPUT;</li>
<li>typedef VS_OUTPUT VTX_FUNC_OUTPUT;</li>
<li>/********************* User Defined Content ********************/</li>
<li>VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)</li>
<li>{</li>
<li>    VTX_FUNC_OUTPUT output;</li>
<li>    output.position = mul( float4(input.position, 1), worldViewProj  );</li>
<li>    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;</li>
<li>    output.uv0 = input.uv0;</li>
<li>    return output;</li>
<li>}</li>
<li>/******************** End User Defined Content *****************/</li>
<li>VS_OUTPUT main(VS_INPUT input)</li>
<li>{</li>
<li>    return vtxFunc(input);</li>
<li>}</li>
</ol>
</blockquote>
<p>During code generation, the shader generator needs to figure out which input and output structures are needed to feed the user-defined functions. This task is simple and can be accomplished with some string functions.</p>
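<p>The string-based step described above could be sketched as follows. This is only an illustration in Python, not the actual generator: the helper names (<em>members_used</em>, <em>make_struct</em>, <em>generate_vertex_shader</em>) and the two declaration tables are assumptions made for the example, and the structure members are simply discovered by searching the user function for <em>input.xxx</em> and <em>output.xxx</em>.</p>

```python
import re

# Hypothetical tables mapping member names to HLSL declarations.
# The real generator's tables are not shown in the post.
VS_INPUT_DECL = {
    "position": "float3 position : POSITION0;",
    "normal": "float3 normal : NORMAL0;",
    "uv0": "float2 uv0 : UV0;",
}
VS_OUTPUT_DECL = {
    "position": "float4 position : SV_POSITION0;",
    "normal": "float3 normal : NORMAL0;",
    "uv0": "float2 uv0 : UV0;",
}

def members_used(source, struct_var):
    """Return member names accessed as struct_var.member, in first-use order."""
    names = []
    for name in re.findall(r"\b" + re.escape(struct_var) + r"\.(\w+)", source):
        if name not in names:
            names.append(name)
    return names

def make_struct(struct_name, members, decl_table):
    """Build an HLSL struct declaration from the members that are used."""
    body = "\n".join("    " + decl_table[m] for m in members)
    return "struct " + struct_name + "\n{\n" + body + "\n};"

def generate_vertex_shader(vtx_func_source):
    """Wrap the user-defined vertex function into a complete vertex shader."""
    inputs = members_used(vtx_func_source, "input")
    outputs = members_used(vtx_func_source, "output")
    parts = [
        '#include "include.h"',
        make_struct("VS_INPUT", inputs, VS_INPUT_DECL),
        make_struct("VS_OUTPUT", outputs, VS_OUTPUT_DECL),
        "typedef VS_INPUT VTX_FUNC_INPUT;",
        "typedef VS_OUTPUT VTX_FUNC_OUTPUT;",
        vtx_func_source.strip(),
        "VS_OUTPUT main(VS_INPUT input)\n{\n    return vtxFunc(input);\n}",
    ]
    return "\n".join(parts)

VTX_FUNC = """
VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)
{
    VTX_FUNC_OUTPUT output;
    output.position = mul( float4(input.position, 1), worldViewProj );
    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;
    output.uv0 = input.uv0;
    return output;
}
"""

shader = generate_vertex_shader(VTX_FUNC)
print(shader)
```

<p>Searching for member accesses keeps the sketch short, but it would also pick up matches inside comments or strings, which is one reason a real parser becomes attractive once the shaders need to be analysed more deeply.</p>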
<p><strong>Simplifying the shader</strong></p>
<p>As I mentioned before, my shader generator is used for generating shaders for the light pre pass renderer. There are 2 passes in the light pre pass renderer that need different shader inputs and outputs. For example, in the G-buffer pass the shaders are only interested in the surface normal data but not the diffuse color, while the data needed by the second geometry pass is the opposite. However, all the surface information (surface normal and diffuse color) is defined in the surface function inside the surface shader. If we simply generate shaders as in the last section, we will generate some redundant code that cannot be optimized away by the shader compiler. For example, the pixel shader in the G-buffer pass would sample the diffuse texture, which requires the texture coordinates as input from the vertex shader, even though the diffuse color is not needed in this pass; the compiler may not be able to figure out that we don&#8217;t need the texture coordinates output in the vertex shader. Of course, we could force the writer to add some #if preprocessor directives inside the surface function for the particular render pass to eliminate the useless output, but this would complicate the surface shader authoring process: writing a surface shader is about describing how the surface looks and, ideally, the writer should not need to worry about the output of a render pass.</p>
<p>So the problem is to figure out which output data are actually needed in a given pass and eliminate the outputs that are not needed. For example, suppose we are generating shaders for the G-buffer pass from this surface function:</p>
<blockquote class="tr_bq">
<ol>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p>We only want to keep the variables <em>output.normal</em> and <em>output.glossiness</em>. The variable <em>output.diffuse</em>, and the other variables that are referenced only by <em>output.diffuse</em> (<em>diffuseTex</em>, <em>samplerLinear</em>, <em>input.uv0</em>), are going to be eliminated. To find such variable dependencies, we need to teach the shader generator to understand HLSL grammar and find all the assignment statements and branching conditions to derive the variable dependency.</p>
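<p>The elimination itself can be illustrated with a small Python sketch. It assumes the assignment statements have already been extracted from the source (here they are written by hand), and it only handles straight-line assignments, not branching conditions. The names <em>STATEMENTS</em> and <em>eliminate_unused</em> are made up for this example.</p>

```python
# Each statement is (assigned variable, variables it reads, source text).
# In a real generator this list would come from walking the parse tree;
# here it is written by hand for the surface function above.
STATEMENTS = [
    ("output.normal", ["input.normal"], "output.normal = input.normal;"),
    ("output.diffuse", ["diffuseTex", "samplerLinear", "input.uv0"],
     "output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;"),
    ("output.glossiness", ["glossiness"], "output.glossiness = glossiness;"),
]

def eliminate_unused(statements, required_outputs):
    """Blank out statements that do not contribute to required_outputs.

    Returns the rewritten statement list and the set of variables that
    must be available when the function starts.
    """
    live = set(required_outputs)
    kept = []
    # Walk backwards so a variable is known to be needed before we
    # reach the statement that assigns it.
    for target, reads, text in reversed(statements):
        if target in live:
            live.discard(target)  # this statement produces the value
            live.update(reads)    # so its own reads become needed
            kept.append(text)
        else:
            kept.append(";")      # removed, an empty statement is left behind
    kept.reverse()
    return kept, live

kept, needed = eliminate_unused(STATEMENTS, ["output.normal", "output.glossiness"])
print("\n".join(kept))
print(sorted(needed))
```

<p>The second return value is what makes the simplification useful across shader stages: since <em>input.uv0</em> is not in the needed set, the vertex shader for this pass does not have to output the texture coordinates.</p>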
<p>To do this, we need to generate an abstract syntax tree from the shader source code. Of course, we could write our own LALR parser to achieve this goal, but I chose to use <a href="http://dinosaur.compilertools.net/">lex&amp;yacc (or flex&amp;bison)</a> to generate the parse tree. Luckily, we are working on a subset of the HLSL syntax (we only need to define functions and don&#8217;t need pointers) and HLSL syntax is similar to the C language, so modifying the ANSI C grammar rules for <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-l.html">lex</a>&amp;<a href="http://www.lysator.liu.se/c/ANSI-C-grammar-y.html">yacc</a> does the job. Here are my modified <a href="https://sites.google.com/site/simontechblog/home/lex_yacc/lex.l">grammar</a> <a href="https://sites.google.com/site/simontechblog/home/lex_yacc/rule.y">rules</a> used to generate the parse tree. By traversing the parse tree, the variable dependency can be obtained; hence we know which variables need to be eliminated, and we eliminate them by taking out their assignment statements, then the compiler will do the rest. Below is the simplified pixel shader generated from the previous example:</p>
<blockquote class="tr_bq">
<ol>
<li>#include "include.h"</li>
<li>cbuffer _materialParam : register( MATERIAL_CONSTANT_BUFFER_SLOT_0 )</li>
<li>{</li>
<li>    float glossiness;</li>
<li>};</li>
<li>Texture2D diffuseTex: register( MATERIAL_SHADER_RESOURCE_SLOT_0 );</li>
<li>struct PS_INPUT</li>
<li>{</li>
<li>    float4 position : SV_POSITION0;</li>
<li>    float3 normal : NORMAL0;</li>
<li>};</li>
<li>struct PS_OUTPUT</li>
<li>{</li>
<li>    float4 gBuffer : SV_Target0;</li>
<li>};</li>
<li>struct SUF_FUNC_OUTPUT</li>
<li>{</li>
<li>    float3 normal;</li>
<li>    float glossiness;</li>
<li>};</li>
<li>typedef PS_INPUT SUF_FUNC_INPUT;</li>
<li>/********************* User Defined Content ********************/</li>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>                                                                 ;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
<li>/******************** End User Defined Content *****************/</li>
<li>PS_OUTPUT main(PS_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT sufOut= sufFunc(input);</li>
<li>    PS_OUTPUT output;</li>
<li>    output.gBuffer= normalToGBuffer(sufOut.normal, sufOut.glossiness);</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p><strong>Extending the surface shader syntax</strong></p>
<p>As I use lex&amp;yacc to parse the surface shader, I can extend the surface shader syntax by adding more grammar rules, so that the writer of a surface shader can declare which shader constants and textures are needed in the surface function, and the generator can emit the constant buffer and shader resources in the source code. My surface shader syntax also permits users to define their own structs and functions besides the 3 main functions (vertex, surface and lighting function); these are also copied into the generated source code. Here is a sample of what my surface shader looks like:</p>
<blockquote class="tr_bq">
<ol>
<li>RenderType{</li>
<li>    opaque;</li>
<li>};</li>
<li>ShaderConstant</li>
<li>{</li>
<li>    float glossiness: ui_slider_0_255_Glossiness;</li>
<li>};</li>
<li>TextureResource</li>
<li>{</li>
<li>    Texture2D diffuseTex;</li>
<li>};</li>
<li>VTX_FUNC_OUTPUT vtxFunc(VTX_FUNC_INPUT input)</li>
<li>{</li>
<li>    VTX_FUNC_OUTPUT output;</li>
<li>    output.position = mul( float4(input.position, 1), worldViewProj  );</li>
<li>    output.normal = mul( worldInv, float4(input.normal, 0) ).xyz;</li>
<li>    output.uv0 = input.uv0;</li>
<li>    return output;</li>
<li>}</li>
<li>SUF_FUNC_OUTPUT sufFunc(SUF_FUNC_INPUT input)</li>
<li>{</li>
<li>    SUF_FUNC_OUTPUT output;</li>
<li>    output.normal = input.normal;</li>
<li>    output.diffuse = diffuseTex.Sample( samplerLinear, input.uv0 ).rgb;</li>
<li>    output.glossiness = glossiness;</li>
<li>    return output;</li>
<li>}</li>
<li>LIGHT_FUNC_OUTPUT lightFuncLPP(LIGHT_FUNC_INPUT input)</li>
<li>{</li>
<li>    LIGHT_FUNC_OUTPUT output;</li>
<li>    float4 lightColor = lightBuffer.Sample(samplerLinear, input.pxPos.xy * renderTargetSizeInv.xy );</li>
<li>    output.color = float4(input.diffuse * lightColor.rgb, 1);</li>
<li>    return output;</li>
<li>}</li>
</ol>
</blockquote>
<p><strong>Conclusions</strong></p>
<p>This post described how I generate vertex and pixel shader source code for different render passes from a surface shader, which saves me from writing similar shaders multiple times and from worrying about the particular shader input and output of each render pass. Currently, the shader generator can only generate vertex and pixel shaders in HLSL for static meshes in the light pre pass renderer. The shader generator is still in progress: generating shader source code for the forward pass is not done yet. Besides, domain, hull and geometry shaders are not implemented. GLSL support is also missing, but it could be generated (in theory&#8230;) by building a more sophisticated abstract syntax tree while parsing the surface shader, or by defining some new grammar rules in the surface shader (using lex&amp;yacc) to make generating both HLSL and GLSL source code easier. But these will be left for the future as I still need to rewrite my engine and get it running again&#8230;</p>
<p><strong>References</strong><br />
[1] Unity &#8211; Surface Shader Examples <a href="http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderExamples.html">http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderExamples.html</a><br />
[2] Lex &amp; Yacc Tutorial <a href="http://epaperpress.com/lexandyacc/">http://epaperpress.com/lexandyacc/</a><br />
[3] ANSI C grammar, Lex specification <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-l.html">http://www.lysator.liu.se/c/ANSI-C-grammar-l.html</a><br />
[4] ANSI C Yacc grammar <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-y.html">http://www.lysator.liu.se/c/ANSI-C-grammar-y.html</a><br />
[5] <a href="http://www.ibm.com/developerworks/opensource/library/l-flexbison/index.html">http://www.ibm.com/developerworks/opensource/library/l-flexbison/index.html</a><br />
[6] <a href="http://www.gamedev.net/topic/200275-yaccbison-locations/">http://www.gamedev.net/topic/200275-yaccbison-locations/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/08/01/shader-generator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Game Developers: Remember Priority #1</title>
		<link>http://www.altdevblogaday.com/2012/07/03/game-developers-remember-priority-1/</link>
		<comments>http://www.altdevblogaday.com/2012/07/03/game-developers-remember-priority-1/#comments</comments>
		<pubDate>Tue, 03 Jul 2012 18:39:27 +0000</pubDate>
		<dc:creator>Aaron San Filippo</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Bizdev]]></category>
		<category><![CDATA[Game design]]></category>
		<category><![CDATA[Production]]></category>
		<category><![CDATA[creativity]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[flippfly]]></category>
		<category><![CDATA[game design]]></category>
		<category><![CDATA[gamedev]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[time management]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26820</guid>
		<description><![CDATA[<p style="text-align: center"><img class="aligncenter" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Tiziano_-_S%C3%ADsifo.jpg/214px-Tiziano_-_S%C3%ADsifo.jpg" alt="" width="214" height="240" /></p>
<p>(Note: this was originally posted on <a href="http://flippfly.com/news/remember-priority-1/">Flippfly.com</a>)</p>
<p>I questioned the wisdom of writing this, since as of yet, I&#8217;ve not released a highly successful game as an independent developer since quitting my day job back in April. Forest and I have high hopes for <a href="http://flippfly.com">Flippfly</a>, but aside from our moderately successful <a href="http://itunes.apple.com/us/app/monkey-drum-deluxe/id527723426?mt=8">Monkey Drum Deluxe</a>, there&#8217;s really not a lot of inherent credibility to my words that comes with having highly successful products to back them up.</p>
<p><a href="http://www.altdevblogaday.com/2012/07/03/game-developers-remember-priority-1/" class="more-link">Read more on Game Developers: Remember Priority #1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p style="text-align: center"><img class="aligncenter" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Tiziano_-_S%C3%ADsifo.jpg/214px-Tiziano_-_S%C3%ADsifo.jpg" alt="" width="214" height="240" /></p>
<p>(Note: this was originally posted on <a href="http://flippfly.com/news/remember-priority-1/">Flippfly.com</a>)</p>
<p>I questioned the wisdom of writing this, since as of yet, I&#8217;ve not released a highly successful game as an independent developer since quitting my day job back in April. Forest and I have high hopes for <a href="http://flippfly.com">Flippfly</a>, but aside from our moderately successful <a href="http://itunes.apple.com/us/app/monkey-drum-deluxe/id527723426?mt=8">Monkey Drum Deluxe</a>, there&#8217;s really not a lot of inherent credibility to my words that comes with having highly successful products to back them up.</p>
<p>But there&#8217;s a <a href="http://www.majaka.net/so-how-did-ski-champion-do-part-deux/">trend</a> of <a href="http://www.gamasutra.com/view/feature/173453/postmortem_crocodile_.php?print=1">beliefs</a> and focus among <a href="http://www.altdevblogaday.com/2012/03/04/indie-devs-the-odds-are-against-you/">some indies</a> that&#8217;s really kind of discouraging to me.</p>
<p>Namely: I think many have forgotten that the most important factor to success as a game developer is making an excellent game, and started to believe that financial success is either random, or mostly due to factors outside of the game itself.</p>
<p><span id="more-26820"></span></p>
<p>Now &#8211; keep in mind that I said <em>excellent</em> (as opposed to <em>decent, good,</em> or even <em>great</em>) and that when I use that word, I mean: fun, appealing, polished, and accessible (and don&#8217;t take <em>accessible</em> to mean <em>casual</em> or <em>broadly appealing</em>.)</p>
<p>Recently a tweet from Jon Blow (developer of Braid) made me think on this issue again:</p>
<blockquote class="twitter-tweet"><p>This Gamasutra article just kind of makes me angry. I should learn not to care: <a title="http://www.gamasutra.com/view/feature/173068/congratulations_your_first_indie_.php?print=1" href="http://t.co/UmQ02NKf">gamasutra.com/view/feature/1…</a></p>
<p>— Jonathan Blow (@Jonathan_Blow) <a href="https://twitter.com/Jonathan_Blow/status/218612995012567041">June 29, 2012</a></p></blockquote>
<p>&nbsp;</p>
<p>This was in response to a business-centric postmortem, titled &#8220;congratulations, your first indie game is a flop,&#8221; about a game that sold 7 copies.</p>
<p>Jon went on to explain:<br />
<em>&#8220;He made a game that there&#8217;s no reason for people to want, but acts like he is entitled to have people buy it / press cover it.&#8221;</em></p>
<p>Now to be fair, I think Jon&#8217;s take on this was harsh &#8211; I  found the article in question to be informative, and as pointed out by <a href="http://mightyvision.blogspot.co.uk/2012/06/ios-sale-numbers.html">Michael Brough</a>, talking about failures is important. The developer acknowledged mistakes, and ultimately showed no regret at having done something he loved and believed in.</p>
<p>My concern is that people seem to have an expectation that their game will do reasonably well as long as they get all their ducks in a row, and if this doesn&#8217;t happen, often the last thing they focus on is the game itself. Time and again I&#8217;ve seen people reference their 75% or so ratings and then go on to talk as if these are &#8220;great&#8221; reviews and that they just need to get their great game in front of people.</p>
<p>I think this kind of thinking is a big mistake.</p>
<p>Hear me out: it should be a self-evident fact that if you expect to succeed financially, you&#8217;re going to need lots of eyes on your game, <em>especially</em> if you&#8217;re charging a couple bucks or less for it. This can happen in a variety of ways, but it mostly boils down to two: either you spend money on marketing, or you make a game that is so good that its quality and value make it impossible to ignore. You want people to play it, share it, tweet about it, talk about it at work, review it, and feature it, not because of a great icon or an attractive promo video, but because it&#8217;s unquestionably <em>just that good. </em>You want it to be the game about which people say &#8220;you really have to play this!&#8221;</p>
<p>You should be able to think of your game like a dry pile of sticks doused in gasoline, that just <a href="http://bits.blogs.nytimes.com/2012/03/01/temple-run/">needs a spark</a> to <a href="http://struct.ca/2010/the-story-so-far/">ignite it</a>.</p>
<p>It&#8217;s tempting to look at counter-examples: all the good games that somehow get passed over, and all the mediocre games that somehow manage to sell millions.</p>
<p>But in the absence of big marketing dollars, I would argue that:</p>
<p>Mediocre games <em>usually</em> fail.<br />
Good games <em>often</em> fail.<br />
Excellent games <em>rarely </em>fail.</p>
<p>Every other case is just noise.</p>
<p>So am I saying that marketing, PR, great icons, promo videos, a great website, social features, killer screenshots, and personal connections are unimportant?</p>
<p>Of course not!</p>
<p>But if your game is less than excellent, then all this stuff is like trying to push a rock up a hill in today&#8217;s market. That&#8217;s not a fulfilling way to spend your life. And the weaker your game is, the more time you&#8217;re going to spend trying to make all these supporting factors make up for it &#8211; a really bad cycle to be in when time is your most precious asset!</p>
<p>What&#8217;s cool about setting out to make excellent games, is that in addition to taking so much of the randomness out of your success potential, you&#8217;re going to enjoy a much more fulfilling career!</p>
<p>Now I feel I should make a point to say that sales isn&#8217;t the only type of success &#8211; and there is certainly room for every type of game, as <a href="http://ramiismail.com/2012/07/on-success-failure-and-the-scene/">Rami Ismail of Vlambeer points out</a>. It&#8217;s a big space and not everybody in it is trying to make a living at it. We actually just submitted a little toy to the app store called &#8220;Creepy Eye&#8221; &#8211; an experiment using face-tracking and the gyroscope that we hadn&#8217;t seen explored before. Making an artful experiment or a cool diversion is a reward in itself. But my concern is when people start to feel a sense of entitlement or surprise when these experiments and &#8220;pretty good&#8221; games don&#8217;t garner any attention or sales, with sometimes lengthy and publicized business-focused analysis of where their monetization strategy failed, or scary-sounding warnings to other would-be indie developers.</p>
<p>If you want to sell games, and you don&#8217;t like throwing dice with your financial future, you need to be determined to produce excellence.</p>
<p>So do yourself a favor: take a break from your monetization strategizing, video-editing, press-emailing, buzz-creating, and icon-tuning for a minute, and ask yourself:<br />
&nbsp;</p>
<h4 style="text-align: center"><span style="color: #333399"><span style="text-align: center">&#8220;Is this game </span><em>Excellent</em><span style="text-align: center"> yet?&#8221;</span></span></h4>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/07/03/game-developers-remember-priority-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Photon Mapping Part 2</title>
		<link>http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/</link>
		<comments>http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/#comments</comments>
		<pubDate>Thu, 28 Jun 2012 14:39:09 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[GI]]></category>
		<category><![CDATA[global illumination]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[light map]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26786</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.hk/2012/06/photon-mapping-part-1.html">previous post</a>, this post describes how the <a href="http://en.wikipedia.org/wiki/Lightmap">light map</a> is calculated from the photon map. My light map stores the incoming radiance of indirect lighting on a surface, projected into the <a href="http://en.wikipedia.org/wiki/Spherical_harmonics">Spherical Harmonics (SH)</a> basis. 4 SH coefficients are used for each color channel, so 3 textures are used for the RGB channels (12 coefficients in total).</p>
<p><a href="http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/" class="more-link">Read more on Photon Mapping Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.hk/2012/06/photon-mapping-part-1.html">previous post</a>, this post describes how the <a href="http://en.wikipedia.org/wiki/Lightmap">light map</a> is calculated from the photon map. My light map stores the incoming radiance of indirect lighting on a surface, projected into the <a href="http://en.wikipedia.org/wiki/Spherical_harmonics">Spherical Harmonics (SH)</a> basis. 4 SH coefficients are used for each color channel, so 3 textures are used for the RGB channels (12 coefficients in total).</p>
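<p>To make the storage concrete, here is a minimal Python sketch of projecting radiance samples of one color channel into 4 SH coefficients (bands 0 and 1). The function names and the sample layout (direction, radiance, solid angle) are assumptions made for this illustration; the post does not state how the samples are weighted, so treat this as one possible formulation rather than the one used in the demo.</p>

```python
import math

def sh_basis(direction):
    """First 4 real SH basis functions (bands 0 and 1) for a unit vector."""
    x, y, z = direction
    return (0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x)

def project_to_sh(samples):
    """Project radiance samples of one color channel into 4 SH coefficients.

    samples: iterable of (unit direction, radiance, solid angle) tuples,
    where solid angle is the part of the sphere the sample stands for.
    """
    coeffs = [0.0, 0.0, 0.0, 0.0]
    for direction, radiance, solid_angle in samples:
        for i, basis in enumerate(sh_basis(direction)):
            coeffs[i] += radiance * basis * solid_angle
    return coeffs

# Constant radiance of 1 seen from the 6 axis directions, each standing
# for one sixth of the sphere.
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
samples = [(d, 1.0, 4.0 * math.pi / 6.0) for d in axes]
red = project_to_sh(samples)
print(red)
```

<p>Running the same projection once per color channel gives the 12 coefficients, 4 per texture. For constant radiance only the first coefficient is non-zero, because the three band 1 terms cancel between opposite directions.</p>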
<p><strong><span class="Apple-style-span" style="font-size: large">Baking the light map</span></strong></p>
<p>To bake the light map, the scene must have a set of unique, non-overlapping texture coordinates (UV), so that each UV corresponds to a unique world space position and the incoming radiance at that position can be represented. This set of UVs can be generated inside a modeling package or using <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/bb206321(v=vs.85).aspx">UVAtlas</a>. In my simple case, the UVs are mapped manually.</p>
<p>To generate the light map, given a mesh with unique UVs and the light map resolution, we need to rasterize the mesh (using scan-line or half-space rasterization) into texture space, interpolating the world space position across the triangles, so that we can associate a world space position with each light map texel. Then for each texel, we can sample the photon map at the corresponding world space position by performing a final gather step, just like in the previous post for offline rendering. This gives the incoming radiance at that world space position, and hence at the texel in the light map. The data is then projected into SH coefficients and stored in 3 16-bit floating point textures. Below is a light map showing the dominant light color extracted from the SH coefficients:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-CrVk0YjGgJ8/T9ntBgbKVpI/AAAAAAAAAYo/RY02Z1TS1DM/s1600/sh_lightMap_dominantColor.png"><img src="http://1.bp.blogspot.com/-CrVk0YjGgJ8/T9ntBgbKVpI/AAAAAAAAAYo/RY02Z1TS1DM/s200/sh_lightMap_dominantColor.png" alt="" width="200" height="200" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">The baked light map showing the dominant light color from SH coefficients</td>
</tr>
</tbody>
</table>
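<p>The rasterization step of the baking process can be sketched in Python as below. This is a simplified half-space rasterizer written for illustration: it tests every texel against a single triangle instead of only the triangle&#8217;s bounding box, and the function names (<em>edge</em>, <em>rasterize_triangle</em>) are made up for the example. The final gather would then be run at each returned world space position.</p>

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area of the parallelogram spanned by (b - a) and (p - a)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize_triangle(uvs, world_positions, resolution):
    """Map the light map texels covered by one triangle to world positions.

    uvs: three (u, v) pairs in the 0..1 range.
    world_positions: three (x, y, z) tuples, one per vertex.
    Returns a dict {(texel_x, texel_y): (x, y, z)}.
    """
    # Triangle corners in texel units.
    pts = [(u * resolution, v * resolution) for u, v in uvs]
    (x0, y0), (x1, y1), (x2, y2) = pts
    area = edge(x0, y0, x1, y1, x2, y2)
    texels = {}
    if area == 0:
        return texels  # a degenerate triangle covers no texel
    for ty in range(resolution):
        for tx in range(resolution):
            px, py = tx + 0.5, ty + 0.5  # sample at the texel centre
            # Barycentric weights from the three half-space tests.
            w0 = edge(x1, y1, x2, y2, px, py) / area
            w1 = edge(x2, y2, x0, y0, px, py) / area
            w2 = edge(x0, y0, x1, y1, px, py) / area
            if min(w0, w1, w2) >= 0:
                # Interpolate the world space position across the triangle.
                texels[(tx, ty)] = tuple(
                    w0 * a + w1 * b + w2 * c
                    for a, b, c in zip(*world_positions)
                )
    return texels

uvs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
world = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 0.0, 4.0)]
texels = rasterize_triangle(uvs, world, 4)
print(len(texels), texels[(0, 0)])
```

<p>Dividing the edge values by the signed triangle area makes the test independent of the winding order of the UV triangle. A production baker would usually also dilate the result by a few texels so that bilinear filtering does not pick up unwritten texels at chart borders.</p>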
<p><strong><span class="Apple-style-span" style="font-size: large">Using the light map</span></strong></p>
<p>After baking the light map, the direct lighting is rendered at run-time in the usual way. A point light is used to approximate the area light of the ray traced version; the difference is most noticeable at the shadow edges.</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-GF5MfBalGVU/T9nup6cp4sI/AAAAAAAAAYw/7wxAHirwzV8/s1600/sh_lightMap_direct_only.png"><img src="http://4.bp.blogspot.com/-GF5MfBalGVU/T9nup6cp4sI/AAAAAAAAAYw/7wxAHirwzV8/s320/sh_lightMap_direct_only.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct lighting only, real time version</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s1600/d_only.png"><img src="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s200/d_only.png" alt="" width="200" height="200" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct lighting only, ray traced version</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Then we sample the SH coefficients from the light map to calculate the indirect lighting:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-LYrMnBJgpRU/T9nvty0YlOI/AAAAAAAAAY4/Z-m1B8XpP2w/s1600/sh_lightMap_indirect_only.png"><img src="http://2.bp.blogspot.com/-LYrMnBJgpRU/T9nvty0YlOI/AAAAAAAAAY4/Z-m1B8XpP2w/s320/sh_lightMap_indirect_only.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">indirect lighting only, real time version</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s1600/id_fg.png"><img src="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s200/id_fg.png" alt="" width="200" height="200" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">indirect lighting only, ray traced version</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
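<p>The SH lookup above can be sketched as follows. This is only a minimal sketch under assumptions that are not stated in the post: the light map stores 4 SH coefficients (bands 0 and 1) of the incoming radiance per colour channel, the surface is Lambertian, and the function names are hypothetical.</p>

```python
import math

# Real SH basis constants for bands 0 and 1.
Y00 = 0.282095
Y1 = 0.488603

def sh_basis(d):
    """Evaluate the first 4 real SH basis functions for a unit direction d."""
    x, y, z = d
    return [Y00, Y1 * y, Y1 * z, Y1 * x]

def sh_irradiance(coeffs, normal):
    """Irradiance at a surface with the given unit normal.

    coeffs holds 4 SH coefficients of the incoming radiance (one colour
    channel, assumed layout). Each band is scaled by the clamped-cosine
    convolution factor: pi for band 0 and 2*pi/3 for band 1.
    """
    a1 = 2.0 * math.pi / 3.0
    band_scale = [math.pi, a1, a1, a1]
    basis = sh_basis(normal)
    return sum(a * c * b for a, c, b in zip(band_scale, coeffs, basis))

def diffuse_reflected_radiance(coeffs, normal, albedo):
    """Lambertian reflected radiance: albedo / pi * irradiance."""
    return albedo / math.pi * max(0.0, sh_irradiance(coeffs, normal))
```

<p>Because the result depends on the normal passed in, a per-pixel normal from a normal map can be used in place of the mesh normal.</p>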
<p>Combining the direct and indirect lighting, the final result becomes:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-Y0EF5oGqHAY/T9nwoST7TFI/AAAAAAAAAZA/v3B3vZe-vJE/s1600/sh_lightMap.png"><img src="http://3.bp.blogspot.com/-Y0EF5oGqHAY/T9nwoST7TFI/AAAAAAAAAZA/v3B3vZe-vJE/s320/sh_lightMap.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct + indirect lighting, real time version</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s1600/d_id_fg.png"><img src="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s200/d_id_fg.png" alt="" width="200" height="199" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">direct + indirect lighting, ray traced version</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<div>As we store the light map in SH, we can apply a normal map to the mesh to change the reflected radiance.</div>
<div>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-wKt_4DgyH-I/T9nxjyWAqiI/AAAAAAAAAZI/AD3L13yYkfw/s1600/sh_lightMap_normal.png"><img src="http://1.bp.blogspot.com/-wKt_4DgyH-I/T9nxjyWAqiI/AAAAAAAAAZI/AD3L13yYkfw/s320/sh_lightMap_normal.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Rendered with normal map</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-4hsM8An5JvY/T9nyDJ_KG1I/AAAAAAAAAZQ/4JGIlEUh8HY/s1600/sh_lightMap_normal_indirect.png"><img src="http://4.bp.blogspot.com/-4hsM8An5JvY/T9nyDJ_KG1I/AAAAAAAAAZQ/4JGIlEUh8HY/s320/sh_lightMap_normal_indirect.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Indirect lighting with normal map</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>We can also apply some tessellation and add some ambient occlusion (AO) to make the result more interesting:</p>
</div>
<div>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-SpgqBKORBXY/T9nzEaQJi3I/AAAAAAAAAZY/wmhNQnu-kYw/s1600/sh_lightMap_normal_tess_ao.png"><img src="http://2.bp.blogspot.com/-SpgqBKORBXY/T9nzEaQJi3I/AAAAAAAAAZY/wmhNQnu-kYw/s320/sh_lightMap_normal_tess_ao.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Rendered with light map, normal map, tessellation and AO</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-bc1z8O_PAFk/T9nzL77umpI/AAAAAAAAAZg/WeXFHuxPYyA/s1600/sh_lightMap_normal_tess_ao2.png"><img src="http://2.bp.blogspot.com/-bc1z8O_PAFk/T9nzL77umpI/AAAAAAAAAZg/WeXFHuxPYyA/s320/sh_lightMap_normal_tess_ao2.png" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Rendered with light map, normal map, tessellation and AO</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong></p>
<p>This post gives an overview of how to bake a light map of indirect lighting data by sampling from the photon map. I use SH to store the incoming radiance, but other data can be stored instead, such as the reflected diffuse radiance of the surface, which reduces texture storage and doesn&#8217;t require a floating point texture. Besides, the SH coefficients can be stored per vertex in the static mesh instead of in a light map. Lastly, by sampling the photon map with final gather rays, light probes for dynamic objects can also be baked using similar methods.</p>
<p><strong>References</strong></p>
<p><span class="Apple-style-span" style="font-size: x-small">March of the Froblins: <a href="http://developer.amd.com/samples/demos/pages/froblins.aspx">http://developer.amd.com/samples/demos/pages/froblins.aspx</a></span></p>
<p><span class="Apple-style-span" style="font-size: x-small">Lighting and Material of HALO 3: <a href="http://www.bungie.net/images/Inside/publications/presentations/lighting_material.zip">http://www.bungie.net/images/Inside/publications/presentations/lighting_material.zip</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/06/28/photon-mapping-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Photon Mapping Part 1</title>
		<link>http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/</link>
		<comments>http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/#comments</comments>
		<pubDate>Thu, 14 Jun 2012 02:29:27 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[GI]]></category>
		<category><![CDATA[global illumination]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=26588</guid>
		<description><![CDATA[<p><strong>Introduction</strong></p>
<p>In this generation of computer graphics, <a href="http://en.wikipedia.org/wiki/Global_illumination">global illumination</a> (GI) is an important technique which calculates indirect lighting within a scene. <a href="http://en.wikipedia.org/wiki/Photon_mapping">Photon mapping</a> is one of the GI techniques; it uses particle tracing to compute images in offline rendering. Photon mapping is easy to implement, so I chose to learn it, and my target is to bake a light map storing indirect diffuse lighting information using the photon map. Photon mapping consists of 2 passes: the photon map pass and the render pass, which are described below.</p>
<p><a href="http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/" class="more-link">Read more on Photon Mapping Part 1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong>Introduction</strong></p>
<p>In this generation of computer graphics, <a href="http://en.wikipedia.org/wiki/Global_illumination">global illumination</a> (GI) is an important technique which calculates indirect lighting within a scene. <a href="http://en.wikipedia.org/wiki/Photon_mapping">Photon mapping</a> is one of the GI techniques; it uses particle tracing to compute images in offline rendering. Photon mapping is easy to implement, so I chose to learn it, and my target is to bake a light map storing indirect diffuse lighting information using the photon map. Photon mapping consists of 2 passes: the photon map pass and the render pass, which are described below.</p>
<p><strong>Photon Map Pass</strong></p>
<p>In this pass, photons are cast into the scene from the position of the light source. Each photon stores a packet of energy. When a photon hits a surface in the scene, it is either reflected (diffusely or specularly), transmitted or absorbed, which is determined by Russian roulette.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-9JpihE-_5DE/T9f4DYhYuFI/AAAAAAAAAYI/VIRqKYHHUgE/s1600/photonPath.png"><img src="http://2.bp.blogspot.com/-9JpihE-_5DE/T9f4DYhYuFI/AAAAAAAAAYI/VIRqKYHHUgE/s400/photonPath.png" alt="" width="400" height="295" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Photons are traced in the scene to simulate light transport</td>
</tr>
</tbody>
</table>
<div></div>
<div>This hit event represents the incoming energy at that surface and is stored in a <a href="http://en.wikipedia.org/wiki/K-d_tree">k-d tree</a> (known as the photon map) for lookup in the render pass. Each hit event stores the photon energy, the incoming direction and the hit position.</div>
<p>However, it is more convenient to store radiance than energy in a photon because, when using a punctual light source (e.g. a point light), it is hard to compute the emitted energy from the light source radiance. So I use the method described in <a href="http://www.pbrt.org/">Physically Based Rendering</a>, where a weight of radiance is stored in each photon:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-KhcyefbkX-I/T9fxHIj6NgI/AAAAAAAAAW0/xXmdUhJVEbI/s1600/alpha_emit.png"><img src="http://3.bp.blogspot.com/-KhcyefbkX-I/T9fxHIj6NgI/AAAAAAAAAW0/xXmdUhJVEbI/s320/alpha_emit.png" alt="" width="320" height="148" border="0" /></a></div>
<p>When a photon hits a surface, the probability of being reflected in a new random direction used in Russian roulette is:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-2uau5SERimg/T9fxNXmevlI/AAAAAAAAAW8/bx2m2a3Sjis/s1600/prob_reflect.png"><img src="http://1.bp.blogspot.com/-2uau5SERimg/T9fxNXmevlI/AAAAAAAAAW8/bx2m2a3Sjis/s320/prob_reflect.png" alt="" width="320" height="143" border="0" /></a></div>
<p>This probability is chosen because a photon has a higher chance of being reflected if it is brighter. If the photon is reflected, its radiance will be updated to:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://1.bp.blogspot.com/-wLu8xN8rO6A/T9fxS3Q_qcI/AAAAAAAAAXE/vX5CQ94kjnE/s1600/alphaReflect.png"><img src="http://1.bp.blogspot.com/-wLu8xN8rO6A/T9fxS3Q_qcI/AAAAAAAAAXE/vX5CQ94kjnE/s320/alphaReflect.png" alt="" width="320" height="69" border="0" /></a></div>
<p>The photon then continues to be traced in the newly reflected direction.</p>
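<p>The Russian roulette step for a diffuse bounce can be sketched as below. The equations in this post are images, so this sketch assumes a PBRT-style rule: the reflection probability is the ratio of the photon's luminance after and before the bounce, and a surviving photon is divided by that probability. The function names and the <code>albedo</code> input are hypothetical.</p>

```python
import random

def luminance(rgb):
    """Approximate luminance of an RGB triple (Rec. 709 weights)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def russian_roulette(alpha, albedo, rng=random.random):
    """Decide whether a photon survives a diffuse bounce.

    alpha  : RGB weight carried by the photon before the hit.
    albedo : RGB diffuse reflectance of the surface (hypothetical input).
    Returns the updated weight if the photon is reflected, else None.
    """
    lum_before = luminance(alpha)
    if lum_before == 0.0:
        return None  # a black photon carries nothing worth tracing
    # Weight the photon would carry after the bounce, before roulette.
    alpha_new = tuple(a * k for a, k in zip(alpha, albedo))
    # Brighter photons are more likely to continue.
    p_reflect = min(1.0, luminance(alpha_new) / lum_before)
    if rng() >= p_reflect:
        return None  # absorbed
    # Divide by the survival probability to keep the estimate unbiased.
    return tuple(a / p_reflect for a in alpha_new)
```

<p>Passing a custom <code>rng</code> makes the decision reproducible, which helps when debugging the photon paths.</p>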
<p><strong>Render Pass</strong></p>
<p>In the render pass, the direct and indirect lighting are computed separately. The direct lighting is computed using ray tracing.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s1600/d_only.png"><img src="http://3.bp.blogspot.com/-eDfbsOmVsFU/T9fy_NUuBAI/AAAAAAAAAXs/tld-MRVM9DM/s320/d_only.png" alt="" width="320" height="320" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Direct light only</td>
</tr>
</tbody>
</table>
<p>The indirect lighting is computed by sampling from the photon map. When calculating the indirect lighting at a given position (in this case, the shading pixel), we can locate N nearby photons in the photon map to estimate the incoming radiance using <a href="http://en.wikipedia.org/wiki/Kernel_density_estimation">kernel density estimation</a>. A <a href="http://en.wikipedia.org/wiki/Kernel_(statistics)">kernel function</a> needs to satisfy the conditions:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-kqw71KJDmyg/T9fxYqiRdKI/AAAAAAAAAXM/1DHND31RjAM/s1600/kernel.png"><img src="http://2.bp.blogspot.com/-kqw71KJDmyg/T9fxYqiRdKI/AAAAAAAAAXM/1DHND31RjAM/s200/kernel.png" alt="" width="200" height="56" border="0" /></a></div>
<p>&nbsp;</p>
<div style="margin: 0px">I use Simpson&#8217;s kernel (also known as Silverman&#8217;s second order kernel) suggested in the book Physically Based Rendering:</div>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-Ak52BhXbAwY/T9fxfemYUsI/AAAAAAAAAXU/EefVZobxOn8/s1600/simpsonKernel.png"><img src="http://2.bp.blogspot.com/-Ak52BhXbAwY/T9fxfemYUsI/AAAAAAAAAXU/EefVZobxOn8/s320/simpsonKernel.png" alt="" width="320" height="112" border="0" /></a></div>
<p>Then the density can be computed using the kernel estimator for N samples within a distance d (i.e. the distance to the farthest photon among the N samples):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-EVIqO2i0ajw/T9fxmyjV79I/AAAAAAAAAXc/8esrs_WX7YM/s1600/kernelEstimator.png"><img src="http://4.bp.blogspot.com/-EVIqO2i0ajw/T9fxmyjV79I/AAAAAAAAAXc/8esrs_WX7YM/s320/kernelEstimator.png" alt="" width="320" height="96" border="0" /></a></div>
<p>Then the reflected radiance at the shading position can be computed with:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-xNS0AtGd8Bk/T9fxuJ1RCyI/AAAAAAAAAXk/z7vOb8ZhVFY/s1600/L_relfected.png"><img src="http://3.bp.blogspot.com/-xNS0AtGd8Bk/T9fxuJ1RCyI/AAAAAAAAAXk/z7vOb8ZhVFY/s320/L_relfected.png" alt="" width="320" height="82" border="0" /></a></div>
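<p>The density estimate can be sketched as below. Since the equations in this post are images, the sketch assumes the form used in Physically Based Rendering: the kernel 3/pi * (1 - x^2)^2, which integrates to 1 over the unit disk, a constant (Lambertian) BRDF, and a single colour channel. All function and parameter names are hypothetical.</p>

```python
import math

def simpson_kernel(x):
    """Simpson's kernel in 2D: 3/pi * (1 - x^2)^2 for x in [0, 1), else 0."""
    if x >= 1.0:
        return 0.0
    s = 1.0 - x * x
    return 3.0 / math.pi * s * s

def estimate_reflected_radiance(photons, position, n_emitted, brdf):
    """Density estimate of the reflected radiance at `position`.

    photons   : list of (hit_position, weight) pairs, the N nearest photons.
    n_emitted : total number of photon paths shot from the light.
    brdf      : constant BRDF value, e.g. albedo / pi for a Lambertian surface.
    """
    dist2 = [sum((a - b) ** 2 for a, b in zip(p, position)) for p, _ in photons]
    d2 = max(dist2)  # squared search radius: distance to the farthest photon
    if d2 == 0.0:
        return 0.0  # degenerate case: every photon sits exactly on the point
    total = 0.0
    for (_, weight), r2 in zip(photons, dist2):
        total += simpson_kernel(math.sqrt(r2 / d2)) * brdf * weight
    return total / (n_emitted * d2)
```

<p>Note that with this kernel the farthest photon always receives a weight of zero, so it only defines the search radius.</p>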
<p>However, the result shows some circular artifacts:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-Xny1lJG_Cgc/T9f0mR2R73I/AAAAAAAAAX8/cJTSfOVvFvs/s1600/d_id_no_fg.png"><img src="http://3.bp.blogspot.com/-Xny1lJG_Cgc/T9f0mR2R73I/AAAAAAAAAX8/cJTSfOVvFvs/s320/d_id_no_fg.png" alt="" width="320" height="319" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Using the photon map directly for indirect diffuse light would show artifacts</td>
</tr>
</tbody>
</table>
<p>To tackle this problem, we can either increase the number of photons to a very high number or perform a final gather step. In the final gather step, we shoot a number of final gather rays from the pixel that we are shading in random directions over the hemisphere of the shading point.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-vgb5P7NDD4c/T9f7JbTPF4I/AAAAAAAAAYc/Enpi3xO-AKs/s1600/finalGather.png"><img src="http://3.bp.blogspot.com/-vgb5P7NDD4c/T9f7JbTPF4I/AAAAAAAAAYc/Enpi3xO-AKs/s320/finalGather.png" alt="" width="320" height="294" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Final gather rays are cast from every shading position</td>
</tr>
</tbody>
</table>
<p>When a final gather ray hits another surface, the photon map is queried just like before, and the reflected radiance from this surface is the incoming radiance of the shading pixel. Using Monte Carlo integration, the reflected radiance at the shading pixel can be calculated by sampling the final gather rays. Here is the final result:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s1600/d_id_fg.png"><img src="http://4.bp.blogspot.com/-QaIn1HWRpLw/T9fzJZhhpJI/AAAAAAAAAX0/WOhdDiKlESM/s320/d_id_fg.png" alt="" width="320" height="319" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Direct light + Indirect light, with final gather</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s1600/id_fg.png"><img src="http://1.bp.blogspot.com/-3_0VKifMtNE/T9f4oQKJHPI/AAAAAAAAAYQ/EVN--HPsl28/s320/id_fg.png" alt="" width="320" height="320" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Indirect light only, with final gather</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
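<p>The final gather step can be sketched as a Monte Carlo estimator. This sketch assumes a Lambertian surface and cosine-weighted hemisphere sampling, which is one common choice and not necessarily the distribution used for the images above. <code>incoming_radiance</code> is a hypothetical callback standing in for "trace the gather ray and query the photon map at the hit point".</p>

```python
import math
import random

def cosine_sample_hemisphere(u1, u2):
    """Map two uniform numbers in [0, 1) to a unit direction around +z,
    distributed with pdf = cos(theta) / pi."""
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))

def final_gather(incoming_radiance, albedo, n_rays, rng=random.random):
    """Monte Carlo estimate of the diffuse reflected radiance.

    incoming_radiance(direction) is a hypothetical callback: it traces a
    gather ray (direction given in the local frame of the shading point)
    and returns the radiance estimated from the photon map at the hit.
    """
    total = 0.0
    for _ in range(n_rays):
        direction = cosine_sample_hemisphere(rng(), rng())
        # Integrand: (albedo / pi) * L_i * cos(theta); pdf: cos(theta) / pi.
        # The cosine and pi cancel, leaving albedo * L_i per sample.
        total += albedo * incoming_radiance(direction)
    return total / n_rays
```

<p>With cosine-weighted sampling every gather ray contributes with equal weight, so the noise comes only from the variation of the incoming radiance.</p>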
<p><strong>Conclusion</strong></p>
<p>In this post, the steps to implement photon mapping are briefly described. It is a 2-pass approach: the photon map pass builds a photon map as a kd-tree representing the indirect lighting data, and the render pass uses the photon map to compute the final image. In the next part, I will describe how to make use of the photon map to bake a light map for real time applications.</p>
<p><strong>References</strong></p>
<p>A Practical Guide to Global Illumination using Photon Maps: <a href="http://nameless.cis.udel.edu/class_data/cg/jensen_photon_mapping_tutorial.pdf">http://nameless.cis.udel.edu/class_data/cg/jensen_photon_mapping_tutorial.pdf</a></p>
<p>Physically Based Rendering: <a href="http://www.pbrt.org/">http://www.pbrt.org/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/06/14/photon-mapping-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Rasterizer Part 2</title>
		<link>http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/</link>
		<comments>http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/#comments</comments>
		<pubDate>Sun, 29 Apr 2012 12:48:13 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[maths]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software rasterizer]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25741</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html">previous post</a>: after filling the triangle with the scan line or half-space algorithm, we also need to interpolate the vertex attributes across the triangle so that we have texture coordinates or depth at every pixel. However, we cannot directly interpolate those attributes in screen space because the projection transform followed by perspective division is not an <a href="http://en.wikipedia.org/wiki/Affine_transformation">affine transformation</a> (i.e. after transformation, the mid-point of a line segment is no longer the mid-point). This results in distortion, and the artifact is even more noticeable when the triangle is large:</p>
<p><a href="http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/" class="more-link">Read more on Software Rasterizer Part 2&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong></p>
<p>Continuing from the <a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html">previous post</a>: after filling the triangle with the scan line or half-space algorithm, we also need to interpolate the vertex attributes across the triangle so that we have texture coordinates or depth at every pixel. However, we cannot directly interpolate those attributes in screen space because the projection transform followed by perspective division is not an <a href="http://en.wikipedia.org/wiki/Affine_transformation">affine transformation</a> (i.e. after transformation, the mid-point of a line segment is no longer the mid-point). This results in distortion, and the artifact is even more noticeable when the triangle is large:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-XOR93SRDUew/T4loMyobapI/AAAAAAAAAVc/bBYSu-0cIrg/s1600/interpolateInScrSpace.png"><img src="http://4.bp.blogspot.com/-XOR93SRDUew/T4loMyobapI/AAAAAAAAAVc/bBYSu-0cIrg/s320/interpolateInScrSpace.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">interpolate in screen space</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-zIhlNAMPUU4/T4loScmVUgI/AAAAAAAAAVk/GzOogFZwhnM/s1600/interpolatePerspectiveCorrect.png"><img src="http://3.bp.blogspot.com/-zIhlNAMPUU4/T4loScmVUgI/AAAAAAAAAVk/GzOogFZwhnM/s320/interpolatePerspectiveCorrect.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">perspective correct interpolation</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">Condition for linear interpolation</span></p>
<p>When interpolating the attributes in a linear way, we are saying that given a set of vertices, <em>v</em><span class="Apple-style-span" style="font-size: xx-small"><em>i</em></span> (where i is any integer&gt;=0) with a set of attributes <em>a</em><em><span class="Apple-style-span" style="font-size: xx-small">i</span></em> (such as texture coordinates), we have a function mapping a vertex to the corresponding attributes, i.e.</p>
<div style="text-align: center"><em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">i</span></em>)= <em>a</em><em><span class="Apple-style-span" style="font-size: xx-small">i</span></em></div>
<p>Say, to interpolate a vertex inside a triangle in a linear way, the function <em>f</em> needs to have the following property:</p>
<div style="text-align: center"><em>f</em>(<em>t<span class="Apple-style-span" style="font-size: xx-small">0</span></em> *<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">0</span></em> + <em>t<span class="Apple-style-span" style="font-size: xx-small">1</span></em> *<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">1</span></em> + <em>t<span class="Apple-style-span" style="font-size: xx-small">2</span></em> *<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">2</span></em> ) = <em>t<span class="Apple-style-span" style="font-size: xx-small">0</span></em> * <em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">0</span></em>) + <em>t<span class="Apple-style-span" style="font-size: xx-small">1</span></em> * <em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">1</span></em>) + <em>t<span class="Apple-style-span" style="font-size: xx-small">2</span></em> * <em>f</em>(<em>v</em><em><span class="Apple-style-span" style="font-size: xx-small">2</span></em>)</div>
<div style="text-align: right"><span class="Apple-style-span" style="font-size: x-small"><span class="Apple-style-span">, for any </span><em>t0</em>, <em>t1</em>, <em>t2</em><span class="Apple-style-span"> where <em>t0 </em></span>+ <em>t1 </em>+ <em>t2</em>=1</span></div>
<p>which means that we can calculate the interpolated attributes using the same weights <em>t<span class="Apple-style-span" style="font-size: xx-small">i</span></em> used for interpolating the vertex position. Functions having the above property are <a href="http://en.wikipedia.org/wiki/Affine_transformation">affine functions</a> of the following form:</p>
<div style="text-align: center"><em>f</em>(<em>x</em>)= <em>Ax</em> + <em>b</em></div>
<div style="text-align: right"><span class="Apple-style-span" style="font-size: x-small">, where <em>A</em> is a matrix, <em>x</em> and <em>b</em> are vectors</span></div>
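<p>As a quick numeric check of this property, the sketch below evaluates both sides of the equation for an arbitrary affine function. The matrix, offset, vertices and weights are made-up values chosen only for illustration.</p>

```python
def affine(A, b, x):
    """Evaluate f(x) = A x + b for a matrix A (list of rows) and vectors b, x."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]

def combine(weights, vectors):
    """Weighted sum of vectors: sum_i t_i * v_i."""
    size = len(vectors[0])
    return [sum(t * v[k] for t, v in zip(weights, vectors)) for k in range(size)]

# Hypothetical mapping from 2D vertex positions to 2D texture coordinates.
A = [[0.5, 0.0], [0.25, 2.0]]
b = [0.1, -0.3]
verts = [[0.0, 0.0], [4.0, 0.0], [0.0, 2.0]]
t = [0.2, 0.3, 0.5]  # interpolation weights, summing to 1

lhs = affine(A, b, combine(t, verts))               # f(sum t_i v_i)
rhs = combine(t, [affine(A, b, v) for v in verts])  # sum t_i f(v_i)
```

<p>Both sides agree up to floating point rounding. The equality relies on the weights summing to 1, otherwise the offset <em>b</em> would be scaled incorrectly.</p>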
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">Depth interpolation</span></p>
<p>When a vertex is projected from view space to normalized device coordinates (NDC), we have the following relations (from the ratio of the triangles) between view space and NDC space:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-cI7nS4ZOEXY/T4mDh9YNMNI/AAAAAAAAAVs/s0NYz7z_VME/s1600/eqt1_2.png"><img src="http://2.bp.blogspot.com/-cI7nS4ZOEXY/T4mDh9YNMNI/AAAAAAAAAVs/s0NYz7z_VME/s320/eqt1_2.png" alt="" width="320" height="88" border="0" /></a></div>
<p>Substituting equations 1 and 2 into the equation of the plane that the triangle lies on:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-39mf9Y9Nq7E/T4mDn7EyE6I/AAAAAAAAAV0/jd-K65xWHmU/s1600/eqt3.png"><img src="http://2.bp.blogspot.com/-39mf9Y9Nq7E/T4mDn7EyE6I/AAAAAAAAAV0/jd-K65xWHmU/s400/eqt3.png" alt="" width="400" height="90" border="0" /></a></div>
<p>&nbsp;</p>
<div>
<div>
<p>So, 1/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> is an affine function of <em>x<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em> and <em>y<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em> which can be interpolated linearly across the screen space (the transform from NDC space to screen space is an affine transform).</p>
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">Attributes interpolation</span></p>
<p>In the last section, we saw how to interpolate the depth of a pixel linearly in screen space. The next problem is to interpolate the vertex attributes (e.g. texture coordinates). In view space, we know that those attributes can be interpolated linearly, so they can be calculated by an affine function with the vertex position as parameters, e.g.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-YN4TbKnFFRo/T4mDwJdsZlI/AAAAAAAAAV8/9KUBzFV_q2A/s1600/uEqt.png"><img src="http://3.bp.blogspot.com/-YN4TbKnFFRo/T4mDwJdsZlI/AAAAAAAAAV8/9KUBzFV_q2A/s320/uEqt.png" alt="" width="320" height="44" border="0" /></a></div>
<p>Similar to interpolating depth, substitute equations 1 and 2 into the above equation:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-A7XmZcn0wjY/T4mD3I0_Q9I/AAAAAAAAAWE/2Z7GLe-jNWA/s1600/uOverZ.png"><img src="http://4.bp.blogspot.com/-A7XmZcn0wjY/T4mD3I0_Q9I/AAAAAAAAAWE/2Z7GLe-jNWA/s400/uOverZ.png" alt="" width="400" height="121" border="0" /></a></div>
<p>Therefore, <em>u</em>/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> is another affine function of <em>x<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em> and <em>y<span class="Apple-style-span" style="font-size: xx-small">ndc</span></em>, which can be interpolated linearly across screen space. Hence we can interpolate <em>u</em> with perspective correction by first interpolating <em>1</em>/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> and <em>u</em>/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> linearly across screen space, and then dividing them per pixel.</p>
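<p>The per-pixel divide can be sketched as below. The function name and the example values are hypothetical; <code>t</code> holds the screen-space interpolation weights of the pixel with respect to the three vertices.</p>

```python
def perspective_correct(t, z_view, attrs):
    """Interpolate an attribute at screen-space weights t.

    z_view : view-space depths of the three vertices.
    attrs  : attribute value (e.g. u) at the three vertices.
    """
    # 1/z and u/z are both affine in screen space, so plain weights work.
    inv_z = sum(ti / zi for ti, zi in zip(t, z_view))
    a_over_z = sum(ti * ai / zi for ti, ai, zi in zip(t, attrs, z_view))
    return a_over_z / inv_z

# Edge from (-1, 0, 1) to (1, 0, 3) in view space, with u = 0 and u = 1 at
# its ends. The view-space midpoint (0, 0, 2) projects to x = 0, which is
# 75% of the way along the projected edge (from x = -1 to x = 1/3).
t = (0.25, 0.75, 0.0)
u_values = (0.0, 1.0, 0.0)
u = perspective_correct(t, z_view=(1.0, 3.0, 1.0), attrs=u_values)
naive = sum(ti * ai for ti, ai in zip(t, u_values))
```

<p>Here <code>u</code> comes out as 0.5, the value at the view-space midpoint, while the naive screen-space interpolation gives 0.75.</p>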
<p><span class="Apple-style-span" style="font-size: large;font-weight: bold">The last problem&#8230;</span></p>
</div>
<div>
<p>Now, we know that we can interpolate the view space depth and vertex attributes linearly across screen space. But during the rasterization stage, we only have vertices in homogeneous coordinates (the vertices have already been transformed by the projection matrix), so how can we get the <em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em> to do the perspective correct interpolation?</p>
<p>Consider the projection matrix (I use the D3D one, but the same idea applies to OpenGL):</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://4.bp.blogspot.com/-KuQCpcDZdw0/T4mD-DIDFKI/AAAAAAAAAWM/HAckGRg5sDg/s1600/proj.png"><img src="http://4.bp.blogspot.com/-KuQCpcDZdw0/T4mD-DIDFKI/AAAAAAAAAWM/HAckGRg5sDg/s200/proj.png" alt="" width="200" height="116" border="0" /></a></div>
<div class="separator" style="clear: both;text-align: -webkit-auto"></div>
<p>After transforming the vertex position, the <em>w</em>-coordinate will be the view space depth!</p>
<div style="text-align: center">i.e. <em><strong>w-</strong></em><em>homogeneous </em>=<em> </em><em><strong>z</strong><span class="Apple-style-span" style="font-size: xx-small">view </span></em></div>
<p>Looking at the matrix again and considering the transformed <em>z</em>-coordinate, it has the form:</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://2.bp.blogspot.com/-_GrTrCVKp-s/T4mEEDH6IFI/AAAAAAAAAWU/4V5nVfHFB68/s1600/zHomo.png"><img src="http://2.bp.blogspot.com/-_GrTrCVKp-s/T4mEEDH6IFI/AAAAAAAAAWU/4V5nVfHFB68/s1600/zHomo.png" alt="" border="0" /></a></div>
<p>After transforming to the NDC,</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://3.bp.blogspot.com/-dBLPwyegfKA/T4mEJ30EpMI/AAAAAAAAAWc/xHMvLbRdMwg/s1600/zNDC.png"><img src="http://3.bp.blogspot.com/-dBLPwyegfKA/T4mEJ30EpMI/AAAAAAAAAWc/xHMvLbRdMwg/s1600/zNDC.png" alt="" border="0" /></a></div>
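<p>Since <strong>z-</strong><em>NDC</em> is an affine function of 1/<em>z<span class="Apple-style-span" style="font-size: xx-small">view</span></em>, the depth value can be interpolated linearly in screen space using <strong>z-</strong><em>NDC</em> directly for the depth test.</p>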
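<p>These two observations can be checked numerically. The sketch below assumes the usual left-handed D3D perspective matrix, where the clip-space <em>z</em> is q * z_view - q * z_near with q = z_far / (z_far - z_near) and the clip-space <em>w</em> is z_view. The function names and the near and far values are hypothetical.</p>

```python
def project_d3d(v, x_scale, y_scale, z_near, z_far):
    """Apply a left-handed, D3D-style perspective projection to a
    view-space point and return clip coordinates (x, y, z, w)."""
    x, y, z = v
    q = z_far / (z_far - z_near)
    return (x * x_scale, y * y_scale, q * z - q * z_near, z)  # w equals z_view

def z_ndc(v, z_near, z_far):
    """NDC depth after the perspective divide: q - q * z_near / z_view."""
    clip = project_d3d(v, 1.0, 1.0, z_near, z_far)
    return clip[2] / clip[3]

# Edge from (-1, 0, 1) to (1, 0, 3). Its view-space midpoint (0, 0, 2)
# lands 75% of the way along the projected edge in screen space.
near, far = 0.5, 10.0
z0 = z_ndc((-1.0, 0.0, 1.0), near, far)
z1 = z_ndc((1.0, 0.0, 3.0), near, far)
z_mid_exact = z_ndc((0.0, 0.0, 2.0), near, far)
z_mid_lerp = 0.25 * z0 + 0.75 * z1  # plain linear interpolation in screen space
```

<p>The two midpoint depths agree up to rounding, so no per-pixel divide is needed when only the depth buffer is rasterized.</p>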
<p><strong><span class="Apple-style-span" style="font-size: large">Demo</span></strong></p>
<p>A JavaScript demo that rasterizes the triangles can be viewed <a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html#softwareRasterizerDemo">here</a> (although it is not optimized&#8230;). The source code can be downloaded <a href="https://sites.google.com/site/simontechblog/home/softwarerasterizer/softwareRasterizer.js">here</a>.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html#softwareRasterizerDemo"><img src="http://2.bp.blogspot.com/-riodEfJnzi4/T5gNhTswj8I/AAAAAAAAAWk/6mmx8r7Ri7I/s320/scrShot.png" alt="" width="320" height="232" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">Screen shot of the demo</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
This post described the steps to linearly interpolate vertex attributes in screen space. When rasterizing only the depth buffer (e.g. for occlusion culling), the depth value can be linearly interpolated directly from the z coordinate in NDC space, which is even simpler.</p>
<p><strong>References</strong><br />
[1] <a href="http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/">http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/</a><br />
[2] <a href="http://www.gamedev.net/topic/581732-perspective-correct-depth-interpolation/">http://www.gamedev.net/topic/581732-perspective-correct-depth-interpolation/</a><br />
[3] <a href="http://chrishecker.com/Miscellaneous_Technical_Articles">http://chrishecker.com/Miscellaneous_Technical_Articles</a><br />
[4] <a href="http://en.wikipedia.org/wiki/Affine_transformation">http://en.wikipedia.org/wiki/Affine_transformation</a></p>
<p>&nbsp;</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/29/software-rasterizer-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Rasterizer Part 1</title>
		<link>http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/</link>
		<comments>http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/#comments</comments>
		<pubDate>Sat, 14 Apr 2012 10:05:37 +0000</pubDate>
		<dc:creator>Simon Yeung</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software rasterizer]]></category>

		<guid isPermaLink="false">http://www.altdevblogaday.com/?p=25433</guid>
		<description><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
A software rasterizer can be used for occlusion culling; some games, such as <a href="http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3">Killzone 3</a>, use one to cull objects, so I decided to write one myself. The steps are to transform the vertices to homogeneous coordinates, clip the triangles to the viewport, and then fill the triangles with interpolated parameters. Note that clipping should be done in homogeneous coordinates before the perspective division; otherwise a lot of extra work is needed to clip the triangles properly. This post explains why.</p>
<p><a href="http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/" class="more-link">Read more on Software Rasterizer Part 1&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p><strong><span class="Apple-style-span" style="font-size: large">Introduction</span></strong><br />
A software rasterizer can be used for occlusion culling; some games, such as <a href="http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3">Killzone 3</a>, use one to cull objects, so I decided to write one myself. The steps are to transform the vertices to homogeneous coordinates, clip the triangles to the viewport, and then fill the triangles with interpolated parameters. Note that clipping should be done in homogeneous coordinates before the perspective division; otherwise a lot of extra work is needed to clip the triangles properly. This post explains why.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Points in Homogeneous coordinates</span></strong><br />
In the usual Cartesian coordinate system, any point in 3D space can be written as (<em>X</em>, <em>Y</em>, <em>Z</em>). In homogeneous coordinates, a redundant component <em>w</em> is added, giving the form (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>). Multiplying that 4-component vector by any non-zero constant still represents the same point in homogeneous coordinates. To convert a homogeneous point back to Cartesian coordinates, we scale it so that the <em>w</em> component equals one, i.e. we divide by <em>w</em>:</p>
<div style="text-align: center">(<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) -&gt; (<em>x/</em><em>w </em>, <em>y/</em><em>w </em>, <em>z/</em><em>w, 1</em>) -&gt; (<em>X</em>, <em>Y</em>, <em>Z</em>)</div>
<p>The following figure shows the <em>x</em>-<em>w</em> plane: a point (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) is transformed back to Cartesian coordinates (<em>X</em>, <em>Y</em>, <em>Z</em>) by projecting it onto the <em>w</em>=1 plane:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-dfDLBMv6VG8/T3-_4ZA_SYI/AAAAAAAAAT0/yiY9uJC2Skw/s1600/projectTow1.png"><img src="http://2.bp.blogspot.com/-dfDLBMv6VG8/T3-_4ZA_SYI/AAAAAAAAAT0/yiY9uJC2Skw/s320/projectTow1.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 1. projecting point to <em>w</em>=1 plane</td>
</tr>
</tbody>
</table>
<p>The interesting case is when the <em>w</em> component equals zero. Imagine the <em>w</em> component getting smaller and smaller, approaching zero: the coordinates of the point (<em>x/</em><em>w </em>, <em>y/</em><em>w </em>, <em>z/</em><em>w, 1</em>) get larger and larger. When <em>w</em> equals zero, the homogeneous point represents a point at infinity.</p>
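<p>The conversion can be sketched in a few lines of JavaScript. The function name <code>toCartesian</code> and the sample points are hypothetical, chosen only to illustrate the scale invariance and the behaviour as <em>w</em> approaches zero.</p>

```javascript
// Convert a homogeneous point [x, y, z, w] to Cartesian [X, Y, Z].
// Returns null for w === 0, which stands for a point at infinity.
function toCartesian([x, y, z, w]) {
  if (w === 0) return null;
  return [x / w, y / w, z / w];
}

const p = [2, 4, 6, 2];
const scaled = p.map((c) => c * -3); // any non-zero scale: still the same point

const a = toCartesian(p);      // [1, 2, 3]
const b = toCartesian(scaled); // [1, 2, 3]

// As w shrinks towards zero, the projected x coordinate keeps growing.
const growing = [1, 0.1, 0.001].map((w) => toCartesian([1, 0, 0, w])[0]);

console.log(a, b, growing, toCartesian([1, 0, 0, 0]));
```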
<p><strong><span class="Apple-style-span" style="font-size: large">Line Segments in Homogeneous coordinates</span></strong><br />
In homogeneous coordinates, we can still represent a line segment between two points P<span class="Apple-style-span" style="font-size: xx-small">0</span>= (<em>x</em><span class="Apple-style-span" style="font-size: xx-small">0</span>, <em>y</em><span class="Apple-style-span" style="font-size: xx-small">0</span>, <em>z</em><span class="Apple-style-span" style="font-size: xx-small">0</span>, <em>w</em><span class="Apple-style-span" style="font-size: xx-small">0</span>) and P<span class="Apple-style-span" style="font-size: xx-small">1</span>= (<em>x</em><span class="Apple-style-span" style="font-size: xx-small">1</span>, <em>y</em><span class="Apple-style-span" style="font-size: xx-small">1</span>, <em>z</em><span class="Apple-style-span" style="font-size: xx-small">1</span>, <em>w</em><span class="Apple-style-span" style="font-size: xx-small">1</span>) in parametric form:</p>
<div style="text-align: center">L= P<span class="Apple-style-span" style="font-size: xx-small">0</span> + t * (P<span class="Apple-style-span" style="font-size: xx-small">1</span>-P<span class="Apple-style-span" style="font-size: xx-small">0</span>),   <span class="Apple-style-span" style="font-size: x-small">where <em>t</em> is within [0, 1]</span></div>
<p>Projected onto the <em>w</em>=1 plane, this gives a line segment with the following shape:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-hDsxTEcslcM/T3_EqJVVdXI/AAAAAAAAAT8/KmSDSPQQUYg/s1600/internalLine.png"><img src="http://3.bp.blogspot.com/-hDsxTEcslcM/T3_EqJVVdXI/AAAAAAAAAT8/KmSDSPQQUYg/s320/internalLine.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 2. internal line segment</td>
</tr>
</tbody>
</table>
<p>In the above case, the line projected onto <em>w</em>=1 is called an internal line segment.<br />
But what if P<span class="Apple-style-span" style="font-size: xx-small">0</span> and P<span class="Apple-style-span" style="font-size: xx-small">1</span> have <em>w</em><span class="Apple-style-span" style="font-size: xx-small">0</span> &lt; 0 and <em>w</em><span class="Apple-style-span" style="font-size: xx-small">1</span> &gt; 0?</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-IS6paGPft8k/T3_GvjeA4uI/AAAAAAAAAUE/Ntx_8hcJ-BE/s1600/externalLine.png"><img src="http://4.bp.blogspot.com/-IS6paGPft8k/T3_GvjeA4uI/AAAAAAAAAUE/Ntx_8hcJ-BE/s320/externalLine.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 3. external line segment</td>
</tr>
</tbody>
</table>
<p>This case results in the figure above: an external line segment. The homogeneous line segment has the form L= P<span class="Apple-style-span" style="font-size: xx-small">0</span> + t * (P<span class="Apple-style-span" style="font-size: xx-small">1</span>-P<span class="Apple-style-span" style="font-size: xx-small">0</span>). As the parameter moves from <em>t</em>=0 to <em>t</em>=1, since <em>w</em><span class="Apple-style-span" style="font-size: xx-small">0</span> &lt; 0 and <em>w</em><span class="Apple-style-span" style="font-size: xx-small">1</span> &gt; 0, there exists a point on the homogeneous line where <em>w</em>=0. This point is at infinity when projected onto the <em>w</em>=1 plane, so the projected line segment joining P<span class="Apple-style-span" style="font-size: xx-small">0</span> and P<span class="Apple-style-span" style="font-size: xx-small">1</span> passes through the point at infinity, forming an external line segment.</p>
<p>The figure below shows how regions are mapped by the perspective projection followed by the division by <em>w</em>:</p>
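<p>To make the <em>w</em>=0 crossing concrete, here is a minimal sketch. The helper names <code>lerp4</code> and <code>wZeroCrossing</code> and the two sample endpoints are my own, for illustration only.</p>

```javascript
// Point on the homogeneous segment L = P0 + t * (P1 - P0).
function lerp4(p0, p1, t) {
  return p0.map((c, i) => c + t * (p1[i] - c));
}

// Parameter t at which w crosses zero, or null when w0 and w1 do not have
// opposite signs (the projected segment is then an internal one).
function wZeroCrossing(p0, p1) {
  const w0 = p0[3], w1 = p1[3];
  if (w0 * w1 >= 0) return null;
  return w0 / (w0 - w1);
}

const P0 = [-1, 0, 0, -1]; // behind the eye (w0 is negative)
const P1 = [1, 0, 0, 3];   // in front of the eye (w1 is positive)

const t = wZeroCrossing(P0, P1);     // 0.25
const atInfinity = lerp4(P0, P1, t); // [-0.5, 0, 0, 0]: w is zero here

// Just before and just after the crossing, x/w has a huge magnitude and
// opposite signs: the projected segment runs off to infinity and comes
// back from the other side.
const before = lerp4(P0, P1, t - 1e-6);
const after = lerp4(P0, P1, t + 1e-6);

console.log(t, atInfinity, before[0] / before[3], after[0] / after[3]);
```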
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-g1H9j5WDenc/T3_gSsZyoII/AAAAAAAAAUU/mtunCiIXOVQ/s1600/regionMapping.png"><img src="http://4.bp.blogspot.com/-g1H9j5WDenc/T3_gSsZyoII/AAAAAAAAAUU/mtunCiIXOVQ/s400/regionMapping.png" alt="" width="400" height="153" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 4. region mapping</td>
</tr>
</tbody>
</table>
<div style="margin: 0px">The blue line shows the viewing frustum. Nothing unusual happens to the region in front of the eye; the unusual part is the points behind the eye. After the perspective transformation and projection onto the <em>w</em>=1 plane, those points end up in front of the eye too. So a line segment with one end point in front of the eye and the other behind it is transformed into an external line segment by the perspective division.</div>
<div style="margin: 0px"><strong><br />
</strong></div>
<div style="margin: 0px"><strong><span class="Apple-style-span" style="font-size: large">Triangles in Homogeneous coordinates</span></strong></div>
<div style="margin: 0px">The last section showed that there are internal and external line segments after the perspective division; likewise, there are internal and external triangles. Internal triangles are the ones we usually see. An external triangle is always formed by 1 internal line segment and 2 external line segments:</div>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://4.bp.blogspot.com/-T9LhG5LGq9Q/T3_T528h-TI/AAAAAAAAAUM/bNTggLtACBg/s1600/externalTri.png"><img src="http://4.bp.blogspot.com/-T9LhG5LGq9Q/T3_T528h-TI/AAAAAAAAAUM/bNTggLtACBg/s320/externalTri.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 5. external triangle</td>
</tr>
</tbody>
</table>
<p>In the above figure, the shaded area represents the external triangle formed by the points P<span class="Apple-style-span" style="font-size: xx-small">0</span>, P<span class="Apple-style-span" style="font-size: xx-small">1</span> and P<span class="Apple-style-span" style="font-size: xx-small">2</span>. External triangles like this may appear after the perspective projection transform. This happens in the real world too:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-OlJYIeUl3cs/T4BbwEDiLoI/AAAAAAAAAUk/v9IdMFQoKTM/s1600/clipTriPhoto.JPG"><img src="http://1.bp.blogspot.com/-OlJYIeUl3cs/T4BbwEDiLoI/AAAAAAAAAUk/v9IdMFQoKTM/s320/clipTriPhoto.JPG" alt="" width="320" height="240" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">an external triangle in real world</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-S5jR7k5tLLU/T4Bb1Zp1OCI/AAAAAAAAAUs/ygOz9t773xE/s1600/clipTriPhotoFull.JPG"><img src="http://3.bp.blogspot.com/-S5jR7k5tLLU/T4Bb1Zp1OCI/AAAAAAAAAUs/ygOz9t773xE/s320/clipTriPhotoFull.JPG" alt="" width="240" height="320" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">the full triangle of the left photo</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>The left photo shows an external triangle with one of its vertices far behind the camera, while the right photo shows the full view of the triangle; the cross marks the position of the camera where the left photo was taken.</p>
<p><strong><span class="Apple-style-span" style="font-size: large">Triangle clipping</span></strong><br />
To avoid external triangles, lines and triangles should be clipped in homogeneous coordinates before dividing by the <em>w</em>-component. A homogeneous point (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) is tested against the following inequalities:</p>
<div style="text-align: center">(-<em>w </em>&lt;= <em>x </em>&lt;= <em>w</em>) &amp;&amp;   &#8212;&#8212; inequality. 1</div>
<div style="text-align: center">(-<em>w </em>&lt;= <em>y </em>&lt;= <em>w</em>) &amp;&amp;   &#8212;&#8212; inequality. 2</div>
<div style="text-align: center">(-<em>w </em>&lt;= <em>z </em>&lt;= <em>w</em> ) &amp;&amp;   &#8212;&#8212; inequality. 3<br />
<em>w </em>&gt; 0    &#8212;&#8212; inequality. 4</div>
<div>
<div style="text-align: center"></div>
<p>(For D3D, the <em>z</em> clipping inequality is 0 &lt;= <em>z</em> &lt;= <em>w</em>; it depends on how the normalized device coordinates are defined.) Clipping by inequalities 1, 2 and 3 effectively removes all points with <em>w</em> &lt; 0, because if <em>w</em> &lt; 0, say <em>w</em> = -3:</p>
<div style="text-align: center">3 &lt;= x &lt;= -3     =&gt;     3 &lt;= -3</div>
<div style="text-align: center"></div>
<div style="text-align: left">which is impossible. But the point (0, 0, 0, 0) still satisfies the first 3 inequalities and can form external cases, so inequality 4 is added. Consider a homogeneous line segment with one end at (0, 0, 0, 0); it is equal to:</div>
<div style="text-align: left"></div>
<div style="text-align: center">L= (0, 0, 0, 0) + t * [ (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) - (0, 0, 0, 0) ] = t * (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>)</div>
<p>which represents only a single point in homogeneous coordinates. So a triangle (after being clipped by inequalities 1, 2 and 3) with one or two vertices at <em>w</em>=0 degenerates into either a line or a point, which can be discarded. Hence, after clipping, no external triangles are produced when dividing by the <em>w</em>-component. Clipping a triangle against a plane yields 2 triangles when 1 vertex is outside the clipping plane (the clipped region is a quad) or 1 triangle when 2 vertices are outside:</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://3.bp.blogspot.com/-C4wcagL7YSQ/T3_4q_4-PQI/AAAAAAAAAUc/x3uXbV63UFU/s1600/clipInternalTri.png"><img src="http://3.bp.blogspot.com/-C4wcagL7YSQ/T3_4q_4-PQI/AAAAAAAAAUc/x3uXbV63UFU/s320/clipInternalTri.png" alt="" width="320" height="208" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">figure 6. clipping internal triangles</td>
</tr>
</tbody>
</table>
<p>Then the clipped triangles can be passed to the next stage to be rasterized either by a <a href="http://en.wikipedia.org/wiki/Scanline_rendering">scan line</a> algorithm or by a <a href="http://devmaster.net/forums/topic/1145-advanced-rasterization/">half-space</a> algorithm.</p>
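<p>As a rough sketch of the clipping step (not the code from my rasterizer; the function names here are hypothetical), the following clips a triangle against a single clip plane in homogeneous coordinates using the Sutherland-Hodgman approach and then splits the result into a triangle fan. The plane is given as a function that is linear in (<em>x</em>, <em>y</em>, <em>z</em>, <em>w</em>) and non-negative for points inside, so the intersection can be found by linear interpolation before any division by <em>w</em>. A full clipper would repeat this for every plane in the inequalities above.</p>

```javascript
// Distance-like value for the right-hand clip plane (x must not exceed w).
// Non-negative means inside. The left plane would be w + x, and so on.
const rightPlane = ([x, y, z, w]) => w - x;

// Clip a convex polygon (array of [x, y, z, w]) against one plane.
function clipPolygon(poly, dist) {
  const out = [];
  poly.forEach((cur, i) => {
    const next = poly[(i + 1) % poly.length];
    const dc = dist(cur), dn = dist(next);
    if (dc >= 0) out.push(cur); // keep vertices that are inside
    if (0 > dc * dn) {
      // The edge strictly crosses the plane: emit the intersection point.
      const t = dc / (dc - dn);
      out.push(cur.map((c, k) => c + t * (next[k] - c)));
    }
  });
  return out;
}

// Split the clipped convex polygon into a triangle fan.
function fan(poly) {
  const tris = [];
  for (let i = 1; poly.length - 1 > i; ++i) {
    tris.push([poly[0], poly[i], poly[i + 1]]);
  }
  return tris;
}

// One vertex outside: the clipped region is a quad, i.e. 2 triangles.
const oneOut = fan(clipPolygon([[0, 0, 0, 1], [2, 0, 0, 1], [0, 1, 0, 1]], rightPlane));
// Two vertices outside: the clipped region is a single triangle.
const twoOut = fan(clipPolygon([[0, 0, 0, 1], [2, 0, 0, 1], [2, 1, 0, 1]], rightPlane));

console.log(oneOut.length, twoOut.length);
```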
<p>Below is the clipping result of an external triangle with 1 vertex behind the camera.</p>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://1.bp.blogspot.com/-0sruXroYV4E/T4Bd8iwNoZI/AAAAAAAAAU8/N4mkoQLP6KY/s1600/sampleClipExtTri.png"><img src="http://1.bp.blogspot.com/-0sruXroYV4E/T4Bd8iwNoZI/AAAAAAAAAU8/N4mkoQLP6KY/s320/sampleClipExtTri.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">clipping external triangle in software rasterizer</td>
</tr>
</tbody>
</table>
<p>Below is another rasterized result:</p>
<table>
<tbody>
<tr>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-cQ3S0rT6wI0/T4Beace2MUI/AAAAAAAAAVE/EQYVbccwg5Q/s1600/sampleDuckFill.png"><img src="http://2.bp.blogspot.com/-cQ3S0rT6wI0/T4Beace2MUI/AAAAAAAAAVE/EQYVbccwg5Q/s320/sampleDuckFill.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">rasterized duck model</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="tr-caption-container" style="margin-left: auto;margin-right: auto;text-align: center" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="text-align: center"><a href="http://2.bp.blogspot.com/-gfqOLmsJe1o/T4Beejh4EgI/AAAAAAAAAVM/h7a0yanEe3o/s1600/sampleDuckRef.png"><img src="http://2.bp.blogspot.com/-gfqOLmsJe1o/T4Beejh4EgI/AAAAAAAAAVM/h7a0yanEe3o/s320/sampleDuckRef.png" alt="" width="320" height="180" border="0" /></a></td>
</tr>
<tr>
<td class="tr-caption" style="text-align: center">reference of the duck model</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><strong><span class="Apple-style-span" style="font-size: large">Conclusion</span></strong><br />
This post explained the maths behind triangle clipping. Clipping should be done before projecting the homogeneous points onto the <em>w</em>=1 plane, to avoid special handling for external triangles. In the next post, I will talk about perspective interpolation and provide the source code (written in JavaScript, drawing to an HTML canvas).</p>
<p>Lastly, special thanks to Fabian Giesen for giving feedback on the draft of this post.</p>
<p><strong>References</strong><br />
[1] <a href="http://research.microsoft.com/pubs/73937/p245-blinn.pdf">http://research.microsoft.com/pubs/73937/p245-blinn.pdf</a><br />
[2] <a href="http://medialab.di.unipi.it/web/IUM/Waterloo/node51.html">http://medialab.di.unipi.it/web/IUM/Waterloo/node51.html</a><br />
[3] <a href="http://kriscg.blogspot.com/2010/09/software-occlusion-culling.html">http://kriscg.blogspot.com/2010/09/software-occlusion-culling.html</a><br />
[4] <a href="http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3">http://www.slideshare.net/guerrillagames/practical-occlusion-culling-in-killzone-3</a><br />
[5] <a href="http://www.slideshare.net/repii/parallel-graphics-in-frostbite-current-future-siggraph-2009-1860503">http://www.slideshare.net/repii/parallel-graphics-in-frostbite-current-future-siggraph-2009-1860503</a><br />
[6] <a href="http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/">http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/</a><br />
[7] <a href="http://devmaster.net/forums/topic/1145-advanced-rasterization/">http://devmaster.net/forums/topic/1145-advanced-rasterization/</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/14/software-rasterizer-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oxel: A Tool for Occluder Generation</title>
		<link>http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/</link>
		<comments>http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 23:30:27 +0000</pubDate>
		<dc:creator>Nick Darnell</dc:creator>
				<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[Hierarchical Z-Buffer]]></category>
		<category><![CDATA[Occlusion Volumes]]></category>
		<category><![CDATA[Rendering]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=25415</guid>
		<description><![CDATA[<p>It’s been a while since I’ve posted an update on my research into occluder generation.  If you need a refresher, take a look at:</p>
<ul>
<li><a href="http://www.nickdarnell.com/2011/06/hierarchical-z-buffer-occlusion-culling-generating-occlusion-volumes/" target="_blank">Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes</a></li>
</ul>
<p><a href="http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/" class="more-link">Read more on Oxel: A Tool for Occluder Generation&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>It’s been a while since I’ve posted an update on my research into occluder generation.  If you need a refresher, take a look at:</p>
<ul>
<li><a href="http://www.nickdarnell.com/2011/06/hierarchical-z-buffer-occlusion-culling-generating-occlusion-volumes/" target="_blank">Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes</a></li>
<li><a href="http://www.nickdarnell.com/2011/09/robust-inside-and-outside-solid-voxelization/" target="_blank">Robust Inside and Outside Solid Voxelization</a></li>
</ul>
<p>Let’s start with the newest and most important information&#8230;</p>
<h3>Update 4/13/2012</h3>
<p>The project is now hosted on BitBucket: <a href="http://bitbucket.org/NickDarnell/oxel/overview" target="_blank">http://bitbucket.org/NickDarnell/oxel/overview</a>. Feel free to contribute or just follow along :)</p>
<h3>It’s a Tool Now!</h3>
<p><img style="margin: 5px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/oxel_v1.png" alt="oxel_v1" width="590" height="592" border="0" /></p>
<h4>Download</h4>
<p><a href="http://bitbucket.org/NickDarnell/oxel/downloads" target="_blank">Oxel 1.0.0 &#8211; Win32.zip</a></p>
<h4>Requirements</h4>
<ul>
<li><a href="http://www.microsoft.com/download/en/details.aspx?id=17718" target="_blank">.Net 4.0</a></li>
<li><a href="http://www.microsoft.com/download/en/details.aspx?id=5555" target="_blank">VS2010 C++ Runtime</a></li>
</ul>
<h4>Description</h4>
<p>Oxel is a tool for generating occluders – primarily for use with the Hierarchical Z-Buffer method of occlusion culling.  Open an .obj file then go to Build &gt; Voxelize to generate the proxy.  Try it out, let me know what you think.</p>
<p>Some further improvements have been made over the method laid out in the original <a href="http://www.nickdarnell.com/2011/06/hierarchical-z-buffer-occlusion-culling-generating-occlusion-volumes/" target="_blank">Generating Occlusion Volumes</a> post:</p>
<ul>
<li>Retriangulation</li>
<li>Evaluating Occlusion-ness</li>
<li>Filtering Polygons</li>
</ul>
<h3>Retriangulation</h3>
<p>I went back to the CSG article <a href="http://sandervanrossen.blogspot.com/">Sander van Rossen</a> and Matthew Baranowski wrote.  I noticed near the end that they recommended retriangulating the final CSG mesh because their library doesn’t handle it.</p>
<p>The easiest way I found to do the retriangulation was to use David Eberly’s Wild Magic math library, which can do it; <a href="http://www.geometrictools.com/SampleMathematics/Triangulation/Triangulation.html" target="_blank">see here</a>.  Before performing the retriangulation you need to collect all the external/internal edge loops on each surface and then merge collinear edges.</p>
<p>It was definitely worth doing: the retriangulation saved about 30% of the triangles.</p>
<h3>Evaluating Occlusion-ness</h3>
<p>When you’re generating occluder geometry you need to ensure that the volumes you’re adding are useful enough to pay the additional polygon tax they will incur.</p>
<p>Oxel measures this by rendering the original visual mesh into the stencil buffer and then using the stencil buffer to clip a full-screen quad, while a “samples passed” hardware query records the number of pixels written to the color buffer.  This is repeated from about 64 different camera angles; the sum of all the samples defines the <strong>ground truth</strong> occlusion of the mesh.</p>
<p>With that information we can evaluate every new box we plan to add to the final occlusion proxy geometry and ask: does this increase the coverage enough to be worth it?  The default threshold is 3%.  If a new volume does not cover at least 3% of new silhouette pixels, we don’t include it in the final occluder mesh.</p>
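<p>The acceptance test can be sketched as below. This is not Oxel’s actual code: the function name and parameters are hypothetical, and it assumes the 3% is measured as newly covered pixels relative to the ground truth pixel count, with all counts summed over the same set of camera angles.</p>

```javascript
// Hypothetical helper: decide whether a candidate box is worth adding.
//   groundTruthPixels: summed "samples passed" count for the visual mesh
//   coveredBefore:     summed count for the occluder proxy without the box
//   coveredAfter:      summed count for the occluder proxy with the box
// Assumption: the threshold is a fraction of the ground truth count.
function isWorthAdding(groundTruthPixels, coveredBefore, coveredAfter, threshold = 0.03) {
  const gain = (coveredAfter - coveredBefore) / groundTruthPixels;
  return gain >= threshold;
}

console.log(isWorthAdding(10000, 5000, 5400)); // 4% gain
console.log(isWorthAdding(10000, 5000, 5100)); // 1% gain
```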
<h3>Filtering Polygons</h3>
<p>Not every game needs the occluders to be visible from all angles.  For example, most racing games will never have the cars jumping so high that they see the top of a building.  In those circumstances you may not want the occluders to bother with polygons on top.</p>
<table width="596" border="0" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top" width="298"><a href="http://altdevblogaday.com/wp-content/uploads/2012/04/with.png"><img style="margin: 1px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/with_thumb.png" alt="with" width="275" height="168" border="0" /></a></td>
<td valign="top" width="296"><a href="http://altdevblogaday.com/wp-content/uploads/2012/04/without.png"><img style="margin: 1px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/without_thumb.png" alt="without" width="275" height="168" border="0" /></a></td>
</tr>
</tbody>
</table>
<p>So the tool offers the ability to remove Top and Bottom polygons from the final occluder mesh after everything has finished being processed.</p>
<p>If you decide to filter out all top polygons they’re all removed.  For bottom surfaces this has to be handled slightly differently, see below -</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2012/04/bottom_removed.png"><img style="margin: 5px;padding-left: 0px;padding-right: 0px;padding-top: 0px;border-width: 0px" src="http://altdevblogaday.com/wp-content/uploads/2012/04/bottom_removed_thumb.png" alt="bottom_removed" width="580" height="331" border="0" /></a></p>
<p>Imagine a bridge structure or a large overhang on a building.  You wouldn’t want the bottom portion of the occluder to be removed from the overhang, since it would allow seeing into and through the occluder, because presumably you wouldn’t have double sided rendering enabled.  So to distinguish between bottom surfaces not meant to ever be seen by the player, and bottom surfaces that may be important to higher up portions of the structure, only bottom surfaces within 1 voxel’s height from the base of the mesh are removed.  (See picture above)</p>
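<p>A minimal sketch of that rule follows. The polygon record layout (<code>normal</code>, <code>maxY</code>), the y-up convention and the function name are assumptions made for illustration, not Oxel’s data structures.</p>

```javascript
// Hypothetical polygon record: { normal: [nx, ny, nz], maxY }, with y up.
// Removes downward-facing polygons whose highest point lies within one
// voxel height of the base of the mesh; everything else is kept.
function filterBottomPolygons(polygons, baseY, voxelHeight) {
  return polygons.filter((poly) => {
    const facesDown = 0 > poly.normal[1];
    const nearBase = baseY + voxelHeight >= poly.maxY;
    return !facesDown || !nearBase;
  });
}

const polygons = [
  { name: "ground contact", normal: [0, -1, 0], maxY: 0 },    // removed
  { name: "overhang underside", normal: [0, -1, 0], maxY: 5 }, // kept
  { name: "low ledge top", normal: [0, 1, 0], maxY: 0.5 },     // kept
];
const kept = filterBottomPolygons(polygons, 0, 1);

console.log(kept.map((p) => p.name));
```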
<h3>Work Continues</h3>
<p>I’ve got some more ideas I need to test out but I wanted to go ahead and cut a version of where I’m at and let others take a look and give me some feedback.</p>
<p>One area I need to investigate next is a better way to generate the boxes.  Currently they are just expanded in all directions equally, but that’s not ideal.  I’m thinking about trying a parallel brute force method that would test all possible boxes at a specific origin point to find the best shaped box to generate from that point.</p>
<p><a href="http://www.nickdarnell.com/2012/04/oxel-a-tool-for-occluder-generation/" target="_blank">Oxel: A Tool for Occluder Generation @ nickdarnell.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2012/04/06/oxel-a-tool-for-occluder-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trip Report: Gamefest 2011 – Seattle</title>
		<link>http://www.altdevblogaday.com/2011/09/09/trip-report-gamefest-2011-%e2%80%93-seattle/</link>
		<comments>http://www.altdevblogaday.com/2011/09/09/trip-report-gamefest-2011-%e2%80%93-seattle/#comments</comments>
		<pubDate>Fri, 09 Sep 2011 19:00:57 +0000</pubDate>
		<dc:creator>Nick Darnell</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[General Interest]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[DirectX 11]]></category>
		<category><![CDATA[Gamefest]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[Kinect]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=16175</guid>
		<description><![CDATA[<p>I managed to make it out to Seattle this year for Gamefest and figured I&#8217;d share my thoughts on some of the different presentations I saw. They are not available yet, but it looks like Microsoft is going to be posting the slides/audio for the different presentations <a href="http://www.microsoftgamefest.com/seattle_conferencedetails.htm" target="_blank">here</a> soon.</p>
<p><a href="http://www.altdevblogaday.com/2011/09/09/trip-report-gamefest-2011-%e2%80%93-seattle/" class="more-link">Read more on Trip Report: Gamefest 2011 – Seattle&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I managed to make it out to Seattle this year for Gamefest and figured I&#8217;d share my thoughts on some of the different presentations I saw. They are not available yet, but it looks like Microsoft is going to be posting the slides/audio for the different presentations <a href="http://www.microsoftgamefest.com/seattle_conferencedetails.htm" target="_blank">here</a> soon.</p>
<h3>Tiled Resources for Xbox 360 and Direct3D 11 &#8211; Matt Lee</h3>
<p>This talk was about mega-texturing in DirectX 11/Xbox 360. Matt Lee was showing a new DirectX SDK sample that&#8217;s coming in the next SDK release giving a reference implementation of a mega-texturing run-time. I&#8217;ve only skimmed mega-texturing papers so I got a lot out of this talk since he walked through all the steps in the run-time.</p>
<p>The sample begins by creating tile pools for the different resource formats, with each pool dedicated to a single texture format. The tiles within a pool are all the same size; however, tile dimensions may vary between pools depending on the texture format, to maximize cache efficiency. When you render the scene, the shader writes out a list of texture look-up failures: whenever the requested UV coordinates and mip level are not resident in memory, a failure is added to the list. After the shader completes you read back the failures and load as many of the missing tiles as will fit in your established pools.</p>
<p>Unlike most texture streaming systems you&#8217;re not loading an entire mip level or the entire mip chain of the texture. You&#8217;re only ever loading a sub-region of a texture (like a 64&#215;64 pixel region) into the tiles, which overcomes one common texture streaming problem: texture memory fragmentation. Because the tile pools you create are never deallocated, you don&#8217;t have to worry about textures of different sizes being streamed in and out and fragmenting your texture memory.</p>
<p>Now the sample is not without its shortcomings, but that is mostly due to hardware limitations. Ideally the virtual texture system would be transparent; you wouldn&#8217;t need to write a shader that recorded look-up failures. The GPU and DirectX would simply report when a failure occurred and allow you to handle it. Maybe some day&#8230;</p>
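<p>To make the feedback loop concrete, here is a minimal CPU-side sketch in Python of a fixed-size tile pool fed by a list of look-up failures. It is not the SDK sample: the names <code>TilePool</code> and <code>process_feedback</code>, the tile key layout, and the least-recently-used eviction policy are all assumptions made for illustration.</p>

```python
from collections import OrderedDict


class TilePool:
    """Fixed-capacity pool of same-sized tiles. The pool itself is never
    resized or freed, so streaming tiles in and out cannot fragment it."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps (texture, mip, tile_x, tile_y) -> slot index, oldest first.
        self.resident = OrderedDict()
        self.free_slots = list(range(capacity))

    def load(self, key):
        """Make the tile resident and return the slot that holds it."""
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as recently used
            return self.resident[key]
        if self.free_slots:
            slot = self.free_slots.pop()
        else:
            # Assumed policy: evict the least recently used tile.
            _, slot = self.resident.popitem(last=False)
        self.resident[key] = slot
        return slot


def process_feedback(pool, failures):
    """Load every tile the shader reported as missing, ignoring duplicates."""
    for key in dict.fromkeys(failures):
        pool.load(key)
```

<p>A real run-time would also copy texel data into the slot and update an indirection table for the shader; both are left out of this sketch.</p>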
<h3>Gesture Detection Using Machine Learning &#8211; Claude Marais</h3>
<p>If you have ever been interested in machine learning this is a worthwhile presentation to check out when the slides are posted. Claude Marais talked about a case study they performed to try and use machine learning to detect a Punch and a Kick. For their experiment they used <a href="http://en.wikipedia.org/wiki/AdaBoost" target="_blank">Adaboost</a> which is a machine learning technique that combines thousands of weak classifiers that &#8216;boost&#8217; each other and provide you with a high degree of accuracy in the results.</p>
<p>The classifiers are all extremely simple things, for example you may have a classifier like:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>elbow_joint_angle <span style="color: #000080;">&gt;</span> ANGLE<span style="color: #008000;">&#41;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">return</span> <span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008080;">;</span></pre></td></tr></table></div>

<p>Then simply create a macro and generate 180 variants of this classifier, one for each ANGLE. If you imagine all the different things you could measure about the skeleton, creating simple variants for each possible test quickly explodes the number of weak classifiers; Claude had around 21,000 weak classifiers in his system.</p>
<p>The training phase looks at labeled data sets to learn what punches look like (positive examples) and what -not- punches look like (negative examples). It uses the +1/-1 scores each weak classifier provides to determine the weight to apply to each classifier. Once it has determined which weak classifiers are best at detecting a punch without mistaking a negative example for one, you can use those classifiers at run-time, with the weights applied, to detect a punch.</p>
<p>The results were undeniable; they had a demo set up in the expo area that was really good at detecting a punch and kick.</p>
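<p>As a rough illustration of that training loop, here is a minimal AdaBoost sketch in Python that uses threshold tests as the weak classifiers. It is a textbook version under simplifying assumptions, not the system from the talk: the function names, the single-feature samples and the tiny data set in the usage note are all made up for the example.</p>

```python
import math


def make_stumps(num_features, thresholds):
    """One weak classifier per (feature, threshold) pair, like the ANGLE variants."""
    return [(f, t) for f in range(num_features) for t in thresholds]


def stump_predict(stump, sample):
    feature, threshold = stump
    return 1 if sample[feature] > threshold else -1


def train_adaboost(samples, labels, stumps, rounds):
    """Return a list of (weight, stump) pairs chosen by AdaBoost."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the weak classifier with the lowest weighted error.
        best, best_err = None, None
        for stump in stumps:
            err = sum(w for w, s, y in zip(weights, samples, labels)
                      if stump_predict(stump, s) != y)
            if best_err is None or err < best_err:
                best, best_err = stump, err
        # Clamp so a perfect classifier does not produce an infinite weight.
        err = min(max(best_err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(best, s))
                   for w, s, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, sample):
    score = sum(alpha * stump_predict(stump, sample) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

<p>For example, with samples <code>[[10], [20], [100], [150]]</code> (a hypothetical elbow angle per frame), labels <code>[-1, -1, 1, 1]</code> and stumps from <code>make_stumps(1, range(0, 180, 30))</code>, a few rounds of training are enough for <code>classify</code> to separate the two groups.</p>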
<p><del>The only real drawback to this solution is the data collection; they needed something on the order of 70,000 examples of punches and 7x that in negative not a punch examples before the training produced accuracies over 90-95% from the chart they had; if my memory is correct.</del></p>
<p>In training the system they had 70,000 frames worth of recorded training data. The actual number of recorded punches used to train the system was 25 different people doing 10 punches, so around 250 punch examples. Then they had about 7x that number in negative training examples, which might be things like waves, or other actions that SVM can use to differentiate between random movement and an intentional punch. (Thanks to Claude for clarifying this)</p>
<h3>Kinect and Kids: Pitfalls and Pleasantries &#8211; Deborah Hendersen</h3>
<p>If you had asked me to make a Kinect game for kids (ages 3-6) before seeing this presentation I likely would&#8217;ve designed something with a dumb-me as the target audience. What I quickly realized is how wrong I would&#8217;ve been to make that assumption. At that stage of development kids are not capable of interacting with games I&#8217;m used to playing.</p>
<p>Something as simple as a menu of options is an impossibility since they are illiterate. How many games have you seen that you could play without knowing how to read?</p>
<p>When interacting with an onscreen character, the kids ignore social norms of waiting for the person to finish talking. They may just jump the gun if they already know what is expected and get frustrated if they can&#8217;t do it when they want to.</p>
<p>Kids are distracted very easily and will make their own games out of game behavior. Deborah mentioned one story where a kid stopped playing the game because he realized that when he left the play area and Kinect could no longer detect him, the game would react. So he made up his own game of jumping in and out of the play area to trigger this condition; utterly boring for adults, completely entertaining for this kid.</p>
<p>You almost have to design the game as a passive experience, like a children&#8217;s TV show. Because TV offers no feedback, the host asks the kid, &#8220;Can you find _______?&#8221;, the kid at home says something, and the show, expecting this, simply pauses while the host waits for the response. The game has to function in essentially the same way: regardless of whether the kid participates in the expected fashion, the game has to move forward. If it functions like a state machine that requires proper actions to move forward, the kid may become bored and simply want to move on. If the game refuses to let them move on, they&#8217;ll just walk away.</p>
<p>I really enjoyed this presentation because it made very clear how difficult the problem space is, and it was interesting to hear how they tried to solve each problem.</p>
<h3>Kinect Hands: Finger Tracking and Voxel UI &#8211; Abdulwajid Mohamed and Tony Ambrus</h3>
<p>This presentation was broken into two completely different parts; the first part was on finger tracking with Kinect. This is one area I&#8217;ve been playing around in for a while, so it was interesting to see someone else&#8217;s attempt to solve the problem. Because the Kinect is a structured light depth camera, you don&#8217;t necessarily have depth at each pixel like you would on a time-of-flight depth camera. A structured light camera builds a topology of depth from the light pattern it projects into the scene: viewed from a different angle the pattern reveals depth, but a single dot does not give you depth, so the camera connects groups of dots when determining the depth of a surface. This means that even though your hand can be seen by Kinect, the further you back away from the sensor, the more like a mitten your hand becomes. The gaps between your fingers disappear until they are just clumps on your wrist.</p>
<p>Because of this limitation you can&#8217;t go past 10 feet; there simply isn&#8217;t enough data. Ideally the user is at 6 feet or closer; past 6 feet the accuracy begins to break down.</p>
<p>The way Microsoft tackled the problem was to first capture lots of hand examples and then to train an SVM (Support Vector Machine) against a curvature analysis of the hands. So once you know all the pixels that make up a person&#8217;s hand you find the points on the hand that result in the largest changes in curvature. On an open hand these curves are your fingers and if you&#8217;re close enough to the camera that it can see the gaps between fingers it&#8217;s a very large change in curvature. A closed hand has more or less a uniform curvature change viewed from any angle. By training the SVM against a set of closed hand curvature examples vs. open hand curvature examples they were able to get pretty accurate results at about the 6-8 foot range for an adult, 5-7 feet for kids.</p>
<p>Because the detector is instantaneous, i.e. it can tell you in a single frame whether the hand is open or closed, you need something to counteract one or two misinterpreted frames. So they trained an HMM (Hidden Markov Model) on examples of a flaky transition where the system is quickly switching between 2 states because the hand is at an odd orientation confusing the SVM; I thought it was an interesting solution to the problem. I&#8217;ve only ever tried something simple like requiring 3 contiguous frames of agreement to have a state change.</p>
<p>The second half of the presentation was on a 3D (not stereoscopic) UI for Kinect. One of the problems with navigating a &#8216;push to click&#8217; interface is that it&#8217;s hard to correct for user drift. When a user pushes forward they may do one of several things:</p>
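<p>For comparison, the simple alternative mentioned at the end of that paragraph (requiring a few contiguous frames of agreement before changing state) might look like the following Python sketch. This is not the HMM approach from the talk, and the class name and state labels are made up for illustration.</p>

```python
class DebouncedState:
    """Only switch state after `required_frames` consecutive frames agree."""

    def __init__(self, initial, required_frames=3):
        self.state = initial
        self.required = required_frames
        self.candidate = initial
        self.count = 0

    def update(self, observed):
        """Feed one per-frame detection and return the filtered state."""
        if observed == self.state:
            # The detector agrees with the current state: drop any pending change.
            self.candidate, self.count = self.state, 0
        elif observed == self.candidate:
            self.count += 1
        else:
            # A new candidate state starts its own run of agreeing frames.
            self.candidate, self.count = observed, 1
        if self.count >= self.required:
            self.state, self.count = self.candidate, 0
        return self.state
```

<p>A single stray frame is ignored, at the cost of a few frames of latency on every genuine transition.</p>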
<ul>
<li>Push toward the TV</li>
<li>Push toward the sensor</li>
<li>Push forward (wherever forward happens to be at that moment in time)</li>
</ul>
<p>Depending upon what you&#8217;re expecting them to do there&#8217;s going to be drift away from the thing on the screen they are trying to click. To attempt to correct this Abdulwajid presented a UI where the hands are visualized as voxelized clumps of boxes in a 3D environment with 3D buttons that could be mashed. Seeing the hand in the same space as the button appeared to make it much easier to perform the click.</p>
<p>One thing I noticed that was not called out was his use of 2 directional shadow casting lights. By having 2 directional lights facing each other both casting shadows, the resulting effect is a focal point. As the hand gets closer to a surface the eye perceives the two shadows heading towards each other and can see the point where they will meet. I thought that was an additional powerful indicator of where your hand was moving in the space and made it much easier to correct drift.</p>
<p><a href="http://www.nickdarnell.com/2011/09/trip-report-gamefest-2011-seattle/" target="_blank">Trip Report: Gamefest 2011 – Seattle @ nickdarnell.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/09/09/trip-report-gamefest-2011-%e2%80%93-seattle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An interesting vertex shader trick</title>
		<link>http://www.altdevblogaday.com/2011/08/08/interesting-vertex-shader-trick/</link>
		<comments>http://www.altdevblogaday.com/2011/08/08/interesting-vertex-shader-trick/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 05:03:33 +0000</pubDate>
		<dc:creator>Cort Stratton</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=13831</guid>
		<description><![CDATA[<p>I almost let today slip by without posting; instead, here&#8217;s a cute hack.</p>
<h3>Ode to an Unsung Hero</h3>
<p>Much has been written (<a HREF="http://altdevblogaday.com/2011/07/19/obligatory-fxaa-post/">on #AltDevBlogADay</a> and <a HREF="http://www.realtimerendering.com/blog/fxaa-rules-ok/">elsewhere</a>) about <a HREF="http://timothylottes.blogspot.com/">Timothy Lottes</a>&#8216; Fast Approximate Anti-Aliasing algorithm (FXAA). Naturally, most of the attention focuses on the pixel shader, but what about the poor neglected vertex shader?  Sure, it does almost nothing, but it does it so <em>elegantly</em>; let&#8217;s give it some love! Here&#8217;s the code (slightly simplified and reformatted):</p>
<p><a href="http://www.altdevblogaday.com/2011/08/08/interesting-vertex-shader-trick/" class="more-link">Read more on An interesting vertex shader trick&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I almost let today slip by without posting; instead, here&#8217;s a cute hack.</p>
<p><H3>Ode to an Unsung Hero</H3><br />
Much has been written (<A HREF="http://altdevblogaday.com/2011/07/19/obligatory-fxaa-post/">on #AltDevBlogADay</A> and <A HREF="http://www.realtimerendering.com/blog/fxaa-rules-ok/">elsewhere</A>) about <A HREF="http://timothylottes.blogspot.com/">Timothy Lottes</A>&#8216; Fast Approximate Anti-Aliasing algorithm (FXAA). Naturally, most of the attention focuses on the pixel shader, but what about the poor neglected vertex shader?  Sure, it does almost nothing, but it does it so <EM>elegantly</EM>; let&#8217;s give it some love! Here&#8217;s the code (slightly simplified and reformatted):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> VS_Output
<span style="color: #009900;">&#123;</span>  
    float4 Pos <span style="color: #339933;">:</span> SV_POSITION<span style="color: #339933;">;</span>              
    float2 Tex <span style="color: #339933;">:</span> TEXCOORD0<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
VS_Output VS<span style="color: #009900;">&#40;</span>uint id <span style="color: #339933;">:</span> SV_VertexID<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    VS_Output Output<span style="color: #339933;">;</span>
    Output.<span style="color: #202020;">Tex</span> <span style="color: #339933;">=</span> float2<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>id <span style="color: #339933;">&lt;&lt;</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> id <span style="color: #339933;">&amp;</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Output.<span style="color: #202020;">Pos</span> <span style="color: #339933;">=</span> float4<span style="color: #009900;">&#40;</span>Output.<span style="color: #202020;">Tex</span> <span style="color: #339933;">*</span> float2<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">2</span><span style="color: #339933;">,-</span><span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> float2<span style="color: #009900;">&#40;</span><span style="color: #339933;">-</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> Output<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>What does it do? Well, let&#8217;s pretend we&#8217;ve bound the shader and are using it to draw a single triangle:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">pImmediateContext<span style="color: #339933;">-&gt;</span>VSSetShader<span style="color: #009900;">&#40;</span>fxaaVS<span style="color: #339933;">,</span> ...<span style="color: #339933;">,</span> ...<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
pImmediateContext<span style="color: #339933;">-&gt;</span>IASetPrimitiveTopology<span style="color: #009900;">&#40;</span>D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
pImmediateContext<span style="color: #339933;">-&gt;</span>Draw<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>The shader&#8217;s only input is the vertex ID, a system-generated value that starts at zero and increases for every new vertex. So, the shader will be invoked three times with IDs 0, 1, and 2. This produces the following output:<br />
<code>ID=0 -&gt; Pos=[-1,&nbsp;1], Tex=[0,0]<br />
ID=1 -&gt; Pos=[&nbsp;3,&nbsp;1], Tex=[2,0]<br />
ID=2 -&gt; Pos=[-1,-3], Tex=[0,2]</code><br />
If we clip the resulting triangle to homogeneous clip space (-1..1 along each axis), we see that it just barely fills the XY plane, and that the texture coordinates range from [0,0] in the upper-left corner to [1,1] in the lower-right. Aha, a full-screen quad!<br />
<a href="http://altdevblogaday.com/wp-content/uploads/2011/08/triangle.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/08/triangle.png" alt="" width="518" height="404" class="size-full wp-image-13834" /></a><br />
So what? Full-screen quads are easy! What makes this one interesting is that the shader is completely self-contained, with <EM>zero</EM> dependencies on external data. No vertex buffer, no index buffer, no constant buffers, no transformation matrices, and no unusual render state. Just bind the shader, draw a single triangle using auto-generated indices, and you&#8217;re done. It doesn&#8217;t even rely on any fancy HLSL-specific language features, so you could easily concoct a similar shader in Cg/GLSL/etc. Neat! It&#8217;s almost as easy as the old immediate-mode days, before every draw call had to wrangle multiple buffers in GPU-mapped memory.</p>
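<p>The shader arithmetic is easy to check on the CPU. The Python sketch below mirrors the two assignments in the vertex shader for vertex IDs 0, 1 and 2; the function name <code>fullscreen_vertex</code> is made up for this illustration.</p>

```python
def fullscreen_vertex(vertex_id):
    """CPU mirror of the vertex shader: returns (pos, tex) for one vertex ID."""
    # Tex = float2((id << 1) & 2, id & 2)
    tex = (float((vertex_id << 1) & 2), float(vertex_id & 2))
    # Pos = float4(Tex * float2(2, -2) + float2(-1, 1), 0, 1)
    pos = (tex[0] * 2.0 - 1.0, tex[1] * -2.0 + 1.0, 0.0, 1.0)
    return pos, tex


for vertex_id in range(3):
    pos, tex = fullscreen_vertex(vertex_id)
    print(vertex_id, pos[:2], tex)
```

<p>Vertex 0 lands on the upper-left corner of clip space, and the other two vertices overshoot the right and bottom edges by a full screen, which is why a single triangle is enough to cover the viewport.</p>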
<p><H3>Digging (Arguably) Too Much Deeper</H3><br />
Clean though it may be, the shader is missing one very useful feature: all but the simplest full-screen shaders need the viewport resolution (or more specifically, the inverse resolution) so that neighboring texels can be sampled. These values could very easily be hard-coded or passed as a shader constant and then stashed into the Z and W coordinates of the output UVs, but in doing so you&#8217;d tarnish an otherwise self-sufficient piece of code. Is there another way?</p>
<p>Well, the vertex ID is a full 32-bit integer, and we&#8217;re only using the lowest two bits. The D3D11 Draw() function allows you to specify the ID of the first vertex to draw. Since we&#8217;re already not using the ID to index into memory, why not pack the viewport&#8217;s width and height into the high 30 bits?</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">pImmediateContext<span style="color: #339933;">-&gt;</span>Draw<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">3</span><span style="color: #339933;">,</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>viewWidth<span style="color: #339933;">&amp;</span><span style="color: #208080;">0x7FFF</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">&lt;&lt;</span><span style="color: #0000dd;">17</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">|</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>viewHeight<span style="color: #339933;">&amp;</span><span style="color: #208080;">0x7FFF</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">&lt;&lt;</span><span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">|</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>Unfortunately, this doesn&#8217;t work. No matter what starting index you pass, the SV_VertexID you see in HLSL always starts at zero. There&#8217;s probably a damned good reason for this, but for now it&#8217;s ruined my day. There&#8217;s also the possibility that you&#8217;ll run into an over-eager graphics driver that will try to pre-fetch some ridiculously far-away vertex even though you&#8217;re not using it, and probably end up triggering a segmentation fault.</p>
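<p>For reference, the bit layout that the Draw() call was aiming for can be sketched in Python as follows: 15 bits of width, 15 bits of height, and the low two bits left for the corner index. The function names are made up, and this only illustrates the packing itself; it does not change the fact that the packed value never reaches the shader.</p>

```python
def pack_start_vertex(width, height):
    """Pack two 15-bit dimensions above the two low bits used for the corner."""
    return ((width & 0x7FFF) << 17) | ((height & 0x7FFF) << 2)


def unpack_vertex_id(vertex_id):
    """Inverse of the packing: returns (width, height, corner)."""
    corner = vertex_id & 3
    height = (vertex_id >> 2) & 0x7FFF
    width = (vertex_id >> 17) & 0x7FFF
    return width, height, corner
```

<p>Fifteen bits per axis caps each dimension at 32767, which is comfortably larger than any viewport of the time.</p>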
<p>There <EM>is</EM> another option: in the pixel shader, we could use the ddx()/ddy() functions to compute the screen-space partial derivative of the incoming texture coordinates. Since our UVs range from 0 to 1 across the screen, the partial derivative in each axis is exactly the inverse viewport resolution! But now you&#8217;re doing an extra bit of completely redundant ALU work in every single invocation of the pixel shader; on one hand, ALU is cheap (and getting cheaper), but most likely this is still too steep a price to pay for a bit of extra elegance and CPU-side simplicity.</p>
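<p>The claim about the derivatives is simple arithmetic: if U runs from 0 to 1 over <code>width</code> pixels, neighboring pixels differ by 1/width in U. The Python sketch below emulates that with finite differences between adjacent pixel centers. It is a CPU stand-in for what ddx()/ddy() report, not shader code, and the function names are assumptions.</p>

```python
def uv_at_pixel(x, y, width, height):
    """Interpolated UV at the center of pixel (x, y) for a full-screen pass."""
    return ((x + 0.5) / width, (y + 0.5) / height)


def uv_derivatives(x, y, width, height):
    """Finite differences to the right and downward neighbors."""
    u0, v0 = uv_at_pixel(x, y, width, height)
    u1, _ = uv_at_pixel(x + 1, y, width, height)
    _, v1 = uv_at_pixel(x, y + 1, width, height)
    return (u1 - u0, v1 - v0)
```

<p>For a 1280&#215;720 viewport this gives roughly (1/1280, 1/720) at every pixel, which is the inverse resolution a full-screen shader needs to step to neighboring texels.</p>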
<p>[<B>Update:</B> As readers have pointed out in the comments, HLSL Shader Model 4+ provides a mechanism for querying the dimensions of a texture (via the GetDimensions() method), as well as applying an integer offset to texture reads (via the optional offset parameter to Sample()). In many full-screen passes, the source texture has the same dimensions as the destination viewport; if so, you're golden!]</p>
<p>That&#8217;s it; nothing earth-shattering, just a clever little nugget.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/08/08/interesting-vertex-shader-trick/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obligatory FXAA Post</title>
		<link>http://www.altdevblogaday.com/2011/07/19/obligatory-fxaa-post/</link>
		<comments>http://www.altdevblogaday.com/2011/07/19/obligatory-fxaa-post/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 01:13:50 +0000</pubDate>
		<dc:creator>Jon Moore</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[anti-aliasing]]></category>
		<category><![CDATA[FXAA]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=11529</guid>
		<description><![CDATA[<p>I did a quick search through AltDev and I don&#8217;t think anyone else has really talked about FXAA on AltDevBlogADay yet, but the time is long overdue. I&#8217;m a little late to the game trying out <a href="http://twitter.com/#!/TimothyLottes" target="_blank">Timothy Lottes</a>&#8216; post-process anti-aliasing technique, but I had been noticing people saying great things about it when I finally decided to give it a serious look over 4th of July weekend. I had already been thinking about writing about my experiences with it when I saw <a href="http://twitter.com/#!/pointinpolygon" target="_blank">Eric Haines</a> had posted up a <a href="http://www.realtimerendering.com/blog/fxaa-rules-ok/" target="_blank">glowing review of it over on realtimerendering.com</a>. So if you don&#8217;t trust the judgement of some intern that doesn&#8217;t even have a college degree completed, please refer to all the more qualified people that are having similar experiences to me.</p>
<p><a href="http://www.altdevblogaday.com/2011/07/19/obligatory-fxaa-post/" class="more-link">Read more on Obligatory FXAA Post&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>I did a quick search through AltDev and I don&#8217;t think anyone else has really talked about FXAA on AltDevBlogADay yet, but the time is long overdue. I&#8217;m a little late to the game trying out <a href="http://twitter.com/#!/TimothyLottes" target="_blank">Timothy Lottes</a>&#8216; post-process anti-aliasing technique, but I had been noticing people saying great things about it when I finally decided to give it a serious look over 4th of July weekend. I had already been thinking about writing about my experiences with it when I saw <a href="http://twitter.com/#!/pointinpolygon" target="_blank">Eric Haines</a> had posted up a <a href="http://www.realtimerendering.com/blog/fxaa-rules-ok/" target="_blank">glowing review of it over on realtimerendering.com</a>. So if you don&#8217;t trust the judgement of some intern that doesn&#8217;t even have a college degree completed, please refer to all the more qualified people that are having similar experiences to me.</p>
<p><strong>The Problem at Hand</strong></p>
<p>For those of you that might be reading this and haven&#8217;t drunk the graphics programming Kool-Aid, I&#8217;m going to fill you in a little bit as to why we care about anti-aliasing.</p>
<p>In our line of work, the end result of a rendered scene is typically a 2-dimensional array of colors that is displayed on the user&#8217;s monitor. Just like how audio (which is originally analog), is quantized when it is made digital, the image is built into a discrete number of pixels during the process of rasterization (i.e. filling in each triangle). The measure of how many pixels are used to display the image is of course known as the resolution, which is why things become increasingly blocky as you lower a game&#8217;s resolution, due to there being fewer dots per inch.</p>
<p>Even at high resolutions, individual pixels can still often be identified by the viewer, because the lines and edges of triangles form distracting &#8220;jaggies&#8221;. This is noticeable along the outline of an object being displayed due to rasterization, but can also occur on interior surfaces for other reasons such as shadows and texture map resolution. The user typically picks the resolution their monitors are set at, but most console games provide 720p or 1080p images to the TV, and monitors tend to follow suit (my laptop is set to ~720p). Other than relying on hardware to pack more pixels into the same physical area by increasing the dots per inch (which is something that Apple is claiming with its retina displays, but check out <a href="http://filmicgames.com/archives/698" target="_blank">this post for an interesting look at that</a>), we have to find ways to smooth the transition between pixels. Perhaps the simplest strategy for dealing with aliasing is to render into a texture twice the target resolution in each dimension and then downsample it to the actual output resolution. This allows you to take the average of every four pixels, softening the boundaries by averaging the edge colors together. However, this also just straight up sucks for performance. You end up processing 4x as many fragments from the higher resolution, and you have 4x the memory usage for the buffer. This is way too high a cost to pay for some smooth edges.</p>
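<p>The downsampling step can be sketched in a few lines of Python. This is a plain box filter over each 2&#215;2 block of a grayscale image stored as nested lists; the function name and the image representation are assumptions for illustration, and a real implementation would do this on the GPU.</p>

```python
def downsample_2x(image):
    """Average each 2x2 block of a grayscale image (even width and height)."""
    result = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            total = (image[y][x] + image[y][x + 1] +
                     image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4.0)
        result.append(row)
    return result
```

<p>A hard diagonal edge in the high-resolution image comes out with intermediate values along the boundary, which is exactly the softening described above, paid for with four times the shading work and memory.</p>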
<p><strong>The Hardware Option</strong></p>
<p>There is a hardware accelerated method, MSAA (Multi-Sample Anti-Aliasing). This operates by computing additional samples when rendering the frame instead of rendering additional pixels and downsampling. The fragment shader is only run once for each group of samples. The problem is that it doesn&#8217;t work with the increasingly popular deferred rendering techniques, because you can&#8217;t really take more samples of a buffer you&#8217;ve already rendered. In short, the damage is already done; the data has been discretized to a particular resolution. Furthermore, I&#8217;ve always found that MSAA is still pretty expensive (but then again, the only way to make your rendering take 0 ms is to quit doing rendering and switch to a job in finance).</p>
<p><strong>New Maps of AA-land</strong></p>
<p>The desire to use deferred rendering has pushed alternate forms of anti-aliasing to get a lot of attention (I mean who doesn&#8217;t need more acronyms, right?). Perhaps the most prevalent that you may have heard of is MLAA, but most of the techniques that I&#8217;m referring to here are post-processing techniques that rely on detecting and softening edges. By doing anti-aliasing as a post-process, it will work seamlessly with deferred rendering, and pretty much anything else for that matter. If the technique only needs access to the color buffer, then it can even be applied to video/screenshots/whatever of existing games, which I think researchers in this area have found to be a great method of showing off their work.</p>
<p>MLAA was originally a CPU based technique developed by Intel that has since been adapted to run on the GPU. This has been outlined in GPU Pro 2, Game Developer Magazine, and around the net, but if you&#8217;re not familiar with it I&#8217;ll make a few brief points about it. It essentially boils down to storing edges in a texture with different colors indicating which side of the pixel the edge is located at, and an additional buffer is used to calculate the blending weights for the blurring.</p>
<p>Impressively, they found results falling somewhere between 4x and 8x MSAA, while being 11x faster than 8x MSAA. The memory footprint of the technique is 1.5x or 2x the size of the back buffer depending on the hardware (2x for the Xbox 360 for all you console devs who probably have already heard everything I&#8217;m saying). That&#8217;s pretty good for something that solves the problems encountered with deferred rendering at the same time. This is undoubtedly why the technique has garnered so much attention, and I really recommend the article in GPU Pro 2 if you want a clear view of all of the details.</p>
<p><strong>Enter: FXAA</strong></p>
<p>As you may know from reading other posts, my big side-project/hobby/thing is that when I give a new technique a try, I do it in Unity because it <strong>a)</strong> is usually not straightforward and requires actually understanding the technique to get it working and <strong>b)</strong> can be evaluated using the many projects I&#8217;ve done in Unity previously. From looking over the details of MLAA, I knew it would probably take a full weekend to get it right, and I had been procrastinating quite a bit with getting around to doing it.</p>
<p>When the third iteration of FXAA rolled out I decided to take a look at what it entailed. I knew in the back of my mind that FXAA was a strictly luminosity based technique, which is interesting to me. The authors of MLAA recommend using depth to determine edges for best results and performance. However, luminosity based techniques offer the advantage/disadvantage of smoothing boundaries that exist in places other than depth, such as with aliasing on texture maps and with shadows. The downside is that this can produce results that are too blurry in places you don&#8217;t want blur, such as with text on a prop in 3D space. I once tried a very, very simple luminosity based AA filter, that resulted in too many cons (especially with blurry text) for me to seriously use. I was curious if FXAA would give me similar problems.</p>
<p>With the intention of just looking over the code briefly, I suddenly found myself staring at a very simple and easy to use code base offering a ton of well explained pre-processor options for target platform and quality. Being around midnight when I started looking, I quickly decided that porting the higher-quality PC version of the HLSL code to CG/Unity would be fun. There were two steps involved:</p>
<p><strong>1)</strong> At the end of all the other post-processing, calculate Luminosity and slam it into the alpha channel (super easy to do).<br />
<strong>2)</strong> Perform the FXAA pass. Porting mostly involved fixing texture look-up syntax.</p>
<p>No extra buffers involved. This cuts down on the extra memory needed for MLAA, and the code seemed simple enough that it would probably be pretty fast. I ported all the code within 2 hours and then went to sleep. The next morning I finished setting it up for use in Unity&#8230; and was blown away by the speed and the results. Here&#8217;s a breakdown of what I got running in the Unity editor at ~720p, using Dust, a previous project of mine, to test it. These are PNGs cropped down at native resolution:</p>
<p><strong>Shot 1: No Anti-aliasing</strong></p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/SailNoAA.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/SailNoAA.png" alt="" width="440" height="320" class="aligncenter size-full wp-image-11650" /></a></p>
<p><strong>Shot 2: FXAA3</strong></p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/SailFXAA3.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/SailFXAA3.png" alt="" width="440" height="320" class="aligncenter size-full wp-image-11656" /></a></p>
<p><strong>Shot 3: 6x MSAA</strong></p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/Sail6xAA.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/Sail6xAA.png" alt="" width="440" height="320" class="aligncenter size-full wp-image-11657" /></a></p>
<p>It takes FXAA3 only ~1 ms [Note: corrected from an erroneous order of magnitude typo when I first posted this article] on my laptop (MacBook Pro with an NVIDIA GeForce 320M) to be completed (including the luminosity calculation), and as I mentioned, no additional memory either. I&#8217;ll pay that millisecond any day of the week for that quality of anti-aliasing. Furthermore, the blurriness on text was much more acceptable than the fast blur I had tried previously. Note that this text is not really meant to be read, but rather recognized to match the voice over so the player can understand that the voice is that of the journal&#8217;s author. I wish I could credit the sources where I put together the fast blur from, but it&#8217;s been more than a few months since I implemented/ditched it. Here&#8217;s a comparison:</p>
<p><strong>Shot 1: No Anti-aliasing</strong></p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/TextNoAA.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/TextNoAA.png" alt="" width="440" height="320" class="aligncenter size-full wp-image-11658" /></a></p>
<p><strong>Shot 2: FXAA3</strong></p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/TextFXAA3.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/TextFXAA3.png" alt="" width="440" height="320" class="aligncenter size-full wp-image-11659" /></a></p>
<p><strong>Shot 3: Fast Blur</strong></p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/TextFastAA.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/TextFastAA.png" alt="" width="440" height="320" class="aligncenter size-full wp-image-11660" /></a></p>
<p>This is getting a bit more to the personal opinion end of things, but even though FXAA3 does blur the text, I feel like the fast blur makes it almost uncomfortable to look at. I get the sensation that I&#8217;m getting tested for a new prescription of glasses, and have to recognize out of focus letters. To me, that seems like a pretty good indication that your luminosity based AA technique isn&#8217;t up to snuff if you get that type of blurring. FXAA3 on the other hand seems acceptable enough that it&#8217;ll definitely be enabled in future builds of Dust that get pushed up onto my website.</p>
<p><strong>What did we learn?</strong></p>
<p>Hopefully we learned that FXAA is both fast and doesn&#8217;t require additional memory, and is trivially simple to implement or port. There are versions for the 360, PS3, and PC, as well as HLSL and GLSL variants. It&#8217;s a luminosity based solution, and comes with the associated pros/cons those techniques have, but I&#8217;ve found FXAA to minimize the cons. It should literally take you at most a few hours to get it up and running in your game and evaluate if it&#8217;s a good fit for what you&#8217;re doing. Furthermore, it is in the public domain (credit to the post by Eric Haines for this twitter snippet):</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/FXAAPublicDomain.jpg"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/FXAAPublicDomain.jpg" alt="" width="539" height="272" class="aligncenter size-full wp-image-11826" /></a></p>
<p>Although at this point, I&#8217;d say that given the amount of quality you&#8217;re getting for the effort of implementation, it might as well be under <a href="http://en.wikipedia.org/wiki/Beerware" target="_blank">the beer license</a>. Finally, I would point out that Timothy Lottes is still improving on the code, and has already released version 3.9 in the time since I first touched it. You can find the latest source links and updates on his blog: <a href="http://timothylottes.blogspot.com/" target="_blank">http://timothylottes.blogspot.com/</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/07/19/obligatory-fxaa-post/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Threading and Your Game Loop</title>
		<link>http://www.altdevblogaday.com/2011/07/03/threading-and-your-game-loop/</link>
		<comments>http://www.altdevblogaday.com/2011/07/03/threading-and-your-game-loop/#comments</comments>
		<pubDate>Sun, 03 Jul 2011 09:42:30 +0000</pubDate>
		<dc:creator>Kevin Gadd</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.com/?p=10207</guid>
		<description><![CDATA[<p>Most game programmers are familiar with the typical &#8216;game loop&#8217;. It&#8217;s usually broken up into two key stages, and looks like this:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/01.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/01.png" alt="" width="500" height="22" class="alignnone size-full wp-image-10211" /></a></p>
<p>From a performance perspective, this is suboptimal, to say the least. We can never utilize multiple cores with this design, and we will be lucky to utilize 100% of a single core. In particular, vertical sync has a way of introducing long, useless pauses into our UI thread, like this:</p>
<p><a href="http://www.altdevblogaday.com/2011/07/03/threading-and-your-game-loop/" class="more-link">Read more on Threading and Your Game Loop&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>Most game programmers are familiar with the typical &#8216;game loop&#8217;. It&#8217;s usually broken up into two key stages, and looks like this:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/01.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/01.png" alt="" width="500" height="22" class="alignnone size-full wp-image-10211" /></a></p>
<p>From a performance perspective, this is suboptimal, to say the least. We can never utilize multiple cores with this design, and we will be lucky to utilize 100% of a single core. In particular, vertical sync has a way of introducing long, useless pauses into our UI thread, like this:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/02.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/02.png" alt="" width="690" height="25" class="alignnone size-full wp-image-10212" /></a></p>
<p>The pedants among you may note that the GPU can cheat and help us out by moving that &#8216;wait for vertical sync&#8217; pause to the beginning of the next redraw. This is true, but the problem is that we are still stuck executing serially on one thread. We have to be extremely lucky to hit 100% utilization on that CPU, and we&#8217;ll never utilize other CPUs.<br />
Depending on the kind of game you&#8217;re building, you may be able to get some easy wins here &#8211; for example, let&#8217;s say you have enough objects that you can update them on two threads:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/03.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/03.png" alt="" width="690" height="49" class="alignnone size-full wp-image-10213" /></a></p>
<p>This is better. Now, the update work we&#8217;re doing can utilize two cores. Depending on your design, you can easily move up to 4 or 8 cores for your update code. You&#8217;ll notice that during a redraw, these threads have to sit idle, because there&#8217;s nothing else for them to do &#8211; the UI thread is using all the game state in order to draw, so no updating can occur.<br />
There&#8217;s a common solution for this sometimes known as &#8216;double buffered multithreading&#8217;, where you create two &#8216;buffers&#8217; that hold copies of your game state. It looks something like this:</p>
<h2>Double-Buffered Multithreading</h2>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/04.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/04.png" alt="" width="690" height="35" class="alignnone size-full wp-image-10214" /></a></p>
<p><i>(In these diagrams, a red arrow indicates that one thread asks another to perform work, and a blue arrow indicates that one thread wakes another thread up from a waiting state.)</i></p>
<p>We now have two distinct threads of execution that synchronize at specific points.  In the example you can see red arrows indicating where the update thread tells the render thread to do work. Because we write to alternating buffers, the render thread can take as much time with Buffer A as it wants, and we can use that time to fill Buffer B. We only have to stop and wait when we need to update Buffer A again, at which point we need to be sure the render thread is done with it.</p>
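<p>A minimal C++ sketch of this handoff, assuming a toy <code>GameState</code> that holds only a tick counter (both <code>GameState</code> and <code>runDoubleBuffered</code> are hypothetical names for illustration). It spawns a thread per frame to keep the example short; a real engine would keep one long-lived render thread and signal it instead.</p>

```cpp
#include <thread>
#include <vector>

// Hypothetical game state for this sketch: just a tick counter.
struct GameState { int tick = 0; };

// Runs `frames` iterations. Each frame, the render thread reads one buffer
// while the update thread (here, the caller) fills the other.
// Returns the tick values the renderer saw, in order.
std::vector<int> runDoubleBuffered(int frames) {
    GameState buffers[2];
    std::vector<int> rendered;
    int readIndex = 0;  // buffer the renderer may read this frame
    for (int frame = 1; frame <= frames; ++frame) {
        const GameState& front = buffers[readIndex];
        GameState& back = buffers[1 - readIndex];

        // "Red arrow": ask the render thread to draw the finished buffer.
        std::thread render([&rendered, &front] { rendered.push_back(front.tick); });

        // Meanwhile, update the other buffer; the two threads never share state.
        back.tick = frame;

        // "Blue arrow": wait for the renderer before its buffer is reused.
        render.join();
        readIndex = 1 - readIndex;
    }
    return rendered;
}
```

<p>Calling <code>runDoubleBuffered(3)</code> renders ticks 0, 1 and 2: the renderer always draws the state produced by the previous update, which is the one-frame latency this model trades for parallelism.</p>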
<p>We can utilize multiple hardware threads to handle our update and render logic in this design &#8211; the &#8216;update thread&#8217; in this example could be four threads working on update logic in parallel. We can do something similar with the render thread, though the threading restrictions in Direct3D and OpenGL mean that eventually we become single-threaded again.<br />
The &#8216;double buffered&#8217; model is pretty easy to understand. I&#8217;m now going to describe an alternate model that has some distinct advantages. I&#8217;m going to call this the &#8216;batched model&#8217;:</p>
<h2>Batched Multithreading</h2>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/05.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/05.png" alt="" width="690" height="52" class="alignnone size-full wp-image-10215" /></a></p>
<p>The concept behind this model is that each update is turned into a &#8216;batch&#8217; full of rendering commands that can be sent to the video card. A batch contains all the information necessary to draw a frame, but does not contain any other game state. This makes it smaller than a full copy of the game state. Building a batch containing your rendering commands is actually quite simple, because it&#8217;s what already happens behind the scenes: Direct3D/OpenGL calls turn into batches of native rendering commands that are sent to the GPU.</p>
<p>At first glance, this model might seem worse. It&#8217;s certainly more complicated. Your game loop has become a pipeline, with three stages: Update, Prepare, and Render. In practice, however, this model works quite well. One of the key advantages is that the pipelines are independent. Preparing could take a long time, for example:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/06.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/06.png" alt="" width="690" height="52" class="alignnone size-full wp-image-10216" /></a></p>
<p>In fact, updating, preparing, and drawing can all take a long time, and things look pretty good:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/07.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/07.png" alt="" width="690" height="52" class="alignnone size-full wp-image-10217" /></a></p>
<p>With this model, each of your pipelines being independent means that correctly written code will allow each pipeline to utilize a full CPU, as long as the pipeline&#8217;s got work to do.</p>
<h3>Performance Advantages</h3>
<p>Even better, pipelines can run on multiple cores. Here is where the value of the &#8216;prepare&#8217; stage comes in. In a normal single-threaded model, drawing a group of sprites would look like this:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/08.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/08.png" alt="" width="690" height="18" class="alignnone size-full wp-image-10218" /></a></p>
<p>If you&#8217;re observant, you&#8217;ll note that the only real input to the &#8216;build vertices&#8217; stage is the location of the sprite (along with information that changes infrequently, like the size of the sprite). As a result, we can defer the &#8216;build vertices&#8217; stage until later, or even do it on another thread.  Even better, this results in each thread doing the same work repeatedly. When a thread is doing the same work over and over, the CPU&#8217;s instruction and data caches are better utilized, improving performance.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/09.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/09.png" alt="" width="690" height="52" class="alignnone size-full wp-image-10219" /></a></p>
<p>This model also provides some other natural advantages. For example, a common performance issue when writing Direct3D code is that changing states too frequently is expensive. If you change rendering states every time you draw an object, you pay a tremendous price for doing so and the GPU isn&#8217;t able to operate at peak performance. Using this pipeline model, you can build up a long list of drawing commands, and then sort them by rendering state &#8211; enabling you to minimize the number of state changes. The performance benefits from this can be tremendous.</p>
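<p>To make the sorting idea concrete, here is a small C++ sketch. The <code>DrawCommand</code> layout and the integer IDs are assumptions for illustration; a real batch would carry whatever state your renderer actually switches on.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw command: the IDs stand in for real shader/texture objects.
struct DrawCommand {
    int shaderId;
    int textureId;
    int objectId;
};

// Sort so that commands sharing a shader (then a texture) are adjacent.
// stable_sort keeps the submission order within each state group.
void sortByState(std::vector<DrawCommand>& batch) {
    std::stable_sort(batch.begin(), batch.end(),
        [](const DrawCommand& a, const DrawCommand& b) {
            if (a.shaderId != b.shaderId) return a.shaderId < b.shaderId;
            return a.textureId < b.textureId;
        });
}

// Count how many times the (shader, texture) pair changes while walking the
// batch, including the initial state set for the first command.
int countStateChanges(const std::vector<DrawCommand>& batch) {
    int changes = 0;
    for (std::size_t i = 0; i < batch.size(); ++i) {
        if (i == 0 || batch[i].shaderId != batch[i - 1].shaderId ||
                      batch[i].textureId != batch[i - 1].textureId) {
            ++changes;
        }
    }
    return changes;
}
```

<p>A batch that alternates between two shaders needs a state change on every draw before sorting and only two afterwards. Keep in mind that reordering is only safe where draw order doesn&#8217;t affect the result; blended geometry usually has to keep its order.</p>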
<h3>Scaling Out Further</h3>
<p>Speaking of GPUs, this model is also useful for taking advantage of the GPU to perform computation. In the sprite example above, you can trivially move the hard work from the &#8216;build vertices&#8217; stage of the pipeline into a vertex shader running on the GPU. After doing so, things would look like this:</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/10.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/10.png" alt="" width="690" height="69" class="alignnone size-full wp-image-10220" /></a></p>
<p>Furthermore, because each pipeline only relies on its inputs, you can take those inputs and split them into pieces and run the pipeline on multiple threads without synchronization. As long as the pipeline doesn&#8217;t hand its outputs to another thread before they&#8217;re finished, you don&#8217;t need locks and you don&#8217;t have to wait for other threads. Each pipeline pulls its inputs off a queue, processes them to produce its outputs, and shoves those outputs onto another queue. Most modern CPUs and GPUs are built in a similar pipelined fashion.</p>
<p><a href="http://altdevblogaday.com/wp-content/uploads/2011/07/11.png"><img src="http://altdevblogaday.com/wp-content/uploads/2011/07/11.png" alt="" width="690" height="69" class="alignnone size-full wp-image-10221" /></a></p>
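<p>Here is a hedged C++ sketch of splitting one stage&#8217;s inputs across threads, using the &#8216;build vertices&#8217; stage as the example. <code>Sprite</code>, <code>Vertex</code> and both function names are hypothetical; the point is only that each thread writes to its own slice of the output, so no locks are needed.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical types for this sketch.
struct Sprite { float x, y, w, h; };
struct Vertex { float x, y; };

// 'Build vertices' for sprites [begin, end): four corners per sprite.
// Each call writes only to its own slice of `out`.
void buildVertices(const std::vector<Sprite>& sprites, std::vector<Vertex>& out,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        const Sprite& s = sprites[i];
        out[i * 4 + 0] = {s.x,       s.y};
        out[i * 4 + 1] = {s.x + s.w, s.y};
        out[i * 4 + 2] = {s.x + s.w, s.y + s.h};
        out[i * 4 + 3] = {s.x,       s.y + s.h};
    }
}

// Split the input into contiguous chunks and run the stage on each in parallel.
std::vector<Vertex> buildVerticesParallel(const std::vector<Sprite>& sprites,
                                          std::size_t threadCount) {
    std::vector<Vertex> out(sprites.size() * 4);  // sized up front, never resized
    if (threadCount == 0) threadCount = 1;
    const std::size_t chunk = (sprites.size() + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < sprites.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, sprites.size());
        workers.emplace_back(buildVertices, std::cref(sprites), std::ref(out),
                             begin, end);
    }
    // Outputs are handed to the next stage only once every worker is finished.
    for (std::thread& t : workers) t.join();
    return out;
}
```

<p>The only synchronization point is the final join, which matches the rule above: a stage must not hand its outputs on before they are finished.</p>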
<p>I hope this post has given you some ideas about how to scale your game up to make better use of multiple cores!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/07/03/threading-and-your-game-loop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Your Argument Is Invalid: Improving OpenGL error messages</title>
		<link>http://www.altdevblogaday.com/2011/06/23/improving-opengl-error-messages/</link>
		<comments>http://www.altdevblogaday.com/2011/06/23/improving-opengl-error-messages/#comments</comments>
		<pubDate>Thu, 23 Jun 2011 17:30:48 +0000</pubDate>
		<dc:creator>Cort Stratton</dc:creator>
				<category><![CDATA[#gamedev]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://altdevblogaday.org/?p=9346</guid>
		<description><![CDATA[<p>For  my first #AltDevBlogADay post, I’d like to cover a technique that makes  debugging OpenGL code quite a bit less painful: the  <a href="http://www.opengl.org/registry/specs/ARB/debug_output.txt">GL_ARB_debug_output</a> extension. For one thing, this extension significantly improves the  quality of error messages generated by the OpenGL runtime.  Perhaps more  importantly, it allows errors to be reported immediately by the  errant API call via a user-provided callback.  It&#8217;s not the sexiest feature in the world; it won&#8217;t make your render loop faster or your screenshots prettier. Nevertheless, I&#8217;ve found it has drastically reduced the time I spend debugging, which means more time for the fun stuff!</p>
<p><a href="http://www.altdevblogaday.com/2011/06/23/improving-opengl-error-messages/" class="more-link">Read more on Your Argument Is Invalid: Improving OpenGL error messages&#8230;</a></p>
]]></description>
				<content:encoded><![CDATA[<p>For  my first #AltDevBlogADay post, I’d like to cover a technique that makes  debugging OpenGL code quite a bit less painful: the  <a href="http://www.opengl.org/registry/specs/ARB/debug_output.txt">GL_ARB_debug_output</a> extension. For one thing, this extension significantly improves the  quality of error messages generated by the OpenGL runtime.  Perhaps more  importantly, it allows errors to be reported immediately by the  errant API call via a user-provided callback.  It&#8217;s not the sexiest feature in the world; it won&#8217;t make your render loop faster or your screenshots prettier. Nevertheless, I&#8217;ve found it has drastically reduced the time I spend debugging, which means more time for the fun stuff!</p>
<p>For example, let&#8217;s consider a common OpenGL error:</p>
<pre>glEnable(GL_DEPTH); // BAD!</pre>
<p>Of course, you and I know this is a bug (GL_DEPTH is not a valid argument to glEnable — perhaps you meant glEnable(GL_DEPTH_TEST) or glDepthMask(GL_TRUE)?). By default, this call will quietly fail, and the scene will not render correctly. If you call glGetError() at some point later, you will learn that a GL_INVALID_ENUM error occurred (&#8230;somewhere&#8230;), but that’s it. No indication of which function triggered the error, or which parameter was invalid.  Weak!</p>
<p>With the shiny new debug output extension, something like the following message will appear immediately upon executing this misguided glEnable() call:</p>
<pre>OpenGL: glEnable parameter has an invalid enum '0x1801' (GL_INVALID_ENUM)
[category=API_ERROR severity=HIGH id=1001]</pre>
<p>Even better, you can put a breakpoint in your error-handling callback. When an error occurs, you&#8217;ll drop immediately into the debugger with a full call stack leading right back to the problematic API call.  Keen!</p>
<p>Unfortunately, integrating this new functionality can be a bit of a headache.  Driver support for this extension is still far from universal, and getting it working involves messing with some  platform-specific windowing-system code that you probably haven’t looked  at in years (if ever). On the other hand, the only code you need to  modify is in the initialization stage of your application &#8212; the  boilerplate stuff that gets pasted unmodified into every new project.  You can write this code once and enjoy it for the rest of your OpenGL-using days. So if you’ve ever experienced the sheer unadulterated joy of sprinkling dozens of calls to glGetError() throughout your  codebase to track down the source of an oh-so-helpful GL_INVALID_OPERATION error,  please read on!</p>
<div style="text-align: center"><img class="alignnone size-full wp-image-9356" src="http://altdevblogaday.com/wp-content/uploads/2011/06/cat_watermelon.jpg" alt="" width="300" height="291" /></div>
<p style="text-align: center" dir="ltr">If I had my way, every GL_INVALID_ENUM error would be accompanied by this image.</p>
<p><span id="more-9346"></span></p>
<h3>Step 1: Is the extension available?</h3>
<p>Before you can use the new debug output features, your drivers need to support them. For Windows and Linux developers, NVIDIA added support in August 2010 (driver version 259.31). AMD added preliminary support in May 2011 (driver version 11.5a), with full support expected in a few months&#8217; time. However, AMD has their own vendor-specific <a href="http://www.opengl.org/registry/specs/AMD/debug_output.txt">GL_AMD_debug_output</a> extension which offers near-identical functionality with a very similar interface. On certain other white, hermetically-sealed platforms which shall remain nameless, your mileage may vary. Fortunately, the absence of superior debug output is hardly a fatal error; if the extension is missing, your app can just soldier on as if nothing had happened.</p>
<p>The  topic of querying support for OpenGL extensions and setting up the  necessary function pointers could warrant a whole article by itself. The short version is that you should  use a library such as <a href="http://glew.sourceforge.net/">GLEW</a> to handle the gruntwork for you.  In the code below, I assume that GLEW has already been initialized.</p>
<h3>Step 2: Create a debug OpenGL context</h3>
<p>Time  to dust off some cobwebs!  Even if you’ve been working with OpenGL for  many years, it’s entirely possible you’ve never given any thought at all  to creating an OpenGL context (the object through which all future  OpenGL calls interact with your graphics hardware, behind the scenes).   Prior to OpenGL 3.0, there wasn’t much reason to care.  The process is  tedious, platform-specific, and completely independent of anything else  the application is doing &#8212; in other words, exactly the sort of code you want to embed in some larger engine or framework  and never touch again.</p>
<p>When  OpenGL 3.0 introduced a deprecation model for certain older/obsolete  functionality, the context creation process became slightly more  involved. You can now ask for a context which implements a particular  version of the OpenGL specification, and can also request that the  context should be “core” (strictly non-deprecated functionality only) or “compatibility”  (deprecated features supported where possible).  This additional  flexibility is exposed in a series of platform-specific extensions, such  as <a href="http://www.opengl.org/registry/specs/ARB/wgl_create_context.txt">WGL_ARB_create_context</a> (for Windows) and <a href="http://www.opengl.org/registry/specs/ARB/glx_create_context.txt">GLX_ARB_create_context</a> (for Linux).  These extensions also allow you to flag a new OpenGL context as a “debug context”, which is required by GL_ARB_debug_output. If you try to enable  debug output on a non-debug context, at best it will silently fail; at  worst, it may not even report that the extension is supported!</p>
<p>The mechanism for creating a debug context varies from platform to platform. I’ll cover the Windows implementation here, but the general idea should be similar elsewhere (when in doubt, read the extension spec). The full process is beyond the scope of this article; instead, I’ll focus on how to extend pre-existing context creation code to support this new functionality. Basically, somewhere in the depths of your engine there will be code that looks something like this:</p>
<pre>HDC g_hdc;
HGLRC g_glContext = wglCreateContext(g_hdc);
wglMakeCurrent(g_hdc, g_glContext);</pre>
<p>If  wglCreateContext() is called more than once, focus on the final call;  the earlier ones are temporary dummy contexts which are probably deleted  a few lines later.  We need to use this final GL context to load a  pointer to the wglCreateContextAttribsARB() function (if available), and  use it to create our new final GL context.</p>
<pre>bool enableDebugContext = true; // This should be specified by the application
if (wglewIsSupported("WGL_ARB_create_context"))
{
    // Retrieve a pointer to the wglCreateContextAttribsARB() function
    PFNWGLCREATECONTEXTATTRIBSARBPROC my_wglCreateContextAttribsARB;
    my_wglCreateContextAttribsARB = (PFNWGLCREATECONTEXTATTRIBSARBPROC)
       wglGetProcAddress("wglCreateContextAttribsARB");
    if (my_wglCreateContextAttribsARB != NULL)
    {
        // Clean up and delete old GL context
        wglMakeCurrent(NULL, NULL);
        wglDeleteContext(g_glContext);

        // Create a new (optionally debug/core/compatibility) context
        int contextAttribs[] =
        {
            WGL_CONTEXT_FLAGS_ARB,
            enableDebugContext ? WGL_CONTEXT_DEBUG_BIT_ARB : 0,
            0
        };
        g_glContext = my_wglCreateContextAttribsARB(g_hdc, 0, contextAttribs);
        if (g_glContext == NULL)
        {
            int err = GetLastError();
            // handle error here
        }
        wglMakeCurrent(g_hdc, g_glContext);
    }
}</pre>
<p>Two things are worth noting in the above code. First, debug contexts do carry a performance cost, so they should only be enabled in development  builds. Also note the contextAttribs[] array, which is a zero-terminated  list of key/value pairs specifying which features you’d like the new GL  context to have.  Refer to the <a href="http://www.opengl.org/registry/specs/ARB/wgl_create_context.txt">WGL_ARB_create_context</a> extension specification for the full list of possible attributes; for  our purposes, only the debug bit matters.  Set this bit during context  creation, and if all goes well you’ll notice a difference in the string  returned by glGetString(GL_VERSION):</p>
<pre>3.3.10834 Compatibility Profile Context       // default context
3.3.10834 Compatibility Profile/Debug Context // debug context</pre>
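<p>If you also want to request a specific GL version or profile, the attribute list grows in the same key/value fashion. The helper below is a hypothetical sketch (<code>makeContextAttribs</code> is not part of any library); the token values are copied from the WGL_ARB_create_context and WGL_ARB_create_context_profile specs so that it compiles without the WGL headers, and the <code>k</code>-prefixed names are local stand-ins for the real defines.</p>

```cpp
#include <vector>

// Token values from the WGL_ARB_create_context(_profile) specs. In real code,
// use the defines from wglext.h or GLEW instead of these local copies.
const int kWGL_CONTEXT_MAJOR_VERSION_ARB    = 0x2091;
const int kWGL_CONTEXT_MINOR_VERSION_ARB    = 0x2092;
const int kWGL_CONTEXT_FLAGS_ARB            = 0x2094;
const int kWGL_CONTEXT_PROFILE_MASK_ARB     = 0x9126;
const int kWGL_CONTEXT_DEBUG_BIT_ARB        = 0x0001;
const int kWGL_CONTEXT_CORE_PROFILE_BIT_ARB = 0x0001;

// Hypothetical helper: builds the zero-terminated key/value attribute list.
std::vector<int> makeContextAttribs(int major, int minor, bool debug, bool core) {
    std::vector<int> attribs;
    attribs.push_back(kWGL_CONTEXT_MAJOR_VERSION_ARB);
    attribs.push_back(major);
    attribs.push_back(kWGL_CONTEXT_MINOR_VERSION_ARB);
    attribs.push_back(minor);
    if (debug) {
        attribs.push_back(kWGL_CONTEXT_FLAGS_ARB);
        attribs.push_back(kWGL_CONTEXT_DEBUG_BIT_ARB);
    }
    if (core) {
        // Only meaningful if WGL_ARB_create_context_profile is supported.
        attribs.push_back(kWGL_CONTEXT_PROFILE_MASK_ARB);
        attribs.push_back(kWGL_CONTEXT_CORE_PROFILE_BIT_ARB);
    }
    attribs.push_back(0);  // terminator
    return attribs;
}
```

<p>The result can be passed to the context creation call as <code>attribs.data()</code> in place of the fixed <code>contextAttribs[]</code> array shown earlier.</p>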
<h3>Step 3: Hook up a custom error-printing callback</h3>
<p>If you’ve come this far, congratulations! The end is in sight! All that’s needed now is to register the callback function that will run whenever the OpenGL runtime detects an error. First, let’s whip up a utility function to format the various error codes into a human-readable string:</p>
<pre>void FormatDebugOutputARB(char outStr[], size_t outStrSize, GLenum source, GLenum type,
    GLuint id, GLenum severity, const char *msg)
{
    char sourceStr[32];
    const char *sourceFmt = "UNDEFINED(0x%04X)";
    switch(source)
    {
    case GL_DEBUG_SOURCE_API_ARB:             sourceFmt = "API"; break;
    case GL_DEBUG_SOURCE_WINDOW_SYSTEM_ARB:   sourceFmt = "WINDOW_SYSTEM"; break;
    case GL_DEBUG_SOURCE_SHADER_COMPILER_ARB: sourceFmt = "SHADER_COMPILER"; break;
    case GL_DEBUG_SOURCE_THIRD_PARTY_ARB:     sourceFmt = "THIRD_PARTY"; break;
    case GL_DEBUG_SOURCE_APPLICATION_ARB:     sourceFmt = "APPLICATION"; break;
    case GL_DEBUG_SOURCE_OTHER_ARB:           sourceFmt = "OTHER"; break;
    }

    _snprintf(sourceStr, 32, sourceFmt, source);
 
    char typeStr[32];
    const char *typeFmt = "UNDEFINED(0x%04X)";
    switch(type)
    {
    case GL_DEBUG_TYPE_ERROR_ARB:               typeFmt = "ERROR"; break;
    case GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR_ARB: typeFmt = "DEPRECATED_BEHAVIOR"; break;
    case GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR_ARB:  typeFmt = "UNDEFINED_BEHAVIOR"; break;
    case GL_DEBUG_TYPE_PORTABILITY_ARB:         typeFmt = "PORTABILITY"; break;
    case GL_DEBUG_TYPE_PERFORMANCE_ARB:         typeFmt = "PERFORMANCE"; break;
    case GL_DEBUG_TYPE_OTHER_ARB:               typeFmt = "OTHER"; break;
    }
    _snprintf(typeStr, 32, typeFmt, type);

 
    char severityStr[32];
    const char *severityFmt = "UNDEFINED";
    switch(severity)
    {
    case GL_DEBUG_SEVERITY_HIGH_ARB:   severityFmt = "HIGH";   break;
    case GL_DEBUG_SEVERITY_MEDIUM_ARB: severityFmt = "MEDIUM"; break;
    case GL_DEBUG_SEVERITY_LOW_ARB:    severityFmt = "LOW"; break;
    }

    _snprintf(severityStr, 32, severityFmt, severity);
 
    _snprintf(outStr, outStrSize, "OpenGL: %s [source=%s type=%s severity=%s id=%d]",
        msg, sourceStr, typeStr, severityStr, id);
}</pre>
<p>With that out of the way, writing the callback itself is a cakewalk. In this implementation, I use the callback’s userParam argument to store a FILE pointer to which the output should be printed:</p>
<pre>void APIENTRY DebugCallbackARB(GLenum source, GLenum type, GLuint id, GLenum severity,
                     GLsizei length, const GLchar *message, GLvoid *userParam)
{
    (void)length;
    FILE *outFile = (FILE*)userParam;
    char finalMessage[256];
    FormatDebugOutputARB(finalMessage, 256, source, type, id, severity, message);
    fprintf(outFile, "%s\n", finalMessage);
}</pre>
<p>And finally, we need to register our callback with OpenGL during our application’s initialization (after the debug context is initialized, of course). We also enable synchronous output, which forces the driver to invoke the callback immediately when an error occurs instead of at some unspecified future time.</p>
<pre>if (glewIsSupported("GL_ARB_debug_output"))
{
    glDebugMessageCallbackARB(DebugCallbackARB, stderr); // print debug output to stderr
    glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB);
}</pre>
<h3>Appendix for impatient AMD users</h3>
<p>As mentioned above, AMD’s official drivers do not yet support the GL_ARB_debug_output extension. In the meantime, they <em>do</em> support the eerily similar <a href="http://www.opengl.org/registry/specs/AMD/debug_output.txt">GL_AMD_debug_output</a> extension (upon which the final ARB extension was clearly based). Both extensions require the same debug OpenGL context initialization, so it’s simple enough to include support for both in the same application; you need only provide separate FormatDebugOutputAMD() and DebugCallbackAMD() functions. I’ve provided implementations below for convenience; there shouldn’t be any surprises.</p>
<pre>void FormatDebugOutputAMD(char outStr[], size_t outStrSize, GLenum category, GLuint id,
                         GLenum severity, const char *msg)
{
    char categoryStr[32];
    const char *categoryFmt = "UNDEFINED(0x%04X)";
    switch(category)
    {
    case GL_DEBUG_CATEGORY_API_ERROR_AMD:          categoryFmt = "API_ERROR"; break;
    case GL_DEBUG_CATEGORY_WINDOW_SYSTEM_AMD:      categoryFmt = "WINDOW_SYSTEM"; break;
    case GL_DEBUG_CATEGORY_DEPRECATION_AMD:        categoryFmt = "DEPRECATION"; break;
    case GL_DEBUG_CATEGORY_UNDEFINED_BEHAVIOR_AMD: categoryFmt = "UNDEFINED_BEHAVIOR"; break;
    case GL_DEBUG_CATEGORY_PERFORMANCE_AMD:        categoryFmt = "PERFORMANCE"; break;
    case GL_DEBUG_CATEGORY_SHADER_COMPILER_AMD:    categoryFmt = "SHADER_COMPILER"; break;
    case GL_DEBUG_CATEGORY_APPLICATION_AMD:        categoryFmt = "APPLICATION"; break;
    case GL_DEBUG_CATEGORY_OTHER_AMD:              categoryFmt = "OTHER"; break;
    }
    _snprintf(categoryStr, 32, categoryFmt, category);

    char severityStr[32];
    const char *severityFmt = "UNDEFINED";
    switch(severity)
    {
    case GL_DEBUG_SEVERITY_HIGH_AMD:   severityFmt = "HIGH";   break;
    case GL_DEBUG_SEVERITY_MEDIUM_AMD: severityFmt = "MEDIUM"; break;
    case GL_DEBUG_SEVERITY_LOW_AMD:    severityFmt = "LOW";    break;
    }
    _snprintf(severityStr, 32, severityFmt, severity);

    _snprintf(outStr, outStrSize, "OpenGL: %s [category=%s severity=%s id=%d]",
        msg, categoryStr, severityStr, id);
}
void APIENTRY DebugCallbackAMD(GLuint id, GLenum category, GLenum severity, GLsizei length,
                     const GLchar *message, GLvoid *userParam)
{
    (void)length;
    FILE *outFile = (FILE*)userParam;
    char finalMsg[256];
    FormatDebugOutputAMD(finalMsg, 256, category, id, severity, message);
    fprintf(outFile, "%s\n", finalMsg);    
}</pre>
<h3>Further reading</h3>
<ul>
<li>The extension specs for <A HREF="http://www.opengl.org/registry/specs/ARB/debug_output.txt">GL_ARB_debug_output</A>, <A HREF="http://www.opengl.org/registry/specs/AMD/debug_output.txt">GL_AMD_debug_output</A>, <A HREF="http://www.opengl.org/registry/specs/ARB/wgl_create_context.txt">WGL_ARB_create_context</A> and <A HREF="http://www.opengl.org/registry/specs/ARB/glx_create_context.txt">GLX_ARB_create_context</A>.</li>
<li>The GL Extension Wrangler Library (<A HREF="http://glew.sourceforge.net">GLEW</A>)</li>
<li>A more in-depth <A HREF="https://sites.google.com/site/opengltutorialsbyaks/introduction-to-opengl-4-1---tutorial-05">GL_ARB_debug_output tutorial</A> by Aks.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.altdevblogaday.com/2011/06/23/improving-opengl-error-messages/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>
