About a month ago, I bought an iPhone 4s, so I write some code on my new toy. Although this device does not support multiple render target(MRT), it do support rendering to a floating point render target (only available on iPhone 4s and iPad2). So I test it with a light pre pass renderer:
In the test, HDR lighting is done (gamma= 2.0 instead of 2.2, without adaptation) with 3 post processing filters (flimic tone mapping, bloom and photo filter). In the test scene, 3 directional lights(1 of them cast shadow with 4 cascade) and 30 point lights are used with 2 skinned models and running bullet physics at the same time which can have around 28~32fps.
I have tried 2 different layout for the G-buffer. My first attempt is to use one 16-bit render target with R channel storing the depth value, G and B channel storing the view space normal using the encoding method from “A bit more deferred-CryEngine 3” and A channel storing the glossiness for specular lighting calculation. But later I discovered that this device support the openGL extension GL_OES_depth_texture which can render the depth buffer into a texture. So my second attempt is to switch the G-buffer layout to use the RGB channels to store the view space normal without encoding and A channel storing the glossiness while the depth can be sampled directly from the depth texture.
Switching to this layout gives a boost in the frame rate as the normal value does not need to encode/decode from the texture. However, making the 16-bit render target to 8-bit to store normal and glossiness does not give any performance improvement, probably because the test scene is not bound by band width.
|drawing the bounding volume of the point lights|
However, after finish implementing the stencil trick, the frame rate drops… This is because when filling the stencil buffer, I use the shader that is the same as the one used for performing lighting. Even the color write is disabled during filling the stencil buffer, the GPU is still doing redundant work. So a simple shader is used in the stencil pass instead which improve the performance.
Also, drawing out the shape of the point lights make me discover that the attenuation factor I used (i.e. 1/(1+k.d+k.d^2) ) have a large area that does not get lit, so I switch to a more simple linear falloff model (e.g. 1- lightDistance/lightRange, can give an exponent to control the falloff) to give a tighter bound.
Combining post-processing passes
Combining the full screen render passes can help performance. In the test scene, originally the bloom result is additively blend with the tone-mapped scene render target, followed by a photo filter and render to the back buffer. Combining these passes by calculating the additive blend with tone-mapped scene inside the photo filter shader which is faster than before.
The program is run at a low resolution with back buffer of 480x320pixels. Also, the G-buffer and the post processing textures are further scaled down to 360x300pixels. This can reduce the number of fragments need to be shaded by the pixel shaders.
In the scene, cascaded shadow map is used with 4 cascade (resolution= 256×256). I have tried using the GL_EXT_shadow_samplers extension, hoping that it can helps the frame rate. But the result is disappointing as the speed of the extension is the same as performing comparison inside the shader…
It takes around 8ms for calculating shadow and blurring it. If a basic shadow map is used instead (i.e. without cascade) with blurring, it gives some or little performance boost depends on whether there are how many point lights on screen. Of course switching off the blur will speed up the shadow calculation a lot.
In this post, I described the methods used to make a light pre pass renderer to run on the iPhone to achieve 30fps with 30 dynamic lights. However, high resolution is sacrificed in order to keep the dynamic lights, HDR lighting and the post processing filters. Also, no anti aliasing is done in the test as the frame rate is not good enough. May be MSAA can be done if the basic shadow map is used instead of cascade. But these will leave for future investigation.
 Light Pre Pass Renderer: http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
 A bit more deferred – CryEngine 3: http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt
 Filmic tone mapping operators: http://filmicgames.com/archives/75
 Crysis Next Gen Effects: http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt
 Position From Depth 3: Back In The Habit: http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/
 Fast Mobile Shaders: http://blogs.unity3d.com/wp-content/uploads/2011/08/FastMobileShaders_siggraph2011.pdf
 GLSL Optimizer: http://aras-p.info/blog/2010/09/29/glsl-optimizer/
 Deferred Cascaded Shadow Maps: http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/