This is really more of a follow on from my previous article, Preparing for Parallelism. Some of the comments wanted more explanation, and an example of what I actually mean about the likes of incorrect data and controlling execution context.
What I explained was a tips for a component/entity system, a task based one: concurrent systems, running in parallel, and communicating effectively between each other. Fine-grained parallelism does not apply. I also mentioned the fact that being able to accept, and properly deal with out of date ‘incorrect’ data can lead to some novel opportunities for parallelism.
Not to say the methods I employ lead to perfect parallelism and scalability, I do believe they allow for implementing a level of concurrent operation in a cost-effective manner on current hardware, without the hidden quirks that locks can have creep up on your frame profile. I include ‘lock-free algorithms’ in that, as those can cause nasty contention issues. If you disagree, please comment, as I’m really interested in hearing other opinions and experiences on the matter.
First of all, what changes are those that really make concurrent execution hard, even if you allow for reading data which is possibly a frame out of date?
Data structure modification is the weak point
This understanding is critical: modifying data, in itself, is not a dangerous operation when thinking about concurrent operation. If that write to data changes the layout of data in which you are operating on, that is dangerous. For example, here is the simple operation of adding a new node of data to a linked list, and where the things go wrong:
The call to new in the bubble may not be a Weak Point depending on your memory management, but I added it for completeness. The actual data in the node could be computed completely in parallel, regardless of anything else – and without locks. The only issue is when we modify the data structure by writing the address of your new node into a next pointer. Modifying this pointer could cause nasty race conditions. A solution is to calculate the data, populate a node structure and store that somewhere in parallel: leaving the last write of the next pointer until some point in the future, when we know it is safe to do so. It means you may need some extra information, like store where you want to insert this node for future reference, and deal with possibilities of out of date data, but most of the time in my experience you do not need that much extra code.
In short: design your code for minimising weak points in execution. If you minimize these points, you will reap benefits come the time you want to run this on another thread, or on SPU.
A working example
I’m going to use simple examples from my C# and XNA hobby work here; mainly because I can go into further detail with them without risk of NDA breaches. Even if you do not know C# they should still make sense, as little code will actually be written here.
The project has a threaded asset loader. It is a standard worker-thread style and in the code it works along the lines of:
TextureSource icon = AssetLoader.GetTextureSource(“icon.png”);
All this function does is add a request to a list of items – and immediately returns an object that contains a reference to a ‘Loading’ texture. The worker thread from time to time takes a peek at the list to see if an item exists which is marked as pending. When it sees it, it does the intensive load operation, and saves the reference to the real texture in the list. Whilst this is happening; the renderer could have processed this texture many times, but as the structure is as it should be (it’s a valid texture, even if it is not the one we asked for immediately) it runs fine. What the loading thread cannot do is to immediately go and write back the correct texture back as soon as it has finished loading – as it may be in use in some internal renderer computation we do not want to interrupt. Doing that write would have changed the structure of data in the renderer.
Instead, I have an AssetLoader.Commit() function which is called on the main critical path at a point I know where these structural changes can safely occur. The function goes over the complete requests, and then fixes up the references to the loading texture, replacing it with the real requested one. This is the ‘Write’ phase of the system, and allows the heavy data crunching part to run concurrently, at the cost of a very small set of possible write operations elsewhere. In this case it’s an easy decision to make and you can see the value here as no data locks are required on the critical path.
Although this is a short post, I hope that it explains some aspects of what I meant in the original article. In essence, instead of ‘I’ll compute this data and add it to the queue, or remove it from the tree’, we want ‘i’ll compute this data and fill out a node, ready for adding it to the queue later. Or flag this node as to be removed, and we will actually do the removal later’. By separating out these weak points of code that cause race conditions, you can schedule them in such a way as to maximize concurrency in the heavy lifting work – without locks.
Colin Riley is the Games Technology Director at Codeplay, based in Edinburgh, Scotland. His personal blog is available at http://sorryaboutthecaps.co.uk and can be followed on twitter @domipheus.
