We have been talking a lot lately in the office about the true cost of easy bugfixing. You often hear that a bug gets more expensive the further down the pipeline it exhibits itself; however, the true cost is easily hidden from the eyes of the programmer, for whom a fix is just a couple of keystrokes. It is tempting to mow down bugs and watch the statistics fly up, but you might just be kicking the can down the road, were a single bugfix today could mean two bugfixes tomorrow.
I’ve been thinking of a semi-formal pre-commit checklist, similar to a pilot’s pre-launch list.
Some of these are trivial, some may not apply to your development process or product; some could be solved by more automation, while others are the kinds of things that should keep technical leads awake at night. “Find someone to talk to” may be added to many of them; and “take your time” to all.
Here’s the list I came up with.
Does it compile?
No, seriously. People seem to be unable to resist the temptation of “it’s a trivial change, I’ll just fix the typo and commit back”. Fortunately omissions of this type are relatively harmless, as they are caught quickly. The embarrassment, however, is real.
On all platforms and configurations?
I recently did a search for “fixed xbox compilation” in our Subversion repository commit logs, and I’m not telling you how many hits I found until I am safely retired. Think of other platforms (PC/360/PS3) and configurations (Debug/Release/RealRelease/EvenMoreRealRelease etc); if possible, try to recompile for all of them. If compilation is so slow that this is impractical, work on your compilation times!
Does it run at all? Does it really run your changes?
This is especially important with dynamically typed languages, where you detect very few problems at compile time. Make sure the game runs, and make sure you are really exercising the code you introduced or modified. Place a breakpoint in your debugger and trace around.
Did you read your changes line by line?
As a wise friend of mine oft says, the job of a programmer is not to write code; it is first and foremost to think. Don’t skim; read carefully every change line by line in your diff tool, and think about them.
Did you clean up prints and supplementary debug code you wrote?
If you added prints, disabled optimizations locally or wrote some debug code to help you find the problem, clean it up! Verbose output logs littered with debug prints are hard to read, hide real issues and often are ignored. If you have built some kind of debugging support which is both non-trivial and likely to be used in the future, then by all means, clean it up and commit it – but do so in a separate commit.
Do you understand the bug? Is it fixed? Did it “fix itself”?
Did you reproduce the bug without your fix and verify that it disappears when the fix is applied? Or did you just happily assume that because you wrote some code and now you can’t observe the bug, it was your code that fixed it? Correlation does not imply causation.
Even in edge cases – errors, hostile user behavior etc?
The bug, especially when coming from testers, was most likely encountered during “normal use”; what happens when you exercise the edge cases? Is the code “around” your change still working? Make sure you test more code in the vicinity of your change, not just its direct effects. For example, if you added a new setting to the options dialog, make sure the entire dialog still works, options are properly saved, loaded, or restored as appropriate on apply or cancel etc.
Did you delete code made unnecessary by this change?
If you deleted code as part of your fix, check if you should delete more. If what you deleted was the last usage of a local utility function or a global configuration variable, think about whether these should also be deleted. Will they be of any use to potential future code? I have seen long configuration files with system and gameplay constants defined that haven’t been used for years; it is easy to add yet harder to remember it needs removing.
Is it the simplest possible code that works (but no simpler!)?
Choosing the right spot on the simple/clean/hacky <–> complex/flexible/over-engineered continuum for each thing you do is ultimately one of the most difficult aspects of software engineering. It is the hardest to put down in rules and it requires the most experience and wisdom. A great amount can be written about it, and still no recipe given. Imagine yourself having to maintain the code ten years from now, at midnight, with a headache, talking to an intern you have never met, on the phone.
Did you properly set up the commit in your source control system?
Did you e.g. add the files in Subversion or git? Did you delete the files no longer referenced by any projects? Are you working in the right branch?
Is the commit message short, informative and clear?
The most important role of the commit message is to help someone reading the log a year from now to find what he needs. Jokes, profanities and passive-aggressive whining might seem interesting, in a literary sense, right now, but chances are they’ll be stale on the next day.
Did you significantly change the performance or memory usage?
Is your change using more memory? Have you changed, for the worse, the Big-O complexity of the code? Does it have significantly more allocations than the old code, directly or indirectly? If using a language with a GC, are you generating more garbage for the collector? If you answer yes to any of those, think about how it affects the overall performance of the system. Can you afford it? Is it warranted, in terms of bang for buck ratio?
Does it work in non-developer mode? Should it?
Many developers run their games in a special “developer mode” during development, where more debug facilities are enabled. Maybe you accidentally made your code depend on this developer mode, and it will not run when QA plays the full release version? Or maybe you left debug functionality enabled in non-developer mode?
Does it affect savegames?
Savegames are finicky, fragile beasts. Is there any new state introduced by your code change which has a need to be explicitly saved, or explicitly forbidden? If you changed anything related to the game world, ensure it works across save and load. (Corollary: strive to have working save/load from day one, and to keep it working.) If the game is already released and you’re working on a patch or DLC, make sure you test both forward and backward save game compatibility.
Does it affect localisation?
If you are changing anything related to the UI (but not only!), think of how your change will be affected by localisation. Did you introduce new text? Is this still allowed, or is the localisation table locked down and sent to be translated? Did you leave enough space in the UI for more verbose languages? Are you letting user-input strings through to the filesystem and does your filesystem allow the encoding you used?
Does it affect multiplayer?
Depending on your multiplayer model, this can cover many things. Did you fix the corresponding parts on the client and the server? If you have a lockstep networking model, did you introduce new desyncs by accessing uninitialized variables and haphazardly pulling random numbers off the wrong generator? Have you allowed the players new ways to cheat or draw penises inside the game?
Does it affect threading?
Anything which touches directly low-level threading code, even a “completely safe” critical section, must be double and triple-checked. Code that introduces new synchronisation primitives or threads, IMHO, must be discussed with and reviewed by another programmer before committing.
Does it affect cert?
This is a huge topic by itself; very few people in the studio can keep in their mind all the requirements for the different platforms; but you should at least know the basics in your area of expertise. For example, if you do any UI work, you should check that you have not introduced any important UI elements near the edges of the screen.
Does the fix require art or game content to be redone? Does the bug affect other pieces of art or game content? How can you find them?
If the bug is related to consuming “content” such as art assets, mission data or localisation tables then sometimes you need to redo them (by hand via the artists) or reprocess them (e.g. if you fixed a pipeline tool). Can you automatically find all other content pieces that would exhibit the same bug? Do you need to let your artists or designers know about the problem? Should they do something differently?
Should the fix be merged into other active branches?
Is the fix relevant to other branches of the same codebase under active development? Let the maintainers of the other branches know about them but do not merge blindly, go through this checklist for each merge destination!
Did you update the adjacent comments in the code?
There is an old adage claiming that “when the code and the comments disagree, they are probably both wrong”. Do not simply ignore comments for the buggy code you just fixed; at least delete them if for some reason you do not update them.
Did you update the documentation relevant to the change?
Here “documentation” can be anything from Doxygen-style function headers and class overviews, tooltips and description texts in tool GUIs, to test plans or external documents such as a wiki. Your change might do more harm than good if it doesn’t also include the documentation.
Are there other bugs for the same issue filed in the bugtracker?
Some types of bugs are frequently misinterpreted by testers and filed numerous times, without them ever recognising it is actually the same bug. Make an effort to seek all instances out and resolve them, this maybe the most pleasant part of this entire checklist!
Are similar bugs likely to exist in other systems? How can you find them?
Think of other places that might have the same problem you just fixed; for example, if your highscore browsing code does not handle network timeouts properly, the probability is high that your user-generated-content browser does not either. Should this be fixed in each place, or should a common part be refactored out of these sections?
Do you understand why the bug was introduced in the first place?
Were there any other bugs introduced by the commit, maybe due to the same mistakes or false assumptions?
Bugs do not appear in the codebase because programmers are evil and it is rarely the case they are intrinsically stupid. Try to find out which false assumption or lack of information led to the bug, and read carefully through the commit that introduced it, especially if it was a bulk commit with a large amount of new code.
Can you prevent it, or other similar bugs from appearing in the future? Better interface, automated tests, more data validation?
With each bugfix, try to improve the usability of your internal development; do not just fix an art bug, instead make the tools used by the artist warn about it as early as possible (export time is better than process time, which is better than load time, which is better than runtime); do not just fix a misused API call, try to redesign the API so that it is easy to use correctly and practically impossible to misuse. If you don’t have an automated test system, think about setting one up. And finally, maintain a “smell map” of the codebase – know what subsystems are due for a cleanup, refactoring or redesign next time you have some breathing space between projects.
This is my list. It is probably heavily biased towards my work as a PC/360 retail game programmer; what can you add from your field of work? I’d love to hear your suggestions in the comments.