I work as a tools programmer at my current workplace. As the code base gets bigger and more people work on it, changes in one module more often cause unexpected breakage in a totally different module. Our editor in particular has stopped working properly many times because of seemingly innocuous engine changes. To prevent this, I looked for a way to automate testing of the editor.

(In this article, I deal only with black-box testing of a GUI application.)

Alternatives Considered

I considered the following options, starting with AutoIt, which I’ve been quite familiar with.

AutoIt
  • A BASIC-like scripting language for the Windows platform
  • Identifies a UI element by its control ID, control class, or the text on it
  • Very useful for general automation on Windows
  • Free


  • A commercial test automation tool
  • AFAIK, it uses a mechanism similar to AutoIt’s to identify UI elements
  • Full-featured IDE support
  • Multiple scripting languages supported

SIKULI
  • Jython-based, multi-platform
  • Computer-vision-based UI element identification
    • A match is found by comparing images against a user-specified similarity threshold (a percentage)
  • Basic IDE support
  • Early in development
  • Open source

SIKULI’s innovative concept, and the fact that it can respond to any ‘thing’ on the screen rather than just regular UI elements, won me over. You can easily see how SIKULI works by reading some tutorials here or watching this video:

[youtube width="640" height="505"]http://www.youtube.com/watch?v=FxDOlhysFcM&feature=player_embedded[/youtube]

From here on, I’ll describe how my adventure with SIKULI has gone so far.

How It’s Going

My first goal was to automate the daily regression tests that our QA guys had been running manually for the editor. So far, about seventy percent of them have been automated. Once all registered tests finish running, a report is generated in HTML format. It shows how each test went and, in case of a failure, which image match caused it. For instance, we’re testing the following things:

  • Launching/exiting the editor
  • Opening/closing each sub-editor window
  • Creating a new level and saving it
  • Loading an existing level
  • Exporting a level
  • Switching view modes
  • Basic feature tests for each sub-editor
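The run-then-report flow described above could be sketched roughly as follows. This is plain Python 3 rather than SIKULI script, and every name in it (`MatchFailure`, `run_tests`, `render_report`, the test names, the image file name) is made up for illustration; it is not the actual harness.

```python
import html
import traceback


class MatchFailure(Exception):
    """Raised by a test when an expected image is not found on screen."""

    def __init__(self, image_name):
        super().__init__("no match for " + image_name)
        self.image_name = image_name


def run_tests(tests):
    """Run each (name, callable) pair and record how it went."""
    results = []
    for name, func in tests:
        try:
            func()
            results.append({"name": name, "status": "PASS", "detail": ""})
        except MatchFailure as err:
            # Remember which image could not be matched.
            results.append({"name": name, "status": "FAIL",
                            "detail": "no match for " + err.image_name})
        except Exception:
            # Any other error is reported with the last traceback line.
            last_line = traceback.format_exc().strip().splitlines()[-1]
            results.append({"name": name, "status": "ERROR",
                            "detail": last_line})
    return results


def render_report(results):
    """Build a bare-bones HTML table with one row per test."""
    rows = []
    for r in results:
        rows.append("<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
            html.escape(r["name"]), r["status"], html.escape(r["detail"])))
    return ("<html><body><table>\n"
            "<tr><th>Test</th><th>Result</th><th>Detail</th></tr>\n"
            + "\n".join(rows) + "\n</table></body></html>")


# Hypothetical stand-ins for real GUI tests.
def launch_editor():
    pass  # a real test would start the editor and wait for its main window


def export_level():
    raise MatchFailure("export_done_dialog.png")  # simulated missing match


results = run_tests([("launch the editor", launch_editor),
                     ("export a level", export_level)])
report = render_report(results)
```

Keeping the failed image name in the result is what lets the report point at the exact match that went wrong, instead of just saying that a test failed.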

But the overall process wasn’t as smooth as I expected. The computer-vision-based innovation turned out to be a double-edged sword.

What I Learned

  • High maintenance cost: Whenever the GUI changes, such as an icon or text change or an overall resize, all affected test images have to be updated. This can be very onerous.
  • Choosing the region of the image to compare is a balancing act: If the region is too small, the match becomes less precise and can end up with more than one match in some cases. If the region is too big, the test becomes too sensitive to any GUI change and the maintenance cost goes up.
  • Evaluating the result is hard: The result is usually evaluated through image comparison as well, which again is too susceptible to small UI changes. Sometimes I used it to check the result of viewport rendering, but stable testing was almost impossible because rendering changes frequently during development. For example, if the backbuffer clear color changes for whatever reason, the relevant test images have to be updated to keep the test valid. In the end, for this kind of rendering test, I settled on a semi-automatic alternative that provides a screenshot for a human to judge, rather than trying to decide by machine alone.
  • Still in early development, SIKULI is quite rough around the edges and has some bugs: For instance, there was a memory leak related to a specific API function. Because of a bug that made the key input API ‘type()’ not work properly when the keyboard layout isn’t English, I had to use the clipboard API function ‘paste()’ instead to enter text. An essential feature like renaming test images in the IDE was missing (it has been implemented in the latest version), and at the moment there isn’t a good way to modularize a project, which becomes quite important as the test set gets bigger. Fortunately, the SIKULI developers were very quick to respond to questions and requests, and the Q&A forum was helpful too.
  • Difficult to debug: Because there is no full-fledged debugging environment, it took longer than ideal to diagnose and fix a rather long test function that wasn’t working as expected.
  • Execution speed isn’t great, as one would expect from a tool that performs image comparison, but this wasn’t much of an issue. Rather, some strange slow-downs when using a specific IDE feature sometimes slowed me down.
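The region-size trade-off can be illustrated with a toy matcher. This is a minimal sketch in plain Python, assuming tiny grayscale grids and a simple mean-difference score; it is not SIKULI’s actual matcher, and `similarity`, `find_matches` and the pixel values are made up for illustration.

```python
def similarity(screen, template, top, left):
    """Score 1.0 for identical pixels, lower as they differ (0..255 gray)."""
    total = 0
    count = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(screen[top + r][left + c] - value)
            count += 1
    return 1.0 - total / (255.0 * count)


def find_matches(screen, template, threshold):
    """Return every (row, col) where the template scores >= threshold."""
    height, width = len(template), len(template[0])
    hits = []
    for top in range(len(screen) - height + 1):
        for left in range(len(screen[0]) - width + 1):
            if similarity(screen, template, top, left) >= threshold:
                hits.append((top, left))
    return hits


# Two "buttons" sharing the same 2x2 icon (200) but with different
# label shades next to it (50 and 120).
screen = [
    [0,   0,   0,  0, 0,   0,   0,   0],
    [0, 200, 200, 50, 0, 200, 200, 120],
    [0, 200, 200, 50, 0, 200, 200, 120],
    [0,   0,   0,  0, 0,   0,   0,   0],
]

small_template = [[200, 200], [200, 200]]          # icon only
large_template = [[200, 200, 50], [200, 200, 50]]  # icon plus label

# Too small a region: both buttons match, so the target is ambiguous.
small_hits = find_matches(screen, small_template, 0.95)

# A larger region singles out the first button...
large_hits = find_matches(screen, large_template, 0.95)

# ...but a small cosmetic change (label shade 50 -> 90) now breaks it.
changed = [row[:] for row in screen]
changed[1][3] = changed[2][3] = 90
large_hits_after_change = find_matches(changed, large_template, 0.95)
```

Here `small_hits` contains two positions, `large_hits` contains one, and `large_hits_after_change` is empty. Lowering the threshold would bring the large template back after the cosmetic change, but at some point it also starts matching the second button again, which is the balancing act in a nutshell.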

Reading this back, it seems I’m painting a pretty bad picture of it… But don’t get me wrong. Other alternatives that rely on GUI control ID/class-based identification can run into similar maintenance issues. Furthermore, there are situations where SIKULI provides an easy solution while those alternatives don’t.

  • If its development continues with enough momentum, I think SIKULI can become a promising, mature tool.
  • It’s almost impossible to implement a stable, low-maintenance GUI auto-test that relies solely on external UI identification. If an external tool can reach deeper into the tested application, for example through scripting hooks the application exposes, a test set that is much more robust and easier to maintain can be achieved.
  • SIKULI seems to be useful for rendering regression tests.
  • It is very useful for automating mundane personal tasks on your machine.
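To make the point about going deeper into the tested application concrete, here is a minimal sketch of verifying a result through state the application reports instead of through pixels. Everything in it is hypothetical: `FakeEditor` stands in for an editor with some scripting hook, and `query_state` is an assumed name, not an API of any real editor or of SIKULI.

```python
class FakeEditor:
    """Stand-in for an editor that exposes a scripting hook (hypothetical)."""

    def __init__(self):
        self._level = None
        self.clear_color = (0, 0, 0)

    def create_level(self, name):
        self._level = {"name": name, "saved": False}

    def save_level(self):
        self._level["saved"] = True

    def query_state(self):
        # What a scripting hook might report back to the test tool.
        return {"level": dict(self._level) if self._level else None}


def check_new_level_saved(editor, name):
    """Drive the editor, then verify via reported state, not screenshots."""
    editor.create_level(name)
    editor.save_level()
    state = editor.query_state()
    return state["level"] == {"name": name, "saved": True}


editor = FakeEditor()
# A cosmetic change like this would invalidate a reference screenshot,
# but it has no effect on the state-based check below.
editor.clear_color = (30, 30, 30)
level_check_passed = check_new_level_saved(editor, "test_level")
```

In a real setup the GUI would still be driven from the outside (by image matching, for example), and only the verification step would go through the application’s own report of its state.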


You can see the same post on my personal blog, but only if you can read Korean. ;)