The killer StatusBar

There are good bugs and there are bad bugs and a few are just downright nasty. We had some of those this week. So instead of our planned schedule, we had a fun little “bug week” digging through endless valgrind output trying to make sense of it all.

The good bugs are those where the symptoms directly provide a clue as to where to start looking. The bad bugs are those where you start with no clue at all, but can narrow it down eventually. The nasty bugs are those where the symptoms point in entirely the wrong direction, so you spend your time investigating totally unrelated code, second-guessing your own sanity all the way, until you stumble on the solution more or less by accident. The mother of all nasty bugs is one where, in the end, the cause of the whole mess turns out to be something trivial. We had one of those this week.

The symptoms: The Debugger on Linux suddenly failed to work on KUbuntu with PB 4.20. Running a program with debug output would cause a crash with an X error. The weird part was that it worked fine from a root account.

The trackdown: The root account thing was the most puzzling symptom. It suggests some form of access rights problem, which in itself is weird, as the debugger does not do anything that could require special rights. Because of the X error, I suspected a gtk problem, so I fixed all gtk warnings and errors reported while running the PB IDE. This took a while and is almost a story in its own right, but it was not the cause of the crash. After the search in this direction turned up nothing, I turned to valgrind in the hope of finding some clue as to where the crash came from. Here it got even weirder: the valgrind output on KUbuntu was an endless list of “invalid read access beyond end of buffer” errors (which are usually serious), while the output on Suse was almost empty. After another few hours of searching, I tried the same on a regular Ubuntu and got the same errors but no crash. We never figured out what these were, but they appear to come from some external library, and they did not cause the crash, so it's not really our problem. Using the valgrind thread analysis tool was an interesting exercise (the debugger does a lot in threads), but led nowhere as well.

Long story short, I finally managed to track it all down to a HideWindow() call (the one that shows the debugger window), so it had to be some visual element on that window. From there it was just a comment/uncomment test to find the part that caused it all.

The actual cause: In the end, it turns out the tiny factor that led to all this mess was the StatusBar of the Debug output window. More specifically, AddStatusBarField() was called with a value that was too high. As there was (so far) no way to have a statusbar field go all the way to the right of the window, a common way was to just call AddStatusBarField(9999), which should be bigger than any window ever is. This worked ok so far, but in this call there was 99999, and apparently that was too much. Some kind of allocation failed and caused the X error. To solve this for the future, you can now call AddStatusBarField(#PB_Ignore), and the field will be sized to take all remaining free space. There is also a check in the command now against fields that are too large.
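For illustration, here is a minimal sketch of what the new call might look like in a small test program. The window size, the title, the 120 pixel width of the first field and the field texts are all made up for this example; it is not the actual code of the debug output window.

```purebasic
; Minimal sketch (assumed values, not the real debugger code):
; a status bar whose last field fills the remaining window width.
#Window    = 0
#StatusBar = 0

If OpenWindow(#Window, 0, 0, 400, 120, "Debug Output", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  If CreateStatusBar(#StatusBar, WindowID(#Window))
    AddStatusBarField(120)         ; fixed-width field, in pixels (arbitrary value)
    AddStatusBarField(#PB_Ignore)  ; sized with all remaining free space
    ; The old workaround here was AddStatusBarField(9999);
    ; the call that triggered the crash used 99999 instead.
    StatusBarText(#StatusBar, 0, "Ready")
    StatusBarText(#StatusBar, 1, "No errors")
  EndIf

  ; Simple event loop: run until the window is closed
  Repeat
  Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
```

With #PB_Ignore there is no magic number left to mistype, which is really the whole point.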

This is somewhat disappointing. If you work on a bug for a long time, you want it to at least be something significant. Some big thing that you finally figured out. Not such a small and insignificant function call. It's an awful lot of time to waste, just because of an extra ‘9’. Well, I think I’ll get over it 🙂

In addition to this one, we also had the fun of dealing with a heisenbug: a crash in the debugger that just disappeared as soon as we inserted printf() statements to try and narrow it down.

Anyway, we are back on track now. The (hopefully) final round of alpha versions is out to the testers, and if all goes well, you can expect a public beta quite soon. (As usual, we’re not setting any dates though. You never know what comes along.)