
The killer StatusBar

There are good bugs and there are bad bugs and a few are just downright nasty. We had some of those this week. So instead of our planned schedule, we had a fun little “bug week” digging through endless valgrind output trying to make sense of it all.

The good bugs are those where the symptoms directly provide a clue as to where to start looking. The bad bugs are those where you start with no clue at all, but can narrow it down eventually. The nasty bugs are those where the symptoms point in entirely the wrong direction, so you spend your time investigating totally unrelated code, second-guessing your own sanity, until you stumble upon the solution more or less by accident. The mother of all nasty bugs is one where, in the end, it turns out that the cause of all the mess was something very simple and trivial. We had one of those this week.

The symptoms: The Debugger on Linux suddenly failed to work on Kubuntu with PB 4.20. Running a program with debug output would cause a crash with an X error. The weird part was that it worked fine from a root account.

The trackdown: The root account thing was the most puzzling symptom. It suggests some form of access rights problem, which in itself is weird, as the debugger does not do anything that should require special rights. Because of the X error, I suspected some gtk problem, so I fixed all gtk warnings and errors given while running the PB IDE. This took a while and is almost a story in its own right, but it was not the cause of the crash. After the search in this direction turned up nothing, I turned to valgrind in the hope of finding some clue as to where the crash comes from. Here it got even weirder: the valgrind output on Kubuntu was an endless list of “invalid read access beyond end of buffer” errors (which are usually serious), whereas the output on Suse was almost empty. After another few hours of searching, I tried the same on a regular Ubuntu and got the same errors but no crash. We never figured out what these were, but they appear to come from some external library, and they did not cause the crash, so it’s not really our problem. Using the valgrind thread analysis tool was an interesting exercise (the debugger does a lot in threads), but it led nowhere as well.

Long story short, in the end I finally managed to track it all down to a HideWindow() call (used to show the debugger window), so it had to be some visual element on that window. From there it was just a comment/uncomment test to find the part that caused it all.

The actual cause: In the end, it turned out that the tiny factor which led to all this mess was the StatusBar of the Debug output window. More specifically, AddStatusBarField() was called with too large a value. As there was (so far) no way to make a statusbar field extend all the way to the right edge of the window, a common workaround was to just call AddStatusBarField(9999), which should be wider than any window ever is. This had worked fine so far, but in this call the value was 99999, and apparently that was too much. Some kind of allocation failed and caused the X error. To solve this for the future, you can now call AddStatusBarField(#PB_Ignore), and the field will be sized to take up all the remaining free space. The command now also checks against overly large field values.
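
To illustrate the new option, here is a small sketch (not taken from the actual IDE code) of a statusbar where the last field simply fills whatever width is left in the window:

If OpenWindow(0, 0, 0, 400, 200, "Debug output")
  If CreateStatusBar(0, WindowID(0))
    AddStatusBarField(100)        ; fixed field, 100 pixels wide
    AddStatusBarField(#PB_Ignore) ; new in 4.30: takes all the remaining space
    StatusBarText(0, 0, "Line:")
    StatusBarText(0, 1, "This field reaches to the right window border")
  EndIf

  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf

No more guessing a "big enough" width, and no risk of handing the OS a field size it cannot allocate.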

This is somewhat disappointing. If you work on a bug for a long time, you want it to at least be something significant. Some big thing that you finally figured out. Not such a small and insignificant function call. It’s an awful lot of time to waste, just because of an extra ‘9’. Well, I think I’ll get over it 🙂

In addition to this one, we also had the fun of dealing with a heisenbug: a crash in the debugger that just disappeared as soon as we inserted printf() statements to try and narrow it down.

Anyway, we are back on track now. The (hopefully) final round of alpha versions is out to the testers, and if all goes well, you can expect a public beta quite soon. (As usual, we’re not setting any dates though. You never know what comes along.)

LinkedList memory management

For 4.30, I worked a bit on LinkedLists and how they allocate their elements. The original motivation was to ensure that linkedlist elements are as close together in memory as possible, limiting the penalty of page faults and cache misses when accessing/sorting large lists. In the end, the benefit applies to the more normal cases as well, not just these large extremes.

The obvious solution is to allocate the linkedlist elements in larger blocks, but this requires another level of memory management above that of the OS. What I have learned in my past encounters with memory management is that the OS memory management is very hard to beat, or even come close to, in performance, given how complex the task of managing memory is. However, with linkedlists there is a fact of which the OS functions (having to work in all scenarios) cannot take full advantage: a linkedlist needs a large number of small, equally sized memory blocks.

Let’s start with some statistics:

My system runs Vista64 with 8 GB of RAM, but I did my test with the 32-bit version of PB. So the limits described are purely due to the limitations of the process’s own address space, and not due to overall system memory. I tested with a list of type Long. This is a very small element, but not an unlikely one at that. Including the linkedlist pointers, every element is 12 bytes in size. I tried to squeeze as many elements into the list as possible before getting a crash and then did some statistics on the heap (a sketch of such a test loop follows after the numbers). The results with a plain 4.20 final are these:

List elements:   40000000
Blocks total:    40000000
Size total:      457.76 Mb
Size average:    12 Byte
Overhead total:  457.76 Mb
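
The exact test code isn’t shown here, but such a test could look roughly like this: keep adding Long elements until no more memory can be allocated, then see how many fit.

NewList Numbers.l()

Count = 0
Repeat
  If AddElement(Numbers()) = 0
    Break              ; allocation failed, the address space is exhausted
  EndIf
  Numbers() = Count    ; 4 byte payload + 8 bytes of list pointers = 12 bytes per element
  Count + 1
ForEver

Debug Count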

The first thing to notice is the heap overhead: it is exactly the same as the total amount of allocated memory. The heap needs about 12 bytes on average to manage an allocated memory block, so 40000000 blocks cost another 40000000 * 12 bytes = 457.76 Mb in management data alone. That is a very bad usage/overhead ratio. The second thing to notice is that 450 Mb is not much usable memory, given that the process should have almost its whole 2 GB of user address space available for this list. This suggests a huge amount of fragmentation, filling up all the space with not even 500 Mb of usable data.

So what’s the better solution? It is to allocate the memory in larger blocks, with the smallest possible amount of extra management data per element. A free element must also be found very fast; if there has to be a search for a free spot, it will kill all the performance. I ended up with a quite good solution requiring only 4 bytes of extra data per element, and finding a free element as well as freeing an element in O(1), meaning without any kind of loop to find the free spot. I won’t bore you with the details here. What I ended up with is actually not a new idea; it is in fact quite old. Had I found the following paper earlier, I could have saved myself some thinking :-). If you are interested, the paper is available here. My implementation is a bit different from what the paper describes, but the basic idea is the same.
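
To give a rough idea of the principle, here is a minimal sketch with made-up names. It is not the actual library code (which uses an extra 4 bytes per element and grows its blocks dynamically) and it manages only a single fixed-size block, but it shows the free-list trick from the paper: every unused slot reuses its own first bytes to store a pointer to the next unused slot, so both grabbing a free slot and giving one back are O(1), with no searching at all.

#SlotSize  = 12    ; element size incl. the list pointers (the Long example above)
#SlotCount = 1000  ; slots in this block

Structure FreeSlot       ; overlay used only while a slot is unused
  *NextFree.FreeSlot
EndStructure

Structure Pool
  *Memory                ; start of the big block
  *FirstFree.FreeSlot    ; head of the free-slot chain
EndStructure

Procedure InitPool(*Pool.Pool)
  Protected i, *Slot.FreeSlot
  *Pool\Memory = AllocateMemory(#SlotSize * #SlotCount)
  For i = 0 To #SlotCount - 1            ; chain every slot to its successor
    *Slot = *Pool\Memory + i * #SlotSize
    If i < #SlotCount - 1
      *Slot\NextFree = *Pool\Memory + (i + 1) * #SlotSize
    Else
      *Slot\NextFree = 0                 ; last slot ends the chain
    EndIf
  Next
  *Pool\FirstFree = *Pool\Memory
EndProcedure

Procedure.i AllocSlot(*Pool.Pool)        ; O(1): just take the head of the chain
  Protected *Slot.FreeSlot = *Pool\FirstFree
  If *Slot
    *Pool\FirstFree = *Slot\NextFree
  EndIf
  ProcedureReturn *Slot
EndProcedure

Procedure ReleaseSlot(*Pool.Pool, *Slot.FreeSlot)  ; O(1): push the slot back onto the chain
  *Slot\NextFree = *Pool\FirstFree
  *Pool\FirstFree = *Slot
EndProcedure

The nice part is that the chain costs nothing extra: a free slot’s memory is unused anyway, so it can hold the link to the next free slot.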

So, here is the result from 4.30:

List elements:   120000000
Blocks total:    1977
Size total:      1.79 Gb
Size average:    948.57 Kb
Overhead total:  15.55 Kb

So we can see: many more elements fit into the list, and much more of the memory is actually used for data. The overhead is now very small, because the average block size is about a megabyte and there are therefore far fewer blocks. But there are even more benefits from this:

  • Allocating/freeing elements is faster, because the number of alloc/free calls is greatly reduced.
  • Walking the list (and also sorting it, of course) is faster, although not by as much as I had hoped. This effect should be more visible on systems with less physical memory available, where the OS is forced to swap pages out to disk more often.
  • ClearList() is mighty fast now, as it no longer has to walk the list to free every single element. All it has to do is walk the list of blocks and free those (so in the above case, that’s about 2000 operations instead of 120000000). A small exception is string lists, where the list still has to be walked to free the individual strings.

So what’s the catch?

Yes, there is a catch. There always is, but in this case the benefits far outweigh the costs. The problem is that the list will contain unused but allocated elements whenever not all blocks are full. To limit this, allocation blocks grow dynamically with the list size, so the overhead of unused but allocated elements stays very small compared to the overall list size.

The bigger problem arises when freeing parts of the list. A block can only be freed when all of its elements are unused. Consider the (highly unlikely) case that I build the list above, then free all but 2000 of the elements and, unlucky as I am, exactly one element remains on every single memory block. This means no memory is freed at all. This sounds like a big problem at first, but really it’s not that big of a deal. When allocating new elements, every available free slot is used first, so when removing/adding elements on a frequent basis (which is usually the case), the number of unused slots will still be very low. The only bad case is the one above: making a list very big, then deleting random elements (but not all of them) and then never touching the list again. This is a very unlikely case. Consider also this: in the stats from 4.20, 50% of the used memory was heap overhead. So even if now almost half of the allocated elements are not actually used, you are still better off than before, because the memory fragmentation is greatly reduced.

So overall, I think this is a great improvement. Something to look forward to.