Debugging tips – Part 1: Heap corruption

If you are wondering why this blog has been silent for so long, it's because we are in the beta phase of 4.30, and writing about bugfixing and documentation work is just boring.

I want to talk a bit about the most common problems I have seen on the forums where the PureBasic debugger is not much help and which therefore leave most users quite confused. The usual reaction is to post them as a bug report. So in the future, if somebody posts a bug report about AllocateMemory(), you can point them to this article and it should explain everything 🙂

Symptoms:
  • A crash at AllocateMemory() or FreeMemory() even though the given input is valid.
  • A crash on a simple string assignment.
  • Modifying the code in seemingly unrelated places makes the problem go away.
Reason:

First of all: AllocateMemory() is never the problem. It's a direct wrapper around the HeapAlloc() API function and also a heavily used function. If it had a bug, we would know by now. What you have run into here is called heap corruption. You destroyed part of the data Windows uses to manage allocated memory.

When Windows allocates memory, it keeps a data structure to manage the allocated memory (usually 12 bytes on 32-bit Windows). This data structure is normal, writable memory, which means that you will not get any access error when accidentally writing over it. It is not specified anywhere where this data is kept, but it is a fact that it sometimes ends up right after your allocated memory buffer. Now, if you happen to write over the end of this buffer by just a few bytes, you destroy the heap structure without getting any error. The crash only happens later, when another attempt to allocate or free memory causes Windows to examine this piece of heap data and crash due to an invalid pointer. The fact that the cause of the problem and the actual crash are in different places is what makes this kind of bug so hard to debug.
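
To make this concrete, here is a minimal sketch of one way such an overwrite can happen. It is only an illustration: CopyBroken() and CopyFixed() are hypothetical names, not part of PureBasic. The sketch relies on PokeS() writing a terminating null character after the string, so a buffer sized with Len() alone is too small.

```purebasic
; Sketch only: CopyBroken() and CopyFixed() are hypothetical names.

Procedure CopyBroken(Text$)
  Protected *Buffer
  ; BUG: Len() counts characters only, but PokeS() also writes a
  ; terminating null character, so the write runs past the buffer.
  *Buffer = AllocateMemory(Len(Text$))
  If *Buffer
    PokeS(*Buffer, Text$)  ; no error is raised here
  EndIf
  ProcedureReturn *Buffer
EndProcedure

Procedure CopyFixed(Text$)
  Protected *Buffer
  ; Reserve the string's byte length plus room for the terminating null.
  *Buffer = AllocateMemory(StringByteLength(Text$) + SizeOf(Character))
  If *Buffer
    PokeS(*Buffer, Text$)
  EndIf
  ProcedureReturn *Buffer
EndProcedure

; Only the fixed variant is called, so the sketch itself runs cleanly.
*Copy = CopyFixed("Hello")
If *Copy
  Debug PeekS(*Copy)
  FreeMemory(*Copy)
EndIf
```

Calling CopyBroken() instead may appear to work for a long time. Whether and where it crashes depends on what happens to be located behind the buffer.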

Why does this not always happen when you overwrite a buffer?

Getting the heap data right at the end of the allocated buffer is a very rare condition. Windows often rounds the allocated buffer size up to page boundaries or for alignment purposes, so you often get more memory than you asked for, which makes a small overwrite have no effect. Another scenario is that the memory after your allocated buffer is simply marked as invalid, in which case you get an error when trying to write to it. Which of these scenarios actually happens depends on the sequence of memory allocations done by your program, which means that if you comment out another, totally unrelated part of the program, you may change the allocation sequence and the problem seemingly disappears. (Note that in this case only the symptom disappears, not the problem of writing over the end of a buffer.)

Solution:

As we saw above, the problem is not where the crash is. Fortunately, Windows provides a function to check whether the heap structures are still valid: HeapValidate(). Below is a piece of code that can be used to check the most used memory heaps in PureBasic (this works in PureBasic for Windows, 32-bit and 64-bit):

Procedure _TestHeaps(File$, Line)
  Protected StringHeap, MemoryBase, MemoryHeap

  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !extrn _PB_StringHeap
    !extrn _PB_Memory_Heap         

    !mov eax, dword [_PB_StringHeap]
    !mov [p.v_StringHeap], eax
    !mov eax, dword [_PB_MemoryBase]
    !mov [p.v_MemoryBase], eax
    !mov eax, dword [_PB_Memory_Heap]
    !mov [p.v_MemoryHeap], eax
  CompilerElse
    !extrn PB_StringHeap
    !extrn PB_Memory_Heap

    !mov rax, qword [PB_StringHeap]
    !mov [p.v_StringHeap], rax
    !mov rax, qword [_PB_MemoryBase]
    !mov [p.v_MemoryBase], rax
    !mov rax, qword [PB_Memory_Heap]
    !mov [p.v_MemoryHeap], rax
  CompilerEndIf

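  ; HeapValidate() returns 0 if the heap is damaged. Passing 0 as the
  ; last parameter validates the whole heap, not just a single block.
  If HeapValidate_(StringHeap, 0, 0) = 0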
    MessageRequester("StringHeap corrupted !", File$+" : "+Str(Line))
  EndIf

  If HeapValidate_(MemoryBase, 0, 0) = 0
    MessageRequester("MemoryBase heap corrupted !", File$+" : "+Str(Line))
  EndIf

  If HeapValidate_(MemoryHeap, 0, 0) = 0
    MessageRequester("AllocateMemory heap corrupted !", File$+" : "+Str(Line))
  EndIf
EndProcedure

Macro TestHeaps
  _TestHeaps(#PB_Compiler_File, #PB_Compiler_Line)
EndMacro 

Steps to find the bug:

  • Place a TestHeaps call right before the line that crashes. If you get one of the message requesters, you have a heap corruption. If not, then the problem is something else and the above code will not help.
  • Start placing TestHeaps calls in places that are executed before the crashing line. Start with a call only every so many lines and narrow it down later.
  • You need to find the line of code where TestHeaps reports nothing before it and reports a heap corruption after it. This is the line that causes all the problems.
  • Make sure you remove all this test code after fixing the bug, as it can cause a big performance hit in the program (see below).
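
The steps above can be sketched as follows. This is only an illustration and assumes that the _TestHeaps procedure and the TestHeaps macro from this article are defined earlier in the same source file. FillBuffer() and #Count are hypothetical names, and the off-by-one loop is just a stand-in for whatever code is under suspicion.

```purebasic
; Assumes the _TestHeaps procedure and the TestHeaps macro from this
; article are defined earlier in the same source file.
; FillBuffer() and #Count are hypothetical names for illustration.

#Count = 16

Procedure FillBuffer(*Buffer, Count)
  Protected i
  ; BUG: "0 To Count" runs Count + 1 times, so the last PokeL()
  ; writes 4 bytes past the end of the buffer.
  For i = 0 To Count
    PokeL(*Buffer + i * SizeOf(Long), i)
  Next
EndProcedure

*Buffer = AllocateMemory(#Count * SizeOf(Long))
If *Buffer
  TestHeaps                    ; nothing has been written yet
  FillBuffer(*Buffer, #Count)
  TestHeaps                    ; if a requester appears here, FillBuffer() is the culprit
  FreeMemory(*Buffer)
EndIf
```

If the second call reports a corruption while the first one stays silent, the search has been narrowed down to FillBuffer(), and further TestHeaps calls can be placed inside it. As explained earlier, a small overwrite does not always damage the heap data, so a silent check does not prove that the code in between is correct.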

So why doesn’t the PureBasic debugger make this check automatically?

The reason is that HeapValidate() has an effect on the future run of the program. The documentation says that it degrades performance and that this effect can last until the end of the program. My guess is that the check for a valid heap somehow reorganizes it into a less efficient state, which means that future allocations will be slower. This is why this check is not done by the debugger. Maybe there will be an option for this sometime in the future. Who knows?

3 thoughts on “Debugging tips – Part 1: Heap corruption”

  1. luis

    Very useful, thank you!

    “Maybe there will be an option for this somewhen in the future. who knows ?”

    It would be great, IMHO.

  2. byo

    One of the most amazing articles I’ve read this year in any programming related blogs. Keep up the good work.

    I can only imagine how hard it is to track bugs when they have to do with heap corruption.
