Monthly Archives: November 2008

Debugging tips – Part 2: Stack problems

Another common problem that the debugger does not really identify are stack problems. Fortunately, other than the heap problems discussed before, these are not hard to narrow down.

Symptoms:
  • A piece of code works on the main level, but fails when called inside a procedure or inside a Select block
  • The crash is usually when leaving the procedure (ProcedureReturn or EndProcedure)
  • Usually a call to a Dll or imported function is involved
Reason:

The x86 processor has several different so called calling conventions which define how a function call is made (were the arguments go and who is responsible for cleaning up after the call). In fact everybody can invent their own convention for this, and quite a number of compilers do this for performance reasons. (PureBasic itself does it for Library functions written in ASM. They use the stdcall convention, but receive their first argument in the EAX register instead of the stack). When calling an external function, the calling convention must match or there will be problems like the one we have here.

The most popular conventions on x86 are cdecl and stdcall. cdecl is the standard used by the C language, stdcall is most used by Windows dlls. PureBasic uses stdcall by default in most places. They both provide the function arguments on the stack in reverse order. The only main difference is in the fact that cdecl requires the caller to remove the arguments from the stack and stdcall requires the function to do this before returning (there is also a difference in symbol naming, but thats not relevant here). So if the caller assumes the wrong calling convention then either both try to remove the parameters from the stack or nobody does it. In both cases the stack is not the same after the function call than before, and this is a big problem.

Here we get to the reason why it works outside of a procedure and not inside. A call to a procedure stores the address where execution continues after the procedure on the stack as well. If a function call inside the procedure leaves the stack in a mess, then the ProcedureReturn cannot find the address to continue and uses a wrong value instead which leads to a crash. Outside of a procedure, nobody will notice the stack mess, as the stack is only used for function calls.

Solution:
  • PB provides support for both stdcall (the default) and cdecl calling conventions (the functions/keywords with a C in the name)
  • If you used CallFunction() or CallFunctionFast() try CallCFunction() or CallCFunctionFast() instead. (these are the cdecl versions of the same function)
  • If you used Prototype, try PrototypeC instead.
  • If the crash happens in a callback procedure which you passed to an external function, try defining the callback with ProcedureC instead.
  • Also check if the number of arguments and the argument types match, as this can lead to the same problem. (remember that CallFunction() and CallFunctionFast() default to the long type, so for quads and doubles you have to use prototypes for a correct call.)

Some rules of thumb:

  • On Windows all Dlls that come with Windows use the stdcall convention. Most other Dlls follow this model, so there are only a few that use cdecl. With static libraries (used with Import/ImportC), the division is not so clear. Just try it.
  • On Linux / Mac OSX, the dominating calling convention is cdecl. So pretty much every library you use will require the C type of function.

Stack problems can also arise when InlineASM code is used and the stack is changed. Those that use this should know what they are doing though. The x64 and PowerPC processors do not have the calling convention problems by the way, as the force one common calling convention on everybody. So a parameter mismatch is the most probable explanation here.

A sidenote: Window callbacks

On Windows, another condition which can cause a crash on the ProcedureReturn keyword is the window callback. This is not a stack issue then though. What it means is that the crash happened somewhere in the PureBasic event processing that happens after your window callback returns. The debugger just shows the ProcedureReturn line because this was the last correctly executed line in non-library code.

What this probably means is that you did something with Windows API calls that conflicts with the way PureBasic handles its events. This may be in the window callback directly, or by modifying some Gadget/Window related data that PureBasic is expecting to remain unchanged. If you have no clue what causes this, just post it on the forum. There are a lot of capable people around that can help with these things.