Page 1 of 1

PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 7:20 pm
by Mistrel
There is a very strange performance regression which occurs when allocating object IDs in increasing numerical order. Not only does each subsequent number take an increasingly long time to call a specific function but the memory usage increases significantly.

In this example I will create and free 32 images by doubling the static ID specific when calling CreateImage(). The time needed by this function will on average double for each subsequent call:

** Note all examples were compiled and run using PureBasic for x64:

Code: Select all

id=1

Repeat
  DisableDebugger
  start=ElapsedMilliseconds()
  CreateImage(id,1,1)
  EnableDebugger
  
  FreeImage(id)
  
  Debug "Id/Elapsed: "+Str(id)+"/"+Str(ElapsedMilliseconds()-start)
  id*2
  
  If id=4294967296
    Debug "Done!"
    End
  EndIf
ForEver
This is not the result of a dynamic array being populated as a cache because only 32 IDs were ever specified:

Code: Select all

1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
2147483648
This may be the result of some kind of cache where the numbers in between are also being allocated. This can be confirmed by analyzing memory usage. By placing a loop at the end it can be seen that this program consumed over 8.4GB of memory! These results are consistent between PureBasic 5.7x and PureBasic 6 alpha.

At first I thought that all numbers lower than a specified ID are always allocated but this is not the case. This can be demonstrated by performing the same loop but in reverse. The program runs and finishes almost instantly with none of the delay or large amount of memory consumption:

Code: Select all

id=4294967296

Repeat
  DisableDebugger
  start=ElapsedMilliseconds()
  CreateImage(id,1,1)
  EnableDebugger
  
  FreeImage(id)
  
  Debug "Id/Elapsed: "+Str(id)+"/"+Str(ElapsedMilliseconds()-start)
  id/2
  
  If id=1
    Debug "Done!"
    
    Repeat
      Delay(1)
    ForEver
  EndIf
ForEver
But this isn't the end of the story. Despite only consuming about 1MB of memory according to Task Manager, the program size will balloon to several gigabytes as it ends whether naturally or forcibly with Kill Program.

This is not problem created by CreateImage() and can be demonstrated equally with OpenWinodw()/CloseWindow().

I don't know what the cause of this is but I did not expect to find PureBasic to consuming gigabytes of memory when allocating a few IDs.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 7:34 pm
by STARGÅTE
If you use static numbers, the object library allocate an array with a size of the larges number.
CreateImage(1000, ...) allocates an array with 1001 elements to realize the fast access of all static numbers until this number.
CreateImage(1000000, ...) allocates an array with 1000001 elements...
Due to the re-allocation with increasing object index, the evaluation take longer with increasing object number, than with decreasing numbers.

Therefore the debugger gives a warning which you ignore. There is no bug!

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 7:40 pm
by Mistrel
Task Manager does not reflect this at runtime. When specifying a very large number the memory usage does not increase. But it does balloon when the program ends; I don't understand why it does that.

Code: Select all

DisableDebugger
CreateImage(4294967296,1,1)

Repeat
  Delay(1)
ForEver
The results are inconsistent depending on whether the numbers are provided in ascending or descending order. I don't know how this is implemented internally but if there is an array that is dynamically resizing itself then that is terrible.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 7:43 pm
by STARGÅTE
Mistrel wrote: Thu Sep 16, 2021 7:40 pm I don't see why a static array needs to be used here.
How else you want to refer an index (static object number) to the allocated object location (somewhere).
The array stores all the links (addresses) to the objects.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 7:46 pm
by Mistrel
It's called a hash or tree map. One has O(n) lookup and the other is O(logn). Either one would be just as fast as a static lookup table on modern computers, would use a tiny fraction of memory, and would be immune to this problem entirely.

A static array is not the only way to store a lookup table.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 7:47 pm
by infratec
In a map?

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 8:09 pm
by Mistrel
Demivec found where this is defined is in the documentation:
The static, indexed way, allows you to reference an object by a predefined numeric value. The first available index number is 0 and subsequent indexes are allocated sequentially. This means that if you use the number 0 and then the number 1000, 1001 indexes will be allocated and 999 (from 1 to 999) will be unused, which is not an efficient way to use indexed objects.
Maybe this implementation is historical. It also means that anything #PB_Any returns is no longer useful for many of the use cases I had been using it for with PureBasic (x86).

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 8:31 pm
by jacdelad
Mistrel wrote: Thu Sep 16, 2021 8:09 pm Demivec found where this is defined is in the documentation:
The static, indexed way, allows you to reference an object by a predefined numeric value. The first available index number is 0 and subsequent indexes are allocated sequentially. This means that if you use the number 0 and then the number 1000, 1001 indexes will be allocated and 999 (from 1 to 999) will be unused, which is not an efficient way to use indexed objects.
Maybe this implementation is historical. It also means that anything #PB_Any returns is no longer useful for many of the use cases I had been using it for with PureBasic (x86).
I don't know about your programming style, but #PB_Any it the way that ALWAYS works, if done right. The handle is always valid and there's nothing extra being reserved.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Thu Sep 16, 2021 9:18 pm
by STARGÅTE
Mistrel wrote: Thu Sep 16, 2021 7:46 pm A static array is not the only way to store a lookup table.
I know. But it is the fastest, without hashing, no collision handling etc.
Having the fastest access is important, while the static index is used at anyplace.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Fri Sep 17, 2021 8:39 am
by NicTheQuick
Mistrel wrote: Thu Sep 16, 2021 7:40 pm Task Manager does not reflect this at runtime. When specifying a very large number the memory usage does not increase. But it does balloon when the program ends; I don't understand why it does that.

Code: Select all

DisableDebugger
CreateImage(4294967296,1,1)

Repeat
  Delay(1)
ForEver
The results are inconsistent depending on whether the numbers are provided in ascending or descending order. I don't know how this is implemented internally but if there is an array that is dynamically resizing itself then that is terrible.
That makes sense too. If you allocate a bunch of memory your operating system will not take that into account until you really use it. Your program's memory space exists only virtually and gets mapped to the real physical memory in the background or maybe to your swap device/file if your memory is full. That's why your browser seems to have allocated 50GB of virtual memory but in reality it only uses 2GB or so. It's called Memory overcommitment.

Now if your program ends, Purebasic wants to check if there are objects existing in that huge array of object ids and iterates over it. While it iterates over these 4294967297 IDs it reads the value from the array and because of that your operating system really has to commit that memory to your process and you can see that in the taskmanager. By the way you can also enable the virtual memory column in the task manager to see how much memory a process really tries to use.

Re: PureBasic can consume gigabytes of memory when allocating only a few object IDs

Posted: Fri Sep 17, 2021 4:18 pm
by Mistrel
You're right. I see the memory allocation when I enable "Committed Size" in Task Manager.

It's odd how performance differs when allocating IDs incrementally vs decrementally. I also wonder what PureBasic is doing as it closes that causes the operating system to allocate memory. It shouldn't need to perform any iteration and there is no need to free memory before ending a program as this cleanup is performed automatically; trying to do this yourself can be dangerous if cleanup occurs within a race condition and can lead to your program crashing as it terminates.