pjay wrote: Sun Oct 01, 2023 10:51 am
If you create an array the whole allocated memory has to be cleared first or else every element would contain old data that was allocated before.
I know this, but so does AllocateMemory() in it's default state. The code below shows AllocateMemory() to be hugely times faster than Dim() for the same allocation unit.
I wonder if AllocateMemory() has some kind of OS level optimization / caching, or there's something I still don't understand - Either way it doesn't really matter, it's just a curiosity.
Code: Select all
#Count = 50 * (3840 * 2160) * 3 ;/ 50 raw 4k RGB images
Debug "Alloc size: " + StrF(#Count / (1024*1024),1)+"mb"
time = ElapsedMilliseconds()
Dim rgbarray.a(#count-1)
time_array = ElapsedMilliseconds() - time
time = ElapsedMilliseconds()
*mem = AllocateMemory(#count)
Debug "*Mem pointer: " + Str(*Mem)
If *mem : PokeA(*mem,255) : PokeA(*mem + (#count-1),255) : EndIf ;/ write to start and end points as test
time_alloc = ElapsedMilliseconds() - time
Debug "Time for Dimensioning array: " + Str(time_array) + "ms"
Debug "Time for Allocating memory: " + Str(time_alloc) + "ms"
On Windows,
AllocateMemory(…) is a
HeapAlloc_(…) call, specifically:
Code: Select all
; When #PB_Memory_NoClear
HeapAlloc_(__PB_GLOBAL_HEAP, 0, numBytes)
; When no flags are present (i.e., memory is zeroed)
HeapAlloc_(__PB_GLOBAL_HEAP, #HEAP_ZERO_MEMORY, numBytes)
Note: you can't trivially access __PB_GLOBAL_HEAP so just define it with GetProcessHeap_().
Meanwhile,
Dim is a non-zeroing
HeapAlloc_(…) call followed by zeroing with
memset(…), similar to:
Code: Select all
; Dim array.l(999)
#ARRAY_UNIT_COUNT = 1000
#ARRAY_UNIT_SIZE = SizeOf(Long)
; Request memory.
Define *array = HeapAlloc_(__PB_GLOBAL_HEAP, 0, #ARRAY_UNIT_COUNT * #ARRAY_UNIT_SIZE + #ARRAY_HEADER_SIZE)
; … omit for brevity: writing metadata to array header (it's of minimal cost)
; Zero-out memory.
memset(*array + #ARRAY_HEADER_SIZE, 0, #ARRAY_UNIT_COUNT * #ARRAY_UNIT_SIZE)
; … omit for brevity: when structural array, initialise structure fields (Map/List/Array)
The most costly part of this is the
memset(…), as it steps over all of the newly allocated memory and writes to it. Meanwhile, with
HeapAlloc_(…) the OS will use memory in reserve or zero pages on-demand as it becomes necessary, so it's nearly doing nothing and defers the work.
There just isn't the memory bandwidth to
actually zero out ~1.2GB of memory in the fraction of a millisecond
HeapAlloc_(…) takes.
You can see how
memset(…) really dominates time taken given this snippet:
Code: Select all
EnableExplicit
ImportC ""
memset(*dest, c.l, count.i)
EndImport
#Count = 50 * (3840 * 2160) * 3 ;/ 50 raw 4k RGB images
If Not OpenConsole()
MessageRequester("Fatal Error", "Oops failed to open console!", #PB_MessageRequester_Error)
End
EndIf
PrintN("Alloc size: " + StrF(#Count / (1024 * 1024), 1) + "mb")
; Allocate a block of memory zeroed the usual way.
; ═══════════════════════════════════════════════════════
Define startTime = ElapsedMilliseconds()
Define *mem = AllocateMemory(#Count)
PrintN("Allocate memory normal: " + Str(ElapsedMilliseconds() - startTime) + "ms")
FreeMemory(*mem)
; Allocate a block of memory zeroed like an array.
; ═══════════════════════════════════════════════════════
Define startTime = ElapsedMilliseconds()
; Don't clear with alloc.
Define *memLikeArray = AllocateMemory(#Count, #PB_Memory_NoClear)
; Clear with memset instead.
memset(*memLikeArray, 0, #Count)
PrintN("Allocate memory array-like: " + Str(ElapsedMilliseconds() - startTime) + "ms")
; Dim a plain array.
; ═══════════════════════════════════════════════════════
Define startTime = ElapsedMilliseconds()
Dim rgbArray.a(#Count - 1)
PrintN("Dim array: " + Str(ElapsedMilliseconds() - startTime) + "ms")
Input()
Results:
Code: Select all
Alloc size: 1186.5mb
Allocate memory typical: 0ms
Allocate memory array-like: 108ms
Dim array: 109ms
As you can see, the mere presence of
memset(…) contributes roughly the entire time taken for
Dim.
Try looping over the "0ms" allocated memory, jumping over large blocks (try: 2048, 4096, 8192, …) and zeroing a single integer in each. You should see significant elapsed time (relative the allocation), even with large blocks where the loop is doing very little work.