ArrayList - Doubly-linked Dynamic Array Data Structure

Share your advanced PureBasic knowledge/code with the community.
pjay
Enthusiast
Enthusiast
Posts: 252
Joined: Thu Mar 30, 2006 11:14 am

Re: ArrayList - Doubly-linked Dynamic Array Data Structure

Post by pjay »

If you create an array the whole allocated memory has to be cleared first or else every element would contain old data that was allocated before.
I know this, but so does AllocateMemory() in it's default state. The code below shows AllocateMemory() to be hugely times faster than Dim() for the same allocation unit.

I wonder if AllocateMemory() has some kind of OS level optimization / caching, or there's something I still don't understand - Either way it doesn't really matter, it's just a curiosity.

Code: Select all

#Count = 50 * (3840 * 2160) * 3 ;/ 50 raw 4k RGB images
Debug "Alloc size: " + StrF(#Count / (1024*1024),1)+"mb"

time = ElapsedMilliseconds()
Dim rgbarray.a(#count-1)
time_array = ElapsedMilliseconds() - time

time = ElapsedMilliseconds()
*mem = AllocateMemory(#count)
Debug "*Mem pointer: " + Str(*Mem)
If *mem : PokeA(*mem,255) : PokeA(*mem + (#count-1),255) : EndIf ;/ write to start and end points as test
time_alloc = ElapsedMilliseconds() - time

Debug "Time for Dimensioning array: " + Str(time_array) + "ms"
Debug "Time for Allocating memory: " + Str(time_alloc) + "ms"
User avatar
yuki
Enthusiast
Enthusiast
Posts: 101
Joined: Sat Mar 31, 2018 9:09 pm

Re: ArrayList - Doubly-linked Dynamic Array Data Structure

Post by yuki »

pjay wrote: Sun Oct 01, 2023 10:51 am
If you create an array the whole allocated memory has to be cleared first or else every element would contain old data that was allocated before.
I know this, but so does AllocateMemory() in it's default state. The code below shows AllocateMemory() to be hugely times faster than Dim() for the same allocation unit.

I wonder if AllocateMemory() has some kind of OS level optimization / caching, or there's something I still don't understand - Either way it doesn't really matter, it's just a curiosity.

Code: Select all

#Count = 50 * (3840 * 2160) * 3 ;/ 50 raw 4k RGB images
Debug "Alloc size: " + StrF(#Count / (1024*1024),1)+"mb"

time = ElapsedMilliseconds()
Dim rgbarray.a(#count-1)
time_array = ElapsedMilliseconds() - time

time = ElapsedMilliseconds()
*mem = AllocateMemory(#count)
Debug "*Mem pointer: " + Str(*Mem)
If *mem : PokeA(*mem,255) : PokeA(*mem + (#count-1),255) : EndIf ;/ write to start and end points as test
time_alloc = ElapsedMilliseconds() - time

Debug "Time for Dimensioning array: " + Str(time_array) + "ms"
Debug "Time for Allocating memory: " + Str(time_alloc) + "ms"
On Windows, AllocateMemory(…) is a HeapAlloc_(…) call, specifically:

Code: Select all

; When #PB_Memory_NoClear
HeapAlloc_(__PB_GLOBAL_HEAP, 0, numBytes)

; When no flags are present (i.e., memory is zeroed)
HeapAlloc_(__PB_GLOBAL_HEAP, #HEAP_ZERO_MEMORY, numBytes)
Note: you can't trivially access __PB_GLOBAL_HEAP so just define it with GetProcessHeap_().

Meanwhile, Dim is a non-zeroing HeapAlloc_(…) call followed by zeroing with memset(…), similar to:

Code: Select all

; Dim array.l(999)
#ARRAY_UNIT_COUNT = 1000
#ARRAY_UNIT_SIZE  = SizeOf(Long)

; Request memory.
Define *array = HeapAlloc_(__PB_GLOBAL_HEAP, 0, #ARRAY_UNIT_COUNT * #ARRAY_UNIT_SIZE + #ARRAY_HEADER_SIZE)

; … omit for brevity: writing metadata to array header (it's of minimal cost)

; Zero-out memory.
memset(*array + #ARRAY_HEADER_SIZE, 0, #ARRAY_UNIT_COUNT * #ARRAY_UNIT_SIZE)

; … omit for brevity: when structural array, initialise structure fields (Map/List/Array)
The most costly part of this is the memset(…), as it steps over all of the newly allocated memory and writes to it. Meanwhile, with HeapAlloc_(…) the OS will use memory in reserve or zero pages on-demand as it becomes necessary, so it's nearly doing nothing and defers the work.

There just isn't the memory bandwidth to actually zero out ~1.2GB of memory in the fraction of a millisecond HeapAlloc_(…) takes.

You can see how memset(…) really dominates time taken given this snippet:

Code: Select all

EnableExplicit

ImportC ""
  memset(*dest, c.l, count.i)
EndImport

#Count = 50 * (3840 * 2160) * 3 ;/ 50 raw 4k RGB images

If Not OpenConsole()
  MessageRequester("Fatal Error", "Oops failed to open console!", #PB_MessageRequester_Error)
  End
EndIf

PrintN("Alloc size: " + StrF(#Count / (1024 * 1024), 1) + "mb")

; Allocate a block of memory zeroed the usual way.
; ═══════════════════════════════════════════════════════
Define startTime = ElapsedMilliseconds()
Define *mem = AllocateMemory(#Count)
PrintN("Allocate memory normal: " + Str(ElapsedMilliseconds() - startTime) + "ms")
FreeMemory(*mem)

; Allocate a block of memory zeroed like an array.
; ═══════════════════════════════════════════════════════
Define startTime = ElapsedMilliseconds()
; Don't clear with alloc.
Define *memLikeArray = AllocateMemory(#Count, #PB_Memory_NoClear)
; Clear with memset instead.
memset(*memLikeArray, 0, #Count)
PrintN("Allocate memory array-like: " + Str(ElapsedMilliseconds() - startTime) + "ms")

; Dim a plain array.
; ═══════════════════════════════════════════════════════
Define startTime = ElapsedMilliseconds()
Dim rgbArray.a(#Count - 1)
PrintN("Dim array: " + Str(ElapsedMilliseconds() - startTime) + "ms")

Input()
Results:

Code: Select all

Alloc size: 1186.5mb
Allocate memory typical: 0ms
Allocate memory array-like: 108ms
Dim array: 109ms
As you can see, the mere presence of memset(…) contributes roughly the entire time taken for Dim.

Try looping over the "0ms" allocated memory, jumping over large blocks (try: 2048, 4096, 8192, …) and zeroing a single integer in each. You should see significant elapsed time (relative the allocation), even with large blocks where the loop is doing very little work.
pjay
Enthusiast
Enthusiast
Posts: 252
Joined: Thu Mar 30, 2006 11:14 am

Re: ArrayList - Doubly-linked Dynamic Array Data Structure

Post by pjay »

Great explanation Yuki, thanks for that.
Post Reply