Array lookup vs. maps speed test

Share your advanced PureBasic knowledge/code with the community.
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Array lookup vs. maps speed test

Post by Tenaja »

I have a program that has a list of constant strings, and I need to continually reference it to get its position number. I have been storing them in arrays, with a LookUp routine, but I thought it might be inefficient, so I cobbled up a speed test to compare it with a map approach:

Code: Select all

; This test shows Arrays to be ~ 5-20 times faster at initializing, due to being consecutive.
;    (The lower number is with higher Map slots, since it performs better with more.)
; However, maps are 250+ times faster to access randomly than scanning a large array of strings !!!!
; 
;
; Test results:
; Debug "Init array = " + Str(t2-t1)		; 15	(at 10,000 loops)
; Debug "Init Map = "   + Str(t3-t2)		; 360	(at 10,000 loops with 512 slots)
; 											; 125 with 2048 slots
; 											; 62 with 32768 slots <--------------
; 											
; Debug "Test Array = " + Str(t4-t3)		; 26396 to 29188	(at 10,000 loops)
; Debug "Test Map = "   + Str(t5-t4)		; 31 to 47			(at 10,000, with 32678 slots)
; 											; 109	(at 10,000 loops with 2048 slots)
; --------------------------------------------------------------------------------------------


#kwsize = 10
#loops = 10000
#arrsize = #kwsize * #loops


Structure KWstruct
	name.s
	num.i
EndStructure


; Global Dim ArrList.KWstruct(#kwsize * #loops)
Global Dim ArrList.s(#arrsize)
Global NewMap MapList.KWstruct(32768)				; size of map  2048) ; 

Global t1, t2, t3, t4, t5

Global FailArr, FailMap

DisableDebugger

Procedure InitARRAYlist()
	Protected.i i, x, count
	
	count=0
	
	For i = 0 To #loops - 1
		Restore ArrData
		x = 0
		For x= 0 To #kwsize - 1
; 			Read.s ArrList(count)\name
; 			ArrList(count)\name + Str(count)
; 			ArrList(count)\num = count
			Read.s ArrList(count)
			ArrList(count) + Str(count)
; 			ArrList(count) = count
			count + 1
		Next
	Next
	
EndProcedure
		

Procedure InitMAPlist()
	Protected.i i, x, count
	Protected.s s
	
	count=0
	
	For i = 0 To #loops - 1
		Restore ArrData
		x = 0
		For x= 0 To #kwsize - 1
			Read.s s
			s + Str(count)
			MapList(s)\name = s
			MapList(s)\num = count
			count + 1
		Next
	Next
EndProcedure



; Array Lookup... finding the Needle (string) in the Haystack (String array)
; 	returns the array position of the string "needle"

Procedure.l Lookup(Array Haystack.s(1), Needle.s, n.l)
	While n > 0
		If 	Haystack(n-1) = Needle
			Break
		EndIf
		n-1 
	Wend
	ProcedureReturn n
EndProcedure


Procedure TestArrLookup()
	Protected.i i, x, result
	Protected.s s
	
	count=0
	
	For i = 0 To #loops - 1
		Restore ArrData
		x = 0
		For x= 0 To #kwsize - 1
			Read.s s
			s + Str(count)
			result = Lookup(ArrList(), s, #arrsize)
			If result = 0 : FailArr + 1 : EndIf
			count + 1
		Next
	Next
EndProcedure



Procedure TestMapLookup()
	Protected.i i, x, result
	Protected.s s
	
	count=0
	
	For i = 0 To #loops - 1
		Restore ArrData
		x = 0
		For x= 0 To #kwsize - 1
			Read.s s
			s + Str(count)
			result = FindMapElement(MapList(), s)
			If result = 0 : FailMap + 1 : EndIf
			count + 1
		Next
	Next
EndProcedure



DataSection
	ArrData:
	Data.s	"Alabama" 
	Data.s	"Alaska" 
	Data.s	"Arizona"
	Data.s	"Arkansas"
	Data.s	"California"
	Data.s	"Colorado"
	Data.s	"Connecticut" 
	Data.s	"Delaware" 
	Data.s	"Florida" 
	Data.s	"Georgia" 
	Data.s	"" 
EndDataSection

t1 = ElapsedMilliseconds()

InitARRAYlist()
t2 = ElapsedMilliseconds()
InitMAPlist()
t3 = ElapsedMilliseconds()


TestArrLookup()
t4 = ElapsedMilliseconds()

TestMapLookup()
t5 = ElapsedMilliseconds()




EnableDebugger

Debug "Init array = " + Str(t2-t1)			; 15	(at 10,000)
Debug "Init Map = "   + Str(t3-t2)			; 360	(at 10,000 with 512 slots)
											; 125 with 2048 slots
											; 62 with 32768 slots <--------------
											
Debug "Test Array = " + Str(t4-t3)			; 26396 to 29188	(at 10,000)
Debug "Test Map = "   + Str(t5-t4)			; 31 to 47			(at 10,000, with 32678 slots)
											; 109	with 2048 slots

											
Debug "Fails: " + Str(FailArr) + ", " + Str(FailMap) 
I attempted to make it so the required steps were similar, given the task. Also note that I did disable the debugger for the speed tests. As the comment shows, these are my conclusions:
This test shows Arrays to be 5-20 times faster at initializing, due to being consecutive. (The lower number [i.e. 5x] is with higher Map slots, since it performs better with more.)

However, maps are 250+ times faster (even with only 2048 slots) to access randomly than scanning a large array of strings !!!! (I had to make it large enough to register some time. If you bump the map slots up to 32768, the arrays take over 700x as much time.


So, for those wanting to know which is best for them, it all depends on what you are doing with it. If you are placing data most of the time, arrays are quite a bit faster. If you are looking up strings to get a reference number, maps are FAR faster. Also note the speed difference in loading

Now, if anybody knows how to calculate the memory requirements of a "slot", then please go here to post it:
How much ram per Map slot? http://purebasic.fr/english/viewtopic.php?f=13&t=54407
(This is the Tips forum, so that is why I made two threads.)
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Array lookup vs. maps speed test

Post by Demivec »

Tenaja wrote:I attempted to make it so the required steps were similar, given the task. Also note that I did disable the debugger for the speed tests. As the comment shows, these are my conclusions:
This test shows Arrays to be 5-20 times faster at initializing, due to being consecutive. (The lower number [i.e. 5x] is with higher Map slots, since it performs better with more.)
IMHO it really doesn't make sense to not take advantage of all the features of the datatype when conducting the speed test.

To make the test more meaningful the look-up procedure for the array shouldn't use a sequential search but should use something more efficient, such as a binary search. The init routine for the array should end by sorting the array to prepare it for searching.
User avatar
ts-soft
Always Here
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Array lookup vs. maps speed test

Post by ts-soft »

Tenaja wrote:Also note that I did disable the debugger for the speed tests.
This is not the same as without debugger compiling :wink:
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
Image
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Array lookup vs. maps speed test

Post by Demivec »

Here are changes for the Array Procedures so that the array is sorted as part of the init and so that a binary search is used for the lookup:

Code: Select all

Procedure InitARRAYlist()
  Protected.i i, x, count
  
  count=0
  
  For i = 0 To #loops - 1
    Restore ArrData
    x = 0
    For x= 0 To #kwsize - 1
      ;          Read.s ArrList(count)\name
      ;          ArrList(count)\name + Str(count)
      ;          ArrList(count)\num = count
      Read.s ArrList(count)
      ArrList(count) + Str(count)
      ;          ArrList(count) = count
      count + 1
    Next
  Next
  SortArray(ArrList(), #PB_Sort_Ascending)
EndProcedure

; Array Lookup... finding the Needle (string) in the Haystack (String array)
;    returns the array position of the string "needle" or -1 if not found

Procedure.l Lookup(Array Haystack.s(1), Needle.s)
  Protected result, middleIndex, lowestIndex, highestIndex = ArraySize(Haystack())
  
  Repeat
    middleIndex = (highestIndex + lowestIndex) >> 1
    result = CompareMemoryString(@Haystack(middleIndex), @Needle) ;,#PB_String_NoCase)
    
    If result > 0       
      highestIndex = middleIndex - 1           
    ElseIf result < 0
      lowestIndex = middleIndex + 1
    Else
      ProcedureReturn middleIndex
    EndIf
  Until lowestIndex > highestIndex
  
  ProcedureReturn -1 ;not present
EndProcedure


Procedure TestArrLookup()
  Protected.i i, x, result
  Protected.s s
  
  count=0
  
  For i = 0 To #loops - 1
    Restore ArrData
    x = 0
    For x= 0 To #kwsize - 1
      Read.s s
      s + Str(count)
      result = Lookup(ArrList(), s)
      If result = -1 : FailArr + 1 : EndIf
      count + 1
    Next
  Next
EndProcedure
With this improved search I show the Array look-up as only taking twice as long as the Map look-up (with 32768 slots). The Array look-up is faster when the Map uses only 8192 slots.

Replace the debug output at the end with this code (never compile with the debugger when doing speed tests, whether or not it is enabled :mrgreen: ):

Code: Select all

Define output$
output$ = "Init array = " + Str(t2-t1)         ; 15   (at 10,000)
output$ + #crlf$ + "Init Map = "   + Str(t3-t2)         ; 360   (at 10,000 with 512 slots)
                                 ; 125 with 2048 slots
                                 ; 62 with 32768 slots <--------------
                                 
output$ + #crlf$ + "Test Array = " + Str(t4-t3)         ; 26396 to 29188   (at 10,000)
output$ + #crlf$ + "Test Map = "   + Str(t5-t4)         ; 31 to 47         (at 10,000, with 32678 slots)
                                 ; 109   with 2048 slots

                                 
output$ + #crlf$ + "Fails: " + Str(FailArr) + ", " + Str(FailMap)

MessageRequester("Results", output$)
Last edited by Demivec on Sat Apr 20, 2013 7:01 pm, edited 2 times in total.
Little John
Addict
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Array lookup vs. maps speed test

Post by Little John »

Demivec wrote:IMHO it really doesn't make sense to not take advantage of all the features of the datatype when conducting the speed test.

To make the test more meaningful the look-up procedure for the array shouldn't use a sequential search but should use something more efficient, such as a binary search.
Yep!
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Array lookup vs. maps speed test

Post by Tenaja »

original text wrote:Very interesting. I guess this is a perfect example of how a mediocre Help file severely slows progress. Just the fact that CompareMemoryString() is not even mentioned in the String section of the Help file is frustrating--I would have used it if aware of it.

Not only that, but why on earth does Fred not parse out the string comparison to use that routine?!?!


I just ran a few more tests using CompareMemoryString(), and on my system the timing was identical to using If Haystack(n-1) = Needle. This is true of both debug and exe files.
Last edited by Tenaja on Sun Apr 21, 2013 2:27 am, edited 1 time in total.
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Array lookup vs. maps speed test

Post by Tenaja »

ts-soft wrote:
Tenaja wrote:Also note that I did disable the debugger for the speed tests.
This is not the same as without debugger compiling :wink:
"The same" or not, a compiled EXE repeatedly gave identical timings, printing to a console or the debug window.
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Array lookup vs. maps speed test

Post by Tenaja »

Demivec wrote:Here are changes for the Array Procedures so that the array is sorted as part of the init and so that a binary search is used for the lookup:

SNIP...
Thanks for that--I saved it for future reference. My app happens to have "random" strings in it, added as it goes, so sorting it will be very cumbersome. I just chose the strings I did for simplicity.
Post Reply