netmaestro wrote:I agree with you on the list insertion, but I can't on the sorted array search. ....although not quite as fast as list insertion or sorted array search
For 10 000 elements a binary search will definetely win. Of course, the hash table wins by a far amount when the number of items is absolutely enormous. But one million items is not enormous.idle wrote:So for 10,000 elements a binary search is often faster but compared to 1,000,000 elements there's no comparison.
I tested my binary search against srod's hash functions.
Number of keys: 1 158 189 strings from a dictionary
Sorting and adding the keys and values to an array for binary search: 844 ms
Adding the keys and values to the hash table: 1625 ms
The lookup was performed 1 000 000 times with the same random keys for both algorithms.
Hash lookup: 1984 ms
Binary search lookup: 2250 ms
Over a trial of one million lookup, binary search loses by only 275 ms, or 0,000275 ms per lookup.
A "clear" win for the hash table, except that the hash table uses absolutely ridiculous amounts of memory. For making the hash table perform like this, I used a table size of two million (almost twice as large as the key set). This means 841 811 additional array elements compared to the binary search.
Also each hash item has a field for table id and the calculated hash. This is 8 bytes extra for each item.
And the hash items are stored in a liked list, again 8 bytes extra for each item.
So, if my calculations are correct, the hash table used an additional 841 811*4 + 8*1 158 189 + 8*1 158 189 bytes, or 20.8 megabytes compared to the binary search array. (Srod, please look over this.)
The size of the dictionary was only 11,5 mb.
If I lower the hash table size to 200 000 elemens, the hash table loses on the lookup, and still uses more memory than the sorted array.
Idle said there was no comparison with one million elements, I disagree. Both are fair contestants depending on your need.
As for PB's new map library. Insertion: 97 seconds. Lookup: 108 seconds. The performance seems horrific. Good thing it's a beta.
My benchmarking code:
Code: Select all
XIncludeFile "repository\hashtable.pb" ; srod's hashtable
Structure SPair
Name.s
Value.l
EndStructure
Procedure BS_GetItem(Array Arr.SPair(1), Max.l, Name.s)
Protected Low, High, Mid
High = Max
While Low < High
Mid = (Low + High) >> 1
If Arr(Mid)\Name < Name
Low = Mid + 1
Else
High = Mid
EndIf
Wend
If Low < Max And Arr(Low)\Name = Name
ProcedureReturn Low
EndIf
ProcedureReturn -1
EndProcedure
; Load data
Max = 50000
Global Dim Strings.s(Max)
ReadFile(0, "c:\dic-0294.txt")
While Not Eof(0)
Strings(I) = ReadString(0)
I + 1
If I > Max
Max + 50000
ReDim Strings.s(Max)
EndIf
Wend
Max = I - 1
ReDim Strings.s(Max)
MessageRequester("Data loaded", Str(Max) + " words loaded")
; 1 158 189 words loaded
; Add data to array
time = ElapsedMilliseconds()
SortArray(Strings(), #PB_Sort_Ascending)
Global Dim S2.SPair(I)
For I = 0 To Max
S2(I)\Name = Strings(I)
S2(I)\Value = I
Next
time = ElapsedMilliseconds()-time
MessageRequester("Data added", "Adding and sorting to array took " + Str(time) + "ms")
; Add data to srod hash table
time = ElapsedMilliseconds()
HTid = HT_CreateHashTable(2000000)
For i = 0 To Max
HT_AddItem(HTid, Strings(I), I)
Next
time = ElapsedMilliseconds()-time
MessageRequester("Data added", "Data added to srod's hash table took " + Str(time) + "ms")
; Add Data To PB's map
compilerif 0
time = ElapsedMilliseconds()
NewMap Pbmap.i()
For I = 0 To Max
Pbmap(Strings(I)) = I
Next
time = ElapsedMilliseconds()-time
MessageRequester("Data added", "Data added to map took " + Str(time) + "ms")
compilerendif
; All should of course lookup the same values!
#Tries = 1000000
Global Dim Lookups(#Tries)
For I = 0 To #Tries
Lookups(I) = Random(Max)
Next
time = ElapsedMilliseconds()
For U = 0 To #Tries
HT_GetItem(HTid, Strings(Lookups(U)))
Next
MessageRequester("Srod Hash lookup", Str(ElapsedMilliseconds()-time))
time = ElapsedMilliseconds()
For U = 0 To #Tries
BS_GetItem(S2(), Max, Strings(Lookups(U)))
Next
MessageRequester("Binary search", Str(ElapsedMilliseconds()-time))
compilerif 0
time = ElapsedMilliseconds()
For U = 0 To #Tries
Pbmap(Strings(Lookups(U)))
Next
MessageRequester("PB Map lookup", Str(ElapsedMilliseconds()-time))
compilerendif