my hash function looks much cleaner (and more generic) than yours
Well, I think you have some work to do yet on it. Here's a comparison using this item list:
http://www.lloydsplace.com/sowpods2.zip
Code: Select all
Procedure.l djb2(tablesize, *key.CHARACTER)
Protected hash=5381
While *key\c <> 0
hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
*key+1
Wend
If hash>=0
ProcedureReturn hash%(tablesize+1)
Else
ProcedureReturn -(hash%(tablesize+1))
EndIf
EndProcedure
Structure hashtable_char
c.c
EndStructure
Procedure.i hashtable_hashingfunction(tablesize, string$)
*p.hashtable_char = @string$
s = SizeOf(*p\c)
; calculate hash value using the shift-add algorithm
For i = 1 To Len(string$)
h << 4
h + *p\c
*p + s
Next
; make sure h is positive
If h < 0
h = ~h + 1
EndIf
ProcedureReturn h%tablesize+1
EndProcedure
Debug "Item count = 216555"
Debug "Bucket size = 10 million slots"
Debug ""
Dim a$(10000000)
collisioncount = 0
OpenFile(0,"sowpods2.txt")
While Not Eof(0)
txt$ = ReadString(0)
hash = hashtable_hashingfunction(10000000, txt$)
If a$(hash)=""
a$(hash) = txt$
Else
collisioncount+1
EndIf
Wend
Debug "This algorithm: " + Str(collisioncount) + " collisions"
Dim a$(10000000)
collisioncount = 0
OpenFile(0,"sowpods2.txt")
While Not Eof(0)
txt$ = ReadString(0)
hash = djb2(10000000, @txt$)
If a$(hash)=""
a$(hash) = txt$
Else
collisioncount+1
EndIf
Wend
Debug "djb2 algorithm: " + Str(collisioncount) + " collisions"
The code compares the collision frequency of your hash algorithm to that of the djb2 hash on the same 216555 items in a 10-million-slot array. You have 50k+ collisions vs 2.4k for djb2.
Also, the following test reveals a serious flaw in your hash:
Code: Select all
Procedure.l djb2(tablesize, *key.CHARACTER)
Protected hash=5381
While *key\c <> 0
hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
*key+1
Wend
If hash>=0
ProcedureReturn hash%(tablesize+1)
Else
ProcedureReturn -(hash%(tablesize+1))
EndIf
EndProcedure
Structure hashtable_char
c.c
EndStructure
Procedure.i hashtable_hashingfunction(tablesize, string$)
*p.hashtable_char = @string$
s = SizeOf(*p\c)
; calculate hash value using the shift-add algorithm
For i = 1 To Len(string$)
h << 4
h + *p\c
*p + s
Next
; make sure h is positive
If h < 0
h = ~h + 1
EndIf
ProcedureReturn h%tablesize+1
EndProcedure
t$ = "abcdefghijklmnopqrstuvwxyz"
For i=0 To 16
a$ = Right(t$,Len(t$)-i)
Debug djb2(1000, @a$) ; well-distributed results
Next
For i=0 To 16
a$ = Right(t$,Len(t$)-i)
Debug hashtable_hashingfunction(1000, a$) ; all results = 31
Next