Page 3 of 3
Posted: Sat Apr 18, 2009 4:31 am
by kandl
i don't know why no one commented on my thread but my hash function looks much cleaner (and more generic) than yours
http://www.purebasic.fr/english/viewtopic.php?t=37060
Code: Select all
Structure hashtable_char
c.c
EndStructure
Procedure.i hastable_hashingfunction(string$)
*p.hashtable_char = @string$
s = SizeOf(*p\c)
; calculate hash value using the shift-add algorithm
For i = 1 To Len(string$)
h << 4
h + *p\c
*p + s
Next
; make sure h is positive
If h < 0
h = ~h + 1
EndIf
ProcedureReturn h
EndProcedure
Posted: Sat Apr 18, 2009 4:52 am
by netmaestro
my hash function looks much cleaner (and more generic) than yours
Well, I think you have some work to do yet on it. Here's a comparison using this item list:
http://www.lloydsplace.com/sowpods2.zip
Code: Select all
Procedure.l djb2(tablesize, *key.CHARACTER)
Protected hash=5381
While *key\c <> 0
hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
*key+1
Wend
If hash>=0
ProcedureReturn hash%(tablesize+1)
Else
ProcedureReturn -(hash%(tablesize+1))
EndIf
EndProcedure
Structure hashtable_char
c.c
EndStructure
Procedure.i hashtable_hashingfunction(tablesize, string$)
*p.hashtable_char = @string$
s = SizeOf(*p\c)
; calculate hash value using the shift-add algorithm
For i = 1 To Len(string$)
h << 4
h + *p\c
*p + s
Next
; make sure h is positive
If h < 0
h = ~h + 1
EndIf
ProcedureReturn h%tablesize+1
EndProcedure
Debug "Item count = 216555"
Debug "Bucket size = 10 million slots"
Debug ""
Dim a$(10000000)
collisioncount = 0
OpenFile(0,"sowpods2.txt")
While Not Eof(0)
txt$ = ReadString(0)
hash = hashtable_hashingfunction(10000000, txt$)
If a$(hash)=""
a$(hash) = txt$
Else
collisioncount+1
EndIf
Wend
Debug "This algorithm: " + Str(collisioncount) + " collisions"
Dim a$(10000000)
collisioncount = 0
OpenFile(0,"sowpods2.txt")
While Not Eof(0)
txt$ = ReadString(0)
hash = djb2(10000000, @txt$)
If a$(hash)=""
a$(hash) = txt$
Else
collisioncount+1
EndIf
Wend
Debug "djb2 algorithm: " + Str(collisioncount) + " collisions"
The code compares the collision frequency of your hash algorithm to that of the djb2 hash on the same 216555 items in a 10-million-slot array. You have 50k+ collisions vs 2.4k for djb2.
Also, the following test reveals a serious flaw in your hash:
Code: Select all
Procedure.l djb2(tablesize, *key.CHARACTER)
Protected hash=5381
While *key\c <> 0
hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
*key+1
Wend
If hash>=0
ProcedureReturn hash%(tablesize+1)
Else
ProcedureReturn -(hash%(tablesize+1))
EndIf
EndProcedure
Structure hashtable_char
c.c
EndStructure
Procedure.i hashtable_hashingfunction(tablesize, string$)
*p.hashtable_char = @string$
s = SizeOf(*p\c)
; calculate hash value using the shift-add algorithm
For i = 1 To Len(string$)
h << 4
h + *p\c
*p + s
Next
; make sure h is positive
If h < 0
h = ~h + 1
EndIf
ProcedureReturn h%tablesize+1
EndProcedure
t$ = "abcdefghijklmnopqrstuvwxyz"
For i=0 To 16
a$ = Right(t$,Len(t$)-i)
Debug djb2(1000, @a$) ; well-distributed results
Next
For i=0 To 16
a$ = Right(t$,Len(t$)-i)
Debug hashtable_hashingfunction(1000, a$) ; all results = 31
Next
Posted: Sat Apr 18, 2009 12:54 pm
by srod
The djb2 algorithm is well understood and well tested - used for many hash-table type implementations. It is mathematically proven to produce relatively uniform results which is exactly what you want for this kind of thing.
kandl, the algorithm you are using is very similar in that both are shift-add algorithms. Easy to see though that the code you are using will not produce results which are as uniformally spread out.
don't know why no one commented on my thread but my hash function looks much cleaner (and more generic) than yours
No need to get personal. As I said, both are similar algorithms so bang goes the generic argument there!
@Netty : yes that is a good suggestion there. Just one tiny modification required...
Thanks.