A OOP class for dynamically implementing hash tables!

Share your advanced PureBasic knowledge/code with the community.
kandl
User
User
Posts: 11
Joined: Sat Apr 11, 2009 6:54 am

Post by kandl »

i don't know why no one commented on my thread but my hash function looks much cleaner (and more generic) than yours

http://www.purebasic.fr/english/viewtopic.php?t=37060

Code: Select all


Structure hashtable_char
  c.c
EndStructure 

Procedure.i hastable_hashingfunction(string$)

  *p.hashtable_char = @string$
  s = SizeOf(*p\c)

  ; calculate hash value using the shift-add algorithm
  For i = 1 To Len(string$)
    h << 4
    h + *p\c
    *p + s
  Next
 
  ; make sure h is positive
  If h < 0
    h = ~h + 1
  EndIf

  ProcedureReturn h
 
EndProcedure 

User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

my hash function looks much cleaner (and more generic) than yours
Well, I think you have some work to do yet on it. Here's a comparison using this item list:

http://www.lloydsplace.com/sowpods2.zip

Code: Select all


Procedure.l djb2(tablesize, *key.CHARACTER)
  Protected hash=5381
  While *key\c <> 0
    hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
    *key+1
  Wend
  If hash>=0
    ProcedureReturn hash%(tablesize+1)
  Else
    ProcedureReturn -(hash%(tablesize+1))
  EndIf
EndProcedure


Structure hashtable_char 
  c.c 
EndStructure 

Procedure.i hashtable_hashingfunction(tablesize, string$) 

  *p.hashtable_char = @string$ 
  s = SizeOf(*p\c) 

  ; calculate hash value using the shift-add algorithm 
  For i = 1 To Len(string$) 
    h << 4 
    h + *p\c 
    *p + s 
  Next 
  
  ; make sure h is positive 
  If h < 0 
    h = ~h + 1 
  EndIf 

  ProcedureReturn h%tablesize+1
  
EndProcedure 


Debug "Item count = 216555"
Debug "Bucket size = 10 million slots"
Debug ""

Dim a$(10000000)

collisioncount = 0
OpenFile(0,"sowpods2.txt")
While Not Eof(0) 
  txt$ = ReadString(0)
  hash = hashtable_hashingfunction(10000000, txt$)
  If a$(hash)=""
    a$(hash) = txt$
  Else
    collisioncount+1
  EndIf
Wend

Debug "This algorithm: " + Str(collisioncount) + " collisions"


Dim a$(10000000)

collisioncount = 0
OpenFile(0,"sowpods2.txt")
While Not Eof(0) 
  txt$ = ReadString(0)
  hash = djb2(10000000, @txt$)
  If a$(hash)=""
    a$(hash) = txt$
  Else
    collisioncount+1
  EndIf
Wend

Debug "djb2 algorithm: " + Str(collisioncount) + " collisions"
The code compares the collision frequency of your hash algorithm to that of the djb2 hash on the same 216555 items in a 10-million-slot array. You have 50k+ collisions vs 2.4k for djb2.

Also, the following test reveals a serious flaw in your hash:

Code: Select all

Procedure.l djb2(tablesize, *key.CHARACTER)
  Protected hash=5381
  While *key\c <> 0
    hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
    *key+1
  Wend
  If hash>=0
    ProcedureReturn hash%(tablesize+1)
  Else
    ProcedureReturn -(hash%(tablesize+1))
  EndIf
EndProcedure

Structure hashtable_char 
  c.c 
EndStructure 

Procedure.i hashtable_hashingfunction(tablesize, string$) 

  *p.hashtable_char = @string$ 
  s = SizeOf(*p\c) 

  ; calculate hash value using the shift-add algorithm 
  For i = 1 To Len(string$) 
    h << 4 
    h + *p\c 
    *p + s 
  Next 
  
  ; make sure h is positive 
  If h < 0 
    h = ~h + 1 
  EndIf 

  ProcedureReturn h%tablesize+1
  
EndProcedure 

t$ = "abcdefghijklmnopqrstuvwxyz"
For i=0 To 16
  a$ = Right(t$,Len(t$)-i)
  Debug djb2(1000, @a$) ; well-distributed results
Next

For i=0 To 16
  a$ = Right(t$,Len(t$)-i)
  Debug hashtable_hashingfunction(1000, a$) ; all results = 31
Next
BERESHEIT
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

The djb2 algorithm is well understood and well tested - used for many hash-table type implementations. It is mathematically proven to produce relatively uniform results which is exactly what you want for this kind of thing.

kandl, the algorithm you are using is very similar in that both are shift-add algorithms. Easy to see though that the code you are using will not produce results which are as uniformally spread out.
don't know why no one commented on my thread but my hash function looks much cleaner (and more generic) than yours
No need to get personal. As I said, both are similar algorithms so bang goes the generic argument there!


@Netty : yes that is a good suggestion there. Just one tiny modification required...

Code: Select all

*key+SizeOf(CHARACTER)
:wink:

Thanks.
I may look like a mule, but I'm not a complete ass.
Post Reply