Hash tables.

Share your advanced PureBasic knowledge/code with the community.
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

Large data sizes are where hash tables come into their own. One common use for them is in chess algorithms where the evaluations for many millions of possibilities must be searched over and over again. You won't find a mainstream chess program made today that doesn't employ hash tables, as there just isn't anything that comes close for huge jobs.

srod, thanks for posting that! I'll be putting it to work pretty soon coming up with a chess program that'll beat the world champion's cousin's retarded brother.
BERESHEIT
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

netmaestro wrote:srod, thanks for posting that! I'll be putting it to work pretty soon coming up with a chess program that'll beat the world champion's cousin's retarded brother.
:lol:

Too late, I wrote one of those a few years back except that it had a tendency to throw the game after two moves!
I may look like a mule, but I'm not a complete ass.
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Post by kinglestat »

typically when I am shopping for an idea from my salad days I find srod has anticipated me and already produced working code!!
Guys, you have other in your blood stream!

and thanks

KingLestat
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

I like to anticipate people's needs! :wink: Although my girlfriend doesn't think so! :?

You're welcome Kinglestat, I hope it's useful.
I may look like a mule, but I'm not a complete ass.
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Post by blueznl »

you people are too far ahead of me, but, euh, could i look at it like a long array like this:

Code: Select all

Structure hash_table
  hash.s
  text.s
EndStructure

Dim test.hash_table(100)

For n = 1 To 100
  test(n)\text = Str(Random(999999))+Str(Random(999999))+Space(100)
  test(n)\hash = MD5Fingerprint(test(n)\text,100)
Next n
then sort the array on the hash and use the hash to find the string back using binairy search?
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

blueznl wrote:you people are too far ahead of me, but, euh, could i look at it like a long array like this:

Code: Select all

Structure hash_table
  hash.s
  text.s
EndStructure

Dim test.hash_table(100)

For n = 1 To 100
  test(n)\text = Str(Random(999999))+Str(Random(999999))+Space(100)
  test(n)\hash = MD5Fingerprint(test(n)\text,100)
Next n
then sort the array on the hash and use the hash to find the string back using binairy search?

If you're going to do this, why not just sort on the original text and then use a binary search etc? The hash is pretty much redundant in the above code as you're using a text string to index another text string!


A proper hash table has a number of advantages over the method you've outlined, including:
  • faster access - could be as quick as single lookup (order 1) as opposed to order log2 (n) etc.
  • the hash is numeric as opposed to textual and thus links directly into the underlying table/array
  • no need to continually ensure that the data is ordered (to facilitate the binary chop)
etc.

There's probably more! :)

Hash tables are used extensively.
I may look like a mule, but I'm not a complete ass.
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Post by blueznl »

'cause searching through a list with elements of variable length means you'll need either another list to index the exact starting positions of the strings, or you have to keep track of all the string lengths

also on very long yet similar texts a search on the actual text is perhaps slower, i dunno' :-)

simplified: i haven't got a clue what we're doing here :-)
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

blueznl wrote:'cause searching through a list with elements of variable length means you'll need either another list to index the exact starting strings, or you have to keep track of all the string lengths
Not sure I understand! :?
I may look like a mule, but I'm not a complete ass.
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Hi srod, any chance for a binary version :D ?

Great work!
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Inf0Byt3 wrote:Hi srod, any chance for a binary version :D ?

Great work!
Sorry, I'm not sure what you mean?

Do you mean one which creates hashes/indexes based upon binary data?

If so, then it could easily be achieved by first creating a MD5Fingerprint hash code and then feeding this to the above library etc. It's probably not worth adjusting the library to work with binary data directly, just throw a little function into your own program which creates MD5Fingerprints etc.
I may look like a mule, but I'm not a complete ass.
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Do you mean one which creates hashes/indexes based upon binary data?
Yup, sorry for the ambiguity... Thank you for the great code and the advice :D!
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

You're welcome and thank you for the kind words. :)

I'm glad the code is useful.
I may look like a mule, but I'm not a complete ass.
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Post by kinglestat »

I have a question
about these 2 lines in HT_AddItem(HTid, key$, value) procedure

AddElement(_HTentries())
PokeL(index, _HTentries())

isnt the poke writing to the address of the linked list?
and the linked list is private memory (memory being handled by PB)

I am following the logic, and can see the results...but I am lost with this.
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
User avatar
DoubleDutch
Addict
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom
Contact:

Post by DoubleDutch »

I already use a hash table (with collision possibility) in my spellchecker routine. After table creation I post sort it so that it can be searched with a binary search. Because of collision possibility I then do a sequential search from the start of the collision "pack".

It seems pretty fast and it searches up to 2x160,000 words on each cursor position change (as you type spell checking). Even on a slow machine you don't know it's doing it (apart from the fact that it underlines the spelling errors).

The reason I check up to 2x the number of words is because if they type a space I have to check left and right words. :)

The post sort/binary search method can only be used if data is sorted each time it's added to the database.
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

kinglestat wrote:I have a question
about these 2 lines in HT_AddItem(HTid, key$, value) procedure

AddElement(_HTentries())
PokeL(index, _HTentries())

isnt the poke writing to the address of the linked list?
and the linked list is private memory (memory being handled by PB)

I am following the logic, and can see the results...but I am lost with this.
Hi,

the Poke is writing to the address pointed to by the value of 'index' and it is writing the address of the newly created element in the linked list.

Let me try and explain.

A hash table is simply a 1 dimensional array of pointers. For example :

Code: Select all

HT_CreateHashTable(20)
will allocate enough memory to store 20 pointers. Each pointer points to an element in the global linked list.

When you add a key/value to the hash table, the key gets 'hashed' in order to produce an index into the array of pointers.

For example, with the hash table created above, the line :

Code: Select all

HT_AddItem(HTid, "entry1",100)
with the key "entry1", produces a hash of 13.
This value (13) means that the value we wish to store (100 in this case) will, in principal, be placed within the 13th entry in the hash table.

However, because the 13th entry may already contain values from other keys generating the same hash code, we need a way of storing multiple items in this same element. Thus, instead of placing the value of 100 directly in this 13th element we instead place it into a linked list and place the address of this linked list element into the 13th element of the hash table (with the PokeL() statement).

This is done in such a way that all the entries generating the same hash code from the same table form consecutive elements in the linked list.

Now, when you then issue the following command :

Code: Select all

HT_GetItem(HTid, "entry1")
to retrieve this value, the library rehashes the key "entry1" to get the hash code 13 (again). This is then used to lookup the first entry in the linked list from this hash table sharing the hash code of 13. A quick search through these linked list items either then finds the required value or it doesn't if the key does not exits.

I hope this helps. :)
Last edited by srod on Wed Nov 07, 2007 12:13 pm, edited 1 time in total.
I may look like a mule, but I'm not a complete ass.
Post Reply