Page 2 of 5

Posted: Sun Nov 19, 2006 7:16 pm
by netmaestro
Large data sizes are where hash tables come into their own. One common use for them is in chess algorithms where the evaluations for many millions of possibilities must be searched over and over again. You won't find a mainstream chess program made today that doesn't employ hash tables, as there just isn't anything that comes close for huge jobs.

srod, thanks for posting that! I'll be putting it to work pretty soon coming up with a chess program that'll beat the world champion's cousin's retarded brother.

Posted: Sun Nov 19, 2006 7:19 pm
by srod
netmaestro wrote:srod, thanks for posting that! I'll be putting it to work pretty soon coming up with a chess program that'll beat the world champion's cousin's retarded brother.
:lol:

Too late, I wrote one of those a few years back except that it had a tendency to throw the game after two moves!

Posted: Tue Nov 21, 2006 11:36 am
by kinglestat
typically when I am shopping for an idea from my salad days I find srod has anticipated me and already produced working code!!
Guys, you have other in your blood stream!

and thanks

KingLestat

Posted: Tue Nov 21, 2006 12:03 pm
by srod
I like to anticipate people's needs! :wink: Although my girlfriend doesn't think so! :?

You're welcome Kinglestat, I hope it's useful.

Posted: Tue Nov 21, 2006 11:20 pm
by blueznl
you people are too far ahead of me, but, euh, could i look at it like a long array like this:

Code: Select all

Structure hash_table
  hash.s
  text.s
EndStructure

Dim test.hash_table(100)

For n = 1 To 100
  test(n)\text = Str(Random(999999))+Str(Random(999999))+Space(100)
  test(n)\hash = MD5Fingerprint(test(n)\text,100)
Next n
then sort the array on the hash and use the hash to find the string back using binairy search?

Posted: Tue Nov 21, 2006 11:43 pm
by srod
blueznl wrote:you people are too far ahead of me, but, euh, could i look at it like a long array like this:

Code: Select all

Structure hash_table
  hash.s
  text.s
EndStructure

Dim test.hash_table(100)

For n = 1 To 100
  test(n)\text = Str(Random(999999))+Str(Random(999999))+Space(100)
  test(n)\hash = MD5Fingerprint(test(n)\text,100)
Next n
then sort the array on the hash and use the hash to find the string back using binairy search?

If you're going to do this, why not just sort on the original text and then use a binary search etc? The hash is pretty much redundant in the above code as you're using a text string to index another text string!


A proper hash table has a number of advantages over the method you've outlined, including:
  • faster access - could be as quick as single lookup (order 1) as opposed to order log2 (n) etc.
  • the hash is numeric as opposed to textual and thus links directly into the underlying table/array
  • no need to continually ensure that the data is ordered (to facilitate the binary chop)
etc.

There's probably more! :)

Hash tables are used extensively.

Posted: Wed Nov 22, 2006 12:10 am
by blueznl
'cause searching through a list with elements of variable length means you'll need either another list to index the exact starting positions of the strings, or you have to keep track of all the string lengths

also on very long yet similar texts a search on the actual text is perhaps slower, i dunno' :-)

simplified: i haven't got a clue what we're doing here :-)

Posted: Wed Nov 22, 2006 12:19 am
by srod
blueznl wrote:'cause searching through a list with elements of variable length means you'll need either another list to index the exact starting strings, or you have to keep track of all the string lengths
Not sure I understand! :?

Posted: Sat Dec 09, 2006 9:02 am
by Inf0Byt3
Hi srod, any chance for a binary version :D ?

Great work!

Posted: Sat Dec 09, 2006 12:24 pm
by srod
Inf0Byt3 wrote:Hi srod, any chance for a binary version :D ?

Great work!
Sorry, I'm not sure what you mean?

Do you mean one which creates hashes/indexes based upon binary data?

If so, then it could easily be achieved by first creating a MD5Fingerprint hash code and then feeding this to the above library etc. It's probably not worth adjusting the library to work with binary data directly, just throw a little function into your own program which creates MD5Fingerprints etc.

Posted: Sat Dec 09, 2006 2:48 pm
by Inf0Byt3
Do you mean one which creates hashes/indexes based upon binary data?
Yup, sorry for the ambiguity... Thank you for the great code and the advice :D!

Posted: Sat Dec 09, 2006 2:50 pm
by srod
You're welcome and thank you for the kind words. :)

I'm glad the code is useful.

Posted: Wed Nov 07, 2007 3:06 am
by kinglestat
I have a question
about these 2 lines in HT_AddItem(HTid, key$, value) procedure

AddElement(_HTentries())
PokeL(index, _HTentries())

isnt the poke writing to the address of the linked list?
and the linked list is private memory (memory being handled by PB)

I am following the logic, and can see the results...but I am lost with this.

Posted: Wed Nov 07, 2007 9:03 am
by DoubleDutch
I already use a hash table (with collision possibility) in my spellchecker routine. After table creation I post sort it so that it can be searched with a binary search. Because of collision possibility I then do a sequential search from the start of the collision "pack".

It seems pretty fast and it searches up to 2x160,000 words on each cursor position change (as you type spell checking). Even on a slow machine you don't know it's doing it (apart from the fact that it underlines the spelling errors).

The reason I check up to 2x the number of words is because if they type a space I have to check left and right words. :)

The post sort/binary search method can only be used if data is sorted each time it's added to the database.

Posted: Wed Nov 07, 2007 12:11 pm
by srod
kinglestat wrote:I have a question
about these 2 lines in HT_AddItem(HTid, key$, value) procedure

AddElement(_HTentries())
PokeL(index, _HTentries())

isnt the poke writing to the address of the linked list?
and the linked list is private memory (memory being handled by PB)

I am following the logic, and can see the results...but I am lost with this.
Hi,

the Poke is writing to the address pointed to by the value of 'index' and it is writing the address of the newly created element in the linked list.

Let me try and explain.

A hash table is simply a 1 dimensional array of pointers. For example :

Code: Select all

HT_CreateHashTable(20)
will allocate enough memory to store 20 pointers. Each pointer points to an element in the global linked list.

When you add a key/value to the hash table, the key gets 'hashed' in order to produce an index into the array of pointers.

For example, with the hash table created above, the line :

Code: Select all

HT_AddItem(HTid, "entry1",100)
with the key "entry1", produces a hash of 13.
This value (13) means that the value we wish to store (100 in this case) will, in principal, be placed within the 13th entry in the hash table.

However, because the 13th entry may already contain values from other keys generating the same hash code, we need a way of storing multiple items in this same element. Thus, instead of placing the value of 100 directly in this 13th element we instead place it into a linked list and place the address of this linked list element into the 13th element of the hash table (with the PokeL() statement).

This is done in such a way that all the entries generating the same hash code from the same table form consecutive elements in the linked list.

Now, when you then issue the following command :

Code: Select all

HT_GetItem(HTid, "entry1")
to retrieve this value, the library rehashes the key "entry1" to get the hash code 13 (again). This is then used to lookup the first entry in the linked list from this hash table sharing the hash code of 13. A quick search through these linked list items either then finds the required value or it doesn't if the key does not exits.

I hope this helps. :)