A OOP class for dynamically implementing hash tables!
-
- Enthusiast
- Posts: 480
- Joined: Thu Jul 27, 2006 4:06 am
Hi,
Is this "engine" limited to string hashes?, can I use long-int hashes instead?, because I think it defeats the purpose if we compare hash-strings instead of the strings themselves (unless the actual data was longer than the hash length).
I'm using a small array of crc32 hashes to compare instead of comparing my lists of words... is this lib for me?
Thanks
Is this "engine" limited to string hashes?, can I use long-int hashes instead?, because I think it defeats the purpose if we compare hash-strings instead of the strings themselves (unless the actual data was longer than the hash length).
I'm using a small array of crc32 hashes to compare instead of comparing my lists of words... is this lib for me?
Thanks
I'm not sure what you mean exactly?
This library takes a string 'key' and from that generates a 32-bit hash code (which lies within certain bounds). There are no string hashes.
This is how 'traditional' hash tables operate.
The only time strings are compared is when there are 'collisions' and even then a binary search is utilised in the case of my library.
This library takes a string 'key' and from that generates a 32-bit hash code (which lies within certain bounds). There are no string hashes.
This is how 'traditional' hash tables operate.
The only time strings are compared is when there are 'collisions' and even then a binary search is utilised in the case of my library.
I may look like a mule, but I'm not a complete ass.
-
- Enthusiast
- Posts: 480
- Joined: Thu Jul 27, 2006 4:06 am
I'm not really implying anything, so I'm sorry if I wasn't too clear about my question to begin with 
Right now I'm using a list of 32bit hashes to compare strings from one set to another. It all works nicely but I was thinking if your library would do the same trick and if it would also provide me with some new useful functionality I might be lacking currently?.
I also like OOP but I'm wondering if this is the right lib for my case or if I should just leave my code as is.
Thanks

Right now I'm using a list of 32bit hashes to compare strings from one set to another. It all works nicely but I was thinking if your library would do the same trick and if it would also provide me with some new useful functionality I might be lacking currently?.
I also like OOP but I'm wondering if this is the right lib for my case or if I should just leave my code as is.
Thanks
If your situation is such that no two strings can generate the same 32-bit hash code, and your code works fine, then yes I would say that you probably shouldn't switch to my library.superadnim wrote:I'm not really implying anything, so I'm sorry if I wasn't too clear about my question to begin with
Right now I'm using a list of 32bit hashes to compare strings from one set to another. It all works nicely but I was thinking if your library would do the same trick and if it would also provide me with some new useful functionality I might be lacking currently?.
I also like OOP but I'm wondering if this is the right lib for my case or if I should just leave my code as is.
Thanks
A hash table is used to actually store data rather than just compare hashes etc. and so I am a little unsure if a hash table is what you need? When storing such data, there is always the possibility that two items will generate the same hash (memory is finite after all) and thus multiple items will attempt to occupy the same memory etc. This library of mine is what I would describe as a 'heavy duty' hash table in that it not only deals with 'collisions' in a very efficient way, but you can store absolutely any type of data within the hash tables etc. You are not limited to single 32-bit values.
I am just shaping up right now to create a hash table which will potentially be used to store millions of items. I'll probably break the library in the process!

I may look like a mule, but I'm not a complete ass.
-
- Enthusiast
- Posts: 480
- Joined: Thu Jul 27, 2006 4:06 am
Or make it even better!I'll probably break the library in the process!
Heres my situation: I have input (dynamic) data and comparison (static) data. on both cases a hash list is created for comparison of the items. I found this is way faster overall than comparing the raw string data in almost every case (other than 2 or 3 string lengths, in such cases the overhead of calculating the initial hash list was too big and not worth it for a small amount of comparisons overall).
The problem: my method is not very clean, code wise.. I don't like it at all, I'm using arrays, etc. I dislike it, period

That's why I was wondering if I should use your library or not but it appears that's not it's main purpose...
No, if I understand you correctly, then a hashtable does not seem appropriate.
Thinking about what you are doing, did you say you are using CRC32 checksums as your hash? If so, just remember that two different strings could yield the same hash in this case? Thus, you'll need to supplement your hash comparisons in cases where you find two equal hashes etc.
Thinking about what you are doing, did you say you are using CRC32 checksums as your hash? If so, just remember that two different strings could yield the same hash in this case? Thus, you'll need to supplement your hash comparisons in cases where you find two equal hashes etc.
I may look like a mule, but I'm not a complete ass.
-
- Enthusiast
- Posts: 480
- Joined: Thu Jul 27, 2006 4:06 am
Thanks for the info.
Yes, I'm aware of CRC32 and collisions with this algorithm, but I believe there are not enough words in the English dictionary to cause one (correct me if I'm wrong!).
In all cases, the lists are quite tiny (from 50 to 200 unique words on the dynamic side) and not so many on the static side (the user builds this one).
Also, the data is quite small so I don't really see how a collision could happen in my case?, please enlighten me
(and sorry if I somehow hijacked your thread but I thought it was appropriate - at least at the beginning - to ask)
I am toying with another hashing algorithm which is faster and has similar collision rates... I just haven't encountered a single one so far but you are right that I shouldn't think it won't happen ( but if it did, it wouldn't be too "fatal" in my case since the whole process is over viewed by the user ).
Yes, I'm aware of CRC32 and collisions with this algorithm, but I believe there are not enough words in the English dictionary to cause one (correct me if I'm wrong!).
In all cases, the lists are quite tiny (from 50 to 200 unique words on the dynamic side) and not so many on the static side (the user builds this one).
Also, the data is quite small so I don't really see how a collision could happen in my case?, please enlighten me

I am toying with another hashing algorithm which is faster and has similar collision rates... I just haven't encountered a single one so far but you are right that I shouldn't think it won't happen ( but if it did, it wouldn't be too "fatal" in my case since the whole process is over viewed by the user ).
Update : 15/08/08.
Have added a new method; ClearItems(), which removes all items from the underlying hash table but does not destroy the table. This is much faster than simply enumerating all keys and removing the items individually.
The download link is at the bottom of the first post.
Have added a new method; ClearItems(), which removes all items from the underlying hash table but does not destroy the table. This is much faster than simply enumerating all keys and removing the items individually.
The download link is at the bottom of the first post.
I may look like a mule, but I'm not a complete ass.
Hi Stephen,
would it be possible to implement some kind of serialization of the whole hashtable?
So that I could save it to a file and later load it again with all the entries already inserted.
Dunno if you're allocating one block of memory at once or one for each entry when necessary,
so I really don't know how to do it efficiently.
Thanks
would it be possible to implement some kind of serialization of the whole hashtable?
So that I could save it to a file and later load it again with all the entries already inserted.
Dunno if you're allocating one block of memory at once or one for each entry when necessary,
so I really don't know how to do it efficiently.
Thanks
Windows 7 & PureBasic 4.4
The internal data structures are such that a 'memory dump' of a hash table is very difficult!
Your best bet is to simply enumerate the entire hash table and write the data out to disc that way. When loading back you must create a hash table with exactly the same number of indices and then add each element in turn as you load them from disc. Writing a generic routine to do this for any hash table is possible, but is not a trivial exercise. It is also not something I am in need of and I do not have the time at the moment to code such a routine.
Sorry.

Your best bet is to simply enumerate the entire hash table and write the data out to disc that way. When loading back you must create a hash table with exactly the same number of indices and then add each element in turn as you load them from disc. Writing a generic routine to do this for any hash table is possible, but is not a trivial exercise. It is also not something I am in need of and I do not have the time at the moment to code such a routine.
Sorry.

I may look like a mule, but I'm not a complete ass.
- netmaestro
- PureBasic Bullfrog
- Posts: 8451
- Joined: Wed Jul 06, 2005 5:42 am
- Location: Fort Nelson, BC, Canada
Nice work srod! And good choice of hash algorithm imho. May I just make one tiny observation?
This:
would be approx. 15x faster than:
If one were using this for a chess engine there would be many millions of calls to this proc within a relatively short time period.
All in all though, excellent library. Thanks for sharing it.
This:
Code: Select all
Procedure.l CreateHash(numelementsintable, *key.CHARACTER)
Protected hash=5381
While *key\c <> 0
hash=(hash<<5)+hash + *key\c ; hash * 33 + asc
*key+1
Wend
If hash>=0
ProcedureReturn hash%(numelementsintable+1)
Else
ProcedureReturn -(hash%(numelementsintable+1))
EndIf
EndProcedure
Code: Select all
Procedure.l HashTableClass_CreateHash(numelementsintable, key$)
Protected hash=5381, i
If key$
For i = 1 To Len(key$)
hash=(hash<<5)+hash + Asc(Mid(key$,i, 1)) ; hash * 33 + asc
Next
EndIf
If hash>=0
ProcedureReturn hash%(numelementsintable+1)
Else
ProcedureReturn -(hash%(numelementsintable+1))
EndIf
EndProcedure
All in all though, excellent library. Thanks for sharing it.
BERESHEIT