Hash tables.

Share your advanced PureBasic knowledge/code with the community.
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Actually, he said one million!
I may look like a mule, but I'm not a complete ass.
Foz
Addict
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Post by Foz »

:P He said both:
I prefer to make a very large hash..say a million records
and
On 600,000 records I save about 38% of ram
Still doesn't detract from the fact that that is a lot of hash.

Crap, is that the police drug-sniffer dogs? Here doggy, lots of hash here! :)
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

No worries. My socks normally throw the sniffer dogs off the scent! :wink:
I may look like a mule, but I'm not a complete ass.
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Post by kinglestat »

Hey,
1 million.
600,000 is 60% of a million.
With a 10-byte struct, that's 10 MB of RAM for the hash.
What's wrong with that? If on normal runs only 25% of it is used, that's 2.5 MB of RAM - the same footprint as an empty Notepad!
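A quick sketch of that footprint arithmetic (the slot layout here is my own illustration, not the library's actual structure):

Code: Select all

Structure HashSlot      ; illustrative 10-byte slot
  key.l                 ; 4 bytes
  value.l               ; 4 bytes
  flags.w               ; 2 bytes
EndStructure

; PureBasic structures are packed (unaligned) by default, so SizeOf
; reports the raw 10 bytes per slot.
Dim table.HashSlot(999999)            ; one million slots
Debug SizeOf(HashSlot)                ; 10
Debug SizeOf(HashSlot) * 1000000      ; 10,000,000 bytes, i.e. ~10 MB
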
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

I didn't say there was anything wrong with that. I just didn't expect to find anyone using hash tables that big! :)
I may look like a mule, but I'm not a complete ass.
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Post by kinglestat »

Well, about 15 years ago I did an atlas called The World on CD-ROM, and I had a hash of 9 million entries, though "compressed" using a lookup table - and that was 15 years ago! That's where I learned about effing selectors.

But lately I wrote a file utility, and, well, my laptop has 850,000 files and 110,000 folders... I assume I'm not the only one with lots of files.

For those who are curious: if you can pre-process the data, you can create arrays with pre-established values, so empty slots in the hash are mostly eliminated - see the sketch below.
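A minimal sketch of that idea, assuming the key count is known up front (HT_CreateHashTable is the function from this thread; the load-factor heuristic is my own assumption):

Code: Select all

; If the key set is known in advance, size the table from the key
; count so that few slots stay empty and collisions stay rare.
keyCount = 600000
targetLoad.f = 0.75                         ; assumed target load factor
HTid = HT_CreateHashTable(Int(keyCount / targetLoad))
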
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
Dare
Addict
Addict
Posts: 1965
Joined: Mon May 29, 2006 1:01 am
Location: Outback

Post by Dare »

With the numbers and speeds suggested here, I am wondering ..

.. just curious, sorry for the slightly off-topic post ..

.. do database apps (MySQL, Access, PostgreSQL, etc.) use hash tables, balanced B-tree-like indexing, or something else altogether - or a mix of methods?
Dare2 cut down to size
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Most contemporary databases will use B-trees or possibly skip-lists (or both!) in order to support multiple indexes etc.

For large dynamic relational databases, I wouldn't think that a hash table would be appropriate.
I may look like a mule, but I'm not a complete ass.
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Post by kinglestat »

Oracle and MySQL use hashes too; in MySQL the index type is user-selectable. Microsoft aren't too forthcoming about MSSQL. I'm not sure where space enters into it - maybe srod cares to elaborate. Though from experience it's quicker to iterate a million records in memory than to read 1,000 records from disk; keep in mind that a memory access only costs around 50 ticks at most, while a disk access runs into the thousands - the bus of the HDD controller is much slower than the RAM controller, the disks themselves are slower, etc.

I mean, a hash has the overhead of calculating the hash value, and this extra overhead has to be amortised over the dataset; so larger datasets (with few collisions) would make hashes faster than, say, a binary search. A binary search needs about log2(n) compares, so if a compare is 20 times faster than calculating a hash, a hash would be more efficient than a binary search once there are more than about 2^20 - roughly a million - records. If I'm not missing something? (See the sketch below.)
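
A back-of-the-envelope sketch of that break-even point (my own illustration; the 20x cost ratio is the assumption from the paragraph above):

Code: Select all

; A binary search costs ~Log2(n) key compares per lookup. If computing
; a hash costs about the same as 20 compares, the hash wins once
; Log2(n) > 20, i.e. past 2^20 (~1 million) records.
hashCostInCompares = 20
For p = 16 To 24
  n.q = 1 << p
  Debug Str(n) + " records: binary search ~" + Str(p) + " compares, hash ~" + Str(hashCostInCompares) + " compare-equivalents"
Next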

There is another point to consider with balanced trees (red/black, B-trees, etc.): on very large datasets - say 500 million records - fill factors make for very large files, and there is more probability of the tree becoming unbalanced; and nowadays there is a greater need for 24/7 databases. Hashes don't become unbalanced, except for collisions. And again, in a DB a hash would reference the page rather than the record, and within the page you have indexing for the various records inside it - see the sketch below.
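A hedged sketch of that page-referencing idea (the file layout, page size, and hash function here are all my own assumptions, not any real DBMS format):

Code: Select all

#PAGE_SIZE = 4096

; Trivial string hash, for illustration only.
Procedure.l PageForKey(key.s, totalPages.l)
  Protected h.l, i
  For i = 1 To Len(key)
    h = (h * 31 + Asc(Mid(key, i, 1))) & $7FFFFFFF
  Next
  ProcedureReturn h % totalPages
EndProcedure

; The hash picks a *page*, one seek reads that whole page, and the
; page's internal slot directory then locates the record inside it.
totalPages = 1024
db = ReadFile(#PB_Any, "data.pag")
If db
  *page = AllocateMemory(#PAGE_SIZE)
  FileSeek(db, PageForKey("customer42", totalPages) * #PAGE_SIZE)
  ReadData(db, *page, #PAGE_SIZE)
  ; ...scan the in-page record directory here...
  FreeMemory(*page)
  CloseFile(db)
EndIf
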
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

If you're telling me that these commercial-grade DBMSs use hash tables for their disk files... have I understood you correctly? I can imagine them holding the data in memory and indexing through hash tables, but surely not on disc!

Actually, to change the subject: I've just started using MySQL and PHP for the first time ever, having installed Apache on my other laptop. Very nice. I'm enjoying learning PHP - though I'd rather use PB of course for server-side scripting. Fat chance of that whilst using a 3rd-party hosting service, I guess! Still, it was about time I started learning this stuff! :)
Last edited by srod on Sat Nov 24, 2007 2:00 pm, edited 1 time in total.
I may look like a mule, but I'm not a complete ass.
Dare
Addict
Addict
Posts: 1965
Joined: Mon May 29, 2006 1:01 am
Location: Outback

Post by Dare »

Interesting. Thanks.

Back in the days when the dinosaurs roamed and things like 3rd Generation Languages were but a twinkle in the eye, the apps I wrote used B+tree indexing. I didn't really consider hash tables suitable for the task.

But this thread gave me some pause for thought.
Dare2 cut down to size
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Post by kinglestat »

Well, Dare, in the past calculating hashes was processor-intensive, and there were no maths coprocessors and little memory... so a binary search and its derivatives fit the bill nicely.

srod: in the past, prior to using PB, I used WinBatch; they have a server-side engine for it too.

And re hashes for a DB: it's pretty easy if your index uses fixed-length keys - something like MD5 or SHA-1 for large datasets - then you just seek to record_number * SizeOf(record_index). See the sketch below.
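A hedged sketch of that seek arithmetic (the entry layout and file name here are my own invention):

Code: Select all

Structure IndexEntry
  md5key.b[16]    ; fixed-length MD5 digest of the key
  recNum.l        ; record number in the data file
EndStructure

; With fixed-length entries, entry n sits at a computable offset,
; so a single FileSeek lands on it directly.
idx = ReadFile(#PB_Any, "table.idx")
If idx
  n = 12345                                  ; the entry we want
  entry.IndexEntry
  FileSeek(idx, n * SizeOf(IndexEntry))      ; record_number * SizeOf(record_index)
  ReadData(idx, @entry, SizeOf(IndexEntry))
  CloseFile(idx)
EndIf
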
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
Motu
Enthusiast
Enthusiast
Posts: 160
Joined: Tue Oct 19, 2004 12:24 pm

Problem with your hash table:

Post by Motu »

Code: Select all

HTid = HT_CreateHashTable(5000)

;Let's add some data.
;Hash table entries are indexed by a string; here we add 65,000 random
;integer keys (converted to strings) with random values.
 Dim MeinArray.l(65000)
 For n = 1 To 65000
  value.l = 1000000 + Random(3000000)
  MeinArray(n) = value
  HT_AddItem(HTid,Str(value),100 + Random(100))
 Next

 MessageRequester("All Add done","",0)
 
 For n = 1 To 65000
  If Random(5) = 0
   HT_RemoveItem(HTid,Str(MeinArray(n)))
  EndIf
 Next

 MessageRequester("All Remove done","",0)

 For n = 1 To 65000
  value.l = 1000000 + Random(3000000)
  MeinArray(n) = value
  HT_AddItem(HTid,Str(value),100 + Random(100))
 Next

 MessageRequester("Final done","",0)

;Let's have a look at our data.
;Access is very fast indeed. Most accesses will be of order O(1). The worst-case
;scenario is O(n), but this is very unlikely to ever happen.


For i = 1 To 1
  Debug HT_GetItem(HTid, Str(MeinArray(i)))   ; 'wert' was undefined; look up a stored key instead
Next
Add this to your code and it will crash when adding new elements after elements have been removed - any idea?
Rook Zimbabwe
Addict
Addict
Posts: 4322
Joined: Tue Jan 02, 2007 8:16 pm
Location: Cypress TX
Contact:

Post by Rook Zimbabwe »

Motu wrote:Add this to your code and it will crash when adding new elements after elements have been removed - any idea?
Man walks in to a Doctors office... He raises his right arm and says "Doc it hurts when I do this!"
Doctor says... "Don't do that then!"


You probably need to reindex the table in some fashion after you delete items...
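
If the table uses open addressing internally (a guess on my part - I haven't seen the library's source), clearing a slot outright breaks the probe chain for any key that collided past it, which would explain crashes or lost lookups after removals. The usual fix is a tombstone marker rather than a full reindex - a sketch:

Code: Select all

Enumeration
  #SLOT_EMPTY     ; never used: a probe chain may stop here
  #SLOT_USED
  #SLOT_DELETED   ; tombstone: keep probing past this slot
EndEnumeration

Structure Slot
  state.l
  key.s
  value.l
EndStructure

; On removal, mark the slot as a tombstone instead of empty so that
; later lookups keep probing; inserts may reuse tombstone slots.
Procedure RemoveSlot(Array slots.Slot(1), index.l)
  slots(index)\state = #SLOT_DELETED    ; NOT #SLOT_EMPTY
  slots(index)\key = ""
  slots(index)\value = 0
EndProcedure
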
Binarily speaking... it takes 10 to Tango!!!

http://www.bluemesapc.com/
DoubleDutch
Addict
Addict
Posts: 3220
Joined: Thu Aug 07, 2003 7:01 pm
Location: United Kingdom
Contact:

Post by DoubleDutch »

SRod wrote:Fat chance of that whilst using a 3rd party hosting service I guess
Apparently I can run Linux PB programs on my PS-enabled DreamHost account. You can point any port from 8000 to 65535 at a program (only on a PS-enabled account); the program can be persistent, and its process will not be killed by DreamHost.

I haven't tried it yet, but it would be good for checking keys, generating graphics, or other things that might be a little too complex to do easily in PHP or Ruby.

See http://www.dreamhostps.com/ for more info...
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system