
Posted: Fri Nov 23, 2007 8:36 pm
by srod
Actually, he said one million!

Posted: Fri Nov 23, 2007 8:39 pm
by Foz
:P He said both:
I prefer to make a very large hash..say a million records
and
On 600,000 records I save about 38% of ram
Still, that doesn't detract from the fact that that is a lot of hash.

Crap, is that the police drug-sniffer dogs? Here doggy, lots of hash here! :)

Posted: Fri Nov 23, 2007 8:42 pm
by srod
No worries. My socks normally throw the sniffer dogs off the scent! :wink:

Posted: Fri Nov 23, 2007 9:07 pm
by kinglestat
Hey,
1 million! And 600,000 is 60% of a million.
With a 10-byte struct, that's 10 MB of RAM for the hash. What's wrong with that? If on normal runs only 25% of it is used, that's 2.5 MB of RAM, the same footprint as an empty Notepad!
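
A quick sanity check of those figures in PB (taking 1 MB as 1,048,576 bytes):

Code:

; Back-of-envelope check of the memory figures quoted above.
recordSize.l = 10          ; bytes per hash entry, as stated
slots.l = 1000000          ; one million slots
Debug StrF(recordSize * slots / 1048576.0, 1) + " MB for the full table"   ; ~9.5 MB
Debug StrF(recordSize * slots * 0.25 / 1048576.0, 1) + " MB at 25% use"    ; ~2.4 MB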

Posted: Fri Nov 23, 2007 9:19 pm
by srod
I didn't say there was anything wrong with that. I just didn't expect to find anyone using hash tables that big! :)

Posted: Fri Nov 23, 2007 9:29 pm
by kinglestat
Well, about 15 years ago I did an atlas called The World on CD-ROM, and I had a hash of 9 million entries, though "compressed" using a lookup table. That's 15 years ago! That's where I learned about effing selectors.

But lately I wrote a file utility, and, well, my laptop has 850,000 files and 110,000 folders... I assume I'm not the only one with lots of files.

For those who are curious: if you can preprocess your data, you can create arrays with pre-established values, so empty spaces in the hash are mostly eliminated.
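
A toy sketch of that preprocessing idea (the hash function and key set here are invented purely for illustration): when the whole key set is known in advance, you can brute-force a seed that maps every key to a unique slot in a table exactly as large as the key set, leaving no empty slots at all.

Code:

; Offline "preprocessing" step: search for a hash seed that is
; collision-free for a fixed, known key set at 100% fill.
Procedure.l Hash(key.s, seed.l, size.l)
  Protected h.l = seed
  Protected i.l
  For i = 1 To Len(key)
    h = (h * 31 + Asc(Mid(key, i, 1))) & $7FFFFFFF
  Next
  ProcedureReturn h % size
EndProcedure

#KeyCount = 5
Dim keys.s(#KeyCount - 1)
keys(0) = "alpha" : keys(1) = "bravo" : keys(2) = "charlie"
keys(3) = "delta" : keys(4) = "echo"

seed.l = 0
Repeat
  ok.l = #True
  Dim used.l(#KeyCount - 1)          ; re-Dim clears the occupancy flags
  For i = 0 To #KeyCount - 1
    slot.l = Hash(keys(i), seed, #KeyCount)
    If used(slot)
      ok = #False                    ; collision - try the next seed
      Break
    EndIf
    used(slot) = #True
  Next
  If ok
    Break
  EndIf
  seed + 1
ForEver

Debug "Collision-free seed: " + Str(seed)   ; runtime lookups now need no probing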

Posted: Fri Nov 23, 2007 11:05 pm
by Dare
With the numbers and speeds suggested here, I am wondering ..

.. just curious, sorry for the slightly off-topic post ..

.. do database apps (MySQL, Access, PostgreSQL etc.) use hash tables, balanced B-tree-like indexing, or something else altogether - or a mix of methods?

Posted: Fri Nov 23, 2007 11:07 pm
by srod
Most contemporary databases will use B-trees or possibly skip lists (or both!) in order to support multiple indexes etc.

For large dynamic relational databases, I wouldn't think that a hash table would be appropriate.

Posted: Sat Nov 24, 2007 8:43 am
by kinglestat
Oracle and MySQL use hashes too; in MySQL it's user-selectable. Microsoft aren't too forthcoming about MSSQL. I'm not sure where space enters into it, maybe srod cares to elaborate. Though from experience it's quicker to iterate a million records in memory than to read 1,000 records from disk; keep in mind that a memory access costs around 50 clock ticks at most, while disk access runs into thousands: the bus speed of the HDD controller is much slower than the RAM controller's, the disks themselves are slower, etc.

I mean, a hash has the overhead of calculating the hash, and this extra overhead has to be amortised over the dataset; so larger datasets (with few collisions) would make hashes faster than, say, a binary search, which needs roughly log2(n) comparisons. So if a compare is 20 times faster than calculating a hash, a hash would be more efficient than a binary search once there are more than about 512K-1M records (log2(512K) is 19; log2(1M) is 20). If I'm not missing something?
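
Putting numbers on that break-even (assuming one hash calculation costs about the same as 20 key compares, plus one compare to confirm the hit):

Code:

; A binary search costs roughly log2(n) compares; the hash costs a
; (near) constant ~21 compare-equivalents under the assumption above.
hashCost.d = 21
n.d = 1
While Log(n) / Log(2) < hashCost
  n = n * 2
Wend
Debug "Hash overtakes binary search at about " + StrD(n, 0) + " records"
; ~2 million here; with a slightly cheaper hash (19-20 compares) the
; break-even drops to the 512K-1M region mentioned above.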

There is another point to consider with balanced trees (red/black, B2 etc.): on very large datasets, say 500 million records, fill factors make for very large files, and there is more probability of the tree becoming unbalanced; and nowadays there is a greater need for 24/7 databases. Hashes don't become unbalanced, apart from collisions. And again, in a DB a hash would reference the page rather than the record, and within the page you have indexing for the various records inside it.

Posted: Sat Nov 24, 2007 12:39 pm
by srod
If you're telling me that these commercial-grade DBMSs use hash tables for their disk files... have I understood you correctly? I can imagine them holding the data in memory and indexing it through hash tables, but surely not on disc!

Actually, to change the subject: I've just started using MySQL and PHP for the first time ever, having installed Apache on my other laptop. Very nice. I'm enjoying learning PHP - though I'd rather use PB for server-side scripting, of course. Fat chance of that whilst using a 3rd-party hosting service, I guess! Still, it was about time I started learning this stuff! :)

Posted: Sat Nov 24, 2007 1:03 pm
by Dare
Interesting. Thanks.

Back in the days when the dinosaurs roamed and things like 3rd-generation languages were but a twinkle in the eye, the apps I wrote used B+Trees to handle indexing. I didn't really consider that hash tables would be suitable for the task.

But this thread gave me some pause for thought.

Posted: Sat Nov 24, 2007 4:49 pm
by kinglestat
Well, Dare, in the past calculating hashes was processor-intensive, and there were no maths coprocessors and little memory... so a binary search and its derivatives fit the bill nicely.

srod: in the past, before using PB, I used WinBatch; they have a server-side engine for it too.

And re hashes for DBs: it's pretty easy if your index uses fixed-length keys, something like MD5 or SHA1 for large datasets; then you seek to record number * SizeOf(record_index).
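
Roughly like this (the structure and file name are invented for illustration, not any particular DB's format):

Code:

; With fixed-size index entries, entry N sits at byte offset
; N * SizeOf(IndexEntry), so a single seek finds it.
Structure IndexEntry
  keyHash.b[16]      ; fixed-length key, e.g. an MD5 digest
  recordOffset.q     ; where the actual record lives in the data file
EndStructure

Procedure.q LookupRecordOffset(recordNumber.l)
  Protected entry.IndexEntry
  Protected result.q = -1
  If ReadFile(0, "index.dat")
    FileSeek(0, recordNumber * SizeOf(IndexEntry))
    If ReadData(0, @entry, SizeOf(IndexEntry)) = SizeOf(IndexEntry)
      result = entry\recordOffset
    EndIf
    CloseFile(0)
  EndIf
  ProcedureReturn result
EndProcedure

Debug LookupRecordOffset(42)   ; -1 here, since the illustrative file doesn't exist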

Problem with your hash table:

Posted: Fri Mar 07, 2008 11:58 pm
by Motu

Code:

HTid = HT_CreateHashTable(5000)

;Fill the table with 65,000 random numeric keys, remembering each key
;in MeinArray so that we can remove and re-add entries afterwards.
Dim MeinArray.l(65000)
For n = 1 To 65000
  value.l = 1000000 + Random(3000000)
  MeinArray(n) = value
  HT_AddItem(HTid, Str(value), 100 + Random(100))
Next

MessageRequester("All Add done", "", 0)

;Remove roughly one entry in six.
For n = 1 To 65000
  If Random(5) = 0
    HT_RemoveItem(HTid, Str(MeinArray(n)))
  EndIf
Next

MessageRequester("All Remove done", "", 0)

;Add a fresh batch of 65,000 random keys - this is where the crash occurs.
For n = 1 To 65000
  value.l = 1000000 + Random(3000000)
  MeinArray(n) = value
  HT_AddItem(HTid, Str(value), 100 + Random(100))
Next

MessageRequester("Final done", "", 0)

;Sanity check: look up one of the keys we just stored.
Debug HT_GetItem(HTid, Str(MeinArray(1)))
Add this to your code and it will crash when adding new elements after elements have been removed beforehand - any idea?

Posted: Sat Mar 08, 2008 12:07 am
by Rook Zimbabwe
Add this to your code and it will crash when adding new elements after elements have been removed beforehand - any idea?
Man walks into a doctor's office... He raises his right arm and says, "Doc, it hurts when I do this!"
Doctor says... "Don't do that then!"


You probably need to reindex the table in some fashion after you delete items...
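
For what it's worth, the classic cause of exactly this remove-then-add failure in open-addressing tables is deletion handling. This is only a guess at what HT_RemoveItem does internally, but the usual rule looks like this:

Code:

; Sketch of tombstone-based deletion in an open-addressing table - a
; guess at the failure mode, not the HT library's actual internals.
#SlotEmpty   = 0
#SlotUsed    = 1
#SlotDeleted = 2               ; the tombstone marker

Structure HashSlot
  state.l
  key.s
  value.l
EndStructure

#TableSize = 8
Global Dim table.HashSlot(#TableSize - 1)

Procedure.l Lookup(key.s)
  ; Linear probing with a toy hash; what matters is how deletions are treated.
  Protected start.l = Len(key) % #TableSize
  Protected i.l, probe.l
  For i = 0 To #TableSize - 1
    probe = (start + i) % #TableSize
    Select table(probe)\state
      Case #SlotEmpty
        ProcedureReturn -1     ; a truly empty slot ends the probe chain
      Case #SlotUsed
        If table(probe)\key = key
          ProcedureReturn table(probe)\value
        EndIf
    EndSelect
    ; #SlotDeleted: keep probing straight past the tombstone. A Remove that
    ; marks slots #SlotEmpty instead cuts probe chains short, so entries
    ; added after a removal can vanish or collide - the symptom reported above.
  Next
  ProcedureReturn -1
EndProcedure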

Posted: Sat Mar 08, 2008 12:52 am
by DoubleDutch
srod wrote: Fat chance of that whilst using a 3rd party hosting service I guess
Apparently I can run Linux PB programs on my PS-enabled DreamHost account. You can point any port from 8000 to 65535 at a program (only on a PS-enabled account); the program can be persistent and its process will not be killed by DreamHost.

I haven't tried it yet, but it could be good for checking keys, generating graphics, or other things that would be a little too complex to do easily in PHP or Ruby.

See http://www.dreamhostps.com/ for more info...