Hash tables.
-
- Enthusiast
- Posts: 746
- Joined: Fri Jul 14, 2006 8:53 pm
- Location: Malta
- Contact:
hey
1 million.
600,000 is 60% of a million.
A 10-byte struct means 10 MB of RAM for the hash.
What's wrong with that? If on normal runs only 25% of it is used, that's 2.5 MB of RAM, the same footprint an empty Notepad uses!
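A quick back-of-the-envelope check of those figures (illustrative Python, not anything from the library under discussion):

```python
# Sanity-check the memory figures from the post above.
SLOTS = 1_000_000      # table sized for a million entries
STRUCT_BYTES = 10      # 10-byte struct per slot

full = SLOTS * STRUCT_BYTES          # 10,000,000 bytes = 10 MB fully allocated
typical = int(full * 0.25)           # ~25% occupancy on a normal run

print(full // 1_000_000, "MB full,", typical / 1_000_000, "MB at 25% occupancy")
```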
I may not help with your coding
Just ask about mental issues!
http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
Well, about 15 years ago I did an atlas called The World on CD-ROM, and I had a hash of 9 million, though "compressed" using a lookup table. That's 15 years ago! That's where I learned about effing selectors.
But lately I wrote a file utility, and, well, my laptop has 850,000 files and 110,000 folders... I assume I'm not the only one with lots of files.
For those who are curious: if you can pre-process the data, you can create arrays with pre-established values, so empty spaces in the hash are mostly eliminated.
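A toy sketch of that pre-processing idea in Python (names and approach are mine, not the atlas code): when the full key set is known offline, you can search for a hash salt that maps every key into a table of exactly the right size, leaving no empty slots.

```python
# Toy "perfect-ish" hash: with all keys known in advance, search offline
# for a salt that places every key in its own slot in a table of exactly
# len(keys) slots, eliminating empty space in the hash.
def find_perfect_salt(keys, max_tries=100_000):
    n = len(keys)
    for salt in range(max_tries):
        slots = {hash((salt, k)) % n for k in keys}
        if len(slots) == n:          # collision-free: every key got its own slot
            return salt
    raise RuntimeError("no collision-free salt found; grow the table")

keys = ["malta", "atlas", "cdrom", "hash", "table"]  # hypothetical key set
salt = find_perfect_salt(keys)

table = [None] * len(keys)
for k in keys:
    table[hash((salt, k)) % len(keys)] = k

print(table)  # fully occupied: no wasted buckets
```

Real systems use proper minimal-perfect-hash construction rather than brute-force salt search, but the space-saving principle is the same.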
Oracle and MySQL use hashes too; in MySQL it's user-selectable. Microsoft is not too forthcoming about MSSQL. I'm not sure where space enters into it; maybe srod cares to elaborate. Though from experience it's quicker to iterate a million records in memory than to read 1000 records from disk. Keep in mind that memory access only has about a 50-cycle delay at most, while disk access runs into the thousands: the bus speed of the HDD controller is much slower than the RAM controller, the disks themselves are slower, etc.
I mean, a hash has the overhead of calculating the hash, and this extra overhead has to be amortised over the dataset; so larger datasets (with few collisions) would make hashes faster than, say, a binary search. A binary search costs about log2(n) comparisons, so if a comparison is 20 times cheaper than calculating a hash, a hash would beat a binary search once log2(n) exceeds about 20, i.e. somewhere around a million records. If I'm not missing something?
There is another point to consider with balanced trees (red/black, B-trees, etc.): on very large datasets, say 500 million records, fill factors make for very large files, and there is more probability of the tree becoming unbalanced; and nowadays there is a greater need for 24/7 databases. Hashes don't become unbalanced, except for collisions. And again, in a DB a hash would reference the page rather than the record, and within the page you have indexing for the various records inside it.
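The crossover argument can be sketched with a toy cost model in Python (the 20x hash-to-compare ratio is the post's assumption, not a measurement):

```python
import math

# Toy cost model: binary search does log2(n) key comparisons; a hash
# lookup does one hash computation plus roughly one comparison,
# independent of n (ignoring collisions). With hashing assumed 20x the
# cost of a comparison, the hash wins once log2(n) > 21.
HASH_COST = 20.0   # one hash = 20 comparisons (assumption from the post)
CMP_COST = 1.0

def binary_search_cost(n):
    return math.log2(n) * CMP_COST

def hash_lookup_cost(n):
    return HASH_COST + 1 * CMP_COST   # flat cost, regardless of n

for n in (2**18, 2**20, 2**22):
    print(f"n={n}: binary={binary_search_cost(n):.1f}  hash={hash_lookup_cost(n):.1f}")
```

Under these assumptions the break-even point sits around 2^20 to 2^21 records; real results depend heavily on cache behaviour and key length.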
If you're telling me that these commercial grade DBMS use hash tables for their disk files... Have I understood you correctly? I can imagine them holding the data in memory and indexing through hash tables, but surely not on disc!
Actually, to change the subject, I've just started using MySQL and php for the first time ever having installed Apache on my other laptop. Very nice. I'm enjoying learning php - though I'd rather use PB of course for server side scripting. Fat chance of that whilst using a 3rd party hosting service I guess! Still, it was always about time I started learning this stuff!

Last edited by srod on Sat Nov 24, 2007 2:00 pm, edited 1 time in total.
I may look like a mule, but I'm not a complete ass.
Interesting. Thanks.
Back in the days when the dinosaurs roamed and things like 3rd Generation Languages were but a twinkle in the eye, the apps I wrote used BTree+ to handle indexing. I didn't really consider hash tables would be suitable for the task.
But this thread gave me some pause for thought.
Dare2 cut down to size
Well, Dare, in the past calculating hashes was processor-intensive, and there were no maths coprocessors, little memory... so a binary search and its derivatives fit the bill nicely.
srod: in the past, prior to using PB, I used WinBatch; they have a server-side engine for it too.
And re hashes for DBs: I mean it's pretty easy if your index uses fixed-length keys... something like MD5 or SHA-1 for large datasets... then you seek to record number * sizeof(record_index).
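A minimal sketch of that fixed-length index layout in Python (the entry format and names are mine, purely for illustration): because every entry is the same size, entry i sits at exactly i * sizeof(entry), so one seek fetches it.

```python
import hashlib
import io
import struct

# Hypothetical index entry: 16-byte MD5 digest of the key + uint32 data offset.
ENTRY = struct.Struct("<16sI")   # fixed size: 20 bytes per entry
index = io.BytesIO()             # stand-in for the index file on disk

keys = [b"alpha", b"beta", b"gamma"]
for i, k in enumerate(keys):
    index.write(ENTRY.pack(hashlib.md5(k).digest(), i * 100))

def read_entry(f, record_number):
    # Seek to record_number * sizeof(entry) - one seek, one read.
    f.seek(record_number * ENTRY.size)
    return ENTRY.unpack(f.read(ENTRY.size))

digest, offset = read_entry(index, 1)
print(digest.hex(), offset)
```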
Problem with your hash table:
Code:
HTid = HT_CreateHashTable(5000)

;Fill the table with random numeric keys, remembering each key in an array.
Dim MeinArray.l(65000)
For n = 1 To 65000
  value.l = 1000000 + Random(3000000)
  MeinArray(n) = value
  HT_AddItem(HTid, Str(value), 100 + Random(100))
Next
MessageRequester("All Add done", "", 0)

;Remove roughly one entry in six.
For n = 1 To 65000
  If Random(5) = 0
    HT_RemoveItem(HTid, Str(MeinArray(n)))
  EndIf
Next
MessageRequester("All Remove done", "", 0)

;Add a fresh batch of entries; this is where the crash happens.
For n = 1 To 65000
  value.l = 1000000 + Random(3000000)
  MeinArray(n) = value
  HT_AddItem(HTid, Str(value), 100 + Random(100))
Next
MessageRequester("Final done", "", 0)

;Look up one of the keys we stored. Access is very fast indeed: most lookups
;are O(1); the worst case is O(n), but that is very unlikely to ever happen.
Debug HT_GetItem(HTid, Str(MeinArray(1)))
- Rook Zimbabwe
- Addict
- Posts: 4322
- Joined: Tue Jan 02, 2007 8:16 pm
- Location: Cypress TX
- Contact:
"add this to your code and it will crash after adding new elements when elements have been removed before - any idea?"
Man walks into a doctor's office... He raises his right arm and says, "Doc, it hurts when I do this!"
Doctor says... "Don't do that then!"
You probably need to reindex the table in some fashion after you delete items...
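The add/remove/add crash pattern is the classic open-addressing pitfall: plainly emptying a slot on delete breaks the probe chains of later entries. One standard fix is tombstones, sketched here in Python as a general illustration (this is not the HT_* library's internals, which I haven't seen):

```python
# Minimal open-addressing hash table with linear probing and tombstones.
# Deleting marks the slot as a tombstone instead of empty, so probe
# chains stay intact and later lookups/inserts keep working.
EMPTY, TOMB = object(), object()

class HashTable:
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Returns (slot_index, key_found). Remembers the first tombstone
        # so an insert can reuse it.
        i = hash(key) % len(self.slots)
        first_tomb = None
        for _ in range(len(self.slots)):
            s = self.slots[i]
            if s is EMPTY:
                return (first_tomb if first_tomb is not None else i), False
            if s is TOMB:
                if first_tomb is None:
                    first_tomb = i
            elif s[0] == key:
                return i, True
            i = (i + 1) % len(self.slots)
        return first_tomb, False

    def put(self, key, value):
        i, _ = self._probe(key)
        self.slots[i] = (key, value)

    def get(self, key):
        i, found = self._probe(key)
        return self.slots[i][1] if found else None

    def remove(self, key):
        i, found = self._probe(key)
        if found:
            self.slots[i] = TOMB   # tombstone, never EMPTY

t = HashTable()
t.put("a", 1); t.put("b", 2)
t.remove("a")
t.put("c", 3)    # may safely reuse the tombstoned slot
print(t.get("b"), t.get("c"), t.get("a"))
```

Periodic rehashing into a fresh table (the "reindex" suggested above) is the usual companion to tombstones, since deleted markers otherwise accumulate and slow probing down.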
- DoubleDutch
- Addict
- Posts: 3220
- Joined: Thu Aug 07, 2003 7:01 pm
- Location: United Kingdom
- Contact:
SRod wrote: "Fat chance of that whilst using a 3rd party hosting service I guess"
I apparently can run Linux PB programs on my PS-enabled DreamHost account. You can point any port from 8000 to 65535 to a program (only on a PS-enabled account); the program can be persistent, and its process will not be deleted by DreamHost.
I haven't tried it yet, but it could be good for checking keys, generating graphics, or other things that would be a little too complex to do easily in PHP or Ruby.
See http://www.dreamhostps.com/ for more info...
https://deluxepixel.com <- My Business website
https://reportcomplete.com <- School end of term reports system