Thankyou everyone so far for your input!!!!!
I probably should mention that the list never changes, so i never need to do modifications such as delete/insert, i simply need the index of string "peach" in the array, and yes it's fine to do a preprocess to for example calculate a list of hashes.
idle I only just saw your post then so i haven't had a good chance to take it all in yet, but looking at its dictionary demo it certainly does look in the ballpark for what i need!
however in my further reading on hash tables I just came across this bit, which clarified for me what i was confused about:
http://algs4.cs.princeton.edu/34hash/
We reference key-value pairs using arrays by doing arithmetic operations to transform keys into array indices.
Search algorithms that use hashing consist of two separate parts. The first step is to compute a hash function that transforms the search key into an array index. Ideally, different keys would map to different indices. This ideal is generally beyond our reach, so we have to face the possibility that two or more different keys may hash to the same array index. Thus, the second part of a hashing search is a collision-resolution process that deals with this situation.
Hash functions: If we have an array that can hold M key-value pairs, then we need a function that can transform any given key into an index into that array: an integer in the range [0, M-1]. We seek a hash function that is both easy to compute and uniformly distributes the keys.
Strings: Modular hashing works for long keys such as strings, too: we simply treat them as huge integers. For example, the code below computes a modular hash function for a String s, where R is a small prime integer (Java uses 31).
Code: Select all
int hash = 0;
int M = 1000; ;array size
int R = 31; ;prime
for (int i = 0; i < s.length(); i++)
hash = (R * hash + s.charAt(i)) % M;
Which i converted to PB:
Code: Select all
sTxt.s = "Hello, world!"
hash.l = 0
M.i = 1000 ;array size
R.i = 31 ;prime
For *pchar.Ascii = @sTxt To @sTxt + Len(sTxt)-1
hash = (R * hash + *pchar\a) % M
Next
Debug Hex(hash)
That's the part I was confused about... the "how to get from a seemingly random 32bit checksum/hash like 0xE9F3ABE2, which is obviously outside array bounds, to a normal array index like 317", but that prime-based modular hash function makes it easy, whew! what a great little addition to the toolbelt
So I think I understand now how we can use it to jump DIRECTLY to the item, which seems even more efficient than a Trie (for this)??? (but im learning as I type here! lol) i wonder if this is the sort of fun Fred was having when implementing the Map functions!
There is still
collision resolution to handle, but it seems the main problem is now dealt with!? hmmm lots more reading to do heehee
Collisions. There are chances that certain strings might cause the same key to be generated. In such cases, the individual has storage is turned into a link list or some other type of storage that can store all the duplicate keys. This is the reason why the individual hash storage is called as a bucket.
ok, that's easy enough ... so instead of expecting each array item to hold one string we expect it to hold a List of strings, and assuming no collisions each List will simply be 1 string in size. I think i get it now! lol
Q. Why does Java use 31 in the hashCode() for String?
A. It's prime, so that when the user mods out by another number, they have no common factors (unless it's a multiple of 31). 31 is also a Mersenne prime (like 127 or 8191) which is a prime number that is one less than a power of 2. This means that the mod can be done with one shift and one subtract if the machine's multiply instruction is slow.
hmm that doesn't explain though why they chose 31 over other Mersenne primes like the one either side of 31... 7 and 127, but
this page addresses that.
another interesting example of "why a prime":
Code: Select all
http://stackoverflow.com/questions/3613102/why-use-a-prime-number-in-hashcode
For example, all 32-bit integers are aligned to addresses divisible by 4. Check out the table below to visualize the effects of using a prime vs. non-prime modulus:
Input Mod8 Mod7 (prime)
0 0 0
4 4 4
8 0 1
12 4 5
16 0 2
20 4 6
24 0 3
28 4 0