Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
Posted: Tue Oct 10, 2023 3:18 am
yes its just a trie the problem with tries is they use a lot of memory depending on how big their alphabet is, so how do you
make them smaller.
IF you made a Trie to handle normal 16bit unicode
Each node would have 65536 children plus a value so that's 524,296 bytes per node
you enter a key "ABBA" and it is now = 2,097,184 bytes with a depth of 4
so then you make a 256 way trie that,s 2056 bytes per node indexed by bytes
you enter a key "ABBA" and it is now = 16,448 bytes and the depth of 8
so then you make a 16 way trie that's 136 bytes per node indexed by nibbles
you enter a key "ABBA" and it is now = 2,176 bytes and the depth of 16
and then you make a 4 way trie range so that's 40 bytes per node by 2 bits
you enter a key "ABBA" and it is now = 1280 bytes and the depth of 32
Next you convert the strings to utf8 and use a 16 way trie
you enter a key "ABBA" and it is now = 1088 bytes and the depth of 8
then make it sparse so it's only 1 to 16 children you use a quad as an index for the 16 nodes of 4 bits
and you reduce the node to 16 bytes so that it's nicely aligned
you enter a key "ABBA" and it is now = 144 bytes and the depth is 8+1 for the dangling value node
but once you load it up the total memory savings are between 32 to 64 times smaller.
The next step is to truncate it but I've been holding off on that while I figure out the properties
I've got a plan how to do it without increasing the node size and if you want to make a trie case insensitive you need to store the key so that is what I will do.
make them smaller.
IF you made a Trie to handle normal 16bit unicode
Each node would have 65536 children plus a value so that's 524,296 bytes per node
you enter a key "ABBA" and it is now = 2,097,184 bytes with a depth of 4
so then you make a 256 way trie that,s 2056 bytes per node indexed by bytes
you enter a key "ABBA" and it is now = 16,448 bytes and the depth of 8
so then you make a 16 way trie that's 136 bytes per node indexed by nibbles
you enter a key "ABBA" and it is now = 2,176 bytes and the depth of 16
and then you make a 4 way trie range so that's 40 bytes per node by 2 bits
you enter a key "ABBA" and it is now = 1280 bytes and the depth of 32
Next you convert the strings to utf8 and use a 16 way trie
you enter a key "ABBA" and it is now = 1088 bytes and the depth of 8
then make it sparse so it's only 1 to 16 children you use a quad as an index for the 16 nodes of 4 bits
and you reduce the node to 16 bytes so that it's nicely aligned
you enter a key "ABBA" and it is now = 144 bytes and the depth is 8+1 for the dangling value node
but once you load it up the total memory savings are between 32 to 64 times smaller.
The next step is to truncate it but I've been holding off on that while I figure out the properties
I've got a plan how to do it without increasing the node size and if you want to make a trie case insensitive you need to store the key so that is what I will do.