yes its just a trie the problem with tries is they use a lot of memory depending on how big their alphabet is, so how do you
make them smaller.
IF you made a Trie to handle normal 16bit unicode
Each node would have 65536 children plus a value so that's 524,296 bytes per node
you enter a key "ABBA" and it is now = 2,097,184 bytes with a depth of 4
so then you make a 256 way trie that,s 2056 bytes per node indexed by bytes
you enter a key "ABBA" and it is now = 16,448 bytes and the depth of 8
so then you make a 16 way trie that's 136 bytes per node indexed by nibbles
you enter a key "ABBA" and it is now = 2,176 bytes and the depth of 16
and then you make a 4 way trie range so that's 40 bytes per node by 2 bits
you enter a key "ABBA" and it is now = 1280 bytes and the depth of 32
Next you convert the strings to utf8 and use a 16 way trie
you enter a key "ABBA" and it is now = 1088 bytes and the depth of 8
then make it sparse so it's only 1 to 16 children you use a quad as an index for the 16 nodes of 4 bits
and you reduce the node to 16 bytes so that it's nicely aligned
you enter a key "ABBA" and it is now = 144 bytes and the depth is 8+1 for the dangling value node
but once you load it up the total memory savings are between 32 to 64 times smaller.
The next step is to truncate it but I've been holding off on that while I figure out the properties
I've got a plan how to do it without increasing the node size and if you want to make a trie case insensitive you need to store the key so that is what I will do.
SQUINT 3, Sparse Quad Union Indexed Nibble Trie
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
I do not know the word trie. Have you got a namesake ?
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
Google Translate gives me the french verb "to try" from the english "trie". It finds also a geographic definition in France : Trie-sur-Baïse Thank you for the reply which gives the right definition. I will try to work on this concept while I have some time. I used a similar technic to compress a dictionary, and zip does not do a better compressing.
I think, first, a 64Kb array is required if only less than 256 characters are used. This does not change a lot the size of such a system. But it helps to grow the memory needs down.
Edit: after read the wiki article, I went to radix tree.
ex1:
ex2:
This 2nd ex gives ie "MainGadget #a", "MainWindow #2"
I think, first, a 64Kb array is required if only less than 256 characters are used. This does not change a lot the size of such a system. But it helps to grow the memory needs down.
Edit: after read the wiki article, I went to radix tree.
ex1:
Code: Select all
A - BBA
|\
| - KA
\
- LIAS
Code: Select all
Main - Window # - 1
| \
\ - 2
\
- Gadget # - a
\
- b
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
A radix tree is a little more complicated to implemen but the size and speed doesn't work out much different in practice, though it depends on the keys, when I implement shortening the key depth it will be interesting to see the difference.
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
Squint 3.2
Added Binary key functions and an optional Hash in numeric for large keys
added basic example of using binary keys and a test of using hash mode of numeric functions
https://github.com/idle-PB/Squint3
Added Binary key functions and an optional Hash in numeric for large keys
added basic example of using binary keys and a test of using hash mode of numeric functions
https://github.com/idle-PB/Squint3
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
Added Merge so writes and enumerations aren't blocked from each other. So it's lock free for concurrent readers, single writer, enumeration
This will later facilitate Acid gets and writes during enumerations, where
gets are immediate
writes are immediate
enumerations are immediate in respect to existing keys being updated during an enumeration
Test is with random keys 8 to 16 bytes with an initial size of 16m keys
8 readers 2 writers and 2 prefix enumerators
Test needs at least a 6 core processor with 12 threads
This will later facilitate Acid gets and writes during enumerations, where
gets are immediate
writes are immediate
enumerations are immediate in respect to existing keys being updated during an enumeration
Test is with random keys 8 to 16 bytes with an initial size of 16m keys
8 readers 2 writers and 2 prefix enumerators
Test needs at least a 6 core processor with 12 threads
Number of Keys 17,167,950
Memory 1378.52mb
Keysize 261.26 mb
Overhead 5.2764105797
Total Keys 11,037,573 p/s
Lookup Keys 9,921,537 p/s
Lookup Rate 120.18 mb p/s
Lookup Time 100.79 ns
Write Keys 488,509 p/s
Write Rate 7.39 mb p/s
Write Time 2.05 µs
Enums Keys 627,527
Enum Rate 7.10 mb p/s
Read thread 0 1,260,934
Read thread 1 1,229,538
Read thread 2 1,264,255
Read thread 3 1,206,778
Read thread 4 1,233,815
Read thread 5 1,241,885
Read thread 6 1,233,981
Read thread 7 1,250,351
Write thread 8 244,779
Write thread 9 243,730
Enum thread 10 557,395
Enum thread 11 70,132
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
changed license to Eclipse V2
This it still permissive as in you can link to it statically, it just means you need to share your changes or modifications to the library itself.
This it still permissive as in you can link to it statically, it just means you need to share your changes or modifications to the library itself.
Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie
v 3.3.2 b4
fixed a bug in the numeric functions asm backend.
Thanks Little John for report
fixed a bug in the numeric functions asm backend.
Thanks Little John for report