SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Post by **idle** » Tue Oct 10, 2023 3:18 am

yes its just a trie the problem with tries is they use a lot of memory depending on how big their alphabet is, so how do you
make them smaller.

IF you made a Trie to handle normal 16bit unicode

Each node would have 65536 children plus a value so that's 524,296 bytes per node
you enter a key "ABBA" and it is now = 2,097,184 bytes with a depth of 4

so then you make a 256 way trie that,s 2056 bytes per node indexed by bytes
you enter a key "ABBA" and it is now = 16,448 bytes and the depth of 8

so then you make a 16 way trie that's 136 bytes per node indexed by nibbles
you enter a key "ABBA" and it is now = 2,176 bytes and the depth of 16

and then you make a 4 way trie range so that's 40 bytes per node by 2 bits
you enter a key "ABBA" and it is now = 1280 bytes and the depth of 32

Next you convert the strings to utf8 and use a 16 way trie
you enter a key "ABBA" and it is now = 1088 bytes and the depth of 8

then make it sparse so it's only 1 to 16 children you use a quad as an index for the 16 nodes of 4 bits
and you reduce the node to 16 bytes so that it's nicely aligned
you enter a key "ABBA" and it is now = 144 bytes and the depth is 8+1 for the dangling value node
but once you load it up the total memory savings are between 32 to 64 times smaller.

The next step is to truncate it but I've been holding off on that while I figure out the properties
I've got a plan how to do it without increasing the node size and if you want to make a trie case insensitive you need to store the key so that is what I will do.

Olli · Post by **Olli** » Tue Oct 10, 2023 7:29 pm

I do not know the word trie. Have you got a namesake ?

Post by **idle** » Tue Oct 10, 2023 7:36 pm

Olli wrote: Tue Oct 10, 2023 7:29 pm I do not know the word trie. Have you got a namesake ?

https://en.wikipedia.org/wiki/Trie

Olli · Post by **Olli** » Tue Oct 10, 2023 9:08 pm

Google Translate gives me the french verb "to try" from the english "trie". It finds also a geographic definition in France : Trie-sur-Baïse Thank you for the reply which gives the right definition. I will try to work on this concept while I have some time. I used a similar technic to compress a dictionary, and zip does not do a better compressing.

I think, first, a 64Kb array is required if only less than 256 characters are used. This does not change a lot the size of such a system. But it helps to grow the memory needs down.

Edit: after read the wiki article, I went to radix tree.

ex1:

Code: Select all

A - BBA
|\
| - KA
 \
  - LIAS

ex2:

Code: Select all

Main - Window # - 1
    |          \
    \           - 2
     \
      - Gadget # - a
                \
                 - b

This 2nd ex gives ie "MainGadget #a", "MainWindow #2"

Post by **idle** » Wed Oct 11, 2023 2:42 am

A radix tree is a little more complicated to implemen but the size and speed doesn't work out much different in practice, though it depends on the keys, when I implement shortening the key depth it will be interesting to see the difference.

Post by **idle** » Fri Nov 03, 2023 7:29 am

Squint 3.2
Added Binary key functions and an optional Hash in numeric for large keys
added basic example of using binary keys and a test of using hash mode of numeric functions
https://github.com/idle-PB/Squint3

Post by **idle** » Wed Nov 27, 2024 10:45 pm

Added Merge so writes and enumerations aren't blocked from each other. So it's lock free for concurrent readers, single writer, enumeration
This will later facilitate Acid gets and writes during enumerations, where
gets are immediate
writes are immediate
enumerations are immediate in respect to existing keys being updated during an enumeration

Test is with random keys 8 to 16 bytes with an initial size of 16m keys
8 readers 2 writers and 2 prefix enumerators

Test needs at least a 6 core processor with 12 threads

Number of Keys 17,167,950
Memory 1378.52mb
Keysize 261.26 mb
Overhead 5.2764105797

Total Keys 11,037,573 p/s
Lookup Keys 9,921,537 p/s
Lookup Rate 120.18 mb p/s
Lookup Time 100.79 ns
Write Keys 488,509 p/s
Write Rate 7.39 mb p/s
Write Time 2.05 µs
Enums Keys 627,527
Enum Rate 7.10 mb p/s

Read thread 0 1,260,934
Read thread 1 1,229,538
Read thread 2 1,264,255
Read thread 3 1,206,778
Read thread 4 1,233,815
Read thread 5 1,241,885
Read thread 6 1,233,981
Read thread 7 1,250,351
Write thread 8 244,779
Write thread 9 243,730
Enum thread 10 557,395
Enum thread 11 70,132

Post by **idle** » Mon Dec 16, 2024 2:53 am

changed license to Eclipse V2
This it still permissive as in you can link to it statically, it just means you need to share your changes or modifications to the library itself.

Post by **idle** » Tue Dec 17, 2024 12:09 am

v 3.3.2 b4
fixed a bug in the numeric functions asm backend.
Thanks Little John for report

PureBasic Forums - English

SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie

Re: SQUINT 3, Sparse Quad Union Indexed Nibble Trie