SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Share your advanced PureBasic knowledge/code with the community.
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Squint 2.0.5

Squint is a dynamic compact prefix Trie, It can by used as a Map or Numeric map though Squint has one distinct advantage over a map. Lexicographic enumeration or prefix searches. It's a dynamic structure which needs no predetermined size, and will perform as O(k*2) rather than degrading towards O(n) like a fully loaded map would. While it's Sets are almost 2 times that of a sized map, it's Gets are closer to 1:1. Memory use should also be smaller than a map

String mode functions Set, Get, Delete, Prune, Enumerate and Walk, keys can be Unicode, UTF8, Ascii or Integers, keys are mapped to UTF8
you can mix Unicode, UTF8 and Integer key input types as it's converted to UTF8 but be careful not to mix Ascii if above 126.

The SquintNumeric functions, SetNumeric, GetNumeric, DeleteNumeric, WalkNumeric Behaves just like a map, it's gets are generally faster than a map and runs at a constant speed.

Enumeration or walk callbacks are in the form *key, value.i, *userdata, though keys are UTF8, Ascii or Integers for the fast numeric walk function.

Test of random keys,
AMD FX(tm)-6100 Six-Core Processor 64 bit
Items 1000000
Squint Set 932 ms
Map Set 410 ms
Squint Get 487 ms
Map Get 379 ms
Squint Enum 0 ms
Map Enum 256 ms
=========================
Squint Set Ratio 2.273 slower
Squint Get Ratio 1.285 slower
Squint Enum Ratio +Infinity faster
=========================
Key Input Size 7.00 mb
Squint Memory Size 54.67 mb
Overhead 7.81 bytes input
Map Size ~ 49.07mb

Fast Numeric mode test
Items 1000000
Squint Numeric Set 572 ms
Map Set 402 ms
Squint numeric Get 244 ms
Map Get 390 ms
=========================
Squint Set Ratio 1.423 slower
Squint Get Ratio 1.598 faster
=========================
Key Input Size 7.63 mb
Squint Memory Size 25.71 mb
Overhead 3.37 bytes input
Map Size ~ 49.70mb

Code: Select all

; SQUINT 2, Sparse Quad Union Indexed Nibble Trie
; Copyright (c) 2020 Andrew Ferguson
; Version 2.0.5
; PB 5.72 x86 x64 Linux OSX Windows
; Thanks Wilbert for the high low insight and utf8 str asm help.
; Squint is a compact prefix Trie indexed by nibbles into a sparse array with performance metrics close to a map
; It provides O(K*2) performance with a memory size ~32 times smaller than a 256 node Trie
; The overheads are similar to that of a QP Trie, thats a 1/3 smaller than a Critbit.
; In terms of speed SquintSet and SquintGet should be at worst ~2 times slower than a Map.
; Lexographical enumerations are however magnitudes faster than what you could achieve otherwise
; In fast Numeric mode it's gets are closer to 1:1 with a map 
;
; see https://en.wikipedia.org/wiki/Trie
;     https://dotat.at/prog/qp/blog-2015-10-04.html
;     https://cr.yp.to/critbit.html 
;
; Squint supports Set, Get, Enum, Walk, Delete and Prune with a flag in Delete
; keys can be either Unicode, Ascii, UTF8 or Integers, the type must be specified 
; keys are all mapped to UTF8
;
; SquintNumeric (fast numeric) supports, SetNumeric GetNumeric DeleteNumeric and WalkNumeric
; it's provided as a direct subtitute for a numeric map, though lacks enumeration
; keys are returned as Integers  
;
; MIT License
; Permission is hereby granted, SquintFree of charge, to any person obtaining a copy
; of this software and associated documentation files (the "Software"), to deal
; in the Software without restriction, including without limitation the rights
; To use, copy, modify, merge, publish, distribute, sublicense, and/or sell
; copies of the Software, and to permit persons to whom the Software is
; furnished to do so, subject to the following conditions:
; 
; The above copyright notice and this permission notice shall be included in all
; copies or substantial portions of the Software.
; 
; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS Or
; IMPLIED, INCLUDING BUT Not LIMITED To THE WARRANTIES OF MERCHANTABILITY,
; FITNESS For A PARTICULAR PURPOSE And NONINFRINGEMENT. IN NO EVENT SHALL THE
; AUTHORS Or COPYRIGHT HOLDERS BE LIABLE For ANY CLAIM, DAMAGES Or OTHER
; LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT Or OTHERWISE, ARISING FROM,
; OUT OF Or IN CONNECTION With THE SOFTWARE Or THE USE Or OTHER DEALINGS IN THE
; SOFTWARE. 

DeclareModule SQUINT 
  
  #SQUINT_MAX_KEY = 1024
  
  Structure squint_node 
    *vertex.edge
    StructureUnion
      squint.q
      value.i 
    EndStructureUnion 
  EndStructure   
  
  Structure edge 
    e.squint_node[0]
  EndStructure 
  
  Structure squint
    *vt
    *root.squint_node
    size.l
    datasize.l
    count.l
    ib.a[22]
    sb.a[#SQUINT_MAX_KEY]
  EndStructure
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86 
    #Squint_Pmask = $ffffffff
  CompilerElse
    #Squint_Pmask = $ffffffffffff
  CompilerEndIf
  
  ;-Squint Callback prototype 
  Prototype Squint_CB(*key,*value=0,*userdata=0)
  
  Declare SquintNew()
  Declare SquintFree(*this.Squint)
  Declare SquintDelete(*this.squint,*key,prune=0,mode=#PB_Unicode)
  Declare SquintSet(*this.squint,*key,value.i,mode=#PB_Unicode)
  Declare SquintGet(*this.squint,*key,mode=#PB_Unicode)
  Declare SquintEnum(*this.squint,*key,*pfn.squint_CB,*userdata=0,ReturnMatch=0,mode=#PB_Unicode)
  Declare SquintWalk(*this.squint,*pfn.squint_CB,*userdata=0)
  Declare SquintSetNumeric(*this.squint,key.i,value.i)
  Declare SquintGetNumeric(*this.squint,key.i)
  Declare SquintDeleteNumeric(*this.squint,key.i)
  Declare SquintWalkNumeric(*this.squint,*pfn.squint_CB,*userdata=0)
  
  ;-Squint Inteface iSquint  
  Interface iSquint
    Free()
    Delete(*key,prune=0,mode=#PB_Unicode)
    Set(*key,value.i,mode=#PB_Unicode)
    Get(*key,mode=#PB_Unicode)
    Enum(*key,*pfn.squint_CB,*userdata=0,ReturnMatch=0,mode=#PB_Unicode)
    Walk(*pfn.squint_CB,*userdata=0)
    SetNumeric(key.i,value.i) 
    GetNumeric(key.i) 
    SquintDeleteNumeric(key.i)
    WalkNumeric(*pfn.Squint_CB,*userdata=0)
  EndInterface
  
  DataSection: vtSquint:
    Data.i @SquintFree()
    Data.i @SquintDelete() 
    Data.i @SquintSet()
    Data.i @SquintGet()
    Data.i @SquintEnum()
    Data.i @SquintWalk()
    Data.i @SquintSetNumeric()
    Data.i @SquintGetNumeric()
    Data.i @SquintDeleteNumeric()
    Data.i @SquintWalkNumeric() 
  EndDataSection   
  
EndDeclareModule

Module SQUINT
  
  EnableExplicit
  ;-macros 
  Macro _SETINDEX(in,index,number)
    in = in & ~(15 << (index << 2)) | (number << (index << 2))
  EndMacro
  
  Macro _GETNODECOUNT()
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
      nodecount = MemorySize(*node\vertex) / SizeOf(squint_node)
    CompilerElse
      nodecount = (*node\vertex >> 48)
    CompilerEndIf
  EndMacro
  
  Macro _POKENHL(in,Index,Number)
    *Mem.Ascii = in
    *Mem + Index >> 1
    If Index & 1
      *Mem\a = (*Mem\a & $f0) | (Number & $f)
    Else
      *Mem\a = (*Mem\a & $0f) | (Number << 4)
    EndIf
  EndMacro
  
  Macro _STR() 
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
      ; backup registers
      !mov [rsp - 8], rbx
      !mov [rsp - 16], rdi
      ; get procedure arguments
      !mov rax, [p.v_char]
      !mov rdi, [p.p_out] 
      ; determine sign
      !xor ebx, ebx
      !cmp rax, rbx
      !jge .l0
      !neg rax
      !mov rdx, 1 
      !mov [rsp - 24],rdx
      !.l0:
      ; convert number to ascii characters
      !add rdi, 22 
      !mov byte [rdi], 0
      !mov rbx, 0xcccccccccccccccd
      !.l1:
      !mov rcx, rax
      !mul rbx
      !mov rax, rdx
      !shr rax, 3
      !imul rdx, rax, 10
      !sub rcx, rdx
      !or rcx, 0x30
      !sub rdi, 1
      !mov [rdi], cl
      !test rax, rax
      !jnz .l1
      ;look up sign 
      !mov rax,[rsp-24]
      !cmp rax,1
      !jnz .l2
      !sub rdi, 1 
      !mov byte [rdi], '-'
      !.l2:
      ;store start of string 
      !mov [p.v_offset],rdi 
      ;restore registers 
      !mov rdi, [rsp - 16]
      !mov rbx, [rsp - 8]
    CompilerElse
      ; backup registers
      !mov [esp - 4], ebx
      !mov [esp - 8], edi
      ; get procedure arguments
      !mov eax, [p.v_char]
      !mov edi, [p.p_out]
      ; determine length
      !xor ebx, ebx
      !cmp eax, ebx
      !jge .l0
      !neg eax
      !mov edx, 1 
      !mov [esp - 12],edx
      !.l0:
      ; convert number to ascii characters
      !add edi, 22
      !mov byte [edi], 0
      !mov ebx, 0xcccccccd
      !.l1:
      !mov ecx, eax
      !mul ebx
      !mov eax, edx
      !shr eax, 3
      !imul edx, eax, 10
      !sub ecx, edx
      !or ecx, 0x30
      !sub edi, 1
      !mov [edi], cl
      !test eax, eax
      !jnz .l1
      ;look up sign
      !mov eax,[esp-12]
      !cmp eax,1
      !jnz .l2
      !sub edi, 1 
      !mov byte [edi], '-'
      !.l2:
      ;store offset to start of string
      !mov [p.v_offset],edi 
      ;restore registers 
      !mov edi, [esp - 8]
      !mov ebx, [esp - 4]
    CompilerEndIf
    
  EndMacro
  
  Macro _CONVERTUTF8() 
    If mode <> #PB_Integer 
      char= PeekU(*key)
    Else    
      char = PeekI(*key)
      *out = @*this\ib[0]
      _STR()
      *key = offset 
      mode = #PB_Ascii
      char= PeekU(*key)
    EndIf 
    
    If mode = #PB_Unicode  
      !mov eax, [p.v_char]
      !cmp eax, 0x0080
      !jb .l3
      !cmp eax, 0x0800
      !jae .l4
      !shl eax, 2
      !shr al, 2
      !or eax, 1100000010000000b
      !bswap eax
      !shr eax, 16
      !jmp .l3
      !.l4:
      !shl eax, 4
      !shr ax, 2
      !shr al, 2
      !or eax, 111000001000000010000000b
      !bswap eax
      !shr eax, 8
      !.l3:
      !mov [p.v_char],eax
    EndIf 
  EndMacro 
  
  Macro _MODECHECK()
    _CONVERTUTF8()
    If mode <> #PB_Unicode
      If (char >> ((count&1)<<4) & $ff = 0)
        Break
      EndIf 
    Else 
      *this\datasize+1
    EndIf
  EndMacro 
  
  Macro _SETNODE()
    If *node\vertex
      _GETNODECOUNT()
      If (offset <> 15 Or nodecount = 16)
        *node = *node\Vertex\e[offset] & #Squint_Pmask
      Else  
        *this\size + SizeOf(squint_node)
        offset = nodecount
        nodecount+1
        *node\vertex = ReAllocateMemory(*node\vertex & #Squint_Pmask,(nodecount)*SizeOf(squint_node))
        CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 
          *node\vertex | ((nodecount) << 48)
        CompilerEndIf
        _SETINDEX(*node\squint,idx,offset)
        *node = *node\Vertex\e[offset] & #Squint_Pmask
      EndIf
    Else
      *this\size + SizeOf(squint_node)
      *node\vertex = AllocateMemory(SizeOf(squint_Node))
      CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
        *node\vertex | (1 << 48)
      CompilerEndIf
      *node\squint = -1
      _SETINDEX(*node\squint,idx,0)
      *node = *node\Vertex\e[0] & #Squint_Pmask
      *this\count+1
    EndIf
  EndMacro
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86 
    Macro rax : eax : EndMacro 
  CompilerEndIf   
  ;-General functions 
  Procedure SquintNew()
    Protected *this.squint,a
    *this = AllocateMemory(SizeOf(squint))
    If *this
      *this\vt = ?vtSquint
      *this\root = AllocateMemory(SizeOf(squint_node))
      ProcedureReturn *this
    EndIf
  EndProcedure
  
  Procedure ISquintFree(*this.squint,*node.squint_node=0)
    Protected a,offset,nodecount
    If Not *node
      ProcedureReturn 0
    EndIf
    For a=0 To 15
      offset = (*node\squint >> (a<<2)) & $f
      If *node\vertex
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ISquintFree(*this,*node\Vertex\e[offset] & #Squint_Pmask)
        EndIf
      EndIf
    Next
    If *node\vertex
      _GETNODECOUNT()
      *this\size - nodecount
      *this\count - 1
      FreeMemory(*node\Vertex & #Squint_Pmask)
      *node\vertex=0
    EndIf
    ProcedureReturn *node
  EndProcedure
  
  Procedure SquintFree(*this.squint)
    Protected a,offset,*node.squint_node,nodecount
    *node = *this\root
    For a=0 To 15
      offset = (*node\squint >> (a<<2)) & $f
      If *node\vertex
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ISquintFree(*this,*node)
        EndIf
      EndIf
    Next
    FreeMemory(*this\root)
    nodecount = *this\count
    FreeMemory(*this)
    ProcedureReturn nodecount
  EndProcedure
  ;-string functions 
  Procedure SquintDelete(*this.squint,*key.Unicode,prune=0,mode=#PB_Unicode)
    Protected *node.squint_node,idx,*mem.Character,offset,nodecount,char.i,count,*out
    *node = *this\root
    _CONVERTUTF8()
    While char
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      If *node\vertex
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      Else
        ProcedureReturn 0
      EndIf
      If *node
        offset = (*node\squint >> ((char & $0f) << 2)) & $f
        If *node\vertex
          _GETNODECOUNT()
          If (offset <> 15 Or nodecount = 16)
            *node = *node\Vertex\e[offset] & #Squint_Pmask
          EndIf
        Else
          ProcedureReturn 0
        EndIf
      EndIf
      char >> 8
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    If prune
      ISquintFree(*this,*node)
      If (*node\vertex & #Squint_Pmask) = 0
        *node\squint = 0
      EndIf
    Else
      offset = *node\squint & $f
      _GETNODECOUNT()
      If offset <= nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
        If (*node\vertex & #Squint_Pmask) = 0
          *node\squint = 0
        EndIf
      Else
        ProcedureReturn 0
      EndIf
    EndIf
  EndProcedure
  
  Procedure SquintSet(*this.squint,*key,value.i,mode=#PB_Unicode)
    Protected *node.squint_node,idx,offset,nodecount,char.i,count,*out
    *node = *this\root & #Squint_Pmask
    _CONVERTUTF8()
    While char
      idx = (char >> 4) & $f
      offset = (*node\squint >> (idx<<2)) & $f
      _SETNODE()
      idx = char & $0f
      offset = (*node\squint >> (idx<<2)) & $f
      _SETNODE()
      *this\datasize+1
      char >> 8
      count+1
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    idx=0
    offset = *node\squint & $f
    _SETNODE()
    If value 
      *node\value = value
    EndIf 
    ProcedureReturn *node 
  EndProcedure
  
  Procedure SquintGet(*this.squint,*key,mode=#PB_Unicode)
    Protected *node.squint_Node,idx,offset,nodecount,char.i,count,*out
    *node = *this\root & #Squint_Pmask
    _CONVERTUTF8()
    While char
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      offset = (*node\squint >> ((char & $0f) << 2)) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      char >> 8
      count+1
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    offset = *node\squint & $f
    _GETNODECOUNT()
    If offset <= nodecount
      *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      ProcedureReturn *node\value
    Else
      ProcedureReturn 0
    EndIf
  EndProcedure
      
  Procedure IEnum(*this.squint,*node.squint_Node,depth,*pfn.squint_CB,*userdata=0)
    Protected a.i,offset,nodecount,*mem.Ascii 
    If Not *node
      ProcedureReturn 0
    EndIf
    For a=0 To 15
      offset = (*node\squint >> (a<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          _POKENHL(@*this\sb,depth,a)
          IEnum(*this,*node\Vertex\e[offset] & #Squint_Pmask,depth+1,*pfn,*userdata)
        EndIf
      EndIf
    Next
    If *node\vertex=0
      If *pfn
        PokeA(@*this\sb+((depth>>1)),0)
        *pfn(@*this\sb,*node\value,*userdata)
      EndIf
    EndIf
    ProcedureReturn *node
  EndProcedure
  
  Procedure SquintEnum(*this.squint,*key.Unicode,*pfn.squint_CB,*userdata=0,ReturnMatch=0,mode=#PB_Unicode)
    Protected *node.squint_Node,idx,*mem.Ascii,offset,nodecount,depth,char.i,count,*out
    If ReturnMatch
      If SquintGet(*this,*key)
        If *pfn
          *pfn(*key,0,*userdata)
        EndIf
        ProcedureReturn
      EndIf
    EndIf
    *node = *this\root
    _CONVERTUTF8()
    While char 
      idx = (char >> 4) & $f
      offset = (*node\squint >> (idx<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ;POKENHL(@*this\sb,depth,idx)
          *mem = @*this\sb+(depth>>1) 
          *mem\a = (*mem\a & $0f) | (idx<<4)
          depth+1
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      EndIf
      If Not *node Or *node\vertex = 0
        ProcedureReturn 0
      EndIf
      idx = char & $f
      offset = (*node\squint >> (idx<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ;POKENHL(@*this\sb,depth,idx)
          *mem = @*this\sb+(depth>>1) 
          *Mem\a = (*Mem\a & $f0) | (idx & $f)
          depth+1
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      EndIf
      If Not *node Or *node\vertex = 0
        ProcedureReturn 0
      EndIf
      char >> 8
      count+1
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    IEnum(*this,*node,depth,*pfn,*userdata)
  EndProcedure
  
  Procedure SquintWalk(*this.squint,*pfn.squint_CB,*userdata=0) 
    IEnum(*this,*this\root,0,*pfn,*userdata)
  EndProcedure
  
  ;-Numeric functions
  Procedure SquintSetNumeric(*this.squint,key.i,value.i)
    Protected *node.squint_node,idx,offset,nodecount,char.i,count 
    *node = *this\root & #Squint_Pmask
    char = key
    EnableASM
    mov rax, char
    bswap rax;
    mov char,rax
    DisableASM 
    While count < SizeOf(Integer) 
      idx = (char >> 4) & $f
      offset = (*node\squint >> (idx<<2)) & $f
      _SetNODE()
      idx = (char & $f)
      offset = (*node\squint >> (idx<<2)) & $f
      _SetNODE()
      *this\datasize+1
      char >> 8
      count+1
    Wend
    *node\value = value
    ProcedureReturn 
  EndProcedure
  
  Procedure SquintGetNumeric(*this.squint,key.i)
    Protected *node.squint_Node,idx,offset,nodecount,char.i,count,skey.s
    *node = *this\root & #Squint_Pmask
    char = key
    EnableASM
    mov rax, char
    bswap rax;
    mov char,rax
    DisableASM 
    While count < SizeOf(Integer)
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      offset = (*node\squint >> ((char & $0f) << 2)) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      char>>8
      count+1
    Wend
    ProcedureReturn *node\value
  EndProcedure
  
  Procedure SquintDeleteNumeric(*this.squint,key.i)
    Protected *node.squint_node,idx,*mem.Character,offset,nodecount,char.i,count
    *node = *this\root & #Squint_Pmask
    char = key
    EnableASM
    mov rax, char
    bswap rax;
    mov char,rax
    DisableASM 
    While count < SizeOf(Integer)
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      offset = (*node\squint >> ((char & $0f) << 2)) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      char>>8
      count+1
    Wend
    If (*node\vertex & #Squint_Pmask) = 0
      *node\squint = 0
    EndIf
  EndProcedure
  
  Procedure IEnumNumeric(*this.squint,*node.squint_Node,idx,depth,*pfn.squint_CB,*userdata=0)
    Protected a.i,offset,nodecount,*mem.Ascii,char.i 
    If Not *node
      ProcedureReturn 0
    EndIf
    For a=0 To 15 
      offset = (*node\squint >> (a<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          _POKENHL(@*this\sb,depth,a)
          IEnumNumeric(*this,*node\Vertex\e[offset] & #Squint_Pmask,0,depth+1,*pfn,*userdata)
        EndIf
      EndIf
    Next
    If *node\vertex=0
      char = PeekI(@*this\sb)
      EnableASM
      mov rax, char
      bswap rax;
      mov char,rax
      DisableASM 
      If *pfn
        *pfn(@char,*node\value,*userdata)
      EndIf
    EndIf
    ProcedureReturn *node
  EndProcedure
  
  Procedure SquintWalkNumeric(*this.squint,*pfn.squint_CB,*userdata=0) 
    IEnumNumeric(*this,*this\root,0,0,*pfn,*userdata)
  EndProcedure
  
  DisableExplicit
  ;-End of module   
EndModule

CompilerIf #PB_Compiler_IsMainFile
  
  UseModule SQUINT
  
  Global msg.s
  
  Procedure Map_Enum(Map mp.i(),key.s,len) 
    Protected NewList items.s() 
    Protected word.s
    ForEach mp() 
      word = MapKey(mp())
      If Left(word,len) = key 
        AddElement(items()) 
        items() = word 
      EndIf 
    Next   
    SortList(items(),#PB_Sort_Ascending) 
  EndProcedure   
  
  Procedure CBSquint(*key,*value,*userData)
    If *key 
      msg + PeekS(*key,-1,#PB_UTF8) + #LF$
    EndIf   
  EndProcedure
  
  Procedure CBSquintUTF8(*key,*value,*userData)
    Debug PeekS(*key,-1,#PB_UTF8) + " " + PeekS(*userData)
  EndProcedure
  
  Procedure CBSquintAscii(*key,*value,*userData)
    Debug PeekS(*key,-1,#PB_Ascii) + " " + PeekS(*userData)
  EndProcedure
  
  Procedure CBSquintInteger(*key,*value,*userData)
    Debug PeekS(*userData)  + " " + Str(PeekI(*key)) + " / " + Str(*value)  
  EndProcedure
  
  Define ct=1000000
  Define *mt.squint = SquintNew()
  Define NewMap mp.i(ct)
  Define key.s,key1.s,*utf8,ikey
  Define TrieSet,TrieGet,MapSet,MapGet,TrieEnum,MapEnum,TrieSquintSetNum,TriGetNum
  Define st,et,a,seed = 1234
  
  CompilerIf #PB_Compiler_Debugger
    
    ;unicode
    key = Chr($f6) + Chr($20ac) + Chr($e0)
    SquintSet(*mt,@key,@key)
    Debug PeekS(SquintGet(*mt,@key)) + " test unicode SquintGet "
    key = Chr($f6)
    SquintEnum(*mt,@key,@CBSquintUTF8(),@"test unicode SquintEnum")
    
    ;UTF8
    key = Chr($f6) + Chr($20ac) + Chr($e0)
    *utf8 = UTF8(key)
    SquintSet(*mt,*utf8,*utf8,#PB_UTF8) ;overwrite the unicode key 
    Debug PeekS(SquintGet(*mt,*utf8,#PB_UTF8),-1,#PB_UTF8) + " test utf8 SquintGet"
    FreeMemory(*utf8) 
    *utf8 = UTF8(Chr($f6))
    SquintEnum(*mt,*utf8,@CBSquintUTF8(),@"test utf8 SquintEnum",0,#PB_UTF8)
    FreeMemory(*utf8) 
    
    ;Ascii 
    *ascii = Ascii("An ascii string") 
    SquintSet(*mt,*ascii,*ascii,#PB_Ascii) 
    Debug PeekS(SquintGet(*mt,*ascii,#PB_Ascii),-1,#PB_Ascii) 
    FreeMemory(*ascii)
    *ascii = Ascii("An")
    SquintEnum(*mt,*ascii,@CBSquintAscii(),@"testing Ascii SquintEnum",0,#PB_Ascii) 
    FreeMemory(*ascii) 
    
    ;Numeric string 
    a = -123456
    SquintSet(*mt,@a,a,#PB_Integer) 
    Debug SquintGet(*mt,@a,#PB_Integer) 
    a = -123
    SquintEnum(*mt,@a,@CBSquintUTF8(),@"test numeric SquintEnum",0,#PB_Integer)
    
    ;Fast Numeric 
    SquintFree(*mt) 
    *mt = SquintNew() 
    
    For a = 500 To 1000
      SquintSetNumeric(*mt,a,a) 
    Next   
    
    For a = 500 To 1000
      Debug SquintGetNumeric(*mt,a)
    Next   
    a=100
    SquintDeleteNumeric(*mt,a)
    
    Debug "test delete " + Str(SquintGetNumeric(*mt,a)) 
    Debug "test SquintWalk"
    SquintWalkNumeric(*mt,@CBSquintInteger(),@"SquintNumeric Key / Value: ")
    
    SquintFree(*mt) 
    
    End 
  CompilerEndIf
  
  RandomSeed(seed)
  st = ElapsedMilliseconds()
  For a = 0 To ct
    ikey = Random($FFFFFF)
    SquintSet(*mt,@ikey,ikey,#PB_Integer)
  Next
  et = ElapsedMilliseconds()
  TrieSet = et-st
  
  RandomSeed(seed)
  st = ElapsedMilliseconds()
  For a = 0 To ct
    ikey = Random($FFFFFF)
    mp(Str(ikey)) = ikey
  Next
  et = ElapsedMilliseconds()
  MapSet = et-st
  
  RandomSeed(seed+1)
  st = ElapsedMilliseconds()
  For a = 0 To ct
    ikey = Random($FFFFFF)
    out = SquintGet(*mt,@ikey,#PB_Integer)
    If (ikey <> out And out <> 0)
      MessageRequester("error", Str(ikey) + " " + Str(out))
      End
    EndIf
  Next
  et = ElapsedMilliseconds()
  TrieGet = et-st
  
  RandomSeed(seed+1)
  st = ElapsedMilliseconds() 
  For a = 0 To ct
    ikey = Random($FFFFFF)
    out = mp(Str(ikey))
    If (ikey <> out And  out <> 0)
      MessageRequester("error", Str(ikey) + " " + Str(out))
      End
    EndIf
  Next
  et = ElapsedMilliseconds()
  MapGet = et-st
  
  st = ElapsedMilliseconds() 
  key = "575"
  SquintEnum(*mt,@key,0)
  et = ElapsedMilliseconds() 
  TrieEnum = et-st 
  
  st = ElapsedMilliseconds() 
  key = "575"
  Map_Enum(mp(),key,3)
  et = ElapsedMilliseconds() 
  MapEnum = et-st
  
  msg.s = Trim(CPUName()) + " " + Str(SizeOf(integer)*8) + " bit" + #LF$
  msg.s + "Items " + Str(ct) + #LF$
  msg + "Squint Set " + Str(TrieSet) + " ms"+ #LF$
  msg + "Map Set " + Str(MapSet) + " ms" + #LF$
  msg + "Squint Get " + Str(TrieGet) + " ms" + #LF$
  msg + "Map Get " + Str(MapGet) + " ms" + #LF$
  msg + "Squint Enum " + Str(TrieEnum) + " ms" + #LF$
  msg + "Map Enum " + Str(MapEnum) + " ms" + #LF$
  msg + "=========================" + #LF$
  msg + "Squint Set Ratio " + StrF(TrieSet / MapSet,3) + " slower" + #LF$
  msg + "Squint Get Ratio " + StrF(TrieGet / MapGet,3) + " slower " + #LF$
  msg + "Squint Enum Ratio " + StrF(MapEnum / TrieEnum,3) + " faster " + #LF$
  msg + "=========================" + #LF$
  msg + "Key Input Size " + StrF(*mt\datasize / 1024 / 1024,2) + " mb" + #LF$
  msg + "Squint Memory Size " + StrF(*mt\size / 1024 / 1024,2) + " mb" + #LF$
  msg + "Overhead " + StrF(*mt\size / *mt\datasize,2) + " bytes input" + #LF$
  msg + "Map Size ~ " + StrF(((4*SizeOf(Integer)*MapSize(mp())*1.42)+*mt\datasize) / 1024 / 1024,2) + "mb" + #LF$
  
  SetClipboardText(msg)
  MessageRequester("SQUINT strings keys",msg)
  
  msg=""
  in.s = "57567"
  SquintEnum(*mt,@in,@cbsquint())
  SquintDelete(*mt,@in,1) 
  msg + "Pruned from " + in + #LF$ 
  SquintEnum(*mt,@in,@cbsquint())
  msg + "check pune worked  SquintEnum again" + #LF$ 
  nodecount = SquintFree(*mt) 
  msg + "SquintFree check node count is zero " + Str(Nodecount)  
  MessageRequester("Enumeration",msg)
  
  ;numeric test 
  Define ikey.i
  msg=""
  
  FreeMap(Mp())
  NewMap mp.i(ct)
  *mt.SQUINT = SquintNew() 
  
  RandomSeed(seed)
  st = ElapsedMilliseconds()
  For a = 0 To ct
    ikey = Random($FFFFFF)
    SquintSetNumeric(*mt,ikey,ikey)
  Next
  et = ElapsedMilliseconds()
  TrieSet = et-st
  
  RandomSeed(seed)
  st = ElapsedMilliseconds()
  For a = 0 To ct
    ikey = Random($FFFFFF)
    mp(Str(ikey)) = ikey
  Next
  et = ElapsedMilliseconds()
  MapSet = et-st
  
  RandomSeed(seed+1)
  st = ElapsedMilliseconds()
  For a = 0 To ct
    ikey = Random($FFFFFF)
    out = SquintGetNumeric(*mt,ikey)
    If (ikey <> out And  out <> 0)
      MessageRequester("error", Str(ikey) + " " + Str(out)) 
      End
    EndIf
  Next
  et = ElapsedMilliseconds()
  TrieGet = et-st
  
  RandomSeed(seed+1)
  st = ElapsedMilliseconds() 
  For a = 0 To ct
    ikey = Random($FFFFFF)
    out = mp(Str(ikey))
    If (ikey <> out And  out <> 0)
      MessageRequester("error", Str(ikey) + " " + Str(out))
      End
    EndIf
  Next
  et = ElapsedMilliseconds()
  MapGet = et-st
  
  msg.s + "Items " + Str(ct) + #LF$
  msg + "Squint Numeric Set " + Str(TrieSet) + " ms"+ #LF$
  msg + "Map Set " + Str(MapSet) + " ms" + #LF$
  msg + "Squint numeric Get " + Str(TrieGet) + " ms" + #LF$
  msg + "Map Get " + Str(MapGet) + " ms" + #LF$
  msg + "=========================" + #LF$
  msg + "Squint Set Ratio " + StrF(TrieSet / MapSet,3) + " slower" + #LF$
  msg + "Squint Get Ratio " + StrF(MapGet / TrieGet,3) + " faster" + #LF$
  msg + "=========================" + #LF$
  msg + "Key Input Size " + StrF(*mt\datasize / 1024 / 1024,2) + " mb" + #LF$
  msg + "Squint Memory Size " + StrF(*mt\size / 1024 / 1024,2) + " mb" + #LF$
  msg + "Overhead " + StrF(*mt\size / *mt\datasize,2) + " bytes input" + #LF$
  msg + "Map Size ~ " + StrF(((4*SizeOf(Integer)*MapSize(mp())*1.42)+*mt\datasize) / 1024 / 1024,2) + "mb" + #LF$
  
  SetClipboardText(msg)
  MessageRequester("SQUINT Integer keys",msg)
  
  SquintFree(*mt) 
  
CompilerEndIf


Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
NicTheQuick
Addict
Addict
Posts: 1224
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by NicTheQuick »

Nice work!

Code: Select all

Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz 64 bit
Items 1000000
Squint Set 775 ms
Map Set 393 ms
Squint Get 502 ms
Map Get 421 ms
Squint Enum 1 ms
Map Enum 245 ms
=========================
Squint Set Ratio 1.972 slower
Squint Get Ratio 1.192 slower 
Squint Enum Ratio 245.000 faster 
=========================
Key Input Size 13.99 mb
Squint Memory Size 54.67 mb
Overhead 3.91 bytes input
Map Size ~ 46.49mb
Keep in mind that your speed test also measures the time of Random, Str and Val.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

NicTheQuick wrote:Nice work!
Keep in mind that your speed test also measures the time of Random, Str and Val.
Thanks Nic, my aim was to have equally representative tests using the same random sequence, the loops are essentially equivalent and the additional code should help smooth out time slices a little, so hopefully it gets a more realistic ratio for the sets and gets. It will vary a bit but it shows that's it fast and surprisingly good for plain pb. Sets are ~2 times slower and Gets ~1.5 time slower. It should also scale better the more you load it.
AMD FX(tm)-6100 Six-Core Processor 64 bit
Items 10,000,000
Squint Set 14980 ms
Map Set 8704 ms
Squint Get 14750 ms
Map Get 9036 ms
Squint Enum 3 ms
Map Enum 3701 ms
=========================
Squint Set Ratio 1.721 slower
Squint Get Ratio 1.632 slower
Squint Enum Ratio 1233.667 faster
=========================
Key Input Size 139.96 mb
Squint Memory Size 269.53 mb
Overhead 1.93 bytes input
Map Size ~ 464.97mb
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

updated 2.0.1a
Added on the fly UTF8 conversion with help from wilbert, with no adverse loss of speed.
fixed bug on x86, adding mode parameter and testing for null bytes for string termination, technically it should also apply to x64, though it seems to work fine without it.
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Squint 2.0.2a
fixed x86 utf8 mode, previous attempt didn't work properly.
tested ascii modes
ö€à test unicode get
ö€à test unicode enum
ö€à test utf8 get 2
ö€à test utf8 enum
An ascii string
An ascii string test ascii enum
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Added Numeric key support so it can be used as a numeric map,

test map sized to 1,000,000 items
Items 1000000
Squint Numeric Set 791 ms
Map Set 527 ms
Squint numeric Get 352 ms
Map Get 483 ms
=========================
Squint Set Ratio 1.501 slower
Squint Get Ratio 1.372 faster
=========================
Key Input Size 7.63 mb
Squint Memory Size 25.71 mb
Overhead 3.37 bytes input
Map Size ~ 49.70mb
test vs a dynamically sized map, sized to 10,000 elements, maps not very dynamic!
Items 1000000
Squint Numeric Set 554 ms
Map Set 4351 ms
Squint numeric Get 265 ms
Map Get 9174 ms
=========================
Squint Set Ratio 0.127 slower
Squint Get Ratio 34.619 faster
=========================
Key Input Size 7.63 mb
Squint Memory Size 25.71 mb
Overhead 3.37 bytes input
Map Size ~ 49.70mb
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Added Integer key support to the string mode functions, It provides a marginally faster conversion than calling str() on input.
If you don't need enumeration functions for numeric inputs the fast numeric functions are a better option.
AMD FX(tm)-6100 Six-Core Processor 64 bit
Items 1000000
Squint Set 940 ms
Map Set 414 ms
Squint Get 488 ms
Map Get 380 ms
Squint Enum 1 ms
Map Enum 256 ms
=========================
Squint Set Ratio 2.271 slower
Squint Get Ratio 1.284 slower
Squint Enum Ratio 256.000 faster
=========================
Key Input Size 7.00 mb
Squint Memory Size 54.67 mb
Overhead 7.81 bytes input
Map Size ~ 49.07mb
vs fast numeric
Items 1000000
Squint Numeric Set 587 ms
Map Set 401 ms
Squint numeric Get 251 ms
Map Get 391 ms
=========================
Squint Set Ratio 1.464 slower
Squint Get Ratio 1.558 faster
=========================
Key Input Size 7.63 mb
Squint Memory Size 25.71 mb
Overhead 3.37 bytes input
Map Size ~ 49.70mb
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

2.0.5b
Removed the string length loop from the _STR macro

Test Squint string mode with numeric input against a sized map
AMD FX(tm)-6100 Six-Core Processor 64 bit
Items 1000000
Squint Set 943 ms
Map Set 424 ms
Squint Get 484 ms
Map Get 385 ms
Squint Enum 1 ms
Map Enum 266 ms
=========================
Squint Set Ratio 2.224 slower
Squint Get Ratio 1.257 slower
Squint Enum Ratio 266.000 faster
=========================
Key Input Size 7.00 mb
Squint Memory Size 54.67 mb
Overhead 7.81 bytes input
Map Size ~ 49.07mb
SquintNumeric mode (no string conversion) vs a sized map
Items 1000000
Squint Numeric Set 578 ms
Map Set 401 ms
Squint numeric Get 273 ms
Map Get 396 ms
=========================
Squint Set Ratio 1.441 slower
Squint Get Ratio 1.451 faster
=========================
Key Input Size 7.63 mb
Squint Memory Size 25.71 mb
Overhead 3.37 bytes input
Map Size ~ 49.70mb


Trying to accommodate unicode,utf8,ascii and integer input has been a bit of a challenge though it's still performing reasonably well. if you use Integers with Squint string functions, make sure you pass them by address.
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

An example of using squint to find an array of strings in a space separated string.

Code: Select all

;Example of a FindStrings to find multiple occurances of tokens and return their positions

IncludeFile "Squint2.pbi"

UseModule SQUINT 
EnableExplicit 

Structure Item 
  key.s 
  count.l
  len.l
  List positions.i()
EndStructure 

Structure FindStrings 
  List *item.item() 
EndStructure

Global FindStringsItems.FindStrings 
  
Procedure FindStrings(*squint.squint,*source,*keys,*items.FindStrings=0) 
  Protected *inp.Character,*node.squint_node,*sp    
  Protected count,key.s,*item.item    
  
  If *source 
    *inp = *source 
    *sp = *source 
    While *inp\c <> 0
      While *inp\c > 32 
        *inp+2 
      Wend 
      key = PeekS(*source,(*inp-*source)>>1)
      *node = Squintset(*squint,@key,0) 
      If Not *node\value 
        *item =  AllocateStructure(item)
        AddElement(*item\positions()) 
        *item\positions() = *source - *sp
        *item\count = 1 
        *item\key = key 
        *item\len = (*inp-*source)>>1
        *node\value = *item
      Else   
        *item = *node\value 
        AddElement(*item\positions())
        *item\positions() = *source - *sp
        *item\len = (*inp-*source)>>1
        *item\count + 1 
      EndIf   
      If *inp\c <> 0
        *inp+2
        *source = *inp 
      Else 
        Break 
      EndIf   
    Wend 
  EndIf 
  
  *inp = *keys 
  While *inp\c <> 0
    While *inp\c > 32 
      *inp+2 
    Wend 
    key = PeekS(*keys,(*inp-*keys)>>1)
    *item = Squintget(*squint,@key)
    If *item
      count + *item\count  
      If *items
        If *item\count >= 1 
          AddElement(*items\item()) 
          *items\item() = *item 
        EndIf
      EndIf   
    EndIf   
    If *inp\c <> 0
      *inp+2
      *keys = *inp 
    Else 
      Break 
    EndIf   
  Wend 
  
  ProcedureReturn count 
  
EndProcedure 

Procedure cbFindStringsFree(*key,*value,*Data) 
  FreeStructure(*value) 
EndProcedure   

Procedure cbFindStringsEnum(*key,*value,*items.FindStrings) 
  AddElement(*items\item()) 
  *items\item() = *value
  Debug PeekS(*key,-1,#PB_UTF8)
EndProcedure   

Procedure FindStringsFree(*squint.squint) 
  SquintWalk(*squint,@cbFindStringsFree()) 
  SquintFree(*squint) 
EndProcedure   

Procedure FindStringsEnum(*mt.squint,key.s,*items.FindStrings) 
  ClearList(*items\item())
  SquintEnum(*mt,@key,@cbFindStringsEnum(),*items) 
EndProcedure   

Global String1.s = "373 ac3 b9d45 b iPdC ks23 al97 373 ac5 al99 346 vs42159ssbpx roro ask ePOC foo bar xyz 12dk tifer erer e"
Global String2.s = "346 373 iPdC roro ePOC ac3 375"
Global out.s
Global FindStringsItems.FindStrings 
Global *squint.squint = SquintNew() 
Global Replace.s = "-----------------------------------------------------------"

Debug Str(FindStrings(*squint,@String1,@String2,@FindStringsItems)) + " tokens found"

Debug "item, count and it position and the item " 

ForEach FindStringsItems\item() 
  out=""
  Debug FindStringsItems\item()\key + " " + Str(FindStringsItems\item()\count)
  ForEach FindStringsItems\item()\positions() 
    out + Str(FindStringsItems\item()\positions()) + ": " + PeekS(@string1 + FindStringsItems\item()\positions(),FindStringsItems\item()\len) + " " 
    CopyMemory(@Replace,@string1+FindStringsItems\item()\positions(),FindStringsItems\item()\len*SizeOf(Character)) ;replace found items
  Next 
  Debug out 
Next

Debug "replaced found strings with ---"
Debug string1 

Debug "Enum from a" 
 
FindStringsEnum(*squint,"a",@FindStringsItems) 

Debug "for each of a" 

ForEach FindStringsItems\item() 
  Debug FindStringsItems\item()\key 
Next

FindStringsFree(*squint) 
User avatar
NicTheQuick
Addict
Addict
Posts: 1224
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by NicTheQuick »

Since I have my new processor the "Squint Enum Ratio" is "+infinity faster" :lol:
I don't know when I will need that code but it sure is a very welcome addition to Maps. This should be more integrated into Purebasic's standard libraries. That would be nice.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

NicTheQuick wrote: Tue May 04, 2021 10:21 am Since I have my new processor the "Squint Enum Ratio" is "+infinity faster" :lol:
I don't know when I will need that code but it sure is a very welcome addition to Maps. This should be more integrated into Purebasic's standard libraries. That would be nice.
Thanks, It would be great to have a native Trie implementation. though I still need to make a thread safe version.

That's like my favourite feature "+infinity faster"
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by davido »

@idle,
I always find your code interesting. Thank you.
I ran the Squint2.pbi on my MacBook M1 and append the three messages, which I hope you will find useful, below:

Code: Select all

VirtualApple @ 2.50GHz 64 bit
Items 1000000
Squint Set 718 ms
Map Set 296 ms
Squint Get 402 ms
Map Get 368 ms
Squint Enum 0 ms
Map Enum 215 ms
=========================
Squint Set Ratio 2.426 slower
Squint Get Ratio 1.092 slower 
Squint Enum Ratio +Infinity faster 
=========================
Key Input Size 7.00 mb
Squint Memory Size 54.67 mb
Overhead 7.81 bytes input
Map Size ~ 49.07mb

5756700
5756702
5756707
5756711
5756734
575678
575679
5756790
Pruned from 57567
check pune worked  SquintEnum again
SquintFree check node count is zero 0

Items 1000000
Squint Numeric Set 405 ms
Map Set 291 ms
Squint numeric Get 153 ms
Map Get 367 ms
=========================
Squint Set Ratio 1.392 slower
Squint Get Ratio 2.399 faster
=========================
Key Input Size 7.63 mb
Squint Memory Size 25.71 mb
Overhead 3.37 bytes input
Map Size ~ 49.70mb

DE AA EB
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

thanks for that.

here's an example using squint to test for k-mers.

Code: Select all

; SQUINT 2, Sparse Quad Union Indexed Nibble Trie
; Copyright (c) 2020 Andrew Ferguson
; Version 2.0.5
; PB 5.72 x86 x64 Linux OSX Windows
; Thanks Wilbert for the high low insight and utf8 str asm help.
; Squint is a compact prefix Trie indexed by nibbles into a sparse array with performance metrics close to a map
; It provides O(K*2) performance with a memory size ~32 times smaller than a 256 node Trie
; The overheads are similar to that of a QP Trie, thats a 1/3 smaller than a Critbit.
; In terms of speed SquintSet and SquintGet should be at worst ~2 times slower than a Map.
; Lexographical enumerations are however magnitudes faster than what you could achieve otherwise
; In fast Numeric mode it's gets are closer to 1:1 with a map 
;
; see https://en.wikipedia.org/wiki/Trie
;     https://dotat.at/prog/qp/blog-2015-10-04.html
;     https://cr.yp.to/critbit.html 
;
; Squint supports Set, Get, Enum, Walk, Delete and Prune with a flag in Delete
; keys can be either Unicode, Ascii, UTF8 or Integers, the type must be specified 
; keys are all mapped to UTF8
;
; SquintNumeric (fast numeric) supports, SetNumeric GetNumeric DeleteNumeric and WalkNumeric
; it's provided as a direct subtitute for a numeric map, though lacks enumeration
; keys are returned as Integers  
;
; MIT License
; Permission is hereby granted, SquintFree of charge, to any person obtaining a copy
; of this software and associated documentation files (the "Software"), to deal
; in the Software without restriction, including without limitation the rights
; To use, copy, modify, merge, publish, distribute, sublicense, and/or sell
; copies of the Software, and to permit persons to whom the Software is
; furnished to do so, subject to the following conditions:
; 
; The above copyright notice and this permission notice shall be included in all
; copies or substantial portions of the Software.
; 
; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS Or
; IMPLIED, INCLUDING BUT Not LIMITED To THE WARRANTIES OF MERCHANTABILITY,
; FITNESS For A PARTICULAR PURPOSE And NONINFRINGEMENT. IN NO EVENT SHALL THE
; AUTHORS Or COPYRIGHT HOLDERS BE LIABLE For ANY CLAIM, DAMAGES Or OTHER
; LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT Or OTHERWISE, ARISING FROM,
; OUT OF Or IN CONNECTION With THE SOFTWARE Or THE USE Or OTHER DEALINGS IN THE
; SOFTWARE. 

DeclareModule SQUINT 
  
  #SQUINT_MAX_KEY = 1024
  
  Structure squint_node 
    *vertex.edge
    StructureUnion
      squint.q
      value.i 
    EndStructureUnion 
  EndStructure   
  
  Structure edge 
    e.squint_node[0]
  EndStructure 
  
  Structure squint
    *vt
    *root.squint_node
    size.l
    datasize.l
    count.l
    ib.a[22]
    sb.a[#SQUINT_MAX_KEY]
  EndStructure
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86 
    #Squint_Pmask = $ffffffff
  CompilerElse
    #Squint_Pmask = $ffffffffffff
  CompilerEndIf
  
  ;-Squint Callback prototype 
  Prototype Squint_CB(*key,*value=0,*userdata=0)
  
  Declare SquintNew()
  Declare SquintFree(*this.Squint)
  Declare SquintDelete(*this.squint,*key,prune=0,mode=#PB_Unicode)
  Declare SquintSet(*this.squint,*key,value.i,mode=#PB_Unicode)
  Declare SquintGet(*this.squint,*key,mode=#PB_Unicode)
  Declare SquintEnum(*this.squint,*key,*pfn.squint_CB,*userdata=0,ReturnMatch=0,mode=#PB_Unicode)
  Declare SquintWalk(*this.squint,*pfn.squint_CB,*userdata=0)
  Declare SquintSetNumeric(*this.squint,key.i,value.i)
  Declare SquintGetNumeric(*this.squint,key.i)
  Declare SquintDeleteNumeric(*this.squint,key.i)
  Declare SquintWalkNumeric(*this.squint,*pfn.squint_CB,*userdata=0)
  
  ;-Squint Inteface iSquint  
  Interface iSquint
    Free()
    Delete(*key,prune=0,mode=#PB_Unicode)
    Set(*key,value.i,mode=#PB_Unicode)
    Get(*key,mode=#PB_Unicode)
    Enum(*key,*pfn.squint_CB,*userdata=0,ReturnMatch=0,mode=#PB_Unicode)
    Walk(*pfn.squint_CB,*userdata=0)
    SetNumeric(key.i,value.i) 
    GetNumeric(key.i) 
    SquintDeleteNumeric(key.i)
    WalkNumeric(*pfn.Squint_CB,*userdata=0)
  EndInterface
  
  DataSection: vtSquint:
    Data.i @SquintFree()
    Data.i @SquintDelete() 
    Data.i @SquintSet()
    Data.i @SquintGet()
    Data.i @SquintEnum()
    Data.i @SquintWalk()
    Data.i @SquintSetNumeric()
    Data.i @SquintGetNumeric()
    Data.i @SquintDeleteNumeric()
    Data.i @SquintWalkNumeric() 
  EndDataSection   
  
EndDeclareModule

Module SQUINT
  
  EnableExplicit
  ;-macros 
  Macro _SETINDEX(in,index,number)
    in = in & ~(15 << (index << 2)) | (number << (index << 2))
  EndMacro
  
  Macro _GETNODECOUNT()
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
      nodecount = MemorySize(*node\vertex) / SizeOf(squint_node)
    CompilerElse
      nodecount = (*node\vertex >> 48)
    CompilerEndIf
  EndMacro
  
  Macro _POKENHL(in,Index,Number)
    *Mem.Ascii = in
    *Mem + Index >> 1
    If Index & 1
      *Mem\a = (*Mem\a & $f0) | (Number & $f)
    Else
      *Mem\a = (*Mem\a & $0f) | (Number << 4)
    EndIf
  EndMacro
  
  Macro _STR() 
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
      ; backup registers
      !mov [rsp - 8], rbx
      !mov [rsp - 16], rdi
      ; get procedure arguments
      !mov rax, [p.v_char]
      !mov rdi, [p.p_out] 
      ; determine sign
      !xor ebx, ebx
      !cmp rax, rbx
      !jge .l0
      !neg rax
      !mov rdx, 1 
      !mov [rsp - 24],rdx
      !.l0:
      ; convert number to ascii characters
      !add rdi, 22 
      !mov byte [rdi], 0
      !mov rbx, 0xcccccccccccccccd
      !.l1:
      !mov rcx, rax
      !mul rbx
      !mov rax, rdx
      !shr rax, 3
      !imul rdx, rax, 10
      !sub rcx, rdx
      !or rcx, 0x30
      !sub rdi, 1
      !mov [rdi], cl
      !test rax, rax
      !jnz .l1
      ;look up sign 
      !mov rax,[rsp-24]
      !cmp rax,1
      !jnz .l2
      !sub rdi, 1 
      !mov byte [rdi], '-'
      !.l2:
      ;store start of string 
      !mov [p.v_offset],rdi 
      ;restore registers 
      !mov rdi, [rsp - 16]
      !mov rbx, [rsp - 8]
    CompilerElse
      ; backup registers
      !mov [esp - 4], ebx
      !mov [esp - 8], edi
      ; get procedure arguments
      !mov eax, [p.v_char]
      !mov edi, [p.p_out]
      ; determine length
      !xor ebx, ebx
      !cmp eax, ebx
      !jge .l0
      !neg eax
      !mov edx, 1 
      !mov [esp - 12],edx
      !.l0:
      ; convert number to ascii characters
      !add edi, 22
      !mov byte [edi], 0
      !mov ebx, 0xcccccccd
      !.l1:
      !mov ecx, eax
      !mul ebx
      !mov eax, edx
      !shr eax, 3
      !imul edx, eax, 10
      !sub ecx, edx
      !or ecx, 0x30
      !sub edi, 1
      !mov [edi], cl
      !test eax, eax
      !jnz .l1
      ;look up sign
      !mov eax,[esp-12]
      !cmp eax,1
      !jnz .l2
      !sub edi, 1 
      !mov byte [edi], '-'
      !.l2:
      ;store offset to start of string
      !mov [p.v_offset],edi 
      ;restore registers 
      !mov edi, [esp - 8]
      !mov ebx, [esp - 4]
    CompilerEndIf
    
  EndMacro
  
  Macro _CONVERTUTF8() 
    If mode <> #PB_Integer 
      char= PeekU(*key)
    Else    
      char = PeekI(*key)
      *out = @*this\ib[0]
      _STR()
      *key = offset 
      mode = #PB_Ascii
      char= PeekU(*key)
    EndIf 
    
    If mode = #PB_Unicode  
      !mov eax, [p.v_char]
      !cmp eax, 0x0080
      !jb .l3
      !cmp eax, 0x0800
      !jae .l4
      !shl eax, 2
      !shr al, 2
      !or eax, 1100000010000000b
      !bswap eax
      !shr eax, 16
      !jmp .l3
      !.l4:
      !shl eax, 4
      !shr ax, 2
      !shr al, 2
      !or eax, 111000001000000010000000b
      !bswap eax
      !shr eax, 8
      !.l3:
      !mov [p.v_char],eax
    EndIf 
  EndMacro 
  
  Macro _MODECHECK()
    _CONVERTUTF8()
    If mode <> #PB_Unicode
      If (char >> ((count&1)<<4) & $ff = 0)
        Break
      EndIf 
    Else 
      *this\datasize+1
    EndIf
  EndMacro 
  
  Macro _SETNODE()
    If *node\vertex
      _GETNODECOUNT()
      If (offset <> 15 Or nodecount = 16)
        *node = *node\Vertex\e[offset] & #Squint_Pmask
      Else  
        *this\size + SizeOf(squint_node)
        offset = nodecount
        nodecount+1
        *node\vertex = ReAllocateMemory(*node\vertex & #Squint_Pmask,(nodecount)*SizeOf(squint_node))
        CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 
          *node\vertex | ((nodecount) << 48)
        CompilerEndIf
        _SETINDEX(*node\squint,idx,offset)
        *node = *node\Vertex\e[offset] & #Squint_Pmask
      EndIf
    Else
      *this\size + SizeOf(squint_node)
      *node\vertex = AllocateMemory(SizeOf(squint_Node))
      CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
        *node\vertex | (1 << 48)
      CompilerEndIf
      *node\squint = -1
      _SETINDEX(*node\squint,idx,0)
      *node = *node\Vertex\e[0] & #Squint_Pmask
      *this\count+1
    EndIf
  EndMacro
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86 
    Macro rax : eax : EndMacro 
  CompilerEndIf   
  ;-General functions 
  Procedure SquintNew()
    Protected *this.squint,a
    *this = AllocateMemory(SizeOf(squint))
    If *this
      *this\vt = ?vtSquint
      *this\root = AllocateMemory(SizeOf(squint_node))
      ProcedureReturn *this
    EndIf
  EndProcedure
  
  Procedure ISquintFree(*this.squint,*node.squint_node=0)
    Protected a,offset,nodecount
    If Not *node
      ProcedureReturn 0
    EndIf
    For a=0 To 15
      offset = (*node\squint >> (a<<2)) & $f
      If *node\vertex
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ISquintFree(*this,*node\Vertex\e[offset] & #Squint_Pmask)
        EndIf
      EndIf
    Next
    If *node\vertex
      _GETNODECOUNT()
      *this\size - nodecount
      *this\count - 1
      FreeMemory(*node\Vertex & #Squint_Pmask)
      *node\vertex=0
    EndIf
    ProcedureReturn *node
  EndProcedure
  
  Procedure SquintFree(*this.squint)
    Protected a,offset,*node.squint_node,nodecount
    *node = *this\root
    For a=0 To 15
      offset = (*node\squint >> (a<<2)) & $f
      If *node\vertex
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ISquintFree(*this,*node)
        EndIf
      EndIf
    Next
    FreeMemory(*this\root)
    nodecount = *this\count
    FreeMemory(*this)
    ProcedureReturn nodecount
  EndProcedure
  ;-string functions 
  Procedure SquintDelete(*this.squint,*key.Unicode,prune=0,mode=#PB_Unicode)
    Protected *node.squint_node,idx,*mem.Character,offset,nodecount,char.i,count,*out
    *node = *this\root
    _CONVERTUTF8()
    While char
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      If *node\vertex
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      Else
        ProcedureReturn 0
      EndIf
      If *node
        offset = (*node\squint >> ((char & $0f) << 2)) & $f
        If *node\vertex
          _GETNODECOUNT()
          If (offset <> 15 Or nodecount = 16)
            *node = *node\Vertex\e[offset] & #Squint_Pmask
          EndIf
        Else
          ProcedureReturn 0
        EndIf
      EndIf
      char >> 8
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    If prune
      ISquintFree(*this,*node)
      If (*node\vertex & #Squint_Pmask) = 0
        *node\squint = 0
      EndIf
    Else
      offset = *node\squint & $f
      _GETNODECOUNT()
      If offset <= nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
        If (*node\vertex & #Squint_Pmask) = 0
          *node\squint = 0
        EndIf
      Else
        ProcedureReturn 0
      EndIf
    EndIf
  EndProcedure
  
  Procedure SquintSet(*this.squint,*key,value.i,mode=#PB_Unicode)
    Protected *node.squint_node,idx,offset,nodecount,char.i,count,*out
    *node = *this\root & #Squint_Pmask
    _CONVERTUTF8()
    While char
      idx = (char >> 4) & $f
      offset = (*node\squint >> (idx<<2)) & $f
      _SETNODE()
      idx = char & $0f
      offset = (*node\squint >> (idx<<2)) & $f
      _SETNODE()
      *this\datasize+1
      char >> 8
      count+1
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    idx=0
    offset = *node\squint & $f
    _SETNODE()
    If value 
      *node\value = value
    EndIf 
    ProcedureReturn *node 
  EndProcedure
  
  Procedure SquintGet(*this.squint,*key,mode=#PB_Unicode)
    Protected *node.squint_Node,idx,offset,nodecount,char.i,count,*out
    *node = *this\root & #Squint_Pmask
    _CONVERTUTF8()
    While char
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      offset = (*node\squint >> ((char & $0f) << 2)) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      char >> 8
      count+1
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    offset = *node\squint & $f
    _GETNODECOUNT()
    If offset <= nodecount
      *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      ProcedureReturn *node\value
    Else
      ProcedureReturn 0
    EndIf
  EndProcedure
      
  Procedure IEnum(*this.squint,*node.squint_Node,depth,*pfn.squint_CB,*userdata=0)
    Protected a.i,offset,nodecount,*mem.Ascii 
    If Not *node
      ProcedureReturn 0
    EndIf
    For a=0 To 15
      offset = (*node\squint >> (a<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          _POKENHL(@*this\sb,depth,a)
          IEnum(*this,*node\Vertex\e[offset] & #Squint_Pmask,depth+1,*pfn,*userdata)
        EndIf
      EndIf
    Next
    If *node\vertex=0
      If *pfn
        PokeA(@*this\sb+((depth>>1)),0)
        *pfn(@*this\sb,*node\value,*userdata)
      EndIf
    EndIf
    ProcedureReturn *node
  EndProcedure
  
  Procedure SquintEnum(*this.squint,*key.Unicode,*pfn.squint_CB,*userdata=0,ReturnMatch=0,mode=#PB_Unicode)
    Protected *node.squint_Node,idx,*mem.Ascii,offset,nodecount,depth,char.i,count,*out
    If ReturnMatch
      If SquintGet(*this,*key)
        If *pfn
          *pfn(*key,0,*userdata)
        EndIf
        ProcedureReturn
      EndIf
    EndIf
    *node = *this\root
    _CONVERTUTF8()
    While char 
      idx = (char >> 4) & $f
      offset = (*node\squint >> (idx<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          ;POKENHL(@*this\sb,depth,idx)
          *mem = @*this\sb+(depth>>1) 
          *mem\a = (*mem\a & $0f) | (idx<<4)
          depth+1
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      EndIf
      If Not *node Or *node\vertex = 0
        ProcedureReturn 0
      EndIf
      idx = char & $f
      offset = (*node\squint >> (idx<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          *mem = @*this\sb+(depth>>1) 
          *Mem\a = (*Mem\a & $f0) | (idx & $f)
          depth+1
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      EndIf
      If Not *node Or *node\vertex = 0
        ProcedureReturn 0
      EndIf
      char >> 8
      count+1
      If char = 0
        *key+2
        _MODECHECK()
      EndIf
    Wend
    IEnum(*this,*node,depth,*pfn,*userdata)
  EndProcedure
  
  Procedure SquintWalk(*this.squint,*pfn.squint_CB,*userdata=0) 
    IEnum(*this,*this\root,0,*pfn,*userdata)
  EndProcedure
  
  ;-Numeric functions
  Procedure SquintSetNumeric(*this.squint,key.i,value.i)
    Protected *node.squint_node,idx,offset,nodecount,char.i,count 
    *node = *this\root & #Squint_Pmask
    char = key
    EnableASM
    mov rax, char
    bswap rax;
    mov char,rax
    DisableASM 
    While count < SizeOf(Integer) 
      idx = (char >> 4) & $f
      offset = (*node\squint >> (idx<<2)) & $f
      _SetNODE()
      idx = (char & $f)
      offset = (*node\squint >> (idx<<2)) & $f
      _SetNODE()
      *this\datasize+1
      char >> 8
      count+1
    Wend
    *node\value = value
    ProcedureReturn 
  EndProcedure
  
  Procedure SquintGetNumeric(*this.squint,key.i)
    Protected *node.squint_Node,idx,offset,nodecount,char.i,count,skey.s
    *node = *this\root & #Squint_Pmask
    char = key
    EnableASM
    mov rax, char
    bswap rax;
    mov char,rax
    DisableASM 
    While count < SizeOf(Integer)
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      offset = (*node\squint >> ((char & $0f) << 2)) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      char>>8
      count+1
    Wend
    ProcedureReturn *node\value
  EndProcedure
  
  Procedure SquintDeleteNumeric(*this.squint,key.i)
    Protected *node.squint_node,idx,*mem.Character,offset,nodecount,char.i,count
    *node = *this\root & #Squint_Pmask
    char = key
    EnableASM
    mov rax, char
    bswap rax;
    mov char,rax
    DisableASM 
    While count < SizeOf(Integer)
      offset = (*node\squint >> ((char & $f0) >> 2 )) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      offset = (*node\squint >> ((char & $0f) << 2)) & $f
      _GETNODECOUNT()
      If offset < nodecount
        *node = (*node\Vertex\e[offset] & #Squint_Pmask)
      Else
        ProcedureReturn 0
      EndIf
      char>>8
      count+1
    Wend
    If (*node\vertex & #Squint_Pmask) = 0
      *node\squint = 0
    EndIf
  EndProcedure
  
  Procedure IEnumNumeric(*this.squint,*node.squint_Node,idx,depth,*pfn.squint_CB,*userdata=0)
    Protected a.i,offset,nodecount,*mem.Ascii,char.i 
    If Not *node
      ProcedureReturn 0
    EndIf
    For a=0 To 15 
      offset = (*node\squint >> (a<<2)) & $f
      If (*node\vertex And *node\squint)
        _GETNODECOUNT()
        If (offset <> 15 Or nodecount = 16)
          _POKENHL(@*this\sb,depth,a)
          IEnumNumeric(*this,*node\Vertex\e[offset] & #Squint_Pmask,0,depth+1,*pfn,*userdata)
        EndIf
      EndIf
    Next
    If *node\vertex=0
      char = PeekI(@*this\sb)
      EnableASM
      mov rax, char
      bswap rax;
      mov char,rax
      DisableASM 
      If *pfn
        *pfn(@char,*node\value,*userdata)
      EndIf
    EndIf
    ProcedureReturn *node
  EndProcedure
  
  Procedure SquintWalkNumeric(*this.squint,*pfn.squint_CB,*userdata=0) 
    IEnumNumeric(*this,*this\root,0,0,*pfn,*userdata)
  EndProcedure
  
  DisableExplicit
  ;-End of module   
EndModule

CompilerIf #PB_Compiler_IsMainFile
  
  UseModule SQUINT 
  EnableExplicit 
  
  Structure Item 
    key.s 
    count.l
    len.l
    List positions.i()
  EndStructure 
  
  Structure FindStrings 
    List *item.item() 
  EndStructure
  
  Global FindStringsItems.FindStrings 
  
  Procedure Kmers(*squint.squint,*source,*keys,k=2,*items.FindStrings=0) 
    Protected *inp.Character,*inp1.Character,*node.squint_node,*sp    
    Protected c,count,key.s,*item.item    
    
    If *source 
      *inp = *source 
      *sp = *source 
      While *inp\c <> 0
        c=1
        *inp1 = *inp 
        While (*inp1\c <> 0 And c <= k)  
          *inp1+SizeOf(Character)
          c+1
        Wend 
        key = PeekS(*source,(*inp1-*source)/SizeOf(Character))
        *node = Squintset(*squint,@key,0) 
        If Not *node\value 
          *item =  AllocateStructure(item)
          AddElement(*item\positions()) 
          *item\positions() = *source - *sp
          *item\count = 1 
          *item\key = key 
          *item\len = (*inp1-*source)/SizeOf(Character)
          *node\value = *item
        Else   
          *item = *node\value 
          AddElement(*item\positions())
          *item\positions() = *source - *sp
          *item\len = (*inp1-*source)/SizeOf(Character)
          *item\count + 1 
        EndIf   
        If *inp1\c <> 0
          *inp+2
          *source = *inp 
        Else 
          Break 
        EndIf   
      Wend 
    EndIf 
    
    *inp = *keys 
    While *inp\c <> 0
      c=1
      *inp1 = *inp 
      While (*inp1\c <> 0 And c <= k)  
        *inp1+SizeOf(Character)
        c+1
      Wend 
      key = PeekS(*keys,(*inp1-*keys)/SizeOf(Character))
      *item = Squintget(*squint,@key)
      If *item
        count + *item\count  
        If *items
          If *item\count >= 1 
            AddElement(*items\item()) 
            *items\item() = *item 
          EndIf
        EndIf   
      EndIf   
      If *inp1\c <> 0
        *inp+SizeOf(Character)
        *keys = *inp 
      Else 
        Break 
      EndIf   
    Wend 
    
    ProcedureReturn count 
    
  EndProcedure 
  
  
  Procedure cbFindStringsFree(*key,*value,*Data) 
    FreeStructure(*value) 
  EndProcedure   
  
  Procedure cbFindStringsEnum(*key,*value,*items.FindStrings) 
    AddElement(*items\item()) 
    *items\item() = *value
    Debug PeekS(*key,-1,#PB_UTF8)
  EndProcedure   
  
  Procedure FindStringsFree(*squint.squint) 
    SquintWalk(*squint,@cbFindStringsFree()) 
    SquintFree(*squint) 
  EndProcedure   
  
  Procedure FindStringsEnum(*mt.squint,key.s,*items.FindStrings) 
    ClearList(*items\item())
    SquintEnum(*mt,@key,@cbFindStringsEnum(),*items) 
  EndProcedure   
  
  Procedure MakeRandom(*buf.Unicode) 
    Protected a, len,c  
    len = (MemorySize(*buf)-1) >> 1 
    For a = 0 To Len 
      c = Random(3) 
      Select c 
        Case 0 
          *buf\u = 'A' 
        Case 1 
          *buf\u = 'C' 
        Case 2 
          *buf\u = 'T' 
        Case 3 
          *buf\u = 'G' 
      EndSelect 
      *buf+2 
    Next   
      
  EndProcedure   
  
  Global *buf = AllocateMemory(10000) 
    
  MakeRandom(*buf)
  
  Debug PeekS(*buf,100) + "..." 
  
  Global mers.s = "CTGAC" ;mers to search for are CTG + TGA + GAC when k = 3  
  
  Global out.s
  Global FindStringsItems.FindStrings 
  Global *squint.squint = SquintNew() 
  Global Replace.s = "-----------------------------------------------------------"
  
  Debug Str(Kmers(*squint,*buf,@mers,3,@FindStringsItems)) + " k-mers found"
  
  Debug "item, count and it position and the item " 
  
  ForEach FindStringsItems\item() 
    out=""
    Debug FindStringsItems\item()\key + " " + Str(FindStringsItems\item()\count)
    ForEach FindStringsItems\item()\positions() 
      out + Str(FindStringsItems\item()\positions()) + ": " + PeekS(*buf + FindStringsItems\item()\positions(),FindStringsItems\item()\len) + " " 
    Next 
    Debug Left(out,100) + "..."   
  Next
  
  Debug "replaced found strings with --"
  
  ForEach FindStringsItems\item() 
    ForEach FindStringsItems\item()\positions() 
     CopyMemory(@Replace,*buf+FindStringsItems\item()\positions(),FindStringsItems\item()\len*SizeOf(Character)) ;replace found items
   Next 
  Next
    
  Debug PeekS(*buf,100) + "..." 
  
  Debug "Enum from C in callback" 
  
  FindStringsEnum(*squint,"C",@FindStringsItems) 
  
;   Debug "foreach after Enum" 
;   
;   ForEach FindStringsItems\item() 
;     Debug FindStringsItems\item()\key 
;   Next
  
  Debug *squint\size 
  Debug  MemorySize(*buf)
  
  FindStringsFree(*squint) 
  
CompilerEndIf

kinglestat
Enthusiast
Enthusiast
Posts: 732
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by kinglestat »

Truly fantastic. Well done.
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT 2, Sparse Quad Union Indexed Nibble Trie

Post by idle »

kinglestat wrote: Wed May 05, 2021 8:15 am Truly fantastic. Well done.
Thanks, It's been quite a challenging data structure.
Post Reply