FindData module

Share your advanced PureBasic knowledge/code with the community.
Mistrel
Addict
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Re: FindData module

Post by Mistrel »

Would someone provide example use cases for this? Thank you. :)
User avatar
idle
Always Here
Always Here
Posts: 5096
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: FindData module

Post by idle »

It's like findstring but works with binary data and patterns and is ~10x faster than findstring. Each function has it's pros and cons
depending upon the length of what your looking for (the needle) and you also have to check if SSE is available
at runtime though that shouldn't really be an issue, unless its some rusty old box in the basement.

I just bolted it in to a tag preprocessor to look for "<?PB" and want to use SSE2_FIND which is blazingly fast for short needles
You need to test if SSE2 is available, Then set a function prototype or fall back to BM (boyer moore) which is also fast
but prefers bigger needles, as it steps through the input by the needle size.

Code: Select all

Procedure PreProcessInit(*pre.PreProcessor) 
  UseModule CPUInfo
  
  If IsCPU(#PB_CPU_SSE2)
    *pre\find = finddata::@SSE2_Find() 
  Else 
    *pre\find = FindData::@BM() 
  EndIf   
  UnuseModule CPUInfo 
EndProcedure 

....

While (pos > -1 And pos < len) 
    pos = *pre\find(*input,len,*pre\tag,*pre\taglen,pos) ;<--- hunt through *input for UTF8("<?PB")  
    If pos <> -1 
      osz = outp+(pos-p_end)
      If osz > outlen 
        *pre\output = ReAllocateMemory(*pre\output,osz,#PB_Memory_NoClear)
        outlen = osz 
      EndIf   
  ...

Wilbert will have the demo code lurking somewhere. no doubt.
Windows 11, Manjaro, Raspberry Pi OS
Image
Mistrel
Addict
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Re: FindData module

Post by Mistrel »

Thank you. :)

I know that these are hand-optimized assembly. But I would love to see a normal code version for comparison for those of us who don't know assembly.
User avatar
idle
Always Here
Always Here
Posts: 5096
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: FindData module

Post by idle »

I found the link to the collection of c algorithms which largely inspired the module
they should be a bit more comprehensible.
http://www.dmi.unict.it/~faro/smart/algorithms.php
Windows 11, Manjaro, Raspberry Pi OS
Image
SeregaZ
Enthusiast
Enthusiast
Posts: 619
Joined: Fri Feb 20, 2009 9:24 am
Location: Almaty (Kazakhstan. not Borat, but Triple G)
Contact:

Re: FindData module

Post by SeregaZ »

maybe someone will have some ideas:
i have rom file - any sega mega drive game. and i have 4 files - banks for music. i read first 16bytes from that banks, then search where it lays in a rom file and remember thems adresses. for example it was $AABB01 $AACC02 $AADD03 $AAEE04 - now i need to find some place, where that four addresses can be lays near each other. not exactly near... but maybe between of them 2-20 bytes... problem is this values too short and can be a lot of matching with search. but i need only that place, where they all near - that place will be sure place, what i need :) this four values can be with any order... not from bigger to lower, or lower to bigger... any. just some one place, where they lays near. and it can be 4 values, but can be 3 and 2 values (when some banks is empty)

i can see something in my head... make some 4 arrays with all matching places with that addresses... and then need some + and - function probably...cant catch it :) still flying in my head...
Bitblazer
Enthusiast
Enthusiast
Posts: 736
Joined: Mon Apr 10, 2017 6:17 pm
Location: Germany
Contact:

Re: FindData module

Post by Bitblazer »

I am not sure but this might be a special case of the knapsack problem
SeregaZ
Enthusiast
Enthusiast
Posts: 619
Joined: Fri Feb 20, 2009 9:24 am
Location: Almaty (Kazakhstan. not Borat, but Triple G)
Contact:

Re: FindData module

Post by SeregaZ »

thanks, not clear for me this Knapsack...

i make something like that:

Code: Select all

Structure forsortarray
  type$
  value.l
EndStructure
Global Dim SortArrayValues.forsortarray(4)

; arrays of addresses
Dim inst.l(3)
Dim envel.l(3)
Dim seq.l(3)
Dim samples.l(3)

;{ fill addresses
inst(1)    = $00000688
inst(2)    = $0000D688
inst(3)    = $001FF836

envel(1)   = $0000A682
envel(2)   = $0000D682
envel(3)   = $0011D682

seq(1)     = $0000D00C
seq(2)     = $0000D67C
seq(3)     = $0001D67C

samples(1) = $0000D600
samples(2) = $0000D676
samples(3) = $0010D676
;}

sizeins = ArraySize(inst())
sizeenv = ArraySize(envel())
sizeseq = ArraySize(seq())
sizesmp = ArraySize(samples())

; detect which arrays is empty
If sizeins And sizeenv And sizeseq And sizesmp
  ; all banks is here
  
  For ins = 1 To sizeins  
    For env = 1 To sizeenv
      For seq = 1 To sizeseq 
        For smp = 1 To sizesmp
          ; inst env seq smp
          
          ; fill all data in temp array
          SortArrayValues(1)\type$ = "ins"
          SortArrayValues(1)\value = inst(ins)
          
          SortArrayValues(2)\type$ = "env"
          SortArrayValues(2)\value = envel(env)
          
          SortArrayValues(3)\type$ = "seq"
          SortArrayValues(3)\value = seq(seq)
          
          SortArrayValues(4)\type$ = "smp"
          SortArrayValues(4)\value = samples(smp)
          
          ; sort array for #PB_Sort_Ascending
          SortStructuredArray(SortArrayValues(), #PB_Sort_Ascending, OffsetOf(forsortarray\value), TypeOf(forsortarray\value))
          
          tmp1 = SortArrayValues(4)\value - SortArrayValues(3)\value
          Select tmp1
            Case 4 To 20
              tmp2 = SortArrayValues(3)\value - SortArrayValues(2)\value
              Select tmp2
                Case 4 To 20
                  tmp3 = SortArrayValues(2)\value - SortArrayValues(1)\value
                  Select tmp3
                    Case 4 To 20
                      Debug "was found at " + Str(ins) + Str(env) + Str(seq) + Str(smp)
                      Break 4
                  EndSelect
              EndSelect 
          EndSelect
          
        Next
      Next  
    Next  
  Next  

  
ElseIf sizeins And sizeseq And sizesmp
  ; no envelopes
ElseIf sizeins And sizeenv And sizeseq
  ; no samples
ElseIf sizeins And sizeseq
  ; no samples and envelopes
EndIf

it should find some 2222 - second items in arrays is correct. but i am glad - it is work :) need a lot future tests but i think it is fine :)
Post Reply