A fast way to compare stuff

Just starting out? Need help? Post your questions and find answers here.
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Yes, this would work too... I'll test it. Until now, netmaesto's idea is working exremely well, the scanner is working. I gotta test the efficacity too.
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Here's a non-functioning dirty test... If anyone can tell me why the heck doesn't the Procedure in the dll return nothing?

Server is now down...
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Dare2
Moderator
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Some code:

Builds a lookup table, uses a binary search to find the needed.
Conceptual, not optimised, and lots of assumptions.

Can you bend it to suit your needs?

Code: Select all

EnableExplicit

Macro strL(val)
  Str(val)
EndMacro

Structure UDT_keyInfo
  key.l
  ptr.l
EndStructure

; ===========================================================================
; PART 1 : Preparation. Not required in the main prog.
;   Assumes speed not critical, builds the list
;   This would be done each time a virus definition was added.

Global NewList keys.UDT_keyInfo()                ; To hold the key/ptr info
Global itemCount.l                               ; To count the info
Global memBank.l
Global fileSz.l

Global rawFile.s

rawFile = "C:\100\pb400\PROJECTS\DataBase\rawViralKeys.dta"
RandomSeed(1)

; Substitute this for something reading key/pointer information from the real file.
; Assumptions are:
;   The offset of each "record" in the real file is known.
;   A fixed length key using a part of the record is feasible.
;   Duplication is allowed. If none, so much the better!

Procedure populateRawList()                      ; Pretend we're reading these in
  Protected offSet.l                             ;   from a file. Key is used here
  For offSet = 1 To 52000                        ;   is just a long. Substitute for
    AddElement(keys())                           ;   whatever you will use. IMO, if
    keys()\key = Random($7FFFFFFF)               ;   using strings of 8 or less, then
    keys()\ptr = offSet                          ;   treat as quads or longs.
    itemCount + 1
  Next                                           ; Once keys are listed, sort them
  SortStructuredList(keys(),0,OffsetOf(UDT_keyInfo\key),#PB_Sort_Long)
EndProcedure

populateRawList()                                ; Build our lookup list

CreateFile(0,rawFile)                            ; Write our list to file
WriteLong(0,itemCount)                           ; First bytes are a count of items
ForEach keys()
  WriteLong(0,keys()\key)
  WriteLong(0,keys()\ptr)
Next
CloseFile(0)

; ===========================================================================
; PART 2 - Used to lookup in the main prog
;    Uses a binary search, overhead is:
;       max 16 (plus adjusting) hits for  32k - 1 items
;       max 32 (plus adjusting) hits for  64k - 1 items
;       max 64 (plus adjusting) hits for 128k - 1 items
;    etc.

fileSz = FileSize(rawFile)                       ; Load our sorted list
memBank.l = AllocateMemory(fileSz-4)             ; Not a lot of error checking :)
ReadFile(0,rawFile)
itemCount = ReadLong(0)
ReadData(0,memBank,fileSz)
CloseFile(0)

; This will (should) always find the first key that is >= to the search key,
;    with one exception:
;      If the key being searched is greater than the highest key in list,
;      the last key in the list is return and will be < than search key
     
Procedure binaryFindKey(key.l)                   ; Find the first key >= to search key

  Protected stepper.l                            ; Used to zone in on the target
  Protected offset.l                             ; The position, in items, we seek
  Protected *lookHere.UDT_keyInfo                ; Floated on the memory bank

  stepper = 1                                    ; Start our stepper at binary one
  While stepper < itemCount / 2                  ; Get as close to the middle as we
    stepper << 1                                 ;   can, using two's complement
  Wend
  If stepper >= itemCount                        ; If we just missed (got the end)
    stepper >> 1                                 ;   halve the stepper
  EndIf

  offset = stepper                               ; Set our offset (in items or elements)

  While stepper > 0                              ; Keep looking until we exhaust the stepper

    *lookHere = memBank + (offset - 1) * SizeOf(UDT_keyInfo)
    stepper >> 1                                 ; After getting offset, shrink the stepper

    If *lookHere\key = key                       ; Cf we got the key
      Break                                      ; Paydirt!

    ElseIf *lookHere\key > key                   ; Otherwise, if we are too deep into the
      offset - stepper                           ;   list, backtrack
    Else                                         ; Or if we are not deep enough,
      offset + stepper                           ;   then advance
      While offset > itemCount                   ;   But make sure we stay in bounds
        stepper << 1
        offset - stepper                         ; Potential crash/infinite loop here?
      Wend                                       ;  (** Note to me, check this **)
    EndIf
  Wend

  While *lookHere\key >= key And *lookHere > memBank
    *lookHere - SizeOf(UDT_keyInfo)              ; Tidy up, get to a lesser value
  Wend
  While *lookHere\key < key And *lookHere < memBank + (itemCount * SizeOf(UDT_keyInfo)) - SizeOf(UDT_keyInfo)
    *lookHere + SizeOf(UDT_keyInfo)              ; Stop at first > value or end of list
  Wend

  ProcedureReturn *lookHere
EndProcedure

Define *res.UDT_keyInfo
Define search.l

Define timer.l
timer=ElapsedMilliseconds()

search = 1234567                                 ; Some key to find
*res = binaryFindKey(search)                     ; Get result

Debug StrL(ElapsedMilliseconds()-timer)+" ms, got "+StrL(*res\key)+" for "+StrL(search)

If *res\key <> search
  Debug "NO MATCH"
Else                                             ; If duplicate keys exist, this is the 1st
  While *res\key = search And *res < memBank + (itemCount * SizeOf(UDT_keyInfo)) - SizeOf(UDT_keyInfo)
    ; Fetch the extended information
    ; If this is the one we want
    ;   Break
    ; otherwise
    *res + SizeOf(UDT_keyInfo)
  Wend
EndIf


FreeMemory(memBank)

Also, I looked at using a B+Tree to speed things up further but came to the conclusion there was no real speed gain, perhaps even a potential speed loss, and there is extra overhead in memory. If the BTree was accessed from file at node level, there is certainly a speed loss.

So - just a (huge) lookup table in memory, using a "significant enough" portion of the key.

Keys in this must be all of the same size. If you use variable length keys, go back to the link given by netmaestro as there is an evolving idea on lookups in that thread.

Hope this helps a bit.
@}--`--,-- A rose by any other name ..
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Hi Dare2, thanks a lot for the code... I'll try it and see how it works.
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
sverson
Enthusiast
Enthusiast
Posts: 286
Joined: Sun Jul 04, 2004 12:15 pm
Location: Germany

Post by sverson »

@Inf0Byt3:

Just in case you are still looking for a fast (and easy) way to compare one element (memory location) with another 52000 elements...

Code: Select all

#TableSize = 52000
Global Dim search.s(#TableSize)

Procedure.s CreateKey()
  Protected OutKey.s="", KeyPos.l, KeyEnd.l
  KeyEnd = 30+Random(30)
  For KeyPos=1 To KeyEnd
    OutKey + RSet(Hex(Random($FF)),2,"0")
  Next
  ProcedureReturn OutKey
EndProcedure

Procedure DummyTable()
  Protected TablePos.l
  For TablePos = 0 To #TableSize
    search(TablePos) = CreateKey()
  Next
  SortArray(search(),2) 
EndProcedure

Procedure.l FindKeyFast(KeyToFind.s)
  Protected KeyLo.l,KeyHi.l,KeyPos.l,DebugCount.l=0
  KeyLo=0
  KeyHi=#TableSize
  Repeat
    KeyPos = (KeyLo+KeyHi)>>1
    ;DebugCount+1
    ;Debug Str(DebugCount)+": "+Str(KeyLo)+" | "+Str(KeyPos)+" | "+Str(KeyHi)
    If search(KeyPos)=KeyToFind
      ProcedureReturn KeyPos
    ElseIf search(KeyPos)>KeyToFind
      KeyHi=KeyPos
    Else
      KeyLo=KeyPos
    EndIf
  ForEver
EndProcedure

Procedure.l FindKeySlow(base.s)
  For t = 0 To #TableSize
    If search.s(t) = base.s
      ProcedureReturn t
    EndIf
  Next t 
EndProcedure

Procedure TestIt()
  base.s = search(Random(#TableSize))
  QueryPerformanceCounter_(@Cntr1)
  Debug FindKeySlow(base)
  QueryPerformanceCounter_(@Cntr2)
  Debug FindKeyFast(base)
  QueryPerformanceCounter_(@Cntr3)
  Debug Str((Cntr2-Cntr1)/(Cntr3-Cntr2))+" times faster..."
  Debug "-------"
EndProcedure

DummyTable()

For x=1 To 10
  TestIt()
Next
;-) sverson
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

That's very nice! Thanks! I'll test and use it :D.
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
sverson
Enthusiast
Enthusiast
Posts: 286
Joined: Sun Jul 04, 2004 12:15 pm
Location: Germany

Post by sverson »

Image
Post Reply