Page 2 of 2

Posted: Tue May 02, 2006 7:09 pm
by Inf0Byt3
Yes, this would work too... I'll test it. Until now, netmaesto's idea is working exremely well, the scanner is working. I gotta test the efficacity too.

Posted: Tue May 02, 2006 8:52 pm
by Inf0Byt3
Here's a non-functioning dirty test... If anyone can tell me why the heck doesn't the Procedure in the dll return nothing?

Server is now down...

Posted: Wed May 03, 2006 7:30 am
by Dare2
Some code:

Builds a lookup table, uses a binary search to find the needed.
Conceptual, not optimised, and lots of assumptions.

Can you bend it to suit your needs?

Code: Select all

EnableExplicit

Macro strL(val)
  Str(val)
EndMacro

Structure UDT_keyInfo
  key.l
  ptr.l
EndStructure

; ===========================================================================
; PART 1 : Preparation. Not required in the main prog.
;   Assumes speed not critical, builds the list
;   This would be done each time a virus definition was added.

Global NewList keys.UDT_keyInfo()                ; To hold the key/ptr info
Global itemCount.l                               ; To count the info
Global memBank.l
Global fileSz.l

Global rawFile.s

rawFile = "C:\100\pb400\PROJECTS\DataBase\rawViralKeys.dta"
RandomSeed(1)

; Substitute this for something reading key/pointer information from the real file.
; Assumptions are:
;   The offset of each "record" in the real file is known.
;   A fixed length key using a part of the record is feasible.
;   Duplication is allowed. If none, so much the better!

Procedure populateRawList()                      ; Pretend we're reading these in
  Protected offSet.l                             ;   from a file. Key is used here
  For offSet = 1 To 52000                        ;   is just a long. Substitute for
    AddElement(keys())                           ;   whatever you will use. IMO, if
    keys()\key = Random($7FFFFFFF)               ;   using strings of 8 or less, then
    keys()\ptr = offSet                          ;   treat as quads or longs.
    itemCount + 1
  Next                                           ; Once keys are listed, sort them
  SortStructuredList(keys(),0,OffsetOf(UDT_keyInfo\key),#PB_Sort_Long)
EndProcedure

populateRawList()                                ; Build our lookup list

CreateFile(0,rawFile)                            ; Write our list to file
WriteLong(0,itemCount)                           ; First bytes are a count of items
ForEach keys()
  WriteLong(0,keys()\key)
  WriteLong(0,keys()\ptr)
Next
CloseFile(0)

; ===========================================================================
; PART 2 - Used to lookup in the main prog
;    Uses a binary search, overhead is:
;       max 16 (plus adjusting) hits for  32k - 1 items
;       max 32 (plus adjusting) hits for  64k - 1 items
;       max 64 (plus adjusting) hits for 128k - 1 items
;    etc.

fileSz = FileSize(rawFile)                       ; Load our sorted list
memBank.l = AllocateMemory(fileSz-4)             ; Not a lot of error checking :)
ReadFile(0,rawFile)
itemCount = ReadLong(0)
ReadData(0,memBank,fileSz)
CloseFile(0)

; This will (should) always find the first key that is >= to the search key,
;    with one exception:
;      If the key being searched is greater than the highest key in list,
;      the last key in the list is return and will be < than search key
     
Procedure binaryFindKey(key.l)                   ; Find the first key >= to search key

  Protected stepper.l                            ; Used to zone in on the target
  Protected offset.l                             ; The position, in items, we seek
  Protected *lookHere.UDT_keyInfo                ; Floated on the memory bank

  stepper = 1                                    ; Start our stepper at binary one
  While stepper < itemCount / 2                  ; Get as close to the middle as we
    stepper << 1                                 ;   can, using two's complement
  Wend
  If stepper >= itemCount                        ; If we just missed (got the end)
    stepper >> 1                                 ;   halve the stepper
  EndIf

  offset = stepper                               ; Set our offset (in items or elements)

  While stepper > 0                              ; Keep looking until we exhaust the stepper

    *lookHere = memBank + (offset - 1) * SizeOf(UDT_keyInfo)
    stepper >> 1                                 ; After getting offset, shrink the stepper

    If *lookHere\key = key                       ; Cf we got the key
      Break                                      ; Paydirt!

    ElseIf *lookHere\key > key                   ; Otherwise, if we are too deep into the
      offset - stepper                           ;   list, backtrack
    Else                                         ; Or if we are not deep enough,
      offset + stepper                           ;   then advance
      While offset > itemCount                   ;   But make sure we stay in bounds
        stepper << 1
        offset - stepper                         ; Potential crash/infinite loop here?
      Wend                                       ;  (** Note to me, check this **)
    EndIf
  Wend

  While *lookHere\key >= key And *lookHere > memBank
    *lookHere - SizeOf(UDT_keyInfo)              ; Tidy up, get to a lesser value
  Wend
  While *lookHere\key < key And *lookHere < memBank + (itemCount * SizeOf(UDT_keyInfo)) - SizeOf(UDT_keyInfo)
    *lookHere + SizeOf(UDT_keyInfo)              ; Stop at first > value or end of list
  Wend

  ProcedureReturn *lookHere
EndProcedure

Define *res.UDT_keyInfo
Define search.l

Define timer.l
timer=ElapsedMilliseconds()

search = 1234567                                 ; Some key to find
*res = binaryFindKey(search)                     ; Get result

Debug StrL(ElapsedMilliseconds()-timer)+" ms, got "+StrL(*res\key)+" for "+StrL(search)

If *res\key <> search
  Debug "NO MATCH"
Else                                             ; If duplicate keys exist, this is the 1st
  While *res\key = search And *res < memBank + (itemCount * SizeOf(UDT_keyInfo)) - SizeOf(UDT_keyInfo)
    ; Fetch the extended information
    ; If this is the one we want
    ;   Break
    ; otherwise
    *res + SizeOf(UDT_keyInfo)
  Wend
EndIf


FreeMemory(memBank)

Also, I looked at using a B+Tree to speed things up further but came to the conclusion there was no real speed gain, perhaps even a potential speed loss, and there is extra overhead in memory. If the BTree was accessed from file at node level, there is certainly a speed loss.

So - just a (huge) lookup table in memory, using a "significant enough" portion of the key.

Keys in this must be all of the same size. If you use variable length keys, go back to the link given by netmaestro as there is an evolving idea on lookups in that thread.

Hope this helps a bit.

Posted: Wed May 03, 2006 11:59 am
by Inf0Byt3
Hi Dare2, thanks a lot for the code... I'll try it and see how it works.

Posted: Sat Jul 01, 2006 10:46 pm
by sverson
@Inf0Byt3:

Just in case you are still looking for a fast (and easy) way to compare one element (memory location) with another 52000 elements...

Code: Select all

#TableSize = 52000
Global Dim search.s(#TableSize)

Procedure.s CreateKey()
  Protected OutKey.s="", KeyPos.l, KeyEnd.l
  KeyEnd = 30+Random(30)
  For KeyPos=1 To KeyEnd
    OutKey + RSet(Hex(Random($FF)),2,"0")
  Next
  ProcedureReturn OutKey
EndProcedure

Procedure DummyTable()
  Protected TablePos.l
  For TablePos = 0 To #TableSize
    search(TablePos) = CreateKey()
  Next
  SortArray(search(),2) 
EndProcedure

Procedure.l FindKeyFast(KeyToFind.s)
  Protected KeyLo.l,KeyHi.l,KeyPos.l,DebugCount.l=0
  KeyLo=0
  KeyHi=#TableSize
  Repeat
    KeyPos = (KeyLo+KeyHi)>>1
    ;DebugCount+1
    ;Debug Str(DebugCount)+": "+Str(KeyLo)+" | "+Str(KeyPos)+" | "+Str(KeyHi)
    If search(KeyPos)=KeyToFind
      ProcedureReturn KeyPos
    ElseIf search(KeyPos)>KeyToFind
      KeyHi=KeyPos
    Else
      KeyLo=KeyPos
    EndIf
  ForEver
EndProcedure

Procedure.l FindKeySlow(base.s)
  For t = 0 To #TableSize
    If search.s(t) = base.s
      ProcedureReturn t
    EndIf
  Next t 
EndProcedure

Procedure TestIt()
  base.s = search(Random(#TableSize))
  QueryPerformanceCounter_(@Cntr1)
  Debug FindKeySlow(base)
  QueryPerformanceCounter_(@Cntr2)
  Debug FindKeyFast(base)
  QueryPerformanceCounter_(@Cntr3)
  Debug Str((Cntr2-Cntr1)/(Cntr3-Cntr2))+" times faster..."
  Debug "-------"
EndProcedure

DummyTable()

For x=1 To 10
  TestIt()
Next
;-) sverson

Posted: Sun Jul 02, 2006 3:23 pm
by Inf0Byt3
That's very nice! Thanks! I'll test and use it :D.

Posted: Mon Jul 03, 2006 2:06 am
by sverson
Image