A fast way to compare stuff
Some code:
Builds a lookup table, uses a binary search to find the needed.
Conceptual, not optimised, and lots of assumptions.
Can you bend it to suit your needs?
Also, I looked at using a B+Tree to speed things up further but came to the conclusion there was no real speed gain, perhaps even a potential speed loss, and there is extra overhead in memory. If the BTree was accessed from file at node level, there is certainly a speed loss.
So - just a (huge) lookup table in memory, using a "significant enough" portion of the key.
Keys in this must be all of the same size. If you use variable length keys, go back to the link given by netmaestro as there is an evolving idea on lookups in that thread.
Hope this helps a bit.
Builds a lookup table, uses a binary search to find the needed.
Conceptual, not optimised, and lots of assumptions.
Can you bend it to suit your needs?
Code: Select all
EnableExplicit
Macro strL(val)
Str(val)
EndMacro
Structure UDT_keyInfo
key.l
ptr.l
EndStructure
; ===========================================================================
; PART 1 : Preparation. Not required in the main prog.
; Assumes speed not critical, builds the list
; This would be done each time a virus definition was added.
Global NewList keys.UDT_keyInfo() ; To hold the key/ptr info
Global itemCount.l ; To count the info
Global memBank.l
Global fileSz.l
Global rawFile.s
rawFile = "C:\100\pb400\PROJECTS\DataBase\rawViralKeys.dta"
RandomSeed(1)
; Substitute this for something reading key/pointer information from the real file.
; Assumptions are:
; The offset of each "record" in the real file is known.
; A fixed length key using a part of the record is feasible.
; Duplication is allowed. If none, so much the better!
Procedure populateRawList() ; Pretend we're reading these in
Protected offSet.l ; from a file. Key is used here
For offSet = 1 To 52000 ; is just a long. Substitute for
AddElement(keys()) ; whatever you will use. IMO, if
keys()\key = Random($7FFFFFFF) ; using strings of 8 or less, then
keys()\ptr = offSet ; treat as quads or longs.
itemCount + 1
Next ; Once keys are listed, sort them
SortStructuredList(keys(),0,OffsetOf(UDT_keyInfo\key),#PB_Sort_Long)
EndProcedure
populateRawList() ; Build our lookup list
CreateFile(0,rawFile) ; Write our list to file
WriteLong(0,itemCount) ; First bytes are a count of items
ForEach keys()
WriteLong(0,keys()\key)
WriteLong(0,keys()\ptr)
Next
CloseFile(0)
; ===========================================================================
; PART 2 - Used to lookup in the main prog
; Uses a binary search, overhead is:
; max 16 (plus adjusting) hits for 32k - 1 items
; max 32 (plus adjusting) hits for 64k - 1 items
; max 64 (plus adjusting) hits for 128k - 1 items
; etc.
fileSz = FileSize(rawFile) ; Load our sorted list
memBank.l = AllocateMemory(fileSz-4) ; Not a lot of error checking :)
ReadFile(0,rawFile)
itemCount = ReadLong(0)
ReadData(0,memBank,fileSz)
CloseFile(0)
; This will (should) always find the first key that is >= to the search key,
; with one exception:
; If the key being searched is greater than the highest key in list,
; the last key in the list is return and will be < than search key
Procedure binaryFindKey(key.l) ; Find the first key >= to search key
Protected stepper.l ; Used to zone in on the target
Protected offset.l ; The position, in items, we seek
Protected *lookHere.UDT_keyInfo ; Floated on the memory bank
stepper = 1 ; Start our stepper at binary one
While stepper < itemCount / 2 ; Get as close to the middle as we
stepper << 1 ; can, using two's complement
Wend
If stepper >= itemCount ; If we just missed (got the end)
stepper >> 1 ; halve the stepper
EndIf
offset = stepper ; Set our offset (in items or elements)
While stepper > 0 ; Keep looking until we exhaust the stepper
*lookHere = memBank + (offset - 1) * SizeOf(UDT_keyInfo)
stepper >> 1 ; After getting offset, shrink the stepper
If *lookHere\key = key ; Cf we got the key
Break ; Paydirt!
ElseIf *lookHere\key > key ; Otherwise, if we are too deep into the
offset - stepper ; list, backtrack
Else ; Or if we are not deep enough,
offset + stepper ; then advance
While offset > itemCount ; But make sure we stay in bounds
stepper << 1
offset - stepper ; Potential crash/infinite loop here?
Wend ; (** Note to me, check this **)
EndIf
Wend
While *lookHere\key >= key And *lookHere > memBank
*lookHere - SizeOf(UDT_keyInfo) ; Tidy up, get to a lesser value
Wend
While *lookHere\key < key And *lookHere < memBank + (itemCount * SizeOf(UDT_keyInfo)) - SizeOf(UDT_keyInfo)
*lookHere + SizeOf(UDT_keyInfo) ; Stop at first > value or end of list
Wend
ProcedureReturn *lookHere
EndProcedure
Define *res.UDT_keyInfo
Define search.l
Define timer.l
timer=ElapsedMilliseconds()
search = 1234567 ; Some key to find
*res = binaryFindKey(search) ; Get result
Debug StrL(ElapsedMilliseconds()-timer)+" ms, got "+StrL(*res\key)+" for "+StrL(search)
If *res\key <> search
Debug "NO MATCH"
Else ; If duplicate keys exist, this is the 1st
While *res\key = search And *res < memBank + (itemCount * SizeOf(UDT_keyInfo)) - SizeOf(UDT_keyInfo)
; Fetch the extended information
; If this is the one we want
; Break
; otherwise
*res + SizeOf(UDT_keyInfo)
Wend
EndIf
FreeMemory(memBank)
Also, I looked at using a B+Tree to speed things up further but came to the conclusion there was no real speed gain, perhaps even a potential speed loss, and there is extra overhead in memory. If the BTree was accessed from file at node level, there is certainly a speed loss.
So - just a (huge) lookup table in memory, using a "significant enough" portion of the key.
Keys in this must be all of the same size. If you use variable length keys, go back to the link given by netmaestro as there is an evolving idea on lookups in that thread.
Hope this helps a bit.
@}--`--,-- A rose by any other name ..
@Inf0Byt3:
Just in case you are still looking for a fast (and easy) way to compare one element (memory location) with another 52000 elements...
sverson
Just in case you are still looking for a fast (and easy) way to compare one element (memory location) with another 52000 elements...
Code: Select all
#TableSize = 52000
Global Dim search.s(#TableSize)
Procedure.s CreateKey()
Protected OutKey.s="", KeyPos.l, KeyEnd.l
KeyEnd = 30+Random(30)
For KeyPos=1 To KeyEnd
OutKey + RSet(Hex(Random($FF)),2,"0")
Next
ProcedureReturn OutKey
EndProcedure
Procedure DummyTable()
Protected TablePos.l
For TablePos = 0 To #TableSize
search(TablePos) = CreateKey()
Next
SortArray(search(),2)
EndProcedure
Procedure.l FindKeyFast(KeyToFind.s)
Protected KeyLo.l,KeyHi.l,KeyPos.l,DebugCount.l=0
KeyLo=0
KeyHi=#TableSize
Repeat
KeyPos = (KeyLo+KeyHi)>>1
;DebugCount+1
;Debug Str(DebugCount)+": "+Str(KeyLo)+" | "+Str(KeyPos)+" | "+Str(KeyHi)
If search(KeyPos)=KeyToFind
ProcedureReturn KeyPos
ElseIf search(KeyPos)>KeyToFind
KeyHi=KeyPos
Else
KeyLo=KeyPos
EndIf
ForEver
EndProcedure
Procedure.l FindKeySlow(base.s)
For t = 0 To #TableSize
If search.s(t) = base.s
ProcedureReturn t
EndIf
Next t
EndProcedure
Procedure TestIt()
base.s = search(Random(#TableSize))
QueryPerformanceCounter_(@Cntr1)
Debug FindKeySlow(base)
QueryPerformanceCounter_(@Cntr2)
Debug FindKeyFast(base)
QueryPerformanceCounter_(@Cntr3)
Debug Str((Cntr2-Cntr1)/(Cntr3-Cntr2))+" times faster..."
Debug "-------"
EndProcedure
DummyTable()
For x=1 To 10
TestIt()
Next
