Bloom filter Class
Posted: Tue Dec 09, 2008 6:19 am
Bloom Filter Class
Bloom filters are very simple structure but getting them fast with accurate results isn't!
You will need to verify that the Filter performs for your given inputs, though it should be fine as it is.
While a Bloom is typically an O(k) as in constant time, this one employs a short circuit look up so is really O(1)
Indicative times from 2GHz system
with errors allowed
Add rates are ~800,000 items per second
look up rates ~1,400,000 items per second
with out errors
add rates are ~400,000 items per second
look up rates ~1,300,000 items per second
*Note even with an error rate of 0.0, it's still possible to get an error
currently it's only good for ~10,000,000 items with 4 keys and then it's not really tested much
What is a bloom filter? Why would you want one? What can you do with them?
http://en.wikipedia.org/wiki/Bloom_filter
Uses: AntiVirus scanners, DataBase lookups, Spell checkers, P2P apps, Caches
The real boon about them is that they're compact and extremely fast.
You can also use them in networked games, did player A trigger something at x,y,z
Bloom filters are very simple structure but getting them fast with accurate results isn't!
You will need to verify that the Filter performs for your given inputs, though it should be fine as it is.
While a Bloom is typically an O(k) as in constant time, this one employs a short circuit look up so is really O(1)
Indicative times from 2GHz system
with errors allowed
Add rates are ~800,000 items per second
look up rates ~1,400,000 items per second
with out errors
add rates are ~400,000 items per second
look up rates ~1,300,000 items per second
*Note even with an error rate of 0.0, it's still possible to get an error
currently it's only good for ~10,000,000 items with 4 keys and then it's not really tested much
What is a bloom filter? Why would you want one? What can you do with them?
http://en.wikipedia.org/wiki/Bloom_filter
Uses: AntiVirus scanners, DataBase lookups, Spell checkers, P2P apps, Caches
The real boon about them is that they're compact and extremely fast.
You can also use them in networked games, did player A trigger something at x,y,z
Code: Select all
;Idlearts bloom filter
;Class Gen Macro
Macro e(e=o):EndMacr#e:EndMacro:Macro m():Macro :e()
m()ss(n):Structure n:e():m()es():EndStructure:e():m()param(p):,p:e():m()ClassPointer(n,pointer):pointer.m_#n :e()
m()ClassStructure(Class):ss(m_#class):*vt :e():m()ClassPrototypes(Class):es():Interface Class#_obj:Destroy(*obj):e()
m()ClassMethods(Class):EndInterface:Procedure new_#class(*obj.m_#class):*obj = AllocateMemory(SizeOf(m_#class))
If *obj: *obj\vt=?class#_vt: EndIf :ProcedureReturn *obj : EndProcedure:Procedure Destroy_#class(*obj):If *obj:
ClearStructure(*obj, m_#class): FreeMemory(*obj)
*obj=0:EndIf:EndProcedure:EndMacro:m()AddressTable(Class):DataSection:Class#_vt::Data.i @Destroy_#Class():e()
m()pFunc(Class,function):Data.i @function#_#class#():e():m()EndClass():EndDataSection:e()
m()PublicMethod(Class,name,type,params):Procedure type name#_#class#(*this.m_#class, params):e()
m()EndMethod():EndProcedure:e():m()PublicReturn:ProcedureReturn:e():m()Private:Protected:e()
m()ClassFunc(n,func): *this\func#_#n :e()
;Class Builder Macro
Macro Class(n,Struct,Proto,Methods,Table)
ClassStructure(n)
struct
ClassPrototypes(n)
proto
ClassMethods(n)
methods
AddressTable(n)
table
Endclass();) Remove the comment to dump the generated Class use right click select all copy
EndMacro
;Bloom Filter Class Useage
;
;1) BloomFilter(clsBloom) ;Creates the Class named ClsBloom
;2) Global Bloom1.ClsBloom_obj ;Define a pointer to the Class
;
;3) Bloom1 = New_ClsBloom(@Bloom1) ;Create the Class Instance
;
;4) Bloom1\Create(number of items,number of keys,error percent)
; *Note if error rate is set 0.0 it is still posible for it to generate an error should be ok for 10,000,000 items
;Functions
;
;Bloom1\Add(string) or Bloom1\Add("",*mem,sizeof(mem))
;Bloom1\Check(string) or Bloom1\Check("",*mem,sizeof(mem))
;Bloom1\Merge(ClassPointer) eg Bloom2
;Merges filter 2 into filter 1 note filters must be the same size
;Bloom1\Merge(ClassPointer,Start,Length)
;Start = start merge from an offset in filter, the length to merge
;
;Bloom1\Clear() Zero the filter (windows only)
;Bloom1\Free() Must be called to free the allocated memory
;Bloom1\Destroy() Must be called to Destroy the Class
EnableASM
#FNV_1 = 2166136261
#FNV_0 = 0
Procedure.q FNV32(*tdata.i, len.i,fnvt)
MOV esi, *tdata ;esi = ptr to buffer
MOV ecx, len ;ecx = length of buffer (counter)
MOV eax, fnvt ;2166136261 ;eax;offset_basis ;set to 0 for FNV-0, or 2166136261 for FNV-1
MOV edi, 16777619 ;0x01000193 ;FNV_32_PRIME = 16777619
XOr ebx, ebx ;ebx = 0
! ntbyte:
MUL edi ;eax = eax * FNV_32_PRIME
MOV bl, [esi] ;bl = byte from esi
XOr eax, ebx ;al = al xor bl
INC esi ;esi = esi + 1 (buffer pos)
DEC ecx ;ecx = ecx - 1 (counter)
JNZ ntbyte ;if ecx is 0, jmp to NextByte
ProcedureReturn
EndProcedure
DisableASM
;The Bloom Filter Class Description
Macro bloomStruct(n)
bloomSize.i
mBloomSize.i
nKeys.i
er.f
*bloom
*bloom1
errorless.i
EndMacro
Macro Prototypes(n)
Free.i()
Create.f(size.f,nKeys.f,er.f)
Add.i(key.s,*mem=0,len=0)
Check.i(key.s,*mem=0,len=0)
Merge.i(ClassPointer(n,*pbloom),Start.i=0,len.i=0)
Clear.i()
Fill.i(*mem.i,size.i)
Export.i(*mem.i,size.i)
Save.i(file.s)
EndMacro
Macro Methods(n)
PublicMethod(n,Free,.i,*null)
If *this\bloom
FreeMemory(*this\bloom)
*this\bloom = 0
If *this\bloom1
FreeMemory(*this\bloom1)
*this\bloom1=0
EndIf
EndIf
Endmethod()
PublicMethod(n,Create,.f,size.f param(nKeys.f) param(er.f))
private ts.f
If er = 0.0
er = 0.001 ;errorless mode only good for less than 10,000,000 items
*this\errorless = 1
EndIf
*this\bloomsize = (-(nkeys * (size))) / (Log(1 - Pow((er/100),1/nkeys )))/8
*this\nkeys = nkeys
*this\er = er
*this\mbloomsize = *this\bloomsize * 8
*this\bloom = AllocateMemory(*this\bloomsize)
ts.f = *this\bloomsize
If *this\errorless
*this\bloom1 = AllocateMemory(*this\bloomsize)
ts*2
EndIf
PublicReturn ts / 1048576.0
Endmethod()
PublicMethod(n,ADD,.i,key.s param(*mem=0) param(len=0))
Private thash.q,mhash.q,ohash.q,a.i,*tb.byte,hf.i
If Not len
thash = fnv32(@key,Len(key),#FNV_1) & $FFFFFFFF
Else
thash = fnv32(*mem,len,#FNV_1) & $FFFFFFFF
EndIf
mhash = thash % *this\mbloomsize
*tb = *this\bloom+(mhash>>3)
*tb\b | (1 << (mhash % 8))
a=1
While a < *this\nKeys
thash = fnv32(@thash,4,#FNV_1)
mhash = thash % *this\mbloomsize
*tb = *this\bloom+(mhash>>3)
*tb\b | (1 << (mhash % 8))
a+1
Wend
If *this\errorless
hf = 16777619
If Not len
thash = (fnv32(@key,Len(key),#FNV_0) + hf) & $FFFFFFFF
Else
thash = (fnv32(*mem,len,#FNV_0) + hf) & $FFFFFFFF
EndIf
mhash = thash % *this\mbloomsize
*tb = *this\bloom1+(mhash>>3)
*tb\b | (1 << (mhash % 8))
a=1
While a < *this\nKeys
thash = fnv32(@thash,4,#FNV_0)
mhash = thash % *this\mbloomsize
*tb = *this\bloom1+(mhash>>3)
*tb\b | (1 << (mhash % 8))
a+1
Wend
EndIf
EndMethod()
PublicMethod(n,Check,.i,key.s param(*mem=0) param(len=0))
private thash.q,mhash.q,ohash.q,tret,mret,a,*tb.byte,mf,hf.i
If Not len
thash = fnv32(@key,Len(key),#FNV_1) & $FFFFFFFF
Else
thash = fnv32(*mem,len,#FNV_1) & $FFFFFFFF
EndIf
ohash = thash
mhash = thash % *this\mbloomsize
mf = (mhash % 8)
*tb = *this\bloom+(mhash>>3)
tret = (*tb\b & (1 << mf)) >> mf
If tret = 0
PublicReturn 0
Else
mret + 1
EndIf
a=1
While a < *this\nKeys
thash = fnv32(@thash,4,#FNV_1)
mhash = thash % *this\mbloomsize
mf = (mhash % 8)
*tb = *this\bloom+(mhash>>3)
tret = (*tb\b & (1 << mf)) >> mf
If tret = 0
PublicReturn 0
Else
mret + 1
EndIf
a+1
Wend
If *this\errorless
hf = 16777619
If Not len
thash = (fnv32(@key,Len(key),#FNV_0) + hf) & $FFFFFFFF
Else
thash = (fnv32(*mem,len,#FNV_0) + hf) & $FFFFFFFF
EndIf
mhash = thash % *this\mbloomsize
mf = (mhash % 8)
*tb = *this\bloom1+(mhash>>3)
tret2 = (*tb\b & (1 << mf)) >> mf
If tret2 = 0
PublicReturn 0
Else
mret + 1
EndIf
a=1
While a < *this\nKeys
thash = fnv32(@thash,4,#FNV_0);
mhash = thash % *this\mbloomsize
mf = (mhash % 8)
*tb = *this\bloom1+(mhash>>3)
tret2 = (*tb\b & (1 << mf)) >> mf
If tret2 = 0
PublicReturn 0
Else
mret + 1
EndIf
a+1
Wend
If mret = *this\nKeys*2
mret = 1
Else
mret = 0
EndIf
Else
If mret = *this\nKeys
mret = 1
Else
mret = 0
EndIf
EndIf
PublicReturn mret
EndMethod()
PublicMethod(n,Merge,.i,ClassPointer(n,*pbloom) param(start=0) param(len=0))
private sz.i,*t1.integer,*t2.integer,*t3.integer,*t4.integer,asz = SizeOf(Integer)
sz = *this\bloomsize
If *this\bloomsize = *pbloom\bloomsize
If start > 0
a = start
EndIf
If len <> 0
sz = len - a
EndIf
If *this\errorless
While a < sz
*t1 = *this\bloom+a
*t2 = *pbloom\bloom+a
*t1\i | *t2\i
*t3 = *this\bloom1+a
*t4 = *pbloom\bloom1+a
*t3\i | *t4\i
a+asz
Wend
Else
While a < sz
*t1 = *this\bloom+a
*t2 = *pbloom\bloom+a
*t1\i | *t2\i
a+asz
Wend
EndIf
EndIf
EndMethod()
PublicMethod(n,Clear,.i,*null)
FillMemory(*this\bloom,*this\bloomSize)
If *this\errorless
FillMemory(*this\bloom1,*this\bloomSize)
EndIf
Endmethod()
;not addapted for errorless mode
PublicMethod(n,Fill,.i,*mem.i param(size))
private sz.i,*t1.integer,*t2.integer,asz = SizeOf(Integer)
sz = *this\bloomsize-asz
If *this\bloomsize = size
While a < sz
*t1 = *this\bloom+a
*t2 = *mem+a
*t1\i = *t2\i
a+asz
Wend
EndIf
EndMethod()
;not addapted for errorless mode
PublicMethod(n,Export,.i,*mem.i param(size))
private sz.i,*t1.integer,*t2.integer,asz = SizeOf(Integer)
sz = *this\bloomsize-asz
If *this\bloomsize = size
While a < sz
*t1 = *this\bloom+a
*t2 = *mem+a
*t2\i = *t1\i
a+asz
Wend
EndIf
EndMethod()
;not addapted for errorless mode
PublicMethod(n,Save,.i,file.s)
private sf.i
sf= OpenFile(#PB_Any,file)
If sf
WriteData(sf,*this\bloom,*this\bloomsize)
CloseFile(sf)
EndIf
EndMethod()
EndMacro
Macro Table(n)
pfunc(n,Free)
pfunc(n,Create)
pfunc(n,ADD)
pfunc(n,Check)
pfunc(n,Merge)
pfunc(n,Clear)
pfunc(n,Fill)
pfunc(n,Export)
pfunc(n,Save)
EndMacro
;Bloom Filter Constrution macro call this macro in you code
Macro BloomFilter(n)
Class(n,bloomStruct(n),Prototypes(n),Methods(n),Table(n))
EndMacro
;
;/////////////////////////////End Class//////////////////////////////
;/////////////////////////////Test///////////////////////////////////
tt.TIMECAPS
timeGetDevCaps_(@tt, SizeOf(TIMECAPS))
timeBeginPeriod_(tt\wPeriodMin)
BloomFilter(clsBloom)
;Create objects
Global Bloom1.ClsBloom_obj = New_ClsBloom(@Bloom1)
Global Bloom2.ClsBloom_obj = New_ClsBloom(@Bloom2)
OpenConsole()
tv.q
;number of items for bloom
items.q = 5000000
;number of keys
keys = 4
;error rate
;if rate is 0.0 it is still posible to get errors!
er.f = 0.005
seed.i = Random(100)+1
sz1.f = Bloom1\Create(items,keys,er)
sz2 = Bloom2\Create(items,keys,er)
PrintN("Creating " + Str(items) + " item Bloom Filter " + StrF(sz1,3) + "Mb")
;test add, check And merge functions
PrintN("Add this string to Bloom1 and check if it's there")
tps$ = "Add this string to Bloom1 and check if it's there"
Bloom1\Add(tps$)
If Bloom1\Check(tps$)
PrintN("")
PrintN(" Yes its there")
EndIf
PrintN("")
PrintN("Add 20000 random Strings to Bloom2")
Dim astr.s(20000)
For a = 1 To 20000
val = a
tps$ = MD5Fingerprint(@val,4)
astr(a)=tps$
Bloom2\Add(tps$)
Next
PrintN("")
PrintN("Combining Bloom2 with Bloom1")
Bloom1\Merge(Bloom2)
PrintN("")
PrintN("Checking that the merged values are in Bloom1")
For a = 1 To 20000
tps$ = astr(a)
ct + Bloom1\Check(tps$)
Next
PrintN("")
PrintN(" Result found " + Str(ct) + " items in Bloom1")
PrintN("")
PrintN("Clearing bloom1")
Bloom1\Clear()
PrintN("")
PrintN("Test Adding " + Str(items) + " Items, wait..")
ct2=0
ct1=0
ct3=0
;timing test add
RandomSeed(seed)
st = GetTickCount_()
tp.q
For a = 1 To items
tp = a
Bloom1\Add("",@tp,8)
Next
et = GetTickCount_() - st
PrintN(" Time to add " + Str(items) + " items with overheads " + Str(et) + " milliseconds")
PrintN("")
;timing test for false entries that have not been entered into filter
PrintN("Checking for false entries, wait..")
ct2=0
RandomSeed(seed)
st1 = GetTickCount_()
For a = 1 To items
tp = a + items
ct2 + Bloom1\Check("",@tp,8)
Next
et1 = GetTickCount_() - st1
;time the loop overheads to get a time differential to measure the add and check rates
st1 = GetTickCount_()
a=0
For a = 1 To items
tp = a
ct + 1
Next
et2 = GetTickCount_() - st1
PrintN(" Time to check " + Str(items) + " items with overheads " + Str(et1) + " milliseconds")
PrintN("")
PrintN(" Number of false positives found = " + Str(ct2) + " err % = " + StrF(((ct2/items)*100),3))
PrintN("")
PrintN(" Time of overheads " + Str(et2) + " milliseconds")
PrintN("")
PrintN(" Avg add time minus overheads= " + StrF((et-et2)/items,6) + " milliseconds")
PrintN(" Avg check time minus overheads= " + StrF((et1-et2)/items,6) + " milliseconds")
rate.f = (1/((et-et2)/items))*1000
PrintN("")
PrintN(" Average add rate items per second " + StrF(rate,0) + " p/s")
rate.f = (1/((et1-et2)/items))*1000
PrintN(" Average check rate items per second " + StrF(rate,0) + " p/s")
Bloom1\free()
Bloom2\free()
Bloom1\Destroy(@Bloom1)
Bloom2\Destroy(@Bloom2)
PrintN("")
PrintN("")
Input()