Page 1 of 2
A fast way to compare stuff
Posted: Tue May 02, 2006 3:47 pm
by Inf0Byt3
I can't find a fast way to compare one element (memory location) with another 52000 elements. I am currently using an array to hold the 52000 elements but it's too damn slow. Any ideas please?
Code: Select all
global dim search.s(52000)
base.s = "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttest"
for t = 1 to 52000
if search.s(t-1) = base.s
break
debug "found!"
endif
next t
Posted: Tue May 02, 2006 3:49 pm
by Dare2
Build a lookup tree?
Posted: Tue May 02, 2006 3:50 pm
by Inf0Byt3
Can you give me an example please? (Never coded something like this before

).
Posted: Tue May 02, 2006 4:00 pm
by Dare2
Ack. Sorry Inf0Byt3, I don't have any code for it in PureBasic and don't know that I could get something up real quick, even converting.
I'll have a go tomorrow, though, if nothing comes up for you before.
But also chase B .. binary .. balanced etc trees, etc, on the net, there is a lot out there.
Posted: Tue May 02, 2006 4:18 pm
by Inf0Byt3
Cool, allready had a search on the forums and found something, but I need to understand it deeply before attempting something. It seems that I need a sort of indexing in memory. Tha main plan was to enumerate the array elements one by one and use Boyer Moore to search for the pattern inside of each element. This is too simple to be fast so I must find something much quicker.
Thanks for helping me

.
Posted: Tue May 02, 2006 4:25 pm
by netmaestro
I made something similar a few months ago, it's pretty fast. Searches 170000 words in way under 1 millisecond:
http://www.purebasic.fr/english/viewtop ... y&start=20
Posted: Tue May 02, 2006 4:37 pm
by Inf0Byt3
Extremely usefull. Thank you. I will study it and see if I can use it.
Posted: Tue May 02, 2006 5:07 pm
by Inf0Byt3
Searches 170000 words in way under 1 millisecond:
Am I doing something wrong? On my 1000 Mhz AMD it takes around 1,7 seconds...
Posted: Tue May 02, 2006 5:09 pm
by Dare2
Hi Inf0Byt3,
What are the parameters for your search? Always text? Binary? Variable length "keys"?
Posted: Tue May 02, 2006 5:13 pm
by Inf0Byt3
Whazzzuuupp. This is for PureAV... So i need only binary searching, I guess... The sizes are variable yes... Here is a demo:
Code: Select all
Exploit.WMF.A=26060f001600ffffffff0000470000008f
Exploit.WMF.B=b804c0b807c1dab8858098d50eb854
Exploit.WMF.Gen-1=010009000003521f0000???????
Exploit.WMF.Gen-2=010009000003521f0000???????
Worm.Mytob.Gen-4=28d0010035d00100000000000
Trojan.Downloader.Small-525=7280958a6b877e83
Exploit.DCOM.Gen=05000b0310000000480000007f
The second part is a HEX encoded binary string. I 'unhex' them in memory and then I need to search this keys in a loaded file...
[Edit]
Truncated them a bit, too space consuming

Posted: Tue May 02, 2006 5:17 pm
by Dare2
Woot. Not a small thing.
Tomorrow I will have a play. But if it ends up like the other "help" I've tried to give you, you are probably getting worried about now ....

Posted: Tue May 02, 2006 5:18 pm
by netmaestro
Sorry, I meant it will find one word in a list of that size in way less than 1ms. The whole list takes longer.
Posted: Tue May 02, 2006 5:32 pm
by geoff
Inf0Byt3 wrote:I can't find a fast way to compare one element (memory location) with another 52000 elements. I am currently using an array to hold the 52000 elements but it's too damn slow. Any ideas please?
The examples above assume that the large string array you wish to search is ordered in some way. For example the data to be searched may be in alphabetic order. You can then do a binary search of 2^n array entries using only n tests.
If the large array is essentially random you may be able to speed the search by searching on just the first part of your search string, maybe just the first character, with a second test for the entire search string. Tricks like this will work better in some languages than in others.
Posted: Tue May 02, 2006 5:46 pm
by Inf0Byt3
@netmaestro
No problem, It is my mistake, not yours. I missunderstood what you said, I should be sorry for that... It works cool until now. This is the fastest method until now.
@Dare2
There is no need to make the 'unhex' part I allready have it done. I simply need the agorithm implementation in pb... I hope I'm not consuming your time... If you can't make it, there is absolutely no problem. Don't waste your time with me...
@geoff
If I find other methods (I'll wait for Dare2 to try) I'll use them. The array is too much time consuming I guess. However, I 'll try to sort it and make some speed tests.
Thanks to you all mates, it's a great thing to get help when you need it.
A quick note... Here is the link for an open source scanner made in Java that is very fast (for an interpreted implementation)
http://sourceforge.net/project/showfile ... _id=242042[/code]
Posted: Tue May 02, 2006 7:06 pm
by Pupil
I think it would be better to store the string encoded binary as binary in mem for all 52k items and use the first two bytes as reference in a lookup table that holds the position/memoryposition of a sorted table with items that all start with the same two bytes..