A fast way to compare stuff

Just starting out? Need help? Post your questions and find answers here.
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

A fast way to compare stuff

Post by Inf0Byt3 »

I can't find a fast way to compare one element (memory location) with another 52000 elements. I am currently using an array to hold the 52000 elements but it's too damn slow. Any ideas please?

Code: Select all


 global dim search.s(52000)
 base.s = "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttest"
 for t = 1 to 52000
  if search.s(t-1) = base.s
   break
   debug "found!"
  endif
 next t

None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Dare2
Moderator
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Build a lookup tree?
@}--`--,-- A rose by any other name ..
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Can you give me an example please? (Never coded something like this before :D).
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Dare2
Moderator
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Ack. Sorry Inf0Byt3, I don't have any code for it in PureBasic and don't know that I could get something up real quick, even converting. :?

I'll have a go tomorrow, though, if nothing comes up for you before.

But also chase B .. binary .. balanced etc trees, etc, on the net, there is a lot out there.
@}--`--,-- A rose by any other name ..
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Cool, allready had a search on the forums and found something, but I need to understand it deeply before attempting something. It seems that I need a sort of indexing in memory. Tha main plan was to enumerate the array elements one by one and use Boyer Moore to search for the pattern inside of each element. This is too simple to be fast so I must find something much quicker.
Thanks for helping me :D .
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

I made something similar a few months ago, it's pretty fast. Searches 170000 words in way under 1 millisecond:

http://www.purebasic.fr/english/viewtop ... y&start=20
BERESHEIT
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Extremely usefull. Thank you. I will study it and see if I can use it.
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Searches 170000 words in way under 1 millisecond:
Am I doing something wrong? On my 1000 Mhz AMD it takes around 1,7 seconds...
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Dare2
Moderator
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Hi Inf0Byt3,

What are the parameters for your search? Always text? Binary? Variable length "keys"?
@}--`--,-- A rose by any other name ..
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Whazzzuuupp. This is for PureAV... So i need only binary searching, I guess... The sizes are variable yes... Here is a demo:

Code: Select all

Exploit.WMF.A=26060f001600ffffffff0000470000008f
Exploit.WMF.B=b804c0b807c1dab8858098d50eb854
Exploit.WMF.Gen-1=010009000003521f0000???????
Exploit.WMF.Gen-2=010009000003521f0000???????
Worm.Mytob.Gen-4=28d0010035d00100000000000
Trojan.Downloader.Small-525=7280958a6b877e83
Exploit.DCOM.Gen=05000b0310000000480000007f
The second part is a HEX encoded binary string. I 'unhex' them in memory and then I need to search this keys in a loaded file...

[Edit]
Truncated them a bit, too space consuming :)
Last edited by Inf0Byt3 on Tue May 02, 2006 5:49 pm, edited 1 time in total.
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Dare2
Moderator
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Woot. Not a small thing. :)

Tomorrow I will have a play. But if it ends up like the other "help" I've tried to give you, you are probably getting worried about now .... :D
@}--`--,-- A rose by any other name ..
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Post by netmaestro »

Sorry, I meant it will find one word in a list of that size in way less than 1ms. The whole list takes longer.
BERESHEIT
User avatar
geoff
Enthusiast
Enthusiast
Posts: 128
Joined: Sun Apr 27, 2003 12:01 am
Location: Cornwall UK
Contact:

Post by geoff »

Inf0Byt3 wrote:I can't find a fast way to compare one element (memory location) with another 52000 elements. I am currently using an array to hold the 52000 elements but it's too damn slow. Any ideas please?
The examples above assume that the large string array you wish to search is ordered in some way. For example the data to be searched may be in alphabetic order. You can then do a binary search of 2^n array entries using only n tests.

If the large array is essentially random you may be able to speed the search by searching on just the first part of your search string, maybe just the first character, with a second test for the entire search string. Tricks like this will work better in some languages than in others.
Inf0Byt3
PureBasic Fanatic
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

@netmaestro
No problem, It is my mistake, not yours. I missunderstood what you said, I should be sorry for that... It works cool until now. This is the fastest method until now.

@Dare2
There is no need to make the 'unhex' part I allready have it done. I simply need the agorithm implementation in pb... I hope I'm not consuming your time... If you can't make it, there is absolutely no problem. Don't waste your time with me...

@geoff
If I find other methods (I'll wait for Dare2 to try) I'll use them. The array is too much time consuming I guess. However, I 'll try to sort it and make some speed tests.

Thanks to you all mates, it's a great thing to get help when you need it.

A quick note... Here is the link for an open source scanner made in Java that is very fast (for an interpreted implementation)
http://sourceforge.net/project/showfile ... _id=242042[/code]
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
Pupil
Enthusiast
Enthusiast
Posts: 715
Joined: Fri Apr 25, 2003 3:56 pm

Post by Pupil »

I think it would be better to store the string encoded binary as binary in mem for all 52k items and use the first two bytes as reference in a lookup table that holds the position/memoryposition of a sorted table with items that all start with the same two bytes..
Post Reply