Find an string in a linked list, two ways one good one BAD!!

Share your advanced PureBasic knowledge/code with the community.
Jan Vooijs
Enthusiast
Enthusiast
Posts: 196
Joined: Tue Sep 30, 2003 4:32 pm
Location: The Netherlands

Find an string in a linked list, two ways one good one BAD!!

Post by Jan Vooijs »

Code updated for 5.20+

Hello,

I would like to share my findings into how to find an item in a linked list, in this case a certain string. This is part of a much larger program, but simple to understand. In my app i have a list of known spam adresses (12000 of them) and want to find if the email i recieve is from a spammer.

Look at this code run for yourself and see the difference. One warning do NOT try with more than 10000 since the 'wrong' for-next loop takes ages to run.

Notice i have had to 'slow down' the right way 520 times!!

Here is my code contribution:

Code: Select all

EnableExplicit

NewList slTest.s()

Define Seek.s

Define X.l
Define Count.l
Define Idx.l
Define T1.l, T2.l

Seek.s = "xxtest"

For x = 1 To 10000
   AddElement( slTest.s())
   slTest.s() = "test+" + Str( x)
Next

; this is index number 10000 and not 10001 since we start from ZERO!!
AddElement( slTest())
slTest() = Seek.s

OpenConsole()

; Try to find the "xxtest" in the linkedlist and how long it takes, version 1 for-next loop...
PrintN( "=Start = " + Seek.s)

Idx. l = -1
T1.l = ElapsedMilliseconds()
Count.l = ListSize( slTest.s())
For Idx.l = 0 To Count.l
   SelectElement( slTest.s(), Idx.l)
   If slTest.s() = Seek.s
      Break
   EndIf
Next
T2.l = ElapsedMilliseconds()
    
If Idx.l > -1
   PrintN( "=Found in: " + Str( T2 - T1) + "ms")
   PrintN( "At index: "+ Str( Idx.l))
EndIf


; Now the ForEach methode. It is SOO fast had to slow it down 520 times..
Idx. l = -1
T1.l = ElapsedMilliseconds()
For x = 1 To 520
   ForEach slTest.s()
      If slTest.s() = Seek.s
         Idx.l = ListIndex( slTest.s())
         Break
      EndIf
   Next
Next
T2.l = ElapsedMilliseconds()
    
If Idx.l > -1
   PrintN( "=Found in: " + Str( T2 - T1) + "ms")
   PrintN( "At index: "+ Str( Idx.l))
Else
   PrintN( "!!NOT!! Found in: " + Str( T2 - T1) + "ms")
EndIf

Input()

CloseConsole()

End
Have fun with it, oh by the way my app runs now so fast (yes 100 times faster)!!

Jan V.
Life goes to Fast, Enjoy!!

PB 4 is to good to be true, wake up man it is NOT a dream THIS is a reality!!!

AMD Athlon on 1.75G, 1Gb ram, 160Gb HD, NVidia FX5200, NEC ND-3500AG DVD+RW and CD+RW, in a Qbic EO3702A and Win XP Pro SP2 (registered)
Killswitch
Enthusiast
Enthusiast
Posts: 731
Joined: Wed Apr 21, 2004 7:12 pm

Post by Killswitch »

Christ that's a dramatic speed improvement - I'm a bit dubious as to how it manages to do it this quickly though?
~I see one problem with your reasoning: the fact is thats not a chicken~
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

That certainly is fast! :)

It's not that the 'fast method' is fast as such, but that the first method, relying on SelectElement(), is notoriously slow! :) This is because (I think!) that when you use SelectElement() to say, move to the 10th element, PB first iterates through the first 9 elements to get there etc!
I may look like a mule, but I'm not a complete ass.
Jan Vooijs
Enthusiast
Enthusiast
Posts: 196
Joined: Tue Sep 30, 2003 4:32 pm
Location: The Netherlands

Post by Jan Vooijs »

srod wrote:That certainly is fast! :)

It's not that the 'fast method' is fast as such, but that the first method, relying on SelectElement(), is notoriously slow! :) This is because (I think!) that when you use SelectElement() to say, move to the 10th element, PB first iterates through the first 9 elements to get there etc!
@Srod,
Oops forget that but there is a warning in my above post! It certainly looks likes that because if your run the for-next loop version with 100000 (100k) it takes for EVER to run (ahem 179 secs against 486ms), i inserted an 'every 10k blib message' and it runs slower and slower every next 10k items in the list, so yes SelectElemnt() traves every time from the start, tough luck if you need the last item.

First I did not notice the lack in performance since i have so many spam adresses i figured oke it is slow. But then converted to test and was amazed!!

Jan V.
Life goes to Fast, Enjoy!!

PB 4 is to good to be true, wake up man it is NOT a dream THIS is a reality!!!

AMD Athlon on 1.75G, 1Gb ram, 160Gb HD, NVidia FX5200, NEC ND-3500AG DVD+RW and CD+RW, in a Qbic EO3702A and Win XP Pro SP2 (registered)
Killswitch
Enthusiast
Enthusiast
Posts: 731
Joined: Wed Apr 21, 2004 7:12 pm

Post by Killswitch »

However it works (I'm still hanning onto the "More of Fred's Witchcraft" explination :P) I'm so glad you posted it, it's come at exactly the right time for me!
~I see one problem with your reasoning: the fact is thats not a chicken~
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Turn the first loop into this and you'll see it keep up without problems:

Code: Select all

Idx. l = -1 
T1.l = ElapsedMilliseconds() 
Count.l = CountList( slTest.s()) 
FirstElement(slTest())
For Idx.l = 0 To Count.l 
   If slTest.s() = Seek.s 
      Break 
   EndIf 
   NextElement(slTest())
Next 
T2.l = ElapsedMilliseconds()
dioxin
User
User
Posts: 97
Joined: Thu May 11, 2006 9:53 pm

Post by dioxin »

I'm curious. Why would you use a linked list for this task? Surely an array is more appropriate and would be a lot faster? You can probably scan a 100,000 element array in under 10ms compared to your current linked list search of 480ms.


Of course, if you want real speed then you'd use an entirely different approach such as a sorted array searched with a binary chop (slow to insert new elements but fast to find them) or a binary tree structure (fast to insert and search but more complex to program) both of which I'd expect you to be able to scan in under 10usec.

Paul.
User avatar
Joakim Christiansen
Addict
Addict
Posts: 2452
Joined: Wed Dec 22, 2004 4:12 pm
Location: Norway
Contact:

Post by Joakim Christiansen »

It will jump compulsory to every element until the target position is reached
That is why SelectElement() is slow!
I like logic, hence I dislike humans but love computers.
Jan Vooijs
Enthusiast
Enthusiast
Posts: 196
Joined: Tue Sep 30, 2003 4:32 pm
Location: The Netherlands

Post by Jan Vooijs »

dioxin wrote:I'm curious. Why would you use a linked list for this task? Surely an array is more appropriate and would be a lot faster? You can probably scan a 100,000 element array in under 10ms compared to your current linked list search of 480ms.


Of course, if you want real speed then you'd use an entirely different approach such as a sorted array searched with a binary chop (slow to insert new elements but fast to find them) or a binary tree structure (fast to insert and search but more complex to program) both of which I'd expect you to be able to scan in under 10usec.

Paul.
Yes and NO, if you read cairfully I SLOWED the ForEach loop DOWN because it is so fast i could not time it!! It runs on 100000 (100k) in just 16ms (on averidge) not bad for a 100K scan. I did not try an array this is fast enough for the future!!

Why linked list? Easy to progam, sorting is quick (and very easy). Think of it as a large stack of cards. While in an array one has to be cairfull not to add 'out of bounds'! Look at how easy i fill the database with 10000 (10k) strings, four lines of code!!

Jan V.

(edit an typo it was 4 lines and not three!!)
Last edited by Jan Vooijs on Tue Jul 18, 2006 2:41 am, edited 1 time in total.
Life goes to Fast, Enjoy!!

PB 4 is to good to be true, wake up man it is NOT a dream THIS is a reality!!!

AMD Athlon on 1.75G, 1Gb ram, 160Gb HD, NVidia FX5200, NEC ND-3500AG DVD+RW and CD+RW, in a Qbic EO3702A and Win XP Pro SP2 (registered)
Jan Vooijs
Enthusiast
Enthusiast
Posts: 196
Joined: Tue Sep 30, 2003 4:32 pm
Location: The Netherlands

Post by Jan Vooijs »

Trond wrote:Turn the first loop into this and you'll see it keep up without problems:

Code: Select all

Idx. l = -1 
T1.l = ElapsedMilliseconds() 
Count.l = CountList( slTest.s()) 
FirstElement(slTest())
For Idx.l = 0 To Count.l 
   If slTest.s() = Seek.s 
      Break 
   EndIf 
   NextElement(slTest())
Next 
T2.l = ElapsedMilliseconds()
You are right it is as fast as the second methode. But because the ForEach looks more 'nicer' i stick to that one.

As this proofs you can do one thing in more than two ways!!

Thanks Trond

Jan V.
Life goes to Fast, Enjoy!!

PB 4 is to good to be true, wake up man it is NOT a dream THIS is a reality!!!

AMD Athlon on 1.75G, 1Gb ram, 160Gb HD, NVidia FX5200, NEC ND-3500AG DVD+RW and CD+RW, in a Qbic EO3702A and Win XP Pro SP2 (registered)
sverson
Enthusiast
Enthusiast
Posts: 286
Joined: Sun Jul 04, 2004 12:15 pm
Location: Germany

Post by sverson »

@Jan Vooijs: If you want it really fast try my X-Tree.pbi...

It's quite slow when you add elements - but it's just in time sorted.
But XTree_FindKey is verry fast!!!
500.000 records - max 19 steps to find the key!

You need X-Tree.pbi (beta) to compile the Demo.

Code: Select all

#ListSize = 500000

XIncludeFile "X-Tree.pbi"

Global Dim SourceList.s(#ListSize)
Global NewList LinkedList.s()
Global *XTree.XTree_RN
Global StrToFind.s

Procedure.s FillSourceList() ;/ Same source for LinkedList and X-Tree
  Protected ListPos.l, KeyStr.s, KeyPos.l, EMS.l
  EMS = ElapsedMilliseconds()
  For ListPos = 1 To #ListSize
    KeyStr=""
    For KeyPos=1 To 10
      KeyStr + RSet(Hex(Random($FF)),2,"0")
    Next
    SourceList(ListPos) = KeyStr
  Next
  ProcedureReturn "SourceList(): "+Str(#ListSize)+" Elements stored in " +Str(ElapsedMilliseconds()-EMS)+"ms."
EndProcedure

Procedure.s FillLinkedList()
  Protected ListPos.l, EMS.l
  EMS = ElapsedMilliseconds()
  For ListPos = 1 To #ListSize
    AddElement(LinkedList())
    LinkedList() = SourceList(ListPos)
  Next
  ProcedureReturn "LinkedList(): "+Str(#ListSize)+" Elements stored in " +Str(ElapsedMilliseconds()-EMS)+"ms."
EndProcedure

Procedure.s FillXTree()
  Protected ListPos.l, EMS.l
  EMS = ElapsedMilliseconds()
  *XTree.XTree_RN = XTree_NewTree(10)
  For ListPos = 1 To #ListSize
    XTree_NewKey(*XTree,SourceList(ListPos))
  Next
  ProcedureReturn "XTree: "+Str(#ListSize)+" Elements stored in " +Str(ElapsedMilliseconds()-EMS)+"ms."
EndProcedure

Procedure.s FindLinkedList(KeyStr.s)
  Protected EMS.l
  EMS = ElapsedMilliseconds()
  ForEach LinkedList()
    If LinkedList() = KeyStr
      ProcedureReturn "LinkedList(): '"+KeyStr+"' found after " +Str(ElapsedMilliseconds()-EMS)+"ms."
    EndIf
  Next
  ProcedureReturn "LinkedList(): '"+KeyStr+"' NOT found after " +Str(ElapsedMilliseconds()-EMS)+"ms."
EndProcedure
  
Procedure.s FindXTree(KeyStr.s)
  Protected EMS.l
  EMS = ElapsedMilliseconds()
  If XTree_FindKey(*XTree,KeyStr)
    ProcedureReturn "XTree: '"+KeyStr+"' found after " +Str(ElapsedMilliseconds()-EMS)+"ms."
  Else
    ProcedureReturn "XTree: '"+KeyStr+"' NOT found after " +Str(ElapsedMilliseconds()-EMS)+"ms."
  EndIf
EndProcedure


Debug FillSourceList()
Debug FillLinkedList()
Debug FillXTree()

For x=1 To 20
  Debug "___________________________"
  StrToFind = SourceList(Random(#ListSize))
  If Not Random(3) : StrToFind+" ERROR!" : EndIf
  Debug FindLinkedList(StrToFind)
  Debug FindXTree(StrToFind)
  Debug "¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯"
Next
;-) sverson
Xombie
Addict
Addict
Posts: 898
Joined: Thu Jul 01, 2004 2:51 am
Location: Tacoma, WA
Contact:

Post by Xombie »

My red-black tree stuff is still hiding out here somewhere on the forum. At the time, it took roughly 4.5 seconds to add 2,000,000 random elements and somewhere around 0 to find a key from those 2 million records. It was actually one of my first tidbits I released on the forum :) I should optimize that sucker....
Jan Vooijs
Enthusiast
Enthusiast
Posts: 196
Joined: Tue Sep 30, 2003 4:32 pm
Location: The Netherlands

Post by Jan Vooijs »

sverson wrote:@Jan Vooijs: If you want it really fast try my X-Tree.pbi...

... SNIP EDIT ...

;-) sverson
I will look into it but seeing the code the first time i'am afriad my code needs a rewrite to use yours, maybe if i run into perfomance problems i will use it. One thing if i remember correctly a X-tree it not so fast if there are a lot off 'almost similair' entries. I have +350 Hotmail accounts sending spam (among the 18000+ entries).

My current version (the ForEach-methode) looks in under 15ms through the current database of 18000+ adresses, which in my opinion is very fast

And until i hit the 10.000.000 (10 million mark i find it in just under 1200ms). And 1 million is 'just' 93ms. In the last 3 years i build the spam adresses from zero to hero, pardon to 18000+ so i expect to run a long time with the ForEach-methode.

An other idea is to use an "index", every letter has a starting point in the linked list since it is sorted, so switching to that first index an search from there will likely speed it up. Or better making a linked list for every letter and digit (ouch)...

But thanks for the sugestion.

And by the way found a third version which is also very fast:

Code: Select all

Idx. l = -1
T1.l = ElapsedMilliseconds()
ResetList( slTest.s())
While NextElement(slTest())
   If slTest.s() = Seek.s
      Break
   EndIf
Wend
T2.l = ElapsedMilliseconds()
The trick is the "ResetList()' and 'NextElement()'. It is slightly modified from the help-file where i found it today. And yes as fast (sometimes 1ms faster) then my ForEach-methode. But i find that ForEach so elegant i stick to it (for now).

But than this post was more of showing "how to kiil a bird with more than one stone" (example not that I would kill a bird, heavens!!) ;)

Jan V.
Life goes to Fast, Enjoy!!

PB 4 is to good to be true, wake up man it is NOT a dream THIS is a reality!!!

AMD Athlon on 1.75G, 1Gb ram, 160Gb HD, NVidia FX5200, NEC ND-3500AG DVD+RW and CD+RW, in a Qbic EO3702A and Win XP Pro SP2 (registered)
Jan Vooijs
Enthusiast
Enthusiast
Posts: 196
Joined: Tue Sep 30, 2003 4:32 pm
Location: The Netherlands

Post by Jan Vooijs »

Xombie wrote:My red-black tree stuff is still hiding out here somewhere on the forum. At the time, it took roughly 4.5 seconds to add 2,000,000 random elements and somewhere around 0 to find a key from those 2 million records. It was actually one of my first tidbits I released on the forum :) I should optimize that sucker....
Xombie is XSub not a program of yours? Found it somewhere and looks promessing for subtitleing some TV shows I have.

And again if i hit a perfomance barrier i look into yours code too.

Thanks,

Jan V.
Life goes to Fast, Enjoy!!

PB 4 is to good to be true, wake up man it is NOT a dream THIS is a reality!!!

AMD Athlon on 1.75G, 1Gb ram, 160Gb HD, NVidia FX5200, NEC ND-3500AG DVD+RW and CD+RW, in a Qbic EO3702A and Win XP Pro SP2 (registered)
sverson
Enthusiast
Enthusiast
Posts: 286
Joined: Sun Jul 04, 2004 12:15 pm
Location: Germany

Post by sverson »

Xombie wrote:At the time, it took roughly 4.5 seconds to add 2,000,000 random elements....
@Xombie:
On my PC your Red/Black Tree took 25 seconds to add 500.000 elements.
I could not test the lookop function because RBTree does not support string-keys.
XTree added 500.000 sorted string elements in 17,5 seconds. ;-)
Debugger wrote:SourceList(): 500000 Elements stored in 2516ms.
LinkedList(): 500000 Elements stored in 390ms.
XTree: 500000 Elements stored in 17438ms.
RBTree: 500000 Elements stored in 25031ms.
___________________________
LinkedList(): '3855BDC7418D8C00560A ERROR!' NOT found after 63ms.
XTree: '3855BDC7418D8C00560A ERROR!' NOT found after 0ms.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
___________________________
LinkedList(): '60D2EACF61D94470E73F' found after 15ms.
XTree: '60D2EACF61D94470E73F' found after 0ms.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
___________________________
LinkedList(): 'DDDBF81FEB5641553150' found after 63ms.
XTree: 'DDDBF81FEB5641553150' found after 0ms.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
___________________________
LinkedList(): '3181D1212E5100ECC98E ERROR!' NOT found after 62ms.
XTree: '3181D1212E5100ECC98E ERROR!' NOT found after 0ms.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
... ...
___________________________
LinkedList(): 'BDDD3EE1B29A21F264A5' found after 31ms.
XTree: 'BDDD3EE1B29A21F264A5' found after 0ms.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
___________________________
LinkedList(): '3A1F49558F907763DD08' found after 63ms.
XTree: '3A1F49558F907763DD08' found after 0ms.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Code: Select all

Procedure.s FillRBTree()
  Protected ListPos.l, EMS.l
  EMS = ElapsedMilliseconds()
  *RBTree = rbtCreate()
  For ListPos = 1 To #ListSize
    rbtAdd(*RBTree,ListPos,SourceList(ListPos))
  Next
  ProcedureReturn "RBTree: "+Str(#ListSize)+" Elements stored in " +Str(ElapsedMilliseconds()-EMS)+"ms."
EndProcedure
;-) sverson
Post Reply