Page 1 of 2

PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 3:16 pm
by BarryG
Hi all, I'm a fan of Dave Plummer on YouTube (he wrote "Task Manager" on Windows, and lots of other Windows things) and he just posted a video about the speed of calculating prime numbers in Python, C#, and C++ (see the results here). He posted the source for the three languages on Github. My obvious question is: can someone convert one of those sources to PureBasic to compare the speed to his results? Would be very interesting to see. Make sure you post your code here to compare. Thanks!

[Edit] The test is done over 5 seconds, to compare properly with his results.

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 4:10 pm
by Axolotl
i will have a look to the codes. usually i am to slow on doing this... the professionals here will beat me with ther solutions i guess. :oops:
stay tuned.

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 4:24 pm
by NicTheQuick
Just translated it for you:

Code: Select all

EnableExplicit

Global Dim rawbits.a(0)
Global sieveSize.i

Procedure validateResults(actualCount.i)
	Protected count.i
	Select sieveSize
		Case 10: count = 1
		Case 100: count = 25
		Case 1000: count = 168
		Case 10000: count = 1229
		Case 100000: count = 9592
		Case 1000000: count = 78498
		Case 10000000: count = 664579
		Case 100000000: count = 5761455
		Default: ProcedureReturn #False
	EndSelect
	
	ProcedureReturn Bool(count = actualCount)
EndProcedure

Procedure GetBit(index.i)
	If index & 1
		index / 2
		ProcedureReturn rawbits(index / 8) & (1 << (index % 8))
	EndIf
	ProcedureReturn #False
EndProcedure

Procedure ClearBit(index.i)
	If index & 1
		index / 2
		rawbits(index / 8) & ~(1 << (index % 8))
	EndIf
EndProcedure

Procedure prime_sieve(n.i)
	sieveSize = n
	ReDim rawbits.a(n / 8)
	FillMemory(@rawbits(0), n / 8 + 1, $ff, #PB_Ascii)
EndProcedure

Procedure runSieve()
	Protected factor.i = 3
	Protected q = Sqr(sieveSize)
	Protected num.i
	
	While (factor < q)
		num = factor
		While num < sieveSize
			If GetBit(num)
				factor = num
				Break
			EndIf
			num + 1
		Wend
		num = factor * 3
		While num < sieveSize
			ClearBit(num)
			num + factor * 2
		Wend
		factor + 2
	Wend
EndProcedure

Procedure printResults(showResults.i, duration.d, passes.i)
	Protected num.i
	
	If showResults
		Print("2, ")
	EndIf
	
	Protected count.i = 1
	For num = 3 To sieveSize
		If GetBit(num)
			If showResults
				Print(Str(num) + ", ")
			EndIf
			count + 1
		EndIf
	Next
	
	If showResults
		PrintN("")
	EndIf
	
	PrintN("Passes: " + passes + 
	       ", Time: " + duration + " ms" +
	       ", Avg: " + StrD(duration / passes, 3) + " ms" +
	       ", Limit: " + sieveSize + 
	       ", Count: " + count + 
	       ", Valid: " + Str(validateResults(count)))
EndProcedure

Procedure main(limit.i = 1000000, duration.i = 10000)
	Protected passes.i = 0
	
	Protected tStart.i = ElapsedMilliseconds()
	
	While ElapsedMilliseconds() - tStart < duration
		prime_sieve(limit)
		runSieve()
		passes + 1
	Wend
	
	Protected tD.i = ElapsedMilliseconds() - tStart
	
	printResults(#False, tD, passes)
EndProcedure

OpenConsole()

main(1000000, 10000)
Input()
CloseConsole()
And this is the output on my old laptop:

Code: Select all

Passes: 246, Time: 10021 ms, Avg: 40.736 ms, Limit: 1000000, Count: 78498, Valid: 1
And in case you are curious, this is with debugger on:

Code: Select all

Passes: 30, Time: 10190 ms, Avg: 339.667 ms, Limit: 1000000, Count: 78498, Valid: 1
I also compiled the C++ code using g++ on Linux and got this result:

Code: Select all

Passes: 1741, Time: 10000 ms, Avg: 5.744, Limit: 1000000, Count: 78498, Valid: 1
As you can see it is nearly 7 times faster. I gues the main reason for that is automatic inlining of functions and better optimization for GetBit and ClearBit.

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 4:30 pm
by tft
on my maschine ....... Passes 329, Time 10004 ms, AVG= 30.407

and yet ???

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 4:58 pm
by BarryG
The test is meant to be done over a 5-second period, not 10. My result over 5 seconds is:

Passes: 182, Time: 5001 ms, Avg: 27.478 ms, Limit: 1000000, Count: 78498, Valid: 1

(And thanks for the converted code, NicTheQuick!).

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 5:24 pm
by fluent
PB 5.62 x86, debugger off:
Passes: 395, Time: 5009 ms, Avg: 12.681 ms, Limit: 1000000, Count: 78498, Valid: 1

PB 5.62 x64, debugger off:
Passes: 122, Time: 5039 ms, Avg: 41.303 ms, Limit: 1000000, Count: 78498, Valid: 1

Looks like 64-bit is slower??

(win 10 x64 - 5-year-old low-end ASUS laptop)

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 5:31 pm
by skywalk
Ha! This is quite timely with v6 approaching.

Please post your OS versions and whitelisting of PB app in use.
And there is no process priority applied to the test?

Re: PureBasic speed test compared to this video

Posted: Thu Mar 25, 2021 6:51 pm
by NicTheQuick
With macros and the avoidance of modulo you actually can make it pretty fast. And that is what a C compiler usually does automatically for you:

Code: Select all

EnableExplicit

Global Dim rawbits.a(0)
Global sieveSize.i

Procedure validateResults(actualCount.i)
	Protected count.i
	Select sieveSize
		Case 10: count = 1
		Case 100: count = 25
		Case 1000: count = 168
		Case 10000: count = 1229
		Case 100000: count = 9592
		Case 1000000: count = 78498
		Case 10000000: count = 664579
		Case 100000000: count = 5761455
		Default: ProcedureReturn #False
	EndSelect
	
	ProcedureReturn Bool(count = actualCount)
EndProcedure

Macro GetBit(index)
	((index & 1) And rawbits(index >> 4) & (1 << ((index & $f) >> 1)))
EndMacro

Macro ClearBit(index)
	rawbits(index >> 4) & ~(1 << ((index & $f) >> 1))
EndMacro

Procedure prime_sieve(n.i)
	sieveSize = n
	ReDim rawbits.a(n / 8)
	FillMemory(@rawbits(0), n / 8 + 1, $ff, #PB_Ascii)
EndProcedure

Procedure runSieve()
	Protected factor.i = 3
	Protected q = Sqr(sieveSize)
	Protected num.i
	
	While (factor < q)
		num = factor
		While num < sieveSize
			If GetBit(num)
				factor = num
				Break
			EndIf
			num + 1
		Wend
		num = factor * 3
		While num < sieveSize
			; We can be sure that num is always an odd number
			ClearBit(num)
			num + factor * 2
		Wend
		factor + 2
	Wend
EndProcedure

Procedure printResults(showResults.i, duration.d, passes.i)
	Protected num.i
	
	If showResults
		Print("2, ")
	EndIf
	
	Protected count.i = 1
	For num = 3 To sieveSize
		If GetBit(num)
			If showResults
				Print(Str(num) + ", ")
			EndIf
			count + 1
		EndIf
	Next
	
	If showResults
		PrintN("")
	EndIf
	
	PrintN("Passes: " + passes + 
	       ", Time: " + duration + " ms" +
	       ", Avg: " + StrD(duration / passes, 3) + " ms" +
	       ", Limit: " + sieveSize + 
	       ", Count: " + count + 
	       ", Valid: " + Str(validateResults(count)))
EndProcedure

Procedure main(limit.i = 1000000, duration.i = 5000)
	Protected passes.i = 0
	
	Protected tStart.i = ElapsedMilliseconds()
	
	While ElapsedMilliseconds() - tStart < duration
		prime_sieve(limit)
		runSieve()
		passes + 1
	Wend
	
	Protected tD.i = ElapsedMilliseconds() - tStart
	
	printResults(#False, tD, passes)
EndProcedure

OpenConsole()

main(1000000, 5000)
Input()
CloseConsole()
Old code over 5 seconds:

Code: Select all

Passes: 129, Time: 5001 ms, Avg: 38.767 ms, Limit: 1000000, Count: 78498, Valid: 1
New version over 5 seconds:

Code: Select all

Passes: 1690, Time: 5000 ms, Avg: 2.959 ms, Limit: 1000000, Count: 78498, Valid: 1
C-Code over 5 seconds on the same machine:

Code: Select all

Passes: 909, Time: 5000 ms, Avg: 5.501 ms, Limit: 1000000, Count: 78498, Valid: 1
C-Code over 5 seconds with -O3 flag for best optimization:

Code: Select all

Passes: 4431, Time: 5000 ms, Avg: 1.128 ms, Limit: 1000000, Count: 78498, Valid: 1
Btw my CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz with 8MB L3 cache.
It's a Thinkpad W530.

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 2:31 am
by Rinzwind
PB needs inline procedures as manual option at least if the compiler is not smart enough to detect when to apply it.

Same as with easy byref parameters (way more BASIC than having to deal with pointers). I guess it is too much to ask for for current asm emitter logic? Too much old, sticky and cold spaghetti?

Anyway good to see that macro's saved PB's ass ;). They could also do with some practical extra's btw.

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 2:41 am
by skywalk
Macro's are inlined code. :idea: :wink:

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 10:46 am
by NicTheQuick
skywalk wrote:Macro's are inlined code. :idea: :wink:
Not really. Macros are more like generic programming in C(++).
For example you can not create Macros with a non trivial return value. This only works if the macro itself is a one liner. And then you also have the issue that parameters which are used multiple times were also evaluated multiple times. See here:

Code: Select all

Global counter = 0
Procedure count()
	counter + 1
	ProcedureReturn counter
EndProcedure

Macro tripleAMacro(a)
	((a) + (a) + (a))
EndMacro

Procedure tripleAProcedure(a)
	ProcedureReturn ((a) + (a) + (a))
EndProcedure

; Calls count() once
Debug tripleAProcedure(count())

counter = 0
; Calls count() three times
Debug tripleAMacro(count())

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 12:53 pm
by skywalk
I agree, but I was referring to your macro code in the speed test. l often inline code for speed at the expense of shorter procedures.

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 1:05 pm
by NicTheQuick
skywalk wrote:I agree, but I was referring to your macro code in the speed test. l often inline code for speed at the expense of shorter procedures.
In this case the inlining doubled the speed. If I change GetBit and ClearBit back to a procedure I get this result:

Code: Select all

Passes: 876, Time: 5002 ms, Avg: 5.710 ms, Limit: 1000000, Count: 78498, Valid: 1
which is ~52% of the speed of the Macro version.
If I then replace the shifts back to division (">> 4" -> "/ 16") it makes a much bigger difference:

Code: Select all

Passes: 190, Time: 5010 ms, Avg: 26.368 ms, Limit: 1000000, Count: 78498, Valid: 1
Finally replacing the bitwise ANDs with modulo again ("& $f" -> "% 16") it gets even slower:

Code: Select all

Passes: 125, Time: 5013 ms, Avg: 40.104 ms, Limit: 1000000, Count: 78498, Valid: 1
I think Purebasic should do the optimization with the modulo and shifts automatically. Then you would not need to make it manually.

But if Purebasic evolves to a compiler for C-Code instead FASM, we do not longer care about that I guess.

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 1:29 pm
by Axolotl
@NicTheQuick:
if i understand you correctly, these are the fastest routines based on procedure, right? :oops:

Code: Select all

; Macro GetBit(index) 
;    ((index & 1) And rawbits(index >> 4) & (1 << ((index & $f) >> 1)))
; EndMacro

Procedure GetBit(index.i)
   If index & 1
      ProcedureReturn rawbits(index >> 4) & (1 << ((index & $f) >> 1))
   EndIf
   ProcedureReturn #False
;    If index & 1
;       index / 2
;       ProcedureReturn rawbits(index / 8) & (1 << (index % 8))
;    EndIf
;    ProcedureReturn #False
EndProcedure

; Macro ClearBit(index)
;    rawbits(index >> 4) & ~(1 << ((index & $f) >> 1))
; EndMacro

Procedure ClearBit(index.i)
  rawbits(index >> 4) & ~(1 << ((index & $f) >> 1))
;    If index & 1
;       index / 2
;       rawbits(index / 8) & ~(1 << (index % 8))
;    EndIf
EndProcedure
@BarryG: told you :oops: (my solution was very similar to NicTheQuick's, so i refrained from publishing it.)

Re: PureBasic speed test compared to this video

Posted: Fri Mar 26, 2021 1:40 pm
by NicTheQuick
Axolotl wrote:@NicTheQuick:
if i understand you correctly, these are the fastest routines based on procedure, right? :oops:
The fastest should be these:

Code: Select all

Procedure GetBit(index)
	ProcedureReturn Bool((index & 1) And rawbits(index >> 4) & (1 << ((index & $f) >> 1)))
EndProcedure

Procedure ClearBit(index)
	rawbits(index >> 4) & ~(1 << ((index & $f) >> 1))
EndProcedure
They are branchless (best for optimization within the CPU itself) and do not use any division or modulo (which is a division by itself).