Simple speed benchmark results

Post by OgreVorbis »

I began experimenting with using Lua in PureBasic and was curious how its speed compares to other languages. I found these benchmark results on GitHub and added PureBasic to the list. The code is extremely simple, so it may not be a very good real-world example, but it's still interesting. There is a better benchmark called prime sieve, but I have not implemented it in PB yet.

( original link: https://github.com/DNS/benchmark-language )

I pruned and sorted the results to make them easier to read, and I added PureBasic to the list. Let me know if there is some way I can further optimize the PB version.

Here is the code I wrote for the PB version:

Code: Select all

EnableExplicit
Define x.d = 1
Define i.i = 0

OpenConsole()
Define StartTime.d = ElapsedMilliseconds()

For i = 0 To 99999998
	x = (i+i+2*i+1-0.379)/(x)
Next

Define EndTime.d = (ElapsedMilliseconds() - StartTime) / 1000
PrintN("It took " + StrD(EndTime) + " seconds to complete.")
PrintN("Result: " + StrD(x))
PrintN("Press ENTER to exit. . .")
Input()
CloseConsole()
And here's the Lua it was based off of:

Code: Select all

local x = 1
for i=0,99999998 do
	x = (i+i+2*i+1-0.379)/(x)
end
print(x)
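
For anyone who wants to try the same loop without PB or Lua installed, here is a minimal Python sketch of it. It is not one of the benchmarked programs; the function names `run_loop` and `timed` are made up for illustration, and the demo uses a reduced iteration count so it finishes quickly. Its timings are not comparable to the published numbers.

```python
import time


def run_loop(iterations):
    """Apply x = (i + i + 2*i + 1 - 0.379) / x for i = 0 .. iterations - 1."""
    x = 1.0
    for i in range(iterations):
        x = (i + i + 2 * i + 1 - 0.379) / x
    return x


def timed(iterations):
    """Return (result, elapsed seconds) for one run of the loop."""
    start = time.perf_counter()
    result = run_loop(iterations)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # The PB and Lua versions run i = 0 .. 99999998, i.e. 99,999,999 iterations.
    # A smaller count is used here (an assumption) to keep the demo short.
    result, seconds = timed(1_000_000)
    print(f"It took {seconds:.3f} seconds to complete.")
    print(f"Result: {result}")
```

Printing the result matters: it gives a quick check that two ports compute the same thing before their times are compared.
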
Finally, the results:

Code: Select all

C (CLANG LLVM 6.0.0)
command took 0.65s

C (MSVC 18, VS 2013)
command took 0.65s

C (MINGW CLANG 8.0.1)
command took 0.65s

LuaJIT 2.0.5
command took 0.65s

C (CYGWIN CLANG 8.0.1)
command took 0.66s

C (GCC 7.2.0)
command took 0.66s

C (MINGW GCC 10.2.0)
command took 0.66s

C (CYGWIN GCC 10.2.0)
command took 0.66s

C# .NET CLR (CSC 12)
command took 0.67s

PureBasic 5.73 LTS 64-bit
command took 1.02s
(debug mode took 7.00s)

C (Embarcadero C++ 6.60 for Win32)
command took 1.40s

LUAC 5.3.4
command took 7.19s

LUA 5.3.4
command took 4.37s
Last edited by OgreVorbis on Sat Nov 06, 2021 5:12 am, edited 1 time in total.
My blog/software site: http://dosaidsoft.com/

Post by STARGÅTE »

How do you compare your PB execution with the reported executed times?
Is it the same computer system? A benchmark depends on the operating system. Have you run also all other codes?
Your code result on my machine:

Code: Select all

It took 0.573 seconds to complete.
Result: 7051.5711976423
Press ENTER to exit. . .
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module

Post by Tenaja »

I am really curious to see how the C output version of pb compares, with the optimizer at its best.

Post by OgreVorbis »

STARGÅTE wrote: Sat Nov 06, 2021 12:22 am How do you compare your PB execution with the reported executed times?
Is it the same computer system? A benchmark depends on the operating system. Have you run also all other codes?
You're correct, I did not run all the tests on my PC, so the results may be flawed. I will edit that out for now until I can do all the tests on my PC. Maybe the results are misleading.

Now I have run another benchmark, with ALL tests on my own machine. Better results. If someone wants to test the new C backend, please post.

Code: Select all

100,000,000 Primes:
===================

* All compilers are 64-bit running on same machine
(compiler - seconds - RAM - exe size)

GCC 8.1 - 1.63 - 99MB - 56KB
PB (byte) - 1.66 - 98MB - 14.5KB
Pelles C - 1.92 - 98MB - 53KB
PB (int) - 2.8 - 783MB - 14.5KB
LuaJIT - 5 - 1.05GB
Lua - 19 - 2.1GB
PB code:

Code: Select all

EnableExplicit
Dim Nums.a(0)
Define l, n, m, lim
 
If OpenConsole()
 
  ; Ask for the limit to search, get that input and allocate a Array
  Print("Enter limit for this search: ")
  lim = Val(Input())
  
  Define StartTime.d = ElapsedMilliseconds()
  Dim Nums(lim)
 
  ; Use a basic Sieve of Eratosthenes
  For n = 2 To Sqr(lim)
    If Nums(n) = #False
      m = n * n
      While m <= lim
        Nums(m) = #True
        m + n
      Wend
    EndIf
  Next n
  
  Define EndTime.d = (ElapsedMilliseconds() - StartTime) / 1000
  PrintN("It took " + StrD(EndTime) + " seconds to complete.")
  Print("Press ENTER to list results. . . ") : Input()
  PrintN(#CRLF$ + "The primes up to " + Str(lim) + " are:")
  m = 0
  For n = 2 To lim
    If Nums(n) = #False
      PrintN(Str(n))
      m + 1
    EndIf
  Next
 
  Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input()
  CloseConsole()
EndIf
Lua code for comparison: https://pastebin.com/kH8pyx2b
C code: https://pastebin.com/vYxrJL2J

I rewrote it in C and Lua to be as similar as possible. In this case I'm very impressed with PB. It's so close to GCC with full optimization.
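
The jump from 98MB to 783MB between the byte and int versions is roughly what the element size alone predicts: 100,000,001 one-byte flags are about 95 MiB, while the same number of 8-byte integers is about 763 MiB (assuming PB's default integer is 8 bytes in a 64-bit build). Here is a small Python sketch of the same sieve and that arithmetic. The helper names are made up, and this is not the code that produced the timings.

```python
def sieve(limit):
    """Return flags where flags[n] == 0 means n was never marked as composite."""
    flags = bytearray(limit + 1)  # one byte per number, like Dim Nums.a(lim)
    n = 2
    while n * n <= limit:
        if flags[n] == 0:
            # Mark n*n, n*n + n, ... as composite, as the inner While loop does.
            for m in range(n * n, limit + 1, n):
                flags[m] = 1
        n += 1
    return flags


def count_primes(limit):
    """Count the unmarked entries from 2 up to limit."""
    flags = sieve(limit)
    return sum(1 for n in range(2, limit + 1) if flags[n] == 0)


def array_mib(limit, bytes_per_element):
    """Size in MiB of an array with indices 0 .. limit."""
    return (limit + 1) * bytes_per_element / (1024 * 1024)


if __name__ == "__main__":
    print(count_primes(1000))
    print(f"byte flags: {array_mib(100_000_000, 1):.1f} MiB")
    print(f"8-byte ints: {array_mib(100_000_000, 8):.1f} MiB")
```

The gap between these figures and the reported RAM would be the rest of the process, which this estimate ignores.
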

Post by OgreVorbis »

Alright, I updated all the code (there were some issues in the C version) and had some others look at it to confirm. Here are the final results:

Code: Select all

100,000,000 Primes:
===================

* All compilers are 64-bit running on same machine
(compiler - seconds - RAM - exe size)

GCC 8.1 - 1.19 - 99MB - 17KB
Pelles C - 1.19 - 98MB - 50KB
C# (x64 (.NET 4.0)) - 1.27 - 101MB - 12KB
C# (x86 (.NET 4.0)) - 1.57 - 100MB - 16KB
PB 5.73 (x64) - 1.59 - 98MB - 15KB
LuaJIT - 4 - 1.05GB
Lua - 17 - 2.1GB
Not as impressive, unfortunately.
PureBasic's time rises to 1.65 - 1.7 if any more code of any kind is added to the file :( This is interesting to me. Apparently the assembler output slows down as more code is added (completely outside and away from the main calculation). For example, I added a Macro with a tiny "repeat inkey" loop to get "press any key to continue" functionality. When I did so, the program slowed down. I also tried adding a double and pre-calculating the sqrt outside the main loop. This should provide better performance, but it actually got worse due to the small size increase in the program. Very strange. So the takeaway is, larger PB programs will run slower even if the performant code is untouched. I bet the C backend will not have this problem.

Code updates:
C code: https://pastebin.com/R7CNS82k
C# code: https://pastebin.com/T9mnfKrX (the sqrt being outside the loop made no difference)
Lua code: https://pastebin.com/kH8pyx2b
PB code is unchanged.

Post by STARGÅTE »

OgreVorbis wrote: Sun Nov 14, 2021 10:36 am So the takeaway is, larger PB programs will run slower even if the performant code is untouched.
:lol: You cannot make a general statement about all PB programs when you have tested just one piece of code on one machine.

Please be aware of CPU properties such as the L1, L2, and L3 caches, Hyper-Threading, and frequency turbo boost.
What I want to say is:
The identical code can run faster when some dummy code runs before it, "heating up" the CPU and enabling the turbo boost.
The identical code can run slower when some other code is added, which could also change the contents of the L1/L2/L3 caches.
And these are just two examples of many other influences.
OgreVorbis wrote: Sun Nov 14, 2021 10:36 am I also tried adding a double and pre-calculating the sqrt outside the main loop. This should provide better performance, but it actually got worse due to the small size increase in the program. Very strange.
Yes, it should. However, if your limit is large, the bottleneck is the inner While loop, which accounts for more and more of the running time. If that inner code ends up running in a non-optimized way because of other code around it, the time can increase.
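
One way to make such effects visible rather than guess at them is to run the timed section several times after a warm-up run and look at the spread. Here is a minimal Python sketch of the idea. The helper names `measure` and `summarize` are invented here, and the workload is only a stand-in for the real benchmark.

```python
import statistics
import time


def measure(func, repeats=5, warmup=1):
    """Call func warmup times untimed, then return a list of timed runs in seconds."""
    for _ in range(warmup):
        func()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce the samples to min, median and max so the spread is visible."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    def workload():
        # Stand-in for the real benchmark body.
        return sum(i * i for i in range(200_000))

    for name, seconds in summarize(measure(workload)).items():
        print(f"{name}: {seconds:.4f} s")
```

If the difference between two program variants is smaller than the spread between repeated runs of the same variant, it says little about the variants themselves.
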

Post by OgreVorbis »

STARGÅTE wrote: Sun Nov 14, 2021 11:19 am :lol: You cannot make a general statement about all PB programs when you have tested just one piece of code on one machine.

Please be aware of CPU properties such as the L1, L2, and L3 caches, Hyper-Threading, and frequency turbo boost.
What I want to say is:
The identical code can run faster when some dummy code runs before it, "heating up" the CPU and enabling the turbo boost.
The identical code can run slower when some other code is added, which could also change the contents of the L1/L2/L3 caches.
And these are just two examples of many other influences.
Yes, I should have said "might" run slower, because there are so many variables surrounding the situation.
The point is that adding more and more to the C# and C programs (some not included in the posted code) did not affect the speed of the main calculation, but on PB it did. I don't mean to say it will always be like this.
Those are interesting things you mentioned though, about the turbo and cache. Something I noticed is that when the PB program is launched, it changes the mouse to the waiting cursor for a few seconds (even though it loads right away). I found that strange because the others didn't. Maybe PB runs some sort of initialization code.

Post by Jeff8888 »

Out of curiosity I decided to compare PB with programming directly in assembly. So I generated assembly code with PB, took the portion for the loops, modified it, and put it back into the BASIC code. Mainly I just used registers for variables. Various instructions could be pulled out of the loops because they only need to be done once rather than on every iteration, such as putting the address of the array into register rbp.

Bottom line: sieving up to 100,000,000 takes about 1.5 secs with no assembly and 1.05 secs using assembly. Be sure to compile without the debugger. When I was debugging the assembly code, I needed to insert a simple statement like dummy=1 and set a breakpoint there. Thanks PB for being able to display the r registers in the debugger.

No assembly code

Code: Select all

Dim Nums.b(0)
Define.l l, n, m, lim,limroot

If OpenConsole()
  
  ; Ask for the limit to search, get that input and allocate a Array
  Print("Enter limit for this search: ")
  lim = Val(Input())
  If lim=0
    lim=100000000 ;default if zero input
  EndIf
  
  Define StartTime.d = ElapsedMilliseconds()
  Dim Nums(lim)
  limroot=Sqr(lim)
  
  ; Use a basic Sieve of Eratosthenes
  n=2 
  Repeat
    If Nums(n) = #False
      m = n * n
      While m <= lim
        Nums(m)+1
        m = m + n
      Wend
    EndIf
    n= n+1
  Until n>limroot
  
  Define EndTime.d = (ElapsedMilliseconds() - StartTime) / 1000.0
  PrintN("It took " + StrD(EndTime) + " seconds to complete.")
  Print("Press ENTER to list results. . . ") : Input()
  PrintN(#CRLF$ + "The primes up to " + Str(lim) + " are:")
  m = 0
  For n = 2 To lim
    If Nums(n) = #False
      ;PrintN(Str(n))
      m = m + 1
    EndIf
  Next
  PrintN(#CRLF$ + "The number of primes up to " + Str(lim) + " is: "+Str(m))
  Dim pcount(20)
  For i=0 To 9
    If i=0
      pcount(i)=pcount(i)-2
    EndIf
    For j=i*lim/10 To i*lim/10+lim/10-1
      If Nums(j) = #False
        ;PrintN(Str(n))
        pcount(i)=pcount(i)+1
      EndIf
      
    Next
    PrintN("Num Primes from "+Str(i*lim/10)+" To " +Str(i*lim/10+lim/10-1)+" is "+Str(pcount(i)))
  Next
  Print(#CRLF$ + "Press ENTER to exit"): Input()
  CloseConsole()
EndIf

Assembly code:

Code: Select all

Dim Nums.a(0)
Define.l l, n, m

If OpenConsole()
  again:
  ; Ask for the limit to search, get that input and allocate a Array
  Print("Enter limit for this search: ")
  lim = Val(Input())
  If lim=0
    lim=100000000 ;default if zero input
  EndIf
  
  Define StartTime.d = ElapsedMilliseconds()
  Dim Nums(lim)
  limroot=Sqr(lim)
  EnableASM 
  ; Use a basic Sieve of Eratosthenes
  ; n=2 
  ! MOV    R11,2
  
  ;Initialize registers
  ! MOV rbp,qword [a_Nums]
  ! MOV r8,qword [v_limroot]
  ! MOV r9, qword [v_lim]

  ; Repeat
  ! _Repeat5:
  ; If Nums(n) = #False
  
  ! MOVZX  r15,byte [rbp+r11]
  ! XOR rax,rax
  ! CMP r15,rax
  ! JNE   _EndIf7
  
  ; m = n * n
  ! MOV r15,R11
  ! IMUL   r15,R11
  ! MOV    R10,r15

  ; While m <= lim
  ! _While8:
  ! CMP    R10,R9
  ! JG    _Wend8
  
  ; Nums(m) = #True
  ! MOV    byte [rbp+r10],1
  
  ; m = m + n
  ! ADD    r10,r11
 
  ; Wend
  ! JMP   _While8
  ! _Wend8:
  
  ; EndIf
  ! _EndIf7:
  ; n= n+1
  
  ! INC R11
  
  ; Until n>limroot
  ! CMP    R11,R8
  ! JLE   _Repeat5
  DisableASM
  
  Define EndTime.d = (ElapsedMilliseconds() - StartTime) / 1000.0
  PrintN("It took " + StrD(EndTime) + " seconds to complete.")
  Print("Press ENTER to list results. . . ") : Input()
  PrintN(#CRLF$ + "The primes up to " + Str(lim) + " are:")
  m = 0
  For n = 2 To lim
    If Nums(n) = #False
      ;PrintN(Str(n))  ;uncomment for small numbers if you wish
      m = m + 1
    EndIf
  Next
  PrintN(#CRLF$ + "The number of primes up to " + Str(lim) + " is: "+Str(m))
  Dim pcount(20)
  lim10=lim/10
  For i=0 To 9
    If i=0
      pcount(i)=pcount(i)-2
    EndIf
    For j=i*lim10 To i*lim10+lim10-1
      If Nums(j) = #False
        pcount(i)=pcount(i)+1
      EndIf
    Next
    PrintN("Num Primes from "+Str(i*lim10)+" To " +Str(i*lim10+lim10-1)+" is "+Str(pcount(i)))
  Next
  Print(#CRLF$ + "Press ENTER to repeat"): Input()
 Goto again
EndIf

Post by chikega »

This topic somewhat reminds me of the language drag racing on Dave's Garage YouTube channel. :D
Gary E Chike DMD MS
'Experience is what you get when you don't get what you want'

Post by jack »

I am not convinced that the benchmark by Dave is foolproof, because I have found that gcc is very good at removing dead code. For example, it will completely remove code in a loop if its calculations are not used somewhere else, preferably in a printout to the screen.
This simple speed test should be foolproof as long as you output some of the result to the screen.

Post by OgreVorbis »

chikega wrote: Tue Dec 07, 2021 6:38 am This topic somewhat reminds me of the language drag racing on Dave's Garage YouTube channel. :D
Ah, yes, I watched it. It's what got me into this idea. I decided not to use his code though because I didn't want to take so much time to implement. I also knew I would be re-writing it many times in different languages as time goes on. Genius factor is not what matters here, it's how similar you can make the code in each language for a fair comparison.

As a side note, I tested Python today (with the better revised code). It's atrocious. Tested three times, and got an average time of 57 seconds :lol:
RAM peaked at 1.4GB

Post by OgreVorbis »

Updated results:
(new ones are FreeBASIC, PowerBASIC, HolyC, Python, and PyPy (JIT-ed python))

Code: Select all

100,000,000 Primes:
===================

* All compilers are 64-bit if not mentioned and all on same machine
(compiler - seconds - RAM - exe size)

GCC 8.1 - 1.19 - 99MB - 17KB
FreeBASIC (x64) - 1.19 - 99MB - 73KB
Pelles C - 1.19 - 98MB - 50KB
PowerBASIC (x86) - 1.20 - 98MB - 16KB
C# (x64 (.NET 4.0)) - 1.27 - 101MB - 12KB
FreeBASIC (x86) - 1.5 - 98MB - 66KB
C# (x86 (.NET 4.0)) - 1.57 - 100MB - 16KB
PB 5.73 (x64) - 1.59 - 98MB - 15KB
HolyC in VirtualBox - 1.88 - ? - ?
LuaJIT - 4 - 1.05GB
PyPy 3.8 - 7.5 - 1.57GB
Lua - 17 - 2.1GB
Python 3.8 - 57 - 787MB (peaks at 1.4GB)
Most of the code snips are on my pastebin: https://pastebin.com/u/OgreVorbis
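
To read the table as slowdown factors rather than raw seconds, each time can be divided by the fastest one. Here is a short Python sketch using the seconds from the table. The function name is made up, and the numbers are simply copied, not re-measured.

```python
# Seconds copied from the results table for the 100,000,000 limit.
TIMINGS = {
    "GCC 8.1": 1.19,
    "FreeBASIC (x64)": 1.19,
    "Pelles C": 1.19,
    "PowerBASIC (x86)": 1.20,
    "C# (x64 (.NET 4.0))": 1.27,
    "FreeBASIC (x86)": 1.5,
    "C# (x86 (.NET 4.0))": 1.57,
    "PB 5.73 (x64)": 1.59,
    "HolyC in VirtualBox": 1.88,
    "LuaJIT": 4,
    "PyPy 3.8": 7.5,
    "Lua": 17,
    "Python 3.8": 57,
}


def relative_to_fastest(timings):
    """Map each entry to its time divided by the smallest time in the table."""
    fastest = min(timings.values())
    return {name: seconds / fastest for name, seconds in timings.items()}


if __name__ == "__main__":
    factors = relative_to_fastest(TIMINGS)
    for name, factor in sorted(factors.items(), key=lambda item: item[1]):
        print(f"{name:22s} {factor:6.2f}x")
```

On these figures PB 5.73 comes out at about 1.34 times the GCC time, and Python 3.8 at about 48 times.
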

Post by blueb »

OgreVorbis... I find this all very interesting.

Results from my desktop computer...
--------------------------------------
Enter limit for this search: 100000000
It took 0.578 seconds to complete.
Press ENTER to list results. . .
===========================
Using assembly version...
--------------------------------------
Enter limit for this search: 100000000
It took 0.401 seconds to complete.
Press ENTER to list results. . .
--------------------------------------

As you can see from my profile, I have a faster machine. So hardware is important.

Back in the day, I can remember using 300 baud modems to communicate from home to my business. (300 baud transferred about 30 ASCII characters per second over the telephone line)
Today my broadband connection just reported... 41 Mbps (1 Mbps is one million bits per second)
So I can transfer info at: 4,100,000 characters per second! Wow!
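
Here is the arithmetic behind those two figures, as a small Python sketch. It assumes 10 bits on the wire per character (8 data bits plus a start and a stop bit) and one bit per baud, which is what makes 300 baud come out at about 30 characters per second. The function name is made up for illustration.

```python
def chars_per_second(bits_per_second, bits_per_char=10):
    """Characters per second, assuming bits_per_char bits are sent per character."""
    return bits_per_second / bits_per_char


if __name__ == "__main__":
    modem = chars_per_second(300)
    broadband = chars_per_second(41_000_000)
    print(f"300 baud modem: {modem:,.0f} chars/s")
    print(f"41 Mbps link:   {broadband:,.0f} chars/s")
    print(f"Speed-up:       about {broadband / modem:,.0f}x")
```
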

So as hardware prices fall we all benefit. Just saying. :)


Getting back to languages...

I believe that choosing a language that is easy to use outweighs speed, as long as the result is 'fast enough' to do the job.

My take on this... if you select a programming language that requires less staff, you will save more by lowering your payroll costs than by trying to squeeze out every bit of performance by changing languages.

PureBasic is:
- less verbose
- source is easy to read and understand (now and for future changes)
- fast to prototype
- easy to optimize
- most likely... fast enough


Developer salaries are going UP... while hardware costs are going DOWN.

So it makes sense to stick with a programming language that suits the developer, and accomplishes the task required (speed included).
- It was too lonely at the top.

System : PB 6.10 Beta 9 (x64) and Win Pro 11 (x64)
Hardware: AMD Ryzen 9 5900X w/64 gigs Ram, AMD RX 6950 XT Graphics w/16gigs Mem

Post by User_Russian »

OgreVorbis wrote: Mon Dec 20, 2021 8:26 pm PB 5.73 (x64) - 1.59 - 98MB - 15KB
We need a result for PB 6.00 with the C backend and "optimize generated code" enabled.

Post by skywalk »

Would that not be close to the GCC 8.1?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum