Which one is faster

Re: Which one is faster

Post by pf shadoko »

Ah! OK.

Some modifications (again) so that the calculations are actually done with the C backend + optimization
(x3 for me!)

Code: Select all

DisableDebugger

Procedure Limit1(x.l)
  If x>255
    x=255
  ElseIf x<0
    x=0
  EndIf
  ProcedureReturn x
EndProcedure


Procedure Limit2(x.l)
  If x>255
    ProcedureReturn 255
  ElseIf x<0
    ProcedureReturn 0
  Else
    ProcedureReturn x
  EndIf
EndProcedure


Procedure Limit3(x.l)
  If x>255
    ProcedureReturn 255
  ElseIf x<0
    ProcedureReturn 0
  EndIf
  ProcedureReturn x
EndProcedure


Procedure Limit4(x.l)
  ; No bits set outside the low 8 bits means x is already within 0..255
  If x & (~$FF) = 0
    ProcedureReturn x
  ElseIf x < 0
    ProcedureReturn 0
  Else
    ProcedureReturn 255
  EndIf
EndProcedure

Procedure Limit5(x.l)
  If x < 0
    ProcedureReturn 0
  ElseIf x > 255
    ProcedureReturn 255
  EndIf
  ProcedureReturn x
EndProcedure

#num = 11
#count = 10000000

; Time each variant over the same range of inputs
StartTime = ElapsedMilliseconds()
For i = 0 To #count
  s + Limit1(i)
Next
xx = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count
  s + Limit2(i)
Next
yy = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count
  s + Limit3(i)
Next
zz = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count
  s + Limit4(i)
Next
aa = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count
  s + Limit5(i)
Next
bb = ElapsedMilliseconds() - StartTime

EnableDebugger
Debug "sum: " + s
Debug xx ; Limit1
Debug yy ; Limit2
Debug zz ; Limit3
Debug aa ; Limit4
Debug bb ; Limit5

Post by mk-soft »

Maybe a Limit by reference (ByRef):

Code: Select all

Procedure Limit(*Value.integer, Min, Max)
  If *Value\i < Min
    *Value\i = Min
  ElseIf *Value\i > Max
    *Value\i = Max
  EndIf
EndProcedure

a = 10
b = 120

Limit(@a, 50, 100)
Debug a
Limit(@b, 50, 100)
Debug b

Post by wilbert »

pf shadoko wrote: Tue Dec 05, 2023 12:25 pm
"some modifications (again) so that the calculations are done with the backendc + optimization
(X3 for me !)"

If you take a look at the generated asm code, you will see that this still isn't enough. :wink:
The C compiler is so good at optimizing that:

1. The procedure isn't called the way the ASM backend would call it; its code is inlined, like a macro.
2. The compiler is smart enough to see that you only pass non-negative values, so it drops the check for < 0.
3. The compiler is smart enough to see that variable s doesn't need to be stored inside the loop.

So nothing is called inside the loop and nothing is stored to memory inside the loop.
What it actually does is first load the current value of s into a CPU register and then loop #count times.
On each iteration, if the value is above 255 it uses 255, otherwise it uses the original value.
It adds that to the CPU register and continues the loop.
Once the loop has completed, it writes the value in the register to the memory address of variable s.
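
To make the description above concrete, this is roughly what the optimized loop amounts to if it were written by hand in PureBasic. It is only a sketch of the behaviour described, not actual compiler output; the variable acc stands in for the CPU register and is a made-up name.

```purebasic
; Hand-written equivalent of the optimized loop (illustrative sketch only).
; 'acc' plays the role of the CPU register; the name is an assumption.
#count = 10000000

Define s.q, acc.q, i.q

acc = s                  ; load s once, before the loop
For i = 0 To #count
  If i > 255             ; no "< 0" check: i is never negative here
    acc + 255
  Else
    acc + i
  EndIf
Next
s = acc                  ; single store after the loop

Debug s
```

Because no procedure is called at all in this form, the timings measured this way say very little about the relative cost of the different Limit procedures.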

Post by SMaag »

I guess Wilbert's ASM macro will be the fastest version possible.
But it has one problem: it only works because x is used both as the macro parameter and as the variable in the code.
If you try to pass 'y', it won't work.

If you really need the speed because of millions of calls in a loop, you have to eliminate the procedure.
The procedure call overhead is around half of the total time for such short code. Using macros or direct coding is much faster!

Using a macro for the Limit function has a further advantage: it doesn't care about the variable type.
On my system there isn't a significant difference between the two procedure versions!

Code: Select all

 Macro mac_IsInRange(value, min, max)
    Bool(value >= min And value <= max)
 EndMacro 
 
 Macro mac_Limit(Value, min, max)
   If Value > max 
     Value = max 
   ElseIf Value < min
     Value = min
   EndIf
 EndMacro 
 
 Procedure Limit(Value, min, max)
   If Value > max 
     Value = max 
   ElseIf Value < min
     Value = min
   EndIf
   ProcedureReturn Value
 EndProcedure
 
 Procedure Limit_(Value, min, max)
   If Value > max 
     ProcedureReturn max
   EndIf
   
   If Value < min
     ProcedureReturn min
   EndIf
   
   ProcedureReturn Value
 EndProcedure

Debug mac_IsInRange(10, 20,30)
Debug mac_IsInRange(10, 5, 20)

val = 10

mac_Limit(val,20,30)
Debug val

#Loops = 100000000

t = ElapsedMilliseconds()
For I = 1 To #Loops
  val = I
  val = Limit(val, 0, 255)  
Next

t = ElapsedMilliseconds() -t 

MessageRequester("Time", Str(t) + "ms")

t = ElapsedMilliseconds()
For I = 1 To #Loops
  val = I
  mac_Limit(val, 0, 255)  
Next

t = ElapsedMilliseconds() -t 
MessageRequester("Time", Str(t) + "ms")
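
As a small illustration of the point about variable types: because the macro is expanded in place, the same mac_Limit can be applied to a float, an unsigned byte, or a quad. This is only a sketch; the variable names f, b, and q are made up for the example, and the macro is repeated so the snippet stands alone.

```purebasic
; Same macro as above, repeated so this snippet is self-contained
Macro mac_Limit(Value, min, max)
  If Value > max
    Value = max
  ElseIf Value < min
    Value = min
  EndIf
EndMacro

Define f.f = 300.5   ; float
Define b.a = 200     ; unsigned byte
Define q.q = -5      ; quad

mac_Limit(f, 0, 255)    ; f is clamped to the upper bound
mac_Limit(b, 10, 100)   ; b becomes 100
mac_Limit(q, 0, 255)    ; q becomes 0

Debug f
Debug b
Debug q
```

A procedure would need a fixed parameter type (or one version per type) to cover the same three cases.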



Post by jacdelad »

Great discussion...but we had this before (and it was me who asked).

Also, the question really was just whether it is faster to use else or put the ProcedureReturn outside the if block.

But please go on... :mrgreen:

Post by Olli »

jacdelad wrote: "But please go on..."
Easy to say...

Code: Select all

Procedure sat(value.i, min.i, max.i)
 If value > min
  If value >= max
   ProcedureReturn max
  Else
   ProcedureReturn value
  EndIf
 EndIf
 ProcedureReturn min
EndProcedure
But why are you asking specifically about the ASM backend, with native functions? :mrgreen:

Post by jacdelad »

Because I don't know how the C compiler would optimize the code. I'd expect the ASM compiler to be more "straightforward".

Post by Olli »

In my humble opinion, this depends on the hardware.

-> On x86/x64, the C backend will produce x86/x64 assembly like the one Wilbert presented, if the hardware provides the right AVX instructions.

If STARGÅTE's code is quicker (execution speed depends on how many similar executions there are), and if the hardware provides the CMOVcc (conditional move) instructions, then you will find CMOVcc instructions in the final code instead of AVX instructions.

-> On ARM, if similar instructions exist (I don't actually know), the C compiler will use them.

As for the ASM backend, it is straightforward. I ran some tests 14 years ago on If statements (only If, not ElseIf) and I also saw a strange conditional loop. But it seems this strange spaghetti did not affect performance.

We are sometimes confused because we expect the ASM output to be as simple as this:

Code: Select all

! CMP [v_tartempion], 5; If tartempion = 5
! JNZ endifx ; Then
! CALL l_yvonne ; Goto yvonne
! endifx: ; EndIf
But it isn't, for lots of reasons.
PureBasic's If statement accepts all sorts of boolean expressions, of anything and everything. So there is always some sauce left stuck to the bottom of the pan, i.e. some extra generated code.
Example of such a boolean expression:

Code: Select all

If A.i(5, 4, N.d) + J.d(3, 2) > 4
This expression is allowed (without the explicit types, I agree).

Result: the C backend's optimizer does a very big job, designed by passionate people. If you know you will call a specific math function repeatedly, try the C optimizer first and disassemble the result. In the rare case where it does not produce the best algorithm, going back to the ASM backend and optimizing by hand will do the trick. But that is becoming rarer and rarer... (and the C compiler can be updated)

Post by benubi »

Code: Select all

 Procedure.a Limit1(x.w)
   If x<0
     ProcedureReturn 0
   ElseIf x>255
     ProcedureReturn 255
   Else 
     ProcedureReturn x
   EndIf 
EndProcedure


Debug Limit1(-1)

I also suggest testing with a shorter return type (.a), although it may turn out to be a handicap. And if x stays within the .w (word) range, I would try .w as the parameter type, because AFAIK it should only push 2 bytes (.w) on the stack instead of 4 (.l), and this may give a minimal to significant boost.

I also suspect the C backend converts short procedures like this one into inline code (with the optimizer on) inside loops, or otherwise shortens the calling path, but this may also be an effect of newer, optimized CPU/OS features (compared to my older XP system). So using .a as the return type and .w for the parameter may be much faster in one backend but slower in the other.

There is also the problem that different CPU architectures have different strengths and weaknesses, and these can matter even within the same family; optimizing for x86 and for ARM will probably result in quite different code. Some optimizations that work well on older CPUs may act as brakes on newer ones.
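
A minimal sketch of how this suggestion could be measured, in the same style as the benchmarks earlier in the thread. The procedure names LimitWord and LimitLong, the loop count, and the input formula are placeholders chosen for the example. With the C backend and the optimizer on, the calls may be inlined, so the numbers may say little about parameter passing.

```purebasic
DisableDebugger

; Suggested variant: word parameter, unsigned byte result
Procedure.a LimitWord(x.w)
  If x < 0
    ProcedureReturn 0
  ElseIf x > 255
    ProcedureReturn 255
  EndIf
  ProcedureReturn x
EndProcedure

; Reference variant: long parameter, long result
Procedure.l LimitLong(x.l)
  If x < 0
    ProcedureReturn 0
  ElseIf x > 255
    ProcedureReturn 255
  EndIf
  ProcedureReturn x
EndProcedure

#Count = 10000000

Define i.l, v.l, s.q, t.q, tWord.q, tLong.q

t = ElapsedMilliseconds()
For i = 0 To #Count
  v = i % 1000 - 300      ; stays within the .w range (-300 .. 699)
  s + LimitWord(v)
Next
tWord = ElapsedMilliseconds() - t

t = ElapsedMilliseconds()
For i = 0 To #Count
  v = i % 1000 - 300      ; same inputs for a fair comparison
  s + LimitLong(v)
Next
tLong = ElapsedMilliseconds() - t

EnableDebugger
Debug "sum: " + Str(s)
Debug "word/byte variant: " + Str(tWord) + " ms"
Debug "long variant: " + Str(tLong) + " ms"
```

Running it with both backends, and with the optimizer on and off, would show whether the narrower types make any measurable difference on a given machine.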

Post by STARGÅTE »

benubi wrote: Fri Dec 08, 2023 4:42 pm
"And if the limit of x is in .w word range I would try to use .w for parameter type because AFAIK it should only push 2 bytes (.w) on the stack instead of 4 (.l) and this may give a minimal to significant boost."

No, that is not how a CPU works. A CPU has a word size in which all operations are carried out. Nowadays we have 64-bit CPUs; before that we had 32-bit ones, and smaller chips and older PCs had 16 bits.
Working with this word size is the fastest way a CPU can work. It does not push byte-by-byte or bit-by-bit onto a stack or into a register. The operation "MOV rax, rdx" or "PUSH rax" takes one single clock cycle, one moment in which all the transistors switch and change the state of the register, the stack and the CPU.
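
For reference, PureBasic's native integer type (.i) follows this word size, which can be checked with SizeOf on the built-in structures. A small sketch; the value printed for Integer depends on whether the program is compiled for x86 or x64:

```purebasic
Debug SizeOf(Word)     ; 2 bytes (.w)
Debug SizeOf(Long)     ; 4 bytes (.l)
Debug SizeOf(Integer)  ; 4 bytes on x86, 8 bytes on x64 (.i)
```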