Which one is faster

Post by **pf shadoko** » Tue Dec 05, 2023 12:25 pm

Ah! OK.

Some modifications (again) so that the calculations are done with the C backend + optimization
(x3 for me!)

DisableDebugger ; time the loops without debugger overhead

Procedure Limit1(x.l)
  If x>255
    x=255
  ElseIf x<0
    x=0
  EndIf
  ProcedureReturn x
EndProcedure


Procedure Limit2(x.l)
  If x>255
    ProcedureReturn 255
  ElseIf x<0
    ProcedureReturn 0
  Else
    ProcedureReturn x
  EndIf
EndProcedure


Procedure Limit3(x.l)
  If x>255
    ProcedureReturn 255
  ElseIf x<0
    ProcedureReturn 0
  EndIf
  ProcedureReturn x
EndProcedure


Procedure Limit4(x.l)
	If (x & ~$FF) = 0 ; true only when 0 <= x <= 255
		ProcedureReturn x
	ElseIf x < 0
		ProcedureReturn 0
	Else
		ProcedureReturn 255
	EndIf
EndProcedure

Procedure Limit5(x.l)
	If x < 0
		ProcedureReturn 0
	ElseIf x > 255
		ProcedureReturn 255
	EndIf
	ProcedureReturn x
EndProcedure

#num = 11
#count = 10000000

StartTime = ElapsedMilliseconds()
For i = 0 To #count 
	s+Limit1(i)
Next
xx = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count 
	s+Limit2(i)
Next
yy = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count 
	s+Limit3(i)
Next
zz = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count 
	s+Limit4(i)
Next
aa = ElapsedMilliseconds() - StartTime

StartTime = ElapsedMilliseconds()
For i = 0 To #count 
	s+Limit5(i)
Next
bb = ElapsedMilliseconds() - StartTime

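EnableDebugger
Debug "sum: " + Str(s)
Debug "Limit1: " + Str(xx) + " ms"
Debug "Limit2: " + Str(yy) + " ms"
Debug "Limit3: " + Str(zz) + " ms"
Debug "Limit4: " + Str(aa) + " ms"
Debug "Limit5: " + Str(bb) + " ms"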
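Limit4 relies on a bit test. `x & ~$FF` is zero exactly when 0 <= x <= 255: a value above 255 has a bit above bit 7 set, and a negative value has its sign bit set (two's complement). Below is a C sketch of that test. It assumes 32-bit signed integers, like the `.l` parameter. The names `in_byte_range` and `limit4` are only for illustration.

```c
#include <stdint.h>

/* Nonzero exactly when 0 <= x <= 255: any bit above bit 7 is set
   for x > 255, and the sign bit is set for negative x. */
int in_byte_range(int32_t x)
{
    return (x & ~0xFF) == 0;
}

/* Limit4 in C: the in-range case costs one AND and one test. */
int32_t limit4(int32_t x)
{
    if (in_byte_range(x))
        return x;
    else if (x < 0)
        return 0;
    else
        return 255;
}
```

In the benchmark above, i runs from 0 to #count. So all but 256 of the roughly ten million calls take the x > 255 path, which may affect the ranking more than the If/Else layout does.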

Post by **mk-soft** » Tue Dec 05, 2023 12:51 pm

Maybe a Limit procedure that works ByRef:

Code:

Procedure Limit(*Value.integer, Min, Max)
  If *Value\i < Min
    *Value\i = Min
  ElseIf *Value\i > Max
    *Value\i = Max
  EndIf
EndProcedure

a = 10
b = 120

Limit(@a, 50, 100)
Debug a
Limit(@b, 50, 100)
Debug b
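In C, the same ByRef idea is a pointer parameter. This sketch uses `intptr_t` as a stand-in for PureBasic's pointer-sized `.i`, and the name `limit_ref` is hypothetical:

```c
#include <stdint.h>

/* Clamp *value into [min, max] in place, like the ByRef Limit().
   Assumes min <= max. */
void limit_ref(intptr_t *value, intptr_t min, intptr_t max)
{
    if (*value < min)
        *value = min;
    else if (*value > max)
        *value = max;
}
```

Calling it with &a and &b as in the PureBasic example leaves 50 and 100. Taking the address of a variable generally forces it into memory at the call site unless the compiler inlines the call. So ByRef is not automatically faster than returning the value.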

Post by **wilbert** » Tue Dec 05, 2023 1:27 pm

pf shadoko wrote: Tue Dec 05, 2023 12:25 pm Some modifications (again) so that the calculations are done with the C backend + optimization
(x3 for me!)

If you look at the generated asm code, you will see that even this still isn't enough to get a meaningful benchmark.

The C compiler is so good at optimizing that:

1. The procedure isn't called the way the asm backend calls it; instead its code is inlined, like a macro.
2. The compiler is smart enough to see that you only use non-negative values, so there is no check for < 0.
3. The compiler is smart enough to see that storing variable s inside the loop is not needed.

So nothing is called inside the loop and nothing is stored in memory inside the loop.
What it actually does is first load the current value of s into a CPU register and then loop #count times.
If the value is above 255 it uses 255, otherwise the original value.
It adds that to the register and continues the loop.
Once the loop has completed, it writes the value in the register to the memory address of variable s.
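The transformation wilbert describes can be written out by hand in C. This only illustrates the idea. It is not the code PureBasic's C backend or the C compiler actually emits. `limit`, `sum_naive` and `sum_as_optimized` are hypothetical names, and s is passed in and returned instead of being a global:

```c
#include <stdint.h>

static int32_t limit(int32_t x)
{
    if (x > 255) return 255;
    if (x < 0) return 0;
    return x;
}

/* What the source asks for: call limit() and update s every iteration. */
int64_t sum_naive(int64_t s, int32_t count)
{
    for (int32_t i = 0; i <= count; i++)
        s += limit(i);
    return s;
}

/* Hand-written version of the optimized loop: limit() inlined, the
   x < 0 test dropped (i is never negative), and the running sum kept
   in a local (a register) until the loop ends. */
int64_t sum_as_optimized(int64_t s, int32_t count)
{
    int64_t acc = s;                 /* load s once */
    for (int32_t i = 0; i <= count; i++)
        acc += (i > 255) ? 255 : i;  /* only the upper bound is checked */
    return acc;                      /* store back once */
}
```

Both functions return the same sum. The second one just makes explicit what the optimizer is allowed to do with the first.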

Post by **SMaag** » Tue Dec 05, 2023 4:01 pm

I guess Wilbert's ASM macro will be the fastest version possible.
But it has one problem: it only works because x is used both as the macro parameter and as the variable in the code.
If you try to pass 'y' it won't work.

If you really need high speed because of millions of calls in a loop, you have to eliminate the procedure call.
The call overhead is around half of the time for such short code. Using macros or writing the code inline is much faster!

Using a macro for the Limit function has a further advantage: it doesn't care about the variable type.
On my system there isn't a significant difference between the two procedure versions!

Code:

 Macro mac_IsInRange(value, min, max)
    Bool((value) >= (min) And (value) <= (max))
 EndMacro 
 
 Macro mac_Limit(Value, min, max)
   If Value > max 
     Value = max 
   ElseIf Value < min
     Value = min
   EndIf
 EndMacro 
 
 Procedure Limit(Value, min, max)
   If Value > max 
     Value = max 
   ElseIf Value < min
     Value = min
   EndIf
   ProcedureReturn Value
 EndProcedure
 
 Procedure Limit_(Value, min, max)
   If Value > max 
     ProcedureReturn max
   EndIf
   
   If Value < min
     ProcedureReturn min
   EndIf
   
   ProcedureReturn Value
 EndProcedure

Debug mac_IsInRange(10, 20,30)
Debug mac_IsInRange(10, 5, 20)

val = 10

mac_Limit(val,20,30)
Debug val

#Loops = 100000000

t = ElapsedMilliseconds()
For I = 1 To #Loops
  val = I
  val = Limit(val, 0, 255)  
Next

t = ElapsedMilliseconds() -t 

MessageRequester("Time", Str(t) + "ms")

t = ElapsedMilliseconds()
For I = 1 To #Loops
  val = I
  mac_Limit(val, 0, 255)  
Next

t = ElapsedMilliseconds() -t 
MessageRequester("Time", Str(t) + "ms")
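C has the same trade-off. A function-like macro works with any type. A small `static inline` function has a fixed type, but an optimizing compiler usually inlines it anyway. This sketch uses the hypothetical names `MAC_LIMIT` and `limit_fn`. Like any textual macro (PureBasic's included), the macro evaluates its arguments more than once, so pass plain variables, not expressions with side effects:

```c
#include <stdint.h>

/* Type-agnostic clamp, like mac_Limit. do { } while (0) makes the
   macro behave as a single statement after if/else. */
#define MAC_LIMIT(value, min, max) \
    do {                           \
        if ((value) > (max))       \
            (value) = (max);       \
        else if ((value) < (min))  \
            (value) = (min);       \
    } while (0)

/* Fixed-type function version, like Limit_(). */
static inline int64_t limit_fn(int64_t value, int64_t min, int64_t max)
{
    if (value > max)
        return max;
    if (value < min)
        return min;
    return value;
}
```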

Post by **jacdelad** » Tue Dec 05, 2023 7:31 pm

Great discussion...but we had this before (and it was me who asked).

Also, the question really was just whether it is faster to use Else or to put the ProcedureReturn outside the If block.

But please go on...
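On the original question: an Else after a branch that already returns is redundant, so both layouts describe the same control flow. Below is a C sketch of the two styles, using hypothetical names. With optimization enabled, a C compiler will typically emit the same instructions for both. That is an expectation to confirm in the generated asm, and the ASM backend may lay out its jumps differently.

```c
#include <stdint.h>

/* Style 1: every path returns inside If/ElseIf/Else. */
int32_t limit_with_else(int32_t x)
{
    if (x > 255)
        return 255;
    else if (x < 0)
        return 0;
    else
        return x;
}

/* Style 2: the last return sits after the if block. */
int32_t limit_fallthrough(int32_t x)
{
    if (x > 255)
        return 255;
    else if (x < 0)
        return 0;
    return x;
}
```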

Post by **Olli** » Tue Dec 05, 2023 7:42 pm

jacdelad wrote: But please go on...

Easy to say...

Code:

Procedure sat(value.i, min.i, max.i)
 If value > min
  If value >= max
   ProcedureReturn max
  Else
   ProcedureReturn value
  EndIf
 EndIf
 ProcedureReturn min
EndProcedure
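Olli's version tests the lower bound first and uses min as the fall-through result. Here it is translated to C, assuming min <= max and using `intptr_t` as a stand-in for `.i`. The boundaries still come out right: value == min returns min and value == max returns max, and in both cases that equals value.

```c
#include <stdint.h>

/* C translation of sat(): nested tests, min as the fall-through. */
intptr_t sat(intptr_t value, intptr_t min, intptr_t max)
{
    if (value > min) {
        if (value >= max)
            return max;
        else
            return value;
    }
    return min; /* value <= min */
}
```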

But why do you ask this question specifically about the ASM backend, with native functions?

Post by **jacdelad** » Tue Dec 05, 2023 8:07 pm

Because I don't know how the C compiler would optimize the code. I'd expect the ASM compiler to be more "straightforward".

Post by **Olli** » Tue Dec 05, 2023 9:10 pm

My humble opinion: this depends on the hardware.

-> On x86/x64, the C backend compiler will produce x86/x64 asm like the code Wilbert presented, if the hardware provides the right AVX instructions.

If STARGÅTE's code is quicker (execution speed depends on how many similar operations are executed), and if the hardware provides the CMOVcc (conditional move) instructions, you will find CMOVcc instructions in the final code instead of AVX instructions.

-> On ARM, if similar instructions exist (which I actually don't know), the C compiler will use them.
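For reference, compilers commonly lower a clamp written without early returns, as min(max(x, lo), hi), to conditional moves. On x86 that is CMOVcc; AArch64 has a comparable conditional select, CSEL. In a loop over an array, they may instead use packed min/max vector instructions. Whether that happens depends on the compiler, its flags and the target, so check the disassembly of this C sketch. The names `clamp_select` and `clamp_array` are hypothetical.

```c
#include <stdint.h>

/* Clamp without early returns: two selects, no jumps required. */
static inline int32_t clamp_select(int32_t x, int32_t lo, int32_t hi)
{
    int32_t t = x < lo ? lo : x;  /* max(x, lo) */
    return t > hi ? hi : t;       /* min(t, hi) */
}

/* Independent elements, no calls, no early exits: a loop shape
   that auto-vectorizers handle well. */
void clamp_array(int32_t *dst, const int32_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clamp_select(src[i], 0, 255);
}
```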

As for the ASM backend, it is straightforward. I did some tests 14 years ago on If statements (only If, not ElseIf) and also saw a strange conditional loop. But this strange spaghetti did not seem to change the performance.

We are sometimes confused because we expect the ASM output to be as basic as this:

Code:

! CMP [v_tartempion], 5; If tartempion = 5
! JNZ endifx ; Then
CALL l_yvonne ; Goto yvonne
! endifx: ; EndIf

But it isn't, for many reasons.
The PureBasic If statement accepts all kinds of boolean expressions combining anything and everything, so some extra code always stays behind, like sauce stuck to the bottom of the pan.
Example of such a boolean expression:

Code:

If A.i(5, 4, N.d) + J.d(3, 2) > 4

This expression is allowed (without the explicit types, I agree).

Conclusion: the optimizer of the C backend is the result of a huge amount of work by very passionate people. If you know you will repeat a specific math function many times, try the C optimizer and disassemble the result. In the rare case that it doesn't find the best algorithm, going back to the ASM backend and optimizing by hand will do the job. But that is rare, and getting rarer (the C compiler can be updated).

Post by **benubi** » Fri Dec 08, 2023 4:42 pm

Code:

 Procedure.a Limit1(x.w)
   If x<0
     ProcedureReturn 0
   ElseIf x>255
     ProcedureReturn 255
   Else 
     ProcedureReturn x
   EndIf 
EndProcedure


Debug Limit1(-1)

I suggest also testing with a shorter return type (.a), but it may turn out to be a handicap. And if x stays within the .w word range, I would try .w as the parameter type, because AFAIK it should only push 2 bytes (.w) on the stack instead of 4 (.l), and this may give a minimal to significant boost. I also suspect that the C backend turns short procedures like this one into inline code in loops (with the optimizer on), or otherwise shortens the call path. But this may also be an effect of newer, optimized CPU/OS features (compared to my older XP system). So using .a as the return type and .w for the parameter may be very fast in one backend but slower in the other.

There is also the problem that different CPU architectures have different strengths and weaknesses, and these can matter even within the same family; optimizing for x86 and for ARM will probably lead to quite different code. Some optimizations that work well on older CPUs may slow down newer ones.
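A C counterpart of benubi's `.a` result / `.w` parameter idea, placed next to a version using native int sizes, is handy for comparing the generated asm. One relevant detail: on x64, both the Windows and System V calling conventions pass the first few integer arguments in registers whatever their width. In the usual 32-bit x86 C calling conventions, a stack argument occupies at least 4 bytes. So a 16-bit parameter is not expected to save any pushing. The function names are hypothetical.

```c
#include <stdint.h>

/* Like Procedure.a Limit1(x.w): 16-bit signed in, 8-bit unsigned out. */
uint8_t limit_narrow(int16_t x)
{
    if (x < 0)
        return 0;
    else if (x > 255)
        return 255;
    else
        return (uint8_t)x;
}

/* Same logic with native int sizes, for comparing the generated asm. */
int32_t limit_wide(int32_t x)
{
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}
```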

Post by **STARGÅTE** » Fri Dec 08, 2023 10:04 pm

benubi wrote: Fri Dec 08, 2023 4:42 pm And if the limit of x is in .w word range I would try to use .w for parameter type because AFAIK it should only push 2 bytes (.w) on the stack instead of 4 (.l) and this may give a minimal to significant boost.

No, that is not how a CPU works. A CPU has a word size in which all operations are carried out. Nowadays we have 64-bit CPUs; before that we had 32-bit and smaller chips, and older PCs had 16 bits.
Working with this word size is the fastest way a CPU can work. It does not push byte by byte or bit by bit onto a stack or into a register. An operation like "MOV rax, rdx" or "PUSH rax" takes one single clock cycle, one moment in which all the transistors switch and change the state of the register, the stack and the CPU.

PureBasic Forums - English
