IF+IF is quicker than IF+AND or IF+OR

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Code: Select all

@Trond: With your code I get -23% -> -28% .
Strange, I get between 10 and 20%.
User avatar
Rescator
Addict
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Post by Rescator »

@Trond
on x64 i get -11% to -17% with most runs getting -17%
on x86 i get 0% to 8% with most runs getting 0%.

And obviously I selected compile without debugger in the menu all times.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

GCC generates the if+if version even if you write and, by the way.
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

What values do you get with this other algo?

Code: Select all


DisableDebugger



Macro ExecutionIfTrue()
   ; Nothing
EndMacro



Macro Test1()
            If A
               If B
                  ExecutionIfTrue()
               EndIf
            EndIf
EndMacro



Macro Test2()
            If A And B
               ExecutionIfTrue()
            EndIf
EndMacro




Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\
            !Lea esi,[v_A]
            !Lodsd
            !And eax,[v_B]
            !JNE l_istrue  
            ; Code execution if false          
            !JMP l_wasfalse
IsTrue:
            ; Code execution if true
            ExecutionIfTrue()
WasFalse:
EndMacro ;                        \!/ \!/ \!/ \!/ \!/




Macro Counter(InternalMacro)
      Cumul = 0
      For A = 0 To 1
         For B = 0 To 1
            !Rdtsc
            !Mov [v_ValeurIni], EAX
            InternalMacro
            !Rdtsc
            !Mov [v_ValeurFin], EAX
            Cumul + (ValeurFin - ValeurIni)
         Next
      Next
EndMacro




   Define Test1.I
   Define Test2.I
   Define Test3.I
   Define ValeurIni.I
   Define ValeurFin.I
   Define Cumul.I
   Define A.I = 0
   Define B.I = 0
   Define TotalTest.I = 1

   For I = 1 To TotalTest
      Counter(Test1() ): Test1 + (Cumul / (1 << 2))
      Counter(Test2() ): Test2 + (Cumul / (1 << 2))
      Counter(Test3() ): Test3 + (Cumul / (1 << 2))
   Next

   MessageRequester("Tests without debugger", "Test #1 (If a If b) = " + Str(Test1) + " cycles" + Chr(10) + "Test #2 (If a And b) = " + Str(Test2) + " cycles" + Chr(10) + "Test #3 (Asm /!\ with ESI reg) = " + Str(Test3) + " cycles")
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Rdtsc is not reliable on modern processors, and also DisableDebugger is not enough, you must disable the debugger from the menu to turn on the assembly optimizer.
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

@Trond... Mh... :D
Trond wrote:The test is flawed. The conditions are true once and false 63999999 times, which is not very balanced.
Are you sure?

Code: Select all

For A = 0 To 8000
      For B = 0 To 8000
         If A
            If B
           
            EndIf
         EndIf
      Next B
   Next A 
Really sure?
Trond wrote:Rdtsc is not reliable on modern processors
Are you again sure? I want to believe you. So, could you give me the links which tell me which processor family is excluded?
Trond wrote:DisableDebugger is not enough
Are you again sure?Which code example could you give me to prove it?

In the next code, I wrote it:

Code: Select all

Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\
            !Lea esi,[v_A]
            !Lodsd
            !And eax,[v_B]
            !JNE l_istrue 
            ; Code execution if false         
            !JMP l_wasfalse
IsTrue:
            ; Code execution if true
WasFalse:
EndMacro ;                        \!/ \!/ \!/ \!/ \!/ 
Have you execution duration values comparing the classical expression:

Code: Select all

If A And B
EndIf
Thank you

Ollivier
User avatar
Rescator
Addict
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Post by Rescator »

Trond is right!

RDTSC is not reliable for timing, It may suddenly jump ahead sometimes, speestep/auto over or auto underclocking etc.

If the CPU was to speed up or down between or during then the timing info is useless. Even the QueryPerformanceCounter has a few of this issues (as on some systems it also uses RDTSC (mostly older systems) instead of a high res clock crystal)

As to DisableDebugger, I've noticed speed differences between DisableDebugger and hoosing compile without debugger in the menu.
When you use DisableDebugger the debugger is still there, maybe Freak could tell you the difference betwenn the DisableDebugger and compiling without the debugger.

I re-did your test to use timeGetTime set to 1ms timing.
It's the easiest way to get accurate timing on Windows 2000 and later (on Linux and Mac and Win9x you could simply use ElapsedMilliseconds)
Donwside is you gotta do a lot of loops to see the difference.

If the difference is too small (you got a very fast CPU) then add a 0 to the test loop, the test took 6+ seconds here on my system then instead of less than a second as in the number below.


Code: Select all

DisableDebugger
timeBeginPeriod_(1)


Macro ExecutionIfTrue()
   ; Nothing
EndMacro



Macro Test1()
            If A
               If B
                  ExecutionIfTrue()
               EndIf
            EndIf
EndMacro



Macro Test2()
            If A And B
               ExecutionIfTrue()
            EndIf
EndMacro




Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\
            !Lea esi,[v_A]
            !Lodsd
            !And eax,[v_B]
            !JNE l_istrue 
            ; Code execution if false         
            !JMP l_wasfalse
IsTrue:
            ; Code execution if true
            ExecutionIfTrue()
WasFalse:
EndMacro ;                        \!/ \!/ \!/ \!/ \!/




Macro Counter(InternalMacro)
      Cumul = 0
      For A = 0 To 1
         For B = 0 To 1
            InternalMacro
         Next
      Next
EndMacro




   Define Test1.I
   Define Test2.I
   Define Test3.I
   Define ValeurIni.I
   Define ValeurFin.I
   Define Cumul.I
   Define A.I = 0
   Define B.I = 0
   Define TotalTest.I = 10000000

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test1() )
   Next
   ValeurFin=timeGetTime_()
   Test1=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test2() )
   Next
   ValeurFin=timeGetTime_()
   Test2=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test3() )
   Next
   ValeurFin=timeGetTime_()
   Test3=ValeurFin-ValeurIni

   MessageRequester("Tests without debugger", "Test #1 (If a If b) = " + Str(Test1) + " ms" + Chr(10) + "Test #2 (If a And b) = " + Str(Test2) + " ms" + Chr(10) + "Test #3 (Asm /!\ with ESI reg) = " + Str(Test3) + " ms")

timeEndPeriod_(1)
DisableDebugger
Test #1 (If a If b) = 214 ms
Test #2 (If a And b) = 237 ms
Test #3 (Asm /!\ with ESI reg) = 200 ms
Compile Without Debugger
Test #1 (If a If b) = 213 ms
Test #2 (If a And b) = 263 ms
Test #3 (Asm /!\ with ESI reg) = 204 ms
As you can see there is a difference.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Ollivier wrote:@Trond... Mh... :D
Trond wrote:The test is flawed. The conditions are true once and false 63999999 times, which is not very balanced.
Are you sure?
... code...
Really sure?
No, actually, I was wrong. There were 64016001 iterations. In 64000000 the check was true. In 16001 the check was false. I still claim that this is not balanced, though.

See for yourself:

Code: Select all

For A = 0 To 8000
  For B = 0 To 8000
     If A
        If B
          True + 1
        EndIf
     EndIf
     Iter + 1
  Next
Next
MessageRequester("", "Number of iterations: " + Str(Iter))
MessageRequester("", "Of which were true: " + Str(True))
MessageRequester("", "Of which were false: " + Str(Iter-True))
Trond wrote:Rdtsc is not reliable on modern processors
Are you again sure? I want to believe you. So, could you give me the links which tell me which processor family is excluded?
All processors with multi-core, or systems with multiple processors, and all processors with cpu frequency scaling ("speedstep").
See: http://msdn.microsoft.com/en-us/library ... S.85).aspx
Also, if the processor executes several instructions in parallel, the rdtsc may get a timestamp that is too early, because the last instructions haven't completed yet.
Trond wrote:DisableDebugger is not enough
Are you again sure? Which code example could you give me to prove it?
Yes, I'm sure. Fred should know, he made PB: http://www.purebasic.fr/english/viewtop ... 209#242209
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

By the way, the lea... variant isn't short-circuiting, so it doesn't behave like the And in PB.
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

I can see also a difference between the two configurations : with and without the debugger option.
Trond wrote:It doesn't behave like the And in PB.
You are right.

Now, I can really see we have the same logic rules:
Zero is false
All the others integer values are true
I thank you for the codes and the links.

I think this algo

Code: Select all

! CMP [Var], 0
Etc...
is slower (15% -> 25%) than this algo

Code: Select all

! XOR EAX, EAX
! LEA EDI, [Var]
! SCASD
Etc...
Actually I can't check if this difference is true with x64 (with SCASQ). But, with x86, I can see a difference.

In order to check if the AND gate is exact, I give you this first code.

Code: Select all

For Max = 1 To 3 Step 2

   Result$ = ""

   For A = 0 To Max
      For B = 0 To Max

         ! xor eax, eax
   
         ! lea edi, [v_A]
         ! scasd
         ! je l_isfalse2
   
         ! lea edi, [v_B]
         ! scasd
         ! je l_isfalse2

IsTrue2:

         Result$ + "1"
         ! jmp l_endif2

IsFalse2:

         Result$ + "0"

EndIf2:

      Next
   
      Result$ + "-"

   Next

   MessageRequester("AND gate (" + Str(((Max - 1) / 2) + 1) + " bit(s) per input)", Result$)

Next
And to see the execution duration difference, I take your code (Rescator) and insert the algo above in the macro Test3() to check it:

Code: Select all

DisableDebugger
timeBeginPeriod_(1)


Macro ExecutionIfTrue()
   ; Nothing
EndMacro

Macro ExecutionIfFalse()
   ; Nothing
EndMacro



Macro Test1()
            If A
               If B
                  ExecutionIfTrue()
               Else
                  ExecutionIfFalse()
               EndIf
            Else
               ExecutionIfFalse()
            EndIf
EndMacro



Macro Test2()
            If A And B
               ExecutionIfTrue()
            Else
               ExecutionIfFalse()
            EndIf
EndMacro




Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\

   ! xor eax, eax
   
   ! lea edi, [v_A]
   ! scasd
   ! je l_isfalse2
   
   ! lea edi, [v_B]
   ! scasd
   ! je l_isfalse2

IsTrue2:

   ExecutionIfTrue()
   ! jmp l_endif2

IsFalse2:

   ExecutionIfFalse()

EndIf2:


EndMacro ;                        \!/ \!/ \!/ \!/ \!/




Macro Counter(InternalMacro)
      Cumul = 0
      For A = 0 To 1
         For B = 0 To 1
            InternalMacro
         Next
      Next
EndMacro




   Define Test1.I
   Define Test2.I
   Define Test3.I
   Define ValeurIni.I
   Define ValeurFin.I
   Define Cumul.I
   Define A.I = 0
   Define B.I = 0
   Define TotalTest.I = 10000000

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test1() )
   Next
   ValeurFin=timeGetTime_()
   Test1=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test2() )
   Next
   ValeurFin=timeGetTime_()
   Test2=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test3() )
   Next
   ValeurFin=timeGetTime_()
   Test3=ValeurFin-ValeurIni

   MessageRequester("Tests without debugger", "Test #1 (If a If b) = " + Str(Test1) + " ms" + Chr(10) + "Test #2 (If a And b) = " + Str(Test2) + " ms" + Chr(10) + "Test #3 (Asm /!\ with LEA) = " + Str(Test3) + " ms")

timeEndPeriod_(1)
Could you check if you have the same observation?
User avatar
Michael Vogel
Addict
Addict
Posts: 2797
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Post by Michael Vogel »

Just to see what can happen, when to much optimizing has been done :lol:

Here are the results of the netbook cpu:
Test#1: 829ms
Test#2: 924ms
Test#3: 1082ms :shock:

Michael
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

Michael Vogel wrote:Here are the results of the netbook cpu:
Test#1: 829ms
Test#2: 924ms
Test#3: 1082ms Shocked
Interesting! The first optimization (#1) is ok but the second one is out (#3)...
Excuse me but I'd like to know you which OS, which CPU and which Anti-virus have you got?

Ollivier
User avatar
Michael Vogel
Addict
Addict
Posts: 2797
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Post by Michael Vogel »

Ollivier wrote:Interesting! The first optimization (#1) is ok but the second one is out (#3)...
Excuse me but I'd like to know you which OS, which CPU and which Anti-virus have you got?

Ollivier
XP SP3, Intel Atom N270, Kaspersky Security Suite 2009
Ollivier
Enthusiast
Enthusiast
Posts: 281
Joined: Mon Jul 23, 2007 8:30 pm
Location: FR

Post by Ollivier »

Thank you Michael Vogel!
User avatar
Demivec
Addict
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

Here are my results:
Test #1 (If a If b) = 506 ms
Test #2 (If a And b) = 326 ms
Test #3 (Asm /!\ with LEA) = 212 ms
Using Windows XP SP3, AMD Athlon 64 3500+ 2.21GHz, ZoneAlarm
Post Reply