Page 2 of 3

Posted: Mon Jun 08, 2009 1:41 pm
by Trond

Code: Select all

@Trond: With your code I get -23% -> -28% .
Strange, I get between 10 and 20%.

Posted: Tue Jun 09, 2009 1:38 am
by Rescator
@Trond
on x64 i get -11% to -17% with most runs getting -17%
on x86 i get 0% to 8% with most runs getting 0%.

And obviously I selected compile without debugger in the menu all times.

Posted: Tue Jun 09, 2009 9:10 am
by Trond
GCC generates the if+if version even if you write and, by the way.

Posted: Fri Jun 12, 2009 5:20 am
by Ollivier
What values do you get with this other algo?

Code: Select all


DisableDebugger



Macro ExecutionIfTrue()
   ; Nothing
EndMacro



Macro Test1()
            If A
               If B
                  ExecutionIfTrue()
               EndIf
            EndIf
EndMacro



Macro Test2()
            If A And B
               ExecutionIfTrue()
            EndIf
EndMacro




Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\
            !Lea esi,[v_A]
            !Lodsd
            !And eax,[v_B]
            !JNE l_istrue  
            ; Code execution if false          
            !JMP l_wasfalse
IsTrue:
            ; Code execution if true
            ExecutionIfTrue()
WasFalse:
EndMacro ;                        \!/ \!/ \!/ \!/ \!/




Macro Counter(InternalMacro)
      Cumul = 0
      For A = 0 To 1
         For B = 0 To 1
            !Rdtsc
            !Mov [v_ValeurIni], EAX
            InternalMacro
            !Rdtsc
            !Mov [v_ValeurFin], EAX
            Cumul + (ValeurFin - ValeurIni)
         Next
      Next
EndMacro




   Define Test1.I
   Define Test2.I
   Define Test3.I
   Define ValeurIni.I
   Define ValeurFin.I
   Define Cumul.I
   Define A.I = 0
   Define B.I = 0
   Define TotalTest.I = 1

   For I = 1 To TotalTest
      Counter(Test1() ): Test1 + (Cumul / (1 << 2))
      Counter(Test2() ): Test2 + (Cumul / (1 << 2))
      Counter(Test3() ): Test3 + (Cumul / (1 << 2))
   Next

   MessageRequester("Tests without debugger", "Test #1 (If a If b) = " + Str(Test1) + " cycles" + Chr(10) + "Test #2 (If a And b) = " + Str(Test2) + " cycles" + Chr(10) + "Test #3 (Asm /!\ with ESI reg) = " + Str(Test3) + " cycles")

Posted: Fri Jun 12, 2009 12:30 pm
by Trond
Rdtsc is not reliable on modern processors, and also DisableDebugger is not enough, you must disable the debugger from the menu to turn on the assembly optimizer.

Posted: Fri Jun 12, 2009 1:39 pm
by Ollivier
@Trond... Mh... :D
Trond wrote:The test is flawed. The conditions are true once and false 63999999 times, which is not very balanced.
Are you sure?

Code: Select all

For A = 0 To 8000
      For B = 0 To 8000
         If A
            If B
           
            EndIf
         EndIf
      Next B
   Next A 
Really sure?
Trond wrote:Rdtsc is not reliable on modern processors
Are you again sure? I want to believe you. So, could you give me the links which tell me which processor family is excluded?
Trond wrote:DisableDebugger is not enough
Are you again sure?Which code example could you give me to prove it?

In the next code, I wrote it:

Code: Select all

Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\
            !Lea esi,[v_A]
            !Lodsd
            !And eax,[v_B]
            !JNE l_istrue 
            ; Code execution if false         
            !JMP l_wasfalse
IsTrue:
            ; Code execution if true
WasFalse:
EndMacro ;                        \!/ \!/ \!/ \!/ \!/ 
Have you execution duration values comparing the classical expression:

Code: Select all

If A And B
EndIf
Thank you

Ollivier

Posted: Fri Jun 12, 2009 6:09 pm
by Rescator
Trond is right!

RDTSC is not reliable for timing, It may suddenly jump ahead sometimes, speestep/auto over or auto underclocking etc.

If the CPU was to speed up or down between or during then the timing info is useless. Even the QueryPerformanceCounter has a few of this issues (as on some systems it also uses RDTSC (mostly older systems) instead of a high res clock crystal)

As to DisableDebugger, I've noticed speed differences between DisableDebugger and hoosing compile without debugger in the menu.
When you use DisableDebugger the debugger is still there, maybe Freak could tell you the difference betwenn the DisableDebugger and compiling without the debugger.

I re-did your test to use timeGetTime set to 1ms timing.
It's the easiest way to get accurate timing on Windows 2000 and later (on Linux and Mac and Win9x you could simply use ElapsedMilliseconds)
Donwside is you gotta do a lot of loops to see the difference.

If the difference is too small (you got a very fast CPU) then add a 0 to the test loop, the test took 6+ seconds here on my system then instead of less than a second as in the number below.


Code: Select all

DisableDebugger
timeBeginPeriod_(1)


Macro ExecutionIfTrue()
   ; Nothing
EndMacro



Macro Test1()
            If A
               If B
                  ExecutionIfTrue()
               EndIf
            EndIf
EndMacro



Macro Test2()
            If A And B
               ExecutionIfTrue()
            EndIf
EndMacro




Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\
            !Lea esi,[v_A]
            !Lodsd
            !And eax,[v_B]
            !JNE l_istrue 
            ; Code execution if false         
            !JMP l_wasfalse
IsTrue:
            ; Code execution if true
            ExecutionIfTrue()
WasFalse:
EndMacro ;                        \!/ \!/ \!/ \!/ \!/




Macro Counter(InternalMacro)
      Cumul = 0
      For A = 0 To 1
         For B = 0 To 1
            InternalMacro
         Next
      Next
EndMacro




   Define Test1.I
   Define Test2.I
   Define Test3.I
   Define ValeurIni.I
   Define ValeurFin.I
   Define Cumul.I
   Define A.I = 0
   Define B.I = 0
   Define TotalTest.I = 10000000

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test1() )
   Next
   ValeurFin=timeGetTime_()
   Test1=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test2() )
   Next
   ValeurFin=timeGetTime_()
   Test2=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test3() )
   Next
   ValeurFin=timeGetTime_()
   Test3=ValeurFin-ValeurIni

   MessageRequester("Tests without debugger", "Test #1 (If a If b) = " + Str(Test1) + " ms" + Chr(10) + "Test #2 (If a And b) = " + Str(Test2) + " ms" + Chr(10) + "Test #3 (Asm /!\ with ESI reg) = " + Str(Test3) + " ms")

timeEndPeriod_(1)
DisableDebugger
Test #1 (If a If b) = 214 ms
Test #2 (If a And b) = 237 ms
Test #3 (Asm /!\ with ESI reg) = 200 ms
Compile Without Debugger
Test #1 (If a If b) = 213 ms
Test #2 (If a And b) = 263 ms
Test #3 (Asm /!\ with ESI reg) = 204 ms
As you can see there is a difference.

Posted: Fri Jun 12, 2009 8:45 pm
by Trond
Ollivier wrote:@Trond... Mh... :D
Trond wrote:The test is flawed. The conditions are true once and false 63999999 times, which is not very balanced.
Are you sure?
... code...
Really sure?
No, actually, I was wrong. There were 64016001 iterations. In 64000000 the check was true. In 16001 the check was false. I still claim that this is not balanced, though.

See for yourself:

Code: Select all

For A = 0 To 8000
  For B = 0 To 8000
     If A
        If B
          True + 1
        EndIf
     EndIf
     Iter + 1
  Next
Next
MessageRequester("", "Number of iterations: " + Str(Iter))
MessageRequester("", "Of which were true: " + Str(True))
MessageRequester("", "Of which were false: " + Str(Iter-True))
Trond wrote:Rdtsc is not reliable on modern processors
Are you again sure? I want to believe you. So, could you give me the links which tell me which processor family is excluded?
All processors with multi-core, or systems with multiple processors, and all processors with cpu frequency scaling ("speedstep").
See: http://msdn.microsoft.com/en-us/library ... S.85).aspx
Also, if the processor executes several instructions in parallel, the rdtsc may get a timestamp that is too early, because the last instructions haven't completed yet.
Trond wrote:DisableDebugger is not enough
Are you again sure? Which code example could you give me to prove it?
Yes, I'm sure. Fred should know, he made PB: http://www.purebasic.fr/english/viewtop ... 209#242209

Posted: Fri Jun 12, 2009 8:47 pm
by Trond
By the way, the lea... variant isn't short-circuiting, so it doesn't behave like the And in PB.

Posted: Sat Jun 13, 2009 2:57 am
by Ollivier
I can see also a difference between the two configurations : with and without the debugger option.
Trond wrote:It doesn't behave like the And in PB.
You are right.

Now, I can really see we have the same logic rules:
Zero is false
All the others integer values are true
I thank you for the codes and the links.

I think this algo

Code: Select all

! CMP [Var], 0
Etc...
is slower (15% -> 25%) than this algo

Code: Select all

! XOR EAX, EAX
! LEA EDI, [Var]
! SCASD
Etc...
Actually I can't check if this difference is true with x64 (with SCASQ). But, with x86, I can see a difference.

In order to check if the AND gate is exact, I give you this first code.

Code: Select all

For Max = 1 To 3 Step 2

   Result$ = ""

   For A = 0 To Max
      For B = 0 To Max

         ! xor eax, eax
   
         ! lea edi, [v_A]
         ! scasd
         ! je l_isfalse2
   
         ! lea edi, [v_B]
         ! scasd
         ! je l_isfalse2

IsTrue2:

         Result$ + "1"
         ! jmp l_endif2

IsFalse2:

         Result$ + "0"

EndIf2:

      Next
   
      Result$ + "-"

   Next

   MessageRequester("AND gate (" + Str(((Max - 1) / 2) + 1) + " bit(s) per input)", Result$)

Next
And to see the execution duration difference, I take your code (Rescator) and insert the algo above in the macro Test3() to check it:

Code: Select all

DisableDebugger
timeBeginPeriod_(1)


Macro ExecutionIfTrue()
   ; Nothing
EndMacro

Macro ExecutionIfFalse()
   ; Nothing
EndMacro



Macro Test1()
            If A
               If B
                  ExecutionIfTrue()
               Else
                  ExecutionIfFalse()
               EndIf
            Else
               ExecutionIfFalse()
            EndIf
EndMacro



Macro Test2()
            If A And B
               ExecutionIfTrue()
            Else
               ExecutionIfFalse()
            EndIf
EndMacro




Macro Test3() ; (= If A And B)    /!\ /!\ /!\ /!\ /!\

   ! xor eax, eax
   
   ! lea edi, [v_A]
   ! scasd
   ! je l_isfalse2
   
   ! lea edi, [v_B]
   ! scasd
   ! je l_isfalse2

IsTrue2:

   ExecutionIfTrue()
   ! jmp l_endif2

IsFalse2:

   ExecutionIfFalse()

EndIf2:


EndMacro ;                        \!/ \!/ \!/ \!/ \!/




Macro Counter(InternalMacro)
      Cumul = 0
      For A = 0 To 1
         For B = 0 To 1
            InternalMacro
         Next
      Next
EndMacro




   Define Test1.I
   Define Test2.I
   Define Test3.I
   Define ValeurIni.I
   Define ValeurFin.I
   Define Cumul.I
   Define A.I = 0
   Define B.I = 0
   Define TotalTest.I = 10000000

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test1() )
   Next
   ValeurFin=timeGetTime_()
   Test1=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test2() )
   Next
   ValeurFin=timeGetTime_()
   Test2=ValeurFin-ValeurIni

   ValeurIni=timeGetTime_()
   For I = 1 To TotalTest
      Counter(Test3() )
   Next
   ValeurFin=timeGetTime_()
   Test3=ValeurFin-ValeurIni

   MessageRequester("Tests without debugger", "Test #1 (If a If b) = " + Str(Test1) + " ms" + Chr(10) + "Test #2 (If a And b) = " + Str(Test2) + " ms" + Chr(10) + "Test #3 (Asm /!\ with LEA) = " + Str(Test3) + " ms")

timeEndPeriod_(1)
Could you check if you have the same observation?

Posted: Sat Jun 13, 2009 2:21 pm
by Michael Vogel
Just to see what can happen, when to much optimizing has been done :lol:

Here are the results of the netbook cpu:
Test#1: 829ms
Test#2: 924ms
Test#3: 1082ms :shock:

Michael

Posted: Sat Jun 13, 2009 9:47 pm
by Ollivier
Michael Vogel wrote:Here are the results of the netbook cpu:
Test#1: 829ms
Test#2: 924ms
Test#3: 1082ms Shocked
Interesting! The first optimization (#1) is ok but the second one is out (#3)...
Excuse me but I'd like to know you which OS, which CPU and which Anti-virus have you got?

Ollivier

Posted: Sun Jun 14, 2009 9:48 am
by Michael Vogel
Ollivier wrote:Interesting! The first optimization (#1) is ok but the second one is out (#3)...
Excuse me but I'd like to know you which OS, which CPU and which Anti-virus have you got?

Ollivier
XP SP3, Intel Atom N270, Kaspersky Security Suite 2009

Posted: Sun Jun 14, 2009 11:56 am
by Ollivier
Thank you Michael Vogel!

Posted: Sun Jun 14, 2009 2:16 pm
by Demivec
Here are my results:
Test #1 (If a If b) = 506 ms
Test #2 (If a And b) = 326 ms
Test #3 (Asm /!\ with LEA) = 212 ms
Using Windows XP SP3, AMD Athlon 64 3500+ 2.21GHz, ZoneAlarm