Optmizations questions

kinglestat · Post by **kinglestat** » Sun Feb 10, 2008 7:42 pm

joking apart
I think you are missing the point guys

the C code is not using any internal functions
and its not optmized for speed by hand
yet all those compilers generated code which is blazing fast - faster than 1 liner in PB.

I'd like to find ways how to redress the balance. I'm sure Fred or any of the super freaks can help us

djes · Post by **djes** » Sun Feb 10, 2008 8:41 pm

In this kind of speed test, a minor change could do a major difference.

The first thing I see comparing PB and C codes is that in the second one, you're passing pointers to the recursive function. The result is that you're doing all calculations directly on the final buffer, without any "memory swap". This is really a huge difference!

I wouldn't give any conclusion without modifying this.

tinman · Post by **tinman** » Sun Feb 10, 2008 11:05 pm

superadnim makes a good point about the data. If it is different then the swaps and branches taken through the code will be different. Unless you are also taking average timings over a number of runs, as that should balance out.

With the short times you mention, the measurement of the times may also be affected by the Windows scheduler time and your applications priority. Try giving yourself real time priority (only for the duration of the sort!) so that only your sort (and drivers) run. You should also use the high performance counter to measure the sort time instead of a millisecond counter.

You should use both these things in the PB and the C code in order to get times which can be compared more accurately.

The C code will still be quicker.

Your assessment of PB strings is accurate. If you want to pre-allocate strings you could try simply assigning a large string to your variable, e.g. Space(10000). I don't know if the buffer would be re-used though.

Is is possible for you to code the speed critical sections in C and import them as a library into PB, or do you not want to do that?

Any protected/local variables in your PB procedure will be initialised to zero automatically. In C there is no automatic initialisation.

Having said that, I get 39ms for your C code here and 133ms for the same code in PB (I converted it from the C code directly). The PB compiler simply doesn't do the same level of optimisation as the C compiler, which you won't get without using ASM or coding in C and linking the library to PB.

Even if I disable optimisations the C code takes 108ms. So there is a fundamental difference in the way that PB and C compilers generate code. I've tried changing how it is written but the fastest I got was 127 ms, and any more code you add to be tricky about how the sort is written just makes it slower.

(This was all MSVC 2005 Express.)

Trond · Post by **Trond** » Sun Feb 10, 2008 11:18 pm

The C code will probably always be slightly faster. Try writing the same algorithm in PHP to get some perspective on it.

pdwyer · Post by **pdwyer** » Tue Feb 12, 2008 2:08 pm

I did some more reading, a heap sort it supposed to be pretty good to so I tried that but it was slower 312ms. It seems the more I try the slower I get it to work!

. It is quite a fast sort though.

Interestingly though, I read on wikipedia that quicksort and similar algorithms are known to perform differently on different compilers, some compilers seem to work better with some algorithms (which is why I tried the heap sort).

Get Trond to take a look under the hood, maybe he can spot another area where PB's basic2ASM is a little sub optimal. (or we need that guy Hutch from the powerbasic forum to migrate to PB, the one who wrote masm32)

Either way, trying different algorithms is more a competition of algorithms than a competition of compilers.

One thing is for sure, PB was quick to implement, 15mins and I had a wikipedia link, an algorithm I'd never read before in pseudo code turned into a working sort function. See if the C++ spaghetti folks can beat that!

Dare · Post by **Dare** » Tue Feb 12, 2008 2:12 pm

pdwyer wrote:(or we need that guy Hutch from the powerbasic forum to migrate to PB, the one who wrote masm32)

MS might have a different take on the authorship there!

pdwyer · Post by **pdwyer** » Tue Feb 12, 2008 2:26 pm

I don't mean MASM

http://www.masm32.com/
http://en.wikipedia.org/wiki/MASM32

It's not MASM as such, it uses it under the hood and other things to work, uses ASM sytax plus higher level function calls

Steve claims MS know about it.

Dare · Post by **Dare** » Tue Feb 12, 2008 2:40 pm

Yep, that masm. He is not the author, but he has kept masm alive and he has made a huge contribution to the development package. I think he and someone else (iczelion?) initially worked together on putting together the current package. Then Hutch carried it on. I think.

wikipedia wrote:There has been much debacle over whether Steve Hutchinsson has the right to re-distribute MASM (ml.exe + friends); he claims he has permission from Microsoft Australia, but has never backed his claims. See alt.lang.asm usenet discussion archive, the topic has been raised multiple times during the years.

Debacle? Debate, surely.

Strike that whole wiki paragraph because it misleads.

MS do know him. You can follow some of his discussions with MS re masm (and their decision on some changes) on one of their forums. Link somewhere on the masm site.

To the best of my knowledge they accept his stewardship (but not his ownership).

However, masm was written by MS and masm32 is masm. I don't think hutch would claim authorship.

Edit:

BTW, if you're into that sort of thing, Pelle (of PellesC fame) is (or was) developing a masm lookalike which Hutch seems to like.

djes · Post by **djes** » Tue Feb 12, 2008 3:19 pm

I've translated the C code in native purebasic, with the fewest modifications possible, as I think it could be a good comparison test. Maybe some mistakes are still there, could you test? Thank you.

Code: Select all

#gSz=500000

ImportC "msvcrt.lib"
  ftime( a ) As "__ftime@4"
EndImport

Structure vector
   len.d
   elem.l[#gSz]
EndStructure

Structure _timeb
  time.l
  milli.w
  tz.w
  flag.w
EndStructure 

;*****************************************************************************

Procedure.l MAX(a.l,b.l)

  If a>b
    ProcedureReturn a
  Else
    ProcedureReturn b
  EndIf

EndProcedure

;*****************************************************************************

Procedure.l MIN(a.l,b.l)

  If a<b
    ProcedureReturn a
  Else
    ProcedureReturn b
  EndIf

EndProcedure
  

;*****************************************************************************
Procedure.l partition(*v.vector, p.l, r.l)
  
  Protected x.l, i.l, j.l
  
  If ((p + 3) < r)
    x = MIN(*v\elem[p], MAX(*v\elem[p+1], *v\elem[p+2]));
  Else
    x = *v\elem[p]
  EndIf
  
  i = p - 1
  j = r + 1 
  While 1
    Repeat
      j-1
    Until *v\elem[j] <= x
    Repeat
      i+1
    Until *v\elem[i] >= x
    
    If i < j 
       aux.l = *v\elem[j]
       *v\elem[j] = *v\elem[i]
       *v\elem[i] = aux
    Else
       ProcedureReturn j
    EndIf
  Wend

EndProcedure

;*****************************************************************************

Procedure quicksort_real(*v.vector, p.l, r.l)

   If p < r
      q.l = partition(*v, p, r)
      quicksort_real(*v, p, q)
      quicksort_real(*v, q + 1, r)
   EndIf

EndProcedure

;*****************************************************************************

Procedure quicksort(*v.vector)

   PrintN("Quicksort...");
   quicksort_real(*v, 0, *v\len - 1);

EndProcedure


;*****************************************************************************

Define.l i,lDiv
Define.vector lV
Define.w lS
Define._timeb t

OpenConsole()

lV\len = #gSz

For  i = 0 To lV\len-1
  ;--- don't see the point here, and don't know how big is #randmax
  ;lDiv = 1+Random(32767)
  ;lV\elem[i] = Random(32767)/lDiv
  ;---
  lV\elem[i] = Random(32767)
  Debug Str(lV\elem[i])
Next i

PrintN("Vector populated. Press enter to quicksort...");

Input()

ftime(@t)
lS = t\milli;
quicksort(@lV);
ftime(@t);
lS = t\milli - lS;

PrintN("Elapsed milliseconds: "+Str(lS));

CompilerIf #PB_Compiler_Debugger=1
  For  i = 0 To lV\len-1
    PrintN(Str(lV\elem[i]))
  Next i
CompilerEndIf


PrintN("Press enter to quit...");
Input()
End

pdwyer · Post by **pdwyer** » Tue Feb 12, 2008 3:39 pm

wouldn't having min and max as functions add extra overhead? (or is this more about matching the C code?)

Maybe PB is doing some sort of memory bounds checking which slows it a bit?

It might be interesting to baseline what happens if both comilers use ASM inline which is the same. Then if they are out by a fair bit you know something is wierd

djes · Post by **djes** » Tue Feb 12, 2008 3:57 pm

Min and max were macros in C, actually. As I can't do them in Purebasic, I've done them as procedures. But they're not in the worst time consuming loops.
As we're comparing optimisations done by the misc languages, we have to do the most comparable code. If we're doing assembly, we will compare processors, not languages.

tinman · Post by **tinman** » Tue Feb 12, 2008 11:34 pm

djes wrote:I've translated the C code in native purebasic, with the fewest modifications possible, as I think it could be a good comparison test. Maybe some mistakes are still there, could you test?

That's what I did to get 39 vs 132 ms. However, your code gave me a few ideas and I tried them.

Using "Swap" is faster than manually swapping the values.
Using unbounded arrays is no quicker than using bounded arrays (i.e. there is no bounds checking in a program with the debugger turned off)
Using shared variables is slower than protected variables for the "partition" procedure.
Using procedures for the min and max functions are slower than doing it manually with this code:

Code: Select all

        x = *v\elem[p+1]
        temp = *v\elem[p+2]
        If x < temp : x = temp : EndIf
        temp = *v\elem[p]
        If x > temp : x = temp : EndIf

Using "i = i + 1" is equally fast as "i + 1".

None of these things make it comparable to C though. Just not going to happen.

djes · Post by **djes** » Wed Feb 13, 2008 9:53 am

tinman> You're right for the procedures. As it is not time consuming, I privilegied code readability. Swap is sort of a special PB command; as it is not in the C code, we should not use it; as I were wrong to use i++ as you said. For the protected statement, maybe we should use static, or global. We should compare with the c at this point...

I'd like to see disasm code

pdwyer · Post by **pdwyer** » Wed Feb 13, 2008 10:45 am

I'm completely lost on what you are trying to achieve now

So if a C compiler has optimisation options and ways to use macros inline you turn them on and use them but if PB has commands that the C code doesn't have which are fast then you want to avoid them as then it's not apples and apples anymore.

What exactly are you trying to compare?

It doesn't seem to be a) which can sort an interger list faster or b)which can implement a faster quicksort

Quick tests showed C was faster, it sounds like you are just trying to find a way to make the gap look bigger.

tinman · Post by **tinman** » Wed Feb 13, 2008 12:29 pm

djes wrote:tinman> You're right for the procedures. As it is not time consuming, I privilegied code readability. Swap is sort of a special PB command; as it is not in the C code, we should not use it; as I were wrong to use i++ as you said. For the protected statement, maybe we should use static, or global. We should compare with the c at this point...

I think we should use Swap :)

About i++, I was only saying that "i + 1" (which I also used) was the same speed as "i = i + 1".

pdwyer wrote:What exactly are you trying to compare?

Personally I'm trying to find the fastest implementation of the same algorithm in PB. As such we should make use of the language features (e.g. Swap) available. If we can find areas that need improvement that get implemented then thats even better.