Optmizations questions
-
- Enthusiast
- Posts: 746
- Joined: Fri Jul 14, 2006 8:53 pm
- Location: Malta
- Contact:
joking apart
I think you are missing the point guys
the C code is not using any internal functions
and its not optmized for speed by hand
yet all those compilers generated code which is blazing fast - faster than 1 liner in PB.
I'd like to find ways how to redress the balance. I'm sure Fred or any of the super freaks can help us
I think you are missing the point guys
the C code is not using any internal functions
and its not optmized for speed by hand
yet all those compilers generated code which is blazing fast - faster than 1 liner in PB.
I'd like to find ways how to redress the balance. I'm sure Fred or any of the super freaks can help us
I may not help with your coding
Just ask about mental issues!
http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
Just ask about mental issues!
http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
In this kind of speed test, a minor change could do a major difference.
The first thing I see comparing PB and C codes is that in the second one, you're passing pointers to the recursive function. The result is that you're doing all calculations directly on the final buffer, without any "memory swap". This is really a huge difference!
I wouldn't give any conclusion without modifying this.
The first thing I see comparing PB and C codes is that in the second one, you're passing pointers to the recursive function. The result is that you're doing all calculations directly on the final buffer, without any "memory swap". This is really a huge difference!
I wouldn't give any conclusion without modifying this.
- tinman
- PureBasic Expert
- Posts: 1102
- Joined: Sat Apr 26, 2003 4:56 pm
- Location: Level 5 of Robot Hell
- Contact:
superadnim makes a good point about the data. If it is different then the swaps and branches taken through the code will be different. Unless you are also taking average timings over a number of runs, as that should balance out.
With the short times you mention, the measurement of the times may also be affected by the Windows scheduler time and your applications priority. Try giving yourself real time priority (only for the duration of the sort!) so that only your sort (and drivers) run. You should also use the high performance counter to measure the sort time instead of a millisecond counter.
You should use both these things in the PB and the C code in order to get times which can be compared more accurately.
The C code will still be quicker.
Your assessment of PB strings is accurate. If you want to pre-allocate strings you could try simply assigning a large string to your variable, e.g. Space(10000). I don't know if the buffer would be re-used though.
Is is possible for you to code the speed critical sections in C and import them as a library into PB, or do you not want to do that?
Any protected/local variables in your PB procedure will be initialised to zero automatically. In C there is no automatic initialisation.
Having said that, I get 39ms for your C code here and 133ms for the same code in PB (I converted it from the C code directly). The PB compiler simply doesn't do the same level of optimisation as the C compiler, which you won't get without using ASM or coding in C and linking the library to PB.
Even if I disable optimisations the C code takes 108ms. So there is a fundamental difference in the way that PB and C compilers generate code. I've tried changing how it is written but the fastest I got was 127 ms, and any more code you add to be tricky about how the sort is written just makes it slower.
(This was all MSVC 2005 Express.)
With the short times you mention, the measurement of the times may also be affected by the Windows scheduler time and your applications priority. Try giving yourself real time priority (only for the duration of the sort!) so that only your sort (and drivers) run. You should also use the high performance counter to measure the sort time instead of a millisecond counter.
You should use both these things in the PB and the C code in order to get times which can be compared more accurately.
The C code will still be quicker.
Your assessment of PB strings is accurate. If you want to pre-allocate strings you could try simply assigning a large string to your variable, e.g. Space(10000). I don't know if the buffer would be re-used though.
Is is possible for you to code the speed critical sections in C and import them as a library into PB, or do you not want to do that?
Any protected/local variables in your PB procedure will be initialised to zero automatically. In C there is no automatic initialisation.
Having said that, I get 39ms for your C code here and 133ms for the same code in PB (I converted it from the C code directly). The PB compiler simply doesn't do the same level of optimisation as the C compiler, which you won't get without using ASM or coding in C and linking the library to PB.
Even if I disable optimisations the C code takes 108ms. So there is a fundamental difference in the way that PB and C compilers generate code. I've tried changing how it is written but the fastest I got was 127 ms, and any more code you add to be tricky about how the sort is written just makes it slower.
(This was all MSVC 2005 Express.)
If you paint your butt blue and glue the hole shut you just themed your ass but lost the functionality.
(WinXPhSP3 PB5.20b14)
(WinXPhSP3 PB5.20b14)
I did some more reading, a heap sort it supposed to be pretty good to so I tried that but it was slower 312ms. It seems the more I try the slower I get it to work!
. It is quite a fast sort though.
Interestingly though, I read on wikipedia that quicksort and similar algorithms are known to perform differently on different compilers, some compilers seem to work better with some algorithms (which is why I tried the heap sort).
Get Trond to take a look under the hood, maybe he can spot another area where PB's basic2ASM is a little sub optimal. (or we need that guy Hutch from the powerbasic forum to migrate to PB, the one who wrote masm32)
Either way, trying different algorithms is more a competition of algorithms than a competition of compilers.
One thing is for sure, PB was quick to implement, 15mins and I had a wikipedia link, an algorithm I'd never read before in pseudo code turned into a working sort function. See if the C++ spaghetti folks can beat that! 

Interestingly though, I read on wikipedia that quicksort and similar algorithms are known to perform differently on different compilers, some compilers seem to work better with some algorithms (which is why I tried the heap sort).
Get Trond to take a look under the hood, maybe he can spot another area where PB's basic2ASM is a little sub optimal. (or we need that guy Hutch from the powerbasic forum to migrate to PB, the one who wrote masm32)
Either way, trying different algorithms is more a competition of algorithms than a competition of compilers.


Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
I don't mean MASM
http://www.masm32.com/
http://en.wikipedia.org/wiki/MASM32
It's not MASM as such, it uses it under the hood and other things to work, uses ASM sytax plus higher level function calls
Steve claims MS know about it.
http://www.masm32.com/
http://en.wikipedia.org/wiki/MASM32
It's not MASM as such, it uses it under the hood and other things to work, uses ASM sytax plus higher level function calls
Steve claims MS know about it.
Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Yep, that masm. He is not the author, but he has kept masm alive and he has made a huge contribution to the development package. I think he and someone else (iczelion?) initially worked together on putting together the current package. Then Hutch carried it on. I think.

Strike that whole wiki paragraph because it misleads.
MS do know him. You can follow some of his discussions with MS re masm (and their decision on some changes) on one of their forums. Link somewhere on the masm site.
To the best of my knowledge they accept his stewardship (but not his ownership).
However, masm was written by MS and masm32 is masm. I don't think hutch would claim authorship.
Edit:
BTW, if you're into that sort of thing, Pelle (of PellesC fame) is (or was) developing a masm lookalike which Hutch seems to like.
Debacle? Debate, surely.wikipedia wrote:There has been much debacle over whether Steve Hutchinsson has the right to re-distribute MASM (ml.exe + friends); he claims he has permission from Microsoft Australia, but has never backed his claims. See alt.lang.asm usenet discussion archive, the topic has been raised multiple times during the years.

Strike that whole wiki paragraph because it misleads.
MS do know him. You can follow some of his discussions with MS re masm (and their decision on some changes) on one of their forums. Link somewhere on the masm site.
To the best of my knowledge they accept his stewardship (but not his ownership).
However, masm was written by MS and masm32 is masm. I don't think hutch would claim authorship.
Edit:
BTW, if you're into that sort of thing, Pelle (of PellesC fame) is (or was) developing a masm lookalike which Hutch seems to like.
Dare2 cut down to size
I've translated the C code in native purebasic, with the fewest modifications possible, as I think it could be a good comparison test. Maybe some mistakes are still there, could you test? Thank you.
Code: Select all
#gSz=500000
ImportC "msvcrt.lib"
ftime( a ) As "__ftime@4"
EndImport
Structure vector
len.d
elem.l[#gSz]
EndStructure
Structure _timeb
time.l
milli.w
tz.w
flag.w
EndStructure
;*****************************************************************************
Procedure.l MAX(a.l,b.l)
If a>b
ProcedureReturn a
Else
ProcedureReturn b
EndIf
EndProcedure
;*****************************************************************************
Procedure.l MIN(a.l,b.l)
If a<b
ProcedureReturn a
Else
ProcedureReturn b
EndIf
EndProcedure
;*****************************************************************************
Procedure.l partition(*v.vector, p.l, r.l)
Protected x.l, i.l, j.l
If ((p + 3) < r)
x = MIN(*v\elem[p], MAX(*v\elem[p+1], *v\elem[p+2]));
Else
x = *v\elem[p]
EndIf
i = p - 1
j = r + 1
While 1
Repeat
j-1
Until *v\elem[j] <= x
Repeat
i+1
Until *v\elem[i] >= x
If i < j
aux.l = *v\elem[j]
*v\elem[j] = *v\elem[i]
*v\elem[i] = aux
Else
ProcedureReturn j
EndIf
Wend
EndProcedure
;*****************************************************************************
Procedure quicksort_real(*v.vector, p.l, r.l)
If p < r
q.l = partition(*v, p, r)
quicksort_real(*v, p, q)
quicksort_real(*v, q + 1, r)
EndIf
EndProcedure
;*****************************************************************************
Procedure quicksort(*v.vector)
PrintN("Quicksort...");
quicksort_real(*v, 0, *v\len - 1);
EndProcedure
;*****************************************************************************
Define.l i,lDiv
Define.vector lV
Define.w lS
Define._timeb t
OpenConsole()
lV\len = #gSz
For i = 0 To lV\len-1
;--- don't see the point here, and don't know how big is #randmax
;lDiv = 1+Random(32767)
;lV\elem[i] = Random(32767)/lDiv
;---
lV\elem[i] = Random(32767)
Debug Str(lV\elem[i])
Next i
PrintN("Vector populated. Press enter to quicksort...");
Input()
ftime(@t)
lS = t\milli;
quicksort(@lV);
ftime(@t);
lS = t\milli - lS;
PrintN("Elapsed milliseconds: "+Str(lS));
CompilerIf #PB_Compiler_Debugger=1
For i = 0 To lV\len-1
PrintN(Str(lV\elem[i]))
Next i
CompilerEndIf
PrintN("Press enter to quit...");
Input()
End
wouldn't having min and max as functions add extra overhead? (or is this more about matching the C code?)
Maybe PB is doing some sort of memory bounds checking which slows it a bit?
It might be interesting to baseline what happens if both comilers use ASM inline which is the same. Then if they are out by a fair bit you know something is wierd
Maybe PB is doing some sort of memory bounds checking which slows it a bit?
It might be interesting to baseline what happens if both comilers use ASM inline which is the same. Then if they are out by a fair bit you know something is wierd
Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Min and max were macros in C, actually. As I can't do them in Purebasic, I've done them as procedures. But they're not in the worst time consuming loops.
As we're comparing optimisations done by the misc languages, we have to do the most comparable code. If we're doing assembly, we will compare processors, not languages.
As we're comparing optimisations done by the misc languages, we have to do the most comparable code. If we're doing assembly, we will compare processors, not languages.
- tinman
- PureBasic Expert
- Posts: 1102
- Joined: Sat Apr 26, 2003 4:56 pm
- Location: Level 5 of Robot Hell
- Contact:
That's what I did to get 39 vs 132 ms. However, your code gave me a few ideas and I tried them.djes wrote:I've translated the C code in native purebasic, with the fewest modifications possible, as I think it could be a good comparison test. Maybe some mistakes are still there, could you test?
Using "Swap" is faster than manually swapping the values.
Using unbounded arrays is no quicker than using bounded arrays (i.e. there is no bounds checking in a program with the debugger turned off)
Using shared variables is slower than protected variables for the "partition" procedure.
Using procedures for the min and max functions are slower than doing it manually with this code:
Code: Select all
x = *v\elem[p+1]
temp = *v\elem[p+2]
If x < temp : x = temp : EndIf
temp = *v\elem[p]
If x > temp : x = temp : EndIf
None of these things make it comparable to C though. Just not going to happen.
If you paint your butt blue and glue the hole shut you just themed your ass but lost the functionality.
(WinXPhSP3 PB5.20b14)
(WinXPhSP3 PB5.20b14)
tinman> You're right for the procedures. As it is not time consuming, I privilegied code readability. Swap is sort of a special PB command; as it is not in the C code, we should not use it; as I were wrong to use i++ as you said. For the protected statement, maybe we should use static, or global. We should compare with the c at this point...
I'd like to see disasm code
I'd like to see disasm code

I'm completely lost on what you are trying to achieve now
So if a C compiler has optimisation options and ways to use macros inline you turn them on and use them but if PB has commands that the C code doesn't have which are fast then you want to avoid them as then it's not apples and apples anymore.
What exactly are you trying to compare?
It doesn't seem to be a) which can sort an interger list faster or b)which can implement a faster quicksort
Quick tests showed C was faster, it sounds like you are just trying to find a way to make the gap look bigger.

So if a C compiler has optimisation options and ways to use macros inline you turn them on and use them but if PB has commands that the C code doesn't have which are fast then you want to avoid them as then it's not apples and apples anymore.
What exactly are you trying to compare?
It doesn't seem to be a) which can sort an interger list faster or b)which can implement a faster quicksort
Quick tests showed C was faster, it sounds like you are just trying to find a way to make the gap look bigger.

Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
- tinman
- PureBasic Expert
- Posts: 1102
- Joined: Sat Apr 26, 2003 4:56 pm
- Location: Level 5 of Robot Hell
- Contact:
I think we should use Swap :)djes wrote:tinman> You're right for the procedures. As it is not time consuming, I privilegied code readability. Swap is sort of a special PB command; as it is not in the C code, we should not use it; as I were wrong to use i++ as you said. For the protected statement, maybe we should use static, or global. We should compare with the c at this point...
About i++, I was only saying that "i + 1" (which I also used) was the same speed as "i = i + 1".
Personally I'm trying to find the fastest implementation of the same algorithm in PB. As such we should make use of the language features (e.g. Swap) available. If we can find areas that need improvement that get implemented then thats even better.pdwyer wrote:What exactly are you trying to compare?
If you paint your butt blue and glue the hole shut you just themed your ass but lost the functionality.
(WinXPhSP3 PB5.20b14)
(WinXPhSP3 PB5.20b14)