Page 1 of 4
Optmizations questions
Posted: Sat Feb 09, 2008 9:37 pm
by kinglestat
This week I had a "contest" with a friend of mine about which programming language to use to sort 500,000 integers
Unfortunately, his program won....in borland C++, and not even using an internal sort function.
So I've been thinking how can I make my programs faster?
a) is it possible to use registers for loop variables ?
b) can more registers be used for parameter passing in procedures ?
c) void return for parameters not returning a value
d) memory allocation for strings handled by user?
Any suggestions to improve speed
I already use macros exhaustively and pay attention to optimize my code, but I am quite new to PB
Re: Optmizations questions
Posted: Sat Feb 09, 2008 10:00 pm
by tinman
kinglestat wrote:This week I had a "contest" with a friend of mine about which programming language to use to sort 500,000 integers
Unfortunately, his program won....in borland C++, and not even using an internal sort function.
What version of Borland C++ does your friend use? Modern C++ compilers are usually very good at optimising the code, to a level that PB's compiler does not do.
For example, loop unrolling, inline function calls are two optimisations that would make a big difference in certain situations. PB does not automatically put frequently used variables in registers, preferring to read them from memory each time they are used.
kinglestat wrote:a) is it possible to use registers for loop variables ?
You cannot use registers in PB. You could if you were willing to code in ASM, but that would no longer be PB code :)
kinglestat wrote:b) can more registers be used for parameter passing in procedures ?
PB passes parameters on the stack, so you cannot use registers.
kinglestat wrote:c) void return for parameters not returning a value
PB does not support the notion of a void return for procedures. If you do not return a value then you do not return a value. I do not know how much difference that would make anyway.
kinglestat wrote:d) memory allocation for strings handled by user?
Perhaps. Trond posted some ASM code and pdwyer was going to have a look at it for improving the speed of strings.
What sort algorithm does your friend use? It could be that the algorithms used have a large difference (I do not know what PB uses for SortArray).
Posted: Sun Feb 10, 2008 12:09 am
by rsts
Unless you both used the same basic 'code' - algorithm, it wasn't as much about language as it was programmer.
cheers
edit - not 'good' or 'bad' programming just 'choices' and 'methods'
Re: Optmizations questions
Posted: Sun Feb 10, 2008 12:37 am
by Seymour Clufley
tinman wrote:It could be that the algorithms used have a large difference (I do not know what PB uses for SortArray).
Whatever the algorithm is, it adds 5k to executable size. Or at least it did when I used the function a few months ago. Just a sidenote.
Posted: Sun Feb 10, 2008 1:25 am
by kinglestat
he uses codebuilder 2007
I think its the latest
It is optimizing well. We tried his code with watcom, and with some adjustments also microsoft. With watcom, PB was faster. with microsoft, sometimes faster others slower. Borland, always faster.
Its true, optimizations are mostly developer specific. But there are times, with very large iterations, where we depend on the language itself. And like tinman correcly points out, if we coded in assembly, it would not be PB anymore - also there is the risk that with upgrades in PB, these hardcoded optimizations might not work.
Further, optimizing strings is the hardest, as to optimize any string related stuff, would need to build all the associated routines.
Lets take a simple example.
for i = 1 to 1000
s1.s = "some text"
s2.s = "some other text"
s3.s = Peeks( mem )
;>
s1 = s2 + s3
s1 + s2
; do something with s1
s1 = "something else"
next
now from my understanding, PB would do the following
starting from comment
allocate mem for s1
copy string s1
append string s2
realloc mem for s1
copy string s2
and later
free mem for s1
s1 =something else
now, had s1's memory been enough at the beginning, lots of operations would be avoides, thus gaining speed
correct me if I am wrong
This kind of work is ideal in C, but its very hard using C code with PB, as to date there is no simple way to link PB libs with C libs (not using DLLs)
Posted: Sun Feb 10, 2008 1:39 am
by kinglestat
incidentally this is the code
Code: Select all
; English forum: http://www.purebasic.fr/english/viewtopic.php?t=7503&highlight=
; Author: Pupil (updated for PB4.00 by blbltheworm + Andre)
; Date: 16. September 2003
; OS: Windows
; Demo: Yes
; With this small assembly of previous post you can now easily compare the different
; sorting algorithms. The code below contains some algorithms that has already been
; posted, i.e. the basic BubbleSort and oldefoxx's "fastest" bubblesort variation.
; New sorting algoritms are the CombSort and the QuickSort algorithm (quicksort as
; found in the PB examples folder).
; The CombSort algorithm is a modification of bubblesort that addresses the weakness
; of propagation from bottom to top in the bubblesort algorithm. CombSort was published
; by Box And Lacey in 1991. The algorithm is so much more efficient than the original
; bubblesort that it is in fact comparable to some of the best sorting algorithms
; there is, like QuickSort and HeapSort to name a few. In the CombSort algorithm the
; optimal value of the shrinkFactor vary from one array to another, but doesn't stray
; far away from the value 1.3.
;- Code:
; Compare sort algorithms:
; BubbleSort, modified BubbleSort, CombSort,
; QuickSort and PB internal SortArray() command
;
ImportC "msvcrt.lib"
ftime( a ) As "__ftime@4"
EndImport
Structure _timeb
time.l
milli.w
tz.w
flag.w
EndStructure
#NMAX = 500000
Global Dim dummy.l(0)
Global Dim array.l(#NMAX)
Global t1._timeb
Global t2._timeb
DisableDebugger
; QuickSort - From PureBasic Advanced examples
;
Declare QuickSortRecursive(g.l, d.l)
Procedure QuickSort(*ptr, n.l)
; small dummy() wrapper.
tmpAddress.l = @dummy()
dummy() = *ptr
QuickSortRecursive(0, n)
dummy() = tmpAddress
EndProcedure
Procedure QuickSortRecursive(g.l, d.l)
If g < d
v = dummy(d)
i = g-1
j = d
Repeat
Repeat
i=i+1
Until dummy(i) >= v
ok = 0
Repeat
If j>0
j=j-1
Else
ok=1
EndIf
If dummy(j) <= v
ok=1
EndIf
Until ok<>0
tmp.l = dummy(i)
dummy(i) = dummy(j)
dummy(j) = tmp
Until j <= i
t = dummy(j)
dummy(j) = dummy(i)
dummy(i) = dummy(d)
dummy(d) = t
QuickSortRecursive(g, i-1)
QuickSortRecursive(i+1, d)
EndIf
EndProcedure
; Fill Array with random numbers
For i = 0 To #NMAX
array(i) = Random($FFFF)
;array(i) = Random(32767)
Next
; Need the exact conditions for both tests
; so copy array.
Mem = AllocateMemory(#NMAX*4+4)
If Mem = 0
End
EndIf
CopyMemory(@array(0), Mem, #NMAX*4+4)
; Time the QuickSort algorithm
ftime(@t1)
QuickSort(Mem, #NMAX)
ftime(@t2)
message.s = "QuickSort : "+Str(t2\milli-t1\milli)+"ms"+Chr(13)+Chr(10)
; Time the internal SortArray() command
ftime(@t1)
SortArray(array(), 0)
ftime(@t2)
message + "SortArray : "+Str(t2\milli-t1\milli)+"ms"+Chr(13)+Chr(10)
MessageRequester("Result", message, 0)
FreeMemory(Mem)
Code: Select all
//---------------------------------------------------------------------------
#include <stdafx.h>
#include <stdlib.h>
#include <conio.h>
#include <stdio.h>
#include <sys\timeb.h>
#pragma hdrstop
#define gSz 500000
typedef struct {
int len;
int elem[gSz];
} vector;
void printv(char *str, vector* v)
{
int i = 0;
printf("%s", str);
for (; i < v->len; ++i)
printf("%d ", v->elem[i]);
printf("\n");
}
#define MAX(a, b) ((a > b) ? (a) : (b))
#define MIN(a, b) ((a < b) ? (a) : (b))
int partition(vector *v, int p, int r)
{
int x;
int i;
int j;
if (p + 3 < r)
x = MIN(v->elem[p], MAX(v->elem[p+1], v->elem[p+2]));
else
x = v->elem[p];
i = p - 1;
j = r + 1;
while (1) {
do {
--j;
} while (v->elem[j] > x);
do {
++i;
} while (v->elem[i] < x);
if (i < j) {
int aux = v->elem[j];
v->elem[j] = v->elem[i];
v->elem[i] = aux;
} else
return j;
}
}
void quicksort_real(vector *v, int p, int r) {
if (p < r) {
int q = partition(v, p, r);
quicksort_real(v, p, q);
quicksort_real(v, q + 1, r);
}
}
void quicksort(vector *v) {
printf("Quicksort...\n");
quicksort_real(v, 0, v->len - 1);
}
int main(int argc, char* argv[])
{
int i;
int lDiv;
vector lV;
unsigned long lS;
struct timeb t;
lV.len = gSz;
for ( i = 0; i < lV.len; ++i ) {
lDiv = rand();
lV.elem[i] = (int)(rand()/(lDiv==0?1:lDiv));
}
printf("Vector populated; Hit any key to quicksort...");
getch();
ftime(&t);
lS = t.millitm;
quicksort(&lV);
ftime(&t);
lS = t.millitm - lS;
printf("Elapsed milliseconds: %ld", lS);
return 0;
}
//---------------------------------------------------------------------------
I changed the ellapsedmilliseconds() to ftime as I suspected ftime isnt so very hi res with timing, infact with ellapsedmilliseconds() gives "slower" results on our computers; which shows that there is some logic there.
Posted: Sun Feb 10, 2008 9:42 am
by superadnim
I didn't happen to read both sources but since it's a benchmark and it relates sorting algorithms.... Are you actually giving both methods the SAME data or is the second test performed with the sorted array? because that would explain for one the inconclusive results you're getting.
Posted: Sun Feb 10, 2008 10:44 am
by kinglestat
you can see in the code
data is random every time
Posted: Sun Feb 10, 2008 1:44 pm
by pdwyer
what were the times? (differences I mean)
My qsort proc is the same speed as yours (edit, actually mine is a little slower

). I got rid of all the time calls you made to msvcrt.lib though because I was getting answers with negative times. I wonder how powerbasic would fare, it has register variable options.
My version, which isn't as fast as PB's internal one BTW looks like below. I wonder if it was done with pointers instead of array calls if it would be faster...?
Maybe you'll just have to accept that his is bigger than yours
Code: Select all
Procedure QuickSortLong(List.l(1), First.l, last.l)
MedVal.l
hi.l
lo.l
If First >= Last
ProcedureReturn
EndIf
MedVal = list(first)
lo = first
hi = last
While 1
While list(hi) >= medval
hi = hi - 1
If hi <= lo
Break
EndIf
Wend
If hi <= lo
list(lo) = medval
Break
EndIf
list(lo) = list(hi) ;swap Vals
lo = lo + 1
While list(lo) < medval
lo = lo + 1
If lo >= hi
Break
EndIf
Wend
If lo >= hi
lo = hi
list (hi) = medVal
Break
EndIf
list(hi) = list(lo) ;Swap Vals
Wend
QuickSortLong(list(), first, lo -1)
QuickSortLong(list(), lo +1, last)
EndProcedure
;====================================
Dim Larray.l(20)
For i = 0 To 20
Larray(i) = Random($FFFF)
Debug Larray(i)
Next
QuickSortLong(Larray(), 0,20)
Debug "----------"
For i = 0 To 20
Debug Larray(i)
Next
Posted: Sun Feb 10, 2008 3:23 pm
by pdwyer
I tried removing all of the arrays and swaping them for pointer arithmatic and it was not really any different, possibly slightly slower

.
It looks more sophisticated though with those square brackets
(also, making some vars global slows it down too)
Code: Select all
Structure MemArray
Lng.l[0]
EndStructure
Procedure QuickSortLong(pList.l, First.l, last.l)
If Last > first
MedVal.l
hi.l
lo.l
*LstArray.MemArray = pList;@List()
MedVal = *LstArray\lng[first]
lo = first
hi = last
While 1
While *LstArray\lng[hi] >= medval
hi = hi - 1
If hi <= lo
Break
EndIf
Wend
If hi <= lo
*LstArray\lng[lo] = medval
Break
EndIf
*LstArray\lng[lo] = *LstArray\lng[hi] ;swap Vals
lo = lo + 1
While *LstArray\lng[lo] < medval
lo = lo + 1
If lo >= hi
Break
EndIf
Wend
If lo >= hi
lo = hi
*LstArray\lng[hi] = medVal
Break
EndIf
*LstArray\lng[hi] = *LstArray\lng[lo] ;Swap Vals
Wend
QuickSortLong(pList, first, lo -1)
QuickSortLong(pList, lo +1, last)
EndIf
EndProcedure
#ArraySize = 5
Dim Larray.l(#ArraySize)
*pLArray.MemArray = @Larray()
For i = 0 To #ArraySize
Larray(i) = Random($FF)
Debug Larray(i)
Next
QuickSortLong(@Larray(), 0,#ArraySize)
Debug "----Sorted------"
For i = 0 To #ArraySize
Debug Larray(i)
;Debug "-" + Str(*pLArray\lng[i])
Next
Posted: Sun Feb 10, 2008 4:51 pm
by hellhound66
Removed.ry.
Posted: Sun Feb 10, 2008 6:07 pm
by kinglestat
well, that sort is from the codearchive
any merits to speed is to blbltheworm + Andre
I only adapted it
but there are 2 sorts in the PB source
a custom sort and a sort using SortArray (PB inbuilt sort function) which is the faster of the 2...probably due to pointer swaps rather than assiging 1st to temp
Speeds are as follows
PB
QuickSort: 125
SortArray: 94
C/C++
Borland 84 Avg
Microsoft 100 Avg
Watcom 140 Avg
re negative timing I think ftime is not accurate for high resultion timing, so on fast computers it might throw the timings off
Posted: Sun Feb 10, 2008 7:06 pm
by Rook Zimbabwe
See PB SortArray() is faster than MS and Watcom!
What version numbers of each... I would figure the latest versions of each would be more optimized!

Posted: Sun Feb 10, 2008 7:08 pm
by hellhound66
Removed.
Posted: Sun Feb 10, 2008 7:11 pm
by Demivec
hellhound66 wrote:Offtopic:
See PB is faster than MS and Watcom!
Yeah, sure! And pigs can fly..
More Offtopic, maybe they're PBpigs.
