PureBasic Forums - English

Posted: **Tue Nov 13, 2007 7:57 pm**

A more optimized code!

Posted: **Tue Nov 13, 2007 8:33 pm**

Seconded.

Armoured wrote:A more optimized code!

Yes, more than C please.

Posted: **Tue Nov 13, 2007 8:42 pm**

+1 although I don't know how optimized the output already is...

Posted: **Tue Nov 13, 2007 8:51 pm**

Psychophanta wrote:Yes, more than C please.

lol!

You do realize that companies developing industrial grade C compilers spend
years with large teams on their code optimisation ?
I wonder how you expect us to compete with that...

Posted: **Tue Nov 13, 2007 8:56 pm**

Just use some magic

Posted: **Tue Nov 13, 2007 8:59 pm**

and your fantaisie.

c ya,
nco2k

Posted: **Tue Nov 13, 2007 9:02 pm**

Just add a redundant syntax that you can add just before a procedure: OptimiseProcedureBetterThanCorC++

Or even better a little checkbox in the compiler options that says the same thing...

It doesn't have to do anything at all

Posted: **Tue Nov 13, 2007 9:16 pm**

Foz wrote:Just add a redundant syntax that you can add just before a procedure: OptimiseProcedureBetterThanCorC++

Or even better a little checkbox in the compiler options that says the same thing...

It doesn't have to do anything at all

Excellent idea! However, I think it's already patented and in use by Microsoft, see here: http://www.purebasic.fr/english/viewtopic.php?t=29526

But we don't care about software patents anyways.

Posted: **Wed Nov 14, 2007 4:32 pm**

I found the checkbox!!!

Posted: **Tue Dec 04, 2007 9:01 pm**

Here's a real-world sample of what can be done (with a LOT of time):

PB code:

Code: Select all

Structure SFrameMap
  StructureUnion
    Alloc.b[1048576/8]
    BitsL.l[0]
  EndStructureUnion
EndStructure

Global Frames.SFrameMap

Procedure UnlockFrame(Frame)
  Index = Frame / 32
  Offset = Frame % 32
  Frames\BitsL[Index] &~ (1 << Offset)
EndProcedure

Currently generated code:

Code: Select all

_Procedure0:
  PUSH   ebp
  PUSH   ebx
  PUSH   edi
  PS0=24
  XOR    eax,eax
  PUSH   eax
  PUSH   eax                                                                                                                                                                                    
; Index = Frame / 32
  MOV    ebx,dword [esp+PS0+0]
  MOV    eax,ebx
  MOV    ebx,32
  CDQ
  IDIV   ebx
  MOV    ebx, eax
  MOV    dword [esp],ebx
; Offset = Frame % 32
  MOV    ebx,dword [esp+PS0+0]
  AND    ebx,31
  MOV    dword [esp+4],ebx
; Frames\BitsL[Index] &~ (1 << Offset)
  LEA    ebp,[v_Frames]
  PUSH   ebp
  MOV    eax,dword [esp+4]
  SAL    eax,2
  POP    ebp
  ADD    ebp,eax
  MOV    ebx,dword [ebp]
  MOV    edi,dword [esp+4]
  MOV    ecx,edi
  MOV    eax,1
  SAL    eax,cl
  MOV    edi,eax
  NOT    edi
  AND    ebx,edi
  PUSH   ebx
  LEA    ebp,[v_Frames]
  PUSH   ebp
  MOV    eax,dword [esp+8]
  SAL    eax,2
  POP    ebp
  ADD    ebp,eax
  POP    eax
  MOV    dword [ebp],eax
; EndProcedure
  XOR    eax,eax
_EndProcedure1:
  ADD    esp,8
  POP    edi
  POP    ebx
  POP    ebp
  RET    4

Optimized code:

Code: Select all

_Procedure0:
MOV     eax, [esp+4]
MOV     ecx, eax
SHR     eax, 5
AND     ecx, 31
MOV     edx, 1
SHL     edx, cl
NOT     edx
AND     [eax*4+frames], edx
RET     4

Out of curiousity, I'm going to see what GCC does as well.

Edit: gcc 3.4.4 with -O3:

Code: Select all

_UnlockFrame@4:
	push	ebp
	mov	eax, -2
	mov	ebp, esp
	mov	ecx, DWORD PTR [ebp+8]
	pop	ebp
	mov	edx, ecx
	and	ecx, 31
	shr	edx, 5
	rol	eax, cl
	and	DWORD PTR _Frames[0+edx*4], eax
	ret	4

That's pretty amazingly fast, but the debugging part of the code is actually WRONG because it creates a stack frame which seems to be destroyed too early.
Also, GCC got a very unfair advantage by having only all variables specified as unsigned. Let's see what it does when everything is signed...

Code: Select all

_UnlockFrame@4:
	push	ebp
	mov	ebp, esp
	mov	ecx, DWORD PTR [ebp+8]
	test	ecx, ecx
	mov	eax, ecx
	js	L5
	pop	ebp
	mov	edx, eax
	sar	edx, 5
	mov	eax, edx
	sal	eax, 5
	sub	ecx, eax
	mov	eax, -2
	rol	eax, cl
	and	DWORD PTR _Frames[0+edx*4], eax
	ret	4
	.p2align 4,,7
L5:
	pop	ebp
	lea	eax, [ecx+31]
	mov	edx, eax
	sar	edx, 5
	mov	eax, edx
	sal	eax, 5
	sub	ecx, eax
	mov	eax, -2
	rol	eax, cl
	and	DWORD PTR _Frames[0+edx*4], eax
	ret	4

Wow, that was big!
It still pops the stack frame too early, so debugging data will report a crash in the wrong function if something goes wrong. The PB code is at least correct.

Posted: **Thu Feb 21, 2008 6:10 am**

Armoured wrote:A more optimized code!

+1

Posted: **Thu Feb 21, 2008 10:57 am**

freak wrote: You do realize that companies developing industrial grade C compilers spend years with large teams on their code optimisation ?
I wonder how you expect us to compete with that...

of course, nobody expects Fantaisie to compete with these large outfits. There's a way out though: don't use assembly as an intermediate language but have PB create C, so you can leverage the "years that large teams spend on their code optimisation".
A few of those that elected to go this route: Eiffel, SmartEiffel, GLBasic, BCX, ....

Posted: **Thu Feb 21, 2008 12:32 pm**

A few of those that elected to go this route: Eiffel, SmartEiffel, GLBasic, BCX, ....

Compiling a small project in GLBasic took a whopping 21 seconds.

Posted: **Thu Feb 21, 2008 12:37 pm**

Trond wrote:Compiling a small project in GLBasic took a whopping 21 seconds.

Wait until you get something over 10,000 lines.

Posted: **Thu Feb 21, 2008 2:22 pm**

Trond wrote:Compiling a small project in GLBasic took a whopping 21 seconds.

of course there are disadvantages as well.... but if you're a small outfit and want good code optimisation nonetheless it might be an approach to take.

PureBasic Forums - English

Optimizations

Optimizations

Re: Optimizations

Re: Optimizations

Re: Optimizations

Re: Optimizations