Page 1 of 1

Optimizations

Posted: Tue Nov 13, 2007 7:57 pm
by Armoured
A more optimized code!

Re: Optimizations

Posted: Tue Nov 13, 2007 8:33 pm
by Psychophanta
Seconded.
Armoured wrote:A more optimized code!
Yes, more than C please. 8)

Posted: Tue Nov 13, 2007 8:42 pm
by milan1612
+1 although I don't know how optimized the output already is...

Re: Optimizations

Posted: Tue Nov 13, 2007 8:51 pm
by freak
Psychophanta wrote:Yes, more than C please. 8)
lol! :lol:

You do realize that companies developing industrial grade C compilers spend
years with large teams on their code optimisation ?
I wonder how you expect us to compete with that...

Posted: Tue Nov 13, 2007 8:56 pm
by thefool
Just use some magic

Posted: Tue Nov 13, 2007 8:59 pm
by nco2k
and your fantaisie. :D

c ya,
nco2k

Posted: Tue Nov 13, 2007 9:02 pm
by Foz
Just add a redundant syntax that you can add just before a procedure: OptimiseProcedureBetterThanCorC++

Or even better a little checkbox in the compiler options that says the same thing...

It doesn't have to do anything at all :)

Posted: Tue Nov 13, 2007 9:16 pm
by Trond
Foz wrote:Just add a redundant syntax that you can add just before a procedure: OptimiseProcedureBetterThanCorC++

Or even better a little checkbox in the compiler options that says the same thing...

It doesn't have to do anything at all :)
Excellent idea! However, I think it's already patented and in use by Microsoft, see here: http://www.purebasic.fr/english/viewtopic.php?t=29526

But we don't care about software patents anyways. :twisted:

Posted: Wed Nov 14, 2007 4:32 pm
by Trond
I found the checkbox!!!

Image

Posted: Tue Dec 04, 2007 9:01 pm
by Trond
Here's a real-world sample of what can be done (with a LOT of time):

PB code:

Code: Select all

Structure SFrameMap
  StructureUnion
    Alloc.b[1048576/8]
    BitsL.l[0]
  EndStructureUnion
EndStructure

Global Frames.SFrameMap

Procedure UnlockFrame(Frame)
  Index = Frame / 32
  Offset = Frame % 32
  Frames\BitsL[Index] &~ (1 << Offset)
EndProcedure
Currently generated code:

Code: Select all

_Procedure0:
  PUSH   ebp
  PUSH   ebx
  PUSH   edi
  PS0=24
  XOR    eax,eax
  PUSH   eax
  PUSH   eax                                                                                                                                                                                    
; Index = Frame / 32
  MOV    ebx,dword [esp+PS0+0]
  MOV    eax,ebx
  MOV    ebx,32
  CDQ
  IDIV   ebx
  MOV    ebx, eax
  MOV    dword [esp],ebx
; Offset = Frame % 32
  MOV    ebx,dword [esp+PS0+0]
  AND    ebx,31
  MOV    dword [esp+4],ebx
; Frames\BitsL[Index] &~ (1 << Offset)
  LEA    ebp,[v_Frames]
  PUSH   ebp
  MOV    eax,dword [esp+4]
  SAL    eax,2
  POP    ebp
  ADD    ebp,eax
  MOV    ebx,dword [ebp]
  MOV    edi,dword [esp+4]
  MOV    ecx,edi
  MOV    eax,1
  SAL    eax,cl
  MOV    edi,eax
  NOT    edi
  AND    ebx,edi
  PUSH   ebx
  LEA    ebp,[v_Frames]
  PUSH   ebp
  MOV    eax,dword [esp+8]
  SAL    eax,2
  POP    ebp
  ADD    ebp,eax
  POP    eax
  MOV    dword [ebp],eax
; EndProcedure
  XOR    eax,eax
_EndProcedure1:
  ADD    esp,8
  POP    edi
  POP    ebx
  POP    ebp
  RET    4
Optimized code:

Code: Select all

_Procedure0:
MOV     eax, [esp+4]
MOV     ecx, eax
SHR     eax, 5
AND     ecx, 31
MOV     edx, 1
SHL     edx, cl
NOT     edx
AND     [eax*4+frames], edx
RET     4
Out of curiousity, I'm going to see what GCC does as well.

Edit: gcc 3.4.4 with -O3:

Code: Select all

_UnlockFrame@4:
	push	ebp
	mov	eax, -2
	mov	ebp, esp
	mov	ecx, DWORD PTR [ebp+8]
	pop	ebp
	mov	edx, ecx
	and	ecx, 31
	shr	edx, 5
	rol	eax, cl
	and	DWORD PTR _Frames[0+edx*4], eax
	ret	4
That's pretty amazingly fast, but the debugging part of the code is actually WRONG because it creates a stack frame which seems to be destroyed too early.
Also, GCC got a very unfair advantage by having only all variables specified as unsigned. Let's see what it does when everything is signed...

Code: Select all

_UnlockFrame@4:
	push	ebp
	mov	ebp, esp
	mov	ecx, DWORD PTR [ebp+8]
	test	ecx, ecx
	mov	eax, ecx
	js	L5
	pop	ebp
	mov	edx, eax
	sar	edx, 5
	mov	eax, edx
	sal	eax, 5
	sub	ecx, eax
	mov	eax, -2
	rol	eax, cl
	and	DWORD PTR _Frames[0+edx*4], eax
	ret	4
	.p2align 4,,7
L5:
	pop	ebp
	lea	eax, [ecx+31]
	mov	edx, eax
	sar	edx, 5
	mov	eax, edx
	sal	eax, 5
	sub	ecx, eax
	mov	eax, -2
	rol	eax, cl
	and	DWORD PTR _Frames[0+edx*4], eax
	ret	4
Wow, that was big!
It still pops the stack frame too early, so debugging data will report a crash in the wrong function if something goes wrong. The PB code is at least correct.

Re: Optimizations

Posted: Thu Feb 21, 2008 6:10 am
by va!n
Armoured wrote:A more optimized code!
+1 :!: :wink:

Re: Optimizations

Posted: Thu Feb 21, 2008 10:57 am
by dell_jockey
freak wrote: You do realize that companies developing industrial grade C compilers spend years with large teams on their code optimisation ?
I wonder how you expect us to compete with that...
of course, nobody expects Fantaisie to compete with these large outfits. There's a way out though: don't use assembly as an intermediate language but have PB create C, so you can leverage the "years that large teams spend on their code optimisation".
A few of those that elected to go this route: Eiffel, SmartEiffel, GLBasic, BCX, ....

Posted: Thu Feb 21, 2008 12:32 pm
by Trond
A few of those that elected to go this route: Eiffel, SmartEiffel, GLBasic, BCX, ....
Compiling a small project in GLBasic took a whopping 21 seconds.

Posted: Thu Feb 21, 2008 12:37 pm
by BriceManuel
Trond wrote:Compiling a small project in GLBasic took a whopping 21 seconds.
Wait until you get something over 10,000 lines.

Posted: Thu Feb 21, 2008 2:22 pm
by dell_jockey
Trond wrote:Compiling a small project in GLBasic took a whopping 21 seconds.
of course there are disadvantages as well.... but if you're a small outfit and want good code optimisation nonetheless it might be an approach to take.