Possible Compiler Optimizations

Everything else that doesn't fall into one of the other PB categories.
User avatar
mback2k
Enthusiast
Enthusiast
Posts: 257
Joined: Sun Dec 02, 2007 12:11 pm
Location: Germany

Possible Compiler Optimizations

Post by mback2k »

Hello everyone,

before I ask for compiler optimizations I would like to understand why the compiler currently does things like the following examples. Maybe the PureBasic team can elaborate a little bit about the reasons?

Let's assume the following PureBasic code:

Code: Select all

Procedure Test()
  Protected A, B
  A = Random(100)
  B = A
  ProcedureReturn B
EndProcedure
Which results in the following ASM code:

Code: Select all

; Procedure Test()
macro MP0{
_Procedure0:
  PS0=64
  XOR    rax,rax
  PUSH   rax
  PUSH   rax
  SUB    rsp,40                                                                                                                                                                                                                                                        
; Protected A, B
; A = Random(100)
  PUSH   qword 100
  POP    rcx
  CALL   PB_Random_THREAD
  MOV    qword [rsp+40],rax
; B = A
  PUSH   qword [rsp+40]
  POP    rax
  MOV    qword [rsp+48],rax
; ProcedureReturn B
  MOV    rax,qword [rsp+48]
  JMP   _EndProcedure1
; EndProcedure
  XOR    rax,rax
_EndProcedure1:
  ADD    rsp,56
  RET
}
Example 1; the following extract from the code above:

Code: Select all

; A = Random(100)
  PUSH   qword 100
  POP    rcx
  CALL   PB_Random_THREAD
  MOV    qword [rsp+40],rax
Could be shortened to:

Code: Select all

; A = Random(100)
  MOV    rcx,qword 100
  CALL   PB_Random_THREAD
  MOV    qword [rsp+40],rax
Example 2; the following extract from the code above:

Code: Select all

; B = A
  PUSH   qword [rsp+40]
  POP    rax
  MOV    qword [rsp+48],rax
Could be shortened to:

Code: Select all

; B = A
  MOV    rax,qword [rsp+40]
  MOV    qword [rsp+48],rax
But it could also be done as:

Code: Select all

; B = A
  PUSH   qword [rsp+40]
  POP    qword [rsp+48]
Question: Why does the current PB compiler make use of the stack and registers to move variable contents like this?

Thanks in advance!

[compiled using PB 5.11 x64 on Windows]
[Edit 1: updated to clarify code examples]
Last edited by mback2k on Sun May 12, 2013 4:54 pm, edited 2 times in total.
User avatar
Tenaja
Addict
Addict
Posts: 1948
Joined: Tue Nov 09, 2010 10:15 pm

Re: Possible Compiler Optimizations

Post by Tenaja »

Deleted.
Last edited by Tenaja on Sun May 12, 2013 5:30 pm, edited 1 time in total.
User avatar
mback2k
Enthusiast
Enthusiast
Posts: 257
Joined: Sun Dec 02, 2007 12:11 pm
Location: Germany

Re: Possible Compiler Optimizations

Post by mback2k »

Thanks for your answer, but I don't think you understood the purpose of those examples. The examples 1 and 2 show the result of a single compiled PureBasic line, therefore it is indeed possible to optimize the code that is produced in a single compiler pass.

1. Yes, I understand the purpose of those PUSHes you are talking about and I actually didn't question those.
2. Yes, it is a one/single pass compiler, but the example is the result of a single line.
3. I am not talking about the irrelevant variable, but the the way the value is moved from A to B. Please take a look at each example and the optimization proposals below each of them.

I am sorry if my initial post was not clear enough and hope that this clarified what I am talking about. tl;dr: I am talking about ASM optimizations for a single PureBasic line.

Thanks.
User avatar
Tenaja
Addict
Addict
Posts: 1948
Joined: Tue Nov 09, 2010 10:15 pm

Re: Possible Compiler Optimizations

Post by Tenaja »

I just looked at the 32-bit asm, and it does use just the push/call combination in favor of the push/pop/call.

Hopefully, if Fred can improve this in the 64-bit output, he will. I was wondering why some algorithms are slower in 64-bit mode; perhaps he only improved the 32-bit compiler at one time.

And btw, the fact that it is on one line (or command) is irrelevant in optimizing. What matters is where the asm instructions are generated, and if Fred wants to complicate the procedures that generate those instructions.
User avatar
Tenaja
Addict
Addict
Posts: 1948
Joined: Tue Nov 09, 2010 10:15 pm

Re: Possible Compiler Optimizations

Post by Tenaja »

I just saw this in the History file, under heading dated 1st December 2009 : Version 4.40:
- Added: Peephole optimizer to 64-bit versions to produce better code
If this is true, then Fred can certainly improve the code. If he has the time.
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Possible Compiler Optimizations

Post by idle »

maybe this article on x64 might help explain
http://codejury.com/a-walk-in-x64-land/
Windows 11, Manjaro, Raspberry Pi OS
Image
Fred
Administrator
Administrator
Posts: 16619
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Possible Compiler Optimizations

Post by Fred »

The PureBasic compiler has been created using a stack call based model (useful for x86) so the whole pipeline use this and it's a very big task to add a new mode to handle a register based model (useful for x64). So for now, it still use the stack and unpop the register as need before calling the function. Even if it does look ugly (and I do agree it would be better without that), it won't be a big bottleneck performance wise as all is in the cache. The peephole optimizer already takes care of several standard cases to produce smaller output, and I could add more if needed.
User avatar
mback2k
Enthusiast
Enthusiast
Posts: 257
Joined: Sun Dec 02, 2007 12:11 pm
Location: Germany

Re: Possible Compiler Optimizations

Post by mback2k »

Thanks for the insight, Fred. I actually thought that the stack call based model is the (historic) reason. Thanks for the confirmation. :-)

I think it would be great if you could add the case of the second example to the peephole optimizer. Could you also elaborate a little bit on what is possible with the peephole optimizer?

I also spotted another case, please take a look at the x64 ASM code generated from the following code and compare the assignments between the two With-sections:

Code: Select all

Structure TestStruc
  A.Integer
  B.Integer
EndStructure

Define Test.TestStruc

With Test
  \A\i = 1
  \B\i = 2
EndWith

Define *Ptr.TestStruc = @Test

With *Ptr
  \A\i = 1
  \B\i = 2
EndWith
The first With-section results in the following code:

Code: Select all

; With Test
; \A\i = 1
  LEA    rbp,[v_Test]
  MOV    qword [rbp],1
; \B\i = 2
  MOV    qword [rbp+8],2
; EndWith
The second one in the following code:

Code: Select all

; With *Ptr
; \A\i = 1
  MOV    rbp,qword [p_Ptr]
  MOV    qword [rbp],1
; \B\i = 2
  MOV    rbp,qword [p_Ptr]
  MOV    qword [rbp+8],2
; EndWith
It would be great if the rpb-register could only be set once, even for pointers (and probably other dynamic types, like lists, etc.).

I know that those are very small optimizations which do not matter that much. But I still think that small optimizations can contribute to an overall better product.
Thanks in advance!
Fred
Administrator
Administrator
Posts: 16619
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Possible Compiler Optimizations

Post by Fred »

Basically the peephole optimizer parse the generated assembly code, detect standard patterns, and if it matches replace them in-place with a smaller/better code.
User avatar
mback2k
Enthusiast
Enthusiast
Posts: 257
Joined: Sun Dec 02, 2007 12:11 pm
Location: Germany

Re: Possible Compiler Optimizations

Post by mback2k »

Thanks for implementing the first optimization mentioned in each of my posts above. What about the second ones?

Code: Select all

; B = A
  PUSH   qword [rsp+40]
  POP    rax
  MOV    qword [rsp+48],rax

Code: Select all

; With *Ptr
; \A\i = 1
  MOV    rbp,qword [p_Ptr]
  MOV    qword [rbp],1
; \B\i = 2
  MOV    rbp,qword [p_Ptr]
  MOV    qword [rbp+8],2
; EndWith
User avatar
Tristano
Enthusiast
Enthusiast
Posts: 190
Joined: Thu Nov 26, 2015 6:52 pm
Location: Italy
Contact:

Re: Possible Compiler Optimizations

Post by Tristano »

I wanted to ask some clarification about Fred's statement:
Fred wrote:Basically the peephole optimizer parse the generated assembly code, detect standard patterns, and if it matches replace them in-place with a smaller/better code.
... so, does this imply that when producing the PureBasic.asm file via the compiler switch /COMMENTED, the assembly code thus produced is still unoptimized?

If this is the case, it means that any manual code editing done to the PureBasic.asm file will also undergo optimization when compiling it with /REASM?

Also, this thread is a few years old, have things changed since, regarding compiler optimizations?
The PureBASIC Archives: FOSS Resources:
Post Reply