Using the inline assembler brings more speed, but the readability of the code is gone, so I try to use PB commands all the time.
Analyzing the generated assembler code, it's obvious that the compiler is sometimes a bit naive (even though it generates really good code compared to others).
So, for example, code like this
a=12
b=a+1
first stores 12 in memory (variable a), then loads this value from memory again, adds 1 to it, and saves the result to memory (variable b).
Would it be possible to omit the reload of variable a (it is already in a register), and to store variable b only if no further computation is done or the register is needed, e.g. by using compiler directives?
I know that PB cannot produce hand-optimized assembler code, but is there still room for some *optional* speed enhancements for time-critical routines, or has the limit been reached?
Especially with the introduction of new variable types like fp64, I fear the performance could drop because of the additional flexibility.
Optional Speed Optimizations
Thomas,
You are right that it is possible to optimize.
But first, why does the compiler store the variable to memory before incrementing the value?
Because you, as a programmer, coded it like that!
If you had coded:
b = 12 + 1, then the compiler would have stored 12 in a register and incremented it before storing it to memory.
And if you had coded b = 13, it would have saved some steps.
Why did you write a = 12 and then b = a + 1?
Probably because you intended to use both a and b later. This the compiler cannot decide to change, except that a possible optimization exists: loading 12 into a register and discovering that the corresponding variable will be used in the next step.
So the generated code in this case could be:
MOV eax, 12
MOV [v_a], eax
INC eax
MOV [v_b], eax
This is optimized like that.
But what would happen if you wrote:
a = function(12)
b = a + 1
or anything more complicated...?
Then the executable code would have to call something else before storing the value of b, and in this case the registers may have changed a lot.
Well, optimization is possible, but it will make the compiler more complicated.
KRgrds
My avatar is a small copy of the 4 x 1.8 m image I created and exhibited at 'Le salon international du meuble à Paris' in January 2004, in Matt Sindall's 'Shades' designers exhibition. The original laminated print was designed using a 150 dpi printout.
Fred,
More ... it is optimized and clever because the compiler generates :
; a = 12
MOV dword [v_a],12
; b = a + 1
MOV ebx,dword [v_a]
INC ebx
MOV dword [v_b],ebx
; c = 1 + a
MOV ebx,dword [v_a]
INC ebx
MOV dword [v_c],ebx
Taking such a simple case is interesting, because it shows that you have to consider both possible ways to add 1 to a memory variable.
I hope folks understand how hard it can become to design a language while leaving coders as free as possible to write the way they like.
Anyway, I use PureBasic, love it, and find my way with it.
KRgrds
Re:
fweil wrote:
More ... it is optimized and clever because the compiler generates :
[...]
MOV ebx,dword [v_a]
INC ebx
MOV dword [v_b],ebx

On the German board they said that INC needs more cycles than ADD reg, 1...
I just wanted to note this...
Re: Re:
Mok wrote:
On the German board they said that INC needs more cycles than ADD reg, 1...
I just wanted to note this...

The code is from 2004.

Rules for assembler optimization change with every CPU generation.
Besides, INC does not need more cycles than ADD on most CPUs. Intel just suggests not to use INC anymore because Intel will not optimize that instruction any further. Currently INC and ADD run at exactly the same speed on most CPUs.