Page 3 of 3
Re: Optimize code size
Posted: Mon Jul 30, 2012 4:47 pm
by TI-994A
Tenaja wrote:TI-994A wrote:...but coding in Assembly isn't exactly within the reach of the average PureBasic programmer...
I would venture to guess that "very good" code is also not within the reach of the average programmer! (PB or not!)
But
good, clean code is.

Re: Optimize code size
Posted: Mon Jul 30, 2012 8:15 pm
by xorc1zt
Code: Select all
Procedure a (va.i, vb.i, vc.i, vd.i, vf.i)
Define r = va+vb-vc-vd+vf
If (r = 0)
Debug "0"
ElseIf (r = 1)
Debug "1"
ElseIf (r = 2)
Debug "2"
EndIf
EndProcedure
a(1,3,5,3,5)
a(1,3,5,3,5)
a(1,3,5,3,5)
the calls
Code: Select all
; a(1,3,5,3,5)
SUB rsp,8
PUSH qword 5
PUSH qword 3
PUSH qword 5
PUSH qword 3
PUSH qword 1
POP rcx
POP rdx
POP r8
POP r9
SUB rsp,32
CALL _Procedure0
ADD rsp,48
; a(1,3,5,3,5)
SUB rsp,8
PUSH qword 5
PUSH qword 3
PUSH qword 5
PUSH qword 3
PUSH qword 1
POP rcx
POP rdx
POP r8
POP r9
SUB rsp,32
CALL _Procedure0
ADD rsp,48
; a(1,3,5,3,5)
SUB rsp,8
PUSH qword 5
PUSH qword 3
PUSH qword 5
PUSH qword 3
PUSH qword 1
POP rcx
POP rdx
POP r8
POP r9
SUB rsp,32
CALL _Procedure0
ADD rsp,48

Re: Optimize code size
Posted: Tue Jul 31, 2012 3:22 pm
by fromVB
TI-994A wrote:Even the slightest optimization is still good news. Thanks for pointing that out, luis.
luis wrote:... there is still the fact some unused procedure are compiled in the final exe, if we want to consider the "size" side of things...
Knowing that PureBasic is a single-pass compiler, bloat caused by the failure to remove unused variables/constants/procedures/etc. is the programmer's fault. A good programmer should not only be able to code well, but should also know the capabilities and limitations of the tools and resources he uses.
luis wrote:... The code you write in PB can be great PB code, but sometime you could have written it in a total different way in hand-made ASM, so it's not a guarantee optimized code in PB can give the best optimized ASM code for that specific problem. Hand written ASM using a different approach could be better ... the code generated by the compiler, but it will always be "bloated" compared to hand written ASM ... The most astute compiler cannot beat the average ASM programmer in the way the general code layout is generated ...
It would be highly unfair to compare PureBasic-generated Assembly with manually-coded Assembly; automated translations will invariably be less efficient than manual ones, and that's true for all compilers. But still, the important thing is that PureBasic and FASM does not
worsen your code. It all comes down to the programmer; if you write good, clean code, PureBasic would be able to produce fairly decent and efficient executables. If you don't, no amount of manual Assembly-tweaking, and no optimizer in the world would be able to help you.
Thank you for taking the time to explain the intricacies to me; your points directed me to some good resources in the forum, which have clarified my concerns.
So it is translated to assembly that is not optimized? I'm confused.
Re: Optimize code size
Posted: Tue Jul 31, 2012 4:26 pm
by wilbert
fromVB wrote:So it is translated to assembly that is not optimized? I'm confused.
Source code
PB output
Code: Select all
; A = 20
MOV dword [v_A],20
; B = 32
MOV dword [v_B],32
; C = A / 4 + B / 4
MOV ebx,dword [v_A]
MOV eax,ebx
MOV ecx,4
CDQ
IDIV ecx
MOV ebx,eax
MOV edi,dword [v_B]
MOV eax,edi
MOV ecx,4
CDQ
IDIV ecx
MOV edi,eax
ADD ebx,edi
MOV dword [v_C],ebx
Hand optimized code
Code: Select all
mov eax, 20
mov edx, 32
mov dword [v_A], eax
mov dword [v_B], edx
add eax, edx
sar eax, 2
mov dword [v_C], eax
A human understands that A / 4 + B / 4 is the same as (A + B) / 4. The second approach is faster and requires less operations.
4 is a power of 2. Therefore a shift operation (sar) can be used to perform the division. This is faster compared to an integer division (idiv).
The hand optimized code also has less memory access (3 times instead of 5).
C compilers try to analyze the source code and perform optimizations like that automatically while PureBasic just does a literal translation.
Does that make PureBasic slow ? No. In most situations you won't notice the difference. But when you have code that is executed millions of times inside a loop it can be very rewarding to optimize that part of your application using hand written asm code.
Re: Optimize code size
Posted: Tue Jul 31, 2012 4:46 pm
by Demivec
wilbert wrote:A human understands that A / 4 + B / 4 is the same as (A + B) / 4. The second approach is faster and requires less operations.
4 is a power of 2. Therefore a shift operation (sar) can be used to perform the division. This is faster compared to an integer division (idiv).
The hand optimized code also has less memory access (3 times instead of 5).
C compilers try to analyze the source code and perform optimizations like that automatically while PureBasic just does a literal translation.
Does that make PureBasic slow ? No. In most situations you won't notice the difference. But when you have code that is executed millions of times inside a loop it can be very rewarding to optimize that part of your application using hand written asm code.
Hand optimizing the assembly is not needed in the simple case you presented. Instead what is needed is hand optimizing the PureBasic before it is compiled.
Original PureBasic source code:
Here are your hand optimized assembly results:
Code: Select all
mov eax, 20
mov edx, 32
mov dword [v_A], eax
mov dword [v_B], edx
add eax, edx
sar eax, 2
mov dword [v_C], eax
Compare to the assembly of hand optimized PureBasic source code:
Code: Select all
; A = 20
MOV dword [v_A],20
; B = 32
MOV dword [v_B],32
; C = (A + B) >> 2
MOV ebx,dword [v_a]
ADD ebx,dword [v_b]
SAR ebx,2
MOV dword [v_C],ebx
In this case they are almost identical. The hand optimized PureBasic source code actually uses one less MOV instruction when it is compiled but it still has 5 memory accesses. I am not skilled enough to know if this results in code that is faster or slower overall

.
The point of this illustration is that there are definite improvements to be gained from wisely structuring code in PureBasic. I agree that using assembly code directly can provide the greatest improvements, but as I demonstrated, it isn't necessary to go directly to assembly to obtain at least some of these improvements.
Re: Optimize code size
Posted: Tue Jul 31, 2012 4:57 pm
by wilbert
I tried to explain what compiler optimizations could look like.
It's not uncommon you can get similar results optimizing the source code.
Your remark demonstrates quite well that compiler optimizations have less impact on good quality source code.
Re: Optimize code size
Posted: Tue Jul 31, 2012 5:47 pm
by Tenaja
fromVB wrote:So it is translated to assembly that is not optimized? I'm confused.
This is correct. Check this out, for example:
Code: Select all
Procedure test1()
Protected b.i = 1
Protected c.i = 0
Protected d.i = 0
Protected e.i = 0
Protected f.i = 0
Protected g.i = 0
; Insert Code here...
EndProcedure
test1()
The Proc compiles to this:
Code: Select all
_Procedure0:
PS0=28
XOR eax,eax ; initialize all local variables to 0:
PUSH eax
PUSH eax
PUSH eax
PUSH eax
PUSH eax
PUSH eax
; Protected b.i = 1
MOV dword [esp],1
; Protected c.i = 0
MOV dword [esp+4],0 ; duplicate the initialization of variables to 0:
; Protected d.i = 0
MOV dword [esp+8],0
; Protected e.i = 0
MOV dword [esp+12],0
; Protected f.i = 0
MOV dword [esp+16],0
; Protected g.i = 0
MOV dword [esp+20],0
The coder is punished for exercising good programming practice of initializing every variable before use, because in this example, they are all initialized TWICE!
And regarding the previous example with the a/4 + b/4 vs (a+b) >> 2, that is the difference between a strong optimizing compiler (It does it automatically for you!) vs. one with NO optimizing, which requires YOU to code it all in a way that ends up more optimized. The biggest problem with this is that VERY FEW PB coders know how to even SEE the asm code, much less actually analyze it. Also, very few understand why one code snip compiles more efficiently than another. An optimizing compiler takes care of all of this for you so it just doesn't matter--the coder can be naive, and blissfully so, since the end result is about the same. Not so with PB; since its primary focus is beginners, who are still naive about these manual optimizations, it merrily produces poor code because of the skillset of the coder base.
And YES, it
IS SLOWER. How much slower may very well be irrelevant, in most situations. Consider if you are calculating an internal window size after a major window resize; nobody will notice a dozen extra instructions being executed, even on a slow cpu.
On the other hand, if you are performing numerous CPU-intensive calculations, those dozen extra instructions could easily turn into minutes, if not hours given a large enough task.
Re: Optimize code size
Posted: Tue Jul 31, 2012 6:08 pm
by Thorium
Not to forget that PB does not have any constructs to enable the programmer to use SIMD operations. With PB (and many other compilers) this is only possible using ASM. And having no SIMD means you are loosing a whole layer of parallelization.
For any parallelizable code SIMD is a great performance boost. And SIMD can't be applied automaticly by compilers. Some compilers like Intel C++ offer commands to use SIMD on the high language level.
Re: Optimize code size
Posted: Wed Aug 01, 2012 10:41 am
by TI-994A
Demivec wrote:...what is needed is hand optimizing the PureBasic before it is compiled...there are definite improvements to be gained from wisely structuring code in PureBasic....
wilbert wrote:Your remark demonstrates quite well that compiler optimizations have less impact on good quality source code.
Thanks for your illustrative examples; precisely the point I've been trying to make.
Tenaja wrote:...The coder is punished for exercising good programming practice of initializing every variable before use, because in this example, they are all initialized TWICE!
The coder is punished for
not knowing that PureBasic already initializes all variables with zero. Good practices are always contingent.
Tenaja wrote:(thread link)...the Java-written Open Office is so pathetically slow with a million values in a spreadsheet that it took nearly an hour to process a simple graph. In MS Excel, it was nearly instantaneous. There is NO WAY that C is that much faster than Java in ANY benchmark test... the Java implementation either has poor memory management, or poor coding...
An optimizing compiler takes care of all of this for you so it just doesn't matter--the coder can be naive, and blissfully so, since the end result is about the same.
Even highly optimized Java, with its bytecode, JVM, and JIT compilation, cannot help poor coding. Thank you for pointing that out, although it contradicts your second statement above. The results are clearly not the same.
Re: Optimize code size
Posted: Sun Aug 19, 2012 5:17 am
by Vitor_BossĀ®
Thank you for all codes and explanation about ASM coding. I tried optimize memory comparison procedure and got this
Code: Select all
Procedure CmpMem(*Source, *Test, Size)
push edi
push esi
mov eax, [esp+14h] ; Size
mov esi, [esp+10h] ; *Test2
mov edi, [esp+0Ch] ; *Source
dec eax
cmp eax, 3
jl loc_Byte_cmp_init ; if (Size > 4)
add esi, 0FFFFFFFDh ; Needed to alignment of last byte
add edi, 0FFFFFFFDh ; Needed to alignment of last byte
cmp eax, 7
jl loc_4Bytes_cmp ; if (Size > 8)
add esi, 0FFFFFFFCh ; Needed to alignment of last byte
add edi, 0FFFFFFFCh ; Needed to alignment of last byte
!loc_8Bytes_cmp:
movq mm0, [eax+edi] ;Source[i]
pcmpeqd mm0, [eax+esi] ;Test[i]
psrlq mm0, 16
movd edx, mm0
cmp edx, 0FFFFFFFFh
jne loc_return_0 ; If (Source[i]!=Test[i])
add eax, 0FFFFFFF8h ; i-=8
jl loc_return_1
cmp eax, 7
jg loc_8Bytes_cmp ; If (i>8)
mov al, 7
jmp loc_8Bytes_cmp ; If (i>7)
!loc_4Bytes_cmp:
mov edx, [eax+edi] ;Source[i]
cmp edx, [eax+esi] ;Test[i]
jnz loc_return_0 ; If (Source[i]!=Test[i])
add eax, 0FFFFFFFCh ; i-=4
jl loc_return_1
cmp eax, 2
jg loc_4Bytes_cmp ; If (i>3)
mov al, 3
jmp loc_4Bytes_cmp
!loc_Byte_cmp_init:
test eax, eax
jle loc_return_0 ;If (Size > 0)
!loc_Byte_cmp:
mov dl, [eax+edi] ;Source[i]
cmp dl, [eax+esi] ;Test[i]
jnz loc_return_0 ; If (Source[i]!=Test[i])
dec eax ; --i
jge loc_Byte_cmp ; If (i>=0)
!loc_return_1:
pop edi
pop esi
mov eax, 1
!retn 0ch
!loc_return_0:
pop edi
pop esi
EndProcedure
In a 10 millions test loop I got up to 50% speed improvement.
All processors since 98 has MMX right?
Fred you can use it if you want.
Re: Optimize code size
Posted: Mon Aug 20, 2012 8:26 pm
by Thorium
Check out the Rep and Scan instructions, they should speed it up more.
Or use 128bit SSE registers for the compare.
Re: Optimize code size
Posted: Tue Aug 28, 2012 4:50 am
by Vitor_BossĀ®
Thorium wrote:Check out the Rep and Scan instructions, they should speed it up more.
Or use 128bit SSE registers for the compare.
Updated with 64bit MMX instructions for a fastest results, all memory use a bus 64bits wide right?