TI-994A wrote: ...but coding in Assembly isn't exactly within the reach of the average PureBasic programmer...
Tenaja wrote: I would venture to guess that "very good" code is also not within the reach of the average programmer! (PB or not!)
But good, clean code is.
Optimize code size
Re: Optimize code size
Texas Instruments TI-99/4A Home Computer: the first home computer with a 16bit processor, crammed into an 8bit architecture. Great hardware - Poor design - Wonderful BASIC engine. And it could talk too! Please visit my YouTube Channel 
Re: Optimize code size
Code:
Procedure a (va.i, vb.i, vc.i, vd.i, vf.i)
Define r = va+vb-vc-vd+vf
If (r = 0)
Debug "0"
ElseIf (r = 1)
Debug "1"
ElseIf (r = 2)
Debug "2"
EndIf
EndProcedure
a(1,3,5,3,5)
a(1,3,5,3,5)
a(1,3,5,3,5)
Code:
; a(1,3,5,3,5)
SUB rsp,8
PUSH qword 5
PUSH qword 3
PUSH qword 5
PUSH qword 3
PUSH qword 1
POP rcx
POP rdx
POP r8
POP r9
SUB rsp,32
CALL _Procedure0
ADD rsp,48
; a(1,3,5,3,5)
SUB rsp,8
PUSH qword 5
PUSH qword 3
PUSH qword 5
PUSH qword 3
PUSH qword 1
POP rcx
POP rdx
POP r8
POP r9
SUB rsp,32
CALL _Procedure0
ADD rsp,48
; a(1,3,5,3,5)
SUB rsp,8
PUSH qword 5
PUSH qword 3
PUSH qword 5
PUSH qword 3
PUSH qword 1
POP rcx
POP rdx
POP r8
POP r9
SUB rsp,32
CALL _Procedure0
ADD rsp,48
Re: Optimize code size
TI-994A wrote:
luis wrote: Actually Fred said use some kind of peephole optimization ...
Even the slightest optimization is still good news. Thanks for pointing that out, luis.
luis wrote: ... there is still the fact some unused procedures are compiled in the final exe, if we want to consider the "size" side of things...
Knowing that PureBasic is a single-pass compiler, bloat caused by the failure to remove unused variables/constants/procedures/etc. is the programmer's fault. A good programmer should not only be able to code well, but should also know the capabilities and limitations of the tools and resources he uses.
luis wrote: ... The code you write in PB can be great PB code, but sometimes you could have written it in a totally different way in hand-made ASM, so it's not a guarantee optimized code in PB can give the best optimized ASM code for that specific problem. Hand written ASM using a different approach could be better ... the code generated by the compiler, but it will always be "bloated" compared to hand written ASM ... The most astute compiler cannot beat the average ASM programmer in the way the general code layout is generated ...
It would be highly unfair to compare PureBasic-generated Assembly with manually-coded Assembly; automated translations will invariably be less efficient than manual ones, and that's true for all compilers. But still, the important thing is that PureBasic and FASM do not worsen your code. It all comes down to the programmer; if you write good, clean code, PureBasic would be able to produce fairly decent and efficient executables. If you don't, no amount of manual Assembly-tweaking, and no optimizer in the world would be able to help you.
Thank you for taking the time to explain the intricacies to me; your points directed me to some good resources in the forum, which have clarified my concerns.

So it is translated to assembly that is not optimized? I'm confused.
Re: Optimize code size
fromVB wrote: So it is translated to assembly that is not optimized? I'm confused.
Source code
Code:
A = 20
B = 32
C = A / 4 + B / 4
Code:
; A = 20
MOV dword [v_A],20
; B = 32
MOV dword [v_B],32
; C = A / 4 + B / 4
MOV ebx,dword [v_A]
MOV eax,ebx
MOV ecx,4
CDQ
IDIV ecx
MOV ebx,eax
MOV edi,dword [v_B]
MOV eax,edi
MOV ecx,4
CDQ
IDIV ecx
MOV edi,eax
ADD ebx,edi
MOV dword [v_C],ebx
Code:
mov eax, 20
mov edx, 32
mov dword [v_A], eax
mov dword [v_B], edx
add eax, edx
sar eax, 2
mov dword [v_C], eax
A human understands that A / 4 + B / 4 is the same as (A + B) / 4. The second approach is faster and requires fewer operations.
4 is a power of 2. Therefore a shift operation (sar) can be used to perform the division. This is faster than an integer division (idiv).
The hand-optimized code also has fewer memory accesses (3 instead of 5).
C compilers try to analyze the source code and perform optimizations like that automatically, while PureBasic just does a literal translation.
Does that make PureBasic slow? No. In most situations you won't notice the difference. But when you have code that is executed millions of times inside a loop, it can be very rewarding to optimize that part of your application using hand-written asm code.
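A caveat on the rewrite discussed above, sketched in C rather than PureBasic (the function names are made up for illustration). With truncating integer division, A / 4 + B / 4 matches (A + B) / 4 for the values in the example (20 and 32), but not for every pair of inputs, and the shift form is only equivalent to the division for non-negative sums:

```c
#include <assert.h>

/* Literal translation: two truncating divisions, then an add. */
int quarter_each(int a, int b) {
    return a / 4 + b / 4;
}

/* Rewritten form: one add, then one division. */
int quarter_sum(int a, int b) {
    return (a + b) / 4;
}

/* Shift form. It only matches the division for non-negative sums,
   so this sketch enforces that precondition. */
int quarter_sum_shift(int a, int b) {
    int sum = a + b;
    assert(sum >= 0);
    return sum >> 2;
}
```

For 20 and 32 all three functions return 13. For 3 and 1 the first returns 0 and the second returns 1, so the rewrite changes the result whenever the two remainders add up to 4 or more. On x86, sar rounds toward negative infinity while idiv truncates toward zero, which is why the shift form is restricted to non-negative sums here.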
Re: Optimize code size
wilbert wrote: A human understands that A / 4 + B / 4 is the same as (A + B) / 4. The second approach is faster and requires fewer operations.
4 is a power of 2. Therefore a shift operation (sar) can be used to perform the division. This is faster than an integer division (idiv).
The hand-optimized code also has fewer memory accesses (3 instead of 5).
C compilers try to analyze the source code and perform optimizations like that automatically, while PureBasic just does a literal translation.
Does that make PureBasic slow? No. In most situations you won't notice the difference. But when you have code that is executed millions of times inside a loop, it can be very rewarding to optimize that part of your application using hand-written asm code.

Hand optimizing the assembly is not needed in the simple case you presented. Instead, what is needed is hand optimizing the PureBasic before it is compiled.
Original PureBasic source code:
Code:
A = 20
B = 32
C = A / 4 + B / 4
Code:
mov eax, 20
mov edx, 32
mov dword [v_A], eax
mov dword [v_B], edx
add eax, edx
sar eax, 2
mov dword [v_C], eax
Code:
; A = 20
MOV dword [v_A],20
; B = 32
MOV dword [v_B],32
; C = (A + B) >> 2
MOV ebx,dword [v_A]
ADD ebx,dword [v_B]
SAR ebx,2
MOV dword [v_C],ebx
The point of this illustration is that there are definite improvements to be gained from wisely structuring code in PureBasic. I agree that using assembly code directly can provide the greatest improvements, but as I demonstrated, it isn't necessary to go directly to assembly to obtain at least some of these improvements.
Last edited by Demivec on Tue Jul 31, 2012 5:01 pm, edited 3 times in total.
Re: Optimize code size
I tried to explain what compiler optimizations could look like.
It's not uncommon to get similar results by optimizing the source code.
Your remark demonstrates quite well that compiler optimizations have less impact on good quality source code.
Re: Optimize code size
fromVB wrote: So it is translated to assembly that is not optimized? I'm confused.
This is correct. Check this out, for example:
Code:
Procedure test1()
Protected b.i = 1
Protected c.i = 0
Protected d.i = 0
Protected e.i = 0
Protected f.i = 0
Protected g.i = 0
; Insert Code here...
EndProcedure
test1()
Code:
_Procedure0:
PS0=28
XOR eax,eax ; initialize all local variables to 0:
PUSH eax
PUSH eax
PUSH eax
PUSH eax
PUSH eax
PUSH eax
; Protected b.i = 1
MOV dword [esp],1
; Protected c.i = 0
MOV dword [esp+4],0 ; duplicate the initialization of variables to 0:
; Protected d.i = 0
MOV dword [esp+8],0
; Protected e.i = 0
MOV dword [esp+12],0
; Protected f.i = 0
MOV dword [esp+16],0
; Protected g.i = 0
MOV dword [esp+20],0
And regarding the previous example with a/4 + b/4 vs. (a+b) >> 2: that is the difference between a strongly optimizing compiler (it does it automatically for you!) and one with NO optimization, which requires YOU to code everything in a way that ends up more optimized. The biggest problem with this is that VERY FEW PB coders know how to even SEE the asm code, much less actually analyze it. Also, very few understand why one code snippet compiles more efficiently than another. An optimizing compiler takes care of all of this for you so it just doesn't matter--the coder can be naive, and blissfully so, since the end result is about the same. Not so with PB; since its primary focus is beginners, who are still naive about these manual optimizations, it merrily produces poor code because of the skillset of the coder base.
And YES, it IS SLOWER. How much slower may very well be irrelevant in most situations. Consider calculating an internal window size after a major window resize; nobody will notice a dozen extra instructions being executed, even on a slow CPU.
On the other hand, if you are performing numerous CPU-intensive calculations, those dozen extra instructions could easily turn into minutes, if not hours, given a large enough task.
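To put rough numbers on that last point, here is a back-of-the-envelope estimate in C. The helper name and every input value are illustrative assumptions, not measurements from this thread:

```c
/* Rough estimate of the time added by extra instructions in a hot loop.
   All three inputs are assumptions supplied by the caller. */
double extra_seconds(double extra_instructions_per_iteration,
                     double iterations,
                     double instructions_per_second) {
    return extra_instructions_per_iteration * iterations / instructions_per_second;
}
```

With 12 extra instructions, one billion iterations and an assumed 3 billion instructions per second, this gives about 4 seconds. At a trillion iterations it gives about 4,000 seconds, a little over an hour. Real CPUs overlap and reorder instructions, so treat this as an order-of-magnitude guide only.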
Re: Optimize code size
Not to forget that PB does not have any constructs that let the programmer use SIMD operations. With PB (and many other compilers) this is only possible using ASM. And having no SIMD means you are losing a whole layer of parallelization.
For any parallelizable code, SIMD is a great performance boost. And compilers can apply SIMD automatically only in limited cases, such as simple loops. Some compilers, like Intel C++, offer intrinsics for using SIMD at the high-level language level.
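As a minimal sketch of what SIMD at the high-level language level can look like, here is a C function using the SSE2 intrinsics from `<emmintrin.h>`, with a scalar fallback when SSE2 is not available. The function name `add_arrays` is made up for this illustration, and this is not something PureBasic offers:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Adds two int32 arrays element-wise: dst[i] = a[i] + b[i]. */
void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* Process four 32-bit integers per iteration in 128-bit registers. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE2 is unavailable. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used so the sketch works for any buffer address; aligned variants exist but require 16-byte aligned data.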
Re: Optimize code size
Demivec wrote: ...what is needed is hand optimizing the PureBasic before it is compiled...there are definite improvements to be gained from wisely structuring code in PureBasic....
wilbert wrote: Your remark demonstrates quite well that compiler optimizations have less impact on good quality source code.
Thanks for your illustrative examples; precisely the point I've been trying to make.
Tenaja wrote: ...The coder is punished for exercising good programming practice of initializing every variable before use, because in this example, they are all initialized TWICE!
The coder is punished for not knowing that PureBasic already initializes all variables with zero. Good practices are always contingent.
Tenaja wrote: (thread link)...the Java-written Open Office is so pathetically slow with a million values in a spreadsheet that it took nearly an hour to process a simple graph. In MS Excel, it was nearly instantaneous. There is NO WAY that C is that much faster than Java in ANY benchmark test... the Java implementation either has poor memory management, or poor coding...
Tenaja wrote: An optimizing compiler takes care of all of this for you so it just doesn't matter--the coder can be naive, and blissfully so, since the end result is about the same.
Even highly optimized Java, with its bytecode, JVM, and JIT compilation, cannot help poor coding. Thank you for pointing that out, although it contradicts your second statement above. The results are clearly not the same.
Vitor_Boss® (User, Posts: 81, Joined: Thu Sep 23, 2010 4:22 am)
Re: Optimize code size
Thank you for all the code and explanations about ASM coding. I tried to optimize a memory comparison procedure and got this.
In a test loop of 10 million iterations I got up to a 50% speed improvement.
All processors since 98 have MMX, right?
Fred, you can use it if you want.
Code:
Procedure CmpMem(*Source, *Test, Size)
push edi
push esi
mov eax, [esp+14h] ; Size
mov esi, [esp+10h] ; *Test
mov edi, [esp+0Ch] ; *Source
dec eax ; eax = index of the last byte (Size - 1)
cmp eax, 3
jl loc_Byte_cmp_init ; fewer than 4 bytes: compare byte by byte
add esi, 0FFFFFFFDh ; -3, so [eax+esi] addresses the last 4 bytes
add edi, 0FFFFFFFDh ; -3, same adjustment for *Source
cmp eax, 7
jl loc_4Bytes_cmp ; fewer than 8 bytes: compare 4 bytes at a time
add esi, 0FFFFFFFCh ; a further -4, so [eax+esi] addresses the last 8 bytes
add edi, 0FFFFFFFCh ; a further -4, same adjustment for *Source
!loc_8Bytes_cmp:
movq mm0, [eax+edi] ;Source[i]
pcmpeqd mm0, [eax+esi] ;Test[i]
psrlq mm0, 16
movd edx, mm0
cmp edx, 0FFFFFFFFh
jne loc_return_0 ; If (Source[i]!=Test[i])
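add eax, 0FFFFFFF8h ; i -= 8
jl loc_return_1 ; nothing left to compare: all bytes matched
cmp eax, 7
jg loc_8Bytes_cmp ; more than 8 bytes remain
mov al, 7 ; 8 or fewer remain: final compare covers bytes 0..7
jmp loc_8Bytes_cmp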
!loc_4Bytes_cmp:
mov edx, [eax+edi] ;Source[i]
cmp edx, [eax+esi] ;Test[i]
jnz loc_return_0 ; If (Source[i]!=Test[i])
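add eax, 0FFFFFFFCh ; i -= 4
jl loc_return_1 ; nothing left to compare: all bytes matched
cmp eax, 2
jg loc_4Bytes_cmp ; at least 4 bytes remain
mov al, 3 ; fewer than 4 remain: final compare covers bytes 0..3
jmp loc_4Bytes_cmp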
!loc_Byte_cmp_init:
test eax, eax
jl loc_return_0 ; return 0 if Size < 1 (jle would also skip the compare when Size = 1)
!loc_Byte_cmp:
mov dl, [eax+edi] ;Source[i]
cmp dl, [eax+esi] ;Test[i]
jnz loc_return_0 ; If (Source[i]!=Test[i])
dec eax ; --i
jge loc_Byte_cmp ; If (i>=0)
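!loc_return_1:
emms ; leave MMX state so later x87 floating-point code works
pop esi ; restore registers in reverse push order
pop edi
mov eax, 1 ; buffers match
!retn 0ch
!loc_return_0:
emms
pop esi
pop edi
xor eax, eax ; buffers differ (or Size < 1)
EndProcedure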
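For readers who do not follow the assembly, the general idea (compare wide chunks first, then finish byte by byte) can be sketched in portable C. This is not a translation of the routine above: the name `cmp_mem` is hypothetical, it walks forward instead of backward, it does not overlap the final chunk, and it treats an empty range as equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the first `size` bytes of both buffers match, else 0.
   Hypothetical helper; an empty range counts as equal here. */
int cmp_mem(const void *source, const void *test, size_t size) {
    const unsigned char *s = source;
    const unsigned char *t = test;
    assert(s != NULL && t != NULL);

    /* Compare 8 bytes at a time; memcpy avoids unaligned-access problems. */
    while (size >= 8) {
        uint64_t a, b;
        memcpy(&a, s, 8);
        memcpy(&b, t, 8);
        if (a != b) return 0;
        s += 8; t += 8; size -= 8;
    }
    /* Compare the remaining 0..7 bytes one by one. */
    while (size > 0) {
        if (*s != *t) return 0;
        s++; t++; size--;
    }
    return 1;
}
```

Any hand-written routine of this kind is worth benchmarking against the library baseline (memcmp in C, CompareMemory in PureBasic) before adopting it.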
Last edited by Vitor_Boss® on Sun Sep 16, 2012 4:21 pm, edited 2 times in total.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
Re: Optimize code size
Check out the Rep and Scan instructions; they should speed it up more.
Or use 128-bit SSE registers for the compare.
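A minimal sketch of the 128-bit variant, written in C with SSE2 intrinsics instead of inline assembly. The function name `equal16` is made up for this illustration, and a memcmp fallback is used when SSE2 is not available:

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Returns 1 if the two 16-byte blocks are identical, else 0. */
int equal16(const void *p, const void *q) {
#if defined(__SSE2__)
    __m128i a = _mm_loadu_si128((const __m128i *)p);
    __m128i b = _mm_loadu_si128((const __m128i *)q);
    /* Each equal byte yields 0xFF; movemask packs the top bit of every byte,
       so 0xFFFF means all 16 bytes matched. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
#else
    return memcmp(p, q, 16) == 0;  /* portable fallback */
#endif
}
```

A full comparison routine would call this in a loop over 16-byte blocks and handle the remaining tail bytes separately.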
Vitor_Boss®
Re: Optimize code size
Thorium wrote: Check out the Rep and Scan instructions, they should speed it up more.
Or use 128bit SSE registers for the compare.

Updated with 64-bit MMX instructions for faster results. All memory uses a 64-bit-wide bus, right?


