Help me defend PB "reputation"
Re: Help me defend PB "reputation"
Your friends are at the very beginning of evolution. Can't compare C++ with
Re: Help me defend PB "reputation"
Thorium wrote: Actually it's very slow compared to optimizing C or C++ compilers ... the resulting executable is much slower...
TI-994A wrote: Hi Thorium. Very slow? Much slower? With no optimisation options, speed disparities are to be expected. But what do you consider very much slower?
It's not easy to give a good answer. It depends on the code. Some years ago someone benchmarked some code from different compilers and compared it to PureBasic. It's somewhere on the forums, but I can't find it. It also included Java.
As a general statement I would say PureBasic reaching about 50% of the performance of C/C++ would be normal.
If the C/C++ code uses intrinsics for SIMD, it should be around 10 to 20%.
As I said, you can compensate for that by using some inline assembly. So it's no issue for me.
Re: Help me defend PB "reputation"
Thorium wrote: ...As a general statement i would say 50% performance would be normal, compared to C/C++. If you use intrinsics for SIMD it should be around 10 to 20%.
Hi Thorium. At best, double the speed sounds about right. Anything more would require some painstaking, hand-coded compiler directives to achieve.
Texas Instruments TI-99/4A Home Computer: the first home computer with a 16bit processor, crammed into an 8bit architecture. Great hardware - Poor design - Wonderful BASIC engine. And it could talk too! Please visit my YouTube Channel
Re: Help me defend PB "reputation"
TI-994A wrote: At best, double the speed sounds about right. Anything more would require some painstaking, hand-coded compiler directives to achieve.
Not necessarily.
For example I implemented an image filter in PureBasic and optimized it with assembly.
The result was that a simple assembly version without any instruction set extension was already about 6 times faster. Many C/C++ compilers can achieve this without any inline assembly.
It was a very small loop, with just a few variables. A compiler can optimize that very well.
This was the benchmark of the different implementations:
Code: Select all
Implementation  Throughput   Relative
PureBasic         197 MB/s     100%
Assembler        1178 MB/s     598%
MMX              7133 MB/s    3621%
SSE2            10997 MB/s    5582%
Re: Help me defend PB "reputation"
Thorium wrote: Some years ago someone benchmarked some code from different compilers and compared it to PureBasic. It's somewhere on the forums, but i can't find it. It also included Java.
The only one I remember is this one, but it isn't saying much: http://www.purebasic.fr/english/viewtop ... =7&t=48202
Not the one you remember? If it's not this then I must have missed it.
PS: this thread is ... amusing.
"Have you tried turning it off and on again ?"
A little PureBasic review
Re: Help me defend PB "reputation"
Thorium wrote: For example i implemented a image filter in PureBasic and optimized it with assembly ... It was a very small loop, with just a few variables.
Hi Thorium. Very impressive results; more than fifty times faster compared to vanilla PureBasic, and ten times faster than assembly.
It would be really great if we could see how each of the codes were implemented. Maybe we could all learn something about optimisation.
Re: Help me defend PB "reputation"
Thorium wrote: For example i implemented a image filter in PureBasic and optimized it with assembly ... It was a very small loop, with just a few variables.
TI-994A wrote: Hi Thorium. Very impressive results; more than fifty times faster compared to vanilla PureBasic, and ten times faster than assembly. It would be really great if we could see how each of the codes were implemented. Maybe we could all learn something about optimisation.
It's not uncommon to get a massive speed increase when you hand-code some part in assembler, especially when you can use SSE.
But even a very simple thing helps: you can use registers instead of memory to store temporary variables.
PureBasic handles code line by line, storing variables to memory all the time, which is convenient if you want to mix PB code with assembler but also inefficient.
Take for example this code:
Code: Select all
result.i = 0
t1 = ElapsedMilliseconds()
For i = 1 To 100000000
  result = result + i
  result = result - 50000
  result = result >> 5
Next
t2 = ElapsedMilliseconds()
MessageRequester(Str(t2-t1)+" ms", Str(result))
Code: Select all
; For i = 1 To 100000000
MOV qword [v_i],1
_For1:
MOV rax,100000000
CMP rax,qword [v_i]
JL _Next2
; result = result + i
MOV r15,qword [v_result]
ADD r15,qword [v_i]
MOV qword [v_result],r15
; result = result - 50000
MOV r15,qword [v_result]
ADD r15,-50000
MOV qword [v_result],r15
; result = result >> 5
MOV r15,qword [v_result]
SAR r15,5
MOV qword [v_result],r15
; Next
_NextContinue2:
INC qword [v_i]
JNO _For1
_Next2:
Code: Select all
result = (result + i - 50000) >> 5
But for most applications, the computer is waiting for user input a lot of the time and you won't notice the difference.
And like Thorium said, for some special routines that are called a lot you can mix in assembler.
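A minimal sketch for timing the two forms of the loop from this post side by side. It assumes the assembly listing above is what the compiler emits for the three-statement loop, and that the single-expression form lets the compiler keep the intermediate values in a register; neither is guaranteed for every compiler version. The names resultA, resultB and #Iterations are made up for this example. Run it with the debugger disabled, otherwise the timings say little.

```purebasic
; Hypothetical micro-benchmark; names and iteration count are illustrative only.
#Iterations = 100000000

Define.i i, resultA, resultB
Define.q t1, t2, t3

; Version A: three separate statements, result is loaded and stored each time
t1 = ElapsedMilliseconds()
For i = 1 To #Iterations
  resultA = resultA + i
  resultA = resultA - 50000
  resultA = resultA >> 5
Next

; Version B: one expression, intermediate values need not be written back
t2 = ElapsedMilliseconds()
For i = 1 To #Iterations
  resultB = (resultB + i - 50000) >> 5
Next
t3 = ElapsedMilliseconds()

; Show both timings and confirm that both loops computed the same value
MessageRequester("Separate: " + Str(t2 - t1) + " ms, combined: " + Str(t3 - t2) + " ms", "Results equal: " + Str(Bool(resultA = resultB)))
```

Both loops compute the same value, so the equality check only guards against a typo. The interesting part is the two timings on your own machine.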
Windows (x64)
Raspberry Pi OS (Arm64)
Re: Help me defend PB "reputation"
luis wrote: The only one I remember is this one, but it isn't saying much: http://www.purebasic.fr/english/viewtop ... =7&t=48202
Yes, that's the one.
TI-994A wrote: Hi Thorium. Very impressive results; more than fifty times faster compared to vanilla PureBasic, and ten times faster than assembly. It would be really great if we could see how each of the codes were implemented. Maybe we could all learn something about optimisation.
It's nothing special. It was one of my first tries at SIMD.
It undoes the "up filter" of the PNG image file format.
The code is very simple; it's only long because of the many implementations.
If you want to get into optimization just learn some assembly and SIMD (MMX, SSE, AVX). SIMD can be hard to use but it can be very rewarding.
Code: Select all
Structure Tsi_Pixel_Channel
Channel.a
EndStructure
;Undos the up filter.
Procedure Tsi_UnFilterUp(*ImageData, Width.i, Height.i, PixelSize.i)
CompilerSelect #PB_Compiler_Processor
CompilerCase #PB_Processor_x86
If Tsi_Sse2Supported = #True
;save registers
!push esi
!push edi
!push ebx
;calculate the pointers
!mov edi,[p.p_ImageData+12]
!mov esi,edi
!mov eax,[p.v_Width+12]
!mul dword[p.v_PixelSize+12]
!mov edx,eax
!add edi,edx
;calculate the counters
!mov eax,[p.v_Height+12]
!dec eax
!mul dword[p.v_Width+12]
!mul dword[p.v_PixelSize+12]
!mov ecx,eax
!shr ecx,7
!and eax,127
!mov ebx,eax
;process a part of the data to cut the length to a multiple of 128
!test ebx,ebx
!je Tsi_UnFilterUp_Sse2CutLengthEnd
!align 4
!Tsi_UnFilterUp_Sse2CutLengthStart:
!mov al,[edi]
!add al,[esi]
!mov [edi],al
!inc esi
!inc edi
!dec ebx
!jne Tsi_UnFilterUp_Sse2CutLengthStart
!align 4
!Tsi_UnFilterUp_Sse2CutLengthEnd:
;process the rest of the data
!test ecx,ecx
!je Tsi_UnFilterUp_Sse2LoopEnd
!align 4
!Tsi_UnFilterUp_Sse2LoopStart:
!movdqu xmm0,[esi]
!movdqu xmm1,[esi+16]
!movdqu xmm2,[esi+32]
!movdqu xmm3,[esi+48]
!movdqu xmm4,[esi+64]
!movdqu xmm5,[esi+80]
!movdqu xmm6,[esi+96]
!movdqu xmm7,[esi+112]
!paddb xmm0,[edi]
!paddb xmm1,[edi+16]
!paddb xmm2,[edi+32]
!paddb xmm3,[edi+48]
!paddb xmm4,[edi+64]
!paddb xmm5,[edi+80]
!paddb xmm6,[edi+96]
!paddb xmm7,[edi+112]
!movdqu [edi],xmm0
!movdqu [edi+16],xmm1
!movdqu [edi+32],xmm2
!movdqu [edi+48],xmm3
!movdqu [edi+64],xmm4
!movdqu [edi+80],xmm5
!movdqu [edi+96],xmm6
!movdqu [edi+112],xmm7
!add esi,128
!add edi,128
!dec ecx
!jne Tsi_UnFilterUp_Sse2LoopStart
!align 4
!Tsi_UnFilterUp_Sse2LoopEnd:
;restore the registers
!pop ebx
!pop edi
!pop esi
;no emms needed here: only XMM registers were used, so the MMX/x87 state is untouched
ElseIf Tsi_MmxSupported = #True
;save registers
!push esi
!push edi
!push ebx
;calculate the pointers
!mov edi,[p.p_ImageData+12]
!mov esi,edi
!mov eax,[p.v_Width+12]
!mul dword[p.v_PixelSize+12]
!mov edx,eax
!add edi,edx
;calculate the counters
!mov eax,[p.v_Height+12]
!dec eax
!mul dword[p.v_Width+12]
!mul dword[p.v_PixelSize+12]
!mov ecx,eax
!shr ecx,6
!and eax,63
!mov ebx,eax
;process a part of the data to cut the length to a multiple of 64
!test ebx,ebx
!je Tsi_UnFilterUp_MmxCutLengthEnd
!align 4
!Tsi_UnFilterUp_MmxCutLengthStart:
!mov al,[edi]
!add al,[esi]
!mov [edi],al
!inc esi
!inc edi
!dec ebx
!jne Tsi_UnFilterUp_MmxCutLengthStart
!align 4
!Tsi_UnFilterUp_MmxCutLengthEnd:
;process the rest of the data
!test ecx,ecx
!je Tsi_UnFilterUp_MmxLoopEnd
!align 4
!Tsi_UnFilterUp_MmxLoopStart:
!movq mm0,[esi]
!movq mm1,[esi+8]
!movq mm2,[esi+16]
!movq mm3,[esi+24]
!movq mm4,[esi+32]
!movq mm5,[esi+40]
!movq mm6,[esi+48]
!movq mm7,[esi+56]
!paddb mm0,[edi]
!paddb mm1,[edi+8]
!paddb mm2,[edi+16]
!paddb mm3,[edi+24]
!paddb mm4,[edi+32]
!paddb mm5,[edi+40]
!paddb mm6,[edi+48]
!paddb mm7,[edi+56]
!movq [edi],mm0
!movq [edi+8],mm1
!movq [edi+16],mm2
!movq [edi+24],mm3
!movq [edi+32],mm4
!movq [edi+40],mm5
!movq [edi+48],mm6
!movq [edi+56],mm7
!add esi,64
!add edi,64
!dec ecx
!jne Tsi_UnFilterUp_MmxLoopStart
!align 4
!Tsi_UnFilterUp_MmxLoopEnd:
;restore the registers
!pop ebx
!pop edi
!pop esi
;end MMX state
!emms
Else
!push esi
!push edi
!mov eax,[p.v_Height+8]
!dec eax
!mul dword[p.v_Width+8]
!mul dword[p.v_PixelSize+8]
!mov ecx,eax
!mov edi,[p.p_ImageData+8]
!mov esi,edi
!mov eax,[p.v_Width+8]
!mul dword[p.v_PixelSize+8]
!mov edx,eax
!add edi,edx
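;skip the loop if there is nothing to process (single-row image)
!test ecx,ecx
!je Tsi_UnFilterUp_LoopEnd
!align 4
!Tsi_UnFilterUp_LoopStart:
!mov al,[edi]
!add al,[esi]
!mov [edi],al
!inc esi
!inc edi
!dec ecx
!jne Tsi_UnFilterUp_LoopStart
!Tsi_UnFilterUp_LoopEnd: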
!pop edi
!pop esi
EndIf
CompilerCase #PB_Processor_x64
If Tsi_Sse2Supported = #True
;save registers
!push rsi
!push rdi
;calculate the pointers
!mov rdi,[p.p_ImageData+16]
!mov rsi,rdi
!mov rax,[p.v_Width+16]
!mul qword[p.v_PixelSize+16]
!mov rdx,rax
!add rdi,rdx
;calculate the counters
!mov rax,[p.v_Height+16]
!dec rax
!mul qword[p.v_Width+16]
!mul qword[p.v_PixelSize+16]
!mov rcx,rax
!shr rcx,7
!and rax,127
!mov r10,rax
;process a part of the data to cut the length to a multiple of 128
!test r10,r10
!je Tsi_UnFilterUp_Sse2CutLengthEnd
!align 8
!Tsi_UnFilterUp_Sse2CutLengthStart:
!mov al,[rdi]
!add al,[rsi]
!mov [rdi],al
!inc rsi
!inc rdi
!dec r10
!jne Tsi_UnFilterUp_Sse2CutLengthStart
!align 8
!Tsi_UnFilterUp_Sse2CutLengthEnd:
;process the rest of the data
!test rcx,rcx
!je Tsi_UnFilterUp_Sse2LoopEnd
!align 8
!Tsi_UnFilterUp_Sse2LoopStart:
!movdqu xmm0,[rsi]
!movdqu xmm1,[rsi+16]
!movdqu xmm2,[rsi+32]
!movdqu xmm3,[rsi+48]
!movdqu xmm4,[rsi+64]
!movdqu xmm5,[rsi+80]
!movdqu xmm6,[rsi+96]
!movdqu xmm7,[rsi+112]
!paddb xmm0,[rdi]
!paddb xmm1,[rdi+16]
!paddb xmm2,[rdi+32]
!paddb xmm3,[rdi+48]
!paddb xmm4,[rdi+64]
!paddb xmm5,[rdi+80]
!paddb xmm6,[rdi+96]
!paddb xmm7,[rdi+112]
!movdqu [rdi],xmm0
!movdqu [rdi+16],xmm1
!movdqu [rdi+32],xmm2
!movdqu [rdi+48],xmm3
!movdqu [rdi+64],xmm4
!movdqu [rdi+80],xmm5
!movdqu [rdi+96],xmm6
!movdqu [rdi+112],xmm7
!add rsi,128
!add rdi,128
!dec rcx
!jne Tsi_UnFilterUp_Sse2LoopStart
!align 8
!Tsi_UnFilterUp_Sse2LoopEnd:
;restore the registers
!pop rdi
!pop rsi
;no emms needed here: only XMM registers were used, so the MMX/x87 state is untouched
ElseIf Tsi_MmxSupported = #True
;save registers
!push rsi
!push rdi
;calculate the pointers
!mov rdi,[p.p_ImageData+16]
!mov rsi,rdi
!mov rax,[p.v_Width+16]
!mul qword[p.v_PixelSize+16]
!mov rdx,rax
!add rdi,rdx
;calculate the counters
!mov rax,[p.v_Height+16]
!dec rax
!mul qword[p.v_Width+16]
!mul qword[p.v_PixelSize+16]
!mov rcx,rax
!shr rcx,6
!and rax,63
!mov r10,rax
;process a part of the data to cut the length to a multiple of 64
!test r10,r10
!je Tsi_UnFilterUp_MmxCutLengthEnd
!align 8
!Tsi_UnFilterUp_MmxCutLengthStart:
!mov al,[rdi]
!add al,[rsi]
!mov [rdi],al
!inc rsi
!inc rdi
!dec r10
!jne Tsi_UnFilterUp_MmxCutLengthStart
!align 8
!Tsi_UnFilterUp_MmxCutLengthEnd:
;process the rest of the data
!test rcx,rcx
!je Tsi_UnFilterUp_MmxLoopEnd
!align 8
!Tsi_UnFilterUp_MmxLoopStart:
!movq mm0,[rsi]
!movq mm1,[rsi+8]
!movq mm2,[rsi+16]
!movq mm3,[rsi+24]
!movq mm4,[rsi+32]
!movq mm5,[rsi+40]
!movq mm6,[rsi+48]
!movq mm7,[rsi+56]
!paddb mm0,[rdi]
!paddb mm1,[rdi+8]
!paddb mm2,[rdi+16]
!paddb mm3,[rdi+24]
!paddb mm4,[rdi+32]
!paddb mm5,[rdi+40]
!paddb mm6,[rdi+48]
!paddb mm7,[rdi+56]
!movq [rdi],mm0
!movq [rdi+8],mm1
!movq [rdi+16],mm2
!movq [rdi+24],mm3
!movq [rdi+32],mm4
!movq [rdi+40],mm5
!movq [rdi+48],mm6
!movq [rdi+56],mm7
!add rsi,64
!add rdi,64
!dec rcx
!jne Tsi_UnFilterUp_MmxLoopStart
!align 8
!Tsi_UnFilterUp_MmxLoopEnd:
;restore the registers
!pop rdi
!pop rsi
;end MMX state
!emms
Else
!push rsi
!push rdi
!mov rax,[p.v_Height+16]
!dec rax
!mul qword[p.v_Width+16]
!mul qword[p.v_PixelSize+16]
!mov rcx,rax
!mov rdi,[p.p_ImageData+16]
!mov rsi,rdi
!mov rax,[p.v_Width+16]
!mul qword[p.v_PixelSize+16]
!mov rdx,rax
!add rdi,rdx
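;skip the loop if there is nothing to process (single-row image)
!test rcx,rcx
!je Tsi_UnFilterUp_LoopEnd
!align 8
!Tsi_UnFilterUp_LoopStart:
!mov al,[rdi]
!add al,[rsi]
!mov [rdi],al
!inc rsi
!inc rdi
!dec rcx
!jne Tsi_UnFilterUp_LoopStart
!Tsi_UnFilterUp_LoopEnd: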
!pop rdi
!pop rsi
EndIf
CompilerDefault
Protected.i X, ByteSize
Protected *ActualChannel.Tsi_Pixel_Channel
Protected *PriorChannel.Tsi_Pixel_Channel
*PriorChannel = *ImageData
*ActualChannel = *ImageData + Width * PixelSize
Height - 1
ByteSize = Width * Height * PixelSize
For X = 1 To ByteSize
*ActualChannel\Channel = *ActualChannel\Channel + *PriorChannel\Channel
*ActualChannel + 1
*PriorChannel + 1
Next
CompilerEndSelect
EndProcedure
Re: Help me defend PB "reputation"
wilbert wrote: It's not uncommon to get a massive speed increase when you hand code some part with assembler code especially when you can use SSE.
Hi wilbert. Great example of how good coding makes a difference. Thank you.
Thorium wrote: SIMD can be hard to use but it can be very rewarding.
Thanks for the code, Thorium. That's my point exactly.
Re: Help me defend PB "reputation"
This is actually the thread I remember:
http://www.purebasic.fr/english/viewtop ... 17&t=49882
Blog: Why Does It Suck? (http://whydoesitsuck.com/)
"You can disagree with me as much as you want, but during this talk, by definition, anybody who disagrees is stupid and ugly."
- Linus Torvalds
Re: Help me defend PB "reputation"
wilbert wrote: It's not uncommon to get a massive speed increase when you hand code some part with assembler code especially when you can use SSE.
TI-994A wrote: Hi wilbert. Great example of how good coding makes a difference. Thank you.
Thorium wrote: SIMD can be hard to use but it can be very rewarding.
TI-994A wrote: Thanks for the code, Thorium. That's my point exactly.
There are also optimizations that can be done in PureBasic that are often overlooked.
For example the filter loop in PureBasic is fairly optimized.
- It precalculates the pointers, so inside the loop it moves to the next pixel channel just by incrementing the pointer. No need for pointer calculation inside the loop. Always precalculate as much as possible before entering a loop. Multiplication and division especially kill performance.
- It uses just one loop to process the whole image. No nested loop, so no additional overhead. An image has height and width, but it's a continuous string of pixels in memory. So you don't need a loop for Y and a nested loop for X. You can just calculate the image size in bytes and use only one loop. You might end up processing line padding bytes as well, but it can still be faster.
- Additional optimizations could be done, for example unrolling the loop: processing multiple channels and pixels per iteration.
Code: Select all
Protected.i X, ByteSize
Protected *ActualChannel.Tsi_Pixel_Channel
Protected *PriorChannel.Tsi_Pixel_Channel
*PriorChannel = *ImageData
*ActualChannel = *ImageData + Width * PixelSize
Height - 1
ByteSize = Width * Height * PixelSize
For X = 1 To ByteSize
*ActualChannel\Channel = *ActualChannel\Channel + *PriorChannel\Channel
*ActualChannel + 1
*PriorChannel + 1
Next
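The unrolling mentioned in the last point could look roughly like this. It is a sketch, not the author's code: Tsi_Pixel_Channel4 and Tsi_UnFilterUp_Unrolled4 are hypothetical names, and the factor of four is an arbitrary choice. The bytes are still added one after another in the same order as in the simple loop, so the result is the same; only the loop overhead per byte gets smaller.

```purebasic
; Sketch only: structure and procedure names are hypothetical, not from the original code.
Structure Tsi_Pixel_Channel4
  Channel.a[4]   ; four unsigned bytes, handled in one loop iteration
EndStructure

Procedure Tsi_UnFilterUp_Unrolled4(*ImageData, Width.i, Height.i, PixelSize.i)
  Protected.i X, ByteSize, Blocks, Rest
  Protected *Actual.Tsi_Pixel_Channel4
  Protected *Prior.Tsi_Pixel_Channel4

  ; Precalculate pointers and counters before entering the loops
  *Prior   = *ImageData
  *Actual  = *ImageData + Width * PixelSize
  ByteSize = Width * (Height - 1) * PixelSize
  Blocks   = ByteSize >> 2   ; number of full 4-byte blocks
  Rest     = ByteSize & 3    ; leftover bytes after the blocks

  ; Unrolled loop: four bytes per iteration, processed in ascending order
  For X = 1 To Blocks
    *Actual\Channel[0] = *Actual\Channel[0] + *Prior\Channel[0]
    *Actual\Channel[1] = *Actual\Channel[1] + *Prior\Channel[1]
    *Actual\Channel[2] = *Actual\Channel[2] + *Prior\Channel[2]
    *Actual\Channel[3] = *Actual\Channel[3] + *Prior\Channel[3]
    *Actual + 4
    *Prior + 4
  Next

  ; Remaining bytes, one at a time
  For X = 1 To Rest
    *Actual\Channel[0] = *Actual\Channel[0] + *Prior\Channel[0]
    *Actual + 1
    *Prior + 1
  Next
EndProcedure

; Usage: a 3x3 image with one byte per pixel, every byte set to 1.
; After unfiltering, the rows should hold 1, 2 and 3.
Define *Buf = AllocateMemory(9)
If *Buf
  FillMemory(*Buf, 9, 1)
  Tsi_UnFilterUp_Unrolled4(*Buf, 3, 3, 1)
  Debug PeekA(*Buf + 8)   ; last byte of the third row
  FreeMemory(*Buf)
EndIf
```

Whether this is actually faster than the simple loop depends on the compiler version and the CPU, so it is worth measuring before keeping it.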
Re: Help me defend PB "reputation"
As a newbie to PureBasic (two months or so) and only a hobbyist programmer moving from VB, I find the following from a previous post brilliant.
- executables require absolutely no dependencies or frameworks
- executables can be run out of the box without any installation
The users of the application I am attempting to convert want to be able to run the application from a memory stick, just moving from one computer to another. I managed it in VB, but it took ages and is quite bloated with all the DLLs I have to include, plus the cost to my sanity.
I have found PB easy to learn and the only problems I am having are mainly down to my understanding of the bits and pieces. As I learn more it becomes more apparent that PB is just the way to go. Just scratching the surface as yet.
The lack of lots of code samples is a big drop from VB but the forum here is good for getting answers.
I also have a couple of friends who program for a living and when they start asking me what I am learning I just say PureBasic and put on the most idiotic smile I can to make them feel at home.
Any intelligent fool can make things bigger and more complex. It takes a touch of genius — and a lot of courage to move in the opposite direction.
Re: Help me defend PB "reputation"
Code: Select all
MessageRequester("Hello World","Hello World")
Ok a messagebox is probably cheating, but OpenWindow() is only one line too
I tried many different compilers before choosing PureBasic, and some of them couldn't even say Hello World in less than 1 megabyte!
I know 1 megabyte doesn't mean much in 2015, but for me it just doesn't quite feel right when I've seen asm exes that can do it in 1 kilobyte, and I value qualities like accountability, efficiency, performance.
Code: Select all
MessageRequester("Hello World","Hello World")
! xor rax, rax
some great examples of this posted above in this thread
Re: Help me defend PB "reputation"
collectordave wrote: The lack of lots of code samples is a big drop from VB but the forum here is good for getting answers.
RosettaCode currently has 492 pages in its PureBasic category:
http://rosettacode.org/wiki/PureBasic
The pages contain programming problems solved using PureBasic.
Think Unicode!
Re: Help me defend PB "reputation"
Yeeeaaaahhhh you can make a product in 1/5 the turn-around but guess what: PB is sometimes 1% less efficient than a C++ solution written by one of the rare C++ coders who actually knows about compiler design and optimizations..
P.S. PB makes machine code PE and ELF on all platforms just like C&C++ compilers.
If I had about $25,000.00 for every person I know who has never finished a product and doesn't know what a buffer overflow is OR how to reverse-engineer a binary tell me about why C&C++ is better; I'd buy the company..
The truth hurts.