Anyone afraid of BlitzMax speed?

Sebe · Post by **Sebe** » Mon Mar 20, 2006 12:50 am

Well, some of you might also be on the CodersWorkshop forums. Currently there's a brawl going on there about BMX being faster than PB. It started with the Sieve of Eratosthenes benchmark, in which BMX was quite fast but PB wasn't even faster than Blitz3D.

That probably means that integers are faster with BMX, but what about the other stuff? So I did a little work on the sieve test and changed some integers to floats. Here's the link: http://www.kudoscry.com/public/sieve.zip

My results so far:

Speed:
-> PureBasic / 4.0beta7 / debugger off: 19687 m/secs
-> BlitzMax / 1.12 demo / debugger off: 30940 m/secs w/o GarbageCollector
-> BlitzMax / 1.12 demo / debugger off: 31625 m/secs w/ GarbageCollector

Filesize:
-> PureBasic / 4.0beta7 / debugger off: 5,632 bytes
-> BlitzMax / 1.12 demo / debugger off: 49,664 bytes w/ and w/o GarbageCollector

Have fun

GedB · Post by **GedB** » Mon Mar 20, 2006 1:50 am

Looking at the code, I'm thinking that PB is losing out in the Array look ups.

Does Blitzmax allow you to get the ASM?

Sebe · Post by **Sebe** » Mon Mar 20, 2006 1:57 am

Difficult, because BMX converts the code into C (C++?) code and then calls gcc, I think. BMX itself is just a LANGUAGE->C/C++ "compiler", and I don't know how to get the ASM output of gcc.

Funny thing is: I did an int/float mixed-mode test with C/C++, too. I didn't save the source file, but gcc (I used DevCpp) was slightly faster than PureBasic, so I wonder why BlitzMax loses that much speed :roll:
If anyone's interested: here's the original integer-only sieve test with Blitz3D, BlitzMax, PureBasic and C/C++: http://homepage.ntlworld.com/config/share/sieve.zip

Edit: Wasn't even difficult

Just took a look at the files in the .bmx folder and found a "sieve.bmx.BLABLABLA.s" file. This is what the file contains:

Code:

	format	MS COFF
	extrn	___bb_blitz_blitz_
	extrn	___bb_standardio_standardio_
	extrn	___bb_system_system_
	extrn	_bbArrayNew1D
	extrn	_bbEnd
	extrn	_bbFloatToInt
	extrn	_bbGCCollect
	extrn	_bbGCSetMode
	extrn	_bbStringClass
	extrn	_bbStringConcat
	extrn	_bbStringFromFloat
	extrn	_brl_standardio_Input
	extrn	_brl_standardio_Print
	extrn	_brl_system_MilliSecs
	public	__bb_main
	public	_bb_t
	section	"code" code
__bb_main:
	push	ebp
	mov	ebp,esp
	sub	esp,12
	push	ebx
	push	esi
	push	edi
	cmp	dword [_40],0
	je	_41
	mov	eax,0
	pop	edi
	pop	esi
	pop	ebx
	mov	esp,ebp
	pop	ebp
	ret
_41:
	mov	dword [_40],1
	fld	dword [_42]
	fstp	dword [ebp-8]
	call	___bb_blitz_blitz_
	call	___bb_system_system_
	call	___bb_standardio_standardio_
	push	2
	call	_bbGCSetMode
	add	esp,4
	push	_25
	call	_brl_standardio_Print
	add	esp,4
	mov	eax,dword [_26]
	and	eax,1
	cmp	eax,0
	jne	_27
	call	_brl_system_MilliSecs
	mov	dword [ebp+-12],eax
	fild	dword [ebp+-12]
	fstp	dword [_bb_t]
	or	dword [_26],1
_27:
	mov	edi,1
	jmp	_29
_7:
	fld	dword [_44]
	fstp	dword [ebp-8]
	push	8191
	push	_32
	call	_bbArrayNew1D
	add	esp,8
	mov	esi,eax
	mov	ebx,0
	jmp	_36
_10:
	fld1
	fstp	dword [esi+ebx*4+24]
_8:
	add	ebx,1
_36:
	cmp	ebx,8190
	jle	_10
_9:
	mov	ebx,0
	jmp	_37
_13:
	fld	dword [esi+ebx*4+24]
	fld1
	fxch	st1
	fucompp
	fnstsw	ax
	sahf
	setnz	al
	movzx	eax,al
	cmp	eax,0
	jne	_38
	mov	eax,ebx
	shl	eax,1
	mov	dword [ebp+-12],eax
	fild	dword [ebp+-12]
	fstp	dword [ebp-4]
	fld	dword [ebp-4]
	fadd	dword [_46]
	fstp	dword [ebp-4]
	mov	dword [ebp+-12],ebx
	fild	dword [ebp+-12]
	fadd	dword [ebp-4]
	sub	esp,8
	fstp	qword [esp]
	call	_bbFloatToInt
	add	esp,8
	jmp	_14
_16:
	fldz
	fstp	dword [esi+eax*4+24]
	mov	dword [ebp+-12],eax
	fild	dword [ebp+-12]
	fadd	dword [ebp-4]
	sub	esp,8
	fstp	qword [esp]
	call	_bbFloatToInt
	add	esp,8
_14:
	cmp	eax,8190
	jle	_16
_15:
	fld	dword [ebp-8]
	fadd	dword [_47]
	fstp	dword [ebp-8]
_38:
_11:
	add	ebx,1
_37:
	cmp	ebx,8190
	jle	_13
_12:
	call	_bbGCCollect
_5:
	add	edi,1
_29:
	cmp	edi,50000
	jle	_7
_6:
	call	_brl_system_MilliSecs
	mov	dword [ebp+-12],eax
	fild	dword [ebp+-12]
	fsub	dword [_bb_t]
	fstp	dword [_bb_t]
	push	_18
	push	dword [_bb_t]
	call	_bbStringFromFloat
	add	esp,4
	push	eax
	push	_39
	call	_bbStringConcat
	add	esp,8
	push	eax
	call	_bbStringConcat
	add	esp,8
	push	eax
	call	_brl_standardio_Print
	add	esp,4
	push	dword [ebp-8]
	call	_bbStringFromFloat
	add	esp,4
	push	eax
	push	_19
	call	_bbStringConcat
	add	esp,8
	push	eax
	call	_brl_standardio_Print
	add	esp,4
	push	_20
	call	_brl_standardio_Input
	add	esp,4
	call	_bbEnd
	mov	eax,0
	jmp	_21
_21:
	pop	edi
	pop	esi
	pop	ebx
	mov	esp,ebp
	pop	ebp
	ret
	section	"data" data writeable align 8
	align	4
_40:
	dd	0
	align	4
_42:
	dd	0x0
	align	4
_25:
	dd	_bbStringClass
	dd	2147483647
	dd	40
	dw	83,73,69,86,69,32,79,70
	dw	32,69,82,65,84,79,83,84
	dw	72,69,78,69,83,32,45,32
	dw	53,48,48,48,48,32,105,116
	dw	101,114,97,116,105,111,110,115
	align	4
_26:
	dd	0
	align	4
_bb_t:
	dd	0x0
	align	4
_44:
	dd	0x0
_32:
	db	"f",0
	align	4
_46:
	dd	0x40400000
	align	4
_47:
	dd	0x3f800000
	align	4
_18:
	dd	_bbStringClass
	dd	2147483647
	dd	8
	dw	32,109,47,115,101,99,115,46
	align	4
_39:
	dd	_bbStringClass
	dd	2147483647
	dd	22
	dw	53,48,48,48,48,32,105,116
	dw	101,114,97,116,105,111,110,115
	dw	32,116,111,111,107,32
	align	4
_19:
	dd	_bbStringClass
	dd	2147483647
	dd	8
	dw	80,114,105,109,101,115,58,32
	align	4
_20:
	dd	_bbStringClass
	dd	2147483647
	dd	17
	dw	82,101,116,117,114,110,32,116
	dw	111,32,101,110,100,32,46,46
	dw	46

I also took a look at the /bin folder and guess what? No gcc anymore

Mark seems to use FASM now, and that means Fred should be able to kick BMX's ass
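
For anyone who wants to read the data section of the listing above: the `dd` hex values look like raw 32-bit float bit patterns and the `dw` rows like character codes. Below is a minimal Python sketch for decoding them. The helper names are made up for illustration, and the record layout (class pointer, a constant, the length, then 16-bit characters) is only inferred from the listing, not taken from any BlitzMax documentation.

```python
import struct

def decode_float32(bits):
    """Reinterpret a 32-bit pattern (as written after `dd`) as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def decode_bb_string(length, code_units):
    """Turn the `dw` values of a string constant into text.

    Assumption: the third `dd` of each string record is the character
    count and every `dw` value is one 16-bit character code.
    """
    return "".join(chr(c) for c in code_units[:length])

# Values copied from the listing above.
three = decode_float32(0x40400000)   # label _46
one = decode_float32(0x3F800000)     # label _47
label_19 = decode_bb_string(8, [80, 114, 105, 109, 101, 115, 58, 32])
label_18 = decode_bb_string(8, [32, 109, 47, 115, 101, 99, 115, 46])

print(three, one, repr(label_19), repr(label_18))
```

Decoded this way, `_46` is 3.0 and `_47` is 1.0, which fits the `fadd dword [_46]` after the index is doubled and the `fadd dword [_47]` on the counter at `[ebp-8]`.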

Sebe · Post by **Sebe** » Mon Mar 20, 2006 2:10 am

Here's the ASM output for the integer-only sieve test. Together with the sieve.zip in my second post, one should be able to compare the BMX and the PB ASM output and optimize:

Code:

	format	MS COFF
	extrn	___bb_blitz_blitz_
	extrn	___bb_standardio_standardio_
	extrn	___bb_system_system_
	extrn	_bbArrayNew1D
	extrn	_bbEnd
	extrn	_bbGCCollect
	extrn	_bbGCSetMode
	extrn	_bbStringClass
	extrn	_bbStringConcat
	extrn	_bbStringFromInt
	extrn	_brl_standardio_Input
	extrn	_brl_standardio_Print
	extrn	_brl_system_MilliSecs
	public	__bb_main
	public	_bb_t
	section	"code" code
__bb_main:
	push	ebp
	mov	ebp,esp
	push	ebx
	push	esi
	push	edi
	cmp	dword [_40],0
	je	_41
	mov	eax,0
	pop	edi
	pop	esi
	pop	ebx
	pop	ebp
	ret
_41:
	mov	dword [_40],1
	mov	ebx,0
	call	___bb_blitz_blitz_
	call	___bb_system_system_
	call	___bb_standardio_standardio_
	push	2
	call	_bbGCSetMode
	add	esp,4
	push	_25
	call	_brl_standardio_Print
	add	esp,4
	mov	eax,dword [_26]
	and	eax,1
	cmp	eax,0
	jne	_27
	call	_brl_system_MilliSecs
	mov	dword [_bb_t],eax
	or	dword [_26],1
_27:
	mov	edi,1
	jmp	_29
_7:
	push	8191
	push	_30
	call	_bbArrayNew1D
	add	esp,8
	mov	ecx,eax
	mov	ebx,0
	mov	edx,0
	jmp	_36
_10:
	mov	dword [ecx+edx*4+24],1
_8:
	add	edx,1
_36:
	cmp	edx,8190
	jle	_10
_9:
	mov	edx,0
	jmp	_37
_13:
	cmp	dword [ecx+edx*4+24],1
	jne	_38
	mov	eax,edx
	add	eax,edx
	add	eax,3
	mov	esi,edx
	add	esi,eax
	jmp	_14
_16:
	mov	dword [ecx+esi*4+24],0
	add	esi,eax
_14:
	cmp	esi,8190
	jle	_16
_15:
	add	ebx,1
_38:
_11:
	add	edx,1
_37:
	cmp	edx,8190
	jle	_13
_12:
	call	_bbGCCollect
_5:
	add	edi,1
_29:
	cmp	edi,50000
	jle	_7
_6:
	call	_brl_system_MilliSecs
	sub	eax,dword [_bb_t]
	mov	dword [_bb_t],eax
	push	_18
	push	dword [_bb_t]
	call	_bbStringFromInt
	add	esp,4
	push	eax
	push	_39
	call	_bbStringConcat
	add	esp,8
	push	eax
	call	_bbStringConcat
	add	esp,8
	push	eax
	call	_brl_standardio_Print
	add	esp,4
	push	ebx
	call	_bbStringFromInt
	add	esp,4
	push	eax
	push	_19
	call	_bbStringConcat
	add	esp,8
	push	eax
	call	_brl_standardio_Print
	add	esp,4
	push	_20
	call	_brl_standardio_Input
	add	esp,4
	call	_bbEnd
	mov	eax,0
	jmp	_21
_21:
	pop	edi
	pop	esi
	pop	ebx
	pop	ebp
	ret
	section	"data" data writeable align 8
	align	4
_40:
	dd	0
	align	4
_25:
	dd	_bbStringClass
	dd	2147483647
	dd	40
	dw	83,73,69,86,69,32,79,70
	dw	32,69,82,65,84,79,83,84
	dw	72,69,78,69,83,32,45,32
	dw	53,48,48,48,48,32,105,116
	dw	101,114,97,116,105,111,110,115
	align	4
_26:
	dd	0
	align	4
_bb_t:
	dd	0
_30:
	db	"i",0
	align	4
_18:
	dd	_bbStringClass
	dd	2147483647
	dd	8
	dw	32,109,47,115,101,99,115,46
	align	4
_39:
	dd	_bbStringClass
	dd	2147483647
	dd	22
	dw	53,48,48,48,48,32,105,116
	dw	101,114,97,116,105,111,110,115
	dw	32,116,111,111,107,32
	align	4
_19:
	dd	_bbStringClass
	dd	2147483647
	dd	8
	dw	80,114,105,109,101,115,58,32
	align	4
_20:
	dd	_bbStringClass
	dd	2147483647
	dd	17
	dw	82,101,116,117,114,110,32,116
	dw	111,32,101,110,100,32,46,46
	dw	46
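
Read back into a high-level form, the integer listing above appears to be the classic sieve loop over 8191 flags. Here is a Python sketch of one pass. It is reconstructed from the assembly, not from the original .bmx source, so the names `sieve_pass`, `flags`, `prime` and `k` are guesses.

```python
SIZE = 8190  # highest index, as in `cmp edx,8190`

def sieve_pass(size=SIZE):
    """One pass of the sieve; returns the number of primes found."""
    flags = [1] * (size + 1)       # bbArrayNew1D is called with 8191 elements
    count = 0
    for i in range(size + 1):
        if flags[i] == 1:
            prime = i + i + 3      # flags[i] stands for the odd number 2*i + 3
            k = i + prime
            while k <= size:       # strike out the odd multiples of prime
                flags[k] = 0
                k += prime
            count += 1
    return count

print(sieve_pass())
```

One pass should report 1899 primes. The benchmark simply repeats this pass 50000 times (the `cmp edi,50000` loop), with a `bbGCCollect` call after each pass.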

dmoc · Post by **dmoc** » Mon Mar 20, 2006 9:50 am

In real-world coding bmax will probably (most likely?) be slower - it's to do with how structured vars are stored.

Sebe · Post by **Sebe** » Mon Mar 20, 2006 11:25 am

You know PB is faster, I know PB is faster

But there are many people out there who don't know whether to choose PB or BMX, and I almost get sick when I read someone saying BMX is faster just because it's faster in an integer(!) iteration(!!) benchmark

Sebe · Post by **Sebe** » Mon Mar 20, 2006 4:01 pm

Some little "PB kicks BMX's ass" stuff for you people

http://www.kudoscry.com/public/pi.zip

PUREpi - 8192 Bytes - 250000 drops
Float: 4031 m/secs
Double: 4219 m/secs

piMAX - 51712 Bytes - 250000 drops
Float: 23797 m/secs
Double: 24164 m/secs

PC: AMD Athlon64 3500+ Venice - 1024 MB DDR400 RAM

Feel free to post your benchmarks here.

Edit: YES, both were compiled with no debugger
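
The pi archive isn't reproduced in this thread, so what a "drop" is can only be guessed. One common reading is a Monte Carlo estimate: drop random points into the unit square and count how many land inside the quarter circle. The Python sketch below assumes exactly that. `estimate_pi` and its parameters are hypothetical names, not taken from pi.zip.

```python
import random

def estimate_pi(drops, seed=0):
    """Estimate pi from `drops` random points in the unit square.

    Assumption: this is what the benchmark calls a "drop"; the real
    pi.zip sources may do something different.
    """
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    inside = 0
    for _ in range(drops):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:   # point fell inside the quarter circle
            inside += 1
    return 4.0 * inside / drops

print(estimate_pi(250_000))
```

A loop like this is almost pure floating-point arithmetic plus a random number call per coordinate, so the speed of the random number generator can matter as much as the float math itself.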

Nik · Post by **Nik** » Mon Mar 20, 2006 4:23 pm

Hmm, what I don't understand is whether less means better. In the first post the text suggests it does not, but from the PB source code it seems so. Also, why the hell does the BMX example use floats to measure the time with GetTickCount? That's just nonsense.

Sebe · Post by **Sebe** » Mon Mar 20, 2006 4:30 pm

m/secs means milliseconds, so less is better.
And I used floats to measure the time because I wanted to do as much as possible with floats. Never mind, for the second one I used integers for measuring the time.
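
The float timing question can be checked with a few lines. The float listing stores the tick count with `fstp dword`, i.e. in a 32-bit float, and such a float only has 24 bits of mantissa. A rough Python sketch of the effect follows, assuming the counter is milliseconds since boot (as GetTickCount is); the tick values are invented for the example.

```python
import struct

def to_float32(value):
    """Round a number to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def elapsed_ms_float32(start_ticks, end_ticks):
    """Elapsed time when the start tick is kept in a 32-bit float.

    Follows the float listing: the start value goes through a float32
    store, the end value is loaded as an exact integer, and the
    difference is stored as float32 again.
    """
    start = to_float32(start_ticks)
    return to_float32(end_ticks - start)

# Hypothetical tick values: about 27.8 hours after boot, a run of 30940 ms.
start = 100_000_001
end = start + 30_940
print(to_float32(start))               # the start tick as actually stored
print(elapsed_ms_float32(start, end))  # compare with the true 30940
```

Above 16,777,216 ms (2^24, roughly 4.7 hours of uptime) a 32-bit float can no longer hold every millisecond value, so the start tick gets rounded. The error stays small compared with a run of several seconds, so it would not change the rankings here, but an integer timer avoids the problem altogether.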

Nik · Post by **Nik** » Mon Mar 20, 2006 4:36 pm

So PureBasic was better than BMX in the first post, which makes me wonder what GedB means with

Looking at the code, I'm thinking that PB is losing out in the Array look ups.

Does Blitzmax allow you to get the ASM?

thefool · Post by **thefool** » Mon Mar 20, 2006 4:45 pm

Sebe, my processor is a little faster than yours (same amount), and our BlitzMax pi calc speeds are about the same (mine is a bit faster); however, in NO way are our PureBasic pi speeds the same.. We do talk over double as fast as bmax, but NO WAY you could get 4 milliseconds

Are you sure you used the correct number of drops?

Post by **Fred** » Mon Mar 20, 2006 4:50 pm

When doing speed tests on arithmetic, you have to remove all display stuff and such, else it's nonsense (for example the console output in your exe).

Nik · Post by **Nik** » Mon Mar 20, 2006 4:57 pm

BTW both examples run charmingly in Wine (I don't want to move the 4 m to my Win box ^^), and the PB example is still much faster, so we can at least say PB's implementation of Print() is much better and works even with the faked Wine API

thefool · Post by **thefool** » Mon Mar 20, 2006 4:58 pm

Fred wrote: When doing speed tests on arithmetic, you have to remove all display stuff and such, else it's nonsense (for example the console output in your exe).

Definitely true, printing to the console takes a lot of power..

Better just print "On the run"
and "Done.. Took xxxx ms"
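
A minimal Python sketch of that advice: print once before and once after, and keep all output out of the timed region. `time_ms` and `sum_of_squares` are illustrative names, and the workload is just a stand-in for the real benchmark loop.

```python
import time

def time_ms(work, *args):
    """Run work(*args) once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = work(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def sum_of_squares(n):
    # Stand-in workload: pure arithmetic, no output inside the loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

print("On the run")
result, ms = time_ms(sum_of_squares, 1_000_000)
print(f"Done.. Took {ms:.0f} ms (result {result})")
```

Printing the result after the timer has stopped also gives the compiler (or interpreter) a reason to actually compute it, without the console cost ending up in the measurement.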

Sebe · Post by **Sebe** » Mon Mar 20, 2006 5:24 pm

Fred is right. I removed the PRINT commands from both sources; here's a new test result:

PUREpi - 10 000 000 drops
Floats: 4938 m/secs
Doubles: 5032 m/secs

piMAX - 10 000 000 drops
Floats: 10119 m/secs
Doubles: 9763 m/secs

thefool wrote: We do talk over double as fast as bmax, but NO WAY you could get 4 milliseconds

1. It's ca. 4000 m/secs -> 4 secs
2. I DO get 4 sec (sometimes even a bit under 4 secs).

My system:

Windows XP Pro SP2 all updates
AMD Athlon64 3500+ Venice Core
1024 MB DDR400 RAM
GeForce 6800 GT 256MB VRAM
Seagate SATA2 HDD

I can make a screenshot if you don't trust me