PureBasic crc32 dramatic speed enhancement tip, +zlib q's.

Post by Rescator

I just did some tests on PureBasic's crc32 vs zlib's crc32 (the zlib.lib shipped with PureBasic).

zlib's crc32 is about 2.0 to 2.5 times faster than PureBasic's crc32.
That was a night-and-day difference for me: a minute to check x amount of data vs a few seconds O.o

How? Simple...the crc32 in zlib uses a few logical tricks.
The crc32 loop is designed like this (pseudocode):
if len and (*mem & 3) ;i.e. not on 4byte/32bit alignment
;loop start (repeat while len and (*mem & 3))
;do crc 1 byte at a time
*mem+1 : len-1
;loop end
endif
if len>=32 ;i.e. at least 32 bytes left
;loop start (repeat while len>=32)
;do crc 1 long (4 bytes/32bit) at a time, 8 times in a row (unrolled)
*mem+32 : len-32
;loop end
endif
if len>=4 ;i.e. at least 4 bytes left
;loop start (repeat while len>=4)
;do crc 1 long (4 bytes/32bit) at a time
*mem+4 : len-4
;loop end
endif
if len ;i.e. do any bytes left over.
;loop start (repeat while len)
;do crc 1 byte at a time
*mem+1 : len-1
;loop end
endif
(See crc32.c in zlib source for actual code.)
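For anyone who wants to see the shape of that loop in real code, here is a minimal C sketch of the same structure (C because that is what zlib's crc32.c is written in). This is not zlib's code: the names crc_tab, crc_init, crc_byte, crc_word and crc32_sketch are made up for illustration, each 32-bit word is assembled byte by byte so the sketch does not depend on endianness, and the lazy table init is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; zlib keeps its own tables and macros. */
static uint32_t crc_tab[4][256];
static int crc_tab_ready = 0;

/* Build the classic byte-wise table, then three derived tables that
   let one step consume 4 bytes instead of 1. */
static void crc_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_tab[0][n] = c;
    }
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = crc_tab[0][n];
        for (int k = 1; k < 4; k++) {
            c = crc_tab[0][c & 0xff] ^ (c >> 8);
            crc_tab[k][n] = c;
        }
    }
    crc_tab_ready = 1;
}

/* One byte at a time: used for the unaligned head and the tail. */
static uint32_t crc_byte(uint32_t c, unsigned char b)
{
    return crc_tab[0][(c ^ b) & 0xff] ^ (c >> 8);
}

/* One long (4 bytes) at a time, read as a little-endian word. */
static uint32_t crc_word(uint32_t c, const unsigned char *p)
{
    uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    c ^= w;
    return crc_tab[3][c & 0xff] ^ crc_tab[2][(c >> 8) & 0xff] ^
           crc_tab[1][(c >> 16) & 0xff] ^ crc_tab[0][c >> 24];
}

uint32_t crc32_sketch(uint32_t crc, const unsigned char *buf, size_t len)
{
    if (!crc_tab_ready)
        crc_init();                       /* lazy init, not thread-safe */
    uint32_t c = ~crc;
    while (len && ((uintptr_t)buf & 3)) { /* head: reach 4-byte alignment */
        c = crc_byte(c, *buf++);
        len--;
    }
    while (len >= 32) {                   /* main loop: 8 longs per pass */
        for (int i = 0; i < 8; i++, buf += 4)
            c = crc_word(c, buf);
        len -= 32;
    }
    while (len >= 4) {                    /* remaining whole longs */
        c = crc_word(c, buf);
        buf += 4;
        len -= 4;
    }
    while (len) {                         /* tail: leftover bytes */
        c = crc_byte(c, *buf++);
        len--;
    }
    return ~c;
}
```

Called the same way as zlib's crc32: start with crc=0 and feed the previous result back in for each further chunk. Because every stage is guarded by a check on len, a zero-length buffer just falls through and returns the crc it was given.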
PS! Due to the way len is checked, zlib's crc32 does not crash on 0-length data, unlike PureBasic's function :P (hides from Fred)

Obviously, in most cases the unrolled 32-bytes-at-a-time loop will do most of the work,
as most data we usually run crc32 on is more than 32 bytes long.

Fred, any chance you could try this with PureBasic's crc32 and see if that approaches zlib's crc32 speeds?


And speaking of zlib: I'm not sure how you build zlib.lib for PureBasic,
but it's a very fast one.

I tried to compile zlib myself with PellesC. The goal was to throw away everything except raw deflate, inflate and crc32, so I could use it for some smaller exes etc. (Turns out I didn't save much compared to the hassle; zlib's code is tighter than it looks.)
My build ended up twice as slow as the one shipped with PureBasic x86 (didn't test x64, sorry).

I also did a build of the normal zlib using Visual C++ 8 (VC8) with the following settings:
/O2 /Ob1 /Ot /D "WIN32" /D "NDEBUG" /D "_VC80_UPGRADE=0x0600" /GF /FD /EHsc /MD /GS- /Gy /Fp".\Win32_LIB_Release/zlib.pch" /Fo".\Win32_LIB_Release/" /Fd".\Win32_LIB_Release/" /W3 /nologo /c /Gd /TC /errorReport:prompt

The resulting lib performed about 4% faster in my tests than PureBasic's zlib.lib build.
I checked the exe using a hex editor and found no .dll dependencies beyond the exact same ones as with PureBasic's zlib.lib.
The size was also the same (well, the VC8 build was half a KB smaller).

Could you look into improving zlib.lib's speed like this?
I know, considering how fast it already is, 4% may not be much,
but with crc32, deflate and inflate you want to push as much data through as fast as possible (vital for games or high-performance applications), so 4% does add up when processing a lot of files/data.

Considering I'm a noob when it comes to VC8, it may even be possible to tweak the settings for more speed.

PS! I have not tested any GCC builds of zlib. GCC is supposedly faster, I hear, but I'm not sure in this case; I'd guess the same 4% would be achieved there as well?

The 4% may be nitpicking, but since it's achieved using just the compiler and compiler flags it's an easy one, I hope.

Improving PureBasic's crc needs some re-coding I guess, but considering you might end up 2.0 to 2.5 times faster it's really worth it ;)
Could a similar approach be applied to other data/stream processing code in PureBasic where appropriate?

Here is the speed test code for the crc stuff.
I tested with 100 runs on an AVI file of 368484KB (*100 = a little over 35GB of data) for some empirical results:
zlib.lib's crc32 took: 71347ms
and PB's crc32 took: 149389ms
(A large AVI was chosen since a large amount of varied data is what one usually runs a crc on anyway. The file overhead pulls down each function's performance a little though; a purely in-memory test would make the speed gap bigger than this.)
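To put those two timings in perspective, here is a small C sketch of the arithmetic. The helper names mib_per_sec and speedup are made up for illustration, and it assumes KB means 1024 bytes and that both timings cover the same 100 runs.

```c
/* Hypothetical helpers, only for working through the numbers above. */

/* Throughput in MiB/s for `runs` passes over a file of `file_kb` KB
   (1 KB taken as 1024 bytes) that took `ms` milliseconds in total. */
double mib_per_sec(double file_kb, int runs, double ms)
{
    double total_mib = file_kb * runs / 1024.0;
    return total_mib / (ms / 1000.0);
}

/* How many times faster the quicker routine is. */
double speedup(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}
```

With the figures above, speedup(149389, 71347) comes out at roughly 2.09, mib_per_sec(368484, 100, 71347) at about 504 MiB/s for zlib and mib_per_sec(368484, 100, 149389) at about 241 MiB/s for PB, file reading included.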

Code: Select all

EnableExplicit

ImportC "zlib.lib" ;Use PureBasic's zlib.lib build.
 crc32.l(crc.l,*buf,len.l)
EndImport

Enumeration 1
 #File1
EndEnumeration

Define *inbuf,error.l,inbuflen.l,readlen.l
Define crc32.l,start.l,stop.l,file$,runs.i,i.i

timeBeginPeriod_(1)

runs=10 ;number of passes over the file (the timings quoted above are from 100 runs)
file$="a big movie file here.avi"

inbuflen=64*1024 ;64KB buffer, a good all-round buffer size.
*inbuf=AllocateMemory(inbuflen)
If *inbuf
 If ReadFile(#File1,file$)
  FileBuffersSize(#File1,inbuflen)
  ;If using zlib's crc32() in a threaded app it is wise to call get_crc_table() here (or at the start/init of the app)
  ;to avoid the risk of two crc32 calls trying to init the CRC tables at the same time the first time they are used.

  start=timeGetTime_()
  For i=1 To runs
   FileSeek(#File1,0)
   crc32=#Null
   Repeat
    readlen=ReadData(#File1,*inbuf,inbuflen)
    If readlen
     crc32=crc32(crc32,*inbuf,readlen)
    EndIf
   Until readlen=0
  Next
  stop=timeGetTime_()
CompilerIf #PB_Compiler_Debugger=#False
  MessageRequester("zlib.lib CRC32()",Str(stop-start)+"ms")
CompilerEndIf

  start=timeGetTime_()
  For i=1 To runs
   FileSeek(#File1,0)
   crc32=#Null
   Repeat
    readlen=ReadData(#File1,*inbuf,inbuflen)
    If readlen
     crc32=CRC32Fingerprint(*inbuf,readlen,crc32)
    EndIf
   Until readlen=0
  Next
  stop=timeGetTime_()
CompilerIf #PB_Compiler_Debugger=#False
  MessageRequester("CRC32FingerPrint()",Str(stop-start)+"ms")
CompilerEndIf

  CloseFile(#File1)
 EndIf
 FreeMemory(*inbuf)
EndIf

timeEndPeriod_(1)