[Added!] Compression: Allow level (fast vs ratio)

Keya
Addict
Posts: 1891
Joined: Thu Jun 04, 2015 7:10 am

Post by Keya »

P1. It would be great if PB exposed the compression level (fast vs. strong). PB uses zlib hard-coded at a level that's a good middle ground, but it is neither particularly fast nor particularly strong, and quite often we want either fast or strong rather than the middle ground.

I'd imagine it would be trivial to add support for, especially as zlib already takes the level as a simple integer parameter. It's right there but we can't touch it. "SetPackLevel(level.i)" perhaps!? I don't know if lzma/brieflz etc. support levels, but even if only zip supported it, it'd still be worth it. I thought from my tests that PB was using hard-coded level 6, but it appears to be 8, as simply patching this byte to any value from 1 to 9 changes the speed/ratio:

Code:

0040CC50  /$  FF7424 10     push dword ptr [esp+10]
0040CC54  |.  FF7424 10     push dword ptr [esp+10]
0040CC58  |.  6A 00         push 0
0040CC5A  |.  6A 08         push 8 <-  it's right here in compiled exe but no access to it via CompressMemory()!
0040CC5C  |.  6A 0F         push 0F
0040CC5E  |.  6A 08         push 8
0040CC60  |.  FF7424 20     push dword ptr [esp+20]
0040CC64  |.  FF7424 20     push dword ptr [esp+20]
0040CC68  |.  E8 B3FDFFFF   call PureBasi.0040CA20
0040CC6D  |.  83C4 20       add esp, 20
0040CC70  \.  C3            retn
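
For reference, the four constants pushed above (0, 8, 0F, 8) line up with the defaults that zlib's deflateInit_ passes on to deflateInit2_: strategy 0, memLevel 8, windowBits 15 and method 8 (Z_DEFLATED), with the level forwarded from the caller. If that reading is right, the patched byte would be memLevel rather than the level, which also shifts speed and ratio. Either way, both knobs are plain integers. Below is a minimal sketch using Python's zlib module as a stand-in for the C library (this is not PB code, and the `deflate` helper name is made up for illustration):

```python
import zlib


def deflate(data: bytes, level: int = 6, mem_level: int = 8) -> bytes:
    """Compress with an explicit zlib level and memLevel (hypothetical helper)."""
    # Argument order mirrors deflateInit2: level, method, windowBits, memLevel, strategy.
    comp = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, zlib.Z_DEFAULT_STRATEGY)
    return comp.compress(data) + comp.flush()


# Synthetic sample data, only to show that each knob changes the output size.
sample = b"PureBasic packer test data. " * 4096

for level in (1, 6, 9):
    print("level", level, "->", len(deflate(sample, level=level)), "bytes")

for mem_level in (1, 8, 9):
    print("memLevel", mem_level, "->", len(deflate(sample, mem_level=mem_level)), "bytes")
```

A SetPackLevel()-style command would presumably only need to forward one such integer to the underlying library call.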

P2. And if we're adding parameter support for the compression level, seeing as it's 2017 and more options are available, perhaps adding zstd (http://www.zstd.net) might be the way to go? I know there are a million different compression libs, but this one is worth serious consideration. Note that zstd is not a deflate/inflate implementation: it is Facebook's own algorithm with its own format, so it cannot read or write zlib streams, and zlib would have to stay available alongside it to keep compatibility with existing data and older/prebuilt PB binaries that use zlib. In the comparison below, zstd beats zlib for speed at similar ratios and also reaches better ratios, and it's just a BSD license. And while zlib supports 9 levels, zstd supports 22. Just a simple C lib. Its decompressor is also verrry fast (see image below), although this post only considers compressing.
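
Zstandard frames and zlib streams are different formats that start with different bytes, so a packer offering both could tell existing zlib data apart from new zstd data. A rough sketch in Python (not PB code; `sniff_format` is a hypothetical helper, and the check only looks at the header bytes, not the whole stream):

```python
import zlib

# Zstandard frame magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def sniff_format(blob: bytes) -> str:
    """Guess the container format from the first bytes (hypothetical helper)."""
    if blob[:4] == ZSTD_MAGIC:
        return "zstd"
    # A zlib header has compression method 8 in the low nibble of the first
    # byte, and the first two bytes read big-endian are a multiple of 31.
    if len(blob) >= 2 and (blob[0] & 0x0F) == 8 and ((blob[0] << 8) | blob[1]) % 31 == 0:
        return "zlib"
    return "unknown"


print(sniff_format(zlib.compress(b"hello")))
print(sniff_format(ZSTD_MAGIC + b"\x00"))
```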

I did a quick comparison, compressing shell32.dll (8MB):

Code:

Sorted by compression _SIZE_:
ZStd L22:   3827ms. Size=2629347 = 31.0749320984%
ZStd L15:   1412ms. Size=2856344 = 33.7576980591%
ZStd L12:   714ms.  Size=2866242 = 33.8746757507%
ZStd L11:   793ms.  Size=2872718 = 33.9512138367%
ZStd L10:   659ms.  Size=2910523 = 34.3980102539%
ZStd L09:   721ms.  Size=2915126 = 34.4524116516%
ZStd L08:   605ms.  Size=2925838 = 34.5790100098%
ZStd L07:   405ms.  Size=2944612 = 34.8008918762%
ZStd L06:   383ms.  Size=2967094 = 35.0665931702%
ZStd L05:   317ms.  Size=3111381 = 36.7718505859%
ZStd L04:   288ms.  Size=3148635 = 37.2121353149%
ZLib L9:    1497ms. Size=3215743 = 38.0052528381%
ZLib L8:    878ms.  Size=3221524 = 38.0735740662%
ZLib L7:    494ms.  Size=3230386 = 38.1783103943%
ZLib L6:    442ms.  Size=3236379 = 38.2491378784%
*PB Zlib:   504ms.  Size=3236384 = 38.2491989136% <-
ZStd L03:   179ms.  Size=3258843 = 38.5146293640%
ZLib L5:    404ms.  Size=3259220 = 38.5190849304%
ZLib L4:    346ms.  Size=3300830 = 39.0108528137%
ZLib L3:    285ms.  Size=3368303 = 39.8082809448%
ZStd L02:   108ms.  Size=3396132 = 40.1371803284%
ZLib L2:    258ms.  Size=3412803 = 40.3342056274%
ZLib L1:    211ms.  Size=3471516 = 41.0281066895%
ZStd L01:   76ms.   Size=3557520 = 42.0445442200%

Sorted by compression _TIME_:
ZStd L01:   76ms.   Size=3557520 = 42.0445442200%
ZStd L02:   108ms.  Size=3396132 = 40.1371803284%
ZStd L03:   179ms.  Size=3258843 = 38.5146293640%
ZLib L1:    211ms.  Size=3471516 = 41.0281066895%
ZLib L2:    258ms.  Size=3412803 = 40.3342056274%
ZLib L3:    285ms.  Size=3368303 = 39.8082809448%
ZStd L04:   288ms.  Size=3148635 = 37.2121353149%
ZStd L05:   317ms.  Size=3111381 = 36.7718505859%
ZLib L4:    346ms.  Size=3300830 = 39.0108528137%
ZStd L06:   383ms.  Size=2967094 = 35.0665931702%
ZLib L5:    404ms.  Size=3259220 = 38.5190849304%
ZStd L07:   405ms.  Size=2944612 = 34.8008918762%
ZLib L6:    442ms.  Size=3236379 = 38.2491378784%
ZLib L7:    494ms.  Size=3230386 = 38.1783103943%
*PB Zlib:   504ms.  Size=3236384 = 38.2491989136% <-
ZStd L08:   605ms.  Size=2925838 = 34.5790100098%
ZStd L10:   659ms.  Size=2910523 = 34.3980102539%
ZStd L12:   714ms.  Size=2866242 = 33.8746757507%
ZStd L09:   721ms.  Size=2915126 = 34.4524116516%
ZStd L11:   793ms.  Size=2872718 = 33.9512138367%
ZLib L8:    878ms.  Size=3221524 = 38.0735740662%
ZStd L15:   1412ms. Size=2856344 = 33.7576980591%
ZLib L9:    1497ms. Size=3215743 = 38.0052528381%
ZStd L22:   3827ms. Size=2629347 = 31.0749320984%
Image
Image
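
The comparison above boils down to: compress the same buffer once per level, time it, and report the compressed size as a percentage of the original (the percentages imply an input of about 8.46 million bytes, e.g. 2629347 / 0.310749 ≈ 8461312). A minimal sketch of that loop for the zlib half only, in Python rather than PB, with synthetic input standing in for shell32.dll and `bench_zlib` being a made-up helper name:

```python
import time
import zlib


def bench_zlib(data: bytes, levels=range(1, 10)):
    """Return (level, milliseconds, size, percent_of_original) per zlib level."""
    rows = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        rows.append((level, elapsed_ms, len(packed), 100.0 * len(packed) / len(data)))
    return rows


if __name__ == "__main__":
    # Synthetic stand-in input; the numbers in the thread came from shell32.dll.
    data = bytes(range(256)) * 4096 + b"PureBasic" * 10000
    rows = bench_zlib(data)
    print("Sorted by compression _SIZE_:")
    for level, ms, size, pct in sorted(rows, key=lambda r: r[2]):
        print(f"ZLib L{level}: {ms:.0f}ms. Size={size} = {pct:.4f}%")
    print("Sorted by compression _TIME_:")
    for level, ms, size, pct in sorted(rows, key=lambda r: r[1]):
        print(f"ZLib L{level}: {ms:.0f}ms. Size={size} = {pct:.4f}%")
```

Single-run timings like these are noisy, so results within a few percent of each other should not be read as a real difference.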

P3. If zstd isn't an option, then zlib itself could do with an update. Here are some changes since the 1.2.8 build currently in PB:

Code:

Version 1.2.11 has these key improvements over 1.2.10:
    Fix deflate stored bug when pulling last block from window
    Permit immediate deflateParams changes before any deflate input 
Due to the bug fixes, any installations of 1.2.9 or 1.2.10 should be immediately replaced with 1.2.11.

Version 1.2.10 has these key improvements over 1.2.9:
    Fix bug in deflate_stored() for zero-length input
    Fix bug in gzwrite.c that produced corrupt gzip files 

Version 1.2.9 has these key improvements over 1.2.8:
    Improve compress() and uncompress() to support large lengths
    Allow building zlib outside of the source directory
    Fix bug when level 0 used with Z_HUFFMAN or Z_RLE
    Fix bugs in creating a very large gzip header
    Add uncompress2() function, which returns the input size used
    Dramatically speed up deflation for level 0 (storing)
    Add gzfread() and gzfwrite(), duplicating the interfaces of fread() and fwrite()
    Add crc32_z() and adler32_z() functions with size_t lengths
    Many portability improvements
Last edited by Keya on Sat Jan 28, 2017 12:44 pm, edited 1 time in total.
Lunasole
Addict
Posts: 1091
Joined: Mon Oct 26, 2015 2:55 am
Location: UA

Re: Compression: Allow level (fast vs ratio), and Zstd vs Zl

Post by Lunasole »

That would be a useful improvement, like the one with HTTP headers.
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
Michael Vogel
Addict
Posts: 2677
Joined: Thu Feb 09, 2006 11:27 pm

Re: Compression: Allow level (fast vs ratio), and Zstd vs Zl

Post by Michael Vogel »

Would be a nice feature (as for SaveImage :|), but I would also ask for each packing lib to be split into two parts, compressing and uncompressing (as is done with the image libraries).

This would be useful if the decompressing and compressing routines are more or less independent, so that (at least) the uncompressing lib would be much smaller than the complete packing lib is now. Sometimes programs only need to decompress prepared data, and a 200k+ rucksack just for uncompressing data is hard to carry.
skywalk
Addict
Posts: 3996
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: [Added!] Compression: Allow level (fast vs ratio)

Post by skywalk »

+1 for zstd!
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum