lzma unix vs lzma pb

Post by **Paradox** » Thu Apr 19, 2018 7:41 pm

The LZMA output I get in PB differs from the output of the Unix lzma command.
How can I make them identical?

UseLZMAPacker()
Procedure main()
  Protected test$ = "teste teste teste teste teste teste teste teste teste"
  ; Size of the string in bytes (PB's Unicode format, without the terminating 0)
  Protected size = StringByteLength(test$)
  ; Output buffer of the same size as the input
  Protected *Output = AllocateMemory(size)
  Protected compress_size = CompressMemory(@test$, size, *Output, size,#PB_PackerPlugin_Lzma)
  ; Decompress again to check the round trip
  Protected *uncompres_output = AllocateMemory(size)
  Protected uncompress_size = UncompressMemory(*Output,compress_size,*uncompres_output,size,#PB_PackerPlugin_Lzma)
  Debug PeekS(*uncompres_output,uncompress_size)
  ShowMemoryViewer(*Output,compress_size)
EndProcedure
main()

PB LZMA:

echo "teste teste teste teste teste teste teste teste teste" | lzma | hexdump -C | cut -c9-60

Unix LZMA:

Post by **infratec** » Fri Apr 20, 2018 7:20 am

Hi,

you made several mistakes:

1. PB uses Unicode (2 bytes per character) for strings, while the Linux shell uses UTF-8.
2. echo without additional parameters always appends a LF at the end.
To avoid this, you have to use the -n parameter.
3. A string in PB always has a terminating 0 at the end.
4. If the supplied string is small, the compressed result can be larger than the string
(because of the header added to the compressed data).
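The four points can be checked at the byte level. This is a minimal sketch in Python rather than PB, assuming PB's Unicode strings are 2 bytes per character (modelled here as UTF-16LE) and using Python's lzma module with FORMAT_ALONE as a stand-in for the Unix lzma tool. It builds the bytes each side actually hands to the compressor:

```python
import lzma

TEXT = "teste teste teste teste teste teste teste teste teste"

# 1. PB Unicode string (assumed UTF-16LE, 2 bytes per char) vs UTF-8 in the shell
pb_unicode = TEXT.encode("utf-16-le")  # roughly what StringByteLength(test$) measures
utf8 = TEXT.encode("utf-8")

# 2. echo without -n appends a line feed
echo_bytes = utf8 + b"\n"

# 3. PB's UTF8() returns a buffer that ends with a terminating 0 ...
pb_utf8_buffer = echo_bytes + b"\x00"
# ... which MemorySize(*test) - 1 leaves out again
pb_input = pb_utf8_buffer[:-1]

# 4. Tiny inputs grow: the .lzma header alone is 13 bytes
tiny = b"te"
tiny_packed = lzma.compress(tiny, format=lzma.FORMAT_ALONE)

print(len(pb_unicode), len(utf8), len(echo_bytes), len(pb_utf8_buffer))
print(pb_input == echo_bytes)
print(len(tiny), "->", len(tiny_packed))
```

Once the terminating 0 is dropped, pb_input and echo_bytes are the same 54 bytes, which is what the improved PB version below aims for by appending #LF$ and passing MemorySize(*test) - 1.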

A better version:

UseLZMAPacker()
Procedure main()
  Protected *test = UTF8("teste teste teste teste teste teste teste teste teste" + #LF$)
  ; + 32 for header stuff, so we are always safe
  Protected *Output = AllocateMemory(MemorySize(*test) + 32)
  ; - 1 to eliminate the terminating 0
  Protected compress_size = CompressMemory(*test, MemorySize(*test) - 1, *Output, MemorySize(*Output), #PB_PackerPlugin_Lzma)
  ; + 32 to be safe
  Protected *uncompres_output = AllocateMemory(MemorySize(*test) + 32)
  Protected uncompress_size = UncompressMemory(*Output, compress_size, *uncompres_output, MemorySize(*uncompres_output), #PB_PackerPlugin_Lzma)
  Debug PeekS(*uncompres_output,uncompress_size, #PB_UTF8)
  ShowMemoryViewer(*Output, compress_size)
EndProcedure
main()

Better, but still not good: the result looks nearly identical, yet the LZMA header is different.

059E05E8 5D 00 00 00 02 00 3A 19 4A D7 09 02 9C 61 2E 9E ].....:.J×..a.
059E05F8 E3 DD FF FF FB 7E 90 00 ãÝÿÿû~.

The compressed bytes starting from 00 3A 19 ... are now identical.
But the 'header' is still different.

Bernd

Post by **infratec** » Fri Apr 20, 2018 7:38 am

From:
https://en.wikipedia.org/wiki/Lempel%E2 ... _algorithm

the 7-zip LZMA file format, configuration is performed by a header containing the "properties" byte followed by the 32-bit little-endian dictionary size in bytes

This results in a 5-byte header, like the one generated by PB.
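Those 5 bytes can be decoded directly from the memory viewer dump in the earlier post (5D 00 00 00 02). A small Python sketch, using the LZMA SDK packing of the properties byte as (pb * 5 + lp) * 9 + lc; decode_props is a hypothetical helper name:

```python
# First 5 bytes of the PB output shown in the memory viewer dump
pb_header = bytes([0x5D, 0x00, 0x00, 0x00, 0x02])

def decode_props(header: bytes) -> dict:
    """Decode an LZMA properties byte plus 32-bit little-endian dictionary size."""
    props = header[0]
    # props = (pb * 5 + lp) * 9 + lc
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    dict_size = int.from_bytes(header[1:5], "little")
    return {"lc": lc, "lp": lp, "pb": pb, "dict_size": dict_size}

info = decode_props(pb_header)
print(info)  # {'lc': 3, 'lp': 0, 'pb': 2, 'dict_size': 33554432}
```

So PB uses the usual lc=3, lp=0, pb=2 settings with a 32 MiB (0x02000000) dictionary.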

From:
https://svn.python.org/projects/externa ... format.txt

The .lzma format file consist of 13-byte Header followed by
the LZMA Compressed Data.
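For comparison, the 13-byte .lzma header splits into three fields: properties byte, 32-bit dictionary size, and 64-bit uncompressed size, all little-endian. This sketch assumes Python's lzma module with FORMAT_ALONE (backed by liblzma from XZ Utils) writes the same container as the Unix lzma command; DATA mirrors the bytes of the echo pipeline:

```python
import lzma

# Same bytes as: echo "teste ... teste" (the trailing LF included)
DATA = b"teste teste teste teste teste teste teste teste teste\n"

# FORMAT_ALONE is liblzma's name for the legacy .lzma container
packed = lzma.compress(DATA, format=lzma.FORMAT_ALONE)

header, payload = packed[:13], packed[13:]
props = header[0]
dict_size = int.from_bytes(header[1:5], "little")
raw_size = int.from_bytes(header[5:13], "little")
# All 0xFF bytes in the size field mean "size unknown, the stream ends with an end marker"
size_known = raw_size != 0xFFFFFFFFFFFFFFFF

print(hex(props), dict_size, size_known, len(payload))
```

With liblzma's default preset the dictionary size is 8 MiB and the size field is left as "unknown", so even with identical compressed bytes the header would not match PB's 5 bytes.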

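The problem is that LZMA is only a compression algorithm, not a file format. PB writes just the 5-byte header (properties byte plus dictionary size) in front of the compressed data, while the .lzma file format adds the 64-bit uncompressed size after those 5 bytes, giving 13 bytes.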
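Since the two layouts differ only by the 8-byte size field, output in PB's layout can be wrapped as a .lzma file by inserting that field after the first 5 bytes. A Python sketch with hypothetical helper names; the round trip below uses liblzma output, not real PB output, so it is untested against PB itself. For PB data, passing the known uncompressed size (54 bytes in this thread) is probably the safer choice, since the thread does not show whether PB's stream ends with an end marker:

```python
import lzma

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF

def to_lzma_file(pb_packed: bytes, uncompressed_size: int = UNKNOWN_SIZE) -> bytes:
    """Turn '5-byte header + LZMA data' into a .lzma file with a 13-byte header."""
    if len(pb_packed) < 5:
        raise ValueError("need at least the 5-byte properties header")
    # Keep properties + dictionary size, then insert the 64-bit little-endian size
    return pb_packed[:5] + uncompressed_size.to_bytes(8, "little") + pb_packed[5:]

def to_pb_layout(lzma_file: bytes) -> bytes:
    """Inverse: drop the 8-byte size field from a .lzma file."""
    return lzma_file[:5] + lzma_file[13:]

# Round trip with liblzma output, which always stores an unknown size
data = b"teste teste teste teste teste teste teste teste teste\n"
alone = lzma.compress(data, format=lzma.FORMAT_ALONE)
pb_like = to_pb_layout(alone)
rebuilt = to_lzma_file(pb_like)
print(rebuilt == alone, lzma.decompress(rebuilt, format=lzma.FORMAT_ALONE) == data)
```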

PureBasic Forums - English
