
lzma unix vs lzma pb

Posted: Thu Apr 19, 2018 7:41 pm
by Paradox
The LZMA output produced by PB differs from the output of the Unix lzma tool.
How can I make them identical?


Code:

UseLZMAPacker()
Procedure main()
  Protected test$ = "teste teste teste teste teste teste teste teste teste"
  Protected size = StringByteLength(test$)
  Protected *Output = AllocateMemory(size)
  Protected compress_size = CompressMemory(@test$, size, *Output, size,#PB_PackerPlugin_Lzma)
  Protected *uncompres_output = AllocateMemory(size)
  Protected uncompress_size = UncompressMemory(*Output,compress_size,*uncompres_output,size,#PB_PackerPlugin_Lzma)
  Debug PeekS(*uncompres_output,uncompress_size)
  ShowMemoryViewer(*Output,compress_size)
EndProcedure
main()
PB LZMA:
[Image: memory viewer showing the PB LZMA output]

Code:

echo "teste teste teste teste teste teste teste teste teste" | lzma | hexdump -C | cut -c9-60
Unix LZMA:
[Image: hexdump of the Unix lzma output]

Re: lzma unix vs lzma pb

Posted: Fri Apr 20, 2018 7:20 am
by infratec
Hi,

you made several mistakes:

1. PB uses Unicode (2 bytes per character) for strings, while the Linux shell uses UTF-8.
2. echo without an additional parameter always appends an LF at the end.
To avoid this, use -n as a parameter.
3. A string in PB always has a terminating 0 at the end.
4. If the supplied string is small, the compressed result can be larger than the string
(because of the header added to the compressed data), so the output buffer needs some extra room.
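
To make points 1 to 3 concrete, here is a small sketch in Python (not PB) of the byte sequences that would reach the compressor in each case. It assumes PB keeps strings in memory as 2 bytes per character and that a UTF8() buffer ends with a single 0 byte; the variable names are only for illustration.

```python
# Sketch: different byte sequences for the "same" text give different compressed output.
text = " ".join(["teste"] * 9)  # the test string used in the thread

# Assumption: PB string in memory, 2 bytes per character (terminating 0 not included here)
pb_string_bytes = text.encode("utf-16-le")

# What `echo "..." | lzma` feeds to the compressor: UTF-8 text plus a trailing line feed
shell_bytes = (text + "\n").encode("utf-8")

# Assumed layout of a UTF8() buffer in PB: the UTF-8 data followed by a terminating 0
utf8_buffer = shell_bytes + b"\x00"

print(len(pb_string_bytes), len(shell_bytes), len(utf8_buffer))
```

Only when the input bytes match exactly (UTF-8, trailing LF included, terminating 0 excluded) can the compressed data be expected to match.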

A better version:

Code:

UseLZMAPacker()
Procedure main()
  Protected *test = UTF8("teste teste teste teste teste teste teste teste teste" + #LF$)
  ; + 32 for header stuff, so we are always safe
  Protected *Output = AllocateMemory(MemorySize(*test) + 32)
  ; - 1 to eliminate the terminating 0
  Protected compress_size = CompressMemory(*test, MemorySize(*test) - 1, *Output, MemorySize(*Output), #PB_PackerPlugin_Lzma)
  ; + 32 to be safe
  Protected *uncompres_output = AllocateMemory(MemorySize(*test) + 32)
  Protected uncompress_size = UncompressMemory(*Output, compress_size, *uncompres_output, MemorySize(*uncompres_output), #PB_PackerPlugin_Lzma)
  Debug PeekS(*uncompres_output,uncompress_size, #PB_UTF8)
  ShowMemoryViewer(*Output, compress_size)
EndProcedure
main()
Better, but still not good: the result looks nearly identical, but the LZMA header is different.
059E05E8 5D 00 00 00 02 00 3A 19 4A D7 09 02 9C 61 2E 9E ].....:.J×..œa.ž
059E05F8 E3 DD FF FF FB 7E 90 00 ãÝÿÿû~.
The compressed bytes starting from 00 3A 19 ... are now identical.
But the 'header' is still different.

Bernd

Re: lzma unix vs lzma pb

Posted: Fri Apr 20, 2018 7:38 am
by infratec
From:
https://en.wikipedia.org/wiki/Lempel%E2 ... _algorithm
"In the 7-zip LZMA file format, configuration is performed by a header containing the "properties" byte followed by the 32-bit little-endian dictionary size in bytes."
This results in a 5-byte header like the one generated by PB.
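
As an illustration, the 5-byte header from the PB output above (5D 00 00 00 02) can be decoded like this. It is a Python sketch, not PB code; the function name is made up, and the formula props = (pb * 5 + lp) * 9 + lc is the usual LZMA properties encoding.

```python
import struct

def parse_lzma_header5(header: bytes):
    """Split a 5-byte LZMA header into (lc, lp, pb, dict_size)."""
    if len(header) != 5:
        raise ValueError("expected exactly 5 bytes")
    props = header[0]
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    lc = props % 9           # literal context bits
    lp = (props // 9) % 5    # literal position bits
    pb = props // 45         # position bits
    (dict_size,) = struct.unpack("<I", header[1:5])  # 32-bit little-endian
    return lc, lp, pb, dict_size

# Header bytes taken from the memory dump of the PB output
print(parse_lzma_header5(bytes.fromhex("5D 00 00 00 02")))
```

For the PB header this gives lc=3, lp=0, pb=2 and a dictionary size of 33554432 bytes (32 MiB).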

From:
https://svn.python.org/projects/externa ... format.txt
The .lzma format file consist of 13-byte Header followed by
the LZMA Compressed Data.
The problem is that LZMA is only a compression algorithm and not a file format, which is why PB and the Unix tool can wrap the same compressed data in different headers.
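
The difference between the two layouts is the extra 8-byte uncompressed-size field of the 13-byte .lzma header. A minimal Python sketch (not PB) of converting between them follows. The helper names are hypothetical, and the round trip only uses data compressed by Python's own lzma module, so whether a PB-produced stream decodes this way is an assumption that still needs testing.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # all bits set: "size unknown, rely on the end marker"

def add_size_field(stream5: bytes, uncompressed_size=None) -> bytes:
    """5-byte header + data  ->  13-byte .lzma header + data."""
    size = UNKNOWN_SIZE if uncompressed_size is None else uncompressed_size
    return stream5[:5] + struct.pack("<Q", size) + stream5[5:]

def strip_size_field(alone: bytes) -> bytes:
    """13-byte .lzma header + data  ->  5-byte header + data."""
    return alone[:5] + alone[13:]

data = (" ".join(["teste"] * 9) + "\n").encode("utf-8")

alone = lzma.compress(data, format=lzma.FORMAT_ALONE)  # .lzma layout
short = strip_size_field(alone)                        # 5-byte-header layout
rebuilt = add_size_field(short)                        # back to .lzma layout

print(lzma.decompress(rebuilt, format=lzma.FORMAT_ALONE) == data)
```

If the stream carries no end marker, the real uncompressed size has to be passed to add_size_field instead of leaving it unknown. Even with matching headers, the two tools may use different dictionary sizes or encoder settings, so byte-identical files are not guaranteed.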