lzma unix vs lzma pb


Post by Paradox »

The LZMA output produced by PB differs from the output of the Unix lzma tool.
How can I make them identical?


Code:

UseLZMAPacker()
Procedure main()
  Protected test$ = "teste teste teste teste teste teste teste teste teste"
  Protected size = StringByteLength(test$)
  Protected *Output = AllocateMemory(size)
  Protected compress_size = CompressMemory(@test$, size, *Output, size,#PB_PackerPlugin_Lzma)
  Protected *uncompres_output = AllocateMemory(size)
  Protected uncompress_size = UncompressMemory(*Output,compress_size,*uncompres_output,size,#PB_PackerPlugin_Lzma)
  Debug PeekS(*uncompres_output,uncompress_size)
  ShowMemoryViewer(*Output,compress_size)
EndProcedure
main()
PB LZMA:
Image

Code:

echo "teste teste teste teste teste teste teste teste teste" | lzma | hexdump -C | cut -c9-60
Unix LZMA:
Image

Re: lzma unix vs lzma pb

Post by infratec »

Hi,

you made several mistakes:

1. PB uses Unicode for text, the Linux shell uses UTF-8.
2. echo without additional parameters always adds an LF at the end.
To avoid this, you have to use -n as parameter.
3. A string in PB always has a terminating 0 at the end.
4. If the supplied string is small, the compressed result can be larger than the string
(because of the header of the compressed data).
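
Point 2 is easy to check on the shell side by counting the bytes that actually reach lzma. This is only a small sketch: it uses printf instead of echo so the result does not depend on the shell's echo, and count_bytes is a helper name made up for this example.

```shell
text='teste teste teste teste teste teste teste teste teste'

# Count the bytes a command writes to stdout.
# tr removes the padding some wc implementations print.
count_bytes() { "$@" | wc -c | tr -d ' '; }

with_lf=$(count_bytes printf '%s\n' "$text")   # same bytes as plain echo: text plus LF
without_lf=$(count_bytes printf '%s' "$text")  # same bytes as echo -n: text only

echo "with LF: $with_lf bytes, without LF: $without_lf bytes"
```

The two counts differ by exactly one byte, the trailing LF. Points 1 and 3 are on the PB side and are not covered by this sketch.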

A better version:

Code:

UseLZMAPacker()
Procedure main()
  Protected *test = UTF8("teste teste teste teste teste teste teste teste teste" + #LF$)
  ; + 32 for header stuff, so we are always safe
  Protected *Output = AllocateMemory(MemorySize(*test) + 32)
  ; - 1 to eliminate the terminating 0
  Protected compress_size = CompressMemory(*test, MemorySize(*test) - 1, *Output, MemorySize(*Output), #PB_PackerPlugin_Lzma)
  ; + 32 to be safe
  Protected *uncompres_output = AllocateMemory(MemorySize(*test) + 32)
  Protected uncompress_size = UncompressMemory(*Output, compress_size, *uncompres_output, MemorySize(*uncompres_output), #PB_PackerPlugin_Lzma)
  Debug PeekS(*uncompres_output,uncompress_size, #PB_UTF8)
  ShowMemoryViewer(*Output, compress_size)
EndProcedure
main()
Better, but still not good: the result looks nearly identical, but the LZMA header is different.
059E05E8 5D 00 00 00 02 00 3A 19 4A D7 09 02 9C 61 2E 9E ].....:.J×..œa.ž
059E05F8 E3 DD FF FF FB 7E 90 00 ãÝÿÿû~.
The compressed bytes starting from 00 3A 19 ... are now identical.
But the 'header' is still different.

Bernd
Last edited by infratec on Fri Apr 20, 2018 10:57 am, edited 2 times in total.

Re: lzma unix vs lzma pb

Post by infratec »

From:
https://en.wikipedia.org/wiki/Lempel%E2 ... _algorithm
the 7-zip LZMA file format, configuration is performed by a header containing the "properties" byte followed by the 32-bit little-endian dictionary size in bytes
This results in a 5-byte header, like the one generated by PB.

From:
https://svn.python.org/projects/externa ... format.txt
The .lzma format file consist of 13-byte Header followed by
the LZMA Compressed Data.
The problem is that LZMA is only a compression algorithm, not a file format.
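
To make the two header layouts concrete, here is a rough shell sketch that decodes the header fields from hex bytes. decode_lzma_header is a hypothetical helper, not part of any tool. The first call uses the five header bytes from the PB dump earlier in the thread. The second call uses an assumed example of a 13-byte .lzma header (8 MiB dictionary, size field all FF, which means "uncompressed size unknown"); those bytes are not taken from this thread.

```shell
# Decode an LZMA header given as hex bytes, properties byte first.
# Hypothetical helper for illustration only.
decode_lzma_header() {
  # properties byte = (pb * 5 + lp) * 9 + lc
  props=$(( 0x$1 ))
  lc=$(( props % 9 ))
  lp=$(( props / 9 % 5 ))
  pb=$(( props / 45 ))
  # dictionary size: 32-bit little-endian in bytes 2..5
  dict=$(( 0x$2 + (0x$3 << 8) + (0x$4 << 16) + (0x$5 << 24) ))
  out="lc=$lc lp=$lp pb=$pb dict=$dict"
  # a 13-byte .lzma header carries 8 more bytes: the uncompressed size
  if [ "$#" -ge 13 ]; then
    shift 5
    out="$out size_field=$1$2$3$4$5$6$7$8"
  fi
  echo "$out"
}

# 5-byte header as written by PB (bytes from the dump above)
decode_lzma_header 5D 00 00 00 02
# prints: lc=3 lp=0 pb=2 dict=33554432

# 13-byte .lzma header (assumed example bytes)
decode_lzma_header 5D 00 00 80 00 FF FF FF FF FF FF FF FF
# prints: lc=3 lp=0 pb=2 dict=8388608 size_field=FFFFFFFFFFFFFFFF
```

So both headers start with the same properties byte; the .lzma file format just adds the 8-byte size field after the dictionary size, and the dictionary size itself depends on the settings each side uses.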