Variability in Pack size with optimised code

Posted: Tue Apr 16, 2024 5:20 pm
by mikejs
Hi all,

I have code that, amongst other things, creates a pack file (with UseLZMAPacker()), and I noticed that with PureBasic 6.10 the size of the resulting pack file does not seem to be deterministic. That is, I can run the same code multiple times against the same files and get output files of slightly different sizes each time.

After a bit of poking around, this seems to be a side effect of enabling "Optimise generated code" in the compiler options. But the resulting variation happens at runtime, not at compile time (that is, compiling an exe and running the exe multiple times gives different results).

It also seems to require more than one file being added to the pack.

Some example code:

Code: Select all

UseLZMAPacker()
Define.i pkhnd
pkhnd=CreatePack(#PB_Any, "C:\Temp\TestPack_"+Str(Random(100))+".lzma")
If pkhnd
  AddPackFile(pkhnd, "C:\Program Files\PureBasic\Purebasic.exe", "Purebasic.exe")
  AddPackFile(pkhnd, "C:\Program Files\PureBasic\Purebasic.chm", "Purebasic.chm")
  ClosePack(pkhnd)
EndIf

Running this a few times gives:

Code: Select all

16/04/2024  16:52         7,975,393 TestPack_1.lzma
16/04/2024  16:46         7,975,391 TestPack_40.lzma
16/04/2024  16:49         7,975,390 TestPack_73.lzma
16/04/2024  16:45         7,975,392 TestPack_8.lzma

As far as I can tell, the resulting files are all valid and unpack correctly, so this isn't necessarily a problem. It did, however, come up as an unexpected discrepancy when checking that newer code matched the output of older code.

This effect is reproducible with either backend. Is this expected behaviour?

Re: Variability in Pack size with optimised code

Posted: Tue Apr 16, 2024 5:49 pm
by pjay
My guess is it's due to the LZMA storing the file date metadata. This metadata contains an 'accessed date' field, which will change every time you add the file & could therefore be compressing slightly differently each time a second ticks past.

Re: Variability in Pack size with optimised code

Posted: Tue Apr 16, 2024 10:35 pm
by jacdelad
Try PackMemory() several times and validate whether the result varies in size.
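
A minimal sketch of that check, assuming a current PureBasic version where the in-memory routine is CompressMemory() rather than PackMemory(). The buffer size and the fill pattern are made up for illustration. No file metadata is involved here, so if the reported sizes still differ between runs, the variation would be coming from the compressor itself rather than from stored timestamps.

Code: Select all

UseLZMAPacker()
OpenConsole()

#BufferSize = 1024 * 1024   ; hypothetical test size (1 MiB)

Define *src = AllocateMemory(#BufferSize)
Define *dst = AllocateMemory(#BufferSize * 2)   ; generous output buffer
Define.i i, packed

If *src And *dst
  ; Fill the source with repeatable, compressible data
  For i = 0 To #BufferSize - 1
    PokeA(*src + i, i % 251)
  Next

  ; Compress the same buffer several times and report each result size
  For i = 1 To 10
    packed = CompressMemory(*src, #BufferSize, *dst, #BufferSize * 2, #PB_PackerPlugin_Lzma)
    PrintN("Run " + Str(i) + ": " + Str(packed) + " bytes")
  Next

  FreeMemory(*src)
  FreeMemory(*dst)
EndIf

Input()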

Re: Variability in Pack size with optimised code

Posted: Wed Apr 17, 2024 9:54 am
by mikejs
pjay wrote: Tue Apr 16, 2024 5:49 pm
My guess is it's due to the LZMA storing the file date metadata. This metadata contains an 'accessed date' field, which will change every time you add the file & could therefore be compressing slightly differently each time a second ticks past.

I think this is probably the answer, as this seems to be new in 6.10.

Even if I get the same output size, doing a file compare shows that the contents are different, and that would make sense if the last accessed timestamp is being stored.

What was confusing me, though, was that the difference in size seemed to be related to the compilation options: this is a lot easier to reproduce with "Optimise generated code" turned on. Not sure why that would affect it.

Either way, I'll stop worrying about this. It does unfortunately mean that the output is going to be different every time, which makes sanity checking that new code works the same as old code difficult.
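
One possible way around that, sketched here as an assumption rather than a tested recipe: compare what comes out of the packs instead of the pack files themselves. The ListPackHashes() procedure below is a hypothetical helper, and the two file names are just taken from the directory listing earlier in the thread. It prints the name, unpacked size and SHA-1 of every file entry, so two packs with different stored metadata but identical payloads should produce identical lines.

Code: Select all

UseLZMAPacker()
UseSHA1Fingerprint()
OpenConsole()

; Hypothetical helper: print name, size and SHA-1 of every file entry in a pack
Procedure ListPackHashes(packFile.s)
  Protected pk, size.q, *buf

  pk = OpenPack(#PB_Any, packFile, #PB_PackerPlugin_Lzma)
  If pk = 0
    PrintN("Cannot open " + packFile)
    ProcedureReturn
  EndIf

  If ExaminePack(pk)
    While NextPackEntry(pk)
      size = PackEntrySize(pk)   ; uncompressed size of the current entry
      If PackEntryType(pk) = #PB_Packer_File And size > 0
        *buf = AllocateMemory(size)
        If *buf
          ; Unpack the current entry to memory and hash the unpacked bytes
          If UncompressPackMemory(pk, *buf, size) = size
            PrintN(PackEntryName(pk) + "  " + Str(size) + "  " + Fingerprint(*buf, size, #PB_Cipher_SHA1))
          EndIf
          FreeMemory(*buf)
        EndIf
      EndIf
    Wend
  EndIf

  ClosePack(pk)
EndProcedure

PrintN("Pack A:")
ListPackHashes("C:\Temp\TestPack_1.lzma")
PrintN("Pack B:")
ListPackHashes("C:\Temp\TestPack_40.lzma")

Input()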

Re: Variability in Pack size with optimised code

Posted: Wed Apr 17, 2024 10:21 am
by pjay
As long as you have write access to the files then you could use the SetFileDate() function just before you pack each of them:

Code: Select all

UseLZMAPacker()
UseSHA1Fingerprint()
OpenConsole()
Define.i pkhnd, x, fixedDate
file_output.s = "C:\Temp\TestPack_ConstantAccessDate.lzma"
file1.s = "C:\Program Files\PureBasic\Purebasic.exe"
file2.s = "C:\Program Files\PureBasic\Purebasic.chm"
; Constant 'accessed' timestamp applied to each file before it is packed
fixedDate = Date(2024, 4, 17, 0, 0, 0)
For x = 1 To 10
  pkhnd = CreatePack(#PB_Any, file_output)
  If pkhnd
    ; Reset the access date right before each AddPackFile() call
    If Not SetFileDate(file1, #PB_Date_Accessed, fixedDate) : PrintN("Error, no write access to file: " + file1) : EndIf
    AddPackFile(pkhnd, file1, GetFilePart(file1))
    If Not SetFileDate(file2, #PB_Date_Accessed, fixedDate) : PrintN("Error, no write access to file: " + file2) : EndIf
    AddPackFile(pkhnd, file2, GetFilePart(file2))
    ClosePack(pkhnd)
  EndIf

  ; Size and hash should be identical on every pass if the output is deterministic
  PrintN("File size: " + Str(FileSize(file_output)) + " - Hash: " + FileFingerprint(file_output, #PB_Cipher_SHA1))
  DeleteFile(file_output)
Next

Input()