Unpack PB ZIP archive with umlauts and subdirectory

PeDe
Enthusiast
Posts: 281
Joined: Sun Nov 26, 2017 3:13 pm

Post by PeDe »

PB v6.11/6.12b2 x64, Windows 7 x64

I created a ZIP archive with PB today. That worked, but I have problems with it.

Directory and file names with umlauts such as "ÜÖÄßüöää" are not restored when unpacking, but are replaced by other characters.

Unpacking with Windows Explorer restores the subdirectories, but other programs such as 7-Zip v23.01 unpack all files without subdirectories.

EDIT: Unpacking with 7-Zip now works, I had to use #NPS$ (/) instead of the Windows standard #PS$ (\) in the paths for AddPackFile().
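
The separator fix can be sketched as a small helper. This is only a minimal sketch: `ZipEntryName()` and its two parameters are hypothetical names, not part of PB. It assumes the ZIP convention that entry names use forward slashes and writes "/" literally rather than relying on #NPS$.

```purebasic
; Hypothetical helper: build the name stored inside the archive.
; basePath$ is the folder being packed (with trailing separator),
; fullPath$ is a file somewhere below it.
Procedure.s ZipEntryName(fullPath$, basePath$)
  Protected rel$ = Mid(fullPath$, Len(basePath$) + 1) ; strip the base folder
  ProcedureReturn ReplaceString(rel$, "\", "/")        ; ZIP entry names use "/"
EndProcedure

Debug ZipEntryName("C:\data\sub\file.txt", "C:\data\") ; sub/file.txt
```

The result would then be passed as the third parameter, for example AddPackFile(0, fullPath$, ZipEntryName(fullPath$, basePath$)).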

Is this known and normal?

I cannot distribute ZIP archives in this state. When I create ZIP archives with Windows Explorer, everything works without any problems.

Peter

EDIT 2: Files with umlauts in the ZIP archive cannot be unpacked with PB.

Post by PeDe »

It looks like PB saves the file names in UTF-8 format in the ZIP archive. According to RFC 1952, file names must be saved in ISO 8859-1 (Latin-1) format; note, however, that RFC 1952 describes the gzip format rather than ZIP. This information comes from the website zlib.net. I assume this library is used by PB.

Edit: PB apparently uses libarchive v3.6.2, which in turn uses zlib v1.2.12. At least that's what I can read in a compiled PB exe.

Peter

Post by PeDe »

I took a closer look at the ZIP file created with PB. I noticed two things.

1. The file names are not UTF-8 encoded but use some other encoding; I don't know exactly which one.
Here is a comparison with the data of the PB ZIP archive and the output of UTF8("DATEI_Ö_.txt")

Code: Select all

Original: DATEI_Ö_.txt
                       D  A  T  E  I  _  Ö     _  .  t  x  t
PB ZIP: DATEI_Ç-_.txt [44 41 54 45 49 5F C7 2D 5F 2E 74 78 74]
UTF8(): DATEI_Ö_.txt  [44 41 54 45 49 5F C3 96 5F 2E 74 78 74]

2. If the file names are encoded in UTF-8, bit 11 must also be set in the 'general purpose bit flag'.
This is not the case with the PB ZIP archive. When I create a ZIP archive with 7-Zip, bit 11 is set and the file names are encoded correctly.
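
Both observations can be checked without a hex editor. The following is a rough sketch, not a full ZIP parser: it assumes the archive starts with a local file header at offset 0 and only looks at the first entry. "test.zip" is a placeholder file name. The field offsets (flags at 6, name length at 26, name at 30) follow the local file header layout of the ZIP format; the central directory at the end of the archive carries its own copy of the flag, which this sketch does not read.

```purebasic
EnableExplicit

; Sketch: inspect the first local file header of a ZIP archive.
Define file$ = "test.zip"   ; placeholder, replace with a real archive
Define flags, nameLen, i, bytes$

If ReadFile(0, file$)
  If ReadLong(0) = $04034B50               ; local file header signature "PK" 03 04
    ReadWord(0)                            ; version needed to extract (skipped)
    flags = ReadWord(0) & $FFFF            ; general purpose bit flag (offset 6)
    FileSeek(0, 26)                        ; file name length field
    nameLen = ReadWord(0) & $FFFF
    FileSeek(0, 30)                        ; file name follows the 30-byte fixed part
    For i = 1 To nameLen
      bytes$ + RSet(Hex(ReadAsciiCharacter(0)), 2, "0") + " "
    Next
    Debug "Flags: $" + RSet(Hex(flags), 4, "0")
    If (flags & $0800)
      Debug "Bit 11 is set: the name is declared as UTF-8"
    Else
      Debug "Bit 11 is clear: the name is not declared as UTF-8"
    EndIf
    Debug "Name bytes: " + bytes$
  Else
    Debug "No local file header at offset 0"
  EndIf
  CloseFile(0)
Else
  Debug "Cannot open " + file$
EndIf
```

Run against an archive packed with compression 0, the "Name bytes" line can be compared directly with the hex values listed above.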

Code: Select all

7z a "T:\Archive.zip" "T:\Archive\*.*" -mcu=on

If I patch a PB ZIP archive manually in a hex editor (file names re-encoded as UTF-8 and bit 11 set), Windows Explorer unpacks it with the correct file names.
For such experiments it is best to use compression level 0 (zero), which makes the ZIP archives easier to read in the hex editor.

Peter
AZJIO
Addict
Posts: 2183
Joined: Sun May 14, 2017 1:48 am

Post by AZJIO »

I use this method: https://www.purebasic.fr/english/viewtopic.php?t=83418
I checked BriefLZ, Zip, Lzma on Linux.
BriefLZ creates the least amount of problems.
Zip - I had to cache the name from PackEntryName(0) in a variable, because calling it a second time returned an empty string, so the contents of folders did not appear.

Code: Select all

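tmp$ = PackEntryName(0) ; cache the entry name once instead of calling PackEntryName() again
; If extraction fails (-1), create the missing target folder and retry.
; ForceDirectories() is not a built-in PB function; presumably a helper from the linked thread.
If UncompressPackFile(0, path + tmp$, tmp$) = -1 And ForceDirectories(GetPathPart(path + tmp$))
	UncompressPackFile(0, path + tmp$, tmp$)
EndIf

Tar - requires caching the file name, as with Zip.
Lzma - when creating an archive, the file names come out as question marks in a diamond (the replacement character). The archive manager "Engrampa" creates and extracts 7zip archives without any problems.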
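
For context, a complete extraction loop built around the cached name might look like the sketch below. It is an illustration under assumptions, not the code from the linked thread: `MakePath()` is a hypothetical helper, and "config-archive.zip" and "extracted" are placeholder names. It creates the target folders up front instead of retrying after a failure, and it does nothing about the encoding of non-ASCII names.

```purebasic
EnableExplicit
UseZipPacker()

; Hypothetical helper: create every folder of path$ level by level.
; path$ must end with a path separator.
Procedure MakePath(path$)
  Protected i, dir$
  For i = 1 To CountString(path$, #PS$)
    dir$ + StringField(path$, i, #PS$)
    CreateDirectory(dir$)          ; simply fails if the folder already exists
    dir$ + #PS$
  Next
EndProcedure

Define dest$ = "extracted" + #PS$  ; placeholder target folder
Define name$, target$

If OpenPack(0, "config-archive.zip", #PB_PackerPlugin_Zip)
  If ExaminePack(0)
    While NextPackEntry(0)
      name$ = PackEntryName(0)                          ; read the name only once
      target$ = dest$ + ReplaceString(name$, "/", #PS$) ; archive "/" to local separator
      If PackEntryType(0) = #PB_Packer_Directory
        If Right(target$, 1) <> #PS$
          target$ + #PS$
        EndIf
        MakePath(target$)
      Else
        MakePath(GetPathPart(target$))
        If UncompressPackFile(0, target$, name$) = -1
          Debug "Failed: " + name$
        EndIf
      EndIf
    Wend
  EndIf
  ClosePack(0)
EndIf
```

Creating the folders before each file keeps the loop to a single UncompressPackFile() call per entry; the retry variant above avoids the extra CreateDirectory() calls when the folders already exist.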

Post by PeDe »

Hello AZJIO,

thanks for your information.

I also experimented with the ZIP header fields 'version' and 'version made by'. But Windows Explorer only unpacked a PB ZIP archive correctly if the names were encoded in UTF-8 and bit 11 was set.
However, Windows XP cannot unpack these files correctly, as the UTF-8 encoding is not recognized.

A ZIP archive created with 7-Zip without UTF-8 encoding works correctly everywhere under Windows.

Peter

Post by AZJIO »

PeDe wrote: Sun Aug 11, 2024 2:35 pm
I had to use #NPS$ (/)

But on Linux this may be a problem, because #NPS$ is "\" there. In that case you need to specify "/" explicitly.

I made a test example for Windows

Code: Select all

; AZJIO
; https://www.purebasic.fr/english/viewtopic.php?p=566355#p566355
EnableExplicit

Procedure FileSearch(List Files.s(), dir.s, mask.s = "*", depth = 130)
	Protected Name.s, c
	Protected Dim hDir(depth)
	Protected Dim SearchPath.s(depth)

	If Right(dir, 1) <> #PS$
		dir + #PS$
	EndIf

	SearchPath(c) = dir
	hDir(c) = ExamineDirectory(#PB_Any, dir, mask)
	If Not hDir(c)
		ProcedureReturn
	EndIf

	Repeat
		While NextDirectoryEntry(hDir(c))
			Name = DirectoryEntryName(hDir(c))
			If Name = "." Or Name = ".."
				Continue
			EndIf
			If DirectoryEntryType(hDir(c)) = #PB_DirectoryEntry_Directory
				If c >= depth
					Continue
				EndIf
				dir = SearchPath(c)
				c + 1
				SearchPath(c) = dir + Name + #PS$
				hDir(c) = ExamineDirectory(#PB_Any, SearchPath(c), mask)
				If Not hDir(c)
					c - 1
				EndIf
			Else
				If AddElement(Files())
					Files() = SearchPath(c) + Name
				EndIf
			EndIf
		Wend
		FinishDirectory(hDir(c))
		c - 1
	Until c < 0
EndProcedure



UseZipPacker()

Define NewList Files.s()
Define Path$ = "C:\PB\Source\Current\archive\test\"
Define length = Len(Path$) + 1
FileSearch(Files(), Path$)
; Debug "Count = " + Str(ListSize(Files()))

; Create the archive file
If CreatePack(0, "config-archive.zip", #PB_PackerPlugin_Zip)
	ForEach Files()
; 		Debug Files()
; 		AddPackFile(0, Files(), Mid(Files(), length))
		; Store the relative path with "/" separators. #PB_String_InPlace is not used,
		; because in that mode ReplaceString() has no valid return value.
		AddPackFile(0, Files(), ReplaceString(Mid(Files(), length), "\", "/"))
	Next
	ClosePack(0)
EndIf

My Cyrillic file names are not displayed.

Windows:
7zip - ok
BriefLZ - ok
zip - Cyrillic problem in file names
tar - Cyrillic problem in file names