[Done] 4.30 IncludeFile and UTF-8

Post bugreports for the Windows version here
breeze4me
Enthusiast
Posts: 511
Joined: Thu Mar 09, 2006 9:24 am
Location: S. Kor

[Done] 4.30 IncludeFile and UTF-8

Post by breeze4me »

Compiler option: UTF-8, unicode

Non-English folder and file names are not recognized.
(XIncludeFile, IncludeFile, IncludePath, IncludeBinary)

Code: Select all

;XIncludeFile "C:\temp\프로그램\abc123.pb"
IncludeFile "C:\temp\프로그램\abc123.pb"

;IncludePath "C:\temp\프로그램\"


DataSection
  ;IncludeBinary "C:\temp\프로그램\abc123.pb"
EndDataSection
Error:
Line 2: File not found (D:\temp\?꾨줈洹몃옩\abc123.pb).
Little John
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: [4.30] IncludeFile and UTF-8

Post by Little John »

Testing XIncludeFile and file names with German umlauts
(IDE file format UTF-8):
  • works with PB 5.73 LTS (x64)
  • broken again in PB 6.04 LTS (x64), hotfix from 2023-12-15
loadstone
User
Posts: 97
Joined: Wed Jan 16, 2008 11:28 am
Location: china

Re: [4.30] IncludeFile and UTF-8

Post by loadstone »

6.04 final demo

@fred

In 6.03, compiling reports that the directory cannot be found. In 6.04, clicking Compile reports that the directory cannot be opened, and the Chinese directory name in the error message is still garbled. In 6.02 everything works.
Fred
Administrator
Posts: 16621
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: [Done]4.30 IncludeFile and UTF-8

Post by Fred »

Fixed in 6.10b2. The Windows compiler is now fully Unicode for file handling.

Re: [Done]4.30 IncludeFile and UTF-8

Post by breeze4me »

Fred wrote: Tue Jan 16, 2024 8:36 am Fixed in 6.10b2. The Windows compiler is now fully Unicode for file handling.
IncludeFile, XIncludeFile, and IncludePath work, but IncludeBinary is still not working in 6.10 b2.
The file exists but is not recognized.

Code: Select all

DataSection
  IncludeBinary "Z:\프로그램\TXT 파일.txt"
EndDataSection
Included file not found: Z:\프로그램\TXT 파일.txt.

Re: [Done] 4.30 IncludeFile and UTF-8

Post by Fred »

Fixed.

Re: [Done] 4.30 IncludeFile and UTF-8

Post by breeze4me »

Fred wrote: Fri Jan 19, 2024 5:53 pm Fixed.
Not fixed in beta 3.
Now a new error is raised.
---------------------------
PureBasic - Assembler error
---------------------------
purebasic.asm [652]:
file "Z:\프로그램\TXT파~1.TXT"
processed: file 'Z:\프로그램\TXT파~1.TXT'
error: file not found.
So I did an experiment.
I created a purebasic.asm file (UTF-8), converted it to ANSI (system default code page) format, recompiled it, and it worked.
It seems that the assembler doesn't recognize the UTF-8 string and treats it as ANSI.

BTW, in the C backend, it compiles fine with no errors.
The problem only occurs in the assembly backend.



Edit:
According to a search on the FASM forum, FASM can only read ANSI source code.
So, it seems best to have the pb compiler read the file specified in IncludeBinary and write the data byte by byte to the data section.
For example, "dd 0xAABBCCDD" and so on.
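
The first suggestion (inline the file contents instead of handing a path to FASM) can be sketched as follows. This is only an illustration, written in Python rather than PureBasic, and it is not how the PB compiler is implemented. The helper name `to_fasm_db` and the 16-bytes-per-line layout are assumptions made for the example. It emits `db` directives rather than `dd`, so byte order does not matter.

```python
def to_fasm_db(data: bytes, per_line: int = 16) -> str:
    """Render raw bytes as FASM 'db' lines (hypothetical helper, for illustration)."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        # One 'db' directive per row, each byte written as 0xNN.
        lines.append("db " + ",".join(f"0x{b:02X}" for b in chunk))
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for the contents of the file named in IncludeBinary.
    sample = bytes([0xEF, 0xBB, 0xBF, 0x41, 0x42])
    print(to_fasm_db(sample, per_line=4))
```

With this approach the assembler never sees the file name, so the encoding of the path in purebasic.asm would no longer matter.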

Another way is to hook the CreateFileA API called by Fasm.exe and redirect it to CreateFileW, using the MS Detours library.
https://github.com/microsoft/Detours

I've done some testing and it seems to work, but I haven't done any in-depth testing so I don't know if it's perfect.
So, try it out only for testing purposes.

1. Compile the Detours library at the above webpage in x86 mode.
2. Compile the code below with PB x86 version to create a dll file. (for example, the name is HookCreateFileA.dll).
3. There is "SetDll.exe" file in the "bin.x86" folder of the compiled Detours library. Run it like this:

Code: Select all

setdll /d:HookCreateFileA.dll FAsm.exe
4. Copy the newly created FAsm.exe and HookCreateFileA.dll files to the PB "Compilers" folder. (Copy the same files for both x86 and x64)

This should ensure that UTF-8 file names passed to IncludeBinary are read correctly.

Code: Select all

CompilerIf #PB_Compiler_Processor <> #PB_Processor_x86
  CompilerError "Compile with PB x86."
CompilerEndIf
CompilerIf #PB_Compiler_ExecutableFormat <> #PB_Compiler_DLL
  CompilerError "Compile as DLL file."
CompilerEndIf

Import "Detours.lib"
  DetourTransactionBegin.l()
  DetourRestoreAfterWith()
  DetourTransactionCommit.l()
  DetourUpdateThread.l(hThread)
  DetourAttach.l(*ppPointer, pDetour)
  DetourDetach.l(*ppPointer, pDetour)
EndImport

Import "kernel32.lib"
  CreateFileA(*lpFileName, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile)
  CreateFileW(lpFileName.s, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile)
EndImport

Prototype CreateFile(*lpFileName, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile)
Global OriCreateFileA.CreateFile, Error.l

ProcedureDLL My_CreateFileA(*lpFileName, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile)
  Protected FileName.s, Result
  
  Result = OriCreateFileA(*lpFileName, dwDesiredAccess, dwShareMode, lpSecurityAttributes, dwCreationDisposition, dwFlagsAndAttributes, hTemplateFile)
  ; The "GetLastError_() = #ERROR_INVALID_NAME" error check will fail because some UTF-8 filenames are accepted by CreateFileA but not actually processed.
  ; Therefore, it's better not to use the GetLastError_() at all.
  If Result = #INVALID_HANDLE_VALUE And dwCreationDisposition = #OPEN_EXISTING ;And GetLastError_() = #ERROR_INVALID_NAME
    If *lpFileName
      FileName = PeekS(*lpFileName, -1, #PB_UTF8)
      If FileName
        Result = CreateFileW(FileName, dwDesiredAccess, dwShareMode, lpSecurityAttributes, dwCreationDisposition, dwFlagsAndAttributes, hTemplateFile)
      EndIf
    EndIf
  EndIf
  ProcedureReturn Result
EndProcedure

ProcedureDLL AttachProcess(Instance)
  Error = -1
  OriCreateFileA = @CreateFileA()
  
  DetourRestoreAfterWith()
  
  If DetourTransactionBegin() = #NO_ERROR
    DetourUpdateThread(GetCurrentThread_())
    DetourAttach(@OriCreateFileA, @My_CreateFileA())
    Error = DetourTransactionCommit()
  EndIf
EndProcedure

ProcedureDLL DetachProcess(Instance)
  If Error = #NO_ERROR
    DetourTransactionBegin()
    DetourUpdateThread(GetCurrentThread_())
    DetourDetach(@OriCreateFileA, @My_CreateFileA())
    DetourTransactionCommit()
  EndIf
EndProcedure

Edit 2:
Another workaround only for Windows 10 1903+.
https://learn.microsoft.com/en-us/windo ... -code-page

The following must be added as a manifest to the fasm.exe file using a resource editing program like Resource Hacker in order to work properly.
Alternatively, you can use Visual Studio's mt.exe as described in the webpage above.

Code: Select all

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
Last edited by breeze4me on Tue Jan 23, 2024 3:13 pm, edited 4 times in total.
juergenkulow
Enthusiast
Posts: 544
Joined: Wed Sep 25, 2019 10:18 am

Re: 4.30 IncludeFile and UTF-8

Post by juergenkulow »

Where does the Asian (Korean) version of Windows behave differently?
TestDLL.pb:

Code: Select all

; Test Asian Path IncludeBinary x86 ASM Backend DLL 
DataSection
  L:
  IncludeBinary "D:\프로그램\TXT우주입니다.txt"
  E:
EndDataSection

ProcedureDLL Test()
  MessageRequester("Test",PeekS(?L,?E-?L,#PB_UTF8))
EndProcedure

; ASM-Backend D:\PB610B3x86\Compilers\pbcompiler.exe D:\프로그램\TestDLL.pb /Commented /DLL /OUTPUT "D:\TestDLL.dll"
; PureBasic 6.10 beta 3 (Windows - x86) generated code
; section '.data' data readable writeable
; l_l:
; file "D:\____~1\TXT___~1.TXT"
; l_e:
; SYS_EndDataSection:

; D:\TestDLL.dll:
; 00049230                                      EF BB BF EC            ì
; 00049240  95 88 EB 85 95 ED 95 98 EC 84 B8 EC 9A 94 2C 20  ??ë??í??ì?¸ì??, 
; 00049250  EB B3 B4 EC 9D B4 EB 8A 94 20 EC 9A B0 EC A3 BC  보이ë?? ì?°ì£¼
; 00049260  EC 9E 85 EB 8B 88 EB 8B A4 2E 20 48 61 6C 6C 6F  ì??ë??ë?¤. Hallo
; 00049270  20 73 69 63 68 74 62 61 72 65 73 20 55 6E 69 76   sichtbares Univ
; 00049280  65 72 73 75 6D 2E 20 00 00 00 00 00 01 00 00 00  ersum. ......... 
D:\프로그램\TXT우주입니다.txt :

Code: Select all

안녕하세요, 보이는 우주입니다. Hallo sichtbares Universum. 

Code: Select all

; DLLTest
If OpenLibrary(0, "D:\TestDLL.dll")
  CallFunction(0, "Test")
  CloseLibrary(0)
EndIf

Code: Select all

; Show 
Debug PeekS(?L,?E-?L,#PB_UTF8)
DataSection
  L:
  Data.a $EF, $BB, $BF, $EC, $95, $88, $EB, $85, $95, $ED, $95, $98, $EC, $84, $B8, $EC
  Data.a $9A, $94, $2C, $20, $EB, $B3, $B4, $EC, $9D, $B4, $EB, $8A, $94, $20, $EC, $9A
  Data.a $B0, $EC, $A3, $BC, $EC, $9E, $85, $EB, $8B, $88, $EB, $8B, $A4, $2E
  E:
EndDataSection  
; 안녕하세요, 보이는 우주입니다.     

Re: 4.30 IncludeFile and UTF-8

Post by breeze4me »

@juergenkulow

Generated purebasic.asm code (UTF-8):

Code: Select all

......
align 4
PB_DataPointer rd 1
align 4
align 4
align 4
align 4
I_BSSEnd:
section '.data' data readable writeable
l_l:
file "Z:\프로그램\TXT우~1.TXT"
l_e:
SYS_EndDataSection:
Read as ANSI.

Code: Select all

align 4
PB_DataPointer rd 1
align 4
align 4
align 4
align 4
I_BSSEnd:
section '.data' data readable writeable
l_l:
file "Z:\?꾨줈洹몃옩\TXT??1.TXT"                  <---- Not recognized by FASM.
l_e:
SYS_EndDataSection:

Code: Select all

C:\Programs\PureBasic v6.10 x86\Compilers>pbcompiler.exe Z:\프로그램\TestDLL.pb /Commented /DLL /OUTPUT "z:\TestDLL.dll"
PureBasic 6.10 beta 3 (Windows - x86)
Compiling Z:\프로그램\TestDLL.pb
Loading external libraries...
Starting compilation...
9 lines processed.
Error: Assembler
purebasic.asm [239]:
file "Z:\?꾨줈洹몃옩\TXT??1.TXT"
processed: file 'Z:\?꾨줈洹몃옩\TXT??1.TXT'
error: file not found.

C:\Programs\PureBasic v6.10 x86\Compilers>

A file name containing the word "Français" cannot be opened by CreateFileA on Korean Windows.
Since Fasm uses only the ANSI version of the API, multilingual filenames are not supported.

Code: Select all

Import "kernel32.lib"
  CreateFileA(*lpFileName, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 
    CreateFileA__(lpFileName.s, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile) As "CreateFileA"
  CompilerElse
    CreateFileA__(lpFileName.s, dwDesiredAccess.l, dwShareMode.l, lpSecurityAttributes, dwCreationDisposition.l, dwFlagsAndAttributes.l, hTemplateFile) As "_CreateFileA@28"
  CompilerEndIf
EndImport

;*lpFileName = UTF8("z:\Français.txt") ;the same result.
*lpFileName = Ascii("z:\Français.txt")
Debug CreateFileA(*lpFileName, #GENERIC_READ, #FILE_SHARE_READ, 0, #OPEN_EXISTING, #FILE_ATTRIBUTE_NORMAL, 0)
Debug Bool(GetLastError_() = #ERROR_FILE_NOT_FOUND)

lpFileName.s = "z:\Français.txt"
Debug CreateFileA__(lpFileName, #GENERIC_READ, #FILE_SHARE_READ, 0, #OPEN_EXISTING, #FILE_ATTRIBUTE_NORMAL, 0)
Debug Bool(GetLastError_() = #ERROR_FILE_NOT_FOUND)

;CreateFileW
h = CreateFile_(lpFileName, #GENERIC_READ, #FILE_SHARE_READ, 0, #OPEN_EXISTING, #FILE_ATTRIBUTE_NORMAL, 0)
Debug h
CloseHandle_(h)
[20:21:19] -1
[20:21:19] 1
[20:21:19] -1
[20:21:19] 1
[20:21:19] 312
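
The result above is consistent with the Korean ANSI code page having no slot for "ç". A minimal check in Python, assuming that Python's `cp949` codec is a reasonable stand-in for the Korean system code page and `cp1252` for a Western European one. Windows itself may apply best-fit substitutions that this sketch does not model, and `fits_code_page` is a hypothetical helper name:

```python
def fits_code_page(name: str, code_page: str) -> bool:
    """Return True if every character of name can be encoded in code_page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Check each sample path against a Western and a Korean code page.
    for path in ("z:\\Français.txt", "Z:\\프로그램\\TXT 파일.txt"):
        for cp in ("cp1252", "cp949"):
            print(path, cp, fits_code_page(path, cp))
```

A name that does not fit the active code page cannot be expressed as an ANSI string at all, so no ANSI-only program can open it by that name.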

Re: 4.30 IncludeFile and UTF-8

Post by Fred »

I used the short name (8.3) to allow this with fasm. I don't know why it behaves differently on Korean Windows.
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: 4.30 IncludeFile and UTF-8

Post by User_Russian »

Not only in Korean. It’s the same with Russian.

Code: Select all

DataSection
  IncludeBinary "D:\Новый текстовый документ.txt"
EndDataSection
PureBasic - Assembler error

purebasic.asm [665]:
file "D:\Новый текстовый документ.txt"
processed: file 'D:\Новый текстовый документ.txt'
error: file not found.
The file is on disk and the FileSize() function confirms this.
The file path "D:\Новый текстовый документ.txt" is UTF-8 encoded, but the file "purebasic.asm" does not have a BOM. This may be the reason for the error.

With C backend there is no error and the code compiles successfully.
Last edited by User_Russian on Mon Jan 22, 2024 2:00 pm, edited 2 times in total.

Re: 4.30 IncludeFile and UTF-8

Post by breeze4me »

Fred wrote: Mon Jan 22, 2024 1:10 pm I used shortname to allow this with fasm, I don't know why it behaves differently on Korean windows
Even with a short filename, the converted name still contains CJK characters. ANSI APIs do not read them properly because their byte values are completely different depending on the encoding (ANSI code page or UTF-8).

Code: Select all

A binary view of the string "Z:\프로그램\TXT우~1.TXT"
5A 3A 5C C7 C1 B7 CE B1  D7 B7 A5 5C 54 58 54 BF  EC 7E 31 2E 54 58 54 (Ascii)
5A 3A 5C ED 94 84 EB A1  9C EA B7 B8 EB 9E A8 5C  54 58 54 EC 9A B0 7E 31  2E 54 58 54 (UTF-8)
If the purebasic.asm file is generated in ANSI format (the system code page), CJK users should at least be able to use files whose names contain their own characters in the IncludeBinary command. However, it will still throw an error if the name contains characters from other languages.
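
The two byte dumps above can be reproduced with a few lines of Python. This is a minimal sketch that assumes Python's `cp949` codec matches the Korean ANSI code page for these characters; `hex_dump` and `SHORT_PATH` are names made up for the example.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Encode text and return the bytes as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


# The short (8.3) path from the assembler error message.
SHORT_PATH = "Z:\\프로그램\\TXT우~1.TXT"

if __name__ == "__main__":
    print("cp949:", hex_dump(SHORT_PATH, "cp949"))
    print("utf-8:", hex_dump(SHORT_PATH, "utf-8"))
```

Only the ASCII parts of the path (drive letter, backslashes, "TXT", "~1.TXT") have the same bytes in both encodings. Every Hangul character is two bytes in CP949 and three different bytes in UTF-8, which is why an ANSI API cannot match a UTF-8 path to the file on disk.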

Like the reply just above, the word "Français" is not recognized on Korean Windows.

Code: Select all

DataSection
  IncludeBinary "z:\Français.txt"
EndDataSection
---------------------------
PureBasic - Assembler error
---------------------------
purebasic.asm [652]:

file "z:\Français.txt"

processed: file 'z:\Français.txt'

error: file not found.

Edit:
User_Russian wrote: Mon Jan 22, 2024 1:49 pm ..., but the file "purebasic.asm" does not have a BOM. This may be the reason for the error.
No, it's not a BOM issue. Fasm.exe reads the purebasic.asm file as binary without any conversion, which means it doesn't support UTF-8; on Windows it treats the file as ANSI.
If Fasm supported multilingual character sets like UTF-8, it would use the wide-character (...W) APIs instead of ANSI-specific APIs like CreateFileA.

Re: [Done] 4.30 IncludeFile and UTF-8

Post by Fred »

Should be fixed.