I'm compiling a Unicode executable and I have to work with some program parameters in UTF-8. Because of this, the UTF-8 string ends up double encoded in PB (a Unicode string whose characters are the individual UTF-8 bytes).
To convert it back, I try to poke the input as single bytes and then peek it as UTF-8 to get a Unicode string, but I get different results on different platforms (Linux, Windows). This is my test source:
Code: Select all
; PB 5.22 LTS, 32 Bit, Unicode (Windows, Linux)
; Here is the UTF-8 representation of "©äΣ丌‡".
; Normally retrieved by ProgramParameter() function.
; Now it is double encoded (as executable is unicode).
t.s = "Â©Ã¤Î£ä¸Œâ€¡" ; the double-encoded input
; DEBUG: output original memory content
l.i = StringByteLength(t.s)
Orig.s = ""
For x.i = 0 To l.i-1
Orig.s = Orig.s + RSet(Hex(PeekA(@t + x)), 2, "0") + " "
Next
Debug "Original Bytes: [" + Trim(Orig.s) + "]"
; Convert the Unicode string to single bytes (one byte per character)
sbp.s = Space(StringByteLength(t) / 2 + 1)
FillMemory(@sbp, StringByteLength(sbp), 0)
pbytes.i = PokeS(@sbp, t.s, -1, #PB_Ascii)
; DEBUG: output poked memory content
pked.s = ""
For x.i = 0 To pbytes.i-1
pked.s = pked.s + RSet(Hex(PeekA(@sbp + x)), 2, "0") + " "
Next
Debug "Poked Bytes: [" + Trim(pked.s) + "]"
; Interpret the single-byte buffer as UTF-8 and convert it back to a Unicode string
ret.s = PeekS(@sbp, pbytes.i, #PB_UTF8)
; DEBUG: Output result (should be ©äΣ丌‡)
Debug "Result: [" + ret.s + "]"
End
This is the output on Linux:
Original Bytes: [C2 00 A9 00 C3 00 A4 00 CE 00 A3 00 E4 00 B8 00 52 01 E2 00 AC 20 A1 00]
Poked Bytes: [C2 A9 C3 A4 CE A3 E4 B8 E2 A1]
Result: [©ä]
These are the results on Windows:
Original Bytes: [C2 00 A9 00 C3 00 A4 00 CE 00 A3 00 E4 00 B8 00 52 01 E2 00 AC 20 A1 00]
Poked Bytes: [C2 A9 C3 A4 CE A3 E4 B8 8C E2 80 A1]
Result: [©äS?‡]
I assume the PokeS() command is making the difference, but why? And is there a reliable "PB only" way to do this cross-platform?
(I know the debug window is the reason the result is not displayed correctly on Windows; when I log to a file instead, the Windows result looks okay.)
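For illustration, what I mean by a "PB only" way would be something along these lines: copy the low byte of every character manually instead of relying on PokeS() with #PB_Ascii, so the result no longer depends on the platform's character mapping. This is just an untested sketch (the procedure name is mine), and it still cannot recover characters like Œ (U+0152) or € (U+20AC), which would need an explicit Windows-1252 lookup table:

```
; Sketch, untested: convert a double-encoded UTF-8 string back to Unicode
; by copying the low byte of each character, bypassing PokeS()/#PB_Ascii.
; Limitation: only correct for characters <= U+00FF; CP1252-only characters
; (e.g. U+0152 -> byte $8C) would still need an explicit mapping table.
Procedure.s DecodeDoubleUtf8(in.s)
  Protected chars.i = Len(in)
  Protected *buf = AllocateMemory(chars + 1) ; zero-initialized, so null-terminated
  Protected x.i, out.s
  For x = 0 To chars - 1
    ; take the low byte of each 16-bit character
    PokeA(*buf + x, PeekC(@in + x * SizeOf(Character)) & $FF)
  Next
  out = PeekS(*buf, -1, #PB_UTF8) ; reinterpret the byte buffer as UTF-8
  FreeMemory(*buf)
  ProcedureReturn out
EndProcedure
```
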
Best,
Kukulkan

