[Solved] Problem with PeekS() with utf8 to unicode
Posted: Thu Jan 14, 2016 11:27 am
Hello,
I stumbled over a problem and wonder what I'm doing wrong. I would like to peek a UTF-8 string from memory. My program runs in Unicode mode.
System: PureBasic 5.24 LTS, 32 Bit, Linux Kubuntu 14.04
This code shows my problem:
; PeekS Test
EnableExplicit

; Converts a hex-encoded number to decimal
Procedure.i Hex2Dec(Input.s)
  ; "FF" = 255, "65A" = 1626
  Protected i.i, d.i
  For i.i = 1 To Len(Input.s)
    d.i = (d.i << 4) + Asc(UCase(Mid(Input.s, i.i, 1))) - 48 - 7 * (Asc(UCase(Mid(Input.s, i.i, 1))) >> 6)
  Next
  ProcedureReturn d.i
EndProcedure

; Converts a given memory area to a hex-encoded string
Procedure.s MemoryToHex(*location, length.i, separator.s = "")
  Protected Ergebnis.s = "", x.i
  For x.i = 0 To length.i - 1
    Ergebnis.s + RSet(Hex(PeekA(*location + x), #PB_Byte), 2, "0") + separator.s
  Next
  ProcedureReturn Ergebnis.s
EndProcedure

; Converts a hex string to a memory area (bytes)
Procedure HexToMemory(DataHEX.s, PointerToOutputBuffer.i)
  Protected Position.i, x.i, Zahl.i
  Position.i = 0
  For x.i = 1 To Len(DataHEX.s) Step 2
    Zahl.i = Hex2Dec(Mid(DataHEX.s, x.i, 2))
    PokeB(PointerToOutputBuffer.i + Position.i, Zahl.i)
    Position.i = Position.i + 1
  Next
EndProcedure

CompilerIf #PB_Compiler_Unicode = 0
  Debug "Please run in unicode mode"
  End
CompilerEndIf

Procedure main()
  ;                    F r a u _ M i r j a _ B . . s e
  Protected Input.s = "46726175204D69726A612042C3B67365" ; "Frau Mirja Böse" in utf8
  Protected InputLen.i = Len(Input.s) / 2
  Protected *buf = AllocateMemory(InputLen.i)
  HexToMemory(Input.s, *buf) ; put to memory in utf8
  Debug "utf8 in memory: " + MemoryToHex(*buf, InputLen.i, " ")
  Debug "Length in bytes: " + Str(InputLen.i)
  ; read from memory as unicode
  Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8)
  Protected OutputLen.i = StringByteLength(Output.s)
  Debug "Unicode in memory: " + MemoryToHex(@Output.s, OutputLen.i, " ")
  Debug "Length in bytes: " + Str(OutputLen.i)
  FreeMemory(*buf)
EndProcedure
main()

Expected result:
utf8 in memory: 46 72 61 75 20 4D 69 72 6A 61 20 42 C3 B6 73 65
Length in bytes: 16
Unicode in memory: 46 00 72 00 61 00 75 00 20 00 4D 00 69 00 72 00 6A 00 61 00 20 00 42 00 F6 00 73 00 65 00
Length in bytes: 30
My result:
utf8 in memory: 46 72 61 75 20 4D 69 72 6A 61 20 42 C3 B6 73 65
Length in bytes: 16
Unicode in memory: 46 00 72 00 61 00 75 00 20 00 4D 00 69 00 72 00 6A 00 61 00 20 00 42 00 F6 00 73 00 65 00 21 00
Length in bytes: 32
There is one random character too many in my resulting string! Why? What is wrong, and how can I fix it?
Kukulkan
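A possible explanation, offered as a sketch under assumptions rather than as a confirmed answer: the PureBasic manual describes the Length argument of PeekS() as a number of characters, not bytes. "Frau Mirja Böse" is 16 bytes in UTF-8 but only 15 characters, so passing 16 would ask PeekS() for one character beyond a buffer that has no terminating zero. The code below is hypothetical and is not the original program: it assumes Unicode mode, builds the same text with Chr(246) for "ö", reserves one extra byte for a terminating zero, and uses MemoryStringLength() to obtain the character count. The names Text, ByteLen, CharLen and Output are made up for illustration.

```purebasic
; Sketch only: assumes Unicode mode and that PeekS() counts Length in characters.
EnableExplicit

Define Text.s = "Frau Mirja B" + Chr(246) + "se"     ; Chr(246) = "ö"
Define ByteLen.i = StringByteLength(Text, #PB_UTF8)  ; size of the text in UTF-8 (16 bytes)
Define *buf = AllocateMemory(ByteLen + 1)            ; one extra byte for the terminating zero

If *buf
  PokeS(*buf, Text, -1, #PB_UTF8)                    ; writes the UTF-8 bytes plus a zero terminator

  ; Number of characters in the zero-terminated UTF-8 data (15), not bytes (16)
  Define CharLen.i = MemoryStringLength(*buf, #PB_UTF8)

  ; Pass the character count instead of the byte count
  Define Output.s = PeekS(*buf, CharLen, #PB_UTF8)

  Debug "Bytes in utf8: " + Str(ByteLen)
  Debug "Characters: " + Str(CharLen)
  Debug "Bytes in unicode: " + Str(StringByteLength(Output))

  FreeMemory(*buf)
EndIf
```

With a zero-terminated buffer, passing -1 as Length should work as well, since PeekS() then reads up to the terminating zero.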