
[Solved] Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 11:27 am
by Kukulkan
Hello,

I stumbled over a problem and wonder what I'm doing wrong. I'd like to read a utf8 string from memory with PeekS(). My program runs in unicode mode.

System: PureBasic 5.24 LTS, 32 Bit, Linux Kubuntu 14.04

This code is showing my problem:

Code: Select all

; PeekS Test
EnableExplicit

; Converts a hex-encoded number to its decimal value
Procedure.i Hex2Dec(Input.s)
  ; "FF" = 255  "65A" = 1626
  Protected i.i, d.i
  
  For i.i = 1 To Len(Input.s) 
    d.i = (d.i << 4) + Asc(UCase(Mid(Input.s, i.i, 1))) - 48 - 7 * (Asc(UCase(Mid(Input.s, i.i, 1))) >> 6) 
  Next 
  ProcedureReturn d.i
EndProcedure

; Convert a given memory area to a hex encoded string
Procedure.s MemoryToHex(*location, length.i, separator.s = "")
  Protected Ergebnis.s = "", x.i
  For x.i = 0 To length.i - 1
    Ergebnis.s + RSet(Hex(PeekA(*location + x), #PB_Byte), 2, "0") + separator.s
  Next
  ProcedureReturn Ergebnis.s
EndProcedure

; Converts a hex string to a memory area (bytes)
Procedure HexToMemory(DataHEX.s, PointerToOutputBuffer.i)
  Protected Position.i, x.i, Zahl.i
  
  Position.i = 0
  For x.i = 1 To Len(DataHEX.s) Step 2
    Zahl.i = Hex2Dec(Mid(DataHEX.s, x.i, 2))
    PokeB(PointerToOutputBuffer.i + Position.i, Zahl.i)
    Position.i = Position.i + 1
  Next
EndProcedure

CompilerIf #PB_Compiler_Unicode = 0
  Debug "Please run in unicode mode"
  End
CompilerEndIf

Procedure main()
  ;                    F r a u _ M i r j a _ B . . s e
  Protected Input.s = "46726175204D69726A612042C3B67365" ; "Frau Mirja Böse" in utf8
  Protected InputLen.i = Len(Input.s) / 2
  
  Protected *buf = AllocateMemory(InputLen.i)
  HexToMemory(Input.s, *buf) ; put to memory in utf8
  Debug "utf8 in memory: " + MemoryToHex(*buf, InputLen.i, " ")
  Debug "Length in bytes: " + Str(InputLen.i)
  
  ; get to unicode from memory
  Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8)
  Protected OutputLen.i = StringByteLength(Output.s)
  Debug "Unicode in memory: " + MemoryToHex(@Output.s, OutputLen.i, " ")
  Debug "Length in bytes: " + Str(OutputLen.i)
  FreeMemory(*buf)
EndProcedure

main()

Expected result:
utf8 in memory: 46 72 61 75 20 4D 69 72 6A 61 20 42 C3 B6 73 65
Length in bytes: 16
Unicode in memory: 46 00 72 00 61 00 75 00 20 00 4D 00 69 00 72 00 6A 00 61 00 20 00 42 00 F6 00 73 00 65 00
Length in bytes: 30

My result:
utf8 in memory: 46 72 61 75 20 4D 69 72 6A 61 20 42 C3 B6 73 65
Length in bytes: 16
Unicode in memory: 46 00 72 00 61 00 75 00 20 00 4D 00 69 00 72 00 6A 00 61 00 20 00 42 00 F6 00 73 00 65 00 21 00
Length in bytes: 32

There is one random character too many in my resulting string! Why? How do I fix this? What is wrong?

Kukulkan

Re: Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 11:38 am
by wilbert
You need a terminating zero for your utf8 data.

Re: Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 12:02 pm
by Kukulkan
Strange... I have now tested on several machines, and on some of them it works fine! Sometimes it fails, and after adding some more debug statements it starts to work?

What is wrong here?

Re: Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 12:04 pm
by Kukulkan
@wilbert: Why? I set the size for PeekS() and therefore it should not need a terminator?

So the line

Code: Select all

Protected *buf = AllocateMemory(InputLen.i + 1)
would fix it (since AllocateMemory() sets the memory to 0)?

Re: Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 12:27 pm
by wilbert
I thought I read a length of -1 for PeekS when I looked at your code :oops:
Anyway, you still need a terminating zero or the correct length.
Your InputLen variable doesn't contain the length of the utf8 string in characters but in bytes and for PeekS you need the length in characters.
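
A minimal sketch of the "correct length" option, for anyone who cannot add a terminator. It counts characters by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx). Utf8CharCount() is a hypothetical helper, not a PureBasic built-in, and the sketch assumes the text stays within the Basic Multilingual Plane (no 4-byte sequences).

```purebasic
EnableExplicit

; Hypothetical helper (not part of PureBasic): counts the characters in a
; UTF-8 buffer of known byte length. Every byte that is NOT a continuation
; byte (10xxxxxx) starts a new character.
Procedure.i Utf8CharCount(*buffer, byteLength.i)
  Protected i.i, count.i
  For i = 0 To byteLength - 1
    If (PeekA(*buffer + i) & $C0) <> $80
      count + 1
    EndIf
  Next
  ProcedureReturn count
EndProcedure

; "Böse" in utf8: 5 bytes (42 C3 B6 73 65), but only 4 characters
Define *buf = AllocateMemory(5)
If *buf
  PokeA(*buf + 0, $42) : PokeA(*buf + 1, $C3) : PokeA(*buf + 2, $B6)
  PokeA(*buf + 3, $73) : PokeA(*buf + 4, $65)
  
  Define chars.i = Utf8CharCount(*buf, 5)
  Debug "Characters: " + Str(chars)
  Debug PeekS(*buf, chars, #PB_UTF8) ; length given in characters, not bytes
  FreeMemory(*buf)
EndIf
```

With the character count passed as the length, PeekS() should stop at the end of the buffer instead of decoding whatever byte happens to follow it.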

Re: Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 12:56 pm
by Kukulkan
wilbert wrote:Your InputLen variable doesn't contain the length of the utf8 string in characters but in bytes and for PeekS you need the length in characters.
Aaaahhh. This is the reason! I solved it now by adding a null byte at the end of the utf8 string by adding "+1" to AllocateMemory() function. This ensures a null byte at the end.

Thank you!
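
For reference, a small sketch of the terminator approach described above. It relies on AllocateMemory() zero-filling the buffer by default, so the one extra byte acts as the terminating null. The byte values are the "Böse" part of the test string from the first post, and the length of -1 (read until the null) is used here for clarity rather than the byte count.

```purebasic
EnableExplicit

Define byteLen.i = 5                      ; "Böse" in utf8: 42 C3 B6 73 65
Define *buf = AllocateMemory(byteLen + 1) ; +1 leaves a zero byte at the end
If *buf
  PokeA(*buf + 0, $42) : PokeA(*buf + 1, $C3) : PokeA(*buf + 2, $B6)
  PokeA(*buf + 3, $73) : PokeA(*buf + 4, $65)
  
  ; -1 tells PeekS() to read up to the terminating null
  Define Output.s = PeekS(*buf, -1, #PB_UTF8)
  Debug Output
  Debug "Characters: " + Str(Len(Output))
  FreeMemory(*buf)
EndIf
```

Passing the byte count as the length, as in the original code, also works once the terminator is present, because the byte count is never smaller than the character count and PeekS() stops at the null first.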

Re: [Solved] Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 8:11 pm
by the.weavster
Kukulkan wrote:
wilbert wrote:Your InputLen variable doesn't contain the length of the utf8 string in characters but in bytes and for PeekS you need the length in characters.
Aaaahhh. This is the reason! I solved it now by adding a null byte at the end of the utf8 string by adding "+1" to AllocateMemory() function. This ensures a null byte at the end.
Or you can do:

Code: Select all

Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8 | #PB_ByteLength)

Re: [Solved] Problem with PeekS() with utf8 to unicode

Posted: Thu Jan 14, 2016 8:16 pm
by wilbert
the.weavster wrote:Or you can do:

Code: Select all

Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8 | #PB_ByteLength)
I wasn't aware this was added with PB 5.40.
Nice feature!

Re: [Solved] Problem with PeekS() with utf8 to unicode

Posted: Fri Jan 15, 2016 8:24 am
by Kukulkan
Yeah, nice new option. But I have to stay on PB 5.24 LTS, as the newly included cURL stuff does not compile together with all the cURL code we have been including for years. I cannot compile using the new PB :-( I'll have to create a repro and ask Fred or the forum in a few weeks if I find the time.

Kukulkan