[Solved] Problem with PeekS() with utf8 to unicode

Kukulkan
Addict
Posts: 1415
Joined: Mon Jun 06, 2005 2:35 pm
Location: germany

[Solved] Problem with PeekS() with utf8 to unicode

Post by Kukulkan »

Hello,

I stumbled over a problem and wonder what I'm doing wrong. I'd like to peek a UTF-8 string from memory. My program runs in Unicode mode.

System: PureBasic 5.24 LTS, 32 Bit, Linux Kubuntu 14.04

This code shows my problem:

Code:

; PeekS Test
EnableExplicit

; Converts a hex-encoded number to decimal
Procedure.i Hex2Dec(Input.s)
  ; "FF" = 255  "65A" = 1626
  Protected i.i, d.i
  
  For i.i = 1 To Len(Input.s) 
    d.i = (d.i << 4) + Asc(UCase(Mid(Input.s, i.i, 1))) - 48 - 7 * (Asc(UCase(Mid(Input.s, i.i, 1))) >> 6) 
  Next 
  ProcedureReturn d.i
EndProcedure

; Convert a given memory area to a hex encoded string
Procedure.s MemoryToHex(*location, length.i, separator.s = "")
  Protected Ergebnis.s = "", x.i
  For x.i = 0 To length.i - 1
    Ergebnis.s + RSet(Hex(PeekA(*location + x), #PB_Byte), 2, "0") + separator.s
  Next
  ProcedureReturn Ergebnis.s
EndProcedure

; Converts a hex string to a memory area (bytes)
Procedure HexToMemory(DataHEX.s, PointerToOutputBuffer.i)
  Protected Position.i, x.i, Zahl.i
  
  Position.i = 0
  For x.i = 1 To Len(DataHEX.s) Step 2
    Zahl.i = Hex2Dec(Mid(DataHEX.s, x.i, 2))
    PokeB(PointerToOutputBuffer.i + Position.i, Zahl.i)
    Position.i = Position.i + 1
  Next
EndProcedure

CompilerIf #PB_Compiler_Unicode = 0
  Debug "Please run in unicode mode"
  End
CompilerEndIf

Procedure main()
  ;                    F r a u _ M i r j a _ B . . s e
  Protected Input.s = "46726175204D69726A612042C3B67365" ; "Frau Mirja Böse" in utf8
  Protected InputLen.i = Len(Input.s) / 2
  
  Protected *buf = AllocateMemory(InputLen.i)
  HexToMemory(Input.s, *buf) ; put to memory in utf8
  Debug "utf8 in memory: " + MemoryToHex(*buf, InputLen.i, " ")
  Debug "Length in bytes: " + Str(InputLen.i)
  
  ; read the utf8 bytes back from memory as a Unicode string
  Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8)
  Protected OutputLen.i = StringByteLength(Output.s)
  Debug "Unicode in memory: " + MemoryToHex(@Output.s, OutputLen.i, " ")
  Debug "Length in bytes: " + Str(OutputLen.i)
  FreeMemory(*buf)
EndProcedure

main()

Expected result:
utf8 in memory: 46 72 61 75 20 4D 69 72 6A 61 20 42 C3 B6 73 65
Length in bytes: 16
Unicode in memory: 46 00 72 00 61 00 75 00 20 00 4D 00 69 00 72 00 6A 00 61 00 20 00 42 00 F6 00 73 00 65 00
Length in bytes: 30

My result:
utf8 in memory: 46 72 61 75 20 4D 69 72 6A 61 20 42 C3 B6 73 65
Length in bytes: 16
Unicode in memory: 46 00 72 00 61 00 75 00 20 00 4D 00 69 00 72 00 6A 00 61 00 20 00 42 00 F6 00 73 00 65 00 21 00
Length in bytes: 32

There is one random character too many in my resulting string! Why? How can I fix this? What is wrong?

Kukulkan
Last edited by Kukulkan on Thu Jan 14, 2016 12:56 pm, edited 1 time in total.
wilbert
PureBasic Expert
Posts: 3943
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Problem with PeekS() with utf8 to unicode

Post by wilbert »

You need a terminating zero for your utf8 data.
Windows (x64)
Raspberry Pi OS (Arm64)

Re: Problem with PeekS() with utf8 to unicode

Post by Kukulkan »

Strange... I have now tested on several machines, and on some of them it works fine! Sometimes it fails, and after adding a few more debug statements it starts to work?

What is wrong here?

Re: Problem with PeekS() with utf8 to unicode

Post by Kukulkan »

@wilbert: Why? I pass the length to PeekS(), so it shouldn't need a terminator, should it?

So the line

Code:

Protected *buf = AllocateMemory(InputLen.i + 1)

would fix it (as AllocateMemory() sets the memory to 0)?

Re: Problem with PeekS() with utf8 to unicode

Post by wilbert »

I thought I read a length of -1 for PeekS when I looked at your code :oops:
Anyway, you still need a terminating zero or the correct length.
Your InputLen variable doesn't contain the length of the utf8 string in characters but in bytes and for PeekS you need the length in characters.
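A minimal sketch of the "correct length" route for PB 5.24, where #PB_ByteLength isn't available: convert the byte count into a character count first. In the first post the 16 bytes hold only 15 characters, because "ö" is stored as the two bytes C3 B6, so PeekS() is asked for one character more than the buffer contains and most likely reads past its end; whatever happens to lie behind the buffer would then explain why the extra character differs between machines. Utf8CharCount() is a hypothetical helper, not a PureBasic command, and it assumes well-formed UTF-8:

```purebasic
; Hypothetical helper (not a built-in command): counts the characters in the
; first byteLen bytes of a UTF-8 buffer. Every UTF-8 character starts with a
; byte that is NOT a continuation byte (bit pattern 10xxxxxx), so counting
; those lead bytes gives the number of characters. Assumes well-formed UTF-8.
Procedure.i Utf8CharCount(*utf8, byteLen.i)
  Protected i.i, count.i
  For i = 0 To byteLen - 1
    If (PeekA(*utf8 + i) & $C0) <> $80 ; lead byte or plain ASCII byte
      count + 1
    EndIf
  Next
  ProcedureReturn count
EndProcedure

; Usage: "Bö" is 3 bytes in UTF-8 (42 C3 B6) but only 2 characters
Define *buf = AllocateMemory(3)
PokeA(*buf, $42) : PokeA(*buf + 1, $C3) : PokeA(*buf + 2, $B6)
Debug Utf8CharCount(*buf, 3) ; 2
Debug PeekS(*buf, Utf8CharCount(*buf, 3), #PB_UTF8)
FreeMemory(*buf)
```

In main() from the first post, the PeekS() line would then become: Protected Output.s = PeekS(*buf, Utf8CharCount(*buf, InputLen.i), #PB_UTF8)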

Re: Problem with PeekS() with utf8 to unicode

Post by Kukulkan »

wilbert wrote: Your InputLen variable doesn't contain the length of the utf8 string in characters but in bytes and for PeekS you need the length in characters.
Aaaahhh. That's the reason! I have now solved it by adding "+1" to the AllocateMemory() call, which ensures a null byte at the end of the utf8 string.

Thank you!
the.weavster
Addict
Posts: 1581
Joined: Thu Jul 03, 2003 6:53 pm
Location: England

Re: [Solved] Problem with PeekS() with utf8 to unicode

Post by the.weavster »

Kukulkan wrote:
wilbert wrote: Your InputLen variable doesn't contain the length of the utf8 string in characters but in bytes and for PeekS you need the length in characters.
Aaaahhh. That's the reason! I have now solved it by adding "+1" to the AllocateMemory() call, which ensures a null byte at the end of the utf8 string.
Or you can do:

Code:

Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8 | #PB_ByteLength)

Re: [Solved] Problem with PeekS() with utf8 to unicode

Post by wilbert »

the.weavster wrote: Or you can do:

Code:

Protected Output.s = PeekS(*buf, InputLen.i, #PB_UTF8 | #PB_ByteLength)

I wasn't aware this was added in PB 5.40.
Nice feature!

Re: [Solved] Problem with PeekS() with utf8 to unicode

Post by Kukulkan »

Yeah, nice new option. But I have to stay on PB 5.24 LTS, because the newly included cURL stuff doesn't compile together with the cURL code we have been including for years. I cannot compile with the new PB :-( I'll have to create a repro and ask Fred or the forum in a few weeks, if I find the time.

Kukulkan
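If one source file has to build on both 5.24 LTS and 5.40 or later, the choice can be made at compile time. A minimal sketch, assuming #PB_Compiler_Version reports 540 for PB 5.40; PeekUtf8Bytes() is a hypothetical helper name, not a PureBasic command. On older versions it falls back to the terminating-zero fix from this thread:

```purebasic
; Hypothetical helper: decodes exactly byteLen bytes of UTF-8 at *utf8 into a
; string, whether or not the data is followed by a zero byte.
Procedure.s PeekUtf8Bytes(*utf8, byteLen.i)
  CompilerIf #PB_Compiler_Version >= 540
    ; PB 5.40 and later: the length can be given in bytes directly
    ProcedureReturn PeekS(*utf8, byteLen, #PB_UTF8 | #PB_ByteLength)
  CompilerElse
    ; Older versions (e.g. 5.24 LTS): copy into a buffer one byte larger,
    ; write a terminating zero and let PeekS() stop there (length -1)
    Protected result.s
    Protected *tmp = AllocateMemory(byteLen + 1)
    If *tmp
      CopyMemory(*utf8, *tmp, byteLen)
      PokeA(*tmp + byteLen, 0) ; explicit terminator, not relying on the block being cleared
      result = PeekS(*tmp, -1, #PB_UTF8)
      FreeMemory(*tmp)
    EndIf
    ProcedureReturn result
  CompilerEndIf
EndProcedure

; Usage with the buffer from the first post:
; Protected Output.s = PeekUtf8Bytes(*buf, InputLen.i)
```

The fallback copies the bytes into its own, one-byte-larger buffer instead of poking a zero behind the caller's data, since the caller's buffer (like *buf in the first post) may have no room for a terminator.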