UTF-8 support for strings

ts-soft · Post by **ts-soft** » Sat Sep 06, 2008 8:48 pm

It is not safe to keep other encodings in a PB string variable; use a memory buffer for this:

Procedure StringToUTF(S.s)
  #AutoLength = -1
  Protected *Buffer
 
  *Buffer = AllocateMemory(StringByteLength(S,#PB_UTF8) + 1) ;<== add a byte for the Null
  PokeS(*Buffer,S, #AutoLength, #PB_UTF8)
 
  ProcedureReturn *Buffer
EndProcedure
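
A minimal usage sketch for the procedure above, since the returned buffer belongs to the caller and has to be freed by the caller. The procedure is repeated (with an added allocation check, which is my assumption and not part of the original post) only so the snippet runs on its own; the variable name `*utf8` is just for illustration.

```purebasic
; Copy of StringToUTF() from the post above, plus an allocation check,
; repeated here so the snippet runs on its own.
Procedure StringToUTF(S.s)
  Protected *Buffer

  *Buffer = AllocateMemory(StringByteLength(S, #PB_UTF8) + 1) ; +1 for the null
  If *Buffer
    PokeS(*Buffer, S, -1, #PB_UTF8)
  EndIf

  ProcedureReturn *Buffer ; 0 if the allocation failed
EndProcedure

*utf8 = StringToUTF("abc")
If *utf8
  Debug MemorySize(*utf8)            ; 4: three text bytes plus the null
  Debug PeekS(*utf8, -1, #PB_UTF8)   ; decoded back to a native PB string: abc
  FreeMemory(*utf8)                  ; the caller owns the buffer
EndIf
```

Passing the pointer to a library call and freeing it afterwards keeps the UTF-8 bytes out of PB's string manager entirely.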

blueznl · Post by **blueznl** » Sat Sep 06, 2008 9:35 pm

Demivec wrote: You have to add one byte for the Null when you reserve buffer space.

Actually TWO when you work in Windows Unicode / UTF16.... even the null comes double...

Demivec · Post by **Demivec** » Sat Sep 06, 2008 10:07 pm

blueznl wrote:
Demivec wrote: You have to add one byte for the Null when you reserve buffer space.
Actually TWO when you work in Windows Unicode / UTF16.... even the null comes double...

True, but the procedures are designed specifically for Ascii-to-UTF8 and UTF8-to-Ascii.
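
To make the one-byte versus two-byte null point concrete, here is a small sketch using `StringByteLength()`, which reports the encoded size without the terminator. The sample text `"abc"` is only an illustration.

```purebasic
Text$ = "abc"

; StringByteLength() reports the encoded size without the terminating null.
Debug StringByteLength(Text$, #PB_Ascii)    ; 3
Debug StringByteLength(Text$, #PB_UTF8)     ; 3
Debug StringByteLength(Text$, #PB_Unicode)  ; 6 (two bytes per character)

; Buffer sizes including the terminator: one null byte for ASCII and UTF-8,
; two null bytes for UTF-16.
Debug StringByteLength(Text$, #PB_UTF8) + 1     ; 4
Debug StringByteLength(Text$, #PB_Unicode) + 2  ; 8
```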

spacefractal · Post by **spacefractal** » Sun Sep 07, 2008 2:56 am

ts-soft's version didn't work as it should when I tested it here. Here is a modified version that did work for me (these functions convert in both directions):

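; UTF-8 to Unicode/ASCII (depending on whether the app is compiled as Unicode or not).
; Expects s to hold the raw UTF-8 bytes, one byte per character.
Procedure.s Unicode(s.s)
  Protected *Buffer, Result$

  ; Room for the text plus the terminating null. ASCII and UTF-8 both end
  ; in a single null byte; the second byte is spare.
  *Buffer = AllocateMemory(StringByteLength(s, #PB_UTF8) + 2)
  PokeS(*Buffer, s, -1, #PB_Ascii)        ; write the raw bytes unchanged
  Result$ = PeekS(*Buffer, -1, #PB_UTF8)  ; decode them as UTF-8
  FreeMemory(*Buffer)
  ProcedureReturn Result$
EndProcedure

; Unicode/ASCII (depending on whether the app is compiled as Unicode or not) to UTF-8.
; Returns the UTF-8 bytes packed into a string, one byte per character.
Procedure.s UTF8(s.s)
  Protected *Buffer, Result$

  ; Same sizing as above: one null byte is needed, the second is spare.
  *Buffer = AllocateMemory(StringByteLength(s, #PB_UTF8) + 2)
  PokeS(*Buffer, s, -1, #PB_UTF8)         ; encode the text as UTF-8
  Result$ = PeekS(*Buffer, -1, #PB_Ascii) ; read the bytes back one per character
  FreeMemory(*Buffer)
  ProcedureReturn Result$
EndProcedure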

UTF-8 is a byte-oriented encoding that uses a variable number of bytes per character, so the string first has to be "saved" as ASCII bytes and then read back as a string using #PB_UTF8.

ts-soft · Post by **ts-soft** » Sun Sep 07, 2008 3:15 am

PB's string manager supports only Unicode in Unicode applications and ASCII
in ASCII applications. Returning a string variable that holds UTF-8 in its
buffer is not safe. UTF-8 is only safe in allocated memory, never in a
string variable.
UTF-8 is never required in a string variable.
If a lib requires UTF-8, you can use a pseudotype or a pointer to memory.

blueznl · Post by **blueznl** » Sun Sep 07, 2008 9:40 am

You can store a UTF-8 string in a string variable, as a UTF-8 string will never contain a zero byte. Of course, PB's string handling commands will all be thrown off-track...

Hmm.

Except for Linux, I suppose. Is PB Unicode in Linux in UTF16 or UTF8 in memory?

Michael Vogel · Post by **Michael Vogel** » Sun Sep 07, 2008 3:40 pm

The reason for using such routines is simple: sometimes a single program has to handle several kinds of files (preferences, databases, etc.), so both text representations must be handled as well.

In my case, I have to handle GPX, HST and TCX files for GPS data (in addition to a simple INI file). Normally these files consist of UTF-8 text, but sometimes they contain plain ASCII instead.

In such cases, my routines above can help. A fast WhatStringTypeIs() function would also be useful for checking whether a string is ASCII or UTF-8 formatted.
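
A rough sketch of what such a check could look like, working on a raw memory buffer (for example the bytes read from a file). The name `GuessTextType()` and the `#Text_...` constants are made up for this example. It only checks the lead-byte and continuation-byte pattern of UTF-8, so it does not reject every overlong or surrogate form, and it assumes a PB version that has the `Ascii` structure and `PokeA()`.

```purebasic
; Hypothetical result codes for this sketch.
#Text_Ascii   = 0   ; only bytes below $80
#Text_UTF8    = 1   ; at least one well-formed multi-byte sequence
#Text_Unknown = 2   ; high bytes that do not follow the UTF-8 pattern

Procedure.i GuessTextType(*Buffer, Length.i)
  Protected *p.Ascii = *Buffer
  Protected *Limit = *Buffer + Length
  Protected Follow.i, Result.i = #Text_Ascii

  While *p < *Limit
    ; Work out how many continuation bytes the current byte announces.
    If *p\a < $80
      Follow = 0
    ElseIf *p\a >= $C2 And *p\a <= $DF
      Follow = 1
    ElseIf *p\a >= $E0 And *p\a <= $EF
      Follow = 2
    ElseIf *p\a >= $F0 And *p\a <= $F4
      Follow = 3
    Else
      ProcedureReturn #Text_Unknown   ; stray continuation or invalid lead byte
    EndIf
    *p + 1

    If Follow
      Result = #Text_UTF8
      While Follow
        If *p >= *Limit
          ProcedureReturn #Text_Unknown   ; sequence cut off at the end
        EndIf
        If (*p\a & $C0) <> $80
          ProcedureReturn #Text_Unknown   ; not a continuation byte
        EndIf
        *p + 1
        Follow - 1
      Wend
    EndIf
  Wend

  ProcedureReturn Result
EndProcedure

; Usage: "ü" encoded as UTF-8 is the two bytes $C3 $BC.
*Text = AllocateMemory(2)
If *Text
  PokeA(*Text, $C3)
  PokeA(*Text + 1, $BC)
  Debug GuessTextType(*Text, 2)   ; 1 (#Text_UTF8)
  FreeMemory(*Text)
EndIf
```

A result of `#Text_Unknown` most likely means a legacy 8-bit code page, in which case the file would be read with `#PB_Ascii` instead.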