Array of UTF8 strings in a Unicode program?

mikejs
Enthusiast
Posts: 175
Joined: Thu Oct 21, 2010 9:46 pm

Array of UTF8 strings in a Unicode program?

Post by mikejs »

I'm looking at updating my various programs to support Unicode, given that support for ASCII compilation is ending soon.

Mostly this is straightforward, but I've found one case where I'm not sure what to do. I'm calling an API that expects, as one of its parameters, a pointer to an array of strings. It turns out that it wants UTF-8 strings in this array. Passing it ASCII works because 7-bit ASCII is a subset of UTF-8 (and all the strings in this particular case are plain ASCII). Passing it any sort of 16-bit Unicode format doesn't work.

If I compile my program in ASCII mode, all is well, but in Unicode mode I don't see an easy way to create the right kind of array. The parameter type for the call is a pointer, so I can't use the .p-utf8 pseudotype anywhere, and there doesn't seem to be a way of saying, when you Dim the array, that you want the strings to be anything other than native, whatever that happens to be. PokeS() has a UTF-8 option, but I'm not sure how to use that with array entries, and it looks like a very messy solution to something that just worked with ASCII compilation.

Any suggestions as to a way around this?

Have I missed something obvious?
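Why a 16-bit string fails can be sketched in C, the language of the APIs being called. This is a minimal illustration with made-up helper names, not code from the thread: encoding plain ASCII text as UTF-16LE puts a zero byte after every character, and a function that reads a UTF-8 `char *` stops at the first zero byte, so it sees only the first letter. Plain ASCII bytes, by contrast, are already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode an ASCII-only string as UTF-16LE into out
   (2 bytes per character plus a 2-byte terminator). Assumes out has room
   for 2 * (strlen(s) + 1) bytes. */
static void ascii_to_utf16le(const char *s, unsigned char *out)
{
    size_t i = 0;
    for (; s[i] != '\0'; i++) {
        out[2 * i] = (unsigned char)s[i]; /* low byte: the ASCII code */
        out[2 * i + 1] = 0;               /* high byte: always zero for ASCII */
    }
    out[2 * i] = 0;                       /* 16-bit zero terminator */
    out[2 * i + 1] = 0;
}

/* How a UTF-8 (char *) API measures a string: bytes up to the first 0. */
static size_t utf8_api_length(const void *p)
{
    return strlen((const char *)p);
}
```

After `ascii_to_utf16le("this", buf)`, `utf8_api_length(buf)` is 1, while `utf8_api_length("this")` is 4.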
skywalk
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Array of UTF8 strings in a Unicode program?

Post by skywalk »

Convert your array to .b (byte) or .a (ascii) or a memory block and send its pointer.
A single Chr(0) marks the end of each string.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Oma
Enthusiast
Posts: 312
Joined: Thu Jun 26, 2014 9:17 am
Location: Germany

Re: Array of UTF8 strings in a Unicode program?

Post by Oma »

I've got the same problem.
An API import like

Code:

ImportC ""
	gtk_about_dialog_set_authors(*about, *authors)
EndImport
needs a UTF-8 string array passed for *authors.

I did it this way, but it's really not very elegant:

Code:

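Procedure.s UniToUtf8(SUni.s)
	; Size from the UTF-8 byte count: UTF-8 can need more bytes than UTF-16 (3 vs 2 for U+0800..U+FFFF).
	; +3 = the UTF-8 zero terminator plus zero padding, so that PeekS(), which reads
	; 2-byte Characters in a Unicode build, always reaches a whole zero Character.
	Protected *gMem= AllocateMemory(StringByteLength(SUni, #PB_UTF8) + 3) ; AllocateMemory() zero-fills
	PokeS(*gMem, SUni, -1,  #PB_UTF8)
	Protected.s UTF8= PeekS(*gMem) ; the UTF-8 bytes now travel inside a native string
	FreeMemory(*gMem)
	ProcedureReturn UTF8
EndProcedure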

Global Dim AboutAuthors.s(4)
AboutAuthors(0)= UniToUtf8("Huber")
AboutAuthors(1)= UniToUtf8("Meier")
AboutAuthors(2)= UniToUtf8("Müller")
AboutAuthors(3)= UniToUtf8("Schmidt")
AboutAuthors(4)= #Null$; here it works ;-) (a null string pointer marks the end of the array)

...
...
gtk_about_dialog_set_authors(*gAbout, @AboutAuthors())
I don't know if there's a better way :?:
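One thing to watch when building such a buffer by hand is its size. Sizing a UTF-8 buffer from the UTF-16 byte length (StringByteLength(s, #PB_Unicode)) happens to be enough for Latin text, but characters from U+0800 to U+FFFF (CJK, for instance) take 3 bytes in UTF-8 and only 2 in UTF-16, so the buffer can come out too small; StringByteLength(s, #PB_UTF8) returns the UTF-8 byte count directly. A minimal C sketch with hypothetical helper names that just compares the two totals for a list of code points:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes one code point needs in UTF-8 (valid scalar values assumed). */
static size_t utf8_bytes(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Bytes one code point needs in UTF-16 (a surrogate pair above U+FFFF). */
static size_t utf16_bytes(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

/* Sum both encoded sizes over an array of code points (terminators excluded). */
static void encoded_sizes(const uint32_t *cps, size_t n,
                          size_t *utf8_total, size_t *utf16_total)
{
    *utf8_total = 0;
    *utf16_total = 0;
    for (size_t i = 0; i < n; i++) {
        *utf8_total += utf8_bytes(cps[i]);
        *utf16_total += utf16_bytes(cps[i]);
    }
}
```

For "Müller" the totals are 7 UTF-8 bytes against 12 UTF-16 bytes; for the two characters 日本 (U+65E5, U+672C) they are 6 against 4.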

Best Regards
Charly
PureBasic 5.4-5.7, Linux: (X/L/K)Ubuntus+Mint - Windows XP (32Bit)
PureBasic Linux-API-Library & Viewer: http://www.chabba.de
mikejs
Enthusiast
Posts: 175
Joined: Thu Oct 21, 2010 9:46 pm

Re: Array of UTF8 strings in a Unicode program?

Post by mikejs »

skywalk wrote:Convert your array to .b(byte) or .a(ascii) or a memory block and send its pointer.
A single chr(0) denotes the end of string.
Yes, but if the array now contains numbers, how do I populate the strings in it?

Previously I had code like this:

Code:

      Dim attrs.s(10)
      attrs(0)="this"
      attrs(1)="that"
      attrs(2)="theother"
      attrs(3)="somethingelse"
      attrs(4)="blahblahblah"
      attrs(5)=#Null$
What would that look like for an array of .b or .a? Stepping through each string and using Asc() calls?
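A packed byte block on its own only helps if the API takes one contiguous buffer. If it wants a pointer to an array of strings, the array has to hold pointers, and each pointer can point into the packed block. A minimal C sketch of that layout, assuming the input strings are already UTF-8 (pack_string_array is a made-up name): the pointer table and the packed bytes share one allocation, so one free() releases both. In PureBasic the same layout could be built in one AllocateMemory() block, with PokeS(..., #PB_UTF8) for the strings and PokeI() for the pointers.

```c
#include <stdlib.h>
#include <string.h>

/* Build a NULL-terminated pointer table whose entries point into ONE packed
   byte block ("this\0that\0..."), stored right after the table in the same
   allocation. Strings are assumed to be UTF-8 already. */
static char **pack_string_array(const char *const *src, size_t n)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++)
        bytes += strlen(src[i]) + 1;           /* each string plus its zero byte */

    size_t table = (n + 1) * sizeof(char *);   /* n pointers + final NULL */
    char **arr = malloc(table + bytes);
    if (arr == NULL)
        return NULL;

    char *pos = (char *)arr + table;           /* packed block starts after the table */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        memcpy(pos, src[i], len);
        arr[i] = pos;
        pos += len;
    }
    arr[n] = NULL;                             /* end-of-array marker */
    return arr;
}
```

With the attrs strings above, packed[1] sits 5 bytes after packed[0] ("this" plus its terminator), and packed[5] is the NULL end marker.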
mikejs
Enthusiast
Posts: 175
Joined: Thu Oct 21, 2010 9:46 pm

Re: Array of UTF8 strings in a Unicode program?

Post by mikejs »

Oma wrote:I don't know if there's a better way :?:
There probably should be, but that looks workable at least. I'll do some testing and see if it works here. It's the LDAP SDK API, incidentally: at one point you pass an array of UTF-8 strings listing the attributes you want to read.
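An API that takes "a pointer to an array of strings" usually means a NULL-terminated array of char pointers; GTK, for example, documents the authors parameter of gtk_about_dialog_set_authors() as a NULL-terminated array of strings. The callee walks the array until it reaches a NULL pointer and reads each entry as a zero-terminated byte string. A minimal C sketch of that walk; count_utf8_entries is a hypothetical stand-in, not part of GTK or any LDAP SDK:

```c
#include <stddef.h>

/* Hypothetical callee: walk a NULL-terminated array of pointers, each
   pointing to a zero-terminated UTF-8 string, and count the entries. */
static size_t count_utf8_entries(const char *const *entries)
{
    size_t n = 0;
    while (entries[n] != NULL)   /* the NULL entry ends the array */
        n++;
    return n;
}
```

The array itself holds addresses, not characters, and its last entry must be NULL; that is the role the #Null$ last element plays in the PureBasic arrays in this thread.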

Cheers
Little John
Addict
Posts: 4779
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Array of UTF8 strings in a Unicode program?

Post by Little John »

Idle's ModXString module might be useful for you.
It facilitates using strings in any format, which may be useful when interacting with external libs
that require UTF-8, Unicode, wchar, ASCII or BSTR strings.
Oma
Enthusiast
Posts: 312
Joined: Thu Jun 26, 2014 9:17 am
Location: Germany

Re: Array of UTF8 strings in a Unicode program?

Post by Oma »

@mikejs
There (probably) should be
I agree :wink:

@Little John
Thanks for the link. It's a big module, but worth remembering for cases like this.
skywalk
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Array of UTF8 strings in a Unicode program?

Post by skywalk »

Here's a smaller example. Are you sure your API call doesn't expect the array to be packed?

Code:

EnableExplicit
Macro SF_Uni_UTF8(StringFrom, StringTo)
  StringTo = Space(Len(StringFrom)+SizeOf(Character)+1) ; Only works if I add +1 when StringTo is an array element?
  PokeS(@StringTo, StringFrom, -1, #PB_UTF8); | #PB_String_NoZero)
EndMacro
Define.i i, nPts = 5
Define.s s$
Dim a$(nPts-1)
a$(0) = "000011112222333344445555666677778888"
a$(1) = "Huber"
a$(2) = "Meier"
a$(3) = "Müller"    ;<-- Corrupted if SF_Uni_UTF8() adds +2 instead of +3?
a$(4) = "Schmidt"
For i = 0 To nPts-1
  s$ = a$(i)
  SF_Uni_UTF8(s$, a$(i))
Next i
Debug @a$(0)
ShowMemoryViewer(@a$(3), 255)
ElementE
Enthusiast
Posts: 139
Joined: Sun Feb 22, 2015 2:33 am

Re: Array of UTF8 strings in a Unicode program?

Post by ElementE »

skywalk wrote:a$(3) = "Müller" ;<-- Corrupted if SF_Uni_UTF8() adds +2 instead of +3
The UTF-8 encoding of ü (Unicode code point U+00FC) is the two bytes $C3 $BC.
A check should be added to make sure the characters in the string can be represented as one-byte ASCII characters; any other character takes two to four bytes in UTF-8.
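The two bytes come from UTF-8's bit layout: code points U+0080 to U+07FF are split into 5 + 6 bits and stored as 110xxxxx 10xxxxxx, so U+00FC (binary 00011 111100) becomes $C3 $BC. A minimal C encoder sketch; the function name is made up, and it does not reject surrogates or values above U+10FFFF:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written. No validity checks. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                                   /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                                /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));       /* 11110xxx + 3 continuation bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

utf8_encode(0xFC, b) returns 2 with b = C3 BC; the euro sign U+20AC gives the three bytes E2 82 AC.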
Think Unicode!
kenmo
Addict
Posts: 2033
Joined: Tue Dec 23, 2003 3:54 am

Re: Array of UTF8 strings in a Unicode program?

Post by kenmo »

How about this method?

It's not very user-friendly to walk the UTF-8 array in PB, but if you just need a UTF-8 array pointer to pass to an API, it should be OK.

Code:

Procedure.i MakeUTF8Array(Array Strings.s(1))
  N = ArraySize(Strings()) + 1
  *Array = AllocateMemory(SizeOf(INTEGER) * (N + 1))
  If (*Array)
    For i = 0 To N - 1
      SBL = StringByteLength(Strings(i), #PB_UTF8) + 1
      *Ptr = AllocateMemory(SBL)
      If (*Ptr)
        PokeS(*Ptr, Strings(i), -1, #PB_UTF8)
        PokeI(*Array + i * SizeOf(INTEGER), *Ptr)
      EndIf
    Next i
  EndIf
  ProcedureReturn *Array
EndProcedure

Procedure.i FreeUTF8Array(*Array)
  If (*Array)
    i = 0
    Repeat
      *Ptr = PeekI(*Array + i * SizeOf(INTEGER))
      If (Not *Ptr)
        Break
      EndIf
      FreeMemory(*Ptr)
      i + 1
    ForEver
    FreeMemory(*Array)
  EndIf
  ProcedureReturn #Null
EndProcedure

;- - DEMO

Dim Test.s(4)
Test(0) = "000011112222333344445555666677778888"
Test(1) = "Huber"
Test(2) = "Meier"
Test(3) = "Müller"
Test(4) = "Schmidt"

*Array = MakeUTF8Array(Test())

If (*Array)
  
  i = 0
  Repeat
    *Ptr = PeekI(*Array + i * SizeOf(INTEGER))
    If (Not *Ptr)
      Break
    EndIf
    Debug "UTF-8 string at " + Hex(*Ptr) + " = " + PeekS(*Ptr, -1, #PB_UTF8)
    i + 1
  ForEver
  
  FreeUTF8Array(*Array)
EndIf
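For comparison, a C sketch of the same make/free pair (function names are illustrative). One difference is deliberate: in the PureBasic version, a failed AllocateMemory() leaves a zero slot, which ends the array early, and the free routine then stops at that slot and misses the later strings. This version releases everything and returns NULL instead. It assumes the input strings are already UTF-8.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of individually allocated strings. */
static void free_utf8_array(char **arr)
{
    if (arr == NULL)
        return;
    for (size_t i = 0; arr[i] != NULL; i++)
        free(arr[i]);
    free(arr);
}

/* One allocation per string plus a NULL-terminated pointer table.
   On any allocation failure, everything allocated so far is freed. */
static char **make_utf8_array(const char *const *src, size_t n)
{
    char **arr = calloc(n + 1, sizeof *arr);  /* zeroed: arr[n] is the NULL end marker */
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        arr[i] = malloc(len);
        if (arr[i] == NULL) {
            free_utf8_array(arr);             /* frees 0..i-1, stops at arr[i] == NULL */
            return NULL;
        }
        memcpy(arr[i], src[i], len);
    }
    return arr;
}
```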
Demivec
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Array of UTF8 strings in a Unicode program?

Post by Demivec »

@kenmo: I get a strange bug when running your code.

After the value is assigned to Test(1), it can't be retrieved again; it is as if it were #Null$. The debug output from your demo shows nothing for that index: the second line is blank, so only four lines have any text.

It looks like this:

Code:

000011112222333344445555666677778888

Meier
Müller
Schmidt
UTF-8 string at 1AE0870 = 000011112222333344445555666677778888
UTF-8 string at 1AE08A0 = 
UTF-8 string at 1AE08C0 = Meier
UTF-8 string at 1AE08E0 = Müller
UTF-8 string at 1AE0900 = Schmidt
I even tried printing the value with Debug right after the initial assignment, and it shows nothing. :shock:


The rest of the code (besides the initial assignment) functions as expected.


I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
ElementE
Enthusiast
Posts: 139
Joined: Sun Feb 22, 2015 2:33 am

Re: Array of UTF8 strings in a Unicode program?

Post by ElementE »

Using the code kenmo posted above, I get the following debug output (PB v5.41 LTS Beta 2, Windows 64-bit):
UTF-8 string at 241330 = 000011112222333344445555666677778888
UTF-8 string at 241360 = Huber
UTF-8 string at 241380 = Meier
UTF-8 string at 2413A0 = Müller
UTF-8 string at 2413C0 = Schmidt
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Array of UTF8 strings in a Unicode program?

Post by wilbert »

Here's a UTF-8 conversion using a prototype that works in both ASCII and Unicode mode.

Code:

Prototype.s ProtoUTF8String(str.p-utf8)

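Procedure.s UTF8String_(*str)
  ; *str points to the temporary UTF-8 copy created by the .p-utf8 pseudotype
  CompilerIf #PB_Compiler_Unicode
    ; Pack the UTF-8 bytes two per Character; (bytes + 1) >> 1 rounds up,
    ; so an odd byte count also pulls in the zero terminator byte
    ProcedureReturn PeekS(*str, (MemoryStringLength(*str, #PB_Ascii) + 1) >> 1)
  CompilerElse
    ProcedureReturn PeekS(*str) ; ASCII build: the bytes are copied as they are
  CompilerEndIf
EndProcedure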

Global UTF8String.ProtoUTF8String = @UTF8String_()


Dim Test.s(2)
Test(0) = UTF8String("Meier")
Test(1) = UTF8String("Müller")
Test(2) = UTF8String("Schmidt")

Debug Test(1)
Debug PeekS(@Test(1), -1, #PB_UTF8)
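The (bytes + 1) >> 1 rounding is what keeps the result a valid C string. A C model of the same byte layout, assuming a native Unicode string is a run of 16-bit units followed by one 16-bit zero (the helper name is hypothetical): with an odd byte count the rounded-up last unit already contains the UTF-8 terminator, and with an even count the appended 16-bit zero supplies it. Because the bytes are copied rather than converted, byte order does not matter.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy a zero-terminated UTF-8 string into a buffer of 16-bit units,
   rounding the unit count up, then append one 16-bit zero the way a
   native Unicode string gets its terminator. Caller frees the result. */
static uint16_t *pack_utf8_in_utf16_units(const char *utf8)
{
    size_t bytes = strlen(utf8);
    size_t units = (bytes + 1) >> 1;          /* odd length: includes the zero byte */
    uint16_t *buf = malloc((units + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    memcpy(buf, utf8, units * sizeof *buf);   /* bytes, or bytes + 1 when odd */
    buf[units] = 0;                           /* the appended 16-bit terminator */
    return buf;
}
```

Both "Müller" (7 bytes) and "ü" (2 bytes) read back unchanged through a const char pointer to the buffer.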
Windows (x64)
Raspberry Pi OS (Arm64)
mikejs
Enthusiast
Posts: 175
Joined: Thu Oct 21, 2010 9:46 pm

Re: Array of UTF8 strings in a Unicode program?

Post by mikejs »

wilbert wrote:Here a utf8 conversion using prototype working in both ascii and unicode mode.
That works for me :)
kenmo
Addict
Posts: 2033
Joined: Tue Dec 23, 2003 3:54 am

Re: Array of UTF8 strings in a Unicode program?

Post by kenmo »

Demivec wrote:After the value is assigned to Test(1) the value can't be retrieved again. It is as if it is a Null$. The debug output from your demo shows nothing for that index, the debug output shows only 4 lines because the 2nd line is blank.

...

I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
Very weird. Maybe cut out code until you find exactly what ruins the string (might be a 64-bit PB bug?). I can't test the 64-bit version right now.