Array of UTF8 strings in a Unicode program?
I'm looking at updating my various programs to support unicode, given that support for ascii compilation is ending soon.
Mostly this is straightforward, but I've found one case where I'm not sure what to do. I'm calling an API that expects, as one of its parameters, a pointer to an array of strings. It turns out that it wants utf8 strings in this array. Passing it ascii works because that overlaps with utf8 to a large extent (and completely in this particular case). Passing it any sort of 16bit unicode format doesn't work.
If I compile my program in ascii, all is well, but in unicode, I don't see an easy way to create the right kind of array. The parameter type for the call is a pointer, so I can't use .p-utf8 anywhere, and there doesn't seem to be a way of saying, when you Dim the array, that you want the strings to be anything other than native, whatever that happens to be. PokeS has utf8 options, but I'm not sure how to use that with array entries, and that looks a very messy solution to something that just worked with ascii compilation.
Any suggestions as to a way around this?
Have I missed something obvious?
Re: Array of UTF8 strings in a Unicode program?
Convert your array to .b(byte) or .a(ascii) or a memory block and send its pointer.
A single chr(0) denotes the end of string.
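As a rough illustration of the memory-block idea, the sketch below stores one string as zero-terminated UTF-8 and reads it back. The variable names are made up for the example, and it only covers a single string, not the pointer array the API ultimately wants.

```purebasic
; Sketch: store one string as zero-terminated UTF-8 in a memory block.
; text$, size and *buf are illustrative names, not part of any API.
Define text$ = "Meier"
Define size = StringByteLength(text$, #PB_UTF8) + 1  ; +1 for the terminating zero byte
Define *buf = AllocateMemory(size)
If *buf
  PokeS(*buf, text$, -1, #PB_UTF8)   ; writes the UTF-8 bytes followed by a zero byte
  Debug PeekS(*buf, -1, #PB_UTF8)    ; reads the bytes back as a native string
  Debug PeekA(*buf + size - 1)       ; the last byte of the block is the terminator
  FreeMemory(*buf)
EndIf
```

An API that takes an array of strings would then typically be given a block of such pointers, ended by a null pointer.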
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Re: Array of UTF8 strings in a Unicode program?
I've got the same problem.
An API import like this needs to be passed a UTF-8 string array for *authors. I don't know if there's a better way than what I did further down.
Code: Select all
ImportC ""
gtk_about_dialog_set_authors(*about, *authors)
EndImport
I did it this way, but it's really not very elegant:
Code: Select all
Procedure.s UniToUtf8(SUni.s)
Protected *gMem= AllocateMemory(StringByteLength(SUni, #PB_UTF8) + 3) ; UTF-8 size plus spare zero bytes, so PeekS() below always finds a terminator in Unicode mode
PokeS(*gMem, SUni, -1, #PB_UTF8)
Protected.s UTF8= PeekS(*gMem)
FreeMemory(*gMem)
ProcedureReturn UTF8
EndProcedure
Global Dim AboutAuthors.s(4)
AboutAuthors(0)= UniToUtf8("Huber")
AboutAuthors(1)= UniToUtf8("Meier")
AboutAuthors(2)= UniToUtf8("Müller")
AboutAuthors(3)= UniToUtf8("Schmidt")
AboutAuthors(4)= #Null$; here it works ;-)
...
...
gtk_about_dialog_set_authors(*gAbout, @AboutAuthors())

Best Regards
Charly
PureBasic 5.4-5.7, Linux: (X/L/K)Ubuntus+Mint - Windows XP (32Bit)
PureBasic Linux-API-Library & Viewer: http://www.chabba.de
Re: Array of UTF8 strings in a Unicode program?
skywalk wrote: Convert your array to .b(byte) or .a(ascii) or a memory block and send its pointer.
A single chr(0) denotes the end of string.
Yes, but if the array now contains numbers, how do I populate the strings in it?
Previously I had code like this:
Code: Select all
Dim attrs.s(10)
attrs(0)="this"
attrs(1)="that"
attrs(2)="theother"
attrs(3)="somethingelse"
attrs(4)="blahblahblah"
attrs(5)=#Null$
Re: Array of UTF8 strings in a Unicode program?
Oma wrote: I don't know if there's a better way
There probably should be, but that looks workable at least. I'll do some testing and see if that works here. It's the LDAP SDK API, incidentally: at one point you pass an array of UTF-8 strings listing the attributes you want to read.
Cheers
Re: Array of UTF8 strings in a Unicode program?
Idle's ModXString module might be useful for you.
It facilitates using strings in any format, which may be useful when interacting with external libs that require UTF-8, Unicode, wchar, ASCII or BSTR.
Re: Array of UTF8 strings in a Unicode program?
@mikejs
mikejs wrote: There (probably) should be
I agree.
@Little John
Thanks for the link. A big thing, but something that should be remembered in some cases.
Re: Array of UTF8 strings in a Unicode program?
Smaller example code...
Are you sure your API call does not expect the array to be packed?
Code: Select all
EnableExplicit
Macro SF_Uni_UTF8(StringFrom, StringTo)
StringTo = Space(Len(StringFrom)+SizeOf(Character)+1) ; Only works if I add +1 when StringTo is an array element?
PokeS(@StringTo, StringFrom, -1, #PB_UTF8); | #PB_String_NoZero)
EndMacro
Define.i i, nPts = 5
Define.s s$
Dim a$(nPts-1)
a$(0) = "000011112222333344445555666677778888"
a$(1) = "Huber"
a$(2) = "Meier"
a$(3) = "Müller" ;<-- Corrupted if SF_Uni_UTF8() adds +2 instead of +3?
a$(4) = "Schmidt"
For i = 0 To nPts-1
s$ = a$(i)
SF_Uni_UTF8(s$, a$(i))
Next i
Debug @a$(0)
ShowMemoryViewer(@a$(3), 255)
Re: Array of UTF8 strings in a Unicode program?
skywalk wrote: a$(3) = "Müller" ;<-- Corrupted if SF_Uni_UTF8() adds +2 instead of +3
The UTF-8 encoding of ü (Unicode code point U+00FC) is the two bytes $C3 $BC, so Len() undercounts the bytes the converted string needs.
A check should be added to make sure the characters in the string can be represented as one-byte ASCII characters.
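To make the size difference concrete, here is a small sketch that compares the character count with the UTF-8 byte count. It assumes a Unicode compile; name$ is just an illustrative variable, and Chr($FC) is used so the result does not depend on how the source file is saved.

```purebasic
; Sketch, assuming the program is compiled in Unicode mode.
Define name$ = "M" + Chr($FC) + "ller"        ; Chr($FC) is ü (U+00FC)
Debug Len(name$)                               ; 6 characters
Debug StringByteLength(name$, #PB_UTF8)        ; 7 bytes, because ü needs two bytes ($C3 $BC)
Debug StringByteLength(name$, #PB_UTF8) + 1    ; buffer size including the zero terminator
```

Sizing a buffer from StringByteLength() with #PB_UTF8 rather than from Len() avoids having to guess how many extra bytes to add.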
Think Unicode!
Re: Array of UTF8 strings in a Unicode program?
How about this method?
It's not very user-friendly to walk the UTF-8 array in PB, but if you just need a UTF-8 array pointer to pass to API, it should be OK.
Code: Select all
Procedure.i MakeUTF8Array(Array Strings.s(1))
N = ArraySize(Strings()) + 1
*Array = AllocateMemory(SizeOf(INTEGER) * (N + 1))
If (*Array)
For i = 0 To N - 1
SBL = StringByteLength(Strings(i), #PB_UTF8) + 1
*Ptr = AllocateMemory(SBL)
If (*Ptr)
PokeS(*Ptr, Strings(i), -1, #PB_UTF8)
PokeI(*Array + i * SizeOf(INTEGER), *Ptr)
EndIf
Next i
EndIf
ProcedureReturn *Array
EndProcedure
Procedure.i FreeUTF8Array(*Array)
If (*Array)
i = 0
Repeat
*Ptr = PeekI(*Array + i * SizeOf(INTEGER))
If (Not *Ptr)
Break
EndIf
FreeMemory(*Ptr)
i + 1
ForEver
FreeMemory(*Array)
EndIf
ProcedureReturn #Null
EndProcedure
;- - DEMO
Dim Test.s(4)
Test(0) = "000011112222333344445555666677778888"
Test(1) = "Huber"
Test(2) = "Meier"
Test(3) = "Müller"
Test(4) = "Schmidt"
*Array = MakeUTF8Array(Test())
If (*Array)
i = 0
Repeat
*Ptr = PeekI(*Array + i * SizeOf(INTEGER))
If (Not *Ptr)
Break
EndIf
Debug "UTF-8 string at " + Hex(*Ptr) + " = " + PeekS(*Ptr, -1, #PB_UTF8)
i + 1
ForEver
FreeUTF8Array(*Array)
EndIf
Re: Array of UTF8 strings in a Unicode program?
@kenmo: I get a strange bug when running your code.
After the value is assigned to Test(1) the value can't be retrieved again. It is as if it is a Null$. The debug output from your demo shows nothing for that index, the debug output shows only 4 lines because the 2nd line is blank.
It looks like this:
Code: Select all
000011112222333344445555666677778888
Meier
Müller
Schmidt
UTF-8 string at 1AE0870 = 000011112222333344445555666677778888
UTF-8 string at 1AE08A0 =
UTF-8 string at 1AE08C0 = Meier
UTF-8 string at 1AE08E0 = Müller
UTF-8 string at 1AE0900 = Schmidt

I even tried debugging the output right after the initial assignment and it shows nothing.
The rest of the code (besides the initial assignment) functions as expected.
I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
Re: Array of UTF8 strings in a Unicode program?
Using the code kenmo posted (see above) I get the following Debug Output (PB v5.41 LTS Beta 2, Windows 64-bit):
UTF-8 string at 241330 = 000011112222333344445555666677778888
UTF-8 string at 241360 = Huber
UTF-8 string at 241380 = Meier
UTF-8 string at 2413A0 = Müller
UTF-8 string at 2413C0 = Schmidt
Re: Array of UTF8 strings in a Unicode program?
Here is a UTF-8 conversion using a prototype that works in both ASCII and Unicode mode.
Code: Select all
Prototype.s ProtoUTF8String(str.p-utf8)
Procedure.s UTF8String_(*str)
CompilerIf #PB_Compiler_Unicode
ProcedureReturn PeekS(*str, (MemoryStringLength(*str, #PB_Ascii) + 1) >> 1)
CompilerElse
ProcedureReturn PeekS(*str)
CompilerEndIf
EndProcedure
Global UTF8String.ProtoUTF8String = @UTF8String_()
Dim Test.s(2)
Test(0) = UTF8String("Meier")
Test(1) = UTF8String("Müller")
Test(2) = UTF8String("Schmidt")
Debug Test(1)
Debug PeekS(@Test(1), -1, #PB_UTF8)
Windows (x64)
Raspberry Pi OS (Arm64)
Re: Array of UTF8 strings in a Unicode program?
wilbert wrote: Here a utf8 conversion using prototype working in both ascii and unicode mode.
That works for me

Re: Array of UTF8 strings in a Unicode program?
Demivec wrote: After the value is assigned to Test(1) the value can't be retrieved again. It is as if it is a Null$. The debug output from your demo shows nothing for that index, the debug output shows only 4 lines because the 2nd line is blank.
...
I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
Very weird. Maybe cut out code until you find exactly what ruins the string (might be a 64-bit PB bug?). I can't test the 64-bit version right now.