Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 5:30 pm
by mikejs
I'm looking at updating my various programs to support Unicode, given that support for ASCII compilation is ending soon.
Mostly this is straightforward, but I've found one case where I'm not sure what to do. I'm calling an API that expects, as one of its parameters, a pointer to an array of strings. It turns out that it wants UTF-8 strings in this array. Passing it ASCII works because that overlaps with UTF-8 to a large extent (and completely in this particular case). Passing it any sort of 16-bit Unicode format doesn't work.
If I compile my program in ASCII, all is well, but in Unicode I don't see an easy way to create the right kind of array. The parameter type for the call is a pointer, so I can't use .p-utf8 anywhere, and there doesn't seem to be a way of saying, when you Dim the array, that you want the strings to be anything other than native, whatever that happens to be. PokeS has UTF-8 options, but I'm not sure how to use that with array entries, and it looks like a very messy solution to something that just worked with ASCII compilation.
Any suggestions as to a way around this?
Have I missed something obvious?
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 5:51 pm
by skywalk
Convert your array to .b(byte) or .a(ascii) or a memory block and send its pointer.
A single chr(0) denotes the end of string.
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 5:56 pm
by Oma
I've got the same problem.
An API import like
Code:
ImportC ""
  gtk_about_dialog_set_authors(*about, *authors)
EndImport
needs a UTF-8 string array passed for *authors.
I did it this way, but it's really not very elegant:
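Code:
Procedure.s UniToUtf8(SUni.s)
  ; Size the buffer from the UTF-8 length: one 2-byte Unicode character can need 3 bytes in UTF-8.
  ; +3 leaves room for the UTF-8 terminator plus a 16-bit terminator when the byte count is odd
  ; (AllocateMemory() zero-fills the buffer by default).
  Protected *gMem = AllocateMemory(StringByteLength(SUni, #PB_UTF8) + 3)
  PokeS(*gMem, SUni, -1, #PB_UTF8)
  Protected.s UTF8 = PeekS(*gMem) ; read the UTF-8 bytes back as if they were a native string
  FreeMemory(*gMem)
  ProcedureReturn UTF8
EndProcedure
Global Dim AboutAuthors.s(4)
AboutAuthors(0) = UniToUtf8("Huber")
AboutAuthors(1) = UniToUtf8("Meier")
AboutAuthors(2) = UniToUtf8("Müller")
AboutAuthors(3) = UniToUtf8("Schmidt")
AboutAuthors(4) = #Null$ ; here it works ;-)
...
...
gtk_about_dialog_set_authors(*gAbout, @AboutAuthors())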
I don't know if there's a better way
Best Regards
Charly
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 5:59 pm
by mikejs
skywalk wrote:Convert your array to .b(byte) or .a(ascii) or a memory block and send its pointer.
A single chr(0) denotes the end of string.
Yes, but if the array now contains numbers, how do I populate the strings in it?
Previously I had code like this:
Code:
Dim attrs.s(10)
attrs(0)="this"
attrs(1)="that"
attrs(2)="theother"
attrs(3)="somethingelse"
attrs(4)="blahblahblah"
attrs(5)=#Null$
What would that look like for an array of .b or .a? Stepping through each string and using Asc() calls?
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 6:06 pm
by mikejs
Oma wrote:I don't know if there's a better way
There probably should be, but that looks workable at least. I'll do some testing and see if that works here. It's the LDAP SDK api, incidentally - at one point you pass an array of utf8 strings listing the attributes you want to read.
Cheers
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 6:12 pm
by Little John
Idle's ModXString module might be useful for you.
Facilitates using strings in any format, which may be useful when interacting with external libs
that require either utf8, unicode, wchar, ascii or Bstr
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 7:00 pm
by Oma
@mikejs
There (probably) should be
I agree
@Little John
Thanks for the link. A big thing, but something that should be remembered in some cases.
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 7:09 pm
by skywalk
Smaller example code... Are you sure your API call does not expect the array to be packed?
Code:
EnableExplicit
Macro SF_Uni_UTF8(StringFrom, StringTo)
  StringTo = Space(Len(StringFrom)+SizeOf(Character)+1) ; Only works if I add +1 when StringTo is an array element?
  PokeS(@StringTo, StringFrom, -1, #PB_UTF8); | #PB_String_NoZero)
EndMacro
Define.i i, nPts = 5
Define.s s$
Dim a$(nPts-1)
a$(0) = "000011112222333344445555666677778888"
a$(1) = "Huber"
a$(2) = "Meier"
a$(3) = "Müller" ;<-- Corrupted if SF_Uni_UTF8() adds +2 instead of +3?
a$(4) = "Schmidt"
For i = 0 To nPts-1
  s$ = a$(i)
  SF_Uni_UTF8(s$, a$(i))
Next i
Debug @a$(0)
ShowMemoryViewer(@a$(3), 255)
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 8:01 pm
by ElementE
skywalk wrote:a$(3) = "Müller" ;<-- Corrupted if SF_Uni_UTF8() adds +2 instead of +3
The UTF-8 encoding of ü (Unicode code point U+00FC) is two bytes: $C3 $BC.
A check should be added to make sure the characters in the string can be represented as one-byte ASCII characters.
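To make the size difference concrete, here is a minimal sketch (assuming a Unicode build; the variable names are only for illustration) that compares the character count of "Müller" with its UTF-8 byte count and dumps the encoded bytes. The ü is built with Chr($FC) so the result does not depend on how the source file is saved.

```purebasic
; Sketch only: assumes the program is compiled in Unicode mode.
name.s = "M" + Chr($FC) + "ller"            ; "Müller" without relying on the file encoding

chars = Len(name)                           ; 6 characters
bytes = StringByteLength(name, #PB_UTF8)    ; 7 bytes, terminating zero not included

Debug "Characters:  " + Str(chars)
Debug "UTF-8 bytes: " + Str(bytes)

*buf = AllocateMemory(bytes + 1)            ; +1 for the terminating zero byte
If *buf
  PokeS(*buf, name, -1, #PB_UTF8)
  dump.s = ""
  For i = 0 To bytes - 1
    dump + RSet(Hex(PeekA(*buf + i)), 2, "0") + " "
  Next i
  Debug dump                                ; the ü shows up as the pair C3 BC
  FreeMemory(*buf)
EndIf
```

Any buffer sized from Len() alone will be one byte short for each character like ü, which is why sizing from StringByteLength(..., #PB_UTF8) is the safer habit.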
Re: Array of UTF8 strings in a Unicode program?
Posted: Mon Dec 07, 2015 8:03 pm
by kenmo
How about this method?
It's not very user-friendly to walk the UTF-8 array in PB, but if you just need a UTF-8 array pointer to pass to API, it should be OK.
Code:
Procedure.i MakeUTF8Array(Array Strings.s(1))
  ; Builds a zero-terminated table of pointers to UTF-8 copies of Strings()
  N = ArraySize(Strings()) + 1
  *Array = AllocateMemory(SizeOf(INTEGER) * (N + 1)) ; the extra slot stays zero and ends the table
  If (*Array)
    For i = 0 To N - 1
      SBL = StringByteLength(Strings(i), #PB_UTF8) + 1 ; +1 for the terminating zero byte
      *Ptr = AllocateMemory(SBL)
      If (*Ptr)
        PokeS(*Ptr, Strings(i), -1, #PB_UTF8)
        PokeI(*Array + i * SizeOf(INTEGER), *Ptr)
      EndIf
    Next i
  EndIf
  ProcedureReturn *Array
EndProcedure
Procedure.i FreeUTF8Array(*Array)
  ; Frees every string up to the first zero entry, then the table itself
  If (*Array)
    i = 0
    Repeat
      *Ptr = PeekI(*Array + i * SizeOf(INTEGER))
      If (Not *Ptr)
        Break
      EndIf
      FreeMemory(*Ptr)
      i + 1
    ForEver
    FreeMemory(*Array)
  EndIf
  ProcedureReturn #Null
EndProcedure
;- - DEMO
Dim Test.s(4)
Test(0) = "000011112222333344445555666677778888"
Test(1) = "Huber"
Test(2) = "Meier"
Test(3) = "Müller"
Test(4) = "Schmidt"
*Array = MakeUTF8Array(Test())
If (*Array)
  ; Walk the table the same way the API would
  i = 0
  Repeat
    *Ptr = PeekI(*Array + i * SizeOf(INTEGER))
    If (Not *Ptr)
      Break
    EndIf
    Debug "UTF-8 string at " + Hex(*Ptr) + " = " + PeekS(*Ptr, -1, #PB_UTF8)
    i + 1
  ForEver
  FreeUTF8Array(*Array)
EndIf
Re: Array of UTF8 strings in a Unicode program?
Posted: Tue Dec 08, 2015 3:09 am
by Demivec
@kenmo: I get a strange bug when running your code.
After the value is assigned to Test(1) the value can't be retrieved again. It is as if it is a Null$. The debug output from your demo shows nothing for that index, the debug output shows only 4 lines because the 2nd line is blank.
It looks like this:
Code:
000011112222333344445555666677778888
Meier
Müller
Schmidt
UTF-8 string at 1AE0870 = 000011112222333344445555666677778888
UTF-8 string at 1AE08A0 =
UTF-8 string at 1AE08C0 = Meier
UTF-8 string at 1AE08E0 = Müller
UTF-8 string at 1AE0900 = Schmidt
I even tried debugging the output right after the initial assignment and it shows nothing.
The rest of the code (besides the initial assignment) functions as expected.
I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
Re: Array of UTF8 strings in a Unicode program?
Posted: Tue Dec 08, 2015 3:48 am
by ElementE
Using the code kenmo posted above, I get the following Debug Output (PB v5.41 LTS Beta 2, Windows 64-bit):
UTF-8 string at 241330 = 000011112222333344445555666677778888
UTF-8 string at 241360 = Huber
UTF-8 string at 241380 = Meier
UTF-8 string at 2413A0 = Müller
UTF-8 string at 2413C0 = Schmidt
Re: Array of UTF8 strings in a Unicode program?
Posted: Tue Dec 08, 2015 6:59 am
by wilbert
Here is a UTF-8 conversion using a prototype that works in both ASCII and Unicode mode.
Code:
Prototype.s ProtoUTF8String(str.p-utf8)
Procedure.s UTF8String_(*str)
  CompilerIf #PB_Compiler_Unicode
    ; Read the UTF-8 bytes as 16-bit characters, rounding up when the byte count is odd
    ProcedureReturn PeekS(*str, (MemoryStringLength(*str, #PB_Ascii) + 1) >> 1)
  CompilerElse
    ProcedureReturn PeekS(*str)
  CompilerEndIf
EndProcedure
Global UTF8String.ProtoUTF8String = @UTF8String_()
Dim Test.s(2)
Test(0) = UTF8String("Meier")
Test(1) = UTF8String("Müller")
Test(2) = UTF8String("Schmidt")
Debug Test(1)
Debug PeekS(@Test(1), -1, #PB_UTF8)
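As a rough sketch of how such an array might then look to the receiving API: the loop below walks the string array through its base address, the way a C function taking a pointer to an array of strings would. It assumes a Unicode build and assumes that a PB string array is stored as a table of pointers to the string data, which is what Oma's gtk call earlier in the thread relies on but is not an official description of the array internals. The array name Attrs and the trailing #Null$ entry are only for illustration.

```purebasic
; Sketch only: assumes Unicode mode and a pointer-table layout for string arrays.
Prototype.s ProtoUTF8String(str.p-utf8)

Procedure.s UTF8String_(*str)
  ; Same idea as above: keep the UTF-8 bytes, stored inside a native string
  ProcedureReturn PeekS(*str, (MemoryStringLength(*str, #PB_Ascii) + 1) >> 1)
EndProcedure

Global UTF8String.ProtoUTF8String = @UTF8String_()

Dim Attrs.s(3)
Attrs(0) = UTF8String("Meier")
Attrs(1) = UTF8String("M" + Chr($FC) + "ller")   ; "Müller"
Attrs(2) = UTF8String("Schmidt")
Attrs(3) = #Null$                                ; intended as the zero entry that ends the list

*table = @Attrs()                                ; the pointer that would be passed to the API
For i = 0 To ArraySize(Attrs())                  ; bounded, in case no zero entry is found
  *entry = PeekI(*table + i * SizeOf(Integer))
  If *entry = 0
    Break
  EndIf
  Debug PeekS(*entry, -1, #PB_UTF8)              ; decode each entry as UTF-8
Next i
```

If the names come back intact here, the same @Attrs() pointer is what the API call would receive. Note that the array elements no longer hold readable native strings, so they should only be used for the call itself.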
Re: Array of UTF8 strings in a Unicode program?
Posted: Tue Dec 08, 2015 1:14 pm
by mikejs
wilbert wrote:Here a utf8 conversion using prototype working in both ascii and unicode mode.
That works for me

Re: Array of UTF8 strings in a Unicode program?
Posted: Tue Dec 08, 2015 1:47 pm
by kenmo
Demivec wrote:After the value is assigned to Test(1) the value can't be retrieved again. It is as if it is a Null$. The debug output from your demo shows nothing for that index, the debug output shows only 4 lines because the 2nd line is blank.
...
I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
Very weird. Maybe cut out code until you find exactly what ruins the string (might be a 64-bit PB bug?). I can't test the 64-bit version right now.