Array of UTF8 strings in a Unicode program?

Oma
Enthusiast
Posts: 312
Joined: Thu Jun 26, 2014 9:17 am
Location: Germany

Re: Array of UTF8 strings in a Unicode program?

Post by Oma »

Thank you, wilbert!
A very ingenious method, and it works fine on 64-bit Linux.
But on 32-bit I get an invalid memory access on the line
Test(0) = UTF8String("Meier")

If I change *str to a quad-type str.q as a test, no invalid memory access occurs, but the debug strings stay empty.

At the moment I have no idea what the problem is. It seems that the address must be 64-bit on 32-bit systems too. :?

Best Regards, Charly
PureBasic 5.4-5.7, Linux: (X/L/K)Ubuntus+Mint - Windows XP (32Bit)
PureBasic Linux-API-Library & Viewer: http://www.chabba.de
ElementE
Enthusiast
Posts: 139
Joined: Sun Feb 22, 2015 2:33 am

Re: Array of UTF8 strings in a Unicode program?

Post by ElementE »

Hi wilbert.
When I run your code in unicode mode, I get the following debug output:
썍沼敬r
Müller
Original code:
wilbert wrote: Here is a UTF-8 conversion using a prototype that works in both ASCII and Unicode mode.

Code: Select all

Prototype.s ProtoUTF8String(str.p-utf8)

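Procedure.s UTF8String_(*str)
  ; *str points at the UTF-8 bytes produced by the p-utf8 pseudotype of the prototype
  CompilerIf #PB_Compiler_Unicode
    ; Copy the raw bytes as 2-byte characters: (byte count + 1) / 2, rounded down
    ProcedureReturn PeekS(*str, (MemoryStringLength(*str, #PB_Ascii) + 1) >> 1)
  CompilerElse
    ; Ascii mode: one byte per character, so a plain PeekS copies the bytes as-is
    ProcedureReturn PeekS(*str)
  CompilerEndIf
EndProcedure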

Global UTF8String.ProtoUTF8String = @UTF8String_()


Dim Test.s(2)
Test(0) = UTF8String("Meier")
Test(1) = UTF8String("Müller")
Test(2) = UTF8String("Schmidt")

Debug Test(1)
Debug PeekS(@Test(1), -1, #PB_UTF8)
Think Unicode!
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Array of UTF8 strings in a Unicode program?

Post by wilbert »

Oma wrote: on 32-bit I get an invalid memory access on the line
Test(0) = UTF8String("Meier")
That's strange. On OS X it works fine with both the x86 and x64 versions of PB.
Does it fail in both ASCII and Unicode mode?
ElementE wrote:When I run your code in unicode mode, I get the following debug output:
썍沼敬r
Müller
It's supposed to output that in unicode mode :wink:
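For anyone puzzled by that output: in a Unicode build the array element holds the raw UTF-8 bytes, and Debug shows them as 2-byte UTF-16 characters. Here is a minimal sketch of the same effect, assuming a Unicode build and a source file saved as UTF-8. The 16-byte buffer size is an arbitrary choice for illustration and is not part of the code above.

```purebasic
; Sketch only: shows why UTF-8 "Müller" reads as "썍沼敬r" when viewed as UTF-16.
; Assumes a Unicode executable; the buffer size (16) is an arbitrary illustration value.
Define *buf = AllocateMemory(16)          ; AllocateMemory zero-fills by default
If *buf
  PokeS(*buf, "Müller", -1, #PB_UTF8)     ; writes 4D C3 BC 6C 6C 65 72 00
  Debug PeekS(*buf, -1, #PB_Unicode)      ; bytes paired as C34D 6CBC 656C 0072 -> 썍沼敬r
  Debug PeekS(*buf, -1, #PB_UTF8)         ; decoded as UTF-8 -> Müller
  FreeMemory(*buf)
EndIf
```

The 7 UTF-8 bytes plus the terminating 0 make exactly four 2-byte units, which is why the last character still shows up as a plain "r".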
Windows (x64)
Raspberry Pi OS (Arm64)
skywalk
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Array of UTF8 strings in a Unicode program?

Post by skywalk »

A different approach: convert a native string array to a string buffer in a user-defined encoding.

Code: Select all

CompilerIf #PB_Compiler_Unicode = 0
  MessageRequester("try-uni-Array-utf8-mem", "Requires #PB_Compiler_Unicode." + #CRLF$ +
                   "PB v5.4+ is dropping Ascii compiler switch." + #CRLF$, #MB_ICONWARNING)
  End
CompilerEndIf
EnableExplicit
Procedure.i JoinToMem(*nBytes.Integer, Array A$(1), Delm$=#Empty$, iStart.i=0, iStop.i=-1, Enc.i=#PB_Ascii)
  ; REV:  151207, skywalk
  ;       Spinoff of Join() which returns a concatenated native PB string(Unicode only).
  ; RETURN:
  ;  *s     = String buffer concatenated from A$() in user-defined encoding(Enc).
  ;  nBytes = num bytes within string buffer. *s contains Chr(0)'s so cannot use Len(PeekS(*s,-1,Enc))
  ; NOTES:
  ;   Delm$       Action
  ;   #Empty$     OK to write Chr(0) after each element.
  ;   #Null$      Set #PB_String_NoZero and add nothing between elements.
  ;   > 0         Set #PB_String_NoZero and only add Delm$ between elements.
  Protected.i i, k, npb, nBytes, lenbEOS, *p, *s
  Protected.i PokeNoZero = #PB_String_NoZero
  Protected.s r$
  If iStart < 0
    iStart = 0
  EndIf
  If iStop < 0
    k = ArraySize(A$())
    iStop = k
  Else
    If iStart >= iStop
      iStop = iStart
      k = iStop - iStart
    Else
      k = iStop - iStart + 1
    EndIf
  EndIf
  ; Determine nBytes required to hold string array contents
  For i = iStart To iStop
    nBytes + StringByteLength(A$(i), Enc)
  Next i
  If Delm$
    nBytes + k * StringByteLength(Delm$, Enc) ; Add room for delimiters
  EndIf
  If @Delm$ And Len(Delm$) < 1  ; Delm$ has an address so allow PokeZero
    PokeNoZero = 0
    If Enc <> #PB_Unicode       ; Set Size(bytes) of trailing nullchar in user defined encoding.
      lenbEOS = 1
    Else
      lenbEOS = 2
    EndIf
    nBytes + k * lenbEOS        ; Account for nullchar delimiters + 1 trailer.
  EndIf
  *s = AllocateMemory(nBytes+lenbEOS)
  If *s
    If nBytes <= MemorySize(*s) ; Verify enough memory created
      *p = *s                   ; Create tracking pointer for concatenating memory
      If Delm$
        npb = PokeS(*p, A$(iStart), -1, Enc | PokeNoZero)
        *p + npb + lenbEOS
        If k > 0
          npb = PokeS(*p, Delm$, -1, Enc | PokeNoZero)
          *p + npb + lenbEOS
          k = iStop - 1         ; Avoid recalculating k-1 in For-Next loop
          For i = iStart + 1 To k
            npb = PokeS(*p, A$(i), -1, Enc | PokeNoZero)
            *p + npb + lenbEOS
            npb = PokeS(*p, Delm$, -1, Enc | PokeNoZero)
            *p + npb + lenbEOS
          Next i
          npb = PokeS(*p, A$(i), -1, Enc | PokeNoZero)
        EndIf
      Else
        npb = PokeS(*p, A$(iStart), -1, Enc | PokeNoZero)
        *p + npb + lenbEOS
        If k > 0
          For i = iStart + 1 To iStop
            npb = PokeS(*p, A$(i), -1, Enc | PokeNoZero)
            *p + npb + lenbEOS
          Next i
        EndIf
      EndIf
      ;FreeMemory(*s)           ; Not now, but remember to free memory when done.
    EndIf
  EndIf
  If *nBytes                    ; Avoid null pointer.
    *nBytes\i = nBytes          ; Buffer contains 0's so <> Len(PeekS(*s,-1,Enc))
  EndIf
  ProcedureReturn *s
EndProcedure
;-{ TEST
#NUL$ = #Empty$
#SP$  = " "
#CMA$ = ","
Define.i i, *s, nBytes, Enc, nPts = 5
Define.s s$
Dim a$(nPts-1)
a$(0) = "000011112222333344445555666677778888"
a$(1) = "Huber"
a$(2) = "Völler"
a$(3) = "Müller"
a$(4) = "Šimûnek"
Enc = #PB_UTF8
;Enc = #PB_Ascii
;Enc = #PB_Unicode
*s = JoinToMem(@nBytes, a$(), #CMA$, 0, -1, Enc)
ShowMemoryViewer(*s, nBytes)
Debug PeekS(*s+nBytes-2, 1, Enc)
FreeMemory(*s)
*s = JoinToMem(@nBytes, a$(), #Null$, 0, -1, Enc)
ShowMemoryViewer(*s, nBytes)
Debug PeekS(*s+nBytes-2, 1, Enc)
FreeMemory(*s)
*s = JoinToMem(@nBytes, a$(), #NUL$, 0, -1, Enc)
ShowMemoryViewer(*s, nBytes)
Debug PeekS(*s+nBytes-2, 1, Enc)
FreeMemory(*s)
;-} TEST
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Demivec
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Array of UTF8 strings in a Unicode program?

Post by Demivec »

kenmo wrote:
Demivec wrote:After the value is assigned to Test(1) the value can't be retrieved again. It is as if it is a Null$. The debug output from your demo shows nothing for that index, the debug output shows only 4 lines because the 2nd line is blank.

...

I compiled it as Unicode with Windows 8.1 x64 PB v5.40.
Very weird. Maybe cut out code until you find exactly what ruins the string (might be a 64-bit PB bug?). I can't test the 64-bit version right now.
My tests today show that there are no anomalies with the output, or anything else. Can't explain the difference or the initial occurrence. Hopefully the space-time continuum is also all back as it should be. :)


@Skywalk: Your code only outputs 3 e's and shows the strings via the memory viewer. Given what happened when I tested kenmo's code, I'll wait a day and try it again. 8)
skywalk
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Array of UTF8 strings in a Unicode program?

Post by skywalk »

Demivec wrote: @Skywalk: Your code only outputs 3 e's and shows the strings via the memory viewer. Given what happened when I tested kenmo's code, I'll wait a day and try it again. 8)
Haha, yes, that is the correct output. The string buffer is intended to be passed to a DLL as a pointer. I only provided a few debug commands to show the memory and a common character in the buffer. The difference with this approach is that the strings are immediately terminated with 0's and/or delimiters per the user's setting, in a user-defined encoding. I have no immediate use for the array approach.
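To make the buffer layout concrete, here is a small standalone sketch (not JoinToMem itself) that packs three names as consecutive 0-terminated UTF-8 strings and then walks the buffer again. The array name$(), the variable names, and the three sample strings are assumptions chosen for illustration. It relies on PokeS returning the number of bytes written without the terminating 0, the same behaviour the code above depends on.

```purebasic
; Sketch only: consecutive 0-terminated UTF-8 strings in one memory block.
; name$(), total, *buf and *p are hypothetical names for this example.
EnableExplicit
Define *buf, *p, i, total, s$
Dim name$(2)
name$(0) = "Meier"
name$(1) = "Müller"
name$(2) = "Schmidt"

; Size needed: UTF-8 bytes of each string plus one terminating 0 per string
For i = 0 To 2
  total + StringByteLength(name$(i), #PB_UTF8) + 1
Next i

*buf = AllocateMemory(total)
If *buf
  ; Pack: PokeS writes the string and its 0; step past both
  *p = *buf
  For i = 0 To 2
    *p + PokeS(*p, name$(i), -1, #PB_UTF8) + 1
  Next i

  ; Walk: read up to the next 0, then skip the bytes just read plus the 0
  *p = *buf
  For i = 0 To 2
    s$ = PeekS(*p, -1, #PB_UTF8)
    Debug s$
    *p + StringByteLength(s$, #PB_UTF8) + 1
  Next i

  FreeMemory(*buf)   ; only free once the receiving function is done with the pointer
EndIf
```

A receiving function would get *buf (and, if it needs it, the total byte count), which is the same idea as passing *s and nBytes from JoinToMem.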
kenmo
Addict
Posts: 2033
Joined: Tue Dec 23, 2003 3:54 am

Re: Array of UTF8 strings in a Unicode program?

Post by kenmo »

Demivec wrote: My tests today show that there are no anomalies with the output, or anything else. Can't explain the difference or the initial occurrence. Hopefully the space-time continuum is also all back as it should be. :)
Sounds like you are the third person to post about an intermittent null-string issue...
http://www.purebasic.fr/english/viewtop ... 13&t=63910
Oma
Enthusiast
Posts: 312
Joined: Thu Jun 26, 2014 9:17 am
Location: Germany

Re: Array of UTF8 strings in a Unicode program?

Post by Oma »

It seems to be a big topic :D

Hello Wilbert,
Does it fail in both ascii and unicode mode ?
Yes, it happens in both ASCII and Unicode mode.

If I have time I'll try it again this evening. (But I have no experience with prototypes :| )

Best Regards,
Charly
mikejs
Enthusiast
Posts: 175
Joined: Thu Oct 21, 2010 9:46 pm

Re: Array of UTF8 strings in a Unicode program?

Post by mikejs »

mikejs wrote:
wilbert wrote: Here is a UTF-8 conversion using a prototype that works in both ASCII and Unicode mode.
That works for me :)
... correction. It works for me on 64-bit, but does some very strange things on 32-bit. In this specific case it's probably going to be easier to just construct a block of memory to pass to the function (the strings I need to pass are known at compile time and do not need to vary at run time), but I think there needs to be a better native solution to this.

I'll post something in Feature Requests...