
How do I get the leftmost/rightmost character in a string?

Posted: Sat May 06, 2023 7:37 am
by The8th
I am trying to retrieve the rightmost/leftmost character of a string, but it does not work. How can I achieve this?

Code:

EnableExplicit
Define example$ = "🅐A🅚K🅝"
Define i.b
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
  Debug Mid(example$, i, 1)
Next i
The output is:
� (Seems to be $DD5D)

� (Seems to be $D83C)



A


K



But it should be:
🅝

🅐

🅐
A
🅚
K
🅝

PB 6.01 LTS (x86)
Henry
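For reference, here is a minimal sketch of a loop that produces the expected output. It assumes, as the replies in this thread establish, that PB stores 🅐, 🅚 and 🅝 as UTF-16 surrogate pairs of two UCS-2 units each. `IsHighSurrogate()` and `SplitChars()` are hypothetical helpers, not PB built-ins; the idea is simply to keep a high surrogate and the unit after it together when calling `Mid()`. Whether the glyphs actually appear still depends on the debugger font.

```purebasic
EnableExplicit

; Hypothetical helper: #True if Unit is a high (leading) UTF-16 surrogate.
Procedure IsHighSurrogate(Unit.u)
  ProcedureReturn Bool(Unit >= $D800 And Unit <= $DBFF)
EndProcedure

; Hypothetical helper: splits String$ into whole characters, keeping each
; surrogate pair (2 UCS-2 units) together in one list element.
Procedure SplitChars(String$, List Chars$())
  Protected i = 1
  Protected n = Len(String$)          ; length in UCS-2 units, not characters
  
  ClearList(Chars$())
  While i <= n
    AddElement(Chars$())
    ; PeekU() reads the UCS-2 unit at 1-based position i
    If i < n And IsHighSurrogate(PeekU(@String$ + (i - 1) * SizeOf(Character)))
      Chars$() = Mid(String$, i, 2)   ; high + low surrogate = one character
      i + 2
    Else
      Chars$() = Mid(String$, i, 1)   ; ordinary character
      i + 1
    EndIf
  Wend
EndProcedure

Define example$ = "🅐A🅚K🅝"
NewList Chars$()

SplitChars(example$, Chars$())
If LastElement(Chars$())
  Debug Chars$()                      ; rightmost character
EndIf
Debug ""
If FirstElement(Chars$())
  Debug Chars$()                      ; leftmost character
EndIf
Debug ""
ForEach Chars$()
  Debug Chars$()                      ; one whole character per line
Next
```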

Posted: Sat May 06, 2023 7:51 am
by Fred
Maybe the debugger font doesn't have these characters. Did you try writing them to a file to see whether they are correct?

Posted: Sat May 06, 2023 7:58 am
by BarryG
The debugger window doesn't show them for me either, no matter which font I set it to. For quick reference: none of Arial, Consolas, Dina or Courier New shows them. I also tried writing them to a file with this code, but the file doesn't show them either:

Code:

EnableExplicit
Define example$ = "🅐A🅚K🅝"
Define i.b
CreateFile(0,"d:\zzz.txt")
WriteStringN(0, Right(example$, 1))
WriteStringN(0, "")
WriteStringN(0, Left(example$, 1))
WriteStringN(0, "")
For i = 1 To Len(example$)
  WriteStringN(0, Mid(example$, i, 1))
Next i
CloseFile(0)


Posted: Sat May 06, 2023 8:40 am
by #NULL
Fred wrote: Sat Apr 08, 2023 12:02 pm
PB supports only UCS-2 Unicode, without surrogate support. This should be mentioned in the general string section, as basically every function that manipulates strings is impacted.
Maybe those are surrogate pairs? Each of them seems to occupy 4 bytes.
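The 4 bytes follow from the standard UTF-16 surrogate encoding. For 🅐 (U+1F150), subtract 0x10000 and split the remaining 20 bits into two 10-bit halves:

```latex
\begin{aligned}
v &= \mathtt{0x1F150} - \mathtt{0x10000} = \mathtt{0xF150} \\
\mathit{high} &= \mathtt{0xD800} + \lfloor v / \mathtt{0x400} \rfloor = \mathtt{0xD800} + \mathtt{0x3C} = \mathtt{0xD83C} \\
\mathit{low} &= \mathtt{0xDC00} + (v \bmod \mathtt{0x400}) = \mathtt{0xDC00} + \mathtt{0x150} = \mathtt{0xDD50}
\end{aligned}
```

Stored little-endian, that is `3C D8 50 DD`, matching the memory viewer. Since PB counts each half as one character, `Left(s1, 1)` returns only the high surrogate $D83C, and `Right(s1, 1)` only the low surrogate of 🅝 (U+1F15D), which is $DD5D.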

Code:

s1.s = "🅐A🅚K🅝"
s2.s = Left(s1, 1)
s3.s = Right(s1, 1)
ShowMemoryViewer(@s1, 16) ; 🅐 is '3C D8 50 DD', A is '41 00'
CallDebugger              ; (click Debugger Continue)
ShowMemoryViewer(@s2, 4)  ; shows '3C D8 00 00' i.e. only the first 2 bytes of 🅐
CallDebugger
ShowMemoryViewer(@s3, 4)  ; shows '5D DD 00 00' i.e. only the last 2 bytes of 🅝
CallDebugger


Posted: Sat May 06, 2023 8:41 am
by infratec

Code:

ShowMemoryViewer(@example$, StringByteLength(example$))
Shows:
3C D8 50 DD 41 00 3C D8 5A DD 4B 00 3C D8 5D DD <ØPÝA.<ØZÝK.<Ø]Ý
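Reading that dump as code points rather than bytes means recombining each high/low pair as (high - $D800) << 10 + (low - $DC00) + $10000. A minimal sketch; `CombineSurrogates()` and `CodePointsHex()` are hypothetical names, not PB functions:

```purebasic
EnableExplicit

; Hypothetical helper: joins a high and a low surrogate into one code point.
Procedure CombineSurrogates(High.u, Low.u)
  ProcedureReturn ((High - $D800) << 10) + (Low - $DC00) + $10000
EndProcedure

; Hypothetical helper: lists the code points of String$ as hex values.
Procedure.s CodePointsHex(String$)
  Protected *Unit.Unicode = @String$
  Protected *Low.Unicode
  Protected CodePoint, Result$
  
  While *Unit\u
    *Low = *Unit + 2                   ; at worst this is the null terminator
    If *Unit\u >= $D800 And *Unit\u <= $DBFF And *Low\u >= $DC00 And *Low\u <= $DFFF
      CodePoint = CombineSurrogates(*Unit\u, *Low\u)
      *Unit + 4                        ; a surrogate pair occupies 4 bytes
    Else
      CodePoint = *Unit\u
      *Unit + 2                        ; a plain UCS-2 unit occupies 2 bytes
    EndIf
    If Result$ <> ""
      Result$ + " "
    EndIf
    Result$ + Hex(CodePoint)
  Wend
  ProcedureReturn Result$
EndProcedure

Debug CodePointsHex("🅐A🅚K🅝")   ; 1F150 41 1F15A 4B 1F15D
```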

Posted: Sat May 06, 2023 8:44 am
by infratec
But ....

Code:

Define example$ = "🅐A🅚K🅝"

ShowMemoryViewer(@example$, StringByteLength(example$))


Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
  Debug Mid(example$, i, 1)
Next i

*Buffer = UTF8(example$)
Debug PeekS(*Buffer, -1, #PB_UTF8)

Posted: Sat May 06, 2023 8:52 am
by infratec
Even stranger:

Code:

Define example$ = "🅐A🅚K🅝"
Debug example$

ShowMemoryViewer(@example$, StringByteLength(example$))

Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
  Debug Mid(example$, i, 1)
Next i

Debug "------"

*Buffer = UTF8(example$)
Converted$ = PeekS(*Buffer, -1, #PB_UTF8)

ShowMemoryViewer(*Buffer, MemorySize(*Buffer))
Debug Converted$

Debug PeekS(*Buffer, 1, #PB_UTF8)
Debug PeekS(*Buffer, 4, #PB_UTF8|#PB_ByteLength)

Debug Left(Converted$, 1)
*Buffer is displayed correctly, and so is Converted$, but not parts of it.

Posted: Sat May 06, 2023 9:01 am
by infratec
As a result:

Code:

Procedure.s LeftUnicode(String$, Len.i)
  
  Protected *Buffer, Result$
  
  *Buffer = UTF8(String$)
  If *Buffer
    Result$ = PeekS(*Buffer, Len, #PB_UTF8)
    FreeMemory(*Buffer)
  EndIf
  
  ProcedureReturn Result$
  
EndProcedure


Define example$ = "🅐A🅚K🅝"
Debug example$

Debug LeftUnicode(example$, 2)
But Right is much more difficult: ReverseString() does not work, and the size of each character in bytes is unknown.
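In the UTF-8 buffer the size of each character can in fact be read from its lead byte: bytes below $80 are single-byte ASCII, $C0 to $DF start a 2-byte sequence, $E0 to $EF a 3-byte one and $F0 to $F7 a 4-byte one, while $80 to $BF are continuation bytes. A sketch with a hypothetical `UTF8SequenceLength()` that uses this to step through the buffer returned by `UTF8()`; it assumes, as the tests in this thread suggest, that `UTF8()` emits one proper 4-byte sequence per surrogate pair.

```purebasic
EnableExplicit

; Hypothetical helper: number of bytes in the UTF-8 sequence that starts with LeadByte.
Procedure UTF8SequenceLength(LeadByte.a)
  If LeadByte < $80
    ProcedureReturn 1   ; 0xxxxxxx: ASCII
  ElseIf LeadByte >= $F0
    ProcedureReturn 4   ; 11110xxx: e.g. F0 9F 85 90 for U+1F150
  ElseIf LeadByte >= $E0
    ProcedureReturn 3   ; 1110xxxx
  ElseIf LeadByte >= $C0
    ProcedureReturn 2   ; 110xxxxx
  EndIf
  ProcedureReturn 1     ; 10xxxxxx: stray continuation byte, skip it on its own
EndProcedure

Define example$ = "🅐A🅚K🅝"
Define *Buffer = UTF8(example$)
Define *Ptr.Ascii
Define Count

If *Buffer
  *Ptr = *Buffer
  While *Ptr\a
    Count + 1                              ; one more character
    *Ptr + UTF8SequenceLength(*Ptr\a)      ; jump over all of its bytes
  Wend
  FreeMemory(*Buffer)
EndIf
Debug Count   ; characters, not UCS-2 units
```

Knowing where each character starts is what a Right() needs: record the start offsets while stepping, then take the last n of them.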

Posted: Sat May 06, 2023 9:03 am
by The8th
Thanks to everyone who tried the example.
I fear there is no solution for these characters.

Code:

Debug Chr($1F150)
doesn't work either (it should show 🅐).
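Since `Chr()` apparently cannot go beyond a single UCS-2 unit, a code point above $FFFF has to be built as a surrogate pair by hand. A sketch under a hypothetical name, `ChrUTF16()`; it writes the two units into a 2-character buffer with `PokeU()` rather than relying on `Chr()` to accept lone surrogate values:

```purebasic
EnableExplicit

; Hypothetical Chr() replacement that also accepts code points above $FFFF.
; Returns a surrogate pair (2 UCS-2 units) for those, a single unit otherwise.
Procedure.s ChrUTF16(CodePoint.l)
  Protected v.l
  Protected Pair$
  
  If CodePoint < $10000
    ProcedureReturn Chr(CodePoint)
  EndIf
  v = CodePoint - $10000                                  ; 20 bits remain
  Pair$ = Space(2)                                        ; room for two units
  PokeU(@Pair$, $D800 + (v >> 10))                        ; high surrogate: top 10 bits
  PokeU(@Pair$ + SizeOf(Character), $DC00 + (v & $3FF))   ; low surrogate: bottom 10 bits
  ProcedureReturn Pair$
EndProcedure

Debug ChrUTF16($1F150)        ; 🅐, if the debugger font has the glyph
Debug Len(ChrUTF16($1F150))   ; 2: PB still counts UCS-2 units
```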
Trim functions fail with an error:

Code:

Define example$ = "🅐A🅚K🅝"
Debug RTrim(example$ , "🅝")
Debug LTrim(example$ , "🅐")
Caution: the example above throws a runtime error!
RemoveString works:

Code:

Define example$ = "🅐A🅚K🅝"
Debug RemoveString(example$ , "🅐")
Debug RemoveString(example$ , "🅚")
Debug RemoveString(example$ , "🅝")
Henry
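Because RemoveString() matches whole substrings, a plain substring comparison can also stand in for the failing RTrim()/LTrim() calls. Presumably those fail because their second argument is expected to be a single character, while 🅝 is two UCS-2 units. A sketch with hypothetical names `RTrimString()` and `LTrimString()`:

```purebasic
EnableExplicit

; Hypothetical helper: removes every trailing occurrence of Suffix$.
; Suffix$ may be several UCS-2 units long, e.g. a surrogate pair.
Procedure.s RTrimString(String$, Suffix$)
  Protected n = Len(Suffix$)
  If n = 0
    ProcedureReturn String$
  EndIf
  While Right(String$, n) = Suffix$
    String$ = Left(String$, Len(String$) - n)
  Wend
  ProcedureReturn String$
EndProcedure

; Hypothetical helper: removes every leading occurrence of Prefix$.
Procedure.s LTrimString(String$, Prefix$)
  Protected n = Len(Prefix$)
  If n = 0
    ProcedureReturn String$
  EndIf
  While Left(String$, n) = Prefix$
    String$ = Mid(String$, n + 1)
  Wend
  ProcedureReturn String$
EndProcedure

Define example$ = "🅐A🅚K🅝"
Debug RTrimString(example$, "🅝")   ; 🅐A🅚K
Debug LTrimString(example$, "🅐")   ; A🅚K🅝
```

Comparing all units of the pair at once means a trim can never cut a surrogate pair in half.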

Posted: Sat May 06, 2023 9:22 am
by infratec
Very, very ugly, but working:

Code:

EnableExplicit

Procedure.s LeftUnicode(String$, Len.i)
  
  Protected *Buffer, Result$
  
  
  *Buffer = UTF8(String$)
  If *Buffer
    Result$ = PeekS(*Buffer, Len, #PB_UTF8)
    FreeMemory(*Buffer)
  EndIf
  
  ProcedureReturn Result$
  
EndProcedure


Procedure.s RightUnicode(String$, Len.i)
  
  Protected *Buffer, Result$, *Ptr, Index
  Protected NewList CharList$()
  
  
  *Buffer = UTF8(String$)
  If *Buffer
    ; split the UTF-8 copy into one list element per character
    *Ptr = *Buffer
    While PeekA(*Ptr) <> 0
      AddElement(CharList$())
      CharList$() = PeekS(*Ptr, 1, #PB_UTF8)
      *Ptr + StringByteLength(PeekS(*Ptr, 1, #PB_UTF8), #PB_UTF8)
    Wend
    FreeMemory(*Buffer)
    
    ; append the last Len elements (all of them if Len >= ListSize)
    ForEach CharList$()
      If Index >= ListSize(CharList$()) - Len
        Result$ + CharList$()
      EndIf
      Index + 1
    Next
    
  EndIf
  
  ProcedureReturn Result$
  
EndProcedure



Define Test$
Define example$ = "🅐A🅚K🅝"
Debug example$


Test$ = LeftUnicode(example$, 2)
Debug Test$

Test$ = RightUnicode(example$, 2)
Debug Test$

Posted: Sat May 06, 2023 9:31 am
by infratec
PB uses UCS-2 (in general ;) ).
It looks like some functions, such as UTF8(), handle a bit more (by accident).

Posted: Sat May 06, 2023 10:16 am
by mk-soft
The result is still a UTF16 string. ;)

Code:

Test$ = LeftUnicode(example$, 2)
Debug Test$
Debug Len(Test$)

Test$ = RightUnicode(example$, 2)
Debug Test$
Debug Len(Test$)


Posted: Sat May 06, 2023 10:27 am
by infratec
Strange, I hadn't tested this.
But then something inside PB is wrong.

Posted: Sat May 06, 2023 10:34 am
by mk-soft
If a string contains UTF-16 surrogate pairs, you have to treat it differently, which means completely new string functions (and those are then slower).
PureBasic uses UCS-2, which is fine 99.99% of the time.

Code:

Procedure IsUTF16String(String$)
  Protected *String.Unicode = @String$
  
  If *String
    While *String\u
      If *String\u > $D7FF And *String\u < $E000
        ProcedureReturn #True
      EndIf
      *String + 2
    Wend
  EndIf
  ProcedureReturn #False
EndProcedure

Procedure LenUTF16(String$)
  Protected *Char.Unicode
  Protected cnt
  
  *Char.Unicode = @String$
  If *Char
    While *Char\u
      If *Char\u > $D7FF And *Char\u < $E000
        *Char + 4 ; surrogate pair: one character, two code units
      Else
        *Char + 2 ; ordinary UCS-2 character
      EndIf
      cnt + 1
    Wend
  EndIf
  ProcedureReturn cnt
EndProcedure

Procedure.s LeftUTF16(String$, Length)
  Protected *Char.Unicode
  Protected len, cnt
  
  If Length < 1
    ProcedureReturn ""
  EndIf
  *Char.Unicode = @String$
  If *Char
    While *Char\u
      If cnt >= Length
        Break
      EndIf
      If *Char\u > $D7FF And *Char\u < $E000
        *Char + 4
        len + 2
      Else
        *Char + 2
        len + 1
      EndIf
      cnt + 1
    Wend
  EndIf
  ProcedureReturn Left(String$, len)
EndProcedure

Procedure.s RightUTF16(String$, Length)
  Protected *Char.Unicode, *String.Unicode
  Protected len, cnt
  
  If Length < 1
    ProcedureReturn ""
  EndIf
  *String = @String$
  If *String And String$ <> ""
    ; start at the last code unit and walk backwards, never reading before *String
    *Char = *String + StringByteLength(String$) - 2
    While *Char >= *String And cnt < Length
      ; a low surrogate with a unit before it closes a 2-unit character
      If *Char > *String And *Char\u >= $DC00 And *Char\u < $E000
        *Char - 4
        len + 2
      Else
        *Char - 2
        len + 1
      EndIf
      cnt + 1
    Wend
  EndIf
  ProcedureReturn Right(String$, len)
EndProcedure

; ****

Define s1.s

s1 = "🅐A🅚K🅝"
Debug "Is '" + s1 + "' UTF16: " + IsUTF16String(s1)
Debug "StringByteLength = " + StringByteLength(s1)
Debug "Len = " + LenUTF16(s1)
Debug "Left = " + LeftUTF16(s1, 2)
Debug "Right = " + RightUTF16(s1, 2)


Posted: Sat May 06, 2023 11:22 am
by idle
I will see if I can add Left(), Mid() and Right() functions to the UTF-16 full case-folding module.
It should be doable in a single pass.
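A single-pass Mid() along those lines might look like the following sketch. `MidUTF16()` is a hypothetical name, not part of idle's module: it walks the string once, maps character positions to UCS-2 positions (counting each valid surrogate pair as one character), and then lets the built-in Mid() copy the slice.

```purebasic
EnableExplicit

; Hypothetical single-pass Mid() that counts a surrogate pair as one character.
; Start and Length are in characters; a negative Length means "to the end".
Procedure.s MidUTF16(String$, Start, Length = -1)
  Protected *Unit.Unicode = @String$
  Protected CharIndex = 1   ; 1-based character position
  Protected Pos = 1         ; matching 1-based UCS-2 position
  Protected FirstUnit       ; UCS-2 position where the slice starts (0 = not reached)
  Protected Units           ; UCS-2 units belonging to the slice
  Protected Width
  
  If Start < 1 Or Length = 0
    ProcedureReturn ""
  EndIf
  While *Unit\u
    If Length > 0 And CharIndex >= Start + Length
      Break                 ; enough characters collected
    EndIf
    Width = 1
    If *Unit\u >= $D800 And *Unit\u <= $DBFF
      If PeekU(*Unit + 2) >= $DC00 And PeekU(*Unit + 2) <= $DFFF
        Width = 2           ; valid surrogate pair
      EndIf
    EndIf
    If CharIndex = Start
      FirstUnit = Pos
    EndIf
    If CharIndex >= Start
      Units + Width
    EndIf
    Pos + Width
    *Unit + Width * 2       ; each UCS-2 unit is 2 bytes
    CharIndex + 1
  Wend
  If FirstUnit = 0
    ProcedureReturn ""      ; Start lies beyond the end of the string
  EndIf
  ProcedureReturn Mid(String$, FirstUnit, Units)
EndProcedure

Define s$ = "🅐A🅚K🅝"
Debug MidUTF16(s$, 3, 1)   ; 🅚
Debug MidUTF16(s$, 4)      ; K🅝
```

Left and Right then reduce to MidUTF16(s$, 1, n) and MidUTF16(s$, LenUTF16(s$) - n + 1), at the cost of a second pass for the length.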