How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 7:37 am
by The8th
I am trying to retrieve the rightmost/leftmost character of a string, but it does not work. How can I achieve this?
Code: Select all
EnableExplicit
Define example$ = "🅐A🅚K🅝"
Define i.b
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
Debug Mid(example$, i, 1)
Next i
The output is:
� (Seems to be $DD5D)
� (Seems to be $D83C)
�
�
A
�
�
K
�
�
But it should be:
🅝
🅐
🅐
A
🅚
K
🅝
PB 6.01 LTS (x86)
Henry
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 7:51 am
by Fred
Maybe the debugger font doesn't have these characters. Did you try writing the output to a file to see if it's right?
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 7:58 am
by BarryG
The debugger window doesn't show those for me either, no matter which font I set it to. For quick reference: none of Arial, Consolas, Dina, or Courier New shows them. I also wrote the output to a file with this code, but the file doesn't show them either:
Code: Select all
EnableExplicit
Define example$ = "🅐A🅚K🅝"
Define i.b
CreateFile(0,"d:\zzz.txt")
WriteStringN(0, Right(example$, 1))
WriteStringN(0, "")
WriteStringN(0, Left(example$, 1))
WriteStringN(0, "")
For i = 1 To Len(example$)
WriteStringN(0, Mid(example$, i, 1))
Next i
CloseFile(0)

Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 8:40 am
by #NULL
Fred wrote: Sat Apr 08, 2023 12:02 pm
PB supports only UCS-2 Unicode, without surrogate support. This should be mentioned in the general string section, as basically every function that manipulates strings is affected.
Maybe those are surrogate characters? They seem to occupy 4 bytes.
Code: Select all
s1.s = "🅐A🅚K🅝"
s2.s = Left(s1, 1)
s3.s = Right(s1, 1)
ShowMemoryViewer(@s1, 16) ; 🅐 is '3C D8 50 DD', A is '41 00'
CallDebugger ; (click Debugger Continue)
ShowMemoryViewer(@s2, 4) ; shows '3C D8 00 00' i.e. only the first 2 bytes of 🅐
CallDebugger
ShowMemoryViewer(@s3, 4) ; shows '5D DD 00 00' i.e. only the last 2 bytes of 🅝
CallDebugger
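For reference, the byte pairs seen in the memory viewer follow from the UTF-16 surrogate encoding. Below is a minimal sketch of that arithmetic, assuming a code point above $FFFF. The procedure names HighSurrogate and LowSurrogate are made up for illustration and are not PB built-ins.

```purebasic
EnableExplicit

; Hypothetical helpers (not PB built-ins): split a code point above $FFFF
; into the two 16-bit units that UTF-16 stores for it.
Procedure.i HighSurrogate(CodePoint.i)
  ProcedureReturn $D800 + ((CodePoint - $10000) >> 10)   ; upper 10 bits
EndProcedure

Procedure.i LowSurrogate(CodePoint.i)
  ProcedureReturn $DC00 + ((CodePoint - $10000) & $3FF)  ; lower 10 bits
EndProcedure

; U+1F150 is 🅐
Debug Hex(HighSurrogate($1F150))   ; D83C, stored little-endian as 3C D8
Debug Hex(LowSurrogate($1F150))    ; DD50, stored little-endian as 50 DD
```

Left() and Right() cut between these two units, which would explain why each returns only half of the character.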
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 8:41 am
by infratec
Code: Select all
ShowMemoryViewer(@example$, StringByteLength(example$))
Shows:
3C D8 50 DD 41 00 3C D8 5A DD 4B 00 3C D8 5D DD <ØPÝA.<ØZÝK.<Ø]Ý
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 8:44 am
by infratec
But ....
Code: Select all
Define example$ = "🅐A🅚K🅝"
ShowMemoryViewer(@example$, StringByteLength(example$))
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
Debug Mid(example$, i, 1)
Next i
*Buffer = UTF8(example$)
Debug PeekS(*Buffer, -1, #PB_UTF8)
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 8:52 am
by infratec
Even stranger:
Code: Select all
Define example$ = "🅐A🅚K🅝"
Debug example$
ShowMemoryViewer(@example$, StringByteLength(example$))
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
Debug Mid(example$, i, 1)
Next i
Debug "------"
*Buffer = UTF8(example$)
Converted$ = PeekS(*Buffer, -1, #PB_UTF8)
ShowMemoryViewer(*Buffer, MemorySize(*Buffer))
Debug Converted$
Debug PeekS(*Buffer, 1, #PB_UTF8)
Debug PeekS(*Buffer, 4, #PB_UTF8|#PB_ByteLength)
Debug Left(Converted$, 1)
*Buffer is displayed correctly, and so is Converted$, but parts of it are not.
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 9:01 am
by infratec
As a result:
Code: Select all
Procedure.s LeftUnicode(String$, Len.i)
Protected *Buffer, Result$
*Buffer = UTF8(String$)
If *Buffer
Result$ = PeekS(*Buffer, Len, #PB_UTF8)
FreeMemory(*Buffer)
EndIf
ProcedureReturn Result$
EndProcedure
Define example$ = "🅐A🅚K🅝"
Debug example$
Debug LeftUnicode(example$, 2)
But Right is much more difficult. ReverseString does not work, and the size of each character in bytes is unknown.
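The byte size of a character is not stored anywhere, but in UTF-8 it can be derived from the first byte of its sequence. Here is a rough sketch of that rule. Utf8SequenceLength is a hypothetical helper name, not a PB function.

```purebasic
EnableExplicit

; Hypothetical helper: number of bytes in a UTF-8 sequence, judged only
; by its first (lead) byte. Returns 0 for a byte that cannot start one.
Procedure.i Utf8SequenceLength(LeadByte.a)
  If LeadByte < $80
    ProcedureReturn 1   ; 0xxxxxxx: plain ASCII
  ElseIf LeadByte < $C0
    ProcedureReturn 0   ; 10xxxxxx: continuation byte, not a lead byte
  ElseIf LeadByte < $E0
    ProcedureReturn 2   ; 110xxxxx
  ElseIf LeadByte < $F0
    ProcedureReturn 3   ; 1110xxxx
  ElseIf LeadByte < $F8
    ProcedureReturn 4   ; 11110xxx
  EndIf
  ProcedureReturn 0     ; $F8 and above never appear in valid UTF-8
EndProcedure

; U+1F150 (🅐) is F0 9F 85 90 in UTF-8, and 'A' is 41
Debug Utf8SequenceLength($F0)   ; 4
Debug Utf8SequenceLength($41)   ; 1
```

With something like this, a scan over the buffer returned by UTF8() could step from one character to the next by reading a single byte each time.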
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 9:03 am
by The8th
Thanks to everyone who tried the example.
I fear there is no solution for these characters.
doesn't work also (should show an 🅐).
Trim functions fail with an error:
Code: Select all
Define example$ = "🅐A🅚K🅝"
Debug RTrim(example$ , "🅝")
Debug LTrim(example$ , "🅐")
Caution: The example above throws a runtime error!
RemoveString works:
Code: Select all
Define example$ = "🅐A🅚K🅝"
Debug RemoveString(example$ , "🅐")
Debug RemoveString(example$ , "🅚")
Debug RemoveString(example$ , "🅝")
Henry
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 9:22 am
by infratec
Very, very ugly, but working:
Code: Select all
EnableExplicit
Procedure.s LeftUnicode(String$, Len.i)
Protected *Buffer, Result$
*Buffer = UTF8(String$)
If *Buffer
Result$ = PeekS(*Buffer, Len, #PB_UTF8)
FreeMemory(*Buffer)
EndIf
ProcedureReturn Result$
EndProcedure
Procedure.s RightUnicode(String$, Len.i)
Protected *Buffer, Result$, *Ptr, Count
Protected NewList CharList$()
*Buffer = UTF8(String$)
If *Buffer
*Ptr = *Buffer
While Not PeekA(*Ptr) = 0
AddElement(CharList$())
CharList$() = PeekS(*Ptr, 1, #PB_UTF8)
*Ptr + StringByteLength(PeekS(*Ptr, 1, #PB_UTF8), #PB_UTF8)
Wend
FreeMemory(*Buffer)
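; Clamp Len so the start index can never become negative
If Len > ListSize(CharList$())
Len = ListSize(CharList$())
EndIf
If Len > 0 And SelectElement(CharList$(), ListSize(CharList$()) - Len)
Repeat
Result$ + CharList$()
Until NextElement(CharList$()) = 0
EndIf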
EndIf
ProcedureReturn Result$
EndProcedure
Define Test$
Define example$ = "🅐A🅚K🅝"
Debug example$
Test$ = LeftUnicode(example$, 2)
Debug Test$
Test$ = RightUnicode(example$, 2)
Debug Test$
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 9:31 am
by infratec
PB is using UCS-2 (in general).
It looks like some functions, such as UTF8(), happen to handle a bit more than that (by accident).
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 10:16 am
by mk-soft
The result is still a UTF16 string.
Code: Select all
Test$ = LeftUnicode(example$, 2)
Debug Test$
Debug Len(Test$)
Test$ = RightUnicode(example$, 2)
Debug Test$
Debug Len(Test$)
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 10:27 am
by infratec
Strange. I had not tested this.
But then something inside PB is wrong.
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 10:34 am
by mk-soft
If a string contains UTF-16 surrogates, you have to treat it differently. That means completely new string functions (which are then slower).
PureBasic uses UCS-2, which is fine 99.99% of the time.
Code: Select all
Procedure IsUTF16String(String$)
Protected *String.Unicode = @String$
If *String
While *String\u
If *String\u > $D7FF And *String\u < $E000
ProcedureReturn #True
EndIf
*String + 2
Wend
EndIf
ProcedureReturn #False
EndProcedure
Procedure LenUTF16(String$)
Protected *Char.Unicode
Protected cnt
*Char.Unicode = @String$
If *Char
While *Char\u
If *Char\u > $D7FF And *Char\u < $E000
*Char + 4 ; surrogate pair: two 16-bit units form one character
Else
*Char + 2 ; BMP character: a single 16-bit unit
EndIf
cnt + 1
Wend
EndIf
ProcedureReturn cnt
EndProcedure
Procedure.s LeftUTF16(String$, Length)
Protected *Char.Unicode
Protected len, cnt
If Length < 1
ProcedureReturn ""
EndIf
*Char.Unicode = @String$
If *Char
While *Char\u
If cnt >= Length
Break
EndIf
If *Char\u > $D7FF And *Char\u < $E000
*Char + 4
len + 2
Else
*Char + 2
len + 1
EndIf
cnt + 1
Wend
EndIf
ProcedureReturn Left(String$, len)
EndProcedure
Procedure.s RightUTF16(String$, Length)
Protected *Char.Unicode, *Char2.Unicode, *String.Unicode
Protected len, cnt
If Length < 1
ProcedureReturn ""
EndIf
*String = @String$
If *String
*Char = *String + StringByteLength(String$) - 2
; Walk backwards, but never read in front of the string start
While *Char >= *String And cnt < Length
If *Char > *String And (*Char\u > $D7FF And *Char\u < $E000)
*Char - 4
len + 2
Else
*Char - 2
len + 1
EndIf
cnt + 1
Wend
EndIf
ProcedureReturn Right(String$, len)
EndProcedure
; ****
Define s1.s
s1 = "🅐A🅚K🅝"
Debug "Is '" + s1 + "' UTF16: " + IsUTF16String(s1)
Debug "StringByteLength = " + StringByteLength(s1)
Debug "Len = " + LenUTF16(s1)
Debug "Left = " + LeftUTF16(s1, 2)
Debug "Right = " + RightUTF16(s1, 2)
Re: How do I get the leftmost/rightmost character in a string?
Posted: Sat May 06, 2023 11:22 am
by idle
I will see if I can add Left, Mid and Right functions to the UTF-16 full case-folding module.
It should be doable in a single pass.
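As an illustration of the single-pass idea, here is a sketch of a Mid() counted in code points. It is not taken from the module mentioned above. MidCodePoints is a hypothetical name, and the sketch assumes that the built-in Mid() counts 16-bit units, as the outputs earlier in this thread suggest.

```purebasic
EnableExplicit

; Hypothetical helper: like Mid(), but Start and Length count code points,
; so a surrogate pair is treated as one character.
Procedure.s MidCodePoints(String$, Start, Length)
  Protected *Char.Unicode = @String$
  Protected *Low.Unicode
  Protected unitPos = 1      ; 1-based position in 16-bit units
  Protected cp = 1           ; 1-based position in code points
  Protected size, startUnit, unitCount

  If *Char = 0 Or Start < 1 Or Length < 1
    ProcedureReturn ""
  EndIf

  While *Char\u <> 0 And cp < Start + Length
    size = 1
    If *Char\u >= $D800 And *Char\u <= $DBFF        ; high surrogate
      *Low = *Char + 2
      If *Low\u >= $DC00 And *Low\u <= $DFFF        ; followed by low surrogate
        size = 2
      EndIf
    EndIf
    If cp = Start
      startUnit = unitPos    ; remember where the wanted range begins
    EndIf
    If cp >= Start
      unitCount + size       ; grow the range by this character's units
    EndIf
    unitPos + size
    *Char + size * 2
    cp + 1
  Wend

  If startUnit = 0           ; Start lies beyond the end of the string
    ProcedureReturn ""
  EndIf
  ProcedureReturn Mid(String$, startUnit, unitCount)
EndProcedure

Define example$ = "🅐A🅚K🅝"
Debug MidCodePoints(example$, 3, 2)   ; the 3rd and 4th code points: 🅚 and K
```

Left and Right variants could reuse the same loop, with Start fixed at 1 or derived from the total code point count.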