The8th
User
Posts: 29 Joined: Fri Sep 04, 2015 10:23 am
Post
by The8th » Sat May 06, 2023 7:37 am
I am simply trying to retrieve the rightmost/leftmost character of a string, but it does not work. How can I achieve this?
Code: Select all
EnableExplicit
Define example$ = "🅐A🅚K🅝"
Define i.b
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
  Debug Mid(example$, i, 1)
Next i
The output is:
οΏ½ (Seems to be $DD5D)
οΏ½ (Seems to be $D83C)
οΏ½
οΏ½
A
οΏ½
οΏ½
K
οΏ½
οΏ½
But it should be:
🅝
🅐
🅐
A
🅚
K
🅝
PB 6.01 LTS (x86)
Henry
Fred
Administrator
Posts: 18162 Joined: Fri May 17, 2002 4:39 pm
Location: France
Post
by Fred » Sat May 06, 2023 7:51 am
Maybe the debugger font doesn't have these characters. Did you try writing to a file to see if it's right?
BarryG
Addict
Posts: 4123 Joined: Thu Apr 18, 2019 8:17 am
Post
by BarryG » Sat May 06, 2023 7:58 am
The debugger window doesn't show them for me either, no matter which font I set it to. For quick reference: none of Arial, Consolas, Dina, or Courier New shows them. I also tried writing the output to a file, but the file doesn't show them either:
Code: Select all
EnableExplicit
Define example$ = "🅐A🅚K🅝"
Define i.b
If CreateFile(0, "d:\zzz.txt") ; only write if the file could be created
  WriteStringN(0, Right(example$, 1))
  WriteStringN(0, "")
  WriteStringN(0, Left(example$, 1))
  WriteStringN(0, "")
  For i = 1 To Len(example$)
    WriteStringN(0, Mid(example$, i, 1))
  Next i
  CloseFile(0)
EndIf
#NULL
Addict
Posts: 1497 Joined: Thu Aug 30, 2007 11:54 pm
Location: right here
Post
by #NULL » Sat May 06, 2023 8:40 am
Fred wrote: Sat Apr 08, 2023 12:02 pm
PB supports only UCS-2 Unicode, without surrogate support. This should be mentioned in the general string section, as basically every function that manipulates strings is affected.
Maybe those are surrogate characters? They seem to occupy 4 bytes.
Code: Select all
s1.s = "🅐A🅚K🅝"
s2.s = Left(s1, 1)
s3.s = Right(s1, 1)
ShowMemoryViewer(@s1, 16) ; 🅐 is '3C D8 50 DD', A is '41 00'
CallDebugger ; (click Debugger Continue)
ShowMemoryViewer(@s2, 4) ; shows '3C D8 00 00' i.e. only the first 2 bytes of 🅐
CallDebugger
ShowMemoryViewer(@s3, 4) ; shows '5D DD 00 00' i.e. only the last 2 bytes of 🅝
CallDebugger
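For reference, the two 16-bit values seen in the memory viewer can be combined into a single code point with the standard UTF-16 surrogate formula. A minimal sketch, assuming a helper named SurrogateToCodepoint() (a hypothetical name, not a PB built-in) and assuming the caller passes a valid high/low pair:

```purebasic
; Sketch only: SurrogateToCodepoint() is a hypothetical helper, not part of PB.
; High must be in $D800..$DBFF and Low in $DC00..$DFFF; no validation is done.
Procedure.i SurrogateToCodepoint(High.i, Low.i)
  ProcedureReturn $10000 + ((High - $D800) << 10) + (Low - $DC00)
EndProcedure

; '3C D8 50 DD' in memory is the little-endian pair $D83C, $DD50
Debug Hex(SurrogateToCodepoint($D83C, $DD50)) ; 1F150
```

So the first character of the example string is U+1F150, which lies outside the 16-bit range that UCS-2 can represent in one unit.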
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 8:41 am
Code: Select all
ShowMemoryViewer(@example$, StringByteLength(example$))
Shows:
3C D8 50 DD 41 00 3C D8 5A DD 4B 00 3C D8 5D DD <ØPÝA.<ØZÝK.<Ø]Ý
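The same information can be printed without the memory viewer by reading the string one 16-bit code unit at a time. A small sketch, assuming the string is built with Chr() from explicit code units (so the source file encoding does not matter); the variable names are only for illustration:

```purebasic
; Sketch: list the UTF-16 code units PB stores for a string.
Define s$, i
s$ = Chr($D83C) + Chr($DD50) + "A" ; surrogate pair for U+1F150, then "A"

Debug Len(s$) ; 3: Len() counts code units, not characters
For i = 0 To Len(s$) - 1
  ; PeekU() reads one unsigned 16-bit value
  Debug RSet(Hex(PeekU(@s$ + i * 2)), 4, "0") ; D83C, DD50, 0041
Next
```

This is why Left(), Right() and Mid() with a length of 1 return only half of such a character.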
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 8:44 am
But ....
Code: Select all
Define example$ = "🅐A🅚K🅝"
ShowMemoryViewer(@example$, StringByteLength(example$))
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
  Debug Mid(example$, i, 1)
Next i
*Buffer = UTF8(example$)
Debug PeekS(*Buffer, -1, #PB_UTF8)
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 8:52 am
Even more strange:
Code: Select all
Define example$ = "🅐A🅚K🅝"
Debug example$
ShowMemoryViewer(@example$, StringByteLength(example$))
Debug Right(example$, 1)
Debug ""
Debug Left(example$, 1)
Debug ""
For i = 1 To Len(example$)
  Debug Mid(example$, i, 1)
Next i
Debug "------"
*Buffer = UTF8(example$)
Converted$ = PeekS(*Buffer, -1, #PB_UTF8)
ShowMemoryViewer(*Buffer, MemorySize(*Buffer))
Debug Converted$
Debug PeekS(*Buffer, 1, #PB_UTF8)
Debug PeekS(*Buffer, 4, #PB_UTF8|#PB_ByteLength)
Debug Left(Converted$, 1)
*Buffer is displayed correctly, and so is Converted$, but parts of it are not.
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 9:01 am
As a result:
Code: Select all
Procedure.s LeftUnicode(String$, Len.i)
  Protected *Buffer, Result$
  *Buffer = UTF8(String$)
  If *Buffer
    Result$ = PeekS(*Buffer, Len, #PB_UTF8)
    FreeMemory(*Buffer)
  EndIf
  ProcedureReturn Result$
EndProcedure
Define example$ = "🅐A🅚K🅝"
Debug example$
Debug LeftUnicode(example$, 2)
But Right is much more difficult: ReverseString() does not work, and the size of each character in bytes is unknown.
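One way around the unknown character size is that UTF-8 marks continuation bytes: every byte of the form %10xxxxxx belongs to the preceding character, so character starts can be found by scanning backwards. A minimal sketch, assuming that UTF8() produces well-formed UTF-8 for the input (the posts above suggest it does for these characters) and using a hypothetical name RightUTF8():

```purebasic
; Sketch only: RightUTF8() is a hypothetical helper, not part of PB.
; Returns the last Count characters, counting UTF-8 lead bytes from the end.
Procedure.s RightUTF8(String$, Count)
  Protected *Buffer, *Ptr.Ascii, Result$, found
  If Count < 1
    ProcedureReturn ""
  EndIf
  *Buffer = UTF8(String$)
  If *Buffer
    ; Walk forward to the terminating null byte
    *Ptr = *Buffer
    While *Ptr\a
      *Ptr + 1
    Wend
    ; Walk backwards; a byte that is not %10xxxxxx starts a character
    While *Ptr > *Buffer And found < Count
      *Ptr - 1
      If (*Ptr\a & $C0) <> $80
        found + 1
      EndIf
    Wend
    Result$ = PeekS(*Ptr, -1, #PB_UTF8)
    FreeMemory(*Buffer)
  EndIf
  ProcedureReturn Result$
EndProcedure

Debug RightUTF8("h" + Chr($E9) + "llo", 4) ; the last four characters, starting at the accented e
```

The result is converted back into a normal PB string, so Len() on it still counts 16-bit units.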
The8th
User
Posts: 29 Joined: Fri Sep 04, 2015 10:23 am
Post
by The8th » Sat May 06, 2023 9:03 am
Thanks to all who have tried the example.
I fear there is no solution for these characters.
doesn't work either (should show an 🅐).
Trim functions fail with an error:
Code: Select all
Define example$ = "🅐A🅚K🅝"
Debug RTrim(example$ , "🅝")
Debug LTrim(example$ , "🅐")
Caution: the example above throws a runtime error!
RemoveString works:
Code: Select all
Define example$ = "🅐A🅚K🅝"
Debug RemoveString(example$ , "🅐")
Debug RemoveString(example$ , "🅚")
Debug RemoveString(example$ , "🅝")
Henry
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 9:22 am
Very, very ugly, but working:
Code: Select all
EnableExplicit
Procedure.s LeftUnicode(String$, Len.i)
  Protected *Buffer, Result$
  *Buffer = UTF8(String$)
  If *Buffer
    Result$ = PeekS(*Buffer, Len, #PB_UTF8)
    FreeMemory(*Buffer)
  EndIf
  ProcedureReturn Result$
EndProcedure
Procedure.s RightUnicode(String$, Len.i)
  Protected *Buffer, Result$, *Ptr, Count
  Protected NewList CharList$()
  *Buffer = UTF8(String$)
  If *Buffer
    ; Split the UTF-8 buffer into a list with one element per character
    *Ptr = *Buffer
    While PeekA(*Ptr) <> 0
      AddElement(CharList$())
      CharList$() = PeekS(*Ptr, 1, #PB_UTF8)
      *Ptr + StringByteLength(CharList$(), #PB_UTF8)
    Wend
    FreeMemory(*Buffer)
    Count = ListSize(CharList$())
    If Len >= Count
      Result$ = String$ ; the whole string was requested
    ElseIf Len > 0 And SelectElement(CharList$(), Count - Len - 1)
      ; Collect the elements after the selected one, i.e. the last Len characters
      While NextElement(CharList$())
        Result$ + CharList$()
      Wend
    EndIf
  EndIf
  ProcedureReturn Result$
EndProcedure
Define Test$
Define example$ = "🅐A🅚K🅝"
Debug example$
Test$ = LeftUnicode(example$, 2)
Debug Test$
Test$ = RightUnicode(example$, 2)
Debug Test$
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 9:31 am
PB uses UCS-2 (in general).
It looks like some functions, such as UTF8(), handle a bit more than that (by accident).
mk-soft
Always Here
Posts: 6204 Joined: Fri May 12, 2006 6:51 pm
Location: Germany
Post
by mk-soft » Sat May 06, 2023 10:16 am
The result is still a UTF16 string.
Code: Select all
Test$ = LeftUnicode(example$, 2)
Debug Test$
Debug Len(Test$)
Test$ = RightUnicode(example$, 2)
Debug Test$
Debug Len(Test$)
infratec
Always Here
Posts: 7577 Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany
Post
by infratec » Sat May 06, 2023 10:27 am
Strange. I had not tested this.
But then something inside PB is wrong.
mk-soft
Always Here
Posts: 6204 Joined: Fri May 12, 2006 6:51 pm
Location: Germany
Post
by mk-soft » Sat May 06, 2023 10:34 am
If a string turns out to be UTF-16, you have to treat it differently. That means completely new string functions (which are then slower).
PureBasic uses UCS-2, which is fine 99.99% of the time.
Code: Select all
; Returns #True if the string contains at least one surrogate code unit
Procedure IsUTF16String(String$)
  Protected *String.Unicode = @String$
  If *String
    While *String\u
      If *String\u > $D7FF And *String\u < $E000
        ProcedureReturn #True
      EndIf
      *String + 2
    Wend
  EndIf
  ProcedureReturn #False
EndProcedure
; Counts characters, treating each surrogate pair as one character
Procedure LenUTF16(String$)
  Protected *Char.Unicode
  Protected cnt
  *Char = @String$
  If *Char
    While *Char\u
      If *Char\u > $D7FF And *Char\u < $E000
        *Char + 4 ; surrogate pair: skip both code units
      Else
        *Char + 2
      EndIf
      cnt + 1
    Wend
  EndIf
  ProcedureReturn cnt
EndProcedure
; Like Left(), but Length counts characters instead of code units
Procedure.s LeftUTF16(String$, Length)
  Protected *Char.Unicode
  Protected len, cnt
  If Length < 1
    ProcedureReturn ""
  EndIf
  *Char = @String$
  If *Char
    While *Char\u
      If cnt >= Length
        Break
      EndIf
      If *Char\u > $D7FF And *Char\u < $E000
        *Char + 4
        len + 2
      Else
        *Char + 2
        len + 1
      EndIf
      cnt + 1
    Wend
  EndIf
  ProcedureReturn Left(String$, len)
EndProcedure
; Like Right(), but Length counts characters instead of code units
Procedure.s RightUTF16(String$, Length)
  Protected *Char.Unicode, *String.Unicode
  Protected len, cnt
  If Length < 1
    ProcedureReturn ""
  EndIf
  *String = @String$
  If *String
    *Char = *String + StringByteLength(String$) - 2 ; last code unit
    ; Walk backwards; stopping at *Char < *String also covers empty strings
    ; and lets the first character of the string be included
    While *Char >= *String And cnt < Length
      If *Char > *String And *Char\u > $D7FF And *Char\u < $E000
        *Char - 4
        len + 2
      Else
        *Char - 2
        len + 1
      EndIf
      cnt + 1
    Wend
  EndIf
  ProcedureReturn Right(String$, len)
EndProcedure
; ****
Define s1.s
s1 = "🅐A🅚K🅝"
Debug "Is '" + s1 + "' UTF16: " + IsUTF16String(s1)
Debug "StringByteLength = " + StringByteLength(s1)
Debug "Len = " + LenUTF16(s1)
Debug "Left = " + LeftUTF16(s1, 2)
Debug "Right = " + RightUTF16(s1, 2)
Last edited by
mk-soft on Sat May 06, 2023 11:43 am, edited 1 time in total.
idle
Always Here
Posts: 5836 Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand
Post
by idle » Sat May 06, 2023 11:22 am
I will see if I can add Left, Mid and Right functions to the UTF-16 full case-folding module.
It should be doable in a single pass.
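Until then, a Mid counterpart can be sketched along the same lines as the Left/Right functions above. This is only a rough sketch under stated assumptions: MidUTF16() and CodeUnitsAt() are hypothetical names (not part of PB or of any existing module), Start and Length count characters rather than 16-bit units, and a lone surrogate is treated as a single character.

```purebasic
; Sketch only: both procedures are hypothetical helpers, not PB built-ins.
; Returns 2 if *Char points at a high surrogate followed by a low surrogate, else 1.
Procedure.i CodeUnitsAt(*Char.Unicode)
  Protected *Low.Unicode = *Char + 2
  If *Char\u >= $D800 And *Char\u <= $DBFF
    If *Low\u >= $DC00 And *Low\u <= $DFFF
      ProcedureReturn 2
    EndIf
  EndIf
  ProcedureReturn 1
EndProcedure

Procedure.s MidUTF16(String$, Start, Length)
  Protected *Char.Unicode = @String$
  Protected pos = 1, first, cnt, n ; pos is the 1-based code-unit position
  If *Char = 0 Or Start < 1 Or Length < 1
    ProcedureReturn ""
  EndIf
  ; Skip the Start - 1 characters in front of the wanted range
  While *Char\u And cnt < Start - 1
    n = CodeUnitsAt(*Char)
    *Char + n * 2
    pos + n
    cnt + 1
  Wend
  first = pos
  cnt = 0
  ; Measure the wanted range in code units
  While *Char\u And cnt < Length
    n = CodeUnitsAt(*Char)
    *Char + n * 2
    pos + n
    cnt + 1
  Wend
  If pos = first
    ProcedureReturn "" ; Start lies beyond the end of the string
  EndIf
  ProcedureReturn Mid(String$, first, pos - first)
EndProcedure

Define t$
t$ = Chr($D83C) + Chr($DD50) + "A" + Chr($D83C) + Chr($DD5A) + "K"
Debug MidUTF16(t$, 2, 2) ; "A" followed by the character U+1F15A
```

Both loops walk the string from the front, so the cost grows with Start + Length; a real module would probably cache positions or do everything in one pass.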