Code:
;16 characters (grapheme clusters)
text$="Приве́т नमस्ते שָׁלוֹם"
Debug Len(text$) ;22 codepoints
Debug StringByteLength(text$) ;44 bytes
Debug StringByteLength(text$, #PB_UTF8) ;48 bytes
text$=text$+#CRLF$+UCase(text$)+#CRLF$+LCase(text$)
MessageRequester("Test", text$)

To my eyes it looks like 16 characters in total (14 symbols and 2 spaces).
As stated on http://utf8everywhere.org/, that string actually consists of 22 code points but only 16 grapheme clusters.
In the latest Firefox this seems to be true: the cursor moves 16 places.
For cursor movement, text selection and the like, grapheme clusters should be used.
But in the PB IDE (v5.31) the cursor moves 22 places.
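
To show where the difference between 22 and 16 comes from, here is a rough sketch that counts only the code points that are not combining marks. IsCombiningMark() and CountApproxClusters() are names I made up for this example, not PB built-ins, and the ranges cover only the marks that occur in this test string. It is not a real grapheme cluster implementation (that would need the rules from Unicode UAX #29), and it assumes a Unicode executable and a UTF-8 source file.

```purebasic
; Sketch only: hypothetical helpers, not part of PureBasic.
Procedure.i IsCombiningMark(c.i)
  ; Assumption: only the mark ranges used by the test string are covered.
  If c >= $0300 And c <= $036F : ProcedureReturn #True : EndIf ; combining diacritical marks
  If c >= $0591 And c <= $05BD : ProcedureReturn #True : EndIf ; Hebrew accents and points
  If c = $05C1 Or c = $05C2 : ProcedureReturn #True : EndIf    ; Hebrew shin/sin dot
  If c >= $093E And c <= $094D : ProcedureReturn #True : EndIf ; Devanagari vowel signs and virama
  ProcedureReturn #False
EndProcedure

Procedure.i CountApproxClusters(s$)
  Protected i.i, count.i
  For i = 1 To Len(s$)
    ; A code point that is not a mark starts a new (approximate) cluster.
    If IsCombiningMark(Asc(Mid(s$, i, 1))) = #False
      count + 1
    EndIf
  Next
  ProcedureReturn count
EndProcedure

text$ = "Приве́т नमस्ते שָׁלוֹם"
Debug Len(text$)                 ; code points (all in the BMP here)
Debug CountApproxClusters(text$) ; approximate grapheme clusters
```

If the literal really holds the 22 code points from the first example, the second Debug line should match the 16 places that Firefox moves the cursor. Other strings (emoji, Hangul, other scripts) will need more than these few ranges.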
The first word seems to be correctly upper- and lowercased in the requester, while the other two words seem to be ignored. This is probably an issue with the locales installed on the system.
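
To check this without relying on how the requester renders the text, a small sketch like the following reports per word whether UCase() or LCase() changed anything. IsUnchangedByCase() is a hypothetical helper name, and splitting on a single space is an assumption that only fits this test string.

```purebasic
; Sketch only: IsUnchangedByCase() is a made-up helper, not a PB built-in.
Procedure.i IsUnchangedByCase(word$)
  ; #True when neither UCase() nor LCase() alters the word.
  If UCase(word$) = word$ And LCase(word$) = word$
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

text$ = "Приве́т नमस्ते שָׁלוֹם"
For i = 1 To CountString(text$, " ") + 1
  word$ = StringField(text$, i, " ")
  If IsUnchangedByCase(word$)
    Debug word$ + ": unchanged by UCase/LCase"
  Else
    Debug word$ + ": case conversion applied"
  EndIf
Next
```

Comparing the Debug output on different machines would show whether the result really depends on the system.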
On your system this may end up behaving differently.
It would be interesting to hear about other Unicode oddities that people find.


