Sicro wrote: Sun Apr 25, 2021 5:59 pm
PureBasic uses UCS-2 (2 bytes per character) and is therefore limited to the character codes from 0 to 65,535 (see PB help).
But even with this limitation the filling of the list takes some seconds (I tested it with the source code of the PureBasic IDE).
Maybe it would be better if not all characters are displayed at once. At the top of the window we could place several buttons with different character ranges or take a ComboBoxGadget for it, with which we could switch the displayed character ranges in the list.
@Sicro: PureBasic says it uses UCS-2 internally, but I think that is a bit of a simplification. I think the truth is that all of its string functions, like Mid(), LSet() and so on, simply operate on 16-bit units as if every codepoint were two bytes long. Many functions that take strings actually treat them as UTF-16. UTF-16 can encode every Unicode codepoint in either two bytes or, by way of a surrogate pair, four bytes.
Here is a demonstration:
;End the program with a message if a required resource could not be created
Procedure handleError(value, text.s)
  If Not value
    MessageRequester("Error", text)
    End
  EndIf
EndProcedure

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected high, low
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    high = v / $400 + $D800 ;high/lead surrogate value
    low = v % $400 + $DC00 ;low/tail surrogate value
    ProcedureReturn Chr(high) + Chr(low)
  EndIf
EndProcedure

#imageWidth = 310
#imageHeight = 310

handleError(LoadFont(0, "Courier", 200), "Can't load font.")
handleError(CreateImage(0, #imageWidth, #imageHeight), "Can't create image.")

a$ = _Chr($1F600) ;U+1F600, a smiling emoji outside the BMP
Debug a$ ;show the emoji in the debug window as well

If StartDrawing(ImageOutput(0))
  DrawingFont(FontID(0))
  DrawText(0, 0, a$)
  StopDrawing()
EndIf

handleError(OpenWindow(0, 0, 0, #imageWidth, #imageHeight + 20, a$ + "Unicode Test" + a$), "Can't open window.")
ImageGadget(0, 0, 0, 0, 0, ImageID(0))
TextGadget(1, 5, #imageHeight, #imageWidth, 20, ReplaceString(Space(25), " ", a$)) ;a row of 25 emoji

Repeat: Until WaitWindowEvent() = #PB_Event_CloseWindow
- If you see a smiling emoji "😀" in the image after running the above code, you can see that UTF-16 is being used by the DrawText() function and not UCS-2.
- If you see a line of smiling emoji in the TextGadget then you can see that UTF-16 is being used by the TextGadget() and not UCS-2.
- If you see a smiling emoji at the beginning and end of the Window's title then you can see that UTF-16 is being used by the OpenWindow() function and not UCS-2.
- If you see a smiling emoji in the debug window while debugging, then you can see that the Debug command is using UTF-16 and that the font you are using in the Debug window also has a character for that codepoint.
When I run the code in Windows 10 with PureBasic v5.73 LTS x64 I see smileys in all of the above areas.
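The point about the string functions working on two-byte units can be checked separately from the drawing code. What follows is only a minimal sketch, assuming a Unicode build of PureBasic; the variable names (cp, v, high, low, s, decoded) are made up for the illustration. It builds the surrogate pair for U+1F600 by hand, looks at the string with Len(), Mid() and Asc(), and then reverses the arithmetic to get the codepoint back.

```purebasic
; Sketch only: inspect a hand-built surrogate pair with ordinary string functions.
; Assumes a Unicode build of PureBasic; all variable names are illustrative.
EnableExplicit

Define cp = $1F600                   ; codepoint outside the BMP
Define v = cp - $10000               ; 20-bit offset into the supplementary planes
Define high = v / $400 + $D800       ; lead surrogate  ($D83D for this codepoint)
Define low = v % $400 + $DC00        ; trail surrogate ($DE00 for this codepoint)
Define s.s = Chr(high) + Chr(low)

Debug Len(s)                         ; length in 16-bit units, not in codepoints
Debug Hex(Asc(Mid(s, 1, 1)))         ; value of the first 16-bit unit
Debug Hex(Asc(Mid(s, 2, 1)))         ; value of the second 16-bit unit

; Reverse the calculation to recover the original codepoint from the pair
Define decoded = (high - $D800) * $400 + (low - $DC00) + $10000
Debug Hex(decoded)
```

If Len() reports two units for what is displayed as a single character, that fits the idea that the string functions count 16-bit units while the display side interprets the same data as UTF-16.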
As far as a chart of Unicode or even only UCS-2 codepoints (and characters) goes, the number is very large and it wouldn't really make much sense to put that much info in picture form into the Help file. Also, as stated earlier, the codepoint definitions are still in a process of change. UCS-2 is updated to keep it synchronized with changes in the BMP (Basic Multilingual Plane) of Unicode. You'll notice that the chart that Saki linked to has many visible characters with a description of '(unknown)', which shows that the chart is not up to date; the website it was posted on was last updated in 2011 (by my guess). One example is codepoint 0220 ('Ƞ'). Codepoint 0220 has a description of 'LATIN CAPITAL LETTER N WITH LONG RIGHT LEG' in the Unicode charts available from the link I posted.
I don't think buttons would work very well to select portions of the codepoint range to display simply because it is such a large range.
Note: I verified that the forum update now allows Unicode characters outside the BMP to be posted in messages. The Smiley emoticon in this message is the test case. Here are a few more: 🀁🀂🀃🀢🀣🀤🀥🀦🀧🀨🀩 (mahjong tiles).