In the PB editor, in the "Tools" menu, there is a useful option called "Character table", but it covers extended ASCII only.
Since newer versions of the compiler are Unicode-only, it would be interesting to implement a Unicode character table.

Demivec wrote: Sun Apr 25, 2021 4:33 pm
The unicode codepoints are quite extensive and also still in a state of change.
Perhaps a link to the symbols would be better.
http://www.unicode.org/charts/

It just doesn't work, there are way too many.
Sicro wrote: Sun Apr 25, 2021 5:59 pm
PureBasic uses UCS-2 (2 bytes per character) and is therefore limited to the character codes from 0 to 65,535 (see PB help).
But even with this limitation the filling of the list takes some seconds (I tested it with the source code of the PureBasic IDE).
Maybe it would be better if not all characters are displayed at once. At the top of the window we could place several buttons with different character ranges or take a ComboBoxGadget for it, with which we could switch the displayed character ranges in the list.

@Sicro: PureBasic says it uses UCS-2 internally but I think that is a bit fiddly. I think the truth is that all of its string functions like Mid(), LSet() and so on simply operate on codepoints as if they were all two bytes long. Many functions that utilize strings actually make use of UTF-16. UTF-16 allows all of the Unicode codepoints to be written using either two or four bytes with a surrogate mechanism.
Code:
Procedure handleError(value, text.s)
  ; Show the message and quit if value is zero (i.e. the call failed).
  If Not value
    MessageRequester("Error", text)
    End
  EndIf
EndProcedure

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected high, low
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    high = v / $400 + $D800 ;high/lead surrogate value
    low = v % $400 + $DC00 ;low/tail surrogate value
    ProcedureReturn Chr(high) + Chr(low)
  EndIf
EndProcedure

#imageWidth = 310
#imageHeight = 310

handleError(LoadFont(0, "Courier", 200), "Can't load font.")
handleError(CreateImage(0, #imageWidth, #imageHeight), "Can't create image.")
If StartDrawing(ImageOutput(0))
  DrawingFont(FontID(0))
  a$ = _Chr($1F600)
  DrawText(0, 0, a$)
  StopDrawing()
EndIf
handleError(OpenWindow(0, 0, 0, #imageWidth, #imageHeight + 20, a$ + "Unicode Test" + a$), "Can't open window.")
ImageGadget(0, 0, 0, 0, 0, ImageID(0))
TextGadget(1, 5, #imageHeight, #imageWidth, 20, ReplaceString(Space(25), " ", a$))
Repeat: Until WaitWindowEvent() = #PB_Event_CloseWindow
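
For reference, Sicro's suggestion quoted above (switching the displayed character range with a ComboBoxGadget instead of filling the list with everything at once) could look roughly like the sketch below. It is only an illustration, not the IDE's actual code: the CharRange structure, the AddRange() and FillList() procedures and the four BMP ranges are hypothetical names and picks made for this example.

```purebasic
; Hypothetical sketch: fill the list with one character range at a time.
; The structure, procedures and chosen ranges are assumptions for illustration.
EnableExplicit

Enumeration
  #Window
  #Combo
  #List
EndEnumeration

Structure CharRange
  name.s
  firstCode.i
  lastCode.i
EndStructure

Global NewList Ranges.CharRange()

Procedure AddRange(name.s, firstCode.i, lastCode.i)
  AddElement(Ranges())
  Ranges()\name      = name
  Ranges()\firstCode = firstCode
  Ranges()\lastCode  = lastCode
EndProcedure

Procedure FillList(index.i)
  ; Rebuild the list for the range at the given combo box index.
  Protected c.i
  If index >= 0 And SelectElement(Ranges(), index)
    ClearGadgetItems(#List)
    For c = Ranges()\firstCode To Ranges()\lastCode
      ; Chr(10) separates the two ListIconGadget columns.
      AddGadgetItem(#List, -1, "U+" + RSet(Hex(c), 4, "0") + Chr(10) + Chr(c))
    Next
  EndIf
EndProcedure

AddRange("Basic Latin", $0020, $007E)
AddRange("Latin-1 Supplement", $00A0, $00FF)
AddRange("Greek and Coptic", $0370, $03FF)
AddRange("Cyrillic", $0400, $04FF)

Define event.i
If OpenWindow(#Window, 0, 0, 320, 400, "Character ranges", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ComboBoxGadget(#Combo, 5, 5, 310, 25)
  ListIconGadget(#List, 5, 35, 310, 360, "Code", 100)
  AddGadgetColumn(#List, 1, "Char", 80)
  ForEach Ranges()
    AddGadgetItem(#Combo, -1, Ranges()\name)
  Next
  SetGadgetState(#Combo, 0)
  FillList(0)
  Repeat
    event = WaitWindowEvent()
    If event = #PB_Event_Gadget And EventGadget() = #Combo And EventType() = #PB_EventType_Change
      FillList(GetGadgetState(#Combo))
    EndIf
  Until event = #PB_Event_CloseWindow
EndIf
```

Since only a few hundred items are added per selection, the multi-second fill time mentioned for the full 0 to 65,535 list should not come up, at the price of the user having to pick a range first.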
Demivec wrote: Thu Apr 29, 2021 2:44 am
@Sicro: PureBasic says it uses UCS-2 internally but I think that is a bit fiddly. I think the truth is that all of its string functions like Mid(), LSet() and so on simply operate on codepoints as if they were all two bytes long. Many functions that utilize strings actually make use of UTF-16. UTF-16 allows all of the Unicode codepoints to be written using either two or four bytes with a surrogate mechanism.
Here is a demonstration:
[...]

Yes, the functions that display or draw strings interpret the UCS-2 string as UTF-16 (which is an extension of UCS-2). But it is actually the OS API functions that do that, not the PB functions.
Code:
Debug Len("😀") ; a string holding one character from outside the BMP
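
To make the difference between 16-bit units and actual characters concrete, here is a minimal sketch that counts code points by treating a lead/trail surrogate pair as one character. CodePointCount() is a hypothetical helper written for this example, not a PureBasic built-in, and it assumes that Len(), Mid() and Asc() work on 16-bit units as described above.

```purebasic
; Hypothetical helper (not a PureBasic built-in): count Unicode code points
; by treating a lead + trail surrogate pair as a single character.
Procedure.i CodePointCount(text.s)
  Protected i.i, unit.i, nextUnit.i, count.i
  Protected length.i = Len(text) ; number of 16-bit units
  i = 1
  While i <= length
    unit = Asc(Mid(text, i, 1))
    If unit >= $D800 And unit <= $DBFF And i < length
      nextUnit = Asc(Mid(text, i + 1, 1))
      If nextUnit >= $DC00 And nextUnit <= $DFFF
        i + 1 ; skip the trail surrogate, it belongs to the same code point
      EndIf
    EndIf
    count + 1
    i + 1
  Wend
  ProcedureReturn count
EndProcedure

Define smiley.s = Chr($D83D) + Chr($DE00) ; U+1F600 written as a surrogate pair
Debug Len(smiley)            ; counts 16-bit units
Debug CodePointCount(smiley) ; counts code points
```

If the string functions really do operate on 16-bit units, the first Debug line should show 2 and the second 1 for the same single smiley.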
Demivec wrote: Thu Apr 29, 2021 2:44 am
Also, as stated earlier the codepoint definitions are still in a process of change. UCS-2 is updated to keep it synchronized to changes in the BMP (Basic Multilingual Plane) of unicode.

Ok, then I also think it would be better if a link to an always-up-to-date web page is inserted at the bottom of the characters table window.
Demivec wrote: Thu Apr 29, 2021 2:44 am
Note: I verified that the forum update now allows Unicode characters outside the BMP to be posted in messages. The Smiley emoticon in this message is the test case. Here are a few more 🀁🀂🀃🀢🀣🀤🀥🀦🀧🀨🀩(mahjong tiles).

That's cool. Probably the old forum did not use
Code:
<meta charset="utf-8">