PB's Unicode string mode appears to be UTF-16 internally, so characters above $FFFF can be represented as surrogate pairs (two 16-bit code units).
For example, you can display the Unicode RUNNER character U+1F3C3 in your GUI with the pair $D83C $DFC3.
However, it's not easy to create these high characters in PB code.
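For reference, this is the standard UTF-16 encoding step as defined by the Unicode Standard. Subtract $10000, then split the remaining 20 bits into a high half and a low half. Applied to the RUNNER codepoint c = U+1F3C3, it reproduces exactly the pair quoted above:

```latex
\begin{aligned}
c' &= c - \mathrm{10000}_{16} = \mathrm{1F3C3}_{16} - \mathrm{10000}_{16} = \mathrm{F3C3}_{16} \\
\text{high} &= \mathrm{D800}_{16} + \left\lfloor c' / 2^{10} \right\rfloor = \mathrm{D800}_{16} + \mathrm{3C}_{16} = \mathrm{D83C}_{16} \\
\text{low} &= \mathrm{DC00}_{16} + \left( c' \bmod 2^{10} \right) = \mathrm{DC00}_{16} + \mathrm{3C3}_{16} = \mathrm{DFC3}_{16}
\end{aligned}
```

In PB terms, the floor division is `(c' >> 10) & $3FF` and the modulo is `c' & $3FF`.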
My request is:
1. Chr() accepts codepoints > $FFFF and generates the surrogate pair (a string with Len() = 2)
AND/OR
2. Chr() accepts surrogate values ($D800-$DFFF) passed as constants, without converting them to the REPLACEMENT CHARACTER ($FFFD)
Try the example below.
Methods 1-3 show three working ways to build a surrogate pair (PokeU, Chr() with variable inputs, DataSection).
Method 4 passes the full codepoint to Chr(), which truncates it to 16 bits and produces the wrong character.
Method 5 tries to build the surrogate pair with two Chr() calls on constants, but both are converted to $FFFD instead.
Method 6 shows a custom "ChrU" procedure which fixes cases 4 and 5!
Please consider this functionality for a future version of PB, especially now that Unicode mode is standard!

; A Unicode character > $FFFF
#Runner_Codepoint = $1F3C3
; Represented as UTF-16 surrogate pair
#Runner_HighSurrogate = $D83C
#Runner_LowSurrogate = $DFC3
CompilerIf Not #PB_Compiler_Unicode
CompilerError "Compile in Unicode mode"
CompilerEndIf
Debug "Use a debugger font like Segoe UI Symbol..."
Debug ""
; Method 1: Poke surrogate pair
Str$ = Space(2)
PokeU(@Str$, #Runner_HighSurrogate)
PokeU(@Str$ + 2, #Runner_LowSurrogate)
Debug Str$
; Method 2: Chr() with variables
hi = #Runner_HighSurrogate
lo = #Runner_LowSurrogate
Str$ = Chr(hi) + Chr(lo)
Debug Str$
; Method 3: Data Section
DataSection
UTF16_String:
Data.u #Runner_HighSurrogate, #Runner_LowSurrogate, #NUL
EndDataSection
Str$ = PeekS(?UTF16_String, -1, #PB_Unicode) ; same as #PB_UTF16
Debug Str$
; Method 4: Chr() with value > $FFFF - TRUNCATED TO 16-BIT
Str$ = Chr(#Runner_Codepoint)
Debug Str$
;ShowMemoryViewer(@Str$, StringByteLength(Str$) + 2)
; Method 5: Chr() with constants - CONVERTED TO $FFFD REPLACEMENT CHARS
Str$ = Chr(#Runner_HighSurrogate) + Chr(#Runner_LowSurrogate)
Debug Str$
;ShowMemoryViewer(@Str$, StringByteLength(Str$) + 2)
; Method 6: Modified Chr() which can output surrogate pairs
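Procedure.s ChrU(Codepoint.i)
  If (Codepoint > $10FFFF)
    ; Beyond the Unicode range: no valid UTF-16 encoding exists
    ProcedureReturn ""
  ElseIf (Codepoint > $FFFF)
    ; Reserve two UTF-16 code units so the terminating null is not overwritten
    Result.s = Space(2)
    Codepoint - $10000
    PokeU(@Result, $D800 + ((Codepoint >> 10) & $3FF)) ; high surrogate
    PokeU(@Result + 2, $DC00 + (Codepoint & $3FF))     ; low surrogate
    ProcedureReturn Result
  ElseIf (Codepoint >= $0000)
    ; BMP character; Codepoint is a variable here, so surrogate values pass through (as in Method 2)
    ProcedureReturn Chr(Codepoint)
  Else
    ProcedureReturn ""
  EndIf
EndProcedure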
Str$ = ChrU(#Runner_Codepoint)
Debug Str$
Str$ = ChrU(#Runner_HighSurrogate) + ChrU(#Runner_LowSurrogate)
Debug Str$