The functions use macros to allow the seamless replacement of PureBasic's native functions.
Essentially, PureBasic limits the codepoint values accepted by Chr() to: 0 <= value <= $FFFF. My Chr() replacement returns a pair of UTF-16 surrogate codepoints for Unicode characters (codepoints) with values > $FFFF, up to $10FFFF. For values in the lower range it returns the same result as PureBasic's Chr().
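To make the arithmetic concrete, here is a small worked sketch of the split for a single value. The codepoint $1F600 and the variable names are only picked for illustration; the formula is the standard UTF-16 one.

```purebasic
; Worked example: split the codepoint $1F600 into its UTF-16 surrogate pair
Define cp = $1F600
Define offset = cp - $10000           ; $F600, the 20-bit value to encode
Define high = $D800 + (offset >> 10)  ; upper 10 bits -> $D83D (high/lead surrogate)
Define low = $DC00 + (offset & $3FF)  ; lower 10 bits -> $DE00 (low/tail surrogate)
Debug Hex(high) + " " + Hex(low)      ; D83D DE00
```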
PureBasic's Asc() only returns the value of a single codepoint, so it cannot decode the pair of surrogate codepoints needed for characters (codepoints) > $FFFF. My Asc() replacement checks whether its parameter starts with a matching pair of UTF-16 surrogate codepoints and, if so, returns the value encoded by them. Otherwise it returns the value of the first codepoint as is.
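Going the other way, a matching pair can be recombined like this. Again just a sketch with example values ($D83D and $DE00) and variable names of my own choosing:

```purebasic
; Worked example: recombine the surrogate pair $D83D $DE00 into one codepoint
Define high = $D83D, low = $DE00
Define cp = (high - $D800) * $400 + (low - $DC00) + $10000
Debug Hex(cp)                         ; 1F600
```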
Code:
;File Name: UTF-16 Chr() and Asc() functions.pbi
;Author: Demivec
;Created: 02/18/2016
;Updated: 02/23/2016
;Version: v01.01
;OS: All ;only tested on Windows
;Compiler: PureBasic v5.41 x64
;License: open and free to use and abuse; no guarantees
;Forum: http://www.purebasic.fr/english/viewtopic.php?f=12&t=64947
;Description: Replacements for PureBasic's Chr() and Asc() functions.
;  The replacements allow for proper handling of all values in the UTF-16 range.
;  Specifically Chr() now returns a surrogate pair of codepoints for values > $FFFF and
;  Asc() will return a value for the corresponding surrogate pair of codepoints.
;  This allows the full unicode codepoint range (0 to $10FFFF).

CompilerIf #PB_Compiler_Unicode = 0
  CompilerError "Requires compiling as unicode."
CompilerEndIf

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected high, low
  
  If v < $10000
    ProcedureReturn Chr(v) ;native Chr(), the macro below is not defined yet at this point
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    high = v / $400 + $D800 ;high/lead surrogate value
    low = v % $400 + $DC00 ;low/tail surrogate value
    ProcedureReturn Chr(high) + Chr(low)
  EndIf
EndProcedure

Macro Chr(v = 0)
  _Chr(v)
EndMacro

Procedure _Asc(u$) ;return a proper codepoint value for a UTF-16 surrogate pair
  Protected *u = @u$, high = PeekU(*u), low
  
  Select high
    Case 0 To $D7FF, $DC00 To $FFFF ;includes range for low surrogate value ($DC00 to $DFFF)
      ProcedureReturn high ;return value as is (may be an unmatched low surrogate value)
    Case $D800 To $DBFF
      low = PeekU(*u + SizeOf(Unicode))
      If low >= $DC00 And low <= $DFFF ;a mask test against $DC00 would also accept $FC00 to $FFFF
        ProcedureReturn (high - $D800) * $400 + (low - $DC00) + $10000 ;return decoded surrogate pair
      EndIf
      ProcedureReturn high ;an unmatched high surrogate value, return value as is
  EndSelect
EndProcedure

Macro Asc(u = "")
  _Asc(u)
EndMacro

CompilerIf #PB_Compiler_IsMainFile
  ;Sample range of values starting at the low end of the Unicode BMP (Basic Multilingual Plane)
  ;and moving through the high/low surrogate pairs and ending at the start of SMP (Supplementary Multilingual Plane).
  Define i, m$, d
  
  For i = $0 To $11000
    m$ = Chr(i)
    d = Asc(m$)
    Debug "$" + Hex(i) + "; Asc: " + Hex(d) + " Chr: " + m$
  Next
CompilerEndIf
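For use from another source file, something along these lines should work. This is only a sketch: it assumes the listing above has been saved in the same folder under the file name given in its header, and the value $1F600 is just an example.

```purebasic
; Assumption: the include file sits next to this source under the name from its header
XIncludeFile "UTF-16 Chr() and Asc() functions.pbi"

Define s$ = Chr($1F600)   ; value above $FFFF, stored as a surrogate pair
Debug Len(s$)             ; length of s$ as PureBasic counts it
Debug Hex(Asc(s$))        ; value decoded from the surrogate pair
Debug Hex(Asc("A"))       ; BMP character, handled the same as by the native Asc()
```

Because the demo loop in the include file is wrapped in CompilerIf #PB_Compiler_IsMainFile, it is skipped when the file is included this way.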
@Edit2: Added the full URL to this thread to the source code.