
All times are UTC + 1 hour




 Post subject: Revised Chr() & Asc() for UTF-16 surrogate pairs
PostPosted: Fri Feb 19, 2016 4:51 am 
Joined: Mon Jul 25, 2005 3:51 pm
Posts: 3756
Location: Utah, USA
Here is a set of replacement functions for Chr() and Asc() to handle UTF-16 surrogate code points in unicode compilations.

The functions use macros to allow the seamless replacement of PureBasic's native functions.

Essentially, PureBasic's native Chr() only handles codepoint values in the range 0 <= value <= $FFFF. My Chr() replacement returns a pair of UTF-16 surrogate codepoints for values > $FFFF, which covers unicode characters (codepoints) as high as $10FFFF. For values in the lower range it returns the same result as PureBasic's native function.
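
As a rough illustration of the arithmetic involved (the codepoint $1F600 is just an example value, and the variable names are made up for this sketch), a value above $FFFF splits into a surrogate pair like this:

```purebasic
; Illustrative sketch only: split one codepoint above $FFFF into a UTF-16 surrogate pair.
Define cp = $1F600                  ; example codepoint (an assumption for this sketch)
Define offset = cp - $10000         ; 20-bit offset into the supplementary planes: $F600
Define high = offset / $400 + $D800 ; upper 10 bits -> high/lead surrogate
Define low  = offset % $400 + $DC00 ; lower 10 bits -> low/tail surrogate

Debug Hex(high) ; D83D
Debug Hex(low)  ; DE00
```

The high surrogate always lands in $D800 to $DBFF and the low surrogate in $DC00 to $DFFF, because each one carries exactly 10 bits of the offset.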

PureBasic's native Asc() only returns the value of a single codepoint; it cannot decode the pair of surrogate codepoints needed for characters (codepoints) > $FFFF. My Asc() replacement checks whether its parameter starts with a matching pair of UTF-16 surrogate codepoints and, if so, returns the value encoded by them.
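
The reverse direction can be sketched the same way. The values below are example inputs, and the $FC00 mask is one possible way to test the surrogate ranges (it keeps the top six bits, which are %110110 for a high surrogate and %110111 for a low one):

```purebasic
; Illustrative sketch only: decode a surrogate pair back into its codepoint.
Define high = $D83D, low = $DE00 ; example pair (an assumption for this sketch)

If high & $FC00 = $D800 And low & $FC00 = $DC00
  ; both halves are in range, so recombine the two 10-bit pieces
  Debug Hex((high - $D800) * $400 + (low - $DC00) + $10000) ; 1F600
Else
  Debug "not a matching surrogate pair"
EndIf
```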

Code:
;File Name: UTF-16 Chr() and Asc() functions.pbi
;Author: Demivec
;Created: 02/18/2016
;Updated: 02/23/2016
;Version: v01.01
;OS: All ;only tested on Windows
;Compiler: PureBasic v5.41 x64
;License: open and free to use and abuse; no guarantees
;Forum: http://www.purebasic.fr/english/viewtopic.php?f=12&t=64947
;Description: Replacements for PureBasic's Chr() and Asc() functions.
;  The replacements allow for proper handling of all values in the UTF-16 range.
;  Specifically Chr() now returns a surrogate pair of codepoints for values > $FFFF and
;  Asc() will return a value for the corresponding surrogate pair of codepoints.
;  This allows the full unicode codepoint range ($0 to $10FFFF).

CompilerIf #PB_Compiler_Unicode = 0
  CompilerError "Requires compiling as unicode."
CompilerEndIf

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected high, low
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    high = v / $400 + $D800 ;high/lead surrogate value
    low = v % $400 + $DC00 ;low/tail surrogate value
    ProcedureReturn Chr(high) + Chr(low)
  EndIf
EndProcedure

Macro Chr(v = 0)
  _Chr(v)
EndMacro


Procedure _Asc(u$)  ;return a proper codepoint value for a UTF-16 surrogate pair
  Protected *u = @u$, high = PeekU(*u), low
  Select high
    Case 0 To $D7FF, $DC00 To $FFFF ;includes range for low surrogate value ($DC00 to $DFFF)
      ProcedureReturn high             ;return value as is (may be an unmatched low surrogate value)
    Case $D800 To $DBFF
      low = PeekU(*u + SizeOf(Unicode))
      If low & $FC00 = $DC00 ;low >= $DC00 And low <= $DFFF (the $FC00 mask also rejects $FC00 to $FFFF)
        ProcedureReturn (high - $D800) * $400 + (low - $DC00) + $10000 ;return decoded surrogate pair
      EndIf
     
      ProcedureReturn high ;an unmatched high surrogate value, return value as is
  EndSelect
EndProcedure

Macro Asc(u = "")
  _Asc(u)
EndMacro

CompilerIf #PB_Compiler_IsMainFile
  ;Sample range of values starting at the low end of the Unicode BMP (Basic Multilingual Plane)
  ;and moving through the high/low surrogate pairs and ending just inside the SMP (Supplementary Multilingual Plane).
  Define i, m$, d
 
  For i = $0 To $11000
    m$ = Chr(i)
    d = Asc(m$)
    Debug  "$" + Hex(i) + "; Asc: " + Hex(d) + " Chr: " + m$
  Next
CompilerEndIf



@Edit: Made a change to increase speed of the Asc() function by 5%.
@Edit2: Added the full URL to this thread to the source code.



Last edited by Demivec on Mon Feb 27, 2017 8:37 am, edited 3 times in total.

PostPosted: Fri Feb 19, 2016 6:00 am 
Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3965
Location: Berlin, Germany
Many thanks, Demivec!

_________________
Please excuse my flawed English. My native language is PureBasic.
PostPosted: Fri Feb 19, 2016 10:40 am 
Joined: Fri Nov 09, 2012 11:04 pm
Posts: 1792
Location: Uttoxeter, UK
@Demivec,
Interesting; something new to learn.
The output seems a little odd, though:
The debug font seems to change to a mono font from time-to-time. At $3001 it appears to become a mono font with a reversion at $3022 and each increment of $65 thereafter!

_________________
DE AA EB


PostPosted: Fri Feb 19, 2016 1:59 pm 
Joined: Mon Jul 25, 2005 3:51 pm
Posts: 3756
Location: Utah, USA
davido wrote:
@Demivec,
Interesting; something new to learn.
The output seems a little odd, though:
The debug font seems to change to a mono font from time-to-time. At $3001 it appears to become a mono font with a reversion at $3022 and each increment of $65 thereafter!

If you are running Windows, it selects a different font whenever the one you are using doesn't have a glyph for the character you are trying to print. You will notice, though, that there are still many codepoints that don't have a visible glyph or are not included in any font yet.

PostPosted: Fri Feb 19, 2016 5:40 pm 
Joined: Fri Nov 09, 2012 11:04 pm
Posts: 1792
Location: Uttoxeter, UK
@Demivec,
I am running Windows 10.
Thank you very much for the explanation.

_________________
DE AA EB


PostPosted: Mon Apr 09, 2018 10:33 pm 
Joined: Wed Jan 10, 2018 2:33 pm
Posts: 4
Thank you very much for this code. I was trying to display emoji etc. on my Mac and was not able to understand the problem.

_________________
MacBookAir7,2 LowSierra x64 PB 5.62


PostPosted: Tue Apr 10, 2018 6:40 am 
Joined: Thu Jan 10, 2008 1:30 pm
Posts: 1320
Location: Germany, Glienicke
If you work with strings that include characters over $FFFF, you also need "new" functions for Len(), Mid() etc.
Here is my solution: http://www.purebasic.fr/german/viewtopic.php?p=340514#p340514
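
To illustrate why Len() needs a replacement, here is a minimal sketch of a codepoint counter. It is not the code from the linked post; the name LenCodePoints() is hypothetical, and it assumes a unicode compile as in the first post:

```purebasic
; Hypothetical helper for illustration: count codepoints instead of UTF-16 code units.
Procedure.i LenCodePoints(s$)
  Protected *p = @s$, unit, count
  If *p = 0 : ProcedureReturn 0 : EndIf ; guard against a null string pointer

  unit = PeekU(*p)
  While unit ; stop at the terminating null
    ; a high surrogate followed by a low surrogate counts as one character
    If unit & $FC00 = $D800 And PeekU(*p + SizeOf(Unicode)) & $FC00 = $DC00
      *p + SizeOf(Unicode) ; skip the low surrogate of the matched pair
    EndIf
    *p + SizeOf(Unicode)
    count + 1
    unit = PeekU(*p)
  Wend
  ProcedureReturn count
EndProcedure

Define s$ = "A" + Chr($D83D) + Chr($DE00) ; "A" followed by the surrogate pair for $1F600
Debug Len(s$)           ; 3 (UTF-16 code units)
Debug LenCodePoints(s$) ; 2 (codepoints)
```

Unmatched surrogates are counted as one character each, which mirrors how the Asc() replacement in the first post returns them as is.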

PostPosted: Tue Apr 10, 2018 6:52 am 
Joined: Sun Aug 08, 2004 5:21 am
Posts: 3706
Location: Netherlands
STARGÅTE wrote:
If you work with strings that include characters over $FFFF, you also need "new" functions for Len(), Mid() etc.
Here is my solution: http://www.purebasic.fr/german/viewtopic.php?p=340514#p340514

Very nice :)

For functions like Left(), Right(), Mid() and Len(), you could consider writing asm procedures to make things faster.

_________________
macOS 10.15 Catalina, Windows 10


PostPosted: Tue Apr 10, 2018 9:10 am 
Joined: Wed Jan 10, 2018 2:33 pm
Posts: 4
Very useful, it works. I have to say there are other problems that come from the system:
when it searches for a font to obtain the special glyph, my own font gets replaced, and I often have to call
"LoadFont" again to get it back.
These new large Unicode characters seem difficult to display.

_________________
MacBookAir7,2 LowSierra x64 PB 5.62

