Revised Chr() & Asc() for UTF-16 surrogate pairs

Post by mk-soft »

With one PeekS ...

Code: Select all

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected highlow.l
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    ; pack both surrogates into one 32-bit value: high/lead in the low word, low/tail in the high word (little-endian order in memory)
    highlow = (v / $400 + $D800) | (v % $400 + $DC00) << 16
    ProcedureReturn PeekS(@highlow, 2, #PB_Unicode)
  EndIf
EndProcedure

Debug _Chr($1F600)  ; Smiley

Post by infratec »

I think your last version is the slowest due to several calls to other procedures.

Your previous version is faster:

Code: Select all

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  
  Protected r.s{2}, *p.Character
  
  
  *p = @r
  
  If v < $10000
    *p\c = v
;     *p + 2          ; not needed, since PB initializes everything with 0
;     *p\c = #Null
  Else  ; calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    *p\c = v / $400 + $D800 ; high/lead surrogate value
    *p + 2
    *p\c = v % $400 + $DC00 ; low/tail surrogate value
  EndIf
  
  ProcedureReturn r
  
EndProcedure


a$ = _Chr($1F600) + " Smiley"
Debug a$
a$ = _Chr($0040) + " At"
Debug a$

Post by mk-soft »

I had the impression that pointers were not wanted ;)

Post by STARGÅTE »

infratec wrote: Mon Apr 01, 2024 4:29 pm
I think your last version is the slowest due to several calls to other procedures.

If we are talking about speed, then we should also avoid the division by $400:

Code: Select all

Procedure.s _Chr(Unicode.i)
	
	Protected String.s{2}
	Protected *Long.Long = @String
	
	If Unicode < $10000
		*Long\l = Unicode
	Else
		Unicode - $10000
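		; low word: lead surrogate ($D800 | top 10 bits), high word: tail surrogate ($DC00 | bottom 10 bits)
		*Long\l = (Unicode>>10) | (Unicode&$3FF)<<16 | $DC00D800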
	EndIf
	
	ProcedureReturn String
  
EndProcedure


a$ = _Chr($1F600) + " Smiley"
Debug a$
a$ = _Chr($0040) + " At"
Debug a$

Post by infratec »

Yep, that's true.

But ... for someone who is not comfortable with bit operations, it's no longer clear what happens.
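
For anyone following along, the shift-and-mask line can be unpacked step by step. This is only an illustrative sketch, not code from the posts above; the variable names offset, high and low are made up for the example. Since $400 is 2^10, offset >> 10 gives the same result as offset / $400, and offset & $3FF the same as offset % $400, for the non-negative values used here. And because $D800 and $DC00 have their low 10 bits clear, OR-ing them in is the same as adding them.

```purebasic
; Illustrative sketch: split U+1F600 into its UTF-16 surrogate pair step by step.
; The names offset, high and low are hypothetical, chosen for this example only.
Define offset.i = $1F600 - $10000        ; 20-bit offset above the BMP ($F600)
Define high.i   = $D800 | (offset >> 10)  ; top 10 bits    -> high/lead surrogate
Define low.i    = $DC00 | (offset & $3FF) ; bottom 10 bits -> low/tail surrogate

Debug Hex(high) ; D83D
Debug Hex(low)  ; DE00
```

The single-line version packs these two values into one 32-bit write, with the lead surrogate in the low word so that it comes first in memory on a little-endian CPU.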

Post by idle »

infratec wrote: Mon Apr 01, 2024 6:30 pm
Yep, that's true.
But ... for someone who is not comfortable with bit operations, it's no longer clear what happens.

They can watch and learn, or ask questions. Speed first.