Re: Revised Chr() & Asc() for UTF-16 surrogate pairs

Posted: Mon Apr 01, 2024 1:10 pm
by mk-soft
With one PeekS ...

Code:

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected highlow.l
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    ; pack both surrogates into one long: high/lead in the lower 16 bits,
    ; low/tail in the upper 16 bits (little-endian order in memory)
    highlow = (v / $400 + $D800) | ((v % $400 + $DC00) << 16)
    ProcedureReturn PeekS(@highlow, 2, #PB_Unicode)
  EndIf
EndProcedure

Debug _Chr($1F600)  ; Smiley

Posted: Mon Apr 01, 2024 4:29 pm
by infratec
I think your last version is the slowest due to several calls to other procedures.

Your previous version is faster:

Code:

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  
  Protected r.s{2}, *p.Character
  
  
  *p = @r
  
  If v < $10000
    *p\c = v
;     *p + 2          ; not needed, since PB initializes everything with 0
;     *p\c = #Null
  Else  ; calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    *p\c = v / $400 + $D800 ; high/lead surrogate value
    *p + 2
    *p\c = v % $400 + $DC00 ; low/tail surrogate value
  EndIf
  
  ProcedureReturn r
  
EndProcedure


a$ = _Chr($1F600) + " Smiley"
Debug a$
a$ = _Chr($0040) + " At"
Debug a$

Posted: Mon Apr 01, 2024 4:41 pm
by mk-soft
I had the impression that pointers were not wanted ;)

Posted: Mon Apr 01, 2024 4:43 pm
by STARGÅTE
infratec wrote: Mon Apr 01, 2024 4:29 pm I think your last version is the slowest due to several calls to other procedures.
If we are talking about speed, then we should also avoid the division by $400:

Code:

Procedure.s _Chr(Unicode.i)
	
	Protected String.s{2}
	Protected *Long.Long = @String
	
	If Unicode < $10000
		*Long\l = Unicode
	Else
		Unicode - $10000
		*Long\l = (Unicode>>10) | (Unicode&$3FF)<<16 | $DC00D800
	EndIf
	
	ProcedureReturn String
  
EndProcedure


a$ = _Chr($1F600) + " Smiley"
Debug a$
a$ = _Chr($0040) + " At"
Debug a$
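
For readers less used to bit operations, here is a rough worked example of why the shift-and-mask line should give the same result as the division/modulo versions earlier in the thread. It is only a sketch: the variable names (cp, v, highDiv, lowDiv, highBit, lowBit, packed) are made up for illustration, and it assumes a code point in the range $10000 to $10FFFF on a little-endian machine.

```purebasic
; Worked example: split U+1F600 into a UTF-16 surrogate pair in two ways.
; Assumption: cp is in the range $10000..$10FFFF (no range check here).
Define cp.i = $1F600
Define v.i  = cp - $10000               ; 20-bit offset, here $F600

; Division/modulo form, as in the earlier posts
Define highDiv.i = v / $400 + $D800     ; $3D  + $D800
Define lowDiv.i  = v % $400 + $DC00     ; $200 + $DC00

; Shift/mask form: $400 = 2^10, so "/ $400" becomes ">> 10"
; and "% $400" becomes "& $3FF" for non-negative v.
; The lower 10 bits of $D800 and $DC00 are zero, so "|" acts like "+".
Define highBit.i = (v >> 10) | $D800
Define lowBit.i  = (v & $3FF) | $DC00

Debug Hex(highDiv) + " " + Hex(lowDiv)  ; D83D DE00
Debug Hex(highBit) + " " + Hex(lowBit)  ; D83D DE00

; Both 16-bit units in one long: the high surrogate sits in the lower word
; because that word comes first in memory on a little-endian CPU.
; $DC00D800 in the one-line version is simply $D800 | ($DC00 << 16).
Define packed.l = highBit | (lowBit << 16)
```

The single constant $DC00D800 in the procedure above just ORs both surrogate bases in at once, which is why the expression fits on one line.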

Posted: Mon Apr 01, 2024 6:30 pm
by infratec
Yep, that's true.

But ... for someone who is not comfortable with bit operations, it's no longer clear what is happening.

Posted: Wed Apr 17, 2024 9:15 pm
by idle
infratec wrote: Mon Apr 01, 2024 6:30 pm Yep, that's true.

But ... for someone who is not comfortable with bit operations, it's no longer clear what is happening.
They can watch and learn, or ask questions. Speed comes first.