Symbols appear corrupted

marcoagpinto
Addict
Posts: 945
Joined: Sun Mar 10, 2013 3:01 pm
Location: Portugal

Symbols appear corrupted

Post by marcoagpinto »

Hello!

I am using emojis. They display correctly when I paste them into the IDE, but when the program displays them at run time I get corrupted results:
http://pastebin.com/WxBZ8WF5

Arial font.
Fred
Administrator
Posts: 16664
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by Fred »

It's probably the Arial font, which doesn't support these symbols?

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by marcoagpinto »

Fred wrote:It's probably the Arial font, which doesn't support these symbols?
I could swear that I had Arial font on when I inserted the symbols in LibreOffice.
Demivec
Addict
Posts: 4089
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by Demivec »

marcoagpinto wrote:Hello!

I am using emojis. They display correctly when I paste them into the IDE, but when the program displays them at run time I get corrupted results:
http://pastebin.com/WxBZ8WF5

Arial font.
It should be the same problem as you talked about in this thread: http://www.purebasic.fr/english/viewtopic.php?f=13&t=67687&p=501763#p501763
Keep in mind that each of these emoji codepoints takes 4 bytes (a surrogate pair of two 16-bit code units) in a UTF-16 encoded string, the kind PureBasic uses, though it can also be stored as UTF-8 (like the source code). The problem usually occurs when either a literal string or a value composed with Chr() is used in the source code, as these may not be properly encoded for UTF-16. These problems can be avoided by using a buffer to hold the properly encoded value.

My example code in your other thread demonstrates another way to solve this problem for these high codepoint values in Unicode: a custom replacement for Chr().
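
One possible reading of the buffer idea is sketched below. It is not taken from the other thread: the procedure name CodepointToString is made up for illustration, and the code assumes a Unicode build (always the case from PB 5.50 on, a compiler option in 5.44) so that strings are UTF-16 internally.

```purebasic
; Sketch only: write the UTF-16 code units into a memory buffer, then read them back as a string.
; CodepointToString is a hypothetical name; assumes a Unicode (UTF-16) build.
Procedure.s CodepointToString(cp.i)
  Protected result.s
  ; Room for two 16-bit code units plus a terminating null (AllocateMemory zero-fills by default)
  Protected *buf = AllocateMemory(3 * SizeOf(Unicode))
  If *buf = 0
    ProcedureReturn "" ;allocation failed
  EndIf
  If cp < $10000
    PokeU(*buf, cp) ;inside the BMP, one code unit is enough
  Else
    cp - $10000
    PokeU(*buf, $D800 + (cp >> 10))                     ;high/lead surrogate
    PokeU(*buf + SizeOf(Unicode), $DC00 + (cp & $3FF))  ;low/tail surrogate
  EndIf
  result = PeekS(*buf, -1, #PB_Unicode) ;read up to the terminating null
  FreeMemory(*buf)
  ProcedureReturn result
EndProcedure

Debug CodepointToString(128299)
```

Whether the symbol actually shows up still depends on the font used for display, so a correctly encoded string can still appear as a box or question mark.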

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by marcoagpinto »

Demivec wrote:
marcoagpinto wrote:Hello!

I am using emojis. They display correctly when I paste them into the IDE, but when the program displays them at run time I get corrupted results:
http://pastebin.com/WxBZ8WF5

Arial font.
It should be the same problem as you talked about in this thread: http://www.purebasic.fr/english/viewtopic.php?f=13&t=67687&p=501763#p501763
Keep in mind that each of these emoji codepoints takes 4 bytes (a surrogate pair of two 16-bit code units) in a UTF-16 encoded string, the kind PureBasic uses, though it can also be stored as UTF-8 (like the source code). The problem usually occurs when either a literal string or a value composed with Chr() is used in the source code, as these may not be properly encoded for UTF-16. These problems can be avoided by using a buffer to hold the properly encoded value.

My example code in your other thread demonstrates another way to solve this problem for these high codepoint values in Unicode: a custom replacement for Chr().
My friend,

Could you explain to me how to convert the symbols to ASCII so that I can use your function?

Asc(symbol) gave me the same value for both emojis.

Thank you,
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by kenmo »

1. I use "Segoe UI Symbol" for the Debugger, because it supports (most?) emoji and other symbols.
Tested "Arial" font - did not show symbols.
Back to "Segoe UI Symbol" - output changed to symbols, without even re-running the test program.

2. Asc() gave you the same value, because Asc() only returns a 16-bit value... so you're only getting part of any codepoint over $FFFF (the "high surrogate" half).

From http://www.purebasic.fr/english/viewtop ... 12&t=64947

Code:

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected high, low
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    high = v / $400 + $D800 ;high/lead surrogate value
    low = v % $400 + $DC00 ;low/tail surrogate value
    ProcedureReturn Chr(high) + Chr(low)
  EndIf
EndProcedure

Procedure _Asc(u$)  ;return a proper codepoint value for a UTF-16 surrogate pair
  Protected *u = @u$, high = PeekU(*u), low
  Select high
    Case 0 To $D7FF, $DC00 To $FFFF ;includes range for low surrogate value ($DC00 to $DFFF)
      ProcedureReturn high             ;return value as is (may be an unmatched low surrogate value)
    Case $D800 To $DBFF
      low = PeekU(*u + SizeOf(Unicode))
      If low >= $DC00 And low <= $DFFF ;accept only a valid low/tail surrogate (the old mask test also let $FC00 to $FFFF through)
        ProcedureReturn (high - $D800) * $400 + (low - $DC00) + $10000 ;return decoded surrogate pair
      EndIf
     
      ProcedureReturn high ;an unmatched high surrogate value, return value as is
  EndSelect
EndProcedure

Text.s = _Chr(128299)
Debug Text
Debug Asc(Text)
Debug _Asc(Text)

Text.s = _Chr(128294)
Debug Text
Debug Asc(Text)
Debug _Asc(Text)
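
To see why Asc() gave the same value for both emojis, the arithmetic from _Chr() can be worked through by hand for the two codepoints in the test above. This is only an illustrative sketch, and the variable names are made up for the example.

```purebasic
; Worked surrogate-pair arithmetic for the two test codepoints (illustrative variable names)
Define cp.i, v.i, high.i, low.i

cp = 128299              ; $1F52B
v = cp - $10000          ; $F52B
high = v / $400 + $D800  ; $3D + $D800 = $D83D
low = v % $400 + $DC00   ; $12B + $DC00 = $DD2B
Debug "$" + Hex(high) + " $" + Hex(low)

cp = 128294              ; $1F526
v = cp - $10000          ; $F526
high = v / $400 + $D800  ; $3D + $D800 = $D83D, the same high surrogate as above
low = v % $400 + $DC00   ; $126 + $DC00 = $DD26
Debug "$" + Hex(high) + " $" + Hex(low)
```

Both pairs share the high surrogate $D83D (55357), which is the part Asc() reports. Only the low surrogate differs.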

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by marcoagpinto »

Thank you, my friend!
Keya
Addict
Posts: 1891
Joined: Thu Jun 04, 2015 7:10 am

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by Keya »

Interesting and eye-opening thread, thanks to all posters. I've had no problems getting them to display in Win7-64 (tested both PB x86 and x64), but I've had no luck getting them to display in XP-32 (including using the exact same Segoe UI .ttf file from Win7). Not sure if that's just a limitation of XP or of a 32-bit OS (I haven't tried any other 32-bit Windows), or maybe I'm just doing something wrong, or maybe my XP VM sucks. Actually I know my XP VM sucks, but I trust it more than my Win10 VM, which sucks more.

Re: [PB5.44][PB5.60b6] Symbols appear corrupted

Post by Demivec »

Keya wrote:Interesting and eye-opening thread, thanks to all posters. I've had no problems getting them to display in Win7-64 (tested both PB x86 and x64), but I've had no luck getting them to display in XP-32 (including using the exact same Segoe UI .ttf file from Win7). Not sure if that's just a limitation of XP or of a 32-bit OS (I haven't tried any other 32-bit Windows), or maybe I'm just doing something wrong, or maybe my XP VM sucks. Actually I know my XP VM sucks, but I trust it more than my Win10 VM, which sucks more.
Maybe XP has an older version of the Segoe UI font, from before the emoji codepoints were added to Unicode. They were added in version 6.0 of the Unicode Standard, released in October 2010. You would have to install a later version of the font on XP to see the characters.

@Edit: Added details on the codepoints in question.