PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:55 am
by marcoagpinto
Heya,

Since PB 6.20 accepts loading text in ASCII, UTF-8, and UTF-16, does it mean that I can now display emojis above chr 65535 in UTF-8?

Thanks!

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 9:01 am
by jacdelad
marcoagpinto wrote: Sun Feb 16, 2025 8:55 am Heya,

Since PB 6.20 accepts loading text in ASCII, UTF-8, and UTF-16,[...]
Hi,
where did you get this information? I can't find it.

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 9:14 am
by marcoagpinto
jacdelad wrote: Sun Feb 16, 2025 9:01 am
marcoagpinto wrote: Sun Feb 16, 2025 8:55 am Heya,

Since PB 6.20 accepts loading text in ASCII, UTF-8, and UTF-16,[...]
Hi,
where did you get this information? I can't find it.

Code: Select all

ReadFile(1, File$)
string_format = ReadStringFormat(1)
Press F1 on:

Code: Select all

ReadStringFormat(1)

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 12:01 pm
by infratec
:?: :?: :?:

For a long time the documentation has said that it can detect many formats, but that only
#PB_Ascii, #PB_UTF8 and #PB_Unicode can be used directly.

And what does this have to do with
does it mean that I can now display emojis above chr 65535 in UTF-8
?

UTF-8 can take up to 4 bytes per character.
Whether you can display all these characters depends on the font you are using.
I think you have to switch to an emoji font for the emojis you want and then back to your normal font.
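
To illustrate the "up to 4 bytes" point, here is a minimal sketch that returns how many bytes UTF-8 needs for a given code point. Utf8ByteCount is a hypothetical helper name chosen for this example, not a PureBasic built-in, and it assumes the input is a valid code point in the range 0 to $10FFFF.

```purebasic
; Hypothetical helper (not part of PureBasic): number of bytes UTF-8
; needs to encode one Unicode code point. Assumes 0 <= CodePoint <= $10FFFF.
Procedure.i Utf8ByteCount(CodePoint.i)
  If CodePoint < $80
    ProcedureReturn 1   ; ASCII range
  ElseIf CodePoint < $800
    ProcedureReturn 2   ; e.g. Greek letters
  ElseIf CodePoint < $10000
    ProcedureReturn 3   ; rest of the Basic Multilingual Plane
  EndIf
  ProcedureReturn 4     ; everything above $FFFF, including most emojis
EndProcedure

Debug Utf8ByteCount($41)     ; 1
Debug Utf8ByteCount($3B6)    ; 2
Debug Utf8ByteCount($20AC)   ; 3
Debug Utf8ByteCount($1F600)  ; 4
```

The encoding itself never limits which characters can be stored; whether they are displayed is decided by the font, as described above.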

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 12:06 pm
by HeX0R

Code: Select all

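; UTF-8 bytes F0 9F 98 81 (U+1F601), written as a little-endian value
l = $81989FF0
; #PB_ByteLength makes the length count bytes, so exactly 4 bytes are read
a$ = PeekS(@l, 4, #PB_UTF8 | #PB_ByteLength)
Debug a$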

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 12:29 pm
by infratec

Code: Select all

; https://github.com/idle-PB/UTF16/blob/main/UTF16.pb

IncludeFile "UTF16.pb"

UseModule UTF16

If LoadFont(0, "Segoe UI Emoji", 14)
  SetGadgetFont(#PB_Default, FontID(0))
  Debug "Ok"
EndIf

If OpenWindow(0, 0, 0, 322, 150, "EditorGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  EditorGadget(0, 8, 8, 306, 133)
  For a = 0 To 9
    AddGadgetItem(0, a, Hex(128512 + a) + ": " + StrChr(128512 + a))
  Next
  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 12:48 pm
by infratec
Extended HeX0R example:

Code: Select all

; https://www.compart.com/en/unicode/U+1F600


If LoadFont(0, "Segoe UI Emoji", 14)
  SetGadgetFont(#PB_Default, FontID(0))
  Debug "Ok"
EndIf

;0xF0 0x9F 0x98 0x80  emoji 1F600 in utf8 (see link above)
l = $80989FF0 ; in little endian

If OpenWindow(0, 0, 0, 322, 150, "EditorGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  EditorGadget(0, 8, 8, 306, 133)
  AddGadgetItem(0, -1, "Emoji: " + PeekS(@l, 4, #PB_UTF8 | #PB_ByteLength)) ; -1 appends the item
  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf
If you use a font which includes the emojis, you can show them :wink:
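
The constant in the example looks reversed because x86/x64 CPUs store the lowest byte of a number first. The following sketch, which assumes a little-endian machine, prints the bytes as they sit in memory. It then shows an alternative that lists the UTF-8 bytes in reading order in a DataSection, so no byte-order reasoning is needed. The label name EmojiUtf8 is made up for this example.

```purebasic
; Assumption: little-endian CPU (x86/x64), as in the example above.
Define l.l
Define i.i

l = $80989FF0
For i = 0 To 3
  Debug Hex(PeekA(@l + i))  ; F0, 9F, 98, 80 on a little-endian machine
Next

; Alternative: store the UTF-8 bytes in reading order, zero-terminated.
DataSection
  EmojiUtf8:
  Data.a $F0, $9F, $98, $80, 0  ; U+1F600 in UTF-8 plus terminating zero
EndDataSection

; Length -1 reads up to the terminating zero byte.
Debug PeekS(?EmojiUtf8, -1, #PB_UTF8)
```

Whether the Debug window shows the emoji itself depends on the font the debugger uses; in a gadget, the font set for that gadget decides.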

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 7:00 pm
by marcoagpinto
Heya,

Why aren't you using

Code: Select all

chr()
in the examples?

I tried to convert the following values from the Portuguese LibreOffice autocorrect file to emojis:

Code: Select all

<block-list:block block-list:abbreviated-name=":zebra:" block-list:name="&#x1F993;"/>
<block-list:block block-list:abbreviated-name=":zeta:" block-list:name="&#x3B6;"/>
<block-list:block block-list:abbreviated-name=":Zeta:" block-list:name="&#x396;"/>
<block-list:block block-list:abbreviated-name=":zombie:" block-list:name="&#x1F9DF;"/>
<block-list:block block-list:abbreviated-name=":zzz:" block-list:name="&#x1F4A4;"/>
And they all appear corrupted.

However, among the tons of emojis in the file, some appear correctly.

I am using the Arial font.

Thanks!
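
As a first step for entries like the ones quoted above, the hexadecimal character reference has to be turned into a numeric code point. This is a minimal sketch under the assumption that the value always has the exact form "&#x...;" with a lowercase x. CodePointFromEntity is a hypothetical helper name, not something from the thread or from PureBasic.

```purebasic
; Hypothetical helper (not part of PureBasic): extracts the code point from
; a hexadecimal character reference such as "&#x1F993;".
; Returns -1 if the text does not have the form "&#x...;".
Procedure.i CodePointFromEntity(Entity.s)
  Protected digits.s
  If Left(Entity, 3) = "&#x" And Right(Entity, 1) = ";"
    ; Strip the 3 leading characters and the trailing semicolon.
    digits = Mid(Entity, 4, Len(Entity) - 4)
    ; Val() reads hexadecimal when the string is prefixed with "$".
    ProcedureReturn Val("$" + digits)
  EndIf
  ProcedureReturn -1
EndProcedure

Debug Hex(CodePointFromEntity("&#x1F993;"))  ; 1F993
Debug Hex(CodePointFromEntity("&#x3B6;"))    ; 3B6
```

Of the five entries, only the two Greek letters fit into 16 bits. The other three are above $FFFF, which may be why some entries in the file appear correct and others do not.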

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 7:49 pm
by idle
Chr() will fail with the debugger enabled for values above $FFFF, as it's intended to return a single UCS-2 character.

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:08 pm
by kenmo
PureBasic has "unofficially" supported emoji and other characters >$FFFF for years. Although PB counts all characters as fixed 16-bit (UCS-2?) they seem to be treated as UTF-16 by the operating system when rendered.

But the provided Chr() still doesn't accept higher codepoints and won't return a UTF-16 surrogate pair string.

See my ChrU() procedure or Demivec's _Chr() procedure here:
https://www.purebasic.fr/english/viewtopic.php?t=66836
https://www.purebasic.fr/english/viewtopic.php?t=64947
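
For readers who want to see the idea without following the links, here is a minimal sketch of the standard UTF-16 surrogate pair calculation. It is not the code from either linked topic. ChrSurrogate is a hypothetical name, and the sketch assumes that Chr() accepts the individual surrogate values, which lie inside its 0 to 65535 range.

```purebasic
; Hypothetical helper (not from the linked topics): returns a string for any
; Unicode code point, using a UTF-16 surrogate pair above $FFFF.
Procedure.s ChrSurrogate(CodePoint.i)
  Protected v.i
  If CodePoint < 0 Or CodePoint > $10FFFF
    ProcedureReturn ""   ; outside the Unicode range
  ElseIf CodePoint >= $D800 And CodePoint <= $DFFF
    ProcedureReturn ""   ; surrogate values are not characters on their own
  ElseIf CodePoint <= $FFFF
    ProcedureReturn Chr(CodePoint)
  EndIf
  v = CodePoint - $10000  ; 20 bits remain
  ; High surrogate carries the upper 10 bits, low surrogate the lower 10.
  ProcedureReturn Chr($D800 + (v >> 10)) + Chr($DC00 + (v & $3FF))
EndProcedure

Define zebra.s = ChrSurrogate($1F993)
Debug Len(zebra)                 ; 2, PB counts 16-bit units
Debug Hex(Asc(zebra))            ; D83E
Debug Hex(Asc(Mid(zebra, 2, 1))) ; DD93
```

Because PB counts 16-bit units, Len() reports 2 for one emoji, and functions such as Mid() or Left() can cut a pair in half.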

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:09 pm
by Little John
marcoagpinto wrote: Sun Feb 16, 2025 7:00 pm Why aren't you using

Code: Select all

chr()
in the examples?
For Unicode codepoints above $FFFF (= 65535), Chr() cannot be used.
Use this replacement instead.

//edit: kenmo was a few seconds quicker. :-)

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:11 pm
by infratec
Which font did you use ???

Code: Select all

; https://www.compart.com/en/unicode/U+1F600


; zebra:  1F993 -> UTF8 = 0xF0 0x9F 0xA6 0x93
; zeta:   3B6   -> UTF8 = 0xCE 0xB6
; Zeta:   396   -> UTF8 = 0xCE 0x96
; zombie: 1F9DF -> UTF8 = 0xF0 0x9F 0xA7 0x9F
; zzz:    1F4A4 -> UTF8 = 0xF0 0x9F 0x92 0xA4

#zebra = $93a69ff0
#lzeta = $b6ce
#uZeta = $96ce
#zombie = $9fa79ff0
#zzz = $a4929ff0

Define Emoji.l

If LoadFont(0, "Segoe UI Emoji", 14)
  SetGadgetFont(#PB_Default, FontID(0))
EndIf


If OpenWindow(0, 0, 0, 322, 150, "EditorGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  EditorGadget(0, 8, 8, 306, 133)
  Emoji = #zebra
  AddGadgetItem(0, 1, PeekS(@Emoji, 4, #PB_UTF8|#PB_ByteLength))
  Emoji = #lzeta
  AddGadgetItem(0, 2, PeekS(@Emoji, 2, #PB_UTF8|#PB_ByteLength))
  Emoji = #uZeta
  AddGadgetItem(0, 3, PeekS(@Emoji, 2, #PB_UTF8|#PB_ByteLength))
  Emoji = #zombie
  AddGadgetItem(0, 4, PeekS(@Emoji, 4, #PB_UTF8|#PB_ByteLength))
  Emoji = #zzz
  AddGadgetItem(0, 5, PeekS(@Emoji, 4, #PB_UTF8|#PB_ByteLength))
  Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
EndIf


Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:16 pm
by marcoagpinto
infratec wrote: Sun Feb 16, 2025 8:11 pm Which font did you use ???

I have been using Arial for over 10 years.

Maybe it is time to switch font?

Which one do you advise for use on Windows, Linux and Mac?

Thanks!

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:18 pm
by kenmo
Little John wrote: Sun Feb 16, 2025 8:09 pm //edit: kenmo was a few seconds quicker. :-)
8)

Re: PB 6.20: Emojis above 65535?

Posted: Sun Feb 16, 2025 8:40 pm
by infratec
As written in my first answer: you need a font that contains all the emojis.
And Arial is not such a font.

You can try

Code: Select all

LoadFont(0, "Noto Color Emoji", 14)
Maybe this font is available on all OSs if LibreOffice is installed.
But you need such an emoji font only for the emojis. The other text can be printed in Arial.
Otherwise your program has to ship this font.
It is a free Google font:
https://fonts.google.com/noto/specimen/Noto+Color+Emoji