Unicode Text output

Posted: Fri Feb 02, 2024 1:55 pm
by matalog
How can I write to a file so that Unicode characters are output correctly, and are recognised and displayed properly on web pages and the like?

I found a list of Unicode codes here: https://www.unicode.org/Public/UCD/late ... Charts.pdf.

How can I output code 1F0B6 🂶 PLAYING CARD SIX OF HEARTS, for example?

Or this

Code:

𝔸𝔹ℂ 𝕒𝕓𝕔 𝟙𝟚𝟛

Which was copied and pasted from one of those online font converters.

I will have an input box, and the ASCII code of each typed character will index into an array holding the corresponding Unicode codes for the relevant font. I assume I will have to use byte output. Does anyone know how each code would be represented in bytes?

Re: Unicode Text output

Posted: Fri Feb 02, 2024 2:10 pm
by Fred
You can use the excellent UTF-16 module by idle: https://www.purebasic.fr/english/viewtopic.php?t=80275

Code:

Procedure.s StrChr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
	Protected buffer.q
	If v < $10000
		ProcedureReturn Chr(v)
	Else
		Buffer = (v&$3FF)<<16 | (v-$10000)>>10 | $DC00D800
		ProcedureReturn PeekS(@Buffer, 2, #PB_Unicode)
	EndIf
EndProcedure

a$ = StrChr($1F0B6)
Debug a$
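
To make the bit shuffling in StrChr easier to follow, here is a minimal sketch that computes the two UTF-16 surrogates for $1F0B6 one step at a time. The variable names (cp, offset, hi, lo) are only illustrative and are not part of the code above.

```purebasic
; Split a code point above $FFFF into its UTF-16 surrogate pair.
; Variable names are illustrative only.
cp.i     = $1F0B6                  ; PLAYING CARD SIX OF HEARTS
offset.i = cp - $10000             ; 20-bit value: $0F0B6
hi.i     = $D800 | (offset >> 10)  ; top 10 bits    -> high surrogate $D83C
lo.i     = $DC00 | (offset & $3FF) ; bottom 10 bits -> low surrogate  $DCB6
Debug Hex(hi) + " " + Hex(lo)
```

StrChr packs the same two values into a single quad with the low surrogate in the upper 16 bits, so that on a little-endian machine PeekS reads the high surrogate first.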

Re: Unicode Text output

Posted: Fri Feb 02, 2024 3:57 pm
by matalog
That's great, it seems to work for most of Unicode. I don't expect to use the Eastern characters it doesn't work with, so it is good enough for me.


Would there be a way to get the Unicode code from a pasted character, like the ABC123abc I included in the first post, to see what the large Unicode reference list calls it?
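
One possible approach, sketched under the assumption that the pasted text ends up in an ordinary PureBasic string (UTF-16 internally, so characters above $FFFF take two code units): walk the string and recombine each surrogate pair into a code point. ListCodePoints is a hypothetical helper name, not a built-in. It only yields the number. Finding the official character name still means looking that number up in the Unicode charts.

```purebasic
; Hypothetical helper: show the code point of each character in text$.
Procedure ListCodePoints(text$)
  Protected i = 1, n = Len(text$)
  Protected hi, lo, cp
  While i <= n
    hi = Asc(Mid(text$, i, 1))                ; one UTF-16 code unit
    cp = hi
    If hi >= $D800 And hi <= $DBFF And i < n  ; high surrogate with a unit after it
      lo = Asc(Mid(text$, i + 1, 1))
      If lo >= $DC00 And lo <= $DFFF          ; valid low surrogate: rebuild the code point
        cp = $10000 + ((hi - $D800) << 10) + (lo - $DC00)
        i = i + 1                             ; skip the low surrogate as well
      EndIf
    EndIf
    Debug "U+" + Hex(cp)
    i = i + 1
  Wend
EndProcedure

; Copy the characters of interest to the clipboard, then run this.
ListCodePoints(GetClipboardText())
```

The calculation is the inverse of StrChr: the high surrogate supplies the top 10 bits, the low surrogate the bottom 10 bits, and $10000 is added back at the end.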

Re: Unicode Text output

Posted: Fri Feb 02, 2024 4:27 pm
by matalog
I'm struggling to understand the relation between the Unicode Codes and the Hex of a TXT file containing them, for example:

Code:

a$ = StrChr($1F150)+StrChr($1F0B2)+StrChr($1F0B3)

That converts into 🅐🂲🂳. Saving that to a txt file and opening it in HxD results in: F0 9F 85 90 F0 9F 82 B2 F0 9F 82 B3

It looks like F0 marks the start of a 4-byte code for a character, but I'm not sure about anything beyond that.

Re: Unicode Text output

Posted: Fri Feb 02, 2024 6:46 pm
by Demivec
matalog wrote: Fri Feb 02, 2024 4:27 pm
I'm struggling to understand the relation between the Unicode Codes and the Hex of a TXT file containing them, for example:

Code:

a$ = StrChr($1F150)+StrChr($1F0B2)+StrChr($1F0B3)

That converts into 🅐🂲🂳. Saving that to a txt file and opening it in HxD results in: F0 9F 85 90 F0 9F 82 B2 F0 9F 82 B3

It looks like F0 marks the start of a 4-byte code for a character, but I'm not sure about anything beyond that.

The code example builds a UTF-16 string, whereas the values in the text file are the same characters encoded as UTF-8.
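
For the byte output asked about in the first post, the UTF-8 layout can be sketched as below. WriteCodePointUTF8 and the file name are hypothetical. The bit patterns are the standard UTF-8 ones, and the sketch assumes cp is a valid code point (not a surrogate in the range $D800 to $DFFF).

```purebasic
; Hypothetical helper: write one Unicode code point to an open file as UTF-8 bytes.
Procedure WriteCodePointUTF8(file.i, cp.i)
  If cp < $80                                   ; 1 byte:  0xxxxxxx
    WriteAsciiCharacter(file, cp)
  ElseIf cp < $800                              ; 2 bytes: 110xxxxx 10xxxxxx
    WriteAsciiCharacter(file, $C0 | (cp >> 6))
    WriteAsciiCharacter(file, $80 | (cp & $3F))
  ElseIf cp < $10000                            ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    WriteAsciiCharacter(file, $E0 | (cp >> 12))
    WriteAsciiCharacter(file, $80 | ((cp >> 6) & $3F))
    WriteAsciiCharacter(file, $80 | (cp & $3F))
  Else                                          ; 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    WriteAsciiCharacter(file, $F0 | (cp >> 18))
    WriteAsciiCharacter(file, $80 | ((cp >> 12) & $3F))
    WriteAsciiCharacter(file, $80 | ((cp >> 6) & $3F))
    WriteAsciiCharacter(file, $80 | (cp & $3F))
  EndIf
EndProcedure

; Hypothetical file name, created in the temporary directory.
file = CreateFile(#PB_Any, GetTemporaryDirectory() + "cards.txt")
If file
  WriteCodePointUTF8(file, $1F150)
  WriteCodePointUTF8(file, $1F0B2)
  WriteCodePointUTF8(file, $1F0B3)
  CloseFile(file)
EndIf
```

Working through $1F150 by hand gives F0 9F 85 90, which matches the first four bytes of the HxD dump. So a first byte of F0 (up to F4) does mark a 4-byte sequence, and each following byte in the range 80 to BF carries six more bits of the code point.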

Re: Unicode Text output

Posted: Sat Feb 03, 2024 6:07 am
by idle
Fred wrote: Fri Feb 02, 2024 2:10 pm
You can use the excellent UTF-16 module by idle: https://www.purebasic.fr/english/viewtopic.php?t=80275

Code:

Procedure.s StrChr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
	Protected buffer.q
	If v < $10000
		ProcedureReturn Chr(v)
	Else
		Buffer = (v&$3FF)<<16 | (v-$10000)>>10 | $DC00D800
		ProcedureReturn PeekS(@Buffer, 2, #PB_Unicode)
	EndIf
EndProcedure

a$ = StrChr($1F0B6)
Debug a$

but that's missing the other 12000 lines of code :lol: