Unicode Text output
Posted: Fri Feb 02, 2024 1:55 pm
by matalog
How can I write to a file so that Unicode code points are output correctly and are recognised by web pages etc., so that the characters are displayed?
I found a list of Unicode codes here:
https://www.unicode.org/Public/UCD/late ... Charts.pdf.
How can I output code 1F0B6 🂶 PLAYING CARD SIX OF HEARTS, for example?
Or this
Code: Select all
𝔸𝔹ℂ𝟙𝟚𝟛𝕒𝕓𝕔
Which was copied and pasted from one of those online font converters.
I will have an input box, and the ASCII codes will then index into an array of codes for the relevant font. I assume I will have to use byte output. Does anyone know how each code would be represented in bytes?
Re: Unicode Text output
Posted: Fri Feb 02, 2024 2:10 pm
by Fred
You can use the excellent UTF-16 module by idle:
https://www.purebasic.fr/english/viewtopic.php?t=80275
Code: Select all
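Procedure.s StrChr(v.i) ; return a proper surrogate pair for Unicode values outside the BMP (Basic Multilingual Plane)
  Protected buffer.q
  If v < $10000
    ProcedureReturn Chr(v) ; BMP values fit in a single UTF-16 unit
  Else
    ; low surrogate in the upper 16 bits, high surrogate in the lower 16 bits,
    ; so the high surrogate comes first in little-endian memory
    buffer = (v & $3FF) << 16 | (v - $10000) >> 10 | $DC00D800
    ProcedureReturn PeekS(@buffer, 2, #PB_Unicode)
  EndIf
EndProcedure

a$ = StrChr($1F0B6)
Debug a$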
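To get from the debug output to an actual file, a minimal sketch could look like the following. It repeats StrChr() so it runs on its own. The file name cards.txt is a placeholder, and it assumes that WriteStringN() with #PB_UTF8 converts each surrogate pair into a single 4-byte UTF-8 sequence on your platform, which is worth confirming in a hex editor.

```purebasic
; Sketch: write a string containing a non-BMP character to a UTF-8 text file.
; StrChr() is repeated from the post above so this runs on its own.
Procedure.s StrChr(v.i)
  Protected buffer.q
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    buffer = (v & $3FF) << 16 | (v - $10000) >> 10 | $DC00D800
    ProcedureReturn PeekS(@buffer, 2, #PB_Unicode)
  EndIf
EndProcedure

text$ = StrChr($1F0B6) + " PLAYING CARD SIX OF HEARTS"
fileName$ = GetTemporaryDirectory() + "cards.txt" ; placeholder path

If CreateFile(0, fileName$)
  ; Strings are UTF-16 in memory; #PB_UTF8 asks for conversion on write.
  ; Assumption: the conversion handles surrogate pairs correctly.
  WriteStringN(0, text$, #PB_UTF8)
  CloseFile(0)
  Debug "Wrote " + fileName$
Else
  Debug "Could not create " + fileName$
EndIf
```

Web pages generally expect UTF-8, so a file written this way should display the character as long as the page declares UTF-8 and the font has a glyph for it.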
Re: Unicode Text output
Posted: Fri Feb 02, 2024 3:57 pm
by matalog
That's great, it seems to work for most of Unicode. I don't expect to be using the eastern characters it doesn't work with so it is good for me.
Would there be a way to get the Unicode code from a pasted character, like the ABC123abc I included in the first post, to see what the large Unicode reference list calls them?
Re: Unicode Text output
Posted: Fri Feb 02, 2024 4:27 pm
by matalog
I'm struggling to understand the relation between the Unicode Codes and the Hex of a TXT file containing them, for example:
Code: Select all
a$ = StrChr($1F150)+StrChr($1F0B2)+StrChr($1F0B3)
Converted into 🅐🂲🂳, then that txt file saved and then opened in HxD results in: F0 9F 85 90 F0 9F 82 B2 F0 9F 82 B3
It looks like F0 is the marker of a 4-byte code for a character; I'm not sure about the rest.
Re: Unicode Text output
Posted: Fri Feb 02, 2024 6:46 pm
by Demivec
matalog wrote: Fri Feb 02, 2024 4:27 pm
I'm struggling to understand the relation between the Unicode Codes and the Hex of a TXT file containing them, for example:
Code: Select all
a$ = StrChr($1F150)+StrChr($1F0B2)+StrChr($1F0B3)
Converted into 🅐🂲🂳, then that txt file saved and then opened in HxD results in: F0 9F 85 90 F0 9F 82 B2 F0 9F 82 B3
It looks like F0 is the marker of a 4-byte code for a character; I'm not sure about the rest.
The code example is using UTF-16 and the values from the text file are in UTF-8.
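To make the difference concrete, here is a minimal sketch that computes both encodings of a code point by hand. HexByte(), Utf8Hex() and Utf16Hex() are helper names made up for this illustration; they are not part of PureBasic or of idle's module.

```purebasic
; Sketch: show the UTF-8 bytes and the UTF-16 units for one code point.
; All three procedures are hypothetical helpers written for this example.

Procedure.s HexByte(b.i)
  ; Two-digit uppercase hex for one byte
  ProcedureReturn RSet(Hex(b & $FF), 2, "0")
EndProcedure

Procedure.s Utf8Hex(cp.i)
  If cp < $80       ; 1 byte:  0xxxxxxx
    ProcedureReturn HexByte(cp)
  ElseIf cp < $800  ; 2 bytes: 110xxxxx 10xxxxxx
    ProcedureReturn HexByte($C0 | (cp >> 6)) + " " + HexByte($80 | (cp & $3F))
  ElseIf cp < $10000 ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    ProcedureReturn HexByte($E0 | (cp >> 12)) + " " + HexByte($80 | ((cp >> 6) & $3F)) + " " + HexByte($80 | (cp & $3F))
  Else              ; 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    ProcedureReturn HexByte($F0 | (cp >> 18)) + " " + HexByte($80 | ((cp >> 12) & $3F)) + " " + HexByte($80 | ((cp >> 6) & $3F)) + " " + HexByte($80 | (cp & $3F))
  EndIf
EndProcedure

Procedure.s Utf16Hex(cp.i)
  If cp < $10000    ; BMP: one 16-bit unit
    ProcedureReturn RSet(Hex(cp), 4, "0")
  Else              ; outside the BMP: high surrogate, then low surrogate
    ProcedureReturn Hex($D800 | ((cp - $10000) >> 10)) + " " + Hex($DC00 | (cp & $3FF))
  EndIf
EndProcedure

Debug Utf8Hex($1F150)  ; F0 9F 85 90
Debug Utf16Hex($1F150) ; D83C DD50
```

The first line matches the first four bytes of the HxD dump. F0 is binary 11110000, the lead byte of a 4-byte UTF-8 sequence, and each following byte starts with the bits 10 and carries six more bits of the code point. The second line is the surrogate pair that StrChr() builds in memory.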
Re: Unicode Text output
Posted: Sat Feb 03, 2024 6:07 am
by idle
Fred wrote: Fri Feb 02, 2024 2:10 pm
You can use the excellent UTF-16 module by idle:
https://www.purebasic.fr/english/viewtopic.php?t=80275
Code: Select all
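Procedure.s StrChr(v.i) ; return a proper surrogate pair for Unicode values outside the BMP (Basic Multilingual Plane)
  Protected buffer.q
  If v < $10000
    ProcedureReturn Chr(v) ; BMP values fit in a single UTF-16 unit
  Else
    ; low surrogate in the upper 16 bits, high surrogate in the lower 16 bits,
    ; so the high surrogate comes first in little-endian memory
    buffer = (v & $3FF) << 16 | (v - $10000) >> 10 | $DC00D800
    ProcedureReturn PeekS(@buffer, 2, #PB_Unicode)
  EndIf
EndProcedure

a$ = StrChr($1F0B6)
Debug a$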
but that's missing the other 12000 lines of code
