UTF-8: I don't get it...

Post by klaver »

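mem = AllocateMemory(64)
; Write the literal into the buffer as UTF-8
PokeS(mem, "ĄąĆćĘꣳŃńŚśŹźŻż", -1, #PB_UTF8)
; 16 letters x 2 bytes each = 32 bytes, so dump 16 pairs (0 To 15)
For i=0 To 15
  Debug Hex(PeekB(mem+i*2) & $FF) +" "+ Hex(PeekB(mem+i*2+1) & $FF)
Next
; Read the buffer back as UTF-8
Debug PeekS(mem, -1, #PB_UTF8)
FreeMemory(mem)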

Debug PeekS(?Start, -1, #PB_UTF8)

DataSection
  Start:
  Data.b $C4, $84;Ą
  Data.b $C4, $85;ą
  Data.b $C4, $86;Ć
  Data.b $C4, $87;ć
  Data.b $C4, $98;Ę
  Data.b $C4, $99;ę
  Data.b $C5, $81;Ł
  Data.b $C5, $82;ł
  Data.b $C5, $83;Ń
  Data.b $C5, $84;ń
  Data.b $C5, $9A;Ś
  Data.b $C5, $9B;ś
  Data.b $C5, $B9;Ź
  Data.b $C5, $BA;ź
  Data.b $C5, $BB;Ż
  Data.b $C5, $BC;ż
  Data.b $00
EndDataSection
Please run this code with Unicode mode checked and unchecked. Why does nothing work correctly here?
Why is "ń" converted to "C3 B1" when it should be "C5 84"? (See http://www.utf8-chartable.de/unicode-utf8-table.pl)
Why is the string from the DataSection read correctly in Unicode mode, while in ANSI mode it comes out as "?????????????????"?

Can someone explain this in simple words for me? :roll:
Thanks in advance.

Post by Joakim Christiansen »

klaver wrote: Why is the string from DataSection read as it should in Unicode mode, but in ANSI it's "?????????????????"?
Because all those characters are outside the standard ASCII range (0-127), so an ASCII-mode string cannot hold them and each one comes out as "?".

Post by klaver »

Thanks, but why are the characters converted to UTF-8 incorrectly?

Post by srod »

Have you set the IDE file format to UTF-8?
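A likely explanation for the "C3 B1" bytes, under two assumptions not confirmed in the thread: the source file was saved in the Windows-1250 ANSI code page (usual for Polish Windows), and the compiler then read each byte of the literal as a Latin-1 character. In Windows-1250, "ń" is the single byte F1; read as Latin-1, F1 is "ñ" (U+00F1); and "ñ" in UTF-8 is exactly C3 B1. The Python sketch below reproduces that chain next to the correct encoding (Python only serves to show the bytes).

```python
# Correct UTF-8 encoding of "ń" (U+0144).
correct = "ń".encode("utf-8")
print(correct.hex(" ").upper())          # C5 84

# Assumed mis-read: file saved as Windows-1250, bytes then treated as Latin-1.
ansi_byte = "ń".encode("cp1250")         # one byte: F1
misread = ansi_byte.decode("latin-1")    # F1 in Latin-1 is "ñ" (U+00F1)
wrong = misread.encode("utf-8")
print(misread, wrong.hex(" ").upper())   # ñ C3 B1
```

If that assumption holds, saving the source as UTF-8 in the IDE, as suggested above, should make the literal produce C5 84.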

Post by Trond »

Consider:
1. The integrated debugger is compiled in ASCII mode, so it can't display Unicode-only characters.
2. When you compile your program in ASCII mode, string literals that contain Unicode-only characters inevitably lose those characters.
3. When you compile in ASCII mode and peek UTF-8 strings, the result is stored in a PB ASCII string, so Unicode-only characters are lost.