PureBasic Forums - English

Posted: **Tue Mar 09, 2010 3:13 am**

mem = AllocateMemory(64)
PokeS(mem, "ĄąĆćĘęŁłŃńŚśŹźŻż", -1, #PB_UTF8)
For i = 0 To 15 ; 16 characters, two UTF-8 bytes each
  Debug Hex(PeekB(mem+i*2) & $FF) +" "+ Hex(PeekB(mem+i*2+1) & $FF)
Next
Debug PeekS(mem, -1, #PB_UTF8)

Debug PeekS(?Start, -1, #PB_UTF8)

DataSection
  Start:
  Data.b $C4, $84;Ą
  Data.b $C4, $85;ą
  Data.b $C4, $86;Ć
  Data.b $C4, $87;ć
  Data.b $C4, $98;Ę
  Data.b $C4, $99;ę
  Data.b $C5, $81;Ł
  Data.b $C5, $82;ł
  Data.b $C5, $83;Ń
  Data.b $C5, $84;ń
  Data.b $C5, $9A;Ś
  Data.b $C5, $9B;ś
  Data.b $C5, $B9;Ź
  Data.b $C5, $BA;ź
  Data.b $C5, $BB;Ż
  Data.b $C5, $BC;ż
  Data.b $00
EndDataSection

Please run this code with Unicode checked and unchecked. Why is nothing working correctly here?
Why is "ń" converted into "C3 B1" when it should be "C5 84"? Link: http://www.utf8-chartable.de/unicode-utf8-table.pl
Why is the string from DataSection read as it should in Unicode mode, but in ANSI it's "?????????????????"?

Can someone explain this to me in simple terms? :roll:
Thanks in advance.

Posted: **Tue Mar 09, 2010 6:34 am**

klaver wrote: Why is the string from DataSection read as it should in Unicode mode, but in ANSI it's "?????????????????"?

Because all those characters are outside the standard ASCII range (0-127).
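
To make that concrete, here is a small sketch in Python (not PureBasic, so the result does not depend on any compiler switch). It decodes the exact bytes from the DataSection and then squeezes the result into plain ASCII. The lossy conversion with "?" as the replacement character is an assumption about what the ANSI-mode PeekS does, based only on the output reported above.

```python
# Bytes copied from the DataSection in the first post (without the trailing $00).
data = bytes([
    0xC4, 0x84, 0xC4, 0x85, 0xC4, 0x86, 0xC4, 0x87,
    0xC4, 0x98, 0xC4, 0x99, 0xC5, 0x81, 0xC5, 0x82,
    0xC5, 0x83, 0xC5, 0x84, 0xC5, 0x9A, 0xC5, 0x9B,
    0xC5, 0xB9, 0xC5, 0xBA, 0xC5, 0xBB, 0xC5, 0xBC,
])

# The bytes are valid UTF-8 and decode to the 16 Polish letters.
text = data.decode("utf-8")

# Assumption: an ANSI-mode string behaves like a lossy conversion.
# None of these letters is in the ASCII range 0-127, so each one
# is replaced by a question mark.
lossy = text.encode("ascii", errors="replace")

print(text)
print(lossy)
```

So the data itself is fine; the characters only get lost when the decoded text has to fit into a string type that cannot hold them.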

Posted: **Wed Mar 10, 2010 2:26 pm**

Thanks, but why are characters converted into UTF-8 incorrectly?

Posted: **Wed Mar 10, 2010 2:40 pm**

Have you set the IDE file format to UTF-8 ?

Posted: **Wed Mar 10, 2010 3:01 pm**

Consider:
1. The integrated debugger is compiled in ASCII mode, and can't display Unicode-only characters.
2. When compiling your program in ASCII mode, literal strings which contain Unicode characters will inevitably not contain those characters any more.
3. When compiling in ASCII mode and peeking UTF-8 strings, the result is put into a PB ASCII string, so Unicode-only characters are lost.
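
Point 2 would also account for the "C3 B1" from the first post. The sketch below (Python, purely to show the bytes) assumes the literal reached the compiler as a single Windows-1250 byte and was then treated as a Latin-1 character before being written out as UTF-8. Both code pages are guesses on my part, not something confirmed in this thread.

```python
char = "ń"

# What the UTF-8 table says: U+0144 encodes as C5 84.
correct = char.encode("utf-8")

# Assumption: in ASCII mode the literal is stored as one byte in the
# Central European code page (Windows-1250), where "ń" is 0xF1.
single_byte = char.encode("cp1250")

# Assumption: that byte is later read as Latin-1, where 0xF1 is "ñ".
misread = single_byte.decode("latin-1")

# Encoding the misread character as UTF-8 gives C3 B1.
wrong = misread.encode("utf-8")

print(correct.hex(" "), single_byte.hex(), misread, wrong.hex(" "))
```

If that is what happens, PokeS is encoding correctly; it is simply being handed "ñ" instead of "ń".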

UTF-8: I don't get it...
