Re: UTF8 and strings...

Posted: Mon Jan 18, 2010 11:05 am
by helpy
Behaviour changed in 4.41 RC1:

unicode compiler option == OFF ==> Result of TEST:

PeekS(?string,?dataEnd-?string,#PB_UTF8) ==> Test ? 123

Dump of ?string:
004223FC  54 65 73 74 20 E9 20 31 32 33                      |Test é 123|

unicode compiler option == ON ==> Result of TEST:

PeekS(?string,?dataEnd-?string,#PB_UTF8) ==> Test  123

Dump of ?string:
004224B8  54 65 73 74 20 E9 20 31 32 33                      |Test é 123|
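Both dumps store é as the single byte E9, which is how Latin-1 and Windows-1252 encode it. In UTF-8 that byte is not a valid character on its own. The Python sketch below is not PureBasic and does not show how PeekS works internally. It only decodes the same ten bytes with two common strategies for invalid UTF-8: skipping the bad byte, or replacing it with U+FFFD. The results happen to match the two outputs above. The variable names are illustrative.

```python
# The ten bytes from the memory dump of ?string.
data = bytes.fromhex("54 65 73 74 20 E9 20 31 32 33")

# Read as Latin-1, 0xE9 is the single-byte é that was saved in the source.
latin1 = data.decode("latin-1")

# Read as UTF-8, 0xE9 would start a 3-byte sequence, but the next byte (0x20)
# is not a continuation byte, so 0xE9 is invalid on its own.
dropped = data.decode("utf-8", errors="ignore")    # invalid byte is skipped
replaced = data.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD

# U+FFFD has no ascii equivalent, so converting it to ascii yields "?".
as_ascii = replaced.encode("ascii", errors="replace").decode("ascii")

print(latin1)    # Test é 123
print(dropped)   # Test  123
print(as_ascii)  # Test ? 123
```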

Re: UTF8 and strings...

Posted: Mon Jan 18, 2010 12:02 pm
by Joakim Christiansen
helpy wrote: Behaviour changed in 4.41 RC1:
Yeah, I posted my example in the bug section and they wrote "fixed".
And I'm happy with the new behavior (maybe still not perfect, but good enough for me). That's what's so nice about PureBasic: the developers listen to their users.

Re: UTF8 and strings...

Posted: Mon Jan 18, 2010 12:59 pm
by Trond
Extended ascii (byte values 128 and above) can't be read as UTF-8. This is by design.
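This can be checked outside PureBasic. The Python sketch below tries to decode every possible single byte as UTF-8. Only bytes 0 to 127, the plain ascii range, succeed. A non-ascii character like é needs a multi-byte sequence instead.

```python
# Try to decode every possible single byte, on its own, as UTF-8.
valid = []
for b in range(256):
    try:
        bytes([b]).decode("utf-8")
        valid.append(b)
    except UnicodeDecodeError:
        pass  # bytes 0x80-0xFF never form a complete character by themselves

print(min(valid), max(valid), len(valid))  # 0 127 128

# In UTF-8, é (U+00E9) takes two bytes instead of one.
utf8_e = "\u00e9".encode("utf-8")
print(utf8_e.hex(" "))  # c3 a9
```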

Also, you can't expect unicode characters to display the same in both unicode and ascii mode. (A UTF-8-encoded é is a unicode character.) Even if it is the same character as the ascii é, the two can have different numbers, and a character with a character number > 255 is lost when it is converted to ascii.
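The loss of characters above 255 can be shown with a minimal Python sketch. It uses Latin-1 as the stand-in "ascii" code page because Latin-1 covers exactly U+0000 to U+00FF. The thread does not say which code page PureBasic's ascii mode actually uses. Ω (U+03A9) is an arbitrary example of a character above 255.

```python
# é is U+00E9 (233); Ω is U+03A9 (937), above the single-byte range.
text = "\u00e9 \u03a9"
print([ord(ch) for ch in text])  # [233, 32, 937]

# Latin-1 has one byte per character for U+0000-U+00FF only, so é fits
# but Ω has no slot and is replaced by "?".
single_byte = text.encode("latin-1", errors="replace")
print(single_byte)  # b'\xe9 ?'
```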

Of course, just cutting the string off at this character (as 4.40 did) isn't the right thing to do.