PureBasic Forums - English

Posted: **Thu Mar 05, 2015 3:08 am**

I think PB needs built-in functions for Unicode normalization.
Without such functions, Unicode string searches, comparisons and sorting can yield wrong results.

''Unicode equivalence'' in Wikipedia wrote: For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other.

Code:

a$ = Chr($006E) + Chr($0303)
b$ = Chr($00F1)

Debug a$
Debug b$

If a$ = b$
   Debug "equal"
Else
   Debug "not equal"
EndIf

Output:
ñ
ñ
not equal

If you don't get this output, see here.

If I understand the above-mentioned Wikipedia article correctly, both strings should be considered equal.
If we had a NormalizeString() function, then we could write this:

Code:

a$ = Chr($006E) + Chr($0303)
b$ = Chr($00F1)

Debug a$
Debug b$

If NormalizeString(a$) = NormalizeString(b$)
   Debug "equal"
Else
   Debug "not equal"
EndIf

and this should show "equal".
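Until something like this is built in, a toy stand-in can at least illustrate the intended behaviour. The sketch below is only that: NormalizeString() is the hypothetical function proposed above, and this version merely composes a few hard-coded "base letter + combining tilde" pairs with ReplaceString(). It is not real NFC normalization, and it assumes the program is compiled in Unicode mode.

```purebasic
; Toy stand-in for the proposed NormalizeString(); NOT real NFC normalization.
; Assumption: compiled in Unicode mode. Only the pairs listed below are composed.
Procedure.s NormalizeString(text$)
  Protected result$ = text$
  ; base letter + U+0303 (combining tilde) -> precomposed letter
  result$ = ReplaceString(result$, Chr($006E) + Chr($0303), Chr($00F1)) ; n + tilde -> U+00F1
  result$ = ReplaceString(result$, Chr($004E) + Chr($0303), Chr($00D1)) ; N + tilde -> U+00D1
  result$ = ReplaceString(result$, Chr($0061) + Chr($0303), Chr($00E3)) ; a + tilde -> U+00E3
  result$ = ReplaceString(result$, Chr($006F) + Chr($0303), Chr($00F5)) ; o + tilde -> U+00F5
  ProcedureReturn result$
EndProcedure

a$ = Chr($006E) + Chr($0303)
b$ = Chr($00F1)

If NormalizeString(a$) = NormalizeString(b$)
   Debug "equal"
Else
   Debug "not equal"
EndIf
```

A real implementation would need the full Unicode composition and decomposition data plus canonical reordering of combining marks, which is exactly why a built-in function would be preferable to a hand-made table like this one.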

srod has provided some code here for Windows.
However, we need this on all platforms, and IMHO it is so important that it should be built into PB.

Maybe in Unicode mode PB's built-in sorting functions should also (optionally?) normalize strings internally before comparing them.
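As a rough illustration of what "normalize before comparing" could look like today, one option is to sort on a normalized copy of each string. In the sketch below, NormalizeString() is again only a hypothetical toy (it composes n + U+0303 and nothing else), and the NameEntry structure and AddName() helper are names made up for this example. It assumes Unicode mode.

```purebasic
; Toy stand-in for the proposed NormalizeString(); NOT real NFC normalization.
Procedure.s NormalizeString(text$)
  ; Composes only n + U+0303 (combining tilde) into U+00F1.
  ProcedureReturn ReplaceString(text$, Chr($006E) + Chr($0303), Chr($00F1))
EndProcedure

Structure NameEntry   ; hypothetical structure for this example
  Text.s              ; string as originally stored
  Key.s               ; normalized copy, used only as the sort key
EndStructure

; Hypothetical helper: stores the text together with its normalized key.
Procedure AddName(List Names.NameEntry(), text$)
  AddElement(Names())
  Names()\Text = text$
  Names()\Key  = NormalizeString(text$)
EndProcedure

NewList Names.NameEntry()

AddName(Names(), "pi" + Chr($006E) + Chr($0303) + "ata") ; decomposed n + tilde
AddName(Names(), "pi" + Chr($00F1) + "a")                 ; precomposed U+00F1
AddName(Names(), "pinz")

; Sort on the normalized key instead of the raw text.
SortStructuredList(Names(), #PB_Sort_Ascending, OffsetOf(NameEntry\Key), #PB_String)

ForEach Names()
  Debug Names()\Text
Next
```

Sorting on Key rather than Text means the decomposed and precomposed spellings are ordered as if they had been written the same way. An option on the built-in sort functions could do the same thing internally without the extra field.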

Posted: **Thu Mar 05, 2015 9:46 pm**

Maybe Fred could take a look at ICU: http://site.icu-project.org/home

Posted: **Thu Mar 05, 2015 10:21 pm**

IMHO it is so important that it should be built into PB

+1

Posted: **Thu Mar 05, 2015 11:24 pm**

Posted: **Fri Mar 06, 2015 12:05 am**

Especially since they're dropping ASCII support.

If you say you support (only) Unicode, you should (support Unicode).

Posted: **Wed Aug 12, 2015 3:57 pm**

See also: Converting from UTF-8 NFD to NFC & vice versa

Unicode normalization