PureBasic Forums - English

Posted: **Mon Nov 14, 2022 11:29 pm**

idle wrote: Mon Nov 14, 2022 3:57 am I would need to implement
https://www.unicode.org/Public/UCD/late ... olding.txt

Yes. Maybe it should be a separate module, because the code posted in this thread completely fulfills the thread topic, at least the simple variant of case mapping. Comparing strings in a case-insensitive manner is a different topic.

idle wrote: Mon Nov 14, 2022 3:57 am the goal for example will say that "MASSE" and "Maße" are equal.

This is full case-folding, where a mapping can change the number of letters. In simple case-folding, "ss/Ss/sS/SS" and "ß" are different, because simple case-folding supports only mappings that keep the same number of letters.
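
To make the difference concrete, here is a minimal sketch in Python (not PureBasic, and not the code from this thread), chosen only because its `str.casefold()` applies full case folding. The helper names `equal_simple` and `equal_full` are made up for illustration, and `str.lower()` is used as a stand-in for a length-preserving mapping, which holds for these particular strings but is an assumption in general.

```python
def equal_simple(a: str, b: str) -> bool:
    # Stand-in for a one-to-one mapping: for these inputs lower() maps
    # every character to exactly one character, so "ß" stays "ß".
    return a.lower() == b.lower()


def equal_full(a: str, b: str) -> bool:
    # casefold() applies full case folding, so "ß" becomes "ss".
    return a.casefold() == b.casefold()


print(equal_simple("MASSE", "Maße"))  # False
print(equal_full("MASSE", "Maße"))    # True
print(len("ß".casefold()))            # 2
```

The last line shows why a simple mapping cannot produce this result: the folded form of "ß" is two letters long.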

With full case-folding and full case-mapping, the topic becomes even more complicated, because some Unicode characters can be represented by several different character combinations. These combinations first have to be put into a normalized form before case-mapping or case-folding is applied. There are several normalization algorithms, and they sometimes even have to be run multiple times.

To better understand the complexity, I recommend this documentation:
https://www.w3.org/TR/charmod-norm/
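
As a rough illustration of the normalization point, the following Python sketch (again not PureBasic) follows the Unicode Standard's definition of a canonical caseless match: normalize to NFD, case-fold, then normalize to NFD again. The function names `canonical_caseless_key` and `caseless_equal` are hypothetical, and this covers only canonical equivalence, not the compatibility forms.

```python
import unicodedata


def canonical_caseless_key(s: str) -> str:
    # NFD first, so composed and decomposed input look the same,
    # then full case folding, then NFD again because folding can
    # produce text that is no longer normalized.
    nfd = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", nfd.casefold())


def caseless_equal(a: str, b: str) -> bool:
    return canonical_caseless_key(a) == canonical_caseless_key(b)


composed = "\u00C9"     # "É" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# Case folding alone does not unify the two representations.
print(composed.casefold() == decomposed.casefold())  # False
# Normalizing before and after folding does.
print(caseless_equal(composed, decomposed))          # True
```

The second normalization pass is the "run multiple times" part: it is usually a no-op, but it is needed for the cases where folding reorders or introduces combining characters.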

Posted: **Tue Nov 15, 2022 12:38 am**

Sicro wrote: Mon Nov 14, 2022 11:29 pm
idle wrote: Mon Nov 14, 2022 3:57 am I would need to implement
https://www.unicode.org/Public/UCD/late ... olding.txt
Yes. Maybe it should be a separate module, because the code posted in this thread completely fulfills the thread topic, at least the simple variant of case mapping. Comparing strings in a case-insensitive manner is a different topic.

idle wrote: Mon Nov 14, 2022 3:57 am the goal for example will say that "MASSE" and "Maße" are equal.
This is full case-folding, where a mapping can change the number of letters. In simple case-folding, "ss/Ss/sS/SS" and "ß" are different, because simple case-folding supports only mappings that keep the same number of letters.

With full case-folding and full case-mapping, the topic becomes even more complicated, because some Unicode characters can be represented by several different character combinations. These combinations first have to be put into a normalized form before case-mapping or case-folding is applied. There are several normalization algorithms, and they sometimes even have to be run multiple times.

To better understand the complexity, I recommend this documentation:
https://www.w3.org/TR/charmod-norm/

I think this works as intended to perform a full case-folding string comparison; it's case-insensitive.
https://www.unicode.org/Public/UCD/late ... olding.txt
I will leave it here for pickings, and if it's correct I will post it in its own thread. I don't have any need for it, but it might be useful to those following this topic.
https://dnscope.io/idlefiles/casefold.pb

Posted: **Sun Dec 04, 2022 8:28 pm**

idle wrote: Tue Nov 15, 2022 12:38 am I think this works as intended to perform a full case-folding string comparison; it's case-insensitive.
https://www.unicode.org/Public/UCD/late ... olding.txt
I will leave it here for pickings, and if it's correct I will post it in its own thread. I don't have any need for it, but it might be useful to those following this topic.
https://dnscope.io/idlefiles/casefold.pb

Yes, case folding is usually applied when comparing two strings. In this thread, you just included a simple variant in your code, and that's totally OK, because this thread is actually about case mapping, and case folding is a different topic. So it's all good. I just wanted to briefly mention how it's usually done. I've written you a PM so it doesn't get too off-topic here.

Upper and Lower Case Mapping for Unicode
