Re: Upper and Lower Case Mapping for Unicode
Posted: Mon Nov 14, 2022 11:29 pm
by Sicro
Yes. Maybe it should be a separate module, because the code posted in this thread already fully covers the thread topic, at least the simple variant of case mapping. Comparing strings case-insensitively is a different topic.
idle wrote: Mon Nov 14, 2022 3:57 am
the goal, for example, is to treat "MASSE" and "Maße" as equal.
This is full case folding, where a mapping may change the number of characters. In simple case folding, "ss/Ss/sS/SS" and "ß" are different, because simple case folding only supports one-to-one mappings that keep the number of characters the same.
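To make the difference concrete, here is a minimal Python sketch (not the PureBasic code from this thread). It assumes Python's built-in `str.casefold()`, which applies full case folding. Python has no built-in simple case folding, so `str.lower()` is used purely as a stand-in for it here, because it also keeps "ß" as a single character; the function names are made up for illustration.

```python
def equal_full_folding(a: str, b: str) -> bool:
    """Compare using full case folding (one character may expand to several)."""
    return a.casefold() == b.casefold()


def equal_one_to_one(a: str, b: str) -> bool:
    """Stand-in for simple case folding: str.lower() leaves 'ß' unchanged."""
    return a.lower() == b.lower()


print(equal_full_folding("MASSE", "Maße"))  # True: 'ß' folds to 'ss'
print(equal_one_to_one("MASSE", "Maße"))    # False: 'masse' vs 'maße'
print(len("ß".casefold()))                  # 2
```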
With full case folding and full case mapping, the topic becomes even more complicated, because some Unicode characters can be represented by several different sequences of code points (for example, a precomposed letter versus a base letter followed by combining marks). To bring these sequences into a normalized form before applying case mapping or case folding, there are several normalization algorithms, and these sometimes even have to be run more than once.
To better understand the complexity, I recommend this documentation:
https://www.w3.org/TR/charmod-norm/
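As a rough illustration of the normalization point, the following Python sketch uses the standard `unicodedata` module. It normalizes to NFD, case-folds, and normalizes again, which is my understanding of how canonical caseless matching is commonly described; treat it as a sketch under that assumption, not a complete implementation. The helper name `caseless_key` is hypothetical.

```python
import unicodedata


def caseless_key(s: str) -> str:
    """NFD -> full case folding -> NFD again (normalization runs twice)."""
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", decomposed.casefold())


composed = "\u00C9cole"     # 'É' as one precomposed code point
decomposed = "E\u0301COLE"  # 'E' followed by a combining acute accent

# Case folding alone does not make the two spellings equal ...
print(composed.casefold() == decomposed.casefold())          # False
# ... but normalizing around the folding step does.
print(caseless_key(composed) == caseless_key(decomposed))    # True
```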
Re: Upper and Lower Case Mapping for Unicode
Posted: Tue Nov 15, 2022 12:38 am
by idle
I think this works as intended to perform a full case-folding string comparison; it's case-insensitive.
https://www.unicode.org/Public/UCD/late ... olding.txt
I will leave it here for the picking, and if it's correct I will post it in its own thread. I don't have any need for it, but it might be useful to those following this topic.
https://dnscope.io/idlefiles/casefold.pb
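For readers who want to see the general idea without opening the file, here is a small Python sketch of a table-driven full case-folding comparison. It is not a port of the PureBasic code linked above. It assumes the line layout of Unicode's case-folding data (`code; status; mapping; # name`) and keeps only entries with status `C` (common) and `F` (full). The embedded sample is a tiny hand-picked excerpt, and the function names are hypothetical.

```python
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0045; C; 0065; # LATIN CAPITAL LETTER E
004D; C; 006D; # LATIN CAPITAL LETTER M
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_full_folding(text: str) -> dict:
    """Build a char -> folded-string table from C and F entries only."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        if status in ("C", "F"):  # skip S (simple) and T (Turkic) entries
            table[chr(int(code, 16))] = "".join(
                chr(int(cp, 16)) for cp in mapping.split()
            )
    return table


def fold(s: str, table: dict) -> str:
    """Replace each character by its folding; unmapped characters stay as-is."""
    return "".join(table.get(ch, ch) for ch in s)


table = load_full_folding(SAMPLE)
print(fold("MASSE", table) == fold("Maße", table))  # True
```

A real implementation would load the complete data file rather than a seven-line excerpt, and would still need the normalization step discussed earlier in the thread.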
Re: Upper and Lower Case Mapping for Unicode
Posted: Sun Dec 04, 2022 8:28 pm
by Sicro
Yes, case folding is usually applied when comparing two strings. In this thread, you just included a simple variant in your code, and that's totally OK, because this thread is actually about case mapping, and case folding is a different topic. So it's all good. I just wanted to briefly mention how it's usually done. I've sent you a PM so it doesn't get too off-topic here.