Quirky Sorting with the Sort_NoCase flag in Russian & Beyond


Post by staringfrog

Reposted from the Russian forum (http://purebasic.info/phpBB3ex/viewtopic.php?f=1&t=3896), where several users reproduced it on various machines and PB builds.

Sorting English strings with the #PB_Sort_NoCase flag (in both ASCII and Unicode modes) gives:
AAA
aaa
asdfre
ASDFRE
BUT
but
fghj
FGHJ


while sorting Russian strings with the #PB_Sort_NoCase flag (in both ASCII and Unicode modes) gives this:
ААА
АПРОЛДЖЭ
БЮ
ЙЦУКЕН
ааа
апролджэ
бю
йцукен


That is, all the uppercase strings come first (in alphabetical order), followed by all the lowercase ones.

In other words, #PB_Sort_NoCase has no effect at all on Cyrillic strings.

Or are we missing something crucial in our code?
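
One way to narrow this down is to check whether a case-insensitive comparison treats the Cyrillic pair as equal at all. The sketch below assumes that CompareMemoryString() with #PB_String_NoCase behaves like the comparison used by the sort, which we have not verified; the variable names are ours.

```purebasic
EnableExplicit

; Two pairs that differ only in case (variable names are just for this test)
Define lat1.s = "aaa"
Define lat2.s = "AAA"
Define cyr1.s = "ааа"
Define cyr2.s = "ААА"

; Latin pair: expected to compare as equal when case is ignored
If CompareMemoryString(@lat1, @lat2, #PB_String_NoCase) = #PB_String_Equal
   Debug "Latin: equal when ignoring case"
Else
   Debug "Latin: NOT equal when ignoring case"
EndIf

; Cyrillic pair: the interesting case
If CompareMemoryString(@cyr1, @cyr2, #PB_String_NoCase) = #PB_String_Equal
   Debug "Cyrillic: equal when ignoring case"
Else
   Debug "Cyrillic: NOT equal when ignoring case"
EndIf
```

If the Latin pair compares as equal and the Cyrillic pair does not, that would suggest the case folding only covers a-z.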

We also wonder what the rationale is behind the fluctuating order of strings that differ only in case (AAA, aaa, bbb, BBB, CCC, ccc, etc.; see the first listing above). We see this when sorting Latin and Cyrillic strings alike. For Russian, it shows up in the second debug output, where the structured array is sorted by its upper-cased key field (see the snippet below).

It looks as if PB can never decide which case should go first, sob or SOB 8) (Longman puts sob first). How about settling on some conventional order? After all, NoCase doesn't mean NoOrder.
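
Until the order is settled, a possible workaround is to sort on a composite key, so that strings differing only in case always come out in the same order. This is a minimal sketch, assuming that LCase() folds Cyrillic letters in your build and that the case-sensitive sort compares character codes; the SortItem structure and its key field are made up for this example.

```purebasic
EnableExplicit

; Hypothetical structure: 'key' exists only to drive the sort
Structure SortItem
   name.s
   key.s
EndStructure

Define i.i
Dim item.SortItem(3)
item(0)\name = "бю"
item(1)\name = "ААА"
item(2)\name = "БЮ"
item(3)\name = "ааа"

; key = lower-cased name + separator + original name.
; The lower-cased part gives the case-insensitive order;
; the original name after Chr(1) breaks ties the same way every time.
For i = 0 To ArraySize(item())
   item(i)\key = LCase(item(i)\name) + Chr(1) + item(i)\name
Next

; Plain case-sensitive sort on the composite key
SortStructuredArray(item(), #PB_Sort_Ascending, OffsetOf(SortItem\key), #PB_String)

For i = 0 To ArraySize(item())
   Debug item(i)\name
Next
```

If the comparison really is by character code, the uppercase variant of each pair should come first, since uppercase letters have lower codes than their lowercase counterparts in both ASCII and Unicode.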

Code:

EnableExplicit

Structure FF
   name.s
   uname.s
EndStructure

Define i.i
Define Dim s.s(7), Dim fs.ff(7)
; English test data (uncomment these and comment out the Russian block to test Latin strings)
; s(0) = "fghj"
; s(1) = "FGHJ"
; s(2) = "asdfre"
; s(3) = "ASDFRE"
; s(4) = "BUT"
; s(5) = "but"
; s(6) = "AAA"
; s(7) = "aaa"
; Russian test data
s(0) = "йцукен"
s(1) = "ЙЦУКЕН"
s(2) = "апролджэ"
s(3) = "АПРОЛДЖЭ"
s(4) = "БЮ"
s(5) = "бю"
s(6) = "ААА"
s(7) = "ааа"
For i=0 To ArraySize(fs())
   fs(i)\name = s(i)
   fs(i)\uname = UCase(s(i))
Next
; First test: case-insensitive sort of the plain string array
SortArray(s(),#PB_Sort_Ascending|#PB_Sort_NoCase)
Debug s(0)
Debug s(1)
Debug s(2)
Debug s(3)
Debug s(4)
Debug s(5)
Debug s(6)
Debug s(7)
Debug ""
; Second test: case-sensitive sort of the structured array by its upper-cased key field
SortStructuredArray(fs(),#PB_Sort_Ascending,OffsetOf(FF\uname),#PB_String)
Debug fs(0)\name
Debug fs(1)\name
Debug fs(2)\name
Debug fs(3)\name
Debug fs(4)\name
Debug fs(5)\name
Debug fs(6)\name
Debug fs(7)\name
Debug ""
P.S. Also reported in the Windows Bugs section.