Sort using natural instead of ASCII
Posted: Wed Aug 17, 2016 2:59 pm
by marcoagpinto
Hello!
The way PB sorts arrays isn't very good.
It uses ASCII order instead of natural order, which means that all words with accents are moved to the end of the list.
There should be a special flag for the sorting command: #NATURAL or #LATIN
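To illustrate the effect, here is a minimal sketch in Python (not PureBasic), assuming the sort compares raw character codes as described above. The word list is made up for the example.

```python
# Plain sorting compares character codes, so "á" (225) and "é" (233)
# land after every unaccented ASCII letter, including "z" (122).
words = ["zebra", "água", "banana", "école", "dado"]

by_code_point = sorted(words)
print(by_code_point)
```

The accented words end up after "zebra", although a reader would expect "água" near the start of the list.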
Re: Sort using natural instead of ASCII
Posted: Thu Aug 18, 2016 10:42 pm
by Andre
As a workaround I'm currently using an additional field in the structured list or array that should be sorted.
Into this extra structure element I copy the field contents to sort (e.g. names), but convert the "special chars" first.
If I did it by hand, the first 4-5 chars would mostly be enough for sorting...
But anyway, native support would be good!

Re: Sort using natural instead of ASCII
Posted: Fri Aug 19, 2016 10:08 am
by wilbert
marcoagpinto wrote:There should be a special flag for the sorting command: #NATURAL or #LATIN
There is no single natural order. Characters can have a different order depending on the language that is used. To implement this you would need to specify what language you are using.
Re: Sort using natural instead of ASCII
Posted: Fri Aug 19, 2016 10:40 am
by acreis
Doesn't Unicode solve this issue?
Re: Sort using natural instead of ASCII
Posted: Fri Aug 19, 2016 2:23 pm
by helpy
wilbert wrote:There is no single natural order. Characters can have a different order depending on the language that is used. To implement this you would need to specify what language you are using.
Yes! This is right!
More information:
==> http://site.icu-project.org/
==> https://en.wikipedia.org/wiki/Unicode_c ... _algorithm
==> https://en.wikipedia.org/wiki/European_ordering_rules
acreis wrote:Does not unicode solve this issue?
Not if you simply order the characters by their Unicode value!
The order of characters is different depending on the language!
==> see the links mentioned above!
But Unicode defines more than the characters and their corresponding code values!
The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which defines a customizable method to compare two strings. These comparisons can then be used to collate or sort text in any writing system and language that can be represented with Unicode.
Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This datafile specifies the default collation ordering. The DUCET is customizable for different languages. Some such customisations can be found in Common Locale Data Repository (CLDR).
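As a toy illustration of why the language matters, here is a sketch in Python. It is NOT the UCA or the DUCET: each "language" is just a hand-written alphabet string, and the names `GERMAN_LIKE`, `SWEDISH_LIKE` and `make_key` are invented for this example. A real program would use a collation library such as ICU.

```python
# Toy illustration of language-specific ordering (not the real UCA/DUCET).
# Assumption: each "language" is reduced to a simple alphabet string.
GERMAN_LIKE = "aäbcdefghijklmnoöpqrstuüvwxyz"   # umlauts next to their base letters
SWEDISH_LIKE = "abcdefghijklmnopqrstuvwxyzåäö"  # å, ä, ö come after z

def make_key(alphabet):
    # Map every character to its position in the given alphabet.
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Characters missing from the alphabet sort after it, by code point.
        return [rank.get(ch, len(alphabet) + ord(ch)) for ch in word.lower()]

    return key

words = ["zon", "äpple", "apa"]
german_order = sorted(words, key=make_key(GERMAN_LIKE))
swedish_order = sorted(words, key=make_key(SWEDISH_LIKE))
print(german_order)
print(swedish_order)
```

The same three words come out in a different order for each alphabet, which is the point made above: a single #NATURAL flag could not cover both cases.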
Re: Sort using natural instead of ASCII
Posted: Fri Aug 19, 2016 4:52 pm
by Derren
You never know if "-" will get sorted before or after "A".
I don't know about PB, but in PHP the natural order is mainly used for numbers. Maybe PB does that correctly already, but I fear that if it did, there would be no request.
Unsorted: 4, 1, 7, 11
Sorted: 1, 11, 4, 7
Naturally Sorted: 1, 4, 7, 11
Also: g, c, A, b, C
Sorted: A, C, b, c, g
Naturally Sorted: A, b, (c, C OR C, c), g
How you would sort this, I have no clue. But I never had to deal with things like that, either.
Oh well, helpy's 2nd link explains that at least. So what's the big issue?
a, c, α, δ, p, π
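The number and letter examples above can be reproduced with a short sketch in Python (not PureBasic or PHP). The helper name `natural_key` is made up here, and folding case is only one of several possible ways to handle upper and lower case.

```python
import re

def natural_key(text):
    # Split into alternating non-digit / digit runs. Because of the capture
    # group, odd positions always hold the digit runs, so they are compared
    # as integers; the other runs are compared case-insensitively.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

numbers = ["4", "1", "7", "11"]
plain_numbers = sorted(numbers)                     # string comparison
natural_numbers = sorted(numbers, key=natural_key)  # numeric comparison

letters = ["g", "c", "A", "b", "C"]
plain_letters = sorted(letters)
natural_letters = sorted(letters, key=natural_key)

print(plain_numbers, natural_numbers)
print(plain_letters, natural_letters)
```

Since the sort is stable, "c" and "C" keep their input order relative to each other, which matches the "c, C OR C, c" ambiguity.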
Re: Sort using natural instead of ASCII
Posted: Sat Aug 20, 2016 12:28 pm
by Andre
Seeing these replies, I think the programmer should do the job and define the contents of the "sorting field", just as I described in my previous post...!?

Re: Sort using natural instead of ASCII
Posted: Sat Aug 20, 2016 3:45 pm
by acreis
I couldn't agree more with Andre.
I think a dedicated sort field will also speed up the processing and allows a lot of tricks, such as suppressing undesirable chars.
Re: Sort using natural instead of ASCII
Posted: Tue Aug 30, 2016 8:19 am
by marcoagpinto
Andre wrote:As a workaround I'm currently using an additional field in the structured list or array that should be sorted.
Into this extra structure element I copy the field contents to sort (e.g. names), but convert the "special chars" first.
If I did it by hand, the first 4-5 chars would mostly be enough for sorting...
But anyway, native support would be good!

Andre, could you provide a code snippet for me to use?
Thanks!
Re: Sort using natural instead of ASCII
Posted: Sat Sep 17, 2016 1:29 pm
by Andre
Hi Marco,
sorry for the late reply, I'm just back from 3 weeks holiday with my family...
In the meantime this discussion with further code snippets is already ongoing here:
http://www.purebasic.fr/english/viewtop ... 13&t=66601
Unfortunately I can't provide a simple working snippet. The complete structure design, loading and converting of data (names with special chars), sorting, etc. is completely integrated in a big project...
A short description looks like this:
- build a structure including all needed fields (name with special chars, name with converted special chars, other data fields)
- build a linked list using this structure
- load the needed data from disk into this linked list (all data is loaded 1:1 from disk, except the "name converted" field, which is filled programmatically using a conversion routine that replaces all special chars with regular ASCII chars)
- do a SortStructuredList() by using this "name converted" field
That's it.
Using arrays is probably faster (not tested), but the way described above is fast enough for me, as I prefer to have all data in one structured list...
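The steps described above might look roughly like this. This is a sketch in Python, not the PureBasic code from the project: the `Person` layout, the `city` field and the `convert_special_chars` routine are all assumptions made for the example, and the lowercasing is an extra that was not part of the description.

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Person:                  # hypothetical record layout
    name: str                  # original name with special chars
    name_converted: str = ""   # sort field, filled programmatically
    city: str = ""             # stand-in for the "other data fields"

def convert_special_chars(text):
    # Decompose accented letters (é -> e + combining accent), drop the
    # combining marks and lowercase the result. Letters without a
    # decomposition (e.g. ß, ø) pass through unchanged in this sketch.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

# Stand-in for "load the needed data from disk".
people = [Person("Zoë"), Person("Álvaro"), Person("Andre"), Person("Émile")]

# Fill the extra field, then sort by it instead of by the original name.
for person in people:
    person.name_converted = convert_special_chars(person.name)
people.sort(key=lambda person: person.name_converted)

sorted_names = [person.name for person in people]
print(sorted_names)
```

The original names stay untouched for display; only the converted field decides the order.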
Re: Sort using natural instead of ASCII
Posted: Fri Sep 23, 2016 6:13 pm
by mk-soft
Perhaps "SetSortCallback(@SortProc())"
Procedure SortProc(*Data1, *Data2)
...
ProcedureReturn #True or #False
EndProcedure
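For comparison, this is how a comparison callback of that kind works in Python, where it already exists as `functools.cmp_to_key`. It is only an analogy: SetSortCallback() is a suggestion, not an existing PB command, and the Python comparator returns a negative, zero or positive number rather than #True or #False. The function name `sort_proc` is chosen here to mirror the suggestion.

```python
from functools import cmp_to_key

def sort_proc(a, b):
    # User-defined comparison: negative if a sorts first, positive if b
    # sorts first, zero if they are equal. Here: ignore upper/lower case.
    key_a, key_b = a.casefold(), b.casefold()
    return (key_a > key_b) - (key_a < key_b)

items = ["pear", "Apple", "banana"]
result = sorted(items, key=cmp_to_key(sort_proc))
print(result)
```

Any ordering rule, including a language-specific one, could be placed inside such a callback without changing the sort command itself.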