Sort using natural instead of ASCII

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
marcoagpinto
Addict
Posts: 1045
Joined: Sun Mar 10, 2013 3:01 pm
Location: Portugal

Post by marcoagpinto

Hello!

The way PB sorts arrays isn't very good for text with accents.

It sorts by character code (ASCII order) instead of natural order, which means that all words starting with accented letters are moved to the end of the list :(

There should be a special flag for the sorting command: #NATURAL or #LATIN.
Andre
PureBasic Team
Posts: 2137
Joined: Fri Apr 25, 2003 6:14 pm
Location: Germany (Saxony, Deutscheinsiedel)

Post by Andre

As a workaround, I currently add an extra field to the structured list or array that is to be sorted.
Into this extra structure field I copy the contents to sort by (e.g. names), converting the "special chars" first.
Or, if I fill it in by hand, the first 4-5 chars are usually enough for sorting...

But anyway, native support would be good! :mrgreen:
Bye,
...André
(PureBasicTeam::Docs & Support - PureArea.net | Order:: PureBasic | PureVisionXP)
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert

marcoagpinto wrote: There should be a special flag for the sorting command: #NATURAL or #LATIN
There is no single natural order. Characters can sort differently depending on the language in use. To implement this, you would need to specify which language you are using.
Windows (x64)
Raspberry Pi OS (Arm64)
acreis
Enthusiast
Posts: 204
Joined: Fri Jun 01, 2012 12:20 am

Post by acreis

Doesn't Unicode solve this issue?
helpy
Enthusiast
Posts: 552
Joined: Sat Jun 28, 2003 12:01 am

Post by helpy

wilbert wrote: There is no single natural order. Characters can have different order depending on the language that is used. To implement this you would need to specify what language you are using.
Yes! That's right!

More information:
==> http://site.icu-project.org/
==> https://en.wikipedia.org/wiki/Unicode_c ... _algorithm
==> https://en.wikipedia.org/wiki/European_ordering_rules
acreis wrote: Does not unicode solve this issue?
Not if you simply sort by Unicode code point value!
The correct order of characters depends on the language!
==> see the links above!

But Unicode defines more than just the characters and their code points!
The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which defines a customizable method to compare two strings. These comparisons can then be used to collate or sort text in any writing system and language that can be represented with Unicode.

Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This datafile specifies the default collation ordering. The DUCET is customizable for different languages. Some such customisations can be found in Common Locale Data Repository (CLDR).
Windows 10 / Windows 7
PB Last Final / Last Beta Testing
Derren
Enthusiast
Posts: 316
Joined: Sat Jul 23, 2011 1:13 am
Location: Germany

Post by Derren

You never know if "-" will get sorted before or after "A".
I don't know about PB, but in PHP the natural order is mainly used for numbers. Maybe PB does that correctly already, but I fear that if it did, there would be no request.

Unsorted: 4, 1, 7, 11
Sorted: 1, 11, 4, 7
Naturally Sorted: 1, 4, 7, 11
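
For the purely numeric case above, one possible workaround is to sort by a zero-padded copy of each string. This is only a sketch: the structure `Item`, the procedure `AddItem` and the key width of 10 are assumptions made for illustration, and it only handles strings that consist of digits and are no longer than that width.

```purebasic
; Sketch: sort numeric strings in natural order via a zero-padded sort key.
; Item, AddItem and the width of 10 are illustrative assumptions.
EnableExplicit

Structure Item
  Text.s     ; value as displayed, e.g. "11"
  SortKey.s  ; zero-padded copy, e.g. "0000000011"
EndStructure

Procedure AddItem(List Items.Item(), Text.s)
  AddElement(Items())
  Items()\Text = Text
  ; RSet right-aligns the text and pads it on the left with "0"
  Items()\SortKey = RSet(Text, 10, "0")
EndProcedure

NewList Items.Item()
AddItem(Items(), "4")
AddItem(Items(), "1")
AddItem(Items(), "7")
AddItem(Items(), "11")

; Sort by the padded key instead of the displayed text
SortStructuredList(Items(), #PB_Sort_Ascending, OffsetOf(Item\SortKey), TypeOf(Item\SortKey))

ForEach Items()
  Debug Items()\Text   ; 1, 4, 7, 11
Next
```

Mixed strings such as "file10.txt" would need each run of digits padded separately, which this sketch does not attempt.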

Also: g, c, A, b, C
Sorted: A, C, b, c, g
Naturally Sorted: A, b, (c, C OR C, c), g

How you would sort this, I have no clue. But I never had to deal with things like that, either.
Oh well, helpy's 2nd link explains that, at least. So what's the big issue?
a, c, α δ, p, π
Andre
PureBasic Team

Post by Andre

Seeing these replies, I think the programmer should do the job and define the contents of the "sorting field", just like I described in my previous post...!? :D
acreis
Enthusiast

Post by acreis

I couldn't agree more with Andre.

I think a dedicated sort field will also speed up processing and allow a lot of tricks, such as suppressing undesirable characters.
marcoagpinto
Addict

Post by marcoagpinto

Andre wrote: As a workaround, I currently add an extra field to the structured list or array that is to be sorted.
Into this extra structure field I copy the contents to sort by (e.g. names), converting the "special chars" first.
Or, if I fill it in by hand, the first 4-5 chars are usually enough for sorting...

But anyway, native support would be good! :mrgreen:
Andre, could you provide a code snippet for me to use?

Thanks!
Andre
PureBasic Team

Post by Andre

Hi Marco,

Sorry for the late reply, I'm just back from 3 weeks' holiday with my family... :mrgreen:

In the meantime this discussion with further code snippets is already ongoing here: http://www.purebasic.fr/english/viewtop ... 13&t=66601

Unfortunately I can't provide a simple working snippet. The complete structure design, loading and converting of data (names with special chars), sorting, etc. is fully integrated into a big project...

A short description looks like this:
- build a structure including all needed fields (name with special chars, name with converted special chars, other data fields)
- build a linked list using this structure
- load the needed data from disk into this linked list (all data is loaded 1:1 from disk, except that the "name converted" field is filled programmatically by a conversion routine that replaces all special chars with regular ASCII chars)
- do a SortStructuredList() using this "name converted" field

That's it :D

Using arrays is probably faster (not tested), but the way described above is fast enough for me, as I prefer to have all data in one structured list...
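
The four steps described above could be sketched roughly as follows. This is not the code from the project mentioned in the post: the structure `Person`, the procedures `MakeSortKey` and `AddPerson`, the sample names and the small conversion table are all assumptions made for illustration, and the names are added in code instead of being loaded from disk. The sketch also assumes the source file is saved as UTF-8 so the accented literals survive.

```purebasic
; Sketch only: Person, MakeSortKey, AddPerson and the conversion table
; are illustrative assumptions, not the original project code.
EnableExplicit

Structure Person
  Name.s     ; name as loaded, with special chars
  SortKey.s  ; "name converted": special chars replaced by plain ASCII
EndStructure

; Hypothetical conversion routine: lower-cases the text, then replaces a
; few accented letters. Extend both tables for the languages you need.
Procedure.s MakeSortKey(Text.s)
  Protected Accented.s = "áàâãäéèêëíìîïóòôõöúùûüç"
  Protected Plain.s    = "aaaaaeeeeiiiiooooouuuuc"
  Protected Key.s = LCase(Text)
  Protected i
  For i = 1 To Len(Accented)
    Key = ReplaceString(Key, Mid(Accented, i, 1), Mid(Plain, i, 1))
  Next
  ProcedureReturn Key
EndProcedure

Procedure AddPerson(List People.Person(), Name.s)
  AddElement(People())
  People()\Name = Name                  ; kept 1:1
  People()\SortKey = MakeSortKey(Name)  ; filled programmatically
EndProcedure

NewList People.Person()
AddPerson(People(), "Zacarias")
AddPerson(People(), "Álvaro")
AddPerson(People(), "Bruno")
AddPerson(People(), "Érica")

; Sort by the converted field, display the original field
SortStructuredList(People(), #PB_Sort_Ascending, OffsetOf(Person\SortKey), TypeOf(Person\SortKey))

ForEach People()
  Debug People()\Name   ; Álvaro, Bruno, Érica, Zacarias
Next
```

Because accented letters are mapped onto their plain counterparts, two names that differ only by an accent get the same key, and their relative order is then not defined by this sketch. As noted earlier in the thread, the right mapping also depends on the language.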
mk-soft
Always Here
Posts: 6209
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Post by mk-soft

Perhaps a new command like SetSortCallback(@SortProc()):

Procedure SortProc(*Data1, *Data2)
  ; ...
  ProcedureReturn #True ; or #False
EndProcedure
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive