Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 11:35 am
by marcoagpinto
Hello!

Andre hasn't replied so far to my topic:
http://www.purebasic.fr/english/viewtop ... =3&t=66401

Could someone with the knowledge provide a code snippet?

Thanks!

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 3:44 pm
by Thunder93
Natural Order String Comparison - http://sourcefrog.net/projects/natsort/


Is this what you're looking for? Something you want to be able to do with PB, right? :wink:

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 4:20 pm
by marcoagpinto
Thunder93 wrote: Natural Order String Comparison - http://sourcefrog.net/projects/natsort/


Is this what you're looking for? Something you want to be able to do with PB, right? :wink:
Not really... It is for a Hunspell project. I want to be able to sort words with accents (àáéèú, etc.) so that they appear in their natural position and not at the bottom of the array.

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 4:29 pm
by Thunder93
I should have left my original post... It only works on the first character, but it might give you some ideas.

Code:

0: !
1: 1 Hoo
2: 2 Hoo
3: [
4: ]
5: `
6: Cat
7: Dog
8: Micky
9: z1
10: z11
11: Zoo
12: È

Mark1: 0 ( ! )
Mark2: 6 ( Cat )
Mark3: 12 ( È )

----
!
1 Hoo
2 Hoo
[
]
`
È
èè
ó
Cat
Dog
Micky
z1
z11
Zoo

Code:

Dim MyArray.s(14) : Dim MyArray2.s(14)

MyArray(0) = "Micky"
MyArray(1) = "Zoo"
MyArray(2) = "Dog"
MyArray(3) = "Cat"
MyArray(4) = "1 Hoo"
MyArray(5) = "2 Hoo"
MyArray(6) = "z1"
MyArray(7) = "z11"
MyArray(8) = "`"
MyArray(9) = "ó"
MyArray(10) = "È"
MyArray(11) = "èè"
MyArray(12) = "["
MyArray(13) = "!"
MyArray(14) = "]"


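SortArray(MyArray(), #PB_Sort_Ascending|#PB_Sort_NoCase)

; Mark1 = first entry starting with a symbol or digit, Mark2 = first entry starting
; with an ASCII letter, Mark3 = first entry starting with an accented letter
Num.l : kk.l : Mark1.l = -1 : Mark2.l = -1 : Mark3.l = -1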

For k=0 To ArraySize(MyArray())
  Debug Str(k)+": "+MyArray(k)
  Num = Asc(Mid(MyArray(k), 1, 1))
  
  If (Mark1 = -1) And (Num > 32 And Num < 65)
    Mark1 = k
  ElseIf (Mark2 = -1) And Not (Num > 90 And Num < 97) And (Num > 64 And Num < 123)
    Mark2 = k
  ElseIf Num > 191 And Num < 256
    Mark3 = k
    Break
  EndIf
Next

Debug ""
Debug "Mark1: "+Str(Mark1)+" ( "+MyArray(Mark1)+" )"
Debug "Mark2: "+Str(Mark2)+" ( "+MyArray(Mark2)+" )"
Debug "Mark3: "+Str(Mark3)+" ( "+MyArray(Mark3)+" )"
Debug ""

For k=Mark1 To Mark2-1
  MyArray2(kk) = MyArray(k)
  kk+1
Next

For k=Mark3 To ArraySize(MyArray())
  MyArray2(kk) = MyArray(k)
  
  If MyArray2(kk) = "" : Break : EndIf
  kk+1
Next

For k=Mark2 To Mark3-1
  MyArray2(kk) = MyArray(k)
  kk+1
Next

CopyArray(MyArray2(), MyArray())
FreeArray(MyArray2())


Debug "----"
For k=0 To ArraySize(MyArray())
  Debug MyArray(k)
Next

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 5:19 pm
by marcoagpinto
I had a crazy idea:

I have a main array with the words:
main_array$(0)="até"
main_array$(1)="atão"
main_array$(2)="Luís"
etc.

and a blank secondary array:
secondary_array$(0)=""
secondary_array$(1)=""
secondary_array$(2)=""

What if I do a loop:
For f=0 To 2
  t$=main_array$(f)
  ; Here I replace all accented letters with their unaccented equivalents (using ReplaceString, one line per accent) and make t$ lowercase,
  ; then I append the current position to t$ after a Chr(9):
  t$+Chr(9)+Str(f)
  secondary_array$(f)=t$
Next f


Then I use SortArray on the secondary array.

Then I loop over the secondary array, replacing each entry with the text from main_array at the stored position:

For f=0 To 2
  secondary_array$(f)=main_array$(Val(StringField(secondary_array$(f),2,Chr(9))))
Next f

Then I just copy from the secondary array to the main.

Does this sound viable?

It was the best I could think of.

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 5:54 pm
by wilbert
You could also use the API procedure qsort, which takes a callback, and write your own compare procedure.

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 7:02 pm
by wilbert
Here's an example using qsort.
You may need to extend the lookup table to cover Unicode values between 256 and 591 (Latin Extended-A and Latin Extended-B).

Code:

; import qsort procedure
ImportC ""
  qsort(*base, num, size, *comparator)
EndImport 

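; lookup table for characters 64..255 (192 entries):
; folds A-Z to a-z and maps accented Latin-1 letters to their base letter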
DataSection
  LUT_Compare:
  Data.u 64,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,
         112,113,114,115,116,117,118,119,120,121,122,91,92,93,94,95,
         96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,
         112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,
         128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
         144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
         160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
         176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
         97,97,97,97,97,97,230,99,101,101,101,101,105,105,105,105,
         240,110,111,111,111,111,111,215,248,117,117,117,117,121,254,223,
         97,97,97,97,97,97,230,99,101,101,101,101,105,105,105,105,
         240,110,111,111,111,111,111,247,248,117,117,117,117,121,254,121
EndDataSection

; create global lookup array
Global Dim LUT_Compare.u(191)
CopyMemory(?LUT_Compare, @LUT_Compare(), 384)

; compare procedure
ProcedureC.i Compare(*s1.String, *s2.String)
  
  Protected c1.c, c2.c, result.i
  Protected *c1.Character = @*s1\s
  Protected *c2.Character = @*s2\s
  
  If *c1 = 0
    If *c2 = 0
      ProcedureReturn 0   ; Both pointers 0
    Else
      ProcedureReturn -1  ; First pointer 0, second not
    EndIf
  ElseIf *c2 = 0
    ProcedureReturn 1     ; Second pointer 0, first not
  Else
    
    ; Both valid strings so compare
    Repeat
      c1 = *c1\c : *c1 + SizeOf(Character)
      c2 = *c2\c : *c2 + SizeOf(Character)
      ; The table holds 192 entries for characters 64..255, so map that whole range
      If c1 >= 64 And c1 < 256 : c1 = LUT_Compare(c1 - 64) : EndIf
      If c2 >= 64 And c2 < 256 : c2 = LUT_Compare(c2 - 64) : EndIf
      result = c1 - c2
    Until result Or c1 = 0
    ProcedureReturn result
    
  EndIf
  
EndProcedure


; array to sort
Dim values.s(2)
values(0)="até"
values(1)="atão"
values(2)="Luís"

; perform sort
qsort(@values(), ArraySize(values()) + 1, SizeOf(String), @Compare())

; present result
For i = 0 To 2
  Debug values(i)
Next

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 7:08 pm
by normeus
I would use @wilbert's solution (even though I haven't looked at it, it is wilbert's code, so it should be good),
but here is a simplified method that can be used to understand what's happening:

Code:

Structure toSort
  Name$
  noaccent$
EndStructure

Dim names.tosort (2)
names(0)\Name$="Alemen"
names(1)\Name$="alemon"
names(2)\Name$="alemán"


Procedure createfield(Array names.tosort(1))
  ; create a string with all names replace accent then create second array
  Protected k,temp$="",pipe.s="|"
  If ArraySize(names())
    temp$=names(0)\Name$
    For k = 1 To ArraySize(names())
      temp$ = temp$+ pipe +names(k)\Name$ ;pipe to separate
    Next
    ;Long list of accents to replace so you can sort the way you like 
    temp$= ReplaceString(temp$, "á", "a", #PB_String_NoCase)
    temp$= ReplaceString(temp$, "å", "a", #PB_String_NoCase)
    ;//etc...
    ;now populate 2nd array without accents
    For k = 0 To ArraySize(names())
      names(k)\noaccent$=StringField(temp$,k+1,pipe)
    Next
  EndIf
  temp$=""
EndProcedure

createfield(names())
SortStructuredArray(names(), #PB_Sort_NoCase, OffsetOf(tosort\noaccent$) ,TypeOf(tosort\noaccent$))


For k = 0 To ArraySize(names())
  Debug names(k)\Name$
Next

Norm.

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 8:26 pm
by marcoagpinto
I have managed to make it work!

Code:

Dim main_array$(2)
main_array$(0)="Alemen"
main_array$(1)="alemon"
main_array$(2)="alemán"


Dim secondary_array$(2)
secondary_array$(0)=""
secondary_array$(1)=""
secondary_array$(2)=""


For f=0 To 2
  t$=LCase(main_array$(f))
  ReplaceString(t$,"á","a",#PB_String_InPlace)
  t$+Chr(9)+Str(f)
  secondary_array$(f)=t$
Next f


SortArray(secondary_array$(),#PB_Sort_Ascending|#PB_Sort_NoCase)


For f=0 To 2
  secondary_array$(f)=main_array$(Val(StringField(secondary_array$(f),2,Chr(9))))
Next f


CopyArray(secondary_array$(),main_array$())


For f=0 To 2
  Debug main_array$(f)
Next f
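
The same approach can be wrapped in a reusable procedure. What follows is only a minimal sketch: the names FoldAccents and SortFolded are hypothetical, the accent list is a partial sample to extend as needed, and it assumes the words never contain a tab character (which is used as the separator) and that the source file is saved as UTF-8.

```purebasic
; Hypothetical helper: builds a lowercase sort key with a few accents folded.
; The from$/to$ lists are a partial sample, not a complete table.
Procedure.s FoldAccents(text$)
  Protected from$ = "àáâãäåçèéêëìíîïñòóôõöùúûüý"
  Protected to$   = "aaaaaaceeeeiiiinooooouuuuy"
  Protected i, key$ = LCase(text$)
  For i = 1 To Len(from$)
    key$ = ReplaceString(key$, Mid(from$, i, 1), Mid(to$, i, 1))
  Next
  ProcedureReturn key$
EndProcedure

; Hypothetical helper: sorts words() by folded key, keeping the original spelling.
Procedure SortFolded(Array words.s(1))
  Protected i, last = ArraySize(words())
  If last < 0 : ProcedureReturn : EndIf
  Protected Dim keys.s(last)
  Protected Dim sorted.s(last)
  
  ; Key = folded word + tab + original index
  For i = 0 To last
    keys(i) = FoldAccents(words(i)) + Chr(9) + Str(i)
  Next
  SortArray(keys(), #PB_Sort_Ascending)
  
  ; Use the stored index to fetch the original word in sorted order
  For i = 0 To last
    sorted(i) = words(Val(StringField(keys(i), 2, Chr(9))))
  Next
  For i = 0 To last
    words(i) = sorted(i)
  Next
EndProcedure

Dim words.s(2)
words(0) = "Alemen" : words(1) = "alemon" : words(2) = "alemán"
SortFolded(words())
For i = 0 To ArraySize(words()) : Debug words(i) : Next
```

With the three sample words, the keys sort as aleman, alemen, alemon, so the list should come out as alemán, Alemen, alemon.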


Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 8:31 pm
by wilbert
This seems to work on Windows ...

Code:

; import qsort procedure
ImportC ""
  qsort(*base, num, size, *comparator)
EndImport 

; compare procedure
ProcedureC.i Compare(*s1.String, *s2.String)
  ; Compare using LOCALE_USER_DEFAULT
  ProcedureReturn CompareString_($400, #NORM_IGNORECASE, @*s1\s, -1, @*s2\s, -1) - 2
EndProcedure

; array to sort
Dim values.s(2)
values(0)="Alemen"
values(1)="alemon"
values(2)="alemán"

; perform sort
qsort(@values(), ArraySize(values()) + 1, SizeOf(String), @Compare())

; present result
For i = 0 To 2
  Debug values(i)
Next

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 8:46 pm
by marcoagpinto
Thanks for all the comments :mrgreen:

But I guess my approach will make it work on the three platforms.

I still think the best solution would be for Freddy to add a #LATIN flag to the sort command, as that would avoid complex coding.

Re: Sort using natural instead of ASCII

Posted: Thu Sep 15, 2016 9:29 pm
by Thunder93
Wilbert's example sorts the list as follows.

To:
résu1
résu11
résu2
résu22
resume
résumé
Résumé
resumes
Resumes
résumés

From:
resumes
resume
résumés
Resumes
Résumé
résumé
résu1
résu11
résu2
résu22


Not sure where, but this morning I was reading something about résu2 coming before résu11. That must be another type of sorting. :?

Re: Sort using natural instead of ASCII

Posted: Wed Sep 13, 2017 2:35 pm
by Little John
Thunder93 wrote: Not sure where, but this morning I was reading something about résu2 coming before résu11. That must be another type of sorting. :?
Yes, it's another type, and it is called natural sort order.
What marcoagpinto is asking for is not called natural sorting or anything similar, so the title of this thread is very misleading.
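
To make the difference concrete, here is a rough sketch of a natural-order comparison, in which runs of digits are compared as numbers rather than character by character. NatIsDigit and NaturalCompare are hypothetical names, not PureBasic built-ins. The sketch is case- and accent-sensitive, and very long digit runs could overflow the quad used for the numeric value.

```purebasic
; Hypothetical helper: true if the character code is an ASCII digit
Procedure.i NatIsDigit(c)
  ProcedureReturn Bool(c >= '0' And c <= '9')
EndProcedure

; Hypothetical helper: returns a negative, zero or positive value,
; in the same way as a qsort comparator
Procedure.i NaturalCompare(a$, b$)
  Protected i = 1, j = 1, startA, startB, ca, cb
  Protected numA.q, numB.q
  Protected lenA = Len(a$), lenB = Len(b$)
  
  While i <= lenA And j <= lenB
    ca = Asc(Mid(a$, i, 1))
    cb = Asc(Mid(b$, j, 1))
    If NatIsDigit(ca) And NatIsDigit(cb)
      ; Both sides start a digit run: read each run in full and compare as numbers
      startA = i
      While i <= lenA And NatIsDigit(Asc(Mid(a$, i, 1))) : i + 1 : Wend
      startB = j
      While j <= lenB And NatIsDigit(Asc(Mid(b$, j, 1))) : j + 1 : Wend
      numA = Val(Mid(a$, startA, i - startA))
      numB = Val(Mid(b$, startB, j - startB))
      If numA <> numB
        If numA < numB : ProcedureReturn -1 : Else : ProcedureReturn 1 : EndIf
      EndIf
    Else
      ; Ordinary characters: compare by character code
      If ca <> cb : ProcedureReturn ca - cb : EndIf
      i + 1 : j + 1
    EndIf
  Wend
  
  ; One string is a prefix of the other: the shorter one sorts first
  ProcedureReturn (lenA - i) - (lenB - j)
EndProcedure

Debug NaturalCompare("résu2", "résu11")   ; negative: résu2 sorts before résu11
```

A plain character comparison gives the opposite order here, because "1" sorts before "2". A procedure like this could be called from a qsort compare callback if natural ordering is what you need.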

Re: Sort using natural instead of ASCII

Posted: Wed Sep 13, 2017 2:42 pm
by Thunder93
Thanks for your insight Little John. :wink: