Custom sort for Linked Lists

Share your advanced PureBasic knowledge/code with the community.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Custom sort for Linked Lists

Post by Little John »

Motivation

PureBasic's built-in sorting routines for arrays and lists are limited.
They cannot be used e.g. for
  • sorting lists/arrays of strings by the length of the strings
  • sorting lists/arrays of strings according to self-defined rules (there are different rules for different languages, especially with regard to special characters)
  • sorting lists/arrays of strings that contain numbers numerically (instead, they are always sorted alphabetically like other strings, so that e.g. 11 goes before 7)
  • sorting lists/arrays of structures by more than one field
  • sorting lists/arrays of arrays
  • sorting lists/arrays of pointers according to the content of their targets
Since a programming language can't have special built-in functions for all sorting requirements, it is important that it provides sorting routines where the user can define the sorting rules her/himself (by a callback function). Many other programming languages have such sorting routines. For PureBasic, there is a feature request on this.

Custom sort for arrays can be done with a trick that wilbert has demonstrated.

The module

A stable hybrid sorting algorithm is used here: Longer lists are sorted by the fastest recursive PureBasic implementation of MergeSort that I'm aware of, shorter lists are sorted by an efficient InsertionSort implementation. The used threshold between "long" and "short" can be changed at runtime. A small program for testing the speed when using different thresholds is in the 3rd post here. No data are copied at all, all sorting is done by manipulating pointers.

This module contains individual procedures for sorting lists of integers, doubles, strings, and pointers.
A problem arises with structures: PureBasic doesn't allow us to write generic procedures that work for all data types, and a module like this can't contain separate procedures for all possible structures. So the module provides the public macro SortListAny(), which internally creates a list of pointers to the structures, then sorts the pointer list and finally rearranges the original structure list according to the order of the pointers. This is still considerably faster than the in-place version of MergeSort which this module used up to version 1.10.

// edit 2019-03-03:
In the meantime, wilbert has posted nifty code that does non-recursive custom sort for Linked Lists, which is even faster than my following code.

Code: Select all

; -- Custom sort for linked lists
; <https://www.purebasic.fr/english/viewtopic.php?f=12&t=71677>

; Version 1.22, 2019-01-31
; Purebasic 5.20 LTS or newer is required because of the module.
; Cross-platform, x86 and x64, Unicode compliant.

; This module uses a hybrid sorting algorithm. For lists of a
; size bigger than the internal threshold (value of variable
; 'sInsertionSortMaxSize'), it uses MergeSort, for smaller lists
; (including those that are always produced by MergeSort as
; intermediate results), InsertionSort is used. This is
; considerably faster than using MergeSort alone. By calling
; SetInsertionSortMaxSize(), your code can change the threshold
; at runtime.
; Your code must provide a custom comparison callback function
; of type 'ProtoCompare' (see the beginning of the module, and
; examples in the demo code). If the comparison function is
; written properly, then sorting by the procedures of this
; module is stable.

; ------------------------------------------------------------------------------
; MIT License
;
; Copyright (c) 2018-2019 Jürgen Lüthje <http://luethje.eu/>
;
; Permission is hereby granted, free of charge, to any person obtaining a copy
; of this software and associated documentation files (the "Software"), to deal
; in the Software without restriction, including without limitation the rights
; to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
; copies of the Software, and to permit persons to whom the Software is
; furnished to do so, subject to the following conditions:
;
; The above copyright notice and this permission notice shall be included in all
; copies or substantial portions of the Software.
;
; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
; IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
; FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
; AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
; LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
; OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
; SOFTWARE.
; ------------------------------------------------------------------------------


DeclareModule CS
   EnableExplicit
   
   Declare.i SetInsertionSortMaxSize (insSortMaxSize.i)
   
   Declare SortListI   (List a.i(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
   Declare SortListD   (List a.d(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
   Declare SortListS   (List a.s(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
   Declare SortListPtr (List *a(),  *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
   
   
   Macro RearrangeData (_dataList_, _pointerList_)
      ; -- rearrange the elements of _dataList_ according to the order of elements in _pointerList_
      
      ForEach _pointerList_
         ChangeCurrentElement(_dataList_, _pointerList_)
         MoveElement(_dataList_, #PB_List_Last)
      Next
   EndMacro
   
   NewList *ptr()    ; * for internal use by Macro SortListAny() only *
   
   Macro SortListAny (_list_, _compare_, _mode_=#PB_Sort_Ascending, _first_=0, _last_=-1)
      ; -- sort any kind of list (indirectly by pointers)
      
      ForEach _list_
         AddElement(CS::*ptr())
         CS::*ptr() = @ _list_
      Next
      
      CS::SortListPtr(CS::*ptr(), _compare_, _mode_, _first_, _last_)
      CS::RearrangeData(_list_, CS::*ptr())
      ClearList(CS::*ptr())
   EndMacro
EndDeclareModule


Module CS
   Prototype.i ProtoCompare (*a, *b, mode.i)
   
   
   Define sInsertionSortMaxSize.i = 40
   
   Procedure.i SetInsertionSortMaxSize (insSortMaxSize.i)
      ; -- For maximum sorting speed, you might need to change the value of the
      ;    internal variable 'sInsertionSortMaxSize'.
      ; in : insSortMaxSize: new value of the internal variable 'sInsertionSortMaxSize'
      ; out: return value  : old value of the internal variable 'sInsertionSortMaxSize',
      ;                      or 0 on error
      Shared sInsertionSortMaxSize
      
      If insSortMaxSize >= 1
         Swap sInsertionSortMaxSize, insSortMaxSize
         ProcedureReturn insSortMaxSize
      Else
         ProcedureReturn 0
      EndIf      
   EndProcedure
   
   ;------------------------------------------------------------------------------------------------------
   
   Macro _Merge (_listA_, _listB_)
      ; -- merge two sorted linked lists
      ; in : _listA_, _listB_: partial lists which are already sorted (both lists are not empty)
      ; out: _listA_         : sorted list that contains all elements of _listA_ and _listB_
      
      ; -- merge both partial lists
      *curA = LastElement(_listA_)
      MergeLists(_listB_, _listA_)        ; Append _listB_ to the end of _listA_.
      *curB = NextElement(_listA_)
      
      If IsInWrongOrder(*curA, *curB, SortMode) > 0
         countA = 0
         *curA = FirstElement(_listA_)
         
         ; -- rearrange the elements in the merged list
         While countA < firstB And *curB <> #Null
            If IsInWrongOrder(*curA, *curB, SortMode) > 0
               ChangeCurrentElement(_listA_, *curB)
               *nxtB = NextElement(_listA_)
               ChangeCurrentElement(_listA_, *curB)
               MoveElement(_listA_, #PB_List_Before, *curA)
               *curB = *nxtB
            Else
               countA + 1
            EndIf
            *curA = NextElement(_listA_)  ; *curA always points to the first element which is not yet sorted.
         Wend
      EndIf
   EndMacro
   
   
   Macro _InsertionSort (_list_)
      FirstElement(_list_)
      While NextElement(_list_) <> #Null
         *curElement = @ _list_                     ; save pointer to current element
         *target = *curElement
         *lastSortedElement = PreviousElement(_list_)
         *prevElement = *lastSortedElement
         While *prevElement <> #Null And IsInWrongOrder(*prevElement, *curElement, SortMode) > 0
            *target = *prevElement
            *prevElement = PreviousElement(_list_)
         Wend
         
         ChangeCurrentElement(_list_, *curElement)
         If *target <> *curElement
            MoveElement(_list_, #PB_List_Before, *target)
            ChangeCurrentElement(_list_, *lastSortedElement)
         EndIf
      Wend
   EndMacro
   
   
   Macro _SortRange (_ProcSuffix_, _list_)
      ; -- select the part of the list that needs to be sorted
      If first = 0
         If last = -1 Or last = ListSize(_list_) - 1
            ; -- Sort all elements
            _Sort#_ProcSuffix_(_list_)
         ElseIf 0 < last And last < ListSize(_list_) - 1
            ; -- Sort leading part of the list
            SelectElement(_list_, last)
            SplitList(_list_, t(), #True)          ; Move the trailing part of _list_ to t().
            _Sort#_ProcSuffix_(_list_)             ; Sort rest of _list_.
            MergeLists(t(), _list_)                ; Move the saved trailing part back to _list_.
         EndIf
         
      ElseIf 0 < first And first < ListSize(_list_) - 1
         If last = -1 Or last = ListSize(_list_) - 1
            ; -- Sort trailing part of the list
            SelectElement(_list_, first)
            SplitList(_list_, t())                 ; Move the trailing part of _list_ to t().
            _Sort#_ProcSuffix_(t())                ; Sort t().
            MergeLists(t(), _list_)                ; Move the sorted trailing part back to _list_.
         ElseIf 0 < last And last < ListSize(_list_) - 1
            ; -- Sort middle part of the list
            SelectElement(_list_, last)
            SplitList(_list_, t(), #True)          ; Move the trailing part of _list_ to t().
            SelectElement(_list_, first)
            SplitList(_list_, m())                 ; Move the middle part of _list_ to m().
            _Sort#_ProcSuffix_(m())                ; Sort m().
            MergeLists(m(), _list_)                ; Move the sorted middle part back to _list_.
            MergeLists(t(), _list_)                ; Move the saved trailing part back to _list_.
         EndIf
      EndIf
   EndMacro
   
   
   Global IsInWrongOrder.ProtoCompare
   Global SortMode.i
   
   ;------------------------------------------------------------------------------------------------------
   
   Procedure _SortI (List a.i())
      Shared sInsertionSortMaxSize
      Protected *target
      Protected *curA, *curB, *nxtB, countA, firstB            ; for _Merge()
      Protected *curElement, *lastSortedElement, *prevElement  ; for _InsertionSort()
      Protected NewList b.i()
      
      If ListSize(a()) <= 1
         ProcedureReturn
      EndIf
      
      If ListSize(a()) > sInsertionSortMaxSize
         firstB = Int(ListSize(a()) / 2)
         SelectElement(a(), firstB)
         SplitList(a(), b())            ; Move the second half of a() to b().
         
         _SortI(a())
         _SortI(b())
         _Merge(a(), b())
         
      Else
         _InsertionSort(a())
      EndIf
   EndProcedure
   
   Procedure SortListI (List a.i(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
      ; -- Sort list a() according to the given comparison function
      ; in : a()     : List of integers to be sorted
      ;      *Compare: address of a custom comparison function of type 'ProtoCompare'
      ;      mode    : This value is just passed to the custom comparison function.
      ;      first   : index of first element to sort  (default: sort ... )
      ;      last    : index of last  element to sort  (... the whole list)
      ; out: a()     : completely or partly sorted list
      Protected NewList m.i()
      Protected NewList t.i()
      
      IsInWrongOrder = *Compare
      SortMode = mode
      _SortRange(I, a())
   EndProcedure
   
   ;------------------------------------------------------------------------------------------------------
   
   Procedure _SortD (List a.d())
      Shared sInsertionSortMaxSize
      Protected *target
      Protected *curA, *curB, *nxtB, countA, firstB            ; for _Merge()
      Protected *curElement, *lastSortedElement, *prevElement  ; for _InsertionSort()
      Protected NewList b.d()
      
      If ListSize(a()) <= 1
         ProcedureReturn
      EndIf
      
      If ListSize(a()) > sInsertionSortMaxSize
         firstB = Int(ListSize(a()) / 2)
         SelectElement(a(), firstB)
         SplitList(a(), b())            ; Move the second half of a() to b().
         
         _SortD(a())
         _SortD(b())
         _Merge(a(), b())
         
      Else
         _InsertionSort(a())
      EndIf
   EndProcedure
   
   Procedure SortListD (List a.d(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
      ; -- Sort list a() according to the given comparison function
      ; in : a()     : List of doubles to be sorted
      ;      *Compare: address of a custom comparison function of type 'ProtoCompare'
      ;      mode    : This value is just passed to the custom comparison function.
      ;      first   : index of first element to sort  (default: sort ... )
      ;      last    : index of last  element to sort  (... the whole list)
      ; out: a()     : completely or partly sorted list
      Protected NewList m.d()
      Protected NewList t.d()
      
      IsInWrongOrder = *Compare
      SortMode = mode
      _SortRange(D, a())
   EndProcedure
   
   ;------------------------------------------------------------------------------------------------------
   
   Procedure _SortS (List a.s())
      Shared sInsertionSortMaxSize
      Protected *target
      Protected *curA, *curB, *nxtB, countA, firstB            ; for _Merge()
      Protected *curElement, *lastSortedElement, *prevElement  ; for _InsertionSort()
      Protected NewList b.s()
      
      If ListSize(a()) <= 1
         ProcedureReturn
      EndIf
      
      If ListSize(a()) > sInsertionSortMaxSize
         firstB = Int(ListSize(a()) / 2)
         SelectElement(a(), firstB)
         SplitList(a(), b())            ; Move the second half of a() to b().
         
         _SortS(a())
         _SortS(b())
         _Merge(a(), b())
         
      Else
         _InsertionSort(a())
      EndIf
   EndProcedure
   
   Procedure SortListS (List a.s(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
      ; -- Sort list a() according to the given comparison function
      ; in : a()     : List of strings to be sorted
      ;      *Compare: address of a custom comparison function of type 'ProtoCompare'
      ;      mode    : This value is just passed to the custom comparison function.
      ;      first   : index of first element to sort  (default: sort ... )
      ;      last    : index of last  element to sort  (... the whole list)
      ; out: a()     : completely or partly sorted list
      Protected NewList m.s()
      Protected NewList t.s()
      
      IsInWrongOrder = *Compare
      SortMode = mode
      _SortRange(S, a())
   EndProcedure
   
   ;------------------------------------------------------------------------------------------------------
   
   Procedure _SortPtr (List *a())
      Shared sInsertionSortMaxSize
      Protected *target
      Protected *curA, *curB, *nxtB, countA, firstB            ; for _Merge()
      Protected *curElement, *lastSortedElement, *prevElement  ; for _InsertionSort()
      Protected NewList *b()
      
      If ListSize(*a()) <= 1
         ProcedureReturn
      EndIf
      
      If ListSize(*a()) > sInsertionSortMaxSize
         firstB = Int(ListSize(*a()) / 2)
         SelectElement(*a(), firstB)
         SplitList(*a(), *b())           ; Move the second half of *a() to *b().
         
         _SortPtr(*a())
         _SortPtr(*b())
         _Merge(*a(), *b())
         
      Else
         _InsertionSort(*a())
      EndIf
   EndProcedure
   
   Procedure SortListPtr (List *a(), *Compare, mode.i=#PB_Sort_Ascending, first.i=0, last.i=-1)
      ; -- Sort list *a() according to the given comparison function
      ; in : *a()    : List of pointers to be sorted according to their targets
      ;      *Compare: address of a custom comparison function of type 'ProtoCompare'
      ;      mode    : This value is just passed to the custom comparison function.
      ;      first   : index of first element to sort  (default: sort ... )
      ;      last    : index of last  element to sort  (... the whole list)
      ; out: *a()    : completely or partly sorted list
      Protected NewList m.i()
      Protected NewList t.i()
      
      IsInWrongOrder = *Compare
      SortMode = mode
      _SortRange(Ptr, *a())
   EndProcedure
   
   ;-=====================================================================================================
EndModule


CompilerIf #PB_Compiler_IsMainFile
   ; -- Module demo
   
   CompilerIf #PB_Compiler_Debugger = #False
      MessageRequester("Error",
                       "Switch Debugger on to see the demo.",
                       #PB_MessageRequester_Error)
      End
   CompilerEndIf
   
   EnableExplicit
   
   Debug "Old value of 'sInsertionSortMaxSize': " + CS::SetInsertionSortMaxSize(3)
   Debug ""
   
   Define first, last, k, prefix$
   
   Macro ShowList (_list_, _first_=99, _last_=99)
      Debug ""
      k = 0
      ForEach _list_
         If _first_ <= k And k <= _last_
            prefix$ = "> "
         Else
            prefix$ = "  "
         EndIf
         Debug Str(k) + ") " + prefix$ + _list_
         k + 1
      Next
   EndMacro
   
   ;================================================================
   
   Procedure.i CompareI (*a.Integer, *b.Integer, mode.i)
      ; -- custom comparison function of type 'ProtoCompare' (see Module CS)
      ; in : *a, *b: pointers to integers to be compared
      ;      mode  : mode of comparison:
      ;              #PB_Sort_Ascending/#PB_Sort_Descending
      ; out: return value: < 0, 0, or > 0
      
      If (mode & #PB_Sort_Descending)
         ProcedureReturn *b\i - *a\i
      Else
         ProcedureReturn *a\i - *b\i
      EndIf
   EndProcedure
   
   
   NewList i.i()
   For k = 0 To 6
      AddElement(i())
      i() = k
   Next
   
   Debug "Integers shuffled:"
   RandomizeList(i())
   ShowList(i())
   
   Debug "-----------------------------------------"
   
   Debug "Whole integer list sorted ascending:"
   CS::SortListI(i(), @ CompareI(), #PB_Sort_Ascending)
   ShowList(i(), 0, 6)
   
   Debug "-----------------------------------------"
   
   Debug "Whole integer list sorted descending:"
   CS::SortListI(i(), @ CompareI(), #PB_Sort_Descending)
   ShowList(i(), 0, 6)
   
   Debug ""
   Debug "========================================="
   Debug ""
   
   Procedure.i CompareD (*a.Double, *b.Double, mode.i)
      ; -- custom comparison function of type 'ProtoCompare' (see Module CS)
      ; in : *a, *b: pointers to doubles to be compared
      ;      mode  : mode of comparison:
      ;              #PB_Sort_Ascending/#PB_Sort_Descending
      ; out: return value: #True/#False
      
      If (mode & #PB_Sort_Descending)
         ProcedureReturn Bool(*a\d < *b\d)
      Else
         ProcedureReturn Bool(*a\d > *b\d)
      EndIf
   EndProcedure
   
   
   NewList d.d()
   For k = 0 To 6
      AddElement(d())
      d() = k * 0.4
   Next
   
   Debug "Doubles shuffled:"
   RandomizeList(d())
   ShowList(d())
   
   first = 2
   
   Debug "-----------------------------------------"
   
   Debug "Doubles from #" + first + " to last element sorted ascending:"
   CS::SortListD(d(), @ CompareD(), #PB_Sort_Ascending, first)
   ShowList(d(), first)
   
   Debug "-----------------------------------------"
   
   Debug "Doubles from #" + first + " to last element sorted descending:"
   CS::SortListD(d(), @ CompareD(), #PB_Sort_Descending, first)
   ShowList(d(), first)
   
   Debug ""
   Debug "========================================="
   Debug ""
   
   Procedure.i CompareS (*a.String, *b.String, mode.i)
      ; -- custom comparison function of type 'ProtoCompare' (see Module CS)
      ; in : *a, *b: pointers to strings to be compared
      ;      mode  : mode of comparison:
      ;              #PB_Sort_Ascending/#PB_Sort_Descending/#PB_Sort_NoCase
      ; out: return value: < 0, 0, or > 0
      Protected ret.i=0
      
      If (mode & #PB_Sort_NoCase)
         ret = CompareMemoryString(@ *a\s, @ *b\s, #PB_String_NoCase)
      Else
         ret = CompareMemoryString(@ *a\s, @ *b\s)
      EndIf
      
      If (mode & #PB_Sort_Descending)
         ProcedureReturn -ret
      Else
         ProcedureReturn ret
      EndIf
   EndProcedure
   
   
   DataSection
      StringData:
      Data.s "aaaaaaaaaa"
      Data.s "bbbbbbbbbb"
      Data.s "cccccccc"
      Data.s "dddddddd"
      Data.s "eeeeee"
      Data.s "ffffff"
      Data.s "gggggg"
      Data.s ""
   EndDataSection
   
   Procedure ReadStrings (List s.s())
      Protected temp$
      
      Restore StringData
      Read.s temp$
      While temp$
         AddElement(s())
         s() = temp$
         Read.s temp$
      Wend
   EndProcedure
   
   
   NewList s.s()
   ReadStrings(s())
   
   Debug "Strings shuffled:"
   RandomizeList(s())
   ShowList(s())
   
   first = 0 : last = 5
   
   Debug "-----------------------------------------"
   
   Debug "Strings from #" + first + " to #" + last + " sorted ascending:"
   CS::SortListS(s(), @ CompareS(), #PB_Sort_Ascending, first, last)
   ShowList(s(), first, last)
   
   Debug "-----------------------------------------"
   
   Debug "Strings from #" + first + " to #" + last + " sorted descending:"
   CS::SortListS(s(), @ CompareS(), #PB_Sort_Descending, first, last)
   ShowList(s(), first, last)
   
   Debug ""
   Debug "========================================="
   Debug ""
   
   Structure Person
      FamilyName.s
      GivenName.s
      Age.i
   EndStructure
   
   Enumeration 0 Step 4
      #ByFamilyName
      #ByGivenName
      #ByAge
   EndEnumeration
   
   Procedure.i ComparePtr (*pa.Integer, *pb.Integer, mode.i)
      ; -- custom comparison function of type 'ProtoCompare' (see Module CS)
      ; in : *pa, *pb: pointers to pointers to data to be compared
      ;      mode  : mode of comparison:
      ;              number for structure field that is used for comparison, combined
      ;              with #PB_Sort_Ascending/#PB_Sort_Descending/#PB_Sort_NoCase
      ; out: return value: #True/#False
      ;
      ; CAVE! When using '<=' instead of '<' or '>=' instead of '>' here,
      ;       sorting will work but won't be stable!
      Protected.Person *a = *pa\i, *b = *pb\i          ; dereference the pointers passed as parameters
      
      Select mode
            ; -- by family name
         Case 0                                        ; ascending
            ProcedureReturn Bool(*a\FamilyName > *b\FamilyName)
         Case 1                                        ; descending
            ProcedureReturn Bool(*a\FamilyName < *b\FamilyName)
         Case 2                                        ; ascending and case-insensitive
            ProcedureReturn Bool(UCase(*a\FamilyName) > UCase(*b\FamilyName))
         Case 3                                        ; descending and case-insensitive
            ProcedureReturn Bool(UCase(*a\FamilyName) < UCase(*b\FamilyName))
            
            ; -- by given name
         Case 4                                        ; ascending
            ProcedureReturn Bool(*a\GivenName > *b\GivenName)
         Case 5                                        ; descending
            ProcedureReturn Bool(*a\GivenName < *b\GivenName)
         Case 6                                        ; ascending and case-insensitive
            ProcedureReturn Bool(UCase(*a\GivenName) > UCase(*b\GivenName))
         Case 7                                        ; descending and case-insensitive
            ProcedureReturn Bool(UCase(*a\GivenName) < UCase(*b\GivenName))
            
            ; -- by age
         Case 8                                        ; ascending
            ProcedureReturn Bool(*a\Age > *b\Age)
         Case 9                                        ; descending
            ProcedureReturn Bool(*a\Age < *b\Age)
      EndSelect
   EndProcedure
   
   
   DataSection
      People:
      Data.s "Schmidt", "Michelle"
      Data.i 50
      Data.s "Johnson", "Jamie"
      Data.i 40
      Data.s "Schmidt", "Jürgen"
      Data.i 30
      Data.s "Miller", "Mary"
      Data.i 40
      Data.s "Parker", "Peter"
      Data.i 20
      Data.s "Johnson", "Karl"
      Data.i 30
      Data.s "Parker", "Michelle"
      Data.i 60
      Data.s ""
   EndDataSection
   
   Procedure GetData (List x.Person())
      Protected temp$
      
      Restore People
      Read.s temp$
      While temp$
         AddElement(x())
         x()\FamilyName = temp$
         Read.s x()\GivenName
         Read.i x()\Age
         Read.s temp$
      Wend
   EndProcedure
   
   Procedure ShowData (List x.Person(), first.i=99, last.i=99)
      Protected k.i, prefix$
      
      Debug ""
      k = 0
      ForEach x()
         If first <= k And k <= last
            prefix$ = "> "
         Else
            prefix$ = "  "
         EndIf
         With x()
            Debug Str(k) + ") " + prefix$ + LSet(\FamilyName + ", " + \GivenName + ", ", 20) + \Age
         EndWith
         k + 1
      Next
   EndProcedure
   
   Procedure ShowTargets (List *p(), first.i=99, last.i=99)
      Protected k.i, prefix$, *temp.Person
      
      Debug ""
      k = 0
      ForEach *p()
         If first <= k And k <= last
            prefix$ = "> "
         Else
            prefix$ = "  "
         EndIf
         *temp = *p()
         With *temp
            Debug Str(k) + ") " + prefix$ + LSet(\FamilyName + ", " + \GivenName + ", ", 20) + \Age
         EndWith
         k + 1
      Next
   EndProcedure
   
   
   NewList x.Person()
   NewList *p()
   
   GetData(x())
   RandomizeList(x())
   
   ForEach x()
      AddElement(*p())
      *p() = @ x()
   Next
   
   Debug "Pointers to shuffled data:"
   ShowTargets(*p())
   
   first = 1 : last = 5
   
   Debug "-----------------------------------------"
   
   Debug "Pointers from #" + first + " to #" + last + " sorted ascending by family name (stable):"
   CS::SortListPtr(*p(), @ ComparePtr(), #PB_Sort_Ascending|#ByFamilyName, first, last)
   ShowTargets(*p(), first, last)
   
   Debug "-----------------------------------------"
   
   Debug "Pointers from #" + first + " to #" + last + " sorted descending by age (stable):"
   CS::SortListPtr(*p(), @ ComparePtr(), #PB_Sort_Descending|#ByAge, first, last)
   ShowTargets(*p(), first, last)
   
   Debug "-----------------------------------------"
   
   Debug "The original data are still unsorted:"
   ShowData(x())
   
   Debug "-----------------------------------------"
   
   Debug "Now the original data are sorted, too"
   Debug "(according to the current order of the pointers):"
   CS::RearrangeData(x(), *p())
   ShowData(x(), first, last)
   
   Debug ""
   Debug "========================================="
   Debug ""
   
   Structure Country
      Name.s
      Capital.s
      Population.i
   EndStructure
   
   Enumeration 0 Step 4
      #ByName
      #ByCapital
      #ByPopulation
   EndEnumeration
   
   Procedure.i CompareStruc (*pa.Integer, *pb.Integer, mode.i)
      ; -- custom comparison function of type 'ProtoCompare'
      ; in : *pa, *pb: pointers to pointers to structures to be compared
      ;      mode    : mode of comparison:
      ;                number for structure field that is used for comparison, combined
      ;                with #PB_Sort_Ascending/#PB_Sort_Descending/#PB_Sort_NoCase
      ; out: return value: #True/#False
      ;
      ; CAVE! When using '<=' instead of '<' or '>=' instead of '>' here,
      ;       sorting will work but won't be stable!
      Protected.Country *a = *pa\i, *b = *pb\i        ; dereference the pointers passed as parameters
      
      Select mode
            ; -- by name
         Case 0                                       ; ascending
            ProcedureReturn Bool(*a\Name > *b\Name)
         Case 1                                       ; descending
            ProcedureReturn Bool(*a\Name < *b\Name)
         Case 2                                       ; ascending and case-insensitive
            ProcedureReturn Bool(UCase(*a\Name) > UCase(*b\Name))
         Case 3                                       ; descending and case-insensitive
            ProcedureReturn Bool(UCase(*a\Name) < UCase(*b\Name))
            
            ; -- by capital
         Case 4                                       ; ascending
            ProcedureReturn Bool(*a\Capital > *b\Capital)
         Case 5                                       ; descending
            ProcedureReturn Bool(*a\Capital < *b\Capital)
         Case 6                                       ; ascending and case-insensitive
            ProcedureReturn Bool(UCase(*a\Capital) > UCase(*b\Capital))
         Case 7                                       ; descending and case-insensitive
            ProcedureReturn Bool(UCase(*a\Capital) < UCase(*b\Capital))
            
            ; -- by population
         Case 8                                       ; ascending
            ProcedureReturn Bool(*a\Population > *b\Population)
         Case 9                                       ; descending
            ProcedureReturn Bool(*a\Population < *b\Population)
      EndSelect
   EndProcedure
   
   
   DataSection
      Countries:
      Data.s "France", "Paris"
      Data.i 67348000
      Data.s "England", "London"
      Data.i 55619400
      Data.s "Japan", "Tokyo"
      Data.i 126440000
      Data.s "New Zealand", "Wellington"
      Data.i 4915240
      Data.s "Germany", "Berlin"
      Data.i 82800000
      Data.s "Canada", "Ottawa"
      Data.i 37067011
      Data.s "China", "Beijing"
      Data.i 1403500365
      Data.s ""
   EndDataSection
   
   Procedure ReadStruc (List x.Country())
      Protected temp$
      
      Restore Countries
      Read.s temp$
      While temp$
         AddElement(x())
         x()\Name = temp$
         Read.s x()\Capital
         Read.i x()\Population
         Read.s temp$
      Wend
   EndProcedure
   
   Procedure ShowStruc (List r.Country(), first.i=99, last.i=99)
      Protected k.i, prefix$
      
      Debug ""
      k = 0
      ForEach r()
         If first <= k And k <= last
            prefix$ = "> "
         Else
            prefix$ = "  "
         EndIf
         With r()
            Debug Str(k) + ") " + prefix$ + LSet(\Name + ", " + \Capital + ", ", 26) + RSet(Str(\Population), 10)
         EndWith
         k + 1
      Next
   EndProcedure
   
   
   NewList r.Country()
   ReadStruc(r())
   RandomizeList(r())
   
   Debug "Shuffled structured list:"
   ShowStruc(r())
   
   Debug "-----------------------------------------"
   
   Debug "Whole structured list sorted ascending by name:"
   CS::SortListAny(r(), @ CompareStruc(), #PB_Sort_Ascending|#ByName)
   ShowStruc(r(), 0, 6)
   
   Debug "-----------------------------------------"
   
   Debug "Whole structured list sorted descending by population:"
   CS::SortListAny(r(), @ CompareStruc(), #PB_Sort_Descending|#ByPopulation)
   ShowStruc(r(), 0, 6)
CompilerEndIf
Last edited by Little John on Sun Mar 03, 2019 11:20 pm, edited 8 times in total.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Validation

Comparing the result of the algorithm used in the module with the result of a built-in PB routine.

Code: Select all

EnableExplicit

XIncludeFile "SortListCustom.pbi"

Structure Person
   Name.s
   Age.i
EndStructure


Procedure.i ComparePersons_1 (*pa.Integer, *pb.Integer, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *pa, *pb: pointers to pointers to structures to be compared
   ;      mode    : mode of comparison:
   ;                #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False
   Protected.Person *a = *pa\i, *b = *pb\i     ; dereference the pointers passed as parameters
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\Name < *b\Name)
   Else
      ProcedureReturn Bool(*a\Name > *b\Name)
   EndIf
EndProcedure


Procedure.i ComparePersons_2 (*pa.Integer, *pb.Integer, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *pa, *pb: pointers to pointers to structures to be compared
   ;      mode    : mode of comparison:
   ;                #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: < 0, 0, or > 0
   Protected.Person *a = *pa\i, *b = *pb\i     ; dereference the pointers passed as parameters
   Protected ret=0
   
   If *a\Name < *b\Name
      ret = -1
   ElseIf *a\Name > *b\Name   
      ret = 1
   EndIf
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn -ret
   Else   
      ProcedureReturn ret
   EndIf
EndProcedure


Procedure.s RandomString (length.i)
   Protected *char.Character, ret$=Space(length)
   
   *char = @ ret$
   While *char\c <> 0
      If Random(1) = 0
         *char\c = Random('Z', 'A')
      Else   
         *char\c = Random('z', 'a')
      EndIf   
      *char + SizeOf(Character)
   Wend
   
   ProcedureReturn ret$
EndProcedure


Procedure CreateRandomStructures (List x.Person(), n.i)
   Protected m$, a, i, r=Int(n/2)
   
   For i = 1 To r
      m$ = RandomString(Random(50, 30))
      a = Random(80, 20)
      
      AddElement(x())
      x()\Name = m$
      x()\Age = a
      
      AddElement(x())
      x()\Name = m$
      x()\Age = a + 10
   Next
EndProcedure


Procedure.i IsEqualStructuredList (List a.Person(), List b.Person())
   ; check whether 2 sorted structure lists are equal
   
   If ListSize(a()) <> ListSize(b())
      ProcedureReturn #False
   EndIf
   
   FirstElement(b())
   ForEach a()
      If a()\Name <> b()\Name Or
         a()\Age <> b()\Age
         ProcedureReturn #False
      EndIf  
      NextElement(b())
   Next
   
   ProcedureReturn #True
EndProcedure

;-===============================================================================

Define n, first, last, e1, e2, e3, e4, flag, msg$=""
NewList a.Person()
NewList b1.Person()
NewList b2.Person()

n = 3000

CreateRandomStructures(a(), n)
last  = Random(ListSize(a())-1, 1)
first = Random(last-1)

; -------------------------------------------------------------------------------
; sort ascending

RandomizeList(a())
CopyList(a(), b1())
CopyList(a(), b2())

SortStructuredList(a(), #PB_Sort_Ascending, OffsetOf(Person\Name), TypeOf(Person\Name), first, last)

CS::SortListAny(b1(), @ ComparePersons_1(), #PB_Sort_Ascending, first, last)
e1 = IsEqualStructuredList(a(), b1())

CS::SortListAny(b2(), @ ComparePersons_2(), #PB_Sort_Ascending, first, last)
e2 = IsEqualStructuredList(a(), b2())

; -------------------------------------------------------------------------------
; sort descending

RandomizeList(a())
CopyList(a(), b1())
CopyList(a(), b2())

SortStructuredList(a(), #PB_Sort_Descending, OffsetOf(Person\Name), TypeOf(Person\Name), first, last)

CS::SortListAny(b1(), @ ComparePersons_1(), #PB_Sort_Descending, first, last)
e3 = IsEqualStructuredList(a(), b1())

CS::SortListAny(b2(), @ ComparePersons_2(), #PB_Sort_Descending, first, last)
e4 = IsEqualStructuredList(a(), b2())

; -------------------------------------------------------------------------------

If e1 = #False : msg$ + ", e1" : EndIf
If e2 = #False : msg$ + ", e2" : EndIf
If e3 = #False : msg$ + ", e3" : EndIf
If e4 = #False : msg$ + ", e4" : EndIf

If msg$ = ""
   msg$ = ~"Stable sorting ascending and descending:\nOK with both comparison functions."
   flag = #PB_MessageRequester_Info
Else   
   msg$ = "False: " + Mid(msg$, 3)
   flag = #PB_MessageRequester_Warning
EndIf

MessageRequester("Validation", msg$, flag)
Last edited by Little John on Thu Jan 31, 2019 6:59 am, edited 4 times in total.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Speed test

The following code demonstrates that the hybrid sorting algorithm is considerably faster than using self-written MergeSort alone. On my system, 40 is a good threshold value.

Code: Select all

EnableExplicit

XIncludeFile "SortListCustom.pbi"

CompilerIf #PB_Compiler_Debugger
   MessageRequester("Error",
                    "Switch the Debugger off!",
                    #PB_MessageRequester_Error)
   End
CompilerEndIf


Procedure.s RandomString (length.i)
   Protected *char.Character, ret$=Space(length)
   
   *char = @ ret$
   While *char\c <> 0
      If Random(1) = 0
         *char\c = Random('Z', 'A')
      Else
         *char\c = Random('z', 'a')
      EndIf
      *char + SizeOf(Character)
   Wend
   
   ProcedureReturn ret$
EndProcedure

Define msg$, threshold, n = 500000 


;-============  Strings  ============

Macro StorePointers (_dataList_, _pointerList_)
   ; -- create _pointerList_, that contains the addresses of the elements of _dataList_
   
   ClearList(_pointerList_)
   ForEach _dataList_
      AddElement(_pointerList_)
      _pointerList_ = @ _dataList_
   Next
EndMacro

Macro RearrangeData (_dataList_, _pointerList_)
   ; -- rearrange the elements of _dataList_ according to the order of elements in _pointerList_
   
   ForEach _pointerList_
      ChangeCurrentElement(_dataList_, _pointerList_)
      MoveElement(_dataList_, #PB_List_Last)
   Next
EndMacro


Procedure CreateRandomStrings (List x.s(), n.i)
   Protected i
   
   For i = 1 To n
      AddElement(x())
      x() = RandomString(Random(1000, 200))
   Next
EndProcedure


Procedure.i CompareS (*a.String, *b.String, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *a, *b: pointers to strings to be compared
   ;      mode  : mode of comparison:
   ;              #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\s < *b\s)
   Else
      ProcedureReturn Bool(*a\s > *b\s)
   EndIf
EndProcedure


Define t0, t1, t2, t3, t4
NewList s0.s()
NewList *s0()

CreateRandomStrings(s0(), n)
StorePointers(s0(), *s0())

; --------------------------------------------------------------------------

threshold = 40

t0 = ElapsedMilliseconds()
SortList(s0(), #PB_Sort_Ascending)
t0 = ElapsedMilliseconds() - t0

RearrangeData(s0(), *s0())
CS::SetInsertionSortMaxSize(2 * threshold)
t1 = ElapsedMilliseconds()
CS::SortListS(s0(), @ CompareS())
t1 = ElapsedMilliseconds() - t1

RearrangeData(s0(), *s0())
CS::SetInsertionSortMaxSize(threshold)
t2 = ElapsedMilliseconds()
CS::SortListS(s0(), @ CompareS())
t2 = ElapsedMilliseconds() - t2

RearrangeData(s0(), *s0())
CS::SetInsertionSortMaxSize(0.5 * threshold)
t3 = ElapsedMilliseconds()
CS::SortListS(s0(), @ CompareS())
t3 = ElapsedMilliseconds() - t3

RearrangeData(s0(), *s0())
CS::SetInsertionSortMaxSize(1)
t4 = ElapsedMilliseconds()
CS::SortListS(s0(), @ CompareS())
t4 = ElapsedMilliseconds() - t4

; --------------------------------------------------------------------------

msg$ = "-- Sorting strings" + #LF$ +
       "t0 = " + StrD(t0/1000,3) + " Sec. (built-in MergeSort)" + #LF$ +
       "t1 = " + StrD(t1/1000,3) + " Sec."   + #LF$ +
       "t2 = " + StrD(t2/1000,3) + " Sec. <" + #LF$ +
       "t3 = " + StrD(t3/1000,3) + " Sec."   + #LF$ +
       "t4 = " + StrD(t4/1000,3) + " Sec. (self-written MergeSort alone)" + #LF$ +
       #LF$


;-============  Structures  ============

Structure Person
   Idx.i
   Name.s
   Age.i
EndStructure


Procedure CreateRandomStructures (List x.Person(), n.i)
   Protected m$, a, i, r=Int(n/2)
   
   For i = 1 To r
      m$ = RandomString(Random(1000, 200))
      a = Random(80, 20)
      
      AddElement(x())
      x()\Idx = i*2 - 1
      x()\Name = m$
      x()\Age = a
      
      AddElement(x())
      x()\Idx = i*2
      x()\Name = m$
      x()\Age = a + 10
   Next
EndProcedure


Procedure.i ComparePersons (*pa.Integer, *pb.Integer, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *pa, *pb: pointers to pointers to structures to be compared
   ;      mode    : mode of comparison:
   ;                #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False
   Protected.Person *a = *pa\i, *b = *pb\i     ; dereference the pointers passed as parameters
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\Name < *b\Name)
   Else
      ProcedureReturn Bool(*a\Name > *b\Name)
   EndIf
EndProcedure


Define u0, u1, u2, u3, u4

NewList p0.Person()

CreateRandomStructures(p0(), n)

; --------------------------------------------------------------------------

threshold = 40

u0 = ElapsedMilliseconds()
SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Name), TypeOf(Person\Name))
u0 = ElapsedMilliseconds() - u0

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(2 * threshold)
u1 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u1 = ElapsedMilliseconds() - u1

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(threshold)
u2 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u2 = ElapsedMilliseconds() - u2

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(0.5 * threshold)
u3 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u3 = ElapsedMilliseconds() - u3

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(1)
u4 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u4 = ElapsedMilliseconds() - u4

; --------------------------------------------------------------------------

msg$ + "-- Sorting structures" + #LF$ +
       "u0 = " + StrD(u0/1000,3) + " Sec. (built-in MergeSort)" + #LF$ +
       "u1 = " + StrD(u1/1000,3) + " Sec."   + #LF$ +
       "u2 = " + StrD(u2/1000,3) + " Sec. <" + #LF$ +
       "u3 = " + StrD(u3/1000,3) + " Sec."   + #LF$ +
       "u4 = " + StrD(u4/1000,3) + " Sec. (self-written MergeSort alone)"

MessageRequester("Duration", msg$)
Last edited by Little John on Sun Mar 03, 2019 12:29 pm, edited 5 times in total.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Some examples for custom comparison functions

Code: Select all

EnableExplicit

XIncludeFile "SortListCustom.pbi"

Procedure.i CompareD_Abs (*a.Double, *b.Double, mode.i)
   ; -- Sort doubles by their absolute value
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(Abs(*a\d) < Abs(*b\d))
   Else
      ProcedureReturn Bool(Abs(*a\d) > Abs(*b\d))
   EndIf
EndProcedure

Define k
NewList d.d()

Debug "Random doubles:"
For k = 1 To 8
   AddElement(d())
   d() = Random(99) - 50
   Debug d()
Next

Debug ""
Debug "Sorted ascending by absolute value:"
CS::SortListD(d(), @ CompareD_Abs())
ForEach d()
   Debug d()
Next

Code: Select all

EnableExplicit

XIncludeFile "SortListCustom.pbi"

Procedure.i CompareS_Val (*a.String, *b.String, mode.i)
   ; -- Sort strings that represent integers numerically rather than alphabetically
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Val(*b\s) - Val(*a\s)
   Else
      ProcedureReturn Val(*a\s) - Val(*b\s)
   EndIf
EndProcedure

Define k
NewList s1.s()
NewList s2.s()

Debug "Random numerical strings:"
For k = 1 To 8
   AddElement(s1())
   s1() = Str(Random(13, 7))
   Debug s1()
Next
CopyList(s1(), s2())

Debug ""
Debug "Sorted ascending alphabetically with PureBasic's SortList():"
SortList(s1(), #PB_Sort_Ascending)
ForEach s1()
   Debug s1()
Next

Debug ""
Debug "Sorted ascending numerically using CompareS_Val():"
CS::SortListS(s2(), @ CompareS_Val())
ForEach s2()
   Debug s2()
Next

Code: Select all

EnableExplicit

XIncludeFile "SortListCustom.pbi"

Macro Replace_InPlace_AB (_a_, _b_, _old_, _new_)
   ReplaceString(_a_, _old_, _new_, #PB_String_InPlace)
   ReplaceString(_b_, _old_, _new_, #PB_String_InPlace)
EndMacro

Procedure.i CompareS_ASCII (*a.String, *b.String, mode.i)
   ; -- Sort strings in a way that modified letters are treated the same as the base letters.
   ;    (Here are just some examples. There are several different characters and different
   ;     rules in different languages.)
   ;    Don't change the original data, but use local variables here!
   Protected a$, b$
   
   ; upper case
   a$ = ReplaceString(*a\s,   "Â", "A")
   b$ = ReplaceString(*b\s,   "Â", "A")
   Replace_InPlace_AB(a$, b$, "Á", "A")
   Replace_InPlace_AB(a$, b$, "À", "A")
   Replace_InPlace_AB(a$, b$, "Ä", "A")
   
   Replace_InPlace_AB(a$, b$, "Ê", "E")
   Replace_InPlace_AB(a$, b$, "É", "E")
   Replace_InPlace_AB(a$, b$, "È", "E")
   
   Replace_InPlace_AB(a$, b$, "Î", "I")
   Replace_InPlace_AB(a$, b$, "Í", "I")
   Replace_InPlace_AB(a$, b$, "Ì", "I")
   
   Replace_InPlace_AB(a$, b$, "Ô", "O")
   Replace_InPlace_AB(a$, b$, "Ó", "O")
   Replace_InPlace_AB(a$, b$, "Ò", "O")
   Replace_InPlace_AB(a$, b$, "Ö", "O")
   
   Replace_InPlace_AB(a$, b$, "Û", "U")
   Replace_InPlace_AB(a$, b$, "Ú", "U")
   Replace_InPlace_AB(a$, b$, "Ù", "U")
   Replace_InPlace_AB(a$, b$, "Ü", "U")
   
   ; lower case
   Replace_InPlace_AB(a$, b$, "â", "a")
   Replace_InPlace_AB(a$, b$, "á", "a")
   Replace_InPlace_AB(a$, b$, "à", "a")
   Replace_InPlace_AB(a$, b$, "ä", "a")
   
   Replace_InPlace_AB(a$, b$, "ê", "e")
   Replace_InPlace_AB(a$, b$, "é", "e")
   Replace_InPlace_AB(a$, b$, "è", "e")
   
   Replace_InPlace_AB(a$, b$, "î", "i")
   Replace_InPlace_AB(a$, b$, "í", "i")
   Replace_InPlace_AB(a$, b$, "ì", "i")
   
   Replace_InPlace_AB(a$, b$, "ô", "o")
   Replace_InPlace_AB(a$, b$, "ó", "o")
   Replace_InPlace_AB(a$, b$, "ò", "o")
   Replace_InPlace_AB(a$, b$, "ö", "o")
   
   Replace_InPlace_AB(a$, b$, "û", "u")
   Replace_InPlace_AB(a$, b$, "ú", "u")
   Replace_InPlace_AB(a$, b$, "ù", "u")
   Replace_InPlace_AB(a$, b$, "ü", "u")
   
   Select mode
      Case 0                                           ; ascending
         ProcedureReturn Bool(a$ > b$)
      Case 1                                           ; descending
         ProcedureReturn Bool(a$ < b$)
      Case 2                                           ; ascending and case-insensitive
         ProcedureReturn Bool(UCase(a$) > UCase(b$))
      Case 3                                           ; descending and case-insensitive
         ProcedureReturn Bool(UCase(a$) < UCase(b$))
   EndSelect
EndProcedure
Last edited by Little John on Sun Nov 11, 2018 12:15 pm, edited 1 time in total.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Natural sorting
[u]Coding Horror[/u] wrote:The default sort functions in almost every programming language are poorly suited for human consumption.
That means that after sorting a list ascending, for instance "z12" goes before "z7". This happens when strings are sorted according to the ASCII order (or nowadays Unicode order) of their characters, but it's not what most human readers would expect. Natural sorting produces a more "human friendly" result.

There is no "one true way" to do this. The following code covers the most basic points that are listed on Rosetta Code regarding Natural sorting. For more details see comments in the code.

Code: Select all

; -- Natural sorting of linked lists (or arrays)

EnableExplicit


Structure SplitString
   s$
   match.i
EndStructure

Procedure.i _SplitByRegEx (regEx.i, source$, List part.SplitString())
   ; -- split a string into parts that match and parts that don't match a Regular Expression
   ; in : regEx  : number of a Regular Expression generated by CreateRegularExpression()
   ;      source$: string to be split into parts
   ; out: part()      : resulting list of parts
   ;      return value: number of elements in part():
   ;                    0 if source$ = "", > 0 otherwise;
   ;                   -1 on error
   Protected.i left, right
   
   If ExamineRegularExpression(regEx, source$) = 0
      ProcedureReturn -1              ; error
   EndIf
   
   left = 1
   While NextRegularExpressionMatch(regEx)
      right = RegularExpressionMatchPosition(regEx)
      If left < right
         AddElement(part())
         part()\s$ = Mid(source$, left, right-left)
         part()\match = #False
      EndIf
      AddElement(part())
      part()\s$ = RegularExpressionMatchString(regEx)
      part()\match = #True
      left = right + RegularExpressionMatchLength(regEx)
   Wend
   
   If left <= Len(source$)
      AddElement(part())
      part()\s$ = Mid(source$, left)
      part()\match = #False
   EndIf
   
   ProcedureReturn ListSize(part())   ; success
EndProcedure


Define s_Rex_WhiteSpace.i = CreateRegularExpression(#PB_Any, "\s+")  ; \s matches spaces, tabs, and line breaks
Define s_Rex_Integer.i    = CreateRegularExpression(#PB_Any, "\d+")  ; \d matches digits

Macro _ReturnIfNotEqual (_a_, _b_)
   If _a_ < _b_
      ProcedureReturn -cmp
   ElseIf _a_ > _b_
      ProcedureReturn cmp
   EndIf
EndMacro

Procedure.i CompareNatural (*s1.string, *s2.string, mode.i)
   ; -- Compare two strings in a "human friendly" way,
   ;    according to points 1 to 4 listed on Rosetta Code
   ;    <http://rosettacode.org/wiki/Natural_sorting>,
   ;    adapted according to a suggestion by Charles Petzold
   ;    <http://www.charlespetzold.com/blog/2007/12/Sorting-Without-Spaces.html>:
   ;    o Ignore all whitespace characters.
   ;    o Sort without regard to case.
   ;    o Sort numeric portions of strings in numeric order.
   ;      That is split the string into fields on numeric boundaries,
   ;      then sort on each field, where fields of integers are
   ;      treated as numbers, e.g.
   ;      - "foo9.txt" < "foo10.txt"
   ;      - "x9y99" < "x9y100" < "x10y0"
   ;
   ; in : *s1, *s2: pointers to strings to be compared
   ;      mode    : mode of comparison:
   ;                #PB_Sort_Ascending or #PB_Sort_Descending
   ; out: -1, 0, or 1
   Shared s_Rex_WhiteSpace, s_Rex_Integer
   Protected cmp.i, p1.i, p2.i, nxt1.i, nxt2.i, v1.i, v2.i, s1.s, s2.s
   Protected NewList split1.SplitString()
   Protected NewList split2.SplitString()
   
   If (mode & #PB_Sort_Descending)
      cmp = -1
   Else
      cmp = 1
   EndIf   
   
   p1 = _SplitByRegEx(s_Rex_Integer, ReplaceRegularExpression(s_Rex_WhiteSpace, *s1\s, ""), split1())
   p2 = _SplitByRegEx(s_Rex_Integer, ReplaceRegularExpression(s_Rex_WhiteSpace, *s2\s, ""), split2())
   If p1 = -1 Or p2 = -1
      ProcedureReturn 0              ; error
   EndIf   
   
   nxt1 = FirstElement(split1())
   nxt2 = FirstElement(split2())
   
   While nxt1 And nxt2   
      If split1()\match = #True And split2()\match = #True  ; if both parts represent (nonnegative) integers
         v1 = Val(split1()\s$)
         v2 = Val(split2()\s$)
         _ReturnIfNotEqual(v1, v2)
      Else
         s1 = UCase(split1()\s$)
         s2 = UCase(split2()\s$)
         _ReturnIfNotEqual(s1, s2)
      EndIf
      
      nxt1 = NextElement(split1())
      nxt2 = NextElement(split2())
   Wend
   
   _ReturnIfNotEqual(p1, p2)
   ProcedureReturn 0
EndProcedure


CompilerIf #PB_Compiler_IsMainFile
   ; -- Demo
   
   XIncludeFile "SortListCustom.pbi"
   
   NewList x.s()
   
   AddElement(x()) : x() = "New York"
   AddElement(x()) : x() = "Newark"
   AddElement(x()) : x() = "NewYork"
   AddElement(x()) : x() = ~"New\tYork"
   
   AddElement(x()) : x() = "1234567"
   AddElement(x()) : x() = "1 234 567"
   AddElement(x()) : x() = "  27"
   AddElement(x()) : x() = "  5  "
   
   AddElement(x()) : x() = "foo10.txt"
   AddElement(x()) : x() = "foo9.txt"
   AddElement(x()) : x() = "x9y100"
   AddElement(x()) : x() = "x10y0"
   AddElement(x()) : x() = "x9y99"
   
   AddElement(x()) : x() = "cASE INDEPENDENT 4"
   AddElement(x()) : x() = "caSE INDEPENDENT 3"
   AddElement(x()) : x() = "casE independent 1"
   AddElement(x()) : x() = "case INDEPENDENT 2"
   
   ; -------------------------------------
   
   CS::SortListS(x(), @ CompareNatural(), #PB_Sort_Ascending)
   
   Debug "-- Sorted ascending (stable)"
   ForEach x()
      Debug x()
   Next
   
   Debug ""
   Debug "--------------------------------"
   Debug ""
   
   CS::SortListS(x(), @ CompareNatural(), #PB_Sort_Descending)
   
   Debug "-- Sorted descending (stable)"
   ForEach x()
      Debug x()
   Next
CompilerEndIf
Last edited by Little John on Fri Nov 16, 2018 9:51 am, edited 3 times in total.
User avatar
Andre
PureBasic Team
PureBasic Team
Posts: 2056
Joined: Fri Apr 25, 2003 6:14 pm
Location: Germany (Saxony, Deutscheinsiedel)
Contact:

Re: Custom sort for Linked Lists

Post by Andre »

Works very well and could become useful for me. Thank you very much, Little John! :D
Bye,
...André
(PureBasicTeam::Docs & Support - PureArea.net | Order:: PureBasic | PureVisionXP)
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Thanks André, you are welcome. :-)
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

New version 1.10, 2018-11-11

Changes
The internal comparison logic of the sorting algorithms has been changed, in order to allow for more flexibility in writing comparison callback functions.
This also means that "old" comparison callback functions are not valid anymore. Sorry for that.

old:

Code: Select all

Procedure.i CompareD (*a.Double, *b.Double, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *a, *b: pointers to doubles to be compared
   ;      mode  : mode of comparison:
   ;              #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\d >= *b\d)
   Else
      ProcedureReturn Bool(*a\d <= *b\d)
   EndIf
EndProcedure
-----------------------------------------------------------------------------

new:

Code: Select all

Procedure.i CompareD (*a.Double, *b.Double, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *a, *b: pointers to doubles to be compared
   ;      mode  : mode of comparison:
   ;              #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\d < *b\d)
   Else
      ProcedureReturn Bool(*a\d > *b\d)
   EndIf
EndProcedure

Code: Select all

Procedure.i CompareI (*a.Integer, *b.Integer, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *a, *b: pointers to integers to be compared
   ;      mode  : mode of comparison:
   ;              #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: < 0, 0, or > 0
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn *b\i - *a\i
   Else
      ProcedureReturn *a\i - *b\i
   EndIf
EndProcedure

Code: Select all

Procedure.i CompareS (*a.String, *b.String, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *a, *b: pointers to strings to be compared
   ;      mode  : mode of comparison:
   ;              #PB_Sort_Ascending/#PB_Sort_Descending/#PB_Sort_NoCase
   ; out: return value: < 0, 0, or > 0
   Protected ret.i=0
   
   If (mode & #PB_Sort_NoCase)
      ret = CompareMemoryString(@ *a\s, @ *b\s, #PB_String_NoCase)
   Else
      ret = CompareMemoryString(@ *a\s, @ *b\s)
   EndIf
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn -ret
   Else
      ProcedureReturn ret
   EndIf
EndProcedure
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

New version 1.20, 2018-11-14

Changes
The new code uses the fastest PureBasic MergeSort implementation for linked lists that I'm aware of after having tested various different possibilities. It is considerably faster than the previous version 1.10. For long lists it can be e.g. 10 times as fast.

Since no in-place MergeSort is used anymore (as in the previous version), the previously used ASM trick for passing a pointer to any arbitrary list can no longer be used. SortListAny() is now a public macro, which internally creates a list of pointers to the structures, then sorts the pointer list and finally rearranges the original structure list according to the order of the pointers.

As a consequence, the callback comparison functions used in combination with SortListAny() have to be a bit different now. That means, in the callback function the pointers have to be dereferenced, for instance:

old

Code: Select all

Procedure.i ComparePersons (*a.Person, *b.Person, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *a, *b: pointers to structures to be compared
   ;      mode  : mode of comparison:
   ;              #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\Name < *b\Name)
   Else
      ProcedureReturn Bool(*a\Name > *b\Name)
   EndIf
EndProcedure
-----------------------------------------------------------------------------

new

Code: Select all

Procedure.i ComparePersons (*pa.Integer, *pb.Integer, mode.i)
   ; -- custom comparison function of type 'ProtoCompare'
   ; in : *pa, *pb: pointers to pointers to structures to be compared
   ;      mode    : mode of comparison:
   ;                #PB_Sort_Ascending/#PB_Sort_Descending
   ; out: return value: #True/#False

   Protected.Person *a = *pa\i, *b = *pb\i     ; dereference the pointers that are passed as parameters
   
   If (mode & #PB_Sort_Descending)
      ProcedureReturn Bool(*a\Name < *b\Name)
   Else
      ProcedureReturn Bool(*a\Name > *b\Name)
   EndIf
EndProcedure
jassing
Addict
Addict
Posts: 1745
Joined: Wed Feb 17, 2010 12:00 am

Re: Custom sort for Linked Lists

Post by jassing »

Good job! Thanks for sharing. Speed is impressive.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Thank you. :-)

The 5th message of this thread previously had remained empty.
Now it contains code that does Natural Sorting, which produces results that are rather convenient for human readers.
User avatar
Andre
PureBasic Team
PureBasic Team
Posts: 2056
Joined: Fri Apr 25, 2003 6:14 pm
Location: Germany (Saxony, Deutscheinsiedel)
Contact:

Re: Custom sort for Linked Lists

Post by Andre »

Well done, thanks! :D
Bye,
...André
(PureBasicTeam::Docs & Support - PureArea.net | Order:: PureBasic | PureVisionXP)
User avatar
RSBasic
Moderator
Moderator
Posts: 1218
Joined: Thu Dec 31, 2009 11:05 pm
Location: Gernsbach (Germany)
Contact:

Re: Custom sort for Linked Lists

Post by RSBasic »

Thanks for sharing. Image
Image
Image
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

You are welcome! :-)

New version 1.21, 2019-01-27

License changed from Public Domain to MIT License.
Nothing else changed.
Fred
Administrator
Administrator
Posts: 16617
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Custom sort for Linked Lists

Post by Fred »

That's serious good work !
Post Reply