Page 2 of 3
Re: Sorting option(quick/merge) for arrays.
Posted: Fri Feb 12, 2010 3:39 am
by skywalk
My bad, I quickly scanned your code and VC__swapMemory is the only ASM part.
Thanks for quantifying your merge routine against Purebasic's internal.
The merge sort I use has no recursion and is based on the list-merge algorithm found in Knuth, Vol. 3.
(I rarely use recursion as code redundancy usually beats it at the expense of memory.)
The merge routine requires a secondary pointer array that actually holds the sorted element locations.
As I mentioned, I'm not really "there" yet in the conversion process, but I can jump ahead and convert the merge and quick sorts and post them for you to compare with the Purebasic internal. I know they will be slower, just good to know how much?
Re: Sorting option(quick/merge) for arrays.
Posted: Fri Feb 12, 2010 4:45 am
by Demivec
skywalk wrote:The merge sort I use has no recursion and is based on the list-merge algorithm found in Knuth, Vol. 3.
(I rarely use recursion as code redundancy usually beats it at the expense of memory.)
The merge routine requires a secondary pointer array that actually holds the sorted element locations.
I have a brief comment regarding the two in-place methods. The one in the code I posted requires additional stack space that depends on the level of recursion (i.e. for 50000000 elements this is only equal to a max of 24 levels of recursion). The one you suggested will use additional space to hold an array of element indexes (or pointers) equal in size to the number of elements operated on.
You are right that it usually is a choice between memory use and speed. I chose the one I used precisely for its lower estimated memory use. I am interested in seeing your approach to make comparisons on it's memory and speed use also. The next time around I may need speed more than memory.

Re: Sorting option(quick/merge) for arrays.
Posted: Fri Feb 12, 2010 9:35 am
by Fred
If you need to sort on several fields (for example name, then forname etc.), you can do it easily by using the range feature of the Sort() functions: sort the array once, then iterate the array and when your main key change, sort this range on this field.
Re: Sorting option(quick/merge) for arrays.
Posted: Fri Feb 12, 2010 3:30 pm
by Demivec
Fred wrote:If you need to sort on several fields (for example name, then forname etc.), you can do it easily by using the range feature of the Sort() functions: sort the array once, then iterate the array and when your main key change, sort this range on this field.
Thanks for the tip. I'd like to suggest that that tip be mentioned in the help documentation for the sort library.

Re: Sorting option(quick/merge) for arrays.
Posted: Sat Feb 13, 2010 12:31 pm
by Trond
Btw I just thought of something: "Stable sort"? Who needs their horses sorted?
Re: Sorting option(quick/merge) for arrays.
Posted: Sat Feb 13, 2010 6:05 pm
by Demivec
Trond wrote:Btw I just thought of something: "Stable sort"? Who needs their horses sorted?
If someone needs horses from a stable it is not uncommon to ask them "what sort of horses would you like?"

Re: Sorting option(quick/merge) for arrays.
Posted: Sun Feb 14, 2010 4:17 pm
by Demivec
@skywalk: It might be a good idea to start a thread in Coding Questions regarding programming solutions for sorting on multiple keys, or implementing a stable-sort.
With that in mind I've taken a stab at the trick Fred posted regarding an array of structures that have only two fields. Preliminary results were about 4 times faster than the MergeSort version I wrote. To get some more experience using Fred's suggestion I was going to add a few fields to the structure. Since my tests are based on fictional structures it would be helpful to know how many fields are in the structures of the code you are converting? I'll then post code that might be useful as a kind of template for specific needs.
Re: Sorting option(quick/merge) for arrays.
Posted: Sun Feb 14, 2010 10:14 pm
by skywalk
Hi Demivec,
Weekends are for beer and sun and football and ???coding???
(I'm sure I left out other activities

)
Well, I am converting some sort functions now and will post shortly.
Data is data, and mine is probably not much different in scope than most.
The data is dynamic in columns and rows.
Data can be sourced from the clipboard or ascii text files or DB recordset.
Whether lazily or sloppily, I coded in VB6 with dynamic arrays contained in structures and it all works.
Moving to Purebasic implies a decision. Use arrays or linked lists or structured lists.
My decision is to convert to arrays and remove arrays from the structures.
Keep things simple and hopefully faster.
The requirement for stable sort(merge vs quick) is that a user requested sort of a column or columns must retain the element location for duplicate locations.
Otherwise, I lose or jumble the data and that is bad.
Quicksorts are fine and preferred for unique lists, provided they are not nearly sorted.
MergeSort() for arrays
Re: Sorting option(quick/merge) for arrays.
Posted: Mon Mar 05, 2012 7:33 am
by skywalk
Sorry to bump this topic, but I have a simple example that screams for MergeSort.
I'd like to keep the data in an array if possible, but the SortStructuredArray() function jumbles the order(unstable) of equal data.
Maybe we could get Merge option for Structured Arrays also?
I have looked at a lot of my data structures, and very few are truly unweighted or independent of related fields.
And what did Fred mean about about range sorting?
Code: Select all
Structure myABC
a$
b$
c$
EndStructure
Define.i i, tw = 4
Dim myL.myABC(6)
Restore SortThis
For i = 0 To 6
Read.s myL(i)\a$
Read.s myL(i)\b$
Read.s myL(i)\c$
Next
Debug "-- Before Sort --"
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To 6
Debug LSet(myL(i)\a$, tw) + LSet(myL(i)\b$, tw) + LSet(myL(i)\c$, tw)
Next
Debug "-- After 1st SortStructuredArray() by c$ --"
Debug "-- Then 2nd SortStructuredArray() by b$ --"
SortStructuredArray(myL(),#PB_Sort_Ascending|#PB_Sort_NoCase,OffsetOf(myABC\c$),#PB_Sort_String)
SortStructuredArray(myL(),#PB_Sort_Ascending|#PB_Sort_NoCase,OffsetOf(myABC\b$),#PB_Sort_String)
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To 6
Debug LSet(myL(i)\a$, tw) + LSet(myL(i)\b$, tw) + LSet(myL(i)\c$, tw)
Next
DataSection
SortThis:
; a$, b$, c$
Data.s "a", "2", "5"
Data.s "a", "2", "1"
Data.s "x", "1", "b"
Data.s "y", "1", "a"
Data.s "z", "3", "z"
Data.s "d", "3", "y"
Data.s "n", "3", "x"
IWantThis:
; a$, b$, c$ ; Note the order of column c$
; for equal values of column b$.
Data.s "y", "1", "a" ;<-- "a then b"
Data.s "x", "1", "b"
Data.s "a", "2", "1" ;<-- "1 then 5"
Data.s "a", "2", "5"
Data.s "n", "3", "x" ;<-- "x y z"
Data.s "d", "3", "y"
Data.s "z", "3", "z"
IGetThis:
; a$, b$, c$ ; Note the order of column c$
; for equal values of column b$.
Data.s "x", "1", "a" ;<-- "a then b"
Data.s "y", "1", "b"
Data.s "a", "2", "1" ;<-- "1 then 5"
Data.s "a", "2", "5"
Data.s "d", "3", "y" ;<-- "y z x" <-- ERROR -->
Data.s "z", "3", "z"
Data.s "n", "3", "x"
EndDataSection
Re: Sorting option(quick/merge) for arrays.
Posted: Mon Mar 05, 2012 4:06 pm
by Demivec
skywalk wrote:Sorry to bump this topic, but I have a simple example that screams for MergeSort.
I'd like to keep the data in an array if possible, but the SortStructuredArray() function jumbles the order(unstable) of equal data.
Maybe we could get Merge option for Structured Arrays also?
I have looked at a lot of my data structures, and very few are truly unweighted or independent of related fields.
And what did Fred mean about about range sorting?
Fred meant that if you are sorting on field 1, then on field 2 that you would first sort field 1 then resort the sub-ranges on field 2 where the field 1 values are equal. In other words if you had 3 entrys where the first field equaled "A" then you would sort the subrange of 1 to 3 for field 2.
Here is some example code that uses an include that generalizes multi-field sorting for a single structure, see link in the Edit comment for a copy of the include:
Code: Select all
;Author: Demivec
EnableExplicit
IncludeFile "Array_MultiFieldSort.pbi"
;-setup
Structure myABC
a$
b$
c$
EndStructure
Define i, entryCount
Restore SortThis
Read.i entryCount
Dim myL.myABC(entryCount)
For i = 0 To ArraySize(myL())
Read.s myL(i)\a$
Read.s myL(i)\b$
Read.s myL(i)\c$
Next
Procedure display_myABC(Array x.myABC(1), header.s = "", footer.s = "")
Protected i, tw = 4
Debug header
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To ArraySize(x())
Debug LSet(x(i)\a$, tw) + LSet(x(i)\b$, tw) + LSet(x(i)\c$, tw)
Next
Debug footer
EndProcedure
display_myABC(myL(), "-- Before Sort --")
;-sort routines
mcr_multiSort(myABC)
;-sort
Dim myl_sortspecs.sortFieldSpecs(1) ;Array should be sized to be one less than the number of fields to be sorted
mcr_setSortSpec(myl_sortspecs, 0, OffsetOf(myABC\b$), #PB_Sort_String, #PB_Sort_Ascending)
mcr_setSortSpec(myl_sortspecs, 1, OffsetOf(myABC\c$), #PB_Sort_String, #PB_Sort_Ascending)
multiSort_myABC(myL(), myl_sortspecs())
display_myABC(myL(), "-- Sort by: b$ A, then c$ A --")
DataSection
SortThis:
Data.i 10 ;0 based entry count
; a$, b$, c$
Data.s "a", "2", "5"
Data.s "a", "2", "1"
Data.s "x", "1", "b"
Data.s "y", "1", "a"
Data.s "a", "3", "5"
Data.s "z", "3", "z"
Data.s "a", "2", "1"
Data.s "d", "3", "y"
Data.s "y", "1", "c"
Data.s "y", "1", "b"
Data.s "n", "3", "x"
EndDataSection
Here is an example that uses a structure containing different element types and sorts all 4 fields, some in Ascending and some in Descending order:
-- Code Removed -- See comment regarding edit.
The code to perform the multi-field sort is currently limited to handling the sorting of fields for a particular structure, as can be seen by comparing the two code samples. Even with that limitation it is generalized to handle any number of fields each of a different standard PB variable type in the given structure (excluding arrays, lists, and maps). I constructed the code by lifting some routines from other code I have written. I only had time to generalize some of it. If more time was to be found the sorting code could be further generalized to handle any structure.
I hope you find it useful.
If I generalize this further I'll repost the code to the Tips & Tricks forum and remove it from here.
@Edit: Code removed as predicted and after being refined a little bit, posted
here.
Re: Sorting option(quick/merge) for arrays.
Posted: Mon Mar 05, 2012 5:08 pm
by Trond
Edit: Buggy code in this post, see below for a corrected version:
http://purebasic.fr/english/viewtopic.p ... 93#p375693
Very simple:
Code: Select all
Macro SubSortStructuredArray(Arr, MainField, SubOptions, SubOffset, SubType)
Define SubSort_Start, SubSort_I
SubSort_Start = 0
For SubSort_I = 1 To ArraySize(Arr())
If Arr(SubSort_Start)\MainField <> Arr(SubSort_I)\MainField
SortStructuredArray(Arr(), SubOptions, SubOffset, SubType, SubSort_Start, SubSort_I)
SubSort_Start = SubSort_I+1
EndIf
Next
EndMacro
Need to sort on three fields? Sort first normally on primary, then use SubSort with MainField=primary and SubOffset secondary, then use SubSort with MainField=secondary and SubOffset tertiary. And so on for an unlimited number of fields.
Code: Select all
;- Example
Structure myABC
a$
b$
c$
EndStructure
Define.i i, tw = 4
Dim myL.myABC(6)
; Fill array
For i = 0 To 6
Read.s myL(i)\a$
Read.s myL(i)\b$
Read.s myL(i)\c$
Next
; Print before sort
Debug "-- Before Sort --"
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To 6
Debug LSet(myL(i)\a$, tw) + LSet(myL(i)\b$, tw) + LSet(myL(i)\c$, tw)
Next
; Primary sort on b$, secondary sort on c$
SortStructuredArray(myL(), #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\b$), #PB_Sort_String)
SubSortStructuredArray(myL, c$, #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\c$), #PB_Sort_String)
; Print after sort
Debug "-- After Sort --"
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To 6
Debug LSet(myL(i)\a$, tw) + LSet(myL(i)\b$, tw) + LSet(myL(i)\c$, tw)
Next
DataSection
SortThis:
; a$, b$, c$
Data.s "a", "2", "5"
Data.s "a", "2", "1"
Data.s "x", "1", "b"
Data.s "y", "1", "a"
Data.s "z", "3", "z"
Data.s "d", "3", "y"
Data.s "n", "3", "x"
EndDataSection
Re: Sorting option(quick/merge) for arrays.
Posted: Mon Mar 05, 2012 6:25 pm
by skywalk
Thanks Demivec. I will study your approach.
The example I gave was limited to only strings in the structure, but the reality is other types can exist.
Trond the macro master...
I get confusing results if I try any other sorts?
Code: Select all
; THIS WORKS...
; Primary sort on b$, secondary sort on c$
;SortStructuredArray(myL(), #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\b$), #PB_Sort_String)
;SubSortStructuredArray(myL, c$, #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\c$), #PB_Sort_String)
; THIS FAILS...
; Primary sort on c$, secondary sort on b$
SortStructuredArray(myL(), #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\c$), #PB_Sort_String)
SubSortStructuredArray(myL, b$, #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\b$), #PB_Sort_String)
And, if I change 1 of the fields to integer, it also fails.
I think merge sort should be an option for any array that is not a standard type.
The speed of QuickSort is lost when you must do so much overhead to retain the order of duplicates
Another compromise could be:
Expose the pointer array from the internal SortStructuredArray() command.
Then, I could use them for this purpose.

Re: Sorting option(quick/merge) for arrays.
Posted: Mon Mar 05, 2012 8:11 pm
by Trond
I get confusing results if I try any other sorts?
Oups, the example is wrong, and also when I correct the example the macro stops working... It seems to need more tweaking.
Edit: Right, should be working now. Also I added a performance optimization.
Code: Select all
Macro SubSortStructuredArray(Arr, MainField, SubOptions, SubOffset, SubType)
Define SubSort_Start, SubSort_I
SubSort_Start = 0
For SubSort_I = 1 To ArraySize(Arr())
If Arr(SubSort_Start)\MainField <> Arr(SubSort_I)\MainField
If SubSort_Start - (SubSort_I - 1) <> 0 ; Performance optimization
; Debug Str(SubSort_Start) + ", " + Str(subsort_i-1)
SortStructuredArray(Arr(), SubOptions, SubOffset, SubType, SubSort_Start, SubSort_I-1)
EndIf
SubSort_Start = SubSort_I
EndIf
Next
; Debug Str(SubSort_Start) + ", " + Str(subsort_i-1)
SortStructuredArray(Arr(), SubOptions, SubOffset, SubType, SubSort_Start, SubSort_I-1)
EndMacro
;- Example
Structure myABC
a$
b$
c$
EndStructure
Define.i i, tw = 4
Dim myL.myABC(6)
; Fill array
For i = 0 To 6
Read.s myL(i)\a$
Read.s myL(i)\b$
Read.s myL(i)\c$
Next
; Print before sort
Debug "-- Before Sort --"
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To 6
Debug LSet(myL(i)\a$, tw) + LSet(myL(i)\b$, tw) + LSet(myL(i)\c$, tw)
Next
; Primary sort on b$, secondary sort on c$
SortStructuredArray(myL(), #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\b$), #PB_Sort_String)
SubSortStructuredArray(myL, b$, #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\c$), #PB_Sort_String)
; ^ This is the key was previously sorted ^ This is the key that should be sorted next
; ; Primary sort on c$, secondary sort on b
; SortStructuredArray(myL(), #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\c$), #PB_Sort_String)
; SubSortStructuredArray(myL, c$, #PB_Sort_Ascending | #PB_Sort_NoCase, OffsetOf(myABC\b$), #PB_Sort_String)
; ; ^ This is the key was previously sorted ^ This is the key that should be sorted next
; ; Note: due to the nature of the example data, secondary sort on $b after sort on $c does nothing.
; Print after sort
Debug "-- After Sort --"
Debug LSet("a$", tw) + LSet("b$", tw) + LSet("c$", tw)
Debug LSet("--", tw) + LSet("--", tw) + LSet("--", tw)
For i = 0 To 6
Debug LSet(myL(i)\a$, tw) + LSet(myL(i)\b$, tw) + LSet(myL(i)\c$, tw)
Next
DataSection
SortThis:
; a$, b$, c$
Data.s "a", "2", "5"
Data.s "a", "2", "1"
Data.s "x", "1", "b"
Data.s "y", "1", "a"
Data.s "z", "3", "z"
Data.s "d", "3", "y"
Data.s "n", "3", "x"
EndDataSection
Re: Sorting option(quick/merge) for arrays.
Posted: Mon Mar 05, 2012 10:16 pm
by skywalk
That is cool Trond!
It also works for different data types in the structure. Macros are versatile.

Re: Sorting option(quick/merge) for arrays.
Posted: Tue Mar 06, 2012 2:18 am
by kenmo
This is a very informative thread about sorting in PureBasic... thank you guys.
I have been using SortStructuredList() on the assumption that it was "stable" (which I now understand means: it retains order of previous sorts) but it is good to hear that officially confirmed.