Stable in-place MergeSort for arrays
Posted: Thu Feb 11, 2010 10:14 pm
Code updated for 5.20+
Here is a stable in-place MergeSort routine for structured arrays or similarly structured copies of arrays.
It is coded for v4.41 and supports all of the standard sort types. It does not allow sorts using the #PB_NoCase option, though it is possible to define additional search comparison types (at compile-time).
I ripped this out of some other code I had been writing and made it presentable. Hopefully there aren't any rough edges.
See the comments at the codes beginning about credit for contributions, great and small.
And a sloppy example to go with it
:
Here is a stable in-place MergeSort routine for structured arrays or similarly structured copies of arrays.
It is coded for v4.41 and supports all of the standard sort types. It does not allow sorts using the #PB_NoCase option, though it is possible to define additional search comparison types (at compile-time).
I ripped this out of some other code I had been writing and made it presentable. Hopefully there aren't any rough edges.
See the comments at the codes beginning about credit for contributions, great and small.
Code: Select all
;Programmed by: Demivec (Jared Johnson) with major contributions from others
;File: "StableMergeSort.pbi" v1.0
;Date: 2/11/2010
;Coded for PureBasic v4.41
;
;Description: performs a stable in-place mergeSort on an array of structured elements (or
; an equivalently structured memory buffer)
;
;Large portions of code dealing with the inPlaceMergeSort were based on code from
;"Fastest In-Place Stable Sort" by Craig Brown posted on The Code Project which was based
;on an "algorithm taken from Thomas Baudel's recursive C version
;http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html " and was
;re-written by Craig Brown to improve clarity. I've converted Craig Brown's version
;(writtin in a version of Basic) for use in PureBasic with some slight changes that
;mostly involved changes to variable names and some minor restructuring.
;
;The VC__swapMemory() function was written by Thorium and shared on the PureBasic
;programming forum.
;===================================================================================
;Include "StableMergeSort.pbi" ;use path and name of this file
;
;InitMergeSort() ;if result of InitMergeSort() = 1 then it is ready for use.
;===================================================================================
;Call it by using
;
;VC_structuredSort(*array, options, arraySize, elementSize, offset, cType, firstIndex, LastIndex)
;
;All parameters are required. [cType] is a sort type (i.e. #PB_Integer), and
;[option] can be either #PB_Sort_Ascending or #PB_Sort_Descending.
;
;;For example, if an array 'Results' of structure 'Test' is defined as
;
; Structure Test
; name.s
; score.i
; EndStructure
;
; Dim Results.Test(100)
;
;;It could be sorted using
;
;VC_structuredSort(Results(), #PB_Sort_Ascending, ArraySize(Results()), Sizeof(Test), OffsetOf(Test\score), #PB_Integer, 0, 100)
;
;===================================================================================
Enumeration
;The vecClass standard compare functions return one of the following comparison
;results. User-Defined comparison functions should also return these values.
#VC_CompareIsLessThan = -1
#VC_CompareIsEqualTo
#VC_CompareIsGreaterThan
EndEnumeration
; ------ constant for internal use only ------
#_VC_CompareToggleValue = #VC_CompareIsGreaterThan ! #VC_CompareIsLessThan ;used to toggle between a lessThan result and a greaterThan result(i.e. result ! #_VC_CompareToggleValue)
Global NewMap comparisonType()
Prototype pf_CompElementFunc(*element_1, *element_2) ;*element's will be typed in the corresponding function being prototyped.
;=================================================================================
; Description: An internal routine for swapping memory in place between two memory
; blocks without needing any additional memory buffer space.
;
; Notes: Used by the Swap() method and procedures handling the MergeSort and
; InsertionSort routines.
; (swapMemory method authored by Thorium of the PureBasic forum)
;=================================================================================
Procedure VC__swapMemory(*sourceMemory, *destinationMemory, memoryLength)
CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
!PUSH Esi
!PUSH Edi
!MOV Esi, [p.p_sourceMemory+8]
!MOV Edi, [p.p_destinationMemory+8]
!MOV Ecx, [p.v_memoryLength+8]
!PUSH Ecx
!SHR Ecx, 2
!align 4
!MemorySwapASM3_LoopStart1:
!CMP Ecx, 0
!JE MemorySwapASM3_LoopEnd1
!MOV Eax, [Esi]
!MOV Edx, [Edi]
!MOV [Esi], Edx
!MOV [Edi], Eax
!ADD Esi, 4
!ADD Edi, 4
!DEC Ecx
!JMP MemorySwapASM3_LoopStart1
!MemorySwapASM3_LoopEnd1:
!POP Ecx
!AND Ecx, 3
!MemorySwapASM3_LoopStart2:
!CMP Ecx, 0
!JE MemorySwapASM3_LoopEnd2
!MOV al, [Esi]
!MOV dl, [Edi]
!MOV [Esi], dl
!MOV [Edi], al
!INC Esi
!INC Edi
!DEC Ecx
!JMP MemorySwapASM3_LoopStart2
!MemorySwapASM3_LoopEnd2:
!POP Edi
!POP Esi
CompilerElse
!MOV r8, [p.p_sourceMemory]
!MOV r9, [p.p_destinationMemory]
!MOV rcx, [p.v_memoryLength]
!MOV r10, rcx
!SHR rcx, 3
!align 4
!MemorySwapASM3_LoopStart1:
!CMP rcx, 0
!JE MemorySwapASM3_LoopEnd1
!MOV rax, [r8]
!MOV rdx, [r9]
!MOV [r8], rdx
!MOV [r9], rax
!ADD r8, 8
!ADD r9, 8
!DEC rcx
!JMP MemorySwapASM3_LoopStart1
!MemorySwapASM3_LoopEnd1:
!AND r10, 7
!MemorySwapASM3_LoopStart2:
!CMP r10, 0
!JE MemorySwapASM3_LoopEnd2
!MOV al, [r8]
!MOV dl, [r9]
!MOV [r8], dl
!MOV [r9], al
!INC r8
!INC r9
!DEC r10
!JMP MemorySwapASM3_LoopStart2
!MemorySwapASM3_LoopEnd2:
CompilerEndIf
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: Used by procedures handling the MergeSort routine.
; Sorts array elements using an insertion sort.
;=================================================================================
Procedure VC__insertSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
Protected i, j, tw, tw2
If lastIndex > firstIndex + 1
i = firstIndex + 1
While i <= lastIndex
j = i
tw = *offsetArrayBase + j * elementSize ;slight speed optimization to use stepping instead of multiplication inside loop
While j > firstIndex
If compFunc(tw, tw - elementSize) = sortOrder
tw2 = *arrayBase + j * elementSize
VC__swapMemory(tw2, tw2 - elementSize, elementSize)
Else
Break
EndIf
j - 1
tw - elementSize
Wend
i + 1
Wend
EndIf
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Returns: Returns the index of the array element that is >= the data at [*value]
; if sorting in ascending order or <= the data at [*value] if sorting
; in descending order.
;
; Notes: Used by procedures handling the MergeSort routine.
;=================================================================================
Procedure VC__binarySearchForFirstElementGEValue(*value, firstIndex, lastIndex, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
Protected middleIndex
lastIndex - 1
While (lastIndex - firstIndex) > 1
middleIndex = (firstIndex + lastIndex) / 2
If compFunc(*offsetArrayBase + middleIndex * elementSize, *value) <> sortOrder
lastIndex = middleIndex
Else
firstIndex = middleIndex
EndIf
Wend
If compFunc(*offsetArrayBase + firstIndex * elementSize, *value) <> sortOrder
ProcedureReturn firstIndex
Else
If compFunc(*offsetArrayBase + lastIndex * elementSize, *value) <> sortOrder
ProcedureReturn lastIndex
Else
ProcedureReturn lastIndex + 1
EndIf
EndIf
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Returns: Returns the index of the array element that is > the data at [*value]
; while sorting in ascending order and < the data at [*value] if sorting
; in descending order.
;
; Notes: Used by procedures handling the MergeSort routine.
;=================================================================================
Procedure VC__binarySearchForFirstElementGTValue(*value, firstIndex, lastIndex, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
Protected middleIndex
lastIndex - 1
While (lastIndex - firstIndex) > 1
middleIndex = (firstIndex + lastIndex) / 2
If compFunc(*offsetArrayBase + middleIndex * elementSize, *value) = sortOrder
lastIndex = middleIndex
Else
firstIndex = middleIndex
EndIf
Wend
If compFunc(*offsetArrayBase + firstIndex * elementSize, *value) = sortOrder
ProcedureReturn firstIndex
Else
If compFunc(*offsetArrayBase + lastIndex * elementSize, *value) = sortOrder
ProcedureReturn lastIndex
Else
ProcedureReturn lastIndex + 1
EndIf
EndIf
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Returns: Returns the greatest common denominator of the integers [m] and [n].
;
; Notes: Used by procedures handling the MergeSort routine.
;=================================================================================
Procedure VC__greatestCommonDenominator(m, n)
Protected T
While n <> 0
T = m % n
m = n
n = T
Wend
ProcedureReturn m
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: Used by procedures handling the MergeSort routine.
; Moves a block of memory forward while moving another block of memory
; backwards to make room. The blocks can be different sizes.
;=================================================================================
Procedure VC__pushBlock(block1Start, block1Length, block1Dest, *arrayBase, elementSize)
;Function pushes a block of memory forward and another block backwards (to make room).
;block1Start,block1Length,block1Dest,block2Length are all array indexes ranging from 0 -> (arraySize)
Protected block2Length
Protected loopNo
Protected currentStartIndex, currentIndex, sourceIndex
If block1Length = 0 Or block1Dest = block1Start
ProcedureReturn
EndIf
;Block 1 goes forwards, block 2 goes backwards.
;Calculate size of the block moving backwards.
block2Length = block1Dest - block1Start
;use a slight speed optimization of doing multiplication outside the loop and
;only stepping while in the loop
loopNo = VC__greatestCommonDenominator(block1Length + block2Length, block1Length) * elementSize
block1Start * elementSize
block1Dest * elementSize
block1Length * elementSize
block2Length * elementSize
While loopNo > 0
loopNo - elementSize
currentStartIndex = block1Start + loopNo
currentIndex = currentStartIndex
sourceIndex = currentIndex + block1Length
While sourceIndex <> currentStartIndex
VC__swapMemory(*arrayBase + sourceIndex, *arrayBase + currentIndex, elementSize)
currentIndex = sourceIndex
If currentIndex >= block1Dest
sourceIndex - block2Length
Else
sourceIndex + block1Length
EndIf
Wend
Wend
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: Used by procedures handling the MergeSort routine.
; Merges 2 sorted 'streams' into a larger stream using recursion.
;=================================================================================
Procedure VC__inPlaceMerge(startFirstStream, startSecondStream, endSecondStream, *arrayBase, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
Protected firstBlockStart, firstBlockLength, secondBlockStart, secondBlockLength
Protected lengthFirstStream = startSecondStream - startFirstStream
Protected lengthSecondStream = endSecondStream - startSecondStream
If lengthFirstStream <> 0 And lengthSecondStream <> 0
If lengthFirstStream + lengthSecondStream = 2
If compFunc(*offsetArrayBase + startSecondStream * elementSize, *offsetArrayBase + startFirstStream * elementSize) = sortOrder
VC__swapMemory(*arrayBase + startSecondStream * elementSize, *arrayBase + (startFirstStream) * elementSize, elementSize)
EndIf
Else
If lengthFirstStream > lengthSecondStream
;First block starts half way through the first stream
;and continues to the end of the first stream.
firstBlockStart = startFirstStream + (lengthFirstStream / 2)
firstBlockLength = startSecondStream - firstBlockStart
;Second block starts at the start of the second stream
;and ends at a point so that everything in the first block
;should come after everything in the second block.
secondBlockStart = startSecondStream
secondBlockLength = VC__binarySearchForFirstElementGEValue(*offsetArrayBase + firstBlockStart * elementSize, startSecondStream, endSecondStream, *offsetArrayBase, elementSize, compFunc, sortOrder) - secondBlockStart
Else
;Second block starts at the start of the second stream
;and continues to half way through the second stream.
secondBlockStart = startSecondStream
secondBlockLength = lengthSecondStream / 2
;First block starts so that everything in the first block should
;come after everything in the second block.
firstBlockStart = VC__binarySearchForFirstElementGTValue(*offsetArrayBase + (secondBlockStart + secondBlockLength) * elementSize, startFirstStream, startSecondStream, *offsetArrayBase, elementSize, compFunc, sortOrder ! #_VC_CompareToggleValue)
firstBlockLength = startSecondStream - firstBlockStart
EndIf
VC__pushBlock(firstBlockStart, firstBlockLength, secondBlockStart + secondBlockLength - firstBlockLength, *arrayBase, elementSize)
VC__inPlaceMerge(startFirstStream, firstBlockStart, firstBlockStart + secondBlockLength, *arrayBase, *offsetArrayBase, elementSize, compFunc, sortOrder)
VC__inPlaceMerge(secondBlockStart + secondBlockLength - firstBlockLength, secondBlockStart + secondBlockLength, endSecondStream, *arrayBase, *offsetArrayBase, elementSize, compFunc, sortOrder)
EndIf
EndIf
EndProcedure
;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: This procedure handles the MergeSort routines.
; Performs a stable in-place merge sort using [*compFunc] and based on
; [sortOrder] using recursion.
;=================================================================================
Procedure VC__inPlaceMergeSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
Protected middleIndex
If lastIndex - firstIndex < 12
VC__insertSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
Else
middleIndex = (firstIndex + lastIndex) / 2
VC__inPlaceMergeSort(firstIndex, middleIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
VC__inPlaceMergeSort(middleIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
VC__inPlaceMerge(firstIndex, middleIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
EndIf
EndProcedure
;=================================================================================
; Method: Sorts the array elements of size [elementSize] beginning at [*arrayBase]
; from [firstIndex] to [lastIndex] according to [option] and the data at
; the structure offset [offset] using function referenced by [cType].
;
; Notes: [cType] is a compare type defined in the InitMergeSort routine that will
; determine the compare function used to complete the sort. Each of the
; compare types listed in the PB Manual under SortStructuredArray are
; permitted as well as any custom types that are defined.
; The [option] can be either #PB_Sort_Ascending or #PB_SortDescending.
; The sort is stable and performed in-place.
;=================================================================================
Procedure VC_structuredSort(*arrayBase, option, arraySize, elementSize, offset, cType, firstIndex, lastIndex)
;begin parameter check
Protected *compFunc = comparisonType(Str(cType))
If *compFunc = 0: ProcedureReturn 0: EndIf ;invalid ctype
If arraySize < 0: ProcedureReturn 0: EndIf ;no elements
If firstIndex < 0 Or firstIndex > arraySize Or firstIndex >= lastIndex Or lastIndex > arraySize: ProcedureReturn -1: EndIf ;invalid indexes
If elementSize <= 0: ProcedureReturn 0: EndIf ;elementSize is invalid
If offset <0 Or offset >= elementSize: ProcedureReturn 0: EndIf ;offset is invalid
If option < #PB_Sort_Ascending Or option > #PB_Sort_Descending: ProcedureReturn 0: EndIf ;invalid option
;set sortOrder
Protected sortOrder
Select option
Case #PB_Sort_Descending
sortOrder = #VC_CompareIsGreaterThan
Case #PB_Sort_Ascending
sortOrder = #VC_CompareIsLessThan
Default
ProcedureReturn ;#PB_NoCase is not handled
EndSelect
Protected *offsetArrayBase = *arrayBase + offset
VC__inPlaceMergeSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
ProcedureReturn 1 ;succesful completion
EndProcedure
;comparison functions
Macro mcr_compareElements_StandardTypes(type,subStructure)
Procedure VC_compare_#type(*element_1.type, *element_2.type)
If *element_1\subStructure > *element_2\subStructure
ProcedureReturn #VC_CompareIsGreaterThan
ElseIf *element_1\subStructure < *element_2\subStructure
ProcedureReturn #VC_CompareIsLessThan
Else
ProcedureReturn #VC_CompareIsEqualTo
EndIf
EndProcedure
EndMacro
mcr_compareElements_StandardTypes(byte,b)
mcr_compareElements_StandardTypes(word,w)
mcr_compareElements_StandardTypes(long,l)
mcr_compareElements_StandardTypes(quad,q)
mcr_compareElements_StandardTypes(float,f)
mcr_compareElements_StandardTypes(double,d)
mcr_compareElements_StandardTypes(ascii,a)
mcr_compareElements_StandardTypes(unicode,u)
mcr_compareElements_StandardTypes(string,s)
DataSection
;custom types may be added. Care must be taken to avoid duplication of cType #'s.
mergeSortelementTypes:
Data.i 10 ;count of cTypes listed below (1 based)
;cType, functionAddr
Data.i #PB_Byte, @vc_compare_byte()
Data.i #PB_Word, @vc_compare_word()
Data.i #PB_Long, @vc_compare_long()
Data.i #PB_Integer ;special case
;#PB_Integer's function is determined by the following compiler directives
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
Data.i @vc_compare_quad()
CompilerElse
Data.i @vc_compare_long()
CompilerEndIf
Data.i #PB_Quad, @vc_compare_quad()
Data.i #PB_String, @vc_compare_string() ;works with .s, $, or .String but not fixed string
Data.i #PB_Float, @vc_compare_float()
Data.i #PB_Double, @vc_compare_double()
Data.i #PB_Character ;special case
;#PB_Character's function is determined by the following compiler directives
CompilerIf #PB_Compiler_Unicode
Data.i @vc_compare_unicode()
CompilerElse
Data.i @vc_compare_ascii()
CompilerEndIf
Data.i #PB_Ascii, @vc_compare_ascii()
Data.i #PB_Unicode, @vc_compare_unicode()
EndDataSection
Procedure InitMergeSort()
Protected count, i, cType, funcAdr
Restore mergeSortElementTypes
Read count
For i = 1 To count
Read cType
Read funcAdr
comparisonType(Str(cType)) = funcAdr
Next
ProcedureReturn 1 ; success
EndProcedure

Code: Select all
XIncludeFile "StableMergeSort.pbi"
InitMergeSort()
Structure TestResults
name.s
score.i
originalOrder.i
EndStructure
Procedure setOrder(Array results.TestResults(1))
Protected i
For i = 0 To ArraySize(results())
results(i)\originalOrder = i
Next
EndProcedure
Procedure Showit(Array results.TestResults(1))
Protected i
Debug "ID# name score"
For i = 0 To ArraySize(results())
Debug RSet(Str(results(i)\originalOrder),4," ") + ": " + results(i)\name + " " + Str(results(i)\score)
Next
EndProcedure
#testCount = 50
Dim a.TestResults(#testCount)
Define i, j, name.s, numTests
i = 0
While i <= #testCount
name = Chr(65 + Random(25)) + Chr(65 + Random(25)) + Chr(65 + Random(25)) + Chr(65 + Random(25))
numTests = Random(10)
If numTests > #testCount - i + 1
numTests = #testCount - i + 1
EndIf
For j = 0 To numTests - 1
a(i + j)\name = name
a(i + j)\score = (Random(7) * 10) + 30
Next
i + numTests
Wend
setOrder(a())
Debug ""
Debug "------- Original created order ------------"
Showit(a())
;sort on each desired index in reverse order of importance (i.e. by score, then by name)
;Because a stable sort is not needed for the least important index
;PB's structuredSort will be used, it's not stable but it is quick.
setOrder(a())
SortStructuredArray(a(),#PB_Sort_Ascending,OffsetOf(TestResults\score),#PB_Integer)
Debug ""
Debug "------- Results of PB's sortStructuredArray() by score ------------"
Showit(a())
;Now re-sort by each subsequent index in order of increasing importance
setOrder(a())
VC_structuredSort(a(), #PB_Sort_Ascending, ArraySize(a()), SizeOf(TestResults), OffsetOf(TestResults\name), #PB_String, 0, ArraySize(a()))
Debug ""
Debug "-------Results of StructuredMergeSort() by name ------------"
Showit(a())