Stable in-place MergeSort for arrays

Share your advanced PureBasic knowledge/code with the community.
User avatar
Demivec
Addict
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Stable in-place MergeSort for arrays

Post by Demivec »

Code updated for 5.20+

Here is a stable in-place MergeSort routine for structured arrays or similarly structured copies of arrays.

It is coded for v4.41 and supports all of the standard sort types. It does not allow sorts using the #PB_NoCase option, though it is possible to define additional search comparison types (at compile-time).

I ripped this out of some other code I had been writing and made it presentable. Hopefully there aren't any rough edges.

See the comments at the codes beginning about credit for contributions, great and small.

Code: Select all

;Programmed by:  Demivec (Jared Johnson) with major contributions from others
;File: "StableMergeSort.pbi" v1.0
;Date: 2/11/2010
;Coded for PureBasic v4.41
;
;Description: performs a stable in-place mergeSort on an array of structured elements (or
;  an equivalently structured memory buffer)
;
;Large portions of code dealing with the inPlaceMergeSort were based on code from
;"Fastest In-Place Stable Sort" by Craig Brown posted on The Code Project which was based
;on an "algorithm taken from Thomas Baudel's recursive C version 
;http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html " and was
;re-written by Craig Brown to improve clarity.  I've converted Craig Brown's version 
;(writtin in a version of Basic) for use in PureBasic with some slight changes that
;mostly involved changes to variable names and some minor restructuring.
;
;The VC__swapMemory() function was written by Thorium and shared on the PureBasic
;programming forum.

;===================================================================================
;Include "StableMergeSort.pbi" ;use path and name of this file
;
;InitMergeSort() ;if result of InitMergeSort() = 1 then it is ready for use.
;===================================================================================
;Call it by using
;
;VC_structuredSort(*array, options, arraySize, elementSize, offset, cType, firstIndex, LastIndex)
;
;All parameters are required. [cType] is a sort type (i.e. #PB_Integer), and
;[option] can be either #PB_Sort_Ascending or #PB_Sort_Descending.
;
;;For example, if an array 'Results' of structure 'Test' is defined as
; 
; Structure Test
;   name.s
;   score.i
; EndStructure
;
; Dim Results.Test(100)
;
;;It could be sorted using
;
;VC_structuredSort(Results(), #PB_Sort_Ascending, ArraySize(Results()), Sizeof(Test), OffsetOf(Test\score), #PB_Integer, 0, 100)
;
;===================================================================================


Enumeration   
  ;The vecClass standard compare functions return one of the following comparison
  ;results.  User-Defined comparison functions should also return these values.
  #VC_CompareIsLessThan = -1
  #VC_CompareIsEqualTo
  #VC_CompareIsGreaterThan
EndEnumeration

; ------ constant for internal use only ------
#_VC_CompareToggleValue = #VC_CompareIsGreaterThan ! #VC_CompareIsLessThan ;used to toggle between a lessThan result and a greaterThan result(i.e. result ! #_VC_CompareToggleValue) 

Global NewMap comparisonType()

Prototype pf_CompElementFunc(*element_1, *element_2) ;*element's will be typed in the corresponding function being prototyped.

;=================================================================================
; Description: An internal routine for swapping memory in place between two memory 
;              blocks without needing any additional memory buffer space.
;
; Notes: Used by the Swap() method and procedures handling the MergeSort and
;        InsertionSort routines.
;        (swapMemory method authored by Thorium of the PureBasic forum)
;=================================================================================
Procedure VC__swapMemory(*sourceMemory, *destinationMemory, memoryLength)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86   
    !PUSH Esi
    !PUSH Edi
    
    !MOV Esi, [p.p_sourceMemory+8]
    !MOV Edi, [p.p_destinationMemory+8]
    !MOV Ecx, [p.v_memoryLength+8]
    
    !PUSH Ecx
    !SHR Ecx, 2
    
    !align 4
    !MemorySwapASM3_LoopStart1:
    !CMP Ecx, 0
    !JE MemorySwapASM3_LoopEnd1
    
    !MOV Eax, [Esi]
    !MOV Edx, [Edi]
    !MOV [Esi], Edx
    !MOV [Edi], Eax
    !ADD Esi, 4
    !ADD Edi, 4
    !DEC Ecx
    
    !JMP MemorySwapASM3_LoopStart1
    !MemorySwapASM3_LoopEnd1:
    
    !POP Ecx
    !AND Ecx, 3
    
    !MemorySwapASM3_LoopStart2:
    !CMP Ecx, 0
    !JE MemorySwapASM3_LoopEnd2
    
    !MOV al, [Esi]
    !MOV dl, [Edi]
    !MOV [Esi], dl
    !MOV [Edi], al
    !INC Esi
    !INC Edi
    !DEC Ecx
    
    !JMP MemorySwapASM3_LoopStart2
    !MemorySwapASM3_LoopEnd2:
    
    !POP Edi
    !POP Esi
  CompilerElse
    !MOV r8, [p.p_sourceMemory]
    !MOV r9, [p.p_destinationMemory]
    !MOV rcx, [p.v_memoryLength]
    
    !MOV r10, rcx
    !SHR rcx, 3
    
    !align 4
    !MemorySwapASM3_LoopStart1:
    !CMP rcx, 0
    !JE MemorySwapASM3_LoopEnd1
    
    !MOV rax, [r8]
    !MOV rdx, [r9]
    !MOV [r8], rdx
    !MOV [r9], rax
    !ADD r8, 8
    !ADD r9, 8
    !DEC rcx
    
    !JMP MemorySwapASM3_LoopStart1
    !MemorySwapASM3_LoopEnd1:
    
    !AND r10, 7
    
    !MemorySwapASM3_LoopStart2:
    !CMP r10, 0
    !JE MemorySwapASM3_LoopEnd2
    
    !MOV al, [r8]
    !MOV dl, [r9]
    !MOV [r8], dl
    !MOV [r9], al
    !INC r8
    !INC r9
    !DEC r10
    
    !JMP MemorySwapASM3_LoopStart2
    !MemorySwapASM3_LoopEnd2:
  CompilerEndIf
EndProcedure  

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: Used by procedures handling the MergeSort routine.
;        Sorts array elements using an insertion sort.
;=================================================================================
Procedure VC__insertSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
  Protected i, j, tw, tw2
  
  If lastIndex > firstIndex + 1
    i = firstIndex + 1
    While i <= lastIndex
      j = i
      tw = *offsetArrayBase + j * elementSize ;slight speed optimization to use stepping instead of multiplication inside loop
      While j > firstIndex
        If compFunc(tw, tw - elementSize) = sortOrder
          tw2 = *arrayBase + j * elementSize
          VC__swapMemory(tw2, tw2 - elementSize, elementSize)
        Else
          Break
        EndIf
        j - 1
        tw - elementSize
      Wend
      i + 1
    Wend
  EndIf
EndProcedure

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Returns: Returns the index of the array element that is >= the data at [*value]
;          if sorting in ascending order or <= the data at [*value] if sorting
;          in descending order.
;
; Notes: Used by procedures handling the MergeSort routine.
;=================================================================================
Procedure VC__binarySearchForFirstElementGEValue(*value, firstIndex, lastIndex, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
  Protected middleIndex
  
  lastIndex - 1
  While (lastIndex - firstIndex) > 1
    middleIndex = (firstIndex + lastIndex) / 2
    If compFunc(*offsetArrayBase + middleIndex * elementSize, *value) <> sortOrder
      lastIndex = middleIndex
    Else
      firstIndex = middleIndex
    EndIf
  Wend
  
  If compFunc(*offsetArrayBase + firstIndex * elementSize, *value) <> sortOrder
    ProcedureReturn firstIndex
  Else
    If compFunc(*offsetArrayBase + lastIndex * elementSize, *value) <> sortOrder
      ProcedureReturn lastIndex
    Else
      ProcedureReturn lastIndex + 1
    EndIf
  EndIf
EndProcedure

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Returns: Returns the index of the array element that is > the data at [*value]
;          while sorting in ascending order and < the data at [*value] if sorting
;          in descending order.
;
; Notes: Used by procedures handling the MergeSort routine.
;=================================================================================
Procedure VC__binarySearchForFirstElementGTValue(*value, firstIndex, lastIndex, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
  Protected middleIndex
  
  lastIndex - 1
  While (lastIndex - firstIndex) > 1
    middleIndex = (firstIndex + lastIndex) / 2
    If compFunc(*offsetArrayBase + middleIndex * elementSize, *value) = sortOrder
      lastIndex = middleIndex
    Else
      firstIndex = middleIndex
    EndIf
  Wend
  
  If compFunc(*offsetArrayBase + firstIndex * elementSize, *value) = sortOrder
    ProcedureReturn firstIndex
  Else
    If compFunc(*offsetArrayBase + lastIndex * elementSize, *value) = sortOrder
      ProcedureReturn lastIndex
    Else
      ProcedureReturn lastIndex + 1
    EndIf
  EndIf
EndProcedure

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Returns: Returns the greatest common denominator of the integers [m] and [n].
;
; Notes: Used by procedures handling the MergeSort routine.
;=================================================================================
Procedure VC__greatestCommonDenominator(m, n)
  Protected T
  
  While n <> 0
    T = m % n
    m = n
    n = T
  Wend
  ProcedureReturn m
EndProcedure

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: Used by procedures handling the MergeSort routine.
;        Moves a block of memory forward while moving another block of memory
;        backwards to make room.  The blocks can be different sizes.
;=================================================================================
Procedure VC__pushBlock(block1Start, block1Length, block1Dest, *arrayBase, elementSize)
  ;Function pushes a block of memory forward and another block backwards (to make room).
  ;block1Start,block1Length,block1Dest,block2Length are all array indexes ranging from 0 -> (arraySize)
  Protected block2Length
  Protected loopNo
  Protected currentStartIndex, currentIndex, sourceIndex
  
  If block1Length = 0 Or block1Dest = block1Start
    ProcedureReturn
  EndIf
  
  ;Block 1 goes forwards, block 2 goes backwards.
  ;Calculate size of the block moving backwards.
  block2Length = block1Dest - block1Start
  ;use a slight speed optimization of doing multiplication outside the loop and
  ;only stepping while in the loop
  loopNo = VC__greatestCommonDenominator(block1Length + block2Length, block1Length) * elementSize
  block1Start * elementSize
  block1Dest * elementSize
  block1Length * elementSize
  block2Length * elementSize
  While loopNo > 0
    loopNo - elementSize
    
    currentStartIndex = block1Start + loopNo
    currentIndex = currentStartIndex
    sourceIndex = currentIndex + block1Length
    While sourceIndex <> currentStartIndex
      VC__swapMemory(*arrayBase + sourceIndex, *arrayBase + currentIndex, elementSize)
      currentIndex = sourceIndex
      If currentIndex >= block1Dest
        sourceIndex - block2Length
      Else
        sourceIndex + block1Length
      EndIf
    Wend
  Wend
EndProcedure

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: Used by procedures handling the MergeSort routine.
;        Merges 2 sorted 'streams' into a larger stream using recursion.
;=================================================================================
Procedure VC__inPlaceMerge(startFirstStream, startSecondStream, endSecondStream, *arrayBase, *offsetArrayBase, elementSize, compFunc.pf_CompElementFunc, sortOrder)
  Protected firstBlockStart, firstBlockLength, secondBlockStart, secondBlockLength
  Protected lengthFirstStream = startSecondStream - startFirstStream
  Protected lengthSecondStream = endSecondStream - startSecondStream
  If lengthFirstStream <> 0 And lengthSecondStream <> 0
    If lengthFirstStream + lengthSecondStream = 2
      If compFunc(*offsetArrayBase + startSecondStream * elementSize, *offsetArrayBase + startFirstStream * elementSize) = sortOrder
        VC__swapMemory(*arrayBase + startSecondStream * elementSize, *arrayBase + (startFirstStream) * elementSize, elementSize)
      EndIf
    Else
      If lengthFirstStream > lengthSecondStream
        ;First block starts half way through the first stream
        ;and continues to the end of the first stream.
        firstBlockStart = startFirstStream + (lengthFirstStream / 2)
        firstBlockLength = startSecondStream - firstBlockStart
        ;Second block starts at the start of the second stream
        ;and ends at a point so that everything in the first block
        ;should come after everything in the second block.
        secondBlockStart = startSecondStream
        secondBlockLength = VC__binarySearchForFirstElementGEValue(*offsetArrayBase + firstBlockStart * elementSize, startSecondStream, endSecondStream, *offsetArrayBase, elementSize, compFunc, sortOrder) - secondBlockStart
      Else
        ;Second block starts at the start of the second stream
        ;and continues to half way through the second stream.
        secondBlockStart = startSecondStream
        secondBlockLength = lengthSecondStream / 2
        ;First block starts so that everything in the first block should
        ;come after everything in the second block.
        firstBlockStart = VC__binarySearchForFirstElementGTValue(*offsetArrayBase + (secondBlockStart + secondBlockLength) * elementSize, startFirstStream, startSecondStream, *offsetArrayBase, elementSize, compFunc, sortOrder ! #_VC_CompareToggleValue)
        firstBlockLength = startSecondStream - firstBlockStart
      EndIf
      VC__pushBlock(firstBlockStart, firstBlockLength, secondBlockStart + secondBlockLength - firstBlockLength, *arrayBase, elementSize)
      VC__inPlaceMerge(startFirstStream, firstBlockStart, firstBlockStart + secondBlockLength, *arrayBase, *offsetArrayBase, elementSize, compFunc, sortOrder)
      
      VC__inPlaceMerge(secondBlockStart + secondBlockLength - firstBlockLength, secondBlockStart + secondBlockLength, endSecondStream, *arrayBase, *offsetArrayBase, elementSize, compFunc, sortOrder)
    EndIf
  EndIf
EndProcedure

;=================================================================================
; Description: Part of the structuredSort() method.
;
; Notes: This procedure handles the MergeSort routines.
;        Performs a stable in-place merge sort using [*compFunc] and based on 
;        [sortOrder] using recursion.
;=================================================================================
Procedure VC__inPlaceMergeSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
  Protected middleIndex
  If lastIndex - firstIndex < 12
    VC__insertSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
  Else
    middleIndex = (firstIndex + lastIndex) / 2
    VC__inPlaceMergeSort(firstIndex, middleIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
    VC__inPlaceMergeSort(middleIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
    VC__inPlaceMerge(firstIndex, middleIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
  EndIf
EndProcedure

;=================================================================================
; Method: Sorts the array elements of size [elementSize] beginning at [*arrayBase]
;         from [firstIndex] to [lastIndex] according to [option] and the data at
;         the structure offset [offset] using function referenced by [cType].
;
; Notes: [cType] is a compare type defined in the InitMergeSort routine that will
;        determine the compare function used to complete the sort. Each of the
;        compare types listed in the PB Manual under SortStructuredArray are
;        permitted as well as any custom types that are defined.
;        The [option] can be either #PB_Sort_Ascending or #PB_SortDescending.
;        The sort is stable and performed in-place.        
;=================================================================================
Procedure VC_structuredSort(*arrayBase, option, arraySize, elementSize, offset, cType, firstIndex, lastIndex)
  ;begin parameter check
  Protected *compFunc = comparisonType(Str(cType))
  If *compFunc = 0: ProcedureReturn 0: EndIf ;invalid ctype
  If arraySize < 0: ProcedureReturn 0: EndIf ;no elements
  If firstIndex < 0 Or firstIndex > arraySize Or firstIndex >= lastIndex Or lastIndex > arraySize: ProcedureReturn  -1: EndIf ;invalid indexes
  If elementSize <= 0: ProcedureReturn 0: EndIf ;elementSize is invalid
  If offset <0 Or offset >= elementSize: ProcedureReturn 0: EndIf ;offset is invalid
  If option < #PB_Sort_Ascending Or option > #PB_Sort_Descending: ProcedureReturn 0: EndIf ;invalid option
  
  
  ;set sortOrder
  Protected sortOrder
  Select option
    Case #PB_Sort_Descending
      sortOrder = #VC_CompareIsGreaterThan
    Case #PB_Sort_Ascending
      sortOrder = #VC_CompareIsLessThan
    Default
      ProcedureReturn ;#PB_NoCase is not handled
  EndSelect
  
  Protected *offsetArrayBase = *arrayBase + offset
  VC__inPlaceMergeSort(firstIndex, lastIndex, *arrayBase, *offsetArrayBase, elementSize, *compFunc, sortOrder)
  
  ProcedureReturn 1 ;succesful completion
EndProcedure


;comparison functions
Macro mcr_compareElements_StandardTypes(type,subStructure)
  Procedure VC_compare_#type(*element_1.type, *element_2.type)
    If *element_1\subStructure > *element_2\subStructure
      ProcedureReturn #VC_CompareIsGreaterThan
    ElseIf *element_1\subStructure < *element_2\subStructure
      ProcedureReturn #VC_CompareIsLessThan
    Else 
      ProcedureReturn #VC_CompareIsEqualTo
    EndIf
  EndProcedure
EndMacro

mcr_compareElements_StandardTypes(byte,b)
mcr_compareElements_StandardTypes(word,w)
mcr_compareElements_StandardTypes(long,l)
mcr_compareElements_StandardTypes(quad,q)
mcr_compareElements_StandardTypes(float,f)
mcr_compareElements_StandardTypes(double,d)
mcr_compareElements_StandardTypes(ascii,a)
mcr_compareElements_StandardTypes(unicode,u)
mcr_compareElements_StandardTypes(string,s)

DataSection
  ;custom types may be added.  Care must be taken to avoid duplication of cType #'s.
  mergeSortelementTypes:
  Data.i 10 ;count of cTypes listed below (1 based)
  ;cType, functionAddr
  Data.i #PB_Byte, @vc_compare_byte()
  Data.i #PB_Word, @vc_compare_word()
  Data.i #PB_Long, @vc_compare_long()
  Data.i #PB_Integer ;special case
  ;#PB_Integer's function is determined by the following compiler directives
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    Data.i @vc_compare_quad()
  CompilerElse
    Data.i @vc_compare_long()
  CompilerEndIf 
  
  Data.i #PB_Quad, @vc_compare_quad()
  Data.i #PB_String, @vc_compare_string() ;works with .s, $, or .String but not fixed string
  Data.i #PB_Float, @vc_compare_float()
  Data.i #PB_Double, @vc_compare_double()
  
  Data.i #PB_Character ;special case
  ;#PB_Character's function is determined by the following compiler directives
  CompilerIf #PB_Compiler_Unicode
    Data.i @vc_compare_unicode()
  CompilerElse
    Data.i @vc_compare_ascii()
  CompilerEndIf 
  
  Data.i #PB_Ascii, @vc_compare_ascii()
  Data.i #PB_Unicode, @vc_compare_unicode()
EndDataSection

Procedure InitMergeSort()
  Protected count, i, cType, funcAdr

  Restore mergeSortElementTypes
  Read count
  For i = 1 To count
    Read cType
    Read funcAdr
    comparisonType(Str(cType)) = funcAdr
  Next
  
  ProcedureReturn 1 ; success
EndProcedure
And a sloppy example to go with it :D :

Code: Select all

XIncludeFile "StableMergeSort.pbi"
InitMergeSort()

Structure TestResults
  name.s
  score.i
  originalOrder.i
EndStructure

Procedure setOrder(Array results.TestResults(1))
  Protected i
  
  For i = 0 To ArraySize(results())
    results(i)\originalOrder = i
  Next
EndProcedure

Procedure Showit(Array results.TestResults(1))
  Protected i
  
  Debug "ID# name  score"
  For i = 0 To ArraySize(results())
    Debug RSet(Str(results(i)\originalOrder),4," ") + ": " + results(i)\name + "  " + Str(results(i)\score)
  Next
EndProcedure

#testCount = 50
Dim a.TestResults(#testCount)

Define i, j, name.s, numTests

i = 0
While i <= #testCount
   name = Chr(65 + Random(25)) + Chr(65 + Random(25)) + Chr(65 + Random(25)) + Chr(65 + Random(25))
  
  numTests = Random(10)
  If numTests > #testCount - i + 1
    numTests = #testCount - i + 1
  EndIf
  
  For j = 0 To numTests - 1
    a(i + j)\name = name
    a(i + j)\score = (Random(7) * 10) + 30
  Next 
  
  i + numTests 
Wend 

setOrder(a())
Debug ""
Debug "------- Original created order ------------"
Showit(a())

;sort on each desired index in reverse order of importance (i.e. by score, then by name)

;Because a stable sort is not needed for the least important index
;PB's structuredSort will be used, it's not stable but it is quick.  
setOrder(a())
SortStructuredArray(a(),#PB_Sort_Ascending,OffsetOf(TestResults\score),#PB_Integer)
Debug ""
Debug "------- Results of PB's sortStructuredArray() by score ------------"
Showit(a())

;Now re-sort by each subsequent index in order of increasing importance
setOrder(a())
VC_structuredSort(a(), #PB_Sort_Ascending, ArraySize(a()), SizeOf(TestResults), OffsetOf(TestResults\name), #PB_String, 0, ArraySize(a()))
Debug ""
Debug "-------Results of StructuredMergeSort() by name ------------"
Showit(a())

User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5494
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Stable in-place MergeSort for arrays

Post by Kwai chang caine »

A "little bit" late...but it's never really to late for a big thanks :wink:
Works very well here
I found it just now :oops: furthermore, when i search another thing
But like i often use arrays, i keep it preciously
Thanks for sharing, Have a good end of year 8)
ImageThe happiness is a road...
Not a destination
Post Reply