Radix sort

Share your advanced PureBasic knowledge/code with the community.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Radix sort

Post by Trond »

This is a very cool sorting algorithm. It's not as fast as PB's sort, but not exactly slow either. Unfortunately it sorts negative numbers in reverse (or actually, the positive numbers are sorted in reverse, but if you reverse this, then the negative numbers come out in reverse). But I bet this can be fixed.

Algorithm
1. Sort the list in place based on the most significant ("leftmost") bit. This splits the list in two (the entries with this bit=1 is list 1, the entries with this bit=0 is list 2).
2. Use the first step of this algorithm on both list 1 and 2, except use the next most significant bit (one bit "to the right" of the one we used in the previous step).

So how do we sort the list based on a particular bit? We use another sort, I called it "bit selection sort", it's selection sort, but because the keys are only one bit long, we can do some special optimizations.

Selection sort works like this:
0. Set I = 0
1. Find the maximum item in the unsorted part of the list, and swap this with the item at position I
2. Increase I by 1
3. If we are not at the end of the list yet, go to step 1.

But since we sorting on bits, we have an advantage: since the maximum value of a bit is 1, we know immediately if find a bit that's 1, that no other item has a higher bit value. So we don't have to traverse the entire unsorted part of the list to find each maximum value. We just have to find ANY entry with bit value 1. So it's faster.
To avoid unnecessary swapping, we start looking for the "maximum" (bit = 1) bit from the END of the list. This optimization will cause the sort to unstable, so if you need a fast stable sort you could probably get that by looking from the current position (I) instead of from the end of the list (I didn't test it, though).

Note that by searching the internet, you may come across descriptions of "radix sort" which uses pigeonhole sort instead of bit selection sort. This would be completely ok except that they don't explain which part of the code is pigeonhole sort and which part is the actual radix sort, and just call everything "radix sort". So don't let this confuse you.

Code: Select all

Procedure BitSelectionSort(Array Arr.l(1), Start, Stop, Bitmask)

  Fill = Start
  Look = Stop
  
  While Arr(Fill) & Bitmask
    Fill + 1
    If Fill > Stop
      ProcedureReturn Fill
    EndIf
  Wend
  
  While Look > Fill And Fill <= Stop
    If Arr(Look) & Bitmask
      Swap Arr(Fill), Arr(Look)
      Fill + 1
      While Arr(Fill) & Bitmask And Fill <= Stop
        Fill + 1
      Wend
    EndIf
    Look - 1
  Wend
  
  ProcedureReturn Fill
EndProcedure

Procedure RadixSortIn(Array Arr.l(1), Start, Stop, Bitmask.l)
  If Bitmask
    Pivot = BitSelectionSort(Arr(), Start, Stop, Bitmask)
    Bitmask = (Bitmask >> 1) & $7FFFFFFF ; & 7f to compensate for the use of arithmetic shift
    If Pivot - Start > 1
      RadixSortIn(Arr(), Start, Pivot-1, Bitmask)
    EndIf
    If Stop - Pivot > 0
      RadixSortIn(Arr(), Pivot, Stop, Bitmask)
    EndIf
  EndIf
EndProcedure

Procedure RadixSort(Array Arr.l(1))
  RadixSortIn(Arr(), 0, ArraySize(Arr()), 1 << 31)
EndProcedure

L = 6000000
Dim Bubble.l(L)
Dim PB.l(L)
Dim Radix.l(L)

For I = 0 To L
  Bubble(I) = Random(2147483647)
Next

For I = 0 To L
  Radix(I) = Bubble(I)
  PB(I) = Bubble(I)
Next


time = ElapsedMilliseconds()
RadixSort(Radix())
MessageRequester("", Str(ElapsedMilliseconds()-time))

time = ElapsedMilliseconds()
SortArray(PB(), #PB_Sort_Descending)
MessageRequester("", Str(ElapsedMilliseconds()-time))

For i = 0 To L
  If Radix(I) <> PB(I)
    MessageRequester("", "Error in sort at " + Str(I))
    End
  EndIf
Next