Page 4 of 5

Re: Optimization of a sorting algorithm

Posted: Sat Oct 07, 2023 2:11 pm
by Kurzer
Kurzer wrote: Sat Oct 07, 2023 11:38 am... so far. Image
I have now fixed the incorrect sorting in V6 sorting. Yuki, your data is now sorted correctly by V6 as well.

For sorting, I have divided the Sort value into an odd slot (999999, 999997, 999995, etc.) and an even slot (999998, 999996, 999994, etc.).

When processing the entries, RefEntries always get an odd sort value, while the directly parent entry always gets an even sort value. Of course there can be duplications, because with iterative referencing a RefEntry can be a parent entry at the same time, but two directly related entries never get the same sort value.

I tried to optimize the procedure even more by storing the sort values directly in the list, as far as it was possible. So that when transferring the sort values from the map to the list later, not all values have to be transferred.

V6 is similar to your latest V4. I also have two list passes, as you did in V4. But although in V4 you do not need the subsequent copying of the sort values from the map to the list and V4 seems less complicated than V6, V6 performs better than all other versions up to about 5500 entries. For me it is not quite clear why this is so.
Yuki wrote: Relying on the dirtied map for sort logic has the downside that whenever elements change, you have to properly invalidate the map. Affecting only the changed element might be coincidentally safe in some cases, but you'd really have to walk relations to be sure, which is relatively costly with the existing structure. This is good if your list is fixed after creation, but otherwise may cost more in the long run.
Yes, that is absolutely correct.
However, due to the use case (arranging GUI elements in a window), I would limit myself to re-sorting the list when (subsequently) adding or removing a gadget.

I'm not sure yet how to implement this, because it makes absolutely no sense to re-sort the entire list internally for *every* AddGadget() or RemoveGadget() function. Even if the sorting is lightning fast by now, it would still be completely wasted resources for no reason.

I'm still thinking about whether I should provide the user with a separate command for this (e.g. InitGadgetlist() )... or whether I should let the module's resize procedure do the list sorting and detect whether the gadget list is invalid or not yet initialized. Then the sorting would be done completely transparently in the background, which would benefit the user-friendliness of the module.

For this I would need a fast method to hash all entries in the list and then compare this hash value with a stored hash value generated during the last sort.

Here now the actual V6 procedure and a few timing measurements.

Code: Select all

Procedure.i SortList_V6()
	Protected.i *RefEntry.MyStruct, iSort = 999998
	
	With StructList()
		ForEach StructList()
			; We process only the entries with a RefEntry
			If \RefEntry <> ""
				*RefEntry = FindMapElement(StructMap(), \RefEntry)
				; Invalid reference entry detected
				If *RefEntry = 0
					ProcedureReturn -1
				EndIf
				If *RefEntry\Sort = 0
					; The Entry with a not yet processed RefEntry gets the current sort value
					; Only the even slots are used for this (999998, 999996, 999994, etc.)
					\Sort = iSort
					StructMap(\Entry)\Sort = iSort
					iSort - 2
				Else
 					; If the RefEntry has already been edited, then the parent element gets
 					; the next higher sort value in the odd slots (999999, 999997, 999995, etc.) 
					\Sort = *RefEntry\Sort + 1
					StructMap(\Entry)\Sort = \Sort
				EndIf
			Else
				; Entries without RefEntry are marked with sort = 1
				\Sort = 1
			EndIf
		Next
	EndWith
	
	ForEach StructList()
		; We update the list, but only the entries with sort = 0 to save some execution time.
		; Only these entries contain uninitialized data.
		; All other list entries have already been updated in the loop above.
		If StructList()\Sort = 0
			StructList()\Sort = StructMap(StructList()\Entry)\Sort
		EndIf
	Next		
	
	SortStructuredList(StructList(), #PB_Sort_Ascending, OffsetOf(MyStruct\Sort), TypeOf(MyStruct\Sort))
	ProcedureReturn 0
	
EndProcedure

Code: Select all

Listsize: 24
V3 Algorithm: 14 microseconds.
V4 Algorithm: 6 microseconds.
V5 Algorithm: 18 microseconds.
V5 extended: 22 microseconds.
V6 Algorithm: 3 microseconds.

Listsize: 96
V3 Algorithm: 49 microseconds.
V4 Algorithm: 21 microseconds.
V5 Algorithm: 50 microseconds.
V5 extended: 57 microseconds.
V6 Algorithm: 17 microseconds.

Listsize: 240
V3 Algorithm: 102 microseconds.
V4 Algorithm: 41 microseconds.
V5 Algorithm: 71 microseconds.
V5 extended: 83 microseconds.
V6 Algorithm: 34 microseconds.

Listsize: 2400
V3 Algorithm: 1680 microseconds.
V4 Algorithm: 663 microseconds.
V5 Algorithm: 718 microseconds.
V5 extended: 738 microseconds.
V6 Algorithm: 533 microseconds.

Listsize: 5520
V3 Algorithm: 6409 microseconds.
V4 Algorithm: 2195 microseconds.
V5 Algorithm: 1818 microseconds.
V5 extended: 1853 microseconds.
V6 Algorithm: 1870 microseconds.

Re: Optimization of a sorting algorithm

Posted: Sat Oct 07, 2023 5:07 pm
by yuki
Kurzer wrote: Sat Oct 07, 2023 2:11 pm I have now fixed the incorrect sorting in V6 sorting. Yuki, your data is now sorted correctly by V6 as well.
I hate to be poo-poo bearer of bad news, but it seems this case sorts strangely:

Code: Select all

AddElement(StructList()) : StructList()\Entry = "L1"
AddElement(StructList()) : StructList()\Entry = "L2" : StructList()\RefEntry = "L1"
AddElement(StructList()) : StructList()\Entry = "L4" : StructList()\RefEntry = "L3"
AddElement(StructList()) : StructList()\Entry = "L3" : StructList()\RefEntry = "L2"
(sorts to: L1, L4, L2, L3, instead of: L1, L2, L3, L4)

Given the flat list of entries, collecting chains of arbitrary depth with just two iterations would be difficult unless depth/relation data is present on the entries themselves. For this reason, V4 and V5 continuously iterate until a solution is found (or bail if none exists).
Kurzer wrote: Sat Oct 07, 2023 2:11 pm Yes, that is absolutely correct.
However, due to the use case (arranging GUI elements in a window), I would limit myself to re-sorting the list when (subsequently) adding or removing a gadget.

I'm not sure yet how to implement this, because it makes absolutely no sense to re-sort the entire list internally for *every* AddGadget() or RemoveGadget() function. Even if the sorting is lightning fast by now, it would still be completely wasted resources for no reason.
Yeah! I'd usually defer this sort of thing. Instead of re-sorting on each GUI element change, you could just have any such changes set an isDirty = #True.

Then, on the next render frame, if isDirty is true, re-sorting and other housekeeping can be done before necessary changes are applied to presentation in bulk. (Though, you might just use a list/map of change states to serve as isDirty instead)

Though, this assumes presentation happens at set frame points rather than immediately, which you might not find desirable. It can be a pretty good way to prevent flickering/partial changes though (double-buffering, and either explicitly painting on request or doing it implicitly in your event-polling function, under the reasoning that almost all changes happen in-between event-poll calls).

Re: Optimization of a sorting algorithm

Posted: Sat Oct 07, 2023 9:07 pm
by Kurzer
Yuki, thanks for your further test, which showed that V6 is still not working properly.
I will use your V4 sorting until further notice. But at the moment some other problems arise while working on the WinHandler modul.

I think I have to completely rebuild the internal management of the gadgets after all, so that the data is not kept in a map but directly in a list (which I can then sort). For random access I will then create a lookup map that contains the name of the gadgt and the pointer to the associated list entry.

Conceptually, I also need to think about what to do when a gadget is removed that is referenced by another gadget. The sort procedure probably won't be the only thing needed for these reference dependencies. I'll have to sleep on it. The module was not originally designed for references to other gadgets, so I should think again about the whole concept and implementation.

But I am glad that I can finally finish the list sorting. I think that V4 is by now an optimal solution, which (as you can see from my two failed attempts) cannot be surpassed so easily.

Re: Optimization of a sorting algorithm

Posted: Sun Oct 08, 2023 4:58 am
by wilbert
Kurzer wrote: Sat Oct 07, 2023 9:07 pm Conceptually, I also need to think about what to do when a gadget is removed that is referenced by another gadget.
Is there a specific reason for using this reference based approach ?
When gadget 1 points to gadget 2 which points to gadget 3 (gadget 1 -> gadget 2 -> gadget 3),
can't you use group numbers and assign all of them the same group number instead of referencing each other?
Common information could be stored in a group object and if you keep a counter for the amount of members belonging to a group, the group could be removed if it has no more members. This way it is much easier to remove a gadget.
I don't know if you are currently using it but you can use SetGadgetData() to set a pointer and link a gadget to an object directly.

I also have a V7 for you to test that I created.
I hope that I didn't make any mistakes this time :shock:
When I looked up topological sorting, I came across Kahn's algorithm. The code is based on that.

Code: Select all

Procedure.i SortList_V7()
  Protected.MyStruct *check, *current, *last = LastElement(StructList())
  
  ForEach StructList()
    If StructList()\RefEntry
      ; update reference counter
      StructMap(StructList()\RefEntry)\Sort + 1
    EndIf
  Next
  
  ; sort the list
  While *last And *last <> *check
    *check = *last
    ChangeCurrentElement(StructList(), *last)
    *current = *last
    While *current
      If FindMapElement(StructMap(), *current\Entry) = 0 Or StructMap()\Sort = 0
        If *current\RefEntry
          StructMap(*current\RefEntry)\Sort - 1
        EndIf
        SwapElements(StructList(), *current, *last)
        *current = *last
        *last = PreviousElement(StructList())
        ChangeCurrentElement(StructList(), *current)
      EndIf
      *current = PreviousElement(StructList())
    Wend
  Wend
  
EndProcedure

A pointer based alternative.

Code: Select all

Structure sl_node
  *nxt.sl_node: *prv.sl_node: e.MyStruct
EndStructure

Procedure.i SortList_V7b()
  Protected.sl_node *check, *current, *last = LastElement(StructList())
  
  If *last
    *last - SizeOf(Integer) << 1
  EndIf
    
  *current = *last
  While *current
    If *current\e\RefEntry
      ; update reference counter
      StructMap(*current\e\RefEntry)\Sort + 1
    EndIf
    *current = *current\prv
  Wend
  
  ; sort the list
  While *last And *last <> *check
    *check = *last
    *current = *last
    While *current
      If FindMapElement(StructMap(), *current\e\Entry) = 0 Or StructMap()\Sort = 0
        If *current\e\RefEntry
          StructMap(*current\e\RefEntry)\Sort - 1
        EndIf
        SwapElements(StructList(), @*current\e, @*last\e)
        Swap *current, *last
        *last = *last\prv
      EndIf
      *current = *current\prv
    Wend
  Wend
  
EndProcedure

And a pointer based alternative with a local map instead of global structmap.

Code: Select all

Structure sl_node
  *nxt.sl_node: *prv.sl_node: e.MyStruct
EndStructure

Procedure.i SortList_V7c()
  Protected.sl_node *check, *current, *last = LastElement(StructList())
  
  Static NewMap RefCount.i(2048)
  ClearMap(RefCount())
  
  If *last
    *last - SizeOf(Integer) << 1
  EndIf
    
  *current = *last
  While *current
    If *current\e\RefEntry
      ; update reference counter
      RefCount(*current\e\RefEntry) + 1
    EndIf
    *current = *current\prv
  Wend
  
  ; sort the list
  While *last And *last <> *check
    *check = *last
    *current = *last
    While *current
      If FindMapElement(RefCount(), *current\e\Entry) = 0 Or RefCount() = 0
        If *current\e\RefEntry
          RefCount(*current\e\RefEntry) - 1
        EndIf
        SwapElements(StructList(), @*current\e, @*last\e)
        Swap *current, *last
        *last = *last\prv
      EndIf
      *current = *current\prv
    Wend
  Wend
  
EndProcedure

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 1:01 am
by Olli
I am puzzled by the reason of these swaps...
:mrgreen:
This subject will shoot a record ! :lol:

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 5:40 am
by wilbert
Olli wrote: Mon Oct 09, 2023 1:01 am I am puzzled by the reason of these swaps...
To sort a list you have to move it's elements into the right place.
One way is MoveElement(), the other one SwapElements() .
For my approach, SwapElements was the faster and easier one of the two.

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 9:22 am
by Olli
Very clever.

First, a swap does not require neither time-consuming element insertion, nor time-consuming element deletion.

Second, on the two swaped element, the first one moves up, and is analyzed to be moved. The second one moves down and is not analyzed to be moved. However, the move down grows the chances to the all element to reach the order.

This also causes the question, to the coder, if a quicker array is not enough, instead of a list.

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 5:13 pm
by yuki
Ooh, if the sort field's value isn't strictly required to convey order (but instead used purely for accelerating sorting), we can squeeze more gains!

I present — for today's episode of "Optimisation of a Sorting Algorithm" — the fastest yet... V8:

Code: Select all

Procedure.i SortList_V8()
  Protected *next.MyStruct
  Protected *parent.MyStruct
  Protected *current.MyStruct
  Protected *tail.MyStruct = LastElement(StructList())
  
  ; Associate each list entry with its corresponding map entry.
  ;
  ; You could avoid this, and thus increase both init and sorting performance by changing
  ; the map from StructMap.MyStruct() -> *StructMap.MyStruct().
  ;
  ; In order to fit within the existing framework without structural or init changes, the
  ; "Sort" field is used to hold a pointer to the map entry's relevant entry. Such values
  ; are never actually sorted (because they're ambiguous) but are used to accelerate sort
  ; operations below.
  ForEach StructList()
    StructMap(StructList()\Entry)\Sort = @StructList()
  Next
  
  ResetList(StructList())
  *next = NextElement(StructList())
  Repeat
    *current = *next
    ; Exit if end reached.
    If Not *current
      Break
    EndIf
    
    ; Store next element which we'll continue to after this one (if any). We may move the
    ; current element, so it's important to keep track of where we should continue from.
    *next = NextElement(StructList())
    
    ; Skip if element is already sorted.
    If *current\Sort
      Continue
    EndIf
    
    If *current\RefEntry
      *parent = StructMap(*current\RefEntry)\Sort
      ; Detect missing reference or direct cycle, returning -1 error code.
      If Not *parent Or *parent = *current
        DebuggerWarning("Invalid reference (missing or cyclic) detected: " + *current\Entry + " -> " + *current\RefEntry)
        ProcedureReturn -1
      EndIf
      ; Restore the element being inspected as active list entry as we must now move it.
      ChangeCurrentElement(StructList(), *current)
      ; If parent is already sorted, move into position by it. Otherwise, defer for later.
      If *parent\Sort
        *current\Sort = #True
        MoveElement(StructList(), #PB_List_After, *parent)
        If *parent = *tail
          *tail = *current
        EndIf
      Else
        MoveElement(StructList(), #PB_List_After, *tail)
        *tail = *current
      EndIf
      ; Continue on from the element which would follow the current prior to movement.
      If *next
        ChangeCurrentElement(StructList(), *next)
      EndIf
    Else
      ; Element has no reference so is considered already sorted.
      *current\Sort = #True
    EndIf
  ForEver
EndProcedure
Timings (PB 6.03b10 – ASM Release, Windows 11 x64, 5800X):

Code: Select all

| Size |  V1 (µs) |  V2 (µs) |  V3 (µs) |  V4 (µs) | V4B (µs) |  V5 (µs) |  V6 (µs) |  V7 (µs) | V7B (µs) | V7C (µs) |  V8 (µs) |
|------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
|   10 |      1.8 |      1.6 |      1.8 |     18.1 |      1.2 |     16.1 |      0.9 |      1.2 |      1.1 |     16.8 |      0.9 |
|   20 |      3.5 |      2.8 |      3.0 |      4.5 |      2.0 |      2.8 |      1.4 |      2.1 |      2.0 |      3.3 |      1.6 |
|   50 |     10.9 |      7.0 |      6.7 |     10.1 |      3.9 |      8.0 |      2.8 |      4.3 |      4.0 |      8.2 |      2.9 |
|  100 |     32.5 |     17.9 |     12.4 |     16.8 |      6.7 |     14.8 |      4.8 |      7.8 |      7.1 |     12.0 |      5.2 |
|  200 |    113.7 |     55.5 |     24.4 |     30.1 |     12.6 |     28.0 |      9.0 |     15.7 |     14.6 |     23.1 |     10.4 |
|  300 |    245.0 |    114.0 |     40.2 |     44.9 |     21.3 |     43.3 |     14.5 |     27.0 |     25.1 |     32.1 |     17.4 |
|  500 |    654.4 |    293.3 |     75.6 |     74.1 |     39.6 |     73.7 |     26.3 |     54.1 |     51.2 |     55.6 |     33.5 |
| 1000 |   2553.1 |   1104.8 |    184.4 |    139.0 |     94.4 |    147.8 |     62.7 |    140.6 |    134.1 |    123.7 |     84.8 |
(methodology: average across 100 runs)
(each sample for each method is isolated in a separate process. i.e., 1100 sequential RunProgram(...) calls to account for 100 tests × 11 methods)
(processes report their own timings, so only sorting is measured, excluding launch/teardown as well as effects from other tests)


Correctness and speedup factor vs. V1 (@1000 items):

Code: Select all

| Sorting Method | Is Sorted OK | Speedup vs. V1 |
|----------------|--------------|----------------|
|       V6       |      No      |         40.72x |
|       V8       |      Yes     |         30.11x |
|       V4B      |      Yes     |         27.05x |
|       V7C      |      Yes     |         20.64x |
|       V7B      |      Yes     |         19.04x |
|       V4       |      Yes     |         18.37x |
|       V7       |      Yes     |         18.16x |
|       V5       |      Yes     |         17.27x |
|       V3       |      Yes     |         13.85x |
|       V2       |      Yes     |          2.31x |
|       V1       |      Yes     |          1.00x |
V8 isn't super impactful, but does inch ahead of V4B some.

It'll be a bit difficult to squeeze out more gains without changing data structures up a bit though (V8 source has one hint on this, which would also simplify development due to removing a need for structural copies into the map).

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 8:46 pm
by Olli
I do not find my duration for this code ! (page 3)

Code: Select all

Procedure entryListSortLinks(*this.entryList)
 Protected *that.entries
 Protected hnd ; HaNDling status flag
 Protected alpha, beta ; pointers
 With *this\e()
  entryListResetTheDones(*this)
  entryListCreateMap(*this)
  *that = *this\m
  FirstElement(*this\e() )
  Repeat
   If \ref
    If hnd = 0
     alpha = *this\e()
     hnd = 1
    EndIf
    beta = *this\e()
    \done = 1
    \ref = *that\key(\refName) ; <--- Added
    ChangeCurrentElement(*this\e(), \ref)
    If \done
     ChangeCurrentElement(*this\e(), beta)
     \done = 1
    Else
     MoveElement(*this\e(), #PB_List_Before, beta)
     \done = 1
     PreviousElement(*this\e() )
     Continue
    EndIf
   Else
    If hnd
     hnd = 0
     ChangeCurrentElement(*this\e(), alpha)
    EndIf
    \done = 1
   EndIf
  Until NextElement(*this\e() ) = 0
  entryListDestroyMap(*this)
 EndWith
EndProcedure

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 9:26 pm
by yuki
Olli wrote: Mon Oct 09, 2023 8:46 pm I do not find my duration for this code ! (page 3)
Ah, I can measure and add timing for it if you'd like.

There are two compiler errors that I'm not sure if I've fixed correctly, because final sort order isn't valid (elements appear before their ref).

In entryListCreateMap(…):

Code: Select all

; ═════ Before ══════════════════════════════════════════════════
   *that\key(\name) = \e()
; ═════ After  ══════════════════════════════════════════════════
   *that\key(\e()\name) = \e()
In entryListAdd(…):

Code: Select all

; ═════ Before ══════════════════════════════════════════════════
  With *this
; ═════ After  ══════════════════════════════════════════════════
  Protected *el.entry = AddElement(*this\e())
  With *el
If I've made a goof, do let me know!

Re: Optimization of a sorting algorithm

Posted: Mon Oct 09, 2023 10:28 pm
by Olli
Have you got a header ?

(
- the kurzer usual structure
- a common initial storage (with randomSeed(#publicNumber)
)

I will give a complete code either. I think my personnal optimization is alternate the truth of the variable done for not to waste a time to zero it/them.

And I ignore if the map must be taken in the time (in this way, do we have to include its deletion in the measured time ?)

Re: Optimization of a sorting algorithm

Posted: Tue Oct 10, 2023 8:37 am
by wilbert
Timings are so inconsistent on my computer.
I think the current form of comparing might not be accurate enough when you are counting microseconds.

Navigating a list or map jumps through a lot of different memory locations.
At the moment, before testing another algorithm, the list is cleared and all elements are added again.
As a result of this, the elements are not at the same memory locations and one algorithm may be lucky and get elements allocated more closely together as another one.
It might be more fair to add a field to the list with the initial order and sort the list on that field and reset the \Sort values to 0 before testing a new algorithm instead of recreating the list.
That way the exact same list is presented to the next algorithm.
The same could be done for the map.

Re: Optimization of a sorting algorithm

Posted: Tue Oct 10, 2023 11:24 am
by wilbert
yuki wrote: Mon Oct 09, 2023 5:13 pmI present — for today's episode of "Optimisation of a Sorting Algorithm" — the fastest yet... V8:
That's fast. :)

I edited my post.
It might be a little bit faster to mark the elements without a RefEntry already sorted in the beginning.

Code: Select all

Procedure.i SortList_V8a()
  Protected *next.MyStruct
  Protected *parent.MyStruct
  Protected *current.MyStruct
  
  ; Associate each list entry with its corresponding map entry.
  ;
  ; You could avoid this, and thus increase both init and sorting performance by changing
  ; the map from StructMap.MyStruct() -> *StructMap.MyStruct().
  ;
  ; In order to fit within the existing framework without structural or init changes, the
  ; "Sort" field is used to hold a pointer to the map entry's relevant entry. Such values
  ; are never actually sorted (because they're ambiguous) but are used to accelerate sort
  ; operations below.
  ForEach StructList()
    StructMap(StructList()\Entry)\Sort = @StructList()
    If StructList()\RefEntry = #Empty$
      ; Element has no reference so is considered already sorted.
      StructList()\Sort = #True
    EndIf
  Next
  
  *next = FirstElement(StructList())
  While *next
    *current = *next
    
    ; Store next element which we'll continue to after this one (if any). We may move the
    ; current element, so it's important to keep track of where we should continue from.
    *next = NextElement(StructList())
    
    ; Skip if element is already sorted.
    If Not *current\Sort
      *parent = StructMap(*current\RefEntry)\Sort
      ; Detect missing reference or direct cycle, returning -1 error code.
      If Not *parent Or *parent = *current
        DebuggerWarning("Invalid reference (missing or cyclic) detected: " + *current\Entry + " -> " + *current\RefEntry)
        ProcedureReturn -1
      EndIf
      ; Set the element being inspected as active list entry as we must now move it.
      ChangeCurrentElement(StructList(), *current)
      ; If parent is already sorted, move into position by it. Otherwise, defer for later.
      If *parent\Sort
        *current\Sort = #True
        MoveElement(StructList(), #PB_List_After, *parent)
      Else
        MoveElement(StructList(), #PB_List_Last)
      EndIf
      ; Continue on from the element which would follow the current prior to movement.
      If *next
        ChangeCurrentElement(StructList(), *next)
      EndIf
    EndIf
    
  Wend
EndProcedure

Re: Optimization of a sorting algorithm

Posted: Tue Oct 10, 2023 12:20 pm
by yuki
Olli wrote: Mon Oct 09, 2023 10:28 pm Have you got a header ?

(
- the kurzer usual structure
- a common initial storage (with randomSeed(#publicNumber)
)

I will give a complete code either. I think my personnal optimization is alternate the truth of the variable done for not to waste a time to zero it/them.

And I ignore if the map must be taken in the time (in this way, do we have to include its deletion in the measured time ?)
Anything is fine as long as it fits between a Start = ElapsedMicroseconds() and Duration = ElapsedMicroseconds() - Start.
Time to initialise/destroy shouldn't be measured, unless your sort method must perform such initialisation/destruction every time it's invoked.
wilbert wrote: Tue Oct 10, 2023 11:24 am It won't make a difference for performance but I think you could do without the *tail variable.
This should do exactly the same (if I'm not mistaken).
Good catch! I totally missed #PB_List_Last!
wilbert wrote: Tue Oct 10, 2023 8:37 am Timings are so inconsistent on my computer.
I think the current form of comparing might not be accurate enough when you are counting microseconds.

Navigating a list or map jumps through a lot of different memory locations.
At the moment, before testing another algorithm, the list is cleared and all elements are added again.
As a result of this, the elements are not at the same memory locations and one algorithm may be lucky and get elements allocated more closely together as another one.
It might be more fair to add a field to the list with the initial order and sort the list on that field and reset the \Sort values to 0 before testing a new algorithm instead of recreating the list.
That way the exact same list is presented to the next algorithm.
The same could be done for the map.
They're pretty consistent here, running each single iteration of a sort method in its own high-priority process, e.g.:

Code: Select all

Init(argSize)

Select argSortMethod
  Case "V1"
    StartTime = ElapsedMicroseconds()
    SortList_V1()
    Duration = ElapsedMicroseconds() - StartTime
  Case "V2"
    StartTime = ElapsedMicroseconds()
    SortList_V2()
    Duration = ElapsedMicroseconds() - StartTime
  ; omit: rest of sort methods for brevity …
EndSelect

If argShouldWriteTiming
  ; omit: open timings file and append entry …
EndIf

If argShouldWriteResults
  ; omit: open results file and overwrite with content of StructList, so it can be externally validated …
EndIf
(I also changed ElapsedMicroseconds() to measure fractions of a microsecond, so the end result is, e.g., 336 for 33.6 microseconds)


The first row of times does seem bizarre (though it's actually pretty explainable):
yuki wrote: Mon Oct 09, 2023 5:13 pm

Code: Select all

| Size |  V1 (µs) |  V2 (µs) |  V3 (µs) |  V4 (µs) | V4B (µs) |  V5 (µs) |  V6 (µs) |  V7 (µs) | V7B (µs) | V7C (µs) |  V8 (µs) |
|------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
|   10 |      1.8 |      1.6 |      1.8 |     18.1 |      1.2 |     16.1 |      0.9 |      1.2 |      1.1 |     16.8 |      0.9 |
|   20 |      3.5 |      2.8 |      3.0 |      4.5 |      2.0 |      2.8 |      1.4 |      2.1 |      2.0 |      3.3 |      1.6 |
I meant to expand on this, but was waiting a moment for Olli's revisions before rerunning tests (it takes a moment to run them all).

For methods which happen to be slow with size 10, they all write into large-ish maps/arrays for the first time:

Code: Select all

Procedure SortList_V4()
  Protected numAssignedInCurrentPass
  Protected *parent.MyStruct
  Static NewMap *handledOrderMap.MyStruct(4096)

Procedure SortList_V5()
  Static NewMap seenOrderMap.i(2048)
  Static Dim *pendingEntries.MyStruct(0)

Procedure.i SortList_V7c()
  Protected.sl_node *check, *current, *last = LastElement(StructList())
  
  Static NewMap RefCount.i(2048)
If the memory backing these isn't in cache, measured times will suffer as a result of priming that. Also, depending on OS, accessing a memory page for the first time can impose relatively serious cost as it's lazily zeroed.

Coincidentally, for non-10 sizes, the memory regions necessary for these data-structures seem to be well-primed as a side-effect of initialisation.

If we run a warming phase which runs the target sorting method a number of times before restoring initial list state and beginning measurements, things have a good opportunity to be readied into cache.

Timings, with warming (PB 6.03b10 – ASM Release, Windows 11 x64, 5800X):

Code: Select all

| Size |  V1 (µs) |  V2 (µs) |  V3 (µs) |  V4 (µs) | V4B (µs) |  V5 (µs) |  V6 (µs) |  V7 (µs) | V7B (µs) | V7C (µs) |  V8 (µs) | V8B (µs) |
|------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
|   10 |      1.1 |      0.9 |      1.0 |      2.6 |      0.6 |      1.9 |      0.4 |      0.8 |      0.7 |      2.1 |      0.5 |      0.5 |
|   20 |      2.8 |      2.0 |      1.9 |      3.7 |      1.1 |      2.9 |      0.8 |      1.6 |      1.4 |      3.1 |      1.1 |      1.0 |
|   30 |      5.2 |      3.4 |      2.9 |      5.1 |      1.7 |      4.2 |      1.1 |      2.4 |      2.2 |      4.1 |      1.6 |      1.5 |
|   50 |     11.3 |      6.8 |      5.2 |      7.5 |      2.8 |      6.5 |      1.9 |      4.0 |      3.8 |      6.2 |      2.7 |      2.4 |
|  100 |     34.4 |     19.1 |     10.5 |     13.5 |      5.7 |     12.4 |      4.0 |      7.7 |      7.2 |     10.9 |      5.2 |      4.7 |
|  200 |    112.8 |     54.4 |     20.8 |     22.6 |     10.8 |     21.7 |      7.9 |     14.8 |     14.1 |     18.4 |      9.8 |      9.1 |
|  300 |    244.3 |    113.6 |     33.9 |     33.5 |     17.6 |     31.7 |     12.1 |     24.2 |     22.6 |     28.4 |     15.8 |     14.1 |
|  500 |    663.7 |    298.4 |     69.0 |     59.4 |     35.5 |     56.4 |     23.5 |     51.4 |     48.7 |     52.9 |     32.1 |     28.6 |
| 1000 |   2568.7 |   1138.0 |    177.1 |    125.8 |     91.3 |    120.4 |     60.2 |    138.5 |    133.1 |    110.9 |     83.8 |     73.2 |
(methodology: average across 100 runs)
(each sample for each method is isolated in a separate process. i.e., 1100 sequential RunProgram(...) calls to account for 100 tests × 11 methods)
(processes report their own timings, so only sorting is measured, excluding launch/teardown as well as effects from other tests)
(for each sample, 10 sort rounds of the respective method are run before re-initialising the list and beginning measurement)


The warmed results are more realistic if you expect sorting to happen relatively often, but cold ones are interesting as well.

Re: Optimization of a sorting algorithm

Posted: Tue Oct 10, 2023 12:25 pm
by wilbert
I updated my previous post.
I believe I could improve a tiny bit on the speed of your V8 by marking the elements without a reference a bit earlier.
yuki wrote: Tue Oct 10, 2023 12:20 pm The warmed results are more realistic if you expect sorting to happen relatively often, but cold ones are interesting as well.
That might be it.
I just copied the code earlier in this thread which does only one run and any algorithm can suddenly be a lot slower and the next time it can be fast and another one slower.