Another Binary Tree Implementation

davido · Post by **davido** » Tue Oct 27, 2015 9:32 am

@DontTalkToMe,

Thank you for the explanation: I didn't realize.

Lunasole · Post by **Lunasole** » Tue Oct 27, 2015 5:19 pm

davido wrote:@Lunasole,
Would it be possible to use a List?
I appreciate that speed will be reduced by a factor of about 2, but Lists are designed to have elements added and deleted.

Yes it is possible (also map can be used too), but I preferred array due to a speed.
I don't see a problem to delete item from array-based tree - no need to "physically" remove a deleted node from array [but if neccessay, can use Swap() to replace a deleted item by new Node, etc], just remove references to it from other nodes. Althought in my case no deletion and other functional is required so didn't added it, I'm just using such tree to quickly remove duplicate strings from text files ^^

Also after Don'tTalkToMe explanations I had some idea to use raw memory allocation - create a new buffer every time when tree expansion needed + still continue using old buffers at the same time, so pointers didn't become broken. that would allow to use pointers and also grants a high-speed, but seems to be is complicated a bit so I didn't tried yet.

For now I just replaced all pointers to array indexes, that can be much slower than pointers, but in total still good and works fine. Check the first post for code.
Also old example with pointers would be good and working nice too, but it requires to set Array size once at start to size that will fit all items you are going to add and not allowing to change it further, so the tree will be fixed and limited by that size.

Lunasole · Post by **Lunasole** » Wed Nov 11, 2015 12:05 am

Just tried for fun the same code using List and all related internal functions.
WTF, IT WAS SO SLOW (but i expected that, so that's why used array earlier)

Although, the list itself adds elements faster than Array + Redim (), but if using SelectElement() access everywhere where needed it becomes really EXTREMALY SLOW.
So then I replaced SelectElement() with pointers.
The result (see in code) is some kind faster than array from previous example (but depends of how often Redim() is used, if increase array step is large enough, the speed in both cases will be ~ the same).

And after testing on 600k+ strings, pointers to list elements have not become broken like pointers to array elements after array resize, so seems this is a one advantage of List () usage.

Code: Select all

EnableExplicit

Structure TreeNode
  *Right.TreeNode	; pointer to right follower (list element)
  *Left.TreeNode  	; pointer to left follower
  Data.s     		; stored data string
EndStructure

Global NewList TreeNodes.TreeNode () 	; "backend" list
Global      TreeGrows.a = 0         	; current tree state, if 1 tree has been initialized
Global      TMP.TreeNode            	; to store a new node data while processing

; this function receives array index and starts/continues from that node
; if new node data is > than actual node data, it goes to right, else left. If data is equal, false is returned and recursion stops
Procedure.b TreeRecurse (*Index.TreeNode)
; 	SelectElement(TreeNodes(), Index)         ; why SO SLOW
  Define CMP.i = CompareMemoryString(@TMP\Data, @*Index\Data, #PB_String_CaseSensitive) 
    If CMP > 0
        If Not (*Index\Right = 0)                   ; if there is follower
            ProcedureReturn TreeRecurse(*Index\Right)  ; ... continue recursive search
        Else
             *Index\Right = AddElement(TreeNodes()) ; set new node as follower of actual node (right vector)
            TreeNodes () = TMP               ; save new node into array
            ;Debug "+right"
              ProcedureReturn #True
        EndIf

     ElseIf CMP < 0                                   ; all the same, but left vector
        If Not (*Index\Left = 0)
            ProcedureReturn TreeRecurse(*Index\Left)
        Else
            *Index\Left = AddElement(TreeNodes())
            TreeNodes () = TMP
            ;Debug "+left"
              ProcedureReturn #True
        EndIf
    EndIf
EndProcedure

; add new node to a tree. if success, return true. if node with the same data exists, function will return false
; in total, all this is made to find and delete duplicate strings [the fastest way I know], but may be modified and extended for many other things
Procedure.b TreeInsert (iStr.s)
  Tmp\Left = 0		; clr temp node
  Tmp\Right = 0
  Tmp\Data = iStr
  
  If TreeGrows = 0 	; initiate list
  	AddElement(TreeNodes())
    TreeNodes() = TMP
    TreeGrows = 1
		ProcedureReturn #True
  Else              ; proceed tree starting from Parent node
	SelectElement(TreeNodes(), 0)
		ProcedureReturn TreeRecurse(@TreeNodes())
  EndIf
EndProcedure

; erase tree
Procedure TreeClear ()
  ClearList (TreeNodes())
    TreeGrows = 0
EndProcedure

;usage
Debug TreeInsert("2")
Debug TreeInsert("1")
Debug TreeInsert("4")
Debug TreeInsert("1") ; duplicate, will return false
Debug TreeInsert("01")

; ForEach TreeNodes()
; 	Debug TreeNodes()\right
; 	Debug TreeNodes()\left
; 	Debug TreeNodes()\Data
; 	Debug "------"
; Next TreeNodes()

Debug "Tree List: " + Str(ListSize (TreeNodes())) + " filled"
TreeClear ()          ; discard tree

Lunasole · Post by **Lunasole** » Sat Jan 09, 2016 8:54 am

A bit of useless statistic

It was funny to compare Purebasic program using this stuff to deal with double strings in files, with 1:1 equal C program (both for windows, 32bit, MinGW compiler used for C). The more funny was that Purebasic unexpectedly showed better results ^^
Thanx to my friend who liked my idea of such useless test, ported my code to C and provided this screenshot.

Recently he also tested same code with Python [without using stl crap to implement a tree, and not translating python to that poor C code] and this was ridiculous (but expected) - wasted time was 17s, not even saying about all other params. Even outdated VB6 [also well-known about how it's VM is slow when doing with strings] showed much better results - 7s.

Lunasole · Post by **Lunasole** » Tue Jan 12, 2016 8:04 pm

Hah, it surely was too optimistic to think that such results are permanent.
When using maximum optimizations with C [-o3] it beats all the shit from PB code, doing that thing 0.4s faster than PB, and producing exe with size much lesser, when tell linker to strip all.
Although the PB still had some supremacy with memory usage and other resources wasting [OS handles, etc]. Doubtful advantage, but it is at least on Windows ^^

Well, in my opinion PB by speed/abilities/resulting code efficiency is somewhere there:
[scriptkiddies languages, like python, php, js, as, powershell, and lot of such]
[entry-level general-purpose languages, like java/.net/VB6]
[delphi, purebasic, general-purpose too]
[true-maximum-pro universal languages, like C/C++]

But also it's unique in that part that it can be easily used both to writing "simple scripts" and something more serious, doing that with same efficiency.

PS. It's still about "useless statistics" ^^

Tenaja · Post by **Tenaja** » Tue Jan 12, 2016 11:18 pm

Lunasole wrote:PS. It's still about "useless statistics" ^^

Could be, but modern Java, in some cases, does better than C++...

PureBasic Forums - English

Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation