@DontTalkToMe,
Thank you for the explanation: I didn't realize.
Another Binary Tree Implementation
Re: Another Binary Tree Implementation
Yes it is possible (also map can be used too), but I preferred array due to a speed.davido wrote:@Lunasole,
Would it be possible to use a List?
I appreciate that speed will be reduced by a factor of about 2, but Lists are designed to have elements added and deleted.
I don't see a problem to delete item from array-based tree - no need to "physically" remove a deleted node from array [but if neccessay, can use Swap() to replace a deleted item by new Node, etc], just remove references to it from other nodes. Althought in my case no deletion and other functional is required so didn't added it, I'm just using such tree to quickly remove duplicate strings from text files ^^
Also after Don'tTalkToMe explanations I had some idea to use raw memory allocation - create a new buffer every time when tree expansion needed + still continue using old buffers at the same time, so pointers didn't become broken. that would allow to use pointers and also grants a high-speed, but seems to be is complicated a bit so I didn't tried yet.
For now I just replaced all pointers to array indexes, that can be much slower than pointers, but in total still good and works fine. Check the first post for code.
Also old example with pointers would be good and working nice too, but it requires to set Array size once at start to size that will fit all items you are going to add and not allowing to change it further, so the tree will be fixed and limited by that size.
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
Re: Another Binary Tree Implementation
Just tried for fun the same code using List and all related internal functions.
WTF, IT WAS SO SLOW (but i expected that, so that's why used array earlier)
Although, the list itself adds elements faster than Array + Redim (), but if using SelectElement() access everywhere where needed it becomes really EXTREMALY SLOW.
So then I replaced SelectElement() with pointers.
The result (see in code) is some kind faster than array from previous example (but depends of how often Redim() is used, if increase array step is large enough, the speed in both cases will be ~ the same).
And after testing on 600k+ strings, pointers to list elements have not become broken like pointers to array elements after array resize, so seems this is a one advantage of List () usage.
WTF, IT WAS SO SLOW (but i expected that, so that's why used array earlier)
Although, the list itself adds elements faster than Array + Redim (), but if using SelectElement() access everywhere where needed it becomes really EXTREMALY SLOW.
So then I replaced SelectElement() with pointers.
The result (see in code) is some kind faster than array from previous example (but depends of how often Redim() is used, if increase array step is large enough, the speed in both cases will be ~ the same).
And after testing on 600k+ strings, pointers to list elements have not become broken like pointers to array elements after array resize, so seems this is a one advantage of List () usage.
Code: Select all
EnableExplicit
Structure TreeNode
*Right.TreeNode ; pointer to right follower (list element)
*Left.TreeNode ; pointer to left follower
Data.s ; stored data string
EndStructure
Global NewList TreeNodes.TreeNode () ; "backend" list
Global TreeGrows.a = 0 ; current tree state, if 1 tree has been initialized
Global TMP.TreeNode ; to store a new node data while processing
; this function receives array index and starts/continues from that node
; if new node data is > than actual node data, it goes to right, else left. If data is equal, false is returned and recursion stops
Procedure.b TreeRecurse (*Index.TreeNode)
; SelectElement(TreeNodes(), Index) ; why SO SLOW
Define CMP.i = CompareMemoryString(@TMP\Data, @*Index\Data, #PB_String_CaseSensitive)
If CMP > 0
If Not (*Index\Right = 0) ; if there is follower
ProcedureReturn TreeRecurse(*Index\Right) ; ... continue recursive search
Else
*Index\Right = AddElement(TreeNodes()) ; set new node as follower of actual node (right vector)
TreeNodes () = TMP ; save new node into array
;Debug "+right"
ProcedureReturn #True
EndIf
ElseIf CMP < 0 ; all the same, but left vector
If Not (*Index\Left = 0)
ProcedureReturn TreeRecurse(*Index\Left)
Else
*Index\Left = AddElement(TreeNodes())
TreeNodes () = TMP
;Debug "+left"
ProcedureReturn #True
EndIf
EndIf
EndProcedure
; add new node to a tree. if success, return true. if node with the same data exists, function will return false
; in total, all this is made to find and delete duplicate strings [the fastest way I know], but may be modified and extended for many other things
Procedure.b TreeInsert (iStr.s)
Tmp\Left = 0 ; clr temp node
Tmp\Right = 0
Tmp\Data = iStr
If TreeGrows = 0 ; initiate list
AddElement(TreeNodes())
TreeNodes() = TMP
TreeGrows = 1
ProcedureReturn #True
Else ; proceed tree starting from Parent node
SelectElement(TreeNodes(), 0)
ProcedureReturn TreeRecurse(@TreeNodes())
EndIf
EndProcedure
; erase tree
Procedure TreeClear ()
ClearList (TreeNodes())
TreeGrows = 0
EndProcedure
;usage
Debug TreeInsert("2")
Debug TreeInsert("1")
Debug TreeInsert("4")
Debug TreeInsert("1") ; duplicate, will return false
Debug TreeInsert("01")
; ForEach TreeNodes()
; Debug TreeNodes()\right
; Debug TreeNodes()\left
; Debug TreeNodes()\Data
; Debug "------"
; Next TreeNodes()
Debug "Tree List: " + Str(ListSize (TreeNodes())) + " filled"
TreeClear () ; discard tree
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
Re: Another Binary Tree Implementation
A bit of useless statistic
It was funny to compare Purebasic program using this stuff to deal with double strings in files, with 1:1 equal C program (both for windows, 32bit, MinGW compiler used for C). The more funny was that Purebasic unexpectedly showed better results ^^
Thanx to my friend who liked my idea of such useless test, ported my code to C and provided this screenshot.
Recently he also tested same code with Python [without using stl crap to implement a tree, and not translating python to that poor C code] and this was ridiculous (but expected) - wasted time was 17s, not even saying about all other params. Even outdated VB6 [also well-known about how it's VM is slow when doing with strings] showed much better results - 7s.


It was funny to compare Purebasic program using this stuff to deal with double strings in files, with 1:1 equal C program (both for windows, 32bit, MinGW compiler used for C). The more funny was that Purebasic unexpectedly showed better results ^^
Thanx to my friend who liked my idea of such useless test, ported my code to C and provided this screenshot.
Recently he also tested same code with Python [without using stl crap to implement a tree, and not translating python to that poor C code] and this was ridiculous (but expected) - wasted time was 17s, not even saying about all other params. Even outdated VB6 [also well-known about how it's VM is slow when doing with strings] showed much better results - 7s.

"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
Re: Another Binary Tree Implementation
Hah, it surely was too optimistic to think that such results are permanent.
When using maximum optimizations with C [-o3] it beats all the shit from PB code, doing that thing 0.4s faster than PB, and producing exe with size much lesser, when tell linker to strip all.
Although the PB still had some supremacy with memory usage and other resources wasting [OS handles, etc]. Doubtful advantage, but it is at least on Windows ^^
Well, in my opinion PB by speed/abilities/resulting code efficiency is somewhere there:
[scriptkiddies languages, like python, php, js, as, powershell, and lot of such]
[entry-level general-purpose languages, like java/.net/VB6]
[delphi, purebasic, general-purpose too]
[true-maximum-pro universal languages, like C/C++]
But also it's unique in that part that it can be easily used both to writing "simple scripts" and something more serious, doing that with same efficiency.
PS. It's still about "useless statistics" ^^
When using maximum optimizations with C [-o3] it beats all the shit from PB code, doing that thing 0.4s faster than PB, and producing exe with size much lesser, when tell linker to strip all.
Although the PB still had some supremacy with memory usage and other resources wasting [OS handles, etc]. Doubtful advantage, but it is at least on Windows ^^
Well, in my opinion PB by speed/abilities/resulting code efficiency is somewhere there:
[scriptkiddies languages, like python, php, js, as, powershell, and lot of such]
[entry-level general-purpose languages, like java/.net/VB6]
[delphi, purebasic, general-purpose too]
[true-maximum-pro universal languages, like C/C++]
But also it's unique in that part that it can be easily used both to writing "simple scripts" and something more serious, doing that with same efficiency.
PS. It's still about "useless statistics" ^^
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
Re: Another Binary Tree Implementation
Could be, but modern Java, in some cases, does better than C++...Lunasole wrote:PS. It's still about "useless statistics" ^^