Another Binary Tree Implementation

Lunasole · Post by **Lunasole** » Mon Oct 26, 2015 12:09 pm

Well, In my first topic on forum I said it is very hard and complicated to implement that algorithm without using OOP, but seems I was wrong a byte ^^
Just tried and after smoking help file here is the result. Maybe not too good & optimized (I'm using PB < 1 week and don't know too much), but if you like try it.
Also would be nice to get some improvements [it is always nice]
// thanx DontTalkToMe for explaining my mistake with pointers, now pointers replaced to a simple array indexes

Code: Select all

; Binary Tree implementation using array and structures
; Copyright (C) 2015 Lunasole         
; * A horrible "Redim() and pointers bug" explained by DontTalkToMe :3
; 
; This program is Free software: you can redistribute it And/Or modify
; it under the terms of the GNU General Public License As published by
; the Free Software Foundation, either version 3 of the License, Or
; (at your option) any later version.
; 
; This program is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY Or FITNESS For A PARTICULAR PURPOSE.  See the
; GNU General Public License For more details.
; 
; You should have received a copy of the GNU General Public License
; along With this program.  If Not, see <http://www.gnu.org/licenses/>.

EnableExplicit

Structure TreeNode
  Right.i    ; pointer to right follower (array element)
  Left.i     ; pointer to left follower
  Data.s     ; stored data string
EndStructure

Global Dim  TreeNodes.TreeNode (511)  ; "backend" array. dont set initial size to zero, must be at least 1
Global      TreeNenum.i = 0         ; current amount of array elements used
Global      TreeNgrow.i = 2048      ; a step to increase array by: lower value causes more often ReDim usage and slowing tree. don't set it to <= 0
Global      TMP.TreeNode            ; to store a new node data while processing

; this function receives array index and starts/continues from that node
; if new node data is > than actual node data, it goes to right, else left. If data is equal, false is returned and recursion stops
Procedure.b TreeRecurse (Index.i)
  Define CMP.i = CompareMemoryString(@TMP\Data, @TreeNodes(Index)\Data, #PB_String_CaseSensitive) 
    If CMP > 0
        If Not (TreeNodes(Index)\Right = 0)                   ; if there is follower
            ProcedureReturn TreeRecurse(TreeNodes(Index)\Right)  ; ... continue recursive search
        Else
            TreeNodes (TreeNenum) = TMP               ; save new node into array
            TreeNodes(Index)\Right = TreeNenum        ; set new node as follower of actual node (right vector)
            ;Debug "+right"
            TreeNenum = TreeNenum+1 : If TreeNenum > ArraySize(TreeNodes()) : ReDim TreeNodes.TreeNode (TreeNenum + TreeNgrow) : EndIf ; enlarge your Array [if needed]
              ProcedureReturn #True
        EndIf
     ElseIf CMP < 0                                   ; all the same, but left vector
        If Not (TreeNodes(Index)\Left = 0)
            ProcedureReturn TreeRecurse(TreeNodes(Index)\Left)
        Else
            TreeNodes (TreeNenum) = TMP
            TreeNodes(Index)\Left = TreeNenum
            ;Debug "+left"
            TreeNenum = TreeNenum+1 : If TreeNenum > ArraySize(TreeNodes()) : ReDim TreeNodes.TreeNode (TreeNenum + TreeNgrow) : EndIf
              ProcedureReturn #True
        EndIf
    EndIf
EndProcedure

; add new node to a tree. if success, return true. if node with the same data exists, function will return false
; in total, all this is made to find and delete duplicate strings [the fastest way I know], but may be modified and extended for many other things
Procedure.b TreeInsert (iStr.s)
  Tmp\Left = 0    ; clr temp node
  Tmp\Right = 0
  Tmp\Data = iStr
  
  If TreeNenum = 0   ; set up Parent node
    TreeNodes(TreeNenum) = TMP
    TreeNenum = TreeNenum + 1
      ProcedureReturn #True
  Else              ; proceed tree starting from Parent node
      ProcedureReturn TreeRecurse(0)
  EndIf
EndProcedure

; erase tree
Procedure TreeClear ()
  Dim TreeNodes.TreeNode (1)
    TreeNenum = 0
EndProcedure

;usage
Debug TreeInsert("2")
Debug TreeInsert("1")
Debug TreeInsert("4")
Debug TreeInsert("1") ; duplicate, will return false
Debug TreeInsert("01")

Debug "Tree Array: " + Str(TreeNenum) + " of " + Str(ArraySize(TreeNodes()) + 1) + " filled"
TreeClear ()          ; discard tree

DontTalkToMe · Post by **DontTalkToMe** » Mon Oct 26, 2015 12:21 pm

If you are interested I remember at least two (balanced I think) tree implementations posted here on the forum, but you have to search for them

Lunasole · Post by **Lunasole** » Mon Oct 26, 2015 12:49 pm

DontTalkToMe wrote:If you are interested I remember at least two (balanced I think) tree implementations posted here on the forum, but you have to search for them

I know there should be, just created bicycle by myself for fun and to check how difficult it will be without OOP :3
So no needed a ready solution.

DontTalkToMe · Post by **DontTalkToMe** » Mon Oct 26, 2015 2:27 pm

Lunasole wrote: So no needed a ready solution.

You said it could be problematic to do that without OO so I meant the above just as in: look at them to see how they were done if you please.

http://www.purebasic.fr/english/viewtop ... 12&t=54525
http://www.purebasic.fr/english/viewtop ... 12&t=40116

There are countless implementations of trees, B-trees, etc. in C which is not object oriented, personally I don't see OO particularly useful for a tree (more pleasing to the eye maybe), while it could be a lot more beneficial to have some sort of templates instead (which PB doesn't have) to make the tree work easily with generic data types.

davido · Post by **davido** » Mon Oct 26, 2015 3:03 pm

@Lunasole,
Thank you for sharing.
Looks a whole lot simpler than other the posts.
It is, therefore, easier for me to understand.

DontTalkToMe · Post by **DontTalkToMe** » Mon Oct 26, 2015 4:28 pm

It's simpler as they are not exactly equivalent, so no surprise

Lunasole · Post by **Lunasole** » Mon Oct 26, 2015 5:51 pm

Oop. I'm sorry, just found a strange bug which I can't explain, it is appearing on a large trees.

To reproduce: just add following code as a test routine

Code: Select all

Define  TT.i = 0
For tt = 1 To 222222
  Debug TreeInsert(Str(tt))
Next tt

It will execute fine, but then error "Invalid memory access" raises.
And that error depends on ReDim () usage frequency -- if TreeNgrow.i = 2048 (as set in code) then error comes soon (at 4381 node), but if increase value to a 55555 for example (this greatly decreases Redim() frequency), error will come much later (at 56068 element).

I don't think that something is wrong with the Array redim code:

Code: Select all

If TreeNenum > ArraySize(TreeNodes()) : ReDim TreeNodes.TreeNode (TreeNenum + TreeNgrow) : EndIf ;

But don't know what else it can be. Have someone any ideas?

(the temporary solution is - set initial size of TreeNodes.TreeNode () to a large value that will fit all your data without using ReDim() so often)

DontTalkToMe wrote: http://www.purebasic.fr/english/viewtop ... 12&t=54525
http://www.purebasic.fr/english/viewtop ... 12&t=40116

Funny enough ^^ Should be much shorter with OOP - like in my example, but without array to store data.

DontTalkToMe · Post by **DontTalkToMe** » Mon Oct 26, 2015 8:36 pm

davido wrote: It is, therefore, easier for me to understand.

And spot the bug too ?

Lunasole wrote: But don't know what else it can be. Have someone any ideas?

Do you expect expanded memory buffers to be kept at the same base address ?
Very unlikely. Just put yourself in place of the memory manager.

Code: Select all

Structure TreeNode
  *Right    ; pointer to right follower (array element)
  *Left     ; pointer to left follower
  Data.s    ; stored data
EndStructure

Dim x.TreeNode(10)

Debug @x.TreeNode(0)

ReDim x.TreeNode(100)

Debug @x.TreeNode(0)

So after a redim your pointers are not guaranteed to point to something meaningful, likely just garbage, better to use just the array indexes.

Lunasole wrote: Funny enough ^^ Should be much shorter with OOP

What's funny ?

I don't see what big advantage could OO give here beyond a cleaner look and being more natural to use, often a big plus of OO.

Lunasole wrote:like in my example, but without array to store data.

The other codes are longer because they implement more complex operations.

You just implemented the insert operation (faulty using redim, but ok it can be fixed), those codes implement insertion (the most trivial of the operations), lookup (trivial), removal (moderately complex) and rebalancing of the tree (complex).
If you add sequentially ordered data to your tree you get a degenerated tree or a list.

Lunasole · Post by **Lunasole** » Mon Oct 26, 2015 9:07 pm

DontTalkToMe wrote: Do you expect expanded memory buffers to be kept at the same base address ?
Very unlikely. Just put yourself in place of the memory manager.

So after a redim your pointers are not guaranteed to point to something meaningful, likely just garbage, better to use just the array indexes.

I suspected something like that, but wondered why more Redims with a lower step causing error, and few Redims with a very big step not or rarely, so abandoned such theory. But as clearly seeing from your example you're right, thanks a lot. Should avoid Redim() in such cases or avoid pointers to array items. Another my mistake is that I expected that Redim() works like it was in VB6 and don't even think about how it deals with pointers inside

(vb6 has extra internal pointers which are corrected in such cases by runtime)

DontTalkToMe wrote: those codes implement insertion (the most trivial of the operations), lookup (trivial), removal (moderately complex) and rebalancing of the tree (complex).

All of those operations implementing pretty simply using OOP without perceptible lost of speed and increased memory usage, see C++ examples. That's an "advantage".

davido · Post by **davido** » Mon Oct 26, 2015 10:32 pm

@Lunasole,
Would it be possible to use a List?
I appreciate that speed will be reduced by a factor of about 2, but Lists are designed to have elements added and deleted.

Tenaja · Post by **Tenaja** » Mon Oct 26, 2015 11:04 pm

A "dim" allocates a fixed amount of space. Every ReDim requires a new location, and also the time of copying the old data to the new location. Frequent ReDim's would certainly make a list a worthy consideration. I am sure you could tune the dim size and added ReDim space for a specific application, as long as the requirement was predictable.

DontTalkToMe · Post by **DontTalkToMe** » Tue Oct 27, 2015 12:55 am

Why he should use a list in a thread where the subject is a tree ?
Use a list for the scenarios where a list is better and use a tree when a tree is indicated.

http://stackoverflow.com/questions/2700 ... earch-tree

Maybe a balanced tree with more granular dynamic allocations instead of redims, unless you know the distribution of the input data and the fact it's unlikely to end up with a degenerated tree

Tenaja · Post by **Tenaja** » Tue Oct 27, 2015 1:17 am

DontTalkToMe wrote:Why he should use a list in a thread where the subject is a tree ?

If you are going to ask that, then why not ask "why should he use an array in a thread where the subject is a tree?"

He has to have a chunk of ram to store the data. There are numerous ways to allocate that ram, and List or array are just two of those ways--neither of which are are optimal in all cases when used in the way implemented here.

Tenaja · Post by **Tenaja** » Tue Oct 27, 2015 1:30 am

DontTalkToMe wrote:Why he should use a list in a thread where the subject is a tree ?
Use a list for the scenarios where a list is better and use a tree when a tree is indicated.

http://stackoverflow.com/questions/2700 ... earch-tree

Maybe a balanced tree with more granular dynamic allocations instead of redims, unless you know the distribution of the input data and the fact it's unlikely to end up with a degenerated tree

I just realized that perhaps you did not understand the suggestion I made above...because it was unclear. I will clarify it.

With the current code in the IP, every time the maximum number of elements is exceeded, a new (larger) block of ram is allocated (and cleared), and all of the data from the previous block is copied, then then the previous block is freed up. That is what ReDim does, and it is very time consuming.

What I was proposing was to create a list of arrays, so the data is not copied every time. This will be much faster than calling ReDim if you are continually adding data in a way that forces it to ReDim. Obviously, the cost is to look up which array to access.

Again, it all depends on the intended use. If you know ahead of time precisely how much data you will have, the array is a no-brainer. But that is not very common, so we often get one-size-fits-all libraries. A one-size-fits-all will not be optimized for a use where adding data is the most common activity, nor will it be optimized for a use where reading the data is the most common activity. If you have lopsided use, then design the system to favor the activity...if efficiency is a concern at all.

DontTalkToMe · Post by **DontTalkToMe** » Tue Oct 27, 2015 1:39 am

Tenaja wrote: If you are going to ask that, then why not ask "why should he use an array in a thread where the subject is a tree?"

Because I understand why using an array is a reasonable idea even if not the best

It's reasonable to think to implement a tree using an array, since any element is directly accessible in a way similar to what you can do through a pointer, the first choice you usually see to implement a tree structure.

So you can keep intact the access characteristics which make a tree behave like you expect a tree to behave.
Better would be using a vector since it does not limit you to a fixed size of elements, but PB does not offer it.
One can implement a vector by himself but at that point he can just use pointers and be done with it.

Using a list to implement a tree it's a very strange idea in my opinion, since negates all the advantages a tree has on a list.
It's not a strange idea to use a list for a tree's item, for example to implement a map using a tree (with pointers) and a list for the collisions, but that's for another topic.

PureBasic Forums - English

Another Binary Tree Implementation

Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation

Re: Another Binary Tree Implementation