In the last article we saw one way of implementing an object-functional, immutable Stack data structure. Now let's see how to implement a Set data structure. A Set is a container which supports the following operations efficiently:
insert
: adds a new item to the Setdelete
: removes given item from the Setmember
: check if a given item is in the Set
As we did before with the Stack, we can define this behavior in a Trait like the following:
|
|
Ordering
We are planning to implement the Set using Binary Search Trees (BST) which means elements in the Set needs to be comparable (or can be ordered). One way to do it is by constraining the type T
to be a subclass of Ordered[T]
. But a more lenient approach is to allow the caller to use any type as long as it can be implicitly converted to an Ordered[T]
. Scala provides syntactic sugar for doing this easily called view bounds using which we can define the type as +T <% Ordered[T]
. Since an implicit parameter needs to be passed under the hood, we define Set
as an abstract class. A partial definition will look like the following:
|
|
There are plans to deprecate view bounds from Scala. We could use the equivalent:
|
|
This is a bit verbose since we have to repeat the implicit parameter at multiple places. A better option is to use context bounds. We can also define a type alias for T => Ordered[T]
to simplify things:
|
|
Unbalanced BST
Now that we are done with defining proper types, let's start with the actual implementation. We will first implement an unbalanced version. A BST node can either be a Leaf
which is a sentinel for empty, or a Node
which has a value, left tree and right tree. Once we have this, implementing the operations recursively is trivial since the structure of the tree is also recursive. For instance, to insert a value we need to compare with the root and if the value is less insert on left tree otherwise on right tree etc.
|
|
As is the case with persistent data structures, the operations will result in new copies with potential sharing of structure. For instance, if we insert a value which is less than that in root node, the complete right subtree can be shared etc.
Deletion
Deleting a node from BST is slightly more complex. One way is to replace the deleted node with its inorder predecessor. Additionally we need to reconstruct the subtree containing the deleted and predecessor nodes since our data structure is immutable. There are special cases to consider like one or both children being empty which can be handled by the Leaf
object. In the following implementation we use delete
method to locate the node to be deleted and deleteCurrent
to actually replace stuff. It uses a recursive fixNodes
function which finds the predecessor and propagates it back to deleted node rebuilding the tree.
|
|
Balanced BST
As you would know, we need a better solution to work efficiently on all inputs. The tree operations take O(d)
time where d is the depth of the tree. If we insert n items which are in sorted order, we create a tree with depth n which makes the operations linear time. To avoid this, we need to keep the tree balanced so that the depth is roughly log(n). There are multiple ways to do this like AVL trees, Red Black trees(RBT) etc. We will consider a Red black tree implementation here.
Okasaki gives an elegant implementation of RBT in the book (If you don't have a copy see this paper). As usual, we will adopt the solution to Scala and it is natural to extend from the BST
, Leaf
and Node
classes which we already have. We need an additional method balancedInsert
to take care of balancing, so let's create a Trait to capture RBT specific stuff:
|
|
RBT is a BST with the following additional properties which guarantees that the tree remains balanced:
- Every node is either Red or Black. The
Leaf
nodes are black by convention. - A red node cannot have a red parent
- Every path from root to leaf should have same no.of black nodes.
balancedInsert
makes sure that (2) and (3) above are not violated when inserting an item. We start defining the Node
for RBT as follows:
|
|
Note that we don't need to define the member
function here. The behavior is same as that of a BST
and since we inherit from BST.Node
we get that for free. We override insert
and make sure we balance things when something is inserted if needed. Finally we maintain the invariant that the root node is always Black.
Let's define the Leaf
for RBT which is simple enough:
|
|
So far we haven't done much in RBT and once we define balancedInsert
we are done! Of course balancing the tree is the heart of insert
operation. As discussed in the paper, we just need to handle balancing (by rotation) in four cases:
This can be declaratively expressed in code, thanks to pattern matching as follows:
|
|
balance
does the actual rotation and it corresponds as is to the above diagram. This, in fact, is the heart of the algorithm. Most of the other things we talked about are about how we organize code cleanly, adding flexible type support etc.
Side Note: We didn't tackle deletion of a node here. Implementing the delete
operation is slightly involved for RBT
. See this post to see how it can be done.
Example
Let's see a sample tree construction to see how it works. Assume we insert 1,2,3,4 in this order, we get something like the following:
We are only discussing the very basic operations of the data structure here. Using this other convenience functions can be easily added. For example, we can add an apply
function to help creating a Set from its arguments as follows:
|
|
Testing
As we did before, tests using Scalacheck can be written once we have a generator of random trees.
|
|
That's all for now. See the complete code for Set. Scala standard library also provides an implementation of RBT in a different style.