ELEMENTARY SEARCHING METHODS
The call treeprint(head7.r) will print out the keys of the tree in order. This
defines a sorting method which is remarkably similar to Quicksort, with the
node at the root of the tree playing a role similar to that of the partitioning
element in Quicksort. A major difference is that the tree-sorting method must
use extra memory for the links, while Quicksort sorts with only a little extra
memory.
The running times of algorithms on binary search trees are quite depen-
dent on the shapes of the trees. In the best case, the tree could be shaped like
that given above for describing the comparison structure for binary search,
with about lg N nodes between the root and each external node. We might,
roughly, expect logarithmic search times on the average because the first ele-
ment inserted becomes the root of the tree; if N keys are to be inserted at
random, then this element would divide the keys in half (on the average),
leading to logarithmic search times (using the same argument on the subtrees).
Indeed, were it not for the equal keys, it could happen that the tree given above
for describing the comparison structure for binary search would be built. This
would be the best case of the algorithm, with guaranteed logarithmic running
time for all searches. Actually, the root is equally likely to be any key in a
truly random situation, so such a perfectly balanced tree would be extremely
rare. But if random keys are inserted, it turns out that the trees are nicely
balanced. The average number of steps for a treesearch in a tree built by
successive insertion of N random keys is proportional to 2 In N.
On the other hand, binary tree searching is susceptible to the same
worst-
case problems as Quicksort. For example, when the keys are inserted in order
(or in reverse order) the binary tree search method is no better than the
sequential search method that we saw at the beginning of this chapter. In the
next chapter, we’ll examine a technique for eliminating this worst case and
making all trees look more like the best-case tree.
The implementations given above for the fundamental search, insert, and
As we saw with heaps in Chapter 11, for many applications we want a search-
ing structure to simply help us find records, without moving them around.
For example, we might have an array a[l N] of records with keys, and we
would like the search routine to give us the index into that array of the record
matching a certain key. Or we might want to remove the record with a given
index from the searching structure, but still keep it in the array for some
other use.
To adapt binary search trees to such a situation, we simply make the
info field of the nodes the array index. Then we could eliminate the key field
by having the search routines access the keys in the records directly, e.g. via
an instruction like if v<a[xt .info] then. . . . However, it is often better to
make a copy of the key, and use the code above just as it is given. We’ll
use the function name bstinsert(v, info: integer; x: link) to refer to a function
just like treeinsert, except that it also sets the info field to the value given
in the argument. Similarly, a function bstdelete(v,info: integer;x: link) to
delete the node with key v and array index info from the binary search tree
rooted at x will refer to an implementation of the delete function as described
above. These functions use an extra copy of the keys (one in the array, one
in the tree), but this allows the same function to be used for more than one
array, or as we’ll see in Chapter 27, for more than one key field in the same
array. (There are other ways to achieve this: for example, a procedure could
be associated with each tree which extracts keys from records.)
Another direct way to achieve “indirection” for binary search trees is to
simply do away with the linked implementation entirely. That is, all links just
become indices into an array a[1 N] of records which contain a key field and 1
and
r
index fields. Then link references such as if v<xf.key then
x:=x7.1
else
Give a recursive implementation of binary search.
4.
Suppose
a[i]=2i
for 1 5 i 5 N. How many table positions are examined
by interpolation search during the unsuccessful search for 2k
-
l?
5.
Draw the binary search tree that results from inserting records with the
keys E A
S
Y Q U E S T I 0 N into an initially empty tree.
6. Write a recursive program to compute the height of a binary tree: the
longest distance from the root to an external node.
7.
Suppose that we have an estimate ahead of time of how often search keys
are to be accessed in a binary tree. Should the keys be inserted into the
tree in increasing or decreasing order of likely frequency of access? Why?
8.
Give a way to modify binary tree search so that it would keep equal keys
together in the tree. (If there are any other nodes in the tree with the
same key as any given node, then either its father or one of its sons should
have an equal key.)
9.
Write a nonrecursive program to print out the keys from a binary search
tree in order.
10.
Use a least-squares curvefitter to find values of a and b that give the best
2-3-4
Trees
To eliminate the worst case for binary search trees, we’ll need some flexibility
in the data structures that we use. To get this flexibility, let’s assume that we
can have nodes in our trees that can hold more than one key. Specifically, we’ll
187
188
CHAPTER 15
allow J-nodes and d-nodes, which can hold two and three keys respectively. A
3-node has
t.hree
links coming out of it, one for all records with keys smaller
than both its keys, one for all records with keys in between its two keys, and
one for all records with keys larger than both its keys. Similarly, a 4-node
has four links coming out of it, one for each of the intervals defined by its
three keys. (The nodes in a standard binary search tree could thus be called
,%nodes:
one key, two links.) We’ll see below some efficient ways of defining
and implementing the basic operations on these extended nodes; for now, let’s
assume we can manipulate them conveniently and see how they can be put
together to form trees.
For example, below is a
&Y-4
tree which contains some keys from our
searching example.
It is easy to see how to search in such a tree. For example, to search for
0 in the tree above, we would follow the middle link from the root, since 0
is between E and R then terminate the unsuccessful search at the right link
from the node containing H and I.
To insert a new node in a 2-3-4 tree, we would like to do an unsuccessful
But what if we were to need to split a 4-node whose father is also a 4-node?
One method would be to split the father also, but this could keep happening
all the way back up the tree. An easier way is to make sure that the father of
any node we see won’t be a 4-node by splitting any 4-node we see on the way
down the tree. For example, when E is inserted, the tree above first becomes
190
This ensures that we could handle the situation at the bottom even if E were
to go into a 4-node (for example, if we were inserting another A instead).
Now, the insertion of E, X, A, M, P, L, and E finally leads to the tree:
The above example shows that we can easily insert new nodes into 2-3-
4 trees by doing a search and splitting 4-nodes on the way down the tree.
Specifically, every time we encounter a 2-node connected to a 4-node, we
should transform it into a 3-node connected to two 2-nodes:
and every time we encounter a 3-node connected to a 4-node, we should
transform it into a 4-node connected to two 2-nodes:
BALANCED TREES
These transformations are purely “local”: no part of the tree need be examined
or modified other than what is diagrammed. Each of the transformations
passes up one of the keys from a
4-node
to its father in the tree, restructuring
links accordingly. Note that we don’t have to worry explicitly about the father
being a 4-node since our transformations ensure that as we pass through each
node in the tree, we come out on a node that is not a 4-node. In particular,
when we come out the bottom of the tree, we are not on a 4-node, and we
can directly insert the new node either by transforming a 2-node to a 3-node
or a 3-node to a 4-node. Actually, it is convenient to treat the insertion as a
split of an imaginary 4-node at the bottom which passes up the new key to be
inserted. Whenever the root of the tree becomes a 4-node, we’ll split it into
three 2-nodes, as we did for our first node split in the example above. This
Fortunately, as we’ll see below, there is a relatively simple representation of
2-, 3-, and 4-nodes that allows the transformations to be done in a uniform
way with very little overhead beyond the costs incurred by standard binary
tree search.
192
CHAPTER 15
Red-Black Trees
Remarkably, it is possible to represent 2-3-4 trees as standard binary trees
(2-nodes only) by using only one extra bit per node. The idea is to represent
3-nodes and 4nodes as small binary trees bound together by
“red”
links
which contrast with the “black” links which bind the 2-3-4 tree together. The
representation is simple: 4-nodes are represented as three 2-nodes connected
by red links and 3-nodes are represented as two 2-nodes connected by a red
link (red links are drawn as double lines):
(Either orientation for a 3-node is legal.) The binary tree drawn below is one
way to represent the final tree from the example above. If we eliminate the
red links and collapse the nodes they connect together, the result is the 2-3-4
tree from above. The extra bit per node is used to store the color of the link
pointing to that node: we’ll refer to 2-3-4 trees represented in this way as
red-black trees.