276 Chapter 7 • Searching
{
int list_size
=
the_list.size( );
if
(searches <= 0 || list_size
<
0) {
cout << " Exiting test: "<<endl
<< " The number of searches must be positive."<<endl
<< " The number of list entries must exceed 0."<<endl;
return;
}
int i, target, found_at;
Key :: comparisons
=
0;
Random number;
Timer clock;
for
(i
=
0; i
<
searches; i++) {
target
=
2
*
number.random_integer(0, list_size − 1) + 1;
output function,
print_out, are left as a project.
Exercises 7.2
E1. One good check for any algorithm is to see what it does in extreme cases.
Determine what sequential search does when
(a) there is only one item in the list.
(b) the list is empty.
(c) the list is full.
E2. Trace sequential search as it searches for each of the keys present in a list con-
taining three items. Determine how many comparisons are made, and thereby
check the formula for the average number of comparisons for a successful
search.
Section 7.2 • Sequential Search 277
E3. If we can assume that the keys in the list have been arranged in order (for
example, numerical or alphabetical order), then we can terminate unsuccessful
searches more quickly. If the smallest keys come first, then we can terminate
the search as soon as a key greater than or equal to the target key has been
found. If we assume that it is equally likely that a target key not in the list is in
any one of the
n +1 intervals (before the first key, between a pair of successive
keys, or after the last key), then what is the average number of comparisons
for unsuccessful search in this version?
E4. At each iteration, sequential search checks two inequalities, one a comparison
of keys to see if the target has been found, and the other a comparison of
indices to see ifthe end of the listhasbeen reached. A good waytospeed up the
algorithm byeliminatingthesecondcomparisonistomake surethat eventually
key
target will be found, by increasing the size of the list and inserting an extra
item at the end with key
target. Such an item placed in a list to ensure that a
Find out how many comparisons are done for both unsuccessful and suc-
cessful searches, and compare these results with the analyses in the text.
Run your program for representative values of
n, such as n = 10, n = 100,
n = 1000.
278 Chapter 7 • Searching
P2. Take the driver program written in Project P1 to test searching functions, and
insert the version of sequential search that uses a sentinel (see Exercise E4). For
sentinel search
various values of n, determine whether the version with or without a sentinel
is faster. By experimenting, find the cross-over point between thetwo versions,
if there is one. That is, for what value of n is the extra time needed to insert
a sentinel at the end of a list of size
n about the same as the time needed for
extra comparisons of indices in the version without a sentinel?
P3. What changes are required to our sequential search function and testing pro-
gram in order to operate on simply linked lists as developed in Section 6.2.3?
linked sequential
search
Make these changes and apply the testing program from Project P1 for linked
lists to test linked sequential search.
7.3 BINARY SEARCH
Sequential search is easy to write and efficient for short lists, but a disaster for long
ones. Imagine trying to find the name “Amanda Thompson” in a large telephone
book by reading one name at a time starting at the front of the book! To find any
entry in a long list, there are far more efficient methods, provided that the keys in
the list are already sorted into order.
One of the best methods for a list with keys in order is first to compare the
target key with one in the center of the list and then restrict our attention to only
method
<
Record
>
{
public:
Ordered_list( );
Error_code insert(const Record &data);
Error_code insert(int position, const Record &data);
Error_code replace(int position, const Record &data);
};
As well as overriding the methods insert and replace, we have overloaded the
223
method insert so that it can be used with a single parameter. This overloaded
method places an entry into the correct position, determined by the order of the
keys. We shall study this operation further in Chapter 8, but here is a simple,
implementation-independent version of the overloaded method.
If the list already contains keys equal to the new one being inserted, then the
new key will be inserted as the
first of those that are equal.
Error_code Ordered_list :: insert(const Record &data)
/
*
Post: If the Ordered_list is not full, the function succeeds: The Record data is
inserted into the list, following the last entry of the list with a strictly lesser
key (or in the first list position if no list element has a lesser key).
Else: the function fails with the diagnostic Error_code overflow.
*
/
{
int s
*
Post: If the Ordered_list is not full, 0 ≤ position ≤ n, where n is the number
of entries in the list, and the Record data can be inserted at position in
the list, without disturbing the list order, then the function succeeds: Any
entry formerly in position and all later entries have their position numbers
increased by 1 and data is inserted at position of the List.
Else: the function fails with a diagnostic Error_code.
*
/
{
Record list_data;
if
(position
>
0) {
retrieve(position − 1, list_data);
if
(data
<
list_data)
return fail;
}
if (position
<
size( )) {
retrieve(position, list_data);
if
(data
>
list_data)
the part of the list in which we are looking for the target key. At each iteration, we
2
Richard E. Pattis, “Textbook errors in binary searching,” SIGCSE Bulletin, 20 (1988), 190–194.
Section 7.3 • Binary Search 281
shall reduce the size of this part of the list by about half. To help us keep track of the
225
progress of the algorithm, let us write down an assertion that we shall require to be
true before every iteration of the process. Such a statement is called an
invariant
of the process.
The target key, provided it is present in the list, will be found between the indices
bottom and top, inclusive.
invariant
We establish the initial correctness of this assertion by setting bottom to 0 and top
to the_list.size( ) − 1.
To do binary search, we first calculate the index
mid halfway between bottom
and top as
mid
=
(bottom + top)/2
Next, we comparethe targetkey against the key at positionmid and thenwe change
the appropriate one of the indices
top or bottom so as to reduce the list to either
its bottom or top half.
Next, we note that binary search should terminate when
top ≤ bottom; that
termination
is, when the remaining part of the list contains at most one item, providing that we
have not terminated earlier by finding the target.
Record data;
if
(bottom
<
top) { // List has more than one entry.
int mid
=
(bottom + top)/2;
the_list.retrieve(mid, data);
if
(data
<
target) // Reduce to top half of list.
return recursive_binary_1(the_list, target, mid + 1, top, position);
else //
Reduce to bottom half of list.
return recursive_binary_1(the_list, target, bottom, mid, position);
}
else if (top
<
bottom)
return not_present; // List is empty.
else { // List has exactly one entry.
position
=
bottom
;
the_list.retrieve(bottom, data);
if
(data == target) return success;
bottom. But this condition implies that when mid
is calculated we always have
bottom <= mid
<
top
Section 7.3 • Binary Search 283
since integer division truncates downward. Next, the if statement reduces the size
of the interval from
top − bottom either to top − (mid + 1) or to mid − bottom,
both of which, by the inequality, are strictly less than
top − bottom. Thus at each
iteration the size of the interval strictly decreases, so the recursion will eventually
terminate.
After the recursion terminates, we must finally check to see if the target key
has been found, since all previous comparisons have tested only inequalities.
To adjust the parameters to our standard search function conventions, we pro-
duce the following search function:
Error_code run_recursive_binary_1(const Ordered_list &the_list,
const
Key &target, int &position)
main call to
recursive_binary1
{
return recursive_binary_1(the_list, target, 0, the_list.size( ) − 1, position);
}
Since the recursion used in the function recursive_binary_1 is tail recursion, we
can easily convert it into an iterative loop. At the same time, we can make the
parameters consistent with other searching methods.
228
Error_code binary_search_1 (const Ordered_list &the_list,
bottom
=
mid
+ 1;
else
top
=
mid;
}
if (top
<
bottom) return not_present;
else
{
position
=
bottom;
the_list.retrieve(bottom, data);
if
(data == target) return success;
else return
not_present;
}
}
284 Chapter 7 • Searching
7.3.4 Recognizing Equality
Although binary_search_1 is a simple form of binary search, it seems that it will
often make unnecessary iterations because it fails to recognize that it has found
the target before continuing to iterate. Thus we might hope to save computer time
with a variation that checks at each stage to see if it has found the target.
success;
}
else if (data
<
target)
return recursive_binary_2(the_list, target, mid + 1, top, position);
else
return
recursive_binary_2(the_list, target, bottom, mid − 1, position);
}
else return not_present;
}
As with run_recursive_binary_1, we need a function run_recursive_binary_2 to ad-
just the parameters to our standard conventions.
Error_code run_recursive_binary_2(const Ordered_list &the_list,
const
Key &target, int &position)
main call to
recursive_binary2
{
return recursive_binary_2(the_list, target, 0, the_list.size( ) − 1, position);
}
Again, this functioncanbe translatedintononrecursiveformwith onlythestandard
parameters:
Section 7.3 • Binary Search 285
230
Error_code binary_search_2(const Ordered_list &the_list,
const
Key &target, int &position)
/
top
=
position − 1;
}
return not_present;
}
The operation of this version is described in the following diagram:
231
bottom top
?
< target
> target
Notice that this diagram (in contrast to that for the first method) is symmetrical
in that the first part contains only entries strictly less than
target, and the last
part contains only entries strictly greater than
target. With this method, therefore,
if
target appears more than once in the list, then the algorithm may return any
instance of the target.
Proving that the loop in
binary_search_2 terminates is easier than the proof for
loop termination
binary_search_1.Inbinary_search_2, the form of the if statement within the loop
guarantees that the length of theinterval is reduced by at least half ineach iteration.
comparison of methods Which of these two versions of binary search will do fewer comparisons of
keys? Clearly
binary_search_2 will, if we happen to find the target near the begin-
ning of the search. But each iteration of
binary_search_2 requires two comparisons
binary_search_1 and binary_search_2 the search options. Compare their per-
formance with each other and with sequential search.
P2. Incorporate the recursive versions of binary search (both variations) into the
testing program of Project P1 of Section 7.2 (page 277). Compare the perfor-
mance with the nonrecursive versions of binary search.
7.4 COMPARISON TREES
The comparison tree (also called decision tree or search tree) of an algorithm is
obtained by tracing through the action of the algorithm, representing each com-
parison of keys by a
vertex of the tree (which we draw as a circle). Inside the
definitions
circle we put the index of the key against which we are comparing the target key.
Branches (lines) drawn down from the circle represent the possible outcomes of
the comparison and are labeled accordingly. When the algorithm terminates, we
put either
F (for failure) or the position where the target is found at the end of
232
the appropriate branch, which we call a leaf, and draw as a square. Leaves are
also sometimes called
end vertices or external vertices of the tree. The remaining
vertices are called the
internal vertices of the tree.
The comparison tree for sequential search is especially simple; it is drawn in
Figure 7.2.
The number of comparisons done by an algorithm in a particular search is
the number of internal (circular) vertices traversed in going from the top of the
tree (which is called its
root) down the appropriate path to a leaf. The number of
definitions
branches traversed to reach a vertex from the root is called the level of the vertex.
≠
=
F
.
.
.
Figure 7.2. Comparison tree for sequential_search
7.4.1 Analysis for n = 10
1. Shape of Trees
That sequential search on average does far more comparisons than binary search
is obvious from comparing the shape of its tree with the shape of the trees for
binary_search_1 and binary_search_2, which for n = 10 are drawn inFigure 7.3 and
Figure 7.4, respectively. Sequential search has a long, narrow tree, which means
many comparisons, whereasthe trees for binary search are much wider and shorter.
234
12FF
123F 678
≠
≠>
67FF
4F5F 9 10FFF
13 6845 910
2749
38
5
≠≠≠
====
≠≠≠≠≠
== ===
>
>
>
>
>
>
>
=
>
>
=
>
>
>
>
=
>
>
=
>
>
=
==
=
=
=
≠
>
>
2
…
key at mid:
Search from
mid +1 to top.
<>
<>
S
= ≠ =
Figure 7.5. Top of the comparison tree, recursive binary_search_2
Section 7.4 • Comparison Trees 289
From the trees shown for binary_search_1 and binary_search_2 with n = 10,
it is easy to read off how many comparisons will be done by each algorithm. In
the worst case search, this number is simply one more than the height of the tree;
in fact, for every search it is the number of interior vertices lying between the root
and the vertex that terminates the search.
3. Comparison Count for binary_search_1
In binary_search_1, every search terminates at a leaf; to obtain the average number
of comparisons for both successful and unsuccessful searches, we need what is
called the
external path length of the tree: the sum of the number of branches
external path length
traversed in going from the root once to every leaf in the tree. For the tree in
Figure 7.3, the external path length is
(4 × 5)+(6 × 4)+(4 × 5)+(6 × 4)= 88.
Half the leaves correspond to successful searches, and half to unsuccessful searches.
Hence the average number of comparisons needed for either a successful or un-
successful search by
binary_search_1 is
44
10
= 4.4 when n = 10.
For an unsuccessful search by
binary_search_2, we need the external path
length of the tree in Figure 7.4. This is
(5 × 3)+(6 × 4)= 39.
290 Chapter 7 • Searching
We shall assume for unsuccessful searches that the n + 1 intervals (less than the
first key, between a pair of successive keys, or greater than the largest) are all
equally likely; for the diagram we therefore assume that any of the 11 failure leaves
are equally likely. Thus the average number of comparisons for an unsuccessful
average unsuccessful
count
search is
2
× 39
11
≈ 7.1.
5. Comparison of Algorithms
For n = 10, binary_search_1 does slightly fewer comparisons both for successful
and for unsuccessful searches. To be fair, however, we should note that the two
comparisons done by
binary_search_2 at each internal vertex are closely related
(the same keys are being compared), so that an optimizing compiler may not do as
much work as two full comparisons. In that case, in fact,
binary_search_2 may be
a slightly better choice than
binary_search_1 for successful searches when n = 10.
7.4.2 Generalization
What happens when n is larger than 10? For longer lists, it may be impossible to
draw the complete comparison tree, but from the examples with
n = 10, we can
assume that we have
k vertices on level t. Since (by the second half of Lemma 7.1)
k ≤ 2
t
, we obtain t ≥ lg k, where lg denotes a logarithm with base 2.
3
Lemma 7.2 If a 2-tree has k vertices on level t, then t ≥ lg k, where lg denotes a logarithm with
base 2.
3
For a review of properties of logarithms, see Appendix A.
Section 7.4 • Comparison Trees 291
The notation for base 2 logarithms just used will be our standard notation through-
out this book. In analyzing algorithms we shall also sometimes need natural loga-
rithms (taken with base
e = 2.71828 ). We shall denote a naturallogarithm by ln.
We shall rarely need logarithms to any other base. We thus summarize as follows:
logarithms
Conventions
Unless stated otherwise, all logarithms will be taken with base 2.
The symbol
lg denotes a logarithm with base 2,
and the symbol
ln denotes a natural logarithm.
When the base for logarithms is not specified (or is not important),
then the symbol
log will be used.
After we take logarithms, we frequently need to move either up or down to the
floor and ceiling
next integer. To specify this action, we define the floor of a real number x to be
the largest integer less than or equal to
2
whose lengths differ by at most 1, then all leaves of T
1
and T
2
are
on the same or adjacent levels. The statement is clearly true when
L
1
and L
2
are
lists with length at most 2. Moreover, if
binary_search_1 divides two larger lists
whose sizes differ by at most one, the sizes of the four halves also differ by at
most 1, and the induction hypothesis shows that their leaves are all on the same or
adjacent levels.) From Lemma 7.2 it follows that the maximum level
t of leaves in
the comparison tree satisfies
t =lg 2n.
Since one comparison of keys is done at the root (which is level 0), but no
comparisons are done at the leaves (level
t), it follows that the maximum number
of key comparisons is also
t =lg 2n. Furthermore, the maximum number is at
most one morethan the average number, sinceall leaves are on the sameoradjacent
levels.
292 Chapter 7 • Searching
Hence we have:
The number of comparisons of keys done by binary_search_1 in searching a list of n
binary_search_2, two comparisons of keys are performed for each internal ver-
tex, the number of comparisons done in an unsuccessful search is approximately
2lg
(n +1).
The number of comparisons done in an unsuccessful search by binary_search_2 is
approximately
2lg(n + 1).
4. The Path-Length Theorem
To calculate the average number of comparisons for a successful search by bi-
237
nary_search_2, we first obtain an interesting and important relationship that holds
for any 2-tree.
Theorem 7.3 Denote the external path length of a 2-tree by E , the internal path length by I , and
let
q be the number of vertices that are not leaves. Then
E = I + 2q.
Proof To prove the theorem we use the method of mathematical induction, using the
number of vertices in the tree to do the induction.
If the tree contains only its root, and no other vertices, then
E = I = q = 0, and
the base case of the theorem is trivially correct.
Now take a larger tree, and let
v be some vertex that is not a leaf, but for which
both the children of
v are leaves. Let k be the number of branches on the path
from the root to
v . See Figure 7.6.
Section 7.4 • Comparison Trees 293
q non-leaves
k
we have seen. The number of leaves is
n +1, so the external path length is about
(n + 1)lg(n + 1).
Theorem 7.3 then shows that the internal path length is about
(n + 1)lg(n + 1)−2n.
To obtain the average number of comparisons done in a successful search, we must
first divide by
n (the number of non-leaves) and then add 1 and double, since two
comparisons were done at each internal node. Finally, we subtract 1, since only
one comparison is done at the node where the target is found. The result is:
294 Chapter 7 • Searching
In a successful search of a list of n entries, binary_search_2 does approximately
236
2(n + 1)
n
lg(n + 1)−3
comparisons of keys.
7.4.3 Comparison of Methods
Note the similarities and differences in the formulas for the two versions of binary
search. Recall, first, that we have already made some approximations in our cal-
culations, and hence our formulas are only approximate. For large values of
n the
difference between lg
n and lg
(n +1) is insignificant, and (n +1)/n is very nearly
simplified counts
1. Hence we can simplify our results as follows:
Successful search Unsuccessful search
binary_search_1 lg n +1lgn +1
binary_search_2 2lgn −32lgn
is better, and for small problems, sequential_search is better. To
be fair, however, with some computers and optimizing compilers, the two com-
parisons needed in
binary_search_2 will not take double the time of the one in
binary_search_1, so in such a situation binary_search_2 might prove the better
choice.
Section 7.4 • Comparison Trees 295
Our object in doing analysis of algorithms is to help us decide which may be
better under appropriate circumstances. Disregarding the foregoing provisos, we
have now been able to make such a decision, and have available to us information
that might otherwise not be obvious.
The numbers of comparisons of keys done in the average successful case by
sequential_search, binary_search_1, and binary_search_2 are graphed in Figure 7.7.
The numbers shown in the graphs are from test runs of the functions; they are not
approximations. The first graph in Figure 7.7 compares the three functions for
logarithmic graphs
small values of n, the number of items in the list. In the second graph we compare
the numbers over a much larger range by employing a
log-log graph in which each
unit along an axis represents doubling the corresponding coordinate. In the third
graph we wish to compare the two versions of binary search; a
semilog graph is
appropriatehere, so that the vertical axismaintains linear units while the horizontal
238
axis is logarithmic.
+
+
+
+
+
Sequential
Binary
2048
1024
512
256
128
64
32
16
8
4
2
1
1 2 4 8 32 128 512 2048
Sequential
Binary 2
Binary 1
22
20
18
16
14
12
10
8
6
4
2
0
binary_search_2. That is, we shall assume
hypotheses
that the leaves of the comparison tree correspond to unsuccessful searches, that the
internal vertices correspond to successful searches, and that two comparisons of
keys are made for each internal vertex, except that only one is made at the vertex
where the target is found. If
I and E are the internal and external path lengths of
the tree, respectively, and
n is the number of items in the list, so that n is also the
number of internal vertices in the tree, then, as in the analysis of
binary_search_2,
we know that the average number of comparisons in a successful search is
237
S = 2
I
n
+
1
−
1 =
2I
n
+
1
and the average number for an unsuccessful search is
U = 2E/(n +1).ByTheorem
7.3,
E = I + 2n. Combining these expressions, we can therefore conclude that
search over all 10,000 names every time, consider the possibility of splitting the
Section 7.5 • Lower Bounds 297
list into two: a high-frequency list of 2000 names and a low-frequency list of
the remaining 8000 names. To look up a name, you will first use binary search
on the high-frequency list, and 80 percent of the time you will not need to go
on to the second stage, where you use binary search on the low-frequency list.
Is this scheme worth the effort? Justify your answer by finding the number of
comparisons done by
binary_search_1 for the average successful search, both
in the new scheme and in a binary search of a single list of 10,000 names.
E4. Use mathematical induction on
n to prove that, in the comparison tree for
binary_search_1 on a list of n entries, n>0, all the leaves are on levels lg 2n
and lg 2
n. [Hence, if n is a power of 2 (so that lg 2n is an integer), then all
the leaves are on one level; otherwise, they are all on two adjacent levels.]
E5. If you modified binary search so that it divided the list not essentially in half at
each pass, but instead into two pieces of sizes about one-third and two-thirds
of the remaining list, then what would be the approximate effect on its average
count of comparisons?
Programming
Projects 7.4
P1. (a) Write a “ternary” search function analogous to binary_search_2 that ex-
amines the key one-third of the way through the list, and if the target key is
greater, then examines the key two-thirds of the way through, and thus in any
case at each pass reduces the length of the list by a factor of three. (b) Include
your function as an additional option in the testing program of Project P1 of
Section 7.2 (page 277), and compare its performance with other methods.
P2. (a) Write a program that will do a “hybrid” search, using
binary_search_1 for
bi-
nary_search_1
and binary_search_2 become insignificant in comparison with the
difference between either binary search and sequential search. For large lists,
bi-
nary_search_2
may require nearly double the time of binary_search_1, but the dif-
ference between 2 lg
n and lg n is negligible compared to the difference between
2lg
n comparisons and the n comparisonssometimes needed by sequential search.
2. Arbitrary Searching Algorithms
Let us now ask whether it is possible for any search algorithm to exist that will,
in the worst and the average cases, be able to find its target using significantly
fewer comparisons of keys than binary search. We shall see that the answer is
no,
providing that we stay within the class of algorithms that rely only on comparisons
of keys to determine where to look within an ordered list.
general algorithms
and comparison trees
Let us start with an arbitrary algorithm that searches an ordered list by making
comparisons of keys, and imagine drawing its comparison tree in the same way as
we drew the tree for
binary_search_1. That is, each internal node of the tree will
correspond to one comparison of keys and each leaf to one of the possible final
outcomes. (If the algorithm is formulated as three-way comparisons like those
of
binary_search_2, then we expand each internal vertex into two, as shown for
one vertex in Figure 7.4.) The possible outcomes to which the leaves correspond
include not only the successful discovery of the target but also the different kinds
them from
v , and attach them as children of some (former) leaf on level s. Then
we have changed
T into a new 2-tree T
that still has k leaves, the height of T
is
certainly no more than that of
T , and the external path length of T
satisfies
E(T
)= E(T)−2r + (r − 1)−s + 2(s + 1)= E(T)−r + s + 1 < E(T )
since r>s+ 1. The terms in this expression are obtained as follows. Since two
leaves at level
r are removed, E(T) is reduced by 2r . Since vertex v has become a
leaf,
E(T) is increased by r −1. Since the vertex on level s is no longer a leaf, E(T)
is reduced by s . Since the two leaves formerly on level r are now on level s + 1,
the term 2
(s +1) is added to E(T). This process is illustrated in Figure 7.8.
239
v
s
r
Figure 7.8. Moving leaves higher in a 2-tree
We can continue in this way to move leaves higher up the tree, reducing the
external path length and possibly the height each time, until finally all the leaves
2
(k −x) vertices on level h −1, which, with the x leaves, comprise all
vertices on level
h −1. Hence, by Lemma 7.2,
1
2
(k − x)+x ≤ 2
h−1
,
300 Chapter 7 • Searching
which becomes x ≤ 2
h
− k. We now have
E(T) = (h − 1)x + h(k − x)
= kh − x
≥ kh − (
2
h
− k)
= k(h +
1)−2
h
.
From the bound on the height, we already know that 2
h−1
<k≤ 2
h
. Ifweset
h = lg k +, then satisfies 0 ≤ <1, and substituting into the bound for E(T)
we obtain
n + 1 comparisons of keys. When we compare
these numbers with those obtained in the analysis of
binary_search_1, we obtain
Corollary 7.7 binary_search_1 is optimal in the class of all algorithms that search an ordered list by
making comparisons of keys. In both the average and worst cases,
binary_search_1
achieves the optimal bound.