constants, or if a variable is repeated twice in the rule head, it can easily be rectified: a constant c is
replaced by a variable X, and a predicate equal(X, c) is added to the rule body. Similarly, if a
variable Y appears twice in a rule head, one of those occurrences is replaced by another variable Z, and
a predicate equal(Y, Z) is added to the rule body.
The evaluation of a nonrecursive query can be expressed as a tree whose leaves are the base relations.
What is needed is appropriate application of the relational operations of SELECT, PROJECT, and
JOIN, together with set operations of UNION and SET DIFFERENCE, until the predicate in the query
gets evaluated. An outline of an inference algorithm
GET_EXPR(Q) that generates a relational
expression for computing the result of a DATALOG query Q = p(arg
1
, arg
2
, , arg
n
) can
informally be stated as follows:
1. Locate all rules S whose head involves the predicate p. If there are no such rules, then p is a
fact-defined predicate corresponding to some database relation R
p
; in this case, one of the
following expressions is returned and the algorithm is terminated (we use the notation $i to
refer to the name of the i
th
attribute of relation R
p
);
a. If all arguments are distinct variables, the relational expression returned is R
p
into the natural join. Let the resulting relation from this join be R
s
.
c. If any built-in predicate X h Y was defined over the arguments X and Y, the result of
the join is subjected to an additional selection:
SELECT
X h Y
(R
s
),
d. Repeat Step 2(b) until no more built-in predicates apply.
3. Take the UNION of the expressions generated in Step 2 (if more than one rule exists with
predicate p as its head). 25.5.4 Concepts for Recursive Query Processing in Datalog
1
Page 698 of 893
Naive Strategy
Seminaive Strategy
The Magic Set Rule Rewriting Technique
Query processing can be separated into two approaches:
• Pure evaluation approach: Creating a query evaluation plan that produces an answer to the
query.
• Rule rewriting approach: Optimizing the plan into a more efficient strategy.
Many approaches have been presented for both recursive and nonrecursive queries. We discussed an
approach to nonrecursive query evaluation earlier. Here we first define some terminology for recursive
ancestor(X,Y) :- ancestor(X,Z), parent(Z,Y) is called left linearly recursive, while the rule ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y), is called right linearly recursive.
Notice that the rule ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y) is not linearly recursive. It is believed that most "real-life" rules can be described as linear recursive
rules; algorithms have been defined to execute linear sets of rules efficiently. The preceding definitions
become more involved when a set of rules with predicates that occur on both the LHS and the RHS of
rules are considered.
A predicate whose relation is stored in the database is called an extensional database (EDB)
predicate, while a predicate for which the corresponding relation is defined by logical rules is called an
intensional database (IDB) predicate. Given a Datalog program with relations corresponding to the
predicates, the "if" symbol, :-, may be replaced by an equality to form Datalog equations, without any
loss of meaning. The resulting set of Datalog equations could potentially have many solutions. In a set
of relations for the EDB predicates, say R
1
, R
2
, , R
and only if there is an edge from node X to node Y in the graph, the paths in the graph may be
expressed by the following rules:
1
Page 700 of 893path(X,Y) :- edge(X,Y)
path(X,Y) :- path(X,Z), path (Z,Y) Notice that there are other ways of defining paths recursively. Let us assume that relations P and A
correspond to the predicates path and edge in the preceding rules. The transitive closure of relation P
contains all possible pairs of nodes that have a path between them, and it corresponds to the least fixed-
point solution corresponding to the equations that result from the preceding rules (Note 6). These rules
can be turned into a single equation for the relation P corresponding to the predicate edge. P(X,Y) = A(X,Y) D p
X,Y
(P(X,Z)P(Z,Y)) Suppose that the nodes are 3,4,5 and A = {(3,4), (4,5)}. From the first and second rules we can infer
that (3,4), (4,5) and (3,5) are in P. We need not look for any other paths, because P = {(3,4),(4,5),(3,5)}
is a solution of the above equation: {(3,4),(4,5),(3,5)} = {(3,4),(4,5)}D p
X,Y
({(3,4),(4,5),(3,5)}{ (3,4),(4,5),(3.5)})
, R
2
, , R
n
) The Jacobi method proceeds as follows. Initially, the variable relations R
i
are set equal to the empty set.
Then, the computation R
i
= E
i
(R
1
, R
2
, , R
n
), i = 1, , n is iterated until none of the R
i
changes
between two consecutive iterations (i.e., until the R
i
reach a fixpoint). Algorithm 25.1 Jacobi naive strategy.
n
);
If R
i
S
i
then condition = false
end
until condition; 1
Page 702 of 893
The convergence of the Jacobi method can be slightly improved if, at each step k, in order to compute
the new value R
i
(k) , we substitute in E
i
the values of R
j
(k) that have just been computed in the same
iteration instead of the old values R
j
(k - 1). This variant of the Jacobi method is called the Gauss-Seidel
method, which produces the same result as the Jacobi algorithm. Consider the following example
where ancestor(X, Y) means X is ancestor of Y; parent(X, Y) means X is parent of Y. ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- ancestor(X,Z), parent(Z,Y).
includes parents as ancestors. A
(1)
S
1
, thus condition = false. We therefore enter the second iteration
with S
1
set to A
(1)
. Computing the value of A again, we get, A
(2)
= P = {(bert,alice),(bert,george),(alice,derek),(alice,pat),(derek,frank), (bert,derek), (bert,pat),
(alice,frank)}.
1
Page 703 of 893It can be seen that A
(2)
= A(1) D {(bert,derek), (bert,pat), (alice,frank)}. Note that A
(2)
now includes
grandparents as ancestors besides parents. Since A
(2)
S
1
, we iterate again, setting S
(4)
= P = {(bert,alice),(bert,george),(alice,derek),(alice,pat),(derek,frank), (bert,derek), (bert,pat),
(alice,frank), (bert,frank)}. Finally, A
(4)
= A
(3)
= S
1
, the evaluation is finished. Intuitively, from the above parental hierarchy, it is
obvious that all ancestors have been computed. Seminaive Strategy
Seminaive evaluation is a bottom-up technique designed to eliminate redundancy in the evaluation of
tuples at different iterations. This method does not use any information about the structure of the
program. There are two possible settings of the seminaive algorithm: the (pure) seminaive and the
pseudo rewriting seminaive.
Consider the Jacobi algorithm. Let R
i(k)
be the temporary value of relation R
i
at iteration Step k. The
differential of R
i
at Step k of the iteration is defined as,
for i = 1 to n do R
i
= ;
for i = 1 to n do D
i
= ;
repeat
for i = 1 to n do S
i
= ;
condition = true;
for i = 1 to n do
begin
D
i
= E
i
[S
1
, , S
n
] - R
i
;
R
i
= D
i
D R
i
Page 705 of 893A
(1)
= D
(1)
D A
(0)
= {(bert,alice),(bert,george),(alice,derek),
(alice,pat),(derek,frank)}.
D
(2)
= {(bert,alice),(bert,george),(alice,derek),
(alice,pat),(derek,frank), (bert,derek),
(bert,pat), (alice,frank)}- A(1)
= {(bert,derek), (bert,pat), (alice,frank)}.
A
(2)
= D
(2)
D A
(1)
= {(bert,alice),(bert,george),(alice,derek),
(alice,pat),(derek,frank), (bert,derek),
(bert,pat), (alice,frank)}.
D
(3)
The Magic Set Rule Rewriting Technique
The problem addressed by the magic sets rule rewriting technique is that frequently a query asks not for
the entire relation corresponding to an intentional predicate but for a small subset of this relation.
Consider the following program:
1
Page 706 of 893sg(X,Y) :- flat(X,Y).
sg(X,Y) :- up(X,U), sg(U,V), down(V,Y). Here, sg is a predicate ("same-generation cousin"), and the head of each of the two rules is the atomic
formula sg(X, Y). The other predicates found in the rules are flat, up, and down. These are
presumably stored extensionally as facts, while the relation for sg is intentional—that is, defined only
by the rules. For a query like sg(john, Z)—that is, "who are the same generation cousins of
John?"—asked of the predicate, our answer to the query must examine only the part of the database
that is relevant—namely the part that involves individuals somehow connected to John.
A top-down, or backward-chaining search would start from the query as a goal and use the rules from
head to body to create more goals; none of these goals would be irrelevant to the query, although some
might cause us to explore paths that happen to "deadend." On the other hand, a bottom-up or forward-
chaining search, working from the bodies of the rules to the heads, would cause us to infer sg facts that
would never even be considered in the top-down search. Yet bottom-up evaluation is desirable because
it avoids the problems of looping and repeated computation that are inherent in the top-down approach,
and allow us to use set-at-a-time operations, such as relational joins.
Magic sets rule rewriting is a technique that allows us to rewrite the rules as a function of the query
form only—that is, it considers which arguments of the predicate are bound to constants and which are
variable, so that the advantages of top-down and bottom-up methods are combined. The technique
focuses on the goal inherent in the top-down evaluation but combines this with the looping freedom,
easy termination testing, and efficient evaluation of bottom-up evaluation. Instead of giving the
discussed earlier, does not hold. In the presence of negated literals, a program may not have a minimal
or least model. For example, the program p(a):- not p(b). has two minimal models: {p(a)} and {p(b)}.
A detailed analysis of the concept of negation is beyond our scope. But for practical purposes, we next
discuss stratified negation, an important notion used in deductive system implementations.
The meaning of a program with negation is usually given by some "intended" model. The challenge is
to develop algorithms for choosing an intended model that does the following:
1. Makes sense to the user of the rules.
2. Allows us to answer queries about the model efficiently.
In particular, it is desirable that the model work well with the magic sets transformation, in the sense
that we can modify the rules by some suitable generalization of magic sets, and the resulting rules
allow (only) the relevant portion of the selected model to be computed efficiently. (Alternatively, other
efficient evaluation techniques must be developed.)
One important class of negation that has been extensively studied is stratified negation. A program is
stratified if there is no recursion through negation. Programs in this class have a very intuitive
semantics and can be efficiently evaluated. The example that follows describes a stratified program.
Consider the following program P
2
: r
1
: ancestor(X,Y) :- parent (X,Y).
r
25.6.3 The CORAL System
The founding event of the deductive database field can be considered to be the Toulouse workshop on
"Logic and Databases" organized by Gallaire, Minker, and Nicolas in 1977. The next period of the
explosive growth started with the setting up of the MCC (Microelectronics and Computer Technology
Corporation), which was a reaction to the Japanese Fifth Generation Project. Several experimental
deductive database systems have been developed and a few have been commercially deployed. In this
section we briefly review three different implementations of the ideas presented so far: LDL, NAIL!,
and CORAL. 25.6.1 The LDL System
Background, Motivation, and Overview
The LDL Data Model and Language
The Logic Data Language (LDL) project at Microelectronics and Computer Technology Corporation
(MCC) was started in 1984 with two primary objectives:
• To develop a system that extends the relational model yet exploits some of the desirable
features of an RDBMS (relational database management system).
• To enhance the functionality of a DBMS so that it works as a deductive DBMS and also
supports the development of general-purpose applications.
The resulting system is now a deductive DBMS made available as a product. In this section, we briefly
survey the highlights of the technical approach taken by LDL and consider its important features. Background, Motivation, and Overview
The design of the LDL language may be viewed as a rule-based extension to domain calculus-based
languages (see Section 9.4). The LDL system has tried to combine the expressive capability of Prolog
with the functionality and facility of a general-purpose DBMS. The main drawback experienced by
earlier systems that coupled Prolog with an RDBMS is that Prolog is navigational (tuple-at-a-time)
1
A particular employee record can therefore be defined as follows: Employee (Name (John Doe), Job(VP),
Education ({(High school, 1961),
(College (Fergusson, bs, physics), 1965),
(College (Michigan, phd, ie), 1976)})) In the preceding record, VP is a simple term, whereas education is a complex term that consists of a
term for high school and a nested relation containing the term for college and the year of graduation.
LDL thus supports complex objects with an arbitrarily complex structure including lists, set terms,
1
Page 710 of 893
trees, and nested relations. We can think of a compound term as a Prolog structure with the function
symbol as the functor.
LDL allows updates in the bodies of rules. For instance, a rule happy (Dept, Raise, Name) <-
emp (Name, Dept, Sal), Newsal = Sal+Raise
-emp (Name, Dept, -), +emp(Name,Dept,Newsal). combined with ?happy(software, 1000, Name). The preprocessor rewrites the source NAIL! program by isolating "negation" and "set" operators, and
by replacing disjunction with several conjunctive rules. After preprocessing, the NAIL! program is
represented through its predicates and rules. The strategy selection module takes as input the user’s
goal and produces as output the best execution strategies for solving the user’s goal and all the other
goals related to it, using the internal language ICODE.
The ICODE statements produced as a result of the strategy selection process are optimized and then
executed through an interpreter, which translates ICODE retrieval statements to SQL when needed.
An initial prototype system was built but later abandoned because the purely declarative paradigm was
found to be unworkable for many applications. The revised system uses a core language, called GLUE,
which is essentially single logical rules, with the power of SQL statements, wrapped in conventional
language constructs such as loops, procedures, and modules. The original NAIL! language becomes a
view mechanism for GLUE; it permits fully declarative specifications in situations where
declarativeness is appropriate. 25.6.3 The CORAL System
The CORAL system, which was developed at the University of Wisconsin at Madison, builds on
experience gained from the LDL project. Like LDL, the system provides a declarative language based
on Horn clauses with an open architecture. There are many important differences, however, in both the
language and its implementation. The CORAL system can be seen as a database programming
language that combines important features of SQL and Prolog.
From a language standpoint, CORAL adapts LDL’s set-grouping construct to be closer to SQL’s
GROUP BY construct. For example, consider budget(Dname,sum(<Sal>)) :- dept(Dname,Ename,Sal).
nonground tuples efficiently, in addition to techniques such as magic templates for pushing selections
into recursive queries, pushing projections, and special optimizations of different kinds of (left- and
right-) linear programs. It also provides an efficient way to compute nonstratified queries. A "shallow-
compilation" approach is used, whereby the run-time system interprets the compiled plan. CORAL
uses the EXODUS storage manager to provide support for disk-resident relations. It also has a good
interface with C++ and is extensible, enabling a user to customize the system for special applications
by adding new data types or relation implementations. An interesting feature is an explanation package
that allows a user to examine graphically how a fact is generated; this is useful for debugging as well as
for providing explanations. 25.7 Deductive Object-Oriented Databases
25.7.1 Overview of DOODs
25.7.2 VALIDITY
The emergence of deductive database concepts is contemporaneous with initial work in Logic
Programming. Deductive object-oriented databases (DOODs) came about through the integration of the
OO paradigm and logic programming. The observation that OO and deductive database systems
generally have complementary strengths and weaknesses gave rise to the integration of the two
paradigms. 25.7.1 Overview of DOODs
1
Page 713 of 893
Since the late 1980s, several DOOD prototypes were developed in universities and research
laboratories. VALIDITY, which was developed at Bull, is the first industrial product in the DOOD
arena. The LDL and the CORAL systems we reviewed offer some additional object-orientated
features—e.g., in CORAL++ —and may be considered as DOODs.
The following broad approaches have been adopted in the design of DOOD systems:
• Language extension: An existing deductive language model is extended with object-oriented
VALIDITY integrates the traditional functions of a database (persistency, concurrency control, crash
recovery, etc.) with the advanced deductive capabilities for deriving information and verifying
semantic integrity. The lowest level component of the engine is a fact manager that integrates storage,
concurrency control, and recovery functions. The fact manager supports fact identity and complex data
items. In addition to locking, the concurrency control protocol integrates read-consistency technology,
used in particular when verifying constraints. The higher-level component supports the DEL language
and performs optimization, compilation, and execution of statements and queries. The engine also
supports an SQL interface permitting SQL queries and updates to be run on VALIDITY data.
VALIDITY also has a deductive wrapper for SQL systems, called DELite. This supports a subset of
DEL functionality (no constraints, no recursion, limited object capabilities, etc.) on top of commercial
SQL systems.
1
Page 714 of 893DEL Data Model
The DEL data model integrates a rich type system with primitives to define persistent and derived data.
The DEL type system consists of built-in types, which can be used to implement user-defined and
composite types. Composite types are defined using four type constructors: (1) bag, (2) set, (3) list, and
(4) tuple.
The basic unit of information in VALIDITY is called a fact. Facts are instances of predicates, which
are logical constructs characterized by a name and a set of typed attributes. A fact specifies values to
the attributes of the predicate of which it is an instance. There are four kinds of predicates and facts in
VALIDITY:
1. Basis facts: Are persistent units of information stored in the database; they are instances of
basis predicates, which have attributes and methods and are organized into inheritance
hierarchies.
2. Derived facts: Are deduced from basis facts stored in the database or other derived facts; they
are instances of derived predicates.
3. Computed predicates and facts: These are similar to derived predicates and facts, but they are
has been applied to genome data analysis in the field of microbiology, where data dredging
consists of identifying the DNA sequences from low-level digitized autoradiographs from
experiments performed on E. coli bacteria.
• Software reuse: The bulk of the software for an application is developed in standard
procedural code, and a small fraction is rule-based and encoded in LDL. The rules give rise to
a knowledge base that contains the following elements:
A definition of each C module used in the system.
A set of rules that defines ways in which modules can export/import functions, constraints, and so on. The "knowledge base" can be used to make decisions that pertain to the reuse of software subsets.
Modules can be recombined to satisfy specific tasks, as long as the relevant rules are satisfied. This is
being experimented with in banking software. 25.8.2 VALIDITY Applications
Knowledge independence is a term used by VALIDITY developers to refer to a technical version of
business rule independence. From a database standpoint, it is a step beyond data independence that
brings about integration of data and rules. The goal is to achieve streamlining of application
development (multiple applications share rules managed by the database), application maintenance
(changes in definitions and in regulations are more easily done), and ease-of-use (interactions are done
through high-level tools enabled by the logic foundation). For instance, it simplifies the task of the
application programmer who does not need to include tests in his application to guarantee the
soundness of his transactions. VALIDITY claims to be able to express, manage, and apply the business
rules governing the interactions among various processes within a company.
VALIDITY is an appropriate tool for applying software engineering principles to application
development. It allows the formal specification of an application in the DEL language, which can then
be directly compiled. This eliminates the error-prone step that most methodologies based on entity-
relationship conceptual designs and relational implementations require between specification and
compilation. The following are some application areas of the VALIDITY system:
management. This field has been influenced by logic programming languages, particularly by Prolog.
A subset of Prolog called Datalog, which contains function-free Horn clauses, is primarily used as the
basis of current deductive database work. Concepts of Datalog were introduced here. We discussed the
standard backward-chaining inferencing mechanism of Prolog and a forward-chaining bottom-up
strategy. The latter has been adapted to evaluate queries dealing with relations (extensional databases),
by using standard relational operations together with Datalog. Procedures for evaluating nonrecursive
and recursive query processing were discussed and algorithms presented for naive and seminaive
evaluation of recursive queries. Negation is particularly difficult to deal with in such deductive
databases; a popular concept called stratified negation was introduced in this regard.
We surveyed a commercial deductive database system called LDL originally developed at MCC and
other experimental systems called CORAL and NAIL!. The latest deductive database implementations
are called DOODs. They combine the power of object orientation with deductive capabilities. The most
recent entry on the commercial DOOD scene is VALIDITY, which we discussed here briefly. The
deductive database area is still in an experimental stage. Its adoption by industry will give a boost to its
development. Toward this end, we mentioned practical applications in which LDL and VALIDITY are
proving to be very valuable. Exercises
25.1. Add the following facts to the example database in Figure 25.03: supervise (ahmad,bob), supervise (franklin,gwen). First modify the supervisory tree in Figure 25.01(b) to reflect this change. Then modify the
diagram in Figure 25.04 showing the top-down evaluation of the query superior(james,
Y).
25.2.
Consider the following set of facts for the relation parent(X, Y), where Y is the parent of X:
Notice that "father(X, Y)" means that Y is the father of X; "ancestor(X, Y)" means that
Y is the ancestor of X. Consider the fact base father(Harry,Issac), father(Issac,John), father(John,Kurt).
a. Construct a model theoretic interpretation of the above rules using the given facts.
b. Consider that a database contains the above relations father(X, Y), another
relation brother(X, Y), and a third relation birth(X, B), where B is the birthdate
of person X. State a rule that computes the first cousins of the following variety: their
fathers must be brothers.
c. Show a complete Datalog program with fact-based and rule-based literals that
1
Page 718 of 893
computes the following relation: list of pairs of cousins, where the first person is born
after 1960 and the second after 1970. You may use "greater than" as a built-in
predicate. (Note: Sample facts for brother, birth, and person must also be shown.)
25.4. Consider the following rules: reachable(X,Y) :- flight(X,Y)
reachable(X,Y) :- flight(X,Z), reachable(Z,Y) where reachable(X, Y) means that city Y can be reached from city X, and flight(X, Y)
means that there is a flight to city Y from city X.
a. Construct fact predicates that describe the following:
i. Los Angeles, New York, Chicago, Atlanta, Frankfurt, Paris, Singapore,
Sydney are cities.
ii. The following flights exist: LA to NY, NY to Atlanta, Atlanta to Frankfurt,
1
Page 719 of 893
sgc(X,Y) :- eq(X,Y).
sgc(X,Y) :- par(X,X1), sgc(X1,Y1), par(Y,Y1). and the EDB, PAR = {(d, g), (e, g), (b, d), (a, d), (a, h), (c, e)}. What is the result
of the query sgc(a,Y)? Solve using the naive and seminaive methods.
25.6. The following rules have been given: path(X,Y) :- arc(X,Y).
path(X,Y) :- path(X,Z), path(Z,Y). Suppose that the nodes in a graph are {a, b, c, d} and there are no arcs. Let the set of paths, P
= {(a, b), (c, d)}. Show that this model is not a fixed point.
25.7. Consider the frequent flyer Skymiles program database at an airline. It maintains the following
relations: 99status(X,Y), 98status(X,Y), 98Miles(X,Y).
d. Does the superior have a fixpoint? Why or why not? Explain.
For the population of players in the database, assuming John is one of the players, how do you
compute "superior (john, X)?" using naive, and seminaive algorithms? Selected Bibliography
The early developments of the logic and database approach are surveyed by Gallaire et al. (1984).
Reiter (1984) provides a reconstruction of relational database theory, while Levesque (1984) provides a
discussion of incomplete knowledge in light of logic. Gallaire and Minker (1978) provide an early
book on this topic. A detailed treatment of logic and databases appears in Ullman (1989, vol. 2), and
there is a related chapter in Volume 1 (1988). Ceri, Gottlob, and Tanca (1990) present a comprehensive
yet concise treatment of logic and databases. Das (1992) is a comprehensive book on deductive
databases and logic programming. The early history of Datalog is covered in Maier and Warren (1988).
Clocksin and Mellish (1994) is an excellent reference on Prolog language.
Aho and Ullman (1979) provide an early algorithm for dealing with recursive queries, using the least
fixed-point operator. Bancilhon and Ramakrishnan (1986) give an excellent and detailed description of
the approaches to recursive query processing, with detailed examples of the naive and seminaive
approaches. Excellent survey articles on deductive databases and recursive query processing include
Warren (1992) and Ramakrishnan and Ullman (1993). A complete description of the seminaive
approach based on relational algebra is given in Bancilhon (1985). Other approaches to recursive query
processing include the recursive query/subquery strategy of Vieille (1986), which is a top-down
interpreted strategy, and the Henschen-Naqvi (1984) top-down compiled iterative strategy. Balbin and
Rao (1987) discuss an extension of the seminaive differential approach for multiple predicates.
1
Page 721 of 893