Document: Managing time in relational databases - P2 - Pdf 92

bi-temporal constructs and transformations. If IT and business
management in your own organizations are wise, and if your
initial implementations are successful, then your organizations
will be positioned on the leading edge of a revolution in the
management of data, a position from which business advantage over
trailing edge adopters will continue to be enjoyed for many years.
Theory is practical, as we hope this book will demonstrate. But
the relationship of theory and practice is a two-way street.
Computer scientists are theoreticians, working from theory down to
practice, from mathematical abstractions to their best
understandings of how those abstractions might be put to work. IT
professionals are practitioners, working from specific problems up to
best practice approaches to classes of similar problems.
Common ground can sometimes be reached, ground where
the “best understandings” of computer scientists meet the “best
practices” of IT professionals. Here, theoreticians may glimpse
the true complexities of the problems to which their theories
are intended to be relevant. Here, practitioners may glimpse
the potential of powerful abstractions to make their best
practices even better.
We conclude with an example and a maxim about the
interplay of theory and practice.
The example: Leonhard Euler, one of history’s greatest
mathematicians, created the field of mathematical graph theory
while thinking about various paths he had taken crossing the
bridges of Königsberg, Prussia during Sunday afternoon walks.
The maxim: to paraphrase Immanuel Kant, one of history’s
greatest philosophers: “theory without practice is empty;
practice without theory is blind”.
Glossary References
Glossary entries whose definitions form strong inter-

Part 1 AN INTRODUCTION TO TEMPORAL DATA MANAGEMENT
Chapter Contents
1. A Brief History of Temporal Data Management
2. A Taxonomy of Bi-Temporal Data Management Methods
Historical data first manifested itself as the backups and
logfiles we kept and hoped to ignore. We hoped to ignore those
datasets because if we had to use them, it meant that
something had gone wrong, and we had to recover a state of the
database prior to when that happened. Later, as data storage
and access technology made it possible to manage massively
larger volumes of data than ever before, we brought much of
that historical data on-line and organized it in two different
ways. On the one hand, backups were stacked on top of one
another and turned into data warehouses. On the other hand,
logfiles were supplemented with foreign keys and turned into
data marts.
Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00023-6
Copyright © 2010 Elsevier Inc. All rights of reproduction in any form reserved.
We don’t mean to say that this is how the IT or computer
science communities thought of the development and evolution
of warehouses and marts, as it was happening over the last
two decades. Nor is it how they think of warehouses and marts
today. Rather, this is more like what philosophers call a rational
reconstruction of what happened. It seems to us that, in fact,

whelmed were those technologies not rapidly advancing
themselves.
We have already mentioned, in the Preface, the differences
between non-temporal and temporal data and, in the latter
category, the two ways that time and data are interwoven.
However, it is not until Part 2 that we will begin to discuss the
complexities of bi-temporal data, and how Asserted Versioning
renders that complexity manageable. But since there are any
number of things we could be talking about under the joint
heading of time and data, and since it would be helpful to
narrow our focus a little before we get to those chapters, we
would like to introduce a simple mental model of this key
set of distinctions.
Non-Temporal, Uni-Temporal and Bi-Temporal Data
Figure Part 1.1 is an illustration of a row of data in three different
kinds of relational table. id is our abbreviation for
“unique identifier”, PK for “primary key”, bd₁ and ed₁ for one pair
of columns, one containing the begin date of a time period and
the other containing the end date of that time period, and bd₂
and ed₂

2. This book illustrates the management of temporal data with time periods delimited
by dates, although we believe it will be far more common for developers to use
timestamps instead. Our use of dates is motivated primarily by the need to display
rows of temporal data on a single printed line.
Figure Part 1.1 (primary key columns in parentheses):
non-temporal: PK (id), data
uni-temporal: PK (id, bd₁, ed₁), data
bi-temporal: PK (id, bd₁, ed₁, bd₂, ed₂), data

For convenience, dates are represented as a month and a year.
The two rows for customer id-1 show a history of that customer
over the period May 2012 to January 2013. From May to August,
the customer’s data was 123; from August to January, it was 456.
Now we can have multiple rows for the same customer in our
Customer table, and we (and the DBMS) can keep them distinct.
Each of these rows is a version of the customer, and the table is
now a versioned Customer table. We use this terminology in this
book, but generally prefer to add the term “uni-temporal”
because the term “uni-temporal” suggests the idea of a single
temporal dimension to the data, a single kind of time associated
with the data, and this notion of one (or two) temporal
dimensions is a useful one to keep in mind. In fact, it may be
useful to think of these two temporal dimensions as the X and
Y axes of a Cartesian graph, and of each row in a bi-temporal
table as represented by a rectangle on that graph.
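The versioned table just described can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the column names bd1 and ed1 stand in for bd₁ and ed₁, months are written as first-of-month ISO dates, each time period is assumed to include its begin date and exclude its end date, and the helper version_on is hypothetical.

```python
import sqlite3

# Assumed layout: a uni-temporal Customer table whose primary key is the
# identifier plus one pair of dates (bd1, ed1 stand in for bd₁, ed₁).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        id   TEXT NOT NULL,
        bd1  TEXT NOT NULL,   -- begin date of the version's time period
        ed1  TEXT NOT NULL,   -- end date (assumed to be exclusive)
        data TEXT,
        PRIMARY KEY (id, bd1, ed1)
    )
""")

# The two versions of customer id-1 described in the text.
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    ("id-1", "2012-05-01", "2012-08-01", "123"),
    ("id-1", "2012-08-01", "2013-01-01", "456"),
])

def version_on(conn, cust_id, day):
    """Return the data of the version in effect on `day`, or None (hypothetical helper)."""
    row = conn.execute(
        "SELECT data FROM Customer WHERE id = ? AND bd1 <= ? AND ? < ed1",
        (cust_id, day, day),
    ).fetchone()
    return row[0] if row else None
```

Note that the composite primary key only keeps the rows distinct; on its own it does not stop two versions of the same customer from having overlapping time periods.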
Now we come to the last of the three illustrations in
Figure Part 1.1. Pretty clearly, we can transform the second
table into this third table exactly the same way we transformed
the first into the second: we can add another pair of dates to
the primary key. And just as clearly, we achieve the same effect.
Just as the first two date columns allow us to keep multiple
rows all having the same identifier, bd₂ and ed₂ allow us to keep

important things. However, it also has the potential to mislead
us if we are not careful. So let’s try to draw the valid conclusions
we can from it, and remind ourselves of what conclusions we
should not draw.
First of all, the third illustration in Figure Part 1.1 does show us
a valid bi-temporal schema. It is a table whose primary key
contains three logical components. The first is a unique identifier
of the object which the row represents. In this case, it is a specific
customer. The second is a unique identifier of a period of time.
That is the period of time during which the object existed with
the characteristics which the row ascribes to it, e.g. the period of
time during which that particular customer had that specific
name and address, that specific customer status, and so on.
The third logical component of the primary key is the pair of
dates which define a second time period. This is the period of
time during which we believe that the row is correct, that what
it says its object is like during that first time period is indeed
true. The main reason for introducing this second time period,
then, is to handle the occasions on which the data is in fact
wrong. For if it is wrong, we now have a way to both retain the
error (for auditing or other regulatory purposes, for example)
and also replace it with its correction.
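As a rough sketch of how an error can be retained alongside its correction, the Python below uses sqlite3 and a primary key with the three logical components just listed. Several things here are assumptions rather than anything the text prescribes: the column names bd1, ed1, bd2, ed2, the first-of-month ISO dates, the OPEN_END marker for a period with no known end, the sample values, and the helper correct, which ends the second time period of the old row on the day the correction is made.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed marker for "no end date known yet"

def make_table(conn):
    # Primary key: identifier + first time period + second time period.
    conn.execute("""
        CREATE TABLE Customer (
            id   TEXT NOT NULL,
            bd1  TEXT NOT NULL, ed1 TEXT NOT NULL,  -- period the data describes
            bd2  TEXT NOT NULL, ed2 TEXT NOT NULL,  -- period we believed the row
            data TEXT,
            PRIMARY KEY (id, bd1, ed1, bd2, ed2)
        )
    """)

def correct(conn, cust_id, bd1, ed1, new_data, today):
    """Hypothetical helper: keep the erroneous row and add its correction."""
    # End the period during which the old row was believed to be correct.
    conn.execute(
        "UPDATE Customer SET ed2 = ? "
        "WHERE id = ? AND bd1 = ? AND ed1 = ? AND ed2 = ?",
        (today, cust_id, bd1, ed1, OPEN_END),
    )
    # Insert the correction for the same identifier and first time period.
    conn.execute(
        "INSERT INTO Customer VALUES (?, ?, ?, ?, ?, ?)",
        (cust_id, bd1, ed1, today, OPEN_END, new_data),
    )

conn = sqlite3.connect(":memory:")
make_table(conn)
conn.execute(
    "INSERT INTO Customer VALUES (?, ?, ?, ?, ?, ?)",
    ("id-1", "2012-08-01", "2013-01-01", "2012-08-01", OPEN_END, "456"),
)
correct(conn, "id-1", "2012-08-01", "2013-01-01", "457", "2013-03-01")
rows = conn.execute(
    "SELECT bd2, ed2, data FROM Customer ORDER BY bd2"
).fetchall()
```

After the call, both rows share the same identifier and the same first time period; only the second time period tells the original from its correction.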
Now we can have two rows that have exactly the same identifier,
and exactly the same first time period. And our convention
will be that, of those two rows, the one whose second time period
begins later will be the row providing the correction, and the one

bd₂ and ed₂. If the specified date is any date from
August 2012 to March 2013, it will produce an as-was report. It
will show only the first three rows because the specified date
does not fall within the second time period for the fourth row
in the table. But if the specified date is any date from March
2013 onwards, it will produce an as-is report. That report will
show all rows but the second row, because the specified date falls
within the second time period for those rows but does not fall within the
second time period for the second row.
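The selection rule behind the two reports can be sketched in plain Python. The four rows below are an assumed reading of the table, chosen only to agree with the prose: the second time period of the first row, and all values for customer id-2, are guesses, months are written as year-month strings, 9999-12 is an assumed open-ended marker, and periods are assumed to include their begin month and exclude their end month. The function report is hypothetical.

```python
from typing import NamedTuple

class Row(NamedTuple):
    id: str
    bd1: str   # begin of the first time period
    ed1: str   # end of the first time period
    bd2: str   # begin of the second time period
    ed2: str   # end of the second time period
    data: str

# Assumed contents; only the id-1 data values and the Aug12/Mar13
# boundaries of the second time period are taken from the text.
ROWS = [
    Row("id-1", "2012-05", "2012-08", "2012-05", "9999-12", "123"),
    Row("id-1", "2012-08", "2013-01", "2012-08", "2013-03", "456"),
    Row("id-2", "2012-07", "2012-11", "2012-07", "9999-12", "345"),
    Row("id-1", "2012-08", "2013-01", "2013-03", "9999-12", "457"),
]

def report(rows, as_of):
    """Keep the rows whose second time period contains the month `as_of`."""
    return [r for r in rows if r.bd2 <= as_of < r.ed2]

as_was = report(ROWS, "2012-12")  # a date between August 2012 and March 2013
as_is = report(ROWS, "2013-04")   # a date from March 2013 onwards
```

Under these assumptions, as_was holds the first three rows and as_is holds every row except the second, which is the behavior described for the two reports.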
Both reports will show the continuous history of customer
id-1 from May 2012 to January 2013. The first will report that
customer id-1 had data 123 and 456 during that period of time.
The second will report that customer id-1 had data 123 and
457 during that same period of time. So bd₁ and ed₁ delimit
the time period out in the world during which things were as
the data describes them, whereas bd₂ and ed₂ delimit a time

[Figure Part 1.3 A Bi-Temporal Table.]

