FUNDAMENTALS OF DATABASE SYSTEMS Fourth Edition, part 10

29.2 Multimedia Databases
• Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to using multimedia information in these applications, from effective sales presentations to virtual tours of cities and art galleries. The film industry has already shown the power of special effects in creating animations and synthetically designed animals, aliens, and special effects. The use of predesigned stored objects in multimedia databases will

data management, and therefore there are none that have the range of functionality required to fully support all of the multimedia information management applications that we discussed above. However, several DBMSs today support multimedia data types; these include Informix Dynamic Server, DB2 Universal Database (UDB) of IBM, Oracle 9 and 10, CA-JASMINE,

much apparent attention to scalability and performance. There are products available that operate either stand-alone or in conjunction with other vendors' systems to allow retrieval of image data by content. They include Virage, Excalibur, and IBM's QBIC. Operations on multimedia need to be standardized.

systems related to the requirements of multimedia databases. Grosky et al. (1997) contains contributed articles including a survey on content-based indexing and retrieval by Jagadish (1997). Faloutsos et al. (1994) also discuss a system for image querying by content. Li et al. (1998) introduce image modeling in which an image is viewed as a hierarchical

web; the semantic web effort is summarized in Fensel (2000). Khan (2000) did a dissertation on ontology-based information retrieval. Uschold and Gruninger (1996) is a good resource on ontologies. Corcho et al. (2003) compare ontology languages and discuss methodologies to build ontologies. Multimedia

Excalibur technologies: http://www.excalib.com
Virage, Inc. (Content based image retrieval): http://www.virage.com
IBM's QBIC (Query by Image Content) product:
29.3 GEOGRAPHIC INFORMATION SYSTEMS
Geographic information systems (GIS) are used to collect, model, store, and analyze information describing physical properties of the geographical world. The scope of GIS broadly encompasses two types of data: (1) spatial data, originating from maps, digital images, administrative and political boundaries, roads, transportation networks; physical data such as rivers, soil characteristics, climatic regions, land

and air quality. In geographic objects applications, objects of interest are identified from a physical domain, for example, power plants, electoral districts, property parcels, product distribution districts, and city landmarks. These objects are related with pertinent application data, which may be, for this specific example, power consumption, voting patterns, property sales volumes, product sales volume, and traffic density.
The first two categories of GIS applications require a field-based representation, whereas the third category requires an object-based one. The cartographic approach involves special functions that can

surface terrain. It requires functions of interpolation between observed points as well as visualization. In object-based geographic applications, additional spatial functions are needed to deal with data related to roads, physical pipelines, communication cables, power lines, and such. For example, for a given region, comparable maps can be used for comparison at various points of time to show changes in certain data such as locations of roads, cables, buildings, and streams.
[Figure: GIS Applications. Cartographic: irrigation, crop yield analysis, land evaluation, planning and facilities management, landscape studies. Caption ends: Gangopadhyay (1997)).]
29.3.2 Data Management Requirements of GIS
The functional requirements of the GIS applications above translate into the following database requirements.
Data Modeling and Representation. GIS data can be broadly represented in two formats: (1) vector and (2) raster. Vector data represents geometric objects such as points, lines, and polygons. Thus a lake may be represented as a polygon, a river by a series of line

Chapter 29 Emerging Database Technologies and Applications
density that may vary with the roughness of the terrain. Rectangular grids (or elevation matrices) are two-dimensional array structures. In digital terrain modeling (DTM), the model also may be used by substituting the elevation with some attribute of interest such as population density or air temperature. GIS data often includes a temporal structure in addition to a spatial structure. For example, traffic flow or average vehicular speeds in traffic may be measured every 60 seconds at a set of points in a roadway network.
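The rectangular-grid representation can be illustrated with a short Python sketch. It is only an illustration under assumptions: the sample values, the cell spacing, and the names ELEVATION, CELL_SIZE, and value_at are invented here, and bilinear interpolation is just one common way to estimate a value between sample points, not a method the text prescribes.

```python
# Hypothetical 3x3 elevation matrix (metres); rows run along y, columns along x.
ELEVATION = [
    [100.0, 110.0, 120.0],
    [105.0, 115.0, 125.0],
    [110.0, 120.0, 130.0],
]
CELL_SIZE = 10.0  # assumed spacing between grid samples, in metres


def value_at(grid, x, y, cell_size=CELL_SIZE):
    """Estimate the grid attribute at (x, y) by bilinear interpolation."""
    col, row = x / cell_size, y / cell_size
    # Clamp the lower-left sample index so edge points still have neighbours.
    c0 = min(int(col), len(grid[0]) - 2)
    r0 = min(int(row), len(grid) - 2)
    fx, fy = col - c0, row - r0
    top = grid[r0][c0] * (1 - fx) + grid[r0][c0 + 1] * fx
    bottom = grid[r0 + 1][c0] * (1 - fx) + grid[r0 + 1][c0 + 1] * fx
    return top * (1 - fy) + bottom * fy
```

The same function works unchanged if the matrix holds another attribute of interest, such as population density or air temperature, in place of elevation.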
Data Analysis. GIS data undergoes various types of analysis. For example, in applications such as soil erosion studies, environmental impact studies, or hydrological runoff simulations, DTM data may undergo various types of geomorphometric analysis: measurements

Chapter 24.
Data Integration. GISs must integrate both vector and raster data from a variety of sources. Sometimes edges and regions are inferred from a raster image to form a vector model, or conversely, raster images such as aerial photographs are used to update vector models. Several coordinate systems such as Universal Transverse Mercator (UTM), latitude/longitude, and local cadastral systems are used to identify locations. Data originating from different coordinate systems requires appropriate transformations. Major public sources of geographic data, including the TIGER files maintained by the U.S. Department of Commerce, are used for road maps by many Web-based map drawing tools (e.g., http://maps.yahoo.com). Often there are high-accuracy, attribute-poor maps that have to be merged with low-accuracy, attribute-rich maps. This is done with a process called "rubber-banding" where the user defines a set of control points in both maps and the transformation of the low accuracy map is accomplished by lining up the control points. A major integration issue is to create and maintain attribute information (such as air quality or traffic flow), which can be related to and integrated with

surveys are the traditional approach and the most accurate, but they are very time consuming. Other techniques include photogrammetric sampling and digitizing cartographic documents.
29.3.3 Specific GIS Data Operations
GIS applications are conducted through the use of special operators such as the following:
1. Interpolation: This process derives elevation data for points at which no samples have been taken.

terrain data such as editing, smoothing, reducing details, and enhancing. Additional operations involve patching or zipping the borders of triangles (in TIN data), and merging, which implies combining overlapping models and resolving conflicts among attribute data. Conversions among grid models, contour models, and TIN data are involved in the interpretation of

(2) digital image analysis, which deals with analysis of a digital image for features such as edge detection and object detection. Detecting roads in a satellite image of a city is an example of the latter.
5. Analysis of networks: Networks occur in GIS in many contexts that must be analyzed and may be subjected to segmentations, overlays, and so on. Network overlay refers to a type of spatial join where a given network (for example, a highway network) is joined with a point database (for example, incident locations) to yield, in

is of paramount importance for providing accurate results to queries. This problem is particularly significant in the GIS context because of the variety of data, sources, and measurement techniques involved and the absolute accuracy expected by applications users.
6. Visualization: A crucial function in GIS is related to visualization: the graphical display of terrain information and the appropriate representation of application attributes to go with

that standard RDBMSs or ODBMSs do not meet the special needs of GIS. It is therefore necessary to design systems that support the vector and raster representations and the spatial functionality as well as the required DBMS features. A popular GIS software called ARC-INFO, which is not a DBMS but integrates RDBMS

coverage in ARC/INFO, consists of three primitives: (1) nodes (points), (2) arcs (similar to lines), and (3) polygons. The arc is the most important of the three and stores a large amount of topological information. An arc has a start node and an end node (and it therefore has direction too). In addition, the polygons to the left and the right of the arc are also stored along with each arc. As there is no restriction on the shape of

node (e.g., names of the intersecting roads at the node). The AAT contains an internal ID for the arc, a user-specified ID, the internal ID of the start and end nodes, the internal ID of the polygons to the left and the right, a series of coordinates of shape points (if any), the length of the arc,

county the polygon represents).
Typical spatial queries are related to adjacency, containment, and connectivity. The arc node model has enough information to satisfy all three types of queries, but the RDBMS is not ideally suited for this type of querying. A simple example will highlight the number of times a relational database has to be queried to extract adjacency information. Assume that we are trying to determine whether two polygons, A and B, are adjacent to each other. We would have to exhaustively look at the entire AAT to determine whether there is an edge that has A on one side and B on the other. The search cannot be limited to the edges of either polygon as we do not explicitly store all the arcs that make a polygon in the PAT.
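The exhaustive scan described here can be sketched in a few lines of Python. This is a minimal illustration, not ARC/INFO's implementation: the Arc record keeps only the AAT fields that matter for adjacency, and the sample rows, the "outside" label, and the function name are_adjacent are hypothetical.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One AAT row, reduced to the fields needed for an adjacency test."""
    arc_id: int
    start_node: int
    end_node: int
    left_poly: str
    right_poly: str


# Hypothetical AAT contents; "outside" stands for the unbounded region.
AAT = [
    Arc(1, 1, 2, "A", "outside"),
    Arc(2, 2, 3, "A", "B"),
    Arc(3, 3, 1, "B", "outside"),
    Arc(4, 3, 4, "C", "outside"),
]


def are_adjacent(aat, p, q):
    """Scan every arc and report whether one has p and q on opposite sides."""
    for arc in aat:
        if {arc.left_poly, arc.right_poly} == {p, q}:
            return True
    return False
```

In the worst case the loop visits every row of the table, which is the cost the paragraph points out: nothing in the arc table narrows the search to the arcs of polygon A or polygon B.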
Storing all the arcs in the PAT

embedded within a GIS.
29.3.5 Problems and Future Issues in GIS
GIS is an expanding application area of databases, reflecting an explosion in the number of end users using digitized maps, terrain data, space images, weather data, and traffic information support data. As a consequence, an increasing number of problems related to GIS applications has been generated and will need to be solved:
1. New architectures: GIS applications will need a new client-server architecture that will benefit from existing advances in RDBMS

independent databases with an automatic posting of updates across them. Appropriate tools for data transfer, change management, and workflow management will be required.
2. Versioning and object life-cycle approach: Because of constantly evolving geographical features, GISs must maintain elaborate cartographic and terrain data, a management problem that

models, formalization of data transfer standards is crucial for the success of GIS. The international standardization body (ISO TC211) and the European standards body (CEN TC278) are now in the process of debating relevant issues, among them conversion between vector and raster data for fast query performance.
4. Matching

science, hydrology, and agriculture will require more area-oriented and terrain model data. It is not clear that all this functionality can be supported by a single general-purpose GIS. The specialized needs of GISs will require that general purpose DBMSs must be enhanced with additional data types and functionality before full-fledged GIS

GIS.
29.3.6 Selected Bibliography for GIS
There are a number of books written on GIS. Adam and Gangopadhyay (1997) and Laurini and Thompson (1992) focus on GIS database and information management problems. Kemp (1993) gives an overview of GIS issues and data sources. Huxhold (1991) gives an introduction to Urban GIS. Maguire et al. (1991) have a very good collection of GIS-related papers. Antenucci (1998) presents a discussion of the GIS technologies. Shekhar and Chawla (2002) discuss issues and approaches to spatial data

the U.S. Department of Commerce (1993). Laser-Scan's Web site (http://www.lsl.co.uk/papers) is a good source of information.
Environmental System Research Institute (ESRI) has an excellent library of GIS books for all levels at http://www.esri.com. The GIS terminology is defined at http://www.esri.com/library/glossary/glossary.html. The University of Edinburgh maintains a GIS WWW resource list at http://www.geo.ed.ac.uk/home/giswww.html
29.4 GENOME DATA MANAGEMENT
29.4.1 Biological Sciences and Genetics
The biological sciences encompass an enormous variety of information. Environmental science gives us a view of how species live

Genetics has emerged as an ideal field for the application of information technology. In a broad sense, it can be thought of as the construction of models based on information about genes (which can be defined as basic units of heredity) and populations and the seeking out of relationships in that information. The

genetic information varies across populations of organisms.
Molecular genetics provides a more detailed look at genetic information by allowing researchers to examine the composition, structure, and function of genes. The origins of molecular genetics can be traced to two important discoveries. The first occurred in 1869 when Friedrich Miescher discovered nuclein and its primary component, deoxyribonucleic acid (DNA). In subsequent research DNA and a related compound, ribonucleic acid (RNA), were found to be composed of nucleotides (a sugar, a phosphate, and a base, which combined to form nucleic acid) linked into long polymers via the sugar and phosphate. The

and its structure is hailed as probably the most important biological work of the last 100 years, and the field it opened may be the scientific frontier for the next 100. In 1962, Watson, Crick, and Wilkins won the Nobel Prize for physiology/medicine for this breakthrough.[7]
29.4.2 Characteristics of Biological Data
Biological data exhibits many special characteristics that make management of biological information a particularly challenging problem. We will thus begin by summarizing the characteristics related to biological information, and focusing on a multidisciplinary field called bioinformatics that has emerged, with graduate degree programs now in place in several universities. Bioinformatics addresses information management of genetic information with special emphasis on

and to ensure that no information is lost during biological data modeling. The structure of biological data often provides an additional context for interpretation of the information. Biological information systems must be able to represent any level of complexity in any data schema, relationship, or schema substructure, not just hierarchical, binary, or table data. As an example,
6. See Nature, 171:737 1953.
7. http://www.pbs.org/wgbh/aso/databank/entries/do53dn.html

might be expected, its management has encountered a large number of problems; we have been unable to use the traditional RDBMS or ODBMS approaches to capture all aspects of the data.
Characteristic 2: The amount and range of variability in data is high. Hence, biological systems must be flexible in handling data types and values.

at a rapid pace. Hence, for improved information flow between generations or releases of databases, schema evolution and data object migration must be supported. The ability to extend the schema, a frequent occurrence in the biological setting, is unsupported in most relational and object database systems. Presently systems such as GenBank rerelease the entire database with new schemas once or twice a year rather than incrementally changing the system as

the results often reflecting the particular focus of the scientist. While two individuals may produce different data models if asked to interpret the same entity, these models will likely have numerous points in common. In such situations, it would be useful to biological investigators to be able to run queries across these common points. By linking data elements in a network of schemas, this could be accomplished.
Characteristic

15,000 users per month on the Internet. There are fewer than twenty noncurator generated submissions to MITOMAP every month. In other words, the number of users requiring write access is small. Users generate a wide variety of read-access patterns into the database, but these

is applicable to the problem they are trying to address and that reflects the underlying data structure. Biological users usually know which data they require, but they have no technical knowledge of the data structure or how a DBMS represents the

they may not guarantee usability.
Characteristic 7: The context of data gives added meaning for its use in biological applications. Hence, context must be maintained and conveyed to the user when appropriate. In addition, it should be possible to integrate as many contexts as possible to maximize the interpretation of a biological data value. Isolated values are of less use in biological systems. For example, the

Without any knowledge of the data structure (see Characteristic 6), average users cannot construct a complex query across data sets on their own. Thus, in order to be truly useful, systems must provide some tools for building these queries. As mentioned previously, many systems provide predefined query templates.
Characteristic 9: Users of biological information often require access to "old" values of the

most up-to-date data, but they must also be able to reconstruct previous work and reevaluate prior and current information. Consequently, values that are about to be updated in a biological database cannot simply be thrown away.
All of these characteristics clearly point to the fact that today's DBMSs do not fully cater to the

with an estimated 3 to 4 billion nucleotides. The goal of the Human Genome Project (HGP) has been to obtain the complete sequence (the ordering of the bases) of those nucleotides. A rough draft of the entire human genome sequence was announced in June 2000 and the 13-year effort will end in year 2003 with the completion of the human genetic sequence. In isolation,

Drosophila, and C. elegans have been investigated. We will briefly discuss some of the existing database systems that are supporting or have grown out of the Human Genome Project.
GenBank. The preeminent DNA sequence database in the world today is GenBank, maintained by the

each month. The database size in flat file format is over 100 GB uncompressed and has been doubling every 15 months. Through international collaboration with the European Molecular Biology Laboratory (EMBL) in the U.K. and the DNA Data Bank of Japan (DDBJ), data are exchanged among the three sites on a daily basis. The mirroring of sequence data at the

existing OMIM and PDB databases and redesigning the structure of the GenBank system to accommodate these new data sets. The system is maintained as a combination of flat files, relational databases, and files containing Abstract Syntax Notation One (ASN.1), a syntax for defining data structures developed for the telecommunications industry. Each GenBank entry is assigned a unique

structure of the data directly for querying or other functions, although complete snapshots of the database are available for export in a number of formats, including ASN.1. The query mechanism provided is via the Entrez application (or its World Wide Web version), which allows keyword, sequence, and GenBank UID searching

map depends upon the source of the data, but it is usually not at the level of individual nucleotide bases. GDB data includes data describing primarily map information (distance and confidence limits), and Polymerase Chain Reaction (PCR) probe data (experimental conditions, PCR primers,

4). The implementors of GDB have noted difficulties in using this model to capture more than simple map and probe data. In order to improve data integrity and to simplify the programming for application writers, GDB distributes a Database Access Toolkit. However, most users use a Web interface to search the ten interlinked data managers. Each

are most useful when users are simply looking for an index into map or probe data. Exploratory ad hoc searching of the database is not encouraged by present interfaces. Integration of the database structures of GDB and OMIM (see below) was never fully established.
Online Mendelian Inheritance in Man. Online Mendelian Inheritance

the entire database was converted to NCBI's GenBank format. Today it contains more than 14,000 entries.
OMIM covers material on five disease areas based loosely on organs and systems. Any morphological, biochemical, behavioral, or other properties under study are referred to as the phenotype of an individual (or a cell). Mendel realized that

OMIM was transferred to the NCBI. This greatly improved the ability to link OMIM data to other databases and it also provided a rigorous structure for the data. However, the basic form of the database remained difficult to modify.
EcoCyc. The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) is a recent experiment in combining information about the genome and the metabolism of E. coli K-12. The database was created in 1996 as a collaboration between Stanford Research Institute

model was first used to implement the system, with data stored on Ocelot, a frame knowledge representation system. EcoCyc data was arranged in a hierarchy of object classes based on the observations that (1) the properties of a reaction are independent of an enzyme that catalyzes it, and (2) an enzyme has a number of properties that are "logically distinct" from its reactions.
EcoCyc provides two methods of querying: (1) direct (via predefined queries) and (2) indirect (via hypertext navigation). Direct queries are performed using menus and dialogs that can initiate a large but

past ten years, there has been an increasing interest in the applications of databases in biology and medicine. GenBank, GDB, and OMIM have been created as central repositories of certain types of biological data but, while extremely useful, they do not yet cover the complete spectrum of the Human Genome Project data. However, efforts are under way around the world to design new tools and techniques that will alleviate the data management problem for the biological scientists and medical researchers.
Gene Ontology. We already explained the concept of ontologies in Section 29.2.3 in the context

is likely to be a single limited universe of genes and proteins that are conserved in most or all living cells. On the other hand, genome data is increasing exponentially and there is no uniform way to interpret and conceptualize the shared biological elements. Gene Ontology makes possible the annotation of gene products using a common vocabulary based on their shared biological attributes and

DB NAME | MAJOR CONTENT | INITIAL TECHNOLOGY | CURRENT TECHNOLOGY | DB PROBLEM AREAS | PRIMARY DATA TYPES
GenBank | DNA/RNA sequence, protein | Text files | Flat-file/ASN.1 | Schema browsing, schema evolution, linking to other dbs | Text, numeric, some complex types
OMIM | Disease | Index cards/text | Flat-file/

linkage data, sequence data (non-human) | Schema expansion/evolution, linking to other dbs | Text, numeric
HGMDB | Sequence and sequence variants | Flat file, application specific | Flat-file, application specific | Schema expansion/evolution, linking to other dbs | Text
EcoCyc | Biochemical | OO | OO | Locked into | Complex types,

and 5,244,000 associations between gene products and GO terms. The Gene Ontology was implemented using MySQL, an open source relational DBMS, and a monthly database release is available in SQL and XML formats. A set of tools and libraries, written in C, Java, Perl, XML, etc., is available for database access and development of applications. Web-based and stand-alone GO browsers are available from the GO consortium.
29.4.4 Selected Bibliography for Genome Databases
Bioinformatics has become a popular area of research in recent years and many workshops

users access and analysis of the data available in the databases.
Wallace (1995) has been a pioneer in the mitochondrial genome research, which deals with a specific part of the human genome; the sequence and organizational details of this area appear in Anderson et al. (1981). Recent work in Kogelnik et al. (1997, 1998) and Kogelnik (1998) addresses the development of a generic solution to the data management problem in biological sciences by developing a prototype solution. Apweiler et al. (2003) review the

EBI databases including InterPro, GO, and SWISS-PROT, together with links to SCOP, CATH, PFAM and PROSITE. Karp (1996) discusses the problems of interlinking the variety of databases mentioned in this section. He defines two types of links: those that integrate the data and those that relate the data between databases. These were used to design the EcoCyc database.
Some of

accessed from http://expasy.hcuge.ch/sprot/. The ACEDB database information is available at http://probe.nalusda.gov:8080/acedocs/.
Alternative Diagrammatic Notations for ER Models
Figure A.1 shows a number of different diagrammatic notations for representing ER and EER model concepts. Unfortunately, there is no standard notation: different database design practitioners prefer different

and constraints. The notation we used in Chapter 3 is quite close to the original notation for ER diagrams, which is still widely used. We discuss some alternate notations here.
Figure A.1(a) shows different notations for displaying entity types/classes, attributes, and relationships. In Chapters 3 and

for attaching attributes to entity types. We used notation (i). Notation (ii) uses the third notation (iii) for attributes from Figure A.1(a). The last two notations in Figure A.1(b), (iii) and (iv), are popular in OOA methodologies and in some CASE tools. In particular, the

[Figure A.1: panels (a) through (d) with notation variants (i) through (v). Labels visible in the panels: EMPLOYEE with attributes Ssn, Name, Address; operations Hire_emp and Fire_emp.]

the cardinality ratio of binary relationships. We used notation (i) in Chapters 3 and 24. Notation (ii), known as the chicken feet notation, is quite popular. Notation (iv) uses the arrow as a functional reference (from the N to the 1 side) and resembles our notation for foreign keys in the

and (iv) places arrowheads on both sides. For an M:N relationship, (ii) uses chicken feet at both ends of the line; (iii) makes both halves of the diamond black; and (iv) does not display any arrowheads.
Figure A.1(d) shows several variations for displaying (min, max) constraints, which are used to display both cardinality ratio

max values are 1; and for M:N, both max values are n. A min value greater than 0 (zero) specifies total participation (existence dependency). In methodologies that use the straight line for displaying relationships, it is common to reverse the positioning of the (min, max) constraints, as shown in (iii). Another popular technique, which follows the

for displaying specialization/generalization. We used notation (i) in Chapter 14, where a d in the circle specifies that the subclasses (S1, S2, and S3) are disjoint and an o specifies overlapping subclasses. Notation (ii) uses G (for generalization) to specify disjoint, and Gs to specify overlapping; some notations use the solid arrow, while others use the empty arrow (shown at the side).

representing subclasses within the box representing the superclass. Of the notations based on (vi), some use a single-lined arrow, and others use a double-lined arrow (shown at the side).
The notations shown in Figure A.1 show only some of the diagrammatic symbols that have been used or suggested for displaying database conceptual schemes. Other notations, as well as various combinations

block between the disk and a main memory buffer. This is the random access time for accessing a disk block. There are three time components to consider:
1. Seek time (s): This is the time needed to mechanically position the read/write

the disk manufacturer provides an average seek time in milliseconds. The typical range of average seek time is 10 to 60 msec. This is the main "culprit" for the delay involved in transferring blocks between disk and memory.
2. Rotational delay (rd): Once the read/write head is at the correct track, the user must wait for the beginning of the

Appendix C Parameters of Disks
revolution (if the start of the required block just passed the read/write head after the seek). If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by
rd = (1/2)*(1/p) min = (60*1000)/(2*p) msec
A typical value for p is 10,000 rpm, which gives a rotational delay of rd = 3 msec.
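As a quick check of the arithmetic, the rotational-delay formula can be evaluated directly. The function name rotational_delay_msec is introduced here only for illustration; the formula and the 10,000 rpm figure come from the text.

```python
def rotational_delay_msec(p_rpm):
    """Average rotational delay rd = (60 * 1000) / (2 * p), in msec."""
    return (60 * 1000) / (2 * p_rpm)


# The typical value quoted in the text: p = 10,000 rpm.
print(rotational_delay_msec(10_000))  # 3.0
```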
For fixed-head disks, where

track size, and the rotational speed. If the transfer rate for the disk is tr bytes/msec and the block size is B bytes, then
btt = B/tr msec
If we have a track size of 50 Kbytes and p is 3600 rpm, the transfer rate in bytes/msec is
tr = (50*1000)/(60*1000/3600) = 3000 bytes/msec
In this case, btt = B/3000 msec, where B is the block size in bytes.
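The two formulas can be combined in a small sketch. The function names and the 512-byte block size are assumptions made for this illustration; the track size and rotation speed are the ones used in the text. The transfer rate is written as track_bytes * p / (60 * 1000), which is algebraically the same as dividing the track size by the time for one revolution.

```python
def transfer_rate(track_bytes, p_rpm):
    """tr in bytes/msec: one full track passes the head per revolution."""
    return track_bytes * p_rpm / (60 * 1000)


def block_transfer_time(block_bytes, tr):
    """btt = B / tr, in msec."""
    return block_bytes / tr


tr = transfer_rate(50 * 1000, 3600)   # track of 50 Kbytes at 3600 rpm
print(tr)                             # 3000.0
btt = block_transfer_time(512, tr)    # 512-byte block is an assumed size
print(round(btt, 3))                  # 0.171
```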
The

on the same cylinder, we need approximately
s + (k * (rd + btt)) msec
In this case, we need two or more buffers in main storage, because we are continuously reading or writing the k blocks, as we discussed in Section 4.3.
The transfer time per block is reduced even further when consecutive blocks on the

rate (btr) that takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then
btr = (B/(B + G)) * tr bytes/msec
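A short sketch shows how the gap size lowers the useful transfer rate. The function name and the values B = 512, G = 128, and tr = 3000 are assumptions chosen for illustration; only the formula is taken from the text.

```python
def bulk_transfer_rate(block_bytes, gap_bytes, tr):
    """btr = (B / (B + G)) * tr, in bytes/msec."""
    return (block_bytes / (block_bytes + gap_bytes)) * tr


# With these assumed sizes, 512 of every 640 bytes passing under the head
# are useful data, so btr is 80 percent of tr (about 2400 bytes/msec).
btr = bulk_transfer_rate(512, 128, 3000)
print(round(btr))
```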
The bulk transfer rate is the rate of transferring useful bytes in the data blocks. The disk read/write head must go over all bytes on a track as the disk rotates, including the

(B/btr)) msec
Another parameter of disks is the rewrite time. This is useful in cases when we read a block from the disk into a main memory buffer, update the buffer, and then write the buffer back to the same disk block on which it was stored. In many cases,

disk revolution the updated buffer is rewritten back to the disk block. Hence, the rewrite time Trw is usually estimated to be the time needed for one disk revolution:
Trw = 2 * rd msec
To summarize, here is a list of the parameters we have discussed and the symbols we use for them:

syntax developed for database systems. It was developed at IBM Research and is available as an IBM commercial product as part of the QMF (Query Management Facility) interface option to DB2. The language was also implemented in the PARADOX DBMS, and is related to a point-and-click type interface in the ACCESS DBMS (see Chapter 10). It differs from SQL in that the user does

to follow any rigid syntax rules for query specification; rather, constants and variables are entered in the columns of the templates to construct an example related to the retrieval or update request. QBE is related to the domain relational calculus, as we shall see, and its original specification has been shown to be relationally complete.
D.1 BASIC RETRIEVALS IN QBE
In QBE, retrieval queries are specified by filling in one or more rows in the

[Figure residue: template columns DNUMBER, DLOCATION, ESSN, HOURS, PNAME, RELATIONSHIP; relation WORKS_ON]
FIGURE D.1 The relational schema of Figure 7.6 as it may be displayed by QBE.
domain variable and is specified as an example value preceded by the underscore character (_). Additionally, a P. prefix (called the P dot operator) is entered in certain columns to indicate

QBE. In Figure 9.6(a) an example of an employee is presented as the type of row that we are interested in. By leaving John B. Smith as constants in the FNAME, MINIT, and LNAME columns, we are specifying an exact match in those columns.
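The matching rule for a template row can be sketched in Python. This is a loose illustration of the idea, not how QBE is implemented: the sample EMPLOYEE rows, the ADDRESS values, and the function names matches and retrieve are hypothetical. Plain entries act as constants that must match exactly, while entries that begin with an underscore act as example variables and place no restriction on the column.

```python
# Hypothetical EMPLOYEE rows; only four columns are shown for brevity.
EMPLOYEE = [
    {"FNAME": "John", "MINIT": "B", "LNAME": "Smith", "ADDRESS": "1 Example St"},
    {"FNAME": "Alicia", "MINIT": "J", "LNAME": "Zelaya", "ADDRESS": "2 Example Ave"},
]

# Template row: constants in the name columns, an example variable in ADDRESS.
TEMPLATE = {"FNAME": "John", "MINIT": "B", "LNAME": "Smith", "ADDRESS": "_addr"}


def matches(row, template):
    """Return True if every constant in the template equals the row value."""
    for column, entry in template.items():
        if entry.startswith("_"):
            continue  # example variable: any value is acceptable here
        if row.get(column) != entry:
            return False
    return True


def retrieve(rows, template):
    """Return the rows that satisfy the template."""
    return [row for row in rows if matches(row, template)]
```

Calling retrieve(EMPLOYEE, TEMPLATE) returns only the John B. Smith row, because the three name constants must match while the ADDRESS column is unconstrained.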
All the rest of the columns are preceded by an underscore indicating that they are domain
