Tài liệu High-Performance Parallel Database Processing and Grid Databases- P2 - Pdf 98

30 Chapter 1 Introduction
1.8 SUMMARY
This chapter focuses on three fundamental questions in parallel query processing,
namely, why, what,andhow, plus one additional question based on the technolog-
ical support. The more complete questions and their answers are summarized as
follows.
ž
Why is parallelism necessary in database processing?
Because there is a large volume of data to be processed and reasonable
(improved) elapsed time for processing this data is required.
ž
What can be achieved by parallelism in database processing?
The objectives of parallel database processing are (i ) linear speed up and
(ii) linear scale up. Superlinear speed up and superlinear scale up may happen
occasionally, but they are more of a side effect, rather than the main target.
ž
How is parallelism performed in database processing?
There are four different forms of parallelism available for database process-
ing: (i) interquery parallelism, (ii) intraquery parallelism, (iii) intraoperation
parallelism, and (iv) interoperation parallelism. These may be combined in
parallel processing of a database job in order to achieve a better performance
result.
ž
What facilities of parallel computing can be used?
There are four different parallel database architectures: (i) shared-memory,
(ii) shared-disk, (iii) shared-nothing, and (iv) shared-something architectures.
Distributed computing infrastructure is fast evolving. The architecture was
monolithic in 1970s, and since then, during the last three decades, developments
have been exponential. The architecture has evolved from monolithic, to open,
to distributed, and lately virtualization techniques are being investigated in the
form of Grid computing. The idea of Grid computing is to make computing a

1995).
Ongoing work on parallel databases is supported by the availability of parallel
machines and architectures. An excellent overview on parallel database architec-
ture was given by Bergsten, Couprie, and Valduriez (The Computer Journal 1993).
A thorough discussion on the shared-everything and shared-something architec-
tures was presented by Hua and Lee (PDIS 1991) and Valduriez (ICDE 1993).
More general parallel computing architectures, including SIMD and MIMD archi-
tectures, can be found in widely known books by Almasi and Gottlieb (1994) and
by Patterson and Hennessy (1994).
AnewwaveofGrid databases started in the early 2000s. A direction on this
area is given by Atkinson (BNCOD 2003), Jeffery (EDBT 2004), Liu et al. (SIG-
MOD 2003), and Malaika et al. (SIGMOD 2003). One of the most prominent
works in Grid databases is the DartGrid project by Chen, Wu et al., who have
reported their project in Concurrency and Computation (2006), at the GCC confer-
ence (2004), at the Computational Sciences conference (2004), and at the APWeb
conference (2005).
Realizing the importance of parallelism in database processing, many com-
mercial DBMS vendors have included some parallel processing capabilities in
their products, including Oracle (Cruanes et al. SIGMOD 2004) and Informix
(Weininger SIGMOD 2000). Oracle has also implemented some grid facilities
(Poess and Othayoth VLDB 2005). The work on parallel databases continues with
recent work on shared cache (Chandrasekaran and Bamford ICDE 2003).
1.10 EXERCISES
1.1. Assume that a query is decomposed into a serial part and a parallel part. The serial
part occupies 20% of the entire elapsed time, whereas the rest can be done in parallel.
Given that the one-processor elapsed time is 1 hour, what is the speed up if 10 pro-
cessors are used? (For simplicity, you may assume that during the parallel processing
of the parallel part the task is equally divided among all participating processors).
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
32 Chapter 1 Introduction

Analytical models are cost equations/formulas that are used to calculate the elapsed
time of a query using a particular parallel algorithm for processing. A cost equation
is composed of variables, which are substituted with speciﬁc values at runtime
of the query. These variables denote the cost components of the parallel query
processing.
In this chapter, we brieﬂy introduce basic cost components and how these are
used in cost equations. In Section 2.1, an introduction to cost models including their
processing paradigm is given. In Section 2.2, basic cost components and cost nota-
tions are explained. These are basically the variables used in the cost equations. In
Section 2.3, cost models for skew are explained. Skew is an important factor in paral-
lel database query processing. Therefore, understanding skew modeling is a critical
part of understanding parallel database query processing. In Section 2.4, basic cost
calculation for general parallel database processing is explained.
2.1 COST MODELS
To measure the effectiveness of parallelism of database query processing, it is nec-
essary to provide cost models that can describe the behavior of each parallel query
algorithm. Although the cost models may be used to estimate the performance of a
query, it is the primary intention to use them to describe the process involved and
for comparison purposes. The cost models also serve as tools to examine every
cost factor in more detail, so that correct decisions can be made when adjusting
the entire cost components to increase overall performance. The cost is primarily
expressed in terms of the elapsed time taken to answer a query.
The processing paradigm is processor farming, consisting of a master processor
and multiple slave processors. Using this paradigm, the master distributes the work
to the slaves. The aim is to make all slaves busy at any given time, that is, the
High-Performance Parallel Database Processing and Grid Databases,
by David Taniar, Clement Leung, Wenny Rahayu, and Sushant Goel
Copyright  2008 John Wiley & Sons, Inc.
33
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

Data processing in each processor is based on the number of records. For
example, the evaluation of an attribute is performed at a record level. On the other
hand, systems processing, such as I/O (read/write data from/to disk) and data
distribution in an interconnected network, is done at a page level,whereapage
normally consists of multiple records.
In terms of their notations, for the actual size of a table, a capital letter, such as
R, is used. If two tables are involved in a query, then the letters R and S are used
to indicate tables 1 and 2, respectively. Table size is measured in bytes. Therefore,
if the size of table R is 4 gigabytes, when calculating a cost equation variable R
will be substituted by 4 ð 1024 ð 1024 ð 1024.
For the number of records, the absolute value notation is used. For example,
the number of records of table R is indicated by jRj. Again, if table S is used in
the query, jSj denotes number of records of this table. In calculating the cost of an
equation, if there are 1 million records in table R,variablejRj will have a value of
1,000,000.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
2.2 Cost Notations 35
Table 2.1 Cost notations
Symbol Description
Data parameters
R Size of table in bytes
R
i
Size of table fragment in bytes on processor i
|R | Number of records in table R
|R
i
| Number of records in table R on processor i
Systems parameters
N Number of processors

the number of records in table R on processor i is indicated by jR
i
j.Thesame
notation is applied to table S whenever it is used in a query.
As the subscript i indicates the processor number, R
1
and jR
1
j are fragment
table size and number of records of table R in processor 1, respectively. The values
of R
1
and jR
1
j may be different from (or the same as), say for example, R
2
and
jR
2
j. However, in parallel database query processing, the elapsed time of a query
processing is determined by the longest time spent in a processor. In calculating the
elapsed time, we are concerned only with the processors having the largest number
of records to process. Therefore, for i D 1 :::n, we choose the largest R
i
and jR
i
j
to represent the longest elapsed time of the heaviest load processor. If table R is
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
36 Chapter 2 Analytical Models

jDjRj=N (jR
i
jD1;000;000=10 D 100;000 records).
If the data is not uniformly distributed, jR
i
j denotes the largest number of
records in a processor. Realistically, jR
i
j must be larger than jRj=N,orinother
words, the divisor must be smaller than N .Usingthesameexampleasabove,
jR
i
j must be larger than 100,000 records (say for example 200,000 records). This
shows that the processor having the largest record population is the one with
200,000 records. If this is the case, jR
i
jD200;000 records is obtained by dividing
jRjD1;000;000 by 5. The actual number of the divisor must be modeled correctly
to imitate the real situation.
There are two other important systems parameters, namely:
ž
Page size (P/ and
ž
Hash table size (H/
Page size, indicated by P, is the size of one data page in bytes, which contains
a batch of records. When records are loaded from disk to main memory, it is not
loaded record by record, but page by page.
To calculate the number of pages of a given table, divide the table size by the
page size. For examples, R D 4 gigabytes (D 4 ð 1024
3

mined by the number of records in the query result, and the original total number
of records. Like π, selectivity ratio σ also ranges from 0 to 1. For example, sup-
pose initially there are 1000 records (jR
i
jD1000 records), and the query produces
4 records. The selectivity ratio σ is then 4/1000 D 1=250 D 0:004.
Selectivity ratio σ is used in many different query operations. To distinguish
one selectivity ratio from the others, a subscript can be used. For example, σ
p
in a parallel group-by query processing indicates the number of groups produced
in each processor. Using the above example, the selectivity ratio σ of 1/250 (σ D
0:004) means that each group in that particular processor gathers an average of 250
original records from the local processor.
If the query operation involves two tables (like in a join operation), a selectivity
ratio can be written as σ
j
, for example. The value of σ
j
indicates the ratio between
the number of records produced by a join operation and the number of records
of the Cartesian product of the two tables to be joined. For example, jR
i
jD1000
records and jS
i
jD500 records; if the join produces 5 records only, then the join
selectivity ratio σ
j
is 5=.1;000 ð 500/ D 0:00001.
Projectivity and selectivity ratios are important parameters in query processing,

The time to write the query results into a disk is very much reduced as only a
small subset of R
i
is selected. Therefore, in the cost equation, in order to reduce
the number of records as indicated by the query results, R
i
is normally multiplied
by other query parameters, such as π and σ.
Times to read/write a record in/to main memory are indicated by t
r
and t
w
,
respectively. These two unit costs are associated with reading records, which are
already in the main memory. These two unit costs are also used when obtaining
records from the data page. Note now that these two unit costs work at a record
level, not at a page level.
The time taken to perform a computation in the main memory varies from one
computation type to another, but basically, the notation is t followed by a subscript
that denotes the type of computation. Computation time in this case is the time
taken to compute a single process in the CPU. For example, the time taken to hash
a record to a hash table is shown as t
h
, and the time taken to add a record to current
aggregate value in a group by operation is denoted as t
a
.
Finally, the time taken to compute the destination of a record is denoted by t
d
.

being sent. The unit cost for the sending is the sum of the two communication cost
components.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
2.3 Skew Model 39
At the receiver end, the receiver cost is the total cost of receiving records in
pages, which is calculated by multiplying number of pages received and the mes-
sage protocol cost per page only. Note that in the receiver cost, the message latency
is not included. Therefore, continuing the above example, the receiving cost would
be R=P ð m
p
.
In a multiprocessor environment, the sending cost is the cost of sending data
from one processor to another. The sending cost will come from the heaviest loaded
processor, which sends the largest volume of data. Assume the number of pages to
be sent by the heaviest loaded processor is p
1
; the sending cost is p
1
ð .m
p
C m
l
/.
However, the receiving cost is not just simply p
1
ð .m
p
/, since the maximum page
size sent by the heaviest loaded processor may likely be different from the max-
imum page size received by the heaviest loaded processor. As a matter of fact,

the perspective of parallel query processing, it does not matter whether or not the
processor is the same.
As has been shown above, the most important cost component is in fact p
1
and
p
2
, and these must be accurately modeled to reﬂect the accuracy of the communi-
cation costs involved in a parallel query processing.
2.3 SKEW MODEL
Skew has been one of the major problems in parallel processing. Skew is deﬁned as
the nonuniformity of workload distribution among processing elements. In parallel
external sorting, there are two different kinds of skew, namely:
ž
Data skew and
ž
Processing skew
Data skew is caused by the unevenness of data placement in a disk in each local
processor, or by the previous operator. Unevenness of data placement is caused by
the fact that data value distribution, which is used in the data partitioning function,
may well be nonuniform because of the nature of data value distribution. If initial
data placement is based on a round-robin data partitioning function, data skew
will not occur. However, it is common for database processing not to involve a
single operation only. It sometimes involves many operations, such as selection
ﬁrst, projection second, join third, and sort last. In this case, although initial data
placement is even, other operators may have rearranged the data—some data are
eliminated, or joined, and consequently, data skew may occur when the sorting is
about to start.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
40 Chapter 2 Analytical Models

The symbol θ denotes the degree of skewness, where θ D 0 indicates no skew
and θ D 1 highly skewed. Clearly, when θ D 0, the fragment sizes follow a discrete
uniform distribution with jR
i
jD
jRj
N
. This is an ideal distribution, as there is no
skew. In contrast, when θ D 1 indicating a high degree of skewness, the fragment
sizes follow a pure Zipf distribution. Here, the above equation becomes:
jR
i
jD
jRj
i ð
N
P
jD1
1
j
D
jRj
i ð H
N
³
jRj
i ð .γ C ln N/
(2.2)
where γ D 0:57721 (Euler’s constant)andH
N

N
(2.4)
and when it is highly skewed, jR
i
jD
jRj
N
P
jD1
1
j
θ
. To illustrate the difference between
these two equations, we use the example shown in Figures 2.1 and 2.2. In this
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
2.3 Skew Model 41
No Skew
0
10000
20000
30000
40000
12345678
Processor Number
Number of Records
(
R
i
)
Figure 2.1 Uniform distribution (no skew)

In extreme situations, the heaviest loaded processor can hold all the records
(e.g., 100,000 records), whereas all other processors are empty. Although this is
possible, in real implementation, it may rarely happen. And this is why a more
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
42 Chapter 2 Analytical Models
Comparison
0
5000
10000
15000
20000
25000
30000
35000
40000
q = 1.0 q = 0.8 q = 0.5 q = 0
Number of Records (

R
i

)
12345678
Processor
Figure 2.3 Comparison between highly skewed, less skewed, and no-skew distributions
realistic distribution model is used, such as the Zipf model, which has been
well-regarded as being suitable for modeling data distribution in parallel database
systems.
Figures 2.1 and 2.2 actually show the two extremes, namely highly skewed and
no skew at all. In practice, the degree of skewness may vary between θ D 0and

R= 100,000 records
0
10000
20000
30000
40000
50000
Number of Processors
(
N
)
Number of Records (R
i
)
No Skew
25000 12500 6250 3125 1563 781 391
Highly Skew
48077 36765 29586 24631 21097 18416 16340
4 8 16 32 64 128 256
Figure 2.4 Comparison between the heaviest loaded processors using no-skew and highly skewed
distributions
Table 2.2 Divisors (with vs. without skew)
N 4 8 16 32 64 128 256
Divisor without skew 4 8 16 32 64 128 256
Divisor with skew 2.08 2.72 3.38 4.06 4.74 5.43 6.12
distribution, jRj is divided by a corresponding divisor shown in the last row in
order to obtain jR
i
j.
The divisor with the high skew remains quite steady compared with the one

i
), page size (P/,and
the I/O unit cost (IO). R
i
and jPj are needed to calculate the number of pages to
be read/written, whereas IO is the actual unit cost.
If all records are being loaded from a disk, then we use R
i
to indicate the size
of the table read. If the records have been initially stored and distributed evenly to
all disks, then we use a similar equation to Equation (2.4) to calculate R
i
,where
R
i
D R=N .
However, if the initial records have not been stored evenly in all disks, then it
is skewed, and a skew model must be used. As aforementioned, in performance
modelling, when it is skewed, we normally assume it is highly skewed with θ D
1:0. Therefore, we use an equation similar to Equation 2.3 to determine the value
of R
i
, which gives R
i
D R=.γ C ln N/.
Once the correct value of R
i
has been determined, we can calculate the total
cost of reading the data page from the disk as follows:
scanning cost D R

Unlike disk operations, main memory operations are based on records, not on
pages. In other words, jR
i
j is used instead of R
i
.
The select cost is calculated as the number of records loaded from the disk times
the reading and writing unit costs to the main memory (t
r
and t
w
). The reading unit
cost is used to model the reading operation of records from the data page, whereas
the writing unit cost is to actually write the record, which has been read from the
data page, to main memory. Therefore, a select cost is calculated as follows:
select cost DjR
i
jð.t
r
C t
w
/ (2.7)
Equations 2.3 and 2.4 can be used to estimate jR
i
j, in the case of skew and
no-skew data distribution, respectively.
The query results generation cost is similar to the select cost, like the disk writ-
ing cost is to the disk reading cost. In the query results generation cost, there are
two main important differences in particular. One is that the unit time cost is the
writing cost (t

the data has been loaded from its local disk and redistribute the data immediately
to other processors depending on some distribution function. Some other
algorithms perform initial data computation on the local data before distributing
it to other processors for further data computation. Data computation and data
distribution may be carried out in several steps, also depending on the algorithms.
Data Computation
As data computation works in main memory, the cost is based on the number of
records involved in the computation and the unit computation time itself. Each
data computation operation may involve several basic costs, such as unit costs for
hashing, for adding the current record to the aggregate value, and so on. However,
generally, the data computation cost is a product of the number of records involved
in the computation (jR
i
j/ and the data computation unit costs (t
x
,wherex indicates
the total costs for all operations involved). Hence, a general data computation cost
takes the form:
data computation cost DjR
i
jð.t
x
/ (2.9)
Equation (2.9) assumes that the number of records involved in the data compu-
tation is jR
i
j. If the number of records has been reduced because of previous data
computation, then we must insert additional variables to reduce jR
i
j. Also, the data

j has been reduced, additional cost variables must be included.
Also, an appropriate assumption must be made whether jR
i
j involves skew or no
skew.
The data transmission itself, which is explained above in Section 2.2.5, is
divided into the sending cost and the receiving cost.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
2.7 Exercises 47
2.5 SUMMARY
This chapter is basically centered on the basic cost models to analytically model
parallel query processing. The basic elements of cost models include:
ž
Basic cost notations, which includes several important parameters, such as
data parameters, systems parameters, query parameters, time unit costs, and
communication costs
ž
Skew model,usingaZipf distribution model
ž
Basic parallel database processing costs, including general steps of parallel
database processing, such as disk costs, main memory costs, data computation
costs, and data distribution costs
2.6 BIBLIOGRAPHICAL NOTES
Two excellent books on performance modeling are Leung (1988) and Jain (1991).
Although the books are general computer systems performance modeling and anal-
ysis books, some aspects may be used in parallel database processing. A general
book on computer architecture is Hennessy and Patterson (1990), where the details
of a low-level architecture are discussed.
Speciﬁc cost models for parallel database processing can be found in
Hameurlain and Morvan (DEXA 1995), Graefe and Cole (ACM TODS 1995),

ponents. Investigate your favorite DBMS and ﬁnd out what kind of tools are available
to examine the query processing costs.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Part II
Basic Query
Parallelism
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Chapter 3
Parallel Search
Searching is a common task in our everyday lives and may involve activities such
as searching for telephone numbers in a directory, locating words in a dictionary,
checking our appointment diary for a given day/time, etc., etc. Searching is also a
key activity in database applications. Searching is the task of locating a particular
record within a collection of records. Searching is one of the most primitive, yet most
of the time the most accessed, operations in database applications. In this chapter, we
focus on search operations.
In Section 3.1, search queries are expressed in SQL. A search classiﬁcation is
also given based on the searching predicate in the SQL. As parallel search is very
much determined by data partitioning, in Section 3.2 various data partitioning meth-
ods are discussed. These include single-attribute-based data partitioning methods,
no-attribute-based data partitioning methods, and multiattribute-based partitioning
methods. The ﬁrst two are categorized as basic data partitioning, whereas the latter
is called complex data partitioning.
Section 3.3 studies serial and parallel search algorithms. Serial search algorithms,
together with data partitioning, form parallel search algorithms. Therefore, under-
standing these two key elements is an important aspect of gaining a comprehensive
understanding of parallel search algorithms.
3.1 SEARCH QUERIES
The search operation in databases is represented by the selection operation.

Query 3.1:
Select *
From STUDENT
Where Sid D 23;
The resulting table of an exact match query can contain more than one record,
depending on whether there are duplicate values in the search attribute. In this
case, since the search predicate is on the primary key, the resulting table con-
tains one record only. However, if the search predicate is on a nonprimary key
attribute in which duplicate values are allowed, it is likely that the resulting table
will contain more than one record. For example, the query “retrieve student details
with last name Robinson” may return multiple records. The SQL is expressed as
follows:
Query 3.2:
Select *
From STUDENT
Where Slname D ‘Robinson’;
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
3.1 Search Queries 53
3.1.2 Range Search Query
A range search query is a query where the search attribute attr value in the
query result may contain more than single unique values. Range queries fall into
two categories:
ž
Continuous range search query and
ž
Discrete range search query
In the continuous range search query, the search predicates contain a continuous
range check, normally with continuous range-checking operators, such as < , Ä, >
, ½,!D, Between, Not, and Like operators. On the other hand, the discrete range
search query uses discrete range check operators, such as In and Or operators.

Where Sdegree IN (‘BCS’, ‘BinfSys’)
And Sgpa > 3.50;
In this case (Query 3.5), the ﬁrst predicate is a discrete range predicate as in
Query 3.4, whereas the second predicate is a continuous range predicate as in
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
54 Chapter 3 Parallel Search
Query 3.3. Therefore, the resulting table contains only those excellent BCS and
BInfSys students (measured by greater than 3.50 in their GPAs).
3.1.3 Multiattribute Search Query
Both exact match and range search queries as given in Queries 3.1–3.4 involve
single attributes in their search predicates. If multiple attributes are involved, we
call this query a multiattribute search query. Each attribute in the predicate can be
either an exact match predicate or a range predicate.
Multiattribute search query can be classiﬁed into two types, depending on
whether
AND or OR operators are used in linking each of the simple predicates.
Complex predicates involving
AND operators are called conjunctive predicates,
whereas predicates involving
OR operators are called disjunctive predicates.When
AND and OR operators exist, it is common for the predicate to be normalized in
order to form a conjunctive prenex normal form (CPNF).
An example of a multiattribute search query is “retrieve student details with
the surname ‘Robinson’ enrolled in either BCS or BInfSys”. This query is sim-
ilar to Query 3.2 above, with further ﬁltering in which only BCS and BInfSys
are selected. The ﬁrst predicate is an exact match predicate on attribute
Slname,
whereas the second predicate is a discrete range predicate on attribute
Sdegree.
These simple predicates are combined in a form of CPNF. The SQL of the above

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Tài liệu High-Performance Parallel Database Processing and Grid Databases- P2 - Pdf 98

Tài liệu, ebook tham khảo khác

Học thêm