Tài liệu Độ tin cậy của hệ thống máy tính và mạng P6 - Pdf 87

6
NETWORKED SYSTEMS
RELIABILITY
Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design
Martin L. Shooman
Copyright 
2002
John Wiley & Sons, Inc.
ISBNs:
0
-
471
-
29342
-
3
(Hardback);
0
-
471
-
22460
-X (Electronic)
283
6
.
1
INTRODUCTION
Many physical problems (e.g., computer networks, piping systems, and power
grids) can be modeled by a network. In the context of this chapter, the word
network means a physical problem that can be modeled as a mathematical

matical graphs, tie sets, and cut sets. A summary of the relevant concepts is
given in Section B
2
.
7
, and there is a brief discussion of some aspects of graph
theory in Section
5
.
3
.
5
; other concepts will be developed in the body of the
chapter. The reader should be familiar with these concepts before continuing
with this chapter. For more details on graph theory, the reader is referred to
Shooman [
1983
, Appendix C]. There are of course other approaches to net-
work reliability; for these, the reader is referred to the following references:
Frank [
1971
], Van Slyke [
1972
,
1975
], and Colbourn [
1987
,
1993
,

and interconnections or links. In general, these terms are synonymous and used
interchangeably.
In the most general model, both the nodes and the links can fail, but here
we will deal with a simpliﬁed model in which only the links can fail and the
nodes are considered perfect. In some situations, communication can go only
in one direction between a node pair; the link is represented by a directed edge
(an arrowhead is added to the edge), and one or more directed edges in a graph
result in a directed graph (digraph). If communication can occur in both direc-
tions between two nodes, the edge is nondirected, and a graph without any
directed nodes is an ordinary graph (i.e., nondirected, not a digraph). We will
consider both directed and nondirected graphs. (Sometimes, it is useful to view
DEFINITION OF NETWORK RELIABILITY
285
ab1
42
dc3
5
6
Figure
6
.
1
A four-node graph representing a computer or communication network.
a nondirected graph as a special case of a directed graph in which each link
is represented by two identical parallel links, with opposite link directions.)
When we deal with nondirected graphs composed of E edges and N nodes,
the notation G(N, E) will be used. A particular node will be denoted as n
i
and
a particular edge denoted as e

, and n
4
are a, b, c, and d. Edge
1
is denoted by e
1

e(n
1
, n
2
)

(a, b), edge
2
by e
2

e(n
2
, n
3
)

(b, c), and so forth. The example of a network
graph shown in Fig.
6
.
1
has four nodes (a, b, c, d) and six edges (

!
/
[(
2
!)(
4
−
2
)!]

6
.
In formulating the network model, we will assume that each link is either
good or bad and that there are no intermediate states. Also, independence of
link failures is assumed, and no repair or replacement of failed links is con-
sidered. In general, the links have a high reliability, and because of all the
multiple (redundant) paths, the network has a very high reliability. This large
number of parallel paths makes for high complexity; the efﬁcient calculation
of network reliability is a major problem in the analysis, design, or synthesis
of a computer communication network.
6
.
3
DEFINITION OF NETWORK RELIABILITY
In general, the deﬁnition of reliability is the probability that the system oper-
ates successfully for a given period of time under environmental conditions
(see Appendix B). We assume that the systems being modeled operate con-
tinuously and that the time in question is the clock time since the last failure
286
NETWORKED SYSTEMS RELIABILITY

is called the all-
terminal reliability.
In more formal terms, we can state that the all-terminal reliability is the
probability that node n
i
can communicate with node n
j
for all pairs n
i
n
j
(where
i
϶
j ). We wish to show that this is equivalent to the proposition that node s
can communicate with all other nodes t
1

n
2
, t
2

n
3
, . . . , t
n −
1

n

6
.
1
)
or the two-terminal reliability:
R
st

P(that nodes s and t are connected) (
6
.
2
)
Similarly, k-terminal reliability is the probability that a subset of k nodes
2
≤
k ≤ n) are connected. Thus we must specify what type of reliability we are
discussing when we begin a problem.
We stated previously that repairs were not included in the analysis of net-
work reliability. This is not strictly true; for simplicity, no repair was assumed.
In actuality, when a node-switching computer or a telephone communications
line goes down, each is promptly repaired. The metric used to describe a
repairable system is availability, which is deﬁned as the probabilty that at any
instant of time t, the system is up and available. Remember that in the case
of reliability, there were no failures in the interval
0
to t. The notation is A(t),
and availability and reliability are related as follows by the union of events:
DEFINITION OF NETWORK RELIABILITY
287

expanded as a sum of probabilities:
A(t)

P(no failure in interval
0
to t)
+ P(
1
failure and
1
repair in interval
0
to t)
+ P(
2
failures and
2
repairs in interval
0
to t)+·· · (
6
.
4
)
Clearly,
•
The ﬁrst term in Eq. (
6
.
4

a single element with various failure and repair probability distributions can
become quite complex. In general, the derivations are simpliﬁed by assuming
exponential probability distributions for the failure and repair times (equiv-
alent to constant-failure rate, l, and constant-repair rate, m). Sometimes, the
mean time to failure (MTTF) and the mean time to repair (MTTR) are used
to describe the repair process and availability. In many cases, the terms mean
time between failure (MTBF) and mean time between repair (MTBR) are used
instead of MTTF and MTTR. For constant-failure and -repair rates, the mean
times become MTBF

1
/
l and MTBR

1
/
m. The solution for A(t) has an
exponentially decaying transient term and a constant steady-state term. After a
few failure repair cycles, the transient term dies out and the availability can be
represented by the simpler steady-state term. For the case of constant-failure
and -repair rates for a single item, the steady-state availability is given by the
equation that follows (see Appendix B).
A
ss

m
/
(l + m)

MTBF

6
)
The formulation given in Eq. (
6
.
6
) is more convenient than that of Eq. (
6
.
5
)
if we wish to estimate A
ss
based on collected ﬁeld data. In the case of a com-
puter network, the availability computations can become quite complex if the
repairs of the various elements are coupled, in which case a single repairman
might be responsible for maintaining, say, two nodes and ﬁve lines. If sev-
eral failures occur in a short period of time, a queue of failed items wait-
ing for repairs might build up and the downtime is lengthened, and the term
“repairman-coupled” is used. In the ideal case, if we assume that each element
in the system has its own dedicated repairman, we can guarantee that the ele-
ments are decoupled and that the steady-state availabilities can be substituted
into probability expressions in the same way as reliabilities are. In a practi-
cal case, we do not have individual repairmen, but if the repair rate is much
larger than the failure rate of the several components for which the repairman
supports, then approximate decoupling is a good assumption. Thus, in most
network reliability analyses there will be no distinction made between reli-
ability and availability; the two terms are used interchangeably in the network
ﬁeld in a loose sense. Thus a reliability analyst would make a combinatorial
model of a network and insert reliability values for the components to calculate

e
combinations. Each of these combinations of
good and bad edges can be treated as an event E
i
. These events are all mutually
TWO-TERMINAL RELIABILITY
289
exclusive (disjoint), and the reliability expression is simply the probability of
the union of the subset of these events that contain a path between s and t.
R
st

P(E
1
+ E
2
+ E
3
·· ·) (
6
.
7
)
Since each of these events is mutually exclusive, the probability of the union
becomes the sum of the individual event probabilities.
R
st

P(E
1

1
. We are interested in the two-terminal reli-
ability for node pair a and b; thus s

a and t

b. Since there are six edges,
there are
2
6

64
events associated with this graph, all of which are presented
in Table
6
.
1
. The following deﬁnitions are used in constructing Table
6
.
1
:
E
i

the event i
j

the success of edge j
j


[P(E
1
)] + [P(E
2
) +·· ·+P(E
7
)] + [P(E
8
) + P(E
9
) + · · · + P(E
22
)]
+ [P(E
23
)+P(E
24
) + · · · + P(E
34
) + P(E
37
) + · · · + P(E
42
)]
+ [P(E
43
)+P(E
44
) + · · · + P(E

6
.
9
) yields a polynomial in p and q:
R
ab

p
6
+
6
qp
5
+
15
q
2
p
4
+
18
q
3
p
3
+
7
q
4
p

6
!
0
!
6
!

1
E
1

123456
Good
One failure:
΂
6
1
΃

6
!
1
!
5
!

6
E
2


6
Good
E
7

123456
′
Good
Two failures:
΂
6
2
΃

6
!
2
!
4
!

15
E
8

1
′
2
′
3456


1
′
23456
′
Good
E
13

12
′
3
′
456
Good
E
14

12
′
34
′
56
Good
E
15

12
′
345

19

123
′
456
′
Good
E
20

1234
′
5
′
6
Good
E
21

1234
′
56
′
Good
E
22

12345
′
6

123
′
45
′
6
′
Good
E
25

123
′
4
′
56
′
Good
E
26

123
′
4
′
5
′
6
Good
E
27

30

12
′
3
′
456
′
Good
E
31

12
′
3
′
45
′
6
Good
E
32

12
′
3
′
4
′
56


1
′
234
′
5
′
6
′
Bad
E
36

1
′
2
′
3456
′
Bad
E
37

1
′
2
′
345
′
6

′
6
Good
E
41

1
′
23
′
4
′
56
Good
E
42

1
′
2
′
3
′
456
Good
Four failures:
΂
6
4
΃

6
′
Good
E
45

12
′
3
′
45
′
6
′
Good
E
46

12
′
3
′
4
′
56
′
Good
E
47


′
6
′
Bad
E
50

1
′
23
′
4
′
56
′
Good
E
51

1
′
23
′
4
′
5
′
6
Bad
E

34
′
5
′
6
Bad
E
55

1
′
2
′
3
′
456
′
Bad
E
56

1
′
2
′
3
′
45
′
6

58

12
′
3
′
4
′
5
′
6
′
Good
E
59

1
′
23
′
4
′
5
′
6
′
Bad
E
60


2
′
3
′
4
′
56
′
Bad
E
63

1
′
2
′
3
′
4
′
5
′
6
Bad
Six failures:
΂
6
6
΃



1
and q

0
, which should yield a reliability of unity. Similarly, evaluating the
292
NETWORKED SYSTEMS RELIABILITY
polynomial for p

0
and q

1
should yield a reliability of
0
. (Any network
has a reliability of unity regardless of its topology if all edges are perfect; it
has a reliability of
0
if all its edges have failed.)
Numerical evaluation of the polynomial for p

0
.
9
and q

0
.

0
.
9
)
4
+
18
(
0
.
1
)
3
(
0
.
9
)
3
+
7
(
0
.
1
)
4
(
0
.

.
0984
+
0
.
0131
+
5
.
67
×
10
−
4
+
9
×
10
−
6
(
6
.
11
b)
R
ab

0
.

imal cut sets and minimal tie sets of the graph (see Appendix B and Shooman
[
1990
, Section
3
.
6
.
5
]). The tie sets are the groups of edges that form a path
between s and t. The term minimal implies that no node or edge is traversed
more than once, but another way of deﬁning this is that minimal tie sets have
no subsets of edges that are a tie set. If there are i tie sets between s and t,
then the reliability expression is given by the expansion of
R
st

P(T
1
+ T
2
+ · · · + T
i
)(
6
.
12
)
Similarly, one can focus on the minimal cut sets of a graph. A cut set is a
group of edges that break all paths between s and t when they are removed

2
.
Since there are fewer cut sets, it is easier to use Eq. (
6
.
13
) rather than Eq.
(
6
.
12
); however, there is no general rule for when j < i or vice versa.
TWO-TERMINAL RELIABILITY
293
TABLE
6
.
2
Minimal Tie Sets and Cut Sets for the
Example of Fig.
6
.
1
(s

a, t

b)
Tie Sets
Cut Sets

C
3

1
′
5
′
6
′
3
′
T
4

234
C
4

1
′
2
′
3
′
4
′
T
5

536

1
′
6
′
2
′
+
1
′
5
′
3
′
6
′
+
1
′
2
′
3
′
4
′
)(
6
.
14
b)
R

′
3
′
4
′
)]
+ [P(
1
′
2
′
4
′
5
′
6
′
)+P(
1
′
3
′
4
′
5
′
6
′
) + P(
1

6
′
) + P(
1
′
2
′
3
′
4
′
5
′
6
′
)]
− [P(
1
′
2
′
3
′
4
′
5
′
6
′
)+P(

′
3
′
4
′
5
′
6
′
)] + [P(
1
′
2
′
3
′
4
′
5
′
6
′
)] (
6
.
14
c)
The expansion of the probability of a union of events that occurs in Eq. (
6
.


P([
1
′
4
′
5
′
][
1
′
5
′
6
′
3
′
])

P(
1
′
3
′
4
′
5
′
6
′

c) becomes
R
ab

1
− [
2
q
3
+
2
q
4
] + [
5
q
5
+ q
6
] − [
4
q
6
] + [q
6
]
R
ab

1


0
for q

1
, are valid.
For q

0
.
1
, Eq. (
6
.
15
) yields
R
ab

1
−
2
×
0
.
1
3
−
2
×

16
) is identical to Eq. (
6
.
11
c). If we substitute
tie sets into Eq. (
6
.
12
), we would get a different though equivalent expression.
The expansion of Eq. (
6
.
13
) has a complexity of
2
j
and is more complex
than Eq. (
6
.
12
) if there are more cut sets than tie sets. At this point, it would
294
NETWORKED SYSTEMS RELIABILITY
seem that we should analyze the network and see how many tie sets and cut
sets exist between s and t, and assuming that i and j are manageable numbers
(as is the case in the example to follow), then either Eq. (
6

ﬁnding the number of cut sets are of polynomial complexity; one discussed in
Shier [
1991
, p.
63
] is of complexity order O(n + e + ie). In the case of cut sets,
the ﬁnding algorithms are also of polynomial complexity, and Shier [
1991
, p.
69
] discusses one that is of order O([n + e] j). Observe that the notation O( f )
is called the order of f or “big O of f.” For example, if f

5
x
3
+
10
x
2
+
12
, the
order of f would be the dominating term in f as x becomes large, which is
5
x
3
.
Since the constant
5

complexity are exponential, O(
2
i
) or O(
2
j
) [Colbourn,
1987
,
1993
]. This is
the reason why approximate methods are discussed in the next two sections.
In addition, some of these algorithms are explored in the problems at the end
of this chapter.
If we examine Eqs. (
6
.
12
) and (
6
.
13
), we see that the complexity of
these expressions is a function of the cut sets or tie sets, the number of
edges in the cut sets or tie sets, and the number of “brackets” that must be
expanded (the number of terms in the union of cut sets or tie sets—i.e., in
the inclusion–exclusion formula). We can approximate the cut-set or tie-set
expression by dropping some of the less-signiﬁcant brackets of the expansion,
by dropping some of the less-signiﬁcant cut sets or tie sets, or by both.
6

unity term, one obtains the same expression that would have ensued had the
cuts been disjoint (but they are not). Thus we will call the retention of only
the ﬁrst two terms the disjoint approximation.
In Shooman [
1990
, Section
3
.
6
.
5
], it is shown that a disjoint cut-set approx-
imation is a lower bound. For the example of Fig.
6
.
1
, we obtain Eq. (
6
.
17
)
for the disjoint approximation, and assuming q

0
.
1
:
R
ab
≥

6
.
16
). If we include the
next bracket in Eq. (
6
.
14
c), we get a closer approximation at the expense of
computing [ j + (
j
2
)]

[ j( j −
1
)
/
2
] terms.
R
ab
≤
1
− [
2
q
3
+
2

6
.
18
)
Equation (
6
.
18
) is not only an approximation but an upper bound. In fact, as
more terms are included in the inclusion–exclusion formula, we obtain a set of
alternating bounds (see Shooman [
1990
, Section
3
.
6
.
5
]). Note that Eq. (
6
.
17
)
is a sharp lower bound and that Eq. (
6
.
18
) is ever sharper, but both equa-
tions effectively bracket the exact result. Clearly, the sharpness of these bounds
increases as q

0
.
997851
2

0
.
9978255
(
6
.
20
)
The accuracy of the preceding approximation can be evaluated by examining
the deviation in the computed probability of failure F
ab

1
− R
ab
. In the region
of high reliability, all the values of R
ab
are very close to unity, and differences
are misleadingly small. Thus, as our error criterion, we will use
% error

|
F
ab

.
20
) yields
% error

|
0
.
0021745
−
0
.
002152
|
0
.
002152
×
100
%

1
.
05
(
6
.
22
)
296

(
6
.
23
)
A moment’s reﬂection leads to the conclusion that the highest-order approx-
imation will always be the closest and should be used in the denominator of
an error bound. The numerator, on the other hand, should be the difference
between the two highest-order terms. Thus, for our example,
% error ≈
|
0
.
002200
−
0
.
002149
|
0
.
0021749
×
100
%

2
.
37
(

2
j
to a poly-
nomial in j (perhaps j
2
or j
3
).
6
.
4
.
4
Subset Approximations
In the last section, we discussed approximation by truncating the inclusion–
exclusion expression. Now we discuss approximation by exclusion of low-
probability cut sets or tie sets. Clearly, the occurrence probability of the lower-
order (fewer edges) cut sets is higher than the higher-order (more edges) ones.
Thus, we can approximate Eq. (
6
.
14
a) dropping C
3
and C
4
fourth-order cut
sets and retaining the third-order cut set to yield an upper bound (since we
have dropped cut sets, we are making an optimistic approximation).
R

′
6
′
2
′
) + P(
1
′
2
′
4
′
5
′
6
′
)(
6
.
25
a)
For q

0
.
1
,
TWO-TERMINAL RELIABILITY
297
R

, T
3
)—compare Eq. (
6
.
12
) and
Table
6
.
2
.
R
ab
≥ P(T
1
+ T
2
+ T
3
)

P(
1
+
25
+
46
)


p
2
−
2
p
3
− p
4
+ p
5

0
.
9
+
2
×
0
.
9
2
−
2
×
0
.
9
3
−
0

99639
≤ R
ab
≤
0
.
99801
(
6
.
27
)
and approximate R
ab
by the midpoint of the two bounds:
R
ab

0
.
99639
+
0
.
99801
2

0
.
9971955

8
(
6
.
29
)
The percentage error is larger than in the case of the truncation approxima-
tions, but it remains small enough for the approximation to be valid. The com-
plexity is still exponential—of order
2
x
; however, x is now a small integer and
2
x
is of modest size. Furthermore, the tie-set and cut-set algorithms take less
time since we now do not need to ﬁnd all cut sets and tie sets—only those of
order ≤ x. Of course, one can always combine both approximation methods by
dropping out higher-order cut sets and then also truncating the expansion. For
more details on network reliability approximations, see Murray [
1992
,
1993
].
6
.
4
.
5
Graph Transformations
Anyone who has studied electric-circuit theory in a physics or engineering

1
3
2
4
5
Expand about 5: = [ (5) ( ) + (5 ) ( )]R P Pr G P Pr G
ad 12
’
Figure
6
.
2
Illustration of series, parallel, and decomposition transformations for two-
terminal pair networks.
of equivalent network transformations: some that are remarkably similar, and
some that are quite different, especially in the case of all-terminal reliability (to
be discussed later). We must remember that these are not ﬂow transformations
but probability transformations.
This method of calculating network reliability is based on transforming the
network into a simpler network (or set of networks) by successively applying
transformations. Such transformations are simpler for two-terminal reliability
than for all-terminal reliability. For example, for two-terminal reliability, we
can use the transformations given in Fig.
6
.
2
. In this ﬁgure, the series transfor-
mation indicates that we replace two branches in series with a single branch
that is denoted by the intersection of the two original branches (
1

2
can now be evaluated by using combinations of series and parallel transforma-
tions. These three transformations—series, parallel, and decomposition—are
all that is needed to perform the reliability analysis for many networks.
TWO-TERMINAL RELIABILITY
299
Now we discuss a more difﬁcult network conﬁguration. In the ﬁrst transfor-
mation in Fig.
6
.
2
(a) series, we readily observe that both (intersection) edges
1
and
2
must be up for a connection between a and b to occur. However, this
transformation only works if there is no third edge connected to node b; if
a third edge exists, a more elaborate transformation is needed (which will be
discussed in Section
6
.
6
on all-terminal reliability). Similarly, in the case of
the parallel transformation, nodes a and b are connected if either (union) a or
b is up.
Assume that any failures of edge
1
and edge
2
are independent and the

1
p
2
, and for the parallel subnetwork in Fig.
6
.
2
(b), p
ab

p
1
+ p
2
− p
1
p
2

1
− q
1
q
2
.
The case of decomposition (called the keystone component method in sys-
tem reliability [Shooman,
1990
] or the edge-factoring method in network reli-
ability) is a little more subtle; it is used to eliminate an edge x from a graph.

x is good)
+ P(x is bad) × P(there is a path between s and t
|
x is bad) (
6
.
30
)
The preceding equation can be rewritten in a more compact notation as follows:
P
st

P(x)P(G
1
) + P(x
′
)P(G
2
)(
6
.
31
)
The term P(G
1
) is the probability of a connection between s and t for the
modiﬁed network where x is good, that is, the terminals at either end of edge
x are connected to the graph [see Fig.
6
.

)(
1
− q
2
q
4
) + q
5
(p
1
p
2
+ p
3
p
4
− p
1
p
2
p
3
p
4
)(
6
.
32
)
300

and G
2
are a bit more complex, and
sometimes transformations are recursively computed. More examples of trans-
formations appear in the problems at the end of this chapter; for a complete
discussion of transformations, see Satyanarayana [
1985
] and A. M. Shooman
[
1992
].
We can illustrate the use of the three transformations of Fig.
6
.
2
on the
network given in Fig.
6
.
1
, where we begin by decomposing about edge
6
.
R
ab

P(
6
)
.

6
is
simply removed from the network.
We now calculate P(G
1
) and P(G
2
) for a connection between nodes a and
b with the aid of the series and parallel transformations of Fig.
6
.
2
:
P(G
1
)

P(
1
+
4
+
52
+
53
)

[P(
1
)+P(

6
.
34
)
P(G
2
)

P(
1
+
25
+
243
)

[P(
1
)+P(
25
) + P(
243
)]
− [P(
125
)+P(
1243
) + P(
2543
)] + [P(

p
3
+
4
p
4
− p
5
] + q[ p + p
2
−
2
p
4
+ p
5
](
6
.
36
)
NODE PAIR RESILIENCE
301
Substitution of p

0
.
9
and q


997848
(
6
.
37
)
Of course, this result agrees with the previous computation given in Eq. (
6
.
16
).
6
.
5
NODE PAIR RESILIENCE
All-terminal reliability, in which all the node pairs can communicate, is dis-
cussed in the next section. Also, k-terminal reliability will be treated as a speci-
ﬁed subset (
2
≤ k ≤ all-terminal pairs) of all-terminal reliability. In this section,
another metric, essentially one between two-terminal and all-terminal, is dis-
cussed.
Van Slyke and Frank [
1972
] proposed a measure they called resilience for
the expected number of node pairs that can communicate (i.e., they are con-
nected by one or more tie sets). Let s and t represent a node pair. The number
of node pairs in a network with N nodes is the number of combinations of N
choose
2

Α
Α
Α
{s, t} ⊂ N
R
st
(
6
.
39
)
We can illustrate a resilience calculation by applying Eq. (
6
.
39
) to the net-
work of Fig.
6
.
1
. We begin by observing that if p

0
.
9
for each edge, symmetry
simpliﬁes the computation. The node pairs divide into two categories: the edge
pairs (ab, ad, bc, and cd ) and the diagonal pairs (ac and bd). The edge-pair
reliabilities were already computed in Eqs. (
6

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Tài liệu Độ tin cậy của hệ thống máy tính và mạng P6 - Pdf 87

Tài liệu, ebook tham khảo khác

Học thêm