1
SELF-SIMILAR NETWORK TRAFFIC:
AN OVERVIEW
KIHONG PARK
Network Systems Lab, Department of Computer Sciences,
Purdue University, West Lafayette, IN 47907
WALTER WILLINGER
Information Sciences Research Center, AT&T Labs-Research, Florham Park, NJ 07932
1.1 INTRODUCTION
1.1.1 Background
Since the seminal study of Leland, Taqqu, Willinger, and Wilson [41], which set the
groundwork for considering self-similarity an important notion in the understanding
of network traffic, including the modeling and analysis of network performance, an
explosion of work has ensued investigating the multifaceted nature of this phenomenon.^1
The long-held paradigm in the communication and performance communities
has been that voice traffic and, by extension, data traffic are adequately described by
certain Markovian models (e.g., Poisson), which are amenable to accurate analysis
and efficient control. The first property stems from the well-developed field of
Markovian analysis, which allows tight equilibrium bounds on performance variables
such as the waiting time in various queueing systems to be found. This also
forms a pillar of performance analysis from the queueing theory side [38]. The
Self-Similar Network Traffic and Performance Evaluation, Edited by Kihong Park and Walter Willinger
ISBN 0-471-31974-0 Copyright © 2000 by John Wiley & Sons, Inc.

They describe the phenomenon where a certain property of an object (for example,
a natural image, the convergent subdomain of certain dynamical systems, or a time
series, the mathematical object of our interest) is preserved with respect to scaling
in space and/or time. If an object is self-similar or fractal, its parts, when magnified,
resemble, in a suitable sense, the shape of the whole. For example, the two-dimensional
(2D) Cantor set living on $A = [0, 1] \times [0, 1]$ is obtained by starting with
a solid or black unit square, scaling its size by 1/3, then placing four copies of the
scaled solid square at the four corners of $A$. If the same process of scaling followed
by translation is applied recursively to the resulting objects ad infinitum, the limit set
thus reached defines the 2D Cantor set. This constructive process is illustrated in Fig.
1.1. The limiting object, defined as the infinite intersection of the iterates, has the
property that if any of its corners are "blown up" suitably, then the shape of the
zoomed-in part is similar to the shape of the whole, that is, it is self-similar. Of
course, this is not too surprising since the constructive process, by its recursive
action, endows the limiting object with the scale-invariance property.
Fig. 1.1 Two-dimensional Cantor set.
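The recursive construction of the 2D Cantor set can be sketched in a few lines. The following is only an illustration on a discrete grid, not part of the original text: the function name `cantor_2d` and the 0/1 grid representation are assumptions made here, and a finite iterate only approximates the limit set.

```python
def cantor_2d(n):
    """Return the n-th iterate of the 2D Cantor construction.

    The result is a 3^n x 3^n grid (list of rows) of 0/1 cells,
    where 1 marks a cell that is still part of the solid set.
    """
    grid = [[1]]  # iteration 0: the solid unit square
    for _ in range(n):
        size = len(grid)
        gap = [0] * size
        # Place one scaled copy of the previous iterate in each of the
        # four corners; the middle row band and middle column band are empty.
        top = [row + gap + row for row in grid]
        middle = [[0] * (3 * size) for _ in range(size)]
        bottom = [row + gap + row for row in grid]
        grid = top + middle + bottom
    return grid


if __name__ == "__main__":
    for row in cantor_2d(2):
        print("".join("#" if cell else "." for cell in row))
```

Under this representation, the top-left corner of iterate n, taken at one third of the side length, coincides with iterate n - 1, which is the discrete counterpart of the "blown up" corner resembling the whole.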
The one-dimensional (1D) Cantor set, for example, as obtained by projecting the
2D Cantor set onto the line, can be given an interpretation as a traffic series
$X(t) \in \{0, 1\}$ (call it "Cantor traffic"), where $X(t) = 1$ means that there is a packet
transmission at time $t$. This is depicted in Fig. 1.2 (left). If the constructive process is
terminated at iteration $n \ge 0$, then the contiguous line segments of length $1/3^n$ may
be interpreted as on periods or packet trains of duration $1/3^n$, and the segments
between successive on periods as off periods or absence of traffic activity. Nonuniform
traffic intensities may be imparted by generalizing the constructive framework

corresponding to 1D on/off Cantor traffic.
components, respectively. The probability measure is represented by "height"; we
observe that scale invariance is exactly preserved. In general, the traffic patterns
producible with fixed weights $a_L$, $a_R$ are limited, but one can extend the framework
by allowing possibly different weights associated with every edge in the weighted
binary tree induced by the 1D Cantor set construction. Such constructions arise in a
more refined characterization of network traffic (called multiplicative processes or
cascades) and are discussed in Chapter 20. Further generalizations can be obtained
by defining different affine transformations with variable scale factors and translations
at every level in the "traffic tree." The corresponding traffic pattern is self-similar
if, and only if, the infinite tree can be compactly represented as a finite
directed cyclic graph [8].
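A minimal sketch of the fixed-weight construction may help make the role of $a_L$ and $a_R$ concrete. It rests on assumptions that are not spelled out in this excerpt: the weights are taken to be nonnegative and to sum to one, the "height" is stored as the mass assigned to each of the $3^n$ time slots, and the function name `cantor_traffic` is hypothetical.

```python
def cantor_traffic(n, a_left=0.5, a_right=0.5):
    """Per-slot mass after n steps of the weighted 1D Cantor construction.

    Assumes a_left + a_right == 1 so that the total mass stays 1.
    Returns a list of 3^n values; zero entries are off periods.
    """
    if a_left < 0 or a_right < 0 or abs(a_left + a_right - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to 1")
    series = [1.0]  # step 0: all mass on the single unit interval
    for _ in range(n):
        size = len(series)
        # Left third gets weight a_left, the middle third is removed,
        # and the right third gets weight a_right.
        left = [a_left * x for x in series]
        right = [a_right * x for x in series]
        series = left + [0.0] * size + right
    return series


if __name__ == "__main__":
    print(cantor_traffic(2, 0.25, 0.75))
```

With equal weights the nonzero slots all carry the same mass and the pattern reduces to the on/off Cantor traffic; with unequal weights, rescaling the left third by $1/a_L$ recovers the previous iterate, which is the exact scale invariance noted above.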
Whereas the previous constructions are given interpretations as traffic activity
per unit time, we will find it useful to consider their corresponding cumulative
processes, which are nondecreasing processes whose differences (also called the
increment process) constitute the original process. For example, for the on/off
Cantor traffic construction (cf. Fig. 1.2 (left)), let us assign the interpretation that
time is discrete such that at step $n \ge 0$, it ranges over the values
$t = 0, 1/3^n, 2/3^n, \ldots, 3^n$

we observe that exact self-similarity is preserved even in the cumulative process.
This points toward the fact that self-similarity may be defined with respect to a
cumulative process with its increment process (which is of more relevance for
traffic modeling) "inheriting" some of its properties, including self-similarity.
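The relationship between a traffic series and its cumulative process can be sketched as follows for the on/off Cantor traffic. This is an illustrative sketch only; the helper names `cantor_on_off`, `cumulative`, and `increments` are assumptions introduced here, and time is indexed by slot number rather than by $t \in [0, 1]$.

```python
from itertools import accumulate


def cantor_on_off(n):
    """0/1 on/off Cantor traffic on 3^n slots after n construction steps."""
    pattern = [1]
    for _ in range(n):
        pattern = pattern + [0] * len(pattern) + pattern
    return pattern


def cumulative(series):
    """Cumulative process: entry k is the traffic seen in the first k slots."""
    return [0] + list(accumulate(series))


def increments(cum):
    """Increment process: successive differences of a cumulative process."""
    return [b - a for a, b in zip(cum, cum[1:])]


if __name__ == "__main__":
    x = cantor_on_off(3)
    c = cumulative(x)
    print(c)
    print(increments(c) == x)
```

In this discrete setting the first third of the cumulative process at step n equals the whole cumulative process at step n - 1, so stretching time by 3 and amplitude by 2 maps the part onto the whole.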
An important drawback of our constructions thus far is that they admit only a
strong form of recursive regularity, that of deterministic self-similarity, and need
to be further generalized for traffic modeling purposes where stochastic variability is
an essential component.
1.1.3 Stochastic Self-Similarity and Network Traffic
Stochastic self-similarity admits the infusion of nondeterminism as necessitated by
measured traffic traces but, nonetheless, is a property that can be illustrated visually.
Figure 1.3 (top left) shows a traffic trace, where we plot throughput, in bytes, against
time where time granularity is 100 s. That is, a single data point is the aggregated
traffic volume over a 100 second interval. Figure 1.3 (top right) is the same traffic
series whose first 1000 second interval is "blown up" by a factor of ten. Thus the
truncated time series has a time granularity of 10 s. The remaining two plots zoom in
further on the initial segment by rescaling successively by factors of 10.
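The different panels can be thought of as the same byte counts summed over blocks of different length. A minimal sketch, assuming the trace is available as a list of byte counts at the finest granularity (the function name `aggregate` and the toy numbers are illustrative, not measured data):

```python
def aggregate(series, factor):
    """Sum consecutive blocks of `factor` samples.

    A trailing partial block, if any, is dropped so that every output
    point covers the same amount of time.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(series) - len(series) % factor
    return [sum(series[i:i + factor]) for i in range(0, usable, factor)]


if __name__ == "__main__":
    # Toy byte counts per 10 s interval, aggregated to 100 s intervals.
    fine = [3, 0, 7, 2, 9, 1, 0, 4, 6, 8] * 3
    print(aggregate(fine, 10))
```

Applying `aggregate` with a factor of 10 moves from one panel to the next coarser one; plotting each level with the magnitude normalized gives the kind of visual comparison described above.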
Unlike deterministic fractals, the objects corresponding to Fig. 1.3 do not possess
exact resemblance of their parts with the whole at finer details. Here, we assume that
the measure of "resemblance" is the shape of a graph with the magnitude suitably
normalized. Indeed, for measured traffic traces, it would be too much to expect to
observe exact, deterministic self-similarity given the stochastic nature of many
network events (e.g., source arrival behavior) that collectively influence actual
network traffic. If we adopt the view that traffic series are sample paths of stochastic
processes and relax the measure of resemblance, say, by focusing on certain statistics
of the rescaled time series, then it may be possible to expect exact similarity of the
mathematical objects and approximate similarity of their specific realizations with
respect to these relaxed measures. Second-order statistics are statistical properties

This work has specifically been geared toward network traffic self-similarity [28, 64] and has focused on
exploiting the immense volume, high quality, and diversity of available traffic
measurements; for a detailed discussion of these and related issues, see Willinger
and Paxson [72, 73]. At a formal level, the validity of an inference or estimation
technique is tied to an underlying process that presumably generated the data in the
first place. Put differently, correctness of system identification only holds when the
data or sample paths are known to originate from specific models. Thus, in general, a
sample path of unknown origin cannot be uniquely attributed to a specific model,
and the main (and only) purpose of statistical or scientific inference is to deal with
this intrinsically ill-posed problem by concluding whether or not the given data or
sample paths are consistent with an assumed model structure. Clearly, being
consistent with an assumed model does not rule out the existence of other models
that may conform to the data equally well. In this sense, the aforementioned works
on measurement-based traffic modeling have demonstrated that self-similarity is
consistent with measured network traffic and have resulted in adding yet another
class of models (that is, self-similar processes) to an already long list of models for
network traffic. At a practical level, many of the commonly used inference
techniques for quantifying the degree of self-similarity or long-range dependence
(e.g., Hurst parameter estimation) have been known to exhibit different idiosyncrasies
and robustness properties. Due to their predominantly heuristic nature, these
techniques have been generally easy to use and apply, but the ensuing results have
often been difficult to interpret [64]. The recent introduction of wavelet-based
techniques to the analysis of traffic traces [1, 23] represented a significant step
toward the development of more accurate inference techniques that have been shown
2
The relationship between self-similarity and long-range dependence (they need not be one and the
same) is explained in Section 1.4.1.

1.2 PREVIOUS RESEARCH
^3; and two, it is well-known that VBR video can be approximated by short-range dependent traffic
models, which, in turn, makes it possible to investigate certain aspects of the impact
on performance of long-range correlation structure within the confines of traditional
Markovian analysis [32, 37].
3
The same holds true for the LBL WAN data considered by Paxson and Floyd [56] and the BU WWW data
analyzed by Crovella and Bestavros [13].
The second type of causality, also called structural causality [50], is more
subtle in nature, and its roots can be attributed to an empirical property of distributed
systems: the heavy-tailed distribution of file or object sizes. For the moment, a
random variable obeying a heavy-tailed distribution can be viewed as giving rise to a
very wide range of different values, including (as its trademark) "very large"
values with nonnegligible probability. This intuition is made more precise in Section
1.4.1. Returning to the causality description, in a nutshell, if end hosts exchange files
whose sizes are heavy tailed, then the resulting network traffic at multiplexing points
in the network layer is self-similar [50]. This causal phenomenon was shown to be
robust in the sense of holding for a variety of transport layer protocols such as
TCP (for example, Tahoe, Reno, and Vegas) and flow-controlled UDP, which
make up the bulk of deployed transport protocols, and a range of network
configurations. Park et al. [50] also showed that research in UNIX file systems
carried out during the 1980s gives strong empirical evidence, based on file system
measurements, that file sizes in UNIX file systems are heavy-tailed. This is, perhaps, the
simplest, most distilled, yet high-level physical explanation of network traffic self-similarity.
Corresponding evidence for Web objects, which are of more recent relevance due to
the explosion of WWW and its impact on Internet traffic, can be found in Crovella
and Bestavros [13].
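To make the intuition about "very large" values concrete, one can compare the tail of a heavy-tailed distribution with that of an exponential distribution of the same mean. The sketch below assumes a Pareto distribution as the heavy-tailed example; that choice, the parameter names `alpha` and `k`, and the function names are illustrative assumptions, since the precise definition is deferred to Section 1.4.1.

```python
import math


def pareto_tail(x, alpha, k=1.0):
    """P(X > x) for a Pareto distribution with shape alpha and minimum k."""
    return 1.0 if x <= k else (k / x) ** alpha


def pareto_mean(alpha, k=1.0):
    """Mean of the Pareto distribution; finite only for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("mean is infinite for alpha <= 1")
    return alpha * k / (alpha - 1.0)


def exponential_tail(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)


if __name__ == "__main__":
    alpha = 1.2
    mean = pareto_mean(alpha)
    for x in (10, 100, 1000):
        print(x, pareto_tail(x, alpha), exponential_tail(x, mean))
```

For a file size a thousand times the minimum, the Pareto tail probability in this example is still on the order of 10^-4, whereas the exponential tail with the same mean is vanishingly small.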

In the third category are works that provide mathematical models of long-range
dependent traffic with a view toward facilitating performance analysis in the
queueing theory sense [2, 3, 17, 43, 49, 53, 66]. These works are important in
that they establish basic performance boundaries by investigating queueing behavior
with long-range dependent input, which exhibits performance characteristics
fundamentally different from corresponding systems with Markovian input. In particular,
the queue length distribution in infinite buffer systems has a slower-than-exponentially
(or subexponentially) decreasing tail, in stark contrast with short-range
dependent input for which the decay is exponential. In fact, depending on the
queueing model under consideration, long-range dependent input can give rise to
Weibullian [49] or polynomial [66] tail behavior of the underlying queue length
distributions. The analysis of such non-Markovian queueing systems is highly
nontrivial and provides fundamental insight into the performance impact question.
Of course, these works, in addition to providing valuable insight into network
performance issues, advance the state of the art in performance analysis and are of
independent interest. The queue length distribution result implies that buffering, as
a resource provisioning strategy, is rendered ineffective when input traffic is
self-similar, in the sense of incurring a disproportionate penalty in queueing delay
vis-à-vis the gain in reduced packet loss rate. This has led to proposals advocating a small
buffer capacity/large bandwidth resource provisioning strategy due to its simple,
yet curtailing influence on queueing: if buffer capacity is small, then the ability to
queue or remember is accordingly diminished. Moreover, the smaller the buffer
capacity, the more relevant short-range correlations become in determining buffer
occupancy. Indeed, with respect to first-order performance measures such as packet
loss rate, they may become the dominant factor. The effect of small buffer sizes and
finite time horizons in terms of their potential role in delimiting the scope of
influence of long-range dependence on network performance has been studied
[29, 58].
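The trade-off between queueing delay and packet loss as buffer capacity varies can be explored with a simple discrete-time queue. The following is a minimal sketch under stated assumptions, not the analytical models of the cited works: traffic is treated as a fluid amount per time slot, the server drains a fixed amount per slot, and the name `simulate_queue` and its parameters are hypothetical.

```python
def simulate_queue(arrivals, service_rate, buffer_capacity):
    """Discrete-time finite-buffer queue.

    arrivals        : amount of traffic arriving in each time slot
    service_rate    : amount served per slot
    buffer_capacity : maximum backlog that can be stored

    Returns (mean queue length, loss rate), where loss rate is the
    fraction of arriving traffic dropped due to buffer overflow.
    """
    queue = 0.0
    lost = 0.0
    occupancy = []
    for a in arrivals:
        backlog = max(0.0, queue + a - service_rate)   # serve, then store the rest
        overflow = max(0.0, backlog - buffer_capacity)  # what does not fit is lost
        lost += overflow
        queue = backlog - overflow
        occupancy.append(queue)
    total = sum(arrivals)
    loss_rate = lost / total if total > 0 else 0.0
    mean_queue = sum(occupancy) / len(occupancy) if occupancy else 0.0
    return mean_queue, loss_rate


if __name__ == "__main__":
    burst = [5, 5, 5, 0, 0, 0]
    for capacity in (4, 10):
        print(capacity, simulate_queue(burst, service_rate=2, buffer_capacity=capacity))
```

In the toy burst above, enlarging the buffer from 4 to 10 removes the loss but more than doubles the mean queue length. Feeding the same function with long-range dependent input (for instance, an aggregate of heavy-tailed on/off sources) is one way to examine how that penalty grows with buffer size.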

feedback traffic control. Due to their feedback-free nature, the works on queueing
analysis with self-similar input have direct bearing on the resource dimensioning
problem. The question of quantitatively estimating the marginal utility of a unit of
additional resource such as bandwidth or buffer capacity is answered, in part, with
the help of these techniques. Of importance are also works on statistical multiplexing
using the notion of effective bandwidth, which point toward how efficiently
resources can be utilized when shared across multiple flows [27]. A principal
lesson learned from the resource provisioning side is the ineffectiveness of allocating
buffer space vis-à-vis bandwidth for self-similar traffic, and the consequent role of
short-range correlations in affecting first-order performance characteristics when
buffer capacity is indeed provisioned to be "small" [29, 58].
On the feedback control side is the work on multiple time scale congestion
control [67, 68], which tries to exploit correlation structure that exists across
multiple time scales in self-similar traffic for congestion control purposes. In spite
of the negative performance impact of self-similarity, on the positive side,
long-range dependence admits the possibility of utilizing correlation at large time scales,
transforming the latter to harness predictability structure, which, in turn, can be
used to guide congestion control actions at smaller time scales to yield significant
performance gains. The problem of designing control mechanisms that allow
correlation structure at large time scales to be effectively engaged is a nontrivial
technical challenge for two principal reasons: one, the correlation structure in
question exists at time scales typically an order of magnitude or more above that
of the feedback loop; and two, the information extracted is necessarily imprecise due
to its probabilistic nature.^4 Tuan and Park [67, 68] show that large time scale

whether it is worthwhile to migrate a process given the fixed, high overhead cost of
process migration [31]. The ensuing opportunities have numerous applications in
traffic control, one recent example being the discrimination of long-lived flows from
short-lived flows such that routing table updates can be biased toward long-lived
flows, which, in turn, can enhance system stability by desensitizing against
"transient" effects of short-lived flows [61]. In general, the connection duration
information can also come from directly available information in the application layer (for
example, a Web server, when servicing an HTTP request, can discern the size of the
object in question), and if this information is made available to lower layers,
decisions such as whether to engage in open-loop (for short-lived flows) or
closed-loop control (for long-lived flows) can be made to enhance traffic control [67].
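One way to see why elapsed duration can be informative when durations are heavy-tailed is to compare conditional survival probabilities. The sketch below assumes Pareto-distributed flow durations purely for illustration; the function names and parameters are hypothetical and are not taken from the cited works.

```python
import math


def pareto_conditional_survival(elapsed, extra, alpha, k=1.0):
    """P(X > elapsed + extra | X > elapsed) for Pareto(alpha, k), elapsed >= k."""
    if elapsed < k:
        raise ValueError("elapsed must be at least the minimum duration k")
    return (elapsed / (elapsed + extra)) ** alpha


def exponential_conditional_survival(extra, mean):
    """Same quantity for an exponential duration; independent of elapsed time."""
    return math.exp(-extra / mean)


if __name__ == "__main__":
    alpha = 1.2
    for elapsed in (1, 10, 100):
        p = pareto_conditional_survival(elapsed, extra=10, alpha=alpha)
        print(f"elapsed={elapsed:4d}  P(lasts 10 more) = {p:.3f}")
    print("exponential, any elapsed:", exponential_conditional_survival(10, mean=6.0))
```

Under the Pareto assumption, the longer a flow has already lasted, the more likely it is to last a further fixed amount of time, whereas the exponential case is memoryless. This is the kind of structure a controller could use when deciding between open-loop and closed-loop treatment.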
4
We remark that understanding the correlation structure of network traffic at time scales below the
feedback loop may be of relevance but remains, at this time, largely unexplored [22].
5
A form of Amdahl's Law states that to improve a system's performance, its functioning with respect to its
most frequently encountered states must be improved. Conversely, performance gain is delimited by the
latter.
1.3 ISSUES AND REMARKS
1.3.1 Traffic Measurement and Estimation
The area of traffic measurement, since the collection and analysis of the original
Bellcore data [41], has been tremendously active, yielding a wealth of traffic
measurements across a wide spectrum of different contexts supporting the view
that network traffic exhibits self-similar scaling properties over a wide range of time
scales. This finding is noteworthy given the fact that networks, over the past decades,
have undergone significant changes in their constituent traffic flows, user base,
transmission technologies, and scale with respect to system size. The observed
robustness property or insensitivity to changing networking conditions justified

models are parameterized systems that are sufficiently powerful to give rise to
sample paths in the form of measured traffic time series. Mathematical system
identification, under these circumstances, therefore, is an intrinsically ill-posed
problem. Viewed in this light, the fact that different works can assign disparate
modeling interpretations to the same measurement data, with differing conclusions,
is not surprising [26, 33]. Put differently, it is well known that with a sufficiently
parameterized model class, it is always possible to find a model that fits a given data
set. Thus, the real challenge lies less in mathematical model fitting than in physical
modeling, an approach that in addition to describing the given data provides insight
into the causal and dynamic nature of the processes that generated the data in the
first place. On the positive side, the discussions about short-range versus long-range
dependence have brought out into the open concerns about nonstationary effects
[16] (3 p.m. traffic cannot be expected to stem from the same source behavior
conditions as 3 a.m. traffic) that can influence certain types of inference and
estimation procedures for long-range dependent processes. These concerns have
spurred the development and adoption of estimation techniques based on wavelets,
which are sensitive to various types of nonstationary variations in the data [1]. What
is not in dispute are computed sample statistics (for example, autocorrelation
functions of measured traffic series), which exhibit nontrivial correlations at time
lags on the order of seconds and above. Whether to call these time scales "long
range" or "short range" is a matter of subjective choice and/or mathematical
convenience and abstraction. What impact these correlations exert on queueing
behavior is a function of how large the buffer capacity, the level of traffic intensity,
6
Not surprisingly, extremities in control actions and resource configurations do affect the property of
induced network traffic, in some instances, diminishing self-similar burstiness altogether [50]. Moreover,
refined structure in the form of multiplicative scaling over sub-RTT time scales has only recently been
discovered [23].

periods in packet trains has been shown by Park et al. [50], and a more modern
interpretation for the World Wide Web has been demonstrated by Crovella and
Bestavros [13]. One weakness of the on/off model is its assumption of independence
of on/off sources. This has been empirically addressed [50] by studying the
influence of dependence arising from multiple sources coupled at bottleneck routers
sharing resources when the flows are governed by feedback congestion control
protocols such as TCP in the transport layer. It was found that coupling did not
significantly impact long-range dependence. A more recent study [22] shows that
dependence due to feedback and interflow interaction may be the cause for
multiplicative scaling phenomena observed in the short-range correlation structure, a
refined physical characterization that may complement the previous findings, which
focused on coarser structure at larger time scales. We remark that the on/off model is
able to induce both fractional Gaussian noise (upon aggregation over multiple flows
and normalization) and a form of self-similarity and long-range dependence called
asymptotic second-order self-similarity (a single process with heavy-tailed on/off
periods), which constitute two of the most commonly used self-similar traffic
models in performance analysis.
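A small simulation can illustrate the on/off model: independent sources alternate between on and off periods with heavy-tailed durations, and their activity is summed per time slot. This is a sketch under assumptions not fixed by the text: Pareto durations with a common shape `alpha`, unit rate while on, discrete time slots, and hypothetical function names. The relation $H = (3 - \alpha)/2$ used in `hurst_from_alpha` is the standard result for such sources with $1 < \alpha < 2$; it is not stated in this excerpt.

```python
import math
import random


def pareto_duration(rng, alpha, k=1.0):
    """Pareto(alpha, k) duration via inverse transform, rounded up to whole slots."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return max(1, math.ceil(k / u ** (1.0 / alpha)))


def on_off_source(rng, n_slots, alpha, k=1.0):
    """0/1 activity of one source with heavy-tailed on and off periods."""
    series = [0] * n_slots
    t = 0
    on = rng.random() < 0.5  # random initial state
    while t < n_slots:
        end = min(n_slots, t + pareto_duration(rng, alpha, k))
        if on:
            for i in range(t, end):
                series[i] = 1
        t = end
        on = not on
    return series


def aggregate_sources(n_sources, n_slots, alpha, seed=0):
    """Per-slot number of active sources among n_sources independent ones."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for i, x in enumerate(on_off_source(rng, n_slots, alpha)):
            total[i] += x
    return total


def hurst_from_alpha(alpha):
    """Hurst parameter implied by tail index alpha (assumed 1 < alpha < 2)."""
    return (3.0 - alpha) / 2.0


if __name__ == "__main__":
    trace = aggregate_sources(n_sources=50, n_slots=2000, alpha=1.4, seed=1)
    print(trace[:20], hurst_from_alpha(1.4))
```

The independence of sources is built into this sketch, which is exactly the modeling assumption the text identifies as a weakness; coupling through a shared bottleneck and feedback control would require a more detailed simulation.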
Finally, physical models, because of their grounding in empirical facts, influence
the general argument advanced in Section 1.3.1 on the ill-posed nature of the
identification problem. They can be viewed as tilting the scale in favor of long-range
dependent traffic models. That is, since file sizes in various network related contexts
have been shown to be heavy-tailed and the physical modeling works show that
resulting traffic is long-range dependent, other things being equal, empirical
evidence afforded by physical models biases toward a more consistent and
parsimonious interpretation of network traffic as being long-range dependent as
opposed to the mathematically equally viable short-range dependence hypothesis.
Thus physical models, by virtue of their causal attribution, can also influence the
choice of mathematical modeling and performance analysis.
1.3.3 Performance Analysis and Traffic Control
The works on queueing analysis with self-similar input have yielded fundamental

derived, or the queue is assumed to be finite but its overflow probability is computed
as the buffer capacity is taken to infinity. There is, as yet, a chasm between these
asymptotic results and their finitistic brethren that have eluded tractability. It is
unclear whether the asymptotic formulas, beyond their qualitative relevance, are
also practically useful as resource provisioning and traffic engineering tools. Further
work is needed in this direction to narrow the gap. Another significant drawback of
the performance analysis results (also related to the asymptotic nature of queueing
Fig. 1.4 Mean queue length as a function of buffer capacity for input traffic with varying
long-range dependence ($\alpha = 1.05$, 1.35, 1.65, 1.95).