Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 6 2009-10-1
6 Model-Based Design for Embedded Systems
In addition, in many cases, the same simulation environment can be used
for both function and performance verifications. However, most simulation-
based performance estimation methods suffer from insufficient corner-case
coverage. This means that they are typically not able to provide worst-case
performance guarantees. Moreover, accurate simulations are often computa-
tionally expensive.
In other works [5,6], hybrid performance estimation methods have been
presented that combine simulation and analytic techniques. While these
approaches considerably shorten the simulation run-times, they still cannot
guarantee full coverage of corner cases.
To determine guaranteed performance limits, analytic methods must be
adopted. These methods provide hard performance bounds; however, they
are typically not able to model complex interactions and state-dependent
behaviors, which can result in pessimistic performance bounds.
Several models and methods for analytic performance verifications ofdis-
tributed platforms have been presented so far. These approaches are based
on essentially different abstraction concepts. The first idea was to extend
well-known results of the classical scheduling theory to distributed sys-
tems. This implies the consideration of communication delays, which cannot
be neglected in a distributed system. Such a combined analysis of proces-
sor and bus scheduling is often referred to as holistic scheduling analysis.
Rather than a specific performance analysis method, holistic scheduling is
a collection of techniques for the analysis of distributed platforms, each of
which is tailored toward a particular combination of an event stream model,
a resource-sharing policy, and communication arbitration (see [10,11,15] as
examples). Several holistic analysis techniques are aggregated and imple-
mented in the modeling and analysis suite for real-time applications (MAST)
[3].
∗
HR
, has a higher frame reso-
lution and is displayed in full screen whereas the lower stream, V
LR
,hasa
lower frame resolution and is displayed in a smaller window at the bottom
left edge of the screen.
The MPEG-2 video decoding consists of the following tasks: variable
length decoding (VLD), inverse quantization (IQ), inverse discrete cosine
transformation (IDCT), and motion compensation (MC). In the considered
set-top box, the decoding application is partitioned onto three processors:
CPU
1
,CPU
2
,andCPU
3
. The tasks VLD and IQ are mapped onto CPU
1
for the first video stream (process P
1
)andontoCPU
2
for the second video
stream (process P
3
). The tasks IDCT and MC are mapped onto CPU
3
for both
video streams (processes P
and P
4
are
potentially bursty and need to be smoothed out in order to make sure that
no packets are lost by the video interface, which cannot handle more than a
certain packet rate per stream.
We assume that the arrival patterns of the two streams, V
HR
and V
LR
,
from the network as well as the execution demands of the various tasks in
the system are known. The performance characteristics that we want to ana-
lyze are the worst-case end-to-end delays for the two video streams from
the input to the output of the set-top box. Moreover, we want to analyze the
memory demand of the system in terms of worst-case packet buffer occupa-
tion for the various tasks.
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 8 2009-10-1
8 Model-Based Design for Embedded Systems
Network
Network
interface
V
HR
V
LR
CPU1 CPU3
CPU2
P1
P2
In Section 1.3, we at first will formally describe the above system in the
concrete time domain. In principle, this formalization could directly be used
in order to perform a simulation; in our case, it will be the basis for the MPA
described in Section 1.4.
1.3 Representation in the Time Domain
As can be seen from the example described in Section 1.2, the basic model
of computation consists of component networks that can be described as a
set of components that are communicating via infinite FIFO (first-in first-out)
buffers denoted as channels. Components receive streams of tokens via their
input channels, operate on the arriving tokens, and produce output tokens
that aresent tothe outputchannels. Wealso assumethat thecomponents need
resources in order to actually perform operations. Figure 1.3 represents the
simple component network corresponding to the video decoding example.
Examples of components are tasks that are executed on computing
resources or data communication via buses or interconnection networks.
Therefore, the token streams that are present at the inputs or outputs of a
component could be of different types; for example, they could represent
simple events that trigger tasks in the corresponding computation compo-
nent or they could represent data packets that need to be communicated.
1.3.1 Arrival and Service Functions
In order to describe this model in greater detail, at first we will describe
streams in the concrete time domain. To this end, we define the concept of
arrival functions: R(s, t) ∈ R
≥0
denotes the amount of tokens that arrive in
the time interval [s,t) for all time instances, s, t ∈ R, s < t,andR(t, t) = 0.
Depending on the interpretation of a token stream, an arrival function may
be integer valued, i.e., R(s,t) ∈ Z
≥0
. In other words, R(s,t) “counts” the
3
S
1
S
2
σ
1
σ
2
FIGURE 1.3
Component networks corresponding to the video decoding example in Sec-
tion 1.2: (a) without resource interaction, and (b) with resource interaction.
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 10 2009-10-1
10 Model-Based Design for Embedded Systems
number of tokens in a time interval. Note that we are taking a very liberal
definition of a token here: It just denotes the amount of data or events that
arrive in a channel. Therefore, a token may represent bytes, events, or even
demanded processing cycles.
In the component network semantics, tokens are stored in channels that
connect inputs and outputs of components. Let us suppose that we had
determined the arrival function R
(s, t) corresponding to a component out-
put (that writes tokens into a channel) and the arrival function R(s, t) corre-
sponding to a component input (that removes tokens from the channel); then
we can easily determine the buffer fill level, B(t), of this channel at some time
t: B(t) = B(s) +R
(s, t) −R(s, t).
As has been described above, one of the major elements of the model is
cycles (in case of computing) or into number of bytes (in case of com-
munication). A much more detailed view on workloads and their mod-
eling can be found in [8], for example, modeling time-varying resource
usage or upper and lower bounds (worst-case and best-case resource
demands).
• AND and OR: As a last simple example, let us suppose a component
that only produces output tokens if there are tokens on all inputs
(AND). Then the relation between the arrival functions at the inputs
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 11 2009-10-1
Performance Prediction of Distributed Platforms 11
R
1
(s, t) and R
2
(s, t), and output R
(s, t) is R
(s, t) = min{B
1
(s) +R
1
(s, t),
B
2
(s) + R
2
(s, t)}, where B
1
(s) and B
(s, t)
The above definition can be related to the intuitive notion of a greedy
component as follows: The output between some time λ and t cannot
be larger than C(λ, t), and, therefore, R
(s, t) ≤ R
(s, λ) +C(λ,t),andalso
R
(s, t) ≤ C(s, t). As the component cannot output more than what was
available at the input, we also have R
(s, λ) ≤ R(s, λ) + B(s), and, therefore,
R
(s, t) ≤ min{R(s, λ) + C(λ, t) + B(s), C(s,t)}. Let us suppose that there is
some last time λ
∗
before t when the buffer was empty. At λ
∗
, we clearly
have R
(s, λ
∗
) = R(s, λ
∗
) + B(s). In the interval from λ
∗
12 Model-Based Design for Embedded Systems
Scaler Tokenizer AND OR GPC Shaper
R
R
2
R
2
R
1
R
1
RRRR RRR R R
ω
+
GPC
σ
C
C
FIGURE 1.4
Examples of component types as described in Section 1.3.2.
1.3.3 Composition
The components shown in Figure 1.4 can now be combined to form a
component network that not only describes the flow of tokens but also the
interaction with the available resources. Figure 1.3b shows the component
network that corresponds to the video decoding example. Here, the compo-
nents, as introduced in Section 1.3.2, are used. Note that necessary scaler and
tokenizer components are not shown for simplicity, but they are needed to
relate the different units of tokens and resources, and to form tokens out of
partially computed data.
For example, the input events described by the arrival function, R
concrete stream instances only. However, event and resource streams can
exhibit a large variability in their timing behavior because of nondetermin-
ism and interference. The designer of a real-time system has to provide per-
formance guarantees that cover all possible behaviors of a distributed system
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 13 2009-10-1
Performance Prediction of Distributed Platforms 13
and its environment. In this section, we introduce the abstraction of the MPA
with the RTC [1] (MPA-RTC) that provides the means to capture all possible
interactions of event and resource streams in a system, and permits to derive
safe bounds on best-case and worst-case behaviors.
This approach was first presented in [13] and has its roots in network cal-
culus [7]. It permits to analyze the flow of event streams through a network
of heterogeneous computation and communication resources in an embed-
ded platform, and to derive hard bounds on its performance.
1.4.1 Variability Characterization
In the MPA, the timing characterization of event streams and of the resource
availability is based on the abstractions of arrival curves and service curves,
respectively. Both the models belong to the general class of variability char-
acterization curves (VCCs), which allow to precisely quantify the best-case
and worst-case variabilities of wide-sense-increasing functions [8]. For sim-
plicity, in the rest of the chapter we will use the term VCC if we want to refer
to either arrival or service curves.
In the MPA framework, an event stream is described by a tuple of arrival
curves, α(Δ) =[α
l
(Δ), α
u
(Δ)], where α
l
: R
Similarly, the availability of a resource is described by a tuple of service
curves, β(Δ) =[β
l
(Δ), β
u
(Δ)], where β
l
: R
≥0
→ R
≥0
denotes the lower
service curve and β
u
: R
≥0
→ R
≥0
the upper service curve. Again, we say
that a tuple of service curves, β(Δ), conforms to an event stream described
by the service function, C(s, t), denoted as β |= C iff for all t > s we have
β
l
(t −s) ≤ C(s, t) ≤ β
u
(t −s). Figure 1.5b shows an example tuple of service
curves.
Note that, as defined above, the arrival curves are expressed in terms
of events while the service curves are expressed in terms of workload/
service units. However, the component model described in Section 1.4.2
GPC
αα
β
(a)
GPC
R
C
C
β
R
(b)
FIGURE 1.6
(a) Abstract and (b) concrete GPCs.
themselves. Basically, these curves define the minimum and maximum
workloads imposed on a resource by a given number of consecutive events,
i.e., they capture the variability in execution demands. More details about
workload transformations can be found in [8]. In the simplest case of a con-
stant workload w for all events, an event-based curve is transformed into a
resource-based curve by simply scaling it by the factor w. This can be done
by an appropriate scaler component, as described in Section 1.3.
1.4.2 Component Model
Distributed embedded systems typically consist of computation and com-
munication elements that process incoming event streams and are mapped
on several different hardware resources. We denote such event-processing
units as components. For instance, in the system depicted in Figure 1.2, we
can identify six components: the four tasks, P
1
, P
2
, P
α
, f
β
] with α
= f
α
(α, β) and β
= f
β
(α, β).
1.4.3 Component Examples
In the following, we describe the abstract components of the MPA frame-
work that correspond to the concrete components introduced in Section 1.3:
scaler, tokenizer, OR, AND, GPC, and shaper.
Using the above relation between concrete and abstract components, we
can easily determine the transfer functions of the simple components, tok-
enizer, scaler, and OR, which are depicted in Figure 1.4.
• Tokenizer: The tokenizer outputs only integer tokens and is character-
ized by R
(s, t) =R(s, t). Using the definition of arrival curves, we
simply obtain as the abstract transfer function α
u
(Δ) =α
u
(Δ) and
α
l
2
(Δ) and α
l
(Δ) = α
l
1
(Δ) + α
l
2
(Δ).
The derivation of the AND component is more complex and its corre-
sponding transfer functions can be found in [4,17].
As described in Section 1.3, a GPC models a task that is triggered by
the events of the incoming event stream, which queue up in a FIFO buffer.
The task processes the events in a greedy fashion while being restricted
by the availability of resources. Such a behavior can be modeled with the
following internal relations that are proven in [17]:
∗
∗
The deconvolutions in min-plus and max-plus algebra are defined as (f g)(Δ) =
sup
λ≥0
{f(Δ + λ) − g(λ)} and (f g)(Δ) = inf
λ≥0
{f(Δ + λ) − g(λ)}, respectively. The convo-
lution in min-plus algebra is defined as (f ⊗g)(Δ) = inf
0≤λ≤Δ
{f(Δ −λ) +g(λ)}.
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 16 2009-10-1
16 Model-Based Design for Embedded Systems
(λ)},0}
β
l
(Δ) = sup
0≤λ≤Δ
{β
l
(λ) − α
u
(λ)}
In the example system of Figure 1.2, the processing semantics of the tasks
P
1
, P
2
, P
3
,andP
4
can be modeled with abstract GPCs.
Finally, let us consider a component that is used for event stream shaping.
A greedy shaper component (GSC) with a shaping curve σ delays events of
an input event stream such that the output event stream has σ as an upper
arrival curve. Additionally, a greedy shaper guarantees that no events are
delayed longer than necessary. Typically, greedy shapers are used to reshape
bursty event streams and to reduce global buffer requirements. If the abstract
input event stream of a GSC with the shaping curve, σ, is represented by the
tuple of arrival curves, [α
l
, α
the mapping of tasks to computation or communication resources and the
scheduling policies adopted by these resources.
To obtain a performance model of a system, we first have to model the
event streams that trigger the system, the computation and communication
resources that are available, and the processing components. Then, we have
to interconnect the arrival and service inputs and outputs of all these ele-
ments so that the architecture of the system is correctly represented.
Figure 1.7 depicts the MPA performance model for the example system
described in Figure 1.2. Note that the outgoing abstract service stream of
GPC
2
is used as the ingoing abstract service stream for GPC
4
, i.e., GPC
4
gets only the resources that are left by GPC
2
. This represents the fact that
the two tasks share the same processor and are scheduled according to a
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 17 2009-10-1
Performance Prediction of Distributed Platforms 17
α
HR
α
LR
β
1
β
3
β
abstract components with appropriate transfer functions have been intro-
duced. Figure 1.8 shows some examples of how to model different schedul-
ing policies within the MPA framework.
1.4.5 Performance Analysis
The performance model provides the basis for the performance analysis
of a system. Several performance characteristics such as worst-case end-to-
end delays of events or buffer requirements can be determined analytically
within the MPA framework.
The performance of each abstract component can be determined as a
function of the ingoing arrival and service curves by the formulas of the RTC.
For instance, the maximum delay, d
max
, experienced by an event of an event
stream with arrival curves, [α
l
, α
u
], that is processed by a GPC on a resource
with service curves, [β
l
, β
u
], is bounded by
d
max
≤ sup
λ≥0
inf{τ ≥ 0: α
u
l
(λ)}
def
= Buf(α
u
, β
l
)
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 18 2009-10-1
18 Model-Based Design for Embedded Systems
α
A
α
A
GPC
GPC
α
B
β
β
(a)
α
A
α
B
β
β
(b)
EDF
α
slot
1
β
slot
2
α
A
α
B
β
β
s
2
β
s
1
β
s
1
(d)
GPC
TDMA
Share
Sum
GPC
GPC
GPC
FIGURE 1.8
Modeling scheduling policies in the MPA framework: (a) preemptive fixed
priority, (b) EDF, (c) TDMA, and (d) generalized processor sharing.
l
2
⊗ ⊗ β
l
n
)
For an abstract GSC, the maximum delay and the maximum backlog are
bounded by
d
max
= Del(α
u
, σ) b
max
= Buf(α
u
, σ)
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 19 2009-10-1
Performance Prediction of Distributed Platforms 19
d
max
b
max
β
l
α
u
Δ
FIGURE 1.9
Graphical interpretation of d
3
⊗σ
2
)
1.4.6 Compact Representation of VCCs
The performance analysis method presented above relies on computations
on arrival and service curves. While the RTC provides compact mathemati-
cal representations for the different operations on curves, their computation
in practice is typically more involved. The main issue is that the VCCs are
defined for the infinite range of positive real numbers. However, any com-
putation on these curves requires a finite representation.
To overcome this problem, we introduce a compact representation for
special classes of VCCs. In particular, we consider piecewise linear VCCs
that are finite, periodic, or mixed.
• Finite piecewise linear VCCs consist of a finite set of linear segments.
• Periodic piecewise linear VCCs consist of a finite set of linear segments
that are repeated periodically with a constant offset between consecu-
tive repetitions.
• Mixed piecewise linear VCCs consist of a finite set of linear segments
that are followed by a second finite set of linear segments that are
repeated periodically, again with a constant offset between consecu-
tive repetitions.
Figure 1.10a through c shows examples of these three classes of curves.
Many practically relevant arrival and service curves are piecewise linear.
For example, if a stream consists of a discrete token, the corresponding
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 20 2009-10-1
20 Model-Based Design for Embedded Systems
8
6
4
also atomic. However, these units are often too fine-grained for a practical
analysis andhence it ispreferable to usea continuousmodel. In mostpractical
applications, the fluid resource availability is piecewise constant over time,
that is, practically relevant service curves are also piecewise linear.
Here, we want to note that there are also piecewise linear VCCs that are
not covered by the three classes of curves that we have defined above. In
particular, we have excluded irregular VCCs, that is, VCCs with an infinite
number of linear segments that do not eventually show periodicity.
However, most practically relevant timing specifications for event streams
and availability specifications for resources can be captured by either finite,
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 21 2009-10-1
Performance Prediction of Distributed Platforms 21
periodic, or mixed piecewise linear VCCs. In addition, note that VCCs only
describe bounds on token or resource streams, and, therefore, one can always
safely approximate an irregular VCC to a mixed piecewise VCC.
In the following, we describe how these three classes of curves can be rep-
resented by means of a compact data structure. First, we note that a single lin-
ear segment of a curve can be represented by a triple x, y,swith x ∈ R
≥0
and
y, s ∈ R that specifies a straight line in the Cartesian coordinate system, which
starts at the point (x, y) and has a slope s. Further, a piecewise linear VCC can
be represented as a (finite or infinite) sequence
x
1
, y
1
, s
1
x
, p
y
, x
p0
, y
p0
where
Σ
A
is a sequence of linear segments describing a possibly existing irregular
initial part of ρ
Σ
P
is a sequence of linear segments describing a possibly existing regularly
repeated part of ρ
If Σ
P
is not an empty sequence, then the regular part of ρ is defined by
the period p
x
and the vertical offset p
y
between two consecutive repetitions
of Σ
P
, and the first occurrence of the regular sequence Σ
P
starts at (x
p0
> 0.
As an example, consider the regular mixed piecewise linear VCC
depicted in Figure 1.10c. Its compact representation according to the defi-
nition above is given by the tuple
ν
C
=
0, 1,0, 0.2,2, 0, 0.4,3, 0,0.6, 4,0
,
0, 0,0
,2,1,2,5
The described compact representation of VCCs is used as a basis for prac-
tical computations in the RTC framework. All the curve operators adopted
in the RTC (minimum, maximum, convolutions, deconvolutions, etc.) are
closed on the set of mixed piecewise linear VCCs. This means that the result
of the operators, when applied to finite, periodic, or mixed piecewise linear
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 22 2009-10-1
22 Model-Based Design for Embedded Systems
VCCs, is again a mixed piecewise linear VCC. Further details about the com-
pact representation of VCCs and, in particular, on the computation of the
operators can be found in [17].
1.5 RTC Toolbox
The framework for the MPA with the RTC that we have described in this
chapter has been implemented in the RTC Toolbox for MATLAB
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 23 2009-10-1
Performance Prediction of Distributed Platforms 23
The Java kernel is accessed from MATLAB via the MATLAB Java Inter-
face. However, this access is completely hidden from the user who only
uses the MATLAB functions provided by the RTC libraries. The MATLAB
libraries of the RTC Toolbox provide functions to create VCCs, plot VCCs,
and apply operators of the RTC on VCCs. From the point of view of the user,
the VCCs are MATLAB data types, even if internally they are represented
as Java objects. Similarly, the MATLAB functions for the RTC operators are
wrapper functions for the corresponding methods that are implemented in
the Java kernel.
On top of the VCC and the RTC libraries, there is the MPA library. It
provides a set of functions that facilitate the use of the RTC Toolbox for the
MPA. In particular, it contains functions to create commonly used arrival
and service curves, as well as functions to conveniently compute the outputs
of the various abstract components of the MPA framework.
1.6 Extensions
In the previous sections, we have introduced the basics of the MPA approach
based on the RTC. Recently, several extensions have been developed to refine
the analysis method.
In [4], the existing methods for analyzing heterogeneous multiprocessor
systems are extended to nonpreemptive scheduling policies. In this work,
more complex task-activation schemes are investigated as well. In particular,
components with multiple inputs and AND- or OR-activation semantics are
introduced.
The MPA approach also supports the modeling and analysis of systems
with dynamic scheduling policies. In [16], a component for the modeling of
the EDF scheduling is presented. This work also extends the ability of the
MPA framework to model and analyze hierarchical scheduling policies by
introducing appropriate server components. The TDMA policies have been
1.7 Concluding Remarks
In this chapter, we have introduced the reader to the system-level perfor-
mance prediction of distributed embedded platforms in the early design
stages. We have defined the problem and given a brief overview of
approaches to performance analysis.
Starting from a simple application scenario, we have presented a formal
system description method in the time domain. We have described its use-
fulness for the simulation of concrete system executions, but at the same time
we have pointed out that the method is inappropriate for worst-case analy-
sis, as in general it cannot guarantee the coverage of corner cases.
Driven by the need to provide hard performance bounds for distributed
embedded platforms, we have generalized the formalism to an abstraction
in the time interval domain based on the VCCs and the RTC. We have pre-
sented the essential models underlying the resulting framework for the MPA
and we have demonstrated its application. Finally, we have described a com-
pact representation of the VCCs that enables an efficient computation of RTC
curve operations in practice, and we have presented the RTC Toolbox for
MATLAB, the implementation of the MPA analysis framework.
Acknowledgments
The authors would like to thank Ernesto Wandeler for contributing to some
part of this chapter and Nikolay Stoimenov for helpful comments on an ear-
lier version.
Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 25 2009-10-1
Performance Prediction of Distributed Platforms 25
References
1. S. Chakraborty, S. Künzli, and L. Thiele. A general framework
for analysing system properties in platform-based embedded system
designs. In Design Automation and Test in Europe (DATE), pp. 190–195,
Munich, Germany, March 2003. IEEE Press.
2. L. de Alfaro and T. A. Henzinger. Interface theories for component-based
26 Model-Based Design for Embedded Systems
Proceedings of the Tenth International Symposium on Hardware/Software
Codesign, pp. 187–192, New York, 2002. ACM.
12. K. Richter, M. Jersak, and R. Ernst. A formal approach to mpsoc perfor-
mance verification. IEEE Computer, 36(4):60–67, 2003.
13. L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for
scheduling hard real-time systems. In Proceedings Symposium on Circuits
and Systems, volume 4, pp. 101–104, Geneva, Switzerland, 2000.
14. L. Thiele, E. Wandeler, and N. Stoimenov. Real-time interfaces for com-
posing real-time systems. In International Conference on Embedded Software
EMSOFT 06, pp. 34–43, Seoul, Korea, 2006.
15. K. Tindell and J. Clark. Holistic schedulability analysis for distributed
hard real-time systems. Microprocess. Microprogram., 40(2–3):117–134,
1994.
16. E. Wandeler and L. Thiele. Interface-based design of real-time systems
with hierarchical scheduling. In 12th IEEE Real-Time and Embedded Tech-
nology and Applications Symposium (RTAS), pp. 243–252, San Jose, CA,
April 2006.
17. E. Wandeler. Modular performance analysis and interface-based design
for embedded realtime systems. PhD thesis, ETH Zürich, 2006.
18. E. Wandeler, A. Maxiaguine, and L. Thiele. Quantitative characterization
of event streams in analysis of hard real-time applications. Real-Time Sys-
tems, 29(2):205–225, March 2005.
19. E. Wandeler, A. Maxiaguine, and L. Thiele. Performance analysis of
greedy shapers in real-time systems. In Design, Automation and Test in
Europe (DATE), pp. 444–449, Munich, Germany, March 2006.
20. E. Wandeler and L. Thiele. Optimal TDMA time slot and cycle length
allocation. In Asia and South Pacific Desing Automation Conference (ASP-
DAC), pp. 479–484, Yokohama, Japan, January 2006.
21. E. Wandeler and L. Thiele. Real-Time Calculus (RTC) Toolbox.
2.4.5.3 Cache Model 44
2.4.5.4 Cache Analysis Blocks 44
2.4.5.5 Cycle Calculation Code 45
2.4.6 Consideration of Task Switches 46
2.4.7 Preemption of Software Tasks 46
2.5 ExperimentalResults 47
2.6 Outlook 50
2.7 Conclusions 50
References 51
This chapter presents a methodology for SystemC-based performance anal-
ysis of embedded systems. This methodology is based on a cycle-accurate
simulation approach for the embedded software that also allows the inte-
gration of abstract SystemC models. Compared to existing simulation-based
approaches, a hybrid method is presented that resolves performance issues
27
Nicolescu/Model-Based Design for Embedded Systems 67842_C002 Finals Page 28 2009-10-13
28 Model-Based Design for Embedded Systems
by combining the advantages of simulation-based and analytical approaches.
In the first step, cycle-accurate static execution time analysis is applied at
each basic block of a cross-compiled binary program using static proces-
sor models. After that, the determined timing information is back-annotated
into SystemC for a fast simulation of all effects that cannot be resolved stat-
ically. This allows the consideration of data dependencies during runtime,
and the incorporation of branch prediction and cache models by efficient
source-code instrumentation. The major benefit of our approach is that the
generated code can be executed very efficiently on the simulation host with
approximately 90% of the speed of the untimed software without any code
instrumentation.
2.1 Introduction
In the future, new system functionality will be realized less by the sum of
What is also important is the early inclusion of the intended target plat-
form in the model-based system design (UML), the mapping of function
blocks on platform components, and the use of virtual prototypes for the
abstract modeling of the target architecture.
Nicolescu/Model-Based Design for Embedded Systems 67842_C002 Finals Page 29 2009-10-13
SystemC-Based Performance Analysis of Embedded Systems 29
An early evaluation of the target platform means that the application
software can be evaluated while considering the target platform. Hence, an
optimization of the target platform under consideration of the application
software, performance requirements, power dissipation, and reliability can
take place.
An early analysis of the system integration is provided by an early verifi-
cation and exposure of integration faults using virtual prototypes. After that,
a seamless transition to the physical prototype can take place.
2.2 Performance Analysis of Distributed Embedded
Systems
The main question of performance analysis of distributed embedded systems
is: What is the global timing behavior of a system and how can it be deter-
mined? The central issue is that computation has no timing behavior as long
as the target platform is not known because the target platform has a major
effect on timing.
The specification, however, can contain global performance require-
ments. The fulfillment of these requirements depends on local timing behav-
iors of system parts. A solution for determining local timing properties is an
early inclusion of the target architecture.
Several analytical and simulative approaches for performance analysis
have previously been proposed. In this chapter, a hybrid approach for per-
formance analysis will be presented.
2.2.1 Analytical Approaches
Analytical approaches perform a formal analysis of pessimistic corner cases
in white-box system analysis.
Timed Petri nets [49] are able to represent the internal behavior of a
system. Although there exist stochastic extensions by generalized stochas-
tic Petri nets (GSPN) [23], these do not consider execution times of the actual
system components. Furthermore, synchronization by communication and
the specification of communication protocols have to be modeled explic-
itly and cannot be extracted from executable functional implementations of
a design.
System-level performance and power estimation based on stochastic
automata networks (SAN) are introduced in [22]. The system including
probabilities of execution times is modeled explicitly in SAN. The actual
execution behavior of the components related to timing and control flow of a
functional implementation is not considered. Stochastic automata [3] extend
the model of communicating I/O automata [42] by general probability distri-
butions for verifying performance requirements of systems. The system and
timing probabilities have to be modeled explicitly and no bottom-up evalua-
tion of a functional system implementation is given.
2.2.2 Simulative Approaches
Simulative approaches perform a simulation of the entire communication
infrastructure and the processing elements. If necessary, this simulation
includes a hardware IP.
Depending on the underlying model of computation, a network simu-
lator such as the OPNET [28], Simulink, or SystemC [14] can be employed
to simulate a network between communicating C/C++ processes. Timing
annotation of such a network simulation is possible, but the exact timing
behavior of the software is missing. To obtain this timing behavior, it is nec-
essary to simulate the software execution on the target processor. For this
simulation, the binary code for the target platform component is required.
This binary code can run on an instruction set simulator (ISS). An ISS is
an abstract model for executing instructions at the binary level and can be