Identification of small scale biochemical networks based
on general type system perturbations
Henning Schmidt
1
, Kwang-Hyun Cho
2,3
and Elling W. Jacobsen
1
1 Signals, Sensors and Systems, Royal Institute of Technology – KTH, Stockholm, Sweden
2 College of Medicine, Seoul National University, Chongno-gu, Seoul, Korea
3 Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Korea
New high-throughput experimental technologies, i.e.
for monitoring the expression levels of large gene sets
and the concentrations of metabolites, are evolving
rapidly. These data sets contain the information
required to uncover the organization of biological
systems on a genetic, proteomic, and metabolic level.
However, in order to realize the translation of data
into a system level understanding of cell functions,
methods that can construct quantitative mathematical
models from data are needed. In particular, determin-
ation of the quantitative interactions between the
components within and across these levels is an
important issue. These interactions lead to the notion
of networks that can be represented by weighted,
Keywords
biochemical networks; identification;
Jacobian; time-series measurements
Correspondence
E. W. Jacobsen, Department of Automatic
Control, Royal Institute of Technology –
squares estimation, using time-series measurements of concentrations and
expression profiles, in which system states and parameter perturbations are
estimated simultaneously. An important side-effect of also employing esti-
mation of the parameter perturbations is that knowledge of the system’s
steady-state concentrations, or activities, is not required and that deviations
from steady-state prior to the perturbation can be dealt with. Time deriva-
tives are computed using a zero-order hold discretization, shown to yield
significant improvements over the widely used Euler approximation. We
also show how network interactions with dynamics that are too fast to be
captured within the available sampling time can be determined and
excluded from the network identification. Known and unknown moiety
conservation relationships can be processed in the same manner. The
method requires that the number of samples equals at least the number of
network components and, hence, is at present restricted to relatively small-
scale networks. We demonstrate herein the performance of the method on
two small-scale in silico genetic networks.
FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS 2141
directed graphs, where the nodes correspond to the
biochemical components and the edges, represented as
arrows with weights attached, indicate the direct
quantitative effect that a change in a certain compo-
nent has on another component. The weights are in
general nonlinear functions that represent reaction
kinetics. Determination of these network structures
will provide insight into the functional relationships
between the involved components, so as to better
understand the functions of biological systems, and
will eventually lead to knowledge concerning how
these systems can be manipulated in order to achieve
a certain desired behavior.
methods focusing on the determination of the qualitat-
ive structure of the interactions and those aimed at
determining quantitative information about the inter-
actions. Ross [5] reviews two approaches to determine
the structure of reaction pathways from time-series
measurements of metabolites and proteins. The first
approach is based on small pulses of concentration
changes applied to the different species around a stable
steady-state. Depending on the relative behavior of the
measured responses, the considered metabolic pathway
can be determined [6,7]. The second approach is based
on correlations between different species when period-
ically forcing the system by changing some input spe-
cies over time. Using correlation and multidimensional
scaling analysis, the structure of the considered path-
way can be unravelled [8].
Kholodenko et al., Gardner et al. and Vance et al.
propose methods for determining quantitative inter-
action matrices based on steady-state responses of per-
turbed genetic networks [2,3,9]. As the responses to the
applied perturbations can often become relatively large
in steady-state, these methods are potentially limited
depending on the nonlinearity of the considered sys-
tems. Furthermore, the fact that Kholodenko et al.
and Vance et al. determine the n
2
elements of the inter-
action matrix from n
2
measurements, suggests that the
the perturbation has no direct effect on this compo-
nent, that is, the designed perturbation only works indi-
rectly through other components in the network [4].
However, this requires substantial a priori structural
knowledge, and furthermore causes problems with rank
deficient measurement matrices. (The latter is discussed
in more detail in the supplementary material.)
Identification of biochemical network structures H. Schmidt et al.
2142 FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS
Herein, we study biochemical networks involving
genes, proteins, and ⁄ or metabolites, and consider
determination of the Jacobian using least-squares esti-
mation from time-series measurements obtained in the
vicinity of some steady-state. The method is able to
deal with very general types of system perturbations.
In this paper we assume the use of constant parameter
perturbations, such as gene knockouts and inhibitor
additions. For these type of perturbations, the exact
size as well as the direct effect of the perturbations will
in general be largely unknown. We therefore also con-
sider incorporating a determination of the perturbation
itself from the available data. However, the method
can be applied equally when pulse perturbations are
realizable for a given network, and in the case of
known or unknown time-varying parameter perturba-
tions. Furthermore, it is possible to combine pulse and
parameter perturbations. (The use of the method for
other types of perturbations is discussed in the
supplementary material.)
Furthermore, we show that the effect of unsteady-
only in terms of the independent intermediates of the
network.
The proposed method will in general uncover only a
phenomenological interaction topology of the network
as not all intermediates can be measured. That is, we
assume that only the measured components are part of
the network to be modelled. This is a common
assumption often used [4]. This assumption is relaxed
somewhat by Mihaliuk et al. [10]. However, they
assume that all components are known and are poss-
ible to perturb. The case of unknown and unmeasura-
ble components is, of course, a highly relevant topic,
but outside the scope of this paper.
The outline of the paper is as follows. We first pre-
sent the problem formulation and briefly outline the
method used for network identification from measure-
ment samples. The proposed method is then applied
to in silico models of two small scale gene networks.
Following the conclusions, we present a detailed
description of the method for least-squares identifica-
tion of the Jacobian, and discuss the impact of the
sampling time and moiety conservations.
Results and Discussion
Problem formulation
We consider metabolic reactions, signaling networks,
and gene networks that can be described by a system
of nonlinear differential equations of the form in Eqn
(1).
_
xðtÞ¼f ðxðtÞ; pÞ; ð1Þ
unfeasible to determine the nonlinear functions f
i
(x,p)
using a ‘top-down’ approach, that is, determining all
reaction mechanisms and involved parameters, such as
rate constants, from measured responses of the per-
turbed network. We therefore consider the system
(Eqn 1) in the neighborhood of some steady-state
H. Schmidt et al. Identification of biochemical network structures
FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS 2143
(x
0
,p
0
) and assume that it behaves linearly for small
variations around this state. This assumption allows us
to represent the system as a linear time invariant sys-
tem (Eqn 2)
D
_
xðtÞ¼@f =@x
jx
0
;p
0
DxðtÞþ@f =@p
jx
0
;p
0
(Eqn 2) as a discrete time system (Eqn 3)
Dx
kþ1
¼ A
d
Dx
k
þ B
d
Dp
k
; ð3Þ
where Dx
k
¼ Dx(kDT) and Dp
k
¼ Dp(kDT). Using
Eqn (3) we will, in the following, show how an esti-
mation A
d
for the discrete time Jacobian A
d
can be
determined. An estimation A for the continuous time
Jacobian A can then be calculated through a reverse
transformation to continuous time using the Euler
approximation or the, so called, zero-order hold dis-
cretization.
The commonly used Euler approximation for the
time derivatives of the states implies replacing the con-
and Dx
k
. This leads to the
zero-order hold discretization [13] (Eqn 5)
A
zoh
¼
1
DT
log
m
ðA
d
Þ; ð5Þ
where log
m
(A
d
) denotes the matrix logarithm. Note
there are no approximations involved in this transfor-
mation provided the parameter perturbations are con-
stant between samples. (A more detailed discussion of
the zero-order hold discretization and a comparison to
the Euler discretization can be found in part 1 of the
supplementary material.)
Having determined an estimation A
d
for A
d
, an esti-
From system identification theory it is well known
that a good estimation result requires a sufficient exci-
tation of the system. (In particular, part 3 of the sup-
plementary material, shows that perturbations have to
be chosen such that the complete space of the network
states is perturbed.) In the following we consider
time-series data obtained from constant parameter per-
turbation experiments. The perturbed parameters cor-
respond to the maximal enzyme rates involved in the
transcription of the genes (part 6 of the supplementary
material). Furthermore, the magnitudes, as well as the
direct effects of the perturbations, are assumed to be
Identification of biochemical network structures H. Schmidt et al.
2144 FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS
unknown and thus not used in the identification
algorithm. The estimations of
^
A
d
are obtained using
absolute measurements and applying Eqn 15. Unless
otherwise stated, the zero-order hold discretization is
used in the following.
Estimation of the Jacobian
We first performed an in silico experiment in which the
maximal enzyme rate corresponding to the transcrip-
tion of gene number one is perturbed by 1%. The
sampling time is chosen as DT ¼ 0.01 h, and we collect
six samples, the minimal number required for estima-
ting the Jacobian when the size of the perturbation is
eter perturbation. Note that the nonlinear effects in
general will depend on the parameter chosen for per-
turbation.
In real experiments, inhibition efficiencies by small
interfering RNA (siRNA) or chemical inhibitors are
much higher than in the experiment above. A more
realistic experimental setting is to assume perturbations
of 50%, and to do several experiments, in which differ-
ent parameters are perturbed. We performed four
experiments and combine the obtained measurements.
In each experiment, one of the maximal enzyme rates
of the four genes is perturbed by 50%, and the mini-
mum required number of samples are taken – three
samples in each experiment. The sampling time is
DT ¼ 0.01 h. The result of the estimation is given by:
^
A ¼
À6:45 À2:69 À0:01 2:59
0:00 À8:14 À0:02 4:06
À2:19 2:90 À14:38 À0:01
0:07 0:08 8:51 À9:72
2
6
6
4
3
7
7
5
Comparing with the ‘true’ linear Jacobian in Eqn (6),
DT and perturbation magnitudes, using two discretiza-
tion methods (Eqns 4 and 5). The relative estimation
error, e, is calculated as (Eqn 7)
1
3
2
4
Fig. 1. Structure of the four-gene network. The interconnections
represent the direct interactions between the genes. An arrow indi-
cates a positive effect on the gene transcription, and a bar indicates
a negative (inhibitory) effect.
H. Schmidt et al. Identification of biochemical network structures
FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS 2145
e ¼
1
N
X
n
i¼1
X
n
j¼1
ja
ij
j
a
ij
¼
^
A
The results clearly demonstrate that the zero-order
hold discretization in Eqn (5) leads to a considerable
improvement in the network identification, compared
to the commonly used Euler approximation.
Impact of measurement uncertainty
We consider herein the effect of measurement uncer-
tainty on the estimation of the Jacobian. The uncer-
tainty is simulated in silico by adding noise to the
absolute measurements x
k
as follows
x
noise
k
¼ x
k
þ W Áx
0
Here, W denotes a diagonal matrix in which the entries
are uniformly distributed random variables between
)0.02 and 0.02. These values may appear small com-
pared to the uncertainty in realistic biological experi-
ments. However, the noise levels relative to the
measured deviations Dx correspond to over 50% for
some samples. This should also be seen in relation to
the fact that measurements of gene expressions are
often carried out in a relative manner, corresponding
to the measurement of Dx.
The sampling time is DT ¼ 0.01 h as before, and
considered magnitudes of parameter perturbations are
The last column displays the relative error introduced by using the
Euler approximation.
DT
Error (%)
50% perturbation 10% perturbation Approximation error
e (ZOH) e (Euler) e (ZOH) e (Euler) g(DT)
0.001 0.48 1.02 0.10 0.83 0.013
0.01 3.95 9.48 0.84 7.92 1.3
0.1 22.77 56.98 6.22 52.82 120
3 6 9 12 15 18 21
10
1
10
2
10
3
10
4
Measured time-steps / experiment
Mean relative error and standard deviation (%)
20% perturbation
50% perturbation
100% perturbation
21% Error
Fig. 2. Mean value and standard deviation of the relative estimation
error (7) in the nonzero elements of the Jacobian, obtained from
100 Monte-Carlo simulations.
Identification of biochemical network structures H. Schmidt et al.
2146 FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS
gene with relatively fast dynamics. (The equations,
chosen to be DT ¼ 0.01 h.
Following the approach discussed above, we collect
the measurements and find that the smallest singular
value of the measurement matrix M is r
1
¼ 0.00026,
which is relatively close to 0 compared to the other
singular values. Thus, we conclude that the chosen
sampling time was too large with respect to the fastest
dynamics of the system.
Using the zero-order hold discretization to deter-
mine A from A
d
, ignoring the fact that some modes
have not been captured in the data, the following
result is obtained:
A ¼
À5:27 À4:29 À0: 03 2:49 À5:57
0:68 À8:58 0: 02 3:76 À3:33
1:19 2:27 À14:32 0:02 À9:48
À19:32 15:60 9:50 À9:72 92:46
30:52 À25:11 0:03 0:07 À149:4
2
6
6
6
6
4
3
7
6
4
3
7
7
5
The identified Jacobian is close to the true Jacobian
for the reduced network, with relative errors in all
nonzero elements being smaller than 20%, and all zero
elements being identified as close to zero. It is import-
ant to point out that the Jacobian of the reduced net-
work is not supposed to be equal to the Jacobian of
the four gene network in the previous example. The
dynamics of the reduced Jacobian also correspond rea-
sonably well to the slow dynamics of the five gene net-
work, as can be seen from the computed eigenvalues
in Table 2. The structure of the identified reduced
Jacobian reflects well the structure of the network in
Fig. 3 when gene five is taken out. For instance, gene
one directly affects gene three when the dynamics of
gene five are neglected, or assumed to be infinitely
fast.
The results presented above show that it is indeed
possible to obtain a useful identification result even in
the case that fast dynamics are not captured correctly.
Moreover, one can obtain the information on which
components are involved in the fast reactions, and
their static relationship with the other components of
the network.
5
ics relatively fast compared to the sampling time can
be detected and removed from the identification; linear
dependencies due to moiety conservations can be iden-
tified and processed; samples from any number of
experiments can be combined in the identification, as
long as these experiments have been carried out
around the same steady-state.
We have shown that measurement uncertainty can
have a large effect on the identification result. Possible
solutions for uncertainty and noise are to collect and
use more measurement data, and to make use of avail-
able a priori structural knowledge. In addition, methods
from identification theory on estimating and filtering
noise can be incorporated. Furthermore, the signal-
to-noise ratio can be increased by choosing larger pert-
urbations. However, the latter can lead to increased
nonlinear effects and a trade-off between the two
effects, therefore, has to be taken into consideration.
Instead of using the widely accepted Euler discretiza-
tion, we have shown that the zero-order hold discreti-
zation, in general, results in a significantly improved
estimation and should be used in all methods aimed at
identifying dynamic biochemical networks.
We have not discussed explicitly the effect of auto-
regulation of biological systems by self-negative feed-
back. For example, certain components might be regu-
lated by homeostatic effects and a response to
perturbations might not be visible in the measurement
data. However, under the assumption that these
effects are significantly slower than the sampling time
in vivo, it is not possible to quantify the applied perturba-
tions, meaning that the magnitude of the applied perturba-
tions is unknown. Furthermore, for gene networks, it is
usually also unknown which components the perturbations
affect in a direct manner. Previously proposed methods
often assume this information to be available, at least parti-
ally. Sontag et al., for example, assume that the magnitude
of the perturbations is unknown but the genes that are
directly affected by the perturbed parameters are known
[4]. In the following we consider both the magnitude and
the direct effects of perturbations to be unknown. To keep
the exposition relatively simple, we assume, however, that
the parameter perturbations are constant over time.
(However, in part 2 of the supplementary material we show
how this assumption can be relaxed to take time varying
parameter perturbations, known or unknown, and pulse
perturbations into account.)
We assume that the network response to the applied
perturbations is sufficiently small such that, in the time
range of the measurements, the system can be regarded as
Table 2. Comparison between the eigenvalues of the nominal
Jacobian of the five gene Anetwork and the eigenvalues of the esti-
mated reduced Jacobian
^
A
1;2;3;4
. i,
ffiffiffiffiffiffiffi
À1
p
þB
d
Dp ¼ A
d
Dx
k
þDu ¼½A
d
; Du
Dx
k
1
: ð8Þ
Here, [A
d
, Du] is a matrix, consisting of the discrete time
Jacobian and the unknown perturbation vector Du, repre-
senting the unknowns in the equation. The vectors Dx
k+1
and ½Dx
T
k
; 1
T
are given by the measurements. Assume now
that the system (Eqn 1) at time k ¼ k
0
is in steady-state
(Dx
Dx
nþ1
Dx
n
::: Dx
1
11::: 1
¼½A
d
; DuM: ð9Þ
Under the assumption that the (n+1) · (n+1) measure-
ment matrix M on the right hand side in (Eqn 9) can be
inverted, the unknown matrix [A
d
, Du] can be determined
from:
½A
d
; Du¼½Dx
nþ2
; Dx
n
; :::; Dx
2
Dx
nþ1
Dx
n
steps, the matrices M and R are constructed as above, but
with more columns, corresponding to the measurements in
the additional time-steps. Thus, M will no longer be a
square matrix and the pseudoinverse needs to be used
instead:
½
^
A
d
; D
^
u¼RM
T
ðMM
T
Þ
À1
; ð10Þ
this corresponds to a least-squares solution for
^
A
d
and D
^
u.
As the identification of the overall network structure
requires a relatively large number of measurement samples,
we consider combining data from several experiments. It
has to be pointed out that these experiments should be per-
formed around the same steady-state, as only then an
; :::; Dx
i
2
; ð12Þ
where m
i
determines the number of measured time-steps in
experiment i. Note that the measurements are assumed to
start at time k ¼ 1 and end at time k ¼ m
i
.
The measurement matrix M is constructed as Eqn (13):
M ¼
M
1
M
2
::: M
r
10::: 0
01::: 0
.
.
.
.
.
.
.
.
.
i
m
i
À1
; Dx
i
m
i
À2
; :::; Dx
i
1
: ð14Þ
The 1 and 0 elements in Eqn (13) denote row vectors
with unity and zero entries, respectively. These vectors
have the same width as the corresponding measurement
matrices M
i
. The 1-vectors have the same origin as those
in Eqn (9). However, in the case of several experiments,
one has to take into account that the perturbation Du
i
in
the i-th experiment can be different from the perturbation
in the other experiments, and thus for each experiment
one perturbation vector needs to be taken into account.
(The construction of the matrices M and R is illustrated
for a simple example in part 4 of the supplementary
material.)
H. Schmidt et al. Identification of biochemical network structures
columns of M should at least equal n+r.For the con-
struction of R and M at least n+2rmeasured time-steps
are required. Note that Eqn (15) involves the pseudoinverse
of M, corresponding to a least-squares estimation of A
d
and Du
i
.
An important side effect of incorporating estimation of
the applied perturbations using measurement data, is that
also nonzero, or unsteady-state, initial conditions can be
handled. This follows from the fact that initial unknown
deviations from the steady-state in fact can be represented
as an unknown perturbation. Thus, the proposed method
can be used even in cases where the steady-state x
0
is
unknown. To see this, Eqn (8) is reformulated using the
relations Dx
k+1
¼ x
k+1
) x
0
and Dx
k
¼ x
k
–x
0
0
in Eqns (12) and (14) by the corresponding
absolute measurements x
i
k
. Equation (15) then becomes
½
^
A
d
;
^
u
1
;:::;
^
u
r
¼RM
T
ðMM
T
Þ
À1
;
where the only difference lies in the fact that now the
lumped perturbations u
i
, instead of Du
i
considered network.
Assume the fastest mode of the linearized system (Eqn 2)
corresponds to an eigenvalue s
f
or a time-constant
s
f
¼ 1 ⁄ |k
f
|. In order to obtain a reasonable estimate of the
corresponding dynamics, a sampling time DT smaller than
s
f
should be used [15]. If the sampling time is chosen to be
significantly larger than s
f
, then the transients of this mode
will essentially disappear between samples. This implies that
there exists an almost linear dependency between the meas-
urements of the sampled states, and hence that the
measurement matrix M will be (almost) rank deficient. In
general – and we assume the perturbations fulfil the con-
trollability condition discussed above ) the deficiency will
be equal to the number of modes with time-constant signifi-
cantly smaller than the sampling time. The linear depend-
ency, corresponding to the interactions with dynamics
significantly faster than the sampling time, can be deter-
mined directly from the collected measurements using a
singular value decomposition (SVD) of the measurement
matrix, that is, M ¼ USV
trations of the intermediates in the networks will be linearly
dependent, resulting in a measurement matrix without full
row rank. Thus, the same approach as presented above for
dealing with linear dependencies due to a too large sampling
time, can be used to determine the components involved in
Identification of biochemical network structures H. Schmidt et al.
2150 FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS
the moiety conservations, and a reduced Jacobian can then
be determined for a reduced network containing only the
independent components. The algebraic relations corres-
ponding to moiety conservations are obtained directly from
the SVD of the measurement matrix, M.
Note that if the linear dependencies due to moiety con-
servations and ⁄ or fast dynamic modes are not eliminated
from M prior to the determination of
^
A
d
and
^
A, the result-
ing Jacobian will contain gross errors.
Acknowledgments
Henning Schmidt and Elling W. Jacobsen acknowledge
financial support from the Swedish Research Council.
Kwang-Hyun Cho acknowledges the support received
by a grant from the Korea Ministry of Science and
Technology (Korean Systems Biology Research Grant,
M10309000006–03B5000-00211) and also by The 21C
Frontier Microbial Genomics and Application Center
9 de la Fuente A, Brazhnik P & Mendes P (2002) Linking
the genes: inferring quantitative gene networks from
microarray data. Trends Genet 18, 395–398.
10 Mihaliuk E, Skodt H, Hynne F, Sorensen PG & Sho-
walter K (1999) Normal modes for chemical reactions
from time series analysis. J Phys Chem 103, 8246–8251.
11 Crampin EJ, Schnell S & McSharry PE (2004) Mathe-
matical and computational techniques to deduce com-
plex biochemical reaction mechanisms. Prog Biophys
Mol Biol 86, 77–112.
12 Siddhartha J & van Schuppen JH (2001) Modelling and
control of cell reaction networks. PNA-R0116, CWI,
Amsterdam.
13 Rugh W (1996) Linear System Theory. Prentice Hall,
Upper Saddle River, NJ, USA.
14 Kay S (1993) Fundamentals of Statistical Signal Process-
ing. Prentice Hall, Upper Saddle River, NJ, USA.
15 Ljung L (1999) System Identification – Theory for the
User, 2nd edn. Prentice Hall, Upper Saddle River, NJ,
USA.
Supplementary material
The following material is available from http://www.
blackwellpublishing.com/products/journals/suppmat/EJB/
EJB4605/EJB4605sm.htm
Appendix S1. Additional proofs and models.
H. Schmidt et al. Identification of biochemical network structures
FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS 2151