5 DUAL EXTENDED KALMAN FILTER METHODS
Eric A. Wan and Alex T. Nelson
Department of Electrical and Computer Engineering, Oregon Graduate Institute of
Science and Technology, Beaverton, Oregon, U.S.A.
5.1 INTRODUCTION
The Extended Kalman Filter (EKF) provides an efficient method for
generating approximate maximum-likelihood estimates of the state of a
discrete-time nonlinear dynamical system (see Chapter 1). The filter
involves a recursive procedure to optimally combine noisy observations
with predictions from the known dynamic model. A second use of the
EKF involves estimating the parameters of a model (e.g., a neural network)
given clean training data consisting of inputs and outputs (see Chapter 2). In this
case, the EKF represents a modified-Newton type of algorithm for on-line
system identification. In this chapter, we consider the dual estimation
problem, in which both the states of the dynamical system and its
parameters are estimated simultaneously, given only noisy observations.
Kalman Filtering and Neural Networks, Edited by Simon Haykin
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-36998-5 (Hardback); 0-471-22154-6 (Electronic)
To be more specific, we consider the problem of learning both the
hidden states $x_k$ and parameters $w$ of a discrete-time nonlinear dynamical
system,
$x_{k+1} \cdots$
multilayer neural networks, in which case $w$ are the weights.
The problem of dual estimation can be motivated either from the need
for a model to estimate the signal or (in other applications) from the need
for good signal estimates to estimate the model. In general, applications
can be divided into the tasks of modeling, estimation, and prediction. In
estimation, all noisy data up to the current time is used to approximate the
current value of the clean state. Prediction is concerned with using all
available data to approximate a future value of the clean state. Modeling
(sometimes referred to as identification) is the process of approximating
the underlying dynamics that generated the states, again given only the
noisy observations. Specific applications may include noise reduction
(e.g., speech or image enhancement), or prediction of financial and
economic time series. Alternatively, the model may correspond to the
explicit equations derived from first principles of a robotic or vehicle
system. In this case, w corresponds to a set of unknown parameters.
Applications include adaptive control, where parameters are used in the
design process and the estimated states are used for feedback.
Heuristically, dual estimation methods work by alternating between
using the model to estimate the signal, and using the signal to estimate the
model. This process may be either iterative or sequential. Iterative
schemes work by repeatedly estimating the signal using the current
model and all available data, and then estimating the model using the
estimates and all the data (see Fig. 5.1a). Iterative schemes are necessarily
restricted to off-line applications, where a batch of data has been
previously collected for processing. In contrast, sequential approaches
use each individual measurement as soon as it becomes available to update
both the signal and model estimates. This characteristic makes these
algorithms useful in either on-line or off-line applications (see Fig. 5.1b).
EKF to estimate the time-series and using these estimates to train a neural
network via gradient descent. A joint EKF is used in [15] to model
partially unknown dynamics in a model reference adaptive control framework.
Furthermore, iterative EM approaches to the dual estimation
problem have been investigated for radial basis function networks [16]
and other nonlinear models [17]; see also Chapter 6. Errors-in-variables
(EIV) models appear in the nonlinear statistical regression literature [18],
and are used for regressing on variables related by a nonlinear function,
but measured with some error. However, errors-in-variables is an iterative
approach involving batch computation; it tends not to be practical for
dynamical systems because the computational requirements increase in
proportion to $N^2$, where $N$ is the length of the data. A heuristic method
known as Clearning minimizes a simplified approximation to the EIV cost
function. While it allows for sequential estimation, the simplification can
lead to severely biased results [19]. The dual EKF [19] is a nonlinear
extension of the linear dual Kalman approach of [5], and recursive
prediction error algorithm of [6]. Application of the algorithm to speech
enhancement appears in [20], while extensions to other cost functions
have been developed in [21] and [22]. The crucial, but often overlooked
issue of sequential variance estimation is also addressed in [22].
Overview The goal of this chapter is to present a unified probabilistic
and algorithmic framework for nonlinear dual estimation methods. In the
next section, we start with the basic dual EKF prediction error method.
This approach is the most intuitive, and involves simply running two EKF
filters in parallel. The section also provides a quick review of the EKF for
both state and weight estimation, and introduces some of the complications
in coupling the two. An example in noisy time-series prediction is
also given. In Section 5.3, we develop a general probabilistic framework
and covariance $P_{x_k}^-$ with the current noisy measurement $y_k$. These estimates are
optimal in both the MMSE and MAP senses. Maximum-likelihood signal
estimates are obtained by letting the initial covariance $P_{x_0}$ approach
infinity, thus causing the filter to ignore the value of the initial state $\hat{x}_0$.
For nonlinear systems, the extended Kalman filter provides approximate
maximum-likelihood estimates. The mean and covariance of the state
are again recursively updated; however, a first-order linearization of the
dynamics is necessary in order to analytically propagate the Gaussian
random-variable representation. Effectively, the nonlinear dynamics are
approximated by a time-varying linear system, and the linear Kalman
filter equations are applied. The full set of equations is given in Table
5.1. While there are more accurate methods for dealing with the nonlinear
dynamics (e.g., particle filters [24, 25], second-order EKF, etc.), the
standard EKF remains the most popular approach owing to its simplicity.
Chapter 7 investigates the use of the unscented Kalman filter as a
potentially superior alternative to the EKF [26–29].
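As an illustration of the recursion just described, the following is a minimal sketch of one EKF step for a scalar state, assuming additive process and measurement noise with variances `r_v` and `r_n`. The function name `ekf_step` and its argument names are hypothetical, chosen only for this example; the caller supplies the model functions and their derivatives.

```python
def ekf_step(x_hat, p, y, f, df_dx, h, dh_dx, r_v, r_n):
    """One scalar EKF recursion: time update, then measurement update."""
    # Time update: propagate the mean through f, and the variance through
    # the first-order linearization A = df/dx at the previous estimate.
    a = df_dx(x_hat)
    x_pred = f(x_hat)
    p_pred = a * p * a + r_v

    # Measurement update: linearize h about the prediction (C = dh/dx).
    c = dh_dx(x_pred)
    k = p_pred * c / (c * p_pred * c + r_n)   # Kalman gain
    x_new = x_pred + k * (y - h(x_pred))      # correct with the innovation
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new, x_pred, p_pred
```

For a linear model the same code reduces to the ordinary Kalman filter, which gives a convenient sanity check of the linearized update.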
$$\cdots\,(x_t - x_t^-)^T (R^v)^{-1} (x_t - x_t^-)\} \tag{5.10}$$
where $x_t^- = F(x_{t-1}, w)$ is the predicted state, and $R^n$ and $R^v$ are the
additive noise and innovations noise covariances, respectively. This interpretation
will be useful when dealing with alternate forms of the dual EKF
in Section 5.3.3.
5.2 DUAL EKF–PREDICTION ERROR
$$w_{k+1} = w_k + r_k, \tag{5.11}$$
$$d_k = G(x_k, w_k) + e_k, \tag{5.12}$$
where the parameters $w_k$ correspond to a stationary process with identity
state transition matrix, driven by process noise $r_k$. The output $d_k$
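Table 5.1 Extended Kalman filter (EKF) equations

Initialize with:
$$\hat{x}_0 = E[x_0], \tag{5.2}$$
$$P_{x_0} = E[(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T].$$
$$P_{x_k}^- = A_{k-1} P_{x_{k-1}} A_{k-1}^T + R^v, \tag{5.5}$$
and the measurement-update equations are
$$K_k^x = P_{x_k}^- C_k^T (C_k P_{x_k}^- C_k^T \cdots$$
$$P_{x_k} = (I - K_k^x C_k) P_{x_k}^-, \tag{5.8}$$
where
$$A_k \triangleq \left.\frac{\partial F(x, u_k, w)}{\partial x}\right|_{\hat{x}_k}, \qquad C_k \triangleq \frac{\partial H(x, w)}{\partial x} \cdots$$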
$$\cdots \sum_{t=1}^{k} [d_t - G(x_t, w)]^T (R^e)^{-1} [d_t - G(x_t, w)]. \tag{5.21}$$
If the "noise" covariance $R^e$ is a constant diagonal matrix, then, in fact, it
cancels out of the algorithm (this can be shown explicitly), and hence can
be set arbitrarily (e.g., $R^e = 0.5I$). Alternatively, $R^e$ can be set to specify a
weighted MSE cost. The innovations covariance $E[r_k r_k^T \cdots$
$$\cdots\,\hat{w}_0)^T \tag{5.14}$$
For $k \in \{1, \ldots, \infty\}$, the time-update equations of the Kalman filter are
$$\hat{w}_k^- = \hat{w}_{k-1}, \tag{5.15}$$
$$P_{w_k}^- = P_{w_{k-1}} + R^r_{k-1}, \tag{5.16}$$
and the measurement-update equations are
$$K_k^w \cdots$$
$$\hat{w}_k = \hat{w}_k^- + K_k^w (d_k - G(\hat{w}_k^-, x_{k-1})), \tag{5.18}$$
$$P_{w_k} = (I - K_k^w C_k^w) P_{w_k}^-.$$
$\cdots_{w_k}$, where $\lambda \in (0, 1]$ is often referred to as the
"forgetting factor." This provides for an approximate exponentially
decaying weighting on past data and is described more fully in [22].
Set $R^r_k = (1 - \alpha) R^r_{k-1} + \alpha K_k^w [d_k - G(x_k, \hat{w})][d_k - G(x_k, \hat{w})]^T (K_k^w \cdots$
is available for training.
5.2.3 Dual Estimation
When the clean state is not available, a dual estimation approach is
required. In this section, we introduce the basic dual EKF algorithm,
which combines the Kalman state and weight filters. Recall that the task is
to estimate both the state and model from only noisy observations.
Essentially, two EKFs are run concurrently. At every time step, an EKF
state filter estimates the state using the current model estimate $\hat{w}_k$, while
the EKF weight filter estimates the weights using the current state estimate
$\hat{x}_k$. The system is shown schematically in Figure 5.2. In order to simplify
the presentation of the equations, we consider the slightly less general
state-space model:
$$x_{k+1} = F(x_k, u_k, w) + v_k, \tag{5.22}$$
$y \cdots$
$\hat{w}_k^-$ associated with the weight filter. This is due to the fact
that the signal filter, whose parameters are being estimated by the weight
filter, has a recurrent architecture, i.e., $\hat{x}_k$ is a function of $\hat{x}_{k-1}$, and both
are functions of $w$.$^1$ Thus, the linearization must be computed using
recurrent derivatives with a routine similar to real-time recurrent learning
[Figure 5.2: schematic of the dual EKF, with measurement-update blocks for the state filter (EKFx) and the weight filter (EKFw).]
kÀ1
;
^
ww
À
k
Þ=@
^
xx
kÀ1
, can be computed with a simple technique (such as backpropagation)
because
^
ww
À
k
is not itself a function of
^
xx
kÀ1
.
(RTRL) [35]. Taking the derivative of the signal filter equations results in
the following system of recursive equations:
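$$\partial \hat{x}_k^- \cdots, \tag{5.35}$$
$$\frac{\partial \hat{x}_k}{\partial \hat{w}} = (I - K_k^x C)\frac{\partial \hat{x}_k^-}{\partial \hat{w}} + \frac{\partial K_k^x}{\partial \hat{w}} (y_k - C\hat{x}_k^-). \tag{5.36}$$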
$$\hat{x}_0 = E[x_0], \qquad P_{x_0} = E[(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T].$$
For $k \in \{1, \ldots, \infty\}$, the time-update equations for the weight filter are
$$\hat{w}_k^- = \cdots$$
$$\cdots_k, \hat{w}_k^-), \tag{5.26}$$
$$P_{x_k}^- = A_{k-1} P_{x_{k-1}} A_{k-1}^T + R^v. \tag{5.27}$$
The measurement-update equations for the state filter are
$$K_k^x = P_{x_k}^- \cdots$$
$$\cdots_k), \tag{5.29}$$
$$P_{x_k} = (I - K_k^x C) P_{x_k}^-, \tag{5.30}$$
and those for the weight filter are
$$K_k^w = P_{w_k}^- (C_k^w)^T [C_k^w P \cdots$$
$$\cdots \left.\frac{\partial F(x, \hat{w}_k^-)}{\partial x}\right|_{\hat{x}_{k-1}}, \qquad \epsilon_k = (y_k - C\hat{x}_k^-), \qquad C_k^w \triangleq -\partial \cdots$$
and $\partial F(\hat{x}, \hat{w})/\partial \hat{w}_k$ are evaluated at $\hat{w}_k$ and contain
static linearizations of the nonlinear function.
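The coupling of the two filters can be sketched for a scalar system as below. Several choices here are assumptions made only for illustration and are not taken from the chapter: the model $F(x, w) = w \tanh(x)$ with $C = 1$, the function name `dual_ekf`, the noise variances, and the use of a forgetting factor `lam` in the weight time update. The weight-filter gradient uses only the static linearization $\partial F / \partial w$, ignoring the recurrent derivative terms.

```python
import math
import random


def dual_ekf(ys, w0=0.5, p_w0=1.0, x0=0.0, p_x0=1.0,
             r_v=0.1, r_n=0.1, r_e=0.1, lam=0.999):
    """Scalar dual EKF sketch for x_k = w*tanh(x_{k-1}) + v_k, y_k = x_k + n_k."""
    x_hat, p_x, w_hat, p_w = x0, p_x0, w0, p_w0
    xs_out, ws_out = [], []
    for y in ys:
        # Weight filter time update (stationary weights, forgetting factor).
        w_pred, p_w_pred = w_hat, p_w / lam

        # State filter time update, using the predicted weight.
        t = math.tanh(x_hat)
        a = w_pred * (1.0 - t * t)           # dF/dx at the previous estimate
        x_pred = w_pred * t
        p_x_pred = a * p_x * a + r_v

        # Static approximation of C^w = C * d(x_pred)/dw (no recurrence).
        c_w = t
        eps = y - x_pred                     # shared prediction error

        # State filter measurement update (C = 1).
        k_x = p_x_pred / (p_x_pred + r_n)
        x_hat = x_pred + k_x * eps
        p_x = (1.0 - k_x) * p_x_pred

        # Weight filter measurement update.
        k_w = p_w_pred * c_w / (c_w * p_w_pred * c_w + r_e)
        w_hat = w_pred + k_w * eps
        p_w = (1.0 - k_w * c_w) * p_w_pred

        xs_out.append(x_hat)
        ws_out.append(w_hat)
    return xs_out, ws_out


# Usage on synthetic data (the true weight 1.5 is an arbitrary choice).
rng = random.Random(0)
x, ys = 0.1, []
for _ in range(200):
    x = 1.5 * math.tanh(x) + rng.gauss(0.0, 0.3)
    ys.append(x + rng.gauss(0.0, 0.3))
xs_hat, ws_hat = dual_ekf(ys)
```

Replacing `c_w` with a recursively propagated derivative would give the full-derivative variant; the static form is shown only because it is the simplest to follow.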
The last term in Eq. (5.36) may be dropped if we assume that the
Kalman gain $K_k^x$ is independent of $w$. Although this greatly simplifies the
algorithm, $\partial K_k^x/\partial \hat{w}$ can also be computed exactly, as shown
in Appendix A. Whether the computational expense in calculating the
recursive derivatives (especially that of calculating $\partial K_k^x/\partial \cdots$
$\cdots_k$ contain measurement noise $n_k$ in addition
to the signal. The dual EKF requires reformulating this model into a state-space
representation. One such representation is given by
$$x_k = F(x_{k-1}, w) + B v_k, \tag{5.38}$$
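$$\begin{bmatrix} x_k \\ x_{k-1} \\ \vdots \\ x_{k-M+1} \end{bmatrix} = \begin{bmatrix} f(\cdot) \\ x_{k-1} \\ \vdots \\ x_{k-M+1} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} v_k$$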
is chosen to be lagged values of the time series, and the
state transition function FðÁÞ has its first element given by f ðÁÞ, with the
remaining elements corresponding to shifted values of the previous state.
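The structure of this lagged state-space form can be sketched as follows. The helper name `make_lagged_transition` is hypothetical; `f` stands for any scalar autoregressive model, such as a neural network, that maps the previous `M` values and the weights to the next value.

```python
def make_lagged_transition(f, M):
    """Build F and B for a state of M lagged values [x_{k-1}, ..., x_{k-M}]."""
    def F(x_prev, w):
        if len(x_prev) != M:
            raise ValueError("state must hold exactly M lagged values")
        # First element: the nonlinear prediction f(.).
        # Remaining elements: the previous state shifted down by one.
        return [f(x_prev, w)] + list(x_prev[:-1])

    # Process noise enters only through the first state element.
    B = [1.0] + [0.0] * (M - 1)
    return F, B


# Usage with a linear autoregressive f (an arbitrary stand-in for a network).
F, B = make_lagged_transition(
    lambda lags, w: sum(wi * xi for wi, xi in zip(w, lags)), 3)
next_state = F([3.0, 2.0, 1.0], [1.0, 1.0, 1.0])
```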
The results of a controlled time-series experiment are shown in Figure
5.3. The clean signal, shown by the thin curve in Figure 5.3a, is generated
by a neural network (10-5-1) with chaotic dynamics, driven by white
Gaussian process noise ($\sigma_v^2 = 0.36$). Colored noise generated by a linear
autoregressive model is added at 3 dB signal-to-noise ratio (SNR) to
produce the noisy data indicated by + symbols. Figure 5.3b shows the
Figure 5.3 The dual EKF estimate (heavy curve) of a signal generated by a
neural network (thin curve) and corrupted by adding colored noise at 3 dB
(+). For clarity, the last 150 points of a 20,000-point series are shown. Only the
noisy data are available: both the signal and weights are estimated by the
dual EKF. (a) Clean neural network signal and noisy measurements. (b) Dual
EKF estimates versus EKF estimates. (c) Estimates with full and static derivatives.
(d) MSE profiles of EKF versus dual EKF.
time series estimated by the dual EKF. The algorithm estimates both the
clean time series and the neural network weights. The algorithm is run
sequentially over 20,000 points of data; for clarity, only the last 150 points
are shown. For comparison, the estimates using an EKF with the known
neural network model are also shown. The MSE for the dual EKF,
computed over the final 1000 points of the series, is 0.2171, whereas
the EKF produces an MSE of 0.2153, indicating that the dual algorithm
$\{x_k\}_1^N$ and weights
2
A surprising result is that the dual EKF sometimes actually outperforms the EKF, even
though the EKF appears to have an unfair advantage of knowing the true model. Our
explanation is that the EKF, even with the known model, is still an approximate estimation
algorithm. While the dual EKF also learns an approximate model, this model can actually
be better matched to the state estimation approximation.
5.3 A PROBABILISTIC PERSPECTIVE
$w$, given the noisy data $\{y_k\}_1^N$. For notational convenience, define the
column vectors $x_1^N$ and $y_1^N$, with elements from $\{x_k\}_1^N \cdots$
$\rho_{x_1^N w \mid y_1^N}$.
The MAP estimation approach consists of determining instances of the
states and weights that maximize this conditional density. For Gaussian
distributions, the MAP estimate also corresponds to the minimum mean-
squared error (MMSE) estimator. More generally, as long as the density is
unimodal and symmetric around the mean, the MAP estimate provides the
Bayes estimate for a broad class of loss functions [36].
Taking MAP as the starting point allows dual estimation approaches to
be divided into two basic classes. The first, referred to here as joint
estimation methods, attempts to maximize $\rho_{x_1^N w \mid y_1^N}$ directly. We can write
this optimization problem explicitly as
$$(\hat{x}_1^N, \hat{w} \cdots$$
$$\cdots_{w \mid y_1^N} \tag{5.42}$$
and maximizing the two terms separately, that is,
$$\hat{x}_1^N = \arg\max_{x_1^N} \rho_{x_1^N \mid w y_1^N}, \qquad \hat{w} = \arg\max_{w} \rho_{w \mid y_1^N}. \tag{5.43}$$
The cost functions associated with joint and marginal approaches will be
$$\cdots = \frac{\rho_{y_1^N \mid x_1^N w}\,\rho_{x_1^N \mid w}\,\rho_w}{\rho_{y_1^N}}. \tag{5.44}$$
Although $\{y_k\}_1^N$ is statistically dependent on $\{x_k\}_1^N \cdots$
$$\rho_{y_1^N x_1^N \mid w} = \rho_{y_1^N \mid x_1^N w}\,\rho_{x_1^N \mid w} \tag{5.45}$$
with respect to $\{x_k\}_1^N$ and $w$.
To derive the corresponding cost function, we assume $v_k$ and $n_k$ are
both zero-mean white Gaussian noise processes. It can then be shown (see
[22]) that
$$\rho_{y \cdots} \cdots \times \frac{1}{\sqrt{(2\pi)^N |R^v|^N}} \exp\Big[-\sum_{k=1}^{N} \frac{1}{2} (x_k - x_k^-)^T (R^v)^{-1} (x_k \cdots$$
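corresponding cost function is given by
$$J = \sum_{k=1}^{N} \left[ \log(2\pi\sigma_n^2) + \frac{(y_k - Cx_k)^2}{\sigma_n^2} + \log(2\pi|R^v|) + (x_k - x_k^-)^T (R^v)^{-1} (x_k - x_k^-) \right]. \tag{5.48}$$
$$J^j(x_1^N, w) = \sum_{k=1}^{N} \left[ \frac{(y_k - Cx_k)^2}{\sigma_n^2} + (x_k - x_k^-)^T (R^v)^{-1} (x_k - x_k^-) \right]. \tag{5.50}$$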
The first term is a soft constraint keeping $\{x \cdots$
the estimates that maximize the joint density $\rho_{y_1^N x_1^N \mid w}$. This is a difficult
optimization problem because of the high degree of coupling between the
unknown quantities $\{x_k\}_1^N$ and $w$. In general, we can classify approaches as
being either direct or decoupled. In direct approaches, both the signal and
the weights are determined jointly as a multivariate optimization problem.
Decoupled approaches optimize one variable at a time while the other
variable is held fixed, and then alternate. Direct algorithms include the joint
EKF algorithm (see Section 5.1), which attempts to minimize the cost
sequentially by combining the signal and weights into a single (joint) state
vector. The decoupled approaches are elaborated below.
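To make the joint cost of Eq. (5.50) concrete, the sketch below evaluates it for a scalar time series, assuming $C = 1$, a scalar process-noise variance, and a known initial state `x0`. The function name `joint_cost` and its arguments are hypothetical. Holding `w` fixed and varying `xs`, or holding `xs` fixed and varying `w`, corresponds to the two halves of a decoupled scheme.

```python
def joint_cost(xs, ys, x0, w, f, sigma_n2, sigma_v2):
    """Scalar joint cost: measurement misfit plus state-prediction misfit."""
    cost = 0.0
    x_prev = x0
    for x_k, y_k in zip(xs, ys):
        x_pred = f(x_prev, w)                    # x_k^- = F(x_{k-1}, w)
        cost += (y_k - x_k) ** 2 / sigma_n2      # keeps x_k near the data
        cost += (x_k - x_pred) ** 2 / sigma_v2   # keeps x_k near the model
        x_prev = x_k
    return cost


# Usage with a linear model f(x, w) = w * x.
j = joint_cost([1.0, 2.0], [1.5, 2.0], 0.0, 1.0,
               lambda x, w: w * x, sigma_n2=0.25, sigma_v2=0.5)
```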
Decoupled Estimation To minimize $J^j(x_1^N, w)$ with respect to the
signal, the cost function is evaluated using the current estimate
$$\cdots \frac{(y_k - Cx_k)^2}{\sigma_n^2} + (x_k - \hat{x}_k^-)^T (R^v)^{-1} (x_k - \hat{x}_k^-) \Big]. \tag{5.51}$$
$\cdots_{k-1}, w)$. Again, this results in a straightforward
substitution in Eq. (5.50):
$$J^j(\hat{x}_1^N, w) = \sum_{k=1}^{N} \Big[ \frac{(y_k - C\hat{x}_k)^2}{\sigma_n^2} + (\hat{x}_k - \cdots$$
$$J_i^j(\hat{x}_1^N, w) = \sum_{k=1}^{N} (\hat{x}_k - \hat{x}_k^-)^T (R^v)^{-1} (\hat{x}_k \cdots$$
$\cdots\,\hat{x}_1^N, \hat{w})$ could conceivably be
obtained as a solution. Nonetheless, this is essentially the approach used
in [14] for robust prediction of time series containing outliers.
In the decoupled approach to joint estimation, by separately minimizing
each cost with respect to its argument, the values are found that maximize
(at least locally) the joint conditional density function. Algorithms that fall
into this class include a sequential two-observation form of the dual EKF
algorithm [21], and the errors-in-variables (EIV) method applied to batch-
style minimization [18, 19]. An alternative approach, referred to as error
coupling, makes the extra step of taking the errors in the estimates into
account. However, this error-coupled approach (investigated in [22]) does
not appear to perform reliably, and is not described further in this chapter.
5.3.2 Marginal Estimation Methods
Recall that in marginal estimation, the joint density function is expanded
as
$$\rho_{x_1^N w \mid y_1^N} = \rho \cdots$$
because both factors also depend on $w$, maximizing the second ($\rho_{w \mid y_1^N}$)
alone with respect to $w$ is not the same as maximizing the joint density
$\rho_{x_1^N w \mid y_1^N}$ with respect to $w$. Nonetheless, the resulting estimates $\hat{w}$ are
consistent and unbiased, if conditions of sufficient excitation are met [37].
The marginal estimation approach is exemplified by the maximum-likelihood
approaches [8, 9] and EM approaches [11, 12]. Motivation for
these methods usually comes from considering only the marginal density
$\rho_{w \mid y_1^N}$ to be the relevant quantity to maximize, rather than the joint density
$\rho_{x_1^N w \mid y \cdots}$
$$\cdots \tag{5.55}$$
If there is no prior information on $w$, maximizing this posterior density is
equivalent to maximizing the likelihood function $\rho_{y_1^N \mid w}$. Assuming Gaussian
statistics, the chain rule for conditional probabilities can be used to
express this likelihood function as:
$$\rho_{y_1^N \mid w} = \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi\sigma_{e_k}^2}} \exp\Big[-\frac{(y_k - y \cdots$$
likelihood cost function:
$$J^{ml}(w) = \sum_{k=1}^{N} \left[ \log(2\pi\sigma_{e_k}^2) + \frac{(y_k - y_{k|k-1})^2}{\sigma_{e_k}^2} \right]. \tag{5.58}$$
Note that the log-likelihood function takes the same form whether the
measurement noise is colored or white. In evaluating this cost function,
the term $y_{k|k-1} = C \cdots$
$$\cdots(w) = \sum_{k=1}^{N} (y_k - y_{k|k-1})^2. \tag{5.59}$$
The basic dual EKF algorithm described in the previous section minimizes
this simplified cost function with respect to the weights $w$, and is an
example of a recursive prediction error algorithm [6, 19]. While questionable
from a theoretical perspective, these algorithms have been shown
in the literature to be quite useful. In addition, they benefit from reduced
computational cost, because the derivative of the variance $\sigma_{e_k}^2$ with respect
to $w$ is not computed.
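The difference between the maximum-likelihood cost of Eq. (5.58) and the simplified prediction error cost of Eq. (5.59) can be seen in a small sketch. The function names are hypothetical; `errors` holds the prediction errors $y_k - y_{k|k-1}$ and `variances` holds the corresponding $\sigma_{e_k}^2$, both assumed to be supplied by a state filter.

```python
import math


def ml_cost(errors, variances):
    """Maximum-likelihood cost: log-variance term plus normalized error."""
    return sum(math.log(2.0 * math.pi * s2) + e * e / s2
               for e, s2 in zip(errors, variances))


def prediction_error_cost(errors):
    """Simplified cost: the variance is treated as independent of w."""
    return sum(e * e for e in errors)


# Usage with two prediction errors and their variances.
j_ml = ml_cost([1.0, 2.0], [1.0, 4.0])
j_pe = prediction_error_cost([1.0, 2.0])
```

Because `prediction_error_cost` never touches the variances, its gradient with respect to the weights needs no derivative of $\sigma_{e_k}^2$, which is the computational saving noted above.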
EM Algorithm Another approach to maximizing $\rho_{w \mid y_1^N}$ is offered by the
expectation-maximization (EM) algorithm [10, 12, 38]. The EM algorithm
gives
$$\log \rho_{y_1^N \mid w} = E_{X|YW}[\log \rho_{y_1^N x_1^N \mid w} \mid y_1^N, \hat{w}] - E_{X|YW}[\log \rho_{x_1^N \mid w y_1^N} \cdots$$
$\cdots_{\mid w} \mid y_1^N, \hat{w}]$ with respect to $w$, each time setting $\hat{w}$ to the new
maximizing value. The procedure results in maximizing the original
marginal density $\rho_{y_1^N \mid w}$.
For the white-noise case, it can be shown (see [12, 22]) that the EM cost
function is
$$J^{em} = E_{X|YW}\Big[ \sum_{k=1}^{N} \log(2\pi\sigma_n^2) + \cdots \,\Big|\, y_1^N, \hat{w} \Big], \tag{5.62}$$
where $x_k^- \triangleq F(x_{k-1}, w)$, as before. The evaluation of this expectation is
computable on a term-by-term basis (see [12] for the linear case).
However, for the sake of simplicity, we present here the resulting
3
Jensen's inequality states that $E[g(x)] \le g(E[x])$ for a concave function $g(\cdot)$.
expression for the special case of time-series estimation, represented in
Eq. (5.37). As shown in [22], the expectation evaluates to
$$\cdots \frac{(\hat{x}_{k|N} - \hat{x}_{k|N}^-)^2 + p_{k|N} - 2p_{k|N}^{\dagger} + p_{k|N}^-}{\sigma_v^2} \Big], \tag{5.63}$$
where $\hat{x}_{k|N}$ and $p_{k|N}$ are defined as the conditional mean and variance of $x_k$, conditioned on all the
data. Again we see that determining state estimates is a necessary step to
determining the weights. In this case, the estimates $\hat{x}_{k|N}$ are found by
minimizing the joint cost $J^j(x_1^N, \hat{w})$, which can be approximated using an
extended Kalman smoother. A sequential version of EM can be implemented
by replacing $\hat{x}_{k|N}$ with the usual causal estimates $\hat{x}_k$, found using
the EKF.
Summary of Cost Functions The various cost functions given in this
section are summarized in Table 5.4. No explicit signal estimation cost is
given for the marginal estimation methods, because signal estimation is

$\cdots, \hat{w})$    Joint signal    $\rho_{x_1^N w \mid y_1^N}$    (5.51)
$J^j(\hat{x}_1^N, w)$    Joint weight    $\rho_{x_1^N w \mid y_1^N}$    (5.52)
$J_i^j(\hat{x} \cdots$
5.3.3 Dual EKF Algorithms
In this section, we show how the dual EKF algorithm can be modified to
minimize any of the cost functions discussed earlier. Recall that the basic
dual EKF as presented in Section 5.2.3 minimized the prediction error cost
of Eq. (5.59). As was shown in the last section, all approaches use the
same joint cost function for the state-estimation component. Thus, the
state EKF remains unchanged. Only the weight EKF must be modified.
We shall show that this involves simply redefining the error term $\epsilon_k$.
To develop the method, consider again the general state-space formulation
for weight estimation (Eq. (5.11)):
$$w_{k+1} = w_k + r_k, \tag{5.64}$$
$$d_k = G(x_k, w_k) + e_k. \tag{5.65}$$
We may reformulate this state-space representation as
$w \cdots$
$$\cdots, w)]^T (R^e)^{-1} [d_t - G(x_t, w)] = \sum_{t=1}^{k} J_t.$$
However, if we consider the modified-Newton algorithm interpretation, it can be
shown [22] that the EKF weight filter is also equivalent to the recursion
$$\hat{w}_k = \hat{w}_k^- + P_{w_k} \cdots$$
and
$$P_{w_k}^{-1} = (\lambda^{-1} P_{w_{k-1}})^{-1} + (C_k^w)^T (R^e)^{-1} C_k^w. \tag{5.72}$$
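For a single scalar weight, the information-form recursion of Eq. (5.72) can be checked numerically against the familiar gain-based covariance update. The sketch below is illustrative only; the function names are hypothetical, and the gain form assumes the time update $P_{w_k}^- = \lambda^{-1} P_{w_{k-1}}$ implied by Eq. (5.72).

```python
def p_information_form(p_prev, c, r_e, lam):
    """Scalar Eq. (5.72): inverse variances (information) add."""
    return 1.0 / (lam / p_prev + c * c / r_e)


def p_gain_form(p_prev, c, r_e, lam):
    """Equivalent Kalman form: inflate by 1/lam, then apply the gain."""
    p_pred = p_prev / lam
    k = p_pred * c / (c * p_pred * c + r_e)
    return (1.0 - k * c) * p_pred


# Usage: both forms should agree up to rounding error.
p_a = p_information_form(2.0, 0.7, 0.5, 0.98)
p_b = p_gain_form(2.0, 0.7, 0.5, 0.98)
```

The agreement follows from the matrix inversion lemma; in the scalar case both expressions reduce to $p^- r_e / (c^2 p^- + r_e)$.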
The weight update in Eq. (5.68) is of the form $\hat{w} \cdots$
terms $J_k = \epsilon_k^T \epsilon_k$. This basic idea was presented by Puskorius and Feldkamp
[40] for minimizing an entropic cost function; see also Chapter 2.
Note that $J_k = \epsilon_k^T \epsilon_k$ does not uniquely specify $\epsilon_k$, which can be vector-valued.
The error must be chosen such that the gradient and inverse
Hessian approximations (Eqs. (5.70) and (5.72)) are consistent with the
desired batch cost.
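A small sketch shows how a vector-valued error term reproduces a two-part cost, and why the choice is not unique. The stacking used here follows the joint-cost error term of Table 5.5 for the scalar case; the function names and the argument `x_tilde_k` (standing for $\hat{x}_k - \hat{x}_k^-$) are hypothetical.

```python
def joint_error_term(e_k, x_tilde_k, sigma_n, sigma_v):
    """Stack the normalized prediction error and state residual."""
    return [e_k / sigma_n, x_tilde_k / sigma_v]


def instantaneous_cost(eps):
    """J_k = eps^T eps for a vector-valued error term."""
    return sum(component * component for component in eps)


# Usage: J_k equals e_k^2 / sigma_n^2 + x_tilde_k^2 / sigma_v^2.
eps = joint_error_term(1.0, 2.0, 0.5, 2.0)
j_k = instantaneous_cost(eps)

# Flipping the sign of a component leaves J_k unchanged, so the cost alone
# does not determine the error term (or the sign of its gradient).
j_k_flipped = instantaneous_cost([-eps[0], eps[1]])
```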
In the following sections, we give the exact specification of the error
term (and corresponding gradient $C_k^w$) necessary to modify the dual EKF
algorithm to minimize the different cost functions. The original set of dual
EKF equations given in Table 5.3 remains the same, with only $\epsilon_k$ being
redefined. Note that for each case, the full evaluation of $C \cdots$
$\tilde{\hat{x}}_k \triangleq (\hat{x}_k - \hat{x}_k^-)$,
Note that this dual EKF algorithm represents a sequential form of the
decoupled approach to joint optimization; that is, the two EKFs minimize
the overall joint cost function by alternately optimizing one argument at a
time while the other argument is fixed. A direct approach found using the
joint EKF is described later in Section 5.3.4.
Marginal Estimation Forms–Maximum-Likelihood Cost The
corresponding weight cost function (see Eq. (5.58)) and error terms are
given in Table 5.6, where
$$e_k = y \cdots$$
$$\cdots \mid \{y_t\}_1^{k-1}, w] \tag{5.75a}$$
$$= E[(n_k + x_k - \hat{x}_k^-)^2 \mid \{y_t\}_1^{k-1}, w] \tag{5.75b}$$
$$= \sigma_n^2 + C P_k^- C \cdots$$
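$$\cdots\,)^2 / \sigma_{e_k}^2 \Big], \qquad \epsilon_k \triangleq \begin{bmatrix} (l_{e,k})^{1/2} \\ \sigma_{e_k}^{-1} e_k \end{bmatrix}, \qquad C_k^w = \begin{bmatrix} -\tfrac{1}{2} \cdots \nabla_w^T (\sigma_{e_k}^2) \\ \cdots \end{bmatrix}.$$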
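Table 5.5 Joint cost function observed error terms for the dual EKF
weight filter
$$J^j(\hat{x}_1^k, w) = \sum \cdots$$
$$\epsilon_k \triangleq \begin{bmatrix} \sigma_n^{-1} e_k \\ \sigma_v^{-1} \tilde{\hat{x}}_k \end{bmatrix}, \quad \text{with} \quad C_k^w = -\begin{bmatrix} \sigma_n^{-1} \nabla_w^T e_k \\ \cdots \end{bmatrix}$$
$\tilde{\hat{x}}_{k|k} = \hat{x}_k - \hat{x}_{k|k}^-$.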
Note that $J_k^{em}(w)$ was specified by dropping terms in Eq. (5.63) that are
independent of the weights (see [22]). While $\hat{x}_k$ are found by the usual
state EKF, the variance terms $p_{k|k}^{\dagger}$, and $p^- \cdots$
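$$\cdots(w) = \sum_{k=1}^{N} e_k^2 = (y_k - \hat{x}_k^-)^2, \tag{5.76}$$
$$\epsilon_k \triangleq e_k = (y_k - \hat{x}_k^-), \qquad C \cdots$$
$$\cdots \frac{(\hat{x}_k - \hat{x}_{k|k}^-)^2 - 2p_{k|k}^{\dagger} + p_{k|k}^-}{\sigma_v^2}, \tag{5.77}$$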
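$$\epsilon_k = \begin{bmatrix} \sigma_v^{-1} \tilde{\hat{x}}_k \\ \cdots \end{bmatrix}, \qquad \cdots = \begin{bmatrix} -\dfrac{1}{\sigma_v} \nabla_w^T \tilde{\hat{x}}_{k|k} \\ -\dfrac{\sqrt{-2}\,(p_{k|k}^{\dagger})^{-1/2}}{2\sigma_v} \nabla_w^T p \cdots \\ \cdots \end{bmatrix}.$$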