4
LEAST-SQUARES AND MINIMUM–
VARIANCE ESTIMATES FOR LINEAR
TIME-INVARIANT SYSTEMS
4.1 GENERAL LEAST-SQUARES ESTIMATION RESULTS
In Section 2.4 we developed (2.4-3), relating the 1 Â 1 measurement matrix
Y
n
to the 2 Â 1 state vector X
n
through the 1 Â 2 observation matrix M as given
by
Y
n
¼ MX
n
þ N
n
ð4:1-1Þ
It was also pointed out in Sections 2.4 and 2.10 that this linear time-invariant
equation (i.e., M is independent of time or equivalently n) applies to more
general cases that we generalize further here. Specifically we assume Y
n
is a
1 Âðr þ 1Þ measurement matrix, X
n
a1Â m state matrix, and M an
ðr þ 1ÞÂm observation matrix [see (2.4-3a)], that is,
Y
n
¼
1
ðtÞ
.
.
.
x
mÀ1
ðtÞ
2
6
6
6
6
4
3
7
7
7
7
5
ð4:1-1bÞ
155
Tracking and Kalman Filtering Made Easy. Eli Brookner
Copyright # 1998 John Wiley & Sons, Inc.
ISBNs: 0-471-18407-1 (Hardback); 0-471-22419-7 (Electronic)
and in turn
N
n
¼
their derivatives as given by (2.4-6). Alternately, if we were tracking only a one-
dimensional coordinate, then the states could be the coordinate x itself followed
by its m derivatives, that is,
X
n
¼ Xðt
n
Þ¼
x
Dx
.
.
.
D
m
x
2
6
6
4
3
7
7
5
n
ð4:1-2Þ
where
D
j
x
n
¼
x
n
_
x
n
x
n
2
4
3
5
ð4:1-3Þ
and
È ¼
1 TT
2
=2
01 T
00 1
2
4
3
5
ð4:1-4Þ
Assume that measurements such as given by (4.1-1a) were also made at the L
preceding times at n À 1; ...; n À L. Then the totality of L þ 1 measurements
156
MX
n
--------
MX
nÀ1
---------
.
.
.
---------
MX
nÀL
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
7
7
7
5
ð4:1-5Þ
Assume that the transition matrix for transitioning from the state vector X
nÀ1
at time n À 1 to the state vector X
n
at time n is given by È [see (2.4-1) of
Section 2.4, which gives È for a constant-velocity trajectory; see also Section
5.4]. Then the equation for transitioning from X
nÀi
to X
n
is given by
X
n
¼ È
i
X
nÀi
¼ È
i
X
nÀi
ð4:1-6Þ
where È
i
------
.
.
.
Y
nÀL
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
¼
MX
n
----------
MÈ
À1
X
n
N
n
------
N
nÀ1
-------
.
.
.
-------
N
nÀL
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
7
7
7
7
5
|fflfflfflfflffl{zfflfflfflfflffl}
1
¼
M
-------
MÈ
À1
--------
.
.
.
-------
MÈ
ÀL
2
6
6
6
6
6
6
6
6
4
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
|fflfflfflfflffl{zfflfflfflfflffl}
1
9
>
>
>
>
>
>
>
>
=
.
.
.
-----
Y
nÀL
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
N
ðnÞ
¼
N
5
ð4:1-11aÞ
T ¼
M
-------
MÈ
À1
-------
.
.
.
-------
MÈ
ÀL
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
time. Correspondingly M is the observation matrix [see (2.4-3a)] when a
measurement is available at only one time whereas T is the observation matrix
when measurements are available from L þ 1 times. Both observation matrices
transform the state vector X
n
into the observation space. Specifically X
n
is
transformed to a noise-free Y
n
in (4.1-1) when measurements are available at
one time or to Y
ðnÞ
in (4.1-11) when measurements are available at Lþ 1 time
instances. We see that the observation equation (4.1-11) is identical to that of
(4.1-1) except for T replacing M.
[In Part 1 and (4.1-4), T was used to represent the time between
measurements. Here it is used to represent the observation matrix given by
(4.1-11b). Unfortunately T will be used in Part II of this text to represent these
two things. Moreover, as was done in Sections 1.4 and 2.4 and as shall be done
later in Part II, it is also used as an exponent to indicate the transpose of a
matrix. Although this multiple use for T is unfortunate, which meaning T has
should be clear from the context in which it is used.]
158
LEAST-SQUARES AND MINIMUM–VARIANCE ESTIMATES
By way of example of T, assume L ¼ 1 in (4.1-11a) and (4.1-11b); then
Y
ðnÞ
¼
Y
x
n
ð4:1-14bÞ
On comparing (4.1-14a) and (4.1-14b) with (4.1-8) we see that we can rewrite
(4.1-14a) and (4.1-14b) as (4.1-8) with X
n
given by (2.4-1a) and
È
À1
¼
1 ÀT
01
ð4:1-15Þ
We can check that È
À1
is given by (4.1-15) by verifying that
ÈÈ
À1
¼ I ð4:1-16Þ
where I is the identify matrix and È is given by (2.4-1b).
As done in Section 2.4, assume a radar sensor with only the target
range being observed, with x
n
representing the target range. Then M is given by
(2.4-3a) and Y
n
and N
n
are given by respectively (2.4-3c) and (2.4-3b).
5
ð4:1-18Þ
GENERAL LEAST-SQUARES ESTIMATION RESULTS
159
It is instructive to write out (4.1-11) for this example. In this case (4.1-11)
becomes
Y
ðnÞ
¼
y
n
y
nÀ1
y
nÀ2
.
.
.
y
0
2
6
6
6
6
6
4
3
7
7
þ
n
nÀ1
nÀ2
.
.
.
0
2
6
6
6
6
6
4
3
7
7
7
7
7
5
ð4:1-19Þ
where use was made of (2.4-3b) and (2.4-3c), which hold for arbitrary n;
specifically,
Y
x
n
at time n and the measurement error
nÀi
. The above example thus gives
us a physical feel for the observation matrix T. For the above example, the
ði þ 1Þst row of T physically in effect first transforms X
n
back in time to time
n À i through the inverse of the transition matrix È to the ith power, that is,
through È
Ài
by premultiplying X
n
to yield X
nÀi
, that is,
X
nÀi
¼ È
Ài
X
n
ð4:1-23Þ
Next X
nÀi
is effectively transformed to the noise-free Y
nÀi
measurement at time
n À i by means of premultiplying by the observation matrix M to yield the
x
n
T þ
x
n
ð
1
2
T
2
Þð4:1-25aÞ
_
x
nÀ1
¼
_
x
n
À
x
n
T ð4:1-25bÞ
x
nÀ1
¼
x
again representing target
range. Then M is given by
M ¼½100ð4:1-27Þ
and Y
n
and N
n
are given by respectively (2.4-3c) and (2.4-3b). Substituting
(4.1-26) and (4.1-27) into (4.1-11b) yields finally, for L ¼ n,
T ¼
10 0
1 ÀT
1
2
T
2
1 À2T
1
2
ð2TÞ
2
.
.
.
.
.
.
.
.
.
y
0
2
6
6
6
6
6
4
3
7
7
7
7
7
5
¼
10 0
1 ÀT
1
2
T
2
1 À2T
1
2
ð2TÞ
2
.
.
n
x
n
2
4
3
5
þ
n
nÀ1
nÀ2
.
.
.
0
2
6
6
6
6
6
4
3
7
7
ðnÞ
ð4:1-30Þ
where W is a row matrix of weights, that is, W ¼½w
1
; w
2
; ...; w
s
, where s is
the dimension of Y
ðnÞ
; see (4.1-10) and (4.1-11a). For the least-squares estimate
GENERAL LEAST-SQUARES ESTIMATION RESULTS
161
(LSE) we are looking for, we require that the sum of squares of errors be
minimized, that is,
eðX
Ã
n;n
Þ¼e
n
¼½Y
ðnÞ
À TX
Ã
n;n
T
½ Y
ðnÞ
.
.
.
y
0
2
6
6
6
6
6
4
3
7
7
7
7
7
5
ð4:1-33Þ
and the estimate of the state vector X
n
at time n given by
X
Ã
n;n
¼
x
Ã
n;n
Ã
n;n
¼ x
Ã
nÀi;n
ð4:1-35Þ
as it should. Hence
x
Ã
n;n
x
Ã
nÀ1;n
x
Ã
nÀ2;n
.
.
.
x
Ã
0;n
2
6
6
6
6
6
6
4
n;n
_
x
Ã
n;n
¼ TX
Ã
n;n
ð4:1-36Þ
162
LEAST-SQUARES AND MINIMUM–VARIANCE ESTIMATES
Substituting (4.1-33) and (4.1-36) into (4.1-31) yields
e
n
¼ eðX
Ã
n;n
Þ¼
X
n
i¼0
ðy
nÀi
À x
Ã
nÀi; n
Þ
2
ð4:1-37Þ
Ã
n;n
, and
its slope at time n,
_
x
Ã
n;n
. In constrast in Section 1.2.6 we represented the line
fitting the data by its ordinate and slope at time n ¼ 0, that is, by x
Ã
0
and
v
Ã
0
¼
_
x
Ã
0
, respectively. A line is defned by its ordinate and slope at any time.
Hence it does not matter which time we use, time n ¼ n or time n ¼ 0. (The
covariance of the state vector, however, does depend on what time is used.) The
state vector estimate gives the line’s ordinate and slope at some time. Hence
the state vector at any time defines the estimated line trajectory. At time n ¼ 0
the estimated state vector is
X
Ã
0;n
n
----
Y
nÀ1
----
.
.
.
----
Y
1
----
Y
0
2
6
6
6
6
6
6
6
6
6
6
6
6
4
3
7
MX
0
2
6
6
6
6
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
7
7
7
7
5
3
7
7
7
7
7
7
7
7
7
7
7
7
5
ð4:1-40Þ
GENERAL LEAST-SQUARES ESTIMATION RESULTS
163
This in turn becomes
Y
n
----
Y
nÀ1
----
.
.
.
----
Y
1
5
¼
MÈ
n
--------
MÈ
nÀ1
--------
.
.
.
--------
MÈ
--------
M
2
6
6
6
6
6
6
6
6
6
6
6
6
4
3
0
2
6
6
6
6
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
7
7
7
7
5
ð4:1-41Þ
6
6
6
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
7
7
7
7
5
ð4:1-43Þ
In Section 1.2.10 it was indicated that the least-squares fitting line to the data
of Figure 1.2-10 is given by the recursive g–h growing-memory (expanding-
memory) filter whose weights g and h are given by (1.2-38a and 1.2-38b). The
n;n
given above by (4.1-30) and
(4.1-32) requires a matrix inversion in the calculation of the weights. In Section
5.3 it is shown how the least-squares polynomial fit can be obtained without a
matrix inversion. This is done by the use of the powerful discrete-time
orthogonal Legendre polynomials. What is done is that the polynomial fitof
degree m of (4.1-44) is expressed in terms of the powerful discrete-time
orthogonal Legendre polynomials (DOLP) having degree m. Specifically
(4.1-44) is written as
xðrÞ _¼ p
Ã
ðrÞ¼
X
m
k¼0
k
k
ðrÞð4:1-45Þ
where
k
ðrÞ is the normalized discrete Legendre polynomial (to be defined in
Section 5.3) of degree k and r is an integer time index, specifically, t ¼ rT, and
the
k
are constants that specify the fittoxðrÞ. Briefly
k
ðrÞ is a polynomial in r
of degree k with
, ... instead of L þ 1 measurements. In this case, the discounted least-
squares weighted sum is minimized as was done in (1.2-34) [see (7.1-2)] to
yield the fading-memory filter. Again the best-fitting polynomial of the form,
given by (4.1-45) is found to the data. In Section 1.2.6, for the constant-velocity
target, that is m ¼ 1 in (4.1–44), the best-fitting polynomial, which is a straight
line in this case, was indicated to be given by the fading memory g–h filter,
whose weights g and h are given by (1.2-35a) and (1.2-35b). To find the best-
fitting polynomial, in general the estimating polynomial is again approximated
by a sum of discrete-time orthogonal polynomials, in this case the orthonormal
discrete Laguerre polynomials, which allow the discounted weightings for the
semi-infinite set of data. The resulting best-fitting discounted least-squares
GENERAL LEAST-SQUARES ESTIMATION RESULTS
165
polynomial fit is given by (7.2-5) in recursive form for the case where the
polynomial is of arbitrary degree m.Form ¼ 1, this result yields the fading-
memory g–h filter of Section 1.2.6. Corresponding convenient explicit results
for this recursive fading-memory filter for m ¼ 0, ..., 4 are given in Table 7.2-2.
In reference 5 (4.1-32) is given for the case of a time-varying trajectory
model. In this case M, T, and È all become a function of time (or equivalently n)
and are replaced by M
n
and T
n
and Èðt
n
; t
nÀ1
Þ, respectively; see pages 172,
173, and 182 of reference 5 and Chapter 15 of this book, in which the time-
varying case is discussed.
n;n
¼ T
T
½Y
ðnÞ
À TX
Ã
n;n
¼0 ð4:1-46Þ
Solving for X
Ã
n;n
yields (4.1-32) as we desired to show.
In reference 5 (pp. 181, 182) the LSE weight given by (4.1-32) is derived by
simply putting (4.1-31) into another form analogous to ‘‘completing the
squares’’ and noting that eðX
Ã
n;n
Þ is minimized by making the only term
depending on W zero, with this being achieved by having W be given by
(4.1-32). To give physical insight into the LSE, it is useful to derive it using a
geometric development. We shall give this derivation in the next section. This
derivation is often the one given in the literature [75–77]. In Section 4.3 (and
Chapter 10) it is this geometric interpretation that we use to develop what is
called the voltage-processing method for obtaining a LSE without the use of the
matrix inversion of (4.1-32).
166
LEAST-SQUARES AND MINIMUM–VARIANCE ESTIMATES
4.2 GEOMETRIC DERIVATION OF LEAST-SQUARES
SOLUTION
ð4:2-1Þ
so that
t
1
¼
t
11
t
21
t
31
2
4
3
5
and t
2
¼
t
12
t
22
t
32
2
4
3
5
ð4:2-2Þ
X
10
1 ÀT
1 À2T
2
4
3
5
ð4:2-5Þ
and
t
1
¼
1
1
1
2
4
3
5
t
2
¼
0
ÀT
À2T
2
4
3
5
ð4:2-6Þ
space determined by the m
0
column vectors of T ). Typically Y
ð3Þ
is not in this
plane due to the measurement noise error N
ðnÞ
; see (4.1-11).
Let us go back to the case of arbitrary dimension s for the column space of T
and consider the vector
p
T
¼
p
1
p
2
.
.
.
p
s
2
6
6
6
4
3
7
7
ðnÞ
À TX
n
Þð4:2-9Þ
Applying (4.2-8) to (4.2-9) gives, for the three-dimensional case being
considered,
eðX
n
Þ¼
X
3
i¼1
ðy
i
À p
i
Þ
2
ð4:2-10Þ
Figure 4.2-1 Projection of data vector Y
ð3Þ
onto column space of 3 Â 2 T matrix.
Used to obtain least-squares solution in three-dimensional space. (After Strang [76].)
168
LEAST-SQUARES AND MINIMUM–VARIANCE ESTIMATES
But this is nothing more than the Euclidean distance between the endpoints of
the vectors p
T
and Y
ð3Þ
À TX
3
¼ Y
ð3Þ
À p
T
ð4:2-11Þ
is perpendicular to the plane T
p
when the error term eðX
n
Þ is minimized. Then
X
3
¼ X
Ã
3;3
, where X
Ã
3;3
is such that
ðY
ð3Þ
À TX
Ã
3;3
Þ?T
p
ð4:2-12Þ
We now obtain an expression for X
T
Y
ð3Þ
À T
T
TX
Ã
3;3
Þ¼0 ð4:2-14Þ
Because (4.2-14) must be true for all z, it follows that it is necessary that
T
T
TX
Ã
3;3
¼ T
T
Y
ð3Þ
ð4:2-15Þ
The above in turn yields
X
Ã
3;3
¼ðT
T
TÞ
À1
T
T
ðnÞ
¼ TX
n
(in the least-squares sense) when T is nonsingular, as
it is when s > m
0
, so that T
À1
does not exist and X
n
¼ T
À1
Y
ðnÞ
does not
provide a solution for (4.1-31). The case where s > m
0
is called the
overdeterministic case. It is the situation where we have more measurements
s than unknowns m in our state vector. Also the LSE given by (4.2-16), or
equivalently (4.1-30) with W given by (4.1-32), is referred to as the normal-
equation solution [75, 76, 79–82]. Actually, to be precise, the normal equation
are given by a general form of (4.2-15) given by
T
T
TX
Ã
n;n
¼ T
T
T
t
ð4:2-20Þ
By way of example consider the case where
Y
nÀi
¼ MX
nÀi
þ N
nÀi
ð4:2-21Þ
with each term of the above being 1 Â 1 matrices given by
Y
nÀ1
¼½y
nÀi
ð4:2-21aÞ
M ¼½1ð4:2-21bÞ
X
nÀi
¼½x
nÀi
ð4:2-21cÞ
N
nÀi
¼½
nÀi
ð4:2-21dÞ
so that
y
¼
1
s
X
s
i¼1
y
i
ð4:2-23Þ
which is the sample mean of the y
i
’s, as expected.
Before proceeding let us digress for a moment to point out some other
interesting properties relating to the geometric development of the LSE. We
start by calculating the vector p
T
for the case X
3
¼ X
Ã
3;3
. Specifically,
substituting X
Ã
3;3
given by (4.2-16) into (4.2-8) yields
p
T
¼ TðT
T
projection matrix [76]. [Note that for the projection matrix of (4.2-25) a capital
P is used whereas for the column matrix p
T
of (4.2-8), which represents a
vector in the space being projected onto, a lowercase p is used and the subscript
T is added to indicate the space projected onto.]
The matrix I À P, where I is the identity matrix (diagonal matrix whose
entries equal one), is also a projection matrix. It projects Y
ð3Þ
onto the space
perpendicular to T
p
. In the case of Figure 4.2-1 it would project the vector
Y
ð3Þ
onto the line perpendicular to the plane T
p
forming the vector
Y
ð3Þ
À TX
3
¼ Y
ð3Þ
À p
T
.
The projection matrix P has two important properties. First it is symmetric
[76], which means that
P
^
t
j
¼
1 for i ¼ j
0 for i 6¼ j
ð4:2-28Þ
Generally the t
i
are not unitary and orthogonal; see, for example, (4.1-28) and
(4.1-18). However, we shall show in Section 4.3 how to transform T so that the
t
i
are orthonormal. For an orthonormal matrix
T
T
T ¼ I ð4:2-29Þ
where I is the identity matrix. When T is orthonormal (4.2-25) becomes, for
arbitrary m
P ¼ TT
T
¼
^
t
1
^
t
T
1
ð4:2-31Þ
and
p
t
¼
^
t
1
^
t
T
1
Y
ðnÞ
ð4:2-32Þ
Here p
t
is the projection of Y
ðnÞ
onto the one-dimensional space T
p
, that is, onto
the unit vector
^
t
1
.
When T is composed of m orthonormal vectors
^
t
0
^
t
T
m
0
Y
ðnÞ
ð4:2-33Þ
that is, p
T
is the sum of the projections of Y
ðnÞ
onto the orthonormal vectors
t
1
; ...; t
m
0
. Finally when T is orthonormal so that (4.2-29) applies, (4.2-16)
becomes, for arbitrary m
0
,
X
Ã
n;n
¼ T
T
Y
ðnÞ
ðnÞ
Þ
^
t
1
ð4:2-35Þ
As implied above with respect to the discussion relative to Figure 4.2-1, Y
ðnÞ
and
^
t
1
can be interpreted as s-dimensional vectors in hyperspace. Physically,
in the above,
^
t
T
1
Y
ðnÞ
represents the amplitude of the projection of Y
ðnÞ
onto
the unit vector
^
t
1
. The direction of the projection of Y
ðnÞ
onto
with
^
t
1
. This is given by
^
t
1
Á Y
ðnÞ
¼k
^
t
1
kÁkY
ðnÞ
k cos
¼kY
ðnÞ
k cos
ð4:2-36Þ
where use was made in the above of the fact that
^
t
1
is unitary so that
k
^
t
1
2
þ t
31
y
3
ð4:2-37Þ
For this case t
i1
of (4.2-2) is the ith coordinate of the unit vector
^
t
1
in some
three-dimensional orthogonal space; let us say x, y, z. In this space the
coordinates x, y, z themselves have directions defined by respectively the unit
vectors i, j, k given by
i ¼
1
0
0
2
4
3
5
j ¼
0
1
0
2
4
1
k t
1
k
ð4:2-39Þ
GEOMETRIC DERIVATION OF LEAST-SQUARES SOLUTION
173
But the magnitude of t
1
, also called its Euclidean norm, is given by
k t
1
k¼
ffiffiffiffiffiffiffiffi
t
T
1
t
1
q
ð4:2-40Þ
Hence (4.2-35) becomes, for t
1
not unitary,
p
t
¼
t
T
1
1
t
1
¼
t
T
1
Y
ðnÞ
t
1
kt
1
k
2
ð4:2-41Þ
This is the situation we had in (4.2-20). Thus we again see physically that the
least-squares estimate
^
X
Ã
n;n
of (4.2-20) is the projection of Y
ðnÞ
onto the
direction of the nonunitary vector t
1
as it should be based on the discussion
relative to Figure 4.2-1.
4.3 ORTHONORMAL TRANSFORMATION AND