8
Adaptive Filtering
As discussed in previous chapters, filtering refers to the linear process designed to alter
the spectral content of an input signal in a specified manner. In Chapters 5 and 6, we
introduced techniques for designing and implementing FIR and IIR filters for given
specifications. Conventional FIR and IIR filters are time-invariant. Theyperform linear
operations on an input signal to generate an output signal based on the fixed coeffi-
cients. Adaptive filters are time varying, filter characteristics such as bandwidth and
frequencyresponse change with time. Thus the filter coefficients cannot be determined
when the filter is implemented. The coefficients of the adaptive filter are adjusted
automaticallybyan adaptive algorithm based on incoming signals. This has the import-
ant effect of enabling adaptive filters to be applied in areas where the exact filtering
operation required is unknown or is non-stationary.
In Section 8.1, we will review the concepts of random processes that are useful in the
development and analysis of various adaptive algorithms. The most popular least-mean-
square (LMS) algorithm will be introduced in Section 8.2. Its important properties will be
analyzed in Section 8.3. Two widely used modified adaptive algorithms, the normalized
and leakyLMS algorithms, will be introduced in Section 8.4. In this chapter, we introduce
and analyze the LMS algorithm following the derivation and analysis given in [8]. In
Section 8.5, we will brieflyintroduce some important applications of adaptive filtering.
The implementation considerations will be discussed in Section 8.6, and the DSP imple-
mentations using the TMS320C55x will be presented in Section 8.7.
8.1 Introduction to Random Processes
A signal is called a deterministic signal if it can be described preciselyand be reproduced
exactlyand repeatedly. However, the signals encountered in practice are not necessarily
of this type. A signal that is generated in a random fashion and cannot be described by
mathematical expressions or rules is called a random (or stochastic) signal. The signals
in the real world are often random in nature. Some common examples of random
signals are speech, music, and noises. These signals cannot be reproduced and need to
be modeled and analyzed using statistical techniques. We have briefly introduced
probabilityand random variables in Section 3.3. In this section, we will review the
representation of the process as a whole.
8.1.1 Correlation Functions
For manyapplications, one signal is often used to compare with another in order to
determine the similaritybetween the pair, and to determine additional information
based on the similarity. Autocorrelation is used to quantify the similarity between two
segments of the same signal. The autocorrelation function of the random process x(n)is
defined as
r
xx
n, kExnxk: 8:1:1
This function specifies the statistical relation of two samples at different time index n
and k, and gives the degree of dependence between two random variables of n À k
units apart. For example, consider a digital white noise x(n) as uncorrelated random
variables with zero-mean and variance s
2
x
. The autocorrelation function is
r
xx
n, kExnxk ExnExk
0, n T k
s
2
x
, n k.
&
8:1:2
If we subtract the means in (8.1.1) before taking the expected value, we have the
autocovariance function
g
n, kÀm
x
nm
y
k: 8:1:5
Correlation is a veryuseful DSP tool for detecting signals that are corrupted
byadditive random noise, measuring the time delaybetween two signals, determining
the impulse response of a system (such as obtain the room impulse response used in
Section 4.5.2), and manyothers. Signal correlation is often used in radar, sonar, digital
communications, and other engineering areas. For example, in CDMA digital commu-
nications, data symbols are represented with a set of unique key sequences. If one of
these sequences is transmitted, the receiver compares the received signal with every
possible sequence from the set to determine which sequence has been received. In radar
and sonar applications, the received signal reflected from the target is the delayed
version of the transmitted signal. Bymeasuring the round-trip delay, one can determine
the location of the target.
Both correlation functions and covariance functions are extensivelyused in analyzing
random processes. In general, the statistical properties of a random signal such as the
mean, variance, and autocorrelation and autocovariance functions are time-varying
functions. A random process is said to be stationaryif its statistics do not change
with time. The most useful and relaxed form of stationaryis the wide-sense stationary
(WSS) process. A random process is called WSS if the following two conditions are
satisfied:
1. The mean of the process is independent of time. That is,
Exn m
x
, 8:1:6
1. where m
x
is a constant.
0Ex
2
n is equal to the mean-squared value, or the power in the
random process.
In addition, if x(n) is a zero-mean random process, we have
r
xx
0Ex
2
n s
2
x
: 8:1:10
Thus the autocorrelation function of a signal has its maximum value at zero lag.
If x(n) has a periodic component, then r
xx
k will contain the same periodic com-
ponent.
Example 8.1: Given the sequence
xna
n
un,0< a < 1,
the autocorrelation function can be computed as
r
xx
k
I
nÀI
xn kxn
x
Ecos!n 0.
(b) r
xx
kExn kxn Ecos!n !k cos!n
1
2
Ecos2!n !k
1
2
cos!k
1
2
cos!k:
The crosscorrelation function of two WSS processes x(n) and y(n) is defined as
r
xy
kExn kyn: 8:1:11
This crosscorrelation function has the property
r
xy
kr
yx
Àk: 8:1:12
Therefore r
yx
k is simplythe folded version of r
xy
k. Hence, r
1
N
NÀ1
n0
Â
xnÀm
x
Ã
2
: 8:1:14
The sample autocorrelation function is defined as
r
xx
k
1
N À k
NÀkÀ1
n0
xn kxn, k 0, 1, ..., N À 1, 8:1:15
where N is the length of the sequence x(n). Note that for a given sequence of length
N, Equation (8.1.15) generates values for up to N different lags. In practice, we can
onlyexpect good results for lags of no more than 5±10 percent of the length of the
signals.
The autocorrelation and crosscorrelation functions introduced in this section can be
computed using the MATLAB function xcorr in the Signal Processing Toolbox. The
crosscorrelation function r
k
rather than x(n).
The correlation functions represent the time-domain description of the statistics of a
random process. The frequency-domain statistics are represented by the power density
spectrum (PDS) or the autopower spectrum. The PDS is the DTFT (or the z-transform)
of the autocorrelation function r
xx
k of a WSS signal x(n) defined as
P
xx
!
I
kÀI
r
xx
ke
Àj!k
, 8:1:16
or
P
xx
z
I
kÀI
r
xx
kz
Àk
p
Àp
P
xx
!d!: 8:1:19
Thus r
xx
0 represents the average power in the random signal x(n). The PDS is a
periodic function of the frequency !, with the period equal to 2p. We can show (in the
exercise problems) that P
xx
! of a WSS signal is a real-valued function of !.Ifx(n)isa
real-valued signal, P
xx
! is an even function of !. That is,
P
xx
!P
xx
À!8:1:20
or
P
xx
zP
xx
z
À1
: 8:1:21
356
defined as
r
xx
ks
2
x
dkm
2
x
: 8:1:24
The corresponding PDS is given by
P
xx
!s
2
x
2pm
2
x
d!, j!j p: 8:1:25
An important white random signal is called white noise, which has zero mean.
Thus its autocorrelation function is expressed as
r
xx
ks
2
x
dk, 8:1:26
and the power spectrum is given by
P
Hz
2
P
xx
z, 8:1:29
INTRODUCTION TO RANDOM PROCESSES
357
h(n)
H(w)
x(n)
P
xx
(w) P
yy
(w)
y(n)
Figure 8.1 Linear filtering of random processes
where H! is the frequencyresponse of the filter. Therefore the value of the output
PDS at frequency ! depends on the squared magnitude response of the filter and the
input PDS at the same frequency.
Another important relationships between x(n) and y(n) are
m
y
E
I
lÀI
hlxn À l
Taking the z-transform of both sides, we obtain
P
yx
zHzP
xx
z: 8:1:32
Similarly, the relationships between the input and the output signals are
r
xy
k
I
lÀI
hlr
xx
k lhkÃr
xx
Àk8:1:33
and
P
yx
zH
Ã
zP
xx
z: 8:1:34
If the input signal x(n) is a zero-mean white noise with the autocorrelation function
defined in (8.1.26), Equation (8.1.31) becomes
r
yx
(b) r
yy
kE yn kyn
14r
xx
k9r
xx
k À 19r
xx
k 12r
xx
k À 22r
xx
k 2
14s
2
x
if k 0
9s
2
x
if k Æ1
2s
2
x
if k Æ2
0 otherwise.
V
b
y(n). The function of the adaptive algorithm is to adjust the digital filter coefficients to
ADAPTIVE FILTERS
359
x(n) y(n)
d(n)
e(n)
+
−
Adaptive
algorithm
Digital
filter
Figure 8.2 Block diagram of adaptive filter
y(n)
z
−1
z
−1
w
0
(n)
x(n)
x(n−1)
x(n−L+1)
w
L−1
(n)w
1
(n)
Figure 8.3 Block diagram of FIR filter for adaptive filtering
8:2:2
360
ADAPTIVE FILTERING
and the weight vector at time n as
wnw
0
n w
1
n ...w
LÀ1
n
T
: 8:2:3
Then the output signal y(n) in (8.2.1) can be expressed using the vector operation
ynw
T
nxnx
T
nwn: 8:2:4
The filter output y(n) is compared with the desired response d(n), which results in the
error signal
endnÀyndnÀw
T
nxn: 8:2:5
In the following sections, we assume that d(n)andx(n) are stationary, and our objective is
to determine the weight vector so that the performance (or cost) function is minimized.
8.2.2 Performance Function
The general block diagram of the adaptive filter shown in Figure 8.2 updates the
coefficients of the digital filter to optimize some predetermined performance criterion.
The most commonlyused performance measurement is based on the mean-square error
and
r
dx
kEdnxn À k 8:2:9
is the crosscorrelation function between d(n) and x(n). In (8.2.7), R is the input auto-
correlation matrix defined as
R Exnx
T
n
r
xx
0 r
xx
1 ÁÁÁ r
xx
L À 1
r
xx
1 r
xx
0 ÁÁÁ r
xx
L À 2
.
.
.
ÁÁÁ
.
.
x(n)
x(n−1)
w
1
z
−1
++
+
−
d(n)
e(n)
If Ex
2
n 1, Exnxn À 1 0:5, Ed
2
n 4, Ednxn À1, and
Ednxn À 1 1. Find x.
From (8.2.10), R
10:5
0:51
!
, and from (8.2.8), we have P
À1
1
!
.
Therefore from (8.2.7), we obtain
x Ed
2
n À 2p
as the solution to
Rw
o
p: 8:2:12
This system equation defines the optimum filter coefficients in terms of two correlation
functions ± the autocorrelation function of the filter input and the crosscorrelation
function between the filter input and the desired response. Equation (8.2.12) provides a
solution to the adaptive filtering problem in principle. However, in manyapplications,
the signal maybe non-stationary. This linear algebraic solution, w
o
R
À1
p, requires
continuous estimation of R and p, a considerable amount of computations. In addition,
when the dimension of the autocorrelation matrix is large, the calculation of R
À1
may
present a significant computational burden. Therefore a more useful algorithm is
obtained bydeveloping a recursive method for computing w
o
, which will be discussed
in the next section.
To obtain the minimum MSE, we substitute the optimum weight vector w
o
R
À1
p
for w(n) in (8.2.7), resulting in
x
min
xndn À 1. Find w
o
and x
min
Similar to Example 8.2, we can obtain r
xx
0Ex
2
n Ed
2
n 1,
r
xx
1cos!
0
, r
xx
2cos2!
0
, r
dx
0r
xx
1,andr
dx
1r
xx
2.From
(8.2.12), we have
w
0
cos2!
0
2 cos!
0
À1
!
0:
Equation (8.2.7) is the general expression for the performance function of an adaptive
FIR filter with given weights. That is, the MSE is a function of the filter coefficient
vector w(n). It is important to note that the MSE is a quadratic function because the
weights appear onlyto the first and second degrees in (8.2.7). For each coefficient vector
w(n), there is a corresponding (scalar) value of MSE. Therefore the MSE values
associated with w(n) form an L 1-dimensional space, which is commonlycalled the
MSE surface, or the performance surface.
For L 2, this corresponds to an error surface in a three-dimensional space. The
height of xn corresponds to the power of the error signal e(n) that results from filtering
the signal x(n) with the coefficients w(n). If the filter coefficients change, the power in the
error signal will also change. This is indicated bythe changing height on the surface
above w
0
Àw
1
the plane as the component values of w(n) are varied. Since the error
surface is quadratic, a unique filter setting wnw
o
will produce the minimum MSE,
x
r
xx
1 r
xx
0
!
10
01
!
.From
(8.2.8), we have p
r
dx
0
r
dx
1
!
b
0
b
1
!
. From (8.2.7), we get
x Ed
2
n À 2p
T
1
w
2
0
w
2
1
:
The MATLAB script (exam8_7a.m in the software package) is used to plot the error
surface shown in Figure 8.4(a) and the script exam8_7b.m is used to plot the error
contours shown in Figure 8.4(b).
1200
1000
800
600
MSE
400
200
20
15
10
5
0
−5
−10
−15
−20
−30 −20 −10
0
w0
steady-state performance, and the computational complexity.
The steepest-descent method reaches the minimum byfollowing the direction in
which the performance surface has the greatest rate of decrease. Specifically, an algo-
rithm whose path follows the negative gradient of the performance surface. The
steepest-descent method is an iterative (recursive) technique that starts from some initial
(arbitrary) weight vector. It improves with the increased number of iterations. Geomet-
rically, it is easy to see that with successive corrections of the weight vector in the
direction of the steepest descent on the concave performance surface, we should arrive
at its minimum, x
min
, at which point the weight vector components take on their
optimum values. Let x0 represent the value of the MSE at time n 0 with an arbitrary
choice of the weight vector w(0). The steepest-descent technique enables us to descend
to the bottom of the bowl, w
o
, in a systematic way. The idea is to move on the error
surface in the direction of the tangent at that point. The weights of the filter are updated
at each iteration in the direction of the negative gradient of the error surface.
The mathematical development of the method of steepest descent is easilyseen from
the viewpoint of a geometric approach using the MSE surface. Each selection of a filter
weight vector w(n) corresponds to onlyone point on the MSE surface, [wn, xn].
Suppose that an initial filter setting w(0) on the MSE surface, [w0, x0] is arbitrarily
chosen. A specific orientation to the surface is then described using the directional
derivatives of the surface at that point. These directional derivatives quantifythe rate of
change of the MSE surface with respect to the w(n) coordinate axes. The gradient of the
error surface rxn is defined as the vector of these directional derivatives.
The concept of steepest descent can be implemented in the following algorithm:
wn 1wnÀ
m
2
2
n, to estimate the MSE. That is,
^
xne
2
n: 8:2:15
Therefore the gradient estimate used bythe LMS algorithm is
r
^
xn2
Â
ren
Ã
en: 8:2:16
Since endnÀw
T
nxx, renÀxn, the gradient estimate becomes
r
^
xnÀ2xnen: 8:2:17
Substituting this gradient estimate into the steepest-descent algorithm of (8.2.14), we have
wn 1wnmxnen: 8:2:18
This is the well-known LMS algorithm, or stochastic gradient algorithm. This algorithm
is simple and does not require squaring, averaging, or differentiating. The LMS algo-
rithm provides an alternative method for determining the optimum filter coefficients
without explicitlycomputing the matrix inversion suggested in (8.2.12).
Widrow's LMS algorithm is illustrated in Figure 8.5 and is summarized as follows:
1. Determine L, m, and w(0), where L is the order of the filter, m is the step size, and
w(0) is the initial weight vector at time n 0.
2. Compute the adaptive filter output
estimation error.
8.3.1 Stability Constraint
As shown in Figure 8.5, the LMS algorithm involves the presence of feedback. Thus
the algorithm is subject to the possibilityof becoming unstable. From (8.2.18), we
observe that the parameter m controls the size of the incremental correction applied
to the weight vector as we adapt from one iteration to the next. The mean weight
convergence of the LMS algorithm from initial condition w(0) to the optimum filter w
o
must satisfy
0 < m <
2
l
max
, 8:3:1
where l
max
is the largest eigenvalue of the autocorrelation matrix R defined in (8.2.10).
Applying the stability constraint on m given in (8.3.1) is difficult because of the compu-
tation of l
max
when L is large.
In practical applications, it is desirable to estimate l
max
using a simple method. From
(8.2.10), we have
PERFORMANCE ANALYSIS
367
trRLr
xx
0
Ã
8:3:4
denotes the power of x(n). Therefore setting
0 < m <
2
LP
x
8:3:5
assures that (8.3.1) is satisfied.
Equation (8.3.5) provides some important information on how to select m, and they
are summarized as follows:
1. Since the upper bound on m is inverselyproportional to L, a small m is used for large-
order filters.
2. Since m is made inverselyproportional to the input signal power, weaker signals use
a larger m and stronger signals use a smaller m. One useful approach is to normalize
with respect to the input signal power P
x
. The resulting algorithm is called the
normalized LMS algorithm, which will be discussed in Section 8.4.
8.3.2 Convergence Speed
In the previous section, we saw that w(n) converges to w
o
if the selection of m satisfies
(8.3.1). Convergence of the weight vector w(n) from w(0) to w
o
corresponds to the
convergence of the MSE from x0 to x
min
. Therefore convergence of the MSE toward
its minimum value is a commonlyused performance measurement in adaptive systems
tional to m, we have a large t
mse
when m is small (i.e., the speed of convergence is slow). If
we use a large value of m, the time constant is small, which implies faster convergence.
The maximum time constant t
mse
1=ml
min
is a conservative estimate of filter per-
formance, since onlylarge eigenvalues will exert significant influence on the conver-
gence time. Since some of the projections maybe negligiblysmall, the adaptive filter
error convergence maybe controlled byfewer modes than the number of adaptive
filter weights. Consequently, the MSE often converges more rapidly than the upper
bound of (8.3.6) would suggest.
Because the upper bound of t
mse
is inverselyproportional to l
min
, a small l
min
can
result in a large time constant (i.e., a slow convergence rate). Unfortunately, if l
max
is
also verylarge, the selection of m will be limited by(8.3.1) such that onlya small m can
satisfythe stabilityconstraint. Therefore if l
max
is verylarge and l
min
is verysmall, from
minjX!j
2
, 8:3:8
where X! is DTFT of x(n) and the maximum and minimum are calculated over the
frequencyrange 0 ! p. From (8.3.7) and (8.3.8), input signals with a flat (white)
spectrum have the fastest convergence speed.
8.3.3 Excess Mean-Square Error
The steepest-descent algorithm in (8.2.14) requires knowledge of the gradient rxn,
which must be estimated at each iteration. The estimated gradient r
^
xn produces the
gradient estimation noise. After the algorithm converges, i.e., w(n) is close to w
o
, the
true gradient rxn%0. However, the gradient estimator r
^
xn T 0. As indicated by
the update of Equation (8.2.14), perturbing the gradient will cause the weight vector
wn 1 to move awayfrom the optimum solution w
o
. Thus the gradient estimation
PERFORMANCE ANALYSIS
369