19
Convolutive Mixtures and
Blind Deconvolution
This chapter deals with blind deconvolution and blind separation of convolutive
mixtures.
Blind deconvolution is a signal processing problem that is closely related to basic
independent component analysis (ICA) and blind source separation (BSS). In com-
munications and related areas, blind deconvolution is often called blind equalization.
In blind deconvolution, we have only one observed signal (output) and one source
signal (input). The observed signal consists of an unknown source signal mixed with
itself at different time delays. The task is to estimate the source signal from the
observed signal only, without knowing the convolving system, the time delays, and
mixing coefficients.
Blind separation of convolutive mixtures considers the combined blind deconvolu-
tion and instantaneous blind source separation problem. This estimation task appears
under many different names in the literature: ICA with convolutive mixtures, mul-
tichannel blind deconvolution or identification, convolutive signal separation, and
blind identification of multiple-input-multiple-output (MIMO) systems. In blind
separation of convolutive mixtures, there are several source (input) signals and sev-
eral observed (output) signals just like in the instantaneous ICA problem. However,
the source signals have different time delays in each observed signal due to the finite
propagation speed in the medium. Each observed signal may also contain time-
delayed versions of the same source due to multipath propagation caused typically
by reverberations from some obstacles. Figure 23.3 in Chapter 23 shows an example
of multipath propagation in mobile communications.
In the following, we first consider the simpler blind deconvolution problem, and
after that separation of convolutive mixtures. Many techniques for convolutive mix-
355
Independent Component Analysis. Aapo Hyv
¨
arinen, Juha Karhunen, Erkki Oja
and the convolution coefficients
a
k
are unknown. Observing
x(t)
only, we want to estimate the source signal
s(t)
.In
other words, we want to find a deconvolution filter
y (t)=
1
X
k=1
h
k
x(t k )
(19.2)
which provides a good estimate of the source signal
s(t)
at each time instant. This
is achieved by choosing the coefficients
h
k
of the deconvolution filter suitably. In
practice the deconvolving finite impulse response (FIR) filter (see the Appendix
for definition) in Eq. (19.2) is assumed to be of sufficient but finite length. Other
structures are possible, but this one is the standard choice.
To estimate the deconvolving filter, certain assumptions on the source signal
s(t)
must be made. Usually it is assumed that the source signal values
deconvolution is frequently needed in communications applications where it is con-
venient to use complex-valued data. Therefore we present most methods for this
general case. The respective algorithms for real data are obtained as special cases.
Methods for estimating the ICA model with complex-valued data are discussed later
in Section 20.3.
19.1.2 Bussgang methods
Bussgang methods [39, 171, 174, 315] include some of the earliest algorithms [152,
392] proposed for blind deconvolution, but they are still widely used. In Bussgang
methods, a noncausal FIR filter structure
y (t)=
L
X
k=L
w
k
(t)x(t k )
(19.3)
of length
2L +1
is used. Here
denotes the complex conjugate. The weights
w
k
(t)
of
the FIR filter depend on the time
t
, and they are adapted using the least-mean-square
.
Assume that the filter length
2L +1
is large enough and the learning algorithm
has converged. It can be shown that then the following condition holds for the output
y (t)
of the FIR filter (19.3):
E
fy (t)y (t k )g
E
fy (t)g (y (t k ))g
(19.6)
A stochastic process that satisfies the condition (19.6) is called a Bussgang process.
The nonlinearity
g (t)
can be chosen in several ways, leading to different Bussgang
type algorithms [39, 171]. The Godard algorithm [152] is the best performing
Bussgang algorithm in the sense that it is robust and has the smallest mean-square
358
CONVOLUTIVE MIXTURES AND BLIND DECONVOLUTION
error after convergence; see [171] for details. The Godard algorithm minimizes the
nonconvex cost function
J
p
(t)=
E
fjy (t)j
p
p
is zero when perfect deconvolution is attained, that is, when
y (t)
=
s(t)
. The error
signal (19.5) in the gradient algorithm (19.4) for minimizing the cost function (19.7)
with respect to the weight
w
k
(t)
has the form
e(t)=y (t)jy (t)j
p2
p
jy (t)j
p
]
(19.9)
In computing
e(t)
, the expectation in (19.7) has been omitted for getting a simpler
stochastic gradient type algorithm. The respective nonlinearity
g (y (t))
is given by
[171]
g (y (t)) = y (t)+y(t)jy (t)j
p2
p
in (19.4); see [11]
BLIND DECONVOLUTION
359
are involved into the estimation process implicitly via the nonlinear function
g ()
.
Cumulants have been defined and discussed briefly in Chapter 2.
Shalvi and Weinstein [398] have derived necessary and sufficient conditions and
a set of cumulant-based criteria for blind deconvolution. In particular, they intro-
duced a stochastic gradient algorithm for maximizing a constrained kurtosis based
criterion. We shall next describe this algorithm briefly, because it is computationally
simple, converges globally, and can be applied equally well to both subgaussian and
supergaussian source signals
s(t)
.
Assume that the source (input) signal
s(t)
is complex-valued and symmetric,
satisfying the condition E
fs(t)
2
g
=
0
. Assume that the length of the causal FIR
deconvolution filter is
M
. The output
z (t)
z (t)]y
(t)
w(t +1) = u(t +1)= k u(t +1) k
(19.14)
Here
s
is the kurtosis of
s(t)
,
kk
is the usual Euclidean norm, and the unnormalized
filter weight vector
u(t)
is defined quite similarly as
w(t)
in (19.13).
It is important to notice that Shalvi and Weinstein’s algorithm (19.14) requires
whitening of the output signal
y (t)
for performing appropriately (assuming that
s(t)
is white, too). For a single complex-valued signal sequence (time series)
fy (t)g
,the
temporal whiteness condition is
E
fy (t)y
close relationship between their algorithm and the CMA algorithm discussed in the
previous subsection; see also [351]. Later, they derived fast converging but more
involved super-exponential algorithms for blind deconvolution in [399]. Shalvi and
360
CONVOLUTIVE MIXTURES AND BLIND DECONVOLUTION
Weinstein have reviewed their blind deconvolution methods in [170]. Closely related
algorithms were proposed earlier in [114, 457].
It is interesting to note that Shalvi and Weinstein’s algorithm (19.14) can be
derived by maximizing the absolute value of the kurtosis of the filtered (deconvolved)
signal
z (t)
under the constraint that the output signal
y (t)
is temporally white [398,
351]. The temporal whiteness condition leads to the normalization constraint of
the weight vector
w(t)
in (19.14). The corresponding criterion for standard ICA is
familiar already from Chapter 8, where gradient algorithms similar to (19.14) have
been discussed. Also Shalvi and Weinstein’s super-exponential algorithm [399] is
very similar to the cumulant-based FastICA as introduced in Section 8.2.3. The
connection between blind deconvolution and ICA is discussed in more detail in the
next subsection.
Instead of cumulants, one can resort to higher-order spectra or polyspectra [319,
318]. They are defined as Fourier transforms of the cumulants quite similarly as
the power spectrum is defined as a Fourier transform of the autocorrelation function
(see Section 2.8.5). Polyspectra provide a basis for blind deconvolution and more
generally identification of nonminimum-phase systems, because they preserve phase
information of the observed signal. However, blind deconvolution methods based
on higher-order spectra tend to be computationally more complex than Bussgang
for a finite number of values of the summation index
k
as
~
x = A
~
s
(19.18)
where
A
is a matrix that contains the coefficients
a
k
of the convolution filter as its
rows, at different positions for each row. This is the classic matrix representation
of a filter. This representation is not exact near the top and bottom rows, but for a
sufficiently large
n
, it is good enough in practice.
From (19.18) we see that the blind deconvolution problem is actually (approxi-
mately) a special case of ICA. The components of
s
are independent, and the mixing
is linear, so we get the standard linear ICA model.