13
Practical Considerations
In the preceding chapters, we presented several approaches for the estimation of
the independent component analysis (ICA) model. In particular, several algorithms
were proposed for the estimation of the basic version of the model, which has a
square mixing matrix and no noise. Now we are, in principle, ready to apply those
algorithms on real data sets. Many such applications will be discussed in Part IV.
However, when applying the ICA algorithms to real data, some practical
considerations arise and need to be taken into account. In this chapter, we discuss
different problems that may arise, in particular, overlearning and noise in the data.
We also propose some preprocessing techniques (dimension reduction by principal
component analysis, time filtering) that may be useful and even necessary before the
application of the ICA algorithms in practice.
13.1 PREPROCESSING BY TIME FILTERING
The success of ICA for a given data set may depend crucially on performing some
application-dependent preprocessing steps. In the basic methods discussed in the
previous chapters, we always used centering in preprocessing, and often whitening
was done as well. Here we discuss further preprocessing methods that are not
necessary in theory, but are often very useful in practice.
Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
13.1.1 Why time filtering is possible
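In many cases, the observed random variables are, in fact, time signals or time series
$x_i(t)$. [...] If the observed signals are filtered linearly to give new signals
$x_i^*(t)$, the ICA model still holds for $x_i^*(t)$, with the same
mixing matrix. This can be seen as follows. Denote by $\mathbf{X}$ the matrix that contains
the observations $\mathbf{x}(1), \ldots, \mathbf{x}(T)$ as its columns, and similarly for $\mathbf{S}$. Then the ICA
model can be expressed as:

\[
\mathbf{X} = \mathbf{A}\mathbf{S} \tag{13.1}
\]

Now, time filtering of $\mathbf{X}$ corresponds to multiplying $\mathbf{X}$ from the right by a matrix, let
us call it $\mathbf{M}$. This gives

\[
\mathbf{X}^* = \mathbf{X}\mathbf{M} = \mathbf{A}\mathbf{S}\mathbf{M} = \mathbf{A}\mathbf{S}^* \tag{13.2}
\]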
which shows that the ICA model still remains valid. The independent components
are filtered by the same filtering that was applied on the mixtures. They are not
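mixed with each other in $\mathbf{S}^* = \mathbf{S}\mathbf{M}$, because right-multiplication by $\mathbf{M}$ filters each row of $\mathbf{S}$ separately.

[...]

For low-pass filtering by a three-point moving average (up to a constant scaling factor), $\mathbf{M}$ has the banded form

\[
\mathbf{M} =
\begin{pmatrix}
 & & & & \vdots & & & & & \\
\cdots & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & \cdots \\
 & & & & \vdots & & & & &
\end{pmatrix}
\]

[...]

For high-pass filtering by differencing, $\mathbf{M}$ has the banded form

\[
\mathbf{M} =
\begin{pmatrix}
 & & & & \vdots & & & & \\
\cdots & 1 & -1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & -1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & -1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & -1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & -1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & -1 & \cdots \\
 & & & & \vdots & & & &
\end{pmatrix}
\]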
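The invariance of the mixing matrix under time filtering can be checked numerically. The following is only a minimal sketch, assuming NumPy, randomly generated Laplacian sources, and a random square mixing matrix. The helper names `moving_average_matrix` and `difference_matrix` are hypothetical, and truncating the filter window at the first and last sample is a simplifying assumption, not something prescribed by the text.

```python
import numpy as np

def moving_average_matrix(T):
    """T x T matrix M such that column t of X @ M is x(t-1) + x(t) + x(t+1)."""
    M = np.zeros((T, T))
    for t in range(T):
        for s in (t - 1, t, t + 1):
            if 0 <= s < T:          # assumption: truncate the window at the boundaries
                M[s, t] = 1.0
    return M

def difference_matrix(T):
    """T x T matrix M such that column t of X @ M is x(t) - x(t-1) for t >= 1."""
    M = np.eye(T)
    for t in range(1, T):
        M[t - 1, t] = -1.0
    return M

rng = np.random.default_rng(0)
n, T = 3, 200
S = rng.laplace(size=(n, T))        # illustrative nongaussian sources
A = rng.normal(size=(n, n))         # illustrative mixing matrix
X = A @ S                           # ICA model X = AS

for M in (moving_average_matrix(T), difference_matrix(T)):
    X_star = X @ M                  # filtered mixtures
    S_star = S @ M                  # sources filtered by the same M
    mixing_preserved = np.allclose(X_star, A @ S_star)
    print(mixing_preserved)
```

Building the full $T \times T$ matrix is done here only to mirror the matrix notation; for long signals one would normally filter each row directly instead.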

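[...] $\mathbf{s}(t)$, given its past. Thus the innovation process $\tilde{\mathbf{s}}(t)$ of $\mathbf{s}(t)$ is defined by

\[
\tilde{\mathbf{s}}(t) = \mathbf{s}(t) - E\{\mathbf{s}(t) \mid \mathbf{s}(t-1), \mathbf{s}(t-2), \ldots\} \tag{13.5}
\]

The expression “innovation” describes the fact that $\tilde{\mathbf{s}}(t)$ contains all the new information
about the process that can be obtained at time $t$ by observing $\mathbf{s}(t)$.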
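As a concrete illustration of definition (13.5), consider a scalar first-order autoregressive process driven by i.i.d. zero-mean noise. In that special case the conditional expectation is linear in the previous sample, so the innovation coincides with the driving noise. The sketch below assumes NumPy, an AR(1) model with an arbitrarily chosen coefficient, and a least-squares fit of that coefficient. In general the conditional expectation is nonlinear, and a linear predictor gives only an approximation of the innovation.

```python
import numpy as np

rng = np.random.default_rng(1)
T, a_true = 5000, 0.8               # illustrative length and AR(1) coefficient
e = rng.laplace(size=T)             # i.i.d. zero-mean driving noise
s = np.zeros(T)
for t in range(1, T):
    s[t] = a_true * s[t - 1] + e[t]

# Least-squares estimate of the AR(1) coefficient from the observed process.
a_hat = np.dot(s[1:], s[:-1]) / np.dot(s[:-1], s[:-1])

# Prediction error s(t) - a_hat * s(t-1), used here as an approximate innovation.
innovation = s[1:] - a_hat * s[:-1]

# Correlation between the estimated innovation and the true driving noise.
corr = np.corrcoef(innovation, e[1:])[0, 1]
print(a_hat, corr)
```

For processes with longer memory, the same idea would use a higher-order predictor over more past samples.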
The concept of innovations can be utilized in the estimation of the ICA model due
to the following property:
Theorem 13.1 If $\mathbf{x}(t)$ and $\mathbf{s}(t)$ follow the basic ICA model, then the innovation
processes $\tilde{\mathbf{x}}(t)$ and $\tilde{\mathbf{s}}(t)$ [...]
13.1.4 Optimal filtering
Both of the preceding types of filtering have their pros and cons. The optimum would
be to find a filter that increases the independence of the components while reducing
noise. To achieve this, some compromise between high- and low-pass filtering may
be the best solution. This leads to band-pass filtering, in which the highest and the
lowest frequencies are filtered out, leaving a suitable frequency band in between.
What this band should be depends on the data and general answers are impossible to
give.
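A band-pass filter of this kind can be sketched as follows. This is a minimal illustration, assuming NumPy and an idealized filter that simply zeroes all FFT bins outside the chosen band. The function name `bandpass_rows` and the band edges are hypothetical choices, and a practical application would more likely use a properly designed FIR or IIR filter. Because the same linear filter is applied to every row, the mixing matrix is unchanged.

```python
import numpy as np

def bandpass_rows(X, low, high):
    """Zero all frequency bins outside [low, high] (cycles per sample) in each row of X."""
    T = X.shape[1]
    freqs = np.fft.rfftfreq(T)                  # frequencies from 0 to 0.5 cycles per sample
    keep = (freqs >= low) & (freqs <= high)     # boolean mask of the pass band
    spectrum = np.fft.rfft(X, axis=1)
    return np.fft.irfft(spectrum * keep, n=T, axis=1)

rng = np.random.default_rng(2)
n, T = 3, 512
S = rng.laplace(size=(n, T))                    # illustrative sources
A = rng.normal(size=(n, n))                     # illustrative mixing matrix
X = A @ S

X_band = bandpass_rows(X, low=0.05, high=0.25)  # band edges chosen arbitrarily
S_band = bandpass_rows(S, low=0.05, high=0.25)
print(np.allclose(X_band, A @ S_band))          # same mixing matrix after filtering
```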
In addition to simple low-pass/high-pass filtering, one might also use more
sophisticated techniques. For example, one might take the (1-D) wavelet transforms of
the data [102, 290, 17]. Other time-frequency decompositions could be used as well.
13.2 PREPROCESSING BY PCA
A common preprocessing technique for multidimensional data is to reduce its dimension
by principal component analysis (PCA). PCA was explained in more detail in
Chapter 6. Basically, the data is projected linearly onto a subspace

\[
\tilde{\mathbf{x}} = \mathbf{E}_n \mathbf{x} \tag{13.6}
\]

so that the maximum amount of information (in the least-squares sense) is preserved.
Reducing dimension in this way has several benefits which we discuss in the next
subsections.
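The projection in (13.6) can be sketched with an eigendecomposition of the sample covariance matrix. The example below is only an illustration, assuming NumPy and synthetic data with more mixtures than sources plus a small amount of noise. The function name `pca_reduce` is hypothetical, and the convention that $\mathbf{E}_n$ is an $n \times m$ matrix whose rows are the leading eigenvectors is an assumption made here for the code.

```python
import numpy as np

def pca_reduce(X, n):
    """Project the centered columns of X (m x T) onto the n leading principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)      # centering
    C = Xc @ Xc.T / Xc.shape[1]                 # sample covariance, m x m
    eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n]       # indices of the n largest eigenvalues
    E_n = eigvecs[:, order].T                   # n x m, rows are principal directions
    return E_n @ Xc, E_n, eigvals[order]

rng = np.random.default_rng(3)
m, n, T = 5, 2, 1000
S = rng.laplace(size=(n, T))                    # illustrative sources
A = rng.normal(size=(m, n))                     # more mixtures than sources
X = A @ S + 0.01 * rng.normal(size=(m, T))      # small additive noise (assumption)

X_tilde, E_n, top_eigvals = pca_reduce(X, n)
print(X_tilde.shape, top_eigvals)
```

The reduced data `X_tilde` has as many rows as there are retained components, so a subsequent ICA step would work with a square mixing matrix.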
13.2.1 Making the mixing matrix square
First, let us consider the case where the number of independent components $n$
is smaller than the number of mixtures, say $m$.

