Adaptive Blind Signal and Image Processing
Learning Algorithms and Applications
includes CD
Andrzej Cichocki, Shun-ichi Amari
Contents
Preface xxix
1 Introduction to Blind Signal Processing: Problems and Applications 1
1.1 Problem Formulations – An Overview 2
1.1.1 Generalized Blind Signal Processing Problem 2
1.1.2 Instantaneous Blind Source Separation and
Independent Component Analysis 5
1.1.3 Independent Component Analysis for Noisy Data 11
1.1.4 Multichannel Blind Deconvolution and Separation 14
1.1.5 Blind Extraction of Signals 18
1.1.6 Generalized Multichannel Blind Deconvolution –
State Space Models 19
1.1.7 Nonlinear State Space Models – Semi-Blind Signal
Processing 21
1.1.8 Why State Space Demixing Models? 22
1.2 Potential Applications of Blind and Semi-Blind Signal
Processing 23
1.2.1 Biomedical Signal Processing 24
1.2.2 Blind Separation of Electrocardiographic Signals of
Fetus and Mother 25
2.4.1 Problems Formulation 67
2.4.1.1 A Historical Overview of the TLS Problem 67
2.4.2 Total Least-Squares Estimation 69
2.4.3 Adaptive Generalized Total Least-Squares 73
2.4.4 Extended TLS for Correlated Noise Statistics 75
2.4.4.1 Choice of $\bar{R}_{NN}$ in Some Practical Situations 77
2.4.5 Adaptive Extended Total Least-Squares 77
2.4.6 An Illustrative Example - Fitting a Straight Line to a
Set of Points 78
2.5 Sparse Signal Representation and Minimum Fuel Consumption
Problem 79
2.5.1 Approximate Solution of Minimum Fuel Problem
Using Iterative LS Approach 81
2.5.2 FOCUSS Algorithms 83
3 Principal/Minor Component Analysis and Related Problems 87
3.1 Introduction 87
3.2 Basic Properties of PCA 88
3.2.1 Eigenvalue Decomposition 88
3.2.2 Estimation of Sample Covariance Matrices 90
3.2.3 Signal and Noise Subspaces - AIC and MDL Criteria
for their Estimation 91
3.2.4 Basic Properties of PCA 93
3.3 Extraction of Principal Components 94
3.4 Basic Cost Functions and Adaptive Algorithms for PCA 98
3.4.1 The Rayleigh Quotient – Basic Properties 98
4.1.8 Robust Prewhitening - Batch Algorithm 140
4.2 SOS Blind Identification Based on EVD 141
4.2.1 Mixing Model 141
4.2.2 Basic Principles: SD and EVD 143
4.3 Improved Blind Identification Algorithms Based on
EVD/SVD 148
4.3.1 Robust Orthogonalization of Mixing Matrices for
Colored Sources 148
4.3.2 Improved Algorithm Based on GEVD 153
4.3.3 Improved Two-stage Symmetric EVD/SVD Algorithm 155
4.3.4 BSS and Identification Using Bandpass Filters 156
4.4 Joint Diagonalization - Robust SOBI Algorithms 157
4.4.1 Modified SOBI Algorithm for Nonstationary Sources:
SONS Algorithm 160
4.4.2 Computer Simulation Experiments 161
4.4.3 Extensions of Joint Approximate Diagonalization
Technique 162
4.4.4 Comparison of the JAD and Symmetric EVD 163
4.5 Cancellation of Correlation 164
4.5.1 Standard Estimation of Mixing Matrix and Noise
Covariance Matrix 164
4.5.2 Blind Identification of Mixing Matrix Using the
Concept of Cancellation of Correlation 165
Appendix A. Stability of the Amari’s Natural Gradient and
the Atick-Redlich Formula 168
Appendix B. Gradient Descent Learning Algorithms with
Invariant Frobenius Norm of the Separating Matrix 171
Appendix C. JADE Algorithm 173
5 Sequential Blind Signal Extraction 177
5.1 Introduction and Problem Formulation 178
Sources 214
5.7.1 Formulation of the Problem 214
5.7.2 Extraction of Single i.i.d. Source Signal 215
5.7.3 Extraction of Multiple i.i.d. Sources 217
5.7.4 Extraction of Colored Sources from Convolutive
Mixture 218
5.8 Computer Simulations: Illustrative Examples 219
5.8.1 Extraction of Colored Gaussian Signals 219
5.8.2 Extraction of Natural Speech Signals from Colored
Gaussian Signals 221
5.8.3 Extraction of Colored and White Sources 222
5.8.4 Extraction of Natural Image Signal from Interferences 223
5.9 Concluding Remarks 224
Appendix A. Global Convergence of Algorithms for Blind
Source Extraction Based on Kurtosis 225
Appendix B. Analysis of Extraction and Deflation Procedure 227
Appendix C. Conditions for Extraction of Sources Using
Linear Predictor Approach 228
6 Natural Gradient Approach to Independent Component Analysis 231
6.1 Basic Natural Gradient Algorithms 232
6.1.1 Kullback–Leibler Divergence - Relative Entropy as
Measure of Stochastic Independence 232
6.1.2 Derivation of Natural Gradient Basic Learning Rules 235
6.2 Generalizations of Basic Natural Gradient Algorithm 237
6.2.1 Nonholonomic Learning Rules 237
6.2.2 Natural Riemannian Gradient in Orthogonality
Constraint 239
6.2.2.1 Local Stability Analysis 240
6.3 NG Algorithms for Blind Extraction 242
7.1.1 Recurrent Neural Network 274
7.1.2 Statistical Independence 274
7.1.3 Self-normalization 277
7.1.4 Feed-forward Neural Network and Associated
Learning Algorithms 278
7.1.5 Multilayer Neural Networks 282
7.2 Iterative Matrix Inversion Approach to Derivation of Family
of Robust ICA Algorithms 285
7.2.1 Derivation of Robust ICA Algorithm Using
Generalized Natural Gradient Approach 288
7.2.2 Practical Implementation of the Algorithms 289
7.2.3 Special Forms of the Flexible Robust Algorithm 291
7.2.4 Decorrelation Algorithm 291
7.2.5 Natural Gradient Algorithms 291
7.2.6 Generalized EASI Algorithm 291
7.2.7 Non-linear PCA Algorithm 292
7.2.8 Flexible ICA Algorithm for Unknown Number of
Sources and their Statistics 293
7.3 Computer Simulations 294
Appendix A. Stability Conditions for the Robust ICA
Algorithm (7.50) [332] 300
8 Robust Techniques for BSS and ICA with Noisy Data 305
8.1 Introduction 305
8.2 Bias Removal Techniques for Prewhitening and ICA
Algorithms 306
8.2.1 Bias Removal for Whitening Algorithms 306
8.2.2 Bias Removal for Adaptive ICA Algorithms 307
8.3 Blind Separation of Signals Buried in Additive Convolutive
Reference Noise 310
8.3.1 Learning Algorithms for Noise Cancellation 311
9.1.3 Feed-forward Deconvolution Model and Natural
Gradient Learning Algorithm 342
9.1.4 Recurrent Neural Network Model and Hebbian
Learning Algorithm 343
9.2 Multichannel Blind Deconvolution with Constraints Imposed
on FIR Filters 346
9.3 General Models for Multiple-Input Multiple-Output Blind
Deconvolution 349
9.3.1 Fundamental Models and Assumptions 349
9.3.2 Separation-Deconvolution Criteria 351
9.4 Relationships Between BSS/ICA and MBD 354
9.4.1 Multichannel Blind Deconvolution in the Frequency
Domain 354
9.4.2 Algebraic Equivalence of Various Approaches 355
9.4.3 Convolution as Multiplicative Operator 357
9.4.4 Natural Gradient Learning Rules for Multichannel
Blind Deconvolution (MBD) 358
9.4.5 NG Algorithms for Double Infinite Filters 359
9.4.6 Implementation of Algorithms for Minimum Phase
Non-causal System 360
9.4.6.1 Batch Update Rules 360
9.4.6.2 On-line Update Rule 360
9.4.6.3 Block On-line Update Rule 360
9.5 Natural Gradient Algorithms with Nonholonomic Constraints 362
9.5.1 Equivariant Learning Algorithm for Causal FIR
Filters in the Lie Group Sense 363
9.5.2 Natural Gradient Algorithm for Fully Recurrent
Network 367
9.6 MBD of Non-minimum Phase System Using Filter
10.3 Estimating Functions for Temporally Correlated Source
Signals 397
10.3.1 Source Model 397
10.3.2 Likelihood and Score Functions 399
10.3.3 Estimating Functions 400
10.3.4 Simultaneous and Joint Diagonalization of Covariance
Matrices and Estimating Functions 401
10.3.5 Standardized Estimating Function and Newton
Method 404
10.3.6 Asymptotic Errors 407
10.4 Semiparametric Models for Multichannel Blind Deconvolution 407
10.4.1 Notation and Problem Statement 408
10.4.2 Geometrical Structures on FIR Manifold 409
10.4.3 Lie Group 410
10.4.4 Natural Gradient Approach for Multichannel Blind
Deconvolution 410
10.4.5 Efficient Score Matrix Function and its Representation 413
10.5 Estimating Functions for MBD 415
10.5.1 Superefficiency of Batch Estimator 418
Appendix A. Representation of Operator K(z) 419
11 Blind Filtering and Separation Using a State-Space Approach 423
11.1 Problem Formulation and Basic Models 424
11.1.1 Invertibility by State Space Model 427
11.1.2 Controller Canonical Form 428
11.2 Derivation of Basic Learning Algorithms 428
11.2.1 Gradient Descent Algorithms for Estimation of
Output Matrices W = [C, D] 429
11.2.2 Special Case - Multichannel Blind Deconvolution with
13.1.9 Important Inequalities 460
13.2 Distance measures 462
13.2.1 Geometric distance measures 462
13.2.2 Distances between sets 462
13.2.3 Discrimination measures 463
References 465
14 Glossary of Symbols and Abbreviations 547
Index 552
List of Figures
1.1 Block diagrams illustrating blind signal processing or blind
identification problem. 3
1.2 (a) Conceptual model of system inverse problem. (b)
Model-reference adaptive inverse control. With the switch in
position 1, the system performs standard adaptive inverse control
by minimizing the norm of the error vector e; with the switch in
position 2, the system estimates the errors blindly. 4
1.3 Block diagram illustrating the basic linear instantaneous
blind source separation (BSS) problem: (a) General block
diagram represented by vectors and matrices, (b) detailed
architecture. In general, the number of sensors can be larger than,
equal to, or less than the number of sources. The number of
sources is unknown and can change in time [264, 275]. 6
1.4 Basic approaches for blind source separation with some a
priori knowledge. 9
1.5 Illustration of exploiting spectral diversity in BSS: three
unknown sources, their available mixture, and the spectrum
of the mixed signal. The sources are extracted by passing the
mixed signal through three bandpass filters (BPF) with suitable
frequency characteristics, depicted in the bottom figure. 11
1359, 1360, 1361]. 20
1.14 Block diagram of a simplified nonlinear demixing NARMA
model. With the switch open, we have a feed-forward
MA model; with the switch closed, we have a recurrent
ARMA model. 22
1.15 Simplified model of an RBF neural network applied to nonlinear
semi-blind single-channel equalization of binary sources; with
the switch in position 1 the learning is supervised, and with
the switch in position 2 it is unsupervised. 23
1.16 Exemplary biomedical applications of blind signal processing:
(a) A multi-recording monitoring system for blind
enhancement of sources, cancellation of noise, elimination
of artifacts and detection of evoked potentials, (b) blind
separation of the fetal electrocardiogram (FECG) and
maternal electrocardiogram (MECG) from skin electrode
signals recorded from a pregnant woman, (c) blind
enhancement and independent components of multichannel
electromyographic (EMG) signals. 26
1.17 Non-invasive multi-electrode recording of brain activation
using EEG or MEG. 28
1.18 (a) A subset of the 122 MEG channels. (b) Principal and
(c) independent components of the data. (d) Field patterns
corresponding to the first two independent components.
In (e) the superposition of the localizations of the dipole
originating IC1 (black circles, corresponding to the auditory
cortex activation) and IC2 (white circles, corresponding to
the SI cortex activation) onto magnetic resonance images
(MRI) of the subject. The bars illustrate the orientation of
the source net current. Results are obtained in collaboration
the total least-squares (TLS), least-squares (LS) and data
least-squares (DLS) estimation procedures for the problem of
finding a straight-line approximation to a set of points. The
TLS optimization assumes that the measurements of both the x
and y variables are in error, and seeks an estimate that
minimizes the sum of the squared perpendicular distances
of the points from the approximating straight line.
The LS criterion assumes that only the measurements of the
y variable are in error, so the error associated with each
point is parallel to the y axis; LS therefore minimizes the
sum of the squared values of these errors. The DLS criterion
assumes that only the measurements of the x variable are in error. 68
2.4 Straight-line fits for the five points marked by ‘x’ obtained
using: (a) LS ($L_2$-norm), (b) TLS, (c) DLS, (d)
$L_1$-norm, (e) $L_\infty$-norm, and (f) combined results. 70
2.5 Straight-line fits for the five points marked by ‘x’ obtained
using the LS, TLS and ETLS methods. 80
3.1 Sequential extraction of principal components. 96
3.2 On-line on-chip implementation of a fast RLS learning
algorithm for principal component estimation. 97
4.1 Basic model for blind spatial decorrelation of sensor signals. 130
4.2 Illustration of basic transformation of two sensor signals
with uniform distributions. 131
and $g(y_1) = 3y_1^2$. 189
5.5 Block diagram illustrating implementation of learning
algorithm for temporally correlated sources. 194
5.6 The neural network structure for one-unit extraction using
a linear predictor. 196
5.7 The cascade neural network structure for multi-unit extraction. 198
5.8 The conceptual model of a single processing unit for extraction
of sources using an adaptive bandpass filter. 202
5.9 Frequency characteristics of a fourth-order Butterworth bandpass
filter with adjustable center frequency and fixed bandwidth. 204
5.10 Exemplary computer simulation results for a mixture of three
colored Gaussian signals, where $s_j$, $x_{1j}$, and $y_j$ stand for
the j-th source signals, whitened mixed signals, and extracted
signals, respectively. The source signals were extracted by
employing the learning algorithm (5.73)-(5.74) with L = 5
[1142]. 220
1
the
image extracted by the extraction processing unit shown in
Fig. 5.6. The learning algorithm (5.91) with q = 1 was
employed [68, 1142]. 223
6.1 Block diagram illustrating standard independent component
analysis (ICA) and blind source separation (BSS) problem. 232
6.2 Block diagram of fully connected recurrent network. 237
6.3 (a) Plot of the generalized Gaussian pdf for various values
of parameter r (with $\sigma^2 = 1$) and (b) corresponding nonlinear
activation functions. 244
6.4 (a) Plot of the generalized Cauchy pdf for various values of
parameter r (with $\sigma^2 = 1$) and (b) corresponding nonlinear
activation functions. 248
6.5 The plot of kurtosis $\kappa_4(r)$ versus Gaussian exponent r: (a)
for leptokurtic signals; (b) for platykurtic signals [232]. 250
6.6 (a) Architecture of feed-forward neural network. (b)
Architecture of fully connected recurrent neural network. 256
7.1 Block diagrams: (a) Recurrent and (b) feed-forward neural
network for blind source separation. 275
7.2 (a) Neural network model and (b) implementation of the
Jutten-Hérault basic continuous-time algorithm for two
channels. 276
7.3 Block diagram of the continuous-time locally adaptive
3
using the algorithm (7.32). 295
7.8 Exemplary computer simulation results for Example 2 using
the algorithm (7.25). (a) Waveforms of primary sources,
(b) noisy sensor signals and (c) reconstructed source signals. 297
7.9 Blind separation of speech signals using the algorithm (7.80):
(a) Primary source signals, (b) sensor signals, (c) recovered
source signals. 298
7.10 (a) Eight ECG signals are separated into four maternal
signals, two fetal signals and two noise signals. (b) Detailed
plots of extracted fetal ECG signals. The mixed signals
were obtained from 8 electrodes located on the abdomen of a
pregnant woman. The signals are 2.5 seconds long, sampled
at 200 Hz. 299
8.1 Ensemble-averaged value of the performance index for
uncorrelated measurement noise in the first example: dotted
line represents the original algorithm (8.8) with noise,
dashed line represents the bias removal algorithm (8.10)
with noise, solid line represents the original algorithm (8.8)
without noise [404]. 309
8.2 Conceptual block diagram of mixing and demixing systems
with noise cancellation. It is assumed that reference noise is
available. 311
8.3 Block diagrams illustrating multistage noise cancellation
and blind source separation: (a) Linear model of convolutive
noise, (b) more general model of additive noise modelled
by nonlinear dynamical systems (NDS) and adaptive neural
networks (NN); LA1 and LA2 denote learning algorithms
performing the LMS or back-propagation supervised learning
rules whereas LA3 denotes a learning algorithm for BSS. 313
detailed structure of the recurrent model. 344
9.3 Block diagrams illustrating the multichannel blind
deconvolution problem: (a) Recurrent neural network,
(b) feed-forward neural network (for simplicity, only models for
two channels are shown). 347
9.4 Illustration of the multichannel deconvolution models: (a)
Functional block diagram of the feed-forward model, (b)
architecture of feed-forward neural network (each synaptic
weight $W_{ij}(z, k)$ is an FIR or stable IIR filter), (c) architecture
of the fully connected recurrent neural network. 350
9.5 Exemplary architectures for two-stage multichannel
deconvolution. 353
9.6 Illustration of the Lie group’s inverse of an FIR filter,
where H(z) is an FIR filter of length L = 50, W(z) is the Lie
group’s inverse of H(z), and G(z) = W(z)H(z) is the composite
transfer function. 367
9.7 Cascade of two FIR filters (non-causal and causal) for blind
deconvolution of non-minimum phase system. 369
9.8 Illustration of the information back-propagation learning. 371
9.9 Simulation results of two channel blind deconvolution for
SIMO system in Example 9.2: (a) Parameters of mixing
filters ($H_1(z)$, $H_2(z)$) and estimated parameters of adaptive
deconvoluting filters (W
Example 9.4. 377
9.13 The distribution of parameters of the global transfer function
G(z) of non-causal system in Example 9.4: (a) The initial
state, (b) after convergence [1369]. 378
11.1 Conceptual block diagram illustrating the general linear
state-space mixing and self-adaptive demixing model for
blind separation and filtering. The objective of learning
algorithms is the estimation of a set of matrices {A, B, C, D, L}
[287, 289, 290, 1359, 1360, 1361, 1368]. 425
11.2 Kalman filter for noise reduction. 438
12.1 Typical nonlinear dynamical models: (a) The Hammerstein
system, (b) the Wiener system and (c) the sandwich system. 444
12.2 The simple nonlinear dynamical model which leads to the
standard linear filtering and separation problem if the
nonlinear functions can be estimated and their inverses exist. 445