WAVELET IMAGE AND VIDEO
COMPRESSION
1
1
4
4
7
7
9
10
11
12
13
15
15
15
19
21
23
24
25
28
30
33
33
38
38
40
43
45
I Preliminaries
2 Preliminaries by Pankaj N. Topiwala
1
2
3
4
Mathematical Preliminaries
1.1
1.2
1.3
Finite-Dimensional Vector Spaces
Analysis
Fourier Analysis
.
Digital Signal Processing
2.1
2.2
Digital Filters
Z-Transform and Bandpass Filtering
Primer on Probability
References
3 Time-Frequency Analysis, Wavelets And Filter Banks
.
References
Contents vi
61
61
63
64
64
66
66
67
68
70
70
73
73
73
73
75
76
78
79
79
80
83
83
85
86
91
93
2.4
DPCM
Huffman Coding
Arithmetic Coding
Run-Length Coding
Quantization
3.1
3.2
Scalar Quantization
Vector Quantization . . .
Summary of Rate-Distortion Theory
Approaches to Lossy Compression
5.1
5.2
5.3
5.4
5.5
VQ
Transform Image Coding Paradigm
JPEG
Pyramid
Wavelets
Image Quality Metrics
6.1
6.2
3
4
5
6
7
Introduction
Subband Coding .
(Sub)optimal Quantization
Interband Decorrelation, Texture Suppression
Human Visual System Quantization
Summary
References
7 Image Coding Using Multiplier-Free Filter Banks by
Alen Docef, Faouzi Kossentini, Wilson C. Chung and Mark J. T. Smith
1
Introduction
1
1
Based on “Multiplication-Free Subband Coding of Color Images”, by Docef, Kossen-
tini, Chung and Smith, which appeared in the Proceedings of the Data Compression
Contents vii
112
114
116
161
162
163
165
168
168
2
3
4
5
6
Coding System
Design Algorithm
Multiplierless Filter Banks
Performance
References
8 Embedded Image Coding Using Zerotrees of Wavelet Coefficients
by Jerome M. Shapiro
1 Introduction and Problem Statement
1.1
1.2
1.3
Embedded Coding
Features of the Embedded Coder
4.3
4.4
4.5
Successive-Approximation Entropy-Coded Quantization
. .
Relationship to Bit Plane Encoding
Advantage of Small Alphabets for Adaptive Arithmetic Coding
Order of Importance of the Bits
Relationship to Priority-Position Coding
5
6
7
8
A Simple Example
Experimental Results
Conclusion
References
9
A New Fast/Efficient Image Codec Based on Set Partitioning in
Hierarchical Trees . by Amir Said and William A. Pearlman
1
2
3
4
5
6
7
8
177
181
183
184
184
185
187
188
189
190
190
191
191
191
192
194
199
199
199
200
202
202
203
204
206
208
209
211
212
214
description
Notation
and
problem
statement
Proposed approach
The SFQ Coding Algorithm
3.1
3.2
3.3
3.4
Tree pruning algorithm: Phase I (for fixed quantizer q and
fixed
)
Predicting the tree: Phase II
Joint
Optimization
of
Space-Frequency
Quantizers
Complexity of the SFQ algorithm
Coding Results
Extension of the SFQ Algorithm from Wavelet to Wavelet Packets .
Wavelet packets
Wavelet packet SFQ
subband
2.1
2.2
2.3
2.4
Classification gain for a single subband
Subband classification gain
Non-uniform
classification
The trade-off between the side rate and the classification gain
Arithmetic coded trellis coded quantization
3.1
3.2
3.3
Trellis coded quantization
Arithmetic coding
Encoding generalized Gaussian sources with ACTCQ system
Image
subband
coder
based
on
classification
and
ACTCQ
4.1
246
246
247
247
249
250
251
253
253
254
255
256
257
258
261
261
12 Low-Complexity Compression of Run Length Coded Image Sub-
bands by John D. Villasenor and Jiangtao Wen
1
2
3
4
5
6
7
Introduction
Large-scale statistics of run-length coded subbands
Structured code trees
2.3
Motivation for Fractal Coding
Mechanics of Fractal Block Coding
Decoding Fractal Coded Images
A Wavelet Framework
3.1
3.2
Notation
A Wavelet Analog of Fractal Block Coding
Self-Quantization of Subtrees
4.1
4.2
Generalization to non-Haar bases
Fractal Block Coding of Textures
Implementation
5.1
Bit Allocation
Results
6.1
6.2
6.3
265
265
266
267
268
271
271
272
275
276
276
277
278
280
280
281
281
281
282
283
283
283
283
286
286
287
289
289
290
291
4
5
6
7
Introduction
1.1
1.2
Background
Overview of the algorithm
The DWT subband decomposition for fingerprints
2.1
2.2
2.3
Linear phase filter banks
Symmetric boundary conditions
Spatial frequency decomposition
Uniform scalar quantization
3.1
3.2
Quantizer characteristics
Bit allocation
Huffman coding
4.1
4.2
The
8
9
Introduction
Discrete Wavelet Transform
Differential Coding
Binary Reduction
Description of the Algorithm
Experimental Results
SAR Image Compression
Conclusions
References
18
Block
Transforms
in
Progressive
Image
Coding
by
Trac
D. Tran and Truong Q. Nguyen
1
2
327
328
329
330
333
334
335
336
337
338
339
341
342
342
343
343
344
349
349
4
5
6
Transform Design
Coding Results
References
IV Video Coding
19
Brief on Video Coding Standards by Pankaj N. Topiwala
1
2
Analysis of Quantization Noise
4
5
The Spatiotemporal Pyramid
Multiresolution Motion Estimation and Interpolation
5.1
5.2
5.3
Basic Search Procedure
Stepwise Refinement
Motion Based Interpolation
6
Compression for ATV
6.1
6.2
6.3
Compatibility and Scan Formats
Results
Relation to Emerging Video Coding Standards
7
Complexity
7.1
3
Based on “A New Approach to Scalable Video Coding”, by Chung, Kossentini and
Smith, which appeared in the Proceedings of the Data Compression Conference, Snowbird,
Utah, March 1995, ©1995 IEEE
Contents xii
350
354
355
357
361
361
362
364
364
365
373
374
374
379
381
383
383
384
385
385
387
387
389
389
391
by Ralph Neff and Avideh Zakhor
1
2
3
Introduction
Matching Pursuit Theory
Detailed System Description
3.1
3.2
3.3
3.4
Motion Compensation
Matching-Pursuit
Residual
Coding
Buffer
Regulation
Intraframe Coding
4
5
6
Results
Conclusions
References
Object Interior Coding
4.1
4.2
Adaptive Motion-Compensated Coding
Spatiotemporal (3-D) Coding of Objects
5
6
7
Simulation results
Conclusions
References
24
Embedded Video Subband Coding with 3D SPIHT by
William A. Pearlman, Beong-Jo Kim, and Zixiang Xiong
1
2
Introduction
System Overview
2.1
2.2
System Configuration
3D Subband Structure
3
4
SPIHT
Motion Compensation
5.1
5.2
5.3
Block Matching Method
Hierarchical Motion Estimation
Motion Compensated Filtering
6
7
Implementation Details
Coding results
7.1
7.2
7.3
7.4
The High Rate Regime
The Low Rate Regime
Embedded Color Coding
Computation Times
8
9
Conclusions
References .
data stream is stored in what must be the ultimate compression system, utilizing
a highly prioritized, time-dependent bit allocation method. Yet despite the high
density mapping (which is lossy—nature chose lossy compression!), on important
enough data sets we can reconstruct images (e.g., events) with nearly perfect clarity.
While estimates on the brain’s storage capacity are indeed astounding (e.g.,
B [1]), even such generous capacity must be efficiently used to permit the
full breadth of human capabilities (e.g., a weekend’s worth of video data, if stored
“raw,” would alone swamp this storage). However, there is fairly clear evidence that
sensory information is not stored raw, but highly processed and stored in a kind
of associative memory. At least since Plato, it has been known that we categorize
objects by proximity to prototypes (i.e., sensory memory is contextual or “object-
oriented”), which may be the key.
What we can do with computers today isn’t anything nearly so remarkable. This
is especially true in the area of pattern recognition over diverse classes of structures.
Nevertheless, an exciting new development has taken place in this digital arena that
has captured the imagination and talent of researchers around the globe—wavelet
image compression. This technology has deep roots in theories of vision (e.g. [2])
and promises performance improvements over all other compression methods, such
as those baaed on Fourier transforms, vectors quantizers, fractals, neural nets, and
many others. It is this revolutionary new technology that we wish to present in this
edited volume, in a form that is accessible to the largest readership possible. A first
glance at the power of this approach is presented below in figure 1, where we achieve
1. Introduction 2
a dramatic 200:1 compression. Compare this to the international standard JPEG,
which cannot achieve 200:1 compression on this image; at its coarsest quantization
(highest compression), it delivers an interesting example of cubist art, figure 2.
FIGURE 1. (a) Original shuttle image, image, in full color (24 b/p); (b) Shuttle
image compressed 200:1 using wavelets.
FIGURE 2. The shuttle image compressed by the international JPEG standard, using the
coarsest quantization possible, giving 176:1 compression (maximum).
The subject of digital dieting, or data compression, divides neatly into two cate-
gories: lossless compression, in which exact recovery of the original data is ensured;
and lossy compression, in which only an approximate reconstruction is available.
The latter naturally requires further analysis of the type and degree of loss, and its
suitability for specific applications. While certain data types cannot tolerate any
loss (e.g., financial data transfers), the volume of traffic in such data types is typ-
ically modest. On the other hand, image data, which is both ubiquitous and data
intensive, can withstand surprisingly high levels of compression while permitting
reconstruction qualities adequate for a wide variety of applications, from consumer
imaging products to publishing, scientific, defense, and even law enforcement imag-
ing. While lossless compression can offer a useful two-to-one (2:1) reduction for
most types of imagery, it cannot begin to address the storage and transmission bot-
tlenecks for the most demanding applications. Although the use of lossless compres-
sion techniques is sometimes tenaciously guarded in certain quarters, burgeoning
data volumes may soon cast a new light on lossy compression. Indeed, it may be
refreshing to ponder the role of lossy memory in natural selection.
While lossless compression is largely independent of lossy compression, the reverse
is not true. On the face of it, this is hardly surprising since there is no harm (and
possibly some value) in applying lossless coding techniques to the output of any
lossy coding system. However, the relationship is actually much deeper, and lossy
compression relies in a fundamental way on lossless compression. In fact, it is only
a slight exaggeration to say that the art of lossy compression is to “simplify” the
given data appropriately in order to make lossless compression techniques effective.
We thus include a brief tour of lossless coding in this book devoted to lossy coding.
1. Introduction 4
2 Compression Standards
It is one thing to pursue compression as a research topic, and another to develop
live imaging applications based on it. The utility of imaging products, systems and
networks depends critically on the ability of one system to “talk” with another—
interoperabilty. This requirement has mandated a baffling array of standardization
the ones developed in this book, offer sufficient advantages over JPEG to warrant
rethinking the standard in part. JPEG is fully covered in the book [5], while all of
these standards are covered in [3].
3 Fourier versus Wavelets
The starting point of this monograph, then, is that we replace the well-worn 8-by-8
DCT by a different transform: the wavelet tranform (WT), which, for reasons to be
clarified later, is now applied to the whole image. Like the DCT, the WT belongs to
a class of linear, invertible, “angle-preserving” transforms called unitary transforms.
1. Introduction 5
However, unlike the DCT, which is essentially unique, the WT has an infinitude of
instances or realizations, each with somewhat different properties. In this book, we
will present evidence suggesting that the wavelet transform is more suitable than
the DCT for image coding, for a variety of realizations.
What are the relative virtues of wavelets versus Fourier analysis? The real strength
of Fourier-based methods in science is that oscillations—waves—are everywhere in
nature. All electromagnatic phenomena are associated with waves, which satisfy
Maxwell's (wave) equations. Additionally, we live in a world of sound waves, vibra-
tional waves, and many other waves. Naturally, waves are also important in vision,
as light is a wave. But visual information—what is in images—doesn’t appear to
have much oscillatory structure. Instead, the content of natural images is typically
that of variously textured objects, often with sharp boundaries. The objects them-
selves, and their texture, therefore constitute important structures that are often
present at different “scales.” Much of the structure occurs at fine scales, and is of
low “amplitude” or contrast, while key structures often occur at mid to large scales
with higher contrast. A basis more suitable for representing information at a variety
of scales, with local contrast changes as well as larger scale structures, would be a
better fit for image data; see figure 3.
The importance of Fourier methods in signal processing comes from station-
arity assumptions on the statistics of signals. Stationary signals have a “spec-
tral” representation. While this has been historically important, the assumption
FIGURE 4. A typical natural image (Lena), with an analysis of the local histogram vari-
ations in the image domain.
algorithms. Such an approach would not only fail to reach the vast and growing
numbers of students, professionals and researchers, from whatever background and
interest, who are drawn to the subject of wavelet compression and to whom we are
principally addressing this book; it would perhaps also risk early obsolescence. For
the algorithms presented herein will very likely be surpassed in time; the real value
and intent here is to educate the readers of the methodology of creating compression
1. Introduction 7
FIGURE 5. An analysis of the local histogram variations in the wavelet transform domain.
algorithms, and to enable them to take the next step on their own. Towards that
end, we were compelled to provide at least a brief tour of the basic mathematical
concepts involved, a few of the compression paradigms of the past and present, and
some of the tools of the trade. While our introductory material is definitely not
meant to be complete, we are motivated by the belief that even a little background
can go a long way towards bringing the topic within reach.
4 Overview of Book
This book is divided into four parts: (I) Background Material, (II) Still Image
Coding, (III) Special Topics in Image Coding, and (IV) Video Coding.
4.1 Part I: Background Material
Part I introduces the basic mathematical structures that underly image compression
algorithms. The intent is to provide an easy introduction to the mathematical
concepts that are prerequisites for the remaider of the book. This part, written
largely by the editor, is meant to explain such topics as change of bases, scalar and
vector quantization, bit allocation and rate-distortion theory, entropy coding, the
discrete-cosine transform, wavelet filters, and other related topics in the simplest
terms possible. In this way, it is hoped that we may reach the many potential
readers who would like to understand image compression but find the research
literature frustrating. Thus, it is explicitly not assumed that the reader regularly
reads the latest research journals in this field. In particular, little attempt is made
vector quantization, transforms such as the discrete-cosine transform (DCT), the
JPEG standard, pyramids, and wavelets. A quick tour of potential mathematical
definitions of image quality metrics is provided, although this subject is still in its
formative stage.
Chapter 5 (“Symmetric Extension Transforms”) is written by Chris Brislawn, and
explains the subtleties of how to treat image boundaries in the wavelet transform.
Image boundaries are a significant source of compression errors due to the disconti-
nuity. Good methods for treating them rely on extending the boundary, usually by
reflecting the point near the boundary to achieve continuity (though not a contin-
uous derivative). Preferred filters for image compression then have a symmetry at
their middle, which can fall either on a tap or in between two taps. The appropri-
ate reflection at boundaries depends on the type of symmetry of the filter and the
length of the data. The end result is that after transform and downsampling, one
preserves the sampling rate of the data exactly, while treating the discontinuity at
the boundary properly for efficient coding. This method is extremely useful, and is
applied in practically every algorithm discussed.
1. Introduction 9
4.2 Part II: Still Image Coding
Part II presents a spectrum of wavelet still image coding techniques. Chapter 6
(“Wavelet Still Image Coding: A Baseline MSE and HVS Approach”) by Pankaj
Topiwala presents a very low-complexity image coder that is tuned to minimizing
distortion according to either mean-squared error or models of human visual system.
Short integer wavelets, a simple scalar quantizer, and a bare-bones arithmetic coder
are used to get optimized compression speed. While use of simplified image models
mean that the performance is suboptimal, this coder can serve as a baseline for
comparision for more sophisticated coders which trade complexity for performance
gains. Similar in spirit is chapter 7 by Alen Docef et al (“Image Coding Using
Multiplier-Free Filter Banks”), which employs multiplication-free subband filters for
efficient image coding. The complexity of this approach appears to be comparable
to that of chapter 6’s, with similar performance.
applied; this is in distinction from recent “backward-adaptive” approaches such as
[4] that also perform extremely well. Finally, Chapter 12 (“Low-Complexity Com-
pression of Run Length Coded Image Subbands”) by John Villasenor and Jiangtao
Wen innovates on the entropy coding approach rather than in the quantization as
1. Introduction 10
in nearly all other chapters. Explicitly aiming for low-complexity, these authors
consider the statistics of quantized and run-length coded image subbands for good
statistical fits. Generalized Gaussian source statistics are used, and matched to a
set of Goulomb-Rice codes for efficient encoding. Excellent performance is achieved
at a very modest complexity.
4.3 Part III: Special Topics in Still Image Coding
Part III is a potpourri of example coding schemes with a special flavor in either ap-
proach or application domain. Chapter 13 (“Fractal Image Coding as Cross-Scale
Wavelet Coefficient Prediction”) by Geoffrey Davis is a highly original look at
fractal image compression as a form of wavelet subtree quantization. This insight
leads to effective ways to optimize the fractal coders but also reveals their limi-
tations, giving the clearest evidence available on why wavelet coders outperform
fractal coders. Chapter 14 (“Region of Interest Compression in Subband Coding”)
by Pankaj Topiwala develops a simple second-generation coding approach in which
regions of interest within images are exploited to achieve image coding with vari-
able quality. As wavelet coding techniques mature and compression gains saturate,
the next performance gain available is to exploit high-level content-based criteria to
trigger effective quantization decisions. The pyramidal structure of subband coders
affords a simple mechanism to achieving this capability, which is especially relevant
to surveillance applications. Chapter 15 (“Wavelet-Based Embedded Multispectral
Image Compression”) by Pankaj Topiwala develops an embedded coder in the con-
text of a multiband image format. This is an extension of standard color coding,
in which three spectral bands are given (red, green, and blue) and involves a fixed
color transform (e.g., RGB to YUV, see the chapter) followed by coding of each
band separately. For multiple spectral bands (from three to hundreds) a fixed spec-
LeGall is a reprint of a very early (1991) application of pyramidal methods (in
both space and time) for video coding. It uses the freedom of pyramidal (rather
than perfect reconstruction subband) coding to achieve excellent coding with bonus
features, such as random access, error resiliency, and compatibility with variable
resolution representation. Chapter 21 (“Subband Video Coding for Low to High
Rate Applications”) by Wilson Chung, Faouzi Kossentini and Mark Smith, adapts
the motion-compensation and I-frame/P-frame structure of MPEG-2, but intro-
duces spatio-temporal subband decompositions instead of DCTs.
Within the
spatio-temporal subbands, an optimized rate-allocation mechanism is constructed,
which allows for more flexible yet consistent picture quality in the video stream. Ex-
perimental results confirm both consistency as well as performance improvements
against MPEG-2 on test sequences. A further benefit of this approach is that it is
highly scalable in rate, and comparisions are provided against the H.263 standard
as well.
Chapter 22 (“Very Low Bit Rate Video Coding Based on Matching Pursuits”) by
Ralph Neff and Avideh Zakhor is a status report of their contribution to MPEG-4.
It is aimed at surpassing H.263 in performance at target bitrates of 10 and 24 kb/s.
The main innovation is to use not an orthogonal basis but a highly redundant dic-
tionaries of vectors (e.g., a “frame”) made of time-frequency-scale translates of a
single function in order to get greater compression and feature selectivity. The com-
putational demands of such representations are extremely high, but these authors
report fast search algorithms that are within potentially acceptable complexity
costs compared to H.263, while providing some performance gains. A key objec-
tive of MPEG-4 is to achieve object-level access in the video stream, and chapter
23 (“Object-Based Subband/Wavelet Video Compression”) by Soo-Chul Han and
John Woods directly attempts a coding approach based on object-based image seg-
mentation and object-tracking. Markov models are used for object transitions, and a
version of I-P frame coding is adopted. Direct comparisons with H.263 indicate that
while PSNRs are similar, the object-based approach delivers superior visual quality