20
© 2000 by CRC Press LLC
MPEG System — Video, Audio,
and Data Multiplexing
In this chapter, we present the methods and standards that specify how to multiplex and synchronize
the MPEG-coded video, audio, and other data into a single bitstream or multiple bitstreams for
storage and transmission.
20.1 INTRODUCTION
ISO/IEC MPEG has completed work on the ISO/IEC 11172 and 13818 standards known as MPEG-1
and MPEG-2, respectively, which deal with the coding of digital audio and video signals. Currently,
ISO/IEC is working on ISO/IEC 14496, known as MPEG-4, which is an object-based generic coding standard for
multimedia applications. As mentioned in the previous chapters, the MPEG-1, 2, and 4 standards
are designed as generic standards and as such are suitable for use in a wide range of audiovisual
applications. The coding part of the standards converts the digital visual, audio, and data signals to
compressed formats that are represented as binary bits. The task of the MPEG system is focused
on multiplexing and synchronizing the coded audio, video, and data into a single bitstream or
multiple bitstreams. In other words, the digital compressed video, audio, and data are all first
represented in binary formats, which are referred to as bitstreams, and then the function of the system
is to mix the bitstreams from video, audio, and data together. For this purpose, several issues have
to be addressed by the system part of the standard:
• Distinguishing different data, such as audio, video, or other data;
• Allocating bandwidth during muxing;
• Reallocating or decoding the different data during demuxing;
• Protecting the bitstreams in error-prone media and detecting the errors;
• Dynamically multiplexing several bitstreams.
is also considered by the buffer control or rate control mechanism in the encoder. The video, audio,
and data information are multiplexed according to the system syntax by inserting time stamps for
decoding, presenting, and delivering the coded audio, video, and other data. It should be noted that
both the program stream and the transport stream use packet-oriented multiplexing. Before we
explain these streams, we first give a set of parameter definitions used in the system documents.
Then, we describe the overall picture regarding the basic multiplexing approach for single video
and audio elementary streams.
20.2.1 MAJOR TECHNICAL DEFINITIONS IN THE MPEG-2 SYSTEM
Elementary stream (ES):
A generic term for one of the coded video, coded audio, or other
coded bitstreams in PES packets. One elementary stream is carried in a sequence of PES
packets with one and only one stream identification. This implies that one elementary
stream can only carry the same type of data, such as audio or video.
FIGURE 20.1
Simplified overview of system layer scope. (From ISO/IEC 13818-1, 1996. With permission.)
Packet:
A packet consists of a header followed by a number of contiguous bytes from an
elementary data stream.
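Packet identification (PID):
A 13-bit value in the transport stream packet header that identifies which bitstream a
packet belongs to.
System target decoder (STD):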
A hypothetical reference model of a decoding process used
to describe the semantics of the MPEG-2 system-multiplexed bitstream.
Program-specific information (PSI):
PSI includes normative data that will be used for demultiplexing
of programs in the transport stream by decoders. One case of PSI, the nonmandatory
network information table, is privately defined.
System header:
The leading fields of program stream packets.
Transport stream packet header:
The leading fields of transport stream packets.
The following definitions are related to timing information:
Decoding time stamp (DTS):
A time stamp that may be present in a PES packet header,
used to indicate the time when an access unit is decoded in the system target decoder.
Program clock reference (PCR):
A time stamp in the transport stream from which decoder
timing is derived.
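Presentation time stamp (PTS):
A time stamp that may be present in a PES packet header,
used to indicate the time when a presentation unit is presented in the system target decoder.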
one or more programs. An important feature of a transport stream is that it is
designed so that the following operations are possible with minimum effort. These
operations, which cover several transcoding requirements, are as follows:
• Retrieve the coded data from one program within the transport stream, decode it, and
present the decoded results. In this operation, the transport stream is directly demultiplexed
and decoded. The data in the transport stream are constructed in two layers: a
system layer and a compression layer. The system decoder decodes the transport streams
and demultiplexes them to the compressed video and audio streams that are further
decoded to the video and audio data by the video decoder and the audio decoder,
respectively. It should be noted that nonaudio/video data is also allowed. The function
of the transport decoder includes demultiplexing, depacketization, and other functions
such as error detection, which will be explained in detail later. This procedure is shown
in Figure 20.2.
• Extract the transport stream packets from one program within the transport stream and
produce as the output a new transport stream that contains only that one program. This
operation can be seen as system-layer transcoding that converts a transport stream
containing multiple programs to a transport stream containing only a single program. In
this case, the remultiplexing operation may need the correction of PCR values to account
for changes in the PCR locations in the bitstream.
• Extract the transport stream packets of one or more programs from one or more transport
streams and produce as output a new transport stream. This is another kind of
transcoding that converts selected programs of one transport stream to a different one.
• Extract the contents of one program from the transport stream and produce as output
another program stream. This is a transcoding that converts the transport program to a
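The extraction operations listed above all begin by selecting transport stream packets by their PID. The following Python fragment is a minimal sketch of that selection step only, not a complete remultiplexer. It assumes fixed 188-byte packets that start with the sync byte 0x47 and carry the 13-bit PID in the low 5 bits of byte 1 and all of byte 2, as laid out in ISO/IEC 13818-1. The names `packet_pid`, `filter_program`, and `make_packet` are hypothetical helpers introduced for illustration, and the set of PIDs belonging to a program is assumed to be known already (in practice it would be read from the PSI tables). PCR correction, which the text notes may be required, is not attempted here.

```python
from typing import Iterator, Set

TS_PACKET_SIZE = 188   # bytes per transport stream packet
SYNC_BYTE = 0x47       # first byte of every transport stream packet


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID held in bytes 1 and 2 of the packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_program(stream: bytes, wanted_pids: Set[int]) -> Iterator[bytes]:
    """Yield only the packets whose PID is in wanted_pids (hypothetical helper)."""
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        if packet_pid(packet) in wanted_pids:
            yield packet


def make_packet(pid: int) -> bytes:
    """Build a dummy packet for the demo: 4-byte header plus 184 zero bytes."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


if __name__ == "__main__":
    # Three packets from two made-up PIDs; keep only PID 0x100.
    stream = b"".join(make_packet(p) for p in (0x100, 0x101, 0x100))
    kept = list(filter_program(stream, {0x100}))
    print(len(kept), "packets kept")
```

Concatenating the yielded packets gives a stream that contains only the selected PIDs, which is the starting point for the single-program output described in the second operation.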
the case of containing PES packets only. If the transport stream carries both PES and PSI packets,
then the structure shown in Figure 20.4 results. If the transport stream
packet header indicates that the transport stream packet includes the adaptation field, then the
structure is as shown in Figure 20.5.
In Figure 20.5, the appearance of the optional field depends on the flag settings. The function
of the adaptation field will be explained in the syntax section. Before we go ahead, however, we should
briefly explain the size of the transport stream packet. More specifically, why
is a packet size of 188 bytes chosen? There are several reasons. First, the transport packet
size needs to be large enough that the overhead due to the transport headers is not too significant.
Second, the size should not be so large that the packet-based error correction code becomes
inefficient. Finally, the size of 188 bytes is compatible with ATM: an ATM cell carries 47 bytes of
user data when 1 byte of its 48-byte payload is used by the adaptation layer, so one transport
stream packet fits exactly into four ATM cells. The size of 188 bytes is therefore not a theoretical
optimum but a practical compromise.
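The arithmetic behind the first and last reasons can be checked with a short calculation. This is an illustrative sketch: the 4-byte header size and the 48-byte ATM cell payload with 1 byte taken by the adaptation layer are assumptions drawn from ISO/IEC 13818-1 and common ATM practice, not figures stated in the paragraph above.

```python
TS_PACKET_SIZE = 188    # bytes, as stated in the text
TS_HEADER_SIZE = 4      # bytes, fixed transport packet header (assumption, ISO/IEC 13818-1)
ATM_CELL_PAYLOAD = 48   # payload bytes in a 53-byte ATM cell (assumption)
AAL_OVERHEAD = 1        # bytes per cell used by the adaptation layer (assumption)

# User data carried per ATM cell once the adaptation-layer byte is removed.
usable_per_cell = ATM_CELL_PAYLOAD - AAL_OVERHEAD

# How many cells one transport packet occupies, and how many bytes are left over.
cells_per_packet, leftover = divmod(TS_PACKET_SIZE, usable_per_cell)

# Fraction of each packet spent on the fixed header.
header_overhead = TS_HEADER_SIZE / TS_PACKET_SIZE

print(f"usable bytes per ATM cell: {usable_per_cell}")
print(f"ATM cells per TS packet:   {cells_per_packet} (leftover {leftover} bytes)")
print(f"fixed header overhead:     {header_overhead:.2%}")
```

Under these assumptions a packet maps onto exactly four cells with nothing left over, and the fixed header costs a little over 2% of each packet.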
FIGURE 20.3
Structure of transport stream containing only PES packets. (From ISO/IEC 13818-1, 1996.
With permission.)
FIGURE 20.4
Structure of transport stream containing both PES packets and PSI packets.
20.2.2.2 Transport Stream Syntax
As we indicated, the transport stream is a layered structure. To explain the transport stream syntax,
we start from the transport stream packet header. Because the header is the
highest layer of the stream and is very important, we describe it in more detail. For the rest, we do not repeat the standard
Syntax                          No. of bits   Mnemonic
transport_priority              1             bslbf
PID                             13            uimsbf
transport_scrambling_control    2             bslbf
adaptation_field_control        2             bslbf
continuity_counter              4             uimsbf

bslbf    Bit string, left bit first
uimsbf   Unsigned integer, most significant bit first
bit set to 1. The original idea of adding a flag to indicate the priority of packets comes
from video coding. The video elementary bitstream contains mostly bits that are converted
from DCT coefficients. The priority indicator can set a partitioning point that can
divide the data into a more important part and a less important part. The important part
includes the header information and low-frequency coefficients, and the less important
part includes only the high-frequency coefficients that have less effect on the decoding
and quality of reconstructed pictures.
• PID is a 13-bit field that provides information for multiplexing and demultiplexing by
uniquely identifying which packet belongs to a particular bitstream.
• The transport_scrambling_control is a 2-bit flag. 00 indicates that the packet is not
scrambled; the other three values (01, 10, and 11) indicate that the packet is scrambled by a
user-defined scrambling method. It should be noted that the transport packet header and
adaptation field (when it is present) should not be scrambled. In other words, only the
payload of transport packets can be scrambled.
• The adaptation_field_control is a 2-bit indicator that signals whether or not
there is an adaptation field present in the transport packet. 00 is reserved for future use;
01 indicates no adaptation field (payload only); 10 indicates that there is only an adaptation field and
no payload; and 11 indicates that there is an adaptation field followed by a payload
in the transport stream packet.
of the transport stream packet with the current PID will contain the first byte of a video
sequence header or the first byte of an audio frame.
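The header fields described in this section can be unpacked from the first four bytes of a transport stream packet. The Python sketch below is one possible way to do so and is not taken from the standard's text. It assumes the bit layout of ISO/IEC 13818-1: a sync byte of 0x47, then transport_error_indicator, payload_unit_start_indicator, and transport_priority (1 bit each), the 13-bit PID, and the 2-bit, 2-bit, and 4-bit fields listed in the table above. The `TsHeader` class and `parse_ts_header` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

SYNC_BYTE = 0x47  # assumed sync byte value (ISO/IEC 13818-1)


@dataclass
class TsHeader:
    transport_error_indicator: bool
    payload_unit_start_indicator: bool
    transport_priority: bool
    pid: int                           # 13 bits
    transport_scrambling_control: int  # 2 bits, 0b00 means not scrambled
    adaptation_field_control: int      # 2 bits, 0b00 is reserved
    continuity_counter: int            # 4 bits

    @property
    def scrambled(self) -> bool:
        # 01, 10, and 11 all indicate a scrambled payload.
        return self.transport_scrambling_control != 0

    @property
    def has_adaptation_field(self) -> bool:
        # 10 (adaptation field only) and 11 (adaptation field + payload).
        return bool(self.adaptation_field_control & 0b10)

    @property
    def has_payload(self) -> bool:
        # 01 (payload only) and 11 (adaptation field + payload).
        return bool(self.adaptation_field_control & 0b01)


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte transport packet header (hypothetical helper)."""
    if len(packet) < 4:
        raise ValueError("need at least 4 bytes for the header")
    if packet[0] != SYNC_BYTE:
        raise ValueError("packet does not start with the sync byte")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error_indicator=bool(b1 & 0x80),
        payload_unit_start_indicator=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        transport_scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )


if __name__ == "__main__":
    # Made-up header bytes: PID 0x100, unscrambled, adaptation field plus payload.
    header = parse_ts_header(bytes([0x47, 0x41, 0x00, 0x37]))
    print(header)
    print(header.scrambled, header.has_adaptation_field, header.has_payload)
```

Exposing the scrambling and adaptation-field interpretations as properties keeps the raw 2-bit values available while mirroring the four cases enumerated in the text.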