5
Video Communications Over
Mobile IP Networks
5.1 Introduction
The near future will witness the universal deployment of the third-generation
mobile access networks that are expected to revolutionise the world of telecom-
munications. In addition to conventional voice communication services provided
by the second-generation GSM networks, the third-generation mobile networks
will support a greatly enhanced range of services due to the higher throughput
made available by embracing a number of new access technologies. These include
TDMA and a variety of CDMA radio access families such as the direct sequence
Wideband-CDMA (WCDMA) and multi-carrier CDMA. Consequently, the most
prominent development brought forward by the third-generation family of stan-
dards and protocols, namely IMT-2000, compared to second-generation GSM
systems, is the provision of high data rates that will enable the support of a wide
range of real-time mobile multimedia services including combinations of video,
speech/audio and data/text traffic streams with QoS control (Third-generation
Partnership project). This chapter examines the issues involved in the provision of
video services over the 2.5G and 3G mobile networks, and evaluates the perceived
service quality resulting from video transmissions over these networks under
various operating conditions. The focus will also be on describing and analysing
the performance of a number of tools specifically designed to improve the percep-
tual video quality over the new mobile access networks.
5.2 Evolution of 3G Mobile Networks
The second-generation GSM technology has resulted in a major success for the
delivery of telephony and low bit rate data services to mobile end users. On the
other hand, the tremendous growth of the Internet has given rise to a new range of
multimedia applications that have penetrated the global market at an explosive
Compressed Video Communications
Abdul Sadka
Copyright © 2002 John Wiley & Sons Ltd
the quality of service (QoS) offered to client applications will be a function of
different connection parameters such as throughput, end-to-end delays, error rates
and frame dropping rates. Therefore, each mobile terminal will have access to a
number of bearer channels, each offering a different QoS to the various services
being used. On the other hand, the standardised protocols of the Internet
Protocol (IP) suite, which have led to the widespread success of the Internet,
have allowed an extremely diverse range of terminals and devices to
communicate with each other. Moreover, the accepted application-layer stan-
dards such as the HyperText Transfer Protocol (HTTP) have also allowed multi-
media applications to be deployed and to proliferate. The combination and
interoperability of these universally accepted application and network-layer stan-
dards will certainly constitute the core of the architecture of 3G systems, and will
identify the mechanism of operation of multimedia services over these mobile
platforms. This chapter will focus on the real-time transmission of compressed
Figure 5.1 Evolution of mobile networks
video data encapsulated in IP packets over the future mobile networks. Figure 5.1
illustrates the time evolution of mobile networks as a function of their provided
services. This evolution was consolidated by the remarkable migration from the
second-generation GSM network to the third-generation EDGE (Enhanced Data
rates for GSM Evolution) and UMTS (Universal Mobile Telecommunications System)
networks through the 2.5G packet-switched GPRS (General Packet Radio Service)
and circuit-switched HSCSD (High Speed Circuit Switched Data) systems.
5.3 Video Communications from a Network Perspective
One of the main design trends of multimedia networks is to achieve a connection
between two or more users by bringing digital content, such as video, to their
desktops. Video telephony, videoconferencing, telemedicine and distance learning
are all examples of multimedia applications that aim at providing video (along
with voice) services in a networking environment. Beyond the desktop, multimedia
start and end indicators. However, to enable the receiver to determine the begin-
ning and end of a block of data (set of characters), each block of data begins with a
preamble bit pattern and ends with a post-amble bit pattern, as is the case in
asynchronous communication systems. This block of data is referred to as a
packet. The packet can be of fixed length such as the ATM cell (53 bytes), or
variable length as for IP packets.
Unlike data streams, coded video has a very low tolerance to delay, and
therefore dropped video information cannot be retransmitted. Instead,
compressed video data has to be fitted into a certain structure that enables error
control to be applied in case of information loss and bit errors. This structure is
called a packet and consists of a video payload and a protocol header. The process
of fitting the video payload into this packet structure is called packetisation, and
the part of the communication system where packetisation is performed is known
as the packetiser. Figure 5.2 is a block diagram of a typical packetiser with one
input video source.
A number of advantages are obtained from packetising a compressed video
stream before transmission.
It is intended that a number of applications would be running between two
end-points at the same time. Moreover, the traffic flow between these two end-
points may consist of a number of various traffic types. Therefore, the successful
end-to-end control and delivery of routed multimedia information would be
impossible if the information bits were not sent in packet format. The traffic type of
the payload is then identified by the content of the type field in each packet header.
Using the packet structure, it would be possible to multiplex various streams of
Figure 5.2 Block diagram of a video packetiser/depacketiser system
data onto the same bearer since the depacketiser would then be able to identify the
source of each packet from the content of its type field. Once the source is known,
the payload is then delivered to the corresponding decoder. Consequently, the
selection (RPS) technique.
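The type-field multiplexing described above can be sketched in a few lines of Python. This is only an illustration: the 4-byte header layout (type, sequence number, payload length) and the names `packetise`, `depacketise`, `VIDEO` and `AUDIO` are hypothetical and do not correspond to any particular protocol discussed in this chapter.

```python
import struct
from collections import defaultdict

# Hypothetical header: 1-byte type field, 1-byte sequence number,
# 2-byte payload length, all in network byte order.
HEADER = struct.Struct("!BBH")

VIDEO, AUDIO = 1, 2  # illustrative type-field values


def packetise(stream_type, seq, payload):
    """Prefix a payload with the illustrative header."""
    return HEADER.pack(stream_type, seq % 256, len(payload)) + payload


def depacketise(bearer):
    """Split concatenated packets and group payloads by type field."""
    streams = defaultdict(list)
    offset = 0
    while offset < len(bearer):
        stream_type, seq, length = HEADER.unpack_from(bearer, offset)
        offset += HEADER.size
        streams[stream_type].append((seq, bearer[offset:offset + length]))
        offset += length
    return streams


# Two sources share one bearer; the depacketiser separates them again
# so that each payload can be handed to the corresponding decoder.
bearer = (packetise(VIDEO, 0, b"vid-0")
          + packetise(AUDIO, 0, b"aud-0")
          + packetise(VIDEO, 1, b"vid-1"))
streams = depacketise(bearer)
```

The sequence number is carried only so that the receiver could detect a missing packet; the sketch does not act on it.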
The packet structure also enables the prioritisation of video data in accordance
with its sensitivity to errors and contribution to overall video quality. Some levels
of priority can then be assigned to video packets depending on their payload (the
prioritised information loss of Section 3.7). In case of reported network congestion,
the video encoder drops low-priority packets, hence reducing its output rate for
graceful quality degradation.
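As a rough sketch of this priority-based dropping, the following Python fragment keeps the most important packets that fit within a bit budget and discards the rest. The `VideoPacket` class, the convention that priority 0 is the most important, and the greedy selection rule are all assumptions made for illustration; the text does not prescribe a particular dropping policy.

```python
from dataclasses import dataclass


@dataclass
class VideoPacket:
    priority: int   # 0 = most important (hypothetical convention)
    size_bits: int
    label: str


def drop_low_priority(packets, budget_bits):
    """Keep packets in priority order while they fit in the budget.

    Packets are visited from most to least important (ties broken by
    original position); a packet is kept only if it still fits. The
    survivors are returned in their original transmission order.
    """
    kept = set()
    used = 0
    order = sorted(range(len(packets)),
                   key=lambda i: (packets[i].priority, i))
    for idx in order:
        if used + packets[idx].size_bits <= budget_bits:
            kept.add(idx)
            used += packets[idx].size_bits
    return [p for i, p in enumerate(packets) if i in kept]


packets = [
    VideoPacket(0, 400, "motion"),
    VideoPacket(2, 900, "texture-hf"),
    VideoPacket(1, 600, "texture-lf"),
]
survivors = drop_low_priority(packets, budget_bits=1000)
```

With a budget of 1000 bits, the motion and low-frequency texture packets are kept and the lowest-priority packet is dropped, so the output rate falls while the most error-sensitive data is preserved.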
5.4 Description of Future Mobile Networks
The second-generation mobile cellular networks, namely GSM, do not provide
sufficient capabilities for the routing of packet data. In order to support packet
data transmission and allow the operator to offer efficient radio access to external
IP-based networks such as the Internet and corporate Intranets, GPRS (General
Packet Radio Service) has been developed by ETSI (European Telecommunications
Standards Institute) and added to GSM. GPRS is an end-to-end mobile
packet radio communication system that makes use of the same radio architecture
as GSM (Brasche and Walke, 1997). GPRS permits packet mode data trans-
mission and reception, on both the radio interface and the network infrastructure,
without employing circuit-switched resources. Although GPRS was initially
designed for the provision of non-delay-critical data services, this packet-switched
system can be a suitable medium for video communications for two main
reasons. First, the throughput capability of a single GPRS terminal can be
increased using the multi-slotting feature of the GPRS system, simply by allocating
more timeslots, or PDTCHs (Packet Data Traffic Channels), to a single terminal.
Second, GPRS supports IP, which allows for accessing
and interworking with the video applications of the Internet.
The network infrastructure for implementing the GPRS service is based on IP
technology. For data packet transmission in the GPRS network, the mobile
terminal is identified by an IP address assigned to it either permanently or
dynamically at the time the session is set up. The routing of IP packets is
performed by a logical network entity that is referred to as the GPRS Support
Subsystem (BSS).
The GPRS service introduced in the GSM system is an intermediate step
towards the third-generation UMTS network. EGPRS (Enhanced GPRS) is an
enhanced version of GPRS that allows for a considerable increase in throughput
availability to a single user given enough traffic availability from active sources
and benign interference conditions. This implies that EGPRS can provide video
services with higher data rates than is possible with GPRS. EGPRS uses the same
protocol architecture as GPRS, described above, but improves the modulation
scheme employed in the EDGE (Enhanced Data rates for GSM Evolution) radio
interface, which increases the available throughput. Similarly,
UMTS uses an innovative radio access approach to increase the available capacity
of the radio interface. The UMTS infrastructure is integrated with GSM so that
the UMTS core network can perform both the circuit- and packet-switching
functions. However, the major technological innovations of UMTS are incorpor-
ated in the packet-switched IP nodes. The structure of the packet switched part of
the UMTS core network is similar to that of the GPRS described above, where the
BSS access segment is replaced by the UTRAN (Universal Terrestrial Radio
Access Network) access network that is based on W-CDMA (Wideband Code
Division Multiple Access) technologies. The connection between the UMTS core
network and UTRAN access network is guaranteed by a new interface called I_S,
which specialises in managing both the packet-switched and the circuit-switched
components. The main improvements achieved by UMTS compared to GPRS are
in the IP mobility management and the quality of service control. UMTS offers a
range of QoS levels that are suitable for real-time video communications, namely
those specified in the conversational and streaming classes. The main feature that
defines the capability of a QoS class to accommodate a real-time video service is its
case, the bit errors result in the same effects that have been examined in Chapter 4.
However, in packet video networks, quality degradation could also be due to
network congestion and link overflows. These network problems result in com-
pletely discarding the video packets that have been subject to excessive amounts of
delay. In order to mitigate the effect of packet loss, some intelligent content-based
packetisation schemes must be employed.
5.5.1 Packetisation schemes
The structure of a packet depends on the layer at which the packet is defined and
the networking platform upon which the packets are transmitted. As described in
Section 4.4, MPEG-4 defines an application layer packet structure where each
packet consists of two main partitions. The first partition contains the more
error-sensitive shape and motion data, while the second partition consists of the
more error-tolerant texture data. This packetisation scheme allows the video
decoder to successfully reconstruct (with minor quality degradation) the MBs
contained in a packet using their motion and shape data (first partition) when
errors hit only the texture data (second partition) of the packet. This application
layer MPEG-4 packet differs from the transport layer packet in which the MPEG-
4 packets are encapsulated. The latter has additional protocol headers which
reduce the overall throughput available to the video source. The overhead im-
posed by the packetisation scheme depends on the transport mechanism employed
for the transmission of video packets. For instance, packing coded video streams
in RTP (Schulzrinne et al., 1996) packets for real-time video transmission over IP
networks has different implications from packing the same video data into ATM
cells for transport over B-ISDN (Broadband Integrated Services Digital
Network).
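The benefit of the two-partition structure can be illustrated with a small Python sketch of the decoder's decision. The function name `decode_packet`, the use of flat sample lists, and the choice to return `None` when the first partition is damaged are assumptions for illustration only; the text states only what happens when errors are confined to the texture partition.

```python
def decode_packet(first_ok, second_ok, prediction, residual):
    """Return reconstructed MB samples, or None if the packet is unusable.

    prediction: motion-compensated samples derived from the first
                partition (shape and motion data).
    residual:   texture residual decoded from the second partition.
    """
    if not first_ok:
        # Assumption: without shape and motion data the MBs of this
        # packet cannot be rebuilt and must be concealed elsewhere.
        return None
    if not second_ok:
        # Texture lost: reconstruct from the motion-compensated
        # prediction alone, at the cost of minor quality degradation.
        return list(prediction)
    # Both partitions intact: prediction plus texture residual.
    return [p + r for p, r in zip(prediction, residual)]


intact = decode_packet(True, True, [100, 120], [3, -2])
texture_lost = decode_packet(True, False, [100, 120], [3, -2])
motion_lost = decode_packet(False, True, [100, 120], [3, -2])
```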
The layering structure of video coding standards requires that some information
should be specified in the video packet at each level of the hierarchy. For instance,
at the frame level, information such as temporal reference and picture header is
contained in the output stream. At the GOB level, the GOB number and the
quantiser level for the entire GOB are indicated. At the MB level, both coded and
the 48-byte payload, the coded video can be packed using one of two different
approaches (Ghanbari and Hughes, 1993), as illustrated in Figure 5.4.
In the close packing scheme, video data is packed continuously in the payload
field until the ATM cell is completely full. This leads to the possibility that some
MBs can be split between two adjacent cells. In the second approach, i.e. the loose
packing, each ATM cell contains an integral number of MBs. In both methods, an
eight-bit field is assigned to the cell sequence number and a five-bit one to the
picture number. Moreover, in both methods, the first complete MB inside the
ATM cell is absolutely addressed with reference to the picture information, while
all the following MBs in the cell are relatively addressed. The use of absolute
addressing is useful in eliminating the effect of cell loss propagation into the
forthcoming correctly received cells. A unique bit pattern is used in the close
packing methodology to designate the end of the variable-length section of data
belonging to the previous cell. This unique bit pattern must be different from the
GSC (GOB Start Code) so that the depacketiser will not fall on a false start of a
GOB. The shorter this bit pattern, the higher the probability of falsely detecting it
due to combinations of other codewords in the ATM cell. However, it is a
Figure 5.4 Packing video in ATM cells: (a) close packing, (b) loose packing
requirement to reduce the size of this unique bit pattern in order to minimise the
amount of overhead imposed by the close packetisation scheme. As a trade-off
between throughput and error robustness, the size of the unique bit pattern is set
to 11 bits. Therefore, the total overhead of the close packing scheme is 4.125 bytes,
whereas it is only 2.75 bytes for the loose packing technique. However, the loose
packing scheme results in a less efficient use of bandwidth, especially when ATM
cells carry the traffic of multiple video sources.
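The overhead figures quoted above can be checked with simple arithmetic. The sketch below uses only the field sizes stated in the text (8-bit sequence number, 5-bit picture number, 11-bit unique pattern) and the two quoted totals. The 9 bits left over in the loose packing total are not itemised in this passage; attributing them to the absolute MB address is an inference, not something the text states.

```python
CELL_PAYLOAD_BITS = 48 * 8        # 48-byte ATM cell payload

SEQ_NUMBER_BITS = 8               # cell sequence number
PICTURE_NUMBER_BITS = 5           # picture number
UNIQUE_PATTERN_BITS = 11          # close packing only

LOOSE_TOTAL_BITS = int(2.75 * 8)  # 22 bits quoted for loose packing
CLOSE_TOTAL_BITS = int(4.125 * 8) # 33 bits quoted for close packing

# Bits not accounted for by the two named fields (presumably the
# absolute MB address; this is an inference from the totals).
REMAINING_BITS = LOOSE_TOTAL_BITS - SEQ_NUMBER_BITS - PICTURE_NUMBER_BITS

# The two schemes differ exactly by the unique bit pattern.
EXTRA_CLOSE_BITS = CLOSE_TOTAL_BITS - LOOSE_TOTAL_BITS


def header_efficiency(overhead_bits):
    """Fraction of the cell payload left for video after header fields."""
    return 1 - overhead_bits / CELL_PAYLOAD_BITS


close_eff = header_efficiency(CLOSE_TOTAL_BITS)
loose_eff = header_efficiency(LOOSE_TOTAL_BITS)
```

Note that `header_efficiency` counts header fields only. It ignores the unused bits left at the end of a loosely packed cell when the next MB does not fit, which is why loose packing can still use bandwidth less efficiently despite its smaller header.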
Apart from bandwidth utilisation, the packetisation scheme also has an effect on
the error performance of the packet video application. In the ATM cell close
packing technique, the loss of a cell affects not only the MBs of the discarded cell,
over the future mobile networks depends on a number of other parameters,
namely the available throughput and the employed channel coding schemes. For
example, the GPRS data is transmitted over the Packet Data Traffic CHannel
(PDTCH) after being error-protected using one of four possible channel protec-
tion schemes, namely CS-1, CS-2, CS-3 and CS-4. The first three coding schemes
use convolutional codes and block check sequences of different strengths to
produce different protection rates. CS-2 and CS-3 use punctured versions of the
CS-1 code, thereby allowing for a greater user payload at the expense of reduced
performance in error-prone environments. However, CS-4 only provides error
detection functionality and is therefore not suitable for video transmission
purposes. For video applications, experiments have shown that only CS-1
and CS-2 achieve acceptable video quality. Table 5.1 shows the data rates
provided per timeslot for each one of these GPRS channel coding schemes.
As can be observed in Table 5.1, the payload available in a GPRS radio block
depends on the channel coding scheme used. The rate of the RLC/MAC data
payload, i.e. the rate presented to the LLC layer, varies from 8 kbit/s for CS-1 to
20.35 kbit/s for CS-4. Depending on the multislotting capabilities of the mobile
GPRS terminal, the throughput available to the terminal is a multiple of these data
rates. These data rates represent only the throughput at which LLC PDUs (Protocol
Data Units) are transmitted across the radio interface. However, when
considering the GPRS protocol stack illustrated in Figure 5.3, it can be seen that the
RLC/MAC data payload will contain header and other related signalling over-
heads from the LLC, SNDC, IP, UDP and RTP layers. The presence of these
overheads will reduce the true throughput presented to the application layer, i.e.
the video source coder. The protocol overheads constitute approximately 10 per
cent to 15 per cent of the total throughput at the RLC layer for QCIF video
transmissions at frame rates of 5 to 10 f/s when no header compression is applied.
For this reason, the total throughput, as seen by the application layer in the GPRS
protocol stack, for all combinations of timeslots (TS) and channel coding schemes
(CS) allowed by GPRS, is depicted in Table 5.2.
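As a rough illustration of how the application-layer throughput follows from the figures above, the sketch below scales the per-timeslot RLC/MAC rate by the number of timeslots and removes the 10 to 15 per cent protocol overhead. Only the CS-1 and CS-4 rates quoted in the text are included, and the function name `app_throughput_range` is hypothetical; the result is an estimate under these assumptions, not a reproduction of Table 5.2.

```python
# Per-timeslot RLC/MAC payload rates quoted in the text (kbit/s).
RLC_RATE_KBPS = {"CS-1": 8.0, "CS-4": 20.35}


def app_throughput_range(scheme, timeslots, overhead=(0.10, 0.15)):
    """Estimate (min, max) application-layer throughput in kbit/s.

    overhead: lowest and highest fraction of the RLC-layer throughput
    consumed by LLC, SNDC, IP, UDP and RTP headers when no header
    compression is applied.
    """
    low_overhead, high_overhead = overhead
    rlc_rate = RLC_RATE_KBPS[scheme] * timeslots
    return rlc_rate * (1 - high_overhead), rlc_rate * (1 - low_overhead)


low, high = app_throughput_range("CS-1", timeslots=4)
```

For CS-1 with four timeslots, the 32 kbit/s available at the RLC/MAC layer shrinks to roughly 27.2 to 28.8 kbit/s at the video source coder.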
depicted in Table 5.3.
As in GPRS, due to the overheads imposed by the protocols overlying the
RLC/MAC layer, some protocol efficiency has to be compromised. In EGPRS, a
protocol efficiency of 85 per cent can be achieved for a QCIF frame rate of
5 f/s, assuming an overall header size of 44 bytes in each RLC/MAC block.
Consequently, the throughput presented to video sources at the application layer
is less than that available at the RLC/MAC layer and can vary with the employed
MCS scheme. Using a single timeslot at the radio interface, it is possible to provide
the 5 f/s video coder at the application layer of an EGPRS terminal with a source
throughput varying from 7.5 kbit/s for MCS-1 to 50 kbit/s for MCS-9. Using the
multislotting capabilities of the radio interface, the video source can have
multiples of these data rates, as shown in Table 5.4. This reflects the large spread in
the values of available throughput for video services over EGPRS. The choice of a
suitable CS-TS combination for video services over mobile networks depends
highly on the activity of the video source and error characteristics of the radio
network.
Table 5.4 Video source throughput in kbit/s for all EGPRS TS/MCS combinations
Scheme   1 TS    2 TS    3 TS    4 TS    5 TS    6 TS    7 TS    8 TS
MCS-1    7.5     15      22.5    30      37.5    45      52.5    60
MCS-2    9.6     19.2    28.8    38.4    48      57.6    67.2    76.8
MCS-3    12.6    25.2    37.8    50.4    63      75.6    88.2    100.8
MCS-4    15      30      45      60      75      90      105     120
MCS-5    19      38      57      76      95      114     133     152
MCS-6    25.2    50.4    75.6    100.8   126     151.2   176.4   201.6
MCS-7    38      76      114     152     190     228     266     304
MCS-8    46.2    92.4    138.6   184.8   231     277.2   323.4   369.6
MCS-9    50.31   100.6   150.9   201.2   251.5   301.8   352.1   402.4
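The entries of Table 5.4 are multiples of the single-timeslot rates, so the table can be regenerated, and the smallest timeslot allocation for a given video bit rate found, with a short Python sketch. The single-slot values are taken from the 1 TS column of the table; the function names and the 64 kbit/s target are illustrative assumptions.

```python
import math

# Single-timeslot video source throughput from Table 5.4 (kbit/s).
SINGLE_SLOT_KBPS = {
    "MCS-1": 7.5, "MCS-2": 9.6, "MCS-3": 12.6,
    "MCS-4": 15.0, "MCS-5": 19.0, "MCS-6": 25.2,
    "MCS-7": 38.0, "MCS-8": 46.2, "MCS-9": 50.31,
}
MAX_TIMESLOTS = 8


def source_throughput(mcs, timeslots):
    """Video source throughput in kbit/s for an MCS/TS combination."""
    if not 1 <= timeslots <= MAX_TIMESLOTS:
        raise ValueError("timeslots must be between 1 and 8")
    return SINGLE_SLOT_KBPS[mcs] * timeslots


def min_timeslots(mcs, target_kbps):
    """Smallest number of timeslots meeting the target, or None."""
    needed = max(1, math.ceil(target_kbps / SINGLE_SLOT_KBPS[mcs]))
    return needed if needed <= MAX_TIMESLOTS else None


# Hypothetical 64 kbit/s video source under three coding schemes.
slots_mcs1 = min_timeslots("MCS-1", 64)
slots_mcs5 = min_timeslots("MCS-5", 64)
slots_mcs9 = min_timeslots("MCS-9", 64)
```

Under these figures, a 64 kbit/s source cannot be carried with MCS-1 even with all eight timeslots, whereas MCS-5 needs four timeslots and MCS-9 needs two. The sketch considers throughput only; which scheme is acceptable in practice also depends on the error characteristics of the radio channel.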
5.6 Real-time Video Transmissions over Mobile IP Networks
Therefore, video frames are segmented and encapsulated into RTP packets, which
are then embodied in the packet structure of the underlying protocols, namely
UDP and IP as shown in Figure 5.5.
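A minimal sketch of this segmentation step is given below. It builds the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries) and splits one coded frame into packets of bounded payload size. The function name `rtp_packets`, the payload type 96 (a value from the dynamic range) and the naive byte-wise splitting are assumptions for illustration; setting the marker bit on the last packet of a frame is a common convention in video payload formats rather than something stated in this text.

```python
import struct

RTP_VERSION = 2
# Fixed RTP header: flags, marker/payload type, sequence number,
# timestamp, SSRC (12 bytes, network byte order).
RTP_HEADER = struct.Struct("!BBHII")
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # IPv4 header without options


def rtp_packets(frame, timestamp, ssrc, first_seq, max_payload,
                payload_type=96):
    """Split one coded frame into RTP packets sharing one timestamp."""
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)]
    packets = []
    for n, chunk in enumerate(chunks):
        marker = 1 if n == len(chunks) - 1 else 0  # last packet of frame
        byte0 = RTP_VERSION << 6                   # P = 0, X = 0, CC = 0
        byte1 = (marker << 7) | payload_type
        seq = (first_seq + n) & 0xFFFF             # 16-bit wrap-around
        header = RTP_HEADER.pack(byte0, byte1, seq,
                                 timestamp & 0xFFFFFFFF, ssrc)
        packets.append(header + chunk)
    return packets


frame = bytes(250)  # placeholder for a 250-byte coded frame
packets = rtp_packets(frame, timestamp=90000, ssrc=0x1234,
                      first_seq=65535, max_payload=100)
# Header overhead per packet once UDP and IPv4 headers are added.
per_packet_overhead = RTP_HEADER.size + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
```

Each packet therefore carries 40 bytes of RTP/UDP/IPv4 headers before any lower-layer overhead, which is why small payloads use the channel inefficiently. Splitting at arbitrary byte positions, as done here, ignores the boundaries of the coded video data.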
5.6.1 Packetisation of data partitioned MPEG-4 video using
RTP/UDP/IP
The careful packetisation of video data is necessary to ensure the optimal trade-off
between the channel utilisation and error robustness. Several researchers (Basso,
Varakliotis and Castagno, 2000) have attempted to develop optimal techniques in
order to pack compressed video data into RTP packets for real-time transmission
over IP networks. The main focus of their work has been on the ability to
synchronise MPEG-4 streams with other RTP payloads, the monitoring of
MPEG-4 delivery performance through the use of the RTP Control Protocol
(RTCP) on the reverse channel, and also the
combination of MPEG-4 with other real-time data streams into a set of con-
solidated streams by means of RTP mixers. However, these packetisation tech-
niques did not focus on the error-resilience issues of packet video over mobile
networks. The size of the video payload and the sequence of video data within each
packet do have a direct influence on the error robustness and channel utilisation of
the video application. Therefore, in order to achieve the best quality of service, the
error-resilience aspects of the packetisation scheme have to be considered.
On the other hand, due to the time-varying nature of the mobile channel
conditions, the packetisation techniques ought to be adaptive in order to maintain
an optimal trade-off between throughput and error resilience at any instant of