MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Nguyen Thi Thanh Nga
EFFICIENT DATA COMMUNICATION FOR WIRELESS SENSOR NETWORK
BASED ON DATA CORRELATION
Major: Computer Engineering
Code No.: 9480106
COMPUTER ENGINEERING DISSERTATION
SUPERVISORS:
1. Dr. Nguyen Kim Khanh
2. Assoc. Prof. Ngo Hong Son
Hanoi - 2018
TABLE OF CONTENT
COMMITMENT ...................................................................................................... 2
ACKNOWLEDGMENTS ........................................................................................ 3
TABLE OF CONTENT ........................................................................................... 4
LIST OF ABBREVIATIONS .................................................................................. 7
LIST OF FIGURES .................................................................................................. 8
LIST OF TABLES .................................................................................................. 11
PREFACE ............................................................................................................... 13
1
INTRODUCTION ........................................................................................... 16
Overview .............................................................................................. 31
2.2.2
Entropy concept ................................................................................... 32
2.2.3
Joint entropy ......................................................................................... 32
Correlation and entropy ............................................................................ 33
2.3.1
Correlation of two variables ................................................................. 33
2.3.1.1 Mutual information ............................................................. 33
2.3.1.2 Entropy correlation coefficient ........................................... 34
4
2.3.2
Correlation of more than two variables ................................................ 36
Conclusions .............................................................................................. 38
3
ENTROPY-BASED CORRELATION CLUSTERING .............................. 39
Joint entropy estimation ........................................................................... 39
4
ENTROPY CORRELATION BASED DATA AGGREGATIONS ........... 57
Compression aggregation ......................................................................... 57
4.1.1
Comparison of compression schemes .................................................. 57
4.1.2
Compression based routing scheme in a correlated region .................. 60
4.1.2.1 1-D analysis ........................................................................ 61
4.1.2.2 2-D analysis ........................................................................ 65
4.1.2.3 General topology model analysis ........................................ 69
4.1.3
Optimal routing scheme in correlation networks ................................. 71
Representative aggregation ...................................................................... 72
4.2.1
Distortion function ............................................................................... 72
4.2.2
Number of representative nodes........................................................... 73
4.2.3
5.4.2
Simulation results and discussions ....................................................... 92
5.4.2.1 Compression aggregation-based routing protocol .............. 92
5.4.2.2 Representative aggregation-based routing protocol ........... 97
5.4.3
Evaluations and comparison .............................................................. 100
5.4.3.1 The case of ECODA with compression aggregation ........ 101
5.4.3.2 The case of ECODA with representative aggregation ...... 106
Conclusions ............................................................................................ 107
6
CONCLUSIONS AND FUTURE STUDY .................................................. 109
Summary of Contributions ..................................................................... 109
Limitations .............................................................................................. 110
Future work ............................................................................................ 111
PUBLICATION LIST .......................................................................................... 112
REFERENCES ..................................................................................................... 113
APPENDIX............................................................................................................ 125
6
LIST OF ABBREVIATIONS
Abbreviation
LEACH-C
Low Energy Adaptive Clustering Hierarchy- Centralized
MAC
Media Access Control
MEMS
Micro Electro Mechanical Systems
QoS
Quality of Service
RDC
Routing Driven Compression
RSSI
Received Signal Strength Indication
SPT
Shortest Path Tree
TDMA
Figure 3.2 Sensor layout in Intel Berkeley Research Lab ........................................ 45
Figure 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of
the dataset 1 ..................................................................................................... 46
Figure 3.4 Estimated joint entropy with different values of entropy correlation
coefficients using upper bound function (with Hmax = 2[bits]) ........................ 48
Figure 3.5 Estimated joint entropy (by upper bound) and practical joint entropy of
dataset 1 ........................................................................................................... 49
Figure 3.6 Correlation-based clustering algorithm................................................... 52
Figure 3.7 Temperature data measured at 11 nodes in the dataset 1 ........................ 53
Figure 3.8 Derivative of estimated joint entropy and calculated the joint entropy of
the selected group ............................................................................................ 55
Figure 4.1 Routing paths for three schemes: (a) DSC, (b) RDC, and (c) CDR [122]
.......................................................................................................................... 59
Figure 4.2 Energy consumptions for the DSC, RDC and CDR schemes respectively
to entropy correlation coefficients. .................................................................. 60
Figure 4.3 Routing pattern of 1-D network .............................................................. 61
Figure 4.4 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 1-D with compression along
SPT to the cluster head. ................................................................................... 63
Figure 4.5 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 1-D with compression at the
cluster head only. ............................................................................................. 64
Figure 4.6 Routing pattern of the 2-D network [122] .............................................. 65
8
Figure 4.7 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 2-D with compression along
SPT to the cluster head. ................................................................................... 67
Figure 4.8 Total bit-hop cost Es that corresponds to cluster size with different values
Figure 5.9 Total energy in each round in case of representative aggregation with
compression with 16 correlation clusters ........................................................ 98
Figure 5.10 Number of alive nodes in each round in case of representative aggregation
with compression with 16 correlation clusters ................................................ 98
Figure 5.11 Total energy in each round in the case of representative aggregation
without compression with 16 correlation clusters ........................................... 99
Figure 5.12 Number of alive nodes in each round in the case of representative
aggregation without compression with 16 correlation clusters. .................... 100
Figure 5.13 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 16 correlation clusters ............ 101
Figure 5.14 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 16 correlation clusters ............ 102
Figure 5.15 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 8 correlation clusters .............. 102
Figure 5.16 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 8 correlation clusters .............. 103
Figure 5.17 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 4 correlation clusters .............. 104
Figure 5.18 Total energy comparison distance-based protocol and ECODA with
compression aggregation in the case of 4 correlation clusters ...................... 105
Figure 5.19 Total energy comparison between distance-based protocol and ECODA
with representative aggregation in the case of 16 correlation clusters .......... 106
Figure 5.20 Number of alive nodes comparison between distance-based protocol and
ECODA with representative aggregation in the case of 16 correlation clusters
........................................................................................................................ 107
10
11
Table 5.8 Comparison between distance-based protocol and ECODA with
compression aggregation in the case of 4 correlation clusters ...................... 105
Table 5.9 Comparison between distance-based protocol and ECODA with
representative aggregation in the case of 16 correlation clusters .................. 106
12
PREFACE
Wireless Sensor Network (WSN) is the collection of sensor nodes which
cooperatively monitor surrounding phenomena over large physical areas. The
advances in the integration of micro-electro-mechanical systems and digital
electronics with the development of wireless communications have enabled the wide
deployment of WSNs. Sensor nodes in WSNs have been equipped with various
sensing capabilities in space and time and higher processing capacities can satisfy
requests from various modern applications. Because of low-cost, small-in-size and
no-replace battery powered characteristics of sensor nodes, energy conservation is
commonly recognized as the key challenge in designing and operating the networks.
In typical WSNs applications, sensors are required for spatially dense
deployment to achieve satisfactory coverage. As a result, multiple sensors will record
information about a single event in the sensing field, i.e. sensed data are correlated
with each other. The existence of correlation characteristic can bring many significant
potential advantages for the development of efficient communication protocols wellsuited to the WSNs paradigm. For example, due to the correlation degree, data in a
correlated region can be compressed with a high ratio to reduce the amount of sent
data for saving dissipated energy. Even with high enough correlation, it may not be
necessary for every sensor node in a correlation group to transmit its data to the base
network is reduced so that the dissipated energy is decreased.
The thesis structure is as follows:
Chapter 1: Introduction
This chapter reviews the introduction of WSNs, energy conservation schemes,
and data correlation problems. The main contributions of the thesis are also presented
shortly in this chapter.
Chapter 2: Correlation in Wireless Sensor Network
This chapter presents the survey of correlation model in WSNs and the
correlation through the point of view of information entropy. Then, the idea to
establish a new correlation model is described.
Chapter 3: Entropy-based Correlation Clustering.
Based on the analyzed factors in chapter 2, we propose the approximated
estimation of joint entropy. From this approximation method, we define the
correlation region and propose the correlation clustering scheme. We also verify the
validation of the proposed estimation and correlation clustering scheme in this
chapter.
Chapter 4: Entropy-based Data Aggregations
In this chapter, we exploit the advantages of using data correlation by data
aggregation using entropy correlation including entropy-based representative
aggregation and entropy-based data compression.
In entropy-based representative aggregation, the distortion of data in the group
while some nodes are put into sleep state is evaluated using the proposed correlation
model. From this evaluation, the number of representative nodes in a group is decided
14
based on the desired distortion and then a representative aggregation routing protocol
is developed.
In entropy-based data compression, several data compression schemes are
evaluated using the proposed correlation model. Then a compression-based routing
(VLSI), microelectromechanical systems (MEMS), and wireless communications,
that make sensors become tinier, low-power, inexpensive, further contribute to the
widespread use of distributed sensor systems such as wireless sensor networks.
Figure 1.1 Wireless Sensor Network1
1
/>
16
Wireless Sensor Network (WSN), is the collection of sensor nodes which
cooperatively monitor surrounding phenomena over large physical areas [1]–[4].
These sensor nodes can sense, observe or measure, gather information from the
environment and transmit the sensed data to the user based on some local decision
process. A typical sensor node is composed of a sensing unit which is equipped with
one or more sensors, a processing unit, a power unit, and a transceiver unit. The
sensing unit could have various sensors such as thermal, biological, chemical, optical,
and magnetic to measure properties of the environment. A sensor node acquires data
through the sensing unit, processes sensed data by the processing unit and finally
transmits processed data using the transceiver unit. Because of the limitations of
memory capabilities, sensor nodes should be implemented by wireless
communication to transfer the data to a base station, allowing them to disseminate
their sensor data to remote processing, visualization, analysis, and storage systems.
Figure 1.2 Wireless Sensor Network Applications 2
2
WSNs consist of various low-cost sensor nodes equipped with cameras and
microphones. They are usually deployed in a pre-planned manner into the
environment to guarantee coverage. Multi-media sensor nodes interconnect with each
other over a wireless connection for data retrieval, process, correlation, and
compression. Because of high data transmission, challenges in multi-media WSN
include high bandwidth demand, high energy consumption, quality of service (QoS)
provisioning, data processing and compressing techniques, and cross-layer design.
Mobile WSNs [8] [9] consist of a collection of sensor nodes that can move on
their own and interact with the physical environment. Same as in static WSNs, nodes
in mobile WSNs can sense, compute, and communicate. But unlike from static nodes,
18
mobile nodes can reposition and organize itself in the network. This mobility
characteristic requires dynamic routing in a mobile WSN. Challenges in mobile WSN
include deployment, localization, self-organization, navigation and control, coverage,
energy, maintenance, and data process.
The above described features of WSNs ensure great potential for many
applications [10]–[14]. The development of WSNs was motivated by military
applications [15]–[19] and then were widely used in various fields such as industrial
monitoring [20]–[25], environment monitoring [26]–[33], agriculture [34]–[37],
forest fire detection [38]–[40], animal tracking [41] [42], healthcare [43]–[50],
security [51]–[53], home automation [54] [55], power utility’s distribution [56],
logistics [57], intelligent traffic systems [58], etc.
In Vietnam, studies on WSNs have been considered in the last two decades.
The most attracted topics are energy saving and load balancing in WSNs, in
consideration of base station position [59], delay constrained [60], 3D WSN [61],
WSNs with holes [62], k-means clustering [63]. The applications of WSNs are also
widely considered such as landside monitoring [64], smart grid [65], target tracking
[66], logistics [57], and healthcare monitoring [67].
physical layer by adjusting radio transmission power. The idea is that a lower
communication range between nodes requires less power from radio [76] [77].
Another idea is that a node with higher remaining energy may increase its
transmission power, which enables other nodes to decrease their transmission power
[78].
Directional antenna schemes allow the signal to be sent and received in one
direction at a time that allows the improvement of transmission range and throughput
[79] [80]. To take advantage of directional antennas, new MAC protocols have been
proposed in [81] [82]. In addition, some specific problems also have to be considered
in [83].
1.2.2 Sleep/wake-up schemes
Sleep/wake-up schemes try to adapt node activity to save energy by putting
the radio in sleep mode. The main idea of this approach is the duty cycling scheme.
Duty cycling scheme schedules the node radio state according to network activity to
minimize idle listening and favor the sleep mode. They are the most energy-efficient
but suffer from sleep latency. In some cases, it is not possible to broadcast information
to all its neighbors because of unsimultaneously active. In addition, some fixing
parameters such as listening/sleeping period, preamble length, and slot time are
strictly issues because of system performance. The detailed survey of duty cycling
can be found in [84].
1.2.3 Energy efficient routing
Routing is also a burden that makes seriously drain energy reserves. In general,
there are various routing paradigms. In this research area, some main paradigms are
considered such as cluster architecture, energy as a routing metric, multipath routing,
20
relay node placement. An extensive review of energy-aware routing protocols can be
found in [85] [86].
can lose much of the original structure in the extracted data.
The second type of aggregation technique is data compression. Data
compression techniques are further divided into distributed data compression [97]
21
[98] and local data compression [99] [100]. The distributed data compression
techniques are the most optimal compression. However, it is much more complicated
than local data compression that is with smaller compression rate. The detailed survey
of data compression in WSN can be found in [101]. It is important to note that the
data compression techniques are only effective with correlation data. Therefore, the
correlation is usually required when using these techniques.
The third type of aggregation technique is representative type [102] in which
some nodes are chosen to be the representative of a group of nodes. The other nodes
in the group can be put to sleep to save energy. The number of sleep nodes that affects
the power consumption is decided by specified distortion. Same as data compression,
these techniques required data in correlation.
Adaptive sampling techniques adjust the sampling rate at each sensor while
ensuring that application needs are met in terms of coverage of information precision
by exploiting spatial-temporal correlations between data. By reducing the number of
samples, the amount of transmitted data is reduced thus save the node energy. The
temporal analysis of sensed data is used in [103] and spatial correlation is used in
[104]. More details about adaptive sampling can be found in [105].
Network coding is used to reduce the traffic in broadcast scenarios by sending
a linear combination of several data instead of a copy of each data. At the destination
nodes, data can be decoded by solving the linear equations [84] [106]. Network
coding exploits the trade-off between computation and communication since
communications are slow compared to computations and more energy consumption.
1.2.5 Charging solution
Several recent types of research address energy harvesting and wireless
degree, data in a correlated region can be compressed with a high ratio, thus the
amount of sent data is reduced [109]. Even with high enough correlation, it may not
be necessary for every sensor node in a correlation group to transmit its data to the
base station; instead, a smaller number of sensor measurements might be adequate to
communicate the event features to the base station within a certain reliability/fidelity
level [110].
In addition, in WSNs, the power breakdown heavily depends on the specific
node. However, the following remarks generally hold [109] [111].
• The radio energy consumption is of the same order of magnitude in the reception,
transmission, and idle states, while the power consumption drops of at least one
order of magnitude in the sleep state. Therefore, the radio should be put to sleep
(or turned off) whenever possible.
• The communication activity has an energy consumption much higher than the
computation activity. It has been shown that transmitting one bit may consume as
much as executing a few thousand instructions [112]. Therefore, communication
should be traded for computation.
Data correlation can allow us to reduce the data transferring, or even to put
some sensor nodes to sleep. Thus, it can make WSNs conserve energy significantly.
23
Problem statements and contributions
The main problems in this research are “How to recognize the correlation
among dataset by looking at data itself and how to exploit the correlation
characteristic for energy conservation in WSNs”. In this research, we focus on
WSNs working in high correlation environment. A high correlation environment can
be divided into groups called correlation regions where measured data strongly
depends on each other. By clustering sensor nodes into correlation regions, data
aggregation can be done to conserve the energy in WSNs. In this paper, we focus on
CHAPTER 2
2
CORRELATION IN WIRELESS SENSOR NETWORK
As mentioned in chapter 1, correlation characteristic has many significant potential
advantages for the development of energy-efficient communication protocols for
WSNs. To evaluate and exploit the correlation characteristic, it is necessary to build
a correlation model. This chapter concentrates on the survey of the existing
correlation models. From the advantages and limitations of the previous correlation
model, the approaching methodology of developing a new correlation model will be
pointed out.
Correlation model survey
Correlation is represented for the relationships between quantitative
variables or categorical variables. In other words, it’s a measure of how things are
related. Data correlation is a measure of how data is related to each other.
To exploit the correlation in WSNs, it is necessary to recognize the correlation
among data in the network by establishing correlation models. There have been many
research efforts to study the correlation model in WSNs. In [111], correlated nodes
are supposed to observe the same source 𝑆, and the observed data 𝑋𝑖 (𝑡 ) at the ith node
is the sum of a correlated version of the source 𝑆𝑖 (𝑡 ) and observed noise 𝑁𝑖 (𝑡 ).
𝑋𝑖 (𝑡 ) = 𝑆𝑖 (𝑡 ) + 𝑁𝑖 (𝑡 ).
(2.1)
The correlation model is the covariance function 𝐾𝜗 (correlation coefficient
𝜌) that is chosen to be distance dependence and can be classified into four groups
including:
Spherical:
3𝑑 1 𝑑 3
(2.4)
.
Matérn:
1
𝑑 𝜃2
𝑑
( ) 𝒦𝜃2 ( ).
𝐾𝜗 = 𝜃 −1
2 2 Γ(𝜃2 ) 𝜃1
𝜃1
(2.5)
In these models, 𝑑 is the Euclidean distance between nodes; 𝜃1 and 𝜃2 are
coefficients to be chosen, 𝒦𝜃2 (. ) is the modified Bessel function of second kind and
order 𝜃2 . Among these groups, power exponential correlation model is widely used
as described in [112]–[114].
Some papers also build correlation model in which the correlation coefficient
is a function of distance among nodes [115] [116]. In [115], the compression rate is
calculated based on the correlation coefficient 𝜌 between two nodes and the
correlation coefficient is defined to be inversely proportional to their Euclidean
distance 𝑑.
𝜌=
1
.
1+𝑑
(2.6)
series
𝑋 {𝑥1 , 𝑥2 , … , 𝑥𝑞 } and
𝑌 {𝑦1 , 𝑦2 , … , 𝑦𝑞 } are magnitude 𝑚-dissimilarity if there is an 𝑖 (1 ≤ 𝑖 ≤ 𝑞) such that
|𝑥𝑖 − 𝑦𝑖 | > 𝑚.
26
Trend 𝑡-dissimilarity: Two-time series 𝑋 {𝑥1 , 𝑥2 , … , 𝑥𝑞 } and 𝑌 {𝑦1 , 𝑦2 , … , 𝑦𝑞 }
are trend 𝑡-dissimilarity if
𝑞1
< 𝑡,
𝑞
(2.8)
in which 𝑞1 is the total number of pairs (𝑥𝑖 , 𝑦𝑖 ) in the time series that satisfies:
(𝑥𝑖 − 𝑥𝑖−1 )(𝑦𝑖 − 𝑦𝑖−1 ) > 0 with 1 ≤ 𝑖 ≤ 𝑞 − 1.
In [118], the dissimilarity is defined using Manhattan distance. To evaluate
the sensed data similarity between two nodes, a sliding window is used to choose
sensed data in both nodes, each dataset corresponds to a vector, and Manhattan
distance of these two vectors express the similarity. Manhattan distance of two
vectors 𝑣𝑖 {𝑥1 , 𝑥2 , … , 𝑥𝑞 } and 𝑣𝑗 {𝑦1 , 𝑦2 , … , 𝑦𝑞 } is defined as the sum of absolute
differences of their components:
𝑞
(2.9)
𝑑(𝑣𝑖 , 𝑣𝑗 ) = ∑|𝑥𝑘 − 𝑦𝑘 |.