MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Nguyen Thi Thanh Nga
EFFICIENT DATA COMMUNICATION FOR WIRELESS SENSOR NETWORK
BASED ON DATA CORRELATION
Major: Computer Engineering
Code No.: 9480106
COMPUTER ENGINEERING DISSERTATION
SUPERVISORS:
1. Dr. Nguyen Kim Khanh
2. Assoc. Prof. Ngo Hong Son
Hanoi - 2018
COMMITMENT
I assure that this is my own research. All the data and results in the thesis are
completely true, were agreed to use in this thesis by co-authors. This research hasn’t
been published by other authors than me.
Hanoi, 17th Decemberber 2018
SUPERVISORS
AUTHOR
Dr. Nguyen Kim Khanh
supporting me continuously and throughout writing this thesis.
Nguyen Thi Thanh Nga
3
TABLE OF CONTENT
COMMITMENT ...................................................................................................... 2
ACKNOWLEDGMENTS ........................................................................................ 3
TABLE OF CONTENT ........................................................................................... 4
LIST OF ABBREVIATIONS .................................................................................. 7
LIST OF FIGURES .................................................................................................. 8
LIST OF TABLES .................................................................................................. 11
PREFACE ............................................................................................................... 13
1
INTRODUCTION ........................................................................................... 16
Overviews ................................................................................................. 16
Energy conservation in WSNs.................................................................. 19
1.2.1
Radio optimization ............................................................................... 19
1.2.2
Sleep/wake-up schemes ....................................................................... 20
1.2.3
2.3.1
Correlation of two variables ................................................................. 33
2.3.1.1 Mutual information ............................................................. 33
2.3.1.2 Entropy correlation coefficient ........................................... 34
4
2.3.2
Correlation of more than two variables ................................................ 36
Conclusions .............................................................................................. 38
3
ENTROPY-BASED CORRELATION CLUSTERING .............................. 39
Joint entropy estimation ........................................................................... 39
3.1.1
Determining the upper bound of joint entropy..................................... 39
3.1.2
Determining the lower bound of joint entropy..................................... 42
3.1.3
Validating entropy estimation .............................................................. 44
Correlation region and correlation clustering algorithm .......................... 47
4.1.2.2 2-D analysis ........................................................................ 65
4.1.2.3 General topology model analysis ........................................ 69
4.1.3
Optimal routing scheme in correlation networks ................................. 71
Representative aggregation ...................................................................... 72
4.2.1
Distortion function ............................................................................... 72
4.2.2
Number of representative nodes........................................................... 73
4.2.3
Representative node selection .............................................................. 76
4.2.4
Practical validation ............................................................................... 77
Conclusions .............................................................................................. 80
5
5 ENTROPY CORRELATION BASED DATA AGGREGATION
PROTOCOL (ECODA) ......................................................................................... 82
Conclusions ............................................................................................ 107
6
CONCLUSIONS AND FUTURE STUDY .................................................. 109
Summary of Contributions ..................................................................... 109
Limitations .............................................................................................. 110
Future work ............................................................................................ 111
PUBLICATION LIST .......................................................................................... 112
REFERENCES ..................................................................................................... 113
APPENDIX............................................................................................................ 125
6
LIST OF ABBREVIATIONS
Abbreviation
Meaning
BS
Base Station
CDR
Compression Driven Routing
CH
QoS
Quality of Service
RDC
Routing Driven Compression
RSSI
Received Signal Strength Indication
SPT
Shortest Path Tree
TDMA
Time Division Multiple Access
VLSI
Very Large-Scale Integration
WSN(s)
Wireless Sensor Network(s)
1-D
.......................................................................................................................... 59
Figure 4.2 Energy consumptions for the DSC, RDC and CDR schemes respectively
to entropy correlation coefficients. .................................................................. 60
Figure 4.3 Routing pattern of 1-D network .............................................................. 61
Figure 4.4 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 1-D with compression along
SPT to the cluster head. ................................................................................... 63
Figure 4.5 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 1-D with compression at the
cluster head only. ............................................................................................. 64
Figure 4.6 Routing pattern of the 2-D network [122] .............................................. 65
8
Figure 4.7 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 2-D with compression along
SPT to the cluster head. ................................................................................... 67
Figure 4.8 Total bit-hop cost Es that corresponds to cluster size with different values
of entropy correlation coefficient in the case of 2-D with compression at the
cluster head only. ............................................................................................. 68
Figure 4.9 Illustration of clustering for a general topology model .......................... 69
Figure 4.10 Total transmission cost that corresponds to cluster size with different
values of entropy correlation coefficient with compression along SPT to the
cluster head. ..................................................................................................... 70
Figure 4.11 Total transmission cost respectively to cluster size with different values
of entropy correlation coefficient with compression at the cluster head only. 71
Figure 4.12 The relation between distortion and the number of representative nodes
with N = 10 ...................................................................................................... 74
Figure 4.13 The relation between distortion and the number of representative nodes
with N = 15 ...................................................................................................... 74
with compression aggregation in the case of 16 correlation clusters ............ 102
Figure 5.15 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 8 correlation clusters .............. 102
Figure 5.16 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 8 correlation clusters .............. 103
Figure 5.17 Total energy comparison between distance-based protocol and ECODA
with compression aggregation in the case of 4 correlation clusters .............. 104
Figure 5.18 Total energy comparison distance-based protocol and ECODA with
compression aggregation in the case of 4 correlation clusters ...................... 105
Figure 5.19 Total energy comparison between distance-based protocol and ECODA
with representative aggregation in the case of 16 correlation clusters .......... 106
Figure 5.20 Number of alive nodes comparison between distance-based protocol and
ECODA with representative aggregation in the case of 16 correlation clusters
........................................................................................................................ 107
10
LIST OF TABLES
Table 3.1 Node’s entropy of the dataset 1 ................................................................ 46
Table 3.2 Entropy correlation coefficient of each pair from the dataset 1 ............... 47
Table 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the
dataset 1 ........................................................................................................... 49
Table 3.4 Clustering results of 48 nodes .................................................................. 53
Table 4.1 Number of representative nodes with distortion D = 0.05 ....................... 76
Table 4.2 Number of representative nodes with distortion D = 0.1 ......................... 76
Table 4.3 Number of representative nodes with distortion D = 0.15 ....................... 76
Table 4.4 Selection of representative nodes and the actual distortion based on
theoretical calculation (dataset 1 with N = 11 nodes) ..................................... 78
Table 4.5 Selection of representative nodes and the actual distortion based on
PREFACE
Wireless Sensor Network (WSN) is the collection of sensor nodes which
cooperatively monitor surrounding phenomena over large physical areas. The
advances in the integration of micro-electro-mechanical systems and digital
electronics with the development of wireless communications have enabled the wide
deployment of WSNs. Sensor nodes in WSNs have been equipped with various
sensing capabilities in space and time and higher processing capacities can satisfy
requests from various modern applications. Because of low-cost, small-in-size and
no-replace battery powered characteristics of sensor nodes, energy conservation is
commonly recognized as the key challenge in designing and operating the networks.
In typical WSNs applications, sensors are required for spatially dense
deployment to achieve satisfactory coverage. As a result, multiple sensors will record
information about a single event in the sensing field, i.e. sensed data are correlated
with each other. The existence of correlation characteristic can bring many significant
potential advantages for the development of efficient communication protocols wellsuited to the WSNs paradigm. For example, due to the correlation degree, data in a
correlated region can be compressed with a high ratio to reduce the amount of sent
data for saving dissipated energy. Even with high enough correlation, it may not be
necessary for every sensor node in a correlation group to transmit its data to the base
station. Instead, a smaller number of sensor measurements (representation) might be
adequate to communicate the event features to the base station within a certain
reliability/fidelity level.
From this point of view, various researches have focused on discovering and
exploiting the correlation of sensed data in WSNs. At the beginning of these
researches, the traditional probability and statistic theory have been used to describe
the correlation among data. Nevertheless, these approaches limited the correlation as
a linear relation that may not appropriate for general, nonlinear cases in practice.
Therefore, the information entropy approach has been considered to obtain the
generality. However, most of the research approach, using traditional probability statistic theory or information entropy theory, considered the correlation as a
distance-dependence feature. In general, the correlation of data may be independent
of external factors such as sensor location and environmental conditions and thus, so
estimation of joint entropy. From this approximation method, we define the
correlation region and propose the correlation clustering scheme. We also verify the
validation of the proposed estimation and correlation clustering scheme in this
chapter.
Chapter 4: Entropy-based Data Aggregations
In this chapter, we exploit the advantages of using data correlation by data
aggregation using entropy correlation including entropy-based representative
aggregation and entropy-based data compression.
In entropy-based representative aggregation, the distortion of data in the group
while some nodes are put into sleep state is evaluated using the proposed correlation
model. From this evaluation, the number of representative nodes in a group is decided
14
based on the desired distortion and then a representative aggregation routing protocol
is developed.
In entropy-based data compression, several data compression schemes are
evaluated using the proposed correlation model. Then a compression-based routing
protocol is developed.
Chapter 5: Entropy Correlation based Data Aggregation Protocol (ECODA)
In this chapter, we outline an Entropy COrrelation-based Data Aggregation
protocol (ECODA) using the proposed clustering scheme in chapter 3 and data
aggregation schemes in chapter 4. The simulations have also been done to validate
the effectiveness of the proposed clustering and aggregating schemes.
Chapter 6: Conclusions and Further study
This chapter concludes the results of the thesis with careful evaluations and
points out the remained problems that are the future works.
15
Wireless Sensor Network (WSN), is the collection of sensor nodes which
cooperatively monitor surrounding phenomena over large physical areas [1]–[4].
These sensor nodes can sense, observe or measure, gather information from the
environment and transmit the sensed data to the user based on some local decision
process. A typical sensor node is composed of a sensing unit which is equipped with
one or more sensors, a processing unit, a power unit, and a transceiver unit. The
sensing unit could have various sensors such as thermal, biological, chemical, optical,
and magnetic to measure properties of the environment. A sensor node acquires data
through the sensing unit, processes sensed data by the processing unit and finally
transmits processed data using the transceiver unit. Because of the limitations of
memory capabilities, sensor nodes should be implemented by wireless
communication to transfer the data to a base station, allowing them to disseminate
their sensor data to remote processing, visualization, analysis, and storage systems.
Figure 1.2 Wireless Sensor Network Applications 2
2
/>Wireless_Sensor_Networks/figures?lo=1
17
There are five types of WSNs: terrestrial WSN, underground WSN,
underwater WSN, multi-media WSN, and mobile WSN [3]. In terrestrial WSNs [1],
there are hundreds to thousands of inexpensive wireless sensor nodes deployed in a
given area, either in an ad hoc or in a pre-planned manner. Reliable communication
in a dense environment is very important in this WSN type. Battery power is limited
and may not be rechargeable in terrestrial sensor nodes, however, they can be
equipped with a secondary power source such as solar cells. In a terrestrial WSN,
mobile nodes can reposition and organize itself in the network. This mobility
characteristic requires dynamic routing in a mobile WSN. Challenges in mobile WSN
include deployment, localization, self-organization, navigation and control, coverage,
energy, maintenance, and data process.
The above described features of WSNs ensure great potential for many
applications [10]–[14]. The development of WSNs was motivated by military
applications [15]–[19] and then were widely used in various fields such as industrial
monitoring [20]–[25], environment monitoring [26]–[33], agriculture [34]–[37],
forest fire detection [38]–[40], animal tracking [41] [42], healthcare [43]–[50],
security [51]–[53], home automation [54] [55], power utility’s distribution [56],
logistics [57], intelligent traffic systems [58], etc.
In Vietnam, studies on WSNs have been considered in the last two decades.
The most attracted topics are energy saving and load balancing in WSNs, in
consideration of base station position [59], delay constrained [60], 3D WSN [61],
WSNs with holes [62], k-means clustering [63]. The applications of WSNs are also
widely considered such as landside monitoring [64], smart grid [65], target tracking
[66], logistics [57], and healthcare monitoring [67].
Energy conservation in WSNs
In most cases, energy for activities in WSNs comes from a limited battery
supply. However, in many applications, it is very hard or impossible to recharge the
batteries due to the deployment of the nodes because of the difficulties and hostile
terrain or due to a large number of nodes deployed in the environment [68] [69]. For
those reasons, energy conservation is commonly recognized as the key challenge to
designing and operating the network in WSNs, because individual sensor nodes are
expected to be low-cost, small-in-size, and powered by a non-replaceable battery.
In recent years, numerous energy-saving approaches have been proposed in
[70] [71]. They can mainly be classified into five categories including radio
optimization, data reduction, sleep/wakeup schemes, energy-efficient routing and
charging solutions. The next section will present these five categories of energysaving approaches.
the radio in sleep mode. The main idea of this approach is the duty cycling scheme.
Duty cycling scheme schedules the node radio state according to network activity to
minimize idle listening and favor the sleep mode. They are the most energy-efficient
but suffer from sleep latency. In some cases, it is not possible to broadcast information
to all its neighbors because of unsimultaneously active. In addition, some fixing
parameters such as listening/sleeping period, preamble length, and slot time are
strictly issues because of system performance. The detailed survey of duty cycling
can be found in [84].
1.2.3 Energy efficient routing
Routing is also a burden that makes seriously drain energy reserves. In general,
there are various routing paradigms. In this research area, some main paradigms are
considered such as cluster architecture, energy as a routing metric, multipath routing,
20
relay node placement. An extensive review of energy-aware routing protocols can be
found in [85] [86].
Cluster architecture is the organizing of the network into clusters and each
cluster is managed by a cluster head. This technique has been proposed to enhance
energy efficiency because it can limit energy consumption in different means such as
reducing the communication range inside the cluster that requires less transmission
power; limiting the number of transmits by fusion done by cluster head; reducing
energy-intensive computation to cluster head; enabling power-off some nodes inside
the cluster while cluster head takes forwarding responsibilities; and balancing energy
consumption by cluster head rotation [87].
Energy can also be considered as a metric in the setup path phase to extend the
lifetime of sensor networks. In this case, routing algorithms not only focus on the
shortest paths but also can select the next hop based on its residual energy [88].
Multipath routing, in general, is more complex than single-path routing. But
The third type of aggregation technique is representative type [102] in which
some nodes are chosen to be the representative of a group of nodes. The other nodes
in the group can be put to sleep to save energy. The number of sleep nodes that affects
the power consumption is decided by specified distortion. Same as data compression,
these techniques required data in correlation.
Adaptive sampling techniques adjust the sampling rate at each sensor while
ensuring that application needs are met in terms of coverage of information precision
by exploiting spatial-temporal correlations between data. By reducing the number of
samples, the amount of transmitted data is reduced thus save the node energy. The
temporal analysis of sensed data is used in [103] and spatial correlation is used in
[104]. More details about adaptive sampling can be found in [105].
Network coding is used to reduce the traffic in broadcast scenarios by sending
a linear combination of several data instead of a copy of each data. At the destination
nodes, data can be decoded by solving the linear equations [84] [106]. Network
coding exploits the trade-off between computation and communication since
communications are slow compared to computations and more energy consumption.
1.2.5 Charging solution
Several recent types of research address energy harvesting and wireless
charging techniques for WSNs as promising solutions because of recharge capability
without human intervention.
Energy harvesting techniques have been developed to enable the sensors to
harvest energy from their surrounding environment such as solar, wind or kinetic
energy [107]. Energy harvesting schemes often require energy prediction to manage
the available power efficiently. It is important to note that because of the limitation
of remain energy between two harvesting opportunities, the energy saving
mechanisms are still necessary to implement.
The breakthrough in wireless power transfer is expected to enable the wireless
charging capability for WSNs. Wireless charging can be done in two ways:
electromagnetic radiation and magnetic resonant coupling. It is shown that
22
• The communication activity has an energy consumption much higher than the
computation activity. It has been shown that transmitting one bit may consume as
much as executing a few thousand instructions [112]. Therefore, communication
should be traded for computation.
Data correlation can allow us to reduce the data transferring, or even to put
some sensor nodes to sleep. Thus, it can make WSNs conserve energy significantly.
23
Problem statements and contributions
The main problems in this research are “How to recognize the correlation
among dataset by looking at data itself and how to exploit the correlation
characteristic for energy conservation in WSNs”. In this research, we focus on
WSNs working in high correlation environment. A high correlation environment can
be divided into groups called correlation regions where measured data strongly
depends on each other. By clustering sensor nodes into correlation regions, data
aggregation can be done to conserve the energy in WSNs. In this paper, we focus on
two data aggregation schemes including data compression and representative
aggregation. The main contributions of the thesis are:
Developing an entropy correlation clustering algorithm and entropy
correlation model to describe the correlation characteristics of a correlation cluster.
This algorithm can divide a correlation environment into several correlation
regions using the entropy values of measured data and the entropy correlation
coefficients of measured data pairs in the environment. At the same time, this
algorithm uses only the data itself and does not depend on the distance information.
The correlation model describes the relationship between the joint entropy of a dataset
and the number of data series in the dataset, in consideration of data’s entropy
correlation coefficient.
Analyzing and evaluating the impact of the correlation characteristic to data
Correlation model survey
Correlation is represented for the relationships between quantitative
variables or categorical variables. In other words, it’s a measure of how things are
related. Data correlation is a measure of how data is related to each other.
To exploit the correlation in WSNs, it is necessary to recognize the correlation
among data in the network by establishing correlation models. There have been many
research efforts to study the correlation model in WSNs. In [111], correlated nodes
are supposed to observe the same source 𝑆, and the observed data 𝑋𝑖 (𝑡 ) at the ith node
is the sum of a correlated version of the source 𝑆𝑖 (𝑡 ) and observed noise 𝑁𝑖 (𝑡 ).
𝑋𝑖 (𝑡 ) = 𝑆𝑖 (𝑡 ) + 𝑁𝑖 (𝑡 ).
(2.1)
The correlation model is the covariance function 𝐾𝜗 (correlation coefficient
𝜌) that is chosen to be distance dependence and can be classified into four groups
including:
Spherical:
3𝑑 1 𝑑 3
( )
𝐾𝜗 = {1 − 2 𝜃1 + 2 𝜃2 if 0 ≤ 𝑑 ≤ 𝜃1 ,
0
if 𝑑 > 𝜃1 > 0.
(2.2)
Power exponential:
𝐾𝜗 = 𝑒
(−