
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
——————————

Nguyen Huy Truong

RESEARCH ON DEVELOPMENT OF METHODS
OF GRAPH THEORY AND AUTOMATA
IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION

DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS

Hanoi - 2020


Major: Mathematics and Informatics
Major code: 9460117

DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS

SUPERVISORS:
1. Assoc. Prof. Dr. Sc. Phan Thi Ha Duong

valuable comments and helpful advice.
I thank the PhD students of the late Assoc. Prof. Dr. Phan Trung Huy for sharing
and exchanging information on steganography and searchable encryption.
Finally, I must also thank my family for supporting all my work.


CONTENTS

LIST OF SYMBOLS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
CHAPTER 1 PRELIMINARIES
1.1 Basic Structures
1.1.1 Strings
1.1.2 Graph
1.1.3 Deterministic Finite Automata
1.1.4 The Galois Field GF(p^m)

4.2 Mathematical Basis
4.3 Automata Models for Solving The LCS Problem
4.4 Experimental Results
4.5 Conclusions
CHAPTER 5 CRYPTOGRAPHY BASED ON STEGANOGRAPHY AND AUTOMATA METHODS FOR SEARCHABLE ENCRYPTION
5.1 Introduction
5.2 A Novel Cryptosystem Based on The Data Hiding Scheme (2, 9, 8)
5.3 Automata Technique for Exact Pattern Matching on Encrypted Data
5.4 Automata Technique for Approximate Pattern Matching on Encrypted Data
5.5 Conclusions
CONCLUSION
BIBLIOGRAPHY
LIST OF PUBLICATIONS

                     The length of a LCS(p, x)
LeftID(u)            The leftmost location of u
Rm_p(u)              The last component of LeftID(u) in p
(I, M, K, Em, Ex)    A data hiding scheme
I                    A set of all image blocks with the same size and image format
M                    A finite set of secret elements
K                    A finite set of secret keys
Em                   An embedding function that embeds a secret element in an image block
Ex                   An extracting function that extracts an embedded secret element from an image block
q_colour             The number of different ways to change the colour of each pixel in an arbitrary image block
I                    An image block
M                    A secret element
K                    A secret key
Adjacent(cp, a)      An adjacent vertex of cp
                     A string of length c

CTL
EBOM
ER
FJS
FOPA
FSBNDM
HASH
HCIH
LBNDM
LCS
LSB
MSDR
MSE
NP
OPA
PA
PCT
PSNR
RGB
SA
SAE
SBNDM
SE
SSE
TVSBS
WF
WL

Average Optimal Shift Or
Brute Force




LIST OF FIGURES

Figure 1.1. A simple graph
Figure 1.2. A spanning tree of the graph given in Figure 1.1
Figure 1.3. The transition diagram of A in Example 1.3
Figure 1.4. The basic diagram of digital image steganography
Figure 1.5. The degree of appearance of the pattern p

Table 2.4. The payload, ER and PSNR for the optimal data hiding scheme (1, 2^n − 1, n) for palette images with q_colour = 1
Table 2.5. The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for gray images with q_colour = 3
Table 2.6. The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for palette images with q_colour = 3
Table 2.7. The comparisons of embedding and extracting time between the chapter's and Chang et al.'s approach for the same optimal data hiding scheme (1, N, log2(N + 1)), where N = 2^n − 1, for the binary image with q_colour = 1. Time is given in seconds
Table 3.1. The performing steps of the MR1 algorithm
Table 3.2. Experimental results on rand4 problem
Table 3.3. Experimental results on rand8 problem
Table 3.4. Experimental results on rand16 problem
Table 3.5. Experimental results on rand32 problem
Table 3.6. Experimental results on rand64 problem
Table 3.7. Experimental results on rand128 problem
Table 3.8. Experimental results on rand256 problem
Table 3.9. Experimental results on a genome sequence (with |Σ| = 4)


INTRODUCTION
In modern life, as the use of computers and the Internet becomes more and more essential,
digital data (information) can be copied and accessed illegally. As a result,
information security becomes increasingly important. There are two popular methods to
provide security: cryptography and data hiding [2, 5, 6, 20, 56, 62, 81].
Cryptography is used to encrypt data in order to make the data unreadable by a third
party [5]. Data hiding is used to embed data in digital media. Based on the purpose of
the application, data hiding is generally divided into steganography, which hides the
existence of data to protect the embedded data, and watermarking, which protects the
copyright ownership and authentication of the digital media carrying the embedded data.
Steganography can be used as an alternative to cryptography. However, steganography
becomes weak if attackers detect the existence of hidden data. Hence, integrating
cryptography with steganography is a third choice for data security
[2, 5, 6, 12, 19, 61, 62, 81, 86, 93].
With the rapid development of applications based on Internet infrastructure, cloud
computing has become one of the hottest topics in the information technology area. Indeed, it
is an Internet-based computing system that provides on-demand services ranging from application
and system software and storage to data processing. For example, when cloud users use the
storage service, they can upload information to the servers and then access it on the Internet

Based on the results and suggestions introduced by P. T. Huy et al., the dissertation
focuses on the following four problems in steganography and searchable encryption:
- Digital image steganography;
- Exact pattern matching;
- Longest common subsequence;
- Searchable encryption.
The first problem is newly stated in Chapter 2; the three remaining problems are recalled
and clarified in Chapter 1.
For the first three problems, the dissertation's work is to find new and efficient solutions
using graph theory and automata. These solutions are then applied to solve the last
problem.
The dissertation is structured as follows. Apart from the Introduction at the beginning
and the Conclusion at the end of the dissertation, its main content is divided into
five chapters.
Chapter 1. Preliminaries. This chapter recalls the basic knowledge used
throughout the dissertation (strings, graph, deterministic finite automata, digital images,
the basic model of digital image steganography, some parameters to determine the
quality of digital image steganography, the exact pattern matching problem, the longest
common subsequence problem, and searchable encryption), and re-presents important
concepts and results that are used and developed further in the remaining chapters of the
dissertation (adjacency list, breadth first search, Galois field, the fastest optimal parity
assignment method, the module method and the concept of the maximal secret data
ratio, the concept of the degree of fuzziness (appearance), the Knapsack Shaking
approach, and the definition of a cryptosystem).
Chapter 2. Digital image steganography based on the Galois field using
graph theory and automata. Firstly, from some proposed concepts of optimal and
near optimal secret data hiding schemes, this chapter states the problem of interest in digital
image steganography. Secondly, the chapter proposes a new approach based on the Galois

secret data, respectively. In searchable encryption, the cryptosystem can be used to encode
and decode secret data on the users' side, and pattern matching algorithms can be used to
perform pattern search on the cloud providers' side.
The contents of the dissertation are written based on the paper [T1] published in,
the revised manuscript [T4] submitted to KSII Transactions on Internet and Information
Systems (ISI), and the papers [T2, T3] published in Journal of Computer Science and
Cybernetics in 2019. The main results of the dissertation have been presented at:
- Seminar on Mathematical Foundations for Computer Science at Institute of
Mathematics, Vietnam Academy of Science and Technology,
- The 9th Vietnam Mathematical Congress, Nha Trang, August 14-18, 2018,
- Seminar at School of Applied Mathematics and Informatics, Hanoi University of
Science and Technology.



CHAPTER 1

PRELIMINARIES
This chapter recalls the terminology, concepts, algorithms and results that
are needed to present the dissertation's new results clearly and logically,
and to help readers follow the content of the dissertation easily. The background
knowledge re-presented here consists of basic structures (Section 1.1: strings (Subsection
1.1.1), graph (Subsection 1.1.2), deterministic finite automata (Subsection 1.1.3), and the
Galois field GF(p^m) (Subsection 1.1.4)), digital image steganography (Section 1.2), exact
pattern matching (Section 1.3), longest common subsequence (Section 1.4) and searchable
encryption (Section 1.5).

1.1 Basic Structures
1.1.1 Strings

An edge connecting a vertex to itself is called a loop. Multiple edges are edges connecting
the same vertices. A graph having no loops and no multiple edges is called a simple graph.
In a simple graph, the edge associated to an unordered pair of vertices {i, j} is called the
edge {i, j}.
Two vertices i and j in a graph G are called adjacent if they are vertices of an edge of
G.

Vertex    Adjacent vertices
2         1, 3, 4
3         1, 2, 4, 5
4         2, 3, 5
5         3, 4

Given a simple graph G, a subgraph of G that is a tree including every vertex of G is
called a spanning tree of G. A spanning tree of a connected simple graph can be built by
using breadth first search (BFS). This algorithm is shown in pseudo-code as follows.
Breadth First Search:
Input: A connected simple graph G with vertices ordered as i1 , i2 , . . . , in .
Output: A spanning tree T .
Set T to be a tree consisting only of i1;
Set L to be an empty list;
Put i1 in L;
While (L is not empty)
{
Remove the first vertex i from L;


using BFS as in Figure 1.2.


Figure 1.2. A spanning tree of the graph given in Figure 1.1
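
The breadth first search procedure can be sketched in Python as follows. This is only a minimal illustration, not the dissertation's own code: the function name bfs_spanning_tree is hypothetical, and the adjacency list assumes that vertex 1 is adjacent to vertices 2 and 3, which is inferred from the fact that the rows for vertices 2 and 3 both list vertex 1.

```python
from collections import deque


def bfs_spanning_tree(adj, root):
    """Return the edges of a BFS spanning tree of a connected simple graph.

    adj maps each vertex to the list of its adjacent vertices.
    """
    in_tree = {root}        # vertices already placed in the tree T
    queue = deque([root])   # the list L of vertices waiting to be processed
    tree_edges = []
    while queue:
        i = queue.popleft()           # remove the first vertex i from L
        for j in adj[i]:              # scan the neighbours of i in listed order
            if j not in in_tree:
                in_tree.add(j)
                queue.append(j)
                tree_edges.append((i, j))
    return tree_edges


# Adjacency list of the example graph. The row for vertex 1 is an assumption
# derived from the other rows (vertices 2 and 3 both list vertex 1).
graph = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
edges = bfs_spanning_tree(graph, 1)
print(edges)
```

A spanning tree of a connected graph with n vertices always has n − 1 edges, so the sketch returns four edges for this five-vertex graph.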

A graph with directed edges (or arcs) is called a directed graph. Each arc is associated
with an ordered pair of vertices. In a simple directed graph, the arc associated with the
ordered pair (i, j) is called the arc (i, j). The vertex i is said to be adjacent to the vertex
j and the vertex j is said to be adjacent from the vertex i.
1.1.3 Deterministic Finite Automata
Studying the construction and the use of deterministic finite automata
is one of the objectives of the dissertation. Hence, this subsection clarifies this model of
computation [44, 82].
Definition 1.1 ([44]). Let Σ be a finite alphabet. A deterministic finite automaton


State    a     b
q0       q0    q1
q1       q2    q1
q2       q2    q2

Figure 1.3. The transition diagram of A in Example 1.3

Galois field GF(p^m), where p is prime and m ≥ 1 is an integer [88]. The algebraic structure
will be used in Chapter 2.
Let p be a prime number. Define Zp[x] to be the set of all polynomials with the variable
x, whose coefficients belong to the field Zp. Addition and multiplication in Zp[x] are defined
in the usual way, with the coefficients then reduced modulo p at the end.
For f(x) ∈ Zp[x], the degree of f(x), denoted by deg(f), is the largest exponent of
x in f(x). A polynomial f(x) ∈ Zp[x] is called irreducible if there do not exist
polynomials f1(x), f2(x) ∈ Zp[x] such that
f(x) = f1(x)f2(x),
where deg(f1) > 0 and deg(f2) > 0.
Let f(x) ∈ Zp[x] be an irreducible polynomial with deg(f) = m ≥ 1. Define
Zp[x]/(f(x)) to be the set of p^m polynomials of degree at most m − 1 in Zp[x]. Addition
and multiplication in Zp[x]/(f(x)) are given as in Zp[x], followed by a reduction modulo
f(x). Then Zp[x]/(f(x)) with these operations is a field having p^m elements, called the
Galois field GF(p^m).
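
To make the construction of Zp[x]/(f(x)) concrete, the following Python sketch implements addition and multiplication followed by reduction modulo f(x). It is an illustration under stated assumptions rather than the dissertation's implementation: polynomials are represented as coefficient lists with the lowest degree first, the function names poly_mod, gf_add and gf_mul are hypothetical, and the example uses p = 2 with f(x) = x^3 + x + 1, which is irreducible over Z2.

```python
def poly_mod(a, f, p):
    """Reduce polynomial a modulo f over Z_p (coefficients, lowest degree first)."""
    a = [c % p for c in a]
    inv_lead = pow(f[-1], -1, p)   # inverse of the leading coefficient of f
    while len(a) >= len(f):
        factor = (a[-1] * inv_lead) % p
        shift = len(a) - len(f)
        for k, c in enumerate(f):
            a[shift + k] = (a[shift + k] - factor * c) % p
        a.pop()                    # the leading coefficient is now zero
    return a + [0] * (len(f) - 1 - len(a))   # pad to exactly m coefficients


def gf_add(a, b, p):
    """Add two elements of Z_p[x]/(f(x)) coefficient by coefficient."""
    return [(s + t) % p for s, t in zip(a, b)]


def gf_mul(a, b, f, p):
    """Multiply two elements as polynomials, then reduce modulo f(x)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, s in enumerate(a):
        for j, t in enumerate(b):
            prod[i + j] = (prod[i + j] + s * t) % p
    return poly_mod(prod, f, p)


# GF(2^3) built from f(x) = x^3 + x + 1 (an assumed example polynomial).
P = 2
F = [1, 1, 0, 1]
x_times_x2 = gf_mul([0, 1, 0], [0, 0, 1], F, P)   # x * x^2 = x^3 = x + 1
other = gf_mul([1, 1, 0], [1, 1, 1], F, P)        # (x + 1)(x^2 + x + 1) = x
total = gf_add([1, 1, 0], [0, 1, 1], P)           # (x + 1) + (x^2 + x) = x^2 + 1
print(x_times_x2, other, total)
```

The call pow(value, -1, p) computes a modular inverse and requires Python 3.8 or later.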

representing a pixel and is limited by 8 bits. For a string of 8 bits, call palette images 8-bit
palette images.
The objective of digital image steganography is to protect data by hiding the data in
a digital image well enough so that unauthorized users will not even be aware of its
existence [21, 18]. Figure 1.4 shows the basic model of digital image steganography, where
the cover image is a digital image used as a carrier to embed secret data into, and the stego
image is the digital image obtained after embedding secret data into the cover image by the
function block Embed with the secret key on the Sender side. For steganography generally,
the secret data needs to be extracted fully by the block Extract with the secret key on
the Receiver side [20, 61, 63, 76].
The total number of bits of the secret data sequence embedded in the cover image is called
a Payload. Corresponding to a certain Payload, to measure the embedding capacity of the
cover image, the embedding rate (ER) is used and defined as follows [104].
ER =


Figure 1.4. The basic diagram of digital image steganography

The peak signal to noise ratio (PSNR) is used to evaluate the quality of the stego image. Based
on the value of PSNR, we can know the degree of similarity between the cover image and
the stego image. If the PSNR value is high, then the quality of the stego image is high;
otherwise, the quality of the stego image is low. In general, for a digital image, PSNR is
defined by the following formula [20, 53]
\[
\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{(dB)}, \tag{1.2}
\]
1
where
MSE =

W −1

2. Val(c) = Val(Next(c)) + 1 on the field Z2.
Call GP = (VP, EP) a weighted complete undirected graph of the palette image G, where
VP = P and the weight of the edge {c, c′} is d(c, c′). The function Nearest, Nearest: P → P,
is given by Nearest(c) = c′ satisfying d(c, c′) = min_{v ∈ P, v ≠ c} d(c, v). A rho forest
F = (V, E) is a directed graph with vertices weighted by the function Val, where V = VP,
E is the set of all arcs (v, Next(v)), and the vertex v has the weight Val(v) for all v ∈ V.
The construction of an algorithm determining F is the essence of the FOPA method.
Algorithm for FOPA:
Input: A weighted complete undirected graph GP, the function Nearest.
Output: A rho forest F = (V, E).
Choose a vertex c ∈ P, set V = {c}, and set C = P\{c};
Set Val(c) = 0; // Or 1 randomly
While (C is not empty) // Update F
{
a) Take one element v ∈ C;
b) Initialize v0 = v, set Val(v0) = 0 (or 1 randomly), by a finite loop, find a longest
sequence of k + 1 different elements in P consecutively, v0, v1, ..., vk, such that
Nearest(vi) = vi+1 for all i = 0, ..., k − 1, vi ∈ C, vk ∈ C or vk ∈ V, and set
Next(vi) = vi+1, i = 0, ..., k − 1;
b1) Case vk ∈ C: Set Val(vi) = 1 + Val(vi−1), i = 1, ..., k and Next(vk) = vk−1;
Set V = V ∪ {v0, v1, ..., vk} and C = C\{v0, v1, ..., vk};
b2) Case vk ∈ V: Set Val(vi) = 1 + Val(vi+1), i = k − 1, ..., 1, 0;
Set V = V ∪ {v0, v1, ..., vk−1} and C = C\{v0, v1, ..., vk−1};
}
Return F;
End.
Definition 1.3 ([51]). Let M be a module over the ring Zm, k > 0 be a natural number,
and U be a subset of M\{0}. Call U a k-base of M if for any v in M\{0}, there
exist t elements v1, v2, ..., vt ∈ U, t ≤ k, together with a1, a2, ..., at ∈ Zm such that
v = v1 a1 + v2 a2 + · · · + vt at.
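
The definition of a k-base can be checked by brute force on small modules. The sketch below is a hedged illustration only: it assumes the module M is Zm^n (tuples of n residues modulo m), the function name is_k_base is hypothetical, and it enumerates distinct elements of U, which suffices because repeated elements can be merged by adding their coefficients.

```python
from itertools import combinations, product


def is_k_base(U, k, m, n):
    """Check by brute force whether U is a k-base of the module Z_m^n over Z_m."""
    zero = (0,) * n
    reachable = set()
    for t in range(1, k + 1):
        for vs in combinations(U, t):                   # t distinct elements of U
            for coeffs in product(range(m), repeat=t):  # coefficients a1..at in Z_m
                v = tuple(
                    sum(a * u[d] for a, u in zip(coeffs, vs)) % m
                    for d in range(n)
                )
                reachable.add(v)
    # Every nonzero element of the module must be reachable.
    return all(v in reachable for v in product(range(m), repeat=n) if v != zero)


# Assumed toy example: U = {(1, 0), (0, 1)} in Z_2^2.
U = [(1, 0), (0, 1)]
print(is_k_base(U, 2, 2, 2), is_k_base(U, 1, 2, 2))
```

In this toy example U is a 2-base but not a 1-base, because (1, 1) cannot be written as a multiple of a single element of U.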

\[
\mathrm{MSDR}_k(N) = \left\lfloor \log_2\left(1 + q_{colour} C_N^1 + q_{colour}^2 C_N^2 + \cdots + q_{colour}^k C_N^k\right) \right\rfloor. \tag{1.3}
\]
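
Formula (1.3) can be evaluated with a few lines of Python. This sketch rests on two assumptions that should be checked against the original text: C_N^i denotes the binomial coefficient, and the logarithm is rounded down to an integer number of bits. The function name msdr is hypothetical.

```python
from math import comb


def msdr(k, N, q_colour):
    """Evaluate MSDR_k(N), assuming a floor around the base-2 logarithm."""
    # Number of ways to change at most k of the N pixels, each in q_colour ways.
    total = sum(q_colour ** i * comb(N, i) for i in range(k + 1))
    # For a positive integer, bit_length() - 1 equals floor(log2(total)) exactly,
    # which avoids floating-point rounding.
    return total.bit_length() - 1


print(msdr(1, 7, 1))   # k = 1, N = 2^3 - 1, q_colour = 1
print(msdr(2, 9, 3))   # k = 2, N = 9, q_colour = 3
```

Under these assumptions the two calls give 3 and 8, matching the third components of the schemes (1, 2^n − 1, n) with n = 3 and (2, 9, 8).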

1.3 Exact Pattern Matching
This section restates the exact pattern matching problem and recalls the concept of
the degree of fuzziness (appearance) used in Chapter 3 [24, 52, 68].
Let x be a string of length n. Denote the substring x[i]x[i + 1] . . . x[j] of x by x[i..j]
for all 1 ≤ i ≤ j ≤ n, and the ith element of x by x[i]; i is called a position in x. Let
p be a substring of length m of x, where m is a positive integer. Then there exists i with
1 ≤ i ≤ n − m + 1 such that p = x[i..i + m − 1]. We say that i is an occurrence of p in x,
or that p occurs in x at position i.
Definition 1.5 ([68]). Let p be a pattern of length m and x be a text of length n over
the alphabet Σ. Then the exact pattern matching problem is to find all occurrences of the
pattern p in x.
The following example uses the Brute Force (BF) algorithm [24] to demonstrate the
most basic way of solving this problem.
Table 1.2. The performing steps of the BF algorithm

Step   x = d f a h f k f a h a
1          f a h
2            f a h
3              f a h
4                f a h
5                  f a h
6                    f a h
7                      f a h
8                        f a h

Example 1.4. Given a pattern p = fah and a text x = dfahfkfaha. Then there are two
occurrences of p in x, at positions 2 and 7: dfahfkfaha. The BF algorithm is performed by the
steps presented in Table 1.2, where the bold letters correspond to the mismatches and the
underlined letters represent the matches when comparing the letters of the pattern and
the text. Note that many letters already scanned are scanned again by the BF algorithm,
because each time either a mismatch or a match occurs, the pattern is moved to the
right by only one position.
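
The behaviour described in Example 1.4 can be reproduced with a short Python sketch of the brute force search. The function name brute_force_search is hypothetical, and positions are reported 1-based to match the notation x[i] of this section.

```python
def brute_force_search(p, x):
    """Return all 1-based positions at which the pattern p occurs in the text x."""
    m, n = len(p), len(x)
    occurrences = []
    for i in range(n - m + 1):      # try every alignment, shifting by one each time
        j = 0
        while j < m and x[i + j] == p[j]:
            j += 1                  # compare letters left to right until a mismatch
        if j == m:
            occurrences.append(i + 1)
    return occurrences


positions = brute_force_search("fah", "dfahfkfaha")
print(positions)
```

Each alignment restarts the comparison from the first letter of the pattern, which is why letters of the text are scanned repeatedly and the worst-case number of comparisons is proportional to m(n − m + 1).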
Chapter 3 uses the degree of fuzziness in [52] to determine the longest prefix of the
pattern in the text at any position. However, this terminology can lead to several
misunderstandings for the readers. So throughout this dissertation, the degree of
fuzziness is replaced with the degree of appearance. The concept of the degree of
appearance is restated as follows.
Definition 1.6 ([52]). Let p be a pattern and x be a text of length n over the alphabet
Σ. Then for each 1 ≤ i ≤ n, a degree of appearance of p in x at position i is equal to the
length of a longest substring of x such that this substring is a prefix of p, where the right
end letter of the substring is x[i].
Notice that if the degree of appearance of p in x at an arbitrary position i
equals |p|, then a match for p in x occurs at position i − |p| + 1. Figure 1.5 illustrates the
concept of the degree of appearance of the pattern p in x.
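
Definition 1.6 can be read off directly as a naive computation. The Python sketch below is only a literal transcription of the definition for illustration, not the automata-based technique developed in Chapter 3; the function name degrees_of_appearance is hypothetical.

```python
def degrees_of_appearance(p, x):
    """Degree of appearance of p in x at every position i = 1..len(x)."""
    degrees = []
    for i in range(1, len(x) + 1):
        d = 0
        # Try the longest candidate first: a substring ending at x[i]
        # (1-based) that is also a prefix of p.
        for length in range(min(i, len(p)), 0, -1):
            if x[i - length:i] == p[:length]:
                d = length
                break
        degrees.append(d)
    return degrees


p, x = "fah", "dfahfkfaha"
degrees = degrees_of_appearance(p, x)
# A match occurs at position i - |p| + 1 whenever the degree equals |p|.
matches = [i - len(p) + 1 for i, d in enumerate(degrees, start=1) if d == len(p)]
print(degrees, matches)
```

For the pattern and text of Example 1.4, the degree reaches the pattern length at positions 4 and 9 of the text, which yields the occurrences 2 and 7.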

The degree of appearance of p in x at the position being scanned is equal to 4


lcs(p, x) = 4.
Let p and x be two strings of lengths m and n over the alphabet Σ, m ≤ n. The longest
common subsequence problem for two strings (LCS problem) can be stated in the following
two forms [24, 47].
Problem 1. Find a longest common subsequence of p and x.
Problem 2. Compute the length of a longest common subsequence of p and x.
A simple way to solve the LCS problem is to use the algorithm introduced by
Wagner and Fischer in 1974 (called the Algorithm WF). This algorithm defines a dynamic
programming matrix L(m, n) recursively to find a LCS(p, x) and compute the lcs(p, x) as
follows [94].


\[
L(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
L(i-1, j-1) + 1 & \text{if } p[i] = x[j],\\
\max\{L(i, j-1), L(i-1, j)\} & \text{otherwise,}
\end{cases}
\]
where L(i, j) is the lcs(p[1..i], x[1..j]) for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
Example 1.6. Let p = bgcadb and x = abhcbad. Using the Algorithm WF, the matrix L(m, n)
is obtained as in Table 1.3, so lcs(p, x) = L(6, 7) = 4. In Table 1.3, by the traceback procedure,
starting from value 4 back to value 1, the LCS(p, x) found is the string bcad.
Table 1.3. The dynamic programming matrix L

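            x =      a  b  h  c  b  a  d
p =    i \ j    0    1  2  3  4  5  6  7
       0        0    0  0  0  0  0  0  0
b      1        0    0  1  1  1  1  1  1
g      2        0    0  1  1  1  1  1  1
c      3        0    0  1  1  2  2  2  2
a      4        0    1  1  1  2  2  3  3
d      5        0    1  1  1  2  2  3  4
b      6        0    1  2  2  2  3  3  4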
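
The recurrence of the Algorithm WF and the traceback used in Example 1.6 can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name wagner_fischer_lcs; when several longest common subsequences exist, the tie-breaking rule in the traceback (prefer moving up) is an arbitrary choice of this sketch.

```python
def wagner_fischer_lcs(p, x):
    """Return the dynamic programming matrix L and one LCS of p and x."""
    m, n = len(p), len(x)
    L = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[i - 1] == x[j - 1]:            # p[i] = x[j] in 1-based notation
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    # Traceback from L(m, n) towards the top-left corner.
    i, j, letters = m, n, []
    while i > 0 and j > 0:
        if p[i - 1] == x[j - 1]:
            letters.append(p[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return L, "".join(reversed(letters))


L, lcs = wagner_fischer_lcs("bgcadb", "abhcbad")
print(L[6][7], lcs)
```

The matrix has (m + 1)(n + 1) entries, so both the running time and the memory of this sketch are proportional to mn.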

Definition 1.10 ([47]). Let u = p[j1 ]p[j2 ] . . . p[jt ] be a subsequence of p. Then an element
of the form (j1 , j2 , . . . , jt ) is called a location of u in p.
From Definition 1.10, the subsequence u has at least one location in p. If all the different
locations of u are arranged in the dictionary order, then call the least element the leftmost
location of u, denoted by LeftID(u). Denote the last component of LeftID(u) by Rm_p(u)
[47].



Example 1.7. Let p = aabcadabcd and u = abd. Then u is a subsequence of p and has
eight different locations in p; in the dictionary order they are
(1, 3, 6), (1, 3, 10), (1, 8, 10), (2, 3, 6), (2, 3, 10), (2, 8, 10), (5, 8, 10), (7, 8, 10).
It follows that LeftID(u) = (1, 3, 6) and Rm_p(u) = 6.
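
The locations in Example 1.7 can be enumerated exhaustively to double-check the leftmost location. The Python sketch below is a brute-force illustration with hypothetical names (locations, left_id, rm); it is not the method of [47]. For this example the enumeration also finds the location (2, 8, 10).

```python
from itertools import combinations


def locations(u, p):
    """All locations of u in p as 1-based index tuples, in dictionary order."""
    # combinations() yields increasing index tuples in lexicographic order.
    return [
        idx
        for idx in combinations(range(1, len(p) + 1), len(u))
        if all(p[j - 1] == c for j, c in zip(idx, u))
    ]


locs = locations("abd", "aabcadabcd")
left_id = locs[0]    # the least location in the dictionary order
rm = left_id[-1]     # the last component of the leftmost location
print(locs, left_id, rm)
```

The enumeration examines every index tuple of length |u|, so it is suitable only for small examples like this one.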

In 2002, P. T. Huy et al. introduced a method to solve Problem 1 by using the
automaton given in the following theorem. They named their method the
Knapsack Shaking approach [47].
Theorem 1.1 ([47]). Let p and x be two strings of lengths m and n over the alphabet
Σ, m ≤ n. Let Ap = (Σ, Q, q0, ϕ, F) corresponding to p be an automaton over the alphabet
Σ, where
• The set of states Q = Config(p),
• The initial state q0 = C0,
• The transition function ϕ is given as in Definition 1.12,
• The set of final states F = {Cn}, where Cn = ϕ(q0, x).
Suppose Cn = {x1, x2, ..., xt} for 1 ≤ t ≤ m. Then
1. For every subsequence u of p and x, there exists xi ∈ Cn, 1 ≤ i ≤ t such that the two
following conditions are satisfied.
(i) |u| = |xi|,
(ii) Rm_p(xi) ≤ Rm_p(u).
2. A LCS(p, x) equals xt.

1.5 Searchable Encryption
This section clarifies the term searchable encryption (SE) and recalls the definition
of a cryptosystem. They will be studied and used in Chapter 5 [26, 40, 60, 85, 88, 102].
Consider the following problem arising in cloud security [60, 85, 102]. Cloud tenants, for
example enterprises and individuals with limited resources including software and hardware,
store data with sensitive information on cloud servers. Assume that these servers cannot
be fully trusted. This means they may not only be curious about the users' information
but also abuse the data received. Users therefore wish to encrypt their data before uploading
them to servers. Because of the limitations of cloud users' information technology systems,
users also wish that cloud providers can help them perform information search directly

