Nghiên cứu phát triển các phương pháp của lý thuyết đồ thị và otomat trong giấu tin mật và mã hóa tìm kiếm tt tiếng anh - Pdf 60

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
——————————

Nguyen Huy Truong

RESEARCH ON DEVELOPMENT OF METHODS
OF GRAPH THEORY AND AUTOMATA
IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION
Major: Mathematics and Informatics
Major code: 9460117

ABSTRACT OF DOCTORAL DISSERTATION IN MATHEMATICS
AND INFORMATICS

Hanoi - 2020

The dissertation is completed at:
Hanoi University of Science and Technology

Supervisors:
1. Assoc. Prof. Dr. Sc. Phan Thi Ha Duong
2. Dr. Vu Thanh Nam

Reviewer 1:
Reviewer 2:
Reviewer 3:
The dissertation will be defended before approval committee at
Hanoi University of Science and Technology

to protect the embedded data and watermarking that protects the
copyright ownership and authentication of the digital media carrying
the embedded data.
Steganography can be used as an alternative way to cryptography.
However, steganography will become weak if attackers detect existence
of hidden data. Hence integrating cryptography with steganography is
as a third choice for data security.
With the rapid development of applications based on Internet
infrastructure, cloud computing becomes one of the hottest topics in
the information technology area. Indeed, it is a computing system
based on Internet that provides on-demand services from application
and system software, storage to processing data. For example, when
cloud users use the storage service, they can upload information to
the servers and then access it on the Internet online. Meanwhile,
enterprises can not spend big money on maintaining and owning a
system consisting of hardware and software.
Although cloud
computing brings many benefits for individuals and organizations,
cloud security is still an open problem when cloud providers can
abuse their information and cloud users lose control of it. Thus,
guaranteeing privacy of tenants’ information without negating the
benefits of cloud computing seems necessary. In order to protect
cloud users’ privacy, sensitive data need to be encoded before
outsourcing them to servers. Unfortunately, encryption makes the
servers perform search on ciphertext much more difficult than on
plaintext.
To solve this problem, many searchable encryption
1

For the first three problems, the dissertation’s work is to find new
and efficient solutions using graph theory and automata. Then they
will be used and applied to solve the last problem.
The dissertation has been completed with structure as follows.
Apart from Introduction at the beginning and Conclusion at the
end of the dissertation, the main content of it is divided into five
chapters.
2

Chapter 1. Preliminaries.
Chapter 2. Digital image steganography based on the Galois field
using graph theory and automata.
Chapter 3. An automata approach to exact pattern matching.
Chapter 4.
Automata technique for the longest common
subsequence problem.
Chapter 5. Cryptography based on steganography and automata
methods for searchable encryption.
The contents of the dissertation are written based on the paper
[T1] published in, the revised manuscript [T4] submitted to KSII
Transactions on Internet and Information Systems (ISI), and the
papers [T2, T3] published in Journal of Computer Science and
Cybernetics in 2019. The main results of the dissertation have been
presented at: Seminar on Mathematical Foundations for Computer
Science at Institute of Mathematics, Vietnam Academy of Science and
Technology; The 9th Vietnam Mathematical Congress, Nha Trang,
August 14-18, 2018; Seminar at School of Applied Mathematics and
Informatics, Hanoi University of Science and Technology.
CHAPTER 1

a
q2
q1
q0 (FOPA)
optimal parity assignment
method,
the module method and
the concept of the maximal secret
data
ratio
(MSDR).
b
The basic model of digital image steganography is shown in Figure
1.4.
Secret Data

Cover
Image

Secret Data
Communication
Channel

Embed

Stego
Image

Stego
Image

1
2
2
k
k
MSDR k (N ) = log2 (1+qcolour CN
+qcolour
CN
+· · ·+qcolour
CN
) . (1.3)

4

1

3

5

1.3 Exact Pattern Matching
This section will restate the exact pattern matching problem, and
recall the concept of the degree of fuzziness (appearance) used in
Chapter 3.
Definition 1.5. Let p be a pattern of length m and x be a text of length
n over the alphabet Σ. Then the exact pattern matching problem is to
find all occurrences of the pattern p in x.
Definition 1.6 (P. T. Huy et al., 2002). Let p be a pattern and x be
a text of length n over the alphabet Σ. Then for each 1 ≤ i ≤ n, a

CHAPTER 2
DIGITAL IMAGE STEGANOGRAPHY BASED ON THE
GALOIS FIELD USING GRAPH THEORY AND
AUTOMATA
This chapter first proposes concepts of optimal and near optimal
secret data hiding schemes. The chapter then proposes a new digital
image steganography approach based on the Galois field GF (pm )
using graph and automata to design the data hiding scheme of the
general form (k, N, log2 pmn ) for binary, gray and palette images
with the given assumptions, where k, m, n, N are positive integers and
p is prime, shows sufficient conditions for existence and proves
existence of some optimal and near optimal secret data hiding
schemes. These results are derived from the concept of the maximal
secret data ratio of embedded bits, the module method and the
FOPA method proposed by P. T. Huy et al. in 2011, 2012 and 2013,
recalled in Section 1.2 of Chapter 1. An application of the schemes to
the process of hiding a finite sequence of secret data in an image is
also considered. Security analyses and experimental results confirm
that the proposed approach can create steganographic schemes which
achieve high efficiency in embedding capacity, visual quality, speed as
well as security, which are key properties of steganography.
The results of Chapter 2 have been published in [T1].
2.1 Introduction
2.2 The Digital Image Steganography Problem
Definition 2.1. A block based secure data hiding scheme in digital
images (for short, called a data hiding scheme) is a five tuple
(I, M, K, Em, Ex), where the following conditions are satisfied.
1. I is a set of all image blocks with the same size and image type,
2. M is a finite set of secret elements,
3. K is a finite set of secret keys,

where x, y ∈ GF n (pm ) and x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ).
We remember that (GF n (pm ), +, ·) is a vector space over the field
GF (pm ).
Definition 2.5. The class of an element x ∈ GF n (pm ), denoted by
[x], is given by
[x] = {ax|a ∈ GF (pm )\{0}}.

7

Given a class [x], x is referred to as the representative of [x]. For
simplicity, denote the class [0] by 0.
Denote the set of all classes by [GF n (pm )]. This can be represented
by [GF n (pm )] = {[x]|x ∈ GF n (pm )}. The number of elements of a set
S is denoted by |S|.
Propostion 2.2. |[GF n (pm )]\{0}| =

pmn −1
pm −1 .

Definition 2.6. Suppose S ⊂ [GF n (pm )]\{0}. Then S is called a
k-[Generators] for the set [GF n (pm )], where k is a positive integer, if
∀[v] ∈ [GF n (pm )]\{0}, [v] ∈ {[ ti=1 ai vi ]|ai ∈ GF (pm )\{0},
[vi ] ∈ S, i = 1, t, t ≤ k}.
Definition 2.7. Let V be a vector space over a field K, S ⊂ V . Then
S is called a k-Generators for V , where k is a positive integer, if the
two following conditions are satisfied.
a) ∀v, v ∈ S, a ∈ K, v = av,
b) ∀v
∈

S = {v1 , v2 , . . . , vN }.
Definition 2.8. A weighted directed graph G = (V, E) is called a flip
graph over the Galois field GF (pm ) (for short, called a flip graph) if
the two following conditions are satisfied.
1. V = C and for all v ∈ V , the vertex v is assigned a weight by a
functionVal such thatVal(v) ∈ GF (pm ),
2. For ∀cp ∈ V, ∀a ∈ GF (pm )\{0}, ∃!(cp , cp ) ∈ E and the arc
(cp , cp ) is assigned the weight a such that Val (cp ) = Val (cp ) + a (on
GF (pm )).
Given a flip graph G, we denote by Adjacent(cp , a) an adjacent
vertex of cp , where the weight a is assigned to the arc
(cp , Adjacent(cp , a)).
Assume that build a flip graph G = (V, E).
From the way to determine the arc set E in Definition 2.8, assume
that
|C| ≥ pm and qcolour = pm − 1.
(2.1)
Let an image block I ∈ I, a secret element M ∈ M, a key K ∈ K.
By using the automaton A(I, M, K) and the flip graph G, two functions
Em and Ex in the data hiding scheme (I, M, K, Em, Ex) are designed
as follows.
The function Em (embedding M in I):
q = q0 ;
For i = 1 to N Do q = δ(q, Ii );
q = δ(q, M );
For each (it , at ) in q Do Iit = Adjacent(Iit , at );
I = I;
The function Ex (extracting M from I ):
q = q0 ;
For i = 1 to N Do q = δ(q, Ii );

Notice that if we set N = 2n − 1, then the data hiding scheme
(1, 2n − 1, n) becomes the data hiding scheme (1, N, log2 (N + 1) ).
Remember that for N is a positive integer, the data hiding scheme
(1, N, log2 (N +1) ) for binary image with qcolour = 1 is the data hiding
scheme CTL (Chang et al., 2005). So, Proposition 2.6 shows that the
data hiding scheme CTL reaches an optimal data hiding scheme for
N = 2n − 1, where n is a positive integer.
Theorem 2.4. Suppose that find a 2-Generators S for the vector
space GF n (pm ) with |S| =

pm −3
+
2

mn
(pm −3)2
+2(2 log2 p
4
pm −1

−1)

and build

a flip graph G. Then there exists the optimal data hiding scheme
(2, |S|, log2 pmn ) for qcolour = pm − 1.
2.4 The Near Optimal and Optimal Data Hiding Schemes for
Gray and Palette Images
Here consider the case k = p = m = 2 and n = 4, the data hiding
scheme (2, N, 8) exists if the hypothesis of Theorem 2.2 is satisfied, it

MATCHING
This chapter proposes a flexible approach using automata to
design an effective algorithm for exact pattern matching in practice,
and compares it with some of the most efficient algorithms, such as
AOSO, EBOM, FJS, FSBNDM, HASHq,
LBNDM, SA,
BMH-SBNDM, SBNDMq, TVSBS. These results are based on the
concept of the degree of appearance introduced by P. T. Huy et al. in
2002, recalled in Section 1.3 of Chapter 1. Theoretical analyses and
experimental results show that in practice the proposed algorithm is
faster than the above mentioned algorithms in most of the given cases
of patterns and alphabets.
11

The results of Chapter 3 have been published in [T2].
3.1 Introduction
This chapter only focuses on addressing the exact pattern matching
problem, recalled in Section 1.3 of Chapter 1. Research on applying
the proposition of the chapter for solving this problem to SE will be
introduced in Chapter 5.
3.2 The New Algorithm - The MRc Algorithm
Given a positive integer c, a string of length c is called a c block.
A c block is called (resp. not) to be in p, denoted by c block ∈ p (resp.
c block ∈
/ p), if the c block is (resp. not) a substring of p. For a given
positive integer c, 1 ≤ c ≤ m and c ≤ i ≤ m, the substring p[i − c + 1..i]
is called a c block of p at position i, denoted by c blockip . In particular,
c = 1, then c block is only a letter.
Definition 3.3. Let p be a pattern and z be a c block of p, where c

b,c,#
b

b,c,#
b,#

12
a

Theorem 3.3. For any given pattern p and text x, the MRc algorithm
finds all occurrences of the pattern p in x.
3.3 Analysis of The MRc Algorithm
Propostion 3.2. Let p be a pattern of length m and x be a text of
length n over the alphabet Σ. Let c be a positive integer constant such
that 1 ≤ c ≤ m. Then MRc algorithm requires n + 2c letters of x
accessed in the worst case.
Denote the probability of an arbitrary event by P .
Propostion 3.3. Let p be a pattern of length m over the alphabet Σ.
If |Σ| ≥ 4 and 8 ≤ m ≤ 2048, then there exists c, 1 ≤ c ≤ 8 such that
for an arbitrary c block z over the alphabet Σ, P (z ∈ x) ≤ 2−5 with a
uniform distribution over the alphabet Σ.
Theorem 3.4. Let p be a pattern of length m and x be a text of length
n over the alphabet Σ. Let T (n) be the number of all letters of x
accessed by the MRc algorithm. If |Σ| ≥ 4, 16 ≤ m ≤ 2048 or |Σ| ≥ 32,
8 ≤ m ≤ 2048, then there exists c, 1 ≤ c ≤ 8 such that the two following
conditions are satisfied with a uniform distribution over Σ.
(a) T (n) < n,
(b) P (z ∈ x) ≤ 2−5 , where z is an arbitrary c block over the
alphabet Σ.

computing the length of a longest subsequence of two strings of
lengths m and n, which is the Problem 2 restated in Section 1.4 of
Chapter 1. Further, study on applying results of this chapter to SE for
approximate pattern matching is one of main objectives of Chapter 5.
4.2 Mathematical Basis
In fact, when apply the Problem 2 to the approximate pattern
matching problem, we only need to find a common subsequence of
two strings such that the length of this common subsequence is equal
to a given constant. So, in general case, Theorem 1.1 will be replaced
with the following theorem.
Theorem 4.1. Let p and x be two strings of lengths m and n over the
alphabet Σ, m ≤ n. Let c be a positive integer constant, 1 ≤ c ≤ m
and Acp = (Σ, Q, q0 , ϕ, F ) corresponding to p be an automaton over the
alphabet Σ, where
• The set of states Q = Config(p),
• The initial state q0 = C0 ,
• The transition function ϕ is given as in Definition 1.12,
•
The
set
of
final
states
F = {Cf |Cf ∈ Config(p), Cf = {x1 , x2 , . . . , xc } or Cf = ϕ(C0 , x)}.
Suppose Cf = {x1 , x2 , . . . , xt } is a final state for 1 ≤ t ≤ m. Then
there exists a substring u of x such that a LCS(p, u) equals xt .
14

Theorem 4.2. Let p be a string of length m on the alphabet Σ,

Output: The lcs(p, x).
q = W0 ; // Set up the initial state of the automaton APp c .
For i = 1 to |x| Do
{
q = δp (q, x[i]);
15

If (|q| = |p|) Break;
}
lcs(p, x) = |q|;
Propostion 4.3. Let p and x be two strings of lengths m and n over
the alphabet Σ, m ≤ n. Suppose the Algorithm 2 uses k processors
(k ≤ m), where k is an upper estimate of the length of a longest
common subsequence of the two strings. Then the time complexity of
the Algorithm 2 is O(n) in the worst case.
4.4 Experimental Results
This section carries out a number of experiments to compare the
two proposed algorithms with the Algorithm WF.
4.5 Conclusions
By automata technique, the lcs(p, x) is always reflected and updated
at every location being scanned in the string x. Then the chapter’s
technique for the LCS problem can be exploited in SE. This application
will be studied in Chapter 5.
CHAPTER 5
CRYPTOGRAPHY BASED ON STEGANOGRAPHY
AND AUTOMATA METHODS FOR SEARCHABLE
ENCRYPTION
This chapter first proposes a novel cryptosystem based on the data
hiding scheme (2, 9, 8), a new result presented in Section 2.4 of Chapter

I = I;
For each (it , at ) in q Do Iit = Adjacent(Iit , at );
q = q0 ;
For i = 1 to N Do q = δ(q, Ii );
M = q;
Remark 5.1. From defining two functions Em and Ex as above, all
image blocks I used are not changed.
Consider Σ to be an alphabet of size 256. Set P = Σ.
By the decimal representation of the vector space GF 4 (22 ) over the
field GF (22 ) in Section 2.4, then |P| = |GF 4 (22 )| = 256, hence there
exists a bijective function f from P to GF 4 (22 ), denote the inverse
function of f by f −1 . Put F to be a set of all f .
From the function δ2 , the state q of the automaton A(I, M, K)
computed by Statement (2.4) is a set. Suppose B is a binary string of
length 12 to present an arbitrary state q.
17

Put Q to be a set of all possible states q, C is a set of all 12-bit strings
B presenting q, q ∈ Q. Consider a function h, h : Q → C, h(q) = B,
where q is presented by B. Obviously, h is a bijective function. Denote
the inverse function of h by h−1 .
Let K = {(f, K, I)|f ∈ F, K ∈ K, I ∈ I} is a finite set of secret
keys. For k ∈ K , k = (f, K, I), ek and dk are defined as follows.
1. ek : P → C, ek (x) = h(Em (I, f (x), K)) for x ∈ P.
2. dk : C → P, dk (y) = f −1 (Ex (h−1 (y), I, K)) for y ∈ C.
Set E = {ek |k ∈ K }, D = {dk |k ∈ K }. From Definition 1.13, the
correctness of the cryptosystem (P, C, K , E, D) is guaranteed by the
following theorem.
Theorem 5.1. Let ∀x ∈ P, ∀k ∈ K , ek ∈ E and dk ∈ D. Then

given as follows.
iK = 1;
iF = 1;
For i = 1 to t3 Do
{
ki = (f, K iK , FiF );
yi = eki (xi );
iK = (iK − 1) mod t1 + 1;
iF = (iF − 1) mod t2 + 1;
}
y = y1 y2 ..yt3 ;
The decrypting algorithm dK used to decrypt y is given as follows.
iK = 1;
iF = 1;
For i = 1 to t3 Do
{
ki = (f, K iK , FiF );
xi = dki (yi );
iK = (iK − 1) mod t1 + 1;
iF = (iF − 1) mod t2 + 1;
}
x = x1 x2 ..xt3 ;
Remark 5.3. For two algorithms eK and dK given as above, an
arbitrary image block I in the input image F can be used many times
in process of encrypting and decrypting the secret data. So, for a give
input image F , the secret data encrypted is not limited by the size of
the input image F .
19

introduced results in Chapter 3, here continues using automata
technique to meet the requirement.
Theorem 5.2. Let p be a pattern over the alphabet Σ. Let two
automata Mp = (Σ, Qp , q0 , δp , Fp ) and Mp = (Σ , Qp , q0 , δp , Fp ) be
determined as in Theorem 3.2. Then Qp = Qp , Fp = Fp , ∀q ∈ Qp ,
∀a ∈ Σ , a = dk (a ), δp (q, a ) = δp (q, a), where p = ek (p).
20

Remark 5.5. The meaning of Theorem 5.2 in practice is to compute
δp from δp .
Let a pattern p and a text (secret data) x be two strings over the
same alphabet Σ and assume |p|
|x|. For assuming that we have
only the encrypted secret data y which is not decrypted to the secret
data x, from Propositions 5.2, 5.3 and 5.4, Theorem 5.2, based on the
MRc algorithm for c = 1 and using the type a breaking point and
the concept of Posp in Chapter 3, and by using the automaton Mp
given as in Theorem 3.2, we have an exact pattern matching algorithm
immediately that finds all occurrences of the pattern p in x as follows.
Note that the trapdoor according to the search pattern p is computed
based on p, which includes the length of p, the functions Sign, Posp
and the automaton Mp .
jump = |p|;
While (jump ≤ |y|)
{
If (sign(yjump ) == 1)
{
q = 0;
i = jump − P osp (yjump ) + 1;

To construct the approximate pattern matching algorithm, we need
a function to measure the string similarity. This section defines a new
measure of similarity between two strings
d(p, u) = 1 −

lcs(p, u)
,
min{|p|, |u|}

(5.11)

where p is a pattern and u is a substring of x.
The constant c in Theorem 4.4 is determined by c = (1 − )|p| .
Without decrypting y, based on Theorem 5.3, Definition 5.1 and
Formula (5.11), use the automaton APp c given as in Theorem 4.4, we
immediately have an approximate pattern matching algorithm which
determines whether p appears in x with the error or not as follows,
where the trapdoor responding to the pattern p is determined from p
and , which consists of the constant c and the automaton APp c .
app = 0;
q = W0 ; //The initial state of the automaton APp c is started from W0
For i = 1 to |y| Do
{
q = δp (q, yi );
22

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Nghiên cứu phát triển các phương pháp của lý thuyết đồ thị và otomat trong giấu tin mật và mã hóa tìm kiếm tt tiếng anh - Pdf 60

Tài liệu, ebook tham khảo khác

Học thêm