MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING RGB-D
IMAGES FOR HUMAN-MACHINE INTERACTION

DOCTORAL THESIS OF
CONTROL ENGINEERING AND AUTOMATION

Hanoi − 2017


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING
RGB-D IMAGES FOR HUMAN-MACHINE
INTERACTION

Specialty: Control Engineering and Automation
Specialty Code: 62520216

DOCTORAL THESIS OF
CONTROL ENGINEERING AND AUTOMATION

SUPERVISORS:
1. Dr. Hai Vu
2. Dr. Thi Thanh Hai Tran


ACKNOWLEDGEMENT
This thesis was written during my doctoral study at the International Research Institute MICA (Multimedia, Information, Communication and Applications), Hanoi
University of Science and Technology (HUST). It is my great pleasure to thank all the
people who supported me in completing this work.
First, I would like to express my sincere gratitude to my advisors, Dr. Hai Vu and
Dr. Thi Thanh Hai Tran, for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their guidance helped
me throughout the research and writing of this thesis. I could not have imagined
having better advisors and mentors for my Ph.D. study.
Besides my advisors, I would like to thank the scientists and authors of the
published works cited in this thesis, whose works provided me with valuable information resources. Attending scientific conferences
has always been a great experience and gave me many useful comments.
In the process of carrying out and completing my research, I received
much support from the board of directors of MICA. My sincere thanks go to Prof. Yen
Ngoc Pham, Prof. Eric Castelli and Dr. Son Viet Nguyen, who gave me the
opportunity to join research work at the MICA institute and granted me access to the
laboratory and research facilities. Without their precious support, it would have been
impossible to conduct this research.
As a Ph.D. student of the 911 programme, I would like to thank the 911 programme for
its financial support during my Ph.D. course. I also gratefully acknowledge the financial support for publishing papers and conference fees from research projects T2014-100,
T2016-PC-189, and T2016-LN-27. I would like to thank my colleagues at the Computer
Vision Department and the Multi-Lab of the MICA institute over the years, both at work and
outside of work.
Special thanks to my family. Words cannot express how grateful I am to my
mother and father for all of the sacrifices they have made on my behalf. I would
also like to thank my beloved husband. Thank you for supporting me in everything.


1 LITERATURE REVIEW
1.1 Completed hand gesture recognition systems for controlling home appliances
1.1.1 GUI device dependent systems
1.1.2 GUI device independent systems
1.2 Hand detection and segmentation
1.2.1 Color
1.2.2 Shape
1.2.3 Motion
1.2.4 Depth
1.2.5 Discussions
1.3 Hand gesture spotting system
1.3.1 Model-based approaches
1.3.2 Feature-based approaches
1.3.3 Discussions
1.4 Dynamic hand gesture recognition
1.4.1 HMM-based approach
1.4.2 DTW-based approach
1.4.3 SVM-based approach
1.4.4 Deep learning-based approach
1.4.5 Conclusion
1.5 Discussion and Conclusion


2.1 Defining dynamic hand gestures
2.2 The existing dynamic hand gesture datasets
2.2.1 The published dynamic hand gesture datasets
2.2.1.1 The RGB hand gesture datasets
2.2.1.2 The Depth hand gesture datasets
2.2.1.3 The RGB and Depth hand gesture datasets
2.2.2 The non-published hand gesture datasets
2.2.3 Conclusion
2.3 Definition of the closed-form pattern of gestures and phasing issues
2.3.1 A conducting commands of a dynamic hand gestures set
2.3.2 Definition of the closed-form pattern of gestures and phasing issues
2.3.3 Characteristics of dynamic hand gesture set
2.4 Data collection
2.4.1 MICA1 dataset
2.4.2 MICA2 dataset
2.4.3 MICA3 dataset
2.4.4 MICA4 dataset
2.5 Discussion and Conclusion

3 HAND DETECTION AND GESTURE SPOTTING WITH USER-GUIDE SCHEME
3.1 Introduction
3.2 Heuristic user-guide scheme
3.2.1 Assumptions
3.2.2 Proposed framework
3.2.3 Estimating heuristic parameters
3.2.3.1 Estimating parameters of background model for body detection


3.4.2 The computational time for hand segmentation and recognition
3.4.3 Performance of the hand region segmentations
3.4.3.1 Evaluate the hand segmentation

4.2.3.3 Phase synchronization using hand posture interpolation
4.2.3.4 Dynamic hand gesture recognition using different classifiers
4.3 Experimental results
4.3.1 Influence of temporal resolution on recognition accuracy
4.3.2 Tuning kernel scale parameters of the RBF-SVM classifier
4.3.3 Performance evaluation of the proposed method
4.3.4 Impacts of the phase normalization
4.3.5 Further evaluations on public datasets
4.4 Discussion and Conclusion
4.4.1 Discussion
4.4.2 Conclusion
5 CONTROLLING HOME APPLIANCES USING DYNAMIC HAND GESTURES
5.1 Introduction
5.2 Deployment of control systems using hand gestures
5.2.1 Assignment of hand gestures to commands
5.2.2 Different modes of operations carried out by hand gestures
5.2.2.1 Different states of lamp and their transitions
5.2.2.2 Different states of fan and their transitions
5.2.3 Implementation of the control system


ABBREVIATIONS
No.   Abbreviation   Meaning
1     ANN            Artificial Neural Network
2     ASL            American Sign Language
3     BB             Bounding Box
      CIF            Common Intermediate Format
10    CNN            Convolutional Neural Network
11    CPU            Central Processing Unit
12    CRFs           Conditional Random Fields
13    CSI            Channel State Information
      DTW            Dynamic Time Warping
20    FAR            False Acceptance Rate
21    FD             Fourier Descriptor
22    FP             False Positive
23    FN             False Negative
      GUI            Graphical User Interface
30    HCI            Human Computer Interaction
31    HCRFs          Hidden Conditional Random Fields
32    HNN            Hopfield Neural Network
33    HMM            Hidden Markov Model
39    ISOMAP         Isometric Mapping
40    JI             Jaccard Index
41    KLT            Kanade-Lucas-Tomasi
42    KNN            K-Nearest Neighbors
43    LAN            Local Area Network
49    MSC            Mean Shift Clustering
50    MR             Magic Ring
51    NB             Naive Bayesian
52    PC             Personal Computer
53    PCA            Principal Component Analysis
59    RBF            Radial Basis Function
60    RF             Random Forest
61    RGB            Red Green Blue
62    RGB-D          Red Green Blue Depth
63    RMSE           Root Mean Square Error
                     Short Time Energy
69    STF            Spatial Temporal Feature
70    ToF            Time of Flight
71    TN             True Negative
72    TP             True Positive
73    TV             Television

Table 1.4 Hand gestures utilized for different devices using MR technique
Table 1.5 The existing in-air gesture-based systems
Table 1.6 The existing vision-based dynamic hand gesture methods
Table 2.1 The existing hand gesture datasets
Table 2.2 The main commands of some smart home electrical appliances
Table 3.4 The required time for hand segmentation
Table 3.5 The required time for hand posture recognition
Table 3.6 Results of the JI indexes without/with learning scheme
Table 4.1 Recall rate (%) of the proposed method on the self-collected datasets with different classifiers
Table 4.2 Performance of the proposed method on three different datasets
Table 5.1 Assignment of hand gestures to commands for controlling lamp

Figure 3 The proposed framework of the dynamic hand gesture recognition for controlling home appliances
Figure 1.1 Mitsubishi hand gesture-based TV [46]
Figure 1.2 Samsung-Smart-TV using hand gestures
Figure 1.3 Dynamic hand gestures used for Samsung-Smart-TV
Figure 1.4 Hand gesture commands in Soft Remote Control System [39]
Figure 1.10 Wi-Fi signals to control home appliances using hand gestures [119]
Figure 1.11 Seven hand gestures for wireless-based interaction [9] (WiSee dataset)
Figure 1.12 Simulation of using MR to control some home appliances [62]
Figure 1.13 AirTouch-based control uses depth cue [33]
Figure 1.14 Depth threshold cues and face skin [97]
Figure 1.15 Depth threshold and skeleton [60]
Figure 1.16 The process of detecting hand region [69]
Figure 1.17 Spotting dynamic hand gestures system using HMM model [71]
Figure 1.18 Threshold using HMM model for different gestures [71]
Figure 2.2 Four hand gestures of [83]
Figure 2.3 Cambridge hand gesture dataset of [67]
Figure 2.4 Five hand gestures of [82]
Figure 2.5 Twelve dynamic hand gestures of the MSRGesture3D dataset [1]
Figure 2.6 Dynamic hand gestures of [88]
Figure 2.12 Dynamic hand gestures of PowerGesture dataset [71]
Figure 2.13 Hand shape variations and hand trajectories (lower panel) of the proposed gesture set (5 gestures)
Figure 2.14 In each row, changes of the hand shape while performing a gesture. From left to right, the hand shapes of the completed gesture change in a cyclical pattern (closed-opened-closed)
Figure 2.15 Comparing the similarity between the closed-form gestures and a simple sinusoidal signal
Figure 2.16 Closed cyclical hand gesture pattern and cycle signal
Figure 2.17 The environment setup for the MICA1 dataset
Figure 2.18 The environment setup for the MICA2 dataset
Results of hand region detection
Figure 3.5 Result of the learning distance parameter. (a-c) Three consecutive frames; (d) Result of subtracting the first two frames; (e) Result of subtracting the next two frames; (f) Binary thresholding operator; (g) Ranges of the hand (left) and of the body (right) on the depth histogram
Figure 3.6 The training skin color model
Figure 3.7 Result of the training skin color model
Figure 3.8 Results of the hand segmentation. (a) A hand candidate; (b) Mahalanobis distance; (c) Refining the segmentation results using RGB features
Figure 3.9
Figure 3.16 Results of the kernel-based descriptors for hand posture recognition without/with segmentation
Figure 3.17 Performances of the dynamic gesture spotting on two datasets, MICA1 and MICA2
Figure 3.18 An illustration of the gesture spotting errors
Figure 4.1 The comparison framework of hand gesture recognition
Figure 4.2 Optical flow and trajectory of the go-right hand gesture
Define quasi-periodic image sequence
Figure 4.9 Illustrations of the phase variations
Figure 4.10 Define quasi-periodic image sequence in phase domain
Figure 4.11 Manifold representation of the cyclical Next hand gesture
Figure 4.12 Phase synchronization
Figure 4.13 Whole length sequence is synchronized with the best difference phase
Figure 4.14 Whole length sequence is synchronized with the best similar phase
Figure 5.3 The state diagram of the proposed lighting control system
Figure 5.4 The state diagram of the proposed fan control system
Figure 5.5 A schematic representation of basic components in hand gesture-based control system
Figure 5.6 Integration of hand gesture recognition modules
Figure 5.7 The proposed framework for training phase
Figure 5.8 The proposed flow chart for the online dynamic hand gesture recognition
Figure 5.9 The proposed flow chart for controlling lamp
Figure 5.10 The proposed flow chart for controlling fan
Figure 5.11 Setup for evaluating the control systems
Figure 5.12 Illustration of environment and material setup
Figure 5.13 The time-line of the proposed evaluation system
Figure 5.14 The time cost for the proposed dynamic hand gesture recognition system
Figure 5.15 Usability evaluation of the proposed system

to benefit from the Kinect sensor [2], which provides both RGB and depth features. Utilizing such valuable features offers an efficient and robust solution to these
challenges.



Objectives
The thesis aims to build a robust, real-time hand gesture recognition system. To be
a feasible solution, the proposed method should be natural and friendly for end-users.
A real application is deployed to control a fan and/or a bulb/lamp, which are
common electrical home appliances, using hand gestures. Although this is a specific
case, the proposed technique is intended to extend to general home
automation control systems. To this end, the concrete objectives are:
- Defining a unique set of dynamic hand gestures. This gesture set conveys commands that are available in common home electronic appliances such as television, fan, lamp, door, air-conditioner, and so on. Moreover, the proposed gesture
set is designed with unique characteristics. These characteristics are important
cues and offer promising solutions to the challenges of a dynamic hand
gesture recognition system.
- Real-time spotting of dynamic hand gestures from an input video stream. The proposed gesture spotting technique includes solutions for hand detection
and hand segmentation from consecutive RGB-D images. From the viewpoint of a complete system, the spotting technique serves as a preprocessing procedure.
- The performance of a dynamic hand gesture recognition method depends on the gesture
representation and matching phases. This work aims to extract and represent
both spatial and temporal features of the gestures. Moreover, the thesis intends to
match the phases of the gallery and probe sequences using a phase synchronization
scheme, which aims to handle variations in gesture speed and acquisition
frame rate. In the experiments, the proposed method is evaluated with
various positions, directions, and distances from the human to the Kinect sensor.
- A framework to control home appliances (such as a lamp or fan) is proposed and deployed.
A full hand gesture-based system is built in an indoor scenario (a smart room).
The prototypes of the proposed system for controlling fans and lamps are shown


Figure 1 Home appliances in a smart home

Figure 2 Controlling home appliances using dynamic hand gestures in a smart house.

The proposed system operates with a Kinect sensor. This device is mounted at
a fixed position to obtain good system performance and to make end-users feel
comfortable. To deploy a real application that controls home appliances using dynamic
hand gestures, the thesis imposes the following constraints on dynamic hand gesture
recognition:


- The Kinect sensor:
  – The Kinect sensor is immobile while end-users perform interactions.
  – The Kinect sensor captures RGB and depth images at a normal frame rate
    (from 10 to 30 fps) with an image resolution of 640×480 pixels for both
    image types.
  – The visible area is the area in front of the Kinect sensor in which every object
    can be viewed by the sensor. It is limited not only by the distance from the
    objects to the camera (from 0.8 m to 4 m) but also by an angle of 30°
    around the center axis of the Kinect sensor.
- Furniture and other objects are distributed uniformly in a square room.
- At any given time, it is assumed that only one end-user controls a home
  appliance, using dynamic hand gestures of his/her right hand. If there is more
  than one subject in the room, the person nearest to the Kinect sensor is
  considered.
- When an end-user wants to control an electronic appliance, he/she should stand in


tures with artifacts such as different speeds/velocities, changes in the captured frame rate,
and various lengths of hand trajectories. Therefore, the proposed dynamic hand
gesture recognition system must be designed to adapt to such variations. The thesis mainly
addresses these issues with a new phase synchronization technique.
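As a minimal sketch of why duration variations must be normalized before two gesture sequences can be compared, the snippet below linearly resamples a per-frame feature sequence onto a fixed number of samples over normalized time [0, 1]. It assumes each gesture has already been spotted as one complete closed-opened-closed cycle and is stored as a (T, D) NumPy array; the function name `resample_sequence` and the target length are hypothetical choices for illustration. This handles only differences in speed and frame rate. It is not the thesis's phase synchronization technique, which additionally aligns the phases of the gallery and probe sequences.

```python
import numpy as np


def resample_sequence(features, target_len: int) -> np.ndarray:
    """Linearly resample a (T, D) per-frame feature sequence to (target_len, D).

    Hypothetical helper: frame index t is mapped to normalized time in [0, 1],
    so a gesture performed slowly (many frames) and one performed quickly
    (few frames) both end up with target_len samples spanning start to end.
    """
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        # Treat a 1-D signal as a sequence of 1-dimensional feature vectors.
        features = features[:, None]
    t_len, dim = features.shape
    if t_len < 2 or target_len < 2:
        raise ValueError("need at least two input frames and two output samples")
    src = np.linspace(0.0, 1.0, t_len)       # normalized time of each input frame
    dst = np.linspace(0.0, 1.0, target_len)  # normalized time of each output sample
    # Interpolate every feature dimension independently.
    return np.stack([np.interp(dst, src, features[:, d]) for d in range(dim)], axis=1)


if __name__ == "__main__":
    # Toy stand-in for one cyclical gesture (not real data): a single sine cycle
    # captured with 40 frames (slow / high fps) and with 15 frames (fast / low fps).
    slow = np.sin(2 * np.pi * np.linspace(0, 1, 40))
    fast = np.sin(2 * np.pi * np.linspace(0, 1, 15))
    a = resample_sequence(slow, 32)
    b = resample_sequence(fast, 32)
    print(a.shape, b.shape, float(np.abs(a - b).max()))
```

After resampling, both toy sequences have the same length and differ only by small interpolation error, so a fixed-length classifier could consume either. Linear interpolation was chosen for simplicity; a real system would still need a phase alignment step when gestures start at different points of the cycle.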

Contributions
Throughout the thesis, the main objectives are addressed by a unified solution.
The thesis makes the following contributions:
- Contribution 1: Designing a dynamic hand gesture dataset to convey the
  commands of electronic home equipment. The proposed gestures are suitable for
  deploying gesture-based systems in smart room environments. The dataset has
  specific characteristics that are useful for deploying a robust
  hand gesture recognition system. A number of datasets are captured with a large
  number of end-users. The datasets contain both RGB and depth images and are
  published for the research community working on dynamic hand gestures. In addition, these
  datasets are used to evaluate the performance of the proposed algorithms.
- Contribution 2: An efficient user-guide scheme is proposed to learn heuristic
  parameters, offering a trade-off between a real-time system and a user-independent
  system. This scheme helps to obtain both real-time hand detection
  and good hand segmentation performance. Then, an efficient gesture spotting
  method is proposed that utilizes features extracted from continuously
  segmented hand regions.
- Contribution 3: Proposing an efficient representation for dynamic hand
  gestures that combines spatial and temporal features. The spatial features
  are extracted for dynamic hand gesture representations using the most significant
  dimensions of a nonlinear reduced space (ISOMAP technique). The trajectories
  of hand movements are extracted using the KLT technique. This representation is especially helpful for discriminating between different types of gestures.
  In addition, to resolve the gestures' variation issues, a new phase synchronization


[Figure 3 diagram, blocks: Real-time dynamic hand gesture spotting; Dynamic hand gesture representation; Phase synchronization; Hand gesture classifier; Robust dynamic hand gesture recognition; Control home appliances (natural way & real environment); Application system]

Figure 3 The proposed framework of the dynamic hand gesture recognition for controlling home appliances.
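Contribution 3 builds spatial features from the most significant dimensions of an ISOMAP embedding. The NumPy-only sketch below shows the standard ISOMAP steps (k-nearest-neighbour graph, geodesic distances by Floyd-Warshall shortest paths, classical MDS). It assumes each row of `X` is one frame's flattened, segmented hand image; the function name `isomap` and the values of `n_neighbors` and `n_components` are hypothetical and do not reflect the thesis's settings. In practice, scikit-learn's `sklearn.manifold.Isomap` implements the same technique with efficient graph routines; the O(n³) shortest-path loop here is only suitable for small frame sets.

```python
import numpy as np


def isomap(X, n_neighbors: int = 5, n_components: int = 2) -> np.ndarray:
    """Minimal ISOMAP: embed the rows of X (one row per frame) into n_components dims."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # 1) Pairwise Euclidean distances between frames.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # 2) Symmetric k-nearest-neighbour graph; missing edges are infinite.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    order = np.argsort(dist, axis=1)
    for i in range(n):
        for j in order[i, 1:n_neighbors + 1]:  # index 0 is the frame itself
            graph[i, j] = graph[j, i] = dist[i, j]

    # 3) Geodesic distances: Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # 4) Classical MDS on the squared geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (graph ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


if __name__ == "__main__":
    # Toy input (not real data): 20 "frames" lying on a straight line in 64-D.
    rng = np.random.default_rng(0)
    frames = np.outer(np.arange(20.0), rng.normal(size=64))
    emb = isomap(frames, n_neighbors=3, n_components=1)
    print(emb.shape)
```

Because the toy frames lie on a straight line, the geodesic distances are exact and the 1-D embedding recovers the frame order (up to sign). For real hand images, the low-dimensional coordinates of consecutive frames trace a curve of the kind shown in the manifold figures of Chapter 4; KLT trajectories would be concatenated separately as the temporal part.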
This thesis proposes a unified solution for dynamic hand gesture recognition. The
proposed framework consists of three main phases, as illustrated in Fig. 3:
(1) hand detection and segmentation from a video stream; (2) spotting of dynamic hand
gestures; and (3) the recognition schemes. Using this framework, a real application
is also deployed. The application is evaluated in different contexts, such as lab-based
environments and demonstrations at exhibitions and Tech-mart events. The research
work in this thesis is organized into five chapters as follows:
- Introduction: This chapter describes the main motivations and objectives of the
  study. It also presents the research context, constraints, and challenges
  that arise when addressing the relevant problems of the thesis.
  Additionally, the general proposed framework and the main contributions are
  presented in this chapter.


- Conclusion and Future Works: This chapter gives the conclusions of the work and discusses
  the limitations of the proposed method. Further research directions are proposed
  for future work.
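The sensing constraints stated earlier in this Introduction (a 0.8 m to 4 m range, an angle of 30° around the sensor axis, and only the nearest person being considered) amount to a simple user-selection filter that could run before phase (1) of the framework. The sketch below assumes a hypothetical `Person` record holding a body-centre position in sensor coordinates. It interprets the 30° limit as a horizontal half-angle on either side of the optical axis, checks the range on depth (z), and picks the nearest user by Euclidean distance. These interpretations are assumptions, not details given in the thesis.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Working-area limits taken from the stated constraints (interpretation assumed).
MIN_DEPTH_M = 0.8
MAX_DEPTH_M = 4.0
MAX_ANGLE_DEG = 30.0


@dataclass
class Person:
    """Hypothetical detection record in sensor coordinates (metres).

    x: lateral offset from the sensor's optical axis; z: depth along that axis.
    """
    person_id: int
    x: float
    z: float


def in_working_area(p: Person) -> bool:
    """True if the person is inside the assumed depth range and viewing angle."""
    if not (MIN_DEPTH_M <= p.z <= MAX_DEPTH_M):
        return False
    angle_deg = math.degrees(math.atan2(abs(p.x), p.z))  # angle from the optical axis
    return angle_deg <= MAX_ANGLE_DEG


def select_controlling_user(people: Sequence[Person]) -> Optional[Person]:
    """Return the nearest person inside the working area, or None if nobody qualifies."""
    candidates = [p for p in people if in_working_area(p)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: math.hypot(p.x, p.z))


if __name__ == "__main__":
    people = [Person(1, 0.5, 2.5), Person(2, 0.1, 1.5), Person(3, 0.0, 0.5)]
    chosen = select_controlling_user(people)
    print(chosen.person_id if chosen else None)
```

Person 3 is too close (0.5 m < 0.8 m), so the nearer of the two remaining people (person 2) is selected. Only that person's right hand would then be passed on to hand detection and segmentation.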



