Robust Control Theory and Applications Part 6 potx - Pdf 14

Robust Delay-Independent/Dependent Stabilization of
Uncertain Time-Delay Systems by Variable Structure Control

ii = 1;
setlmis([])
P =lmivar(1,[2 1]);
R1=lmivar(1,[2 1]);
R2=lmivar(1,[2 1]);
lmiterm([-1 1 1 P],ii,ii)
lmiterm([-2 1 1 R1],ii,ii)
lmiterm([4 1 1 P],1,A0til','s')
lmiterm([4 1 1 R1],ii,ii)
lmiterm([4 2 2 R1],-ii,ii)
lmiterm([4 1 2 P],1,A1hat)
LMISYS=getlmis;
[copt,xopt]=feasp(LMISYS);
P=dec2mat(LMISYS,xopt,P);
R1=dec2mat(LMISYS,xopt,R1);
evlmi=evallmi(LMISYS,xopt);
[lhs,rhs]=showlmi(evlmi,4);
lhs
P
eigP=eig(P)
R1
eigR1=eig(R1)
eigsLHS=eig(lhs)
BTP=B'*P
BTPB=B'*P*B
invBTPB=inv(B'*P*B)
normG1 = norm(G1)

lmiterm([4 1 1 R2],ii,ii)
lmiterm([4 2 2 R1],-ii,ii)
lmiterm([4 1 2 P],1,A1hat)
lmiterm([4 1 3 P],1,A2hat)
lmiterm([4 3 3 R2],-ii,ii)
LMISYS=getlmis;
[copt,xopt]=feasp(LMISYS);
P=dec2mat(LMISYS,xopt,P);
R1=dec2mat(LMISYS,xopt,R1);
R2=dec2mat(LMISYS,xopt,R2);
evlmi=evallmi(LMISYS,xopt);
[lhs,rhs]=showlmi(evlmi,4);
lhs
eigsLHS=eig(lhs)
P
eigP=eig(P)
R1
R2
eigR1=eig(R1)
eigR2=eig(R2)
BTP=B'*P
BTPB=B'*P*B
invBTPB=inv(B'*P*B)
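% Recalculate the equivalent gain and the reduced matrices with the new P
Geq=inv(B'*P*B)*B'*P
A0hat=A0-B*Geq*A0
A1hat=A1-B*Geq*A1
A2hat=A2-B*Geq*A2
% Complex poles passed to place must form a conjugate pair
G1= place(A0hat,B,[-4.2-.6i -4.2+.6i])
A0til=A0hat-B*G1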

lhs
eigsLHS=eig(lhs)
P
eigP=eig(P)
R1
R2
eigR1=eig(R1)
eigR2=eig(R2)
BTP=B'*P
BTPB=B'*P*B
invBTPB=inv(B'*P*B)
normG1 = norm(G1)
A3
clear;
clc;
A0=[-0.228 2.148 -0.021 0; -1 -0.0869 0 0.039; 0.335 -4.424 -1.184 0; 0 0 1 0];
A1=[ 0 0 -0.002 0; 0 0 0 0.004; 0.034 -0.442 0 0; 0 0 0 0];
B =[-1.169 0.065; 0.0223 0; 0.0547 2.120; 0 0];
setlmis([])
P =lmivar(1,[4 1]);
R1=lmivar(1,[4 1]);
G=inv(B'*P*B)*B'*P
A0hat=A0-B*G*A0
A1hat=A1-B*G*A1
G1= place(A0hat,B,[-.5+.082i -.5-.082i -.2 -.3])
A0til=A0hat-B*G1
eigA0til=eig(A0til)

A1=[-1 0 0; -0.1 0.25 0.2; -0.2 4 5]
B =[0;0;1]
%break
h1=1.0;
setlmis([]);
P=lmivar(1,[3 1]);
Geq=inv(B'*P*B)*B'*P
A0hat=A0-B*Geq*A0
A1hat=A1-B*Geq*A1
eigA0hat=eig(A0hat)
eigA1hat=eig(A1hat)
DesPol = [-2.7 -.8+.5i -.8-.5i];
G= place(A0hat,B,DesPol)
A0til=A0hat-B*G
eigA0til=eig(A0til)
R1=lmivar(1,[3 1]);
S1=lmivar(1,[3 1]);
T1=lmivar(1,[3 1]);
lmiterm([-1 1 1 P],1,1);
lmiterm([-1 2 2 R1],1,1);
lmiterm([-2 1 1 S1],1,1);
lmiterm([-3 1 1 T1],1,1);
lmiterm([4 1 1 P],(A0til+A1hat)',1,'s');
lmiterm([4 1 1 S1],h1,1);
lmiterm([4 1 1 R1],h1,1);
lmiterm([4 1 1 T1],1,1);

lmiterm([-1 1 1 P],1,1);
lmiterm([-1 2 2 R1],1,1);
lmiterm([-2 1 1 S1],1,1);
lmiterm([-3 1 1 T1],1,1);
lmiterm([4 1 1 P],(A0til+A1hat)',1,'s');
lmiterm([4 1 1 S1],h1,1);
lmiterm([4 1 1 R1],h1,1);
lmiterm([4 1 1 T1],1,1);
lmiterm([4 1 2 P],-1,A1hat*A0hat);
lmiterm([4 1 3 P],-1,A1hat*A1hat);
lmiterm([4 2 2 R1],-1/h1,1);
lmiterm([4 3 3 S1],-1/h1,1);
lmiterm([4 4 4 T1],-1,1);
LMISYS=getlmis;
[copt,xopt]=feasp(LMISYS);
P=dec2mat(LMISYS,xopt,P);
R1=dec2mat(LMISYS,xopt,R1);
S1=dec2mat(LMISYS,xopt,S1);
T1=dec2mat(LMISYS,xopt,T1);
evlmi=evallmi(LMISYS,xopt);
[lhs,rhs]=showlmi(evlmi,4);
lhs,h1,P,R1,S1,T1
eigLHS=eig(lhs)
NormP=norm(P)
G
NormG = norm(G)
invBtPB=inv(B'*P*B)

lmiterm([-1 1 1 P],1,1);
lmiterm([-1 2 2 R1],1,1);
lmiterm([-2 1 1 S1],1,1);
lmiterm([-3 1 1 T1],1,1);
lmiterm([4 1 1 P],(A0til+A1hat)',1,'s');
lmiterm([4 1 1 S1],h1,1);
lmiterm([4 1 1 R1],h1,1);
lmiterm([4 1 1 T1],1,1);
lmiterm([4 1 2 P],-1,A1hat*A0hat);
lmiterm([4 1 3 P],-1,A1hat*A1hat);
lmiterm([4 2 2 R1],-1/h1,1);
lmiterm([4 3 3 S1],-1/h1,1);
lmiterm([4 4 4 T1],-1,1);
LMISYS=getlmis;
[copt,xopt]=feasp(LMISYS);
P=dec2mat(LMISYS,xopt,P);
R1=dec2mat(LMISYS,xopt,R1);
S1=dec2mat(LMISYS,xopt,S1);
T1=dec2mat(LMISYS,xopt,T1);
evlmi=evallmi(LMISYS,xopt);
[lhs,rhs]=showlmi(evlmi,4);
lhs,h1,P,R1,S1,T1
eigsLHS=eig(lhs)
% repeat
Geq=inv(B'*P*B)*B'*P
A0hat=A0-B*Geq*A0
A1hat=A1-B*Geq*A1
eigA0hat=eig(A0hat)
eigA1hat=eig(A1hat)
G = avec;

[lhs,rhs]=showlmi(evlmi,4);
lhs,h1,P,R1,S1,T1
eigsLHS=eig(lhs)
NormP=norm(P)
G
NormG = norm(G)
invBtPB=inv(B'*P*B)
BtP=B'*P
eigsP=eig(P)
eigsR1=eig(R1)
eigsS1=eig(S1)
eigsT1=eig(T1)
9
A Robust Reinforcement Learning System
Using Concept of Sliding Mode Control for
Unknown Nonlinear Dynamical System
Masanao Obayashi, Norihiro Nakahara, Katsumi Yamada,
Takashi Kuremoto, Kunikazu Kobayashi and Liangbing Feng
Yamaguchi University
Japan

and action space. RL-based solutions to the continuous-time optimal control problem have
been given in Doya (2000). The main advantage of using RL for solving optimal
control problems comes from the fact that a number of RL algorithms, e.g. Q-learning
(Watkins et al. (1992)) and actor-critic learning (Wang et al. (2002) and Obayashi et al.
(2008)), do not require knowledge or identification/learning of the system dynamics. On the
other hand, the remarkable characteristics of the SMC method are the simplicity of its design,
good robustness, and stability under deviations of the control conditions.
Recently, a few studies on robust reinforcement learning have appeared, e.g.,
Morimoto et al. (2005) and Wang et al. (2002), which are designed to be robust to external
disturbances by introducing the idea of $H_\infty$ control theory (Zhau et al. (1996)), and our
previous work (Obayashi et al. (2009)), which is robust to deviations of the system parameters by
introducing the idea of sliding mode control commonly used in model-based control.
However, applying reinforcement learning to a real system has a serious problem: many
trials are required for learning to design the control system.
Firstly, we introduce an actor-critic method, a kind of RL, and unite it with SMC. Through a
computer simulation of inverted pendulum control that does not use the inverted pendulum
dynamics, it is shown that the combined method learns in fewer trials than the actor-critic
method alone and has good robustness (Obayashi et al. (2009a)).
In applying the controller design, another problem exists: incomplete observation
of the state of the system. To solve this problem, several methods have been
suggested: observer theory (Luenberger (1984)), state variable filter
theory (Hang (1976), Obayashi et al. (2009b)), and both theories combined (Kung and Chen (2005)).
Secondly, we introduce a robust reinforcement learning system using the concept of SMC,
which uses a neural-network-type structure in an actor/critic configuration (see Fig. 1), for
the case where the system state is only partly available, by considering the state variable filter (Hang

Fig. 1. The construction of the actor-critic system. (Symbols in this figure are explained in
Section 2.)
The rest of this chapter is organized as follows. In Section 2, the conventional actor-critic
reinforcement learning system is described. In Section 3, the controlled system, the state variable
filter and sliding mode control are briefly explained. The proposed actor-critic reinforcement
learning system with a state variable filter using sliding mode control is described in Section
4. The proposed system and the conventional system are compared through
simulation experiments in Section 5. Finally, the conclusion is given in Section 6.
2. Actor-critic reinforcement learning system
Reinforcement learning (RL, Sutton and Barto (1998)) is learning from experience through
trial and error: a learning algorithm based on the calculation of reward and penalty
given through the interaction between the agent and the environment, as commonly
practiced by living things. The actor-critic method is one of the representative
reinforcement learning methods. We adopted it because of its flexibility in dealing with both
continuous and discrete state-action space environments. The structure of the actor-critic
reinforcement learning system is shown in Fig. 1. The actor plays the role of a controller and
the critic plays the role of an evaluator in the control field. Noise is used to search for
the optimal action.
2.1 Structure and learning of critic
2.1.1 Structure of critic
The function of the critic is to calculate $P(t)$, the predicted value of the sum of the discounted
rewards $r(t)$ that will be obtained in the future. Of course, if the value of

$V(t) = r(t) + \gamma V(t+1)$. (2)
Here the prediction value of $V(t)$ is defined as $P(t)$. The prediction error $\hat{r}(t)$ is expressed
as follows,
$\hat{r}(t) = r(t) + \gamma P(t+1) - P(t)$. (3)

$P(t) = \sum_{j=1}^{J} \omega_j^c \, y_j^c(t)$, (4)

$y_j^c(t) = \exp\left[ -\sum_{i=1}^{n} \left( x_i(t) - c_{ij}^c \right)^2 / \left( \sigma_{ij}^c \right)^2 \right]$. (5)
Here, $y_j^c(t)$: the $j$-th node's output of the middle layer of the critic at time $t$,

$\hat{r}(t)$ go to zero. The updating rule of the parameters is as follows,

$\Delta\omega_i^c = -\eta_c \cdot \dfrac{\partial \hat{r}_t^2}{\partial \omega_i^c}, \quad (i = 1, \ldots, J)$. (6)
Here $\eta_c$ is a small positive learning coefficient.
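The critic described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prediction $P(t)$ is taken as a weighted sum of Gaussian basis outputs, the prediction error as $r(t) + \gamma P(t+1) - P(t)$, and the gradient of the squared error is taken with $P(t+1)$ held fixed. The class name `RBFCritic`, the centers, the widths and the default value of the discount factor are hypothetical.

```python
import numpy as np


class RBFCritic:
    """Gaussian RBF critic: P(t) = sum_j w_j * y_j(x(t)) (assumed form)."""

    def __init__(self, centers, widths, eta_c=0.1, gamma=0.9):
        self.c = np.asarray(centers, dtype=float)     # shape (J, n), hypothetical centers
        self.sigma = np.asarray(widths, dtype=float)  # shape (J, n), hypothetical widths
        self.w = np.zeros(self.c.shape[0])            # output weights, one per basis node
        self.eta_c = eta_c                            # learning coefficient of the critic
        self.gamma = gamma                            # discount factor (value assumed)

    def basis(self, x):
        # y_j = exp(-sum_i (x_i - c_ij)^2 / sigma_ij^2)
        d = (np.asarray(x, dtype=float) - self.c) / self.sigma
        return np.exp(-np.sum(d ** 2, axis=1))

    def predict(self, x):
        return float(self.w @ self.basis(x))

    def update(self, x, r, x_next):
        # Prediction (temporal-difference) error.
        td = r + self.gamma * self.predict(x_next) - self.predict(x)
        # Gradient step on td^2 with P(t+1) treated as a constant (assumption):
        # d(td^2)/dw = -2 * td * y(x), so w moves by +2 * eta_c * td * y(x).
        self.w += self.eta_c * 2.0 * td * self.basis(x)
        return td
```

With a constant reward of 1 in a single repeated state, the prediction of this sketch approaches $1/(1-\gamma)$, which is a quick sanity check of the update direction.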
2.2 Structure and learning of actor
2.2.1 Structure of actor
Figure 3 shows the structure of the actor. The actor plays the role of the controller and outputs
the control signal, action $a(t)$, to the environment. The actor also basically consists of a radial
basis function network. The $j$-th basis function of the middle layer node of the actor is as

$u'(t) = \sum_{j=1}^{J} \omega_j^a \cdot y_j^a(t)$, (8)

$u_1(t) = u_{max} \cdot \dfrac{1 + \exp(-u'(t))}{1 - \exp(-u'(t))}$, (9)

$u(t) = u_1(t) + n(t)$. (10)
Here $y_j^a$: the $j$-th node's output of the middle layer of the actor,

$n(t) = n_t = noise_t \cdot \min\left(1, \exp(-P(t))\right)$, (11)
where $noise_t$ is a uniformly distributed random number in $[-1, 1]$ and $\min(\cdot)$ is the minimum of its
arguments. As $P(t)$ becomes bigger (which means that the action gets closer to the optimal action), the noise
becomes smaller. This leads to stable learning of the actor.
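The shrinking of the exploration noise as the critic's prediction grows can be sketched as below. This is only an illustrative sketch: the function names `noise_bound`, `exploration_noise` and `control_input` are hypothetical, and the bounded actor output `u1` is assumed to be computed elsewhere.

```python
import numpy as np


def noise_bound(p_t):
    """Largest possible |n(t)| for a given prediction P(t): min(1, exp(-P(t)))."""
    return min(1.0, float(np.exp(-p_t)))


def exploration_noise(p_t, rng):
    """n(t) = noise_t * min(1, exp(-P(t))), with noise_t uniform in [-1, 1]."""
    noise_t = rng.uniform(-1.0, 1.0)
    return noise_t * noise_bound(p_t)


def control_input(u1, p_t, rng):
    """u(t) = u1(t) + n(t): actor output plus exploration noise."""
    return u1 + exploration_noise(p_t, rng)
```

For example, with `rng = np.random.default_rng(0)`, every sample of `exploration_noise(2.0, rng)` stays within `exp(-2)` in magnitude, whereas for `P(t) <= 0` the noise can span the full interval $[-1, 1]$.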
2.2.3 Learning of parameters of actor
Parameters of the actor, $\omega_j^a\ (j = 1, \ldots, J)$,

is the learning coefficient. Equation (12) means that $(-n_t \cdot \hat{r}_t)$ is considered as an
error, and $\omega_j^a$ is adjusted opposite to the sign of $(-n_t \cdot \hat{r}_t)$. In other words, as a result of executing
$u(t)$, if for example the sign of the additive noise is positive and the sign of the prediction error is
positive, the positive additive noise was a success, so the value of $\omega_j^a$ should be
increased (see Eqs. (8)-(10)), and vice versa.
3. Controlled system, variable filter and sliding mode control
3.1 Controlled system
This paper deals with the following $n$th-order nonlinear differential equation.

functions.
Objective of the control system: to decide the control input $u$ which leads the states of the system
to their targets $x_d$. We define the error vector $\mathbf{e}$ as follows,

$\mathbf{e} = [e, \dot{e}, \ldots, e^{(n-1)}]^T = [x - x_d, \dot{x} - \dot{x}_d, \ldots, x^{(n-1)} - x_d^{(n-1)}]^T$. (15)
The estimate vector of $\mathbf{e}$, $\hat{\mathbf{e}}$, is available through the state variable filter (see Fig. 4).
3.2 State variable filter

[Fig. 4. State variable filter (block diagram).]

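$s = \boldsymbol{\alpha}^T \mathbf{e}$, (18)
where $\alpha_{n-1} = 1$, $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \ldots, \alpha_{n-1}]^T$, and
$\alpha_{n-1} p^{n-1} + \alpha_{n-2} p^{n-2} + \cdots + \alpha_0$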

smaller, i.e., the error vector $\mathbf{e}$ is closer to zero, the reward $r(t)$ is bigger.
4.2 Noise
Noise $n(t)$ is used to maintain the diversity of the search for the optimal input and to find the
optimal input. The larger the absolute value of the sliding variable $s$, the larger $n(t)$ is; the
smaller $|s|$, the smaller $n(t)$ is.

$n(t) = z \cdot \bar{n} \cdot \exp\left( -\beta \cdot \dfrac{1}{s^2} \right)$, (20)
where $z$ is a uniform random number in the range $[-1, 1]$, $\bar{n}$ is the upper limit of the perturbation
signal for searching the optimal input $u$, and $\beta$ is a predefined positive constant for adjustment.
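The relation between the sliding variable and the search noise can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the sliding variable is the inner product of the coefficient vector with the error vector, and that the noise amplitude is scaled by $\exp(-\beta / s^2)$. The function names and the values of `n_bar` and `beta` are hypothetical; the case $s = 0$ is handled by its limit, zero.

```python
import math

import numpy as np


def sliding_variable(alpha, e):
    """s = alpha^T e for coefficient vector alpha and error vector e."""
    return float(np.dot(alpha, e))


def noise_envelope(s, n_bar, beta):
    """Upper bound of |n(t)|: n_bar * exp(-beta / s^2), taken as 0 at s = 0."""
    if s == 0.0:
        return 0.0
    return n_bar * math.exp(-beta / (s * s))


def search_noise(s, n_bar, beta, rng):
    """n(t) = z * n_bar * exp(-beta / s^2), with z uniform in [-1, 1]."""
    z = rng.uniform(-1.0, 1.0)
    return z * noise_envelope(s, n_bar, beta)
```

For a second-order error vector with coefficients `[5.0, 1.0]` and error `[0.1, -0.2]`, `sliding_variable` returns approximately 0.3. The envelope grows with $|s|$ and never exceeds `n_bar`, so exploration fades as the state approaches the sliding surface.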
5. Computer simulation
5.1 Controlled object
To verify effectiveness of the proposed method, we carried out the control simulation using
an inverted pendulum with dynamics described by Eq. (21) (see Fig. 6).

$\mu_V$: coefficient of friction, 0.02
$T_q$: input torque, -
$\mathbf{X} = [\theta, \dot{\theta}]$: observation vector, -
Table 1. Parameters of the system used in the computer simulation.
5.2 Simulation procedure
The simulation algorithm is as follows.
Step 1. The initial control input $T_{q0}$ is given to the system through Eq. (21).
Step 2. Observe the state of the system. If the end condition is satisfied, then one trial ends;
otherwise, go to Step 3.
Step 3. Calculate the error vector $\mathbf{e}$, Eq. (15). If only $y\ (= x)$, i.e.,
$(\theta, \dot{\theta}) = (\pi/18\ [\mathrm{rad}],\ 0\ [\mathrm{rad/sec}])$ and continues the
system control for 20 [sec], and the sampling time is 0.02 [sec]. The trial ends if
$|\theta| \ge \pi/4$ or the controlling time exceeds 20 [sec]. We set an upper limit for the output
$u_1$ of the actor. Trial success means that $\theta$ is in the range $[-\pi/360, \pi/360]$ for the
last 10 [sec]. The numbers of nodes of the hidden layers of the critic and the actor are set to 15
by trial and error (see Figs. 2 and 3). The
parameters used in this simulation are shown in Table 2.
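The trial end and success conditions stated above can be written as a small Python sketch. The thresholds (20 [sec] horizon, 0.02 [sec] sampling time, $\pi/4$ failure angle, $\pi/360$ success band over the last 10 [sec]) are taken from the text; the use of the absolute value of the angle and the function names are assumptions made for illustration.

```python
import math

DT = 0.02                   # sampling time [sec]
T_END = 20.0                # control horizon of one trial [sec]
THETA_FAIL = math.pi / 4    # trial ends when |theta| reaches this angle [rad] (abs assumed)
THETA_OK = math.pi / 360    # success band for theta [rad]
SUCCESS_WINDOW = 10.0       # theta must stay in the band for the last 10 [sec]


def trial_ended(theta, t):
    """True if the pendulum has fallen too far or the time horizon is reached."""
    return abs(theta) >= THETA_FAIL or t >= T_END


def trial_succeeded(thetas):
    """True if a full-length trial kept theta in the success band for the last 10 [sec]."""
    n_steps = int(round(T_END / DT))
    n_last = int(round(SUCCESS_WINDOW / DT))
    if len(thetas) < n_steps:
        return False  # the trial was cut short, so it cannot count as a success
    return all(abs(th) <= THETA_OK for th in thetas[-n_last:])
```

A trajectory that wanders for the first 10 [sec] and then stays at zero counts as a success, while a single sample outside the band in the last 10 [sec] does not.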

$\alpha_0$: sliding variable parameter in Eq. (18): 5.0
$\eta_c$: learning coefficient of the critic in Eqs. (6)-(A6): 0.1

[Figure: (a) angular position $\theta$ [rad] and angular velocity $\dot{\theta}$ [rad/sec] versus time [sec]; (b) torque $T_q$ [N] (control signal) versus time [sec].]
Fig. 7. Result of the proposed method in the case of complete observation ($\theta$, $\dot{\theta}$).

$\hat{e}_0 = \dfrac{\omega_2}{p^2 + \omega_1 p + \omega_0}\, e$ (22)

$\hat{e}_1 = \dfrac{\omega_2\, p}{p^2 + \omega_1 p + \omega_0}\, e$ (23)

$\theta$ is available).
c. The case of incomplete observation using the difference method
Instead of the state variable filter in 5.4.1 B, to estimate the angular velocity, we adopt the
commonly used difference method,
$\hat{\dot{\theta}}_t = \theta_t - \theta_{t-1}$. (24)
We construct the sliding variable $s$ in Eq. (18) by using $\theta$, $\hat{\dot{\theta}}$. The results of the simulation of
the proposed method are shown in Fig. 10.
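The two ways of estimating the unobserved angular velocity can be compared with a short Python sketch. It is a sketch under stated assumptions: the state variable filter is assumed to have the second-order form $\hat{e}_0 = \omega_2 / (p^2 + \omega_1 p + \omega_0)\, e$ with $\hat{e}_1 = p\, \hat{e}_0$, it is discretized here with a simple forward-Euler step (a choice made for illustration), and the difference method is the plain backward difference of consecutive angle samples. The function names and all filter parameter values used below are hypothetical.

```python
def state_variable_filter(e_samples, dt, w0, w1, w2):
    """Forward-Euler simulation of the assumed second-order state variable filter.

    Returns a list of (e0_hat, e1_hat) pairs, where e1_hat is the time
    derivative of e0_hat.
    """
    e0_hat, e1_hat = 0.0, 0.0
    estimates = []
    for e in e_samples:
        d_e0 = e1_hat
        d_e1 = -w0 * e0_hat - w1 * e1_hat + w2 * e
        e0_hat += dt * d_e0
        e1_hat += dt * d_e1
        estimates.append((e0_hat, e1_hat))
    return estimates


def difference_estimates(theta_samples):
    """Backward differences theta_t - theta_(t-1) of consecutive samples."""
    return [theta_samples[k] - theta_samples[k - 1]
            for k in range(1, len(theta_samples))]
```

With `w2 == w0` the filter has unit steady-state gain, so a constant input of 1 drives `e0_hat` to 1 and `e1_hat` to 0, which is a convenient check of the discretization.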
[Figure: (a) $\theta$, $\dot{\theta}$; (b) torque $T_q$.]
Fig. 10. Result of the proposed method using the difference method in the case of incomplete
observation (only $\theta$ is available).
5.4.2 Results of the conventional method.
d. Sliding mode control method
The control input is given as follows,

[Equation: two-case switching control law selected by the sign of the sliding variable ($> 0$ or $\le 0$), with $U_{max} = 20.0$ [N].]

[Figure: (a) $\theta$, $\dot{\theta}$; (b) torque $T_q$ [N] (control signal) versus time [sec].]
Fig. 11. Result of the conventional (SMC) method in the case of complete observation ($\theta$, $\dot{\theta}$).
e. Conventional actor-critic method
The structure of the actor of the conventional actor-critic control method is shown in Fig. 12.
The details of the conventional actor-critic method are explained in the Appendix. Results of the
simulation are shown in Fig. 13.
Fig. 13. Result of the conventional (actor-critic) method in the case of complete observation
($\theta$, $\dot{\theta}$).

[Figure: angular position [rad] and angular velocity [rad/sec] versus time [sec]; torque [N] (control signal) versus time [sec].]
, (26)
here, $K_p = 45$, $K_I = 1$, $K_d = 10$. Fig. 14 shows the results of the PID control.
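For readers who want to reproduce a PID baseline with these gains, a textbook discrete-time PID controller can be sketched as follows. This is an assumption-laden sketch, not the controller of Eq. (26): the rectangular integration, the backward-difference derivative, the zero derivative on the first sample, the sign convention of the error, and the class name `PID` are all illustrative choices.

```python
class PID:
    """Textbook discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt (assumed form)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.dt = dt
        self.integral = 0.0
        self.prev_error = None  # no derivative information before the first sample

    def step(self, error):
        # Rectangular (Euler) integration of the error.
        self.integral += error * self.dt
        # Backward-difference derivative; zero on the very first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A controller for the sampling time used in the simulation would be created as `PID(45.0, 1.0, 10.0, dt=0.02)` and called once per sample with the current angle error.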
5.4.3 Discussion
Table 3 shows the control performance, i.e. the average error of $\theta$, $\dot{\theta}$ over the controlling
time after the final learning, for all the methods that were simulated. Comparing
the proposed method with the conventional actor-critic method, the proposed method is
better than the conventional one. This means that the performance of the conventional
actor-critic method has been improved by making use of the concept of sliding mode control.

Proposed method: Actor-Critic + SMC. Conventional methods: SMC, PID, Actor-Critic.
Incomplete observation ($\theta$: available)

[Figure: angle [rad] versus time [sec] for the proposed method (incomplete state observation using state-filter RL+SMC), actor-critic RL, and PID.]
Fig. 15. Comparison of the proposed method with incomplete observation, the conventional
actor-critic method and the PID method for the angle, $\theta$.
Figure 15 shows the comparison of the proposed method with incomplete observation, the
conventional actor-critic method and the PID method for the angle, $\theta$. In this figure, the
proposed method and the PID method converge to zero smoothly, whereas the conventional
actor-critic method does not converge. Comparing the proposed method with PID
control, the latter converges more quickly. These results correspond to Fig. 16, i.e.
the torque of the PID method converges first, that of the proposed method next, and that of the
conventional one does not converge.

the case with the state variable filter, and with the difference method for the angle, $\theta$.
Fig. 17 shows the comparison of the proposed method among the case of complete
observation, the case with the state variable filter, and the case with the difference method for the
angle, $\theta$. Among them, the incomplete state observation with the difference method is the best
of the three, and in particular is better than the complete observation. The reason can be explained by
Fig. 18: the value of $s$ in the case of the difference method is bigger than that obtained when the
angular velocity is observed, which makes the input gain bigger and accelerates the
convergence.

good performance and robustness compared with the conventional actor-critic method,
because it makes use of the ability of the SMC method.
