Bài giảng Tối ưu hóa nâng cao: Chương 6 - Hoàng Nam Dũng - Pdf 62

Subgradients

Hoàng Nam Dũng
Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội

Last time: gradient descent
Consider the problem
min f (x)
x

for f convex and differentiable, dom(f ) = Rn .
Gradient descent: choose initial x (0) ∈ Rn , repeat
x (k) = x (k−1) − tk · ∇f (x (k−1) ),

k = 1, 2, 3, . . .

Step sizes tk chosen to be fixed and small, or by backtracking line
search
If ∇f Lipschitz, gradient descent has convergence rate O(1/ε)
Downsides:
Requires f differentiable
Can be slow to converge
1

Outline

Today:
Subgradients
Examples

−1

first-order
approximation
f at xlower
is abound
global lower
• the The
first-order
approximation
of f at x isof
a global

bound.
∇f defines a non-vertical supporting hyperplane to epi(f ) at
T
∇f (x)
y
x
(x, f (x))
−
≤ 0 ∀(y, t) ∈ epi f

• ∇f (x) defines a non-vertical supporting hyperplane to epi f at (x, f (x)):

Subgradients

∇f

−1

4

Subgradients
A subgradient of a convex function f at x is any g ∈ Rn such that
f (y ) ≥ f (x) + g T (y − x), ∀y ∈ dom(f ).

Always exists (on the relative
interior of dom(f ))
Subgradient
If f differentiable at x, then g = ∇f (x) uniquely
g is a subgradient of a convex function f at x ∈ dom f if
Same
definition works for nonconvex f (however, subgradients
need not exist).f (y) ≥ f (x) + gT (y − x) ∀y ∈ dom f
f (y)
f (x1) + g1T (y − x1)
f (x1) + g2T (y − x1)

f (x2) + g3T (y − x2)
x1

x2

g1 and g2 are subgradients at x1 , g3 is subgradient at x2 .
g1, g2 are subgradients at x1; g3 is a subgradient at x2

4

2

x

• For x = 0, unique subgradient g = sign(x)

For x = 0, unique subgradient g = sign(x)
For x = 0, subgradient g is any element of [−1, 1].

• For x = 0, subgradient g is any element of [−1, 1]

5
5

Examples of subgradients
Considerf f: :RRnn→
→R,
R,ff (x)
(x) =
= xx
Consider

22

f(x)

x2

x1

x1

For x = 0, unique ith component g = sign(x )
For xi = 0, ith component gi is any element of [−1, 1].
• For xi = 0, ith component gi is any element of [−1, 1]
• For xi i = 0, unique ith component gi i = sign(xii)

7

Examples of subgradients

0

5

f(x)

10

15

n
Considerf f(x)
(x)==max{f
max{f11(x),
(x), ff22(x)}, for
Consider
for ff11,,ff2 2: R

= ∇f
∇f22(x)
(x)
• For

Forf1f1(x)
(x)==ff22(x),
(x), subgradient
subgradient gg isis any
any point
point on
on line
line segment
segment
• For
between∇f
∇f11(x)
(x)and
and∇f
∇f22(x)
(x).
between

8

Subdifferential

Set of all subgradients of convex f is called the subdifferential:
∂f (x) = {g ∈ Rn : g is a subgradient of f at x}.

Monotonicity
Theorem
The subdifferential of a convex function f is a monotone operator
(u − v )T (x − y ) ≥ 0, ∀u ∈ ∂f (x), v ∈ ∂f (y ).
Chứng minh.
By definition we have
f (y ) ≥ f (x) + u T (y − x) and f (x) ≥ f (y ) + v T (x − y ).
Combining them get shows monotonicity.
Question: Monotonicity for differentiable convex function?

10

Monotonicity
Theorem
The subdifferential of a convex function f is a monotone operator
(u − v )T (x − y ) ≥ 0, ∀u ∈ ∂f (x), v ∈ ∂f (y ).
Chứng minh.
By definition we have
f (y ) ≥ f (x) + u T (y − x) and f (x) ≥ f (y ) + v T (x − y ).
Combining them get shows monotonicity.
Question: Monotonicity for differentiable convex function?
(∇f (x) − ∇f (y ))T (x − y ) ≥ 0,
which follows directly from the first order characterization of
convex functions.
10

Examples of non-subdifferentiable functions

IC (y ) ≥ IC (x) + g T (y − x) for all y .
For y ∈ C , IC (y ) = ∞

For y ∈ C , this means 0 ≥ g T (y − x).
12

●
●

●

●

13 11

Subgradient calculus

Basic rules for convex functions:
Scaling: ∂(af ) = a · ∂f provided a > 0.
Addition: ∂(f1 + f2 ) = ∂f1 + ∂f2 .

Affine composition: if g (x) = f (Ax + b), then
∂g (x) = AT ∂f (Ax + b).
Finite pointwise maximum: if f (x) = maxi=1,...m fi (x), then
∂f (x) = conv

∂fi (x)
i:fi (x)=f (x)

= max z T x.
z

q ≤1

And
∂f (x) = argmax

z

q ≤1

z T x.

15

Why subgradients?

Subgradients are important for two reasons:
Convex analysis: optimality characterization via subgradients,
monotonicity, relationship to duality.
Convex optimization: if you can compute subgradients, then
you can minimize any convex function.

16

Optimality condition

min f (x) subject to x ∈ C
x

is solved at x if and only if
∇f (x)T (y − x) ≥ 0 for all y ∈ C .a
a

Direct proof see, e.g., />Teaching/ORF523/S16/ORF523_S16_Lec7_gh.pdf. Proof using subgradient
next slide.

Intuitively: says that gradient increases as we move away from x.
Note that for C = Rn (unconstrained case) it reduces to ∇f = 0.

18

Derivation of first-order optimality
Chứng minh.
First recast problem as
min f (x) + IC (x).
x

Now apply subgradient optimality: 0 ∈ ∂(f (x) + IC (x)).
Observe
0 ∈ ∂(f (x) + IC (x)) ⇔ 0 ∈ {∇f (x)} + NC (x)
⇔ −∇f (x) ∈ NC (x)

⇔ −∇f (x)T x ≥ −∇f (x)T y for all y ∈ C
⇔ ∇f (x)T (y − x) ≥ 0 for all y ∈ C
as desired.

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Bài giảng Tối ưu hóa nâng cao: Chương 6 - Hoàng Nam Dũng - Pdf 62

Tài liệu, ebook tham khảo khác

Học thêm