Introduction to the Theory of Nonhnear Optimization
Johannes Jahn
Introduction to the Theory of NonHnear Optimization Third Edition
With 31 Figures
Sprin ger
Prof. Dr. Johannes Jahn Universitat ErlangenNiirnberg Institut fur Angewandte Mathematik Martensstr. 3 91058 Erlangen Germany
[email protected] Library of Congress Control Number: 2006938674
ISBN 9783540493785 Springer Berlin Heidelberg New York ISBN 9783540614074
Second Edition Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is part of Springer Science+Business Media springer.com © SpringerVerlag Berlin Heidelberg 1994,1996,2007 The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Production: LETgX Jelonek, Schmidt & Vockler GbR, Leipzig Coverdesign: Erich Kirchner, Heidelberg SPIN 11932048
42/3100YL  5 4 3 2 1 0
Printed on acidfree paper
To Claudia and Martin
Preface This book presents an applicationoriented introduction to the theory of nonhnear optimization. It describes basic notions and conceptions of optimization in the setting of normed or even Banach spaces. Various theorems are appHed to problems in related mathematical areas. For instance, the EulerLagrange equation in the calculus of variations, the generahzed Kolmogorov condition and the alternation theorem in approximation theory as well as the Pontryagin maximum principle in optimal control theory are derived from general results of optimization. Because of the introductory character of this text it is not intended to give a complete description of all approaches in optimization. For instance, investigations on conjugate duality, sensitivity, stability, recession cones and other concepts are not included in the book. The bibliography gives a survey of books in the area of nonlinear optimization and related areas like approximation theory and optimal control theory. Important papers are cited as footnotes in the text. This third edition is an enlarged and revised version containing an additional chapter on extended semidefinite optimization and an updated bibliography. I am grateful to S. GeuB, S. Gmeiner, S. Keck, Prof. Dr. E.W. Sachs and H. Winkler for their support, and I am especially indebted to D.G. Cunningham, Dr. G. Eichfelder, Dr. F. Hettlich, Dr. J. Klose, Prof. Dr. E.W. Sachs, Dr. T. Staib and Dr. M. Stingl for fruitful discussions. Erlangen, September 2006
Johannes Jahn
Contents Preface 1 Introduction and Problem Formulation
vii 1
2 Existence Theorems for Minimal Points 2.1 Problem Formulation 2.2 Existence Theorems 2.3 Set of Minimal Points 2.4 Application to Approximation Problems 2.5 Application to Optimal Control Problems Exercises
7 7 8 18 19 23 29
3 Generalized Derivatives 3.1 Directional Derivative 3.2 Gateaux and Frechet Derivatives 3.3 Subdifferential 3.4 Quasidifferential 3.5 Clarke Derivative Exercises
31 31 37 49 57 67 75
4 Tangent Cones 4.1 Definition and Properties 4.2 Optimality Conditions 4.3 A Lyusternik Theorem Exercises
79 79 88 95 103
5 Generalized Lagrange Multiplier Rule 5.1 Problem Formulation
105 105
X
Contents
5.2 Necessary Optimality Conditions 5.3 Sufficient Optimality Conditions 5.4 Application to Optimal Control Problems Exercises
108 126 136 156
6 Duality 6.1 Problem Formulation 6.2 Duality Theorems 6.3 Saddle Point Theorems 6.4 Linear Problems 6.5 Application to Approximation Problems Exercises
159 159 164 168 172 175 184
7 Application to Extended Semidefinite Optimization 7.1 Lowner Ordering Cone and Extensions 7.2 Optimality Conditions 7.3 Duality Exercises
187 187 202 207 210
8
Direct Treatment of Special Optimization Problems 213 8.1 Linear Quadratic Optimal Control Problems 213 8.2 Time Minimal Control Problems 221 Exercises 238
A Weak Convergence
241
B Reflexivity of Banach Spaces
243
C HahnBanach Theorem
245
D Partially Ordered Linear Spaces
249
Bibliography
253
Answers to the Exercises
275
Index
289
Chapter 1 Introduction and Problem Formulation In optimization one investigates problems of the determination of a minimal point of a functional on a nonempty subset of a real linear space. To be more specific this means: Let X be a real linear space, let S' be a nonempty subset of X, and let / : iS —> R be a given functional. We ask for the minimal points of / on S. An element X E S is called a minimal point offonS if f{x) < f{x) for all
xeS.
The set S is also called constraint set^ and the functional / is called objective functional In order to introduce optimization we present various typical optimization problems from Applied Mathematics. First we discuss a design problem from structural engineering. Example 1.1. As a simple example consider the design of a beam with a rectangular crosssection and a given length I (see Fig. 1.1 and 1.2). The height xi and the width X2 have to be determined. The design variables Xi and X2 have to be chosen in an area which makes sense in practice. A certain stress condition must be satisfied, i.e. the arising stresses cannot exceed a feasible stress. This leads to the inequality 2000 < x\x2. (1.1)
Chapter 1. Introduction and Problem Formulation
"A
Xx X2
Figure 1.1: Longitudinal section.
Figure 1.2: Crosssection.
Moreover, a certain stability of the beam must be guaranteed. In order to avoid a beam which is too slim we require Xi < 4X2
(1.2)
X2 < Xi.
(1.3)
and Finally, the design variables should be nonnegative which means xi > 0
(1.4)
X2>0.
(1.5)
and Among all feasible values for xi and X2 we are interested in those which lead to a light construction. Instead of the weight we can also take the volume of the beam given as lxiX2 as a possible criterion (where we assume that the material is homogeneous). Consequently, we minimize lxiX2 subject to the constraints (1.1), ... ,(1.5). With the next example we present a simple optimization problem from the calculus of variations. Example 1.2. In the calculus of variations one investigates, for instance, problems of minimizing a functional / given as
f{x)=
fl{x{t),x{t),t)dt
Chapter 1. Introduction and Problem Formulation
3
where — o o < a < 6 < o o and / is argumentwise continuous and continuously differentiable with respect to x and x. A simple problem of the calculus of variations is the following: Minimize / subject to the class of curves from S := {x e C^[a^b] \ x{a) = Xi and x{b) — X2} where Xi and X2 are fixed endpoints. In control theory there are also many problems which can be formulated as optimization problems. A simple problem of this type is given in the following example. Example 1.3. On the fixed time interval [0,1] we investigate the linear system of differential equations
with the initial condition ^i(O) \
/ 2x/2 \
0:2(0) J
{ 5V2 J
With the aid of an appropriate control function u G C[0,1] this dynamical system should be steered from the given initial state to a terminal state in the set M := {(xi, X2) eR'^\xl
+ xl = 1}.
In addition to this constraint a control function u minimizing the cost functional 1
f{u) = j{u{t)f 0
has to be determined. Finally we discuss a simple problem from approximation theory.
Chapter 1. Introduction and Problem Formulation
(3 = sinh a j3 = xa
X ^ 1.600233
0
1
2
^
Figure 1.3: Best approximation of sinh on [0,2].
Example 1.4. We consider the problem of the determination of a hnear function which approximates the hyperbohc sine function on the interval [0,2] with respect to the maximum norm in a best way (see Fig. 1.3). So, we minimize max ax
sinh a I
aG[0,2]
This optimization problem can also be written as min A subject to the constraints A = max lax — sinh a\ aG[0,2]
(x,A)
eR\
The preceding problem is equivalent to the following optimization problem which has infinitely many constraints: min A subject to the constraints ax — sinh a < A for all a G [0, 2] ax — sinh a > —A (x,A) G R ^
Chapter 1. Introduction and Problem Formulation
5
In the following chapters the examples presented above will be investigated again. The solvability of the design problem (in Example 1.1) is discussed in Example 5.10 where the KarushKuhnTucker conditions are used as necessary optimality conditions. Theorem 3.21 presents a necessary optimality condition known as EulerLagrange equation for a minimal solution of the problem in Example 1.2. The Pontryagin maximum principle is the essential tool for the solution of the optimal control problem formulated in Example 1.3; an optimal control is determined in the Examples 5.21 and 5.23. An application of the alternation theorem leads to a solution of the linear Chebyshev approximation problem (given in Example 1.4) which is obtained in Example 6.17. We complete this introduction with a short compendium of the structure of this textbook. Of course, the question of the solvability of a concrete nonlinear optimization problem is of primary interest and, therefore, existence theorems are presented in Chapter 2. Subsequently the question about characterizations of minimal points runs like a red thread through this book. For the formulation of such characterizations one has to approximate the objective functional (for that reason we discuss various concepts of a derivative in Chapter 3) and the constraint set (this is done with tangent cones in Chapter 4). Both approximations combined result in the optimality conditions of Chapter 5. The duality theory in Chapter 6 is closely related to optimality conditions as well; minimal points are characterized by another optimization problem being dual to the original problem. An apphcation of optimality conditions and duahty theory to semidefinite optimization being a topical field of research in optimization, is described in Chapter 7. The results in the last chapter show that solutions or characterizations of solutions of special optimization problems with a rich mathematical structure can be derived sometimes in a direct way. It is interesting to note that the HahnBanach theorem (often in the version of a separation theorem like the Eidelheit separation theorem) proves itself to be the key for central characterization theorems.
Chapter 2 Existence Theorems for Minimal Points In this chapter we investigate a general optimization problem in a real normed space. For such a problem we present assumptions under which at least one minimal point exists. Moreover, we formulate simple statements on the set of minimal points. Finally the existence theorems obtained are applied to approximation and optimal control problems.
2.1
Problem Formulation
The standard assumption of this chapter reads as follows: Let (X, II • II) be a real normed space; "j let 5 be a nonempty subset of X; > and let / : iS —> R be a given functional. J
(2.1)
Under this assumption we investigate the optimization problem
fin fix),
(2.2)
i.e., we are looking for minimal points of / on S, In general one does not know if the problem (2.2) makes sense because / does not need to have a minimal point on S. For instance, ioT X = S = R and f{x) = e^ the optimization problem (2.2) is not
8
Chapter 2. Existence Theorems for Minimal Points
solvable. In the next section we present conditions concerning / and S which ensure the solvability of the problem (2.2).
2.2
Existence Theorems
A known existence theorem is the WeierstraB theorem which says that every continuous function attains its minimum on a compact set. This statement is modified in such a way that useful existence theorems can be obtained for the general optimization problem (2.2). Definition 2.1. Let the assumption (2.1) be satisfied. The functional / is called weakly lower semicontinuous if for every sequence (^n)nGN 1^ S couvcrgiug wcakly to some x G S' we have: liminf/(a:^) > f{x) n—^oo
(see Appendix A for the definition of the weak convergence).
Example 2.2. The functional / : R ^ R with
,. . _ r O i f x  0 ^
^
1
\ 1 otherwise J
is weakly lower semicontinuous (but not continuous at 0). Now we present the announced modification of the WeierstraB theorem. Theorem 2.3. Let the assumption (2.1) he satisfied. If the set S is weakly sequentially compact and the functional f is weakly lower semicontinuous^ then there is at least one x E S with f{x) < f{x)
for all
xeS,
i.e., the optimization problem (2.2) has at least one solution.
2.2. Existence Theorems
Proof. Let {xn)neN be a socalled infimal sequence in S', i.e., a sequence with limf{xn) n—>oo
= inf/(x). xES
Since the set S is weakly sequentially compact, there is a subsequence (^nJiGN converging weakly to some x E S. Because of the weak lower semicontinuity of / it follows f{x) < liminf/(xnj = inf/(:^), and the theorem is proved.
D
Now we proceed to specialize the statement of Theorem 2.3 in order to get a version which is useful for apphcations. Using the concept of the epigraph we characterize weakly lower semicontinuous functionals. Definition 2.4. Let the assumption (2.1) be satisfied. The set E{f) := {{x,a) eSxR\
f{x) < a}
is called epigraph of the functional / (see Fig. 2.1). a /N /
X
Figure 2.1: Epigraph of a functional.
10
Chapter 2. Existence Theorems for Minimal Points
Theorem 2.5. Let the assumption (2.1) he satisfied, and let the set S he weakly sequentially closed. Then it follows: f is weakly lower semicontinuous R with f{x) == \\x — x\\ for all x G X.
2.4. Application to Approximation Problems
21
The functional / is continuous because for arbitrary x^y E. X we have \Mf{y)\
=
fix) > fix). Consequently x E S is a> minimal point of f on S. The following theorem shows that, in general, the reflexivity of the Banach space plays an important role for the solvability of approximation problems. But notice also that under strong assumptions concerning the set S an approximation problem may be solvable in nonreflexive spaces.
•
22
Chapter 2. Existence Theorems for Minimal Points
Theorem 2.19. A real Banach space is reflexive if and only if every nonempty convex closed subset is proximinal. Proof. One direction of the assertion is already proved in the existence theorem 2.18. Therefore we assume now that the considered real Banach space is not reflexive. Then the closed unit ball 5 ( 0 x , l ) := {x e X I \\x\\ < 1} is not weakly sequentially compact and by a James theorem (Thm. B.2) there is a continuous linear functional I which does not attain its supremum on the set S(Ox, 1), i.e., l{x)
sup
l{y)},
yeB{Ox,i)
then one obtains S n B{Ox, 1) = 0 Consequently the set S is not proximinal. • Now we turn our attention to a special problem, namely to a problem of uniform approximation of functions (problem of Chebyshev approximation). Let M be a compact metric space and let C{M) be the real linear space of continuous realvalued functions on M equipped with the maximum norm  •  where \\x\\ •= max \x{t)\ for all x G CiM). II
II
^ ^ ^
I
V /I
\
/
Moreover let 5 be a nonempty subset of C{M)^ and let x G C{M) be a given function. We are looking for a function x E S with
11^ — ^11 ^ 11^ — ^11 for all X E: S (see Fig. 2.8). Since X = C{M) is not reflexive, Theorem 2.18 may not be applied directly to this special approximation problem. But the following result is true. Theorem 2.20. If S is a nonempty convex closed subset of the normed space C{M) such that for any x E S the linear subspace spanned by S — {x} is reflexive, then the set S is proximinal
2.5. Application to Optimal Control Problems
23
/N
\x — x\\ = max \x{t) — x{t)\ I
II
^^^
I
V /
\
n
M = [a, b]
Figure 2.8: Chebyshev approximation.
Proof. For x E S we have inf \\x — x\\ = inf (X x) — {x ~ x)\ xes xes = inf \\x {x — x) xes{x} If V denotes the linear subspace spanned by £ — x and S — {£}, then V is reflexive and Theorem 2.18 can be appHed to the reflexive real Banach space V. Consequently the set S is proximinal. • In general, the linear subspace spanned by S — {x} is finite dimensional and therefore reflexive, because S is very often a set of linear combinations of finitely many functions of C{M) (for instance, monoms, i.e. functions of the form x{t) = l , t , t ^ , . . . ,t^ with a fixed n E N). In this case a problem of Chebyshev approximation has at least one solution.
2.5
Application to Optimal Control Problems
In this section we apply the existence result of Theorem 2.12 to problems of optimal control. First we present a problem which does not
24
Chapter 2. Existence Theorems for Minimal Points
have a minimal solution. Example 2.21. ferential equation
We consider a dynamical system with the dif
x{t) = —uitY almost everywhere on [0,1],
(2.5)
the initial condition :r(0)  1
(2.6)
x{l) = 0.
(2.7)
and the terminal condition
Let the control ?i be a L2function, i.e. u G I/2[0,1]. A solution of the differential equation (2.5) is defined as x{t) =c
u{sfds
for all t G [0,1]
0
with c G R. In view of the initial condition we get t
x{t) = 1  ju{sfds
for all t G [0,1].
0
Then the terminal condition (2.7) is equivalent to
1  f u{sfds = 0. 0 1
Question: Is there an optimal control minimizing Jt'^u{tydt 0
For X = 1/2 [0,1] we define the constraint set 1
5 : = nGL2[0,l]
fuisfds^ll
?
2.5. Application to Optimal Control Problems
25
{S is exactly the unit sphere in L2[0,1]). The objective functional / : S' —> R is given by 1
f{u)= ftMtYdt
for all u e 5.
0
One can see immediately that
0<mif{u). ues Next we define a sequence of feasible controls {un)neN by /j,\ _ j ^ almost everywhere on [0, ^ ) 1 '^^ ^ \ 0 almost everywhere on [^, 1] J ' Then we get for every n G N 1
1
^
INni2[o,i] = / \un{t)\'^dt = / n^dt = 1. 0
0
Hence we have Un e S for all n G N (every Un is an element of the unit sphere in I/2[0,1]). Moreover we conclude for all n G N
ft^nHt ^ "it'
f{Un)= ft\n{tfdt=:
0
3n4
and therefore we get lim/(w„) = 0 = inf/(ti). n—>oo
UES
If we assume that / attains its infimal value 0 on 5', then there is a control u E S with f{u) = 0, i.e.
0
>0
26
Chapter 2. Existence Theorems for Minimal Points
But then we get u{t) = 0 almost everywhere on [0,1] and especially u ^ S. Consequently / does not attain its infimum on S. In the following we consider a special optimal control problem with a system of linear differential equations. Problem 2.22. Let A and B be given (n, n) and (n, m) matrices with real coefficients, respectively, and let the system of differential equations be given as x{t) = Ax{t) + Bu{t) almost everywhere on [to,ti]
(2.8)
with the initial condition x(to) = xo E M^
(2.9)
where — oo < to < ^i < oo . Let the control i/ be a 1/2^[to, ^i] function. A solution X of the system (2.8) of differential equations with the initial condition (2.9) is defined as t
x{t) =xo+
f e^^^'^Bu{s) ds for all t G [to, h]. to
The exponential function occurring in the above expression is the matrix exponential function, and the integral has to be understood in a componentwise sense. Let the constraint set S C 1/2^[to, h] be given as S := {u e L^[to,ti] I \\u{t)\\ < 1 almost everywhere on [to,ti]} (II • II) denotes the I2 norm on BJ^). The objective functional / : 5 —> R is defined by h
f{u)
= j{g{x{t)) + h{u{t))) dt to h

t
/ (^(^0+ I e^^''^Bu{s)ds\+h{u{t))\dt to
to
for ^WueS
2.5. Application to Optimal Control Problems
27
where 5^ : R'^ —> R and h : R^ —> R are real valued functions. Then we are looking for minimal points of f on S. T h e o r e m 2.23. Let the problem 2.22 be given. Let the functions g and h be convex and continuous, and let h be Lipschitz continuous on the closed unit ball. Then f has at least one minimal point on S. Proof. First notice that X := L^[to,ii] is a reflexive Banach space. Since S is the closed unit ball in L2^[to,ti], the set S is closed, bounded and convex. Next we show the quasiconvexity of the objective functional / . For that purpose we define the linear mapping L : 5 —> A(7^[to, ii] (let AC^[to^ ti] denote the real linear space of absolutely continuous n vector functions equipped with the maximum norm) with ti
L{u){t) = I e^^^'^Bu{s)ds
for ^WueS
and all t e [to.ti].
to
If we choose arbitrary Ui,U2 E S and A G [0,1], we get g{xo + L{Xui + {lX)u2){t)) = g{xo + XL{m){t) + {lX)L{u2m) = g{X[xo + L{m){t)] + {l X)[xo + L{u2m]) < Xg{xo + L{ui){t)) + {l~X)g{xo + L{u2){t)) for alH G [to,ii]. Consequently the functional g{xo + L{)) is convex. For every a G R the set Sa:={ueS \ f{u) < a} is then convex. Because for arbitrary Ui^U2 G Sa and A G [0,1] one obtains /(Aui + (1  X)u2) =
f[g{xo + L{Xui +
{lX)u2){t))
to
+h{Xui{t) + {1  X)u2{t))]dt
28
Chapter 2. Existence Theorems for Minimal Points ti
0 for all
Chapter 3. Generalized Derivatives
32
X+h Figure 3.1: A directionally differentiable function. n G N, with the additional property that x + A^/i belongs to the domain S for all n G N. This restriction of the sequences converging to 0 can be dropped, for instance, if S equals the whole space X. Example 3.2. For the function / : R^ _, ]R ^ith f{x^^x,)
= S^f^^
+ ^^^ f f x ^ i o }
fo^^ll(^i,^2)GM^
which is not continuous at 0^2, we obtain the directional derivative
/'(OR^)(/M, h^)
=A>o+ limA\f{X{h„ h)) =\ ( 0I
'11'^^
it h2 = 0
in the direction (/ii,/i2) G M^. Notice that f{0^2) is neither continuous nor linear. As a first result on directional derivatives we show that every convex functional is directionally differentiable. For the proof we need the following lemma.
3.1. Directional Derivative
33
Lemma 3.3. Let X be a real linear space^ and let f : X ^ W be a convex functional. Then for arbitrary x^h E: X the function (^ : R+ \ {0} > R with (^(A) = i ( / ( x + Xh)  f{x))
for
allX>0
A
is monotonically increasing (i.e., 0 < s R is called suhlinear^ if (a)
f{ax) = af{x)
for all x G X and all a > 0
(b) f{x + y) < f{x) + f{y)
for all x,y e X
(positive homogenity), (subadditivity).
Now we show that the directional derivative of a convex functional is subhnear with respect to the direction. T h e o r e m 3.6. Let X be a real linear space ^ and let f : X —^R be a convex functional. Then for every x G X the directional derivative f\x)(') is a sublinear functional. Proof. With Theorem 3.4 the directional derivative f\x){') exists. First we notice that f{x){Ox) = 0. For arbitrary h E X and a > 0 we obtain f'ix)iah)
= lim hfix A—>0+ A
+ Xah)  fix)) = a/'(x)(/i).
3.1. Directional Derivative
35
Consequently /'(x)() is positively homogeneous. For the proof of the subadditivity we fix arbitrary /ii,/i2 ^ X. Then v^e obtain for an arbitrary A > 0 because of the convexity of / / ( x + A(/ii + /i2)) = / ( ^ ( x + 2A/ii) + i ( x + 2A/i2))
and
+ ^\f{x + 2\h^)f{x)]. Hence we get for A ^ 0+ f{x){h^ + h^) < f{x){h)
+ f'{x){h2)
and the proof is complete.
•
If a functional / is defined not on a whole real linear space X but on a nonempty subset 5, the property that / has a directional derivative at x in any direction x — x with x e 5, requires necessarily X + X{x — x) = Xx + {1 — X)x e S for sufficiently small A > 0. This necessary condition is fulfilled, for instance, if S is starshaped with respect to x — a notion which is introduced next. Definition 3.7. A nonempty subset S of a real hnear space is called starshaped with respect to some x E S^ ii for all x E S: Xx + {1 X)x E S for all A G [0,1] (see Fig. 3.2).
36
Chapter 3. Generalized Derivatives
Figure 3.2: A set S' which is starshaped with respect to x.
Every nonempty convex subset of a real linear space is starshaped with respect to each of its elements. And conversely, every nonempty subset of a real linear space which is starshaped with respect to each of its elements is a convex set. Using directional derivatives we obtain a simple necessary and sufficient optimality condition. Theorem 3.8. Let S he a nonempty subset of a real linear space, and let f : S —^ W be a given functional. (a) Let X E S be a minimal point of f on S. If the functional f has a directional derivative at x in every direction x — x with arbitrary x E S, then f{x){x
~x)>0
for all x e S.
(3.1)
(b) Let the set S be convex and let the functional f be convex. If the functional f has a directional derivative at some x e S in every direction x — x with arbitrary x E S and the inequality (3.1) is satisfied, then x is a minimal point of f on S. Proof. (a) Take any x E S. Since / has a directional derivative at x in the direction x — x, we have f'{x){x x)=
lim Ufix A—>0j A
+ X{x  x)) 
fix)).
3.2. Gateaux and Prechet Derivatives
37
X is assumed to be a minimal point of / on 5, and therefore we get for sufficiently small A > 0 f{x +
X{xx))>f{x),
Consequently we obtain f{x){xx)>Q. (b) Because of the convexity of / we have for an arbitrary x E S and all A G (0,1] fix + X{x  x)) = f{Xx + (1  A)^) < Xf{x) + (1 
X)f{x)
and especially fix) > fix) + jifix
+ Xix  x)) 
fix)).
Since / has a directional derivative at x in the direction x — x^ it follows fix)>fix)
+
f'ix)ixx)
and with the inequality (3.1) we obtain
fix) > fix). Consequently S is a minimal point of / on S. D
In part (b) of the preceding theorem one can weaken the assumptions on / and S, if one assumes only that / is convex at x. In this case S needs only to be starshaped with respect to x.
3.2
Gateaux and Prechet Derivatives
In this section we turn our attention to stronger differentiability notions. We want to ensure especially that differentiable mappings are
38
Chapter 3. Generalized Derivatives
also continuous. Furthermore we investigate a known problem from the calculus of variations. Definition 3.9. Let (X,  • x) and (y,  • y) be real normed spaces, let 5 be a nonempty open subset of X, and let f : S —^Y he a given mapping. If for some x E: S and all /i E X the limit
f'{x){h):=^mj{f{x
+
Xh)m)
exists and if f\x) is a continuous linear mapping from X to Y^ then f\x) is called the Gateaux derivative of / at x and / is called Gateaux differentiable at x.
Example 3.10. (a) Let / : R^ —> R be a given function with continuous partial derivatives. Then for every S G R^ the Gateaux derivative of / at X reads as
n^h)
= 4^ fix + Xh)\ dy
= V/(x + Xhfh\
IA=O
=
Vfixfh
A=0^
for all h E (b) Let (X, II • IIx) and (F,  • y) be real normed spaces, and let L : X —> y be a continuous linear mapping. Then the Gateaux derivative of L at every x G X is given as L\x){h) = L{h) for a l l / i G X .
Sometimes the notion of a Gateaux derivative does not suffice in optimization theory. Therefore we present now a stronger concept of a derivative. Definition 3.11. Let (X,  • x) and (Y,  • y) be real normed spaces, let 5 be a nonempty open subset of X, and let / : iS —> F be a
3.2. Gateaux and Prechet Derivatives
39
given mapping. Furthermore let an element x E S he given. If there is a continuous hnear mapping f\x) : X ^Y with the property j . ^ \\f{x + h)f{x)~f{x){h)\\y ll^llx0 \\h\\x
_^
then f{x) is called the Frechet derivative oi f ai x and / is called Frechet differentiable at x. According to this definition we obtain for Frechet derivatives with the notations used above f{x + h)^fix)
+ f'ix)ih)
+ oi\\h\\x)
where the expression o(/ix) of this Taylor series has the property
Example 3.12. We consider a function / : R^ —> R which is continuous with respect to each of its arguments and which has continuous partial derivatives with respect to the two first arguments. Moreover we consider a functional / : C^[a, 6] —> E (with — CXD < a < b < oo) given by 6
f{x)=
Il{x{t),x{t),t)dt
for alia; GC^[a,6].
a
Then we obtain for arbitrary x, /i G C^[a, 6]
f{x + h) fix) b
=
[l{x{t) + h{t),x{t) + h{t),t)
l{x{t),i{t),t)]dt
a b
= jMx{t),^{t),t)h{t) + k{x{t),i^^^
40
Chapter 3. Generalized Derivatives
Consequently the Frechet derivative of / at x can be v^ritten as b
f{x){h)
= A/^(x(t),x(t),t)/i(t) + /i(x(t),x(t),t)/i(t)]rft a
ioi8llheC^[a,b],
Next we present some important properties of Frechet derivatives. T h e o r e m 3.13. Let (X,  • \\x) (ind (Y",  • y) be real normed spaces, let S be a nonempty open subset of X, and let f : S —^ Y be a given mapping. If the Frechet derivative of f at some x E S exists, then the Gateaux derivative of f at x exists as well and both are equal Proof. we have
Let f{x)
denote the Frechet derivative of / at x. Then
implying lim r^A\f{x + Xh)  fix)  f'{x){Xh)\\y
= 0 for all h e
X\{Ox}.
A—>0  A 
Because of the linearity of f{x) we obtain lim hf{x
+ Xh)  f{x)] = f\x){h)
for all
heX.
A—>0 A
D
Corollary 3.14. Let (X,  • x) dnd (Y, \\ • y) be real normed spaces, let S be a nonempty open subset of X, and let f : S —^ Y be a given mapping. If f is Frechet differentiable at some x E S, then the Frechet derivative is uniquely determined.
3.2. Gateaux and Prechet Derivatives
41
Proof. With Theorem 3.13 the Frechet derivative coincides with the Gateaux derivative. Since the Gateaux derivative is as a hmit uniquely determined, the Frechet derivative is also uniquely determined. • The following theorem says that Frechet differentiability implies continuity as well. Theorem 3.15. Let (X,  • x) and (Y",  • y) be real normed spaces^ let S be a nonempty open subset of X, and let f : S —^ Y be a given mapping. If f is Frechet differentiable at some x E S, then f is continuous at x. Proof. To a sufficiently small e > 0 there is a ball around x so that for all X + h of this ball \\f(x
+
h)fix)f'{x){h)\\Y<s\\h\\x.
Then we conclude for some a > 0
\\f{x+h)m\\Y =
\\f{x + h)
fix)
 f{x){h)
fiXx + (1  X)y) + fiXx + (1  X)y)iil  A)(x  y)) and fiv) > fiXx + (1  X)y) + fiXx + (1  X)y)iXix  y)). Since Gateaux derivatives are linear mappings, we conclude further Xfix) + (1  X)fiy) > XfiXx + (1  X)y) + A(l  A)/'(Aa: + (1  X)y)ix  y) + (1  A)/(Aa; + (1  X)y)
XilX)f'iXx +
ilX)y)ixy)
= /(Ax + (1A)y). Consequently, the functional / is convex. D
3.2. Gateaux and Prechet Derivatives
43
If 5 is a nonempty convex open subset of R'^ and f : S —^ R is a continuously partially differentiable function, then the inequality (3.2) can also be written as f{y) > f{x) + Vf{xY{y
 x) for all x,y e S,
If one considers for every x E S the tangent plane to / at (x, /(x)), this inequality means geometrically that the function is above all of these tangent planes (see Fig. 3.3).
Figure 3.3: Illustration of the result of Thm. 3.16. Next we formulate a necessary optimality condition for Gateaux differentiable functional. Theorem 3.17. Let (X,  • ) 6e a real normed space, and let f : X ^ R be a given functional. If x E X is a minimal point of f on X and f is Gateaux differentiable at x, then it follows f(x){h)=0
for all
hex.
Proof. Let an element h e X he arbitrarily given. Then it follows ioT X := h + X with Theorem 3.8, (a)
f'{x)ih) > 0,
44
Chapter 3. Generalized Derivatives
and for x := —h + x we get
f'mh)
> 0.
Because of the linearity of the Gateaux derivative the assertion follows immediately. • Finally, we discuss an example from the calculus of variations. We proceed as in the proof of Theorem 3.17 which, in virtue of Theorem 3.13, holds also for Frechet differentiable functional. Example 3.18. We consider a function continuous with respect to all arguments and partial derivatives with respect to the two first let a functional / : C^[a, 6] —> R (with —oo
0 ioi^WheS
— S
{x}.
The set S can also be written as S = {xe C^[a,h] I x{a) = x{h) = 0}.
3.2. Gateaux and Prechet Derivatives
45
With h E S we have —/i G 5 as well. Because of the linearity of the Frechet derivative v^e obtain f{x){h)=0
for all/iG 5.
With Example 3.12 we have b
f\x){h)
 A/,(x(t),^(t),t)/i(t) + /i(x(t),^(t),t)/i(t)]dt a
for all
heS.
Hence our first result reads h
[k{x{t),x{t),t)h{t)
+ k{x{t),2^^^
= 0 for all h G 5.(3.3)
For further conclusions in the previous example we need an important result which is prepared by the following lemma. L e m m a 3.19. For —oo < a < b < oo let S = {xe C^[a,b] I x{a) = x{b) = 0}. If for some function x G C[a, 6] b
/ x{t)h{t) dt = 0 for all he then X = constant on [a, 6].
Proof. We define b
^=6^/^(')'\dt
S,
Chapter 3. Generalized Derivatives
46
and choose especially h E S with t
h{t) = / {x{s)  c) ds for all t G [a, h].
Then we get b
l{x{t)cfdt
=
a
I{x{t)  c)h{t).\dt a b
=
f x{t)h{t) dt  c[h{b)  h{a)] a
=
ch(h) b
I x{s) ds — c{b — a) = 0.
Hence it follows x(t) = c for all t G [a, b]. D
L e m m a 3.20.
For —CXD < a < b < OO let
S = {xe C^[a,b] I x{a) = x{b) = 0}. / / there are functions x^y E C[a,b] with b
/ [x{t)h{t) + y{t)h{t)] dt = 0 for all he S, a
then it follows y G C^[a, b] and y = x.
(3.4)
3.2. Gateaux and Prechet Derivatives
47
Proof. We define a function (/? : [a, 6] —> R by t
ifit) = / x{s) ds for all t G [a, 6]. a
Then we obtain by integration by parts h
b
I x(t)h{t)dt
^
= ^{t)h{t)\ 
a
(p{t)h{t)dt a
b
=

(f{t)h{t) dt for all he
S,
a
and from the equation (3.4) it foUov^s b
f[(f{t)
+ y{t)]h{t) dt = 0 for all
heS,
a
With Lemma 3.19 we conclude for some constant c G R y{t) = (p{t) + c for all t G [a, b]. Taking into consideration the definition of (p this equality leads to y{t) •=x{t) for all t G [a, 6], and the assertion is shown.
•
Using this last lemma we obtain the following theorem which is well known in the calculus of variations. Theorem 3.21. Let the assumptions of the Example 3.18 be satisfied. If x E S is a minimal point of f on S, it follows —k{x{t),x{t),t)
= k{x{t),x{t),t)
for allte
[a,b].
(3.5)
48
Chapter 3. Generalized Derivatives
Proof. In Example 3.18 the equation (3.3) is already proved to be a necessary optimality condition. Then the application of Lemma 3.20 leads immediately to the assertion. • In the calculus of variations the equation (3.5) is also called the EulerLagrange equation. Example 3.22. Determine a curve x G C^[a, 6] (with —oo < a < b < oo) with smallest length which connects the two end points (a, Xi) and (6, X2) (where xi, 0:2 G M). In other words: We are looking for a minimal point x of f on S with S := {x E C^la^ b] \ x{a) = Xi and x{b) — X2} and 0
f{x) = I y/1 + x{tf dt for all x e S. In this case the EulerLagrange equation (3.5) reads = 0. X=X
This equation is equivalent to d
2x{t)
dt2^/TTJW
= 0.
Then we get for some constant c G M ,
x(t) ^'
= c for all t e \a, b]
VTTW and i = constant. Hence we have the result that the optimal curve x is just the straight line connecting the points (a, xi) and (6, X2) (see Fig. 3.4). This result is certainly not surprising.
3.3.
Subdifferential
49 /1\ ^2
Xi
a
b
t
Figure 3.4: Illustration of the result of Example 3.22.
3.3
Subdifferential
In this section we present an additional concept of a derivative which is formulated especially for convex functional. With the aid of this notion we derive the generalized Kolmogoroff condition known in approximation theory. The characterization of convex Gateaux differentiable functional which is given in Theorem 3.16 proves to be very useful for the formulation of optimality conditions. This characterization motivates the following definition of a subgradient. Definition 3.23. Let (X,  • ) be a real normed space, and let / : X —> R be a convex functional. For an arbitrary x ^ X the set df{x) of all continuous linear functionals / on X with f{x) > f{x) + l{x  x) for all X G X is called the subdifferential of / at x. A continuous linear functional / e df{x) is called a subgradient of / at x (see Fig. 3.5).
Example 3.24. (a) With Theorem 3.16 for every convex Gateaux differentiable functional / defined on a real normed space the subdifferential df{x) at an arbitrary x G X is nonempty. For every x E X we have for the Gateaux derivative f{x) G df{x)^ i.e., f'{x) is a subgradient of / at x.
Chapter 3. Generalized Derivatives
50 A\
y = f{x)
\li{xx)
__ _  2/ = f{x) + k{x  x)
~
y = fix) + hix  x) >
X
X
Figure 3.5: Subgradients of a convex functional.
(b) Let (X, II • II) be a real normed space, and let (X*,  • \\x*) denote the real normed space of continuous linear functionals on X (notice that /x* = sup ^ for all / G X*). Then for every x E X the subdifferential of the norm at x is given as d\\x\\ =
{lex* {lex*\
\ l{x) = \\x\\ and ^U* = 1} if ^ 7^ Ox \\i\\x* < 1}
if X = Ox
Proof. (i) For ^ = Ox we obtain
aii^ll = {lex* = {lex* = {lex*
:E > l{x) \l{x)\
for all x e
X}
< 1 for all X G X \ {Ox}}
X
X
(ii) Now let an arbitrary element x ^ Ox he given. Then we obtain for every continuous linear functional I G X* with l{x) = \\x\\ and /x* = 1 (see Theorem C.4 for the existence of such a functional) l{x) < \\x\\ for all X G X which implies x + l{x x)
= \\x\\  l{x) + l{x) < \\x\
3.3. Subdifferential
51
Hence it follows / G d\\x\\. Finally, we assume that I is a subgradient of the norm at x 7^ OxThen we get \\x\\m
= 2\\x\\  \\x\\  l{x) = \\2x\\  \\x\\  l{2x  x) > 0
and \\x\\ + l{x)
= \\0x\\  \\x\\  l{Ox  x) > 0.
These two inequalities imply/(x) = \\x\\. Furthermore we obtain for all a; G X \\x\\ >
\\x\\+l{xx)
= Pll + ^(x)llxll =
lix).
But then we conclude \\l\\x^ = sup 1 ^
< 1.
Because oi l{x) = \\x\\ this leads to /x* = 1 So the assertion is proved. D
With the following lemma we also give an equivalent formulation of the subdifferential. Lemma 3.25. Let (X,  • ) be a real normed space, and let f : X ^ R be a convex functional. Then we have for an arbitrary xeX df{x) := {/ G X* I f\x){h)
> l{h) for all h e X}
52
Chapter 3. Generalized Derivatives
(where f\x){h) direction h).
denotes the directional derivative of f at x in the
Proof. For an arbitrary / G df{x) we have f{x){h)
= lim \{f{x
+ Xh) ~ f{x)) > l{h) for all
heX.
Hence one set inclusion is shown. For the proof of the converse inclusion we assume that any / G X* is given with f{x){h)
>l{h)
iorallheX.
Then it follows with Lemma 3.3 (for A == 1) f{x + h) f{x) > f{x){h)
> l{h) for alike
X
which means that / G df{x).
•
Next we investigate the question under which assumption a convex functional already has a nonempty subdifferential. Theorem 3.26. Let (X,  • ) 6e a real normed space, and let f : X —^ E. be a continuous convex functional. Then the subdifferential df{x) is nonempty for every x E X. Proof. Choose any point x E X. Since the functional / is continuous at 5, there is a ball around x on which the functional / is bounded from above by some o; G M. Consequently, the epigraph E{f) of / has a nonempty interior (e.g., {x, a H 1) G int(£'(/))), and obviously we have {x^f{x)) ^ int(£'(/)). / is a convex functional, and therefore with Theorem 2.8 the epigraph E{f) of / is convex. Hence the sets E{f) and {(x,/(x))} can be separated with the aid of the Eidelheit separation theorem (Theorem C.2). Then there are a number 7 G R and a continuous linear functional (/,/?) on X x R with {1,(3)^ (Ox*,0) and l{x) + /3a < 7 < l[x) + (3f{x) for all (x, a) G E{f),
(3.6)
3.3. Subdifferential
53
For X == :r we obtain especially Pa f{x).
Consequently we have /3 < 0. If we assume that /? = 0, we obtain from the inequality (3.6) l{x ~ x) i / ( x ) + j{x)
for all (x, a) E E{j)
which implies for a = f{x) fix) > fix)  hix
 x) for all x e X.
Consequently, —^l is an element of the subdifferential dfix).
•
Under the assumptions of Theorem 3.26 it can be shown in addition that the subdifferential is a convex weak*compact subset of X*. Notice that with Lemma 2.13 the convex functional in the previous theorem is already continuous if it is continuous at some point. With the aid of subgradients we can immediately present a necessary and sufficient optimality condition. This theorem is formulated without proof because it is an obvious consequence of the definiton of the subdifferential. Theorem 3.27. Let (X,  • ) be a real normed space, and let f : X —^R be a convex functional. A point x E X is a minimal point of f on X if and only if Ox* E dfix). With the following theorem we investigate again the connection between the directional derivative and the subdifferential of a convex functional. We see that the directional derivative is the least upper bound of the subgradients (compare also Lemma 3.25).
54
Chapter 3. Generalized Derivatives
Theorem 3.28. Let (X,  • ) 6e a real normed space, and let f : X —^ R be a continuous convex functional. Then for every x^h E X the directional derivative of f at x in the direction h is given as f{x){h)=max{l{h)
I
ledf{x)}.
Proof. Let x G X be an arbitrary point and /i G X be an arbitrary direction. With Theorem 3.4 the directional derivative f'{x){h) exists and with Theorem 3.26 the subdifferential df{x) is nonempty. With Lemma 3.25 we have f{x){h)
> l{h) for all I e
df{x).
Hence it remains to show that there is a subgradient / with f\x){h) l{h). For that purpose we define the set
=
T := {{x + Xh, f{x) + Xf'{x){h)) G X X M I A > 0}. Because of Lemma 3.3 we have f{x + \h) > f{x) + Xf\x){h)
for aU A > 0.
Therefore we get {x + A/i, f{x) + \f'{x){h))
0 int(£;(/)) for all A > 0
(as in the proof of Theorem 3.26 notice that the epigraph of / has a nonempty interior because / is continuous). Then it follows int(jE'(/)) f]T = 0. If we also notice that the sets S := E{f) and T are convex, then the Eidelheit separation theorem is applicable (Theorem C.2). Consequently, there are a continuous linear functional / on X and real numbers (3 and 7 with the property (/,/?) 7^ (Ox^^O) and l{x) + Pa f{x)
(3.7)
3.3. Subdifferential
55
which leads to /? < 0. If we assume that P = 0, then we obtain from the inequahty (3.7) with A = 0 l{x  x) f{x) + Xf\x){h)
(3.8)
for all (x, a) G E{f) and all A > 0. For a = f{x) and A = 0 we obtain /(^) > / ( ^ )  ^K^  ^) for all X G X, i.e., —4/ is a subgradient of / at x. For x = x, a = f{x) and A = 1 we also conclude from the inequality (3.8)
nx){h) < ^m. Because oi —hi G df{x) the assertion is shown.
•
As a result of the previous theorem the following necessary and sufficient optimality condition can be given. Corollary 3.29. Let S he a nonempty subset of a real normed space (X, II • II), and let f : X —^R be a continuous convex functional. (a) If X E S is a minimal point of f on S and S is starshaped with respect to x, then max{/(x x)\le
df{x)} > 0 for all x E S.
(3.9)
(b) If for some x E S the inequality (3.9) is satisfied, then x is a minimal point of f on S.
56
Chapter 3. Generalized Derivatives
Proof. The part (a) of this theorem follows immediately from the Theorems 3.8, (a) and 3.28 (together with a remark on page 35). For the proof of the part (b) notice that with Theorem 3.28 and Lemma 3.3 it follows from the inequality (3.9) j{f{x
+ X{x  x))  f{x)) > f{x){x
x)>0
for sllx e S and all A > 0. Hence we get for A = 1 f{x) < f{x)
for all
xeS.
Consequently, x is a minimal point of f on S.
•
For the apphcation of this corollary we turn our attention to approximation problems. Theorem 3.30. Let S be a nonempty subset of a real normed space (X, II • 11)^ and let x ^ X \S be a given element. (a) If X E S is a best approximation to x from S and S is starshaped with respect to x, then max{/(x — x) \ I E X*, l{x — x) = \\x — x\\ and II^IU* = 1} > 0 for all xeS, (3.10) (b) If for some x E S the inequality (3.10) is satisfied, then x is a best approximation to x from S. Proof. X G S' is a best approximation to x from S if and only if X — X 7^ Ox is a minimal point of the norm  •  on 5 — {x}. With Example 3.24, (b) we have d\\x ~ x\\ = {; G X* I l{x x)
= \\x  x\\ and /x* = 1}.
Then the inequahty (3.9) is equivalent to the inequality
3.4. Quasidifferential
57
max{/(x — X + x) \ I E X*, l{x — x) = \\x — x\\ and /x* = 1} > 0 for all X E S'  {x} resulting in m3x{l{x ~x) \ I e X*, l{x ~x) = \\x  x\\ and /x* = 1} > 0 for all X E S. Finally notice in part (a) that the set S — {x} is starshaped with respect to x —£ and the norm  •  is a continuous functional (compare page 21). So this theorem is proved using Corollary 3.29. • The optimality condition for approximation problems given in Theorem 3.30 is also called generalized Kolmogorov condition in approximation theory.
3.4
Quasidifferential
The theory of subdifferentials may also be extended to certain nonconvex functional. Such an extension was proposed by Dem'yanov and Rubinov^ and is the subject of this section. We give only a short introduction to this theory of quasidifferentials. Definition 3.31. Let 5 be a nonempty open subset of a real normed space (X,  • ), let / : 5 —> R be a given functional, and let X G 5 be a given element. The functional / is called quasidifferentiable at x if / is directionally differentiable at x and if there are two nonempty convex weak*compact subsets df{x) and df{x) of the topological dual space X* with the property f{x){h)
= max l{h) + min l{h) for all Ledfix) ledfix)
heX.
The pair of sets Df{x) := {df{x)jdf{x)) is called a quasidifferential of / at X, and the sets df{x) and df{x) are called subdifferential and superdifferential of / at x, respectively. ^V.F. Dem'yanov and A.M. Rubinov, "On quasidifferentiable functionals", Soviet Math. Dokl 21 (1980) 1417.
58
Chapter 3. Generalized Derivatives
Quasidifferentials have interesting properties. But, in general, it is difficult to determine a quasidifferential to a given functional. Notice in the preceding definition that the subdifferential and the super differential are not uniquely determined. For instance, for every ball B{Ox*,e) := {/ G X*  \\l\j_x* < s} with an arbitrary e > 0 the pair of sets {df{x) + B{Ox*^s)^df{x) — B{Ox*')S)) is a quasidifferential of / at X as well. Example 3.32. Let (X,  • ) be a real normed space, and let f : X ^ R and g : X ^ R he convex functional. If / and g are continuous at some x G X, then the functional cp := f — g is quasidifferentiable at x. In this case (9/(x), —dg{x)) is a quasidifferential of (^ at X where df{x) and dg{x) denote the subdifferential of / and g at X, respectively. Proof. By Theorem 3.4 / and g are directionally differentiable and therefore (f = f — g is also directionally differentiable. If df{x) and dg{x) denote the subdifferential of / and g at x (these two sets are nonempty, convex and weak*compact), we define the sets d(p{x) := df{x) and dip{x) := —dg{x). By Theorem 3.28 the directional derivative of (^ is given as 0 so that for alH G N there is some hi e R" with 0 ^ \\hi\\ < ^ and \f{x + K)  fix)  nx)ihi)\>
(3\\hi\\
(313)
Next we set g. •= 1 ^
for all i e N.
(3.14)
ll^ill Obviously we have ll^ill =e
foralHeN,
(3.15)
i.e., Qi belongs to the sphere {a; G R"  x = s} which is compact. Therefore the sequence {gi)ien has a subsequence {gij)jeN converging to some g with 5f = s. If we also set a := I M > 0 for all i G N, e
we obtain lim a^ = 0 and with the equality (3.14) i—•oo
hi — aigi for all i G N.
(3.16)
Finally we define for every i G N (t>i ••= Ifi^ + ^ig)fix)f'{x){aig)\ = \f{x + hi)  fix)  f'ix)ihi)  fix + hi) + fix + aig) +f'ix)ihi)f'ix)iaig)\ = \[fix + hi)fix)f'ix)ihi)] [ifix + hi)  fix + aig))  if'ix)ihi)  f'ix)iaig))]  > \fix + hi)fix)f'ix)ihi)\ \ifix + hi)  fix + aig))  if'ix)ihi) f'ix)aig))\ > \fix + hi)fix)f'ix)ihi)\ ilfix + hi)  fix + aig)\ + \f'ix)ihi) f'ix)iaig)\). For sufficiently large i G N we have x + /ii G
SV\Bix,e)
Chapter 3. Generalized Derivatives
62
and x + aiQ G S
nB{x,e),
and therefore we get with the inequaUties (3.13), (3.11), (3.12) and the equahties (3.16), (3.15) 0i > P\\hi\\  {k\\hi  aig\\ + k\\hi  aig\\) = PaiWgiW 2kai\\gig\\ = ai{Pe^2k\\gig\\). Since the sequence {gi.)j^fi converges to g, we obtain for sufficiently large j EN \\9yg\\ij > cXi, yPe  y
j
(5e =
OLi. •
and because of the positive homogenity of f'{x) \f{x + aig)
f{x)
OLi
\f{x +
rmg)
ai.g)f{x)f'{x){ai.g)\ Oi
(f>i
Pe
a,
2,
Prom the preceding inequality it follows
f{x){g) ^ lim
f{x + ai g) 
f{x)
C^i
which is a contradiction to the definition of the directional derivative. D
The preceding theorem presents an interesting property of directionally differentiable and locally Lipschitz continuous functionals on
3.4. Quasidifferential
63
R'^. It is now used in order to prove the equivalence of the quasidifferentiabihty to the Frechet property for locally Lipschitz continuous functionals on R^. Theorem 3.36. Let S be a nonempty open subset of R^, let x G S be a given element^ and let f : S ~^R be a given functional which is Lipschitz continuous at x. The functional f is quasidifferentiable at X if and only if f has the Frechet property at x with some functional / : R^ —> R which can be represented as difference of two Lipschitz continuous sublinear functionals. Proof, (i) First, assume that / is quasidifferentiable at x. Then / is also directionally differentiable at x, and by Theorem 3.35 it has the Frechet property at x with the directional derivative of / at x /
:= f{x) =
=
max /(.)+_min !(•) iedf{x) ledfix)
max /(•) — max !(•). ledfix)^^ ledfix)
(3.17)
Next we define the functional (^ : R'^ —^ R by ip{h) := max l_{h) for all h e R"". Lp is subhnear because for all /ii, /12 G R^ and all A > 0 we have (^(/ii + /i2) = = < =
max l{hi + /12) l^df{x)
max l{hi) + l{h2)
iedf{x)
max l(hi) + max /(/i2)
i_Gdf{x)~^
Ledfix)'^
(/p(/ii) + (p{h2)
and (f{Xhi)
— max l{Xhi) l^df{x)
=
X max l{hi)
=
max A/(/ii) L^df{x)
= Xip{hi).
Chapter 3. Generalized Derivatives
64
The functional ^ is also continuous because for all h G
\^{K)\ =
0 exists because df{x) is weak*compact). Now we show that the continuous sublinear functional (p is also Lipschitz continuous. For that proof take any /ii, /12 G M^. Then we get with the inequality (3.18) (f{hi)
= (p{hi  h2 + h2) < (p{hi  h2) + ^{h2) < L\\hi  /i2 +(f{h2)
resulting in (p{hi)  (f{h2) < L\\hi  /i2. Similarly one obtains ^(/^2) ^{hi)
< L\\hi /12II,
\(p{hi) ^{h2)\
< L\\hi /i2.
and so it follows
Consequently we have shown that / has the Prechet property at x with / := f{x) which, by the equation (3.17), can be written as the difference of two Lipschitz continuous sublinear functionals. (ii) Now we assume that / has the Frechet property at x with some functional f :W^ —^ R which can be represented as difference of two Lipschitz continuous sublinear functionals. First we prove that / is the directional derivative f{x) of / at x. Because of the positive homogenity of f{x) and / we have f{x){OMn)=f{OMn)=0.
3.4. Quasidifferential
65
Since / has t h e Frechet property at x with / , we get for every h G
\f{x+xh)mfixh)\_ A™^ and
\\Xh\\
j.^ \f{x+xh)mf{xh)\ ^ Q
A^0+ A Because / is positively homogeneous, we obtain lim
I fix+xh) m_
j^^^
A
0.
Hence / is directionally differentiable at x with / == /'(x), and the directional derivative f'{x) can be written as difference of two Lipschitz continuous sublinear functionals (/:?i,(/92 • ^^ —^ IR, ie. /(x)(^1992.
(3.19)
Now fix an arbitrary i G {1,2} and define the set Ai := {(peW
\ ^^x < ipi{x) for all x G W}
which is nonempty convex and weak*compact (in fact, it is a compact subset of E^). Then we have for all x eW ^i{x) > maxip^x.
(3.20)
(peAi
Next, fix any S G R^ and consider the set {{x^(pi{x))} and the epigraph E{(pi). Notice that this epigraph is convex and it has a nonempty interior because cpi is a Lipschitz continuous sublinear functional. Then by the application of the Eidelheit separation theorem (Thm. C.2) there are a number 7 G M and a vector (/,/?) G R^"^^ with (/,/?) 7^ Oi^n+i and F x + /?a < 7 < Fx + P(fi{x) for all (x, a) G E{(fi).
(3.21)
With the same arguments used in the proof of Theorem 3.26 we get P < 0. If we set (p := —^/, we get for x = 0^^ and a = (pi{0]^n) = 0 from the inequality (3.21) ^i{x) < (f^x.
(3.22)
66
Chapter 3. Generalized Derivatives
It follows from the inequality (3.21) that l^x + P(pi{x) < 0 for all xeW
(3.23)
(otherwise we get for some x G M^ with l^x + P(fi{x) > 0 l^{5x) + P(pi{5x) = S{l^x + f5^i{x)) —> oo for 5 —> oo which contradicts the inequality (3.21)). From the inequality (3.23) we conclude (fx ~ ipi{x) < 0 for all x G R"", i.e. (p G Ai. Then it follows from the inequalities (3.20) and (3.22) that (fi{x) — max(^^x, and so we have with the equality (3.19) f (x) {x) = max ip^x — max cp^x (peAi
(peA2
= meixcp^x + min (p'^x for all x EW^. (peAi
(pe—A2
Consequently, the functional / is quasidifferentiable at x.
•
Finally, we also present a necessary optimality condition for quasidifferentiable functionals. T h e o r e m 3.37. Let (X,  • ) be a real normed space, and let f : X —^ R be a given functional If x E X is a minimal point of f on X and if f is quasidifferentiable at x with a quasidifferential {df{x)^df{x)), then it follows
df{x)cdmProof. Using Theorem 3.8,(a) we obtain the following necessary optimality condition for the directional derivative: f'{x){h)
> 0 for all
heX.
3.5. Clarke Derivative
67
Then, by Definition 3.31, we get for a quasidifferential (9/(x), 9/(x)) max l{h)
L^dfix)
> — min 1(h) ledfix)

m_ax l{h) for a l l / i G X
(3.24)
~iedf{x)
Now assume that there is some / G —df{x) with the property I ^ df{x). Since the subdifferential df{x) is convex and weak*compact, by a separation theorem (Thm. C.3) there is a weak*continuous linear functional x** on X* with x**(0 > sup x**(/).
(3.25)
Ledfix)
Every weak*continuous linear functional on X* is a point functional. In our special case this means that there is some h E X with
x^'^(l)=l{h)
forallfeX*.
Then it follows from the inequality (3.25) _ mjtx J(h) > l{h) > max 1(h) ~iedf{x) ledfix) which is a contradiction to the inequality (3.24). Hence our assumption is not true and we have —df(x) C df(x). •
3,5
Clarke Derivative
An interesting extension of the concept of the directional derivative for realvalued mappings was introduced by F.H. Clarke^. This section presents a short discussion of this notion of a derivative. A simple necessary optimality condition is also given. ^F.H. Clarke, "Generalized gradients and applications", Trans. Amer. Math. Soc. 205 (1975) 247262.
68
Chapter 3. Generalized Derivatives
Definition 3.38. Let S' be a nonempty subset of a real normed space (X, II • II), let / : 5 ^ M be a given functional, and let two elements x G S and h E X he given. If the limit superior r{x){h) = limsup J {f{x + Xh) 
f{x))
exists, then f'{x){h) is called the Clarke derivative of / at x in the direction h. If this limit superior exists for all /i G X, then / is called Clarke differentiable at x. The difference between the Clarke derivative and the directional derivative is based on the fact that for the Clarke derivative the limit superior has to be determined and the base element x of the difference quotient has to be varied. In this section we see that the Clarke derivative has interesting properties. But it has also the disadvantage that this derivative describes a functional only "cumulatively". Notice that for the Clarke derivative the limit superior is considered only for those x E X and A > 0 for which x e S and x + Xh E S. There are no difficulties, for instance, if x belongs to the interior of the set S. But other types of sets are possible, too. Example 3.39. For the absolute value function / : R —> E with /(x)  \x\ for all X G R the Clarke derivative at 0 reads for every /i G R f{0){h) = limsup i {\x + Xh\  \x\) = \h\. x^O
^
In order to see this result, notice that we get with the aid of the triangle inequality f{0){h)
=
limsup \{\x +
Xh\\x\)
3.5. Claxke Derivative
69
0+
=
\h\.
=
l i m s u p ^ ( k + A/ix)
For X =^ Xh we obtain f{0){h)
A>0+
>
limsupi(2A/iA/i) A^0+ A
= H Hence we have f{0){h) = \h\. The class of locally Lipschitz continuous functionals is already differentiable in the sense of Clarke. Theorem 3.40. Let S be a subset of a real normed space (X,  • ) with nonempty interior, let x G int(S) be a given element, and let f : S ^^ R be a functional which is Lipschitz continuous at x with a Lipschitz constant k. Then f is Clarke differentiable at x and \f{x){h)\
X
A>0+
X
< k\\h\\ which is to prove.
D
The assumption in the preceding theorem that x belongs to the interior of the set S can be weakened essentially. But then Theorem 3.40 becomes more technical. Clarke derivatives have the interesting property to be sublinear with respect to the direction h. Theorem 3.41. Let S be a subset of a real normed space (X,  • ) with nonempty interior^ let x G int(S) be a given element, and let f : S —^ M. be a functional which is Clarke differentiate at x. Then the Clarke derivative f'{x) is a sublinear functional. Proof. For the proof of the positive homogenity of fix) that f'{x){Ox) = 0 and that for arbitrary h G X and a > 0 f{x){ah)
notice
— limsup  (/(x + \aK) — f{x)) X —> X
^
A>0+
=
1 a limsup — {f{x + Xah) — f{x)) x^x
^^
A>0+
=
cxf'{x){h).
Next we prove the subadditivity of f{x). we get f'{x){h^ + h2)
For arbitrary /ii,/i2 G X
3.5. Clarke Derivative
71
= limsup i (fix + X{hi + h2)) X ^
X
fix))
^
= limsup i (/(x + Xhi + A/12)  f{x + A/12) + f{x + A/12)  /(:^)) X ^
X
^
< limsup  (/(x + A/12 + \hi)  / ( x + A/i2)) X ^
X
^
+ limsup y (/(:r; + A/i2) X —^ X
f{x))
^
A^0+
= /'(S)(/ia) + / ' ( x ) ( M Consequently, f'{x) is sublinear.
•
In the case of a locally Lipschitz continuous convex functional the directional derivative and the Clarke derivative coincide. Theorem 3.42. Let (X,  • ) he a real normed space, and let f : X —> W be a convex functional which is Lipschitz continuous at some X E X. Then the directional derivative of f at x coincides with the Clarke derivative of f at x. Proof. Let h G X denote an arbitrary direction. By Theorem 3.4 and Theorem 3.40 the directional derivative f'{x){h) and the Clarke derivative f^(x){h) of / at x in the direction h exist. By the definition of these derivatives it follows immediately
f'ix)ih) < f\x)ih). For the proof of the converse inequality we write f%x)ih)
=
limsup Ufix
+
Xh)fix))
A>0+
=
lim ^^0+
sup
1 sup  {f{x + Xh) — f{x)).
:c_40+
sup
l(/(x + e/i)/(x)),
\\xx\\ 0 we obtain f{x){h)
= lim
sup
 {f{x + eh) 
f{x)).
If we notice that because of the local Lipschitz continuity of / we have for sufficiently small e > 0  {fix + eh)  fix))   {fix + eh) e
fix))
e
M be a functional which is Lipschitz continuous at some x E int(S'). Then the set dcif{x) of all continuous linear functional / on X with f{x){h)
>l{h)
forall/iGX
3.5. Clarke Derivative
73
is called the generalized gradient of / at x (where f'{x){h) the Clarke derivative of / at x in the direction h).
denotes
For functionals defined on the whole space, notice the formal analogy of the definition of the generalized gradient and the equivalent definition of the subdifferential from Lemma 3.25. The formal difference Ues in the fact that one uses the directional derivative for the subdifferential whereas one works with the Clarke derivative for the generalized gradient. The next result follows immediately from Theorem 3.42 and Lemma 3.25. Corollary 3.44. Let (X,  • ) he a real normed space, and let f : X —> M. be a convex functional which is Lipschitz continuous at some X E X. Then the subdifferential df{x) of f at x coincides with the generalized gradient dcif{x) of f at x. With the following theorem we show that locally Lipschitz continuous functionals have a nonempty generalized gradient. Theorem 3.45. Let S be a subset of a real normed space (X,  • ) with nonempty interior, and let f : S —> R be a given functional. If f is Lipschitz continuous at some x G int(S), then the generalized gradient dcif(x) of f at x is nonempty. Proof. By Theorem 3.40 the Clarke derivative exists and by Theorem 3.41 it is sublinear. Consequently, by the basic version of the HahnBanach theorem (compare Thm. C.l) there is a linear functional / on X which satisfies the inequality f{x){h)
> l{h) for all heX.
(3.26)
For the proof of the continuity of / we choose an arbitrary h E X. Then it follows from the inequality (3.26) and Theorem 3.40
m < nx){h) < \f'{x){h)\ < k\\h\\ (where k >0 denotes a Lipschitz constant) and
i{h) = i{h) < fmh)
< \nx){h)\ < k\\  h\\ = k\\h\\.
74
Chapter 3. Generalized Derivatives
This leads to the inequality
\m\ < km. Hence I is continuous at O^. Because of the hnearity of / the functional I is also continuous on X. This completes the proof. • It is also possible to derive a necessary optimality condition for Clarke differentiable functional. This condition is given in the next theorem. Theorem 3.46. Let T be a superset of a nonempty subset S of a real normed space (X, \\'\\), let f \T ^^ be a given functional, and let T have a nonempty interior. Ifx^SD int(T) is a minimal point of f on S, the set S is starshaped with respect to x and the functional f is Lipschitz continuous at x, then the following inequality holds for the Clarke derivative f\x){xx)
> 0 for
allxeS.
Proof. Let x G 5 be a minimal point of / on S. Since x G int(T) and / is Lipschitz continuous at x, we have for an arbitrary x E S
^(f{x +
Xixx))f{x))
< ^lA(xx) =
A;x — x for sufficiently small A > 0.
Consequently the expression limsup Y ( / ( S + A(x  x)) 
f{x))
exists. Because of the minimality of / at :r and the starshapness of S with respect to x this limit superior is nonnegative. Then we conclude 0
o+
A
Exercises
75
< limsup(/(y + A(x^))/(y)) y —^ X
^
= nx){xx) which completes the proof.
•
If (X, II • II) is a real normed space and / : X —> M is a given functional, then in the case ofS = X the assertion of Theorem 3.46 can also be interpreted as follows: If x G X is a minimal point of / on X, then the functional Ox* is an element of the generalized gradient of / at X.
Exercises 3.1) For the function / : R ^ R with „, . _ J x^ sin ^ if x 7^ 0 / W   0 "" i f x = : 0 determine the directional derivative at x = 0. 3.2) Let M be a compact subset of R^, and let C{M) denote the linear space of continuous realvalued functions on M equipped with the maximum norm  •  where b = m a x b ( t )  for all x G C(M). II
II
^ ^ ^
I
\
/I
\
/
To a given function x G C{M) we consider a functional / : C{M) ^ R with f{x) = \\xx\\
for all X G C ( M ) .
Show that the directional derivative of / at an arbitrary x G C{M) is given as
{
max sgn(x(t) — x{t))h{t) max \h(t)\
\ix ^ x li X = X
teM{x)^
with M{x) :={teM
\ \x{t)  x{t)\ = f{x)}.
76
Chapter 3. Generalized Derivatives
3.3) Let (X, II • II) be a real normed space, and let / : X —> R be a convex functional v^hich is Gateaux differentiable at some x G X. Prove that x is a minimal point of / on X if and only if 3.4) For the function / : E ^ R with f{x) = \x\ for all a; G M determine the subdifferential 5/(0) at zero. 3.5) Let {X, II • II) be a real normed space, and let / : X —> R be a convex functional. Show: For an arbitrary x E X the subdifferential df{x) is a convex set. 3.6) Prove: For every convex function / : R" —> R which is differentiable at some x G R" it follows df{x) = {Vf{x)}. 3.7) Let the function / : R^ ^ R with f{xi,X2) = a;ia;2 for all (a;i,a;2) G M^ be given. Determine a quasidifferential of / at an arbitrary point (a;i,a;2) ^ E^3.8) Consider the function / : R^ ^ R with f(x,x.)
= l l^ill^^l + l ^ \
0
ii{xr,x,)^iO,0) if (0:1,0:2) = ( 0 , 0 ) •
Show that / is quasidifferentiable ai x := (0,0) and that it does not have the Frechet property at x v^ith f := f{x) (directional derivative of / at x). 3.9) Let the function / : R^ > R with / ( x i , . . . ,Xn) = max{a:i,... ,Xn} for all Xi,... ,x^ G R be given. For an arbitrary x eW^ let I{x) :  {i G { 1 , . . . , n} I f{x) = x j .
Exercises
77
Show that the Clarke derivative of / at x in an arbitrary direction h eW^ is given as f{x){h)
=
max{hi},
iei{x)
Chapter 4 Tangent Cones In this chapter certain approximations of sets are considered which are very useful for the formulation of optimality conditions. We investigate socalled tangent cones which approximate a given set in a local sense. First, we discuss several basic properties of tangent cones, and then we present optimality conditions with the aid of these cones. Finally, we formulate a Lyusternik theorem.
4.1
Definition and Properties
In this section we turn our attention to the sequential Bouligand tangent cone which is also called the contingent cone. For this tangent cone we prove several basic properties. First, we introduce the concept of a cone. Definition 4.1. Let C be a nonempty subset of a real linear space X. (a) The set C is called a cone if X e C, A > 0 = ^ Ax G C (b) A cone C is called pointed if X G C,  X G C =^
X =^ Ox
Chapter 4. Tangent Cones
80
•:>::::x>:::C?x:::::::::
Figure 4.1: Cone.
Figure 4.2: Pointed cone.
Example 4.2. (a) The set Wl :={xeW\xi>0
for all i G { 1 , . . . ,n}}
is a pointed cone. (b) The set C :={xe
C[0,1] I x{t) > 0 for all t e [0,1]}
is a pointed cone. In order theory and optimization theory convex cones are of special interest. Such cones may be characterized as follows: Theorem 4.3. A cone C in a real linear space is convex if and only if for all x^y E C
x + yeC.
Proof. x,y eC
(4.1)
(a) Let C be a convex cone. Then it follows for all
2(^ + 2/)^ 2 ^ ^ 2 ^ ^ which implies x + y E C. (b) For arbitrary x,y E C and A G [0,1] we have Xx E C and (1—A)y G C. Then we get with the condition (4.1) Ax + (1  A)^ G C.
4.1. Definition and Properties
81
Consequently, the cone C is convex.
•
In the sequel we also define cones generated by sets. Definition 4.4. space. The set
Let 5 be a nonempty subset of a real linear
cone(5') := {As  A > 0 and s ^ S} is called the cone generated by S,
cone(S')
Figure 4.3: Cone generated by S.
Example 4.5. (a) Let 5 ( 0 x , l ) denote the closed unit ball in a real normed space (X,  • ). Then the cone generated by B(Ox, 1) equals the linear space X. (b) Let S denote the graph of the function / : R ^ R with
f{x) = {
a; sin i 0 "^
ii x ^ 0 ifx = 0
Then the cone generated by S is given as cone(;S) = {{x,y) G M^  y < a;}.
82
Chapter 4. Tangent Cones
Now we turn our attention to tangent cones. Definition 4.6. Let S he a nonempty subset of a real normed space (X, II • II). (a) Let X G cl(S') be a given element. A vector h E X is called a tangent vector to S at x, if there are a sequence {xn)nen of elements in S and a sequence {Xn)neN of positive real numbers with X = lim Xn n—^oo
and h = lim Xn{xn — x). (b) The set T(S', x) of all tangent vectors to 5 at x is called sequential BouUgand^ tangent cone to S' at x or contingent cone to 5' at x. Notice that x needs only to belong to the closure of the set S in the definition of T{S, x). But later we will assume that x E S. By the definition of tangent vectors it follows immediately that the contingent cone is in fact a cone. Before investigating the contingent cone we briefly present the definition of the Clarke tangent cone which is not used any further in this chapter. Remark 4.7. Let x be an element of the closure of a nonempty subset S' of a real normed space (X,  • ). (a) The set Tci{S^x)
:= {h E X \ for every sequence (x^)neN of elements of S with x = lim Xn and n—>oo
for every sequence {Xn)neN of positive real numbers converging to 0 there is ^M.G. Bouligand, "Sur les surfaces depourvues de points hyperlimites (ou: un theoreme d'existence du plan tangent)", Ann. Soc. Polon. Math. 9 (1930) 3241. F. Severi remarked that he has independently introduced this notion (F. Severi, "Su alcune questioni di topologia infinitesimale", Ann. Soc. Polon. Math. 9 (1930) 97108).
4.1. Definition and Properties
83
TiS,x)
Jx
T{S, x)
X
Figure 4.4: Two examples of contingent cones. a sequence {hn)neN with h = lim hn n—>oo
and Xn + Xnhn G S for all n G N} is called (sequential) Clarke tangent cone to 5 at x. (b) It is evident that the Clarke tangent cone Tci{S^x) is always a cone. (c) li x e S^ then the Clarke tangent cone Tci{S^x) is contained in the contingent cone T{S,x). For the proof of this assertion let some h G Tci{S,x) be given arbitrarily. Then we choose the special sequence {x)nen ^^d an arbitrary sequence {\n)nm of positive real numbers converging to 0. Consequently, there is a sequence (/in)nGN with h = lim hn and x+Xnhn G S
Chapter 4. Tangent Cones
for all n G N. Now we set yn := X + Xnhn for all n G N and tn '•= — for all n G N. Then it follows Vn ^ S for all n G N, lim yn = lim {x + XuK) = x n—>oo
n—>oo
and lim tn{yn  x) = lim i—{x + Xnhn  x) = lim hn = h. n—^oo
n^oo Art
n^oo
Consequently, /z is a tangent vector.
D
Figure 4.5: Illustration of the result in Remark 4.7,(c).
(d) The Clarke tangent cone Tci{S, x) is always a closed convex cone. We mention this result without proof. Notice that this assertion is true without any assumption on the set S. Next, we come back to the contingent cone and we investigate the relationship between the contingent cone T(5, x) and the cone generated by S — {x}.
4.1. Definition and Properties
85
Theorem 4.8. Let S be a nonempty subset of a real normed space. If S is starshaped with respect to some x E S^ then it follows cone{S{x})
cT{S,x),
Proof. Let the set S be starshaped with respect to some x e S, and let an arbitrary element x E S he given. Then we define a sequence {xn)neN with Xn '= X \—(x — x) = —X + (1 )x G S' for all n G N. n n \ nJ For this sequence we have lim Xn = X n—>oo
and lim n{xn — x) = X — X. Consequently, x — x is a tangent vector, and we obtain
S{x}cT{S,x), Since T(S', x) is a cone, we conclude cone{S  {x}) C cone(r(5, x)) = T{S, x). D
Theorem 4.9. Let S be a nonempty subset of a real normed space. For every x E S it follows T{S,x) C
cl{cone{S~{x})).
Proof. We fix an arbitrary x E S and we choose any h E T{S^x). Then there are a sequence {xn)neN of elements in S and
86
Chapter 4. Tangent Cones
a sequence {Xn)neN of positive real numbers with x = lim Xn and n—>oo
h = lim Xn{xn — x). The last equation implies n—>oo
h e cl(cone(5'  {x})) which has to be shown.
•
By the two preceding theorems we obtain the following inclusion chain for a set S which is starshaped with respect to some x E S: cone{S{x})
C T{S,x)
C cl(cone(S' {x})).
(4.2)
The next theorem says that the contingent cone is always closed. Theorem 4.10. Let S be a nonempty subset of a real normed space (X, II • II). For every x E S the contingent cone T{S, x) is closed. Proof. Let x E S he arbitrarily chosen, and let {hn)neN be an arbitrary sequence of tangent vectors to 5 at x with lim hn = h E X. n—>oo
For every tangent vector hn there are a sequence (x^JieN of elements in S and a sequence (A^J^^N of positive real numbers with x — lim x^. and hn = lim Xmi^m — ^) Consequently, for every n E N there is a number i(n) G N with W^m — 5 ^ — for all i G N with i > iin) n and ll'^ni(^ni ~ ^) "~ ^n ^ — for alH G N with i > i{n), lb
If we define the sequences {yn)nen ^nd (tn)nGN by Vn '•= Xn.^r.) ^ ^ fo^ all n G N and tn := An,(,) > 0 for all n G N, then we obtain lim yn = x and n—^oo WtniVn  X)  h\\
=
\\Xn,^^^{Xn,^„^x)hn
oo
n—^oo
and X  lim yn, /i2 = lim UniVn  x), n—>oo
n—>oo
Next, we define additional sequences (z^n)nGN ^^d {zn)nen with ^n •= K + iin for all n G N
88
Chapter 4. Tangent Cones
and Zn '= —{XnXn + llnVn) for all n G N.
Because of the convexity of S we have Zn = T
^
Xn + T
^
Vn ^ S foi all n E N,
and we conclude lim Zn n—»oo
=
hm — {XnXn + l^nVn) n—^oo p^
=
l i m —{XnXn n^ooUn
= =
 XnX + flnVn
~ l^nX + XnX +
jlnX)
lim { — {xn  x) + —{Vn  x) + x) n^oo KUn X
l^n
^
and lim Un{Znx) n—^oo
= = =
lim {XnXn n—^oo l i m {Xn{Xn
+ finVn  x ) +
 l^nX) IXniVn
"
x))
/ll + /l2.
Hence it follows /ii + /12 G T{S^ x). Since T(5, x) is a cone, Theorem 4.3 leads to the assertion. • Notice that the Clarke tangent cone to an arbitrary nonempty set S is already a convex cone, while we have shown the convexity of the contingent cone only under the assumption of the convexity of 5'.
4,2
Optimality Conditions
In this section we present several optimality conditions which result from the theory on contingent cones. First, we show, for example, for convex optimization problems with a continuous objective functional that every minimal point x of f on S can be characterized as a minimal point of / on {x} + T{S^ x).
4.2. Optimality Conditions
89
Theorem 4.13. Let S be a nonempty subset of a real normed space (X, II • 11)^ and let f : X —^M be a given functional. (a) If the functional f is continuous and convex, then for every minimal point X E S of f on S it follows: f{x) < f{x + h) for all h G r ( ^ , x),
(4.3)
(b) If the set S is starshaped with respect to some x E S and if the inequality (43) is satisfied, then x is a minimal point of f on S. Proof. (a) We fix an arbitrary x E. S and assume that the inequahty (4.3) does not hold. Then there are a vector h G T{S^ x) \ {Ox} and a number a > 0 with f{x) ~f{x + h)>a>0.
(4.4)
By the definition of h there are a sequence {xn)neN of elements in S and a sequence {Xn)neN of positive real numbers with X = lim Xn
and h = lim hn n—»oo
where hn '•= Xni^n — ^) for all n G N. Because of h ^ Ox we have lim — = 0. Since / is convex and continuous, we obtain with the inequality (4.4) for sufficiently large neN: fM
=
f(—x
An
+ Xnx
+
^
< l{f(x + h) +
= m
x—x)
^n /
a)+{ll)f{x)
90
Chapter 4. Tangent Cones
Consequently, x is not a minimal point of / on 5. (b) If the set S is starshaped with respect to some x E S, then it follows by Theorem 4.8
S{x}cT{S,x). Hence we get with the inequality (4.3) f{x) < f{x + h) for dllheS
{x},
i.e., X is a minimal point of f on S.
•
{x} + T{S,x)
{xex\
f{x) = f{x)}
Figure 4.6: Geometric illustration of the result of Theorem 4.13.
Using Frechet derivatives the following necessary optimality condition can be formulated. Theorem 4.14. Let S be a nonempty subset of a real normed space (X, II • 11)^ and let f be a functional defined on an open superset of S. If X ^ S is a minimal point of f on S and if f is Frechet differentiable at x, then it follows f{x){h)
>OforallheT{S,x).
4.2.
Optimality Conditions
91
Proof. Let x G 5 be a minimal point of / on 5, and let some h G T(5, x) \ {Ox} be arbitrarily given (for h = Ox the assertion is trivial). Then there are a sequence {xn)nen of elements in S and a sequence (An)nGN of positive real numbers with X = lim Xn n—>cx)
and h = lim hn n—>oo
where hn •— ^n{^n — ^) for all n G N. By the definition of the Frechet derivative and because of the minimality of / at X it follows: f\x){h)
=
f{x)(limXr,{Xnx))
=
\imXnf'ix){xnx) n—>oo
= >
—
l i m X„[f{Xn)
n—>oo
 f{x)
\imXn{f{Xn)n—^oo
 {f{Xn)
f{x)
 f{x)
II 11 \\Xn — X\\

x))]
f'{x){Xnx))
l:^ II ^. II/(^")  / ( ^ ) ~ f'i^){^n
n m ll't'nil n^oo
 f'{x){Xn
 X)
= 0. Hence, the assertion is proved.
•
Next, we investigate under which assumptions the condition in Theorem 4.14 is a sufficient optimahty condition. For this purpose we define pseudoconvex functionals. Definition 4.15. Let 5 be a nonempty subset of a real linear space, and let / : 5 —> R be a given functional which has a directional derivative at some x G 5 in every direction x — x with arbitrary x E S. The functional / is called pseudoconvex at x if for all x G 5 f{x){xx)>0
=>
f{x)f{x)>0.
92
Chapter 4. Tangent Cones
Example 4.16. The functions / : R  > R a n d g : R  > R with /(x) = xe^ for all X G R and
1 g{x) = —q—^— for all x G 1+X2 are pseudoconvex at every x G R. But the two functions are not convex. A relationship between convex and pseudoconvex functionals is given by the next theorem. Theorem 4.17. Let S be a nonempty convex subset of a real linear space, and let f : S ^ R be a convex functional which has a directional derivative at some x E S in every direction x — x with arbitrary x E S. Then f is pseudoconvex at x. Proof. We fix an arbitrary x E S. Because of the convexity of / we get for all A G (0,1] f{Xx + (1  X)x) < Xf{x) + (1  A)/(x) and
fix)
> f{x) +
jif{Xx+{lX)x)fix))
= f{x) + ^{f{x +
X{xx))f{x)).
Since / has a directional derivative at x in the direction x — x, we conclude
f{x)m>f'{x){xx). Consequently, if f'{x){x — x) > 0, then
fix)  fix) > 0. Hence / is pseudoconvex at x.
•
It is also possible to formulate a relationship between quasiconvex and pseudoconvex functionals.
4.2. Optimality Conditions
93
Theorem 4.18. Let S be a nonempty convex subset of a real normed space, and let f be a functional which is defined on an open superset of S. If f is Frechet differentiable at every x E S and pseudoconvex at every x E S, then f is also quasiconvex on S. Proof. Under the given assumptions we prove that for every a G M the level set S^:={xeS\
fix) < a]
is a convex set. For this purpose we fix an arbitrary a G E so that Sa is a nonempty set. Furthermore we choose two arbitrary elements x^y E Sa In the following we assume that there is a A G [0,1] with /(Ax + (1  X)y) >a>
max{/(x), f{y)}.
Then it follows A G (0,1). Since / is Frechet differentiable on S, by Theorem 3.15 / is also continuous on S. Consequently, there is a A G (0,1) with /(Ax + (1  ~X)y) > f{Xx + (1  A)^) for all A G (0,1). Using Theorem 3.13 and Theorem 3.8,(a) (which is now applied to a maximum problem) it follows ior x := Xx + (1 — X)y
nx){xx)nx){xx)
=
{lX)f\x){xy)
and 0>nx){yx)
=
Xf{x){xy).
(4.5)
94
Chapter 4. Tangent Cones
Hence we have 0=
f'{x){xy),
and with the equahty (4.5) it also follows
f'{x)iyx)=0. By assumption / is pseudoconvex at x and therefore we conclude
m  fix) > 0. But this inequahty contradicts the following inequality: f{y)m
= f{y)f{Xx < f{y)f{Xx
+ {l~X)y) + {lX)y)
< 0.
•
Using Theorem 3.13 the result of the Theorems 4.17 and 4.18 can be specialized in the following way: If (X,  • ) is a real normed space and if / : X —> R is a functional which is Frechet diflPerentiable at every x G X, then the following implications are satisfied: / convex => =^
f pseudoconvex at every x E X / quasiconvex.
After these investigations we come back to the question leading to the introduction of pseudoconvex functionals. With the next theorem we present now assumptions under which the condition in Theorem 4.14 is a sufficient optimality condition. T h e o r e m 4.19. Let S be a nonempty subset of a real normed space, and let f be a functional defined on an open superset of S. If S is starshaped with respect to some x E S, if f is directionally differentiable at x and pseudoconvex at x, and if f{x){h)>OforallheT{S,x), then X is a minimal point of f on S.
4.3. A Lyusternik Theorem
95
Proof. Because of the starshapedness of S with respect to x E S it follows by Theorem 4.8 S — {x} C T(S', x), and therefore we have f{x){xx)
>OforallxG;S.
Since / is pseudoconvex at x, we conclude f{x)  f{x) > 0 for all x e S, i.e., X is a minimal point of f on S.
•
Notice that the assumption in Theorem 3.8,(b) under which the inequality (3.1) is a sufficient condition, can be weakened with the aid of the pseudoconvexity assumption. This result is summarized with Theorem 3.8 in the next corollary. Corollary 4.20. space, and let f : S functional f have a direction x — x with Then x is a minimal
Let S be a nonempty subset of a real linear —^ R be a given functional. Moreover, let the directional derivative at some x E S in every arbitrary x E S and let f be pseudoconvex at x. point of f on S if and only if
f\x){xx)
4.3
>OforallxeS.
A Lyusternik Theorem
For the application of the necessary optimality condition given in Theorem 4.14 to optimization problems with equality constraints we need a profound theorem which generalizes a result given by Lyusternik^ published in 1934. This theorem says under appropriate assumptions that the contingent cone to a set described by equality constraints is a superset of the set of the linearized constraints. Moreover, it can be shown under these assumptions that both sets are equal. ^L.A. Lyusternik, "Conditional extrema of functionals", Mat Sb. 41 (1934) 390401.
96
Chapter 4. Tangent Cones
Theorem 4.21. Let (X,  • \\x) and (Z,  • \\z) be real Banach spaces^ and let h : X ^ Z be a given mapping. Furthermore^ let some X E S with S:={xeX\ h{x) = Oz} be given. Let h be Frechet dijjerentiable on a neighborhood of x, let h'{) be continuous at x, and let h'{x) be surjective. Then it follows {xeX\
h'{x){x) = Oz} C T{S, x).
(4.6)
Proof. We present a proof of Lyusternik's theorem which is put forward by Werner [347]. This proof can be carried out in several steps. First we apply an open mapping theorem and then we prove the technical inequality (4.14). In the third part we show the equations (4.26) and (4.27) with the aid of a construction of special sequences, and based on these equations we get the inclusion (4.6) in the last part. (1) Since h\x) is continuous, linear and surjective by the open mapping theorem the mapping h\x) is open, i.e. the image of every open set is open. Therefore, if J5(0x, 1) denotes the open unit ball in X, there is some ^ > 0 such that B{Oz^Q)ch\x)B{OxA)
(4.7)
where 5(0^, g) denotes the open ball around Oz with radius g. Because of the continuity of h\x) there is a ^0 := sup{^ > 0 I B{Oz, g) C h\x) S(Ox, 1)}. (2) Next we choose an arbitrary e G (0, y ) . /i'() is assumed to be continuous at x, and therefore there is a 5 > 0 with \\h\x)  h\x)\\Lix,z) < e for all x e B{x,25).
(4.8)
Now we fix arbitrary elements x,x G B{x,2S). By a HahnBanach theorem (Thm. C.4) there is a continuous linear functional I G Z* with
II^IU* = 1
(4.9)
4.3. A Lyusternik Theorem
97
and l{h{S:)h{x)h\x){S:x))
= \\h{^)h{x)h\x){S:x)\\z^
(4.10)
Next we define a functional (/P : [0,1] ^ R by ip{t) = l{h{x + t(S:  x))  th'{x){5:  x)) for all t G [0,1].
(4.11)
Lp is differentiable on [0,1] and we get ip'{t) = l{h'{x + tii  x)){5: x)
h'{x){5:  x)).
(4.12)
By the mean value theorem there is a t e (0,1) with (^(1)  (^(0) = ^'(i).
(4.13)
Then we obtain with (4.10), (4.11), (4.13), (4.12), (4.9) and (4.8) \\h{5:)h{x)h'{x){3:x)\\z = l{h{x)  h{x)  h\x){5^ ~ x)) =
¥'(l)^(0)
=
l{h'{x + t{i  x)){S; ~x)
1 so that a (  + —) < 1 (notice that — <  ) . For the proof of the inclusion (4.6) we take an arbitrary x E X with h'{x){x) — Oz For x — Ox the assertion is trivial, therefore we assume that x j^ Ox We set A := nir stnd fix '
^
/
^
\\x\\x
an arbitrary A G (0, A]. Now we define sequences (r^)nGN and {un)neN as follows: ri = Ox,
98
Chapter 4. Tangent Cones
h'{x){un) = h{x + \x + Tn) for all n e N,
(4.15)
r^+i ^^VnUn for all n G N.
(4.16)
Since h'{x) is assumed to be surjective, for a given r^ G X there is always a vector u^ & X with the property (4.15). Consequently, sequences {rn)nen and {un)nm are welldefined (although they do not need to be unique). From the inclusion (4.7) which holds for ^ = ^ and the equation (4.15) we conclude for every n G N \\Un\\x 1 we have 9 < l   < i .
(4.19)
Then we assert for all n G N: r„U <  r f ( A ) i f ^ ,
(4.20)
\\h{x + Xx + rn)\\z 0 for all n G N An and Xn''=^X + XnX + r(An) for all n G N. From the equation (4.26) we get Xn^ S for aU n G N.
102
Chapter 4. Tangent Cones
Moreover, we have with (4.27) hm Xn = n—>oo
hm x + XnX + r{Xn) n—^oo
— hm X + \n\x 
n>oo X,
+
y
An
/
and we conclude hm iin{xnx)
=
n^oo
hm 7{Xn^+ r{Xn)) n^oo
^ =
A^
V _, r{Xn) hm X H—\—n^oo X.
A^
Consequently, we obtain x G T{S, x) which completes the proof. D
With the following theorem we show that the inclusion (4.6) also holds in the opposite direction. Theorem 4.22. Let {X, \\ • \\x) cind (Z,  • \\z) be real normed spaces, and let h : X —^ Z be a given mapping. Furthermore, let some X G S with S:={xeX\ h{x) = Oz} be given. If h is Frechet differentiable at x, then it follows T{S,x) C {x e X \h\x){x)
={)z}'
Proof. Let y G T(S', S)\{Ox} be an arbitrary tangent vector (the assertion is evident for y — Ox) Then there are a sequence {xn)neN of elements in S and a sequence (An)nGN of positive real numbers with X = lim Xn n—>oo
and y= lim yn
Exercises
103
where Vn '= ^n{^n ~ ^) for all u EN. Consequently, by the definition of the Frechet derivative we obtain: h'{x){y)
= h'{x)[\im =
Xn{xn  x)]
lim Xnh\x){xn — x) n—»oo
= =
lim Xn[h{xn)  h{x)  {h{xn)  h{x) ~ h\x){xn  x))] n—>oo r II II KXn)  l i m WVnWx n>oo

h{x) ~ h'{x){Xn n ZTj \\Xn ~x\\x
X)
= Oz. D
The proof of the preceding theorem is similar to the proof of Theorem 4.14. Since the assumptions of Theorem 4.22 are weaker than those of Theorem 4.21, we summarize the results of the two preceding theorems as follows: Under the assumptions of Theorem 4.21 we conclude T{S,x) = {xeX\ h\x){x) = Oz}.
Exercises 4.1) Let C be a convex cone in a real normed space with nonempty interior int(C). Show: int(C)= int(C)+C 4.2) Let X be a real linear space. Prove that a functional / : X —> R is subhnear if and only if its epigraph is a convex cone. 4.3) Let iS be a nonempty convex subset of a real linear space. Show that the cone generated by S is convex. 4.4) In M^ let the set S := {{x,y) e R'^ \ x + y < 1, 2x + y < 4, 0 < x < § , ? / > 0} be given. Determine the cone generated hyS. 4.5) Let the set S be given as in Exercise 4.4). Determine the contingent cone to 5 at (1, 2).
104
Chapter 4. Tangent Cones
4.6) Let S' be a nonempty subset of a real normed space (X,  • ) with nonempty interior int(5). For every x E int(S') show
T{S,x)=X. 4.7) Let 5'i and S2 be two nonempty subsets of a real normed space. Prove the following implications: (a) xeSiCS2 => T{Sux)cT{S2,x), (b) xeSinS2 ^ T{SinS2,x)cT{Si,x)nT{S2,x). 4.8) Let 5 be a nonempty subset of a real normed space (X,  • ), and let some x E S he arbitrarily given. Show: T{S^ x) = {h E X \ there are a number a > 0 and a mapping r : {0,a] —^ X with lim r{t) = Ox, and there is a sequence t^o+ t {tn)neN of positive real numbers converging to 0 so that x + tnh + r{tn) G S for all n G N}. 4.9) Let X be an element of a subset iS of a real normed space. Prove that the Clarke tangent cone Tci{S^x) is closed and convex. 4.10) Is the function / : R > R with f{x) = x^ for all x G R pseudoconvex at an arbitrary x G R?
Chapter 5 Generalized Lagrange Multiplier Rule In this chapter we investigate optimization problems with constraints in the form of inequahties and equahties. For such constrained problems we formulate a multiplier rule as a necessary optimality condition and we give assumptions under which this multiplier rule is also a sufficient optimality condition. The optimality condition presented generalizes the known multiplier rule published by Lagrange in 1797. With the aid of this optimahty condition we deduce then the Pontryagin maximum principle known from control theory. The classical Lagrange multiplier rule is a generalization of a Fermat theorem (given in 1629) to optimization problems with constraints in the form of equahties. Lagrange developed this rule in connection with problems from mechanics. First he applied this principle to infinite dimensional problems of the classical calculus of variations (where it led to the EulerLagrange equation) and later he extended it also to finite dimensional problems.
5.1
Problem Formulation
First, we present the class of optimization problems for which we formulate the generalized Lagrange multiplier rule as an optimality condition. Furthermore, we discuss several examples.
Chapter 5. Generalized Lagrange Multiplier Rule
106
The standard assumption of this chapter reads as follows: Let (X, II • \\x) and (Z, jj • ^) be real Banach spaces; let (Y, II • y) be a partially ordered real normed space with ordering cone C with a nonempty interior int(C); let 5 be a convex subset of X with nonempty interior int(5); let / : X —> R be a given functional, and let g : X —^ Y^ h : X —^ Z he given mappings; furthermore let the constraint set S:={xeS\ g{x) G  C , h{x) = Oz} be nonempty.
(5.1)
Under this assumption we consider the optimization problem min/(x), i.e., we are looking for minimal points of / on S. The following examples illustrate why the considered class of constraint sets is important for applications. Example 5.1. (a) We consider again the design problem in Example 1.1. For this optimization problem the constraint set reads as follows: S := {x eR^ \ 2000 < x^rrg, xi < Ax2, X2 < Xi, Xi > 0, ^2 > 0}. This set can be obtained, for instance, if we choose in the standard assumption (5.1): X = \Y = R\C = M^,S = Rl andg : R^ R^ with g{xi,x2)
for all (xi,a;2) G
Notice that the mapping h does not appear explicitly (formally, one can also choose the mapping being zero). (b) In Example 1.4 an optimization problem is given which has the constraint set S := {(x. A) G R2 I ax sinha < A for all a G [0,2], ax — siuha > —A for all a G [0,2]}.
5.1. Problem Formulation
107
For the description of this set we choose especially in the standard assumption (5.1): X = R'^,Y = C[0,2f, C = {(v?i, (^2) e C[0,2f I ^i{t) > 0 and (p2{t) > 0 for all t G [0,2]}, ^ = M2 and c/: E^ ^ C[0,2f with ,
.^
/
a: id — sinh —Al \
^
„ .
,^
^0
^(^'^)==(a;id + sinhAljf'^'^^^^(^"^)^^Let id denote the identity on [0,2], and let 1 denote the C[0,2] function which equals 1 on [0, 2]. The mapping h does not appear explicitly. (c) In nonlinear control theory one considers, among other things, the following dynamical system with additional conditions: x{t) = f{x{t)^u{t))
almost everywhere on [to,ti],
g{x{ti)) = OMr, u{t) G VL almost everywhere on [to^^i]Next, we discuss the used notations and the necessary assumptions. The control process is considered on the fixed time interval [toj^i] where —oo < to < ^i < oo. Let the control u be an L ^ function, i.e., u G L^[£o,ti] The dynamical system is described by a system of ordinary differential equations of first order. Let the function / : j^n y^ j^m __^ ]^n j ^ ^ contiuuously partially differentiable. If we define ^I!OO[^OJ^I]
'— {^ • [^Oj^i] —^ ^^ absolutely continuous  i;GL^[to,ti]},
then the space W[oo[^05^i] equipped with the norm  •  defined by \\x\\ = max{xLs,[to,til, iUso[to,ti]} for all x G Wl^[U,ti] is a Banach space. A solution x of the differential equation x=
f{x,u)
for a fixed u G L^\tQ^ ti] is defined as a function x G Wi^^\t^^ ti] with x{t) = f{x{t)^u{t))
almost everywhere on [toj^i]
108
Chapter 5. Generalized Lagrange Multiplier Rule
Then we conclude with the initial condition x(to) — XQ (where XQ E M.^ is a given vector) t
x{t) = xo+
f{x{s),u{s))ds
for all t G [to^^i]
to
For the terminal state x{ti) we require that g{x{ti)) = ORV where ^ : M^ ^ R^ is a continuously partially difFerentiable vector function. Let fi be a convex subset of E ^ with nonempty interior. Among all feasible controls one tries to determine such a control for which a given functional becomes minimal. For the description of the constraint set S of this optimization problem we use the following notations in the standard assumption (5.1): X = W^i^oo[^o?^i] x LZ[to,tilZ = W^i%[to,ti] xW,S = {{x,u) e X I u{t) e n almost everywhere on [to,ti]}, and h : X ^ Z with h{x, u) = ( ^(^  ^°  / /(^(^)' ^(^)) ^' ] for all {X, u) G X.
The constraint g does not appear explicitly in (5.1).
5.2
Necessary Optimality Conditions
In this section we present, under the assumption (5.1), a necessary condition for minimal points of / on S. This optimality condition generalizes the known Lagrange multiplier rule. As an essential tool for the proof of the multipher rule we need the following lemma which is obtained with the aid of the necessary optimality condition of Theorem 4.14 and the Lyusternik theorem. Lemma 5.2. Let the assumption (5.1) he satisfied, and let x he a minimal point of f on S. Let the functional f and the mapping g he
5.2. Necessary Optimality Conditions
109
Frechet differentiable at x. Let the mapping h be Frechet differentiable in a neighborhood of x, and let h'{) be continuous at x. Moreover, let the mapping h'{x) be surjective. Then there is no x E int(S) with f'{x){x — x) < 0, g{x) + g'{x){x — x) G — int(C) and h\x){x — x) —
Proof. Let x G iS be a minimal point of / on S. We fix an arbitrary x G int(S') with x ^ x^ g{x) + g\x){x — x) G ~ int(C) and h\x){x — x) = ^z (if such an x does not exist, the assertion is evident). By the Lyusternik Theorem 4.21 we get x — x E T{S,x) with S:={xeX\ h{x) = Oz}. Consequently, there are a sequence {xn)neN of elements in S and a sequence {Xn)neN of positive real numbers with X = h m Xn
and X — X = lim Hn
(5.2)
n—>'00
where Vn '= ^ni^n
— ^) for a l l 71 G N .
Because of x G int(S') we obtain with the equation (5.2) X + yn E S ioi sufficiently large n G N. Then we get with the convexity of S for sufficiently large n G N
_ _ J_ = x + —{yn + =
(l—)x
xx)
+ —{yn +
x)eS,
and therefore we have Xn G 5 n 5 for sufficiently large n G N.
(5.3)
110
Chapter 5. Generalized Lagrange Multiplier Rule
For the constraint g we obtain
aM ^ gMg'{x){xnx)
+ —g\x){yn)
= r~^K{g{xn)  g{x)  g'{x){xn  x)) + g\x){yn  {x  x)) +g{^) + g\^)i^
 ^)] + f 1  T)^(^) fo^ ^1^ ^^^'
(54)
For n ^ cxD we conclude with A^ = ,•''_l^ for sufficiently large n EN and the Frechet differentiability of g K{g{xn)  g{x)  g'{x){xn  x)) + g'{x){yn {x
x)) ^ 0.
(5.5)
Because of g[x) + g'{x){x x)
e  int(C)
it follows with (5.5) for sufficiently large n G N K{g{xn)  g{x)  g\x){xn  x)) + g'{x){yn  {x  x)) +g{^) + g\^){^  s) ^ c.
(5.6)
Since g{x) G —C, we get from (5.4) and (5.6) with the convexity of C g{xn) G — C for sufficiently large n G N. Hence we obtain with (5.3) for sufficiently large n G N XnE S = {x e S \ g{x) G  C , h{x) = 0^}, and it follows
xxeT{S,x). Then we conclude with Theorem 4.14
f\x){xx)>0. This leads to the assertion.
•
Now we are able to present the generalized Lagrange multiplier rule.
5.2. Necessary Optimality Conditions
111
Theorem 5.3. Let the assumption (5.1) he satisfied, and letx be a minimal point of f on S. Let the functional f and the mapping g he Frechet differentiahle at x. Let the mapping h he Frechet differentiahle in a neighborhood of x, let h\) he continuous at x, and let the image set h'{x){X) he closed. Then there are a real number /J. > 0 and continuous linear functionals li G C* and I2 G Z* with (/i,/i,/2) 7^ (0,0y*,0z*). {fif{x) + /i o g\x) + I20 h\x)){x  x) > 0 for allxeS
(5.7)
and h{gi^)) = 0.
(5.8)
If in addition to the above assumptions, ^',[1] ) cone {S  {x}) + cone ( ^ + ^^^j^^^ ^=YxZ,
(5.9)
then it follows /i > 0. Proof. For the proof of this theorem we consider the two cases that h\x) is not surjective or alternatively that h'{x) is surjective. First, we assume that h\x) is not surjective. Then there is a ^ G Z with z ^ h\x){X) = cl{h\x){X))^ and by a separation theorem (Theorem C.3) there is a continuous linear functional I2 G Z''\{Oz^} with Uz) < inf kiz). zeh'{x){X)
Because of the linearity of h'{x) it follows for every z G h'{x){X) hiz) < kiXz) = Xkiz) for all A G R, and so we get kiz) = 0 iov all z e
h\x){X)
resulting in l2oh\x)=^0x*If we set fi = 0 and /i = Oy*, then the inequality (5.7) and the equation (5.8) are fulfilled, and the first part of the assertion is proved.
112
Chapter 5. Generalized Lagrange Multiplier Rule
For the following we assume the surjectivity of h'{x). In the product space R X y X Z we define the nonempty set
M := {{f\x){xx) + a, g{x)+g\x){xx)+y, h\x){xx)) eRxY X Z \xe mt{S), a > 0, ye int(C)}, and we show several properties of this set. First, we prove that M is a convex set. For this proof we fix two arbitrary elements (ai,6i,Ci), (a2,625^2) G M and an arbitrary A G [0,1]. By definition there are elements Xi,ai,yi and X2,c^2??/2 with the properties ai = f{x){xi h = g{x) + g\x){xi
x) x)+
Ci = h\x){xi
+ ai, a2  f{x){x2  x) + 0^2, yi, 62 = 9{x) + g\x){x2  x) + ?/2, — x), C2 = h'{x){x2 — x).
Consequently, we obtain Aai + (1  A)a2 = f{x){Xxi
+ (1  A)x2  x) + Aai + (1  A)a2,
A61 + (1  A)62 = g{x) + g\x){Xxi + (1  X)x2 x) + Xyi + (1  A)?/2, Aci + (1  A)c2 = h\x){Xxi + (1  X)x2  x) which implies A(ai,61,ci) + (lA)(a2,62,02) E M. Next, we show that M is an open set (i.e. M = int(M)). Since int (M) C M by definition, we prove the inclusion M C int(M). We choose an arbitrary triple (a, 6, c) G M. Then there are elements x G int(S'), a > 0 and y G int(C) with a = / ' ( x ) ( x — x ) + a, b =^ g{x) + g'{x){x  x) \ y and c = h'{x){x — x).
5.2. Necessary Optimality Conditions
113
The mapping h'{x) is continuous, linear and surjective. By the open mapping theorem the image of every open set is open under the mapping h\x). If we notice furthermore that a > O^y E int(C) and that Frechet derivatives are continuous and hnear, it follows (a, 6, c) G int (M). By Lemma 5.2 we have (0,0y,0z)^M, i.e. Mn{(O,Oy,Oz)} = 0. By the Eidelheit separation theorem (Theorem C.2) there are a real number //, continuous hnear functional li G F* and I2 G Z* and a real number 7 with (^,/i,/2) 7^ (0,0y*,0^*) and li{f{x){x  x) + a) + li{g{x) + g\x){x x) +l2{h\x){x ~ x)) > 7 > 0 for all X G int(^), a > 0 and j/ G int(C).
+ y) (5.10)
If we notice that every convex subset of a real normed space with nonempty interior is contained in the closure of the interior of this set, then we get from the inequality (5.10) li{f{x){x  x) + a) + li{g{x) + g\x){x x) +l2{h\x){x  x)) > 7 > 0 for all 2; G ^, a > 0 and y G C.
+ y) (5.11)
From the inequality (5.11) we obtain for x = x fia + li{g{x) +y)>0
for all a > 0 and y eC.
(5.12)
With a = 1 and y = —g{x) we get /x > 0. From the inequality (5.12) it follows for a = 0 h{g{x)) >h{y)
ioi all yeC.
(5.13)
Assume that for some ^ G C it is /i(?/) < 0, then with Xy E C for some sufficiently large A > 0 one gets a contradiction to the inequality (5.13). Therefore we have hiy) > O f o r a l l y G C ,
(5.14)
114
Chapter 5. Generalized Lagrange Multiplier Rule
i.e., li is an element of the dual cone (7* of C, Moreover, the inequality (5.13) implies li{g{x)) > 0. Since x satisfies the inequality constraint, i.e., it is g{x) G —C, we also conclude with the inequality (5.14) li{g{x)) < 0. Hence we get li{g{x)) = 0 and the equation (5.8) is proved. Now, we show the equation (5.7). For a = 0 and y = —g{x) we obtain from the inequality (5.11) fj.f{x){x x)
+ li{g'{x){x  x)) + l2{h\x){x  x)) > 0 for all x G ^
and (/i/(x) + /i o g\x) + I20 h\x)){x x)>0
for all x e S,
Finally, we consider the case that in addition to the given assumptions
(g))oo„e(5W).e(^+{f)>)=yxZ. For arbitrary elements y E Y and z E Z there are nonnegative real numbers a and /? and vectors x E S and c E C with y = g\x){a{x
 x)) + /?(c + g{x))
and z = h'(x){a{x — x)). Assume that /i = 0. Then we obtain with the inequality (5.7), the equation (5.8) and the positivity of l\ h{y) + i2{z) = {h o g\x)){a{x > 0.
 x)) + Ph{c + g{x)) + [h o h'{x)){a{x  x))
Consequently, we have li = Oy* and I2 = Oz* But this contradicts the assertion that (/x,h^h) 7^ (0,Oy*,0^*). • Every assumption which ensures that the multipher /x is positive is also called a regularity assumption. We call the additional assumption
5.2. Necessary Optimality Conditions
115
(5.9) given in Theorem 5.3 the KurcyuszRobins onZowe^ regularity conditon. Notice that a regularity assumption is only a condition on the constraint set S and not a condition on the objective functional / . For // == 0 the inequality (5.7) reads (/i o g'{x) + I20 h\x)) (x  x) > 0 for all
xeS,
and in this case the generalized Lagrange multiplier rule does not contain any information on the objective functional / — this is not desirable. Therefore, in general, one is interested in a necessary optimality condition with /a > 0. For /i > 0 the inequality (5.7) leads to (f{x) \
+ h o g'{x) + h o h'{x))(x  x) > 0 for all jJL
Li
xeS,
/
and from the equation (5.8) it follows
If we define the continuous linear functional u :== li E C* and V := H2 E Z"", then we obtain {f{x) +UO g\x) ^vo h\x)){x  x) > 0 for dl\ x e S
(5.15)
and u{g{x))  0. The functional L := f + uog + vohis also called Lagrange functional Then the inequality (5.15) can also be written as L\x){xx)
> 0 for a l l x G ^
where L\x) denotes the Frechet derivative of the Lagrange functional at X. ^S.M. Robinson, "Stability theory for systems of inequalities in nonlinear programming, part II: differentiable nonlinear systems", SIAM J. Numer. Anal 13 (1976) 497513. J. Zowe and S. Kurcyusz, "Regularity and stability for the mathematical programming problem in Banach spaces", Appl. Math. Optim. 5 (1979) 4962.
116
Chapter 5. Generalized Lagrange Multiplier Rule
If the superset S of the constraint set S equals the whole space X, then the generalized Lagrange multiplier rule can be specialized as follows: Corollary 5.4. Let the assumption (5.1) with S = X be satisfied, and let x be a minimal point of f on S. Let the functional f and the mapping g be Frechet differentiable at x. Let the mapping h be Frechet differentiable in a neighborhood of x, let h\') be continuous at x and let h'{x){X) be closed. Then there are a real number fJ^ > 0 and continuous linear functional li G C* and I2 E Z* with (fi J1J2) 7^ (0,0y*,0z*);
fifix)
+ I10 g\x) + I20 h\x) = Ox*
and h{g{x)) = Q. If, in addition to the above assumptions, the KurcyuszRobins onZowe regularity assumption (5.9) is satisfied, then it follows fi> 0. Proof. In this special setting the inequality (5.7) reads (/x/(x) + I10 g'{x) + I20 h\x)) (x  x) > 0 for all x G X which implies because of the linearity of the considered mappings ^lf\x) + /i o g'{x) + I2 o h\x) = OxThen the assertion follows from Theorem 5.3.
•
The assumptions of Theorem 5.3 (and also those of Corollary 5.4) can be weakened considerably: Instead of the assumption that int(C) is nonempty and h'{x){X) is closed, Theorem 5.3 can also be proved under the assumption that either the set f  ) ) cone ( 5  { . } ) + cone ( ^ + < f ) > is closed or the product space Y x Z is finite dimensional (compare Theorem 5.3.6 in the book [347] by Werner).
5.2. Necessary Optimality Conditions
117
In the proof of Theorem 5.3 we have shown the following implication: If the KurcyuszRobinsonZowe condition is satisfied at some X E S^ then the generalized Lagrange multiplier rule is not fulfilled with /i = 0 at X. Conversely we prove now: If the generalized Lagrange multiplier rule does not hold with /i = 0 at some x E S^ then a condition is satisfied at x which is in a certain sense a modified KurcyuszRobinsonZowe condition (condition (5.16)). This result shows that the KurcyuszRobinsonZowe condition is a very weak regularity assumption. Theorem 5.5. Let the assumption (5.1) he satisfied (without the assumption int{C) ^ ^), and let some x E S be given. Let the mappings g and h be Frechet differentiable at x. If there are no continuous linear Junctionals li G C* and I2 G Z* with {I1J2) 7^ (Oy*,0^*); (/i o g^{x) + I20 h'{x)) (x — x) > 0 for all x E S and h{g{x))=0, then it follows
Proof. We prove the assertion by contraposition and assume that there is a pair (y^z) EY x Z with
The set appearing in the right hand side of this condition is nonempty, closed and convex. By a separation theorem (Theorem C.3) there is then a continuous linear functional (h^h) EY"" x Z"" with {h^h) 7^ (Oy*,02:*) and k{y) + l2{z) < {hog\x)){a{xx))+Ph{c + g{x)) +{l2 o h'{x)) {a[x  x)) for all a > 0, /? > 0, X G ^, c G C.
118
Chapter 5. Generalized Lagrange Multiplier Rule
With standard arguments it follows a{li o g\x) + I20 h\x)){x  x) + (5li{c + g{x)) > 0 for sl\a>0,
P>0,xeS,ceC.
(5.17)
Then we get with a = 0 and P = I hie) > h{g{x))
for a l l c e C
which leads to h G C* and li{g{x)) = 0. From the inequality (5.17) we obtain with a = 1 and P ~ 0 {h o g\x) + I20 h\x)) (x  x) > 0 for all
XES.
Hence the generalized Lagrange multiplier rule is fulfilled with /i = 0 at X. • The KurcyuszRobinsonZowe regularity assumption may seem to be unwieldy. In the following we see that there are simpler (and therefore more restrictive) conditions implying this regularity assumption. Theorem 5.6. Let the assumption (5.1) be satisfied, and let some X ^ S be given. Let the mappings g and h be Frechet differentiable at X. If the mapping h'{x) is surjective and if there is a vector x G int{S) with g{x) + g\x){x — x) G —int{C) and h\x){x — x) =^ Oz, then the KurcyuszRobinsonZowe regularity assumption (5.9) is satisfied. Proof. Let y ^ Y and z E Z he arbitrarily given elements. Because of the surjectivity of h\x) there is a vector x E X with h'{x){x) = z. Then we have z = h'{x){x + \{x  x)) for all A > 0. Since x G int(5'), it follows for sufficiently large A > 0 X + \{x — x) = X(x + X — x) G cone(5'  {x}). Because of g{x) + g'{x){x — x) e —int(C) we also get for sufficiently large A > 0 g{x)  g\x){x x)
+ j{y  g\x){x))
G C.
5.2. Necessary Optimality Conditions
119
If we notice that y = g'{x){x + X{xx)) +j{yg'{x){x
+
+
\{^g{x)+g{x)
X{xx)))^
= g'{x){x {• X{x  x)) + X\^ g{x)  g'{x){x  x) + ^ ( y  g\x){x)) + g{x))
for all A > 0,
then we conclude
Consequently, the KurcyuszRobinsonZowe regularity assumption (5.9) is satisfied. • For the proof of the generalized Lagrange multiplier rule we have assumed that the ordering cone C has a nonempty interior. If we drop this restrictive assumption, the following example shows that the KurcyuszRobinsonZowe condition can be satisfied although the regularity assumption of Theorem 5.6 is not fulfilled. Example 5.7. We consider especially X = Y = L2[0,1] with the natural ordering cone C := {x G 1/2[0,1] I x{t) > 0 almost everywhere on [0,1]} (notice that int(C) = 0). For an arbitrary a G I/2[0,1] we investigate the optimization problem min < x^x > subject to the constraints X—ae C xeC. Let < •, • > denote the scalar product in the Hilbert space I/2[0,1]. Since the ordering cone C is closed and convex, this optimization
120
Chapter 5. Generalized Lagrange Multiplier Rule
problem has at least one minimal solution x (by Theorem 2.18). If we define the set S := C and the constraint mapping g : X —^Y by g{x) = —X + a for all x G X, then we obtain for this minimal solution x g\x) cone(5  {x}) + cone(C + {g{x)]) = g'{x) cone((7  {x}) + cone(C + {g{x)}) = C + cone({5}) + C + cone{{g{x)}) = X because we have X = C—C. Hence this optimization problem satisfies the KurcyuszRobinsonZowe condition. In the following we turn our attention to finite dimensional problems. We specialize Corollary 5.4 for such problems. In this finite dimensional setting one speaks of the socalled F. John conditions^ and in the case of // > 0 one speaks of the KarushKuhnTucker conditions^. Theorem 5.8. Let the objective function f : W^ ^ R and the constraint functions g :W^ ^ W^ and h :W^ ^MP be given. Let the constraint set S which is assumed to be nonempty be given as S :=^ {x eMJ" I gi{x) < 0 for alii e {1,...,m} and hi{x) = 0 for all z G { 1 , . . . ^p}}Let X E S be a minimal point of f on S. Let f and g be differentiable at X and let h be continuously differentiable at x. Moreover^ let the ^F. John, "Extremum problems with inequalities as side conditions", in: K.O. Priedrichs, O.E. Neugebauer and J.J. Stoker (eds.). Studies and Essays^ Courant Anniversary Volume (Interscience. New York, 1948). ^W.E. Karush, Minima of functions of several variables with inequalities as side conditions (Master's Dissertation, University of Chicago, 1939). H.W. Kuhn and A.W. Tucker, "NonUnear programming", in: J. Neyman (ed.). Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley, 1951), p. 481492.
5.2. Necessary Optimality Conditions
121
following regularity assumption be satisfied: Assume that there is a vector X EMJ^ with Vgiix^x
< 0 for all i e I{x)
and Vhi{x) X = 0 for all z G { 1 , . . . ,p}, and that the vectors V/ii(x),..., V/ip(x) are linearly independent. Here let I{x) :  { z G l , . . . , m } \g^{x) = Q} denote the index set of the inequality constraints which are ^'active'' at X. Then there are multipliers Ui > 0 {i E I{x)) and Vi E M. {i E { 1 , . . . ,p}) with the property V
V/(x) + Y^ UiVgiix) + Y^ViVhi{x) iel{x)
= O^n.
i=l
Proof. We verify the assumptions of Corollary 5.4. h^{x) is surjective because the vectors V/ii(x),..., Vhp{x) are linearly independent. The ordering cone C on W^ is given as C = W^, Then we have int(C) = {yeW\yi>{) for all i G { 1 , . . . ,m}} and C* A>0
. Consequently, we obtain for some sufficiently small gi{x) + XVgi{xfx
\
g{x) + g\x){Xx) =
G int(C) gm{x) + XVgm{x)^x J
and ( h\x){Xx)
XWhi{xfx
=
y xvhp{xYx
122
Chapter 5. Generalized Lagrange Multiplier Rule
Because of Theorem 5.6 the KurcyuszRobinsonZowe regularity assumption is then also satisfied. By Corollary 5.4 there are elements /i > 0, /i G R!^' and k G W with m
p
and m i=l
For u := ^i G W? and v := H2 G R^ it follows m
p
Vf{x) + ^UiVgi{x)
+ ^ViVhi{x)
= ORn
(5.18)
and m
J2uM^)=0,
(5.19)
Because of the inequalities gi{x) < 0 for alH G { 1 , . . . , m}, Ui>0
for alH G { 1 , . . . , m}
and the equation (5.19) we obtain UiQiix) = 0 for alH G { 1 , . . . , m}.
(5.20)
For every i G { 1 , . . . ,m}\/(x) we get gi{x) < 0, and therefore we conclude with (5.20) Ui = 0. Hence the equation (5.18) can also be written as p iel{x)
i=l
D
123
5.2. Necessary Optimality Conditions
V^2(S)
r ^ ^9s{x)
m = fix) 9s{x)=^0 Figure 5.1: Geometric interpretation of Theorem 5.8.
The regularity assumption given in the previous theorem is also called ArrowHurwiczUzawa condition. Figure 5.1 illustrates the result of Theorem 5.8. If a finite optimization problem has only constraints in the form of inequalities, then a simple sufficient condition for the ArrowHurwiczUzawa regularity assumption can be given. This condition presented in the next lemma is also called Slater condition. Lemma 5.9. Let g : let the constraint set S :={xeW\
be a given vector function, and
gi{x) < 0 for all i G { 1 , . . . ,m}}
be nonempty. If the functions gi^... ,gm CLTC differentiable and convex, and if there is a vector x EM.^ with gi{x) < 0 for a// i G { 1 , . . . , m},
124
Chapter 5. Generalized Lagrange Multiplier Rule
then the regularity assumption of Theorem 5.8 is satisfied. Proof. With Theorem 3.16 we get for every x E S 9i{x) > Qiix) + Vgi{xY{x
 x) for alH G { 1 , . . . , m},
and then by the assumptions we conclude Vgi{xf{x
x)
< gi{x)  gi{x) < 0 for all i G I{x)
where I{x) denotes the index set of the inequality constraints being "active" at S. • The Slater condition can be checked with the aid of the constraint functions gi^... ^gm without the knowledge of the minimal point x. But this condition is also very restrictive since, in general, one has to assume that the functions fi^i,..., fl'm are convex. Example 5.10. We consider again Example 1.1. For X = M? we define the objective function / by f{x) = lxiX2 for all x G R^. The constraint functions gi^... ^g^ are given as gi{x) =2Qmx\x2
\
92(00) = x i  4x2
gs{x) =
xx'Vx2
g^{x)
=
Xi
g^{x)

X2
> for all X G
The constraint set S reads as follows S' := {x G R^ I gi{x) < 0 for alH G { 1 , . . . , 5}}. In this example there are no equality constraints. Figure 5.2 illustrates the constraint set. One can see immediately that the constraints described by 5^4 and g^ do not become active at any x E S. These constraints are therefore called redundant.
5.2. Necessary Optimality Conditions
125
Xi + X2 = 0
Xi — 4x2 = 0 ^ lxiX2 = 200/  lxiX2 = 100/
Figure 5.2: Illustration of the constraint set S.
For X := (20,10) the set Sa:={xeS\
f{x) < a}
with a := f{x) — 200/ is certainly compact because of the continuity of / . Hence / has at least one minimal point x on 5a. Then x is also a minimal point of f on S. If we notice that the assumptions of Theorem 5.8 are satisfied (e.g., the regularity assumption is satisfied for X = £ — x), then there are multipliers ui^U2^us > 0 (g^4 and g^ do not become active) with the property
V/(x)+ J2 ^i^9i{^)=^R^' iei{x)
126
Chapter 5. Generalized Lagrange Multiplier Rule
For the calculation of x,ui^U2 and us one can investigate all possible cases of no, one or two constraints being active at x. For the following we assume that gi and g2 are active. Then we get
.(5f)(0 = (o 2000 = x\x2, Xi = 4X2, Ui >0^U2>
0.
A solution of this nonhnear system reads Xi = 20, X2 = 5, ni — ^ / , U2 = /. Consequently, x = (20,5) G 5^ satisfies the KarushKuhnTucker conditions.
5.3
Sufficient Optimality Conditions
The necessary optimality conditions formulated in the preceding section are, in general, not sufficient optimality conditions if we do not consider additional assumptions. Therefore we introduce first socalled Cquasiconvex mappings and we show the equivalence of the (7quasiconvexity of a certain mapping with the sufficiency of the generalized multiplier rule as optimality condition for a modified problem. Moreover, we present a special sufficient optimality condition for finitedimensional optimization problems. In Definition 2.9 we already introduced quasiconvex functionals. With the following theorem we give a necessary condition for a quasiconvex directionally differentiable functional. Theorem 5.11. Let S be a nonempty convex subset of a real linear space X, and let f \ S —^M. be a quasiconvex functional having a directional derivative at some x E: S in every direction x — x with arbitrary x E S. Then the following implication is satisfied for all xeS f{x)f{x)0+ A
D
The previous theorem motivates the following definition of Cquasiconvex mappings. Definition 5.12. Let 5 be a nonempty subset of a real linear space X, and let C be a nonempty subset of a real normed space (y, II • II). Let f : S —> Y he a. given mapping having a directional derivative at some x G 5 in every direction x — x with arbitrary x E S. The mapping / is called Cquasiconvex at x, if for all x E S:
fix)fix)eC=^f'ix)ixx)eC. Example 5.13. (a) Let 5 be a nonempty convex subset of a real linear space, and let
128
Chapter 5. Generalized Lagrange Multiplier Rule
/ : 5 —> R be a quasiconvex functional having a directional derivative at some x E S in every direction x — x with arbitrary x e S. Then / is R_quasiconvex at x. Proof. We choose an arbitrary x e S with f{x) — f{x) < 0. Then it follows with Theorem 5.11 f {x){x — x) < 0, and the assertion is proved. • (b) Let 5 be a nonempty subset of a real linear space, and let / : S' —> R be a given functional having a directional derivative at some X E S in every direction x ~ x with arbitrary x E S and let / be pseudoconvex ai x E S. Then / is also (R_\{0})quasiconvex at x. Proof. For an arbitrary x E S with f{x) — f{x) < 0 it follows f{x){x — x) < 0 because of the pseudoconvexity of / at x. •
With the aid of the (7quasiconvexity it is now possible to characterize the sufficiency of the generalized multiplier rule as an optimality condition for a modified optimization problem. For that purpose we need the following assumption: Let 5 be a nonempty subset of a real linear space X] let (y, II • y) be a partially ordered real normed space with an ordering cone C; let (Z, II • 11^) be a real normed space; let / : AS —> R be a given functional; let g : S ^Y and h : S ^ Z he given mappings; moreover, let the constraint set S:={XES\
g{x) E  C ,
^ (5.21)
h{x) = Oz]
be nonempty. Theorem 5.14. Let the assumption (5.21) be satisfied, and let / ; g, h have a directional derivative at some x E S in every direction X — X with arbitrary x E S. Moreover, assume that there are linear junctionals u E C and v E Z' with {f{x) +UO g\x) +VO h\x)) (x  x) > 0 for all x E S
(5.22)
5.3. Sufficient Optimality Conditions
129
and u{g{x))  0.
(5.23)
Then x is a minimal point of f on S := {x e S \ g{x) G  C + cone{{g{x)})  cone{{g{x)}), h{x) = Oz} if and only if the mapping {f,g,h)
',S^RxY
xZ
is Cquasiconvex at x with C := (M\{0}) X (  C + cone{{g{x)})  cone{{g{x)})) x {0^}.
Proof. First we show under the given assumptions {f'{x){xx),g'{x){xx),h'{x){xx))^C
for all x G S*. (5.24)
For the proof of this assertion assume that there is a vector x E S with {f'{x){x  x),g'{x){x  x), h\x){x  x)) G C, i.e.
f'{x){xx)
< 0,
g'{x){xx) h'{x){xx)
G C + = Oz.
cone{{g{x)})cone{{g{x)}),
Hence we get with the equation (5.23) for some a, /3 > 0 {f'{x)+UOg'(x)+
v o h'{x))(x — x)
< u{g'(x) {x — x)) < au{g{x))  /3u{g{x)) = 0.
But this inequality contradicts the inequality (5.22). Consequently, we have shown that the condition (5.24) is satisfied.
130
Chapter 5. Generalized Lagrange Multiplier Rule
If the mapping (f^g^h) is Cquasiconvex at x^ then it follows from (5.24) (fix)  f{x),g{x)
 g{x), h{x)  h{x)) i C for all
xeS,
i.e. there is no a; G 5' with f{x) 9{x) h{x)
< fix), e {g{x)} C + cone({p(;r)})  cone({5f(x)}) = C + cone{{g{x)})  cone({^(x)}), = Oz.
If we notice that with g{x) eC
CC
+ cone({^(x)}) 
cone{{g{x)})
it also follows X E S, then ;r is a minimal point of / on S. Now we assume in the converse case that S is a minimal point of f on S, then there is no a: G 5 with fix) gix) Kx)
< fix), e C + coneiigix)})  cone({^(x)}) = {aix)} C + cone({p(x)})  cone({p(x)}), = Oz,
i.e. ifix)  fix), gix)  gix), hix)  /i(x)) ^ C for all x e S. Consequently, with the condition (5.24) we conclude that the mapping if^g^h) is (7quasiconvex at x. • By Theorem 5.14 the Cquasiconvexity of the mapping (/, g, h) is characteristic of the sufficiency of the generalized Lagrange multiplier rule as an optimahty condition for the optimization problem min/(x) xes
5.3. Sufficient Optimality Conditions
131
with
^ := {x e ^ I g{x) eC
+ cone({^(x)})  cone{{g{x)}), h{x) = 0^}.
The set cone{{g{x)}) — cone{{g{x)}) equals the one dimensional subspace of Y spanned by g{x). Figure 5.3 illustrates the modified constraint set S. g3{x)=0
gi{x)=0 92{x)=0
Figure 5.3: Illustration of the set S
For the orginal problem min/(x) xes we obtain the following result. Corollary 5.15. Let the assumption (5.21) he satisfied, and let f, g, h have a directional derivative at some x E S in every direction X — X with arbitrary x E S. If there are linear Junctionals u E C^ and V E Z' with {f{x) +UO g\x) + VO h'{x)) (x  x) > 0 for all x E S and u{g{x)) = 0, and if the mapping {f,g,h)
:S^RxYxZ
132
Chapter 5. Generalized Lagrange Multiplier Rule
is Cquasiconvex at x with C := (M_\{0}) X (  C + cone{{g{x)})  cone{{g{x)})) x {^z}, then X is a minimal point of f on S. Proof. By Theorem 5.14 x is a minimal point of / on S. For every x E S we have gix)
e C
C C + cone{{g{x)}) 
cone{{g{x)}).
Consequently we get S C S^ and therefore x is also a minimal point of / on 5. • With the following lemma we present conditions on f^ g and h which ensure that the composite mapping (/, g^ h) is Cquasiconvex. Lemma 5.16. Let the assumption (5.21) he satisfied, and let f, g, h have a directional derivative at some x E S in every direction X — X with arbitrary x E S. If the functional f is pseudoconvex at Xj the mapping g is {—C{ cone{{g{x)}) ~ cone{{g{x)}))quasiconvex at X and the mapping h is {Oz}quasiconvex at x, then the composite mapping (/, g^ h) is Cquasiconvex at x with C := (M\{0}) X (C + cone{{g{x)})cone{{g{x)}))
x {0^}.
Proof. Choose an arbitrary x E S with
{f,g,h){x){f,g,h){x)eC, i.e. fix)fix)
< 0,
gix)  gix)
e
hix)  hix)
= Qz
C + coneiigix)})
 cone({y(S)}),
Because of the pseudoconvexity of / it follows
f'ix)ixx) h\x){x x)
= Oz
(5.25)
Such a mapping is also called quasilinear at x. For instance, every affinlinear mapping h satisfies the implication (5.25), but also the nonlinear function /i : R —> M with h{x) = x^ for all x G E is quasilinear at every x G R. Now we turn our attention to finite dimensional optimization problems and we give assumptions on f, g and h under which the KarushKuhnTucker conditions are sufficient optimality conditions. Theorem 5.17. Let an objective function f :W^ ^R as well as constraint functions g '.W^ ^ R^ and h :W^ —^MP be given. Let the constraint set S which is assumed to be nonempty be given as S :={xeW
I gi{x) < 0 for a// z G { 1 , . . . ,TO}and hi{x) = 0 for all z G { 1 , . . . ,p}}.
Let the functions / , g'l,..., g'^, / i i , . . . , /ip be differentiable at some x G S. Let the set I{x) :  { i G { ! , . . . , T O } \gi{x)  0 } denote the index set of the inequality constraints being '^active^^ at X. Assume that the objective function f is pseudoconvex at x, the
134
Chapter 5. Generalized Lagrange Multiplier Rule
constraint functions gi {i G I{x)) are quasiconvex at x, and the constraint functions / i i , . . . , /ip are quasilinear at x. If there are multipliers Ui >0{i E I{x)) and ^'^ E M (z G { 1 , . . . ,p}) with p
Vf{x) + Y,
^i^9i{x) + ^ViWhi{x)
iel{x)
= ORn,
(5.26)
i=l
then X is a minimal point of f on S. Proof. If we define additional multipliers Ui := 0 for alH G { 1 , . . . , m}\/(x), then it follows from the equation (5.26) m
p
V/(x) + Y,UiVgi{x)
+ J2^iVhi{x)
z=l
= ORn
i=l
and m
Y^UiQiix)
=0.
Then the assertion results from Corollary 5.15 in connection with Lemma 5.16. One interesting point is only the assumption of the {—R^ + cone{{g{x)}) — cone({5'(x)}))quasiconvexity of g at x. For the verification of this assumption we choose an arbitrary x E M.^ with 9i{^)  9i{^) < otgi{x)  pgi{x)
for alH G { 1 , . . . , m} and some a,/? > 0.
(5.27)
The inequality (5.27) implies 9i{x) — gi{x) < 0 for all i G I{x). Because of the quasiconvexity of the gi {i G I{x)) it then follows V^^(x)^(x x)
0 with Vgi{xY{x
— x) < {u — fJ^)gi{x) for all z e { 1 , . . . , m}.
Consequently, the vector function g is {—W^ + cone({5'(x)}) — cone ({5'(x)}))quasiconvex at x. This completes the proof. •
Example 5.18. We investigate the following optimization problem: min 2x1 + 2x1X2 + x  — lOxi — 10^2 subject to the constraints
xl + xl5 R is defined by / ( x i , X2) = 2x\ + 2x1^2 + x\ — lOxi — 10x2 for all (xi, X2) G M^, and the constraint functions g'l, 5^2 • R^ ^ R are given by g'i(xi,X2) = xl + xl — 5 for all (xi,X2) G R^ and 5^2(^1? ^2) = 3xi + X2 — 6 for all (xi, X2) G M^. Then the KarushKuhnTucker conditions for some x G R^ read as follows: 4xi + 2x2  10 \
/^ 2x1 \
f ^\
f 0
2xi + 2x210j+^^^2x2 j + ^ ^ U J ^ V O Ui{xl + X2 — 5) = 0, '^^2(3x1 IX2  6) = 0. Notice that Xi, X2, ui and U2 must also fulfill the following inequalities: X? + x   5 < 0, 3xi + X2 — 6 < 0, ui > 0 , U2 > 0.
136
Chapter 5. Generalized Lagrange Multiplier Rule
For the determination of solutions xi^X2^ui^U2 we can consider all possible cases of no, one or two active constraints. Under the assumption that only the constraint function gi is active {^ U2 = 0) it follows Axi + 2x2  10 + 2uiXi = 0,
2xi + 2x2  10 + 2uiX2 = 0, Xi "T" Xo
0 — U5
3xi + X2  6 < 0, ui > 0. Xi = 1, X2 = 2 and ui = 1 are a solution of this system. Hence X = (1,2) satisfies the KarushKuhnTucker conditions. Question: Is x also a minimal point of / on the constraint set? In order to answer this question we use the result of Theorem 5.17. The Hessian matrix H of the objective function / reads
^=(2 2 and is positive definite (the eigenvalues are 3 ± \/5). Consequently we have
f{y) = f{x) + Vf{xf{yx) > f{x) + Vf{xf{yx)
+
^{yxfH{yx)
for all X, 2/G R^
Then we know with Theorem 3.16 that / is convex. Since the Hessian matrix of the constraint function gi is also positive definite, we conclude with the same arguments as for / that gi is convex. Consequently, by Theorem 5.17 x = (1,2) is a minimal point of / on the constraint set.
5.4
Application to Optimal Control Problems
It is the aim of this section to apply the generalized Lagrange multiplier rule to an optimal control problem which was already described
5.4. Application to Optimal Control Problems
137
in Example 5.1,(c). For this optimal control problem we deduce the Pontryagin maximum principle as a necessary optimality condition, and moreover we give assumptions under which this maximum principle is a sufficient optimality condition. In the following we consider the optimal control problem in Example 5.1,(c) with a special objective functional and g instead of g. Let / i : R^ ^ R and /a : R"" x R^ ^ R be continuously partially differentiable functions. Then the investigated optimal control problem reads as follows: min f,{x{h))
+J
f2{x{t),u{t))dt
to
subject to the constraints x{t) = f{x(t)^u{t)) almost everywhere on [to^^i]?
(5.28)
g{x{ti)) =Qw, u{t) G Vt almost everywhere on [to,ti]. The assumptions were already given in Example 5.1,(c) (for g instead oig). With the following theorem we present a necessary optimality condition for an optimal control of this control problem. This optimality condition is also called the Pontryagin maximum principle. Theorem 5.19. Let the optimal control problem (5.28) be given. Let the functions /i : R^ > R, /s : R^ x R^ > R, / : R^ x R^ > R^ and g '.W^ —^W be continuously partially differentiable. Let Q be a convex subset ofW^ with nonempty interior. Let u G L^[to,ti] be an optimal control and let x G W^f^[^o,^i] be the resulting state. Let the matrix ^{x(ti)) be row regular. Moreover, let the linearized system
almost everywhere on [to,ti], x(to)
=
be controllable (i.e., for every xi G R'^ there are a controluEL'^\tQ^t^ and a resulting trajectory x G W[^Q^[io5^i] satisfying this linearized
138
Chapter 5. Generalized Lagrange Multiplier Rule
system and for which x{ti) = Xi is fulfilled). Then there are a function p G WT^o^fto, ^i] cirid a vector a EW so that (a)
Pitf
^p{tf^{x{t),u{t))

^{x{t),m)
almost everywhere on [to,ti] (adjoint equation)^
ib) (c)
p{t,Y = a^ll (x(ii)) + ^ (x(ti)) (transversality condition), for every control u G L^ftoj^i] "^ith u{t) G VL almost everywhere on [to,ti] the following inequality is satisfied: p{tf^{x{t)Mt))
 ^{x{t)Mt))]
{u{t)u{t))
R is defined by ti
(p{x,u) = fi{x{ti)) + / f2{x{t),u{t))dt
for all {x,u) G X.
to
The constraint mapping h : X —> Z is given by uf^^A^l ^{')ocoa\x^ u) — \
^Q
ff{x{s),u{s))ds
\
for all {x,u) e X.
Furthermore, we define the set S := {{x^u) G X I u{t) G ^ almost everywhere on [to,ti]}.
5.4. Application to Optimal Control Problems
139
Then the optimization problem which has to be investigated reads as follows: min (f{x^u) "I subject to the constraints I . ^.
{x,u)eS.
J
By the assumption (x,n) is a minimal solution of the optimization problem (5.29). For the formulation of the generalized Lagrange multiplier rule for the problem (5.29) we need the Frechet derivatives of (p and h at {x^u). One can show that these derivatives are given as follows: ip\x,u){x,u)
=
—^{x{ti))x{ti)
^{x{s),u{s))
+
x{s) + ~{x{s),u{s))
u{s) ds
to
for all (x, u) £ X and h'{x^u) {x^u) = ^ ( • )  / [^(:^^(5),^(5))a;(5) + —(x(5),n(5))n(s)] ds, to
^ ^ /{x{ti)) / x{ti) \
for all {x, u) G X.
Notice that h is continuously Frechet differentiable at {x^u). Next, we show that the optimization problem (5.29) satisfies a regularity condition. By Theorem 5.6 the problem is regular, if the mapping h'ix^u) is surjective (notice that we do not have inequality constraints). For the proof of the surjectivity of h'{x^u) we fix arbitrary elements w E W^i%[*05^i] Q^nd y EW. Since the matrix ^(x(ti)) is row regular, there is a vector ^ G R^ with
dx
{x{ti))y = y.
140
Chapter 5. Generalized Lagrange Multiplier Rule
The integral equation
f df _ x{t) — w{t) + / —{x{s)^u{s))x{s)ds
for all t G [to,ti]
to
is a linear Volterra equation of the second kind and therefore it has a solution X := z E Wi^[tQ^ti]. With this solution z we then consider the linearized system of differential equations m
= ^{x{t),u{t))x{t)
+
^{x{t),u{t))u{t)
almost everywhere on [to^ti] with the initial condition
and the terminal condition x{ti) =y
z(ti).
Because of the controllability of this hnearized system there are a control u G L'^\tQ^ ti] and a resulting trajectory x G VKfo^fto, ti] satisfying the initial and terminal condition. Then we obtain h'ix^u) {x + z^u) =
K) + zi)J[^ix{s)Ms)){x{s) xi
+ z{s))
to
+^ixis),u{s))
u{s)\ ds, ^{x{h))
fvdf _ _ w{) + x{)  / y—{x{s),u{s))x{s) to
{x{h) +
z{h))\
df _ _ 1 + ^(^(s),u{s))u{s)\^ds,
5.4. Application to Optimal Control Problems
141
Consequently, the mapping h'{x^u) is surjective, and we can choose the parameter /x as 1 in the generalized Lagrange multiplier rule. Since all assumptions of Theorem 5.3 are satisfied, there are a continuous linear functional I G {Wi^\t{),ti]y and a vector a E M^ with ((/;'(x,u) + (/,a) o h'(x^u)) {x — x^u — u) > 0 for all x E S. Then it follows h
'{x{h)) (xih)  x{h)) + I [^{x{s),u{s)) dx
{x{s)  x{s))
to
5/2 _
_
+g;^ix{s),u{s))
(uis)  u{s))
dl
+l(xi.)x{)l
dx
ds
{x{s),u{s)){x{s)x{s))
to
df,
+ ^(3^(s),^i(s))(«(s)M(s))
ds
W^{x{h)){x{tr)x{U)) (5.30)
> 0 for all (x, u) e S. If we plug u = u in the inequality (5.30), then we get ti
  ( x ( t i ) ) {x{h)  x{h)) + J ^{x{s),
u{s)) {x{s)  x{s)) ds
to
.L,..,/..).,.).(....... to
\
+a^^{x{h)) dx
/
{x{h)  x{h)) > 0 for all x G
W^Jto.h]
and ti
' (x(ti)) x{h) + J ^{x{s)Ms)) dx to
x{s) ds
142
Chapter 5. Generalized Lagrange Multiplier Rule
+/
x{') 
/
—{x{s),u{s))x{s)ds
to
rdg, W^{x{h))x{h) = 0 for sllxeW^^[to,ti]]
(5.31)
for x = x it follows ti
df ^{x{s),u{s))
{u{s)  u{s)) ds
/ to
\
)
to
> 0 for all u G I/^[to,ti] with u{t) E Q almost everywhere on [io^^i]
(5.32)
Next, we consider the equation (5.31) and we try to characterize the continuous linear functional l. For this characterization we need the following assertion: If $ is the unique solution of
almost everywhere on [to^ti]
(5.33)
then for an arbitrary y e W^ro^ftoj^i] the function x(.) = y{) + $(.) J ^\s)^{x{s),u{s))
y{s) ds (5.34)
to
satisfies the integral equation x{)  J ^ ( ^ ( s ) , «(5)) x{s) ds = y{). to
(5.35)
5.4. Application to Optimal Control Problems
143
For the proof of this assertion we plug x (as given in (5.34)) in the left hand side of the equation (5.35) and we obtain by integration by parts
X
f df _ ^*^~y ^(X(5),ll(5))x(5)ds to
= y{') + H') J
^\s)^{x{s),u{s))y{s)ds
to
l^{x{s),u{s))[y{s) to
s
+$(s) J ^\a)^{x{a),u{a))
y{a) da
to
= y{) + <E>() / ^\s)^{x{s)Ms))
y{s) ds
to
f df _ s
 j ^s) I to
^\a)^(x{a),u{a))y{a)dads
to
y{) + $(•) j ^\s)^{x{a),u{a))
y{a) dads
to
r df _ J
•Q^{^is),u{s))y{s)ds to
^•)
J
to
^\s)^{x{s),u{s))y{s)ds
144
Chapter 5. Generalized Lagrange Multiplier Rule
+ J ^s)^\s)^{x{s),
u{s)) y{s) ds
to
=
y{)
Hence the equation (5.35) is proved. For an arbitrary function y G ^[^^^[tojti] we conclude from the equation (5.31) with the aid of the equation (5.34)
m = ((.(«)+a(.fe))) f
tl
\
y{h) + $(ti) j $  i ( , ) g ( x ( 5 ) , u{s)) y{s) ds to tl
S
to
df _
to
_
\
— {x{a),u{a)) y{a) da 1 ds. Integration by parts leads to tl
to
df
— {x{s),u{s))y{s)dsj
\
*1
t df
 /
~{x{s),u{s))y{s)ds
to tl
J^{xis)Ms))Hs)ds to t
J^Hs)^{x{s),uis))y{s)ds to
to tl
t
+11 ^ix{s),u{s)) Hs) ds ^\t)^(x{t)Mt)) y{t) dt to to
5.4. Application to Optimal Control Problems
145
^(x(tO) + a^ff(s(tO))y(ti) (x(t,))+a'^(x(f,)))$(t,)»^'W to
g^{x{t)Mt))^{mMt)) tl
 J ^{x{s),u{s))
$(s) ds
^\t)^{x{t),u{t))
to t
+1
^{x{s),uis))^s)ds^\t)^ix{t)Mt))]y{t)dt
to
{x{t^)) +
a''^{x{h)))^t,)^\t)
to
^ix{t),u{t))^{x{t)Mt)) tl
 J ^ix{s),uiB))
^s) ds ^\t)^{xit),
uit))]y{t) dt
t
iorallyeW^^JtoM For the expression in brackets we introduce the notation r(t)^, i.e.
df2 {x{t),u{t)) dx h
^{xis)Ms))Hs)ds^\t)^{x{t)Mt)) almost everywhere on [toj^i]
146
Chapter 5. Generalized Lagrange Multiplier Rule
With the equation (5.33) it follows (compare page 217)
df _ =
—$~'^(t) —{x{t)^u{t))
almost everywhere on [to,ti].
Then we obtain
t
almost everywhere on [to,fi]. For
tl
 f j^{x{s),u{s))^s)ds^\t)
foralHG [to,ti]
we get p{t) = —r{t) almost everywhere on [to,ti]. Then it follows
i.e., the transversality condition is satisfied. Moreover, we conclude
p{tr^{x{t),u{t))^{x{t)Mt)) ^{x{t,))
+
a^^{x{t,))]^t^)^Ht)^{mMt))
5.4. Application to Optimal Control Problems
147
t
dx =
—p{tY almost everywhere on [ioj^i]
Hence p satisfies the adjoint equation
almost everywhere on [toj^i]Then the continuous linear functional / can be written as Ky)=p{tify{ti)
 Jp{tfy{t)dt
for all y G
W^^JtoM
to
Now we turn our attention to the inequality (5.32). From this inequality we obtain by integration by parts ti
0
•
158
Chapter 5. Generalized Lagrange Multiplier Rule
5.8) Let S' be a nonempty subset of M^, and let f : S —^ B., g : S ^ R^ and h : S —^MP he given functions. Let the constraint set S := {x ^ S
I gi{x) < 0 for all z G { 1 , . . . , m} and /i.(x) = 0 for all z E { 1 , . . . ,p}}
be nonempty. Let the functions / , g^i,..., g'^, / i i , . . . , /ip be differentiable at some x ^ S. Let there be multipliers Ui > 0 {i e I{x)) and Vi eR {i e {1,... ,p}) with P
\T
V/(^) + Y. ^i^9i{^) + X]^iV/i,(x) j (x  x) > 0 iei{x)
2=1
^
for all X e. S. Let / be pseudoconvex at x, for every i E I{x) let gi be quasiconvex at x, for every z G { 1 , . . . ,p} with t?^ > 0 let /i^ be quasiconvex at x, and for every i G { 1 , . . . ,p} with Vi < 0 let —/li be quasiconvex at x. Prove that x is a minimal point of / onS. 5.9) Determine an optimal control u G i^^[0,1] of the following problem: 1
r /
1
^lXl(f)   X i ( t ) + 2^2(t) 
2
dt X2{t)
0
subject to the constraints x^{t) = Uu^it)  2m{tf  x,{t)  U2{t) 1 , ^ ^„ r m i X2{t) = 12u2{t)  2u2{tf  X2{t)  U^{t) j ^•® °'' ^^' ^J a;i(0) = Xo^, 0:2(0) = a:o2 •^ ^ / ~ „ > almost everywhere on [0,1] where xoi and XQ^ are given real numbers.
Chapter 6 Duality The duality theory is also an additional important part of the optimization theory A main question which is investigated in duahty theory reads as follows: Under which assumptions is it possible to associate an equivalent maximization problem to a given (in general convex) minimization problem. This maximization problem is also called the optimization problem dual to the minimization problem. In this chapter we formulate the dual problem to a constrained minimization problem and we investigate the relationships between the both optimization problems. For a linear problem we transform the dual problem in such a way that we again obtain a linear optimization problem. Finally we apply these results to a problem of hnear Chebyshev approximation.
6.1
Problem Formulation
In this section we consider a constrained optimization problem. Let the constraints be given in the form of a general system of inequalities. Then we associate a socalled dual problem to this optimization problem, the socalled primal problem. First, we need Definition 6.1. Let S' be a nonempty convex subset of a real linear space, and let y be a partially ordered real linear space with an ordering cone C A mapping g : S —^ Y is called convex^ if for all
160
Chapter 6. Duality
Xg{x) + (1  A) g{y)  g{\x + (1  \)y) e C for all A G [0,1].
Example 6.2. Let 5 be a nonempty convex subset of a real linear space, and let / i , . . . , /^ : /§ —> M be convex functionals. If the linear space R'^ is supposed to be partially ordered in a natural way (i.e., C := Wl), then the vector function / = ( / i , . . . , /^) : 5 > M^ is convex. Now we turn our attention to a class of mappings which are slightly more general than convex ones. Definition 6.3. Let 5 be a nonempty subset of a real linear space and let F be a partially ordered real linear space with an ordering cone C. A mapping g : S —^Y is called convexlike^ if the set g{S) + C is convex.
Example 6.4. (a)
Let S' be a nonempty convex subset of a real linear space, and let y be a partially ordered real linear space with an ordering cone C. Every convex mapping g : S —^Y is also convexlike. Proof. We have to show that the set g{S) + C is a convex set. For that purpose choose arbitrary elements yi,y2 ^ g{S) + C and an arbitrary number A G [0,1]. Then there are elements Xi,X2 E S and ci, C2 G C with 2/1 = g{xi) + ci and 2/2 = ^ ( ^ 2 ) + C 2 .
Consequently, we get with the convexity of g
6.1. Problem Formulation
= e
161
A^(xi) + (1A)^(:r2) + Aci + ( 1  A ) c 2 {g{Xxi + {lX)x2)} + C + XC + ilX)C
I.e.
Xyi + (1  A2/2) G g{S) + C. Hence the set g{S) + C is convex, and the mapping g is convexhke. D
(b)
We consider the mapping 5^ : R —> R^ with g(x) = (
.^
) for all
xeR.
Let the real Hnear space R? be partially ordered in a natural way (i.e., C := M+). Then the mapping g is convexlike but it is certainly not convex. The preceding example shows that the class of convexlike mappings includes the class of convex mappings, and, in fact, it goes beyond this class slightly. After the introduction of convexlike mappings we are now able to formulate the standard assumption for the following investigations: Let AS be a nonempty subset of a real linear space X; let (F, II • II) be a partially ordered real normed space with the ordering cone C; let / : AS ^ R be a given objective functional; let g : S ^Y he a> given constraint mapping; } (6.1) let the composite mapping (/, g') : 5 —> R x y be convexlike (with respect to the product cone R+ x C in R x F ) ; let the constraint set be given as S := {x E S \ g{x) G —C} which is assumed to be nonempty. If the set S is convex, if the objective functional / is convex and if the constraint mapping g is convex, then the composite mapping
162
Chapter 6. Duality
{f^g)'S^'RxYis convexlike (with respect to the product cone R_. X C in R X y ) . Because of the assumption of the convexhkeness of (/, S') in (6.1) it is even possible to treat certain nonconvex optimization problems with this duality theory. Under the assumption (6.1) we investigate the constrained optimization problem min/(x) subject to the constraints g{x) G c
i ^
/^ r>\ ^''^
xeS. In this context the optimization problem (6.2) is also called primal problem. With the following lemma we see that, under the additional assumption of the ordering cone being closed, this problem is equivalent to the optimization problem min sup f{x) + u{g{x)) xes uec*
(6.3)
where C* denotes the dual cone of C Lerama 6.5. Let the assumption (6.1) he satisfied and in addition let the ordering cone C he closed. Then x is a minimal solution of the prohlem (6.2) if and only if x is a minimal solution of the prohlem (6.3). In this case the extremal values of hoth problems are equal. Proof. First we assume that x G 5' is a minimal point of / on S. For every x E S with g{x) G —C we have u{g{x)) < 0 for all z/ G C* and therefore we get sup u{g{x)) = 0. uec* Since C is convex and closed, for every x E S with g{x) ^ —C there is, by a separation theorem (Theorem C.3), a xZ G C* \ {Ox*} with u{g{x)) > 0
6.1. Problem Formulation
163
which imphes sup u{g{x)) = oo. uec* Consequently, we obtain for every x G 5 SM^ f{x)+u{g{x)) uec*
= f{x)+sup u{g{x)) uec*
=m
< f{x) + sup u{g{x)) uec* < sup f{x) + u{g{x)). uec* Hence x G S is also a minimal solution of the optimization problem (6.3). Finally, we assume that x G 5 is a minimal point of the functional (/9 : 5 —> M with (p{x) = sup f{x) + u{g{x)) for all x G S' uec* on S. Assume that g{x) ^ —C. Then with the same arguments as above we get sup u{g{x)) — oo uec* which is a contradiction to the solvability of problem (6.3). Consequently, we have sup u{g{x)) = 0. uec* Then we obtain for all x E S fix)
= fix) + sup uigix)) ueC* = sup fix) + uigix)) uec* < sup fix) + uigix)) uec*
=
fix) + sup uigix)) uec*
= m
Hence x G 5 is a minimal point of / on S'.
•
164
Chapter 6. Duality
Now we associate another problem to the primal problem (6.2). This new problem results from the problem (6.3) by exchanging "min" and "sup" and by replacing "min" by "inf" and "sup" by "max". This optimization problem then reads: max inf f{x) + u{g{x)).
(6.4)
The optimization problem (6.4) is called the dual problem associated to the primal problem (6.2). Obviously, this dual problem is equivalent to the optimization problem max A subject to the constraints f{x) + u{g{x)) > A for all X G ^
"j I (
XeR,ueC\
)
If tZ G C* is a maximal solution of the dual problem (6.4) maximal value A, then (X^u) is a maximal solution of the (6.5). Conversely, for every maximal solution (X^u) of the (6.5) tZ is a maximal solution of the dual problem with the value A.
6.2
. . ^ ^^ with the problem problem maximal
Duality Theorems
In this section the relationships between the primal problem (6.2) and the dual problem (6.4) are investigated. We present a socalled weak duality theorem and a socalled strong duality theorem which says in which sense the primal and dual problem are equivalent. First we formulate a socalled weak duality theorem. Theorem 6.6. Let the assumption (6.1) he satisfied. For every X E S (i.e., for every feasible element of the primal problem (6.2)) and for every u E C (i.e., for every feasible element of the dual problem (6.4)) the following inequality is satisfied: inf f{x) + xes
u{g{x))0,yeC}
By the assumption (6.1) the composite mapping {f^g):S^RxYis convexlike, and therefore the set M is convex. Because of int(C) 7^ 0
166
Chapter 6. Duality
the set M has a nonempty interior int(M) as well. Since the primal problem is solvable there is a vector x ^ S with f{x) < f{x)
for all
xeS.
Consequently we have (/(S),Oy)^int(M) and m t ( M ) n { ( / ( S ) , O y ) } = 0. By the Eidelheit separation theorem (Thm. C.2) there are real numbers /i and 7 and a continuous linear functional u ^V with (fi^u) j^ (0,0y*) and IJ.P + u{z) > 7 > fif{x)
for all (/?, z) G int(M).
(6.6)
Since every convex subset of a real normed space with nonempty interior is contained in the closure of the interior of this set, we conclude from the inequality (6.6) /i(/(x) + a) + u{g{x) + y) > 7 > ^Jif{x) for dl\ x e S,a>Q,y
e C. (6.7)
For X = X and a = 0 it follows from the inequality (6.7) u{y) > u{g{x))
for all yeC,
(6.8)
With standard arguments we get immediately t^ G C*. For y = Oy it follows from the inequality (6.8) u{g{x)) > 0. Because of g{x) E —C and tA e C* we also have u{g{x)) < 0 which leads to u{g{x)) = 0. For X = X and y = Oy we get from the inequality (6.7) /ia > 0 for all a > 0 which implies /i > 0. For the proof of /i > 0 we assume that [1 = 0, Then it follows from the inequality (6.7) with y = Oy u{g{x)) > 0 for all x e S.
6.2. Duality Theorems
167
Because of the generalized Slater condition there is one x ^ S with g{x) e —int(C), and then we have u{g{x)) = 0. Now we want to show that u = Oy*. For that purpose we assume that u ^ Oy*, i.e., there is one y EY with u{y) > 0. Then we have u{Xy + (1  A) g{x)) > 0 for all A G (0,1],
(6.9)
and because of g{x) G ~int((7) there is one A G (0,1) with Xy+{1
A) g{x) G C
for all A G [0, A].
Then we get u{Xy + (1  A) g{x)) < 0 for all A G [0, A] which contradicts the inequahty (6.9). With the assumption /i == 0 we also obtain u = Oy*, a contradiction to (/i,n) 7^ (0,0y*). Consequently, we have /i 7^ 0 and therefore fi > 0. Then we conclude from the inequality (6.7) with a == 0 and y = Oy IJif{x) + u{g{x)) > fxf{x) for all x E S and f{x) + ~ u{g{x)) > f{x)
for all
xeS.
If we define u := u E C* we obtain with u{g(x)) = 0 inf f{x) + xes
u{g{x))>f{x)+u{g{x)).
Hence we have f{x) + u{g{x)) = inf f{x) + u{g[x)), xes and with the weak duality theorem tl G C* is a maximal solution of the dual problem (6.4). Obviously, the extremal values of the primal and dual problem are equal. •
168
Chapter 6. Duality
In the following we discuss the practical importance of the strong duality theorem. If one wants to solve the primal problem and if one is interested in the minimal value in particular, then under suitable assumptions one can also solve the dual problem and determine the maximal value which is then equal to the minimal value of the primal problem. If the dual problem is simpler to solve than the primal problem, then this method is very useful.
6.3
Saddle Point Theorems
Relationships between the primal and the dual problem can also be described by a saddle point behavior of the Lagrange functional. These relationships will be investigated in this section. First, we define the notion of the Lagrange functional which has already been mentioned in the context of the generalized Lagrange multiplier rule in Section 5.2. Definition 6.8. Let the assumption (6.1) be satisfied. functional L : 5 x C* —> R with
The
L(x, u) = f{x) + u{g{x)) for all x G S' and all u E C is called Lagrange functional. Since we will investigate saddle points of the Lagrange functional L, we introduce the following notion. Definition 6.9. Let the assumption (6.1) be satisfied. A point (x, iZ) G 5 X C* is called a saddle point of the Lagrange functional L if L(x, u) < L(S, u) < L(x, u) for all x G 5 and all u E C*.
A saddle point of the Lagrange functional can be characterized by a "min sup = max inf" result which goes back to a known John von
6.3. Saddle Point Theorems
169
Neumann^ saddle point theorem. T h e o r e m 6.10. Let the assumption (6.1) he satisfied. A point (x, u) G *§ X C* is a saddle point of the Lagrange functional L if and only if L ( x , ' u ) = m i n sup L ( x , i ^ ) = m a x inf L{x^u). xes uec* ^^ y be a continuous linear mapping; let 6 G y be a given element; let the constraint set S := {x E Cx \ A{x) — b E Cy} he nonempty. Under this assumption we consider the primal problem min c(x) subject to the constraints A{x)bECY xeCx
. ^
fa ^Q\ ^^'^^^
6.4. Linear Problems
173
In the problem formulation (6.2) we have replaced the objective functional / by the continuous linear functional c and the constraint mapping ghy b — A{'). The set S equals the ordering cone Cx Notice that under the assumption (6.12) the composite mapping (c(),6 — M')) : Cx —> M X y is also convexhke. In this case the dual problem reads (by (6.4)) max inf c(x) + u(b — A(x)). uec^ xeCx This problem is equivalent to the problem (compare (6.5)) max A subject to the constraints c{x) + u{b  A{x)) > A for all x e Cx
"I I (
{f^^A\ ^ ^ ^
If we define the constraint set of the problem (6.14) as 5* := {(A, u) eRxC;r\
c{x) + u{b  A{x)) > A for all x E C^}, (6.15) then we can reformulate this constraint set using the following lemma. Lemma 6.13. Let the assumption (6.12) be satisfied, and let the set S* be given by (6.15). Then it follows 5* = { ( A , ^ ) G R X Q  C  A * ( ^ ) G Q
and\ 0 and /2(0 > 0 for all t e M}. If we define c := ( 1 , 0 , . . . , 0) G R"+S b := {v, v) G Y and the mapping A : X ^ y with Ae + 2_^ XiVi i=l
A{X,x)
for all (A, x) E \
XoVi AeE i=l
pn+l
/
then the problem (6.20) can also be written as follows: min c^(A,x) subject to the constraints
A{X,x)beCY (A,x) G C X .
(6.21)
6.5. Application to Approximation Problems
177
This is a linear optimization problem which was aheady discussed in the preceding section. For the formulation of the equivalent dual problem (by (6.17)) we need the adjoint mapping A* of A, among other things. The mapping A* : F* > X* (  E^"^^) is defined by A''{ui,U2){X,x)
=
{ui,U2){A{X,x))
= Ui i Xe + Y^ XiVi \ +U2 I Xe — V^ XiVi j for all (A, x) G W~^^. The statement c  A * (^1,7x2) e Cx is equivalent to Xui
I Xe + Y^XiVi I U2 I Xe ^^XiVi ] =0 for all (A,a;)GR''^^
resulting in n
X{lui{e)U2{e))
+ "^Xi{u2{vi)~ui{vi))=0
for all (A,x) G E^+l
z=l
This equation is also equivalent to Ui{vi)  U2{vi) = 0 for all z G { 1 , . . . , n} and ui{e) +U2{e) = 1. Consequently, the equivalent dual problem (by (6.17)) which is associated to the problem (6.21) reads as follows: max Ui{v) — U2{v) subject to the constraints '^i('^i) ~ 'i^2{vi) = 0 for alH G { 1 , . . . , n} ui{e) + U2{e) = 1
(6.22)
This problem is also a semiinfinite optimization problem which has finitely many constraints in the form of equalities. With the following representation theorem for positive linear forms on C{M) the
178
Chapter 6. Duality
problem (6.22) can be simplified essentially. A proof of this representation theorem can be found in the book [201, p. 184] by Krabs. T h e o r e m 6.14. Let F be a finite dimensional linear subspace of C{M) (compare (6.18)) spanned by functions / i , . . . , /m G C(M). Let F be partially ordered in a natural way, and assume that there is a function f E F with f{t) > 0 for all t e M. Then every continuous linear functional I G Cp (dual cone in F*) can be represented as k
mYl^jfitj)
forallfeF
where fc G N, t i , . . . , t/. G M are different points, and A i , . . . , A/^ are nonnegative real numbers. Now we apply this theorem to the linear subspace E. Since e E E with e{t) = 1 > 0 for alH G M, all assumptions of Theorem 6.14 are fulfilled, and therefore we obtain the following representations for Ui^U2 G C% (dual cone in £"*) ki
and k2
U2{v) = ^
X2jV{hj) for all v e E.
3=1
Here we have A:i,A:2 G N; t i j , . . . , t i ^ ^ M are different points; hi^ " ' ^hk ^ ^ ^^^ different points; and it is Ai^,..., Ai^ , A21,...,
K, > 0. '
6.5. Application to Approximation Problems
179
Consequently, the problem (6.22) is equivalent to the following problem: ki
k2
max Xl'^ii'^(*i.)  X^^2,^(i2,) subject to the constraints ki
k2
^>^ijVi{ti.) i=i kl
 ^X2jVi{t2j) 3=1
= 0 for alH G { 1 , . . . , n } \ (6.23)
/C2
A l l , . . . j A i ^ ^ , A 2 1 , . . . ,A2fc2 > 0
Before simphfying this problem we discuss the question of solvability. Theorem 6.15. Let the assumption (6.18) he satisfied. Then the optimization problem (6.23) has at least one maximal solution (Aii,...,Ai^^,A2i,...,A2fe2'*ii'"'^ifci' hi,'"Mk^), cirid the extremal value of this problem equals the extremal value of the problem (6.19). Proof. By Theorem 2.20 the problem (6.19) of linear Chebyshev approximation is solvable. Then the equivalent linear optimization problem (6.21) is also solvable. The ordering cone Cy has a nonempty interior; and the generalized Slater condition is satisfied, because for an arbitrary £ G R^ we obtain with A

^
XiVi
+1
i=l
also b  A(A, x) G int(C). Then by Theorem 6.7 the problem (6.22) which is equivalent to the dual problem of (6.21) is also solvable. With the preceding remarks this problem is also equivalent to the maximization problem (6.23) which has therefore a solution. Finally,
Chapter 6. Duality
180
we conclude with Theorem 6.7 that the extremal values of the corresponding problems are equal. • The maximization problem (6.23) is a finite optimization problem with finitely many variables and finitely many constraints. But it is unwieldy because ki and k2 are not known. One can show that ki + k2 < n + 1 (we refer to Krabs [201, p. 54]). But even if we restrict the number of variables in the maximization problem (6.23) in this way, this problem is a finite nonlinear optimization problem which, from a numerical point of view, is not easier to solve than the original problem of linear Chebyshev approximation. Finally, we formulate a socalled alternation theorem for the investigated problem of linear Chebyshev approximation. Theorem 6.16. Let the assumption (6.18) be satisfied. A vector X ^ W^ is a solution of the problem (6.19) of linear Chebyshev apn
proximation (i.e., 2_\^i^i '^^ ^ best approximation to v in E) if and ^=l
only if there are k < n+ 1 different points t i , . . . , t^ G M with
^(^j) ~ ^^iM''^j)\
= r ~ X I ^^^^
i=l
^^^ all j = l,...,k
(6.24)
i=l
and there are numbers A i , . . . , A/. G M with k
(6.25)
y~^ XjVi{tj) = 0 for alii =
Xj^O
for j = l,...,k=^
v{tj)Y^XiVi{tj) i=l
1,..., n.
= ^^v'^XiVi^],
(6.26)
sgn{Xj).
i=l
(6.27)
6.5. Application to Approximation Problems
181
Proof. First we assume that for some x G R^ there are k < n+1 different points ti^... ^tk E M so that the conditions (6.24), (6.25), (6.26) and (6.27) are satisfied. Then we obtain for every x G M^ n
k
n
\vY^XiVi^=^\\j\ i=l
^v'^XiVi^
j=l
(by (6.25))
i=l
k
/
n
:= J]A,sgn(A,) {v{tj)Y,^iV,{t^)\ 3=1 \ i=l J k
n
k
j=l k
i=l
j=l
= EV(^i) k
(by (6.26)) n
k
= J2 ^^•^(*^) ~ E ^^ E ^Mtj) i=i k
/
(by (6.27))
*=i i = i n
(by (6.26))
^
"= ^^j U(*i)l]^^^^(*i) j=l k
i=l
\ n
n
= \\vJ2xiVi\\
(by (6.25)).
i=l
Consequently, x is a solution of the problem (6.19) of the hnear Chebyshev approximation. Next, we assume that x G M^ solves the problem (6.19). By Theorem 6.15 the optimization problem (6.23) has a maximal solution (All ^ • • • 5 K, 5 A21,..., As,^, t i , , . . . , f 1,^, ^21,..., ^2^2) (with positive A l l , . . . , Aifc^, A21,..., A2fc2 J otherwise, if A^^. = 0 for some i G {1,2} and some j G { 1 , . . . , A;^}, we can drop the variable A^^. together with the point ti without changing the minimal value of the problem (6.23)), and the extremal values of the two problems are equal, i.e. ki
/?:=
V  J^XiVi\\ = ^ i=l
j=l
/C2
Ai.t)(ii,)  ^ j=l
A2,z)(t2,).
(6.28)
Chapter 6. Duality
182
Because of the constraint /C2
kl
X ; Ai,^,(ti,) J2>^^Mt^j) 3=1
= 0 for all i e { 1 , . . . , n }
(6.29)
3=1
it follows kl
k2
n
n
(6.30) and with the constraint /C2
kl
(6.31) and the equations (6.30) and (6.28) we conclude "
k2
n
"1
i=l
J
r
^
L
i=l
J nO
n 1
/
nl
IvO
E^i.^(*i.)+E^2i^(^)+/^ E^1. + E^2. 
0.
Then the following equations are satisfied: n
^(^ij ~ E^*^^(^ij) "^ ^ "^^^ ^ J e { 1 , . . . , fci}, i=l
^(^2,)  E^»^»(^2,) = /? for all j G { 1 , . . . , A;2}. i=i
If we define the variables [ij
:=
Ai^. for j = l,...,A;i,
Sj
:=
ti^. for j = 1 , . . . , kx
\
6.5. Application to Approximation Problems
183
and ///ci+j := A2,. for j = 1 , . . . , A:2, Sk^+j := t2j for j = 1,...,A:2, we get with the equation (6.31)
and with the equation (6.29) it follows y] lij Vi{sj) = 0 for all z = 1 , . . . , n. j=i
Moreover, the following equation is satisfied: n
n
v{sj)YxiVi{sj)
= pXl^^'^^ sgn(/ij) for all j G {1,... ,ki+k2}.
i=l
i=l
If we notice that ki + k2 < n+ 1^ then the assertion follows immediately. •
Example 6.17. We consider again Example 1.4 and ask for a solution of the problem min max Isinht — xt\. xeR te[Q,2]
By the alternation theorem the necessary and sufficient conditions for a minimal solution :r G R of this problem read as follows:
IA1I + IA2HI Aiti + A2t2 = 0 Ai 7^ 0 => sinhti — xti = sinh — xid sgn(Ai) A2 7^ 0 =^ sinht2 — xt2 = sinh — xid sgn(A2) [sinhti — xti\ = sinh — xid sinht2 — ^hl = Ijsinh — ^id A I , A 2 G M ; ti,£2G[0,2].
184
Chapter 6. Duality
One obtains from these conditions that :r is a minimal solution of the considered approximation problem if and only if x ?^ 1.600233 (see Fig. 1.3).
Exercises 6.1) Let the following primal minimization problem be given: 1
min 2a +
txit) dt 0
subject to the constraints 1
1—a —
x{s) ds < 0 almost everywhere on [0,1]
t
x{t) > 0 almost everywhere on [0,1] a>0 XGL2[0,1],
aeR.
(a) Formulate the equivalent dual problem (6.5) of this minimization problem. (b) Show that the minimal value of this problem is 2 and that the maximal value of the dual problem (6.4) is 1. Consequently, there is a duality gap. 6.2) With the matrices An G R(^i'^i), A12 G R(^i'^2)^ ^^i G R(^2,ni)^ A22 G M("^2,n2) and the vectors 61 G R^S 62 ^ R^^ ci G W\ C2 G R^^ we consider the following primal linear optimization problem: min cf Xi + c^X2 subject to the constraints AnXi + A12X2 = bi A21X1 + A22X2 > 62 Xl >OMni, X2eW\
The inequalities have to be understood component by component. Associate to this primal problem an equivalent dual problem by (6.17).
Exercises
185
6.3) Consider the problem (6.19) of the hnear Chebyshev approximation with M = [0,1], v{t) = t^ for all t e [0,1], n = 1, Vi{t) = t for all t G [0,1]. With the aid of the alternation theorem (Theorem 6.16) determine a solution of this problem.
Chapter 7 Application to Extended Semidefinite Optimization In semidefinite optimization one investigates nonlinear optimization problems in finite dimensions with a constraint requiring that a certain matrixvalued function is negative semidefinite. This type of problems arises in convex optimization, approximation theory, control theory, combinatorial optimization and engineering. In system and control theory socalled linear matrix inequalities (LMFs) and extensions like bilinear matrix inequahties (BMFs) fit into this class of constraints. Our investigations include various partial orderings for the description of the matrix constraint and in this way we extend the standard semidefinite case to other types of constraints. We apply the theory on optimality conditions developed in Chapter 5 and the duality theory of Chapter 6 to these extended semidefinite optimization problems.
7.1
Lowner Ordering Cone and Extensions
In the socalled conic optimization one investigates finite dimensional optimization problems with an inequality constraint with respect to a special matrix space. To be more specific, let <S^ denote the real linear space of symmetric (n, n)matrices. It is obvious that this space is a
188
Chapter 7. Application to Extended Semidefinite Optimization
finite dimensional Hilbert space with the scalar product (•, •) defined by {A, B) = trace(A • B) for all A,B e 5^. (7.1) Recall that the trace of a matrix is defined as sum of all diagonal elements of the matrix. Let C be a convex cone in S'^ inducing a partial ordering :^. Then we consider a matrix function G : W^ —^ S^ defining the inequality constraint G{x) ^^s^.
(7.2)
If / : W^ —^ R denotes a given objective function, then we obtain the conic optimization problem min/(x) subject to the constraints . G{X) ^ {)sn ^
,^ . ^^'"^^
The name of this problem comes from the fact that the matrix inequality has to be interpreted using the ordering cone C. Obviously, the theory developed in this book is fully apphcable to this problem structure. In the special literature one often investigates problems of the form min/(X) subject to the constraints •
.^ ..
X eSP with given functions / : <S^ —> R and G : S'^ ^ S^. In this case the matrix X E S^ can be transformed to a vector x E R^'^ where X consists of all columns of X by stacking up columns of X from the first to the pth column. The dimension can be reduced because p(p+i) X is symmetric. Then we obtain x G R 2 . If cp denotes the transformation from the vector x to the matrix X, then the problem (7.4) can be written as min {f oip){x) subject to the constraints (G0(^)(X) ^O^n £(£+1)
X e R~"2—.
7.1. Lowner Ordering Cone and Extensions
189
Hence, the optimization problem is of the form of problem (7.3) and it is not necessary to study the nonlinear optimization problem (7.4) separately. In practice, one works with special ordering cones for the Hilbert space <S^. These cones are discussed now. Remark 7.1. Let <S^ denote the real linear space of symmetric (n, n) matrices. (a) The convex cone S^l := {X e S'^ \ X is positive semidefinite} is called the Lowner^^ ordering cone. The partial ordering induced by the convex cone S!l is also called Lowner partial ordering :< (notice that we use the special symbol :< for this partial ordering). The problem (7.3) equipped with the Lowner partial ordering is then called a semidefinite optimization problem. The name of this problem is caused by the fact that the inequality constraint means that the matrix G{x) has to be negative semidefinite. Although the semidefinite optimization problem is only a finite dimensional problem, it is not a usual problem in W^ because the Lowner partial ordering makes the inequality constraint complicated. In fact, the inequality (7.2) is equivalent to infinitely many inequalities of the form y^G{x)y 0 for all y e K} for a given convex cone K CR^^ i.e., we consider only matrices for which the quadratic form is nonnegative on the convex cone •^^K. Lowner, Uber monotone Matrixfunktionen, Mathematische Zeitschrift 38 (1934) 177216.
190
Chapter 7. Application to Extended Semidefinite Optimization
K. If the partial ordering induced by this convex cone is used in problem (7.3), then we speak of a Kcopositive optimization problem. It is evident that S!^ C C^ for every convex cone K and S'!^ = C^n. Therefore, we have for the dual cones (C^)* C (<S!f)*. If K equals the positive orthant W\_^ then C£n is simply called copositive ordering cone and the problem (7.3) is then called copositive optimization problem. (c) The nonnegative ordering cone is defined by A^^ := {X G 5^ I Xij > 0 for all ij G { 1 , . . . ,n}}. In this case the optimization problem (7.3) with the partial ordering induced by the convex cone A^'^ reduces to a standard optimization problem of the form min/(a;) subject to the constraints Gij{x) < 0 for all z, j G { 1 , . . . , n} X
eW^.
The number of constraints can actually be reduced to lii^^tlz because the matrix G[x) is assumed to be symmetric. So, such a problem can be investigated with the standard theory of nonlinear optimization in finite dimensions. (d) The doubly nonnegative ordering cone is defined by =
{X E S^ \ X IS positive semidefinite and elementwise nonnegative}.
If we use the partial ordering induced by this convex cone in the constraint (7.2), then the optimization problem (7.3) can be written as min/(x) subject to the constraints G{x) ^ Osn Gij{x) < 0 for all z, j G { 1 , . . . , n} X
eW^.
7.1. Lowner Ordering Cone and Extensions
191
So, we have a semidefinite optimization problem with additional finitely many nonlinear constraints. Obviously, for every convex cone K we have D^ C C^ and (C^)* C {D'^y. Before discussing some examples we need an important lemma on the Schur complement Lemma 7.2. Let X = f "^ ^
J G 5^+^ with A e S\
C e S^
and B G R^^'^^ be given, and assume that A is positive definite. Then we have for the Lowner partial ordering :< X
^ Osk,1 ^=^
{C
BA^B^)
^ Osi
(the matrix C — BA~^B^ is called the Schur complement of A in X). Proof. We have
= x^Ax + 2x^B^y + y^Cy
for all x eR^ and all y G M^ 4==> 0 < min x^Ax + 2x^B^y + y^Cy for all y e^K Since A is positive definite, for an arbitrarily chosen ?/ G M^ this optimization problem has the minimal solution —A~^B^y with the minimal value y^BA'B^y
+ y^Cy = y^{C 
BA'B^)y.
Consequently we get X^Qs^^i
4=^ ^=>
y^{CBA^B^)y>0
for all y G R^
~{CBA^B^)^Osi. D
The following example illustrates the significance of semidefinite optimization.
192
Chapter 7. Application to Extended Semidefinite Optimization
Example 73. (a) The problem of determining the smallest among the largest eigenvalues of a matrixvalued function A : M^ —> S^ leads to the semidefinite optimization problem min A subject to the constraints A{x) ~XI di Osn (with the identity matrix / G 5^ and the Lowner partial ordering : 0 with v{x) > P\\x\\^ for all x G R^ Then we conclude with (7.10) lim x(t) = 0 , t—^oo
7.1. Lowner Ordering Cone and Extensions
195
i.e. the autonomous linear system (7.7) is stabilizable. Hence, we obtain the stabihzation of (7.7) by a feedback control, if we choose a positive definite matrix P E S^ and a matrix F G M^^'^^ so that {A + BFYP + P{A + BF) is negative definite. In order to fulfill this requirement we consider the semidefinite optimization problem min A subject to the constraints XI + {A + BFfP + P{A + BF) di O^fc  A / P di^s^ AGR,
PeS^,
(7.11)
FGR(^'^)
(/ G S^ denotes the identity matrix and recall that :< denotes the Lowner partial ordering). By a suitable transformation this problem formally fits into the class (7.3) of semidefinite problems. Here G has to be defined in an appropriate way. It is important to note that it is not necessary to solve the problem (7.11). Only a feasible solution with A < 0 is requested. Then the matrices P and F fulfill the requirements for the stabilization of the autonomous linear system (7.7). (d) Finally we discuss an applied problem from structural optimization and consider a structure of k elastic bars connecting a set of Tp nodes (see Figure 7.1). The design variables xi {i — 1^... ,k)
Figure 7.1: Cantilever with 7 nodes and the load force fj.
196
Chapter 7. Application to Extended Semidefinite Optimization
are the crosssectional areas of the bars. We assume that nodal load forces / i , . . . , /p are given, / i , . . . , //. denote the length of the bars, v is the maximal volume, and x^ > 0 and Xi a r e the lower and upper bounds of the crosssectional areas. The socalled stiffness matrix A{x) G S'^ is positive definite for all X i , . . . , x/j; > 0. We want to find a feasible structure with minimal elastic stored energy. Then we obtain the optimization problem
min/^A(x)~V subject to the constraints
y ^ kxi < V z=l
oc.i < Xi < Xi for all z G { 1 , . . . , k} or min A subject to the constraints
fA{x)~'f
 A 0, i.e. X G (C^)*. For the proof of the converse inclusion we first show H^ C C^. Let an arbitrary X ^ C^ be given. Then there is some y E K with y^Xy < 0. If we set Y" := yy'^, then we have Y G HK and (y, X) = trace(y • X) = tmce{Xyy^)
= y^Xy < 0,
i.e. X ^ H^. Consequently H^ C CK and for the dual cones we get (C^)* C m y . (7.12) Next, we show that {H]^)* C HK For this proof let Z G (F^)* be arbitrarily given and assume that Z ^ i// 0. Proof. Because of the differentiability assumptions we have that / and G are Frechet differentiable at x. Then we apply Corollary 5.4 and obtain the existence of a real number /i > 0 and a matrix L E C fiVf{x) + Lo G'{x) = ORm and {L,G{x))=0. For every h G R"^ we obtain with Lemma 7.7 {LoG'{x)){h)
=
{L,G'{x){h)) m i=l
m
= Y.mG,,{x))hi i=i
(7.18)
7.2. Optimality Conditions
205
/ {L,G,,{x)) y h.
Then the equahty (7.18) imphes (L,G,,(x)) \
/xV/(x) +
{L,G,Jx)) J Hence, one part of the assertion is shown. If we consider the KurcyuszRobinsonZowe regularity assumption (5.9) for the special problem (7.3), we have 5 = M^ and cone(^  {x}) = W^. So, the equality (7.17) is equivalent to the regularity assumption (5.9). This completes the proof. • In the case of /i > 0 we can set U := L E C* and the equalities (7.15) and (7.16) can be written as f,,{x) + {U,G,,{x)) = 0 for all z G { 1 , . . . ,m} and {U,G{x)) = 0. This gives the extended KarushKuhnTucker conditions to matrix space problems. In Theorem 7.8 the Lagrange multiplier L is a matrix in the dual cone C*. According to the specific choice of the ordering cone C discussed in Lemmas 7.4, 7.5 and 7.6 we take the dual cones given in Lemmas 7.4,(b), 7.5,(b),(ii) and 7.6,(b),(d). For instance, if C denotes the Lowner ordering cone, then the multiplier L is positive semidefinite. Instead of the regularity assumption (7.17) used in Theorem 7.8 we can also consider a simpler condition. Lemma 7.9. Let the assumption of Theorem 7.8 be satisfied and let C denote the Kcopositive ordering cone C^ for an arbitrary convex cone K. If there exists a vector x G M^ so that G{x) +
206
Chapter 7. Application to Extended Semidefinite Optimization
m
2_\Gxi{x)(xi~Xi)
is negative definite, then the regularity assumption
in Theorem 7.8 is fulfilled. Proof. By Lemma 7.5,(a) we have m
G{x) + G'{x){xx)
=
G{x) +
^G,,{x){^i^i)
e
int(Q)
and with Theorem 5.6 the KurcyuszRobinsonZowe regularity assumption is satisfied, i.e. the equahty (7.17) is fulfilled. •
It is obvious that in the case of the Lowner partial ordering 5'!^ = C^n Lemma 7.9 is also applicable. Notice that a similar result can be shown for the ordering cones discussed in Lemma 7.6. For the interior of these cones we can then use the results in Lemma 7.6, (a) and (c). Next, we answer the question under which assumptions the Lagrange multiplier rule given in Theorem 7.8 as a necessary optimality condition is a sufficient optimality condition for the conic optimization problem (7.3). Theorem 7.10. Let f : W^ ^ R and G : W^ ^ S"" be given functions. Let for some x G R^ / be differentiable and pseudoconvex atx and letG be elementwise differentiable and {—C+cone{{G{x)}) — cone{{G{x)}))quasiconvex at x. If there is a matrix L E C" with Vf{x)+
/ {L,G,,ix)) :
\ =0R
(7.19)
\{L,G,Jx)) J and {L,G{x))=0, then X is a minimal solution of the conic optimization problem (7.3). Proof. With Lemma 7.7 the equality (7.19) implies Vf{x) + LoG\x)
= OMm.
7.3. Duality
207
By Lemma 5.16 and Corollary 5.15 the assertion follows immediately. D
The quasiconvexity assumption in Theorem 7.10 (compare Definition 5.12) means that for all feasible x G M^ G{x)  G{x) G  C + cone{{G{x)}) 
cone{{G{x)})
m
=^
X^G'^i(^)(2;i  Xi) e  C + cone({G(^)}) 
cone{{G{x)}).
For all feasible x G W^ this implication can be rewritten as G{x) + (a  1  p)G{x) e C
for some
a,p>0
m
=4>
yZ ^^i (^)(^^ ~ ^i) + (7 "~ ^)G{x) E —C for some 7,5 > 0
or G{x) + aG{x) G C
for some a
eR
m
==> y^Gxi{x){xi
— Xi) + ^G{x) G —C for some 7 G M.
1=1
7.3
Duality
The duality theory developed in Chapter 6 is now applied to the conic optimization problem (7.3) with given functions / : W^ —> M and G : W^ —> 5 " and the partial ordering ^^ inducing the ordering cone C. For convenience we recall the primal optimization problem min/(x) subject to the constraints G{X) ^ O^n XGR^.
According to Section 6.1 the dual problem can be written as max inf fix) + (U,Gix))
(7.20)
208
Chapter 7. Application to Extended Semidefinite Optimization
or equivalently max A subject to the constraints f{x) + ([/, G{x)) > A for all x G R^ A G R, U eC\ We are now able to formulate a weak duality theorem for the conic optimization problem (7.3). Theorem 7.11. Let the composite mapping (/, G) : R"^ ^RxS^ be convexlike. For every feasible x of the primal problem (7.3) and for every feasible U of the dual problem (7.20) the following inequality is satisfied
Mfix) + {tj,G{x)) A^^ positive semidefinite.
7.6) Show that the linear semidefinite optimization problem min X2 subject to the constraints Xi 1
1 X2 '
Xi,X2 G M
(where :< denotes the Lowner partial ordering) is not solvable. 7.7) Let the hnear mapping G : R^ —^ <S^ with
G{xuX2) =(11
^0' ) f^^ ^^^ (^i'^2) G M'
be given. Show that G does not fulfill the generahzed Slater condition given in Theorem 7.12 for C = S^.
Exercises
211
7.8) Let c G R"*, B G 5 " and a linear mapping A : R"* ^ <S" with A{x) = A^^^xi + ... + A^'^^Xm for all
xeW^
for A W , . . . , A ( ' " ) e<S"be given. Show that for the Hnear problem min c^x subject to the constraints B ^ A{x) the dual problem is given by max (J5, U) subject to the constraints {Am^U)=Cr
U
eC\
7.9) Consider the linear semidefinite optimization problem min xi subject to the constraints 0 xi 0 \ Xi
X2
0
0
0
Xi1
I^ O53
J
Xi,X2 G M
(where :< denotes the Lowner partial ordering). Give the corresponding dual problem and show that the extremal values of the primal and dual problem are not equal. Why is Theorem 7.12 not applicable?
Chapter 8 Direct Treatment of Special Optimization Problems Many of the results derived in this book are concerned with a generally formulated optimization problem. But if a concrete problem is given which has a rich mathematical structure, then solutions or characterizations of solutions can be derived sometimes in a direct way. In this case one takes advantage of the special structure of the optimization problem and can achieve the desired results very quickly. In this final chapter we present two special optimal control problems and show how they can be treated without the use of general theoretical optimization results. The first problem is a socalled linear quadratic optimal control problem. For the given quadratic objective functional one gets a minimal solution with the aid of a simple quadratic completion without using necessary optimality conditions. The second problem is a timeminimal optimal control problem which can be solved directly by the apphcation of a separation theorem.
8.1
Linear Quadratic Optimal Control Problems
In this section we consider a system of autonomous linear differential equations x{t) = Ax{t) + Bu{t) almost everywhere on [0, f]
(8.1)
214
Chapter 8. Direct Treatment of Special Optimization Problems
and an initial condition x{0) = x^
(8.2)
(where T > 0 and x^ G E^ are arbitrarily given). Let A and B be (n, n) and (n, m) matrices with real coefficients, respectively. Let every control u G L^([0,r]) be feasible (i.e. the controls are unconstrained). It is our aim to steer the system (8.1), (8.2) as close to a state of rest as possible at the terminal time T. In other words: For a given positive definite symmetric (n, n) matrix G with real coefficients the quadratic form X{TYGx{T) should be minimal. Since we want to reach our goal with a minimal steering effort, for a given positive definite symmetric (m, m) matrix R with real coefficients the expresf sion / u(tYRu{t) dt should be minimized as well. These two goals are 0
used for the definition of the objective functional J : I/^([0, T]) —> M with f
J{u) = x{ffGx{f)
+ Iu{tYRu{t)dt
for all u G L^([0,f]).
0
Under these assumptions the considered linear quadratic optimal control problem then reads as follows: Minimize the objective functional J with respects to all controls u G L^([0, T]) for which the result I ing trajectory is given by the system (8.1) of dif [ ferential equations and the initial condition (8.2). J
. . ^ '^
In order to be able to present an optimal control for the problem (8.3) we need two technical lemmas. Lemma 8.1. Let P() he a real {n^n) matrix function which is symmetric and differentiable on [0, T]. Then it follows for an arbitrary control u G L^([0,T']) and a trajectory x of the initial value problem (8.1), (8.2): f 0 =
a;°'^P(0)x°  x{ffP{f)x{f)
+ f ^2u{tf
B'^P{t)x{t)
8.1. Linear Quadratic Optimal Control Problems
215
+x{tY [Pit) + A^P{t) + Pit)A) xit) dt.
Proof. For an arbitrary control u G L^([0,r]) and a corresponding trajectory x of the initial value problem (8.1), (8.2) and an arbitrary real matrix function P() defined on [0,T] and being symmetric and differentiable it follows:
I [x{tYP{t)x{t)] = x{tfP{t)x{t) =
{Ax{t) + Bu{t)f +x{tf
=
+ x{tf
iP{t)x{t) + P{t)x{t))
P{t)x{t)
iP{t)x{t) + P{t) {Ax{t) + 5M(i)))
x{tf(^P{t)+A^P{t)
+
+2u{tf B^P{t)x{t)
Pit)A^x{t)
almost everywhere on [0,f].
With the initial condition (8.2) we get immediately by integration x{ffP(f)x{f)
 x'>'^P{0)x''
1
1
\2u{tfB^P{t)x{t) +x{tf
(P(t) + A^Pit) + P{t)A)
xit) dt
which implies the assertion.
Lemma 8.2.
n
The {n,n) matrix function P() with
P{t) = ^A(tf)gi^A'r(tf)
^
I
/ e^(*^)5i?iS^e^^(*^) ds
for all t e [0, f]
(8.4)
216
Chapter 8. Direct Treatment of Special Optimization Problems
is a solution of the Bernoulli matrix differential equation P{t) + A^P{t) + P{t)A  P{t)BR^B^P{t)
= 0 for all t e [0, f] (8.5)
with the terminal condition
P{f) = G.
(8.6)
The matrix function P() defined by (8.4) is symmetric. Proof. First we define the (n, n) matrix function Q() by
ds
for all te
[0,f]
(notice that the matrix exponential function is defined as a matrix series). It is evident that Q{') is a symmetric matrix function. For an arbitrary z ER^^ z y>^ ORn, we obtain z^Q{t)z f
>
>0 O f o r a l H e [0,f].
t
>0
Consequently, for every t G [0,T] the matrix Q(t) is positiv definite and therefore invertible, i.e. the matrix function P() with P{t)^Q{t)^
f o r a l H E [0,f]
is welldefined. Since Q() is symmetric, P() is also symmetric. It is obvious that P() satisfies the terminal condition (8.6). Hence, it remains to be shown that P() is a solution of the Bernoulli matrix differential equation (8.5). For this proof we calculate the derivative (notice the implications for arbitrary t G [0, T]: Q{t) • Q{t)~^ = I ==^
8.1. Linear Quadratic Optimal Control Problems
217
Q{t)Qit)' + Q{t)i{Q{tr') = 0 =^ iiQit)')
= Q{tr'Q{t)
Pit) = iiQity') =
Q{t)'Qm{t)' f t
+e^(*^)Si?i5^e^'^(*^U^) ds 
BR'B^'\Q{t)~'

Q{t)'
[AQ{t) + Q{t)A^  BR^B^]
Q{t)~'
= =
Q{t)'A  A^Q{t)' + Q{t)'BR'B^Q{t)' P{t)A  A^P{t) + P{t)BR^B^P{t) for all t G [0,f ].
Consequently, P() satisfies the Bernoulli matrix differential equation (8.5). D With the aid of the two preceding lemmas it is now possible to present the optimal control of the linear quadratic problem (8.3). Theorem 8.3.
The socalled feedback control u given by
u{t) = —R~^B^P{t)x{t)
almost everywhere on [0,T]
is the only optimal control of the linear quadratic control problem (8.3) where the matrix function P() is given by (8.4)Proof. In the following let P() be the matrix function defined by (8.4). Then we have with Lemma 8.1 and Lemma 8.2 for every control u e L^([0, f ]) with u ^ u:
J{u) =
x{ffGx{f)+fu{tfRu{t)dt 0
=
x^^P(0)x^ + x{ff[G

P{f)]x{T)
Chapter 8. Direct Treatment of Special Optimization Problems
218
+ I \u{tfRu{t) +
2u{tfB^P{t)x{t)
0
+x{tf{p{t)
+ A^Pit) +
P{t)Ay{t)\ dt
(from Lemma 8.1) =
X
f P(0)x^ + / \u{tfRu{t) +
0^]
2u{tfB^P{t)x{t)
0
+x{tY P{t)BR^B^ P{t)x{t)\ dt (from Lemma 8.2) 1
X
P{Qi)x^ + / {u{t) +
R~^B^P{t)x{t)YR
0
(u{t) + R^B^P{t)x{t)) dt > a;°^P(0)x'^ = J{u). Hence u is the only minimal point of the functional J.
D
The optimal control presented in Theorem 8.3 depends on the time variable t and the current state x{t). Such a control is called a feedback or a closed loop control (see Fig. 8.1). u{t)
x{t) ^ x{t) = Ax{t) + Bu(t)
u {t) =
R^B^P{t)x{t)k:
Figure 8.1: Feedback control of Theorem 8.3.
8.1. Linear Quadratic Optimal Control Problems
219
If the control function depends only on t and not on the state x(t), then it is called an open loop control Feedback controls are of special importance for applications. Although feedback controls are also derived from the mathematical model, they make use of the real state of the system which is described mathematically only in an approximate way. Hence, in the case of perturbations which are not included in the mathematical model, feedback controls are often more realistic for the regulation of the system. Since the matrix function P is analytic and the trajectory x is absolutely continuous, the optimal control u in Theorem 8.3 is an absolutely continuous vector function. In fact, a solution of the linear quadratic optimal control problem lies in a smaller subspace of Notice that the proof of Theorem 8.3 could be done with the aid of an optimality condition. Instead of this we use a quadratic completion with Lemma 8.1 and 8.2 which is simpler from a mathematical point of view. The linear quadratic control problem (8.3) can be formulated more generally. If one defines the objective functional J by f J{u) = x{ffGx{f)
+ / {x{tfQx{t)
+ u{tfRu{t))
dt
0
for all 7xGL^([0,f]) where Q is a positive definite symmetric (n, n) matrix with real coefficients, then the result of Theorem 8.3 remains almost true for the modified control problem. The only difference is that then the matrix function P() is a solution of the Riccati matrix differential equation P{t) + A^P{t) f P{t)A + Q P{t)BR^B^P{t)
= 0 for all t e [0, f ]
with the terminal condition P{T) — G. Example 8.4. equation
As a simple model we consider the differential
x{t) = 3x{t) + u{t) almost everywhere on [0,1]
220
Chapter 8. Direct Treatment of Special Optimization Problems
with the initial condition x(0) = x° where x^ G R is arbitrarily chosen. The objective functional J reads as follows: 1
J{u) = x{lY + i / u { t f dt for all u e Loo([0,1]). 0
Then we obtain the function P as n 1 oHt
Pit) =
t 1
1
t 1
6(tl) _ 5 6(tl) , ^
6
6
6
—T f o r a l H e [0,1].
5 + e6(*i)
Hence, the optimal control u is given by
== —
30 Q(ti) ^(*) ^l^c>st everywhere on [0,1].
(8.7)
If we plug the feedback control u in the differential equation, we can determine the trajectory x: x{t)
= =
3x{t)+u{t) 3x{t)
30 ZTT^W 5 + e6(*i)
30 5 + e^Ci)
xit).
8.2. Time Minimal Control Problems
221
Then we obtain the trajectory x as x{t)
= x^ e^^
^+^
^
=
X^ g(3s6(sl)+ln(e6(i)+5))*
=
X^ g3t+ln(e6(*i)5)ln(e6+5) X^
g3t ^g6(tl) ^ 5j f^^ g^ii ^ ^ JQ^ ^1^
(g g)
e^ + 5 If we plug the equation (8.8) in the equation (8.7), we get the optimal control u in the open loop form u{t) =
—— e~^* almost everywhere on [0,1]. 6
~r 0
This optimal control is even a smooth function.
8.2
Time Minimal Control Problems
An important problem in control theory is the problem of steering a linear system with the aid of a bounded control from its initial state to a desired terminal point in minimal time. In this section we answer the questions concerning the existence and the characterization of such a time minimal control. As a necessary condition for such an optimal control we derive a socalled weak bangbang principle. Moreover, we investigate a condition under which a time minimal control is unique. In this section we consider the system of hnear differential equations x{t) = A{t)x{t) + B{t)u{t) almost everywhere on [0,T]
(8.9)
with the initial condition x{0) = x^
(8.10)
x{f)=x^
(8.11)
and the terminal condition
222
Chapter 8. Direct Treatment of Special Optimization Problems
where T > 0, x^,x^ G R^, A and B are (n,n) and {n^m) matrix functions with real coefficients, respectively, which are assumed to be continuous on [0,r], and controls u are chosen from L^([0, T]) with ll'^^lli/oo([o,f]) — ^ f^^ ^1^ ^ ^ {Ij • • • 5^} Then we ask for a minimal time T G [0, T] so that the linear system (8.9) can be steered from x^ to x^ on the time interval [0,T]. If we consider the linear system (8.9) on a time interval [0, T] with T E [0, T] we use the abbreviation U(T)
:= {u G L^([0, T])  for every fc G { 1 , . . . , m} we have '^/c(^) ^ 1 almost everywhere on [0,T]} for all T G [0,f] (8.12)
for the set of all feasible controls with terminal time T. Definition 8.5. For any T G [0,T] consider the linear system (8.9) on [0,T] with the initial condition (8.10). The set K{T)
:= {x{T) eW \ue U{T) and x satisfies the linear system (8.9) on [0,T] and the initial condition (8.10)}
is called the set of attainability. The set of attainability consists of all terminal points to which the system can be steered from x^ at the time T. Since we assume by (8.11) that the system can be steered to x^ we have x^ G K{T). Hence, the problem of finding a time minimal control for the linear system (8.9) satisfying the conditions (8.10), (8.11) can be transformed to a problem of the following type: Determine a minimal time T G [0,T] for which x^ G K{f) (see Fig. 8.2). Before going further we recall that for an arbitrary u G L^{[0, T]) the solution of the initial value problem (8.9), (8.10) with respect to the time interval [0,T], T G [0,T], can be written as t
x{t) = ^{t)x^ + $(t) / ^s)^B{s)u{s)
ds for all t G [0, f ]
8.2. Time Minimal Control Problems
223
y1\
K{f) K{f) K{T),
TG(0,f)
>
Figure 8.2: Illustration of the set of attainability.
where $ is the fundamental matrix with $(t) = A{t)^t)
for all t e [0,T],
$(0) = I (identity matrix)^^ Notice that in the case of a time independent matrix A, the fundamental matrix $ is given as
In the following, for reasons of simplicity, we use the abbreviations Y{t) := ^\t)B{t)
for all t G [0,r]
and R{T) :  2^"" (where
226
Chapter 8. Direct Treatment of Special Optimization Problems
K{') denotes the set of attainability) is continuous (with respect to the Hausdorff distance). Proof, First we prove the continuity of the mapping R. For that proof let f , r G [0,f], with T ^ T, be arbitrarily chosen. Without loss of generality we assume T < T. Then for an arbitrary y G R{T) there is a feasible control u with 1
y=
Y{t)u{t) dt.
For the feasible control u given by u{t)
_ J u{t) almost everywhere on [0,T] ( l , . . . , l f foralHG ( f , r ]
we have 1
fy{t)u{t) dt e
R{T).
Consequently we get d{y,R(T))
:=
0 the inequality g{K{f), K{T)) < /3$(T)  $(r) + $(f ) g{R{f), R{T))) that the mapping K is continuous. •
Lemma 8.9. (8.10) and some of the set K{T) that y is also an
Let the linear system (8.9) with the initial condition T G [0,T] he given. Let y he a point in the interior of attainahility, then there is a time T E (0, T) so interior point of K{T).
Proof. Let y be an interior point of the set K{T) (this implies f > 0). Then there is an s > 0 so that B{y,£) C K{f) for the closed ball B{y^e) around y with radius e. Now we assume that for
228
Chapter 8. Direct Treatment of Special Optimization Problems
all T G (0,T) y is not an interior point of the set K{T). For every T e (0,f) the set K{T) C W is closed and convex. Then for every T G (0,T) there is a hyperplane separating the set K{T) and the point y (compare Theorem C.5 and Theorem C.3). Consequently, for every T G (OjT) there is a point yr G B{y,e) whose distance to the set K{T) is at least e. But this contradicts the continuity of the setvalued mapping K. •
The next lemma is the key for the proof of a necessary condition for time minimal controls. For the formulation of this result we use the function sgn : M —> R given by
r
1 for y > 0 1
sgn(2/) = I 0 for 2/  0 > . [  1 for y < 0 J Lemma 8.10. Let the linear system (8.9) with the initial condition (8.10) and some f G (0,f] be given. If x{f,u) G dK{f) for some u G U{T), then there is a vector rj ^ ORn so that for all A; G { 1 , . . . , m } ; U]^{t) = sgn[ri'^Yk{t)] almost everywhere on {t G [0,T]r/^Y/.(t) ^ 0} fx(T, u) denotes the state at the time T with respect to the control u; Yk{t) denotes the kth column of the matrix Y{t)). Proof. Let an arbitrary point y := x{T^u) G dK{T) be given. Since the set K{T) is a convex and closed subset of R'^, by a separation theorem (see Theorem C.5) there is a vector f) ^ ORTZ with the property
fy
> fy
for all y G K{T).
Because of f
fy
^ f^{T)x''
+ f^{T)
I Y{t)u{t) dt
8.2. Time Minimal Control Problems
229
and f
fy
= f^f)x^
+ f^{f)
I Y{t)u{t) dt for all y E K{f) 0
we obtain for rj^ :=
ff^{T) f
f
rf I Y{t)u{t) dt>ri^ 0
I Y{t)u{t) dt
(8.14)
0
for all feasible controls steering the linear system (8.9) with the initial condition (8.10) to a state in the set K{T) of attainability. From the inequality (8.14) we conclude rfY{t)u(t)
> r}^Y{t)u{t) almost everywhere on [0,T].
(8.15)
For the proof of the implication "(8.14) = > (8.15)" we assume that the inequality (8.15) is not true. Then there is a feasible control u and a set M C [0, T] with positive measure so that rfY{t)u{t)
< r}^Y{t)u(t) almost everywhere on M.
If one defines the feasible control u^ by ^, X _ J u{t) almost everywhere on [0,r] \ M 1 ^ ^ \ u{t) almost everywhere on M J ' then it follows f
T]^ fY{t)u'{t)dt
= rf lY(t)u{t)dt M
+ ri^
I
_ _dt Y{t)u{t)^
[0,t]\M
' I Y{t)u{t) dt + ri^ I Y{t)u{t) dt M T
T]'' fY{t)u{t)dt 0
[0,t]\M
230
Chapter 8. Direct Treatment of Special Optimization Problems
which contradicts the inequality (8.14). Hence, the inequality (8.15) is true. From the inequality (8.15) we get for all A; G { 1 , . . . , m} uj,{t) = sgn [v^Yk{t)] almost everywhere on {t G [0,f]\r]'^Yk{t) / 0}. D
Now we present the aforementioned necessary condition for time minimal controls. Theorem 8.11. Lei the linear system (8,9) with the initial condition (8.10) and the terminal condition (8.11) he given. If u is a time minimal control with respect to the minimal terminal time T G [0,T], then there is a vector rj ^ OE^ SO that for a// /c G { 1 , . . . , m}: Uk{t) = sgn[ri^Yk{t)] almost everywhere on {t G [0,T]\ri^Yk{t) ^ 0}. (8.16)
Proof. The assertion is obvious for T = 0. Therefore we assume T > 0 for the following. We want to show that T
y := $(f)x^ + $(T) /Y{t)u{t) dt G dK{f).
(8.17)
0
Suppose that y were an interior point of the set K{T) of attainability. Then by Lemma 8.9 there is a time T G (0, T) so that y is also an interior point of the set K{T). But this contradicts the fact that T is the minimal time. Hence, the condition (8.17) is true. An application of Lemma 8.10 completes the proof. • The statement (8.16) is also called a weak hanghang principle. If the measure of the set {t G [0,T]77^yA;(^) = 0} equals 0 for every k G { l , . . . , m } , the statement (8.16) is called a strong hanghang principle. Theorem 8.11 can also be formulated as follows: For every time minimal control u there is a vector 77 7^ Q^n SO that u satisfies the weak bangbang principle (8.16).
8.2. Time Minimal Control Problems
231
The next example illustrates the applicability of Theorem 8.11. Example 8.12. We consider the harmonic oscillator mathematically formalized by y{t) + y{t) ~ u{t) almost everywhere on [0, T], ll^llLoo([0,f]) ^ 1 where T > 0 is sufficiently large. An initial condition is not given explicitly. The corresponding linear system of first order reads
— A
=:B
We have ^/.\ ^'
At
v ^ A'it^ ^^—^ %\ ^=o
( cost sint \ — sm£ cost ^
and
Then we obtain for an arbitrary vector r] ^ 0^^ r}^Y{t) — —r/i sin t + r/2 cos t. Consequently, we get for a number a G M and a number 5 G [—TT, TT] ri^Y{t)  a s i n ( t + (5) and therefore sgTi[r}^Y{t)] = sgn[asin(t + S)] (see Fig. 8.3). Conclusion: If there is a time minimal control u, then it fulfills the strong bangbang principle, and therefore it is unique. After TT time units one always gets a change of the sign of u. With a standard result from control theory one can see that the considered linear system is null controllable (i.e., it can be steered to
Chapter 8. Direct Treatment of Special Optimization Problems
232
1
\
sgn[asin(£ + S)]
n
u 1
VV
•V
V
V
TT
n
/
t
Figure 8.3: Illustration of the time optimal control.
the origin in a finite time). Hence, by Theorem 8.7 there is also a time minimal control u which steers this system into a state of rest, and therefore the preceding results are applicable. Now we present an example for which the necessary condition for time minimal controls does not give any information. Example 8.13. We investigate the simple linear system ../.
^\J. ,
}J. > almost everywhere on [0, Tl
with l^llLoo[0,f] ^ 1 and T > 0. Here we set 1 0 0 1
A=
/ and B
Then we obtain
Y(t)
At B
^t
and for any vector rj ^ ^^2 we get
For example, for ry = (
. 1 we conclude
77^y(t) = 0 for alH G [0,t],
8.2. Time Minimal Control Problems
233
and Theorem 8.11 does not give a suitable necessary condition for time minimal controls. Next we investigate the question under which conditions time minimal controls are unique. For this investigation we introduce the notion of normality. Definition 8.14. (a) The linear system (8.9) is called normal on [0, T] (with T G [0, f ]), if for every vector TJ ^ 0^^ the sets Gk{j])  {t G [0, T] I r/^n(t)  0} with /c G { 1 , . . . , m} have the measure 0. Y^it) denotes again the /cth column of the matrix Y{t). (b) The linear system (8.9) is called normal^ if for every T G [0,T] this system is normal on [0,r].
Theorem 8.15. Let the linear system (8.9) with the initial condition (8.10) and the terminal condition (8.11) he given. If u is a time minimal control with respect to the minimal terminal time T G [0,T] and if the linear system (8.9) is normal on [0, T], then u is the unique time minimal control. Proof. By Theorem 8.11 for every time minimal control u there is a vector r/ ^ ORn so that for all fc G { 1 , . . . , m}: Uk{t) = sgn[ri'^Yk{t)] almost everywhere on [O^T] \ Gkirf). Then the assertion follows from the normality assumption (notice that in the proof of Lemma 8.10 the vector 77 depends on the terminal state and not on the control). • A control u which satisfies the assumptions of Theorem 8.15 fulfills the strong bangbang principle u{t) = sgn[?7^y;.(t)] almost everywhere on [0,T].
234
Chapter 8. Direct Treatment of Special Optimization Problems
One obtains an interesting characterization of the concept of normahty in the case of an autonomous linear system (8.9) with constant matrix functions A and B. Theorem 8.16. The autonomous linear system (8,9) with constant matrix functions A and B is normal if and only if for every k E {1^... ,m} either rank {Bj,, ABk,...,
A'^'^Bk) = n
(8.18)
or rank (A — A/, 5/^) — n for all eigenvalues A of A.
(8.19)
Here B^ denotes the kth column of the matrix B. Proof. We fix an arbitrary terminal time T G [0, T]. First notice that for every A; G { 1 , . . . , m} and every 77 G R^
Consequently, the realvalued analytical function rfYk{') on [0,T] is either identical to 0 or it has a finite number of zeros on this interval. Therefore, the autonomous hnear system (8.9) is normal on [0, T] if and only if the following implication is satisfied: rj^e'^^Bk = 0 for all t G [0, T] and some fc G { 1 , . . . , m} => 77 = 0]Rn. (8.20) Next we show that the implication (8.20) is equivalent to the condition (8.18). For this proof we assume that the condition (8.18) is satisfied. Let a vector 77 G R'^ with rj^e^^Bk = 0 for all t G [0, T] and some A; G { 1 , . . . , m} be arbitrarily given. By repeated differentiation and setting H = 0" we get if{Bk, ABk,...,
A''~'^Bk) = O^n for some /c G { 1 , . . . , m}.
By assumption the system of row vectors of the matrix (5^, ABk^..., A^'^Bk) is linear independent, and therefore we get r] = OM^. Hence,
8.2. Time Minimal Control Problems
235
the implication (8.20) is satisfied, i.e. the autonomous linear system (8.9) is normal on [0,r]. Now we assume that the condition (8.18) is not satisfied. This means that for some /c G { 1 , . . . , m} the system of row vectors of the matrix {Bk^ABk,... ,A^~^Bk) is linear dependent. Then there is a vector T) ^ 0^n with r]^{Bk,AB,,...,A^'B,)
= Ol.
which implies rj'^Bk = v^ABk = "• = v^A^'Bk
= 0.
(8.21)
The CayleyHamilton theorem states that the matrix A satisfies its characteristic equation, i.e. A"" = aol + aiA +•• + a^_i A^~^ with appropriate coefficients a^^ai^,.. with (8.21)
,ani G R. Then we obtain
rj^A'^Bk = aoV^Bk + aiV^ABj, + • • • + aniff A""^ B^ = 0 and by induction rj^A^Bk = 0 for all I > n.
(8.22)
The equations (8.21) and (8.22) imply r]^A^Bk = 0 for all / > 0 which leads to rj^e'^'Bk  V^ ( £ ^ '  ^
) Bk = 0 for all t G [0,r].
Consequently, the imphcation (8.20) is not satisfied, i.e. the autonomous linear system (8.9) is not normal on [0,T]. Finally we show the equivalence of the two rank conditions (8.18) and (8.19). Let A; G { 1 , . . . , m} be arbitrarily chosen.
236
Chapter 8. Direct Treatment of Special Optimization Problems
Assume that the condition (8.19) is not satisfied, i.e. for some possibly complex eigenvalue A of A we have rank {A — A/, Bk) 7^ n. Then there is a vector z eR^ with z ^ ORn and 2:^(AA/,fi^)0^n+i, i.e. z^A  \z^
(8.23)
z^B^ = 0.
(8.24)
and With the equations (8.23) and (8.24) we conclude z'^ABk = Xz^Bk = 0, and by induction we get z^A^Bk = 0 for all I > 0. Hence we have mnk{Bk,ABk,...,A''^Bk)7^n. Conversely, we assume now that the equation (8.18) is not satisfied. Then there is a 2: 7^ O^n with z^Bk = 0, z^ABk  0 , . . . , z^A^'^Bk = 0. Again with the CayleyHamilton theorem we conclude immediately z^A^Bk = 0 for all I > 0. Consequently, the linear subspace S :={zeW
\ z^A^Bk = 0 for all I > 0}
has the dimension > 1. Since the set S is invariant under A^ (i.e. A^S C S)^ one eigenvector z of A^ belongs to S. Hence, there is an eigenvalue A of A^ which is also an eigenvalue of A so that A^z = Xz
8.2. Time Minimal Control Problems
237
or alternatively z^{AXI)=Oln.
(8.25)
Because of z e 5 we obtain with / = 0 z^Bk = 0.
(8.26)
The equations (8.25) and (8.26) imply rank {A — A/, Bk) 7^ n for some eigenvalue A of A, This completes the proof.
•
In control theory the condition rank
{B,AB,,..,A''^B)=n
is called the Kalman condition. It is obvious that the condition rank {Bj,, ABj,,...,
A''~'^Bk) = n for all A; E { 1 , . . . , m}
which is given in Theorem 8.16 impUes the Kalman condition. Moreover, in control theory the condition rank {A — A/, B) = n for all eigenvalues A of A is called the Hautus condition which is implied by the condition rank {A — A/, Bk) = n for all fcG{l,..., m} and all eigenvalues A of A. One can show with the same arguments as in the proof of Theorem 8.16 that the Kalman and Hautus conditions are equivalent. In control theory one proves that the Kalman condition (or the Hautus condition) characterizes the controllability of an autonomous linear system, i.e. in this case there is an unconstrained control which steers the autonomous linear system from an arbitrary initial state to an arbitrary terminal state in finite time. The following example shows that the Kalman condition (or the Hautus condition) does not imply the condition (8.18) (and (8.19), respectively).
238
Chapter 8. Direct Treatment of Special Optimization Problems
Example 8.17. The following autonomous linear system satisfies the Kalman condition but it is not normal:
with some T > 0. Here we set A= {
1 0 ^ _2„ J and Q — ^B " \ 1 1
Then we have
1 y '
^
V  2 y•
The matrix (^2, ^^2) has the rank 1, and therefore the hnear system is not normal. On the other hand we have rank {B,AB)
2,
i.e. the Kalman condition is satisfied.
Exercises 8.1) Consider the differential equation x{t) = 2x{t) — 3u{t) almost everywhere on [0,2] with the initial condition x{0) = x^ for an arbitrarily chosen x° G M. Determine an optimal control u G Loo([0,2]) ^s a minimal point of the objective functional J : Loo([0,2]) ^ R with 2
J{u) = x{lY
+ 2 fu{tfdt
for all u G Loo([0,2]).
239
Exercises
8.2) ([49, p. 132133]) Let the initial value problem x{t) = u(t) almost everywhere on [0,1], x(0)  1 be given. Determine an optimal control u G I/oo([0,1]) for which the objective functional J : I/oo([0,1]) ^ R with 1
J{u) = I {u{tf + x{tf) dt for all u e Loo([0,1]) 0
becomes minimal. 8.3) Consider the linear differential equation of nth order y^\t) + aniy^^'\t)
+ . • • + a,y{t) = u{t) almost everywhere on [0, T]
where T > 0 and ao,... ,^^1 G M are given constants. The control u is assumed to be an Loo([0,T]) function. Show that the system of linear differential equations of first order which is equivalent to this differential equation of nth order satisfies the Kalman condition. 8.4) ([206, p. 2224]) Let the system of linear differential equations x{t) — Ax{t) + Bu{t) almost everywhere on [0, T] with /
0 a A = 0 \ 0
1 0 0 0
0 0 0 0
0 \ 0 1 0^
( and B = ^
o\
/3 0
1)
be given where T > 0 , a > 0 , / 3 > 0 and 7 > 0 are constants. It is assumed that u G Loo([0,T]). Show that this system satisfies the Hautus condition.
240
Chapter 8. Direct Treatment of Special Optimization Problems
8.5) For the linear system in 8.4) assume in addition that the terminal time T is sufficiently large. Moreover, let the initial condition x{0) = x^ with x^ G R^ and the terminal condition x{f) = OM4 be given. For the control u we assume ll^llLoo([0,f]) ^ 1It can be proved with a known result from control theory that this system can be steered from x^ to 0^4 in finite time. Show then that a time minimal control exists which is unique, and give a characterization of this time minimal control.
Appendix A Weak Convergence Definition A . l . Let {X, \\ • ) be a normed space. A sequence (^n)neN of elements of X is called weakly convergent to some x e X if for all continuous linear functional I on X lim l{xn) = l{x). n—>oo
In this case x is called a weak limit of the sequence {xn)neN' In a finite dimensional normed space a sequence is weakly convergent if and only if it is convergent. In an arbitrary normed space every convergent sequence is also weakly convergent; the converse statement does not hold in general. Example A.2. Consider the Hilbert space I2 of all real sequences X == (x^)ieN (x')^eN with with Yl x^p < 00. In this linear space we investigate the 2=1
special sequence x i : ^ (1,0,0,0,...), X2:= (0,1,0,0,...), x s : ^ (0,0,1,0,...),
and so on. This sequence converges weakly to O/2 because for each
242
Appendix A. Weak Convergence
continuous linear functional / on I2 there is a y E I2 with l{x) — {y, x) for all x Eh so that lim l{xn) = lim {y^x^) = lim y'^ = 0. n—>oo
n—>oo
n—>oo
On the other hand the sequence {xn)nen does not converge to O/2 because
l^nll
^{x\^Y
\ Y^nt^n) \
= 1 for all
neN.
i=l
Definition A.3. Let (X,  • ) be a normed space. A nonempty subset S' of X is called weakly sequentially closed if for every weakly convergent sequence in S the weak limit also belongs to S. Every weakly sequentially closed subset of a normed space is also closed (because every convergent sequence converges weakly to the same limit). The converse statement is not true in general. But every nonempty convex closed subset of a normed space is also weakly sequentially closed. Definition A.4. Let (X,  • ) be a normed space. A nonempty subset 5 of X is called weakly sequentially compact if every sequence in S contains a weakly convergent subsequence whose weak limit belongs to 5. A nonempty subset of a normed space is weakly sequentially compact if and only if it is weakly compact (i.e. compact with respect to the weak topology). In a finite dimensional normed space a nonempty subset is weakly sequentially compact if and only if it is closed and bounded.
Appendix B Reflexivity of Banach Spaces Definition B . l . space.
A complete normed space is called a Banach
Using a James theorem (e.g., compare [168, § 19]) a sufficient condition for the weak sequence compactness of a nonempty subset of a real Banach space can be given. Theorem B.2. Let S be a nonempty convex bounded closed subset of a real Banach space. If every continuous linear functional attains its supremum on S^ then the set S is weakly sequentially compact. Reflexive normed spaces are special Banach spaces. In specialist literature a normed linear space (X,  • ) is called reflexive if the canonical embedding of X into X** is surjective — but here we use a known characterization for the definition of this notion. Definition B.3. A Banach space (X,  • ) is called reflexive if the closed unit ball {x G X  x < 1} is weakly sequentially compact. Every finite dimensional normed space is reflexive. For instance, the linear space l/i[0,1] of Lebesgue integrable realvalued functions on [0,1] is a Banach space, but it is not reflexive.
244
Appendix B. Reflexivity of Banach Spaces
In a reflexive Banach space a simple sufficient condition for the weak sequence compactness of a nonempty subset can be given (for instance, compare [347, Cor. 6.1.9]). Theorem B.4. Every nonempty convex bounded closed subset of a reflexive Banach space is weakly sequentially compact. Notice that in a finite dimensional normed space the assumption of convexity can be dropped.
Appendix C HahnBanach Theorem The following theorem is also called a basic version of the HahnBanach theorem (for a proof, for instance, compare [181, Thm. 3.8]). Theorem C.l. Let X be a real linear space. For every sublinear functional / : X —> R there is a linear functional I on X with l{x) < f{x)
for allx e X
(see Fig. C.l). /N
/
X
Figure C.l: Illustration of the result of Thm. C.l.
Besides this basic version there are further versions of the HahnBanach theorem. The following Eidelheit separation theorem can be
246
Appendix C. HahnBanach Theorem
deduced from Theorem C.l (for a proof see [181, Thm. 3.16]). Theorem C.2. Let S and T he nonempty convex subsets of a real topological linear space X with int{S) ^ 0. Then we have int[S) DT = 0 if and only if there are a continuous linear functional / G X* \ {Ox*} and a real number 7 with l{s) < 7 < l{t) for allseS
and
allteT
and l{s) < 7 for all s G int{S) (see Fig. C.2).
Figure C.2: Illustration of the result of Thm. C.2.
The following separation theorem can be obtained from the preceding theorem. Theorem C.3. Let S be a nonempty convex and closed subset of a real locally convex space X. Then we have x E X\S if and only if there is a continuous linear functional I G X* \ {Ox*} u)ith l{x) <mil{s).
(C.l)
Appendix C. HahnBanach Theorem
247
Proof. (a) Let any x E X he given. If there is a continuous linear functional / G X* \ {Ox*} with the property (C.l), then it follows immediately x ^ S, (b) Choose an arbitrary element x E X \ S. Since S is closed, there is a convex neighborhood N of x with N H S = 0. By the Eidelheit separation theorem (Thm. C.2) there are a continuous linear functional I E X* \ {Ox*} and a real number 7 with l{x) < 7 < l{s) for all s e S. The inequality (C.l) follows directly from the previous inequality. D
The next result is a special version of the HahnBanach theorem deduced by the Eidelheit separation theorem. Theorem C.4, Let (X,  • \\x) be a real normed space. For every X e X there is an I E X* with \\l\\x* = 1 ct'i^d l{x) = ^xProof. For x — Ox the assertion is evident. Therefore assume in the following that any a; 7^ Ox is arbitrarily given. Let S denote the closed ball around zero with the radius a;, and letT \= {x}. Because of int(S') n T = 0 by the Eidelheit separation theorem (Thm. C.2) there are an TG X* \ {Ox*} and a 7 G R with /"(s) < 7 < l{x) for all
SES,
If we define / := mr7^5 we have /x* = 1 and l{s) < l{x) for all
SES.
Then we get \\x\\x =
IklU sup \l{y)\ \\y\\x
248
Appendix C. HahnBanach Theorem
sup \l{\\x IUy)
\\y\\x
=
sup /(5)1 ses
=
SUp/(5)
0 for all x e C}
is called the dual cone for C (here X' denotes the algebraical dual space of X). With the aid of the dual cone C a partial ordering is described on the dual space X\ In the case of a real normed space (X,  • ) the dual cone in the topological dual space X* is denoted by C*.
Bibliography [1] J. Abadie, Nonlinear programming (NorthHolland, Amsterdam, 1967). [2] J, Ahsidie^ Integer and nonlinear programming (NorthHolland, Amsterdam, 1970). [3] P.R. Adby and M.A. Dempster, Introduction to optimization methods (Chapman and Hall, London, 1974). [4] D. Alevras and M.W. Padberg, Linear optimization and extensions  problems and solutions (Springer, Berlin, 2001). [5] W. Alt, Nichtlineare Optimierung  Fine Einfiihrung in Theorie, Verfahren und Anwendungen (Vieweg, 2002). [6] E.J. Anderson and P. Nash, Linear programming in infinitedimensional spaces (Wiley, Chichester, 1987). [7] T.S. Angell and A. Kirsch, Optimization methods in electromagnetic radiation (Springer, New York, 2004). [8] J.S. Arora, Introduction to optimum design (McGrawHill, New York, 1989). [9] K.J. Arrow and L. Hurwicz, Studies in linear and nonlinear programming (Stanford University Press, Stanford, 1958). [10] J.P. Aubin, Analyse convexe et ses applications (Springer, Berlin, 1974). [11] J.P. Aubin, Applied abstract analysis (Wiley, New York, 1977). [12] J.P. Aubin, Explicit methods of optimization (gauthiervillars, Paris, 1984). [13] J.P. Aubin, Optima and equilibra  an introduction to nonlinear analysis (Springer, Berlin, 2002). [14] J.P. Aubin, P. Nepomiastchy and A.M. Charles, Methodes explicites de I'optimisation (Dunod, Paris, 1982).
254
Bibliography
[15] A.V. Balakrishnan, Introduction to optimization theory in a Hilbert space (Springer, Berlin, 1971). [16] V.K. Balakrishnan, Network optimization (Chapman & Hall, New York, 1994). [17] R. Baldick, Applied optimization  formulation and algorithms for engineering systems (Cambridge University Press, 2006). [18] B. Bank, J. Guddat, D. Klatte, B. Kummer and K. Tammer, Nonlinear parametric optimization (Birkhauser, Basel, 1983). [19] V. Barbu, Mathematical methods in optimization of differential systems (Kluwer, Dordrecht, 1994). [20] V. Barbu and T. Precupanu, Convexity and optimization in Banach spaces (Editura Acad., Bucaresti, 1986). [21] J.F. Bard, Practical bilevel optimization  algorithms and applications (Springer, 1999). [22] M. BartholomewBiggs, Nonlinear optimization with financial applications (Springer, 2005). [23] M.S. Bazaraa and CM. Shetty, Foundations of optimization (Springer, Berhn, 1976). [24] M.S. Bazaraa and CM. Shetty, A^on/mearpro^'rammm^' (Wiley, New York, 1979). [25] E.M.L. Beale, Introduction to optimization (Wiley, Chichester, 1988). [26] E.J. Beltrami, An algorithmic approach to nonlinear analysis and optimization (Academic Press, New York, 1970). [27] M.P. Bends0e and O. Sigmund, Topology optimization  theory, methods J and applications (Springer, Berlin, 2004). [28] A. BenIsrael, A. BenTal and S. Zlobec, Optimality in nonlinear programming (Wiley, New York, 1981). [29] H. Benker, Mathematische Optimierung mit Computeralgebrasystemen  Einfuhrung filr Ingenieure, Naturwissenschaftler und Wirtschaftswissenschaftler unter Anwendung von Mathematical Maple, Mathcad, Matlab und Excel (Springer, 2003). [30] A. BenTal and A. Nemirovski, Lectures on modern convex optimization  analysis, algorithms, and engineering applications (SIAM, Philadelphia, 2001).
Bibliography
255
[31] A. Berman, Cones, matrices and mathematical programming (Springer, Berlin, 1973). [32] D.P. Bertsekas, Constrained optimization and Lagrange multiplier methods (Athena Scientific, Belmont, 1996). [33] D.P. Bertsekas, Nonlinear programming (Athena Scientific, Belmont, 2004). [34] D.P. Bertsekas with A. Nedic and A.E. Ozdaglar, Convex analysis and optimization (Athena Scientific, Belmont, 2003). [35] J.T. Betts, Practical methods for optimal control using nonlinear programming (SIAM, 2001). [36] G.S. Beveridge and R.S. Schechter, Optimization (McGrawHill, London, 1970). [37] M.A. Bhatti, Practical optimization methods  with Mathematica applications (Springer, New York, 2000). [38] E. Blum and W. Oettli, Mathematische Optimierung (Springer, Berlin, 1975). [39] A.I. Bojarinov and V.V. Kafarov, Optimierungsmethoden in der chemischen Technologic (Verlag Chemie, Weinheim, 1972). [40] V.G. Boltjanskij, Mathematische Methoden der optimalen Steuerung (Hanser, Miinchen, 1972). [41] J.F. Bonnans, J.C. Gilbert, C. Lemarechal and C.A. Sagastizabal. Numerical optimization  theoretical and practical aspects (Springer, Berlin, 2003). [42] J.F. Bonnans and A. Shapiro, Perturbation analysis of optimization problems (Springer, New York, 2000). [43] K.H. Borgwardt, Optimierung, Operations Research, Spieltheorie  Mathematische Grundlagen (Birkhauser, 2001). [44] J.M. Borwein and A.S. Lewis, Convex analysis and nonlinear optimization  theory and examples (Springer, New York, 2006). [45] M.J. Box, D. Davies and W.H. Swann, Nonlinear optimization techniques (Oliver &: Boyd, Edinburgh, 1969). [46] S. Boyd and L. Vandenberghe, Convex optimization (Cambridge University Press, Cambridge, 2004). [47] J. Bracken and G.P. McCormick, Selected applications of nonlinear programming (Wiley, New York, 1968).
256
Bibliography
[48] S.J. Britvec, Stability and optimization of flexible space structures (Birkhauser, Boston, 1995). [49] R.W. Brockett, Finite dimensional linear systems (Wiley, New York, 1970). [50] B. Brosowski, Parametric semiinfinite optimization (Lang, Frankfurt, 1982). [51] D. Bucur and G. Buttazzo, Variational methods in shape optimization problems (Birkhauser, Boston, 2005). [52] R. Bulirsch, A. Miele, J. Stoer and K.H. Well, Optimal control (Birkhauser, Basel, 1992). [53] B.D. Bunday, Basic optimisation methods (Arnold, London, 1985). [54] E.R. Caianiello, Functional analysis and optimization (Academic Press, New York, 1966). [55] N. Cameron, Introduction to linear and convex programming (Cambridge University Press, Cambridge, 1985). [56] M.D. Canon, C D . CuUum and E. Polak, Theory of optimal control and mathematical programming (McGrawHill, New York, 1970). [57] K.W. Cattermole, Optimierung in der Nachrichtentechnik (Verlag Chemie, Weinheim, 1990). [58] J. Cea, Optimisation (Dunod, Paris, 1971). [59] J. Cea, Lectures on optimization (Springer, Berlin, 1978). [60] Y. Censor and S.A. Zenios, Parallel optimization  theory, algorithms, and applications (Oxford University Press, New York, 1997). [61] L. Cesari, Optimization, theory and applications (Springer, New York, 1983). [62] G.y. Chen, X. Huang and X. Yang, Vector optimization  setvalued and variational analysis (Springer, 2005). [63] E.K.P. Chong and S.H. Zak, An introduction to optimization (Wiley, 2001). [64] V. Chvatal, Linear programming (Freeman, New York, 1983). [65] P.G. Ciarlet, Introduction to numerical linear algebra and optimisation (Cambridge University Press, Cambridge, 1989).
Bibliography
257
[66] S.J. Citron, Elements of optimal control (Holt, New York, 1969). [67] F.H. Clarke, Methods of dynamic and nonsmooth optimization (SIAM, Philadelphia, 1989). [68] F.H. Clarke, Optimization and nonsmooth analysis (SIAM, 1990). [69] F. Clarke, Necessary conditions in dynamic optimization (AMS, Providence, 2005). [70] T.F. Coleman, Large sparse numerical optimization (Springer, Berlin, 1984). [71] L. CoUatz and W. Wetterling, Optimierungsaufgahen (Springer, Berlin, 1971). [72] L. CoUatz and W. Wetterling, Optimization problems (Springer, Heidelberg, 1975). [73] Y. Collette and P. Siarry, Multiobjective optimization  principles and case studies (Springer, Berlin, 2004). [74] A.R. Conn, N.I.M. Gould and P.L. Toint, Trustregion methods (SIAM, Philadelphia, 2000). [75] G. Cornuejols and R. Tutuncu, Optimization methods in finance (Cambridge University Press, 2007). [76] B.D. Craven, Mathematical programming and control theory (Chapman and Hall, London, 1978). [77] B.D. Craven, Fractional programming (Heldermann, Berlin, 1988). [78] B.D. Craven, Control and optimization (Chapman & Hall, London, 1995). [79] R.F. Curtain and A.J. Pritchard, Functional analysis in modern applied mathematics (Academic Press, London, 1977). [80] R.F. Curtain and H.J. Zwart, An introduction to infinitedimensional linear systems theory (Springer, 1995). [81] T.R. Cuthbert, Optimization using personal computers (Wiley, New York, 1987). [82] J.W. Daniel, The approximate minimization of functional (PrenticeHall, Englewood Cliffs, 1971).
258
Bibliography
[83] R.W. Daniels, An introduction to numerical methods and optimization techniques (NorthHolland, New York, 1978). [84] S. Danoe, Nonlinear and dynamic programming (Springer, Wien, 1975). [85] R.B. Darst, Introduction to linear programming  applications and extensions (Dekker, New York). [86] K. Deb, Multiobjective optimization using evolutionary algorithms (Wiley, Chichester, 2004). [87] D.G. de Figueiredo, The Ekeland variational principle with applications and detours (Springer, Berlin, 1989). [88] V.F. Dem'yanov and A.M. Rubinov, Approximate methods in optimization problems (Elsevier, New York, 1970). [89] V.F. Demyanov and A.M. Rubinov, Constructive nonsmooth analysis (Lang, Frankfurt, 1995). [90] V.F. Demyanov, G.E. Stavroulakis, L.N. Ployakova and P.D. Panagiotopoulos, Quasidifferentiability and nonsmooth modelling in mechanics, engineering and economics (Kluwer, Dordrecht, 1996). [91] V.F. Dem'yanov and L.V. VasiPev, Nondifferentiable optimization (Optimization Software, New York, 1985). [92] M.M. Denn, Optimization by variational methods (McGrawHill, New York, 1969). [93] J.B. Dennis, Mathematical programming and electrical networks (MIT Press, Cambridge, 1959). [94] J.E. Dennis and R.B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations (SIAM, Philadelphia, 1996). [95] W. Dinkelbach, Sensitivitdtsanalysen und parametrische Programmierung (Springer, Berlin, 1969). [96] W. Dinkelbach, Entscheidungsmodelle (de Gruyter, Berlin, 1982). [97] S. Dischinger, Pivotauswahlverfahren in der Linearen Programmierung (Lang, Frankfurt, 1995). [98] U. Diwekar, Introduction to applied optimization (Springer, 2003).
Bibliography
259
[99] L.C.Dixon^ Numerical optimisation of dynamic systems {NorthHoUand, Amsterdam, 1980). [100] A. Dontchev and T. Zolezzi, Wellposed optimization problems (Springer, Berlin, 1993). [101] D.Z. Du, P.M. Pardalos and W. Wu, Mathematical theory of optimization (Springer, 2001). [102] R.J. Duffin, E.L. Peterson and C. Zener, Geometric programming^ theory and application (Wiley, New York, 1967). [103] P. Dyer and S.R. McReynolds, The computation and theory of optimal control (Academic Press, New York, 1970). [104] T.F. Edgar and D.M. Himmelblau, Optimization of chemical processes (McGrawHill, New York, 1988). [105] M. Ehrgott, Multicriteria optimization (Springer, Berlin, 2005). [106] H.A. Eiselt, G. Pederzoli and C.L. Sandblom, Continuous optimization models (de Gruyter, Berlin, 1987). [107] I. Ekeland and R. Temam, Convex analysis and variational problems (SIAM, 1999). [108] I. Ekeland and T. TurnbuU, Infinitedimensional optimization and convexity (University of Chicago Press, Chicago, 1983). [109] K.H. Elster, Nichtlineare Optimierung (Teubner, Leipzig, 1978). [110] K.H. Elster, R. Reinhardt, M. Schauble and G. Donath, Einfiihrung in die nichtlineare Optimierung (Teubner, Leipzig, 1977). [Ill] V.W. Eveleigh, Adaptive control and optimization techniques (McGrawHill, New York, 1967). [112] B.S. Everitt, Introduction to optimization methods and their application in statistics (Chapman and Hall, London, 1987). [113] J.G. Evtusenko, Numerical optimization techniques (Optimization Software, New York, 1985). [114] H.O. Fattorini, Infinite dimensional optimization and control theory (Cambridge University Press, Cambridge, 1999). [115] A.V. Fiacco and G.P. McCormick, Nonlinear programming (Wiley, New York, 1968).
260
Bibliography
[116] R. Fletcher, Practical methods of optimization (Wiley, Chichester, 2003). [117] M. Florenzano and C. Le Van, Finite dimensional convexity and optimization (Springer, 2001). [118] C.A. Floudas, Deterministic global optimization  theory, methods and applications (Kluwer, Dordrecht, 2000). [119] C.A. Floudas and P.M. Pardalos, A collection of test problems for constrained global optimization algorithms (Springer, Berlin, 1990). [120] L.R. Foulds, Optimization techniques (Springer, New York, 1981). [121] R.L. Fox, Optimization methods for engineering design (AddisonWesley, Reading, 1971). [122] W. Frank, Mathematische Grundlagen der Optimierung (Oldenbourg, Miinchen, 1969). [123] H. Frankel, Diskrete optimale Steuerungsprobleme und konvexe Optimierung (de Gruyter, Berlin, 1971). [124] A. Fromm, Nichtlineare Optimierungsmodelle (Deutsch, Frankfurt, 1975). [125] R.H. Gallagher, Optimum structural design (Wiley, Chichester, 1977). [126] C. Geiger and C. Kanzow, Numerische Verfahren zur Losung unrestringierter Optimierungsaufgaben (Springer, 1999). [127] C. Geiger and C. Kanzow, Theorie und Numerik restringierter Optimierungsaufgaben (Springer, 2002). [128] A.M. Geoffrion, Perspectives on optimization (AddisonWesley, Reading, 1972). [129] P. Gessner and K. Spremann, Optimierung in Funktionenrdumen (Springer, Berlin, 1972). [130] F. Giannessi, Constrained optimization and image space analysis  volume 1: separation of sets and optimality conditions (Springer, 2005). [131] P.E. Gill, Numerical methods for constrained optimization (Academic Press, London, 1974).
Bibliography
261
[132 RE. Gill, W. Murray and M.H. Wright, Practical optimization (Academic Press, London, 1981). [133 G. Giorgi, A. Guerraggio and J. Thierfelder, Mathematics of optimization: smooth and nonsmooth case (Elsevier, 2004). [134 I.V. Girsanov, Lectures on mathematical theory of extremum problems (Springer, Berlin, 1972). [135 K. Glashoff and S.A. Gustafson, Linear optimization and approximation (Springer, New York, 1983). [136; F. Glover, N. Phillips and D. Khngman, Network models in optimization and their applications in practice (Wiley, New York, 1992). [137; A. Gopfert, Mathematische Optimierung in allgemeinen Vektorrdumen (Teubner, Leipzig, 1973). [138; A. Gopfert, L. Bittner, K.H. Elster, F. Nozicka, J. Piehler and R. Tichatschke (eds.), Lexikon der Optimierung (AkademieVerlag, Berlin, 1986). [139 A. Gopfert and R. Nehse, Vektoroptimierung  Theorie, Verfahren und Anwendungen (Teubner, Leipzig, 1990). [i4o; A. Gopfert, H. Riahi, C. Tammer and C. Zalinescu, Variational methods in partially ordered spaces (Springer, 2003). [141 B.S. Gottfried and J. Weismann, Introduction to optimization theory (PrenticeHall, Englewood Cliffs, 1973). [142; C. Grossmann and A.A. Kaplan, Strafmethoden und modifizierte Lagrangefunktionen in der nichtlinearen Optimierung (Teubner, Leipzig, 1979). [143 C. Grossmann and H. Kleinmichel, Verfahren der nichtlinearen Optimierung (Teubner, Leipzig, 1976). [144 C. GroBmann and J. Terno, Numerik der Optimierung (Teubner, Stuttgart, 1993). [145 W.A. Gruver and E. Sachs, Algorithmic methods in optimal control (Pitman, Boston, 1980). [146; J. Guddat, F.G. Vasquez and H.Th. Jongen, Parametric optimization: singularities, pathfollowing and jumps (Wiley, Chichester, 1990).
262
Bibliography
[147] J. Guddat, F.G. Vasquez, K. Tammer and K. Wendler, Multiobjective and stochastic optimization based on parametric optimization (AkademieVerlag, Berlin, 1985). [148] I. Gumowski and C. Mira, Optimization in control theory and practice (Cambridge University Press, Cambridge, 1968). [149] G. Hadley, Nichtlineare und dynamische Programmierung (PhysicaVerlag, Wiirzburg, 1969). [150] H.W. Hamacher, Mathematische Losungsverfahren fiir planare Standortprobleme (Vieweg, Wiesbaden, 1995). [151] H.W. Hamacher and K. Klamroth, Lineare und NetzwerkOptimierung  ein bilinguales Lehrbuch (Vieweg, Braunschweig, 2000). [152] R. Hartley, Linear and nonlinear programming (Harwood, Chichester, 1985). [153] L. Hasdorff, Gradient optimization and nonlinear control (Wiley, New York, 1976). [154] J. Haslinger and R.A.E. Makinen, Introduction to shape optimization  theoryJ approximation^ and computation (SI AM, Philadelphia, 2003). [155] S. Helbig, Parametrische semiinfinite Optimierung in totalgeordneten Gruppen (Fischer, Frankfurt, 1987). [156] U. Helmke and J.B. Moore, Optimization and dynamical systems (Springer, Berlin, 1994). [157] W.S. Hemp, Optimum structures (Clarendon Press, Oxford, 1973). [158] M.R. Hestenes, Optimization theory (Wiley, New York, 1975). [159] M.R. Hestenes, Conjugate direction methods in optimization (Springer, Berlin, 1980). [160] R. Hettich and P. Zencke, Numerische Methoden der Approximation und semiinfiniten Optimierung (Teubner, Stuttgart, 1982). [161] C. Hillermeier, Nonlinear multiobjective optimization  a generalized homotopy approach (Birkhauser, 2001). [162] D.M. Himmelblau, Applied nonlinear programming (McGrawHill, New York, 1972).
Bibliography
263
[163] J.B. HiriartUrruty and C. Lemarechal, Convex analysis and minimization algorithms^ Part 1 and 2 (Springer, New York, 1993). [164] W. Hock and K. Schittkowski, Test examples for nonlinear programming codes (Springer, Berlin, 1981). [165] E. Hofer and R. Lunderstadt, Numerische Methoden der Optimierung (Oldenbourg, Miinchen, 1975). [166] U. Hoffmann and H. Hofmann, Einfiihrung in die Optimierung (Verlag Chemie, Weinheim, 1971). [167] R.B. Holmes, A course on optimization and best approximation (Springer, Berlin, 1972). [168] R.B. Holmes, Geometric functional analysis and its applications (Springer, New York, 1975). [169] C.S. Hong and Z. Quan, Integral global optimization (Springer, Berlin, 1988). [170] R. Horst, Nichtlineare Optimierung (Hanser, Miinchen, 1979). [171] R. Horst, RM. Pardalos and N. V. Thoai, Introduction to global optimization (Springer, 2001). [172] R. Horst and H. Tuy, Global optimization  deterministic approaches (Springer, 1996). [173] L. Huang, Secondorder directional derivatives in nonsmooth optimization (Chinese University Press, Hong Kong, 1995). [174] R Hupfer, Optimierung von Baukonstruktionen (Teubner, Stuttgart, 1970). [175] M.D. Intriligator, Mathematical optimization and economic theory (SIAM, 2002). [176] A.D. loffe and V.M. Tichomirov, Theorie der Extremalaufgaben (Deutscher Verlag der Wissenschaften, Berlin, 1979). [177] G. Isac, Complementarity problems (Springer, 1992). [178] G. Isac and V. Postolica, The best approximation and optimization in locally convex spaces (Lang, Frankfurt, 1993). [179] S.L. Jacoby, J.S. Kowalik and J.T. Pizzo, Iterative methods for nonlinear optimization problems (PrenticeHall, Englewood Cliffs, 1972).
264
Bibliography
[180] J. Jahn, Mathematical vector optimization in partially ordered linear spaces (Lang, Frankfurt, 1986). [181] J. Jahn, Vector optimization  theory, applications, and extensions (Springer, Berlin, 2004). [182] B. Jansen, Interior point techniques in optimization  complementarity, sensitivity and algorithms (Kluwer, Dordrecht, 1997). [183] F. Jarre and J. Stoer, Optimierung (Springer, 2004). [184] M.W. Jeter, Mathematical programming (Dekker, New York, 1986). [185] H.T. Jongen, P. Jonker and F. Twilt, Nonlinear optimization in finite dimensions  Morse theory, Chebyshev approximation, transversality, flows, parametric aspects (Springer, 2001). [186] H.T. Jongen, K. Meer and E. Triesch, Optimization theory (Springer, 2004). [187] I. Kaliszewski, Quantitative Pareto analysis by cone separation technique (Kluwer, Boston, 1994). [188] E.L. Kaplan, Mathematical programming and games (Wiley, New York, 1982). [189] M.H.Karwan, Redundancy in mathematical programming{Spvinger, Berlin, 1983). [190] C.T. Kelly, Iterative methods for linear and nonlinear equations (SIAM, Philadelphia, 1995). [191] C.T. Kelley, Iterative methods for optimization (SIAM, Philadelphia, 1999). [192] A. Kirsch, W. Warth and J. Werner, Notwendige Optimalitdtsbedingungen und ihre Anwendung (Springer, Berhn, 1978). [193] K.P. Kistner, Optimierungsmethoden (Physica, Heidelberg, 1988). [194] K.C. Kiwiel, Methods of descent for nondifferentiable optimization (Springer, Berlin, 1985). [195] D. Klatte and B. Kummer, Nonsmooth equations in optimization  regularity, calculus, methods and applications (Springer, 2002).
Bibliography
265
[196] B. Kolman and R.E. Beck, Elementary linear programming with applications (Academic Press, 1995). [197] D. Koo, Elements of optimization (Springer, Berlin, 1977). [198] P. Kosmol, Methoden zur numerischen Behandlung nichtlinearer Gleichungen und Optimierungsaufgaben (Teubner, Stuttgart, 1989), [199] P. Kosmol, Optimierung und Approximation (de Gruyter, Berlin, 1991). [200] J.S. Kowalik and M.R. Osborne, Methods for unconstrained optimization problems (Elsevier, New York, 1968). [201] W. Krabs, Optimierung und Approximation (Teubner, Stuttgart, 1975). [202] W. Krabs, Einfilhrung in die Kontrolltheorie (Wissenschaftliche Buchgesellschaft, Darmstadt, 1978). [203] W. Krabs, Optimization and approximation (Wiley, Chichester, 1979). [204] W. Krabs, Einfilhrung in die lineare und nichtlineare Optimierung filr Ingenieure (Teubner, Stuttgart, 1983). [205] W. Krabs, On moment theory and controllability of onedimensional vibrating systems and heating processes (Springer, Berlin, 1992). [206] W. Krabs, Optimal control of undamped linear vibrations (Heldermann, Lemgo, 1995). [207] W. Krabs and S.T. Pickl, Analysis, controllability and optimization of timediscrete systems and dynamical games (Springer, Berlin, 2003). [208] B. Kreko, Optimierung (Deutscher Verlag der Wissenschaften, Berlin, 1974). [209] H.P. Kuenzi and W. Krelle, NichtlineareProgrammierung{Spvmger, Berlin, 1962). [210] H.P. Kuenzi, W. Krelle and R. von Randow, Nichtlineare Programmierung (Springer, Berlin, 1979). [211] H.P. Kuenzi and W. Oettli, Nichtlineare Optimierung (Springer, Berlin, 1969).
266
Bibliography
[212] H.P. Kuenzi, H.G. Tzschach and C.A. Zehnder, Numerische Methoden der mathematischert Optimierung mit ALGOL und FORTRANProgrammen (Teubner, Stuttgart, 1967). [213] H.P. Kuenzi, H.G. Tzschach and C.A. Zehnder, Numerical methods of mathematical optimization (Academic Press, New York, 1971). [214] J.L. Kuester and J.H. Mize, Optimization techniques with Fortran (McGrawHill, New York, 1973). [215] A.H. Land and S. Powell, Fortran codes for mathematical programming (Wiley, London, 1973). [216] K. Lange, Optimization (Springer, 2004). [217] E. Laporte and P. Le Tallec, Numerical methods in sensitivity analysis and shape optimization (Birkhauser, 2002). [218] L.S.LdiSdon^ Optimization theory for large systems (Macmillan, London, 1970). [219] P.J. Laurent, Approximation et optimisation (Herman, Paris, 1972). [220] E.B. Lee and L. Markus, Foundations of optimal control theory (Wiley, New York, 1967). [221] G. Leitmann, Topics in optimization (Academic Press, New York, 1967). [222] G. Leitmann, Einfiihrung in die Theorie optimaler Steuerung und der Differentialspiele (Oldenbourg, Miinchen, 1974). [223] E.S. Levitin, Perturbation theory in mathematical programming and its applications (Wiley, Chichester, 1994). [224] K. Littger, Optimierung (Springer, Berlin, 1992). [225] G. Lorenzen, Parametrische Optimierungen und einige Anwendungen (Oldenbourg, Miinchen, 1974). [226] D.T. Luc, Theory of vector optimization (Springer, Berhn, 1989). [227] D.G. Luenberger, Optimization by vector space methods (Wiley, New York, 1998). [228] D.G. Luenberger, Introduction to linear and nonlinear programming (AddisonWesley, Reading, 1973).
Bibliography
267
[229] D.G.Luenheiger, Linear and nonlinear programming (AddisonWesley, Reading, 1984). [230] K. Malanowski, Stability of solutions to convex problems of optimization (Springer, Berlin, 1987). [231] O.L. Mangasarian, Nonlinear programming (SIAM, 1994). [232] K. Marti, Descent directions and efficient solutions in discretely distributed stochastic programs (Springer, Berlin, 1988). [233] K. Marti, Stochastic optimization methods (Springer, Berlin, 2005). [234] K. Marti and D. Groger, Einfiihrung in die lineare und nichtlineare Optimierung (Physica, 2000). [235] B. Martos, Nonlinear programming theory and methods (NorthHolland, Amsterdam, 1975). [236] A. MarzoUo, Periodic optimization (Springer, Wien, 1972). [237] A. MarzoUo, Controllability and optimization (Springer, Wien, 1972). [238] G.P. McCormick, Nonlinear programming (Wiley, New York, 1983). [239] C. McMillan, Mathematical programming (Wiley, New York, 1970). [240] C.W. Merriam, Optimization theory and the design of feedback control systems (McGrawHill, New York, 1964). [241] K. Miettinen, Nonlinear multiobjective optimization (Kluwer, Boston, 2004). [242] R.E. Miller, Optimization  foundations and applications (Wiley, New York, 2000). [243] M. Minoux, Programmation mathematique (Dunod, Paris, 1983). [244] M. Minoux, Mathematical programming (Wiley, Chichester, 1986). [245] K.V. Mital, Optimization methods in operations research and systems analysis (Wiley, New Delhi, 1976). [246] K.V. Mital, Optimization methods (Wiley, New Delhi, 1977). [247] G. Mitra, Theory and application of mathematical programming (Academic Press, London, 1976).
268
Bibliography
[248] J. Mockus, Bayesian approach to global optimization  theory and applications (Kluwer, Dordrecht, 1989). [249] B.S. Mordukhovich, Variational analysis and generalized differentiation I  basic theory (Springer, Berlin, 2006). [250] B.S. Mordukhovich, Variational analysis and generalized differentiation II  applications (Springer, Berlin, 2006). [251] J.J. More and S.J. Wright, Optimization software guide (SIAM, Philadelphia, 1993). [252] I.H. Mufti, Computational methods in optimal control problems (Springer, Berlin, 1970). [253] W.A. Murray, Numerical methods for unconstrained optimization (Academic Press, London, 1972). [254] K.G. Murty, Linear Programming (Wiley, New York, 1983). [255] H. Nakayama and T. Tanino, Theory and applications of multiobjective programming (in Japanese) (Society of Instrument and Control Engineers, Tokyo, 1994). [256] J.L. Nazareth, Differentiable optimization and equation solving  a treatise on algorithmic science and the Karmarkar revolution (Springer, New York, 2003). [257] J.L. Nazareth, An optimization primer  on models^ algorithms, and duality (Springer, 2004). [258] Y. Nesterov, Introductory lectures on convex optimization  a basic course (Springer, 2004). [259] Y. Nesterov and A. Nemirovskii, Interiorpoint polynomial algorithms in convex programming (SIAM, Philadelphia, 1994). [260] L.W. Neustadt, Optimization (Princeton University Press, Princeton, 1976). [261] J. Nocedal and S.J. Wright, Numerical optimization (Springer, New York, 1999). [262] T. Okabe, Evolutionary multiobjective optimization  on the distribution of offspring in parameter and fitness space (Shaker, Aachen, 2004). [263] G.M. Ostrovskij and J.M. Volin, Methoden zur Optimierung chemischer Reaktoren (Akademie Verlag, Berlin, 1973).
Bibliography
269
[264] A. Osyczka, Evolutionary algorithms for single and multicriteria design optimization (Physica, Heidelberg, 2002). [265] J. Outrata, M. Kocvara and J. Zowe, Nonsmooth approach to optimization problems with equilibrium constraints  theory, applications and numerical results (Kluwer, Dordrecht, 1998). [266] M. Padberg, Linear optimization and extensions (Springer, Berlin, 1999). [267] D. Pallaschke and S. Rolewicz, Foundations of mathematical optimization  convex analysis without linearity (Kluwer, Dordrecht, 1997). [268] M.J. Panik, Classical optimization (NorthHolland, Amsterdam, 1976). [269] M. Papageorgiou, Optimierung (Oldenbourg, Miinchen, 1991). [270] P.M. Pardalos and J.B. Rosen, Constrained global optimization (Springer, Berlin, 1987). [271] P. Pedregal, Introduction to optimization (Springer, New York, 2004). [272] A.L. Peressini, F.E. Sullivan and J.J. Uhl, The mathematics of nonlinear programming (Springer, New York, 1988). [273] G.C. Pflug, Optimization of stochastic models  the interface between simulation and optimization (Kluwer, Boston, 1996). [274] R.R. Phelps, Convex functions, monotone operators and differentiability (Springer, Berlin, 1993). [275] D.A. Pierre, Optimization theory with applications (Wiley, New York, 1969). [276] D.A. Pierre and M.J. Lowe, Mathematical programming via augmented lagrangians (AddisonWesley, Reading, 1975). [277] E. Polak, Computational methods in optimization (Academic Press, New York, 1971). [278] E. Polak, Optimization  algorithms and consistent approximations (Springer, New York, 1997). [279] J.P, Ponstein, Approaches to the theory of optimization (Cambridge University Press, 2004). [280] W. Prager, Introduction to structural optimization (Springer, Wien, 1972).
270
Bibliography
[281] B.N. Psenicnyj, Notwendige Optimalitdtsbedingungen (Oldenbourg, Miinchen, 1972). [282] B.N. Psenicnyj and J.M. Danilin, Numerische Methoden fur Extremalaufgaben (Deutscher Verlag der Wissenschaften, Berlin, 1982). [283] B.N. Pshenichnyj, The linearization method for constraint optimization (Springer, Berlin, 1994). [284] L. Pun, Introduction to optimization practice (Wiley, New York, 1969). [285] J. Ramik and M. Vlach, Generalized concavity in fuzzy optimization and decision analysis (Springer, 2001). [286] S.S. Rao, Optimization (Wiley, New Delhi, 1978). [287] T. Rapcsak, Smooth nonlinear optimization in R^ (Kluwer, Dordrecht, 1997). [288] H. Ratschek and J. Rokne, New computer methods for global optimization (Ellis Horwood Limited, Chichester, 1988). [289] B.S. Razumikhin, Physical models and equilibrium methods in programming and economics (Reidel, Dordrecht, 1984). [290] G.V. Reklaitis, K.M. Ravindran and A. Raysdell, Engineering optimization (Wiley, New York, 1983). [291] C. Richter, Optimierungsverfahren und BASICProgramme (AkademieVerlag, Berlin, 1988). [292] C. Richter, K. Schittkowski and R. Hettich, Sequential quadratic programming  theory and applications (AkademieVerlag, Berlin, 1993). [293] K.J. Richter, Methoden der Optimierung (Fachbuchverlag, Leipzig, 1970). [294] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, 1970). [295] R.T. Rockafellar, Conjugate duality and optimization (SIAM, Philadelphia, 1974). [296] R.T. Rockafellar, The theory of subgradients and its applications to problems of optimization: convex and nonconvex functions (Heldermann, Berlin, 1981).
Bibliography
271
[297] R.T. Rockafellar, Network flows and monotropic optimization (Wiley, New York, 1984). [298] R.T. Rockafellar and R.J.B. Wets, Variational analysis (Springer, 2004). [299] C. Roos, T. Terlaky and J.Ph. Vial, Theory and algorithms for linear optimization  an interior point approach (Wiley, Chichester, 1997). [300] C. Roos, T. Terlaky and J.R Vial, Interior point methods for linear optimization (Springer, 2006). [301] A. Rubinov, Abstract convexity and global optimization (Springer, 2000). [302] A. Rubinov and X.q. Yang, Lagrangetype functions in constrained nonconvex optimization (Springer, 2003). [303] D.L. Russell, Optimization theory (Benjamin, New York, 1970). [304] B. Rustem, Algorithms for nonlinear programming and multipleobjective decisions (Wiley, Chichester, 1998). [305] M. Sakawa, Fuzzy sets and interactive multiobjective optimization (Plenum, New York, 1993). [306] H.J. Sander, Dualitdt bei Optimierungsaufgaben (Oldenbourg, Miinchen, 1973). [307] Y. Sawaragi, H. Nakayama and T. Tanino, Theory of multiobjective optimization (Academic Press, Orlando, 1985). [308] W. Schirotzek, Differenzierbare Extremalprobleme (Teubner, Leipzig, 1989). [309] K. Schittkowski, More test examples for nonlinear programming codes (Springer, Berlin, 1987). [310] R. Schmidt, Advances in nonlinear parameter optimization (Springer, Berlin, 1982). [311] E. Seiffart and K. Manteuffel, Lineare Optimierung (Teubner, 1991). [312] J.F. Shapiro, Mathematical programming (Wiley, New York, 1979). [313] H.D. Sherali, Optimization with disjunctive constraints (Springer, Berlin, 1980).
272
Bibliography
[314] N.Z. Shor, Minimization methods for nondifferentiable functions (Springer, Berlin, 1985). [315] N.Z. Shor, Nondifferentiable optimization and polynomial problems (Springer, 1998). [316] G. Sierksma, Linear and integer programming  theory and practice (Dekker, New York, 1996). [317] I. Singer, Duality for nonconvex approximation and optimization (Springer, New York, 2006). [318] S.M. Sinha, Mathematical programming  theory and methods (Elsevier, 2006). [319] M. Sniedovich, Dynamic programming (Dekker, New York). [320] J.A. Snyman, Practical mathematical optimization  an introduction to basic optimization theory and classical and new gradientbased algorithms (Springer, 2005). [321] P. Spellucci, Numerische Verfahren der nichtlinearen Optimierung (Birkhauser, Basel, 1993). [322] K. Stahl and N. Schulz, Mathematische Optimierung und mikrookonomische Theorie (Springer, Berlin, 1981). [323] J. Stoer and C. Witzgall, Convexity and optimization in finite dimensions (Springer, Berlin, 1970). [324] R.G. Strongin and Y.D. Sergeyev, Global optimization with nonconvex constraints  sequential and parallel algorithms (Springer, 2000). [325] W. Sun and Y.x. Yuan, Optimization theory and methods nonlinear programming (Springer, 2006). [326] R.K. Sundaram A first course in optimization theory (Cambridge University Press, 1996). [327] D. Sworder, Optimal adaptive control systems (Academic Press, New York, 1966). [328] M. Tawarmalani and N.V. Sahinidis, Convexification and global optimization in continuous and mixedinteger nonlinear programming  theory, algorithms, software, and appl ications (Springer, 2003). [329] V.M. Tichomirov, Fundamental principles of the theory of extremal problems (Wiley, Chichester, 1986).
Bibliography
273
[330] H. ToUe, Optimierungsverfahren fur Variationsaufgaben mit gewohnlichen Differentialgleichungen als Nebenbedingungen (Springer, Berlin, 1971). [331] H. ToUe, Optimization methods (Springer, Berlin, 1975). [332] A. Torn and A. Zilinskas, Global optimization (Springer, Berlin, 1989). [333] F. Troltzsch, Optimality conditions for parabolic control problems and applications (Teubner, Leipzig, 1984). [334] J.L. Troutman, Variational calculus and optimal control  optimization with elementary convexity (Springer, Berlin, 1995). [335] V. Tsurkov, Largescale optimization  problems and methods (Springer, 2001). [336] H. Tuy, Convex analysis and global optimization (Kluwer, Dordrecht, 1998). [337] R.J. Vanderbei, Linear programming  foundations and extensions (Kluwer, Boston, 1996). [338] G.N. Vanderplaats, Numerical optimization techniques for engineering design (McGrawHill, New York, 1984). [339] J. Varga, Praktische Optimierung (Oldenbourg, Miinchen, 1974). [340] R.S. Varga, Functional analysis and approximation theory in numerical analysis (SIAM, Philadelphia, 1991). [341] S.A. Vavasis, Nonlinear optimization: complexity issues (Oxford University Press, 1992). [342] T.L. Vincent and W.J. Grantham, Optimality in parametric systems (Wiley, New York, 1981). [343] W. Vogel, Vektoroptimierung in Produktrdumen (Hain, Meisenheim am Glan, 1977). [344] J. von Neumann and O. Morgenstern, Theory of games and economic behavior (Princeton University Press, Princeton, 1944). [345] M. Walk, Theory of duality in mathematical programming (Springer, Wien, 1989). [346] G.R. Walsh, Methods of optimization (Wiley, London, 1975). [347] J. Werner, Optimization  theory and applications (Vieweg, Braunschweig, 1984).
274
Bibliography
[348] R.J. Wets, Grundlagen konvexer Optimierung (Springer, Berlin, 1976). [349] D.J. White, Optimality and efficiency (Wiley, Chichester, 1982). [350] D.J. Wilde, Optimum seeking methods (PrenticeHall, Englewood Cliffs, 1964). [351] D.J. Wilde, Globally optimal design (Wiley, New York, 1978). [352] D.J. Wilde and C.S. Beightler, Foundations of optimization (PrenticeHall, Englewood CUffs, 1967). [353] H.P. Williams, Model building in mathematical programming (Wiley, Chichester, 1985). [354] D.A. Wismer, Optimization methods for largescale systems (McGrawHill, New York, 1971). [355] D.A. Wismer and R. Chattergy, Introduction to nonlinear optimization (NorthHolland, New York, 1978). [356] S.J. Wright, Primaldual interiorpoint methods (SIAM,1997). [357] Z.B. Zabinsky, Stochastic adaptive search for global optimization (Springer, 2003). [358] F. Zach, Technisches Optimieren (Springer, Wien, 1974). [359] R. Zielinski and P. Neumann, Stochastische Verfahren zur Suche nach dem Minimum einer Funktion (AkademieVerlag, Berlin, 1983). [360] G. Zoutendijk, Mathematical programming methods (NorthHolland, Amsterdam, 1976). [361] S.I. Zuchovickij and L.I. Avdeeva, Linear and convex programming (Saunders, Philadelphia, 1966).
Answers to t h e Exercises CHAPTER 2 2.1) Use Definition 2.1 and notice that in a finite dimensional normed space weak convergence is equivalent to norm convergence. 2.2) Show for the functions / i , /s : K ^ K with
I I
forallx>l J
and n f s _ j for all X < ~1 1 Moo)  I ^^^^ ^^^ ^jj X >  1 j that the level sets S^' := {x eR\ fi{x) < a} and S^^ := {x G ^ I /2(^) ^ to_i(/(. + Ae.)/(.)) = M£) which results in i; = V/(x). 3.7) The sub and super differential can be chosen as {(sgn a;ix2, sgn X2xi)} if X1X2 7^ 0 o/(xi, X2) = { {{u, 0) I 1^1 0 and /(O) — 0. The subadditivity can be simply shown.
Answers to the Exercises
279
4.3) Take xi,a;2 G cone(5'), i.e. xi = AiSi and X2 = X2S2 for some Ai, A2 > 0 and Si, S2 G S. Without loss of generality we assume Ai + A2 7^ 0. Then 2;i + a^2 = (Ai + A2) I T—;^si + T—r"^S2 ) e cone(5). \ A i + A2
Ai + A2
/
4.4) cone(S')=R^. 4.5) T(5, (1,2)) = {A(a,  1 )  A > 0, a e [1,1]}. 4.6) Simply apply Definition 4.6. 4.7)
(a) Apply Definition 4.6 and notice that Si C 82(b) By part (a) one obtains T{Si D S2,x) C T{Si,x) and T{SinS2,x) C r(52,S) implying r(5in^2,:r) C T{Sux)n
ns2,x), 4.8) See the remark under 4. on page 31 in [192]. 4.9) Apply Theorem 2.4.5 in [68]. The assertion then follows from Proposition 2.2.1 in [68]. This result is also proved on the pages 1718 in [296, Theorem 2E]. 4.10) / is not pseudoconvex at x = 0.
CHAPTER 5 5.1) Since x ^ S = c\{S) and S is convex, the separation theorem C.3 gives the desired result. 5.2) See Lemma 1.1 and Lemma 2.1 in [192]. 5.3) The Slater condition given in Theorem 5.9 is satisfied (take x :=
(hi))5.4)
(a) Since Xi,a;2 > 0, it follows Xi + 0^2 > 0 for all feasible {xi,X2) G R 2 .
280
Answers to the Exercises
(b) No. (c) No. 5.5)
(a) (2,1). (b) (1.5,2.25). / \ / 1 8 5 _55_ VW V768' 7 6 8 '
_5_\ 16/*
5.6) Yes. For all feasible (x,y) it follows x + 3y + 3 ^ 1 2x + y + Q 2
ly ^1 2x + y + Q  2'
Since ^^^y ^^ > 0 for y > 0, we conclude that there are no other solutions. 5.7) This problem satisfies the ArrowHurwiczUzawa condition. Then the KarushKuhnTucker conditions give the desired assertion. 5.8) Choose an arbitrary x E S, show Vf{xY{x — x) > 0 and conclude that X is a minimal point of / on 5. 5.9) The function p with Pit) = \ ( l  e'') ( 2 ) fo^ ^11 ^ ^ [0' 1] satisfies the adjoint equation (5.36) and the transversality condition (5.37). Then an optimal control u = {ui^U2) is given as (6 J 2 Ui{t) =
0 for all x G M". For X = {0,... ,0,Xi,... ,Xj,0,... ve then obtain Xj G M we 0
0. So, the constraint set of the primal problem can be written as {(xi,X2) G M^  Xi = 0, ^2 > 0} and, therefore, the extremal value of the primal problem equals 0. With Exercise 7.8) the dual problem can be written in this special case as max[/33
subject to the constraints 2Ui2 + C/33  1 U22 = 0
v^s\
or equivalently
maxt/33 subject to the constraint
( '
Un (lC/33)
i(lt/33) 0
t/31 \ t/32
Usi
f/32
Us3 )
eSl
Since the matrix defining the constraint is positive semidefinite, by Exercise 7.5) the leading block matrices U^^ := (Un) and rri2._f  '
U^^ i(lf/33)
1(1  t/33) 0
Answers to the Exercises
287
have to be positive semidefinite as well. The eigenvalues of the matrix U^'^ are Al/2 = ^
±
% ) +i(l^33)^
They are nonnegative if and only if t/n > 0 and Uss = 1. Then the extremal value of the dual problem equals —1. So, the extremal values of the primal and dual problem do not coincide. We consider an arbitrary x G E^ for which the matrix B — A(x) is negative semidefinite. Then one eigenvalue of this matrix equals 0. Therefore, B — A{x) is not negative definite. Hence, the generalized Slater condition is not satisfied and Theorem 7.12 is not applicable.
CHAPTER 8 8.1) An optimal (feedback) control u is given by u{t) =
12
x{t) almost everywhere on [0,2].
7e4(t2) + 9 '
8.2) An optimal (feedback) control u is given by u(t) = —tanh (1 — t) x{t) almost everywhere on [0,1]. 8.3) The equivalent system of linear diff"erential equations of first order reads (
0 0
1 0
0 1
•• • ••
0 0
/ 0 \ 0
\
x{t) =
x{t) + 0
0
0
••
ai
—02
• •
1
almost everywhere on [0,T]. This system satisfies the Kalman condition.
u{t) 0
288
Answers to the Exercises
8.4) The eigenvalues of A are Ai = ^/ai^ A2 = —\foL%^ A3 = A4 = 0. For every eigenvalue of A we get the implication z^{A  Xjl, S)  0^5
==^ z=
resulting in Rank {A  Xjl, J5) = 4 for j = 1 , . . . , 4. Hence, the Hautus condition is fulfilled. 8.5) By Theorem 8.7 there is a time minimal control iZ, and by Theorem 8.11 there is a vector r/ 7^ 0]R4 with u{t)
= sgn —p= sin t\/a — 772/? cos t^/a — rj^jt + 7/47 almost everywhere on [0, T]
(T denotes the minimal time). Since the term in brackets has only finitely many zeros, the time minimal control u is unique.
Index AC^[to,ti]27 "active" inequality constraints 121, 124, 133 adjoint equation 138, 149 alternation theorem 180 antisymmetric 249, 250 approximation problem 20 ArrowHurwiczUzawa condition 123 asymptotically stable 193 Banach space 243 basic version of the HahnBanach theorem 245 Bernoulli matrix differential equation 216 best approximation 20 binary relation 249 C{M) 22, 175 characterization of a convex functional 12, 41 Chebyshev approximation 22 Clarke derivative 68 differentiable 68 tangent cone 83, 104 closed loop control 218 completely positive matrices 201
concave functional 12 cone 79, 250 cone(5) 81 conic optimization 187 conic optimization problem 188 constraint set 1 contingent cone 82, 96, 102, 103 controllable 137, 149 convex functional 11, 92, 94 mapping 159 set 11 convexlike mapping 160 copositive optimization problem 190 copositive ordering cone 190 (7quasiconvex 127 d.c. functional 58 directional derivative 31, 54 directionally differentiable 31 doubly nonnegative ordering cone 190 dual cone 251 problem 164 duality gap 165 duality theory 207
290
Eidelheit separation theorem 245 epigraph 9, 103 EulerLagrange equation 48 feedback control 217, 218 Fejer theorem 198 F. John conditions 120 Frechet derivative 39 differentiable 39 property 59, 63 fundamental matrix 223 Gateaux derivative 38 differentiable 38 generalized gradient 73 Kolmogorov condition 57 Lagrange multiplier rule 110, 116 Slater condition 165, 172, 208 generated cone 81, 103 Hamilton function 149 HahnBanach theorem 245, 247 Hautus condition 237 James theorem 243 John von Neumann saddle point theorem 169 K{T) 222 Kalman condition 149, 237 KarushKuhnTucker conditions 120 iCcopositive optimization problem 190
Index
/Ccopositive ordering cone 189 KurcyuszRobinsonZowe regularity assumption 115, 118 Lagrange functional 115, 168 Lipschitz constant 59 continuous 59 local Pontryagin maximum principle 138, 149 minimal point 18 Lowner ordering cone 189, 197 Lowner partial ordering 189 Lyapunov function 194 Lyusternik theorem 96 minimal point 1, 7 necessary optimality condition 36, 43, 47, 53, 55, 56, 66, 74, 89, 90, 95, 108, 111, 137, 230 nonnegative ordering cone 190 normal 233 objective functional 1 open loop control 219 optimal control problem 26, 137, 213, 221 optimality conditions 202 optimization problem 7, 106, 162, 172 ordering cone 251
Index
partial ordering 249 partially ordered linear space 249 pointed cone 79, 250 Pontryagin maximum principle 137 positive cone 251 homogenity 34 primal problem 162, 172, 175 problem of Chebyshev approximation 22 linear Chebyshev approximation 175, 180 proximinal 20 pseudoconvex 91, 94 quasiconvex 14, 93, 94 quasidifferentiable 57, 63 quasidifferential 57 quasilinear 133 R{T) 223 reachable set 223 redundant constraint 124 reflexive 243 regularity assumption 114, 118, 123 representation theorem for positive linear forms 178 Riccati matrix differential equation 219 saddle point 168, 169, 170, 172 Schur complement 191 semidefinite optimization problem 189
291
semiinfinite optimization problem 176, 177 separation theorem 246, 248 sequential Bouligand tangent cone 82 set of attainability 222 sgn(2/) 228 Slater condition 123 stabilizable 194 starshaped set 35 strong bangbang principle 230, 233 duality theorem 165, 208 structural optimization 195 subadditivity 34 subdifferential 49, 51, 57 of the norm 50 subgradient 49 sublinear functional 34, 103 sufficient optimality condition 36, 53, 55, 56, 89, 95, 128, 131, 152 superdifferential 57 T{S, x) 82 tangent vector 82 transversality condition 138 U{T) 222 uniform approximation 22 W^,ooKt^] 107 weak bangbang principle 230 duality theorem 164, 208 limit 241
292
weakly convergent 241 lower semicontinuous 8 sequentially closed 242 sequentially compact 242 Y{t) 223
Index