Introduction to Optimization
Week 1
1 Introduction
1.1 Optimization problems and related terminology
Consider a set X, a subset C ⊂ X, and a function f : X → R. A minimization problem consists in finding a point xmin ∈ C such
that f(xmin) ≤ f(x) for all x ∈ C. f is called the objective function, C is called the feasible set,
and its elements are feasible points. The problem is constrained if C ̸= X and unconstrained otherwise.
xmin is called the (global) minimizer of f on C, or a solution to the problem
min_C(f) = min_{x∈C} f(x) = min{f(x) : x ∈ C}
The set of such solutions is denoted by arg minC (f ). If C = X, we often leave out the C.
If a function has no minimizers, we still may be interested in finding the value inf_C(f) = inf{f(x) : x ∈ C}. In any case, the set of minimizers can be described as
arg min_C(f) = ⋂_{γ > inf_C(f)} [f ≤ γ], where [f ≤ γ] = {x ∈ X : f(x) ≤ γ}
[f ≤ γ] is called the γ-sublevel set of f .
A maximization problem can be reduced to a minimization problem through the equivalence
max_{x∈C} f(x) = −min_{x∈C} {−f(x)}
1.2 Two simple examples in R that actually tell us a lot
1.2.1 Cost-efficient cylindrical containers
A cylindrical container of radius r and height h has a top and a bottom, each made of material with cost a per unit area, and a side with cost b per unit area. The total cost is then given by
C(r, h) = 2aπr² + 2bπrh
while the volume of the can is given by
V = πr²h
From these two, fixing the volume V and substituting h = V/(πr²), the cost becomes C(r) = 2aπr² + 2bV/r; setting its derivative 4aπr − 2bV/r² to zero and using V = πr²h shows that the optimal height h̄ depends on the optimal radius r̄ through
h̄ = 2ar̄/b
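A quick numerical check of this relation (a minimal sketch; the cost parameters and the volume below are arbitrary choices, not values from the notes):

# Minimize the cost of a cylinder of fixed volume V and check that the
# optimal dimensions satisfy h = 2*a*r/b.  (a, b, V are arbitrary here.)
import numpy as np
from scipy.optimize import minimize_scalar

a, b, V = 3.0, 1.0, 10.0          # top/bottom cost, side cost, fixed volume

def cost(r):
    # Total cost as a function of the radius, with h eliminated via V = pi r^2 h.
    h = V / (np.pi * r ** 2)
    return 2 * a * np.pi * r ** 2 + 2 * b * np.pi * r * h

res = minimize_scalar(cost, bounds=(1e-3, 10.0), method="bounded")
r_opt = res.x
h_opt = V / (np.pi * r_opt ** 2)
print(h_opt, 2 * a * r_opt / b)   # the two values should (approximately) agree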
Generalizing the techniques we used, we end up with two propositions:
Proposition 1.1 Let I be an open interval, and let f : I → R be twice differentiable, with f′′(x) > 0
for all x ∈ I. If there is x̄ ∈ I such that f′(x̄) = 0, then f(x̄) < f(x) for every x ∈ I \ {x̄}.
Proposition 1.2 Let I be an open interval, and let f : I → R be differentiable. If there is x̄ ∈ I such
that f(x̄) ≤ f(x) for every x ∈ I, then f′(x̄) = 0.
1.2.2 A broken fastest path
Fermat’s Principle of Least Time states that the path taken by a ray of light between two given points
is the path that can be traveled in the shortest time. We can do the math ourselves.
Proposition 1.5 (law of refraction) For a given pair of media, the ratio of the sines of the angles of
incidence and refraction equals the ratio of the speed of light in the corresponding media.
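A numerical illustration of this statement (a minimal sketch; the geometry and the speeds below are arbitrary assumptions, not values from the notes): light travels from a point in the first medium to a point in the second, and we optimize the point where it crosses the interface.

# Minimize the travel time of a ray going from (0, h1) in medium 1 (speed v1)
# to (d, -h2) in medium 2 (speed v2), crossing the interface y = 0 at (x, 0),
# then check the law of refraction: sin(theta_1)/sin(theta_2) = v1/v2.
import numpy as np
from scipy.optimize import minimize_scalar

h1, h2, d = 1.0, 2.0, 3.0         # geometry (arbitrary)
v1, v2 = 3.0, 2.0                 # speeds in the two media (arbitrary)

def travel_time(x):
    return np.hypot(x, h1) / v1 + np.hypot(d - x, h2) / v2

x = minimize_scalar(travel_time, bounds=(0.0, d), method="bounded").x
sin_incidence = x / np.hypot(x, h1)             # angles measured from the normal
sin_refraction = (d - x) / np.hypot(d - x, h2)
print(sin_incidence / sin_refraction, v1 / v2)  # the two ratios should agree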
1.3 Examples from data analysis
Consider a data set
D = {(aj , yj ) ∈ X × Y : j = 1, 2, . . . , J}
where the aj ’s are certain features (within set X) and the yj ’s are observations or labels (in set Y ),
corresponding to a sample or individual j ∈ {1, . . . , J}. The aim is to discover or learn a function
ϕ : X → Y such that ϕ(aj ) ≃ yj . Typically, ϕ will belong to a family of functions depending on a
parameter θ ∈ Rn .
1.3.1 The loss function approach
Define a loss function
L_D(θ) = (1/J) Σ_{j=1}^{J} ℓ(a_j, y_j, θ)
One of the most typical problems in data analysis is the linear least squares problem:
min_{θ∈R^n} (1/(2J)) Σ_{j=1}^{J} (θ^T a_j − y_j)² = min_{θ∈R^n} (1/(2J)) ||Aθ − y||²
where A is the J × n matrix whose rows are the a_j^T and y = (y_1, . . . , y_J)^T. In this case, ℓ(a, y, θ) = (θ^T a − y)².
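As a small illustration (a sketch on synthetic data; the dimensions, noise level, and true parameter below are arbitrary assumptions), the minimizer can be computed with a standard least-squares routine:

# Solve min (1/(2J)) ||A theta - y||^2 on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
J, n = 50, 3
A = rng.normal(size=(J, n))                      # feature matrix (rows a_j^T)
theta_true = np.array([1.0, -2.0, 0.5])
y = A @ theta_true + 0.1 * rng.normal(size=J)    # noisy observations

# The 1/(2J) factor rescales the objective but does not change the minimizer.
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
loss = np.sum((A @ theta_hat - y) ** 2) / (2 * J)
print(theta_hat, loss)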
If, in addition to approximating the data with a linear model, we want to impose structure on θ, we can add a regularization term to the problem. Two common examples:
• Ridge regression: adds λ||θ||² (less sensitive to perturbations); see the sketch after this list
• LASSO: adds λ||θ||₁ (promotes sparsity of θ, but is not differentiable)
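A minimal sketch of ridge regression with the scaling used above (the helper ridge and the commented-out test call are illustrative assumptions): setting the gradient of (1/(2J))||Aθ − y||² + λ||θ||² to zero gives the linear system (AᵀA + 2λJ I)θ = Aᵀy.

# Ridge regression for the objective (1/(2J))||A theta - y||^2 + lam*||theta||^2.
# The gradient vanishes exactly when (A^T A + 2*lam*J*I) theta = A^T y.
import numpy as np

def ridge(A, y, lam):
    J, n = A.shape
    return np.linalg.solve(A.T @ A + 2 * lam * J * np.eye(n), A.T @ y)

# Example use on the synthetic (A, y) from the previous sketch:
# theta_ridge = ridge(A, y, lam=0.1)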
1.3.2 Support Vector Machines
A Support Vector Machine (SVM) is an optimization approach to binary classification problems (say y takes the values −1 and 1). We seek a hyperplane that separates the sets {aj : yj = −1} and {aj : yj = 1}. Such a hyperplane need not exist, in which case we can still look for the best possible fit. If it does exist, we want to maximize its distance to either set; a standard formulation is sketched below.
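One common way to make this precise (a sketch of the usual hard-margin formulation; the symbols w and β are not used elsewhere in these notes): writing the hyperplane as {x : wᵀx + β = 0}, one solves
min_{w,β} (1/2)||w||²  subject to  y_j(wᵀa_j + β) ≥ 1 for j = 1, . . . , J,
and the distance from the optimal hyperplane to each of the two sets is then 1/||w||.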
2 Basic optimization theory
2.1 Metric and topological structure of Rn
x ∈ Rn ⇐⇒ x = (x₁, . . . , xₙ)^T with x₁, . . . , xₙ ∈ R
The Euclidean norm of x ∈ Rn is ||x|| = √(x₁² + · · · + xₙ²). Recall a norm satisfies
• ||0|| = 0 and ||x|| > 0 for x ̸= 0
• ||ax|| = |a| ||x|| for all x ∈ Rn and a ∈ R
• ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ Rn .
The 1-norm and ∞-norm are defined as
||x||₁ = |x₁| + · · · + |xₙ|,   ||x||∞ = max{|x₁|, . . . , |xₙ|}
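A quick numerical illustration of the three norms (a sketch assuming numpy; the vector is arbitrary):

import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x))              # Euclidean norm: 5.0
print(np.linalg.norm(x, ord=1))       # 1-norm: 7.0
print(np.linalg.norm(x, ord=np.inf))  # infinity-norm: 4.0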
The distance between x, y ∈ Rn is dist(x, y) = ||x − y||. A metric satisfies the following properties:
• dist(x, x) = 0 and dist(x, y) > 0 for x ̸= y
• dist(x, y) = dist(y, x)
• dist(x, y) ≤ dist(x, z) + dist(z, y)
The open and closed balls centered at x ∈ Rn with radius r > 0 are given by
B(x, r) = {y ∈ Rn : ||x − y|| < r}, B̄(x, r) = {y ∈ Rn : ||x − y|| ≤ r}
A subset S ⊂ Rn is open if for each x ∈ S, there exists r > 0 such that B(x, r) ⊂ S.
Open balls are open sets. The intersection of a finite number of open sets is open, while any union (finite
or infinite) of open sets is open.
The interior of a set S ⊂ Rn, denoted by int(S), is the largest open set contained in S. The elements of int(S) are called interior points of S. S is open if and only if int(S) = S.
A subset S ⊂ Rn is closed if and only if its complement is open. Closed balls are closed sets.
A sequence (xk ) in Rn converges to x̄ ∈ Rn (as k → ∞) if, for every ϵ > 0, there is k0 ∈ N such that
xk ∈ B(x̄, ϵ) for every k ≥ k0. In this case, x̄ is the limit of (xk).
Proposition 2.8 A set S ⊂ Rn is closed if and only if every convergent sequence of points in S has its
limit in S.
The closure of a set S ⊂ Rn , denoted by S̄, is the smallest closed set that contains S. A set S is closed
if and only if S̄ = S.
A set S ⊂ Rn is bounded if it is contained in a ball.
Proposition 2.9 (Bolzano-Weierstrass) Every bounded sequence in Rn has a convergent subsequence.
A set C ⊂ Rn is compact if it is both closed and bounded.
Corollary 2.10 Every sequence in a compact set has a subsequence that converges to a point in the set.
2.2 Continuous functions and existence of minimizers
Let f : D ⊂ Rn → R, let x̄ ∈ D, and let ℓ ∈ R. If, for every ϵ > 0 there is δ > 0 such that |f (x) − ℓ| < ϵ
whenever ||x − x̄|| < δ, we say that f (x) converges to ℓ as x tends to x̄, and that ℓ is the limit of f (x)
as x tends to x̄. In this case, we write ℓ = limx→x̄ f (x).
A function f : D ⊂ Rn → R is continuous at a point x̄ ∈ D if limx→x̄ f (x) = f (x̄). If f is continuous at
every point of D, it is continuous in D, or just continuous. f is continuous at x̄ if and only if for every
sequence (xk ) converging to x̄, we have limk→∞ f (xk ) = f (x̄).
Theorem 2.12 (Weierstrass) If C ⊂ Rn is compact and f : C → R is continuous, then there exist
xmin, xmax ∈ C such that f(xmin) ≤ f(x) ≤ f(xmax) for all x ∈ C.
A function f : Rn → R is coercive if for every R > 0, there is ρ > 0 such that f (x) > R for every
x ̸∈ B(0, ρ). In this case, we shall often write lim||x||→∞ f (x) = ∞.
Given γ ∈ R, the γ-level set of f : D ⊂ Rn → R is
[f = γ] = {x ∈ D : f (x) = γ}
The level and sublevel sets of a continuous function are closed.
Proposition 2.15 The sublevel sets of a coercive function must be bounded.
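Combining the previous results gives a standard existence criterion (a sketch, not numbered in these notes): if f : Rn → R is continuous and coercive, then it has a global minimizer. Indeed, pick any x0 ∈ Rn; the sublevel set [f ≤ f(x0)] is nonempty, closed (since f is continuous) and bounded (Proposition 2.15), hence compact, so Theorem 2.12 yields a minimizer of f on this set, and such a point also minimizes f over all of Rn, since every point outside the set has a strictly larger value.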