Stochastic Operations Research Models
Max Batstra
March 29, 2026
1
, 1 The Poisson Process
1.1 Exponential Distribution
A non-negative stochastic variable X is said to be exponentially distributed with parameter λ, so X ∼ Exp(λ)
if: (
1 − e−λt if t ≥ 0
F (t) = P (X ≤ t) =
0 if t < 0
Then, X’s density is given by: (
λe−λt if t > 0
f (t) = P (X = t) =
0 if t < 0
n!
Note that the derivative of F (t) does not exist in t = 0. Note that in general E[X n ] = n , which follows
λ
from n-times repeatedly partial integration, as:
Z ∞ Z ∞
n n n ∞
E[X ] = t f (t)dt = [t F (t)]0 − n tn−1 (1 − eλt )dt
0 0
∞ n ∞ n−1
Z
−λt
n
= t (1 − e )) 0 − λt − tn−1 λe λt
|{z} dt
λ 0
=f (t)
Z ∞
nλ n n
= lim tn − lim ·t + tn−1 f (t)dt
t→∞ t→∞ λn λ 0
n
= 0 + E[X n−1 ]
λ
..
.
n · (n − 1)
= E[X n−2 ]
λ2
..
.
n!
= E[X 0 ]
λn Z
n! ∞
= n 1 · f (t)d(t)
λ 0
n!
= n ·1
λ
The exponential distribution has the important property of memorylessness, which is:
P (X > t + u) e−λ(t+u)
P (X > t + u | X > t) = = = e−λu = P (X > u)
P (X > t) e−λt
Interpretation: past knowledge doesn’t tell us anything about the future. Suppose a light bulb is more
than t years old, and that this is known. If we want to estimate the remaining lifetime using the Poisson
distribution, then the probability that the light bulb will remain at least another u time periods, is not
influenced by the fact that the light bulb is more than t years old.
2
,There are only two distributions with the memorylessness property:
1. Poisson distribution (only continuous distribution with this property)
2. Geometric distribution (only discrete distribution with this property)
There are two important properties for X1 , . . . , Xn independent exponential random variables with parame-
ters λ1 , . . . , λn respectively:
Property 1.1. Let Xi ∼ Exp(λi ) for i = 1, . . . , n with X1 , . . . , Xn independent, then:
min{X1 , . . . , Xn } ∼ Exp(λ1 + . . . + λn )
Property 1.2. Let Xi ∼ Exp(λi ) for i = 1, . . . , n with X1 , . . . , Xn independent, then:
Y λi
P (Xi = min{X1 , . . . , Xn }) =
λi + λj
j̸=i
Proofs
1. Suppose Xi = min{X1 , . . . , Xn },then P (min{X1 , . . . , Xn } < x) = P (Xi < x). Since it’s unknown what
Xi = min{X1 , . . . , Xn }, it follows that P (min{X1 , . . . , Xn } < x) = P (X1 < x ∪ . . . ∪ Xn < x). Using
the inclusion-exclusion principle: for 3 events A, B and C we have that:
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (B ∩ C) − P (A ∩ C) + P (A ∩ B ∩ C)
If events A, B and C are indpendent it follows that:
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A)P (B) − P (B)P (C) − P (A)P (C) + P (A)(B)P (C)
Let event A, B and C be events X1 < x, X2 < x and X3 < x respectively, where X1 , X2 , X3 are
independent exponential random variables with parameters λ1 , λ2 and λ3 respectively, then we can
derive that:
P (X1 < x ∪ X2 < x ∪ X3 < x) = P (X1 < x) + P (X2 < x) + P (X3 < x) − P (X1 < x)P (X2 < x)
− P (X2 < x)P () − P (X1 < x)P (X3 < x) − P (X1 < x)P (X2 < x)P (X3 < x)
..
.
= 1 − e−(λ1 +...+λn )x
Q
2. Note that P (Xi = min{X1 , . . . , Xn }) = P (Xi < X1 , . . . , Xi < Xn ) = j̸=i P (Xi < Xj ) due to that
3
, X1 , . . . , Xn are independent, therefore:
Y YZ ∞ Z yj
P (Xi = min{X1 , . . . , Xn }) = P (Xi < Xj ) = λi e−λi x λj e−λj yj dxdyj
[1] 0 0
j̸=i j̸=i
YZ ∞
= (1 − e−λi yj ) · λj e−λj yj dyj
j̸=i 0
YZ ∞
= λj e−λj yj − λj e−(λi +λj )yj dyj
j̸=i 0
Y yj →∞ λj h −(λi +λj )yj iyj →∞
= e−λj yj yj =0 − −e
λi + λj yj =0
j̸=i
Y λj Y λi
= 1− =
λi + λj λi + λj
j̸=i j̸=i
1.2 Erlang Distribution
k
Pk
A random variable Y has an Erlang-k (k = 1, 2, . . .) distribution with E[Y ] = λ if Y = i=1 Xi , where
i.i.d.
X1 , . . . , Xk ∼ Exp(λ). We write this as Y ∼ Ek (λ) or Y ∼ Erlang(k, λ), where λ is called the scale
parameter, and k is called the shape parameter.
For Y ∼ Ek (λ):
(λt)k−1 −λt
f (t) = λ e , t>0
(k − 1)!
and
k−1
X (λt)j −λt
F (t) = 1 − e , t≥0
j=0
j!
Y ∼ Ek (λ) has the following properties:
k k
X k X k Var(Y ) 1
E[Y ] = E[Xi ] = Var(Y ) = Var(Xi ) = c2Y ≡ 2 ·
i=1
λ i=1
λ2 (E[Y ]) k
1.3 Counting process definition
A stochastic process is a collection of random variables {X(t), t ∈ T }. That is, for each t ∈ T , X(t)
is a random variable. We often interpret t as time, and call X(t) the state of the process at time t. If
T = {0, 1, . . .}, then we usually write X0 , X1 , . . . and we speak of a discrete-time stochastic process.
A stochastic process {N (t), t ≥ 0} is called a counting process when N (t) represents the number of ’events’
that have occurred in the time interval (0, t]. All counting processes have to satisfy the following properties:
(i) N (0) = 0 (No events can have occurred if duration is 0)
4
, (ii) N (t) ∈ Z for all t (Counting function, so only allows integers)
(iii) N (s) ≤ N (t) if s < t, so N (t) is increasing in t (No information loss)
(iv) For s < t, the quantity N (t) − N (s) represents the number of events in (s, t]
Special cases:
• Independent increments
If (a, b] ∩ (c, d] = ∅, then N (b) − N (a) and N (d) − N (c) are independent.
• Stationary increments
It holds that N (t) − N (s) and N (t + u) − N (s + u) are identically distributed for all s, t, u > 0 and s < t
A Poisson process is a counting process with independent and stationary increments, that also has a third
property that can be described in multiple ways.
Definition 1.3. [Poisson Process (i)]
The counting process {N (t), t ≥ 0} is a Poisson process with rate λ if
1. N (0) = 0
2. The process has stationary points and independent increments
e−λt (λt)n
3. P (N (t) = n) = with n = 0, 1, . . . , and t ≥ 0
n!
Hence N (t) is for all t > 0 Poisson distributed, with mean E[N (t)] = λt, which is the expected number events
in (0, t)
Definition 1.4. [Poisson Process (ii)]
The counting process {N (t), t ≥ 0} is a Poisson process with rate λ if
1. N (0) = 0
2. The process has stationary points and independent increments
3.
P (N (h) = 1) = λh + o(h), with h ↓ 0
P (N (h) ≥ 2) = o(h), with h ↓ 0
f (h)
Where f (·) ≡ o(h) if lim = 0, Interpretation: o(h) vanishes (becomes 0) faster than h, as h ↓ 0.
h→0 h
We’ll prove that these two definitions are equivalent. Note that as h → 0, we may assume that (λh)n → 0
faster than h → 0 for all n ≥ 2 and n ∈ N, since λ cannot be sufficiently great to ensure that h < (λh)n .
Also, recall the Taylor expansion for an infinitely differentiable function f (x) around a point c:
f ′′ (c) f ′′′ (c)
f (x) = f (c) + f ′ (c)(x − c) + (x − c)2 + (x − c)3 + . . .
2! 3!
Proof
5
Max Batstra
March 29, 2026
1
, 1 The Poisson Process
1.1 Exponential Distribution
A non-negative stochastic variable X is said to be exponentially distributed with parameter λ, so X ∼ Exp(λ)
if: (
1 − e−λt if t ≥ 0
F (t) = P (X ≤ t) =
0 if t < 0
Then, X’s density is given by: (
λe−λt if t > 0
f (t) = P (X = t) =
0 if t < 0
n!
Note that the derivative of F (t) does not exist in t = 0. Note that in general E[X n ] = n , which follows
λ
from n-times repeatedly partial integration, as:
Z ∞ Z ∞
n n n ∞
E[X ] = t f (t)dt = [t F (t)]0 − n tn−1 (1 − eλt )dt
0 0
∞ n ∞ n−1
Z
−λt
n
= t (1 − e )) 0 − λt − tn−1 λe λt
|{z} dt
λ 0
=f (t)
Z ∞
nλ n n
= lim tn − lim ·t + tn−1 f (t)dt
t→∞ t→∞ λn λ 0
n
= 0 + E[X n−1 ]
λ
..
.
n · (n − 1)
= E[X n−2 ]
λ2
..
.
n!
= E[X 0 ]
λn Z
n! ∞
= n 1 · f (t)d(t)
λ 0
n!
= n ·1
λ
The exponential distribution has the important property of memorylessness, which is:
P (X > t + u) e−λ(t+u)
P (X > t + u | X > t) = = = e−λu = P (X > u)
P (X > t) e−λt
Interpretation: past knowledge doesn’t tell us anything about the future. Suppose a light bulb is more
than t years old, and that this is known. If we want to estimate the remaining lifetime using the Poisson
distribution, then the probability that the light bulb will remain at least another u time periods, is not
influenced by the fact that the light bulb is more than t years old.
2
,There are only two distributions with the memorylessness property:
1. Poisson distribution (only continuous distribution with this property)
2. Geometric distribution (only discrete distribution with this property)
There are two important properties for X1 , . . . , Xn independent exponential random variables with parame-
ters λ1 , . . . , λn respectively:
Property 1.1. Let Xi ∼ Exp(λi ) for i = 1, . . . , n with X1 , . . . , Xn independent, then:
min{X1 , . . . , Xn } ∼ Exp(λ1 + . . . + λn )
Property 1.2. Let Xi ∼ Exp(λi ) for i = 1, . . . , n with X1 , . . . , Xn independent, then:
Y λi
P (Xi = min{X1 , . . . , Xn }) =
λi + λj
j̸=i
Proofs
1. Suppose Xi = min{X1 , . . . , Xn },then P (min{X1 , . . . , Xn } < x) = P (Xi < x). Since it’s unknown what
Xi = min{X1 , . . . , Xn }, it follows that P (min{X1 , . . . , Xn } < x) = P (X1 < x ∪ . . . ∪ Xn < x). Using
the inclusion-exclusion principle: for 3 events A, B and C we have that:
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (B ∩ C) − P (A ∩ C) + P (A ∩ B ∩ C)
If events A, B and C are indpendent it follows that:
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A)P (B) − P (B)P (C) − P (A)P (C) + P (A)(B)P (C)
Let event A, B and C be events X1 < x, X2 < x and X3 < x respectively, where X1 , X2 , X3 are
independent exponential random variables with parameters λ1 , λ2 and λ3 respectively, then we can
derive that:
P (X1 < x ∪ X2 < x ∪ X3 < x) = P (X1 < x) + P (X2 < x) + P (X3 < x) − P (X1 < x)P (X2 < x)
− P (X2 < x)P () − P (X1 < x)P (X3 < x) − P (X1 < x)P (X2 < x)P (X3 < x)
..
.
= 1 − e−(λ1 +...+λn )x
Q
2. Note that P (Xi = min{X1 , . . . , Xn }) = P (Xi < X1 , . . . , Xi < Xn ) = j̸=i P (Xi < Xj ) due to that
3
, X1 , . . . , Xn are independent, therefore:
Y YZ ∞ Z yj
P (Xi = min{X1 , . . . , Xn }) = P (Xi < Xj ) = λi e−λi x λj e−λj yj dxdyj
[1] 0 0
j̸=i j̸=i
YZ ∞
= (1 − e−λi yj ) · λj e−λj yj dyj
j̸=i 0
YZ ∞
= λj e−λj yj − λj e−(λi +λj )yj dyj
j̸=i 0
Y yj →∞ λj h −(λi +λj )yj iyj →∞
= e−λj yj yj =0 − −e
λi + λj yj =0
j̸=i
Y λj Y λi
= 1− =
λi + λj λi + λj
j̸=i j̸=i
1.2 Erlang Distribution
k
Pk
A random variable Y has an Erlang-k (k = 1, 2, . . .) distribution with E[Y ] = λ if Y = i=1 Xi , where
i.i.d.
X1 , . . . , Xk ∼ Exp(λ). We write this as Y ∼ Ek (λ) or Y ∼ Erlang(k, λ), where λ is called the scale
parameter, and k is called the shape parameter.
For Y ∼ Ek (λ):
(λt)k−1 −λt
f (t) = λ e , t>0
(k − 1)!
and
k−1
X (λt)j −λt
F (t) = 1 − e , t≥0
j=0
j!
Y ∼ Ek (λ) has the following properties:
k k
X k X k Var(Y ) 1
E[Y ] = E[Xi ] = Var(Y ) = Var(Xi ) = c2Y ≡ 2 ·
i=1
λ i=1
λ2 (E[Y ]) k
1.3 Counting process definition
A stochastic process is a collection of random variables {X(t), t ∈ T }. That is, for each t ∈ T , X(t)
is a random variable. We often interpret t as time, and call X(t) the state of the process at time t. If
T = {0, 1, . . .}, then we usually write X0 , X1 , . . . and we speak of a discrete-time stochastic process.
A stochastic process {N (t), t ≥ 0} is called a counting process when N (t) represents the number of ’events’
that have occurred in the time interval (0, t]. All counting processes have to satisfy the following properties:
(i) N (0) = 0 (No events can have occurred if duration is 0)
4
, (ii) N (t) ∈ Z for all t (Counting function, so only allows integers)
(iii) N (s) ≤ N (t) if s < t, so N (t) is increasing in t (No information loss)
(iv) For s < t, the quantity N (t) − N (s) represents the number of events in (s, t]
Special cases:
• Independent increments
If (a, b] ∩ (c, d] = ∅, then N (b) − N (a) and N (d) − N (c) are independent.
• Stationary increments
It holds that N (t) − N (s) and N (t + u) − N (s + u) are identically distributed for all s, t, u > 0 and s < t
A Poisson process is a counting process with independent and stationary increments, that also has a third
property that can be described in multiple ways.
Definition 1.3. [Poisson Process (i)]
The counting process {N (t), t ≥ 0} is a Poisson process with rate λ if
1. N (0) = 0
2. The process has stationary points and independent increments
e−λt (λt)n
3. P (N (t) = n) = with n = 0, 1, . . . , and t ≥ 0
n!
Hence N (t) is for all t > 0 Poisson distributed, with mean E[N (t)] = λt, which is the expected number events
in (0, t)
Definition 1.4. [Poisson Process (ii)]
The counting process {N (t), t ≥ 0} is a Poisson process with rate λ if
1. N (0) = 0
2. The process has stationary points and independent increments
3.
P (N (h) = 1) = λh + o(h), with h ↓ 0
P (N (h) ≥ 2) = o(h), with h ↓ 0
f (h)
Where f (·) ≡ o(h) if lim = 0, Interpretation: o(h) vanishes (becomes 0) faster than h, as h ↓ 0.
h→0 h
We’ll prove that these two definitions are equivalent. Note that as h → 0, we may assume that (λh)n → 0
faster than h → 0 for all n ≥ 2 and n ∈ N, since λ cannot be sufficiently great to ensure that h < (λh)n .
Also, recall the Taylor expansion for an infinitely differentiable function f (x) around a point c:
f ′′ (c) f ′′′ (c)
f (x) = f (c) + f ′ (c)(x − c) + (x − c)2 + (x − c)3 + . . .
2! 3!
Proof
5