To begin with, the capital letter Θ denotes the parameter space: the set of all values the
parameter can take. It is also written with an index, such as Θ0 for the parameter space under
the null hypothesis and Θ1 for the parameter space under the alternative hypothesis.
The lowercase Greek letter 𝜃, by contrast, is used in statistics to denote an unknown parameter
of interest. For example, in A/B testing (particularly in Bayesian treatments) it is often modeled
as a random variable. The true value of 𝜃 is denoted by 𝜃∗, while an estimator of 𝜃 (usually the
maximum likelihood estimate) is denoted with a hat above the letter.
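The difference between the true value and its estimate can be illustrated with a small simulation. This is only a sketch under assumed values: the true conversion rate theta_star, the sample size, and the seed are hypothetical, and the estimator used is the sample proportion, which is the maximum likelihood estimate of a Bernoulli parameter.

```python
import random

# Hypothetical true parameter (theta*); in a real experiment it is unknown.
theta_star = 0.3
n_visitors = 10_000

rng = random.Random(42)  # fixed seed so the run is reproducible

# Simulate conversions: 1 with probability theta_star, otherwise 0.
conversions = [1 if rng.random() < theta_star else 0 for _ in range(n_visitors)]

# Maximum likelihood estimate of a Bernoulli parameter: the sample proportion.
theta_hat = sum(conversions) / n_visitors

print(f"theta* = {theta_star}, theta_hat = {theta_hat:.4f}")
```

With more visitors the estimate tends to land closer to theta_star, but it is never guaranteed to equal it.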
A common problem is to find the value of 𝜃. For example, the mean and the standard deviation of
a Normal distribution are conventionally named mu and sigma: 𝑋 ∼ 𝑁(𝜇, 𝜎). The parameter 𝜇
tells where the distribution is centered, so different values of 𝜇 describe random variables
with different means. Parameters like these are what 𝜃 stands for, and other distributions
likewise have at least one parameter of interest. The Binomial distribution has two parameters:
the number of independent trials (n) and the probability of success (p). The Gamma distribution
also has two parameters: the shape parameter 𝛼 and the rate parameter 𝛽. The Poisson
distribution has only one parameter, the mean number of events 𝜆. The Geometric distribution
also has only one parameter, the probability of success on each trial (p).
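For the one-parameter distributions, the estimate of 𝜃 can be computed directly from the sample mean. The sketch below uses made-up data (both lists are hypothetical) and assumes the Geometric variable counts the number of trials up to and including the first success.

```python
from statistics import mean

# Hypothetical data: number of events observed in each of 8 intervals.
event_counts = [2, 4, 3, 1, 5, 3, 2, 4]

# Poisson: the maximum likelihood estimate of lambda is the sample mean.
lambda_hat = mean(event_counts)

# Hypothetical data: number of trials needed to obtain the first success.
trials_to_success = [1, 3, 2, 5, 1, 4, 2, 2]

# Geometric (support 1, 2, ...): the maximum likelihood estimate of p
# is the reciprocal of the sample mean.
p_hat = 1 / mean(trials_to_success)

print(f"lambda_hat = {lambda_hat}, p_hat = {p_hat}")
```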
Theta can stand for any parameters you want to estimate; what 𝜃 refers to depends on the model
you are working with. For example, in least squares regression you model a dependent variable
(Y) as a linear combination of one or more independent variables (X):
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + ⋯ + 𝛽𝑛 𝑋𝑛, where n is the number of independent variables and
the parameters to be estimated are the 𝛽s. So, 𝜃 is the name for the whole collection of 𝛽s.
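As an illustration, the simplest case with a single independent variable (n = 1) has a closed-form least squares solution. The data below are hypothetical, and the variable names are chosen only for this sketch; with several independent variables one would normally use a linear algebra library instead.

```python
from statistics import mean

# Hypothetical data lying roughly along y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

x_bar, y_bar = mean(x), mean(y)

# Closed-form least squares estimates for a single predictor:
# slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# theta collects every coefficient being estimated.
theta_hat = (beta0_hat, beta1_hat)
print(f"theta_hat = ({beta0_hat:.3f}, {beta1_hat:.3f})")
```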
As another example, suppose you want to study the disintegration of radioactive atoms, whose
number decreases exponentially over time. Letting t be the time to disintegration, the model is
𝑓(𝑡) = 𝜃𝑒^(−𝜃𝑡), where f(t) is the probability density of the disintegration time: the
probability that an atom disintegrates in the time interval (t, t + dt) is f(t) dt. The interest
is in estimating 𝜃, which is the disintegration rate.
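For this model the maximum likelihood estimate has a simple form. With n observed times t1, …, tn, the log-likelihood is n log 𝜃 − 𝜃 Σ ti, and setting its derivative to zero gives the estimate n / Σ ti, the reciprocal of the mean disintegration time. The sketch below checks this on simulated data; the true rate, sample size, and seed are hypothetical values chosen for illustration.

```python
import random

theta_star = 0.5   # hypothetical true disintegration rate (per unit time)
n_atoms = 20_000

rng = random.Random(7)  # fixed seed so the run is reproducible

# random.expovariate takes the rate, so each sample follows the density
# theta_star * exp(-theta_star * t).
times = [rng.expovariate(theta_star) for _ in range(n_atoms)]

# Maximum likelihood estimate: number of observations over total observed time.
theta_hat = n_atoms / sum(times)

print(f"theta* = {theta_star}, theta_hat = {theta_hat:.4f}")
```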
As a last example, suppose you want to study the precision of a weighing instrument. Assuming
the measurements are Gaussian, you model the weighing of a standard 1 kg object as
𝑓(𝑥) = 1/(𝜎√(2𝜋)) · exp{−(𝑥 − 𝜇)²/(2𝜎²)}, where x is the measurement given by the scale,
f(x) is the probability density, and the parameters are 𝜇 and 𝜎. So, 𝜃 = (𝜇, 𝜎), where mu is
the target weight and sigma is the standard deviation of the measurement each time you weigh
the object.
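Both components of 𝜃 can be estimated from repeated weighings. The readings below are hypothetical, and the sketch uses the maximum likelihood estimates under the Gaussian model: the sample mean for mu and the population-style standard deviation (dividing by the number of readings, not by one less) for sigma.

```python
from statistics import mean, pstdev

# Hypothetical repeated readings (in kg) of the same standard 1 kg object.
readings_kg = [1.002, 0.998, 1.001, 0.999, 1.000, 1.003, 0.997, 1.000]

# Gaussian maximum likelihood estimates.
mu_hat = mean(readings_kg)       # estimated mean reading
sigma_hat = pstdev(readings_kg)  # estimated spread (divides by n)

theta_hat = (mu_hat, sigma_hat)
print(f"mu_hat = {mu_hat:.4f} kg, sigma_hat = {sigma_hat:.5f} kg")
```

A mu_hat far from 1 kg would suggest a systematic offset, while sigma_hat summarizes how much a single reading scatters.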
In conclusion, the symbol 𝜃 is used to denote any unknown parameter of interest. Statistics is
about finding the best or most appropriate values of 𝜃 (Bayesians would add: given the data and
the priors).