► Upon completion of this lecture section, you will be able to:
► Summarize a binary outcome across a group of individual observations via the sample
proportion
► Explain why, with binary data, the sample proportion is the only summary statistic
(besides sample size n) necessary to describe characteristics of the sample
► Compute the sample proportion based on the results of a study
Example: Treatment Response to ART, HIV+ Individuals (1)
► Response to therapy in a random sample of 1,000 HIV-positive patients from a citywide
clinical population
► 206 of the 1,000 patients responded. The summary measure used for binary outcomes is
the sample proportion $\hat{p}$ (pronounced "p-hat"), given by
$$\hat{p} = \frac{\text{\# of responders}}{\text{total \# in sample}} = \frac{206}{1{,}000} = 0.206 \text{ or } 20.6\%$$
► Why the hat? To distinguish $\hat{p}$, the sample estimate, from the underlying true (population)
proportion p (which can only be estimated)
Source: http://inclass.kaggle.com/
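The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the function name `sample_proportion` and its argument names are hypothetical, and the only data used are the counts quoted in the example (206 responders out of 1,000).

```python
def sample_proportion(n_with_outcome: int, n_total: int) -> float:
    """Return p-hat = (# with the outcome) / (total # in sample)."""
    if n_total <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= n_with_outcome <= n_total:
        raise ValueError("count with outcome must lie between 0 and the sample size")
    return n_with_outcome / n_total


# Counts from the ART example: 206 responders among 1,000 patients.
p_hat = sample_proportion(206, 1000)
print(f"p-hat = {p_hat:.3f} ({p_hat:.1%})")  # p-hat = 0.206 (20.6%)
```

The input checks are a design choice for the sketch: a proportion is only defined for a positive sample size and a count that does not exceed it.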
p-hat, Generally Speaking
► The sample proportion $\hat{p}$ is just the sample mean of data that take on two values, 0 and 1
► Generally, binary data are coded as x = 1 for observations that have the
outcome and x = 0 for observations that do not
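To make the 0/1 coding concrete, here is a small Python sketch. The eight outcomes below are made up purely for illustration (they are not study data), and the variable names are assumptions. It shows that averaging the coded values gives the same number as dividing the count with the outcome by the sample size.

```python
# Hypothetical outcomes for eight observations (illustrative only, not study data).
had_outcome = [True, False, False, True, False, False, False, True]

# Code each observation as x = 1 (has the outcome) or x = 0 (does not).
x = [1 if outcome else 0 for outcome in had_outcome]

# The sample mean of the 0/1 values ...
mean_x = sum(x) / len(x)

# ... matches the count with the outcome divided by the sample size.
p_hat = had_outcome.count(True) / len(had_outcome)

print(x)              # [1, 0, 0, 1, 0, 0, 0, 1]
print(mean_x, p_hat)  # 0.375 0.375
```

Because every 1 adds exactly one to the sum and every 0 adds nothing, the sum of the coded values is the number of observations with the outcome.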
Example: Treatment Response to ART, HIV+ Individuals (2)
► So, with 206 of the 1,000 responding, we have $x_i = 1$ for 206 observations and $x_i = 0$ for 794
observations
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{1000} x_i}{1{,}000} = \frac{206}{1{,}000} = 0.206 \text{ or } 20.6\%$$
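The same sum can be checked numerically, and the check also hints at why the sample proportion and n are enough to describe a binary sample. The Python sketch below rebuilds the 1,000 coded values from the two counts in the example. It then compares the sample variance computed from the raw 0/1 values with the value implied by $\hat{p}$ and n alone, using the identity $s^2 = \frac{n}{n-1}\,\hat{p}(1-\hat{p})$ for 0/1 data. That identity is a standard result rather than something stated on the slides, and the variable names are assumptions.

```python
import math
import statistics

n = 1000            # sample size from the example
n_responders = 206  # observations with x_i = 1

# Rebuild the coded data: 206 ones followed by 794 zeros.
x = [1] * n_responders + [0] * (n - n_responders)

# p-hat as the sample mean of the 0/1 values.
p_hat = sum(x) / n
print(f"p-hat = {p_hat:.3f}")  # p-hat = 0.206

# Sample variance computed directly from the raw data ...
s2_from_data = statistics.variance(x)

# ... and the same quantity recovered from p-hat and n only.
s2_from_p_hat = n / (n - 1) * p_hat * (1 - p_hat)

print(math.isclose(s2_from_data, s2_from_p_hat))  # True
```

Since the order of the observations carries no information here, knowing n and $\hat{p}$ is enough to rebuild the full list of 0s and 1s, and therefore any summary computed from it.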