Statistics & Methodology
Statistical Inference - answers Using sample data to draw conclusions about a population, e.g. estimating a parameter or testing a null hypothesis about it
Quantitative vs. Qualitative Data - answers Numerical vs. non-numerical data
Observational vs. Experimental Studies - answers Studying subjects as they are vs.
imposing treatment on subjects
Random Sampling - answers Selecting a sample from a population where each member
has an equal chance of being chosen
Clinical Trials - answers Research studies that test how well new medical approaches
work in people
Normal Distribution - answers A bell-shaped symmetrical curve representing the
distribution of many traits
Measures of Central Tendency - answers Statistics that describe the center of a data
set (mean, median, mode)
Measures of Variation - answers Statistics that describe the spread of a data set (range,
variance, standard deviation)
Hypothesis Testing - answers A method for testing a claim or hypothesis about a
parameter in a population
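Central Limit Theorem - answers States that, for sufficiently large samples, the distribution of sample means approximates a normal distribution, whatever the shape of the population distribution (provided its variance is finite)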
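A small simulation can illustrate the theorem. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1, samples of size 30 and 5,000 repetitions (all arbitrary choices). The sample means should cluster around 1 with a spread close to 1/√30 ≈ 0.18, even though the population itself is far from normal.

```python
import math
import random
import statistics


def simulate_sample_means(n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw num_samples samples of size n from an Exp(1) population and return their means."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


means = simulate_sample_means(n=30, num_samples=5000)
print(f"mean of sample means: {statistics.fmean(means):.3f}")  # close to the population mean, 1
print(f"sd of sample means:   {statistics.stdev(means):.3f}")  # close to 1/sqrt(30)
print(f"theoretical SE:       {1 / math.sqrt(30):.3f}")
```

A histogram of `means` would look roughly bell-shaped, whereas a histogram of the raw exponential draws would not.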
Standard Error - answers The standard deviation of the sampling distribution of a
statistic
Confidence Interval - answers A range of values computed from sample data by a procedure that, over repeated sampling, contains the true value of the parameter with a specified probability (the confidence level, e.g. 95%)
Data Presentation/Graph Plotting - answers Displaying data in a visual form to
communicate information effectively
Type I Error - answers Rejecting the null hypothesis when it is true; its probability is the significance level α
Type II Error - answers Failing to reject the null hypothesis when it is false; its probability is β
Power of a Test - answers Probability of rejecting the null hypothesis when it is indeed false; equal to 1 - β
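All three quantities can be computed exactly for a simple test. In this hypothetical example, a coin is tossed 10 times. H0 says the coin is fair (p = 0.5), and we reject H0 if we see 9 or more heads. α is the probability of landing in that rejection region when H0 is true. Power is the probability of the same region under an assumed true p of 0.8, and β = 1 - power. The helper name `prob_at_least` is my own.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, cutoff = 10, 9  # hypothetical test: reject H0 if 9 or more heads in 10 tosses

alpha = prob_at_least(cutoff, n, 0.5)  # Type I error rate: rejecting although p really is 0.5
power = prob_at_least(cutoff, n, 0.8)  # power if the true p is 0.8 (an assumed alternative)
beta = 1 - power                       # Type II error rate at that alternative

print(f"alpha = {alpha:.4f}, power = {power:.4f}, beta = {beta:.4f}")
```

Widening the rejection region (e.g. rejecting at 8 or more heads) raises the power, but it also raises α. This is the usual trade-off between the two error types.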
Probability - answers A number between 0 and 1 expressing how likely an event is to occur; it quantifies uncertainty
Null Hypothesis - answers Hypothesis that there is no effect or no difference between specified populations
Binomial Distribution Function - answers Calculates the probability of a certain number
of successes in a fixed number of independent Bernoulli trials
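The binomial probability mass function is P(X = k) = C(n, k) p^k (1 - p)^(n - k). A direct sketch using only the standard library follows; the function name `binomial_pmf` is my own.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 3 heads in 5 tosses of a fair coin: C(5, 3) / 2**5 = 10/32
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```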
Chi Square Test - answers A statistical test to determine if there is a significant
difference between the expected and observed frequencies
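For a goodness-of-fit test, the statistic is χ² = Σ (O - E)² / E, summed over the categories. The sketch below uses hypothetical die-roll counts. The critical value 11.070 (df = 5, α = 0.05) is taken from a standard chi-square table rather than computed, because the standard library has no chi-square distribution.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times; a fair die expects 10 of each face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1  # goodness-of-fit degrees of freedom
CRITICAL_5PCT = 11.070  # table value for df = 5 at alpha = 0.05
print(f"chi-square = {stat:.2f}, df = {df}")
print("reject H0" if stat > CRITICAL_5PCT else "fail to reject H0")
```

Here χ² = 4.2 falls below 11.070, so these counts give no evidence that the die is unfair.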
Probability Distribution - answers A function that describes the likelihood of obtaining
the possible values that a random variable can take
Prognostics - answers Predictive factors indicating the likely outcome of a disease
Statistical Significance - answers A result is statistically significant when it would be unlikely to occur by random chance alone if the null hypothesis were true, i.e. its p-value is below the chosen significance level α
Interpreting Results - answers Carefully analyze and explain findings
Accepting Uncertainty - answers Acknowledging and embracing lack of absolute
certainty
P Value - answers Probability, assuming the null hypothesis is true, of obtaining test results at least as extreme as the ones observed
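The sketch below applies the p-value definition to a z-test. It assumes the test statistic follows a standard normal distribution under H0; the observed value z = 2.1 is hypothetical. "At least as extreme" is read two-sided, meaning |Z| ≥ |z|.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z: results at least as extreme in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(2.1), 3))   # 0.036
```

At α = 0.05, z = 2.1 would count as statistically significant, since its p-value is below α.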
Alternative Hypothesis - answers The hypothesis that contradicts the null hypothesis, e.g. that there is a difference or an effect
Test Statistic - answers Numerical summary of sample data used in hypothesis testing
Sampling Distribution - answers Distribution of a sample statistic, like the sample mean,
across many samples
Standard Error of the Mean - answers Standard deviation of the sampling distribution of the sample mean, estimated as s/√n
Confidence Intervals - answers Range within which the true population parameter is
likely to fall
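T-Distribution - answers Similar to the normal distribution but with heavier tails; used when the standard error is estimated from the sample, and approaches the normal distribution as the degrees of freedom increase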
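The standard error, the t-distribution and the confidence interval can be combined on the ten temperature values used later in these cards. This is a sketch of a 95% t-interval. The critical value 2.262 (df = 9) is hardcoded from a t table, because the standard library has no t quantile function; with SciPy available, `scipy.stats.t.ppf(0.975, 9)` gives the same value.

```python
import math
import statistics

data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

n = len(data)
mean = statistics.fmean(data)
se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean: s / sqrt(n)

T_CRIT = 2.262  # two-sided 95% t critical value for df = n - 1 = 9 (from a t table)
low, high = mean - T_CRIT * se, mean + T_CRIT * se
print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval comes out at roughly 17.3 to 29.0. A z-based interval using 1.96 would be somewhat narrower, which understates the uncertainty for a sample this small.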
Degrees of Freedom - answers Number of values in the final calculation of a statistic that are free to vary
what is a statistic - answers summary information of sample data
what is a parameter - answers summary information of population data
what are the 3 types of summary statistics - answers measures of central tendency,
measures of spread and shape information
what is central tendency (with example) - answers measures the centre point of the data, calculated to find a typical value in the data, e.g. mean, median, mode (the mode is the most frequent outcome)
what is the mean of this data set: 35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6 - answers 23.16 (sum of 231.6 divided by 10 values)
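The arithmetic can be checked directly. This sketch computes the mean by hand and again with the standard library's `statistics.mean`.

```python
import statistics

data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

total = sum(data)         # 231.6, up to floating-point rounding
mean = total / len(data)  # arithmetic mean: sum divided by the number of values
print(round(mean, 2))     # 23.16
print(round(statistics.mean(data), 2))  # standard-library equivalent, also 23.16
```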
Why is the mean not a great measure of the centre - answers it is highly affected by extreme values (outliers); because every value contributes to it, changing any value changes the mean
what is the median - answers the middle value of the sorted data, which divides it into two equal halves (for an even number of values, the mean of the two middle values)
what is the mode - answers the value(s) that appears the most times in the data set
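A sketch of the median and mode using the standard library on the same temperature data. None of the ten temperatures repeats, so the mode example switches to a small hypothetical list. The last lines show that a single extreme value (a hypothetical 100.0) pulls the mean much further than the median.

```python
import statistics

data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

# Median: sort, then take the middle value (mean of the two middle values for even n).
print(round(statistics.median(data), 2))  # 25.45, i.e. (24.6 + 26.3) / 2

# Mode: every temperature occurs once, so multimode returns all ten values.
print(len(statistics.multimode(data)))  # 10
print(statistics.multimode([2, 3, 3, 5, 7, 7]))  # [3, 7], a bimodal example

# Robustness: one extreme value moves the mean far more than the median.
with_outlier = data + [100.0]
print(round(statistics.fmean(with_outlier), 2), statistics.median(with_outlier))
```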
what is another term for measures of spread - answers statistical dispersion
what does statistical dispersion tell us - answers how spread out the data is
what are the key measures of spread - answers range, interquartile range, and standard
deviation / variance
how is range calculated - answers find the largest and smallest values in the data and
calculate the difference between them
what is the interquartile range - answers the difference between the third and first quartiles (Q3 - Q1) when the sorted data is split into four equal parts; it spans the middle 50% of the data
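A sketch of the range and the interquartile range for the temperature data. It uses `statistics.quantiles` with its default "exclusive" method. Other software may use a different quartile convention, so Q1, Q3 and the IQR can differ slightly between tools.

```python
import statistics

data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

data_range = max(data) - min(data)            # largest value minus smallest value
q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1                                 # spread of the middle 50% of the data

print(f"range = {data_range:.1f}")
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```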
what is variance - answers the mean squared distance of the data values from the mean, i.e. a measure of how far the data points lie from the mean
using the same dataset, where the mean is 23.16, what is the sum of squared differences from the mean for this data set: 35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6 - answers 601.3
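A sketch of the sum of squared differences (SSD) calculation: subtract the mean from each value, square the result, and add the squares up.

```python
data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]  # squared distance of each value from the mean
ssd = sum(squared_diffs)
print(round(ssd, 1))  # 601.3
```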
what are the two different types of variance - answers sample and population
how do you calculate population variance - answers sum of squared differences divided
by the population size
if the sum of squared differences of our dataset is 601.3, what is the population variance - answers 60.1 (601.3 divided by 10)
how do you calculate sample variance - answers sum of squared differences divided by
the sample size minus one
what would be the sample variance for our data set (ssd = 601.3) - answers 66.8 (601.3 divided by 9)
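what is the difference between sample and population variance - answers population variance describes a whole population and needs all of its data, whereas sample variance estimates it from a subset; dividing by n - 1 instead of n corrects for the bias of measuring distances from the sample mean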
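The standard library exposes both versions of the variance: `statistics.pvariance` divides the SSD by N, and `statistics.variance` divides it by n - 1. This sketch applies both to the temperature data.

```python
import statistics

data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

pop_var = statistics.pvariance(data)    # SSD / N: treats the data as the whole population
sample_var = statistics.variance(data)  # SSD / (n - 1): treats the data as a sample
print(round(pop_var, 1), round(sample_var, 1))  # 60.1 66.8
```

The sample variance is always the larger of the two (for n > 1), because it divides the same SSD by a smaller number.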
how do you calculate standard deviation - answers the square root of the variance
if the sample variance was 66.8, what would the sample st. deviation be - answers 8.2 degrees C (√66.8 ≈ 8.17)
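The standard deviation is the square root of the variance, so it is in the same units as the data (degrees C here). A sketch using the standard library's sample and population versions:

```python
import statistics

data = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]

s = statistics.stdev(data)       # sample SD = sqrt(sample variance)
sigma = statistics.pstdev(data)  # population SD = sqrt(population variance)
print(round(s, 2), round(sigma, 2))  # 8.17 7.75
```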
what is shape information - answers gives a numerical way to describe the shape of the
data, often used for comparison of data
what is skew/skewness - answers a measure of how asymmetrical the data is around its mean (zero for a perfectly symmetrical distribution)
what does positive skew mean - answers the tail of a distribution curve is longer on the
right side - skewed to the right
what does negative skew mean - answers the tail of a distribution curve is longer on the
left side - skewed to the left
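Several skewness formulas exist. This sketch uses the moment coefficient g1 = m3 / m2^1.5, where m_k is the average of (x - mean)^k, with no small-sample correction. Libraries may apply a bias adjustment and report slightly different values. The function names are my own.

```python
import statistics


def central_moment(data: list[float], k: int) -> float:
    """k-th central moment: the average of (x - mean)**k."""
    mean = statistics.fmean(data)
    return statistics.fmean((x - mean) ** k for x in data)


def skewness(data: list[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5 (no bias correction)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


temps = [35.8, 30.5, 15.1, 14.8, 28.2, 26.9, 26.3, 20.1, 9.3, 24.6]
print(round(skewness(temps), 2))       # -0.25: slightly longer tail on the left
print(skewness([1, 1, 1, 2, 10]) > 0)  # True: one large value gives a long right tail
```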
what is kurtosis - answers a measure of the tailedness of the data, often described as its peakedness
what does a lower kurtosis mean - answers a flatter distribution with lighter tails and a less distinct peak
what does a higher kurtosis mean - answers a sharper, more distinct peak with heavier tails (more extreme values)
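A sketch of moment-based excess kurtosis, m4 / m2² - 3. Subtracting 3 makes a normal distribution score about 0; some tools report kurtosis without that subtraction. The two small lists are hypothetical and chosen to contrast a flat shape with a peaked, heavy-tailed one.

```python
import statistics


def excess_kurtosis(data: list[float]) -> float:
    """Moment-based excess kurtosis: m4 / m2**2 - 3, so a normal distribution scores about 0."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean((x - mean) ** 2 for x in data)
    m4 = statistics.fmean((x - mean) ** 4 for x in data)
    return m4 / m2**2 - 3


flat = [1, 2, 3, 4, 5]      # evenly spread, no distinct peak
peaked = [0] * 8 + [-5, 5]  # tall central peak plus two extreme values
print(round(excess_kurtosis(flat), 2))    # -1.3
print(round(excess_kurtosis(peaked), 2))  # 2.0
```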
what are common issues in data sets, which may mean we want to use data cleaning - answers missing data, outliers, impossible values, incorrect formats, wrong variable types
what is the goal of data cleaning/ what do we want to do - answers identify all
errors/issues in the data, fix/address these issues
what are some common solutions - answers delete rows, remove variables, replace,
work around it
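The sketch below applies two of these solutions to a hypothetical list of temperature readings in °C. The readings contain missing values, text in the wrong format, a decimal comma and an impossible value (colder than absolute zero). Each reading is parsed into a number or `None`. The `None` entries are then either deleted or replaced with the median of the valid readings. The function name `parse_reading`, the comma-as-decimal rule and the median fill are all illustrative choices, not the only reasonable ones.

```python
import math
import statistics

ABSOLUTE_ZERO_C = -273.15  # anything colder than this is physically impossible


def parse_reading(value):
    """Convert a raw reading to a float; return None if it is missing, unparseable or impossible."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.strip().replace(",", ".")  # assumption: a comma is a decimal separator
        if value == "":
            return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None  # wrong format, e.g. "n/a"
    if not math.isfinite(number) or number < ABSOLUTE_ZERO_C:
        return None  # impossible value, treated as missing
    return number


raw = [35.8, "30.5", None, "14,8", -999.0, "n/a", 26.3]  # hypothetical messy input
parsed = [parse_reading(v) for v in raw]

# Solution 1: delete the rows with missing or invalid values.
dropped = [v for v in parsed if v is not None]

# Solution 2: replace them with the median of the valid values.
fill = statistics.median(dropped)
imputed = [fill if v is None else v for v in parsed]

print(dropped)
print(imputed)
```

Deleting rows keeps only real observations but shrinks the data set. Filling with the median keeps every row, but it makes the data look less spread out than it really is.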