Answers
How do you summarise categorical data (1 and 2 variables)?
1 variable: frequencies, proportions or percentages
2 variables: contingency table
One sample chi squared test (goodness of fit test): statistical test for categorical variables
compares data to an expected distribution/norm
test statistic is always positive, so only the right tail of the distribution is considered
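A minimal sketch of the goodness of fit statistic, assuming the usual form (sum of (observed - expected)^2 / expected) and df = number of categories - 1. The counts and the equal expected proportions are made up for illustration, and `chi_squared_gof` is a hypothetical helper name.

```python
def chi_squared_gof(observed, expected_props):
    """Return (statistic, df) for a one-sample chi squared test."""
    total = sum(observed)
    # Expected count in each category under the hypothesised distribution.
    expected = [total * p for p in expected_props]
    # Each term is squared, so the statistic can never be negative.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical counts in three categories, expected to be equally likely.
observed = [18, 22, 20]
statistic, df = chi_squared_gof(observed, [1 / 3, 1 / 3, 1 / 3])
print(round(statistic, 3), df)
```

A larger statistic means the data sit further from the expected distribution, which is why only the right tail matters.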
One sample chi squared test (goodness of fit test) assumptions: observations are independent
Chi squared test of independence (Pearson's): tests for associations between two categorical variables
expected frequencies = (column total x row total)/overall total
Test statistic is the same formula as the goodness of fit test
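A rough sketch of the expected frequencies and the test statistic for a contingency table. The 2x2 counts are invented, the function names are hypothetical, and the closed-form p-value via `math.erfc` is valid only when df = 1.

```python
import math


def expected_counts(table):
    """Expected frequency = (row total x column total) / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared_independence(table):
    """Return (statistic, df) for Pearson's chi squared test of independence."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2x2 table: rows are groups, columns are outcomes.
table = [[10, 20], [30, 40]]
statistic, df = chi_squared_independence(table)

# Right-tail p-value; this shortcut applies to df = 1 only.
p_value = math.erfc(math.sqrt(statistic / 2))
print(expected_counts(table), round(statistic, 4), df, round(p_value, 3))
```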
Chi squared test of independence assumptions: observational units are independent
expected cell counts should be >5
Options for assessing associations for categorical variables: chi squared test of independence
difference in proportions of specified outcome
relative risk (RR)
odds ratio
If there is no difference between groups, RR and OR should be... 1
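To make the three measures concrete, here is a small sketch for a 2x2 table. The cell layout (a, b = outcome yes/no in group 1; c, d = outcome yes/no in group 2), the counts and the function name are assumptions for illustration.

```python
def two_by_two_measures(a, b, c, d):
    """a, b: outcome yes/no in group 1; c, d: outcome yes/no in group 2."""
    p1 = a / (a + b)  # proportion with the outcome in group 1
    p2 = c / (c + d)  # proportion with the outcome in group 2
    return {
        "difference_in_proportions": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (a * d) / (b * c),
    }


# Identical groups: difference is 0, RR and OR are both 1.
same = two_by_two_measures(20, 80, 20, 80)

# Group 1 has twice the risk of group 2.
diff = two_by_two_measures(30, 70, 15, 85)
print(same)
print(diff)
```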
McNemar's test of association
- used for 2x2 tables
- can be used for repeated (paired) measurements on the same variable
- the variable must be dichotomous
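A minimal sketch of McNemar's statistic, assuming the standard form without continuity correction: only the discordant pairs (those that changed category between the two measurements) enter the calculation, and the result is compared with a chi squared distribution with df = 1. The counts and the function name are hypothetical.

```python
import math


def mcnemar_statistic(b, c):
    """b: pairs that went yes -> no; c: pairs that went no -> yes."""
    return (b - c) ** 2 / (b + c)


# Hypothetical paired data: 15 pairs changed one way, 5 the other way.
b, c = 15, 5
statistic = mcnemar_statistic(b, c)

# Right-tail p-value for a chi squared distribution with df = 1.
p_value = math.erfc(math.sqrt(statistic / 2))
print(statistic, round(p_value, 4))
```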
Fisher's exact test
- test of association for 2x2 tables
- P value is exact, unlike Pearson's, which is based on the chi squared distribution
- can be used for small sample sizes
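One way to see why the P value is exact: with the row and column totals held fixed, the probability of each possible table comes from the hypergeometric distribution, so no chi squared approximation is needed. The sketch below assumes the common two-sided rule (sum the probabilities of all tables no more likely than the observed one); the function name and counts are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating point ties.
    return sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical small sample: 8 observations in total.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```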
Fisher's exact test assumptions: observations are independent
df equation: (number of rows - 1) x (number of columns - 1)
Test statistic (T): the discrepancy between the data and what is expected under the null hypothesis
= (observed - expected)/precision
What test to use for 2 categorical variables? Chi squared test of independence
How could you improve the results from a chi squared test?
- decide on groups before analysis
- don't shift cut points
- expected frequency more than 5
- need sampling independence
- only two variables
One sample t-test: parametric test
used for continuous data
comparing one mean to an external population (fixed value)
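A short sketch of the one sample t statistic, assuming the usual form t = (sample mean - fixed value) / standard error, with df = n - 1. The measurements and the fixed value of 5.0 are invented for illustration.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) comparing the sample mean with the fixed value mu0."""
    n = len(data)
    mean = statistics.mean(data)
    # Standard error of the mean: sample SD divided by sqrt(n).
    se = statistics.stdev(data) / math.sqrt(n)
    return (mean - mu0) / se, n - 1


# Hypothetical continuous measurements compared with a fixed value of 5.0.
data = [5.1, 4.9, 5.3, 5.0, 5.2]
t, df = one_sample_t(data, 5.0)
print(round(t, 3), df)
```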
Continuity correction: modification of Pearson's chi squared (corrects for the small number of cells in the table)
alternative to Fisher's exact test if expected cell counts are >5.
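A sketch of the continuity correction for a 2x2 table, assuming the form commonly known as Yates's correction: 0.5 is subtracted from each |observed - expected| before squaring, which makes the statistic slightly smaller. The table and function name are hypothetical.

```python
def yates_chi_squared(table):
    """Continuity-corrected chi squared statistic for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            # Shrink each discrepancy by 0.5, but never below zero.
            gap = max(0.0, abs(observed - expected) - 0.5)
            statistic += gap ** 2 / expected
    return statistic


# Hypothetical 2x2 table; the uncorrected statistic here is about 0.794.
table = [[10, 20], [30, 40]]
corrected = yates_chi_squared(table)
print(round(corrected, 4))
```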
Continuity correction assumptions: observations are independent
Linear-by-linear association: both variables are ordinal with at least 3 categories
modification of Pearson's chi squared that recognises that categories have an order.
not as sensitive to small sample sizes compared to chi square
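A hedged sketch of how the ordering can be used, assuming the common formulation M^2 = (N - 1) x r^2, where r is the correlation between row and column scores and M^2 is compared with a chi squared distribution with df = 1. The equally spaced scores 1, 2, 3, ... are an assumption, as are the tables and the function name.

```python
import math


def linear_by_linear(table):
    """Return M^2 = (N - 1) * r^2 using equally spaced category scores."""
    row_scores = range(1, len(table) + 1)
    col_scores = range(1, len(table[0]) + 1)
    columns = list(zip(*table))
    n = sum(sum(row) for row in table)

    mean_r = sum(u * sum(row) for u, row in zip(row_scores, table)) / n
    mean_c = sum(v * sum(col) for v, col in zip(col_scores, columns)) / n

    cov = sum(
        count * (u - mean_r) * (v - mean_c)
        for u, row in zip(row_scores, table)
        for v, count in zip(col_scores, row)
    )
    var_r = sum(sum(row) * (u - mean_r) ** 2 for u, row in zip(row_scores, table))
    var_c = sum(sum(col) * (v - mean_c) ** 2 for v, col in zip(col_scores, columns))

    r = cov / math.sqrt(var_r * var_c)
    return (n - 1) * r ** 2


# Hypothetical 3x3 tables: a perfect trend and no trend at all.
trend = linear_by_linear([[5, 0, 0], [0, 5, 0], [0, 0, 5]])
no_trend = linear_by_linear([[10, 10, 10], [10, 10, 10], [10, 10, 10]])
print(trend, no_trend)
```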
Linear by linear association assumptions: observations are independent
Parametric tests assume ... the data follows a known distribution
Why are distributions useful? Can use the theoretical distribution in place of the data distribution
One sample t test assumptions: data is from independent observations.
normally distributed OR sample size >30
What does the test statistic tell us? The value of the test statistic tells us that the sample mean is 'x' SEs away from the hypothesised mean
What are degrees of freedom? They indicate how many of the data points are free to vary
e.g. if the mean of 3 numbers is known, once 2 numbers are provided, the third can be calculated
T test: the t-test is a parametric test that assumes the test statistic follows the t-distribution.
The t-distribution is symmetric like a normal distribution but has fatter tails (which depend on degrees of freedom)
Comparing 2 means: independent samples t-test (or two sample t-test)
USED WHEN
- dependent variable is continuous
- independent variable is categorical
Assumptions for two-sample t-test
1. normally distributed OR sample size >30.
if violated, consider Mann-Whitney test
2. observations are independent
3. variances in the two groups are the same
- the test is for differences in means, not variability. If the variances differ, a modified t-test can be used
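A sketch contrasting the standard two-sample t statistic (pooled variance, which relies on assumption 3) with a modified version. Welch's t-test is assumed here as the modified t-test; the notes do not name one. The two groups are invented data.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances. Returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    na, nb = len(a), len(b)
    sa = statistics.variance(a) / na  # squared SE of the mean of group a
    sb = statistics.variance(b) / nb  # squared SE of the mean of group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Hypothetical groups with clearly different spread.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(round(t_pooled, 3), df_pooled, round(t_welch, 3), round(df_welch, 2))
```

With equal group sizes the two statistics coincide, but Welch's version uses fewer degrees of freedom, so it is more conservative when the variances differ.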
What do you use if you have unequal variances? Use Levene's test to check whether the variances are equal; if they are not, use the modified t-test