Statistics Lecture Notes
Lecture week 1: Introduction to Statistics
The research process
The empirical cycle
o Initial observation (can be anything)
o General claim via induction (general principle)
o Testable hypothesis via deduction (if…then…)
o Collecting empirical data as a test
o Evaluation: do the data support the hypothesis? (statistics)
Role of empirical data and statistics
Statistics are used to evaluate the empirical data as a test of the hypothesis, assuming that the data are relevant, properly collected, adequately coded or scored, etc.
Formulating hypotheses
Hypothesis: the best prediction of, or a tentative solution to, a problem. It can come from the literature, from theory, etc.
A hypothesis should be testable: it must be possible to either refute or confirm it. (It is the hypothesis that is being tested, not the problem.)
Theory, hypotheses and falsification
o Scientific method
o Falsifiable claims (problematic for some theories, for example psychoanalysis, …)
o Replicability
Verify or falsify?
Karl Popper (1902–1994)
One falsification is stronger than an infinite number of verifications
Why?
The logic of falsification
All swans are white: what would you need to observe to verify this statement, and what to falsify it?
o Proving a general hypothesis through verification is impossible, because you can never know what you have not (yet) observed.
o It is possible to prove a hypothesis wrong (falsification): a single black swan is enough to refute ‘all swans are white’.
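The asymmetry between verifying and falsifying can be sketched in Python, assuming a universal claim is checked against a finite list of observations. The function names `check_universal_claim` and `is_white` and the swan data are hypothetical illustrations, not part of the lecture. Any number of confirming cases only leaves the claim "not falsified (yet)", while one counterexample settles it.

```python
def check_universal_claim(observations, predicate):
    """Test the claim 'every observation satisfies predicate' (hypothetical helper).

    One counterexample is enough to falsify the claim. If none is found, the
    claim is only 'not falsified (yet)': unobserved cases could still refute it.
    """
    for obs in observations:
        if not predicate(obs):
            return "falsified"
    return "not falsified (yet)"


def is_white(swan_colour):
    """Predicate for the claim 'all swans are white'."""
    return swan_colour == "white"


# Invented observations: 1000 white swans do not prove the claim...
observed = ["white"] * 1000
print(check_universal_claim(observed, is_white))

# ...but a single black swan refutes it.
observed.append("black")
print(check_universal_claim(observed, is_white))
```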
Goal of research
o Research describes and tries to understand/explain variability
o In linguistic and communication research:
- Individual differences
- Differences between conditions/settings
o Most important: variability (variance, variation)
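Because variability is the central notion here, a minimal Python sketch of how it is usually quantified may help: the sample variance is the sum of squared deviations from the mean divided by n − 1, and the standard deviation is its square root. The proficiency scores below are invented for illustration; the manual calculation is checked against Python's standard `statistics.variance`.

```python
import statistics

# Hypothetical proficiency scores for eight learners (invented data).
scores = [12, 15, 9, 20, 14, 11, 17, 13]

n = len(scores)
mean = sum(scores) / n

# Sample variance: squared deviations from the mean, summed and divided by n - 1.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation: square root of the variance, in the original score units.
sd = variance ** 0.5

print(f"mean = {mean:.3f}, variance = {variance:.3f}, sd = {sd:.3f}")
print("statistics.variance:", statistics.variance(scores))
```

Individual differences show up as variance within one group of learners; differences between conditions show up as differences between group means relative to that variance.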
Concepts and constructs
Constructs: ‘theoretical’ variables (as opposed to constants)
Operationalizing ‘theoretical’ variables (e.g., language proficiency, attitude towards ads) makes them measurable
Variables: their roles (research design) and their measurement (psychometrics)
Variable roles (research design)
The core variable that shows the result/outcome: the dependent variable (DV)
Other variables that are controlled, mediated or manipulated: the independent variable (IV)
Research approaches
o Experimental vs. Descriptive research
Experimental research attempts to identify cause-and-effect relationships by conducting controlled experiments
Descriptive research focuses on describing or portraying some phenomenon, event or
situation
o Quantitative vs. Qualitative research
Quantitative research collects numerical data
Qualitative research collects nonnumerical data (e.g., pictures, statements, written
records)
Research Design
The structure of the research/data collection depends on the kind of research question.
For example:
o Observation: surveys of proficiency levels, opinions, attitudes, correlations between characteristics, etc.
o Experimentation: special tasks or conditions, treatments, etc., with different degrees of ‘control’ and randomization. The researcher manipulates the conditions.
Example: Research design
Advertisements. Conditions: exposed (Group 1) and not exposed (Group 2). Measurement: change in attitude, pre-test compared with post-test.
This is an experiment because the researcher controls the conditions (exposure to the advertisements).
Dependent variable: attitude
Independent variable: group
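The design above can be sketched in Python as change scores per group, assuming attitude is measured as a single number before and after the exposure phase. All scores are invented for illustration, and `change_scores` is a hypothetical helper. The group (exposed vs. not exposed) is the independent variable; the change in attitude (post-test minus pre-test) is the dependent variable.

```python
from statistics import mean

# Invented attitude scores (e.g., on a 1-7 scale), one (pre_test, post_test)
# pair per participant.
exposed = [(4, 6), (3, 5), (5, 6), (4, 5)]        # Group 1: saw the advertisements
not_exposed = [(4, 4), (3, 4), (5, 5), (4, 4)]    # Group 2: did not see them


def change_scores(group):
    """Dependent variable: change in attitude, post-test minus pre-test."""
    return [post - pre for pre, post in group]


# Compare the mean change between the two levels of the independent variable.
for label, group in [("exposed", exposed), ("not exposed", not_exposed)]:
    print(f"{label}: mean change = {mean(change_scores(group))}")
```

Whether such a difference is larger than could be expected from random variation alone is the question that inferential statistics addresses.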
Research designs
Development: longitudinal vs. cross-sectional
Comparing conditions: between subjects (groups) vs. within subjects (repeated measures)
Important:
- Control: group, conditions, items
- Randomization and counter-balancing
- Researcher manipulation
- Avoiding confounds
- Systematic vs. unsystematic (random) variation
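Randomization and counterbalancing can be illustrated with a short Python sketch. It assumes two between-subjects groups and two within-subjects conditions labelled A and B; the participant IDs, group names and helper functions are hypothetical. Random assignment spreads unsystematic differences between participants evenly over the groups, and full counterbalancing makes sure each order of conditions is used equally often, so order effects cannot be confounded with the conditions.

```python
import random
from itertools import permutations


def randomly_assign(participants, groups=("exposed", "not exposed"), seed=None):
    """Shuffle the participants, then deal them out over the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: groups[i % len(groups)] for i, p in enumerate(shuffled)}


def counterbalanced_orders(conditions):
    """Full counterbalancing: every possible order in which conditions can be presented."""
    return list(permutations(conditions))


# Hypothetical participant IDs.
participants = [f"P{i}" for i in range(1, 9)]

# Between subjects: each participant ends up in exactly one group (4 + 4 here).
assignment = randomly_assign(participants, seed=1)

# Within subjects: everyone does both conditions; the orders (A, B) and (B, A)
# are each used by half of the participants.
orders = counterbalanced_orders(["A", "B"])
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for p in participants:
    print(p, assignment[p], schedule[p])
```

Full counterbalancing grows quickly (k conditions give k! orders), so larger designs often use a subset of orders instead, such as a Latin square.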
Cause and effect
Causality is a design issue, not a statistical issue
Think about:
Why can’t you claim cause and effect from just the relationship (a correlation) between two variables?
Suppose you find that kids who watch more soap shows also know more English words. Does this imply that watching soap shows causes their English vocabulary to grow?
Correlation does not (necessarily) imply causation, but if one variable causes changes in the other, there will almost certainly also be a correlation (e.g., smoking and lung cancer).
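One way a correlation can arise without any causal link between two variables is a common cause. The Python sketch below simulates this for the soap-show example under a loudly stated assumption: a hypothetical third variable Z (for instance age) drives both soap watching and vocabulary size, while neither has any effect on the other. The data are simulated, not real, and `pearson_r` is a small helper written for the example.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(42)
n = 1000

# Hypothetical confounder Z (e.g., age), standard normal.
z = [rng.gauss(0, 1) for _ in range(n)]

# Both variables depend on Z plus independent noise; in this simulation
# soap watching has NO causal effect on vocabulary, or vice versa.
soap_hours = [zi + rng.gauss(0, 1) for zi in z]
vocabulary = [zi + rng.gauss(0, 1) for zi in z]

# Raw correlation: clearly positive (about 0.5 in theory) despite no causal link.
r_raw = pearson_r(soap_hours, vocabulary)

# Removing Z's contribution (known here to have weight 1) leaves only the
# independent noise terms, whose correlation is close to zero.
r_controlled = pearson_r(
    [s - zi for s, zi in zip(soap_hours, z)],
    [v - zi for v, zi in zip(vocabulary, z)],
)

print(f"r(soap, vocabulary) = {r_raw:.2f}")
print(f"r after removing Z  = {r_controlled:.2f}")
```

In real data the confounder and its weight are usually unknown, which is why causality has to be addressed through the design (e.g., random assignment) rather than by statistics alone.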
Causality
Changes in variable A cause changes in variable B.
Three required conditions for causal relationships:
Condition 1: Variables A and B must be associated or related (relationship condition)
Condition 2: Changes in variable A must precede changes in variable B (temporal order condition)
Condition 3: No plausible alternative explanations exist for the relationship between variables A and B (no-alternative-explanation condition)
The role of statistics (twofold)
Quantitative research
1. Data description
2. Inference from sample to population
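The two roles of statistics can be contrasted in a minimal Python sketch, assuming a small sample of invented test scores. Description summarizes the sample itself (mean, standard deviation); inference uses the sample to say something about the population mean, here via the standard error and an approximate 95% interval. The normal critical value 1.96 is a simplifying assumption: with n = 10, a t critical value (about 2.26 for 9 degrees of freedom) would be more appropriate.

```python
import math
import statistics

# Invented sample of test scores (not real data).
sample = [68, 74, 71, 80, 65, 77, 72, 69, 75, 79]
n = len(sample)

# 1. Data description: summarize the sample itself.
m = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# 2. Inference: estimate the population mean from the sample.
se = s / math.sqrt(n)  # standard error of the mean
# Approximate 95% interval using the normal critical value 1.96
# (a simplification; a t critical value is preferable for small n).
low, high = m - 1.96 * se, m + 1.96 * se

print(f"description: mean = {m}, sd = {s:.2f}")
print(f"inference:   95% interval for the population mean = [{low:.2f}, {high:.2f}]")
```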