Research methods in clinical neuropsychology
The goal: to acquire crucial methodological skills for a career in the field of clinical neuropsychology –
for both research and clinical work.
Introductory test
You believe that prospective memory of patients with dementia is compromised. You perform a study
in which you compare several aspects of prospective memory between a group of patients with
dementia and a healthy comparison group. Name two advantages of effect sizes over significance
tests:
- Effect sizes show not only whether a problem exists but also how severe it is (its impact:
how severely are people compromised in daily life?). So, ES helps to compare significant
effects with each other.
- The number of people is important. Significance testing largely depends on sample size,
effect size far less. If the sample size determines whether a result is significant or not,
this is a problem when we want to know whether the effect is present. With a very small
sample, we need a very large effect size to reach a significant difference. Significance is
therefore not an efficient criterion, and effect size is especially important for smaller
samples. In short, significance depends on sample size, the magnitude of the ES does not.
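The second point can be illustrated with a minimal Python sketch. It assumes two equally sized groups and uses a normal approximation to the t-test, so the p-values are only rough. The function names `cohens_d` and `approx_p_value` and the example value d = 0.5 are illustrative assumptions, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def approx_p_value(d, n_per_group):
    """Rough two-sided p-value for effect size d with n people per group.

    Assumption: normal approximation to the two-sample t-test.
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same effect size (d = 0.5) with growing samples:
# the p-value shrinks although the effect itself does not change.
for n in (10, 30, 100):
    print(n, round(approx_p_value(0.5, n), 4))
```

With these assumptions, the same medium effect is not significant with 10 people per group but clearly significant with 100 per group.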
A colleague is wondering whether your study is underpowered. Explain in plain language (without
technical terms) what underpowered means.
- There is a low likelihood of revealing an effect that truly exists. The effect may be there,
but the sample is too small to show it. So, given that patients with dementia really are
impaired in prospective memory, it is unlikely that you will reveal this effect in your
study.
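For readers who do want the numbers, power can be approximated in a few lines of Python. This is a sketch under stated assumptions: two equal groups, a two-sided test, and a normal approximation that ignores the negligible opposite tail. The name `approx_power` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate chance of detecting a true effect of size d.

    Assumption: normal approximation for a two-sided, two-group comparison.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    expected_z = d * sqrt(n_per_group / 2)
    return 1 - NormalDist().cdf(z_crit - expected_z)


# A true medium effect (d = 0.5) studied with small and larger groups.
for n in (10, 64):
    print(n, round(approx_power(0.5, n), 2))
```

Under this approximation, a study with 10 patients and 10 controls would miss a true medium effect most of the time, which is what "underpowered" means.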
You evaluate a new treatment for cognitive impairment after acquired brain damage. Name two
different active control group designs that you could implement in your design.
- What is an active control group design? A big problem in many intervention studies is that we
want to find out whether a new intervention works, while the researcher has reasons (biases)
to want to show that it is effective. So, you need a control group, for instance a waitlist
group, which is a passive, untreated control group. This is artificial, because it is
unrealistic to give no treatment to people who have a certain condition. It is more relevant
to compare the new treatment to the usual treatment (a realistic alternative) than to an
untreated group. So, an active control group design could be a:
o Dose control group design: the control group gets the same treatment but at a lower dose.
o Dismantling design: intervention programs consist of different elements
(assessment, feedback, training, re-evaluation, etc.). These elements are called
active ingredients (together they are responsible for the treatment effect). Dismantling
means that you deliver the same program but leave out one element. You then get to
know the effect of that element.
A new test is developed for the diagnosis of MCI. You perform a study to validate this test in clinical
practice. Which statistical values would you report to describe the validity of such a test?
- Sensitivity and specificity are very important values, and more informative than just knowing
that the test gives a significant p-value between two groups. You want to know how likely it
is that the test assigns a person to the right group:
o Sensitivity: correctly detecting people with MCI
o Specificity: correctly identifying people without MCI
o Overall diagnostic accuracy (e.g., AUC)
o In short: sensitivity relates to disease, specificity to no disease
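As a minimal sketch, sensitivity and specificity can be computed from the four cells of a 2x2 table that crosses the test result with the true diagnosis. The function name `diagnostic_accuracy` and the counts below are hypothetical and only serve as illustration.

```python
def diagnostic_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from the cells of a 2x2 table."""
    sensitivity = true_pos / (true_pos + false_neg)  # share of people with MCI who are detected
    specificity = true_neg / (true_neg + false_pos)  # share of people without MCI who test negative
    return sensitivity, specificity


# Hypothetical validation sample: 50 people with MCI, 50 without.
sens, spec = diagnostic_accuracy(true_pos=40, false_neg=10, true_neg=45, false_pos=5)
print(sens, spec)  # 0.8 0.9
```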
Name two research designs that you can apply to explore the effectiveness of a treatment on one
patient (single case studies)?
- ABAB design
- Multiple baseline design
What is the Reliable Change Index?
- It indicates whether the change in a measure from one time to another (e.g., before and after
treatment) is reliable, taking into consideration the variability of scores at the first
measurement as well as the test-retest reliability.
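One common formulation (Jacobson and Truax) divides the observed change by the standard error of the difference, which is derived from the baseline standard deviation and the test-retest reliability. The sketch below assumes that formulation; the lecture may use a variant, and the function name and example values are hypothetical.

```python
from math import sqrt


def reliable_change_index(score_pre, score_post, sd_baseline, retest_reliability):
    """RCI = (post - pre) / S_diff, assuming the Jacobson-Truax formulation."""
    sem = sd_baseline * sqrt(1 - retest_reliability)  # standard error of measurement
    s_diff = sqrt(2) * sem                            # standard error of the difference
    return (score_post - score_pre) / s_diff


# Hypothetical patient: 50 before and 62 after treatment, SD = 10, r = 0.84.
rci = reliable_change_index(50, 62, sd_baseline=10, retest_reliability=0.84)
print(round(rci, 2))  # 2.12
```

By convention, an absolute RCI above 1.96 is taken to mean that the change is unlikely to reflect measurement error alone.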
How does a Delphi methodology contribute to reaching consensus?
- A Delphi methodology includes a group of experts on a topic (the expert panel) and
facilitates collective decision making by a structured and anonymous way of
communication.
Designing, implementing, analysing
1. Group designs: selection, recruitment, measurement
2. Pitfalls in significance testing
3. Effect sizes
4. Implications for power
5. Controlling for confounding factors
1 Group designs: selection, recruitment, measurement
We have a population of patients (e.g., depression, TBI, dementia, etc.). We can never assess the
entire population, so we take a sample from it. In cross-sectional studies, we compare one sample
to another sample at one point in time. Case-control studies are cross-sectional studies in which,
for each person who has a condition, you select one or more similar people who do not have the
condition. You try to match them on important factors like age or gender, so that the only major
difference between the groups is whether they have the condition. We can also follow the same
group over time to see how the sample is doing later; this is a cohort study. The cohort of
healthy controls can be followed over time as well. These are longitudinal studies. If we assess
the cohorts right now and follow them into the future, it is a prospective longitudinal study or
a prospective cohort. A retrospective longitudinal study or retrospective cohort is also
possible; then you go back in time. You assess a group of people now and then look back into the
files to see how they did in the past (e.g., school reports).
Selection, recruitment, measurement
There is a truth in the universe that we would like to know, e.g., for all people with a certain
brain disorder we would like to know how good their attention is. We run a research study in
which we assess the phenomenon with certain variables, and hopefully we assess it well. We can
never assess the whole population, so we assess a sample that is hopefully representative of the
population. In practice, circumstances make both the assessments and the participants less
perfect than you would want. We try to plan the study very well, do good assessments, and have
good experimenters and environments; this is all about internal validity. External validity
tells us how the results of the study translate to the real world, which is very important for
us. To have good external validity, we start with the best possible selection of participants.
Establishing selection criteria. We need to refine the criteria for selecting people. This is
important because we only learn about the people we select. For instance, if our study is about
the work environment, we should only select people of working age. Or if we want to learn about
the conditions of people in families, we should select people in a certain age group who have a
family. So, it is very important that we know what we want to learn about and stick to the
inclusion/exclusion criteria.
Inclusion criteria:
- Demographic characteristics
- Clinical characteristics
- Geographic characteristics
- Temporal characteristics
Exclusion criteria:
- Risk of being lost at follow up
- Inability to provide good data
- At risk for possible adverse effects
- Non-representative for population
Sampling. Once you know your inclusion and exclusion criteria, the next question is how to find
these people. Clinical studies often use a non-probability sample. Be aware that in many cases
we have a selected, biased sample rather than a randomly selected one.
Nonprobability samples
- Convenience samples
- Snowball sampling
- …
Probability samples
- Simple random samples
- Systematic random sample
- Stratified random sample (population is divided into distinct, non-overlapping subgroups
(strata) based on shared characteristics, and then a random sample is drawn from each
stratum)
- Cluster sample (a population is divided into groups, or “clusters”, and a random sample of
these clusters is chosen).
These probability samples all have in common that they are randomly drawn and are therefore
meant to represent the true population. In reality, we have very few random samples.
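To make the stratified random sample concrete, here is a minimal Python sketch: the population is split into non-overlapping strata, and a random sample is drawn from each stratum. The function name `stratified_sample`, the `age_group` field and the population itself are hypothetical; a fixed seed is used only to make the example reproducible.

```python
import random
from collections import defaultdict


def stratified_sample(people, stratum_of, n_per_stratum, seed=0):
    """Draw n_per_stratum random people from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)  # each person lands in exactly one stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical population of 100 people in two age strata.
population = [{"id": i, "age_group": "young" if i < 50 else "old"} for i in range(100)]
chosen = stratified_sample(population, lambda p: p["age_group"], n_per_stratum=5)
print(len(chosen))  # 10
```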
Measurement. Once people are selected and recruited, we have to think about how to do the
assessment and which measures to apply (scaling, sensitivity, precision (reliability),
accuracy (validity), computerized or paper-pencil assessment?). How do we pick good
measurements? Why do we use scales instead of directly asking what we want to know (can you
still drive a car?)? It is tempting to ask the question directly, but this is not always wise.
Scaling: categorical or continuous variables? One piece of advice is to avoid categorical
variables where possible, because you lose much power and information. Continuous variables
contain more information, are more sensitive and more flexible, and make statistical analysis
more powerful, so they are often preferred. They can still be categorized later if you want.
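The loss of information can be sketched with a small simulation. It generates two correlated continuous variables and then splits one of them at the median into a yes/no variable. All data are simulated, and the sample size, the true correlation of 0.5 and the seed are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def simulate(n=2000, true_r=0.5, seed=1):
    """Return (r with continuous x, r with median-split x) for simulated data."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_r * xi + sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for xi in x]
    cut = median(x)
    x_binary = [1 if xi > cut else 0 for xi in x]  # categorical version of x
    return pearson_r(x, y), pearson_r(x_binary, y)


r_continuous, r_dichotomized = simulate()
print(round(r_continuous, 2), round(r_dichotomized, 2))
```

The correlation with the median-split variable comes out weaker than with the continuous one, so the same relationship is harder to detect once the variable is categorized.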
Precision (reliability): is your measurement precise (reliable)? Precision/reliability is the
degree to which individuals retain their relative position within a distribution of scores from
one testing session to another. So, if you score second to last today, you are also expected to
score second to last in three days' time. This is most often reported as a correlation, the
test-retest reliability (stability). The more reliable a measure is, the more useful it is.
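Test-retest reliability as a correlation can be sketched as follows. The scores of the five people are hypothetical, and `pearson_r` is a small helper written for this example rather than a function from a particular test manual.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical scores of the same five people in two sessions, three days apart.
session_1 = [10, 12, 15, 20, 25]
session_2 = [11, 14, 15, 22, 24]

# Everyone keeps roughly the same position in the distribution,
# so the test-retest correlation is high.
print(round(pearson_r(session_1, session_2), 2))  # 0.98
```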
Example of precision (reliability): one of the most well-known tests in ADHD is the CPT. For
reliability, the test reports internal consistency and test-retest reliability based on 120
respondents from the general population. Is this applicable to a clinical context? No: in
clinical practice we do not assess healthy people, and it is typical of non-healthy people that
their performance fluctuates more. So, stable assessments in healthy children do not mean that
this test is reliable in a clinical population. Based on this information, we do not know how
reliable the test is in the population we are interested in.
Accuracy (validity): does the test measure what it is supposed to measure? E.g., the CPT
describes it in terms of discriminative and incremental validity. You also want a measure that
is sensitive, meaning that it shows effects in the groups you assess. E.g., in a study on whole
body vibration, the researchers wanted to show effects in healthy individuals; a measure that
detects improvements even in healthy people needs to be very sensitive.
You also want to reduce random error in your measurement. Suppose you run an experiment in which
you know that any effects will be small. The measurement must then be sensitive, which means
that the random error should be reduced. How do we reduce random error?
- Standardizing the measurement
- Training the staff doing assessments
- Automating the instrument
- Blinding
- Repeating the measurement
In the whole body vibration example, a sensitive measurement is needed because small effects
were expected. If you want to reveal memory deficits in a person with AD, you don't need a very
sensitive measurement, because with any word list a person with AD will remember fewer words
than a healthy control. In the vibration study, however, small effects are expected, and thus
sensitive measures are necessary.
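One of the strategies listed above, repeating the measurement, can be quantified with a short sketch. It assumes that the random errors of the repeated measurements are independent, so that the error of their average shrinks with the square root of the number of repeats. The function name and the numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_mean(sd_single, n_repeats):
    """Random error of the average of n_repeats independent measurements."""
    return sd_single / sqrt(n_repeats)


# Hypothetical reaction-time task with a random error (SD) of 8 ms per trial.
for repeats in (1, 4, 16):
    print(repeats, standard_error_of_mean(8, repeats))
```

Under this assumption, quadrupling the number of repeats halves the random error, which makes small effects easier to detect.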
Modern assessment methods: computerized vs. paper-pencil assessment. We can improve precision
and sensitivity by using computerized assessment. However, even though computerized assessments
are available, we still do not use them much. Why are neuropsychologists so reluctant to use
these modern assessment techniques? Reasons that may explain this:
1. Psychometric obstacles
o Reliability of traditional and computerized tests
o Equivalence of computer tests and paper-and-pencil tests
o Quality of normative data
2. Technical obstacles
o E.g., the speed of technical developments hampers work on psychometric properties
3. Theoretical obstacles