Problem 7
Woolfolk
- Standardized tests: standard methods of developing items, administering the test, scoring
it and reporting the scores
- Classroom assessments: created and selected by teachers, can take many different forms
(essays, group projects etc.)
- Measurement:
o Quantitative
o Description of an event or characteristic using numbers
o Allows teachers to compare one student’s performance on a task with a specific
standard or other students’ performances
- Assessment:
o Broader than testing
o Describes the process of gathering information about students’ learning
o Includes all kinds of ways to sample and observe students’ skills and abilities
- Formative assessment:
o Occurs before or during instruction
o Purpose is to guide teachers in planning and improving instruction
o Provides feedback – nonevaluative, supportive, timely, specific
o E.g. pretest
- Summative assessment:
o Occurs at the end of instruction
o Provides a summary of accomplishment
o E.g. the final exam
Test Interpretations – there must be some kind of comparison to interpret test results
- Norm-referenced testing:
o People who have taken the test provide the norms for determining the meaning of
a given individual’s score
o The norm is the typical level of performance of a group; the person’s raw score
is then compared to that norm
o There are at least 4 types of norm groups: class or school, the school district,
national samples and international samples
o The norm groups are selected so that all SES groups are included in the sample
o The scores from one large-scale administration serve as the norms until the test
is revised or re-normed
o Norm-referenced tests are appropriate when only the top candidates can be
admitted
o Limitations: results don’t tell you whether students are ready to move on to more
advanced material & they are not appropriate for affective and psychomotor
objectives
- Criterion-referenced testing:
o Scores are compared to a given criterion or standard of performance rather than
the scores of others
o Measures the mastery of very specific objectives, e.g. a driving test
o The results tell exactly what the students can and cannot do, at least under certain
conditions
o When teaching basic skills, comparison to a preset standard is more important
than comparison to other students
o Limitations: many subjects can’t be broken down into a set of specific objectives,
the criterion is often arbitrary, and it’s often valuable to know how other students
compare to you
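The two interpretations above can be contrasted on a single raw score. The sketch below is only an illustration: the norm-group scores, the student's score and the cutoff are hypothetical values, and "percent of the norm group scoring below" is just one common way to express a norm-referenced result.

```python
from bisect import bisect_left


def percentile_rank(raw_score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below raw_score."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)  # count of scores strictly below
    return 100.0 * below / len(ordered)


def meets_criterion(raw_score, cutoff):
    """Criterion-referenced reading: compare to a preset standard, ignoring peers."""
    return raw_score >= cutoff


# Hypothetical data, not taken from the notes.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 28, 30]
student_score = 25
mastery_cutoff = 26

rank = percentile_rank(student_score, norm_group)
passed = meets_criterion(student_score, mastery_cutoff)
print(f"Percentile rank in norm group: {rank}")
print(f"Meets mastery cutoff of {mastery_cutoff}: {passed}")
```

The same score can look strong against the norm group and still fall short of the criterion, which is why the notes treat the two as answering different questions.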
Assessing the assessments:
- Reliability of test scores:
o Scores are reliable if a test gives a consistent and stable reading of a person’s
ability from one occasion to another
o Test-retest reliability (stability): consistency of scores when the same test is
given on 2 different occasions
o Alternate-form reliability: if a group of people take 2 equivalent versions of a test
and the scores are comparable
o Internal consistency: precision of test, calculated by split-half reliability – comparing
performance on half of the test questions with performance on the other half
o A reliability coefficient below 0.8 is not very good for commercially produced
standardized tests (e.g. the SAT)
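As a rough illustration of split-half reliability, the sketch below splits each student's item scores into odd- and even-numbered items and correlates the two half-test totals. The odd/even split, the Pearson correlation and the response matrix are assumptions made for the example; the notes only say that performance on one half is compared with performance on the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """Correlate each student's total on odd items with their total on even items."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson(odd_totals, even_totals)


# Hypothetical responses: one row per student, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

r_half = split_half_reliability(responses)
print(f"Split-half correlation: {r_half:.2f}")
```

With so few students and items the coefficient is unstable; the example is meant to show the mechanics, not a realistic reliability estimate.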
- Error in scores:
o Errors related to students: mood, motivation, cheating
o Errors related to test: unclear directions, high reading level, ambiguous items
o The more reliable the test scores are, the less error there will be
o Standard error of measurement: estimate of how much a student’s scores would
vary if the student were tested repeatedly
o Confidence interval: uses the standard error of measurement to give a range
of scores that is likely to include a student’s true score
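The link between reliability, error and the confidence interval can be made concrete with a small calculation. The notes do not give a formula, so the sketch below assumes the conventional classical-test-theory estimate SEM = SD * sqrt(1 - reliability) and a band of plus or minus one SEM around the observed score; the standard deviation, reliability and score are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Assumed formula: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def confidence_interval(observed_score, sem, width_in_sems=1.0):
    """Score band of observed_score plus or minus width_in_sems * SEM."""
    margin = width_in_sems * sem
    return observed_score - margin, observed_score + margin


# Hypothetical test: scores have SD 10 and reliability 0.91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
low, high = confidence_interval(observed_score=75.0, sem=sem)
print(f"SEM is about {sem:.1f}; score band is about {low:.1f} to {high:.1f}")
```

Raising the reliability in this sketch shrinks the SEM and narrows the band, which matches the point that more reliable scores contain less error.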
- Validity:
o Judged in relation to a particular use or purpose – decisions based on the test must
be supported by evidence
o Content-related evidence of validity: the test questions should include all important
topics
o Criterion-related validity: predicting outcomes e.g. SATs predict college
performance
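o Construct-related validity: the test measures the psychological characteristic or
construct it is meant to measure, e.g. results correlate with another
well-established measure of that construct
o A test must be reliable in order to be valid, but reliability alone does not
guarantee validity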
- Absence of bias:
o Assessment bias: qualities of an assessment instrument that offend or unfairly
penalize a group of students because of gender, race, ethnicity etc.
Testing
- Objective testing:
o Multiple choice questions, true/false statements, and short answer or fill-in items
are all types of objective testing
o The word objective means the scoring is not open to many interpretations
- Multiple choice tests: