PYC4807 ASSIGNMENT 02 2021.
PYC4807 - Psychological Assessment.

STEPS IN DEVELOPING A PSYCHOLOGICAL MEASURE

The steps in developing a psychological measure follow a systematic progression; however, it is important to note that, in reality, there would be some degree of integration of the phases. In particular, the test developer should be aware of the implications and requirements for establishing validity, reliability and norms throughout the development of the measure.

2.1 PLANNING PHASE

When first embarking on the development of a measure, it is necessary to create a strong foundation that will guide its development. This is done by putting together a good plan for the measure, which should include identifying the aim, the content and the test plan. When developing a measure for multi-cultural or multi-lingual application, as in the South African context, the specific complexities of the environment in which the test will be used must be adequately addressed in order to avert any potential bias or discrimination in the use of the measure (Foxcroft & Roodt, 2009).

2.1.1 Specifying the aim of the measure

Initially one should clarify the aim of the measure. This includes identifying the purpose of the test, for example, screening for depression. Then the construct to be measured should be clarified, for example, the construct ‘depression’ (criterion variable), and the possible predictor variables, such as suicidal ideation, sleep patterns and appetite changes, should also be identified. Next, what the test will be used for (e.g. screening for depression or in-depth diagnostic assessment) and the conclusions that can be drawn from the results (e.g. whether or not treatment is needed) should be described. The test developer should also outline the population for whom the test is being developed (e.g. teenagers), and its applicability in a multicultural context (identifying the applicable groups) must be discussed.
Janet Walker / PO Box 72433, Parkview, 2122 / PSY4988 / Assignment 2

Furthermore, decisions must be made about whether the measure should be administered to individuals and/or groups and whether paper-based or computer-based administration is appropriate. Lastly, it should be stated whether it is a normative measure (i.e. an individual’s score is compared to a norm group), an ipsative measure (i.e. an individual’s score is compared to his/her score on another test) or a criterion-referenced test (i.e. performance is compared to a well-defined content domain) (Foxcroft & Roodt, 2009).

The test developer needs to be clear about the reasons this new test is necessary and how it improves on or differs from other similar measures. For example, a new screening test for depression in teenagers might be developed with the aim of identifying teenagers at risk of suicide in a particular population where many teenage suicides have taken place. This could be a good reason for developing a new depression measure.

2.1.2 Defining the content of the measure

The content and purpose of a measure are closely related. The test developer must clarify the construct (depression) to be tapped by the measure, and this is done by operationally defining the ‘content domain’ (Mazabow, 2011). The intention is to make certain that the dimensions of the construct (suicidal ideation, appetite, sleep patterns, etc.) are theoretically grounded (by reviewing theories of depression), thereby ensuring that the construct is defined in concrete terms and is easily measurable (Foxcroft & Roodt, 2009).

The Rational method involves conducting an in-depth literature review of the main theoretical viewpoints concerning the construct to be measured. In an educational setting, focus is placed on specific learning outcomes in particular areas (e.g. arithmetic ability, vocabulary, problem solving and writing skills).
In an organizational setting, a job analysis (identifying the knowledge or competencies needed) would be conducted (Foxcroft & Roodt, 2009).

The Criterion-Keying method is used when the test is required to distinguish between different groups, for example, to distinguish individuals who are depressed and in need of treatment from those who are not. The test developer would investigate those aspects of the construct on which the groups differ (e.g. the presence or absence of suicidal ideation) and would include relevant items that distinguish high-risk from non-high-risk subjects (Foxcroft & Roodt, 2009). Very often factor analysis is conducted after the experimental administration of the test, so as to further refine the fundamental dimensions being tapped (Mazabow, 2011).

If the test is to be used in a multi-cultural and multi-lingual context, it is essential for the construct to be unpacked in terms of each language and cultural group’s understanding (e.g. how do they understand the term ‘depression’ and underlying dimensions such as suicidal ideation?). Interviews and focus groups can be used to establish this (Foxcroft & Roodt, 2009).

2.1.3 Developing the test plan

The next step in the planning phase involves determining the format of the test items and the number of items needed. The test format consists of the stimulus, that is, the test items (open-ended, forced-choice, sentence-completion or performance-based items), and the mechanism for response, which includes objective formats (only one correct answer, as in multiple-choice and matching exercises) and subjective formats (the response is verbal or written, as in an interview, oral or essay, and requires subjective interpretation) (Foxcroft & Roodt, 2009).
The choice of item format is influenced by what is to be measured and by some practical considerations. For example, if a questionnaire is to be given to Unisa students about their experience of distance learning, then essay-type questions are not practical; forced-choice items such as true/false or multiple choice, which are easy to score and to administer to a large number of people, are more suitable. Performance-based items, such as an oral presentation, are appropriate in assessing job performance where a demonstration of ability is required. The number of items (the length of the measure) depends on the time available to administer the measure and also on the purpose of the measure (Foxcroft & Roodt, 2009).

At this stage the test developer must be aware of the potential for bias. Bias can be accidentally introduced via the item stimulus, the method of response, a response set (e.g. the test-taker agrees with all questions) or the language in which the test is developed. Caution is needed particularly in multi-cultural contexts; for example, Owen (cited in Foxcroft & Roodt, 2009) found that story-type items resulted in biased differential performance between white and black South Africans. Method and test bias can therefore be controlled and reduced by the way in which the test items are constructed. In multi-lingual contexts a test developed in English could present bias for Afrikaans or Zulu test-takers. Multiple language versions of the test would reduce this bias (Foxcroft & Roodt, 2009).

The test plan is now complete. It clearly outlines the specific content domains to be included and the number of items for each domain.

2.2 ITEM WRITING

This phase includes the actual writing of the items and the process of reviewing the items.
2.2.1 Writing the items

The items are written by the test developer in consultation with a team of experts, who draw on many sources (such as theoretical literature and other measures) and are guided by the test plan and the purpose of the measure. Some important pointers in writing good items follow. The items should be unambiguous and clearly written using short sentences (as a lack of understanding could skew test-takers’ results). Appropriate vocabulary should be used, and negative expressions, double negatives and more than one theme in a single item should be avoided; an example of an item to avoid is: ‘Do you agree that there should not be a death sentence or gun ownership in South Africa?’ Also try to keep the length of true-false items more or less equal, with the same number of true and false statements. In multiple-choice items, make sure the position of the correct answer changes and that the distracters (alternative answers) are plausible. The content of the items should also be appropriate; for example, if the purpose of the measure is screening for depression, then the items should not include questions about geography (Mazabow, 2011). There will probably be far too many items at this stage, but this is appropriate, as many (at least one third) will have to be discarded at the item analysis phase (Foxcroft & Roodt, 2011).

2.2.2 Reviewing the items

The review and evaluation of the items is conducted by a team of experts. Emphasis is placed on whether or not the items adequately tap the content domain of the construct being measured. The experts’ opinions on the cultural, linguistic and gender appropriateness of the test items are noted. An in-depth evaluation of the item wording and the item formats takes place. The items can then be administered to a small sample of the target population in order to assess whether there is any misunderstanding of items or test instructions.
This qualitative information and the feedback from the team of experts would result in some items being revised, rewritten or discarded (Foxcroft & Roodt, 2009). After this process, the experimental version of the measure is developed.

2.3 ASSEMBLING AND PRE-TESTING THE EXPERIMENTAL VERSION OF THE MEASURE

The purpose of this phase is to try the measure out on a large, representative sample (400 to 500 people) of the population for whom the measure is intended. This involves the following steps.

2.3.1 Arranging the items

Keeping in mind the construct being tapped, the items are sequenced in a logical order, for example, easier items progress to more difficult ones. In addition, items or groups of items must be placed on the correct pages in the test booklet in order to avoid confusion (Foxcroft & Roodt, 2009).

2.3.2 Finalizing the length

The time needed to complete the measure is important, especially if there are test time limits. In that case, to ensure that there is sufficient time to complete the test, the content to be read should be reduced or some items should be eliminated or rewritten (Foxcroft & Roodt, 2009).

2.3.3 Answer protocols

Answer protocols should allow for easy scoring of the test and be simple to duplicate. Final decisions about either developing a separate answer sheet or using the test booklet must be made at this stage (Foxcroft & Roodt, 2009).

2.3.4 Developing administration instructions

Poorly worded administration instructions can lead to poor performance on the test, thereby contaminating test scores. The administration instructions should therefore be clear and unambiguous. To ensure this, the test developer can try out the instructions on a sample of the intended population. Test administrators must also be well trained in administering the measure to ensure the reliability and validity of the measure (Foxcroft & Roodt, 2011).
2.3.5 Pre-testing the experimental version of the measure

The experimental version of the measure is now administered to a large (400 to 500 people), representative sample of the intended population. Both quantitative information (performance on items) and qualitative information are gathered. Qualitative feedback is gathered by the test developers and includes test-takers’ views on item difficulty or item comprehension. In addition, opinions on the test materials used, the layout of items and the test length can be obtained. This information will be vital to the refinement of items and the final item selection (Mazabow, 2011).

2.4 ITEM ANALYSIS PHASE

At this point the scores for each item from the experimental administration of the test are statistically analysed. Based on this quantitative information, the test developer decides whether each item is serving its intended purpose. Each item is either kept, discarded or rewritten. For the purposes of this essay, Classical Test Theory statistics will be discussed, that is, the item difficulty value (the difficulty level of an item), the item discrimination value (whether the item discriminates between good and poor performance) and the item-total correlation (how strongly performance on the item relates to the total score, which helps to reveal an item’s shortcomings). Item Response Theory, in which item discriminatory power, item difficulty levels and item bias can be effectively identified, will also be discussed (Foxcroft & Roodt, 2009).
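As a rough illustration of the three Classical Test Theory statistics named above, the following Python sketch computes them for a small, invented set of dichotomously scored responses (1 = correct or endorsed, 0 = not). The data, the function names (pearson, item_analysis), the 27% upper/lower grouping and the choice to correlate each item with the total of the remaining items are all assumptions made for illustration; they are not taken from Foxcroft and Roodt (2009) or Mazabow (2011).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no variance; report 0.0 instead of dividing by zero.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def item_analysis(responses, tail=0.27):
    """Return difficulty, discrimination and item-total r for each item.

    responses: one row per test-taker, one 0/1 score per item.
    tail: proportion of test-takers in the upper and lower groups
          (0.27 is an assumed convention, not a requirement).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]

    # Rank test-takers by total score to form the lower and upper groups.
    order = sorted(range(n_people), key=lambda i: totals[i])
    k = max(1, round(tail * n_people))
    lower, upper = order[:k], order[-k:]

    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Item difficulty: proportion of test-takers who answered correctly.
        difficulty = sum(item) / n_people
        # Discrimination index: upper-group minus lower-group proportion correct.
        discrimination = (
            sum(item[i] for i in upper) - sum(item[i] for i in lower)
        ) / k
        # Item-total correlation against the total of the other items,
        # so that the item is not correlated with itself.
        rest = [t - s for t, s in zip(totals, item)]
        results.append({
            "item": j + 1,
            "difficulty": difficulty,
            "discrimination": discrimination,
            "item_total_r": pearson(item, rest),
        })
    return results


# Invented pilot data: six test-takers, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for row in item_analysis(responses):
    print(row)
```

In a real pre-test of 400 to 500 people the same calculations would be run on the full response matrix, and items with very extreme difficulty values or low discrimination and item-total values would be candidates for rewriting or discarding. Any cut-off values would be the test developer's decision and are not implied by this sketch.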
Written for
- Institution: University of South Africa
- Course: PYC4807 - Psychological Assessment