Psychological Testing Ch 4 Lecture 3
This 11 page Class Notes was uploaded by AmberNicole on Sunday September 11, 2016. The Class Notes belongs to Psych 3325 at East Carolina University taught by Dr. Gary Stainback in Fall 2016. Since its upload, it has received 9 views. For similar materials see Introduction to Psychological Testing in Psychology (PSYC) at East Carolina University.
Chapter 4: Of Tests and Testing

Some Assumptions About Psychological Testing and Assessment

Assumption 1: Psychological Traits and States Exist
o Trait: defined as "any distinguishable, relatively enduring way in which one individual varies from another"
o States also distinguish one person from another but are relatively less enduring
o The term psychological trait, much like the term trait alone, covers a wide range of possible characteristics
o Cultural evolution may bring new trait terms into common usage
o For our purposes, a psychological trait exists only as a construct – an informed, scientific concept developed or constructed to describe or explain behavior
o Overt behavior refers to an observable action or the product of an observable action, including test- or assessment-related responses
o The phrase relatively enduring in our definition of trait is a reminder that a trait is not expected to be manifested in behavior 100% of the time
o The psychological trait of sensation seeking has been defined as "the need for varied, novel, and complex sensations and experiences and the willingness to take physical and social risks for the sake of such experiences"
o There also seems to be rank-order stability in personality traits
o Whether a trait manifests itself in observable behavior, and to what degree it manifests, is presumed to depend not only on the strength of the trait in the individual but also on the nature of the situation
o Exactly how a particular trait manifests itself is, at least to some extent, situation-dependent
o The definitions of trait and state we are using also refer to a way in which one individual varies from another

Assumption 2: Psychological Traits and States Can Be Quantified and Measured
The first step in understanding the meaning of a test score is understanding how the measured construct (e.g., "aggressive") was defined by the test developer
Ideally, the test developer has provided test users with a clear operational definition of the construct under study
The test score is presumed to represent the strength of the targeted ability, trait, or state and is frequently based on cumulative scoring
Inherent in cumulative scoring is the assumption that the more the testtaker responds in a particular direction keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait
Domain sampling refers to either (1) a sample of behaviors from all possible behaviors that could conceivably be indicative of a particular construct or (2) a sample of test items from all possible items that could conceivably be used to measure a particular construct

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
By their nature, however, such tests yield only a sample of the behavior that can be expected to be emitted under nontest conditions
The obtained sample of behavior is typically used to make predictions about future behavior, such as the work performance of a job applicant

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
This deceptively simple assumption – that test users know the tests they use and are aware of the tests' limitations – is emphasized repeatedly in the codes of ethics of associations of assessment professionals

Assumption 5: Various Sources of Error Are Part of the Assessment Process
In the context of assessment, error need not refer to a deviation, an oversight, or something that otherwise violates expectations
To the contrary, error traditionally refers to something that is more than expected; it is actually a component of the measurement process
o More specifically, error refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
Error variance: the component of a test score attributable to sources other than the trait or ability measured
There are many potential sources of error variance; assessors, too, are sources of error variance
In classical test theory (CTT; also variously referred to as true score theory), the assumption is made that each testtaker has a true score on a test that would be obtained but for the action of measurement error

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
This assumption is more controversial than the remaining six
Today all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual
Some potential problems related to test fairness are more political than psychometric
In all questions about tests with regard to fairness, it is important to keep in mind that tests are tools; just like other, more familiar tools, they can be used properly or improperly

Assumption 7: Testing and Assessment Benefit Society
Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests, especially good tests

What's a "Good Test"?
Logically, the criteria for a good test would include clear instructions for administration, scoring, and interpretation
A good test would seem to be one that measures what it purports to measure
Psychometric soundness of a test has two key aspects: reliability and validity

Reliability
A good test or, more generally, a good measuring tool or procedure is reliable
Reliability concerns the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements
In theory, the perfectly reliable measuring tool consistently measures in the same way
Unreliable measurement is to be avoided; we want to be reasonably certain that the measuring tool or test that we are using is consistent
o We want to know that it yields the same numerical measurement every time it measures the same thing under the same conditions

Validity
A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure
Questions regarding a test's validity may focus on the items that collectively make up the test
Individual items will also come under scrutiny in an investigation of a test's validity
The validity of a test may also be questioned on grounds related to the interpretation of resulting test scores

Other Considerations
A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty
A good test is a useful test, one that yields actionable results that will ultimately benefit individual testtakers or society at large
If the purpose of a test is to compare the performance of the testtaker with the performance of other testtakers, a good test is one that contains adequate norms
Also referred to as normative data, norms provide a standard with which the results of measurement can be compared

Putting Tests to the Test
Answers to questions about specific instruments may be found in published sources of information (such as test catalogues, test manuals, and published test reviews) as well as unpublished sources (correspondence with test developers and publishers and with colleagues who have used the same or similar tests)
Guidelines describe three types of assessments relevant to a child custody decision: (1) the assessment of parenting capacity, (2) the assessment of the psychological and developmental needs of the child, and (3) the assessment of the goodness of fit between the parent's capacity and the child's needs
An educated opinion about who should be awarded custody can be arrived at only after evaluating (1) the parents (or others seeking custody), (2) the child, and (3) the goodness of fit between the needs and capacity of each of the parties
Published guidelines and research may also provide useful information regarding how likely the use of a particular test or measurement technique is to meet the Daubert or other standards set by courts
Research to determine whether a particular instrument is reliable starts with a careful reading of the test's manual and of published research on the test, test reviews, and related sources
A measure of one type of reliability, referred to as test-retest reliability, would indicate how consistent a child's perception of father and mother is over time
Validity refers to the extent to which a test measures what it purports to measure
The need for multiple sources of data on which to base an opinion stems not only from the ethical mandates published in the form of guidelines from professional associations but also from the practical demands of meeting a burden of proof in court
What starts as research to determine the validity of an individual instrument for a particular objective may end with research as to which combination of instruments will best achieve that objective
Group tests had greater utility than individual tests
In evaluating a test, it is critical to consider the inferences that may reasonably be made as a result of administering that test
Another issue regarding the generalizability of findings concerns how a test was administered

Norms
Norm-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker's score and comparing it to the scores of a group of testtakers
A common goal of norm-referenced tests is to yield information on a testtaker's standing or ranking relative to some comparison group of testtakers
Norm in the singular is used in the scholarly literature to refer to behavior that is usual, average, normal, standard, expected, or typical
Norms is the plural form of norm, as in the term gender norms
o Norms are the test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores
A normative sample is that group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers
A test administration to this representative sample of testtakers yields a distribution (or distributions) of scores
These data constitute the norms for the test and typically are used as a reference source for evaluating and placing into context test scores obtained by individual testtakers
The data may be in the form of raw scores or converted scores
Norming refers to the process of deriving norms; the term may be modified to describe a particular type of norm derivation
Race norming is the controversial practice of norming on the basis of race or ethnic background
Test manuals provide what are variously known as user norms or program norms, which "consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods"

Sampling to Develop Norms
The process of administering a test to a representative sample of testtakers for the purpose of establishing norms is referred to as standardization or test standardization
A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data

Sampling
The test developer can obtain a distribution of test responses by administering the test to a sample of the population – a portion of the universe of people deemed to be representative of the whole population
The process of selecting the portion of the universe deemed to be representative of the whole population is referred to as sampling
Stratified sampling would help prevent sampling bias and ultimately aid in the interpretation of the findings
If such sampling were random (that is, if every member of the population had the same chance of being included in the sample), then the procedure would be termed stratified-random sampling
Two other types of sampling procedures are purposive sampling and incidental sampling
If we arbitrarily select some sample because we believe it to be representative of the population, then we have selected what is referred to as a purposive sample
An incidental sample or convenience sample is one that is convenient or available for use
Generalization of findings from incidental samples must be made with caution
Some groups were deliberately excluded from participation:
o Persons tested on any intelligence measure in the six months prior to the testing
o Persons not fluent in English or who are primarily nonverbal
o Persons with uncorrected visual impairment or hearing loss
o Persons with upper-extremity disability that affects motor performance
o Persons currently admitted to a hospital or mental or psychiatric facility
o Persons currently taking medication that might depress test performance
o Persons previously diagnosed with any physical condition or illness that might depress test performance (such as stroke, epilepsy, or meningitis)

How "Standard" is Standard in Measurement?
As a noun, standard may be defined as that which others are compared to or evaluated against
The most common use of standard as a noun in the context of testing and assessment is in the title of the well-known manual that sets forth ideals of professional behavior against which any practitioner's behavior can be judged: the Standards for Educational and Psychological Testing, usually referred to simply as the Standards
As an adjective, standard often refers to what is usual, generally accepted, or commonly employed
The verb "to standardize" refers to making or transforming something into something that can serve as a basis of comparison or judgment
Test developers standardize tests by developing replicable procedures for administering the test and for scoring and interpreting the test
Part of standardizing a test is developing norms for the test
Assessment professionals have reserved the term standardized test for those tests that have clearly specified procedures for administration, scoring, and interpretation in addition to norms
Ideally, the test manual, which may be published in one or more booklets, will provide potential test users with all of the information they need to use the test in a responsible fashion
o The test manual enables the test user to administer the test in the "standardized" manner in which it was designed to be administered
If a standardized test is designed for scoring by the test user (in contrast to computer scoring), the test manual will ideally contain detailed scoring guidelines
The term standardization could be applied to "standardizing" all the elements of a standardized test that need to be standardized
o One definition of standardization as applied to tests is "the process employed to introduce objectivity and uniformity into test administration, scoring and interpretation"
o The more typical use of standardization, however, is reserved for that part of the test development process during which norms are developed
o The terms test standardization and test norming have been used interchangeably by many test professionals
Raw scores (as well as z scores) linearly transformed to any other type of standard scoring system – that is, transformed to a scale with an arbitrarily set mean and standard deviation – are differentiated from z scores by the term standardized
A z score would still be referred to as a "standard score," whereas a T score, for example, would be referred to as a "standardized score"

Types of Standard Error
Standard error of measurement
o A statistic used to estimate the extent to which an observed score deviates from a true score
Standard error of estimate
o In regression, an estimate of the degree of error involved in predicting the value of one variable from another
Standard error of the mean
o A measure of sampling error
Standard error of the difference
o A statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant

Developing Norms for a Standardized Test
Establishing a standard set of instructions and conditions under which the test is given makes the test scores of the normative sample more comparable with the scores of future testtakers
It is important that the normative sample take the test under a standard set of conditions, which are then replicated (to the extent possible) on each occasion the test is administered
After all the test data have been collected and analyzed, the test developer will summarize the data using descriptive statistics, including measures of central tendency and variability
Test manuals sometimes supply prospective test users with guidelines for establishing local norms, one of many different ways norms can be categorized
The phrases normative sample and standardization sample are often used interchangeably

Types of Norms
Percentile norms are the raw data from a test's standardization sample converted to percentile form

Percentiles
Instead of dividing a distribution of scores into quartiles, we might wish to divide the distribution into deciles, or 10 equal parts
Alternatively, we could divide a distribution into 100 equal parts – 100 percentiles
In such a distribution, the xth percentile is equal to the score at or below which x% of scores fall
Thus, the 15th percentile is the score at or below which 15% of the scores in the distribution fall, and the 99th percentile is the score at or below which 99% of the scores in the distribution fall
It can be seen that a percentile is a ranking that conveys information about the relative position of a score within a distribution of scores
More formally defined, a percentile is an expression of the percentage of people whose score on a test or measure falls below a particular raw score
Intimately related to the concept of a percentile as a description of performance on a test is the concept of percentage correct
Note that percentile and percentage correct are not synonymous
A percentile is a converted score that refers to a percentage of testtakers
Percentage correct refers to the distribution of raw scores – more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items
Percentiles are a popular way of organizing all test-related data, including standardization sample data, and lend themselves to use with a wide range of tests
A problem with using percentiles with normally distributed scores is that real differences between raw scores may be minimized near the ends of the distribution and exaggerated in the middle of the distribution

Age Norms
Also known as age-equivalent scores, age norms indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered
Irrespective of chronological age, children with the same mental age could be expected to read the same level of material, solve the same kinds of math problems, reason with a similar level of judgment, and so forth
The problem is that "mental age" as a way to report test results is too broad and too inappropriately generalized

Grade Norms
Designed to indicate the average test performance of testtakers in a given school grade, grade norms are developed by administering the test to representative samples of children over a range of consecutive grade levels
Next, the mean or median score for children at each grade level is calculated
Fractions in the mean or median are easily expressed as decimals
Children learn and develop at varying rates but in ways that are in some aspects predictable; perhaps because of this fact, grade norms have widespread application, especially to children of elementary school age
Grade norms do not provide information as to the content or type of items that a student could or could not answer correctly
Perhaps the primary use of grade norms is as a convenient, readily understandable gauge of how one student's performance compares with that of fellow students in the same grade
One drawback of grade norms is that they are useful only with respect to years and months of schooling completed; they have little or no applicability to children who are not yet in school or to children who are out of school
Both grade norms and age norms are referred to more generally as developmental norms, a term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life

National Norms
National norms are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted
The precise nature of the questions raised when developing national norms will depend on whom the test is designed for and what the test is designed to do
Norms from many different tests may all claim to have nationally representative samples

National Anchor Norms
An equivalency table for scores on two tests, or national anchor norms, could provide the tool for such a comparison
The method by which such equivalency tables or national anchor norms are established typically begins with the computation of percentile norms for each of the tests to be compared
Using the equipercentile method, the equivalency of scores on different tests is calculated with reference to corresponding percentile scores
Thus, if the 96th percentile corresponds to a score of 69 on the BRT and the 96th percentile corresponds to a score of 14 on the XYZ, then we can say that a BRT score of 69 is equivalent to an XYZ score of 14
The scores on both tests must have been obtained on the same sample – each member of the sample took both tests, and the equivalency tables were then calculated on the basis of these data
Although national anchor norms provide an indication of the equivalence of scores on various tests, technical considerations entail that it would be a mistake to treat these equivalencies as precise equalities

Subgroup Norms
A normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample
What results from such segmentation are more narrowly defined subgroup norms

Local Norms
Typically developed by test users themselves, local norms provide normative information with respect to the local population's performance on some test

Fixed Reference Group Scoring Systems
Norms provide a context for interpreting the meaning of a test score
In a fixed reference group scoring system, the distribution of scores obtained on the test from one group of testtakers – referred to as the fixed reference group – is used as the basis for the calculation of test scores for future administrations of the test
Although John and Mary may have achieved the same raw score, they would not necessarily achieve the same scaled score
Test items common to each new version of the SAT and each previous version of it are employed in a procedure (termed anchoring) that permits the conversion of raw scores on the new version of the test into fixed reference group scores

Norm-Referenced Versus Criterion-Referenced Evaluation
One way to derive meaning from a test score is to evaluate the test score in relation to other scores on the same test
A criterion is a standard on which a judgment or decision may be based
Criterion-referenced testing and assessment may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard
The criterion in criterion-referenced assessments typically derives from the values or standards of an individual or organization
Because the focus in the criterion-referenced approach is on how scores relate to a particular content area or domain, the approach has also been referred to as domain- or content-referenced testing and assessment
Because criterion-referenced tests are frequently used to gauge achievement or mastery, they are sometimes referred to as mastery tests
Critics of the criterion-referenced approach argue that if it is strictly followed, potentially important information about an individual's performance relative to other testtakers is lost
By contrast, brilliance and superior abilities are recognizable in tests that employ norm-referenced interpretations; these are the scores that trail off all the way to the right on the normal curve, past the third standard deviation
Norm-referenced and criterion-referenced are two of many ways that test data may be viewed and interpreted; the terms are not mutually exclusive
In a sense, all testing is ultimately normative, even if the scores are as seemingly criterion-referenced as pass-fail; at some point in that continuum, a dichotomizing cutoff point has been applied
Some have also made the point that certain so-called norm-referenced assessments are made with subject samples wherein "the norm is hardly the norm"

Culture and Inference
In selecting a test for use, the responsible test user does some advance research on the test's available norms to check on how appropriate they are for use with the targeted testtaker population
Historical context should not be lost sight of in evaluation

Culturally Informed Assessment: Do's
Be aware of the cultural assumptions on which a test is based
Consider consulting with members of particular cultural communities regarding the appropriateness of particular assessment techniques, tests, or test items
Strive to incorporate assessment methods that complement the worldview and lifestyle of assessees who come from a specific cultural and linguistic population
Be knowledgeable about the many alternative tests or measurement procedures that may be used to fulfill the assessment objectives
Be aware of equivalence issues across cultures, including equivalence of the language used and the constructs measured
Score, interpret, and analyze assessment data in their cultural context, with due consideration of cultural hypotheses as possible explanations for findings

Culturally Informed Assessment: Don'ts
Take for granted that a test is based on assumptions that impact all groups in much the same way
Take for granted that members of all cultural communities will automatically deem particular techniques, tests, or test items appropriate for use
Take a "one size fits all" view of assessment when it comes to evaluation of persons from various cultural and linguistic populations
Select tests or other tools of assessment with little or no regard for the extent to which such tools are appropriate for use with a particular assessee
Simply assume that a test that has been translated into another language is automatically equivalent in every way to the original
Score, interpret, and analyze assessment data in a cultural vacuum
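The quantitative ideas in these notes (percentile rank versus percentage correct, and the linear transformation of z scores into standardized scores such as T scores) can be illustrated with a short computational sketch. This example is illustrative only and is not from the lecture: the normative scores, the 30-item test length, and the function names are all made up for this sketch.

```python
from statistics import mean, pstdev

# Hypothetical raw scores from a small normative sample (made-up data)
norm_scores = [10, 12, 14, 15, 15, 16, 18, 19, 21, 24]

def percentile_rank(raw, norms):
    """Percentage of normative-sample scores at or below the raw score."""
    return 100 * sum(s <= raw for s in norms) / len(norms)

def percentage_correct(raw, n_items):
    """Number of items answered correctly, multiplied by 100, divided by total items."""
    return 100 * raw / n_items

def t_score(raw, norms):
    """Linear transformation of a z score to a scale with mean 50 and SD 10 (a T score)."""
    z = (raw - mean(norms)) / pstdev(norms)
    return 50 + 10 * z

raw = 18  # a testtaker's raw score on a hypothetical 30-item test
print(percentile_rank(raw, norm_scores))    # relative standing: 70.0 (a converted score)
print(percentage_correct(raw, 30))          # content mastery: 60.0 (a raw-score statistic)
print(round(t_score(raw, norm_scores), 1))  # standardized score on the T scale
```

Note how the two "percent" quantities differ, as the notes stress: the percentile rank (70.0) depends entirely on the normative sample, while percentage correct (60.0) depends only on the testtaker's own responses and the number of items.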