PSYC 320 Final study

by: Aimee Castillon

About this Document

Sorry this is super lengthy, but these are the answers to David Ferrier's study guide. It's a combination of my notes and the textbook.
Psyc Tests and Measurements
David Ferrier
Study Guide
Psychology, Tests & Measures, final study guide
This 46-page study guide was uploaded by Aimee Castillon on Thursday, May 5, 2016. The study guide belongs to PSYC 320 at George Mason University, taught by David Ferrier in Spring 2016. Since its upload, it has received 14 views. For similar materials, see Psyc Tests and Measurements in Psychology at George Mason University.


Date Created: 05/05/16
PSYC 320-001 Psychological Tests and Measurements
Exam 4: Review Sheet

Chapter 1
1) Describe major milestones in the history of testing.
- c. 2200 BC: Chinese test public officials
- Late 1800s & early 1900s
  - Carl Friedrich Gauss discovers the normal distribution when evaluating measurement error
  - Civil service exams used in Europe
- 19th century
  - Physicians and psychiatrists assess mental patients with new techniques
  - Brass instruments era: emphasis on measuring sensory and motor abilities
  - Civil service exams initiated in the US in 1883
  - Early attention to questionnaires and rating scales by Galton and Cattell
- 1905: Binet-Simon Scale released; ushers in the era of intelligence testing
- 1917: Army Alpha and Beta released; the first group-administered intelligence tests
- 1918: Woodworth Personal Data Sheet released; ushers in the era of personality assessment
- 1920s: Scholastic Aptitude Test (SAT) and Rorschach inkblot test developed; testing expands its influence
- 1930s: David Wechsler releases the Wechsler-Bellevue I, initiating a series of influential intelligence tests
- 1940s: Minnesota Multiphasic Personality Inventory (MMPI) released; destined to become the leading objective personality inventory
2) Define test, measurement, and assessment.
- Test: device or procedure in which a sample of an individual's behavior is obtained, evaluated, and scored using standardized procedures
- Measurement: set of rules for assigning numbers to represent objects, traits, attributes, or behaviors
- Assessment: a systematic procedure for collecting information that can be used to make inferences about the characteristics of people or objects
3) Describe and give examples of different types of tests.
- Maximum performance tests assess the upper limits of the examinee's knowledge and abilities
  - Achievement tests: assess the knowledge or skills of an individual in a content domain in which they have received instruction (i.e. school exams)
  - Aptitude tests: designed to measure the cognitive skills, abilities, and knowledge that an individual has accumulated as the result of overall life experiences (i.e. SAT and ACT)
- Objective vs. subjective tests
  - Objective tests: little disagreement in how they are graded (i.e. true/false, multiple-choice questions)
  - Subjective tests: different graders → different scores (i.e. essay questions, open-ended questions, debates)
- Speed vs. power tests
  - Speed tests: performance reflects only differences in speed of performance (how much you can do in a certain amount of time)
  - Power tests: performance reflects the difficulty of the items the test taker is able to answer correctly (i.e. GRE and SAT)
- Typical response tests
  - Objective tests: use items that are not influenced by the subjective judgment of the person scoring the test
  - Projective tests: presentation of unstructured or ambiguous material that elicits an almost infinite range of responses → requires subjectivity
4) Describe and give examples of different types of score interpretations.
- Norm-referenced scores: interpretation is relative to other people (i.e. percentile ranks)
- Criterion-referenced scores: interpretation is absolute (i.e. percent correct on an exam)
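
To make the norm- vs. criterion-referenced distinction concrete, here is a minimal Python sketch (not from the guide or the textbook; the norm-group scores and the 40-item test length are made-up values) that interprets the same raw score both ways.

```python
# Hypothetical example: one examinee answers 32 of 40 items correctly.
norm_group = [21, 24, 25, 27, 28, 29, 30, 30, 31, 33, 34, 36, 37, 38, 39]  # made-up raw scores
raw_score = 32
n_items = 40

# Criterion-referenced interpretation: compare to an absolute standard.
percent_correct = 100 * raw_score / n_items          # 80% of the domain answered correctly

# Norm-referenced interpretation: compare to other people in a reference group.
below = sum(s < raw_score for s in norm_group)
percentile_rank = 100 * below / len(norm_group)      # % of the norm group scoring below 32

print(f"Criterion-referenced: {percent_correct:.0f}% correct")
print(f"Norm-referenced: percentile rank = {percentile_rank:.0f}")
```

Note that the norm-referenced number is only as meaningful as the norm group is relevant, which is the point made again in Chapter 3.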
5) Describe and explain the assumptions underlying psychological assessment.
- Psychological constructs (i.e. intelligence, depression, attitudes, etc.) exist
- Psychological constructs can be measured
- Measurement is not perfect; some degree of error is inherent in all measurement
- There are different ways to measure a construct
- All procedures have strengths and weaknesses
- Multiple sources of information should be part of the process (think about the college application process)
- Performance on tests can be generalized
- Assessment can provide information
- Assessments can be fair
- Assessments can benefit individuals and society
6) Describe and explain the major applications of psychological assessments.
- Diagnosis
- Treatment planning and effectiveness
- Selection, placement, and certification
- Self-understanding
- Evaluation
- Licensing
- Program evaluation
- Scientific method
7) Explain why psychologists use tests.
- People are not good at judging others
- Tests provide objective information that helps us make better decisions
8) Describe the major participants in the assessment process.
- People who develop tests
- People who use tests
- People who take tests

Chapter 2
1) Describe how linear regression is used to predict performance.
- Linear regression: a relationship best represented by a straight line
- Allows you to predict values of one variable given information on another variable; typically seen on a scatter plot, where the data can be assessed by the direction of the line (positive relationship, negative relationship, no relationship) (a short code sketch of regression-based prediction appears after question 9 below)
2) Describe the different scales of measurement and give examples.
- Nominal scales: classify people or objects into categories, classes, or sets (i.e. eye color, gender, type of sport, college major, music genre)
- Ordinal scales: there's a clear order, but the magnitude of the differences is not meaningful (i.e. rank according to height, age classification)
- Interval scales: rank people or objects like an ordinal scale, but on a scale with equal units (i.e. IQ, temperature, percentile rank); no true zero
- Ratio scales: equal intervals between units and a true zero (i.e. weight in pounds, height of buildings in a city, distance from one city to another, percentage score)
3) Describe the measures of central tendency and their appropriate use.
- Mean: simple arithmetic average (mean = sum of scores / number of scores)
  - Pros: useful with interval and ratio level data; the mean of a sample is a good estimate of the mean of the population from which the sample was drawn; essential when calculating many useful statistics
  - Cons: sensitive to unbalanced extreme scores (outliers)
- Median: point that divides the distribution in half
  - Insensitive to extreme scores; often preferred with skewed distributions
  - Useful for ratio, interval, and ordinal level data
- Mode: most frequent score
  - Useful with all four scales of measurement (nominal, ordinal, interval, and ratio)
  - Limitations: some distributions have two scores that are equal in frequency and higher than other scores (i.e. "bimodal" distributions); not a stable measure of central tendency, particularly with small samples
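
A small Python sketch (illustrative only; the score values are made up) showing how the mean, median, and mode behave, and in particular why the mean is sensitive to an outlier while the median is not.

```python
from statistics import mean, median, mode

scores = [70, 72, 75, 75, 78, 80, 82, 85, 88, 150]  # made-up exam scores; 150 is an outlier

print("mean:", mean(scores))      # pulled upward by the outlier (85.5)
print("median:", median(scores))  # insensitive to the extreme score (79.0)
print("mode:", mode(scores))      # most frequent score (75)

# Dropping the outlier shows how much the mean (but not the median) shifts.
trimmed = scores[:-1]
print("mean without outlier:", mean(trimmed))      # about 78.3
print("median without outlier:", median(trimmed))  # 78
```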
4) Describe the measures of variability and their appropriate use.
- Range = highest score − lowest score
  - "The range considers only the two most extreme scores in a distribution and tells us about the limits or extremes of a distribution."
  - "However, it doesn't provide information about how the remaining scores are spread out or dispersed within these limits."
- Standard deviation: average distance that scores vary from the mean
  - The larger the standard deviation, the more scores differ from the mean and the more variability there is in the distribution
- Variance: a measure of the degree of variability in scores
  - Has special meaning as a theoretical concept
5) Correctly interpret descriptive statistics (e.g., skew, kurtosis).
- Skewness: degree of asymmetry in a distribution (positive skew: tail stretches toward the high scores; negative skew: tail stretches toward the low scores)
- Kurtosis: degree of peakedness of a distribution
  - Leptokurtic: more peaked than the normal curve
  - Platykurtic: flatter than the normal curve
  - Mesokurtic: intermediate, like the normal curve
6) Explain the meaning of correlation coefficients and how they are used.
- Correlation coefficients: quantitative measures of the relationship between two variables
  - Designated by r
  - Range from -1.0 to +1.0
  - The sign of the coefficient indicates the direction of the relationship
  - The magnitude (absolute size) indicates the strength of the relationship
7) Explain how scatterplots are used to describe the relationships between two variables.
- Scatterplots are graphs that visually display the relationship between two variables
8) Describe major types of correlation coefficients.
- Pearson product-moment correlation coefficient: when variables are on an interval or ratio scale
- Spearman rank correlation coefficient: when variables are on an ordinal scale
- Point-biserial correlation coefficient: one variable dichotomous, one on an interval or ratio scale (i.e. comparing gender with test scores)
9) Distinguish between correlation and causation.
- Correlation DOES NOT MEAN causation
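
The sketch below (illustrative; the study-hours and exam-score data are made up) computes the three correlation coefficients named above with SciPy and, tying back to question 1 of this chapter, fits a regression line to predict a new score.

```python
import numpy as np
from scipy import stats

# Made-up data: study hours (X) and exam scores (Y) for 8 students.
hours  = np.array([2, 3, 4, 5, 6, 7, 8, 9])
scores = np.array([55, 60, 58, 65, 70, 72, 78, 85])

# Pearson r: both variables interval/ratio.
r, p = stats.pearsonr(hours, scores)

# Spearman rho: use when variables are ordinal (ranks).
rho, _ = stats.spearmanr(hours, scores)

# Point-biserial: one dichotomous variable (0/1 group) vs. a continuous one.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])            # hypothetical dichotomous variable
rpb, _ = stats.pointbiserialr(group, scores)

# Linear regression: predict an exam score from study hours (Chapter 2, question 1).
slope, intercept, r_value, p_value, se = stats.linregress(hours, scores)
predicted = intercept + slope * 10                     # predicted score for 10 hours of study

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, point-biserial = {rpb:.2f}")
print(f"regression line: y = {intercept:.1f} + {slope:.1f}x; predicted score at x = 10: {predicted:.1f}")
```

As the guide stresses, a strong r here would still say nothing about whether studying causes higher scores.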
Chapter 3
1) Describe "raw scores" and explain their limitations.
- Raw scores are simply the number of items scored or coded in a specific manner
- Limitations: often of limited use by themselves
- To give raw scores more meaning, we need to transform them into other scores (i.e. scaled, derived, or standard scores)
2) Define norm-referenced and criterion-referenced score interpretations and explain their major characteristics.
- Norm-referenced: the examinee's performance is compared to the performance of other people (i.e. intelligence test scores)
  - The most important consideration when making norm-referenced interpretations involves the relevance of the group of individuals to whom the examinee's performance is compared. The reference group from which the norms are derived should be representative of the type of individuals expected to take the test, or of the group to which the examinee is to be compared or referenced
- Criterion-referenced: the examinee's performance is compared to a specified level of performance (i.e. driver's test, mastery test)
  - Emphasizes what the examinee knows or what he/she can do, not the person's standing relative to other test takers
3) List and explain the important criteria for evaluating standardization data.
- Representative
- Appropriate for the application
- Current
  - i.e. Flynn effect: "substantial and long-sustained increase in both fluid and crystallized intelligence test scores measured in many parts of the world from roughly 1930 to the present day"
- Adequate size (the bigger the sample, the better the applicability)
4) Describe the normal curve and explain its importance in interpreting test scores.
- The normal curve (bell curve) is a way to interpret data from different testing platforms, for example comparing two college exam scores from two different tests. The scores from both tests are converted to standard scores and then compared on the normal curve (mean of 0 and standard deviation of 1).
5) Describe the major types of standard scores.
- z-scores: mean of 0 and SD of 1
- T-scores: mean of 50 and SD of 10
- IQs: mean of 100 and SD of 15
- CEEB scores (SAT/GRE): mean of 500 and SD of 100
6) Convert standard scores (e.g., z-scores, T-scores, etc.) from one format to another.
- Formula: new standard score = X̄_SS2 + SD_SS2 × ((X − X̄_SS1) / SD_SS1)
  - X = original standard score
  - X̄_SS1 = mean of the original standard score format
  - SD_SS1 = standard deviation of the original standard score format
  - X̄_SS2 = mean of the new standard score format
  - SD_SS2 = standard deviation of the new standard score format
7) Define normalized standard scores and describe the major types of normalized standard scores.
- Normalized standard scores: underlying distributions that were not originally normal, but were transformed into a normal distribution
- Stanine scores: mean of 5 and SD of 2
- Wechsler scaled scores: mean of 10 and SD of 3
- Normal curve equivalents (NCE): mean of 50 and SD of 21.06
8) Define percentile rank and explain its interpretation.
- Percentile rank: reflects the percentage of people scoring below a given point
  - i.e. a percentile rank of 80 indicates that 80% of the individuals in the standardization sample scored below this score
9) Define grade (and age) equivalents and explain their limitations.
- Grade equivalents: based on percentile ranks; many limitations; subject to misinterpretation
- Age equivalents: same limitations as grade equivalents
10) Describe some common applications of criterion-referenced score interpretations.
- Percentage correct (i.e. a score of 85% on a classroom test)
- Mastery testing (i.e. pass/fail on a driver's license exam)
- Standards-based interpretations (i.e. assigning an "A" to reflect superior work)
11) Explain how tests can be developed that produce both norm-referenced and criterion-referenced interpretations.
- "For example, it would be possible to interpret a student's test performance as 'by correctly answering 75% of the multiplication problems (criterion), the student scored better than 60% of the students in the class (norm).'"
12) Describe Item Response Theory and the properties of IRT or Rasch-type scores.
- IRT scores are similar to traditional raw scores
- They can be transformed to norm- or criterion-referenced scores
- But they are interval-level scores, and they have stable standard deviations across age groups
- They go by different names: Rasch or IRT scores, Change Sensitive Scores (CSS), W scores (WJ-III)
13) Give an example of a qualitative score description and explain its purpose.
- Qualitative descriptions of test scores help communicate test results in an accurate and consistent manner (i.e. Stanford-Binet Intelligence Scale, Behavior Assessment System for Children)
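
A minimal Python sketch of the standard-score conversion formula from question 6, using the z, T, IQ, and CEEB means and SDs listed above; the specific scores being converted are just examples.

```python
def convert_standard_score(x, old_mean, old_sd, new_mean, new_sd):
    """Convert a score from one standard-score format to another:
    new = new_mean + new_sd * (x - old_mean) / old_sd"""
    return new_mean + new_sd * (x - old_mean) / old_sd

# A z-score of +1.5 (mean 0, SD 1) expressed in the other formats from the guide:
z = 1.5
print("T-score:", convert_standard_score(z, 0, 1, 50, 10))      # 65.0
print("IQ:",      convert_standard_score(z, 0, 1, 100, 15))     # 122.5
print("CEEB:",    convert_standard_score(z, 0, 1, 500, 100))    # 650.0

# It works in the other direction too, e.g. an IQ of 130 as a z-score:
print("z from IQ 130:", convert_standard_score(130, 100, 15, 0, 1))  # 2.0
```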
Chapter 4
1) Define and explain the importance of reliability in psychological assessment.
- Reliability refers to the consistency, accuracy, or stability of assessment results
2) Define and explain the concept of measurement error.
- Measurement error: the "difference between a measured value of a quantity and its true value"
- Some degree of measurement error is inherent in all measurement
3) Explain classical test theory and its importance to psychological assessment.
- CTT is the most influential theory for helping us understand measurement issues
- Holds that every observed score has two components:
  - A true score that reflects the examinee's true skills, abilities, knowledge, etc.
  - An error score
4) Describe the major sources of measurement error and give examples.
- Content sampling error: the largest source of error
- Time-sampling error: reflects random fluctuations in performance from one situation or time to another and limits our ability to generalize test results across different situations
  - i.e. some students did not get a good night's sleep the night before the test → decreased performance
- Other sources
  - Administrative and scoring errors (i.e. clerical errors)
  - Inter-rater differences: would the test taker receive the same score if different individuals graded the test?
5) Identify the major methods for estimating reliability.
- Reliability coefficients
  - Test-retest reliability
  - Alternate-form reliability (simultaneous administration or delayed administration)
- Internal consistency coefficients
  - Split-half reliability: dividing the administered test into two equivalent halves that are scored independently
  - Coefficient alpha (Cronbach)
  - Kuder-Richardson reliability (KR-20)
- Inter-rater reliability: multiple people independently score an individual's test
6) Identify the sources of measurement error that are reflected in different reliability estimates.
- Type of reliability → major source of error variance:
  - Test-retest reliability: time sampling
  - Alternate-form reliability (simultaneous administration): content sampling
  - Alternate-form reliability (delayed administration): time sampling and content sampling
  - Split-half reliability: content sampling
  - Coefficient alpha and KR-20: content sampling and item heterogeneity
  - Inter-rater reliability: differences due to raters/scorers
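
As a sketch of one internal-consistency estimate listed in questions 5 and 6, the Python code below computes coefficient alpha from a made-up respondents-by-items score matrix; with dichotomous (0/1) items, the same calculation gives KR-20. This is an illustration, not code quoted from the textbook.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Coefficient alpha from a (respondents x items) score matrix.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    For dichotomous 0/1 items this equals KR-20."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of items
    item_vars = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up data: 6 examinees answering 4 dichotomous items (1 = correct).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(f"coefficient alpha (KR-20): {cronbach_alpha(data):.2f}")
```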
7) Explain how multiple scores can be combined in a composite to enhance reliability.
- This involves combining the scores on several different tests or subtests
- i.e. the WAIS, 3rd Edition, is composed of 11 subtests that are used in the calculation of the Full Scale Intelligence Quotient (FSIQ)
8) Describe the factors that should be considered when selecting a reliability coefficient for a specific assessment application.
- "One should consider factors such as the nature of the construct and how the scores will be used when selecting an estimate of reliability"
- If a test is designed to be given more than one time to the same individuals → test-retest and alternate-forms with delayed administration are ideal
- When a test is designed to be given only one time → internal consistency estimates are ideal
  - i.e. KR-20 and coefficient alpha are ideal for tests measuring a homogeneous domain of knowledge or a unitary characteristic
9) Describe steps that can be taken to improve reliability.
First, the most obvious approach is to increase the number of items on the test. A related check is split-half reliability, which involves administering the test and then dividing it into two equivalent halves that are scored independently; the Spearman-Brown formula can be used to estimate the reliability of the full-length test from the split-half correlation. Second, by using a composite score, instructors can combine multiple measurements into one score. This is useful in situations where various factors limit the number of items that can be included on a single test, such as when a professor develops tests that must fit within the allocated class period. Developing better items using item analyses is another important step for increasing reliability. Usually the ideal item difficulty level is 0.50, meaning that 50% of the test takers answered the question correctly while the other 50% answered it incorrectly. Difficulty can be calculated with the following formula: p = number of examinees correctly answering the item / number of examinees. Item discrimination (D), which refers to how well an item can discriminate among test takers who differ on the construct being measured by the test, should also be taken into account. It can be calculated with the following formula: D = p_T − p_B (the item's difficulty in the top-scoring group minus its difficulty in the bottom-scoring group). If the item has a D value of 0.30 or above, the item is acceptable. If not, the item must be reviewed, since it may be too easy, too hard, or unrelated to the construct. Lastly, you can reduce measurement error by practicing "good housekeeping," such as providing explicit instructions for how to administer and score the test, using reliable scoring procedures, and requiring extensive training before individuals can administer, grade, or interpret a test.
10) Discuss special issues in estimating reliability.
- Reliability of speed tests: ideal to use test-retest or alternate-form reliability
- Reliability as a function of score level
- Range restriction
- Mastery testing: "reliability analyses of mastery tests typically focus on the consistency of classification"
- Correction for attenuation
11) Define the standard error of measurement (SEM) and explain its importance.
The standard error of measurement (SEM) is the standard deviation of the distribution of scores that would be obtained by one person if he/she were tested on an infinite number of parallel forms of a test comprised of items randomly sampled from the same content domain. This means that, even though test scores contain some error, the SEM indicates how much confidence can be placed in the accuracy of a person's test score: the greater the reliability of the test scores, the smaller the SEM and the more confidence we have in the scores' precision.
12) Explain how SEM is calculated and describe its relation to reliability.
The SEM is calculated from the standard deviation of the scores and the reliability coefficient: SEM = SD × sqrt(1 − reliability). As reliability increases, the SEM decreases (an inverse relationship).
13) Explain how confidence intervals are calculated and used in educational and psychological assessment.
A confidence interval is typically computed using the SEM and is used to estimate where an individual's true score lies. Since you only have an observed score, the interval gives an idea of where the true score is likely to fall. For example, if you scored 70 on an exam with an SEM of 3, a 95% confidence interval (z = 1.96) would be 70 plus or minus 1.96 × 3, meaning you can be 95% confident that your true score falls between 64.12 and 75.88.
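
A short Python sketch of the SEM and confidence-interval calculations described in questions 11-13. The SD of 10 and reliability of .91 are made-up values chosen so that the SEM works out to 3, matching the example above.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sem_value, z=1.96):
    """Confidence interval around an observed score (95% when z = 1.96)."""
    return observed - z * sem_value, observed + z * sem_value

# Hypothetical test: SD = 10, reliability = .91 -> SEM = 3.
s = sem(10, 0.91)
low, high = confidence_interval(70, s)
print(f"SEM = {s:.2f}")
print(f"95% CI for an observed score of 70: {low:.2f} to {high:.2f}")  # about 64.12 to 75.88
```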
14) Explain the basics of Generalizability Theory.
- CTT has only an undifferentiated error component
- Generalizability theory shows how much variance is associated with different sources of error
15) Explain the basics of Item Response Theory (IRT).
Item Response Theory (IRT) is a theory of mental measurement that holds that the responses to the items on a test are accounted for by latent traits (latent traits are abilities or characteristics that exist based on behavioral theories but cannot be assessed directly). It is a complex model that describes how examinees at different levels of ability will respond to individual test items.

Chapter 5
1) Define validity and explain its importance in the context of psychological assessment.
- Validity refers to the appropriateness or accuracy of the interpretation of test scores
  - i.e. "If scores are interpreted as reflecting intelligence, do they actually reflect intellectual ability?"
- In other words, the validity of the interpretations of test scores is directly tied to the usefulness of those interpretations
2) Describe the major threats to validity.
- Construct underrepresentation: the test fails to capture important aspects of the construct it is supposed to measure (it measures less than the specified construct)
- Construct-irrelevant variance: the test measures characteristics or content that are unrelated to the specified construct (it measures more than the specified construct)
3) Explain the relationship between reliability and validity.
- A scale can be reliable but not valid
- A scale cannot be unreliable and still be valid (reliability is necessary but not sufficient for validity)
4) Trace the development of the contemporary conceptualization of validity.
- 1974: validity as three types: content validity, construct validity, and criterion validity
- 1985: validity as three interrelated types: content-related validity, criterion-related validity, and construct-related validity
- 1999: validity as a unitary construct, supported by evidence based on content, relations to other variables, response processes, internal structure, and consequences of testing
5) Describe the five categories of validity evidence specified in the 1999 Standards.
- Evidence based on test content
- Evidence based on relations to other variables
- Evidence based on internal structure
- Evidence based on response processes
- Evidence based on consequences of testing
6) For each category of validity evidence, give an example to illustrate the type of information provided.
First, there is evidence based on test content. Item relevance and content coverage are the two major issues. Item relevance involves examining each individual test item and determining whether it reflects essential content in the specified domain. For example, for a test assessing depression symptoms, a question about the frequency of suicidal thoughts would be a relevant item. Content coverage involves looking at the overall test and rating the degree to which the items cover the specified domain. For example, if all of the items deal with depression symptoms, then the test has good content coverage.
Second, validity evidence can be obtained by examining relationships between test scores and other variables. According to the Standards for Educational and Psychological Testing (1999), a criterion is a "measure of some attribute or outcome that is of primary interest." The two types of test-criterion studies are predictive and concurrent studies. A predictive study examines the degree to which a test predicts (correlates with) an external criterion that is measured at some point in the future. For example, does a depression scale predict later clinical diagnosis? Does the SAT score predict later college GPA? In contrast, a concurrent study examines the degree to which the measure correlates with a criterion measured at the same time. For example, you could administer the SAT to students in their first semester of college and then correlate their SAT scores with their GPA at the end of the semester. Convergent evidence occurs when you correlate a test with existing tests that measure the same or similar constructs.
For example, you could make your own intelligence test and then compare the scores with scores on the Wechsler Intelligence Scale for Children: if there is a strong correlation between the two, that is evidence that your intelligence test actually measures intelligence. In contrast, discriminant evidence occurs when you correlate a test with existing tests that measure dissimilar constructs. For example, you might correlate depression scale scores with stress scale scores; since depression and stress are not the same construct, you would expect a relatively weak correlation between the measures. Lastly, contrasted group studies involve examining groups that are expected to differ on the construct measured by the test. For example, for an intelligence measure, you could compare a group of individuals with mental retardation to a group of normal control participants.
The third category is evidence based on internal structure, which means that the relationships among test items are consistent with the construct the test is designed to measure. Factor analysis is used to determine the number of conceptually distinct factors or dimensions underlying a test or a battery of tests. For example, you could create a personality test based on the Big Five and then run a factor analysis to support the validity of the test you created.
The fourth category is evidence based on response processes. This involves analyzing the fit between the construct being assessed and the performance and actions the examinees actually engage in. For example, if you want to test an examinee's reasoning ability, it is important to examine the examinee's response process to verify that they are actually engaged in analysis. You could conduct an interview, record response times and eye movements, or analyze the types of errors committed by the examinee.
The fifth and final category is evidence based on consequences of testing. For example, if a survey about professors is given to students at the end of the semester, the assumption is that the results will lead to better teaching.
7) Explain how validity coefficients are interpreted.
- Predictive and concurrent validity studies examine the relationship between a test and a criterion, and the results are often reported in terms of a validity coefficient
- Coefficients should be large enough to indicate that information from the test will help predict how individuals will perform on the criterion measure (ideally 0.7)
8) Define the standard error of estimate and explain its interpretation.
- The standard error of estimate describes the amount of prediction error due to the imperfect validity of the interpretation of a test score
- "Allows researchers to calculate confidence intervals around a client's predicted score that reflect a range of scores that will include his or her actual score with a prescribed probability."
9) Describe the steps in factor analysis and how factor analytic results can contribute evidence of validity.
- Begin with a table of intercorrelations among the variables (a correlation matrix)
- Select a factoring technique and apply it to the data
- Determine how many factors to retain
- Create a factor matrix that reflects the correlations between the variables and the factors
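
The sketch below walks through the four factor-analysis steps listed in question 9 on made-up data. The guide does not name a particular factoring technique, so this illustration uses a simple principal-components extraction and the common eigenvalue-greater-than-1 rule for deciding how many factors to retain; both are assumptions for the example, not the textbook's prescribed method.

```python
import numpy as np

# Made-up scores for 6 variables thought to tap two dimensions (rows = people).
rng = np.random.default_rng(0)
verbal = rng.normal(size=(100, 1))
spatial = rng.normal(size=(100, 1))
data = np.hstack([
    verbal + rng.normal(scale=0.5, size=(100, 3)),   # three "verbal" variables
    spatial + rng.normal(scale=0.5, size=(100, 3)),  # three "spatial" variables
])

# Step 1: table of intercorrelations among the variables (correlation matrix).
R = np.corrcoef(data, rowvar=False)

# Step 2: apply a factoring technique (here, a simple principal-components extraction).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: decide how many factors to retain (eigenvalues greater than 1).
n_factors = int(np.sum(eigvals > 1))

# Step 4: factor matrix (loadings = correlations of variables with retained factors).
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

print("factors retained:", n_factors)   # expect 2 for this made-up data
print(np.round(loadings, 2))
```

A loading pattern in which the first three variables load on one factor and the last three on the other would support the intended two-dimensional structure, i.e. evidence based on internal structure.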
10) Explain how validity evidence is integrated to develop a sound validity argument.
Validity evidence must be integrated to develop a sound validity argument because the validity of an intended interpretation ultimately rests on the overall technical quality of the testing system, including evidence of careful test construction, adequate score reliability, appropriate test administration and scoring, equating and standard setting, and attention to fairness.
11) Review validity evidence presented in a test manual and evaluate the usefulness of the test scores for specific purposes.
- "Ultimately, the validity of an intended interpretation of test scores relies on all the evidence relevant to the technical quality of a testing system. This includes evidence of careful test construction; adequate score reliability; appropriate test administration and scoring; accurate score scaling, equating, and standard setting; and careful attention to fairness for all examinees" (Standards, p. 17)

Chapter 6
1) Distinguish between objective and subjective items.
- Objective items: multiple-choice, true/false (only one correct answer); high level of agreement among scorers
- Subjective items: essays (graded using the individual scorer's judgment); lower level of agreement among scorers
2) Distinguish between selected-response and constructed-response items.
- Selected-response: the answer is chosen from options provided (i.e. multiple choice); tends to emphasize recall and recognition
- Constructed-response: requires the test taker to construct or create a response (i.e. short answer); can tap a deeper level of processing
3) Specify the strengths and limitations of selected-response and constructed-response items and the specific item formats.
- Selected-response items
  - Pros: a relatively large number of items can be included on the test; they can be scored in an efficient, objective, and reliable manner; they are flexible and can be used to assess a wide range of abilities; they can reduce the influence of certain construct-irrelevant factors
  - Cons: difficult to write; not able to assess all abilities; subject to random guessing
- Constructed-response items
  - Pros: easier to write; well suited for assessing higher-order cognitive abilities and complex task performance; eliminate random guessing
  - Cons: fewer items can be included on a test; more difficult to score in a reliable manner; vulnerable to feigning; vulnerable to the influence of construct-irrelevant factors
4) Understand and describe general guidelines for developing items.
- Provide clear directions
- Present the question, problem, or task in as clear and straightforward a manner as possible
- Develop items and tasks that can be scored in a decisive manner
- Avoid inadvertent cues to the answers
- Arrange items in a systematic manner
- Ensure that individual items are contained on one page
- Tailor the items to the target population
- Minimize the impact of construct-irrelevant factors
- Avoid using the exact phrasing from study materials
- Avoid using biased or offensive language
- Use a print format that's clear and easy to read
- Determine how many items to include

Chapter 7
1) Discuss the relationship between the reliability and validity of test scores and the quality of the items on a test.
- "The reliability of test scores and the validity of the interpretation of test scores are dependent on the quality of the items on the test. If you can improve the quality of the individual items, you will improve the overall quality of your test."
2) Describe the importance of the Item Difficulty Index and demonstrate its calculation and interpretation.
- Item difficulty: the percentage or proportion of test takers who correctly answer the item
- Formula: p = number of examinees correctly answering the item / number of examinees
- For maximizing variability and reliability, the optimal item difficulty is 0.50
  - Not necessary for ALL items, though
  - Different levels are desirable in different testing situations
  - Constructed-response items: 0.50 is typical
  - Selected-response items: the optimal level varies
3) Describe the importance of Item Discrimination and demonstrate its calculation and interpretation.
- Item discrimination refers to how well an item can discriminate (differentiate) among test takers who differ on the construct being measured by the test
- Calculation (a short code sketch of these calculations appears after question 7 below):
  - p_T = item difficulty for the top XX% of test takers
  - p_B = item difficulty for the bottom XX% of test takers
  - The difference between p_T and p_B is the discrimination index, designated D, calculated with the following formula: D = p_T − p_B
- Items with D values over 0.30 are acceptable (the larger the better)
- Items with D values below 0.30 should be reviewed; they might be too easy, too hard, or unrelated to the construct
4) Describe the relationship between item difficulty and discrimination.
- Items at varying levels of difficulty should be able to discriminate between test takers with varying levels of ability if the item is "good"; if not, then anyone across the ability spectrum can answer it correctly
5) Describe how item-total correlations can be used to examine item discrimination.
- Discrimination can be examined by correlating performance on the item (scored as either 0 or 1) with the total test score
- "Item-total correlations calculated on the adjusted total will be lower than those computed on the unadjusted total and are preferred because the item being examined does not 'contaminate' or inflate the correlation"
- Higher correlations indicate that the item and the test are measuring the same thing
6) Describe how the calculation of Item Discrimination can be modified for mastery tests.
- For mastery tests, traditional item discrimination indexes tend to underestimate an item's true measurement characteristics
- Use two groups, one that has received instruction and one that has not
  - Formula: D = p_instruction − p_no instruction
- Or administer the same test twice: D = p_posttest − p_pretest
  - Limitations: requires that the test be used as both a pretest and a posttest; may involve carryover effects
- Or use the cut score to determine the groups: D = p_mastery − p_nonmastery
  - Can be calculated from the data of one test administration with one sample
7) Describe the importance of Distracter Analysis and describe how the selection of distracters influences item difficulty and discrimination.
- Distracter analysis allows you to examine how many examinees in the top and bottom groups selected each option on a multiple-choice item
- An effective distracter should:
  - Be selected by some examinees
  - Attract more examinees in the bottom group than the top group (i.e. demonstrate negative discrimination)
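
A small Python sketch of the item difficulty (p) and item discrimination (D = p_T − p_B) calculations from questions 2 and 3, applied to a made-up response matrix. Real analyses often define the top and bottom groups as the highest- and lowest-scoring 27%; this tiny example simply splits the sample in half.

```python
import numpy as np

# Made-up results: rows = examinees, columns = items, 1 = correct, 0 = incorrect.
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
])

# Item difficulty: p = number answering the item correctly / number of examinees.
p = responses.mean(axis=0)

# Item discrimination: D = p_T - p_B, comparing high and low scorers.
totals = responses.sum(axis=1)
order = np.argsort(totals)[::-1]              # examinees sorted from highest to lowest total
half = len(responses) // 2
p_top = responses[order[:half]].mean(axis=0)
p_bottom = responses[order[half:]].mean(axis=0)
D = p_top - p_bottom

for i, (pi, di) in enumerate(zip(p, D), start=1):
    flag = "acceptable" if di >= 0.30 else "review"
    print(f"item {i}: p = {pi:.2f}, D = {di:+.2f} ({flag})")
```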
8) Show how item analysis statistics can be used to improve test items.
- Item difficulty: the optimal difficulty level is 0.5, indicating that half of the examinees answered the item correctly and half answered it incorrectly
- Item discrimination: D values over 0.3 are acceptable, while items below 0.3 need review
- Distracter analysis:
  - First, a properly functioning distracter should "distract" some examinees; if a distracter is obviously wrong and no one selects it, it is useless and deserves attention
  - Second, discrimination: effective distracters should attract more examinees in the bottom group than in the top group
- Qualitative approaches
  - Proofread the test after setting it aside for a few days
  - Have a trusted colleague review the test
  - Test developers should get feedback from examinees regarding the clarity of directions and the identification of ambiguous items
9) Describe how item analysis procedures can be applied to performance assessments.
- Consider a cut score on a performance exam and what that qualitatively means; you would want items that discriminate between someone who has what it takes and someone who doesn't
10) Describe the major concepts related to and interpret Item Characteristic Curves.
The item characteristic curve (ICC) is a graph with ability on the horizontal axis and the likelihood of a correct response on the vertical axis. The halfway point between the upper and lower asymptotes is the inflection point, and it represents the difficulty of the item (the b parameter). Discrimination (the a parameter) is reflected by the slope of the ICC at that point. In the 1-parameter model of IRT, or the Rasch model, items differ only in difficulty (b); all items are assumed to have equal discrimination. Thus the ICCs for the items have the same S shape and slope but differ in their inflection points. The 2-parameter model assumes that the items differ in both difficulty and discrimination, and thus differ in their inflection points and in their slopes; it reflects real-life test development applications more closely than the Rasch model. The 3-parameter model adds a guessing parameter (c): even if an examinee has very low ability, there is still a probability that he/she may answer the question correctly simply by chance. This model is usually used with selected-response items. For example, an examinee has roughly a 25% chance of correctly answering a four-option multiple-choice item by guessing, and a 50% chance of getting a true-false question right by guessing.
11) Explain the major aspects of Item Response Theory.
Item Response Theory (IRT) is a theory of mental measurement that holds that the responses to the items on a test are accounted for by latent traits (latent traits are abilities or characteristics that exist based on behavioral theories but cannot be assessed directly). It is a complex model that describes how examinees at different levels of ability will respond to individual test items.
12) Identify major contemporary applications of Item Response Theory.
- Computer adaptive testing (CAT)
- Detecting biased items
- Scores based on item response theory
- Reliability
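
To illustrate the item characteristic curves described in question 10, here is a minimal Python sketch of the 3-parameter logistic model; the parameter values are arbitrary, illustrative choices. Setting c = 0 gives the 2-parameter model, and additionally fixing a to a common value for all items gives the 1-parameter (Rasch) model.

```python
import math

def icc(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3-parameter logistic model.
    a = discrimination (slope), b = difficulty (inflection point), c = guessing floor."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, difficulty b = 0.5,
# and a 25% guessing floor (like a four-option multiple-choice item).
for theta in (-2, -1, 0, 0.5, 1, 2):
    print(f"theta = {theta:+.1f}: P(correct) = {icc(theta, a=1.2, b=0.5, c=0.25):.2f}")
```

Note how the probability never drops below the guessing floor of .25 and passes through the midpoint between c and 1 at theta = b, which is what the inflection point of the ICC represents.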
Chapter 8
1) Describe the characteristics of standardized tests and explain why standardization is important.
- Assessment tests are designed to assess an examinee's knowledge or skills in a content domain in which he or she has received instruction
2) Describe the major characteristics of achievement tests.
- Standardized achievement tests typically contain high-quality items that were selected on the basis of both quantitative and qualitative item analysis procedures
- They have precisely stated directions for administration and scoring so that consistent procedures can be followed in different settings
- Many contemporary standardized achievement tests provide both norm-referenced and criterion-referenced interpretations
- Normative data are based on large, representative samples
- Equivalent or parallel forms of the test are often available
- They have professionally developed manuals and support materials that provide extensive information about the test, how to administer, score, and interpret it, and its measurement characteristics
3) Describe the major uses of standardized achievement tests in schools.
- Track student achievement over time or compare group achievement across classes, schools, or districts
- Standardized tests are increasingly being used in high-stakes decision making in public schools
- Help identify academic strengths and weaknesses of individual examinees
- Used to evaluate the effectiveness of instructional programs or curricula and help educators identify areas of concern
- Identify students who have special educational needs
4) Explain what "high-stakes testing" means and trace the historical development of this phenomenon.
- "High-stakes testing": any test used to make important decisions about students, educators, schools, or districts, most commonly for the purpose of accountability, i.e. the attempt by federal, state, or local government agencies and school administrators to ensure that students are enrolled in effective schools and being taught by effective teachers
- During the 1970s, news reports indicated that students couldn't demonstrate basic academic skills (i.e. reading and writing)
  - Parents questioned the quality of the education their children were receiving
  - Legislators implemented statewide minimum-competency testing programs
  - These were meant to guarantee that public school graduates met minimum academic standards
  - Eventually, many schools developed more sophisticated assessment programs that used both state-developed tests and commercially produced, nationally standardized achievement tests
- 2001: No Child Left Behind Act: each state should develop high academic standards and implement annual assessments to monitor the performance of states, districts, and schools
5) Compare and contrast group-administered and individually administered achievement tests and describe the strengths and weaknesses of each.
- Group-administered tests: tests that can be administered to more than one examinee at a time
- Individually administered tests: tests that can be administered to only one individual at a time
6) Discuss the major issues and controversies surrounding state and high-stakes testing programs.
- Group-administered tests
  - Pros: efficiency; more uniform testing; can be scored objectively; large standardization or normative samples
  - Cons: little personal interaction with examinees; limited by the types of items typically included on group achievement tests; lack of flexibility
- Individually administered tests
  - Pros: more thorough assessment of the examinee's skills; the examiner can observe the examinee closely and gain insight into the source of learning problems; scored individually; room for open-ended questions
  - Cons: costly; time-consuming
7) Describe and evaluate common procedures used for preparing students for standardized tests.
- Pros: increase academic expectations and ensure that all students are judged according to the same standards
- Cons: emphasize rote learning and often neglect critical thinking, problem solving, and communication skills
- Critics also argue that the tests are culturally biased and aren't fair to minority students
8) Describe and explain the major factors that should be considered when selecting standardized achievement tests.
- The content covered, the test's technical properties, and practical issues such as cost and time requirements
9) Describe and explain major factors that should be considered when developing a classroom achievement test.
- Specify educational objectives
- Develop a test blueprint
- Determine how the scores will be interpreted
- Select the item format
- Assignment of grades
10) Describe major factors that should be considered when assigning student grades.
- Base grades on academic achievement
- Choose a frame of reference
- Report student progress
- Keep grades confidential
11) Describe some of the uses of achievement tests outside of the school setting.
- Driver's licenses; professional licensing (i.e. passing the bar or boards); tests for job promotions

Chapter 9
1) Compare and contrast the constructs of achievement and intelligence/aptitude.
- "We defined achievement tests as those designed to assess students' knowledge or skills in a content domain in which they have received instruction. In contrast, aptitude tests are designed to measure the cognitive skills, abilities, and knowledge that individuals have accumulated as the result of their overall life experiences"
2) Discuss the major milestones in the history of intelligence assessment.
- Early 1900s
  - Theodore Simon and Alfred Binet devised a test aimed at identifying schoolchildren with intellectual disability, laying the groundwork for today's intelligence testing
  - Binet developed an intelligence test that assessed general cognitive abilities (attention, judgment, and reasoning skills)
    - Mental age vs. chronological age
      - Chronological age: how old one is
      - Mental age: age that reflects a child's mental abilities in comparison to the "average" child
  - Lewis Terman adapts the Binet-Simon scale → Stanford-Binet
    - Intelligence Quotient = (MA/CA) x 100
  - David Wechsler challenged shortcomings of the Stanford-Binet
    - Scores on individual subscales measuring different mental abilities
    - Deviation IQ is based on the normal distribution
  - Other tests have come out in the last 30 years
    - Kaufman Assessment Battery for Children (KABC)
    - Differential Ability Scale (DAS)
    - Woodcock-Johnson Tests of Cognitive Abilities (WJ)
  - The Wechsler scales remain the most frequently used individually administered intelligence scales
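
A tiny worked example (made-up numbers) of the two IQ metrics mentioned above: the ratio IQ formula from the early Stanford-Binet and the deviation IQ used by Wechsler.

```python
# Ratio IQ from the early Stanford-Binet: IQ = (mental age / chronological age) * 100.
mental_age, chronological_age = 10, 8
ratio_iq = (mental_age / chronological_age) * 100
print("ratio IQ:", ratio_iq)                      # 125.0

# Deviation IQ (Wechsler): place the raw score on a normal distribution
# with mean 100 and SD 15 relative to same-age peers.
raw, age_group_mean, age_group_sd = 62, 50, 8     # made-up norm-group values
z = (raw - age_group_mean) / age_group_sd
deviation_iq = 100 + 15 * z
print("deviation IQ:", deviation_iq)              # 122.5
```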
3) Describe the major uses of aptitude and intelligence tests.
- Provide alternative measures of cognitive abilities that reflect information not captured by standard achievement tests or school grades
- Help educators tailor instruction to meet a student's unique pattern of cognitive strengths and weaknesses
- Assess how well students are prepared to profit from school experiences
- Identify clients who are underachieving and may need further assessment to rule out learning disabilities or other cognitive disorders, including mental retardation, as well as disability determination
- Identify students for gifted and talented programs
- Provide a baseline against which other client characteristics may be compared
- Help guide students and parents with educational and vocational planning
4) Describe and evaluate (i.e., differentiate) the 3 major individually administered intelligence tests (ignore the RIAS).
- Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV)
  - Most popular individual intelligence test used in clinical and school settings with children
  - Approximately 2 to 3 hours to administer and score, depending on which subtests are administered and whether scoring is done by hand
  - Normed for use with children aged 6-16; different versions of the Wechsler scales must be used for different ages
  - Produces four index scores: Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), Processing Speed Index (PSI)
  - This four-index framework is based on factor analytic and clinical research
- Stanford-Binet Intelligence Scales, 5th Edition (SB5)
  - Ages 2-85
  - Produces five factor indexes: fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, working memory
  - Three IQ scores: verbal, nonverbal, full scale
- Woodcock-Johnson III Tests of Cognitive Abilities (WJ III Cog)
  - For use with individuals 2 to 90 years of age
  - Based on the Cattell-Horn-Carroll (CHC) theory of cognitive abilities
  - Comprehensive assessment of CHC abilities
5) Identify the major college admission tests and describe their use. (p. 295-297)
- Scholastic Assessment Test (SAT)
  - Created by the College Board and designed to provide colleges and universities with a valid measure of students' academic abilities
  - 3 sections: Critical Reading, Mathematics, Writing
  - Preliminary SAT (PSAT): practice for the SAT; identifies a student's academic strengths and weaknesses
- American College Test (ACT)
  - Designed to assess the academic development of high school students and predict their ability to complete college work
  - Four skill areas: English, Mathematics, Reading, and Science Reasoning
  - Later included an optional 30-minute writing test

Chapter 10
1) Compare and contrast maximum performance tests and typical response tests.
- Maximum performance tests: items are scored correctly or incorrectly; examinees should do their best (i.e. achievement and aptitude tests)
- Typical response tests assess constructs such as personality, behavior, attitudes, or interests
2) Define and give examples of "response sets."
- Response sets: the examinee unconsciously responds in either a positive or negative manner
- i.e. an individual who is hoping to win a large settlement in a court case might exaggerate the mental distress he or she is experiencing as the result of a traumatic event
3) Explain how test "validity scales" can be used to guard against response sets and give examples.
- "Validity scales" are designed to detect individuals who are not responding in an accurate manner (i.e.
"I never lie", "My life is perfect")
4) Explain factors that make the assessment of personality more challenging in terms of reliability and validity.
- Response sets may compromise the validity of the results
- Response biases are more problematic on personality assessments
- Constructs measured by personality tests may be less stable than the constructs measured by maximum performance tests
  - Trait vs. state
  - When measuring unstable variables, psychologists will refer to the test-retest reliability of the score as poor or as having poor accuracy when in fact the test has measured the variable quite well
5) Distinguish between objective and projective personality tests and give examples of each.
- Objective self-report measures: endorse selected-resp

