Educational Testing and Measurement
Educational Testing and Measurement EDP 560
These 103 pages of class notes were uploaded by Ebony Nikolaus on Thursday, October 15, 2015. The notes belong to EDP 560 at North Carolina State University, taught by John Nietfeld in Fall.
Date Created: 10/15/15
Achievement Tests

Functions of Standardized Tests
- Student assessment
- Diagnosis
- Placement and selection
- Accountability
- Predictive validity

Advantages of Standardized Tests
- Evaluating students' general educational development in the basic skills and in learning outcomes common to many courses of study
- Evaluating student progress during the school year or over a period of years
- Determining strengths and weaknesses

Weaknesses of Standardized Tests
- Evaluating the learning outcomes and content unique to a particular class or school
- Evaluating students' day-to-day progress
- Evaluating knowledge of current developments in rapidly changing content areas such as science and social studies

Types of Achievement Tests
- State-developed tests: NC EOGs
- Publisher-developed norm-referenced batteries: ITBS, CAT, GRE, Stanford Achievement Test
- Publisher-developed norm-referenced content-area tests: Nelson-Denny, Gates-MacGinitie
- Publisher-developed criterion-referenced tests

Some popular tests to be aware of
- California Achievement Test (CAT)
- Iowa Tests of Basic Skills (ITBS)
- Metropolitan Achievement Tests (MAT)
- Stanford Achievement Tests
- TerraNova

Why do we use these?
- High technical quality
- Standard directions for administration and scoring
- Norms based upon large national samples
- Equivalent forms
- Comprehensive manuals

Using Achievement Tests
- Be wary of using subtests for diagnostic purposes unless enough items are included
- What is a norm group, and what is the benefit of having a norm group? Norm groups provide a standard frame of reference
- Equivalent forms

Achievement Batteries
- Consist of a series of individual tests, all standardized on the same national sample

Where to find info on tests
- Mental Measurements Yearbook (MMY)
- MMY is published by the Buros Center for Testing: http://www.unl.edu/buros/

Some Aptitude/Intelligence tests to be aware of
- Wechsler Intelligence Tests (WISC, WAIS)
- Stanford-Binet
- Raven's Advanced Progressive Matrices
- Cognitive Abilities Test (CogAT)
- Graduate Record Examination (GRE)
- Otis-Lennon School Ability Test (OLSAT)
- Cattell Culture Fair Intelligence Tests
- Armed Services Vocational Aptitude Battery (ASVAB)
- Differential Aptitude Test (DAT)

Aptitude Tests
- Do not measure fixed capacity, but rather a different type of ability used to predict future performance
- Common distinction: achievement tests measure what a student has learned, and aptitude tests measure the ability to learn new tasks

Why use aptitude tests when you have achievement tests?
- Can be administered in a relatively short time
- Can be used with students of more widely varying educational backgrounds
- Can be used before any training or instruction

Testing Controversies
- Is there something wrong with a student getting good grades and getting accepted to multiple colleges, but not being able to pass a state-mandated minimum competency test required to graduate?
- If you lived in Scarsdale, would you have been protesting?
- Do we test too much?
- What did you think about what the woman said about the tests driving the curriculum?

When data are skewed, they do not possess the characteristics of the normal curve distribution. For example, 68% of the subjects do not fall within one standard deviation above or below the mean, and the mean, mode, and median do not fall on the same score. The mode will still be represented by the highest point of the distribution, but the mean will be toward the side with the tail, and the median will fall between the mode and the mean.

Nonsymmetrical
1. Positive skew: high number of low scores
2. Negative skew: high number of high scores

Non-bell-shaped curves
1. U-shaped curves
2. Bimodal or multimodal distributions

Mode
1. Unimodal: single peak
2. Bimodal: double peak

Skewed Curve
- Mode: peak of distribution
- Median: in middle
- Mean: closest to tail of distribution

Descriptive Statistics
[Diagram; legible labels: Standard Deviation, Z Score, Percentile Ranks]

Standard Deviation
- Accurate measure of dispersion: how spread out the scores are
- Average distance
of each score in a distribution from the mean

Normal Distribution (Bell Curve) With Wechsler Intelligence Test Scores
[Figure: bell curve. x-axis: Wechsler IQ score (55, 70, 85, 100, 115, 130, 145); y-axis: number of scores. Band percentages from left to right: about 2%, 14%, 34%, 34%, 14%, 2%, so roughly 68% of scores fall within one SD of the mean and 95% within two.]

Questions about Grade Equivalent Scores
- What does it mean if a 4th grader has a grade equivalent score of 7.3?
- What is the purpose of grade equivalent scores?

Avoid Misconceptions with Grade Equivalent Scores
- Don't confuse norms with standards of what should be
- Don't interpret a grade equivalent as an estimate of the grade where a student should be placed
- Don't expect that all students should gain 1.0 grade equivalent each year
- Don't assume that the units are equal at different parts of the scale
- Don't assume that scores on different tests are comparable
- Don't interpret extreme scores as dependable estimates of student performance level

Some Popular Standard Scores
Mean   SD    Scale Name
500    100   GRE, SAT, GMAT
100    15    Wechsler IQ
20     5     ACT
50     10    T-scale (MMPI)

Standard Scores
- A calculated score that enables a researcher to compare scores from different scales
- Z-score: most popular; mean of zero and SD of one
- z = (X - M) / SD

T Scores
- Another type of standard score, sometimes preferred because all numbers are positive
- T score = 50 + 10z
- Example: If you scored 2 SDs above the mean on a reading test, your T score would be 50 + 10(2) = 70

Stanines
- Another type of standard score, preferred for interpretability
- Score between 1 and 9, with each stanine covering 1/2 standard deviation unit (e.g., a stanine of 5 covers the 40th to 59th percentiles)

Judging the Adequacy of Norms
- Should be relevant: Who do you want to compare your scores to?
- Should be representative
- Should be up to date
- Should be comparable
- Should be adequately described

National Testing Program: NAEP
- NAEP: National Assessment of Educational Progress
- Formulated in the 1960s to provide benchmarks of educational attainment
- Now tests at grades 4, 8, and 12
- Subject areas change depending on year
- Includes multiple-choice and
open-ended test items

Self-Report Tests
- Keep in mind that self-report inventories assume that individuals are WILLING AND ABLE to report accurately; this may be a big assumption
- Try to also collect some observable measure of the trait of interest

Interest Inventories
- Strong-Campbell Interest Inventory: suggests particular occupations based upon responses
- Self-Directed Search (SDS) Career Explorer: classifies individuals according to occupational themes (realistic, investigative, artistic, social, enterprising, conventional)

Personality
- Personality refers to a person's unique and relatively stable pattern of thoughts, feelings, and actions
- Personality is an interaction between biology and environment
- Genetic studies suggest heritability of personality
- Other studies suggest learned components of personality

Measures of Personality
- Personality refers to a person's unique and relatively stable pattern of thoughts, feelings, and actions (traits)
- Interviews
  - Unstructured: "Tell me about yourself"
  - Structured: set list of questions
- Observation: psychologist learns about personality by observing the person
- Objective tests: self-inventories that involve paper and pencil
- Projective tests: subjects reveal aspects of their personality when they talk about ambiguous stimuli

Personality Measurement Issues
Objective self-report personality tests can be criticized on the basis of:
- Deliberate deception and social desirability bias: Can the test detect deception and attempts to enhance social desirability?
- Inappropriate use: when tests are used for purposes other than their designed use (e.g., use of a personality test to decide a presidential election)

The Big 5
Modern personality research argues for 5 basic personality traits (OCEAN):
- Openness: whether a person is open to new experiences
- Conscientiousness: whether a person is disciplined and responsible
- Extroversion: whether a person is sociable, outgoing, and affectionate
- Agreeableness: whether a person is cooperative, trusting, and helpful
- Neuroticism: whether a person is unstable and
prone to insecurity

Overview of the Big 5
Trait                   Low Scorers       High Scorers
1. Openness             Down-to-earth     Imaginative
                        Uncreative        Creative
                        Conventional      Original
                        Uncurious         Curious
2. Conscientiousness    Negligent         Conscientious
                        Lazy              Hard-working
                        Disorganized      Well-organized
                        Late              Punctual
3. Extroversion         Loner             Joiner
                        Quiet             Talkative
                        Passive           Active
                        Reserved          Affectionate
4. Agreeableness        Suspicious        Trusting
                        Critical          Lenient
                        Ruthless          Soft-hearted
                        Irritable         Good-natured
5. Neuroticism          Calm              Worried
                        Even-tempered     Temperamental
                        Comfortable       Self-conscious
                        Unemotional       Emotional

Projective Tests
- Projection is an idea developed by Freud in which people are thought to reveal their true feelings and thoughts when describing ambiguous stimuli
- A projective test presents a series of ambiguous stimuli and asks that a subject describe each stimulus
- The idea is that their verbal descriptions will reveal key aspects of their personality

Specific Projective Tests
- Rorschach test
  - Consists of 10 inkblots
  - Reliability and validity of this test are low
- Thematic Apperception Test (TAT)
  - TAT also consists of a series of ambiguous figures
  - Was devised by Henry Murray in 1938 to measure achievement motivation

TAT
Sample response: "That makes me think of the garden. It is the city in the country, very much so. It looks like New York, with the Empire State Building right there. Calming, relaxing. There's a tree there, so you can see the countryside, and you've got the background with the city and the buildings, so it's a regional focus."

Rorschach Inkblot Test
[Figure: inkblot]

What are objectives good for?
- Have a master plan, a roadmap
- Top-down planning
  - Broad objectives for class
  - Objectives for large units
  - Specific behavioral objectives
- Ensure assessment at all levels of knowing (Bloom's Taxonomy)

Current trends in learning objectives
- Shift from learning discrete facts to complex performance, reflected in a greater emphasis from cognitive psychology as opposed to behavioral psychology
- Behaviorism: bottom-up approach
- Cognitive/Constructivist
approach believes in teaching for and assessing higher-level skills in the learning process

Bloom's Taxonomy (Cognitive)
[Figure: hierarchy of cognitive levels; legible labels: Knowledge, Comprehension]
Bloom's Taxonomy provides a useful way of describing the complexity of a cognitive objective by classifying it into one of six hierarchical categories, ranging from the most simple to the most complex.

Guidelines for writing objectives
- Learning objectives should be MEASURABLE
- Use verbs that are specific and indicate observable responses
- General objectives should provide a comprehensive yet parsimonious overview of course content
- Specific objectives focus on content within each major unit of the class/course

Bloom's Taxonomy of Educational Objectives
Each entry gives the level, its description, and an example.
- Knowledge: Rote memory, learning facts. Example: Name each state capital.
- Comprehension: Summarize, interpret, or explain material. Example: Summarize the use of symbols on a map.
- Application: Use general rules and principles to solve new problems. Example: Write directions for traveling by numbered roads using a map.
- Analysis: Reduction of concepts into parts and explaining the relationship of parts to the whole. Example: Describe maps in terms of function and form.
- Synthesis: Creation of new ideas or results from existing concepts. Example: Construct a map of a hypothetical country with given characteristics.
- Evaluation: Judgment of value or worth. Example: Evaluate the usefulness of a map to enable travel from one place to another.

Examples of objectives at different levels
- Students will be able to identify important contributions of Skinner embedded in a multiple-choice format
- Students will be able to apply the method of loci mnemonic when studying for their quiz
- Students can distinguish between fixed-interval and variable-interval reinforcement schedules
- Students will be able to synthesize information from the course and personal experience to create a sophisticated visual representation for effective instruction
- Given an argument supporting the use of extrinsic rewards, students will
be able to break down the premises into those which are logical and those which are fallacies

Bloom's Taxonomy of Learning Outcomes
Bloom (editor), 1956. The Taxonomy of Educational Objectives: The Classification of Educational Goals. Handbook I: Cognitive Domain.

Six categories of cognitive learning
- Knowledge: recall; knowledge of specifics, ways of dealing with specifics, facts, generalizations, theories, and structures
- Comprehension: interpretation, extrapolation, summarizing
- Application: ability to use learned material in a practical manner or within a new situation, using rules and principles
- Analysis: criticize, deconstruct, identify assumptions
- Synthesis: relating one theory to another, combining and reconstructing ideas, seeing relationships
- Evaluation: the ability to appraise, assign value, assess arguments, etc.

Bloom's Taxonomy: Verbs for Writing Instructional Objectives
- Knowledge: arrange, define, duplicate, label, list, memorize, name, order, recognize, reproduce, state
- Comprehension: classify, describe, discuss, explain, express, identify, indicate, locate, recognize, report, restate, review, select, translate
- Application: apply, choose, demonstrate, dramatize, employ, illustrate, interpret, operate, practice, schedule, sketch, solve, use, write
- Analysis: analyze, appraise, calculate, categorize, compare, contrast, criticize, differentiate, discriminate, distinguish, examine, experiment, question, test
- Synthesis: arrange, assemble, collect, compose, construct, create, design, develop, formulate, manage, organize, plan, prepare, propose, set up, write
- Evaluation: appraise, argue, assess, attach, choose, compare, defend, estimate, judge, predict, rate, score, select, support, value, evaluate

Bloom's Taxonomy: Criticism
- Almost 50 years old
- Behaviorist approach
- Developed before we understood the cognitive processes involved in learning and performance
- The categories or levels of Bloom's taxonomy (Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation) are not supported by any research on learning

Table of Specifications or Test Blueprint
The method of ensuring
congruence between classroom instruction and test content is the development and application of a table of specifications, which is also referred to as a test blueprint.

Table 7.5 Table of Specifications for Test on Chapter 2 (Based on Content Areas)
Columns: Content Area; number of items at each Level of Objective (Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation); Total.
Item counts are listed left to right with blank cells omitted, so the level each count belongs to is not shown.
Content Area                    Items per level    Total
Scales of Measurement           2, 2, 2            6
Measures of Central Tendency    3, 3               6
Measures of Variability         3, 3, 3            9
Correlation & Regression        2, 3, 2, 2         9

How does a teacher decide what to put on a test?
Consulting Published Sources
- Using test questions from a publisher can lead to high-quality tests or very low-quality tests
- Typically checked for errors or potential problems (not true ALL the time, however)
- May not adequately reflect the material that was taught
- May not test at the level YOU want or need
Using Instructional Objectives
- Ideally a test should cover ALL the objectives of the class

The When and How of Testing
- More frequent, shorter tests: students tend to leave their studying until just before the test, so the more often they are tested, the more they will study
- Consider testing conditions: poor conditions can depress test performance. Be attentive to the conditions the student will face
- Ensure clear directions

Developing an Assessment: Types of Items
- Selected-response items require a student to select a response from available alternatives (multiple-choice, true/false, and matching items)
- Constructed-response items require students to create or construct a response (fill-in-the-blank, short answer, essay items, performance assessments, and portfolios)

Selected Response vs. Constructed Response: Which type is better?
- There is no consistent advantage of one over the other
- One is not inherently superior to the other
- Select the item type that provides the most direct measure of the intended learning outcome

Should a classroom assessment instrument be very hard, very easy, or somewhere in between?
When assessments are too easy, students may not study very much and
therefore may not learn as much as we would like.
- When students become accustomed to passing assessments with minimal effort, they may be easily frustrated in later years when they encounter more challenging material and do have to work hard
- When assessments are too easy, teachers and students alike may think students have mastered something they haven't really mastered at all. In other words, the assessments are not a valid measure of students' learning
- When assessments are too difficult, students may become discouraged and believe they are incapable of mastering the subject matter being assessed

Creating Your Own Assessments
- Remember, it takes a lot of practice to develop good assessment instruments and items
- Ideally you should consider measuring outcomes in multiple ways, e.g., multiple choice, projects, homework, etc.
- Don't fall into either/or thinking

A brief introduction to test construction
- Why would you use a true/false type item?
- What are the pros and cons of this type of item?
- What makes a good T/F item?

True/false items
Benefits
- Can quickly assess a bunch of objectives (efficient)
- Easy to score
Disadvantages
- 50% of items correct by random chance
- Limited amount of info gained
- Emphasizes rote memorization without understanding
- Low diagnostic capability

How to write GOOD T/F items
- Attempt to test something other than rote memorization
- Avoid specific determiners, words that give away the answer (e.g., always, never, impossible)
- Make each statement UNEQUIVOCALLY true or false: no room for argument or interpretation
- NO double-barreled items: unless the item is intended to show a cause-and-effect relationship, the item should contain only one idea
- If an opinion is used, it should be attributed to someone
- One strategy is to create a list of true statements from the material and then convert approximately half of them to false statements
- True and false statements should be approximately
the same length (true statements may tend to be longer because of qualifiers)
- Avoid ambiguous terms or statements

Creating Matching Items
- Use homogeneous material (e.g., famous tennis players)
- Include an unequal number of responses and premises; responses may be used more than once

Matching Items
Advantages
- Compact form: measure a lot at one time
- Ease of construction (for the most part)
Disadvantages
- Restricted to factual information
- Difficulty of finding homogeneous material

A brief introduction to test construction
- Why would you use a multiple-choice type item?
- What are the pros and cons of this type of item?
- What makes a good multiple-choice item?

Best Buy
The multiple-choice item format provides a "Best Buy" for:
- Content coverage
- Administration
- Scoring
- Reliability

Item Writing Rules: Why Worry?
- An item containing a flaw that directs any examinee to the correct answer who otherwise would NOT know the answer is invalid
- If an item is answered correctly but for the wrong reason, it is not measuring the outcome it was intended to measure
- Flawed items provide an advantage to test-wise students

Multiple Choice Items: Rules for Writing Stems
- The stem should present a single, self-contained question, problem, or idea
- State the problem as simply and clearly as possible; avoid excess verbiage and window dressing
- The stem should contain as much of the item's content as possible

Writing GOOD multiple-choice questions
- Attempt to test something other than rote memorization
- Avoid specific determiners, words that give away the answer, like a, an, his, or her, etc.
  E.g., EDP 560 is an:
  A. Semester of Laughs
  B. Terrific Time
  C. Terribly Good Time
  D. Absolute abomination
- Be clear in the stem what you are looking for
  Not: Christopher Columbus was what?
  - Male
  - A guy who lived long ago
  - Adventurer
  - Smallpox carrier
- Make sure that ONE answer is clearly the best
- Make the correct answer a, b, c, d, and e in equal amounts
- Use plausible options as distracters. The
leader of the Allied forces in the Pacific during WWII was:
  A. Hitler
  B. Eisenhower
  C. MacArthur
  D. Mickey Mouse
- Be careful using "all of the above" as an option; this is often a specific determiner
- Using "none of the above" may increase the level of knowing and difficulty
- Make sure the answer is clearly defensible
- The question should not typically be answerable without studying the material
- Avoid giving the answer away in the question or in the remainder of the test
- Create items that measure knowledge at all levels
- Work on your distracters: they make all the difference

Written Exams: Multiple Choice Format
Advantages
- Lower chance score
- Reliable
- Good sampling
- Can be computer scored
- Low administration cost
- Large candidate groups
Disadvantages
- Often requires recognition only
- Difficult to write
- Requires longer development time than other exams

Multiple-choice items
Advantages
- More versatile than T/F in assessing higher cognitive levels
- Can quickly assess a bunch of objectives
- Easy to score
Disadvantages
- 25% of items correct by random chance (with 4 options)
- Limited amount of info gained
- Emphasizes rote memorization without understanding

Why Evaluation & Assessment is Important
- Feedback to students
- Feedback to teachers
- Information to parents
- Information for selection and certification
- Information for accountability
- Incentives to increase student effort
Bottom line: It provides sources of information to aid in the educational process.

On the purpose of testing
- The purpose of testing is to SAMPLE a test-taker's knowledge about a given topic
- It is typically not intended to measure ALL of the test-taker's knowledge
- The results of the test are intended to assist us in making inferences BEYOND that of the specific test

Assessment
- Comes in many forms, including informal questioning in the classroom
- It is important to choose the most appropriate method of assessment to measure the topic at hand
- Ultimately the purpose of assessment
is to assist students in attaining learning goals

The Assessment Process
[Flowchart; legible labels: Develop Learning Goals/Objectives; Pretest of Knowledge; Instruction; Informal Assessment; Formal Checkpoints; Feedback to Students; Feedback to realign objectives, instruction & assessment; Observe variability in students' abilities; Develop understanding of choosing appropriate methods]

Important terms
- Formative vs. summative evaluation
  - Formative: How are you doing?
  - Summative: How did you do?
- Norm-referenced assessment vs. criterion-referenced (mastery) assessment
  - Norms: comparison to peer group
  - Criterion: meeting instructional objectives
- Traditional vs. authentic assessment
  - Traditional: measuring basic knowledge & skills (spelling test, math word problems, physical fitness tests)
  - Authentic: measuring skills in a "real-life" context (develop a school newspaper, build a model city, present a persuasive argument, portfolios)

Variable Types
- Dichotomous: variable that has only two categories (either/or)
- Discrete: variables that increase or decrease by whole units
- Continuous: variables that can theoretically assume an infinite number of values

Scales of Measurement (Stevens, 1951)
1. Nominal or Categorical
2. Ordinal
3. Interval
4. Ratio

1. Nominal or Categorical (naming scale)
- Classification according to presence or absence of qualities
- No information provided on order or magnitude of differences
- Because nominal scales have no quantitative properties, data consist of frequencies only
- E.g., sex, race, religion, political party

2. Ordinal (ranking scale)
- Classification according to degree of quality present
- Distinguishes ordered relationships between classes or characteristics, but gives no information about the magnitude of difference
- E.g., tall > normal > short; first > second > third
- E.g., percentile ranks

3. Interval
- Addition of a meaningful unit of measure (equal-size interval)
- Consistent and useful unit of measure allows the use of basic arithmetic functions (addition, subtraction, multiplication, division)
- E.g., Fahrenheit
scale, shoe size

4. Ratio
- Addition of an absolute zero point to the interval scale
- Zero implies total absence of the characteristic
- Ability to utilize ratio statements (2:1, 1:5)
- E.g., height and weight

Data types decide statistical analysis
- Nominal scale: two or more categories
  - Gender, car types, countries, level of education
- Ordinal scale: ranking survey
  - Classify subjects and rank them from highest to lowest or most to least
  - Rank students by height, weight, or IQ score
  - The differences between ranks are not equal
- Interval scale: having predetermined equal intervals
  - A score of zero on an IQ test does not mean absence of intelligence; 200 does not mean perfect intelligence
  - Most of the tests used in educational research: achievement, aptitude, motivation, and attitude tests
- Ratio scale: having a meaningful true zero point
  - Often used in physical measurements
  - Height, weight, time, distance, and speed

Descriptive Statistics
[Diagram; legible groupings: (Mean, Median, Mode); (Variance, Standard Deviation, Range); (Z-Score, Percentile Ranks)]

Shapes of Distributions
- Symmetric distributions
  - Normal distribution (bell-shaped curve): special symmetric distribution that is unimodal, with mode = median = mean
- Skewed distributions
  - Positive skew
  - Negative skew

[Figure 2.1: Hypothetical Distribution of Large Standardization Sample]
[Figure 2.3: Negatively Skewed Distribution; x-axis: Test Scores, from Low Scores to High Scores]
[Figure 2.4: Positively Skewed Distribution; x-axis: Test Scores, from Low Scores to High Scores]
[Figure panels: (a) Normal Distribution, where mean, median, and mode are the same; (c) Positively Skewed Distribution]

Descriptive Statistics: Measures of Variability
- Range
- Variance
- Standard Deviation

Standard Deviation
- Accurate measure of dispersion: how spread out the scores are
- Average distance of each score in a distribution from the mean

Measure of Association
- Describes the degree of relationship that exists between two variables
- Bivariate relationships

Correlations
- A relationship between two variables is not CAUSATION
- Size: correlations range from -1 to +1
- Sign
- Zero means no
relationship
- Positive correlation: as one variable goes up or down, the other variable goes up or down
- Negative correlation: as one variable goes up, the other goes down

Uses of coefficient
1. Prediction: if related systematically, use one variable to predict the other
2. Validity: measures of the same construct should have a high degree of relationship
3. Theory verification: test specific predictions
4. Reliability: relationship across time or separate parts of a test

Represent relationship graphically
- Direction of relationship: positive, negative
- Form of relationship: linear, curvilinear
- Degree of relationship: strong, weak

Strength of a Correlation
General rule of thumb (but definitely situationally constrained):
- Strong coefficients: .70 to .90
- Moderate coefficients: .40 to .50
- Weak coefficients: .15 to .25

Pearson's Product Moment Correlation Coefficient (1896)
- r_xy = correlation between x and y

What's Pearson r?
- One type of correlation coefficient
- Relationship between two variables (ratio or interval)
- Inventor: Karl Pearson
- Indicates a linear relationship
- Plotting the data results in a straight line
- If the scattergram shows a curvilinear relationship, then use another correlation coefficient

Other Measures of Association
- Spearman rho (r_s): correlation coefficient for nonlinear ordinal data
- Point-biserial: used to correlate continuous interval data with a dichotomous variable
- Phi coefficient: used to determine the degree of association when both variable measures are dichotomous

Data Type (measurement scales) and Analysis
Each entry gives the type of data gathered, the type of analysis, and an example.
- Both variables are on a ratio or interval scale: use Pearson r, then use Fisher's z (is the relationship statistically significant?). Example: Is IQ related to GRE?
- Both variables are ranked (ordinal scale): use Spearman's rho. Example: Is height related to weight in this group of people?
- One continuous variable and one dichotomous variable: point-biserial. Example: Is IQ related to gender?
- Two dichotomous variables: phi coefficient. Example: Is gender related to T/F item performance?

To be Statistically Significant: the
probability of chance is low: the difference is due to systematic influence and not due to chance
- Significance level (alpha): 0.1, 0.05, 0.01, 0.001
- Normally α = 0.05
- Probability < 0.05: 1 chance in 20 that the difference found is not due to treatment or intervention
- Use data analysis software (SPSS, SAS) or Fisher's z to determine significance

Calculating Association Between Variables
- Display correlation coefficients in a matrix
- Calculate the coefficient of determination, which assesses the proportion of variability in one variable that can be determined or explained by a second variable
- Use r²: e.g., if r = +.70 or -.70, squaring the value leads to r² = .49, so 49% of the variance in Y can be determined or explained by X

Correlation matrix (magnitudes only; signs and significance markers are not legible in this copy)
                 1     2     3     4     5     6     7     8     9     10
1. SAT                      .59   .33   .16   .28   .25   .61   .55   .37
2. ACT                      .82   .03   .31   .35   .52   .93   .60   .62
3. Grad Rate                      .12   .36   .07   .20   .64   .24   .68
4. ST Ratio                             .31   .08   .36   .15   .20   .27
5. Experience                                 .63   .68   .39   .33   .29
6. Salary                                           .84   .41   .71   .12
7. Per Pupil                                              .48   .59   .06
8. Lunches                                                      .59   .51
9. Income                                                             .08
10. Minority
Note. * p < .05; ** p < .01. Correlations for SAT include only those states (N = 23) with at least 50% participation rate. Correlations for ACT include only those states (N = 25) with at least 50% participation rate.

Factors that Affect Correlations
- Most correlations assume a linear relationship (falling on a straight line). If another type of relationship exists, traditional correlations may underestimate the correlation
- If there is a restriction of range in either variable, the magnitude of the correlation will be reduced

Linear Regression
- A statistical technique for predicting scores on one variable (criterion, or Y) given a score on another (predictor, or X)
- Predicts criterion scores based on a perfect linear relationship
- Strong correlations result in accurate predictions; weak correlations result in less accurate predictions

How does a teacher decide what to put on a test?
Consulting Published Sources
- Using test questions from a publisher can lead to high-quality tests or very low-quality tests
- Typically checked for errors or potential problems (not true ALL the time, however)
- may not
adequately reflect the material that was taught
- May not test at the level YOU want or need
Using Instructional Objectives
- Ideally a test should cover ALL the objectives of the class

The When and How of Testing
- More frequent, shorter tests: students tend to leave their studying until just before the test, so the more often they are tested, the more they will study
- Consider testing conditions: poor conditions can depress test performance. Be attentive to the conditions the student will face
- Ensure clear directions

Objective vs. Subjective Items
- What is the difference between closed-ended (objective) and open-ended (subjective) items?
- Which type is better?

Objective (closed-ended) types of items
- Multiple-choice items
- True/false items
- Matching items
- Short answer/fill-in-the-blank

Subjective (open-ended) types of items
- Various forms of performance/authentic assessment
- Essay/paper

There is no consistent advantage of one over the other. One is not inherently superior to the other. Select the item type that provides the most direct measure of the intended learning outcome.

Advantages of Authentic Assessment
- Motivation increases
- Teach to real life
- Assessment and educational product are the same
- External validity
- Utilize energy
- Student understanding of their work increases
- Tap higher-level thinking & problem solving

Disadvantages of Authentic Assessment
- Often contrived assessments
- Increased cost (material)
- Increased time
- More difficult to score reliably
- Capture only a small part of academic achievement
- More things to consider for classroom management and planning

Should a classroom assessment instrument be very hard, very easy, or somewhere in between?
- When assessments are too easy, students may not study very much and therefore may not learn as much as we would like
- When students become accustomed to passing assessments with minimal effort, they may be easily frustrated in later years when they encounter more challenging material and do have to work hard

Should a classroom assessment instrument be very
hard, very easy, or somewhere in between?
- When assessments are too easy, teachers and students alike may think students have mastered something they haven't really mastered at all. In other words, the assessments are not a valid measure of students' learning.
- When assessments are too difficult, students may become discouraged and believe they are incapable of mastering the subject matter being assessed.

Creating Your Own Assessments
- Remember, it takes a lot of practice to develop good assessment instruments and items
- Ideally, you should consider measuring outcomes in multiple ways, e.g., multiple choice, projects, homework, etc.
- Don't fall into either/or thinking

A brief introduction to test construction
- Why would you use a true-false type item?
- What are the pros and cons of this type of item?
- What makes a good T/F item?

True-false items
Benefits
- Can quickly assess a bunch of objectives (efficient)
- Easy to score
Disadvantages
- 50% of items correct by random chance
- Limited amount of info gained
- Emphasizes rote memorization without understanding
- Low diagnostic capability

True or False: All Ivy League schools are located on the East Coast of the US.

How to write GOOD T/F items
- Attempt to test something other than rote memorization
- Avoid specific determiners (words that give away the answer), e.g., always, never, impossible
- Make each statement UNEQUIVOCALLY true or false (no room for argument or interpretation)
- NO double-barreled items: unless the item is intended to show a cause-and-effect relationship, the item should contain only one idea
- If an opinion is used, it should be attributed to someone
- One strategy is to create a list of true statements from the material and then convert approximately half of them to false statements
- Avoid simply stealing statements from a textbook
- True and false statements should be approximately the same length (true statements may tend to be longer because of qualifiers)
- Avoid ambiguous terms or statements

Short answer/completion questions
- When would you use these
types of questions?
- What are the advantages & disadvantages?

Short answer/completion questions
Advantages
- Measure RECALL instead of RECOGNITION (reduces guessing)
- Easy to construct
Disadvantages
- Not suitable for complex learning outcomes
- More difficult scoring (spelling, handwriting, diversity of answers), e.g., "Where was George Washington born?"

Writing GOOD short answer items
- Generally one blank per question
- Stem needs to indicate the expected response
  NOT: "IQ is ____." or "____ is a better ____ of ____ than ____."
  BUT: "The term IQ is an acronym for what phrase?" or "Regression is a better way of analyzing continuous data than ____."
- Ask direct questions rather than incomplete statements; they are more natural to students and generally better structured

Matching Items
Advantages
- Compact form (measure a lot at one time)
- Ease of construction, for the most part
Disadvantages
- Restricted to factual information
- Difficulty of finding homogeneous material

Creating Matching Items
- Use homogeneous material, e.g., famous tennis players
- Include an unequal number of responses and premises; responses may be used more than once
- No more than 10 items in either column

A brief introduction to test construction
- Why would you use a multiple-choice type item?
- What are the pros and cons of this type of item?
- What makes a good multiple-choice item?

Multiple-choice items
Benefits
- More versatile than T/F in assessing higher cognitive levels
- Can quickly assess a bunch of objectives
- Easy to score
Disadvantages
- 25% of items correct by random chance (if 4 options)
- Limited amount of info gained
- Emphasizes rote memorization without understanding
- Very time-consuming to write

Types of Essay Questions
- Extended-Response: open-ended; more freedom on the part of the student on what to include
Advantages: allows for creativity; encourages independent organizational skills
Disadvantages: samples only a limited amount of content; scoring criteria are less defined; scoring may be unreliable
- Restricted-Response: more clearly defined parameters on how to respond; narrows response options
Advantages: measures factual material; scoring is clearer than with extended-response
Disadvantages: reduces independent organization skills and may reduce the problem to supply-type

Suggestions for assembling items
- Clean up errors (proofread)
- Keep items free from racial, ethnic, and gender bias
- Include clear directions; don't assume students know these
- Group items according to item format

Suggestions for administering items
- Make students aware of any time limits, but assure them adequate time is provided
- Make sure students know how to record answers
- Explain guessing (typically have students answer all questions)
- Don't hold students on the starting line too long
- Avoid interruptions
- Avoid giving hints that might advantage some students
- Discourage cheating

Purpose of Grading
Grading is often used to serve multiple purposes:
- Administrative purpose of evaluating and ranking students
- Educational purpose of assessing learning and progress toward class objectives
- Some people also view grades as motivators

Shortcomings of Letter Grades
- Are typically a combination of achievement, effort, work habits, and good behavior
- The proportion of students assigned each letter grade varies from teacher to teacher
- They do not indicate a student's specific strengths and weaknesses in learning

Reliability of classroom measurement
- Research shows classroom measurement to be less than perfectly reliable
- Different teachers asked to grade the same essays often come up with grades that differ by several letter grades
- Numerical grades are notoriously unreliable

Norm- vs. criterion-referenced testing
- Norm-referenced testing's primary purpose is administrative: ranking students for selection into programs, tracks, etc.
- Criterion-referenced testing is typically the most useful way to assess learning/progress toward educational goals

3 Options for Grading
- Relative grading (norm-referenced): performance in relation to other group members
- Absolute grading (criterion-referenced/mastery): performance in relation to specified standards
- Improvement/ability
grading: performance in relation to some determined baseline or starting point

Absolute Grading
- A = Outstanding: student has mastered the instructional goals
- B = Very good: student has mastered the major instructional goals and most of the minor ones
- Etc.

- A = 95% to 100% correct
- B = 85% to 94% correct
- C = 75% to 84% correct
- D = 65% to 74% correct
- F = below 65% correct

Assigning grades
Numerical grades are common in education, both today and historically. In order to do this sort of grading:
- We must assume that there are REAL differences between grades (i.e., that an 82 is different than an 83)
- We must assume that the number represents something REAL
- We must assume that our measurement is RELIABLE

One-chance testing
- A [...] postsecondary education
- In one-chance testing, the students take the test, receive grades, and move on
- What is the message here?
- Ever hear a student, when asked about material studied previously, say something like "Oh, we're done with that"?
- Teachers often assume that students will review what they missed and learn the material
- Is one-chance testing serving an educational or an evaluative/administrative purpose?

In other areas of life involving skill acquisition, is it one-shot evaluation?
- Driving test
- Learning a language
- Athletics
- Learning to walk
- Basic military training
- Learning to use a computer
- Learning to play a musical instrument
- Pottery making
- Learning to cook/bake
- Learning a trade

Metacognitive ability: the ability to accurately reflect on your learning and employ increasingly effective strategies

Suggestions for Grading
- Assess frequently
- Use true FEEDBACK to facilitate learning
- Base feedback on frequent or constant observations (homework, one-minute quiz, behavioral observation, peer mentoring, etc.); think coaching/mentoring here

Self-Report Tests
- Keep in mind that self-report inventories assume that individuals are WILLING AND ABLE to report accurately; this may be a big assumption
- Try to also collect some observable measure of the trait of interest

Interest Inventories
- Strong-Campbell Interest Inventory: suggests particular
occupations based upon responses
- Self-Directed Search (SDS) Career Explorer: classifies the individual according to occupational themes (realistic, investigative, artistic, social, enterprising, conventional)

Personality
- Personality refers to a person's unique and relatively stable pattern of thoughts, feelings, and actions
- Personality is an interaction between biology and environment
- Genetic studies suggest heritability of personality; other studies suggest learned components of personality

Measures of Personality
Personality refers to a person's unique and relatively stable pattern of thoughts, feelings, and actions (traits)
- Interviews
  - Unstructured: "Tell me about yourself"
  - Structured: set list of questions
- Assessment by observing the person
- Objective tests: self-inventories that involve paper-and-pencil tests
- Projective tests: subjects reveal aspects of their personality when they talk about ambiguous stimuli

Personality Measurement Issues
Objective self-report personality tests can be criticized on the basis of:
- Deliberate deception and social desirability: can the test detect deception and attempts to enhance social desirability?
- Inappropriate use: when tests are used for purposes other than their designed use, e.g., use of a personality test to decide a presidential election

The Big 5
Modern personality research argues for 5 basic personality traits (OCEAN):
- Openness: whether a person is open to new experiences
- Conscientiousness: whether a person is disciplined and responsible
- Extroversion: whether a person is sociable, outgoing, and affectionate
- Agreeableness: whether a person is cooperative, trusting, and helpful
- Neuroticism: whether a person is unstable and prone to insecurity

Overview of the Big 5

Trait                   Low Scorers       High Scorers
1 Openness              Down-to-earth     Imaginative
                        Uncreative        Creative
                        Conventional      Original
                        Uncurious         Curious
2 Conscientiousness     Negligent         Conscientious
                        Lazy              Hard-working
                        Disorganized      Well-organized
                        Late              Punctual
3 Extroversion          Loner             Joiner
                        Quiet             Talkative
                        Passive           Active
                        Reserved          Affectionate
4 Agreeableness         Suspicious        Trusting
                        Ruthless          Soft-hearted
                        Irritable         Good-natured
5 Neuroticism           Calm              Worried
                        Even-tempered     Temperamental
                        Comfortable       Self-conscious
                        Unemotional       Emotional

Projective Tests
- Projection is an idea developed by Freud in which people are thought to reveal their true feelings and thoughts when describing ambiguous stimuli
- A projective test presents a series of ambiguous stimuli and asks the subject to describe each stimulus
- The idea is that their verbal descriptions will reveal key aspects of their personality

Specific Projective Tests
- Rorschach test: consists of 10 inkblots. Reliability and validity of this test are low.
- Thematic Apperception Test (TAT): also consists of a series of ambiguous figures. Was devised to measure achievement motivation by Henry Murray in 1938.

Rorschach Inkblot Test (sample response):
"That makes me think of the garden. It is the city in the country, very much so. It looks like New York, with the Empire State Building right there. Calming, relaxing. There's a tree there, so you can see the countryside, and you've got the background with the city and the buildings, so it's a regional focus."

Validity: the extent to which an assessment instrument measures what it is supposed to measure
Reliability: the consistency of assessment results
Reliability is a necessary but not sufficient condition for validity.

About validity
- Refers to the appropriateness of the interpretation and use made of the results, not the procedure itself
- It is a matter of degree, not all or nothing
- It is specific to some particular use or interpretation; no assessment is valid for all purposes
- Can have a reliable but invalid measure
- If a measure is valid, then it is necessarily reliable

Sources of Validity Evidence
- Does the assessment tap into a representative sample of the content domain being assessed? (content validity)
- Does the instrument measure a particular psychological or educational characteristic? (construct validity)
- Do students' scores predict their success on a later task? (predictive validity/criterion-related validity)
- What are the effects of using a particular
assessment? (consequential validity)
- Does the assessment appear to be a reasonable measure? (face validity)

Content Validity
- Important both for classroom tests and when selecting standardized tests
- On a classroom test, ensuring this takes much forethought and developed skill
- E.g., say your textbook had 12 chapters on psychometrics and your exam only covered 2: not a valid examination of the entire domain

Construct Validity
- Whether or not an abstract, hypothetical concept exists as postulated
- Examples of constructs: intelligence, happiness, impulsivity
Based on:
- Convergence: different measures that purport to measure the same construct should be highly correlated (similar) with one another
- Divergence: tests measuring one construct should not be highly correlated with (similar to) tests purporting to measure other constructs
- Construct underrepresentation: the extent to which aspects of the construct are underrepresented in the assessment
- Construct-irrelevant variance: performance is influenced by irrelevant factors, e.g., reading level on a math test
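
The convergence and divergence checks above come down to comparing correlation coefficients. The following is a minimal sketch, not a procedure taken from these notes: the function name `pearson_r`, the three score lists, and the idea of two impulsivity scales plus a vocabulary test are all hypothetical, invented only to illustrate the pattern one would hope to see.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for eight students (invented for illustration only).
impulsivity_scale_a = [12, 15, 9, 20, 17, 11, 14, 18]   # one impulsivity measure
impulsivity_scale_b = [30, 36, 25, 44, 40, 27, 33, 41]  # a second impulsivity measure
vocabulary_test = [52, 47, 50, 49, 55, 46, 53, 48]      # a different construct

# Convergence: two measures of the same construct should correlate highly.
convergent_r = pearson_r(impulsivity_scale_a, impulsivity_scale_b)
# Divergence: a measure of a different construct should correlate weakly.
divergent_r = pearson_r(impulsivity_scale_a, vocabulary_test)

print(f"convergent r = {convergent_r:.2f}")
print(f"divergent r  = {divergent_r:.2f}")
```

With real data, a high convergent correlation alongside a low divergent one would count as evidence for construct validity; it would not settle the question on its own, since validity is a matter of degree.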