These 10 pages of class notes were uploaded by Lura Daniel on Wednesday, September 9, 2015.




Date Created: 09/09/15
JOURNAL OF APPLIED BEHAVIOR ANALYSIS 1991, 24, 231-234 NUMBER 2 (SUMMER 1991)

IF RELIANCE ON EPIDEMIOLOGY WERE TO BECOME EPIDEMIC, WE WOULD NEED TO ASSESS ITS SOCIAL VALIDITY

DONALD M. BAER AND ILENE S. SCHWARTZ
UNIVERSITY OF KANSAS

We gratefully acknowledge Mary Todd for her assistance in preparing this manuscript. Requests for reprints should be addressed to Donald M. Baer, Department of Human Development, 4001 Dole Human Development Center, University of Kansas, Lawrence, Kansas 66045, or Ilene S. Schwartz, Juniper Gardens Children's Project, 1614 Washington Blvd., Kansas City, Kansas 66102.

Behavior analysts often discuss social validity these days. Some see it as an essential element of the field's survival, others as a diversionary trap leading to the field's demise. Either argument is daring, in that we know rather little about the accurate and valid assessment of what gets called social validity, and we know a great deal less about the survival of fields. Our most defensible argument may be only that until we can do social validity assessments well, we cannot determine their importance. Of course, if they have importance, it will be to help choose and guide program developments and applications. To the extent that we are unsure of the optimal way to choose and guide program developments and applications, we are not ready to address the needs or concerns of our community.

This is not a simple problem. Any procedures developed to facilitate accurate social validity assessment must be applicable to the wide scope of research questions and consumers encountered by behavior analysts. Currently, those range from teaching academic support skills to high-risk preschool students to implementing community-wide public health programs. In the future, if things go well, the necessary range will be even wider. The interesting question is whether the assessment of social validity is an important part of things going well in the future. One guess is that it ought to be not merely important but crucial.

Winett, Moore, and Anderson (1991) address many of these issues and questions. Their concept of social validity is interesting, partly because it is somewhat different from our precedents; however, considering their examples, it is certainly applicable. Winett's group looks beyond the immediate results of their work; they guess about the long-term implications of both the programmed and unexpected behavior changes, even if the length of that term requires the guesses to become increasingly diffuse and nonspecific. Certainly that is the central strategy of social validity assessment. But their tactic in that strategy is distinctive, at least for our literature, at least so far. They offer first a primer of epidemiology and an adventure called social marketing, and from that base suggest an intersection of these methodologies for social validity assessments and program evaluation.

Their definition and characterization create what might be called a rational social validity: the social validity that will be seen as such by rational people. Its essence is that a problem is not simply what causes people to complain about it; a problem has verifiable importance (e.g., at the level of epidemiological data). Furthermore, interventions into problems are to be evaluated first by objective, reliable measures of their effectiveness, testifying in careful, complete experimental designs that yield unambiguous judgments about that effectiveness. When a relevant audience is not rational enough to adopt and maintain an intervention merely proven to solve what they ought to recognize as their problem, the intervention is socially marketed interactively with its development, either as it becomes clear that it must be, or routinely, like an immunization against some disease (in this case, the disease of social rejection).

We can only agree, but that is mainly because we are often members of that rational audience. [Footnote: Please, all contradictions should be restricted to written notes sent to us in a plain brown wrapper.] That is, we too are reinforced by objective, reliable, often epidemiological data testifying correctly to the societal importance of a problem (which is often its size interacting with the severity of its consequences); perhaps that is because we see ourselves also as members of a society so much at risk for its survival that the research and intervention efforts of our discipline ought to be aimed mainly at the most urgent threats to that survival that our discipline knows how to approach. We are also members of a science subcommunity that defines proof in natural-science terms; thus we define effectiveness in terms of the representativeness of the experimental samples, the choices of the control conditions, the choices of the measures, the objectivity, reliability, and validity of those measures, the kind and size of changes in those measures achieved by the intervention, and the thoroughness of the experimental design in which those data were gathered, for starters. Finally, we are workers in a discipline called applied behavior analysis; because of that, we assume that the behaviors constituting social validity and invalidity are modifiable behaviors, and consequently might well be modified, perhaps primarily by the same techniques that make up social marketing.

But quintessentially, at least for the moment, we are also members of a social-policy subcommunity of a philosophy-of-science community likely to label the modification of any current social validity or invalidity in a target population not only as behavior modification but also as the imposition of our social validity on theirs, apparently because we know we are right. In addition, we are members of a skeptical [...] that [...] the [...] reliability and validity that epidemiological data can have, but also notes there are many sets of such data; in effect, someone must choose which of them identifies the most imperative
problem for our next intervention. Although epidemiological data are, or can be made, objective, reliable, and valid, choosing among them is personal, subjective, and behavioral. Therefore, it is subject to many contingencies and stimulus controls other than, and in addition to, rationality. So when some of us respond to the range of epidemiological data on our current societal problems by choosing one of them for immediate intervention, and others of us respond to the same range of data by choosing a different problem for immediate intervention, where is the epidemiology to show in advance which of us made the correct choice? More likely, we will engage in one of those behavioral processes usually labeled politics to see which choice will be declared officially correct and acted upon. And what else is the assessment of social validity, if not the attempt to predict the outcome of exactly that process in advance, either to acknowledge and comply with it or to acknowledge and then try to change it? Assessment does not commit us to either course.

Does the case described by Winett et al. (1991) create a "we" who are presumed more rational than "they"? We think so. If it does, what are the implications of doing so for the survival of a discipline trying to study and contribute to the survival of its society? We do not know; we can only guess. Thus, the approach taken by Winett and his colleagues is a strikingly profound one, and the resulting definition of social validity challenges current applied behavior analysis. True, this field has encountered many deliberate challenges these past two decades, and too many of them have taught us only to tend our garden. However, this one looks different; this one we had better meet.

If we are becoming dissatisfied with assessments of social validity that merely ask selected consumers of a program to complete a short questionnaire administered by the experimenter at the end of the intervention, and if we are becoming dissatisfied with assessments of treatment
efficacy based on a small, single, handpicked sample that received the intervention under ideal situations, then perhaps we are ready to become contextual. If so, then we shall soon learn how to assess the contextual variables governing the outcomes of our interventions, and to acknowledge that when any component of that context is altered, other components very likely will react to that change. Perhaps that is indeed a fair description of the paths taken earlier by both epidemiology and social marketing. Perhaps we have something to learn from them.

But, as always in disciplinary challenges, perhaps we do not. However, as always, the prudent strategy in responding to strategy clash is to find out. We might well be cautious, though, in doing that. First, we should remember that welcoming new assessment methodologies from other disciplines does not require discarding our old methodologies, either for assessing or achieving the outcomes we target. This is good to remember, especially when the other disciplines from which we are borrowing seem to recommend just that. Most of what we do has been empirically documented as effective in accomplishing what it was aimed to do. These new recommendations, after all, are only to aim at more: in particular, to aim at a specific "more," a kind of consumer behavior tentatively labeled consumer satisfaction or social validity, that may prove crucial to achieving a large-scale effectiveness for what previously was small-scale effectiveness. It may often be true that the achievement of small-scale effectiveness does not demand as much consumer satisfaction as does large-scale effectiveness. On the other hand, perhaps that is not true very often. To find out which is the case, we shall need an accurate assessment of consumer satisfaction. If we then find that we do indeed often need a lot of consumer satisfaction for viable large-scale applications, how fortunate it will be for us that we already know by then how to assess it
accurately and validly.

Thus, these are recommendations only to begin the study of the accurate and valid assessment of consumer satisfaction, so that we will have a crucial dependent variable for all that research. Note how different is "a crucial dependent variable" from "the crucial dependent variable" or "the only dependent variable." Thus, we need not alter our current conceptual approach in this adventure. We have proposed only an extension of behavioral measurement, starting with what we already have. The main ugliness in what we already have is only that it looks terribly vulnerable to invalid and misleading assessments of social validity, and if so, that could spell trouble for our clients, us, our discipline, and our discipline's chance of contributing to its society.

If, in the process of finding out how to measure social validity accurately and validly, we discover that (a) we had been doing so all along, and/or (b) it was not all that important to the survival of our applications and the health of our discipline, then we may celebrate, resolve not to heed foolish Cassandras ever again, and get on with the important development of better, more valuable, and more applicable interventions, either with those good old cheap and easy measures of social validity or without any. Some of us will also emit some tacts about surprise: after all, Cassandra was correct. But in that future, this form of surprise will hardly constitute a literature event, let alone a journal symposium.

The arguments of Winett and his colleagues require only caution, not criticism. Those arguments recommend some techniques to extend our current assessments of social validity, and so do our arguments. But neither team is yet in a position to urge all researchers and practitioners to discard their current practices in favor of what now are only promising, still not thoroughly developed, and very expensive techniques of unknown generality. We are in a position only to recommend their intensive investigation, because of
their attractive logic and their usefulness in some initial trials.

For example, Winett et al. (1991) suggest that social validity is established through a number of interactive, a priori steps (p. 215). They offer some examples to make these abstractions real. In these examples, they do not seek expert views of the problem, do not let the researcher identify the problem, and do not wait for someone in the community to generate specific complaints. Instead, they choose some class of epidemiological data to inspect, and from it to identify problems and the target groups most at risk. That determines the nature of their interventions.

[Footnote: The original Cassandra was the one person who knew with certitude that her Trojans would soon make a fatal error, and later knew with equal certitude exactly which of their numerous tactics it was to be. Her problem, perhaps, was that the principle of crying wolf too often had not yet been disseminated, and so she overdid, and consequently was not believed. Yet clearly, half of social validity is to be a Cassandra. The trouble with trying to find a Cassandra is the extraordinary abundance of applicants, all quite certain about future disasters, ready to caution anyone seen to be puzzling over how behavior works. You never know until too late which, if any, of the numerous competitors for Cassandra's job was correct. In this context, the nomination by the Winett team of independent, objective, widely sampled, usually reliable epidemiological data seems to accomplish the standard aim of science: to replace apparently magic personal skills with valid, objective methods that anyone can apply, given the proper graduate training.]

These procedures may be an improvement over current practice for some features of social validity with some research questions and populations. These procedures are very appropriate to their example of reducing the risk of HIV infection in adolescents. However, they are not appropriate for many of the problem behaviors addressed by behavior
analysts. Is it necessary, or possible, to collect epidemiological data to establish the importance of decreasing self-injury in people with developmental disabilities? If concerned parents seek intervention to teach independent play skills to their young child, is their concern enough to constitute importance? Winett et al. (1991) clearly state that the lack of epidemiological data does not preclude a problem from being socially important; however, over-reliance on epidemiology may dissuade behavior analysts from developing new, and perhaps more effective, methods to assess the social importance of the problems targeted by behavior-analytic intervention programs. These interactive steps may be appropriate in assessing certain components of social validity, but any a priori procedure will be inadequate to assess a comprehensive view of social validity.

The question of whether or not to conduct social validity assessments was answered most eloquently by Wolf (1978). The important questions currently facing the field of applied behavior analysis are how to conduct these assessments and how to apply these data to improve behavioral interventions. Winett and his colleagues describe a methodology that may be useful in answering these questions; in particular, the framework they provide suggests questions and challenges that individual behavior analysts can use to further their own work in this area.

REFERENCES

Winett, R. A., Moore, J. F., & Anderson, E. S. (1991). Extending the concept of social validity: Behavior analysis for disease prevention and health promotion. Journal of Applied Behavior Analysis, 24, 215-230.

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203-214.

Received January 15, 199[?]
Initial editorial decision February 19, 199[?]
Final acceptance March 1, 199[?]
Action Editor, E. Scott Geller

Notes on Threats to Internal Validity

These notes are intended as
supplements for the information presented by Kennedy on pages 33 & 34 of his text concerning threats to external and internal validity. Four general considerations might help in understanding the labels applied to various threats to internal and external validity and Kennedy's characterizations of them.

First, many labels have their origins in group-statistical research and might seem odd when applied to other research traditions. Where possible, I've provided the rationale for the original term to help you understand its origins. I was tempted to offer alternative labels for each of the threats, but decided not to do so. Almost any label would be insufficient to do justice to the underlying principle, so it seemed best to let tradition commit the sin rather than to take that burden upon myself.

Second, although a threat's label per se may seem inappropriate, in most cases the underlying concern is still very real in single-subject or time-series research. Kennedy appears to have sometimes overlooked that possibility, and I've provided my own thoughts on the subject.

Third, there are at least two threats to internal validity that are not discussed by Kennedy in this chapter (although he discusses their implications in later chapters): reactive designs and reactive intervention timing. These threats are perhaps more common in single-subject time-series research than in group research, but apply to both. Kazdin might have omitted them from his discussions because they were not specifically discussed by Campbell & Stanley in their seminal 1963 overview of research designs. I discuss these threats here for two reasons: (1) because they are important considerations when designing research, and (2) to point out that no list of possible threats is really likely to be complete. It's important that you consider everything about the circumstances under which your research takes place, and not be bound by traditional ways of conceptualizing possible threats to the validity of the conclusions you draw.
Finally, many threats to internal validity are also threats to external validity (the degree to which we might confidently extend our results beyond the original study), and vice versa. For example, "diffusion of treatment" (inadvertently applying the treatment when it should not be applied) can indeed be a serious threat to internal validity, but it can also seriously affect generalization of observed outcomes to situations where that sort of cross-contamination of conditions is unlikely. Similarly, the related but different threat of "multiple treatment interference" as a threat to external validity can also seriously compromise internal validity. Don't be so eager, therefore, to classify a threat as belonging to issues of internal or external validity. Consider every particular feature of your research from both perspectives.

1. History

Kennedy's Description: History effects are events that occur outside of the experimental situation, but can potentially influence the behavior under study. Examples include events such as sleep deprivation, health problems, or out-of-school math tutoring. In addition, history effects in educational research also encompass events such as substitute teachers, unanticipated fire alarms, and students being called out of class.

Owen's Comments: Most people assume that historical confounds must refer to things that happened before the study began. However, this term arises from the common perception that "posttest-only" designs in group-statistical research are preferred, since pretests and/or repeated measures of subjects cannot then influence outcomes (see "3. Testing" below). In a "posttest-only" design, all changes in the subject's environment are "historical" to (come before) that posttest, so all uncontrolled changes that might influence outcomes are "historical confounds," even if they occur during the course of the experiment.

2. Maturation

Kennedy's Description: Maturation effects are a second type of threat to internal validity. Children
mature over time, and these developmental processes present a problem to researchers. Referred to as "maturation effects," normal developmental processes can influence the behavior under study, particularly in experiments that occur over a long period of time. For example, when studying the effects of an intervention on language development, if the experimental effect from the independent variable is slow, it may be unclear how much of the effect is from normal maturation rather than the intervention.

Owen's Comments: This threat is called "maturation" because the process of "growing older" is an obvious threat to internal validity any time a subject is evaluated repeatedly over a long period. In single-case research, this is most likely to be a problem when experimental effects are slow to develop. For example, the impact of speech, physical, or occupational therapy might take months or years to become manifest. However, "growing older" is only one threat associated with the passage of time. In general, any systematic process over time that changes a subject's ability to respond can be considered a potential maturational confound. "Boredom" and "fatigue" are among the most common such threats, whether the boredom is manifest within a given experimental session or over a period of days or weeks.

3. Testing

Kennedy's Description: Testing effects are threats to experimental control resulting from changes in behavior that occur when exposed to a testing situation. The idea is that exposure to questions regarding the curriculum being taught to a student might, in fact, teach them something about the testing context (e.g., how to answer questions more accurately), or the testing situation may teach something about the material to be learned. In such instances, behavior can change simply as a result of testing, apart from any intervention being analyzed.

Owen's Comments: Any change in the subject that can be attributed to testing or measurement per se is considered a threat
to internal validity. If care is not taken in the development of an attitudinal survey, for example, taking the survey even once can influence the subject's behavior independently of the variables we want to study. Testing confounds arise when the measurement process actually alters the behavior or status of the subject. For example, a child might behave better just because she is aware we are watching (measuring/testing), independently of any program per se we might implement to improve the behavior. The potential impact of testing confounds can be especially important in single-case time-series designs, since multiple assessments are completed both within and across conditions.

4. Instrumentation

Kennedy's Description: Instrumentation effects take two general forms. First, malfunctions in software and/or hardware being used to record behavior might occur. For example, a software glitch or a stuck key on a keyboard during a computer-based assessment can alter the data that are obtained and produce unwanted changes in recorded behavior. Second, behavior being recorded by observers can result in inaccurate representations of responding. One example of this is poorly trained observers who inaccurately record the behaviors of interest. Another concern is that observers will gradually alter how they define and record behavior over the course of a study, a phenomenon known as observer drift. Each of these types of instrumentation effects will be discussed at length in Chapter 7.

Owen's Comments: An "instrumentation" error, in contrast to a testing confound (see 3 above), is a change in the way we measure, rather than a change in the subject's actual behavior. For example, if we are recording the number of times a child uses "socially unacceptable words," and our standards change over the course of the study (perhaps we get more lenient or tougher about what we consider "unacceptable"), it might appear that the subject's behavior has changed when actually only
our measurement of the behavior has changed. While it is true that instrumentation problems are very often due to our use of human observers, whose standards might change over time, problems can also arise when we use nonhuman devices for gathering information. For example, machines can get out of "calibration" with repeated use, and transmission errors in data sent over the internet can sometimes alter results in subtle ways.

5. Regression to the Mean

Kennedy's Description: Regression to the mean is another threat to internal validity. This is a statistical sampling phenomenon in which highly unlikely outcomes ("outliers") occurring within a normal distribution tend not to reoccur when resampled (see Gould, 1981). In behavior analysis there is no such thing as an outlier behavior. All behavior occurs for a reason, and outliers are simply a manifestation of a behavioral process that has yet to be analyzed and understood. This makes the concept of statistical regression not very useful for a research approach based on repeated measures (see next section).

Owen's Comments: Most statisticians would characterize regression to the mean as the tendency for high or low scores (not just extreme scores) to move closer to the mean (average) of a distribution of scores over time. Since the phenomenon was first observed and defined in group-statistical research, single-subject researchers often reject its application to single-subject time-series data. That's clearly a mistake. "Extreme scores" in a distribution for a group or individual do often tend to become more like the "average" over time. After all, there are many reasons why a score might be extreme (e.g., the subject was sick at the time of assessment) that might not be true when we reassess the subject. If a series of such unusual scores precedes a change in experimental conditions, the shift in performances following the change might actually only reflect a regression toward the mean. The inverse might also be true. Extreme scores might reflect
some fundamental characteristic of the subject that will only cause repeated measures to become more extreme over time. A young child who bats a baseball very well in little league might be truly talented, and that talent could very well improve over time, making the subject even less like the "average" for kids his age. The tendency for scores to become more extreme over time is called "Stein's Paradox." We can sometimes guess whether "regression to the mean" or "Stein's Paradox" will operate by asking relevant questions (e.g., "is the child sick?"), but only repeated assessments under careful conditions will allow us to answer the question with any confidence (e.g., make sure the child is well before the next assessment and see what happens).

6. Participant Selection Bias

Kennedy's Description: Participant selection bias relates to the equivalence of people being assigned to different treatment groups. This threat, like regression to the mean, is derived from a group comparison approach to research derived from traditional psychological research (see Chapter 2) and does not easily map onto a single-case design logic with individuals.

Owen's Comments: I disagree. In my opinion, selection bias can easily influence single-subject studies. When using a multiple baseline design across different skills with a single subject, for example, if we always choose the next behavior for intervention because we believe "it is ready for a change," then that "readiness" might constitute a "selection bias" that makes the interpretation of results difficult. That is, did the behavior change as a result of our intervention, or because we selected a behavior that was ready to change? Similarly, if, in an alternating treatments design, we consistently use one set of procedures in the morning and a different set of procedures in the afternoon, are any observed differences in performance due to the treatments per se, or the fact that the individual
might be biased to behave differently in the morning than in the afternoon? Whenever we select behaviors or intervention times, we must ask ourselves if we might be unconsciously biasing outcomes by the way in which those selections are made.

7. Selective Attrition Bias

Kennedy's Description: Selective attrition of participants refers to individuals dropping out or being removed from a study for some systematic reason that is unrecognized by the researcher. Although this is historically a concern in group comparison designs, it also can affect single-case designs. For n = 1 designs, selective attrition is a concern because people with certain characteristics may not be able to complete an experiment. For example, the intervention may be too complex, it may not be socially acceptable, it may run counter to some cultural practices, or it may produce unwanted side effects. This is not so much a concern for the internal validity of a single-case design, but can be an important issue for systematic replication (i.e., establishing the generality or external validity of findings; see Chapter 4).

Owen's Comments: I agree, in the main, with Kennedy's description. However, attrition can have an impact on the internal validity of certain designs if the behaviors being observed fall off the observation scale before the experiment has been completed. For example, if some of the behaviors in a multiple baseline design "max out" on our measurement scale, or spontaneously improve before we apply our intervention to them, those baselines are essentially "lost" to the study, just as subjects who drop out of a study are lost. We don't know if those behaviors are systematically different from the behaviors we were able to study meaningfully throughout the study, so the results of our study might be due, in part, to that "attrition." This has also been called a "mortality" problem.

8. Interactions Among Selective Attrition and Other Threats

Kennedy's Description: Interactions among selective attrition
and other threats are the final concern in regard to internal validity. In such cases, a threat such as history effects or testing effects systematically influences why participants do not complete a study. In single-case designs, because of their inductive nature, this type of threat is really an elaboration of the previously mentioned concern. This is because interactions between variables causing selective attrition are essentially a refinement in the potential experimental question: Why did some participants not complete the study, but others did?

Owen's Comments: Kennedy's characterization of this threat is essentially correct, but I'm unsure why he says it is the final concern in regard to internal validity. There are at least 3 additional threats that should be considered, as noted below. From this point on in this paper, all of the descriptions and comments are Owen's.

9. Diffusion of Treatment

Diffusion of treatment occurs when the conditions applied to some subjects and/or at some times have an impact on presumably different conditions applied to other subjects or at other times. This problem crops up in single-subject research in three ways: (1) treatment is inadvertently applied where it was not supposed to be provided, at least some of the time; (2) the effects of intervention generalize from the situation in which treatment is applied to a situation where it was not applied (generalization is "good" from an applied standpoint, but can really mess up a design when it occurs spontaneously in ways we did not anticipate); or (3) there are "carry-over" effects from one phase to another (e.g., when treatment is withdrawn, the behavior might have come under the control of other conditions and not revert to the baseline levels needed for support of our experimental hypotheses).

For example, if we use a multiple baseline design to study various contingencies, we might apply the conditions to one classroom and continue baseline conditions for another
classroom. If the children in the first classroom interact with children from the second classroom on the playground, however, children in the second classroom might experience those contingencies vicariously, even though we have not yet applied those conditions directly in their class. This has also been called a "treatment contamination" problem, although that phrase implies that other, unspecified conditions have crept into our study in ways that might explain outcomes.

10. Reactive Design

Sometimes an experimental design will interact with treatment in a way that produces special outcomes. For example, in Walker's early research with token economies, he observed that ABAB withdrawal designs often increased the impact of a token economy. Essentially, when subjects were first introduced to a token economy, it might have little effect. After the tokens were terminated (withdrawn) in the second A phase of the study, performances might noticeably worsen ("you don't know what you have until you lose it"), and differences in performance only became truly clear after the tokens were reintroduced in the second B phase ("when you get something back after losing it, you really show your appreciation"). In other words, the effect of the token economy can be influenced by the design we use to study it.

11. Reactive Intervention Timing

One of the potential advantages of single-subject research is that it's possible to evaluate the progress of the study on a regular basis and, if necessary, make adjustments in our design or intervention to improve outcomes. We might, for example, plan a study with a 5-day baseline, but at the end of the 5 days conclude that the behavior is not sufficiently stable to allow a clear analysis of intervention effects, so we continue the baseline for several more days. Such adjustments are generally a good idea. It is important to realize, however, that by adjusting intervention or the timing of interventions in reaction to the subject's performances,
we might also be introducing threats to the internal validity of the study. For example, if we wait until a behavior has "bottomed out" (gotten as low as it is likely to go) before introducing a program to increase the behavior, it's possible that we'd be intervening at precisely the time when the behavior was ready to "bounce back" without intervention. What appears to be an intervention effect could then be simply the natural course of events.
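
The "bottomed out" problem can be made concrete with a small simulation. What follows is only a rough sketch under assumptions that are mine, not Kennedy's or the notes': every session score is drawn from the same normal distribution (so the "intervention" truly does nothing), and the function name `apparent_gain`, the cutoff of 40, the 3-session window, and all other numbers are hypothetical. It compares starting the intervention after the planned 5-session baseline with waiting until the last few baseline sessions look unusually low.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def apparent_gain(reactive, n_studies=2000, true_level=50.0, noise_sd=10.0,
                  low_cutoff=40.0, window=3, planned_baseline=5, seed=1):
    """Average phase-change 'gain' for an intervention that truly does nothing.

    All numbers here are illustrative assumptions, not values from any study.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(n_studies):
        # Every session, in every phase, comes from the same distribution,
        # so the real intervention effect is exactly zero.
        baseline = [rng.gauss(true_level, noise_sd)
                    for _ in range(planned_baseline)]
        if reactive:
            # Reactive timing: keep extending the baseline until the last
            # few sessions look "bottomed out", then intervene.
            while mean(baseline[-window:]) > low_cutoff:
                baseline.append(rng.gauss(true_level, noise_sd))
        treatment = [rng.gauss(true_level, noise_sd) for _ in range(window)]
        # Compare the first treatment sessions with the last baseline sessions.
        gains.append(mean(treatment) - mean(baseline[-window:]))
    return mean(gains)


fixed_gain = apparent_gain(reactive=False)    # intervene after the planned 5 sessions
reactive_gain = apparent_gain(reactive=True)  # intervene only after a low run
print(f"fixed timing:    {fixed_gain:+.1f}")
print(f"reactive timing: {reactive_gain:+.1f}")
```

Under these assumptions, the fixed schedule should show an average change close to zero, while the reactive schedule should show a sizable apparent improvement even though nothing was changed. That improvement is the "bounce back" described above, and it is the same mechanism discussed under regression to the mean (item 5).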


