by: Celestino Bergnaum





These 20 pages of class notes were uploaded by Celestino Bergnaum on Wednesday, October 21, 2015. They belong to STAT 641 at Texas A&M University, taught by Staff in Fall. Since upload, they have received 22 views. For similar materials, see /class/225761/stat-641-texas-a-m-university in Statistics at Texas A&M University.





Date Created: 10/21/15
STAT 641
INTRODUCTION TO STATISTICS

TOPICS

1. Definition of Statistics
2. Scientific Method
3. Research Process
4. What Is Statistics?
5. Some Current Applications of Statistics
6. What Do Statisticians Do?
7. Quality and Process Improvement
8. Communication: Verbal & Written
9. Preparation of Data
10. Guidelines for a Statistical Analysis and Report
11. Examples of Studies/Experiments

DEFINITION OF STATISTICS

Statistics is the science of designing studies or experiments, collecting data, and analyzing data for the purpose of decision making and problem solving in the presence of variation.

Some examples of situations involving statistical ideas:

1. Find the maximum of the function $f(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ over the range $0 \le x \le 12$ when we observe $y = f(x) + e$ and the $\theta$'s are unknown.
2. How can the level of emissions in the exhaust of an automobile be reduced?
3. What are the side effects of a drug developed to control swelling in the knee joint?
4. What variables and activities affect the onset of dementia in the elderly?
5. How accurate is a new device for detecting E. coli in ground beef? How do its measurements compare to those of the currently used measuring device?
6. How do we identify which factors have the greatest effect on product quality in an automobile assembly line?

A very famous statistician, John Tukey, distinctly summarized the joy of being a statistician: "The best thing about being a statistician is that you get to play in everyone's backyard."

SCIENTIFIC METHOD

[Figure: the scientific method as a cycle. Formulate Research Goal (Research Hypotheses, Models); Plan Study (Sample Size, Variables, Experimental Units, Sampling Mechanism); Collect Data (Data Management); Inferences (Graphs, Estimation, Hypotheses Testing, Model Assessment); Decisions (Written Conclusions, Oral Presentations); Formulate New Models and New Hypotheses.]

RESEARCH PROCESS

[Figure: flowchart of the research process.
Define Research Objectives: Formulate Research Hypotheses, Models, Populations of Interest.
Design Process to Collect Data: Select Desired Power of Tests, Level of Confidence, Sample Size, Experimental Units, Treatments, Sampling Design, Variables, etc.
Data Collection: Monitor Selection of EUs, Monitor Measurement Process, Monitor Recording of Data.
Analyze Data: Graphs, Tables, Displays; Models; Estimates (with SEs); Tests (p-values); Effect Size, etc.
Develop New Study: Validate Fitted Models, Explore New Models, Formulate New Hypotheses, Need More Information.
Data problems: Subjects Do Not Respond, Subjects Leave Study, Variables Not Measured.
Inferences About Research Objectives: Assess Validity of Model Conditions, Decisions About Research Hypotheses, Statistical vs. Practical Significance.
Report Results of Study: Use Nonstatistical Jargon, Use Graphs and Tables, Clearly Written with Proper Grammar.]

The following discussion is from An Introduction to Statistical Methods and Data Analysis by L. Ott and M. Longnecker.

What Is Statistics?

What is statistics? Is it the addition of numbers? Is it graphs, batting averages, percentages of passes completed, percentages of unemployment, and, in general, numerical descriptions of society and nature? Statistics is a set of scientific principles and techniques that are useful in reaching conclusions about populations and processes when the available information is both limited and variable; that is, statistics is the science of learning from data.

Almost everyone, including corporate presidents, marketing representatives, social scientists, engineers, medical researchers, and consumers, deals with data. These data could be in the form of quarterly sales figures, percent increase in juvenile crime, contamination levels in water samples, survival rates for patients undergoing medical therapy, census figures, or input that helps determine which brand of car to purchase.

In this text, we approach the study of statistics by considering the four steps in learning from data: (1) designing the data collection process, (2) preparing data for analysis (summarization, models), (3) analyzing the data, and (4) reporting the conclusions obtained during data analysis.

Before we jump into the study of statistics, let's consider four instances in which
the application of statistics could help to solve a practical problem.

1. A lightbulb manufacturer produces approximately a half million bulbs per day. The quality control department must monitor the defect rate of the bulbs. It can accomplish this task by testing each bulb, but the cost would be substantial and would greatly increase the price per bulb. An alternative approach is to select 1,000 bulbs from the daily production of 500,000 bulbs and test each of the 1,000. The fraction of defective bulbs in the 1,000 tested could be used to estimate the fraction defective in the entire day's production, provided that the 1,000 bulbs were selected in the proper fashion. We will demonstrate in later chapters that the fraction defective in the tested bulbs will probably be quite close to the fraction defective for the entire day's production of 500,000 bulbs.

2. To investigate the claim that people who quit smoking often experience a subsequent weight gain, researchers selected a random sample of 400 participants who had successfully participated in programs to quit smoking. The individuals were weighed at the beginning of the program and again one year later. The average change in weight of the participants was an increase of 5 pounds. The investigators concluded that there was evidence that the claim was valid. We will develop techniques in later chapters to assess when changes are truly significant changes and not changes due to random chance.

3. For a study of the effects of nitrogen fertilizer on wheat production, a total of 15 fields were available to the researcher. She randomly assigned three fields to each of the five nitrogen rates under investigation. The same variety of wheat was planted in all 15 fields. The fields were cultivated in the same manner until harvest, and the number of pounds of wheat per acre was then recorded for each of the 15 fields. The experimenter wanted to determine the optimal level of nitrogen to apply to any wheat field, but, of course, she was limited to running experiments
on a limited number of fields. After determining the amount of nitrogen that yielded the largest production of wheat in the study fields, the experimenter then concluded that similar results would hold for wheat fields possessing characteristics somewhat the same as the study fields. Is the experimenter justified in reaching this conclusion?

Similar applications of statistics are brought to mind by the frequent use of the New York Times/CBS News, Washington Post/ABC News, CNN, Harris, and Gallup polls. How can these pollsters determine the opinions of more than 195 million Americans who are of voting age? They certainly do not contact every potential voter in the United States. Rather, they sample the opinions of a small number of potential voters, perhaps as few as 1,500, to estimate the reaction of every person of voting age in the country. The amazing result of this process is that the fraction of those persons contacted who hold a particular opinion will closely match the fraction in the total population holding that opinion at a particular time. We will supply convincing supportive evidence of this assertion.

These problems illustrate the four steps in learning from data. First, each problem involved designing an experiment or study. The quality control group had to decide both how many bulbs needed to be tested and how to select the sample of 1,000 bulbs from the total production of bulbs to obtain valid results. The polling groups must decide how many voters to sample and how to select these individuals in order to obtain information that is representative of the population of all voters. Similarly, it was necessary to carefully plan how many participants in the weight-gain study were needed and how they were to be selected from the list of all such participants. Furthermore, what variables should the researchers have measured on each participant? Was it necessary to know each participant's age, sex, physical fitness, and other health-related variables, or was weight the only
important variable? The results of the study may not be relevant to the general population if many of the participants in the study had a particular health condition. In the wheat experiments, it was important to measure both soil characteristics of the fields and environmental conditions, such as temperature and rainfall, to obtain results that could be generalized to fields not included in the study. The design of a study or experiment is crucial to obtaining results that can be generalized beyond the study.

Finally, having collected, summarized, and analyzed the data, it is important to report the results in unambiguous terms to interested people. For the lightbulb example, management and technical staff would need to know the quality of their production batches. Based on this information, they could determine whether adjustments in the process are necessary. Therefore, the results of the statistical analyses cannot be presented in ambiguous terms; decisions must be made from a well-defined knowledge base.

The results of the weight-gain study would be of vital interest to physicians who have patients participating in the smoking-cessation program. If a significant increase in weight was recorded for those individuals who had quit smoking, physicians may have to recommend diets so that the former smokers would not go from one health problem (smoking) to another (elevated blood pressure due to being overweight). It is crucial that a careful description of the participants (that is, age, sex, and other health-related information) be included in the report.

In the wheat study, the experiments would provide farmers with information that would allow them to economically select the optimum amount of nitrogen required for their fields. Therefore, the report must contain information concerning the amount of moisture and types of soils present on the study fields. Otherwise, the conclusions about optimal wheat production may not pertain to farmers growing wheat under considerably different conditions.

To infer validly
that the results of a study are applicable to a larger group than just the participants in the study, we must carefully define the population to which inferences are sought and design a study in which the sample has been appropriately selected from the designated population.

[Figure: the population is the set of all measurements; the sample is the set of measurements selected from the population.]

Some Current Applications of Statistics

Acid Rain: A Threat to Our Environment

The accepted causes of acid rain are sulfuric and nitric acids; the sources of these acidic components of rain are hydrocarbon fuels, which spew sulfur and nitric oxide into the atmosphere when burned. Here are some of the many effects of acid rain:

- Acid rain, when present in spring snow melts, invades breeding areas for many fish, which prevents successful reproduction. Forms of life that depend on ponds and lakes contaminated by acid rain begin to disappear.
- In forests, acid rain is blamed for weakening some varieties of trees, making them more susceptible to insect damage and disease.
- In areas surrounded by affected bodies of water, vital nutrients are leached from the soil.
- Man-made structures are also affected by acid rain. Experts from the United States estimate that acid rain has caused nearly $15 billion of damage to buildings and other structures thus far.

Solutions to the problems associated with acid rain will not be easy. The National Science Foundation (NSF) has recommended that we strive for a 50% reduction in sulfur-oxide emissions. Perhaps that is easier said than done. High-sulfur coal is a major source of these emissions, but in states dependent on coal for energy, a shift to lower-sulfur coal is not always possible. Instead, better scrubbers must be developed to remove these contaminating oxides from the burning process before they are released into the atmosphere. Fuels for internal combustion engines are also major sources of the nitric and sulfur oxides of acid rain.
Clearly, better emission control is needed for automobiles and trucks.

Reducing the oxide emissions from coal-burning furnaces and motor vehicles will require greater use of existing scrubbers and emission control devices, as well as the development of new technology to allow us to use available energy sources. Developing alternative, cleaner energy sources is also important if we are to meet NSF's goal. Statistics and statisticians will play a key role in monitoring atmosphere conditions, testing the effectiveness of proposed emission control devices, and developing new control technology and alternative energy sources.

Determining the Effectiveness of a New Drug Product

The development and testing of the Salk vaccine for protection against poliomyelitis (polio) provide an excellent example of how statistics can be used in solving practical problems. Most parents and children growing up before 1954 can recall the panic brought on by the outbreak of polio cases during the summer months. Although relatively few children fell victim to the disease each year, the pattern of outbreak of polio was unpredictable and caused great concern because of the possibility of paralysis or death. The fact that very few of today's youth have even heard of polio demonstrates the great success of the vaccine and the testing program that preceded its release on the market.

It is standard practice in establishing the effectiveness of a particular drug product to conduct an experiment (often called a clinical trial) with human participants. For some clinical trials, assignments of participants are made at random, with half receiving the drug product and the other half receiving a solution or tablet (called a placebo) that does not contain the medication. One statistical problem concerns the determination of the total number of participants to be included in the clinical trial. This problem was particularly important in the testing of the Salk vaccine because data from previous years suggested that the
incidence rate for polio might be less than 50 cases for every 100,000 children. Hence, a large number of participants had to be included in the clinical trial in order to detect a difference in the incidence rates for those treated with the vaccine and those receiving the placebo.

With the assistance of statisticians, it was decided that a total of 400,000 children should be included in the Salk clinical trial begun in 1954, with half of them randomly assigned the vaccine and the remaining children assigned the placebo. No other clinical trial had ever been attempted on such a large group of participants. Through a public school inoculation program, the 400,000 participants were treated and then observed over the summer to determine the number of children contracting polio. Although fewer than 200 cases of polio were reported for the 400,000 participants in the clinical trial, more than three times as many cases appeared in the group receiving the placebo. These results, together with some statistical calculations, were sufficient to indicate the effectiveness of the Salk polio vaccine. However, these conclusions would not have been possible if the statisticians and scientists had not planned for and conducted such a large clinical trial.

The development of the Salk vaccine is not an isolated example of the use of statistics in the testing and developing of drug products. In recent years, the Food and Drug Administration (FDA) has placed stringent requirements on pharmaceutical firms to establish the effectiveness of proposed new drug products. Thus, statistics has played an important role in the development and testing of birth control pills, rubella vaccines, chemotherapeutic agents in the treatment of cancer, and many other preparations.

Applications of Statistics in Our Courts

Libel suits related to consumer products have touched each one of us; you may have been involved as a plaintiff or defendant in a suit, or you may know of someone who was involved in such litigation. Certainly, we all help
to fund the costs of this litigation indirectly through increased insurance premiums and increased costs of goods. The testimony in libel suits concerning a particular product (automobile, drug product, and so on) frequently leans heavily on the interpretation of data from one or more scientific studies involving the product. This is how and why statistics and statisticians have been pulled into the courtroom.

For example, epidemiologists have used statistical concepts applied to data to determine whether there is a statistical association between a specific characteristic, such as the leakage in silicone breast implants, and a disease condition, such as an autoimmune disease. An epidemiologist who finds an association should try to determine whether the observed statistical association from the study is due to random variation or whether it reflects an actual association between the characteristic and the disease. Courtroom arguments about the interpretations of these types of associations involve data analyses using statistical concepts as well as a clinical interpretation of the data.

Many other examples exist in which statistical models are used in court cases. In salary discrimination cases, a lawsuit is filed claiming that an employer underpays employees on the basis of age, ethnicity, or sex. Statistical models are developed to explain salary differences based on many factors, such as work experience, years of education, and work performance. The adjusted salaries are then compared across age groups or ethnic groups to determine whether significant salary differences exist after adjusting for the relevant work performance factors.

Estimating Bowhead Whale Population Size

Raftery and Zeh (1998) discuss the estimation of the population size and rate of increase in bowhead whales (Balaena mysticetus). The importance of such a study derives from the fact that bowheads were the first species of great whale for which commercial whaling was stopped; thus, their status indicates the recovery
prospects of other great whales. Also, the International Whaling Commission uses these estimates to determine the aboriginal subsistence whaling quota for Alaskan Eskimos. To obtain the necessary data, researchers conducted a visual and acoustic census off Point Barrow, Alaska. The researchers then applied statistical models and estimation techniques to the data obtained in the census to determine whether the bowhead population had increased or decreased since commercial whaling was stopped. The statistical estimates showed that the bowhead population was increasing at a healthy rate, indicating that stocks of great whales that have been decimated by commercial hunting can recover after hunting is discontinued.

Ozone Exposure and Population Density

Ambient ozone pollution in urban areas is one of the nation's most pervasive environmental problems. Whereas the decreasing stratospheric ozone layer may lead to increased instances of skin cancer, high ambient ozone intensity has been shown to cause damage to the human respiratory system as well as to agricultural crops and trees. The Houston, Texas, area has ozone concentrations rated second only to Los Angeles in exceeding the National Ambient Air Quality Standard. Carroll et al. (1997) describe how to analyze the hourly ozone measurements collected in Houston from 1980 to 1993 by 9 to 12 monitoring stations. Besides the ozone level, each station also recorded three meteorological variables: temperature, wind speed, and wind direction.

The statistical aspect of the project had three major goals:

1. Provide information (and/or tools to obtain such information) about the amount and pattern of missing data, as well as about the quality of the ozone and the meteorological measurements.
2. Build a model of ozone intensity to predict the ozone concentration at any given location within Houston at any given time between 1980 and 1993.
3. Apply this model to estimate exposure indices that account for either a long-term exposure or a short-term
high-concentration exposure; also, relate census information to different exposure indices to achieve population exposure indices.

The spatial-temporal model the researchers built provided estimates demonstrating that the highest ozone levels occurred at locations with relatively small populations of young children. Also, the model estimated that the exposure of young children to ozone decreased by approximately 20% from 1980 to 1993. An examination of the distribution of population exposure had several policy implications. In particular, it was concluded that the current placement of monitors is not ideal if one is concerned with assessing population exposure. This project involved all four components of learning from data: planning where the monitoring stations should be placed within the city, how often data should be collected, and what variables should be recorded; conducting spatial-temporal graphing of the data; creating spatial-temporal models of the ozone data, meteorological data, and demographic data; and, finally, writing a report that could assist local and federal officials in formulating policy with respect to decreasing ozone levels.

Opinion and Preference Polls

Public opinion, consumer preference, and election polls are commonly used to assess the opinions or preferences of a segment of the public for issues, products, or candidates of interest. We, the American public, are exposed to the results of these polls daily in newspapers, in magazines, on the radio, and on television. For example, the results of polls related to the following subjects were printed in local newspapers over a 2-day period:

- Consumer confidence related to future expectations about the economy
- Preferences for candidates in upcoming elections and caucuses
- Attitudes toward cheating on federal income tax returns
- Preference polls related to specific products (for example, foreign vs. American cars, Coke vs. Pepsi, McDonald's vs. Wendy's)
- Reactions of North Carolina residents toward arguments about the morality
of tobacco
- Opinions of voters toward proposed tax increases and proposed changes in the Defense Department budget

A number of questions can be raised about polls. Suppose we consider a poll on the public's opinion toward a proposed income tax increase in the state of Michigan. What was the population of interest to the pollster? Was the pollster interested in all residents of Michigan or just those citizens who currently pay income taxes? Was the sample in fact selected from this population? If the population of interest was all persons currently paying income taxes, did the pollster make sure that all the individuals sampled were current taxpayers? What questions were asked, and how were the questions phrased? Was each person asked the same question? Were the questions phrased in such a manner as to bias the responses? Can we believe the results of these polls? Do these results represent how the general public currently feels about the issues raised in the polls?

Opinion and preference polls are an important, visible application of statistics for the consumer. We will discuss this topic in more detail. We hope that after studying this material you will have a better understanding of how to interpret the results of these polls.

What Do Statisticians Do?

What do statisticians do? In the context of learning from data, statisticians are involved with all aspects of designing a study or experiment, preparing the data for analysis using graphical and numerical summaries, analyzing the data, and reporting the results of their analyses. There are both good and bad ways to gather data. Statisticians apply their knowledge of existing survey techniques and scientific study designs, or they develop new techniques, to provide a guide to good methods of data collection. We will explore these ideas further.

Once the data are gathered, they must be summarized before any meaningful interpretation can be made. Statisticians can recommend and apply useful methods for summarizing data in graphical, tabular,
and numerical forms. Intelligent graphs and tables are useful first steps in making sense of the data. Also, measures of the average (or typical) value and some measure of the range or spread of the data help in interpretation. These topics will be discussed in detail.

The objective of statistics is to make an inference about a population of interest based on information obtained from a sample of measurements from that population. The analysis stage of learning from data deals with making inferences. For example, a market research study reaches only a few of the potential buyers of a new product, but the probable reaction of the set of potential buyers (population) must be inferred from the reactions of the buyers included in the study (sample). If the market research study has been carefully planned and executed, the reactions of those included in the sample should agree reasonably well (but not necessarily exactly) with the population. We can say this because the basic concepts of probability allow us to make an inference about the population of interest that includes our best guess plus a statement of the probable error in our best guess.

We will illustrate how inferences are made by an example. Suppose an auditor randomly selects 2,000 financial accounts from a set of more than 25,000 accounts and finds that 84 (4.2%) are in error. What can be said about the set of 25,000 accounts? What inference can we make about the percentage of accounts in error for the population of 25,000 accounts based on information obtained from the sample of 2,000 accounts? We will show that our best guess (inference) about the percentage of accounts in error for the population is 4.2%, and this best guess should be within ±0.9% of the actual unknown percentage of accounts in error for the population. The plus-or-minus factor is called the probable error of our inference. Anyone can make a guess about the percentage of accounts in error; concepts of probability allow us to calculate the probable error of our guess.

In dealing with
the analyses of data, statisticians can apply existing methods for making inferences; some theoretical statisticians engage in the development of new methods with more advanced mathematics and probability theory.

Finally, statisticians are involved with communicating the results of their analyses as the final stage in making sense of data. The form of the communication varies from an informal conversation to a formal report. The advantage of a more formal verbal presentation with visual aids or a study report is that the communication can use graphical, tabular, and numerical displays as well as the analyses done on the data to help convey the sense found in the data. Too often, this is lost in an informal conversation. The report or communication should convey to the intended audience what can be gleaned from the sample data, and it should be conveyed in as nontechnical terms as possible so there can be no confusion as to what is inferred. We will identify the important components that should be included in the report while discussing case studies used to illustrate the statistical concepts in several of the chapters.

It is important to note that the ideas in the preceding discussion are relevant to everyone involved in a study or experiment. Degreed statisticians are somewhat rare individuals. Many organizations have no statisticians or only a few in their employment. Thus, in many studies, the design used in collecting the data, the summary and statistical analyses of the data, and the communication of the study results will be conducted by the individuals involved in the study with little or no support from a degreed statistician. In those cases where a statistician is an active member of the research team, it is still important for the other members of the group to have general knowledge of the concepts involved in a statistical design and data analysis. In fact, each member of the team brings an area of expertise and experience to the problems being addressed. Then, within
the context of the team dynamics, decisions will be made about the design of the study and how the results of the analyses will be communicated.

Statistics and Process Improvement

One might wonder at this stage why we would bring up the subject of quality and process improvement in a statistics textbook. We do so to make you aware of some of the broader issues involved with learning from data in the business and scientific communities.

The post-World War II years saw U.S. business and the U.S. economy dominate world business, and this lasted for about 30 years. During this time, there was little attempt to change the ways things were done; the major focus was on doing things on a much grander scale, perfecting mass production. However, from the mid-1970s through today, many industries have had to face fierce competition from their counterparts in Japan and, more recently, from other countries in the Far East, such as China and Korea. Quality, rather than quantity, has become the principal buying gauge used by consumers, and American industries have had a difficult time adjusting to this new emphasis. Unless there are drastic changes in the way many American industries approach their businesses, there will be many more casualties to the quality revolution.

The Japanese were the first to learn the lessons of quality. They readily used the statistical quality-control and process-control suggestions espoused by Deming (1981) and others and installed total quality programs. Throughout the organization, from top management down, they had a commitment to improving the quality of their products and procedures. They were never satisfied with the way things were and continually looked for new and better ways. A number of American companies have now begun the journey toward excellence through the institution of a quality-improvement process. Listed below are ten basic requirements that provide the foundation for a successful quality-improvement process.

Fundamental Requirements for a Successful Quality-Improvement Process

1. A focus on the customer as the most important part of the process
2. A long-term commitment by management to make the quality-improvement process part of the management system
3. The belief that there is room to improve
4. The belief that preventing problems is better than reacting to them
5. Management focus, leadership, and participation
6. A performance standard (goal) of zero errors
7. Participation by all employees, both as groups and as individuals
8. An improvement focus on the process, not the people
9. The belief that suppliers will work with you if they understand your needs
10. Recognition for success

Embedded in a companywide quality-improvement process, or running concurrent with such a process, is the idea of improving the work processes. For years, companies, in trying to boost and improve performance, have tried to speed up their processes, usually with additional people or technology but without addressing possible deficiencies in the work processes. In the groundbreaking book Reengineering the Corporation (1993), by Michael Hammer and James Champy, and in Hammer's later book Beyond Reengineering (1996), Hammer and Champy addressed how a corporation could achieve radical improvement in quality, efficiency, and effectiveness by completely rethinking their business processes that have been maintained in a rapidly changing business and technology environment.

If we define a task as a unit of work and a process as a sequence of related tasks that create value for the customer, Hammer and Champy were offering corporations a way to refocus their change efforts in value-creating activities.

The case for change is compelling. Within almost every major business (apparel, e.g., Nike; chemicals, e.g., DuPont; computer equipment, e.g., Dell; computer software, e.g., Microsoft; electronics, e.g., General Electric; food, e.g., Nestlé; general merchandising, e.g., Wal-Mart; network communications, e.g., Cisco; petroleum, e.g., Exxon Mobil; pharmaceuticals, e.g., Eli Lilly; and so on), the competitive position of the segment leader has been, is
currently, or will soon be challenged. In many cases, the industry leader has not kept pace with the dizzying changes occurring in the marketplace. Mergers proliferate, with high expectations from management and shareholders for increased market share, cost synergies (reductions), and increased profitability. Unfortunately, the list of successful mergers (as defined by those meeting the initial case for action driving the merger) is pitifully small. Something else is needed.

Christopher Meyer, in his book Fast Cycle Time (1993), makes the case that in an ever-changing marketplace, the competitor that can consistently, reliably, and profitably provide the greatest value to the customer will win. Meyer's basic premise is that a corporation must shorten its overall business cycle, which begins with identification of a customer's need and ends with the payment for a product delivered or service rendered. A company that can do this well over time, as needs and the competitive environment change, will win.

Whether a company focuses on business process improvement or fast cycle time, the foundation for change will be the underlying data about customer needs, current internal cycle time, and comparable benchmark data in the industry. Winners in the ongoing competition will be those who define what they're trying to do, establish ongoing data requirements to assess customer needs and current state of operations, rapidly implement recommended changes, and document their learning. These four points, which are very similar to the four steps in learning from data discussed earlier in the chapter, drive home the relevance of statistics (learning from data) to the business environment.

A number of statistical tools and techniques that can help in these business improvement efforts are shown here.

Statistical Tools, Techniques, and Methods Used in Quality Improvement and Reengineering

Histograms
Numerical descriptive measures (means, standard deviations, proportions, etc.)
Scatterplots
Line graphs: scatterplots with dots
connected
Control charts: ȳ (sample mean), r (sample range), and s (sample standard deviation)
Sampling schemes
Experimental designs

The statistical tools and concepts listed here and discussed in this textbook are only a small component of a business process improvement or fast-cycle-time initiative. As you encounter these tools and concepts in various parts of this text, keep in mind where you think they may have application in business improvement efforts.

Quality improvement, process redesign, and fast cycle time are clearly the focus of American industry for the 1990s in world markets characterized by increased competition, more consolidation, and increased specialization. These shifts will have impacts on us all, either as consumers or business participants, and it will be useful to know some of the statistical tools that are part of this revolution.

Finally, in recent years the ideas and principles of quality control have been applied in areas outside of manufacturing. Service industries such as hotels, restaurants, and department stores have successfully applied the principles of quality control in their businesses. Many federal agencies, for example, the IRS, the Department of Defense, and the USDA, have adapted the principles of quality control to improve the performance of their agencies.

The Difficulty of Good Communication

We have spent time throughout the book making sense of data; the final step in this process is the communication of results. How might you communicate the results of a study or survey? The list of possibilities is almost endless, including all forms of verbal and written communication. There is quite a range of possibilities for verbal and written communication. For example, written communication within a company or organization can vary from an informal short note or memo to a formal project report (Figure 131).

[Figure 131: written communication, ranging from an informal short note or memo to a formal project report]

Communicating the results of a statistical analysis in concise, unambiguous terms is difficult. In fact, descriptions of most things are difficult. For example, try to describe the person sitting next to you so precisely that a stranger could select the individual from a group of others having similar physical characteristics. It is not an easy task. Fingerprints, voiceprints, faceprints, and photographs (all pictorial descriptions) are some of the most precise methods of human identification. The description of a set of measurements is also a difficult task. However, like the description of a person, it can be accomplished more easily by using graphics or pictorial methods.

Cave drawings convey to us scattered bits of information about the life of prehistoric people. Similarly, vast quantities of knowledge about the ancient lives and cultures of the Babylonians, Egyptians, Greeks, and Romans are brought to life by means of drawings and sculpture. Art has been used to convey a picture of various lifestyles, history, and culture in all ages. Not surprisingly, the use of graphs and tables along with a written description can help to convey the meaning of a statistical analysis.

In reading the results of a statistical analysis and in communicating the results of our own analyses, we must be careful not to distort them because of the way we present the data and results. You have all heard the expression "It is easy to lie with statistics." The idea is not new. The famous British statesman Disraeli is quoted as saying, "There are three kinds of lies: lies, damned lies, and statistics." Where do things go wrong?

First of all, the distortion of truth can occur only when we communicate. And because communication can be accomplished with graphs, pictures, sound, aroma, taste, words, numbers, or any other means devised to reach our senses, distortions can occur using any one or any combination of these methods of communication. In this respect, statements that we make could be misleading to others because we might have omitted something in the explanation of the data-gathering stage or with the
analyses done. For example, we might unintentionally fail to clearly explain the meaning of a numerical statement, or we might omit some background information that is necessary for a clear interpretation of the results. Even a correct statement may appear to be distorted if the reader lacks knowledge of elementary statistics. Thus, a very clear expression of an inference using a 95% confidence interval is meaningless to a person who has not been exposed to the introductory concepts of statistics.

Now we will look at some potential hurdles to effective communication that we must carefully consider when we present the results of a statistical analysis or when we try to interpret what someone else has presented.

Communication Hurdles: Graphical Distortions

Pictures can easily distort the truth. The marketing of many products, including soft drinks, beers, cosmetics, clothing, automobiles, and many more, involves the use of attractive, youthful models. The not-so-subtle impression we are left with is that somehow, by using the product, we too will look like these models. Have you ever stepped back from one of these commercials and wondered how the commercial message relates to the quality and usefulness of the product? Have you thought about how you are being misled by a commercial? The use of sex appeal to sell products is very prominent, and we seem to accept this type of distortion. The beer ad article shown here, which appeared in USA Today (March 15, 2000), illustrates how we are manipulated through these types of ads.

Sex Appeal Slipping Back into Beer Ads
Risqué TV leads to more traditional and liberal promotions
By Michael McCarthy, USA TODAY

NEW YORK - The babes in so-called "Beer & Babes" advertising are back. Nearly a decade ago, national soul-searching about sexual harassment in the wake of the Anita Hill controversy forced beer marketers to purge what were seen as sexist symbols, such as The Swedish Bikini Team, from their ads. Now brewers, including Miller Brewing, Anheuser-Busch, and Heineken, are injecting sex appeal back into TV ads. "We're in a new decade, and tastes are changing," says marketing consultant Laura Ries. "As TV is more risqué, beer ads reflect a more liberal environment," she says. "For a while, it was dogs, cats, and penguins. Now they're reverting back to 'sex sells.' It's an attention grabber."

Television, Internet, and catalog pictures of products are frequently more attractive than the real thing, but we usually take this type of distortion for granted. Statistical pictures are the histograms, frequency polygons, pie charts, and bar graphs. These drawings or displays of numerical results are difficult to combine with sketches of lovely women or handsome men and hence are secure from the most common form of graphic distortion. However, other distortions are possible. One could shrink or stretch the axes, thus distorting the actual results. The idea behind these distortions is that shallow and steep slopes are commonly associated with small and large increases, respectively.

For example, suppose we want to examine the unemployment rate over a 12-month period. We might show the upward trend as shown in Figure 132. In this graph, the increase in the unemployment rate is apparent, but it does not appear to be that great. On the other hand, we could represent the same data in a much different light, as shown in Figure 133. In this graph, the vertical axis is stretched and does not include 0. Note the impression of a substantial rise that is indicated by the steeper slope. Another way to achieve the same effect (to decrease or increase a slope) is to stretch or shrink the horizontal axis.

[Figure 132: 12-month unemployment rate, 2000-2001, with a vertical axis that includes 0]

[Figure 133: 12-month unemployment rate, 2000-2001, with a stretched vertical axis that does not include 0]

When we present data in the form of bar graphs, histograms, frequency polygons, or other figures,
we must be careful not to shrink or stretch axes, because doing so will catch most readers off guard. Increases or decreases in responses should be judged large or small depending on the arbitrary importance to the observer of the change, not on the slopes shown in graphic representations. In reality, most people look only at the slopes in the pictures.

Communication Hurdles: Biased Samples

One of the most common statistical distortions occurs because the experimenter unwittingly, or sometimes knowingly, samples the wrong population. That is, he or she draws the sample from a set of measurements that is not the proper population of interest.

For example, suppose that we want to assess the reaction of taxpayers to a proposed park and recreation center for children. A random sample of households is selected, and interviewers are sent to those households in the sample. Unfortunately, no one is at home in 40% of the sample households, so we randomly select and substitute other households in the city to make up the deficit. The resulting sample is selected from the wrong population, and the sample is therefore said to be biased.

The specified population of interest in the household survey is the collection of opinions that is obtained from the complete set of all households in the city. In contrast, the sample was drawn from a much smaller population or subset of this group: the set of opinions from householders who were at home when the sample was taken. It is possible that the fractions of householders favoring the park in these two populations are equal, and no damage was done by confining the sampling to those at home. However, it is much more likely that those at home had small children and that this group would yield a higher fraction in favor of the park than would the city as a whole. Thus, we have a biased sample because it is loaded in favor of families with small children. Perhaps a better way to see the difficulty is to note that we unwittingly selected the sample only from a special
subset of the population of interest.

Biased samples frequently result from surveys that use mailed questionnaires. In a sense, the investigator lets the selection and number of the sampling units depend on the interests, available time, and various other personal characteristics of the individuals who receive the questionnaires. Extremely busy and energetic people may drop the questionnaires into the nearest wastebasket; you rarely hear from those low-energy folk who are uninterested or who are engrossed with other activities. Most often, the respondents are activists: those who are highly in favor, those who are opposed, or those who have something to gain from a certain outcome of the survey. In telephone surveys, just think how many people have caller ID or use their answering machine to screen calls, choosing to answer only selected calls; this can have a tremendous impact on the results of the survey.

Although numerous newscasters and analysts use election results as an expression of public opinion on major issues, it is a well-known fact that voting results represent a biased sample of public opinion. Those who vote represent much less than half of the eligible voters; they are individuals who desire to exercise their rights and responsibilities as citizens or are individuals who have been specially motivated to participate. The resultant subset of voters is not representative of the interests and opinions of all eligible voters in the country.

Sampling the wrong population also occurs when people attempt to extrapolate experimental results from one population to another. Numerous experimental results have been published about the effect of various products (e.g., saccharin) in inducing cancer in moles, rats, the breasts of beagles, and so forth. These results are often used to imply that humans have a high risk of developing cancer after frequent or extended exposure to the product. These inferences are not always justified because the experimental results were not obtained on humans.
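
The household-survey example in this section (substituting at-home households for those where nobody answers) can be illustrated with a small simulation. What follows is only a minimal sketch under made-up assumptions: the function name `simulate_at_home_bias` and every probability in it (the share of households with small children, their chance of being at home, and their chance of favoring the park) are hypothetical values chosen for illustration, not figures from the text.

```python
import random


def simulate_at_home_bias(n_households=100_000, sample_size=1_000, seed=641):
    """Compare an at-home-only sample with a true random sample.

    All probabilities below are hypothetical, chosen only to mimic the
    park-survey story: families with small children are assumed to be
    more likely to be at home and more likely to favor the park.
    """
    rng = random.Random(seed)

    # Build the population of interest: one (favors_park, at_home) pair
    # per household in the city.
    population = []
    for _ in range(n_households):
        has_children = rng.random() < 0.30
        favors_park = rng.random() < (0.80 if has_children else 0.40)
        at_home = rng.random() < (0.75 if has_children else 0.45)
        population.append((favors_park, at_home))

    true_fraction = sum(f for f, _ in population) / n_households

    # Substituting new random households until everyone sampled is at home
    # amounts to sampling at random from the at-home subset only.
    at_home_pool = [f for f, home in population if home]
    at_home_sample = rng.sample(at_home_pool, sample_size)
    at_home_estimate = sum(at_home_sample) / sample_size

    # For contrast: a random sample from the full population of interest.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(f for f, _ in random_sample) / sample_size

    return true_fraction, at_home_estimate, random_estimate


if __name__ == "__main__":
    truth, biased, unbiased = simulate_at_home_bias()
    print(f"fraction in favor, whole city:        {truth:.3f}")
    print(f"estimate from at-home households:     {biased:.3f}")
    print(f"estimate from a true random sample:   {unbiased:.3f}")
```

Under these assumed rates, the at-home estimate tends to sit several percentage points above the citywide fraction, while the estimate from the full random sample stays close to it. Taking a larger at-home sample does not remove the gap, because the sample is still drawn from the wrong population.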
It is quite possible that humans are capable of resisting much higher doses than rats, or perhaps humans may be completely resistant for some reason. Drug induction of cancer in small mammals does indicate a need for concern and caution by humans, but it does not prove that the drug is definitely harmful to humans. Note that we are not criticizing experimentation in various species of animals; it is frequently the only way we can obtain any information about potential toxicity in human beings. We simply point out that the experimenter is knowingly sampling a population that is only similar (and quite likely not too similar) to the one of interest.

Engineers also test "rats" instead of "humans." The rats in this context are miniature models or pilot plants of a new engineering system. Experiments on the models occasionally yield results that differ substantially from the results of the larger, real systems. So again we see a sampling from the wrong population, but it is the best the engineer can do because of the economics of the situation. Funds are not usually available to test a number of full-scale models prior to production.

Many other examples could be given of biased samples or of sampling from the wrong populations. The point is that when we communicate the results of a study or survey, we should be clear about how the sample was drawn and whether it was randomly selected from the population of interest. How often are we told how a survey was conducted and what questions were asked when the results of polls are presented in the media? If this information is not given in the published results of a survey or experiment, the reader should take the inferences with a grain of salt.

Communication Hurdles: Sample Size

Distortions can occur when the sample size is not discussed. For example, suppose you read that a survey indicates that approximately 75% of a sample favor a new high-rise building complex. Further investigation might reveal that the investigator sampled only four people. When
three out of the four favored the project, the investigator decided to stop the survey. Of course, we exaggerate with this example, but we could also have revealed inconclusive results based on a sample of 25, even though many buyers would consider this sample size to be large enough. As you well know, very large samples are required to achieve adequate information in sampling binomial populations.

Fortunately, many publications now provide more information about the sample size and how opinion surveys are conducted. Not too many years ago, it was rare to find how many people were sampled, much less how they were sampled. The situation is different now. In fact, sometimes the media have gone too far in an attempt to be completely open about how a survey was done. A case in point is the following article from the Wall Street Journal. How many of us understand much more than the number of people sampled and the approximate plus-or-minus confidence interval? It would take a person well trained in statistics and survey sampling to interpret what was done. Again, the moral of the story is simple: Try to communicate in unambiguous terms.

How Poll Was Conducted

The Wall Street Journal/NBC News poll was based on nationwide telephone interviews conducted last Friday through Monday with 4159 adults, age 18 or older. There were 2630 likely voters.

The sample was drawn from a complete list of telephone exchanges, chosen so that each region of the country was represented in proportion to its population. Households were selected by a method that gave all telephone numbers, listed and unlisted, a proportionate chance of being included. The results of the survey were weighted to adjust for variations in the sample relating to education, age, race, gender, and religion.

Chances are 19 of 20 that if all adults in the United States had been surveyed using the same questionnaire, the findings would differ from these poll results by no more than two percentage points in either direction. The margin of error for subgroups may be larger.

Preparing Data for Statistical Analysis

We begin with a discussion of the steps involved in processing data from a study. In practice, these steps may consume 75% of the total effort from the receipt of the raw data to the presentation of results from the analysis. What are these steps, why are they so important, and why are they so time-consuming? To answer these questions, let's list the major data-processing steps in the cycle, which begins with receipt of the data and ends when the statistical analysis begins. Then we'll discuss each step separately.

1. Receiving the raw data source
2. Creating the database from the raw data source
3. Editing the database
4. Correcting and clarifying the raw data source
5. Finalizing the database
6. Creating data files from the database

1. Receiving the raw data source. For each study that is to be summarized and analyzed, the data arrive in some form, which we'll refer to as the raw data source. For a clinical trial, the raw data source is usually case report forms, sheets of 8" x 11" paper that have been used to record study data for each patient entered into the study. For other types of studies, the raw data source may be sheets of paper from a laboratory notebook, a magnetic tape (or any other form of machine-readable data), hand tabulations, and so on.

It is important to retain the raw data source because it is the beginning of the data trail, which leads from the raw data to the conclusions drawn from a study. Many consulting operations involved with the analysis and summarizing of many different studies keep a log that contains vital information related to the study and raw data source. General information contained in a study log is shown next.

1. Data received, and from whom
2. Study investigator
3. Statistician (and others) assigned
4. Brief description of study
5. Treatments (compounds, preparations, and so on) studied
6. Raw data source
7. Response(s) measured
8. Reference number for study
9. Estimated (actual) completion date
10. Other
pertinent information

Later, when the study has been analyzed and results have been communicated, additional information can be added to the log on how the study results were communicated, where these results are recorded, what data files have been saved, and where these files are stored.

2. Creating the database from the raw data source. For most studies that are scheduled for a statistical analysis, a machine-readable database is created. The steps taken to create the database and the eventual form of the database vary from one operation to another, depending on the software systems to be used in the statistical analysis. However, we can give a few guidelines based on the form of the entry system.

When the data are to be key-entered at a terminal, the raw data are first checked for legibility. Any illegible numbers or letters or other problems should be brought to the attention of the study coordinator. Then a coding guide that assigns column numbers and variable names to the data is filled out. Certain codes for missing values (for example, those not available) are also defined here. Also, it is helpful to give a brief description of each variable. The data file keyed in at the terminal is referred to as the machine-readable database. A listing of the contents of the database should be obtained and checked carefully against the raw data source. Any errors should be corrected at the terminal and verified against an updated listing.

Sometimes data are received in machine-readable form. In these situations, the magnetic tape or disk file is considered to be the database. You must, however, have a coding guide to read the database. Using the coding guide, obtain a listing of the contents of the database and check it carefully to see that all numbers and characters look reasonable and that proper formats were used to create the file. Any problems that arise must be resolved before proceeding further.

Some data sets are so small that it is not necessary to create a machine-readable data file from the raw data source.
Instead, calculations can be performed by hand, or the data can be entered into an electronic calculator. In these situations, check any calculations to see that they make sense. Don't believe everything you see; redoing the calculations is not a bad idea.

3. Editing the database. The types of edits done and the completeness of the editing process really depend on the type of study and how concerned you are about the accuracy and completeness of the data prior to analysis. For example, in using SAS files, it is wise to examine the minimum, maximum, and frequency distribution for each variable to make certain nothing looks unreasonable.

Certain other checks should be made. Plot the data and look for problems. Also, certain logic checks should be done, depending on the structure of the data. For example, if data are recorded for patients during several different visits, then the date recorded for visit 2 cannot be earlier than the date for visit 1; similarly, if a patient is lost to follow-up after visit 2, we cannot have any data for that patient at later visits.

For small data sets, we can do these data edits by hand, but for large data sets the job may be too time-consuming and tedious. If machine editing is required, look for a software system that allows the user to specify certain data edits. Even so, for more complicated edits and logic checks, it may be necessary to have a customized edit program written in order to machine-edit the data. This programming chore can be a time-consuming step; plan for this well in advance of receipt of the data.

4. Correcting and clarifying the raw data source. Questions frequently arise concerning the legibility or accuracy of the raw data during any one of the steps from the receipt of the raw data to the communication of the results from the statistical analysis. We have found it helpful to keep a list of these problems or discrepancies in order to define the data trail for a study. If a correction (or clarification) is required to the raw data source, this should be
indicated on the form and the appropriate change made to the raw data source. If no correction is required, this should be indicated on the form as well. Keep in mind that the machine-readable database should be changed to reflect any changes made to the raw data source.

5. Finalizing the database. You may have been led to believe that all data for a study arrive at one time. This, of course, is not always the case. For example, with a marketing survey, different geographic locations may be surveyed at different times, and hence those responsible for data processing do not receive all the data at once. All these subsets of data, however, must be processed through the cycles required to create, edit, and correct the database. Eventually the study is declared complete, and the data are processed into the database. At this time, the database should be reviewed again and final corrections made before beginning the analysis. This is because, for large data sets, the analysis and summarizing chores take considerable staff and computer time. It's better to agree on a final database analysis than to have to repeat all analyses on a changed database at a later date.

6. Creating data files from the database. Generally, one or two sets of data files are created from the machine-readable database. The first set, referred to as original files, reflects the basic structure of the database. A listing of the files is checked against the database listing to verify that the variables have been read with correct formats and missing value codes have been retained. For some studies, the original files are actually used for editing the database.

A second set of data files, called work files, may be created from the original files. Work files are designed to facilitate the analysis. They may require restructuring of the original files, a selection of important variables, or the creation or addition of new variables by insertion, computation, or transformation. A listing of the work files is checked against that of the original files to ensure proper restructuring and variable selection. Computed and transformed variables are checked by hand calculations to verify the program code.

If your original and work files are SAS data sets, you should use the documentation features provided by SAS. At the time an SAS data set is created, a descriptive label for the data set, of up to 40 characters, should be assigned. The label can be stored with the data set and is printed wherever the CONTENTS procedure is used to print the data set's contents. All variables can be given descriptive names, up to 8 characters in length, which are meaningful to those involved in the project. In addition, variable labels up to 40 characters in length can be used to provide additional information. Title statements can be included in the SAS code to identify the project and describe each job. For each file, a listing (PROC PRINT) and a dictionary (PROC CONTENTS) can be retained.

For files created from the database using other software packages, use the labeling and documentation features available in the computer program.

Even if appropriate statistical methods are applied to data, the conclusions drawn from the study are only as good as the data on which they are based. So you be the judge. The amount of time spent on these data-processing chores before analysis really depends on the nature of the study, the quality of the raw data source, and how confident you want to be about the completeness and accuracy of the data.

Guidelines for a Statistical Analysis and Report

In this section, we briefly discuss a few guidelines for performing a statistical analysis and list some important elements of a statistical report used to communicate results. The statistical analysis of a large study can usually be broken down into three types of analyses: (1) preliminary analyses, (2) primary analyses, and (3) backup analyses. The preliminary analyses, which are often descriptive or graphic, familiarize the statistician with the data and provide a foundation for all subsequent analyses. These analyses may
include frequency distributions, histograms, descriptive statistics, an examination of comparability of the treatment groups, correlations, or univariate and bivariate plots. Primary analyses address the objectives of the study and are the analyses on which conclusions are drawn. Backup analyses include alternate methods for examining the data that confirm the results of the primary analyses; they may also include new statistical methods that are not as readily accepted as the more standard methods. Several guidelines for analyses follow.

1. Perform the analyses with software that has been extensively tested.
2. Label the computer output to reflect which study is analyzed, what subjects (animals, patients, and so on) are used in the analysis, and a brief description of the analysis performed. For example, TITLE statements in SAS are very helpful.
3. Use variable labels and value labels (for example, 0 = none, 1 = mild) on the output.
4. Provide a list of the data used in each analysis.
5. Check the output carefully for all analyses. Did the job run successfully? Are the sample sizes, means, and degrees of freedom correct? Other checks may be necessary as well.
6. Save all preliminary, primary, and backup analyses that provide the informational base from which study conclusions are drawn.

After the statistical analysis is completed, conclusions must be drawn and the results communicated to the intended audience. Sometimes it is necessary to communicate these results as a formal written statistical report. A general outline for a statistical report that we have found useful and informative follows.

General Outline for a Statistical Report

1. Summary
2. Introduction
3. Experimental design and study procedures
4. Descriptive statistics
5. Statistical methodology
6. Results and conclusions
7. Discussion
8. Data listings

Documentation and Storage of Results

The final part of this cycle of data processing, analysis, and summarizing concerns the documentation and storage of results. For formal statistical analyses that are
subject to careful scrutiny by others, it is important to provide detailed documentation for all data processing and the statistical analyses so the data trail is clear and the database or work files are readily accessible. Then the reviewer can follow what has been done, redo it, or extend the analyses. The elements of a documentation and storage file depend on the particular setting in which you work. The contents for a general documentation storage file are as follows.

Study Documentation and Storage File

1. Statistical report
2. Study description
3. Random code used to assign subjects to treatment groups
4. Important correspondence
5. File creation information
6. Preliminary, primary, and backup analyses
7. Raw data source
8. A data management sheet, which includes the study log as well as information on the storage of the data files

The major thrust behind the documentation and storage file is that we want to provide a clear data and analysis trail for our own use or for someone else's use, should there be a need to revisit the data. For any given situation, ask yourself whether such documentation is necessary and, if so, how detailed it must be. A good test of the completeness and understandability of your documentation is to ask a colleague who is unfamiliar with your project, but knowledgeable in your field, to try to reconstruct and even redo the primary analyses you did. If he or she can navigate through your documentation trail, you have done the job.

Example of Designed Experiment

The following example is from R. D. Snee (1983), "Graphical Analysis of Process Variation Studies," Journal of Quality Technology, 15, 76-88.

In most industrial processes there are numerous sources of variation in the physical characteristics of the product being produced. Frequently, studies are conducted to investigate what aspects of the production process are the major causes of the variation. For example, a chemical analysis is performed on the raw materials prior to their injection into the process. This
analysis involves different specimens of the raw materials and is performed by numerous operators using a combustion furnace. In order to investigate sources of variability for a chemical analysis, an experiment was designed and analyzed to ensure that relevant sources of variation could be identified and measured. The process engineer and operators discussed the situation and agreed that the four possible major sources of variation are:

1. Operator (O): Variation due to operators systematically differing in their adherence to the analytic procedures.
2. Specimen (SO): Variation in specimens of raw materials analyzed by the same operator.
3. Combustion Run (RSO): Variation in measurements from run to run in the furnace using the same specimen and operator, due to variations in the conditions within the combustion furnace on any given run.
4. Chemical Analysis (ARSO): Variation in the measurements of the chemical analysis performed on the material from a fixed combustion run using the same specimen and operator, due to equipment or procedural variation.

The experiment was designed to measure the relative sizes of each of the four potential sources of variation. Three operators were randomly selected to perform the analysis. Each operator analyzed two specimens, made three combustion runs on each specimen, and titrated each run in duplicate. The results of the experiment are displayed in the following table and figure.

                            Chemical Analysis
Operator   Specimen   Run      1       2
   1          1        1      156     154
                       2      151     154
                       3      154     160
              2        1      148     150
                       2      154     157
                       3      147     149
   2          3        1      125     125
                       2       94      95
                       3       98     102
              4        1      118     124
                       2      112     117
                       3       98     110
   3          5        1      184     184
                       2      172     186
                       3      181     191
              6        1      172     176
                       2      181     184
                       3      175     177

[Figure: Results of Study on Variation in Chemical Analysis. Combustion runs are boxed. Duplicate analyses are connected by vertical lines.]

The above figure highlights a major problem with the chemical
analysis procedure. There are definite differences in the analytic results of the three operators. Operator 1 exhibits very consistent results for each of the two specimens and each of the three combustion runs. Operator 2 produces analytic results which are lower on average than those of the other two operators. Operator 3 shows good consistency between the two specimens, but the repeat analyses of two of the combustion runs on specimen 5 appear to have substantially larger variation than most of the other repeat analyses in the data set. Operator 2 likewise shows good average consistency for the two specimens, but large variations both for the triplicate combustion runs for each specimen and for at least one of the repeat analyses for the fourth specimen.

A statistical analysis of the four variance components reveals the following percent allocation of the total variation in the measurements:

Source of Variation    Percent of Total
Operator                    94.67
Specimen                     0.00
Combustion Run               4.03
Chemical Analysis            1.22
Other Sources                0.08

The following figures are plots of response against specimen from hypothetical experiments which would yield particularly distinct results.

Figure: Experiment Having All Variation Due to Operator.

Figure: Experiment Having All Variation Due to Specimen Within Operator.

Studying the Relationship Between Variables: The Challenger Disaster

The following example is from Hogg and Ledolter (1992), Applied Statistics for Engineers and Physical Scientists. On the morning of January 28, 1986, the Challenger space shuttle was launched from the Kennedy Space Center in Florida. Meteorologists on the previous day had predicted temperatures at launch to be around 30°F. The night before launch, there was much debate among engineers and NASA officials about whether such a low-temperature launch was safe. Several engineers advised against a launch because they thought that O-ring failures were related to temperature. Data on O-ring failures experienced in previous launches were available and were studied the night before the launch. There were seven previous launches in which O-ring failures occurred. A plot of the number of O-ring failures versus temperature is given below.

FIGURE 1.5-1 Scatter plot of number of distressed rings per launch against temperature (°F).

From this plot alone, there does not seem to be a strong relationship between the number of O-ring failures and temperature. Based on this information, it was decided to launch. The launch resulted in disaster: the loss of seven astronauts, billions of dollars, and a serious setback in the space program.

The major problem with the above plot is that the engineers did not display all the data that were relevant to the question of whether O-ring failure is related to temperature. They looked only at the launches where there were failures; they ignored the launches where there were no failures. A scatter plot of the number of O-ring failures per launch against temperature, using data from all previous shuttle launches, is displayed here.

FIGURE 1.5-2 Scatter plot of number of distressed rings per launch against temperature (°F), all data. Flights with no incidents are included.

This plot reveals a relationship between failures and temperature. Furthermore, an extrapolation is required: an inference is needed about the number of failures outside the observed range of temperatures. The temperature at Challenger's launch was only 31°F, while the lowest temperature recorded at a previous launch was 51°F. It is always very dangerous to extrapolate inferences to a region of values for which there are no data. If NASA officials had looked at this plot, the launch would certainly have been delayed. This example illustrates why it is so important to have statistically
minded engineers involved in important decisions. Ron Snee, a noted applied statistician, has stated many times: "In God we trust; others must have data."

This example raises two important points. The first is the importance of scatter plots, in which we plot one variable against another. The second is the importance of plotting relevant data. In the Challenger study, a scatter plot was used in reaching the decision to launch; however, not all the relevant data were utilized. It takes knowledge of statistics to make good decisions, as well as knowledge of the relevant subjects, common sense, and an ability to question the relevance of information.
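
Returning to the designed experiment from Snee (1983), the percent allocation of variation can be approximated from the table of duplicate analyses. The sketch below is only an illustration under stated assumptions: the notes do not say how the variance components were estimated, so it uses the standard method-of-moments (ANOVA) estimators for a balanced nested design and sets negative estimates to zero. The names DATA and percent_allocation are hypothetical, and the "Other Sources" row is not modeled. The output should land near the reported percentages but need not reproduce them exactly.

```python
from statistics import mean

# Duplicate analyses from the table: operator -> specimen -> runs.
DATA = {
    1: {1: [(156, 154), (151, 154), (154, 160)],
        2: [(148, 150), (154, 157), (147, 149)]},
    2: {3: [(125, 125), (94, 95), (98, 102)],
        4: [(118, 124), (112, 117), (98, 110)]},
    3: {5: [(184, 184), (172, 186), (181, 191)],
        6: [(172, 176), (181, 184), (175, 177)]},
}


def percent_allocation(data):
    """Percent of total variance per source in a balanced nested design.

    Assumption: method-of-moments estimators, negatives truncated to zero.
    """
    a = len(data)                                  # operators
    first_op = next(iter(data.values()))
    b = len(first_op)                              # specimens per operator
    first_sp = next(iter(first_op.values()))
    c = len(first_sp)                              # runs per specimen
    n = len(first_sp[0])                           # analyses per run

    grand = mean(y for sps in data.values() for runs in sps.values()
                 for run in runs for y in run)

    ss_o = ss_s = ss_r = ss_a = 0.0
    for specimens in data.values():
        op_mean = mean(y for runs in specimens.values()
                       for run in runs for y in run)
        ss_o += b * c * n * (op_mean - grand) ** 2
        for runs in specimens.values():
            sp_mean = mean(y for run in runs for y in run)
            ss_s += c * n * (sp_mean - op_mean) ** 2
            for run in runs:
                run_mean = mean(run)
                ss_r += n * (run_mean - sp_mean) ** 2
                ss_a += sum((y - run_mean) ** 2 for y in run)

    # Mean squares, using the degrees of freedom of the nested design.
    ms_o = ss_o / (a - 1)
    ms_s = ss_s / (a * (b - 1))
    ms_r = ss_r / (a * b * (c - 1))
    ms_a = ss_a / (a * b * c * (n - 1))

    # Equate mean squares to their expectations and solve.
    components = {
        "Operator": max(0.0, (ms_o - ms_s) / (b * c * n)),
        "Specimen": max(0.0, (ms_s - ms_r) / (c * n)),
        "Combustion Run": max(0.0, (ms_r - ms_a) / n),
        "Chemical Analysis": ms_a,
    }
    total = sum(components.values())
    return {source: 100.0 * v / total for source, v in components.items()}


percent = percent_allocation(DATA)
for source, pct in percent.items():
    print(f"{source:<18} {pct:6.2f}")
```

Truncating a negative estimate to zero is one common convention; a likelihood-based fit would be another reasonable choice and could give slightly different percentages.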

