All Chapter Notes
Math 202: Intro to Statistical Methods II
This 246-page Study Guide was uploaded by Marisa Keller on Tuesday, August 4, 2015. The Study Guide belongs to Math 202 at the University of Delaware, taught by Anthony Seraphin in Summer 2015. Since its upload, it has received 1227 views. For similar materials see Intro to Statistical Methods II in Statistics at the University of Delaware.
Chapter 11: Categorical Data Analysis

Categorical Data and the Multinomial Distribution

Properties of the Multinomial Experiment:
1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial, called classes, categories, or cells.
3. The probabilities of the k outcomes remain constant from trial to trial.
4. The trials are independent.
5. The variables of interest are the cell counts n1, n2, ..., nk: the number of observations that fall into each of the k classes.

Testing Category Probabilities: One-Way Table

In a multinomial experiment with categorical data from a single qualitative variable, we summarize the data in a one-way table. Schema for a one-way table for an experiment with k outcomes:

  Class:  1    2    ...   k
  Count:  n1   n2   ...   nk

Hypothesis Testing for a One-Way Table

The test is based on the X^2 statistic, which allows comparison between the observed distribution of counts and an expected distribution of counts across the k classes. Expected distribution: E(n_k) = n * p_k, where n is the total number of trials and p_k is the hypothesized probability of being in class k according to H0.

The test statistic is calculated as

  X^2 = sum over i = 1..k of [n_i - E(n_i)]^2 / E(n_i)

and the rejection region is determined by the X^2 distribution using k - 1 df and the desired alpha.

The null hypothesis is often formulated as "no difference," H0: p1 = p2 = ... = pk = 1/k, but it can be formulated with nonequivalent probabilities. The alternative hypothesis, Ha, states that at least one of the multinomial probabilities does not equal its hypothesized value.

One-Way Tables: an example

H0: p_None = .10, p_Standard = .65, p_Merit = .25
Ha:
At least two of the proportions differ from the proposed plan.

The rejection region with alpha = .01 and df = k - 1 = 2 is X^2 > 9.21034. The worksheet results:

  Category   Observed   Expected
  None          42          60
  Standard     365         390
  Merit        193         150
  Total        600         600

  Number of categories: 3; degrees of freedom: 2
  Chi-square test statistic: 19.33; p-value: 6.35E-05

Since the test statistic falls in the rejection region, we reject H0.

Conditions Required for a Valid X^2 Test (One-Way Table):
- A multinomial experiment has been conducted.
- The sample size is large: E(n_i) is at least 5 for every cell.

Testing Category Probabilities: Two-Way (Contingency) Table

Used when classifying with two qualitative variables. General r x c contingency table:

               Column 1   Column 2   ...   Column c   Row Totals
  Row 1         n11        n12       ...    n1c        R1
  Row 2         n21        n22       ...    n2c        R2
  ...
  Row r         nr1        nr2       ...    nrc        Rr
  Col Totals    C1         C2        ...    Cc         n

H0: The two classifications are independent.
Ha: The two classifications are dependent.

Test statistic:
  X^2 = sum over all cells of [n_ij - Ê(n_ij)]^2 / Ê(n_ij), where Ê(n_ij) = (R_i * C_j) / n
Rejection region: X^2 > X^2_alpha, where X^2_alpha has (r - 1)(c - 1) df.

Conditions Required for a Valid X^2 Test (Two-Way Table):
- The n observed counts are a random sample from the population of interest.
- The sample size is large: Ê(n_ij) is at least 5 for every cell.

[Figure: sample statistical-package output, a GENDER x USER (Mail / Internet / Both) crosstabulation with observed and expected counts, followed by chi-square tests (Pearson chi-square, likelihood ratio, linear-by-linear association) and a footnote flagging cells with expected counts less than 5.]

A Word of Caution about Chi-Square Tests

When an expected cell count is less than 5, the X^2 probability distribution should not be used. If H0 is not rejected, do not accept H0 (that the classifications are independent), because of the implications of a Type II error. Do not infer causality when H0 is rejected: contingency table analysis determines statistical dependence only.
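The one-way chi-square computation in this chapter can be sketched in a few lines of code. This is a minimal sketch using the None/Standard/Merit example's counts and hypothesized probabilities; the critical value 9.21034 is the tabled chi-square value quoted in the notes (alpha = .01, df = 2).

```python
# Minimal sketch of the one-way chi-square test from the notes (pure Python).
# Observed counts and H0 probabilities follow the None/Standard/Merit example.

observed = [42, 365, 193]
p0 = [0.10, 0.65, 0.25]          # hypothesized class probabilities under H0
n = sum(observed)                # total number of trials (600)

expected = [n * p for p in p0]   # E(n_i) = n * p_i  -> [60, 390, 150]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 9.21034               # tabled chi-square value, alpha=.01, df=k-1=2
print(round(chi2, 2), chi2 > critical)   # 19.33 True -> reject H0
```

The same statistic can be obtained from `scipy.stats.chisquare` if SciPy is available.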
Chapter 12: Simple Linear Regression

Probabilistic Models

General form of probabilistic models:
  y = deterministic component + random error, where E(y) = deterministic component

First-Order (Straight-Line) Probabilistic Model:
  y = β0 + β1x + ε

5 Steps of Simple Linear Regression:
1. Hypothesize the deterministic component.
2. Use sample data to estimate the unknown model parameters.
3. Specify the probability distribution of ε and estimate the standard deviation of the distribution.
4. Statistically evaluate model usefulness.
5. Use the model for prediction and estimation once it is judged useful.

Fitting the Model: The Least Squares Approach

Advertising/Sales data:

  Month   Advertising Expenditure x ($100s)   Sales Revenue y ($1,000s)
  1       1                                   1
  2       2                                   1
  3       3                                   2
  4       4                                   2
  5       5                                   4

The least squares line ŷ = β̂0 + β̂1x has:
- sum of errors SE = 0, and
- sum of squared errors SSE smallest of all straight-line models.

Formulas:
  Slope: β̂1 = SS_xy / SS_xx
  y-intercept: β̂0 = ȳ - β̂1 x̄
  SS_xx = Σx_i^2 - (Σx_i)^2 / n
  SS_xy = Σx_i y_i - (Σx_i)(Σy_i) / n

Preliminary computations:

  x_i   y_i   x_i^2
  1     1      1
  2     1      4
  3     2      9
  4     2     16
  5     4     25
  Totals: Σx_i = 15, Σy_i = 10, Σx_i^2 = 55 (and Σx_i y_i = 37)

These give SS_xx = 55 - 15^2/5 = 10 and SS_xy = 37 - (15)(10)/5 = 7, so β̂1 = 7/10 = .7 and β̂0 = 2 - (.7)(3) = -.1.

Comparing observed and predicted values for the least squares prediction equation ŷ = -.1 + .7x:

  x   y   ŷ     (y - ŷ)   (y - ŷ)^2
  1   1    .6     .4        .16
  2   1   1.3    -.3        .09
  3   2   2.0     .0        .00
  4   2   2.7    -.7        .49
  5   4   3.4     .6        .36
  Sum of errors = 0;  SSE = 1.10

Model Assumptions:
1. The mean of the probability distribution of ε is 0.
2. The variance of the probability distribution of ε is constant for all values of x.
3. The probability distribution of ε is normal.
4. The values of ε are independent of each other.
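The least squares computations for the advertising/sales example can be sketched directly from the formulas above (x in $100s, y in $1,000s; all values are from the notes' table):

```python
# Sketch of the least squares computations for the advertising/sales example.

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n                  # 55 - 45 = 10
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n   # 37 - 30 = 7

b1 = ss_xy / ss_xx                 # slope = 0.7
b0 = sum_y / n - b1 * sum_x / n    # intercept = -0.1

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(round(b1, 2), round(b0, 2), round(sse, 2))   # 0.7 -0.1 1.1
```

The printed values match the worked table in the notes: β̂1 = .7, β̂0 = -.1, SSE = 1.10.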
An Estimator of σ^2:
  s^2 = SSE / (n - 2)    (degrees of freedom for error = n - 2)
  SSE = SS_yy - β̂1 SS_xy, where SS_yy = Σ(y_i - ȳ)^2 = Σy_i^2 - (Σy_i)^2 / n
  s = √(s^2) is the estimated standard error of the regression model.

Making Inferences about the Slope β1

Sampling distribution of β̂1: σ_β̂1 = σ / √SS_xx.

A Test of Model Utility: Simple Linear Regression

  One-Tailed Test:                    Two-Tailed Test:
  H0: β1 = 0                          H0: β1 = 0
  Ha: β1 < 0 (or Ha: β1 > 0)          Ha: β1 ≠ 0
  Test statistic: t = β̂1 / s_β̂1 = β̂1 / (s / √SS_xx)
  Rejection region: t < -t_α          Rejection region: |t| > t_α/2
  (or t > t_α when Ha: β1 > 0)
  where t_α and t_α/2 are based on n - 2 degrees of freedom.

A 100(1 - α)% Confidence Interval for β1:
  β̂1 ± t_α/2 s_β̂1, where s_β̂1 = s / √SS_xx

The Coefficient of Correlation:
  r = SS_xy / √(SS_xx SS_yy)
- Positive r: y increases as x increases; r = 1 indicates a perfect positive linear relationship between y and x.
- r near 0: little or no linear relationship between y and x.
- Negative r: y decreases as x increases; r = -1 indicates a perfect negative linear relationship between y and x.
[Figure: six scatterplots, (a) through (f), illustrating these cases.]

The Coefficient of Determination:
  r^2 = (SS_yy - SSE) / SS_yy = 1 - SSE / SS_yy
the proportion of the total sample variability around ȳ that is explained by the linear relationship between y and x. [Figure: scattergrams contrasting the assumption that x contributes no information for predicting y (ŷ = ȳ) with the assumption that x contributes information (ŷ = β̂0 + β̂1x).]

Using the Model for Estimation and Prediction

Sampling errors and intervals will be larger for predictions (individual values) than for estimates (mean values).

100(1 - α)% confidence interval for the mean value of y at x = x_p:
  ŷ ± t_α/2 s √(1/n + (x_p - x̄)^2 / SS_xx)

100(1 - α)% prediction interval for an individual new value of y at x = x_p:
  ŷ ± t_α/2 s √(1 + 1/n + (x_p - x̄)^2 / SS_xx)

where t_α/2 is based on n - 2 degrees of freedom.
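The model-utility t-test can be sketched numerically with the advertising/sales results (SSE = 1.10, SS_xx = 10, β̂1 = .7, n = 5, all from the notes; the critical value 3.182 is the tabled t_.025 with 3 df):

```python
# Sketch of the model-utility t-test for the advertising/sales fit.
import math

n, sse, ss_xx, b1 = 5, 1.10, 10.0, 0.7

s = math.sqrt(sse / (n - 2))         # s = sqrt(SSE / (n - 2)) ~ 0.6055
se_b1 = s / math.sqrt(ss_xx)         # s_beta1 = s / sqrt(SS_xx) ~ 0.1915
t = b1 / se_b1                       # test statistic

t_crit = 3.182                       # tabled t_.025 with n - 2 = 3 df
print(round(t, 2), abs(t) > t_crit)  # 3.66 True -> reject H0: beta1 = 0
```

Since |t| exceeds the two-tailed critical value, the model is judged useful at α = .05.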
Chapter 13: Multiple Regression and Model Building

Multiple Regression Models

The general multiple regression model:
  y = β0 + β1x1 + β2x2 + ... + βkxk + ε
where y is the dependent variable; x1, x2, ..., xk are the independent variables; E(y) = β0 + β1x1 + β2x2 + ... + βkxk is the deterministic portion of the model; and βi determines the contribution of the independent variable xi.

Analyzing a Multiple Regression Model:
1. Hypothesize the deterministic component of the model.
2. Use sample data to estimate β0, β1, ..., βk.
3. Specify the probability distribution of ε and estimate σ.
4. Check that the assumptions on ε are satisfied.
5. Statistically evaluate model usefulness.
6. If the model is useful, use it for prediction, estimation, and other purposes.

The First-Order Model: Estimating and Interpreting the β Parameters

For E(y) = β0 + β1x1 + β2x2 + ... + βkxk, the fitted model ŷ = β̂0 + β̂1x1 + ... + β̂kxk minimizes SSE = Σ(y_i - ŷ_i)^2.

Example: y = β0 + β1x1 + β2x2 + β3x3 + ε, where
  y = sale price (dollars)
  x1 = appraised land value (dollars)
  x2 = appraised improvements (dollars)
  x3 = area (square feet)

[Figure: matrix plot of SALEPRIC versus LANDVAL, IMPROVAL, and AREA for the sample of n = 20 properties.]

Fit the model to the data. Regression analysis, SALEPRIC versus LANDVAL, IMPROVAL, AREA:

  The regression equation is
  SALEPRIC = 1470 + 0.814 LANDVAL + 0.820 IMPROVAL + 13.5 AREA

  Predictor   Coef      SE Coef   T      P
  Constant    1470      5746      0.26   0.801
  LANDVAL     0.8145    0.5122    1.59   0.131
  IMPROVAL    0.8204    0.2112    3.88   0.001
  AREA        13.529    6.586     2.05   0.057

  S = 7919.48   R-Sq = 89.7%   R-Sq(adj) = 87.8%

  Analysis of Variance
  Source           DF   SS           MS           F       P
  Regression        3   8779676741   2926558914   46.66   0.000
  Residual Error   16   1003491259     62718204
  Total            19   9783168000
The First-Order Model: Estimating and Interpreting the β Estimates

Interpret the β estimates (holding the other variables constant in each case):
- β̂1 = .8145: the mean sale price of the property is estimated to increase $.8145 for every $1 increase in appraised land value.
- β̂2 = .8204: the mean sale price is estimated to increase $.8204 for every $1 increase in appraised improvements.
- β̂3 = 13.53: the mean sale price is estimated to increase $13.53 for each additional square foot of living area.

Given the model E(y) = 1 + 2x1 + x2, the effect of x2 on E(y), holding x1 constant, is a parallel shift of the response line. [Figure: parallel lines of E(y) versus x2 for several fixed values of x1.]

Model Assumptions: Assumptions about the Random Error ε
1. For any given set of values of x1, x2, ..., xk, the random error ε has a normal probability distribution with mean 0 and variance σ^2.
2. The random errors are independent.

Estimator of σ^2 for a Multiple Regression Model with k Independent Variables:
  s^2 = SSE / (n - number of estimated β parameters) = SSE / (n - (k + 1))

Inferences about the β Parameters

Two types of inferences can be made, using either confidence intervals or hypothesis testing. For any inferences to be valid, the assumptions made about the random error term ε (normal distribution with mean 0 and variance σ^2, independence of errors) must be met.

A 100(1 - α)% Confidence Interval for a β Parameter:
  β̂i ± t_α/2 s_β̂i
where t_α/2 is based on n - (k + 1) degrees of freedom, n = number of observations, and k + 1 = number of β parameters in the model.

A Test of an Individual Parameter Coefficient:

  One-Tailed Test:                    Two-Tailed Test:
  H0: βi = 0                          H0: βi = 0
  Ha: βi < 0 (or Ha: βi > 0)          Ha: βi ≠ 0
  Test statistic: t = β̂i / s_β̂i
  Rejection region: t < -t_α          Rejection region: |t| > t_α/2
  (or t > t_α when Ha: βi > 0)
  where t_α and t_α/2 are based on n - (k + 1) degrees of freedom.
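As a numeric sketch of the confidence-interval formula, using the IMPROVAL coefficient and standard error from the sale-price output, and assuming the tabled t_.025 with n - (k + 1) = 20 - 4 = 16 df is 2.120:

```python
# Hedged sketch: 95% CI for one beta parameter, beta_hat ± t * SE.
# Coefficient and SE are the IMPROVAL values from the sale-price example;
# t_crit = 2.120 is the tabled t_.025 with 16 df (assumed table value).

beta_hat = 0.8204      # estimated coefficient
se = 0.2112            # its standard error
t_crit = 2.120         # t_{alpha/2}, 16 df, alpha = .05

lower = beta_hat - t_crit * se
upper = beta_hat + t_crit * se
print(round(lower, 4), round(upper, 4))   # 0.3727 1.2681
```

Since the interval excludes 0, it agrees with the small p-value (0.001) reported for IMPROVAL.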
Inferences about the β Parameters: An Excel Analysis

[Figure: Excel regression output (Regression Statistics, ANOVA, and coefficient table). The coefficient table's Coef, SE, t Stat, and P-value columns are used for hypotheses about parameter coefficients; the Lower 95% / Upper 95% columns are used for confidence intervals.]

Checking the Overall Utility of a Model: 3 tests

1. Multiple coefficient of determination:
   R^2 = (SS_yy - SSE) / SS_yy = 1 - SSE / SS_yy = explained variability / total variability
2. Adjusted multiple coefficient of determination:
   R_a^2 = 1 - [(n - 1) / (n - (k + 1))] (1 - R^2)
3. Global F-test:
   F = [(SS_yy - SSE) / k] / [SSE / (n - (k + 1))] = (R^2 / k) / [(1 - R^2) / (n - (k + 1))]

Testing Global Usefulness of the Model: The Analysis-of-Variance F-Test

H0: β1 = β2 = ... = βk = 0
Ha: At least one βi ≠ 0

Test statistic:
  F = Mean Square (Model) / Mean Square (Error)
    = [(SS_yy - SSE) / k] / [SSE / (n - (k + 1))]
    = (R^2 / k) / [(1 - R^2) / (n - (k + 1))]
where n is the sample size and k is the number of terms in the model.
Rejection region: F > F_α, with k numerator degrees of freedom and n - (k + 1) denominator degrees of freedom.

Checking the Utility of a Multiple Regression Model:
1. Conduct a test of overall model adequacy using the F-test. If H0 is rejected, proceed to step 2.
2. Conduct t-tests on the β parameters of particular interest.

Using the Model for Estimation and Prediction

As in simple linear regression, intervals around a predicted (individual) value will be wider than intervals around an estimated (mean) value. Most statistics packages will print out both confidence and prediction intervals.

[Figure: package output of predicted values for a new observation, showing Fit, SE Fit, a 95% CI, and a 95% PI, together with the values of the predictors (DOTEST and several bid-ratio and district variables) for the new observation.]
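The three overall-utility quantities can be sketched from SSE and SS_yy alone. The numbers below are the sale-price example's ANOVA values (SSE = 1,003,491,259; total SS = 9,783,168,000; n = 20; k = 3), so the results should reproduce the output's R-Sq, R-Sq(adj), and F:

```python
# Sketch: R^2, adjusted R^2, and the global F statistic from SSE and SS_yy.
# Values are the sale-price example's ANOVA entries (n = 20, k = 3).

sse = 1_003_491_259.0
ss_yy = 9_783_168_000.0
n, k = 20, 3

r2 = 1 - sse / ss_yy
r2_adj = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)
f = (r2 / k) / ((1 - r2) / (n - (k + 1)))

print(round(r2, 3), round(r2_adj, 3), round(f, 2))   # 0.897 0.878 46.66
```

This matches the printed output (R-Sq = 89.7%, R-Sq(adj) = 87.8%, F = 46.66), confirming the algebraic identity between the SSE-based and R^2-based forms of F.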
Model Building: Interaction Models

An interaction model relating E(y) to two quantitative independent variables:
  E(y) = β0 + β1x1 + β2x2 + β3x1x2
where (β1 + β3x2) represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed, and (β2 + β3x1) represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed.

When the relationship between y and x1 is not impacted by a second x, there is no interaction (parallel lines); when the linear relationship between y and x1 depends on another x, there is interaction (nonparallel lines). [Figure: two panels of response lines illustrating no interaction versus interaction.]

[Figure: regression output for PRICE versus AGE, NUMBIDS, and the AGE x BIDS interaction in the grandfather-clock auction example (R-Sq about 95.4%), with a plot of price versus age of clock drawn separately for 5 and 6 bidders to show the interaction between x1 and x2.]

Model Building: Quadratic and Other Higher-Order Models

A quadratic (second-order) model:
  E(y) = β0 + β1x + β2x^2
where β0 is the y-intercept of the curve, β1 is a shift parameter, and β2 is the rate of curvature.

Home Size / Electrical Usage data:

  Size of Home x (sq. ft.)   Monthly Usage y (kilowatt-hours)
  1290                        1182
  1350                        1172
  1470                        1264
  1600                        1493
  1710                        1571
  1840                        1711
  1980                        1804
  2230                        1840
  2400                        195? (final digit illegible in source)
  2930                        1954

[Figure: scatterplot of USAGE versus SIZE, with usage leveling off for the largest homes.]
[Figure: Excel regression output for the quadratic model (USAGE on SIZE and SIZESQ) and the fitted-line plot, with S = 46.8, R-Sq = 98.2%, and R-Sq(adj) = 97.3%.] The fitted equation is

  ŷ = -1216.1 + 2.3989x - 0.00045x^2
  (USAGE = -1216 + 2.399 SIZE - 0.000450 SIZE^2)

Model Building: Quadratic and Other Higher-Order Models (continued)

A complete second-order model with two quantitative independent variables:
  E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1^2 + β5x2^2
where β0 is the y-intercept (the value of E(y) when x1 = x2 = 0); changes in β1 and β2 cause the surface to shift along the x1 and x2 axes; β3 controls the rotation of the surface; and β4 and β5 control the type of surface (rates of curvature). [Figure: three example response surfaces.]

Model Building: Qualitative (Dummy) Variable Models

Dummy variables are coded qualitative variables. Codes are of the form 1/0, with 1 indicating the presence of a condition and 0 its absence. Create dummy variables so that there is one fewer dummy variable than there are categories of the qualitative variable of interest.

Example: a gender dummy variable coded as x = 1 if male, x = 0 if female. If the model is E(y) = β0 + β1x, then β1 captures the effect of being male on the dependent variable. [Figure: mean salary for females (β0) versus males (β0 + β1).]

Model Building: Models with Both Quantitative and Qualitative Variables

Start with a first-order model with one quantitative variable:
  E(y) = β0 + β1x1
Adding a qualitative variable (advertising medium: television, newspaper, radio) with no interaction:
  E(y) = β0 + β1x1 + β2x2 + β3x3
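The quadratic electrical-usage fit shown earlier can be checked numerically by evaluating the fitted equation; the example size values below are arbitrary, chosen to show the curvature:

```python
# Sketch: evaluate the fitted quadratic usage model from the notes,
# y_hat = -1216.1 + 2.3989*x - 0.00045*x**2 (x = home size, sq. ft.).
# The negative x^2 coefficient makes predicted usage level off.

def usage(x):
    return -1216.1 + 2.3989 * x - 0.00045 * x ** 2

print(round(usage(1500)))   # -> 1370
print(round(usage(2500)))   # -> 1969
print(round(usage(2900)))   # -> 1956 (already past the curve's peak)
```

The peak of the fitted parabola falls near x = 2.3989 / (2 * 0.00045), about 2665 sq. ft., which is why the prediction at 2900 sq. ft. is lower than at 2500.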
[Figure: three parallel response lines (television, newspaper, radio) plotted against advertising expenditure.]

Model Building: Models with Both Quantitative and Qualitative Variables (continued)

Adding an interaction term:
  E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3
with β1x1 the main effect of x1; β2x2 and β3x3 the main effects of x2 and x3; and β4x1x2, β5x1x3 the interaction terms. [Figure: nonparallel response lines for television, newspaper, and radio versus monthly advertising expenditure (thousands of dollars).]

Model Building: Comparing Nested Models

Models are nested if one model contains all the terms of the other model and at least one additional term. The complete (full) model is the more complex model; the reduced model is the simpler one. For example:
  Complete model: E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1^2 + β5x2^2
  Reduced model:  E(y) = β0 + β1x1 + β2x2 + β3x1x2

F-Test for Comparing Nested Models

  Reduced model:  E(y) = β0 + β1x1 + ... + βgxg
  Complete model: E(y) = β0 + β1x1 + ... + βgxg + βg+1xg+1 + ... + βkxk

H0: βg+1 = βg+2 = ... = βk = 0
Ha: At least one β under test is nonzero.

Test statistic:
  F = [(SSE_R - SSE_C) / (k - g)] / [SSE_C / (n - (k + 1))]
    = [(SSE_R - SSE_C) / (number of β parameters being tested)] / MSE_C
Rejection region: F > F_α, with (k - g) numerator degrees of freedom and n - (k + 1) denominator degrees of freedom.

Model Building: Stepwise Regression

Used when there is a large set of independent variables. Software packages add in variables in order of explanatory value, with decisions based on the largest t-values at each step. The procedure is best used as a screening procedure only.

Residual Analysis: Checking the Regression Assumptions

Regression residual: the difference between an observed y value and its corresponding predicted value, y - ŷ.

Properties of regression residuals:
- The mean of the residuals equals zero.
- The standard deviation of the residuals is equal to the standard deviation of the fitted regression model.

Analyzing residuals (home-size example):
- A plot of residuals versus SIZE for the first-order model reveals a nonrandom pattern (curved shape).
- A second plot, based on a second-order term being added to the model, shows a random pattern: a better model.
[Figures: residuals-versus-SIZE plots for the two models.]

Identifying outliers: residual plots can reveal outliers. Outliers need to be checked to try to determine whether an error is involved. If an error is involved, or the observation is not representative, the analysis can be rerun after deleting the data point to assess its effect. [Figure: residual plot with an outlier flagged.]

Checking for normal errors: [Figures: histograms and normal probability plots of the residuals for the PRICE model, with and without the outlier; the normality p-value is below .005 with the outlier included and about .92 with it removed.]

Checking for equal variances: a pattern in the residuals (for example, a funnel shape in a plot of residuals against predicted values) indicates a violation of the equal-variance assumption and can point to use of a transformation of the dependent variable to stabilize the variance. [Figure: three residual-plot patterns.]

Steps in Residual Analysis:
1. Check for a misspecified model by plotting residuals against the quantitative independent variables.
2. Examine residual plots for outliers.
3. Check for nonnormal errors using a frequency distribution of the residuals.
4. Check for unequal error variances using plots of residuals against predicted values.
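The nested-model F-test described in this section can be sketched numerically; the SSE values, n, k, and g below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical sketch of the partial (nested-model) F-test.
# Reduced model: g = 2 predictors; complete model: k = 5 predictors.

sse_reduced = 160.0    # SSE of the reduced model (hypothetical)
sse_complete = 100.0   # SSE of the complete model (hypothetical)
n, k, g = 30, 5, 2

num = (sse_reduced - sse_complete) / (k - g)   # drop in SSE per tested beta
den = sse_complete / (n - (k + 1))             # MSE of the complete model
f = num / den
print(round(f, 2))   # 4.8
```

A large F (compared with F_α on k - g and n - (k + 1) df) says the extra terms reduce SSE by more than chance alone would.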
Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

- Estimability: the number of levels of observed x values must be one more than the order of the polynomial in x that you want to fit.
- Multicollinearity: when two or more independent variables are correlated. It leads to confusing, misleading results and incorrect parameter-estimate signs. It can be identified by checking correlations among the x's, by t-tests that are nonsignificant for most or all of the x's, or by signs opposite from those expected in the estimated β parameters. It can be addressed by dropping one or more of the correlated variables from the model, or by restricting inferences to the range of the sample data and not making inferences about individual β parameters based on t-tests.
- Extrapolation: use of the model to predict outside the range of the sample data is dangerous.
- Correlated errors: most common when working with time-series data (values of y and the x's observed over a period of time). The solution is to develop a time-series model.

Chapter 15: Time Series (Descriptive Analyses, Models, and Forecasting)

Descriptive Analysis: Index Numbers

Index number: a number that measures the change in a variable over time relative to the value of the variable during a specific base period.

Simple index number: an index based on the relative changes over time in the price or quantity of a single commodity:
  I_t = (time series value at time t / time series value at base period) x 100 = (Y_t / Y_0) x 100

Laspeyres and Paasche Indexes compared: the Laspeyres Index weights prices by the purchase quantities of the baseline period, while the Paasche Index weights prices by the purchase quantities of the period the index value represents. The Laspeyres Index is most appropriate when baseline purchase quantities are reasonable approximations of purchases in subsequent periods; the Paasche Index is most appropriate when you want to compare current to baseline prices at current purchase levels.
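A minimal sketch of the simple, Laspeyres, and Paasche indexes; the two-commodity price and quantity series below are hypothetical, chosen only to make the weighting difference visible:

```python
# Hypothetical two-commodity example of the price indexes in the notes.
# p0/pt: prices at the base period and at time t; q0/qt: purchase quantities.

p0 = [2.00, 5.00]
pt = [2.50, 6.00]
q0 = [100, 40]     # base-period quantities  (Laspeyres weights)
qt = [120, 30]     # time-t quantities       (Paasche weights)

simple = pt[0] / p0[0] * 100     # single-commodity index for the first series

laspeyres = (sum(q * p for q, p in zip(q0, pt))
             / sum(q * p for q, p in zip(q0, p0)) * 100)
paasche = (sum(q * p for q, p in zip(qt, pt))
           / sum(q * p for q, p in zip(qt, p0)) * 100)

print(round(simple, 1), round(laspeyres, 1), round(paasche, 1))
# 125.0 122.5 123.1
```

The two weighted indexes differ because the quantity mix shifted between the base period and time t.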
Calculating a Laspeyres Index:
1. Collect price information for each of the k price series to be used; denote these prices P_1t, P_2t, ..., P_kt.
2. Select a base period t0.
3. Collect purchase-quantity information for the base period; denote the quantities Q_1t0, Q_2t0, ..., Q_kt0.
4. Calculate the weighted totals for each time period, Σ_{i=1..k} Q_it0 P_it.
5. Calculate the index for time t:
   I_t = (Σ_{i=1..k} Q_it0 P_it / Σ_{i=1..k} Q_it0 P_it0) x 100

Calculating a Paasche Index:
1. Collect price information for each of the k price series to be used; denote these prices P_1t, P_2t, ..., P_kt.
2. Select a base period t0.
3. Collect purchase-quantity information for every period; denote the quantities Q_1t, Q_2t, ..., Q_kt.
4. Calculate the index for time t:
   I_t = (Σ_{i=1..k} Q_it P_it / Σ_{i=1..k} Q_it P_it0) x 100

Descriptive Analysis: Exponential Smoothing

Exponential smoothing is a type of weighted average that applies a weight w to past and current values of the time series. The exponential smoothing constant w lies between 0 and 1, and the smoothed series E_t is calculated as:
  E_1 = Y_1
  E_2 = wY_2 + (1 - w)E_1
  E_3 = wY_3 + (1 - w)E_2
  ...
  E_t = wY_t + (1 - w)E_{t-1}

Selection of the smoothing constant w is made by the researcher. Small values of w give less weight to the current value and yield a smoother series; large values of w give more weight to the current value and yield a more variable series.

Time Series Components

The 4 components of time-series models:
  T_t: secular (long-term) trend
  C_t: cyclical effect
  S_t: seasonal effect
  R_t: residual effect
These four components form the widely used additive model Y_t = T_t + C_t + S_t + R_t.

Forecasting with Exponential Smoothing

Calculation of exponentially smoothed forecasts:
1. Calculate the exponentially smoothed values E_1, E_2, ..., E_t for the observed time series Y_1, Y_2, ..., Y_t.
2. Use the last smoothed value to forecast the next time-series value: F_{t+1} = E_t.
3. Assuming that Y_t is relatively free of trend and seasonal components, use the same forecast for all future values of Y_t: F_{t+2} = F_{t+1}, F_{t+3} = F_{t+1}, and so on.
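The exponential-smoothing recursion above can be sketched directly; the series values and w = 0.5 below are hypothetical:

```python
# Sketch of exponential smoothing as defined in the notes:
# E_1 = Y_1, then E_t = w*Y_t + (1 - w)*E_{t-1}.  Series values hypothetical.

def smooth(y, w):
    e = [y[0]]                       # E_1 = Y_1
    for yt in y[1:]:
        e.append(w * yt + (1 - w) * e[-1])
    return e

y = [100, 110, 105, 120, 115]
e = smooth(y, w=0.5)
print([round(v, 2) for v in e])      # [100, 105.0, 105.0, 112.5, 113.75]

forecast = e[-1]                     # F_{t+1} = E_t (and all later forecasts)
```

Note how the smoothed series damps the up-and-down movement of the raw series; a larger w would track the raw values more closely.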
Forecasting Trends: The Holt-Winters Model

The Holt-Winters model adds a trend component to the forecast.

Calculating the components of the Holt-Winters model:
1. Select an exponential smoothing constant w.
2. Select a trend smoothing constant v.
3. Calculate the two components, E_t and T_t, from the time series Y_t, beginning at time t = 2:
   E_2 = Y_2,  T_2 = Y_2 - Y_1
   E_3 = wY_3 + (1 - w)(E_2 + T_2),  T_3 = v(E_3 - E_2) + (1 - v)T_2
   ...
   E_t = wY_t + (1 - w)(E_{t-1} + T_{t-1}),  T_t = v(E_t - E_{t-1}) + (1 - v)T_{t-1}

Holt-Winters forecasting:
1. Calculate the exponentially smoothed and trend components, E_t and T_t, for each observed value of Y_t.
2. Calculate the one-step-ahead forecast: F_{t+1} = E_t + T_t.
3. Calculate the k-step-ahead forecast: F_{t+k} = E_t + kT_t.

Measuring Forecast Accuracy: MAD and RMSE

Measures of forecast accuracy for m forecasts:
  Mean absolute deviation: MAD = (1/m) Σ |Y_t - F_t|
  Mean absolute percentage error: MAPE = [(1/m) Σ |(Y_t - F_t) / Y_t|] x 100
  Root mean squared error: RMSE = sqrt((1/m) Σ (Y_t - F_t)^2)

Forecasting Trends: Simple Linear Regression

Simple linear regression is the simplest inferential forecasting model. After fitting the regression line to existing data, the least squares model can be used to forecast future values of the dependent variable. Two problems are associated with using a least squares model to forecast a time series:
1. Forecasting falls outside the experimental region, which increases the width of the prediction intervals.
2. Cyclical effects are not built into the model, introducing the problem of correlated errors.

Seasonal Regression Models

Use of a multiple regression model with dummy variables to describe seasonal differences is common. In the following example, dummy variables are set up for Quarters 1, 2, and 3. A model that reflects both the seasonal component and the expected growth in usage is:
  E(Y_t) = β0 + β1t + β2Q1 + β3Q2 + β4Q3

The data to be used are in the following table, Quarterly Power Loads (megawatts) for a Southern Utility Company, 1992-2003:

  Year    Q1      Q2      Q3      Q4
  1992    68.8    65.0    88.4    69.0
  1993    83.6    69.7    90.2    72.5
  1994   106.8    89.2   110.7    91.7
  1995   108.6    98.9   120.1   102.1
  1996   113.1    94.2   120.5   107.4
  1997   116.2   104.4   131.7   117.9
  1998   130.6   116.8   144.2   123.3
  1999   142.3   124.0   146.1   135.5
  2000   147.1   119.3   138.2   127.6
  2001   143.4   134.0   159.6   135.1
  2002   149.5   123.3   154.4   139.4
  2003   151.6   133.7   154.5   135.1
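The Holt-Winters recursions above can be sketched in a few lines; the smoothing constants and the short series below are hypothetical, chosen only to exercise the formulas:

```python
# Sketch of Holt-Winters smoothing/forecasting as defined in the notes
# (no seasonal component).  w, v, and the series values are hypothetical.

def holt_winters(y, w, v):
    e, t = y[1], y[1] - y[0]                 # E_2 = Y_2, T_2 = Y_2 - Y_1
    for yt in y[2:]:
        e_new = w * yt + (1 - w) * (e + t)   # smoothed level
        t = v * (e_new - e) + (1 - v) * t    # smoothed trend
        e = e_new
    return e, t

y = [100, 104, 109, 113, 120]
e, t = holt_winters(y, w=0.7, v=0.5)
f1 = e + t            # one-step-ahead forecast F_{t+1} = E_t + T_t
f3 = e + 3 * t        # k-step-ahead forecast  F_{t+k} = E_t + k*T_t
print(round(f1, 2), round(f3, 2))   # 124.47 134.99
```

Unlike plain exponential smoothing, the k-step forecast keeps climbing because the trend component T_t is extrapolated k periods ahead.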
Result of the regression analysis (LOAD versus T, Q1, Q2, Q3):

  The regression equation is
  LOAD = 70.5 + 1.64 T + 13.7 Q1 - 3.74 Q2 + 18.4 Q3

  Predictor   Coef      SE Coef   T       P
  Constant    70.509    3.116     22.63   0.000
  T            1.63621  0.08214   19.92   0.000
  Q1          13.659    3.217      4.25   0.000
  Q2          -3.74     3.212     -1.16   0.251
  Q3          18.40     3.209      5.73   0.000

  S = 7.85795   R-Sq = 91.4%   R-Sq(adj) = 90.6%

  Analysis of Variance
  Source           DF   SS        MS       F        P
  Regression        4   28375.0   7093.7   114.88   0.000
  Residual Error   43    2655.1     61.7
  Total            47   31030.1

Forecast results and actual values for 2004:

  t    Year   Qtr   Actual LOAD   Q1   Q2   Q3   Predicted   95% PI lower   95% PI upper
  49   2004    1       151.3       1    0    0    164.341      147.294        181.389
  50   2004    2       132.9       0    1    0    148.583      131.535        165.630
  51   2004    3       150.5       0    0    1    172.4        155.38         189.42
  52   2004    4       151.0       0    0    0    155.591      138.544        172.839

Use of multiplicative models often provides a better forecasting model when the time series is changing at an increasing rate over time. The multiplicative model for the power-load problem would be:
  ln(Y_t) = β0 + β1t + β2Q1 + β3Q2 + β4Q3 + ε
Taking the antilogarithm of both sides shows the multiplicative nature:
  Y_t = exp(β0) * exp(β1t) * exp(β2Q1 + β3Q2 + β4Q3) * exp(ε)
        [constant] [secular trend] [seasonal component] [residual]

Autocorrelation and the Durbin-Watson Test

A residual pattern of runs above and below zero, as illustrated in the accompanying plot, suggests that autocorrelation may be an issue. Autocorrelation is the correlation between time-series residuals at different points in time. Correlation between neighboring residuals is called first-order autocorrelation. [Figure: residual plot showing clustered runs.]

The Durbin-Watson d statistic is calculated to test for the presence of first-order autocorrelation:
  d = Σ_{t=2..n} (R_t - R_{t-1})^2 / Σ_{t=1..n} R_t^2
Range of d: 0 ≤ d ≤ 4.
- If the residuals are uncorrelated, then d ≈ 2.
- If the residuals are positively autocorrelated, then d < 2; if the autocorrelation is very strong, d ≈ 0.
- If the residuals are negatively autocorrelated, then d > 2; if the autocorrelation is very strong, d ≈ 4.

One-Tailed Test:
  H0: No first-order autocorrelation
  Ha: Positive first-order autocorrelation (or Ha: Negative first-order autocorrelation)
  Test statistic: d, as above.
  Rejection region: d < d_{L,α} (or 4 - d < d_{L,α} if Ha is negative first-order autocorrelation), where d_{L,α} is the lower tabled value corresponding to k independent variables and n observations. The corresponding upper value d_{U,α} defines a possibly significant region between d_{L,α} and d_{U,α}.

Two-Tailed Test:
  H0: No first-order autocorrelation
  Ha: Positive or negative first-order autocorrelation
  Rejection region: d < d_{L,α/2} or 4 - d < d_{L,α/2}, where d_{L,α/2} is the lower tabled value corresponding to k independent variables and n observations. The corresponding upper value d_{U,α/2} defines a possibly significant region between d_{L,α/2} and d_{U,α/2}.

[Figure: number line for α = .05 showing the rejection region (evidence of positive autocorrelation) below d_L ≈ 1.40, a possibly significant region between d_L ≈ 1.40 and d_U ≈ 1.52, and the nonrejection region (insufficient evidence of positive autocorrelation) above d_U.]

MATH 202, Spring 2006, Exam 2a

Instructions:
1. Do not start until instructed to do so.
2. If you brought a cell phone by mistake, turn it off and place it under your seat. You may NOT use it as a calculator.
3. You may use a calculator (NOT a cell-phone calculator) and a 3x5 card (front and back) with notes, but nothing else.
4. Code your UDelNet ID in the Last Name space on your scansheet and fill in the bubbles.
5. Write your name in the white space below the name box on your scansheet.
6. DO NOT put any part of your Social Security Number on your scansheet.
7. Choose the best answer to each question.
8. Use α = .05.

Questions 1-8:
MATH202 Spring 2006 Exam 2a

Instructions
1. Do not start until instructed to do so.
2. If you brought a cell phone by mistake, turn it off and place it under your seat. You may NOT use it as a calculator.
3. You may use a calculator (NOT a cell phone calculator) and a 3x5 card (front and back) with notes, but nothing else.
4. Code your UDelNet ID in the Last Name space on your scansheet and fill in the bubbles.
5. Write your name in the white space below the name box on your scansheet.
6. DO NOT put any part of your Social Security Number on your scansheet.
7. Choose the best answer to each question.
8. Use alpha = .05.

Questions 1-8. A real estate agent wanted to develop a model to predict the selling price of a home. The agent believed that a house's size and style (Two-story, Side-split, Back-split, Ranch) are two important variables related to selling price. Let

y = selling price (in thousands of dollars)
x1 = house size (in square feet)
x2 = 1 if Two-story, 0 otherwise
x3 = 1 if Side-split, 0 otherwise
x4 = 1 if Back-split, 0 otherwise

[Figure: scatterplot of y vs. x1 with plotting symbols coded by style (Back-split, Ranch, Side-split, Two-story); x1 runs from about 1000 to 3000 and y from about 50 to 250.]

These data were collected for a sample of houses, and the following regression models were fit to the data. Regression output for each model follows.

Model A: y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + e
Model B: y = B0 + B2x2 + B3x3 + B4x4 + e
Model C: y = B0 + B1x1 + B2x1^2 + e
Model D: y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + B5x1x2 + B6x1x3 + B7x1x4 + e

Model A
Predictor  Coef      SE Coef   T      P
Constant   27.48     11.58      2.37  0.020
x1         0.063341  0.005673  11.17  0.000
x2         20.341    6.878      2.96  0.004
x3         12.591    7.159      1.76  0.082
x4         19.325    7.783      2.48  0.015
S = 23.9637  R-Sq = 60.1%  R-Sq(adj) = 58.4%
Analysis of Variance
Source          DF  SS      MS     F      P
Regression       4   82056  20514  35.72  0.000
Residual Error  95   54555    574
Total           99  136611

Model B
Predictor  Coef     SE Coef  T      P
Constant   141.232  8.316    16.98  0.000
x2         25.32    10.38     2.44  0.017
x3          4.60    10.77     0.43  0.670
x4         15.46    11.76     1.31  0.192
S = 36.2501  R-Sq = 7.7%  R-Sq(adj) = 4.8%
Analysis of Variance
Source          DF  SS      MS    F     P
Regression       3   10460  3487  2.65  0.053
Residual Error  96  126151  1314
Total           99  136611

Model C
Predictor  Coef        SE Coef     T     P
Constant   90.10       40.25       2.24  0.027
x1         0.00932     0.04302     0.22  0.829
x1Sq       0.00001421  0.00001104  1.29  0.201
S = 24.7072  R-Sq = 56.7%  R-Sq(adj) =
Analysis of Variance
Source          DF  SS      MS     F      P
Regression       2   77398  38699  63.39  0.000
Residual Error  97   59213    610
Total           99  136611

Model D
Predictor  Coef     SE Coef  T      P
Constant   36.73    21.76     1.69  0.095
x1         0.05819  0.01173   4.96  0.000
x2         -8.94    28.20    -0.32  0.752
x3         31.23    28.33     1.10  0.273
x4         18.50    35.54     0.52  0.604
x1x2       0.01584  0.01499   1.06  0.293
x1x3       0.01155  0.01576   0.73  0.466
x1x4       0.02162  0.01975   1.09  0.277
S = 23.7031  R-Sq = 62.2%  R-Sq(adj) = 59.3%
Analysis of Variance
Source          DF  SS      MS     F      P
Regression       7   84922  12132  21.59  0.000
Residual Error  92   51689    562
Total           99  136611

1. Which model is best described by the phrase "parallel lines"?
a. Model A  b. Model B  c. Model C  d. Model D  e. none of the above

2. Use Model D to predict the selling price for a 2000 square foot two-story house.
a. $208,720  b. $107,440  c. $17,585  d. $175,850  e. none of the above

3. Use Model A to find the estimated relationship between selling price and size for ranches.
a. y-hat = 79.737 + 0.063341x1
b. y-hat = 27.48 + 0.063341x1
c. y-hat = 27.48 + 0.063341x1 + 20.341x2 + 12.591x3 + 19.325x4
d. y-hat = 47.821 + 0.063341x1
e. y-hat = 11.58 + 0.005673x1

4. Interpret the value of b2 from Model B.
a. For every additional square foot for two-story houses, we estimate an average increase in selling price of $25.32, holding the size of side-splits and back-splits constant.
b. We estimate that, on average, selling prices for two-story houses are $25,320 higher than they are for all other houses combined.
c. We estimate that, on average, selling prices for two-story houses are $4,600 higher than they are for ranches.
d. We estimate that the average selling price for all two-story houses is $25,320.
e. We estimate that, on average, selling prices for two-story houses are $25,320 higher than they are for ranches.

5. Compute adjusted R-squared for Model C.
a. 56.7%  b. 56.3%  c. 55.3%  d. 55.8%  e. none of the above

6. The value of F in Model A (F = 35.72) corresponds to which of the following hypothesis tests?
a. H0: B0 = B1 = B2 = B3 = B4 = 0 vs. Ha: at least one B is not 0
b. H0: MSR = MSE vs. Ha: MSR is not equal to MSE
c. H0: B1 = B2 = B3 = B4 vs. Ha: at least one B differs from the rest
d. H0: B1 = B2 = B3 = B4 = 0 vs. Ha: at least one B is not 0
e. H0: B1, B2, B3, B4 all differ from 0 vs. Ha: B1 = B2 = B3 = B4 = 0

7. Using Model C, is there enough evidence that the relationship between selling price and house size is curved?
a. Yes, since the p-value = .000
b. Yes, since the p-value = .027
c. No, since the p-value = .829
d. No, since the p-value = .201
e. No, since R-squared is only 56.7%

8. Calculate the value of the partial F statistic for comparing Model D to Model A.
a. 14.13  b. 5.10  c. 7.3  d. 1.70  e. 17.0
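The nested-model comparison in Question 8 can be checked numerically. A sketch, assuming the usual partial F formula F = [(SSE_reduced - SSE_complete) / (number of extra terms)] / MSE_complete, with the SSE values read off the Model A and Model D ANOVA tables:

```python
def partial_f(sse_reduced, sse_complete, extra_terms, df_error_complete):
    """Partial F for comparing a complete model with a reduced model:
    F = [(SSE_R - SSE_C) / (number of extra terms)] / MSE_C."""
    mse_complete = sse_complete / df_error_complete
    return ((sse_reduced - sse_complete) / extra_terms) / mse_complete

# Model A (reduced): SSE = 54555; Model D (complete): SSE = 51689 with 92 error df.
# Model D adds 7 - 4 = 3 interaction terms.
f = partial_f(54555, 51689, 7 - 4, 92)
print(round(f, 2))  # 1.7
```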
Questions 9-11. In recent years, the growth of data communications networks has been amazing. The cost of adding a new communications node at a location not currently included in the network was of concern for a major Fort Worth manufacturing company. To try to predict the price of new communications nodes, data were obtained on a sample of existing nodes. The installation cost (y) and the number of ports (x) available for access in each existing node were readily available information. A scatterplot of the data, along with the Minitab output for the regression of y on the explanatory variable x, are shown.

[Figures: scatterplot of y vs. x (y from about 20000 to 60000, x from about 10 to 70) and residual plots for y: normal probability plot, residuals vs. fitted values, histogram, residuals vs. observation order.]

Regression Analysis: y versus x
The regression equation is y = 16594 + 650 x.
Predictor  Coef    SE Coef  T     P
Constant   16594   2687     6.18  0.000
x          650.17  66.91    9.72  0.000
S = 4306.91  R-Sq = 88.7%  R-Sq(adj) = 87.8%
Analysis of Variance
Source          DF  SS          MS          F      P
Regression       1  1751268376  1751268376  94.41  0.000
Residual Error  12   222594146    18549512
Total           13  1973862522
Unusual Observations
Obs  x     y      Fit    SE Fit  Residual  St Resid
  1  68.0  52388  60805  2414    -8417     -2.36 R
 10  24.0  23444  32198  1414    -8754     -2.15 R
R denotes an observation with a large standardized residual.

9. Write the first-order model that relates the cost (y) to the explanatory variable (x).
a. yi = B0 + B1xi + ei
b. yi = B0 + B1xi
c. y-hat_i = B0 + B1xi + ei
d. yi = B0 + B1xi + B2xi^2 + ei
e. yi = B0xi + ei

10. Is there sufficient evidence to conclude that a linear relationship exists between cost (y) and number of access ports (x)? What are the null hypothesis and test statistic?
a. H0: B0 = 0, T = 9.72
b. H0: B1 = 0, T = 6.18
c. H0: B1 = 0, T = 9.72
d. Ha: B0 = 0, T = 6.18
e. Ha: B0 = 0, T = 9.72

11. Find a 95% confidence interval estimate of B1.
a. 650 plus or minus 2.179(66.91)
b. 650 plus or minus 2.179(4306.91)
c. 9.72 plus or minus 2.179(66.91)
d. 9.72 plus or minus 2.179(4306.91)
e. 66.91 plus or minus 2.179(650)
Questions 12-13. What factor is most important in building a winning baseball team? Some might argue for a high batting average. Or it might be a team that hits for power, as measured by the number of home runs. On the other hand, many believe that it is quality pitching, as measured by the earned run average of the team's pitchers. Let

y = number of games won
x1 = number of home runs hit
x2 = team batting average
x3 = earned run average

These data were collected for the 30 major league baseball teams during the 2002 season, and the model E(y) = B0 + B1x1 + B2x2 + B3x3 was fit to the data. The results are shown below.

Regression Analysis: y versus x1, x2, x3
The regression equation is y = -21.9 + 0.0976 x1 + 606 x2 - 16.9 x3.
Predictor  Coef     SE Coef  T      P      VIF
Constant   -21.88   28.93    -0.76  0.456
x1         0.09759  0.03572   2.73  0.011  1.1
x2         606.3    100.8     6.02  0.000  1.2
x3         -16.897  1.758    -9.61  0.000  1.1
S = 5.00226  R-Sq = 89.7%  R-Sq(adj) = 88.5%
Analysis of Variance
Source          DF  SS      MS  F  P
Regression          5661.6         0.000
Residual Error
Total               6312.2

Correlations
     x1     x2
x2   0.296
x3   0.183  0.343

12. Calculate the value of the global F statistic.
a. 89.7  b. 653.63  c. 75.42  d. 87.0  e. none of the above

13. There is a serious problem with multicollinearity; one of the predictor terms should be removed from the model.
a. True  b. False
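Question 12's global F statistic can be recovered from the sums of squares alone. A sketch using F = MSR/MSE = (SSR/k) / (SSE/(n - k - 1)), with SSE obtained as SST minus SSR from the baseball ANOVA table:

```python
def global_f(ss_regression, ss_error, k, n):
    """Overall (global) F test of H0: B1 = ... = Bk = 0:
    F = (SSR / k) / (SSE / (n - k - 1))."""
    return (ss_regression / k) / (ss_error / (n - k - 1))

# Baseball model: SSR = 5661.6, SST = 6312.2, k = 3 predictors, n = 30 teams.
f = global_f(5661.6, 6312.2 - 5661.6, k=3, n=30)
print(round(f, 2))  # 75.42
```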
Questions 14-16. A random sample of 42 firms was chosen from the S&P 500 firms listed in the Spring 2003 Special Issue of Business Week (The Business Week Fifty Best Performers). The dividend yield (y) and the 2002 earnings per share (x) were recorded for these 42 firms. A scatterplot of these data, along with the Minitab output, are given below.

[Figures: scatterplot of y vs. x and residual plots for y: normal probability plot, residuals vs. fitted values, histogram, residuals vs. observation order.]

Regression Analysis: y versus x
The regression equation is y = 2.03 + 0.374 x.
Predictor  Coef    SE Coef  T     P
Constant   2.0336  0.5405   3.76  0.001
x          0.3740  0.2395   1.56  0.126
S = 1.84975  R-Sq = 5.7%  R-Sq(adj) = 3.4%
Analysis of Variance
Source          DF  SS       MS     F     P
Regression       1    8.345  8.345  2.44  0.126
Residual Error  40  136.864  3.422
Total           41  145.208
Unusual Observations
Obs  x     y      Fit    SE Fit  Residual  St Resid
 22  4.65  1.370  3.773  0.714   -2.403    -1.41 X
 23  1.93  6.930  2.755  0.285    4.175     2.28 R
 33  5.05  1.670  3.922  0.803   -2.252    -1.35 X
 39  4.82  4.790  3.836  0.751    0.954     0.56 X
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

14. What is the estimated regression equation relating dividend yield to 2002 earnings per share?
a. y = B0 + B1x + e
b. y-hat = 0.5405 + 2.0336x
c. y-hat = 2.0336 + 0.3740x
d. y-hat = 0.3740 + 2.0336x
e. y-hat = 1.56 + 3.76x

15. Interpret the coefficient b0 = 2.0336.
a. Dividend yield is expected to be 2.03 on average when earnings per share are zero. However, this is not realistic since zero earnings per share is not close to the range of the sampled data.
b. Dividend yield is expected to be 2.03 on average when earnings per share are zero. This is a realistic expectation since zero earnings per share is close to the range of the sampled data.
c. The interpretation of b0 = 2.0336 should not be attempted since one or more of the regression assumptions is violated.
d. Dividend yield is expected to increase on average when the 2002 earnings per share increase by one unit.

16. Is the variable 2002 earnings per share a good predictor of dividend yield?
a. Yes, the variable 2002 earnings per share is a very good predictor of dividend yield, since the sample test statistic of 1.56 gives a p-value of 0.126.
b. Yes, the variable 2002 earnings per share is a very good predictor of dividend yield, since the regression explains about 94.5% of the variation in dividend yield.
c. No, since the regression only explains 5.7% of the variation in dividend yield.
d. No, since there are only four unusual observations in the sample data.
e. Yes, since there appears to be no unusual patterns in the plots of the residuals.
17. A company that provides transportation services uses a telemarketing division to help sell its services. The division manager is interested in the time spent on the phone by the telemarketers in the division. Data on the number of months of employment (MONTHS) and the average number of calls placed per day for 20 working days (CALLS) are recorded for a sample of 20 employees. The model CALLS = B0 + B1 MONTHS + e is fit to the data, and some residual plots are shown here.

[Figures: residual plots for CALLS: normal probability plot, residuals vs. fitted values, histogram, residuals vs. observation order.]

Which of the following can we conclude from the residual plots?
a. The relationship between length of employment and number of calls seems to be fairly linear.
b. The straight-line model may not be appropriate.
c. There is strong evidence of a violation of the normality assumption.
d. There is strong evidence of a violation of the constant variance assumption.
e. all of the above

18. Fanfare International Inc. designs, distributes, and markets ceiling fans and lighting fixtures. The company's product line includes 120 basic models of ceiling fans and 138 compatible fan light kits and table lamps. In the summer of 1994, Fanfare decided it needed to develop forecasts of future sales. To do so, it collected monthly data on

y = total monthly sales (thousands of dollars)
x1 = advertising expense (thousands of dollars)
x2 = housing starts (thousands of units)

The model fit to the data is y = B0 + B1x1 + B2x2 + B3x1x2 + e. Some regression analyses are shown below.

Regression Statistics
Multiple R         0.911542
R Square           0.830909
Adjusted R Square  0.818831
Standard Error     17.04107
Observations       46
ANOVA
Source      df  SS        MS        F         Significance F
Regression   3  59934.30  19978.10  68.79557  0.0000000
Residual    42  12196.72  290.3981
Total       45  72131.02

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  49.2131       25.06023         1.96379  0.056196  -1.3605    99.7867
x1         4.786484      1.371746         3.489337 0.001151   2.018188  7.554781
x2         1.946096      0.2672634        7.281566 0.000000   1.406737  2.485456
x1x2       -0.03282      0.013983        -2.3472   0.023704  -0.06104   -0.0046

Is there enough evidence that the relationship between advertising expense and sales depends on housing starts? Give the null hypothesis for this test.
a. H0: B1 = B2 = B3 = 0
b. H0: B1 = B2 = 0
c. H0: B3 = 0
d. H0: R-squared = 0
e. none of the above

Questions 19-21. The relationship between exchange rates and agricultural exports is of interest to agricultural economists. One such export of interest is wheat. The following data were analyzed:

y = US wheat export shipments
x = the real index of weighted-average exchange rates for the US dollar

A scatterplot of the data and Minitab output are given below.

[Figures: scatterplot of y vs. x (x from about 80 to 170, y from about 1000 to 7000) and residual plots: normal probability plot, residuals vs. fitted values, histogram, residuals vs. observation order.]

Regression Analysis: y versus x
The regression equation is y = 1969 + 7.86 x.
Predictor  Coef    SE Coef  T     P
Constant   1969.1  413.6    4.76  0.000
x          7.862   3.760    2.09  0.038
S = 819.383  R-Sq =        R-Sq(adj) = 2.5%
Analysis of Variance
Source           DF   SS        MS       F     P
Regression        1    2935648  2935648  4.37  0.038
Residual Error  133   89294612   671388
Total           134   92230260
Unusual Observations
Obs  x    y       Fit     SE Fit  Residual  St Resid
 93  107  5284.0  2812.3   70.7    2471.7    3.03 R
129  151  6605.0  3156.8  175.3    3448.2    4.31 RX
130  154  3736.0  3178.2  184.7     557.8    0.70 X
131  151  2648.0  3158.9  176.2    -510.9   -0.64 X
132  156  3591.0  3195.0  192.1     396.0    0.50 X
133  160  1897.0  3229.0  207.4   -1332.0   -1.68 X
134  166  2327.0  3274.2  227.8    -947.2   -1.20 X
135  166  1576.0  3273.7  227.5   -1697.7   -2.16 RX
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

19. Compute R-squared.
a. 2.5%  b. 96.8%  c. 2.29%  d. 3.2%

20. Test the hypothesis that the variable "real index of weighted-average exchange rates for the US dollar" (x) is a significant predictor of US wheat export shipments (y). The calculated test statistic and critical value for the test are:
a. t calculated = 2.09, t critical = 1.98, respectively
b. t calculated = 4.76, t critical = 1.66, respectively
c. t calculated = 2.09, t critical = 1.66, respectively
d. t calculated = 4.37, t critical = 1.98, respectively
e. t calculated = 4.76, t critical = 1.98, respectively

21. What is the interpretation of S = 819.383?
a. The predicted US wheat export shipments are accurate to within plus or minus 2(819.383) units at the .05 level of significance.
b. There is an 81.9% chance of a correct estimation of US wheat export shipments.
c. The predicted US wheat export shipments are accurate to within plus or minus 2(819.383) units at the .01 level of significance.
d. The predicted real index of weighted-average exchange rates for the US dollar is accurate to within plus or minus 2(819.383) units at the .05 level of significance.
e. The regression explains 819.383% of the variation in US wheat export shipments, adjusted for degrees of freedom.

22. Suppose a set of data provides the following results: x-bar = 102.125, y-bar = 43.98, SSxy = 179.6475, SSxx = 1404.355, n = 8. Find the least squares point estimate of the slope B1.
a. 0.1279  b. 1.548  c. 2.57  d. -0.1279  e. -1.548
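The least squares slope in Question 22 comes straight from the sums of squares; a one-line sketch (decimal points restored from the worked solution):

```python
def slope_estimate(ss_xy, ss_xx):
    """Least squares point estimate of the slope: b1 = SSxy / SSxx."""
    return ss_xy / ss_xx

b1 = slope_estimate(179.6475, 1404.355)
print(round(b1, 4))  # 0.1279
```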
Questions 23-25. An accountant wishes to predict direct labor cost (y, in hundreds of dollars) on the basis of the batch size (x) of a product produced in a job shop. A scatterplot and Minitab output follow.

[Figures: scatterplot of y vs. x (x from about 0 to 100, y from about 200 to 1000) and residual plots for y: normal probability plot, residuals vs. fitted values, histogram, residuals vs. the order of the data.]

Regression Analysis: y versus x
The regression equation is y = 18.5 + 10.1 x.
Predictor  Coef     SE Coef  T       P
Constant   18.488   4.677      3.95  0.003
x          10.1463  0.0866   117.13  0.000
S = 8.64154  R-Sq = 99.9%  R-Sq(adj) = 99.9%
Analysis of Variance
Source          DF  SS       MS       F         P
Regression       1  1024593  1024593  13720.47  0.000
Residual Error  10      747       75
Total           11  1025340

Obs  x    y        Fit      SE Fit  Residual  St Resid
  1    5    71.00    69.22  4.32      1.78     0.24
  2   62   663.00   647.56  2.87     15.44     1.89
  3   35   381.00   373.61  2.66      7.39     0.90
  4   12   138.00   140.24  3.84     -2.24    -0.29
  5   83   861.00   860.63  4.08      0.37     0.05
  6   14   145.00   160.54  3.71    -15.54    -1.99
  7   46   493.00   485.22  2.49      7.78     0.94
  8   52   548.00   546.09  2.55      1.91     0.23
  9   23   251.00   251.85  3.17     -0.85    -0.11
 10  100  1024.00  1033.11  5.33     -9.11    -1.34
 11   41   435.00   434.48  2.53      0.52     0.06
 12   75   772.00   779.46  3.56     -7.46    -0.95

Predicted Values for New Observations
New Obs  Fit     SE Fit  95% CI            95% PI
1        860.63  4.08    (851.53, 869.73)  (839.33, 881.92)
Values of Predictors for New Observations
New Obs  x
1        83.0

23. Interpret the meaning of b1 = 10.1463.
a. We are 95% confident that the sample slope b1 = 10.15.
b. Average direct labor cost is estimated to increase by about $1,015 for every unit increase in batch size.
c. Average direct labor cost is estimated to be about $1,015 when the batch size of the product is zero.
d. Average direct labor cost is estimated to be about $1,015 when the batch size of the product is zero.
e. Estimated batch size of the product is expected to increase by about 10.15 on average for every $100 increase in direct labor cost.

24. The accountant wishes to predict the mean direct labor cost for all batches of size 83. What is the interval estimate?
a. (851.53, 869.73). However, this estimate is unreliable since the predictor value of 83 is not within the range of the sample data.
b. (839.33, 881.92). This estimate is reliable since the predicted fit value of 860.63 falls within the computed interval.
c. (851.53, 869.73). This estimate is fairly reliable since the predictor value of 83 is within the range of the sample data.
d. (839.33, 881.92). This estimate is fairly reliable since the predictor value of 83 is within the range of the sample data.

25. What is the residual for a point prediction of the direct labor cost for an individual batch of size 41?
a. The model over-predicted direct labor cost by 0.52.
b. The model under-predicted direct labor cost by 0.52.
c. The model over-predicted direct labor cost by 3.94.
d. The model exactly predicted the direct labor cost for that individual batch size.

Exam 2 Solutions (correct choice and solution)

1a. Under Model A: Two-story: y = (B0 + B2) + B1x1 + e; Side-split: y = (B0 + B3) + B1x1 + e; Back-split: y = (B0 + B4) + B1x1 + e; Ranch: y = B0 + B1x1 + e. Parallel lines: same slope, different y-intercepts.
2d. With x1 = 2000, x2 = 1, x3 = 0, x4 = 0: y-hat = 36.73 + 0.05819(2000) - 8.94(1) + 31.23(0) + 18.50(0) + 0.01584(2000)(1) + 0.01155(2000)(0) + 0.02162(2000)(0) = 175.85, i.e., $175,850.
3b. With x2 = 0, x3 = 0, x4 = 0: y-hat = 27.48 + 0.063341x1 + 20.341(0) + 12.591(0) + 19.325(0) = 27.48 + 0.063341x1.
4e.
5d. Adjusted R-squared = 1 - [(n - 1)/(n - k - 1)](1 - R-squared) = 1 - (99/97)(1 - .567) = .558.
6d.
7d.
8d. F = [(SSR_C - SSR_R)/(7 - 4)] / MSE_C = [(84922 - 82056)/3] / 562 = 1.70.
9a.
10c.
11a. b1 plus or minus t.025 s(b1); df = n - (k + 1) = 14 - 2 = 12, t.025 = 2.179.
12c. F = MSR/MSE = (SSR/k) / (SSE/(n - k - 1)) = (5661.63/3) / [(6312.2 - 5661.63)/26] = 75.42.
13b. All pairwise correlations are weak to modest; VIFs are all < 10.
14c.
15b.
16c.
17b. The plot of residuals vs. fitted values (y-hat) shows a pattern.
18c. H0: B3 = 0.
19d. R-squared = SSR/SS(Total) = 2935648/92230260 = .032.
20a. t = (b1 - 0)/s(b1) = 7.862/3.760 = 2.09; df = n - (k + 1) = 135 - 2 = 133 (use 120 in the t table): t.025 = 1.98.
21a.
22a. b1 = SSxy/SSxx = 179.6475/1404.355 = 0.1279.
23b.
24c. Use the CI.
25b. y - y-hat = 435 - [18.488 + 10.1463(41)] = 435 - 434.4863 = 0.5137.

MATH202 Spring 2006 Exam 1a

Instructions
1. Do not start until instructed to do so.
2. If you brought a cell phone by mistake, turn it off and place it under your seat. You may NOT use it as a calculator.
3. You may use a calculator (NOT a cell phone calculator) and a 3x5 card (front and back) with notes, but nothing else.
4. Code your UDelNet ID in the Last Name space on your scansheet and fill in the bubbles.
5. Write your name in the white space below the name box on your scansheet.
6. DO NOT put any part of your Social Security Number on your scansheet.
7. Choose the best answer to each question.
8. Use alpha = .05.

Questions 1-4. A completely randomized experiment measured weight gain (in grams) of male rats under six diets, varying by source of protein (beef, cereal, pork) and level of protein (high, low). Ten rats were randomly assigned to each diet. Some data analyses are shown below.

General Linear Model: weight gain versus ProSource, ProLevel
Analysis of Variance for weight gain
Source              DF  SS       MS  F  P
ProSource               266.5           0.541
ProLevel                3168.3          0.000
ProSource*ProLevel                      0.073
Error                   11586.0
Total                   16198.9
S = 14.6477  R-Sq = 28.48%  R-Sq(adj) = 21.85%

[Figure: interaction plot (data means) for weight gain.]

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable: weight gain
All Pairwise Comparisons among Levels of ProSource
ProSource = beef subtracted from:
ProSource  Lower   Center  Upper
cereal     -15.87  -4.700   6.469
pork       -11.67  -0.500  10.669
ProSource = cereal subtracted from:
ProSource  Lower   Center  Upper
pork       -6.969   4.200  15.37

All Pairwise Comparisons among Levels of ProLevel
ProLevel = high subtracted from:
ProLevel  Lower   Center  Upper
low       -22.12  -14.53  -6.951

1. What are the factors?
a. beef, cereal, pork
b. high, low
c. protein source, protein level
d. the six diets
e. both a and b

2. Find the value of the F statistic for the test for interaction between protein source and protein level.
a. 3.00  b. 0.91  c. 589.1  d. 1178.1  e. 2.75

3. What does "interaction" mean in this problem?
a. There are differences among the sources of protein with respect to average weight gain.
b. There are differences among the levels of protein with respect to average weight gain.
c. The effect of protein source on average weight gain depends on level of protein.
d. Both a and b.

4. Which of the following is the best conclusion?
a. There is strong evidence of a difference between high and low levels of protein with respect to average weight gain.
b. There is insufficient evidence of differences among the sources of protein with respect to average weight gain.
c. The strong interaction effect prevents us from making conclusions about the effects of protein source and protein level separately.
d. Both a and b.
Questions 5-8. An airline conducted an experiment to see whether callers would remain on hold longer, on average, if they heard an advertisement about the airline, muzak, or classical music when calling the toll-free number for reservations. The company randomly selected 15 calls and then randomly assigned 5 calls to each of the three hold messages. The time (in minutes) that the caller remained on hold before hanging up was measured. Some data analyses are shown below.

[Figures: side-by-side boxplots of hold times for Ad, Muzak, and Classical, and normal probability plots (95% CI) for each group; the p-value shown is 0.860.]

One-way ANOVA: Ad, Muzak, Classical
Source  DF  SS     MS    F     P
Factor   2  149.2  74.6  6.43  0.013
Error   12  139.2  11.6
Total   14  288.4
S = 3.406  R-Sq = 51.73%  R-Sq(adj) = 43.69%

Individual 95% CIs for Mean, Based on Pooled StDev
Level      N  Mean    StDev
Ad         5   5.400  4.159
Muzak      5   2.800  2.387
Classical  5  10.400  3.435
Pooled StDev = 3.406

Tukey 95% Simultaneous Confidence Intervals, All Pairwise Comparisons
Ad subtracted from:
           Lower   Center  Upper
Muzak      -8.342  -2.600   3.142
Classical  -0.742   5.000  10.742
Muzak subtracted from:
           Lower   Center  Upper
Classical   1.858   7.600  13.342

5. What are the null and alternative hypotheses?
a. H0: mu_a, mu_m, mu_c all differ vs. Ha: mu_a = mu_m = mu_c
b. H0: x-bar_a = x-bar_m = x-bar_c vs. Ha: at least one mean is different from the rest
c. H0: p_a = p_m = p_c vs. Ha: at least one proportion is different from the rest
d. H0: x-bar_a = x-bar_m = x-bar_c vs. Ha: x-bar_a, x-bar_m, x-bar_c all differ
e. H0: mu_a = mu_m = mu_c vs. Ha: at least one mean is different from the rest

6. Which of the following is the correct conclusion?
a. The small p-value suggests that there are differences among the messages with respect to average hold time.
b. The differences among the three sample means prove that there are differences among the messages with respect to average hold time.
c. The p-value is too small, so we cannot reject H0.
d. There is not enough evidence of differences among the messages with respect to average hold time.
e. Both c and d.
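The F in the one-way ANOVA table above is just a ratio of mean squares; a sketch reproducing it from the SS and DF columns:

```python
def one_way_f(ss_factor, df_factor, ss_error, df_error):
    """One-way ANOVA F statistic: F = MST / MSE."""
    return (ss_factor / df_factor) / (ss_error / df_error)

# Hold-message ANOVA: Factor SS = 149.2 (df 2), Error SS = 139.2 (df 12).
f = one_way_f(149.2, 2, 139.2, 12)
print(round(f, 2))  # 6.43
```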
7. According to the multiple comparisons analysis, where are the differences?
a. mu_c > mu_a
b. mu_a > mu_m, mu_c > mu_a
c. mu_a > mu_c > mu_m
d. mu_c > mu_m
e. mu_m > mu_c

8. Refer to the ANOVA table. Which of the following formulas was used to compute 139.2?
a. SSE / (n - k)
b. MST / MSE
c. sum over the k groups of n_i (x-bar_i - x-bar)^2
d. sum over all n observations of (x_i - x-bar)^2
e. sum over the k groups of (n_i - 1) s_i^2

Questions 9-10. Refer to the four interaction plots below, constructed from a sample of data.

[Figure: four panels (I-IV) of interaction plots of mean response vs. factor A (levels 1, 2, 3), with separate lines for the levels of factor B (1, 2, 3); the mean response axis runs from about 10 to 30.]

9. Which plot shows the most evidence of interaction?
a. I  b. II  c. III  d. IV

10. Which is the best interpretation of plot IV?
a. Plot IV suggests that neither factor A nor factor B has much of an effect on the mean response.
b. Plot IV suggests that factor A has an effect on the mean response, but factor B does not have much of an effect.
c. Plot IV suggests that factor B has an effect on the mean response, but factor A does not have much of an effect.
d. Plot IV suggests that both factors A and B have an effect on the mean response.
e. Plot IV suggests that factors A and B interact to affect the mean response.

Questions 11-13. A travel agent wanted to know whether there are differences among the average prices of Marriott, Hyatt, and Sheraton hotel rooms. She knew that location of the hotel is a factor in determining the price, so she blocked by city. After randomly selecting several cities and obtaining the room rate for each hotel, she obtained the following data.

ANOVA
Source of Variation  SS        df  MS       F        P-value  F crit
City                 2469.185   5  493.837  1.47638  0.28014  3.32583
Hotel                 832.444   2  416.222  1.24434  0.32916  4.10281
Error                3344.902  10  334.490
Total                6646.531  17

11. How many cities did she select?
a. 5  b. 17  c. 18  d. 6  e. 3

12. How many room rates did she obtain?
a. 5  b. 17  c. 18  d. 6  e. 3

13. Which of the following is the best conclusion?
a. The differences among the sample mean rates for the cities are statistically significant.
b. The differences among the sample mean rates for the hotels are statistically significant.
c. The differences among the sample mean rates for the hotels are not statistically significant.
d. The hotel factor is a significant source of room rate variation.
e. Both b and d.

Questions 14 & 15. As part of a study to evaluate differences in education quality between two training centers, A and B, a standardized examination is given to individuals who are trained at the centers. The difference between the mean examination scores is used to assess quality differences between the centers. Independent simple random samples of n1 = 30 individuals from training center A and n2 = 40 individuals from training center B are taken. The respective sample means are x-bar1 = 82 and x-bar2 = 78. It is known that the standard deviation of the standardized examination at both centers is 10. Do these data suggest a significant difference between the population means at the two training centers?

14. What are the null and alternative hypotheses for the above test?
a. H0: mu_A - mu_B = 0 vs. Ha: mu_A - mu_B > 0
b. H0: mu_A - mu_B = 0 vs. Ha: mu_A - mu_B is not 0
c. H0: mu_A - mu_B is not 0 vs. Ha: mu_A - mu_B = 0
d. H0: mu_A - mu_B = 0 vs. Ha: mu_A - mu_B < 0

15. What inference can be drawn?
a. The test statistic of 1.66 gives a p-value of 0.0970. The sample results provide insufficient evidence that the training centers differ in quality.
b. The test statistic of 1.66 gives a p-value of 0.0485. The sample results provide sufficient evidence that the training centers differ in quality.
c. Since the p-value of 0.0970 is greater than alpha = .05, we should conclude that there is sufficient evidence that the training centers differ in quality.
d. Since the p-value of 0.0970 is greater than alpha = .025, we should conclude that there is sufficient evidence that the training centers differ in quality.
e. Both c and d are correct.

Questions 16 & 17. The College Board provided comparisons of Scholastic Aptitude Test (SAT) scores based on the highest level of education attained by the test taker's parents.
A research hypothesis was that students whose parents had attained a higher level of education would, on average, score higher on the SAT. During 2003, the overall mean SAT verbal score was 507 (The World Almanac, 2004). SAT verbal scores for independent samples of students whose parents are college graduates and whose parents are high school graduates follow; Minitab output is attached.

[Data: SAT verbal scores for 16 College Grads and 12 High School Grads; the two-column layout is garbled in the scan: 485 487 442 492 534 533 580 478 650 526 479 425 554 410 486 485 550 515 528 390 572 578 524 535 497 448 592 469]

Two-Sample T-Test and CI: College Grads, High School Grads
Two-sample T for College Grads vs High School Grads
                   N   Mean   StDev  SE Mean
College Grads      16  525.0  59.4   15
High School Grads  12  487.0  51.7   15
Difference = mu(College Grads) - mu(High School Grads)
95% lower bound for difference: 1.3280
T-Test of difference = 0 (vs >): T-Value = 1.77  P-Value = 0.044  DF = 26
Both use Pooled StDev = 56.3021

16. What is the point estimate of the difference between the means for the two populations?
a. 77  b. 507  c. 38  d. 0  e. 506

17. What should the researcher conclude?
a. There is insufficient evidence to indicate that there is a difference between average SAT scores of students whose parents had attained a college degree and whose parents were only high school graduates.
b. There is sufficient evidence to indicate that there is a difference between average SAT scores of students whose parents had attained a college degree and whose parents were only high school graduates, since the test statistic of 1.77 gives a p-value of 0.044.
c. No conclusion should be made because a paired sample t-test should have been performed.
d. There is insufficient evidence to indicate that average SAT scores of students whose parents had attained a college degree are higher than average scores of students whose parents were only high school graduates.
e. Since the test statistic of 1.77 gives a p-value of 0.044, which is less than alpha = .05, there is sufficient evidence from the sample data that average SAT scores of students whose parents had attained a college degree are higher than average SAT scores of students whose parents were only high school graduates.
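The T-value in the two-sample output above can be reproduced from the summary statistics alone. A sketch of the pooled-variance t statistic, with the means, pooled standard deviation, and sample sizes taken from the Minitab output:

```python
import math

def pooled_t(mean1, mean2, pooled_sd, n1, n2):
    """Two-sample t statistic with a pooled standard deviation:
    t = (xbar1 - xbar2) / (sp * sqrt(1/n1 + 1/n2))."""
    return (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))

# College grads: n = 16, mean 525.0; high school grads: n = 12, mean 487.0;
# pooled StDev = 56.3021.
t = pooled_t(525.0, 487.0, 56.3021, 16, 12)
print(round(t, 2))  # 1.77
```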
Questions 18 & 19. In recent years, a growing array of entertainment options competes for consumer time. By 2004, cable television and radio surpassed broadcast television, recorded music, and the daily news to become the two entertainment media with the greatest usage (The Wall Street Journal, January 26, 2004). Researchers used a sample of 15 individuals and collected data on the hours per week spent watching cable television and the hours per week spent listening to the radio. The data and the Minitab output follow.

Individual  Television  Radio     Individual  Television  Radio
 1          22          25         9          21          21
 2           8          10        10          23          23
 3          25          29        11          14          15
 4          22          19        12          14          18
 5          12          13        13          14          17
 6          26          28        14          16          15
 7          22          23        15          24          23
 8          19          21

Paired T-Test and CI: Television, Radio
Paired T for Television - Radio
            N   Mean      StDev    SE Mean
Television  15  18.8000   5.4143   1.3980
Radio       15  20.0000   5.4248   1.4007
Difference  15  -1.20000  1.97122  0.50897
95% CI for mean difference: (-2.29163, -0.10837)
T-Test of mean difference = 0 (vs not = 0): T-Value =       P-Value = 0.033

18. Do the data show a significant difference between the population mean usage for cable television and radio? What is the critical t-value for the test?
a. t(.05, 14) = 1.761
b. t(.05, 15) = 1.753
c. t(.025, 14) = 2.145
d. t(.025, 15) = 2.131
e. t(.05, 28) = 1.701

19. What is the value of the test statistic for the above test?
a. -2.36  b. 3.87  c. -2.29  d. 0.033  e. -1.2
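The blank T-value in the output above (the quantity Question 19 asks for) can be reproduced from the raw differences. A sketch of the paired t statistic, using the Television minus Radio differences from the table:

```python
import math

def paired_t(differences):
    """Paired t statistic: t = dbar / (s_d / sqrt(n))."""
    n = len(differences)
    dbar = sum(differences) / n
    s_d = math.sqrt(sum((d - dbar) ** 2 for d in differences) / (n - 1))
    return dbar / (s_d / math.sqrt(n))

tv =    [22, 8, 25, 22, 12, 26, 22, 19, 21, 23, 14, 14, 14, 16, 24]
radio = [25, 10, 29, 19, 13, 28, 23, 21, 21, 23, 15, 18, 17, 15, 23]
diffs = [t - r for t, r in zip(tv, radio)]
print(round(paired_t(diffs), 2))  # -2.36
```

The mean difference (-1.2) and its standard deviation (1.97122) match the Minitab output.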
Questions 20 & 21. Many companies use well-known celebrities as spokespersons in their TV advertisements. A study was conducted to determine whether the proportion of identified products was less if the spokesperson was a male celebrity. Each in a sample of 300 female TV viewers was asked to identify a product advertised by a celebrity spokesperson. The gender of the spokesperson and whether or not the viewer could identify the product were recorded. The numbers in each category are given below.

                    Male Celebrity  Female Celebrity  Total
Identified product   41              61               102
Could not identify  109              89               198
Total               150             150               300

20. Which test should be used to determine if the male celebrities are less effective than female celebrities?
a. Paired t test
b. Z test to compare two proportions
c. Chi-square test for equality of proportions
d. Pooled variance t test
e. Either b or c

21. Calculate the value of the test statistic and make your decision. (Assume the p-value = 0.052; it may not be the actual value, but assume it is to answer this question.)
a. Since the calculated value = -2.44, there is insufficient evidence to conclude male celebrities are less effective than female celebrities.
b. Since the calculated value = -2.44, we should conclude that male celebrities are more effective than female celebrities.
c. Since the calculated value = 3.619, there is extremely strong evidence to conclude that male celebrities are less effective than female celebrities.
d. Since the calculated value is close to 3, we should conclude that an unequal variance t-test should be used.
e. None of the above answers is correct.

22. A study of 202 married couples living in family housing at the University of Georgia was conducted to investigate the role of lifestyles in forming a marriage. Based on their responses to a questionnaire, each spouse was classified into one of the following five lifestyle types: achieving, avoiding, control, outdoing, and pleasing. The researchers cross-classified the 202 couples according to both wife's lifestyle type and husband's lifestyle type. The purpose of this analysis is to determine whether husband's lifestyle type depends on wife's lifestyle type, and vice versa. Minitab output follows. Interpret the p-value of the test.

Expected counts are printed below observed counts.

       Achieving  Avoiding  Control  Outdoing  Pleasing  Total
1       7          7        10        6         4         34
        6.23       6.40      8.42     6.90      6.06
2       5          6        11        8         8         38
        6.96       7.15      9.41     7.71      6.77
3      11          7        15       10         8         51
        9.34       9.59     12.62    10.35      9.09
4       5         12        11       11         7         46
        8.43       8.65     11.39     9.34      8.20
5       9          6         3        6         9         33
        6.04       6.21      8.17     6.70      5.88
Total  37         38        50       41        36        202

Chi-Sq = 13.715, DF = 16, P-Value = 0.445
(the individual cell chi-square contributions are garbled in the scan)
0296 0007 3270 0073 0620 I OOOO 36 700 223 131 175 654 33 202 13715 a The evidence is not strong enough to show that there is a relationship between husband s and wife s lifestyles b The large pvalue indicates there is strong evidence of a relationship between husband s and wife s lifestyles c The evidence is not strong enough to show that there is a difference of proportions between husband s and wife s lifestyles d The evidence is strong enough to show there is a difference of proportions between husband s and wife s lifestyles 6 None of the above A twoway analysis of variance test should be used since the data are quantitative Questions 23 25 One criterion used to evaluate employees in the assembly section of a large factory is the number of defective pieces per 1000 parts produced The quality control department wants to find out whether there is a relationship between years of eXperience and defect rate Since the job is repetitious after the initial training period any improvement due to a learning effect might be offset by a loss of motivation A defect rate is calculated for each worker in a yearly evaluation The results for 100 workers are given in the table below Years Since Training Period lt 1 Year 1 4 Years 5 9 Years High 6 9 9 Defect Rate Average 9 19 23 Low 7 8 10 23 a Reject H0 if 12 b Reject H0 if 12 c Reject H0 if 12 d Reject H0 if 12 gt 16919 gt 15507 gt 11143 gt 9488 11 Find the rejection region necessary for testing whether there is a relationship between defect rate and years of experience 24 What is the expected number of employees with less than one year of training time and a high defect rate a 417 b 460 c 528 d 917 25 A test was conducted to determine if a relationship eXists between defect rate and years of experience Which of the following p values would indicate that defect rate and years of eXperience are dependent a 0045 b 0055 c 0074 d 0080 12 quot Wei31 Jin m i 3 93393 e ME 3 i I a Inf El WWI L Effart f ri Biz 2 M I WW T Em ViFSIW ET 
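The arithmetic behind question 24 can be sketched in Python (an illustrative aside, not part of the original exam): in a chi-square test of independence, each expected cell count is (row total × column total) / grand total.

```python
# Expected counts for the defect-rate contingency table (question 24).
# Expected cell count = (row total * column total) / grand total.
observed = [
    [6, 9, 9],    # High defect rate: <1 yr, 1-4 yrs, 5-9 yrs
    [9, 19, 23],  # Average
    [7, 8, 10],   # Low
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(round(expected[0][0], 2))  # high defect rate, < 1 year: 5.28, answer c
```

The rejection region in question 23 uses df = (rows - 1)(columns - 1) = 4, so the 0.05 chi-square critical value is 9.488 (option d).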
MATH202 Spring 2006 Final Exam

Instructions
1 Do not start until instructed to do so. If you brought a cell phone by mistake, turn it off and place it under your seat. You may NOT use it as a calculator.
2 You may use a calculator (NOT a cell phone calculator) and three 3x5 cards (front and back) with notes, but nothing else.
3 Code your UDelNet ID in the Last Name space on your scansheet and fill in the bubbles. Write your name in the white space below the name box on your scansheet. DO NOT put any part of your Social Security Number on your scansheet.
4 Choose the best answer to each question. Use alpha = .05.

Questions 1-7 The table below shows quarterly retail sales figures for JCPenney. Some data analyses are shown below, including regression output for the following models:
y = retail sales (millions of dollars)
Time = time index 1, 2, ..., 24
Q1 = 1 if quarter 1, 0 otherwise
Q2 = 1 if quarter 2, 0 otherwise
Q3 = 1 if quarter 3, 0 otherwise
Model 1: y = B0 + B1(Time) + e
Model 2: y = B0 + B1(Time) + B2(Q1) + B3(Q2) + B4(Q3) + e

Year Quarter Sales
1996 1 4452
1996 2 4507
1996 3 5537
1996 4 8157
1997 1 6481
1997 2 6420
1997 3 7208
1998 2 6483
1998 3 7129
1998 4 9072
1999 1 7339
1999 3 7639
1999 4 9661
2000 1 7528
2000 2 7207
2001 1 7522
2001 3 7729
[the remaining rows of the sales table are illegible in this scan]

[Figures: moving-average plot and single exponential smoothing plot for Sales; scatterplots of Sales and fitted values versus Time from Models 1 and 2; not legible in this scan]

Model 1 Regression Analysis: Sales versus Time
The regression equation is Sales = 5903 + 119 Time
Predictor   Coef     SE Coef   T       P
Constant    5903.2   492.9     11.98   0.000
Time        118.75   34.49     3.44    0.002
S = 1169.71   R-Sq = 35.0%   R-Sq(adj) = 32.1%
Analysis of Variance
Source          DF   SS         MS         F       P
Regression      1    16217509   16217509   11.85   0.002
Residual Error  22   30100626   1368210
Total           23   46318136

Model 2 Regression Analysis: Sales versus Time, Q1, Q2, Q3
The regression equation is Sales = 7859 + 99.5 Time - 2274 Q1 - 2565 Q2 - 2023 Q3
Predictor   Coef      SE Coef   T       P
Constant    7858.8    331.3     23.72   0.000
Time        99.54     16.93     5.88    0.000
Q1          -2274.2   331.1     -6.87   0.000
Q2          -2564.6   328.9     -7.80   0.000
Q3          -2022.8   327.6     -6.17   0.000
S = 566.720   R-Sq = 86.8%   R-Sq(adj) = 84.1%
Analysis of Variance
Source          DF   SS         MS         F       P
Regression      4    40215885   10053971   31.30   0.000
Residual Error  19   6102250    321171
Total           23   46318136

1 Use single exponential smoothing with w = .6 to find the smoothed value for 1996 quarter 2
a 4474  b 4832  c 4480  d 4485  e 4507
2 Using the Holt-Winters smoothing method with w = .2 and v = .4, the smoothed value and trend estimates for 2001 quarter 4 are 8005.5 and 49.9, respectively. Use these to forecast retail sales (in millions) for 2002 quarter 3
a 8005.5  b 8008.5  c 8015.5  d 8155.2  e 9691.7
3 Which time series components are apparent in this time series
a residual only
b trend and seasonal only
c trend and cyclical and residual
d seasonal and residual only
e trend and seasonal and residual
4 Use Model 2 to predict retail sales (in millions) for 2002 quarter 4
a 3784.32  b 10645.92  c 8256.96  d 8623.12  e 7858.34
5 How do the moving average (length 3) and the single exponential smoothing (w = .6) models shown above compare in terms of forecast accuracy
a The moving average model is better because it has higher accuracy measures
b The single exponential smoothing model is better because it has lower accuracy measures
c The single exponential smoothing model is better because it has a higher R2
d The moving average model is better because it has a lower p-value
6 Consider Model 2 and the hypotheses H0: B2 = B3 = B4 = 0 vs Ha: at least one Bi differs from 0. Which research question is this test designed to answer
a Is Model 2 useful for forecasting retail sales
b Is there enough evidence of an interaction between time and season
c Are there differences among the
slopes for quarters 2 3 and 4 d Is there enough evidence of a seasonal component present in the time series e Is there enough evidence of a linear trend present in the time series Calculate the partial Ftest statistic to compare Model 2 to Model 1 a 585 b 80 c 187 d 2491 e none of these A manufacturing plant receives 300 lots of material from outside vendors per week As part of an effort to reduce errors in the system of placing and filling orders you will monitor the proportion of rejected lots each week How should you calculate the control limits for the appropriate control chart a E13055 d2 b 13i3p1p n i342 d ED3 D4 Ed e Xi3 l2 J Questions 9 12 Times in seconds to answer calls to a corporate customer service center are randomly selected 6 from each shift for 20 consecutive shifts A control chart of the data is shown below Note that E 5 32 Sanple Mean Xbar Chart ofTimal Tima6 50 40 A JL v v 30 X3l38 2039 10 1391 1393 1395 1397 1399 Sarrple 10 11 12 13 14 Find the upper control limit a 571 b 57 c 409 d 1066 e 575 Which of the following is the best interpretation of the control chart The process is in control because there are no points beyond the control limits The process is out of control because there is a downward trend over time The process is out of control because there are 2 out of 3 points in a row in zone A The process is out of control because there is too much variation The process is in control because there are about the same number of points on either side of the centerline 0999 If you wanted to construct a control chart to monitor the variation in times to answer calls how should you construct the control limits for the appropriate chart a ED3 D4 b 1713 190 19 11 c iiAZE Ed d H3 2 Suppose the customer service staff on one particular shift get along so well that they frequently stand around someone s cubicle and talk As a result answering time for calls received on this shift tends to be longer on average Will this particular sampling scheme be able 
to detect this problem a Yes b No Which of the following is true a The HoltWinter s method accounts for a trend component b Using w 8 in single exponential smoothing will follow the original series more closely than using w 4 c A moving average of length 5 will produce a smoother series than a moving average of length 3 d b and c only e a b and c Let K be the value of a time series at time t let 1 1 2 n be a time indeX and let D 1 if season A 0 if season B there are only two seasons Which of the following models is appropriate to model only the seasonal and residual components of this time series 3 Yt1308t b Yt80BlDet 0 Yt130131t182D8t d Y 80 Blt32DB3Dt8t e Y 30 31t8t Questions 15 amp 16 In recent years a growing array of entertainment options compete for consumer time By 2004 cable television and radio surpassed broadcast television recorded music and the daily news to become the two entertainment media with the greatest usage The Wall Street Journal January 26 2004 Researchers used a sample of 15 individuals and collected data on the hours per week spent watching cable television and hours per week spent listening to the radio The data and the Minitab output follow Individual Television Radio Individual Television Radio 1 22 25 9 21 21 2 8 10 10 23 23 3 25 29 11 14 15 4 22 19 12 14 18 5 12 13 13 14 17 6 26 28 14 16 15 7 22 23 15 24 23 8 19 21 Paired TTest and Cl Television Radio Paired T for Television Radio N Mean StDev SE Mean Television 15 188000 54143 13980 Radio 15 200000 54248 14007 Difference 15 120000 197122 050897 95 CI for mean difference 229163 010837 T Test of mean difference 0 vs not 0 T Value P Value 0033 15 Do the data show a significant difference between the population mean usage for cable television and radio What is the critical tvalue for the test tmoimojL761 t00515 t0025 14 2145 t0025 15 2131 000523 0999 s 16 What is the point estimate for the difference between the population mean usage for cable television and radio a 236 b 387 c 229 d 0033 
e 12 17 Many companies use wellknown celebrities as spokespersons in their TV advertisements A study was conducted to determine whether the proportion of identified products was the same for male and female celebrity spokespersons Each in a sample of 300 female TV viewers was asked to identify a product advertised by a celebrity spokesperson The gender of the spokesperson and whether or not the viewer could identify the product was recorded The numbers in each category are given below Male Female Total Celebrity Celebrity Identified 41 61 102 product Could not 109 89 198 identify Total 150 150 300 Which test should be used to determine if there is a difference in effectiveness between male celebrities and their female counterparts Paired t test Z test to compare two proportions x2 test for equality of proportions Pooled variance t test Either b or c 0999 Questions 18 19 Fanfare International Inc designs distributes and markets ceiling fans and lighting fixtures The company s product line includes 120 basic models of ceiling fans and 138 compatible fan light kits and table lamps In the summer of 1994 Fanfare decided it needed to develop forecasts of future sales To do so it collected monthly data on y total monthly sales thousands of dollars x1 advertising eXpense thousands of dollars x2 housing starts thousands of units The model fit to the data is y 80 181x1 182352 183x1x2 8 Some regression analyses are shown below Regression Statistics Multiple R 0911542 R Square 0830909 Adjusted R Square 0818831 Standard Error 1704107 Observations 46 ANOVA Significance df SS MS F F Regression 3 5993430 1997810 6879557 0000000 Residual 42 1219672 2903981 Total 45 7213102 Standard Coefficients Error tStat Pvalue Lower 95 Upper 95 Intercept 492131 2506023 196379 0056196 997867 1360539 x1 4786484 1371746 3489337 0001151 2018188 7554781 x2 1946096 2672634 7281566 0000000 1406737 2485456 x1x2 003282 0013983 23472 0023704 006104 00046 18 Is there enough evidence that the relationship 
between advertising expense and sales depends on housing starts Give the null hypothesis for this test a Ho 3132 B3 0 b Ho 31le 0 Ho 33 0 d Ho R2 0 none of the above 19 a 831 b 819 c 181 20 How much variation in total sales is NOT eXplained by this regression model d 169 e 88 A company that provides transportation services uses a telemarketing division to help sell its services The division manager is interested in the time spent on the phone by the telemarketers in the division Data on the number of months of employment MONTHS and the average number of calls placed per day for 20 working days CALLS is recorded for a sample of 20 employees The model CALLS 80 BIMONTHS 8 is fit to the data and some residual plots are shown here Percent Which of the following interpretations of the residual plots is incorrect a There is no relationship between length of employment and number of calls b There is strong evidence that the straightline model is appropriate Frequency Residual Plots for CALLS Normal Probability Plot of the Residuals m 90 50 10 Residuals Versus the tted Values Residual 50 25 00 25 50 Residual Histogram of the Residuals Residual 20 24 28 32 36 Fitted Value Residuals Versus the Order of the Data NWA V w c There is strong evidence of a violation of the Residual 2 4 6 8 101214161820 Observation Order normality assumption d There is strong evidence that the residuals are independent e all of the above interpretations are incorrect 21 A study of artificial sweeteners and weight is conducted using a random sample of subjects Each subject is asked if they use artificial sweeteners in place of sugar and their weight is measured We d like to know if people who use artificial sweeteners in place of sugar are heavier on average than those who use sugar Which statistical method would be appropriate to analyze the data a simple linear regression time series analysis paired t test or paired ztest independent ttest or independent ztest twoway ANOVA 0900quot Questions 22 amp 
23 An accountant wishes to predict direct labor cost y in hundreds of dollars on the basis of the batch size X of a product produced in a job shop A scatterplot and Minitab output follow scatterplot ofyvsx Residual Plots fory Nomial Probability Plot of the Residuals Residuals Versus the tted Values 1000 m 90 10 5 50 E quot 39 39 800 E E o 39 10 10 0 600 1 7n 0 250 500 750 1000 gt Residual Fitted Value 400 Histogram of the Residuals Residuals Versus the Order of the Data 50 4 200 E 45 10 A 0 g 30 0 V N d 3 2390 4390 60 8390 100 M 7n x 451050 5 1015 123456789101112 Residual Observation Order Regressmn AnalySIs y versus x The regression equation is y 185 101 X Predictor Coef SE Coef T P Constant 18488 4677 395 0003 x 101463 00866 11713 0000 S 864154 R Sq 999 R Sqadj 999 Analysis of Variance Source DF SS MS F P Regression 1 1024593 1024593 1372047 0000 Residual Error 10 747 75 Total 11 1025340 Obs x y Fit SE Fit Residual St Resid 1 5 7100 6922 432 178 024 2 62 66300 64756 287 1544 189 3 35 38100 37361 266 739 090 4 12 13800 14024 384 224 029 5 83 86100 86063 408 005 6 14 14500 16054 371 1554 199 7 46 49300 48522 249 778 094 8 52 54800 54609 255 191 023 9 23 25100 25185 317 011 10 100 102400 103311 533 911 134 11 41 43500 43448 253 006 12 75 77200 77946 356 746 095 Predicted Values for New Observations New Obs Fit SE Fit 95 CI 95 PI 1 86063 408 85153 86973 83933 88192 Values of Predictors for New Observations New Obs X 1 830 22 What is the residual for a point prediction of the direct labor cost for an individual batch of size 23 a The model over predicted direct labor cost by 85 b The model under predicted direct labor cost by 85 c The model over predicted direct labor cost by 210 d The model exactly predicted the direct labor cost for that individual batch size 23 Find a 95 confidence interval estimate for l a 10146 i 00872228 b 10146 i 00871812 c 18488 i 00871812 d 18488 i 00872228 e 10146 i 86412228 24 A random sample of 42 firms was chosen from the SampP 500 rms 
listed in the Spring 2003 Special Issue of Business Week The Business Week Fifty Best Performers The dividend yield y and the 2002 earnings per share x were recorded for these 42 firms A scatterplot of these data along with the Minitab output are given below Residual Plots for Scatterplot of y vs x y Normal Probability Plot of the Residuals Residuals Versus the tted Values 7 00 o g 2 90 E O 0 6 o 39 g a 1 o O 3 50 395 O o g g c 0 o o 5 O O 10 g 0 1 o39 0 0 g o o 4 39 2 1 o 1 2 20 25 30 35 40 gt o Standardized Residual Fitted Value 3 039 0 Histogram of the Residuals Residuals Versus the Order of the Data 0 8 2 39 o 0 g Q 0 O 6 1 39 i ii A M o o E 4 n i A A h u f 0 o a m2 1VUVJV Jr 0 1 2 3 4 5 n a x o 1 2 151052025303540 Standardized Residual Observation Order Regression Analysis y versus x The regression equation is y 99999999999999 Predictor Coef SE Coef T P 10 Constant 20336 05405 376 0001 x 03740 02395 156 0126 s 184975 R Sq 57 R Sqadj 34 Analysis of Variance Source DF SS MS F P Regression 1 8345 8345 244 0126 Residual Error 40 136864 3422 Total 41 145208 Interpret the coefficient 31 03740 a Dividend yield is expected to increase by 03740 on average when earnings per share are zero However this is not realistic since zero earnings per share is not close to the range of the sampled data b Dividend yield is expected to increase by 03740 on average when earnings per share are zero This is a realistic expectation since zero earnings per share is close to the range of the sampled data c The interpretation of l 03740 should not be attempted since one or more of the regression assumptions is violated d Dividend yield is expected to increase by 03740 on average when the 2002 earnings per share increase by one unit Questions 25 27 A company wishes to evaluate the effect of package design on one of its products a certain brand of cereal The four package designs are to be tested in different stores throughout a large city There are 28 stores available for the study Cereal 
sales are known to vary depending on the size of the stores so the company decides to use size as an additional variable in an attempt to minimize variation due to randomness The 28 stores are divided into seven groups of four stores each by size The table below illustrates the design The four package designs are randomized within each size The randomization is performed independently between sizes Minitab output is attached Sizes Packagel Package2 Package3 Pac kage4 1 40 23 17 33 2 32 45 40 25 3 43 3 1 3 8 47 4 44 41 56 45 5 43 60 47 64 6 40 45 38 45 7 43 41 38 25 General Linear Model SALES versus PACKAGE SIZE Factor Type Levels Values PACKAGE fixed 4 1 2 3 4 SIZE fixed 7 1 2 3 4 5 6 7 Analysis of Variance for SALES using Adjusted SS for Tests Source DF Seq SS Adj SS Adj MS F P 11 PACKAGE 3 1325 1325 442 006 0981 SIZE 6 158686 158686 26448 354 0017 Error 18 134600 134600 7478 Total 27 294611 S 864741 R Sq 5431 R Sqadj 3147 Tukey 950 Simultaneous Confidence Intervals Response Variable SALES All Pairwise Comparisons among Levels of PACKAGE PACKAGE 1 subtracted from PACKAGE Lower Center Upper 2 1293 0143 1322 3 1465 1571 1150 4 1322 0143 1293 8 0 00 8 0 PACKAGE 2 subtracted from PACKAGE Lower Center Upper 3 1479 1714 1136 4 1336 0286 1279 8 0 00 8 0 PACKAGE 3 subtracted from PACKAGE Lower Center Upper 4 1165 1429 1450 8 0 00 8 0 25 Which is the blocking variable and its degrees of freedom a PACKAGE DF 3 b PACKAGE DF 4 C SIZE DF 6 d SIZE DF 7 e ERROR DF 18 26 Is the company convinced that the use of blocking has helped and should continue a There is insufficient evidence that the use of blocking has helped since the pvalue 0981 b There is strong evidence that the use of blocking has helped pvalue 0017 and should continue c There is strong evidence that the use of blocking has helped pvalue 0017 but should be discontinued since blocking will only increase variation d There is insufficient evidence that the use of blocking has helped since the pvalue 0017 e None of the 
above A Twoway AN OVA test with replication should have been performed 27 Is there a difference in sales because of package design What is the null hypothesis 21 H0 designl design2 design3 design4 0 b H0 designl 7t designZ 7t design3 7t design4 i 0 039 HO designl design2 design3 design4 12 d H0 ludesignl ludesign2 ludesign3 ludesign4 ludesignS ludesign6 ludesigr 0 e H 0 ludesignl i ludesign2 i ludesign3 i ludesign4 i ludesignS i ludesign6 i ludesign7 i 0 Questions 28 30 Suppose a racecar driver wants to compare mean quarter mile times associated with three different brands of tires He employs a completely randomized design The times in seconds were recorded for each set of tires for ten runs per set and tabulated below Brand A Brand B Brand C 15356 15546 15632 15723 15534 15576 15349 15524 15620 15473 15510 15536 15444 15572 15602 15463 15490 15532 15436 15583 15519 15425 15549 15604 15389 15583 15602 28 Suppose the calculated F statistic exceeds the value of the critical F What is the significance of this statement a The two sources of variation between the treatment means and within treatments are approximately equal b The population treatment means are not different c The variation among treatment means exceeds the withintreatment variation by more than what we d expect from chance alone d None of the above choices is correct The racecar driver has decided that his speed is in uenced by a combination of factors namely the brand of tires and the type of oil in the car The driver decides to investigate this by performing a full factorial experiment for the three brands of tires and two brands of oil The ANOVA summary is given below Source of SS df MS F Pvalue F critical Variation Oil 00401 1 00401 57801 00272 44139 Tires 00406 2 00203 29243 00795 35546 OilTires 00991 2 00496 71435 00052 35546 Error 01249 18 00069 Total 03047 23 29 Based on these results would a researcher conclude that the brand of oil affects the speed of the car a Yes the calculated F value 
exceeds the critical F value. Therefore it is safe to conclude that the variation caused by the type of oil is greater than the variation due to the experiment random error
b No, the calculated F value exceeds the critical F value. Therefore it is safe to conclude that the variation caused by the type of oil is less than the variation due to the treatment
c No, the p-value for the Oil*Tires source is small. Therefore the researcher should conclude that the variation is due to an interaction between the oil and tires, not due to the oil alone
d Yes, the p-value for the Oil*Tires source is small. Therefore the researcher should conclude that the variation is due to the oil, not due to an interaction between the tires and the type of oil used

30 Would it be appropriate to conduct a multiple comparisons analysis among the levels of tires
a No, since there is significant evidence of interaction between tires and the type of oil
b No, since there is no significant evidence of interaction between tires and the type of oil. However, there is significant evidence that the variation due to the type of tire is greater than the variation due to the experiment error
c Yes, since there is no significant evidence of interaction between tires and type of oil
d Yes, since there is no significant evidence of interaction between tires and the type of oil, and there is significant evidence that the variation due to the type of tire is greater than the variation due to the experiment error
e Yes. The p-values for both interaction and the type of tire are less than the given level of significance

Final Exam Solutions (correct choice, with solution where shown)
1d E1 = Y1 = 4452; E2 = wY2 + (1 - w)E1 = .6(4507) + .4(4452) = 4485
2d E24 = 8005.5, T24 = 49.9; F27 = E24 + 3(T24) = 8005.5 + 3(49.9) = 8155.2
3e
4b t = 28, Q1 = 0, Q2 = 0, Q3 = 0; F28 = 7858.8 + 99.54(28) - 2274.2(0) - 2564.6(0) - 2022.8(0) = 10645.9
5b
6d
7d F = [(SSR_C - SSR_R)/(k - g)] / MSE_C = [(40215885 - 16217509)/(4 - 1)] / 321171 = 24.91
8b a p-chart would be the appropriate chart
9a n = 6, A2 = .483: 31.38 + .483(53.2) = 57.1
10c
11a an R-chart would be the appropriate chart
12a
13e
14b
15c paired, df
= nd - 1 = 15 - 1 = 14, two-sided
16e
17e can use either, since the test for two proportions is two-sided
18c
19d 1 - R2 = 1 - .831 = .169
20e
21d
22a y-hat = 18.488 + 10.1463(23) = 251.85; y - y-hat = 251 - 251.85 = -0.85
23a b1 plus or minus t(alpha/2)*s(b1), with df = n - k - 1 = 12 - 1 - 1 = 10
24d
25c
26b
27c
28c
29c
30a

MapleTA, Math 202 2015 Summer, Chapter 13 quiz, feedback view (Keller, Marisa, 7/30/15 at 2:15 AM; every question scored 0/1). The correct answers, as far as they are legible in this scan, were:
Question 1 Which of the following is the definition of the root mean squared error? Correct answer: RMSE = sqrt[(1/m) * sum of (Yt - Ft)^2]
Question 2 Using double exponential forecasting with w = .45 and v = .6, and Y1 = 260, Y2 = 235, Y3 = 300, find the forecasted value for the 1st year after the recorded data. Correct answer: 249.8 (accepted to within 0.5)
Question 3 Forecasting methods predicted over three years goods sold would be 290, 225, 300. If the actual results turned out to be 295.8, 236.25, 309 respectively, what is the RMSE? Correct answer: sqrt[((295.8 - 290)^2 + (236.25 - 225)^2 + (309 - 300)^2)/3], approximately 8.97
Question 4 The trend smoothing constant v, when closer to 0, causes the new series to have a series trending closer to the past rather than recent changes. Correct answer: True
Question 5 The inferential forecasting model is better known for this basic algebraic concept of x. [Correct answer illegible in this scan]
Question 6 Forecasting methods predicted over three years goods sold would be 285, 200, 355. If the actual results turned out to be 293.55, 192, 372.75 respectively, what is the MAPE? Correct answer: (100/3) * (|293.55 - 285|/293.55 + |192 - 200|/192 + |372.75 - 355|/372.75), approximately 3.95
Question 7 Using double exponential forecasting with w = .7 and v = .45, and Y1 = 290, Y2 = 295, Y3 = 345, find the forecasts for the 1st, 2nd, and 3rd years after the recorded data. Correct answers: 350.675, 369.85, 389.025 (each accepted to within 0.5). If the actual results over those years turned out to be 350.675, 358.7545, 369.57375 respectively, what is the MAPE? [Correct answer illegible in this scan]
Question 8 Recall the technique of exponentially smoothing data. Consider data spanning a year for Firestone tire sales from a local business. Exponentially smooth the data with w = .6. Correct answers (each accepted to within 0.5):
January recorded 295 sales, smoothed sales = 295
February recorded 260 sales, smoothed sales = 274
March recorded 305 sales, smoothed sales = 292.6
April recorded 205 sales, smoothed sales = 240.04
May recorded 280 sales, smoothed sales = 264.016
June recorded 250 sales, smoothed sales = 255.6064
July recorded 285 sales, smoothed sales = 273.24256
August recorded 310 sales, smoothed sales = 295.297024
September recorded 270 sales, smoothed sales = 280.11881
October recorded 225 sales, smoothed sales = 247.047524
November recorded 365 sales, smoothed sales = 317.81901
December recorded 235 sales, smoothed sales = 268.127604
Question 9 The exponential smoothing constant w, when closer to 0, causes the new forecasted series to be in general less error prone within a small number of years. Correct answer: True
Question 10 Which of the following is the definition of the mean absolute percentage error? Correct answer: MAPE = (100/m) * sum of |Yt - Ft| / Yt

Activity 10 Analyzing Variance (ANOVA)

Topic 49 uses <TESTS> F:ANOVA( with lists of data gathered from a completely randomized design. Program A1ANOVA (see Appendix B) is introduced with its capability of ANOVA using summary statistics for input. Program A1ANOVA can also use raw data stored in a matrix as opposed to a list. This technique is required for those who will analyze randomized block designs and two-factor factorial experiments, which are presented in Topics 50 and 51. The idea for the examples used in this chapter is from McClave/Benson, STATISTICS FOR BUSINESS AND ECONOMICS 5e, 1992, pp. 870, 891, 909. Reprinted by permission of Prentice Hall, Upper Saddle River, New Jersey.

Topic 49 Completely Randomized Designs (One-Way ANOVA)

To compare the distances traveled by three different brands of golf balls when struck by a driver, we use a completely randomized design. A robotic golfer using a driver is set up to hit a random sample of 24 balls (8 of each brand) in a random sequence. The distance is recorded for each hit, and the results are shown in the table below, organized by brand.

          L1        L2        L3
          Brand A   Brand B   Brand C
Distance  264.3     262.9     241.9
          258.6     259.9     238.6
          266.4     264.7     244.9
          256.5     254.0     236.2
          182.7     191.2     167.3
          181.0     189.0     165.9
          177.6     185.5     162.4
          187.3     192.1     172.5
Mean      221.8     224.9     203.7
StDev     42.58     38.08     39.4
n         8         8         8

1997 TEXAS INSTRUMENTS INCORPORATED, STATISTICS HANDBOOK FOR THE TI-83, 107

Test the null hypothesis H0: uA = uB = uC
1 With the data stored in lists L1, L2, and L3, press <TESTS> F:ANOVA(L1,L2,L3), as shown in screen 1.
2 Press ENTER to display the next two screens (screens 2 and 3).
With the p-value of 0.531, the data show no significant difference between the mean distances traveled by the three brands of balls. We do not reject the null hypothesis.

Bonferroni Multiple Comparison Procedure
Since we do not reject the null hypothesis, there is no significant difference between any of the means. A multiple comparison procedure is not needed or appropriate. Topic 50 gives an example of the multiple comparison procedure that relates back to this topic, so you will
know how to proceed if the null hypothesis above is rejected.

Program A1ANOVA
Program A1ANOVA is available from Texas Instruments over the internet (www.ti.com) or on disk (1-800-TI-CARES) and can be transferred to your TI-83 using TI-GRAPH LINK. The program listing is in Appendix B.
1 Press PRGM, highlight program A1ANOVA, and then press ENTER to paste the name, as shown in screen 4.
2 Press ENTER for the menu on screen 5.
3 Press 1:ONEWAY ANOVA for screen 6, which informs you of two options for input of the data: a matrix or summary statistics. The procedures for using these options follow.

Using Summary Statistics
1 Press ENTER, and the menu (screen 7) presents the options mentioned in screen 6.
2 Select the summary-statistics option (2: xbar, Sx, n) for screen 8.
3 When prompted with HOW MANY LEVELS (screen 8), type 3 and then press ENTER. There are three levels, or brands.
[TI-83 screens 1-9: the one-way ANOVA output (F = .65, p = .531, factor df = 2, error df = 21, MS = 1605.11893) and the program menus; not fully legible in this scan]
Note: Sxp = sqrt(MSE) = sqrt(1605.11893) = 40.0639355. See screen 3.
4 Enter the means, standard deviations, and sample sizes for each level as shown in screens 8 and 9. After you press ENTER the final time, the ANOVA table appears (screen 10). The results are basically the same as before. The differences occur because the means and standard deviations were rounded with the summary statistics. The mean squares (MS) are not given in the table, but are easily calculated with MS = SS/DF, or for Factor 2097.762/2 = 1048.9 and for Error 33708.5/21 = 1605.2.
5 Press ENTER again, and 95 percent confidence intervals (screen 11) for each mean are calculated based on the pooled standard deviation Sp (or Sxp). Note all of these intervals overlap, indicating there is no significant difference between the means.

Using Matrix D
1 Enter the 24 data values in a 24x2 matrix D (see screen 12) with all the distance data in
column 1 and the level (or brand) data in column 2 (eight 1s, eight 2s, and eight 3s). This is explained in the informational screen (screen 6) that appears after you select ONEWAY ANOVA. See Storing Data in a Matrix in Topic 48. You may enter data by column by pressing ENTER after each value, instead of as shown in Topic 48.
2 Select 1:DATA MAT D, as shown in screen 13. The ANOVA table appears (see screen 14). Press ENTER again to get the sample sizes, means, and standard deviations for the three levels (see screen 15). Use the arrow key to view the standard deviations.
3 Press ENTER for the confidence intervals, which are the same as above (see screen 11).
If you wonder how a robot swinging a driver could get such wide variations in distances, several possible explanations exist:
- The wind was shifting and gusting.
- The balls were inconsistently made.
- The robot hit with a wide variability of forces.
Other reasons are given in the next two topics.
[TI-83 screens 12-15: the 24x2 data matrix, the input menu, the ANOVA table (FAC df = 2, ERR df = 21, F = .65, P = .531), and the level summaries (n = 8 each; means 221.8, 224.9125, 203.7125); not fully legible in this scan]

Topic 50 Randomized Block Design (Program A1ANOVA)

Suppose eight golfers are randomly selected and each golfer hits three balls, one of each brand, in a random sequence. The distance is measured and recorded as shown in the table below and in matrix D on the TI-83 (see screen 16).

Golfer (Block)  Brand A  Brand B  Brand C
1               264.3    262.9    241.9
2               258.6    259.9    238.6
3               266.4    264.7    244.9
4               256.5    254.0    236.2
5               182.7    191.2    167.3
6               181.0    189.0    165.9
7               177.6    185.5    162.4
8               187.3    192.1    172.5
Mean            221.8    224.9    203.7
StDev           42.58    38.08    39.4
n               8        8        8

Test the null hypothesis H0: uA = uB = uC
1 Press PRGM, highlight program A1ANOVA, and then press ENTER so the name is pasted, as shown in screen 17.
2 Press ENTER for the menu in screen 18.
3 Press 2:RAN BLOCK DESIGN for screen 19. This screen informs you how to input the data into matrix D. You need a 24x3 matrix with the 24 distances in column 1, the factor levels (brands) in column 2 (eight 1s, eight 2s, and eight 3s), and the block integers (golfer, 1 to 8, three times) in column 3. See the matrix example in screen 16.
4 Press ENTER and then continue for the ANOVA table shown in screen 20. The very large F value of 168.24 and a p-value of 0 to 3 decimal places (0.000) lead us to reject the null hypothesis and conclude that the mean distances are not all the same for the three brands of balls.
5 Press ENTER to see screen 21. Screen 21 shows S = sqrt(MSE) = sqrt(SS/DF) = sqrt(87.2392/14) = 2.49627.

Bonferroni Multiple Comparison Procedure
We will use the Bonferroni Procedure to see which means differ. The table below shows the means ranked in order. Note that the ONEWAY ANOVA option of program A1ANOVA could be run with the current matrix D and used to find the means for each brand of ball. Brand C
A B Mean 2037 2218 2249 Number of Pairwise Comparisons C kk 12 There are three nCr 2 ways of picking pairs from three means or 3 22 3 with nCr under ltPRBgt These are CA CB and AB Note that if there were four means this would be four nCr 2 4 32 6 pairs Comparisonwise Significance Level 2 oc 2C In doing multiple t tests of H0 pl 2 112 with the alternate Ha ul 72 H2 and holding the overall eXperimental significance level to CC 005 you will need to use a comparisonwise significance level 0053 Because you are doing a twotail test you must divide this by 2 for 0056 2 000833 in each tail Critical tvalue With 000833 in the right tail of a tdistribution with 14 degrees of freedom Error degrees of freedom use the MATH equation solver to solve tcdf X E99 14 000833 for x as eXplained on the last page of Topic 34 X is the critical value and equals 2718 as verified with DISTR 5tcdf 2 1 718 1 E99 1 14 for an area of 000833 as shown in screen 22 88 E 888 FIE883 3828 2583 Tquot 238188 8 F398 888 82 48EEF391828 JriiIIquotIiETquot18 E9931 8883283348 22 1997 TEXAS INSTRUMENTS INCORPORATED STATISTICS HANDBOOK FOR THE TI83 111 Activity 10 Analyzing Variance ANOVA cont Comparisons Calculate t 2 Y2 Y1 MSE 1n1 1n2 for each comparison For AC 15 2218 2037 249627 18 18 14502 gt 2718 For BC 15 2249 2037 249627 18 18 16985 gt 2718 For AB 1 2249 2218 249627 18 18 2484 lt 2718 The ENTRY feature is helpful for doing the previous calculations on the home screen as shown in screen 23 Notice that Brand C has a smaller mean distance than either Brand A or Brand B but brands A and B do not have significantly different means We show this with a line over or under A and B but not C CAB or CAB Bonferroni for Completely Randomized Designs Topic 49 In Topic 49 we did not do the multiple comparisons procedure with the example because there was no significant difference between any of the means If we could have rejected the null hypothesis then we would have done the Bonferroni Procedure as it was done above 
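The Topic 50 calculations can also be checked off-calculator. The following Python sketch (all variable names are ours, not the handbook's) computes the randomized-block ANOVA F statistic and the Bonferroni pairwise t statistics from the golfer data; small differences from the handbook's t values come from its use of means rounded to one decimal.

```python
import math

# Distances for 8 golfers (blocks) x 3 ball brands (treatments), Topic 50 data
data = {
    "A": [264.3, 258.6, 266.4, 256.5, 182.7, 181.0, 177.6, 187.3],
    "B": [262.9, 259.9, 264.7, 254.0, 191.2, 189.0, 185.5, 192.1],
    "C": [241.9, 238.6, 244.9, 236.2, 167.3, 165.9, 162.4, 172.5],
}
brands = list(data)
k, b = len(brands), len(data["A"])        # treatments, blocks
n = k * b
grand = sum(sum(v) for v in data.values()) / n

# Sums of squares for the randomized block design
treat_means = {br: sum(v) / b for br, v in data.items()}
sst = b * sum((m - grand) ** 2 for m in treat_means.values())
block_means = [sum(data[br][j] for br in brands) / k for j in range(b)]
ssb = k * sum((m - grand) ** 2 for m in block_means)
ss_total = sum((y - grand) ** 2 for v in data.values() for y in v)
sse = ss_total - sst - ssb                # SSE = 87.2392, as on screen 21

mst = sst / (k - 1)
mse = sse / ((k - 1) * (b - 1))           # error df = (k - 1)(b - 1) = 14
f_treat = mst / mse                       # 168.24, as on screen 20

# Bonferroni pairwise t: t = |ybar2 - ybar1| / sqrt(MSE(1/n1 + 1/n2))
def pair_t(b1, b2):
    return abs(treat_means[b2] - treat_means[b1]) / math.sqrt(mse * (1 / b + 1 / b))
```

Comparing each `pair_t` value against the critical value 2.718 reproduces the conclusion that C differs from A and B while A and B do not differ.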
The means were the same, but s = sqrt(MSE) = 40.0639 and the error degrees of freedom were 21, which leads to a critical t-value of 2.601. The largest difference is between Brand B and Brand C, for the following t statistic:
t = (224.9 - 203.7)/sqrt(40.0639^2 (1/8 + 1/8)) = 1.0583 < 2.601
There is therefore no significant difference. Note that for completely randomized designs the sample sizes do not have to be equal (n1 =/= n2 is allowed).

Although the data are the same, there is a large difference in the MSE in Topics 49 and 50. Because of the different power with which different golfers could hit the ball, we were able to block out much of that variability in Topic 50. (All golfers were unable to hit Brand C as far as the other two brands.) No such blocking was possible in Topic 49, where the variability was due to other, unknown causes. Although the means were the same in the two cases, there was no significant difference in Topic 49. If the null hypothesis were true, it is quite possible that another sample of 24 balls would have Brand C with the greatest mean distance, but not significantly different from the other two brands.

Topic 51: Two-Factor Designs with Equal Replicates (Program A1ANOVA)

Suppose we test three brands of golf balls and two different clubs (driver and five-iron) in a completely randomized design. Each of the six ball-club combinations is randomly and independently assigned to four experimental units, each of which consists of a specific position in the sequence of hits by a golf robot. The distance response is recorded for each of the 24 hits, and the results are shown in the table below and in the matrix D shown in screen 24.

Factor B (Club)   Factor A: Brand A   Brand B   Brand C
Driver            264.3               262.9     241.9
                  258.6               259.9     238.6
                  266.4               264.7     244.9
                  256.5               254.0     236.2
Five-iron         182.7               191.2     167.3
                  181.0               189.0     165.9
                  177.6               185.5     162.4
                  187.3               192.1     172.5

The data are stored in matrix D of order 24x3, with the distance data in column 1, the Factor A level in column 2 (1, 2, or 3 for Brand A, B, or C), and the Factor B level in column 3 (1 for driver or 2 for five-iron). For example, the last value in the table above and in the last row of matrix D is 172.5, which is the distance a Brand C ball (3) was hit with a five-iron (2).
[Screen 24: matrix D, 24x3.]
1. Press PRGM, highlight program A1ANOVA, and then press ENTER so the name is pasted to the home screen, as shown in screen 25.
2. Press ENTER for the menu on screen 26.
3. Press 3:2-WAY FACTORIAL for screen 27, which informs you how to input the data into matrix D, as was done above (EQUAL REPLICATES; DATA IN MAT D; distances in column 1; factor levels as integers starting with 1).
4. Press ENTER and then continue for the ANOVA table at the top of the next page.

We see from the results that there is no significant interaction, with F(AB) = 2.21 and a p-value of .139. This is also clear from the xyLine plots (screen 30) of the mean for each of the six ball-club combinations recorded in the table below.
The plots are obtained as follows:
1. Store 1, 2, 3 in L1, the three driver means in L2, and the three five-iron means in L3.
2. Set up Plot1 for L1 and L2 and Plot2 for L1 and L3 as xyLine plots (see Topic 1).
There is a significant difference between the B factors (p-value = .000). It is clear that the driver drives the ball farther, on average, than the five-iron, just as we would expect. Because there is also a significant difference between the different balls (Factor A), Brand C seems least effective for distance. Multiple comparisons could also be done, similar to those in Topic 50.
[Screens 28-30: the two-way ANOVA table and the xyLine plots of the six ball-club mean distances.]

Factor B (Club)   Brand A (L1 = 1)   Brand B (L1 = 2)   Brand C (L1 = 3)
Driver (L2)       261.45             260.38             240.40
Five-iron (L3)    182.15             189.45             167.03

MATH202 Formula Sheet

Two-sample inference:
z = [(x1bar - x2bar) - D0] / sqrt(sigma1^2/n1 + sigma2^2/n2)
t = [(x1bar - x2bar) - D0] / sqrt(sp^2 (1/n1 + 1/n2)),  sp^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2),  df = n1 + n2 - 2
t = (dbar - D0) / (sd / sqrt(nd)),  df = nd - 1
z = [(p1hat - p2hat) - D0] / sqrt(phat qhat (1/n1 + 1/n2))  (pooled phat)
Unequal-variance df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1) ]
F = s^2(larger) / s^2(smaller)
Confidence intervals: (x1bar - x2bar) +/- z(alpha/2) sqrt(sigma1^2/n1 + sigma2^2/n2);  (x1bar - x2bar) +/- t(alpha/2) sqrt(sp^2 (1/n1 + 1/n2));  dbar +/- t(alpha/2) sd/sqrt(nd);  (p1hat - p2hat) +/- z(alpha/2) sqrt(p1hat q1hat/n1 + p2hat q2hat/n2)
C = k(k - 1)/2  (number of pairwise comparisons)

Chi-square: X^2 = sum (ni - Ei)^2 / Ei, with Ei = n pi and df = k - 1; for a two-way table, df = (r - 1)(c - 1) and Ehat = (row total)(column total)/(table total).

Completely randomized design:
Source      df     SS         MS     F
Treatment   k-1    SST        MST    MST/MSE
Error       n-k    SSE        MSE
Total       n-1    SS(Total)

Randomized block design:
Source      df          SS         MS     F
Treatment   k-1         SST        MST    MST/MSE
Block       b-1         SSB        MSB    MSB/MSE
Error       (k-1)(b-1)  SSE        MSE
Total       n-1         SS(Total)

Two-factor factorial:
Source        df          SS         MS      F
Factor A      a-1         SSA        MSA     MSA/MSE
Factor B      b-1         SSB        MSB     MSB/MSE
Interaction   (a-1)(b-1)  SSAB       MSAB    MSAB/MSE
Error         n-ab        SSE        MSE
Total         n-1         SS(Total)

Regression:
yi = beta0 + beta1 xi + epsilon_i;  yhat = beta0hat + beta1hat x
SSxx = sum xi^2 - (sum xi)^2/n;  SSyy = sum yi^2 - (sum yi)^2/n;  SSxy = sum xi yi - (sum xi)(sum yi)/n
beta1hat = SSxy/SSxx;  beta0hat = ybar - beta1hat xbar
SSE = sum (yi - yihat)^2 = SSyy - beta1hat SSxy;  SSR = sum (yihat - ybar)^2
Source       df       SS      MS
Regression   k        SSR     MSR
Error        n-k-1    SSE     MSE
Total        n-1      SSyy
s^2 = MSE = SSE/(n - k - 1);  s = sqrt(MSE)
t = beta1hat / s(beta1hat),  s(beta1hat) = s/sqrt(SSxx),  df = n - 2;  beta1hat +/- t(alpha/2) s(beta1hat)
r = SSxy / sqrt(SSxx SSyy);  r^2 = (SSyy - SSE)/SSyy
CI for E(y): yhat +/- t(alpha/2) s sqrt(1/n + (xp - xbar)^2/SSxx),  df = n - 2
Prediction interval: yhat +/- t(alpha/2) s sqrt(1 + 1/n + (xp - xbar)^2/SSxx),  df = n - 2
Multiple regression: y = beta0 + beta1 x1 + beta2 x2 + ... + betak xk + epsilon;  yhat = beta0hat + beta1hat x1 + ... + betakhat xk
Adjusted R^2: Ra^2 = 1 - [(n - 1)/(n - k - 1)](1 - R^2)
F = MSR/MSE = (R^2/k) / [(1 - R^2)/(n - k - 1)]
Nested-model F = [(SSE_R - SSE_C)/(number of betas tested)] / MSE_C
VIFj = 1/(1 - Rj^2)

Control charts:
xbar-chart limits: xbarbar +/- 3 (Rbar/d2)/sqrt(n);  R-chart limits: Rbar +/- 3 d3 (Rbar/d2);  p-chart limits: pbar +/- 3 sqrt(pbar(1 - pbar)/n)
Rule 1: One point beyond Zone A (more than 3 SD from center line)
Rule 2: Nine points in a row in Zone C or beyond, on the same side of the center line
Rule 3: Six points in a row, all increasing or all decreasing
Rule 4: Fourteen points in a row, alternating up and down
Rule 5: Two out of three points in a row in Zone A or beyond (more than 2 SD from center line)
Rule 6: Four out of five points in a row in Zone B or beyond (more than 1 SD from center line)
Rule 7: Fifteen points in a row in Zone C (within 1 SD of center line)
Rule 8: Eight points in a row outside Zone C (more than 1 SD from center line)

Exponential smoothing: E1 = Y1;  E2 = wY2 + (1 - w)E1;  ...;  Et = wYt + (1 - w)E(t-1)
Forecasts: F(t+1) = Et;  F(t+2) = Et;  ...;  F(t+k) = Et

Holt's method: E2 = Y2, T2 = Y2 - Y1;  E3 = wY3 + (1 - w)(E2 + T2);  ...;  Et = wYt + (1 - w)(E(t-1) + T(t-1));  T3 = v(E3 - E2) + (1 - v)T2;  ...;  Tt = v(Et - E(t-1)) + (1 - v)T(t-1)
Forecasts: F(t+1) = Et + Tt;  F(t+2) = Et + 2Tt;  ...;  F(t+k) = Et + kTt

MAD = [sum over t of |Yt - Ft|] / m;  MSD = [sum over t of (Yt - Ft)^2] / m;  MAPE = [sum over t of |(Yt - Ft)/Yt| / m] x 100
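The smoothing and forecast-accuracy formulas on this sheet translate directly into code. Below is a minimal Python sketch (function names are ours) of simple exponential smoothing, Holt's two-parameter method, and MAD as defined above.

```python
def exp_smooth(y, w):
    """Simple exponential smoothing: E1 = Y1, Et = w*Yt + (1 - w)*E(t-1).
    Every future forecast F(t+k) is the final smoothed value Et."""
    e = [y[0]]
    for yt in y[1:]:
        e.append(w * yt + (1 - w) * e[-1])
    return e

def holt(y, w, v):
    """Holt's method: E2 = Y2, T2 = Y2 - Y1, then
    Et = w*Yt + (1 - w)*(E(t-1) + T(t-1)),
    Tt = v*(Et - E(t-1)) + (1 - v)*T(t-1).
    The k-step-ahead forecast is F(t+k) = Et + k*Tt."""
    e, t = [y[1]], [y[1] - y[0]]
    for yt in y[2:]:
        e_new = w * yt + (1 - w) * (e[-1] + t[-1])
        t.append(v * (e_new - e[-1]) + (1 - v) * t[-1])
        e.append(e_new)
    return e, t

def mad(actual, forecast):
    """Mean absolute deviation of the forecasts from the actual values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
```

For a perfectly linear series such as 1, 2, 3, 4, Holt's recursions track the trend exactly, so Et follows the series and Tt stays at 1 regardless of w and v.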
sophisticated analytical techniques to build their forecasts Several years ago a major pharmaceutical company based in New Jersey intro duced a new cold medicine called ColdeX For proprietary reasons the actual name of the product is Withheld ColdeX is now sold regularly in drugstores and supermarkets across the United States Prior to launching the product nationally the company hired consultants from the Graduate School of Management at Rutgers University The State University of New Jersey to help the company build a monthly forecast model for ColdeX This Statistics in Action problem involves a portion of the analysis conducted by continued 1 3 1 the consultants 132 CHAPTER 13 Time Series Statistics IN Action Consider the task of forecasting the Statistics N Action Revisited sales of Coldex for the rst 3 months of the third year of the product s existence The company provided data on the monthly sales in thousands of dollars for the first 2 years of the prod continued 0 Forecasting with Exponential Smoothing p 1329 0 Forecasting with Simple uct s life The data saved in the COLDEX file are listed in Linear RegreSSion 1339 133932 Table SIA131 In the Statistics in Action Revisited sections 0 Forecasting with a Seasonal I in this chapter we demonstrate several forecasting methods Regression Model p 1336 used by the consultants Table SIA131 Coldex Monthly Sales Data Year Month Time Sales Year Month Time Sales 1 Jan 1 3394 2 Jan 13 4568 Feb 2 4010 Feb 14 3710 Mar 3 924 Mar 15 1675 Apr 4 205 Apr 16 999 May 5 293 May 17 986 Jun 6 1130 Jun 18 1786 Jul 7 1116 Jul 19 2253 Aug 8 4009 Aug 20 5237 Sep 9 5692 Sep 21 6679 Oct 10 3458 Oct 22 4116 Nov 11 2849 Nov 23 4109 Dec 12 3470 Dec 24 5124 Source Personal communication from Carol Cowley Carla Marchesini and Ginny Wilson Rutgers University Graduate School of Management a Data Set COLDEX In the previous chapter we were concerned with improving processes In this chapter our concern is not with the improvement of the internal 
workings of processes but with describing and predicting the output of processes The process outputs on which we focus are the streams of data generated by processes over time Recall from Chapters 2 and 12 that such data streams are called time series or time series data For example businesses generate time series data such as weekly sales quarterly earnings and yearly pro ts that can be used to describe and evaluate the performance of the business The US economy can be thought of as a system that generates streams of data that include the gross domestic product the Consumer Price Index and the unemployment rate The methods of this chapter focus exclusively on the time series data generated by a process Properly analyzed these data reveal much about the past and future behavior of the process Time series data like other types of data we have discussed in previous chapters are subjected to two kinds of analyses descriptive and inferential Descriptive analyses use graphical and numerical techniques to provide a clear understanding of any patterns that are present in the time series After graphing the data you will often want to use it to make inferences about the future values of the time series ie you will want to forecast future values For example once you understand the past and present trends of the Dow Jones Industrial Average you would probably want to forecast its future trend before making decisions about buying and selling stocks Because signi cant amounts of money may be riding on the accuracy of your forecasts you would be inter ested in measures of their reliability Forecasts and their measures of reliability are exam ples of inferential techniques in time series analysis SECTION 131 Descriptive Analysis Index Numbers 133 131 Descriptive Analysis Index Numbers BIOGRAPHY IRVING FISHER 1867 1947 Index Numbers Expert I New York State if native Irving Fisher the son of a congregational minister gradu ated from Yale 7 and University with a bachelor s degree in 
mathematics in 1888 Fisher continued at Yale with his graduate studies earning the first PhD in economics ever awarded by the university in 1891 He had a long distinguished career as a professor at Yale and became a very successful businessman Fisher made a for tune with his invention of a visible card index system known today as the Rolodex Fisher is consid ered one of the most influential economists of the 19th and 20th centuries he had an uncanny ability to explain and write clearly about the most technical economic theories Fisher also had a reputa tion as a colorful eccentric To illustrate his price theory in his dissertation Fisher constructed a remarkable machine equipped with pumps wheels levers and pipes Fisher s bestknown contribution to the field of statistics was as a pioneer in the construction and use of price indexes A colleague at Yale James Tobin once called Fisher the greatest expert of all time on index numbers I A common technique for characterizing a business or economic time series is to com pute index numbers Index numbers measure how time series values change relative to a preselected time period called the base period An index number measures the change in a variable over time relative to the Value of the variable during a speci c base period Two types of indexes dominate business and economic applications price and quantity indexes Price indexes measure changes in the price of a commodity or group of commodities over time The Consumer Price Index CPI is a price index because it measures price changes of a group of commodities that are intended to reflect typical purchases of American consumers On the other hand an index constructed to measure the change in the total number of automobiles produced annually by American manufacturers would be an example of a quantity index Methods of calculating index numbers range from very simple to extremely com plex depending on the numbers and types of commodities represented by the index Several 
important types of index numbers are described in this section Simple Index Numbers When an index number is based on the price or quantity of a single commodity it is called a simple index number A simple index number is based on the relative changes over time in the price or quantity of a single commodity For example consider the price of silver in dollars per fine ounce between 1975 and 2008 shown in Table 131 To construct a simple index to describe the relative changes in silver prices we must first choose a base period The choice is important because the price for all other periods will be compared with the price during the base period We select 1975 as the base period a time just Table 131 Silver Prices 1975 2008 Year Price oz Year Price oz Year Price oz 1975 443 1987 702 1999 522 1976 435 1988 653 2000 495 1977 463 1989 550 2001 437 1978 542 1990 483 2002 460 1979 1108 1991 406 2003 485 1980 2098 1992 395 2004 665 1981 1049 1993 431 2005 722 1982 792 1994 529 2006 1157 1983 1143 1995 520 2007 1339 1984 814 1996 520 2008 1502 1985 613 1997 490 1986 546 1998 554 Source The Silver Institute 2009 www5ilverinstituteorg a Data Set SILVER 134 CHAPTER 13 Time Series TT A E c preceding the period of rapid economic inflation associated with dramatic oil price 1 IJ39E cF PRICE INDEX increases 1935 1313 lm To calculate the simple index number for a particular year we divide that year s 239 Egg 3 133 price by the price during the base year and multiply the result by 100 Thus for the 1990 5 19m 5742 112335 silver price index number we calculate a 199 1 1 be 215111 1 139 1930 21198 413529 1990 silver price 483 a 19313 11149 23519 19901ndex number f 100 1090 9 SllVeI pI lCC 1D 1933 1 143 2581311 11 1934 314 3335 Similarly the index number for 2008 is 12 1935 613 1333 13 1985 546 12325 H 2008 s11ver r1ce 1502 if 1933 1192 1534 2008 1ndex number p 100 100 3391 11 1933 553 14140 1975 s11ver pr1ce 443 16 1969 550 152415 1 1 The index number for the base period is always 100 In 
our example we have 19 1992 3 95 3916 20 1993 431 9129 1975 silver price a w 1975 index number f 100 100 21 19941 529 11941 1975 Sllver pnce 22 199325 520 1 1138 23 1995 520 1 17138 34 199 490 11U11 Thus the silver price increased by 9 the difference between the 1990 and 1975 25 19138 554 125115 index numbers between 1975 and 1990 and by 239 between 1975 and 2008 The sim 1939 53922 1183 ple index numbers for all silver prices between 1975 and 2008 were computed using 2 21330 495 1 11 TQ 23 2mm 43 9835 Excel and are shown 1n the Excel spreadsheet F1gure 131 The 1ndex 1s also portrayed 29 2012 453 10334 graphically in Figure 132 The steps for calculating simple index numbers are summa 3 3 2003 435 113943 rized in the next box 31 MM 565 1 51211 1 21135 T22 16293 33 20313 115 25131 34 EM 1339 313226 35 2903 1592 33905 Steps for Calculating a Simple Index Number Figure 13391 1 Obtain the prices or quantities for the commodity over the time period of interest Excel workbook with simple index numbers for silver prices 239 select a base PerlOd base 1975 3 Calculate the index number for each period according to the formula Time series value at time t Index number at t1me t T1me ser1es value at base perlod Symbolically I 3 gt100 t Yo where I t is the index number at time t Y is the time series value at time t and Y0 is the time series value at the base period Composite Index Numbers A composite index number represents combinations of the prices or quantities of several commodities For example suppose you want to construct an index for the total number of sales of the two major automobile manufacturers in the United States General Motors and Ford The first step is to collect data on the sales of each manufacturer during the period in which you are interested say 2000 2008 To sum marize the information from both time series in a single index we add the sales of each manufacturer for each year that is we form a new time series consisting of the total number of automobiles sold 
by the two manufacturers Then we construct a simple index for the total of the two series The resulting index is called a Simple Figure 132 Minitab time series graph of simple silver price index SECTION 131 Descriptive Analysis Index Numbers 135 Time Series Hint of IN SUD 40039 Ill INZEK W G I HRH I am i l I I V i i i t F i run Hrquot whys we labs nabs labs 19530 1955 mint 2ng Tear composite index We illustrate the construction of a simple composite index in Example 131 A simple composite index is a simple index for a time series consisting of the total price or total quantity of two or more commodities Example 131 Constructing a Simple Composite Index for HiTech Stocks 9 Problem One of the primary uses of index numbers is to characterize changes in stock prices over time Stock market indexes have been constructed for many different types of companies and industries and several composite indexes have been developed to characterize all stocks These indexes are reported on a daily basis in the news media eg Standard and Poor s 500 Stocks Index and Dow Jones 65 Stocks Index Consider the 2008 monthly closing prices ie closing prices on the last day of each month given in Table 132 for three hightechnology company stocks listed on the New York Stock Exchange To see how this type of stock fared over the year construct a simple composite index using January 2008 as the base period Graph the index and comment on its implications Solution First we calculate the total for the three stock prices each month These totals are shown in the TOTAL column in the Excel workbook displayed in Figure 133 Then the simple composite index is calculated by dividing each monthly total by the Table 132 Monthly Closing Prices of Three HighTechnology Company Stocks Year Month Time IBM Intel Microsoft 2008 Jan 1 10711 2110 3260 Feb 2 11386 1997 2720 Mar 3 11514 2118 2838 Apr 4 12070 2226 2852 May 5 12943 2318 2832 Jun 6 11853 2148 2751 Jul 7 12798 2219 2572 Aug 8 12173 2287 2729 Sep 9 11696 1873 
2669 Oct 10 9297 1603 2233 Nov 11 8160 1380 2022 Dec 12 8416 1466 1944 Source Standard amp Poor s NYSE Daily Stock Price Record 2008 httpmoneycentralmsncom a Data Set HITECH 136 CHAPTER 13 Time Series 2 12 1 2 2E F 12 H 1 111222 11221121 111112 1211111 1111121 11111212122222 quot121121 1111211211 2 2222 1 12139 11 2112 2222 12221 12222 2 2222 122 2 11222 1222 2222 12122 122 12 1 2222 1122 2 112 14 21 12 2222 122 22 122112 2 2222 222 2 12222 22 22 2222 121 22 12221 2 2222 11211quot 122112 2212 2222 12222 11221 39139 2223 JUN 12 11252 211 22 2151 12239 52 1211 12 2 2222 1121 2 12222 22 12 2212 12222 12222 Figure 133 2 2222 211112 2 121 12 2222 2222 12122 12222 E 1 kb k th 12 2222 222 2 11222 12 22 2222 122 22 122 22 IXCC W01quot 09 W1 11 2222 122T 12 22 22 12 22 22 22 121 22 2121 s1mple compos1te 1ndex for 12 2222 111211 11 2122 1222 2222 11222 2122 1111112 Flint 21f 21211111122122 INDEX III 1112 r Hm I K r x 239 IL I39 39 PdF PFquot 1 Ill il39 II III 1 1 i2 I Figure 134 o o o FJ39 M1n1tab t1me ser1es graph 12 I I I I I I I I of composite index for 2211 221 1121 222 1122 2211 2111 222 522 211 11211 hitech stock prices 222 January 2008 total The index values are given in the last column of Figure 133 and a graph of the simple composite index is shown in Figure 134 The plot of the 2008 simple composite index for these hightechnology stocks shows a generally decreasing trend over the year The composite price of these high technology stocks dropped about 27 from January Index 100 to December Index 7354 Look Back The difference between two index numbers gives the percentage change in the value of the time series variable between the two time periods Now Work Exercise 1310c d A simple composite price index has a major drawback The quantity of the com modity that is purchased during each period is not taken into account Only the price totals are used to calculate the index We can remedy this situation by constructing a weighted composite price index A weighted composite 
price index weights the prices by quantities purchased prior to calculating totals for each time period The weighted totals are then used to compute the index in the same way that the unweighted totals are used for simple composite indexes Because the quantities purchased change from time period to time period the choice of which time period s quantities to use as the basis for the weighted composite index is an important one A Laspeyres index uses the base period quantities as weights The rationale is that the prices at each time period should be compared as if the same quantities were purchased each period as were purchased during the base period This method measures price in ation or de ation by xing the purchase quantities at their base period values The method for calculating a Laspeyres index is given in the box on the next page SECTION 131 Descriptive Analysis Index Numbers 137 Steps for Calculating a Laspeyres Index 1 Collect price information for each of the k price series to be used in the composite index Denote these series by P1 P2 Pkt 2 Select a base period Call this time period to 3 Collect purchase quantity information for the base period Denote the k quantities by Qltov QZZ Ov 9 tho39 4 Calculate the weighted totals for each time period according to the formula k QitOPit 5 Calculate the Laspeyres index I t at time tby taking the ratio of the weighted total at time t to the base period weighted total and multiplying by 100 that is k EQitOPit 1 21 x 100 k EQitOPito i1 Example 132 Constructing a Laspeyres Index Problem The 2008 January and December prices for the three hightechnology company stocks are given in Table 133 Suppose that in January 2008 an investor purchased the quantities shown in the table N0te Only two prices are used to simplify the example The same methods can be applied to calculate the index for other months Calculate the Laspeyres index for the investor s portfolio of hightechnology stocks using January 2008 as the base period 
Solution: First we calculate the weighted price totals for each time period, using the January quantities as weights. Thus,
January weighted total = sum of Qi,Jan Pi,Jan = 500(107.11) + 100(21.10) + 1,000(32.60) = 88,265
December weighted total = sum of Qi,Jan Pi,Dec = 500(84.16) + 100(14.66) + 1,000(19.44) = 62,986

Table 13.3: Prices of High-Technology Stocks, with Quantities Purchased
                   IBM      Intel   Microsoft
Shares purchased   500      100     1,000
January price      107.11   21.10   32.60
December price     84.16    14.66   19.44

Then the Laspeyres index is calculated by multiplying the ratio of each weighted total to the base-period weighted total by 100. Thus,
I(Jan) = (88,265/88,265) x 100 = 100
I(Dec) = (62,986/88,265) x 100 = 71.36
Look Back: The implication is that, when weighted by quantities purchased, the total value of these stocks decreased by about 100 - 71 = 29% from January to December in 2008.
Now Work: Exercise 13.14b

The Laspeyres index is appropriate when the base-period quantities are reasonable weights to apply to all time periods. This is the case in applications such as that described in Example 13.2, where the base-period quantities represent actual quantities of stock purchased and held for some period of time. Laspeyres indexes are also appropriate when the base-period quantities remain reasonable approximations of purchase quantities in subsequent periods. However, the index can be misleading when the relative purchase quantities change significantly from those in the base period.
Probably the best-known Laspeyres index is the all-items Consumer Price Index (CPI). This monthly composite index is made up of hundreds of item prices, and the U.S. Bureau of Labor Statistics (BLS) sampled over 30,000 families' purchases in 1982-1984 to determine the base-period quantities. Thus, beginning in 1988, the all-items CPI published each month reflects quantities purchased in 1982-1984 by a sample of families across the United States. However, as prices increase for some commodities more quickly than for others, consumers tend to substitute less expensive commodities where possible. For example, as gasoline prices rapidly inflated in the mid-2000s, consumers began to purchase more fuel-efficient cars. The net effect of using the base-period quantities for the CPI is to overestimate the effect of inflation on consumers, because the quantities are fixed at levels that will actually change in response to price changes.
There are several solutions to the problem of purchase quantities that change relative to those of the base period. One is to change the base period regularly, so that the quantities are regularly updated. A second solution is to compute the index at each time period by using the purchase quantities of that period, rather than those of the base period. A Paasche index is calculated by using price totals weighted by the purchase quantities of the period the index value represents. The steps for calculating a Paasche index are given in the box.

Steps for Calculating a Paasche Index
1. Collect price information for each of the k price series to be used in the composite index. Denote these series by P1t, P2t, ..., Pkt.
2. Select a base period. Call this time period t0.
3. Collect purchase quantity information for every period. Denote the k quantities for period t by Q1t, Q2t, ..., Qkt.
4. Calculate the Paasche index for time t by multiplying the ratio of the weighted total at time t to the weighted total at time t0 (the base period) by 100, where the weights used are the purchase quantities for time period t. Thus,
It = [(sum from i = 1 to k of Qit Pit) / (sum from i = 1 to k of Qit Pit0)] x 100
stocks are shown in Table 134 Calculate and interpret the Paasche index using January 2008 as the base period Table 134 Prices and Volumes of HighTechnology Stocks IBM Intel Microsoft Price Volume Price Volume Price Volume January 10711 247 2110 1462 3260 1950 December 8416 189 1466 1332 1944 1549 Source Standard amp Poor s NYSE Daily Stock Price Record 2009 httpmoneycentralmsncom Solution The key to calculating a Paasche index is to remember that the weights purchase quantities change for each time period Thus 3 ZQi Jan Pi Jan 13 1 x 100 100 z 3 zQi Jan Pi Jan 1 3 Z Qi Dec Pi Dec 1D 2 1 x 100 3 E Qi Dec Pi Jan i1 1898416 13321466 15491944 x 100 18910711 13322110 15493260 65 545 9 x 100 66 3 988464 The implication is that in 2008 December prices represent a 100 663 337 decrease from January prices assuming the purchase quantities were at December levels for both periods Now Work Exercise 1314d The Paasche index is most appropriate when you want to compare current prices to base period prices at current purchase levels However there are several major prob lems associated with the Paasche index First it requires that purchase quantities be known for every time period This rules out a Paasche index for applications such as the CPI because the time and monetary resource expenditures required to collect quantity information are considerable Recall that more than 30000 families were sampled to estimate purchase quantities in 1982 1984 A second problem is that although each period is compared to the base period it is dif cult to compare the index at two other periods because the quantities used are different for each period Consequently the change in the index is affected by changes in both prices and quanti ties This fact makes it dif cult to interpret the change in a Paasche index between peri ods when neither is the base period Although there are other types of indexes that use different weighting factors the Laspeyres and Paasche indexes are the most popular composite 
indexes. Depending on the primary objective in constructing an index, one of them will probably be suitable for most purposes.

13-10 CHAPTER 13 Time Series

Exercises 13.1-13.14

Learning the Mechanics

13.1 Explain in words how to construct a simple index.

13.2 Explain in words how to calculate the following types of indexes:
a. Simple composite index
b. Weighted composite index
c. Laspeyres index
d. Paasche index

13.3 Explain in words the difference between Laspeyres and Paasche indexes.

13.4 The table below gives the prices for three products, A, B, and C, for the four quarters of last year.

Quarter      A       B       C
   1        3.25    1.75    8.00
   2        3.50    1.25    9.35
   3        3.90    1.20    9.70
   4        4.25    1.00   10.50

a. Compute a simple index for the Quarter 4 price of product A, using Quarter 1 as the base period.
b. Compute a simple index for the Quarter 2 price of product B, using Quarter 1 as the base period.
c. Compute a simple composite index for the Quarter 4 price of all three products, using Quarter 1 as the base period.
d. Compute a simple composite index for the Quarter 4 price of all three products, using Quarter 2 as the base period.

13.5 Refer to Exercise 13.4. The next table gives the quantities purchased for the three products, A, B, and C, for the four quarters of last year.

Quarter      A       B       C
   1        100     20      50
   2        200     25      35
   3        250     50      25
   4        300    100      20

a. Compute a Laspeyres index for the Quarter 4 price of all three products, using Quarter 1 as the base period.
b. Compute a Paasche index for the Quarter 4 price of all three products, using Quarter 2 as the base period.

Applying the Concepts - Basic

13.6 Annual median family income. The next table lists the US median annual family income every 5 years during the period 1975-2005. It also contains several values for each of two simple indexes for median family income. The data are saved in the FAMINCOME file.

Year    Income    Base 1975 Index    Base 1980 Index
1975    13,719         --                 65.26
1980    21,023       153.24               --
1985    27,735       202.16               --
1990    35,353       257.69               --
1995    40,611         --                 --
2000    50,732         --                 --
2005    56,194         --                 --

Source: US Census Bureau, Statistical Abstract of the United States, 2008.

a. Calculate the missing values of each simple index.
b. Interpret the index for 1990.

13.7 Annual US beer production. The table below describes US beer production (in millions of barrels) for the period 1980-2007. The data are saved in the USBEER file.

a. Use 1980 as the base period to compute the simple index for this time series. Interpret the value for 2007.
b. Refer to part a. Is this an example of a quantity index or a price index?
c. Recompute the simple index using 1990 as the base period. Plot the two indexes on the same graph. What pattern do you observe?

Year  Beer    Year  Beer    Year  Beer
1980  188     1990  204     2000  199
1981  194     1991  203     2001  199
1982  194     1992  202     2002  200
1983  195     1993  203     2003  195
1984  193     1994  202     2004  198
1985  193     1995  199     2005  197
1986  195     1996  201     2006  198
1987  195     1997  199     2007  199
1988  198     1998  198
1989  200     1999  198

Source: US Beer Institute, 2008 Brewer's Almanac.

13.8 Quarterly single-family housing starts. The quarterly numbers of single-family housing starts (in thousands of dwellings) in the United States from 2004 through 2008 are recorded below and saved in the QTRHOUSE file.

Year    Quarter    Housing Starts
2004       1            345
           2            456
           3            440
           4            370
2005       1            369
           2            485
           3            471
           4            392
2006       1            382
           2            433
           3            372
           4            278
2007       1            260
           2            333
           3            265
           4            188
2008       1            162
           2            194
           3            163
           4            103

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Using Quarter 1, 2004, as a base period, calculate the simple index for this quarterly time series.
b. Interpret the simple index for Quarter 2, 2007.
c. By what percentage did the number of housing starts change between Quarter 1, 2004, and Quarter 4, 2008?
d. By what percentage did the number of housing starts change between Quarter 1, 2006, and Quarter 4, 2008?

13.9 Price of natural gas. The table below lists the price of natural gas (in dollars per 1,000 cubic feet) between 1980 and 2007. The data are saved in the NATGAS file.

Year  Price    Year  Price    Year  Price
1980  3.68     1996  6.34     2003   9.63
1990  5.80     1997  6.94     2004  10.75
1991  5.82     1998  6.82     2005  12.70
1992  5.89     1999  6.69     2006  13.75
1993  6.16     2000  7.76     2007  13.01
1994  6.41     2001  9.63
1995  6.06     2002  7.89

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Using 1980 as the base period, calculate and plot the simple index for the price of natural gas from 1990 through 2007.
b. Use the simple index to interpret the trend in the price of natural gas.
c. Is the index you constructed in part a a price index or a quantity index? Explain.

Applying the Concepts - Intermediate

13.10 Employment in farm and nonfarm categories. Civilian employment is broadly classified by the federal government into two categories: agricultural and nonagricultural. Employment figures (in thousands of workers) for farm and nonfarm categories for selected years from 1980 to 2005 are given in the table below and saved in the CVEMPLOY file.

Year    Farm    Nonfarm      Year    Farm    Nonfarm
1980    3,364    95,938      1995    3,440   121,460
1985    3,179   103,971      2000    2,464   134,427
1990    3,223   115,570      2005    2,197   139,532

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Compute simple indexes for each of the two time series, using 1980 as the base period.
b. Which segment has shown the greater percentage change in employment over the period shown?
c. Compute a simple composite index for total employment for the years 1980-2005. Use 1980 as a base period.
d. Refer to part c. Interpret the composite index value for 2005.

13.11 GDP personal consumption expenditures. The gross domestic product (GDP) is the total national output of goods and services valued at market prices. As such, the GDP is a commonly used barometer of the US economy. One component of the GDP is personal consumption expenditures, which is itself the sum of expenditures for durable goods, nondurable goods, and services. The GDP for these components (in billions of dollars) is shown in the next table in 5-year increments from 1960 to 2005 and saved in the GDP file.

a. Using these three component values, construct a simple composite index for the personal consumption component of GDP. Use 1970 as the base year.
b. Suppose we want to update the index by using 1980 as the base year. Update the index, using only the index values you calculated in part a, without referring to the original data.
c. Graph the personal consumption expenditure index for the years 1960-2005, first using 1970 as the base year and then using 1980 as the base year. What effect does changing the base year have on the graph of this index?

Year    Durables    Nondurables    Services
1960       43.5        153.1         135.9
1965       63.5        191.9         189.2
1970       85.3        270.4         290.8
1975      134.3        416.0         474.5
1980      212.5        682.9         852.7
1985      352.9        919.4       1,395.1
1990      468.2      1,229.2       2,063.8
1995      589.7      1,497.3       2,882.0
2000      863.3      1,947.2       3,928.8
2005    1,020.8      2,514.1       5,159.2

Source: US Census Bureau, Statistical Abstract of the United States, 2009; www.bea.gov

13.12 GDP personal consumption expenditures (cont'd). Refer to Exercise 13.11. Suppose the output quantities in 1970 (measured in billions of units purchased) are as follows: Durable goods: 10.9; Nondurable goods: 140.2; Services: 42.6.

a. Use the outputs to calculate the Laspeyres index from 1960 to 2005 (same increments as in Exercise 13.11), with 1970 as the base period.
b. Plot the simple composite index of Exercise 13.11 and the Laspeyres index of part a on the same graph. Comment on the differences between the two indexes.

13.13 Hourly earnings for nonsupervisory workers. The table below presents the average hourly earnings and the average number of hours worked per week, in 5-year increments from 1975 to 2000, for nonsupervisory workers in three different industries. These data are saved in the NONSUPER file.

         Manufacturing          Transportation and        Wholesale Trade
                                Public Utilities
Year   Hourly     Weekly      Hourly     Weekly        Hourly     Weekly
       Earnings   Hours       Earnings   Hours         Earnings   Hours
1975     4.83      39.5         5.88      39.7           4.72      38.6
1980     7.27      39.7         8.87      39.6           6.95      38.4
1985     9.54      40.5        11.40      39.5           9.15      38.4
1990    10.83      40.8        12.97      38.9          10.79      38.1
1995    12.37      41.6        14.23      39.5          12.43      38.3
2000    14.38      41.7        16.22      39.6          15.20      38.4

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Compute a simple index for the average hourly earnings of manufacturing workers over the period 1975-2000. Use 1975 as the base year.
b. Do the same for transportation and public utilities workers. Plot the two simple indexes on the same graph and interpret the results.
c. Compute simple composite indexes for hourly earnings and for weekly hours over the 25-year period. Use 1975 as the base year.
d. Plot the two composite indexes of part c on the same graph and interpret the results.

13.14 Production and price of lead, steel, and copper. The level of price and production of metals in the United States is one measure of the strength of an industrial economy. The table below lists the 2004 prices (in dollars per ton) and production (in thousands of tons) for three metals important to US industry. These data are saved in the METALS file.

a. Compute simple composite price and quantity indexes for the 12-month period, using January as the base period.
b. Compute the Laspeyres price index for the 12-month period, using January as the base period. Plot the simple composite and Laspeyres indexes on the same graph. Comment on the differences.
c. Compute the Paasche price index for metals for the 12-month period, using January as the base period. Plot the Laspeyres and Paasche indexes on the same graph. Comment on the differences.
d. Compare the Laspeyres and Paasche index values for September and December. Which index is more appropriate for describing the change in this 4-month period? Explain.

            Copper                 Steel                  Lead
Month   Price   Production    Price   Production    Price   Production
Jan     11330      1040       18775      8656        7696      334
Feb     13800       989       21992      8400        8958      328
Mar     13800      1050       25085      9268        9060      335
Apr     13864      1110       22455      8901        7986      351
May     15200      1090       18190      9163        8488      312
Jun     15200      1130       18000      9006        9032      331
Jul     15328      1040       22250      9164        9640      338
Aug     18000      1070       24932      9314        9492      369
Sep     18000      1120       21738      9234        9616      369
Oct     18000      1110       23762      9551        9602      362
Nov     18000      1040       24800      8989        9908      340
Dec     18000      1140       22357      8660        9950      339

Source: The CRB Commodity Yearbook, 2005. New York: John Wiley &
Sons, Inc.

13.2 Descriptive Analysis: Exponential Smoothing

As you have seen in the previous section, index numbers are useful for describing trends and changes in time series. However, time series often have such irregular fluctuations that trends are difficult to describe. Index numbers can be misleading in such cases because the series is changing so rapidly. Methods for removing the rapid fluctuations in a time series, so that the general trend can be seen, are called smoothing techniques.

Exponential smoothing is one type of weighted average that assigns positive weights to past and current values of the time series. A single weight w, called the exponential smoothing constant, is selected so that w is between 0 and 1. Then the exponentially smoothed series, Et, is calculated as follows:

  E1 = Y1
  E2 = wY2 + (1 - w)E1
  E3 = wY3 + (1 - w)E2
  ...
  Et = wYt + (1 - w)Et-1

Thus, the exponentially smoothed value at time t assigns the weight w to the current series value and the weight (1 - w) to the previous smoothed value.

Figure 13.5 [Minitab worksheet with exponentially smoothed (w = .3) silver prices]

Figure 13.6 [Minitab graph of exponentially smoothed (w = .3) silver prices]

Figure 13.7 [Minitab graph of exponentially smoothed (w = .3 and w = .7) silver prices]

For example, consider the silver price time series in Table 13.1 (p. 13-4). Suppose we want to calculate the exponentially smoothed series for the years 1982 through 2008, using a smoothing constant of w = .3.
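The smoothing recursion just defined is simple to implement directly. Below is a minimal sketch in Python; the function name and the plain-list interface are my own choices for illustration, not part of the text (the text's examples use Minitab):

```python
def exponential_smooth(y, w):
    """Exponentially smooth a series y with constant 0 < w < 1.

    Implements E1 = Y1 and Et = w*Yt + (1 - w)*E(t-1).
    """
    if not 0 < w < 1:
        raise ValueError("smoothing constant w must be between 0 and 1")
    smoothed = [y[0]]  # E1 = Y1
    for value in y[1:]:
        # Weight w on the current value, (1 - w) on the previous smoothed value
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed
```

For instance, applied to the first three silver prices (7.92, 11.43, 8.14) with w = .3, the function returns smoothed values of 7.92, 8.97, and 8.72 (rounded to two decimals), matching the hand calculations.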
The calculations proceed as follows:

  E1982 = Y1982 = 7.92
  E1983 = .3Y1983 + .7E1982 = .3(11.43) + .7(7.92) = 8.97
  E1984 = .3Y1984 + .7E1983 = .3(8.14) + .7(8.97) = 8.72

All the exponentially smoothed values corresponding to w = .3 are given in the Minitab worksheet, Figure 13.5. (Note: Minitab gives the value of Et in row t + 1.) The actual silver prices and the exponentially smoothed prices are graphed in Figure 13.6.

Like many averages, the exponentially smoothed series changes less rapidly than the time series itself. The choice of w affects the smoothness of Et: the smaller (closer to 0) the value of w, the smoother Et is. Because small values of w give more weight to the past values of the time series, the smoothed series is not affected by rapid changes in the current values and therefore appears smoother than the original series. Conversely, choosing w near 1 yields an exponentially smoothed series that is much like the original series; that is, large values of w give more weight to the current value of the time series, so the smoothed series looks like the original series. This concept is illustrated in Figure 13.7. The steps for calculating an exponentially smoothed series are given in the following box.

Steps for Calculating an Exponentially Smoothed Series

1. Select an exponential smoothing constant, w, between 0 and 1. Remember that small values of w give less weight to the current value of the series and yield a smoother series. Larger choices of w assign more weight to the current value of the series and yield a more variable series.
2. Calculate the exponentially smoothed series Et from the original time series Yt as follows:

  E1 = Y1
  E2 = wY2 + (1 - w)E1
  E3 = wY3 + (1 - w)E2
  ...
  Et = wYt + (1 - w)Et-1

Example 13.4 Exponential Smoothing for a Firm's Annual Sales Revenue

Problem: Annual sales data, recorded in thousands of dollars, for a firm's first 35 years of operation are provided in Table 13.5. Create an exponentially smoothed series for the sales time series, using w = .7, and plot both series.

Table 13.5 A Firm's Annual Sales Revenues (thousands)

Year    Sales Revenue    Year    Sales Revenue
  1          58           19         1032
  2          40           20          854
  3          55           21          862
  4         156           22          899
  5         251           23          892
  6         203           24          991
  7         314           25         1003
  8         480           26         1117
  9         461           27         1022
 10         359           28         1155
 11         355           29         1192
 12         535           30         1252
 13         484           31         1363
 14         616           32         1468
 15         656           33         1501
 16         714           34         1514
 17         834           35         1509

Data Set: SALES35

Solution: To find the exponentially smoothed series with w = .7, we calculate

  E1 = Y1 = 58
  E2 = .7Y2 + .3E1 = .7(40) + .3(58) = 45.4
  E3 = .7Y3 + .3E2 = .7(55) + .3(45.4) = 52.1
  etc.

We obtained the exponentially smoothed values for all 35 years using Minitab. The values are shown on the Minitab worksheet, Figure 13.8, in the column labeled EXP7. A plot of the original time series and the exponentially smoothed series is shown in the Minitab graph, Figure 13.9. You can see that the smoothed series provides a good picture of the general trend of the original series. Note, too, that the exponentially smoothed series is less sensitive to the short-term deviations of the sales revenues from the trend that occurred in years 10, 11, and 19.

Figure 13.8 [Minitab worksheet with exponentially smoothed (w = .7) sales revenues]

Figure 13.9 [Minitab graph of exponentially smoothed (w = .7) sales revenues]

Look Back: If you desire a less variable exponentially smoothed series, then select a smoothing constant closer to 0 (e.g., w = .5 or w = .2).

Now Work Exercise 13.16

One of the primary uses of exponential smoothing is to forecast future values of a time series. Because only current and past values of the time series are used in exponential smoothing, it is easily adapted to forecasting. We demonstrate this application of exponentially smoothed series in Section 13.4.

Exercises 13.15-13.22

Learning the Mechanics

13.15 Describe the effect of selecting an exponential smoothing constant of w = .2. Of w = .8. Which will produce a smoother trend?

13.16 The monthly time series shown in the table below is saved in the LM1316 file.

a. Calculate the missing values in the exponentially smoothed series, using w = .5.
b. Graph the time series and the exponentially smoothed series on the same graph.

Month    t     Yt     Exponentially Smoothed Series (w = .5)
Jan      1    280        --
Feb      2    281        --
Mar      3    250       265.3
Apr      4    246       255.6
May      5    239        --
Jun      6    218        --
Jul      7    218        --
Aug      8    210        --
Sep      9    205        --
Oct     10    206        --
Nov     11    200        --
Dec     12    200        --

Applying the Concepts - Basic

13.17 Annual US beer production. Refer to the annual US beer production time series, Exercise 13.7 (p. 13-10), saved in the USBEER file.

a. Calculate the exponentially smoothed series for US beer production for the period 1980-2007, using w = .2.
b. Calculate the exponentially smoothed series using w = .8.
c. Plot the two exponentially smoothed series (w = .2 and w = .8) on the same graph. Which smoothed series best portrays the long-term trend?

13.18 Foreign fish production. Overfishing and pollution of US coastal waters have resulted in an increased dependence by the United States on the fishing grounds of other
countries. The next table describes the annual fish catch (in thousands of metric tons) in all fishing areas of Peru and Chile for the years 2000 to 2005. These data are saved in the FISHTONS file.

Year    Chile     Peru
2000    4,692    10,665
2001    4,363     7,990
2002    4,817     8,775
2003    4,176     6,100
2004    5,584     9,627
2005    5,029     9,416

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Compute an exponentially smoothed series for both Chile and Peru, using a smoothing coefficient of w = .5.
b. Plot both actual series and both smoothed series on the same graph. Describe the differences in variation of catches over time between the two countries. For example, do they move up and down together over time?

Applying the Concepts - Intermediate

13.19 Yearly price of gold. The price of gold is used by some financial analysts as a barometer of investors' expectations of inflation, with the price of gold tending to increase as concerns about inflation increase. The table below shows the average annual price of gold (in dollars per ounce) from 1990 through 2008. These data are saved in the GOLDYR file.

Year    Price    Year    Price
1990     384     2000     279
1991     362     2001     271
1992     344     2002     310
1993     360     2003     363
1994     384     2004     410
1995     384     2005     445
1996     388     2006     603
1997     331     2007     695
1998     294     2008     872
1999     279

Source: World Gold Council; www.kitco.com

a. Compute an exponentially smoothed series for the gold price time series for the period from 1990 to 2008, using a smoothing coefficient of w = .8.
b. Plot the original series and the exponentially smoothed series on the same graph. Comment on the trend observed.

13.20 Personal consumption in transportation. There has been phenomenal growth in the transportation sector of the economy since 1990. The personal consumption expenditure figures (in billions of dollars) given in the table below are saved in the TRANSPRT file.

Year    Expenditure on Transportation
1990        593.6
1991        553.2
1992        585.1
1993        611.4
1994        646.3
1995        658.6
1996        690.8
1997        730.7
1998        781.3
1999        832.1
2000        853.5
2001        872.1
2002        891.1
2003        913.0
2004        926.3
2005        905.9
2006        922.1
2007        925.2
2008        915.3

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Compute exponentially smoothed values of this personal consumption time series, using the smoothing constants w = .2 and w = .8.
b. Plot the actual series and the two smoothed series on the same graph. Comment on the trend in personal consumption expenditure on transportation in the 2000s as compared to the 1990s.

13.21 OPEC crude oil imports. The data in the table below, saved in the OPECOIL file, are the amounts of crude oil (in millions of barrels) imported into the United States from the Organization of Petroleum Exporting Countries (OPEC) for the years 1990-2007.

Year     t    Imports, Yt
1990     1      1,283
1991     2      1,233
1992     3      1,247
1993     4      1,339
1994     5      1,307
1995     6      1,219
1996     7      1,258
1997     8      1,378
1998     9      1,522
1999    10      1,543
2000    11      1,659
2001    12      1,770
2002    13      1,490
2003    14      1,671
2004    15      1,948
2005    16      1,738
2006    17      1,745
2007    18      1,969

Source: US Census Bureau, Statistical Abstract of the United States, 2009.

a. Construct two exponentially smoothed series for this time series, using w = .1 and w = .9.
b. Plot the original series and the two smoothed series on the same graph. Which smoothed series looks more like the original series? Why?

13.22 S&P 500 Stock Index. Standard & Poor's 500 Composite Stock Index (S&P 500) is a stock market index. Like the Dow Jones Industrial Average, it is an indicator of stock market activity. The data in the table, saved in the SP500 file, are end-of-quarter values of the S&P 500 for the years 2001-2008.

Year    Quarter    S&P 500     Year    Quarter    S&P 500
2001       1       1160.3      2005       1       1180.6
           2       1224.4                 2       1191.3
           3       1040.9                 3       1228.8
           4       1148.1                 4       1248.3
2002       1       1147.4      2006       1       1294.9
           2        989.8                 2       1270.2
           3        815.3                 3       1335.8
           4        879.8                 4       1418.3
2003       1        848.2      2007       1       1420.9
           2        974.5                 2       1503.3
           3        996.0                 3       1526.7
           4       1111.9                 4       1468.4
2004       1       1126.2      2008       1       1322.7
           2       1140.8                 2       1280.0
           3       1114.6                 3       1164.7
           4       1211.9                 4        903.3

Source: Standard & Poor's Statistical Service: Current Statistics, 2009; www.economagic.com

a. Calculate and plot the exponentially smoothed series for the quarterly S&P 500, using a smoothing constant of w = .3.
b. Repeat part a, but use w = .7.
c. Which exponentially smoothed series do you prefer for describing trends in the series? Explain.

13.3 Time Series Components

In the previous two sections, we showed how to use various descriptive techniques to obtain a picture of the behavior of a time series. Now we want to expand our coverage to include techniques that will let us make statistical inferences about the time series. These inferential techniques are generally focused on the problem of forecasting future values of the time series.

Before forecasts of future values of a time series can be made, some type of model that can be projected into the future must be used to describe the series. Time series models range in complexity from descriptive models, such as the exponential smoothing models discussed in the previous section, to inferential models, such as the combinations of regression and specialized time series models to be discussed later in this chapter. Whether the model is simple or complex, the objective is the same: to produce accurate forecasts of future values of the time series.

Many different algebraic representations of time series models have been proposed. One of the most widely used is an additive model* of the form

  Yt = Tt + Ct + St + Rt

The secular trend, Tt, also known as the long-term trend, is a time series component that describes the long-term movements of Yt. For example, if you want to characterize the secular trend of the production of automobiles since 1930, you would show Tt as an upward-moving time series over the period from 1930 to the present. This does not imply that the automobile production series has always moved upward from month to month and from year to year, but it does mean that the long-term trend has been an increase over that period of time.

The cyclical effect, Ct, generally describes fluctuations of the time series about the secular trend that are attributable to business and economic conditions. For example, refer back to the monthly closing prices of the three high-technology stocks for 2008 (Table 13.2, p. 13-5). Recall that a plot of the simple composite index (Figure 13.4) showed a generally decreasing secular trend. However, during periods of recession the index tends to lie below the secular trend, while in times of general economic expansion it lies above the long-term trend line.

The seasonal effect, St, describes the fluctuations in the time series that recur during specific time periods. For example, quarterly power loads for a Florida utility company tend to be highest in the summer months (Quarter III), with another, smaller peak in the winter months (Quarter I). The spring and fall (Quarters II and IV) seasonal effects are negative, meaning that the series tends to lie below the long-term trend line during those quarters.

The residual effect, Rt, is what remains of Yt after the secular, cyclical, and seasonal components have been removed. Part of the residual effect may be attributable to unpredictable rare events (earthquake, presidential assassination, etc.) and part to the randomness of human actions. In any case, the presence of the residual component makes it impossible to forecast the future values of a time series without error. Thus, the presence of the residual effect emphasizes a point we first made in Chapter 10 in connection with regression models: No business phenomena should be described by deterministic models. All realistic business models, time series or otherwise, should include a residual component.

*Another useful form is the multiplicative model, Yt = Tt Ct St Rt. This can be changed to an additive form by taking natural logarithms (i.e., ln Yt = ln Tt + ln Ct + ln St + ln Rt). See Section 13.8.

Each of the four components contributes to the determination of the value of Yt at each time period. Although it will not always be possible to characterize each component separately, the component model provides a useful theoretical formulation that helps the time series analyst achieve a better understanding of the phenomena affecting the path followed by the time series.

13.4 Forecasting: Exponential Smoothing

In Section 13.2 we discussed exponential smoothing as a method for describing a time series, one that involves removing its irregular fluctuations. In terms of the time series components discussed in the previous section, exponential smoothing tends to de-emphasize (or smooth) most of the residual effects. This, coupled with the fact that exponential smoothing uses only past and current values of the series, makes it a useful tool for forecasting time series.

Recall that the formula for exponential smoothing is

  Et = wYt + (1 - w)Et-1

where w, the exponential smoothing constant, is a number between 0 and 1. We learned that the selection of w controls the smoothness of Et. A choice near 0 places more emphasis (weight) on past values of the time series and therefore yields a smoother series; a choice near 1 gives more weight to current values of the series.

Suppose the objective is to forecast the next value of the time series, Yt+1. The exponentially smoothed forecast for Yt+1 is simply the smoothed value at time t:

  Ft+1 = Et

where Ft+1 is the forecast of Yt+1.

To help interpret this forecast formula, substitute the smoothing formula for Et:

  Ft+1 = Et = wYt + (1 - w)Et-1 = wYt + (1 - w)Ft = Ft + w(Yt - Ft)

Note that we have substituted Ft for Et-1, because the forecast for time t is the smoothed value for time t - 1. The final equation provides insight into the exponential smoothing forecast: the forecast for time t + 1 is equal to the forecast for time t, Ft, plus a correction for the error in the forecast for time t, (Yt - Ft). This is why the exponentially smoothed forecast is called an adaptive forecast: the forecast for time t + 1 is explicitly adapted for the error in the forecast for time t.

Because exponential smoothing consists of averaging past and present values, the smoothed values will tend to lag behind the series when a long-term trend exists. In addition, the averaging tends to smooth any seasonal component. Therefore, exponentially smoothed forecasts are appropriate only when the trend and seasonal components are relatively insignificant. Because the exponential smoothing model assumes that the time series has little or no trend or seasonal component, the forecast Ft+1 is used to forecast not only Yt+1 but also all future values of the series; that is, the forecast for two time periods ahead is Ft+2 = Ft+1, and for three time periods ahead, Ft+3 = Ft+2 = Ft+1. The exponential smoothing forecasting technique is summarized in the next box.

Calculation of Exponentially Smoothed Forecasts

1. Given the observed time series Y1, Y2, ..., Yt, first calculate the exponentially smoothed values E1, E2, ..., Et, using

  E1 = Y1
  E2 = wY2 + (1 - w)E1
  ...
  Et = wYt + (1 - w)Et-1

2. Use the last smoothed value to forecast the next time series value:

  Ft+1 = Et

3. Assuming that Yt is relatively free of trend and seasonal components, use the same forecast for all future values of the series:

  Ft+2 = Ft+1
  Ft+3 = Ft+1
  ...

Two important points must be made about exponentially smoothed forecasts:

1. The choice of w is crucial. If you decide that w will be small (near 0), you will obtain a smooth, slowly changing series of forecasts. On the other hand, the selection of a large value of w (near 1) will yield more rapidly changing forecasts that depend mostly on the current values of the series. In general, several values of w should be tried to determine how sensitive the forecast series is to the choice of w. Forecasting experience will provide the best basis for the choice of w for a particular application.

2. The farther into the future you forecast, the less certain you can be of accuracy. Because the exponentially smoothed forecast is constant for all future values, any changes in trend or seasonality are not taken into account. However, the uncertainty associated with future forecasts applies not only to exponentially smoothed forecasts but also to all methods of forecasting. In general, time series forecasting should be confined to the short
term.

Example 13.5 Forecasting Annual Sales Revenue with Exponential Smoothing

Problem: Refer to Example 13.4 (p. 13-14). The annual sales data, recorded in thousands of dollars, for a firm's first 35 years of operation are reproduced in the Minitab worksheet, Figure 13.10. Apply the exponential smoothing technique to the data for years 1 to 32 in order to forecast sales revenue in years 33, 34, and 35. Make forecasts using both w = .3 and w = .7, and compare the results.

Solution: First, we used Minitab to calculate the exponentially smoothed series for years 1-32, using both w = .3 and w = .7. These smoothed values are shown in Figure 13.10 in the columns labeled EXP3 and EXP7, respectively. Now, the forecast for year 33 is simply the smoothed sales revenue value in the last year of the smoothed series (year 32). Consequently, we have

  F33 = E32 = 1284.5 for w = .3 (values highlighted on Figure 13.10)
            = 1424.3 for w = .7

With exponential smoothing, the same forecasts are made for any future year. Thus, we have

  F34 = E32 = 1284.5 for w = .3, and 1424.3 for w = .7
  F35 = E32 = 1284.5 for w = .3, and 1424.3 for w = .7

Figure 13.10 [Minitab worksheet with actual and exponentially smoothed (w = .3 and w = .7) sales revenues, years 1-32]

Table 13.6 Sales Revenues, Years 33-35: Actual versus Forecast Values

Year    Actual    Forecast (w = .3)    Forecast Error    Forecast (w = .7)    Forecast Error
 33      1501         1284.5               216.5              1424.3               76.7
 34      1514         1284.5               229.5              1424.3               89.7
 35      1509         1284.5               224.5              1424.3               84.7

Both sets of forecasts are shown in Table 13.6. Also shown are the actual sales revenue values and the corresponding forecast errors for years 33-35. The forecast error is defined as the actual value minus the forecast value; i.e.,

  Forecast error = Actual - Forecast

You can see that the forecast errors using w = .7 are considerably smaller than the forecast errors for w = .3. Consequently, future forecasts of the sales revenue time series will likely have a smaller forecast error when a smoothing constant of .7 is employed.

Look Back: Note that both the w = .3 and w = .7 forecasts underestimate the sales revenues for years 33-35. This is because exponentially smoothed forecasts implicitly assume no trend exists in the time series. This example illustrates the risk associated with anything other than very short-term forecasting.

Now Work Exercise 13.25a

Many time series have long-term, or secular, trends. For such series, the exponentially smoothed forecast is inappropriate for all but the very short term. In the next section we present an extension of the exponentially smoothed forecast, Holt's method, that allows for secular trend in the forecasts.

13.5 Forecasting Trends: Holt's Method

The exponentially smoothed forecasts for the sales revenue values in the previous section have large forecast errors, in part because they do not recognize the trend in the time series. In this section we present an extension of the exponential smoothing method of forecasting that explicitly recognizes the trend in a time series.

The Holt forecasting model consists of both an exponentially smoothed component (Et) and a trend component (Tt).* The trend component is used in the calculation of the exponentially smoothed value. The following equations show that both Et and Tt are weighted averages:
E_t = wY_t + (1 - w)(E_{t-1} + T_{t-1})
T_t = v(E_t - E_{t-1}) + (1 - v)T_{t-1}

Note that the equations require two smoothing constants, w and v, each of which is between 0 and 1. As before, w controls the smoothness of E_t; a choice near 0 places more emphasis on past values of the time series, while a value of w near 1 gives more weight to current values of the series and deemphasizes the past. The trend component of the series is estimated adaptively, using a weighted average of the most recent change in the level, represented by (E_t - E_{t-1}), and the trend estimate, represented by T_{t-1}, from the previous period. A choice of the weight v near 0 places more emphasis on the past estimates of trend, while a choice of v near 1 gives more weight to the current change in level. The calculation of the components for Holt's model, which proceeds much like the exponential smoothing calculations, is summarized in the box.

Steps for Calculating Components of the Holt Forecasting Model*

1. Select an exponential smoothing constant w between 0 and 1. Small values of w give less weight to the current values of the time series and more weight to the past. Larger choices assign more weight to the current value of the series.

2. Select a trend smoothing constant v between 0 and 1. Small values of v give less weight to the current changes in the level of the series and more weight to the past trend. Larger values assign more weight to the most recent trend of the series and less to past trends.

3. Calculate the two components, E_t and T_t, from the time series Y_t, beginning at time t = 2, as follows:**

   E_2 = Y_2                                  T_2 = Y_2 - Y_1
   E_3 = wY_3 + (1 - w)(E_2 + T_2)            T_3 = v(E_3 - E_2) + (1 - v)T_2
   ...
   E_t = wY_t + (1 - w)(E_{t-1} + T_{t-1})    T_t = v(E_t - E_{t-1}) + (1 - v)T_{t-1}

   Note: E_1 and T_1 are not defined.

*In some statistical software packages (e.g., Minitab), Holt's method is called double exponential smoothing.

**The calculation begins at time t = 2, rather than at t = 1, because the first two observations are needed to obtain the first estimate of trend, T_2. As an option, some statistical software packages use simple linear regression to
estimate E_1 and T_1 for the model E(Y_t) = β0 + β1t: E_1 = β̂0 and T_1 = β̂1.

Example 13.6 Applying Holt's Method to Annual Sales Data

Problem Refer to the yearly sales data, Examples 13.4 and 13.5 (pp. 13-14 and 13-19). Using w = .7 and v = .5, apply Holt's smoothing method to the data for years 1 through 30. Give the values of the smoothed and trend components for each year; then plot the data and the smoothing component, E_t, on the same graph.

Solution We used Minitab to perform Holt's calculations and generate the values of E_t and T_t for the annual series. The E_t and T_t values for each year are listed under the SMOOTH and TREND columns, respectively, on the Minitab worksheet, Figure 13.11. A graph of Y_t and E_t is shown in Figure 13.12. Note that the trend component, T_t, measures the general upward trend in the sales series.

Figure 13.11: Minitab worksheet with actual and Holt smoothed (w = .7 and v = .5) sales revenues, years 1-30

Figure 13.12: Minitab graph of Holt's smoothed (w = .7 and v = .5) sales data

*Minitab uses simple linear regression to calculate the initial smoothed and trend values. See the footnote on page 13-21.
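The recursions used in Example 13.6 can be traced in a few lines of code. Below is a minimal Python sketch of the component calculations from the box on page 13-21 (the four-point series is invented for illustration; it is not the sales data, and the function name is ours, not Minitab's):

```python
def holt_components(y, w, v):
    """Holt's smoothed (E_t) and trend (T_t) components for t = 2, ..., n.

    Initialization (from the box): E_2 = Y_2 and T_2 = Y_2 - Y_1.
    Then, for t >= 3:
        E_t = w*Y_t + (1 - w)*(E_{t-1} + T_{t-1})
        T_t = v*(E_t - E_{t-1}) + (1 - v)*T_{t-1}
    """
    e = [y[1]]            # E_2
    t = [y[1] - y[0]]     # T_2
    for obs in y[2:]:
        e_new = w * obs + (1 - w) * (e[-1] + t[-1])
        t.append(v * (e_new - e[-1]) + (1 - v) * t[-1])
        e.append(e_new)
    return e, t

# Invented illustration series
y = [10.0, 12.0, 15.0, 19.0]
E, T = holt_components(y, w=0.7, v=0.5)
# E_2 = 12.0 and T_2 = 2.0 by construction; each later component blends
# the current observation with the projected level E_{t-1} + T_{t-1}.
```

Because E_1 and T_1 are not defined, the returned lists start at t = 2, mirroring the box.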
Look Back The choice of v = .5 gives equal weight to the most recent trend and to past trends in the sales of the firm. The result is that the exponential smoothing component, E_t, provides a smooth, upward-trending description of the firm's sales.

Our objective is to use Holt's exponentially smoothed series to forecast the future values of the time series. For the one-step-ahead forecast, this is accomplished by adding the most recent exponentially smoothed component to the most recent trend component; that is, the forecast at time (t + 1), given observed values up to time t, is

F_{t+1} = E_t + T_t

The idea is that we are constructing the forecast by combining the most recent smoothed estimate, E_t, with the estimate of the expected increase (or decrease) attributable to trend, T_t. The forecast for two steps ahead is similar, except that we add the estimated trend for two periods:

F_{t+2} = E_t + 2T_t

Similarly, for the k-step-ahead forecast, we add the estimated increase (or decrease) in trend over k periods:

F_{t+k} = E_t + kT_t

Holt's forecasting methodology is summarized in the box.

Holt's Forecasting Methodology

1. Calculate the exponentially smoothed and trend components, E_t and T_t, for each observed value of Y_t (t ≥ 2), using the formulas given in the previous box.

2. Calculate the one-step-ahead forecast using F_{t+1} = E_t + T_t.

3. Calculate the k-step-ahead forecast using F_{t+k} = E_t + kT_t.

Example 13.7 Forecasting Annual Sales with Holt's Method

Problem Refer to Example 13.6 and Figure 13.11, which lists the sales revenue for years 1-30, along with Holt's smoothing components using w = .7 and v = .5. Apply Holt's forecasting methodology to forecast the firm's annual sales in years 31-35. Compute the forecast errors for each of these five years.

Solution From Figure 13.11, the smoothed and trend values (highlighted) for the last year, year 30, are E_30 = 124.73 and T_30 = 5.46. Consequently, the forecast for year 31 (the 1-year-ahead forecast) is

F_31 = E_30 + T_30 = 124.73 + 5.46 = 130.19

The forecast 2 years ahead is

F_32 = E_30 + 2T_30 = 124.73 + 2(5.46) = 135.65

For years 33-35, we find

F_33 = 124.73 + 3(5.46) = 141.11
F_34 = 124.73 + 4(5.46) = 146.57
F_35 = 124.73 + 5(5.46) = 152.03

Table 13.7 Sales
Revenues, Years 31-35: Actual versus Forecast Values

Year   Actual   Forecast (w = .7, v = .5)   Forecast Error
31     136.3    130.19                      6.11
32     146.8    135.65                      11.15
33     150.1    141.11                      8.99
34     151.4    146.57                      4.83
35     150.9    152.03                      -1.13

The forecasts and forecast errors for these five years are shown in Table 13.7. Note that, unlike the constant exponential smoothing forecasts, Holt's forecast values increase from year 31 to year 35. This upward trend in the forecast is a result of Holt's estimated trend component. Note also that the forecast errors fluctuate in magnitude and in sign. This is due to the fact that the actual time series value (sales revenue) does not necessarily increase at the same rate as Holt's forecasted values.

Look Back The selection of w = .7 and v = .5 as the smoothing and trend weights for the sales forecasts was based on the objectives of assigning more weight to recent series values in the exponentially smoothed component and of assigning equal weights to the recent and past trend estimates. Most forecasters will try several different combinations of weights when using Holt's forecasting model in order to assess the sensitivity of the forecasts to the choice of weights. The values of w and v that lead to the smallest forecast errors are typically used to make future forecasts.

Now Work Exercise 13.25b

In the following section, we demonstrate how to use the forecast errors to develop a single measure of forecast accuracy. These measures can then be used to compare and contrast different forecasting methodologies.

Exercises 13.23-13.31

Learning the Mechanics

13.23 How does the choice of the smoothing constant w impact an exponentially smoothed forecast?

13.24 Refer to Exercise 13.4 (p. 13-10). The table with the prices for product A for the four quarters of last year is reproduced below. Holt's smoothing method with w = .2 and v = .6 was applied to the data.

Quarter          1      2      3      4
Price            3.25   3.50   3.90   4.25
Smoothed Value   --     3.50   3.78   ___
Trend            --     .25    ___    ___

a. Find the missing trend value for Quarter 3.
b. Find the missing smoothed value for Quarter 4.
c. Give the Holt's forecast for the price in Quarter 5.

Applying the Concepts: Basic

13.25 Annual U.S. beer production. Refer to Exercise 13.7 (p. 13-10) and the data on U.S. beer production (in millions of barrels) for the years 1980-2007. The data are saved in the USBEER file.
a. Use the 1980-2004 values to forecast the 2005-2007 production using simple exponential smoothing with w = .3. Repeat with w = .7.
b. Use Holt's model with w = .7 and v = .3 to forecast the 2005-2007 production. Repeat with w = .3 and v = .7.

13.26 Quarterly single-family housing starts. Refer to the quarterly housing start series, Exercise 13.8 (p. 13-10), and the data saved in the QTRHOUSE file. Suppose you want to forecast the number of new housing starts in 2009 using data for 2007 and 2008.
a. Calculate the exponentially smoothed values for 2007 and 2008 using w = .6.
b. Plot the housing-starts series and the exponentially smoothed series on the same graph.
c. Use the exponentially smoothed data from 2007-2008 to forecast the quarterly number of housing starts in 2009.

13.27 Consumer Price Index. The CPI measures the increase (or decrease) in the prices of goods and services relative to a base year. The CPI for the years 1990-2008, using 1984 as the base period, is shown in the next table and saved in the CPI file.

Year   CPI     Year   CPI
1990   125.8   2000   171.5
1991   129.1   2001   177.1
1992   132.8   2002   179.9
1993   136.8   2003   184.0
1994   147.8   2004   188.9
1995   152.4   2005   195.3
1996   156.9   2006   201.6
1997   160.5   2007   207.3
1998   163.0   2008   215.3
1999   166.6

Source: Survey of Current Business, U.S. Department of Commerce, Bureau of Economic Analysis.

a. Graph the time series. Do you detect a long-term trend?
b. Calculate and plot the exponentially smoothed series for the CPI using a smoothing constant of w = .4. Use the exponentially smoothed values to forecast the CPI in 2009.
c. Use Holt's forecasting model with trend to forecast the CPI in 2009. Use smoothing constants w = .4 and v = .5.

13.28 OPEC crude oil imports. Refer to the annual OPEC oil import data, Exercise 13.21 (p. 13-16), saved in the OPECOIL file.
a. Use the exponentially smoothed (w = .9) series you constructed in Exercise 13.21a to forecast OPEC oil imports in 2007.
b. Forecast OPEC oil imports in 2007 using the Holt-Winters forecasting model with smoothing constants w = .3 and v = .8.
c. Calculate the errors of the forecasts, parts a and b. Which method yields the smallest forecast error?

Applying the Concepts: Intermediate

13.29 S&P 500 Stock Index. Refer to the quarterly Standard & Poor's 500 stock market index, Exercise 13.22 (p. 13-16), saved in the SP500 file.
a. Use exponential smoothing with w = .7 to smooth the series from 2001 through 2007. Then forecast the quarterly values in 2008 using only the information through the fourth quarter of 2007.
b. Repeat part a using w = .3.

13.30 S&P 500 Stock Index (cont'd). Refer to Exercise 13.29. Suppose you want to use only the 2005-2007 S&P values to forecast the quarterly 2008 values. Calculate the forecasts using Holt's model with w = .3 and v = .5. Repeat with w = .7 and v = .5.

13.31 Monthly gold prices. The fluctuation of gold prices is a reflection of the strength or weakness of the U.S. dollar. The table below, saved in the GOLDMON file, shows monthly gold prices from January 2001 to December 2008.
a. Use exponential smoothing with w = .5 to calculate monthly smoothed values from 2001 to 2007. Then forecast the monthly gold prices for 2008.
b. Calculate 12 one-step-ahead forecasts for 2008 by updating the exponentially smoothed values with each month's actual value and then forecasting the next month's value.
c. Repeat parts a and b using Holt's method with w = .5 and v = .5.

Month   2001    2002    2003    2004    2005    2006    2007    2008
Jan     265.5   281.7   356.9   414.0   424.2   549.9   631.2   889.6
Feb     261.9   295.5   359.0   405.3   423.4   555.0   664.7   922.3
Mar     263.0   294.0   340.6   406.7   434.2   557.1   654.9   968.4
Apr     260.5   302.7   328.2   403.0   428.9   610.6   679.4   909.7
May     272.4   314.5   355.7   383.4   421.9   676.5   666.9   888.7
Jun     270.2   321.2   356.5   392.0   430.7   596.2   655.5   889.5
Jul     267.5   313.3   351.0   398.1   424.5   633.8   665.3   939.8
Aug     272.4   310.3   359.8   400.5   437.9   632.6   665.4   839.0
Sep     283.4   319.2   378.9   405.3
456.0   598.2   712.7   829.9
Oct     283.1   316.6   378.9   420.5   469.9   585.8   754.6   806.6
Nov     276.2   319.2   389.9   439.4   476.7   627.8   806.3   760.9
Dec     275.9   333.4   407.6   441.7   509.8   629.8   803.2   816.1

Sources: Standard & Poor's Statistics, 2009; www.kitco.com, Current Statistics, 2009.

13.6 Measuring Forecast Accuracy: MAD and RMSE

As demonstrated in Example 13.5, forecast error (i.e., the difference between the actual time series value and its forecast) can be used to evaluate the accuracy of the forecast. Knowledge of a forecast's accuracy aids in the selection of both the forecasting methodology to be utilized and the parameters of the forecast formula (e.g., the weights in the exponentially smoothed or Holt forecasts). Three popular measures of forecast accuracy, all based on forecast errors, are the mean absolute deviation (MAD), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE) of the forecasts. Their formulas are given in the box.

Measures of Forecast Accuracy for m Forecasts

Assume time series data for t = 1, 2, ..., n are used to make forecasts for the periods t = n + 1, n + 2, ..., n + m.

1. Mean absolute deviation (MAD):

   MAD = [ sum from t = n+1 to n+m of |Y_t - F_t| ] / m

2. Mean absolute percentage error (MAPE):

   MAPE = { [ sum from t = n+1 to n+m of |(Y_t - F_t)/Y_t| ] / m } x 100

3. Root mean squared error (RMSE):

   RMSE = sqrt{ [ sum from t = n+1 to n+m of (Y_t - F_t)^2 ] / m }

Note that all three measures require one or more actual values of the time series against which to compare the forecasts. Thus, we can either wait several time periods until the observed values are available, or we can hold out several of the values at the end of the time series, not using them to model the time series but saving them for evaluating the forecasts obtained from the model.

Example 13.8 Comparing Measures of Forecast Accuracy

Problem Refer to the annual sales data of Examples 13.6 and 13.7 (pp. 13-22 and 13-23). We want to use the data for all 35 years to forecast annual sales for years 36-40. Consider three alternative forecasting models: exponential smoothing with w = .3, exponential smoothing with w = .7, and Holt's method with w = .7 and v = .5. Minitab was used to obtain the
forecasts for these alternative models. The Minitab printouts shown in Figures 13.13a-c give the forecasts (highlighted) for all three models. Suppose the actual sales values (in thousands of dollars) for years 36-40 are 150.2, 161.7, 159.3, 168.5, and 170.4, respectively. Find measures of forecast accuracy (MAD, MAPE, and RMSE) for each of the three forecasting models, and use this information to evaluate the models.

Figure 13.13a: Minitab forecasts of annual sales, exponential smoothing (w = .3)

Figure 13.13b: Minitab forecasts of annual sales, exponential smoothing (w = .7)

Solution For ease of notation, we will number the forecasting models as follows:

Model 1: Exponential smoothing (w = .3)
Model 2: Exponential smoothing (w = .7)
Model 3: Holt's method (w = .7, v = .5)

The forecasts for the three models, as well as the forecast errors, are listed in the Excel workbook, Figure 13.14. These forecast errors are used to find the MAD, MAPE, and RMSE measures of forecast accuracy for each of the three models. For example, for Model 1:

MAD_1 = (7.01 + 18.51 + 16.11 + 25.31 + 27.21)/5 = 18.83

MAPE_1 = [(7.01/150.2 + 18.51/161.7 + 16.11/159.3 + 25.31/168.5 + 27.21/170.4)/5] x 100 = 11.44

RMSE_1 = sqrt[(7.01^2 + 18.51^2 + 16.11^2 + 25.31^2 + 27.21^2)/5] = 20.16

The formulas for MAD, MAPE, and RMSE were programmed into Excel; the results are shown (highlighted) in Figure 13.14. For Model 2, MAD_2 = 11.50, MAPE_2 = 6.92, and RMSE_2 = 13.39. For Model 3, MAD_3 = 4.44, MAPE_3 = 2.73, and RMSE_3 = 4.89.
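The three formulas translate directly into code. The Python sketch below (ours, not the textbook's Excel workbook) reproduces the Model 1 calculations; the constant forecast of 143.19 is the value implied by the Model 1 errors listed above:

```python
from math import sqrt

def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error of the forecasts."""
    return sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))

# Model 1 (exponential smoothing, w = .3) produces the same forecast for
# every future year; 143.19 is the constant forecast implied by the errors.
actual = [150.2, 161.7, 159.3, 168.5, 170.4]
fcst = [143.19] * 5

print(round(mad(actual, fcst), 2))   # 18.83
print(round(mape(actual, fcst), 2))  # 11.44
print(round(rmse(actual, fcst), 2))  # 20.16
```

The same three functions, applied to the Model 2 and Model 3 forecasts, would yield the values reported for those models.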
Figure 13.13c: Minitab forecasts of annual sales, Holt's method (w = .7 and v = .5)

Figure 13.14: Excel workbook with measures of annual sales forecast accuracy

You can see that Model 3 has the smallest MAD, as well as the smallest MAPE and the smallest RMSE. Of the three forecasting models, then, Model 3 (Holt's method with w = .7 and v = .5) yields the most accurate predictions of future annual sales.

Look Back We expect Holt's method to yield more accurate forecasts for annual sales because it explicitly accounts for trends in the sales data. The exponential smoothing forecasts do not account for any increasing or decreasing trends in the data; hence, they are the same for all five forecasted years. The accuracy of all three forecasting methods, however, will decrease the further we forecast into the future.

Now Work Exercise 13.32

Ethics in Statistics Intentionally selecting time periods with small forecast errors, and omitting time periods with large forecast errors, in the calculations of MAPE, MAD, and RMSE for the purpose of evaluating forecast accuracy is considered unethical statistical practice.

Note: Most statistical software packages will automatically compute the values of MAPE, MAD, and RMSE (equivalent to the square root of the mean squared deviation, MSD) for all n observations in the data set. For example, these statistics are shown in the middle of the Minitab printouts, Figures 13.13a-c.

Criteria such as MAPE, MAD, and RMSE for assessing forecast accuracy require special care in interpretation. The number of time periods included in the evaluation is critical to the decision about which forecasting
model is preferred. The choice depends on how many time periods ahead the analyst plans to forecast. With N time periods in your data, a good rule of thumb is to forecast ahead no more than N/2 time periods. Remember, however, that long-term forecasts are generally less accurate than short-term forecasts.

We conclude this section with a comment. A major disadvantage of forecasting with smoothing techniques (exponential smoothing or Holt's model) is that no measure of the forecast error (or reliability) is known prior to observing the future value. Although forecast errors can be calculated after the future values of the time series have been observed (as in Example 13.8), we prefer to have some measure of the accuracy of the forecast before the actual values are observed. One option is to compute forecasts and forecast errors for all n observations in the data set and use these past forecast errors to estimate the standard deviation of all forecast errors (i.e., the standard error of the forecast). A rough estimate of this standard error is the value of RMSE, and an approximate 95% prediction interval for any future forecast is

F ± 2(RMSE)

An interval like this is shown at the bottom of the Minitab printouts, Figures 13.13a-c. However, because the theoretical distributional properties of the forecast errors with smoothing methods (exponential smoothing or Holt's method) are unknown, many analysts regard smoothing methods as descriptive procedures rather than as inferential ones.

In the preceding chapters, we learned that predictions with inferential regression models are accompanied by well-known measures of reliability. The standard errors of the predicted values allow us to construct 95% prediction intervals. We discuss an inferential time series forecasting model in the next section.

Statistics IN Action Revisited: Forecasting with Exponential Smoothing

Recall that a pharmaceutical company hired consultants at Rutgers University to forecast monthly sales of a new brand of cold medicine
called Coldex. The company provided monthly data on Coldex sales for the first 2 years of the product's life and desires forecasts of sales for the first 3 months of the third year. The data are saved in the COLDEX file.

One forecasting model considered by the consultants was an exponential smoothing model with a smoothing constant of w = .7. Minitab was used to find the smoothed values of the monthly series. The Minitab plot of both the actual monthly sales and smoothed sales values is shown in Figure SIA13.1, followed by the exponentially smoothed forecasts in Figure SIA13.2. The exponentially smoothed sales forecast for each of the first 3 months of year 3 is the smoothed value for the last month of the series (month 24). This value, highlighted in Figure SIA13.2, is 48.70 thousand dollars. Minitab also gives approximate 95% confidence bounds around the forecast. The interval, highlighted on the printout, is (17.50, 79.89). Thus, we are approximately 95% confident that the actual sales for the month will be between 17.50 and 79.89 thousand dollars. This wide interval was deemed unusable by the pharmaceutical company; consequently, the consultants searched for a better forecasting model. One of these models is presented in the next Statistics in Action Revisited (p. 13-32).

Data Set: COLDEX

Figure SIA13.1: Minitab plot of monthly Coldex sales with exponentially smoothed values (w = .7)

Figure SIA13.2: Minitab forecasts of monthly Coldex sales using exponential smoothing (w = .7)

Exercises 13.32-13.37

Applying the Concepts: Basic

13.32 Annual U.S. beer production. Refer to the beer production forecasts, Exercise 13.25 (p. 13-24). In part a, you obtained forecasts of 2005-2007 beer production using exponential smoothing with w = .3 and w = .7. Recall that the data are saved in the USBEER file.
a. Calculate the forecast errors for the w = .3 exponentially smoothed forecasts.
b. Calculate the forecast errors for the w = .7 exponentially smoothed forecasts.
c. Calculate MAD, MAPE, and RMSE for the exponential smoothing forecasts using w = .3.
d. Calculate MAD, MAPE, and RMSE for the exponential smoothing forecasts using w = .7.
e. Refer to parts c and d. Which forecast method do you recommend?

13.33 Annual U.S. beer production (cont'd). Refer to the beer production forecasts, Exercise 13.25 (p. 13-24). In part b, you obtained forecasts of 2005-2007 beer production using Holt's method with w = .3, v = .7 and with w = .7, v = .3.
a. Calculate the forecast errors for the (w = .3, v = .7) Holt forecasts.
b. Calculate the forecast errors for the (w = .7, v = .3) Holt forecasts.
c. Calculate MAD, MAPE, and RMSE for the (w = .3, v = .7) Holt forecasts.
d. Calculate MAD, MAPE, and RMSE for the (w = .7, v = .3) Holt forecasts.
criteria to evaluate the two models accuracy for forecasting the monthly 2008 values using the 2001 2007 data b Use the MAD MAPE and RMSE criteria to evaluate the two models accuracy when making the 12 one stepahead forcasts updating the models with each month s actual value before forecasting the next month s value 1337 US school enrollments The next table saved in the 0 SCHOOLENROLL le reports annual US school enroll ment in thousands for the period 1990 2008 Year Enrollment Year Enrollment 1990 60267 2000 68146 1991 61605 2001 69936 1992 62686 2002 71215 1993 63241 2003 71442 1994 63986 2004 71688 1995 64764 2005 72075 1996 65743 2006 73318 1997 66470 2007 73685 1998 66983 2008 74079 1999 67667 Source US Census Bureau StatisticalAbstract 0f the United States 2009 a Use the 1990 to 2005 enrollments and simple exponential smoothing to forecast the 2006 2008 school enrollments Use w 8 b Use Holt s method with w 8 and v 7 to forecast the 2006 2008 enrollments c Apply the MAD MAPE and RMSE criteria to evaluate the two forecasting models of parts a and b Which model is better Why 137 Forecasting Trends Simple Linear Regression Perhaps the simplest inferential forecasting model is one with which you are familiar the simple linear regression model A straightline model is used to relate the time series Yr to time t and the least squares line is used to forecast future values of K Suppose a rm is interested in forecasting its sales revenues for each of the next 5 years To make such forecasts and assess their reliability a time series model must be constructed Refer again to the yearly sales data for a rm s 35 years of operation given in Table 135 p 1314 A Minitab plot of the data Figure 1315 reveals a linearly increasing trend so the model EYt 30 Blt Eu uf SALES vs T 1531 9 1413 a 12 quot 39 1111 t Ir i 1 m V quot II Ei at quot 39 395 W 7 Etl V 1 a 9 fl 40 a i 23913 a If e Figure 1315 a 3 YE I r r I iquot k I 1 M1n1tab scatterplot of annual Cl 5 10 an 25 an 35 40 T sales 
with least squares line

Figure 13.16: SPSS least squares regression of annual sales

Figure 13.17: SPSS spreadsheet with 95% prediction intervals for annual sales

seems plausible for the secular trend. We fit the model to the data using SPSS; the resulting printout is shown in Figure 13.16. The least squares model is highlighted on the printout; the fit is excellent, with R² = .969. This least squares line is shown in Figure 13.15.

We can now forecast sales for years 36-40. The forecasts of sales and the corresponding 95% prediction intervals are shown on the SPSS spreadsheet, Figure 13.17. For example, for t = 36, we have ŷ_36 = 155.38, with the 95% prediction interval (138.2, 172.6). Similarly, we can obtain the forecasts and prediction intervals for years 37-40. Although it is not easily perceptible in the printout, the prediction intervals widen as we attempt to forecast farther into the future. This agrees with the intuitive notion that short-term forecasts should be more reliable than long-term forecasts.

There are two problems associated with forecasting time series using a least squares model.

Problem 1 We are using the least squares model to forecast values outside the region of observation of the independent variable, t. That is, we are forecasting for values of t between 36 and 40, but the observed sales are for t values between 1 and 35. As we noted in Chapters 10 and 11, it is risky to use a least squares regression model for prediction outside the experimental region.

Problem 1 obviously cannot be avoided. Because forecasting always involves predictions about the future values of a time series, some or all of the independent variables
cannot be avoided Because forecasting always involves predictions about the future values of a time series some or all of the independent variables i T 5525545355 ll 1535555555 55555555u 1 l 35 15533 13513 13355 3 3 15335 13343 15555 L 35 15553 14555 131135 4 7 33 15535 15531 13513 5 l 55 13553 15515 13531 CHAPTER 13 Time Series 1332 will probably be outside the region of observation on which the model was developed It is important that the forecaster recognize the dangers of this type of prediction If underlying conditions change drastically after the model is estimated eg if federal price controls are imposed on the rm s products during the 36th year of operation then for any forecasting model the forecasts and their con dence intervals are probably useless Problem 2 Although the straightline model may adequately describe the secular trend of the sales we have not attempted to build any cyclical effects into the model Thus the effect of in ationary and recessionary periods will be to increase the error of the forecasts because the model does not anticipate such periods Fortunately the forecaster often has some degree of control over problem 2 as we demonstrate in the remainder of the chapter In forming the prediction intervals for the forecasts we made the standard regres sion assumptions Chapters 10 and 11 about the random error component of the model We assumed the errors have mean 0 constant variance and normal probability distributions and are independent The latter assumption is dubious in time series models especially in the presence of shortterm trends Often if a year s value lies above the secular trend line the next year s value has a tendency to be above the line also that is the errors tend to be correlated see Figure 1315 We discuss how to deal with correlated errors in Section 139 For now we can charac terize the simple linear regression forecasting method as useful for discerning secular trends but it is probably too simplistic for most time series 
And, as with all forecasting methods, the simple linear regression forecasts should be applied only over the short term.

Statistics IN Action Revisited: Forecasting with Simple Linear Regression

A second model considered by the consultants to forecast monthly sales of the new cold medicine, Coldex, was a simple linear regression model with time t as the independent variable, where t = 1, 2, 3, ..., 24. The Minitab graph of the least squares line is shown in Figure SIA13.3, followed by the simple linear regression printout in Figure SIA13.4. Note that the p-value for testing the slope coefficient is .047. Thus, at α = .05, the model is statistically useful for predicting monthly sales. However, the coefficient of determination is low (R² = .168); only about 17% of the sample variation in monthly sales can be explained by the linear time trend.

The simple linear forecasts for each of the first 3 months of year 3 are shown on the Minitab worksheet, Figure SIA13.5. Minitab also gives a 95% prediction interval for each forecast. The intervals for months 25, 26, and 27 are (4.805, 81.479), (5.497, 82.904), and (6.165, 84.353), respectively. For the first month of year 3 (i.e., month 25), we are 95% confident that the actual sales will fall between 4.805 and 81.479 thousand dollars. Similar interpretations are made for the other two forecasts. As with the exponential smoothing model, these intervals were too wide to be of practical use by the pharmaceutical company.

How can the forecasting model be improved? Examine Figure SIA13.3 and note the cyclical trends in the monthly sales data.

Figure SIA13.3: Minitab plot of the least squares line for forecasting monthly Coldex sales

Figure SIA13.4: Minitab simple linear regression printout for the linear trend forecasting model

Neither the exponential smoothing model nor the linear regression model accounts for this cyclical variation. A forecasting model is needed that explicitly
[Figure SIA13.4 printout: regression of SALES on TIME; slope estimate 1.0583 (SE = .5023, T = 2.11, p = .047), R-Sq = 16.8%, R-Sq(adj) = 13.0%]

accounts for such trends. Such a model is presented in the next Statistics in Action Revisited (p. 13-36).

Figure SIA13.5: Minitab worksheet with simple linear regression forecasts of monthly sales

Data Set: COLDEX

13.8 Seasonal Regression Models

Many time series have distinct seasonal patterns. Retail sales are usually highest around Christmas, spring, and fall, with lulls in the winter and summer periods. Energy usage is highest in summer and winter and lowest in spring and fall. Teenage unemployment rises in summer months, when schools are not in session, and falls near Christmas, when many businesses hire part-time help.

Multiple regression models can be used to forecast future values of a time series with strong seasonal components. To accomplish this, the mean value of the time series, E(Y_t), is given a mathematical form that describes both the secular trend and seasonal components of the time series. Although the seasonal model can assume a wide variety of mathematical forms, the use of dummy variables to describe seasonal differences is common. For example, consider the power load data for a Southern utility company shown in Table 13.8. Data were obtained for each quarter from 1998 through 2009. A model that combines the expected growth in usage and the seasonal component is

E(Y_t) = β0 + β1t + β2Q1 + β3Q2 + β4Q3

where t = time period, ranging from t = 1 for quarter 1 of 1998 to t = 48 for quarter 4 of 2009.

Figure 13.18: Minitab least squares fit to the quarterly power load model

Table 13.8 Quarterly Power Loads in
(megawatts) for a Southern Utility Company, 1998-2009

Year    Q1      Q2      Q3      Q4
1998    68.8    65.0    88.4    69.0
1999    83.6    69.7    90.2    72.5
2000   106.8    89.2   110.7    91.7
2001   108.6    98.9   120.1   102.1
2002   113.1    94.2   120.5   107.4
2003   116.2   104.4   131.7   117.9
2004   130.6   116.8   144.2   123.3
2005   142.3   124.0   146.1   135.5
2006   147.1   119.3   138.2   127.6
2007   143.4   134.0   159.6   135.1
2008   149.5   123.3   154.4   139.4
2009   151.6   133.7   154.5   135.1

Data Set: QTRPOWER

The dummy variables are defined as follows:

Q1 = 1 if Quarter 1, 0 if Quarter 2, 3, or 4
Q2 = 1 if Quarter 2, 0 if Quarter 1, 3, or 4
Q3 = 1 if Quarter 3, 0 if Quarter 1, 2, or 4

The Minitab printout in Figure 13.18 shows the least squares fit of this model to the data in Table 13.8.

[Figure 13.18 printout: regression analysis of LOAD versus t, Q1, Q2, Q3]
[Figure 13.19a: Minitab worksheet with quarterly power load forecasts]

Note that the model appears to fit well, with R² = .914, indicating that the model accounts for about 91% of the sample variability in power loads over the 12-year period. The global F = 114.88 (p-value ≈ .000) strongly supports the hypothesis that the model has predictive utility. The model standard deviation of 7.86 indicates that the model predictions will usually be accurate to within approximately ±2(7.86), or about ±16, megawatts. Furthermore, β̂1 = 1.64 indicates an estimated average growth in load of 1.64 megawatts per quarter. Finally, the seasonal dummy variables have the following interpretations (refer to Chapter 11):*

β̂2 = 13.66: Quarter 1 loads average 13.66 megawatts more than Quarter 4 loads.
β̂3 = -3.74: Quarter 2 loads average 3.74 megawatts less than Quarter 4 loads.
β̂4 = 18.47: Quarter 3 loads average 18.47 megawatts more than Quarter 4 loads.

Thus, as expected, winter and summer loads exceed spring and fall loads, with the peak occurring during the summer months.
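The design matrix behind a fit like this can be sketched directly. The snippet below (not from the text) encodes the quarterly dummies Q1, Q2, Q3 used in the power-load model, with Quarter 4 as the baseline level (all three dummies zero).

```python
# Sketch: design-matrix rows (1, t, Q1, Q2, Q3) for the seasonal model
# E(Y_t) = b0 + b1*t + b2*Q1 + b3*Q2 + b4*Q3, Quarter 4 baseline.

def seasonal_row(t, quarter):
    """One observation's regressor row for the quarterly dummy-variable model."""
    q1 = 1 if quarter == 1 else 0
    q2 = 1 if quarter == 2 else 0
    q3 = 1 if quarter == 3 else 0
    return (1, t, q1, q2, q3)

# t runs 1..48 over 1998-2009; the quarter cycles 1, 2, 3, 4 within each year.
rows = [seasonal_row(t, ((t - 1) % 4) + 1) for t in range(1, 49)]
print(rows[0])    # (1, 1, 1, 0, 0)  -> quarter 1 of 1998
print(rows[47])   # (1, 48, 0, 0, 0) -> quarter 4 of 2009
```

Feeding these rows to any least squares routine reproduces the structure of the Minitab fit; the coefficient on each dummy is then the mean difference from Quarter 4, exactly as interpreted above.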
In order to forecast the 2010 power loads, we calculate the predicted value Ŷt for t = 49, 50, 51, and 52, at the same time substituting the dummy values appropriate for each quarter. Thus, for 2010,

Quarter 1: Ŷt = β̂0 + β̂1(49) + β̂2 = 70.51 + 1.636(49) + 13.66 = 164.3
Quarter 2: Ŷt = β̂0 + β̂1(50) + β̂3
Quarter 3: Ŷt = β̂0 + β̂1(51) + β̂4
Quarter 4: Ŷt = β̂0 + β̂1(52)

The predicted values and 95% prediction intervals (highlighted) are given on the Minitab worksheet, Figure 13.19a, and graphed in Figure 13.19b. Also shown in Figure 13.19a are the actual 2010 quarterly power loads. Notice that all 2010 power loads fall inside the forecast intervals.

The seasonal model used to forecast the power loads is an additive model, because the secular trend component (β1t) is added to the seasonal component (β2Q1 + β3Q2 + β4Q3) to form the model. A multiplicative model would have the same form, except that the dependent variable would be the natural logarithm of power load; that is,

ln(Yt) = β0 + β1t + β2Q1 + β3Q2 + β4Q3 + ε

To see the multiplicative nature of this model, we take the antilogarithm of both sides of the equation to get

Yt = exp{β0 + β1t + β2Q1 + β3Q2 + β4Q3 + ε}
   = exp{β0} × exp{β1t} × exp{β2Q1 + β3Q2 + β4Q3} × exp{ε}
     [constant]  [secular trend]  [seasonal component]  [residual component]

The multiplicative model often provides a better forecasting model when the time series is changing at an increasing rate over time.

When time series data are observed monthly, a regression forecasting model needs 11 dummy variables to describe monthly seasonality; three dummy variables can be used, as in the previous models, if the seasonal changes are hypothesized to occur quarterly.

*These interpretations assume a fixed value of t. In practical terms this is unrealistic, because each quarter is associated with a different value of t. Nevertheless, the coefficients of the seasonal dummy variables provide insight into the seasonality of these time series data.

[Figure 13.19b: Minitab graph of quarterly power loads with forecasts]
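The multiplicative model's back-transform can be illustrated in code. This sketch is not from the text, and the coefficient values are illustrative placeholders rather than estimates: the additive model is fit to ln(Yt), and forecasts are returned to the original scale with the exponential function.

```python
# Sketch: forecasting on the original scale from a model fit to ln(Y_t).
# Coefficients b = (b0, b1, b2, b3, b4) are hypothetical, not fitted values.

import math

def multiplicative_forecast(b, t, q1, q2, q3):
    """exp{b0 + b1*t + b2*Q1 + b3*Q2 + b4*Q3}: the log-scale trend/seasonal
    prediction, back-transformed to the original units."""
    log_scale = b[0] + b[1] * t + b[2] * q1 + b[3] * q2 + b[4] * q3
    return math.exp(log_scale)

b = (4.2, 0.01, 0.10, -0.03, 0.14)              # hypothetical coefficients
print(multiplicative_forecast(b, 49, 1, 0, 0))  # quarter-1 forecast at t = 49
```

Because exp turns sums into products, the trend and seasonal effects act as percentage multipliers on this scale, which is why the multiplicative form suits series growing at an increasing rate.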
In general, this approach to seasonal modeling requires one dummy variable fewer than the number of seasonal changes expected to occur.

There are approaches besides the regression dummy-variable method for forecasting seasonal time series. Trigonometric (sine and cosine) terms can be used in regression models to model periodicity. Other time series models, such as Holt's exponential smoothing model, do not use the regression approach at all, and there are various methods for adding seasonal components to these models. For example, the Holt-Winters forecasting model is a modification of Holt's model that includes a seasonality component.

We have chosen to discuss the regression approach because it makes use of the important modeling concepts covered in Chapter 11, and because the regression forecasts are accompanied by prediction intervals that provide some measure of the forecast reliability. While most other methods do not have explicit measures of reliability, many have proved their merit by providing good forecasts for particular applications. Consult the references at the end of the chapter for details of other seasonal models.

Statistics IN Action Revisited: Forecasting with a Seasonal Regression Model

The consultants hired by the pharmaceutical company detected a cyclical trend in the monthly sales data. They noted that sales of the cold medicine were higher during the winter and summer months, as compared to the other months, over the 2-year period. To account for this seasonal trend, they created 11 dummy variables (x1, x2, ..., x11) for the 12 months of the year. The seasonal forecasting model takes the form
E(Yt) = β0 + β1t + β2x1 + β3x2 + β4x3 + β5x4 + β6x5 + β7x6 + β8x7 + β9x8 + β10x9 + β11x10 + β12x11

[Figure SIA13.6: Minitab printout for the seasonal regression model of monthly Coldex sales]

The Minitab regression printout for the model is shown in Figure SIA13.6, followed by the model forecasts in Figure SIA13.7. The global F-test (p-value ≈ .000) indicates that the model is statistically useful for predicting monthly sales, and the model coefficient of determination, R² = .983, indicates that over 98% of the sample variation in monthly sales can be explained by the seasonal model. Statistically, this model is a tremendous improvement over the linear trend model.

The 95% prediction intervals for sales in months 25, 26, and 27 (highlighted on Figure SIA13.7) are (4,285.8, 6,349.2), (4,164.8, 6,228.2), and (1,604.3, 3,667.7), respectively. Thus, for the first month of year 3 (month 25), we are 95% confident that the actual sales will fall between 4,285.8 and 6,349.2 thousand dollars. Similar interpretations are made for the other two forecasts. These intervals are much narrower than those for the previous two forecasting models, and they also reflect the expected drop in sales in March (month 3) from the winter months. This seasonal model was used successfully by the pharmaceutical firm to forecast monthly sales.

Data Set: COLDEX

[Figure SIA13.7: Minitab worksheet with seasonal regression forecasts of monthly sales]

Activity 13.1  Time Series

For this activity, select a
recurring quantity from your own life for which you have monthly records for at least two years. This might be the cost of a utility bill, the number of cell phone minutes used, or even your income. If you do not have access to such records, use the Internet to find similar data, such as median monthly home prices in your area, for at least two years.

1. Which methods from this chapter might apply to your data? Does there appear to be a seasonal component affecting the data? If so, can you explain the seasonal effect in simple terms?
2. Use methods from this chapter to predict the value of your quantity for the next year. Be prepared to defend your choice of methods.

Exercises 13.38-13.44

Learning the Mechanics

13.38 The annual price of a finished product (in cents per pound) from 1994 to 2009 is given in the table below and saved in the LM1338 file. The time variable t begins with t = 1 in 1994 and is incremented by 1 for each additional year.

Year    t   Price, Yt    Year    t   Price, Yt
1994    1   21.73        2002    9   24.42
1995    2   24.32        2003   10   25.49
1996    3   25.31        2004   11   26.19
1997    4   26.36        2005   12   27.31
1998    5   27.31        2006   13   24.40
1999    6   27.58        2007   14   24.24
2000    7   24.79        2008   15   25.87
2001    8   25.36        2009   16   26.86

a. Fit the straight-line model E(Yt) = β0 + β1t to the data.
b. Give the least squares estimates of the β's.
c. Use the least squares prediction equation to obtain the forecasts for 2010 and 2011.
d. Find 95% forecast intervals for 2010 and 2011.

13.39 Retail sales in Quarters 1-4 over a 10-year period for a department store are shown (in hundreds of thousands of dollars) in the next table and saved in the LM1339 file.

                 Quarter
Year     1      2      3      4
 1       83    103     87    135
 2       98    121    101    154
 3      121    145    127    171
 4      137    160    142    192
 5      174    197    180    231
 6      182    205    186    240
 7      200    222    205    251
 8      223    251    229    277
 9      247    269    251    298
10      258    287    260    322

a. Write a regression model that contains trend and seasonal components to describe the sales data.
b. Use least squares regression to fit the model. Evaluate the fit of the model.
c. Use the regression model to forecast the quarterly
sales during year 11. Give 95% prediction intervals for the forecasts.

13.40 What advantage do regression forecasts have over exponentially smoothed forecasts? Does this advantage ensure that regression forecasts will prove to be more accurate? Explain.

Applying the Concepts—Basic

13.41 Mortgage interest rates. The level at which commercial lending institutions set mortgage interest rates has a significant effect on the volume of buying, selling, and construction of residential and commercial real estate. The data in the table, saved in the INTRATE30 file, are the annual average mortgage interest rates for conventional, fixed-rate, 30-year loans for the period 1985-2007.

Year   Interest Rate (%)   Year   Interest Rate (%)
1985   11.85               1997    7.57
1986   11.33               1998    6.92
1987   10.46               1999    7.46
1988   10.86               2000    8.08
1989   12.07               2001    7.01
1990    9.97               2002    6.56
1991   11.14               2003    5.89
1992    8.27               2004    5.86
1993    7.17               2005    5.93
1994    8.28               2006    6.47
1995    7.86               2007    6.40
1996    7.76

Source: U.S. Census Bureau, Statistical Abstract of the United States, 2009.

a. Fit the simple regression model E(Yt) = β0 + β1t, where t is the number of years since 1985 (i.e., t = 0, 1, ..., 22).
b. Forecast the average mortgage interest rate in 2010. Find a 95% prediction interval for this forecast.

Table for Exercise 13.43: Life insurance policies in force (millions)

Year   No. of Policies   Year   No. of Policies
1980   402               1994   366
1981   400               1995   370
1982   390               1996   355
1983   387               1997   351
1984   385               1998   358
1985   386               1999   367
1986   391               2000   369
1987   395               2001   377
1988   391               2002   375
1989   394               2003   379
1990   389               2004   373
1991   375               2005   373
1992   366               2006   375
1993   363

Source: U.S. Census Bureau, Statistical Abstract of the United States, 2009.

13.42 Price of natural gas. Refer to Exercise 13.9 (p. 1311) and the annual prices of natural gas from 1990 to 2007. A simple linear regression model, E(Yt) = β0 + β1t, where t is the number of years since 1990, is proposed to forecast the annual price of natural gas. Recall that the data are saved in the NATGAS file.
a. Give the least squares estimates of the β's and interpret their values.
b. Evaluate the model's fit.
c. Find and
interpret 95% prediction intervals for the years 2008 and 2009.
d. Describe the problems associated with using a simple linear regression model to predict time series data.

Applying the Concepts—Intermediate

13.43 Life insurance policies in force. The next table, saved in the LIFEINS file, represents all life insurance policies (in millions) in force on the lives of U.S. residents for the years 1980 through 2006.
a. Use the method of least squares to fit a simple regression model to the data.
b. Forecast the number of life insurance policies in force for 2007 and 2008.
c. Construct 95% prediction intervals for the forecasts of part b.
d. Check the accuracy of your forecasts by looking up the actual number of life insurance policies in force for 2007 and 2008 in the Statistical Abstract of the United States.

13.44 Graphing calculator sales. The next table, saved in the GRAPHICAL file, presents the quarterly sales index for one brand of graphing calculator at a campus bookstore. The quarters are based on an academic year, so the first quarter represents fall; the second, winter; the third, spring; and the fourth, summer. Define the time variable as t = 1 for the first quarter of 2005, t = 2 for the second quarter of 2005, etc. Consider the following seasonal dummy variables:

Q1 = 1 if Quarter 1, 0 otherwise
Q2 = 1 if Quarter 2, 0 otherwise
Q3 = 1 if Quarter 3, 0 otherwise

Year   First Quarter   Second Quarter   Third Quarter   Fourth Quarter
2005   438             398              252             160
2006   464             429              376             216
2007   523             496              425             318
2008   593             576              456             398
2009   636             640              526             498

a. Write a regression model for E(Yt) as a function of t, Q1, Q2, and Q3.
b. Find and interpret the least squares estimates, and evaluate the usefulness of the model.
c. Which of the assumptions about the random error component is in doubt when a regression model is fit to time series data?
d. Find the forecasts and the 95% prediction intervals for the 2010 quarterly sales. Interpret the result.

13.9 Autocorrelation and the Durbin-Watson Test
[Figure 13.20: Illustration of cyclical errors]
[Figure 13.21: Minitab scatterplot of annual sales with least squares line]
[Figure 13.22: Minitab plot of residuals versus time]

Recall that one of the assumptions we make when using a regression model for predictions is that the errors are independent. However, with time series data, this assumption is questionable. The cyclical component of a time series may result in deviations from the secular trend that tend to cluster alternately on the positive and negative sides of the trend, as shown in Figure 13.20.

The observed errors between the time series and the regression model for the secular trend (and seasonal component, if present) are called time series residuals. Thus, if the time series Yt has an estimated trend Ŷt, then the time series residual* is

R̂t = Yt - Ŷt

*We use R̂ rather than ε̂ to denote a time series residual because, as we shall see, time series residuals often do not satisfy the regression assumptions associated with the random component ε.

Note that time series residuals are defined just as the residuals for any regression model. However, we will usually plot time series residuals versus time to determine whether a cyclical component is apparent. For example, consider the sales forecasting data in Table 13.5 (p. 1314), to which we fit a simple straight-line regression model. The Minitab plot of the data and model is repeated in Figure 13.21, and a plot of the time series residuals is shown in Figure 13.22. Notice the tendency of the residuals to group alternately into positive and negative clusters; that is, if the residual for year t is positive, there is a tendency for the residual for
year t + 1 to be positive. These cycles are indicative of possible positive correlation between neighboring residuals. The correlation between time series residuals at different points in time is called autocorrelation, and the autocorrelation of neighboring residuals (time periods t and t + 1) is called first-order autocorrelation.

The correlation between time series residuals at different points in time is called autocorrelation. Correlation between neighboring residuals (at times t and t + 1) is called first-order autocorrelation. In general, correlation between residuals at times t and t + d is called dth-order autocorrelation.

Rather than speculate about the presence of autocorrelation among time series residuals, we prefer to test for it. For most business and economic time series, the relevant test is for first-order autocorrelation. Other, higher-order autocorrelations may indicate seasonality (e.g., fourth-order autocorrelation in a quarterly time series). However, when we use the term autocorrelation in this text, we are referring to first-order autocorrelation, unless otherwise specified. So we test

H0: No first-order autocorrelation of residuals
Ha: Positive first-order autocorrelation of residuals

The Durbin-Watson d-statistic is used to test for the presence of first-order autocorrelation. The statistic is given by the formula

d = [Σ from t=2 to n of (R̂t - R̂t-1)²] / [Σ from t=1 to n of R̂t²]

where n is the number of observations (time periods) and (R̂t - R̂t-1) represents the difference between a pair of successive time series residuals. The value of d always falls in the interval from 0 to 4. The interpretations of the values of d are given in the box. Most statistical software packages include a routine that calculates d for time series residuals.

Interpretation of the Durbin-Watson d-Statistic

d = [Σ from t=2 to n of (R̂t - R̂t-1)²] / [Σ from t=1 to n of R̂t²],   Range of d: 0 ≤ d ≤ 4

1. If the residuals are uncorrelated, then d ≈ 2.
2. If the residuals are positively autocorrelated, then d < 2, and if the autocorrelation is very strong, d ≈ 0.
3. If the residuals are negatively autocorrelated, then d > 2, and if the autocorrelation is very strong, d ≈ 4.
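The d-statistic defined above is straightforward to compute from a list of residuals. The sketch below is not from the text; the residual sequences are toy examples chosen to show how clustering drives d toward 0 and alternation drives it toward 4.

```python
# Sketch: the Durbin-Watson d statistic from time series residuals
# R_t = Y_t - Yhat_t.

def durbin_watson(residuals):
    """d = sum_{t=2..n} (R_t - R_{t-1})^2 / sum_{t=1..n} R_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den

# Residuals that alternate in sign (negative lag-1 correlation) push d toward 4;
# residuals that cluster in runs (positive lag-1 correlation) push d toward 0.
print(durbin_watson([1, -1, 1, -1, 1, -1]))   # about 3.33
print(durbin_watson([1, 1, 1, -1, -1, -1]))   # about 0.67
```

For uncorrelated residuals the numerator is roughly twice the denominator, which is why d ≈ 2 indicates no first-order autocorrelation.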
Durbin and Watson (1951) give tables for the lower-tail values of the d-statistic, which are shown in Tables XII (α = .05) and XIII (α = .01) of Appendix B. Part of Table XII is reproduced in Table 13.9.

Table 13.9  Reproduction of Part of Table XII of Appendix B: Critical Values for the Durbin-Watson d-Statistic, α = .05

        k=1          k=2          k=3          k=4          k=5
 n    dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
31   1.36  1.50   1.30  1.57   1.23  1.65   1.16  1.74   1.09  1.83
32   1.37  1.50   1.31  1.57   1.24  1.65   1.18  1.73   1.11  1.82
33   1.38  1.51   1.32  1.58   1.26  1.65   1.19  1.73   1.13  1.81
34   1.39  1.51   1.33  1.58   1.27  1.65   1.21  1.73   1.15  1.81
35   1.40  1.52   1.34  1.58   1.28  1.65   1.22  1.73   1.16  1.80
36   1.41  1.52   1.35  1.59   1.29  1.65   1.24  1.73   1.18  1.80
37   1.42  1.53   1.36  1.59   1.31  1.66   1.25  1.72   1.19  1.80
38   1.43  1.54   1.37  1.59   1.32  1.66   1.26  1.72   1.21  1.79
39   1.43  1.54   1.38  1.60   1.33  1.66   1.27  1.72   1.22  1.79
40   1.44  1.54   1.39  1.60   1.34  1.66   1.29  1.72   1.23  1.79

For the sales example, we have k = 1 independent variable and n = 35 observations. Using α = .05 for the one-tailed test for positive autocorrelation, we obtain the tabled values dL = 1.40 and dU = 1.52. The meaning of these values is illustrated in Figure 13.23.

Because of the complexity of the sampling distribution of d, it is not possible to specify a single point that acts as a boundary between the rejection and nonrejection regions, as we did for the z, t, F, and other test statistics. Instead, an upper (dU) and lower (dL) bound are specified. Thus, a d-value less than dL does provide strong evidence of positive autocorrelation at α = .05 (recall that small d-values indicate positive autocorrelation); a d-value greater than dU does not provide evidence of positive autocorrelation at α = .05; and a value of d between dL and dU might or might not be significant at the α = .05 level. If dL < d < dU, more information is needed before we can reach any conclusion about the presence of autocorrelation.

[Figure 13.23: Rejection region for the Durbin-Watson d-test, sales example. d < dL = 1.40: rejection region (evidence, at α = .05, of positive autocorrelation); 1.40 < d < 1.52: possibly significant autocorrelation; d > dU = 1.52: nonrejection region (insufficient evidence, at α = .05, of positive autocorrelation).]
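The three-zone decision logic in Figure 13.23 can be written out directly. This sketch is not from the text; it simply encodes the one-tailed rule for positive first-order autocorrelation, given tabled bounds dL and dU.

```python
# Sketch: three-zone decision rule for the one-tailed Durbin-Watson test of
# positive first-order autocorrelation, given tabled bounds dL and dU.

def dw_decision(d, d_lower, d_upper):
    if d < d_lower:
        return "reject H0: evidence of positive autocorrelation"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive: more information needed"

# Sales example from the text: n = 35, k = 1, alpha = .05 -> dL = 1.40, dU = 1.52.
print(dw_decision(1.02, 1.40, 1.52))   # d = 1.02 falls in the rejection region
```

Note that the inconclusive zone is a genuine feature of the test, not a limitation of the code: between dL and dU the tables alone cannot settle the question.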
BIOGRAPHY: Geoffrey S. Watson (1921-1998) — The Durbin-Watson Test

Australian Geoff Watson was educated at the University of Melbourne, where he earned a mathematics degree in 1942. Following World War II, Watson moved to North Carolina State University to begin work on a graduate degree in statistics; he eventually earned his PhD in 1951. During his illustrious career as a statistics professor and researcher, Watson held appointments at Cambridge University, Australian National University, the University of Toronto, Johns Hopkins University, and Princeton University, where he was chairman of the statistics department. While visiting Cambridge in the late 1940s, Watson collaborated with James Durbin of the London School of Economics to develop their well-known Durbin-Watson test for serial correlation. His research interests covered a wide spectrum of statistical applications all across the world, including estimating the size of the penguin population (Antarctica), paleontology problems (Sweden), probability in quantum mechanics (Rome), molecular biology (Italy), and ozone depletion (U.S. Energy Information Administration). Outside his professional life, Watson was a serious painter (landscapes and hills) and an accomplished tennis player (effective lob).

Tests for negative autocorrelation and two-tailed tests can be conducted by making use of the symmetry of the sampling distribution of the d-statistic about its mean. The test procedure is summarized in the next box.

Durbin-Watson d-Test for Autocorrelation

One-Tailed Test
H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation (or Ha: Negative first-order autocorrelation)

Two-Tailed Test
H0: No first-order autocorrelation
Ha: Positive or negative first-order autocorrelation

Test statistic:

d = [Σ from t=2 to n of (R̂t - R̂t-1)²] / [Σ from t=1 to n of R̂t²]

Rejection region (one-tailed): d < dL,α (or 4 - d < dL,α if Ha is negative first-order autocorrelation)
Rejection region (two-tailed): d <
dL,α/2 or 4 - d < dL,α/2

Here dL,α is the lower tabled value corresponding to k independent variables and n observations; the corresponding upper value dU,α defines a possibly significant region between dL,α and dU,α (see Figure 13.23). Likewise, dL,α/2 is the lower tabled value corresponding to k independent variables and n observations, and the corresponding upper value dU,α/2 defines a possibly significant region between dL,α/2 and dU,α/2 (see Figure 13.23).

Requirements for the validity of the d-test: The residuals are normally distributed.

Example 13.8  Conducting the Durbin-Watson Test for the Sales Revenue Model

Problem: Refer to the straight-line regression model relating annual sales revenue, Yt, to year, t. Figures 13.21 and 13.22 provide graphical evidence of a potential residual correlation problem. Conduct a formal test for positive first-order autocorrelation by applying the Durbin-Watson test. Use α = .05.

Solution: We used SPSS to conduct the Durbin-Watson test. A portion of the SPSS printout for the least squares regression of annual sales is shown in Figure 13.24. The elements of the test follow.

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation

Test statistic: d = 1.02 (highlighted in Figure 13.24)

[Figure 13.24: SPSS regression printout, with Durbin-Watson statistic, for the annual sales model]

Rejection region: d < dL = 1.40, where dL is highlighted in Table 13.9 for n = 35, α = .05, and k = 1.

Conclusion: Because d = 1.02 is below the critical dL value of 1.40, we reject H0. Thus, there is sufficient evidence (at α = .05) to conclude that the residuals of the straight-line model for sales are positively autocorrelated.

Look Back: In the presence of autocorrelated residuals, the regression analysis tends to produce inflated t-statistics. Consequently, an analyst has a greater than α probability of committing a Type I error when testing a
model parameter. For example, if a t-test is conducted on a β-parameter using α = .05, the analyst will falsely reject H0 more than 5% of the time when the regression errors are autocorrelated.

Now Work Exercise 13.47.

Once strong evidence of autocorrelation has been established, as in the case of the sales example, doubt is cast on the least squares results and any inferences drawn from them. Under these conditions, a time series model that accounts for the autocorrelation of the random errors is required. Such time series models take the form

Yt = E(Yt) + Rt

where E(Yt) is the usual deterministic portion of the model and Rt is a term that represents autocorrelated error at time t. Consequently, the analyst must model not only the deterministic component but also the error component. For example, a model that is useful when the errors have first-order autocorrelation, called a first-order autoregressive model, is

Rt = φRt-1 + εt

where εt represents the usual independent error term. If the deterministic component can be modeled as a straight-line function of t, then the full model is

Yt = β0 + β1t + φRt-1 + εt
     [ E(Yt) ]   [   Rt    ]

Consult the references at the end of this chapter for how to analyze these sophisticated time series models.

Exercises 13.45-13.54

Learning the Mechanics

13.45 Define autocorrelation. Explain why it is important in time series modeling and forecasting.

13.46 What do the following Durbin-Watson statistics suggest about the autocorrelation of the time series residuals from which each was calculated?
a. d = 3.9   b. d = .2   c. d = 1.99

13.47 For each case, indicate the decision regarding the test of the null hypothesis of no first-order autocorrelation against the alternative hypothesis of positive first-order autocorrelation.
a. k = 2, n = 20, α = .05, d = 1.1
b. k = 2, n = 20, α = .01, d = 1.1
c. k = 5, n = 65, α = .05, d = .95
d. k = 1, n = 31, α = .01, d = 1.35

Applying the Concepts—Basic

13.48 Forecasting monthly car and truck sales. Forecasts of automotive vehicle sales in the United States provide the basis for financial and strategic planning at large automotive corporations. The following forecasting model was developed for Yt, total monthly passenger car and light truck sales (in thousands):

E(Yt) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

where x1 = average monthly retail price of regular gasoline, x2 = annual percentage change in GDP per quarter, x3 = monthly consumer confidence index, x4 = total number of vehicles scrapped (millions) per month, and
null hypothesis of no rstorder autocorrelation against the alternative hypothesis of positive rstorder autocorrelation a k2n20a 05d 11 b k2n20a 01d 11 c k5n65a 05d 95 d k 1n31a 01d 135 Applying the Concepts Basic 1348 Forecasting monthly car and truck sales Forecasts of auto motive vehicle sales in the United States provide the basis x5 vehicle seasonality The model was t to monthly data collected over a 12year period ie n 144 months with the following results R2 856 DurbinWatson d 101 a Is there suf cient evidence to indicate that the overall model contributes information for the prediction of monthly passenger car and light truck sales Test using a 05 b Is there suf cient evidence to indicate that the regression errors are positively correlated Test using a 05 c Comment on the validity of the inference concerning model adequacy in light of the result of part b 1344 CHAPTER 13 Time Series 1349 Modeling the deposit share of a retail bank Exploratory research published in the Journal of Professional Services Marketing Vol 5 1990 examined the relationship between deposit share of a retail bank and several market ing variables Quarterly deposit share data were collected for ve consecutive years for each of nine retail banking institutions The model analyzed took the following form 1506 30 BIPt l 3254 BsDz 1 where Yf deposit share of a bank in quarter tt 1 2 20 Pt1 expenditures on promotion related activities in quarter t 1 St1 expenditures on servicerelated activities in quarter t 1 and Dt1 expenditures on distributionrelated activities in quarter t 1 A separate model was t for each bank with the results shown in the table oValue for Bank R2 Global F Test DurbinWatson d 1 914 000 13 2 721 004 34 3 926 000 27 4 827 000 19 5 270 155 85 6 616 012 18 7 962 000 25 8 495 014 23 9 500 011 11 a Interpret the values of R2 for each bank b Test the overall adequacy of the model for each bank using or 01 0 Conduct the DurbinWatson dtest for positive residual correlation for each 
bank at α = .01. What conclusions do you draw about autocorrelation?

13.50 The consumer purchasing value of the dollar. The consumer purchasing value of the dollar, Yt, from 1970 to 2007 is illustrated by the data in the table in the next column. The buying power of the dollar (compared with 1982) is listed for each year, and the data are saved in the BUYPOWER file. The first-order model

Yt = β0 + β1t + ε

was fit to the data using the method of least squares. The Minitab printout and a plot of the regression residuals are shown below.

[Minitab plot of regression residuals versus t for Exercise 13.50]

Year    t   Value, Yt    Year    t   Value, Yt
1970    1   2.545        1989   20    .880
1971    2   2.469        1990   21    .839
1972    3   2.392        1991   22    .822
1973    4   2.193        1992   23    .812
1974    5   1.901        1993   24    .802
1975    6   1.718        1994   25    .797
1976    7   1.645        1995   26    .782
1977    8   1.546        1996   27    .762
1978    9   1.433        1997   28    .759
1979   10   1.289        1998   29    .765
1980   11   1.136        1999   30    .752
1981   12   1.041        2000   31    .725
1982   13   1.000        2001   32    .711
1983   14    .984        2002   33    .720
1984   15    .964        2003   34    .698
1985   16    .955        2004   35    .673
1986   17    .969        2005   36    .642
1987   18    .949        2006   37    .623
1988   19    .926        2007   38    .600

Source: U.S. Census Bureau, Statistical Abstract of the United States, 2009.

a. Examine the plot of the regression residuals against t. Is there a tendency for the residuals to have long positive and negative runs? To what do you attribute this phenomenon?
b. Locate the Durbin-Watson d-statistic on the printout, and test the null hypothesis that the time series residuals are uncorrelated. Use α = .10.
c. What assumptions must be satisfied in order for the test of part b to be valid?

Applying the Concepts—Intermediate

13.51 Mortgage interest rates. Refer to the data on annual mortgage interest rate, Yt, Exercise 13.41 (p. 1338), saved in the INTRATE30 file. You fit the simple linear regression model E(Yt) = β0 + β1t to the data for the years 1985 to 2007 (t = 0, 1, 2, ..., 22).
a. Find and plot the regression residuals against t. Does the plot suggest the presence of
autocorrelation? Explain.
b. Conduct the Durbin-Watson test at α = .05 to test formally for the presence of positively autocorrelated regression errors.
c. Comment on the validity of the inference concerning model adequacy in light of the result of part b.

[Minitab regression output for Exercise 13.50: regression of VALUE versus T, including the Durbin-Watson statistic]

13.52 Price of natural gas. Refer to the annual data on natural gas price, Yt, Exercise 13.42 (p. 1338), saved in the NATGAS file. You fit the simple linear regression model E(Yt) = β0 + β1t to the data for the years 1990 to 2007 (t = 1, 2, ..., 17).
a. Find and plot the regression residuals against t. Does the plot suggest the presence of autocorrelation? Explain.
b. Conduct the Durbin-Watson test at α = .05 to test formally for the presence of positively autocorrelated regression errors.
c. Comment on the validity of the inference concerning model adequacy in light of the result of part b.

13.53 Forecasting foreign exchange rates. T. C. Chiang considered several time series forecasting models of future foreign exchange rates for U.S. currency (The Journal of Financial Research, Summer 1986). One popular theory among financial analysts is that the forward (90-day) exchange rate is a useful predictor of the future spot exchange rate. Using monthly data on exchange rates for the British pound for n = 81 months, Chiang fit the model

E(Yt) = β0 + β1xt-1

where Yt = ln(spot rate) in month t and xt-1 = ln(forward rate) in month t - 1. The analysis yielded the following results: t-value = 47.9, s = .025, R² = .957, Durbin-Watson d = .962.
a. Is the model statistically useful for forecasting future spot exchange rates for the British pound? Test using α = .05.
b. Interpret the values of s and R².
c. Is there evidence of
positive autocorrelation among the residuals? Test using α = .05.
d. Based on the results of parts a-c, would you recommend using the model to forecast spot exchange rates?

13.54 Life insurance policies in force. Refer to the annual data on number of life insurance policies in force, Yt, Exercise 13.43 (p. 1338), saved in the LIFEINS file. You fit the simple linear regression model E(Yt) = β0 + β1t to the data for the years 1980 to 2006 (t = 1, 2, ..., 27).
a. Find and plot the regression residuals against t. Does the plot suggest the presence of autocorrelation? Explain.
b. Conduct the Durbin-Watson test at α = .05 to test formally for the presence of positively autocorrelated regression errors.
c. Comment on the validity of the inference concerning model adequacy in light of the result of part b.

CHAPTER NOTES

Key Terms

Adaptive forecast 1318; Additive model 1317, 1335; Autocorrelation 1340; Base period 133; Composite index number 134; Cyclical effect 1317; Descriptive 132; Descriptive models 1317; Durbin-Watson d-statistic 1340; Durbin-Watson test 1340; Exponentially smoothed forecast 1318; Exponential smoothing 1312; Exponential smoothing constant 1312; First-order autocorrelation 1340; Forecast 132; Forecast error 1319; Holt's forecasting model 1321; Holt-Winters forecasting model 1335; Index number 133; Inferential 132; Inferential forecasting model 1330; Inferential models 1317; Inferential techniques 132; Laspeyres index 136; Long-term trend 1317; Mean absolute deviation 1326; Mean absolute percentage error 1326; Multiplicative model 1335; Paasche index 138; Price 133; Quantity indexes 133; Residual effect 1317; Root mean squared error 1326; Seasonal effect 1317; Seasonal model 1333; Secular trend 1317; Simple composite index number 135; Simple index number 133; Time series 132; Time series residuals 1339; Weighted composite price index 136

Key Formulas

Simple index: It = (Yt / Y0) × 100

Simple composite index: It = (Total of all Y values at time t / Total of all Y values at time t0) × 100

Weighted composite price indexes:

Laspeyres: It = [Σ from i=1 to k of Qi,t0 Pi,t / Σ from i=1 to k of Qi,t0 Pi,t0] × 100
Paasche: I_t = [Σ(i=1 to k) Q_it·P_it] / [Σ(i=1 to k) Q_it·P_i0] × 100
Exponential smoothing: E_t = w·Y_t + (1 − w)·E_{t−1} (Note: E_1 = Y_1); Forecast: F_{t+k} = E_t
Holt's method: E_t = w·Y_t + (1 − w)·(E_{t−1} + T_{t−1}); T_t = v·(E_t − E_{t−1}) + (1 − v)·T_{t−1} (Note: E_2 = Y_2, T_2 = Y_2 − Y_1); Forecast: F_{t+k} = E_t + k·T_t
Mean absolute deviation: MAD = [Σ(t=n+1 to n+m) |Y_t − F_t|] / m
Mean absolute percentage error: MAPE = {[Σ(t=n+1 to n+m) |(Y_t − F_t)/Y_t|] / m} × 100
Root mean squared error: RMSE = sqrt{[Σ(t=n+1 to n+m) (Y_t − F_t)²] / m}
[p. 13-46, CHAPTER 13 Time Series]
Durbin-Watson test statistic: d = [Σ(t=2 to n) (R̂_t − R̂_{t−1})²] / [Σ(t=1 to n) R̂_t²]

Key Symbols
Y_t = time series value at time t; I_t = index at time t; P_t = price series at time t; Q_t = quantity series at time t; E_t = exponentially smoothed value at time t; T_t = smoothed trend at time t; F_{t+k} = k-step-ahead forecast value; MAD = mean absolute deviation; MAPE = mean absolute percentage error; RMSE = root mean squared error; R̂_t = residual at time t; d = value of the Durbin-Watson test statistic; d_L = lower critical value of d; d_U = upper critical value of d

Key Ideas
Time series data: data generated by processes over time.
Index number: measures the change in a variable over time relative to a base period. Types of index numbers: (1) simple index number, (2) simple composite index number, and (3) weighted composite number (Laspeyres index or Paasche index).
Time series components: (1) secular (long-term) trend, (2) cyclical effect, (3) seasonal effect, and (4) residual effect.
Time series forecasting: descriptive methods of forecasting with smoothing: (1) exponential smoothing and (2) Holt's method. An inferential forecasting method: least squares regression. Measures of forecast accuracy: (1) mean absolute deviation (MAD), (2) mean absolute percentage error (MAPE), and (3) root mean squared error (RMSE). Problems with least squares regression forecasting: (1) prediction outside the experimental region and (2) regression errors are autocorrelated.
Autocorrelation: correlation between time series residuals at different points in time. A test for first-order autocorrelation: the Durbin-Watson test.
Guide to Time Series Analysis: [flowchart: choose the type of analysis, descriptive or forecasting; descriptive tools are index numbers (simple, simple composite, weighted composite) and exponential smoothing of past values of Y_t; forecasting tools are exponential smoothing, Holt's model (trend), and least squares regression models using independent variables X_1t, X_2t, ... and past values of Y_t.]
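A minimal sketch of the smoothing, forecast-accuracy, and Durbin-Watson formulas above. All function names and the toy series are my own, not the text's; in practice the chapter computes these with SPSS, Minitab, or Excel/DDXL.

```python
def exp_smooth(y, w):
    """E_t = w*Y_t + (1-w)*E_{t-1}, with E_1 = Y_1; forecast F_{t+k} = E_t."""
    e = [y[0]]
    for yt in y[1:]:
        e.append(w * yt + (1 - w) * e[-1])
    return e

def holt(y, w, v):
    """Holt's method: E_2 = Y_2, T_2 = Y_2 - Y_1;
    E_t = w*Y_t + (1-w)*(E_{t-1} + T_{t-1});
    T_t = v*(E_t - E_{t-1}) + (1-v)*T_{t-1}.
    Returns (levels, trends); forecast F_{t+k} = E_t + k*T_t."""
    e, t = [y[1]], [y[1] - y[0]]
    for yt in y[2:]:
        new_e = w * yt + (1 - w) * (e[-1] + t[-1])
        t.append(v * (new_e - e[-1]) + (1 - v) * t[-1])
        e.append(new_e)
    return e, t

def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def durbin_watson(resid):
    """d = sum_{t=2..n}(e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2.
    d near 2 suggests no first-order autocorrelation; d near 0, positive."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e ** 2 for e in resid)
```

MAPE and RMSE follow the same pattern as `mad`, dividing by Y_t or squaring the errors before averaging.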
Supplementary Exercises 13.55-13.68

Applying the Concepts: Basic

13.55 Insured Social Security workers. Workers insured under the Social Security program are categorized as fully and permanently insured, fully but not permanently insured, or insured in the event of disability. The numbers of workers (in millions) in each insured category from 2000 to 2008 are provided in the accompanying table and saved in the INSURED file.

Year | Fully & Permanently | Fully, Not Permanently | Event of Disability
2000 | 140.9 | 44.9 | 139.5
2001 | 142.9 | 45.2 | 141.7
2002 | 144.9 | 45.3 | 143.5
2003 | 147.0 | 45.0 | 144.9
2004 | 149.0 | 44.8 | 146.2
2005 | 151.1 | 44.7 | 147.7
2006 | 153.3 | 45.1 | 150.1
2007 | 155.4 | 45.6 | 152.3
2008 | 157.4 | 46.0 | 154.5
Source: Social Security Administration, 2009, www.ssa.gov

a. Compute a simple composite index for the number of workers in the three insured categories, using 2000 as the base period.
b. Is the index of part a a price index or a quantity index?
c. Refer to part a. Interpret the index value for 2008.

13.56 Insured Social Security workers (cont'd). Refer to Exercise 13.55.
a. Compute the exponentially smoothed series for the annual number of workers who are fully and permanently insured for Social Security. Use the smoothing constant w = .5.
b. Plot the number of workers and the exponentially smoothed values on the same graph.
c. Use exponential smoothing to forecast the number of fully and permanently insured workers in 2009 and 2010. What is the drawback to these forecasts?

13.57 Demand for emergency room services. With the advent of managed care, U.S. hospitals have begun to operate like businesses. More than ever before, hospital administrators need to know and apply the theories and methods taught in business schools. Richmond Memorial Hospital in Richmond, Virginia, uses regression analysis to forecast the demand for emergency room services. Specifically, Richmond Memorial uses data on patient visits to the emergency room
during each of the past 10 Augusts to forecast next August's demand. Data for the month of August in a recent 10-year period are shown below and saved in the ER file.

Year, t | Visits | Daily Average, Y_t
1 | 1,367 | 44.09
2 | 1,642 | 52.96
3 | 1,780 | 57.41
4 | 2,060 | 66.45
5 | 2,257 | 72.80
6 | 3,019 | 97.38
7 | 2,794 | 90.12
8 | 2,846 | 91.80
9 | 3,001 | 96.80
10 | 3,548 | 114.45

a. Use a straight-line regression model to construct a point forecast for emergency room demand for each of the next three Augusts.
b. Provide 95% prediction intervals around the forecasts.
c. Describe the potential dangers associated with using simple linear regression to forecast demand for emergency room services.
d. Which other method described in this chapter would be appropriate for forecasting patient visits to the emergency room?

13.58 Retail prices of food items. In 1990, the average weekly food cost for a suburban family of four was estimated to be $154.40. The table below, saved in the FOOD4 file, presents the retail prices of four food items for selected years from 1990 to 2007. Assume a typical suburban family of four purchased the following quantities of food, on average, each week during 1990: spaghetti, 2 lb; ground beef, 5 lb; eggs, 1 doz; potatoes, 10 lb.

Year | Spaghetti ($/lb) | Ground Beef ($/lb) | Eggs ($/doz) | Potatoes ($/lb)
1990 | .85 | 1.63 | 1.00 | .32
1995 | .88 | 1.40 | 1.16 | .38
2000 | .88 | 1.63 | .96 | .35
2005 | .87 | 2.30 | 1.35 | .50
2007 | .85 | 2.23 | 2.10 | .53
Source: U.S. Census Bureau, Statistical Abstract of the United States, 2009.

a. Calculate a Laspeyres price index for 1990 to 2007, using 1990 as the base year.
b. According to your index, how much did the above basket of foods increase or decrease in price from 1990 to 2007?

Applying the Concepts: Intermediate

13.59 Mortgage interest rates. Refer to the annual interest rate time series, Exercise 13.41 (p. 13-38). Use w = .3 and v = .7 to compute the Holt forecasts for 2008-2010. Compare these to the linear regression forecasts obtained in Exercise 13.41, using MAD, MAPE, and RMSE. (Note: You will need to obtain the actual values of the
time series for 2008-2010 to complete this exercise.)

13.60 Price of Abbott Labs stock. The yearly closing prices of Abbott Laboratories stock are shown in the table below and saved in the ABBLAB file.
a. Use exponential smoothing with w = .8 to forecast the 2009 and 2010 closing prices. If you buy at the end of 2008 and sell at the end of 2010, what is your expected gain (loss)?
b. Repeat part a using Holt's method with w = .8 and v = .5.
c. In which forecast do you have more confidence? Explain.

Year | Closing Price | Year | Closing Price
1990 | 45.00 | 2000 | 48.44
1991 | 68.07 | 2001 | 55.75
1992 | 30.03 | 2002 | 40.00
1993 | 29.05 | 2003 | 46.60
1994 | 32.05 | 2004 | 46.65
1995 | 41.05 | 2005 | 39.43
1996 | 50.75 | 2006 | 48.71
1997 | 65.50 | 2007 | 56.15
1998 | 49.00 | 2008 | 53.37
1999 | 36.31 |
Sources: Standard & Poor's NYSE Daily Stock Price Record, 1990-2009; www.abbott.com

13.61 Price of Abbott Labs stock (cont'd). Refer to Exercise 13.60.
a. Fit a simple linear regression model to the stock price data.
b. Plot the fitted regression line on a scattergram of the data.
c. Forecast the 2009 and 2010 closing prices using the regression model.
d. Construct 95% prediction intervals for the forecasts of part c. Interpret the intervals in the context of the problem.
e. Obtain the time series residuals for the simple linear model and use the Durbin-Watson d-statistic to test for the presence of autocorrelation.

13.62 Assets of retirement mutual funds. Annual mutual fund retirement assets (in billions of dollars) for two fund types are given in the table below and saved in the FUND2 file.
a. Compute simple indexes for each of the two time series, using 1990 as the base period.
b. Construct a time series plot that displays both indexes.
c. Using the results of parts a and b, compare and contrast the two types of funds.

Year | IRA | 401(k)
1990 | 138 | 35
1994 | 342 | 184
1995 | 464 | 598
1996 | 582 | 711
1997 | 763 | 479
1998 | 961 | 616
1999 | 1,257 | 811
2000 | 1,231 | 820
2001 | 1,166 | 796
2002 | 1,043 | 707
2003 | 1,309 | 922
2004 | 1,491 | 1,092
2005 | 1,664 | 1,239
2006 | 1,977 | 1,476
2007 | 2,243 | 1,656
Source: U.S. Census Bureau,
Statistical Abstract of the United States, 2009.

13.63 Quarterly GDP values. The gross domestic product (GDP) is the total U.S. output of goods and services, valued at market prices. The quarterly GDP values (in billions of dollars) for the period 2004-2008 are given in the accompanying table and saved in the QTRGDP file. Using weights w = .5 and v = .5, calculate Holt forecasts for the four quarters of 2009.

Year | Quarter | GDP
2004 | 1 | 11,406
2004 | 2 | 11,610
2004 | 3 | 11,779
2004 | 4 | 11,949
2005 | 1 | 12,155
2005 | 2 | 12,298
2005 | 3 | 12,538
2005 | 4 | 12,696
2006 | 1 | 12,960
2006 | 2 | 13,134
2006 | 3 | 13,250
2006 | 4 | 13,370
2007 | 1 | 13,511
2007 | 2 | 13,738
2007 | 3 | 13,951
2007 | 4 | 14,031
2008 | 1 | 14,151
2008 | 2 | 14,295
2008 | 3 | 14,413
2008 | 4 | 14,200
Source: U.S. Department of Commerce, Bureau of Economic Analysis, 2009, www.bea.gov

13.64 Quarterly GDP values (cont'd). Refer to Exercise 13.63.
a. Use the simple linear regression model fit to the 2004-2008 data to forecast the 2009 quarterly GDP. Place 95% prediction limits on the forecasts.
b. The GDP values given are seasonally adjusted, which means that an attempt to remove seasonality has been made prior to reporting the figures. Add quarterly dummy variables to the model. Use the partial F-test (discussed in Section 11.9) to determine whether the data indicate the significance of the seasonal component. Does the test support the assertion that the GDP figures are seasonally adjusted?
c. Use the seasonal model to forecast the 2009 quarterly GDP values.
d. Calculate the time series residuals for the seasonal model and use the Durbin-Watson test to determine whether the residuals are autocorrelated. Use α = .10.

13.65 Quarterly GDP values (cont'd). Refer to Exercises 13.63 and 13.64. For each of the forecasting models, apply the MAD, MAPE, and RMSE criteria to evaluate the forecasts for the four quarters of 2009. Which of the forecasting models performs best according to each criterion? (You will need to obtain the actual 2009 GDP values to complete this exercise.)

13.66 Revolving credit loans. A major portion of total consumer credit extended is in the category of revolving credit loans. Amounts outstanding (in billions of dollars) for
the period 1990-2007 are given in the table below and saved in the LOANS file.
a. Use a simple linear regression model to forecast the 2009 and 2010 values. Place 95% prediction limits on each forecast.

Year | Revolving | Year | Revolving
1990 | 239 | 1999 | 609
1991 | 264 | 2000 | 683
1992 | 278 | 2001 | 716
1993 | 310 | 2002 | 749
1994 | 366 | 2003 | 771
1995 | 444 | 2004 | 800
1996 | 508 | 2005 | 825
1997 | 538 | 2006 | 875
1998 | 579 | 2007 | 942
Source: U.S. Census Bureau, Statistical Abstract of the United States, 2009.

b. Calculate the Holt forecasts for 2009 and 2010, using w = .7 and v = .7. Compare the results with the simple linear regression forecasts of part a.

13.67 Using the CPI to compute real income. The number of dollars a person receives in a year is referred to as his or her monetary, or money, income. This figure can be adjusted to reflect the purchasing power of the dollars received relative to the purchasing power of dollars in some base period; the result is called a person's real income. The CPI can be used to adjust monetary income to obtain real income in terms of 1984 dollars: to compute your real income for a specific year, simply divide your monetary income for that year by that year's CPI and multiply by 100. In Exercise 13.27 (p. 13-25), we listed the CPI for 1990 and 2008 as 125.8 and 215.3, respectively.
a. Suppose your monetary income increased from $50,000 in 1990 to $95,000 in 2008. What were your real incomes in 1990 and 2008? Were you able to buy more goods and services in 1990 or in 2008? Explain.

References
Abraham, B., and Ledolter, J. Statistical Methods for Forecasting. New York: Wiley, 1983 (paperback, 2005).
Anderson, T. W. The Statistical Analysis of Time Series. New York: Wiley, 1971 (paperback, 1994).
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. Time Series Analysis: Forecasting and Control, 4th ed. New York: Wiley, 2008.
Chipman, J. S. "Efficiency of least squares estimation of linear trend when residuals are autocorrelated." Econometrica, Vol. 47, 1979.
Durbin, J., and Watson, G. S. "Testing for serial correlation in least squares regression, I." Biometrika, Vol. 37,
1950, pp. 409-428.
Durbin, J., and Watson, G. S. "Testing for serial correlation in least squares regression, II." Biometrika, Vol. 38, 1951, pp. 159-178.
Durbin, J., and Watson, G. S. "Testing for serial correlation in least squares regression, III." Biometrika, Vol. 58, 1971, pp. 1-19.
Evans, M. Practical Business Forecasting. New York: Wiley-Blackwell, 2002.
Fuller, W. A. Introduction to Statistical Time Series, 2nd ed. New York: Wiley, 1996.
Granger, C. W. J., and Newbold, P. Forecasting Economic Time Series, 2nd ed. New York: Academic Press, 1986.

b. What monetary income would have been required in 2008 to provide purchasing power equivalent to a 1990 monetary income of $20,000?

13.68 IBM stock prices. Refer to Example 13.1 (p. 13-5) and the 2008 monthly IBM stock prices. The data are saved in the HITECH file.
a. Use the exponentially smoothed series (with w = .5) from January to September 2008 to forecast the monthly values of the IBM stock price from October to December 2008. Calculate the forecast errors.
b. Use a simple linear regression model fit to the IBM stock prices from January to September 2008. Let time t range from 1 to 9, representing the 9 months in the sample. Interpret the least squares estimates.
c. With what approximate precision do you expect to be able to predict the IBM stock price using the regression model?
d. Give the simple linear regression forecasts and the 95% forecast intervals for the October-December 2008 prices. How does the precision of these forecasts agree with the approximation obtained in part c?
e. Compare the exponential smoothing forecasts (part a) to the regression forecasts (part d) using MAD, MAPE, and RMSE.
f. What assumptions does the random error component of the regression model have to satisfy in order to make the model inferences, such as the forecast intervals in part d, valid?
g. Test to determine whether there is evidence of first-order positive autocorrelation in the random error component of the regression model. Use α = .05. What can you infer about the validity of the model inferences?

Greene, W. H.
Econometric Analysis, 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008.
Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
Harvey, A. Time Series Models, 2nd ed. Cambridge, MA: MIT Press, 1993.
Maddala, G. S. Introduction to Econometrics, 3rd ed. New York: Wiley, 2001.
Makridakis, S., et al. The Forecasting Accuracy of Major Time Series Methods. New York: Wiley, 1984.
Nelson, C. R. Applied Time Series Analysis for Managerial Forecasting. San Francisco: Holden-Day, 1990.
Shively, T. S. "Fast evaluation of the distribution of the Durbin-Watson and other invariant test statistics in time series regression." Journal of the American Statistical Association, Vol. 85, 1990.
Theil, H. Principles of Econometrics. New York: Wiley, 1971.
White, K. J. "The Durbin-Watson test for autocorrelation in nonlinear models." Review of Economics and Statistics, Vol. 74, 1992.
Willis, R. E. A Guide to Forecasting for Planners. Englewood Cliffs, NJ: Prentice Hall, 1987.

USING TECHNOLOGY

SPSS: Forecasting (Exponential Smoothing or Holt's Method)
Step 1 Click the Analyze button on the SPSS main menu bar, then click Time Series, and then Exponential Smoothing, as shown in Figure 13S1. The resulting dialog box is shown in Figure 13S2.
[Figure 13S1: SPSS options for exponential smoothing. Figure 13S2: SPSS exponential smoothing dialog box.]
Step 2 Select the quantitative variable to be smoothed and place it in the Variables box.
Step 3 For exponential smoothing, select Simple in the Model box. For the Holt method, select Holt in the Model box.
Step 4 To set the values of the smoothing constants, click the Parameters button and make your selections on the resulting menu screen, as shown in Figure 13S3.
[Figure 13S3: Selecting exponential smoothing parameters.]
Step 5 Click Continue to return to the Exponential Smoothing dialog box, and then click OK to view the results. (Note: Forecasts for each time period in the data set will show up in a column on the SPSS spreadsheet screen.)

Regression
Step 1 Click the Analyze button on the SPSS main menu bar, then click Regression and Linear (see Using Technology, Chapter 11).
Step 2 Specify the dependent time series variable in the Dependent box and the independent variables in the model in the Independents box. Click Save and make the appropriate menu selections to save the forecasted values as well as 95% prediction intervals.
Step 3 To conduct the Durbin-Watson test for autocorrelated errors, click the Statistics button to obtain the menu shown in Figure 13S4. Check the Durbin-Watson box, then click Continue to return to the Linear Regression dialog box. Click OK to view the results.
[Figure 13S4: SPSS linear regression Statistics menu.]

Minitab: Forecasting (Exponential Smoothing or Holt's Method)
Step 1 Click the Stat button on the Minitab main menu bar and then click Time Series. This produces the menu list shown in Figure 13M1.
Step 2 Click Single Exp Smoothing for the exponential smoothing method, or Double Exp Smoothing for Holt's method with trend. For example, clicking Single Exp Smoothing results in the dialog box shown in Figure 13M2.
Step 3 Select the quantitative variable to be smoothed, place it in the Variable box, and set the value of the smoothing constant in the "Weight to use in smoothing" box.
Step 4 Select the Options box and specify 1 where Minitab asks for the number of observations to use for the initial
smoothed value.
[Figure 13M1: Minitab options for time series analysis. Figure 13M2: Minitab exponential smoothing dialog box.]
As an option, you can store the forecast values by selecting Storage and making the appropriate selections.
Step 5 Click OK to view the results.

Regression
Step 1 Click the Stat button on the Minitab main menu bar and then click Regression, and Regression again (see Using Technology, Chapter 11).
Step 2 Specify the dependent time series variable in the Response box and the independent variables in the model in the Predictors box.
Step 3 Click Options to display the Regression Options dialog box, shown in Figure 13M3. Here you may select the Durbin-Watson statistic to conduct a test for autocorrelated errors and/or make selections for producing a prediction interval for a future value of the time series variable.
[Figure 13M3: Minitab regression options.]
Step 4 Click Storage and make the appropriate menu selections to save the forecasted values as well as 95% prediction
intervals. These values will appear on the Minitab worksheet.
Step 5 Click OK on the Regression dialog box to produce the forecasts.

Excel/DDXL: Forecasting (Exponential Smoothing)
Step 1 Access the Excel spreadsheet with the time series data.
Step 2 Highlight (select) the relevant data columns on the Excel spreadsheet.
Step 3 Click Add-Ins on the main Excel menu bar and select DDXL. On the resulting menu, select Exponential Smoothing (Figure 13E1), and then select Exponential Smoothing in the Function Type box (Figure 13E2).
[Figure 13E1: Excel/DDXL menu options for exponential smoothing. Figure 13E2: Excel/DDXL exponential smoothing dialog box.]
Step 4 Move the column with the values of the time series variable into the Data Variable box, as shown in Figure 13E2.
Step 5 Click OK and then click Compute Exponential Smooth to view the results. (Note: The default smoothing constant is w = .1.)
Step 6 To change the smoothing constant w, move the line on the Exponential Smoother Controls box to the value of your choice, as shown in Figure 13E3. Then click Recompute to view the new results.
[Figure 13E3: Excel/DDXL exponential smoother controls dialog box.]

Regression (see Using Technology, Chapter 11)
Step 1 Access the Excel spreadsheet with the time series data.
Step 2 Add observations to the end of the spreadsheet for the time periods you want to forecast, as shown in Figure 13E4. Then highlight (select) the data for the time series variable and the independent variables on the Excel spreadsheet.
Step 3 Click Add-Ins on the main Excel menu bar and select DDXL. On the resulting menu, select Regression, and then select Multiple Regression in the Function Type box.
Step 4 Move the column with the values of the dependent variable into the Response Variable box
and the columns with the values of the independent variables into the Explanatory Variables box, and then click OK.
Step 5 In the Regression Guidance box, click on 95% Confidence and Prediction Intervals.
[Figure 13E4: Excel workbook with additional time period added for forecasting.]

Chapter 14 Methods for Quality Improvement

Quality, Processes, and Systems
Quality of a good or service: the extent to which it satisfies user needs and preferences.
8 Dimensions of Quality: 1. Performance 2. Features 3. Reliability 4. Conformance 5. Durability 6. Serviceability 7. Aesthetics 8. Other perceptions that influence judgments of quality.
Process: a series of actions or operations that transforms inputs (information, methods, energy, materials, machines, people) into outputs over time.
[Diagram: inputs flowing through Operations A, B, and C to outputs. Copyright 2005 Pearson Prentice Hall, Inc.]
System: a collection of interacting processes with an ongoing purpose.
[Diagram: supplier input passing through Processes A and B to output, with a feedback loop.]
Two important points about systems:
1. No two items produced by a process are the same.
2. Variability is an inherent characteristic of the output of all processes.
6 major sources of process variation: 1. People 2. Machines 3. Materials 4. Methods 5. Measurement 6. Environment.

Statistical Control
Control charts: graphical devices used for (1) monitoring process variation, (2) identifying when to take action to improve the process, and (3) assisting in diagnosing the causes of process variation.
[Figure: a run chart, or time series plot, of weight (roughly 99.80 to 100.20) against order of production.]
Run chart, enhanced: adding a centerline and connecting the plot points in temporal order.
These enhancements aid the eye in picking out any patterns.

Statistical Control (cont'd)
The output variable of interest can be described by a probability distribution at any point in time, and a particular value of the output variable at time t can be thought of as being generated by that probability distribution. The distribution may change over time: the mean, the variance, or both. The distribution of the process is the distribution of the output variable.
A process whose output distribution does not change over time is said to be in statistical control, or in control. Processes with changing distributions are out of statistical control, out of control, or lacking stability.
[Figure: distributions of an output variable (e.g., weight) over time for an in-control process (unchanging) versus an out-of-control process (shifting).]

Patterns of Process Variation (changing distributions)
[Figure panels, each plotting a measurement against order of production: (a) uptrend, (b) downtrend, (c) increasing variance, (d) cyclical, (e) meandering, (f) shock/freak/outlier, (g) level shift.]

The output of processes that are in statistical
control still have variability associated with them, but there is no pattern to this variability: it is random. If the process remains in control, its future will be like its past; the future of an out-of-control process is not predictable.
[Figure: projected output distributions for an in-control process versus an out-of-control process.]

Statistical process control: keeping a process in statistical control, or bringing a process into statistical control, through monitoring and eliminating variation.
Common causes of variation: the methods, materials, machines, personnel, and environment that constitute a process, and the inputs required by the process.
Special causes of variation (assignable causes): events or actions that are not part of the process design.
Processes in control still exhibit variation from the common causes; processes out of control exhibit variation from both common causes and special causes. Most processes are not naturally in a state of statistical control.
[Figure: a process out of control (both special and common causes present) is brought into control (special causes eliminated, only common causes present); further improvement reduces the variation due to one or more common causes.]

The Logic of Control Charts
Control charts are used to help differentiate between variation due to common causes and variation due to special causes.
Centerline: μ (the mean when the process is in control); upper control limit: μ + 3σ; lower control limit: μ − 3σ. When a value falls outside the control limits, either a rare event has occurred or the process is out of control.
Hypothesis testing with control charts: H0: the process is under control; Ha: the process is out of control. Another view: H0: μ = centerline; Ha: μ ≠ centerline (Ha here indicates that the mean has shifted).
[Figure: a normal distribution with mean μ and standard deviation σ; a proportion .00135 of the output falls beyond each of μ − 3σ and μ + 3σ.]

Control limits vs. specification limits: specification limits are set by customers, management, or product designers and are determined as acceptable values for an output. Control limits are dependent on the process; specification limits are not. (LCL = lower control limit; UCL = upper control limit; LSL = lower specification limit; USL = upper specification limit.)

A Control Chart for Monitoring the Mean of a Process: the x̄-Chart
If the process is under control and follows a normal distribution with mean μ and standard deviation σ, then x̄ also follows a normal distribution with mean μ but standard deviation σ/√n.
The x̄-chart is a control chart that plots sample means. It is often used in concert with an R-chart, which monitors process variation, and it is more sensitive to changes in the process mean than a chart of individual measurements.
To construct the chart, you need at least 20 samples of sample size at least 2:
Centerline: x̿ = (x̄_1 + x̄_2 + ... + x̄_k) / k
Lower control limit: x̿ − A2·R̄
Upper control limit: x̿ + A2·R̄
where A2 is found in a table of control chart constants and R̄ is the mean range of the samples.
Two important decisions in constructing an x̄-chart: 1. determine the sample size, n; 2. determine the frequency with which samples are to be drawn.
Rational subgroups: subgroups chosen with a sample size n and a frequency that make it likely that process changes will happen between, rather than within, samples. The rational subgrouping strategy maximizes the chance for measurements to be similar within each sample and for samples to differ from each other.
Summary of x̄-chart construction:
1. Collect at least 20 samples with sample
size n ≥ 2, utilizing a rational subgrouping strategy.
2. Calculate the mean and range for each sample.
3. Calculate the mean of the sample means, x̿, and the mean of the sample ranges, R̄.
4. Plot the centerline and control limits.
5. Plot the k sample means in the order that the samples were produced by the process.

Constructing zone boundaries for the x̄-chart:
Using the 3-sigma control limits:
Upper/lower A-B boundary: x̿ ± (2/3)·A2·R̄
Upper/lower B-C boundary: x̿ ± (1/3)·A2·R̄
Using the estimated standard deviation of x̄, (R̄/d2)/√n:
Upper/lower A-B boundary: x̿ ± 2·(R̄/d2)/√n
Upper/lower B-C boundary: x̿ ± (R̄/d2)/√n
These zone boundaries are used in conjunction with pattern-analysis rules to help determine when a process is out of control.

Pattern-analysis rules (any of the 6 rules being broken suggests an out-of-control process):
Rule 1: One point beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond.
Rule 3: Six points in a row steadily increasing or decreasing.
Rule 4: Fourteen points in a row alternating up and down.
Rule 5: Two out of three points in a row in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond.
Rules 1, 2, 5, and 6 should be applied separately to the upper and lower halves of the control chart; Rules 3 and 4 should be applied to the whole chart.

A Control Chart for Monitoring the Variation of a Process: the R-Chart
The R-chart is used to detect changes in process variation; it plots and monitors the sample ranges, R.
[Figure: R-chart with centerline μ_R and control limits μ_R ± 3σ_R, plotted against sample number.]
To construct the chart, you need
20 samples of a sample size of at least 2:
Centerline: R̄ = (R_1 + R_2 + ... + R_k) / k
Lower control limit: R̄·D3
Upper control limit: R̄·D4
where D3 and D4 are found in a table of control chart constants. When n ≤ 6, there is only an upper control limit.
Summary of R-chart construction:
1. Collect at least 20 samples with sample size n ≥ 2, utilizing a rational subgrouping strategy.
2. Calculate the range for each sample.
3. Calculate the mean of the sample ranges, R̄.
4. Plot the centerline and control limits (when n ≤ 6, there is only an upper control limit).
5. Plot the k sample ranges in the order that the samples were produced by the process.
Constructing zone boundaries for the R-chart:
Upper/lower A-B boundary: R̄ ± 2·d3·(R̄/d2)
Upper/lower B-C boundary: R̄ ± d3·(R̄/d2)
Note: when n ≤ 6, the R-chart has no lower control limit, but lower boundaries can still be plotted if nonnegative. These zone boundaries are used in conjunction with pattern-analysis rules 1-4 to help determine when a process is out of control.

A Control Chart for Monitoring the Proportion of Defectives Generated by a Process: the p-Chart
The p-chart is used to detect changes in the process proportion (of defectives) when the output variable is categorical. As long as the process proportion remains constant, the process is in statistical control.
[Figure: p-chart with centerline p̄ and control limits p̄ ± 3·sqrt(p̄(1 − p̄)/n), plotted against sample number.]
Sample-size determination: choose n such that n > 9(1 − p0)/p0, where n = sample size and p0 = an estimate of the process proportion, p.
Calculations for p-chart construction:
p̂ = (number of defective items in sample) / (number of items in sample)
Centerline: p̄ = (total
number of defective items in all k samples)/(total number of units in all k samples)
Upper control limit: p̄ + 3√(p̄(1 - p̄)/n)
Lower control limit: p̄ - 3√(p̄(1 - p̄)/n)

A Control Chart for Monitoring the Proportion of Defectives Generated by a Process: The p-Chart
Summary of p-Chart Construction

1. Collect at least 20 samples, utilizing a rational subgrouping strategy and an appropriate sample size.
2. Calculate the proportion of defective units for each sample.
3. Plot the centerline and control limits.
4. Plot the k sample proportions on the control chart in the order the samples were produced by the process.

A Control Chart for Monitoring the Proportion of Defectives Generated by a Process: The p-Chart
Constructing Zone Boundaries

Upper A-B boundary: p̄ + 2√(p̄(1 - p̄)/n)
Lower A-B boundary: p̄ - 2√(p̄(1 - p̄)/n)
Upper B-C boundary: p̄ + √(p̄(1 - p̄)/n)
Lower B-C boundary: p̄ - √(p̄(1 - p̄)/n)

Note: when the LCL is negative, it should not be plotted; lower zone boundaries can be plotted if non-negative. These zone boundaries are used in conjunction with pattern-analysis rules 1-4 to help determine when a process is out of control.

Diagnosing the Causes of Variation

If the monitoring phase identifies that problems exist, diagnosis is needed to determine what the problems are. (The figure shows a process diagram: INPUTS feed the process through Phases 1 and 2, Phase 3 -- Diagnose: identify causes of variation -- follows, and the process produces OUTPUTS.)

Cause-and-effect ("fishbone") diagrams are used to assist in process diagnosis. The basic cause-and-effect diagram arranges potential causes on branches labeled, for example, Measurement, Methods, and Materials, all pointing at the effect being studied.

(The figure shows a cause-and-effect diagram applied to a specific problem, "process variation too large," with branches for Training, Maintenance, Methods, and Material; listed causes include no troubleshooting training, poor special instruction, maintenance interval too long, level of responsibility, poor hopper design, weight adjustment, runs four fillers simultaneously, method of adjustment, feeding mechanism, bags not properly positioned, poor top-opening design, particle size too large/small, and too much particle-size variance.)
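As a worked illustration of the monitoring-phase formulas above, the following sketch computes x̄-chart, R-chart, and p-chart control limits in Python. All of the sample data are hypothetical, and the constants A2, D4, d2, and d3 are the standard tabled control-chart constants for samples of size n = 5; none of these values come from the notes themselves.

```python
# Minimal sketch of x-bar-, R-, and p-chart control limits.
# Data are hypothetical; A2, D4, d2, d3 are the standard tabled
# control-chart constants for sample size n = 5.
A2, D4, d2, d3 = 0.577, 2.114, 2.326, 0.864

# --- x-bar chart and R chart (6 samples of n = 5 measurements) ---
samples = [
    [10.2, 9.9, 10.1, 10.0, 9.8],
    [10.1, 10.0, 10.2, 9.9, 10.0],
    [9.8, 10.1, 10.0, 10.2, 10.1],
    [10.0, 9.9, 10.1, 10.1, 10.0],
    [10.2, 10.0, 9.9, 10.0, 10.1],
    [9.9, 10.0, 10.2, 10.0, 9.9],
]
xbars = [sum(s) / len(s) for s in samples]       # sample means
ranges = [max(s) - min(s) for s in samples]      # sample ranges
xbarbar = sum(xbars) / len(xbars)                # mean of sample means
rbar = sum(ranges) / len(ranges)                 # mean of sample ranges

x_ucl = xbarbar + A2 * rbar                      # x-bar chart 3-sigma limits
x_lcl = xbarbar - A2 * rbar
x_upper_ab = xbarbar + (2 / 3) * A2 * rbar       # zone boundaries
x_upper_bc = xbarbar + (1 / 3) * A2 * rbar

r_ucl = D4 * rbar                                # R chart: UCL only, since n <= 6
sigma_r = d3 * (rbar / d2)                       # estimated sigma of R
r_upper_ab = rbar + 2 * sigma_r
r_upper_bc = rbar + sigma_r

# Pattern-analysis Rule 1: any sample mean beyond Zone A?
rule1_violated = any(x > x_ucl or x < x_lcl for x in xbars)

# --- p chart (categorical data: defectives out of n = 400 per sample) ---
n = 400
defectives = [10, 8, 12, 9, 11, 10]
pbar = sum(defectives) / (n * len(defectives))   # centerline
sigma_p = (pbar * (1 - pbar) / n) ** 0.5
p_ucl = pbar + 3 * sigma_p
p_lcl = pbar - 3 * sigma_p                       # not plotted if negative

# Sample-size rule: n should exceed 9(1 - p0)/p0 for a positive LCL
n_large_enough = n > 9 * (1 - pbar) / pbar
```

With this tame data the process looks in control: no sample mean falls beyond Zone A, and because n = 400 satisfies the sample-size rule, the p-chart's lower control limit is positive.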
Capability Analysis

Capability analysis is used when a process is in statistical control but the level of variation is unacceptably high.

A capability analysis diagram is used to assess process capability. This diagram builds on a frequency distribution of a large sample of individual measurements from the process by adding the specification limits and the target value. (The figure shows an example of software output for a process capability analysis of WEIGHT, with the LSL, target, and USL overlaid on a histogram of the measurements, along with within and overall capability statistics.)

From this, there are 2 approaches:
1. Report the percentage of outcomes that fall outside of the specification limits.
2. Construct a capability index, Cp, where

Cp = (specification spread)/(process spread) = (USL - LSL)/(6σ)

Interpretation of Cp:
If Cp = 1, specification spread = process spread; the process is capable.
If Cp > 1, specification spread > process spread; the process is capable.
If Cp < 1, specification spread < process spread; the process is not capable.

If the process follows a normal distribution:
Cp = 1.00 means about 2.7 units per 1,000 will be unacceptable.
Cp = 1.33 means about 63 units per million will be unacceptable.
Cp = 1.67 means about 0.6 units per million will be unacceptable.
Cp = 2.00 means about 2 units per billion will be unacceptable.
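The normal-distribution figures in the Cp interpretation can be checked directly: for a centered normal process, the fraction of output outside the specification limits is 2·P(Z > 3Cp). A minimal sketch, with made-up specification limits and process standard deviation chosen so that Cp = 1:

```python
import math

# For a centered normal process, the fraction of output falling outside the
# specification limits is 2 * P(Z > 3 * Cp) = erfc(3 * Cp / sqrt(2)).
def fraction_outside(cp):
    return math.erfc(3 * cp / math.sqrt(2))

# Hypothetical specification limits and process standard deviation.
usl, lsl, sigma = 10.6, 9.4, 0.2
cp = (usl - lsl) / (6 * sigma)   # specification spread / process spread -> 1.0

# Checking the table above: Cp = 1.00 gives about 0.0027 (2.7 per 1,000),
# Cp = 1.67 about 0.6 per million, Cp = 2.00 about 2 per billion.
for c in (1.00, 1.33, 1.67, 2.00):
    print(c, fraction_outside(c))
```

Note that this calculation assumes the process mean sits exactly at the center of the specification spread; a shifted mean produces more defectives than Cp alone suggests.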