These 29 pages of class notes were uploaded by Orval Funk on Monday, September 28, 2015. The notes belong to STAT101 at the University of Pennsylvania, taught by A. Buja in Fall.
Date Created: 09/28/15
STAT 101 Module 5: Logarithms and Regression

Factor Changes versus Percentage Changes

Example: CEO SoRich of the company OSoPoor has his total compensation increased by 7%. It is rising from 1,000,000 to 1,070,000.

Q: By what factor is total compensation changing?
A: By a factor $1.07 = 1 + 7/100$.

Q: If compensation is lowered by 3%, what is the factor change?
A: The factor is $0.97 = 1 - 3/100$. New compensation: 970,000.

Comment: When people calculate percentage changes, they calculate hundredths and add or subtract them from the amount:
$1{,}000{,}000 + (1{,}000{,}000/100) \cdot \text{perc}$
This, however, is the same as
$1{,}000{,}000 \cdot (1 + \text{perc}/100) = 1{,}000{,}000 \cdot f$
In what follows we will need this last form of expressing changes.

In general: If a quantity $Z$ changes by perc percent, it changes by a factor $f = 1 + \text{perc}/100$. Conversely, if the quantity changes by a factor $f$, it changes by this many percent: $\text{perc} = (f - 1) \cdot 100$.

The change is commonly computed as
$Z_{new} = Z_{old} + (Z_{old}/100) \cdot \text{perc}$,
which is the same as
$Z_{new} = Z_{old} \cdot (1 + \text{perc}/100) = Z_{old} \cdot f$.
As we said earlier, it is this second form that we will need.

Q: Why percentage changes? Why not always factor changes?
A: In business, changes are often small, in the order of a few hundredths. So one keeps track of the hundredths only. Example: Of a change by a factor 1.07 one retains the 7, which is the percentage. It's a sort of universal laziness. Of course it comes to haunt us, because we have to become adept at going back and forth between 7% and 1.07, and between −3% and 0.97.

Practice:
• What is the factor change for a 10 50 02 80 1000 100 200 5 0 1 change?
• What is the percentage change for a change by a factor 17 20 1001 92 15 1000078 999935 5?

Factor/Percent Changes and Logarithms

Basics of logarithms:
• Two types of commonly used logarithms: base 10 and base e = 2.718282 (e is here the transcendental number, not a residual variable).
• The base-10 logarithm is written log10 or $\log_{10}$, the natural (base-e) logarithm $\ln$. JMP, however, uses log for the natural logarithm.
• Logarithms are the inverses of exponentiations: $\ln(e^a) = a$ and $\log_{10}(10^a) = a$. For example, $\log_{10}(1{,}000{,}000) = 6$.
• $\ln(1) = 0$ and $\log_{10}(1) = 0$, because $e^0 = 10^0 = 1$.
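The back-and-forth between percentage changes and factor changes described above can be sketched in a few lines of Python. This is only an illustration: the helper names percent_to_factor and factor_to_percent are hypothetical and not part of the course material or JMP; the formulas are the ones stated in the text, f = 1 + perc/100 and perc = (f − 1) · 100.

```python
import math

def percent_to_factor(perc):
    """Factor f for a change by perc percent: f = 1 + perc/100."""
    return 1 + perc / 100

def factor_to_percent(f):
    """Percentage change for a change by a factor f: perc = (f - 1) * 100."""
    return (f - 1) * 100

# The compensation example from the text: +7% and -3% applied to 1,000,000.
old = 1_000_000
raised = old * percent_to_factor(7)     # about 1,070,000
lowered = old * percent_to_factor(-3)   # about 970,000

# Basic logarithm facts from the text (math.log is the natural logarithm).
log10_million = math.log10(1_000_000)
ln_one = math.log(1)

print(round(raised), round(lowered), log10_million, ln_one)
```

Negative percentages are passed as negative numbers, so a 3% decrease is percent_to_factor(-3).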
The reason for using logarithms: Logarithms transport multiplicative changes to additive changes:
$\ln(Z \cdot f) = \ln(Z) + \ln(f)$
In other words: If a quantity $Z$ changes by a factor $f$, its logarithm $\ln(Z)$ changes by an amount $\ln(f)$. In yet different words: Logarithms transport factor changes to amount changes. (In English, "amount" is something that gets added, whereas "factor" is something that gets multiplied.)

JMP: You can use JMP as a logarithm table or logarithm calculator by forming two columns, one containing the input values, the other being a formula with the logarithm of the input values.
1. Create a blank spreadsheet by clicking the leftmost icon in the second tool bar.
2. Create a column with name f, say: Cols > New Column > Column Name > f
3. Create a column with name lnf: Cols > New Column > Column Name > lnf; Column Properties > Formula > Functions (grouped) > Transcendental > Log (= ln in JMP)
4. Enter numbers in the column f. Their natural logarithms will appear instantly in the column lnf.

Small factor approximation for natural logarithms:
$\ln(1 + \text{perc}/100) \approx \text{perc}/100$
This approximation is quite good for $-10 \le \text{perc} \le +10$. It can also be written as follows:
$\ln(1 + h) \approx h$
which is quite good for $-0.1 \le h \le +0.1$. Try a few values between f = 0.9 and f = 1.1 on the above JMP calculator.

Important: This approximation holds only for the natural logarithm, not the 10-based logarithm.

Check the graph of the natural logarithm: the approximation means that the 45 degree line and the curve of the logarithm hug each other at 1. Technically, the line is the first order Taylor approximation to the natural logarithm at 1. It uses the fact that the derivative of the natural logarithm at 1 is 1, that is, $d\ln(1+h)/dh = 1$ at $h = 0$. This implies $(\ln(1+h) - \ln(1))/h \approx 1$ for small $h$, that is, $\ln(1+h)/h \approx 1$, or $\ln(1+h) \approx h$.

The concavity of the logarithm yields not only an approximation but an inequality as well:
$\ln(1 + h) \le h$, with equality for $h = 0$.

Changes by small percentages: If a quantity $Z$ changes by less than 10% in either direction, we can use the small factor approximation:
$\ln(Z \cdot (1 + \text{perc}/100)) \approx \ln(Z) + \text{perc}/100$
which is quite good for $-10 \le \text{perc} \le +10$. In words: If a quantity $Z$ changes by perc percent, then $\ln(Z)$ changes by an approximate amount perc/100.

Examples:
• If Z changes by +1%, then ln(Z) gets added 0.01.
• If Z changes by +5%, then ln(Z) gets added ____.
• If Z changes by +10%, then ln(Z) gets added ____.
• If Z changes by −1%, then ln(Z) gets added ____.
• If Z changes by −5%, then ln(Z) gets added ____.
• If Z changes by −10%, then ln(Z) gets added ____.

For percent changes greater than +10% or less than −10%, one has to calculate $\ln(1 + \text{perc}/100)$ and add it to $\ln(Z)$.

Examples:
• If Z changes by +15%, then ln(Z) gets added 0.140.
• If Z changes by +50%, then ln(Z) gets added ____.
• If Z changes by −15%, then ln(Z) gets added ____.
• If Z changes by −50%, then ln(Z) gets added ____.
Get the added values to three decimals from the above logarithm calculator in JMP or from your pocket calculator.

Add reverse list of examples: ln(Z) was added 0.01 → Z was multiplied by 1.01 → Z was added 1%.

Factor/Percent Changes and Exponentials

Basics of exponentials:
• Two types of commonly used exponentials: base 10 and base e = 2.718282 (again, e is here the transcendental number, not a residual variable).
• The base-10 exponential is written $10^a$, the base-e exponential is written $e^a$ or $\exp(a)$.
• Exponentiations are the inverses of logarithms: $\exp(\ln(Z)) = 10^{\log_{10}(Z)} = Z$.
• $\exp(0) = 1$ and $10^0 = 1$.

Exponential bases: The natural exponential function $\exp(x)$ can be used to represent exponential functions to any other basis. In particular, $f^x$ can be written
$f^x = \exp(b x)$, where $b = \ln(f)$.
Expressions such as $f^x$ occur naturally in business contexts, for example, when a company grows by a factor $f = 1 + \text{perc}/100$ every year. It also occurs in fixed-income investments where the interest gets compounded monthly or quarterly or yearly. Lastly, it occurs in depreciation when some object loses a fixed percentage in value every year.

Small factor approximation for exponentials: The natural exponential has a slope (derivative) of 1 at 0, and $\exp(0) = 1$. Therefore, the first order Taylor approximation is
$\exp(h) \approx 1 + h$
which is satisfactory for $-0.1 \le h \le +0.1$. As the following graph shows, the exponential is convex,
hence the approximation is actually an inequality: $\exp(h) \ge 1 + h$, with equality for $h = 0$.

[Figure: graph of the exponential function.]

Logarithms and Regression

The purpose of logarithms in regression is to extend the reach of linear regression so it can be used to describe certain types of nonlinear association. To be exact, logarithms can be used to reduce the following three types of nonlinear association to linear associations:

1. Exponential growth/decay/depreciation: This is an association of the form
$y = \exp(b_0 + b_1 x)$
Taking a logarithm on both sides reduces it to a linear association between $\ln(y)$ and $x$:
$\ln(y) = b_0 + b_1 x$

Practical use: In many business applications x is related to time. We then speak of exponential growth or decay, depending on whether $b_1$ is positive or negative. Examples:
• Exponential growth: Revenue of a company might grow by a constant 7% every year. (http://en.wikipedia.org/wiki/Exponential_growth)
• Exponential decay: The customer base of a company might shrink every year by a constant 8%. (http://en.wikipedia.org/wiki/Exponential_decay)
• Exponential depreciation: This term is used when x = age, y = value of assets, and $b_1 < 0$. The value of assets may decrease, for example, by a fixed 15% every year. We are concerned here with empirical/economic depreciation, that is, loss in actual market value; this is different from stylized accounting rules that use, for example, fixed amount depreciation over 5 years, which may have very little to do with actual values of aging assets but are used to satisfy reporting and tax requirements. (http://en.wikipedia.org/wiki/…) We think of the descending exponential as a potentially realistic approximation to actual values as a function of age. Whether this is realistic or not needs to be decided by residual analysis after fitting an exponential to age-value data or, as we will actually do, after fitting a straight line to age-ln(value) data.

Math: The exponential formula above describes constant factor changes as follows: A unit difference in x is associated with a difference in y by a factor $\exp(b_1)$.

Proof: $y = \exp(b_0 + b_1 x) = \exp(b_0) \cdot \exp(b_1)^x = Z \cdot f^x$,
where $Z = \exp(b_0)$ and $f = \exp(b_1)$.

Note: Amount difference in x → Factor difference in y

Approximation: If $b_1$ is small ($-0.1 \le b_1 \le +0.1$), then:
Unit difference in x
→ difference in y by a factor $f = \exp(b_1) \approx 1 + b_1$
→ difference in y is approximately by $100 \cdot b_1$ percent.

Mechanics: Exponential associations are linear associations between x and ln(y). Hence the only difficulty is in creating a ln(y)-variable and interpreting the meaning of the coefficients appropriately.

Restriction: Exponential associations can exist only when y takes on positive values: y > 0.

2. Logarithmically diminishing returns: These are associations of the form
$y = b_0 + b_1 \ln(x)$, with $b_1 > 0$.

[Figure: graph of ln(x) versus x.]

General concept of diminishing returns: In general, diminishing returns are situations where constant increases in x are associated with lesser increases in y for larger x. Mathematically speaking, diminishing returns are present whenever the association between x and y is described by a monotone increasing but concave function, as in the following three examples (http://en.wikipedia.org/wiki/Diminishing_returns):

[Figure: three monotone increasing, concave curves; the middle one is y = ln(x).]

The third is the worst in that y is bounded above, whereas the others at least are not bounded. We are limiting ourselves to the middle case of logarithmically diminishing returns.

Practical use: Diminishing returns frequently arise in business contexts when x measures effort to raise y, such as
• x = advertising expenditure, or
• x = display space allocation for a product in a store,
and the response y is an outcome such as sales of, or profits from, the advertised or displayed product. It is a common experience in such situations that initial effort has considerable effect, but raising the effort encounters diminishing returns, that is, the same increase in effort brings a lesser increase in outcomes (sales) when the effort is large.

Math: Logarithmically diminishing returns describe the association between x and y as follows: A difference in x by a factor f is associated with a difference in y in the amount $b_1 \ln(f)$.

Proof: $y_{new} = b_0 + b_1 \ln(f x) = b_0 + b_1 \ln(x) + b_1 \ln(f) = y_{old} + b_1 \ln(f)$

Note:
Factor difference in x → Amount difference in y

Approximations: It is natural to use a 1% or 10% difference in x for simple interpretation of $b_1$. We'll throw in a 100% difference also, which is not small.
• 1% difference in x → f = 1.01 → ln(f) ≈ 0.01 → difference in y in the amount $b_1/100$
• 10% difference in x → f = 1.1 → ln(f) ≈ 0.1 → difference in y in the amount $b_1/10$
• 100% difference in x → f = 2 → ln(f) ≈ 0.7 → difference in y in the amount $0.7 \cdot b_1$ (ln(2) = 0.693)

Illustration: Assume advertising has a logarithmically diminishing returns relation with sales. If one gets an increase in sales of 1,000,000 by increasing advertising from 100,000 to 200,000, one gets also an increase in sales of 1,000,000 by increasing advertising from 500,000 to 1,000,000. In both cases effort is doubled, resulting in the same amount increase in sales. To make it even more drastic: increasing advertising from 10,000,000 to 20,000,000 would also result in a sales increase of 1,000,000. In this hypothetical illustration, what is $b_1$? From the given information, can you infer $b_0$?

Mechanics: Logarithmically diminishing returns are linear associations between ln(x) and y. Hence the only difficulty is in creating a ln(x)-variable and interpreting the meaning of the coefficients appropriately.

Restriction: Logarithmically diminishing returns can hold only when x takes on only positive values: x > 0. Otherwise the logarithm is not defined.

3. Constant elasticity: This type of association is really a power law:
$y = a \cdot x^b$
The power or elasticity b can be positive or negative. Which of the power law curves below have negative elasticity? The plot shows powers −2, −1, −1/2, 1/2, 1, 2. Annotate it with the correct power.

[Figure: power law curves $x^b$.]

Power laws can be reduced to a linear association by taking logarithms on both sides:
$\ln(y) = \ln(a) + b \ln(x)$
which can be rewritten as
$\ln(y) = b_0 + b_1 \ln(x)$, with $b_0 = \ln(a)$ and $b_1 = b$.
Thus a power law becomes a linear association between ln(x) and ln(y) and can therefore be estimated with a straight line fit.

Math: Constant elasticity (a power law) describes the association
between x and y as follows: A difference in x by a factor f is associated with a difference in y by a factor $f^b$.

Proof: $y_{new} = a (f x)^b = a x^b f^b = y_{old} \cdot f^b$

Note:
Factor difference in x → Factor difference in y
Percent difference in x → Percent difference in y

Terminology: Associations described as factor-to-factor or percent-to-percent mappings are called elasticities. Keep in mind that we are only dealing in constant elasticity. Whether this is a reasonable assumption or not must be checked. For background on elasticity see http://en.wikipedia.org/wiki/Elasticity_%28economics%29

Approximation: It is natural to use a 1% difference in x for simple interpretation of $b = b_1$:
1% difference in x
⇔ difference in x by a factor f = 1.01
⇔ difference in y by a factor $f^b$
⇔ difference in ln(y) by amount $\ln(f^b) = b \ln(f) \approx b/100$
⇔ difference in y by a factor $\approx 1 + b/100$
⇔ difference in y by b%, if $-10 \le b \le 10$.
In short: 1% difference in x → b% difference in y, assuming $|b| \le 10$.

Practical uses:
• Elasticities occur primarily in the relation between x = price and y = quantity sold, with negative b: If the price is raised by 1%, quantity sold changes by b%, assuming $|b| \le 10$. In business and economics, elasticities are almost always negative, and the minus sign is usually not mentioned but implied. Hence an elasticity of 0.8 is meant to mean b = −0.8.
• In a homework we will encounter very different types of elasticities where b > 0: in biology, the association between brain weight and body weight across species.
• It should also be mentioned that for 0 < b < 1 and a > 0, the association is increasing and concave, hence could be used to describe certain types of diminishing returns: a 1% increase in effort (advertising) results in a b% increase in outcomes (sales).

Mechanics: Constant elasticities are linear associations between ln(x) and ln(y). Hence the only difficulty is in creating a ln(x)-variable, a ln(y)-variable, and interpreting the meaning of the coefficients appropriately.

Restriction: Since we take logarithms of both x and y, both must take on only positive values, as is the case for prices
and quantities sold.

Note on causality: In the contexts of depreciation, diminishing returns and elasticity, one assumes that there is a causal relationship between x and y based on economic theory. Hence a change in x is assumed to be responsible for a change in y.

Simplified Summary:
1. Exponential association: If x → x + c, then y → y · f (amount change in x → factor change in y)
2. Logarithmically diminishing returns: If x → x · f, then y → y + c, for f > 1 (factor change in x → amount change in y)
3. Elasticity or elastic association: If x → x · 1.01, then y → y · f (factor change in x → factor change in y)

Exponential Growth: The Internet 1997-2000

The dataset Web Servers 1997-2000.JMP contains the number of web servers for the internet for the booming years of 1997 through 2000. The following plots show the time series twice, on two differently labeled time axes.

[Figure: number of web servers over time, shown twice; one time axis is YrssinceJan97.]

This is clearly a candidate for exponential growth. JMP has two ways of fitting an exponential, neither replacing the other. Hence we will typically run both ways.

1. Analyze > Fit Y by X > select x and y > click little red icon (top left of scatterplot) > Fit Special > Y Transformation: click Natural Logarithm: log(y) > OK

[Figure: WebServers versus YrssinceJan97 with fitted exponential curve.]

Log(WebServers) = 13.408146 + 0.9001777 YrssinceJan97

Summary of Fit
RSquare: 0.995876
Root Mean Square Error: 0.06414
Mean of Response: 15.35853
Observations (or Sum Wgts): 45

Before analyzing the results, we create the second type of JMP output.

2. Create one new column with name lny with the log function in the formula, here ln(WebServers), then: Analyze > Fit Y by X > select x and lny > click little red icon (top left of scatterplot) > Fit Line > OK

[Figure: lnWebServers versus YrssinceJan97 with fitted straight line.]

lnWebServers = 13.408146 + 0.9001777 YrssinceJan97

Summary of Fit
RSquare: 0.995876
Root Mean Square Error: 0.06414
Mean of Response: 15.35853
Observations (or Sum Wgts): 45

The two outputs agree in the numbers even though they differ in the plots. The agreement in numbers stems from the fact that the Fit Special version (1) also fits a straight line to the (x, ln(y)) data and reports those numbers, even though it shows an (x, y) plot.

Interpretation of numeric output:
• Equation, slightly rounded:
lnWebServers = 13.4 + 0.90 YrssinceJan97
Exponentiated, that is, exp applied to both sides:
$\text{WebServers} = \exp(13.4) \cdot \exp(0.90)^{\text{YrssinceJan97}}$
where exp(13.4) = 660,003 and exp(0.90) = 2.46. This means that the estimated number of web servers at the beginning of 1997 is 660,000, and it grew every year on average by a factor 2.46, or 146%. In rounder numbers: beginning of 1997 ≈ 2/3 million servers, growing by almost 150% per year. Note that there are no small change approximations possible here; the growth rate of 146% is anything but small.
• Quality of fit:
  o R Square is 0.996, which is extremely high. Recall that this is the value for the (x, ln(y)) data from the version 2 output.
  o The RMSE is 0.064, which needs careful attention. It means the residuals on the ln(y)-scale are in the amount ±0.064, meaning on the y-scale they are in the order of a factor exp(±0.064) ≈ 1 ± 0.064, or ±6.4%. This is something to wrap your mind around: the RMSE starts out as a ±amount on the ln(y) scale and translates to a ±percentage on the y scale. The second thing to think through is that the simple translation to ±6.4% is due to the small change approximation. It would be messier if the RMSE were something larger like 0.3: exp(±0.3) = 1.35 and 0.74, respectively. This means the residuals on the y scale are within +35% and −26%. Note the asymmetry.

Residuals: The two versions of JMP outputs result in two different residual plots.

[Figure: two residual plots versus YrssinceJan97, left from Fit Special, right from Fit Line.]

1. Fit Special shows the residuals off the curve in the (x, y) data: $e_i = y_i - \exp(b_0 + b_1 x_i)$
2. Fit Line shows
the residuals off the line in the (x, ln(y)) data: $e_i = \ln(y_i) - (b_0 + b_1 x_i)$

Which one is preferable? Surprisingly, it is the second. One should always show the residuals where the line was fitted, here, the (x, ln(y)) data. They translate to factor residuals on the (x, y) scale:
$\exp(e_i) = y_i / \exp(b_0 + b_1 x_i)$
It says that $y_i$ is off of $\exp(b_0 + b_1 x_i)$ by a factor $\exp(e_i)$.

Residual analysis:
1. The left plot shows that in absolute terms the exponential has very small residuals early on and very large ones later.
2. The right plot shows that in percentages the residuals at the beginning are as large as the later ones.
Both plots, the second better than the first, show that there is a systematic pattern in that early and late residuals are positive while middle residuals are negative. We know that this indicates curvature, hence an unsatisfactory residual structure: knowing x, we can know something about the residuals.

Further comments:
• It is surprising that the residuals are unsatisfactory even though the R Square value is huge.
• It seems that the data have even more convexity than the exponential, because the residuals have still largely convex curvature.
• The convex curvature of the residuals ends on the very right hand side, where the residuals bend back down. This bending back down makes a lot of sense because the end of the internet boom started with the economic downturn in March 2000.

Logarithmically diminishing returns of display space

The dataset DisplaySpace.JMP contains data for a chain of liquor stores where a new wine product was launched. The various stores allocated different amounts of display space to promote the new product, and so management of the chain wanted to investigate after the campaign whether there was an association between the amount of display space allocated by the store and the sales figure for the new wine at the store.

[Figure: Sales versus DisplayFeet.]

The initial plot of sales versus display space (in feet of width) seems to indicate an
association of diminishing returns: a one-foot increase in display space seems to produce progressively smaller increases in sales. We try our model of logarithmically diminishing returns.

JMP: Create a new column lnDisplay with formula log(DisplayFeet). Then fit a straight line to Sales versus lnDisplay.

[Figure: Sales versus lnDisplay with fitted straight line.]

Sales = 673.934 + 1505.2329 lnDisplay

Summary of Fit
RSquare: 0.859946
Root Mean Square Error: 380.3813
Mean of Response: 2678.109
Observations (or Sum Wgts): 47

We are not showing the alternative analysis with Fit Special, where one would choose Natural Logarithm: log(x) and leave y untransformed. Practice: Do the analysis with Fit Special on your own and compare.

Interpretation of numeric output:
• Equation, slightly rounded: Sales = 674 + 1505 lnDisplay
• Slope: For a 10% increase in Display space, lnDisplay will increase by ln(1.1) ≈ 0.1, hence Sales will increase on average by about 0.1 · 1505 ≈ 150.
• Intercept: lnDisplay = 0 when Display = 1 ft, in which case the average sales are estimated to be 674.
• R Square is 0.86, which is quite strong.
• The RMSE is 380, which means that the residuals are in the order of about ±400. No problem of interpretation, as we are working on the raw y scale.

Residual analysis: The residual plot should be taken from the (ln(x), y) data, that is, it should be a plot of e versus ln(x).

[Figure: Residual versus lnDisplay.]

The residuals do not look problematic. It would be difficult to argue that x gives us information for predicting residuals.

Summary: The analysis is reasonably successful, and the logarithmically diminishing returns model is quite consistent with the data.

Elasticity between price and demand of parcel service

The dataset Courier.JMP has price-volume data for a type of service called CourierPak in the early days of the Fedex company.

An anecdote: "The concept is interesting and well-formed, but in order to earn better than a 'C,' the idea must be feasible," said the Yale University management professor in response to Fred Smith's paper
proposing reliable overnight delivery service. Smith went on to found Federal Express Corp.

The data are more complex, namely, overlaid with exponential growth. The present version of the data is adjusted for this growth and shows the price-volume elasticity effect alone.

[Figure: Volume versus Price.]

Think of the volume as in thousands and the price as unit price. The data are very noisy, in fact so noisy that we can't see a nonlinearity to indicate a power law. Yet the negative association is what one expects of a price-demand relationship. Out of curiosity, we fit a straight line both to the raw (x, y) data and to the (ln(x), ln(y)) data, the latter for a proper elasticity analysis.

[Figure: Volume versus Price with two fits. Red: Fit Line with raw x and y. Green: Fit Special, choosing both ln(x) and ln(y).]

Linear Fit
Volume = 1826 − 35.680 Price
Summary of Fit
RSquare: 0.407245
Root Mean Square Error: 86.93111
Mean of Response: 1374.269
Observations: 26

Transformed Fit Log to Log
Log(Volume) = 8.05 − 0.327 Log(Price)
Summary of Fit
RSquare: 0.413282
Root Mean Square Error: 0.062924
Mean of Response: 7.222564
Observations: 26

The interpretations are:
• Linear fit (red): For an increase of 1 in unit price, there is on average ____.
• Transformed ln-ln fit (green): For a 1% increase in unit price, there is on average ____.

The RSquare values are imperceptibly different (0.407 versus 0.413), hence there is no statistical basis for distinguishing between the two models. Yet one must report the second analysis because (1) the world speaks elasticity when it comes to price-demand, and (2) even though one refrains from extrapolating, the elasticity equation would extrapolate more sanely because it does not allow prices and volumes to go negative. Why? Practice: Write the elasticity equation as a power law.

By the way, a residual analysis does not reveal any problems. Here are the residuals from Fit Special to the (x, y) data and from Fit Line to the (ln(x), ln(y)) data. As one should expect, the two plots are nearly identical except for the axis labeling.

[Figure: two residual plots, left versus Price, right versus lnPrice.]

Reanalysis of the Diamond data with a power law

Even though the Singapore diamond data do not make an elasticity situation because the association is positive, a power law would be more plausible because it extrapolates more gracefully. Here is a comparison:

[Figure: Price (Singapore dollars) versus Weight (carats), with Linear Fit and Transformed Fit Log to Log.]

Linear Fit
Price (Singapore dollars) = −260 + 3721 Weight (carats)
Summary of Fit
RSquare: 0.978261
RMSE: 31.84052
Mean: 500.0833
Observations: 48

Transformed Fit Log to Log
Log(Price (Singapore dollars)) = 8.57 + 1.498 Log(Weight (carats))
Summary of Fit
RSquare: 0.970542
RMSE: 0.068544
Mean: 6.134642
Observations: 48

The interpretation of the slope of the power law is that a 1% increase in weight is associated on average with about a 1.5% increase in price. This seems very meaningful because it expresses the increased value of large stones due to rarity. On the other hand, the R Square does not speak in favor of the power law: while the difference between 0.978 and 0.971 seems small, reexpressed as variance unexplained this is a more formidable difference: 1 − 0.978 = 2.2% versus 1 − 0.971 = 2.9%. Therefore, on the range of the data the straight line fit is slightly better, yet for extrapolation the power law is better.

The residuals and the RMSE of the power law have the same interpretation as for the exponential model; all that matters is the logarithm of the response ln(y):
$e \approx \ln(y) - (8.6 + 1.5 \ln(x))$, hence $\exp(e) \approx y / \exp(8.6 + 1.5 \ln(x))$,
which means that the exponentials of residuals are the ratios of observed to predicted. Correspondingly, the RMSE of about 0.07 implies that the residuals e are in the order of ±0.07, hence the actual prices differ from the predictions by factors around exp(±0.07) ≈ 1 ± 0.07, or in the order of ±7%.

Practice: Reanalyze the relation between Weight and MPG (Highway and City) in the CarModels2003-4.JMP dataset using a power law. The interpretation will be one of elasticity.
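For readers working outside JMP, the mechanics of a power-law (constant elasticity) fit can be sketched as follows: regress ln(y) on ln(x) with ordinary least squares and read the slope as the elasticity b. This is a minimal sketch under stated assumptions, not the course's procedure: the data are synthetic and generated without noise from y = a · x^b with assumed values a = 5000 and b = 1.5 (they are not the diamond or car data), and fit_line is a hypothetical helper, not a JMP function. The last lines translate an RMSE of 0.07 on the ln(y) scale into factors on the y scale, as described in the text.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for ys = b0 + b1 * xs; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Synthetic, noise-free power law y = a * x**b (assumed values, illustration only).
a, b = 5000.0, 1.5
xs = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
ys = [a * x ** b for x in xs]

# Straight line fit to (ln x, ln y): slope estimates b, intercept estimates ln(a).
b0, b1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
a_hat = math.exp(b0)

# Residuals on the ln(y) scale translate into factors on the y scale.
rmse_log = 0.07                     # RMSE quoted in the text for the power-law fit
factor_up = math.exp(rmse_log)      # about 1.07, roughly +7%
factor_down = math.exp(-rmse_log)   # about 0.93, roughly -7%

print(round(b1, 3), round(a_hat, 1), round(factor_up, 3), round(factor_down, 3))
```

With real, noisy data the recovered slope would only approximate the elasticity, and the residuals ln(y) minus the fitted line should be plotted against ln(x) as in the analyses above.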