by: Agustin O'Connell


STAT 231B: Statistics for Biological Sciences, University of California Riverside
These 47 pages of class notes were uploaded by Agustin O'Connell on Thursday, October 29, 2015. They belong to STAT 231B at the University of California Riverside, taught by Xinping Cui in the Fall.

Date Created: 10/29/15
MULTIPLE REGRESSION I

Regression analysis examines the relation between a single dependent variable $Y$ and one or more independent variables $X_1, \dots, X_p$.

SIMPLE LINEAR REGRESSION MODELS

First-Order Model with One Predictor Variable

$$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i, \qquad i = 1, \dots, n$$

- $\beta_0$ is the intercept of the line and $\beta_1$ is its slope: a one-unit increase in $X$ gives a $\beta_1$-unit increase in the mean response $E(Y)$.
- $\varepsilon_i$ is the statistical random error for the $i$th observation $Y_i$. It accounts for the fact that the statistical model does not give an exact fit to the data, and it cannot be observed. We assume
  - $E(\varepsilon_i) = 0$;
  - $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all $i = 1, \dots, n$;
  - $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \ne j$.

Response/Regression Function and an Example

$$E(Y_i) = \beta_0 + \beta_1 X_{i1}$$

Example (hours $Y$ versus number of bids prepared $X$): $E(Y) = 9.5 + 2.1X$. [Figure: hours versus number of bids prepared, with the regression line.] The response $Y_i$, given the level $X_i$ of $X$ in the $i$th trial, comes from a probability distribution whose mean is $9.5 + 2.1X_i$. Therefore the response (regression) function relates the means of the probability distributions of $Y$ for given $X$ to the level of $X$.

$\beta_0$ and $\beta_1$ are unknown parameters and the errors $\varepsilon_i$ are unobservable, so they have to be estimated from data. We use $b_0$, $b_1$ and $e_i$ for the estimates of $\beta_0$, $\beta_1$ and $\varepsilon_i$.

Fitted Values

$\hat Y_i = b_0 + b_1 X_{i1}$ is called the fitted value.

Residual and an Example

$e_i = Y_i - \hat Y_i$ is called the residual. For the observation with $X_i = 45$ and $Y_i = 108$, the (unobservable) error is $\varepsilon_i = 108 - (9.5 + 2.1 \times 45) = 108 - 104 = 4$; the residual $e_i$ is the corresponding deviation from the fitted line rather than from the true regression line.

- The residuals are observable and can be used to check the assumptions on the statistical random errors $\varepsilon_i$.
- Points above the line have positive residuals and points below the line have negative residuals.
- A line that fits the data well has small residuals.

We want the residuals to be small in magnitude, because large negative residuals are as bad as large positive residuals. Therefore we cannot simply require $\sum e_i = 0$: in fact, $\sum e_i = 0$ for any line through $(\bar X, \bar Y)$, however badly it fits (derivation on board). Two immediate solutions:

- require $\sum |e_i|$ to be small;
- require $\sum e_i^2$ to be small.

We consider the second option because working with squares is mathematically easier than working with absolute values; for example, it is easier to take derivatives.
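Least Squares Estimates

Find the pair $(b_0, b_1)$ that minimizes $\sum e_i^2$ (the least squares estimates).

- We call $SSE = \sum e_i^2$ the residual (error) sum of squares.
- We want to find the pair $(b_0, b_1)$ that minimizes
$$SSE = \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2 .$$
- We set the partial derivatives of $SSE$ with respect to $b_0$ and $b_1$ equal to zero.

Normal equation 1: $\dfrac{\partial SSE}{\partial b_0} = \sum -2(Y_i - b_0 - b_1 X_i) = 0 \;\Rightarrow\; \sum (Y_i - b_0 - b_1 X_i) = 0$

Normal equation 2: $\dfrac{\partial SSE}{\partial b_1} = \sum -2X_i(Y_i - b_0 - b_1 X_i) = 0 \;\Rightarrow\; \sum X_i(Y_i - b_0 - b_1 X_i) = 0$

Then the solution is (derivation on board)
$$b_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}, \qquad b_0 = \bar Y - b_1 \bar X .$$

Properties of the Residuals

- $\sum e_i = 0$, since the regression line goes through the point $(\bar X, \bar Y)$.
- $\sum X_i e_i = 0$ and $\sum \hat Y_i e_i = 0$: the residuals are uncorrelated with the independent variable $X$ and with the fitted values $\hat Y$ (prove it on board).

Proof: normal equations 1 and 2 state directly that $\sum e_i = 0$ and $\sum X_i e_i = 0$. Since $\sum \hat Y_i e_i = \sum (b_0 + b_1 X_i) e_i = b_0 \sum e_i + b_1 \sum X_i e_i$, it follows that $\sum \hat Y_i e_i = 0$.

- The least squares estimates are uniquely defined as long as the values of the independent variable are not all identical. If they are all identical, the denominator $\sum (X_i - \bar X)^2 = 0$ (draw figure).

Point Estimation of the Error Variance $\sigma^2$

- Single population: for $Y_i = \mu + \varepsilon_i$, $i = 1, \dots, n$, with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$, the unbiased sample variance estimator of the population variance is
$$s^2 = \frac{\sum (Y_i - \bar Y)^2}{n - 1}, \qquad E(s^2) = \sigma^2 .$$
- Regression model: for $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \dots, n$, with $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, the unbiased estimator is
$$MSE = \frac{SSE}{n - 2}, \qquad E(MSE) = \sigma^2 .$$

Inference in Regression Analysis

Estimator of $\beta_1$: $b_1$.
- Mean: $E(b_1) = \beta_1$.
- Variance: $\sigma^2(b_1) = \sigma^2 / SS_{XX}$, where $SS_{XX} = \sum (X_i - \bar X)^2$.
- Estimated variance: $s^2(b_1) = MSE / SS_{XX}$.

t score and $1 - \alpha$ confidence interval for $\beta_1$:
$$\frac{b_1 - \beta_1}{s(b_1)} \sim t(n-2), \qquad b_1 \pm t(1 - \alpha/2;\, n-2)\, s(b_1), \qquad s(b_1) = \sqrt{MSE / SS_{XX}} .$$

Estimating the Mean Value at $X_h$
- Estimator: $\hat Y_h = b_0 + b_1 X_h$.
- Mean: $E(\hat Y_h) = E(Y_h)$.
- Variance: $\sigma^2(\hat Y_h) = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar X)^2}{SS_{XX}} \right]$.
- Estimated variance: $s^2(\hat Y_h) = MSE \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar X)^2}{SS_{XX}} \right]$.

Analysis of Variance Approach to Regression Analysis

$$SSTOT = SSR + SSE: \qquad \sum (Y_i - \bar Y)^2 = \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2$$

Basic table:
$$\begin{array}{lllll}
\text{Source of Variation} & SS & df & MS & E(MS) \\
\text{Regression} & SSR = \sum (\hat Y_i - \bar Y)^2 & 1 & MSR = SSR/1 & \sigma^2 + \beta_1^2 \sum (X_i - \bar X)^2 \\
\text{Error} & SSE = \sum (Y_i - \hat Y_i)^2 & n - 2 & MSE = SSE/(n-2) & \sigma^2 \\
\text{Total} & SSTOT = \sum (Y_i - \bar Y)^2 & n - 1 & &
\end{array}$$

- Test statistic: $F^* = MSR / MSE$.
- F distribution: under $H_0$, $F^* \sim F(1, n-2)$, with numerator degrees of freedom $1$ and denominator degrees of freedom $n - 2$.
- Tail probability: $P\{F > F(1 - \alpha;\, 1, n-2)\} = \alpha$.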
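A minimal sketch of the least squares formulas, the residual properties and the ANOVA F statistic above, in plain Python. The data are hypothetical (made up for illustration), and the helper name `fit_simple_ols` is ours, not from the notes. It also confirms numerically that in simple linear regression $F^* = (b_1 / s(b_1))^2$.

```python
import math

def fit_simple_ols(x, y):
    """Least squares fit of y = b0 + b1*x, with residuals, MSE, s(b1) and F* = MSR/MSE."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("all x values identical: b1 is not uniquely defined")
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sse = sum(e ** 2 for e in resid)
    ssr = sum((fi - ybar) ** 2 for fi in fitted)
    mse = sse / (n - 2)                # unbiased estimator of sigma^2
    s_b1 = math.sqrt(mse / ss_xx)      # estimated standard error of b1
    f_star = (ssr / 1) / mse           # MSR / MSE with 1 and n - 2 df
    return {"b0": b0, "b1": b1, "fitted": fitted, "resid": resid,
            "sse": sse, "ssr": ssr, "mse": mse, "s_b1": s_b1, "f": f_star}

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_simple_ols(x, y)

# Residual properties: sum(e) = 0 and sum(X*e) = 0, up to rounding error
print(sum(fit["resid"]), sum(xi * e for xi, e in zip(x, fit["resid"])))
# F* and t*^2 agree in simple linear regression
print(fit["f"], (fit["b1"] / fit["s_b1"]) ** 2)
```

The check `ss_xx == 0` mirrors the remark that the estimates are undefined when all $X_i$ are identical.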
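- Hypotheses: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \ne 0$.
- Decision rule, for level of significance $\alpha$: if $F^* \le F(1 - \alpha;\, 1, n-2)$, conclude $H_0$; if $F^* > F(1 - \alpha;\, 1, n-2)$, conclude $H_1$.

Fitting a Regression in SAS

    data Toluca;
      infile 'C:\stat231B06\ch01ta01.txt';
      input lotsize workhrs;
    proc reg;   /* least squares estimation of the regression coefficients */
      model workhrs = lotsize;
      output out=results p=yhat r=residual;   /* yhat = fitted values, residual = residuals */
    run;
    proc print data=results;
      var yhat residual;   /* print the fitted values and residuals */
    run;

SAS Output

    The REG Procedure
    Model: MODEL1
    Dependent Variable: workhrs

    Number of Observations Read    25
    Number of Observations Used    25

                             Analysis of Variance
                                Sum of          Mean
    Source             DF      Squares        Square    F Value    Pr > F
    Model               1       252378        252378     105.88    <.0001
    Error              23        54825    2383.71562
    Corrected Total    24       307203

    Root MSE          48.82331    R-Square    0.8215
    Dependent Mean   312.28000    Adj R-Sq    0.8138
    Coeff Var         15.63447

                          Parameter Estimates
                       Parameter     Standard
    Variable    DF      Estimate        Error    t Value    Pr > |t|
    Intercept    1      62.36586     26.17743       2.38      0.0259
    lotsize      1       3.57020      0.34697      10.29      <.0001

    Obs      yhat    residual
      1   347.982      51.018
      2   169.472     -48.472

Simple Linear Regression Model in Matrix Terms

The simple linear model
$$Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1, \quad Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2, \quad \dots, \quad Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n$$
in matrix terms is
$$\underset{n\times 1}{\mathbf Y} = \underset{n\times 2}{\mathbf X}\,\underset{2\times 1}{\boldsymbol\beta} + \underset{n\times 1}{\boldsymbol\varepsilon},$$
where $\boldsymbol\varepsilon$ is a vector of independent normal random variables with $E(\boldsymbol\varepsilon) = \mathbf 0$ and $\boldsymbol\sigma^2(\boldsymbol\varepsilon) = \sigma^2 \mathbf I$. This extends directly to the multiple regression model.

Least Squares Estimation of Regression Parameters

Normal equations as simultaneous linear equations:
$$n b_0 + b_1 \sum X_i = \sum Y_i, \qquad b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i .$$
In matrix terms: $\underset{2\times 2}{\mathbf X'\mathbf X}\;\underset{2\times 1}{\mathbf b} = \underset{2\times 1}{\mathbf X'\mathbf Y}$.

Estimated regression coefficients: $(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf X\,\mathbf b = (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf Y$, so $\mathbf b = (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf Y$ (also for the multiple regression model).

Example: Toluca Company
$$\mathbf Y = \begin{bmatrix} 399 \\ 121 \\ \vdots \\ 323 \end{bmatrix}, \qquad \mathbf X = \begin{bmatrix} 1 & 80 \\ 1 & 30 \\ \vdots & \vdots \\ 1 & 70 \end{bmatrix}$$
$$\mathbf X'\mathbf X = \begin{bmatrix} 25 & 1{,}750 \\ 1{,}750 & 142{,}300 \end{bmatrix}, \qquad \mathbf X'\mathbf Y = \begin{bmatrix} 7{,}807 \\ 617{,}180 \end{bmatrix}$$
$$(\mathbf X'\mathbf X)^{-1} = \begin{bmatrix} .287475 & -.003535 \\ -.003535 & .00005051 \end{bmatrix}, \qquad \mathbf b = (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf Y = \begin{bmatrix} 62.37 \\ 3.5702 \end{bmatrix}$$

Fitted Values and Residuals

Fitted values: $\underset{n\times 1}{\hat{\mathbf Y}} = \underset{n\times 2}{\mathbf X}\,\underset{2\times 1}{\mathbf b}$. For Toluca,
$$\hat{\mathbf Y} = \begin{bmatrix} 1 & 80 \\ 1 & 30 \\ \vdots & \vdots \\ 1 & 70 \end{bmatrix}\begin{bmatrix} 62.37 \\ 3.5702 \end{bmatrix} = \begin{bmatrix} 347.98 \\ 169.47 \\ \vdots \\ 312.28 \end{bmatrix}$$

Hat matrix: $\hat{\mathbf Y} = \mathbf X(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf Y = \mathbf H\mathbf Y$, where $\mathbf H = \mathbf X(\mathbf X'\mathbf X)^{-1}\mathbf X'$.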
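The Toluca coefficients can be reproduced from the five sums in $\mathbf X'\mathbf X$ and $\mathbf X'\mathbf Y$ alone. This plain-Python sketch (our own helper `solve_normal_equations_2x2`, no libraries) solves the normal equations through the explicit $2 \times 2$ inverse and, using MSE from the SAS output and the standard result $\mathbf s^2(\mathbf b) = MSE(\mathbf X'\mathbf X)^{-1}$, recovers the standard errors SAS printed.

```python
import math

def solve_normal_equations_2x2(xtx, xty):
    """Solve (X'X) b = X'Y for a 2x2 system via the explicit inverse."""
    (a, b), (c, d) = xtx
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    coef = [inv[0][0] * xty[0] + inv[0][1] * xty[1],
            inv[1][0] * xty[0] + inv[1][1] * xty[1]]
    return inv, coef

# Toluca Company sums from the notes: n = 25, sum X = 1750, sum X^2 = 142300,
# sum Y = 7807, sum XY = 617180
xtx = [[25.0, 1750.0], [1750.0, 142300.0]]
xty = [7807.0, 617180.0]
inv, b = solve_normal_equations_2x2(xtx, xty)

mse = 2383.71562                    # Error mean square from the SAS output
s_b0 = math.sqrt(mse * inv[0][0])   # diagonal of s^2(b) = MSE (X'X)^{-1}
s_b1 = math.sqrt(mse * inv[1][1])
print([round(v, 5) for v in b], round(s_b0, 5), round(s_b1, 5))
```

For larger $p$ one would solve the system with a linear-algebra routine rather than an explicit inverse; the $2 \times 2$ case keeps the arithmetic visible.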
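- The fitted values are a linear combination of the observed values $\mathbf Y$.
- The hat matrix depends solely on the predictor variable $\mathbf X$.
- $\mathbf H$ is called the hat matrix because it puts the hat (caret) on $\mathbf Y$. It is square and symmetric and has the special property of idempotency: $\mathbf H\mathbf H = \mathbf H$.

Residuals
$$\underset{n\times 1}{\mathbf e} = \mathbf Y - \hat{\mathbf Y} = \mathbf Y - \mathbf X\mathbf b = \mathbf Y - \mathbf H\mathbf Y = (\mathbf I - \mathbf H)\mathbf Y$$

Variance-Covariance Matrix of Residuals
$$\boldsymbol\sigma^2(\mathbf e) = \sigma^2(\mathbf I - \mathbf H), \qquad \mathbf s^2(\mathbf e) = MSE(\mathbf I - \mathbf H)$$
Derivation:
$$\boldsymbol\sigma^2(\mathbf e) = (\mathbf I - \mathbf H)\,\boldsymbol\sigma^2(\mathbf Y)\,(\mathbf I - \mathbf H)' = (\mathbf I - \mathbf H)\,\sigma^2\mathbf I\,(\mathbf I - \mathbf H)' = \sigma^2(\mathbf I - \mathbf H)(\mathbf I - \mathbf H) = \sigma^2(\mathbf I - \mathbf H)$$
The third step uses the symmetry of $\mathbf I - \mathbf H$, and the last step follows from its idempotency.

Analysis of Variance Results

Sums of squares, with $\mathbf J$ the $n \times n$ matrix of ones:
$$SSTOT = \mathbf Y'\mathbf Y - \tfrac{1}{n}\mathbf Y'\mathbf J\mathbf Y, \qquad SSE = \mathbf Y'\mathbf Y - \mathbf b'\mathbf X'\mathbf Y, \qquad SSR = \mathbf b'\mathbf X'\mathbf Y - \tfrac{1}{n}\mathbf Y'\mathbf J\mathbf Y$$

Degrees of freedom: the number of independent components that are needed to calculate the respective sum of squares ($p$ denotes the number of regression parameters).

1. $SSTOT = \sum (Y_i - \bar Y)^2$ is the sum of $n$ squared components. However, since $\sum (Y_i - \bar Y) = 0$, only $n - 1$ components are needed for its calculation; the remaining one can always be calculated from $Y_n - \bar Y = -\sum_{i=1}^{n-1} (Y_i - \bar Y)$. Hence $SSTOT$ has $n - 1$ degrees of freedom.
2. $SSE = \sum e_i^2$ is the sum of $n$ squared components. However, there are $p$ restrictions among the residuals, coming from the $p$ normal equations: $\underset{p\times p}{\mathbf X'\mathbf X}\,\underset{p\times 1}{\mathbf b} = \mathbf X'\mathbf Y \Rightarrow \mathbf X'(\mathbf Y - \mathbf X\mathbf b) = \mathbf X'\mathbf e = \mathbf 0$. Hence $SSE$ has $n - p$ degrees of freedom.
3. $SSR = \sum (\hat Y_i - \bar Y)^2$: all fitted values $\hat Y_i$ are calculated from the same estimated regression line, which is determined by $p$ coefficients, so $SSR$ has $p$ independent components. One degree of freedom is lost because of the restriction $\sum (\hat Y_i - \bar Y) = \sum (\hat Y_i - Y_i) + \sum (Y_i - \bar Y) = 0$. Hence $SSR$ has $p - 1$ degrees of freedom.

Inferences in Regression Analysis: Regression Coefficients
$$\underset{2\times 2}{\boldsymbol\sigma^2(\mathbf b)} = \sigma^2(\mathbf X'\mathbf X)^{-1} = \begin{bmatrix} \dfrac{\sigma^2}{n} + \dfrac{\bar X^2 \sigma^2}{SS_{XX}} & -\dfrac{\bar X \sigma^2}{SS_{XX}} \\ -\dfrac{\bar X \sigma^2}{SS_{XX}} & \dfrac{\sigma^2}{SS_{XX}} \end{bmatrix}, \qquad \mathbf s^2(\mathbf b) = MSE(\mathbf X'\mathbf X)^{-1}$$

Example (Toluca):
$$\mathbf s^2(\mathbf b) = \begin{bmatrix} 685.34 & -8.428 \\ -8.428 & .12040 \end{bmatrix}$$

Derivation of the Variance-Covariance Matrix of the Least Squares Coefficients

$\mathbf b = (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf Y = \mathbf A\mathbf Y$, with $\mathbf A = (\mathbf X'\mathbf X)^{-1}\mathbf X'$ and $\mathbf A' = \mathbf X(\mathbf X'\mathbf X)^{-1}$. Since $\boldsymbol\sigma^2(\mathbf Y) = \sigma^2\mathbf I$,
$$\boldsymbol\sigma^2(\mathbf b) = \mathbf A\,\boldsymbol\sigma^2(\mathbf Y)\,\mathbf A' = (\mathbf X'\mathbf X)^{-1}\mathbf X'\,\sigma^2\mathbf I\,\mathbf X(\mathbf X'\mathbf X)^{-1} = \sigma^2(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf X(\mathbf X'\mathbf X)^{-1} = \sigma^2(\mathbf X'\mathbf X)^{-1}$$

Mean Response

With $\mathbf X_h = \begin{bmatrix} 1 \\ X_h \end{bmatrix}$:
$$\hat Y_h = \mathbf X_h'\mathbf b, \qquad \sigma^2(\hat Y_h) = \mathbf X_h'\,\boldsymbol\sigma^2(\mathbf b)\,\mathbf X_h = \sigma^2\,\mathbf X_h'(\mathbf X'\mathbf X)^{-1}\mathbf X_h, \qquad s^2(\hat Y_h) = MSE\,\mathbf X_h'(\mathbf X'\mathbf X)^{-1}\mathbf X_h$$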
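A small numerical check of the hat-matrix facts above (symmetry, idempotency, $\mathbf e = (\mathbf I - \mathbf H)\mathbf Y$, $\mathbf X'\mathbf e = \mathbf 0$), written with plain Python lists and hypothetical data. The trace check $\mathrm{tr}(\mathbf H) = p$ is a standard property not stated in the notes; it is consistent with the $n - p$ residual degrees of freedom.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical design matrix (n = 4, p = 2) and response, for illustration only
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]]
Y = [[3.0], [4.0], [8.0], [9.0]]

Xt = transpose(X)
H = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)   # H = X (X'X)^{-1} X'
n = len(X)
I_minus_H = [[(1.0 if i == j else 0.0) - H[i][j] for j in range(n)] for i in range(n)]
e = matmul(I_minus_H, Y)                                # residuals e = (I - H) Y

symmetric = all(abs(H[i][j] - H[j][i]) < 1e-12 for i in range(n) for j in range(n))
HH = matmul(H, H)
idempotent = all(abs(HH[i][j] - H[i][j]) < 1e-12 for i in range(n) for j in range(n))
trace_H = sum(H[i][i] for i in range(n))                # equals p
Xte = matmul(Xt, e)                                     # X'e = 0: the p normal equations
print(symmetric, idempotent, trace_H, Xte)
```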
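LOGISTIC REGRESSION, POISSON REGRESSION AND GENERALIZED LINEAR MODELS

We have seen that a continuous response $Y$ can depend on continuous or discrete variables $X_1, X_2, \dots, X_{p-1}$. However, a dichotomous (binary) outcome is the most common situation in biology and epidemiology.

Example: in a longitudinal study of coronary heart disease as a function of age, the response variable $Y$ was defined to have two possible outcomes: the person developed heart disease during the study, or the person did not develop heart disease during the study. These outcomes may be coded 1 and 0, respectively.

[Figure: logistic regression, age and signs of coronary heart disease (CD).]
[Table: prevalence of signs of CD according to age group (20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89), giving the number in each group and the number diseased.]

The simple linear regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $Y_i \in \{0, 1\}$. The response function: $E(Y_i) = \beta_0 + \beta_1 X_i$.

We view $Y_i$ as a random variable with a Bernoulli distribution with parameter $\pi_i$:
$$P(Y_i = 1) = \pi_i, \qquad P(Y_i = 0) = 1 - \pi_i, \qquad P(Y_i = k) = \pi_i^k (1 - \pi_i)^{1-k}, \quad k = 0, 1,$$
$$E(Y_i) = 1 \cdot \pi_i + 0 \cdot (1 - \pi_i) = \pi_i .$$

Special Problems When the Response Variable Is Binary

1. Nonnormal error terms. When $Y_i = 1$: $\varepsilon_i = 1 - \beta_0 - \beta_1 X_i$. When $Y_i = 0$: $\varepsilon_i = -\beta_0 - \beta_1 X_i$. Since each $\varepsilon_i$ can take only two values, we cannot assume that the $\varepsilon_i$ are normally distributed.
2. Nonconstant error variance. $\sigma^2(\varepsilon_i) = (\beta_0 + \beta_1 X_i)(1 - \beta_0 - \beta_1 X_i) = \pi_i(1 - \pi_i)$ depends on $X_i$, so ordinary least squares is no longer optimal.
3. Constraints on the response function: $0 \le E(Y_i) \le 1$.

What does $E(Y_i)$ mean? $E(Y_i) = \beta_0 + \beta_1 X_i = \pi_i$ is the probability that $Y_i = 1$ when the level of the predictor variable is $X_i$. This interpretation applies whether the response function is a simple linear one, as shown above, or a complex multiple regression one.

The Logistic Function

[Figure: probability of disease, $P(y = 1)$, versus $x$, following a logistic curve.]

$$P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

Both theoretical and empirical results suggest that when the response variable is binary, the shape of the response function is either a tilted S or a reverse tilted S. The logistic function has this shape, and its logit transformation is linear in $x$:
$$\ln\!\left[\frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)}\right] = \beta_0 + \beta_1 x \qquad \text{(logit function)}$$

Simple Logistic Regression

1. Model: $Y_i = E(Y_i) + \varepsilon_i$, where the $Y_i$ are independent Bernoulli random variables with
$$E(Y_i) = \pi_i = \frac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)}, \qquad \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i \quad \text{(logit function)}, \qquad i = 1, \dots, n .$$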
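The logistic and logit functions above are inverses of each other. A short Python sketch (function names are ours; the coefficients are hypothetical) computes the S-shaped probability and shows that the logit recovers the linear predictor $\beta_0 + \beta_1 x$.

```python
import math

def logistic(eta):
    """P(y = 1 | x) = exp(eta) / (1 + exp(eta)), computed without overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def logit(p):
    """ln[p / (1 - p)], the inverse of the logistic function for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients beta0 = -3, beta1 = 0.2 (illustration only)
beta0, beta1 = -3.0, 0.2
for x in [0, 10, 15, 20, 30]:
    p = logistic(beta0 + beta1 * x)
    print(x, round(p, 4), round(logit(p), 4))   # logit(p) equals beta0 + beta1 * x
```

The two branches in `logistic` only avoid overflow in `math.exp`; both compute the same function.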
2. How to estimate $\beta_0$ and $\beta_1$

a. Likelihood function. Since the $Y_i$ observations are independent, their joint probability function is
$$g(Y_1, \dots, Y_n) = \prod_{i=1}^n \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i} .$$
The logarithm of the joint probability function (log-likelihood function) is
$$\ln L(\beta_0, \beta_1) = \ln g(Y_1, \dots, Y_n) = \sum_{i=1}^n Y_i \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) + \sum_{i=1}^n \ln(1 - \pi_i) = \sum_{i=1}^n Y_i(\beta_0 + \beta_1 X_i) - \sum_{i=1}^n \ln\!\left[1 + \exp(\beta_0 + \beta_1 X_i)\right] .$$

b. Maximum likelihood estimation. [Figure: log-likelihood as a function of the parameters, with its maximum.] The maximum likelihood estimates of $\beta_0$ and $\beta_1$ in the simple logistic regression model are the values of $\beta_0$ and $\beta_1$ that maximize the log-likelihood function. However, no closed-form solution exists for these values. Several computer-intensive numerical search procedures are widely used to find the maximum likelihood estimates $b_0$ and $b_1$. We shall rely on standard statistical software programs specifically designed for logistic regression to obtain them.

3. Fitted logit response function:
$$\ln\!\left(\frac{\hat\pi_i}{1 - \hat\pi_i}\right) = b_0 + b_1 X_i$$

4. Interpretation of $b_1$. When $X = X_j$: $\ln\!\left(\dfrac{\hat\pi}{1 - \hat\pi}\right) = b_0 + b_1 X_j$, so $\text{odds}_1 = e^{b_0 + b_1 X_j}$. When $X = X_j + 1$: $\text{odds}_2 = e^{b_0 + b_1 (X_j + 1)}$. Hence
$$OR = \frac{\text{odds}_2}{\text{odds}_1} = e^{b_1}, \qquad \ln OR = b_1,$$
so $b_1$ is the increase in the log-odds for a one-unit increase in $X$.

Example (programming task): $Y = 1$ if the task was finished, $0$ if the task was not finished; $X$ = months of programming experience.

    Person   Months of      Task         Fitted    Deviance
      i      Experience X   Success Y    Value     Residual
      1          14             0         0.310     -0.862
      2          29             0         0.835     -1.899
      3           6             0         0.110     -0.483
     ...
     23          28             1         0.812      0.646
     24          22             1         0.621      0.976
     25           8             1         0.146      1.962

SAS code:

    proc logistic data=ch14ta01;
      model y(event='1') = x;
    run;

[SAS output: Analysis of Maximum Likelihood Estimates, and Odds Ratio Estimates with the point estimate and 95% Wald confidence limits for x.]

Notice that we can specify which event to model using the event= option in the model statement. The other way to model event 1 instead of 0 is to use the descending option in the proc logistic statement.
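The notes rely on SAS for the numerical search. As a hedged illustration of one such procedure, here is Newton-Raphson applied to the two-parameter log-likelihood above, in plain Python. The data are hypothetical (not the programming-task data), and `fit_logistic_newton` is our own name, not a SAS routine. The last line checks the odds-ratio reading of the reported estimate $b_1 = 0.1615$.

```python
import math

def fit_logistic_newton(x, y, iters=50, tol=1e-10):
    """Maximize sum Y_i (b0 + b1 X_i) - sum ln(1 + exp(b0 + b1 X_i)) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        pis = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - p for yi, p in zip(y, pis))
        g1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pis))
        # Information matrix X'WX with W = diag(pi (1 - pi))
        w = [p * (1.0 - p) for p in pis]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: beta_new = beta_old + I^{-1} g
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (-i01 * g0 + i00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data (months of experience, task finished?), for illustration only
x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic_newton(x, y)
print(round(b0, 4), round(b1, 4), round(math.exp(b1), 4))   # exp(b1) = estimated odds ratio

# Odds-ratio reading of the reported programming-task estimate b1 = 0.1615
print(round(math.exp(0.1615), 3))
```

Newton-Raphson needs data that are not perfectly separated (some overlap of 0s and 1s along $X$); otherwise the maximum likelihood estimates do not exist.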
How do we use the output to calculate $\hat\pi_i$? How do we interpret $\widehat{OR}$?

Interpretation of the odds ratio: $\widehat{OR} = 1.175$ means that the odds of completing the task increase by 17.5 percent with each additional month of experience.

Interpretation of $b_1$: $b_1 = 0.1615$ means that the log-odds of completing the task increase by 0.1615 with each additional month of experience.

4. Repeat Observations: Binomial Outcomes

In some cases, particularly for designed experiments, a number of repeat observations are obtained at several levels of the predictor variable $X$. For example, in a study of the effectiveness of coupons offering a price reduction on a given product, 1,000 homes were selected at random. The coupons offered different price reductions (5, 10, 15, 20, and 30 dollars), and 200 homes were assigned at random to each of the price reduction categories.

    Level   Price        Number of     Number of    Proportion of   Model-Based
      j     Reduction    Households    Coupons      Coupons         Estimate
            X_j          n_j           Redeemed     Redeemed p_j
      1       5            200           ...           ...             ...
      2      10            200            55          .275             ...
      3      15            200            70          .350            .3562
      4      20            200           100          .500            .4731
      5      30            200           137          .685            .7028

Model: $Y_{ij} = 1$ if the $i$th household redeemed the coupon at level $X_j$, and $Y_{ij} = 0$ if it did not; $i = 1, \dots, n_j$, $j = 1, \dots, 5$. Let
$$Y_{\cdot j} = \sum_{i=1}^{n_j} Y_{ij}, \qquad p_j = \frac{Y_{\cdot j}}{n_j} .$$
The random variable $Y_{\cdot j}$ has a binomial distribution given by
$$f(Y_{\cdot j}) = \binom{n_j}{Y_{\cdot j}} \pi_j^{Y_{\cdot j}} (1 - \pi_j)^{n_j - Y_{\cdot j}} .$$
The log-likelihood function is
$$\ln L(\beta_0, \beta_1) = \sum_{j=1}^{5} \left\{ \ln\binom{n_j}{Y_{\cdot j}} + Y_{\cdot j}(\beta_0 + \beta_1 X_j) - n_j \ln\!\left[1 + \exp(\beta_0 + \beta_1 X_j)\right] \right\} .$$

SAS code:

    data ch14ta02;
      infile 'C:\stat231B06\ch14ta02.txt';
      input x n y pro;
    run;
    proc logistic data=ch14ta02;
      model y/n = x;
      /* store the estimated (predicted) probabilities in a data set
         named estimates, under the variable name pie */
      output out=estimates p=pie;
    run;
    proc print data=estimates;
    run;

[SAS output: Analysis of Maximum Likelihood Estimates, Odds Ratio Estimates, and the printed estimates data set with columns x, n, y, pro and pie.]
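Grouping the households does not change the estimates: the binomial log-likelihood above differs from the Bernoulli log-likelihood of the expanded 0/1 data only by the constant $\sum_j \ln\binom{n_j}{Y_{\cdot j}}$, which does not involve $\beta_0$ or $\beta_1$. A Python check with hypothetical grouped counts (not the coupon data) and hypothetical parameter values; the function names are ours.

```python
import math

def grouped_loglik(b0, b1, xs, ns, ys):
    """Binomial log-likelihood: sum_j [ln C(n_j, Y_j) + Y_j*eta_j - n_j ln(1 + exp(eta_j))]."""
    total = 0.0
    for x, n, yj in zip(xs, ns, ys):
        eta = b0 + b1 * x
        total += math.log(math.comb(n, yj)) + yj * eta - n * math.log1p(math.exp(eta))
    return total

def bernoulli_loglik(b0, b1, xs, ns, ys):
    """Same data expanded to one 0/1 observation per household."""
    total = 0.0
    for x, n, yj in zip(xs, ns, ys):
        eta = b0 + b1 * x
        for i in range(n):
            yi = 1 if i < yj else 0
            total += yi * eta - math.log1p(math.exp(eta))
    return total

# Hypothetical grouped data: levels X_j, group sizes n_j, successes Y.j
xs, ns, ys = [5, 10, 15], [20, 20, 20], [3, 6, 9]
const = sum(math.log(math.comb(n, yj)) for n, yj in zip(ns, ys))
for b0, b1 in [(-2.0, 0.1), (-1.0, 0.05)]:
    diff = grouped_loglik(b0, b1, xs, ns, ys) - bernoulli_loglik(b0, b1, xs, ns, ys)
    print(diff - const)   # essentially zero for every (b0, b1)
```

`math.comb` requires Python 3.8 or later.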
Multiple Logistic Regression

1. Model: $Y_i = E(Y_i) + \varepsilon_i$, where the $Y_i$ are independent Bernoulli random variables with
$$E(Y_i) = \pi_i = \frac{\exp(\mathbf X_i'\boldsymbol\beta)}{1 + \exp(\mathbf X_i'\boldsymbol\beta)}, \qquad \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) = \mathbf X_i'\boldsymbol\beta \quad \text{(logit function)} .$$

2. How to estimate the vector $\boldsymbol\beta$: maximize
$$\ln L(\boldsymbol\beta) = \sum_{i=1}^n Y_i(\mathbf X_i'\boldsymbol\beta) - \sum_{i=1}^n \ln\!\left[1 + \exp(\mathbf X_i'\boldsymbol\beta)\right] .$$

3. Fitted logit response function:
$$\ln\!\left(\frac{\hat\pi_i}{1 - \hat\pi_i}\right) = \mathbf X_i'\mathbf b$$

Example (disease outbreak):

    Case   Age    Socioeconomic Status   City Sector   Disease Status   Fitted Value
     i     X_i1    X_i2     X_i3          X_i4          Y_i
     1      33      0        0             0             0               .209
     2      35      0        0             0             0               ...
     3       6      0        0             0             0               ...
     4      60      0        0             0             0               .371
     5      18      0        1             0             1               ...
     6      26      0        1             0             0               .136
    ...
    98      35      0        1             0             0               .171

- $Y = 1$ if disease present, $0$ if disease absent.
- $X_1$ = age.
- Socioeconomic status: $X_2 = 1$ if middle class, $0$ otherwise; $X_3 = 1$ if lower class, $0$ otherwise.
- $X_4 = 1$ if city sector 2, $0$ if city sector 1.

Study purpose: assess the strength of the association between each of the predictor variables and the probability of a person having contracted the disease.

SAS code:

    data ch14ta03;
      infile 'c:\stat231B06\ch14ta03.txt' DELIMITER='09'x;
      input case x1 x2 x3 x4 y;
    proc logistic data=ch14ta03;
      model y(event='1') = x1 x2 x3 x4;
    run;

[SAS output: Analysis of Maximum Likelihood Estimates and Odds Ratio Estimates, with point estimates and 95% Wald confidence limits for x1 to x4.]

- The odds of a person having contracted the disease increase by about 3.0 percent with each additional year of age ($X_1$), for given socioeconomic status and city sector location.
- The odds of a person in sector 2 ($X_4$) having contracted the disease are almost five times as great as for a person in sector 1, for given age and socioeconomic status.

Polynomial Logistic Regression

1. Model: $Y_i = E(Y_i) + \varepsilon_i$, where the $Y_i$ are independent Bernoulli random variables with
$$E(Y_i) = \pi_i = \frac{\exp(\mathbf X_i'\boldsymbol\beta)}{1 + \exp(\mathbf X_i'\boldsymbol\beta)}, \qquad \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) = \mathbf X_i'\boldsymbol\beta = \beta_0 + \beta_{11} x_i + \beta_{22} x_i^2 + \cdots + \beta_{kk} x_i^k \quad \text{(logit function)},$$
where $x$ denotes the centered predictor, $x = X - \bar X$.

[Figure: scatter plot and first-order logit curve.]

Example (initial public offerings): $Y = 1$ if the IPO was financed by venture capital funds, $0$ if it was not; $X_1$ = the face value of the company. Study purpose: determine the characteristics of companies that attract venture capital.
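The notes center the predictor before forming polynomial terms. One common motivation (our assumption; the notes do not state it) is that $X$ and $X^2$ are almost perfectly correlated when $X$ is far from zero, and centering removes much of that collinearity. A Python check with hypothetical log face values; `corr` is our own helper.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical predictor values (e.g., log face values), for illustration only
X = [15.0, 15.5, 16.0, 16.5, 17.0, 17.5, 18.0]
xbar = sum(X) / len(X)
x_centered = [v - xbar for v in X]   # x = X - Xbar

print(round(corr(X, [v ** 2 for v in X]), 4))                     # close to 1
print(round(corr(x_centered, [v ** 2 for v in x_centered]), 4))   # 0 for these symmetric values
```

With real, asymmetric data the centered correlation is usually small but not exactly zero.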
SAS code:

    data ipo;
      infile 'c:\stat231B06\appenc11.txt';
      input case vc faceval shares;
      lnface = log(faceval);
    run;
    /* Run 1st order logistic regression analysis */
    proc logistic data=ipo descending;
      model vc = lnface;
      output out=linear p=linpie;
    run;
    /* Produce scatter plot and fitted 1st order logit curve */
    data graph1; set linear;
    run;
    proc sort data=graph1; by lnface;
    run;
    proc gplot data=graph1;
      symbol1 color=black value=none interpol=join;
      symbol2 color=black value=circle;
      title 'Scatter Plot and 1st Order Logit Curve';
      plot linpie*lnface vc*lnface / overlay;   /* overlay the two graphs */
    run;
    /* Find the mean of lnface (16.7088) */
    proc means;
      var lnface;
    run;
    /* Run 2nd order logistic regression analysis */
    data step2; set ipo;
      xcnt = lnface - 16.7088;
      xcnt2 = xcnt**2;
    run;
    proc logistic data=step2 descending;
      model vc = xcnt xcnt2;
      output out=estimates p=pie;
    run;
    /* Produce scatter plot and fitted 2nd order logit curve */
    data graph2; set estimates;
    run;
    proc sort data=graph2; by xcnt;
    run;
    proc gplot data=graph2;
      symbol1 color=black value=none interpol=join;
      symbol2 color=black value=circle;
      title 'Scatter Plot and 2nd Order Logit Curve';
      plot pie*xcnt vc*xcnt / overlay;   /* overlay the two graphs */
    run;

The natural logarithm of face value is chosen because face value ranges over several orders of magnitude, with a highly skewed distribution. The lowess smooth clearly suggests a mound-shaped relationship.

[SAS output: maximum likelihood estimates for the first-order and second-order logit models.]

Inferences about Regression Parameters

1. Test concerning a single $\beta_k$: Wald test
- Hypotheses: $H_0: \beta_k = 0$ versus $H_a: \beta_k \ne 0$.
- Test statistic: $z^* = \dfrac{b_k}{s(b_k)}$.
- Decision rule: if $|z^*| \le z(1 - \alpha/2)$, conclude $H_0$; if $|z^*| > z(1 - \alpha/2)$, conclude $H_a$, where $z(\cdot)$ denotes a percentile of the standard normal distribution.

Note: approximate joint confidence intervals for several logistic regression model parameters can be developed by the Bonferroni procedure. If $g$ parameters are to be estimated with family confidence coefficient of approximately $1 - \alpha$, the joint Bonferroni confidence limits are $b_k \pm B\,s(b_k)$, where $B = z(1 - \alpha/(2g))$.

2. Interval estimation of a single $\beta_k$

The approximate $1 - \alpha$ confidence limits for $\beta_k$ are $b_k \pm z(1 - \alpha/2)\, s(b_k)$. The corresponding confidence limits for the odds ratio $\exp(\beta_k)$ are $\exp[b_k \pm z(1 - \alpha/2)\, s(b_k)]$.
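The Wald test and the interval formulas above reduce to a few lines of arithmetic. This Python sketch (our own helper `wald_inference`; `statistics.NormalDist` from the standard library supplies the normal cdf and quantiles) uses the programming-task values reported in these notes, $b_1 = 0.1615$ and $z^* = 2.485$, and recovers $s(b_1)$ as $b_1 / z^*$ since the notes do not list it separately.

```python
import math
from statistics import NormalDist

def wald_inference(b, se, alpha=0.05, g=1):
    """Wald z test and approximate confidence limits (Bonferroni when g > 1)."""
    z_star = b / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_star)))
    crit = NormalDist().inv_cdf(1.0 - alpha / (2.0 * g))   # z(1 - alpha/2), or B when g > 1
    lower, upper = b - crit * se, b + crit * se
    return {"z": z_star, "p": p_two_sided, "wald_chisq": z_star ** 2,
            "ci_beta": (lower, upper), "ci_or": (math.exp(lower), math.exp(upper))}

# Programming-task example: b1 = 0.1615 and z* = 2.485 as reported in the notes
b1 = 0.1615
se_b1 = b1 / 2.485
res = wald_inference(b1, se_b1)
print(round(res["p"], 4), round(res["p"] / 2, 4))   # two-sided and one-sided p-values
print([round(v, 4) for v in res["ci_beta"]], [round(v, 2) for v in res["ci_or"]])
```

Passing `g > 1` switches the multiplier to the Bonferroni value $B = z(1 - \alpha/(2g))$; small differences from the SAS figures come from rounding of the reported $z^*$.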
Example (programming task): $Y = 1$ if the task was finished, $0$ if the task was not finished; $X$ = months of programming experience (data as in the earlier table).

SAS code:

    proc logistic data=ch14ta01;
      model y(event='1') = x / cl;
    run;

[SAS output: Analysis of Maximum Likelihood Estimates, Wald confidence limits for the parameters, and Odds Ratio Estimates.]

Notice that:
1. We can specify cl in the model statement to get interval estimates for $\beta_1$, etc.
2. The test reported for $\beta_1$ is two-sided. For a one-sided test we simply divide the p-value 0.0129 by 2, which yields the one-sided p-value 0.0065.
3. The text authors report $z^* = 2.485$; its square is the Wald chi-square statistic 6.176, which is approximately chi-square distributed with $df = 1$.

For $H_0: \beta_1 \le 0$ versus $H_a: \beta_1 > 0$ at $\alpha = .05$: since the one-sided p-value $0.0065 < 0.05$, we conclude that $\beta_1$ is positive. With approximately 95% confidence, $\beta_1$ is between $0.0341$ and $0.2888$. The corresponding 95% confidence limits for the odds ratio are $\exp(0.0341) = 1.03$ and $\exp(0.2888) = 1.33$.

3. Test Whether Several $\beta_k = 0$: Likelihood Ratio Test
- Hypotheses: $H_0: \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0$; $H_a$: not all of the $\beta_k$ in $H_0$ equal zero.
- Full model: $\pi = \dfrac{\exp(\mathbf X'\boldsymbol\beta_F)}{1 + \exp(\mathbf X'\boldsymbol\beta_F)}$, with $\mathbf X'\boldsymbol\beta_F = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$.
- Reduced model: $\pi = \dfrac{\exp(\mathbf X'\boldsymbol\beta_R)}{1 + \exp(\mathbf X'\boldsymbol\beta_R)}$, with $\mathbf X'\boldsymbol\beta_R = \beta_0 + \beta_1 X_1 + \cdots + \beta_{q-1} X_{q-1}$.
- Likelihood ratio statistic: $G^2 = -2\ln\!\left[\dfrac{L(R)}{L(F)}\right] = -2\left[\ln L(R) - \ln L(F)\right]$.
- Decision rule: if $G^2 \le \chi^2(1 - \alpha;\, p - q)$, conclude $H_0$; if $G^2 > \chi^2(1 - \alpha;\, p - q)$, conclude $H_a$.

Example (disease outbreak; same data and coding as in the multiple logistic regression example): $Y = 1$ if disease present, $0$ if absent; $X_1$ = age; $X_2$, $X_3$ = socioeconomic status indicators (middle class, lower class); $X_4$ = city sector indicator (1 for sector 2, 0 for sector 1).
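For the disease outbreak test only one predictor is dropped ($p - q = 1$), and then the chi-square tail probability reduces to a normal tail: $P(\chi^2_1 > G^2) = P(|Z| > \sqrt{G^2}) = \mathrm{erfc}(\sqrt{G^2/2})$. This Python sketch (our own helper `lr_test_df1`, valid only for one degree of freedom) uses the two $-2\log L$ values reported in the SAS output for this example.

```python
import math
from statistics import NormalDist

def lr_test_df1(neg2loglik_reduced, neg2loglik_full):
    """G^2 = [-2 ln L(R)] - [-2 ln L(F)] and its chi-square(1) p-value."""
    g2 = neg2loglik_reduced - neg2loglik_full
    p = math.erfc(math.sqrt(g2 / 2.0))   # P(chi^2_1 > g2) = P(|Z| > sqrt(g2))
    return g2, p

g2, p = lr_test_df1(106.204, 101.054)    # reduced model (X2, X3, X4) versus full model
crit = NormalDist().inv_cdf(0.975) ** 2  # chi^2(.95; 1) = z(.975)^2
print(round(g2, 3), round(p, 4), round(crit, 2))
```

For $p - q > 1$ a general chi-square quantile or tail function is needed (for example from a statistics library); the standard library alone does not provide one.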
SAS code:

    data ch14ta03;
      infile 'C:\stat231B06\ch14ta03.txt' DELIMITER='09'x;
      input case x1 x2 x3 x4 y;
    /* fit full model */
    proc logistic data=ch14ta03;
      model y(event='1') = x1 x2 x3 x4;
    run;
    /* fit reduced model */
    proc logistic data=ch14ta03;
      model y(event='1') = x2 x3 x4;
    run;

[SAS output: Model Fit Statistics (-2 Log L) for the full and reduced models.]

We use proc logistic to regress $Y$ on $X_1$, $X_2$, $X_3$ and $X_4$ and refer to this as the full model. In the SAS output for the full model, the -2 Log Likelihood statistic is 101.054. We now regress $Y$ on $X_2$, $X_3$ and $X_4$ and refer to this as the reduced model. In the SAS output for the reduced model, the -2 Log Likelihood statistic is 106.204. Using equation (14.60) (text page 581), we find $G^2 = 106.204 - 101.054 = 5.15$. For $\alpha = 0.05$ we require $\chi^2(.95;\, 1) = 3.84$. Since our computed $G^2 = 5.15$ is greater than the critical value 3.84, we conclude $H_a$: $X_1$ should not be dropped from the model.

4. Global Test Whether All $\beta_k = 0$: Score Chi-Square Test

POLYTOMOUS LOGISTIC REGRESSION, POISSON REGRESSION AND GENERALIZED LINEAR MODELS

Polytomous Logistic Regression for Nominal Response

What do we do if the response variable has more than two levels? Logistic regression can still be employed, by means of a polytomous (multicategory) logistic regression model.

Example: a study determines the strength of association between several risk factors (mother's age, nutritional status, history of tobacco use, and history of alcohol use) and the duration of pregnancies (preterm, intermediate term, full term).

    Case  Duration  Response Category    Nutritional   Age Category   Alcohol Use   Smoking
     i      Y_i     Y_i1  Y_i2  Y_i3     Status X_i1   X_i2   X_i3    History X_i4  History X_i5
     1       1       1     0     0          150          0      0         0            ...
     2       1       1     0     0          124          1      0         0             0
     3       1       1     0     0          128          0      0         0             1
    ...
    100      3       0     0     1          117          0      0         1             1
    101      3       0     0     1          165          0      0         1             1
    102      3       0     0     1          134          0      0         1            ...

Response: $Y_i = 1$, preterm (less than 36 weeks); $Y_i = 2$, intermediate term (36 to 37 weeks); $Y_i = 3$, full term (38 weeks or greater).

Age category:

    Age                  X_i2   X_i3
    < 20 years old         1      0
    21-30 years old        0      0
    > 30 years old         0      1

There are 3 response categories. If we use category 3 as the baseline category, there are two comparisons to this referent category; all other comparisons can be obtained from these two. Let $\pi_{ij}$ denote the probability that category $j$ is selected for the $i$th response.
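Then the logits for the two comparisons are
$$\ln\!\left(\frac{\pi_{i1}}{\pi_{i3}}\right) = \mathbf X_i'\boldsymbol\beta_1, \qquad \ln\!\left(\frac{\pi_{i2}}{\pi_{i3}}\right) = \mathbf X_i'\boldsymbol\beta_2,$$
which give
$$\pi_{i1} = \frac{\exp(\mathbf X_i'\boldsymbol\beta_1)}{1 + \exp(\mathbf X_i'\boldsymbol\beta_1) + \exp(\mathbf X_i'\boldsymbol\beta_2)}, \qquad \pi_{i2} = \frac{\exp(\mathbf X_i'\boldsymbol\beta_2)}{1 + \exp(\mathbf X_i'\boldsymbol\beta_1) + \exp(\mathbf X_i'\boldsymbol\beta_2)}, \qquad \pi_{i3} = \frac{1}{1 + \exp(\mathbf X_i'\boldsymbol\beta_1) + \exp(\mathbf X_i'\boldsymbol\beta_2)} .$$

We use the maximum likelihood method to estimate the parameter vectors $\boldsymbol\beta_1$ and $\boldsymbol\beta_2$. The idea:

Step 1: $P(Y_i) = \prod_{j=1}^{3} \pi_{ij}^{Y_{ij}}$. For example, if $Y_i = 2$ then $(Y_{i1}, Y_{i2}, Y_{i3}) = (0, 1, 0)$ and $P(Y_i = 2) = \pi_{i1}^0 \pi_{i2}^1 \pi_{i3}^0 = \pi_{i2}$.

Step 2: $P(Y_1, \dots, Y_n) = \prod_{i=1}^n \prod_{j=1}^3 \pi_{ij}^{Y_{ij}}$.

Step 3: $\ln P(Y_1, \dots, Y_n) = \sum_{i=1}^n \left\{ \sum_{j=1}^{2} Y_{ij}\,\mathbf X_i'\boldsymbol\beta_j - \ln\!\left[1 + \sum_{j=1}^{2} \exp(\mathbf X_i'\boldsymbol\beta_j)\right] \right\}$.

Step 4: Find $\mathbf b_1$, $\mathbf b_2$ that maximize $\ln P(Y_1, \dots, Y_n)$, using standard statistical software.

Step 5:
$$\hat\pi_{i1} = \frac{\exp(\mathbf X_i'\mathbf b_1)}{1 + \exp(\mathbf X_i'\mathbf b_1) + \exp(\mathbf X_i'\mathbf b_2)}, \qquad \hat\pi_{i2} = \frac{\exp(\mathbf X_i'\mathbf b_2)}{1 + \exp(\mathbf X_i'\mathbf b_1) + \exp(\mathbf X_i'\mathbf b_2)}, \qquad \hat\pi_{i3} = \frac{1}{1 + \exp(\mathbf X_i'\mathbf b_1) + \exp(\mathbf X_i'\mathbf b_2)} .$$

SAS code:

    data pregnancy;
      infile 'C:\stat231B06\ch14ta13.txt';
      input case y rc1 rc2 rc3 x1 x2 x3 x4 x5;
      x2 = 1 - x2;
      x3 = 1 - x3;
      x4 = 1 - x4;
      x5 = 1 - x5;
    run;
    /* the link=glogit option after the slash in the model statement produces
       the appropriate analysis for a multinomial (nominal) response */
    proc logistic data=pregnancy;
      class x2 x3 x4 x5;
      model y = x1 x2 x3 x4 x5 / link=glogit;
    run;

SAS output:

    Response Profile
    Ordered              Total
      Value      y   Frequency
          1      1          26
          2      2          35
          3      3          41

The response profile indicates that the response has three levels (1, 2, 3) with different frequencies. The logits modeled use y = 3 as the reference category.

    Analysis of Maximum Likelihood Estimates
                                       Standard        Wald
    Parameter       y   DF   Estimate     Error   Chi-Square   Pr > ChiSq
    Intercept       1    1    10.2306    2.5966      15.5240       <.0001
    Intercept       2    1     8.0069    2.2027      13.2141       0.0003
    X1              1    1    -0.0654    0.0182      12.8642       0.0003
    X1              2    1    -0.0464    0.0149       9.7357       0.0018
    X2        0     1    1     1.4784    0.4822       9.3990       0.0022
    X2        0     2    1     1.4567    0.4288      11.5420       0.0007
    X3        0     1    1     1.0298    0.4474       5.2982       0.0213
    X3        0     2    1     0.9437    0.4044       5.4457       0.0196
    X4        0     1    1     1.0214    0.3549       8.2847       0.0040
    X4        0     2    1     0.5335    0.3248       2.6984       0.1004
    X5        0     1    1     1.2261    0.3657      11.2382       0.0008
    X5        0     2    1     1.1152    0.3341      11.1419       0.0008

This table contains the estimated regression coefficients, their estimated approximate standard errors, the Wald test statistics and the p-values. As the table shows, all Wald test p-values are less than .05, with the exception of alcohol use ($X_4$) in the second linear predictor, indicating that all of the predictors should be retained.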
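The baseline-category probabilities and the Step 3 log-likelihood term can be checked numerically. In this Python sketch the linear predictors $\mathbf X_i'\mathbf b_1$ and $\mathbf X_i'\mathbf b_2$ are hypothetical numbers and the function names are ours; it shows that the three probabilities sum to 1, that $\ln(\pi_{i1}/\pi_{i3})$ recovers the first linear predictor, and that a case observed in category 2 contributes $\ln \pi_{i2}$.

```python
import math

def baseline_category_probs(eta1, eta2):
    """pi_1, pi_2, pi_3 with category 3 as baseline: ln(pi1/pi3) = eta1, ln(pi2/pi3) = eta2."""
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom

def loglik_term(y_indicators, eta1, eta2):
    """One case's Step 3 contribution: sum_j Y_ij * eta_j - ln(1 + sum_j exp(eta_j))."""
    y1, y2, _ = y_indicators
    return y1 * eta1 + y2 * eta2 - math.log(1.0 + math.exp(eta1) + math.exp(eta2))

# Hypothetical linear predictors for one case (illustration only)
eta1, eta2 = 0.4, -0.3
p1, p2, p3 = baseline_category_probs(eta1, eta2)
print(p1 + p2 + p3, math.log(p1 / p3))                       # sums to 1; log-ratio is eta1
print(loglik_term((0, 1, 0), eta1, eta2), math.log(p2))      # category-2 case contributes ln(pi_2)
```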
For all cases, the direction of the association between the predictors and the estimated logits, as indicated by the signs of the estimated regression coefficients, was as expected.

    Odds Ratio Estimates
                               Point          95% Wald
    Effect            y     Estimate     Confidence Limits
    X1                1        0.937       0.904     0.971
    X1                2        0.955       0.927     0.983
    X2 0 vs 1         1       19.237       2.905   127.382
    X2 0 vs 1         2       18.418       3.430    98.895
    X3 0 vs 1         1        7.842       1.358    45.295
    X3 0 vs 1         2        6.602       1.353    32.221
    X4 0 vs 1         1        7.712       1.919    30.997
    X4 0 vs 1         2        2.906       0.814    10.381
    X5 0 vs 1         1       11.614       2.769    48.710
    X5 0 vs 1         2        9.303       2.511    34.464

This table contains the estimated odds ratios for the two estimated linear predictors and the 95% confidence intervals for the odds ratios. (With the class statement's default effect coding, each "0 vs 1" odds ratio equals exp(2 × estimate); for example, exp(2 × 1.4784) = 19.24.) For example, for teenagers the estimated odds of delivering preterm compared to full term are 19.24 times the estimated odds for women 21-30 years of age; the 95% confidence interval for this odds ratio has a lower limit of 2.91 and an upper limit of 127.38.

Polytomous Logistic Regression for Ordinal Response

The model that is usually employed is called the proportional odds model. The proportional odds model for ordinal logistic regression models the cumulative probabilities $P(Y_i \le j)$ rather than the specific category probabilities $P(Y_i = j)$ used in nominal logistic regression. For category $j$:

Proportional odds model:
$$P(Y_i \le j) = \frac{\exp(\alpha_j + \mathbf X_i'\boldsymbol\beta)}{1 + \exp(\alpha_j + \mathbf X_i'\boldsymbol\beta)}, \qquad j = 1, 2, \dots, J - 1$$

Cumulative logits:
$$\ln\!\left[\frac{P(Y_i \le j)}{1 - P(Y_i \le j)}\right] = \alpha_j + \mathbf X_i'\boldsymbol\beta, \qquad j = 1, 2, \dots, J - 1$$

In the nominal case, each of the $J - 1$ parameter vectors $\boldsymbol\beta_j$ is unique. For ordinal responses, the slope coefficient vector $\boldsymbol\beta$ is identical for each of the $J - 1$ cumulative logits, but the intercepts $\alpha_j$ differ. For ordinal responses, the slope coefficients can be interpreted as the change in the logarithm of an odds ratio (this time the cumulative odds ratio) for a unit change in the associated predictor.

We use the maximum likelihood method to estimate $\alpha_1, \dots, \alpha_{J-1}$ and $\boldsymbol\beta$. The idea:

Step 1: $P(Y_i = j) = P(Y_i \le j) - P(Y_i \le j - 1)$, with $P(Y_i \le 0) = 0$ and $P(Y_i \le J) = 1$.

Step 2: $P(Y_1, \dots, Y_n) = \prod_{i=1}^n \prod_{j=1}^J \left[P(Y_i \le j) - P(Y_i \le j - 1)\right]^{Y_{ij}}$.

Step 3: take $\ln P(Y_1, \dots, Y_n)$.

Step 4: Find $a_1, \dots, a_{J-1}$ and $\mathbf b$ that maximize $\ln P(Y_1, \dots, Y_n)$, using standard statistical software.
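Step 1 of the ordinal likelihood turns cumulative probabilities into category probabilities. This Python sketch uses hypothetical intercepts and linear predictors ($J = 3$; function names are ours) to show that the category probabilities are positive and sum to 1 when the $\alpha_j$ increase, and that, because every cumulative logit shares the same slope, a one-unit change in the linear predictor multiplies every cumulative odds by the same factor.

```python
import math

def logistic(t):
    """Inverse logit: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))

def proportional_odds_probs(alphas, eta):
    """P(Y = 1), ..., P(Y = J) when ln[P(Y<=j) / (1 - P(Y<=j))] = alpha_j + eta.

    alphas must be increasing so that the cumulative probabilities increase in j.
    """
    cumulative = [logistic(a + eta) for a in alphas] + [1.0]   # P(Y <= J) = 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # Step 1: P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

def cumulative_odds(alpha, eta):
    c = logistic(alpha + eta)
    return c / (1.0 - c)

# Hypothetical intercepts and linear predictors (J = 3), for illustration only
alphas = [-1.0, 0.5]
for eta in (-0.5, 0.0, 0.8):
    print([round(v, 4) for v in proportional_odds_probs(alphas, eta)])

# Same slope in every cumulative logit: each cumulative odds ratio equals exp(1)
print([cumulative_odds(a, 1.0) / cumulative_odds(a, 0.0) for a in alphas])
```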
SAS code:

    data pregnancy;
      infile 'C:\stat231B06\ch14ta13.txt';
      input case y rc1 rc2 rc3 x1 x2 x3 x4 x5;
      x2 = 1 - x2;
      x3 = 1 - x3;
      x4 = 1 - x4;
      x5 = 1 - x5;
    run;
    /* Since y has 3 levels and no link= option is given, SAS treats y as
       ordinal and performs ordinal (cumulative logit) logistic regression */
    proc logistic data=pregnancy;
      class x2 x3 x4 x5;
      model y = x1 x2 x3 x4 x5;
    run;

SAS output:

    Analysis of Maximum Likelihood Estimates
                                    Standard        Wald
    Parameter         DF  Estimate     Error   Chi-Square   Pr > ChiSq
    Intercept 1        1    6.2303    1.5826      15.4982       <.0001
    Intercept 2        1    8.3251    1.6838      24.4446       <.0001
    X1                 1   -0.0489    0.0117      17.4958       <.0001
    X2          0      1    0.9880    0.2937      11.3137       0.0008
    X3          0      1    0.6817    0.2773       6.0426       0.0140
    X4          0      1    0.8349    0.2364      12.4767       0.0004
    X5          0      1    0.7958    0.2263      12.3683       0.0004

    The LOGISTIC Procedure
    Odds Ratio Estimates
                          Point         95% Wald
    Effect             Estimate    Confidence Limits
    X1                    0.952      0.931     0.974
    X2 0 vs 1             7.214      2.281    22.815
    X3 0 vs 1             3.910      1.318    11.595
    X4 0 vs 1             5.311      2.103    13.415
    X5 0 vs 1             4.911      2.023    11.924

For example, the results indicate that the odds of a preterm or intermediate-term delivery ($Y_i \le 2$) for smokers ($X_5 = 1$) are estimated to be 4.91 times the odds for nonsmokers ($X_5 = 0$). The estimated cumulative odds ratio is given by $\exp(2 \times 0.7958) = \exp(1.5916) = 4.91$, and a 95% confidence interval for the true cumulative odds ratio has a lower limit of 2.02 and an upper limit of 11.92.

Poisson Regression Model

Poisson regression is useful when the outcome is a count, with large-count outcomes being rare events.

Poisson regression model: the $Y_i$ are independent Poisson random variables with expected values $\mu_i$, where $\mu_i = \exp(\mathbf X_i'\boldsymbol\beta)$. Maximum likelihood estimation can again be used to estimate $\boldsymbol\beta$.

Example: the Miller Lumber Company conducted an in-store customer survey. The researcher counted the number of customers who visited the store from each nearby census tract. The researcher also collected, and subsequently retained, five quantitative predictor variables for use in the Poisson regression.

SAS code:

    data Miller;
      infile 'c:\stat231B06\ch14ta14.txt';
      input y x1 x2 x3 x4 x5;
    run;
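As with logistic regression, the Poisson maximum likelihood estimates have no closed form. A hedged sketch of Newton-Raphson for the log-link model $\ln \mu_i = b_0 + b_1 x_i$, in plain Python, with one hypothetical predictor (not the five Miller Lumber predictors) and our own function name `fit_poisson_newton`:

```python
import math

def fit_poisson_newton(x, y, iters=50, tol=1e-10):
    """ML fit of ln(mu_i) = b0 + b1*x_i for independent Poisson counts y_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: X'(y - mu)
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu))
        # Information matrix: X' diag(mu) X
        i00 = sum(mu)
        i01 = sum(m * xi for m, xi in zip(mu, x))
        i11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (-i01 * g0 + i00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical counts versus one predictor, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 4, 7, 9]
b0, b1 = fit_poisson_newton(x, y)
print(round(b0, 4), round(b1, 4))   # exp(b1): multiplicative change in mean count per unit x
```

With the log link, the score equations force $\sum y_i = \sum \hat\mu_i$ at the solution, which is a quick sanity check on any fit.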
Poisson regression is available in the proc genmod procedure, which fits a generalized linear model to the data. There are a number of link functions and probability distributions that can be specified by the user. We complete the Miller Lumber analysis by specifying the Poisson distribution on the model statement:

    proc genmod data=Miller;
      model y = x1 x2 x3 x4 x5 / dist=poisson;
    run;

RANDOM AND MIXED-EFFECTS MODELS

Model II: Random Factor Levels for Two-Factor Studies

What we have considered so far (fixed effect model, specific factor levels chosen): interest centers on the effects of the specific factor levels chosen, and statistical inferences are confined to these specific levels studied.

What if a factor has a large number of possible levels, and interest centers on the effects of all possible levels? Keep in mind that measuring responses at every level may be difficult, impossible, or prohibitively expensive. Solution:
1. Regard the set of all levels under consideration as a statistical population.
2. Draw conclusions about this population on the basis of the observed responses to a random sample of levels selected from this population.

Random effect model (nonspecific factor levels chosen): interest centers on the effects of all possible levels in the population from which the specific factor levels were chosen, and statistical inferences apply to the entire population of factor levels.

Example: a company owns several hundred retail stores. Seven of these stores were selected at random, and a sample of employees in each store was asked to evaluate the management of the store. The seven stores chosen for the study constitute the seven levels of the random factor "retail stores." Management was not just interested in the management of the seven stores chosen but wanted to generalize the results to the entire population of stores.

One-Way Random Effects Model

(1) Cell means model: $Y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, \dots, r$; $j = 1, \dots, n$.

(2) Factor effects model: $Y_{ij} = \mu_\cdot + \tau_i + \varepsilon_{ij}$, $i = 1, \dots, r$; $j = 1, \dots, n$.

In the cell means model, the $\mu_i$ (the means of the randomly selected treatments) are independent $N(\mu_\cdot, \sigma_\mu^2)$; the random errors $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$; and the $\mu_i$ and $\varepsilon_{ij}$ are mutually independent.
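In the factor effects model, $\mu_\cdot$ is the overall mean (expected response); the $\tau_i$ (the effects of the randomly selected treatments) are independent $N(0, \sigma_\mu^2)$; and the $\tau_i$ and $\varepsilon_{ij}$ are mutually independent.

Questions of interest:
1. estimation of $\mu_\cdot$;
2. estimation of $\sigma^2$;
3. estimation of $\sigma_\mu^2$;
4. estimation of $\sigma_\mu^2 / (\sigma_\mu^2 + \sigma^2)$;
5. testing $H_0: \sigma_\mu^2 = 0$ versus $H_1: \sigma_\mu^2 > 0$.

Table 10.1: ANOVA table for the one-way random effect model with equal replication ($a$ treatment levels, written $r$ elsewhere in these notes, and $n$ replicates per level):
$$\begin{array}{lllll}
\text{Source of Variation} & \text{Degrees of Freedom} & \text{Sum of Squares} & \text{Mean Square} & F_0 \\
\text{Treatments} & a - 1 & SS_{\text{Treatments}} = n\sum_{i=1}^a (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 = \sum_{i=1}^a \dfrac{Y_{i\cdot}^2}{n} - \dfrac{Y_{\cdot\cdot}^2}{an} & MS_{\text{Treatments}} = \dfrac{SS_{\text{Treatments}}}{a - 1} & \dfrac{MS_{\text{Treatments}}}{MS_E} \\
\text{Error} & a(n - 1) & SS_E = \sum_{i=1}^a \sum_{j=1}^n (Y_{ij} - \bar Y_{i\cdot})^2 & MS_E = \dfrac{SS_E}{a(n - 1)} & \\
\text{Total} & an - 1 & SS_T = \sum_{i=1}^a \sum_{j=1}^n (Y_{ij} - \bar Y_{\cdot\cdot})^2 = \sum_{i=1}^a \sum_{j=1}^n Y_{ij}^2 - \dfrac{Y_{\cdot\cdot}^2}{an} & &
\end{array}$$

The expected values of these mean squares are
$$E(MS_E) = \sigma^2, \qquad E(MS_{\text{Treatments}}) = \sigma^2 + n\sigma_\mu^2 .$$

Estimation of $\mu_\cdot$:
$$\hat\mu_\cdot = \bar Y_{\cdot\cdot}, \qquad \mathrm{Var}(\bar Y_{\cdot\cdot}) = \frac{\sigma_\mu^2}{r} + \frac{\sigma^2}{rn} = \frac{n\sigma_\mu^2 + \sigma^2}{rn}, \qquad s^2(\bar Y_{\cdot\cdot}) = \frac{MS_{\text{Treatment}}}{rn} .$$
The confidence limits for $\mu_\cdot$ are $\bar Y_{\cdot\cdot} \pm t(1 - \alpha/2;\, r - 1)\, s(\bar Y_{\cdot\cdot})$.

Estimation of $\sigma_\mu^2 / (\sigma_\mu^2 + \sigma^2)$:
$$\frac{MS_{\text{Treatment}} / (n\sigma_\mu^2 + \sigma^2)}{MS_E / \sigma^2} \sim F(r - 1,\, r(n - 1)),$$
so
$$P\left\{ F(\alpha/2;\, r - 1,\, r(n - 1)) \le \frac{MS_{\text{Treatment}}}{MS_E} \cdot \frac{\sigma^2}{n\sigma_\mu^2 + \sigma^2} \le F(1 - \alpha/2;\, r - 1,\, r(n - 1)) \right\} = 1 - \alpha .$$
This yields $L \le \sigma_\mu^2 / \sigma^2 \le U$, where
$$L = \frac{1}{n}\left[\frac{MS_{\text{Treatment}}}{MS_E} \cdot \frac{1}{F(1 - \alpha/2;\, r - 1,\, r(n - 1))} - 1\right], \qquad U = \frac{1}{n}\left[\frac{MS_{\text{Treatment}}}{MS_E} \cdot \frac{1}{F(\alpha/2;\, r - 1,\, r(n - 1))} - 1\right],$$
and hence
$$L^* = \frac{L}{1 + L} \le \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma^2} \le \frac{U}{1 + U} = U^* .$$

Example: Apex Enterprises studied personnel officer evaluation ratings of potential employees. Five of the company's personnel officers were randomly selected, and four job applicants were randomly assigned to each of the five officers. Thus each officer rated four job applicants.

    Obs    y    officer   candidate
      1   76       1          1
      2   65       1          2
      3   85       1          3
      4   74       1          4
      5   59       2          1
    ...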
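The estimators above can be evaluated from the ANOVA summary alone. This Python sketch (our own function `random_effects_summary`) uses the Apex mean squares that appear in the GLM output for this example ($MSTR = 394.925$, $MSE = 73.28333$, $n = 4$, $r = 5$). $\hat\sigma_\mu^2 = (MSTR - MSE)/n$ is the ANOVA estimator implied by the expected mean squares. The F quantiles are passed in because the Python standard library has no F quantile function; the values used are approximate table values for $\alpha/2 = 0.1$, matching the SAS data step written for this example.

```python
import math

def random_effects_summary(mstr, mse, n, r, f_upper, f_lower):
    """Estimates for the balanced one-way random effects model.

    f_upper = F(1 - alpha/2; r - 1, r(n - 1)), f_lower = F(alpha/2; r - 1, r(n - 1)),
    supplied by the caller (e.g. from SAS finv or an F table).
    """
    sigma2_mu = (mstr - mse) / n          # from E(MSTR) = sigma^2 + n sigma_mu^2, E(MSE) = sigma^2
    se_mean = math.sqrt(mstr / (r * n))   # s(Ybar..) = sqrt(MSTR / (rn))
    ratio = mstr / mse
    L = (ratio / f_upper - 1.0) / n       # L <= sigma_mu^2 / sigma^2 <= U
    U = (ratio / f_lower - 1.0) / n
    return {"sigma2_mu": sigma2_mu, "sigma2": mse, "se_mean": se_mean,
            "L": L, "U": U, "L_star": L / (1.0 + L), "U_star": U / (1.0 + U)}

# Apex Enterprises; approximate quantiles F(0.90; 4, 15) = 2.3614 and F(0.10; 4, 15) = 0.2584
est = random_effects_summary(394.925, 73.28333, n=4, r=5, f_upper=2.3614, f_lower=0.2584)
for key in ("sigma2_mu", "se_mean", "L", "U", "L_star", "U_star"):
    print(key, round(est[key], 4))
```

If $MSTR < MSE$ the ANOVA estimate of $\sigma_\mu^2$ is negative and is usually truncated at zero; that case does not arise here.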
likelihood), reml (residual, or restricted, maximum likelihood), or mivque0 (minimum variance quadratic unbiased estimates) for the covariance parameters.

proc mixed method=reml asycov cl covtest alpha=.1;
  class officer;
  /* estimate the mean rating of all personnel officers with a 90% confidence interval */
  model y = / cl alpha=.1;
  random officer;
run;

data ratio;  /* uses text equation (25.18) */
  n = 4; r = 5;
  mstr = 394.925; mse = 73.28333;
  /* fl and fu are the F(0.90; r-1, r(n-1)) and F(0.10; r-1, r(n-1)) quantiles */
  fl = finv(1-.1, r-1, r*(n-1));
  fu = finv(.1, r-1, r*(n-1));
  l = (1/n)*((mstr/mse)*(1/fl) - 1);
  u = (1/n)*((mstr/mse)*(1/fu) - 1);
  lstar = l/(1+l);
  ustar = u/(1+u);
run;

proc print data=ratio;
  var l u lstar ustar;
run;

SAS OUTPUT

The GLM Procedure
Dependent Variable: y

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4      1579.700000    394.925000      5.39   0.0068
Error             15      1099.250000     73.283333
Corrected Total   19      2678.950000

R-Square   Coeff Var   Root MSE   y Mean
0.589671    11.98120   8.560569   71.45000

Source    DF   Type I SS     Mean Square   F Value   Pr > F
officer    4   1579.700000   394.925000       5.39   0.0068

Source    DF   Type III SS   Mean Square   F Value   Pr > F
officer    4   1579.700000   394.925000       5.39   0.0068

We conclude that the mean ratings of the officers in the population of officers are not all equal ($\sigma_\mu^2 > 0$), because the p-value is 0.0068.

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower   Upper
Intercept    71.4500           4.4437    4     16.08     <.0001     0.1   61.98   80.92

We estimate the mean rating $\mu_{\cdot}$ of all personnel officers with a 90% confidence interval. This is accomplished by the model y = / cl alpha=.1 statement. From the Solution for Fixed Effects table, the estimate is $\hat\mu_{\cdot} = 71.45$ and the confidence limits are 61.98 and 80.92. These agree with the formula $\bar Y_{\cdot\cdot} \pm t(0.95;\ 4)\sqrt{MSTR/(rn)} = 71.45 \pm 2.132 \times 4.4437$.

The Mixed Procedure

Covariance Parameter Estimates
Cov Parm   Estimate   Standard Error   Z Value   Pr > Z   Alpha
officer     80.4104          70.1333      1.15   0.1258     0.1
Residual    73.2833          26.7593      2.74   0.0031     0.1

Asymptotic Covariance Matrix of Estimates
Row   Cov Parm    CovP1     CovP2
1     officer     4918.68   -179.01
2     Residual    -179.01    716.06

To estimate $\sigma^2$ with a 90% confidence interval, we use proc mixed and request (asycov cl covtest alpha=.1) that the asymptotic covariance matrix of the covariance parameter estimates be displayed and that 90% confidence limits be computed. The covtest option
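produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates. The 90% confidence interval for $\sigma^2$ is from 43.98 to 151.39.

Obs   l         u         lstar     ustar
1     0.32052   4.96436   0.24272   0.83234

An estimate of $\sigma^2$ is given by MSE = 73.28 (see the first output). The lower and upper limits for $\sigma_\mu^2/(\sigma_\mu^2 + \sigma^2)$ are 0.24272 and 0.83234. Because the data step uses the quantiles F(0.90; 4, 15) and F(0.10; 4, 15), that is $\alpha/2 = 0.1$, these are 80% confidence limits.

Two or More Random Effects (Model II)

Suppose both factors A and B are random factors. The two-way complete model with random effects is
$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,n,$$
where
$\mu_{\cdot\cdot}$ is the overall mean expected response;
$\alpha_i$, the effect of the ith randomly selected level of A, are independent $N(0, \sigma_\alpha^2)$;
$\beta_j$, the effect of the jth randomly selected level of B, are independent $N(0, \sigma_\beta^2)$;
$(\alpha\beta)_{ij}$, the effect of the ijth randomly selected interaction, are independent $N(0, \sigma_{\alpha\beta}^2)$;
$\varepsilon_{ijk}$, the random errors, are independent $N(0, \sigma^2)$;
$\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are pairwise independent.

As in a two-way complete model with fixed effects, SSTotal = SSA + SSB + SSAB + SSE, and each term is calculated the same way.

Model II ANOVA table

$$
\begin{array}{llllll}
\text{Source of variation} & df & SS & MS & E\{MS\} & F_0 \\
\text{Factor } A & a-1 & SSA & MSA = \dfrac{SSA}{a-1} & \sigma^2 + n\sigma_{\alpha\beta}^2 + bn\sigma_\alpha^2 & \dfrac{MSA}{MSAB} \\
\text{Factor } B & b-1 & SSB & MSB = \dfrac{SSB}{b-1} & \sigma^2 + n\sigma_{\alpha\beta}^2 + an\sigma_\beta^2 & \dfrac{MSB}{MSAB} \\
AB \text{ interaction} & (a-1)(b-1) & SSAB & MSAB = \dfrac{SSAB}{(a-1)(b-1)} & \sigma^2 + n\sigma_{\alpha\beta}^2 & \dfrac{MSAB}{MSE} \\
\text{Error} & ab(n-1) & SSE & MSE = \dfrac{SSE}{ab(n-1)} & \sigma^2 & \\
\text{Total} & abn-1 & SSTotal & & &
\end{array}
$$

Hypothesis testing
$H_0: \sigma_{\alpha\beta}^2 = 0$ (no interaction) versus $H_1: \sigma_{\alpha\beta}^2 > 0$: reject $H_0$ if $F_{0,AB} = MSAB/MSE > F(1-\alpha;\ (a-1)(b-1),\ ab(n-1))$.

When there is no interaction, we can test $H_0: \sigma_\alpha^2 = 0$ versus $H_1: \sigma_\alpha^2 > 0$: reject $H_0$ if $F_{0,A} = MSA/MSAB > F(1-\alpha;\ a-1,\ (a-1)(b-1))$. Note that the denominator is MSAB, not MSE: under $H_0$, $E\{MSA\} = E\{MSAB\} = \sigma^2 + n\sigma_{\alpha\beta}^2$.

We can also test $H_0: \sigma_\beta^2 = 0$ versus $H_1: \sigma_\beta^2 > 0$: reject $H_0$ if $F_{0,B} = MSB/MSAB > F(1-\alpha;\ b-1,\ (a-1)(b-1))$.

Mixed Model (Model III)

When a model has both fixed effects and random effects, it is called a mixed model. If a factor is random, its interactions with any other factor are also regarded as random effects. With A fixed and B random:
$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,n,$$
where
$\mu_{\cdot\cdot}$ is the overall mean expected response;
$\alpha_i$, the effect of the ith level of the fixed factor A, are constants with $\sum_{i=1}^{a}\alpha_i = 0$;
$\beta_j$, the effect of the jth randomly selected level of B, are independent $N(0, \sigma_\beta^2)$;
$(\alpha\beta)_{ij}$, the effect of the ijth randomly selected interaction, are $N\!\left(0, \frac{a-1}{a}\sigma_{\alpha\beta}^2\right)$, subject to the restrictions $\sum_{i=1}^{a}(\alpha\beta)_{ij} = 0$ for each j;
$\varepsilon_{ijk}$, the random errors, are independent $N(0, \sigma^2)$;
$\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are pairwise independent.

$$
\begin{array}{llllll}
\text{Source of variation} & df & SS & MS & E\{MS\} & F_0 \\
\text{Factor } A & a-1 & SSA & MSA = \dfrac{SSA}{a-1} & \sigma^2 + n\sigma_{\alpha\beta}^2 + bn\dfrac{\sum_i \alpha_i^2}{a-1} & \dfrac{MSA}{MSAB} \\
\text{Factor } B & b-1 & SSB & MSB = \dfrac{SSB}{b-1} & \sigma^2 + an\sigma_\beta^2 & \dfrac{MSB}{MSE} \\
AB \text{ interaction} & (a-1)(b-1) & SSAB & MSAB = \dfrac{SSAB}{(a-1)(b-1)} & \sigma^2 + n\sigma_{\alpha\beta}^2 & \dfrac{MSAB}{MSE} \\
\text{Error} & ab(n-1) & SSE & MSE = \dfrac{SSE}{ab(n-1)} & \sigma^2 & \\
\text{Total} & abn-1 & SSTotal & & &
\end{array}
$$

Hypothesis testing: $H_0: \sigma_{\alpha\beta}^2 = 0$ (no interaction)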
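
The Model II and Model III tables above differ mainly in which mean square is the denominator of each F ratio. As a minimal sketch (not code from these notes), the hypothetical Python helper below returns each $F_0$ with its numerator and denominator degrees of freedom for a balanced two-way layout, given mean squares already computed (for example, read off a proc glm table). It assumes the restricted mixed model with A fixed and B random; the critical values $F(1-\alpha; \cdot, \cdot)$ are not computed here and would come from an F table or a statistics library.

```python
def two_way_f_tests(msa, msb, msab, mse, a, b, n, model="random"):
    """F ratios for a balanced two-way layout with n replicates per cell.

    model="random": A and B both random (Model II).
    model="mixed":  A fixed, B random, restricted mixed model (Model III).
    Returns {effect: (F0, numerator df, denominator df)}.
    """
    df_a, df_b = a - 1, b - 1
    df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)

    tests = {
        # The interaction is tested against MSE in both models.
        "AB": (msab / mse, df_ab, df_e),
        # Factor A is tested against MSAB in both models.
        "A": (msa / msab, df_a, df_ab),
    }
    if model == "random":
        # E{MSB} = sigma^2 + n*sigma_ab^2 + a*n*sigma_b^2, so MSAB is the denominator.
        tests["B"] = (msb / msab, df_b, df_ab)
    elif model == "mixed":
        # E{MSB} = sigma^2 + a*n*sigma_b^2, so MSE is the denominator.
        tests["B"] = (msb / mse, df_b, df_e)
    else:
        raise ValueError("model must be 'random' or 'mixed'")
    return tests


def reject(f0, f_crit):
    """Reject H0 when F0 exceeds the critical value F(1 - alpha; df1, df2)."""
    return f0 > f_crit


# Hypothetical mean squares for a = 3, b = 4, n = 2.
print(two_way_f_tests(40.0, 30.0, 10.0, 2.0, a=3, b=4, n=2, model="random"))
# -> {'AB': (5.0, 6, 12), 'A': (4.0, 2, 6), 'B': (3.0, 3, 6)}
print(two_way_f_tests(40.0, 30.0, 10.0, 2.0, a=3, b=4, n=2, model="mixed"))
# -> {'AB': (5.0, 6, 12), 'A': (4.0, 2, 6), 'B': (15.0, 3, 12)}
```

The same mean squares give a different test for B under the two models, which is exactly the point of the E{MS} column: the denominator is the mean square whose expectation matches the numerator's under $H_0$.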

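The one-way random effects formulas earlier in this section (Table 10.1, the interval for $\mu_{\cdot}$, and the interval for $\sigma_\mu^2/(\sigma_\mu^2+\sigma^2)$) can be checked by hand. Below is a minimal Python sketch, assuming balanced data and assuming the caller supplies the t and F quantiles (from a table or a statistics library), since the standard library has no t or F quantile functions. All function names are hypothetical. The point estimate $(MSTR - MSE)/n$ of $\sigma_\mu^2$ follows from $E\{MSTR\} - E\{MSE\} = n\sigma_\mu^2$ and can come out negative.

```python
import math


def one_way_anova(groups):
    """Balanced one-way ANOVA; groups is a list of r lists, each of length n."""
    r, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / r
    sstr = n * sum((m - grand) ** 2 for m in means)
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    mstr, mse = sstr / (r - 1), sse / (r * (n - 1))
    return {"grand_mean": grand, "MSTR": mstr, "MSE": mse, "F0": mstr / mse}


def sigma_mu_sq_hat(mstr, mse, n):
    """ANOVA (moment) estimate of sigma_mu^2; it can be negative."""
    return (mstr - mse) / n


def mu_interval(grand_mean, mstr, r, n, t_q):
    """Ybar.. +/- t(1 - alpha/2; r - 1) * sqrt(MSTR / (r n)); returns (s, lower, upper)."""
    s = math.sqrt(mstr / (r * n))
    return s, grand_mean - t_q * s, grand_mean + t_q * s


def ratio_interval(mstr, mse, n, f_lo, f_hi):
    """L, U for sigma_mu^2/sigma^2 and L*, U* for sigma_mu^2/(sigma_mu^2 + sigma^2).

    f_lo = F(alpha/2; r-1, r(n-1)) and f_hi = F(1 - alpha/2; r-1, r(n-1)).
    """
    ratio = mstr / mse
    L = (ratio / f_hi - 1) / n
    U = (ratio / f_lo - 1) / n
    return L, U, L / (1 + L), U / (1 + U)


# Apex summary values from the SAS output: r = 5 officers, n = 4 ratings each.
s, lo, hi = mu_interval(71.45, 394.925, r=5, n=4, t_q=2.131847)  # t(0.95; 4)
print(round(s, 4), round(lo, 2), round(hi, 2))  # 4.4437 61.98 80.92
# Approximate F(0.10; 4, 15) and F(0.90; 4, 15), the quantiles used in the data step
print(ratio_interval(394.925, 73.28333, 4, f_lo=0.2584, f_hi=2.3614))
```

With these approximate quantiles the last line reproduces, to about three decimals, the proc print values 0.32052, 4.96436, 0.24272, and 0.83234.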

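The proc genmod step at the start of this section fits a Poisson regression with the log link. To show what such a fit computes, here is a minimal Python sketch of maximum likelihood for $\log E\{Y\} = b_0 + b_1 x$ with a single predictor, using Newton-Raphson on the Poisson log-likelihood (for the canonical log link this coincides with iteratively reweighted least squares). It is not SAS's implementation, it uses one predictor rather than the five Miller Lumber predictors, and the function name and data are hypothetical.

```python
import math


def poisson_fit(x, y, max_iter=50, tol=1e-10):
    """ML fit of log E(Y) = b0 + b1*x for Poisson counts y (Newton-Raphson)."""
    # Start from the intercept-only fit: b0 = log(mean y), b1 = 0 (needs sum(y) > 0).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood sum(y*eta - exp(eta)).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X'WX with W = diag(mu).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) d = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


x = [0, 1, 2, 3, 4]
y = [2, 3, 3, 5, 6]  # hypothetical counts
b0, b1 = poisson_fit(x, y)
mu = [math.exp(b0 + b1 * xi) for xi in x]
print(round(b0, 4), round(b1, 4), round(sum(mu), 6))  # fitted total equals sum(y) = 19
```

With an intercept and the log link, the score equations force the fitted means to reproduce the observed total, which the last printed value shows. With several predictors the 2x2 solve becomes a general linear solve of $X'WX\,d = X'(y - \mu)$.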