Applied Stat for Bio Sci
Applied Stat for Bio Sci STA 100
These 26 pages of class notes were uploaded by Carmen Mayer on Tuesday, September 8, 2015. The notes belong to STA 100 at the University of California - Davis, taught by Chengzhi Chu in Fall.
Date Created: 09/08/15
STA 100 Lecture 22
Linear Regression and Correlation

I. Introduction

Simple linear regression is one of the most widely used statistical techniques for developing a mathematical relationship between a dependent variable and an independent variable.

A Practical Problem: In an agricultural study, we would like to develop a linear relationship between the wheat yield (measured in bushels/acre) and the fertilizer level (measured in pounds/acre). The following data were collected from seven randomly selected plots:

    Fertilizer x (Pounds/Acre)    Yield y (Bushels/Acre)
    100                           40
    200                           50
    300                           50
    400                           70
    500                           65
    600                           65
    700                           80

We are interested in fitting a regression model to the data and using the model for estimation and prediction.

II. The Model

a. The simple linear regression model is $y = \beta_0 + \beta_1 x + \varepsilon$.

b. The Least Squares Estimation
Let $\hat{y} = b_0 + b_1 x$ denote the estimated regression line. Find $b_0$ and $b_1$ such that $SS_{resid} = \sum (y - \hat{y})^2$ is minimized.

Example: Wheat yield (data as in the table above)

[Figure: Wheat Yield. Scatter plot of Yield (Bushels/Acre) against Fertilizer (Pounds/Acre).]

c. Residuals
The residuals are defined as observed minus estimated values, $y - \hat{y}$.

Example: Wheat yield

    Fertilizer x (Pounds/Acre)    Yield y (Bushels/Acre)    Estimated    Residual
    100                           40                        42.3         -2.3
    200                           50                        48.2          1.8
    300                           50                        54.1         -4.1
    400                           70                        60.0         10.0
    500                           65                        65.9         -0.9
    600                           65                        71.8         -6.8
    700                           80                        77.7          2.3

[Figure: Residual Plot. Residuals against Fertilizer (Pounds/Acre).]

[Figure: Normal Probability Plot of the residuals (RES1). Average: 0.0000000, StDev: 5.44179, N: 7. Anderson-Darling Normality Test, P-Value: 0.650.]

III. Assessing the Model

a. In the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$, we assume that the $\varepsilon$'s are independent and normally distributed with mean 0 and standard deviation $\sigma$.

b. An estimate of $\sigma$ is $s_e = \sqrt{SS_{resid} / (n - 2)}$.

Example: Wheat yield

c. A confidence interval for the slope
Consider the estimator for $\beta_1$: $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$. Then $b_1$ is a random variable. What is the distribution of $b_1$?
A $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is $b_1 \pm t_{\alpha/2}\, s_{b_1}$, where $t_{\alpha/2}$ is taken from the t distribution with $n - 2$ degrees of freedom.

Example: Wheat yield

d. Test for the Regression Slope
Recall that $t = (b_1 - \beta_1) / s_{b_1}$ is distributed as t with $n - 2$ degrees of freedom. To test $H_0: \beta_1 = 0$, compute t and use the chart.

Example: Wheat yield


STA 100 Lecture 4
Probability

I. Introduction

a. What is the use of probability?
  o Probability allows us to handle variations in experimental outcomes mathematically. It helps us to deal with uncertainties.
    Examples: Risk of breast cancer. Chance of being hit by a deadly virus. Tuberculin skin test results.
  o Statistical inference is based on probability.

b. Experiment: The process of observing a random phenomenon.
Examples: Coin tossing. Selecting an item at random to test for quality. Selecting an individual at random and recording the blood type.

c. Sample Space: The set of all possible distinct outcomes.
Examples:
  Toss a coin once: S = {H, T}
  Roll a die once: S = {1, 2, 3, 4, 5, 6}
  Select an item at random: S = {good, defective}
  Select an individual at random and record the blood type: S = {A, AB, B, O}

d. Event: The set of all outcomes having a certain feature.
Examples:
  Toss a coin once: E = {H}
  Roll a die once: E = {2, 4, 6}
  Select an item at random: E = {defective}
  Blood types: E = {A, AB}

II. Assigning Probabilities to Events

a. Classical Approach
Suppose the sample space consists of n distinct outcomes. If the outcomes are equally likely, then $P(E) = (\text{number of outcomes in } E) / n$.
Examples

b. Relative Frequency Approach
The probability of an event is a number representing the proportion of times the event is expected to occur when the experiment is repeated many times.
Examples

c. Subjective Approach
Subjective probability reflects the degree to which we believe the event will occur.
Examples

d. Probability Rules

e. Probability Trees
Examples

f. Probability of Combination of Events

III. Conditional Probability and Independence

a. Let A and B be two events such that $P(B) \neq 0$. The conditional probability of A given B is defined as $P(A \mid B) = P(A \text{ and } B) / P(B)$.

Example:

    Side effects    Male    Female    Total
    None            60      80        140
    Mild            20      10        30
    Moderate        15      5         20
    Severe          5       5         10
    Total           100     100       200

  A: No side effect
  B: Selected individual is male
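The conditional probability in the side-effects example can be checked numerically. The following is a minimal sketch, not part of the original notes: the dictionary layout and the function names are illustrative assumptions, and only the counts come from the table in the example.

```python
# Counts from the side-effects example: rows are side-effect levels,
# columns are sexes. The data layout is an assumption made for illustration.
COUNTS = {
    "None":     {"Male": 60, "Female": 80},
    "Mild":     {"Male": 20, "Female": 10},
    "Moderate": {"Male": 15, "Female": 5},
    "Severe":   {"Male": 5,  "Female": 5},
}


def grand_total(table):
    """Total number of individuals in the table."""
    return sum(sum(row.values()) for row in table.values())


def prob_level(table, level):
    """Marginal probability of one side-effect level, e.g. P(A)."""
    return sum(table[level].values()) / grand_total(table)


def prob_sex(table, sex):
    """Marginal probability of one sex, e.g. P(B)."""
    return sum(row[sex] for row in table.values()) / grand_total(table)


def prob_level_given_sex(table, level, sex):
    """Conditional probability P(level | sex) = P(level and sex) / P(sex)."""
    joint = table[level][sex] / grand_total(table)
    return joint / prob_sex(table, sex)


if __name__ == "__main__":
    # A = no side effect, B = selected individual is male.
    print("P(A)     =", prob_level(COUNTS, "None"))
    print("P(B)     =", prob_sex(COUNTS, "Male"))
    print("P(A | B) =", prob_level_given_sex(COUNTS, "None", "Male"))
```

Dividing the joint probability by P(B) mirrors the definition term by term; the same answer is obtained by restricting attention to the Male column and computing 60/100 directly.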
b. Multiplication Rule
Use the definition of conditional probability:
$P(A \text{ and } B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
Example

c. Independence
Events A and B are independent if $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$. This leads to $P(A \text{ and } B) = P(A)\,P(B)$.

Example:

    Side effects    Male    Female    Total
    None            60      80        140
    Mild            20      10        30
    Moderate        15      5         20
    Severe          5       5         10
    Total           100     100       200

  A: No side effect
  B: Selected individual is male
  C: Severe side effects


STA 100 Lecture 6
Continuous Probability Distributions

When the number of observations is large, we may describe the overall pattern of observations by a smooth curve.
Examples
These curves are called density curves.
Examples

The Normal Distribution

I. Introduction

One of the most important continuous distributions is the normal distribution. Normal distributions are characterized by two parameters, $\mu$ and $\sigma$:
  $\mu$ = mean of the population
  $\sigma$ = standard deviation of the population
The empirical rule comes from the normal distribution.

Examples of the normal distribution:
1. Birth weights (oz) are normal with mean 120 and standard deviation 20.
2. The volume of air a person can expel in 6 seconds is called Forced Vital Capacity (FVC), and it is a standard measure of pulmonary function. For a given age and gender, it has a normal distribution.
3. Measurement errors are normal with mean 0.

II. Standard Normal Distribution

a. A normal distribution with mean 0 and standard deviation 1 is called a standard normal distribution.
Examples of using the table of standard normal

b. If Y has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then $Z = (Y - \mu) / \sigma$ is standard normal.

Example: A state has 5 million students in public schools. A student is classified as gifted if his/her IQ is at least 130. A legislator has proposed providing schools with an extra $200 for each gifted student under a new program. How much will this program cost the state?


STA 100 Lecture 23
Linear Regression and Correlation (continued)

IV. Using the Model

a. Prediction
Given a specific value of the independent variable x, say $x_g$, a $100(1 - \alpha)\%$ prediction interval for y is
$\hat{y} \pm t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x - \bar{x})^2}}$
where $\hat{y} = b_0 + b_1 x_g$ and $s_e$ is the estimated standard deviation of the errors.
Example: Wheat yield

b. Estimating the Expected Value
Given a specific value of the independent variable x, say $x_g$, a $100(1 - \alpha)\%$ confidence interval for the expected value of y is
$\hat{y} \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x - \bar{x})^2}}$
where $\hat{y} = b_0 + b_1 x_g$.
Example: Wheat yield

V. Coefficient of Correlation

a. The Pearson coefficient of correlation is a measure of the strength of the linear relationship between the response variable y and the explanatory variable x. The sample correlation coefficient is defined as
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Example: Wheat yield

    Fertilizer x (Pounds/Acre)    Yield y (Bushels/Acre)
    100                           40
    200                           50
    300                           50
    400                           70
    500                           65
    600                           65
    700                           80

To test the null hypothesis $H_0: \rho = 0$, we use the test statistic
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$
which is distributed as t with $n - 2$ degrees of freedom.
Example: Wheat yield

b. The coefficient of determination is defined as $r^2 = 1 - SS_{resid} / SS_{total}$.
Example: Wheat yield

c. The analysis of variance table

    Regression Analysis

    The regression equation is
    Yield = 36.4 + 0.0589 Fertilizer

    Predictor    Coef       StDev      T       P
    Constant     36.429     5.038      7.23    0.001
    Fertiliz     0.05893    0.01127    5.23    0.003

    s = 5.961    R-Sq = 84.5%    R-Sq(adj) = 81.5%

    Analysis of Variance

    Source            DF    SS         MS        F        P
    Regression        1     972.32     972.32    27.36    0.003
    Residual Error    5     177.68     35.54
    Total             6     1150.00


STA 100 Lecture 8
Sampling Distributions

I. Introduction

Recall the definitions of parameter and statistic. We are interested in finding the sampling distribution of a statistic.
Examples: Use computer simulations.

II. Sampling Distribution of the Sample Mean

a. Why are we interested in the distribution of the sample mean?

b. Properties of the sample mean:
  mean of the sample mean = mean of the population
  variance of the sample mean = variance of the population / sample size
  standard deviation of the sample mean = standard deviation of the population / square root of sample size
Example

c. The Central Limit Theorem
For large sample sizes ($n \geq 30$), the sample mean is approximately normal with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.

Example 1: Distribution of serum triglycerides

Example 2: Safety of airline passengers
An airliner can carry 42,000 pounds of passengers plus luggage. The weight of a passenger plus luggage has a mean of 200 pounds with a standard deviation of 40 pounds. If the airliner has 202 passengers, what is the probability of overloading?

IV. Sampling Distribution of a Proportion

Let Y be the number of successes in n independent trials. The sample proportion is defined as $\hat{p} = Y / n$. If $np > 5$ and $n(1 - p) > 5$, then the sample proportion is approximately normal with mean p and standard deviation $\sqrt{p(1 - p) / n}$.

Example: In a hospital, 6% of newborns are considered to have low birth weight. Find the sampling distribution of the sample proportion for a sample of size 100. What is the probability that, for a sample of size 100, the sample proportion is more than 10%?


STA 100 Lecture 19
Analysis of Categorical Data (continued)

IV. Fisher's Exact Test

Fisher's exact test is based on computing the probability of the observed table and of tables that are even more extreme than the observed table. It is closely related to the calculation of p-values.

Example: Consider the following contingency table on biopsy results:

                        Test Results
                        Positive    Negative
    Fam. Hist.   Yes    10          20
                 No     5           25

Are the test results independent of the family history of cancer?

V. The r x k Contingency Tables

The r x k contingency tables are a generalization of 2 x 2 tables for dealing with two categorical responses.

A Practical Problem: 250 men and 400 women were asked about their preferences among flavors of a certain brand of ice cream.

                    Flavor
    Gender    1      2      3      4     Total
    Male      23     169    48     10    250
    Female    174    34     150    42    400
    Total     197    203    198    52    650

Are males and females different in terms of their preferences of ice cream flavors?

Again, let O represent the observed and E the expected frequencies. We can use the chi-square test statistic
$X^2 = \sum \frac{(O - E)^2}{E}$
The degrees of freedom of this chi-square statistic are $df = (r - 1)(k - 1)$.
Example: The ice cream flavor

VI. Relative Risk and Odds Ratio

In studies related to human health, the ratio of two probabilities is called the relative risk.

Example: Smoking and cardiovascular disease (CVD)

                     CVD
                     No    Yes
    Smoking   No     50    15
              Yes    10    25

The odds ratio is the ratio of two odds under two different conditions.

Example 1: Smoking and cardiovascular disease

Example 2: Side effects

                           Side Effects
                           Present    Absent
    Treatment   Drug       15         35
                Placebo    4          46

VII. Difference Between Two Population Proportions

A Practical Problem: These days the tobacco industry is under fire due to a possible association between smoking and health problems. Is lung cancer related to smoking? Let's consider the following data:

                   Lung Cancer
                   Yes    No     Total
    Smokers        20     60     80
    Non-Smokers    15     105    120
    Total          35     165    200

a. Consider two populations:

    Population 1                           Population 2
    Success with probability $p_1$         Success with probability $p_2$
    Failure with probability $1 - p_1$     Failure with probability $1 - p_2$

b. A $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is

c. To test $H_0: p_1 = p_2$, we compute z

Example: Smoking and lung cancer
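The smoking and lung cancer example can be worked through in a few lines. The notes leave the interval and test formulas blank, so the sketch below assumes one common large-sample choice: an unpooled standard error for the confidence interval and a pooled standard error for the z statistic. The function name and the returned dictionary keys are illustrative, not taken from the course.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(y1, n1, y2, n2, conf_level=0.95):
    """Large-sample interval and z test for p1 - p2.

    Assumed formulas (the notes do not state them):
      interval: (p1_hat - p2_hat) +/- z_crit * unpooled standard error
      test:     z = (p1_hat - p2_hat) / pooled standard error
    """
    p1_hat = y1 / n1
    p2_hat = y2 / n2
    diff = p1_hat - p2_hat

    # Confidence interval with the unpooled standard error.
    se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    interval = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    # Test of H0: p1 = p2 with the pooled estimate of the common proportion.
    p_pooled = (y1 + y2) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    return {"diff": diff, "interval": interval, "z": z, "p_value": p_value}


if __name__ == "__main__":
    # Smokers: 20 of 80 with lung cancer; non-smokers: 15 of 120.
    result = two_proportion_summary(20, 80, 15, 120)
    for name, value in result.items():
        print(name, value)
```

Pooling under the null hypothesis is a design choice rather than a requirement; some texts use the unpooled standard error for both the interval and the test, which changes z slightly.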