by: Jasmine Jackson

BioStatNotes BIOL 446

Dr. Joshua B. Plotkin

About this Document

Dr. Joshua B. Plotkin
Study Guide
BIOL, Biostat, 446, upenn
This 34 page Study Guide was uploaded by Jasmine Jackson on Thursday December 10, 2015. The Study Guide belongs to BIOL 446 at University of Pennsylvania taught by Dr. Joshua B. Plotkin in Fall 2015. Since its upload, it has received 42 views. For similar materials see BIOSTATISTICS in Statistics at University of Pennsylvania.


Date Created: 12/10/15
0.1 What is Statistics?
• Deduction vs. induction: science is a zigzag process between deductions and inductions.
  o Deduction
    ▪ Deduction: the act of formulating a theory of how the world should look (what the observations should be).
    ▪ A statement about the future.
    ▪ Starts with "if".
  o Induction
    ▪ Induction: drawing a conclusion on the basis of an observation (inference).
  o In science
    ▪ Copernicus: "If we make observations, we will see that the planets have circular orbits" (a deductive theory).
    ▪ However, elliptical orbits were observed, so induction tells us Copernicus was wrong.
    ▪ Newton's theory of gravity was deductive: "If the law of gravity holds, then we expect to see elliptical orbits."
  o Randomness in experiments
    ▪ Deduction becomes probability theory.
    ▪ Induction becomes statistics: saying something about the world based on observations that include randomness.
    ▪ When randomness occurs we cannot be completely sure, only (say) 95% certain.

https WWWCourseheroComfile8062549BioStatNotes 1

1.1 Discrete Random Variables, Their Probability Distributions, and Parameters
• Discrete random variable (DRV): a conceptual and numerical quantity which, in some future experiment, will take some value from a discrete set of possible values, with known or unknown probabilities.
  o Let X be a discrete random variable (uppercase letter).
  o It is conceptual because the experiment is in the future.
• Probability distribution of a discrete random variable: a listing of all the possible values the random variable can assume, together with their respective probabilities.
• Examples
  o Example 1: I plan to toss a fair coin twice.
    ▪ The DRV is the number of heads that will be observed.

        Value of X:    0      1        2
        Probability:   1/4    1/2      1/4
        Outcomes:      TT     HT, TH   HH

    ▪ Notes
      - The sum of the probabilities is 1.
      - The values are not equally likely.
      - Two orderings (HT and TH) give two ways to get X = 1.
  o Example 2: I plan to toss a biased coin twice. Probability of heads = 0.7.
    ▪ The DRV is the number of heads that will be observed.

        Value of X:    0           1              2
        Probability:   0.09        0.42           0.49
                       $(0.3)^2$   $2(0.3)(0.7)$  $(0.7)^2$
        Outcomes:      TT          HT, TH         HH

  o Example 3: I plan to toss a coin twice.
    Probability of heads: $\pi$ (unknown).
    ▪ The DRV is the number of heads that will be observed.

        Value of X:    0             1               2
        Probability:   $(1-\pi)^2$   $2\pi(1-\pi)$   $\pi^2$
        Outcomes:      TT            HT, TH          HH

    ▪ This distribution has a parameter, $\pi$.
• Notes on parameters
  o Parameter: an unknown constant associated with a random variable.
  o Here $\pi$ is not 3.14; it is an unknown value.
  o By convention, parameters are always written as Greek letters.
  o Most of statistics attempts to say something about the value of an unknown parameter.
    ▪ The question in the example is "Is the coin fair?"
    ▪ Statistics is about estimating the value of an unknown parameter, or testing a hypothesis regarding the unknown parameter.
• Estimation vs. calculation
  o To estimate: run the experiment and get data, but without knowing the value with perfect accuracy.
  o Never say "calculate the value of an unknown parameter", because we cannot know it.
• Ways to present a probability distribution
  o Tabular method
    ▪ List the values and probabilities in a table.
  o Graphical method
    ▪ Show the probabilities as the bars of a graph.
    ▪ Bad when a parameter is unknown.
  o Formulaic method (the "adult" way)
    ▪ Let X be a discrete random variable.
    ▪ Prob(X = x) = some formula involving x and the parameters.

1.2 Examples of Discrete Probability Distributions
• 1.2.1 Some Numerically Defined Distributions
  o Ex.: let X be the number of traffic accidents in Philly tomorrow.
• 1.2.2 Uniform Distribution
  o Uniform distribution: each value is equally likely.
  o There are b possible values for X, the smallest of which is a.
  o Each possible value has the same probability, 1/b.
  o Let X be a RV with the uniform distribution:
    ▪ $\text{Prob}(X = x) = 1/b$ for $x = a,\ a+1,\ \ldots,\ a+b-1$
• 1.2.3 The Binomial Distribution
  o Note: the random variable is dictated by the experiment. You don't pick it.
  o The binomial distribution arises if and only if:
    ▪ You plan to conduct a fixed number n of trials.
    ▪ There are two possible outcomes on each trial: success or failure.
    ▪ The outcomes of the trials are independent.
    ▪ The probability of success is $\pi$ on every trial.
  o The binomial random variable is usually X, the total number of successes across the n trials.
  o Examples where a rule fails
    ▪ NOT rule 1: planning for 10k people but then getting funding cut and
      having to change the number of patients during the trial.
    ▪ NOT rule 2: there are 3 options (success, failure, neutral).
    ▪ NOT rule 3: pulling colored marbles out of a jar without replacing them, which changes the parameter from draw to draw.
    ▪ NOT rule 4: counting home runs in baseball (not the same probability for each player).
  o Probability distribution of the binomial

      $\text{Prob}(X = x) = \binom{n}{x}\,\pi^{x}(1-\pi)^{n-x}, \qquad x = 0, 1, \ldots, n$

    ▪ The probability of getting x successes and n-x failures in any particular designated order is always $\pi^{x}(1-\pi)^{n-x}$.
  o Notes on the binomial formula
    ▪ Know this formula.
    ▪ The binomial formula applies if and only if all 4 rules are met.
    ▪ One of the conditions is that there are 2 outcomes, BUT that does not mean $\pi$ is always 1/2.
    ▪ A binomial probability chart can be used.
• 1.2.4 Other Discrete Distributions
  o Trinomial distribution: 3 outcomes rather than 2.
    ▪ Ex.: genetics (AA, Aa, aa).
  o Negative binomial distribution: the inverse of the binomial distribution.
    ▪ Fixed number of successes with a random number of trials.
    ▪ Ex.: a fisherman casts his rod 237 times (random) in order to catch 100 fish (fixed).

1.6 Data
• Data: observations that come from some process involving randomness.
  o $X_1$ = number on die roll 1
  o $X_2$ = number on die roll 2
• We can NEVER set a random variable X equal to a specific number, because RVs are not numbers.
  o For data we use the lowercase letter x.
• Probability: our hypothesis gives predictions about the observations.
• Statistics: the observations give an inference about the state of the world.

2.1 Concepts of Statistical Inference
• Notes
  o We will always use probability theory before using statistics.
  o You can make an erroneous inference.
    ▪ Unlucky data lead to a bad conclusion.
  o In probability theory we calculate the mean and/or variance by assuming we know the distribution. In real cases we cannot calculate the mean and variance. Instead, we estimate them.
• Statistical inference has 2 areas
  o Estimating the values of unknown parameters.
  o Testing hypotheses about unknown parameters.
    ▪ Start by phrasing your research question in terms of the values of unknown parameters.

2.2 Estimating the Unknown Value of a Parameter
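As a small illustration of the point in 2.1 that a mean can be calculated only when the distribution is known and must otherwise be estimated, here is a minimal Python sketch built on the die rolls from section 1.6. It is not part of the lecture: the function name `average_of_die_rolls`, the seed, and the sample sizes are assumptions chosen for illustration.

```python
import random

TRUE_MEAN = 3.5  # probability theory: (1 + 2 + 3 + 4 + 5 + 6) / 6 for a fair die


def average_of_die_rolls(n_rolls, seed=0):
    """Simulate n_rolls of a fair six-sided die and return the observed average."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return sum(rolls) / n_rolls


# The observed averages play the role of estimates of the mean.
estimate_small = average_of_die_rolls(10)
estimate_large = average_of_die_rolls(10_000)

print("mean from probability theory:", TRUE_MEAN)
print("estimate from 10 rolls:", estimate_small)
print("estimate from 10,000 rolls:", estimate_large)
```

The printed averages are estimates, not the mean itself. Here the true mean is known only because the die was assumed fair; in a real experiment only the estimate is available, which is why the sections that follow ask how precise it is.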
• 2.2.1 Estimating the Mean
  o Let X be the blood pressure reduction we will observe when we give the drug to a random person.
    ▪ We don't know the probability distribution or the mean.
  o GOAL: estimate the value of the mean and understand the precision of our estimate.
  o Example
    ▪ We give the drug to n people and observe the reduction in each.

        Person      Reduction
        Person 1    $X_1$
        Person 2    $X_2$
        ...         ...
        Person n    $X_n$

        Observed average: $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$

    ▪ Three objects
      - $\mu$: the unknown mean of the distribution.
      - $\bar{x}$: the observed average in our data. This is our estimate of the mean.
      - $\bar{X}$: the random-variable average, a concept that exists before the experiment, when its value is still unknown.
  o Notes on estimating the mean
    ▪ $\bar{x}$ is the estimate.
    ▪ $\bar{X}$ is the estimator.
    ▪ How good $\bar{x}$ is depends on the properties of $\bar{X}$.
      - Recall: the mean of $\bar{X}$ is $\mu$.
      - $\bar{X}$ is an unbiased estimator of $\mu$.
      - Unbiased estimator: a RV whose mean equals the parameter you are trying to estimate.
      - $\bar{x}$ is an unbiased estimate of $\mu$.
    ▪ How accurate is our estimate?
      - It depends on the probabilistic properties of the estimator.
      - The variance of $\bar{X}$ is $\sigma^2/n$.
      - As n grows, the variance gets smaller.
• 2.2.2 Estimating the Variance
  o We first estimate $\sigma^2$.
  o Given data $x_1, x_2, \ldots, x_n$:
    ▪ We use the following estimate of the variance:

        $s^2 = \dfrac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}$

  o Before the experiment we could define the estimator of the variance:

        $S^2 = \dfrac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2}{n-1}$, with mean of $S^2 = \sigma^2$

  o Alternative formula for $s^2$:

        $s^2 = \dfrac{x_1^2 + x_2^2 + \cdots + x_n^2 - n\bar{x}^2}{n-1}$

  o Degrees of freedom: why we use n-1
    ▪ Degrees of freedom are the ways in which we can move.
    ▪ We divide by n-1 because we have lost 1 degree of freedom: we have already used the data once (to compute $\bar{x}$).
• 2.2.3 Confidence Intervals for the Mean
  o Precision depends on the variance of the estimator, $\sigma^2/n$.
  o STEP 1: apply the 2 standard deviation rule to $\bar{X}$:
        $\text{Prob}\left(\mu - 2\sigma/\sqrt{n} < \bar{X} < \mu + 2\sigma/\sqrt{n}\right) \approx 95\%$
  o STEP 2: switch around so that $\mu$ is in the middle:
        $\text{Prob}\left(\bar{X} - 2\sigma/\sqrt{n} < \mu < \bar{X} + 2\sigma/\sqrt{n}\right) \approx 95\%$
  o STEP 3: use S for $\sigma$:
        $\text{Prob}\left(\bar{X} - 2S/\sqrt{n} < \mu < \bar{X} + 2S/\sqrt{n}\right) \approx 95\%$
    We can be about 95% sure that $\mu$ is between those 2 values.
  o Notes on the confidence interval
    ▪ In two tests of the same drug with different numbers of participants (different n), the values of $\bar{x}$ will be different. But both will be similar, because both are on target. n makes a huge difference to the interval. Both are unbiased
      estimates of the mean.
    ▪ The width of the confidence interval is $4s/\sqrt{n}$.
    ▪ As n gets larger, the width gets smaller.
    ▪ To double the precision you need 4 times the n.
      - This is why sample sizes need to be huge.
    ▪ We CANNOT control s, so we aim to control n.
    ▪ s is an estimate of the standard deviation. It is a property of the drug.
    ▪ $s/\sqrt{n}$ is called the "standard error of the mean". This is a dumb name because:
      - "Error" here means standard deviation.
      - s is an estimate, so this is an estimate of a standard deviation.
      - It is not the standard deviation of the mean: $\mu$ is a parameter and has no standard deviation.
      - Therefore it is really an estimate of the standard deviation of the estimator of the mean.
  o Where does probability theory come in?
    ▪ Concept of mean: probability theory
    ▪ Concept of variance: probability theory
    ▪ Concept of $\bar{X}$: probability theory
    ▪ Concept of $S^2$: probability theory
    ▪ Two SD rule: probability theory
• 2.2.4 Estimation of, and Confidence Interval for, the Binomial Probability
  o Ex. 1
    ▪ Plan to give a new drug to n people.
    ▪ Total number of people cured: X.
      - X is binomial.
    ▪ Define Q = X/n, the proportion cured.
      - Mean of Q: $\pi$
      - Variance of Q: $\pi(1-\pi)/n$
    ▪ If the observed number cured is x, the estimate is q = x/n.
  o Derive the 95% confidence interval by applying the 2 SD rule:
        $\text{Prob}\left(\pi - 2\sqrt{\pi(1-\pi)/n} < Q < \pi + 2\sqrt{\pi(1-\pi)/n}\right) \approx 95\%$
        $\text{Prob}\left(Q - 2\sqrt{\pi(1-\pi)/n} < \pi < Q + 2\sqrt{\pi(1-\pi)/n}\right) \approx 95\%$
    ▪ We need to replace the $\pi$ under the square root with the estimate q:
        $\text{Prob}\left(Q - 2\sqrt{q(1-q)/n} < \pi < Q + 2\sqrt{q(1-q)/n}\right) \approx 95\%$
  o Alternative
    ▪ The maximum value of $\pi(1-\pi)$ occurs when $\pi$ is equal to 1/2. The maximum value is then 1/4, so $2\sqrt{\pi(1-\pi)/n} \le 1/\sqrt{n}$. Conservatively, we can say:
        $\text{Prob}\left(Q - 1/\sqrt{n} < \pi < Q + 1/\sqrt{n}\right) \ge 95\%$ (approximately)
  o Notes on the confidence interval
    ▪ The width of the (conservative) confidence interval is $2/\sqrt{n}$.
    ▪ For $\mu$ the interval is $\frac{1}{n}(x_1 + x_2 + \cdots + x_n) \pm 2s/\sqrt{n}$.
      - For $\pi$ the formula is a different one.
    ▪ Probability theory came in through the probabilistic deductions about Q and the 2 SD rule.
    ▪ If given an interval, we can find n by working backwards.
  o Notes on estimation
    ▪ $\bar{x}$ is our estimate of $\mu$.
      - $\hat{\mu}$ (mu hat) also denotes the estimate of $\mu$.
    ▪ If using $\pm$, you should specify that it is 2 standard deviations.
    ▪ 1 SD covers about 68%; 2 SD cover about 95%.
• 2.2.5 Estimation in Regression
  o How does one random quantity y depend upon some
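    nonrandom quantity x?
    ▪ Y: has an uncontrollable source of randomness (not all of Y is randomness).
    ▪ x: controllable, no randomness.
  o Ex. 1: how does the height of a plant (y) after 2 days depend on the amount of water (x)?
  o Ex. 2: how does the heartbeat of a lab rat (y) depend on temperature (x)?
    ▪ Plan to subject n = 12 rats to different temperatures and measure the heartbeat of each rat.

        Rat:    1   2   3   4   5   6   7   8   9   10  11  12
        Temp:   16  16  16  18  18  20  22  24  24  26  26  26

  o Think of some rat that we plan to subject to temperature x. Let Y be the heart rate that will be observed. Y is a RV with a mean and a variance. We assume that:
    ▪ Mean(Y) = $\alpha + \beta x$ (linear, of the form y = mx + b; this depends on temperature)
    ▪ Variance(Y) = $\sigma^2$ (the same at every temperature)
  o Goal: estimate $\alpha$, $\beta$ and $\sigma^2$ with some degree of precision.
    ▪ Compare the earlier case: for $\mu$ we used $\bar{x}$, and for $\sigma^2/n$ we used $s^2/n$.
  o Steps
    1. Plot your data. Does it look linear?
    2. Compute 5 quantities:
         $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$
         $\bar{y} = \frac{1}{n}(y_1 + y_2 + \cdots + y_n)$
         $s_{xx} = x_1^2 + x_2^2 + \cdots + x_n^2 - n\bar{x}^2$
         $s_{yy} = y_1^2 + y_2^2 + \cdots + y_n^2 - n\bar{y}^2$
         $s_{xy} = x_1y_1 + x_2y_2 + \cdots + x_ny_n - n\bar{x}\bar{y}$
  o Our estimate of $\beta$ is $b = s_{xy}/s_{xx}$.
  o Our estimate of $\alpha$ is $a = \bar{y} - b\bar{x}$.
  o Our estimate of $\sigma^2$ is $s^2 = \dfrac{s_{yy} - s_{xy}^2/s_{xx}}{n-2}$.
    ▪ We divide by n-2 because we have already used the data twice (loss of 2 degrees of freedom).
  o Why do we use these? Probabilistic reasons.
    ▪ Before the experiment, $Y_1, Y_2, \ldots, Y_n$ are all RVs. We have assumed Mean($Y_i$) = $\alpha + \beta x_i$ and Variance($Y_1$) = Variance($Y_2$) = ... = $\sigma^2$.
    ▪ Define the same quantities with the random $Y_i$ in place of the observed $y_i$ (capital letters mark random variables):
         $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$,  $\bar{Y} = \frac{1}{n}(Y_1 + \cdots + Y_n)$
         $s_{xx} = \sum x_i^2 - n\bar{x}^2$,  $S_{YY} = \sum Y_i^2 - n\bar{Y}^2$,  $S_{xY} = \sum x_iY_i - n\bar{x}\bar{Y}$
         $B = S_{xY}/s_{xx}$
    ▪ Mean of B = $\beta$. Therefore B is an unbiased estimator.
  o Ex.: imagine a rat that we plan to subject to temperature x. We will observe heart rate Y.
    ▪ Mean(Y) = $\alpha + \beta x$ ($\alpha$ = intercept, $\beta$ = slope)
    ▪ Variance(Y) = $\sigma^2$ (natural rat-to-rat variance)
    ▪ We make assumptions about the mean heartbeat.
    ▪ We wish to use regression to find the values of 3 parameters.
    ▪ We can use the formulas as estimators because they are unbiased: Mean(B) = $\beta$, Mean(A) = $\alpha$, Mean($S^2$) = $\sigma^2$.
  o How precise are our estimates?
    ▪ For the slope this depends on the variance of its estimator B: Var(B) = $\sigma^2/s_{xx}$.
    ▪ We can apply the 2 SD rule to B.
    ▪ Switch B and $\beta$.
    ▪ Substitute s for $\sigma$.
    ▪ Therefore we can be about 95% sure that $\beta$ lies between $b - 2s/\sqrt{s_{xx}}$ and $b + 2s/\sqrt{s_{xx}}$.
  o Notes on estimation in regression
    1.
    2. The estimated regression line is the least squares line.
    3. Don't extrapolate beyond the range of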
       temperatures used.
    4. The most important parameter is $\beta$. If $\beta = 0$, x has no effect on y.
    5. There are n-2 degrees of freedom, because you lose 2 degrees of freedom for the 2 regression parameters (the data have been used twice). If you have n = 2 you cannot estimate $\sigma^2$, because you cannot divide by 0.
    6. $s^2$ will NEVER BE NEGATIVE.
    7. If there is high variance we will have little confidence. If there is low variance we will have high confidence. This also depends on the temperatures: if $s_{xx}$ is small rather than large, it is hard to test. We can control $s_{xx}$ by picking a very wide range, with half of the temperatures very high and half very low.
• Brief Summary of Estimation

      Parameter     Estimator    Estimate                Mean of estimator   Variance of estimator
      $\mu$         $\bar{X}$    $\bar{x}$               $\mu$               $\sigma^2/n$
      $\sigma^2$    $S^2$        $s^2$ (divisor n-1)     $\sigma^2$
      $\pi$         Q            q                       $\pi$               $\pi(1-\pi)/n$
      $\beta$       B            b                       $\beta$             $\sigma^2/s_{xx}$
      $\alpha$      A            a                       $\alpha$
      $\sigma^2$    $S^2$        $s^2$ (divisor n-2)     $\sigma^2$

• Notes on Estimation
    1. An estimator is a random variable.
    2. The mean of the estimator equals the parameter we are trying to estimate (unbiased; true for all of ours).
    3. The estimate is the observed value after the experiment.
    4. The $\sigma^2$ in the second row is the variance of a random variable. The $\sigma^2$ in the last row is the variance around a sloped (regression) line. They are DIFFERENT.

2.3 Hypothesis Testing
2.3.1 Introduction
• Set up two competing hypotheses. The first is the null hypothesis.
• Set up the alternative hypothesis BEFORE you start the experiment.
• Four example questions:
    (a) Is this coin fair?
    (b) Is this new drug better than the old drug?
    (c) Is the mean blood pressure different for men and women?
    (d) Is there any effect of temperature on heartbeat?

      Hypothesis       (a)           (b)                              (c)                                     (d)
      Null             $\pi = 1/2$   $\pi_{new} = \pi_{old}$          $\mu_{men} = \mu_{women}$               $\beta = 0$
      2-sided          $\pi \ne 1/2$ -                                $\mu_{men} \ne \mu_{women}$             $\beta \ne 0$
      1-sided up       $\pi > 1/2$   $\pi_{new} > \pi_{old}$          $\mu_{men} > \mu_{women}$               $\beta > 0$
      1-sided down     $\pi < 1/2$   -                                $\mu_{men} < \mu_{women}$               $\beta < 0$

• How do we proceed?
  o Eventually we will accept H0 or H1.
  o Use a 2x2 table of outcomes:

                          H0 is true        H1 is true
      We accept H0        no error          Type 2 error
      We accept H1        Type 1 error      no error

      Type 1 error = false positive; Type 2 error = false negative.
• Notes on errors in the conclusion
  o There is always some probability of a type 1 or a type 2 error.
  o You can't make both errors at once.
  o The purpose of hypothesis testing is to limit these errors as much as possible.
  o If the amount of data is fixed, we cannot simultaneously
    reduce both error rates.
  o Scientists care more about type 1 error, so we try to reduce that.

2.3.2 Informal Example of Hypothesis Testing Using the Binomial Distribution
• Choose an acceptable type 1 error rate.
• Coin toss example, sloppy version
  o H0: $\pi = 1/2$; H1: $\pi \ne 1/2$.
  o Toss the coin n = 10000 times. Consider the test statistic X = the number of heads.
  o Reject the null if X < C1 or X > C2.
  o Choose the type 1 error rate, let's say 5%, and make sure that:
      Prob(type 1 error) = 5%
      Prob(reject H0 when it is true) = 0.05
      Prob(reject H0 when $\pi = 1/2$) = 0.05
      Prob(X < C1 or X > C2 when $\pi = 1/2$) = 0.05
  o If $\pi = 1/2$, then X is binomial with index n = 10000 and parameter 1/2:
      Mean of X = 5000, Variance of X = 2500, SD = 50.
    ▪ Assuming $\pi = 1/2$, the 2 SD rule gives Prob(4900 ≤ X ≤ 5100) ≈ 95%, so Prob(X < 4900 or X > 5100) ≈ 0.05.
    ▪ C1 = 4900 and C2 = 5100.
• Coin toss example, nice version
  o Prob(X ≤ C1 when $\pi = 1/2$) = 0.025.
  o Use the normal approximation (1.96 in place of 2, with a continuity correction).
    ▪ C1 = 4901 (rounded to make the test more conservative).
    ▪ C2 = 5098.5, rounded to 5099.
• Notes
  o We choose the error rate and design the experiment to establish a range of values.
  o What if we want a 1% error rate? Use 2.575 instead of 1.96.
    ▪ Broader acceptance range, less type 1 error: C1 = 4870, C2 = 5130.
  o We used probability theory to do all this: recognized the binomial, found its mean and variance, applied the 2 SD rule and the normal approximation.

2.3.3 General Principles and Format of a Test of Hypothesis
• General steps
    1. Phrase the question in terms of unknown parameters and declare the null and alternative hypotheses.
    2. Choose our desired type 1 error rate.
    3. Choose our test statistic.
    4. Calculate the critical region: those values of X that will lead us to reject H0.
    5. Get data, do the test.
• Notes
  o Steps 1-4 occur before you get data.
  o Steps 1-4 involve probability theory.
  o You cannot declare H0 and H1 after the experiment. Otherwise it is a self-fulfilling prophecy.
  o At the end of the day, DO NOT SAY "I have proven the null is untrue." Even if you reject the null, you can only be 95% or 99% certain. Data have randomness; never say "proof".
  o You can compute some critical regions by hand; more complex ones need a computer or more charts.
  o Various test statistics are possible: you could use the proportion of heads rather than the total. Then $\mu$, $\sigma^2$ and the continuity correction change, but you eventually arrive at the same cut-off values.
  o Always formulate
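    both an inductive and a deductive statement.
    ▪ Ded.: if the coin is fair, then the probability of getting X < 4901 or X > 5099 heads is 5%.
    ▪ Ind.: we observed X = 4842 heads, so we conclude that we have enough evidence to reject H0 using a 5% type 1 error rate.

2.3.4 Testing a Hypothesis About a Single Binomial Parameter
• Ex.: ESP (guessing the suit of cards)
    1. H0: $\pi = 1/4$; H1: $\pi > 1/4$.
    2. 5% type 1 error rate.
    3. Test statistic: X = total number of correct guesses in n = 1000 tries.
    4. Use probability theory to compute the critical region: under H0, mean = $n\pi$ = 250 and variance = $n\pi(1-\pi)$ = 187.5.
    5. Get data, do the test.
• P-value: the probability, assuming the null hypothesis is true, of getting the observed value of the test statistic or one more extreme than the observed value, in the direction indicated by the alternative hypothesis.
  o Steps for the P-value
    ▪ Determine the hypotheses.
    ▪ Determine the test statistic and its probability distribution.
    ▪ Compute the P-value and compare it to the type 1 error rate.
  o Ex.: ESP
    ▪ Suppose we observe x = 269 correct.
    ▪ What is the P-value?
        Prob(X ≥ 269)
        ≈ Prob(Y ≥ 268.5), with Y normal (mean 250, variance 187.5)
        = Prob(Z ≥ 1.35)
        = 0.0885 = 8.85%
    ▪ Is the P-value less than 5%? No. Therefore we do not reject the null.
  o Notes on the P-value
    ▪ The P-value approach and the critical-region approach always reach the same conclusion.
    ▪ Sometimes it is more convenient to use the P-value approach.
  o 2-sided example using P-values
    ▪ H0: $\pi = 1/2$; H1: $\pi \ne 1/2$; 5% error rate; x is the number of heads in 10000 tosses.
    ▪ Suppose x = 5090 heads.
        P-value = Prob(X ≥ 5090 or X ≤ 4910)
                = 2 × Prob(X ≥ 5090)
                = 2 × Prob(X ≥ 5090 when $\pi = 1/2$)
                ≈ 2 × Prob(Z ≥ 1.79)
                = 2 × 0.0367 = 0.0734
    ▪ 0.0734 > 0.05, so we accept the null that the coin is fair.

2.3.5 Testing for the Equality of Two Binomial Parameters
• Background
  o Let $X_1$ and $X_2$ be independent RVs with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
  o Consider $D = X_1 - X_2$:
      Mean of D = $\mu_1 - \mu_2$
      Variance of D = $\sigma_1^2 + \sigma_2^2$
• Ex.: are women more likely to be left-handed?
  o Step 1: hypotheses. H0: $\pi_{men} = \pi_{women}$; H1: $\pi_{men} \ne \pi_{women}$.
  o Step 2: 5% type 1 error rate.
  o Step 3: test statistic.

                  LH              RH                             Total
      Women       $X_1$           $n_1 - X_1$                    $n_1$
      Men         $X_2$           $n_2 - X_2$                    $n_2$
      Total       $X_1 + X_2$     $n_1 + n_2 - X_1 - X_2$        $n_1 + n_2$

    ▪ Take $D = X_1/n_1 - X_2/n_2$. We can "Z" the D. Under H0 (common value $\pi$), D has mean 0 and variance $\pi(1-\pi)(1/n_1 + 1/n_2)$, so when we Z it we get
        $Z = \dfrac{X_1/n_1 - X_2/n_2}{\sqrt{\pi(1-\pi)\left(1/n_1 + 1/n_2\right)}}$
    ▪ We can substitute the following for $\pi$:
        $\hat{\pi} = \dfrac{X_1 + X_2}{n_1 + n_2}$
    ▪ And ultimately we get
        $Z = \dfrac{X_1/n_1 - X_2/n_2}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(1/n_1 + 1/n_2\right)}}$
  o Step 4: cut-off values.
    ▪ For a type 1 error rate of 5%, half (2.5%) falls on either side.
    ▪ Therefore reject H0 if Z > 1.96 or Z < -1.96.
    ▪ If it were one-sided we would not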
      split up the percentage.
  o Step 5: get data, do the test.
• Notes on two binomial parameters
  o These are called tables of association.
  o There is an alternative phrasing:
    ▪ There is some positive association between medicine and pain relief.
    ▪ There is some negative association between medicine and no pain relief.
    ▪ There is some positive association between placebo and no pain relief.
    ▪ There is some negative association between placebo and pain relief.
  o The order of the table does not matter.
  o So far we have only done 1-sided tests. For a 2-sided test, the critical region is 2-sided.
  o We have not used a continuity correction for the Z because it cancels out for 2 binomials.
  o P-value version
    ▪ Steps 1-3 are the same. Jump to the data and compute the test statistic.
    ▪ Say we get z = 2.88.
    ▪ P-value = 1/2 - 0.4980 = 0.002, so reject the null.
  o Sample size
    ▪ What happens when you increase it?
    ▪ You are more likely to reject the null hypothesis when the alternative is true.
    ▪ Increase the sample size by a factor of k:
    ▪ Z is multiplied by $\sqrt{k}$.
  o When you make tables, always use whole numbers (counts), not percentages. You lose information about the sample size by using percentages.
  o Try to make $n_1$ and $n_2$ similar.
  o "Z" involves approximations. We put Z in quotes because we have approximated a proportion by a normal distribution using the central limit theorem.
  o How big does n have to be? You have to be sure that every box in the table has at least 5.
    ▪ Fisher's exact test is exact.
  o This is a differential test. We are looking at the difference between 2 parameters. We don't care about their exact values.
  o We are not testing for cause and effect. Correlation is not causation.

2.3.6 Contingency Tables
• Re-inspect the 2x2 table, with bigger tables in mind:

                  B            Not B        Total
      A           $o_{11}$     $o_{12}$     $r_1$
      Not A       $o_{21}$     $o_{22}$     $r_2$
      Total       $c_1$        $c_2$        n

• New Z statistic:
      $z = \dfrac{(o_{11}o_{22} - o_{12}o_{21})\sqrt{n}}{\sqrt{r_1 r_2 c_1 c_2}}$
  o $r_1, r_2, c_1, c_2$ are called the marginal totals.
  o Chi-square: by squaring z for the 2x2 table, we can use chi-square.
• Notes on 2x2 contingency tables
  o There is 1 degree of freedom, as we can treat the marginal totals as known values.
  o Notation: we will use $X^2$ for chi-square.
  o Only very large positive numbers will lead you to reject the null.
  o We can't use p-values for chi-square data; only use the original chart values.
  o Alternative formula for chi-square:
      $X^2 = \sum \dfrac{(\text{observation} - \text{expectation})^2}{\text{expectation}} = \sum \dfrac{(o - e)^2}{e}$
    ▪ e stands for our expectation under the null.
    ▪ The observed values that we eventually get will be whole numbers.
      - But the e's can be decimals.
    ▪ $X^2 = Z^2$ only for 1 degree of freedom.

2.3.7 Larger Contingency Tables
• For larger contingency tables there are $(r-1)(c-1)$ degrees of freedom.
• Notes on r x c tables
  o It is possible to create deductive/inductive statements.
    ▪ Ded.: if H0 is true, then Prob($X^2$ > 16.812) is 1%.
    ▪ Ind.: $X^2$ = 18.1, so we can conclude that there is some association, with a type 1 error rate of 1%.
  o All notes from 2x2 tables apply here as well.
    ▪ However, $X^2$ DOES NOT EQUAL $Z^2$ for more than 1 degree of freedom.
  o We will use cut-off values, because you need a computer for p-values.
    ▪ You can estimate p-values:
    ▪ Look at the chi-square chart and see which percentages your p-value would fall between.
  o In general we will only look at diffuse alternatives (looking for some association).
  o Association ≠ causation.
  o $X^2$ involves some approximations.
    ▪ It is not safe to use if any expected number is less than 5.
  o What are H0 and H1 for an r by c table?
    ▪ Let $\pi_{i\cdot}$ denote the probability that an observation will fall in the i-th row.
    ▪ Let $\pi_{\cdot j}$ denote the probability that an observation will fall in the j-th column.
    ▪ Let $\pi_{ij}$ denote the probability that an observation will fall in cell (i, j).
    ▪ H0: $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for all pairs (i, j).
    ▪ H1: $\pi_{ij} \ne \pi_{i\cdot}\,\pi_{\cdot j}$ for at least one pair (i, j).

2.3.8 Testing for a Specified Discrete Probability Distribution
• Ex.: is a 6-sided die fair?
  o Step 1: H0: the die is fair; H1: the die is unfair.
  o Step 2: 5% type 1 error.
  o Step 3: roll the die n times and see how many of each value we get. The test statistic is $X^2$.

      Value:          1       2       3       4       5       6
      We see:         $N_1$   $N_2$   $N_3$   $N_4$   $N_5$   $N_6$
      Expectation:    n/6     n/6     n/6     n/6     n/6     n/6

  o Step 4: if $X^2$ > 11.07, reject the null (5 degrees of freedom).
• Notes on discrete tests
  o There are k possible values.
  o In such cases there are generally k-1 degrees of freedom.
  o In general, test by listing the possible values with their null-hypothesized probabilities, the observed values, and the expected values.
• Further generalization: is this genetic theory correct?
      Values: Red, Pink,
      White
      Probabilities (in the order Red, Pink, White):   $\pi^2$,   $2\pi(1-\pi)$,   $(1-\pi)^2$
      Expected:                                        $n\pi^2$,  $2n\pi(1-\pi)$,  $n(1-\pi)^2$
      Observed:                                        $n_r$,     $n_p$,           $n_w$
  o We have to substitute an estimate for $\pi$.
  o The degrees of freedom become k-1-1 = k-2, as you subtract 1 degree of freedom for your estimate.
  o Observed data are always whole numbers. Expected values can be decimals.
  o Set up the experiment so the expected numbers are all at least 5.
  o Use cut-off values. It is very hard to compute p-values.
  o We have discussed discrete distributions. The test can also be used for continuous ones.
  o $X^2$ is used in different situations. What differs between them is the null hypothesis.

2.3.9 Tests for Means
• 2.3.9.1 Normal Theory Tests, Variance Known
  o Background
    ▪ Suppose we know the variance of human temperature, $\sigma^2 = 1.1$, and assume the variance is the same for elevated temperatures.
    ▪ We will measure temperature after a procedure and ask whether $\mu$ > 98.6. This has nothing to do with individuals, just the mean.
    ▪ We will assume:
      - After the procedure, temperatures are still normally distributed.
      - The variance is 1.1 again, the same as before.
  o Test: elevated temperatures
    ▪ H0: $\mu = 98.6$; H1: $\mu > 98.6$.
    ▪ 1% error rate.
    ▪ Test statistic:
        $Z = \dfrac{\bar{X} - 98.6}{\sigma/\sqrt{n}}$, with $\sigma^2 = 1.1$
    ▪ Cut-off values
      - 1%: if z > 2.33, reject the null.
    ▪ Get data, do the test.
• 2.3.9.2 Normal Theory Tests, Variance Unknown (i.e. t-tests)
  o Test: elevated temperatures
    ▪ H0: $\mu = 98.6$; H1: $\mu > 98.6$.
    ▪ 1% error rate.
    ▪ Test statistic:
        $T = \dfrac{\bar{X} - 98.6}{S/\sqrt{n}}$
      We need to estimate the variance. Thus this is NO LONGER Z-DISTRIBUTED, because we estimated the standard deviation. It is called the t-statistic.
    ▪ Cut-off values
      - 1%; degrees of freedom = n-1.
      - So for, say, n = 20 and 1%: reject if t > 2.539.
    ▪ Get data, do the test.
  o Notes on the 1-sample t-test
    ▪ This is frequently used.
    ▪ The t-test is a question about the mean of a distribution.
    ▪ In general, can you get p-values? Not really.
    ▪ The numerator of the t-statistic compares the average $\bar{x}$ to the null mean $\mu_0$.
    ▪ The t-statistic is scale free.
      - I.e. units don't matter: it is the same for Celsius and Fahrenheit.
    ▪ How do the cut-off numbers depend on n?
      - As n increases, the cut-off numbers go down (this expands the critical region).
      - Thus t becomes Z-distributed as n grows: more degrees of freedom, more normal-like.
    ▪ One-sided down: just tack a negative sign onto the t-chart value.
    ▪ Two-sided test: for 5%, put 2.5% on each side.
    ▪ The t-statistic is a signal-to-noise ratio.
      - The signal is how different they are
        (the numerator).
      - The noise is the variability (the denominator).
    ▪ Sample size is important. As n increases, the noise goes down by $\sqrt{n}$.
    ▪ It is called Student's t-distribution after a pseudonym.
    ▪ Deductive and inductive statements for the t-test
      - Ded.: if H0 is true, then Prob(T > 2.33) = 1%.
      - Ind.: we observe that t = 3.2, so we infer that we can reject H0.
    ▪ The Z-test and the t-test assume normality.
      - You cannot apply the t-test if your data are not normal.
• 2.3.9.3.1 Two-Sample t-Tests
  o Ex.: is $\mu_{women} > \mu_{men}$ for blood pressure?
    ▪ H0: $\mu_{women} = \mu_{men}$; H1: $\mu_{women} > \mu_{men}$.
    ▪ 1% error.
    ▪ Test statistic (women's data $X_1, \ldots, X_n$; men's data $Y_1, \ldots, Y_m$):
        $T = \dfrac{\bar{X} - \bar{Y}}{S\sqrt{1/n + 1/m}}$, where $S^2 = \dfrac{\sum (X_i - \bar{X})^2 + \sum (Y_j - \bar{Y})^2}{n + m - 2}$
      We must estimate $s^2$ and use the same value for men and women.
    ▪ Cut-off values
      - Degrees of freedom = n+m-2.
      - Use the chart for the t-value.
    ▪ Get data, do the test.
  o Notes on the 2-sample t-test
    ▪ Most notes from the one-sample test apply.
    ▪ n and m may not be the same.
    ▪ Assume both samples (the X's and the Y's) are normally distributed.
    ▪ Difference in H0:
      - 1-sample: $\mu$ = some number.
      - 2-sample: $\mu_{women} = \mu_{men}$.
    ▪ Women could be the X's or the Y's; just pick one.
    ▪ Degrees of freedom:
      - 1-sample: n-1.
      - 2-sample: n+m-2 (we have lost 2 degrees of freedom).
    ▪ We could have a 1-sided up, 1-sided down, or 2-sided alternative.
    ▪ Alternative formula: [missing in source]
• 2.3.9.3.2 Paired Two-Sample t-Test
  o Background
    ▪ Used to reduce noise.
    ▪ Say the previous data were sister-brother pairs.
    ▪ More of the variance comes from family than from gender.
    ▪ This variance is noise.
    ▪ Our goal is to assess gender.
  o Ex.: same as before, with brother-sister pairs.
    ▪ H0: $\mu_{women} = \mu_{men}$; H1: $\mu_{women} > \mu_{men}$.
    ▪ 1% error.
    ▪ Test statistic, using the within-pair differences $D_i = X_i - Y_i$ with average $\bar{D}$ and estimated standard deviation $S_D$:
        $T = \dfrac{\bar{D}}{S_D/\sqrt{n}}$
  o Notes on the paired t-test
    ▪ Degrees of freedom:
      - Unpaired: 2n-2 degrees of freedom.
      - Paired: n-1. We lose degrees of freedom, but it is worth it to reduce noise.
    ▪ We are more likely to reject H0 with the paired test.
    ▪ If there is a natural pairing, use it.
    ▪ Pairing types: brother/sister, the same person before/after.
    ▪ ANOVAs are a generalization of t-tests.
    ▪ We could have used the thrown-away degrees of freedom to look at family relationships.
  o t-TEST DEGREES OF FREEDOM OVERVIEW
    ▪ One-sample t-test: n-1 (n minus the number of estimated parameters)
    ▪ Unpaired 2-sample t-test: n+m-2
    ▪ Paired 2-sample t-test: n-1
• 2.3.9.4 Non-Parametric Tests
  o When we do not assume a particular distribution (distribution-free, or non-parametric), we use alternatives to the t-tests.
  o Simple sign test
    ▪ Steps
      - H0: $\mu = 98.6$; H1: $\mu > 98.6$.
      - 1% error
        rate.
      - Test statistic:

            Difference        Sign
            $x_1 - 98.6$      + or -
            $x_2 - 98.6$      + or -
            ...
            $x_n - 98.6$      + or -

        Find the differences between the data and the expected value, get the sign (positive or negative) of each, and take X = the number of positive signs.
      - Critical region / p-value:
        · Compute a p-value: what is the chance of observing data as extreme or more extreme?
        · Go to the binomial probability chart and add up.
        · Say x = 7 temperatures (out of 10) are greater than 98.6.
        · P-value = Prob(X = 7, 8, 9 or 10) for X binomial with n = 10 and $\pi = 1/2$.
  o Wilcoxon one-sample test
    ▪ Steps
      - H0: $\mu = 98.6$; H1: $\mu > 98.6$.
      - 1% error rate.
      - Test statistic:

            Difference        Sign      Rank of absolute value
            $x_1 - 98.6$      + or -    4
            $x_2 - 98.6$      + or -    7
            ...
            $x_n - 98.6$      + or -    2

        Find the differences between the data and the expected value, get the sign of each, and rank the absolute values. Test statistic: W = the sum of the ranks of the initially positive differences.
      - Critical region / p-value:
        · When H0 is true, what are the mean and variance of W? Use the Z chart to find the cut-off values.
      - Get data, do the test.
    ▪ Notes on the 1-sample Wilcoxon test
      - It is a non-parametric test about a mean: we don't assume normality.
      - We have assumed a distribution that is symmetric around the mean.
      - Compare Wilcoxon to the simple sign test: Wilcoxon is more powerful because it looks at the size of each deviation.
  o Alternative to the 2-sample unpaired test
    ▪ Test
      - H0: $\mu_{women} = \mu_{men}$; H1: $\mu_{women} > \mu_{men}$.
      - 5% error.
      - Test statistic:
        · Rank all of the data from smallest to largest. The list will be n+m long. Ex.: X6, Y2, X4, X3, Y12, ...
        · Find the midpoint.
        · Make a table:

                        1st half of list     2nd half of list     Total
            Women       $o_{11}$             $o_{12}$             n
            Men         $o_{21}$             $o_{22}$             m
            Total       (n+m)/2              (n+m)/2              n+m

        · Do the 2x2 contingency Z test.
      - Critical region.
  o Non-parametric alternative to 2 samples (Mann-Whitney)
    ▪ Test
      - H0: $\mu_{women} = \mu_{men}$; H1: $\mu_{women} > \mu_{men}$.
      - 5% error.
      - Test statistic:
        · Get data $x_1, x_2, x_3, \ldots, x_N$ and $y_1, y_2, y_3, \ldots, y_M$.
        · Rank the values from smallest to largest.
        · Test statistic: U = the sum of the ranks of the female blood pressures.
      - Critical region:
        · We know that Z (the standardized U) will follow a normal distribution, which allows us to test even if we aren't assuming normality of the data.
  o Permutation test (most common)
    ▪ Computationally intensive; cannot be done by hand.
    ▪ Test
      - H0: $\mu_{women} = \mu_{men}$; H1: $\mu_{women} > \mu_{men}$.
      - 5% error.
      - Test statistic: d, the observed difference between the two groups.
      - Critical region:
        · What values of d will lead us to reject the null? What does the distribution of d look like? Use a computer to work out the distribution.
        · Really, the null is not only that the means are equal but that the two groups have the same distribution.
        · Randomly generate genders for all of the recorded values, many times over, and compare the resulting differences with the one in the data set.
        · Reject H0 if the observed difference exceeds 95% of the permutation differences.
        · For a 2-sided test, look at 2.5% above and below, so use 97.5%.
  o Notes on non-parametric tests
    ▪ We have 5: simple sign, 1-sample Wilcoxon, 2x2 table, 2-sample Wilcoxon (Mann-Whitney), permutation.
    ▪ These tests are valid even when the data are not normally distributed.
    ▪ Disadvantage: if the data are normal, the t-test is the most powerful/efficient test.
    ▪ In practice:
      - Do both the t-test and a non-parametric test.
      - Usually they will give the same result.
      - If they differ, worry about whether the data are normal.

2.3.10 Hypothesis Testing in Regression
• Regression
  o Lab rats at chosen temperatures $x_1$ to $x_n$. We will observe heart rates $Y_1$ to $Y_n$.
  o Assume Mean($Y_i$) = $\alpha + \beta x_i$.
  o Variance = $\sigma^2$.
  o Estimate $\alpha$, $\beta$ and $\sigma^2$.
• So far we have just estimated the parameters.
• We should also test the natural hypothesis:
  o Does temperature influence heart rate?
• Test
  o H0: $\beta = 0$; H1: $\beta > 0$.
  o 5% type 1 error.
  o Test statistic
    ▪ Recall that we used the random variable $B = S_{xY}/s_{xx}$.
    ▪ We used B as an estimator of $\beta$.
    ▪ We know the mean of B is $\beta$ and the variance of B is $\sigma^2/s_{xx}$.
    ▪ Therefore B is an unbiased estimator.
    ▪ Given $y_1, \ldots, y_n$, it produces the estimate called b.
    ▪ The test statistic is a type of t-statistic:
        $T = \dfrac{B}{S/\sqrt{s_{xx}}}$
      - Assume the Y's are normal.
  o Critical region: n-2 degrees of freedom, as we have already estimated $\alpha$ and $\beta$.
• Notes on the t-test in regression
  o We assume a normal distribution.
  o If n is only 2 we have 0 degrees of freedom and cannot do the test.
  o The null can be changed: the question is whether the slope of the line is larger than 0.
• All 4 t-tests together: t-TEST DEGREES OF FREEDOM OVERVIEW
  o One-sample t-test: n-1 (n minus the number of estimated parameters)
  o Unpaired 2-sample t-test: n+m-2
  o Paired 2-sample t-test: n-1
  o Regression t-test: n-2

1.3 Concepts of Mean and Variance of Discrete Random Variables
• The mean is the center; the variance is the width.
  o Distributions 1 and 2: same mean,
    different variance.
  o Distributions 2 and 3: same variance, different mean.
• 1.3.1 Formal Definition of the Mean
  o Mean: let X be a discrete random variable that can assume k possible values $v_1, v_2, \ldots, v_k$. Then the mean of X is defined to be
      $\text{mean} = v_1\,\text{Prob}(X = v_1) + v_2\,\text{Prob}(X = v_2) + \cdots + v_k\,\text{Prob}(X = v_k)$
  o Notes on the definition of the mean
    ▪ Know the formula.
    ▪ The mean is a concept in probability theory, not statistics.
    ▪ Usually the mean of a random variable is unknown in a research setting.
    ▪ The standard notation for the mean is $\mu$.
    ▪ The mean need not be one of the realizable values that the random variable assumes.
    ▪ Mean vs. average: NOT SYNONYMOUS (probability vs. statistics).
    ▪ The mean can be thought of as the balance point, or the center of gravity.
    ▪ Symmetry property: if the distribution is symmetric, the mean is equal to the point of symmetry.
    ▪ Shift property of the mean: if distribution 1 is shifted to give distribution 2, their means differ by the amount of the shift.
    ▪ "The mean of random variable X" = "the mean of the distribution of X".
    ▪ Why do we care about the mean?
      - About 50% of statistics is about posing questions about unknown means.
• 1.3.2 Short Formulas for the Means of Several Random Variables
  o Numerically defined: tough luck, use the long formula.
  o Uniform distribution
    ▪ Possible values $a,\ a+1,\ \ldots,\ a+b-1$, for a total of b values.
    ▪ All probabilities are 1/b.
        $\mu_{\text{uniform}} = a + (b-1)/2$
  o Binomial distribution
    ▪ n trials with parameter $\pi$.
        $\mu_{\text{binomial}} = n\pi$
• 1.3.3 The Variance of a Discrete Random Variable
  o In both of the cases above: same mean but different variance (range).
  o Variance: let X be a DRV that can take k values $v_1, v_2, \ldots, v_k$. Then the variance is
      $\sigma^2 = (v_1-\mu)^2\,\text{Prob}(X = v_1) + (v_2-\mu)^2\,\text{Prob}(X = v_2) + \cdots + (v_k-\mu)^2\,\text{Prob}(X = v_k)$
  o Notes on the variance
    ▪ Know the formula.
    ▪ In practice, the numerical value of the variance is unknown in a research setting.
    ▪ The definition depends on the mean $\mu$.
    ▪ The reserved notation for the variance is $\sigma^2$.
    ▪ In the previous example with the same mean, the variance is different.
    ▪ Dimensions of the variance: the square of the dimension of the RV.
    ▪ The standard deviation is the square root of the variance, or $\sigma$.
    ▪ Shift property of the variance: shifting the distribution has
no effect on the vanance 0 Alternative Formula for the Variance 0000000 2 v12 ProbXv1 v22 ProbXv2 vk2 ProbXvk p2 0 Relationship Between Variance and Predictability high variance yield low predictability 0 Why Do We Care I We care much more about the mean but we need to know variance to determine precision 0 134 The Variances of Various Discrete Random Variables o Numerically Defined RV use long formula 2 v12 ProbXv1 v22 ProbXv2 vk2 ProbXvk p2 0 Uniform Distribution httpsWWWCourseheroComfile8062552BiOStatNotes4 Zuniform b 1b 1 12 0 Notes on Variance of Uniform Distribution I Works only for uniform distribution I a does not appear due to shift property I If b1 Variance is O This makes sense as only 1 value 0 Binomial Distribution 0 Let X be a binomial RV with index n and parameter 2binomial 139 0 Notes on Variance of Binomial I Know formula is only for binomial o 2 outcomes per trial 0 Fixed number of trials 0 Independent results is unknown parameter Variance is 0 when 0 or 1 all losses or all wins The variance is maximizied when 12 Example should you put all your eggs in one basket 0 All in one basket in not binomial o In 12 baskets is binomial n12 unknown result of each is break ornoD 0 135 The Two Standard Deviation Rule 0 Not a law just a useful heuristic o Applies to all random variables that are unimodal only one center 0 Two SD Rule of Unimodal Let X be a random variable with a unimodal distribution Then the probability that X will fall within two SD of the mean is about 95 Prob2 X 2 95 0 Most useful when ngt20 httpsWWWCourseher0Comfile8062552BiOStatN0tes4 14 Continuous Random Variables o DRVs are how many gt Discrete Values 141 Continuous RVs are how much gt Continuous Range any number 0 Continuous Random Variable a conceptual and numerical quantity that in sime future experiment will take some value with a continuous range of possible values There is some probability known or unknown to use that the continuous RV will take some value in the range a to b 
where a is the highest possible value and b is the lowest possible value 0 We assign probability to subrange not discrete value c We allocate such probability using a socalled density function fx such that ProbasXsbIfx fromatob 0 Notes on the Continuous Random Variable I fx may be known or unknown Different continuous RVs have different density fuctions There are no negative areas in density functions Always greater than 0 Integral of entire range is 1 Chance that RV X is some exact value c is O 0 Makes sense as integrating from c to c is O I For a continuous RV Prob a s X s b Prob a lt X lt b 0 Because probability of getting exact a or b is 0 need not include 0 Example 1 I Let C be a continuous RV with fx2x from O to 1 I ProbOsXs11 I Prob3sXs71 o X2 from 3 to 7 49 09 40 0 142 Mean and Variance of Continuous RV 0 Mean of Continuous RV I x fx dx 0 Variance of Continuous RV 2 x2fxdx or 2 x2fxdx2 0 143 The Normal Distribution 0 Normal Distribution a continuous RV X has the normal distribution with mean p and variance 02 if its density function satisfies 0 Notes on Normal Distribution 0 e is natural log 0 Here actually is 314159 0 Cannot integrate normal function 0 The Standard Normal special family of normal distribution where p0 and 021 httpsWWWCourseher0Comfile8062553BiOStatN0teSS 0 Theorem Let X be a normal RV with mean u and variance oz Then Z Xu O 0 Z has standard normal distribution 0 To Zit is to go from X to Z 0 USE CHART draw picture if necessary 0 Notes on Standard Normal 0 Should always use Z notation 0 Use given chart not chart online 0 Proves two standard deviation rule Prob2 Z 2 95 Prob2 Z 2 4772 4772 9544 0 To get exactly 95 k196 not 2 0 144 The Normal Approximation to the Binomial 0 We lack charts for ngt2O because for ngtgtgt a binomial random variable can be well approximated by a normal random variable I In approximation you approximate a discrete binomial RV with a continuous normal random variable 0 Continuity Correction Let X be a binomial random 
variable with index n and parameter Y I Y is a normal RV with n and 2 n1 Probbina X b Prob a 12 Y b 12 Must use not lt Continuity correction is always 12 of the space between discrete values Notes on Normal Approximation 0 Can compute probability of very large n o Becomes more accurate as n increases I Approximation is poor for nlt15 0 Continuity corrections allow us to make binomial distributions into normal distributions 0 Continuity correction need not be 12 I Always should equal one half of the difference between consecutive values of the discrete RV being approximated norm httpsWWWCourseher0Comfile8062553BiOStatNoteSS 15 Many Random Variables 0 151 IID Assumption 0 We will assume that X1 X2 Xn are independent 0 We will assume that all RVs X1 X2 Xn are identically distributed I mean of X1 meanX2 meanXn I variance of X1 varianceX2 varianceXn 0 152 Sums Averages and Differences of RVs 0 Let X1 X2 Xn be D random variables with mean and variance 2 o TX1 X2 X I T the total is a Random Variable o 1nX1X2 Xn I the average is a Random Variable o D X1 X2 I D the difference is a Random Variable n RV Mean Variance T n n2 2 n D O 22 0 153 Central Limit Theory 0 Provides justification that Total Average and Difference formulas were valid 0 Sums and averages of MD RVs are approximately normal as n increases becomes more accurate 0 Justifies use of normal approximations 0 154 More About Binomial Random Variable 0 mean n o variance n1 o Binomial is the total number of successes over n trials 0 Total of Successes X1 X2 X3 X I X 0 failure 1 I Xn 1 success n quotBernuli RVs only 0 or 1 as options Consider Proportion of n trials that will be successful I Q 1n X1 X2 X3 Xn where all X are independent Bernulis I Possible values of Q are o On 1n 2n nn ranging from O 1 I Mean of Q httpsWWWCourseher0Comfile8062554BiOStatN0tes6 I Variance of Q 1 n 0 155 Comments on Continuity Correction 0 Only need for approximating a discrete RV by a cont RV httpsWWWCourseher0Comfile8062554BiOStatN0tes6
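The continuity correction from the normal approximation to the binomial can be checked numerically. The sketch below is only an illustration, not part of the course material: the function names (`binomial_prob`, `normal_cdf`, `normal_approx`), the parameter name `theta` for the binomial parameter, and the example numbers (n = 100, parameter 0.5, range 45 to 55) are all assumptions chosen for the demonstration. It compares the exact binomial probability with the normal approximation computed with and without the 1/2 correction.

```python
import math

def binomial_prob(n, theta, a, b):
    """Exact Prob(a <= X <= b) for a binomial RV with index n and parameter theta."""
    return sum(
        math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(a, b + 1)
    )

def normal_cdf(x, mu, sigma):
    """Prob(Y <= x) for a normal RV with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_approx(n, theta, a, b, correction=0.5):
    """Normal approximation to Prob(a <= X <= b), widening the range by `correction`."""
    mu = n * theta                               # binomial mean
    sigma = math.sqrt(n * theta * (1 - theta))   # binomial standard deviation
    return normal_cdf(b + correction, mu, sigma) - normal_cdf(a - correction, mu, sigma)

# Hypothetical example values, chosen only for illustration
n, theta, a, b = 100, 0.5, 45, 55
exact = binomial_prob(n, theta, a, b)
with_cc = normal_approx(n, theta, a, b)                      # uses a - 1/2 and b + 1/2
without_cc = normal_approx(n, theta, a, b, correction=0.0)   # no continuity correction

print(f"exact binomial:        {exact:.4f}")
print(f"normal, corrected:     {with_cc:.4f}")
print(f"normal, not corrected: {without_cc:.4f}")
```

With these example values the corrected approximation should land much closer to the exact probability than the uncorrected one, which is the reason for using the correction whenever a discrete RV is approximated by a continuous one.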