Data Analysis and Statistical Inference
This 12-page reader was uploaded by Liubov Volkova on Tuesday, March 25, 2014. The reader belongs to a course at a university taught by a professor in the Fall. Since its upload, it has received 231 views.
ANOVA AND F

For comparing the means of 2 groups we use a Z or t test. If we have 3+ groups we use a new test, ANALYSIS OF VARIANCE (ANOVA), and a new statistic called F.

Z/t test: we compare two groups and try to establish whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.
ANOVA: we compare means from 3+ groups to establish whether they are so far apart that the observed differences cannot ALL reasonably be attributed to sampling variability.

HYPOTHESIS TESTING
H null: the mean outcome is the same across all categories.
HA: at least one pair of means are different from each other.

ANOVA TABLE (example)
            Df   Sum Sq   Mean Sq   F value   Pr(>F)
class        3    23656    7885.5    21.735   <0.0001
Residuals  791   286980     362.8
Total      794   310636

DEGREES OF FREEDOM
Group degrees of freedom dfG depends on the number of groups k: dfG = k - 1.
Total degrees of freedom dfT depends on the total number of observations n: dfT = n - 1.
Error degrees of freedom dfE is the difference between the two: dfE = dfT - dfG.

SUM OF SQUARES
Sum of squares total (SST) measures the total variability in the response variable. For each observation we take the difference between the value of the response variable and the grand mean, square this difference, and sum them all up: SST = Σ (yi - ȳ)².

MEAN SQUARES
Average variability: the sum of squares scaled by the associated degrees of freedom (MSG = SSG / dfG, MSE = SSE / dfE).

P-VALUE
It is the probability of at least as large a ratio between the "between" and "within" group variabilities if in fact the means of all groups are equal.

P-VALUE FOR ANOVA IN R
The pf function requires 3 arguments: (1) the F value, (2) dfG, (3) dfE, plus lower.tail = FALSE:
pf(21.735, 3, 791, lower.tail = FALSE)
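The notes use R's pf for the ANOVA p-value. As a cross-check, here is a minimal pure-Python sketch (no external libraries; the helper names `f_pdf` and `f_upper_tail` are my own) that approximates the same upper-tail probability by numerically integrating the F density:

```python
import math

def f_pdf(x, d1, d2):
    # density of the F distribution with (d1, d2) degrees of freedom,
    # computed in log space for numerical stability
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

def f_upper_tail(f, d1, d2, upper=1000.0, steps=200_000):
    # crude trapezoid integration of the density from f to `upper`;
    # mirrors pf(f, d1, d2, lower.tail = FALSE) in R
    h = (upper - f) / steps
    total = 0.5 * (f_pdf(f, d1, d2) + f_pdf(upper, d1, d2))
    for i in range(1, steps):
        total += f_pdf(f + i * h, d1, d2)
    return total * h

# the table's F = 21.735 with dfG = 3 and dfE = 791 gives a vanishingly small p-value
p = f_upper_tail(21.735, 3, 791)
```

Because F here is far out in the right tail, the p-value is effectively zero, which is why the table reports Pr(>F) < 0.0001.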
F STATISTIC
The ratio of the between-group and within-group variabilities:
F = (variability between groups) / (variability within groups) = MSG / MSE

Sum of squares groups (SSG) measures the variability BETWEEN groups. It is the EXPLAINED variability: the part explained by the explanatory (grouping) variable, the one that is affected by being in one group or another. For each group we take the difference between the mean of the response variable for the group and the grand mean of the response variable, square it, multiply it by the number of observations in the group, and sum them all up: SSG = Σ nj (ȳj - ȳ)².

Sum of squares error (SSE) measures the variability WITHIN groups. It is the UNEXPLAINED variability, due to reasons other than being in a specific group: SSE = SST - SSG.

IMPORTANT: WHAT DO WE NEED?
1. To reject H null we need a small p-value.
2. To get a small p-value we need a large F statistic.
3. To get a large F statistic we need greater between-groups variability and lesser within-groups variability.

The F value will always be positive, since it is calculated using squared values. The p-value for ANOVA will therefore always be on the higher tail of the distribution, so when using R we need to set lower.tail to FALSE. We do not consider two tails for the F distribution, only the right tail (it is right-skewed).

As a result we can establish whether there is at least one pair of groups that are significantly different. We cannot say which pair it is or what exactly the difference between them is.

DECISION ERRORS
Type 1 error: the null hypothesis is rejected even though it is true (convict the innocent).
Type 2 error: the null hypothesis is favoured even though it is wrong (declare the defendant innocent when he is guilty).

                      Truth: H0 true       Truth: HA true
reject H0             Type 1 error (α)     correct
fail to reject H0     correct              Type 2 error (β)

SIGNIFICANCE LEVEL
α = P(Type 1 error | H0 true); usually set at 5%.

POWER OF TEST
The POWER of a test is the probability of correctly rejecting the null.
Thus the significance level alpha is basically the chance of committing a Type 1 error: the probability of wrongly rejecting the null hypothesis (Type 1 error) is alpha. The probability of wrongly keeping the null (Type 2 error) is beta. Our goal is to keep both ALPHA and BETA low. Alpha should be smaller in cases when Type 1 errors are dangerous or very costly; similarly, if a Type 2 error is relatively more dangerous/costly, it is reasonable to allow a higher alpha.

FOR A TWO-SIDED TEST, THE CONFIDENCE INTERVAL COMPLEMENTS THE SIGNIFICANCE LEVEL
So if the confidence interval is 95%, then the significance level is 5%: a two-sided HT with α = 0.05 corresponds to a 95% confidence interval (0.95 in the middle, 0.025 in each tail).

FOR A ONE-SIDED TEST, THE CONFIDENCE INTERVAL COMPLEMENTS THE SIGNIFICANCE LEVEL × 2
So if SL = 5%, then the CI is 90%: a one-sided HT with α = 0.05 corresponds to a 90% confidence interval (0.90 in the middle, 0.05 in each tail).

If H0 is REJECTED, then the CI should NOT include the null value. If H0 is KEPT, then the CI is expected to include the null value.

BOOTSTRAPPING
Bootstrapping is used when we have a small sample and cannot use the CLT for inference. It is basically emulating the multiple sampling from the population that we do not have access to, on the basis of the assumption that for every observation that we have there will be more like it in the population.
1. Take a bootstrap sample: a random sample taken from the original sample, with replacement, of the same size as the original sample.
2. Calculate the bootstrap statistic: mean, median, or whatever.
3. Repeat 1 and 2 many times to create a bootstrap distribution.

BOOTSTRAPPING: PERCENTILE METHOD
We take the distribution and calculate how many of the bootstrap statistics will not fit in our confidence interval, and cut them off from the tails (e.g. 2.5% from each tail for a 95% interval). The interval between the cutoffs will be our confidence interval.

BOOTSTRAPPING: STANDARD ERROR METHOD
For this we need to know the mean and the SE of the bootstrap distribution. Our CI will be the mean plus/minus the margin of error, which is the SE multiplied by the z-score that we get from the CI size.
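Both bootstrap methods can be sketched in a few lines of Python using only the standard library. The sample data here is hypothetical, just to make the example runnable:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# hypothetical observed sample; in practice this is your real data
sample = [98, 104, 87, 110, 95, 102, 91, 99, 105, 93]

boot_means = []
for _ in range(10_000):
    # bootstrap sample: same size as the original, drawn WITH replacement
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

boot_means.sort()
# percentile method: cut 2.5% off each tail for a 95% interval
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]

# standard error method: center +/- z* times the SE of the bootstrap statistics
center = statistics.mean(boot_means)
se = statistics.stdev(boot_means)
lo_se, hi_se = center - 1.96 * se, center + 1.96 * se
```

With a roughly symmetric bootstrap distribution the two intervals come out very close to each other.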
The bootstrap distribution is like a sampling distribution, but instead of samples from the true population we have simulated samples: the bootstrap samples.

BAYESIAN INFERENCE
We start with prior probabilities: what we believe before we observe any data. Example: one hand holds a 6-sided die and the other a 12-sided die, and we do not know which is which. We rolled the right hand and got ≥ 4. Now what is the probability that the right hand holds the 12-sided die?

Priors: P(12-sided) = 0.5, P(6-sided) = 0.5.
P(≥4 | 12-sided) = 0.75, so P(≥4 & 12-sided) = 0.75 × 0.5 = 0.375
P(<4 | 12-sided) = 0.25, so P(<4 & 12-sided) = 0.25 × 0.5 = 0.125
P(≥4 | 6-sided) = 0.5, so P(≥4 & 6-sided) = 0.5 × 0.5 = 0.25
P(<4 | 6-sided) = 0.5, so P(<4 & 6-sided) = 0.5 × 0.5 = 0.25

We apply the Bayes theorem: the probability of A given B is the probability of A&B divided by the marginal probability of B.
P(12-sided | ≥4) = P(≥4 & 12-sided) / P(≥4)
P(≥4) = P(≥4 & 12-sided) + P(≥4 & 6-sided)
P(12-sided | ≥4) = 0.375 / (0.375 + 0.25) = 0.375 / 0.625 = 0.6

Thus our probability of having the 12-sided die in the right hand has increased due to the observed data. This is the posterior probability, P(hypothesis | data): P(12-sided) = 0.6.

SAMPLING DISTRIBUTION
We take samples from the population and inspect those samples. They will have their own (more or less different) parameters and their own sample distributions, in which each case is a randomly sampled case of interest, like a person or a household. The resulting sample statistics form a new distribution: the SAMPLING DISTRIBUTION. In this distribution each case is a parameter of a sample distribution, for example its mean.

CENTRAL LIMIT THEOREM (CLT)
The distribution of sample statistics is:
1. nearly normal,
2. centered at the population mean,
3. has sd equal to the population sd divided by the square root of the sample size:
x̄ ~ N(mean = μ, SE = σ / √n)
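The dice calculation above is just arithmetic, so it is easy to verify. A minimal sketch (variable names are my own):

```python
# priors: equal belief about which hand holds the 12-sided die
p_12 = 0.5
p_6 = 0.5

# likelihoods of rolling >= 4 on each die
p_ge4_given_12 = 9 / 12   # 0.75
p_ge4_given_6 = 3 / 6     # 0.5

# joint probabilities for the observed outcome (rolled >= 4)
joint_12 = p_ge4_given_12 * p_12   # 0.375
joint_6 = p_ge4_given_6 * p_6      # 0.25

# Bayes' theorem: posterior = joint / marginal
posterior_12 = joint_12 / (joint_12 + joint_6)   # 0.375 / 0.625 = 0.6
```

The posterior 0.6 matches the hand calculation: observing a roll of 4 or more shifts belief toward the 12-sided die.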
Since the population sd sigma is often unavailable, we replace it with the best guess: the sd of the one sample that we do have. So we use s instead of sigma:
SE = s / √n

STANDARD ERROR
The mean and sd of each sample distribution are expected to be roughly equal to the mean and sd of the population. Also, the mean of the sampling distribution is expected to be roughly equal to the mean of the whole population. The standard deviation of the sampling distribution (the distribution of sample means) is called the standard error. It is expected to be less than the standard deviation of the whole population (sigma), because if samples are more or less similar, then their means should not vary too much: SE < σ. As the sample size n increases, the standard error decreases.

CONDITIONS FOR THE CLT
1. Independence: sample observations must be independent.
1.1. It has to be a random sample or assignment.
1.2. If sampling without replacement, then n < 10% of the population.
2. Sample size/skew: either the population distribution is normal, or, if it is skewed, the sample size is large (normally n > 30 is the rule of thumb).

CONFIDENCE INTERVAL
A plausible range of values for the population parameter. It is the sample mean plus/minus a margin of error (ME).
Approximate 95% CI: the sample mean plus/minus two standard errors, x̄ ± 2 SE (by the 68-95-99.7 rule, roughly 95% of a normal distribution lies within 2 sds of the mean).

CONFIDENCE LEVEL
ACCURACY: does the range capture the true value, i.e. is it an interval which contains the true population mean? Commonly used confidence levels are 90%, 95%, 98%, and 99%. As the confidence interval widens, the accuracy (confidence level) increases; however, precision goes down.

CONFIDENCE INTERVAL FOR THE POPULATION MEAN
Computed as the sample mean plus/minus a margin of error: a critical value corresponding to the middle XX% of the normal distribution, times the standard error of the sampling distribution:
CI = x̄ ± z* × SE
If we decide to find the critical value z* for the 95% confidence level, it will be the z-score (ALWAYS positive) which cuts off the middle 95% of the normal curve.
Since we take the middle 95%, we will have two tails of 2.5 percent each, on top and on bottom. So we can calculate the z-score with the qnorm function:
qnorm(0.025, mean = 0, sd = 1)
-1.96
So z* will actually be 1.96 for the 95% CI. For z* we ALWAYS use the standard NORMAL parameters: mean = 0, sd = 1.

CONDITIONS FOR THE CONFIDENCE INTERVAL FOR THE POPULATION MEAN
1. Independence: sample observations must be independent.
1.1. It has to be a random sample or assignment.
1.2. If sampling without replacement, then n < 10% of the population.
2. Sample size/skew (more strict than the conditions for the CLT): MUST be n ≥ 30; if the population distribution is very skewed, then even larger.

CALCULATING SAMPLE SIZE
If we have: 1. the margin of error, 2. the confidence level, 3. information on the variability of the sample or population, we can determine the required sample size to achieve the desired ME:
ME = z* × s / √n  →  n = (z* × s / ME)²
Regardless of the value of the decimal, the minimum n is always rounded UP. If we want to decrease the ME, for example to divide it in half, then n will change by this factor squared: in this example we will have to multiply it by 4.

INTERPRETING CONFIDENCE INTERVALS AND LEVELS (EXAMPLE)
Survey question: "For how many days during the past 30 days was your mental health not good?" Survey results: based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010.
What does the 95% confidence interval mean? It means how confident we are that this interval captures the true mean. So we are 95% confident that Americans on average have 3.40 to 4.24 bad mental health days per month.
What does the 95% confidence level mean? It technically means that 95% of samples of the same size taken from the same population yield CIs that contain the true population parameter. So here it means that 95% of random samples of 1,151 Americans will yield CIs that capture the true population mean of the number of bad mental health days per month.
What if we want a 99% confidence level; what will happen to the CI? It will increase (the interval gets wider).
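The sample size formula above is easy to sketch in Python with the standard library's `statistics.NormalDist` in place of qnorm. The sd guess (300) and desired ME (25) are hypothetical numbers chosen for illustration:

```python
import math
from statistics import NormalDist

def required_n(confidence, s, me):
    """Smallest n such that z* x s / sqrt(n) is at most the desired margin of error."""
    # z* cuts off the middle `confidence` of the standard normal, e.g. 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * s / me) ** 2)  # always round UP

# hypothetical: sd guess of 300, desired margin of error of 25, 95% confidence
n = required_n(0.95, 300, 25)
# halving the ME roughly quadruples the required n
n_half = required_n(0.95, 300, 12.5)
```

Note that `n_half` is about (but, because of rounding up, not exactly) four times `n`.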
BINOMIAL DISTRIBUTION
Describes the probability of having EXACTLY k successes in n independent Bernoulli trials, with the probability of success being p.

BERNOULLI RANDOM VARIABLE
When an individual trial has only 2 possible outcomes.
p = probability of success
n = number of independent Bernoulli trials
k = exact number of successes

RULES OF THE BINOMIAL DISTRIBUTION
1. The trials must be independent.
2. The number of trials n must be fixed.
3. Each trial must be classified as either a success or a failure.
4. The probability of success p must be the same for each trial.

P(k successes) = (number of all possible scenarios) × (probability of a single scenario)

NUMBER OF SCENARIOS
Calculate the number of ways to obtain k successes in n trials: the choose function,
C(n, k) = n! / (k! (n - k)!)
EXAMPLE: How many scenarios yield exactly one success in 4 trials? 1000, 0100, 0010, 0001 — or C(4, 1) = 4! / (1! × 3!) = 4.

PROBABILITY OF A SINGLE SCENARIO
The probability of a success to the power of the needed number of successes, multiplied by the probability of failure to the power of the number of failures: p^k × (1 - p)^(n - k).

CHOOSE IN R
The function choose accepts the variables: 1. number of trials, 2. number of successes, and returns the number of possible scenarios. Syntax example: choose(4, 1) = 4.

PERCENTILE FOR THE BINOMIAL DISTRIBUTION
The binomial distribution formulas are good when we need the exact k, but if we need to know the probability of ≥ k or ≤ k successes, we would have to calculate the probability of each point in the range and add them up, which is not fun. When n is a large number, the shape of the binomial distribution closely resembles the normal distribution, and thus we can approximate and use the formulas of the normal distribution.
Success-failure rule: a binomial distribution with at least 10 expected successes and 10 expected failures closely follows a normal distribution: np ≥ 10 and n(1 - p) ≥ 10.

BINOMIAL PERCENTILE IN R
We can specify the range of exact numbers of successes for the dbinom function and add them up with sum. Syntax example:
sum(dbinom(70:245, size = 245, p = 0.25))
0.113
EXPECTED VALUE OF THE BINOMIAL DISTRIBUTION
The expected value of the binomial distribution is its mean, and it is simply the probability of success multiplied by the number of trials: μ = np.

BINOMIAL DISTRIBUTION FORMULA
If p represents the probability of success, (1 - p) represents the probability of failure, n represents the number of independent trials, and k represents the number of successes:
P(k successes in n trials) = C(n, k) × p^k × (1 - p)^(n - k)

BINOMIAL DISTRIBUTION IN R
The function dbinom accepts the variables: 1. number of successes, 2. number of trials, 3. probability of success, and returns the probability of having exactly this number of successes in the defined number of trials. Syntax example:
dbinom(8, size = 10, p = 0.13)
2.77842e-06

STANDARD DEVIATION
The sd of the binomial distribution is the square root of the number of trials times the probability of success times the probability of failure:
σ = √(np(1 - p))

ADJUSTING THE OBSERVATION (when doing a normal approximation to the binomial)
Because the shape of the normal distribution curve does not exactly follow the bars of the binomial distribution, when approximating the binomial with the normal we compensate for this difference by subtracting 0.5 from the observation when calculating its z-score.

TESTING A HYPOTHESIS
Define H0, the null hypothesis. Define HA, the alternative hypothesis; HA is usually presented as a range of possible parameter values: the claim is that the parameter is greater than, less than, or not equal to the H0 parameter. Assume H0 is true. Obtain the test statistic. Calculate the probability of achieving results at least as favourable to HA if H0 is true: it will be a percentile of the normal distribution, for which we use the H0 data for construction and the observed data for the cutoff of the percentile. So we basically draw a normal distribution (if the CLT conditions are fulfilled) based on the H0 data, then we check where the observed data is positioned on this curve and calculate the percentile for it. Thus we find the p-value: the probability of the observation occurring if H0 is true.
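The binomial calculations above (choose, dbinom, the summed tail, and the mean/sd) can be mirrored in pure Python; the `dbinom` helper below is my own name for a function that mirrors R's dbinom:

```python
import math

def dbinom(k, size, p):
    # P(exactly k successes in `size` trials), mirroring R's dbinom
    return math.comb(size, k) * p**k * (1 - p) ** (size - k)

# number of scenarios: choose(4, 1) ways to get 1 success in 4 trials
scenarios = math.comb(4, 1)                      # 4

# exact probability, as in dbinom(8, size = 10, p = 0.13)
p_exact = dbinom(8, 10, 0.13)                    # ~2.778e-06

# tail probability, as in sum(dbinom(70:245, size = 245, p = 0.25))
p_tail = sum(dbinom(k, 245, 0.25) for k in range(70, 246))

# mean and sd of the binomial with n = 245, p = 0.25
mu = 245 * 0.25                                  # np = 61.25
sd = math.sqrt(245 * 0.25 * 0.75)                # sqrt(np(1-p))
```

Since np = 61.25 and n(1 - p) = 183.75 both exceed 10, the success-failure rule says this distribution is also well approximated by a normal curve.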
Compare it to the significance level: if the p-value < alpha, then we reject H0; if the p-value is not < alpha, then we do not reject H0.

Hypotheses are ALWAYS about population parameters, not sample statistics.

TEST STATISTIC
The parameter we use to calculate the p-value. Often it is the z-score of the observed value on the normal curve drawn according to H0.

TWO-SIDED (TWO-TAILED) HYPOTHESIS TEST
In case we are looking for divergence from the null in ANY direction, not just specifically higher or specifically lower. The definition of the p-value is the same, but now we need to consider "at least as extreme as the observed outcome" in both directions.

EFFECT SIZE
The difference between the sample mean and the null value.

P-VALUE
The probability of observing data at least as favourable to HA as our current data set, if H0 were true. Calculated using the test statistic. The smaller the p-value, the stronger the evidence against the null hypothesis.

P-VALUE FOR A TWO-TAILED TEST
Since now we consider BOTH tails of the normal curve, the p-value will be the SUM of the two tail probabilities (e.g. 0.209 + 0.209 = 0.418).

SIGNIFICANCE LEVEL (alpha)
Usually set at 5%.

RECAP: HYPOTHESIS TESTING FOR A SINGLE MEAN
1. Set the hypotheses: H0: μ = null value; HA: μ < or > or ≠ null value.
2. Calculate the point estimate x̄.
3. Check conditions:
   3.1. Independence: sampled observations must be independent (random sample/assignment; if sampling without replacement, n < 10% of the population).
   3.2. Sample size/skew: n ≥ 30, larger if the population distribution is very skewed.
4. Draw the sampling distribution, shade the p-value, calculate the test statistic: Z = (x̄ - μ) / SE, where SE = s / √n.
5. Make a decision and interpret it in the context of the research question:
   If p-value < α, reject H0: the data provide convincing evidence for HA.
   If p-value > α, fail to reject H0: the data do not provide convincing evidence for HA.
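The recap steps above can be sketched end to end in Python. The summary statistics here are hypothetical (chosen so the numbers resemble the notes' worked example, with a one-tail p-value near 0.21):

```python
from math import sqrt
from statistics import NormalDist

# hypothetical data: H0: mu = 3, HA: mu > 3
n, xbar, s = 50, 3.2, 1.74       # sample size, sample mean, sample sd
null_value = 3

se = s / sqrt(n)                  # standard error of the sampling distribution
z = (xbar - null_value) / se      # test statistic (step 4)

# one-sided p-value: P(Z > z) under the standard normal
p_one = 1 - NormalDist().cdf(z)
# two-sided p-value: both tails, so double it
p_two = 2 * p_one

alpha = 0.05
reject = p_one < alpha            # step 5: compare to the significance level
```

Here the p-value is well above 0.05, so we fail to reject H0: the data do not provide convincing evidence for HA.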
VARIANCE
Roughly the average squared deviation from the mean. For each observation we subtract the mean from the observation; we square this; we add them all up; and we divide by the number of observations minus one. s² for sample variance, σ² for population variance:
s² = Σ (xi - x̄)² / (n - 1)

NORMAL DISTRIBUTION: 68-95-99.7 RULE
About 68% of observations fall within 1 sd of the mean, 95% within 2 sds, and 99.7% within 3 sds.

STANDARD DEVIATION
Roughly the average deviation from the mean: the square root of the variance. s for sample sd, σ for population sd:
s = √(Σ (xi - x̄)² / (n - 1))

Z-SCORE (STANDARDIZED SCORE)
The Z-score of an observation is the number of standard deviations it falls above or below the mean:
Z = (observation - mean) / SD

PERCENTILE
Although a distribution of ANY shape can have Z-scores, for the NORMAL distribution we can use Z-scores to calculate PERCENTILES. A PERCENTILE is the percentage of observations which fall BELOW a given data point. Graphically, it is the area below the curve to the left of our observation.

PERCENTILE IN R
The function pnorm accepts the variables: 1. value of the observation, 2. mean of the distribution, 3. standard deviation of the distribution, and returns the PERCENTILE of that observation on a normal distribution. Syntax example:
pnorm(1800, mean = 1500, sd = 300)
0.8413

QUANTILE IN R
The function qnorm accepts the variables: 1. percentile of the observation, 2. mean of the distribution, 3. standard deviation of the distribution, and returns the VALUE (quantile) of that observation on a normal distribution. Syntax example:
qnorm(0.8413, mean = 1500, sd = 300)
1800

NORMAL PROBABILITY PLOT
The DATA are plotted on the Y axis (vertical); THEORETICAL quantiles following a normal distribution are on the X axis (horizontal). This allows us to see how the actual data diverge from the expected normal distribution. The closer the points are to the straight line, the more closely the distribution follows the normal model (examples in the figures: male heights in inches; heights of NBA players).
Right skew: points bend up and to the left of the line.
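The pnorm/qnorm pair above has a direct standard-library counterpart in Python, `statistics.NormalDist`, which avoids any external dependency:

```python
from statistics import NormalDist

dist = NormalDist(mu=1500, sigma=300)

# percentile, mirroring pnorm(1800, mean = 1500, sd = 300)
pct = dist.cdf(1800)         # ~0.8413

# quantile, mirroring qnorm(0.8413, mean = 1500, sd = 300)
val = dist.inv_cdf(0.8413)   # ~1800

# z-score: number of sds the observation falls above the mean
z = (1800 - 1500) / 300      # 1.0
```

An observation of 1800 sits exactly one sd above the mean, which by the 68-95-99.7 rule puts it at roughly the 84th percentile.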
Left skew: points bend down and to the right of the line.
Short tails (narrower than the normal distribution): points follow an S-shaped curve.
Long tails (wider than the normal distribution): points start below the line, bend to follow it, and end above it.

PAIRED DATA
When two sets of observations have a special correspondence, they are not independent. We take the difference and calculate/test the hypothesis as usual, remembering to use the values for the distribution of DIFFERENCES.

COMPARING INDEPENDENT MEANS
SIDE-BY-SIDE BOXPLOTS work best for visual comparison of independent sets of observations.

ESTIMATING THE DIFFERENCE
Again we want to work with the difference of means, but since the sets are independent, we need another formula for the standard error:
SE = √(s₁²/n₁ + s₂²/n₂)
So the standard error for the sampling distribution of the difference of means is larger than the error of each individual distribution.

CONDITIONS FOR INFERENCE FOR COMPARING TWO INDEPENDENT MEANS
1. Independence:
   - within groups: sampled observations must be independent (random sample/assignment; if sampling without replacement, n < 10% of the population);
   - between groups: the two groups must be independent of each other (non-paired).
2. Sample size/skew: each sample size must be at least 30 (n₁ ≥ 30 and n₂ ≥ 30), larger if the population distributions are very skewed.

CONFIDENCE INTERVAL
(x̄₁ - x̄₂) ± z* × SE

STANDARD ERROR (SE)
It is the SD of the sampling distribution: SE = σ / √n.

CRITICAL VALUE z*
It is the z-score that cuts off (1 - CI)/2 in the tail. So for CI = 95%: qnorm((1 - 0.95)/2) = qnorm(0.025) = -1.96, so z* = 1.96.

MARGIN OF ERROR (ME)
ME = z* × SE. Note that in Z = (observation - mean) / SD we divide by the SAMPLING SD (the SE), NOT the sample sd.

SAMPLE SIZE
ME = z* × s / √n  →  n = (z* × s / ME)²

P-VALUE
P(observed or more extreme outcome | H0 true), e.g. pnorm(test statistic) for the appropriate tail.

RANDOM PROCESS
We know what outcomes there can be, but we do not know which particular outcome will happen.
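Returning to the comparison of two independent means above: the SE, critical value, and CI can be sketched with stdlib Python. The two groups' summary statistics here are hypothetical, just to make the formulas concrete:

```python
from math import sqrt
from statistics import NormalDist

# hypothetical summary statistics for two independent groups
n1, xbar1, s1 = 45, 52.1, 45.1
n2, xbar2, s2 = 50, 27.1, 35.2

# standard error for the difference of two independent means
se = sqrt(s1**2 / n1 + s2**2 / n2)

# critical value for a 95% CI: z* = qnorm(0.975) ~ 1.96
z_star = NormalDist().inv_cdf(0.975)

diff = xbar1 - xbar2
me = z_star * se                  # margin of error
ci = (diff - me, diff + me)       # (xbar1 - xbar2) +/- z* x SE
```

Note that `se` is larger than either group's individual standard error (s₁/√n₁ or s₂/√n₂), as the notes point out.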
NON-DISJOINT EVENTS
Can happen at the same time: P(A or B) = P(A) + P(B) - P(A and B).

DISJOINT EVENTS
Cannot happen at the same time, so P(A and B) = 0 and therefore P(A or B) = P(A) + P(B).

RULES OF PROBABILITY
0 ≤ P(A) ≤ 1.

MARGINAL PROBABILITY
For calculation we use the totals in the MARGINS of the table of observations. For example (social class table): what is the probability that a student's objective class position is upper middle class? We divide the total of objective upper middle class by the total number of students: P(obj UMC) = 50/98.

CONDITIONAL PROBABILITY
What is the probability of one condition GIVEN the other?
P(A | B) = P(A and B) / P(B)
We FIRST select those who match the "given" condition, and then calculate the probability of them also meeting the main condition: we take those who meet both conditions and divide by this subgroup. For example: the probability that a student subjectively identifies as upper middle class given that his objective class position is working class, P(subj UMC | obj WC) = 8/48.

GENERAL PRODUCT RULE
P(A and B) = P(A | B) × P(B)

JOINT PROBABILITY
For a joint probability we use the intersection of the rows and columns of the table.

INDEPENDENCE
Processes are independent if knowing the outcome of one provides no useful information about the outcome of the other. So if P(A | B) = P(A), the events are independent, and therefore P(A and B) = P(A | B) × P(B) = P(A) × P(B).

FREQUENTIST INTERPRETATION
The probability of an outcome is the proportion of times it would occur if we observed the random process an infinite number of times.

BAYESIAN INTERPRETATION
Probability is a subjective degree of belief.

PROBABILITY TREES
Are good when we know P(A | B) and need to calculate the reverse, P(B | A). The sum of all branches in a probability tree at the same level should equal 1.
SAMPLE SPACE
The collection of all possible outcomes of a trial. Example: a couple has 2 kids; what is the sample space of their sexes? S = {MM, MF, FM, FF}.

COMPLEMENTARY EVENTS
The sum of the probabilities of all outcomes besides the one we are considering: P(not A) = 1 - P(A).

PROBABILITY DISTRIBUTION
Lists all possible outcomes in the sample space and the probabilities with which they occur.
1. The events must be disjoint.
2. Each probability must be between 0 and 1.
3. The sum of the probabilities must be 1.

PROBABILITY TREE EXAMPLE (population of Swaziland, HIV testing)
P(HIV) = 0.259, so P(no HIV) = 1 - 0.259 = 0.741
P(+ | HIV) = 0.997, so P(+ & HIV) = 0.997 × 0.259 = 0.2582
P(- | HIV) = 1 - 0.997 = 0.003, so P(- & HIV) = 0.003 × 0.259 = 0.0008
P(+ | no HIV) = 0.074, so P(+ & no HIV) = 0.074 × 0.741 = 0.0548
P(- | no HIV) = 0.926, so P(- & no HIV) = 0.926 × 0.741 = 0.6862
Reversing the conditioning with Bayes' theorem:
P(HIV | +) = P(+ & HIV) / P(+) = 0.2582 / (0.2582 + 0.0548) = 0.2582 / 0.3131 ≈ 0.82

T-DISTRIBUTION
The CLT can work with ANY sample size as long as the population distribution is nearly normal. However, it is hard to verify normality, especially when we have a small sample. Before applying the CLT we should think about whether we would expect the population distribution to be nearly normal: will it be symmetric, will outliers be rare? So under uncertain conditions we use the t-distribution. The t-distribution gives a higher p-value than the normal distribution, so it is more conservative: basically, since we have little data, we need much more certainty to reject the null hypothesis.

T-STATISTIC
It is analogous to the z-score on the normal distribution and is calculated the same way:
T = (obs - null) / SE
The p-value is then obtained the same way as with the normal distribution. The only difference is that we must consider the degrees of freedom: how much the distribution approaches the normal.
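The HIV probability tree above reduces to a few multiplications, so it makes a good quick check (variable names are my own):

```python
# branch probabilities from the notes' Swaziland example
p_hiv = 0.259
p_pos_given_hiv = 0.997
p_pos_given_no = 0.074

# joint probabilities along each "+" branch of the tree
joint_pos_hiv = p_pos_given_hiv * p_hiv        # 0.997 x 0.259
joint_pos_no = p_pos_given_no * (1 - p_hiv)    # 0.074 x 0.741

# reverse the conditioning with Bayes' theorem: P(HIV | +)
posterior = joint_pos_hiv / (joint_pos_hiv + joint_pos_no)   # ~0.82
```

This is exactly the pattern the notes describe: probability trees let us go from the known P(+ | HIV) to the wanted P(HIV | +).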
DEGREES OF FREEDOM (DF)
For inference on ONE sample mean: df = n - 1.
For the DIFFERENCE OF TWO MEANS: df = min(n₁ - 1, n₂ - 1).
So the smaller the sample, the lower the tip of the curve and the thicker its tails; it is farther away from the normal curve.

T-STATISTIC IN R
We can get the t critical value by using the qt function (like qnorm). We need to feed it the percentile and the degrees of freedom:
qt(0.025, df = 21)

PROPERTIES OF THE T-DISTRIBUTION
- When n is small and σ is unknown (almost always), use the t distribution to address the uncertainty of the standard error estimate.
- Bell shaped, but with thicker tails than the normal: observations are more likely to fall beyond 2 SDs from the mean.
- The extra-thick tails are helpful for mitigating the effect of a less reliable estimate of the standard error of the sampling distribution.
- Always centered at 0, like the standard normal.
- Has one parameter, degrees of freedom (df), which determines the thickness of the tails. (Remember: the normal distribution has two parameters, mean and SD.)
- As the degrees of freedom increase, the shape of the t-distribution approaches the normal distribution.

P-VALUE FOR THE T-DISTRIBUTION IN R
Just like the pnorm function, we can use the pt function. We need to feed it the t-statistic and the df. Additionally, we can tell the function to switch and use the upper tail instead of the default lower tail:
pt(2.3, df = 21, lower.tail = FALSE)
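As with pf earlier, R's pt can be cross-checked without any external library by numerically integrating the t density; the helper names below are my own, and the truncation point and step count are rough choices that are more than adequate here:

```python
import math

def t_pdf(x, df):
    # density of Student's t distribution with df degrees of freedom
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - ((df + 1) / 2) * math.log(1 + x * x / df))

def t_upper_tail(t, df, upper=60.0, steps=100_000):
    # trapezoid integration from t to `upper`;
    # mirrors pt(t, df, lower.tail = FALSE) in R
    h = (upper - t) / steps
    total = 0.5 * (t_pdf(t, df) + t_pdf(upper, df))
    for i in range(1, steps):
        total += t_pdf(t + i * h, df)
    return total * h

# upper-tail p-value for t = 2.3 with df = 21
p = t_upper_tail(2.3, 21)
```

The result is a bit under 0.025, consistent with the t table: for df = 21 the 0.025 cutoff is about 2.08, and 2.3 lies slightly beyond it. The same t-score under the standard normal would give a smaller p-value, illustrating why the t-distribution is the more conservative choice.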