APP NONPARAMETRIC STAT MATH 324
This 36-page set of class notes was uploaded by Eunice Schoen on Saturday, September 26, 2015. The notes belong to MATH 324 at James Madison University, taught by Staff in Fall.
Date Created: 09/26/15
Chapter 0 Preliminaries, January 8, 2009

0 Preliminaries

First introductory question: When do we use nonparametric statistics?

Example: Construct a 95% confidence interval on the mean household income of your community. Test whether the mean income equals 50,000 versus the alternative that the mean income does not equal 50,000.
1. How is this done?
2. What test or table is often used?
3. What assumptions are needed?

Second introductory question: What are nonparametric statistics? Details on nonparametric methods begin in Section 0.5.

0.1 Cumulative Distributions and Probability Density Functions

A population consists of all possible values of some variable. A sample consists of a subset of the population, and is often drawn randomly. A random variable denotes an observation selected randomly from the population.

The cumulative distribution function (cdf) of a random variable X is $F(x) = P(X \le x)$, for all real x.

Example: Suppose X denotes the income of a randomly selected household. What does the cdf of X evaluated at 100,000 represent?

If the random variable is continuous, then probabilities may be illustrated by the area under a curve known as the probability density function (pdf). What are some continuous distributions that have names?

0.2 Common Continuous Probability Distributions

Using R (or S-PLUS): see www.r-project.org. To download R, click Download (CRAN), then UNC Chapel Hill (or some other site), then Windows, then base, then R-2.8.1-win32.exe. Click Save, and then Save again. The download should take less than five minutes over high-speed internet. Click Open and OK, and continue to click Next until the installation is complete. Alternatively, use Rweb by going to http://bayes.math.montana.edu/Rweb/Rweb.general.html.

Normal distribution $N(\mu, \sigma)$

The probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty.$$
What is f for a standard normal distribution?
$$h(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty \quad \text{(need not memorize)}$$
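As a quick numerical check on the normal pdf formula, the sketch below codes $f(x)$ directly and compares it with R's built-in `dnorm()`; it also recovers a cdf value $F(x) = P(X \le x)$ as the area under the pdf using `integrate()`. The helper name `normal_pdf` is hypothetical (it is not one of the course macros), and the grids of x values are arbitrary choices.

```r
# Hypothetical helper: the N(mu, sigma) pdf written out from the formula
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# Standard normal: compare the formula with dnorm() on a grid from -3 to 3
z <- seq(-3, 3, by = 0.5)
print(cbind(z, formula = normal_pdf(z), dnorm = dnorm(z)))

# Non-standard normal, N(mu = 50, sigma = 5), on a grid from 35 to 65
x <- seq(35, 65, by = 5)
print(all.equal(normal_pdf(x, mu = 50, sigma = 5), dnorm(x, mean = 50, sd = 5)))

# The cdf F(x) = P(X <= x) is the area under the pdf to the left of x;
# numerical integration should agree with pnorm() up to integrate()'s tolerance
area <- integrate(normal_pdf, lower = -Inf, upper = 1.96)$value
print(c(integrate = area, pnorm = pnorm(1.96)))
```

The same pairing (pdf formula versus the `d` function, area under the pdf versus the `p` function) carries over to `dunif`/`punif`, `dexp`/`pexp`, and `dcauchy`/`pcauchy`.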
Example: Use R to do the following with the normal distribution.

(a) Graph the probability density function (pdf) of a standard normal distribution, say, from -3 to 3.
> x <- 0:100                    # Generate numbers from 0 to 100
> x = 0:100                     # Same, using = for assignment
> x = c(0:100)                  # Alternatively; "c" means combine
> x = x/100 - 0.5               # Generates numbers between -0.5 and 0.5
> x = x*6                       # Generates numbers between -3 and 3
> x = 6*(c(0:100)/100 - 0.5)    # Alternatively, all in one line
> help(dnorm)                   # Help menu for "dnorm"
> ?dnorm                        # Again, the help menu for "dnorm"
> y = dnorm(x)                  # f(x), i.e., the pdf of a standard normal rv
> # Note that different means and standard deviations may be used
> plot(x, y)
> plot(x, dnorm(x))
> plot(x, dnorm(x), type="s")   # Use stair steps for a nicer-looking graph
> x = 6*(0:1000/1000 - 0.5)
> plot(x, dnorm(x), type="s")

(b) Graph the pdf of a $N(\mu = 0.5, \sigma = 0.3)$ random variable from -3 to 3.
> plot(x, dnorm(x, mean=0.5, sd=0.3), type="s")   # different mean and sd
> plot(x, dnorm(x, 0.5, 0.3), type="s")           # same, using positional arguments

(c) Graph the cumulative distribution function (cdf) of a standard normal random variable from -3 to 3.

(d) Use the macro "plotdist" to graph the pdf of a $N(\mu = 50, \sigma = 5)$ random variable from 35 to 65.
> source("http://www.math.jmu.edu/~garrenst/math324.dir/Rmacros")   # Read in the macros
> ls()                                       # List the functions (macros) and variables
> plotdist                                   # To read the source code and comments
> plotdist(dnorm, 50, 5, xmin=35, xmax=65)

(e) Use the macro "plotdist" to graph the cdf of a $N(\mu = 50, \sigma = 5)$ random variable from 35 to 65.

(f) Use the macro "plotdist" to graph the pdf and the cdf of a standard normal random variable.

(g) What will the following R commands generate?
> pnorm(1.96)
> pnorm(-1.96)
> pnorm(1.96) - pnorm(-1.96)
> qnorm(0.975)             # The "q" in qnorm means quantile
> pnorm(319.6, 300, 10)
> qnorm(0.975, 300, 10)

(h) Generate 1000 independent observations from a $N(80, 10)$ distribution. Determine the sample mean $\bar{X}$ and the sample standard deviation s, and graph these observations in a histogram.
> x1 = rnorm(1000, 80, 10)   # The "r" in rnorm means random
> mean(x1)
> sd(x1)
> hist(x1)
> history()                  # To view the history
> q()                        # To quit

Central Limit Theorem: If n independent observations are sampled from a population with mean $\mu$ and finite standard deviation $\sigma$, then the sample mean $\bar{X}$ has approximately a $N(\mu, \sigma/\sqrt{n})$ distribution for large n. Furthermore, if n independent observations are sampled from an approximately $N(\mu, \sigma)$ population, then $\bar{X}$ has approximately a $N(\mu, \sigma/\sqrt{n})$ distribution, even when n is small.

A parameter is an (often unknown) quantity that is a characteristic of a population.

More on standardizing: In general, if X is any random variable and a and b > 0 are constants such that the distribution of $(X - a)/b$ does not depend on a or b, then a is a location parameter and b is a scale parameter. Then the pdf of X can be written
$$f(x) = \frac{1}{b}\, h\!\left(\frac{x - a}{b}\right),$$
where h is the pdf of $(X - a)/b$.

Example: Let f(x) (or $f_X(x)$) be the pdf of a $N(\mu, \sigma)$ distribution, and let h(z) be the pdf of a $N(0, 1)$ distribution. Then
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty, \qquad h(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty.$$
Note: $f(x) = \frac{1}{\sigma}\, h\!\left(\frac{x - \mu}{\sigma}\right)$, so $\mu$ is a location parameter and $\sigma$ is a scale parameter.

Uniform distribution

The pdf for a standard uniform distribution is
$$h_1(z) = 1, \quad 0 < z < 1.$$

Example: Use R to do the following with the uniform distribution.
(a) Graph the pdf of a standard uniform distribution.
> ?runif
(b) Graph the cdf of a standard uniform distribution.
(c) Graph the pdf and cdf of a uniform distribution with endpoints 30 and 40.
(d) What is the mean of a uniform random variable?
(e)
Sample 1000 observations from a Uniform(30, 40) distribution, calculate the sample mean, and plot the histogram.
> x = runif(1000, 30, 40)
> x[1:50]   # Observe the first 50 observations

Example: Applying the Central Limit Theorem to the uniform distribution using R. Let U ~ Uniform(40, 60). Let $\bar{U}_n$ be the sample mean of n independent realizations of U.
(a) Plot the pdf and cdf of U.
(b) Sample 30 independent observations of $\bar{U}_2$, using the function "replicate".
> ubar = replicate(30, mean(runif(2, 40, 60)))
(c) Construct a histogram of 10,000 independent observations of $\bar{U}_2$. Interpret your graph.
(d) Repeat part (c) for $\bar{U}_5$, $\bar{U}_{20}$, and $\bar{U}_{100}$.
> hist(replicate(1e4, mean(runif(5, 40, 60))))

Exponential distribution

An exponential random variable is continuous and lacks memory.

Example: Suppose that a particular type of computer chip, running continuously, has a lifetime distributed according to an exponential distribution. Suppose that an old computer chip, which has been running for two years, is compared to a new computer chip, which has been running for only one day. Which computer chip is better? Can the lifetime of a computer chip be negative?

The pdf for a standard exponential distribution is
$$h_2(z) = e^{-z}, \quad z > 0,$$
and it has mean and standard deviation both equal to 1.
- Graph the pdf of a standard exponential distribution.
  > plotdist
  > ?dexp
- Graph the pdf of an exponential distribution with "rate" equal to 0.1.
- What are the new mean and standard deviation?
- Generate 10,000 observations from an exponential distribution with "rate" equal to 0.1.
- Construct a histogram of the data.
- Determine the sample mean of X.
- Determine the sample standard deviation of X.

Laplace (or Double Exponential) distribution

A Laplace distribution is symmetric
about its mean, such that each half of the pdf is exponential. The pdf for a standard Laplace distribution is
$$h_3(z) = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|z|}, \quad -\infty < z < \infty.$$
The standard Laplace distribution has mean zero and standard deviation one.
- Graph the pdf of the standard Laplace distribution.
  > dlaplace
- Graph the pdf of the Laplace distribution with mean equal to 2 and standard deviation equal to 3.
- Generate 10,000 observations from a Laplace distribution with mean equal to 2 and standard deviation equal to 3.
- Construct a histogram of the data.
- Determine the sample mean of X.
- Determine the sample standard deviation of X.

Cauchy distribution

A Cauchy distribution is symmetric and has heavy tails. The pdf for a standard Cauchy distribution is
$$h_4(z) = \frac{1}{\pi(1 + z^2)}, \quad -\infty < z < \infty.$$
Note the typo in the textbook on p. 3. The mean and the standard deviation of a Cauchy distribution DO NOT EXIST.
- Graph the pdf of the standard Cauchy distribution.
  > plotdist
  > ?dcauchy
- Graph the pdf of a Cauchy distribution with location parameter 40 and scale parameter 5.
- If X has the above pdf, determine P(X < 40) using R.
- Determine P(X < 35).
  > shadedist
  > shadedist(35, dcauchy, 40, 5)
- Determine P(X < 45).

Comparing Normal and Cauchy distributions

Ignoring the constants, the pdfs of the normal and Cauchy distributions are $e^{-z^2/2}$ and $1/(1 + z^2)$, respectively. Which of these converges to zero faster as $z \to \infty$ or $z \to -\infty$?

If Z is standard normal, determine $P(-1 < Z < 1)$.
> shadedist(c(-1, 1), dnorm, lower.tail=F)
If X is standard Cauchy, determine $P(-1 < X < 1)$.

Generate 100 observations from a standard normal distribution and 100 observations from a standard Cauchy distribution.
> z = rnorm(100); x = rcauchy(100)
Look at the
observations. Are there any outliers? Determine the minimum and maximum for both Z and X.
> min(z); max(z)
> min(x); max(x)
Plot the histograms. You may want to try
> trunchist(x)
> trunchist(x, -3, 3)   # Construct a truncated histogram, excluding values outside -3 and 3

Light-tailed distributions, such as the uniform and normal, rarely produce outliers. Heavy-tailed distributions, such as the exponential, Laplace, and Cauchy, tend to produce outliers. The Cauchy distribution has very heavy tails.

What is the distribution of the sample mean of n independent $N(\mu, \sigma)$ random variables? What is the distribution of the sample mean of n independent Cauchy($\theta$, 1) random variables?

Homework 0.2.1

Let X ~ $N(\mu = 500, \sigma = 200)$. Let U be a uniform random variable with endpoints 50 and 60. Let Y ~ Cauchy(location = 500, scale = 200).
Let $\bar{X}$ be the sample mean of 100 independent realizations of X.
Let $\bar{U}$ be the sample mean of 100 independent realizations of U.
Let $\bar{Y}$ be the sample mean of 100 independent realizations of Y.
Let $\tilde{Y}$ be the sample median of 100 independent observations of Y.
Use R for all graphs below, and show both your source code and output. CLEARLY LABEL THE VARIOUS PARTS (a), (b), (c), etc.
(a) Graph the pdf of X.
(b) Graph the pdf of U.
(c) Graph the pdf of Y.
(d) What distribution (i.e., the shape, location, and scale) does $\bar{X}$ have? There is no need to use R for this part.
(e) Graph the pdf of $\bar{X}$.
After completing part (e), sample 10,000 independent realizations of $\bar{X}$.
(f) Compute the sample mean and sample standard deviation of your 10,000 values of $\bar{X}$.
(g) Graph your 10,000 values of $\bar{X}$ in a histogram.
(h) Are your results from parts (f) and (g) consistent with your answer from part (d)? Explain.
Next, sample 10,000 independent realizations of $\bar{U}$.
(i) Graph your 10,000 values of $\bar{U}$ in a histogram.
(j) What is the approximate shape of the distribution of $\bar{U}$?
Next, sample 10,000 independent realizations of $\bar{Y}$.
(k) Graph your 10,000 values of $\bar{Y}$ in a
histogram.
(l) Graph your 10,000 values of $\bar{Y}$ in a truncated histogram.
(m) What distribution (i.e., the shape, location, and scale) does $\bar{Y}$ have? There is no need to use R for this part.
(n) When trying to estimate the population median (i.e., the location parameter) of a Cauchy distribution, which is more informative: one observation from the Cauchy distribution, OR the sample mean based on 100 independent observations from the Cauchy distribution? Explain.
Next, sample 10,000 independent realizations of $\tilde{Y}$.
(o) Graph your 10,000 values of $\tilde{Y}$ in a histogram.
(p) When trying to estimate the population median (i.e., the location parameter) of a Cauchy distribution, which is more informative: the sample mean or the sample median? Explain.
End of Homework 0.2.1

0.3 The Binomial Distribution

Example: Toss a coin ten times, where the probability of heads is 40%. Let X be the number of heads. Then X is a binomial random variable with parameters n = 10 and p = 0.4.

Example: Sample ten people independently from a population consisting of 40% Democrats. Let X be the number of Democrats. Then X is a binomial random variable with parameters n = 10 and p = 0.4.

For both examples above, what are the possible values of X?

In general, a Bernoulli trial may result in a success or a failure. Suppose X represents the number of successes from n independent Bernoulli trials, where n is fixed (not random) and p = P(success). Then X ~ Binomial(n, p). The probability density function (pdf) of X can be shown to be
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n.$$

Note well: The definition of the pdf of a discrete random variable differs from that of a continuous random variable.

In the above example regarding political affiliation, what is the probability that exactly 30% of the sample consists of Democrats? Alternatively, in the above example regarding coin tosses, what is the probability that exactly 30% of the ten coin tosses result in heads?
> dbinom(3, 10, 0.4)
Determine P(X = x) for all x = 0, 1, 2, ..., 10.
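The binomial pdf formula above can be checked numerically. The following sketch assumes the coin-toss example's values n = 10 and p = 0.4, codes the formula with `choose()`, and compares it with `dbinom()`; `binom_pmf` is a hypothetical helper name, not one of the course macros.

```r
# Hypothetical helper: P(X = x) = choose(n, x) * p^x * (1 - p)^(n - x)
binom_pmf <- function(x, n, p) choose(n, x) * p^x * (1 - p)^(n - x)

n <- 10    # number of independent Bernoulli trials (coin tosses)
p <- 0.4   # P(success) = P(heads)
x <- 0:n   # all possible values of X

probs <- binom_pmf(x, n, p)

# Table of P(X = x) for x = 0, 1, ..., 10, from the formula and from dbinom()
print(cbind(x, formula = probs, dbinom = dbinom(x, n, p)))

# P(exactly 3 heads in 10 tosses), i.e., exactly 30% heads
print(binom_pmf(3, n, p))

# As with any discrete pdf, the probabilities over all possible values sum to 1
print(sum(probs))
```

Because `dbinom()` is vectorized, `dbinom(0:10, 10, 0.4)` alone answers the last question; the hand-coded version is only there to connect the R function to the formula.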
On average, how many heads do we expect from the 10 coin tosses? On average, how many Democrats do we expect from the 10 people sampled? In general, what is the mean of X? The variance of X can be shown to be np(1 - p).

A special case of the Central Limit Theorem: If X ~ Binomial(n, p) and $\hat{p} = X/n$, where $np \ge 10$ and $n(1 - p) \ge 10$, then X is approximately $N\big(\mu = np,\ \sigma = \sqrt{np(1-p)}\big)$, and $\hat{p}$ is approximately $N\big(\mu = p,\ \sigma = \sqrt{p(1-p)/n}\big)$. Hence, for sufficiently large sample sizes, a binomial random variable (or a sample proportion) may be reasonably approximated by a normal random variable.

Example: Graph the pdf of a Binomial(n, p = 0.3) random variable for various values of n, say, n = 1, 2, 5, 10, 30, 50, and 100.
> plotdist(dbinom, 1, 0.3)

Homework 0.3.1

Let X ~ Binomial(n, p = 0.01).
(a) Graph the pdf of X for n = 10, 100, 500, and 1000. Make sure that you select appropriate domains for X so that your four graphs are neat.
(b) State for which of these graphs in part (a) a normal approximation seems reasonable.
End of Homework 0.3.1

Homework 0.3.2

Let X ~ Binomial(n = 500, p = 0.8). CLEARLY LABEL THE VARIOUS PARTS (a), (b), (c), etc.
(a) Determine exactly $P(X \le 390)$.
(b) Determine the mean and standard deviation of X.
(c) Determine the z-score (i.e., standard normal score) corresponding to x = 390 (or you may use x = 390.5).
(d) Using a normal approximation to the binomial, compute $P(X \le 390)$. Use R for this computation, not a standard normal table.
(e) Compare your answers from parts (a) and (d).
(f) Sample 1000 independent realizations of X. Determine the sample mean $\bar{X}$ and the sample standard deviation s, and graph these observations in a histogram.
(g) Compare your answers from part (b) with $\bar{X}$ and s from part (f).
End of Homework 0.3.2

0.4 Confidence Intervals and Tests of Hypotheses

Suppose (a) the observations from a population are independent, and (b) the original population is approximately normal, or the sample size n is large and
$\sigma < \infty$. Let $\bar{X}$ be the sample mean and let s be the sample standard deviation. Then an approximate $100(1 - \alpha)\%$ confidence interval on the population mean $\mu$ (if finite) is
$$\bar{X} \pm t_{\alpha/2,\,n-1}\, \frac{s}{\sqrt{n}}.$$
Hypothesis testing on $\mu$ involves converting the standardized test statistic
$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
to a p-value.

Example: Consider the following data set regarding the birthweights of humans, in grams:
3615 3105 4217 3234 3551 4023 3098
(a) Graph the data in a Q-Q plot (normal probability plot).
> ?qqnorm
> x = c(3615, 3105, 4217, 3234, 3551, 4023, 3098)
> qqnorm(x)
(b) Construct a 90% confidence interval on $\mu$, the population mean birthweight.
> ?t.test
> t.test(x, conf.level=0.9)
(c) Test, at level $\alpha = 0.05$, $H_0: \mu = 3400$ g versus $H_a: \mu > 3400$ g.
> t.test(x, alternative="greater", mu=3400)

0.5 Parametric versus Nonparametric Methods

Parametric methods often require that the population be approximately normal or that the sample size be large. When n is large and the standard deviation is finite, normal theory often may be used. Much theory has been developed for the normal distribution. Random variables from other distributions whose theory has also been developed quite thoroughly arise from normal populations. What is an example?
- Some populations are not normal but may be transformed to normality. For example, a data set generated by a lognormal random variable may be transformed to a normal random variable. How?
- Other distributions in addition to the normal distribution also have theory that is fairly well developed. For example, data from an exponential population may be analyzed using techniques based on the exponential distribution.
- How can we determine whether or not we believe a data set is from an
approximately normal population? How well can we distinguish normal data from exponential data for small sample sizes?
- With parametric methods, such as the t test, typically the sample size needs to be large or the shape of the population is assumed (such as normality), and some or all parameters (such as $\mu$ and $\sigma$) are assumed unknown.
- Nonparametric methods generally do not require that the shape of the population be known or that the sample size be large. With nonparametric methods, normality is not assumed. The assumptions required on the distributions for nonparametric methods tend to be quite weak (e.g., the population is continuous). Nonparametric methods are sometimes called distribution-free methods, since use of such methods is not restricted to particular (e.g., normal) distributions.
- Typically, when the distribution is heavy-tailed and the sample size is small, the nonparametric methods are more valid than the parametric methods. Suppose the population really is normal. Should parametric or nonparametric methods be used?
- Even when the population is normal, nonparametric methods often perform well, even in comparison to parametric methods.

0.6 Classes of Nonparametric Methods

Nonparametric methods are based on four statistical ideas.

Methods Based on the Binomial Distribution (Chapter 1)

Suppose we perform hypothesis tests or construct confidence intervals on the median of a continuous population. We sample n independent observations from the distribution. An observation above the median might be labeled as a success, whereas an observation below the median might be labeled as a failure. Let Y be the number of successes. Then Y has what distribution?

Permutation (Shuffling, Scrambling) Methods (Chapters 2 through 7)

Example: Is the new drug better than the old drug for lowering cholesterol?
Suppose there are 20 volunteers for the study. Obtain some statistic to account for the difference between the two groups regarding cholesterol change. How do we convert our statistic to a p-value?

Permute (shuffle), at random, the 20 observations, and compute a permuted statistic. Repeat the permuting many times to produce many (perhaps 10,000) permuted statistics. Compare your original statistic to the 10,000 permuted statistics. Interpret. Suppose the original statistic falls far into the appropriate tail of the histogram of the permuted statistics.

Bootstrap (Resampling) Methods (Chapter 8 and Section 9.1)

Sample 100 independent observations from an unknown population with unknown finite mean $\mu$ and unknown median. $\bar{X}$ estimates $\mu$, and the sample median estimates the population median. What is the standard deviation of $\bar{X}$? From Math 220, what is a reasonable estimator of the standard deviation of $\bar{X}$? What is the standard deviation of the sample median? What is a reasonable estimator of the standard deviation of the sample median? One method for estimating it is to use the bootstrap.

Bootstrap:
1. Sample (i.e., RESAMPLE) 100 observations WITH replacement from the original 100 observations, and obtain a bootstrapped sample median.
2. Repeat step 1 many times to produce many (perhaps 10,000) bootstrapped sample medians.
3. Compute the sample standard deviation of the 10,000 bootstrapped sample medians. That is your estimate.

Smoothing and Non-Least Squares Methods

Smoothing (Sections 10.1 and 10.2)

Example using one variable:
> x = c(rnorm(1000), rnorm(1000, 6, 1.5))
> # Think of "x" as just being some data set of 2000 numbers
> hist(x)

Example using two variables: Start with a scatterplot (i.e., some two-dimensional graph based on a sample). Draw a smooth curve through the data. Why?

Non-Least Squares Methods (Section 10.3)

Least squares methods (e.g.,
linear regression) often require normality, and the errors get squared. Suppose a heavy-tailed distribution is of interest. What do heavy-tailed distributions tend to produce? Why is this a problem?
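The three bootstrap steps listed under Bootstrap (Resampling) Methods can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the course's own code: the "original" data are a hypothetical sample of 100 values drawn from an exponential population (rate 0.1) purely so that the example runs, and the seed is arbitrary.

```r
set.seed(324)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical "original" sample of 100 observations; in practice use your data
x <- rexp(100, rate = 0.1)
n <- length(x)

# Steps 1 and 2: resample n observations WITH replacement, compute the sample
# median, and repeat 10,000 times to get 10,000 bootstrapped sample medians
boot_medians <- replicate(10000, median(sample(x, size = n, replace = TRUE)))

# Step 3: the sample standard deviation of the bootstrapped medians is the
# bootstrap estimate of the standard deviation of the sample median
se_median <- sd(boot_medians)
print(se_median)

# For comparison: the usual estimate s / sqrt(n) of the standard deviation of X-bar
print(sd(x) / sqrt(n))
```

Replacing `median` with `mean` inside `replicate()` gives a bootstrap estimate that should land close to $s/\sqrt{n}$, which is one way to sanity-check the procedure before trusting it for the median.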