
Intro Stats and Data Analysis

by: Dr. Leon Koss


These 24 pages of class notes were uploaded by Dr. Leon Koss on Saturday, September 19, 2015. The notes belong to ECON 2370 at the University of Houston, taught by Staff in Fall.



Date Created: 09/19/15
Chapter 7 Sampling Distribution

In this chapter we talk about the techniques of collecting data and drawing samples that represent the distribution of values in a population.

Examples of data sets that use samples
1. Decennial Census
2. Current Population Survey (CPS)
3. Consumer Price Index (CPI)

Decennial Census
1. Began August 1790. Conducted every 10 years.
2. Motivation: estimate the population that can be taxed and assess the country's industrial and military potential.
3. Data are kept confidential for 72 years.

Data items collected
1. 1790: number of persons in the household and counts of persons in the following categories: free white males and females, all other free persons, slaves, ethnicity.
2. Added items: agricultural, mining, government activities, religious bodies (1810); taxes, education, wages, value of property (1840, 1850); farm and home mortgages (1880); unemployment (1980); business, housing, and transportation (1940); place of work and means to work (1960); occupation history (1970).
3. 1980: use and creation of TIGER files (data can be mapped geographically).

Methodology
1. 1790-1880: Marshals interview households.
2. 1830: Standard survey form used. Prior to this, marshals used whatever paper was available, ruled it, wrote in headings, and bound the sheets together.
3. 1870: Rudimentary tallying device used to help clerks.
4. 1890: Herman Hollerith introduced punchcards and electric tabulating machines.
5. 1910: Census office organized as a permanent agency.
6. 1950: First full use of computer support.
7. 1960: Devices set up to read data on returns; use of the Postal System to distribute surveys.
8. 1990: Counts of the homeless instituted.

Use of samples
1. 1880: Basic counts took almost until the 1890 census to tabulate and publish.
2. 1890: Supplemental survey; some subjects were covered in more detail.
3. 1940: Sampling introduced; 5% of the population were asked an additional set of questions.

Consumer Price Index (CPI)
1. Collected monthly.
2. Initiated during World War I, when rapid increases in prices, particularly in
shipbuilding centers, made such an index essential for calculating cost-of-living adjustments in wages.
3. A hypothetical bundle of goods is defined for the typical family. Changes to the bundle and the family composition occurred over time.
4. Samples of prices collected from establishments are used to estimate the value of the bundles.

Current Population Survey (CPS)
1. 12 monthly surveys. Topics include employment, temporary workers, job tenure and occupational mobility, school enrollment, race and ethnicity, voting and voter registration, food security, work schedules, computer ownership, fertility, and marital history.
2. Set up in the late 1930s to provide direct measurements of unemployment each month.
3. Probability sampling was used from the beginning. Early samples of 50,000 households were used to estimate employment activities of the general population.
4. Supplements decennial census data.

In this chapter we will concern ourselves with samples and sampling distributions.
1. Previously, when we looked at distributions, we examined data distributions.
2. In this chapter the distributions are made of parameters from samples (distribution of the mean, distribution of p).
3. Before we talk about sampling distributions, we need to talk about how a sample is derived.
4. In this chapter certain terms that we used before are given a new name.

New names for earlier terms
1. Parameters of interest: $\mu$, $\sigma^2$, and $p$ are the values (parameters) that we wish to derive from the samples.
2. Statistics: with each sample we derive statistics, the sample counterparts of the parameters of interest.

Methods of extracting a sample
1. Data on the population is available: extract a subset.
2. Data on the population is not available: determine how many people are needed to obtain a representative sample, then survey n individuals.

Types of samples
1. Random sample
2. Non-random methods: convenience sample, judgement sample, quota sampling. Data sets created by any of these non-random methods cannot be used for making inferences.

Random Samples
1. In the simple version, each element of the population has the same chance of being
selected.
2. Other versions can also provide an unbiased sample:
   a. Stratified random sample
   b. Cluster sample
   c. 1-in-k systematic random sample

Two methods of data collecting
1. Sampling from an existing database, e.g. stock market activity, price data from a random sample of grocery stores, or a researcher making use of data collected by the government (secondary-source data collection).
2. Retrieving data directly from the respondents using a survey designed by the researcher (primary-source data collection).

Secondary data source
1. Benefits: low cost; data is typically high quality; takes less time to obtain.
2. Cost: variables might not be a close fit to the variables desired by the researcher.

Primary data source
1. Benefits: variables come close to fitting the type of variables desired.
2. Costs: added cost of survey design, data collection, and processing the data.

Data collection problems that can result in a biased sample
1. Distributing surveys to a random sample and accepting a low response rate.
2. Collection techniques that reach only a subset of the full sample. Even with a 100% response rate, such methods will produce a biased sample.
3. Wording and interviewer bias. The choice of words used in the survey and the choice of interviewers can bias the results.

Sampling Distribution
The sampling distribution of a statistic is the probability distribution for all possible values of the statistic that results when random samples of size n are repeatedly drawn from the population.

Methods of obtaining a sampling distribution
1. Derive the distribution mathematically using the laws of probability. Example 7.3 and Table 7.5 are examples of this method.
2. Approximate the distribution empirically by drawing a large number of samples.
3. Use statistical theorems such as the Central Limit Theorem to derive exact or approximate distributions.

Central Limit Theorem
If random samples of n observations are drawn from a non-normal population with finite mean $\mu$ and standard deviation $\sigma$, then when n is large the sampling distribution
of the sample mean $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ (also known as the standard error of the mean).

Conditions
Under certain conditions, the means of random samples drawn from a population tend to approximate a normal distribution:
1. If the population can be represented by a normal distribution, the sampling distribution of $\bar{x}$ will be normal.
2. If the population can be represented by a symmetric distribution, the sampling distribution of $\bar{x}$ becomes approximately normal for small values of n (for samples that are small relative to the population).
3. If the population can be represented by a skewed distribution, the sampling distribution of $\bar{x}$ becomes approximately normal for large values of n (for samples that are large relative to the population).

Tools to assess $\bar{x}$ given $\mu$ and $\sigma$
1. Compute the mean and standard deviation of the sampling distribution: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
2. Determine the condition to test, e.g. the probability that $\bar{x}$ is less than a given value.
3. Convert $\bar{x}$ to a z-score using the following function:
   $$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
4. Use the table in the back of the book to find the probability for the region of the distribution defined by the condition.

Sampling Distribution of the sample proportion
1. Recall from the previous chapter: let X be a binomial random variable with n trials and probability p of success. The probability distribution of X approximates the normal with $\mu = np$ and $\sigma = \sqrt{npq}$. There is a similar outcome in sampling. Let's assume that the sampling distribution has the following characteristics.
2. For a sample, the probability of success is equal to the number of persons with the characteristic over the total number of persons in the sample, or
   $$\hat{p} = \frac{x}{n}$$
   where $\hat{p}$ is the probability of success derived from the sample.
3. For the sampling distribution,
   $$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
   where $q = 1 - p$. If $np > 5$ and $nq > 5$, the sampling distribution can be approximated by the normal distribution.

Tools to assess $\hat{p}$ given that $\mu_{\hat{p}} = p$
1. Convert $\hat{p}$ into a z-score, $z = (\hat{p} - p)/\sqrt{pq/n}$, and calculate the probability.

Econ 2370, Spring 2000, O'Donnell

Chapter 8 Large Sample Estimation

Whenever we take
a sample, we do so with the idea of learning something about the population from which the sample is drawn. Provided that the sample is drawn in an unbiased manner, we believe that it may be taken as representative of the parent population. But representatives are not all equally authoritative. Spokesmen, even official spokesmen, do not always tell a reliable tale, and it is necessary in retelling a story secondhand from such a source that we indicate the degree of confidence which may be placed in what the spokesman has said. Just as the journalist tries to emphasize for his readers the difference between rumours and usually well-informed sources, so too the statistician has to attempt a similar thing. Given large samples, the problem is easily enough disposed of intuitively. But when the samples are small, we have to face not only the possibility of bias but also the fact that the average, standard deviation, or proportion found in the sample may differ quite appreciably from the population parameters it is sought to estimate through the sample. It is evident that there can be no possibility of finding a method of estimation which will guarantee us a close estimate under all conditions. All we can hope for is a method which will be the best possible in the sense that it will have a high probability of being correct in the long run.
M. J. Moroney, Facts from Figures

1. Points covered in this chapter
   (a) Two approaches to estimating population parameters, e.g. estimating the mean and variance of the population for a normal distribution, and the proportion for a binomial distribution, when these values are unknown
   (b) Properties of sound estimators
   (c) Calculation of the margin of error and confidence intervals
   (d) How to choose a sample size

2. Prior material used in this section
   (a) Standard error measurement
   (b) z-scores
   (c) Tchebysheff's Theorem and the Empirical Rule
   (d) Central Limit Theorem: if the sample size n is large, the sampling distribution will be approximately
normal. If the sampling distribution is normal, we have a large set of statistical tools at our disposal.

3. Types of estimators
   (a) Point estimator: what is the best single value that can be used to estimate a population parameter?
   (b) Interval estimator: what is the best interval (referred to as a confidence interval) that contains the population parameter? Tied to the notion of the confidence interval is the confidence coefficient $1 - \alpha$, where $0.01 \le \alpha \le 0.10$.

4. Properties of point estimators
   (a) Unbiased: the average value of the estimated parameter equals the population parameter.
   (b) Consistent: estimates from the sample converge to the true value as the sample size increases.
   (c) Efficient: the estimator with the smallest sampling variance.

5. Univariate Analysis
   (a) Estimating the point estimator
       i. For the population mean, the point estimator is $\bar{x}$.
          Margin of error: 1.96 × (standard error of the estimator), or $\pm 1.96\,\sigma/\sqrt{n}$.
          If $\sigma$ is unknown and $n \ge 30$, one can substitute $s$ for $\sigma$.
       ii. For the population proportion, the point estimator is $\hat{p} = x/n$.
          Margin of error: $\pm 1.96\sqrt{pq/n}$, estimated by $\pm 1.96\sqrt{\hat{p}\hat{q}/n}$.
          Recall that $n\hat{p} > 5$ and $n\hat{q} > 5$.
   (b) Estimating the interval estimator
       i. General function, two-tail test: Point estimator $\pm\ z_{\alpha/2}$ × Standard Error
          A. Population mean: $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$, when $n \ge 30$
          B. Population proportion: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$
       ii. General function, left-tail test (one-sided confidence interval): Point estimator $-\ z_{\alpha}$ × Standard Error
       iii. General function, right-tail test (one-sided confidence interval): Point estimator $+\ z_{\alpha}$ × Standard Error
       iv. Values of $z_{\alpha/2}$ and $z_{\alpha}$ for given values of $\alpha$:

           alpha    z_{alpha/2} (two-tail)    z_{alpha} (one-tail)    Confidence
           0.010    2.58                      2.33                    99.0%
           0.020    2.33                      2.055                   98.0%
           0.025    2.24                      1.96                    97.5%
           0.050    1.96                      1.645                   95.0%
           0.100    1.645                     1.28                    90.0%

6. Bivariate Analysis
This type of analysis works with two samples, each drawn from a different population. For this form of bivariate analysis the research question is: are the populations different? Using the example from the textbook, one would want to test whether the average MCAT scores for biochemistry and biology majors are the same. If there is no difference between these two populations (biochemistry and biology students), the difference in their
population means $\mu_1 - \mu_2$ would equal 0. This research question will be addressed briefly in this section and in more detail in Chapters 9 and 10. Right now we wish to deal with the point and interval estimates from the samples drawn from two populations.

There are two kinds of data sets used for bivariate analysis:
   (a) Data sets are not paired, i.e. the data sets are independent from one another. There is no relationship between the two parameters differenced (e.g. MCAT scores for Biochemistry and Biology majors).
   (b) Data sets are paired, i.e. there is a relationship between the two data sets. Examples: comparing the differences in gas mileage when a car is first given one type and then another type of gasoline; test scores of trainees before and after viewing an instructional video.

   (a) Properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ (not paired)
       Mean and standard error:
       $$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, \qquad SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
       Margin of error:
       $$\pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
       Confidence interval (two-tail):
       $$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
       i. If the sampled populations are normally distributed, then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normally distributed regardless of sample size.
       ii. If the sampled populations are not normally distributed, then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed when $n_1$ and $n_2$ are large, due to the Central Limit Theorem.
       iii. If $\sigma_1^2$ and $\sigma_2^2$ are unknown but both $n_1$ and $n_2$ are greater than or equal to 30, you can substitute the sample variances for the population variances.
       iv. Use the z-values found in section 5(b)iv.

   (b) Properties of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ (not paired)
       Mean and standard error:
       $$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2, \qquad SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
       Margin of error:
       $$\pm 1.96\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
       Confidence interval (two-tail):
       $$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
       i. The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed when $n_1$ and $n_2$ are large, due to the Central Limit Theorem. $n_1$ and $n_2$ must be sufficiently large so that the sampling
distribution of $\hat{p}_1 - \hat{p}_2$ can be approximated by a normal distribution: $n_1 p_1$, $n_1 q_1$, $n_2 p_2$, and $n_2 q_2$ must all be greater than 5.
       ii. Use the z-values found in section 5(b)iv.

   (c) Properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ (paired)
       Mean and standard error:
       $$\mu_d = \mu_1 - \mu_2, \qquad SE_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}}$$
       where $\sigma_d^2$ is the variance of the differenced data and $n_1 = n_2 = n$.
       Margin of error:
       $$\pm 1.96\,\frac{\sigma_d}{\sqrt{n}}$$
       Confidence interval (two-tail):
       $$\bar{d} \pm z_{\alpha/2}\,\frac{\sigma_d}{\sqrt{n}}$$

   (d) Properties of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ (paired)
       Will not be covered in this class.

7. Choosing a sample size
Choosing a sample size is an application of the point and interval estimation techniques. Suppose you want to generate a sample such that the margin of error is equal to some value; let's call it B. You also want a sample such that 95% of repeated samples will give you a margin of error less than or equal to B.

For univariate and bivariate analyses, each margin of error function is a function of n. Here is the case for the population mean (univariate case):
$$B = 1.96\,\frac{\sigma}{\sqrt{n}}$$
If I rearrange the function above, one finds the function for computing the sample size:
$$n \ge \frac{(1.96)^2 \sigma^2}{B^2}$$
If $\sigma$ is not known, the sample standard deviation can be used, or a value based on the range of the values divided by 4.

In order to prepare a sample with a different degree of confidence, just replace the margin of error function with the confidence interval function (two-tail version), i.e. replace 1.96 with $z_{\alpha/2}$. Below is a table of the functions one can use to determine the sample size.

    Analysis                 Estimator                      Minimum sample size
    Univariate               $\bar{x}$                      $n \ge z_{\alpha/2}^2\,\sigma^2 / B^2$
    Univariate               $\hat{p}$                      $n \ge z_{\alpha/2}^2\,pq / B^2$
    Bivariate (not paired)   $\bar{x}_1 - \bar{x}_2$        $n \ge z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_2^2) / B^2$
    Bivariate (not paired)   $\hat{p}_1 - \hat{p}_2$        $n \ge z_{\alpha/2}^2\,(p_1 q_1 + p_2 q_2) / B^2$

For the bivariate functions, $n_1 = n_2 = n$. B is the acceptable margin of error.

Large Sample Tests of Hypothesis

Main points in this chapter
1. Standard method to test research questions
2. Discussion of the risks involved when the decision based on the test is incorrect
3. Detailed discussion of the standard method
4. Application of the standard method to research questions using large samples

Hypothesis Testing

Recall your chemistry or biology lab in high school and the Scientific Method.

Observations: A good scientist is observant and notices things in the world around him or her. She sees, hears, or in some other way notices what's going on in the world, becomes curious about what's happening, and raises a question about it.

Hypothesis: This is a tentative answer to the question, an explanation for what was observed. The scientist tries to explain what caused what was observed (hypo = under, beneath; thesis = an arranging).
1. Hypotheses are possible causes. A hypothesis is not an observation, but rather a tentative explanation for the observation.
2. Hypotheses reflect past experience with similar questions (educated propositions about cause).
3. Multiple hypotheses should be proposed whenever possible. One should think of alternative causes that could explain the observation (the correct one may not even be one that was thought of).
4. Hypotheses should be testable.
5. Hypotheses can be proven wrong or incorrect, but can never be proven or confirmed with absolute certainty. Someone in the future with more knowledge may find a case where the hypothesis is not true.

Statistical method
1. Observe the economy; raise a question or set of questions.
2. Prepare an answer in the form of a hypothesis ($H_0$, also known as the null hypothesis).
3. Prepare counter responses ($H_a$, the alternative) in case the null is proven wrong or incorrect.
4. Collect data.
5. Specify a statistical test.
6. Determine the critical regions for rejection.
7. Obtain the findings; prepare the results.

My modest example using Spam and eggs
Implications from theory: what the theory predicts with respect to the differences in the proportion of income spent on a good,
$$\gamma = \frac{\text{Income spent on good } i}{\text{Total income}}$$
Let's look at the value of $\gamma$ spent by high-income households ($\gamma_{High}$) and the value of $\gamma$ spent by low-income households ($\gamma_{Low}$):
    $\gamma_{Low} - \gamma_{High} < 0$  →  Luxury good
    $\gamma_{Low} - \gamma_{High} = 0$  →  Normal good
    $\gamma_{Low} - \gamma_{High} > 0$  →  Inferior good

Test 1: Is Spam
a luxury good?
1. Hypothesis to reject: $\gamma_{Low} - \gamma_{High} \ge 0$. I wish to reject the notion that the good could be either a normal or an inferior good. Logically, rejecting this hypothesis implies that I fail to reject that it is a luxury good.

Failing to reject ≠ accepting an outcome. Accepting an outcome implies that you have accepted the theory.

Full bank of tests for "Is Spam a luxury good?"
    1a. Null hypothesis to reject: $\gamma_{Low} - \gamma_{High} \ge 0$
    1b. Alternative hypothesis: $\gamma_{Low} - \gamma_{High} < 0$
    2a. Null hypothesis to reject: $\gamma_{Low} - \gamma_{High} = 0$
    2b. Alternative hypothesis: $\gamma_{Low} - \gamma_{High} \ne 0$
    3a. Null hypothesis to reject: $\gamma_{Low} - \gamma_{High} \le 0$
    3b. Alternative hypothesis: $\gamma_{Low} - \gamma_{High} > 0$
The researcher will want to reject the null in Tests 1 and 2 and fail to reject the null in Test 3. Further comment regarding the three tests: two are one-tailed tests; one is a two-tailed test.

Specifics: Making Assumptions
1. Type of variable: categorical, interval
2. Type of population: binomial, normal, normal given the Central Limit Theorem
3. Type of analysis: univariate, bivariate
4. Large or small sample. Differences in variances have an impact on a differences-in-means test using a small sample.
5. Null hypothesis $H_0$ and alternative hypothesis
6. One- or two-tailed test

Specifics: Sampling Distribution
1. Standard normal (z-scores)
2. Student's t distribution
3. $\chi^2$ distribution
4. F distribution

Specifics: One-tailed or two-tailed
If the hypothesis is an inequality (e.g. $\mu > 0$, $\mu < 1$), we can use a one-tail test. If we are testing whether $\mu$ is a specific value, the alternative hypothesis is that $\mu$ is not this value and can be any other value in the distribution. For this case we use a two-tail test.

Specifics: Choosing a critical region
Describes the rejection area. Answers the question: what are we willing to risk in being wrong?

Three scenarios (two-tailed test)
Scenario 1: $\alpha = 20\%$, $\alpha/2 = 10\%$, or p-value 0.10, means that when we reject the hypothesis we reject it with a confidence level of 80%.
Scenario 2: $\alpha = 10\%$, $\alpha/2 = 5\%$, or p-value 0.05, means that when we reject the hypothesis we reject it with a confidence level of 90%.
Scenario 3: $\alpha = 2\%$, $\alpha/2 = 1\%$, or
p-value 0.01, means that when we reject the hypothesis we reject it with a confidence level of 98%.

Notion of Significance (table on page 346)
Do researchers only report results that are significant?

Risk
There are two types:
1. Type I error: rejecting a hypothesis when in fact it is true.
2. Type II error: failing to reject a hypothesis when one should reject it.

Probability of making a Type I error
The significance level $\alpha$ (the cutoff against which the p-value is compared) is the probability of making a Type I error.
Amount of risk for the three scenarios:
    Highest risk of making a Type I error: lowest confidence level taken.
    Lowest risk of making a Type I error: highest confidence level taken.

Probability of making a Type II error
The probability of a Type II error is $\beta$. The power of your statistical test is given as $1 - \beta$.

Computing $\beta$ and $1 - \beta$
Suppose your hypothesis test is $H_0: \mu_0 = A$. You want to compute a power test to determine the probability of rejecting $H_0$ when the alternative mean is $\mu_a = C$.
1. Compute the two confidence interval values. The book uses the margin of error values, but the example (9.8) assumes that the significance level is 5%. My instructions apply for all significance levels. These values are the endpoints of the Type II region. The formula for the confidence interval is
   $$\mu_0 \pm z_{\alpha/2}\, SE_{\bar{x}}$$
   Note: this function uses $\mu_0$. From this you have the left boundary value (LBV) and the right boundary value (RBV).
2. Draw the two graphs. The left and right boundary points are points around $\mu_0$. Determine where $\mu_a$ is relative to these boundaries, and determine the rejection area of the new distribution overlapping the acceptance region of the old distribution.
3. Compute z-scores for the two values using the following functions:
   $$z_{\text{left boundary}} = \frac{LBV - \mu_a}{SE_{\bar{x}}}, \qquad z_{\text{right boundary}} = \frac{RBV - \mu_a}{SE_{\bar{x}}}$$
   Note: this set of functions uses $\mu_a$.
4. Given the drawing above, determine $\beta = P(\text{accepting } H_0 \text{ when } \mu = \mu_a)$. The power of the test, or the probability of correctly rejecting $H_0$ given that $\mu$ is $\mu_a$, is $1 - \beta$.

Relationships between Type I and Type II probabilities and the power test
1. Increasing the significance level reduces the
confidence interval (the acceptance region around $\mu_0$), thus decreasing the probability of a Type II error and increasing the power of the test, at the cost of a higher risk of a Type I error.
2. Increasing the sample size decreases the standard error. This decreases the probability of a Type II error and increases the power of the test.
3. If $\mu_a$ is very close to $\mu_0$, the power of the test is weakened.
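
The four steps for computing $\beta$ and $1 - \beta$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `power_two_tailed` and the numbers in the usage lines are hypothetical and do not come from the notes or the textbook; the test is taken to be a two-tailed large-sample z-test with known $\sigma$, and the normal table is replaced by `statistics.NormalDist` from the standard library.

```python
from math import sqrt
from statistics import NormalDist


def power_two_tailed(mu0, mu_a, sigma, n, alpha=0.05):
    """Return (beta, power) for a two-tailed large-sample z-test of H0: mu = mu0.

    Hypothetical helper written for illustration; sigma is assumed known.
    """
    std_normal = NormalDist()              # standard normal, mean 0 and sd 1
    se = sigma / sqrt(n)                   # standard error of the mean
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Step 1: boundaries of the acceptance region, built around mu0.
    lbv = mu0 - z_crit * se
    rbv = mu0 + z_crit * se

    # Step 3: z-scores of the boundaries, measured from mu_a.
    z_left = (lbv - mu_a) / se
    z_right = (rbv - mu_a) / se

    # Step 4: beta is the area of the mu_a distribution inside [LBV, RBV].
    beta = std_normal.cdf(z_right) - std_normal.cdf(z_left)
    return beta, 1 - beta


# Illustrative numbers only (not taken from the notes).
beta, power = power_two_tailed(mu0=100, mu_a=103, sigma=15, n=100)
print(f"n=100: beta={beta:.3f}, power={power:.3f}")

# A larger sample shrinks the standard error and raises the power.
_, power_large_n = power_two_tailed(mu0=100, mu_a=103, sigma=15, n=400)
print(f"n=400: power={power_large_n:.3f}")

# An alternative mean close to mu0 leaves very little power.
_, power_close = power_two_tailed(mu0=100, mu_a=100.1, sigma=15, n=100)
print(f"mu_a close to mu0: power={power_close:.3f}")
```

Step 2 (drawing the two graphs) has no counterpart in the code; the subtraction of the two cumulative probabilities plays the role of reading the overlapping area off the drawing.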

