# Elementary Statistical Meth II MATH 3229

This 8 page Class Notes was uploaded by Dr. Tyrell McKenzie on Sunday October 11, 2015. The Class Notes belongs to MATH 3229 at East Carolina University taught by Peng Xiao in Fall. Since its upload, it has received 35 views. For similar materials see /class/221296/math-3229-east-carolina-university in Mathematics (M) at East Carolina University.

Date Created: 10/11/15

## Statistics: parameters, variables, intervals, proportions (for introductory courses)

**Statistics.** A set of tools for collecting, organizing, presenting, and analyzing numerical facts or observations.

1. **Descriptive statistics:** procedures used to organize and present data in a convenient, usable, and communicable form.
2. **Inferential statistics:** procedures employed to arrive at broader generalizations or inferences from sample data to populations.

- **Statistic.** A number describing a sample characteristic. It results from the manipulation of sample data according to certain specified procedures.
- **Data.** Characteristics or numbers that are collected by observation.
- **Population.** A complete set of actual or potential observations.
- **Parameter.** A number describing a population characteristic, typically inferred from a sample statistic.
- **Sample.** A subset of the population selected according to some scheme.
- **Random sample.** A subset selected in such a way that each member of the population has an equal opportunity to be selected. Ex.: lottery numbers in a fair lottery.
- **Variable.** A phenomenon that may take on different values.
- **Frequency distribution.** Occurs when the values of a variable are arranged in order according to their magnitudes.
- **Grouped frequency distribution.** A frequency distribution in which the values of the variable have been grouped into classes.
- **Cumulative frequency distribution.** A distribution which shows the total frequency through the upper real limit of each class.
- **Cumulative percentage distribution.** A distribution which shows the total percentage through the upper real limit of each class.

## Measures of Central Tendency

**Mean.** The point in a distribution of measurements about which the summed deviations are equal to zero. It is the average value of a sample or population.

Population mean and sample mean:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Note: The mean is very sensitive to extreme measurements that are not balanced on both sides.

**Weighted mean.** Sum of a set of observations multiplied by their respective weights, divided by the sum of the weights:

$$\bar{x}_w = \frac{\sum_{i=1}^{G} w_i x_i}{\sum_{i=1}^{G} w_i}$$

where $w_i$ = weight, $x_i$ = observation, and $G$ = number of observation groups. It is calculated from a population, a sample, or groupings in a frequency distribution. Ex.: In the Frequency Distribution table, the mean is 80.3, calculated by using frequencies for the $w_i$'s. When data are grouped, use class midpoints for the $x_i$'s.

**Median.** Observation or potential observation in a set that divides the set so that the same number of observations lie on each side of it. For an odd number of values, it is the middle value; for an even number, it is the average of the middle two. Ex.: In the Frequency Distribution table, the median is 79.5.

**Mode.** Observation that occurs with the greatest frequency. Ex.: In the Frequency Distribution table, the mode is 88.

**Sum of squares (SS).** Deviations from the mean, squared and summed:

$$\text{Population } SS = \sum (x_i - \mu)^2 \ \text{ or } \ \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}$$

$$\text{Sample } SS = \sum (x_i - \bar{x})^2 \ \text{ or } \ \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

**Variance.** The average of squared differences between observations and their mean.

Population variance and sample variance:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Variances for grouped data (population and sample), with $f_i$ the frequency and $m_i$ the midpoint of class $i$:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{G} f_i (m_i - \mu)^2 \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{G} f_i (m_i - \bar{x})^2$$

**Standard deviation.** Square root of the variance. Ex.: population SD:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

## Graphs

**Bar graph.** A form of graph that uses bars to indicate the frequency of occurrence of observations.

- Histogram: a form of bar graph used with interval or ratio-scaled variables.
- Interval scale: a quantitative scale that permits the use of arithmetic operations. The zero point in the scale is arbitrary.
- Ratio scale: same as interval scale except that there is a true zero point.

**Frequency curve.** A form of graph representing a frequency distribution in the form of a continuous line that traces a histogram.

- Cumulative frequency curve: a continuous line that traces a histogram where bars in all the lower classes are stacked up in the adjacent higher class. It cannot have a negative slope.
- Normal curve: bell-shaped curve.
- Skewed curve: departs from symmetry and tails off at one end.
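The mean, weighted mean, sum of squares, and variance definitions can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the `scores` data are hypothetical and do not come from the notes.

```python
def mean(xs):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def sum_of_squares(xs):
    """Deviations from the mean, squared and summed."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def population_variance(xs):
    """SS divided by N."""
    return sum_of_squares(xs) / len(xs)


def sample_variance(xs):
    """SS divided by n - 1."""
    return sum_of_squares(xs) / (len(xs) - 1)


# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(scores))                 # 5.0
print(sum_of_squares(scores))       # 32.0
print(population_variance(scores))  # 4.0
print(weighted_mean([70, 80, 90], [1, 2, 1]))  # 80.0
```

For grouped data, the same `weighted_mean` can be called with class midpoints as `xs` and class frequencies as `ws`.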
[Figure: normal curve and skewed curve]

## Probability

**Probability.** The long-term relative frequency with which an outcome or event occurs.

$$p(A) = \frac{\text{Number of outcomes favoring Event } A}{\text{Total number of outcomes}}$$

**Sample space.** All possible outcomes of an experiment.

**Types of events.**

- Exhaustive: two or more events are said to be exhaustive if all possible outcomes are considered. Symbolically, $p(A \text{ or } B \text{ or } \dots) = 1$.
- Non-exhaustive: two or more events are said to be non-exhaustive if they do not exhaust all possible outcomes.
- Mutually exclusive: events that cannot occur simultaneously: $p(A \text{ and } B) = 0$ and $p(A \text{ or } B) = p(A) + p(B)$. Ex.: males, females.
- Non-mutually exclusive: events that can occur simultaneously: $p(A \text{ or } B) = p(A) + p(B) - p(A \text{ and } B)$. Ex.: males, brown eyes.
- Independent: events whose probability is unaffected by occurrence or nonoccurrence of each other: $p(A \mid B) = p(A)$, $p(B \mid A) = p(B)$, and $p(A \text{ and } B) = p(A)\,p(B)$. Ex.: gender and eye color.
- Dependent: events whose probability changes depending upon the occurrence or nonoccurrence of each other: $p(A \mid B)$ differs from $p(A)$, $p(B \mid A)$ differs from $p(B)$, and $p(A \text{ and } B) = p(A)\,p(B \mid A) = p(B)\,p(A \mid B)$. Ex.: race and eye color.

**Joint probabilities.** Probability that 2 or more events occur simultaneously.

**Marginal probabilities** (or unconditional probabilities). Summation of probabilities.

**Conditional probabilities.** Probability of $A$ given the existence of $S$, written $p(A \mid S)$.

**Example.** Given the numbers 1 to 9 as observations in a sample space:

- Events mutually exclusive and exhaustive. Example: p(all odd numbers), p(all even numbers).
- Events mutually exclusive but not exhaustive. Example: p(an even number), p(the numbers 7 and 5).
- Events neither mutually exclusive nor exhaustive. Example: p(an even number or a 2).

Frequencies:

| | Event C | Event D | Totals |
|---|---|---|---|
| Event E | 52 | 36 | 88 |
| Event F | 62 | 71 | 133 |
| Totals | 114 | 107 | 221 |

Joint, marginal, and conditional probabilities:

| | Event C | Event D | Marginal probability | Conditional probability |
|---|---|---|---|---|
| Event E | 0.24 | 0.16 | 0.40 | C\|E = 0.60, D\|E = 0.40 |
| Event F | 0.28 | 0.32 | 0.60 | C\|F = 0.47, D\|F = 0.53 |
| Marginal probability | 0.52 | 0.48 | 1.00 | |
| Conditional probability | E\|C = 0.46, F\|C = 0.54 | E\|D = 0.33, F\|D = 0.67 | | |

**Sampling distribution.** A theoretical probability distribution of a statistic that would result from drawing all possible samples of a given size from some population.

**The standard error of the mean.** A theoretical standard deviation of sample means of a given sample size, drawn from some specified population.

- When based on a very large, known population, the standard error is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
- When estimated from a sample drawn from a very large population, the standard error is $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$.
- The dispersion of sample means decreases as sample size is increased.

## Random Variables

**Random variable.** A mapping or function that assigns one and only one numerical value to each outcome in an experiment.

**Discrete random variables.** Involve rules or probability models for assigning or generating only distinct values (not fractional measurements).

**Binomial distribution.** A model for the sum of a series of $n$ independent trials where each trial results in a 0 (failure) or 1 (success). Ex.: coin toss.

$$p(s) = \frac{n!}{s!\,(n-s)!}\,\pi^{s}(1-\pi)^{n-s}$$

where $p(s)$ is the probability of $s$ successes in $n$ trials with a constant probability $\pi$ per trial, and where the binomial mean is $\mu = n\pi$ and the binomial variance is $\sigma^2 = n\pi(1-\pi)$. As $n$ increases, the binomial approaches the normal distribution.

**Hypergeometric distribution.** A model for the sum of a series of $n$ trials where each trial results in a 0 or 1 and is drawn from a small population with $N$ elements split between $N_1$ successes and $N_2$ failures. Then the probability of splitting the $n$ trials between $x_1$ successes and $x_2$ failures is

$$p(x_1 \text{ and } x_2) = \frac{\dfrac{N_1!}{x_1!\,(N_1-x_1)!}\cdot\dfrac{N_2!}{x_2!\,(N_2-x_2)!}}{\dfrac{N!}{n!\,(N-n)!}}$$

with hypergeometric mean $\mu = \dfrac{nN_1}{N}$ and variance $\sigma^2 = \dfrac{N-n}{N-1}\cdot\dfrac{nN_1N_2}{N^2}$.

**Poisson distribution.** A model for the number of occurrences of an event ($x = 0, 1, 2, \dots$) when the probability of occurrence is small but the number of opportunities for the occurrence is large:

$$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, 3, \dots \text{ and } \lambda > 0; \text{ otherwise } p(x) = 0$$

The Poisson mean and variance are both $\lambda$.

For continuous variables, frequencies are expressed in terms of areas under a curve.

**Continuous random variables.** Variables that may take on any value along an uninterrupted interval of a number line.

**Normal distribution** (bell curve). A distribution whose values cluster symmetrically around the mean (also median and mode):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$$

where $f(x)$ = frequency at a given value, $\sigma$ = standard deviation of the distribution, $\pi$ = approximately 3.1416, $e$ = approximately 2.7183, $\mu$ = the mean of the distribution, and $x$ = any score in the distribution.

**Standard normal distribution.** A normal random variable $Z$ that has a mean of 0 and a standard deviation of 1.

**Z-values.** The number of standard deviations a specific observation lies from the mean.

## Inference for Parameters

**Unbiasedness.** Property of a reliable estimator of the parameter being estimated.

- Unbiased estimate of a parameter: an estimate that equals, on the average, the value of the parameter. Ex.: the sample mean is an unbiased estimator of the population mean.
- Biased estimate of a parameter: an estimate that does not equal, on the average, the value of the parameter. Ex.: the sample variance calculated with $n$ is a biased estimator of the population variance; however, when calculated with $n-1$ it is unbiased.

**Standard error.** The standard deviation of the estimator is called the standard error. Ex.: The standard error for $\bar{x}$ is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. This has to be distinguished from the standard deviation of the sample:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The standard error measures the variability in the $\bar{x}$'s around their expected value $E(\bar{x})$, while the standard deviation of the sample reflects the variability in the sample around the sample's mean $\bar{x}$.

### Use of Student's t (used when the standard deviation is unknown)

When $\sigma$ is not known, its value is estimated from sample data.

- t ratio: the ratio employed in the testing of hypotheses or determining the significance of a difference between means (two-sample case) involving a sample with a t-distribution. The formula is

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

where $\mu$ = population mean under $H_0$ and $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$.

- Distribution: symmetrical distribution with a mean of zero and a standard deviation that approaches one as degrees of freedom increase (i.e., it approaches the Z distribution).
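As a rough sketch of the t ratio, the following Python computes the sample standard deviation (with n - 1), the estimated standard error of the mean, and t for a hypothesized mean. The function names, the `sample` data, and the hypothesized means are assumptions made for illustration; they are not taken from the notes.

```python
from math import sqrt


def sample_sd(xs):
    """Sample standard deviation, dividing the sum of squares by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


def standard_error(xs):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sample_sd(xs) / sqrt(len(xs))


def t_ratio(xs, mu0):
    """t = (sample mean - hypothesized mean) / estimated standard error."""
    m = sum(xs) / len(xs)
    return (m - mu0) / standard_error(xs)


# Hypothetical sample with mean 5.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error(sample))
print(t_ratio(sample, mu0=4))  # positive: the sample mean lies above 4
print(t_ratio(sample, mu0=5))  # 0.0: the sample mean equals the hypothesized mean
```

The resulting t would then be compared with a tabulated critical value for n - 1 degrees of freedom.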
Assumptions and conditions required in assuming a t-distribution:

- Samples are drawn from a normally distributed population, and $\sigma$ (the population standard deviation) is unknown.
- Homogeneity of variance: if 2 samples are being compared, the assumption in using the t ratio is that the variances of the populations from which the samples are drawn are equal.
- The estimated $\sigma_{\bar{x}}$ (that is, $s_{\bar{x}}$) is based on the unbiased estimate of the population variance.
- Degrees of freedom (df): the number of values that are free to vary after placing certain restrictions on the data.

Example: The sample (43, 74, 42, 65) has $n = 4$. The sum is 224 and the mean is 56. Using these 4 numbers and determining deviations from the mean, we have 4 deviations, namely (-13, 18, -14, 9), which sum to zero. Taking deviations from the mean is one restriction we have imposed, and the natural consequence is that the sum of these deviations must equal zero. For this to happen, we can choose any numbers, but our freedom to choose is limited to only 3 of them, because the fourth is fixed by the requirement that the sum of the deviations equal zero. We use the equality

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x}) = 0$$

So, given a mean of 56, if the first 3 observations are 43, 74, and 42, the last observation has to be 65. This single restriction helps us determine df. The formula is $n$ less the number of restrictions. In this case it is $n - 1 = 4 - 1 = 3$ df.

The t ratio is a robust test. This means that statistical inferences are likely valid despite fairly large departures from normality in the population distribution. If normality of the population distribution is in doubt, it is wise to increase the sample size.

## Using the Z Statistic

**Used when the standard deviation is known.** When $\sigma$ is known, it is possible to describe the form of the distribution of the sample mean as a Z statistic. The sample must be drawn from a normal distribution or have a sample size ($n$) of at least 30.

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

where $\mu$ = population mean (either known or hypothesized under $H_0$) and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

**Critical region.** The portion of the area under the curve which includes those values of a statistic that lead to the rejection of the null hypothesis. The most often used significance levels are 0.01, 0.05, and 0.1. For a one-tailed test using the z statistic, these correspond to z-values of 2.33, 1.65, and 1.28, respectively. For a two-tailed test, the critical region of 0.01 is split into two equal outer areas marked by z-values of $|2.58|$.

**Example 1.** Given a population with $\mu = 250$ and $\sigma = 50$, what is the probability of drawing a sample of $n = 100$ values whose mean $\bar{x}$ is at least 255? In this case, $z = (255 - 250)/(50/\sqrt{100}) = 1.00$. Looking at Table A, the given area for $z = 1.00$ is 0.3413. To its right is $0.1587$ $(= 0.5 - 0.3413)$, or 15.87%. Conclusion: there are approximately 16 chances in 100 of obtaining a sample mean of at least 255 from this population when $n = 100$.

**Example 2.** Assume we do not know the population mean. However, we suspect that the sample may have been selected from a population with $\mu = 250$ and $\sigma = 50$, but we are not sure. The hypothesis to be tested is whether the sample was selected from this population. Assume we obtained, from a sample ($n$) of 100, a sample mean of 263. Is it reasonable to assume that this sample was drawn from the suspected population?

1. $H_0$: $\mu = 250$ (the actual mean of the population from which the sample is drawn is equal to 250). $H_1$: $\mu \neq 250$ (the alternative hypothesis is that it is greater than or less than 250, thus a two-tailed test).
2. The z statistic will be used because the population $\sigma$ is known.
3. Assume the significance level ($\alpha$) to be 0.01. Looking at Table A, we find that the area beyond a $z$ of 2.58 is approximately 0.005. To reject $H_0$ at the 0.01 level of significance, the absolute value of the obtained $z$ must be equal to or greater than $|z_{0.01}|$, or 2.58. Here the value of $z$ corresponding to the sample mean of 263 is $(263 - 250)/5 = 2.60$.

**Conclusion.** Since this obtained $z$ falls within the critical region, we may reject $H_0$ at the 0.01 level of significance.

**Confidence interval.** Interval within which we may consider a hypothesis tenable. Common confidence intervals are 90%, 95%, and 99%.
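The two worked z examples can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+) in place of Table A; the helper name `z_statistic` is an assumption for illustration, while the numbers (250, 50, 100, 255, 263, 0.01) are the ones given in the examples.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu, sigma, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / sqrt(n))


std_normal = NormalDist()  # mean 0, standard deviation 1

# Example 1: probability of a sample mean of at least 255.
z1 = z_statistic(255, mu=250, sigma=50, n=100)
upper_tail = 1 - std_normal.cdf(z1)
print(z1, round(upper_tail, 4))  # 1.0 0.1587

# Example 2: two-tailed test at the 0.01 level for a sample mean of 263.
alpha = 0.01
z2 = z_statistic(263, mu=250, sigma=50, n=100)
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # close to the tabulated 2.58
reject_h0 = abs(z2) >= z_crit
print(z2, round(z_crit, 2), reject_h0)  # 2.6 2.58 True
```

The computed critical value is slightly more precise than the two-decimal table entry, which does not change the conclusion here.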
Confidence Limits limits defining the confidence interval 1 0t100 confidence interval for u mate 0511M 106 where Z is the value of the standard normal variable z that puts 0t2 per cent in each tail of the distribution The confi dence interval is the complement of the critical regions A tstatistic may be used in place of the z statistic when G is unknown and s39must be used as an estimate But note the caution in that section Critical region for rejection of Ho when a 001 twotailed test meantoz A Normal Curve Areas area from 0 z Table 03 04 05 06 0040 0438 0832 1217 1591 1950 0120 0517 0910 1293 1664 2019 0160 0557 0948 1331 1700 2054 0199 0596 0987 1368 1736 2088 0239 0636 1026 1406 1772 2123 2291 2611 2910 3186 2357 2673 2967 3238 2389 2704 2995 3264 2422 2734 3023 3289 2454 2764 3051 3315 3422 3485 3599 2521 3554 3665 37083729 3749 3770 3869 39073925 39443962 4049 40824099 4115 4131 4207 4236425142654279 4345 4370438243944406 4463 4484449545054515 4564 4582459145994608 4649 4664467146784686 4732473847444750 4778 4788479347984803 W70 4826 4834483848424846 4864 4871487548784881 4896 4901490449064909 4920 4925492749294931 4940 494349454946 4955 495749594960 4966 49684969 4975 49774977 4982 49834984 4988 4988 4989 C SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN MEAN S If a num ber of pairs of samples were taken from the same population or from two different populations then 0 The distribution of differences between pairs of sample means tends to be normal z distribution 0 The mean ofthese differences between means Hi1 is equal to the difference between the X popufation means that is Jr12 J ZDISTRIBUTION G and 0392 are known 0 The standard error of the difference between means Gilr2l i i6 2 0 Where 111 uz re resents the hypothesized dif ference in means the ollowmg statistic can be used for hypothe51s tests fl izlli H2 o xi39xz 0 When n and n2 are gt30 substitue s1 and 2 for 61 and 02 respectively 442553 e a To obtain sum of squares SS see Measures ofCen tral 
Tendency on page I E POOLED tTEST 0 Distribution is normal 0 n lt 30 o 61 and 02 are not known but assumed equal The hypothesis test may be 2 tailed vs or 1 tailed uISp and the39alternative IS it gt uz or 1 2 uz and the a ernative is ullt p2 degrees of freedomdf n11n21n1n2 2 Use the given formula below for estimating 0x 2 to determine sag s xl xz Determine the critical region for rejection b as signinghan acceptable level of Significance and 00k ing at t e ttable With d 11 n22 0 Use the following formula for the estimated stan dard error n1 ls12 n2 1s22 n1ngl n1n22 71an Sitf2 AOPENA El HETEROGENEITY OF VARIANCES may be determined by using the F test F S2 larger variance S2 smaller variance Cl NULL HYPOTHESIS Variances are equal and their ratio is one DALTERNATIVE HYPOTHESISVariances differ and their ratio is not one CI Look at Table C below to determine if the variances are significantly different from each other Use degrees of freedom from the 2 samplesn 11 7121 T I Top row05 Bottom row01 points for distribution of F Degrees of freedom for numerator 5 230 234 237 5625 5859 5928 1925 1930 1933 1936 9925 9930 9933 9934 888 2767 608 Critical Values of F 439 137410921978 915 875 847 826 474 435 412 397 387 379 371 368 845 785 746 719 700 684 671 407 384 369 358 350 344 239 701 663 637 619 603 E 91 363 348 337 329 323 C 18 642 601 580 562 547 E 35 348 333 322 314 307 302 564 Degrees of freedom for denominator 1532133 at CORRELATED SAMPLES Cl STANDARD ERROR OF THE DIFFER ENCE between Means for Correlated Groups The general formula is l 2 2 SS le x2 x1 x2 where r is Pearson correlation o By matching samples on a variable correlated with the criterion variable the magnitude of the standard error of the difference can be reduced 0 The higher the correlation the greater the reduction in the standard error of the difference ANALYSIS OF VARIANCE ANOVA Cl PURPOSE Indicates possibility of overall mean effect of the experimental treatments before investigating a specific 
hypothesis CI ANOVA Consists of obtaining independent estimates from population subgroups It allows for the partition of the sum of squares into known components of variation Cl TYPES OF VARIANCES BetweenGroup Variance BGV re ects the mag nitude of the differences among the group means 39 WithinGroup Variance WGV re ects the dispersion within each treatment group It is also referred to as the error term I CALCULATING VARIANCES Followin the Fratio when the BGV is large relative to t e WGV the Fratio will also be large nchi Stat2 BGV kI where xi mean of ith treatment group and ct mean of all n values across all k treatment groups SS 1 SS2 SSk V nk where the 88 s are the sums of squares see Mea sures of Central Tendency on page 1 of each subgroup s values around the subgroup mean Cl USING F RATIO F BGVWG V 0 Degrees of freedom are kl for the numerator and n k for the denominator 39 If BG V gt WG V the experimental treatments are responsible for the large differences among group means Null hypothesis the group means are estimates of a common population mean peeponrious In random samples of size n the sample propor tion p fluctuates around the proportion mean 2 7t 7r17r n with a proportion variance of standard error of i 1r11r n As the sampling distribution ofp increases it concentrates more around its target mean It also gets closer to It ll normal distribution In which case z l n mon proportion ISBN l57EEEELl l8 5049i 9 781572 226494 quickstudycom CORRELATION De nition Correlation refers to the relationship between two variables The Correlation Coef cient is a measure that expresses the extent to which two variables are related El PEARSON r METHOD ProductMoment Correlation Coefficient Corelation coefficient employed with interval or ratioscaled variables EL Given observations to two variablesX and Y we can compute their corresponding z values Zx xx sx and Zy yysy The formulas for the Pearson correlation r r Ex Xy Ilssx ssy Use the above formula for large samples Use 
this formula also known as the MeanDeviation Method of computing the Pearson r for small samples 2 zx zy l39 T Cl RAW SCORE METHOD is quicker and can be used in place of the first formula above when the sample values are avaigblfv Evilsi twain e 0Most widely used non parametric test The X2 mean its degrees of freedom 0 The X2 variance twice its degrees of freedom 0 Can be used to test one or two independent samples 0 The square ofa standard normal variable is a chisquare variable 0 Like the tdistibution it has different distribu tions depending on the degrees of freedom Cl DEGREES OF FREEDOM df COMPUTATION V 0 If chisquare tests for the goodnessof fit to a h potheSIZed distribution 1 g I m where i 39 number ofgroups or classes in the frequenc distribution In number of population parameters that must be estimated from sample statistics to test the hypotheSIS O lfchi square tests for homogeneity or contingenc df rowsI columnsI Cl GOODNESS OFFIT TEST To apply the chisquare distribution in this manner the critical chisquare value is expressed as 2 fa fe f0 observed frequ ncy of the variable fe e E BAS TH J STATISTICS A set of tools for collecting organizing presenting and analyzing numerical facts or observations 1Descriptive Statistics procedures used to organize and present data in a convenient useable and communicable form 21nferential Statistics procedures employed to arrive at broader generalizations or inferences from sample data to populations J STATISTIC A number describing a sample characteristic Results from the manipulation of sample data according to certain specified procedures 1 DATA Characteristics or numbers that are collected by observation J POPULATION A complete set of actual or potential observations 1 PARAMETER A number describing a population characteristic typically inferred from sample statistic 3 SAMPLE A subset of the population selected according to some scheme J RANDOM SAMPLE A subset selected in such a way that each member of the population 
has an equal opportunity to be selected Ex lottery numbers in a fair lottery 3 VARIABLE A phenomenon that may take on different values 0 occurs when the values of a variable are arranged in order according to their magnitudes D GROUPED FREQUENCY DISTRIBUTION A frequency distribution in which the values of the variable have been grouped into classes STATJ IC PRINCIPLES OF STATISTICS J MEAN The point in a distribution of measurements about which the summed deviations are equal to zero Average value of a sample or population POPULATION MEAN SAMPLE MEAN 1 N 1 n l1W21xi 367239 i i1 Note The mean is very sensitive to extreme measure ments that are not balanced on both sides I WEIGHTED MEAN Sum of a set of observations multiplied by their respective weights divided by the sum of the weights E w xi WEIGHTED MEAN L10 2 quot 139 i1 where wi weight xi observation G number of observation groups Calculated from a opulation sample or groupings 1n a frequency distri ution EX In the FrequencyDistribution below the mean is 803 calculated by using frequencies for the wi s When grouped use class midpoints for xi s C MEDIAN Observation or potential observation in a set that divides the set so that the same number of observations lie on each side of it For an odd number of values it is the middle value for an even number it is the average of the middle two Ex In the Frequency Distribution table below the median is 795 I MODE Observation that occurs with the greatest frequency Ex In the Frequency Distribution table below the mode is 88 PING TA D CUMULATIVE FREQUENCY DISTRI BUTION A distribution which shows the to tal frequency through the upper real limit of each class C CUMULATIVE PERCENTAGE DISTRI BUTION A distribution which shows the to tal percentage through the upper real limit of each class STICS 4e parametersvartabsancer valspcopor tions FOR INTRODUCTORY COURSES A I K I I 39 quot I CI SUM OF SQUARES SSJ Deviations from the mean squared and summed 2 2 2 X X Population SSZ xi ux or 
E x N 2 Sample 88 XXi gtlt20r 2 92 2 CI VARIANCE The average of square differ ences between Observations and their mean POPULATION VARIANCE SAMPLE VARIANCE N 02L3xi2 2 Xi 72 M 1 n 1i 1 VARIANCES FDR GROUPED DATA POPULATION SAMPLE G 02T1T2 fimi J 2 5223172 fimiquotx2 i1 CI STANDARD DEVIATION Square root of the variance Ex Pop SD D BAR GRAPH A form of graph that uses bars to indicate the frequency of occurrence of observations 0 Histogram a form of bar graph used with interval or ratioscaled variables Interval Scale a quantitative scale that permits the use of arithmetic operations The zero point in the scale is arbitrary Ratio Scale same as interval scale except that there is a true zero point CI FREQUENCY CURVE A form of graph representing a frequency distribution in the form of a continuous line that traces a histogram 0 Cumulative Frequency Curve a continuous line that traces a histogram where bars in all the lower classes are stacked up in the adjacent higher class It cannot have a negative slope 0 Normal curve bellshaped curve 0 Skewed curve departs from symmetry and tailsoff at one end NORMAL CURVE 5 0 5 SKEWED CURVE NBdOv PROBABILITY The long term relative frequency with which an outcome or event occurs Probability of occurrence M A Number of outcomes favoring EventA of EventA Total number of outcomes Cl SAMPLE SPACE All possible outcomes of an experiment C TYPE OF EVENTS o Exhaustive two or more events are said to be exhaustive if all possible outcomes are considered Symbolically p A or B or l ONonExhaustive two or more events are said to be non exhaustive if they do not exhaust all possible outcomes oMutuaIIy Exclusive Events that cannot occur simultaneouslyzp A and B 0 andp A or B p A p B Ex males females oNonMutually Exclusive Events that can occur simultaneously p A or B p A p B p A and B Ex males brown eyes Independent Events whose probability is unaffected by occurrence or nonoccurrence of each other pA B pA pB A pB and pA and B pA pB Ex gender and eye 
color Dependent Events whose probability changes depending upon the occurrence or non occurrence of each other pA l B differs from pA pB I A differs from pB and pA and B pA pBIA pB pAlB Ex race and eye color CI JOINT PROBABILITIES Probability that 2 or more events occur simultaneously CI MARGINAL PROBABILITIES or Uncondi tional Probabilities summation of probabilities Cl CONDITIONAL PROBABILITIES Probability of A given the existence of S written p A S 1 EXAMPLE Given the numbers 1 to 9 as observations in a sample space oEvents mutually exclusive and exhaustive Example p all odd numbers p all even numbers Events mutually exclusive but not exhaustive Example p an even number p the numbers 7and 5 Events neither mutually exclusive or exhaustive Example p an even number or a 2 A EVENT C EVENT D TOTALS EVENT E 52 36 87 EVENT F 62 71 1 33 TOTALS 1 1 4 quot JOI ot A MARGINAL CONDITIONAL EVENT C EVENT D PROBABILITY PROBABILITY EVENT E 024 016 040 338153 CF047 DF053 EVENTF 028 032 060 MARGINAL PROBABILITY 052 048 100 CONDITIONAL EC046 ED033 PROBABILITY FCO54 FD067 Cl SAMPLING DISTRIBUTION A theoretical probability distribution of a statistic that would result from drawing all possible samples of a given size from some population THE STANDARD moon or THE MEAN A theoretical standard deviation of sample mean of a given sample size drawn from some specified popu lation CIWhen based on a very large known population the standard error 1s 0 I O v1 ClWhen estimated from a sample drawn from very large population the standard error is 039 z IE CIThe dispersion of sample means decreases as sample size is increased RANDOM VARIABLE A mapping or function that assigns one and only one numerical value to each outcome in an experiment E DISCRETE RANDOM VARIABLES In volves rules or probability models for assign ing or generating only distinct values not frac tional measurements E BINOMIAL DISTRIBUTION A model for the sum of a series of n independent trials where trial results in a 0 failure 
or 1 suc cess Ex Com toss ps Islj sl n s where ps is the probability ofs success in II trials with a constant 15 probability per trials j and Where Binomial mean It nTT Binomial variance 72 nzr 1 7T As it increases the Binomial approaches the Normal distribution C HYPERGEOMETRIC DISTRIBUTION A model for the sum of a series of 11 trials where each trial results in a O or 1 and is drawn from a small population with N elements split between N1 successes and N2 failures Then the probabil ity of splitting the n trials between X successes and x2 failures is N13 N23 xllN xI x2N2 x2 quotKT ll N Hypergeometrrc mean u bx1 quotN39 and variance 02 N n nN1N2 pxl and x2 2 N l N W C POISSON DISTRIBUTION A model for the number of occurrences of an event x 012 when the probability of occurrence is small but the number of opportunities for the occurrence is large for x 0123 and 3 gt 0 otherwise Px 0 eAAX x Poisson mean and variance 2t pm For continuous variables frequencies are expressed in terms ofureus under I cum CI CONTINUOUS RANDOM VARIABLES Variable that may take on any value along an uninterrupted interval of a numberline Cl NORMAL DISTRIBUTION bell curve a distribution whose values cluster symmetri cally around the mean also median and mode 1 4x quotll 220392 x 3 WW where f x frequency at a given value 039 standard deviation of the distribution 7 approximately 31416 e approximately 27183 II the mean of the distribution x 2 any score in the distribution CI STANDARD NORMAL DISTRIBUTION A normal random variable Z that has a mean of 0 and standard deviation of 1 CI ZVALUES The number of standard devia tions a specif 429111516 aw INFERENCE FOR PARAMETERS 39NBIASEDNESS Property of a reliable es imator being estimated Unbiased Estimate of 8 Parameter an estimate that equals on the average the value of the parameter Ex the sample mean is an unbiased estimator of the population mean Biased Estimate of 3 Parameter an estimate that does not equal on the average the value of the 
parameter. Ex: the sample variance calculated with $n$ in the denominator is a biased estimator of the population variance; when calculated with $n-1$ it is unbiased.

STANDARD ERROR: The standard deviation of the estimator is called the standard error. Ex: The standard error for $\bar{x}$ is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

This has to be distinguished from the STANDARD DEVIATION OF THE SAMPLE:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The standard error measures the variability in the $\bar{x}$'s around their expected value $E(\bar{x})$, while the standard deviation of the sample reflects the variability in the sample around the sample's mean ($\bar{x}$).

USED WHEN THE STANDARD DEVIATION IS UNKNOWN: Use of Student's t. When $\sigma$ is not known, its value is estimated from sample data.

t-ratio: the ratio employed in testing hypotheses or in determining the significance of a difference between means (two-sample case) involving a sample with a t-distribution. The formula is

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

where $\mu$ = population mean under $H_0$ and $s_{\bar{x}} = s/\sqrt{n}$.

- Distribution: symmetrical distribution with a mean of zero and a standard deviation that approaches one as the degrees of freedom increase (i.e., it approaches the z distribution).
- Assumption and condition required in assuming a t-distribution: samples are drawn from a normally distributed population and $\sigma$ (population standard deviation) is unknown.
- Homogeneity of Variance: if 2 samples are being compared, the assumption in using the t-ratio is that the variances of the populations from which the samples are drawn are equal.
- Estimated $\sigma_{\bar{x}_1 - \bar{x}_2}$ (that is, $s_{\bar{x}_1 - \bar{x}_2}$) is based on the unbiased estimate of the population variance.
- Degrees of Freedom (df): the number of values that are free to vary after placing certain restrictions on the data.

Example: The sample (43, 74, 42, 65) has $n = 4$. The sum is 224 and the mean is 56. Using these 4 numbers and determining deviations from the mean, we have 4 deviations, namely (−13, 18, −14, 9), which sum to zero. Taking deviations from the mean is one restriction we have imposed, and the natural consequence is that the sum of these deviations must equal zero. For this to happen we can choose any numbers, but our freedom to choose is limited to only 3 of them, because the last one is fixed by the requirement that the sum of the deviations equal zero. We use the equality

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x}) = 0$$

So, given a mean of 56, if the first 3 observations are 43, 74 and 42, the last observation has to be 65. This single restriction helps us determine df. The formula is $n$ less the number of restrictions. In this case it is $n - 1 = 4 - 1 = 3$ df.

The t-ratio is a robust test. This means that statistical inferences are likely to be valid despite fairly large departures from normality in the population distribution. If normality of the population distribution is in doubt, it is wise to increase the sample size.

USING THE Z STATISTIC

USED WHEN THE STANDARD DEVIATION IS KNOWN: When $\sigma$ is known, it is possible to describe the form of the distribution of the sample mean as a z statistic. The sample must be drawn from a normal distribution or have a sample size ($n$) of at least 30.

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

where $\mu$ = population mean (either known or hypothesized under $H_0$) and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Critical Region: the portion of the area under the curve that includes those values of a statistic that lead to the rejection of the null hypothesis. The most often used significance levels are 0.01, 0.05 and 0.1. For a one-tailed test using the z statistic, these correspond to z-values of 2.33, 1.65 and 1.28 respectively. For a two-tailed test, the critical region of 0.01 is split into two equal outer areas marked by z-values of $|2.58|$.

Example 1: Given a population with $\mu = 250$ and $\sigma = 50$, what is the probability of drawing a sample of $n = 100$ values whose mean $\bar{x}$ is at least 255? In this case $z = (255 - 250)/(50/\sqrt{100}) = 1.00$. Looking at Table A, the given area for $z = 1.00$ is 0.3413. To its right is 0.1587 (= 0.5 − 0.3413), or 15.87%. Conclusion: there are approximately 16 chances in 100 of obtaining a sample mean of at least 255 from this population when $n = 100$.

Example 2: Assume we do not know the mean of the population from which a sample was drawn. We suspect that the sample may have been selected from a population with $\mu = 250$ and $\sigma = 50$, but we are not sure.
The hypothesis to be tested is whether the sample was selected from this population. Assume we obtained from a sample ($n$) of 100 a sample mean of 263. Is it reasonable to assume that this sample was drawn from the suspected population?

1. $H_0$: $\mu = 250$ (the actual mean of the population from which the sample is drawn is equal to 250). $H_1$: $\mu \ne 250$ (the alternative hypothesis is that it is greater than or less than 250, thus a two-tailed test).
2. The z-statistic will be used because the population $\sigma$ is known.
3. Assume the significance level ($\alpha$) to be 0.01. Looking at Table A, we find that the area beyond a z of 2.58 is approximately 0.005. To reject $H_0$ at the 0.01 level of significance, the absolute value of the obtained z must be equal to or greater than $|z_{0.01}|$, or 2.58. Here the value of z corresponding to the sample mean of 263 is $(263 - 250)/(50/\sqrt{100}) = 2.60$.

CONCLUSION: Since this obtained z falls within the critical region, we may reject $H_0$ at the 0.01 level of significance.

CONFIDENCE INTERVAL: Interval within which we may consider a hypothesis tenable. Common confidence intervals are 90%, 95% and 99%. Confidence Limits: limits defining the confidence interval.

$(1 - \alpha)100\%$ confidence interval for $\mu$:

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the value of the standard normal variable z that puts an area of $\alpha/2$ in each tail of the distribution. The confidence interval is the complement of the critical regions. A t-statistic may be used in place of the z-statistic when $\sigma$ is unknown and $s$ must be used as an estimate. But note the caution in that section.

Figure: Critical region for rejection of $H_0$ when $\alpha = 0.01$, two-tailed test.

Table A: Normal Curve Areas (area from the mean, 0, to z).

SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN MEANS: If a number of pairs of samples were taken from the same population or from two different populations, then:

- The distribution of differences between pairs of sample means tends to be normal (z distribution).
- The mean of these differences between means, $\mu_{\bar{x}_1 - \bar{x}_2}$, is equal to the difference between the population means, that is, $\mu_1 - \mu_2$.

Z-DISTRIBUTION: $\sigma_1$ and $\sigma_2$ are known.

- The standard error of the difference between means:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

- Where $(\mu_1 - \mu_2)$ represents the hypothesized difference in means, the following statistic can be used for hypothesis tests:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

- When $n_1$ and $n_2$ are > 30, substitute $s_1$ and $s_2$ for $\sigma_1$ and $\sigma_2$ respectively.

To obtain the sum of squares (SS), see Measures of Central Tendency on page 1.

POOLED t-TEST

- Distribution is normal.
- $n < 30$.
- $\sigma_1$ and $\sigma_2$ are not known but assumed equal.

The hypothesis test may be two-tailed ($\mu_1 = \mu_2$ vs. $\mu_1 \ne \mu_2$) or one-tailed ($\mu_1 \le \mu_2$ with the alternative $\mu_1 > \mu_2$, or $\mu_1 \ge \mu_2$ with the alternative $\mu_1 < \mu_2$).

- Degrees of freedom: $df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.
- Use the formula given below for estimating $\sigma_{\bar{x}_1 - \bar{x}_2}$ to determine $s_{\bar{x}_1 - \bar{x}_2}$.
- Determine the critical region for rejection by assigning an acceptable level of significance and looking at the t-table with $df = n_1 + n_2 - 2$.
- Use the following formula for the estimated standard error:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\cdot\frac{n_1 + n_2}{n_1 n_2}}$$

HETEROGENEITY OF VARIANCES may be determined by using the F-test:

$$F = \frac{s^2\ (\text{larger variance})}{s^2\ (\text{smaller variance})}$$

NULL HYPOTHESIS: Variances are equal and their ratio is one.

ALTERNATIVE HYPOTHESIS: Variances differ and their ratio is not one.

Look at Table C below to determine if the variances are significantly different from each other.
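The pooled standard error and the variance ratio described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the first sample reuses the numbers from the degrees-of-freedom example, the second sample is made up, and the t-ratio line assumes a hypothesized difference of zero between the population means.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 divisor


def pooled_standard_error(sample1, sample2):
    """Estimated standard error of the difference between two means."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    # Pooled (weighted) estimate of the common population variance.
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))


def variance_ratio(sample1, sample2):
    """F = larger sample variance / smaller sample variance."""
    v1, v2 = variance(sample1), variance(sample2)
    return max(v1, v2) / min(v1, v2)


group_a = [43, 74, 42, 65]  # sample from the degrees-of-freedom example
group_b = [50, 61, 55, 62]  # hypothetical second sample

se = pooled_standard_error(group_a, group_b)
df = len(group_a) + len(group_b) - 2
# Assumption: hypothesized difference (mu1 - mu2) is zero.
t_ratio = (mean(group_a) - mean(group_b)) / se
f_ratio = variance_ratio(group_a, group_b)

print(f"standard error = {se:.4f}, df = {df}")
print(f"t = {t_ratio:.4f}, F = {f_ratio:.4f}")
```

The obtained t would then be compared with the t-table value for the chosen significance level and df, and the F ratio with the tabled critical value of F, to judge whether the equal-variance assumption is tenable.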
Use the degrees of freedom from the 2 samples: $(n_1 - 1,\ n_2 - 1)$.

Table C: Critical Values of F (top row: .05 points; bottom row: .01 points for the distribution of F), indexed by degrees of freedom for the numerator and degrees of freedom for the denominator.

CORRELATED SAMPLES

STANDARD ERROR OF THE DIFFERENCE between Means for Correlated Groups. The general formula is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2 - 2\,r\,s_{\bar{x}_1}s_{\bar{x}_2}}$$

where $r$ is the Pearson correlation.

- By matching samples on a variable correlated with the criterion variable, the magnitude of the standard error of the difference can be reduced.
- The higher the correlation, the greater the reduction in the standard error of the difference.

ANALYSIS OF VARIANCE (ANOVA)

PURPOSE: Indicates the possibility of an overall mean effect of the experimental treatments before investigating a specific hypothesis.

ANOVA: Consists of obtaining independent estimates from population subgroups. It allows for the partition of the sum of squares into known components of variation.

TYPES OF VARIANCES

- Between-Group Variance (BGV): reflects the magnitude of the differences among the group means.
- Within-Group Variance (WGV): reflects the dispersion within each treatment group. It is also referred to as the error term.

CALCULATING VARIANCES: When the BGV is large relative to the WGV, the F-ratio will also be large.

$$BGV = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x}_{tot})^2}{k - 1}$$

where $\bar{x}_i$ = mean of the $i$th treatment group and $\bar{x}_{tot}$ = mean of all $n$ values across all $k$ treatment groups.

$$WGV = \frac{SS_1 + SS_2 + \cdots + SS_k}{n - k}$$

where the SS's are the sums of squares (see Measures of Central Tendency on page 1) of each subgroup's values around the subgroup mean.

USING F-RATIO: $F = BGV/WGV$

- Degrees of freedom are $k - 1$ for the numerator and $n - k$ for the denominator.
- If the BGV is large enough relative to the WGV that F exceeds its critical value, the differences among group means are attributed to the experimental treatments.
- Null hypothesis: the group means are estimates of a common population mean.

PROPORTIONS

In random samples of size $n$, the sample proportion $p$ fluctuates around the population proportion $\pi$, with a variance of $\pi(1 - \pi)/n$ and a standard error of $\sqrt{\pi(1 - \pi)/n}$. As the sample size increases, the sampling distribution of $p$ concentrates more around its target mean. It also gets closer to the normal distribution, in which case

$$z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$$

CORRELATION

Definition: Correlation refers to the relationship between two variables. The Correlation Coefficient is a measure that expresses the extent to which two variables are related.

PEARSON r METHOD (Product-Moment Correlation Coefficient): Correlation coefficient employed with interval- or ratio-scaled variables. Ex: Given observations on two variables X and Y, we can compute their corresponding z values: $z_x = (x - \bar{x})/s_x$ and $z_y = (y - \bar{y})/s_y$.

The formulas for the Pearson correlation ($r$):

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{SS_x\,SS_y}}$$

Use the above formula for large samples. Use this formula (also known as the Mean-Deviation Method of computing the Pearson r) for small samples:

$$r = \frac{\sum z_x z_y}{N}$$

where $N$ is the number of paired observations (with $s_x$ and $s_y$ computed using the divisor $N$).

RAW SCORE METHOD is quicker and can be used in place of the first formula above when the sample values are available:

$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{N}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{N}\right]\left[\sum y^2 - \frac{(\sum y)^2}{N}\right]}}$$

CHI-SQUARE ($\chi^2$) TEST

- Most widely used non-parametric test.
- The $\chi^2$ mean = its degrees of freedom.
- The $\chi^2$ variance = twice its degrees of freedom.
- Can be used to test one or two independent samples.
- The square of a standard normal variable is a chi-square variable.
- Like the t-distribution, it has different distributions depending on the degrees of freedom.

DEGREES OF FREEDOM (df) COMPUTATION

- If chi-square tests for the goodness-of-fit to a hypothesized distribution: $df = g - 1 - m$, where $g$ = number of groups or classes in the frequency distribution and $m$ = number of population parameters that must be estimated from sample statistics to test the hypothesis.
- If chi-square tests for homogeneity or contingency: $df = (\text{rows} - 1)(\text{columns} - 1)$.

GOODNESS-OF-FIT TEST: To apply the chi-square distribution in this manner, the chi-square test statistic is computed as

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ = observed frequency of the variable and $f_e$ = expected frequency.
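As a rough illustration of the goodness-of-fit computation, the sketch below sums $(f_o - f_e)^2 / f_e$ over the classes and applies the $df = g - 1 - m$ rule. The function names and the die-roll frequencies are hypothetical and are not taken from the notes.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def goodness_of_fit_df(num_classes, num_estimated_params=0):
    """df = g - 1 - m for a goodness-of-fit test."""
    return num_classes - 1 - num_estimated_params


# Hypothetical data: a die rolled 60 times, tested against a fair die.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6  # 60 rolls / 6 faces; no parameters estimated, so m = 0

chi2 = chi_square_statistic(observed, expected)
df = goodness_of_fit_df(len(observed))
print(f"chi-square = {chi2:.3f} with {df} df")
```

The computed statistic is then compared with the tabled critical chi-square value for the chosen significance level and df; the null hypothesis of a good fit is rejected only if the statistic is at least as large as that critical value.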
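Examples 1 and 2 from the z-statistic section can also be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of Table A; the helper name `z_for_sample_mean` is hypothetical, and the confidence interval at the end is an extra step that applies the confidence-limit formula to Example 2's numbers.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


# Example 1: P(sample mean >= 255) when mu = 250, sigma = 50, n = 100.
z1 = z_for_sample_mean(255, mu=250, sigma=50, n=100)
p_at_least_255 = 1 - std_normal.cdf(z1)

# Example 2: two-tailed test of H0: mu = 250 at alpha = 0.01, x_bar = 263.
alpha = 0.01
z2 = z_for_sample_mean(263, mu=250, sigma=50, n=100)
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # the notes round this to 2.58
reject_h0 = abs(z2) >= z_crit

# (1 - alpha) * 100% confidence interval for mu around x_bar = 263.
half_width = z_crit * 50 / math.sqrt(100)
ci = (263 - half_width, 263 + half_width)

print(f"Example 1: z = {z1:.2f}, P = {p_at_least_255:.4f}")
print(f"Example 2: z = {z2:.2f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
print(f"99% CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is the complement of the critical regions, the hypothesized mean of 250 falling outside it agrees with the decision to reject $H_0$.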
