# Study Guide Test 1 STA2122

FSU

This 5-page Study Guide was uploaded by Stefanie Villiotis on Sunday, October 4, 2015. The Study Guide belongs to STA2122 at Florida State University, taught by David Lester in Fall 2015.

Date Created: 10/04/15

## Types of Variables

There are two types of variables:

1. Categorical: non-numerical.
2. Quantitative: numerical observations that can be used in calculations.

Sample size ($n$): the number of observations in the sample, which is a portion of the total population.

Distribution: the distribution of a variable tells us what values the variable can possibly take and how often it takes these values. Distributions can be set out in tables, charts, or graphs.

## Percentiles and Quartiles

Definition 4.2 (Percentiles): Percentiles are 99 numbers that partition a data set into 100 approximately equal parts. Percentiles are labeled $P_1$ through $P_{99}$.

- $P_1$ has 1% of the data less than it, $P_2$ has 2% of the data less than it, ..., $P_{99}$ has 99% of the data less than it.
- The $p$th percentile divides the data set into two parts: the lower part contains at least $p$% of the data and the upper part contains at least $(100 - p)$% of the data.
- Approximately 1% of the data is between any two consecutive percentiles.
- Position of the $p$th percentile in the ordered data: $L = \frac{p}{100} \cdot n$, where $p$ = desired percentage and $n$ = number of values in the data set.

Quartiles are three numbers that partition a data set into four approximately equal parts. Quartiles are labeled $Q_1$, $Q_2$, and $Q_3$:

- $Q_1 = P_{25}$
- $Q_2 = P_{50}$ (the median)
- $Q_3 = P_{75}$

Approximately 25% of the data are between any two consecutive quartiles. In order: min, approx. 25% of observations, $Q_1$, approx. 25% of observations, $Q_2$, approx. 25% of observations, $Q_3$, approx. 25% of observations, max.

## Constructing a Boxplot

Order the data from smallest to largest (smallest at the left end), then find the quartiles:

| | Sample size EVEN | Sample size ODD |
|---|---|---|
| $Q_2$ | Find the overall median (mean of the two middle values) | Find the overall median (the middle value) |
| $Q_1$ | Find the median of the left half of the data set | Find the median of the left half of the data set, excluding $Q_2$ |
| $Q_3$ | Find the median of the right half of the data set | Find the median of the right half of the data set, excluding $Q_2$ |

Range ($R$): distance between the highest and lowest data values, $R = \max - \min$.

Interquartile Range: $IQR = Q_3 - Q_1$.

Rule of thumb for identifying outliers:

- Lower fence $= Q_1 - 1.5 \times IQR$
- Upper fence $= Q_3 + 1.5 \times IQR$

Distribution shapes (figure): left-skewed, right-skewed, uniform, bell-shaped.

## Variance ($s^2$) and Standard Deviation ($s$)

Sample variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Sample standard deviation:

$$s = \sqrt{\text{sample variance}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

## Variables in Statistics

| Name | Population Parameter (generally Greek letters) | Sample Statistic (generally Latin letters) |
|---|---|---|
| Size (number of subjects) | $N$ | $n$ |
| Mean (average) | $\mu$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $s^2$ |
| Standard Deviation | $\sigma$ | $s$ |
| Proportion (percent) | $\pi$ | $p$ |
| Correlation | $\rho$ | $r$ |

"$N$" often denotes total sample size when more than one sample is involved.

## Standardized Score (z-score)

$$z = \frac{x - \bar{x}}{s} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}}$$

- Negative $z$ implies data value < mean, and positive $z$ implies data value > mean.
- The z-scores have the same shape as the original distribution.
- $z$ measures distance from the mean: a higher absolute value indicates the data value is further from the mean.
- The mean always standardizes to zero.
- The mean of the z-scores is always 0.
- The standard deviation of the z-scores is always 1.

## The Continuous Uniform Distributions $U(\alpha, \beta)$

- Data values range from lower limit $\alpha$ to upper limit $\beta$.
- The distribution is symmetric, so mean = median = $\frac{\alpha + \beta}{2}$.
- Height: $f(x) = \frac{1}{\beta - \alpha}$. It plays much the same role as $\frac{1}{n}$ for $n$ equal outcomes.
- Standard deviation: $\sigma = \frac{\beta - \alpha}{\sqrt{12}}$.
- Width $= \beta - \alpha$.
- The expected value of $x$ is the mean.
- Total area = 1.
- Probability$(x_1 < x < x_2) = \frac{x_2 - x_1}{\beta - \alpha}$.

## The Normal Distributions $N(\mu, \sigma)$

- Mean = median = $\mu$.
- $\mu$ = mean, $\sigma$ = standard deviation.

Empirical Rule:

- 68% of all observations fall within 1 standard deviation of the mean.
- 95% of all observations fall within 2 standard deviations of the mean.
- 99.7% of all observations fall within 3 standard deviations of the mean.

Bell-curve figure: the horizontal axis is marked at $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$, with about 34% of observations on each side of the mean within one standard deviation.

## Causation, Association, and Correlation

- Dependent variable: the response.
- Independent variable: the predictor (explanatory variable); it influences the response.
- Lurking variable: not included in the study, but has an effect on a variable.

If the data are obtained only by random sampling, and a scatter plot or a statistical calculation/analysis indicates that there is an association between the variables, it is not proof of a cause-and-effect relationship between them.

## Regression

LSR: Least-Squares Regression Line (line of best fit).

Notation: $\hat{y} = b_0 + b_1 x$. Compared with $y = mx + b$, the slope is $m = b_1$ and the intercept is $b = b_0$.
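The boxplot steps and the outlier rule of thumb from this guide can be tried on a small data set. The sketch below is one way to do it in Python, assuming the median-of-halves method described in the boxplot section. The function names `quartiles` and `fences` and the sample values are made up for illustration and are not from the course.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)                 # order the data, smallest first
    n = len(xs)
    half = n // 2
    q2 = median(xs)                   # overall median
    lower = xs[:half]                 # left half (Q2 is excluded when n is odd)
    upper = xs[half + 1:] if n % 2 else xs[half:]   # right half, same rule
    return median(lower), q2, median(upper)

def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical sample, odd sample size (n = 9)
sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]
q1, q2, q3 = quartiles(sample)
lower_fence, upper_fence = fences(q1, q3)
outliers = [x for x in sample if x < lower_fence or x > upper_fence]

print("Q1, Q2, Q3:", q1, q2, q3)
print("Range:", max(sample) - min(sample), "IQR:", q3 - q1)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

For this sample, only the value 30 lies outside the fences, so it is the only one flagged as a possible outlier. Statistical software often uses slightly different quartile conventions, so its results may not match the hand method exactly.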
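The z-score properties listed in this guide (mean 0, standard deviation 1) can be checked numerically. The following is a minimal sketch, assuming the sample formulas with $n - 1$ in the denominator. The helper names `sample_variance` and `z_scores` and the data values are illustrative only.

```python
from math import sqrt, isclose

def sample_variance(xs):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def z_scores(xs):
    """Standardize each value: (data value - mean) / standard deviation."""
    mean = sum(xs) / len(xs)
    s = sqrt(sample_variance(xs))
    return [(x - mean) / s for x in xs]

# Hypothetical data with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(data)

print("sample variance:", sample_variance(data))
print("sample standard deviation:", sqrt(sample_variance(data)))
print("mean of z-scores:", sum(zs) / len(zs))
print("standard deviation of z-scores:", sqrt(sample_variance(zs)))
```

Values below the mean (such as 2) get negative z-scores and values above it (such as 9) get positive ones, matching the sign rule in the guide.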
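The continuous uniform formulas in this guide reduce to simple arithmetic on the width $\beta - \alpha$. Here is a small sketch, assuming a distribution $U(2, 10)$ chosen only as an example. The function names are hypothetical, and the clipping of the interval to $[\alpha, \beta]$ is an added convenience rather than something stated in the notes.

```python
from math import sqrt

def uniform_summary(alpha, beta):
    """Height, mean (= median), and standard deviation of U(alpha, beta)."""
    width = beta - alpha
    return {
        "height": 1 / width,          # f(x) = 1 / (beta - alpha)
        "mean": (alpha + beta) / 2,   # symmetric, so mean = median
        "sd": width / sqrt(12),
    }

def uniform_prob(x1, x2, alpha, beta):
    """P(x1 < X < x2): the rectangle's area, (x2 - x1) / (beta - alpha)."""
    lo = max(x1, alpha)               # clip the interval to [alpha, beta]
    hi = min(x2, beta)
    return max(hi - lo, 0) / (beta - alpha)

summary = uniform_summary(2, 10)
print(summary)
print("P(3 < X < 5):", uniform_prob(3, 5, 2, 10))
print("Total area:", uniform_prob(2, 10, 2, 10))
```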
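The Empirical Rule percentages are rounded values of normal-curve areas. One way to see this, sketched here with Python's standard-library `statistics.NormalDist`, is to compute the area within $k$ standard deviations of the mean. The helper name `within_k_sd` and the example mean and standard deviation are assumptions for illustration.

```python
from statistics import NormalDist

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Roughly 68%, 95%, and 99.7%, whatever mu and sigma are
    print(k, round(within_k_sd(k, mu=100, sigma=15), 4))
```

The result does not depend on the particular mean and standard deviation, which is why the rule applies to every normal distribution.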

