Exam 1 Study Guide - Psychology 211
This 32-page Study Guide was uploaded by Kennedy Patterson on Sunday, March 8, 2015. The Study Guide belongs to Psychology 211 at University of Alabama - Tuscaloosa, taught by Andre Souza in Spring 2015. Since its upload, it has received 257 views.
Date Created: 03/08/15
Exam 1 Study Guide

1. What is Statistics?
   a. The science of learning from data: measuring, controlling, and communicating uncertainty
   b. Uncertainty and variation
2. Provides methods for:
   a. Planning how to collect data for research studies
   b. Summarizing the data
   c. Making predictions based on the data
3. The science of quantifying and understanding variation
4. Variation brings uncertainty
5. Statistics does NOT prove anything; it makes you THINK
   a. Helps make decisions when faced with uncertainty
6. Heterogeneity is universal
7. The amount of variation that one expects by chance
8. The amount of variation is BIGGER than the variation you would expect as normal → statistically significant
   a. No more variation than what one would expect → not statistically significant
9. Do not say "not important" or "insignificant"
10. Fundamentals
   a. Why do some things vary more than expected?
      i. Something (an explanatory variable) is influencing or changing the variation of whatever we are observing (the response variable)
      ii. Response variable: the variation you are trying to understand
      iii. Explanatory variable: the variable that you think is influencing the variation of the response variable
11. Variable: something that takes on different values
   a. Not a value itself, but the label that this value has
12. Descriptive Statistics: summary of the information in a collection of data (what the data has to say about the phenomenon)
13. Inferential Statistics: provides predictions about a population on the basis of a sample (use statistics to infer values of parameters)
14. Population: total set of units of interest
15. Sample: subset of the population of interest
16. Parameter: number that summarizes a population
17. Statistic: number that summarizes a sample
18. Random Sampling: each member of the population has an equal chance of being selected to be part of a sample
19. Relationship vs. Difference
   a. Relationship: taking two groups and contrasting
   b. Difference: comparing two different groups

Diagram (Universe, Population, Sample), example "mean age": measuring Angelina Jolie is impossible (parameter); a random lady
that looks like Angelina (Mary) gives a statistic that represents the parameter.
Population → Sample; Parameter → Statistic

1. Decision Tree: graphical representation of the decisions involved in the choice of statistical procedures
2. Measurement Data (quantitative data): data obtained by measuring objects or events
3. Categorical Data (frequency data, count data): data representing counts or number of observations in each category
4. Sample: set of items of interest drawn from the population
5. Random Sample: sample selected so that every member of the population has an equal chance of being included in the sample
6. Statistics: numerical values calculated from data in a sample, intended to summarize the data
7. Descriptive Statistics: methods of organizing, summarizing, and presenting data
8. Measurement Data: data values that represent measurements of objects or events
9. Sample of Convenience: sample selected because it is easy to obtain, such as a sample of volunteers
10. Variability: how much a value differs across different elements in the population or sample of interest
11. Parameter: number calculated from data in a population that quantifies a characteristic of the population
12. Inferential Statistics: methods of using sample data to make inferences about a population
13. Quantitative Data: another term for measurement data
14. Categorical Data: data values that indicate membership in a particular category. When summarized, it consists of counts or frequencies

1. How do you guarantee the sample is a representation of the population?
   a. Random sampling
   b. Why use random sampling? If the sample is not an accurate representation, the results won't be accurate
2. Parameter: number that represents a population
3. Statistic: number that summarizes a sample
4. Descriptive Statistics (Universe/Population)
   a. Sample mean and sample S (e.g., for mean age)
      i. Variance: sample S²
5. Parameter Estimation: in the absence of all cases from a population, we need to make inferences
about the population parameter based on a sample
6. Roles of Variables
   a. Response (dependent) variable: the variation you are interested in (y-axis)
   b. Explanatory (independent) variable: the variable influencing the response (e.g., drinking)
7. Types of Variables
   a. Discrete: can only take on specific values or categories (ex: siblings)
   b. Continuous: can take any real-number value (ex: reaction time)
8. Categorical vs. Quantitative
   a. Categorical: variables characterized as a set of categories
      i. Nominal: two or more categories or levels
         1. Name things
         2. No order
         3. Ex: transportation
      ii. Dichotomous: two categories or levels
         1. No order
         2. Ex: yes or no
      iii. Ordinal: two or more categories
         1. Order or rank things
         2. Has order
         3. Ex: social class
      iv. Cannot calculate
      v. All categorical types (nominal, dichotomous, ordinal) are discrete and can only assume a specific number of values
   b. Quantitative: variables characterized by a numerical value
      i. Interval: numerical values in which the intervals between the values are assumed to be the same
         1. Equal intervals represent equal differences
         2. Ex: professor's annual salary
      ii. Ratio: numerical values with a meaningful zero point (ex: height). Zero represents the absence of the variable
         1. Allows us to use phrases such as "half as much"
      iii. Can calculate
      Example: a rating scale running from ugly (-5) to beauty (5). Interval: zero is not the absence of beauty. Ratio: zero is an absence (not caring to rate)

1. Ordinal Scale: the values of data measured on this scale can be a number or name, but can be rank ordered
2. Interval Scale: the values of data measured on this scale can be rank ordered. The differences between two adjacent ranks are equal
3. Nominal Scale: the values of data measured on this scale are labels or names
4. Ratio Scale: values measured on this scale can be compared such that you can say one value is twice as big as another value
5. Random Assignment: the allocation or assignment of participants to groups by a random process
6. Sigma (Σ): Greek letter; symbol indicating summation
7. Summation notation
   a. $\sum x$: sum of the x's; add up (sum) what follows
   b. N: all the values
   c.
i = 1: starting with the first value, as in $\sum_{i=1}^{N} x_i$
   d. $(\sum x)^2$: sum all the x's, then square
   e. $\sum x^2$: square all the x's, then sum
8. Discrete or Continuous? Examples
   a. Height in inches: continuous
   b. Number of pets in household: discrete
   c. Pounds of chocolate consumed in past year: continuous
   d. Number of countries ever visited: discrete
   e. Shoe size: discrete

1. Data Summary
   a. End up with a lot of numbers
      i. Put numbers in a data table → graph → summarize
      ii. Identify the center of the data, any unusual features, spread & shape
   b. Categorical: variable that classifies into categories
      i. List the categories and show the frequency (number of observations) in each category
2. Frequency Distribution: listing of possible values for a variable, together with the number of observations at each value
   a. Example
      i. Survey about students' cell phone model and SMS usage
         1. Variable: model (6 models)
         2. Categorical: list of categories

         Model         Frequency   Relative Frequency
         iPhone 4      2           0.14
         iPhone 4s     1           0.07
         iPhone 5      3           0.21
         Galaxy SII    3           0.21
         Galaxy SIII   4           0.29
         Nokia         1           0.07
         Total         14          1

3. Relative Frequency: proportion (percentage) of the observations that fall in that category
4. Quantitative Variables: frequency distributions are also useful for quantitative variables

         Intervals   Frequency   Relative Frequency
         1-50        7           0.50
         51-100      2           0.14
         101-150     1           0.07
         151-200     2           0.14
         201-250     1           0.07
         251-300     0           0
         301-350     1           0.07

   A proportion is a type of ratio. Ratio = number / number; proportion = frequency / total
5. Outliers: extreme observations that fall far from the rest of the data
   a. Such observations are troublesome to many statistical procedures
   b. Cause exaggerated estimates and instability
   c. Important to identify extreme observations and examine the source of the data more closely
   d. Reasons underlying an extreme observation
      i. There was a typo
      ii. Not meant for the study
      iii. Indicates a deeper trend or phenomenon
6. Graphics (good & bad)
   a. Explore patterns
   b. Visualization
7. Scatterplots
   a. Example: relationship between number of emails and grade on exams
      i. Number of emails:
quantitative, explanatory
      ii. Grade on exams: quantitative, response
8. Categorical
   a. Example: relationship between beauty and high grades
      i. Grade: numeric
9. Bar Graph / Histogram
   a. Bar graph: categorical data
   b. Histogram: graph in which a rectangle is used to represent frequencies of observations within each interval
      i. Relative frequency distribution (density) for a quantitative variable
10. Real Lower Limit: the point halfway between the bottom of one interval and the top of the one below it
11. Real Upper Limit: the point halfway between the top of one interval and the bottom of the one above it
12. Midpoint: center of the interval; average of the upper and lower limits
13. Bar Graph: frequency of occurrence of each value of x
14. Vertical Axis: ordinate (y-axis)
15. Horizontal Axis: abscissa (x-axis)
16. Skewness: measure of the degree to which a distribution is asymmetrical
   a. Positive: distribution that trails off to the right
   b. Negative: distribution that trails off to the left
17. Stem-and-Leaf Display: graphical display presenting original data arranged into a histogram
18. Exploratory Data Analysis (EDA): a set of techniques developed by Tukey for presenting data in visually meaningful ways
19. Leading Digits: most significant digits (leftmost digits of a number)
20. Stem: vertical axis of the display, containing the leading digits
21. Trailing Digits: less significant digits (digits to the right of the leading digits)
22. Leaves: horizontal axis of the display, containing the trailing digits
23. Guidelines for Plotting Data
   a. Supply a main title
   b. Always label the axes
   c. Try to start both the X and Y axes at 0 (zero)
   d. Pie charts: don't use them (very hard to read accurately)
   e. Try to never plot in more than two dimensions
   f. Don't add nonessential material
24. Distributions
   a. Symmetric: having the same shape on both sides of the center
   b. Bimodal: a distribution having two distinct peaks
   c. Unimodal: a distribution having one distinct peak
   d. Modality: the number of meaningful peaks in a frequency distribution of the data
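The stem-and-leaf terms defined in this list (stem = leading digits, leaves = trailing digits) can be sketched in Python. This is only a minimal illustration, assuming non-negative whole-number data and a one-digit leaf; the function name `stem_and_leaf` and the sample scores are hypothetical and are not taken from the course data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text row per stem: leading digits | trailing digits."""
    groups = defaultdict(list)
    for value in sorted(values):
        # divmod splits a number into its tens-and-above part (stem)
        # and its units digit (leaf), e.g. 35 -> stem 3, leaf 5.
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)

    rows = []
    # Walk every stem from smallest to largest so that empty stems
    # still get a row, which keeps the shape of the distribution visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows


# Hypothetical scores, chosen only to illustrate the layout.
scores = [1, 4, 7, 10, 11, 15, 16, 16, 22, 24, 30, 35]
for row in stem_and_leaf(scores):
    print(row)
# 0 | 147
# 1 | 01566
# 2 | 24
# 3 | 05
```

Turned on its side, the rows look like a histogram, which is why the display keeps the original values while still showing the shape of the distribution.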
Negatively Skewed vs. Positively Skewed

Stem-and-Leaf Display (raw data arranged by stem and leaf)

   Stem | Leaf
   0    | 01122344455566677778899
   1    | 0111222333334445555556666666666777888899
   2    | 00112233444455667889
   3    | 005

1. Dataframe: an object with rows and columns
   a. Rows: different observations
   b. Columns: contain the values of different variables
      i. Values can be numbers (quantitative) or text (qualitative)
2. Example: What affects students' grades?
      i. Age
      ii. Gender
      iii. Relationship status
      iv. Hours spent on Facebook
      v. Number of friends on Facebook
      vi. Drinking habits
   a. 6 columns
   b. X amount of rows
   c. Cannot have more than 1 column to represent one variable (e.g., separate Male and Female columns filled with Yes/No for each ID is wrong)
3. Central Tendency
   a. Everything varies, yet measurements often cluster around certain values
   b. Shows what the typical observation is
4. Sample statistics also cluster around central values
5. Calculate Central Tendency (sample statistics are not all the same: everything varies, and varies around a central number)
   a. Variable: x, y, etc. (always lower case)
      i. Example: x = age, y = relationship status
   b. Subscript: individual values
      i. $x_1, x_2, x_3$
   c. Individual values of the variable are represented by the subscript
      i. In x = 23, 32, 45, 65, 77: $x_1 = 23$, $x_2 = 32$, $x_5 = 77$
   d. To refer to a single score without saying which one it is, we use $x_i$
   e. Sometimes subscripts may be omitted
   f. Most important symbol is sigma (Σ)
      i. Σ: sum of everything that follows
      ii. $\sum x$: sum of all the x's
         1. Example: if there are 30 values for x ($x_1 = 12, x_2 = 45, x_3 = 44, \ldots, x_n = 21$), we say n = 30
         2. $\sum_{i=1}^{n} x_i$: sum of all x's from i = 1 to i = n; also written $\sum X$
   g. $\sum X^2$ is different from $(\sum X)^2$
      i. Example:

         x    y    xy
         1    20   20
         1    19   19
         1    19   19
         2    19   38

         $\sum X = 1 + 1 + 1 + 2 = 5$
         $(\sum X)^2 = 5^2 = 25$
         $\sum X^2 = 1^2 + 1^2 + 1^2 + 2^2 = 7$
6. Sample is represented by a Roman letter ($\bar{x}$)
7. Population is represented by a Greek letter ($\mu$)
8. Arithmetic
Mean
   a. x-bar ($\bar{x}$)
   b. The mean is the sum of all the data points ($\sum X$) divided by the number of data points (n)
   c. Mathematically:
      i. $\bar{x} = \frac{\sum X}{n}$
   d. Example: for 1, 1, 1, 2: $\bar{x} = 5 / 4 = 1.25$
   e. Answers the question: if all the data points had the same value, what would the value be? If I want all the data points to be the same and still have the same sum, what would this number be?
   f. Example: What if I asked these people to give me all the money they have?

         Carlos      250
         Kevin       153
         Stephanie    76
         Mary        198
         Total       677

      i. $\bar{x} = \frac{\sum X}{n} = \frac{677}{4} = 169.25$
      ii. $169.25 \times 4 = 677$: can replace all values with 169.25 and still get the same sum
   g. Only appropriate for quantitative variables
   h. Mean is sensitive to outliers
   i. Extremely large or small values will have an effect on the mean
   j. Example: X = 3, 2, 1; n = 3; $\sum X = 6$; $\bar{x} = 6 / 3 = 2$ → the number that can replace every data point and keep the same total (2 + 2 + 2 = 6)
9. Binary (0 and 1) Data
   a. Mean equals the proportion of observations that equal 1
   b. Example: asked 9 women how many men they have dated in the past 12 months (column "Boyfriend", coded 0 or 1)
      i. $\frac{\sum X}{n} = \frac{4}{9}$: 1 is recorded 4 times; 9 is the total number
      ii. Proportion = frequency / total = 4 / 9
   c. Arithmetic mean is the only single number for which the residuals (defined as the difference between each data point and the mean) sum to zero
      i. $\sum (x_i - \bar{x}) = 0$
      ii. Example: X = 3, 2, 1 with $\bar{x} = 2$: residuals are 1, 0, -1; sum of residuals = 0
10. Median: middle score in an ordered set of data
   a. Measure of central tendency
   b. The score corresponding to the point having 50% of the observations below it when the observations are arranged in numerical order
11. Median location: the location of the median in an ordered series
   a. $(N + 1) / 2$
   b. Example: 32 34 35 35 36 38 38 42 42 42 44 44 100
      i. Middle value = 38
      ii. (13 + 1) / 2 = 7th number → median = 38
   c. If the distribution contains an even number of observations, there is no middle value
      i. (12 + 1) / 2 = 6.5 → between the 6th and 7th numbers
         1. Average of the two numbers = median → avg of 38 & 38 = 38
12. Properties of the Median
   a. The median, like the arithmetic mean, is appropriate for quantitative variables
   b. Since it requires
ordered data, the median is also appropriate for ordinal variables
   c. The median is not affected by outliers, which makes it an appropriate measure for skewed distributions
   d. The median is not very informative for discrete data that take only a few values
   e. Example: The US Census asked, "How many relationships have you had in the last 12 months?"
      i. Only 6 distinct responses occurred (0, 1, 2, 3, 4, 5), and 63.8% of them were 1
      ii. The median is 1 because more than 50% of the total was 1
13. Mean ($\bar{x}$): the sum of the scores divided by the number of scores (average)
14. Trimmed Means: mean after discarding a fixed percentage of extreme observations
   a. To calculate, take one or more of the largest and smallest values in the sample, set them aside, and take the mean of what remains
   b. For a 10% trimmed mean, we would set aside the largest 10% of the observations and the lowest 10% of the observations; the mean of what remained would be the 10% trimmed mean
15. Geometric Mean
   a. For processes that change multiplicatively rather than additively, the arithmetic mean is not a good measure
   b. Also indicates the central tendency, but uses the product ($\prod x$) instead of the sum ($\sum x$)
   c. $x \times x \times x = 30 \rightarrow x = \sqrt[3]{30}$
      i. Multiply the same number to get the same product: $\sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$
   d. Example: record the cumulative number of tweets written in the last 6 months

         Month       Number of Tweets   Increase Rate (%)
         July        132
         August      158                19.70
         September   169                6.96
         October     188                11.24
         November    221                17.55
         December    240                8.60

      i. How to find the increase rate: subtract the first value from the second value and divide by the first value
         1. (158 - 132) / 132 = 0.1970 = 19.70%
   e. Percentages are not independent of each other
   f. Increase rate depends on the previous month
      $\sqrt[5]{19.70 \times 6.96 \times 11.24 \times 17.55 \times 8.60} \approx 11.84$ = geometric mean
   g. Arithmetic means are good for independent events (scores on a test)
   h. Geometric means are good for numbers that are not independent of each other (percentages)
   i. Arithmetic mean is the number that could replace all scores and still keep the same sum
      i. $n \times \bar{x} = \sum X$
   j. The same logic applies to the
geometric mean
   k. It represents the number that can replace all scores and keep the same product
      i. $(\text{geometric mean})^n = \prod x$
16. Mode: the most common value (highest region of the distribution)
   a. Commonly used with highly discrete variables such as categorical variables, though it is appropriate for all types of data
   b. p(X = mode) > p(X = any other score)
   c. Example: number of girls dated in the past 12 months
      i. Mode = "2" & Mexico
         1. 2 is in the chart 3 times
         2. Mexico has the most girls
      Chart: Number of Girls by Nationality (Brazil, Russia, India, Mexico, Peru, Colombia, United States, Canada)

1. Measures of Variability
   a. Statistics is about variation
   b. What is variation?
   c. The greater the variability in your data, the greater your uncertainty
      i. Uncertainty about what? Uncertainty about the parameter that you want to estimate
   d. Example
      i. Wrote down the amount of money spent on drinks for 11 consecutive weekends

         Weekend   1    2   3   4    5   6    7   8    9   10   11
         Amount    13   7   5   12   9   15   6   11   9   7    12

         $\bar{x} = \frac{\sum X}{n} = \frac{13 + 7 + 5 + 12 + 9 + 15 + 6 + 11 + 9 + 7 + 12}{11} = 9.64$
2. What is variability?
   a. Variability provides a quantitative measure of the differences between scores in a distribution
   b. It describes the degree to which the scores are spread out or clustered together
   c. Variability measures how well an individual score represents an entire distribution
   d. What is the single number that represents the amount of dollars I spend on drinks?
      i. 9.64
   e. Depending on variability, the mean is a good representation of the entire distribution
3. Range
   a. Range is the distance covered by the scores in a distribution, from the smallest score to the largest score
   b. The range is the distance between the minimum and the maximum values
   c. One important characteristic of the range is that only two data points contribute to it
4. Using all data points
   a. What if I want all data points to contribute to the variability measure?
      i. We find the mean and how far from the mean each data point is
   b. The distance between each data point and the mean: $x_i - \bar{x}$
5. Residuals
   a. The
longer the residual lines, the more variability in the data
   b. Why is high variability a bad thing?
      i. Variability brings uncertainty
   c. The higher the variability, the higher the uncertainty
   d. What is the best way to come up with one number to represent all the residuals?
      i. We can average all residuals
   e. What happens if you sum all the residuals, $\sum (x_i - \bar{x})$?
   f. The sum of all residuals will be zero (or close to zero due to rounding error). This happens because the positive residuals cancel out the negative ones

         Weekend   Amount   Residual
         1         13        3.36
         2         7        -2.64
         3         5        -4.64
         4         12        2.36
         5         9        -0.64
         6         15        5.36
         7         6        -3.64
         8         11        1.36
         9         9        -0.64
         10        7        -2.64
         11        12        2.36

1. Sum of residuals: the answer is ALWAYS zero. How can you find the mean of the residuals if the sum of the residuals will always be zero? The solution is to get rid of the negative signs
   - You can get rid of the negative signs by using absolute values: $\sum |x_i - \bar{x}|$
   - Alternatively, you can square all the residuals: $\sum (x_i - \bar{x})^2$
2. Residuals: distance between each data point and the mean, $x_i - \bar{x}$
3. Residuals for a population: $x_i - \mu$
4. Square the residuals to get rid of the negative sign: $\sum (x_i - \bar{x})^2$
5. Sum of squared residuals: the most important measure of variability in statistics
   a. SS = sum of squares = sum of squared deviations
6. Example: How many glasses per week?

         Number of Glasses   Residual   Squared Residual
         1                   -1         1
         0                   -2         4
         6                    4         16
         1                   -1         1

   $\sum X = 8$, $\bar{x} = 8 / 4 = 2$, $SS = \sum (x_i - \bar{x})^2 = 1 + 4 + 16 + 1 = 22$
   Computational formula: $SS = \sum X^2 - \frac{(\sum X)^2}{n}$. With $\sum X^2 = 1 + 0 + 36 + 1 = 38$: $SS = 38 - \frac{8^2}{4} = 22$
7. Importance of SS
   a. Represents the total variability
   b. What units?
      i. Glasses of wine SQUARED
      ii. SS is always in the square of whatever the original unit is
   c. What happens to SS if another data point is added?
      i. If you added a new data point of 2: $\sum X = 10$, $\sum X^2 = 42$, $SS = 42 - \frac{10^2}{5} = 22$. The answer is the same because the data point added was the mean
      ii. If you added a new data point of 5: $SS = 63 - \frac{13^2}{5} \approx 29$ (increase)
      iii. If you added a new data point of 0: $SS = 38 - \frac{8^2}{5} = 25.2$ (increase)
   d. The more data points added to the set (n), the bigger the SS
   e. More numbers → bigger total variation → more uncertainty
   f. To avoid this, we have to make the SS not
dependent on sample size
   g. Divide the SS by the sample size: $\frac{\sum (x_i - \bar{x})^2}{n}$ = mean squared deviation. When you add data, the sum of squares increases, but n also increases

   Average of squared deviations:
      $\sum (x_i - \bar{x})^2$ = total variation
      $\frac{\sum (x_i - \bar{x})^2}{n}$ = average variation
8. Degrees of Freedom: the number of independent pieces of information remaining after estimating one or more parameters
   a. Example: Sample A with 5 numbers (n = 5) and mean $\bar{x} = 4$
      i. What is the sum of all 5 numbers? $\bar{x} = \frac{\sum X}{n} \rightarrow 4 = \frac{\sum X}{5} \rightarrow \sum X = 20$
      ii. $\sum X = n \bar{x} = 5 \times 4 = 20$
      iii. What are the 5 numbers?
         1. n - 1 of them can be chosen freely
         2. Degrees of freedom (df)
         3. Cannot choose the last number freely
   Degrees of freedom = sample size - number of parameters estimated from the data; df = n - 1

1. Variance
   a. SS = sum of squares
   b. SS → measure of total variability
   c. SS increases with sample size
   d. SS for a population uses the population mean $\mu$. For a sample, you have to estimate $\mu$ because you don't know it, so you lose a degree of freedom: dividing by n - 1 gives the variance, the most common measure of average variability. For a population nothing is estimated, so no degree of freedom is lost
      Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
      Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$
   e. Will be used to measure the reliability of an estimate
      i. Used to calculate confidence intervals
      ii. Hypothesis testing
      iii. In squared units → $s^2$ (sample), $\sigma^2$ (population)
2. Difference between
   a. $\sum (x_i - \bar{x})^2$: sum of squares, total variability
   b. $\frac{\sum (x_i - \bar{x})^2}{n - 1}$: average variability
3. Statistics is all about VARIATION
4. Example: How many texts were sent at 3am?
   a. What is the average variation?

         Number of texts   Squared scores
         4                 16
         6                 36
         5                 25
         11                121
         7                 49
         9                 81
         7                 49
         3                 9

      $\sum X = 52$, $\sum X^2 = 386$, n = 8
      $s^2 = \frac{386 - \frac{52^2}{8}}{8 - 1} = \frac{386 - 338}{7} \approx 6.86$ texts²
      $s = \sqrt{6.86} \approx 2.62$ → average residual (distance between a number and the mean)

   Standard deviation: $s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$

         Total variation     $\sum (x_i - \bar{x})^2$                               in squared units
         Average variation   $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$           in squared units
         Average variation   $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$      in the original units

5. Standard deviation of a sample: lower case s
   a. Population: $\sigma$
6. Can never get a negative number for the standard deviation
7. The more
variability around the mean, the larger the S
8. Standard Deviation: the average distance between the data points and the mean
9. Can get 0 as the standard deviation when all the numbers have the same value
   a. Example: number of friends you hang out with in a week
10. Bias: the property of a statistic whose long-range average is not equal to the parameter it estimates
11. Expected Value (E): the long-range average of a statistic over repeated samples
12. Boxplot: a graphical representation of the dispersion of a sample
13. Box-and-Whisker Plot: a graphical representation of the dispersion of a sample
14. Quartile Location: the location of the quartile in an ordered series; (median location + 1) / 2
15. Whisker: line from the top and bottom of the box to the farthest point that is no more than 1.5 times the interquartile range from the box

Probability Distributions
- Variable: can take at least 2 different values
- Random: each possible value has a probability that it occurs
- A variable is discrete if the possible outcomes are a set of separate values
- A variable is continuous if the possible outcomes are an infinite continuum
- Probability distribution: lists all possible outcomes and their probabilities

Discrete Probability Distribution
- Example: how many dates does one go on before the first kiss? y = possible answers; P(y) = probability

         Responses          y   P(y)
         One 0              0   0.01
         Three 1's          1   0.03
         Sixty 2's          2   0.60
         Twenty-three 3's   3   0.23
         Twelve 4's         4   0.12
         One 5              5   0.01

Continuous Probability Distribution
- Example: How long does it take a girl to get rid of an annoying guy? 1 minute: high probability; 132.145 minutes: low probability; 339.487 minutes: low probability
- Possible outcomes are continuous
- The probability of any single exact value is very small
- A probability distribution for a continuous variable provides probabilities for intervals of numbers: P(y < 10) or P(y > 25)
- Smooth continuous curve: the area under the curve gives the probability of numbers within those intervals
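The discrete example above can be checked numerically. What follows is a minimal sketch, assuming the counts as read from the first-kiss example (one 0, three 1's, sixty 2's, twenty-three 3's, twelve 4's, one 5); the names `counts`, `probabilities`, and `prob_at_least` are illustrative. It turns counts into P(y), confirms the probabilities sum to 1, adds up the probability of a range of outcomes, and computes the probability-weighted average of y.

```python
# Counts from the "dates before first kiss" example: outcome y -> number of people.
counts = {0: 1, 1: 3, 2: 60, 3: 23, 4: 12, 5: 1}

total = sum(counts.values())

# P(y) is the relative frequency of each outcome.
probabilities = {y: count / total for y, count in counts.items()}


def prob_at_least(k):
    """Probability that y is k or more: add the probabilities of those outcomes."""
    return sum(p for y, p in probabilities.items() if y >= k)


# Weighted average of the outcomes, each weighted by its probability.
expected_value = sum(y * p for y, p in probabilities.items())

print(total)                                    # 100
print(round(sum(probabilities.values()), 2))    # 1.0
print(round(prob_at_least(3), 2))               # 0.36
print(round(expected_value, 2))                 # 2.45
```

For a continuous variable the same idea applies to intervals, except that the sum over separate outcomes is replaced by an area under the curve.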
Normal Distribution
- Some probability distributions are important because they approximate the distribution of variables in the real world
- Others are important because of their uses in statistical inference
- The Normal Probability Distribution is important for both
- Symmetric and bell-shaped; described by the mean $\mu$ and the standard deviation $\sigma$
- The probability within any particular number of standard deviations of $\mu$ is the same for all normal distributions
- Example with $\mu = 6$, $\sigma = 2$ (axis: 0 2 4 6 8 10 12)
   - Probability of a number above 6 is 50%; below 6 is 50%
   - Probability of a number between 6 & 8 is 34%
- Example with $\mu = 73$, $\sigma = 2$ (axis: 67 69 71 73 75 77)
   - Probability of a number above 73 is 50%; below 73 is 50%
- -1 → 1: 68%. Two standard deviations: one above and one below the mean
- -2 → 2: 95%. Two above and two below the mean
- -3 → 3: 99.7%. Three above and three below the mean
- -3 → 0: 50%; 0 → 3: 50%
- Every normal distribution has a mean of 0? False: the mean can be ANY number
- Standard Normal (axis: -3 -2 -1 0 1 2 3)
   - Standard deviation = 1, ALWAYS
   - Mean = 0, ALWAYS

Z Scores
- The whole area under the curve is 1
- To know any probability for any normal distribution, we need to compare it to the standard normal distribution
- This will transform the values in the original normal distribution into values in the standard normal distribution
- Values in the standard normal distribution are called z-scores
- Because we know that the standard normal distribution has a mean of zero and a standard deviation of one, we can convert any value of X into a z-score by $z = \frac{x - \mu}{\sigma}$
- Example: height with $\mu = 170$, $\sigma = 10$ (X: 140 150 160 170 180 190 200 corresponds to z: -3 -2 -1 0 1 2 3)
   - X = 180: (180 - 170) / 10 = 1 standard deviation above the mean
   - X = 160: (160 - 170) / 10 = -1, or 1 standard deviation below the mean
   - X = 167: (167 - 170) / 10 = -0.3, or 0.3 standard deviations below the mean; 0.3 → .6179 on the z-table
   - Taller than 170: 50%
   - Height of 140 or 200: not likely
   - Height between 160 and 180: very likely

Using the z-table (Left and Right columns), with z = 1.3 as the example
- Probability of -1.3 & below: use Left (z negative)
- Probability of -1.3 & above:
use Right (z negative)
- Probability of 1.3 & below: use Left (z positive)
- Probability of 1.3 & above: use Right (z positive)

Example: The average grade in the class is 73, a girl got a 65, and the standard deviation is 13. What is the variance?
- Mean = 73
- Standard deviation = 13
- Her grade = 65
- $z = \frac{x - \mu}{\sigma}$
- Variance = $13^2 = 169$

Example: What is the top 5%?
- 1.000 - .050 = .950; .95 on the z-table → z = 1.65
- $1.65 = \frac{x - \mu}{\sigma}$; solve for x

What about the bottom 5%?
- Make 1.65 negative: $-1.65 = \frac{x - \mu}{\sigma}$; solve for x

Properties of the Normal Curve
- Mean = 146
- Standard deviation = 35

         Area:    .13% | 2.15% | 13.59% | 34.13% | 34.13% | 13.59% | 2.15% | .13%
         z:       -3      -2      -1       0       1       2       3
         Score:   41      76      111      146     181     216     251

- What is the score 3 standard deviations below the mean? Lower limit: 41
- What is the score 3 standard deviations above the mean? Upper limit: 251
- The probability that a student will have a score between 41 and 251 is 99.74% (100 - .13 - .13 = 99.74)
- A score of 181 is one standard deviation above the mean. As a result, the percentage of students with scores below 181 is 84.13% (100 - .13 - 2.15 - 13.59 = 84.13)
- You can infer that 97.72% of the students have scores above 76

Terms related to z-scores and the normal distribution
- T Scores: set of scores with a mean of 50 and a standard deviation of 10
- Z Score: number of standard deviations that a score is above or below the mean
- Linear Transformation: transformation involving the addition, subtraction, multiplication, or division of or by a constant
- Standard Normal Distribution: normal distribution with a mean equal to 0 and a standard deviation equal to 1; it is denoted N(0, 1)
- Ordinate: vertical (Y) axis
- Standardization: process of computing standard scores
- Standard Scores: scores with a mean of 0 and a standard deviation of 1

Probability
- A random experiment (as opposed to a deterministic one, where you know the outcome) is one in which the outcome is determined by chance
   - Random: outcome by chance
   - Deterministic: outcome known
- For a random experiment E, the set of all possible outcomes of E is
called the sample space. For instance, for a coin-toss experiment, the possible outcomes are head or tail; then S = {H, T}
- For a random experiment (e.g., coin toss), the possible outcomes are known, but it is uncertain which one will occur
- For a particular possible outcome of a random phenomenon, the probability of that outcome is the proportion of times that outcome would occur in a very long sequence of observations
- Example: If you toss a coin 1,000,000,000,000,000,000,000 times, about half of them will be tails and about the other half will be heads
   - Proportion = frequency / total = 1/2 = 50%
   - Why don't you always get EXACTLY 50/50? Everything varies
- Example: You went to a restaurant this weekend; there were 1500 girls, and you decided to ask each girl if she wanted to kiss you. What are the possible outcomes (i.e., the sample space)?
   - Responses included "No", "Are you kidding me?", and "Okay, now leave"

         Response          Frequency   Proportion
         No                973         0.64866
         Okay, now leave   4           0.0026
         Total             1500        1

   - If you randomly choose one of these girls, how likely are you to pick one that wanted to kiss you?

Basic Probability Rules
- Let P(A) denote the probability of a possible outcome denoted by the letter A
- P(not A) = 1 - P(A): if you know the probability of an event, then the probability that this event does not occur is 1 - P(A)
- Example: P(6) = 0.17; P(not 6) = 1 - P(6) = 1 - 0.17 = 0.83
- If you go out tonight, the probability that you will find the girl of your dreams is P(DG) = 0.003. The probability of not finding the girl of your dreams if you go out tonight is 1 - P(DG) = 0.997
- Let P(A) denote the probability of a possible outcome denoted by the letter A
- Let P(B) denote the probability of a possible outcome denoted by the letter B
- Let A and B be mutually exclusive outcomes: P(A or B) = P(A) + P(B)
- Example: if you go out tonight, the probability of finding a brunette woman is P(Brunette) = 0.23 and the probability of finding a blonde is P(Blonde) = 0.13. What is the probability of finding a blonde or a brunette? P(A or B) = sum of A and B = 0.23 + 0.13 = 0.36
- Let P(A) denote the probability of a possible outcome denoted by the
letter A
- Let P(B) denote the probability of a possible outcome denoted by the letter B
- Let A and B be independent outcomes (ONE DOES NOT AFFECT THE OTHER): P(A and B) = P(A) × P(B)
- Example: if you roll 2 dice, what is the probability that you will get a 1 and a 2? P(1) = 0.17 and P(2) = 0.17; P(1 and 2) = 0.17 × 0.17 = 0.03

Example: Students = 70; Exam 1 mean grade = 82; Exam 1 standard deviation = 2. You got a 76. How many people made a 76 or above?
- $z = \frac{76 - 82}{2} = -3$
- 3 on the z-table: .0013; 1 - .0013 = .9987
- 99.87% made above a 76 (axis: 74 76 78 80 82 84 86 88 90)

Example: $\mu = 260$, $\sigma = 10$

Hypothesis Testing

Statistic as a random variable
- Statistic: number that describes a sample
- Inferential statistics: make inferences about population parameters based on sample statistics
- The sample statistic is our best guess of the population parameter
- The sample statistic is a random variable
   - Different samples of the same population will have different statistics

Example: How conservative is the state of Alabama?
- You asked 5 students to each ask this question to a sample of 50 people: "How conservative is Alabama?", using a scale from 1 (not very) to 10 (very conservative). Each person collected the data and calculated the mean for their sample

         Sophia      6.58
         Naomi       5.26
         Madeleine   4.27
         Tiffany     3.19
         Kendra      7.40

   - Who has the most accurate mean?
   - Which one of these 5 means is closer to the actual population mean?
   - How do we measure whether each mean is a good representation of the population parameter?
- The central limit theorem is the answer

Central Limit Theorem
- Given any population, the means of random samples together will converge to a normal distribution. Regardless of the distribution of the underlying population, the means of random samples approximate a normal distribution given a large enough sample size
- The mean of the sampling distribution is the mean of the population, $\mu$
- Given a large enough sample, the means of random samples of the population will converge to a normal distribution and
the mean of the sampling distribution will converge to the true population mean.

Sampling Distribution Mean
o The expected value of a variable is the weighted average of all possible values the variable can take: $E(X) = \sum x\,P(x) = \mu$
o Sampling distribution mean: $\bar{x} = \frac{1}{n}\sum x_i$

Sampling Distribution Variance
o The sampling distribution of the mean is a probability distribution.
o As such, it has a central tendency (which equals the population mean) and some variation.
o How do we measure the variation (i.e., the spread) of a distribution?
  ▪ Variance
  ▪ Standard deviation

The Sampling Distribution of the Mean
o The mean of the sampling distribution of the mean equals the population mean $\mu$.
o The standard deviation of the sampling distribution of the mean is $\frac{\sigma}{\sqrt{n}}$.
o The standard deviation of the sampling distribution is known as the standard error of the mean.
o As $n$ increases, the variance of the sampling distribution decreases; that is, the precision of the estimate increases.
[Simulation]

What happens in real life?
o We collect only one sample.
o We calculate the standard error of the mean using our sample standard deviation and the sample size.
o The standard error tells us how much a sample mean would vary if we took several samples of the same size from the same population.
o Remember, the sampling distribution is normally distributed.
o Sample means can be converted to z-scores: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Sampling Distribution & Sampling Error
o To understand hypothesis testing, you need to know the concepts of sampling distribution and sampling error.
o The sampling distribution represents the distribution of sample statistics.
o Sampling error represents the variability of those statistics from one sample to another.
o In statistics, error means random variability, not a mistake or carelessness.
o Imagine a population A with 10,000 people in it.
  ▪ Each person in this population has a certain number of siblings.
  ▪ For this population, the average number of siblings per person is $\mu = 5.43$.
[Figure: error (residual) measured from $\mu = 5.43$]

Sampling Distribution
o If we
collect a random sample of 50 people from this population A, we expect the mean of this sample to be close to the true mean $\mu = 5.43$.
o Imagine that Lauren, Andre, Stephanie, Molly, Rebecca, Anny, and Karen decide to each collect a different sample of 50 people from this population A. Each one will calculate the mean for his/her sample:
  ▪ Lauren: $\bar{x}_1 = 4.9$
  ▪ Andre: $\bar{x}_2 = 5.4$
  ▪ Stephanie: $\bar{x}_3 = 5.8$
  ▪ Molly: $\bar{x}_4 = 6.1$
  ▪ Rebecca: $\bar{x}_5 = 5.1$
  ▪ Anny: $\bar{x}_6 = 6.8$
  ▪ Karen: $\bar{x}_7 = 3.2$
o The distribution of all these sample means ($\bar{x}_1, \ldots, \bar{x}_7$) is known as the sampling distribution of the mean.

Sampling Error
o Note that each person was off by a little bit compared to the true population mean $\mu = 5.43$.
o For each person, the sampling error is $\bar{x}_i - \mu$:
  ▪ Lauren: $4.9 - 5.43 = -0.53$
  ▪ Andre: $5.4 - 5.43 = -0.03$
  ▪ Stephanie: $5.8 - 5.43 = 0.37$
  ▪ Molly: $6.1 - 5.43 = 0.67$
  ▪ Rebecca: $5.1 - 5.43 = -0.33$
  ▪ Anny: $6.8 - 5.43 = 1.37$
  ▪ Karen: $3.2 - 5.43 = -2.23$
o These differences are called sampling error.
o Sampling distributions tell us what values we might expect to obtain for a particular statistic under a set of predefined conditions.
o Note that this is a conditional probability: the probability of something happening if something else is true.
o Variance: average variability.
o Variance: average of all the squared residuals.

Hypothesis Testing
o The objective of science is to check whether a set of data agrees with a certain prediction. This prediction is called a hypothesis.
o A hypothesis is a statement about a population.
o A hypothesis is a prediction that a parameter takes a particular numerical value or falls within a particular interval.
o A significance test uses data to summarize the evidence about a hypothesis.
o It does this by comparing sample statistics (point estimates of parameters) to the values predicted by the hypothesis.
o Example:
  ▪ In 2011, a local newspaper in Austin claimed that male professors make more money than female professors.
  ▪ A hypothesis is a statement about the population.
  ▪ Let $\mu_{women}$ be the mean salary of female professors and $\mu_{men}$ the mean salary of male professors. Then the hypothesis is that $\mu_{women} < \mu_{men}$.
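As a quick check of the sibling example's arithmetic, the sketch below recomputes each person's sampling error (sample mean minus $\mu = 5.43$) in Python. The names `MU`, `sample_means`, and `sampling_errors` are illustrative choices, not part of the course materials.

```python
# Sample means reported in the sibling example; the population mean is 5.43.
MU = 5.43
sample_means = {
    "Lauren": 4.9, "Andre": 5.4, "Stephanie": 5.8, "Molly": 6.1,
    "Rebecca": 5.1, "Anny": 6.8, "Karen": 3.2,
}


def sampling_errors(means, mu):
    """Return each sample mean minus the population mean, rounded to 2 places."""
    return {name: round(x_bar - mu, 2) for name, x_bar in means.items()}


errors = sampling_errors(sample_means, MU)
for name, err in errors.items():
    # A negative error means the sample mean fell below the population mean.
    print(f"{name:10s} {err:+.2f}")

# Average of the seven sample means, for comparison with MU.
mean_of_means = sum(sample_means.values()) / len(sample_means)
print(f"mean of the 7 sample means: {mean_of_means:.2f}")
```

The seven sample means average to about 5.33, close to but not exactly $\mu = 5.43$, which is what one would expect from only seven samples.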
o Hypothesis testing is a systematic way to test claims or ideas about a group or population.
o The objective of hypothesis testing is to determine the likelihood that a sample statistic would be selected if the hypothesis regarding the population parameter were true.
o There are four basic steps in hypothesis testing:
  ▪ Step 1: state the hypotheses
  ▪ Step 2: set a criterion for the decision (i.e., choose an $\alpha$-level)
  ▪ Step 3: compute the test statistic
  ▪ Step 4: make a decision

Hypotheses
o Every significance test has two hypotheses about the value of a parameter:
  ▪ Null hypothesis
  ▪ Alternative hypothesis
o The null hypothesis ($H_0$) is a statement about the value of a population parameter.
o The alternative hypothesis ($H_1$ or $H_a$) is a statement that the population parameter falls in a range of values other than the one stated in the null hypothesis.
o Usually the null hypothesis states no effect.
o In the example about the salaries of male & female professors:
  ▪ $H_0$: $\mu_{men} = \mu_{women}$
  ▪ $H_1$: $\mu_{men} \neq \mu_{women}$

Example
o Someone thinks that men save more money because they are more responsible.
o You think that women save more because they are awesome.
o We know from the Census research that women save, on average, 10.3% of their monthly income, with a standard deviation of $\sigma = 3.76$.
o You asked 32 of your guy friends what percentage of their salary they save every month, and the average was 11.21%.
o Question: do men really save money differently than women?
o Step 1: state your hypotheses
  ▪ $H_0$: $\mu_{men} = \mu_{women}$
  ▪ $H_1$: $\mu_{men} \neq \mu_{women}$
o Step 2: set the criteria for the decision
  ▪ $\alpha$-level: 0.05
  ▪ Critical value: 1.96
o Step 3: compute the test statistic
  ▪ $z = \frac{11.21 - 10.3}{3.76 / \sqrt{32}} = \frac{0.91}{0.66} = 1.37$
  ▪ $z = 1.37$ on the z-table: 0.9147
  ▪ $1 - 0.9147 = 0.0853$, an 8.53% chance of getting 11.21 or higher
o Step 4: make your decision
  ▪ Because the z-value (1.37) is not larger than z-critical (1.96), we fail to reject $H_0$.

Statistical Inference
o We use a sample to make inferences about a population.
o Sometimes we say that something is statistically significant.
  ▪ It means
that a result was unlikely to have happened by chance.
  ▪ A result is statistically significant when it is unlikely to have happened by chance if the null hypothesis were true.

Significance
o Hypothesis: girls are smarter than boys.
o How would you test this?
  ▪ Select a group of 100 girls and 100 boys.
  ▪ Give each person an IQ test.
  ▪ The IQ score can be any number between 0 and 100.
o Calculate the average score (e.g., the mean) on the IQ test for each group.
o Given the initial hypothesis, what pattern of means do you expect?
o Which of the following patterns is more likely given the hypothesis?
  ▪ Boys 98 and girls 99 → low probability if the hypothesis is correct; reject the hypothesis
  ▪ Boys 47 and girls 96.6

Good and Bad Hypothesis
o A good hypothesis is one that can be rejected.
o A good hypothesis is one that can be falsified.
o The key idea is that the absence of evidence is not evidence of absence. YOU CANNOT PROVE THAT A HYPOTHESIS IS RIGHT, ONLY THAT IT IS WRONG.
o This is the main idea behind the null hypothesis. It assumes that nothing is happening, or that nothing special is going on.
o Example:
  ▪ Hypothesis: there is someone standing in the hallway.
  ▪ Null hypothesis: there is no one outside (you can prove this wrong if someone is outside).
o Example:
  ▪ Hypothesis: girls are smarter than boys.
  ▪ Null hypothesis: there is NO difference between boys and girls in terms of who is smarter.
  ▪ $H_0$: M = F. If B = 47 and G = 97.6 → reject $H_0$ (reject the null; the data support the hypothesis).
  ▪ $H_0$: M = F. If B = 98 and G = 99 → fail to reject $H_0$ (the data do not support the hypothesis).

The Null Hypothesis
o The concept of the null hypothesis plays a crucial role in the testing of hypotheses.
o The idea is that we can never prove something to be true, but we can prove something to be false.
o The null hypothesis gives us a starting point for any statistical test.
o If the data do not show enough evidence to prove the null hypothesis wrong, then we fail to reject it.
o If your data show enough
evidence to prove the null hypothesis wrong, then we reject it.

Example
o You get a call that you got a girl pregnant.
o The null hypothesis is: Karen IS NOT pregnant.
o You want to test this. Karen does a pregnancy test.
o There are two possible outcomes for the test:
  ▪ Karen is pregnant → reject null
  ▪ Karen is not pregnant → fail to reject null
o But this is only a test; the reality might be different.

                   Reality
Pregnancy Test     Not Pregnant        Pregnant
Positive           False Positive      Correct
Negative           Correct             False Negative

The same thing can happen with $H_0$.
o There are 2 possible mistakes we can make with our null hypothesis testing:
  ▪ We can reject the null hypothesis when it is true (false positive).
  ▪ We can fail to reject the null hypothesis when it is false (false negative).

                   Actual Situation (Null Hypothesis)
Decision           True                False
Reject             Type I Error        Correct Decision
Fail to Reject     Correct Decision    Type II Error

o $\alpha$-level: probability of making a Type I error.
o When Type I error decreases, Type II error increases.
o When Type II error decreases, Type I error increases.
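To tie the four-step savings example to the idea of a Type I error, here is a minimal Python sketch. It first recomputes the z statistic from the savings example, then simulates many samples from a population in which $H_0$ is true and counts how often the test rejects anyway. The function names, the normal population, the seed, and the number of trials are all assumptions made for illustration; none of them come from the course materials.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative probability, the value a z-table reports."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / (population SD / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


# Savings example: women's mean 10.3, SD 3.76; 32 men averaged 11.21.
Z_CRITICAL = 1.96  # two-sided critical value for an alpha-level of 0.05
z = z_statistic(11.21, 10.3, 3.76, 32)
upper_tail = 1.0 - normal_cdf(z)
reject_h0 = abs(z) > Z_CRITICAL
print(f"z = {z:.2f}, upper tail = {upper_tail:.4f}, reject H0: {reject_h0}")


def type_one_error_rate(trials=5000, n=32, mu=10.3, sigma=3.76, seed=0):
    """Assumption: a normal population where H0 is true. Count wrongful rejections."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if abs(z_statistic(sample_mean, mu, sigma, n)) > Z_CRITICAL:
            rejections += 1  # H0 is true here, so every rejection is a Type I error
    return rejections / trials


rate = type_one_error_rate()
print(f"simulated Type I error rate: {rate:.3f}")
```

With the savings numbers, z comes out at about 1.37, so $H_0$ is not rejected. The simulated rejection rate should land near 0.05, which is what the $\alpha$-level means: the probability of rejecting a true null hypothesis.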