by: Anna Ballard

# Psy 202, Week 2 Notes

These are detailed notes from week 2, including Chapters 2 and 3. There are also notes on how to do work using the statistics program "R".
This 7-page set of class notes was uploaded by Anna Ballard on Tuesday, September 6, 2016.

Date Created: 09/06/16
Lecture 3 8/29
Ch. 2 -> Describing Quantitative Data w/ Frequency Distributions

To make a collection of data…
- Put the data in some sort of order
- Frequency tables -> organize in descending order and count each group
  o Also works for smaller sets of data

Simple Frequency Table
- Shows the number of times a piece of data shows up
- Y -> raw scores; f(Y) -> how many times each raw score occurs

    Y     f(Y)
    75    2
    74    3
    73    4
    …     …
    57    0
    …     …
    18    7

- Include all possible values between high and low! Use 0 for values that do not occur in the data

Group Frequency Distributions
• Grouping values -> used when there is a large range of data
• Groups we are dealing with (intervals)
  - How many intervals?
  - How wide should the intervals be? AKA how many values
• How many intervals should we have?
  - Between 10 and 20, depending on the distance between the low and high values
• How wide should our intervals be?
  - 2, 3, 5, or a multiple of 5, depending on the number of values used and, subsequently, the number of intervals

Grouped Simple Frequency Table
• Evenly spaced, includes all values between high and low; inclusive
  - f(y) -> total frequency of the raw scores within each interval

Relative and Cumulative Frequency Distribution
• How much is “a lot” -> must be relative to something
  - f(y)/n -> n refers to the total number of scores
  - rf(y) = f(y)/n …..
    14/150 = 0.093
  - always round to at least 2 decimal places
  - the rf(y) values always add up to about 1.00

Grouped Cumulative Simple Frequency Table
• Start with “n” and subtract f(y) as you descend
  - simple frequency and grouped cumulative should match up at the end
  - should end with the last f(y)

Histograms (simple and relative frequencies)
• Great graph when there is a lot of data
• X-axis -> values of raw scores
  - values of raw scores in ascending order (either ungrouped or grouped)
• Y-axis -> frequencies (either simple or relative)
• Vertical bar for each group (and touching)… compares scores for us

Frequency Polygons (Similar to Histograms)
• X-axis: still values from lowest to highest
• Point for each f(y) value
• Connected points suggest values for scores on a continuum
  - points just outside of the range touch the x-axis

Lecture 4 8/31
Ch. 2 cont’d

OGIVES
• X-axis: still values of raw scores (lowest to highest)
• Y-axis -> cumulative frequencies (simple or relative)
  - highest cf(y) is always equal to n (simple) or 1.00 (relative)
  - points not necessarily connected by lines

Stem and Leaf Displays
• Intervals: ex: 16-20, 21-25, 26-30, etc.
  - shows shape and distribution and individual scores
  - Ones column with scores in ascending/descending order
  - Each score in each interval is displayed

    7 | 5 5
    7 | 0 0 0 1 1 1 1 2 3 3 3 3 4 4 4
    6 | 0 1 2 2 2 2 3 3 3 3 4
    5 | 5 6 6 6 6 8 8 8 8 9 9

  ex: the number 62 shows up 4 times in the data, whereas the number 55 shows up once
• intervals divide evenly into 10

Salient Characteristics of….
2 characteristics (and a couple other important ones)

[Figure: distribution curve labeled Kurtosis (peak), Skewness, Central Tendency, and Measure of Variability]

Central Tendency: 1 score that summarizes the entire distribution
Measure of Variability: how far the scores fall from the central tendency; tells us how accurate CT is -> smaller number = the more we can depend on CT
  o if most scores are around the same number as CT, higher dependency
  o if most scores are way greater than or way less than CT, and just average out to CT, there is lower dependency
Skewness: how symmetrical the distribution is (if it leans one way or the other… our example leans more to the right)
Kurtosis: how curved… high kurtosis -> scores pile up into a taller, sharper peak; lower kurtosis is flatter
  - Want kurtosis to be fairly average (not too low, not too high)

Computer Analysis and learning R
• Object oriented program
  - Be specific -> tell it exactly what you want it to do (set.color(RED))
  - Type in the correct function
  - Arguments: give as much info as possible
• R makes copies of original scores
  - Can tell one to do something
  - Never need to put a code in twice
• Color key for the protocols
  - RED -> protocol before
  - GREEN -> what to change so R knows what you’re talking about
• For grouped simple frequency table (and OGIVE and Histogram)
  - interval width -> how many you want in each interval
  - Do not use 10!
    Look at the rules from Lecture 3.

R protocols

Type: Ungrouped simple frequency table
  Protocol to use: table(variable)
  How to use it:   table(scores)

Type: Grouped simple frequency table
  Protocol to use: table(cut(variable,breaks=seq(lowest value-1,highest value,by=interval width)))
  How to use it:   table(cut(scores,breaks=seq(low-1,high,by=2)))

Type: Relative frequency histogram with density distribution line
  Protocol to use: hist(variable,prob=TRUE,breaks=seq(lowest value-0.5,highest value+0.5,by=interval width))
  How to use it:   hist(scores,prob=TRUE,breaks=seq(low-0.5,high+0.5,by=2))
                   lines(density(scores))

Type: Grouped OGIVE plot
  Protocol to use: plot(cumsum(table(cut(variable,breaks=seq(lowest value-1,highest value,by=interval width)))), type="o")
  How to use it:   plot(cumsum(table(cut(scores,breaks=seq(low-1,high,by=2)))), type="o")

Lecture 5 9/2
Ch. 3 -> Describing Quantitative Data with Summary Statistics

Intro
• Kurtosis
• Skewness
• Central Tendency
• Measure of variability

Measures of Central Tendency
What score best represents the distribution?

Mode: the score that has the highest relative frequency (aka the score that occurs the most)
    0 2 2 2 3 4 4 5 7 8 9
  - Our class does not like mode because it could be far away from the central tendency
  - There can also be more than one mode
  - Mode does not consider anything else in a distribution
  - Only use mode when you absolutely have to because it only gives us info on category membership

Median: the score that has 50% of the distribution below it and 50% above it
    0 2 2 2 3 4 4 5 7 8 9
  - If 2 different scores straddle the median… average the 2
  - We like this more than mode because it tells us about rank
  - Missing info: does not give us the distance between scores

Mean: preferred because it includes the most information (category, rank, and distance)
  - Incorporates how much the scores weigh
  - The average of all the scores

$$\mu_Y = \frac{\sum_{i=1}^{n} Y_i}{n}$$

  - µY -> population mean of the Y scores
  - i = 1 -> start with this score
  - n (above ∑) -> finish with this score
  - Y -> raw score; sum all of these scores -> inclusive

The Mean as a Balance Point

$$\sum_{i=1}^{n} (Y_i - \mu_Y) = 0$$

  - Deviations from the mean (if negative it means the raw score is below or to the L of the mean)
  - ** Deviation Scores will Always Sum to 0 **

    i     Y     µY    Y - µY
    1     1     4     -3
    2     2     4     -2
    3     3     4     -1
    4     4     4      0
    5     10    4      6
    ∑     20    20     0     <- total error when predicting that Yi = µY

The Mean as a Least-Squares Estimate
• Look for squared errors -> the estimate that gives you the lowest squared errors is the one that gives the best estimate
  - mean gives you the smallest squared errors

Properties of the Measures of Central Tendency
Mode: use for things that are categorical
  - Benefit: applies to all data (not always preferred because it gives the least amount of info)
Median: used for anything that can be put in rank order
  - Benefit: usable with all shapes (symmetrical or asymmetrical distribution)
  - Not preferred because no distance info, but better when there are outliers
Mean: can give distance info (along with categorical and rank info)
  - Benefit: uses all available info but is most sensitive to outliers
  - We like mean because it can give us info about populations
Outliers -> scores far from where the rest of the scores are.

Variability

[Figure: two distributions compared, one with axis values 20, 40, 60 and one with axis values 35, 40, 45]

More variability -> less confident in the central tendency

The Range
• Everything between the highest score and the lowest score (subtract low from high)
  - super duper sensitive to outliers

What is similar to range but gets rid of outliers?

    |------I------|------I------|
    low                        high

Chop off the lower and upper 25%
  - interquartile range
  - gives a true representation of how spread out the data is
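
As a rough check on the Lecture 5 ideas above, here is a minimal R sketch; it is not something from the lecture itself. It assumes the two small example data sets in these notes (0 2 2 2 3 4 4 5 7 8 9 and 1 2 3 4 10) are complete, and the variable names and the helper `sse` are made up for illustration. Base R has no built-in function for the statistical mode, so the sketch reads it off `table()`.

```r
# Example scores from the mode/median notes (assumed to be the full data set).
scores <- c(0, 2, 2, 2, 3, 4, 4, 5, 7, 8, 9)

# Mode: the most frequent score(s); there can be more than one.
freq <- table(scores)
score_mode <- as.numeric(names(freq)[as.vector(freq) == max(freq)])

# Median and mean use base R functions.
score_median <- median(scores)
score_mean <- mean(scores)

# Mean as a balance point: deviations from the mean sum to 0.
y <- c(1, 2, 3, 4, 10)          # scores from the balance-point table
deviations <- y - mean(y)       # -3 -2 -1 0 6
sum_dev <- sum(deviations)

# Mean as a least-squares estimate: compare total squared error
# around the mean with total squared error around the median.
sse <- function(guess) sum((y - guess)^2)   # hypothetical helper
sse_mean <- sse(mean(y))
sse_median <- sse(median(y))

# Range (high minus low) and interquartile range.
score_range <- max(scores) - min(scores)
score_iqr <- IQR(scores)        # uses R's default quantile method

print(c(mode = score_mode, median = score_median, mean = score_mean))
print(c(sum_dev = sum_dev, sse_mean = sse_mean, sse_median = sse_median))
print(c(range = score_range, iqr = score_iqr))
```

Running this should give a mode of 2, a median of 4, and a mean of about 4.18 for the first set. For the second set the deviations sum to 0, and the squared error around the mean (50) is smaller than around the median (55). Note that `IQR()` relies on R's default quantile method, so it may differ slightly from a hand calculation.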


