PSCH 343; Statistical Methods In Behavior Science. Week Two Notes
These 21 pages of class notes were uploaded by Katie on Sunday, January 24, 2016. They belong to PSCH 343 at the University of Illinois at Chicago, taught by Liana Peter-Hagene in Spring 2016.
Date Created: 01/24/16
Instructor: Liana Peter-Hagene
PSCH 343 Statistics Methods In Behavioral Science
Week Two

Descriptive Statistics: Measures of Central Tendency

Distributions of scores

Anger Scale    Frequency    Percentage
1              1            10%
2              2            20%
3              2            20%
4              4            40%
5              1            10%

Jurors' scores (for the table above): 4 2 2 3 1 4 4 3 5 4

*Go back to the frequency table example from last week. It shows how angry jurors were during a trial that showed them upsetting evidence. When we make frequency tables, we are setting up the data to create distributions of scores by learning how many people ended up having each score.

Histogram
- Graph that represents the information in a frequency table
- The X-axis would have what? Scores
- The Y-axis would have what? Frequency

Then we take it a step further when we create a histogram, the graphical depiction of the information in a frequency table: for each score, you draw a bar whose height indicates how many people (the frequency) had that score.

Here are a few other examples. As you can see, you can't always trace every single frequency bar with the curve, but that's OK: the curve captures the general trend of the data. There are many different types of distributions. Most of them are highest in the middle, but others are highest at the beginning or at the end, or even high at two separate points.

Understanding distributions

For example (red distribution), what if we're measuring people's love of zombie movies from 1 (not at all) to 9 (very much), and we get each of these distributions? Team up with the person next to you to answer these questions, and write the answers down on your handouts:
- What would you conclude about how much people love zombie movies in general?
- Where are the most frequent scores?

What if you get this distribution?
(Distribution B) This one is a little tougher. (Distribution C)

(Give them 2 minutes, then discuss as a class. Call on students.)

So, what we just did is look at different types of distributions and interpret what they mean. There is always information in these figures; you just have to learn how to get it out.

This is what we call a normal distribution (write that down). It's normal because it has only one peak, and that peak is in the middle. Which means what? (WAIT for answer)
- In a normal distribution, the two sides are identical: mirror images of each other.
- Most scores fall in the middle of the distribution; most people are not at the extremes.

This distribution also has one peak, but it's not in the middle. Most people are at one extreme, where they score really low on whatever variable we're measuring. For example, if I surveyed you on the number of times you've cheated on a test, we'd get this type of distribution, right? We call this a positively skewed distribution (write that down).

And finally, if you get this distribution, where most scores fall at the high end, meaning most people have high scores on that variable, you call it a negatively skewed distribution. For example, happiness ratings in the first week of summer break.

All these distributions have only one peak, one place where scores tend to gather. We call them unimodal, and we'll talk about what "mode" is in a few minutes.

But look at these ones: they have two peaks. From the zombie example we know that means people either love zombie movies or, to a lesser degree, hate them, but few people are neutral about them. This is the kind of distribution that makes researchers wonder: what separates these two groups? The second example comes with a cautionary tale. Look at these test scores: students either bombed or rocked this test, and guess what separated the two groups?
These are called BIMODAL distributions: they have two peaks.

And sometimes you get even more complicated distributions, like this one, called multimodal, which means it has more than two peaks. We will mostly deal with unimodal, normal distributions throughout the semester, but you have to know about all of them, because data come in all shapes in the real world, and even looking at a distribution to see multiple peaks can be interesting and informative.

Measures Of Central Tendency
- Where is the "peak" in the distribution?
- What are the most frequent scores?
- What is the most typical, or representative, value for a distribution?
Answer (for the jurors' anger scores): 4

New Statistical Notations

Mode
Most common value in a distribution
Jurors' scores: 4 2 2 3 1 4 4 3 5 4
Mode = 4

When we talked about unimodal and bimodal distributions, we were talking about the number of peaks in the distribution, and that's exactly what the mode is: the most common score, or value, in a distribution, that is, the value with the highest frequency. In our distribution of jurors' anger scores, what is the mode? Look at the frequency table if you have to. It's 4, the peak of the distribution.

How about in the distribution of test scores where some students studied and some didn't?

Mode
Test scores: 10 20 30 40 50 60 70 80 90 100
Modes = 90 and 30

- What kind of data? Nominal
- Which category has the most cases? Fairness
- E.g., What is the most important moral value to you? (N = 100)

The mode can tell us where the peak is in interval or ratio data, but it is most useful with nominal data, when all we do is count how many people fall in each category. The category with the most cases is the mode. For example, 100 people answered this question, and here are their answers. What's the mode? Which moral value category has the most people choosing it?
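As a rough sketch of the counting idea above, the mode can be found by tallying how often each value occurs and keeping every value tied for the highest count. The helper name `modes` and the three-item list of moral-value labels are made up for illustration; only the jurors' scores come from the notes.

```python
from collections import Counter

def modes(scores):
    """Return every value tied for the highest frequency, sorted."""
    counts = Counter(scores)            # value -> how many times it occurs
    top = max(counts.values())          # the highest frequency in the data
    return sorted(value for value, n in counts.items() if n == top)

# Jurors' anger scores from the notes.
juror_scores = [4, 2, 2, 3, 1, 4, 4, 3, 5, 4]
print(modes(juror_scores))  # [4]

# The same counting works for nominal labels (hypothetical answers).
print(modes(["Fairness", "Justice", "Fairness"]))  # ['Fairness']
```

Returning a list rather than a single value lets the same function report two modes for a bimodal dataset.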
Problems With The Mode
- Very limited information: the mode does not tell us much. For example, these two distributions have the same mode, but they are very different.

If you look at the first one, what can you conclude? Where are most of the scores actually gathered? So, if you look only at the mode, it's misleading. In the second distribution, the mode is actually an accurate indication of where most scores are, but if we only see the mode, not the entire table, we have no way of knowing that.

Mode
- Find the mode(s) in this dataset:
Scores: 26 34 15 30 45 12 22 33 34 19 23 33 37 22 17 21 35 42 43 12

Median
- DEF: Middle score when scores are arranged from lowest to highest
- Best for ordinal data, or very skewed data
- Find the median:
  • Arrange scores from lowest to highest
  • If N is an odd number, the median is the score in the middle
  • If N is an even number, the median is the average of the two middle scores

Jurors' scores: 4 2 2 3 1 4 4 3 5 4
1. Re-order from low to high: 1 2 2 3 3 4 4 4 4 5
2. Identify middle score(s): 3 and 4 (the 5th and 6th of the 10 scores)
3. Calculate median: Mdn = (3 + 4)/2 = 3.5

Scores: 26 34 15 30 45 12 22 33 34 19 23 33 37 22 17 21 35 42 43 12
1. Re-order from low to high:
2. Identify middle score(s):
3. Calculate median (ONLY if you have 2 middle scores):

Mean

Never use the mean with nominal data. Remember that sometimes we code categories with numbers, like in this table, but that does not mean Justice is larger than Altruism by 2. That does not make sense. On the exam I will try to see if you know not to compute the mean or the median if I give you categorical data. The best way to check is to ask yourself: can I say Fairness > Altruism? That Green tea > Blueberry tea? Don't confuse this with the number of people in the category!!
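Before the definition of the mean, here is a minimal sketch of the median procedure from this section (sort, then take the middle score or average the two middle scores). The function name `median` is chosen for illustration; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(scores):
    """Middle score of the sorted data; average of the two middle scores if N is even."""
    ordered = sorted(scores)          # step 1: re-order from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd N: one middle score
        return ordered[mid]
    # even N: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

juror_scores = [4, 2, 2, 3, 1, 4, 4, 3, 5, 4]
print(median(juror_scores))  # 3.5
```

The 20-score practice dataset can be passed to the same function to check a hand calculation.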
Def.: the average of a group of scores: (sum of scores) / (number of scores)
M = ΣX / N
- Takes into account all the scores in a distribution
- Interval, ratio data
- CANNOT use it with nominal data!

EXAM QUESTION: measures of central tendency. (You can't compute a mean there; it's categorical data.)

The mean is a mathematical average, NOT necessarily a score in the distribution. Unlike the mode, and the median in odd-sized samples, the mean does not have to be a value that anyone actually scored.

Jurors' scores: ΣX = 32, M = 32/10 = 3.2

- Balancing point of the distribution:
- Jurors' scores from low to high: 1 2 2 3 3 (3.2) 4 4 4 4 5

If we add up the distances from the mean of all scores below the mean, the total will be exactly equal to the summed distances of all scores above the mean. Here, below the mean: 2.2 + 1.2 + 1.2 + 0.2 + 0.2 = 5.0; above the mean: 0.8 + 0.8 + 0.8 + 0.8 + 1.8 = 5.0.

For the 20-score practice dataset: ΣX = 555, M = 555/20 = 27.75

Mode, Median, Mean In Distributions

In normal distributions, they are the same: the same value is the most common, splits the distribution in two, and is the mathematical center or balancing point of the distribution, because the distribution is perfectly symmetrical.

In a positively skewed distribution, the three measures are different. Most values gather at the low end of the distribution, which means the mode is there. The mean is dragged up by a small number of very high values, because it is the only measure that is affected by every single score. The median is in the middle, higher than the mode but lower than the mean. In fact, in very skewed distributions, the median is the best measure to use, not the mean.

In a negatively skewed distribution, the opposite happens: the mode is at the high end, because that's where the most frequent score (the peak) is; the median is in the middle; and the mean is dragged down by the small scores in the tail of the distribution.

- How many times have you helped someone in the past week, without that being part of your job or social obligation?
Data: 0, 0, 1, 2, 2, 2, 23 (23 is an outlier)
Mode = 2, Mdn = 2, M = 4.29

*Consider this example: researchers want to know how common everyday helping behavior is, so they ask people the question above and get the data above. The mode is 2. The median? 2. The mean? 30/7 = 4.29.

In this case, the mean is much higher than the median and mode, because we have this extreme value, which we call an outlier. Because the mean takes every score into consideration, it can be affected by extreme scores that don't represent the sample very well. So if we go with the mean in this case, would we have a good estimate of how much people help each other? Why or why not? No, it would be an overestimate. In this case the median is the better way to go, because the median is not pulled toward extreme values and stays with the central scores. But the mean is used most often because it has many computational advantages, and we work mostly with normal data.

Practice Exercise
Dataset: 3.0, 3.4, 2.6, 3.3, 3.5, 3.2, 8.2, 9.5, 2.3, 2.5, 34.4
- Create the frequency distribution (apply all the steps)
- Describe the distribution
- Calculate the mean, median, and mode
- What is the relationship between the three measures of central tendency?
  + Is this what you would expect based on the shape of the distribution?
- What is the best measure of central tendency in this case? (Meaning, the most representative of where most of the scores are?)

Part B Lecture Notes

Measures Of Variability
- How spread out is the distribution?
- How much do scores vary around a central value (the mean)?

To more fully describe a sample, or a set of scores, we need to know more than just where the scores gather. We also want to know how much the scores are spread around this central point, specifically the mean. With measures of variability, we can answer: How spread out is the distribution? How much do scores vary around a central value?
Variability is the central concept when it comes to studying any psychological phenomenon: if people don't differ from each other, then there is nothing to explain. Much of what we do is aimed at explaining why people differ in different aspects of their lives, such as anxiety levels, anger reactions, prosocial behavior, political orientation, etc.

New Statistical Notations
X = an individual score
(X – M) = deviation score (distance of a score from the mean)
σ² (lower-case sigma, squared) = Variance
SD = Standard Deviation

Variance

Variance: as the name suggests, this is a measure of how much scores vary around the central value, the mean. This is extremely useful to know, and here's why. Let's say you get to choose whether you live in Chicago or in San Francisco, and it all comes down to the weather. You learn the annual average temperature in San Francisco is 58, and in Chicago it is also 58. Which do you pick? Well, you think, it's the same weather, I'll go to Chicago, and they have deep-dish pizza…

Except that this is the actual distribution of annual temperatures in Chicago, and this is it in San Francisco. In Chicago, there is a large range and a large variance; that is, scores are very scattered around the mean. In San Francisco, they all fall within a pretty nice range and there is little variability. So if you pick based only on the mean, you end up living half the year North of the Wall and the other half in Mordor.

Conceptually:
- How much scores deviate from the mean, on average
- How far scores are from the mean, on average
- The average distance of scores from the mean

- How much does each score deviate from the mean? Subtraction
- What is the average of these deviations or distances? Add deviations / divide by number of scores

This conceptual definition has clues about how to calculate the variance. It should tell you right away that you'll have to do two things:
1. Find how much each score deviates from the mean, which is just a fancy way of saying: how far is each score from the mean, or what's the distance between each score and the mean? This should tell you that you'll be subtracting the mean from the scores.
2. Find the average (or mean) of these deviations, or distances. This should tell you that you'll have to add them up and divide by N.

*Let's unpack this. Whenever we talk about means, we know we have to sum up scores, then divide by the total number of scores.

So, first, we get the deviation (the distance) of each score from the mean, which really just means subtracting the mean from each score. You will get a mix of negative and positive numbers here, because scores below the mean give negative deviations and scores above the mean give positive ones (in a normal distribution, about half of each).

Second, you square the deviation scores. You do this because otherwise they would sum to 0 every time: the mean is the balancing point of the distribution, so the negative and positive deviations always cancel out exactly.

Third, you add up all these squared deviation scores.

And, since we're computing a MEAN, what do we do next? We divide this sum by the total number of scores in the sample, which is also the total number of deviation scores.

So you see, this is quite a logical procedure to arrive at an average, or mean, distance of scores from the mean of the distribution. Let's look at an example.

Example: Here are 5 scores from a scale of people's attachment to their parents: 2, 4, 5, 6, 8. The mean of this sample is? Let's calculate this together, it is a simple one: the mean is 5.

Score (X)    X – M         (X – M)²
2            2 – 5 = –3    (–3)² = 9
4            4 – 5 = –1    (–1)² = 1
5            5 – 5 = 0     (0)² = 0
6            6 – 5 = 1     (1)² = 1
8            8 – 5 = 3     (3)² = 9
Sum          0             20

M = 5, N = 5
Σ(X – M)² = 20

And then we divide. What do we divide it by? We divide the sum by N, which was 5. So, the variance of this sample is 20/5 = 4. What does that mean? Conceptually?
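The steps just described (subtract the mean, square, sum, divide by N) can be sketched in a few lines. The function name `variance` is chosen for illustration. It divides by N, as these notes do; be aware that many software defaults divide by N − 1 instead (in Python's standard library, `statistics.pvariance` divides by N and `statistics.variance` by N − 1).

```python
def variance(scores):
    """Average squared deviation from the mean, dividing by N as in these notes."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # step 1: X - M for each score
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    total = sum(squared)                      # step 3: sum of squared deviations
    return total / n                          # step 4: divide by N

attachment_scores = [2, 4, 5, 6, 8]
print(variance(attachment_scores))  # 4.0
```

Printing `sum(deviations)` inside the function is a quick way to confirm that the unsquared deviations cancel out to zero.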
Example 2: Jurors' scores: 4 2 2 3 1 4 4 3 5 4 (N = 10, M = 3.2)

How can you check that you got the deviation scores right? What should they sum up to, before squaring them?

(Let them work on it for 7-10 minutes; go around to answer questions and check. At the end, ask for the solution and make sure everyone got it.)

Var = 1.36

Example 3: Now let's consider another example. Here are some scores from a scale measuring the wellbeing of people who live in Chicago, including job satisfaction, income, family and romantic satisfaction, weather, hobbies and free time activities.

M = 270/6 = 45
Var = 1080

*So, variance is very important, and no distribution is properly described without taking into consideration the spread of scores around the mean. But… it's not very intuitive. What does it mean that the variance in the example before was 1080? We want to see numbers that are informative, that immediately tell us something. The reason variance is not that informative is that it is based on squared scores, which is necessary, but has the drawback of giving us an artificially large number that makes no sense in the context of the raw scores and the mean.

Standard Deviation

We can reduce the variance to a meaningful number that falls within the scale of the distribution by taking its square root. That number is the standard deviation.
- Conceptually: the average amount that scores differ from the mean, in the scale metric
- Mathematically: the square root of the variance

Going back to our temperature example:
- Chicago: M = 58°F, σ² = 400, SD = 20
- San Francisco: M = 58°F, σ² = 100, SD = 10

On average, temperatures in Chicago deviate 20 degrees from the mean.
On average, temperatures in San Francisco deviate 10 degrees from the mean.

See how talking about variability in actual degrees that exist in reality helps you understand how spread out temperatures are in the two cities?
400 and 100 are just abstract numbers, but 20 and 10 are actual temperature differences you can understand.
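As a small illustration of the square-root step, the sketch below takes the two city variances quoted in the notes and converts them to standard deviations, then does the same for the five attachment scores from the variance example. The function names are chosen for illustration, and the variance again divides by N, which is an assumption carried over from the procedure in these notes.

```python
import math

def population_variance(scores):
    """Mean of the squared deviations from the mean (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, in the same units as the raw scores."""
    return math.sqrt(population_variance(scores))

# Variances quoted in the notes for the two cities (in squared degrees).
print(math.sqrt(400))  # 20.0 (Chicago)
print(math.sqrt(100))  # 10.0 (San Francisco)

# Attachment scores from the variance example: variance 4, so SD 2.
print(standard_deviation([2, 4, 5, 6, 8]))  # 2.0
```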