# Biostat 101: Week 1 PubH 6002

These class notes (108 pages) were uploaded by Elizabeth Kapelan on Thursday, September 1, 2016. They belong to PubH 6002 (Biostatistical Applications of Public Health) at George Washington University, taught by Heather Hoffman in Spring 2016. Since their upload, they have received 7 views.

Date Created: 09/01/16

## Table A-8: Critical Values of T for the Wilcoxon Signed-Rank Test

| n | α = .005 (one tail), .01 (two tails) | α = .01 (one tail), .02 (two tails) | α = .025 (one tail), .05 (two tails) | α = .05 (one tail), .10 (two tails) |
|---|---|---|---|---|
| 5 | * | * | * | 1 |
| 6 | * | * | 1 | 2 |
| 7 | * | 0 | 2 | 4 |
| 8 | 0 | 2 | 4 | 6 |
| 9 | 2 | 3 | 6 | 8 |
| 10 | 3 | 5 | 8 | 11 |
| 11 | 5 | 7 | 11 | 14 |
| 12 | 7 | 10 | 14 | 17 |
| 13 | 10 | 13 | 17 | 21 |
| 14 | 13 | 16 | 21 | 26 |
| 15 | 16 | 20 | 25 | 30 |
| 16 | 19 | 24 | 30 | 36 |
| 17 | 23 | 28 | 35 | 41 |
| 18 | 28 | 33 | 40 | 47 |
| 19 | 32 | 38 | 46 | 54 |
| 20 | 37 | 43 | 52 | 60 |
| 21 | 43 | 49 | 59 | 68 |
| 22 | 49 | 56 | 66 | 75 |
| 23 | 55 | 62 | 73 | 83 |
| 24 | 61 | 69 | 81 | 92 |
| 25 | 68 | 77 | 90 | 101 |
| 26 | 76 | 85 | 98 | 110 |
| 27 | 84 | 93 | 107 | 120 |
| 28 | 92 | 102 | 117 | 130 |
| 29 | 100 | 111 | 127 | 141 |
| 30 | 109 | 120 | 137 | 152 |

NOTES:

1. \* indicates that it is not possible to get a value in the critical region.
2. Reject the null hypothesis if the test statistic T is less than or equal to the critical value found in this table. Fail to reject the null hypothesis if the test statistic T is greater than the critical value found in the table.

From Some Rapid Approximate Statistical Procedures, Copyright © 1949, 1964 Lederle Laboratories Division of American Cyanamid Company. Reprinted with the permission of the American Cyanamid Company.

## Lecture 1 Formula Sheet

1. $P(A) = \dfrac{A}{N}$
2. $P(\bar{A}) = 1 - P(A)$
3. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
4. $P(A \cap B) = P(A) \times P(B)$, if A and B are independent.
5. $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
6. Mean: $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$
7. Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
8. Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$
9. Mean of shifted sample: if $y_i = x_i + c$, then $\bar{y} = \bar{x} + c$.
10. Mean of scaled sample: if $y_i = c x_i$, then $\bar{y} = c \bar{x}$.
11. Variance of scaled sample: if $y_i = c x_i$, then $s_y^2 = c^2 s_x^2$.

## MPH 6002 Biostatistical Applications for Public Health: Unit 1 Lectures

## 1.5 The Biostatistician's Thought Process

### Role of the Biostatistician

- Field of study concerned with:
  - Collecting, organizing, summarizing, and analyzing data.
  - Drawing inferences about a body of data when only a part of the data is observed.
- Investigate and evaluate the nature and meaning of the information contained within the data.
- Interpret and communicate the results.

### The Blind Men and the Elephant (John Godfrey Saxe)

Six blind men each describe the elephant as something different, because each one assumes the whole elephant is like the part he touched:

- side = wall
- tusk = spear
- trunk = snake
- knee = tree
- ear = fan
- tail = rope

The biostatistician puts all of the individual pieces of data together to tell the whole story.

### The Data as the Elephant

The biostatistician connects the data together to form the big picture.

## 1.6 Probability Terminology

- Event: any result or outcome of interest.
  - An event can occur OR not occur.
- Frequency: number of times an event occurs.
- Observations: number of times it is possible for an event to occur.
- Probability: measure of the likelihood that a given event will occur.
  - Mathematical expression of how many times an event occurs compared to the number of times it is possible for it to occur.
  - Ratio of frequency to observations.
  - Expressed as a number between 0 and 1.

### Notation: Probability

$$P(A) = \frac{A}{N}$$

- $A$ = number of times event A occurs (frequency).
- $N$ = total number of times it is possible for event A to occur (observations).
- $P(A)$ = probability of event A occurring.
- A is a subset of N, so $0 \le P(A) \le 1$ for any event A.

### Example: Probability

- A study followed 50,000 women to the completion of pregnancy.
- Of those pregnancies, 3500 women reported congenital malformations.
- What is the probability that a pregnancy in this group of women will result in a congenital malformation?

Work through the blanks:

- Event A = ____
- A = ____
- N = ____
- P(A) = ____
- P(congenital malformation) = ____
- This can also be reported as ____%.

## 1.7 Basic Probability Concepts

### Pick a Card, Any Card!

- Consider a standard deck of cards.
- What distinguishes one card from another?
  - Suit (club, diamond, heart, spade)
  - Value (ace, 2-10, jack, queen, king)

### Joint or Composite Events

- Union: probability that A or B or both occur.
  - P(A ∪ B) = P(A or B)
- Intersection: probability that both A and B occur together.
  - P(A ∩ B) = P(A and B)

*Venn diagram: two overlapping circles A and B. The union is the area covered by either circle; the intersection is the overlap.*

### Mutually Exclusive Events

- Two events are mutually exclusive if they cannot occur at the same time.
  - P(A ∩ B) = P(A and B) = 0
- Example: two suits in a deck of cards (heart and diamond).

### Not Mutually Exclusive Events

- Two events are not mutually exclusive if they can occur at the same time.
  - P(A ∩ B) = P(A and B) ≠ 0
- Example: suit and value in a deck of cards (heart and queen).

### Addition Rule

- P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
- P(A or B) = P(A) + P(B) - P(A and B)

Subtract P(A and B) to avoid adding the events in the overlapping (shaded) region twice.

### Example: Addition Rule (heart, queen)

- P(heart ∩ queen) = P(heart and queen) = ____
- P(heart ∪ queen) = P(heart or queen) = P(heart) + P(queen) - P(heart and queen) = ____ + ____ - ____ = ____

### Addition Rule for Mutually Exclusive Events

- P(A ∩ B) = P(A and B) = 0
- P(A ∪ B) = P(A or B) = P(A) + P(B)
- In terms of counts: (A + B)/N = A/N + B/N

### Example: Addition Rule (heart, diamond)

- P(heart ∩ diamond) = P(heart and diamond) = 0
- P(heart ∪ diamond) = P(heart or diamond) = P(heart) + P(diamond) = ____ + ____ = ____

### Example: Addition Rule (hypertension study)

Consider a study on hypertension. Suppose that 10% of patients take drug A, 20% take drug B, and 5% take both A and B.

- P(A) = ____, P(B) = ____, P(A and B) = ____
- Are taking drugs A and B mutually exclusive? P(A ∩ B) = ____. Mutually exclusive? ____
- What is the probability of taking drug A or B?

Given P(A) = 0.10, P(B) = 0.20, and P(A and B) = 0.05, the Venn diagram regions are: drug A only = 0.05, drugs A and B = 0.05, drug B only = 0.15.

### Collectively Exhaustive

- A set of events $A_1, A_2, \ldots, A_k$ is collectively exhaustive if at least one of the events must occur.
  - $A_1 \cup A_2 \cup \cdots \cup A_k = S$ (sample space)
  - Together they contain all of the outcomes in the sample space.

### Complementary Events

- Two events are complementary if they are:
  - mutually exclusive, so P(A ∩ Ā) = 0, and
  - collectively exhaustive.
- Let Ā be the event that A does not occur.
  - P(A) = probability event A does occur.
  - P(Ā) = probability event A does not occur.
  - Ā is called the complement of A.
- P(Ā) = 1 - P(A)

*Figure panels: collectively exhaustive; mutually exclusive; collectively exhaustive and mutually exclusive.*

## 1.8 Identify the Type of Event

### Example: Roll the Dice

- A = {2,4,6}
- B = {1,3,5}
- C = {6}
- D = {1,2,3,4,5}
- Mutually exclusive but not collectively exhaustive: ____
- Collectively exhaustive but not mutually exclusive: ____
- Mutually exclusive and collectively exhaustive: ____

### Example: Coin Toss

- In tossing a fair coin, there are only two possible events:
  - A = Heads
  - Ā = Tails
- Heads and tails are:
  - mutually exclusive
  - collectively exhaustive
  - complementary

### Example: Hospital Admittance

- Suppose 1,000 patients are admitted to a hospital in one year. Patients are admitted either through the emergency room (ER) or by a medical doctor (MD).
- If 450 patients are admitted through the ER, what is the probability of being admitted by an MD?
- P(ER) = ____
- P(MD) = ____

### Example: Cause of Death

- Only one primary cause of death can be coded for each deceased patient. In a cohort, suppose 5% die from heart disease (HD) and 1% die from respiratory disease (RD).
- What is the probability that someone in this cohort will die of a disease other than HD or RD?
  - P = ____
  - P(HD) = ____
  - P(RD) = ____
  - P(HD and RD) = ____
- Hints from the slide: the causes are mutually exclusive, so P(HD and RD) = 0; for collectively exhaustive complements, P(Ā) = 1 - P(A).

Solution:

- Find P(not (HD or RD)) = 1 - P(HD or RD).
- P(HD) = 0.05, P(RD) = 0.01, P(HD and RD) = 0
- P(HD or RD) = P(HD) + P(RD) - P(HD and RD) = 0.05 + 0.01 - 0 = 0.06
- P(not (HD or RD)) = 1 - P(HD or RD) = 1 - 0.06 = 0.94

*Diagram: other disease (0.94), heart disease (0.05), respiratory disease (0.01).*

## 1.9 Independent Events

### Independent Events

- Two events are independent if the occurrence of one event does not affect the probability of the occurrence of the other.
- Simply stated, the two events have no influence on one another.
- Independent => not associated
- Dependent => associated

### Example: Independence

- Toss a "tails" with a fair coin AND then roll a "3" with a standard 6-sided die.
- Pick an "8" from a standard deck of cards, replace it, and then pick a "spade" as the second card.

### Independence Rule (Multiplication Rule)

- If A and B are independent events, then:
  - P(A ∩ B) = P(A and B) = P(A)*P(B)
- If A and B are dependent events, then:
  - P(A ∩ B) = P(A and B) ≠ P(A)*P(B)
  - P(A ∩ B) = P(A and B) = P(A)*P(B|A)

### Example: Independence Rule

- Suppose you toss a fair coin 2 times.
- Let A = heads on first toss. P(A) = ____
- Let B = heads on second toss. P(B) = ____
- Are A and B independent events? ____
- There are 4 possible outcomes:
  - P(A)*P(B) = ____
  - P(A and B) = P(2 heads) = ____

### Independence Rule (>2 Events)

- If $A_1, A_2, \ldots, A_k$ are mutually independent events, then:

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1) P(A_2) \cdots P(A_k)$$

### Example: Independence Rule (>2)

- Suppose you toss a fair coin 3 times.
- Let $A_1$, $A_2$, $A_3$ = heads on the 1st, 2nd, 3rd toss.
- Are $A_1$, $A_2$, and $A_3$ independent events? ____
- There are 8 possible outcomes:
  - $P(A_1)$ = ____, $P(A_2)$ = ____, $P(A_3)$ = ____
  - $P(A_1) P(A_2) P(A_3)$ = ____
  - $P(A_1 \text{ and } A_2 \text{ and } A_3)$ = P(3 heads) = ____

### Independence and Mutual Exclusivity

- If A and B are independent, then:
  - P(A ∩ B) = P(A and B) = P(A)P(B).
- If A and B are mutually exclusive, then:
  - P(A ∩ B) = P(A and B) = 0.
- If A and B are nontrivial events, then they cannot be both mutually exclusive and independent.
  - Being mutually exclusive is a lack of independence.
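The contrast between independence and mutual exclusivity can be checked by enumerating a small sample space. The sketch below is illustrative only: it assumes two tosses of a fair coin, and the helper `prob` and the event functions `a`, `b`, `c` are hypothetical names, not part of the course notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT (equally likely).
outcomes = list(product("HT", repeat=2))

def prob(event):
    """P(event) = frequency / observations over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):  # A: heads on the first toss
    return o[0] == "H"

def b(o):  # B: heads on the second toss
    return o[1] == "H"

def c(o):  # C: tails on the first toss (cannot occur together with A)
    return o[0] == "T"

p_a, p_b, p_c = prob(a), prob(b), prob(c)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_and_c = prob(lambda o: a(o) and c(o))

# Independent: P(A and B) equals P(A) * P(B).
independent_ab = p_a_and_b == p_a * p_b
# Mutually exclusive: P(A and C) equals 0, which differs from P(A) * P(C).
mutually_exclusive_ac = p_a_and_c == 0
independent_ac = p_a_and_c == p_a * p_c

print(independent_ab, mutually_exclusive_ac, independent_ac)
```

Using `Fraction` keeps the comparison exact. A and B come out independent, while A and C are mutually exclusive and therefore not independent, which is the point of the last bullet above.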
### Example: Independence and Mutual Exclusivity

- Consider a study on hypertension. Suppose that 10% of patients take drug A, 20% take drug B, and 2% take both A and B.
- Are taking drugs A and B independent? ____
  - P(A) = ____
  - P(B) = ____
  - P(A and B) = ____
  - P(A)*P(B) = ____
- Are they mutually exclusive? ____
- What is the probability of taking drug A or B?
  - P(A or B) = ____

## 1.11 Conditional Probability

### Conditional Probability

- The conditional probability of A given B is the probability of event A occurring given that event B has already occurred.
  - A = conditional event = event of interest
  - B = conditioning event = defines limitations for the probability of A occurring

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

### Example: Conditional Probability

- Suppose 500 mothers are seen in a high risk obstetrics practice. 50 of those mothers are being seen for twin pregnancies (TP). 10 of those mothers with twin pregnancies have gestational diabetes (GD).
- In this practice, what is the probability of having GD given that you have a TP?
- P(GD | TP) = P(GD and TP) / P(TP)
  - P(TP) = ____
  - P(GD and TP) = ____
  - P(GD | TP) = ____

### Conditional Probability and Independence

- If A and B are independent events, then P(A and B) = P(A)P(B).
- Therefore, events are independent if and only if their unconditional probabilities are equal to their conditional probabilities.

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A) P(B)}{P(B)} = P(A)$$

### Beware of Notation!

- P(A and B) = P(B and A)
- P(A or B) = P(B or A)
- P(A|B) ≠ P(B|A) unless P(A) = P(B)

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \ne \frac{P(A \text{ and } B)}{P(A)} = P(B \mid A)$$

### Example: Independence and Conditional Probability

- In a study of smoking (S) and coffee drinking (C), 5% of the people smoked but did not drink coffee, 45% drank coffee but did not smoke, 35% drank coffee and smoked, and 15% neither drank coffee nor smoked.
- Is smoking independent of drinking coffee?
- Does P(S) = P(S|C)?
- Fill in the joint probabilities and the table:
  - P(S and NC) = ____
  - P(NS and C) = ____
  - P(S and C) = ____
  - P(NS and NC) = ____

| | Coffee Drinker | Not a Coffee Drinker | Total |
|---|---|---|---|
| Smoker | | | |
| Not a Smoker | | | |
| Total | | | |

Solution:

- Unconditional probability of smoking:
  - P(S) = P(S and C) + P(S and NC) = 0.35 + 0.05 = 0.40
- Conditional probability of smoking:
  - P(S|C) = P(S and C) / P(C)
  - P(C) = P(S and C) + P(NS and C) = 0.35 + 0.45 = 0.80
  - P(S|C) = P(S and C) / P(C) = 0.35/0.80 = 0.4375
- Since P(S) ≠ P(S|C), smoking and coffee drinking are not independent.

### Important Probability Application

- Prevalence of a disease is the probability of currently having the disease, regardless of the duration of time one has had the disease.
  - Prevalence = (# currently with disease) / (# in population)
- Cumulative incidence of a disease is the probability that a person without the disease will develop a new case of the disease over some specified time period t.
  - Cumulative incidence = (# new cases of disease in t) / (# in population at risk in t)

### Example: Prevalence and Cumulative Incidence

[...] 15 of the students currently have the flu. Over the next week, 10 new students develop the flu.

- What is the prevalence of flu at the start of the week?
- What is the prevalence of flu at the end of the week?
- What is the cumulative incidence of flu during that time period?

## 1.15 Introduction to Types of Variables

### Definition: Variables

- Measurements or events that can potentially take on different values between observations in a population or sample.
  - Variable values depend on variable types.
  - Variable types depend on the nature of the data and how the data are measured.

### Types of Variables

- Qualitative (categorical)
  - Nominal: unordered categories
  - Ordinal: ordered categories
- Quantitative (numerical)
  - Discrete: integers
  - Continuous: any number

### High Fiber Diet Plan

A manufacturer was considering marketing crackers high in a certain kind of edible fiber as a dieting aid.
Dieters would consume some crackers before a meal, filling their stomachs so that they would feel less hungry and eat less. A laboratory studied whether people would in fact eat less in this way. 12 overweight female subjects were fed a controlled diet. Before each meal, they ate crackers containing a type of fiber (bran, gum, or mucilage) or no fiber (control). They were then allowed to eat as much as they wished from a prepared menu. The amount of food they consumed (caloric intake) and their weight were monitored, along with any side effects they reported, including gastric or other problems. Unfortunately, some subjects developed uncomfortable bloating and gastric upset from some of the fiber crackers.

### High Fiber Diet Plan: Variables

| Variable Name | Variable Description |
|---|---|
| Subject | An identification for each of the 12 subjects |
| Diet | One of four diets (type of cracker) |
| Cracker | Type of fiber in the cracker |
| Amount | Number of whole crackers consumed |
| Digested | Digested calories (difference between caloric intake and calories passed through system) |
| Bloat | Degree of bloating and flatulence reported by the subjects |

### High Fiber Diet Plan: Data

| Subject | Diet | Cracker | Amount | Digested | Bloat |
|---|---|---|---|---|---|
| 1 | 1 | bran | 4 | 2047.42 | low |
| 2 | 1 | bran | 15 | 2547.77 | none |
| 3 | 1 | bran | 7 | 1752.63 | low |
| 4 | 2 | gum | 11 | 2558.61 | high |
| 5 | 2 | gum | 7 | 1944.48 | med |
| 6 | 2 | gum | 8 | 1871.95 | high |
| 7 | 3 | mucilage | 18 | 2436.79 | low |
| 8 | 3 | mucilage | 16 | 1844.77 | high |
| 9 | 3 | mucilage | 4 | 2125.39 | med |
| 10 | 4 | control | 20 | 2359.90 | none |
| 11 | 4 | control | 14 | 1902.75 | none |
| 12 | 4 | control | 13 | 2125.39 | low |

## 1.15.1 Identifying Variable Types

### Nominal Variable

- Definition: categorical variable that can take on a limited number of possible values that do not have a natural order.
- Assign values to mutually exclusive and collectively exhaustive groups or categories.
- No quantitative information is conveyed and no ordering of the items is implied.
- You can count but not order or measure nominal data.
- Examples:
  - Race/ethnicity = {white, black, Hispanic, American Indian or Alaskan Native, Asian or Pacific Islander, other}
  - Blood type = {A, B, AB, O}
  - Marital status = {never married, married, separated, divorced, widowed}

### Example: High Fiber Diet Plan (nominal)

Which variables in the High Fiber Diet Plan data (Subject, Diet, Cracker, Amount, Digested, Bloat) are nominal?

### Dichotomous (Dummy) Variable

- Definition: categorical variable that can take on only one of two possible values.
  - Special case of a nominal variable.
  - Also referred to as an indicator variable, since it takes on a value of 1 or 0 to indicate inclusion (yes) or exclusion (no).
- Examples:
  - Gender = {1=male, 0=female}
  - Current smoking status = {1=smoker, 0=non-smoker}
  - Diabetic = {1=yes, 0=no}

### Example: Dummy Variable

- For k categories, you need to create k - 1 dummy variables.
- Race/ethnicity = {white, black, Hispanic, American Indian or Alaskan Native, Asian or Pacific Islander, other}
  - RaceW = 1 if white, 0 otherwise
  - RaceB = 1 if black, 0 otherwise
  - RaceH = 1 if Hispanic, 0 otherwise
  - RaceAI = 1 if American Indian or Alaskan Native, 0 otherwise
  - RaceA = 1 if Asian or Pacific Islander, 0 otherwise
- If RaceW = RaceB = RaceH = RaceAI = RaceA = 0, then we know that race is "other".

### Ordinal Variable

- Definition: categorical variable that can take on a limited number of possible values that do have a natural order, so that they can be ranked (put in order) as ascending or descending.
- You can count and order, but not measure, ordinal data.
- There is no "true" zero point for ordinal scales, since the zero point is chosen arbitrarily.
- Higher numbers represent higher values, but intervals between numbers are not equal.
- Arithmetic does not make sense.
- Examples:
  - Education level = {1=Some school, 2=High school graduate, 3=Some college, 4=Associate degree, 5=Bachelor's degree, 6=Master's degree, 7=Professional degree or Doctorate}
  - Symptom severity = {0=minimal, 1=moderate, 2=severe}
  - Height = {-1=short, 0=average, 1=tall}

### Example: High Fiber Diet Plan (ordinal)

Which variables in the High Fiber Diet Plan data (Subject, Diet, Cracker, Amount, Digested, Bloat) are ordinal?

### Discrete Variable (How many?)

- Definition: quantitative variable that can take on only a finite (limited) number of evenly spaced distinct values between defined minimum and maximum values (any two points).
- You can count, order, and measure discrete data.
- Discrete variables are typically counts.
- Think whole numbers!
- Examples:
  - Number of children in a family household
  - Number of patients in a hospital ward
  - Number of students that will pass this class
  - CD4 cell count

### Example: High Fiber Diet Plan (discrete)

Which variables in the High Fiber Diet Plan data (Subject, Diet, Cracker, Amount, Digested, Bloat) are discrete?

### Continuous Variable (How much?)

- Definition: quantitative variable that, in theory, can take on an infinite (unlimited) number of evenly spaced values between defined minimum and maximum values (any two points).
- You can count, order, and measure continuous data.
- Truly continuous values would require infinite precision, so data values are rarely truly continuous due to limitations of measurement instruments.
- If a variable is continuous in theory, then we treat it as such in the analysis.
- Examples: systolic blood pressure, diastolic blood pressure, total cholesterol level, age, height, weight.

### Example: High Fiber Diet Plan (continuous)

Which variables in the High Fiber Diet Plan data (Subject, Diet, Cracker, Amount, Digested, Bloat) are continuous?

## 1.16 Relationships Between Types of Variables

### Example: Continuous

- Consider the following data set containing weights (lbs) of 15 individuals:

130, 135.5, 136, 138.5, 140, 142.5, 147.5, 148.5, 150, 153, 154.5, 167, 169, 187.5, 187.5

- Values reported as the same may not be truly identical; they may appear equal only because of scale limitations.

### Example: Ordinal

- Suppose we used corresponding height data to assign each individual to a BMI category as follows:

| Weight (lbs) | BMI category |
|---|---|
| 130 | Underweight |
| 135.5 | Normal |
| 136 | Normal |
| 138.5 | Normal |
| 140 | Normal |
| 142.5 | Normal |
| 147.5 | Normal |
| 148.5 | Normal |
| 150 | Overweight |
| 153 | Overweight |
| 154.5 | Overweight |
| 167 | Overweight |
| 169 | Overweight |
| 187.5 | Obese |
| 187.5 | Obese |

Counts: Underweight (n=1), Normal (n=7), Overweight (n=5), Obese (n=2).

- You lose information when you derive ordinal data from continuous data!

### Example: Dummy

- Suppose we used the same BMI categories to assign each individual to one of two groups, Normal (-) or Overweight (+):

| Weight (lbs) | Group |
|---|---|
| 130 | Normal (-) |
| 135.5 | Normal (-) |
| 136 | Normal (-) |
| 138.5 | Normal (-) |
| 140 | Normal (-) |
| 142.5 | Normal (-) |
| 147.5 | Normal (-) |
| 148.5 | Normal (-) |
| 150 | Overweight (+) |
| 153 | Overweight (+) |
| 154.5 | Overweight (+) |
| 167 | Overweight (+) |
| 169 | Overweight (+) |
| 187.5 | Overweight (+) |
| 187.5 | Overweight (+) |

Counts: Normal (-) (n=8), Overweight (+) (n=7).

- You lose even more information when you derive dummy data from continuous data!
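The loss of information described above can be made concrete by counting how many distinct values survive each recoding. This is a minimal sketch, assuming the weight-to-category pairing read from the slide layout (weights alone cannot determine BMI, since the height data are not in the notes); the variable names are hypothetical.

```python
from collections import Counter

# (weight in lbs, BMI category) pairs as read from the notes' slide layout.
records = [
    (130.0, "Underweight"), (135.5, "Normal"), (136.0, "Normal"),
    (138.5, "Normal"), (140.0, "Normal"), (142.5, "Normal"),
    (147.5, "Normal"), (148.5, "Normal"), (150.0, "Overweight"),
    (153.0, "Overweight"), (154.5, "Overweight"), (167.0, "Overweight"),
    (169.0, "Overweight"), (187.5, "Obese"), (187.5, "Obese"),
]

weights = [w for w, _ in records]          # continuous
ordinal = [cat for _, cat in records]      # ordinal: four ordered categories
# Dummy: 1 = overweight or obese ("Overweight +"), 0 = otherwise ("Normal -").
dummy = [1 if cat in ("Overweight", "Obese") else 0 for cat in ordinal]

# Number of distinct values that remain at each level of measurement.
distinct_counts = {
    "continuous": len(set(weights)),
    "ordinal": len(set(ordinal)),
    "dummy": len(set(dummy)),
}
ordinal_counts = Counter(ordinal)
dummy_counts = Counter(dummy)

print(distinct_counts)
print(dict(ordinal_counts))
print(dict(dummy_counts))
```

Each recoding is one-way: the categories can be derived from the measurements, but the original weights cannot be recovered from the categories.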
## 1.17 Guided Practice: Statistics Calculations

### Statistics for Continuous Data

- Measures of location or central tendency:
  - Mean
  - Median
  - Mode
- Measures of spread:
  - Range
  - Interquartile Range
  - Variance
  - Standard Deviation
- Both summarize or describe the data.

### Measure of Location: Mean

- MEAN means AVERAGE.
  - The sum of a set of values divided by the total number of values.
  - Notation ("x-bar"): $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
  - The mean is at the center of the data set in the sense that it is the balance point for the data.
  - Sensitive to extreme values.

### Example: Mean

- Weight data (n=15): add all values, then divide the sum by n.
  - 130 + 142.5 + 154.5 + 135.5 + 147.5 + 167 + 136 + 148.5 + 169 + 138.5 + 150 + 187.5 + 140 + 153 + 187.5 = 2287
  - Mean = 2287 / 15 = ____
- T/F test scores data (n=14):
  - 48 + 68 + 100 + 72 + 68 + 64 + 52 + 68 + 60 + 80 + 72 + 84 + 100 + 92 = 1028
  - Mean = 1028 / 14 = ____

### Measure of Location: Median

- MEDIAN is in the MIDDLE.
- The middle value of a set of values arranged in order of increasing or decreasing magnitude.
  - Odd #: value located in the exact middle of the ordered set.
  - Even #: mean of the two middle values in the ordered set.
- Notation ("x-tilde"): $\tilde{x}$
- Resistant to extreme values.

### Example: Median

- Weight data (n=15), odd n: sort and find the middle value.
  - 130, 135.5, 136, 138.5, 140, 142.5, 147.5, 148.5, 150, 153, 154.5, 167, 169, 187.5, 187.5
- T/F test scores data (n=14), even n: sort and find the mean of the 2 middle values.
  - 48, 52, 60, 64, 68, 68, 68, 72, 72, 80, 84, 92, 100, 100

### Measure of Location: Mode

- MODE is the MOST popular.
- The value that appears most frequently in a set of values.
  - Bimodal if 2 values appear the most.
  - Multimodal if >2 values appear the most.
  - No mode if no value is repeated.
- Notation: M
- More appropriate for nominal data, since it is not very meaningful for continuous data.
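The three measures of location above can be checked with Python's standard `statistics` module. This is a sketch for verifying hand calculations, not part of the course materials; the helper `summarize` is a hypothetical name, and `statistics.multimode` (Python 3.8+) is used so that bimodal or multimodal data would return every mode.

```python
import statistics

# Weight data (lbs, n=15) and T/F test scores (n=14) from the examples above.
weights = [130, 142.5, 154.5, 135.5, 147.5, 167, 136, 148.5, 169,
           138.5, 150, 187.5, 140, 153, 187.5]
scores = [48, 68, 100, 72, 68, 64, 52, 68, 60, 80, 72, 84, 100, 92]

def summarize(values):
    """Return the mean, median, and list of modes for a sample."""
    return {
        "mean": statistics.mean(values),        # sum of values / n
        "median": statistics.median(values),    # middle value, or mean of the two middle values
        "modes": statistics.multimode(values),  # every value tied for most frequent
    }

weight_summary = summarize(weights)
score_summary = summarize(scores)

print(weight_summary)
print(score_summary)
```

The median functions sort the data internally, so the values can be entered in the order they appear on the slide.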
### Example: Mode

- Weight data (n=15): find the value that appears the most.
  - 130, 135.5, 136, 138.5, 140, 142.5, 147.5, 148.5, 150, 153, 154.5, 167, 169, 187.5, 187.5
- T/F test scores data (n=14):
  - 48, 52, 60, 64, 68, 68, 68, 72, 72, 80, 84, 92, 100, 100
  - Mode = ____

### Measures of Spread

- Variation refers to the amount that values differ (vary) among themselves.
- Values that are relatively close together have low measures of variation.
- Values that are spread far apart have high measures of variation.

### Measure of Spread: Range

- RANGE is the distance between.
  - Measure of variation that is the difference between the highest and lowest values in a set of numbers.
  - Inferior to other measures, since it depends only on the highest and lowest values.

### Example: Range

- Weight data (n=15):
  - 130, 135.5, 136, 138.5, 140, 142.5, 147.5, 148.5, 150, 153, 154.5, 167, 169, 187.5, 187.5
  - Range = ____
- T/F test scores data (n=14):
  - 48, 52, 60, 64, 68, 68, 68, 72, 72, 80, 84, 92, 100, 100
  - Range = ____

### Measure of Spread: IQR

- Interquartile range is the difference between the third quartile (75th percentile) and the first quartile (25th percentile).
- The first quartile (25th percentile), median (50th percentile), and third quartile (75th percentile) split the ordered data into four parts of 25% each; Q3 - Q1 spans the middle 50%.
- If np/100 is not an integer, then:
  - pth percentile = kth largest sample point, where k = smallest integer greater than np/100.
- If np/100 is an integer, then:
  - pth percentile = the mean of the kth and lth largest sample points, where k = np/100 and l = np/100 + 1.

### Example: IQR

- Weight data (n=15):
  - 130, 135.5, 136, 138.5, 140, 142.5, 147.5, 148.5, 150, 153, 154.5, 167, 169, 187.5, 187.5
  - For each of the two quartiles: np/100 = ____, k = ____
  - IQR = ____
- T/F test scores data (n=14):
  - 48, 52, 60, 64, 68, 68, 68, 72, 72, 80, 84, 92, 100, 100
  - For each of the two quartiles: np/100 = ____, k = ____
  - IQR = ____

### Measures of Spread: Variance and Standard Deviation

- Definition: the variance and standard deviation of a set of sample values are measures of dispersion representing the variation of values about the mean.
- Measures of average deviation or average distance between each of a set of data points and their mean value.
- Variance is essentially the mean of the squared deviations from the mean value (for a sample, the sum is divided by n - 1).
- Standard deviation is equal to the square root of the variance.

### Measure of Spread: Variance

- Steps to calculate variance:
  - Find the mean $\bar{x}$.
  - Find each value's deviation from the mean, $x_i - \bar{x}$.
  - Square each deviation, $(x_i - \bar{x})^2$.
  - Add all of the squared deviations, $\sum_{i=1}^{n} (x_i - \bar{x})^2$.
  - Divide the sum of squared deviations by n - 1.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Units will be squared!

### Example: Variance (Weight Data)

The mean is shown rounded to 152.5; the squared deviations below are consistent with the unrounded mean 2287/15 ≈ 152.47.

| Weight - Mean | Deviation² |
|---|---|
| 130 - 152.5 | 504.8 |
| 135.5 - 152.5 | 287.9 |
| 136 - 152.5 | 271.2 |
| 138.5 - 152.5 | 195.1 |
| 140 - 152.5 | 155.4 |
| 142.5 - 152.5 | 99.3 |
| 147.5 - 152.5 | 24.7 |
| 148.5 - 152.5 | 15.7 |
| 150 - 152.5 | 6.1 |
| 153 - 152.5 | 0.3 |
| 154.5 - 152.5 | 4.1 |
| 167 - 152.5 | 211.2 |
| 169 - 152.5 | 273.4 |
| 187.5 - 152.5 | 1227.3 |
| 187.5 - 152.5 | 1227.3 |
| Sum | 4503.7 |

Variance = 4503.7 / 14 = ____

### Measure of Spread: Standard Deviation

- Steps to calculate standard deviation:
  - Find the variance.
  - Take the square root.

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

### Example: Standard Deviation

- Weight data:
  - Variance $s^2$ = 321.7
  - Standard deviation $s = \sqrt{\text{variance}} = \sqrt{321.7}$ = ____

## 1.18 Guided Practice: Translating and Shifting Data

### Translating (Shifting) Data

- Suppose you "shift" a sample of data points by adding a constant c as follows:

| Original Data | Shifted Data |
|---|---|
| $y_1$ | $x_1 = y_1 + c$ |
| $y_2$ | $x_2 = y_2 + c$ |
| ⋮ | ⋮ |
| $y_n$ | $x_n = y_n + c$ |

- The relationship of the points relative to one another remains the same.
- Measures of location change, but measures of spread remain the same.
- Common when changing the origin of data.
- If $x_i = y_i + c$, then:
  - Mean increases or decreases by c: $\bar{x} = \bar{y} + c$
  - Variance remains the same: $s_x^2 = s_y^2$
  - Standard deviation remains the same: $s_x = s_y$

### Example: Shifting Data

- High Fiber Diet Plan
- Compare digested calories to the daily recommended intake of 2000 calories.
- c = -2000

| Digested | c | Shifted |
|---|---|---|
| 2047.42 | -2000 | 47.42 |
| 2547.77 | -2000 | 547.77 |
| 1752.63 | -2000 | -247.37 |
| 2558.61 | -2000 | 558.61 |
| 1944.48 | -2000 | -55.52 |
| 1871.95 | -2000 | -128.05 |
| 2436.79 | -2000 | 436.79 |
| 1844.77 | -2000 | -155.23 |
| 2125.39 | -2000 | 125.39 |
| 2359.90 | -2000 | 359.90 |
| 1902.75 | -2000 | -97.25 |
| 2125.39 | -2000 | 125.39 |
| Mean: ____ | | Mean: ____ |
| Standard deviation: ____ | | Standard deviation: ____ |

### Rescaling Data

- Suppose you "rescale" a sample of data points by multiplying by a constant c as follows:

| Original Data | Rescaled Data |
|---|---|
| $y_1$ | $x_1 = c y_1$ |
| $y_2$ | $x_2 = c y_2$ |
| ⋮ | ⋮ |
| $y_n$ | $x_n = c y_n$ |

- The relationship of the points relative to one another changes.
- Both measures of location and measures of spread change.
- Common when converting units.
- If $x_i = c y_i$, then:
  - Mean changes by a factor of c: $\bar{x} = c \bar{y}$
  - Variance changes by a factor of $c^2$: $s_x^2 = c^2 s_y^2$
  - Standard deviation changes by a factor of c: $s_x = c s_y$

### Example: Rescaling Data

- T/F test (25 questions worth 4% each): transform number correct into percentage score, c = 4.
- Other rescaling examples: T/F test, convert minutes to hours; weight data, convert lbs to kg.

| Number Correct | c | Score |
|---|---|---|
| 12 | 4 | 48 |
| 17 | 4 | 68 |
| 25 | 4 | 100 |
| 18 | 4 | 72 |
| 17 | 4 | 68 |
| 16 | 4 | 64 |
| 13 | 4 | 52 |
| 17 | 4 | 68 |
| 15 | 4 | 60 |
| 20 | 4 | 80 |
| 18 | 4 | 72 |
| 21 | 4 | 84 |
| 25 | 4 | 100 |
| 23 | 4 | 92 |
| Mean: ____ | | Mean: ____ |
| Standard deviation: ____ | | Standard deviation: ____ |

## 1.18 Statistics for Other Data Types

### Measures of Location

- Dichotomous (dummy) variable
  - Mean represents a probability (frequency/observations)
  - Mode (category that appears the most)
- Ordinal
  - Median (category that falls in the middle of ordered data)
  - Mode (category that appears the most)
- Nominal
  - Mode (category that appears the most)

Measures of spread do not apply here!
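The shifting and rescaling rules from the guided practice can be verified numerically on the two example data sets. The following is a sketch under simple assumptions: it uses the sample (n - 1) standard deviation from Python's `statistics` module, and the helper `mean_sd` and the variable names are hypothetical.

```python
import math
import statistics

# Digested calories (High Fiber Diet Plan) and number correct on the T/F test.
digested = [2047.42, 2547.77, 1752.63, 2558.61, 1944.48, 1871.95,
            2436.79, 1844.77, 2125.39, 2359.90, 1902.75, 2125.39]
number_correct = [12, 17, 25, 18, 17, 16, 13, 17, 15, 20, 18, 21, 25, 23]

def mean_sd(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

# Shifting: x_i = y_i + c with c = -2000.
shift = -2000
shifted = [y + shift for y in digested]
mean_y, sd_y = mean_sd(digested)
mean_shifted, sd_shifted = mean_sd(shifted)

# Rescaling: x_i = c * y_i with c = 4 (4% per question).
scale = 4
rescaled = [scale * y for y in number_correct]
mean_n, sd_n = mean_sd(number_correct)
mean_rescaled, sd_rescaled = mean_sd(rescaled)

# Shift: the mean moves by c, the standard deviation is unchanged.
shift_ok = (math.isclose(mean_shifted, mean_y + shift, abs_tol=1e-6)
            and math.isclose(sd_shifted, sd_y, abs_tol=1e-6))
# Rescale: the mean and standard deviation are both multiplied by c.
rescale_ok = (math.isclose(mean_rescaled, scale * mean_n)
              and math.isclose(sd_rescaled, scale * sd_n))

print(shift_ok, rescale_ok)
```

The comparisons use a small tolerance because adding a constant to floating-point values introduces rounding error; the identities themselves are exact.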
