Class Note for PUBHLTH 540 at UMass

University of Massachusetts

These 10 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015.

Date Created: 02/06/15
PubHlth 540 Introductory Biostatistics
Unit 2 Summarizing Data
Week 2 Practice Problems
Solutions
(Wk2_solutions.doc)

1A

A stem and leaf diagram might come in handy. Stems are to the left of the bar; leaves are to the right.

    Unsorted                          Sorted
    3 | 6 8 8 5 1 8 6 5               3 | 1 5 5 6 6 8 8 8
    4 | 5 0 1 6 5 1 6 5 3 1 0         4 | 0 0 1 1 1 3 5 5 5 6 6
    5 | 3 9 1 1 3                     5 | 1 1 3 3 9
    6 | 9 0                           6 | 0 9

n = 26

MEAN
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1156}{26} = 44.46$, so $\bar{x} = 44.5$.

MEDIAN
First solve $(n+1)/2 = (26+1)/2 = 13.5$. The median is the midpoint of the 13th and 14th observations: $(41 + 43)/2 = 42$, so median = 42.

MODE
This sample is tri-modal: 38, 41, 45.

RANGE
Maximum - Minimum = 69 - 31, so range = 38.

VARIANCE
Let's save ourselves the trouble of a very long brute-force formula by using the formula for grouped data. Let j index the unique values. There are 14 unique values. Deviations are taken about the rounded mean, 44.5.

     j    x_j    f_j    (x_j - xbar)^2    f_j (x_j - xbar)^2
     1     31     1         182.25             182.25
     2     35     2          90.25             180.50
     3     36     2          72.25             144.50
     4     38     3          42.25             126.75
     5     40     2          20.25              40.50
     6     41     3          12.25              36.75
     7     43     1           2.25               2.25
     8     45     3           0.25               0.75
     9     46     2           2.25               4.50
    10     51     2          42.25              84.50
    11     53     2          72.25             144.50
    12     59     1         210.25             210.25
    13     60     1         240.25             240.25
    14     69     1         600.25             600.25
    TOTALS       26                           1998.50

$S^2 = \frac{\sum_{j=1}^{14} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{14} f_j - 1} = \frac{1998.50}{26 - 1}$, so $S^2 = 79.94$.

Standard deviation: $S = \sqrt{S^2} = \sqrt{79.94}$, so $S = 8.94$.

25th Percentile
First solve 0.25 n = (0.25)(26) = 6.5. So the 25th percentile is the 7th observation: P25 = 38.

75th Percentile
First solve 0.75 n = (0.75)(26) = 19.5. So the 75th percentile is the 20th observation: P75 = 51.

1B

    2 | 5 5 5 5 5 5 5 5 5
    2 | 6 6 6 6 6
    2 | 8 8 8
    3 | 0 1
    3 | 4 4

n = 21

MEAN
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{568}{21} \approx 27.05$, so $\bar{x} = 27.0$.

MEDIAN
Solving $(n+1)/2 = (21+1)/2 = 11$. The median is the 11th observation, so median = 26.

MODE
Mode = 25.

RANGE
Maximum - Minimum = 34 - 25, so range = 9.

VARIANCE
There are 6 unique values. Deviations are taken about the rounded mean, 27.0.

     j    x_j    f_j    (x_j - xbar)^2    f_j (x_j - xbar)^2
     1     25     9            4                 36
     2     26     5            1                  5
     3     28     3            1                  3
     4     30     1            9                  9
     5     31     1           16                 16
     6     34     2           49                 98
    TOTALS       21                             167

$S^2 = \frac{\sum_{j=1}^{6} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{6} f_j - 1} = \frac{167}{20}$, so $S^2 = 8.35$.

Standard deviation: $S = \sqrt{S^2} = \sqrt{8.35}$, so $S = 2.89$.

25th Percentile
Solving 0.25 n = (0.25)(21) = 5.25. So the 25th percentile is the 6th observation: P25 = 25.
Note: I get this by noticing from the table above that the smallest value, 25, occurs with a frequency of 9 times in the sample.

75th Percentile
Solving 0.75 n = (0.75)(21) = 15.75. So the 75th percentile is the 16th observation: P75 = 28.
Note: I get this by noticing in the table that the value 28 occurs with a frequency of 3 times in the sample and comes after the first 9 observations (all equal to 25) and after the next 5 observations (all equal to 26), so that the value of 28 is the 15th, 16th and 17th observations in the ordered sample.

1C

REMINDER: Use the same scale when comparing two groups.

    Group                   Patients    Controls
    Mean                      44.5        27.0
    Median                    42          26
    P25                       38          25
    P75                       51          28
    Interquartile Range       13           3
    P25 - 1.5 IQR             18.5        20.5
    P75 + 1.5 IQR             70.5        32.5
    Min                       31          25
    Max                       69          34

Notes on Whiskers
1. If P25 - 1.5 IQR < minimum of the actual data, use the minimum of the actual data instead.
2. If P75 + 1.5 IQR > maximum of the actual data, use the maximum of the actual data instead.

[Figure: Exercise 1C, Box and Whisker Plot. Vertical axis: ZAS Score, 25 to 70. Groups: Healthy (n=21) and Panic Disorder (n=26).]

2A

    Class        Class               Relative    Cumulative   Cumulative
    Endpoints    Midpoint  Frequency Frequency   Frequency    Relative Freq
     5 - 14.99      10         5       .067           5          .067
    15 - 24.99      20        10       .133          15          .200
    25 - 34.99      30        20       .267          35          .467
    35 - 44.99      40        22       .293          57          .760
    45 - 54.99      50        13       .173          70          .933
    55 - 64.99      60         5       .067          75         1.000
    TOTALS                    75      1.000

2B

A cumulative relative frequency polygon for grouped data is, unfortunately, not straightforward in SAS, Stata, SPSS or Minitab.

Solution using Excel:

Step 1: Enter your x and y points into your worksheet such that
    x = endpoint of class interval
    y = cumulative relative frequency for the interval
Note: Be sure to include an (x, y) = (0, 0).

    x = age    y = cumulative relative frequency
       0          0
      15          0.067
      25          0.2
      35          0.467
      45          0.76
      55          0.933
      65          1

Step 2: Use the chart wizard in Excel as follows.
Highlight the data you want to plot.
Click on the chart wizard from the upper toolbar.
Under Chart Type: select XY (Scatter).
Under Chart subtype: highlight the plot with the dots connected.
Click Next.

[Screenshot: Chart Wizard, Step 1 of 4, Chart Type.]

You should see the following. Click Next.

[Screenshot: Chart Wizard, Step 2 of 4, Chart Source Data.]

You will then see a menu that lets you add legends, titles, etc. If you like, you can also change such things as shading, tick marks, etc.

[Screenshot: Chart Wizard, Step 3 of 4, with tabs for Axes, Gridlines, Legend and Data Labels.]

After some aesthetics on my part, this is what I got:

[Figure: Week 2 Problem 2B. Cumulative Relative Frequency (vertical axis) against Age in years (horizontal axis).]

Estimates are P10 = 17, P25 = 26, P50 = 36, P75 = 44.5.

2C

    Midpoint   Frequency
      x_j        f_j       f_j x_j    (x_j - xbar)    f_j (x_j - xbar)^2
       10          5          50         -25.7            3302.45
       20         10         200         -15.7            2464.90
       30         20         600          -5.7             649.80
       40         22         880           4.3             406.78
       50         13         650          14.3            2658.37
       60          5         300          24.3            2952.45
    Total         75        2680                         12434.75

MEAN
$\bar{x} = \frac{\sum_{j=1}^{6} f_j x_j}{\sum_{j=1}^{6} f_j} = \frac{2680}{75}$, so $\bar{x} = 35.7$.

MEDIAN
Note to reader: I've consulted a number of texts on this. There is no single correct answer. With interval data, whatever median you calculate is an approximation. Here is what is suggested in Think and Explain with Statistics (Lincoln E. Moses), page 64.

First solve $(n+1)/2 = (75+1)/2 = 38$, the 38th observation. Examination of the table reveals that the 38th observation is in the interval 35 to 44.99. Set the following quantities:

    l = lower limit of interval = 35
    u = upper limit of interval = 44.99
    R = cumulative frequency up to the lower limit of interval = 35
    M = number of observations contained in interval = 22
    N = total number of observations = 75

An approximate solution for the median is calculated as

$l + \frac{N/2 - R}{M}(u - l) = 35 + \frac{75/2 - 35}{22}(44.99 - 35) = 36.135$, or about 36.

VARIANCE
$S^2 = \frac{\sum_{j=1}^{6} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{6} f_j - 1} = \frac{12434.75}{74}$, so $S^2 = 168.04$.

Standard deviation: $S = \sqrt{S^2}$, so $S = 13.0$.

3A

Remember to use the same scale!

    Sample                         1        2        3
    median                         8        8        7
    mean                          13.6      6.4      8.2
    left box edge (P25)            5        4        5
    right box edge (P75)          11       11       12
    IQR = P75 - P25                6        7        7
    P25 - 1.5 IQR                 -4       -6.5     -5.5
    P75 + 1.5 IQR                 20       21.5     22.5
    left whisker                   5        1        4
    right whisker                 20       13       14

3B

When data are skewed by extreme values, medians and quartiles give a better feel for the bulk of the data than do means and standard deviations. This example also illustrates that as sample size increases, the range can only increase. Notice that the extreme value of 40 occurred in the sample with the largest sample size.
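The hand calculations in Exercises 1A and 2C can be cross-checked with a short script. What follows is only a sketch in Python, not part of the original solutions. The helper names (`percentile_by_rank`, `grouped_mean`, `grouped_variance`, `interpolated_median`) are hypothetical. The percentile helper assumes that 0.25 n or 0.75 n is rounded up to the next whole rank, as the solutions do when 6.5 becomes the 7th observation. The grouped variance is taken about the mean rounded to one decimal place, again mirroring the solutions.

```python
import math
import statistics

# Exercise 1A raw data (26 patients), read off the stem and leaf diagram.
patients = [31, 35, 35, 36, 36, 38, 38, 38, 40, 40, 41, 41, 41,
            43, 45, 45, 45, 46, 46, 51, 51, 53, 53, 59, 60, 69]

# Exercise 2A/2C grouped data: (lower limit, upper limit, midpoint, frequency).
classes = [(5, 14.99, 10, 5), (15, 24.99, 20, 10), (25, 34.99, 30, 20),
           (35, 44.99, 40, 22), (45, 54.99, 50, 13), (55, 64.99, 60, 5)]


def percentile_by_rank(data, p):
    """Return the ceil(p * n)-th ordered observation.

    Assumption: a fractional rank is rounded up (6.5 -> 7th observation).
    The solutions do not say what to do when p * n is a whole number.
    """
    rank = math.ceil(p * len(data))
    return sorted(data)[rank - 1]


def grouped_mean(classes):
    """Mean of grouped data: sum(f_j * x_j) / sum(f_j), x_j = class midpoint."""
    total = sum(freq for _, _, _, freq in classes)
    return sum(freq * mid for _, _, mid, freq in classes) / total


def grouped_variance(classes, mean):
    """Sample variance of grouped data about `mean`, divisor sum(f_j) - 1."""
    total = sum(freq for _, _, _, freq in classes)
    squares = sum(freq * (mid - mean) ** 2 for _, _, mid, freq in classes)
    return squares / (total - 1)


def interpolated_median(classes):
    """Approximate median l + ((N/2 - R) / M) * (u - l) for grouped data."""
    n_total = sum(freq for _, _, _, freq in classes)
    position = (n_total + 1) / 2   # rank used to locate the median class
    below = 0                      # R: observations below the current class
    for lower, upper, _, freq in classes:
        if below + freq >= position:
            return lower + (n_total / 2 - below) / freq * (upper - lower)
        below += freq
    raise ValueError("classes contain no observations")


# Exercise 1A summaries from the raw data.
mean_1a = statistics.mean(patients)
median_1a = statistics.median(patients)
modes_1a = statistics.multimode(patients)
var_1a = statistics.variance(patients)
p25_1a = percentile_by_rank(patients, 0.25)
p75_1a = percentile_by_rank(patients, 0.75)

# Exercise 2C summaries from the grouped table.
mean_2c = grouped_mean(classes)
var_2c = grouped_variance(classes, round(mean_2c, 1))
median_2c = interpolated_median(classes)

print("1A:", mean_1a, median_1a, modes_1a, var_1a, p25_1a, p75_1a)
print("2C:", mean_2c, var_2c, math.sqrt(var_2c), median_2c)
```

After rounding, the printed values should agree with the figures worked out by hand above. Note that `statistics.variance` uses the unrounded mean, so its result for 1A can differ from the table-based value in later decimal places.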

