Lecture 3 6315

Texas A&M University-Corpus Christi

Covers basic renaming of a data set and extracting of certain columns, measures of center (median, mean, trimmed, geometric, and harmonic means), and measures of spread (range, standard deviation, variance, IQR, quantiles).
Course: STATISTICAL METHODS RSRCH I
Professor: Dr. Blair Sterba-Boatwright
Type: Class Notes
Pages: 7
Concepts: Graduate Statistics, R, RStudio, Statistics, Stats

These 7 pages of class notes were uploaded by Adam Bynum on Friday, September 2, 2016. The notes belong to 6315 at Texas A&M University-Corpus Christi, taught by Dr. Blair Sterba-Boatwright in Fall 2015.


Date Created: 09/02/16
StatsNotes3
Adam
September 3, 2016

```r
load(file = "paruelo.RData")  # restores the saved object(s), including paruelo
# Basic stats from a data set.
# Playing around with data set commands
plo <- paruelo
plo
##      C3   C4  MAP  MAT JJAMAP DJFMAP   LONG   LAT
## 1  0.65 0.00  199 12.4   0.12   0.45 119.55 46.40
## 2  0.65 0.00  469  7.5   0.24   0.29 114.27 47.32
## 3  0.76 0.01  536  7.2   0.24   0.20 110.78 45.78
## 4  0.75 0.18  476  8.2   0.35   0.15 101.87 43.95
## 5  0.33 0.28  484  4.8   0.40   0.14 102.82 46.90
## 6  0.03 0.83  623 12.0   0.40   0.11  99.38 38.87
## 7  0.00 0.31  259 14.5   0.47   0.17 106.75 32.62
## 8  0.02 0.87  969 15.3   0.30   0.14  96.55 36.95
## 9  0.05 0.72  542 13.9   0.44   0.13 101.53 35.30
## 10 0.05 0.44  421  8.5   0.31   0.14 104.60 40.82
## 11 0.36 0.41  446  5.1   0.41   0.15 102.50 47.75
## 12 0.00 0.50  376 11.2   0.51   0.17 105.55 33.48
## 13 0.21 0.70  661 17.8   0.27   0.16  99.23 33.33
## 14 0.51 0.25  575  6.1   0.36   0.16  99.10 45.33
## 15 0.07 0.78  885 12.9   0.37   0.12  96.60 39.10
## 16 0.29 0.70  556  8.6   0.38   0.12 101.80 41.55
## 17 0.05 0.00  344  6.6   0.18   0.35 112.67 43.73
## 18 0.13 0.00  415  6.0   0.22   0.33 112.15 44.25
## 19 0.00 0.31  347 18.9   0.41   0.12 102.92 29.58
## 20 0.65 0.13  575  5.3   0.36   0.13 103.45 43.53
## 21 0.00 0.76  477 13.9   0.50   0.22 110.50 31.60
## 22 0.89 0.00  370  3.5   0.45   0.19 107.72 50.70
## 23 0.08 0.30  537 16.7   0.32   0.12 101.18 32.97
## 24 0.47 0.37  870 15.4   0.29   0.15  97.23 36.05
## 25 0.00 0.00  356  7.3   0.18   0.31 113.08 41.87
## 26 0.00 0.42  570 12.2   0.42   0.24 109.12 32.00
## 27 0.21 0.48  457  8.8   0.36   0.13 102.33 43.75
## 28 0.29 0.00  327  9.5   0.16   0.31 113.25 38.17
## 29 0.35 0.31  176 13.8   0.25   0.28 111.87 37.10
## 30 0.28 0.40  345  6.4   0.46   0.14 104.47 45.03
## 31 0.45 0.00  405  5.9   0.40   0.12 116.75 39.82
## 32 0.31 0.00  375  5.7   0.16   0.29 107.17 41.42
## 33 0.02 0.11  231 12.1   0.29   0.21 111.30 38.52
## 34 0.49 0.00  723 11.2   0.14   0.34 111.95 40.17
## 35 0.36 0.00  365  6.9   0.15   0.31 114.90 39.16
## 36 0.01 0.95  373 11.5   0.35   0.18 104.50 38.55
## 37 0.00 0.60  733  9.4   0.23   0.37 112.35 35.27
## 38 0.00 0.84  351 11.7   0.45   0.18 105.08 34.28
## 39 0.63 0.00  244  7.2   0.20   0.29 110.38 40.47
## 40 0.03 0.00  244  7.2   0.20   0.29 109.65 40.45
## 41 0.07 0.00  207  7.2   0.22   0.28 109.75 39.88
## 42 0.12 0.00  646 10.0   0.11   0.36 111.87 39.92
## 43 0.36 0.00  593 11.0   0.13   0.35 111.75 40.12
## 44 0.63 0.15  409  7.5   0.30   0.20 106.48 45.82
## 45 0.86 0.03  409  7.5   0.30   0.20 106.48 45.87
## 46 0.47 0.12  409  7.5   0.30   0.20 106.47 45.88
## 47 0.58 0.05  409  7.5   0.30   0.20 106.37 45.85
## 48 0.68 0.02  495  7.9   0.32   0.17 106.00 45.48
## 49 0.71 0.11  717  7.5   0.34   0.15  97.00 43.50
## 50 0.06 0.76  935 20.7   0.23   0.22  96.83 30.58
## 51 0.04 0.86  976 21.2   0.29   0.18  97.00 29.00
## 52 0.14 0.71 1011 17.7   0.23   0.19  96.00 33.75
## 53 0.06 0.80  823 19.5   0.20   0.21  97.17 31.33
## 54 0.34 0.00  207  7.2   0.22   0.28 109.83 39.66
## 55 0.31 0.00  321  6.5   0.19   0.35 110.25 40.08
## 56 0.21 0.00  418  5.9   0.20   0.38 110.83 40.50
## 57 0.69 0.00  418  5.9   0.20   0.38 110.75 40.58
## 58 0.48 0.00  270  6.9   0.21   0.27 110.00 40.50
## 59 0.02 0.00  327  9.5   0.16   0.31 113.25 38.50
## 60 0.08 0.28  327  9.5   0.16   0.31 113.25 38.50
## 61 0.11 0.63  841 20.1   0.23   0.22  98.33 30.25
## 62 0.11 0.00  332  7.8   0.12   0.40 115.75 40.33
## 63 0.23 0.17  421  3.9   0.29   0.18 105.28 41.12
## 64 0.18 0.66  527  7.4   0.28   0.16 104.82 41.25
## 65 0.05 0.68  430  9.5   0.27   0.16 105.12 42.07
## 66 0.19 0.24  512  2.0   0.42   0.12  97.50 49.87
## 67 0.48 0.40  611  4.2   0.39   0.16  96.62 47.75
## 68 0.02 0.05  169 17.1   0.28   0.32 116.08 36.83
## 69 0.00 0.04  180 15.1   0.10   0.49 117.83 36.00
## 70 0.00 0.00  117 18.1   0.21   0.39 116.25 36.67
## 71 0.07 0.80  789 12.9   0.34   0.13  97.62 38.75
## 72 0.31 0.39  892  6.6   0.35   0.15  93.20 45.40
## 73 0.72 0.01  352  2.0   0.46   0.14 106.63 52.13
# plo <- paruelo stores the data set under a shorter name (paruelo itself is
# still there). "paruelo" isn't that long, but knowing how to do this is very
# important if you are working with a data set named something like
# "Genetic Diversity Across 8 Islands with a Bottleneck at..."
# Instead of typing that out every time you use the data set, give it a short name.
# Typing plo on its own line just prints the data set.
```
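The "renaming" step above can be illustrated with a minimal sketch. It uses a small made-up data frame in place of paruelo. The long object name `genetic_diversity_across_8_islands` and its values are hypothetical. In R, `short <- long` binds the same data to a second name. Changing the short copy later does not change the original, and `rm()` removes the long name if you no longer want it.

```r
# Hypothetical stand-in for a data set with a long name (values are made up).
genetic_diversity_across_8_islands <- data.frame(
  C3  = c(0.65, 0.76, 0.33),
  MAT = c(12.4, 7.2, 4.8)
)

# "Rename" by giving the same data a short name.
gd <- genetic_diversity_across_8_islands
identical(gd, genetic_diversity_across_8_islands)  # TRUE: same contents

# R copies on modify: changing gd leaves the original untouched.
gd$MAT[1] <- 0
long_first_mat <- genetic_diversity_across_8_islands$MAT[1]
long_first_mat                                      # still 12.4

# If you only want the short name, remove the long one afterwards.
rm(genetic_diversity_across_8_islands)
exists("genetic_diversity_across_8_islands")        # FALSE
```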
```r
# Printing the data set is a quick way to check that the renaming worked.
# We can also pull out specific columns from a data set and store them.
mat <- plo$MAT
# This stores the MAT column of plo as its own numeric vector.
# Note: you can call it whatever you like, for example
I.can.name.MAT.this <- plo$MAT  # gives the same result as mat <- plo$MAT
#################
# Measures of the center. To find these basic measures, just do the following:
# Finding the median is simple
median(mat)
## [1] 8.5
# The basic mean is also simple
mean(mat)  # finds the (arithmetic) mean of the vector
## [1] 9.99863
# We can trim the mean by whatever percent we want.
mean(mat, trim = .05)  # 5% trimmed: drops the lowest 5% and the highest 5% of values
## [1] 9.856716
# Why would we do this? It cuts outliers out of the mean calculation while
# leaving them in the data set for future use.
# We take the mean of the middle 90% (5% + 5% = 10% removed; 100% - 10% = 90%),
# so we use .05 for the trim.
# R drops floor(n * trim) values from each end, so with n = 73 that is
# floor(3.65) = 3 values from each end.
# Example 2: let's take it to the extreme and look only at the middle 1%
# (not generally smart, but useful for teaching)
mean(mat, trim = .495)
## [1] 8.5
# 49.5% low + 49.5% high = 99% cut off, leaving the middle 1%.
# Here floor(73 * .495) = 36 values go from each end, leaving only the middle
# value, which is why the result equals the median.
# The geometric mean multiplies all the points together, then takes the n-th
# root (n = how many points there are in total).
prod(mat)^(1/length(mat))  # length(mat) is n = 73 here
## [1] 8.926788
# If you don't remember or know n, run str(plo) or length(mat) to find it.
# The geometric mean isn't used much, but it is still important to know how to
# compute it (e.g., for homework!).
# The harmonic mean gives the MOST weight to the smallest points, so small
# values pull it down heavily.
# Think of a data set of 100, 105, 107, and 5: the arithmetic mean is 79.25,
# but the harmonic mean is only about 17.5, dragged down by the 5.
# The harmonic mean is 1 divided by the mean of the reciprocals: 1/mean(1/x).
```
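The trimmed, geometric, and harmonic means can be checked by hand. This sketch uses a small made-up vector `x` (not from paruelo) and the 100, 105, 107, 5 example from the notes. It relies on R's documented behavior for `mean(x, trim = )`, which drops `floor(n * trim)` values from each end of the sorted data. The `exp(mean(log(x)))` form of the geometric mean is a swapped-in equivalent of `prod(x)^(1/n)`. It avoids overflow on long vectors but only works for positive values.

```r
# Made-up vector for illustration only (not from paruelo).
x <- c(2, 4, 4, 5, 7, 9, 30)
n <- length(x)

# Trimmed mean by hand: drop floor(n * trim) values from EACH end of the sorted data.
trim <- 0.15
k <- floor(n * trim)                              # floor(7 * 0.15) = 1 value per end
trimmed_by_hand <- mean(sort(x)[(k + 1):(n - k)])
trimmed_by_hand                                   # 5.8
all.equal(trimmed_by_hand, mean(x, trim = trim))  # TRUE

# Geometric mean two ways: n-th root of the product, or exp of the mean log.
geo_prod <- prod(x)^(1 / n)
geo_log  <- exp(mean(log(x)))                     # avoids overflow in prod(); needs x > 0
all.equal(geo_prod, geo_log)                      # TRUE

# Harmonic mean: reciprocal of the mean of the reciprocals.
harm <- 1 / mean(1 / x)

# For positive data: arithmetic >= geometric >= harmonic.
mean(x) >= geo_log && geo_log >= harm             # TRUE

# The 100, 105, 107, 5 example from the notes.
h <- c(100, 105, 107, 5)
mean(h)                                           # 79.25
1 / mean(1 / h)                                   # about 17.48
```

The ordering arithmetic >= geometric >= harmonic holds for any positive data. It agrees with the paruelo results in these notes: 9.99863, 8.926788, and 7.805661.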
```r
1/mean(1/mat)  # harmonic mean of mat
## [1] 7.805661
#################
# Measures of spread.
# range() gives the min and the max (not their difference)
range(mat)  # min and max
## [1]  2.0 21.2
# The standard deviation measures how spread out the points are around the mean.
# Roughly, it is the typical distance between the mean and an individual point;
# strictly, it is the square root of the variance (squared deviations from the
# mean, summed and divided by n - 1).
sd(mat)  # standard deviation
## [1] 4.681272
# Say the mean is 10 and the sd is 4.68: a typical point lies roughly 4.68 away from 10.
# var() and IQR() find the variance and the interquartile range.
var(mat)  # variance = standard deviation squared
## [1] 21.9143
IQR(mat)  # interquartile range = 75th percentile - 25th percentile (12.9 - 6.9)
## [1] 6
# We can also find specific quantiles/percentiles.
quantile(mat, .25)  # 25th percentile (just change .25 to any percentile you want)
## 25%
## 6.9
# Instead of running this command several times, we can have it output multiple quantiles.
quantile(mat, (1:3)/4)  # 1:3 is 1 2 3, so (1:3)/4 is 1/4 2/4 3/4, hence 25%, 50%, 75%
##  25%  50%  75%
##  6.9  8.5 12.9
quantile(mat, c(.025, .975))  # bounds the middle 95% of the data
##  2.5% 97.5%
##  3.20 20.22
# This takes 2.5% off the left and 2.5% off the right, leaving the middle 95%.
```
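Thinking of the standard deviation as the "average distance from the mean" is a useful intuition, but R's `sd()` is the square root of the sample variance (denominator n - 1). This small sketch uses a made-up vector. It computes both by hand and shows that the literal average absolute distance from the mean is a different, smaller number here.

```r
# Made-up vector for illustration only.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance: squared deviations from the mean, summed, divided by n - 1.
var_by_hand <- sum((x - mean(x))^2) / (n - 1)
all.equal(var_by_hand, var(x))   # TRUE (32 / 7, about 4.571)

# Standard deviation = square root of the variance.
sd_by_hand <- sqrt(var_by_hand)
all.equal(sd_by_hand, sd(x))     # TRUE (about 2.138)

# The literal average distance from the mean is a different quantity.
avg_abs_dev <- mean(abs(x - mean(x)))
avg_abs_dev                      # 1.5
```

Squaring before averaging gives large deviations more weight. That is why the standard deviation (about 2.14) comes out larger than the average absolute distance (1.5) for this vector.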

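R's `quantile()` supports nine interpolation rules through its `type` argument, and `type = 7` is the default. `IQR()` also uses type 7 by default. The function below is a hypothetical helper named `quantile7`, a sketch of that default rule under these assumptions. For probability p, it finds the position h = (n - 1)p + 1 in the sorted data and linearly interpolates between the neighboring values. The data vector is made up.

```r
# Hypothetical helper: R's default quantile rule (type = 7) written out by hand.
quantile7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1                # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])  # linear interpolation between neighbors
}

x <- c(7, 1, 3, 10, 4, 6)             # made-up data; sorted: 1 3 4 6 7 10
p <- (1:3) / 4
quantile7(x, p)                                     # 3.25 5.00 6.75
all.equal(quantile7(x, p), unname(quantile(x, p)))  # TRUE

# IQR() is the distance between the 75th and 25th percentiles.
IQR(x)                                              # 3.5
quantile7(x, 0.75) - quantile7(x, 0.25)             # 3.5

# range() returns c(min, max); the range as one number is their difference.
diff(range(x))                                      # 9
```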
