STATS 205 Week One

by: Janay Notetaker


USC

These notes cover the first part of statistics: the basics that will be the background for the rest of the course. They also cover how to use the R statistics software and some basic commands. Will be...
These 3 pages of class notes were uploaded by Janay Notetaker on Sunday, August 28, 2016. The notes belong to STAT 205 001 at the University of South Carolina, taught in Fall 2016.

Date Created: 08/28/16
Week one STAT 205, 8/22 – 8/26

To do(s): Download the R statistics software and practice using it with sample data.

How to open files:
    NameYouCallTheFile <- read.table("C:/insert path name here/file.txt", header = FALSE)
Inside the quotes, write the path with forward slashes (or doubled backslashes); R reads a single backslash as an escape character.
To make this easier, change the working directory by going to File, Change directory, and selecting the folder that contains the file you want. Once you have changed the directory, you only have to type the file name ("nameOfFile.txt") rather than the whole path ("C:\Users\MyLaptop\Documents…").
getwd() will tell you the working directory you are currently in.
On Mac: try the Misc menu if you need to set the working directory.

How to display the first few rows of data:
    head(NameYouCallTheFile)

Other commands:
    ?head                                  # opens the help page for head(); head(x, n) sets how many rows to display
    hist(NameYouCallTheFile)               # makes the default R histogram
    hist(NameYouCallTheFile, freq=FALSE)   # puts probability density on the y axis
    mean(NameYouCallTheFileOrData)         # gives the mean
    median(NameYouCallTheFileOrData)       # gives the median
    summary(NameYouCallTheFileOrData)      # gives the five-number summary plus the mean (sample statistics)
    boxplot(whateverYouCalledtheData)      # makes a boxplot
hist(), mean(), and median() need a numeric vector. For a table read with read.table, pass a single column, for example NameYouCallTheFile$V1 (V1 is the default name of the first column when header = FALSE).

Statistics – you gather data, study/interpret it, and draw conclusions.

Vocab:
Random sampling – every member of the population has the same probability of being chosen
Why random?
To keep the sample accurately reflective of the population.

Experiments vs. observations: experiments have conditions put on them on purpose by someone or something; observations have no conditions put in place, you are just watching what is happening.

Common sampling techniques:
Convenience – not random; could be just whoever you walked by that day
Systematic – ordering people and picking every 5th member (for example) in the ordered list of people
Stratified – dividing the population into groups by a characteristic, such as age or gender, and sampling from each group
Cluster – dividing the population into groups regardless of characteristics and selecting a certain piece of the groups (such as looking at classrooms and saying the left half of every classroom will be the cluster sample)

Types of error:
Sampling error – caused by the sampling procedure (the statistics end up not being representative of the population)
Non-sampling error – not caused by the sampling procedure (like someone writing down a book thinking it was a movie)

Variable – random events measured on experimental or observational units (can be categorical or numeric)
Categorical variables – nominal (no order, like eye colors) or ordinal (ordered, like cold/warm/hot)
Numerical variables – continuous (on a numerical scale, like height) or discrete (finite/countable, like puppy litter size)
Observational unit – the collection of things recorded/measured
*Note: variables will be uppercase; observations are lowercase*

Frequency distribution – a bar chart or table that shows how often each value of a variable occurs
Relative frequency – frequency divided by the total size of the sample
Dotplot – data on a horizontal line with each value stacked, so if a group has 5 people in it then there will be 5 dots stacked above that group
Histogram – a bar chart whose bars touch, with each bar covering a range of values
**Note: a histogram may not be the best way to show shape, because its appearance can change with how the graph is drawn (for example, a change of the axis)**
Mode – the most frequent value (the highest peak of the distribution), not the largest data value
Longer tail to the right = right-skewed data
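A minimal R sketch of two ideas from the vocab above: drawing a simple random sample versus a systematic sample, and turning frequencies into relative frequencies. The population of 20 labeled people and the eye colors are made up for illustration, and all variable names are placeholders rather than anything from the course.

```r
# Hypothetical population of 20 people, labeled 1 to 20 (made-up data).
population <- 1:20

# Simple random sample: every member has the same chance of being chosen.
set.seed(205)  # arbitrary seed so the random draw can be repeated
random_sample <- sample(population, size = 4)

# Systematic sample: take every 5th member of the ordered list.
systematic_sample <- population[seq(from = 5, to = length(population), by = 5)]
print(systematic_sample)  # members 5, 10, 15, 20

# Relative frequency = frequency / total size of the sample.
eye_color <- c("brown", "blue", "brown", "green", "brown", "blue")  # made-up data
freq <- table(eye_color)              # frequency of each category
rel_freq <- freq / length(eye_color)  # relative frequencies add up to 1
print(rel_freq)
```

The systematic sample is the same every time, while the random sample changes with the seed; that difference is the reason random sampling is preferred when the ordering of the list might hide a pattern.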
Bin = class
Survival analysis – following a group of people who will have events and waiting for these events to occur (in this analysis it is hard to calculate the mean because you have to wait for everyone to have the event)
Interquartile range (IQR) = Q3 – Q1; the middle half of the data lies in this range
Boxplot – shows a five-number summary
Outlier calculation: lower fence = Q1 – 1.5 x IQR and upper fence = Q3 + 1.5 x IQR; values outside the fences are outliers
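A minimal R sketch that ties the file-reading commands from these notes to the IQR outlier rule. To avoid assuming any real path, it first writes a small made-up data file to a temporary location; the litter sizes and all variable names are hypothetical.

```r
# Write a small made-up data file to a temporary location (illustration only).
data_file <- tempfile(fileext = ".txt")
writeLines(c("2", "3", "4", "4", "5", "5", "6", "7", "8", "15"), data_file)

# Read it as the notes describe; header = FALSE because the file has no column names.
litters <- read.table(data_file, header = FALSE)
head(litters)              # first rows; the single column is named V1 by default
litter_size <- litters$V1  # pull the column out as a numeric vector

summary(litter_size)       # min, Q1, median, mean, Q3, max

# Quartiles and interquartile range (R's default quantile method).
q1 <- quantile(litter_size, 0.25, names = FALSE)
q3 <- quantile(litter_size, 0.75, names = FALSE)
iqr <- q3 - q1             # same value as IQR(litter_size)

# Outlier limits: Q1 - 1.5 x IQR and Q3 + 1.5 x IQR.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the limits are flagged as outliers.
outliers <- litter_size[litter_size < lower_fence | litter_size > upper_fence]
print(outliers)            # prints 15 for this made-up sample
```

Calling boxplot(litter_size) or hist(litter_size) afterwards draws the same data, with the flagged value shown as a separate point on the boxplot.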

