Intro Stat Notes Week 2
TMATH 110 C
University of Washington Tacoma
These 6-page class notes were uploaded by Qihua Wu on Monday, October 12, 2015. They belong to TMATH 110 C at University of Washington Tacoma, taught by Maureen C. Kennedy in Fall 2015.
Date Created: 10/12/15
Levels of Measurement

Categorical data
- Nominal: statistically speaking, it contains the least information. It consists only of categories (e.g., names and labels) and cannot be sorted. When numbers stand for categories, such as a survey coded 1 for "yes" and 2 for "no", the data are still nominal because the numbers do not represent quantities.
- Ordinal: data can be sorted, but the differences between data values are arbitrary, so arithmetic on them is meaningless. Example: crash-safety rankings of cars can be sorted, but the gap between two ranks does not tell us how much safer one car is than another.

Quantitative data
- Interval: data can be sorted and the differences between data values are meaningful. However, there is no "natural" zero starting point (e.g., calendar years and temperature in Celsius or Fahrenheit).
- Ratio: data can be sorted, differences between data values are meaningful, and there is a "natural" zero starting point (e.g., weight, age, and distance).

A "natural" zero starting point is one from which we can actually start counting at 0. No calendar year marks a true zero of time, but a weight can be 0 grams.

Important Concepts in Research Design

- Population, census, and sample: statistical methods use sample data to draw conclusions about populations.
- Parameter: a number that describes some characteristic of a population. Parameters are written with Greek letters such as μ and are obtained from a census.
- Statistic: a number that describes some characteristic of a sample.

Collecting Data

- Census: measure the entire population.
- Sampling: take samples, through either an observational study or an experiment.

Observational study: consists only of observing individuals, with no manipulation.
- Retrospective (past): collect data on things that have already happened.
- Cross-sectional (current): collect data at one point in time.
- Prospective (future): track individuals through a period of time.

Experiment: collect data using control groups and treatment (manipulation) groups.
- Randomization: randomly assign individuals to treatment groups to avoid bias.
- Replication: apply treatments to more than one person to ensure the treatment is
effective for the entire population.
- Blinding: participants do not know whether they are in the control group or the treatment group. This avoids bias and the placebo effect, in which participants respond because they believe they are receiving a treatment when in truth they are not.
- Double blinding: neither the participants nor the data collectors know which group each participant is in.
- Avoid confounding: confounding happens when unexpected variables affect the experiment. To avoid it, design the experiment so that everything can be controlled; randomization and blinding guard against bias and the placebo effect.

Other designs that help ensure the results of an experiment are due to the treatments:
- Completely randomized.
- Randomized block: similar to cluster and stratified sampling. First divide the sample into blocks, then randomly assign treatment groups within each block.
- Matched pairs: compare the before and after effects on one person, then repeat the same comparison on many other people. Widely used in drug experiments because it captures individual effects better than group comparisons do.

First Step in Data Exploration

Frequency distribution: investigate how the data are distributed by looking at the shape they form. The shape affects the statistical analysis.

Common distribution shapes:
- Normal: bell-shaped and symmetric, with few data values at both ends and a peak in the middle. When sketching it, make the change between the ends and the peak pronounced.
- Uniform: symmetric; every category has the same amount of data.
- Skewed right (positive): similar to normal but not symmetric, with a longer tail on the right and a shorter one on the left.
- Skewed left (negative): the mirror image of skewed right, with a longer tail on the left and a shorter one on the right.

Real data will probably not match any of these shapes perfectly. In that case, either judge which shape is the closest match or collect more data to see the pattern more clearly.

Frequency table:
identify patterns in large data sets.
1. Divide the data into classes (bins): finite intervals in which we can count individual data values.
   1. The most common way to choose the number of classes is trial and error. If the table shows too much detail, there are too many classes; if the data are all clustered together, there are too few.
   2. Find the class width once the number of classes is chosen.
      Class width = range / number of classes
      Range = maximum data value - minimum data value
      Round the class width up to the next integer and make sure the classes do not overlap.
   3. Determine the lower class limits. Usually the lowest lower class limit is the minimum data value, or a nearby value that fits the data. Each subsequent lower class limit is the previous lower class limit plus the class width; continue until the last class contains the largest data value.
   4. Determine the upper class limits. Each upper class limit is the largest value that can fall in its class: one unit (at the precision of the data) below the next class's lower limit, so that classes do not overlap. All class limits should have the same number of decimal places as the data values.
   5. Class boundaries separate adjacent classes. Unlike class limits, boundaries are shared between neighboring classes. The boundary between two classes is the midpoint between the upper class limit of the previous class and the lower class limit of the next class. The difference between consecutive class boundaries is the class width. Find the boundaries between classes first, then the outer boundaries below the lowest class and above the highest class.
2. Count the frequency: how many data values fall into each class. The sum of all frequencies should equal the sample size n.
   - Relative frequency: the proportion of the data in a given class relative to the sample size n.
     Relative frequency = f / n, where f is the frequency for the class.
     The relative frequencies should add up to 1 (or very close to it, because of rounding).
   - Cumulative frequency: how many data values are
within the current class and all previous classes.
   Cumulative frequency F: add the frequencies for the current class and all previous classes. The final cumulative frequency should equal the sample size.
3. Make a table with the classes in the first column and the matching frequencies in the second column.

Ways to Graph Data

- Histogram: visualizes a frequency table with bars. The x-axis shows the observed variable with its units of measurement, usually marked at class boundaries or midpoints, and the y-axis shows frequencies. Do not use class limits on the x-axis, because they would leave gaps between bars; gaps should appear only where a class has zero frequency. If the class boundaries have too many decimals, label the midpoints instead and draw each bar's edges halfway between neighboring midpoints. Histograms are hard to interpret for small data sets.
- Scatterplot: used to see the relationship between two variables. It is drawn from paired data, with one variable of each pair on the x-axis (explanatory or independent variable) and the other on the y-axis (response or dependent variable).
- Time-series graph: time on the x-axis and the variable on the y-axis. Used when a variable is tracked over time; the points are often connected with a line to show the pattern, which we call a trend. When the variable is seasonal, the graph often shows cycles.
- Dotplot: plots each data value as a point along a horizontal x-axis. Dots are stacked when data values are equal. Best used for discrete data.
- Bar graph: shows frequencies of categorical data, with the categories on the x-axis (in order, if possible) and frequencies on the y-axis. In contrast to a histogram, gaps between the bars are allowed.
- Pareto chart: a bar graph with the categories sorted by frequency, from highest to lowest. Great for comparing distributions.
- Frequency polygon: uses the same axes as a histogram, with class
midpoints on the x-axis. At each class midpoint, plot a point at the frequency for that class, then connect neighboring points with line segments. Used for frequency or relative frequency. Always start and end at 0.
- Ogive: used for cumulative frequency. Similar to a frequency polygon, except the points are plotted at the upper class boundary of each class. It starts at the lower class boundary of the first class and ends at the upper class boundary of the last class. It usually increases from point to point; where a class adds no frequency it stays flat, but it never decreases.

A good statistical graph should include a title, the variables, axis labels with units, and the source of the data if available.

Characteristics of misleading graphs:
- A nonzero axis, or axes that are not on appropriate scales.
- Do not use pie charts or pictographs.

Note: some of the definitions are taken from the instructor's notes, because they are concise and hard to paraphrase without making them confusing.
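
The frequency-table steps earlier in these notes (class width, class limits, class boundaries, frequency, relative frequency, cumulative frequency) can be tried out in a few lines of Python. This is only a minimal sketch, assuming integer data. The function name frequency_table, the sample scores, and the rule of widening each class by one when the last class would otherwise miss the maximum value are illustrative assumptions, not something taken from the class.

```python
import math


def frequency_table(data, num_classes):
    """Sketch of the frequency-table steps for integer data (hypothetical helper)."""
    lo, hi = min(data), max(data)
    n = len(data)

    # Class width = range / number of classes, rounded up to the next integer.
    width = math.ceil((hi - lo) / num_classes)
    # Assumption: if the last class would stop short of the maximum value,
    # widen every class by one unit so that all data values are covered.
    if lo + width * num_classes - 1 < hi:
        width += 1

    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = lo + k * width       # previous lower limit plus the class width
        upper = lower + width - 1    # one unit below the next lower limit
        f = sum(lower <= x <= upper for x in data)
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            # Boundaries sit halfway between neighboring class limits.
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": f,
            "relative": f / n,
            "cumulative": cumulative,
        })
    return rows


# Made-up scores, used only to illustrate the steps.
scores = [12, 15, 21, 22, 25, 28, 30, 31, 34, 39]
for row in frequency_table(scores, 3):
    print(row)
```

The frequencies should add up to the sample size n, the relative frequencies should add up to 1, and the last cumulative frequency should equal n, which makes a quick check on any table built this way.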