These 4 pages of class notes were uploaded by Heli Patel on Sunday, June 19, 2016. The notes belong to course 3339 (Statistics for the Sciences, Math) at the University of Houston, taught by Prof. C Poliak in Summer 2016.
Day 2
Date Created: 06/19/16
● Population Variance
○ If N is the number of values in a population with mean μ, and $x_i$ represents each individual in the population, then the population variance is found by:
○ $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
○ and the population standard deviation is the square root, $\sigma = \sqrt{\sigma^2}$.
○ Most of the time we are working with a sample instead of a population, so the sample variance is found by $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, and the sample standard deviation is the square root, $s = \sqrt{s^2}$. Here n is the number of observations (the sample size), $x_i$ is the value of the i-th observation, and $\bar{x}$ is the sample mean.
○ By hand: find the mean, square each score, compute $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$, then take the square root of the answer to get the sd.
○ If we change the data set by adding or subtracting a constant, the mean changes, but the sd and variance remain the same.
○ If we multiply or divide every value by a constant, the mean, sd, and variance all change.
● Mean and sd of a linear transformation of X
○ y = a + bx, where a and b are constants
○ mean(y) = a + b·mean(x)
○ sd(y) = |b|·sd(x)
○ var(y) = b²·var(x)
■ Example: mean(x) = 3, sd(x) = 0.5
■ y = 3 + 2x: mean(y) = 3 + 2(3) = 9
■ sd(y) = 2(0.5) = 1
● The function for the sample standard deviation in R is sd(datasetname$variablename).
● Coefficient of Variation
○ This is used to compare the variation between two groups.
○ The coefficient of variation (cv) is the ratio of the standard deviation to the mean.
○ cv = sd/mean
○ A smaller ratio indicates less variation in the data relative to the mean.
● Percentiles
○ The pth percentile of the data is the value such that p percent of the observations fall at or below it.
○ Percentiles are used to report spread when the median is our measure of center.
○ To find the measurement with a desired percentile rank, the 100P-th percentile, use the measurement with rank (position in the sorted list) nP + 0.5, where n is the number of data values in the sample.
○ In R, the five-number summary (minimum, Q1, median, Q3, maximum): > fivenum(price)
● IQR (interquartile range)
○ IQR = Q3 − Q1
● Outlier
○ An outlier is an observation that is "distant" from the rest of the data.
○ Outliers can occur by chance or because of measurement errors.
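The variance, standard deviation, linear-transformation, and coefficient-of-variation rules above can be checked numerically. The notes use R. What follows is a minimal Python sketch of the same formulas, assuming plain lists of numbers. The function names (`population_variance`, `sample_variance_by_hand`, and so on) and the data values are hypothetical and chosen only for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def population_variance(xs):
    """sigma^2 = sum((x_i - mu)^2) / N."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_by_hand(xs):
    """Shortcut form: (sum of squared scores - n * xbar^2) / (n - 1)."""
    n = len(xs)
    xbar = mean(xs)
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)


def sample_sd(xs):
    """Sample standard deviation, the quantity R's sd() reports."""
    return math.sqrt(sample_variance(xs))


def coefficient_of_variation(xs):
    """cv = sd / mean; assumes the mean is positive."""
    return sample_sd(xs) / mean(xs)


# Made-up data for illustration only.
x = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(x), population_variance(x), sample_variance(x))

# Linear transformation y = a + b*x with a = 3 and b = 2.
y = [3 + 2 * v for v in x]
print(mean(y), sample_sd(y) / sample_sd(x))
```

Here mean(y) equals 3 + 2·mean(x) and the sd ratio is 2 (up to rounding), matching mean(y) = a + b·mean(x) and sd(y) = |b|·sd(x). Adding a constant to every value leaves `sample_sd` unchanged.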
○ Any point that falls outside the interval from Q1 − 1.5(IQR) to Q3 + 1.5(IQR) is considered an outlier.
● GRAPHS
○ R code
■ For a bar graph: plot(datasetname$variablename)
■ For a pie chart:
> counts <- table(shoes$Brand)
> pie(counts)
○ Dotplots
■ Made by putting dots above the values listed on a number line.
○ Stem-and-leaf plot
■ 1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
■ R code: stem(datasetname$variablename)
○ Histograms
■ A bar graph for quantitative variables, in which values of the variable are grouped together.
■ The width of each bar represents an interval of values (a range of numbers) for that variable.
■ The height of each bar represents the number of cases within that range of values.
■ To group the data:
1. Divide the range of the data into classes of equal width. For example, the prices of the basketball shoes range from $40 to $250, so we can use a width of $20 for the classes. Thus the classes are: 40 ≤ price < 60, 60 ≤ price < 80, ..., 240 ≤ price < 260. Be sure to specify the classes precisely so that each individual price falls into exactly one class and all of the prices are counted.
2. Count the number of shoe prices in each class.
■ To draw the histogram:
1. Mark on the horizontal axis the scale for the variable whose distribution you are displaying.
2. The vertical axis contains the scale of the counts.
3. Each bar represents a class. The base of the bar covers the width of the class, and the bar height is the class count. There is no horizontal space between bars unless a class is empty, so that its bar has height zero.
■ R code: hist(datasetname$variablename)
○ Boxplot
■ A graph of the five-number summary.
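Below is a minimal Python sketch of three calculations from the notes: the five-number summary, the 1.5 × IQR outlier rule, and the class counts behind a histogram. The quartiles are computed with the same hinge rule that R's fivenum() uses. R's IQR() function uses quantile() with type 7 by default, so its result can differ slightly from Q3 − Q1 taken from these hinges. The function names and the data values (including the shoe prices) are hypothetical.

```python
import math


def five_number_summary(data):
    """Min, lower hinge (Q1), median, upper hinge (Q3), max.

    Mirrors the rule in R's fivenum(): each summary sits at a
    1-based rank d, and the value is the average of the sorted
    observations at floor(d) and ceil(d).
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    n4 = math.floor((n + 3) / 2) / 2
    ranks = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Convert 1-based ranks to 0-based indices and average the neighbours.
    return [0.5 * (xs[math.floor(d) - 1] + xs[math.ceil(d) - 1]) for d in ranks]


def outliers_by_iqr(data):
    """Return (lower fence, upper fence, outliers) using Q1/Q3 -/+ 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in data if x < low or x > high]


def class_counts(values, start, width, n_classes):
    """Count values in classes start + k*width <= value < start + (k+1)*width."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} does not fall into any class")
        counts[k] += 1
    return counts


# Made-up data for illustration only.
data = [2, 4, 4, 5, 6, 7, 8, 30]
print(five_number_summary(data))  # [2.0, 4.0, 5.5, 7.5, 30.0]
print(outliers_by_iqr(data))      # (-1.25, 12.75, [30])

# Hypothetical shoe prices, classes 40 <= price < 60, ..., 240 <= price < 260.
prices = [45, 60, 79, 80, 120, 249]
print(class_counts(prices, start=40, width=20, n_classes=11))
```

The half-open classes mean a price of exactly 60 is counted in the 60 ≤ price < 80 class, never in two classes. This matches the requirement that each price fall into exactly one class.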
■ A central box spans the quartiles.
■ A line inside the box marks the median.
■ Lines extend from the box out to the smallest and largest observations that are not outliers.
■ Asterisks represent any values that are considered to be outliers.
■ Boxplots are most useful for side-by-side comparison of several distributions.
■ R code: boxplot(datasetname$variablename)
○ Cumulative Frequency Polygon
■ Plot a point above each upper class boundary at a height equal to the cumulative frequency of the class.
■ Connect the plotted points with line segments. A similar graph can be made with the cumulative percents.
● Distribution
○ The distribution of a variable tells us what values it takes and how often it takes these values among the individuals.
○ The distribution of a variable can be shown through tables, graphs, and numerical summaries.
○ There are four main characteristics to describe a distribution:
■ 1. Shape
○ Skewed to the right if the right side (higher values) of the graph extends much farther out than the left side.
○ Skewed to the left if the left side (lower values) of the graph extends much farther out than the right side.
○ Uniform if the graph is at the same height (frequency) from the lowest to the highest value of the variable.
■ 2. Center: the value with roughly half the observations taking smaller values and half taking larger values.
■ 3. Spread: from the graphs, we describe the spread of a distribution by giving the smallest and largest values.
■ 4. Outliers: individual values that fall outside the overall pattern.
○ For a categorical variable, the distribution lists the categories and gives either the count or the percent of cases that fall in each category.
○ One way to show it is a frequency table that displays the different categories and then the count or percent of cases in each category. Then we look at the graphs (bar or pie) to determine the distribution of the categorical variable.
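The frequency table for a categorical variable and the points of a cumulative frequency polygon are short enough to sketch directly. The following Python code assumes the class counts are already known (for example, from grouping the prices into $20 classes). The brand list, the counts, and the function names are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """(category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    n = len(values)
    return [(cat, c, 100 * c / n) for cat, c in counts.most_common()]


def cumulative_points(start, width, counts):
    """(upper class boundary, cumulative frequency, cumulative percent) per class.

    These are the points plotted for a cumulative frequency polygon;
    joining them with line segments gives the polygon itself.
    """
    total = sum(counts)
    points = []
    running = 0
    for k, c in enumerate(counts):
        running += c
        upper = start + (k + 1) * width
        points.append((upper, running, 100 * running / total))
    return points


# Hypothetical categorical data, like shoes$Brand in the notes.
brands = ["Nike", "Adidas", "Nike", "Jordan"]
print(frequency_table(brands))
# [('Nike', 2, 50.0), ('Adidas', 1, 25.0), ('Jordan', 1, 25.0)]

# Hypothetical counts for the classes 40-60, 60-80, ..., 120-140.
print(cumulative_points(40, 20, [1, 2, 1, 0, 1]))
```

The third value in each point gives the cumulative percent. This is the "similar graph" the notes mention.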