STP 231 week 3 notes
These 5 pages of class notes were uploaded by Andrej Sodoma on Friday, September 9, 2016. The notes belong to STP 231 (Statistics for Biosciences) at Arizona State University, taught by Dr. Ye Zhang in Fall 2016.
Date Created: 09/09/16
STP 231 lectures covering sections 2.1-2.4

I.) Section 2.1: variables. A variable is a characteristic of the individuals in your study that can be defined and recorded. Upper case letters represent the variable itself, which could be height, gender, number of home runs, etc. Lower case letters represent observed values of the variable, which could be six feet tall, male, 200 home runs, etc.
   A. Numeric variable: the values are numbers; it is quantitative.
      i. Discrete variable: a numeric variable whose possible values can be listed, such as counts. Example: the number of seats in a stadium.
      ii. Continuous variable: a numeric variable that can take any value in an interval. Example: a measurement, because a measurement can be carried to any number of decimal places, such as 100.000009 meters.
   B. Categorical variable: the values are categories rather than numbers; it is qualitative.
      i. Example: difficulty of schooling: easy, medium, or hard.
      ii. Ordinal variable: a categorical variable whose categories are ranked. Example: the highest degree you have earned: bachelor's, master's, doctorate.
   C. Discrete variable = a variable with a limited (countable) number of possible values. Example: the number of people in a class.
   D. Continuous variable = a variable with an unlimited number of possible values within a range. Example: weight.

II.) Sections 2.2 and 2.3: organizing data.
   A. Frequency and frequency distribution
      i. Frequency: the number of times each category or value appears in a data set.
      ii. Frequency distribution: a table that lists each category or class together with its frequency.
      iii. Relative frequency: the frequency divided by the total number of observations. It is used when two or more data sets of different sizes are compared.
      iv. Relative frequency distribution: a table that lists each category or class together with its relative frequency, given as a proportion or a percentage.
   B. Bar chart: a graph that displays the frequencies or relative frequencies as a sequence of vertical bars.
      i. It must include all of the classes in the data set.
      ii. If relative frequency is used, the relative frequencies must sum to one (up to rounding).
      iii. One can convert from relative frequency to frequency by multiplying each relative frequency by the total number of observations.
      iv. It can be used for qualitative and quantitative data.
   C. Dot plot
      i. A graph that shows all of the data.
      ii. Each dot represents one value in the data set.
   D. Cut point and single value grouping
      i. Single value grouping: each class is made up of one value.
         Example: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2.

         Class   Frequency   Relative frequency
         1       3           0.25
         2       3           0.25
         3       2           0.167
         4       2           0.167
         5       2           0.167

      ii. Cut point grouping: each class is an interval, which means each class has a lower and an upper cut point.
      iii. Rules: all classes have the same width, the grouping must cover all values, each interval has a lower and an upper cut point, and each value must belong to exactly one class.
         width = (maximum - minimum) / (number of classes)
      iv. Example: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2. Number of classes = 2. Width = (5 - 1) / 2 = 2.

         Class   Frequency   Relative frequency
         1-3     6           0.50
         3-5     6           0.50

         The frequencies above count the shared cut point 3 in the second class and the maximum 5 in the last class.
   E. Stem and leaf plot
      i. A table that shows the data in an organized manner. It is not good for large data sets.
      ii. Rules: the data must be in order from least to greatest, every plot must have a key, and for two-line plots each stem must be divided evenly between its two lines.
      iii. One line per stem: each value is split into two parts, the stem (the leading digit or digits) and the leaf (the last digit), and each stem takes up one line.
      iv. Two lines per stem: each stem takes up two lines, the first for leaves 0-4 and the second for leaves 5-9.
         Example: 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 65, 67, 70.
         Key: 7|0 = 70

         Stem | Leaf
         5    | 0,1,2,3,4
         5    | 5,6,7,8,9
         6    | 0,1,2
         6    | 5,7
         7    | 0
         7    |

   F. Histogram: a graph made up of vertical bars. The height of each bar represents the frequency or relative frequency, and the width represents the class.
      i. Fitted curve: a smooth curve drawn along the tops of the bars of the histogram to show the overall shape of the distribution.
      ii. Choosing the class width is important because it can drastically affect the way your data is shown.
   G. Mean: the total of the values divided by the number of values.
         mean = (x1 + x2 + ... + xn) / n, which in summation notation is the sum of xi from i = 1 to n, divided by n.
      i. i = 1 means the sum starts with the first value in the data set.
      ii. n = number of observations.
      iii. xn = the final value in the data set.
      iv. It can be computed with summation notation or simply by adding up the entire data set and then dividing by the total number of observations.
      v. Not robust, because extreme values in the data set pull it toward them. It can only be used with numerical data sets.
   H. Median: the middle measurement of a data set.
      i. To find the median, the data set must first be put in ascending order.
      ii. For a data set with an odd number of values, the median is the middle value.
      iii. For a data set with an even number of values, the median is the average of the two middle values.
      iv. Robust, because it changes little or not at all when extreme values are present. It can only be used with numerical data sets.
   I. Modality: describes the distribution of a data set by the number of peaks in its curve.
      i. Unimodal distribution: a distribution with only one peak.
      ii. Bimodal distribution: has two peaks.
      iii. Multimodal distribution: has two or more peaks.
   J. Mode: the measurement or measurements that occur most often in a data set.
      i. Robust, because it does not change with extreme values. It can be used with both categorical and numerical data sets.
   K. Skewness: describes how far a distribution is from being symmetric.
      i. Symmetric distribution: the left and right sides mirror each other, and the mean is approximately equal to the median.
      ii. Left skewed distribution: the left side is elongated but not the right. The mean is typically less than the median because a few unusually small values pull the mean down.
      iii. Right skewed distribution: the right side is elongated but not the left. The mean is typically greater than the median because a few unusually large values pull the mean up.

III.) Section 2.4: quartiles, boxplots, and five-number summaries.
   A. Quartiles: divide the data set into quarters (Q1, Q2, Q3).
      i. Q2 is simply the median of the entire data set.
      ii. Q1 is the median of the first (lower) half of the data set.
      iii. Q3 is the median of the second (upper) half of the data set.
      iv. Steps: first, arrange the data in ascending order. Second, find the median of the data set, which is Q2. Third, split the data set into two equal halves. Fourth, find the median of each half; the median of the lower half is Q1 and the median of the upper half is Q3.
      v. Interquartile range (IQR) = Q3 - Q1 = the spread of the middle 50% of the data set.
      vi. Outlier: a value that is distant from the rest of the data set. It may be an error, but it can also be a genuine extreme value, so it should be checked rather than automatically discarded.
         Upper fence = Q3 + 1.5 x IQR.
         Lower fence = Q1 - 1.5 x IQR.
         Values above the upper fence or below the lower fence are flagged as outliers.
   B. Five-number summary
      i. It consists of the minimum, the quartiles Q1, Q2, and Q3, and the maximum of the data set.
      ii. These values make up a box and whisker plot.
   C. Box and whisker plots
      i. The box runs from Q1 to Q3, with a line inside it at Q2 (the median).
      ii. The whiskers are lines that extend from the ends of the box out to the minimum and maximum of the data set. When outliers are plotted as separate points, the whiskers stop at the most extreme values that are inside the fences.
      iii. An advantage over histograms is that they show outliers clearly.
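The quartile steps, the IQR, and the fence rule from section III can be tried out on the stem and leaf example data from section II. The Python sketch below is only an illustration, not something from the lecture. The function names (quartiles, five_number_summary, fences, outliers) are made up for this example, and the value 95 is added purely to show an outlier being flagged. It also assumes one particular convention: when the number of observations is odd, the median is left out of both halves. Other textbooks and software use other conventions and can give slightly different Q1 and Q3.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) by taking medians of the lower and upper halves.

    Assumption (not stated in the notes): for an odd number of
    observations, the overall median is excluded from both halves.
    Needs at least two values.
    """
    data = sorted(values)            # step 1: ascending order
    n = len(data)
    q2 = median(data)                # step 2: the overall median is Q2
    half = n // 2                    # step 3: split into two halves
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    q1 = median(lower)               # step 4: median of each half
    q3 = median(upper)
    return q1, q2, q3


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    q1, q2, q3 = quartiles(values)
    return min(values), q1, q2, q3, max(values)


def fences(values):
    """Return (lower fence, upper fence) using 1.5 times the IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Return the values that fall outside the fences."""
    lower, upper = fences(values)
    return [v for v in values if v < lower or v > upper]


# Data from the stem and leaf example in section II.
scores = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 65, 67, 70]

print(five_number_summary(scores))   # (50, 53.5, 57.5, 61.5, 70)
print(fences(scores))                # (41.5, 73.5)
print(outliers(scores))              # []
print(outliers(scores + [95]))       # [95]  (95 is a made-up extra value)
```

With the original 16 values the IQR is 61.5 - 53.5 = 8, so the fences sit 12 units beyond Q1 and Q3 and nothing is flagged. Adding 95 moves Q3 to 63.5 and the upper fence to 78.5, and 95 lands outside it.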