Data Confidence of Distributions with Mean and Standard Deviations
These 8 pages of class notes were uploaded by Kaiyana Dudley on Wednesday, September 21, 2016. The notes belong to CHEM 241 (Intro to Chemistry Analysis) at University of Michigan, taught by Stephen Maldonado in Fall 2016.
Date Created: 09/21/16
Week 2: Data Confidence of Distributions Using Mean and Standard Deviations

What is a measurement distribution, and why is it important?
• In analytical chemistry experiments, we are trying to find the true value of the quantity being observed
• A measurement distribution is a result reported as a value plus or minus some error
  o We can never know the true value exactly, but we can design measurements that get close to it
  o The wider the spread of measured values, the less precise the result
• The argument about measurements is that all quantitative measurements are games of chance
  o The goal is to get as close as possible to the true value so that we can be confident in our data
• In a way, a measurement distribution is a game of probability, like flipping coins
  o If we flip a coin several times, we can find the probability of getting a given number of heads by listing all of the possible outcomes and counting how many of them contain that number of heads. Dividing that count by the total number of outcomes gives the probability.

Is there a more efficient way of finding the probability of the distribution?
• With measurement distributions, success and failure most likely won't be 50:50 like flipping a coin.
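To see why counting every outcome becomes impractical, the coin-flip procedure can be sketched in a few lines of Python. This is only an illustration; the function name is hypothetical and a fair coin is assumed. The number of outcomes doubles with every extra flip, which is the motivation for a closed-form equation.

```python
from itertools import product

def prob_heads_by_enumeration(n_flips, n_heads):
    """Probability of exactly n_heads heads in n_flips fair-coin flips,
    found by listing every possible outcome and counting the matches."""
    outcomes = list(product("HT", repeat=n_flips))  # 2**n_flips outcomes
    favorable = sum(1 for outcome in outcomes if outcome.count("H") == n_heads)
    return favorable / len(outcomes)

# 4 flips give 16 outcomes, 6 of which contain exactly 2 heads
print(prob_heads_by_enumeration(4, 2))  # 0.375
```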
• For bigger distributions, we use the binomial equation for the probability of success or failure:
$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
  o The binomial distribution gives the probability of getting exactly x successes in n trials (observations) when every trial has the same probability p of success
    • The factor n!/(x!(n-x)!) counts the number of ways x successes can be arranged among n trials, and p^x (1-p)^(n-x) is the probability of any one of those arrangements
    • The variable p is the probability of success in a single trial
    • The variable x is the number of successes (the value that we're looking for)
  o The variable n is the total number of trials
    • Recall that n!, when n = 4, is 4*3*2*1
  o The average and standard deviation of a binomial distribution are:
$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$
• The Gaussian (or normal) distribution is the limiting form of the binomial distribution when n is very large and the probability of success p is not close to zero:
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/2\sigma^{2}}, \qquad z = \frac{x-\mu}{\sigma}$$
  o As before, probability corresponds to area under the curve, but the binomial terms are replaced by an exponential in z, the distance from the mean measured in standard deviations
  o The central limit theorem says that, for large samples drawn from a distribution with a finite variance, the sample averages are approximately normally distributed around the average of the whole distribution (we will find out more after the first example)
    • In other words, the sampling distribution will come out normal, centered at the average value, if the samples are chosen randomly, even if the population is not normally distributed
    • A normalized Gaussian distribution has a total area of 1
      § The curve is perfectly symmetrical around the average value
      § The total area is 1 because it covers all of the possibilities (100% of the distribution)
  o The z table lists areas calculated from the Gaussian equation, measured from the mean (z = 0) out to z. (This is helpful so you can see which z value corresponds to which area without manual calculation!)
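The binomial probability and the tabulated Gaussian areas described above can also be computed directly. The sketch below is a minimal illustration using only the Python standard library; the function names are hypothetical, and it assumes the usual binomial form P(x) = n!/(x!(n-x)!) p^x (1-p)^(n-x) and a z table that lists the area between the mean (z = 0) and z.

```python
import math

def binomial_probability(n, x, p):
    """Probability of exactly x successes in n trials,
    each with the same success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def gaussian_area_0_to_z(z):
    """Area under the unit-area Gaussian between the mean (z = 0) and z."""
    return 0.5 * math.erf(z / math.sqrt(2))

def gaussian_tail_area(z):
    """Area beyond z (for z >= 0): half the curve minus the area from 0 to z."""
    return 0.5 - gaussian_area_0_to_z(z)

# A fair coin flipped 4 times: probability of exactly 2 heads
print(binomial_probability(4, 2, 0.5))

# Rebuild a few rows of a z table
for z in (1, 2, 3):
    print(z, round(gaussian_area_0_to_z(z), 4), round(gaussian_tail_area(z), 5))
```

With a fair coin, the binomial result matches what counting every outcome by hand would give, and the z = 3 row gives the small tail area that matters when asking how often a value lies more than three standard deviations above the mean.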
  o The following example asks how many of 100 patients a doctor should expect to have a temperature of 100.3 degrees Fahrenheit or more, given an average of 98.2 ± 0.7 (so z = 3)
    • Mean = 98.2, standard deviation = 0.7 (these should always be given)
      § The z equation allows us to plug in the value we're looking for to find the probability of it happening: z = (100.3 - 98.2)/0.7 = 3. We then compare the areas as z runs from -infinity to +infinity. There are 3 steps to go through, covering the areas bounded by z = -infinity, z = 0, z = 3, and z = infinity:
        1. -infinity < z < 0 (area between z = -infinity and z = 0)
           i. This area is always 0.5, because the curve is symmetric and the total area is 1 (the chart lists 0 at z = 0 only because it measures area starting from the mean)
        2. 0 < z < 3 (area between z = 0 and z = 3)
           i. At z = 0, we are at the average (or normal value) of T = 98.2
           ii. At z = 3, we find that the area from the chart is 0.49865, about 0.4987 (as z increases up to infinity, the area approaches 0.5)
              a. The standard deviation is 0.7, so z = 1 and z = -1 correspond to T = 98.9 and T = 97.5; the areas between the mean and +z and between the mean and -z are the same (because the curve is symmetric)
        3. 3 < z < infinity (area between z = 3 and z = infinity)
           i. At z = infinity, the chart area reaches 0.5
      § Now we find the area for z > 3 (from 3 < z < infinity). Subtract the other two areas from the total area to find the probability of a patient having T > 100.3: 1 - 0.5 - 0.49865 = 0.00135, or about 0.135 out of 100 patients
        • The doctor should expect less than 1 patient to have a temperature greater than or equal to 100.3 degrees Fahrenheit
      § Although the mean and standard deviation of the whole distribution are given in problems like this one, we can also measure the mean and standard deviation of a sample of the distribution.
        • Here are the sample equations:
$$\bar{x} = \frac{1}{n}\sum_{i} x_i, \qquad s = \sqrt{\frac{\sum_{i} (x_i - \bar{x})^{2}}{n-1}}$$
  o Remember, the Gaussian equation is really just a way to make a really good guess at what the probability would be

So how can we be sure that our measurements are accurate?
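One common way to put a number on that confidence is a Student's t confidence interval around the sample mean, usually written μ = x̄ ± t·s/√n. The sketch below is only an illustration under stated assumptions: the five data values are hypothetical, the function name is made up for this example, and the t value (2.776 for 4 degrees of freedom at 95% confidence) is assumed to be read from a standard two-sided t table rather than computed.

```python
import math
import statistics

def confidence_interval(values, t_value):
    """Return the sample mean, sample standard deviation, and the interval
    mean +/- t*s/sqrt(n) for a t value looked up from a table."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = t_value * s / math.sqrt(n)
    return mean, s, (mean - half_width, mean + half_width)

# Hypothetical data: five replicate measurements
measurements = [5.1, 5.3, 4.8, 5.4, 5.2]

# Assumed table value: 95% confidence, n - 1 = 4 degrees of freedom
T_95_DF4 = 2.776

mean, s, (low, high) = confidence_interval(measurements, T_95_DF4)
print(f"mean = {mean:.2f}, s = {s:.2f}, 95% interval = {low:.2f} to {high:.2f}")
```

Passing the table value in as an argument keeps the sketch free of extra libraries; a different confidence level only needs a different t value from the same row of the table.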
• This is why we take a sample, rather than trying to account for the entire distribution
• Again, the central limit theorem says that, for large samples drawn from a distribution with a finite variance, the sample averages are approximately normally distributed around the average of the whole distribution
  o In other words, the sampling distribution will come out normal, centered at the average value, if the samples are chosen randomly, even if the population is not normally distributed
• Without any data collection, we presume a null hypothesis: two samples are essentially the same and have no meaningful differences
  o Two values are the same within random variation (standard deviation)
    • The only differences are due to chance and random error
    • These variations are random (indeterminate) errors rather than systematic ones
  o Basically, two samples should share the same average, but there may be some deviation between them
  o We test this using the distribution equations
• We need to report the average of a small number of measurements
  o For a finite number of measurements, the validity of the Gaussian/normal distribution is not clear
  o The measured sample average and standard deviation may not reflect the true distribution average and standard deviation
• Confidence intervals estimate how closely the sample mean matches the distribution mean
  o We use Student's t to figure this out:
    § Student's t applies Gaussian statistics to a finite number of measurements
  o The t table gives t values at several confidence levels for each number of degrees of freedom (n - 1)
  o If we made n measurements to find x̄ and s for the sample, the interval that would include the true population mean µ (whose value we do not know) with 95% confidence (i.e. 95% of intervals built this way would contain µ) is defined by t95:
$$\mu = \bar{x} \pm \frac{t_{95}\, s}{\sqrt{n}}$$
  o We're testing whether some systematic error is causing differences (not why, just whether it exists)
  o We reject the null hypothesis if there is less than an X% chance that the measured difference could come from random error (standard deviation). We can do this 3 different ways:

(a) Compare the average of one set of measurements with a single reference value (single t test)
    Calculate t from the sample mean (n = 5), then find the 98% confidence level on the table (n - 1 = 4):
$$t_{calc} = \frac{|\bar{x} - \text{reference value}|}{s}\sqrt{n}$$
    1. Blood cell example: n = 5 days of measurements
    2. Calculate the average and standard deviation of this sample, then plug them into the single t test to calculate t
    3. Use n - 1 = 4 degrees of freedom (4th row on the table)
       a. Specify a confidence level of x%: 98%
    4. Compare the calculated t value (4.28) with the corresponding slot in the table (3.747 for 4 degrees of freedom at 98% in a standard two-sided table)
    5. If calculated t value > table t value, reject the null hypothesis
       a. If calculated t < or = table t, accept the null hypothesis (because we have no proof otherwise)

(b) Compare two sets of replicate measurements to see whether they agree with each other (two separate sets of samples, each with its own average)
    1. Use the average and standard deviation of the two sets, which may have different n values
       a. Compare the calculated t value with the table value at the chosen level of confidence
    2. If calculated t value > table t value, reject the null hypothesis
       a. If < or =, accept the null hypothesis

(c) Compare individual differences (two paired groups of measurements on the same population)
    You pair up the n measurements from the two groups and calculate the t test from the differences, one pair at a time, with the mean difference in consideration
    1. Each of the n samples has a pair of outputs; the squared deviations of each difference from the mean difference are summed, divided by (n - 1), and square-rooted to get the standard deviation of the differences

What is the F Test?
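As a rough sketch of the F test, the F statistic is usually computed as the ratio of two sample variances (squared standard deviations) with the larger one on top, so that F ≥ 1, and then compared with a tabulated value. The data, the function names, and the table value below (6.39 for 4 and 4 degrees of freedom at 95% confidence, assumed from a standard F table) are illustrative assumptions, not values from the lecture.

```python
import statistics

def f_statistic(group_a, group_b):
    """F = larger sample variance / smaller sample variance (always >= 1).
    Assumes neither group has zero variance."""
    var_a = statistics.variance(group_a)  # s**2 with n - 1 denominator
    var_b = statistics.variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)

def standard_deviations_differ(group_a, group_b, f_table):
    """Reject the null hypothesis (equal standard deviations) when Fcalc > Ftable."""
    return f_statistic(group_a, group_b) > f_table

# Hypothetical replicate measurements from two methods
method_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
method_2 = [2.0, 2.0, 3.0, 4.0, 4.0]

F_TABLE = 6.39  # assumed table value for (4, 4) degrees of freedom at 95%

print(f_statistic(method_1, method_2))
print(standard_deviations_differ(method_1, method_2, F_TABLE))
```

Here Fcalc is well below the assumed table value, so the null hypothesis stands and the two standard deviations are treated as the same within random error.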
• We use the F test to see whether two or more data sets (groups) have the same standard deviation (rather than the same average)
  o For two groups, F is the ratio of the squared standard deviations, with the larger one on top: $F_{calc} = s_1^{2}/s_2^{2}$
• For several (more than 2) normally distributed sets that all have the same standard deviation, the null hypothesis says that their mean values are equal
  o The groups should have the same average (Fcalc ≤ Ftable)
  When Fcalc > Ftable, reject the null hypothesis
  When Fcalc ≤ Ftable, accept the null hypothesis
• The F table is read using the degrees of freedom for s1 and s2
• The analysis of variance (ANOVA) tests for differences among the groups' means by comparing the variation between groups with the variation within groups

Not all measurements reflect the average; what about outliers?
• The Grubbs test and the Q test determine whether it is appropriate to remove an outlier
  o For the Grubbs test, $G_{calc} = |\text{questionable value} - \bar{x}|/s$, which is compared with the table value
  If Gcalc > Gtable (at x = 95%), reject the null hypothesis
  o It's okay at 95% confidence to remove the outlier
  If Gcalc < Gtable (at x = 95%), accept the null hypothesis
  o You cannot be 95% confident that the point is an outlier (so don't remove it!)
• The Q test allows us to examine whether one (and only one) observation from a small set of n replicate observations can be legitimately rejected or not
  o Basically the same as the Grubbs test, but using different variables: Q is the gap between the questionable value and its nearest neighbor, divided by the range of the data
• The Grubbs test and the Q test serve the same purpose and can be used interchangeably (Grubbs is more popular)

References: Stephen Maldonado, September 14 and 16