# APPLIED DATA ANALYSIS CHEN 3010

This 82-page set of class notes was uploaded by Sylvester Sauer on Thursday, October 29, 2015. The notes belong to CHEN 3010 at University of Colorado at Boulder, taught by Staff in Fall. Since the upload, the notes have received 62 views. For similar materials see /class/231947/chen-3010-university-of-colorado-at-boulder in Chemical Engineering at University of Colorado at Boulder.

Date Created: 10/29/15

### CHEN 3010: Selected Homework Problems

Dave Clarke, Tuesday, November 8, 2005

Topics covered:

1. Hypothesis testing
2. Confidence interval interpretation
3. ANOVA interpretation

#### Hypothesis testing

Hypothesis testing is the basis for good decision making about the system in which you are interested. It is important that the general procedure is well learned, because it applies for the remainder of this course.

Outline (see text, pp. 135-145):

1. Specify the null and alternate hypotheses.
2. Choose the analytical technique.
3. State and test for the necessary assumptions.
4. Choose the level of significance, $\alpha$.
5. State the appropriate test statistic.
6. State the rejection criteria.
7. Perform the computations.
8. State conclusions in the context of the problem.

There are 3 ways to reject the null hypothesis:

1. Calculate the test statistic and compare it to a rejection criterion. Reject $H_0$ according to the specific criteria.
2. Calculate the P-value and compare it to the level of significance. Reject $H_0$ if $P < \alpha$.
3. Calculate an appropriate confidence interval and see whether the hypothesized population parameter value falls inside or outside the interval. Reject $H_0$ if the value is outside the interval.
   - If $H_a$ uses $\neq$, calculate a 2-sided confidence interval.
   - If $H_a$ uses $>$, calculate a lower one-sided CI.
   - If $H_a$ uses $<$, calculate an upper one-sided CI.

#### Example: Problem 4-40

Measurements on the percentage of enrichment of 12 fuel rods used in a nuclear reactor were reported as follows. Is the enrichment different from the specification of 2.95?

| Data point | Enrichment (arbitrary units) |
|---|---|
| 1 | 3.11 |
| 2 | 2.84 |
| 3 | 3.08 |
| 4 | 2.88 |
| 5 | 2.86 |
| 6 | 2.89 |
| 7 | 3.08 |
| 8 | 3.04 |
| 9 | 3.12 |
| 10 | 3.01 |
| 11 | 3.09 |
| 12 | 2.98 |

**1. Specify hypotheses**

$$H_0: \mu = 2.95 \qquad H_a: \mu \neq 2.95$$

The value in the two hypotheses is ALWAYS the SAME.

**2. Choose analytical technique**

Tests that we have learned:

z-test

- inference on the MEAN of a population, VARIANCE KNOWN (p. 146)
- inference on the MEANS of TWO populations, variances
KNOWN (p. 199)
- inference on a single population proportion, approximating the binomial distribution with the normal distribution (p. 178)
- inference on two population proportions (p. 230)

t-test

- inference on the MEAN of a population, variance UNKNOWN (p. 162)
- inference on the MEANS of TWO populations, variances UNKNOWN and EQUAL (p. 205)
- inference on the MEANS of TWO populations, variances UNKNOWN and UNEQUAL (p. 205)
- inference on the MEANS of TWO populations, variances UNKNOWN, for which the observations were collected as PAIRS (p. 218)

$\chi^2$ test

- inference on the VARIANCE of a normal population (p. 172)

F-test

- inference on the RATIO of VARIANCES from TWO normal populations (p. 224)

ANOM & ANOVA

- inference on the means of more than two samples (p. 235)

So which test do we use? We want to make an inference about the MEAN of a population. Do we know the POPULATION variance? No. Therefore, use a one-sample t-test.

**3. Assumptions**

This test requires that the underlying population from which the sample was drawn is normal. Test this using a normal probability plot (normal scores plot).

[Figure: Probability Plot of C1, Normal, 95% CI; vertical axis: Percent]

**4. Set level of significance**

The level of significance, $\alpha$, is the risk of a type I error. It is usually 0.05 or 0.01.

**5. State the test statistic**

A test statistic is a quantity that is known to behave according to a known probability distribution, i.e., the t-distribution.

**6. State rejection criterion**

For $H_a: \mu \neq \mu_0$, reject $H_0$ if

$$t_0 > t_{\alpha/2,\,n-1} \quad \text{or} \quad t_0 < -t_{\alpha/2,\,n-1}$$

NOTE: p. 208 has an error in the 3rd rejection criterion. It should read: [corrected expression missing from the source]

**7. Perform computations**

$\bar{x} = 2.998$, $s = 0.105$, $n = 12$

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.998 - 2.95}{0.105/\sqrt{12}} = 1.595$$

The value 1.595 comes from the unrounded sample mean and standard deviation; the rounded values shown give 1.58, which is what Minitab reports from the summarized data.

$$t_{0.025,\,11} = 2.201$$

$t_0 < t_{0.025,\,11}$ and $t_0 > -t_{0.025,\,11}$

Excel calculation:

| n | mean | std dev | $t_0$ | $t_{crit}$ |
|---|---|---|---|---|
| 12 | 2.998 | 0.105 | 1.595 | 2.201 |

Minitab output (One-Sample T, test of mu = 2.95 vs not = 2.95):

| N | Mean | StDev | SE Mean | 95% CI | T | P |
|---|---|---|---|---|---|---|
| 12 | 2.99800 | 0.10500 | 0.03031 | (2.93129, 3.06471) | 1.58 | 0.142 |

**8. State conclusions**

$t_0 < t_{critical}$; therefore we fail to reject the
null hypothesis at the 0.05 level of significance. We cannot say that the mean enrichment level is significantly different from 2.95.

The other rejection criteria lead to the same conclusion:

1. $P > \alpha$ (0.142 > 0.05)
2. 2.95 is IN the 95% CI (2.93, 3.06)

#### How to interpret a confidence interval

Confidence intervals are used in parameter estimates. That is, you come up with a number and wish to know the plausible range in which it can exist.

$$P(L \le \mu \le U) = 1 - \alpha$$

Here $1 - \alpha$ is the confidence coefficient, $L$ is the lower-confidence limit, and $U$ is the upper-confidence limit.

Technical interpretation: if an infinite number of random samples are collected and a $100(1-\alpha)\%$ CI for $\mu$ is computed from each sample, $100(1-\alpha)\%$ of these intervals will contain the true value of $\mu$.

Practical interpretation: the interval $[l, u]$ brackets the true value of $\mu$ with $100(1-\alpha)\%$ confidence.

#### Example: Problem 4-40, confidence interval

Using the same 12 fuel-rod measurements, find a 99% two-sided CI on the mean enrichment. Are you comfortable with the statement that the mean enrichment is 2.95? Why?

$\bar{x} = 2.998$, $s = 0.105$, $n = 12$, $t_{0.005,\,11} = 3.106$

$$\bar{x} - t_{0.005,\,11}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{0.005,\,11}\frac{s}{\sqrt{n}}$$

$$2.998 - 3.106\left(\frac{0.105}{\sqrt{12}}\right) \le \mu \le 2.998 + 3.106\left(\frac{0.105}{\sqrt{12}}\right)$$

$$2.90 \le \mu \le 3.09$$

We are 99% confident that the true mean percentage enrichment is between 2.90 and 3.09. Since 2.95 is within this interval, we are comfortable with the statement that the mean percentage enrichment is 2.95.

#### ANOVA table interpretation: Problem 5-58

An experiment was run to determine whether four specific firing temperatures affect the density of a certain type of brick. The experiment led to the following data.

(a) Does the firing temperature affect the density of the bricks? Use $\alpha = 0.05$.

(b) Find the P-value for the F-statistic computed in part (a).

Density (arbitrary units) by temperature (deg F):

| 100 | 125 | 150 | 175 |
|---|---|---|---|
| 21.8 | 21.7 | 21.9 | 21.9 |
| 21.9 | 21.4 | 21.8 | 21.7 |
| 21.7 | 21.5 | 21.8 | 21.8 |
| 21.6 | 21.5 | 21.6 | 21.7 |
| 21.7 | | 21.5 | 21.6 |
| 21.5 | | | 21.8 |
| 21.8 | | | |

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \qquad H_a: \tau_i \neq 0 \text{ for at least one } i$$

Rejecting $H_0$ implies that at least one treatment effect is not equal to zero.

Side note: Why are we analyzing variances instead of the
means? The comparison is between the variation across treatments and the variation from random error.

#### The ANOVA table (Excel)

Excel output, Anova: Single Factor.

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| 100 | 7 | 152 | 21.71429 | 0.018095 |
| 125 | 4 | 86.1 | 21.525 | 0.015833 |
| 150 | 5 | 108.6 | 21.72 | 0.027 |
| 175 | 6 | 130.5 | 21.75 | 0.011 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 0.13911 | 3 | 0.04637 | 2.615911 | 0.082655 | 3.159911 |
| Within Groups | 0.319071 | 18 | 0.017726 | | | |
| Total | 0.458182 | 21 | | | | |

Check: Table 5-7, p. 241.

Reject $H_0$ if $f_0 > f_{\alpha,\,a-1,\,N-a}$, where $a$ is the number of treatments and $N$ is the total number of observations (here 3 and 18 degrees of freedom).

Here $F_0 < F_{critical}$ (2.616 < 3.160); therefore we fail to reject the null hypothesis.

#### How does random error propagate in calculations?

$y = f(x)$, where each measured $x$ is an apparent mean plus a random error $\varepsilon$. Random errors tend to have a central distribution about zero. What value of $\varepsilon$, then, should be used in propagating the error through?

#### Descriptive statistics

[Minitab and Excel descriptive-statistics output for variable C1: N = 75, StDev = 14.12, minimum = 43, maximum = 108, Q1 = 58, Q3 = 82]

#### Stem-and-leaf (Minitab)

[Minitab stem-and-leaf display of C1: N = 75, leaf unit = 1.0]

#### Automatic histogram (Minitab)

[Figure: Histogram of C1 with Normal Curve]

#### Box plot

Draw by hand or with Minitab. IQR = 82 - 58 = 24; the box marks $q_1$, $q_2$, and $q_3$.

#### Probability rules

Addition rule ($\cap$ means intersection, "and"):

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

Conditional probability, the probability that $A_2$ will occur given that $A_1$ does occur:

$$P(A_2 \mid A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)}$$

Multiplication rule:

$$P(A_1 \cap A_2) = P(A_2 \mid A_1)\,P(A_1)$$

Independence:

$$P(A_1 \cap A_2) = P(A_1)\,P(A_2)$$

Mean
and variance of a continuous random variable:

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

Example: the uniform distribution.

#### The normal distribution

Normal density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)} \quad \text{for } -\infty < x < \infty$$

Standard normal density function ($\mu = 0$, $\sigma = 1$):

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \quad \text{for } -\infty < z < \infty$$

[Figure: Standard Normal Cumulative Distribution]

#### Binomial

$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

- $n$ = sample or trial size
- $p$ = probability of success/failure
- $x$ = number of successes/failures

[Figure: Binomial Distribution, f(x) vs. x]

#### Hypergeometric

A population has $N$ items, $r$ of which are of the type of interest, and a sample of $n$ items is withdrawn:

$$f(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$$

[Figure: Hypergeometric Distribution]

#### Negative binomial

$$f(x) = \binom{x-1}{r-1}(1-p)^{x-r} p^r, \qquad r = 1, 2, 3, \ldots, \quad x = r, r+1, r+2, \ldots$$

Here $x$ is the number of trials needed to obtain exactly $r$ successes.

Example: A pipeline is being inspected for faulty welds. The probability that a weld is defective is 0.01. The welds are spaced at 100 ft. Inspection starts at one end of the pipeline and ends when 3 faulty welds are found. What is the probability that the inspection will extend more than a mile?

Excel syntax: NEGBINOMDIST(number_f, number_s, probability_s), where number_f is the number of failures, number_s is the threshold number of successes, and probability_s is the probability of a success.

#### Poisson distribution

Mass function, where $x$ is the number of counts in the interval:

$$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

Mean and variance:

$$\mu = E(X) = \sum_{x=0}^{\infty} x f(x) = \lambda \qquad \sigma^2 = V(X) = \sum_{x=0}^{\infty} (x-\mu)^2 f(x) = \lambda$$

Poisson distribution with $\lambda = 0.5$:

| x | f(x) |
|---|---|
| 0 | 0.606531 |
| 1 | 0.303265 |
| 2 | 0.075816 |
| 3 | 0.012636 |
| 4 | 0.00158 |
| 5 | 0.000158 |
| 6 | 1.32E-05 |
| Total | 0.999999 |

#### Exponential distribution

The exponential distribution describes the "distance" $x$ between events of a Poisson process. The "distance" may be a physical dimension or time.

Density:

$$f(x) = \lambda e^{-\lambda x}$$

Cumulative distribution:

$$F(x) = 1 - e^{-\lambda x}$$

Mean and variance:

$$\mu = E(X) = \frac{1}{\lambda} \qquad \sigma^2 = V(X) = \frac{1}{\lambda^2}$$

#### Asymmetric continuous distributions

Many physical processes and properties naturally show an asymmetric behavior in their distribution. Examples: particle sizes, polymer molecular weights, process/component reliability.

There are numerous distribution functions that can be used to describe asymmetric behavior. The most common are
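lognormal, gamma, and Weibull.

#### Example: normal scores plot

[Figure: Normal Scores Plot of Heat Capacity Data; horizontal axis: heat capacity value, vertical axis: normal score]

On to the workshop to learn how to create probability plots and normal scores plots.

#### Independent random variables

Two components with $P(C_1) = 0.9$ and $P(C_2) = 0.95$:

- Series: $P(C_1)\,P(C_2) = 0.9 \times 0.95 = 0.855$
- Parallel: $P(C_1 \text{ or } C_2) = 1 - [1 - P(C_1)][1 - P(C_2)] = 1 - (0.1)(0.05) = 0.995$

#### Combinations of dependent variables

For $Y = X_1 + X_2$:

$$E(Y) = \mu_1 + \mu_2$$

$$V(Y) = \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{Cov}(X_1, X_2)$$

$$\mathrm{Cov}(X_1, X_2) = \rho_{X_1 X_2}\sqrt{\sigma_1^2 \sigma_2^2}$$

Here $\rho_{X_1 X_2}$ is the correlation coefficient, with $-1 \le \rho \le 1$. The numerator of its sample estimate is

$$\sum_{i=1}^{n} x_{1i} x_{2i} - \frac{1}{n}\left(\sum_{i=1}^{n} x_{1i}\right)\left(\sum_{i=1}^{n} x_{2i}\right)$$

#### Workshop review session: Exam 1

Workshop 1: Coin Toss Experiment

- Small sample sizes may not behave normally.
- Increasing the sample size tends to follow probability: heads/tails approaches 50%, the roll of a die approaches 1/6.

Workshop 2: Error Analysis

- Sources of error: human, instrument, material variability.
- Estimate errors in measurement: accuracy/readability of the instrument, human error, material variation.
- Estimate of random error: propagation of the error in a volume measured with calipers of 1 µm resolution. The numbers below are consistent with a hollow cylinder, $V = \frac{\pi}{4} L (d_o^2 - d_i^2)$.

$$w_V = \sqrt{\left(\frac{\partial V}{\partial d_o} w_{d_o}\right)^2 + \left(\frac{\partial V}{\partial d_i} w_{d_i}\right)^2 + \left(\frac{\partial V}{\partial L} w_L\right)^2}$$

$$w_V = \sqrt{\left(\frac{\pi L d_o}{2}\,(0.001\ \text{mm})\right)^2 + \left(\frac{\pi L d_i}{2}\,(0.001\ \text{mm})\right)^2 + \left(\frac{\pi}{4}\left(d_o^2 - d_i^2\right)(0.001\ \text{mm})\right)^2}$$

With $d_o = 22$ mm, $d_i = 20$ mm, and $L = 40$ mm: $w_V = 1.869\ \text{mm}^3$, $V = 2639\ \text{mm}^3$, and $w_V / V = 0.071\%$.

Workshop 3: Statistical Diagrams

- Stem-and-leaf diagrams (pp. 23-26 in the text); step-by-step directions are in the workshop.
- Histogram bins.

Workshop 4: Statistical Diagrams

- Box plot: step-by-step directions.
- A line is drawn for the median value; it indicates the general symmetry of the distribution.
- Comments/observations are becoming important.

Workshop 5: Probability

- Two methods for testing a shipment of 5 items: (A) take 2 of 5, $p = 2/5$; (B) inspector 1 takes 1, inspector 2 takes 1.
- $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$

Workshop 6: The Normal Distribution

- Mean and standard deviation.
- NORMDIST(x, mean, stdev, cumulative)

Workshop 7: Sampling with and without Replacement

- With replacement: BINOMDIST
- Without replacement: HYPGEOMDIST
- $\sum P = 1$

Workshop 8: The Poisson Process

- Poisson process: discrete occurrences in a continuous interval; $\lambda$ = rate (POISSON)
- Exponential distribution: distance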
between events of a Poisson process (EXPONDIST)
- LABEL AXES OF GRAPHS

Workshop 9: Central Limit Theorem

- Number of bins for 4000 data points: about 64.
- Expected bin height: (NORMDIST(upper bin boundary cell address, 0, 1, TRUE) - NORMDIST(lower bin boundary cell address, 0, 1, TRUE)) * 200. The factor of 200 scales the curve to the appropriate height.

Workshop 10: Probability and Normal Scores Plot

- The full plot must be taken into consideration to decide whether the data appear normal.
- Envelopes can help identify outliers.
- Too many outliers indicate a non-normal distribution.
- Observations and comments are important.

### CHEN 3010: Selected Homework Problems

Dave Clarke, Tuesday, September 27, 2005

#### The normal distribution: Homework 4, Problem 4

The tensile strength of a metal part is normally distributed with a mean of 40 lb and a standard deviation of 8 lb. If 50,000 such parts are produced, how many would fail to meet a minimum specification of 34 lb? How many would have a tensile strength in excess of 47.5 lb?

Characteristics of the distribution: mean = 40 lb, std dev = 8 lb.

Plot the distribution using NORMDIST(x, mean, stdev, FALSE).

[Figure: probability density function f(x) vs. x, for x from 0 to 80]

Probabilities of CONTINUOUS distributions are the area under the curve. Probabilities of DISCRETE distributions are the value of the probability mass function at x.

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du \qquad P(a \le X \le b) = F(b) - F(a)$$

Parts that fail the minimum specification:

$P(x \le 34)$ = NORMDIST(34, 40, 8, TRUE) = 0.227

50,000 × 0.227 = 11,350 parts would fail the minimum specification.

Parts with tensile strength above 47.5 lb:

$P(x \ge 47.5) = 1 - P(x \le 47.5)$ = 1 - NORMDIST(47.5, 40, 8, TRUE) = 0.174

50,000 × 0.174 = 8,700 parts would have a tensile strength greater than 47.5 lb.

Given the distribution properties and a value for P, and asked to find x, which Excel function would you use?
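The two normal-distribution counts can also be checked outside Excel. The sketch below assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` in place of NORMDIST; the variable names are illustrative and do not come from the original notes. Its last step does the inverse lookup (a probability in, an x out), which is the kind of calculation the closing question asks about and which Excel provides as NORMINV.

```python
from statistics import NormalDist

# Tensile strength model from the problem: mean 40 lb, standard deviation 8 lb
strength = NormalDist(mu=40, sigma=8)
n_parts = 50_000

# P(X <= 34): the same quantity as NORMDIST(34, 40, 8, TRUE)
p_fail = strength.cdf(34)

# P(X >= 47.5) = 1 - P(X <= 47.5)
p_strong = 1 - strength.cdf(47.5)

print(f"P(X <= 34)   = {p_fail:.3f}, about {n_parts * p_fail:,.0f} parts")
print(f"P(X >= 47.5) = {p_strong:.3f}, about {n_parts * p_strong:,.0f} parts")

# Inverse problem: given a cumulative probability, recover x
x_at_p = strength.inv_cdf(p_fail)
print(f"x such that P(X <= x) = {p_fail:.3f}: {x_at_p:.1f} lb")
```

The slides round each probability to three decimals before multiplying by 50,000, so the part counts printed here differ slightly from 11,350 and 8,700.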
#### Discrete distributions: Homework 4, Problem 6

Cotton linters used in the production of rocket fuel are subjected to a nitration process that enables the cotton fibers to go into solution. The process is 90% effective in that the material produced can be shaped as desired in a later processing stage with a probability of 0.9. What is the probability that exactly 20 lots will be produced in order to obtain the 3rd defective lot?

There are 2 ways to solve this: 1 harder, 1 easier.

Harder way: what is the probability of 2 successes in 19 trials AND a success on the 20th trial? Use BINOMDIST and the probability rules, with P = 0.1 as the probability of a "success" (a defective lot).

1. Calculate the probability of 2 successes in 19 trials: BINOMDIST(2, 19, 0.1, FALSE) = 0.285.
2. Determine the probability of a success in a single trial: P = 0.1.
3. Determine the probability of the 3rd defective lot occurring on the 20th lot. Define the events A = 2 successes in 19 trials, with $P(A) = 0.285$, and B = success on any given trial, with $P(B) = 0.1$. The events are independent, therefore

$$P(A \cap B) = P(A)\,P(B) = 0.285 \times 0.1 = 0.0285$$

Easier way: how many trials are required for 3 successes? Use NEGBINOMDIST(failures prior to the ith success, number of successes, probability of success):

NEGBINOMDIST(17, 3, 0.1) = 0.0285

#### Mean and variance: Homework 3, Problem 4

The thickness of a conductive coating in micrometers has a density function of $600x^{-2}$ for 100 µm < x < 120 µm and zero elsewhere.

(a) Determine the mean and variance of the coating thickness.

(b) If the coating costs \$0.50 per micrometer of thickness on each part, what is the average cost of the coating per part?

Again, there is more than one way to solve this: analytically and numerically.

1. Find the mean and variance analytically.

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{100}^{120} 600x^{-1}\,dx = 600 \ln x \Big|_{100}^{120} = 600(4.7875 - 4.6052) = 109.38$$

$$\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \int_{100}^{120} x^2\left(600x^{-2}\right)dx - 109.39^2 = 600x \Big|_{100}^{120} - 11967 = 72000 - 60000 - 11967 = 33$$

2. Find the mean and variance numerically.

| Thickness x | Probability density f(x) | Area | Cumulative distribution | x f(x) | Area | $(x-\mu)^2 f(x)$ | Area |
|---|---|---|---|---|---|---|---|
| 90 | 0.000 | 0 | 0 | | | | |
| 100 | 0.000 | 0 | 0 | 0 | 0 | 0 | |
| 100 | 0.060 | 0 | 0 | 6 | 0 | 3.577 | 0 |
| 103 | 0.057 | 0.175 | 0.175 | 5.825 | 17.476 | 1.261 | 7.257 |
| 106 | 0.053 | 0.165 | 0.340 | 5.660 | 16.981 | 0.158 | 2.129 |
| 110 | 0.050 | 0.206 | 0.546 | 5.455 | 21.818 | 0.257 | 0.831 |
| 113 | 0.047 | 0.145 | 0.691 | 5.310 | 15.929 | 1.309 | 2.350 |
| 116 | 0.045 | 0.137 | 0.828 | 5.172 | 15.517 | 3.056 | 6.548 |
| 120 | 0.042 | 0.173 | 1.000 | 5 | 20 | 6.282 | 18.675 |
| 120 | 0.000 | 0 | 1.000 | 0 | 0 | 0 | 0 |
| 130 | 0.000 | 0 | 1.000 | 0 | 0 | 0 | 0 |
| Sum | | | | | 107.72 | | 37.79 |

The "Calculate mean" columns sum to a mean of 107.72, and the "Calculate variance" columns sum to a variance of 37.79.

3. Coating cost (part b): \$0.50 per µm of thickness per part.

Average cost = \$0.50 × mean thickness = \$54.50 per part, using a mean thickness of about 109 µm.

#### Describing data: Homework 2, Problem 3 (more or less)

The following data are the joint temperatures of the O-rings (degrees F) for each test firing or actual launch of the space shuttle rocket motor.

- Compute descriptive statistics.
- Compute boxplot parameters.
- Draw the boxplot.
- Draw a stem-and-leaf plot.
- Are there any outliers?

Descriptive statistics (use the QUARTILE function for the quartiles):

| Quantity | Value |
|---|---|
| mean | 65.86 |
| stdev | 12.16 |
| upper quartile | 75 |
| lower quartile | 59.5 |
| IQR | 15.5 |
| 1.5 × IQR | 23.25 |
| inner fences: lower quartile - 1.5 × IQR, upper quartile + 1.5 × IQR | 36.25, 98.25 |
| whisker limits (i.e., extreme data points within the inner fences) | 40, 84 |
| outer fences: lower quartile - 3 × IQR, upper quartile + 3 × IQR | 13.00, 121.5 |

[Figure: Boxplot of C1]

Stem-and-leaf display of C4 (N = 36, leaf unit = 1.0):

| Stem | Leaves |
|---|---|
| 3 | 1 |
| 3 | |
| 4 | 0 |
| 4 | 59 |
| 5 | 23 |
| 5 | 788 |
| 6 | 0113 |
| 6 | 6777789 |
| 7 | 000023 |
| 7 | 556689 |
| 8 | 0134 |

Outliers: the value 31 lies below the lower inner fence (36.25), so it is an outlier.

#### Probability basics: Homework 3, Problem 1

In studying the causes of power failures at chemical process facilities, the following observations have been made: 5% are due to transformer failure, 80% are due to transmission line failure, and 1% involve both failures. Compute the probabilities for a given power failure for the following scenarios:

1. line failure, given that there is a known transformer failure
2. transformer failure, given that there is a known line failure
3. transformer failure but no line failure
4. transformer failure, given that there is no line failure
5. transformer failure or line failure

Set up
the problem from the power failure cause probabilities:

- 5% transformer: $P(T) = 0.05$
- 80% line: $P(L) = 0.8$
- 1% both: $P(T \cap L) = 0.01$

The 1st 2 parts are conditional probabilities:

$$P(L \mid T) = \frac{P(T \cap L)}{P(T)} = \frac{0.01}{0.05} = 0.2$$

$$P(T \mid L) = \frac{P(L \cap T)}{P(L)} = \frac{0.01}{0.8} = 0.0125$$

The 3rd part is confusing:

$$P(T \cap L') = P(T) - P(T \cap L) = 0.05 - 0.01 = 0.04$$

The 4th part is less confusing (conditional probability):

$$P(T \mid L') = \frac{P(T \cap L')}{P(L')} = \frac{0.04}{0.2} = 0.2$$

The 5th part is straightforward:

$$P(T \cup L) = P(T) + P(L) - P(T \cap L) = 0.05 + 0.8 - 0.01 = 0.84$$

#### Correlation: Homework 2, Problem 4

The weight and systolic blood pressure of 26 randomly selected males in the age group 25 to 30 are shown in the following table.

(a) Create a scatter diagram of the data. What do you anticipate will be the sign of the sample correlation coefficient?

(b) Compute and interpret the sample correlation coefficient.

| Subject | Weight | Systolic BP |
|---|---|---|
| 1 | 165 | 130 |
| 2 | 167 | 133 |
| 3 | 180 | 150 |
| 4 | 155 | 128 |
| 5 | 212 | 151 |
| 6 | 175 | 146 |
| 7 | 190 | 150 |
| 8 | 210 | 140 |
| 9 | 200 | 148 |
| 10 | 149 | 125 |
| 11 | 158 | 133 |
| 12 | 169 | 135 |
| 13 | 170 | 150 |
| 14 | 172 | 153 |
| 15 | 159 | 128 |
| 16 | 168 | 132 |
| 17 | 174 | 149 |
| 18 | 183 | 158 |
| 19 | 215 | 150 |
| 20 | 195 | 163 |
| 21 | 180 | 156 |
| 22 | 143 | 124 |
| 23 | 240 | 170 |
| 24 | 235 | 165 |
| 25 | 192 | 160 |
| 26 | 187 | 159 |

[Figure: scatter diagram of systolic BP (vertical axis) vs. weight (horizontal axis)]

The sample correlation coefficient is

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

with

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right), \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

In Excel:

- $S_{xy}$ = SUMPRODUCT(x, y) - SUM(x) * SUM(y) / n
- $S_{xx}$ = DEVSQ(x) and $S_{yy}$ = DEVSQ(y), OR create a separate array dev = x - mean and then calculate SUMSQ(dev)

Result: r = 0.77.

The built-in functions CORREL(x, y) and PEARSON(x, y) give the same r = 0.77.
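As a cross-check on the spreadsheet route, here is a minimal Python sketch of the same $S_{xy}$, $S_{xx}$, $S_{yy}$ calculation. It uses only the standard library; the function name `sample_correlation` and the variable names are illustrative assumptions rather than anything from the original notes. The two lists hold the 26 weight and systolic BP values from the problem, and the result should land near the r of 0.77 reported for this problem.

```python
from math import sqrt

# Weight and systolic blood pressure for the 26 subjects in the problem
weight = [165, 167, 180, 155, 212, 175, 190, 210, 200, 149, 158, 169, 170,
          172, 159, 168, 174, 183, 215, 195, 180, 143, 240, 235, 192, 187]
systolic_bp = [130, 133, 150, 128, 151, 146, 150, 140, 148, 125, 133, 135, 150,
               153, 128, 132, 149, 158, 150, 163, 156, 124, 170, 165, 160, 159]


def sample_correlation(x, y):
    """Return r = Sxy / sqrt(Sxx * Syy) for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Sxy = sum(x*y) - sum(x)*sum(y)/n, the SUMPRODUCT form
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    # Sxx and Syy are sums of squared deviations from the mean, the DEVSQ form
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = sample_correlation(weight, systolic_bp)
print(f"r = {r:.2f}")
```

Writing out the sums by hand mirrors the SUMPRODUCT and DEVSQ steps; on Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient in a single call.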
