INTRODUCTION TO ENGINEERING
INTRODUCTION TO ENGINEERING ENGR 120
Jazmin Rowe II
These 23 pages of class notes were uploaded by Jazmin Rowe II on Monday, October 12, 2015. They belong to ENGR 120 at Idaho State University, taught by Staff in Fall.
Date Created: 10/12/15
Elementary Statistical Formulae, Engr 120, Spring 2007

1. Means, Medians, and Standard Deviations

Given a data set of $n$ elements, let's call it $\{x_i\}$, $i = 1, \dots, n$, with numerical values, the mean and median are indicators of the central value, or representative value, of the sample. The mean is easily computed by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The median is gotten by sorting the data into ascending order; call this set $\{z_i\}$ (the elements have been reordered so that $z_1 \le z_2 \le \cdots \le z_n$). The median $x_{med}$ is then the element $z_{\lceil n/2 \rceil}$. The notation $\lceil x \rceil$ means round $x$ up to the next higher integer, e.g., $\lceil 23/2 \rceil = 12$.

The standard deviation is a measure of the spread of the data values about their mean. It is given by the formula

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

(not used for computation), or equivalently by

$$s = \sqrt{\frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}}$$

This second formula is preferred because you can compute all of its constituent pieces in one pass through the data set.
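The handout's formulas translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the five-value data set is made up for the demonstration, and it assumes the n − 1 divisor for the standard deviation. It computes the mean, the median as the ⌈n/2⌉-th sorted value, and the standard deviation both from the definition (two passes: mean first, then squared deviations) and from the one-pass form that accumulates n, Σx_i, and Σx_i².

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(xs) / len(xs)

def median(xs):
    """Median as z_{ceil(n/2)} of the data sorted ascending (1-based index)."""
    z = sorted(xs)
    k = math.ceil(len(z) / 2)  # 1-based position in the sorted list
    return z[k - 1]

def std_two_pass(xs):
    """Definitional formula: needs the mean first, then a second pass. n >= 2."""
    n = len(xs)
    xbar = mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def std_one_pass(xs):
    """Computational formula: accumulate n, sum x, sum x^2 in one pass. n >= 2."""
    n = 0
    s1 = 0.0  # running sum of x
    s2 = 0.0  # running sum of x^2
    for x in xs:
        n += 1
        s1 += x
        s2 += x * x
    return math.sqrt((n * s2 - s1 * s1) / (n * (n - 1)))

if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]  # small made-up data set
    print(mean(data), median(data), std_two_pass(data), std_one_pass(data))
```

Both standard-deviation functions agree here (√3.2, about 1.79). The one-pass form can lose precision when the values are large compared with their spread; for classroom-sized data this does not matter.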
2. Regression Lines

Suppose one has $n$ data points which constitute the measured responses $\{y_i\}$ in an experiment, the $y_i$'s corresponding to the controlled inputs $\{x_i\}$, and the data are presumed to follow a linear relationship of the form $y = mx + b$. It is seldom true that the data will actually fall on any single straight line if $n > 2$. Why?

What we often do in this case is fit the data with a line whose slope $m$ and intercept $b$ are chosen so as to minimize the residual squared error

$$R^2 = \sum_{i=1}^{n}\left(y_i - m x_i - b\right)^2$$

What we are seeking is a fudging of the data which is as small as possible, as measured in the sense of squared error. It can be easily shown that the correct values of $m$ and $b$ for this problem are

$$m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b = \bar{y} - m\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$'s and $y_i$'s, respectively. The regression line, as it is called, is sensitive to outlier data points. These typically are caused by measurement errors or recording errors. It is no crime to throw out obvious outliers to get a better fit, so long as you have reasonable suspicion as to how they may have been caused.

[Figures: "Hooke's Law Data and Regression Line" (displacement, cm) and "spring deflection and regression line" (deflection, cm, vs. weight, oz)]

Elementary Statistics

In this section we cover the statistical concepts of means (averages), medians, variance, histograms (frequency plots), simple regression (line fitting), and sorting. Associated concepts include collecting data samples, sources of error, significant figures, and discovering trends. The grade level for this material ranges from 4 through 8.

The collection of experimental data is one of the foundations of the hard sciences and engineering. As we have learned, engineering and the sciences are based on both theory and observation. The scientific role of the latter is to verify, challenge, or suggest new theory. However, in engineering, often what is needed is not theory but a description of how an actual system is behaving. Hence the data either suggest an empirical model or are used to verify a hypothesized model. Statistics is the tool we use for organizing and understanding that data and its relation to the model.

We will primarily discuss data collection in which we are concerned with only one dependent variable (measurement) for each observation (sample point). E.g., we might have a collection of students at our disposal and measure each student's height. One could of course imagine measuring more than one variable: we could simultaneously collect height, weight, age, and gender measurements for each of the students in this group and analyze this multivariate data, the ensuing model having 4 variables somehow related to each other.

Suppose we have a collection of objects, all taken from some population at large. Examples
of populations could be: all students in your school; a collection of identical rubber bands; all homes up for sale in your town during the year; all fish swimming around in a local reservoir; all bags of frozen French fries produced by a local processor; the annual production of roller bearings made by a manufacturer; all registered voters in your county; all the trees growing on your uncle's back 40 acres; etc. The key point is that a population does not need to consist of people, let alone living things; it is just things we want to describe some aspect of.

The collection we have at our disposal for making measurements on is called a sample. A random sample can be thought of as a sample wherein we made no attempt to select our collection preferentially. Whether this is actually the case is something we need to consider when analyzing our data later. If you have a large enough population at your disposal, there are clever techniques for ensuring you have a random sample; otherwise we just assume we didn't rig our experiment. Corresponding to the above populations, samples could consist of: all students showing up in your math class on a particular day; 20 rubber bands grabbed from the bag of rubber bands; this week's common realty listing; 50 fish caught by gill netting on a particular day by Fish & Game; two cases of bags of frozen fries taken off the shelves of the producer's warehouse; 100 roller bearings pulled off the end of the manufacturing line (every tenth item); all voters showing up between 10 AM and noon at three different polling stations on election day; the trees standing on a 100 ft x 100 ft plot staked out in your uncle's forest; etc. Some of these samples may be more random than others. Can you see why?

Related to our population, we have a measurable characteristic we wish to quantify. Examples of such a variable for the above populations might be: height of a student; coefficient of restoration (restoring force under elongation) of a gummi; asking price of a home; weight of a
fish; actual volume vs. claimed volume of a bag of fries; actual outside diameter of a bearing vs. design diameter; support for or against a particular candidate for office; usable volume of wood in a tree; etc.

After having selected or drawn your sample, we now make the associated measurement of the variable of interest. These results are typically numerical; in our running examples, all would be numerical except for the political support question. If we assume we have N sample points at hand, we now have N corresponding data values. As investigators, we know we will probably see variations in these values (except possibly for the political example!). What we want to know is whether this variation across the entire population can be described adequately by only a couple of parameters, or statistics. As you may recall from your college training, a statistic is merely a value computed from a sample.

As a first approach to answering this question, our measurements can be displayed using a frequency plot, or histogram. This merely plots the measured values vs. the frequency of occurrence (or relative frequency) of that value. With data that can only be discrete (assuming only specific values, e.g., integer-valued data, yes/no response data, eye color, gender, etc.), this poses no problem. However, with data that could conceivably be any real value, we need to bin the values. That is, rather than plot measured values vs. frequency, we plot ranges of measured values vs. frequency. The actual choice of bins can influence your judgment and interpretation of the data. The bins should not be so coarse as to hide the variation in the sample, but neither should they be so fine as to make each bin count (frequency) a 0-or-1 proposition. Remember, the histogram is meant to guide you in interpreting the variation of the variable being measured, so you can adjust the binning until you are comfortable with what you are seeing.

As an example, suppose in your 5th grade math class 23 students showed up on the day you made the height measurements.
Suppose these data are rounded to the nearest inch:

54 55 61 52 48 49 56 53 53 56 60 49 53 55 52 58 55 54 51 54 56 57 56

As a first step, I prefer to sort the data into either increasing or decreasing order:

48 49 49 51 52 52 53 53 53 54 54 54 55 55 55 56 56 56 56 57 58 60 61

If we use the actual rounded heights as bins, we get a spiky histogram. Go ahead and draw it. Now try a bin size of 2 starting at 48, then a bin size of 3. Which do you like best? Does the data have any structure? Here's mine:

[Figure: histogram of the class heights]

Having the data sorted allows us to readily adjust bin sizes on our histogram, and it immediately gives us the range of variation in the sample: for us, 48 through 61. It also allows us to quickly find the median, the data value which has half the data lying above it and half below it. The median is the 50th percentile datum and hence is often considered a good representative, or central value, statistic for describing the data. If you are interested in the other percentiles, they are just as readily available now. The median for our example is x_med = 54, the 12th of the 23 sorted values.

Now we can compute the average value for our data set. Note that, like the median, the average (or mean) only makes sense for numerical data, not categorical data like eye color, gender, approve/disapprove, etc. The mean value x_ave is computed as the sum of the data values divided by n. For our data set n = 23, the sum is 1247, and the average works out to be x_ave = 1247/23, about 54.2. The average is a second representative statistic of central tendency. It measures the center of mass, or balance point, of the data set, as if the data were weights placed on a lever, the positions of the weights being the data values and the lever being the number line.

Lastly, there is a measure of the spread of the data about its mean value. The standard deviation s is computed as the square root of the variance. The variance, in turn, is computed as (essentially) the average of the squared deviations of the data from the mean: their sum divided by n − 1. Most calculators and spreadsheets have this computation as a built-in function, just as they do for the mean.
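To check the hand calculations, here is a short sketch (the names `HEIGHTS`, `insertion_sort`, `histogram`, and `summarize` are hypothetical). It sorts the 23 heights with an insertion sort, one of the classic routines students tend to rediscover, then finds the range, the median as the ⌈n/2⌉-th sorted value, the mean, and the standard deviation (assuming the n − 1 divisor), counts how many values fall within one s of the mean, and bins the data with widths of 1, 2, and 3 starting at 48.

```python
import math

# Class heights from the example, rounded to the nearest inch.
HEIGHTS = [54, 55, 61, 52, 48, 49, 56, 53, 53, 56, 60, 49,
           53, 55, 52, 58, 55, 54, 51, 54, 56, 57, 56]

def insertion_sort(values):
    """Return a new ascending list, built the way many students sort by hand."""
    out = []
    for v in values:
        i = len(out)
        # Walk left past every element larger than v, then insert v there.
        while i > 0 and out[i - 1] > v:
            i -= 1
        out.insert(i, v)
    return out

def histogram(values, start, width):
    """Count integer values in bins [start, start+width), [start+width, ...)."""
    counts = {}
    for v in values:
        lo = start + ((v - start) // width) * width  # left edge of v's bin
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

def summarize(values):
    """Sorted data, median, mean, sample std (n - 1), count within one s."""
    z = insertion_sort(values)
    n = len(z)
    med = z[math.ceil(n / 2) - 1]  # z_{ceil(n/2)} with a 1-based index
    ave = sum(z) / n
    s = math.sqrt(sum((x - ave) ** 2 for x in z) / (n - 1))
    within = sum(1 for x in z if ave - s <= x <= ave + s)
    return z, med, ave, s, within

if __name__ == "__main__":
    z, med, ave, s, within = summarize(HEIGHTS)
    print("sorted:", z)
    print(f"range: {z[0]} to {z[-1]}")
    print(f"median: {med}  mean: {ave:.1f}  s: {s:.1f}")
    print(f"within one s of the mean: {within} of {len(z)}")
    for width in (1, 2, 3):
        print("bin width", width, histogram(z, 48, width))
```

Run as written, it reproduces the figures quoted in the text: range 48 to 61, median 54, mean about 54.2, s about 3.3, and 17 of the 23 heights within one s of the mean. With a bin width of 2, the counts for the bins starting at 48, 50, 52, 54, 56, 58, and 60 are 3, 1, 5, 6, 5, 1, and 2.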
For distributions of data allowing a description as centrally located, the standard deviation is a number which should round up about 2/3 of the data values in the corral x_ave − s ≤ x ≤ x_ave + s. For our data set the standard deviation is s = 3.3, and indeed for us 17 of the 23 data values (about 74%) are within s of x_ave, which is a little high but understandable with such a small sample. The formulae for the mean and standard deviation of a data set are given in the statistical formulae handout.

With the exception of the computation of the standard deviation, all of the above concepts are readily accessible to grades 4 through 8. The concept of the standard deviation would be appropriate to 7th and 8th graders.

As a side note, the analysis of data provides a nice classroom-wide activity. Teams can be formed to measure and analyze the same data set, but because of systematic errors in the measurement process they may well obtain slightly different results. Also, the choice of bin placement and sizing is a judgment call, and different representations of the same data are expected.

The sorting of data also allows for some classroom group discovery. Different students will sort by different algorithms; for small data sets there are many acceptable means of sorting. You can ask the different groups to explain carefully to the other groups their choice of sorting methodology, and whether that methodology can be extended to a different flavor of data. You may be surprised to find that the different groups have each discovered one variant or another of one of the classic sorting routines. Some of these routines are insertion sort, selection sort, bubble sort, merge sort, quicksort, and of course random sort. As the data sets become larger, you will find that you need to be more careful about the choice of sorting algorithm. We'll have more to say about this in our discussion of Computer Science math applications.

So why do the statistics we have discussed so far play an important role in engineering and science? One goal of both fields is to
make predictions. If random samples from a population have well-defined means and relatively tight variance, the engineer or scientist can hope that any particular experiment he or she performs is representative of the population as a whole. Moreover, the sample histograms give a means of making empirical probability statements (as opposed to theoretical probability statements) about individuals drawn at random from the population. Note that an individual need not denote a person; it could be a roller bearing, a fish, a tree, or a rubber band. The engineer can make predictions based on the central values deduced by sampling, and estimate how sure he or she is of those predictions based on the variance. Of course, in any one experiment using a single sample, the prediction may be way off. We will have more to say and do about this in the section on empirical probability.

Perhaps more important to scientists and engineers is the use of data collection and analysis to detect (I prefer the term discover) trends. Having mastered the art of collecting a sample and deducing the basic statistics for that one measurement, we can move on to the case where we allow a second variable to come into play. We will term this one variable (measurement) as a function of a second, controlled variable. This really won't be a bona fide math function until we do some more statistical analysis, though. Why? Examples of this might include: height of a student as a function of grade level; elongation of a gummi as a function of the weight (force) applied to it; asking price of a home as a function of neighborhood; weight of a fish as a function of length; outside diameter of a roller bearing as a function of the time since the assembly line was last serviced; usable volume of wood in a tree as a function of height; etc. I'm dropping those examples where it would be a stretch to set up a reasonable functional relationship, or hypothesis of a relationship.

What we've done previously can be applied here as well. We merely set up a
random sampling scheme where we record both variables: the presumed dependent, or measured, variable (height, elongation, price, weight, diameter, volume, etc.) and the independent variable (class grade, force, neighborhood, length, time since servicing, height). The choice of which variable is dependent and which is independent is made either by recognizing that you have direct control over one variable, or because you suspect (from experience, theory, or hope) that a functional relationship somehow should exist.

If we have control over the independent variable, as in sampling from various grades, we can then make a sequence of analyses just as before, one for each value of the independent variable; e.g., a height histogram for each of grades 4, 5, 6, 7, and 8. Then we can produce a single plot summarizing these results: just plot the means obtained (vertical axis) as a function of grade level (horizontal axis). We could also put standard deviation bars vertically above and below each mean. Now look at your results. Your eye, and the immense brain attached to it, is the best trend detector yet found in the world. If you see a recognizable trend, you can now proceed to more sophisticated analysis.

If, on the other hand, we really have no control over the independent variable (its value is whatever it turned out to be, as would likely occur in the fish or tree data), then we need to proceed directly to this more sophisticated analysis. The first step is to plot the data, dependent variable vertically vs. independent variable horizontally. Now you have what is known as a scatter plot. Here is an example of a scatter plot:

[Figure: scatter plot of elongation vs. force]

You can clearly see there is some sort of trend to the data, right? What sort of relationship do you think might exist between the force and elongation variables?

The next step is to guess at a functional relationship. The most common guess, if the scatter plot supports your intuition, is to fit a linear relationship to the data. Now you can either do this by eye, appropriate for grades
4 through 6, or you can introduce the concept of the best fit, suitable for hand computation in grades 7 and 8. Mathematically, there is no single best fit to any data set unless all the data coincide! However, several statistical fitting criteria have been proposed, and the simplest amongst them is the least squares criterion. You probably learned about this back in college under the topic of linear regression; all we need are the algebraic formulae for the end result. Before giving these, it should be pointed out that this fit also produces a balancing act, of a more complicated nature than that of the simple mean. But you should be aware that, as a balancing act, this fit can be thrown out of kilter by unreasonably out-of-whack data points. If you notice any data that are way out in left field (known as outliers), you may wish to censor your data set of them. Often they are the result of a data recording error.

If we denote the horizontal axis variable as x and the vertical axis variable as y, the equation of a straight line (a linear function) is y(x) = m x + b. The slope is the quantity m and the y-intercept is the term b. Probably no single choice of m and b will hit all the data, in the sense that for each data point x_i we get y_i, the measured value, obeying y_i = y(x_i) = m x_i + b. Least squares chooses compromise values for m and b. The formulae are given in the statistical formulae handout. As you can see, these are entirely manageable for 7th and 8th graders. Most spreadsheets, and even programmable calculators, have these formulae built in. All you need to do is enter the (x_i, y_i) data values, and the machine computes and displays the resulting best fitting straight line. Here is the regression fit for the previous scatter plot:

[Figure: regression line fitted to the elongation scatter plot]

Here are some suggested projects for classroom work.

1. In a single class, have the students collect height measurements in small, workable-sized groups. Perform the elementary data analysis (sorting, histogram, median, mean). Discuss the sorting algorithm used by
each group.
2. In a single class, have the students collect weight measurements. Same analysis as 1.
3. Collect the height and weight data across a sequence of grades (4 through 8 if possible). Perform the scatter plot analysis of height vs. grade and weight vs. grade. Overlay with the single-class results from 1 and 2.
4. Use a random sampling technique (poll students entering or leaving the cafeteria) to get height vs. age data. Perform the scatter plot analysis. How does the regression line agree with or differ from that in 3?
5. Perform the Hooke's Law experiment. Have the students break into workable-sized groups, and have each group experiment with one out of a package of identical rubber bands or springs. Have them perform the scatter plot analysis and regression fit for the data.

[Figure: "spring deflection and regression line", deflection (cm) vs. weight (oz)]

Here's Mary's trial of this experiment. Nice data set and nice linear regression fit, but what is wrong with her plot (stylistically, I mean)? So Ken had a whack at the data and came up with the following plot. Can you tell what the differences are?

[Figure: "Hooke's Law Data and Regression Line", displacement (cm)]

In the Hooke's Law experiment we are trying to find the spring constant k, so that the force pulling back by the spring is given as a linear function with zero y-intercept: f = kx. We don't have that here. The spring we are using is an extension spring, not a compression spring, so the spring won't start to behave linearly until it is stretched out a bit.
6. Perform the Torricelli's Law experiment. In the lower grades, assign each group of students a different-shaped water container, and then make the flow rate measurement of the time needed to fill a small standard volume (all containers have water at the same height). Have different students time each run of the experiment. Each group does a simple histogram analysis, and then aggregate the data across the entire classroom.
7. Repeat 6, but using the equivalent measurement
of measuring volume over some standard short time period. Which experiment has more scatter (greater variance)?
8. Repeat 6 or 7 over a range of different heights. In upper grades (6 through 8), perform a regression fit and see if the fit is reasonable. Does the data suggest otherwise? Introduce the idea of a simple variable transformation: you are plotting the velocity of the exiting water as a function of the water level height. The data theoretically should not lie on a straight line over a wide enough range of heights; try plotting velocity squared vs. height.
9. Perform the slingshot conservation of energy experiment. Again form several groups, and have each make measurements of time of flight vs. change in elongation of the sling. Have each group perform a scatter plot analysis. Do the data tend to lie on a straight line?
10. Perform the coefficient of expansion experiment for water. Each student is asked to take home a graduated container, fill it to an assigned volume (assign several different values so you can do a scatter plot), and freeze it overnight. He or she then measures the resulting volume of ice (how can you do this?) and reports back the original liquid volume and the resulting ice volume. Take the resulting data and plot, for each value of original volume, the value of the ice volume. Perform a scatter plot analysis. The slope of the best fitting straight line should be the coefficient of expansion.
11. Have the students sit at a controlled intersection on the sidewalk and count the number of cars that stop at the intersection over a reasonable period of time. For each stopped car, have the students record whether the driver is wearing his or her seatbelt/shoulder harness. If the observation is impossible (e.g., windows too dark, vehicle too high off the ground, etc.), discard the observation. Perform a scatter plot analysis, or scale all the individual data sets to a usage value (percentage) and perform a simple histogram analysis. ISP claims a 70% usage rate. Is your data in agreement?

Later in the short course we will introduce you
to simulated experiments you can perform. These will be based on computer-generated simulations of idealized physical or mathematical models, but which still allow the student to make experimental measurements of the system. In many of these the only source of error will be observational errors; some of the experiments have a built-in pseudo-random behaviour. The above actual physical experiments do illustrate just how difficult and messy real-world experimentation, which occurs in engineering and the sciences, really is. Even if it is a messy business, it can be real fun and educational!

[Tables and plots: "subjective time" experiment (time, sec) and water-to-ice "expansion" experiment]
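The least-squares slope and intercept from the statistical formulae handout take only a few lines to compute, which makes them easy to check against a spreadsheet or calculator after experiments such as the Hooke's Law project. In the sketch below the function names `fit_line` and `residual_squared_error` are hypothetical, and the four sample points are made up purely to illustrate the arithmetic (they are not data from these notes). It uses the deviation form m = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)², b = ȳ − m x̄, and also reports the residual squared error that the fit minimizes.

```python
def fit_line(xs, ys):
    """Least-squares line y = m*x + b through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = sxy / sxx
    b = ybar - m * xbar
    return m, b

def residual_squared_error(xs, ys, m, b):
    """Sum of (y_i - m*x_i - b)^2, the quantity least squares minimizes."""
    return sum((y - m * x - b) ** 2 for x, y in zip(xs, ys))

if __name__ == "__main__":
    # Made-up illustrative points, e.g. weight (oz) vs. deflection (cm).
    xs = [1, 2, 3, 4]
    ys = [2, 4, 5, 8]
    m, b = fit_line(xs, ys)
    print(f"y = {m:.2f} x + {b:.2f}")
    print(f"residual squared error: {residual_squared_error(xs, ys, m, b):.3f}")
```

For these points the fit is y = 1.90 x + 0.00 with a residual squared error of 0.700; nudging m or b in either direction only increases that error. Dropping an obvious outlier from the lists and refitting is a quick way to see how sensitive the line is, which is the point made about outliers earlier.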