Stat 200 - WEEK 2 SUMMARIZING AND GRAPHING
Stat 200 - WEEK 2 SUMMARIZING AND GRAPHING
Popular in Course
verified elite notetaker
Popular in Department
This 21 page Study Guide was uploaded by kimwood Notetaker on Friday November 13, 2015. The Study Guide belongs to a course at a university taught by a professor in Fall. Since its upload, it has received 21 views.
Reviews for Stat 200 - WEEK 2 SUMMARIZING AND GRAPHING
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/13/15
WEEK 2: SUMMARIZING AND GRAPHING DATA Complete this guide on your computer as you read. CHAPTER 2.1 and 2.2 In Chapter 2, we are going to look at sets of data and explore ways of analyzing that data so that conclusions can be made from that data. Now, it is possible to have a data set that has many as 100 data points or even 1000’s of data points. We need to summarize this data in a way that makes the results more obvious and understandable. One way to summarize the data is by using a frequency table. If you look at Table 21 (Academy Awards) there is a lot of data. Granted it is organized (beginning with the first awards ceremony) BUT the results are not obvious at a glance. Now look at Table 22. Here the results are more direct. We can immediately see that most of the award winners were between the ages of 31 and 40. This is a simple FREQUENCY table. How many actresses were 4150 years old when they won? 12 How many actresses were 5160 years old when they won? 2 How many actresses were 4160 years old when they won? 14 Now look at Table 23. This is the same information, but now expressed as a percentage. Relatively speaking, this table gives us an even better idea of the data. Now we know 39% of the actresses were between the ages of 31 and 40 when they received their awards. This is a RELATIVE FREQUENCY (%) table. Magazine publishers love relative frequency tables because they can present information without having to list any confusing (or maybe compromising?) data sets. By the same token, the reader has to trust that the publisher has data sets that are accurate and representative of their claims. What % of actresses were 4150 years old when they won? 16% What % of actresses were 5160 years old when they won? 3% What % of actresses were 4160 years old when they won? 19% Now look at Table 28. COMPARATIVE RELATIVE FREQUENCY tables are even more fun to use. We can fairly readily see that young actresses are more likely to win an award but actors can expect that they will have to prove themselves for a longer period of time before winning any award. Assuming the data is correct, then we might be able to conclude that males and females are not necessarily judged by the same sets of rules by the Oscar Awards Committees. A comparative relative frequency table allows us to critically evaluate the data. Any gap in the data larger than 5% is considered significant. What % of actresses were 4150 years old when they won? 16% 1 What % of actors were 4150 years old when they won? 39% What is the % discrepancy between the two groups? 15% Now you can go ahead and read pages 5053 and then complete the matching question below with respect to Table 22. Lower class limit 21, 31, 41, 51, 61, 71 Upper class limit 30, 40, 50, 60, 70, 80 Class boundaries 30.5, 40.5, 50.5, 60.5. 70.5 (separates the classes) Class midpoints 25.5, 35.5, 45.5, 55.5, 65.5, 75.5 (middle of each class) Class width (maximum#minimum#)/# classes = (8021)/ 6 = 10 If you were able to select the correct matches with no problem, great, if the concept is still a little confusing, then try reading the definitions and examples below. Class boundaries: For some reason this concept usually takes a little longer to absorb. Use the following information to complete the table below. Take the lower class limit of one class (let's say 51), and take the upper class limit of the previous class (50) and subtract the smaller number from the larger. So, lower class limit – previous upper class limit = (51 – 50) = 1. Then, take that difference between the classes (1) and divide by 2. So, the mean difference is 1/2 = 0.5. Now, go back to the first class (60 – 51). Subtract 0.5 from the lower class limit (510.5 = 50.5) and add .5 to the upper class limit (60 + 0.5 = 60.5). So, we get 50.5 – 60.5 as the class boundaries around the actual class (5160). Proceed in the same way to get the class boundaries for each class. HINT: The upper class limit of one class boundary is always the SAME as the lower class limit of the next class boundary. To see this as a diagram, see Figure 21. Lower boundary limit Class Upper boundary limit 30.5 3140 40.5 40.5 4150 50.5 50.5 5160 60.5 60.5 6170 70.5 Class Midpoint: Add the lower class limit (51) and the upper class limit (60) and then divide the sum by 2. So, (51+60)/2 = 101/2 = 55.5. The class mark can also be interpreted as the “midpoint mark” or the “Class Mark”. Class Width: Besides using the formula provided [(maximum#minimum)/# classes], the class width can be described as the lower class limit of one class minus lower class limit of the previous class. (ie (5141) = 10). This is usually easier to remember. 2 NOTE: The class boundaries in conjunction with the class midpoints are used to make a histogram (picture) of frequency and relative frequency tables. See Figures 22 and 23 for examples. These histograms make the data even more obvious. Our goal in this Chapter is to be able to create lovely histograms. SAVE YOUR WORK PERIODICALLY AND TAKE A WELL EARNED BREAK! NORMAL DISTRIBUTIONS In Science and in Math, a NORMAL distribution is usually expected to be a fairly symmetrical BELL CURVE. Read pages 5456. A normal distribution implies a simple data set with no complicating factors. An abnormal distribution implies something has skewed the expected data. Our job is often to determine what that “something” is. EXCEL Now follow the instructions on pages 56 58 to practice how to use EXCEL to construct a frequency distribution and a histogram. Note the following changes to the instructions: 1. The instructions assume that you will be using the data provided for Actress Ages in Table 21. You do not have to use these to test the system, but if you do, you should produce tables identical to the examples given on pages 57 and 58. 5b. If you want to create a Bin Range, these values must appear on your worksheet prior to Data Analysis. Note that you will be using the UPPER CLASS LIMITS only in EXCEL. The instructions have you placing the Bin Range in column B but you could just as well place it in any column and just change the site accordingly. For example, if you placed the values in column C, then you would enter the range C1:C6 in stead of B1:B6 as given in the example. Practice using EXCEL to create a frequency table. 22 Basic Skills and Concepts Before you proceed with the reading, you might want to practice what you have learned. You can place the answers to the following selected questions below. The fillin tables have been constructed for you here and in TH to give you more time for practice work. At the end of WEEK 2, the TAKE HOME assignment (TH2) will be posted. TH2 assignment will consist of a selected number of the practice problems assigned here. Remember to save this CLASS NOTES file periodically as you are working (into your STATS folder) and once again before you close it. You can HIGHLIGHT then COPY and PASTE the requested components of your completed work directly into the TAKE HOME assignments. All future CLASS NOTES and TH assignments will be set up in this manner to save a little time and to have you practice the concepts you have just read 3 about immediately. Please ask questions as you work on the materials so that I can give you timely responses and information. Remember that the odd numbered questions are answered in the back of your text and in your Student’s Solution Manual as examples. CHAPTER 22 222 223 225 Note: Please use a calculator and refer to the formulas above (p. 2) if needed and show the formula you chose to use for class width below. Some data has been PARTIALLY entered for you What is the class width formula? Classes: Daily Class Class Frequency EXCEL EXCEL Temperatures Boundaries Midpoint BIN Format Frequency in ºF. 3539 1 4044 3 4549 44.549.5 47 5 49 5 5054 11 5559 7 6064 7 6569 1 226 Note: You will use the same formulas for decimals as whole numbers, just be careful to enter each number accurately. What is the class width formula? Classes: Daily Class Boundaries Class Frequency EXCEL EXCEL Precipitation Midpoint BIN Format Frequency in inches 0.000.49 31 0.500.99 1 1.001.49 0 1.501.99 1.4951.995 1.745 2 1.99 2 2.002.49 0 2.502.99 1 227 Note: You are now working backwards from the EXCEL FORMAT to create the frequency table. Check the Student Solutions Manual for the example to start the chart and then complete it on your own. Check the manual again to see if your calculations were correct. Some sections are finished for you but others are only partially done so be sure to complete them. 4 What is the class width formula? Classes: Class Boundaries Class Frequency EXCEL EXCEL Height inches Midpoint BIN Format Frequency 60.064.9 59.9564.95 62.45 4 64.9 4 65.069.9 64..9569.95 67.45 25 69.9 25 74.9 74.95 74.9 9 79.9 79.95 79.9 1 84.9 84.95 84.9 0 89.9 89.95 89.9 0 94.9 94.95 94.9 0 99.9 99.95 99.9 0 104.9 104.95 104.9 0 109.9 109.95 109.9 1 228 Use what you have learned in the previous three examples to complete this chart. What is the class width formula? Classes: Plastic Class Class Frequency EXCEL EXCEL Weigh in lbs Boundaries Midpoint BIN Format Frequency 0.99 8 1.99 12 2.99 6 3.99 0 4.99 0 5.99 0 6.99 0 7.99 5 8.99 15 9.009.99 8.9959.995 9.495 20 9.99 20 Whenever possible, do not just answer a question with a YES, NO or MAYBE, also explain WHY Questions Answer Why? 229 2210 A B 2211 A B C 5 2212 A B NOW IS A GOOD TIME FOR A COFFEE BREAK ! RELATIVE FREQUENCIES Relative frequency for each class can be found by dividing its frequency by the sum of the frequencies. If you find the sum of all the relative frequencies, it should be close to 100.0% if you have used enough decimal points. If you round the numbers up however, occasionally the sum of the relative frequencies will be a tiny bit less than or more than 100.0%. (ie 99.97% or 100.01%). 2213 Temp. ºF Frequency Relative Frequency 3539 1 4044 3 4549 5 5054 11 5559 7 6064 7 6569 1 Sum = Sum = 2214 Rain “ Frequency Relative Frequency 0.000.49 31 0.500.99 1 1.001.49 0 1.501.99 2 2.002.49 0 2.502.99 1 Sum = Sum = Now, let us put what you have learned to practical use. We will look at a very popular chemical used in today’s society. Nicotine. You will use your EXCEL program to manipulate the data and then transfer the results to the tables below. For question 2220, the Data set on STATDISK is called CIGARET. Select only the Nicotine data for this exercise and copy and paste it into your new EXCEL file. You will 6 next create your 8 classes and enter the upper limit of each class into column B of your EXCEL worksheet before you can used the DATA ANALYSIS function under TOOLS. To create the classes requested you must: 1 Determine how many classes you want to have. 8 Usually you want to keep this number below 10 so the data is easy to work but you can have more. 2 Find the lowest value in the data set 3 If the lowest value is close to zero, assume it to be zero 0 or select a lower value that is convenient to use based upon the number of classes you wish to have. 4 Find the highest value in the set 5 If the highest value is not convenient to use, you can choose a slightly higher number based upon how many classes you want to have and their width. 6 Determine the class width. (Given in the question) The 0.2 width must be in useable, logical even increments. In this case, we know the class width must increase by increments of 0.2 since the directions are explicit. If our first, lowest class limit is zero, then our next lowest class limit must be 0.2 and the third lowest class limit must be 0.4. These values have been entered for you in the table below. Finish the lower class limits until all 8 are present. Next you must determine the upper class limit for each of the 8 classes….remember that the classes cannot overlap. So, you cannot for example have 0.00.2 for the first class since 0.2 is already used in Class Two. The upper limit of Class One must be greater than 0.0 and less than 0.2, plus it must have the same number of decimal places as the lower class limit. You could use 0.00.1 or you could also use 0.000.99 or 0.0000.999 depending upon the data. Which do you think would be the most logical (simplest) to use in this case? Place all your selected upper class limits into the table below. Review your classes to see if they are logical and then copy the only UPPER CLASS LIMITS (UCLs) into your EXCEL file in column B. Question 2220 Nicotine Classes Frequency Relative Frequency 0.0 0.2 0.4 Sum = Sum = 7 After transferring the UCLs, use your EXCEL program and the TOOLS, DATA ANALYSIS, HISTOGRAM functions to determine frequency. Once you have the frequencies, just copy and paste them into the table above. Find the sum of the frequencies and then calculate the relative frequencies (%) for each class using your calculator. Describe the nature of the distribution of nicotine classes Is the frequency distribution a NORMAL distribution? Yes or No and WHY? If nicotine concentrations of 1.4 mg or higher can cause lung cancer, what % of the selected population is at risk? For further practice, try using your EXCEL program and do the odd numbered questions, using your Student’s Solutions Manual as a guide. 2.3 HISTOGRAMS A histogram is a picture of a frequency or relative frequency table where the xaxis usually has the class boundaries (or class midpoints) and the yaxis uses the frequencies. The histograms for the Ages of the Best Actresses are present in Figures 22 and 23 on page 62. Note how the class boundaries make the range distribution continuous. One can determine the distribution of the data by looking at the histogram. For example, complete the following frequency table by calculating the Relative Frequency %. Grade of Students Number of Students A 4 B 9 C 10 D 3 F 1 To fill a cell with color, select the cells and then select the paint bucket on your tool bar to fill the cells with any color you choose. The first one is done for you. If you do not see a paint bucket, select VIEW, TOOLBARS, and then check TABLES AND BORDERS. c 10 e 9 e 8 F 7 8 6 5 4 3 2 1 0 A B C D F Grades Note that the shape of the histogram resembles a bell shape so it is a normal distribution. If a frequency table of grades has a skewed shape (See Figure 22), then the entire class may be very smart, or the instructor is just being too nice . A relative frequency histogram will appear to have the same distribution as a frequency histogram. The only difference is that the y axis scale will be marked with a % distribution instead of actual frequencies. Read pages 6366 and follow the directions to create histograms using EXCEL. Instead of using the Actresses ages, you can copy and paste the nicotine data from the EXCEL worksheet that you used for question 2220. You can copy and paste your completed Histogram below. Be sure to adjust the size of the histogram to fit the page comfortably. NOTE THE FOLLOWING CHANGES TO INSTRUCTION 8: 1. Some programs do NOT have the TEXT BOX icon. In this case, you can right click on the graph and select CHART OPTIONS to modify your chart titles, axis, legend and grid lines. 2. Use the Class midpoints (see the sample histogram below for the midpoints) instead of the Class boundary values. You can enter these into the frequency table you generated when creating the Histogram. The BIN and FREQUENCY table should magically appear to the left of your Histogram. When you change the table, the Histogram also changes. Your histogram should be similar to the one shown below except the background will be white instead of gray. 3. When you have completed your sample histogram, copy and paste it below. Sample Histogram: 9 PASTE YOUR NICOTINE HISTOGRAM BELOW: NOTE: You can eliminate lines below your pasted graphs by clicking below the graphs and BACKSPACING. You can also change the size of your graphs by clicking on the graph and on the corner boxes and dragging the graph handles to the size you want. 23 Basic Skills and Concepts Hints Answer 235 Add the column frequency values 236 Find class width. Determine the Min range for each class for the midpointMax given. 237 A coxswain directs the rowers only. The rowers do the actual work. 238 Subtract two consecutive midpoints Create a Histogram for Question 2315 (Weights of Pennies) using Data set 14 in Appendix B (EXCEL Data set: COINS). Paste your Histogram below this line. HINT: Choose 5 classes for your frequency with the Upper Class Limits and midpoints provided below when creating your graph. UCL Midpoints 3.00 2.975 3.05 3.025 3.10 3.075 10 3.15 3.125 3.20 3.175 PLACE YOUR HISTOGRAM FOR 2315 HERE: You have created the histograms for Question 2312 (Nicotine) and Question 2315 (Weights of Pennies) immediately above. Now answer the questions pertaining to them below. Answer Yes or No and explain Why? 2312 2315 TIME FOR A REALLY NICE PIECE OF CHOCOLATE CAKE AND TEA! (OR PERHAPS CHIPS AND PEPSI??) 24 STATISICAL GRAPHICS This section introduces various types of graphing techniques including: Frequency Polygons, Ogives, Dotplots, Stemplots, Pareto Charts, Pie Charts, Scatter diagrams and Timeseries graphs. As you read through this section, follow the EXCEL practices. You will not remember all the instructions the first time around, but remember that if you are in asked in the future to create a certain type of graph, all you have to do is go to the INDEX of this text and find the page number that provides the detailed instructions OR you can use the HELP guides which are included in the EXCEL program. The goal of this exercise is to become familiar with the various graph types and to be able to not just create graphs, but be able to critically interpret them as well. FREQUENCY POLYGON Follow the directions on page 70 and create a Frequency Polygon. After step 7 you will want to create Titles for your graph. Rather than using EXCEL steps 8 and 9 on page 65, you can follow the instructions to the right of the sample polygon. Save your work. Sample Frequency Polygon: 11 Right click on the graph and choose CHART OPTIONS Select TITLES. Type in the titles for the Chart, X and Y axis and then select OK. Click the white background of the chart. COPY and PASTE your frequency polygon below The sample provided has a blue gradient background. Your chart will have a plain gray background, but if you wish you can change the color to any color, texture background you wish (other than the sample). To do this, click on the chart and select FORMAT PLOT AREA, and then select your desired colors. PASTE YOUR FREQUENCY POLYGON HERE RELATIVE FREQUENCY POLYGON Now you can create a relative frequency polygon with the Actress and Actors data sets. You will be doing this slightly differently. Transfer the relative frequency (%) data in the table below to your EXCEL file. Actresses Actors 0 0 37 4 39 33 16 39 3 18 3 4 3 1 0 0 12 1. Once the data is in EXCEL, then HIGHLIGHT the entire set. 2. Select INSERT, CHART, LINE to make the graph appear. 3. Select SERIES and adjust the X axis labels to read: 4. (space),25.5, 35.5,45.5,55.5,65.5,75.5,(space) and then click NEXT. 5. Give your graph appropriate TITLES: “Relative Frequency Polygon” for Chart Title; “%” for the Value (Y) axis and “Age in years” for the Category (X) axis. 6. Select GRID and remove the gridlines, but leave the Legend. 7. Click FINISH. 8. Once the graph appears in EXCEL, you can modify it. 9. Click on the Legend (Actresses/Actors) and drag it up 10. Click on the side of the gray graph and stretch the graph to the right 11. Click on the Legend again to position it in the top right corner of the graph 12. Click on the white area of the graph and COPY and then PASTE it below. PASTE YOUR RELATIVE FREQUENCY POLYGON HERE OGIVE (“OHJIVE”) Look at Figure 26 for now. Instructions for creating this graph come later in the course. Estimate how many of the Actresses were younger than 30.5 years of age. DOTPLOTS Look at the dotplot at the top of page 72. How many of the Actresses were older than 72 years of age? STEMPLOTS Read about Stemplots. This is a different way of looking at data. It is like a vertical histogram only the bar now actually consists of the single digit numbers (leaves) on the stem (class). Look at the stemplot at the bottom of page 72. 13 How many of the Actresses were 60 years of age or older? All three charts use the same data but present it in different formats. Which plot type did you like the best? STEMPLOTS To better understand a stemplot, you will now create your own. You will use heights of the eruption of Old Faithful geyser listed in Data Set 11 of Appendix B (EXCEL FILE: OLDFAITH.XLS) to construct a stemplot using the table below. When you are done you can check the answer to 249 in your Student’s Solution Manual or Appendix F of your text to see if you were correct. Remember that the STEM may consist of several digits, but the LEAVES are the last digit only of each number and must appear in chronological order. For example, if a number is 45, then the Stem is 40 and the leaf is 5. If you have the numbers 41, 44, 44, 45, then the Stem is 40 and the leaves are: 1,4,4,5. NOTE: If a number is 45089, then the stem is 45080 and the leaf is 9. 249 Now enter the heights of Old Faithful: Stem Leaves 246. Let us repeat the exercise using Data Set 8 (Actual High Temperatures) in Appendix B. Create your StemLeaf chart using the table below. Stem Leaves 14 If you rotate the computer screen counterclockwise 90 degrees then you will have your histogram!!! Isn't that cool! The main advantage of a stemandleaf plot over a histogram is that the data can be recreated from the stemandleaf plot. From the histogram, the data cannot be recreated. PARETO CHARTS These charts are nice to use for qualitative data where the one axis represents frequency or relative frequency (%) and the other axis uses words instead of numbers. Follow the directions on page 74 to create your own chart and then paste it below. You can be inventive with the data PASTE YOUR PARETO CHART HERE PIE CHARTS These charts are also fun to do. You can use the same data that you used for your Pareto Chart, or 15 you can use any other data that appeals to you. Follow the directions on page 75 to create your chart. NOTE: Add the following steps to the instructions: 2.5. Highlight the data set 8.5. Under Data Labels, select Category name PASTE YOUR PIE CHART HERE SCATTERPLOTS Scatterplots are often used to determine if there is a relationship between two variables. Usually a relationship exists between two sets of values if a curved or straight line can be drawn among the points. The straighter the line, the better the relationship. In the scatterplot on Page 76 there is definitely a relationship between temperature and chirps per minute since one could draw a diagonal line from the bottom left corner of the graph to the top right corner and see that most of the points fall on either side of the line. If you draw the same line on the graph on page 77 you will see that most of the points fall at one end of the line or the other, so in effect there are no points in the middle region. In this case there is some “other factor” involved than just weight. What do you think the “other factor” is? 16 Now you will create your own Scatterplots following the direction on page 75. Use the data sets in questions 2417 and 2418 to create your scatterplots. NOTES FOR SCATTERPLOTS: 1. If you highlight the two columns of data before you click INSERT, the program will automatically enter the data into the graph so you will not have to manually insert your columns (A2:A30 or B2:B30) 2. If you select CHART OPTIONS, you can enter the titles and other features. 3. If you click on the graph you can also select FORMAT CHART AREA to alter color and style. PASTE YOUR SCATTERPLOT FOR QUESTION 2417 HERE 2417 Is there a relationship between TAR and CO? Give a reason: QUESTION 2418 In EXCEL you can cut and paste the data columns where you wish. For a scattergram, the xaxis (horizontal axis) is always the left column and the yaxis (vertical axis) is always the right column. Copy data set 9 into your new EXCEL file and rearrange the data columns so that the temperature column is to the left of the kWh column. The simplest way to do this is to copy and paste the information from the original data set to a new location in your file. 17 Once this is done, if you highlight both columns and then go to INSERT, CHART, then the program will automatically incorporate the data selected. Then all you have to do is modify your chart using CHART OPTIONS. NICE TOUCHS YOU CAN ADD: You can modify the SCALE of your graph to better fit the data by: RIGHT clicking on the x axis and then selecting FORMAT AXIS, and SCALE and then setting the MINIMUM value for the xaxis to 25. Repeating the process you can select the minimum value for the yaxis as 2000. PASTE YOUR SCATTERPLOT FOR QUESTION 2418 HERE 2418 Is there a relationship between kWh and daily temperature? Give a reason: TIMESERIES GRAPHS Timeseries graphs are extremely popular with companies which monitor sales over several years. If you look at the graph on page 78 you can see that the popularity of drive in movies has declined abruptly and then stabilized. 18 What year was data first collected? How many driveins were present in 1968? How many driveins were present in 1997? Use the directions on page 78 to create your own timeseries graphs. Use the data sets provided in questions 2419 and 2420 for your graphs. Your results should look somewhat like the sample graphs shown. Note that I have left the graph lines in both graphs. Your instructions however ask you to remove the graph lines. Please do so. Remember to save your work. PASTE YOUR TIMESERIES GRAPH FOR QUESTION 2419 HERE 19 Graph 2419: Runway NearHits Is there a trend? If so, what is it? PASTE YOUR TIMESERIES GRAPH FOR QUESTION 2420 HERE Graph 2420: Indoor Movie Theatres Is there a trend? If so, what is it? How does this trend compare to the Drivein movie theatres (p. 78)? Read pages 7982. If you are interested in purchasing vehicles, stereos or any other expensive consumer goods then you probably have seen charts similar to Figure 29 (p. 79). There is a lot of data contained in it but it is very easy to use. 20 If you like medicine, then look carefully at Florence Nightingale’s chart in Figure 210 and if you like history, then look at the chart on page 81. Before you close this file (and your book), save all the work you have done in your STATS file on your computer. 21
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'