Chapter 2 Powerpoint Notes
Chapter 2 Powerpoint Notes KH 3550
This 101 page Class Notes was uploaded by Apollo12 on Tuesday February 2, 2016. The Class Notes belongs to KH 3550 at Georgia State University taught by Brandenberger in Spring 2014. Since its upload, it has received 36 views. For similar materials see Evaluation and Instrumentation in Exercise Science in PHIL-Philosophy at Georgia State University.
Date Created: 02/02/16
Chapter 2
Describing Data: Frequency Distributions and Graphic Presentation

GOALS
When you have completed this chapter, you will be able to:
• Organize raw data into a frequency distribution
• Produce a histogram, a frequency polygon, and a cumulative frequency polygon from quantitative data
• Develop and interpret a stem-and-leaf display
• Present qualitative data using such graphical techniques as a clustered bar chart, a stacked bar chart, and a pie chart
• Detect graphic deceptions and use a graph to present data with clarity, precision, and efficiency.

Florence Nightingale (1820–1910)
The tendency to graphically represent information seems to be one of the basic human instincts. As such, identification of the oldest such representation is an elusive task, the earliest known being the map of Konya, Turkey, dated […] B.C. The earliest known bar chart is the one by Bishop N. Oresme (1350).

Most of the modern forms of statistical graphic techniques were invented between 1780 and 1940. In 1786, William Playfair used time-series graphs to depict the amounts of imports and exports to and from England, and in 1801, he published a pie chart to show graphically that the British paid more tax than other countries. The first stacked bar chart, cumulative frequency polygon, and histogram were published, respectively, by A. Humboldt (1811), J.B.J. Fourier (1821), and A.M. Guerry (1833). The same period saw the development of non-trivial applications of these techniques to real-world problems. One of the most significant contributors in this regard was the "lady with a lamp," Florence Nightingale.

Florence Nightingale was born in Florence, Italy in 1820, but was raised mostly in Derbyshire, England. In spite of resistance from society and her mother, her father educated her in Greek, Latin, French, German, Italian, history, philosophy, and, her favourite subject, mathematics.
When she was 17 years old, Florence had a spiritual experience. She felt herself called by God to His service. From that time on, she resolved to dedicate her life to some social cause. She refused to marry several suitors and, at the age of 25, stunned her parents by informing them that she had decided to be a nurse, a profession considered low class at that time.

During the 1854 British war in Crimea, stirred by the reports of primitive sanitation methods at the British barracks' hospital, she volunteered her services and set out for Scutari, Turkey with a group of 38 nurses. Here, mainly by improving the sanitary conditions and nursing methods, she managed to bring down the mortality rate at the hospital from 42.7 percent to about 2 percent.

On her return to England after the war as a national hero, she dedicated herself to the task of improving the sanitation and quality of nursing in military hospitals. In this, she encountered strong opposition from the establishment. But with the support of Queen Victoria and, more importantly, with shrewd use of graphic methods (such as stacked bar charts and a new type of polar bar chart that she developed on her own), she succeeded in bringing forth reforms. She was one of the first to use graphical methods in a prescriptive, rather than merely a descriptive, way to bring about social reform.

Over the subsequent 20 years, she applied statistical methods to civilian hospitals, midwifery, Indian public health, and colonial schools. She briefly served as an adviser to the British war office on medical care in Canada. Her mathematical activities included determining "the average speed of transport by sledge," and "the time to transport the sick over immense distances in Canada." With her statistical analysis, she promoted the revolutionary idea that social phenomena could be objectively measured and subjected to mathematical analysis.
Karl Pearson acknowledged her as a "prophetess" in the development of applied statistics.

Nightingale held strong opinions on women's rights, and fought for the removal of restrictions that prevented women from having careers. In 1907 she became the first woman to receive the Order of Merit, an order established by King Edward VII for meritorious service.

Introduction
Rob Whitner is the owner of Whitner Pontiac. Rob's father founded the dealership in 1964, and for more than 30 years they sold exclusively Pontiacs. In the early 1990s Rob's father's health began to fail, and Rob took over more of the dealership's day-to-day operations. At the same time, the automobile business began to change (dealers began to sell vehicles from several manufacturers), and Rob was faced with some major decisions. The first came when another local dealer, who handled Volvos, Saabs, and Volkswagens, approached Rob about purchasing his dealership. After considerable thought and analysis, Rob made the purchase. More recently, the local Jeep Eagle dealership got into difficulty and Rob bought it out. So now, on the same lot, Rob sells the complete line of Pontiacs; the expensive Volvos and Saabs; Volkswagens; and Chrysler products including the popular Jeep line.

Whitner Pontiac employs 83 people, including 23 full-time salespeople. Because of the diverse product line, there is quite a bit of variation in the selling price of the vehicles. A top-of-the-line Volvo sells for more than twice the price of a Pontiac Grand Am. Rob would like to develop some charts and graphs that he could review monthly to see where the selling prices tend to cluster, to see the variation in the selling prices, and to note any trends. In this chapter we present techniques that will be useful to Rob or someone like him in managing his business.
2.1 Constructing a Frequency Distribution of Quantitative Data
Recall from Chapter 1 that we refer to techniques used to describe a set of data as descriptive statistics. To put it another way, we use descriptive statistics to organize data in various ways to point out where the data values tend to concentrate and to help distinguish the largest and the smallest values. The first method we use to describe a set of data is a frequency distribution. Here our goal is to summarize the data in a table that reveals the shape of the data.

Frequency distribution: A grouping of data into non-overlapping classes (mutually exclusive classes or categories) showing the number of observations in each class. The range of classes includes all values in the data set (collectively exhaustive categories).

How do we develop a frequency distribution? The first step is to tally the data into a table that shows the classes and the number of observations in each class. The steps in constructing a frequency distribution are best described using an example. Remember that our goal is to make a table that will quickly reveal the shape of the data.

Example 2-1
In the introduction to this chapter, we described a case where Rob Whitner, owner of Whitner Pontiac, is interested in collecting information on the selling prices of vehicles sold at his dealership. What is the typical selling price? What is the highest selling price? What is the lowest selling price? Around what value do the selling prices tend to cluster? To answer these questions, we need to collect data. According to sales records, Whitner Pontiac sold 80 vehicles last month. The price paid by the customer for each vehicle is shown in Table 2-1. Summarize the selling prices of the vehicles sold last month. Around what value do the selling prices tend to cluster?
TABLE 2-1: Selling Prices ($) at Whitner Pontiac Last Month
31373  26879  31710  36442  37657  21969  23132
39552  42923  25544  31060  50596  25026  26252
32778  32839  33277  39532  19320  19920  25984
34266  38552  33160  37642  26009  26186  22109
26418  34306  25699  31812  36364  27558  26492
31978  35085  36438  45086  27169  29231  32420
35110  19702  23505  50719  22175  23050  26728
28400  28831  25149  30518  25819  27154  27661
30561  35859  38339  40157  45417  24470  28859
29836  33219  34571  39018  27168  31744  32678
42588  29940  22932  27439  35784  26865  28576
28704  32795  31103

Solution
Table 2-1 contains quantitative data (recall from Chapter 1). These data are raw or ungrouped data. With a little searching, we can find the lowest selling price ($19320) and the highest selling price ($50719), but that is about all. It is difficult to get a feel for the shape of the data by mere observation of the raw data. The raw data are more easily interpreted if they are organized into a frequency distribution. The steps for organizing data into a frequency distribution are outlined below.

1. Decide how many classes you wish to use. The goal is to use just enough groupings or classes to reveal the shape of the distribution. Some judgment is needed here. Too many classes or too few classes might not reveal the basic shape of the set of data. In the vehicle selling price problem, for example, three classes would not give much insight into the pattern of the data (see Table 2-2).

TABLE 2-2: An Example of Too Few Classes
Vehicle Selling Price ($)    Number of Vehicles
19000 up to 32900            53
32900 up to 46800            25
46800 up to 60700             2
Total                        80

A useful recipe to determine the number of classes is the "2 to the k rule." This guide suggests you select the smallest number (k) for the number of classes such that 2^k (in words, 2 raised to the power of k) is greater than the number of data points (n). In the Whitner Pontiac example, there were 80 vehicles sold. So n = 80.
If we try k = 6, which means we would use 6 classes, then 2^6 = 64, somewhat less than 80. Hence, 6 classes are not enough. If we let k = 7, then 2^7 = 128, which is greater than 80. So the recommended number of classes is 7.

2. Determine the class width. Generally, the class width should be the same for all classes. At the end of this section, we shall briefly discuss some situations where unequal class widths may be necessary. All classes taken together must cover at least the distance from the lowest value in the raw data up to the highest value. Expressing these words in a formula:

Class width > (H - L)/k

where H is the highest observed value, L is the lowest observed value, and k is the number of classes.

In the Whitner Pontiac case, the lowest value is $19320 and the highest value is $50719. If we wish to use 7 classes, the class width should be greater than ($50719 - $19320)/7 = $4485.571. In practice, this class width is usually rounded up to some convenient number, such as a multiple of 10 or 100. We round this value up to $4490.

3. Set up the individual class limits. We should state class limits very clearly so that each observation falls into only one class. For example, classes such as $19000–$20000 and $20000–$21000 should be avoided because it is not clear whether $20000 is in the first or second class. In this text, we will use the format $19000 up to $20000 and $20000 up to $21000 and so on. With this format it is clear that $19999 goes into the first class and $20000 into the second.

Because we round the class width up to get a convenient class width, we cover a larger than necessary range. For example, seven classes of width $4490 in the Whitner Pontiac case result in a range of ($4490)(7) = $31430. The actual range is $31399, found by H - L = 50719 - 19320. Comparing this value to $31430, we have an excess of $31. It is natural to put approximately equal amounts of the excess in each of the two tails. As we have said before, we should also select convenient multiples of 10 for the class limits. We shall use $19310 as the lower limit of the first class. The upper limit of the first class is then $23800, found by 19310 + 4490 = 23800. Hence, our first class is from $19310 up to $23800. We can determine the other classes (in dollars) similarly: from $23800 up to $28290, from $28290 up to $32780, from $32780 up to $37270, from $37270 up to $41760, from $41760 up to $46250, and from $46250 up to $50740.

4. Tally the selling prices into the classes. To begin, the selling price of the first vehicle in Table 2-1 is $31373. It is tallied in the $28290 up to $32780 class. The second selling price in the first column is $39552. It is tallied in the $37270 up to $41760 class. The other selling prices are tallied in a similar manner. When all the selling prices are tallied, we get Table 2-3(a).

TABLE 2-3: Construction of a Frequency Distribution of Whitner Pontiac Data

(a) Tally Count
Classes ($)           Tally
19310 up to 23800     ||||| |||||
23800 up to 28290     ||||| ||||| ||||| ||||| |
28290 up to 32780     ||||| ||||| ||||| |||||
32780 up to 37270     ||||| ||||| |||||
37270 up to 41760     ||||| |||
41760 up to 46250     ||||
46250 up to 50740     ||

(b) Frequency Distribution
Selling Prices ($ thousands)    Frequency
19.310 up to 23.800             10
23.800 up to 28.290             21
28.290 up to 32.780             20
32.780 up to 37.270             15
37.270 up to 41.760              8
41.760 up to 46.250              4
46.250 up to 50.740              2
Total                           80

5. Count the number of items in each class. The number of observations in each class is called the class frequency. In the $19310 up to $23800 class, there are 10 observations; in the $23800 up to $28290 class there are 21 observations. Therefore, the class frequency in the first class is 10 and the class frequency in the second class is 21. The sum of frequencies of all the classes equals the total number of observations in the entire data set, which is 80.

Statistics in Action: Forestry and the Canadian Economy
Why is Canadian softwood an important commodity? To find the answer, let us look at some statistics from the Statistics Canada Web site (www.statcan.ca).
• Logging and forestry employed 68000 workers, second only to mining in primary industries, in the year 2000
• Canada exported $41380.8 million of forestry products on a balance of payments basis in the year 2000
• Quebec occupies the most forestland (839000 km²)
• PEI covers the least forestland (3000 km²)
• Canada has […]75800 km² of forestland
All the numeric data above are statistics, and allow us to see why logging and forestry is important to the Canadian economy.

Often it is useful to express the data in thousands, or some convenient units, rather than the actual data. Table 2-3(b) reports the frequency distribution for Whitner Pontiac's vehicle selling prices where prices are given in thousands of dollars rather than dollars.

Now that we have organized the data into a frequency distribution, we can summarize the patterns in the selling prices of the vehicles for Rob Whitner. These observations are listed below:
1. The selling prices ranged from about $19310 to $50740.
2. The largest concentration of selling prices is in the $23800 up to $28290 class.
3. The selling prices are concentrated between $23800 and $37270. A total of 56 (70 percent) of the vehicles sold within this range.
4. Two of the vehicles sold for $46250 or more, and 10 sold for less than $23800.
By presenting this information to Rob Whitner, we give him a clearer picture of the distribution of the selling prices for the last month.
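For readers who prefer to check the tally by computer, the five construction steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the textbook's or MegaStat's procedure: the function names number_of_classes and frequency_distribution are our own, the data are typed in from Table 2-1, and the lower limit of $19310 is the choice made in the text rather than something the code derives.

```python
import math

# Selling prices ($) from Table 2-1 (80 vehicles).
prices = [
    31373, 26879, 31710, 36442, 37657, 21969, 23132, 39552, 42923, 25544,
    31060, 50596, 25026, 26252, 32778, 32839, 33277, 39532, 19320, 19920,
    25984, 34266, 38552, 33160, 37642, 26009, 26186, 22109, 26418, 34306,
    25699, 31812, 36364, 27558, 26492, 31978, 35085, 36438, 45086, 27169,
    29231, 32420, 35110, 19702, 23505, 50719, 22175, 23050, 26728, 28400,
    28831, 25149, 30518, 25819, 27154, 27661, 30561, 35859, 38339, 40157,
    45417, 24470, 28859, 29836, 33219, 34571, 39018, 27168, 31744, 32678,
    42588, 29940, 22932, 27439, 35784, 26865, 28576, 28704, 32795, 31103,
]


def number_of_classes(n):
    """Step 1: smallest k such that 2**k > n (the '2 to the k rule')."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def frequency_distribution(data, lower, width, k):
    """Steps 4-5: count values in k classes of the form 'a up to a + width'."""
    counts = [0] * k
    for x in data:
        i = (x - lower) // width  # index of the class that contains x
        if not 0 <= i < k:
            raise ValueError(f"{x} falls outside the classes")
        counts[i] += 1
    return counts


k = number_of_classes(len(prices))               # step 1
min_width = (max(prices) - min(prices)) / k      # step 2: (H - L)/k
width = math.ceil(min_width / 10) * 10           # round up to a multiple of 10
lower = 19310                                    # step 3: limit chosen in the text
counts = frequency_distribution(prices, lower, width, k)

for i, c in enumerate(counts):
    lo = lower + i * width
    print(f"{lo} up to {lo + width}: {c}")
```

Using integer division to find the class index mirrors the "up to" convention: a price equal to a class's upper limit falls into the next class.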
We admit that arranging the information on the selling prices into a frequency distribution does result in the loss of some detailed information. That is, by organizing the data into a frequency distribution, we cannot pinpoint the exact selling price (such as $23820 or $32800), and we cannot tell that the actual selling price of the least expensive vehicle was $19320 and of the most expensive vehicle was $50719. However, the lower limit of the first class and the upper limit of the largest class convey essentially the same meaning. Whitner will make the same judgment if he knows the lowest price is about $19310 that he will make if he knows the exact selling price is $19320. The advantage of condensing the data into a more understandable form more than offsets this disadvantage.

SELF-REVIEW 2-1
The commissions earned for the first quarter of last year by the 11 members of the sales staff at Master Chemical Company are $1650, $1475, $1510, $1670, $1595, $1760, $1540, $1495, $1590, $1625, and $1510.
(a) What are the values such as $1650 and $1475 called?
(b) Using $1400 up to $1500 as the first class, $1500 up to $1600 as the second class, and so forth, organize data on commissions earned into a frequency distribution.
(c) What are the numbers in the right column of your frequency distribution called?
(d) Describe the distribution of commissions earned based on the frequency distribution. What is the largest amount of commission earned? What is the smallest?

CLASS INTERVALS AND CLASS MIDPOINTS
We will use two other terms frequently: class midpoint and class interval. The midpoint, also called the class mark, is halfway between the lower and upper class limits. It can be computed by adding the lower class limit to the upper class limit and dividing by 2. Referring to Table 2-3 for the first class, the lower class limit is $19310 and the upper limit is $23800. The class midpoint is $21555, found by ($19310 + $23800)/2.
The midpoint of $21555 best represents, or is typical of, the selling prices of the vehicles in that class.

To determine the class interval, subtract the lower limit of the class from its upper limit. The class interval of the vehicle selling price data is $4490, which we find by subtracting the lower limit of the first class, $19310, from its upper limit; that is, $23800 - $19310 = $4490. You can also determine the class interval by finding the distance between consecutive midpoints. The midpoint of the first class is $21555 and the midpoint of the second class is $26045. The difference is $4490.

A SOFTWARE EXAMPLE: FREQUENCY DISTRIBUTION USING MEGASTAT
Chart 2-2 shows the frequency distribution of the Whitner Pontiac data produced by MegaStat. The form of the output is somewhat different from the frequency distribution in Table 2-3(b), but the overall conclusions are the same.

SELF-REVIEW 2-2
The following table includes the grades of students who took Math 1021 during Fall 2002.
40  55  50  55  28  60  25  55
60  65  70  64  62  70  50  65
55  48  69  25  64  58  55  71
(a) How many classes would you use?
(b) How wide would you make the classes?
(c) Create a frequency distribution table.

RELATIVE FREQUENCY DISTRIBUTION
It may be desirable to convert class frequencies to relative class frequencies to show the fraction of the total number of observations in each class. In our vehicle sales example, we may want to know what percentage of the vehicle prices are in the $28290 up to $32780 class. To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations. Using the distribution of vehicle sales again (Table 2-3(b), where the selling prices are reported in thousands of dollars), the relative frequency for the $19310 up to $23800 class is 0.125, found by dividing 10 by 80. That is, the price of 12.5 percent of the vehicles sold at Whitner Pontiac is between $19310 and $23800.
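The midpoint and relative frequency calculations described above are simple enough to verify with a short Python sketch. It is offered only as an illustration: the variable names are our own, and the class limits and frequencies are copied from Table 2-3 rather than recomputed from the raw data.

```python
# Class limits and frequencies taken from Table 2-3 (prices in dollars).
lower = 19310    # lower limit of the first class
width = 4490     # class interval
frequencies = [10, 21, 20, 15, 8, 4, 2]

total = sum(frequencies)   # total number of observations
rows = []
for i, f in enumerate(frequencies):
    lo = lower + i * width         # lower class limit
    hi = lo + width                # upper class limit
    midpoint = (lo + hi) / 2       # class midpoint (class mark)
    relative = f / total           # relative class frequency
    rows.append((lo, hi, midpoint, relative))
    print(f"{lo} up to {hi}: midpoint {midpoint:.0f}, "
          f"frequency {f}, relative frequency {relative:.4f}")
```

The relative frequencies must add up to 1, which is a quick check that no class was skipped or counted twice.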
The relative frequencies for the remaining classes are shown in Table 2-4.

EXCEL CHART 2-2: Frequency Distribution of Data in Table 2-1

MICROSOFT EXCEL INSTRUCTIONS
1. Click on MegaStat, Frequency Distributions, Quantitative....
2. In the Input Range field, enter the data location.
3. Select Equal Width Interval, and input the interval size (= 4490 in our example).
4. Input the value of the lower boundary of the first interval (= 19310 in our example).
5. Deselect Histogram, and click OK.

TABLE 2-4: Relative Frequency Distribution of Selling Prices at Whitner Pontiac Last Month
Selling Price ($ thousands)    Frequency    Relative Frequency    Found by
19.310 up to 23.800            10           0.1250                10/80
23.800 up to 28.290            21           0.2625                21/80
28.290 up to 32.780            20           0.2500                20/80
32.780 up to 37.270            15           0.1875                15/80
37.270 up to 41.760             8           0.1000                 8/80
41.760 up to 46.250             4           0.0500                 4/80
46.250 up to 50.740             2           0.0250                 2/80
Total                          80           1.0000

SELF-REVIEW 2-3
Refer to Table 2-4, which shows the relative frequency distribution for the vehicles sold last month at Whitner Pontiac.
(a) How many vehicles sold for $23800 up to $28290?
(b) What percentage of the vehicles sold for a price from $23800 up to $28290?
(c) What percentage of the vehicles sold for $37270 or more?

EXERCISES 2-1 TO 2-8
2-1. A set of data consists of 38 observations. How many classes would you recommend for the frequency distribution?
2-2. A set of data consists of 45 observations. The lowest value is $0 and the highest value is $29. What size would you recommend for the class interval?
2-3. A set of data consists of 230 observations. The lowest value is $235 and the highest value is $567. What class interval would you recommend?
2-4. A set of data contains 53 observations. The lowest value is 42 and the highest is 129. The data are to be organized into a frequency distribution.
(a) How many classes would you suggest?
(b) What would you suggest as the lower limit of the first class?
2-5. The Wachesaw Outpatient Centre, designed for same-day minor surgery, opened last month. Below are the numbers of patients served during the first 16 days.
27  27  23  24  25  28  35  33
34  24  30  30  24  33  23  23
(a) How many classes would you recommend?
(b) What class interval would you suggest?
(c) What lower limit would you recommend for the first class?
2-6. The Quick-Change Oil Company has a number of outlets in Hamilton, Ontario. The numbers of oil changes at the Oak Street outlet in the past 20 days are listed below. The data are to be organized into a frequency distribution.
65  98  55  62  79  59  51  90  72  56
70  62  66  80  94  79  63  73  71  85
(a) How many classes would you recommend?
(b) What class interval would you suggest?
(c) What lower limit would you recommend for the first class?
(d) Organize the number of oil changes into a frequency distribution.
(e) Comment on the shape of the frequency distribution. Also determine the relative frequency distribution.
2-7. The local manager of Food Queen is interested in the number of times a customer shops at her store during a two-week period. The responses of 51 customers were:
5 3 314 4 5 6 4 26 6 6 7 1 1 124 4 4 5 6 35 3 4 5 6 8 4 765 9 13 47 6 5 1 1 8922 1
(a) Starting with 0 as the lower limit of the first class and using a class interval of 3, organize the data into a frequency distribution.
(b) Describe the distribution. Where do the data tend to cluster?
(c) Convert the distribution to a relative frequency distribution.
2-8. Moore Travel, a nationwide travel agency, offers special rates on certain Caribbean cruises to senior citizens. The president of Moore Travel wants additional information on the ages of those people taking cruises.
A random sample of 40 customers taking a cruise last year revealed these ages:
77  18  63  84  38  54  50  59  54  56
36  26  50  34  44  41  58  58  53  51
62  43  52  53  63  62  62  65  61  52
60  60  45  66  83  71  63  58  61  71
(a) Organize the data into a frequency distribution, using 7 classes and 15 as the lower limit of the first class. What class interval did you select?
(b) Where do the data tend to cluster?
(c) Describe the distribution.
(d) Determine the relative frequency distribution.

FREQUENCY DISTRIBUTION WITH UNEQUAL CLASS INTERVALS
In constructing frequency distributions of quantitative data, equal class widths are generally assigned to all classes. This is because unequal class intervals present problems in graphically portraying the distribution and in doing some of the computations, as we will see in later chapters. Unequal class intervals, however, may be necessary in certain situations to avoid a large number of empty, or almost empty, classes. Such is the case in Table 2-5. The Canada Customs and Revenue Agency (CCRA) used unequal-sized class intervals to report the adjusted gross income on individual tax returns. Had the CCRA used an equal-sized interval of, say, $1000, more than 1000 classes would have been required to describe all the incomes. A frequency distribution with 1000 classes would be difficult to interpret. In this case, the distribution is easier to understand in spite of the unequal classes. Note also that the number of income tax returns or "frequencies" is reported in thousands in this particular table. This also makes the information easier to digest.
TABLE 2-5: Adjusted Gross Income for Individuals Filing Income Tax Returns
Adjusted Gross Income ($)    Number of Returns (in thousands)
Under 2000                     135
2000 up to 3000               3399
3000 up to 5000               8175
5000 up to 10000             19740
10000 up to 15000            15539
15000 up to 25000            14944
25000 up to 50000             4451
50000 up to 100000             699
100000 up to 500000            162
500000 up to 1000000             3
1000000 and over                 1

2.2 Stem-and-Leaf Displays
In Section 2.1, we showed how to organize quantitative data into a frequency distribution so we could summarize the raw data into a meaningful form. The major advantage of organizing the data into a frequency distribution is that we get a quick visual picture of the shape of the distribution without doing any further calculation. That is, we can see where the data are concentrated and also determine whether there are any extremely large or small values. However, a frequency distribution has two disadvantages: (1) we lose the exact identity of each value, and (2) we are not sure how the values within each class are distributed. To explain, consider the following frequency distribution of the number of 30-second radio advertising spots purchased by the 45 members of the Toronto Automobile Dealers' Association in 2001. We observe that 7 of the 45 dealers purchased at least 90 but fewer than 100 spots. However, is the number of spots purchased within this class clustered near 90, spread evenly throughout the class, or clustered near 99? We cannot tell.

Number of Spots Purchased    Frequency
80 up to 90                   2
90 up to 100                  7
100 up to 110                 6
110 up to 120                 9
120 up to 130                 8
130 up to 140                 7
140 up to 150                 3
150 up to 160                 3
Total                        45

For a mid-sized data set, we can eliminate these shortcomings by using an alternative graphic display called the stem-and-leaf display. To illustrate the construction of a stem-and-leaf display using the advertising spots data, suppose the seven observations in the 90 up to 100 class are 96, 94, 93, 94, 95, 96, and 97.
Let us sort these values to get: 93, 94, 94, 95, 96, 96, 97. The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right. The values in the 90 up to 100 class would appear in the stem-and-leaf display as follows:

9 | 3445667

With the stem-and-leaf display, we can quickly observe that there were two dealers who purchased 94 spots and that the number of spots purchased ranged from 93 to 97. A stem-and-leaf display is similar to a frequency distribution with more information (i.e., data values instead of tallies).

Stem-and-leaf display: A statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit(s) become(s) the stem and the trailing digit(s) become(s) the leaf. The stems are located along the vertical axis and the leaf values are stacked against one another along the horizontal axis.

The following example will explain the details of developing a stem-and-leaf display.

Example 2-2
Table 2-6 lists the number of 30-second radio advertising spots purchased by each of the 45 members of the Toronto Automobile Dealers' Association last year. Organize the data into a stem-and-leaf display. Around what values do the number of advertising spots tend to cluster? What is the smallest number of spots purchased by a dealer and the largest number purchased?

Solution
From the data in Table 2-6 we note that the smallest number of spots purchased is 88. So we will make the first stem value 8. The largest number is 156, so we will have the stem values begin at 8 and continue to 15. The first number in Table 2-6 is 96, which will have a stem value of 9 and a leaf value of 6. Moving across the top row, the second value is 93 and the third is 88. After the first three data values are considered, the display is as follows:

Stem | Leaf
   8 | 8
   9 | 6 3
  10 |
  11 |
  12 |
  13 |
  14 |
  15 |

TABLE 2-6: Number of Advertising Spots Purchased during 2001 by Members of the Toronto Automobile Dealers' Association
 96   93   88  117  127   95  113   96  108
 94  148  156  139  142   94  107  125  155
155  103  112  127  117  120  112  135  132
111  125  104  106  139  134  119   97   89
118  136  125  143  120  103  113  124  138

Organizing all the data, the stem-and-leaf display would appear as shown in Chart 2-3(a). The usual procedure is to sort the leaf values from smallest to largest. The last line, the row referring to the values in the 150s, would appear as:

15 | 556

The final table would appear as shown in Chart 2-3(b), where we have sorted all of the leaf values.

CHART 2-3: Stem-and-Leaf Display
a.                      b.
Stem | Leaf             Stem | Leaf
   8 | 89                  8 | 89
   9 | 6356447             9 | 3445667
  10 | 873463             10 | 334678
  11 | 732721983          11 | 122337789
  12 | 75705504           12 | 00455577
  13 | 9529468            13 | 2456899
  14 | 823                14 | 238
  15 | 655                15 | 556

You can draw several conclusions from the stem-and-leaf display. First, the lowest number of spots purchased is 88 and the highest is 156. Two dealers purchased fewer than 90 spots, and three purchased 150 or more. You can observe, for example, that the three dealers who purchased more than 150 spots actually purchased 155, 155, and 156 spots. The concentration of the number of spots is between 110 and 139. There were nine dealers who purchased between 110 and 119 spots and eight who purchased between 120 and 129 spots. We can also tell that within the 120 up to 130 group, the actual number of spots purchased was spread evenly throughout. That is, two dealers purchased 120 spots, one dealer purchased 124 spots, three dealers purchased 125 spots, and two dealers purchased 127 spots.

We can also generate this information using Minitab. We have named the variable Spots. The Minitab output is given in Chart 2-4. The Minitab stem-and-leaf display provides some additional information regarding cumulative totals.
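The sorted display in Chart 2-3(b) can also be reproduced without a statistics package. The Python sketch below is a minimal illustration under our own assumptions: the function name stem_and_leaf and its stem_unit parameter are hypothetical, the data are typed in from Table 2-6, and the output omits the cumulative totals that Minitab adds.

```python
from collections import defaultdict

# Number of advertising spots purchased, from Table 2-6 (45 dealers).
spots = [
    96, 93, 88, 117, 127, 95, 113, 96, 108,
    94, 148, 156, 139, 142, 94, 107, 125, 155,
    155, 103, 112, 127, 117, 120, 112, 135, 132,
    111, 125, 104, 106, 139, 134, 119, 97, 89,
    118, 136, 125, 143, 120, 103, 113, 124, 138,
]


def stem_and_leaf(values, stem_unit=10):
    """Return the lines of a sorted stem-and-leaf display.

    Each value is split into a stem (value // stem_unit) and a leaf
    (value % stem_unit). Stems with no leaves are still listed.
    """
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):            # sorting first keeps leaves in order
        stem, leaf = divmod(v, stem_unit)
        leaves_by_stem[stem].append(leaf)

    lines = []
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


display = stem_and_leaf(spots)
print("\n".join(display))
```

Sorting the values before splitting them is what turns the unsorted display of Chart 2-3(a) into the sorted display of Chart 2-3(b).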
In Chart 2-4, the column to the left of the stem values has numbers such as 2, 9, 15, and so on. The number 9 indicates that there are 9 observations of value less than the upper limit of the current class, which is 100. The number 15 indicates that there are 15 observations less than 110. About halfway down the column the number 9 appears in parentheses. The parentheses indicate that the middle value appears in that row; hence, we call this row the median row. In this case, we describe the middle value as the value that divides the total number of observations into two equal parts. There are a total of 45 observations, so the middle value, if the data were arranged from smallest to largest, would be the 23rd observation. After the median row, the values begin to decline. These values represent the "more than" cumulative totals. There are 21 observations of value greater than or equal to the lower limit of this class, which is 120; 13 of 130 or more, and so on.

A stem-and-leaf display is useful only for a mid-sized data set. When we use a stem-and-leaf display for a large data set, we produce a large number of stems and/or leaves and are not able to see the characteristics of the data set.

In the stem-and-leaf display for Example 2-2, the leading digits (stems) take the values from 8 to 15 and thus give 8 stems (8, 9, 10, 11, 12, 13, 14, 15) in units of 10. However, in some data sets, stems assume only two or three values. Generating a stem-and-leaf display in these situations is not as easy as in Example 2-2. Let us look at the sample of marks of 20 students in Math 2010:

50  52  54  53  65  60  45  43  57  62
56  58  51  61  46  44  69  55  64  59

The leading digits (units of 10) in this example assume only three values: 4, 5, and 6. Following the above procedure for drawing a stem-and-leaf display, the stem-and-leaf display of the above data set looks like the one given below.
Stem  Leaf
  4   3456
  5   0123456789
  6   012459

As we can see, this stem-and-leaf display has only three stems and does not display the characteristics of the data set as well as if there were more stems. We can improve the stem-and-leaf display by splitting each stem. For example, stem 4 can be split as

  4   34
  4   56

The first stem 4 contains leaves less than 5 and the second stem 4 contains leaves 5 and above.

MINITAB CHART 2-4: Stem-and-Leaf Display of Data in Table 2-6

MINITAB INSTRUCTIONS
1. Click on Graph, and Stem-and-leaf.
2. Enter the location of the data in the variable field.
3. Enter the size of the increment (= 10 in our example) in the increment field.
4. Click OK.

The revised stem-and-leaf display is given below.

Stem  Leaf
  4   34
  4   56
  5   01234
  5   56789
  6   0124
  6   59

Other data sets may require even more splitting. The question of how much splitting is necessary can be answered by the rule suggested by Tukey et al.¹ For a sample size n ≤ 100, the number of stems should be the integer part of 2√n, where n is the sample size; for n ≥ 100, the number of stems should be the integer part of 10 log10(n). In our example of 20 students’ marks, 2√20 ≈ 8.94, so the number suggested by the rule is 8. We have 6 stems in our example, which is close to 8. Remember, the rule provides only a guideline for selecting the number of stems.

SELF-REVIEW 2-4
The price–earnings ratios for 21 stocks in the retail trade category are:

 8.3   9.6   9.5   9.1   8.8  11.2   7.7
10.1   9.9  10.8  10.2   8.0   8.4   8.1
11.6   9.6   8.8   8.0  10.4   9.8   9.2

Organize this information into a stem-and-leaf display.
(a) How many values are less than 9.0?
(b) List the values in the 10.0 up to 11.0 category.
(c) What are the largest and the smallest price–earnings ratios?

EXERCISES 2-9 TO 2-14

2-9. The first row of a stem-and-leaf display appears as follows: 6 | 1 3 3 7 9. Assume whole number values.
(a) What is the range of the values in this row?
(b) How many data values are in this row?
(c) List the actual values in this row.

2-10. The third row of a stem-and-leaf display appears as follows: 21 | 0 1 3 5 7 9. Assume whole number values.
(a) What is the range of the values in this row?
(b) How many data values are in this row?
(c) List the actual values in this row.

2-11. The following stem-and-leaf display shows the number of units produced per day in a factory.

  1    3   8
  1    4
  2    5   6
  9    6   0133559
 (7)   7   0236778
  9    8   59
  7    9   00156
  2   10   36

(a) How many days were studied?
(b) How many observations are in the first class?
(c) What are the largest and the smallest values in the data set?
(d) List the actual values in the fourth row.
(e) List the actual values in the second row.
(f) How many values are less than 70?
(g) How many values are 80 or more?
(h) How many values are between 60 and 89?

2-12. The following stem-and-leaf display reports the number of movies rented per day at Video Connection.

  3   12   689
  6   13   123
 10   14   6889
 13   15   589
 15   16   35
 20   17   24568
 23   18   268
 (5)  19   13456
 22   20   034679
 16   21   2239
 12   22   789
  9   23   00179
  4   24   8
  3   25   13
  1   26
  1   27   0

(a) How many days were studied?
(b) How many observations are in the last class?
(c) What are the largest and the smallest values in the entire set of data?
(d) List the actual values in the fourth row.
(e) List the actual values in the next to the last row.
(f) On how many days were fewer than 160 movies rented?
(g) On how many days were 220 or more movies rented?
(h) On how many days were between 170 and 210 movies rented?

2-13. A survey of the number of calls received by a sample of Southern Phone Company subscribers last week revealed the following information. Develop a stem-and-leaf display. How many calls did a typical subscriber receive? What were the largest and the smallest number of calls received?
52  43  30  38  30  42  12  46  39  37  34  46  32  18  41  5

2-14. Aloha Banking Co. is studying the number of times a particular automated teller machine (ATM) is used each day. The following is the number of times it was used during each of the last 30 days. Develop a stem-and-leaf display. Summarize the data on the number of times the machine was used: How many times was the ATM used on a typical day? What were the largest and the smallest number of times the ATM was used? Around what values did the number of times the ATM was used tend to cluster?

83  64  84  76  84  54  75  59  70  61
63  80  84  73  68  52  65  90  52  77
95  36  78  61  59  84  95  47  87  60

2.3 Graphic Presentation of a Frequency Distribution

Sales managers, stock analysts, hospital administrators, and other busy executives often need a quick picture of the trends in sales, stock prices, or hospital costs. These trends can often be depicted by the use of charts and graphs. The charts that depict a frequency distribution graphically are the histogram, the stem-and-leaf display, the frequency polygon, and the cumulative frequency polygon.

HISTOGRAM
One of the most common graphical methods of displaying the frequency distribution of quantitative data is a histogram.

Histogram: A graph in which classes are marked on the horizontal axis and class frequencies on the vertical axis. The class frequencies are represented by the heights of the rectangles, and the rectangles are drawn adjacent to each other without any space between them.

Thus, a histogram describes a frequency distribution using a series of adjacent rectangles. Since the height of each rectangle equals the frequency of the corresponding class, and all the class widths are equal, the area of each rectangle is proportional to the frequency of the corresponding class.

Example 2-3
Refer to the data in Table 2-7 on life expectancy of males at birth in 40 countries. Construct a frequency distribution and a histogram.
What conclusions can you reach based on the information presented in the histogram?

TABLE 2-7: Life Expectancy of Males at Birth (years)

Country       Life Exp.    Country          Life Exp.    Country      Life Exp.
Afghanistan     45         Bhutan             59.5       Egypt          64.7
Albania         69.9       Botswana           46.2       France         74.2
Angola          44.9       Brazil             63.1       Germany        73.9
Argentina       69.6       Bulgaria           67.6       Hungary        66.8
Armenia         67.2       Cambodia           51.5       India          62.3
Australia       75.5       Canada             76.1       Iran           68.5
Austria         73.7       Chad               45.7       Japan          76.8
Bahamas         70.5       Chile              72.3       Kenya          51.1
Bahrain         71.1       China              67.9       Nepal          57.6
Bangladesh      58.1       Congo              48.3       UK             74.5
Barbados        73.7       Cuba               74.2       USA            73.4
Belarus         62.2       Czech Republic     70.3       Venezuela      70
Belgium         73.8       Denmark            73         Zambia         39.5
Bermuda         71.7

Source: Life Expectancy at Birth (Males), United Nations Statistics Divisions, 1996–2000

Solution
The data in Table 2-7 are quantitative. Therefore, the first step is to construct a frequency distribution using the method discussed in Section 2.1. This is given in Table 2-8. (In Table 2-8, we also give relative frequencies. These will be discussed later.)

TABLE 2-8: Frequency and Relative Frequency Distribution of Life Expectancy Data

Life Expectancy    Frequency    Relative Frequency    Found by
36 up to 43            1             0.025              1/40
43 up to 50            5             0.125              5/40
50 up to 57            2             0.050              2/40
57 up to 64            6             0.150              6/40
64 up to 71           11             0.275             11/40
71 up to 78           15             0.375             15/40
Total                 40             1.000

To construct a histogram, class frequencies are scaled along the vertical axis (y-axis) and either the class limits or the class midpoints are scaled along the horizontal axis (x-axis).

From the frequency distribution, the frequency of the class 36 up to 43 is 1. Therefore, the height of the column for this class is 1. Make a rectangle whose width spreads from 36 to 43 with a height of one unit. Repeat the process for the remaining classes. The completed histogram should resemble the graph presented in Chart 2-5.
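As a check on Table 2-8, the tally of the Table 2-7 values into the six classes can be sketched in Python. This is a minimal sketch under stated assumptions: the variable names are illustrative, and each class is treated as "lower limit up to, but not including, upper limit", as the class labels suggest.

```python
# Life expectancy of males at birth, 40 countries (Table 2-7).
life_expectancy = [
    45, 69.9, 44.9, 69.6, 67.2, 75.5, 73.7, 70.5, 71.1, 58.1,
    73.7, 62.2, 73.8, 71.7, 59.5, 46.2, 63.1, 67.6, 51.5, 76.1,
    45.7, 72.3, 67.9, 48.3, 74.2, 70.3, 73, 64.7, 74.2, 73.9,
    66.8, 62.3, 68.5, 76.8, 51.1, 57.6, 74.5, 73.4, 70, 39.5,
]

lowest_limit = 36   # lower limit of the first class
class_width = 7     # every class is 7 years wide
num_classes = 6     # 36 up to 43, ..., 71 up to 78

# Tally each value into the class whose interval contains it.
counts = [0] * num_classes
for value in life_expectancy:
    index = int((value - lowest_limit) // class_width)
    counts[index] += 1

# Relative frequency = class frequency / total frequency.
n = len(life_expectancy)
relative = [count / n for count in counts]

for i, (count, rel) in enumerate(zip(counts, relative)):
    lo = lowest_limit + i * class_width
    print(f"{lo} up to {lo + class_width}: {count:>2}  {rel:.3f}")
```

The counts produced this way are 1, 5, 2, 6, 11, and 15, matching the Frequency column of Table 2-8.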
The double slant on the x-axis indicates that the class limits did not start at zero. That is, the division between 0 and 36 is not linear. In other words, the distance between 0 and 36 is not the same as the distance between 36 and 43, between 43 and 50, and so on.

CHART 2-5: Histogram of Life Expectancy at Birth (Males)
[Histogram: frequency on the vertical axis (0 to 16); class limits 36, 43, 50, 57, 64, 71, 78 on the horizontal axis.]

From Chart 2-5, we conclude that:
• the lowest life expectancy is about 36 years and the highest is about 78 years.
• the class with the highest frequency (15) is 71 up to 78. That is, 15 countries have a life expectancy from 71 up to 78 years.
• the class with the lowest frequency (1) is 36 up to 43 years. That is, there is only one country with a life expectancy from 36 up to 43.
• the histogram is J-shaped. There is a tail on the left side of the class with the highest frequency (mode), and no tail on its right side.

COMMON DISTRIBUTION SHAPES
According to the shapes of histograms, distributions can be classified into (i) symmetrical and (ii) skewed. A symmetrical distribution is one in which, if we divide its histogram into two pieces by drawing a vertical line through its centre, the two halves formed are mirror images of each other. This is displayed in Chart 2-6(a). A distribution that is not symmetrical is said to be skewed. For a skewed distribution, it is quite common to have one tail of the distribution longer than the other. If the longer tail is stretched to the right, the distribution is said to be skewed to the right. If the longer tail is stretched to the left, it is said to be skewed to the left. These are displayed in Charts 2-6(b) and (c) below.

CHART 2-6: Common Distribution Shapes
[Three panels: (a) Symmetrical, (b) Skewed Right, (c) Skewed Left.]

For a symmetrical distribution, the centre, or the typical value, of the distribution is well defined. For a skewed distribution, however, it is not that easy to define the centre.
We shall discuss this in detail in the next chapter.

Another commonly used classification of distributions is according to their number of peaks. When the histogram has a single peak, the distribution is called unimodal. A bimodal distribution is one in which the histogram has two peaks, not necessarily equal in height.

RELATIVE FREQUENCY HISTOGRAM
A relative frequency histogram is a graph in which classes are marked on the horizontal axis and the relative frequencies (frequency of a class/total frequency) on the vertical axis.

Let us refer again to the data in Table 2-7 on life expectancy of males at birth in 40 countries. In Table 2-8 we also give a relative frequency distribution corresponding to this data. For example, the relative frequency of the class 43 up to 50 is 0.125 (5/40). We follow the procedure used in drawing a histogram to draw a relative frequency histogram. Chart 2-7 shows the relative frequency histogram of the life expectancy data.

A relative frequency histogram has the following important properties:
• The shape of a relative frequency histogram of a data set is identical to the shape of its histogram. (Verify this for the life expectancy data.)
• It is useful in comparing shapes of two or more data sets with different total frequencies. (Note that when the total frequencies of two data sets are different, histograms of these data sets cannot be compared directly. For example, the total frequency of one data set may be 1000, while that of the other data set may be 100. But the relative frequencies of any data set add up to 1.0.)
• The area of the rectangle corresponding to a class interval equals (relative frequency of the class) × (class width). For example, the relative frequency of class 43 up to 50 is 0.125 (12.5 percent of the countries listed in Table 2-7 have life expectancy in this class). The area of the corresponding rectangle is (0.125)(50 - 43) = 0.875.
The total area under the entire relative frequency histogram is therefore (class width) × (sum of relative frequencies of all the classes) = class width. (This is because the sum of the relative frequencies of all classes equals 1.) If we scale the height of each rectangle by 1/(class width), then the area of each rectangle of the scaled relative frequency histogram will be equal to its relative frequency, and the total area under the entire scaled relative frequency histogram will be equal to 1.

A histogram provides an easily interpreted visual representation of the frequency distribution of a given set of raw data. The shape of the histogram is the same whether we use the actual frequency distribution or the relative frequency distribution. We shall see in later chapters the importance of shapes in determining the appropriate method of statistical analysis.

HISTOGRAM USING EXCEL AND MINITAB
We can plot a histogram using MegaStat by following the same instructions as those for the construction of a frequency distribution, except that in this case, we do not deselect “histogram.” We give below instructions for plotting a histogram using Excel (without MegaStat) and Minitab.

CHART 2-7: Relative Frequency Histogram of Life Expectancy at Birth (Males)
[Histogram: relative frequency on the vertical axis (0 to 0.40); life expectancy class limits 36, 43, 50, 57, 64, 71, 78 on the horizontal axis.]

EXCEL CHART 2-8: Histogram of Life Expectancy

MICROSOFT EXCEL INSTRUCTIONS
1. Enter the data in the first column of the worksheet.
2. In the next column, enter a label and call it Bin. In this column, enter the upper limit of each class.
3. Click on Tools, Data Analysis, Histogram, and OK.
4. Enter the location of data in the Input Range.
5. Enter the location of Bin Range.
6. Check Label box, chart output, and click OK.
7. Click on any rectangle in the chart and then right click the mouse.
8. Click Format Data Series, Select Options, reduce gap width to zero, and click OK.
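Separately from the software instructions, the area properties of the relative frequency histogram stated earlier in this section can be verified with a short calculation. The sketch below is illustrative only: it starts from the frequencies in Table 2-8, uses exact fractions to avoid rounding, and the variable names are assumptions rather than anything defined in the text.

```python
from fractions import Fraction

# Class frequencies from Table 2-8; every class is 7 years wide.
frequencies = [1, 5, 2, 6, 11, 15]
class_width = 7
total = sum(frequencies)

# Relative frequency of each class, kept as exact fractions.
relative = [Fraction(f, total) for f in frequencies]

# Area of each rectangle = relative frequency x class width.
areas = [r * class_width for r in relative]

# Scaling each height by 1/(class width) makes each area equal
# to the relative frequency of its class.
scaled_heights = [r / class_width for r in relative]
scaled_areas = [h * class_width for h in scaled_heights]

print(float(areas[1]))           # class 43 up to 50 -> 0.875
print(float(sum(areas)))         # total area -> 7.0 (the class width)
print(float(sum(scaled_areas)))  # total scaled area -> 1.0
```

The unscaled total area equals the class width (7), while the scaled version has total area 1, which is what makes scaled histograms of data sets with different class widths comparable.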
MINITAB CHART 2-9: Histogram of Life Expectancy

MINITAB INSTRUCTIONS
1. Click on Graph, and Histogram.
2. Type the variable name in box 1 of Graph varia