Research Methods in Economics
Research Methods in Economics ECO 320
These 13 pages of class notes were uploaded by Waylon Crona on Sunday, October 11, 2015. The notes belong to ECO 320 at Eastern Kentucky University, taught by Staff in Fall.
Date Created: 10/11/15
II. DESCRIPTIVE STATISTICS

As we saw in the previous chapter, Statistics is a discipline concerned with the procedures or methods for collecting and analyzing data. It was also pointed out that the analysis of data could consist of just description, or it could involve generalization. While most of the course is concerned with generalization, or Statistical Inference, we begin with a discussion of Descriptive Statistics. The main purpose of Descriptive Statistics is to organize and summarize data or observations into a form that is understandable and that makes transparent the patterns present in the data. The standard techniques of Descriptive Statistics are discussed in this chapter.

Let us begin with a problem and some data. Mr. Bullock, the owner of a restaurant, is thinking about buying some advertising in an attempt to increase business. Before committing his funds to an advertising campaign, he decides that it would be prudent to get a profile of the restaurant's customers. He arranges to have a random sample of 100 customers taken. Information was obtained by personal interview with each customer. The data on a number of variables are presented in Table 1. The variables include the following:

(i) gender
(ii) marital status
(iii) age in years
(iv) education in years of schooling completed
(v) earnings in thousands of dollars per year
(vi) occupation

There are both qualitative and quantitative variables in Table 1. The quantitative variables are age, education, and earnings. The data on earnings are rounded to the nearest thousand dollars. The qualitative characteristics are coded as follows:

Gender is coded 0 if female, 1 if male.
Marital status is coded 0 if single, 1 if married, 2 if divorced.
Occupation is coded 1 if managerial, 2 if professional, 3 if technical, 4 if clerical, 5 if services, 6 if other.

[Table 1. A Random Sample of Mr. Bullock's Customers. Columns: Sch, Earn, Marital, Age, Job, Sex; 100 observations.]

Before starting to organize and summarize the sample data for the
restaurant, it might be useful to point out why one needs to organize and summarize data. Remember that what one is trying to do in the case of the restaurant data is to develop a mental picture of the characteristics of the sample of customers. The data in Table 1 contain all the information that we have about the sample. They are not, however, in a form that most people can comprehend by looking at them. Any patterns that are present in the data are certainly not evident from just looking at Table 1. Hence one needs to organize and summarize the data to detect any interesting patterns that are present.

Doing Statistics well requires a combination of skill, insight, and judgment that comes from experience. There are, however, approaches which, if followed, can help avoid major pitfalls. An attempt is made here to outline such an approach to doing Descriptive Statistics.

Before one begins to analyze a set of data, it is important to find out how the data were obtained. One needs to know if the data are for a population or a sample. If they are for a sample, how was the sample drawn or selected? It is also important to know who obtained the data. Were they obtained by interview or by other means? What was the nature of the questions employed to obtain the data? Were the data accurately recorded?

Many of the questions raised in the previous paragraph should also be asked about much of what is reported in the news media, and especially about advertising claims. Intelligent use of information requires that one know its genealogy.

After one has dealt with the preliminary questions raised above, and provided one is convinced that the way in which the data were generated was legitimate for the purpose at hand, one is in a position to start the analysis of the data. The first step should be to plot the data in the order in which they were observed. This plot is called a SEQUENCE PLOT. The sequence plot for the data on age in Table 1 is presented in Figure 1.

FIGURE 1. SEQUENCE PLOT OF THE OBSERVATIONS ON AGE IN
TABLE 1

The main reason for doing a sequence plot is to see if there are any systematic patterns present in the data when viewed in the order in which they were observed. If the data are supposed to be a random sample, then one does not expect any systematic patterns in the data. This seems to be true of the age data in Figure 1. If one had found that the first forty observations were mostly younger and the last sixty were older, one would be led to wonder whether a random sampling procedure had been used.

If one is dealing with a series of observations over time, that is, a TIME SERIES on a variable, the sequence plot may reveal some important patterns in the data. In Table 2, data on the unemployment rate for Jackson County, North Carolina, are given for the period January 1980 through December 1984. The sequence plot of the data is given in Figure 2.

[Table 2. Unemployment Rate in Jackson County, NC, 1980-84. Rows: months; columns: 1980, 1981, 1982, 1983, 1984.]

The most obvious feature of the unemployment data is that there is a well-defined pattern of variation within each year that is repeated across years. This feature of the data provides one with important information about Jackson County's economy. You may want to ask, "Why is there such a seasonal pattern?" We shall return to this issue later. For now we return to the restaurant data in Table 1.

FIGURE 2. SEQUENCE PLOT OF THE UNEMPLOYMENT RATE IN JACKSON COUNTY, NORTH CAROLINA, 1980-84

Since the data on age seem to be a random sample, we conclude that there is no information in the order in which the data were observed. Thus rearranging the data will not cause a loss of information.

If one is working by hand, one of the easiest ways to get a quick impression of the data is to do a STEM-AND-LEAF DISPLAY. In order to demonstrate how to construct a stem-and-leaf display, let us just use the first ten
observations on age in Table 1, namely

    43 61 26 26 29 45 24 34 35 43

One begins by deciding on the stem. In this case we use the tens for the stem size and the ones for the leaf size. The tens are listed in increasing order in a column:

    2
    3
    4
    5
    6

Then we proceed to go through the observations and place the unit digit opposite the appropriate ten. One places the 3 in 43 in the row opposite 4, the 1 in 61 opposite the 6, etc. One obtains the following display:

    2 | 6694
    3 | 45
    4 | 353
    5 |
    6 | 1

When the foregoing method is applied to all the observations on age, the following stem-and-leaf display is obtained:

    2 | 66944605262632472
    3 | 4553788592861334237799320851697709900488
    4 | 3530756577244318274232110597086
    5 | 312117373
    6 | 121

What does the stem-and-leaf display tell us? It leaves the basic information intact, though it does arrange the information in classes. It also provides us with a frequency table and a histogram in one step without destroying the original information. We can also figure out the maximum, the minimum, and the median without much trouble. It is now easy to rank the data if we wish to do so.

By looking at the stem-and-leaf display, one can obtain a good summary picture of the ages of the persons in the sample. They range in age from 20 years to 62 years. The distribution of age is not uniform. The persons in the sample tend to be concentrated in the 30 to 50 age group.

MINITAB produces a stem-and-leaf display in which the observations are in ascending order.

    MTB > STEM C4;
    SUBC> INCR 10.

    Stem-and-leaf of AGE   N = 100
    Leaf Unit = 1.0

      17   2  02222344456666679
     (41)  3  00001122233333444455556677777888888999999
      42   4  000111222233334455556677777889
      12   5  111233377
       3   6  112

The numbers in the far left column of the display tell one that there are nine persons aged less than or equal to 24 years, 17 persons aged less than or equal to 29 years, and 35 persons aged less than or equal to 34 years. The median, or middle value, is in the 35-39 age group, as indicated by the parentheses. There are 42 aged 40 or over, 26 aged 45 or
older, and 12 aged fifty or older.

The stem-and-leaf display can be viewed as a bar graph, or HISTOGRAM, of the data. For those used to vertical bar graphs, the information is plotted in Figure 3.

[Figure 3. Histogram of Age, Restaurant Sample. Horizontal axis: Years.]

MINITAB produces a vertical display called a DOTPLOT, which has more detail than the usual bar graph.

    MTB > DOTPLOT C4

A frequency table can also be obtained readily from the stem-and-leaf display.

    Class                  Cumulative   Relative    Cumulative Relative
    Interval   Frequency   Frequency    Frequency   Frequency
    20-29         17           17         0.17         0.17
    30-39         41           58         0.41         0.58
    40-49         30           88         0.30         0.88
    50-59          9           97         0.09         0.97
    60-69          3          100         0.03         1.00

MINITAB has a HISTOGRAM command which produces both a histogram and a frequency table. The response to

    MTB > HIST C4;
    SUBC> INCR 10.

is

    Histogram of AGE   N = 100

    Midpoint   Count
      20.0        9
      30.0       26
      40.0       39
      50.0       21
      60.0        5

Given the initial stem-and-leaf display and the associated frequency table and histogram, one can now refine these if one wishes. The main question that arises concerns the length of the class interval, and thus the number of classes. What one is dealing with here is a tradeoff between comprehension and loss of information. At one extreme there could be a class for every distinct value, while at the other extreme one could have only a single class. The following plots show the tradeoff.

    MTB > STEM C4;
    SUBC> INCR 2.

    Stem-and-leaf of AGE   N = 100
    Leaf Unit = 1.0

    2  0
    2  22223
    2  4445
    2  666667
    2  9
    3  000011
    3  22233333
    3  44445555
    3  6677777
    3  888888999999
    4  000111
    4  22223333
    4  445555
    4  6677777
    4  889
    5  111
    5  2333
    5  77
    6  11
    6  2

    MTB > STEM C4;
    SUBC> INCR 60.

    Stem-and-leaf of AGE   N = 100
    Leaf Unit = 10

    100  0  222222222222222222222222222222222222222224444444

Most statisticians feel that somewhere between five and fifteen classes is the optimum number. There are formulas available for determining class length, and these are built into many of the standard statistical packages. It does no harm, however, to consider some alternative class lengths. They can at times be revealing about the data.

The
following stem-and-leaf display and histogram were produced by MINITAB when the increment was not specified.

    MTB > STEM C4

    Stem-and-leaf of AGE   N = 100
    Leaf Unit = 1.0

       9   2  022223444
      17   2  56666679
      35   3  000011222333334444
     (23)  3  55556677777888888999999
      42   4  0001112222333344
      26   4  55556677777889
      12   5  1112333
       5   5  77
       3   6  112

    MTB > HIST C4

    Histogram of AGE   N = 100

    Midpoint   Count
       20         5   *****
       25        11   ***********
       30        10   **********
       35        20   ********************
       40        22   **********************
       45        17   *****************
       50         7   *******
       55         5   *****
       60         3   ***

It should be noted that the stem-and-leaf display and the histogram do not necessarily use the same class intervals. In the above example the width is five in both cases, but the first class for the stem-and-leaf display is 20-24.9, while it is 17.5-22.4 for the histogram.

Numerical Summaries

In addition to graphical and tabular summaries of the information contained in a set of observations, numerical summaries are often presented. A variety of such summaries is available. However, there are a few which are of special importance. These arise from some questions that people tend to raise about phenomena. One of these is: What is the typical observation or element? In our case, what is the age of the typical customer?

There are a number of numerical measures of the typical element. These are also called measures of central tendency. The four most commonly used ones will be discussed here. They are the arithmetic mean, the median, the mode, and the geometric mean.

Arithmetic Mean

The ARITHMETIC MEAN of a set of observations is obtained by summing the observations and dividing the resulting sum by the number of observations. If $X_i$ is the $i$th observation and there are $n$ observations, then the arithmetic mean is denoted by

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Applying the formula to the data on age yields an arithmetic mean of $\bar{X} = 38.37$.

What is the interpretation of the arithmetic mean, and what are its properties? The physical interpretation of
the arithmetic mean is that it is the value at which a bar with one-pound weights placed at the observed values will balance. For example, if one has a twelve-inch bar with a weight of one unit at zero and another at twelve, then in order to balance, the bar should be supported at six. In our terms, $\bar{X} = (0 + 12)/2 = 6$.
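
The balance-point interpretation can be checked numerically. The following is only a minimal sketch in Python, not part of the original MINITAB session; the function names `arithmetic_mean` and `net_moment` are hypothetical, chosen for illustration. It uses the twelve-inch bar from the example and the first ten observations on age quoted earlier, and it assumes a weight of one unit at each observed value.

```python
def arithmetic_mean(values):
    """Sum the observations and divide by the number of observations."""
    return sum(values) / len(values)


def net_moment(values, pivot):
    """Net turning effect about `pivot` with a unit weight at each value.

    A result of zero means the bar balances at `pivot`.
    """
    return sum(x - pivot for x in values)


# The twelve-inch bar: unit weights at 0 and at 12.
bar = [0, 12]

# First ten observations on age from Table 1, as quoted in the text.
first_ten_ages = [43, 61, 26, 26, 29, 45, 24, 34, 35, 43]

bar_mean = arithmetic_mean(bar)
ages_mean = arithmetic_mean(first_ten_ages)

print("bar balances at:", bar_mean)
print("mean of first ten ages:", ages_mean)
print("net moment about the mean (bar):", net_moment(bar, bar_mean))
print("net moment about 5 (bar):", net_moment(bar, 5))
```

Supporting the bar anywhere other than the mean leaves a nonzero net moment, which is another way of saying that deviations from the arithmetic mean sum to zero. With real-valued data the sum may differ from zero by a tiny rounding error, so a tolerance is the safer comparison.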