# Elementary Statistical Methods: January 11-15 Notes (Statistics 2228)

ECU


These 6 pages of class notes were uploaded by Rebecca Weidner on Wednesday, January 13, 2016. The notes belong to Statistics 2228 at East Carolina University, taught by Dr. W. Teresa Obuchowska in Spring 2016.

Date Created: 01/13/16

Elementary Statistical Methods Notes: January 11-15, 2016

## Chapter 1: Intro to Statistics

### Section 1.1

Statistics is the science of collecting, describing, and interpreting data.

There are two types of statistics:

- Descriptive statistics (Chapters 2-4)
- Inferential statistics (Chapters 8-11)

Chapters 5-7 cover probability:

- Chapter 5: Probability
- Chapter 6: Discrete probability distributions
- Chapter 7: Continuous probability distributions

Descriptive statistics includes the collection, presentation, and description of sample data (histograms, bar graphs, pie charts, etc.).

Inferential statistics consists of procedures for making inferences or drawing conclusions about population characteristics from the information contained in a sample.

A variable is a characteristic that changes or varies over time and/or across the individuals or objects under consideration (e.g. test scores, sales).

Example: Body temperature is a variable that changes over time within a single person; it also varies from person to person.

Data are observations (e.g. measurements) that have been collected.

A population is the set of all individuals, objects, or measurements whose properties are to be analyzed.

A sample is a subset of the population, usually a small part of it.

A parameter is a numerical measurement describing some characteristic of an entire population.

Example: The average age at time of admission for all students attending ECU.

Greek letters are used to symbolize parameters: μ (mu) = mean, σ (sigma) = standard deviation.

A statistic is a numerical measurement describing some characteristic of a sample of a population.

English letters are used to symbolize statistics: x̄ (x-bar) = mean, s = standard deviation.

Illustration of each term in one example: We are interested in estimating the average dollar value of all cars owned by the faculty members at ECU. The terms above can be identified as:

1. The population is the total collection of cars owned by ECU faculty.
2. A sample is any subset of that population, for example the cars owned by faculty members of the math department.
3. The variable is the "dollar value" of each individual car.
4. The data would be the set of values that correspond to the sample obtained (e.g. $9,400, $11,000, etc.).
5. The parameter is the average value of all cars in the population.
6. The statistic is the average value of only the cars in the sample.

The variables are what make up the data. Data can be either qualitative or quantitative.

Qualitative data (also called categorical or attribute data):

- Nominal data can only be classified, not ordered (e.g. car color, SSN, state, gender).
- Ordinal data can be ranked (e.g. letter grades, level of satisfaction).

Quantitative data (also called numerical data):

- Can be either discrete or continuous:
  - A discrete variable can assume only a finite or countable number of values (e.g. 1, 2, 3, ...).
  - A continuous variable assumes infinitely many values corresponding to the points on a line interval, and the differences between data values can be arbitrarily small.
- Can be either interval data or ratio data:
  - Interval: differences between data values are meaningful, but zero does not indicate that nothing is present (e.g. temperature, dress sizes).
  - Ratio: differences between data values are meaningful AND zero indicates that nothing is present (e.g. distance, weight, height, money).

### Section 1.3: Simple Random Sample

A random sample is a sample of size n out of a population of size N where each individual has an equal chance of being selected.

A simple random sample is a sample of size n out of a population of size N where every possible sample of that size has an equal chance of being selected.

Example: Suppose that in a statistics class there are 42 students arranged into 6 rows and 7 columns, and that a random sample of 7 students is chosen by rolling a fair die.
The number of dots on the upper face of the die indicates which row of seven students is selected for the sample. The sample selected in this way is a random sample, since every student has the same chance of being selected, but it is not a simple random sample, because not every possible combination of 7 students can be selected this way.

If you roll a 4 on the die, the seven students in row 4 are selected. Each student has a 1 in 6 chance of being selected. However, a group of seven students who do not all sit in the same row could never be selected, so the sample is not simple random.

To select a simple random sample, each student would be assigned a two-digit number: 01, 02, 03, 04, 05, 06, 07, etc. A computer random number generator could then be used to select students, skipping all values higher than 42 or subtracting 42 from the larger numbers. Alternatively, we can use the table of random numbers in Appendix A1 of the textbook.

### Section 1.4: Other Methods of Sampling (not testable material)

1. Systematic sampling (random sample): Choose k close to N/n, then select a starting point that is less than k. Choose every k-th (e.g. 100th) element in the population.
2. Convenience sampling: Choosing the most easily accessible individuals, or an easily assembled group of individuals.
3. Stratified sampling
4. Cluster sampling

## Chapter 2

### Section 2.1: Organizing Qualitative Data

A frequency table (distribution) lists the categories of data along with their corresponding frequencies.

A relative frequency table lists the categories of data along with their corresponding relative frequencies.
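As a rough sketch of how a frequency table is built from raw observations, the Python snippet below tallies a short list of categories. The letter grades are made-up illustration data (not from the course), and the function name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(observations):
    """Count how many times each category appears, sorted by category name."""
    counts = Counter(observations)
    return dict(sorted(counts.items()))


# Made-up letter grades, used only to illustrate the tally.
grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A"]

table = frequency_table(grades)
for category, frequency in table.items():
    print(f"{category}: {frequency}")
```

The frequencies always add up to the number of observations, which is a quick check that nothing was missed in the tally.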
Example:

| Category | Frequency | Relative frequency |
|---|---|---|
| A | fA | fA / sum of all frequencies |
| B | fB | fB / sum of all frequencies |
| C | fC | fC / sum of all frequencies |
| D | fD | fD / sum of all frequencies |
| E | fE | fE / sum of all frequencies |
| Sum | Σ fi | 1 |

To make a relative frequency table:

Sum of frequencies = fA + fB + fC + fD + fE

In general: sum of frequencies = f1 + f2 + f3 + ... + fm, where m is the number of categories.

Relative frequency = frequency / sum of all frequencies

Graphs of qualitative data: bar graphs, relative frequency bar graphs, pie charts, Pareto charts.

A bar graph is a graph in which the horizontal axis represents categories and the vertical axis represents their frequencies. It is made of vertical rectangles of equal width.

A relative frequency bar graph is a graph in which the horizontal axis represents categories and the vertical axis represents their relative frequencies. It is also made of vertical rectangles of equal width.

A Pareto chart is a bar chart where the bars are arranged in decreasing order of frequency.

A pie chart is a graph depicting the categories of the qualitative variable as slices of a circle.

Example: An analysis of train derailments showed that:

- (A) 23 derailments were caused by bad tracks
- (B) 9 were caused by faulty equipment
- (C) 12 were caused by human error
- (D) 6 were caused by other causes

Construct the frequency table:

| Category | Frequency | Relative frequency |
|---|---|---|
| A. Bad tracks | 23 | 0.46 (46%) |
| B. Faulty equipment | 9 | 0.18 (18%) |
| C. Human error | 12 | 0.24 (24%) |
| D. Other | 6 | 0.12 (12%) |
| Sum | 50 | 1 (100%) |
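The relative frequencies for the derailment example can be recomputed from the four counts with a few lines of Python. This is only a minimal sketch: the function name `relative_frequencies` and the variable names are illustrative choices, and the Pareto ordering is simply the categories sorted by decreasing frequency, as defined in the notes.

```python
def relative_frequencies(frequencies):
    """Divide each frequency by the sum of all frequencies."""
    total = sum(frequencies.values())
    return {category: count / total for category, count in frequencies.items()}


# Counts taken from the derailment example.
derailments = {
    "Bad tracks": 23,
    "Faulty equipment": 9,
    "Human error": 12,
    "Other": 6,
}

rel = relative_frequencies(derailments)

# Pareto order: categories sorted by decreasing frequency.
pareto_order = sorted(derailments, key=derailments.get, reverse=True)

for category in pareto_order:
    print(f"{category:<18}{derailments[category]:>4}{rel[category]:>8.2f}")
print(f"{'Sum':<18}{sum(derailments.values()):>4}{sum(rel.values()):>8.2f}")
```

Checking that the relative frequencies sum to 1 is a useful safeguard against arithmetic slips when filling in the table by hand.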

