# Stat Chapter 2 Notes Stat 206

USC

These 13-page class notes were uploaded by Brandon Gearhart on Monday, October 3, 2016. The notes belong to Stat 206 at the University of South Carolina, taught by Angela Ferguson in Fall 2016. Since their upload, they have received 2 views. For similar materials, see Business Statistics in Math at the University of South Carolina.

Date Created: 10/03/16

## STAT 206: Chapter 2 (Organizing and Visualizing Variables)

### Methods to Organize and Visualize Variables

For categorical variables:

- Summary table; contingency table (2.1)
- Bar chart, pie chart, Pareto chart, side-by-side bar chart (2.3)

For numerical variables:

- (Array), ordered array, frequency distribution, relative frequency distribution, percentage distribution, cumulative percentage distribution (2.2)
- Stem-and-leaf display, histogram, polygon, cumulative percentage polygon (2.4)

Other methods later…

### 2.1 Organizing Categorical Variables

You must identify the variable type to determine the appropriate organization and visualization tools.

→ Recall the variable types:

- Categorical (category)
  - Nominal: name of a category
  - Ordinal: has a natural ordering
- Numerical / quantitative (quantity)
  - Discrete: distinct cutoffs between values
  - Continuous: on a continuum

Definitions:

- Summary table: shows the values of the data categories for one variable and the frequencies (counts) or proportions/percentages for each category
- Contingency table: shows the values of the data categories for more than one variable and the frequencies or proportions/percentages for each of the joint responses
- Each response is counted/tallied into one and only one category/cell

Example (Problem 2.2, p. 40): The following data represent the responses to two questions asked in a survey of 40 college students majoring in business:

- What is your gender? (M = male; F = female)
- What is your major?
(A = Accounting; C = Computer Information; M = Marketing)

| Students | Gender | Major |
|---|---|---|
| 1 to 10 | M M M F M F F M F M | A C C M A C A A C C |
| 11 to 20 | F M M M M F F M F F | A A A M C M A A A C |
| 21 to 30 | M M M M F M F F M M | C C A A M M C A A A |
| 31 to 40 | F M M M M F M F M M | C C A A A A C C A C |

Summary table (Gender):

| Value | Frequency | Relative frequency | Percentage |
|---|---|---|---|
| Male (M) | 25 | 0.625 | 62.5 |
| Female (F) | 15 | 0.375 | 37.5 |
| TOTALS | 40 | 1.000 | 100.0 |

Summary table (Major):

| Value | Frequency | Relative frequency | Percentage |
|---|---|---|---|
| A (Accounting) | 20 | 0.500 | 50.0 |
| C (Computer) | 15 | 0.375 | 37.5 |
| M (Marketing) | 5 | 0.125 | 12.5 |
| TOTALS | 40 | 1.000 | 100.0 |

Now to combine the two variables (Gender and Major):

| Gender | A (Accounting) | C (Computer) | M (Marketing) | TOTALS |
|---|---|---|---|---|
| Male (M) | 14 | 9 | 2 | 25 |
| Female (F) | 6 | 6 | 3 | 15 |
| TOTALS | 20 | 15 | 5 | 40 |

Table based on total percentages:

| Gender | A (Accounting) | C (Computer) | M (Marketing) | TOTALS |
|---|---|---|---|---|
| Male (M) | 35% | 22.5% | 5% | 62.5% |
| Female (F) | 15% | 15% | 7.5% | 37.5% |
| TOTALS | 50% | 37.5% | 12.5% | 100% |

Table based on row percentages:

| Gender | A (Accounting) | C (Computer) | M (Marketing) | TOTALS |
|---|---|---|---|---|
| Male (M) | 56% | 36% | 8% | 100% |
| Female (F) | 40% | 40% | 20% | 100% |
| TOTALS | 50% | 37.5% | 12.5% | 100% |

Table based on column percentages:

| Gender | A (Accounting) | C (Computer) | M (Marketing) | TOTALS |
|---|---|---|---|---|
| Male (M) | 70% | 60% | 40% | 62.5% |
| Female (F) | 30% | 40% | 60% | 37.5% |
| TOTALS | 100% | 100% | 100% | 100% |

Questions:

- How many of the surveyed students were females majoring in Marketing? 3
- What percentage of the surveyed students were females majoring in Marketing? 7.5%
- What percentage of the male students surveyed were majoring in Computer? 36%
- Of the students majoring in Accounting, what percentage was male?
70%

### 2.3 Visualizing Categorical Variables

- Pie chart: uses sections of a circle to represent the tallies/frequencies/percentages for each category
- Bar chart: a series of bars, with each bar representing the tallies/frequencies/percentages for a single category

Summary table (Major):

| Value | Frequency | Relative frequency | Percentage |
|---|---|---|---|
| A (Accounting) | 20 | 0.500 | 50.0 |
| C (Computer) | 15 | 0.375 | 37.5 |
| M (Marketing) | 5 | 0.125 | 12.5 |
| TOTALS | 40 | 1.000 | 100.0 |

Consider our previous example for Major Category:

[Figure: pie chart, "Percentage by Major Category", with sections A (Accounting), C (Computer), M (Marketing)]

[Figure: bar chart, "Percentage by Major Category"]

Question: Which major has the lowest concentration of students? Marketing

Summary table of causes of incomplete ATM transactions:

| Cause | Frequency | Percentage |
|---|---|---|
| ATM malfunctions | 32 | 4.42% |
| ATM out of cash | 28 | 3.87% |
| Invalid amount requested | 23 | 3.18% |
| Lack of funds in account | 19 | 2.62% |
| Card unreadable | 234 | 32.32% |
| Warped card jammed | 365 | 50.41% |
| Wrong keystroke | 23 | 3.18% |
| TOTAL | 724 | 100.00% |

Discussion: Preference for type of chart? Bar chart

Pareto chart: a series of vertical bars showing tallies/frequencies/percentages in descending order

Example: [Figure: Pareto chart]

Discussion: How or why do you think that a Pareto chart would be useful in the business world? It helps separate the "vital few" categories from the "trivial many".

Side-by-side bar chart: uses sets of bars to show the joint responses from two categorical variables

Example: [Figure: side-by-side bar chart, "A4256 Calibration Solution / Chips"; vertical axis in steps of 50,000 up to 300,000]

Discussion: What can you determine about product utilization from this side-by-side bar chart that you might not be able to tell otherwise?
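The joint responses that a side-by-side bar chart plots are the cells of a contingency table. As a minimal sketch (not part of the original notes), the tallies and the total, row, and column percentages for the gender/major survey in Section 2.1 could be reproduced in Python as follows. The variable names and the `pct` helper are illustrative assumptions; the response strings are transcribed from the example data.

```python
from collections import Counter

# Survey responses transcribed from the Problem 2.2 example (40 students),
# ten students per string segment.
gender = ("MMMFMFFMFM" "FMMMMFFMFF" "MMMMFMFFMM" "FMMMMFMFMM")
major = ("ACCMACAACC" "AAAMCMAAAC" "CCAAMMCAAA" "CCAAAACCAC")

# Each student is tallied into exactly one (gender, major) cell.
cells = Counter(zip(gender, major))
row_totals = Counter(gender)   # totals per gender
col_totals = Counter(major)    # totals per major
n = len(gender)                # overall total


def pct(count, base):
    """Percentage of `count` relative to `base`, rounded to one decimal."""
    return round(100 * count / base, 1)


print("gender major count total% row% column%")
for g in "MF":
    for m in "ACM":
        c = cells[(g, m)]
        print(g, m, c, pct(c, n), pct(c, row_totals[g]), pct(c, col_totals[m]))
```

Dividing the same cell count by three different bases (overall total, row total, column total) is what distinguishes the three percentage tables, so the question being asked determines which base to use.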
### 2.2 Organizing Numerical Variables

Ordered array: arranges the values of a numerical variable in rank order (smallest value to largest value)

Array → Ordered array

Example (Table 2.8 A & B, p. 42):

Frequency distribution: tallies the values of a numerical variable into a set of numerically ordered classes; each class is called a class interval

- How many classes? At least 5, no more than 15
- Determine the interval width by the following: interval width = (highest value - lowest value) / number of classes

Using our Meal Cost data, we estimate that we want 10 classes, so the interval width is (80 - 25)/10 = 55/10 = 5.5.

But we really want to simplify to multiples of $5 increments, say $5 or $10. A width of $5 produces 13 classes (more than we want), so we choose $10 to produce 7 classes (notice this is sometimes more art than science…).

Relative frequency distribution: presents the relative frequency, or proportion of the total, for each group

The proportion, or relative frequency, in each group is equal to the number of values in each class divided by the total number of values.

Example:

| Meal Cost ($) | City: Frequency | City: Relative Frequency | City: Percentage | Suburban: Frequency | Suburban: Relative Frequency | Suburban: Percentage |
|---|---|---|---|---|---|---|
| 20, but <30 | 4 | 0.08 | 8% | 4 | 0.08 | 8% |
| 30, but <40 | 10 | 0.20 | 20% | 17 | 0.34 | 34% |
| 40, but <50 | 12 | 0.24 | 24% | 13 | 0.26 | 26% |
| 50, but <60 | 11 | 0.22 | 22% | 10 | 0.20 | 20% |
| 60, but <70 | 7 | 0.14 | 14% | 4 | 0.08 | 8% |
| 70, but <80 | 5 | 0.10 | 10% | 2 | 0.04 | 4% |
| 80, but <90 | 1 | 0.02 | 2% | 0 | 0 | 0% |
| TOTALS | 50 | 1 | 100.00% | 50 | 1 | 100.00% |

- The TOTAL of the relative frequency column MUST BE 1.00
- The TOTAL of the percentage column MUST BE 100.00

Cumulative percentage distribution: provides a way of presenting information about the percentage of values that are less than a specific amount

CITY and SUBURBAN combined:

| Meal Cost ($) | Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| 20, but <30 | 8 | 0.08 | 8.0% |
| 30, but <40 | 27 | 0.27 | 27.0% |
| 40, but <50 | 25 | 0.25 | 25.0% |
| 50, but <60 | 21 | 0.21 | 21.0% |
| 60, but <70 | 11 | 0.11 | 11.0% |
| 70, but <80 | 7 | 0.07 | 7.0% |
| 80, but <90 | 1 | 0.01 | 1.0% |
| TOTALS | 100 | 1.00 | 100.0% |

| < lower boundary | Cumulative percentage < lower boundary |
|---|---|
| <20 | 0 (no meals cost less than $20) |
| <30 | 8% = 0 + 8% |
| <40 | 35% = 0 + 8% + 27% |
| <50 | 60% = 0 + 8% + 27% + 25% |
| <60 | 81% = 0 + 8% + 27% + 25% + 21% |
| <70 | 92% = 0 + 8% + 27% + 25% + 21% + 11% |
| <80 | 99% = 0 + 8% + 27% + 25% + 21% + 11% + 7% |
| <90 | 100% = 0 + 8% + 27% + 25% + 21% + 11% + 7% + 1% |

Question: What percentage of meal costs was less than $50? 60% (0 + 8% + 27% + 25%)

### 2.4 Visualizing Numerical Variables

Stem-and-leaf display, how to create:

1. Separate each observation into a stem (all but the final digit(s)) and a leaf (the final digit(s)).
2. Write the stems in a vertical column, smallest on top.
3. Write each leaf, in increasing numerical order, in the row next to the appropriate stem.

Example: For each state, percentage (with one decimal place) of residents 65 and older

- Notice the stem of "7" does not have a leaf → we conclude there is no value of 7.x
- There should be the same number of leaves as observations!
- Include ALL stems, even if they have no values/leaves
- Leave a space holder if there is no leaf for a stem
- No punctuation (i.e., no decimal points, no commas)
- Leaves should be lined up on top of one another to determine SHAPE
- Simple way to deliver a lot of detailed information

FOR THIS EXAMPLE, read the data values as:

| Stem | Data values |
|---|---|
| 6 | 6.8 |
| 7 | - |
| 8 | 8.8 |
| 9 | 9.8, 9.9 |
| 10 | 10.0, 10.8 |

Histogram: displays a quantitative variable across different groupings of values

- Careful when choosing how to group together values!
- Groupings must cover the same range, so all classes have equal width
- Height is used to compare the frequency of each range of values

Steps to create a frequency histogram:

- Create equal-width classes (groupings)
- Count the number of values in each class
- Draw the histogram with a bar for each class
- The height of a bar represents the count for that bar's class
- Bars touch, since there are NO GAPS between classes

[Figure: histogram of Meal Cost]

Be careful:

- The number of categories can't be too large or too small
- Don't skip any categories
- Be clear about the contents of each category

Histogram example, using Age at Time of First Oscar Award:

Groupings chosen here are: [20,25), [25,30), [30,35), [35,40), [40,45), [45,50), … where "[" means the number is INCLUDED in the interval, but ")" means the number is NOT included in the interval.

Question: If Jack Nicholson won Best Actor at age 70, which category frequency would increase?

- A. [60,65)
- B. [65,70)
- C. [70,75)
- D. [75,80)

Answer: C. Age 70 is included in [70,75) and excluded from [65,70).

Let's examine what it means to turn a frequency into a relative frequency by looking at the age-at-Oscar data (TOTAL: 76).

- A relative frequency histogram depicts the relative frequency, rather than the frequency (count), of each category
- Do the shapes of the frequency and relative frequency histograms differ?

Percentage polygon: used for visualization when dividing the data of a numerical variable into two or more groups

- Uses the midpoint of each class to represent the data in the class
- Combines data from two groups to allow easier comparison
- Conclusions?

Cumulative percentage polygon (ogive): uses the cumulative percentage distribution (discussed previously) to plot the cumulative percentages along the Y axis. The LOWER BOUNDS of the class intervals are plotted on the X axis.

Conclusions?
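The points an ogive plots are a running sum of the class percentages, one point per class boundary. A minimal sketch of that calculation for the combined city and suburban meal-cost frequencies from Section 2.2 might look like this; the variable names are illustrative assumptions, not from the notes, and no plotting library is used.

```python
from itertools import accumulate

# Combined CITY and SUBURBAN meal-cost frequencies for the classes
# [20,30), [30,40), ..., [80,90), taken from the Section 2.2 table.
lower_bounds = [20, 30, 40, 50, 60, 70, 80]
frequencies = [8, 27, 25, 21, 11, 7, 1]
width = 10
total = sum(frequencies)

# Percentage distribution: each class count as a share of all values.
percentages = [100 * f / total for f in frequencies]

# Cumulative percentage below each boundary: 0% below the first lower
# bound, then a running sum of the class percentages.
boundaries = lower_bounds + [lower_bounds[-1] + width]
cumulative = [0.0] + list(accumulate(percentages))

for b, c in zip(boundaries, cumulative):
    print(f"< {b}: {c:.0f}%")
```

Pairing each cumulative value with a boundary (X axis) and its percentage (Y axis) gives the coordinates an ogive would connect; the value at the boundary 50 corresponds to the "less than $50" question in the notes.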
