# Introduction to Business Statistics (STAT201)

These 90 pages of class notes were uploaded by Theresia Dare on Wednesday, September 23, 2015. The notes belong to STAT201 at Drexel University, taught by Staff in the Fall term.

Date Created: 09/23/15

## Discrete Probability Distributions

Stat 201, Prof. Yanni Papadakis

**Key terms:** discrete probability distribution, Bernoulli process, binomial distribution, Poisson distribution, resampling, simulation.

### Discrete Probability Distribution

A discrete probability distribution is a set of pairs $(X_i, p_i)$ whose probabilities sum to one: $\sum_i p_i = 1$. For instance, here is the distribution of past sales:

| Sales (millions) | Probability |
|---|---|
| 1.0 | 0.20 |
| 1.5 | 0.25 |
| 2.0 | 0.35 |
| 2.5 | 0.15 |
| 3.0 | 0.05 |
| Total | 1.00 |

### Useful Formulas

- Mean: $\mu = \sum_i p_i X_i$
- Variance: $\sigma^2 = \sum_i p_i (X_i - \mu)^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$

Exercise: with your partner, calculate the mean, variance, and standard deviation for the following distribution of sales. What is the probability that sales will be less than 2 million?

| Sales (millions) | Probability |
|---|---|
| 1.0 | 0.25 |
| 1.5 | 0.20 |
| 2.0 | 0.35 |
| 2.5 | 0.15 |
| 3.0 | 0.05 |
| Total | 1.00 |

### Bernoulli Process

- Two or more consecutive trials.
- Two possible outcomes in each trial: success or failure.
- Trials are independent of each other.
- The probability of success is the same for every trial.

### Binomial Distribution

Setting: count the number of successes $x$ in a Bernoulli process with probability of success $p$ after $n$ trials. We say $x$ follows the binomial distribution with parameters $n$ and $p$, written $X \sim B(n, p)$:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$$

### Binomial Probability: Does It Make Sense?

One flips a fair coin 3 times in a row and counts the number of heads (H). What are the possible outcomes? What is their probability?

### Binomial Example: $B(12, 0.5)$

| $x$ | $P(X = x)$ | $P(X \le x)$ |
|---|---|---|
| 0 | 0.00024 | 0.00024 |
| 1 | 0.00293 | 0.00317 |
| 2 | 0.01611 | 0.01929 |
| 3 | 0.05371 | 0.07300 |
| 4 | 0.12085 | 0.19385 |
| 5 | 0.19336 | 0.38721 |
| 6 | 0.22559 | 0.61279 |
| 7 | 0.19336 | 0.80615 |
| 8 | 0.12085 | 0.92700 |
| 9 | 0.05371 | 0.98071 |
| 10 | 0.01611 | 0.99683 |
| 11 | 0.00293 | 0.99976 |
| 12 | 0.00024 | 1.00000 |

Exercise: using the binomial formula for $B(12, 0.5)$, calculate $P(X < 2)$.

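The formulas above map directly onto a few lines of code. Below is a minimal Python sketch that uses only the standard library. The names `summarize`, `binom_pmf`, and `binom_cdf` are hypothetical helpers introduced for illustration, not functions from the course. The sketch computes the summaries of the past-sales distribution and reproduces some entries of the coin example and the $B(12, 0.5)$ table. It leaves the two exercises to the reader.

```python
from math import comb, sqrt

def summarize(dist):
    """Mean, variance and standard deviation of a discrete distribution.

    `dist` maps each value X_i to its probability p_i.
    """
    total = sum(dist.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, not 1")
    mean = sum(p * x for x, p in dist.items())
    variance = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, variance, sqrt(variance)

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), summing the pmf from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Past-sales distribution from the slide (values in millions).
sales = {1.0: 0.20, 1.5: 0.25, 2.0: 0.35, 2.5: 0.15, 3.0: 0.05}
mean, var, sd = summarize(sales)
print(f"mean={mean:.3f} var={var:.4f} sd={sd:.4f}")  # mean=1.800 var=0.3100 sd=0.5568

# Fair coin flipped 3 times: probabilities of 0, 1, 2, 3 heads.
print([round(binom_pmf(k, 3, 0.5), 3) for k in range(4)])  # [0.125, 0.375, 0.375, 0.125]

# Two rows of the B(12, 0.5) table: P(X = 6) and P(X <= 2).
print(round(binom_pmf(6, 12, 0.5), 5), round(binom_cdf(2, 12, 0.5), 5))  # 0.22559 0.01929
```

`binom_cdf` sums the pmf term by term. That is fine for the small $n$ used in these notes.
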
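### Binomial Distribution Example: Dow Jones Industrials

According to the random walk theory for financial markets, the history of returns in stock prices cannot be used to predict future outcomes. Hence the number of months in a year that the stock market is up can be modeled as a Bernoulli process. The monthly returns of the Dow Jones Industrials (DJI) index have been positive with probability 0.59. During the past 12 months, returns were positive 8 out of 12 times. What is the probability of 8 months in 12 showing positive DJI returns?

### DJI Example (Continued): $B(12, 0.59)$

The probabilities for $B(12, 0.59)$ are given in the table. What is the probability that 8 or more months in 12 have positive returns?

| Months $x$ | $P(X = x)$ | $P(X \le x)$ |
|---|---|---|
| 0 | 0.00002 | 0.00002 |
| 1 | 0.00039 | 0.00041 |
| 2 | 0.00308 | 0.00350 |
| 3 | 0.01479 | 0.01829 |
| 4 | 0.04789 | 0.06618 |
| 5 | 0.11027 | 0.17646 |
| 6 | 0.18513 | 0.36159 |
| 7 | 0.22835 | 0.58994 |
| 8 | 0.20538 | 0.79532 |
| 9 | 0.13135 | 0.92668 |
| 10 | 0.05671 | 0.98338 |
| 11 | 0.01484 | 0.99822 |
| 12 | 0.00178 | 1.00000 |

### Binomial Distribution Example: Sociology

A sociology professor asks her class to observe cars with a man and a woman in the front seat and to record which of the two is the driver.

- Does the number of man drivers follow the Bernoulli process assumptions (all observations made at the same location and time)?
- Explain why the Bernoulli process assumptions do not hold if half the observations are made outside a church after mass and half on campus after a dance.
- Students observe 10 cars during business hours in the retail district. Past experience suggests the probability of a man driving is 0.85 in this location. What is the probability of finding a man driver in 8 or more of the cars observed?

Sociology example (continued): plot the probability distribution of $B(10, 0.85)$. What is the shape of this distribution?

### Poisson Distribution

Generation: the Poisson distribution counts the successes $x$ in a Bernoulli process when $n$ is large and $p$ is small. Examples:

- Customer arrivals at a service point during a given period
- Defects per yard of wire
- Work-related injuries per month
- Homicides per week

### Poisson Distribution Formulas

The Poisson parameter is $\lambda = np$, the average number of successes for the desired number of repetitions or units of time.

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad e \approx 2.718$$

Mean: $E(X) = \lambda$. Variance: $\mathrm{Var}(X) = \lambda$.

### Poisson Example: Birthdays

Estimate the probability that 2 people out of 200 randomly selected will have their birthday on July 4th (neglect leap years). What is the probability that 3 or more will have their birthday on Independence Day?

Here $\lambda = np = 200 \times \frac{1}{365} \approx 0.548$:

| $x$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| $P(X = x)$ | 0.5781 | 0.3168 | 0.0868 | 0.0159 | 0.0022 | 0.0002 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
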
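The DJI, sociology, and birthday questions can be checked numerically under the slides' assumptions of independent trials and a constant $p$. In this sketch the helper names are hypothetical. It sums the upper binomial tail directly and evaluates the Poisson formula with $\lambda = 200/365$. It also prints the exact binomial $B(200, 1/365)$ value so the Poisson approximation can be compared against it.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

# DJI: 8 or more positive months out of 12, p = 0.59
print(round(binom_upper_tail(8, 12, 0.59), 4))  # 0.4101
# Sociology: 8 or more man drivers out of 10 cars, p = 0.85
print(round(binom_upper_tail(8, 10, 0.85), 4))  # 0.8202

# Birthdays: n = 200 people, p = 1/365, so lambda = n * p
lam = 200 / 365
print([round(poisson_pmf(x, lam), 4) for x in range(6)])
# [0.5781, 0.3168, 0.0868, 0.0159, 0.0022, 0.0002]
print(round(1 - sum(poisson_pmf(x, lam) for x in range(3)), 4))  # P(X >= 3): 0.0183

# Exact binomial value for comparison with the Poisson P(X = 2)
print(round(binom_pmf(2, 200, 1 / 365), 4))
```

Computing $P(X \ge 3)$ as $1 - P(X \le 2)$ avoids summing an infinite Poisson tail.
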
### Poisson Example: Unmarked Items

Suppose that 1% of all items in a supermarket are unmarked. A customer buys 10 items and proceeds to check out through the express lane. Estimate the probability that the customer will be delayed because one or more of the items requires a price check.

### Resampling Empirical Distributions: Computer Simulation

Return to last week's sales distribution. To generate random samples from an empirical distribution, I draw random numbers using software. Random numbers (RNs) range from 0 to 1 and are independent of each other. A sales scenario is selected through the cumulative distribution (cdf). For example, RN = 0.47 falls between the cdf values 0.45 and 0.80, so the scenario is 2.0 million.

| Sales (millions) | Probability | cdf |
|---|---|---|
| 1.0 | 0.20 | 0.20 |
| 1.5 | 0.25 | 0.45 |
| 2.0 | 0.35 | 0.80 |
| 2.5 | 0.15 | 0.95 |
| 3.0 | 0.05 | 1.00 |

### Simulation Example

| Scenario | RN | Sales (millions) |
|---|---|---|
| 1 | 0.258138 | 1.5 |
| 2 | 0.589308 | 2.0 |
| 3 | 0.801168 | 2.5 |
| 4 | 0.486054 | 2.0 |
| 5 | 0.743099 | 2.0 |
| 6 | 0.181972 | 1.0 |
| 7 | 0.362377 | 1.5 |
| 8 | 0.117472 | 1.0 |
| 9 | 0.905507 | 2.5 |
| 10 | 0.193162 | 1.0 |

### Expected Value Using Simulation (Not Smart: the Formula Is Better)

Average sales in the 10 simulated scenarios: 1.7 million. True average (formula): 1.8 million. What if we need to perform complex calculations?

### Simulation Example: Investment Analysis

A company is considering entering a market in which it would realize sales with the previous distribution. It must buy a 5 million dollar piece of equipment. In addition, the company needs to pay wages equal to $0.5\sqrt{\text{Sales}}$. What is the expected profit of the company over the 5-year lifetime of the equipment (neglecting discounting)?

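Here is a sketch of the inverse-cdf lookup described above and its use for the investment question. It rests on assumptions that the slides do not state:

- Sales in each of the 5 years are drawn independently from the same distribution.
- Wages of $0.5\sqrt{\text{Sales}}$ are paid every year, in millions.
- The 5 million equipment cost is paid once.

`draw_sales`, `exact_profit`, and `simulate_profit` are hypothetical names.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from math import sqrt

VALUES = [1.0, 1.5, 2.0, 2.5, 3.0]      # sales scenarios, millions
PROBS = [0.20, 0.25, 0.35, 0.15, 0.05]
CDF = list(accumulate(PROBS))           # about 0.20, 0.45, 0.80, 0.95, 1.00

def draw_sales(rn):
    """Map a random number rn in [0, 1) to a sales scenario via the cdf."""
    i = bisect_right(CDF, rn)           # first cdf value strictly above rn
    return VALUES[min(i, len(VALUES) - 1)]  # guard against float round-off at 1.0

# Reproduce the slide's 10 scenarios.
slide_rns = [0.258138, 0.589308, 0.801168, 0.486054, 0.743099,
             0.181972, 0.362377, 0.117472, 0.905507, 0.193162]
slide_sales = [draw_sales(r) for r in slide_rns]
print(slide_sales, sum(slide_sales) / len(slide_sales))
# [1.5, 2.0, 2.5, 2.0, 2.0, 1.0, 1.5, 1.0, 2.5, 1.0] 1.7

def exact_profit(years=5, equipment=5.0):
    """Expected profit from the formula, under the assumptions above."""
    per_year = sum(p * (v - 0.5 * sqrt(v)) for v, p in zip(VALUES, PROBS))
    return years * per_year - equipment

def simulate_profit(n_runs=20_000, years=5, equipment=5.0, seed=1):
    """Monte Carlo estimate of the same expected profit."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        profit = -equipment
        for _ in range(years):
            s = draw_sales(rng.random())
            profit += s - 0.5 * sqrt(s)  # sales minus wages for one year
        total += profit
    return total / n_runs

print(round(exact_profit(), 4))  # 0.6877 (millions)
print(round(simulate_profit(), 3))
```

In this case the formula is easy to evaluate, so simulation is mainly a check. Simulation pays off when the profit rule is too complex for a closed-form expectation.
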
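## Numerical Data Summaries

Stat 201, Prof. Yanni Papadakis

Sample descriptive statistics (e.g., the sample mean and sample standard deviation) are used as estimates of the exact population parameters, which are typically unknown. Population parameters and sample statistics are calculated with different formulas. The standard deviation is the notable case: the sample version divides by $n-1$ rather than $n$. Descriptive statistics serve as numerical summaries of the sample.

### One Numerical Variable Analysis

| Need to know | Use |
|---|---|
| Center | Mean, median |
| Spread | Standard deviation, interquartile range |
| Shape | Skewness, kurtosis, or compare mean and median |

Can a few numbers (descriptive statistics) replace all the many observations in the dataset? Often the answer is a qualified YES. Use both numerical and graphical summaries.

### Gretzky Goals Dataset: Stemplot

| Stem | Leaves |
|---|---|
| 0 | 9 |
| 1 | 1 6 |
| 2 | 3 3 5 |
| 3 | 1 8 |
| 4 | 0 0 1 |
| 5 | 1 2 4 5 |
| 6 | 2 |
| 7 | 1 3 |
| 8 | 7 |
| 9 | 2 |

### Use of Standard Deviation

Rule that frequently applies (normal distribution):

- Within 1 standard deviation of the mean we find 68% of the sample.
- Within 2 standard deviations of the mean we find 95% of the sample.
- Within 3 standard deviations of the mean we find 99.7% of the sample.

Rule that always applies (Chebyshev's theorem): the proportion of the sample within $k$ standard deviations of the mean is at least $1 - \frac{1}{k^2}$ (for $k > 1$).

### Gretzky Goals: Quartiles

Data array (positions 1 to 20): 9, 11, 16, 23, 23, 25, 31, 38, 40, 40, 41, 51, 52, 54, 55, 62, 71, 73, 87, 92.

- Median $M$: position $\frac{1}{2}(n+1)$
- First quartile $Q_1$: position $\frac{1}{4}(n+1)$
- Third quartile $Q_3$: position $\frac{3}{4}(n+1)$
- Interquartile range: $IQR = Q_3 - Q_1$

With $n = 20$:

- Median: position $\frac{1}{2}(20+1) = 10.5$, so $M = 40(0.50) + 41(0.50) = 40.5$.
- First quartile: position $\frac{1}{4}(20+1) = 5.25$, so $Q_1 = 23(0.75) + 25(0.25) = 23.5$. Split the difference, giving more weight to the closer observation.
- Third quartile: position $\frac{3}{4}(20+1) = 15.75$, so $Q_3 = 55(0.25) + 62(0.75) = 60.25$. Again, more weight goes to the closer observation.
- $IQR = Q_3 - Q_1 = 60.25 - 23.5 = 36.75$.

### Boxplot with No Outliers

[Figure: boxplot of Gretzky goals with Min = 9, $Q_1$ = 23.5, $M$ = 40.5, $Q_3$ = 60.25, Max = 92.]

### Boxplot with Outliers: the 1.5 × IQR Rule

Mark an observation as an outlier if it is higher than $Q_3 + 1.5 \times IQR$ or lower than $Q_1 - 1.5 \times IQR$.

How to construct a boxplot:

- Stop the upper whisker ("antenna") at the last in-bounds observation.
- Mark all top outliers with a circle or cross.
- Do the same below: stop the lower whisker at the last in-bounds observation and mark all bottom outliers with a circle or cross.

### Boxplot with an Outlier

Data array with ONE NEW OBSERVATION (positions 1 to 21): 9, 11, 16, 23, 23, 25, 31, 38, 40, 40, 41, 51, 52, 54, 55, 62, 71, 73, 87, 92, 150.

The median and IQR are resistant to outliers:

| Statistic | Original Gretzky goals | With one extra 150-goal season |
|---|---|---|
| Min | 9.00 | 9.00 |
| 1st Qu. | 24.50 | 25.00 |
| Median | 40.50 | 41.00 |
| Mean | 44.70 | 49.71 |
| 3rd Qu. | 56.75 | 62.00 |
| Max | 92.00 | 150.00 |
| St. Dev. | 24.02 | 32.80 |
| IQR | 32.25 | 37.00 |

The quartiles in this table (24.50 and 56.75) differ from the hand calculation above (23.5 and 60.25). They appear to come from software that uses a different interpolation convention for quartiles.
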
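Python's `statistics.quantiles` implements both quartile conventions that appear above:

- `method="exclusive"` places the quartiles at positions $q(n+1)$, which matches the hand calculation.
- `method="inclusive"` gives the same 24.50 and 56.75 shown in the comparison table.

The sketch below adds the 1.5 × IQR fences and the Chebyshev bound. `box_summary` and `chebyshev_min_share` are hypothetical helper names.

```python
from statistics import mean, quantiles, stdev

goals = [9, 11, 16, 23, 23, 25, 31, 38, 40, 40,
         41, 51, 52, 54, 55, 62, 71, 73, 87, 92]

def box_summary(data, method="exclusive"):
    """Quartiles, IQR and 1.5 x IQR outliers.

    method="exclusive": positions q*(n+1), as in the hand calculation.
    method="inclusive": positions 1 + q*(n-1), another common convention.
    """
    q1, med, q3 = quantiles(data, n=4, method=method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return {"min": min(data), "Q1": q1, "M": med, "Q3": q3,
            "max": max(data), "IQR": iqr, "outliers": outliers}

def chebyshev_min_share(k):
    """Chebyshev: at least 1 - 1/k^2 of any data lies within k StDevs (k > 1)."""
    return 1 - 1 / k**2

print(box_summary(goals))                      # Q1 23.5, M 40.5, Q3 60.25, IQR 36.75, no outliers
print(box_summary(goals, "inclusive")["IQR"])  # 32.25
print(box_summary(goals + [150])["outliers"])  # [150]
print(round(stdev(goals), 2), round(stdev(goals + [150]), 2))  # 24.02 32.8

m, s = mean(goals), stdev(goals)
share_within_2 = sum(abs(x - m) <= 2 * s for x in goals) / len(goals)
print(share_within_2, ">=", chebyshev_min_share(2))  # 1.0 >= 0.75
```

Either convention is defensible. The point of the example is that the median and IQR barely move when the 150 season is added, while the mean and standard deviation shift noticeably.
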
[Figure: Racquetball and Tennis data, shown separately and as side-by-side boxplots; values not recoverable.]

Boxplots are easier to assess when they are side by side.

### Overview of Numerical Summaries

| | Resistant | Non-resistant |
|---|---|---|
| Center | Median | Mean, mode (good for symmetric data only) |
| Spread | IQR, MAD (mean absolute deviation) | Standard deviation, variance, range, coefficient of variation (StDev / Mean × 100%) |
| Shape | | Skewness, kurtosis |

### Sample Shape

[Figure: density curves of a symmetric distribution and a right-skewed distribution.]

### Sample Shape Using Boxplots

[Figure: boxplots of apartment rent (right skewed) and store sales × 1,000 (symmetric).]

### Sample Shape Using Descriptive Statistics

- Symmetric distribution: the mean is about equal to the median, and skewness is about zero.
- Right-skewed distribution: the mean lies to the right of (is higher than) the median, and skewness is greater than zero.
- Left-skewed distribution: the opposite of right skewed.

| | Apartment rents | Store sales |
|---|---|---|
| Mean | 460.3 | 100.1 |
| Median | 308.5 | 100.0 |
| IQR | 462.5 | 1.3 |
| StDev | 453.9 | 1.1 |
| Skewness | 1.9 | 0.1 |
| Shape | Right skewed ($\bar{x} \gg M$) | Symmetric |

Exercise: find $M$ and the $IQR$ (use a stemplot) for Babe Ruth's home runs in his 15 seasons with the NY Yankees: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22.

### Comparing Heterogeneous Data

| Month | Gold (per oz) | Slab zinc (per lb) |
|---|---|---|
| Feb 93 | 329.4 | 0.509 |
| Mar | 329.0 | 0.473 |
| Apr | 341.9 | 0.481 |
| May | 366.7 | 0.472 |
| June | 371.9 | 0.448 |
| July | 392.4 | 0.451 |
| Aug | 378.5 | 0.429 |
| Sep | 354.9 | 0.424 |

| | Gold | Slab zinc |
|---|---|---|
| Mean | 358.1 | 0.461 |
| StDev | 23.3 | 0.028 |
| Coef. of variation | 6.5% | 6.2% |

### Standardized Data

To standardize data, use the formula $z = \dfrac{x - \bar{x}}{s}$.

| Month | Gold ($z$) | Slab zinc ($z$) |
|---|---|---|
| Feb 93 | -1.23 | 1.70 |
| Mar | -1.25 | 0.43 |
| Apr | -0.69 | 0.71 |
| May | 0.37 | 0.39 |
| June | 0.59 | -0.45 |
| July | 1.47 | -0.35 |
| Aug | 0.88 | -1.12 |
| Sep | -0.14 | -1.30 |

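The coefficient of variation and the standardized values for the gold and zinc prices can be reproduced as follows. The sketch assumes the sample standard deviation ($n-1$ divisor), which is consistent with the tabulated StDev values. The helper names are hypothetical.

```python
from statistics import mean, stdev

# Monthly prices from the slide, Feb 93 to Sep.
gold = [329.4, 329.0, 341.9, 366.7, 371.9, 392.4, 378.5, 354.9]  # per oz
zinc = [0.509, 0.473, 0.481, 0.472, 0.448, 0.451, 0.429, 0.424]  # per lb

def coef_variation(data):
    """Coefficient of variation in percent: sample StDev / Mean x 100."""
    return 100 * stdev(data) / mean(data)

def standardize(data):
    """z-scores (x - mean) / StDev, using the sample standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

print(round(coef_variation(gold), 1), round(coef_variation(zinc), 1))  # 6.5 6.2
print([round(z, 2) for z in standardize(gold)])
# [-1.23, -1.25, -0.69, 0.37, 0.59, 1.47, 0.88, -0.14]
print([round(z, 2) for z in standardize(zinc)])
# [1.7, 0.43, 0.71, 0.39, -0.45, -0.35, -1.12, -1.3]
```

After standardization, both series are unit-free with mean 0 and standard deviation 1. That makes month-by-month movements in gold and zinc directly comparable.
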
### Correlation Coefficient: Summary of Numerical Variable Relationships

[Figure: scatter plots of a response variable against an explanatory variable, illustrating different values of the correlation coefficient.]

### The Correlation Coefficient Is Not Enough to Describe Data Relationships

[Figure: several scatter plots (for example y3 and y4 against x2 and x4), each with correlation coefficient 0.81 but a very different pattern.]

## Sampling Distributions

Stat 201, Prof. Yanni Papadakis

**Key terms:** sampling distribution, distribution of the sample mean, central limit theorem, distribution of the sample proportion.

### Rules for Expectation and Variation

Let $X$ be a random variable from a population with mean $\mu_X = 5$ and standard deviation $\sigma_X = 2$. Let $Y$ be a random variable from a population with mean $\mu_Y = 4$ and standard deviation $\sigma_Y = 1$. Choose one $X$ and one $Y$ randomly and independently.

Their sum has the following statistics:

$$\mu_{X+Y} = \mu_X + \mu_Y = 5 + 4 = 9$$

$$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y = 2^2 + 1^2 = 5, \qquad \sigma_{X+Y} = \sqrt{5} \approx 2.24$$

Their difference has the following statistics (the variances still add):

$$\mu_{X-Y} = \mu_X - \mu_Y = 5 - 4 = 1$$

$$\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y = 2^2 + 1^2 = 5, \qquad \sigma_{X-Y} = \sqrt{5} \approx 2.24$$

Choose one $X$ randomly and multiply it by the constant $k = 3$. Call the resulting random variable $Z = kX$:

$$\mu_Z = k\mu_X = 3 \times 5 = 15, \qquad \sigma_Z = |k|\,\sigma_X = 3 \times 2 = 6$$

### Example: Profit

Revenue in USD millions is $X \sim N(3, 1)$ and cost in USD millions is $Y \sim N(2.5, 0.5)$, where $N(\mu, \sigma)$ denotes a normal distribution with mean $\mu$ and standard deviation $\sigma$. When $X$ and $Y$ are independent and normally distributed, their sum and difference are also normally distributed. What is the distribution of profit? If 1 USD = 1.5 CAD, what is the distribution of profit in CAD?

### Example: Golfers

Tom and George are two professional golfers whose scores vary every time they play. Tom's score distribution is $X \sim N(110, 10)$ and George's is $Y \sim N(100, 8)$. If they play against each other and their performances are independent, what is the probability that George will win? Hint: find the distribution of the difference between their scores.

### Sampling Distribution

From any population $X$ with mean $\mu = 5$ and standard deviation $\sigma = 2$, choose random samples of size $n = 20$. The sampling distribution of the mean is the distribution of the averages of many samples. The sample averages $\bar{x}$ have the following statistics:

$$\mu_{\bar{x}} = \mu = 5, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{20}} \approx 0.447$$

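Python's `statistics.NormalDist` (Python 3.8+) can add and subtract independent normal distributions and scale them by a constant, which mirrors the rules above. This sketch makes two assumptions, both noted in the comments:

- $N(\mu, \sigma)$ is parameterized by the standard deviation.
- In the golf example, the lower score wins.

```python
from math import sqrt
from statistics import NormalDist

# Profit example, USD millions; N(mean, standard deviation); independence assumed.
revenue = NormalDist(mu=3, sigma=1)
cost = NormalDist(mu=2.5, sigma=0.5)
profit = revenue - cost      # mean 3 - 2.5, sd sqrt(1^2 + 0.5^2)
profit_cad = profit * 1.5    # 1 USD = 1.5 CAD scales both mean and sd
print(round(profit.mean, 3), round(profit.stdev, 3))          # 0.5 1.118
print(round(profit_cad.mean, 3), round(profit_cad.stdev, 3))  # 0.75 1.677

# Golf example (assumption: lower score wins; performances independent).
tom = NormalDist(110, 10)
george = NormalDist(100, 8)
diff = tom - george              # D = X - Y ~ N(10, sqrt(164))
p_george_wins = 1 - diff.cdf(0)  # George wins when his score is lower, i.e. D > 0
print(round(p_george_wins, 3))   # 0.783

# Sampling distribution of the mean: mu = 5, sigma = 2, n = 20.
se = 2 / sqrt(20)
print(round(se, 3))  # 0.447
```

The sum and difference rules for the mean and variance hold for any independent random variables. `NormalDist` is used here only because these examples are normal.
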
### Sample Means from a Normal Distribution

Let $X \sim N(10, 3)$. Draw the histogram of one random sample of size 50.

Generate 15 random samples and calculate their averages: 9.4, 10.0, 10.1, 10.3, 9.1, 9.3, 9.9, 10.2, 10.6, 9.5, 10.6, 9.5, 9.9, 10.2, 10.0.

Draw the histogram of the 15 averages. What is the distribution of the 15 averages?

[Figure: histograms of one sample and of the sample means.]

### Central Limit Theorem

If samples are drawn from a normal population, then the distribution of sample means is normal. SURPRISINGLY, no matter what the distribution of the population is, the distribution of random sample means is approximately NORMAL when the sample size is large enough.

### Sample Means from an Exponential Distribution

Let $X$ be exponential with $\lambda = 0.5$. Draw the histogram of one random sample of size 50.

Generate 15 random samples and calculate their averages: 2.6, 1.9, 2.0, 1.8, 1.9, 1.6, 2.1, 2.2, 2.0, 1.9, 2.0, 1.7, 2.0, 2.1, 2.1.

Draw the histogram of the 15 averages. What is the distribution of the 15 averages?

[Figure: histograms of one sample and of the sample means.]

### Example: Roulette

A roulette wheel has 38 slots: 18 black, 18 red, and 2 green. In a fair roulette the ball is equally likely to land on each slot. A gambler bets \$1 on red. If the ball lands on red, he or she wins \$1; otherwise he or she loses \$1 (the bank wins \$1).

- What is the probability that the gambler wins \$1 in one round?
- According to the central limit theorem, what is the distribution of the gambler's profits after 100 consecutive bets of \$1?
- If 100,000 gamblers bet once, how much do you expect the 10 biggest winners to make? How much does the bank make?

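A short simulation makes the central limit theorem concrete. The first part of this sketch repeats the exponential experiment above with $\lambda = 0.5$ and $n = 50$, but uses 2,000 samples instead of 15 so that the pattern is clearer. A fixed seed makes the run reproducible. The second part applies the normal approximation to the roulette bets. The name `sample_means` is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def sample_means(draw, n, reps, rng):
    """Return the averages of `reps` independent random samples of size n."""
    return [mean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(201)  # fixed seed for a reproducible run
# Exponential population with rate 0.5: mean 2 and standard deviation 2.
means = sample_means(lambda r: r.expovariate(0.5), n=50, reps=2000, rng=rng)
# Expect roughly 2 and 2 / sqrt(50), about 0.283, even though the population is skewed.
print(round(mean(means), 2), round(stdev(means), 3))

# Roulette: a $1 bet on red wins +1 with probability 18/38 and loses 1 otherwise.
p_win = 18 / 38
mu_bet = p_win * 1 + (1 - p_win) * (-1)  # -2/38, about -0.0526 per bet
var_bet = 1 - mu_bet**2                  # E[X^2] = 1 because X is always +1 or -1
bets = 100
total = NormalDist(bets * mu_bet, (bets * var_bet) ** 0.5)  # CLT approximation
print(round(total.mean, 2), round(total.stdev, 2))  # -5.26 9.99
print(round(1 - total.cdf(0), 3))                   # approximate P(gambler is ahead)

bank = 100_000 * -mu_bet  # expected bank winnings from 100,000 single bets
print(round(bank, 2))     # 5263.16
```

The normal curve for 100 bets is only an approximation. The gambler's net result is always an even integer, and the calculation above ignores any continuity correction.
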
### Insurance Example

We buy insurance to cope financially with risks that have a low probability but result in very high losses. An insurance company looks at fire loss statistics and finds that out of 1 million homes, 1,000 incurred damages due to fire. Losses per home average \$250, with a standard deviation of \$1,000 (a right-skewed distribution). The company plans to sell fire policies for \$275: the \$250 average loss per home plus \$25 in expenses.

- Explain why it is not wise to sell only 12 policies, while selling many thousands of policies makes the company's cash flow fairly steady.
- If the company sells 10,000 policies, what is the probability that average losses exceed \$275?

### Sample Means: Binomial Distribution (Success Count)

Recall: if $X \sim B(n, \pi)$, where $\pi$ is the probability of success, then $\mu_X = n\pi$ and $\sigma^2_X = n\pi(1-\pi)$.

- What is the distribution of $X$ in a random sample? It is close to normal when $n\pi \ge 5$ and $n(1-\pi) \ge 5$.
- What is the distribution of $X$ across many samples, each of size $n$? $X \sim N\left(n\pi, \sqrt{n\pi(1-\pi)}\right)$.

### Sample Means: Binomial Distribution (Proportion of Success)

Definition: the proportion of success is $p = X/n$. What is the distribution of $p$ across many samples, each of size $n$?

$$\mu_p = \frac{\mu_X}{n} = \pi, \qquad \sigma^2_p = \frac{\sigma^2_X}{n^2} = \frac{n\pi(1-\pi)}{n^2} = \frac{\pi(1-\pi)}{n}$$

Even in one sample, $p$ is distributed close to normal when the binomial approximation is valid ($n\pi \ge 5$ and $n(1-\pi) \ge 5$):

$$p \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$

### Sample Proportions from a Binomial Distribution

Let $X \sim B(50, 0.1)$ and $p = X/50$. Draw the histogram of one random sample of size 50. In that sample $X = 8$, so $p = 8/50 = 0.16$.

Generate 15 random samples and calculate their proportions:

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $X$ | 3 | 2 | 4 | 10 | 5 | 4 | 2 | 8 | 6 | 8 | 8 | 4 | 5 | 6 | 4 |
| $p$ | 0.06 | 0.04 | 0.08 | 0.20 | 0.10 | 0.08 | 0.04 | 0.16 | 0.12 | 0.16 | 0.16 | 0.08 | 0.10 | 0.12 | 0.08 |

Draw the histogram of the 15 proportions. What is their distribution?

[Figure: distribution of one sample and of the sample proportions.]

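Both questions above reduce to a normal approximation of a sampling distribution. This sketch reads the insurance figures as a per-home mean loss of \$250 and a standard deviation of \$1,000; that reading is an assumption. It then shows how the probability of average losses exceeding \$275 shrinks as the number of policies grows. The helper names are hypothetical. The rule-of-thumb check mirrors the $n\pi \ge 5$, $n(1-\pi) \ge 5$ condition above.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(threshold, mu, sigma, n):
    """CLT approximation of P(average loss over n policies > threshold)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return 1 - sampling_dist.cdf(threshold)

for n in (12, 100, 1_000, 10_000):
    # Caution: with n = 12 and right-skewed losses, the normal approximation is rough.
    print(n, round(prob_mean_exceeds(275, mu=250, sigma=1000, n=n), 4))
# With n = 10,000 the z-score is (275 - 250) / 10 = 2.5, giving about 0.0062.

def proportion_normal_approx(pi, n):
    """Approximate sampling distribution of p = X/n (rule of thumb from the slides)."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("need n*pi >= 5 and n*(1 - pi) >= 5")
    return NormalDist(pi, sqrt(pi * (1 - pi) / n))

dist_p = proportion_normal_approx(0.1, 50)
print(round(dist_p.mean, 2), round(dist_p.stdev, 4))  # 0.1 0.0424

slide_p = [0.06, 0.04, 0.08, 0.20, 0.10, 0.08, 0.04, 0.16,
           0.12, 0.16, 0.16, 0.08, 0.10, 0.12, 0.08]
print(round(sum(slide_p) / len(slide_p), 4))  # 0.1053, close to pi = 0.1
```

Here $n\pi = 5$ sits exactly on the boundary of the rule of thumb, so the normal approximation for $B(50, 0.1)$ is only just acceptable.
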
## Graphical Data Summaries

Stat 201, Prof. Yanni Papadakis

### Driver Speed on a Stretch of Highway

Raw data (60 observations): 54.2, 58.7, 71.2, 75.6, 53.7, 57.0, 62.1, 67.4, 64.6, 69.0, 65.2, 64.1, 82.0, 62.0, 54.3, 73.6, 76.2, 55.1, 70.7, 68.3, 62.8, 68.3, 69.4, 59.2, 60.2, 75.4, 55.4, 61.2, 50.8, 67.1, 56.0, 80.7, 80.1, 63.7, 70.9, 54.6, 69.7, 82.5, 70.5, 65.8, 73.2, 66.9, 83.5, 60.9, 56.3, 70.6, 56.2, 61.6, 51.7, 76.8, 48.4, 56.9, 74.5, 62.4, 54.8, 60.4, 77.1, 60.7, 70.1, 63.2.

Raw data are difficult to read. A graphical summary such as a histogram is much easier:

[Figure: histogram of SPEED, with bins up to 90 and a "More" bin.]

### Why Do We Need Summaries?

- Reason 1: so that WE, the analysts, understand what is going on.
- Reason 2: so that, after we have understood the tendencies and relationships in the dataset, we can present them to OTHER DECISION MAKERS (our customer, our boss).

### Data Summaries

- Graphical, e.g., histogram, scatter diagram.
- Numerical, e.g., mean, median.

Which graphical summary to use depends on:

- The random variable type: numerical or qualitative.
- Whether we are looking at tendencies in ONE random variable or at relationships between MANY random variables.

### What Do We Try to Read in a Graphical Summary? (One Numerical Variable)

- Central tendency (if one exists; if not, distinguish between different groups): IS THERE A VALUE THE DATA POINTS CENTER AROUND?
- Spread: ARE THE DATA POINTS CLOSE TO OR FAR FROM THE CENTRAL TENDENCY?
- Shape: WHAT IS THE SHAPE OF THE DATASET: BELL CURVE, SKEWED, OTHER?
- Outliers: ARE THERE ANY EXCEPTIONS TO THE MAIN PATTERN? WHY: DATA COLLECTION ERROR OR A REAL EXCEPTION TO THE RULE?

### Graphical Summaries for ONE NUMERICAL Random Variable

Histogram, stemplot, dotplot, boxplot, time series plot (line graph).

### Frequency Histogram

[Figure: three different frequency histograms of the same Gretzky goals data (goals from 0 to 100).]

### Stemplot

Gretzky goals dataset: 51, 55, 92, 71, 87, 73, 52, 62, 40, 54, 40, 41, 31, 16, 38, 11, 23, 25, 23, 9.

Data array (sorted): 9, 11, 16, 23, 23, 25, 31, 38, 40, 40, 41, 51, 52, 54, 55, 62, 71, 73, 87, 92.

[Figure: stemplot of the Gretzky goals.]

Exercise: construct the stemplot for Babe Ruth's home runs in his 15 seasons with the NY Yankees: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22.

### Dotplot

[Figure: dot scale diagram of the values, marking the 1st and 3rd quartiles and 1, 2, and 3 standard deviations.]

### Boxplot

[Figure: boxplot of Gretzky goals.]

### Time Series Plot

[Figure: time series plot.]

### Graphical Summaries for ONE QUALITATIVE Random Variable

What do we try to read?

- Central tendency or dominant categories.
- Relationships between categories.
- Out-of-pattern observations.

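The stemplot procedure (sort the data, split each value into a tens-digit stem and a units-digit leaf, and keep empty stems visible) can be sketched as below. It assumes non-negative integer data, as in the Gretzky and Babe Ruth examples. `stemplot` is a hypothetical helper name. It is applied here to the Gretzky data only, leaving the Babe Ruth exercise to the reader.

```python
from collections import defaultdict

def stemplot(data):
    """Stemplot rows for non-negative integers: stem = tens digit, leaf = units digit."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Gretzky goals in their original (unsorted) order.
goals = [51, 55, 92, 71, 87, 73, 52, 62, 40, 54,
         40, 41, 31, 16, 38, 11, 23, 25, 23, 9]
for row in stemplot(goals):
    print(row)
# 0 | 9
# 1 | 1 6
# 2 | 3 3 5
# 3 | 1 8
# 4 | 0 0 1
# 5 | 1 2 4 5
# 6 | 2
# 7 | 1 3
# 8 | 7
# 9 | 2
```

Empty stems are kept on purpose: a gap in the stems is part of the shape information a stemplot conveys.
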
### Qualitative Variable Dataset: ATM Time Use per Service

Observations 1 to 12 of the 50 in the dataset:

| # | Age | Gender | Seconds |
|---|---|---|---|
| 1 | a: under 30 years | Female | 50.1 |
| 2 | a: under 30 years | Male | 53.0 |
| 3 | b: 30-60 years | Female | 43.2 |
| 4 | a: under 30 years | Female | 34.9 |
| 5 | c: over 60 years | Male | 37.5 |
| 6 | b: 30-60 years | Male | 37.8 |
| 7 | c: over 60 years | Male | 49.4 |
| 8 | a: under 30 years | Male | 50.5 |
| 9 | c: over 60 years | Male | 48.1 |
| 10 | a: under 30 years | Female | 27.6 |
| 11 | c: over 60 years | Female | 55.6 |
| 12 | b: 30-60 years | Female | 50.8 |

### Tabulation of Frequencies

| Age | Female | Male |
|---|---|---|
| a: under 30 years | 10 | 7 |
| b: 30-60 years | 8 | 13 |
| c: over 60 years | 4 | 8 |

### Pie Charts

[Figure: pie charts of age-group frequency, with counts for females and for males.]

### Bar Charts

[Figure: bar charts of age-group frequency, with counts for females and for males.]

### Double-Bar Chart

[Figure: double-bar chart of counts by age category for females and males.]

### More Than One Qualitative Variable: Cross-Tabulation

Time averages in seconds:

| Age | Female | Male |
|---|---|---|
| a: under 30 years | 37.41000 | 39.27143 |
| b: 30-60 years | 42.46250 | 34.53077 |
| c: over 60 years | 49.95000 | 43.80000 |

### More Than One Qualitative Variable: Multiple-Bar Plot

[Figure: multiple-bar plot of average ATM use time by age group (red = females, yellow = males).]

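The frequency table and the cross-tabulation of average times are both group-by operations over (age, gender) cells. The sketch below runs them on the 12 observations listed above only. Its counts and means therefore will not match the tables, which use all 50 observations. The `crosstab` helper name is hypothetical.

```python
from collections import defaultdict

# Observations 1-12 from the slide: (age group, gender, seconds).
atm = [
    ("a: under 30", "Female", 50.1),
    ("a: under 30", "Male", 53.0),
    ("b: 30-60", "Female", 43.2),
    ("a: under 30", "Female", 34.9),
    ("c: over 60", "Male", 37.5),
    ("b: 30-60", "Male", 37.8),
    ("c: over 60", "Male", 49.4),
    ("a: under 30", "Male", 50.5),
    ("c: over 60", "Male", 48.1),
    ("a: under 30", "Female", 27.6),
    ("c: over 60", "Female", 55.6),
    ("b: 30-60", "Female", 50.8),
]

def crosstab(rows):
    """Count and mean of the numeric column for each (age, gender) cell."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for age, gender, seconds in rows:
        counts[(age, gender)] += 1
        totals[(age, gender)] += seconds
    means = {cell: totals[cell] / counts[cell] for cell in counts}
    return dict(counts), means

counts, means = crosstab(atm)
for age in sorted({age for age, _, _ in atm}):
    cells = []
    for gender in ("Female", "Male"):
        n = counts.get((age, gender), 0)
        avg = f"{means[(age, gender)]:.2f}" if n else "-"
        cells.append(f"{gender}: n={n}, mean={avg}")
    print(age, "|", "; ".join(cells))
```

Running the same function on the full 50-row dataset would produce both slide tables at once: the counts give the frequency tabulation, and the means give the cross-tabulation of average times.
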
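## Stat 201: Statistics I

Prof. Yanni Papadakis

### Things People Know About Statistics

"Don't be a statistic."

### Things People Don't Know About Statistics

People don't believe in statistics even when it is clear that they should. See the research on drug abuse and car accidents: "It will never happen to me." Well, it might.

### People Believe in Statistics Even When They Shouldn't

ABC's Nightline show once asked: "Should the UN continue to have its headquarters in the US?" Of the 186,000 callers who responded, 67% said NO. A properly designed study showed that 72% say YES.

A website poll asked: "Should female athletes be paid the same as men for the work they do?" Of 13,147 responders, 44% clicked on YES, 50% clicked on NO, and the remainder clicked on NOT SURE. Is this what you expected? Should we believe this result as fact? More men than women use the web. Could this affect the results?

### Statistical Terminology

- Population: the entire set of objects or people that statistical results apply to.
- Sample: a subset of the population. We usually work with random samples, in which every population member has the same chance of being selected. Random samples are representative of the population.
- Parameter: a quantity of interest that is a characteristic of the population. Examples: the mean income in the City of Philadelphia, or the proportion of Americans who will vote in the next election.
- Statistic: a quantity measured on the sample. Statistics serve as estimates of population parameters. For example, the proportion of subjects in a random sample of 100 Americans who declare an intention to vote may serve as an estimate of the voting intention of the whole population.
- Random variable: a variable whose exact value is not known, although the odds of it taking a value within any predefined region are known. Tomorrow's high temperature is a random variable: we don't know its exact value, but meteorologists do know the odds of it being between 55 and 65°F.

### Types of Variables

- QUALITATIVE
  - Categorical: detergent bought (Tide, Wisk, generic).
  - Ordinal: how much a customer likes a product (not at all, somewhat satisfied, very satisfied).
  - Binary: car purchase (import, domestic).
- QUANTITATIVE
  - Discrete: year an investment will be paid back (2004, 2005, 2006).
  - Continuous: boxer weight (180.1, 199.0, 201.5, 202.3).

### Statistics in Decisions

Exercise 111: Restaurants sometimes provide customer reaction cards so that customers can evaluate their dining experience at the establishment. What kinds of decisions might be made on the basis of this information? What other data measure customer satisfaction? Which one is the most informative?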
