
# PRIN OF STATISTICS I STAT 211

Texas A&M


These 29-page class notes were uploaded by Darien Kutch on Wednesday, October 21, 2015. The notes belong to STAT 211 at Texas A&M University, taught by Staff in Fall. Since their upload, they have received 31 views. For similar materials, see /class/225762/stat-211-texas-a-m-university in Statistics at Texas A&M University.


Date Created: 10/21/15

Statistics 211 (Student). Chapter 1: Descriptive Statistics. Copyright 1998-2004 by Henrik Schmiediche.

## 1 An Overview of Probability and Statistics

### 1.1 What is Statistics?

In short: analysis of data (all stages).

Common perceptions of statistics: ____

The above are descriptions of the world using numbers. This is a part of statistics (and a very visible one), but statistics deals with more than describing phenomena.

Examples of statistics:

Polio vaccine. In the 1950s polio was a serious disease that affected countless people, mostly children. In 1954:

- 401,974 children were vaccinated:
  - 201,229 with a trial vaccine
  - 200,745 with a placebo

There were a total of ____ polio cases for placebo versus ____ for vaccine.

Question:

Unemployment. We desire to know the unemployment rate.

Problem: How do you find the answer? How accurate are the results?

Stress. Traffic lights are installed to aid in merging onto Interstate I-75 in Tampa, FL.

- The stress level of drivers is measured before the lights.
- After the lights it was ____

Question:

### 1.2 Branches of Statistics

Descriptive (deductive) statistics: statistical methods that summarize and describe the prominent features of data.

Inferential (inductive) statistics: statistical methods that generalize from a sample to a population.

Population: the entire collection of individuals, objects, or measurements about which information is desired.

Sample: a part or subset of the population.

Examples:

- Polio
- Unemployment
- Stress

A population or sample is not static but depends upon the definition of the problem.

Most of the samples in this course will be a ____ sample that is randomly chosen from the population.

Historically, ____ statistics were far more important than ____ statistics. Nowadays the reverse is true.

### 1.3 Data Definitions

Categorical vs. numerical:

Categorical: observations that are only classified into groups.

Examples:

Numerical: observations that have a numerical quality about them.

Examples:

Classify the following:

- State of birth
- Weight at birth
- Date of birth
- Zip code

Discrete vs. continuous:

Discrete: a variable is discrete if it can assume only a countable number of possible values.

Examples:

Continuous: a variable is continuous if it can assume an uncountable number of values.

Examples:

There will usually be practical limitations on the accuracy that any continuous variable has.

Data sets:

- Univariate: the data set consists of ____ variable.
- Bivariate: ____ variables.
- Multivariate: more than ____ variables.

Example of a multivariate data set:

## 2 Pictorial and Tabular Methods in Descriptive Statistics

Consider the following data set (Chp 1, #10). The concentration of suspended solids in river water is an important environmental characteristic. The paper "Water Quality in Agricultural Watershed: Impact of Riparian Vegetation During Base Flow" (Water Resources Bull., 1981, pp. 233-239) reported on concentrations (in parts per million, or ppm) for several different rivers. Suppose the following 50 observations had been obtained for a particular river:

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 55.8 | 60.9 | 37.0 | 91.3 | 65.8 | 42.3 | 33.8 | 60.6 | 76.0 | 69.0 |
| 45.9 | 39.1 | 35.5 | 56.0 | 44.6 | 71.7 | 61.2 | 61.5 | 47.2 | 74.5 |
| 83.2 | 40.0 | 31.7 | 36.7 | 62.3 | 47.3 | 94.6 | 56.3 | 30.0 | 68.2 |
| 75.3 | 71.4 | 65.2 | 52.6 | 58.2 | 48.0 | 61.8 | 78.8 | 39.8 | 65.0 |
| 60.7 | 77.1 | 59.1 | 49.5 | 69.3 | 69.8 | 64.9 | 27.1 | 87.1 | 66.3 |

Question: What does this data tell us about the concentration of suspended solids?

First few steps in analyzing a data set:

1. Organize and summarize the data.
2. Find the center of the data.
3. Examine the spread of the data.

### 2.1 Stem-and-Leaf Display

A compact and descriptive method of organizing data without losing any information in the data.

- Leading digits are stems.
- Trailing digits are leaves.
- Indicate units somewhere on the display.
- Option: sort the leaves.
- Comparative stem-and-leaf.
- Repeat stems if need be.

Advantages:

- No loss of information.
- Easy to do for small data sets.

Disadvantages:

- Time consuming for large data sets (by hand).
- Cannot be used for categorical data.
- Very space consuming for large data sets.

A stem-and-leaf display of the solids data set: ____

Stem-and-leaf display of the solids data set with sorted leaves (units: ppm; stem = tens, leaf = ones):

| Stem | Leaves |
|---|---|
| 2 | 7 |
| 3 | 0245779 |
| 4 | 002567789 |
| 5 | 366689 |
| 6 | 111112255566899 |
| 7 | 01245679 |
| 8 | 37 |
| 9 | 15 |

Stem-and-leaf display with multiple leaf values on a stem (units: ppm):

| Stem | Leaves |
|---|---|
| 2 | 7 |
| 3 | 024 |
| 3 | 5779 |
| 4 | 002 |
| 4 | 567789 |
| 5 | 3 |
| 5 | 66689 |
| 6 | 1111122 |
| 6 | 55566899 |
| 7 | 0124 |
| 7 | 5679 |
| 8 | 3 |
| 8 | 7 |
| 9 | 1 |
| 9 | 5 |

Comparative stem-and-leaf display of the solids data set and one taken two years earlier (units: ppm):

| Two years earlier | Stem | Current |
|---:|:---:|:---|
| 8 | 1 | |
| 9851 | 2 | 7 |
| 9887640 | 3 | 0245779 |
| 9997765322111 | 4 | 002567789 |
| 877554200 | 5 | 366689 |
| 9887653221 | 6 | 111112255566899 |
| 72210 | 7 | 01245679 |
| 95 | 8 | 37 |
| | 9 | 15 |

Sometimes we redefine the leaves for low-numbered or narrow data sets. For the total number of M&M's per bag (Spring 1998 data, Section 2.4.1), each O marks one bag:

| Total | Bags |
|---|---|
| 60 | O |
| 59 | OO |
| 58 | OOOOOOOOOOO |
| 57 | OOOOOOOOOO |
| 56 | OOOOOOOOOO |
| 55 | OOOOOOOOOOOOO |
| 54 | OOOOOOOOOOOOO |
| 53 | OOOO |
| 52 | OOO |
| 51 | O |

### 2.2 Frequency Distributions for Quantitative Data

A very popular way to summarize data is with a frequency distribution. A frequency distribution is a compact summary of a data set using a table with 3 or 4 columns:

- Class interval (or category): disjoint intervals containing each observation in the data set.
- Frequency ($f$): number of observations in a class interval.
- Relative frequency ($f/n$): proportion of observations in the interval.
- Cumulative frequency ($\sum f/n$): sum of the relative frequencies.

Number of classes: 5 to 20. Use ____ for a rough idea.

A frequency distribution for the solids data set (50 observations, listed at the start of Section 2):

Approximate number of classes: ____

| Class interval | Tally | Frequency | Relative f | Cumulative f |
|---|---|---|---|---|
| | | | | |

### 2.3 Histogram

A histogram is a pictorial representation of a frequency distribution.

1. Draw an x-axis and mark the class intervals.
2. Draw a rectangle whose area is proportional to the frequency of that interval.

[Figure: histogram of the solids data; x-axis: solids, 20 to 100]

A true histogram (on a density scale) will have an area that is equal to 1.0. In that case we make the

$$\text{Rectangle Height} = \frac{\text{Relative Frequency}}{\text{Base Length}}$$

In the case where all the intervals are of equal length, all we need to do is add the appropriately labeled y-axis.

[Figure: density-scale histogram of the solids data; x-axis: solids, 20 to 100; y-axis: 0.005 to 0.030]

Histograms often exhibit particular shapes:

- unimodal
- bimodal
- multimodal
- symmetric
- positively skewed
- negatively skewed

### 2.4 The M&M Data Set

Some important questions:

- How many M&M's are there in a regular plain size M&M bag?
- More importantly, how many red M&M's are there?

HW assignment: Buy an M&M bag (small, plain) and count the number of M&M's and the number of red M&M's. Email them to me at henrik@stat.tamu.edu.

Part of the homework 1 assignment (use the red M&M's for 1-3):

1. Create a stem-and-leaf plot.
2. Create a frequency distribution.
3. Plot a histogram (density scale).
4. Create a comparative stem-and-leaf plot comparing the total number of M&M's (I will post the data on the web) to that of Spring 1998 (Section 2.4.1). Do you think the total number of M&M's per bag has changed?

#### 2.4.1 The M&M Data Set for Spring 1998

The following data was collected by the Spring 1998 Stat 211 class.

Total M&M's (n = 68):

58 58 57 54 54 54 57 51 58 54 52 54 58 55 56 52 57 55 56 55 55 57 58 54 58 55 56 55 55 57 60 55 58 54 55 54 55 56 58 59 54 53 57 55 57 53 56 56 52 57 54 58 53 54 57 54 56 56 59 55 54 53 56 56 58 57 58 55

Red M&M's (n = 66):

18 16 13 6 6 7 15 14 12 12 14 2 15 9 5 8 10 8 11 13 18 11 12 4 15 9 14 16 20 15 12 10 11 14 17 7 6 6 9 13 15 10 11 10 10 14 12 9 8 19 11 15 7 10 7 12 13 5 11 11 13 19 8 5 3 9

## 3 Measures of Location

Another step in gaining understanding of our data is to find the center of our data. What is the center?

### 3.1 Mean (Average)

Average: if we consider each number to have a weight equal to its value, then the average is the value which equally divides the data by weight. Think of a seesaw.

We calculate the average as follows:

Sample average:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Population average:

$$\mu = \frac{1}{N}\sum_{j=1}^{N} y_j$$

- $x_i$: the $i$-th observation in the sample
- $y_j$: the $j$-th value in the population
- $n$: sample size
- $N$: population size

Note: $x_1, \ldots, x_n$ is a sample from the population $y_1, \ldots, y_N$.

Example: Calculate the average number of red M&M's for the Spring 1998 M&M data set (red M&M's, n = 66, listed in Section 2.4.1).

[Figure: dot plot of the red M&M counts; x-axis: 2 to 20]

### 3.2 Median

Median: the middle observation of the sorted data set.

Sample median: $\tilde{x}$. Population median: $\tilde{\mu}$.

We calculate the median as follows:

$$\tilde{x} = \begin{cases} x_{((n+1)/2)} & n \text{ odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & n \text{ even} \end{cases}$$

where $x_{(1)} \le \cdots \le x_{(n)}$ is the sorted sample.

Example: Calculate the median number of red M&M's for the Spring 1998 M&M data set. The sorted red M&M data (n = 66) begin 2 3 4 5 5 5 6 6 6 6.

Discussion:

- Mean and median for different types of smoothed histograms (distributions; see the shapes listed in Section 2.3).
- How do outliers affect the mean and median?

### 3.3 Other Measures of Location

#### 3.3.1 Trimmed Mean

A trimmed mean is a compromise between $\bar{x}$ and $\tilde{x}$ in that outliers will have some effect on the trimmed mean, but not as much as they have on the mean. It is calculated by eliminating a certain percentage of the observations from both ends and calculating the average of the remaining data. For example, a 10% trimmed mean would eliminate 10% of the observations from each end of the data (20% total) and average the remaining 80% of the observations.

For example: if we have a sample of 100 observations and we want to find $\bar{x}_{tr(12)}$ (the 12% trimmed mean), how many observations must we eliminate from each end?

Solution: We have $n = 100$ observations. 12% of this is $100 \times 0.12 = 12$. Therefore we eliminate 12 observations from each end, for a total of 24 observations.

There are a variety of ways to handle the case where we need to chop off a fractional number of data points. In this case we avoid the issue and simply round the number of observations that are removed.

Example: Calculate the 10% trimmed mean of red M&M's for the Spring 1998 M&M data set (sorted red M&M data, n = 66).

#### 3.3.2 Percentiles and Quartiles

The $p$-th percentile is the observation in our data set where $p\%$ of the observations are equal to or less than this observation. The median is the 50th percentile.

To calculate the $p$-th percentile:

1. Let $x_{(1)}, \ldots, x_{(n)}$ refer to our data set in ascending order.
2. Let $i_p = np/100$.
3. Find the first index $i$ such that $i > i_p$.
4. The $p$-th percentile is then

$$\begin{cases} \dfrac{x_{(i-1)} + x_{(i)}}{2} & \text{if } i - 1 = i_p \\ x_{(i)} & \text{otherwise} \end{cases}$$

- $Q_1$: first quartile, 25th percentile (lower fourth)
- $Q_2$: second quartile, 50th percentile (median)
- $Q_3$: third quartile, 75th percentile (upper fourth)
- $IQR = f_s = Q_3 - Q_1$: interquartile range, or fourth spread

Example: Calculate $Q_1$ and $Q_3$ for our Spring 1998 M&M data set (sorted red M&M data, n = 66).

#### 3.3.3 Boxplots

Box plots are useful in summarizing various aspects of the data. Side-by-side box plots provide useful comparisons of two or more sets of data.

1. Form an axis that includes all possible values of the data.
2. Draw a box extending from $Q_1$ to $Q_3$.
3. Draw a vertical bar at the median.
4. Draw whiskers (horizontal lines) out 1.5 IQR from each end of the box.
5. Indicate mild outliers with a "o" (1.5 to 3.0 IQR from each end of the box).
6. Indicate extreme outliers with a different symbol (more than 3.0 IQR from each end of the box).

Example: Calculate the summary statistics ($\bar{x}$, $\tilde{x}$, $Q_1$, $Q_3$) for the water quality data set. Then construct a box plot.

[Figure: blank axis for the box plot; label: Particulate Matter (solids, ppm)]

#### 3.3.4 Categorical Data and Sample Proportions

We cannot calculate the mean and median for categorical data. However, we can calculate a sample proportion:

$$p = \frac{\text{Count}}{n}$$

For example: What proportion (on the average) of M&M's are red in the Spring 1998 M&M data set?

$$\bar{x}_{red} = 11.06, \qquad \bar{x}_{total} = 55.61$$

## 4 Measures of Variability

The mean, median, etc. do not give us a complete overview (summary) of our data. For example, consider the following three data sets:

| Data set | Observations | Range | $s^2$ | $s$ |
|---|---|---|---|---|
| 1 | 20 30 40 50 60 70 | 50 | 350 | 18.71 |
| 2 | 20 43 44 46 47 70 | 50 | 252 | 15.87 |
| 3 | 40 43 44 46 47 50 | 10 | 12 | 3.46 |

The mean and median are both 45 for all three data sets, yet these data sets have very different spreads.

Ways to measure spread:

Range: range = maximum observation - minimum observation.

Average the deviations from the mean: we define the $i$-th deviation to be $x_i - \bar{x}$. Intuitively, we average the deviations:

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})$$

Problem: this does not give us anything useful, because the positive and negative deviations always cancel and the average is 0.

Variance: when we average the squared deviations from the mean (and divide by $n - 1$ instead of $n$), we get a measure of spread we call the variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Calculation formula:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

The population variance is represented as $\sigma^2$. We will learn later why we divide by $n - 1$ instead of $n$.

Standard deviation: the units of the variance are the units of the data squared. To make the units the same as those of the data set, we take the square root of the variance. This is called the standard deviation:

$$s = \sqrt{s^2}$$

- $s$ is translation invariant: $s(x_1 + a, \ldots, x_n + a) = s(x_1, \ldots, x_n)$ for all $a$.
- $s$ is scale equivariant: $s(dx_1, \ldots, dx_n) = |d|\, s(x_1, \ldots, x_n)$ for all $d$.

The population standard deviation is $\sigma$.

Example: Calculate the range, variance, and standard deviation of red M&M's for the Spring 1998 M&M data set (red M&M's, n = 66). Remember: $\bar{x} = 11.06$.
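The frequency distribution and density-scale histogram heights described in the notes can be sketched in Python. This is only an illustration under stated assumptions: the solids observations are read with one decimal place (55.8, 60.9, ...), the eight class intervals of width 10 from 20 to 100 ppm are one possible choice (the notes leave the choice to the student), and the name `frequency_distribution` is hypothetical.

```python
# Solids data in ppm. Reading the values with one decimal place is an assumption.
SOLIDS = [
    55.8, 60.9, 37.0, 91.3, 65.8, 42.3, 33.8, 60.6, 76.0, 69.0,
    45.9, 39.1, 35.5, 56.0, 44.6, 71.7, 61.2, 61.5, 47.2, 74.5,
    83.2, 40.0, 31.7, 36.7, 62.3, 47.3, 94.6, 56.3, 30.0, 68.2,
    75.3, 71.4, 65.2, 52.6, 58.2, 48.0, 61.8, 78.8, 39.8, 65.0,
    60.7, 77.1, 59.1, 49.5, 69.3, 69.8, 64.9, 27.1, 87.1, 66.3,
]


def frequency_distribution(data, start, width, n_classes):
    """Return one row per class interval [lower, upper):
    (lower, upper, frequency, relative f, cumulative f, density height)."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[k] += 1

    n = len(data)
    rows = []
    cumulative = 0.0
    for k, f in enumerate(counts):
        relative = f / n
        cumulative += relative
        lower = start + k * width
        # Density scale: rectangle height = relative frequency / base length.
        rows.append((lower, lower + width, f, relative, cumulative, relative / width))
    return rows


# Assumed class intervals: [20, 30), [30, 40), ..., [90, 100).
rows = frequency_distribution(SOLIDS, start=20, width=10, n_classes=8)
for lower, upper, f, relative, cumulative, height in rows:
    print(f"[{lower:3d}, {upper:3d})  f={f:2d}  rel={relative:.2f}  "
          f"cum={cumulative:.2f}  height={height:.4f}")
```

With a density scale the rectangle areas (height times base length) sum to 1, which is the property the notes ask of a "true" histogram.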
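The measures of location from the notes (mean, median, trimmed mean) can be sketched as follows on the Spring 1998 red M&M counts. The function names are hypothetical, and the trimmed mean follows the notes' convention of simply rounding the number of observations removed from each end; note that Python's `round` rounds exact halves to the nearest even integer, which is one of several possible conventions.

```python
# Red M&M counts per bag, Spring 1998 (n = 66), transcribed from the notes.
RED = [
    18, 16, 13, 6, 6, 7, 15, 14, 12, 12, 14,
    2, 15, 9, 5, 8, 10, 8, 11, 13,
    18, 11, 12, 4, 15, 9, 14, 16, 20, 15, 12,
    10, 11, 14, 17, 7, 6, 6, 9, 13,
    15, 10, 11, 10, 10, 14, 12, 9, 8, 19,
    11, 15, 7, 10, 7, 12, 13, 5, 11, 11,
    13, 19, 8, 5, 3, 9,
]


def mean(xs):
    """Sample average: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle observation of the sorted data (average of the two middle ones if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def trimmed_mean(xs, percent):
    """Drop round(n * percent / 100) observations from each end, then average the rest."""
    s = sorted(xs)
    k = round(len(s) * percent / 100)
    if 2 * k >= len(s):
        raise ValueError("trimming would remove every observation")
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


print(f"mean             = {mean(RED):.2f}")
print(f"median           = {median(RED)}")
print(f"10% trimmed mean = {trimmed_mean(RED, 10):.2f}")
```

For the 12% example in the notes, `trimmed_mean(sample, 12)` on a sample of 100 observations removes 12 observations from each end and averages the remaining 76.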
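The four-step percentile rule and the box plot outlier limits from the notes can be sketched in Python as below. This follows the rule as stated in the notes, which is only one of several percentile conventions in use (statistical software often interpolates differently). The names `percentile` and `classify_outliers` are hypothetical.

```python
import math

# Red M&M counts per bag, Spring 1998 (n = 66), transcribed from the notes.
RED = [
    18, 16, 13, 6, 6, 7, 15, 14, 12, 12, 14,
    2, 15, 9, 5, 8, 10, 8, 11, 13,
    18, 11, 12, 4, 15, 9, 14, 16, 20, 15, 12,
    10, 11, 14, 17, 7, 6, 6, 9, 13,
    15, 10, 11, 10, 10, 14, 12, 9, 8, 19,
    11, 15, 7, 10, 7, 12, 13, 5, 11, 11,
    13, 19, 8, 5, 3, 9,
]


def percentile(xs, p):
    """p-th percentile using the rule in the notes (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    s = sorted(xs)                 # step 1: x_(1) <= ... <= x_(n)
    i_p = len(s) * p / 100         # step 2: i_p = n p / 100
    i = math.floor(i_p) + 1        # step 3: first 1-based index with i > i_p
    if i - 1 == i_p:               # step 4: i_p is a whole number
        return (s[i - 2] + s[i - 1]) / 2
    return s[i - 1]


def classify_outliers(xs):
    """Return (Q1, Q3, IQR, mild outliers, extreme outliers) for a box plot."""
    q1, q3 = percentile(xs, 25), percentile(xs, 75)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in xs:
        distance = max(q1 - x, x - q3)   # distance beyond the nearer end of the box
        if distance > 3.0 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return q1, q3, iqr, mild, extreme


q1, q3, iqr, mild, extreme = classify_outliers(RED)
print(f"Q1={q1}  median={percentile(RED, 50)}  Q3={q3}  IQR={iqr}")
print(f"mild outliers: {mild}  extreme outliers: {extreme}")
```

Measuring each observation's distance from the nearer end of the box keeps the mild (1.5 to 3.0 IQR) and extreme (more than 3.0 IQR) checks in one place, matching steps 5 and 6 of the box plot recipe.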
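The measures of variability can be checked numerically on the three small data sets from the notes. The sketch below computes the range, the variance from its definition, the variance from the calculation (shortcut) formula, and the standard deviation; the function names are hypothetical.

```python
import math

# The three example data sets from the notes; each has mean and median 45.
DATA_SETS = {
    1: [20, 30, 40, 50, 60, 70],
    2: [20, 43, 44, 46, 47, 70],
    3: [40, 43, 44, 46, 47, 50],
}


def data_range(xs):
    """Maximum observation minus minimum observation."""
    return max(xs) - min(xs)


def variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Calculation formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def std_dev(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(variance(xs))


for label, xs in DATA_SETS.items():
    print(f"data set {label}: range={data_range(xs)}  "
          f"s^2={variance(xs):.0f}  s={std_dev(xs):.2f}")

# Translation invariance and scale equivariance of s, checked on data set 2.
xs = DATA_SETS[2]
shifted = [x + 100 for x in xs]
scaled = [-3 * x for x in xs]
print(math.isclose(std_dev(shifted), std_dev(xs)))      # shifting leaves s unchanged
print(math.isclose(std_dev(scaled), 3 * std_dev(xs)))   # scaling by d multiplies s by |d|
```

The two variance functions are algebraically equivalent; the shortcut form is convenient by hand, while the definitional form is usually less sensitive to rounding when the values are large relative to their spread.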
