# Week 1 Notes! STAT 121

BYU

These 8 pages of class notes were uploaded by Amanda Berg on Saturday, September 5, 2015. The notes belong to STAT 121 at Brigham Young University, taught by Dr. Christopher Reese in Fall 2015.

Date Created: 09/05/15

## Lecture 1 Notes

1. What is statistics?
    a. Science of extracting meaning from data
    b. Art of persuading the universe to divulge information about itself
    c. Methodology for using data to answer questions in the presence of variation
2. Dogma (problems with statistics)
    a. There is always variation, which leads to uncertainty
    b. In order to be successful in converting data into useful information, we must be able to deal with variety/uncertainty
3. THIS IS NOT A MATH CLASS
    a. In order to succeed in STAT 121, we must learn metacognition, or literally "thinking about thinking"
        i. 5 skills to be successful in STAT 121
            1. Assess the task of learning statistics (example: what will mastering this entail?)
            2. Evaluate strengths and weaknesses (example: I'm good at reading graphs but not too good at learning and processing vocabulary words)
            3. Plan an approach to your learning (example: how much will I study per day, and how will I study each day?)
            4. Monitor your performance (example: I got a 70 on a credit quiz, a 90 on another, and a 100 on another)
            5. Reflect and adjust plan (example: what caused me to get such a low score on the first quiz, and how will I study to master those skills and thus pass the final?)
                a. If your plan isn't working, don't wait until the final to adjust it. Always strive to be better.

### Vocab from Lecture 1

NOTE: Dr. Reese said that if you are able to put the vocab in your own words, your grade on the exams will be approximately 10 points higher than if you cannot.

- Statistics: converting data into useful information and using that data to answer questions in the presence of variation. Also the art of collecting that information.
- Dogma of statistics: the negative aspect that causes confusion about statistics. Examples include uncertainty and variation. In order to overcome these obstacles, our thinking patterns must change, and we must develop an understanding of how to convert data despite these difficulties.
- Variation: how many different answers there are in a data set.
- Uncertainty: because you aren't surveying every single part of the population (look at Lecture 2 for that definition), there will not be 100% accuracy in all of your surveys. This causes uncertainty. We must account for that.
- Metacognition: thinking about thinking. Changing our mindset to be able to think about how others think. This is necessary to succeed in statistics.

## Lecture 2

1. Process of statistics
    a. Collect data → Summarize data → Interpret data
2. The Big Picture
    a. If you ever get confused in class, just remember that we are talking about one of the following 4 steps:
        i. Producing data: taking a sample out of a population and surveying that sample to get the data that we will then interpret
            1. Sample: part of the population we will be surveying. Try to make it representative of the entire population so the data will be as accurate as possible.
            2. Population: entire group of individuals that is the target of interest
                a. Any time the word "all" is listed on a test, it is accompanying the population
                b. A census is a polling of everyone in the population
                    i. The US census isn't actually a census
        ii. Exploratory Data Analysis
            1. Graphically display the data
            2. Summarize the data collected
        iii. Probability
            1. Involving uncertainty in the data to produce a margin of error
        iv. Inference
            1. Taking something about the small group (sample) and having it say something about the big group (population)

### Some Vocabulary so far

- Data: pieces of information that say something about the individuals who were surveyed, divided into variables
- Variable: a characteristic of an individual
    - 2 kinds:
        - Quantitative: variable whose values are meaningful numbers
            - Examples: cost, height, yield
        - Categorical: variable whose values are nonquantitative
            - Can also be called qualitative
            - Examples: gender, opinion
- Measurement: value of a variable for an individual
    - Doesn't have to be a number
    - Examples:
        - Textbook cost for Nathan is 150 dollars.
            - Cost in dollars: variable (quantitative)
            - Nathan: individual
            - 150: measurement
        - Miles per gallon for a 2011 Toyota Camry is 24.
            - Miles per gallon: variable (quantitative)
            - 2011 Toyota Camry: individual
            - 24: measurement
        - Ashley was asked a yes or no question: "Will you vote for Bernie Sanders in the Democratic Presidential Primary?" She answered yes.
            - Vote for Bernie Sanders: variable (categorical)
            - Ashley: individual
            - Yes vote: measurement
- Individual: particular person or object
- Dataset: all of the data collected, displayed in a table

### Exploratory Data Analysis

1. Organize and summarize data
    a. Discover features: patterns and striking deviations from patterns (outliers)
        i. Interpret patterns in context
        ii. Single-variable patterns (distribution) and two-variable patterns (relationship)
            1. Distribution: all of the values of the variable and how often they occur
2. Visual displays and numerical summaries
    a. Graphs

## Lecture 3 Notes: How to Visually Display Data

1. Why do we display data visually?
    a. The goal is to allow someone to look at a graph and understand the data without extra words, labels, etc. The graph should speak for itself.
        i. Minimal mental processing for the viewer
2. How do we display data visually?
    a. Represent numerical quantities with visual elements (length, area, position, darkness)
        i. Visual element must be consistent and proportional to the quantity
3. Visual display of categorical variables
    a. Bar chart
        i. Represents categories by frequency (the number of times each value of the variable occurs)
        ii. Arbitrary positioning of categories
        iii. Preferred over pie chart

        [Figure: example bar chart]

        iv. Can be displayed as a pictogram
            1. Picture-enhanced bar chart
                a. Can be misleading if the intended visual element is height and the perceived visual element is area

        [Figure: example pictogram]

    b. Pie chart
        i. Represents categories by percentage of occurrence
        ii. Area of each section is proportional to the number/percent in the category

        [Figure: example pie chart]

4. Bar charts are nearly always better than pie charts
    a. Comparing bar heights is easier than comparing pie sections
        i. 1, 2, 3 is easier to measure with the eye than π/4 radians
    b. Bar charts are easier to label than pie charts
    c. Pie charts require many colors, textures, etc.
5. Visual display for quantitative variables
    a. Histogram
        i. More common; used for big data sets
        ii. Steps to making a histogram
            1. Take the values of the variable and figure out the range
                a. For example, if the lowest number in the data you have is 50 and the highest is 200, then the range is 50-200
            2. Divide the range into classes (bins) of equal width
                a. If the range is 50-200, you may have classes of width 25, such as 50-75, 75-100, ..., 175-200
                b. Be careful not to have too few or too many classes
            3. Count the number of individuals in each class
                a. If the class contains everything from 50-75 and the data include 55, 65, and 70, then there are 3 individuals in that class
            4. Construct a bar over each class with height proportional to the number/percent in the class
        iii. How many classes should I use?
            1. More data allows more classes
            2. Usually 10-15, but it is trial and error. We want to show the shape, center, and spread well enough to interpret the data correctly.
        iv. Difference between bar chart and histogram
            1. Bar chart
                a. Horizontal axis represents a categorical variable
                b. Categories arbitrarily placed on axis
                c. Bars separated by spaces
            2. Histogram
                a. Horizontal axis represents a quantitative variable
                b. Order of classes matters because the axis is a number line
                c. Bars are not separated by spaces
    b. Stem plot
        i. Used for smaller data sets
        ii. How to create a stem plot
            1. Separate each measurement into stem and leaf
                a. Stem: all but the final digit
                b. Leaf: the final digit
            2. Write stems in a vertical column
            3. Write each leaf to the right of its stem
            4. It looks like a sideways histogram with every value represented
        iii. Still confused? Here's a video to help you out: https://www.youtube.com/watch?v=7mOQm2pbd
        iv. You can split stems to double the number of stems
            1. Don't get confused. We only do that so that the graph doesn't extend really far.
            2. Round values
6. Interpreting visual displays
    a. If asked to interpret a set of QUANTITATIVE data, always use shape, center, and spread
        i. Shape

        [Figures: Symmetric Uniform Distribution; Symmetric Single-peaked (Unimodal) Distribution; Symmetric Double-peaked (Bimodal) Distribution; Skewed-Right Distribution; Skewed-Left Distribution]

            a. Unimodal: 1 mode or bump
            b. Bimodal: 2 modes or bumps (multimodal: more than 2 modes)
            c. Skewed right/left
                1. Skewed in whatever direction the tail (short part) is
        ii. Center
            1. Look for the point with roughly half of the data to the left and half to the right

        [Figure: Symmetric Single-peaked (Unimodal) Distribution; center: 10]

        iii. Spread
            1. Look for minimum and maximum
            2. Spread of this graph is approximately 40-110

        [Figure: histogram used for the spread example]

        iv. Outlier: individuals that seem to not belong in the data
            1. Ask yourself: is the data point miscoded?
            2. Were conditions for the outlier unusual?
            3. Should the data point be excluded?

        [Figure: histogram with an outlier marked]

### Vocabulary for Lesson 3 (defined in notes)

- Bar graph
- Categorical variable
- Center
- Data
- Distribution
- Histogram
- Individual
- Measurement
- Outlier
- Pie chart
- Quantitative variable
- Shape
- Spread
- Stem plot
- Table (see what I did there?)
- Variable
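
The histogram steps from Lecture 3 (find the range, divide it into equal-width classes, count the individuals in each class, draw a bar per class) can be sketched in Python. This is only an illustration: the function name `histogram_counts` and the sample data are made up for this example. The notes do not say which class a value sitting exactly on a boundary (such as 75) belongs to, so the sketch assumes it goes into the class that starts at that value, with the largest value kept in the last class.

```python
import math


def histogram_counts(values, low, high, width):
    """Count how many values fall in each equal-width class between low and high.

    Assumption (not from the notes): a value on a class boundary is counted in
    the class that starts there; the maximum value stays in the last class.
    """
    if width <= 0 or high <= low:
        raise ValueError("need width > 0 and high > low")
    n_classes = math.ceil((high - low) / width)
    counts = [0] * n_classes
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{v} is outside the range {low}-{high}")
        # Index of the class containing v, clamped so v == high lands in the last class.
        idx = min(int((v - low) // width), n_classes - 1)
        counts[idx] += 1
    # Pair each class (lower edge, upper edge) with its count.
    return [((low + i * width, low + (i + 1) * width), c) for i, c in enumerate(counts)]


# Hypothetical data with lowest value 55 and highest 200, classes of width 25.
data = [55, 65, 70, 80, 120, 130, 135, 160, 200]
for (lo, hi), count in histogram_counts(data, 50, 200, 25):
    # One star per individual stands in for the height of the bar.
    print(f"{lo}-{hi}: {'*' * count}")
```

With the 55, 65, 70 example from the notes, the 50-75 class gets a count of 3, matching step 3 of the outline.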

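
The stem plot recipe from Lecture 3 (stem is all but the final digit, leaf is the final digit, stems in a column with leaves to the right) can also be sketched in Python. The function name `stem_plot` and the sample values are hypothetical, and the sketch assumes non-negative whole numbers. It also lists stems that have no leaves so that gaps stay visible, which is a choice made here and not something the notes specify.

```python
def stem_plot(values):
    """Return the rows of a stem plot for non-negative whole numbers.

    Stem = all but the final digit, leaf = the final digit.
    Assumption (not from the notes): stems with no leaves are still listed.
    """
    if not values:
        return []
    if any(not isinstance(v, int) or v < 0 for v in values):
        raise ValueError("this sketch only handles non-negative integers")
    leaves = {}
    for v in sorted(values):
        # Integer division drops the final digit; the remainder is the final digit.
        leaves.setdefault(v // 10, []).append(v % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


# Hypothetical small data set.
for row in stem_plot([12, 15, 21, 27, 27, 34]):
    print(row)
```

Turned on its side, the output resembles a histogram with classes of width 10 in which every individual value is still readable.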

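
The notes describe center and spread visually: the center has roughly half of the data on each side, and the spread runs from the minimum to the maximum. One numerical way to mirror that, as an assumption of this sketch and not something stated in the notes, is to use the median for the center. The function name `center_and_spread` and the sample data are made up for illustration.

```python
import statistics


def center_and_spread(values):
    """Summarize a quantitative variable by its center and spread.

    Assumption (not from the notes): the median stands in for the center,
    since about half of the data lies on each side of it.
    """
    if not values:
        raise ValueError("need at least one value")
    return {
        "center": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical measurements.
summary = center_and_spread([40, 55, 70, 85, 110])
print(f"center: {summary['center']}, spread: {summary['min']}-{summary['max']}")
```

A shape description (symmetric, skewed, unimodal, bimodal) still has to come from looking at the graph; this summary only covers center and spread.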