# Notes up until first exam Stat 1000


This 0 page Bundle was uploaded by Joshua Notetaker on Sunday January 3, 2016. The Bundle belongs to Stat 1000 at University of Pittsburgh taught by in Winter 2016.

Date Created: 01/03/16

## First Day (08/26/2014)

What is statistics? We got:

- Where? Probability: anywhere there is a series of random events (example: sporting events). Random means the outcome is not already chosen.
- Anywhere groups can be separated and represented.

Terms:

- Population: a large group with at least one commonality between its members, which has to be defined before the research is conducted. Often you cannot study the entire population, so researchers study a small, randomly selected sample from the population to learn about the whole.
- Sample: a subgroup of a population that is a manageable size to study. Samples should have a nonbiased way of selection. The sample is an easily accessible representation of the population.
- Statistic: a fact or piece of data from the study of a large quantity of variable data. Calculated by applying a function to a set of data. Based on the sample. Generated values would be mean, standard deviation, mode, etc. Data collected can vary, such as number of credits, years of school, or age.
- Parameter: a number that describes the population; it is fixed and unknown. Parameters are usually written with Greek letters (like $\mu$, $\sigma$), while statistics are usually lowercase Roman letters (like $\bar{x}$, $s$).
- Use statistics to estimate the parameter of the population. Going from statistic to parameter, error exists and we need to measure it. This completes the cycle, and it starts over again.
- Variable: the values that change in an experiment. The independent variable is the one set or held consistent by the researcher; the dependent variable changes according to the independent one. Examples: time vs. money, price vs. profit, time studying vs. grade.
- Identifier: a variable that uniquely identifies the subjects in your observation. If people are observed more than once, the date is important.
- Data set: one group of data from an observation, but it could have many parts with different variables.
- Database: many observations and variables that are connected in various ways.
- Quantitative: data that can be statistically manipulated; a real number.
- Categorical: gives numbers to qualities, but they cannot be manipulated statistically in a sensible manner. Quantitative data can also be put into categories (above/below).

## Data collection

We got:

- Survey: a way of gathering data from individuals. Results are quantitative and are generally taken from samples. Can be done through polls, questionnaires, and conversation surveys.
- Census: the study of every unit (everyone) in the population. Should have no sampling error and provides large amounts of data.
- Direct observation: nonintrusive observations or recordings. Hawthorne effect: people perform better when they know they are being observed. Can be done through interviews or semistructured interviews. These forms of data collection can be better than surveys because of self-report bias. Interviews can have language barriers, taboo-topic problems, and other biases.
- Experiment: an orderly test with more than one possible outcome. Manipulate a factor or impose a condition. Allows even groups to be compared.
- Simulation: most often done on a computer. Set controlled variables and find statistical results.

Study types:

- Cross-sectional study: about one point in time.
- Longitudinal study: repeated over time; can measure change.
- Prospective study: you watch something develop, like a disease.
- Retrospective study: you look through past records for data.
- Retrospective and prospective are opposites, and cross-sectional and longitudinal are opposites; the others can mix.

## Sampling

Convenience sampling (this is what we are):

- Examples: the first 30 customers, the first five people to class.
- Convenient accessibility to the researcher. Student participants are convenient.
- Used for pilot studies to see the relationship between variables.
- You could send a message to all of your students, your customers, the people in line next to you, or the people who walk into your store.
- Not repeatable, because of the particular people involved in the sample.

Sampling plan: a series of written-down steps for how to conduct the sampling method. Someone should be able to reproduce the sample from the plan.

Simple random sample:

- Each individual in the population has an equal chance of selection.
- Methods include a random number generator, without replacement.
- Reduces bias, but might not be the most accurate representation of the population.

Stratified random sampling:

- The population is divided into strata, and a sample is taken from each stratum. Strata have their own attributes and are mutually exclusive.
- Expected differences should be separated out; for example, average height should be separated by gender.
- Determine the strata, split the population into strata, and identify n for each sample. Use percentages to make the sample proportional.
- Example: renters vs. homeowners.
- Two big words: representativeness and variability.

Cluster sampling:

- The population is divided into groups (clusters), and a random sample of clusters is taken. Every individual in a chosen cluster is included.
- A cluster is a naturally occurring group. Examples: households, regions, cities, dorms, floors.
- Cons: you don't always have access to every member of a cluster, and there is a lot of error from how naturally occurring clusters form.
- Pro: it is easy to get.

Class exercise: How tall are we? Stat 1000, 11 am Tuesday. Population: those here. Out of 10 numbers from 0178, started at line 116 and "selected following 5".

## Mean and empirical rule

- Arithmetic mean, or population mean, uses $\mu$. We are not going to be able to calculate this most of the time.
- Sample mean is $\bar{x}$.
- Empirical rule: 68-95-99.7, in symmetric, unimodal, bell-shaped distributions.

## Probability

- Chance / likelihood.
- Simple events: $S = \{0, 00, 1, 2, 3, \ldots, 36\}$; mutually exclusive and exhaustive.
- Event: $A = \{\text{Red}, \text{Black}, \text{Green}\}$.
- Sample space: list of all possible outcomes.
- A probability could be a percent or a fraction. Bounded from 0 to 1.
- Classical approach: number of ways to win divided by the number of possible outcomes, $P(A) = n(A)/n(S)$.
- The opposite of $P(A)$ is $P(A')$: $P(A') = 1 - P(A)$.
- Even given black means $P(E \mid B) = P(E \cap B)/P(B)$. Black given even means $P(B \mid E) = P(E \cap B)/P(E)$.
- Random variable: a function, table, or graph that assigns a number to each outcome of an event/experiment. Discrete and continuous. Example: flip a coin, $\{H, T\} \to \{0, 1\}$.
- Probabilities are tabulated as distributions of discrete random variables (DRVs). Normally the DRV will be a "number of" events, which means it takes integer values.

## Discrete probability distribution

Children playing video games. X = number of games per visit.

| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| P(x) | 0.05 | 0.15 | 0.15 | 0.25 | 0.20 | 0.10 | 0.10 |

- Probability that she played more than 4 games that day: $P(X > 4) = p(5) + p(6) + p(7) = 0.4$ (noncumulative). $P(X \ge 5)$ is cumulative; not different in this situation, but it could be.
- Because this is the entire population, we use $\mu = E(X) = \sum x \, P(x)$.
- Expected value: $1(0.05) + 2(0.15) + 3(0.15) + \cdots = 4.1$. This gives us an idea of center, and we also want to figure out the variance.
- Y = amount of money spent at the arcade per visit. Each game at the arcade costs $0.25, so $Y = 0.25X$. Use that equation on P(x) to find P(y).

Rules of expected value and variance:

- $E(X)$: 1. $E(c) = c$; 2. $E(c + X) = c + E(X)$; 3. $E(cX) = c\,E(X)$.
- $V(X)$: 1. $V(c) = 0$; 2. $V(c + X) = V(X)$; 3. $V(cX) = c^2 V(X)$.

Roulette:

- The probability distribution of X: P(a single number being picked) is 1/38, because there are 36 numbers plus 0 and 00. $1/38 \approx 0.0263$, so the expected value is $36 \times 0.0263 \approx 0.947$.
- The probability distribution of Y: P(one color) is $18/38 \approx 0.4737$. The expected value of Y is $0.4737 \times 2 \approx 0.947$ also.
- They are both the same because the payout increases as the odds become more unfavorable. The reason the expected winnings are less than a dollar is the inclusion of 0 and double 0.
- Not sure how to calculate standard deviation without observed values. Also, all the probabilities are the same in this situation, which is also confusing me.
- The variation is going to be higher for the single-number bet because the payout is larger per win, so there is more opportunity to get lucky and have winnings more than 3 standard deviations from the mean.

## Binomial experiments

- A fixed number of trials, n.
- Each trial has two outcomes, "Success" and "Failure" (finding something or not, T or H).
- $P(\text{success}) = 1 - P(\text{failure})$ for all trials.
- Trials are independent: each trial does not change the probability of outcomes before or after it.

Flipping a coin ten times: fixed number of trials, only two outcomes. Success = heads, failure = tails. $P(H) = 0.5 = 1 - P(T)$. Assume independence. X = number of heads in 10 tosses.

$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

- $P(X = 2)$.
- $P(X > 8) = P(X = 9) + P(X = 10)$.
- 20 ground balls, $P(\text{out}) = 0.75$: $P(X = 20) = \frac{20!}{20!\,(20-20)!}(0.75)^{20}(0.25)^{0} = 0.75^{20} \approx 0.00317$.

## Poisson distribution

Poisson experiment: number of events (successes) in a region of space or period of time.

1. The number of successes in an interval is independent of the number in any other interval.
2. For all equal-sized intervals, P(success) is the same.
3. P(success) is proportional to the size of the interval.
4. As the interval becomes smaller and smaller, P(success) approaches 0.

Parameter: $\mu$ = mean number of successes in the interval. Use $1 - P(X < x)$ rather than counting up to infinity for a probability.

$$P(X = x) = \frac{e^{-\mu} \mu^{x}}{x!}$$

- The number of typographical errors in a textbook is Poisson distributed with an average of 1.5 errors per 100 pages, so $\mu = 1.5$ errors. $P(0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = e^{-1.5} \approx 0.2231$.
- Probability that I find 0 errors in a 400-page textbook: if $\mu = 1.5$ for 100 pages, then $\mu = 6$ for 400 pages. $P(X = 0)$ for 400 pages is $e^{-6} \approx 0.00248$.

For discussion board. Also use PDFs in future.

## Hypothesis testing

1. State hypothesis.
2. Choose test, set up test.
3. Decide support from output.
4. Conclude.

Blu-ray movies recently. Interval data: units sold, 50 million units sold.

1. $H_0$: $\mu = 50$ million; $H_a$: $\mu < 50$ million.
2. Hypothesis about $\mu$ from data ($\bar{x}$, $s$). This means the t distribution. The t distribution is very close to the normal, except that it depends on sample size, which involves degrees of freedom. Degrees of freedom = $n - 1$. Standard deviation causes a lot of outliers when a box plot is made.
3. Compare the p-value to the alpha value. If your p is less than $\alpha$, reject $H_0$.
4. Reject $H_0$ when $t < t_{crit}$, the critical t value for $\alpha = 0.05$.

One-sample proportion:

1. $H_0$: $p = 0.2$; $H_a$: $p > 0.2$.
2. One-sample proportion test, z statistic. Normal approximation test.

How much does total consumer spending vary?

1. $H_0$: $\sigma^2 = 20$ million; $H_a$: $\sigma^2 \ne 20$ million.
2. Interval data, testing for variance: chi-squared test.

Comparing two groups:

1. t test for the difference between two means, $\mu_1 - \mu_2$. $H_0$: $\mu_1 = \mu_2$. One type is for when the variances are equal.
2. Categorical: z test for $p_1 - p_2$. $H_0$: $p_1 = p_2$.

Of the top 50 Blu-ray discs rated PG-13, is there a higher proportion of movies that are adventure?

- $H_0$: $p_a = p_{na}$; $H_a$: $p_a > p_{na}$.
- z test for 2 proportions. z critical is 1.645 for an alpha of 0.05.
- Pooled proportion estimate (used when the hypothesized difference is 0).
- Do not reject $H_0$, because the z statistic is less than z critical. The other way to say this is that the p-value is not less than the alpha value.

Two-sample t test for $\mu_1 - \mu_2$: is average consumer spending different for Blu-ray DVDs that are rated PG-13 vs. not PG-13?

- $H_a$: $\mu_1 \ne \mu_2$.
- t test assuming equal variances (pooled variance).
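
The arcade example in these notes (X = number of games per visit, Y = 0.25X dollars spent) can be checked numerically. The following is a minimal Python sketch, assuming the probability table recorded in the notes; the helper names `expected_value` and `variance` are illustrative and not from the course.

```python
# Distribution of X = number of games per visit, as tabulated in the notes.
games = [1, 2, 3, 4, 5, 6, 7]
probs = [0.05, 0.15, 0.15, 0.25, 0.20, 0.10, 0.10]


def expected_value(values, probabilities):
    """Population mean: sum of x * P(x)."""
    return sum(x * p for x, p in zip(values, probabilities))


def variance(values, probabilities):
    """Population variance: sum of (x - mu)^2 * P(x)."""
    mu = expected_value(values, probabilities)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probabilities))


mean_x = expected_value(games, probs)   # about 4.1 games
var_x = variance(games, probs)          # about 2.69
p_more_than_4 = sum(p for x, p in zip(games, probs) if x > 4)  # about 0.4

# Y = 0.25 * X: each game costs $0.25, same probabilities as X.
cost_per_game = 0.25
spend = [cost_per_game * x for x in games]
mean_y = expected_value(spend, probs)   # equals 0.25 * E(X)
var_y = variance(spend, probs)          # equals 0.25**2 * V(X)

print(mean_x, var_x, p_more_than_4, mean_y, var_y)
```

Computing `mean_y` and `var_y` directly from the transformed values and comparing them with `0.25 * mean_x` and `0.25**2 * var_x` is one way to see rules 3 for E(cX) and V(cX) in action.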

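On the roulette question in these notes (how to get a standard deviation without observed values): one possible approach is to compute it from the probability distribution itself, the same way as the mean. The sketch below assumes a $1 bet that returns $36 on a single number and $2 on a color, which is what the notes' arithmetic (36 × 1/38 and 2 × 18/38) implies; the function name `mean_and_sd` is hypothetical.

```python
from math import sqrt


def mean_and_sd(outcomes):
    """outcomes: list of (amount_returned, probability) pairs for one bet."""
    mean = sum(amount * p for amount, p in outcomes)
    var = sum((amount - mean) ** 2 * p for amount, p in outcomes)
    return mean, sqrt(var)


# Assumed returns per $1 bet (not stated outright in the notes).
single_number = [(36, 1 / 38), (0, 37 / 38)]
one_color = [(2, 18 / 38), (0, 20 / 38)]

mean_number, sd_number = mean_and_sd(single_number)
mean_color, sd_color = mean_and_sd(one_color)

print(f"single number: mean={mean_number:.3f}, sd={sd_number:.3f}")
print(f"one color:     mean={mean_color:.3f}, sd={sd_color:.3f}")
```

Under these assumptions both bets have the same mean (36/38, about 0.947), while the single-number bet has a much larger standard deviation, which matches the notes' intuition that the variation is higher when the payout per win is larger.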

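The binomial and Poisson calculations in these notes (ground balls, coin tosses, textbook typos) can be reproduced with Python's standard library. This is a minimal sketch using the textbook formulas; the function names `binomial_pmf` and `poisson_pmf` are illustrative, and the parameter values are the ones recorded in the notes.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """P(X = x) when the mean number of successes per interval is mu."""
    return exp(-mu) * mu ** x / factorial(x)


# 20 ground balls, P(out) = 0.75: probability that all 20 are outs.
p_all_outs = binomial_pmf(20, 20, 0.75)

# 10 coin tosses: P(X > 8) = P(X = 9) + P(X = 10).
p_more_than_8_heads = binomial_pmf(9, 10, 0.5) + binomial_pmf(10, 10, 0.5)

# Typos: mu = 1.5 per 100 pages, so mu = 6 for 400 pages.
p_zero_errors_100 = poisson_pmf(0, 1.5)
p_zero_errors_400 = poisson_pmf(0, 6.0)

# Complement rule instead of summing to infinity: P(X >= 1) = 1 - P(X = 0).
p_at_least_one_400 = 1 - p_zero_errors_400

print(p_all_outs, p_more_than_8_heads)
print(p_zero_errors_100, p_zero_errors_400, p_at_least_one_400)
```

The last calculation shows why the notes suggest using the complement: the upper tail of a Poisson variable has infinitely many terms, but its complement here is a single term.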