These 58 pages of class notes were uploaded by Newton Cartwright on Monday, October 5, 2015. The notes belong to ECON5 at City College of San Francisco, taught by Asatar Bair in Fall.
Date Created: 10/05/15
Introduction to Statistics
Lectures on Chapter 6.1, 6.2
Asatar Bair, Ph.D.
Department of Economics, City College of San Francisco

Continuous Probability Distributions

Continuous Random Variables
- Continuous random variables can assume any value within a given interval or set of intervals.
- The number of possible real numbers in any interval is infinite.
- This means that the probability of any particular value becomes impossible to calculate.
- If all values were equally likely, the chance of any value would be \( 1/\infty \), which is not defined.
- Instead of values, we calculate the probability of x falling within some interval, e.g. \( x \le 1 \).

Continuous vs. discrete
- If a line represents an interval and x can take on any value along that line, then x is continuous. If not, x is discrete.

Continuous Random Variables
- We define \( f(x) \) as the probability density function for a continuous random variable.
- Say a flight can take anywhere from 120 to 140 minutes.
- If x = length of the flight in minutes, x is a continuous random variable.
- If all outcomes are equally likely, we have a uniform probability density function.

Uniform Probability Density Function
\[ f(x) = \frac{1}{b - a} \text{ for } a \le x \le b, \qquad f(x) = 0 \text{ elsewhere} \]
For the flight example, \( f(x) = 1/20 \) for all \( 120 \le x \le 140 \), and 0 elsewhere.

[Graph: Uniform Probability Density Function]
- The probability is an area.
- So the probability that x is between 130 and 140 is \( P(130 \le x \le 140) = 10/20 = 0.5 \).
- The area of the whole = 1.

Uniform Probability Density Function
- The expected value: \( E(x) = (a + b)/2 = (120 + 140)/2 = 130 \)
- The variance: \( \mathrm{Var}(x) = (b - a)^2/12 = (140 - 120)^2/12 = 400/12 = 33.33 \)
- Standard deviation is the positive square root of the variance: \( \sigma = 33.33^{0.5} = 5.77 \)

[Portrait: French mathematician]
[Graph: Normal Distribution]

Normal Probability Density Function
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / 2\sigma^2} \]
where \( \mu \) = mean and \( \sigma \) = standard deviation.

[Graph: Normal Distribution, same mean, different standard deviations]
- The mean can be any value.

Standard Normal Distribution
Standard Normal Probability Density Function
\[ f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \]

Standard Normal Distribution
- Calculation of the area under the curve requires integral calculus.
- The area between a and b is
\[ P(a \le z \le b) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz \]
- This is a topic for advanced statistics courses.

Using a z table
- As usual, we're standing on the shoulders of giants.
- Statisticians in the past have done the work for us and put the results in the z table, or the Standard Normal Distribution Table, found in the back of your textbook.

Using a z table
- The way you do it is by calculating a z value.
- Recall that the formula is \( z = (x - \mu)/\sigma \).
- Once you have your z value, use the table.

Using a z table
- Say your z = 1.22.
[z table excerpt: column headings .00, .01, .02, .03]
- The table gives you the area \( P(0 \le z \le 1.22) = 0.3888 \).

We can use this area to find other areas under the curve
- \( P(z \ge 1.22) = 0.5 - 0.3888 = 0.1112 \)
- We know that the area under the whole curve is 1; this means the area on each side of the mean is 0.5.

We can use the table to find any area
- \( P(1.22 \le z \le 1.54) = 0.4382 - 0.3888 = 0.0494 \)
- Subtracting the areas under the curve gives you the area in between.

We can use the table to find any area
- \( P(z \le 1.22) = 0.5 + 0.3888 = 0.8888 \)

Introduction to Statistics
Lectures on Chapter 2

DESCRIPTIVE STATISTICS

Summarizing Qualitative Data

Frequency Distribution
- A frequency distribution is a tabular summary of a set of data showing the frequency (or number) of items in each of several non-overlapping classes.
- The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations. The ratings provided by a sample of 20 guests are shown below.
Below Average, Above Average, Above Average, Above Average, Above Average, Above Average, Below Average, Below Average, Average, Poor, Poor, Above Average, Excellent, Above Average, Average, Above Average, Average, Above Average, Average

Example: Marada Inn
Frequency Distribution
Quality rating     Frequency
Poor               2
Below average      3
Average            5
Above average      9
Excellent          1
Total              20

Relative Frequency and Percent Frequency Distributions
- The relative frequency of a class is the
fraction or proportion of the total number of data items belonging to the class.
- A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Relative Frequency and Percent Frequency Distributions
- The percent frequency of a class is the relative frequency multiplied by 100.
- A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Example: Marada Inn
Relative and Percent Frequency Distribution
Quality rating     Relative frequency     Percent frequency
Poor               0.10                   10
Below average      0.15                   15
Average            0.25                   25
Above average      0.45                   45
Excellent          0.05                   5
Total              1.00                   100

Bar Graph
- A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution.
- On the horizontal axis we specify the labels that are used for each of the classes.
- A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.

Bar Graph
- Using a bar of fixed width drawn above each class label, we extend the height appropriately.
- The bars are separated to emphasize the fact that each class is a separate category.

Example: Marada Inn
[Bar graphs of the quality ratings]
- Tall, thin bar graphs emphasize difference.
- Wide, short bar graphs emphasize similarity.

Pie Chart
- The pie chart is a commonly used graphical device for presenting relative or percentage frequency distributions for qualitative data.
- First draw a circle; then use the relative or percentage frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
- Since there are 360 degrees in a circle, a class with a relative frequency of 0.25 would consume 0.25(360) = 90 degrees of the circle.

Example: Marada Inn
Use of color in presenting pie charts
- To highlight the positive features of this data [Pie chart: Above average 45%, Average 25%, Excellent 5%, ...]
- To highlight the negative features of this data [Pie chart]

Summarizing Quantitative Data
Frequency
Distribution; Relative Frequency and Percent Frequency Distributions; Dot Plot; Histogram; Cumulative Distribution

Example: Hudson Auto Repair
The manager of Hudson would like to get a better [...] been taken, and the costs of parts, rounded to the nearest dollar, are listed below.
[Table: costs of parts for the sample, values not recovered]

Frequency Distribution
Guidelines for Selecting Number of Classes
- Use between 5 and 20 classes.
- Data sets with a larger number of elements usually require a larger number of classes.
- Smaller data sets usually require fewer classes.

Frequency Distribution
- Use classes of equal width.
\[ \text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}} \]

Frequency Distribution
Cost ($)     Frequency
50-59        2
60-69        13
70-79        16
80-89        7
90-99        7
100-109      5
Total        50

Relative and Percent Frequency Distribution
Cost ($)     Relative frequency     Percent frequency
50-59        0.04                   4
60-69        0.26                   26
70-79        0.32                   32
80-89        0.14                   14
90-99        0.14                   14
100-109      0.10                   10
Total        1.00                   100

Dot Plot
- One of the simplest graphical summaries of data is a dot plot.
- A horizontal axis shows the range of data values.
- Then each data value is represented by a dot placed above the axis.

Dot Plot
[Figure: dot plot of Cost ($), horizontal axis from 50 to 110, annotated "Most of the data is in this range"]

Histogram
- Another common graphical presentation of quantitative data is a histogram.
- The variable of interest is placed on the horizontal axis.
- The frequency, relative frequency, or percent frequency is placed on the vertical axis.
- A rectangle is drawn above each class interval, with its height corresponding to the interval's frequency, relative frequency, or percent frequency.
- Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

[Figures: Histogram and Relative Frequency Histogram, horizontal axis Cost ($)]

Cumulative Distribution
- The cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of each class.
- The cumulative relative frequency distribution shows the proportion of items with values less than or equal to the upper limit of each class.
- The cumulative percent
frequency distribution shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distribution
Cost ($)     Cumulative frequency     Cumulative relative frequency
<= 59        2                        0.04
<= 69        15                       0.30
<= 79        31                       0.62
<= 89        38                       0.76
<= 99        45                       0.90
<= 109       50                       1.00

Ogive
- An ogive is a graph of a cumulative distribution.
- The data values are shown on the horizontal axis.
- The vertical axis can be cumulative frequencies, cumulative relative frequency, or cumulative percent frequency.

[Figure: Ogive]

Exploratory Data Analysis
- Exploratory data analysis: techniques to quickly summarize data
- Crosstabulations
- Scatter diagrams

Stem-and-Leaf Display
- This display shows both the rank order and shape of the distribution of the data.
- It's similar to a histogram, but it has the advantage of showing the actual data values.
- The first digits of each data item are arranged to the left of a vertical line.

[Figure: stem-and-leaf display, Hudson Auto Repair]

Crosstabulations and Scatter Diagrams
- Thus far we have focused on methods that are used to summarize the data for one variable at a time.
- Next we explore methods of understanding the relationship between two variables.

Crosstabulation
The number of Finger Lakes homes sold for each style and price for the past two years is shown below.
[Table: values not recovered]

Problem with crosstabulation
- Crosstabulation data are often combined to form an aggregate crosstabulation.
- This presents a possible danger.
- Relationships that appear in the aggregate may be contradicted by the unaggregated data.
- This is called Simpson's Paradox.

Crosstabulation: Simpson's Paradox
[Table: verdicts Upheld, Reversed, and Total for Judge Luckett and Judge Kendall, values not recovered]
- It looks like [...]
- But Luckett actually has a better record in both courts.

Example: Panthers Football Team
Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

[Figure: Panthers Football Team scatter diagram]

[Figure: Tabular and Graphical Procedures, p. 56]

Microsoft Excel
- MS Excel and other statistical spreadsheet programs like it make many tasks in statistics much easier.
- Appendix 2.2 describes how to perform some
operations in Excel.

Histogram
- You need the Analysis ToolPak.
- Go to "Tools", then "Add-ins", then select "Analysis ToolPak" and hit "OK".
- Then go to "Tools" and hit "Data Analysis".
- A list of options will come up; select "Histogram".

Histograms
- Input range: highlight all your data.
- Bin range: more on this in a minute.
- Output range: highlight where you want the frequency distribution to go.

Bin range
- What you do here is define the class widths you want to use.
- Excel can do this automatically, but it has very bad judgement and the results will be [...]
- Look at the data and decide what the upper bounds for each class should be.
- If you want your classes to look like this, you enter just the upper boundary in each cell (59, 69, 79, ...).
- Then go to the bin range field and highlight these cells.

Frequency distribution
- Hit OK and Excel gives you this: [Table]
- I like to rename the Bin fields "40-49", "50-59", etc.
- I also rename the "More" field "Total" and add up the column above, so the whole thing looks like this: [Table]

Histogram
- Now go to "Insert" and select "Chart", or hit the chart button.
- Select the "clustered column"; enter the titles for the x and y axes and the whole chart, then hit "OK".
- To get rid of the gap between the bars, double-click on one of the bars on the finished chart, then go to "Options" and enter 0 under "Gap width".
[Figure: finished histogram]
- Be sure to label the axes and give it a title.
- Another masterpiece of statistical [...]

Introduction to Statistics
Lectures on Chapter 8

Interval Estimation

Interval estimation
- An interval estimate is designed to be more accurate than a point estimate.
- There is a tradeoff between precision and accuracy.
- Precision is the width of an interval, so a point estimate is the most precise possible interval, being a single point.
- Accuracy is the probability of the estimate being correct, defined as \( 1 - \alpha \).

[Figure: target analogy]
- Precision is the size of the target.
- Accuracy is the chance of hitting
it.
[Figure: interval estimate]

Interval estimation
- General form of the interval estimate for the population mean: \( \bar{x} \pm \) margin of error (E)
- General form of the interval estimate for the population proportion: \( \bar{p} \pm \) margin of error (E)
[Figure: sampling distribution with z = -1.96, 0, 1.96 marked]

Interpretation
- We say that 95% of our point estimates will fall within 3.92 of the population mean.
- In other words, if we took 100 different samples, 95 of the sample means would be between 3.92 below and 3.92 above the population mean.

Formula
- Here's the formula for a confidence interval estimate for the population mean if \( \sigma \) is known:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
- \( \alpha \) is usually 5%.

Finding the value of \( z_{\alpha/2} \)
- Say we've defined \( \alpha = 0.05 \), so \( \alpha/2 = 0.025 \) and \( 1 - \alpha = 0.95 \).

Finding the value of \( z_{\alpha/2} \)
- We're going to use the z table backwards.

Some common values of \( z_{\alpha/2} \)
[Table: values not recovered]

- [...] formula more often, for \( \sigma \) is generally not known.

The t distribution
- This is a set of probability distributions, all of which are symmetrical and bell shaped, though not precisely normal.
- The shape of the particular distribution is based on a concept called degrees of freedom, \( n - 1 \).
- Developed by William Sealy Gosset.
- As the sample size gets larger, the t distribution converges to the standard normal distribution.

[Figure: the t distribution vs. the standard normal, showing the standard normal distribution and t distributions with different degrees of freedom]

William Sealy Gosset (1876-1937)
[Caption, partly illegible: Gosset worked at a brewery that did not allow him to publish his findings, which could leak secrets; he published under the pseudonym "Student". His work was adapted and used by R. A. Fisher, who popularized it.]

Using the t table
- Say we've defined \( \alpha = 0.05 \), so \( \alpha/2 = 0.025 \) and \( 1 - \alpha = 0.95 \).
- The value of \( t_{\alpha/2} \) depends on the number of degrees of freedom, \( n - 1 \).

Using the t table
[t table excerpt: rows indexed by degrees of freedom, values not recovered]

How big does the sample size need to be?
- For most cases, a sample of 30 or more is enough.
- If the population distribution is highly skewed or contains outliers, a sample size of 50 may be needed.
- If the population is normally distributed,
then a smaller sample may be used, e.g. n = 15.

Determining the sample size
- Sometimes you need to determine what size sample you need to collect in order to achieve a margin of error E that is within certain bounds.

Interval estimates of the population proportion
- Recall that the formula for the standard deviation of \( \bar{p} \) depends on whether the population is finite or infinite.
Finite population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}} \]
Infinite population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \]

[Figure: sampling distribution, population proportion]

Interval estimation: p
- Since we don't generally know p, we must estimate the standard deviation of \( \bar{p} \) using \( \bar{p} \).
- So the formula for the interval estimate is
\[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]

Interval estimation: p
- \( 0.44 \pm 0.0324 \)
- Interpretation: we are 95% confident that p is between 0.4076 and 0.4724.
- We're pretty certain that it's less than 0.5 (more on this in the next chapter).

[Figure: interval estimate \( 0.4076 \le p \le 0.4724 \) around \( \bar{p} = 0.44 \), with z = -1.96, 0, 1.96 marked. Conditions: \( np \ge 5 \) and \( n(1 - p) \ge 5 \).]

Determining the sample size
- Again, with the sample proportion, sometimes you need to determine what size sample you need to collect in order to achieve a margin of error E that is within certain bounds.
\[ n = \frac{(z_{\alpha/2})^2 \, p^*(1 - p^*)}{E^2} \]

Determining the sample size
- What is \( p^* \)? It is the "planning value" of p.
- Choose a value for \( p^* \) based on the following:
- Use the sample proportion from a previous sample.
- Use a pilot study to get a sample proportion, then use it to take the real sample.
- Use your judgment.
- Use \( p^* = 0.5 \) (most conservative).

Introduction to Statistics
Lectures on Chapter 7

Sampling and Sampling Distributions

Parameters
- Numerical measures of population characteristics are called parameters.
- Examples include: the mean, the variance, the standard deviation, and the proportion of a population that has some defined characteristic.

Parameters
- Example: An electronics firm called EAI wants to understand the characteristics of its 2500 managers: what are their salaries, and how many are trained?
- Population mean \( \mu \) = 51,800
- Population standard deviation \( \sigma \) = 4000
- p = the proportion of managers who completed a training program (the "population proportion") = 1500/2500 = 0.6

What if we don't know the parameters?
- We can take a sample from the population instead of looking at the whole population, then use the characteristics of the sample to infer the characteristics of the population.
- This is inferential statistics.

Sampling
- There are several methods of selecting a sample from a population.
- A common method is simple random sampling.
- The way it's done depends on whether the population is finite or infinite.

Simple Random Sampling: finite population
- A simple random sample of size n from a finite population of size N is one in which each sample of size n has the same probability of being selected.
- An easy way to do this is to use Excel.
- A table of random numbers can also be used (see p. 261).

Excel: generating random numbers
- The formula is =RAND()
- This gives you a random number between 0 and 1.
- By doing further number operations, you can get a random number between any two values.
- Excel will re-generate a new set of random numbers any time you do anything.
- To fix the random numbers so they don't change, copy them, then use the "Paste special" command to paste values only.

Sampling
- Most studies are done by sampling without replacement, meaning that it is not possible for an item to appear more than once in a sample.
- It is also possible to sample with replacement, meaning an item could appear more than once in a sample.
- Both are considered valid ways of drawing a sample.

Simple Random Sampling: infinite population
- In theory, no population is infinite; some populations are large enough, or of unknown size, so we consider them infinite.
- In this case, simple random sampling has two criteria:
- each element selected comes from the population, and
- each element is selected independently.

Point estimation
- We're interested in making inferences based on the sample data.
- The simplest way to do so is to use the sample
statistics as estimates for the population parameters.
Point estimate     Population parameter
\( \bar{x} \)      \( \mu \)
\( s \)            \( \sigma \)
\( \bar{p} \)      \( p \)

Sampling error
- Sampling error is the absolute difference between the population parameter and the sample statistic:
\[ |\bar{x} - \mu|, \qquad |s - \sigma|, \qquad |\bar{p} - p| \]

Point estimation
[Table: population parameter, point estimate, and sampling error for mean salary, standard deviation of salary, and proportion trained; values not recovered]

Sampling Distributions
- All of our sample statistics, e.g. the sample mean, sample standard deviation, and sample proportion, are random variables.
- This is because they come from a random process.
- This means it is possible, indeed likely, to get different values for each sample.
- This means that there is a distribution of different values for each statistic.
- This is called a sampling distribution.

Sampling distribution
Sample     Sample mean     Sample proportion
1          51814           0.63
2          52670           0.70
3          51780           0.67
4          51588           0.53
...
500        51752           0.50

Frequency distribution
[Table: values not recovered]

Relative Frequency Histogram of Sample Mean
[Figure: histogram; sample size: 30]
- As the number of samples approaches infinity, the probability distribution becomes normal.

Sampling Distribution of \( \bar{x} \)
- The sampling distribution of \( \bar{x} \) is the probability distribution of all possible values of the sample mean.
- For an infinite population, there are an infinite number of different sample means, which gives us a continuous probability distribution.

Sampling Distribution of \( \bar{x} \)
- What are the characteristics of the sampling distribution of \( \bar{x} \)?
- As with any distribution, we are interested in its central tendency, dispersion, and shape.

Sampling Distribution of \( \bar{x} \): Central Tendency
- We're interested in a particular property of \( \bar{x} \), which is that it is equal to the population parameter on average, or \( E(\bar{x}) = \mu \).
- This means the estimator is unbiased.

Unbiased estimator: \( E(\bar{x}) = \mu \)
\[ E(\bar{x}) = E\left(\frac{x_1 + x_2 + \dots + x_n}{n}\right) = \frac{E(x_1) + E(x_2) + \dots + E(x_n)}{n} = \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu \]

The variance of the sampling distribution of \( \bar{x} \)
If we observe independent and identically distributed RVs from a distribution with mean \( \mu \) and variance \( \sigma^2 \):
\( \mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{x_1 + x_2 + \dots + x_n}{n}\right) = \frac{1}{n^2}\left[\mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \dots + \mathrm{Var}(x_n)\right] \)
\( = \frac{1}{n^2}\left(\sigma^2 + \sigma^2 + \dots + \sigma^2\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \)

The standard deviation of \( \bar{x} \)
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

Variance of \( \bar{x} \)
- From the preceding formula, we can see that there's an inverse relationship between the sample size and the variance of \( \bar{x} \).
- By a property called the "Weak Law of Large Numbers", we can show that as the sample size increases, the probability of observing a value some distance c away from the mean diminishes, approaching zero as n approaches infinity.

Weak Law of Large Numbers
- We can use Chebyshev's Theorem (from Ch. 3) to demonstrate the Weak Law of Large Numbers.
Let c be any constant greater than zero. For any distribution of x with a mean \( \mu \) and variance \( \sigma^2 \):
\[ P(|\bar{x} - \mu| \ge c) \to 0 \text{ as } n \to \infty \]
By Chebyshev's Theorem,
\[ P(|\bar{x} - \mu| \ge c) \le \frac{\sigma^2}{n c^2} \]
As \( n \to \infty \), \( \sigma^2/(n c^2) \to 0 \), so \( P(|\bar{x} - \mu| \ge c) \to 0 \).

[Figure: graphical depiction of the WLLN]

Central Limit Theorem
"In selecting random samples of size n from a population, the sampling distribution of the sample mean \( \bar{x} \) can be approximated by a normal distribution as the sample size becomes large."

Central Limit Theorem
[Figures: a population distribution and the sampling distribution of \( \bar{x} \) for n = 2, n = 5, and n = 30]

Sampling distribution: sample mean salary
- The standard deviation of \( \bar{x} \): \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 4000/\sqrt{30} = 730.3 \)
[Figure: sampling distribution of \( \bar{x} \) centered at 51,800; the area between 51,300 and 52,300 (z = -0.68 to 0.68) is 0.2517 + 0.2517 = 0.5034]

Sampling distribution of \( \bar{p} \)
- Again, we're interested in the central tendency and dispersion of the distribution, as well as its shape.
- In terms of dispersion, the formula for the standard deviation of \( \bar{p} \) depends on whether the population is finite or infinite.
Finite population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}} \]
Infinite population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \]

Sampling distribution: population proportion
- Conditions: \( np \ge 5 \) and \( n(1 - p) \ge 5 \)
[Figure: sampling distribution of \( \bar{p} \) with 0.55, 0.6, and 0.65 marked; shaded area 0.2123]

Sampling Methods
- Stratified Random Sampling
- Cluster Sampling
- Systematic Sampling
The above techniques are legitimate sampling methods in that they rely on randomness as their base, with some variation to
ensure that the odds are good that a sample represents the population.

Sampling Methods
- Any method of sampling that is fundamentally non-random will produce biased estimators.
- For example, if samples are drawn on the basis of convenience or a person's judgment.
- While potentially interesting, such results are not generally acceptable as evidence from a scientific standpoint.

Introduction to Statistics
Lectures on Chapter 3

Measures of Location
- Mean
- Median
- Mode
- Percentiles
- Quartiles

Mean
- The arithmetic mean is one measure of location.
- It is a measure of central tendency, or an average.
- It is sometimes referred to as "the average" or "the mean", although there are other means and averages.

Central tendency
- Central tendency is an important concept.
- We want to know if any particular values are more commonly observed, i.e. if the data is clustered in a certain range.

Sample Mean
If the data are from a sample, the mean is denoted by
\[ \bar{x} = \frac{\sum x_i}{n} \]
Summation sign:
\[ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 \]
This means add up the values.

Population Mean
If the data are from a population, the mean is denoted by
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Home Prices
The following is a sample of the prices of 20 homes in San Francisco. The data are in ascending order, in thousands of dollars.
420, 455, 463, 472, 512, 514, 554, 575, 580, 600, 625, 630, 670, 670, 810, 1250, 1480, 2700, 3400, 4500
Sample mean = 1094.1 ($1,094,100)

The mean can be misleading
- In this case, the sample mean is not a good indication of the central tendency of the data.
- 15 of the 20 homes are below the mean because of the presence of expensive homes in the sample.

The mean defeated
- Whenever there are significant extreme values, the arithmetic mean becomes a poor measure of the central tendency of the data.
- This includes cases in which there is a bi-polar distribution, or distributions that are skewed to one side,
- or distributions in which there is a high level of variance.
- Some means are impossible to observe.

Examples
- Skewed distribution: as mentioned in the EM lecture, say 9 homeless people and one
millionaire earning exactly $1 mil/yr live on the same block; the mean income is $100,000.
- Extreme values: say you have one hand in liquid nitrogen (-109°), the other over a fire (269°), so on average 80°. How do you feel?

Examples
- Number of families (US): 112 mil
- Combined wealth (2003): $50.3 tril
- Average wealth: $449,107 per family
- Feeling rich?
[Source note illegible]

Distribution of wealth is skewed
[Chart: shares of wealth held by the bottom 50%, the next 40%, and the top groups; values not recovered]

Distribution of wealth is skewed
- Average wealth, bottom 50%: $22,300
- Average wealth, 50-90%: $313,500
- Average wealth, top 1%: $15 mil
[Source note illegible]

Bipolar distributions
- Say there's a land populated exclusively by gnomes (height 1.5-2.5 ft) and giants (height [...]).
- The mean, 6 ft, does not accurately describe the central tendencies of the population data.

Mean with high variance
- Rolling a six-sided die
- Possible outcomes: 1, 2, 3, 4, 5, 6
- Mean: 3.5
- Does this mean you will roll a 3.5?
- The mean is not a central tendency in the sense that you are more likely to observe the mean value.

Average does not mean "normal"
- Most cultures propagate strong pressures to conform to prevailing standards of behavior, appearance, etc.
- Standards change, leading to a constant search for what is normal.
- Most Americans have an above average number of legs.
- Number of Americans with 3 legs: 0
- Number of Americans with 1 or 0 legs: > 0
- This means the mean number of legs < 2.

Median
- The median of a data set is the value in the middle when the data items are arranged in ascending order.
- If there is an odd number of items, the median is the value of the middle item.
- If there is an even number of items, the median is the midpoint of the values for the middle two items.

Median of Home Prices
\[ \frac{600 + 625}{2} = 612.5 \]

Median home prices
- The median is often used in cases where there are extreme values (high, low, or both).
- This is typically the case with housing data.
- In this data set, the median seems a better indication of central tendency than the mean, although this is
a judgment call.

The median and central tendency
- The median is not always a good description of the central tendency of the data.
- It fails under some of the same circumstances as the mean: high variance, bi-polar distribution.
- It can be a good way of dealing with the problem of extreme values.
- The median is not as over-used as the mean; therefore it is less abused.

Mode
- The mode of a data set is the value (or values) that occurs (or occur) with greatest frequency.
- There can be more than one mode.
- For the home prices, mode = 670.

Mode and central tendency
- The mode is rarely a good measure of central tendency.
- It fails when a data set is large and there is either no mode or there are many repeated values, which may make the mode [...]

Percentiles
- The pth percentile of a data set is a value such that at least p percent of the items take on this value or less, and at least (100 - p) percent of the items take on this value or more.

Percentiles
To find the pth percentile of a data set:
- Arrange the data in ascending order.
- Compute index i, the position of the pth percentile: \( i = (p/100)\,n \)
- If i is not an integer, round up: the pth percentile is the value in the position of the next integer.
- If i is an integer, the pth percentile is the midpoint of the values in positions i and i + 1.

Example: Home prices
90th Percentile
- \( i = (p/100)\,n = (90/100)\,20 = 18 \)
- The midpoint of the 18th and 19th data values: (2700 + 3400)/2 = 3050

Example: Home prices
78th Percentile
- \( i = (p/100)\,n = (78/100)\,20 = 15.6 \)
- Round up to 16; even if it were 15.1, we'd round up.
- So the 78th percentile would be the 16th data value.

Quartiles
Quartiles are specific percentiles.
- First Quartile = 25th Percentile
- Second Quartile = 50th Percentile = Median
- Third Quartile = 75th Percentile
Middle values     Percentile     Value
512, 514          25th           513
600, 625          50th           612.5
810, 1250         75th           1030

Use of percentiles
[Figure: "Top Heavy", measures of income dispersion, 2005 household income]

Use of percentiles
Household Income at Selected Percentiles
10th percentile upper limit: $11,288
20th percentile upper limit: $19,178
50th (median): $46,326
80th percentile upper limit: $91,705
90th percentile lower limit: $126,090
95th percentile lower limit: $166,000
Source: Wall Street Journal, 2/7/07

Use of percentiles
Mean Household Income of Quintiles
Lowest quintile: $10,655
Second quintile: $27,357
Third quintile: $46,301
Fourth quintile: $72,825
Highest quintile: $159,583

Use of percentiles
Shares of Household Income of Quintiles
Lowest quintile: 3.4
Second quintile: 8.6
Third quintile: 14.6
Fourth quintile: 23.0
Highest quintile: 50.4
Source: www.census.gov [remainder of URL illegible]

Measuring variability
- The concept of variability is an important one in statistics and probability, for it's one of the foundations of the concept of risk and uncertainty.
- The minimum variability would be a set of numbers that does not change at all, essentially the same number repeated.
- When the data do vary, we want to know how much.
- Because we're not accustomed to thinking about variability, the number can be hard to interpret.

Range
- The simplest measure of variability
- Range = Largest value - Smallest value
- Very sensitive to extreme highs and lows
- Range of home prices = 4500 - 420 = 4080

Interquartile range
- This solves the problem of high and low extreme values by considering the difference between the third and first quartiles.
- IQR = Q3 - Q1
- IQR of home prices = 1030 - 513 = 517

Population Variance
If the data are from a population, the variance is denoted by
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]

Sample Variance
If the data are from a sample, the variance is denoted by
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

Sample variance: alternative formula
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} \]

Standard deviation
- Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
- Sample standard deviation: \( s = \sqrt{s^2} \)

Sample Variance and Sample Standard Deviation
- A common question is: why divide by n - 1? Why not divide by n?
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \; ? \]

Sample Variance and Sample Standard Deviation
- Using n - 1 makes the sample variance consistent with the mathematical definition of an unbiased estimator: \( E(s^2) = \sigma^2 \)
- More on this in chapter 7; I'll present a proof at that time.

Computing variance and
standard deviation
- It's kind of a pain to do it by hand.
- The best way [...]

Using Excel
- Highlight the cell or cells.
- To calculate a sum, enter =SUM( )
- To subtract: -
- To divide by 20: /20
- To square: ^2
- To take the square root: ^0.5

Using Excel
- The easy way to do it is to go to "Tools", then "Data analysis", and choose "Descriptive statistics".
- Select the data range and select "Summary statistics".

Using Excel
- The output for the home price data looks like this:
[Table: Excel descriptive statistics output, values not recovered]

Coefficient of Variation
This statistic looks at the standard deviation in relation to the mean:
\[ \frac{\text{Standard deviation}}{\text{Mean}} \times 100 \]
[Worked value not recovered]

Distribution shape
- Looking at the shape of the frequency histogram gives us information about the central tendency or tendencies of the data.
- There are many kinds of histograms: symmetric, skewed, uniform, and bi-polar are some examples.

[Figures: histograms of a symmetric distribution, a skewed right distribution, a skewed left distribution, a uniform distribution, and a bi-polar distribution]

Skewness statistic
- The greater the absolute value of the skewness, the more skewed it is.
- Negative skewness means skewed left.
- Positive skewness means skewed right.
- Zero means symmetric.

Skewness statistic
Skewness = [formula illegible]
- This formula would be for an extra credit question.

z-Score
- The z-score is a measure of relative location.
- It tells us how far a data value is from the mean, relative to the standard deviation.
\[ z_i = \frac{x_i - \bar{x}}{s} \]

z-Score
- z = 1.5: this data value is 1.5 standard deviations above the mean.
- z = -0.8: this data value is 0.8 standard deviations below the mean.
- The z-score is a very important statistic.
- We'll be using it throughout the course.

Chebyshev's Theorem
The proportion of data values within z standard deviations of the mean is at least equal to
\[ 1 - \frac{1}{z^2} \]

Chebyshev's Theorem
- For z = 2, at least 0.75, or 75%, of the data values are within 2 standard deviations
of the mean, above or below: \( 1 - 1/2^2 = 0.75 \)

Chebyshev's Theorem
- The incredible thing about Chebyshev's theorem is that it holds for all distributions, regardless of the shape.

Chebyshev's Theorem
- A more formal version:
Let c be any constant greater than zero. For any distribution of X:
\[ P(|X - \mu| \ge c\sigma) \le \frac{1}{c^2} \]

Empirical rule
- We can use a different rule for data which seem to have the bell-shaped, or normal, distribution.
- About 68% of the data are within 1 standard deviation from the mean.
- About 95% are within 2.
- Nearly all are within 3.

Outliers
- Since the overwhelming majority (89% to almost 100%) of the data lie within 3 standard deviations of the mean, it's good to review any data points with z-scores less than -3 or greater than 3.
- Such points may be outliers.
- They could be errors, which should be removed.
- They could be evidence of something unusual in the data.

Five number summary
- A quick way of summarizing the data is to consider the following five numbers: smallest value, first quartile, median, third quartile, largest value.

Box Plot
[Figure: box plot with outliers marked]
- Whiskers extend to the smallest and largest values within the lower and upper limits.

Sample Covariance
- Sample covariance is a measure of linear relationship between two variables x and y.
- One problem with the measure is that it's affected by the units used for x and y.
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

Population Covariance
\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]

Interpretation (both sample and population)
- Covariance = 0: no linear relationship
- Covariance > 0: positive linear relationship
- Covariance < 0: negative linear relationship

Correlation Coefficient
This measure solves the problem of units that plagues the covariance.
Sample:
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
Population:
\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]

Correlation Coefficient
- One benefit of the correlation coefficient is that it also tells us about the strength of the linear relationship between x and y.

[Figures: scatter diagrams for each case]
- Positive linear relationship: covariance and correlation coefficient positive
- Negative linear relationship: covariance and correlation coefficient negative
- No linear relationship: covariance and correlation coefficient zero
- A perfect positive linear relationship: correlation coefficient = 1

Weighted mean
- Sometimes the arithmetic mean is not an accurate measure of the central tendency of the data, because certain values occur with much greater frequency than others.
- For example, with this data the arithmetic mean cost would be $3, but the overwhelming majority of the time the cost is [...]

Weighted mean
- To overcome this problem, we use the weighted mean:
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where \( x_i \) = value of observation i and \( w_i \) = weight for observation i.

Weighted mean
[Table: purchases and costs, with totals \( \sum w \) and \( \sum wx \); values not recovered]
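
The weighted mean formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the costs and purchase counts below are hypothetical (the lecture's own table did not survive), and the function name `weighted_mean` is made up for this sketch. The numbers are chosen so that the simple arithmetic mean cost is 3 while most purchases are made at a much lower cost, which is the situation the slide describes.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data, not taken from the lecture:
# three distinct costs, and how many purchases were made at each cost.
costs = [1.0, 2.0, 6.0]
purchases = [8, 1, 1]

# Simple arithmetic mean treats each distinct cost equally: (1 + 2 + 6) / 3
simple_mean = sum(costs) / len(costs)

# Weighted mean accounts for frequency: (8*1 + 1*2 + 1*6) / (8 + 1 + 1)
weighted = weighted_mean(costs, purchases)

print(simple_mean, weighted)
```

With these made-up numbers, the weighted mean sits close to the cost paid in most purchases, while the simple mean is pulled up by costs that rarely occur.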