# Class Note for MATH 115A with Professor Dawson at UA

Date Created: 02/06/15

Random Samples (Math 115A, Spring 2008, Dawson)

## Why random samples

- We would like to have probability information for our random variable $X$.
  - Often we don't know the distribution of $X$, or even its expected value.
- To estimate the expected value, $F_X$, or $f_X$, we use random sampling.
- We may approximate the distribution if we can take a large enough sample of $n$ independent observations of $X$.
  - We call these $n$ independent observations of $X$, written $X_1, X_2, X_3, \ldots, X_n$, a random sample of size $n$.

## Sample Mean

- If $n$ is large enough, the random sample is a good representation of the random variable $X$.
- Using the random sample, we can calculate the sample mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

- As $n$ increases, the sample mean tends to get closer to the parameter $E(X)$.

## Approximating pmfs

To approximate a probability mass function:

- Group the data according to values of $X$.
- Create a histogram for these groupings.
- Calculate the relative frequencies.

[Figure: "Sample Data" histogram. Vertical axis: relative frequency, 0.000 to 0.200. Horizontal axis: stoppages, 0 to 14.]

## Discrete Random Variable

Example: Let $X$ be the denomination (in dollars) of a chip selected at random from a box that has four 1-dollar chips, three 5-dollar chips, two 25-dollar chips, and one 100-dollar chip. There are twenty observations of $X$ given below. Use the sample to plot an approximation of the pmf of $X$ and to estimate $E(X)$. (Note: the data were changed from those used in class to match the next slide.)

25, 5, 25, 1, 5, 25, 1, 1, 1, 5, 5, 1, 25, 1, 100, 1, 1, 25, 5, 5

## Discrete Random Variable: Example (cont'd)

| $x$ | Frequency | Relative frequency |
|-----|-----------|--------------------|
| 1   | 8         | 0.40               |
| 5   | 6         | 0.30               |
| 25  | 5         | 0.25               |
| 100 | 1         | 0.05               |
| Sum | 20        | 1.00               |

[Figure: approximate pmf. Vertical axis: relative frequency. Horizontal axis: $x$ = 1, 5, 25, 100.]

$$E(X) \approx \bar{x} = \frac{1}{20}(25 + 5 + \cdots + 5) = 13.15$$

## Approximating pdfs

To approximate a probability density function:

- Group the data according to the range of values of $X$.
- Create a histogram for these groupings.
- Calculate the relative frequencies.
- Divide the relative frequencies by the bin width to calculate the pdf heights.
- Create a line plot connecting the midpoints of the tops of the columns.

## Approximating pdfs (bins are labeled by their midpoint)

[Figure: two histograms over bins of width 2 with midpoints 1, 3, 5, ..., 13. In the relative frequency plot a bar has height 0.25. In the pdf plot the same bar has height 0.25 / 2 = 0.125, so its area is length × width = 0.125 × 2 = 0.25.]
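The two recipes above (relative frequencies for a pmf, relative frequencies divided by bin width for a pdf) can be sketched outside Excel. The following is a minimal Python sketch, not the course's method. The function names `approximate_pmf` and `approximate_pdf` are made up for illustration, the chip data are the twenty observations from the example, and the `times` list is hypothetical data chosen only so that one bin has relative frequency 0.25.

```python
from collections import Counter


def approximate_pmf(sample):
    """Relative frequency of each distinct value in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return {value: counts[value] / n for value in sorted(counts)}


def approximate_pdf(sample, low, width, num_bins):
    """Return (midpoint, height) pairs, where height = relative frequency / bin width.

    Bins are [low, low + width), [low + width, low + 2 * width), and so on.
    Observations outside the bins are not counted in any bin.
    """
    counts = [0] * num_bins
    for x in sample:
        index = int((x - low) // width)
        if 0 <= index < num_bins:
            counts[index] += 1
    n = len(sample)
    return [(low + (i + 0.5) * width, counts[i] / n / width) for i in range(num_bins)]


# The twenty chip observations from the discrete example.
chips = [25, 5, 25, 1, 5, 25, 1, 1, 1, 5, 5, 1, 25, 1, 100, 1, 1, 25, 5, 5]
pmf = approximate_pmf(chips)
sample_mean = sum(chips) / len(chips)
print(pmf)          # relative frequencies for x = 1, 5, 25, 100
print(sample_mean)  # estimate of E(X)

# HYPOTHETICAL continuous observations (not from the notes), bins of width 2.
times = [0.5, 1.5, 2.5, 3.0, 3.5, 4.5, 5.0, 7.0]
pdf = approximate_pdf(times, low=0.0, width=2.0, num_bins=4)
for midpoint, height in pdf:
    print(midpoint, height, height * 2.0)  # the last column is the area of the bar
```

In the hypothetical data the first bin holds 2 of 8 observations, so its relative frequency is 0.25 and its pdf height is 0.125, matching the numbers in the figure. The bar areas add up to 1, which is the property a pdf needs.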
In a plot of relative frequencies, the probability is given by the height of the rectangle. That should not be true for a pdf: it must be the area that is equal to the probability. By dividing the relative frequency by the bin width, we ensure that the area of each rectangle (length times width) is equal to the relative frequency.

[Figures: two plots titled "Approximating pdfs". Vertical axis: heights from 0.00 to 0.08. Horizontal axis: TIMES, with bin midpoints $t$ = 1, 7, 13, ..., 79.]

## Continuous Random Variables

Example: A bus arrives every 10 minutes. Let $W$ be the waiting time, in minutes, until the next bus. Fifty observations of $W$ are given below. Use the sample to plot an approximation of the pdf of $W$ and to estimate $E(W)$.

9.7, 0.8, 6.5, 5.5, 9.5, 3.4, 7.5, 4.8, 0.2, 8.6, 7.0, 2.0, 0.6, 2.0, 3.6, 2.4, 8.0, 6.2, 6.1, 5.2, 0.3, 4.5, 9.4, 9.2, 3.4, 0.9, 5.9, 7.8, 5.6, 1.6, 2.3, 1.3, 9.8, 7.2, 6.0, 6.6, 5.1, 3.7, 0.5, 5.8, 8.2, 3.5, 3.3, 3.1, 7.1, 3.8, 6.4, 4.4, 8.7, 9.0

For each bin, record the midpoint and the frequency, then compute:

- Relative frequency: the frequency divided by the total.
- Height: the relative frequency divided by the bin width.

## Continuous Random Variables: Example (cont'd)

[Figures: two plots of the approximate pdf of $W$. Vertical axis: heights from 0 to 0.12. Horizontal axis: bin midpoints $w$ = 1, 3, 5, 7, 9.]

$$E(W) \approx \bar{w} = \frac{1}{50}(9.7 + 0.8 + \cdots + 9.0) \approx 5.1$$

## Bootstrapping

Def: Bootstrapping is sampling with replacement from one sample to generate new samples, typically for the purpose of estimating probabilities and parameters.

- Why?
  - We are not very likely to have a large enough sample set, due to costs and time.
  - We can simulate a larger data set by sampling from our original data; this is bootstrapping.
- We will use the Excel functions VLOOKUP and RANDBETWEEN to help with bootstrapping.

## VLOOKUP

[Screenshot: Excel Function Arguments dialog for VLOOKUP, with the fields Lookup_value, Table_array, Col_index_num, and Range_lookup.]

NOTE: If the first column in your table is not numbered in order from 1 to some row number $n$, the VLOOKUP will not work properly. The rows must be numbered sequentially.

- Lookup_value: the row number of your table from which you wish to retrieve data (the number you are searching for).
- Table_array: the cell range of your table.
- Col_index_num: the column in your table from which you wish to retrieve data.
- Range_lookup: leave blank.

## RANDBETWEEN

- RANDBETWEEN returns a random integer between the two numbers you input, endpoints included.
- RANDBETWEEN is found under the Formulas tab, in Math & Trig.

[Screenshot: the Math & Trig function list and the Function Arguments dialog for RANDBETWEEN, with the fields Bottom and Top.]

- Bottom: the smallest integer RANDBETWEEN will return.
- Top: the largest integer RANDBETWEEN will return.

## RANDBETWEEN & VLOOKUP

- RANDBETWEEN(a, b) randomly chooses an integer between the values a and b that you specify.
- We will use this in conjunction with VLOOKUP while bootstrapping.
- Rather than specifying the row number in the VLOOKUP function, we will randomly choose a row from the table using RANDBETWEEN.
- `=VLOOKUP(RANDBETWEEN(1,200),$A$1:$D$200,3)` will randomly choose one of the 200 rows in the table from cell A1 to D200 and return the value in the third column of the table, i.e., in column C.

## Example

The times at which 270 calls arrived at a company switchboard are shown in the sheet Log of My Phone Log.xls. Let $T$ be the random variable that gives the time, in minutes, until the arrival of the first call and between the arrivals of successive calls. The 270 times in the sheet Log determine 270 time intervals, which may be assumed to represent independent observations of $T$. Let $L$ be the random variable that gives the arrival time of the last call in a run of 15 calls starting at 9 am.

## Example (cont'd)

Use VLOOKUP and RANDBETWEEN to generate 20 observations of $L$.

- In the sheet Random of My Phone Log.xls, RANDBETWEEN is used to randomly select an integer between 1 and 270.
- In the sheet Times of My Phone Log.xls, VLOOKUP is used to find the corresponding time between calls.
- In the sheet Start Times, the time between calls is added to the beginning of the hour or to the arrival time of the previous call.

## Example (cont'd)

Use the sample to estimate:

- the probability that the last call in a run of 15 calls starting at 9 am will arrive before 9:15 am;
- the mean arrival time of the last call in a run of 15 calls starting at 9 am.

For the probability that the last call arrives before 9:15 am:

$$P(L < 9{:}15) \approx \frac{\text{number of observations} < 9{:}15}{\text{number of observations}} = \frac{\text{number of observations} < 9{:}15}{20}$$

For the mean arrival time of the last call:

$$E(L) \approx \frac{\text{sum of observations}}{\text{number of observations}} = \frac{\text{sum of observations}}{20}$$

## Example

- The business whose daily sales data are shown in the Excel file Daily Sales.xls will be eligible to apply for a federal assistance program if its gross sales on 8 randomly selected business days are all under 7000.
- Use the functions RANDBETWEEN, VLOOKUP, and MAX in the sheet Raw Data to simulate 3000 sets of 8-day gross sales records.
- Use the COUNTIF function to estimate the probability that the business will be eligible for the federal program.

## Example (cont'd)

[Screenshots: the simulation worksheet with columns Trial, Day 1 through Day 8, one simulated 8-day set of gross sales per row (the first ten trials are shown), and the same worksheet in formula view, where each day cell combines VLOOKUP with RANDBETWEEN.]
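The simulation described for the Daily Sales example can also be sketched in Python. This is a rough equivalent under stated assumptions, not the worksheet itself: `rng.choice` stands in for the VLOOKUP/RANDBETWEEN pair, `max` for MAX, and the final count for COUNTIF. The function names are made up for illustration, and the `daily_sales` list is hypothetical placeholder data, since the contents of Daily Sales.xls are not reproduced in these notes.

```python
import random


def simulate_max_sales(sales, rng, days=8):
    """Largest gross sales figure among `days` days drawn with replacement."""
    return max(rng.choice(sales) for _ in range(days))


def estimate_eligibility(sales, trials=3000, threshold=7000, seed=None):
    """Fraction of simulated 8-day sets whose sales are all under the threshold."""
    rng = random.Random(seed)
    # All 8 days are under the threshold exactly when the maximum is under it.
    eligible = sum(simulate_max_sales(sales, rng) < threshold for _ in range(trials))
    return eligible / trials


# HYPOTHETICAL daily gross sales (placeholder values, not the course data).
daily_sales = [5200, 6100, 5900, 4600, 6700, 7300, 5400, 6400, 7200, 5000]

print(estimate_eligibility(daily_sales, seed=1))
print(estimate_eligibility(daily_sales, seed=2))  # a different seed acts like a recalculation
```

For this placeholder list, 8 of the 10 values are under 7000, so the exact probability that 8 draws with replacement are all under 7000 is $0.8^8 \approx 0.168$. The simulated estimates should land near that value without matching it exactly.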
## Example (cont'd)

Therefore, according to our sample of 3000 bootstrapped 8-day sets (i.e., our computer-simulated random sample of size 3000), the estimated probability that the business will be eligible for the grant is about ____.

- What if you recalculate (i.e., hit F9 in Excel)? Does the estimated probability change?

## Example (cont'd)

How can we improve our estimation?

- Calculate the probability based on 5000 8-day sets in the simulation.
- Calculate the probability based on just the 3000 sets, then repeat for a total of 10 times and average the 10 probabilities.
- The larger the sample size, the better the approximation of $E(X)$ or any other statistical quantity.

## Objectives

- Give the definitions of the terms random sample and bootstrapping.
- Use a random sample to estimate probabilities for, and the expected value of, a random variable.
- Plot a histogram that approximates the graph of the pmf of a finite random variable or the graph of the pdf of a continuous random variable.
- Use the VLOOKUP and RANDBETWEEN functions in Excel to generate a bootstrap random sample of observations of a random variable.
- Use conditional formatting in Excel.
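To see why a recalculation changes an estimate, and why averaging repeated estimates helps, here is a small Python sketch that revisits the phone-log example. It is an illustration under assumptions, not the spreadsheet from class. The `intervals` list is hypothetical (the real sheet Log has 270 times), arrival times are measured in minutes after 9:00 so that "before 9:15" means less than 15, and the function names are invented for this sketch.

```python
import random


def observe_last_call(intervals, rng, calls=15):
    """One bootstrapped observation of L, in minutes after 9:00.

    Each rng.choice draws one time between calls with replacement,
    which is the role VLOOKUP(RANDBETWEEN(...)) plays in Excel.
    """
    return sum(rng.choice(intervals) for _ in range(calls))


def estimate_before(intervals, rng, observations=20, cutoff=15.0):
    """Fraction of bootstrapped observations of L that fall before the cutoff."""
    hits = sum(observe_last_call(intervals, rng) < cutoff for _ in range(observations))
    return hits / observations


# HYPOTHETICAL times between calls, in minutes (placeholder data).
intervals = [0.2, 0.5, 0.7, 1.0, 1.1, 1.4, 1.8, 2.3, 0.4, 0.9]

rng = random.Random(2008)
# Ten separate estimates, each from 20 observations, like pressing F9 ten times.
single_runs = [estimate_before(intervals, rng) for _ in range(10)]
averaged = sum(single_runs) / len(single_runs)

print(single_runs)  # the individual estimates vary from run to run
print(averaged)     # the average of the ten estimates
```

The ten individual estimates typically differ from one another because each is built from only 20 observations. Their average uses 200 observations in total, which is why it tends to be a steadier estimate than any single run.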

