
# Undergraduate Seminar I STAT 3484

UCONN


These 14 pages of class notes were uploaded by Blair Williamson on Thursday, September 17, 2015. The notes belong to STAT 3484 at the University of Connecticut, taught by Joseph Glaz in Fall. Since their upload, they have received 42 views. For similar materials see /class/205897/stat-3484-university-of-connecticut in Statistics at University of Connecticut.


Date Created: 09/17/15

## Runs and Scans for Discrete Data

Joseph Glaz, February 3, 2009

### 1 Outline of the talk

1. An introduction
2. Runs of successes in 0-1 Bernoulli trials
3. Scan statistics for 0-1 Bernoulli trials
4. Scan statistics for binomial and Poisson models
5. Scan statistics for discrete data when the number of observed events is known

### 2 Example: Quality Control

Suppose that items coming off a continuous production line are examined one at a time and are classified as non-defective, denoted by 0 (failure), or defective, denoted by 1 (success). A quality control engineer might be interested in the waiting time for the first defective item, or the waiting time for the $k$th defective item. Other interesting questions are the waiting time for $k$ consecutive defective items, or the waiting time to observe $k$ or more defective items within a consecutive string of $m$ produced items.

An acceptable model for this example assumes that the observations are independent of each other and that the probability that each item is defective is equal to $p$, where $0 < p < 1$, so that the probability that each item is non-defective is equal to $1 - p$. This is the so-called Bernoulli model. If we denote the observed data in this example by $X_1, X_2, \dots, X_n$, then we assume that the observations $X_1, X_2, \dots, X_n$ are independent and identically distributed (iid) and

$$P(X_i = 1) = 1 - P(X_i = 0) = p, \quad i = 1, 2, \dots, n.$$

#### 2.1 Geometric Distribution

Assume that the observations $X_1, X_2, \dots$ are independent and identically distributed (iid) and

$$P(X_i = 1) = 1 - P(X_i = 0) = p, \quad i = 1, 2, \dots,$$

where $0 < p < 1$. Denote by $Y$ the waiting time to observe the first one in a sequence of 0-1 iid Bernoulli trials. Then

$$P(Y = y) = (1 - p)^{y-1} p, \quad y = 1, 2, \dots$$

One can easily show that

$$\sum_{y=1}^{\infty} P(Y = y) = \sum_{y=1}^{\infty} (1 - p)^{y-1} p = p \sum_{y=1}^{\infty} (1 - p)^{y-1} = 1.$$

The random variable $Y$ is called a geometric random variable and the probability distribution $P(Y = y) = (1 - p)^{y-1} p$ is called a geometric distribution. One can show that

$$P(Y \ge y) = (1 - p)^{y-1}, \quad y = 1, 2, \dots, \qquad \text{and} \qquad E(Y) = \frac{1}{p}.$$

So, for example, if $p = 0.001$ then $E(Y) = 1000$.

#### 2.2 Negative Binomial Distribution

Assume that the observations $X_1, X_2, \dots$ are
independent and identically distributed (iid) and

$$P(X_i = 1) = 1 - P(X_i = 0) = p, \quad i = 1, 2, \dots,$$

where $0 < p < 1$. In the quality control example, denote by $W$ the waiting time to observe the $k$th defective item. Then

$$P(W = w) = \binom{w-1}{k-1} p^{k} (1 - p)^{w-k}, \quad w = k, k+1, \dots$$

The random variable $W$ is called a negative binomial random variable and the probability distribution $P(W = w)$ is called a negative binomial distribution. One can show that

$$E(W) = \frac{k}{p}.$$

So if $p = 0.001$ and $k = 5$, then $E(W) = 5000$.

### 3 Runs in n iid Bernoulli Trials

Assume that the observations $X_1, X_2, \dots, X_n$ are iid with

$$P(X_i = 1) = 1 - P(X_i = 0) = p, \quad i = 1, 2, \dots, n,$$

where $0 < p < 1$. We will refer to these observations as 0-1 Bernoulli trials. Let $T_k$ denote the waiting time until a sequence of $k$ consecutive 1's (successes) is observed. Formally, $T_k$ is defined as follows:

$$T_k = \min\{n : X_{n-k+1} = \dots = X_n = 1\} = \min\Big\{n : \sum_{j=n-k+1}^{n} X_j = k\Big\}.$$

Another random variable closely related to $T_k$ is the length of the longest success run in $n$ trials, denoted by $L_n$. Since the event $\{T_k > n\}$ is equivalent to $\{L_n < k\}$, the following is true:

$$P(T_k > n) = P(L_n < k) \qquad \text{and} \qquad P(T_k \le n) = P(L_n \ge k).$$

#### 3.1 Example: A hitting streak in baseball

The question we may ask is how unusual it is for a baseball player hitting .300 to have hits in $k = 25$ consecutive games. We assume that the player stays healthy and plays in all 162 games during the season, and that on average the player comes up to the plate 5 times during any game (which might be true for players who hit first or second). One can show that

$$P(L_{162} \ge 25) \approx .22,$$

which means that it is something that we would expect to happen. Still, a sports article will present this as if something spectacular has occurred.

#### 3.2 A Recurrence Formula for Evaluating $P(L_n \ge k)$

Using an elementary approach described in the book An Introduction to Probability Theory and Its Applications, 3rd Ed., by W. Feller (Wiley, pp. 277-278), one can establish the following recurrence formula for evaluating $P(L_n < k) = 1 - P(L_n \ge k)$. Let $q_n = P(L_n < k)$ denote the probability that there is no consecutive run of $k$ or more successes in $n$ trials. Then, for $n \ge k$,

$$q_n = (1 - p)\, q_{n-1} + p (1 - p)\, q_{n-2} + \dots + p^{k-1} (1 - p)\, q_{n-k},$$

and

$$q_0 = q_1 = \dots = q_{k-1} = 1.$$

This recurrence formula is
easy to implement and computes $P(L_n \ge k)$ efficiently.

##### 3.2.1 Back to the example

For a baseball player with a hitting average of .300 who has 5 appearances at the plate, the probability of getting at least one hit during a game can be evaluated from a binomial distribution (the binomial random variable is denoted by $Y$, with $N = 5$ and $p = 0.3$):

$$P(Y \ge 1) = 1 - P(Y = 0) = 1 - .7^5 = .83193.$$

When using the recurrence formula with $p = .83193$, one gets $P(L_{162} \ge 25) \approx .22$.

##### 3.2.2 Expected run length

Using the following formula for the expected value of a non-negative integer-valued random variable $Y$,

$$E(Y) = \sum_{y=0}^{\infty} P(Y > y),$$

we can get a formula for the average run length (ARL) in an iid sequence of 0-1 Bernoulli trials with probability of success equal to $p$:

$$ARL = \sum_{k=1}^{n} P(L_n \ge k).$$

##### 3.2.3 Example

Suppose that a baseball player plays $n = 150$ games in a season and is hitting on average .333. The player hits in the middle of the order and comes up to the plate on average 4 times during each game. Then the probability that this player will get at least one hit during a game is $1 - .667^4 = .8021$. Moreover,

| $k$ | $P(L_n \ge k)$ |
|---|---|
| 10 | .986 |
| 15 | .681 |
| 20 | .288 |
| 25 | .100 |

Also, $ARL = 17.5$.

##### 3.2.4 Example: Pitching data analysis for Bret Saberhagen (KC Royals, 1989 Cy Young Award)

In 1989 Bret Saberhagen pitched $n = 262$ innings. His ERA was 2.16. Based on that, we can estimate the probability of having 0 runs in an inning, and from it the expected number of consecutive innings pitched with 0 runs, which is about 17. During the month of September he pitched 31 consecutive innings with 0 runs scored:

$$P(L_{262} \ge 31) \approx .013.$$

What does this mean?

### 4 Exact Formulas for $P(T_k = x)$ and $P(L_n \ge k)$

In the textbook by Balakrishnan and Koutras, Runs and Scans with Applications, the authors present an extensive survey of the exact formulae derived by several researchers. For example, Philippou and Muwafi (The Fibonacci Quarterly 20, 225-236, 1982) derived the following formula:

$$P(T_k = x) = \sum \binom{x_1 + x_2 + \dots + x_k}{x_1, x_2, \dots, x_k}\, p^{x} \left(\frac{1 - p}{p}\right)^{x_1 + x_2 + \dots + x_k},$$

where $x \ge k$ and the summation in the formula is carried out over all non-negative integers $x_1, x_2, \dots, x_k$ subject to the
condition $\sum_{i=1}^{k} i\, x_i = x - k$. Muselli (Statistics and Probability Letters 46, 239-249, 1996) derived a simpler formula for $P(T_k = x)$ based only on binomial coefficients. To evaluate $P(L_n \ge k)$ from these formulas one needs to use the relation between $T_k$ and $L_n$:

$$P(L_n \ge k) = P(T_k \le n) = \sum_{x=k}^{n} P(T_k = x).$$

##### 4.0.5 Asymptotic approximations

Balakrishnan and Koutras in their book discuss probability inequalities for $P(L_n \ge k)$ as well as asymptotic approximations as $n$ and $k$ get large. In many instances these approximations do not perform well. For example,

$$P(L_n \ge k) \approx 1 - \exp(-\lambda), \qquad \text{where } \lambda = n (1 - p) p^{k}.$$

For our example, with $n = 162$, $k = 25$ and $p = .3$, we get

$$\lambda = 162\,(.7)\,(.3)^{25} = 9.608252831 \times 10^{-12}, \qquad P(L_{162} \ge 25) \approx 1 - \exp(-\lambda) \approx 0,$$

which is not a good approximation. More accurate approximations exist but will not be discussed here.

### 5 A Scan Statistic for iid 0-1 Bernoulli Trials

As attractive as runs of certain events are, many times they do not lead to the conclusion that an unusual event has occurred. On the other hand, if one observes several interrupted runs, combining them together might lead to the conclusion that an unusual event has occurred.

#### 5.1 Example

Suppose that in the baseball example, for the player who hits .333, the following sequence occurred in 40 consecutive games during a season of 150 games:

$$0\ \underbrace{1 \cdots 1}_{13}\ 0\ \underbrace{1 \cdots 1}_{12}\ 0\ \underbrace{1 \cdots 1}_{13}\ 0,$$

where the number of games with a hit is equal to $13 + 12 + 13 = 38$. If only the largest run had been recorded, then nothing unusual would have been detected. On the other hand, as we will see, the occurrence of 38 1's in 40 consecutive games is unusual. In fact, one can show that the probability of observing, somewhere within a sequence of 150 games, a consecutive string of 40 games with 38 games with a hit is approximately equal to .029.

#### 5.2 Definition of a Scan Statistic

Assume that the observations $X_1, X_2, \dots, X_n$ are iid with

$$P(X_i = 1) = 1 - P(X_i = 0) = p, \quad i = 1, 2, \dots, n,$$

where $0 < p < 1$. We define

$$S_{n,m} = \max_{1 \le i \le n-m+1} \sum_{j=i}^{i+m-1} X_j,$$

the largest number of successes in any scanning window of size $m$. We refer to $S_{n,m}$ as the scan statistic. One is interested in evaluating $P(S_{n,m} \ge k)$. Note that if $m = k$,
then the event $S_{n,m} \ge k$ means that a run of at least $k$ consecutive 1's has occurred. Therefore one can view this scan statistic as a generalized run: $k$ successes within $m$ consecutive trials in a sequence of length $n$.

Now suppose that $X_1, X_2, \dots, X_n$ are independent 0-1 Bernoulli trials with

$$P(X_i = 1) = 1 - P(X_i = 0) = p_i, \quad i = 1, 2, \dots, n,$$

where $0 < p_i < 1$. This scan statistic has been used for testing the null hypothesis

$$H_0 : p_i = p_0, \quad i = 1, 2, \dots, n,$$

vs. the alternative

$$H_a : p_i = p_0,\ i = 1, 2, \dots, n_0 - 1, n_0 + m, \dots, n, \quad \text{and} \quad p_i = p_1,\ i = n_0, \dots, n_0 + m - 1,$$

where $p_1 > p_0$, $p_0$ is known, while $p_1$ and $n_0$ are not known. One can show that the test based on $S_{n,m}$ is a generalized likelihood ratio test (Glaz and Naus, Annals of Applied Probability 1, 306-318, 1991). This test rejects $H_0$ for large values of $S_{n,m}$. To implement this test for a given significance level $\alpha$, one has to find the value of $k$ such that

$$P_{H_0}(S_{n,m} \ge k) = \alpha.$$

Glaz and Naus derive in this article accurate bounds and approximations for $P(S_{n,m} \ge k)$. The results they derive are valid in general for discrete distributions. Accurate approximations for $P(S_{n,m} \ge k)$ have also been derived by Naus (Journal of the American Statistical Association 77, 177-183, 1982). More recently, Fu (Journal of Applied Probability 38, 1-9, 2001) derived an exact formula for the distribution of $S_{n,m}$ that can be implemented for most values of the parameters $k$, $m$ and $n$.

The inequalities in Glaz and Naus (1991) have the following relatively simple form:

$$LB \le P(S_{n,m} \ge k) \le UB,$$

where

$$LB = 1 - P(S_{2m,m} < k)\,\big[1 - P(S_{2m-1,m} < k) + P(S_{2m,m} < k)\big]^{n-2m}$$

and

$$UB = 1 - \frac{P(S_{2m,m} < k)}{\left[1 + \dfrac{P(S_{2m-1,m} < k) - P(S_{2m,m} < k)}{P(S_{2m-1,m} < k)\, P(S_{2m,m} < k)}\right]^{n-2m}}.$$

#### 5.3 Example

For the baseball example with 38 hits in 40 consecutive games in a 150-game season, one gets

$$.02899 \le P(S_{150,40} \ge 38) \le .02939.$$

#### 5.4 Example

In the baseball example regarding Bret Saberhagen, suppose that 37 innings with zero earned runs had been observed in 40 consecutive innings during the 262-inning season in 1989. Then

$$P(S_{262,40} \ge 37) \approx .067 \qquad \text{and} \qquad .0658 \le P(S_{262,40} \ge 37) \le .0672.$$

#### 5.5 Numerical Results

Inequalities for $P(S_{n,m} \ge k)$

| $n$ | $m$ | $p$ | $k$ | LB | UB |
|---|---|---|---|---|---|
| 100 | 10 | .05 | 2 | .6905 | .7781 |
| | | | 3 | .2215 | .2305 |
| | | | 4 | .0310 | .0312 |
| | | | 5 | .0025 | .0025 |
| | | .10 | 3 | .6922 | .7776 |
| | | | 4 | .2733 | .2869 |
| | | | 5 | .0540 | .0544 |
| 500 | 10 | .01 | 2 | .3238 | .3286 |
| | | | 3 | .0159 | .0159 |
| | | | 4 | .0003 | .0003 |
| | | .05 | 2 | .9967 | .9996 |
| | | | 3 | .7274 | .7479 |
| | | | 4 | .1534 | .1544 |
| | | | 5 | .0134 | .0134 |

LB and UB are product-type lower and upper bounds based on Glaz and Naus (1991).

### 6 Binomial and Poisson Models

#### 6.1 Example

Consider a radar sweep where a dichotomous quantizer transmits to the detector the digit 1 if the signal-plus-noise waveform exceeds a pre-determined threshold, and the digit 0 otherwise. Dinneen and Reed (IRE Transactions on Information Theory 2, 29-39, 1956) advocate the use of a moving window detector, which is basically a scan statistic. Mirstik (IEEE Transactions on Aerospace and Electronic Systems 14, 103-108, 1978) discusses the use of a moving window detector based on several independent scans of several radars for target detection. Basically, this is a scan statistic for binomial data.

#### 6.2 Example

In a time-sharing system, messages of constant length arrive at the traffic concentrator at discrete time intervals separated by $t_m$ time units, where $t_m$ is the time required to transmit a message of constant length. At most $k$ messages can be transmitted simultaneously. Chu (IEEE Trans. Comput. 19, 530-534, 1970) and Fredrikson (IEEE Trans. Commun. 22, 1862-1866, 1974) discussed the use of binomial and Poisson models, respectively, for buffer design in such systems.

### 7 Conditional Scan Statistic for 0-1 Bernoulli Trials

Let $X_1, \dots, X_n$ be iid 0-1 Bernoulli trials. Suppose that $a$ successes (1's) and $n - a$ failures (0's) have been observed. Then

$$P\Big(X_1 = x_1, \dots, X_n = x_n \,\Big|\, \sum_{i=1}^{n} X_i = a\Big) = \frac{1}{\binom{n}{a}}.$$

In this case the joint distribution of the observed 0-1 trials assigns equal probabilities to all the arrangements of $a$ 1's and $n - a$ 0's. We are interested in deriving tight bounds for the upper tail probabilities of a conditional scan statistic, denoted by

$$P(k; m, n, a) = P\Big(S_{n,m} \ge k \,\Big|\, \sum_{i=1}^{n} X_i = a\Big),$$

where $S_{n,m}$ has been defined above. Let $U_1, \dots, U_n$ be the random variables denoting one of the sequences of 0-1 trials that contain $a$ 1's and $n - a$ 0's. For $n = Lm$, $L \ge 4$ and $1 \le i \le L - 1$, define the events

$$D_i = \bigcap_{j=1}^{m+1} \Big\{ \sum_{l=(i-1)m+j}^{im+j-1} U_l \le k - 1 \Big\}.$$

It follows that

$$P(k; m, n, a) = P\Big(\bigcup_{i=1}^{L-1} D_i^c\Big).$$
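Because the conditional model puts equal probability on every arrangement of $a$ 1's among $n$ positions, $P(k; m, n, a)$ can in principle be obtained by counting arrangements. The following is a minimal sketch of that idea, not the method of the talk: it enumerates all $\binom{n}{a}$ arrangements, so it is only practical for small $n$. The function names `scan_statistic` and `conditional_scan_tail` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def scan_statistic(xs, m):
    """Largest number of 1's in any window of m consecutive trials."""
    window = sum(xs[:m])
    best = window
    # Slide the window one position at a time, updating the count in O(1).
    for i in range(m, len(xs)):
        window += xs[i] - xs[i - m]
        best = max(best, window)
    return best


def conditional_scan_tail(k, m, n, a):
    """Exact P(S_{n,m} >= k | a successes in n trials) by brute-force counting.

    Hypothetical helper: every arrangement of the a 1's is equally likely,
    so the probability is (favourable arrangements) / (all arrangements).
    """
    hits = 0
    total = 0
    for ones in combinations(range(n), a):
        xs = [0] * n
        for pos in ones:
            xs[pos] = 1
        total += 1
        if scan_statistic(xs, m) >= k:
            hits += 1
    return Fraction(hits, total)
```

For instance, `conditional_scan_tail(2, 2, 6, 2)` counts the arrangements of two 1's in six trials in which the two 1's are adjacent. For the values of $n$ in the tables of this section the enumeration is infeasible, which is why bounds are needed.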
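Employing the second-order Bonferroni-type inequality in Hunter (Journal of Applied Probability 13, 597-603, 1976) and the fact that the $D_i$ are stationary events, we get

$$P(k; m, n, a) \le \sum_{i=1}^{L-1} P(D_i^c) - \sum_{i=2}^{L-1} P(D_i^c \cap D_{i-1}^c) = 1 + (L - 3)\, q_{2m}(a) - (L - 2)\, q_{3m}(a),$$

where for $1 \le i \le L - 1$, $q_{2m}(a) = P(D_i)$, $q_{3m}(a) = P(D_i \cap D_{i-1})$, and for $r = 2, 3$,

$$q_{rm}(a) = \sum_{j=0}^{\min(rk - r,\, a)} \frac{\binom{rm}{j} \binom{n - rm}{a - j}}{\binom{n}{a}}\; q_r(m \mid j),$$

where $q_r(m \mid j) = 1 - P(k; m, rm, j)$ are evaluated via a result in Naus (Journal of the American Statistical Association 69, 810-815, 1974).

To derive a Bonferroni-type lower bound for $P(k; m, n, a)$, an inequality in Kwerel (Journal of the American Statistical Association 70, 472-479, 1975) yields

$$P\Big(\bigcup_{i=1}^{L-1} D_i^c\Big) \ge \frac{2 S_1}{b} - \frac{2 S_2}{b (b - 1)},$$

where $b = [2 S_2 / S_1] + 2$, with $[x]$ denoting the integer part of $x$,

$$S_1 = \sum_{i=1}^{L-1} P(D_i^c) = (L - 1)\big(1 - q_{2m}(a)\big)$$

and

$$
\begin{aligned}
S_2 &= \sum_{1 \le i < j \le L-1} P(D_i^c \cap D_j^c) \\
&= (L - 2)\big[1 - 2 q_{2m}(a) + q_{3m}(a)\big] + .5 (L - 2)(L - 3)\big[1 - 2 q_{2m}(a) + q_{2m,2m}(a)\big] \\
&= (L - 2)\, q_{3m}(a) + .5 (L - 2)(L - 3)\, q_{2m,2m}(a) + .5 (L - 1)(L - 2)\big(1 - 2 q_{2m}(a)\big),
\end{aligned}
$$

where

$$q_{2m,2m}(a) = \sum_{j_1 = 0}^{\min(2k - 2,\, a)}\ \sum_{j_2 = 0}^{\min(2k - 2,\, a - j_1)} q_2(m \mid j_1)\, q_2(m \mid j_2)\, \frac{\binom{2m}{j_1} \binom{2m}{j_2} \binom{n - 4m}{a - j_1 - j_2}}{\binom{n}{a}},$$

and $q_r(m \mid j)$, $r = 2, 3$, are defined above.

Inequalities for $P(k; m, n, a)$

| $n$ | $m$ | $a$ | $k$ | BTLB | BTUB |
|---|---|---|---|---|---|
| 100 | 10 | 5 | 3 | .166 | .166 |
| | | | 4 | .009 | .009 |
| | | 10 | 3 | .830 | .907 |
| | | | 4 | .231 | .233 |
| | | | 5 | .026 | .026 |
| | 20 | 10 | 5 | .350 | .351 |
| | | | 6 | .075 | .075 |
| | | | 7 | .010 | .010 |
| 500 | 10 | 50 | 4 | .766 | 1.00 |
| | | | 5 | .238 | .250 |
| | | | 6 | .027 | .027 |
| | 20 | 25 | 4 | .659 | .759 |
| | | | 5 | .168 | .171 |
| | | | 6 | .027 | .027 |
| | | | 7 | .002 | .002 |
| | 20 | 75 | 8 | .419 | .465 |
| | | | 9 | .124 | .127 |
| | | | 10 | .026 | .026 |

BTLB and BTUB are Bonferroni-type lower and upper bounds.

Inequalities for $E(S_{n,m})$

| $n$ | $m$ | $a$ | LB | UB |
|---|---|---|---|---|
| 100 | 10 | 5 | 2.05 | 2.10 |
| | | 10 | 3.09 | 3.17 |
| | 20 | 10 | 4.28 | 4.33 |
| | | 20 | 7.06 | 7.14 |
| 500 | 10 | 25 | 2.89 | 3.04 |
| | | 50 | 4.02 | 4.28 |
| | 20 | 50 | 5.56 | 5.93 |
| | | 75 | 7.19 | 7.62 |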
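Returning to the recurrence formula of Section 3.2, the computation of $P(L_n \ge k)$ and of the expected longest run (the ARL of Section 3.2.2) can be sketched in a few lines. This is a minimal illustration assuming the recurrence $q_n = \sum_{j=1}^{k} p^{j-1}(1-p)\, q_{n-j}$ with $q_0 = \dots = q_{k-1} = 1$; the function names are hypothetical and are not taken from the talk.

```python
def prob_no_run(n, k, p):
    """q_n = P(L_n < k): no run of k or more successes in n Bernoulli(p) trials."""
    q = [1.0] * k  # q_0 = q_1 = ... = q_{k-1} = 1
    for t in range(k, n + 1):
        # Condition on the position j of the first failure (j = 1, ..., k).
        q.append(sum(p ** (j - 1) * (1 - p) * q[t - j] for j in range(1, k + 1)))
    return q[n]


def prob_run(n, k, p):
    """P(L_n >= k): at least one run of k or more successes in n trials."""
    return 1.0 - prob_no_run(n, k, p)


def expected_longest_run(n, p):
    """E(L_n) = sum over k = 1..n of P(L_n >= k)."""
    return sum(prob_run(n, k, p) for k in range(1, n + 1))


if __name__ == "__main__":
    p_hit = 1 - 0.7 ** 5  # at least one hit in 5 at-bats for a .300 hitter
    print(prob_run(162, 25, p_hit))
    print(expected_longest_run(150, 1 - 0.667 ** 4))
```

The two printed values can be compared with the figures quoted in Sections 3.2.1 and 3.2.3. The cost is roughly $n \cdot k$ operations per probability, so the baseball examples run essentially instantly.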
