The Chernoff bound. The Chernoff bound is a powerful tool that relies on the transform associated with a random variable, and provides bounds on the probabilities of certain tail events.

(a) Show that the inequality
\[ P(X \ge a) \le e^{-sa} M(s) \]
holds for every $a$ and every $s \ge 0$, where $M(s) = E[e^{sX}]$ is the transform associated with the random variable $X$, assumed to be finite in a small open interval containing $s = 0$.

(b) Show that the inequality $P(X \le a) \le e^{-sa} M(s)$ holds for every $a$ and every $s \le 0$.

(c) Show that the inequality
\[ P(X \ge a) \le e^{-\phi(a)} \]
holds for every $a$, where
\[ \phi(a) = \max_{s \ge 0} \bigl( sa - \ln M(s) \bigr). \]

(d) Show that if $a > E[X]$, then $\phi(a) > 0$.

(e) Apply the result of part (c) to obtain a bound for $P(X \ge a)$, for the case where $X$ is a standard normal random variable and $a > 0$.

(f) Let $X_1, X_2, \ldots$ be independent random variables with the same distribution as $X$. Show that for any $a > E[X]$, we have
\[ P\left( \frac{1}{n} \sum_{i=1}^{n} X_i \ge a \right) \le e^{-n\phi(a)}, \]
so that the probability that the sample mean exceeds the mean by a certain amount decreases exponentially with $n$.

Solution. (a) Given some $a$ and $s \ge 0$, consider the random variable $Y_a$ defined by
\[ Y_a = \begin{cases} e^{sa}, & \text{if } X \ge a, \\ 0, & \text{if } X < a. \end{cases} \]
It is seen that the relation $Y_a \le e^{sX}$ always holds and therefore,
\[ E[Y_a] \le E[e^{sX}] = M(s). \]
On the other hand,
\[ E[Y_a] = e^{sa} P(X \ge a), \]
from which we obtain
\[ P(X \ge a) \le e^{-sa} M(s). \]

(b) The argument is similar to the one for part (a). We define $Y_a$ by
\[ Y_a = \begin{cases} e^{sa}, & \text{if } X \le a, \\ 0, & \text{if } X > a. \end{cases} \]
Since $s \le 0$, the relation $Y_a \le e^{sX}$ always holds and therefore,
\[ E[Y_a] \le E[e^{sX}] = M(s). \]
On the other hand,
\[ E[Y_a] = e^{sa} P(X \le a), \]
from which we obtain
\[ P(X \le a) \le e^{-sa} M(s). \]

(c) Since the inequality from part (a) is valid for every $s \ge 0$, we obtain
\[ P(X \ge a) \le \min_{s \ge 0} \bigl( e^{-sa} M(s) \bigr) = \min_{s \ge 0} e^{-(sa - \ln M(s))} = e^{-\max_{s \ge 0} (sa - \ln M(s))} = e^{-\phi(a)}. \]

(d) For $s = 0$, we have
\[ sa - \ln M(s) = 0 - \ln 1 = 0, \]
where we have used the generic property $M(0) = 1$ of transforms. Furthermore,
\[ \frac{d}{ds} \bigl( sa - \ln M(s) \bigr) \Big|_{s=0} = a - \frac{1}{M(s)} \cdot \frac{d}{ds} M(s) \Big|_{s=0} = a - E[X] > 0. \]
Since the function $sa - \ln M(s)$ is zero and has a positive derivative at $s = 0$, it must be positive when $s$ is positive and small. It follows that the maximum $\phi(a)$ of the function $sa - \ln M(s)$ over all $s \ge 0$ is also positive.

(e) For a standard normal random variable $X$, we have $M(s) = e^{s^2/2}$. Therefore, $sa - \ln M(s) = sa - s^2/2$. To maximize this expression over all $s \ge 0$, we form the derivative, which is $a - s$, and set it to zero, resulting in $s = a$. Thus, $\phi(a) = a^2/2$, which leads to the bound
\[ P(X \ge a) \le e^{-a^2/2}. \]
Note: In the case where $a \le E[X]$, the maximizing value of $s$ turns out to be $s = 0$, resulting in $\phi(a) = 0$ and in the uninteresting bound $P(X \ge a) \le 1$.

(f) Let $Y = X_1 + \cdots + X_n$. Using the result of part (c), we have
\[ P\left( \frac{1}{n} \sum_{i=1}^{n} X_i \ge a \right) = P(Y \ge na) \le e^{-\phi_Y(na)}, \]
where
\[ \phi_Y(na) = \max_{s \ge 0} \bigl( sna - \ln M_Y(s) \bigr), \]
and $M_Y(s) = \bigl( M(s) \bigr)^n$ is the transform associated with $Y$. We have $\ln M_Y(s) = n \ln M(s)$, from which we obtain
\[ \phi_Y(na) = n \cdot \max_{s \ge 0} \bigl( sa - \ln M(s) \bigr) = n \phi(a), \]
and
\[ P\left( \frac{1}{n} \sum_{i=1}^{n} X_i \ge a \right) \le e^{-n\phi(a)}. \]
Note that when $a > E[X]$, part (d) asserts that $\phi(a) > 0$, so the probability of interest decreases exponentially with $n$.
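As a quick numerical sanity check (not part of the original solution), the Python sketch below compares the standard-normal bound $P(X \ge a) \le e^{-a^2/2}$ from part (e) with a Monte Carlo estimate of the true tail probability, and illustrates the exponential decay in $n$ from part (f). The helper names `normal_tail` and `sample_mean_tail` are hypothetical, introduced only for this illustration.

```python
import math
import random

def normal_tail(a, trials=200_000, seed=0):
    """Monte Carlo estimate of P(X >= a) for X ~ N(0, 1)."""
    rng = random.Random(seed)
    return sum(rng.gauss(0, 1) >= a for _ in range(trials)) / trials

a = 1.5
chernoff = math.exp(-a**2 / 2)   # e^{-phi(a)} with phi(a) = a^2 / 2, part (e)
estimate = normal_tail(a)        # should be well below the bound
print(f"Chernoff bound: {chernoff:.4f}")
print(f"tail estimate:  {estimate:.4f}")

# Part (f): P((X1 + ... + Xn)/n >= a) <= e^{-n phi(a)} decays exponentially in n.
def sample_mean_tail(n, a, trials=20_000, seed=1):
    """Monte Carlo estimate of P(sample mean of n std normals >= a)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        hits += (mean >= a)
    return hits / trials

for n in (1, 5, 10):
    bound = math.exp(-n * a**2 / 2)
    print(f"n={n:2d}: bound={bound:.2e}, estimate={sample_mean_tail(n, a):.2e}")
```

The Chernoff bound is loose in absolute terms (for $a = 1.5$ it gives about $0.32$ against a true tail of about $0.07$), but the part (f) experiment shows the useful feature: the bound, and the actual tail probability, shrink exponentially as $n$ grows.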

Tuesday, January 10th
• Statistics: a field of study; the science of learning from data
  o What are data?
    § Information – could be numbers (weight, income) or words (gender, race)
  o How do we learn from data?
    § Involves analyzing data and making inferences
• Example 1: Aspirin and Heart Attacks
  Harvard Medical School conducted a study to investigate whether regular aspirin intake reduces heart attacks. Half of the total 22,000 male