Chapter 2, Problem 37

We want to design a spam filter for email. As described in Exercise 1, a major strategy is to find phrases that are much more likely to appear in a spam email than in a non-spam email. In that exercise, we only considered one such phrase: "free money". More realistically, suppose that we have created a list of 100 words or phrases that are much more likely to be used in spam than in non-spam.

Let $W_j$ be the event that an email contains the $j$th word or phrase on the list. Let $p = P(\text{spam})$, $p_j = P(W_j \mid \text{spam})$, and $r_j = P(W_j \mid \text{not spam})$, where "spam" is shorthand for the event that the email is spam. Assume that $W_1, \dots, W_{100}$ are conditionally independent given spam, and also conditionally independent given not spam. A method for classifying emails (or other objects) based on this kind of assumption is called a naive Bayes classifier. (Here "naive" refers to the fact that the conditional independence is a strong assumption, not to Bayes being naive. The assumption may or may not be realistic, but naive Bayes classifiers sometimes work well in practice even if the assumption is not realistic.)

Under this assumption we know, for example, that
$$P(W_1, W_2, W_3^c, W_4^c, \dots, W_{100}^c \mid \text{spam}) = p_1 p_2 (1 - p_3)(1 - p_4) \cdots (1 - p_{100}).$$
Without the naive Bayes assumption, there would be vastly more statistical and computational difficulties, since we would need to consider $2^{100} \approx 1.3 \times 10^{30}$ events of the form $A_1 \cap A_2 \cap \cdots \cap A_{100}$, with each $A_j$ equal to either $W_j$ or $W_j^c$.

A new email has just arrived, and it includes the 23rd, 64th, and 65th words or phrases on the list (but not the other 97). So we want to compute
$$P(\text{spam} \mid W_1^c, \dots, W_{22}^c, W_{23}, W_{24}^c, \dots, W_{63}^c, W_{64}, W_{65}, W_{66}^c, \dots, W_{100}^c).$$
Note that we need to condition on all the evidence, not just the fact that $W_{23} \cap W_{64} \cap W_{65}$ occurred. Find the conditional probability that the new email is spam (in terms of $p$ and the $p_j$ and $r_j$).
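
A worked sketch (not an official solution): write $E$ for the full evidence above and $S = \{23, 64, 65\}$. Bayes' rule, with the law of total probability in the denominator and the conditional independence assumption used to factor each likelihood, gives
$$P(\text{spam} \mid E) = \frac{p\, p_{23} p_{64} p_{65} \prod_{j \notin S} (1 - p_j)}{p\, p_{23} p_{64} p_{65} \prod_{j \notin S} (1 - p_j) + (1 - p)\, r_{23} r_{64} r_{65} \prod_{j \notin S} (1 - r_j)}.$$
The same formula can be evaluated numerically. The Python below is a minimal sketch, assuming $0 < p, p_j, r_j < 1$; the function name `posterior_spam` and the numbers in the demo are hypothetical, chosen only for illustration. It sums log-probabilities instead of multiplying 100 factors directly, which avoids floating-point underflow, and then turns the log-odds back into a probability with a numerically stable logistic step.

```python
import math
from typing import AbstractSet, Sequence


def posterior_spam(p: float,
                   p_word: Sequence[float],
                   r_word: Sequence[float],
                   present: AbstractSet[int]) -> float:
    """Naive Bayes posterior P(spam | full presence/absence pattern).

    p       : prior P(spam), assumed strictly between 0 and 1
    p_word  : p_word[j-1] = P(W_j | spam) for j = 1..n
    r_word  : r_word[j-1] = P(W_j | not spam) for j = 1..n
    present : 1-based indices j of listed words found in the email;
              every other listed word is treated as absent (W_j^c).
    """
    if len(p_word) != len(r_word):
        raise ValueError("p_word and r_word must have the same length")
    # Log of prior times likelihood under each hypothesis.
    log_spam = math.log(p)
    log_ham = math.log1p(-p)
    for j, (pj, rj) in enumerate(zip(p_word, r_word), start=1):
        if j in present:
            log_spam += math.log(pj)   # factor p_j
            log_ham += math.log(rj)    # factor r_j
        else:
            log_spam += math.log1p(-pj)  # factor 1 - p_j
            log_ham += math.log1p(-rj)   # factor 1 - r_j
    # P(spam | E) = 1 / (1 + exp(d)) with d = log_ham - log_spam;
    # branch on the sign of d so math.exp never overflows.
    d = log_ham - log_spam
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


if __name__ == "__main__":
    # Hypothetical values, not from the problem: prior 0.1, and every
    # listed phrase has P(W_j | spam) = 0.2 and P(W_j | not spam) = 0.05.
    n = 100
    print(posterior_spam(0.1, [0.2] * n, [0.05] * n, present={23, 64, 65}))
```

Working in log space does not change the answer; it only keeps the computation stable when many small probabilities are multiplied together.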
