CSCE 822 Data Mining & Knowledge Discovery
Information as a Measure [1]
Juan E. Vargas, University of South Carolina

Several books were written in the late 1950s to formally define the concept of information as a measure of surprise (or the lack of it), or as the uncertainty of outcomes. These books were inspired by earlier work by Shannon and Wiener, who independently arrived at the same expression for average information.

Let X be a random variable associated with a sample space \Omega having n mutually exclusive events, such that E = {e_1, e_2, e_3, ..., e_n} = \Omega and P = {p_1, p_2, p_3, ..., p_n} with \sum_k p_k = 1. Let E[X] be a function such that, if the experiment is repeated many times, the averages of X approach E[X]. Shannon and Wiener suggested the expression below to quantify the average uncertainty (chaos, disorder, entropy) associated with a complete sample space \Omega:

    H(X) = -\sum_{i=1}^{n} p_i \log p_i

[1] A "measure" is a rather precise definition involving such things as \sigma-algebras, which makes it difficult to understand for non-mathematicians such as myself. All we need to know here is that this and other definitions form the basis for much of what is known as Mathematical Analysis. For example, every definition of an integral is based on a particular measure. The study of measures and their application to integration is known as Measure Theory.

For each event e_k there is a value or quantity x_k such that

    x_k = -\log P(e_k) = -\log p_k

The term -\log p_k is called the amount of self-information associated with the event e_k. The unit of information, called a bit (binary unit), is equivalent to the amount of information associated with selecting one event out of two equally likely events.

The average amount of information, called entropy, for a sample space \Omega of n equally probable events E_k is

    H(E) = -\sum_{k=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n

If we have a fair coin, so that p_H = p_T = 1/2, then

    H(E) = H(E_1, E_2) = -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{2}\log\tfrac{1}{2} = \log 2 = 1 bit

Note that I(E_1) = I(E_2) = -\log(1/2) = 1 bit. Extending this example, if we have a sample space \Omega with 2^N equally probable events E_k, k = 1, 2, ..., 2^N, then

    I(E_k) = -\log p_k = \log 2^N = N bits

Example: E_a = {A_1, A_2} with P = {1/256, 255/256} gives H(A) \approx 0.0369 bit, while E_b = {B_1, B_2} with P = {1/2, 1/2} gives H(B) = 1 bit, suggesting that it is easier to guess the value of A_k than to guess the value of B_k.

The measure H complies with the following axioms:

1. Continuity: if the probabilities of the events change, the entropy associated with the system changes accordingly.
2. Symmetry: H is invariant to the order of the events, i.e., H(p_1, p_2, ..., p_n) = H(p_2, p_1, ..., p_n).
3. Extremal value: H is largest when all events are equally likely, because it is then most uncertain which event will occur.
4. Additivity: H_2 = H_1 + p_m H_m when the m-th event is a composition of other events.

The Shannon-Wiener formulation for entropy gained popularity due to its simplicity and its axiomatic properties. To illustrate, consider an earlier definition of information due to R. A. Fisher, who essentially defined it as an average second moment in a sample distribution with density f(x) and mean m:

    I = \int \left( \frac{\partial \ln f(x)}{\partial m} \right)^2 f(x)\, dx

Thus, for example, expressing the normal distribution

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-m)^2}{2\sigma^2} \right)

in logarithmic form, differentiating with respect to the mean, and taking the integral:

    \ln f(x) = -\frac{(x-m)^2}{2\sigma^2} - \ln(\sigma\sqrt{2\pi})

    \frac{\partial \ln f(x)}{\partial m} = \frac{x-m}{\sigma^2}

    I = \int \left( \frac{x-m}{\sigma^2} \right)^2 \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-m)^2}{2\sigma^2} \right) dx = \frac{1}{\sigma^2}
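The entropy values quoted in the coin and two-symbol examples above are easy to verify numerically. The following is a minimal Python sketch (not part of the original notes; the function name entropy and the test values are mine) that evaluates H = -\sum_i p_i \log_2 p_i for a discrete distribution:

    import math

    def entropy(probs):
        """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
        assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
        # By convention 0 * log 0 = 0, so zero-probability events are skipped.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))          # fair coin: 1.0 bit
    print(entropy([1/256, 255/256]))    # H(A): approx. 0.0369 bit
    print(entropy([1/8] * 8))           # 2^3 equally likely events: 3.0 bits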
The Shannon-Wiener expression for information can be generalized to 2-dimensional probability schemes and, by induction, to any n-dimensional probability scheme. Let \Omega_1 and \Omega_2 be two discrete sample spaces with event sets E = {E_1, E_2, ..., E_n} and F = {F_1, F_2, ..., F_m}. We then have three complete probability schemes: P(E) = {P(E_k)}, P(F) = {P(F_j)}, and P(EF) = {P(E_k F_j)}.

The joint probability matrix is given by

    P(X,Y) = [ p_11  p_12  ...  p_1m
               p_21  p_22  ...  p_2m
               ...
               p_n1  p_n2  ...  p_nm ]

We can obtain the marginal probabilities for each variable as

    p(x_k) = \sum_{j=1}^{m} p(x_k, y_j),   e.g.  P(E_1) = P(E_1F_1 \cup E_1F_2 \cup ... \cup E_1F_m) = p_11 + p_12 + ... + p_1m

and

    p(y_j) = \sum_{k=1}^{n} p(x_k, y_j)

From the matrix we can also compute the joint and marginal entropies H(X,Y), H(X), H(Y):

    H(X,Y) = -\sum_{k=1}^{n} \sum_{j=1}^{m} p(x_k, y_j) \log p(x_k, y_j)

    H(X) = -\sum_{k=1}^{n} \left[ \sum_{j=1}^{m} p(x_k, y_j) \right] \log \left[ \sum_{j=1}^{m} p(x_k, y_j) \right] = -\sum_{k=1}^{n} p(x_k) \log p(x_k)

    H(Y) = -\sum_{j=1}^{m} \left[ \sum_{k=1}^{n} p(x_k, y_j) \right] \log \left[ \sum_{k=1}^{n} p(x_k, y_j) \right] = -\sum_{j=1}^{m} p(y_j) \log p(y_j)

Note that to obtain H(X) and H(Y) we must find the corresponding marginals p(x_k) and p(y_j) first.

To better understand the calculations involved in H(X,Y) versus H(X) and H(Y), let m = n = 3. Then

    H(X,Y) = -\sum_{k=1}^{3} \sum_{j=1}^{3} p(x_k, y_j) \log p(x_k, y_j)
           = -p_11 \log p_11 - p_12 \log p_12 - p_13 \log p_13
             -p_21 \log p_21 - p_22 \log p_22 - p_23 \log p_23
             -p_31 \log p_31 - p_32 \log p_32 - p_33 \log p_33

while

    H(X) = -\sum_{k=1}^{3} p(x_k) \log p(x_k)
         = -(p_11 + p_12 + p_13) \log(p_11 + p_12 + p_13)
           -(p_21 + p_22 + p_23) \log(p_21 + p_22 + p_23)
           -(p_31 + p_32 + p_33) \log(p_31 + p_32 + p_33)

We can also compute the conditional entropies. Due to the addition theorem of probability, the events E_k are exhaustive, so \sum_k P(E_k) = 1. Marginalizing over E_k and applying Bayes' theorem, p(x, y) = p(x | y) p(y) = p(y | x) p(x), we have

    p(x_k | y_j) = p(x_k, y_j) / p(y_j)

where p(y_j) is the j-th marginal, and so

    H(X | Y) = -\sum_{j=1}^{m} \sum_{k=1}^{n} p(x_k, y_j) \log p(x_k | y_j)

    H(Y | X) = -\sum_{k=1}^{n} \sum_{j=1}^{m} p(x_k, y_j) \log p(y_j | x_k)

Note again that the two equations imply that the marginals must be computed first. From Bayes' theorem p(x,y) = p(x|y) p(y) = p(y|x) p(x) we can write

    H(X,Y) = H(X|Y) + H(Y) = H(Y|X) + H(X)

Example: Two honest dice X and Y are thrown. Compute H(X,Y), H(X), H(Y), H(X|Y), and H(Y|X). The joint probability table is

    X \ Y     1     2     3     4     5     6    p(x)
    1       1/36  1/36  1/36  1/36  1/36  1/36   1/6
    2       1/36  1/36  1/36  1/36  1/36  1/36   1/6
    3       1/36  1/36  1/36  1/36  1/36  1/36   1/6
    4       1/36  1/36  1/36  1/36  1/36  1/36   1/6
    5       1/36  1/36  1/36  1/36  1/36  1/36   1/6
    6       1/36  1/36  1/36  1/36  1/36  1/36   1/6
    p(y)     1/6   1/6   1/6   1/6   1/6   1/6   1.0

The entropies can be calculated from the table:

    H(X,Y) = -\sum_{x=1}^{6} \sum_{y=1}^{6} \tfrac{1}{36} \log \tfrac{1}{36} = \log 36 \approx 5.17 bits

    H(X) = H(Y) = -\sum \tfrac{1}{6} \log \tfrac{1}{6} = \log 6 \approx 2.58 bits

    H(X|Y) = H(Y|X) = -\sum \sum \tfrac{1}{36} \log \tfrac{1}{6} = \log 6 \approx 2.58 bits
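To make the order of operations (joint table, then marginals, then conditionals) concrete, here is a small Python sketch (mine, not from the original notes; helper names such as joint_entropy and marginals are assumptions) that reproduces the two-dice numbers above:

    import math

    def log2(p):
        return math.log2(p) if p > 0 else 0.0   # convention: 0 log 0 = 0

    def joint_entropy(P):
        """H(X,Y) for a joint table P[k][j]."""
        return -sum(p * log2(p) for row in P for p in row)

    def marginals(P):
        """Return (p_x, p_y): row sums and column sums of the joint table."""
        p_x = [sum(row) for row in P]
        p_y = [sum(col) for col in zip(*P)]
        return p_x, p_y

    def conditional_entropy_x_given_y(P):
        """H(X|Y) = -sum_kj p(x_k, y_j) log p(x_k | y_j); needs the marginal p(y_j) first."""
        _, p_y = marginals(P)
        return -sum(P[k][j] * log2(P[k][j] / p_y[j])
                    for k in range(len(P)) for j in range(len(P[0])) if P[k][j] > 0)

    # Two honest dice: every joint cell is 1/36.
    dice = [[1/36] * 6 for _ in range(6)]
    p_x, p_y = marginals(dice)
    print(joint_entropy(dice))                  # log2(36), approx. 5.17 bits
    print(-sum(p * log2(p) for p in p_x))       # H(X) = log2(6), approx. 2.58 bits
    print(conditional_entropy_x_given_y(dice))  # H(X|Y), approx. 2.58 bits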
A Measure of Mutual Information

We would like to formulate a measure of the mutual information between two symbols x_i, y_j. Solomon Kullback wrote in 1958 a book on the study of logarithmic measures of information and their application to the testing of statistical hypotheses, such as determining whether two independent random samples were drawn from the same population, or whether the samples are conditionally independent, etc.

Let H_i, i = 1, 2, be the hypothesis that X is from a population with probability measure \mu_i and density f_i. Applying Bayes' theorem,

    P(H_i | x) = \frac{P(H_i) f_i(x)}{P(H_1) f_1(x) + P(H_2) f_2(x)},   for i = 1, 2

Expanding P(H_i | x) for i = 1, 2, solving for f_1 and f_2, and simplifying,

    \frac{f_1(x)}{f_2(x)} = \frac{P(H_1 | x)\, P(H_2)}{P(H_2 | x)\, P(H_1)}

Taking the log we obtain

    \log \frac{f_1(x)}{f_2(x)} = \log \frac{P(H_1 | x)}{P(H_2 | x)} - \log \frac{P(H_1)}{P(H_2)}

The right-hand side is a measure of the difference between the odds in favor of H_1 after the observation X = x and before the observation. Kullback defined this expression as the information in X = x for discriminating in favor of H_1 against H_2. The mean information is the integral of this expression, written as

    I(1:2) = E_1\left[ \log \frac{f_1(x)}{f_2(x)} \right] = \int f_1(x) \log \frac{f_1(x)}{f_2(x)}\, dx = \int f_1(x) \log \frac{P(H_1|x)}{P(H_2|x)}\, dx - \log \frac{P(H_1)}{P(H_2)}

Generalizing to k-dimensional Euclidean spaces, for the two-dimensional case with elements (X, Y) the mutual information between X and Y is given by

    I(X;Y) = \int\!\!\int f(x,y) \log \frac{f(x,y)}{g(x)\, h(y)}\, dx\, dy

We can think of the pair (X, Y) as the signals that a transmitter X sends to a receiver Y. At the transmitter, p(x_i) conveys the prior for each signal being sent, while at the receiver, p(x_i | y_j) is the probability that x_i was sent given that y_j was received. Therefore the gain in information has to involve the ratio of the final and the initial ignorance, p(x_i | y_j) / p(x_i).

Let X = {x_1, x_2, ..., x_n} with \sum_x p(x) = 1 and Y = {y_1, y_2, ..., y_m} with \sum_y p(y) = 1. We can rewrite the mutual information I(X;Y) for the discrete case as

    I(X;Y) = \sum_{k=1}^{n} \sum_{j=1}^{m} p(x_k, y_j) \log \frac{p(x_k, y_j)}{p(x_k)\, p(y_j)}

Using p(x,y) = p(x|y) p(y) = p(y|x) p(x), we can also write I(X;Y) as

    I(X;Y) = \sum_{k=1}^{n} \sum_{j=1}^{m} p(x_k, y_j) \log \frac{p(x_k | y_j)}{p(x_k)}

We can also write I(X;Y) as expressions involving entropy:

    I(X;Y) = H(X) + H(Y) - H(X,Y)
    I(X;Y) = H(X) - H(X|Y)
    I(Y;X) = H(Y) - H(Y|X)

Example: Compute I(X;Y) for a transmitter with an alphabet of 5 signals {x_1, x_2, x_3, x_4, x_5} and a receiver with 4 signals {y_1, y_2, y_3, y_4}. The joint probability table (JPT), shown with a system graph in the original, is

           y1     y2     y3     y4
    x1    0.25    0      0      0
    x2    0.10   0.30    0      0
    x3     0     0.05   0.10    0
    x4     0      0     0.05   0.10
    x5     0      0     0.05    0

The marginals are

    f(x1) = 0.25                  g(y1) = 0.25 + 0.10 = 0.35
    f(x2) = 0.10 + 0.30 = 0.40    g(y2) = 0.30 + 0.05 = 0.35
    f(x3) = 0.05 + 0.10 = 0.15    g(y3) = 0.10 + 0.05 + 0.05 = 0.20
    f(x4) = 0.05 + 0.10 = 0.15    g(y4) = 0.10
    f(x5) = 0.05

and the non-zero conditional probabilities are

    p(x1|y1) = p(x1,y1)/g(y1) = 0.25/0.35 = 5/7    p(y1|x1) = p(x1,y1)/f(x1) = 0.25/0.25 = 1.0
    p(x2|y2) = 0.30/0.35 = 6/7                     p(y2|x2) = 0.30/0.40 = 0.75
    p(x3|y3) = 0.10/0.20 = 0.5                     p(y3|x3) = 0.10/0.15 = 2/3
    p(x4|y4) = 0.10/0.10 = 1.0                     p(y4|x4) = 0.10/0.15 = 2/3
    p(x2|y1) = 0.10/0.35 = 2/7                     p(y1|x2) = 0.10/0.40 = 1/4
    p(x3|y2) = 0.05/0.35 = 1/7                     p(y2|x3) = 0.05/0.15 = 1/3
    p(x4|y3) = 0.05/0.20 = 1/4                     p(y3|x4) = 0.05/0.15 = 1/3
    p(x5|y3) = 0.05/0.20 = 1/4                     p(y3|x5) = 0.05/0.05 = 1.0

Then

    H(X,Y) = -\sum_x \sum_y p(x,y) \log p(x,y)
           = -(0.25 \log 0.25 + 0.10 \log 0.10 + 0.30 \log 0.30 + 0.05 \log 0.05
              + 0.10 \log 0.10 + 0.05 \log 0.05 + 0.10 \log 0.10 + 0.05 \log 0.05)
           \approx 2.665 bits

and so on. (Note: \log_2 N = \log_{10} N / 0.3010.) Likewise the calculations for H(X), H(Y), H(X|Y), and H(Y|X) can be performed. Given these, we can assess whether X and Y are independent variables.

Another interesting question is: where do probabilities come from, and how can we use them to create a Bayesian network? To answer these questions, consider the two data sets below, S and C, which were sampled from a database related to the famous "Chest Clinic" example. The variables S and C represent instantiations of Smoking (Y/N) and Cancer (Y/N):

    S: 1 1 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 0 ...
    C: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ...

The joint probability table for the sample is approximately

            C = 0    C = 1
    S = 0    0.55     0.00
    S = 1    0.41     0.04

with 0.55 + 0.41 + 0.00 + 0.04 = 1.0. From this table,

    H(S,C) = -(0.55 \log 0.55 + 0.41 \log 0.41 + 0.04 \log 0.04) \approx 1.1875

    H(S) = -\sum_s \sum_c p(s,c) \log p(s) = -(0.55 \log 0.55 + 0.41 \log 0.45 + 0.04 \log 0.45) \approx 0.9928

    H(C) = -\sum_s \sum_c p(s,c) \log p(c) = -(0.55 \log 0.96 + 0.41 \log 0.96 + 0.04 \log 0.04) \approx 0.2423

    H(S|C) = -\sum_s \sum_c p(s,c) \log p(s|c) = -(0.55 \log \tfrac{0.55}{0.96} + 0.41 \log \tfrac{0.41}{0.96} + 0.04 \log \tfrac{0.04}{0.04}) \approx 0.9452

    H(C|S) = -\sum_s \sum_c p(s,c) \log p(c|s) = -(0.55 \log \tfrac{0.55}{0.55} + 0.41 \log \tfrac{0.41}{0.45} + 0.04 \log \tfrac{0.04}{0.45}) \approx 0.1947

    H(S,C) = H(S) + H(C|S) = H(C) + H(S|C) = 0.9928 + 0.1947 = 0.2423 + 0.9452 \approx 1.1875

    I(S;C) = H(S) + H(C) - H(S,C) = 0.9928 + 0.2423 - 1.1875 \approx 0.048
    I(S;C) = H(S) - H(S|C) = 0.9928 - 0.9452 \approx 0.048
    I(C;S) = H(C) - H(C|S) = 0.2423 - 0.1947 \approx 0.048

Note that we could also have calculated I(C;S) by inverting the order of i and j in the summations. All we want is to assess conditional dependency.
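As a cross-check of the Smoking/Cancer numbers above, here is a small Python sketch (mine, not from the original notes) that computes the entropies and the mutual information directly from the 2 x 2 joint table:

    import math

    def log2(p):
        return math.log2(p) if p > 0 else 0.0   # convention: 0 log 0 = 0

    # Joint table P(S, C): rows S = 0, 1; columns C = 0, 1.
    P = [[0.55, 0.00],
         [0.41, 0.04]]

    p_s = [sum(row) for row in P]            # marginal P(S) = [0.55, 0.45]
    p_c = [sum(col) for col in zip(*P)]      # marginal P(C) = [0.96, 0.04]

    H_SC = -sum(p * log2(p) for row in P for p in row)   # approx. 1.188
    H_S  = -sum(p * log2(p) for p in p_s)                 # approx. 0.993
    H_C  = -sum(p * log2(p) for p in p_c)                 # approx. 0.242
    H_S_given_C = H_SC - H_C                              # approx. 0.945
    H_C_given_S = H_SC - H_S                              # approx. 0.195
    I_SC = H_S + H_C - H_SC                               # approx. 0.048

    print(H_SC, H_S, H_C, H_S_given_C, H_C_given_S, I_SC)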
The fact that H(S|C) \approx 0.945 >> H(C|S) \approx 0.195 indicates that there is less uncertainty (surprise) regarding C when S is known, and therefore a Bayesian network involving the two variables carries more information when this relationship is represented as the directed edge S -> C. The conditional probability table associated with the edge should therefore be

    P(C|S)         C = 0                C = 1
    S = 0    0.55/0.55 = 1.0      0.00/0.55 = 0.0
    S = 1    0.41/0.45 = 0.911    0.04/0.45 = 0.089

The next question is: if we have a database with N variables, should we compute the mutual information for each of the N(N-1)/2 pairs? For N = 5 variables we only need 4 + 3 + 2 + 1 = 10 calculations of mutual information. However, when the number of variables is much larger, for example N = 10^3, we would need to find ways to reduce the number of computations.

Kullback also defined the mean information per observation for discriminating in favor of H_2 against H_1 as

    I(2:1) = \int f_2(x) \log \frac{f_2(x)}{f_1(x)}\, dx

and the divergence J(1,2) as

    J(1,2) = I(1:2) + I(2:1)
           = \int \left( f_1(x) - f_2(x) \right) \log \frac{f_1(x)}{f_2(x)}\, dx
           = \int f_1(x) \log \frac{f_1(x)}{f_2(x)}\, dx + \int f_2(x) \log \frac{f_2(x)}{f_1(x)}\, dx

J(1,2) is a measure of the divergence between H_1 and H_2, i.e., a measure of how difficult it is to discriminate between them. Kullback studied properties of these measures, including additivity, convexity, invariance, sufficiency, and minimum discrimination information, among others.
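For discrete distributions the divergence measures above are straightforward to evaluate. The following is a minimal Python sketch (mine, not from the original notes; the function names kl_divergence and j_divergence and the example distributions are assumptions) of I(1:2), I(2:1), and J(1,2) over a common finite set of outcomes:

    import math

    def kl_divergence(f1, f2):
        """I(1:2) = sum_x f1(x) * log2(f1(x) / f2(x)); assumes f2(x) > 0 wherever f1(x) > 0."""
        return sum(p1 * math.log2(p1 / p2) for p1, p2 in zip(f1, f2) if p1 > 0)

    def j_divergence(f1, f2):
        """Kullback's divergence J(1,2) = I(1:2) + I(2:1), symmetric in f1 and f2."""
        return kl_divergence(f1, f2) + kl_divergence(f2, f1)

    # Two hypothetical populations over the same three outcomes.
    f1 = [0.5, 0.3, 0.2]
    f2 = [0.2, 0.3, 0.5]

    print(kl_divergence(f1, f2))   # I(1:2)
    print(kl_divergence(f2, f1))   # I(2:1)
    print(j_divergence(f1, f2))    # J(1,2) = I(1:2) + I(2:1)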

