
# Artificial Intelligence Prog CS 662

USF



These 39 pages of class notes were uploaded by Michele Herzog on Thursday, October 29, 2015. The notes are for CS 662 at the University of San Francisco, taught by Christopher Brooks in the Fall. Since upload, they have received 14 views. For similar materials, see /class/231232/cs-662-university-of-san-francisco in Computer Science at the University of San Francisco.


Date Created: 10/29/15

## Artificial Intelligence Programming: Belief Networks

Chris Brooks, Department of Computer Science, University of San Francisco

#### 14.0 Probability Review

Probability allows us to represent a belief about a statement, or a likelihood that a statement is true. P(rain) = 0.6 means that we believe it is 60% likely that it is currently raining.

Axioms:

- 0 ≤ P(a) ≤ 1
- The probability of A ∨ B is P(A) + P(B) - P(A ∧ B)
- Tautologies have P = 1
- Contradictions have P = 0

#### 14.1 Probabilistic Inference

Previously, we talked about rule-based systems that can draw logical inferences:

- Hungry(Homer) ⇒ GoesTo(Homer, QuickieMart)
- Hungry(Homer)
- Therefore, GoesTo(Homer, QuickieMart)

We would like to extend this sort of operation to worlds with uncertainty:

- P(Hungry(Homer)) = 0.98
- P(GoesTo(Homer, QuickieMart) | Hungry(Homer)) = 0.5
- P(GoesTo(Homer, QuickieMart)) = ?

#### 14.2 Prior Probability

The prior probability of a proposition is its probability of taking on a value in the absence of any other information: P(Rain) = 0.1, P(Overcast) = 0.4, P(Sunny) = 0.5.

We can also list the probabilities of combinations of variables: P(Rain ∧ Humid) = 0.1, P(Rain ∧ ¬Humid) = 0.1, P(Overcast ∧ Humid) = 0.2, P(Overcast ∧ ¬Humid) = 0.2, P(Sunny ∧ Humid) = 0.15, P(Sunny ∧ ¬Humid) = 0.25. This is called a joint probability distribution. We can only enumerate discrete variables this way.

#### 14.3 Inference with the Joint Distribution

We can use the axioms of probability to reason with the joint distribution:

|       | Hum=High, Sky=Overcast | Hum=High, Sky=Sunny | Hum=Normal, Sky=Overcast | Hum=Normal, Sky=Sunny |
|-------|------------------------|---------------------|--------------------------|-----------------------|
| Rain  | 0.1                    | 0.05                | 0.15                     | 0.05                  |
| ¬Rain | 0.2                    | 0.15                | 0.1                      | 0.2                   |

P(Rain) = P(Rain ∧ Hum=High ∧ Sky=Overcast) + P(Rain ∧ Hum=Normal ∧ Sky=Overcast) + P(Rain ∧ Hum=High ∧ Sky=Sunny) + P(Rain ∧ Hum=Normal ∧ Sky=Sunny) = 0.1 + 0.15 + 0.05 + 0.05 = 0.35

We call this the marginal probability of rain.
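Marginalization is just a filtered sum over the joint table. A minimal sketch in Python; the dictionary layout and names here are one possible encoding of the table above, not anything from the notes:

```python
# Joint distribution P(Rain, Humidity, Sky) from the table above,
# keyed by (rain, humidity, sky). Entries sum to 1.
joint = {
    (True,  "high",   "overcast"): 0.10,
    (True,  "high",   "sunny"):    0.05,
    (True,  "normal", "overcast"): 0.15,
    (True,  "normal", "sunny"):    0.05,
    (False, "high",   "overcast"): 0.20,
    (False, "high",   "sunny"):    0.15,
    (False, "normal", "overcast"): 0.10,
    (False, "normal", "sunny"):    0.20,
}

def marginal_rain(joint):
    """Sum out Humidity and Sky: keep only entries where Rain is true."""
    return sum(p for (rain, _, _), p in joint.items() if rain)

print(round(marginal_rain(joint), 2))  # 0.35
```

The same pattern (sum over all entries consistent with the query) answers any marginal or conditional query against the full joint, which is exactly why the table's exponential size is the bottleneck discussed later.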
#### 14.4 Conditional Probability

Once we begin to make observations about the values of certain variables, our belief in other variables changes. Once we notice that it's cloudy, P(Rain) goes up; this is called conditional probability, written as P(Rain | Cloudy).

P(a | b) = P(a ∧ b) / P(b), or equivalently P(a ∧ b) = P(a | b) P(b). This is called the product rule.

#### 14.5 Conditional Probability Example

P(Cloudy) = 0.3, P(Rain) = 0.2, P(Cloudy ∧ Rain) = 0.15, P(Cloudy ∧ ¬Rain) = 0.15, P(¬Cloudy ∧ Rain) = 0.05, P(¬Cloudy ∧ ¬Rain) = 0.65.

Initially, P(Rain) = 0.2. Once we see that it's cloudy, P(Rain | Cloudy) = P(Rain ∧ Cloudy) / P(Cloudy) = 0.15 / 0.3 = 0.5.

#### 14.6 Bayes' Rule

Often we want to know how a probability changes as a result of an observation. Recall the product rule:

- P(a ∧ b) = P(a | b) P(b)
- P(a ∧ b) = P(b | a) P(a)

We can set these equal to each other: P(a | b) P(b) = P(b | a) P(a). Dividing by P(a) gives

P(b | a) = P(a | b) P(b) / P(a)

This equality is known as Bayes' theorem (or rule, or law).

#### 14.7 Bayes' Theorem Example

Say we know:

- Meningitis causes a stiff neck in 50% of patients: P(StiffNeck | Meningitis) = 0.5
- The prior probability of meningitis is 1/50,000: P(Meningitis) = 0.00002
- The prior probability of a stiff neck is 1/20: P(StiffNeck) = 0.05

A patient comes to us with a stiff neck. What is the probability she has meningitis?

P(Meningitis | StiffNeck) = P(StiffNeck | Meningitis) P(Meningitis) / P(StiffNeck) = (0.5 × 0.00002) / 0.05 = 0.0002

#### 14.8 Independence

If two variables do not influence each other, we say they are independent. Day of week and Rain are independent, so P(Rain | Monday) = P(Rain | Tuesday) = P(Rain).

Independence is great if we can get it. More commonly, we'll see variables that are conditionally independent. This means that two variables are independent given the observation of a third variable.

#### 14.9 Conditional Independence

As an example, consider diagnosing whether a patient has cancer.
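The meningitis arithmetic can be checked with a short sketch; the function name `bayes` is my own, not from the notes:

```python
def bayes(p_e_given_h, p_h, p_e):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Meningitis example: P(s|m) = 0.5, P(m) = 1/50,000, P(s) = 1/20.
posterior = bayes(0.5, 0.00002, 0.05)
print(round(posterior, 6))  # 0.0002
```

So a stiff neck raises the probability of meningitis tenfold (from 0.00002 to 0.0002), but the posterior is still tiny because the prior is so small.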
We know that cancer can cause lung tumors: P(Tumor | Cancer) > 0. We also know that cancer can produce a positive calcium serum test: P(Calcium | Cancer) > 0.

#### 14.10 Conditional Independence (cont.)

Initially, detecting a tumor will increase the probability of detecting calcium: Bayes' rule will increase P(Cancer | Tumor), which will in turn influence our belief in Calcium. However, once we know for sure whether Cancer is true or false, Tumor and Calcium cannot influence each other. This is conditional independence: two variables a and b are independent given the observation of a third variable c.

#### 14.11 Probabilistic Inference

In general, we can use Bayes' rule to do this. The problem is in working with the joint probability distribution, which is an exponentially large table. We need a data structure that captures the fact that many variables do not influence each other. For example, the color of Bart's hat does not influence whether Homer is hungry. We call this structure a Bayesian network, or a belief network.

#### 14.12 Bayesian Network

A Bayesian network is a directed graph in which each node is annotated with probability information. A network has:

- A set of random variables that correspond to the nodes in the network.
- A set of directed edges (arrows) connecting nodes. These represent influence: if there is an arrow from X to Y, then X is the parent of Y.
- At each node, a conditional probability distribution indicating the probability of each value it can take, conditioned on its parents' values.
- No cycles: it is a directed acyclic graph.

The topology of the network specifies which variables directly influence other variables, i.e., the conditional independence relationships.

#### 14.13 Burglary Example

Two neighbors will call when they hear your alarm.
John sometimes overreacts; Mary sometimes misses the alarm. Two things can set off the alarm: an earthquake or a burglary. Given who has called, what is the probability of a burglary?

#### 14.14 Network Structure

Each node has a conditional probability table (CPT). This gives the probability of each value of that node given its parents' values; each row sums to 1. Nodes with no parents just contain priors.

#### 14.15 Summarizing Uncertainty

Notice that we don't need to have nodes for all the reasons why Mary might not call. A probabilistic approach lets us summarize this information in P(MaryCalls | Alarm). This allows a small agent to deal with large worlds that have a large number of possibly uncertain outcomes. How would we handle this with logic?

#### 14.16 Implicitly Representing the Full Joint Distribution

Recall that the full joint distribution allows us to calculate the probability of any variable given all the others. Independent events can be separated into separate tables; these are the CPTs seen in the Bayesian network. Therefore, we can use this information to perform computations:

P(x1, ..., xn) = Π_i P(x_i | parents(X_i))

P(J ∧ M ∧ A ∧ ¬E ∧ ¬B) = P(J | A) P(M | A) P(A | ¬B ∧ ¬E) P(¬B) P(¬E) = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.00063

#### 14.17 Some Examples

- What is the probability that both Mary and John call, given that the alarm sounded? P(M | A) × P(J | A) = 0.90 × 0.70 = 0.63.
- What is the probability of a break-in, given that we hear an alarm? P(B | A) ∝ P(A | B) P(B) = 0.95 × 0.001.
- What is the probability of a break-in, given that both John and Mary called? P(B ∧ J ∧ M) = P(B ∧ J ∧ M ∧ A ∧ ¬E) + P(B ∧ J ∧ M ∧ A ∧ E) + P(B ∧ J ∧ M ∧ ¬A ∧ E) + P(B ∧ J ∧ M ∧ ¬A ∧ ¬E), summing out the hidden variables A and E.

This last example shows a form of inference.

#### 14.18 Constructing a Bayesian Network

There are often several ways to construct a Bayesian network; the knowledge engineer needs to discover the conditional independence relationships. The parents of a node should be those variables that directly influence its value:

- JohnCalls is influenced by Earthquake, but not directly.
- John and Mary calling don't influence each other.

Formally, we believe that P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm).

#### 14.19 Compactness and Node Ordering

Bayesian networks allow for a more compact representation of the domain, since redundant information is removed. Example: say we have 30 nodes, each with 5 parents. Each CPT will contain 2^5 = 32 numbers, for a total of 960. The full joint has 2^30 entries, nearly all of them redundant.

#### 14.20 Building a Network

Begin with root causes. Then add the variables they directly influence, then add their direct children. This is called a causal model: reasoning in terms of cause and effect. Estimates are much easier to come up with this way. We could try to build from effect to cause, but the network would be more complex and the CPTs hard to estimate (e.g., P(E | B)).

#### 14.21 Conditional Independence in Bayesian Networks

Recall that conditional independence means that two variables are independent of each other given the observation of a third variable: P(a ∧ b | c) = P(a | c) P(b | c).

- A node is conditionally independent of its nondescendants, given its parents. Given Alarm, JohnCalls is independent of Burglary and Earthquake.
- A node is conditionally independent of all other nodes, given its parents, children, and siblings (the children's other parents). Burglary is independent of JohnCalls, given Alarm and Earthquake.

#### 14.22-14.23 Conditional Independence: General Rules

- Two nodes are conditionally independent given their parents (JohnCalls and MaryCalls, given Alarm).
- A node is conditionally independent of its nondescendants, given its parents (JohnCalls is conditionally independent of Burglary, given Alarm).
- A node is conditionally independent of all other nodes, given its parents, children, and its children's other parents.

#### 14.24 Inference in Bayesian Networks

In most cases, we'll want to use a Bayesian network to tell us posterior probabilities: we observe a variable, and ask how other variables change. We can distinguish between query variables (things we want to know about) and evidence variables (things we can observe). There are also hidden variables, which influence query variables but are not directly observable. Here, JohnCalls and MaryCalls are evidence, Earthquake and Burglary are queries, and Alarm is hidden.

#### 14.25 Enumeration

We can rewrite the probability P(X | e) as

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

where y ranges over the hidden variables. This is equivalent to summing within the joint probability distribution. Consider P(Burglary | JohnCalls, MaryCalls) = α P(Burglary ∧ JohnCalls ∧ MaryCalls) = α Σ_E Σ_A P(B, E, A, J, M). For B = true, we must calculate

P(B | J, M) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(J | a) P(M | a)

#### 14.26 Enumeration (cont.)

Problem: a network with n nodes will require O(n 2^n) computations. We can simplify by bringing the priors outside the summation:

α P(B) Σ_e P(e) Σ_a P(a | B, e) P(J | a) P(M | a)

This still requires O(2^n) calculations.

#### 14.27 Enumeration (cont.)

(The slides show the evaluation tree for computing P(B | J, M).) Notice that there are lots of repeated computations in the tree. By eliminating repeated computations, we can speed things up.
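Both the chain-rule product and the enumeration query can be sketched directly on the burglary network. The notes quote only a few CPT entries (P(B) = 0.001, P(E) = 0.002, P(A | ¬B, ¬E) = 0.001, P(J | A) = 0.90, P(M | A) = 0.70); the remaining entries below (0.95, 0.94, 0.29, 0.05, 0.01) are assumptions, using the standard textbook values for this example:

```python
# Burglary network CPTs. Entries marked (assumed) are not in the notes.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95,     # P(A | B, E)    (assumed)
       (True, False): 0.94,    # P(A | B, ~E)   (assumed)
       (False, True): 0.29,    # P(A | ~B, E)   (assumed)
       (False, False): 0.001}  # P(A | ~B, ~E)  from the notes
P_J = {True: 0.90, False: 0.05}  # P(J | A); P(J | ~A) assumed
P_M = {True: 0.70, False: 0.01}  # P(M | A); P(M | ~A) assumed

def p(prob, value):
    """P(X = value) for a Boolean X with P(X = true) = prob."""
    return prob if value else 1.0 - prob

# Chain rule: P(j, m, a, ~e, ~b) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
entry = (P_J[True] * P_M[True] * P_A[(False, False)] *
         p(P_B, False) * p(P_E, False))
print(round(entry, 5))  # 0.00063

# Enumeration: P(B | j, m) = alpha * sum_e sum_a P(B)P(e)P(a|B,e)P(j|a)P(m|a)
def unnormalized(b):
    """P(B=b, j, m) before normalization, summing out E and A."""
    return sum(p(P_B, b) * p(P_E, e) * p(P_A[(b, e)], a) * P_J[a] * P_M[a]
               for e in (True, False) for a in (True, False))

t, f = unnormalized(True), unnormalized(False)
print(round(t / (t + f), 3))  # 0.284
```

With these assumed entries, a burglary is only about 28% likely even when both neighbors call, because burglaries are so rare a priori.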
#### 14.28 Variable Elimination

We can reduce the number of calculations by caching results and reusing them, evaluating the expression from right to left. Summations are done only for nodes influenced by the variable being summed. Consider

P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

The equation has been separated into factors. Once we compute P(m | a) and P(m | ¬a), we cache those results in a matrix f_M; similarly for P(j | a) and P(j | ¬a), stored in f_J.

#### 14.29 Variable Elimination (cont.)

P(A | B, E) will result in a 2 × 2 × 2 matrix, stored in f_A. We then need to compute the sum over the different possible values of A: Σ_a f_A(a, B, E) f_J(a) f_M(a). We can process E the same way. In essence, we're doing dynamic programming here, exploiting the same memoization process.

Complexity:

- Polytrees (at most one undirected path between any two nodes): linear in the number of nodes.
- Multiply-connected networks: exponential.

#### 14.30 Scaling Up

Many techniques have been developed to allow Bayesian networks to scale to hundreds of nodes:

- Clustering: nodes are joined together to make the network into a polytree. The CPT within the combined node grows, but the network structure is simplified.
- Approximate inference: Monte Carlo sampling is used to estimate conditional probabilities.

#### 14.31 Monte Carlo Sampling Example

Start at the top of the network (Cloudy) and select a random sample; say it's true. Then draw random samples from its children, conditioned on Cloudy = true:

- P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩; say we select false.
- P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; say we select true.
- P(WetGrass | Sprinkler = false, Rain = true) = ⟨0.9, 0.1⟩; say we select true.

This gives us one sample for ⟨Cloudy, Sprinkler, Rain, WetGrass⟩. As we increase the number of samples, this provides an estimate of the network's conditional probabilities.

#### 14.32 Other Approaches to Uncertainty

- Default reasoning: logical; unless we know otherwise, A follows from B.
- Dempster-Shafer theory: incorporates uncertainty about the probabilities themselves.
- Fuzzy logic: allows partially true statements.
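The forward-sampling procedure from section 14.31 can be sketched as follows. The notes give P(Sprinkler | Cloudy = true), P(Rain | Cloudy = true), and P(WetGrass | Sprinkler = false, Rain = true); the remaining CPT entries below, including P(Cloudy) = 0.5, are assumptions using common textbook numbers for this network:

```python
import random

# CPTs for the Cloudy/Sprinkler/Rain/WetGrass network. Values follow
# the notes where given; the rest are assumed textbook numbers.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.10, False: 0.50}              # P(S | Cloudy)
P_RAIN = {True: 0.80, False: 0.20}                   # P(R | Cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.00}  # P(W | S, R)

def sample():
    """Draw one joint sample, top-down in topological order."""
    c = random.random() < P_CLOUDY
    s = random.random() < P_SPRINKLER[c]
    r = random.random() < P_RAIN[c]
    w = random.random() < P_WET[(s, r)]
    return c, s, r, w

def estimate_p_rain(n=100_000):
    """Estimate the marginal P(Rain) by counting samples."""
    return sum(sample()[2] for _ in range(n)) / n

print(estimate_p_rain())  # ≈ 0.5 (exact answer: 0.5*0.8 + 0.5*0.2)
```

Conditional probabilities such as P(Rain | WetGrass = true) can be estimated the same way by keeping only the samples consistent with the evidence (rejection sampling), at the cost of discarding many samples when the evidence is rare.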
