### Create a StudySoup account

#### Be part of our community, it's free to join!

Already have a StudySoup account? Login here

# Artificial Intelligence CPSC 5185U

GPA 3.91

### View Full Document

## 10

## 0

## Popular in Course

## Popular in ComputerScienence

This 59 page Class Notes was uploaded by Earlene Cremin III on Sunday October 11, 2015. The Class Notes belongs to CPSC 5185U at Columbus State University taught by Staff in Fall. Since its upload, it has received 10 views. For similar materials see /class/221210/cpsc-5185u-columbus-state-university in ComputerScienence at Columbus State University.

## Similar to CPSC 5185U at

## Popular in ComputerScienence

## Reviews for Artificial Intelligence

### What is Karma?

#### Karma is the currency of StudySoup.

#### You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 10/11/15

Chapter 3 Reasoning with Uncertainty In this chapter we 39 quot tools for 39 with k 39 Ag that is not precise This is a major departure from standard programming practice in which the input data is assumed to be accurate How many times have you heard the phrase Garbage in 7 garbage out or just GIGO Here our motto is that We can reclaim garbage We begin our discussion with a review of classic logic and its methods of implication We shall review basic probability theory and then make the jump to Bayesian reasoning a tool for handling uncertainties in evidence Classical Methods Our investigation of methods of reasoning with uncertain evidence will lead us into a number of methods that are extensions of classical methods This suggests that we should brie y review these classical methods Among the topics we shall discuss are Boolean algebra set theory and probability theory Boolean Algebra Boolean algebra is an algebra developed over two values 7 True False or 1 0 and three basic operators AND OR and NOT Depending on the student s introduction the following notation will be familiar for the basic operators AND A o B A B 7 true only if both A and B are true OR A B A v B 7 false only ifboth A and B are false NOT A A FA 7 the rst two notations should be familiar to those who have taken CPSC 5 155 In classical logic we do have the following rule that is generally applied If NOT A is True then A is False and IfNOT A is False then A is True This metarule called the law of the excluded middle states the obvious fact that there are only two possible values in Boolean algebra True or False and that every variable must take one of the two values In this discussion we ignore some practical dif culties that arise in actual software and hardware implementation of logic designs Software can have the problem of uninitialized variables in which a variable is used before it is assigned a value In some sense an uninitialized Boolean variable can be considered not to have a value so it is neither True nor False but this is stretching things a bit Similar problems arise in hardwire designs where we can have a signal that is in a state called High Impedance because it has not been given a value In both cases we are dealing with dif culties in actual implementation of Boolean logic not with any ambiguity in the logic itself Boolean algebra is a powerful tool with very many important theorems some of which are quite counterintuitive However it is not suf cient for handling uncertainty In our attempt to make computer architecture fans feel at home we list some of the classical de nitions and theorems of Boolean algebra using the CPSC 5155 notation Page 3 l Truth Tables A Boolean variable can take only two values either True or False also denoted as l or 0 Suppose that we have two variables X and Y How many values can the pair take It is easy to demonstrate that the pair of variables can take only four distinct values X False Y False or X 0 Y 0 X False Y True X True Y False X True Y True gtltgtltgtlt 0 1 1 lt lt lt 1 0 1 It is easy to use an inductive proof to show that any set of N Boolean variables can take 2N possible values As a result of this fact one can de ne a Boolean function of Boolean variables by specifying the value of the function for every possible combination of the input variables This display is called a truth table A truth table has one row for every possible combination of the input variables 7 thus a truth table for N variables has 2N rows As the table at right shows truth tables are useful for de ning functions for two or three variables marginal for functions of four variables and much too large for any functions of more than four variables Imagine trying to work with a truth table description of a Boolean function of seven Boolean variables 7 such a table would have 128 rows and be very hard to understand In most courses we confine ourselves to functions of four or fewer variables and are sparing in the use of truth tables of four variables 7 128 Let s use the truth tables to define the three basic Boolean operators AND OR and NOT The first two are binary operators so they require truth tables with 22 4 rows while the last is a function of a single variable requiring 21 2 rows AND X Y The first Boolean function we define is the logical AND function The logical AND of two variables is True if and only if the values of both of the variables are True The logical AND function can be extended to more than two variables by the observation that the AND of N variables is True if and only if the values of all the input variables are true 7 a single False input makes the AND to be False OR The next Boolean function we define is the logical OR function The logical OR of two variables is False if and only if the values of both of the variables are False The logical OR function can be extended to more than two variables by the observation that the OR of N variables is True if the value of any input variable is true NOT The Boolean NOT function is a function of one variable only Page 3 2 We can use truth tables to compute the value of any Boolean function although we want to use other techniques for functions of ve or more variables The method for truthtable evaluations is to show the value of all of the intermediate terms that are used to make up the nal answer and then to form the nal answer Consider the Boolean function of three variables X 0 Y Z For this and much of what follows the notes will use 0 for False and l for True XYZYZ 0000 Formal De nition of Boolean Algebra Boolean algebra may be given a formal de nition based on the set of permissible values and the three basic operations Students who have taken any computer architecture course will recall the use of the XOR function exclusive OR This function is not fundamental to the de nition of Boolean algebra and is not required for this course so we ignore it De nition A Boolean algebra is a closed algebraic system containing a set K of two or more elements and three operators two binary and one unary The binary operators are denoted for OR and o for AND The unary operator is NOT denoted by a overbar placed over the variable The system is closed so that for all X and Y in K we have X Y in K X 0 Y in K and NOTX in K The set K must contain two constants 0 and l FALSE and TRUE which obey the postulates below In normal use we say K 0 l 7 set notation The postulates of Boolean algebra are Pl Existence of0 and l a X 0 X for all X bXo l XforallX P2 Commutativity a X Y Y X for all X and Y bXoYYonorallXandY P3 Associativity a X Y Z X Y Z for all X Y and Z bXo YoZXoYoZforallX Y andZ P4 Distributivity a X 0 Y Z X 0 Y X 0 Z for all X Y Z bXYoZXYo XZforallX YZ P5 Complement a X X l for all X bXo X 0forallX Page 3 3 The theorems of Boolean algebra are Tl Idempotency a X X X for all X bXoXXforallX T2 1 and 0 Properties a X l l for all X bX000forallX T3 Absorption a X X 0 Y X for all X and Y bXo XYXforallXandY T4 Absorption a X X 0Y X Y for all X and Y bXoX YX0Yfora11XandY T5 DeMorgan s Laws a m X 0 Y b XoY Y Y T6 Consensus aXoYXoZYoZXoYXoZ bXYoXZoYZXYoXZ As examples of classical proofs we show the truth of two of the stranger Boolean assertions We first show the truth of the lproperty something that appears foreign to standard algebra The table at left shows the Boolean OR function using the notation of 0 0 for False and l for True Recall that the value of the OR is False or 0 if and only if the values of both of its inputs are False or 0 One may conclude the truth of the theorem from this truth table or use the truth table below if one wants to be explicit The truth table at right shows the FX X l the Boolean function of one Boolean variable Recall that this is not addition but a function which many would write as FX X v 1 Thus we have a truthtable proof of what appears to be a fairly strange theorem in Boolean algebra We now show the truth of the second distribitivity postulate 7 another one that does not resemble standard algebra Here we use another trick that is possible in Boolean algebra There are only two possible values ofX X 0 False and X 1 True We consider each possibility in turn and use the lproperty just proved above to complete our proof XY0ZXYoXZforallXYZ ForX0 LHS0Y0ZY0Z RHS0Yo0ZYoZ ForXl LHS1Y0ZlbyT2that1XlforallX RHS1Yo1Zloll Page 3 4 Implication in Boolean Reasoning Here we deal with statements of the form If A is True Then B is True often stated more simply as If A Then B Symbolically this is often written as A 3 B Another way of writing the implication in Boolean algebra is FA v B We use a truth table to evaluate this expression and understand it Here we use F for False and T for True in either CPSC 2105 or CPSC 5155 we would use 0 for False and l for True A B One interesting feature of this formulation of the implication A 3 B is that it is true when the assertion A is false corresponding to the slogan False implies anything Note that the implication is false only when A is true and B is false Associated with every implication of the form A 3 B are three other forms two of which are of interest to us in these lectures The converse B 3 A and the contrapositive TB 3 FA In classical logic the truth of the contrapositive is equivalent to the truth of the original statement while the truth of the converse is independent of that of the original statement Contrapositive 2B3A The equivalence of this to the original comes from the fact that the implication A 3 B states that whenever A is true that B is also true It then follows that when B is false so must be A This can be seen by examining the truth table for FA v B the alternate form of the implication If B is false then the implication is true only when A is false Consider the following example implication and its contrapositive Implication If X is elephant then X is animal Contrapositive If NOT X is animal then NOT X is elephant Given the assertion that The Eiffel Tower is not an animal we may immediately assume that it is not an elephant This example is so simple that it is ridiculous Converse B 3 A The easiest way to demonstrate the independence of the converse is by example We begin with the example above Implication If X is elephant then X is animal Converse If X is animal then X is elephant This author believes that it is obvious that not all animals are elephants or is he an elephant lecturing to a herd of elephants and so the converse of this implication must be false Page 3 5 Consider however the following pair of assertions related to real numbers and the absolute value function written as W Implication If X 0 then le 0 Converse Ifle 0 then X 0 Here we see both the implication and its converse being true Given this example and the one above we must conclude that the truth values of an implication and its converse are independent After one more observation on the above we formalize this observation Consider the implication written above The fact that its converse is true is due to the fact that the implication itself might be seen as part of a definition What we really have in the first statement is as follows Assertion 1 If X 0 then le 0 and also Assertion 2 If X at 0 then le 0 The converse of assertion l is the contrapositive of assertion 2 thus insuring that it is also true Suppose that we have the following pair of statements both of which are true Assertion l IfA then B Assertion 2 IfB then A If we represent Assertion 2 by its contrapositive we have the following equivalent pair Assertion l IfA then B Assertion 2 If TA then TB Either of these two pairs is equivalent to and can be written as A if and only if B or more succinctly as A iff B also written as A cgt B The relevance of this comes out when we consider the problem addressed by Bayesian reasoning given A 3 B and B is true what can we say about the truth value of A If we know that A cgt B then the truth of B implies the truth of A If we know only that A 3 B the truth of B may or may not imply the truth value of A Bayesian reasoning allows us to make some statements about the truth value of A Set Theory The theory of sets arose as an attempt to formalize a number of intuitive ideas so that they could be used as the basis for formal mathematical proofs We begin with a problem that motivates our use of set theory It all depends on what the meaning of the word is is 7 Bill Clinton Consider the following two arguments Argument 1 4 l is 5 3 2 is 5 so 4 l is 3 2 Argument 2 A lime is green A dollar bill is green so a lime is a dollar bill Page 3 6 We see that when used in informal speech the word is can have at least two meanings only one of which is equivalent to the mathematical symbol which by de nition is transitive ifA B and B C then A C The following set of statements illustrates at least three meanings of the word is l 4 3 is 7 2 An elephant is an animal 3 This girl is 18 years old 4 The lime is fty cents Before we formalize a notation to express the last three statements precisely we should mention an old stande joke in logic which depends on the listener knowing that Ray Charles is a famous musician who happens to be blind God is love Love is blind Ray Charles is blind So God plays the piano The basic difference between the first statement and all of the rest shown above is that first is an equality possible to be written as 4 3 7 The other statements imply membership in some sort of class of objects Consider the following slightly more precise argument A lime is a member ofthe class of green objects A dollar bill is a member of the class of green objects One may conclude from this argument only the statement that it is possible for a lime to be a dollar bill not that it is one Definition of a Set A set is an unordered collection of distinguishable objects called the members of the set or the elements of the set If an element X is a member of the set S we write X e S If X is not a member of the set S we write X e S A number of sets commonly seen in mathematics are Q 7 the empty set that is the set containing no members N 7 the set of natural counting numbers that is the set 0 l 2 3 Z 7 the set of integers Some sets are defined by properties and some by listing all the members Thus Z7 0 l 2 3 4 5 6 is the set of integers modulo 7 N l N 20M and M e Z is the set of even integers Read this eXpression as the set of all N such that N 20M for some integer M By the term lSl we denote the size or cardinality of a set S Some sets such as Z7 are finite and others such as N are not finite The theory of infinite sets can lead to some truly bizarre results fortunately not of interest to us at this point Note that lQl 0 Page 3 7 We can now formalize the statement Jumbo is an elephant as the following set expression Jumbo 6 X l X is an elephant that is we claim that Jumbo is a member ofthe set of animals that we call elephants Note that a member of a set is never equal to a set although certain sets may have only one member e g the set of all even prime numbers The fallacious arguments presented above can be reduced to the following statement is set theory We note that the problem as stated does not give us enough information Given a set S with lSl gt 1 and X e S and y e S can we conclude that X y In passing we note that the following two statements are provable easily 1 Given a set S with lSl l and X e S and y e S we conclude that X y 2 Given a set S with X e S and y e S we conclude that X at y Subsets and Proper Subsets If all elements ofa set A are also elements of a set B we say that set A is a subset of set B More formally if we have two sets A and B such that X e A implies X e B we then write A lt B and say that A is a subset of B IfA g B but A at B we say that A is a proper subset of B and write A C B It should be obvious that for any set A we have A g A The following is a basic theorem of set theory Theorem For any two sets A and B A B if and only ifboth A g B and B Q A Subset inclusion is transitive so if A g B and B Q C then A g C The empty set is a subset of all sets so Q g A This nonobvious result is probably asserted in order to simplify the statement of a number of theorems in set theory Operations on Sets We define the following standard set operators for any two sets A and B Intersection Am B X l X e Aand X e B Union AUBXlXerrXeBorboth Difference AiB X l X e A and X e B If should be obvious that A n B Q A U B and A 7 B Q A U B As a matter offact we can easily prove the following three stronger assertions AntAgAUBand AntBgAUBand AiBgAgAUB Page 3 8 Both the setmterseetron and setumon operators are eommutatave thus Am B B n A and 1t shouldbe obvlous that the set dlfference operator ls not eommutatwe The followlng Venn dlagram rllustrates the dlfference A B AUB AnB AB BA The last operatorwe shall dlscuss at ths trme ls the enmplement of a set Informally we say Conslder the setB the set ofeven mtegers The eomplement of ths set ls denoted by E E7 huttht ofalarger set ealled the universe or universal set Thls unlversal set dlctates the objects eomplement ofthe set as follows Assume setA gU where U ls aunwersal set eontammg all the elements othe dlscusslon Then the complementofAls A UeA mu 6 Uandx e Agtthatrsxrs an element of eomplement of A ls the set A EA andB are two mte sets rtrs easy to prove that lA U Bl Bl SW 3 V 39 lmposslble to make statements on the slze ofAU B AllBlelAnBlfrom lM w V Pag2379 We use to the set eomp1emeht These are ea11ed DeMorgan39s 1aws KnE m AUB MB We e1ose thxs seeuoh wah the observauon that1 X 1 then 1X1 3 1Y1 so thatwe have the followmg two statements 1AnB1s1A1s1AUB1s1A11B1 1AnB1s1B1 SAL B1s1A11B1 Prnhzh ity as aresult ofwhat ean be consxdered sponsoredresearchquot by anumber of neh gamblers who F F r smee budge know of at 1east one other gameronented use of probabxhty e the Goheh pomt meh m1wd1r humbenh the closedmterval 0 0 1 01o 0 0 910 We shou1dhote 1mmedate1y that e 1 0 offer 1 0 equxvalently spemfymg that 0 0 lt p lt 1 0 Probabxh and Odds s then be10 ep as we are spemfymg that either an event occurs ont does not The odds assomated wnh the event are someumes quoted as 1 0 ea to P Page3e1o Consider a day of the week selected arbitrarily What are the betting odds that it is a Sunday The probability that any given day is a Sunday is 17 The probability that it is not a Sunday is 10 7 U7 67 So PDay is Sunday l7 PNot Day is Sunday 67 The odds would be 67 to l7 or simply 6 to 1 Note a number of factors in this very simple example 1 We postulate an experiment for which an outcome is expected 2 We can give an exhaustive list of all possible outcomes of this experiment 3 Each of these outcomes is mutually exclusive 4 The probability of each of the mutually exclusive events can be established In the above example the experiment is the question about the day of the week The possible outcomes of the experiment can be given as an exhaustive list Sunday Monday Tuesday Wednesday Thursday Friday Saturday of mutually exclusive outcomes 7 it cannot be two days of the week at the same time Absent any other information we assume that the probability of each outcome is the same so that the probability that for any given day is l in 7 We should immediately distinguish the odds above from what one would get in a casino or from one s friendly bookie A casino has a payout rate indicating that it must keep some money to pay staff and make a profit Suppose we played our day game in a casino with a 95 payout rate excellent The odds would be 60095 to l or 57 to 1 Methods of 39 quot quot P 39 39 quotquot39 How do we establish the probabilities associated with the possible outcomes of an event The book would have you believe that there is only one method which admittedly is correct and quite often used 1 List all of the outcomes of the event and assume that they are mutually exclusive 2 Assign a probability to each outcome The list of all possible outcomes approach works when the set of all possible outcomes is finite and of small size In the day experiment above the set of all possible outcomes has seven elements 7 the days of the week There are some experiments that are trickier to analyze Consider the coin toss experiment discussed in the textbook The book posits two outcomes 7 heads and tails All US and BritishCommonwealth coins and many other coins have a head depicted on one side and something else on the other side called tails There are examples of coins with heads depicted on both sides but we ignore them The book posits two equally probable outcomes for a coin toss experiment 1 The coin lands with heads up and 2 The coin lands with tails up Page3 ll Since each of these can occur with equal frequency in a fair coin the book posits that the probability of each event is 05 But note 7 the list of events is not exhaustive It is theoretically possible for a coin to land on its edge this author has seen a coin do just that in a coin toss in which the coin eventually fell over with the tails side up What is the probability that a coin will land on its edge and not fall over Any student who has a lot of time to waste should take a nickel coin and try to get it to land on its edge and stay upright Before leaving this topic we should remind ourselves not to confuse random events with those events that are hard to predict Many good magicians can toss a coin in such a way as to produce the desired outcome with a very high probability Note that we can establish probabilities for some events having an in nite number of outcomes Here is one select a positive integer at random What is the probability that the integer is an odd number Most of us can correctly answer that the probability is 05 as it is for an even number also without attempting to count the integers Gambling odds especially for events such as horse races and dog races are based on a large number of individual estimates of the likely outcomes Consider a recent Kentucky Derby in which the favorite horse was named Seattle Slew The consensus ahead of the race was that Seattle Slew would win so that of every 1000 bet on the race about 700 say was bet on that horse to win Since 1000 7 700 300 that would set the odds at 300 to 700 or equivalently 3 to 7 Every 100 bet would return 1000107 m 142 more likely something like 135 allowing for the track to make a profit Speaking of gambling does anyone play the stock market There is a version of stock trading called day trading that is so risky that it should be considered as another form of legalized gambling An Early Look at Reasoning from Evidence We have a bit more to discuss before be begin on this topic but might as well say something now Consider the coin toss experiment specified above We always assume in these thought experiments that the coin is fair that is that the coin has not been specifically modified to favor one side or the other Were we to perform ten coin tosses and get ten heads in a row we might reason from that evidence that the coin is not fair In making an assessment such as the above we must admit that it is possible for ten successive tosses ofa fair coin will yield ten heads The probability of such a sequence of events is easily computed as 1210 11024 m 9770104 leaving it more probable that the coin is not fair A prudent person would certainly want to examine the coin very closely A similar story can be told about bridge games in which 52 cards are dealt to four players each of whom gets 13 cards After each hand is played the cards are shuf ed and dealt again The frequent occurrence of unusual hands usually indicates that the shuf ing has not been done completely and that high cards from the previous deal have been aggregated Page 3 12 More on Classical Probability We now pick up with the book s discussion on pager 58 7 60 We begin with the discussion of success and failure the latter term being de ned as not success This is a slightly different way to look at the probabilities associated with a number of outcomes of an event Each outcome succeeds if it occurs otherwise it fails The book begins its introduction on conditional probability with a slightly puzzling example Consider a fair die a cube with the numbers 1 through 6 displayed one per face The probability of each number being the outcome is l in 6 We now ask what is the probability of getting a 6 Obviously the answer is 16 Suppose that we throw a die and ask a person to guess the result of the throw specifically noting that the result is not a 1 There are only five other options if the result is not a 1 so the probability of a 6 given that assumption is l 5 We now consider events that are mutually exclusive and those that are not Two events are mutually exclusive if they cannot occur at the same time The 2 events Today is Monday and Today is Thursday are mutually exclusive They cannot occur at the same time Consider the following two events Today is Thursday and Today is Thanksgiving referring to the United States national holiday The two events are not mutually exclusive more so because the holiday is defined to fall on a Thursday We now focus on conditional probability which forms the basis for our future discussions By the notation PA l B read as The probability of event A given that event B has occurred we denote the conditional probability of A given that B is true We can define this number as PA l B NA B NB where NA B is the number of times events A and B occur together while NB is the number of times event B occurs Assume for the moment that a year has exactly 52 weeks 364 days and define the events Let A today is Thanksgiving Let B today is Thursday We now reason from the number of times each event can occur in 1 year of 52 weeks Since Thanksgiving occurs only once per year and always on a Thursday we conclude that NA B 1 Since there are exactly 52 Thursdays in our year we conclude that NB 52 Thus P Today is Thanksgiving l Today is Thursday l52 m 19201072 We now examine a related problem Let A today is Thursday Let B today is Thanksgiving Page 3 l3 As before we have NAB 1 but NB 1 as Thanksgiving comes once a year Thus we have PA l B 1 following the de nition that Thanksgiving must occur on a Thursday We now consider joint probabilities the probability that two events A and B occur denoted by PA n B using the set intersection operator to denote both A and B If the two events are independent then we have the obvious formula PA n B PA PB The formula above does not work when the two events are not independent Let s look at the example above again assuming our perfect 52 week year that always has 52 Thursdays A today is Thursday PA 17 B today is Thanksgiving PB 1364 our perfect year has 364 days But note that PA n B 1364 also because if today is Thanksgiving it must be Thursday We now arrive at the basic formula for conditional probability PA l B In words this states that the probability of A given that B is true is the probability of both A and B being true divided by the probability of B being true Consider the probability P Today is Thanksgiving 1 Today is Thursday Let A Today is Thanksgiving PA 1364 in our simple year Let B Today is Thursday PB 1 7 in our perfect 52week year 1 We argued above that PA n B 1364 so we calculate PAl B E i xiii Another way to state the above formula is PA n B PA l B o PB We can also state an equation for the conditional probability of event B occurring given that event A has already occurred This is just a restatement of the above formula so PB A PBnA PA Each of PA n B and B n A is a way of stating the same joint probability of both A and B occurring so the two should be equal and we have PA l B o PB PB l A o PA which translates into the following basic equation called the Bayesian rule or PB n A PBl A PA PA B PBl AoPA PB where PA l B is the probability that event A occurs given that event B has occurred PB l A is the probability that event B occurs given that event A has occurred PA is the probability of event A occurring and PB is the probability of event B occurring Page 3 14 To see how this applies consider the production rule in the following form which includes a probability P 00 ltP S 10 If H is True then E is True with probability P What we are saying in the language of the recent discussion is that PE l H P With a simple change in language we can begin to reach our goal of arguing for a hypothesis based on evidence In the statement above substitute H for A and E for B to get the following PH E PE l HoPH PE While the formula just derived is correct it is not as useful as it might be because it is not stated in terms that we commonly use to argue for a case To derive the more useable form of the equation we need to return to a variant of the first form of the Bayesian rule Consider the equation PB n A PB l A o PA Substituting TA for A in the equation we get the result PB n EA PB l EA 0 PEA It should be obvious that PB PB n A PB n EA since A and EA form an exhaustive list of mutually exclusive possibilities either A is true or it is not Thus we get PB PB n A PB n EA PB l A o PA PB l EA 0 PEA We substitute this for PB into the formula PAl B to obtain the basic PBl A PA PB form that we shall use PAl B PB A P A We now arrive at the form that we shall find useful PH E PElHoPH PEl H PHP ETH oP EH where PH is the probability of the hypothesis being true PE l H is the probability that the evidence is true given that the hypothesis is true PEH is the probability that the hypothesis is false PE l EH is the probability that the evidence is true when the hypothesis is false and PH l E is the desired result the probability that the hypothesis is true given that the evidence is true While the form of the Bayesian rule just stated may be helpful we need to develop it a bit more in order to get the full use of it Page 3 15 Extensions of the Bayesian Rule So far we have considered two events one called A and one called B Suppose that the second event can have a number of mutually exclusive outcomes B1 B2 Bn 7 also referred to as events B1 B2 B We have two requirements on the second set of events B1 U B2 U U Bn U ithat is that the union of all the Bk is the universal set or that every possible outcome is contained in one Bk B n BK Q for J at K that it the set of possible outcomes is mutually exclusive In order to illustrate some of the concepts including the above we introduce our sample card deck This deck has four suits Spades Hearts Diamonds Clubs as are found in most playing cards The difference is that we shall have only five cards Ace King Queen Jack 10 in any suit as opposed to the thirteen found in a normal card deck Thus the deck has only twenty cards a choice made to minimize some counting difficulties The probabilities associated with this simple deck are also simple PCard is Ace 15 PCard is Spade 14 PCard is King 15 PCard is Heart 14 PCard is Queen 15 PCard is Diamond 14 PCard is Jack 15 PCard is Club 14 PCard is 10 US For convenience and in order to be able to generate events that are not independent we adopt the following conventions for subsets of the card deck Major Suite Spades Hearts Minor Suite Diamonds Clubs Face Cards King Queen Jack For a card drawn at random we have the following probabilies P Card 6 Major Suite 12 P Card 6 Minor Suite 12 P Card 6 Face Cards 35 Note that each of the two columns forms an exhaustive list of mutually exclusive events More specifically 7 a card must belong to exactly one of the four suits Spades Hearts Diamonds Clubs and cannot belong to two or more suits The same statement can be made about the five ranks of the cards in our sample deck Consider the following events associated with drawing a card from this deck of 20 cards the card is an Ace B1 the card is a spade B2 the card is a heart B3 the card is a diamond and B4 the card is a club Page 3 l6 We now de ne the following compound events A n B1 the card is the Ace of Spades A n B2 the card is the Ace of Hearts A n B3 the card is the Ace of Diamonds A n B4 the card is the Ace of Clubs Consider now the following conditional probability PA 1 B1 7 the probability that a card is an Ace given that it is a spade We may compute easily by noting that there are five spades in the deck only one of which is the Ace thus leading to the probability PA 1 B1 15 This is exactly PA because the event A is independent of the events B1 B2 B3 B4 but we need a simple example We also note that PA n B1 PAl B1 0 PBl15 0 14 120 This is as expected since there is exactly one Ace of Spades in the deck of twenty cards We note that for each of the four suites B1 B2 B3 and B4 or Spades Hearts Diamonds and Clubs that PBK 14 PAlBK15 and PA n BK 1 20 Summing up the probabilities PA n B1 PA n B2 PA n B3 PA n B4 US we see an example of equation 313 on page 60 ofthe textbook M PA ZPA n BK K1 which can be shown to be equal to M PA Z PA I BK P Equation 1 K1 To make use of this last formula we rewrite Equation 311 on page 80 as PBl A NAB W Equation 11 In our equation I we substitute E for A and H for B to get M PE ZPE HKo P K Equation III K1 In our equation 11 we substitute E for A and H J for B to get PE HJ PH P PHJ l E Equation IV PE HJ P I ZPE HK PHK which becomes our main result PHJ l E 319 Page 3 17 Before we move to the discussion of multiple evidences and multiple hypotheses let s do two examples with single evidence and multiple hypotheses Question What is the probability that a card is a king given that it is a face card Let E be the observation that the card is a face card H1 be the hypothesis that the card is an Ace H be the hypothesis that the card is a King H3 be the hypothesis that the card is a Queen H4 be the hypothesis that the card is a Jack and H5 be the hypothesis that the card is a 10 In terms of this discussion we are computing the value of PHz l E We have already seen that PHl PHz PH3 PH4 PH5 02 We now compute the odds PE 1 HJ Again the example is a bit simplistic because we are essentially working from definitions 7 either the card is a face card or it is not We must start somewhere PE 1 H1 00 ifthe card is an Ace it is not a face card and thus the the probability of observing a face card given that the card is an Ace is 00 PE 1 Hz 10 if the card is a King then it is a face card by definition PE 1 H3 10 ifthe card is a Queen then it is a face card by definition PE 1 H4 10 ifthe card is a Jack then it is a face card by definition PE 1 H5 00 ifthe card is a 10 then it is not a face card by definition To compute PHz l E we apply equation 319 with J 2 specified PElH2 P 2 1002 ZPEHK0P K 000210o0200210020002 PHz l E 02 1 7 7 which is not a surprise as 06 3 there are exactly three face cards one of which is a King Doing the simple math yields the result that PHz l E We now move to the next problem 7 reasoning from multiple strands of evidence In terms of the textbook we have a set of evidences and a set of hypotheses The evidence set is E1 E2 EN N different observables The hypothesis set is H1 H2 HM M different hypotheses In a minute we shall attempt another card example with the following evidences E1 the card is seen to be a face card E2 the card is seen to be a major suit Page 3 18 But rst the formula that says it all 7 in fact it says so much that it is hard to use What is the posterior probability of a given hypothesis H1 given a set of evidences E1 EN 13HJ E1E2EN 320 211E1E2quotEN l HK PHK There is a dif culty here 7 it is the evaluation of the conditional probabilities of all possible combinations of evidences More speci cally PE1EzEN l HJ is hard to compute as is Experts have decided on a workaround that makes the problem tractable This is to assume that the evidences are independent so that PE1E2EN l HJ PE1lHJ0 PE2 l HJ 0 0 PEN l HJ We apply this to the above equation to get the useable version PE1 H1 PE2 l PEN HJ PHJ PHJ l E1E2EN M PE1l HK PE2l HKPENl HK PHK 321 Example The book has an excellent example we shall solve another problem Again our problem will be a bit simplistic due to the independencies seen in the probabilities Question In our sample deck what is the probability that the card selected is the Jack of Hearts given that it is a face card and a major suit We have two evidences or observations that can be applied here E1 the card is a face card E2 the card is a major suit We use atable to organize the twenty hypotheses H1 H2 H20 that we deal with This set of twenty hypotheses comprises an exhaustive and mutually exclusive set The table should be read as follows H1 the card is the Ace of Spades H9 the card is the Jack of Hearts H17 the card is the King of Clubs etc Page 3 l9 We can solve this problem the easy way by noting that there are only siX cards that are both face cards and in a major suit Since the Jack of Hearts is one of them the probability that the card is a Jack of Hearts given that it is a face card in a major suit is 16 We now do it the hard way K 1 005 In the above table we note that P EllHKoPEleKoPHK is nonzero only for the indices K e 2 3 4 7 8 9 Speci cally we have the following PE1lHKoPEleKoPHK 005 for K e 2 3 4 7 8 9 PE1lHKoPEleKoPHK 000 for K g 2 3 4 7 8 9 Given this we compute the probability that the card is a Jack of Hearts given that it is both a face card and in a major suit The result is given below 005 005 1 PH9 l E1132 7 as expected 005005005 005005005 030 6 Page 3 20 Likelihood of Suf ciency and Likelihood of Necessity We now discuss two measures that are useful in writing expert systems Unlike probabilities these measures can have values that are any nonnegative number and are often greater than 10 In practice these measures depend on judgments made by domain experts Chapter 34 of the text pages 65 7 72 describe these two measures in terms of a weather prediction problem with two possible predictions 7 the weather tomorrow will be rainy or the weather tomorrow will be dry We reason on the weather today Let E be the evidence Today is rain 7 obviously that it is raining some time today Let H be the hypothesis Tomorrow is rain 7 obviously the claim that it will rain tomorrow Thus we have two more terms TE 7 the observation that today is dry or not rainy SH 7 the claim that tomorrow will not be rainy The rst measure likelihood of suf ciency is the likelihood that the evidence will support the hypothesis We use the following two terms in the equation for this measure PE H 7 the probability of the evidence given that the hypothesis is true and PE SH 7 the probability of the evidence given that the hypothesis is false PE H PE H likely that the evidence is present given that the hypothesis is true than it is that the evidence is present given that the hypothesis is false thus the evidence supports the hypothesis The equation is LS If LS is high then PE H gt PE TH and it is more Consider the implication of LS 1 This indicates that PE H PE TH or that the probability of the evidence given the hypothesis is equal to the probability of the evidence absent the hypothesis thus the hypothesis is probably independent of the evidence The next measure likelihood of necessity is a measure of how strongly the absence of the evidence implies that the hypothesis is false The measure is based on the two terms PTE H 7 the probability that the evidence is absent given that the hypothesis is true PTE SH 7 the probability that the evidence is absent given that the hypothesis is false P E H Thrs measure 1s a b1t tr1cky because the equatron 1s LN i so that low values of P E H LN imply that the absence of the evidence implies that the hypothesis is false The book notes that the measure LN must be obtained independently of the LS measure We now examine the rules given on page 65 Page 3 21 We begin our analysis by considering a modi cation of table 33 on page 66 We focus only on the actual weather for the day Rainy or Dry and how often we observe one of the four following R gtR R gtD D gtR D gtD today is rainy and tomorrow is rainy today is rainy and tomorrow is dry today is dry and tomorrow is rainy and today is dry and tomorrow is dry 12 5 5 8 Note that we draw no conclusion for day 31 as we do not know the weather for April 1 Page 3 22 Let us now crunch these numbers in the context of Rule 1 If Today is Rain Then Tomorrow is Rain The evidence is the observation today that it is raining so we are looking at a rule of the form If E then H with E today is rain so TE today is dry H tomorrow is rain so TH tomorrow is dry We now derive the basic measures from the table R gtR EgivenH PElH 1230 R gt D E given DH PE i H 530 D gt R DE given H PCB i H 530 D gt D DE given DH PCB i H 830 So we now compute our two measures WM wzn 24 PEl H 530 5 P E H 530 0625 P El H 830 8 LN Fortunately these are not too far from the values found in the textbook One might legitimately criticize the value 0625 as pretending too much knowledge and round the value to 06 as found in the book s example One might note that rule 2 is really not independent of rule 1 but just states it differently The rule is If Today is Dry Then Tomorrow is Dry The rule might be quoted as If NOT Today is Rain Then NOT Tomorrow is Rain Those who are interested will note the following two dependencies l The LS ofrule 2 is the inverse ofthe LN of rule 1 2 The LN or rule 2 is the inverse ofthe LS or rule 1 These observations are another indication that rule 2 is a restatement of rule 1 Page 3 23 Certainty Factors The Bayesian theory of reasoning is quite powerful but suffers from one major drawback 7 it is not the way experts think In section 35 of the textbook we see a case study in which experts were asked to state Bayesian probabilities which were then compared to probabilities derived from the reasoning methods used by the experts It was found that the two values did not agree in fact the values were quite different Certainty factors are a construct designed to re ect the way experts think Because these factors are based on human experience they lack the mathematical foundations found in the classical probability methods Nevertheless certainty factors have been shown to work and form a valuable tool in the building of expert systems Certainty factors have values in the closed interval 7 10 10 thus 7 10 S CF S 10 There are three specific values of certainty factors and many extensions as shown in Table 34 ofpage 75 ofthe text The three values are as follows 10 Definitely True 00 Unknown 7 10 Definitely False In most of what we do we shall consider certainty factors to be in the closed interval 00 10 thus 00 S CF S 10 The reason for this is the following equivalence IfEThenHcf7C IfEThenTH ch for00SCSl0 Certainty factors are related to two interesting measures MB the Measure of Belief and MD the Measure of Disbelief Equations 329 and 330 relating the two measures to certainty factors appear to have a mistake in them The student will note the following expressions maxl 0 7 PH in the denominator of equation 329 and minl 0 7 PH in the denominator of equation 330 Neither of these seems correct and this author cannot find the equations in any other source on Arti cial Intelligence so we shall ignore the two measures in our discussions Propagation of Certainty Factors The basic form of a production rule with certainty factors is If E then H cf If the certainty factor for the evidence E is denoted cfE then the certainty factor associated with the rule is cfH E cfE 0 cf The book has a good example on page 76 Page 3 24 Given the rule If Sky is Clear Then Forecast is Sunny cf08 and the value cfSky is Clear 05 then we have cfH E 05 o 08 04 In plain English this is cfForecast is Sunny Sky is Clear 04 We now consider compound statements for the production rules First consider a compound antecedent linked with logical AND statements and then consider a compound with OR Logical AND Antecedent E E1 AND E2 AND EN Certainty factor cfE min cfEl cfE2 cfEN Logical OR Antecedent E E1 OR E2 OR EN Certainty factor cfE maX cfEl cfE2 cfEN Logical NOT Antecedent E NOTE1 Certainty factor cfE 7 cfE1 Not being happy with a pseudoEnglish explanation we now move on to some fancy math With a rule of the form If E then H cf and a compound antecedent we have AND cfH E1 n E2 n n EN min cfEl cfE2 cfEN 0 cf OR cfH E1 U E2 U U EN maXcfE1 cfE2 cfEN 0 cf Consider the following examples 1 If Sky is Clear And Forecast is Sunny Then Wear sunglasses cf 08 E1 Sky is Clear cfE1 09 E2 Forecast is Sunny cfE2 07 then cfE mincfE1 cfE2 min 09 07 07 and cfH E1 n E2 07 o 08 056 Page 3 25 2 If Sly is Overcast Or Forecast is Rain Then Take Umbrella cf 07 El Sky is Overcast cfEl 03 E2 Forecast is Rain cfE2 09 then cfE maXcfE1 cfEz maX 03 09 09 and cfH E1 U E2 09 o 07 063 Multiple Support Suppose that we have two rules as follows Rule 1 If A is X Then C is Z cf CF1 Rule 2 If B is Y Then C is Z cf CFz We consider how to assess the certainty factor if both rules re Note that what we have here is something different from the compound antecedents discussed above Compare this pair of rules to each of the following pair of rules Rule 3 If A is X But note 7 that for this to fire only one Or B is Y of the antecedents must be true In the Then C is Z cf CF3 case above both are true Rule 4 If A is X Both A is X and B is Y must be true And B is Y However we have considered only how Then C is Z cf CF4 to compute cf A is X n B is Y It should be intuitively obvious that multiple support for a hypothesis should make us more confident of that hypothesis Equation 335 on page 78 propagates certainty factors cf1cfz cf1 cf2When cf1gt0 and szgt0 Cf0f150fz cf1cfzcf1 cf2When cfl lt0 and sz lt 0 Cf1Cf2 otherwise 1 m1n cf1lalclel where cf1 is the confidence in the hypothesis established by rule 1 and cfz is the confidence in the hypothesis established by rule 2 Page 3 26 Chapter 7 Evolutionary Computation In this chapter we discuss another method of mimicking natural processes in our design of computer software This time we discuss evolutionary computation of which genetic algorithms are one of the major components There are two simple requirements for a problem to be solvable by genetic algorithms 1 That its arguments be expressible as binary strings and 2 One can always compare two partial solutions and determine which is better This second requirement eliminates many of the more interesting problems One consequence of the first requirement is that every partial solution to the problem be encoded by a binary string of the same length denote this length by M Consider a problem in which the solutions can be encoded as 24bit strings Two possible solutions would be 111100001111000011110000 and 111111000000111111000000 Genetic algorithms function by applying two basic operations to partial solutions in order to nd better partial solutions These operations might be called mutation and crossing The process is as follows one applies the operation to single partial solutions or pairs of partial solutions and then evaluates the resulting partial solutions If the newly generated partial solutions are better than the previous the new ones are kept otherwise they are likely discarded Genetic algorithms apply processes seen to operate well in nature changes are made randomly and the better ones kept Mutation Mutation refers to taking a partial solution and changing one or more bits In common practice only one of the bits is changed The bit to be changed is selected randomly The sequence 111100001111000011110000 can mutate to 111100001111010011110000 Note only one bit was changed Crossing Crossing the textbook calls this mating refers to swapping parts of two partial solutions It operates by selecting a bit number at random splitting each of the two partial solutions at that point and then recombining by swapping parts Consider the following two partial solutions which are split at the indicated point 111100001111000011110000 gt 111100001111000011110000 111111000000111111000000 gt 111111000000111111000000 Crossing at this point gives 111100001111001111000000 or 111100001111001111000000 and 111111000000110011110000 or 111111000000110011110000 Written on 7 192010 Chapter 7 Page 1 of 6 pages CPSC 5185 Introduction to Arti cial Intelligence Genetic Algorithms My Way The basic process of genetic algorithms as discussed on page 220 of the textbook comes down to a simple process 1 Start with a set of trial solutions generated by some method 2 Play with these solutions generating new ones and evaluating them 3 After some time stop the process and take the best solution We shall comment on each of the steps presented in the textbook Rather than using the biological language favored by the textbook and most authors we shall use the language of binary strings The student should note that we are saying the same thing These notes will present a problem not found in the textbook The problem is based on a common encryption scheme called XOR encryption The student who did not sleep through CPSC 2105 Computer Organization will note that this problem can be easily solved by a very direct application of Boolean algebra but here we shall attempt a solution by use of genetic algorithms It is better to pick an easy problem as an example The problem to be discussed involves cryptography The objective of cryptography is to send messages via insecure media such as by radio broadcast and have the contents of the message known only to those authorized to receive the message The sender of a message applies a cryptographic algorithm often called a cipher to convert a message in readable form called a plaintext into a disguised form called a ciphertext The assumption is that the ciphertext can be decrypted or converted to readable plaintext only by those who are authorized to receive the message Most ciphers operate by using a well known cryptographic algorithm and a secret key known to only the sender and recipient Cryptanalysis is the study of ways to retrieve the meaning of a ciphertext without the consent of either the sender or recipient There are various methods of cryptanalysis some of which are less elegant than others Two inelegant methods are stealing the key that enables one to decrypt the messages or rubber hose cryptanalysis involving the application of rather nasty methods of persuasion to a human who knows the key Some of the weaker cryptographic algorithms can be easily broken by use of wellknown attacks An attach itself is an algorithm that can be used to produce the plaintext of a message given enough ciphertext There are several classes of attacks A known plaintext attack on a cipher occurs when the hacker attempting to crack the code has both the plaintext of a message and its enciphered form For many of the weaker cryptographic algorithms this attack can yield the secret key and thus allow the cryptanalyst to read all future messages The goal of all modern cryptographic algorithms is to prevent such an attack There is a caution to be observed here One should not put too much trust in the security of ciphers In a welldocumented example of American cryptanalysis of Japanese diplomatic ciphers before the attach on Pearl Harbor the Americans intercepted messages to the Japanese ambassador to the US and had translated plaintext to the US Secretary of State before the Japanese ambassador received his copy from his code room Written on 7 192010 Chapter 7 Page 2 of 6 pages CPSC 5185 Introduction to Arti cial Intelligence The encryption method that is the basis for the problem is called XOR encryption with a 24bit key Take the three character string CSU We rst convert this to a 24bit string The ASCII code for C is 0X43 or 0100 0011 The ASCII code for S is 0X53 or 0101 0011 The ASCII code for U is 0X55 or 01010101 The string CSU becomes 010000110101001101010101 Suppose that we encrypt this with the 24bit key 011001100001101010110101 Recall that the XOR operator is de ned as follows X Y One can see the following Plaintext 010000110101001101010101 Key 011001100001101010110101 Ciphertext 001001010100100111100001 The problem to be solved is the known plaintext attack thus given that we know that the plainteXt 010000110101001101010101 is enciphered to 001001010100100111100001 using the XOR algorithm can we deduce the 24bit secret key used in the algorithm Step 1 Choose a way to represent solutions to the problem as xed length binary strings If the student prefers one can think of these as integers The 24bit solutions will be proposed 24bit keys which we shall test until we nd the correct one The size of the solution set will be 14 The choice of this very small number is explained later Most genetic algorithms work with a much larger number of potential solutions The crossover probabilities and mutation probabilities will be such that each iteration will produce mutations and crossovers Basically we shall ignore these values Step 2 The tness function for a partial solution will be obtained by using that as a key to encrypt the string CSU and counting the number of times that the cipherteXt produced differs from the known cipherteXt Notice that this tness function has the desired properties here that smaller values are better and that a value of 0 indicates that a solution has been found Written on 7 192010 Chapter 7 Page 3 of 6 pages CPSC 5185 Introduction to Arti cial Intelligence Step 3 Select the initial population of solutions X1 X2 X14 The book suggests a random selection but one may specify solutions if there is any reason to favor some Our solution set will begin with four nonrandom 24 bit partial solutions and 10 partial solutions derived from this set In other problems one might have an idea of a beginning solution Dr Tim Howard of the CSU Math Department is working on a problem called the Prisoners and Guards Puzzle This involves placing 1 s and 0 s on a square array and attempts to optimize the number of 1 s given a speci c constraint Dr Howard has a method for generating solutions that are known to be good and wants to use genetic algorithms to explore for solutions that are better It just makes sense to seed the initial solution set with those solutions that are known to be good Of course random selection of the solutions for the initial set is as good a method as any It should be employed if one has no reasons to favor another solution For this example I choose to generate nonrandom solutions for the initial set because that is easier Here are the four solutions that will form the basis of the initial set of 14 111100001111000011110000 000011110000111100001111 111111000000111111000000 000000111111000000111111 Note on Random Numbers It is not possible in software to generate a true sequence of random numbers Most generators such as the Microsoft function Rnd supplied with Visual Basic are pseudorandom generators that generate a sequence of numbers with a long repetition pattern that is one can generate a very long sequence before the numbers repeat This suffices for our purpose Step 4 Each potential solution is used to encrypt the known plaintext and the ciphertext produced is compared against the known answer The number of differences is counted For example one of our starting keys will be 111111000000111111000000 PlainText OlOOOOllOlOlOOllOlOlOlOl Thiskey llllllOOOOOOllllllOOOOOO Ciphertextproduced lOllllllOlOlllOOlOOlOlOl KnownCiphertext OOlOOlOlOlOOlOOllllOOOOl Differences This solution scores an 11 of possible 24 We would hope that the genetic algorithms quickly find a potential key with a smaller nonnegative score Written on 7 192010 Chapter 7 Page 4 of 6 pages CPSC 5185 Introduction to Arti cial Intelligence Steps 5 8 At each step we use the following algorithm to generate the set of 14 potential solutions 1 Pick the four potential solutions with the lowest score At the start we begin with the initial four solutions that are speci ed 2 Copy these four solutions into the new solution set 3 Mutate each of these four solutions to produce a new solution For the first mutation the point will be selected as p 17 For others pick a random number p 1 S p S 24 4 There are six different ways that the four numbers can be made into s 2set 1 2 1 3 1 4 2 3 2 4 and 3 4 Use these six pairings ofthe four best solutions to form six additional more by crossing at the same point For the first crossing the point will be selected as p 13 afterwards the value will be random At the beginning of the first iteration we have a set comprising 4 initially specified solutions 4 solutions obtained by mutating these solutions and 6 solutions obtained by crossing these specified solutions At the beginning of any other iteration we have a set comprising 4 best solutions from the previous iteration 4 solutions obtained by mutating these solutions from the previous iteration and 6 solutions obtained by crossing these solutions from the previous iteration We then evaluate each solution and pick the four with the lowest score because those will be closest to the desired key If any of the potential solutions has fitness value 0 we declare it the solution and stop the iteration If we do not have a solution we copy the four best partial solutions for use in the next iteration and discard the rest We then do steps 5 through 8 over again Here is the initial set of 14 partial solutions Some digits underlined only for clarity Firstwe havetheinitialfour llllOOOOllllOOOOllllOOOO OOOOllllOOOOllllOOOOllll llllllOOOOOOllllllOOOOOO OOOOOOllllllOOOOOOllllll The fourmutations atp 17 llllOOOOllllOOOOglllOOOO OOOOllllOOOOlllllOOOllll llllllOOOOOOllllglOOOOOO OOOOOOllllllOOOOlOllllll The six crosses 12 llllOOOOllllOlllEOOOllll 13 llllOOOOllllOlllllOOOOOO 14 llllOOOOllllOOOOOOllllll 23 OOOOllllOOOOllllllOOOOOO 24 OOOOllllOOOOlOOOOOllllll 34 llllllOOOOOOlOOOOOllllll Written on 7 192010 Chapter 7 Page 5 of 6 pages Chapter 9 Knowledge Engineering and Data Mining What is Knowledge Eng neering The basic issue here is how to choose the right tool for the job at hand At this point in the course the student should have become familiar with a number of styles of programming including the following 1 Procedural programming 7 the standard C and Visual Basic stuff Within this classi cation one includes objectoriented programming some may disagree 2 Database programming 7 laying out databases using tools such as MSAccess or SQL so that the data can be ef ciently retrieved and used In this instructor s experience the way database programmers think is quite strange and foreign L V Rulebased expert systems 7 these apply rules that are obtained from domain experts to solution of problems that are best expressed in these terms As we have seen and will see again an explanation facility is very important here 4 Fuzzy logic 7 although this is not technically a separate programming style as it would be normally used in either procedural programming or rulebased systems UI V Arti cial Neural Networks 7 these use arti cial neurons either implemented in hardware or simulated in software to mimic processes seen in animals such as the classi cation of objects based on digital data O V Genetic Algorithms 7 these use methods motivated by biological evolution to move through a very large solution space to nd solutions that optimize certain factors called tness functions There is a saying that was current in the study of Arti cial Intelligence in the late 1970 s If the only tool you know is a hammer everything looks like a nail The saying is placed within the story of a carpenter who never having seen a screwdriver attempts to use a hammer to drive a screw into a board The point is obvious for many tasks there are one or two tools that seem to be best Within the context of Arti cial Intelligence there are a few guidelines that are worth mention when considering a selection of tools also called programming styles Before giving these considerations let s review an approach from software engineering All new software replaces an existing process sometimes manual and sometimes implemented by a previous program There are four steps in creating new software These must be done in order 1 Determine what they are doing 2 Determine why they are doing it 3 Determine why you want to change what they are doing 4 Determine what you want to do in the new process These four steps are not restricted to any style of programming nor do they relate directly to our discussion Nevertheless they are important enough to be stated here Version of 7132010 Chapter 9 Page 1 of 13 pages Knowledge Engineering and Data Mining Here are a few rules that seem appropriate in selecting a programming style 1 N E 4 V39 If the process is already being addressed by a software solution consider carefully the style in use It may be better to write the new software in the same style Depending on the corporate culture this may be an important consideration or may be of little concern Changing styles say from procedural to rulebased should be explained with a coherent argument that includes presentation of at least one test case that illustrates the advantages of the new software approach If the problem admits of an efficient algorithmic solution this is strong reason to use the procedural approach For example computation of Bessel functions Spherical Harmonic functions and other such mathematical constructs should always be done by software written in a procedural style as there are wellknown algorithms to compute the value of these objects If the problem is associated with users who will demand an explanation of the answers and the reasoning used to obtain it it is better to use rulebased systems There are many examples from medical practice in which software written with the procedural approach produced fine answers that were mistrusted by the medical staff because the software could not explain its reasoning When interviewing a domain expert pay a lot of attention to how she or he describes the problem This is often a guide to the best method for solution Consider carefully the quality of the data to be input into the software Are the data complete and consistent Some software design strategies notably the procedural methods handle incomplete and inconsistent data less well Problem Classi cation Problems that are best solved by nonalgorithmic methods fall into several classes Diagnosis Inferring malfunctions of an object from observing its behavior and recommending solutions cures Selection Recommending the best option from a list of possible alternatives Prediction Predicting the future behavior of an object from observations of its past behavior This class of problems might be better solved procedurally Classification Assigning an object to one of a number of defined classes such as the process in biology of assigning an animal to a species and genus Clustering Dividing a group of objects into homogeneous subgroups Optimization Finding a solution that maximizes or minimizes a welldescribed function often called an objective function or fitness function Control Controlling an object or machine based on realtime measurements of the object s behavior such as controlling a boiler based on the output of sensors measuring the temperature and pressure of the steam Version of 7132010 Chapter 9 Page 2 of 13 pages Knowledge Engineering and Data Mining Data Acquisition The book presents a number of problems that can arise when using real data as input to a program One of these is usually quite simple to handle once one realizes what is going on This is the problem of incompatible data The book posits examples in which the data are written with either EBCDIC or Binary Coded Decimals and are to be read in ASCII There are other problems fortunately less common these days Both of these next problems arise from issues that should be covered in a course on computer organization 1 Big Endean vs Little Endean data This has to do with the byte ordering of multibyte data such as 32bit integers Some computers write the most signi cant byte rst and some the least signi cant 2 Non Standard Floating Point Formats This used to be a real problem before the introduction of the IEEE754 standard for oatingpoint numbers In general the problems of incompatible data can be solved by a data preprocessor The only dif culty that should arise in practice is not knowing what the format of the data is The two problems of incompatible data and incomplete data require some innovative solutions Some database solutions will provide a special value such as Null to indicate that the data item is missing and allow reasoning based on the fact that the item is missing In the example in the book the lack of an of ce telephone number is taken as evidence that a person is unemployed In a more realistic and regrettable scenario a large and unexplained gap of time in a job applicant s resume will be taken as evidence that the applicant was imprisoned during that time period This last case really happens folks The intelligence agencies of the Us Government such as CIA DIA NSA often called spy agencies often must deal with obtaining information from incompatible data One way that this is done is to rank each source of information by reliability and to favor the data from the more reliable source Unfortunately the rankings of source reliability are highly subjective Lessons from Structured Analysis and Desig The discipline called structured analysis and design or SAampD was one of the rst attempts to engineer the development of software in other words to cause the development of large software systems to be an orderly and wellmanaged process There are two lessons from SAampD that seem appropriate at this point correct use of prototypes and early de nition oftest cases We comment on each Prototypes A prototype of a system is a smaller version of the system that is properly used to investigate the problems that will arise in developing the production system In our consideration one major use of the prototype is to determine whether the style of programming chosen is really applicable to the problem as stated Prototypes should always be discarded when beginning work on the system that will be made a product Any attempt to create the production system as a modi cation of the prototype system will usually end in failure Version of 7132010 Chapter 9 Page 3 of 13 pages Knowledge Engineering and Data Mining Here is one simple example related to what this instructor calls the doghouse problem One might build a small house the size of a doghouse to be a prototype of a fullsized house for a family This would allow certain features to be visualized and better design options to be decided upon but never would be itself modi ed into a habitation for humans For one thing a doghouse will lack the necessary foundation structure Early Test Cases In commercial software development test cases are often de ned late in the software development process This can lead to a common fallacy of developing the test cases with special care that the existing software will pass all the tests The best time for de nition of a set of test cases is at the same time that the system requirements are being developed One may reasonably ask how one can write requirements for a system without also stating what it would mean for the system to fail to meet the requirements Each requirement should automatically generate at least one test case to detect its being met this can be done early Admittedly the set of test cases should evolve as the software proceeds through the development cycle But one should have a good set of tests at the start Data mining and knowledge engineering We now introduce the last major topic to be covered in this course The basic problem addressed here is the extraction of information from a lot of data We perform such functions all the time indeed one of the driving forces of human evolution seems to be the ability to recognize patterns in data and draw conclusions such as how to avoid locations likely to hide dangerous animals For one example of pattern recognition the reader should examine the figure at right which is really a collection of integers each representing a color value We do not think of this as such a collection we humans see a picture Many of us would immediately recognize the face as that of Ozzy Osbourne The student is invited to speculate as to why this instructor has suddenly become fascinated with Ozzy as this instructor has never listened to any of Ozzy s music and up to a few years ago had never heard of him The source of this picture is a web page linked from the CNN web site The link is httpwwwcnncom 2003 SHOWBIZ TV 1 209 osboumeaccident indexhtml The important issue in recognizing this picture is that we did not process the data in any structured manner We immediately extract the information from the data almost without thought In particular we did not start processing a list of famous people to nd a matching picture Is this a picture of George Bush 7 No Is this a picture of Elvis 7 No etc Version of 7132010 Chapter 9 Page 4 of 13 pages Knovvledge Engineering and Data Mining In a similar vvay vve eouldtreat a database as a eolleeuon of datato be proeessed by database uenes possibilities The book ealls this the use of assumptinnrhzsed tnnls you tell me vvhatyou Seleet all and vvhatinteresung value one othls instruetor39s favorite examples oldiseovery is relatedto the Mareh 1979 eneounter ofthe probe Voyager lvvith the moon To a satellite of In t r A e proeessin f h V agerlprob l kingforsomething lse in at hen she none the image at g w soon reeognized as an aeuv the moo lo and was quickly interaeuon vvith the planet Jupiter gain afortultous diseovery r tv l T b r t N Ancient Data Mining nk of that it involved extxaeuon ofpatterns from amulutude of observations Most of us thi H r F r ll but A Pnorto that all to traek the paths ofthe planets and stars through the night sky Very early perhaps as early L L Fw l wd the observations The informauon they extraeted from these datais ealled a ealendar proeesses The ansvveris that the aneients didnot systematieally examine all ofthe approximately 5000 stars visible to e naked eye for examinauon and an ysis See http www abe net ausclenceneWsspaceSpaceRepubllsh7910295 htm but instead L m w 0 w l Mereury Venus Mars Jupiter and Saturn Twn Medical Examples We next diseuss two examples taken from the book The Great In uenza by John M Barry published by Viking Press in 2004 ISBN 0670894731 from the medieal researeh ofthe d 19 ay o eovvpoxquot Mr Jenner observed this pattern 7 that infeetion vvith eovvpox granted immunity to s l That one observauon leadto the development ofa safer smallpox vaeeine and f 0quot eentury pox eventually to the eradieauon o that disease in the 2 Version of7132010 Chapter 9 Page 5 of13 pages Knowledge Engineering and Data Mining The other example was the control of cholera There were epidemics of cholera in London in both 1849 and 1854 See httpwwwcsissorgclassicscontent8 During the first of these two outbreaks John Snow a medical doctor published a pamphlet On the Mode of Communication of Cholera postulating that the disease was spread through contaminated water In the 1854 epidemic it was observed that persons who drank water from the Broad Street pump in London were more likely to contract the disease in fact 500 people who drank water from that source had died of cholera within a span of ten days The authorities made the pump inoperable by removing its handle and the number of deaths per day dropped Had there been any tools for automatic detection of patterns in data the geographic correlation of deaths with contaminated water sources would have been discovered earlier and fewer people would have died Automated Discovery of Patterns At this point in the development of the lectures we should mention an important reference just discovered by this instructor This is a textbook used to teach CS 496 Selected Topics Data Mining at the University of Alabama in Huntsville for the Summer 2004 semester The book is Data Mining Concepts and Technigues by Jiawei Han and Micheline Kamber published by Morgan Kaufmann in 2001 Its ISBN is 1558604898 We shall quote frequently from this book and cite it as DM so that the citation DM 1 refers to a citation from page 1 of the book Data Mining Concepts and Technigues Following the reference just mentioned we ask about the motivation for the subject of data mining Why is data mining important and what do we hope to gain from it The major reason that data mining has attracted a great deal of attention in the information industry in recent years is due to the wide availability of huge amounts of data and the imminent reed for turning such data into useful information and knowledge Data mining can be viewed as a result of the natural evolution of information technology DM 1 As we shall see Data Mining is one of a number of tools for extraction of information from a sea of data The mathematical discipline of statistics provides a number of tools for discovering patterns in a sea of data The restriction on these statistical tools is again that the researcher must have some previous idea of what measurements are significant in the data There is also the problem of too many variables most humans can focus on at most four variables at once There is also the point that statistical analyses can be applied only to data that are selected as possibly holding significant information As our reference puts it The abundance of data coupled with the need for powerful data analysis tools has been described as a data rich but information poor situation The fastgrowing tremendous amounts of data has far exceeded our human ability for comprehension without powerful tools As a result data collected in large databases become data tombs 7 data archives that are seldom visited DM 4 Version of 7132010 Chapter 9 Page 6 of 13 pages Knowledge Engineering and Data Mining We have already seen one signi cant application of data mining when we studied arti cial neural networks previously This is the use of arti cial neural networks by Chase Manhattan Bank to examine data on the use of stolen credit cards and discovered that the most suspicious sales were for women s shoes costing between 4000 and 8000 This is a perfect example of the discovery of an unanticipated pattern in a large amount of data The essential question in data mining is how to automate the discovery of patterns in data by which we mean the identi cation of structure that has not previously been suspected to exist The book compares data mining to hardrock mining such as the search for gold The analogy is valid in that real information is valuable and uncommon in the piles of data that must be examined One way in which the analogy fails is that hardrock mining typically separates a valuable mineral from material of no use often called tailings In data mining the data that are rejected for one application might remain valuable for another Another way in which the analogy fails is due to the fact that the term is somewhat of a misnomer Remember that we are mining data in order to retrieve information As our favorite reference puts it Simply stated data mining refers to extracting or mining knowledge from large amounts of data The term is actually a misnomer Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining Thus data mining should have been more appropriately named knowledge mining from data which unfortunately is somewhat long DM 5 Often the data of interest are stored in large amounts billions of records in a facility called a data warehouse Our reference de nes a data warehouse as a repository of multiple heterogeneous data sources organized under a uni ed schema at a single site in order to facilitate management decision making DM 3 While it should be obvious that value can be obtained by use of traditional database tools such as queries on data stored in a data warehouse we focus on techniques that will assist in information discovery In the above discussion we note that a data warehouse is frequently a repository for heterogeneous data by which we mean data contained in a variety of different database types The best probable situation is that all of the databases will process standard SQL queries and precompiled procedures As we shall see in a minute data in data warehouses often have been preprocessed in order to facilitate their use when needed We now consider the basic steps involved in data mining Broadly speaking these fall into two classes 1 Populating the data warehouse and 2 Extracting data from the data warehouse Version of 7132010 Chapter 9 Page 7 of 13 pages Knowledge Engineering and Data Mining Basic Steps in Data Mining Here are the seven steps of data mining taken from our favorite reference First we populate the data warehouse by preprocessing the data and making them available through some sort of common schema lData cleaning to remove noise and inconsistent data 2Data integration where multiple data sources may be combined DM 7 Having established our data warehouse we now propose to use it Extraction of knowledge from the data involves the following steps 3 Data selection where data relevant to the analysis task are retrieved from the database 4 Data transformation where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations for instance 5 Data mining an essential process where intelligent methods are applied in order to extract data patterns 6 Pattern evaluation to identify the truly interesting patterns representing knowledge based on some interestingness measures 7 Knowledge representation where visualization and knowledge representation techniques are used to present the mined knowledge to the user DM 7 Components of a Data Mining System A typical data mining system has a number of components both hardware and software This section is paraphrased from DM 7 7 9 l The Information Repository 7 this is a set of databases and spreadsheets containing the information to be processed Recall that a data warehouse usually comprises a heterogeneous set of data sources N The Database Server 7the software amp hardware system responsible for fetching the relevant data 3 The Knowledge Base 7 this is the domain knowledge that is used to guide the search for interesting patterns in the data In general blind searching for patterns in the data will yield too many data to be useful The knowledge base includes user beliefs and expectations as well as constraints and thresholds on the data 4 The Data Mining Engine 7 this contains a set of functional modules for cluster analysis characterization classification and association analysis of data 5 The Pattern Evaluation Module 7 this component employs measures of interest in the data and serves to focus the search towards those interesting patterns 6 The Graphical User Interface 7 this module presents the patterns of interest in a way that the user can easily comprehend Version of 7132010 Chapter 9 Page 8 of 13 pages Knowledge Engineering and Data Mining While the precise de nition of a data mining system might be somewhat elusive we can make some obvious disclaimers Here again we quote our favorite textbook While there may be many data mining systems on the market not all of them can perform true data mining A data analysis system that does not handle large amounts of data should be more appropriately categorized as a machine learning system a statistical data analysis tool or an experimental system prototype DM 9 We adopt a database perspective in our presentation of data mining in this book That is emphasis is placed on e lcient and scalable data mining techniques for large databases For an algorithm to be scalable its running time should grow linearly in proportion to the size of the database given the available system resources such as main memory and disk space DM 9 Data Mining Functions What Kinds of Patterns Can Be Mined We have observed that data mining involves processing of large databases in search of interesting patterns in the data We now consider the types of data patterns that can be extracted from the data In general data mining tasks can be classi ed into two broad categories descriptive and predictive To quote the reference Descriptive mining tasks characterize the general properties of the data in the database Predictive mining tasks perform inference on the current data in order to make predictions DM 21 In the following paragraphs we describe a few of the more useful data mining functions ConceptClass Description One form of data mining is the association of data into classes or concepts with the idea of inferring information from class membership For example a casino might classify the gamblers into penny ante and high rollers High rollers usually bet and lose a lot of money so the casino wants to keep them gambling This is without regard to whether it is 10000 or 250000 that the poor sucker has just lost High rollers get free drinks free meals and o en free rooms in an attempt to keep them happily gambling Association Analysis Association analysis is the discovery of association rules in the data that is attributevalue conditions that occur frequently together in sets of data More formally association rules are of the form X 3 Y interpreted as database records that satisfy the conditions in X are also likely to satisfy the conditions in Y For example consider X age P 20 29 AND income P 20K 29K Y buys P CD player The association rule X 3 Y would indicate that people in the age range 20 to 29 with incomes in the range 20000 to 29000 often buy CD players Version of 7132010 Chapter 9 Page 9 of 13 pages Knowledge Englneenng and Data ang There are two rrnportantrneasures assoerateol wrtln assoeratron analysls These rneasures are ealleol suppnrt and can relene Formal y spe ng for assoerauon rule x support 2 1 7mm Ytlne probabrlrty tlnat atransaeuon reeorol eontarns one or botln of entrres x and Y 50 an x entry also eontarns a Y entry processlng tlne olatareeorols ofan eleetronres store age 320 29quotAND rneorne P 20K 29Kquot buys P CD playerquot support 2 con dence 60 Thls rndleates tlnat of all tlne eustorner reeorols unoler study 1 2 suppnrt are 20 to 29 years of age wrtln an rneorne m tlne range 20000 to 29000 or have purchased a CD player at tlne eleetronres store and 2 tlnere ls a 60 pr bablllty ennrulenee or eertarnty tlnat a eustorner m tlus age anolrneorne braeket have purchased a CD player TneDe nn Tree as aDat ng Tnnl one ofthe rnost popular tools foruse m olata mmmg ls ealleol a deelslnn tree whlch ls a wt n andleafnodes The tree always starts at tlne rootnoole and generates new nooles whlch eorresponolto subsets ofthe olata sausfylng speclfll quesuons The gure below shows the basle structure ofa deelslon tree Figure Basie Structure at z Deeisinn Tree for tlne Yes elulol node to be on tlne rrglnt wlule tlne No chlldls on tlne left one should probably use a eonsrstent eonyenuon when dlsplaylng a deersron tree A any glyen u e a deersron tree wlll have one rootnoole anol anurnber oflntenor andleafnodes Eaeln noole ls labeledln two ways Verslon of7132010 Chapter 9 Page 10 of13 pages Knowledge Englneenng and Data ang 1 Exphertly by some objective funeuon that one wants to mlmmlze orrnaxrrnrze and 2 Lrnphertly by the path sequence of deersrons from the root to thatnode We have sardthat data mmmg refers to the atternptto extraet knowledge from data the smallpox vaeerne 1n thrs gure we rrnagrne arandorn sample of 1000 persons taken Agarn Sample Pnpulztinn smallpox Thls m fact ls true 39 333 All Hnusehnlds Incnme a 20700 Incnme gt 20700 a H V we gure 20700 eame from397 The answerls relatedto the rneasure ealled the Gini Inelgx or the Gini chemelent What we are tryrng to do ls examlne eaeh posslble dependent vanable and deterrnrne whether ornot there ls avalue ofthls dependentvanable that allows us to separate nodes wrth the deslred objectlve rneasure from those that do not Verslon of7132010 Chapter9 Page ll of13pages Knowledge Englneenng and Data ang ng horne ownershrp produees an obvlous qurte rrnportant The rst questron asked conceml blnary answerewesquot or No so there ls no dlfflculty m separatrng the answers rnto two y a a questron askedln generaung the deersron tree w turn our attenuon to the questron Is the rneorne greater than 20700quot Where dd ck g ess gure of 20700 eorne from397 Thls rnay have been alu y u We no tlus fl Classi catinn Analysis Declsl n trees are one ofthe tools ofclasslflcanon analysrs whlch ls data elasses or eoneepts for the purpose ofbelng able to use the rnodel to predet the analysls ofa set oftramlng dataquot DM 24 In other rd For mn e as ether a good buyquot or a lernon EIT esnr An There are two a thlonal analysls tools that are based not on elasslfleatlon but on desenblng J d Forexample nruruluur l rn eonneetedto any eentral srte A Gengraphleal Clustering Verslon of7132010 Chapter 9 Page 12 of13 pages Chapter 1 Introduction to knowled ge based systems In this rst chapter we attempt to de ne what we mean by the term Arti cial Intelligence and to give a brief history of the subject What AI is not First we must state a disclaimer that may be important in Georgia and other agricultural states This author was a part of a conversation in which the use of AI on the farm was a central topic Fortunately before he made a fool of himself this author recalled another meaning for AI 7 Arti cial Insemination a process that it is quite important for breading good strains of cattle and other farm animals We shall have enough problems when restricting AI to its computational context so we avoid the biology 11 Intelligent Machines If we are to study the eld of arti cial intelligence we must immediately face a number of questions The most obvious is question number 1 1 What does it mean for a person to be intelligent 2 What would it mean for a machine to be intelligent 3 Rarely asked Does arti cial intelligence mean mimicking human intelligence The above three questions are a good example of the aphorism that any fool can ask many more questions than any wise person can answer By the way 7 what is wisdom and how does it relate to intelligence One could continue on this path for quite some time There are two de nitions of arti cial intelligence that sidestep most controversy We might as well mention them rst before we proceed to a discussion of intelligence Arti cial I quot39 as an approach to 39 39 39 problems Here we must make a brief detour into the theory of computability 7 normally a very abstract and dif cult subject The key idea is that of computational complexity basically the amount of computation it takes to solve a given problem There are four basic classes of interest to us which can be given the following informal names 1 Tractable Easy to solve 2 Notprovablytractable 3 Intractable Hard to solve 4 Not solvable We immediately set aside the small set of problems for which it can be proven that no algorithm exists that will provide a solution and move on to the dichotomy of tractable problems and everything else The class everything else includes both problems for which it can be proven that any solution takes a lot of computation class 3 and those for which no known ef cient solution exists class 2 Basically either we know of an ef cient solution or we do not know of one Problems that are not provably tractable include those for which the only known approach to solving a problem of size 50 takes 608301062 computations One view of arti cial intelligence is that it focuses on nonalgorithmic solutions to problems for which the best known algorithms take too much time Problems in this class include the famous Traveling Salesman Problem TSP for which all known algorithms take time proportional to the factorial of the number of cities 7 to be precise N 7 1 steps to solve the N city problem This is the example used to provide the number above based on the approximation 49 m 608301062 A computer capable of 1012 operations per second there are none at present but their creation is a reasonable nearterm research goal would require at least 608301050 seconds or slightly less than 201035 years to complete This view of AI is the easiest to defend theoretically but note that it completely avoids any mention of the word intelligence The Turing Test The Turing Test devised by the mathematician Alan Turing avoids the slippery definition of the word intelligent by positing a simple welldefined test called the Turing Imitation game A person called the tester male or female 7 it does not matter is seated at a terminal connected to two other input devices One of these input devices is also a terminal being operated by another human and the other input device is actually a computer The tester types sentences to either the other human or the computer and attempts to determine which is the computer The question is whether or not the computer can be programmed to mimic the human sufficiently to fool the tester for a period of five minutes Note that the original Turing test had a tester trying to determine which of two humans is a male and which is female assuming that the tester is communicating with one of each but that in today s politically correct climate we prefer to forget this chauvinistic test One should note immediately that the test is based on the communication of textual information so that it avoids two areas considered very important in the field of Artificial Intelligence the understanding of spoken natural language and the analysis of visual scenes Before we consider the subject of intelligence in general and predictably fail to come to any good definition let us dispose of a few questions Should an AI system attempt to mimic human intelligence The answer is that it is not necessary We present two examples one from the military 1 The US Army has an interest in tank warfare specifically finding and neutralizing enemy tanks before they can in ict damage on our forces At least one army commander known to this instructor would be more than happy with a robot that stalked and attacked enemy tanks at the intelligence level of a common house cat This author at one time had three house cats who teamed very efficiently to trap and catch squirrels The result seemed very funny to this author but the targets took great exception to the sport 2 There are many bioremediation applications for small robots with the intelligence level of an ant Ants are very primitive creatures that display rather remarkable group intelligence Imagine a swarm of robots dedicated to cleaning a toxic spill and then returning to the nest where they deposit the toxic materials Can machines do anvthing that humans can do Some people answer this one in the af rmative claiming that it is possible theoretically to build a computational model of the human brain Were this possible one could clearly create a program to implement that computational model This approach perhaps is a bit too simplistic although those who deny such a claim are also a bit simplistic Speci cally it is not possible to prove that machines will never evolve behaviors such as love delity and moral choice Experiments by Douglas Lenat have yielded computer programs capable of remarkable and unexpected creative discovery but admittedly in a constrained area The author of these notes longs to program a computer in such a way as it will feel pain and embarrassment Perhaps one can infer that not all of this author s programs have worked The basic answer to this question is that it is probably not worth asking within the context of computer science Were it true it would offer great help to those in the eld of psychiatry in that one could reprogram schizophrenics and other mentally ill people to be normal Much science ction consider the movie Total Recall seems to be based on a similar premise but we should leave this one alone and focus on approaches that are computationally feasible 111 What is intelligence as presumably evidenced by humans This is a very tough question for anybody to answer so the rst response to this question is to ask why we as computer scientists should care about the number of possible answers to this question The reason that we should care is that a number of models of intelligence lead immediately to distinct approaches to software design We shall begin with one model that is not very helpful and move on to those we can use The Human Computer There are known cases of humans with remarkable computing abilities Many of these are called idiot savants in that they have this ability at the cost of normal functionality A calendar calculator is a typical idiot savant he can almost immediately tell what day of the week any given date fell on such as July 27 1782 but probably cannot make change for a twenty dollar bill More perplexing are fullyfunctional people with such abilities there is a wellknown scientist who used to work at Los Alamos he could take the square root of a 51 digit number in his head in about ve seconds Apart from this ability which he considered on the same level as a wellexecuted magic trick the man was a remarkable scientist who interacted well with others a trait absent in idiot savants and did much productive work Consider a simpler problem 7 addition of two numbers with large 20 or more numbers of digits The inability of most of us to do this in our heads evidences one fact about our mental processes that they are not register oriented 7 we cannot set up variables mentally store values in them and instantly recall the values as needed The reader should note one of the consequences of the last examples People who appear to have a register oriented memory in that they can memorize and recall large numbers are rare while most models of computing machines are register or variable oriented This indicates that a modern digital computer may not be a natural model of the human mind leading to the conclusion that human intelligence may be hard to mimic in such a computer Intelligence as the accumulation of knowledge One model of intelligence focuses on the acquisition of large amounts of knowledge and the ability to reason using such knowledge In an informal sense this picture represents one of the more common ideas of intelligence how well does one score on an IQ test Put simply this view equates intelligence with the ability to do symbolic reasoning The computational model that re ects this approach is that of expert systems When we study expert systems we shall nd them to be quite powerful but only in narrow ranges of specialization Specifically expert systems do not adapt well to unexpected inputs Learning machines Another definition of human intelligence relates to the human s ability to learn and change behavior in response to feedback One may quickly ask whether or not a computer can learn and what it would mean for a computer to do so This author cannot let this discussion pass without telling a true story from his freshman days at college in 1964 if you must know He was learning to program a small computer called the IBM 1620 That machine had a program that played threedimensional tictactoe In this game the human was always invited to play first After a number of losing games this author finally beat the machine On a guess he repeated the same set of moves the machine responded identically to the previous game and this author again beat the machine After repeating this winning game a number of times this author decided that the computer was not learning from its mistakes It is now quite possible to create computer programs that learn at least to the degree that such a program would not play the same losing game time after time We shall discuss this possibility when we get to the representation of game trees Ability to Survive A number of philosophers including Heidegger and MerleauPonty intelligence is not knowing what is true and how to reason from it but the ability to survive in a world that is constantly changing This would be a logical conclusion if one were to posit that intelligence arose as a result of evolutionary pressures in a changing world beginning with the origins of the human species in Africa a few million years ago This survivalist intelligence model might lead to a computational approach that focuses on interacting with an environment and responding appropriately Genetic algorithms were inspired by this approach in this programming method a number of trial solutions are evaluated with the best ones being kept and new solutions generated based on these 7 only the better solutions survive while the inferior solutions are pruned Social Intelligence For many species intelligence seems to be a product of cooperative behavior As we have mentioned earlier an ant colony can generate apparently intelligent behavior out of the interaction of a large number of very simple agents 7 the individual insects Similar behavior is noted among other social insects such as bees This behavior has been adapted in a style of programming based on creating a collection of simple autonomous agents each of which interacting with a number of others to solve complex problems 12 Histnry nrAru39rlelal Intelligence th Hl w These notes wlll not duplreate that hrstory but only eomment on rt 121 one ofthe earller supposed examples of Axu cl Lntellrgenee was a chessrplaylng machlne ealled The Tur that toured Europe and the Umted States m the late 18 and earlrest 19quot eentury The gure at nght was i gt 1 a did see 8 ma 3 Ell 0 e 3 was openedto reveal the marvelous eloekworklnslde 1769 by Kempelen who ehallenged all eomers to play the maehrne The Turk dld m fact defeat almost all opponents e t e t t t t l4 d slg d hrd lnspeeuon the Turks enhrbrtron anumber ofhuman ehess players provlded the rntelhgenee Fd r llnv Turk posslbly lose a mateh 122 General Purp use Emblem shivers useable results The early age ofAmflclal Lntellrgenee was eharaetenzed by boundless t problems Two ofthe early programs were ageneral purpose reasomng tool called Advlce Takerquot and another called General 1gtroblem Solverquot Thls author conslders thrs early stage of Early m the development of Amflclal 1ntelhgenee rt beeame obvlous that procedurerorlented languages sueh as Assembly Language and the reeently developed FORTRAN dd not flt well to the problems oflnterest For thrs reason anum er o programmmg languages were lncludmg the language LISP developed m the late 195039 s by John MeCarthy whlle at MIT one ofthe key funeuons m LISP ls ealled EVAL whlch does exactly what one would enpeet ert evaluates expresslons The story 15 thatDr MeCarthy vlewed LISP purely as a formal language untrl one ofhls graduate students wrote an rnterpreter for the EVAL funeuon e The name LISP stands for Qt Processing Language re ecting the fact that lists form one of the most important data structures in this type of programming Those of us who have worked with the language know that it really stands for Lots of Insipid tupid Barentheses LISP is an old programming language and carries its history with it It was originally developed on an IBM computer one that had two speci c sets of registers AR the Address Registers DR the Decrement Registers In the assembly language interpreter for the LISP language it became the practice to store the first element in a list at an address held in the address register and the rest of the list in an address held in the associated decrement register For this reason the two main operators for manipulating lists became CAR Qontents of the Address Register CDR Qontents of the Qecrement Register The student should realize that LISP was written with ease of interpretation by the computer as a main goal Interpretation by humans was of little importance since most of those humans who first saw it were mere graduate students aka graduate slaves To show the difference we compare two very simple statements each in a language of the 1950 s FORTRAN X 1 2 LISP SETQ X l 2 The student should note that we all make use of one product of this development of LISP as a programming language It was realized early that a human would need a fairly sophisticated visuallyoriented text editor to write LISP programs EMACS was one of the first such editors having special features to balance parentheses in an expression Consider the following two examples where the underline indicates the position of the cursor SETQ X l 2 the first parenthesis as the matching one starts to ash SETQ X l 2 the second opening parenthesis starts to ash If you like the Visual C editor you owe a debt to the early LISP developers Note the numerous parentheses in the Lots of Insipid Stupid Parentheses example This greatly facilitates parsing by a computer Note also the prefix notation l 2 with the operator as rst in the expression as opposed to the FORTRAN infix expression In general either prefix or postfix reverse Polish notation is easier for computers to handle Another development in the early age is that of fuzzy logic first announced in a 1965 paper Fuzzy Sets by Lotfi Zadeh We shall spend some time on the study of Fuzzy Expert Systems Chapter 4 in the textbook which are rulebased systems using fuzzy logic At this time we merely give a very simple example of fuzzy sets We all know what cold weather is and what hot weather is In formal set theory we would have to specify a fixed temperature T such that if the temperature exceeds T then the weather is hot otherwise if the temperature does not exceed T then the weather is not hot Almost everyone would agree on the following statements 104 F 40 C is de nitely hot 50 F 10 C is de nitely not hot 86 F 30 C is not quite clear but probably is not hot One way that mimics human experience is to de ne several lzzy sets and de ne these values by membership in the fuzzy sets COLD COOL WARM and HOT Thus 104 F might be assigned the value 0 0 0 10 decidedly hot 50 F might be assigned the value 02 08 0 0 mostly cool but maybe cold Later we shall give a much more precise formulation of fuzzy sets 123 ELIZA and Natural Language Processing NOTE The source for almost all of the information on ELIZA is a report created by Christopher Rose on 2212003 for a class CSE391 presumably at the University of Pennsylvania It was located on Google using the key words arti cial intelligence natural language Rogerian and Eliza In the mid 1960 s we saw an rather frightful example of an AI program 7 called ELIZA a program designed to mimic a Rogerian psychologist 7 ie a psychologist who practices what is called Rogerian Therapy in which the therapist takes a passive role re ecting the patient s statements back by reformulating the statements as questions The ELIZA program was created in the early 1960 s by Joseph Weizenbaum at MIT as an experiment in natural language processing The scary part of this is how many people attributed human intelligence to the program which used a number of syntactic tricks to extract a question from a previous statement that the patient had typed in If the program could not apply any of its rules it output a generic question such as That is very interesting Why do you say that or Tell me about your mother Here is a typical session Remember that all interaction was by means of keyboard and some sort of monitor speci cally that no speech was involved Person Men are all alike ELIZA In what way Person They re always bugging us about something or other ELIZA Can you think of a speci c example Person Well my boyfriend made me come here ELIZA Your boyfriend made you come here Person He says I m depressed much of the time ELIZA I am sorry to hear that you are depressed In the late 1960 s the US government tried to make it easier for our researchers to read results coming out of the Soviet Union It sponsored human translations of Soviet technical journals and sponsored an effort for computer translation of articles from Russian to English The author of these notes was a user of such machine translations in the late 1960 s and found it easier to read the original Russian than the English produced by these programs Chapter 6 7 Arti cial Neural Networks The r r r r n t program eomputer Thls has workeol very well but has not been shown to be ef clentfor every problem type Thls leaves us lookmg for new ways to solve harol problems one elassreal problem of great rmportanee ls the rnterpretauon ofvlsual seenes 7 l Conslder the preture at lelt taken from the CNN web slte n In fact the problem ofvlsual seene analysls ls eentral to the autonomous vehrele project ethe the problem ls howto use thrs vlsual olatato make navrgauon oleersrons Nur wrwrk l HA NW quotc lNurth rk p weas turn ll rlth eyes hlgh level Look agarn at the preture above You wlll not say to you self There ls a r blg blob wl suchrandrsuch entensron and gray eolor ert seems to be a earquot but There ls ray t t c t A t v m Although apresentrday amflclal neural network ANN resembles the human the rst lm wldm 39W 4 L L Notes wntten 7312010 Page 1 of 10 pages CPSC5185 chapter 6 Artlflelal Neural Networks The gure below shows the strueture ofthe basle aruflelal neuron Input Signal Output Signal speclflc neuron ls that of deterrnrnrng the values to be assrgnedto the werght set wt w wm The basls ofthe amflclal neuron ls the eornputauon othe weighted input u XAZWt39Xt u The whreh seem to be aeuyated only when the rnput reaehes a threshold value There are anurnber ofways for the neuron to aetryate based on the eornputauon ofthe functlon X Xl r We have the followlng relauon between x and Kl X20 rfandonlyrfxlze Xgo rfandonlyrfxlge Thus ways to eornpute the aetryatron funeuon based on the adjusted rnput but only two or three appear to be feaslble Two are the hardlrrnrt funetrons the step and slgn funetrons u h work just as well u h the slgmold funeuon We shall use the slmpler step funeuon Notes wntten 7312010 Page 2 of 10 pages CPSC5185 chapter 6 Artrfreral Neural Networks The Pereenernn We shall flrstlnvesngate ayery srrnple amflclal neural net ealled apereepernn Thrs rs an The step functlon rs a drseonunuous functlon thatrs rneorreetly drawn rn the textbook The book suggests that there rs alrne along the Yarns that rs a part ofthefunctlon Tt rs not The functlonlsvery srrnply desenbed Y x2 0 0 forXlt0 2 1 o 1 2 For our systern the yanable x wrll represent the adjusted werghted rnput denoted by X e warplanes and Linearly Separahle Functinns In general we shall look at apereeptron wrth Nrnputs denoted rm m In rnathernaueal terrns thrs deserrbes an Nrddmenslonal spaee wrth eaeh ofthe rnputsx as a eoordrnate The equauon Ewan ee 0 deserrbes an N7 Urdlmenslonal hyperplane that t r let l H d 0 w demenslonal hyperplane also ealled alrne X2 Here we see that the lrne dryrdes the Wordlmenslonal lane rnto two parts oyequot the lrne and e belowquot the lrne The lrne rs de ned by the lrne speclfled by the equauon rn pararnetrre form Wllrl W linearly Separ XI yarrables Kr and X2 rs one that has two WfX1 W X 9 0 values Yr and Ya separatedby thrs lrne For those who lrke the more tradltronal vrew ofalrne replaee the vanablesrr andx by 7 7 0 Then let M WzlY W X W w andB 6 w and dlvlde the enure equatron by w to get the farnrlrar form YMXB whreh ean be wntten as Y rMX 7B 0 To rerterate suppose that at one pornt above the lrne w m wch e G 0 we have the functlon wrth yalue Yr The functlon rs lrnearly separable lf rt has value Yr for all values Xr anan for whreh Wrm Wanxa re gt 0 and rt has value Y2 for all values Xr anan for whreh Wrm Wanxa re lt 0 Notes wntten 7312010 Page 3 of 10 pages CPSC5185 chapter 6 Artlflelal Neural Networks We Eonslder and thnl r t b ORanolX X2 These funetlons have the expeeteol de nltlons Xl X2 AND Xl X2 OR Xl X2 XOR 0 0 0 0 0 0 0 0 0 0 l 0 0 l l 0 l l l 0 0 l 0 l l 0 l l l l l l l l l 0 The loglcal AND funetlon 15 a separable funetlon beeause the only values forwhleh lthas value 1 have Xl X2 2 Thus we ean set E l 5 and say that lt has value 1 for all values Xl and X2 forwhleh an we 715 gt 0 and lt has value 0 for all values Xl and X2 forwhleh an we 715 lt 0 m n lt has value 0 have Xl X2 0 Thus we ean set E 0 5 and say that lt has value 1 for all values Xl and X2 forwhleh an we r 0 5 gt 0 and lt has value 0 for all values Xl and X2 forwhleh an we r 0 5 lt 0 The gure at left shows the excluslve OR funetaon XOR Xr X2 Thls ean b seen notto be allnear separable functlon as ltls notposslble to olraw al through the plane suehthat all polnts on one slde 0 anol all pornts on the other slde have value 1 IV me ha e p so that ceptx anolthe OR Boolean funetaon butnot the XOR Boolean funeuon 1t ean be th rnany lnterestlng funetlons are llnearly separable To eope wlth seen at not problerns that requlre funetlons that eannotbe llnearly separable we must oleal Wth rnultalevel neural networks Agaln ths proeess ls called h39amlngquot the network Obvlously multlrlevel networks requlre arno complex trarnlng rnethool than a slngle level pereeptron For thatreason we focus rst on pereeptrons Notes wntten 7312010 Page4 of lUpages CPSC5185 chapter 6 Artrfreral Neural Networks Example The Lngieal AND functinn L1 1 N39Dof two vanablesBoolean Xr anan The logreal ANDrs 1 lf and only rfxr 1 andX21 x 2 The frrstthrng to note rs that thrs problem ean ne Sueh rs the eharaetensue of any good example of a new method 7 that rt does not requlre use of the new method but wrll rllustrate rt well Consrder equatrons othe form x x2 c alternately wrrtten as x x 7 0 or any eonstant value c gt 1 we have alrne that separates the 0 values of the AND from the 1 values of the AND Choosrng c 2 rnsures thatfor 0 gxr g 1 and 0 29 g 1 that only x 1 and X2 wrll yreldthe result X1 X2 7 2 0 0 t wrxr W2IX2 79 0 The example speclfles that e 0 20 so we drvrde the equatron ln 1K 0 1 i A L arrrve at the values of the werghts Wt 0 1 andW 01 However ofthe werghts and trarn the pereeptron to arnve at the werght set above Here rs the trarnrng rule 0 Wl 1 Compute x mm wzxp e 0 20 for grven rnputs x and x2 2 Set 1 lleZU 0 le lt 0 3 Compare the eomputed value 1 wrth the desrredresult Yd ANI on 9 and produee an error term e Yd r 4 Ltthe errorterm rs aeeeptable termrnate the txamlng If not use rt to eompute new values of the werghts w and w and return to 1 Termrnology Epoeh andIteratron At thrs pornt we must dlfferentlate between two terms used to eontrol the trarnrng sessrons ung set In our mnl four possrble rnputs to the logreal AND funeuon o 0 0 1 1 0 and 1 1 andrterate to nd anew set ofwelghts An epneh refers to a set oflteratlons one for eaeh of the possrble rnputs here there are fourlterauons for eaeh epoeh Notes wntten 7312010 Page 5 of 10 pages CPSCS 185 Chapter 6 Arti cial Neural Networks The Convergence Rule The convergence rule is that criterion supplied by the user that indicates when to stop the iterations The book is not quite explicit on what a convergence rule should be so this instructor supplies his own 7 that the error be within acceptable limits for an entire epoch 7 put another way that the error be within acceptable limits for every input used as a test case That does seem to be the rule that the book uses in the example on page 171 The Learning Rate What we are considering is called in other contexts a feedback circuit The output of the perceptron is compared to the desired value and the error fed back into the circuit Those with experience in analog circuits know that feedback can lead to serious instabilities For that reason a learning rate a 0 lt at lt 1 is introduced into the equations This learning rate controls how fast the weights change in response to errors The Training Algorithm for This Perceptron We now give the training algorithm exactly as applied for this one example We begin by assigning random values to the weights here W1 03 and W2 7 01 We now describe the training algorithm in many cases duplicating the description on the previous page Note that this algorithm is based on epochs of four iterations each so that we terminate only when we have acceptable results for each of the four possible inputs Begin the epoch by selecting X1 0 and X2 0 the first test case For Each Test Case training case for iteration p p gt 0 Calculate X W10X1 Wzon 7 020 Calculate Y byY 1 if X Z 0 Y 0 ifX lt 0 The book calls this value Yp Obtain Yd the desired output for this test case The book calls this Ydp Obtain the error term e Yd 7 Y Use this error term to compute the new value of both weights for the next iteration Use the learning rate at as follows Compute AWk onkoe fork 1 2 Compute Wkp1 Wkp AWk the weights for the next iteration Increase the iteration by one p p 1 and move to the next test case End Loop over Test Cases Terminate the loop over epochs if the error was acceptable for the four previous test cases Otherwise start the next epoch and do four more test cases End of the epoch As a side remark the term epoch sounds a bit too grand to apply to such a process but we are stuck with the term It makes this instructor think of geologic time Notes written 7312010 Page 6 of 10 pages CPSC5185 Chapter 6 Arti cial Neural Networks Begin the Algorithm We start with W1 03 and W2 7 01 We have speci ed 9 02 and 0c 01 Set p l to start the rst iteration Start Epoch l p 1 NOTE Start Epoch 2 p 5 Notes written 7312010 W103ansz70l X10andXz0 X 0300X17 01X2 7 020 03000 7 010 7 020 7 020 Y 0 Yd 0 e 0 No change in the weights W1 03 and W2 70l X1 0 and X2 1 X 0300X17 01on 7 020 03000 7 01017 020 7 030 No change in the weights For two successive iterations we have zero error This is curious but does not imply anything We must make it through an epoch with no error W1 03 and W 701 X1 l and X2 0 X 0300X17 010X2 7 020 030017 0100 7 020 010 Y1 Yd0 e7l AW10loX10e0lolo7l7010 AW 010Xzoe 0100071 000 W102ansz70l X1landXzl X 0200X17 01X2 7 020 020017 01017 020 7 010 Y0 Yd 1 e l AWl010X10e0lolol010 AW 010X20e 010101 010 W1 03 and W2 00 X1 0 and X2 0 X 0300X1 000X2 7 020 03000 0000 7 020 7 020 No change in the weights Page 7 of 10 pages CPSCS 185 Chapter 6 Arti cial Neural Networks p6 W103 ansz00 X10andXz1 X 0300X1 000X2 7 020 03000 00017 020 7 020 Y 0 Yd 0 e 0 No change in the weights p7 W103ansz00 X1 l and X2 0 X 0300X1 000Xz 7 020 03001 0000 7 020 010 Y1 Yd0 e71 AW1010X10e01010717010 AW 010Xzoe 0100071 000 p8 W102ansz00 X1 l and X2 1 X 0200X1 000Xz 7 020 02001 00017 020 000 e 0 No change in the weights NOTE Here we end an epoch with an iteration haVing no error Again this may be interesting but it is not suf cient to stop the algorithm Start Epoch 3 p9 W102ansz00 X1 0 and X2 0 X 0200X1 000X2 7 020 02000 0000 7 020 7 020 Y 0 Yd 0 e 0 No change in the weights p 10 W1 02 and Wz00 X1 0 and X2 1 X 0200X1 000X2 7 020 02000 00017 020 7 020 e 0 No change in the weights Notes written 7312010 Page 8 of 10 pages CPSC5185 p11 Start Epoch 4 Chapter 6 AIti cial Neural Networks W1 02 and W2 00 X1 1 and X2 0 X 0200X1 000Xz 7 020 02001 0000 7 020 000 Y1 Yd 0 e71 AWl010X10e01010717010 AWZ 010Xzoe 0100071 000 W1 01 and W2 00 X1 1 and X2 1 X 0100X1 000Xz 7 020 01001 00017 020 7 010 Y0 Yd1 e1 AWl 010X10e010101010 AW 010Xzoe 010101 010 p13 W102ansz010 X1 0 and X2 0 X 0200X1 0100X2 7 020 02000 01000 7 020 7 020 Y0 Yd0 e 0 No change in the weights p14 W102ansz010 X1 0 and X2 1 X 0200X1 0100X2 7 020 02000 010017 020 7 010 Y0 Yd 0 e 0 No change in the weights p15 W102ansz010 X1 1 and X2 0 X 0200X1 0100X2 7 020 02001 01000 7 020 000 Y1 Yd 0 e71 AWl010X10e01010717010 AWZ 010Xzoe 0100071 000 p16 W1010andW2010 X1 1 and X2 1 X 0100X1 0100X2 7 020 0101 01017 020 000 Y1 Yd1 e 0 No change in the weights Notes written 7312010 Page 9 of 10 pages

### BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.

### You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

## Why people love StudySoup

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "I bought an awesome study guide, which helped me get an A in my Math 34B class this quarter!"

#### "Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."

#### "It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

### Refund Policy

#### STUDYSOUP CANCELLATION POLICY

All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email support@studysoup.com

#### STUDYSOUP REFUND POLICY

StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here: support@studysoup.com

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to support@studysoup.com

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.