Artificial Intelligence CPSC 481
Cal State Fullerton
These 164 pages of class notes were uploaded by Mrs. Lue Goyette on Wednesday, September 30, 2015. The notes belong to CPSC 481 at California State University - Fullerton, taught by Christopher Ryu in Fall 2015. Since their upload, they have received 28 views.
Artificial Intelligence CPSC 481 - Introduction

What is Intelligence?
- Definition: a property of the mind including related abilities; a very general mental capability comprising the capacities for:
  - learning (both new material and from experience)
  - understanding complex concepts or natural language
  - reasoning
  - imagination or abstract thought
  - creativity
  - communication
  - planning
  - problem solving
- Different from being smart, that is, capable of adapting to the environment.

Theories of Intelligence
- Single intelligence: one single, general intelligence.
- Multiple intelligences (Howard Gardner): logical, linguistic, spatial, musical, kinesthetic, interpersonal, intrapersonal, and naturalist (sensing patterns and making connections to elements in nature) intelligences. Gardner's theory, however, has never been tested.
- Emotional intelligence (Daniel Goleman and others).
- Triarchic theory of intelligence: analytical intelligence, creative intelligence, and practical intelligence.

Other Aspects of Intelligence
- Animal or plant intelligence: is human intelligence different from animal or plant intelligence?
- Factors affecting intelligence: biological and environmental.
- Evolution of intelligence.
- Many aspects of intelligence and cognitive processes are still unknown:
  - Exactly what happens when learning occurs?
  - How is knowledge represented in the nervous system or brain?
  - What is intuition or skepticism?
  - What is self-awareness?
- Measurements of intelligence: IQ (psychometric testing), EQ, etc.

Artificial Intelligence (AI)
- Definitions of AI:
  - "The science and engineering of making intelligent machines" (John McCarthy).
  - The study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.
- AI fields and research:
  - The field was founded on the claim that a central property of humans, intelligence, can be described so precisely that it can be simulated by a machine. AI has been influenced by many people in different areas, including philosophers, psychologists, mathematicians, computer scientists, etc.
  - Influenced by rationalists (many philosophers): theoretical, mathematical, logical approaches.
  - Influenced by empiricists: a focus on human intelligence through the human cognitive system; knowledge is acquired by experience.

Categories of AI
- Thinking humanly and acting humanly.
- Thinking rationally and acting rationally (important for ideal decision making).

Turing Test: Acting Humanly
- The Turing test, proposed by Alan Turing (theory of computability): can a machine be made to think?
- The test measures the performance of an intelligent machine against that of a human being, the best and only standard for intelligent behavior, through the imitation game. [Figure: an interrogator questions a machine and a human without knowing which is which.]

Importance of the Turing Test
- Attempts to give an objective notion of intelligence.
- Prevents us from being sidetracked by confusing and currently unanswerable questions, such as whether or not the computer uses the appropriate internal processes, or whether or not the machine is actually conscious of its actions.
- Eliminates any bias in favor of living organisms by forcing the interrogator to focus solely on the content of the answers to questions.

Criticisms of the Turing Test
- Focuses on purely symbolic, problem-solving skills, not on perceptual skill.
- Only based on humans; machines can do far better than humans on many tasks.
- A major distraction from more important tasks, such as developing general theories to explain the mechanisms of intelligence in humans and machines and solving specific, practical problems.
- "Computers can only do as they are told and consequently cannot perform original actions" (Ada Lovelace).
- It appears impossible to create a set of rules (programs) that tells an individual what to do under every possible set of circumstances; however, advanced AI techniques show this to be unnecessary.

Major Areas of Interest at the First AI Conference (Dartmouth, 1956)
- Automatic computers: a program that simulates the machine.
- How can a computer be programmed to use a language, the way humans use and manipulate words?
- Neuron networks.
- Theory of the size of a calculation.
- Self-improvement through learning and reasoning.
- Abstractions.
- Randomness and creativity, and the difference between them.

Evolution of AI
- AI has been the subject of optimism but has also suffered setbacks; today it has become an essential part of the technology industry.
- Early stages: focused more on concepts and theories simulating human intelligence; formal logic, predicate calculus, stochastic analysis, graph theory, theorem proving, and other related mathematics; problem solving, knowledge representation, search, etc.
- Modern AI: focuses more on the practical aspects of AI for problem solving, driven by demand for practical AI such as real-world problem-solving strategies and business applications; machine learning for data mining, search engines, intelligent systems, etc.

Scope: What System Is Considered Intelligent? (examples)
- The use of computers to do reasoning, pattern recognition, learning, or some other form of inference.
- A focus on problems that do not respond to algorithmic solutions; this underlies the reliance on heuristic search as an AI problem-solving technique.
- Answers that are neither exact nor optimal but are in some sense sufficient; this is a result of the essential reliance on heuristic problem-solving methods in situations where optimal or exact results are either too expensive or not possible.
- A concern with problem solving using inexact, missing, or poorly defined information, and the use of representational formalisms that enable the programmer to compensate for these problems.
- Reasoning about the significant qualitative features of a situation.
- An attempt to deal with issues of semantic meaning as well as syntactic form.
- The use of large amounts of domain-specific knowledge in solving problems (expert systems).
- The use of meta-level knowledge to effect more sophisticated control of problem-solving strategies; although this is a very difficult problem, addressed in relatively few current systems, it is emerging as an essential area of research.

Areas of AI
- AI research is specialized and deeply divided into subfields:
  - Perception and the ability to move and manipulate objects
  - Learning
  - Knowledge representation
  - Natural language understanding
  - Reasoning
  - Planning
  - Problem solving
  - Abstract thought and creativity
- AI offers a unique and powerful tool for exploring these questions, expanding the capabilities of computer science.
- However, general intelligence, or "strong AI," is still a long-term goal of research.

State of the Art
- Deep Blue defeated the reigning world chess champion, Garry Kasparov, in 1997.
- A theorem prover proved a mathematical conjecture (the Robbins conjecture) that had been unsolved for decades.
- "No Hands Across America": driving autonomously 98% of the way from Pittsburgh to San Diego.
- During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program that involved up to 50,000 vehicles, cargo items, and people.
- NASA's onboard autonomous planning program controlled the scheduling of operations for a spacecraft.
- Proverb solves crossword puzzles better than most humans.
- Video analytics to detect various kinds of object motion.
- Voice recognition and natural language understanding.

Future AI
- Can a machine with intelligence be more intelligent than a human?
- Ethical issues may arise once we have a truly intelligent system with strong AI.

Primary Focus in this Course
- We try to understand intelligence and to implement intelligence: problem solving, reasoning, learning, creativity, natural language understanding, planning, etc.
- We also want practical AI in the form of complex and intelligent problem-solving strategies.

References
- George Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Chapter 1, Addison Wesley, 2009.

Artificial Intelligence CPSC 481 - Representation and Search: State Space Search for Problem Solving

Lecture Overview
- States, state space, and search for the representation and solution approach in the problem-solving process.
- Use of graph theory to describe state spaces and the problem-solving process.
- Search direction: goal-driven vs. data-driven.
- General search approaches for state space search: depth-first search, breadth-first search, backtracking.

Representation and Search
- Representation for problems, solutions, or knowledge; various representation schemes:
  - Representation in a computer: floating-point number, variable, array, record, table, object, list, tree, queue, graph, etc.
  - Knowledge representation schemes: semantic network, frame, etc.
- Importance of representation: to capture the essential features of a problem domain and make that information accessible to a problem-solving procedure. Expressiveness and efficiency are the keys; we need to optimize the trade-off between expressiveness and efficiency.
- State, state space, and state space search methods in the problem-solving process: a state is an abstract representation; the state space is all possible states; the search method serves as the problem-solving method. Think about human problem-solving examples: math problems, or finding a location.
- AI as a problem of representation and search.

The City of Königsberg
- Is there a walk around the city that crosses each bridge exactly once? [Figure: the city of Königsberg, with two riverbanks (rb1, rb2), two islands, and seven bridges.]
- Euler invented graph theory to solve this problem. [Figure: graph of the Königsberg bridge system.]
- Euler proved the result: unless a graph contains either exactly zero or exactly two nodes of odd degree, the walk is impossible.
- Many other real-world problems can be thought of as graph problems; state and state space can be represented using graph theory.

DEFINITION: GRAPH
A graph consists of:
- A set of nodes N1, N2, N3, ..., Nn, which need not be finite.
- A set of arcs that connect pairs of nodes. Arcs are ordered pairs of nodes; i.e., the arc (N3, N4) connects node N3 to node N4. This indicates a direct connection from node N3 to N4 but not from N4 to N3, unless (N4, N3) is also an arc, in which case the arc joining N3 and N4 is undirected.
- If a directed arc connects Nj and Nk, then Nj is called the parent of Nk and Nk the child of Nj. If the graph also contains an arc (Nj, Nl), then Nk and Nl are called siblings.
- A rooted graph has a unique node Ns from which all paths in the graph originate; that is, the root has no parent in the graph.
- A tip or leaf node is a node that has no children.
- An ordered sequence of nodes [N1, N2, N3, ..., Nn], where each pair Ni, Ni+1 in the sequence represents an arc (Ni, Ni+1), is called a path of length n - 1.
- On a path in a rooted graph, a node is said to be an ancestor of all nodes positioned after it, as well as a descendant of all nodes before it.
- A path that contains any node more than once (some Ni in the definition of path above is repeated) is said to contain a cycle or loop.
- A tree is a graph in which there is a unique path between every pair of nodes; the paths in a tree therefore contain no cycles.

A Labeled Directed Graph
- Nodes: {a, b, c, d, e}. Arcs: {(a,b), (a,d), (b,c), (c,b), (c,d), (d,a), (d,e), (e,c), (e,d)}.
- Directed graph: a graph is directed if its arcs have a direction.
- Path: a sequence of nodes through successive arcs, e.g., [a, b, c, d].

A Rooted Graph Is a Tree
- [Figure: a tree showing family relationships (parent and children), with nodes a through i.]
- A tree has a root, with a path from the root to every node; every path is unique and contains no cycle.

AND/OR Graphs to Represent Subproblems and Alternative Paths
- AND: q and r are subproblems that must both be solved. OR: q and r are alternative paths.
- An AND/OR graph can also be represented by implication in logic: q AND r -> p (subproblems); q OR r -> p (alternatives).
- The AND operator indicates a problem decomposition into subproblems to be solved; the OR operator indicates alternative solution paths. AND/OR graphs can be used as a search method.

DEFINITION: STATE SPACE SEARCH
A state space is represented by a four-tuple [N, A, S, GD], where:
- N is the set of nodes or states of the graph; these correspond to the states in a problem-solving process.
- A is the set of arcs (links) between nodes; these correspond to the steps in a problem-solving process.
- S, a nonempty subset of N, contains the start states of the problem.
- GD, a nonempty subset of N, contains the goal states of the problem. The states in GD are described using either (1) a measurable property of the states encountered in the search, or (2) a property of the path developed in the search, for example the transition costs for the arcs of the path.
A solution path is a path through this graph from a node in S to a node in GD. State space search is a search method for finding a solution in the state space.

Problem Solving through State Space Search
- Nodes correspond to partial problem-solution states: results of logical inferences, different configurations of a game board, or states that describe the knowledge of a problem.
- Arcs represent transitions between states and correspond to the steps in a problem-solving process: logical inferences or legal moves of a game. Applying a rule is represented as an arc between states.
- The problem-solving process can be considered as a state space search for a solution path from initial states to goal states (solution states that meet the criteria). Search space: all possible paths. Solution space: the limited set of solution paths.
- It can also be used as a means of determining problem complexity, e.g., the search space of all possible moves: tic-tac-toe, 9 x 8 x 7 x ...; chess, about 10^120.

State Space of the 8-Puzzle
- Generated by "move blank" operations; search is moving from an initial state to a goal state. [Figure: the 8-puzzle state space expanded by moving the blank.]

Traveling Salesman Problem
- Goal: find the shortest path for the salesman to travel, visiting each city and returning to the starting city.
- An instance of the traveling salesperson problem: the tour A, D, C, B, E, A with 450 miles. Is it the optimal (minimum) solution?
- [Figure: search space for the traveling salesman problem; each arc is marked with the total weight of all paths from the start node A to its endpoint.] Example path costs: A,B,C,D,E,A costs 375; A,B,C,E,D,A costs 425; A,B,D,C,E,A costs 475.

Looking for the Shortest Path with the Nearest-Neighbor Heuristic
- [Figure: the nearest-neighbor path is shown in bold.] Note: this path, A, E, D, B, C, A, at a cost of 550, is not the shortest path. The comparatively high cost of arc (C, A) defeated the heuristic.

DEFINITION: FINITE STATE MACHINE (FSM)
A finite state machine is an ordered triple (S, I, F), where:
- S is a finite set of states in a connected graph: s1, s2, s3, ..., sn.
- I is a finite set of input values: i1, i2, i3, ..., im.
- F is a state transition function that, for any i in I, describes its effect on the states S of the machine; thus for all i in I, Fi: S -> S. If the machine is in state s and input i occurs, the next state of the machine will be Fi(s).
- A finite directed connected graph can be seen as an abstract model of computation, such as a finite state machine. [Figure: the finite state graph for a flip-flop, in visualized representation and as a transition matrix (a compact data structure).]

Search Direction in State Space
- Data-driven (forward): begin with the given information (states, legal moves, or rules); search by applying rules to produce new states until a goal state is found. E.g., math problems and most other problems. Uses the knowledge and constraints found in the given data of a problem to guide the search.
- Goal-driven (backward): take the goal to solve and see what rules or legal moves could be used to generate this goal, and determine what conditions must be true to use them; these conditions become the new goals (subgoals) for the search. Continue working backward until the search works back to the facts of the problem. E.g., multiple-choice questions, question answering, theorem proving (when the goal is already known), and finding causes when something has already happened. Uses knowledge of the desired goal to guide the search.
- Note: both approaches explore the same problem space. The preferred strategy is chosen based on the properties of the problem; factors to consider include complexity, branching factor, and ease of implementation.

When Is Data-Driven Search Better?
- When all or most of the data are given in the initial problem statement.
  Interpretation problems often fit this mold, presenting a collection of data and asking the system to provide a high-level interpretation; systems that analyze data, e.g., interpreting geological data to find minerals (PROSPECTOR or Dipmeter).
- When there are a large number of potential goals but only a few ways to use the facts and given information of a particular problem instance. The DENDRAL expert system finds the molecular structure of organic compounds based on their formula and mass.
- When it is difficult to formulate a goal or hypothesis.

When Is Goal-Driven Search Better?
- When a goal or hypothesis is given in the problem statement or can easily be formulated: theorem proving (the goal is the theorem to prove), question answering in expert systems (questions are goals), finding the cause of a problem.
- When there are a large number of rules that match the states of the problem and would thus produce an increasing number of conclusions. Early selection of a goal can eliminate most of these branches (a reduced search space), e.g., to prove the statement "I am a descendant of Thomas Jefferson."
- When problem data are not given but must be acquired by the problem solver. For a medical diagnosis problem, the doctor orders only those tests that are necessary to confirm or deny a particular hypothesis.

General Graph Search Methods
- Depth-First Search (DFS): when a state is examined, all of its children and their descendants are examined before any of its siblings. Goes deeper and deeper into the search space; stops only when no other descendants remain or a goal is found.
- Breadth-First Search (BFS): explores the space in a level-by-level fashion; only moves to the next level when there are no more states to be explored at the current level, continuing until it finds a goal.
- Both DFS and BFS are brute-force, uninformed approaches.
- [Figure: example graph used for the DFS and BFS traces, rooted at A with goal U.]

BFS Algorithm

    function breadth_first_search
    begin
        open := [Start];                          % initialize
        closed := [];
        while open <> [] do                       % states remain
        begin
            remove leftmost state from open, call it X;
            if X is a goal then return SUCCESS    % goal found
            else begin
                generate children of X;
                put X on closed;
                discard children of X if already on open or closed;   % loop check
                put remaining children on right end of open           % queue
            end
        end;
        return FAIL                               % no states left
    end.

A Trace of BFS on the Graph
1. open = [A]; closed = []
2. open = [B,C,D]; closed = [A]
3. open = [C,D,E,F]; closed = [B,A]
4. open = [D,E,F,G,H]; closed = [C,B,A]
5. open = [E,F,G,H,I,J]; closed = [D,C,B,A]
6. open = [F,G,H,I,J,K,L]; closed = [E,D,C,B,A]
7. open = [G,H,I,J,K,L,M] (as L is already on open); closed = [F,E,D,C,B,A]
8. open = [H,I,J,K,L,M,N]; closed = [G,F,E,D,C,B,A]
and so on, until either U is found or open = [].
[Figure: states at iteration 6 of BFS. Figure: BFS of the 8-puzzle, showing the order in which states were removed from open.]

DFS Algorithm

    function depth_first_search
    begin
        open := [Start];                          % initialize
        closed := [];
        while open <> [] do                       % states remain
        begin
            remove leftmost state from open, call it X;
            if X is a goal then return SUCCESS    % goal found
            else begin
                generate children of X;
                put X on closed;
                discard children of X if already on open or closed;   % loop check
                put remaining children on left end of open            % stack
            end
        end;
        return FAIL                               % no states left
    end.

A Trace of DFS on the Graph
1. open = [A]; closed = []
2. open = [B,C,D]; closed = [A]
3. open = [E,F,C,D]; closed = [B,A]
4. open = [K,L,F,C,D]; closed = [E,B,A]
5. open = [S,L,F,C,D]; closed = [K,E,B,A]
6. open = [L,F,C,D]; closed = [S,K,E,B,A]
7. open = [T,F,C,D]; closed = [L,S,K,E,B,A]
8. open = [F,C,D]; closed = [T,L,S,K,E,B,A]
9. open = [M,C,D] (as L is already on closed); closed = [F,T,L,S,K,E,B,A]
10. open = [C,D]; closed = [M,F,T,L,S,K,E,B,A]
11. open = [G,H,D]; closed = [C,M,F,T,L,S,K,E,B,A]
[Figure: states at iteration 6 of DFS.]

DFS with Iterative Deepening
- DFS with a bound: once the search gets below a certain level (or runs past a time limit), assume a failure on that search path and go for another path, e.g., chess play within a limited time. This may solve some problems of both DFS and BFS.
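The BFS and DFS procedures above differ only in which end of open receives new children (queue vs. stack). A minimal Python sketch; the adjacency list below is a reconstruction consistent with the traces above, and the exact arc set is partly an assumption:

```python
from collections import deque

# Adjacency lists reconstructed from the BFS/DFS traces (partly assumed).
GRAPH = {
    "A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G", "H"], "D": ["I", "J"],
    "E": ["K", "L"], "F": ["L", "M"], "G": ["N"], "H": [],
    "I": [], "J": [], "K": ["S"], "L": ["T"],
    "M": [], "N": [], "S": [], "T": [],
}

def search(start, goal, depth_first=False):
    """BFS by default; DFS if depth_first (children go on the left of open)."""
    open_list = deque([start])
    closed = set()
    while open_list:                      # states remain
        x = open_list.popleft()           # remove leftmost state from open
        if x == goal:
            return "SUCCESS"              # goal found
        closed.add(x)
        # discard children already on open or closed (loop check)
        children = [c for c in GRAPH[x]
                    if c not in closed and c not in open_list]
        if depth_first:
            open_list.extendleft(reversed(children))   # left end: stack
        else:
            open_list.extend(children)                 # right end: queue
    return "FAIL"                         # no states left
```

For example, `search("A", "M")` runs breadth-first and succeeds, while `search("A", "T", depth_first=True)` follows the deep branch A, B, E, K, S, L, T just as in the DFS trace.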
- Depth-first iterative deepening: perform DFS with a depth bound of 1; if it fails, perform another DFS with a depth bound of 2. This continues, increasing the depth bound by one at each iteration. At each iteration, it performs a complete DFS to the current depth bound. Like DFS, no information about the state space is retained between iterations.
- Has advantages over both DFS and BFS: space usage is B x n (B = average number of children, n = level), while time complexity is O(B^n), still exponential.
- [Figure: DFS of the 8-puzzle with a depth bound of 5.]

Breadth-First vs. Depth-First
- Breadth-first search: always examines all the nodes at level n before proceeding to level n+1. BFS always finds the optimal path to the goal, but needs more memory than DFS. Appropriate for a problem with a small search space and a short solution path; a problem with a large space can be intractable.
- Depth-first search: can be efficient for a graph with many branches, because it doesn't have to keep all the nodes at a given level on the open list. Space usage is good: less memory is needed than for BFS. If the solution path is long, DFS may find it quickly without wasting effort on other branches; but it can get lost deep in the graph, possibly missing shorter paths in other branches.
- Which approach is better? There are many variations of BFS and DFS; the decision should be based on the properties of the problem.

Backtracking Search Algorithm
- One of the first search algorithms, earlier than DFS and BFS.
- Search begins at the start state and pursues a path until it reaches a goal or a dead end. If the goal is found, return it; at a dead end, backtrack to the most recent state on the path and continue with other paths.
- Works very similarly to DFS, but unlike DFS, backtracking does not generate all successors; only one successor is generated at a time, and it can detect that a path is fruitless through cost calculation and then try another path (informed search).
- [Figure: state space searched by backtracking search.]

Backtracking Algorithm

    function backtrack
    begin
        SL := [Start]; NSL := [Start]; DE := []; CS := Start;   % initialize
        % SL = state list, NSL = new state list, DE = dead ends, CS = current state
        while NSL <> [] do                 % while there are states to be tried
        begin
            if CS = goal (or meets the goal description)
                then return SL;            % on success, return list of states in path
            if CS has no children (excluding nodes already on DE, SL, and NSL)
            then begin
                while SL is not empty and CS = the first element of SL do
                begin
                    add CS to DE;                    % record state as a dead end
                    remove first element from SL;    % backtrack
                    remove first element from NSL;
                    CS := first element of NSL
                end;
                add CS to SL
            end
            else begin
                place children of CS (except nodes already on DE, SL, or NSL) on NSL;
                CS := first element of NSL;
                add CS to SL
            end
        end;
        return FAIL
    end.

A Trace of Backtrack on the Graph
- Initialize: SL = [A]; NSL = [A]; DE = []; CS = A.
- [Table: the iteration-by-iteration trace of SL, NSL, DE, and CS is garbled in the source and omitted.]

Summary
- State, state space, and search in the problem-solving process.
- Search direction.
- General search methods for state space search: depth-first search, breadth-first search, backtracking.
- Are the state space representations and search algorithms good enough to model human problem-solving skills?

References
- George Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Chapter 3, Addison Wesley, 2009.

Artificial Intelligence CPSC 481 - AI as Representation and Search: Heuristic Search for Problem Solving

Lecture Overview
- Heuristics in AI and problem solving.
- Heuristic search algorithms: hill-climbing, simulated annealing, best-first search, constraint satisfaction.
- Game-playing strategies using minimax and alpha-beta pruning.
- Performance evaluation of heuristics: theoretical evaluation criteria; complexity and efficiency issues.
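Looking back at the backtrack procedure from the previous section, here is a compact Python sketch. It uses recursion in place of the explicit SL/NSL bookkeeping (the dead-end set DE is kept); the example graph is a hypothetical stand-in, not the lecture's figure:

```python
def backtrack(graph, state, goal, path=None, dead_ends=None):
    """Return the list of states from `state` to `goal`, or None (FAIL).

    Pursues one successor at a time; at a dead end the recursion unwinds
    to the most recent choice point and another path is tried.
    """
    if path is None:
        path, dead_ends = [], set()
    if state == goal:
        return path + [state]             # SL on success
    if state in path or state in dead_ends:
        return None                       # loop, or a known dead end
    for child in graph.get(state, []):    # generate one successor at a time
        result = backtrack(graph, child, goal, path + [state], dead_ends)
        if result is not None:
            return result
    dead_ends.add(state)                  # record state as dead end (DE)
    return None

# Hypothetical example graph for illustration.
graph = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}
print(backtrack(graph, "A", "D"))   # ['A', 'C', 'D']
```

Here the search first tries A, B, hits a dead end (B is recorded in DE), backtracks to A, and succeeds via C.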
Heuristics
- Main problems of BFS and DFS: a brute-force approach leading to combinatorial explosion; no use of information or strategies to make the algorithm efficient.
- We can improve many straightforward search algorithms by adding some smart ideas, called heuristics. Improvement strategies: use intelligent strategies, maintain additional information and knowledge, and utilize them.
- Heuristics is the study of the methods and rules of discovery and invention ("Eureka! I have found it").

First Three Levels of the Tic-Tac-Toe State Space, Reduced by Symmetry
- There are nine possible first moves, but really only three: corner, center, or side. Symmetry in this game decreases the search space. [Figure: the first three levels of the tic-tac-toe state space reduced by symmetry.]

The "Most Wins" Heuristic Applied to the First Children in Tic-Tac-Toe
- Three wins through a corner square; four wins through the center square; two wins through a side square. [Figure]

Heuristically Reduced State Space for Tic-Tac-Toe
- Heuristic value calculated by counting possible wins. [Figure] Two-thirds of the search space is pruned away with the first move.

Components of a Heuristic Function
- A heuristic function f(n) = g(n) + h(n):
  - g(n): the distance (or other measure) from the start to the current state n. E.g., count 0 for the beginning state, incremented by 1 for each level of the search.
  - h(n): a heuristic estimate of the distance from state n to a goal.
- The h value guides the search toward heuristically promising states, while the g value prevents the search from persisting indefinitely on a fruitless path.
- If two states are the same or nearly the same, it is generally preferable to examine the state that is nearest to the root state of the graph (the initial state), since it gives a greater probability of being on the shortest path to the goal.
- A good heuristic function returns a unique, non-conflicting score that accurately measures which state is better.

Several Heuristics for the 8-Puzzle Game
- [Figure: the start state, first moves, and goal state for the 8-puzzle.]
- Possible heuristics:
1. Count the tiles out of place in each state, compared with the goal state.
2. Sum all the distances by which the tiles are out of place.
3. Multiply a small number (e.g., 2) times each direct tile reversal.

Three Heuristics Applied to States in the 8-Puzzle
- [Figure: candidate states evaluated against the goal by (1) tiles out of place, (2) sum of distances out of place, and (3) 2 x the number of direct tile reversals.]
- What could be the problem with each heuristic? Devising good heuristics based on limited information is not easy.

The Heuristic f(n) Applied to States in the 8-Puzzle
- [Figure: values of f(n) for each state, where f(n) = g(n) + h(n), g(n) = actual distance from n to the start state, and h(n) = number of tiles out of place.]

Simple Hill-Climbing: a Heuristic Search
- Strategies: expand the current state of the search; evaluate its children states; select the closer (better) node for further expansion, but keep no history; continue until a solution is found or no better state is reached.
- Comment on the approach: one of the simplest ways to implement a heuristic.
- Problems:
  - May get stuck at local maxima: if it reaches a state that has a better evaluation than any of its children, the algorithm halts, and may fail to find the best solution.
  - May get confused by the results of evaluation when the best choice is not clear (see the plateau figure below).

Plateau Problem in Hill-Climbing with a 3-Level Look-Ahead
- [Figure: hill-climbing can get confused on a plateau, where the costs of all paths are similar.]

Variations of Hill-Climbing
- Steepest-ascent (or -descent) hill-climbing, also called gradient search: to decide the next move, all successors are compared and the one closest to the solution is chosen, whereas simple hill-climbing takes the first better successor. Used for optimization. It still fails if there is no closer node, which may happen if there are local maxima in the search space that are not solutions.
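The steepest-ascent variant just described can be sketched in a few lines of Python. The toy scoring function below is an assumption for illustration, not from the slides:

```python
def hill_climb(start, neighbors, score):
    """Steepest-ascent hill-climbing: compare all children and move to the
    best-scoring one; halt when no child improves on the current state
    (possibly a local maximum or plateau -- no history is kept, so the
    search cannot recover)."""
    current = start
    while True:
        children = neighbors(current)
        if not children:
            return current
        best = max(children, key=score)
        if score(best) <= score(current):   # no better child: stop
            return current
        current = best

# Toy example (assumed for illustration): maximize -(x - 3)^2 over integers.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
print(result)   # climbs 0 -> 1 -> 2 -> 3, then halts: 3
```

With a bumpier scoring function the same loop would halt at the first local maximum it reaches, which is exactly the failure mode described above.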
if there are local maxima in the search space which are not solutions 0 Simulated annealing SA a A generic probabilistic metaheuristic for the global optimization problem 0 At each step the SA heuristic considers some neighbor s39 of the current state s and probabilistically decides between moving the system to state s39 or staying in state s The probabilities are chosen so that the system ultimately tends to move to states of lower energy This step is repeated until the system reaches a state that is solution good enough for the application or until a given computation budget has been exhausted o The name and inspiration come from annealing in metallurgy a technique involving heating and controlled cooling of a material to reduce their defects 13 BestFirst Search Algorithm on 0 Most algorithms backtracking hillclimbing etc a Can be effective as long as their evaluation functions are sufficient y informative to avoid local maxima dead ends and related anomalies in a search space 0 Bestfirst search always selects the most promising state on Open for expansion so it is often called Greedy bestfirst search a Using fn hn o Basic principle used 0 The more informed the heuristic the fewer states are processed o Algorithm sketch c Maintain Open to be visited and Closed already visited states 9 Maintain the information of ancestor states and the shortest path 0 Duplicate states are not retained a Open is maintained as a priority queue 9 Like a search algorithm taking only good strategies from o DFS BFS Backtracking Simple hillclimbing steepestdescent hill climbing 14 BestFirst Search Algorithm 332 r O O O 0 function bestflrstsearch O O O begin open Start initialize closed 2 while open do states remain begin remove the leftmost state from open call it X if X goal then return the path from Start to X else begin generate children 01 X for each child of X do case the child is not on open or closed begin assign the child a heuristic value add the child to open end the child is 
already on open it the child was reached by a shorter path then give the state on open the shorter path the child is already on closed it the child was reached by a shorter path then begin remove the state from closed add the Child to open end end 34 case put X on closed reorder states on open by heuristic merit best leftmost and return FAIL open is empty 15 end 0000 00000 BestFirst Search of a Hypothetical State 33 000 Space A5 Letters represent states B 4 C 4 B 6 Numbers represent heuristic values I quotK as cost eg the smaller the better I J39 in this example r V r v Equot5 F 3 14 393 I J Bold states indicate the states x x i A It expanded by the heuristic algorithm t 1 V Y M N O quot 7 s 9999 A Trace of the Execution of BestFirst Search 339 Maintain a priority queue evaluate A5 open B4C4D6 closed A5 evaluate B4 open C4E5F5D6 closed B4A5 evaluate C4 open H3G4E5F5D6 closed C4B4A5 evaluate H3 open 02P3G4E5F5DB closed H3C4B4A5 evaluate 02 open P3G4E5F5D6 closed 02H3C4B4A5 evaluate P3 the solution is found open A5 closed States after the Fifth Iteration of the While Loop using BestFirst Search A5 34 74 06 E5 F5 G H3 I J 1 02 P3 Fl 0 L M l T K i 00000 Heurlstlc fn Applied to States In the 8puzzle 3 Start 2 3 3 glnl0 1 e 4 7 5 jt C 28 3 2 3 2 83 gtnl1 1 5i 1 LG 4 7i 7 e 5 l 5 Values of fn for each state 6 4 6 where fn 9m hm gm actual distance from In to the start state and 1 2 3 Mn number of tiles out at place fn is a cost function the smaller the better 7 6 5 19 Goal Successive Stages of Open and Closed with 333339 Bestfirst Search a 1 open a4 closed 2 open C4 b6 d6 closed a4 3 open e5 f5 b6 d6 96 closed a4 04 4 open f5 h6 b6 d6 96 7 closed a4 c4 e5 5 open j5 h6 b6 d6 96 k7 l7 closed a4 c4 e5 f5 6 open US h6 b6 d6 96 k7 l7 closed a4 c4 e5 f5 j5 7 open m5 h6 b6 d6 96 n7 k7 l7 closed a4 c4 e5 f5 j5 l5 8 success m goal 20 Open and Closed as they Appear after the 53 00 3lrcl Iteration of Bestfirst Search 0 00 Staten ftaJ4 a S S d 2 4 REES 253 a 232 r s h HH H f 39a an n 
Figure: The state space generated in best-first search, with successive levels marked g(n) = 0, 1, 2, 3, 4, 5.

Admissibility and the Shortest Path

DEFINITION: ALGORITHM A, ADMISSIBILITY, ALGORITHM A*
Consider the evaluation function f(n) = g(n) + h(n), where
- n is any state encountered in the search,
- g(n) is the cost of n from the start state, and
- h(n) is the heuristic estimate of the cost of going from n to a goal.
If this evaluation function is used with the best-first-search algorithm, the result is called algorithm A.
A search algorithm is admissible if, for any graph, it always terminates in the optimal solution path whenever a path from the start to a goal state exists.
If algorithm A is used with an evaluation function in which h(n) is less than or equal to the cost of the minimal path from n to the goal, the resulting search algorithm is called algorithm A* (pronounced "A star").
It is now possible to state a property of A* algorithms: all A* algorithms are admissible.
- An admissible heuristic never overestimates the cost to reach the goal.

Figure: Comparison of the state space searched using heuristic search versus breadth-first search. The heuristic used is f(n); the portion of the graph searched heuristically is shaded; the optimal search selection is in bold.

Local Admissibility (Consistency) for Heuristics

DEFINITION: MONOTONICITY
A heuristic function h is monotone if
1. For all states n_i and n_j, where n_j is a descendant of n_i, h(n_i) - h(n_j) <= cost(n_i, n_j), where cost(n_i, n_j) is the actual cost (in number of moves) of going from state n_i to n_j.
2. The heuristic evaluation of the goal state is zero: h(Goal) = 0.

Evaluating the Behavior of Heuristics
- Criteria for good heuristics:
  - Completeness: is the algorithm guaranteed to find a solution when there is one?
  - Optimality: does the strategy find the optimal solution?
  - Time complexity: how long does it take to find a solution?
  - Space complexity: how much memory is needed to perform the search?
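As a concrete illustration of algorithm A* with an admissible heuristic, here is a minimal sketch (my own code, not from the text) that searches a small 4-connected grid using f(n) = g(n) + h(n), with Manhattan distance as h; on such a grid, Manhattan distance never overestimates the true cost, so the result is optimal:

```python
import heapq

def astar(grid, start, goal):
    # grid: list of strings, '#' = wall; 4-connected moves, unit step cost.
    rows, cols = len(grid), len(grid[0])

    def h(p):
        # Manhattan distance: admissible on a 4-connected unit-cost grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f, g, state)
    g = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            return cost                  # optimal path length
        if cost > g.get(cur, float("inf")):
            continue                     # stale queue entry
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                ng = cost + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None                          # no path exists
```

For example, on a 3x4 grid with a two-cell wall, the shortest route around the wall is found with its exact length.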
Use of Information in Heuristics

DEFINITION: INFORMEDNESS
For two A* heuristics h1 and h2, if h1(n) <= h2(n) for all states n in the search space, heuristic h2 is said to be more informed than h1.
- In general, the more informed heuristic is better, but we should also consider the computational cost of using the more informed heuristic (e.g., computer chess).

Computational Cost vs. Informedness
Figure: Informal plot of the cost of searching (control-strategy cost) and the cost of computing the heuristic evaluation (rule-application cost) against the informedness of the heuristic; the total cost of problem solving is their sum (from Nilsson, 1980).

Complexity and Efficiency Issues
Figure: Number of nodes generated, T, as a function of branching factor B, for various lengths L of solution paths (from Nilsson, 1980). The relating equation is T = B(B^L - 1)/(B - 1).

Beyond the Types of Heuristic Search Methods We Have Discussed So Far

Minimax Algorithm for Game Playing
- When playing a game with an opponent (at least two players), we need to take into account the actions of the opponent.
  - In order to win, a player needs to maximize his or her advantage and minimize the opponent's advantage whenever possible.
- The players in a game are referred to as MIN and MAX.
  - MAX represents the player trying to win, i.e., to maximize her advantage.
  - MIN represents the player attempting to minimize MAX's score.
- Assumption: your opponent uses the same knowledge of the state space as you use, and applies that knowledge in a consistent effort to win the game.
- Implementation:
  1. Create a game graph by the rules of the game and the strategies.
  2. Label each level of the game graph as MIN or MAX.
  3. Each leaf node is given a heuristic value.
  4. Minimax propagates these values up the graph through successive parent nodes according to the following rules:
     - If the parent state is a MAX node, give it the maximum value among its children.
     - If the parent state is a MIN node, give it the minimum value among its children.
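The propagation rules above can be sketched directly as a recursion (a minimal illustration of my own; the nested-list game tree representation and function name are assumptions, not the lecture's code):

```python
def minimax(node, maximizing):
    # Leaves are numeric heuristic values; internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    # MAX node takes the maximum of its children; MIN node takes the minimum.
    return max(values) if maximizing else min(values)
```

For the classic two-ply example tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] with MAX at the root, the MIN children back up 3, 2, and 2, so the root's value is 3.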
Minimaxing to a Fixed Ply Depth
Figure: A hypothetical MAX state space. Leaf states show heuristic values; internal states show backed-up values.

Heuristics Applied to States of Tic-Tac-Toe
Figure: The heuristic is E(n) = M(n) - O(n), where M(n) is the total of My possible winning lines, O(n) is the total of the Opponent's possible winning lines, and E(n) is the total evaluation for state n. For example (X to move): X has 6 possible win paths and O has 5 possible wins, so E(n) = 6 - 5 = 1; X has 4 possible win paths and O has 6 possible wins, so E(n) = 4 - 6 = -2; X has 5 possible win paths and O has 4 possible wins, so E(n) = 5 - 4 = 1.

Figure: Two-ply minimax applied to the opening move of tic-tac-toe (from Nilsson, 1971); MAX moves first.
Figure: Two-ply minimax applied to one of two possible MAX second moves (from Nilsson, 1971).
Figure: Two-ply minimax applied to X's move near the end of the game (from Nilsson, 1971).

Alpha-Beta Pruning for Minimax
- Problem of minimax: it pursues all branches in the space, including many that could be ignored or pruned by a more intelligent algorithm.
- Idea of alpha-beta pruning:
  - Rather than searching the entire space to the ply depth, it proceeds in a depth-first fashion.
  - Two values, alpha for MAX and beta for MIN, are maintained during the search. Alpha can never decrease, and beta can never increase.
- Algorithm:
  1. Descend to full ply depth in a depth-first fashion, and apply the heuristic evaluation to a state and all its siblings.
  2. Values are backed up to parents using the minimax algorithm.
  3. Use the two rules below for terminating search based on alpha and beta values:
     - Stop the search below any MIN node having a beta value <= the alpha value of any of its MAX ancestors.
     - Stop the search below any MAX node having an alpha value >= the beta value of any of its MIN ancestors.
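The two cutoff rules translate into a compact depth-first sketch (my own illustrative Python, using the same nested-list trees as the minimax sketch; it returns the same value as plain minimax while skipping pruned branches):

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    # Leaves are numeric heuristic values; internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)      # alpha can never decrease
            if alpha >= beta:
                break                      # prune: a MIN ancestor forbids this
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)        # beta can never increase
            if beta <= alpha:
                break                      # prune: a MAX ancestor forbids this
        return value
```

On the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]], the second subtree is cut off after its first leaf (2 <= alpha = 3), yet the root value is still 3, the same as full minimax.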
Alpha-Beta Pruning Applied to a Hypothetical State Space Graph
Figure: A depth-first search path through a MAX/MIN tree. A has beta = 3 (A will be no larger than 3); B is beta-pruned, since 5 > 3; C has alpha = 3 (C will be no smaller than 3); D is alpha-pruned, since 0 < 3; E is alpha-pruned, since 2 < 3; C is 3. Note: alpha-beta pruning does NOT create a complete game tree.

Constraint Satisfaction
- A search procedure that operates in a space of constraint sets.
  - Many scheduling or optimization problems take this form.
- Solution-finding process:
  - The initial state contains the constraints that are originally given.
  - Two steps: constraints are discovered, and then propagated as far as possible throughout the system.
  - Find a solution to a set of constraints that impose conditions the variables must satisfy; a solution is therefore a vector of variables that satisfies all constraints.
- Solving problems over such constraints can be done via variable elimination or the simplex algorithm used for solving linear programming problems.
- The techniques used in constraint satisfaction depend on the kind of constraints being considered.

Example: Linear Programming
- A scenario of a wooden toy manufacturer:
  - A soldier: price $27; cost of raw materials $10; labor and overhead cost $14.
  - A train: price $21; cost of raw materials $9; labor and overhead cost $10.
  - Two skills, carpentry and finishing, are required:
    - One soldier requires 2 hrs of finishing and 1 hr of carpentry labor.
    - One train requires 1 hr of finishing and 1 hr of carpentry labor.
  - Available each week: 100 hrs of finishing and 80 hrs of carpentry labor.
  - Demand for toys from the market: soldiers, at most 40 per week; trains, unlimited.
- How can the company maximize weekly profit (revenues - costs)?
- Formulation:
  - Let x1 = number of soldiers produced per week, x2 = number of trains produced per week.
  - Objective function: maximize z = 3*x1 + 2*x2 (profit per soldier = 27 - 10 - 14 = 3; profit per train = 21 - 9 - 10 = 2).
  - Subject to constraints:
    - 2*x1 + x2 <= 100 (finishing constraint)
    - x1 + x2 <= 80 (carpentry constraint)
    - x1 <= 40 (constraint on demand for soldiers)
    - x1, x2 >= 0 (sign restriction)
- Solution:
  - The feasible region for the problem is the set of all points (x1, x2) satisfying the constraints. We can determine the feasible region graphically; the maximum profit is z = 180, attained at x1 = 20, x2 = 60.

Issues Related to State-Space Representation in Problem Solving
- State-space representation has been discussed mostly for game problems.
- For most games, such as the 8-puzzle, tic-tac-toe, etc., the search space is large enough to require heuristic pruning.
  - Games generally do not involve complex representational issues; mostly they require a simple representation (e.g., a board configuration).
  - Because of the common representation, a single heuristic may be applied throughout the search space.
- But many other, complex real-world problems require much more complex representations and methods.

Beyond Traditional Heuristic Approaches
- Heuristic search is one of the oldest AI strategies, since
  - many problems do not have exact solutions (e.g., medical diagnosis), or
  - a problem may have an exact solution, but the computational cost of finding it may be prohibitive.
  - All heuristics are fallible, since a heuristic is an informed guess of the next step to be taken in solving a problem.
- Challenges to traditional heuristic approaches:
  - A heuristic function is supposed to estimate the cost of a solution. Can a heuristic function be learned from experience?
    - This requires inductive learning and a function created and executed at runtime.
  - Search methods under partially observable environments.
    - State-space search assumes a deterministic and fully observable world.
  - Search methods under unknown environments: contingency problems, or online search problems.
    - Offline search computes a complete solution before acting, but online search interleaves computation and action (e.g., first take an action, then observe the environment and compute the next action, and so on).
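The toy linear program above is small enough to verify by brute force over integer production plans (an illustrative sketch; the function name is my own, and a real solver would use the simplex method instead):

```python
def best_production():
    # Maximize z = 3*x1 + 2*x2 subject to the toy-company constraints.
    best = (0, 0, 0)                       # (z, x1, x2)
    for x1 in range(0, 41):                # soldier demand: x1 <= 40
        for x2 in range(0, 101):
            if 2 * x1 + x2 <= 100 and x1 + x2 <= 80:   # finishing, carpentry
                z = 3 * x1 + 2 * x2
                if z > best[0]:
                    best = (z, x1, x2)
    return best
```

Enumerating the feasible integer points confirms the graphical answer: z = 180 at x1 = 20 soldiers and x2 = 60 trains per week.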
Summary
- Brute-force search: DFS, BFS.
- Heuristics in AI for problem solving, and various heuristic search algorithms: hill climbing, simulated annealing, best-first search, constraint satisfaction, and many others; game-playing strategies using minimax and alpha-beta pruning.
- Performance evaluation of heuristics: admissibility, monotonicity, informedness; complexity and efficiency issues.
- What approach is used for knowledge or problem representation in problem solving?
- What approach is used for retrieving knowledge and finding solutions?
- Do humans use a heuristic approach in problem solving?

References
- George F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Chapter 4, Addison Wesley, 2009.

Artificial Intelligence CPSC 481
Evolutionary Computation for Problem Solving, Learning, and Creativity

Lecture Overview
- Molecular biology basics for computer scientists.
- Evolutionary approaches: problem solving using biological metaphors such as DNA, evolution, and survival of the fittest.
  - Genetic algorithms.
  - Genetic programming: supervised learning and symbolic regression.

A Crash Course in Molecular Biology
- From atom to species:
  - Atom (physical) -> molecule (chemical) -> cell (biological) -> organism (physiological) -> species (ecological).
- The three most important types of molecules in cells:
  - DNA (deoxyribonucleic acid), RNA (ribonucleic acid), and proteins.
- Proteins in cells are involved in virtually all cell functions.
  - Each protein within the body has a specific function: enzymes, carrying signals, transporting small molecules such as oxygen, forming cellular structures (tissues), and regulating cell processes such as defense mechanisms.
- What is a protein made of?
  - A chain of amino acids -> a protein; 20 different amino acids are found in nature.
- How are proteins created?

How Are Proteins Created in Cells?
- Central dogma of molecular biology:
  - DNA (the code of life) -> RNA (mRNA, through transcription, or copying) -> amino acids (through translation, or protein synthesis).
  - DNA in a 1-D medium (a linear sequence of characters) is translated into the 3-D structure of proteins by ribosomes.
  - In fact, this whole process happens in 4-D, with timing (1-D -> 3-D structure plus time = 4-D).
  - This is the basis of modern molecular biology.
- What does DNA contain, and where is it located?
  - Each human cell: 46 chromosomes; about 2 meters of DNA; about 3 billion DNA subunits (the bases A, T, C, G); approximately 30,000 genes code for the proteins that perform most life functions.

DNA Location, Structure, and Genetic Code
- DNA is located in chromosomes in cells (packages for DNA).
  - Normally chromosomes are seen only when the cell divides: they become highly condensed and visible.
  - Chromosomes in any cell exist as identical pairs (homologous pairs): humans, 23 pairs; corn, 10 pairs; etc. Members of a pair are alike, but different from all other pairs.
- DNA structure:
  - DNA is double-stranded, with a collection of base pairs. The base pairs (nucleotides) A-T and G-C are complementary, known as Watson-Crick base pairs.
  - A double-stranded DNA sequence can be represented by strings of letters (1-D) in either direction: 5' TACTGAA 3' or 3' ATGACTT 5'. The length of DNA is given in base pairs (e.g., 100 kbp).
- Genes as genetic code:
  - A gene is a specific sequence of nucleotides (A, T, G, C) along a chromosome carrying the information for constructing a protein. A group of 3 base pairs represents a codon (one amino acid).
  - An allele is one of the alternative forms of a gene; a locus is the location of a gene on a chromosome. Mendel's idea of the gene in the 1860s was elucidated 75 years later using DNA.

DNA as Complex Computer Software
- DNA is very complex source code (the code of life), but its semantics are not known; it is like the functional body of a complex computer system (OS, DBMS, etc.).
- Human DNA is comprised of approximately three billion base pairs.
- DNA is 1-D, but functional proteins are 4-D structures; timings determine functions.
- DNA and computer analogy:
  - Each base of the DNA alphabet (A, T, G, C) ~ a binary bit in an instruction.
  - A codon ~ an instruction; a gene ~ a statement in source code.
  - The direction of DNA translation ~ the sequence of statements (order of execution).
  - The action of a protein ~ the execution of an instruction.
  - A group of genes ~ a function or module of an organism.
  - DNA repair using redundancy ~ automatic debugging (checksums, Hamming codes for fixing transmission errors, software verification).
  - Variations in genes via genetic operators (evolution) ~ modifications of statements for different effects or functions during software maintenance or migration.

Evolution by Natural Selection
- Genome and heredity:
  - A genome is a complete set of chromosomes for a species. Half of the genome (DNA) is passed from parent to child; thus heredity is passed through the genome.
- Genetic variation in a population is the result of two principal forces: mutation and crossover (sexual recombination of the genetic code, DNA).
- Evolution is achieved by:
  - inheritance, by reproduction and crossover, and
  - natural selection (survival of the fittest), achieved through genetic variation over multiple generations.
- What do DNA and evolution have to do with problem solving in AI?

What Have We Learned from Biologists?
- How the information about an individual's functions is represented, and where it is stored.
- Inheritance and natural selection.
- How an individual is evaluated.
- What the operators for genetic variation are.
- The importance of diversity in a population.
- How a population is evolved: adapted, improved, or survived.

Evolutionary Algorithms / Evolutionary Computation
- The idea of evolving computer programs is almost as old as the computer itself (Friedberg, 1958).
  - The idea of automatic programming is natural once you have learned how complex and tedious manual programming is.
  - The general term evolutionary computation (EC), or evolutionary algorithms (EA), has emerged for these techniques.
- Basic idea of evolutionary algorithms:
  - Generate initial solutions based on a priori knowledge, the results of an earlier run, or a random initial population.
  - Loop: evaluate the candidates and select by a fitness function:
    - If the solutions are sufficiently good, then stop and return the solutions.
    - Else, generate variants using genetic operators (mutation and crossover).
    - Stop if the specified number of generations is reached.
- Many variants of EC have been proposed. The two most popular are the genetic algorithm (GA) and genetic programming (GP).

Genetic Algorithms (GA)
- First introduced by Holland and his colleagues at the University of Michigan (1975).
- A population P(t) = {x1^t, x2^t, ..., xn^t}, where xi^t is a candidate solution at time t (generation):

begin
  set time t := 0
  initialize the population P(t)
  while the termination condition is not met do
  begin
    evaluate the fitness of each member of the population P(t)
    select members from population P(t) based on fitness
    produce the offspring of these pairs using genetic operators
    replace, based on fitness, candidates of P(t) with these offspring
    set time t := t + 1
  end
end

- Key questions to answer from this algorithm:
  - How do we represent a problem based on a biological metaphor?
  - How do we create an initial population of candidate solutions?
  - How do we define the fitness of each member?
  - How do we evolve the population, and how do we use the genetic operators?

Answers to the Questions, to Set Up a GA
- Representation:
  - GAs traditionally use a fixed-length binary string; other encodings are possible. The success of a GA often depends on finding a suitable way to encode a solution to the problem as a bit string.
- Initialization (initial population of solutions, or chromosomes):
  - Created randomly; e.g., each solution (chromosome) is a random binary vector, if a binary string is used to represent a solution.
- Selection (evaluation of fitness for each solution, and selection):
  - A fitness function f(x) over the binary vectors must be defined.
- Reproduction (evolving a population):
  - Apply the genetic operators (mutation and crossover) to solutions to produce a population for the next generation.
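The GA loop above can be sketched end-to-end on a deliberately simple task ("one-max": maximize the number of 1-bits in a string). This is a minimal illustration of my own, with assumed parameter values, tournament selection, one-point crossover, bit-flip mutation, and elitism:

```python
import random

def one_max_ga(length=20, pop_size=30, generations=50,
               p_cross=0.9, p_mut=0.02, seed=42):
    rng = random.Random(seed)

    def fitness(ind):
        return sum(ind)                      # number of 1-bits; max = length

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]                  # elitism: keep the best unchanged
        while len(new_pop) < pop_size:
            # tournament selection of two parents (tournament size 3)
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            if rng.random() < p_cross:       # one-point crossover
                cut = rng.randrange(1, length)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            for i in range(length):          # bit-flip mutation
                if rng.random() < p_mut:
                    child[i] = 1 - child[i]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best, fitness(best)
```

Because of elitism, the best fitness never decreases between generations; with these parameters the population typically converges to (or very near) the all-ones string.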
- Terminating conditions and other necessary parameters:
  - Population size, probability of crossover, probability of mutation, number of generations, etc.

Examples: Crossover and Mutation (in the original figure, # marks a "don't care" position)
- Crossover takes two candidates (chosen randomly or strategically), determines the split point (in the middle or at random), divides the strings, and swaps components to produce two new (modified) candidates; for example, crossing 110|101 and 010|011 at the marked point yields 110011 and 010101.
- Mutation takes a single candidate (chosen randomly or strategically) and randomly changes one or more bits, e.g., 110101 -> 110001.

Example: Optimization of a Simple Function
- Problem description:
  - Given f(x) = x * sin(10*pi*x) + 1.0, find x in the range [-1, 2] which maximizes f; that is, find x* such that f(x*) >= f(x) for all x in [-1, 2].
  - Required precision of x: 6 digits after the decimal point.
- Encoding the problem, and representing problems and solutions:
  - The size of the binary vector (chromosome) can be determined as follows: the variable x in the range [-1, 2] must be divisible into at least 3,000,000 equal-size ranges. This means that 22 bits are required for the chromosome, since 2,097,152 = 2^21 < 3,000,000 <= 2^22 = 4,194,304.
  - Mapping a binary string <b21 b20 ... b0> into a real number x in the range [-1, 2] is straightforward and is completed in two steps:
    - Convert the binary string <b21 b20 ... b0> from base 2 to a base-10 integer x'.
    - Compute the corresponding real number x = -1.0 + x' * 3 / (2^22 - 1), where -1.0 is the left boundary of the domain and 3 is the length of the domain.
  - For example, the chromosome 1000101110110101000111 represents the number 0.637197, since x' = 1000101110110101000111 (base 2) = 2,288,967 and x = -1.0 + 2,288,967 * 3 / 4,194,303 = 0.637197. The chromosomes 0000...0 and 1111...1 represent the boundaries of the domain, -1.0 and 2.0, respectively.
  - What about the other parameters?

Example: Optimization of a Simple Function (continued)
- Initial population:
  - Create a population of chromosomes, where each chromosome is a binary vector of 22 bits. All 22 bits of each chromosome are initialized randomly.
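The 22-bit encoding can be checked with a few lines (a sketch of my own; the function names are assumptions). The chromosome from the example decodes to x = 0.637197, as claimed:

```python
import math

def decode(bits, lo=-1.0, hi=2.0):
    # Interpret the bit list as a base-2 integer, then scale into [lo, hi].
    n = int("".join(map(str, bits)), 2)
    return lo + n * (hi - lo) / (2 ** len(bits) - 1)

def fitness(x):
    # The objective from the example: f(x) = x * sin(10*pi*x) + 1.0
    return x * math.sin(10 * math.pi * x) + 1.0
```

The all-zeros and all-ones chromosomes decode exactly to the domain boundaries -1.0 and 2.0.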
- Fitness function:
  - Define the fitness function f(x) and evaluate the binary vectors: get x from the binary vector and plug it into f(x).
- Genetic operators:
  - Perform crossover and mutation probabilistically.
- Parameters:
  - For example: population size = 50, probability of crossover = 0.25, probability of mutation = 0.01, number of generations = 150.
- RUN the system when everything is ready, looking for the x that returns the maximum value of f(x).

Example: Encoding the Traveling Salesman Problem
- Assume there are nine cities to visit.
- Representation 1:
  - Path representation: use 4 bits of binary to represent each city (1, 2, ..., 9) in order:
    - p1 = (0001 0010 0011 ... 1000 1001)
    - p2 = (0101 0100 0011 0100 0011 ...)
  - Can we apply the crossover and mutation operators?
  - Fitness function: a straightforward calculation of tour cost.
- Representation 2:
  - Path representation: use decimal numbers instead of binary:
    - p1 = (1 9 2 4 6 5 7 8 3), p2 = (4 5 9 1 8 7 6 2 3)
  - Can we apply the crossover and mutation operators?
    - Mutation is possible if neighboring cities are exchanged.
    - A modified crossover operator, called order crossover, is needed:
      - Make two cut points in each of the two parents (randomly): p1 = (1 9 2 | 4 6 5 7 | 8 3), p2 = (4 5 9 | 1 8 7 6 | 2 3).
      - Two children, c1 and c2, are produced by copying the middle segments: c1 = (x x x | 4 6 5 7 | x x), c2 = (x x x | 1 8 7 6 | x x).
      - Starting from the second cut, copy the cities from the other parent, skipping cities already present. For c1, reading p2 from the second cut gives 2 3 4 5 9 1 8 7 6; removing 4, 6, 5, 7 leaves 2 3 9 1 8, so c1 = (2 3 9 | 4 6 5 7 | 1 8). Likewise c2 = (3 9 2 | 1 8 7 6 | 4 5).

Genetic Programming (GP)
- Cramer and Koza suggested that a tree structure should be used as the program representation in a genome. Koza recognized the importance of the method and demonstrated its feasibility for automatic programming in general in 1992.
- Generational GP algorithm:
  1. Initialize the population.
  2. Evaluate the individual programs in the existing population; assign a numerical rating, or fitness, to each individual.
  3. Until the new population is fully populated, repeat the following steps:
     - Select an individual or individuals in the population using the selection algorithm.
     - Perform genetic operations on the selected individual or individuals.
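As a concrete instance of such genetic operations, the order crossover described above for the TSP can be sketched as follows (a minimal illustration of my own of this particular variant, in which the fill cities are read from the donor starting at the second cut but empty slots are filled from the left; names are assumptions):

```python
def order_crossover(p1, p2, cut1, cut2):
    # p1, p2: permutations (tours); cut1:cut2 is the preserved middle segment.
    n = len(p1)

    def make_child(keeper, donor):
        child = [None] * n
        child[cut1:cut2] = keeper[cut1:cut2]     # copy the middle segment
        kept = set(keeper[cut1:cut2])
        # read the donor's cities starting at the second cut, wrapping around
        seq = [donor[(cut2 + i) % n] for i in range(n)]
        fill = [c for c in seq if c not in kept] # skip cities already present
        slots = [i for i in range(n) if child[i] is None]
        for i, c in zip(slots, fill):            # fill empty slots left-to-right
            child[i] = c
        return child

    return make_child(p1, p2), make_child(p2, p1)
```

With the cuts after positions 3 and 7, this reproduces the children from the example: (2 3 9 4 6 5 7 1 8) and (3 9 2 1 8 7 6 4 5), and each child is again a valid permutation of the nine cities.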
     - Insert the result of the genetic operations into the new population.
  4. If the termination criterion is fulfilled, then continue (to step 5); otherwise, replace the existing population with the new population and repeat steps 2-4.
  5. Present the best individual in the population as the output of the algorithm.

Random Generation of a Program in Tree Structure to Initialize the Population
Figure: A randomly generated program tree; the circled nodes are drawn from a set of functions (the function set).

Example: Crossover Operation
Figure: (a), (b): two programs selected on fitness for crossover; points in (a) and (b) are randomly selected for crossover. (a'), (b'): the child programs produced by crossover at those points.

Components of GP
- Functions and terminals are the primitives of GP programs.
  - Loosely speaking, terminals provide a value to the system, while functions process a value already in the system.
  - The terminal set is comprised of the inputs to the GP program; the function set is composed of the statements, operators, and functions available to the GP system.
- Functions and terminals must be assembled into a structure before they may execute as programs. Programs are structures of functions and terminals, together with rules or conventions for when and how each function or terminal is to be executed.
- Sufficiency and parsimony in choosing functions and terminals:
  - The function set may be application-specific, selected to fit the problem domain (parsimony is related to the size of the search space).
  - The functions and terminals used for a GP run should be powerful enough (sufficiency) to be able to represent a solution to the problem.
- Closure of the function and terminal set:
  - Each function in the function set should be able to handle gracefully all values it might receive as input. This is called the closure property.
- Other needed parameters: population size, probabilities of the genetic operators, termination conditions, etc.

Tournament GP Algorithm
- Unlike generational GP, there are no fixed generation intervals; instead, there is a continuous flow of individuals meeting, mating, and producing offspring.
- The offspring replace existing individuals in the same population. The method is simple to implement and has some efficiency benefits, together with benefits from parallelization.
- Algorithm:
  1. Initialize the population.
  2. Randomly choose a subset of the population to take part in the tournament (the competitors).
  3. Evaluate the fitness value of each competitor in the tournament.
  4. Select the winner or winners from the competitors in the tournament using the selection algorithm.
  5. Apply genetic operators to the winner or winners of the tournament.
  6. Replace the losers in the tournament with the results of applying the genetic operators to the winners.
  7. Repeat steps 2-7 until the termination criterion is met.
  8. Choose the best individual in the population as the output of the algorithm.
- The approach is also called "steady state", because the genetic operators are applied asynchronously and there is no centralized mechanism for explicit generations.

GP Example: Regression for Function Fitting
- Regression using supervised learning involves learning the function that maps a data item to a real-valued prediction variable.
- Possible parameters for genetic learning:
  - Objective: evolve functions fitting the values of the cases in the training data set.
  - Terminal set: the variable x (input) and integer constants between -5 and 5.
  - Function set and population size: as in the original table (not legible here).
  - Crossover probability: 90%; mutation probability: 5%.
  - Selection: tournament selection, size 4.
  - Tolerated error: 0.001.
  - Termination criterion: none; maximum number of generations: 100.
  - Maximum depth of tree after crossover: 200; maximum mutant depth: 4.
  - Initialization method: grow.
- Training data set (record ID: input -> output):
  1: 0.000 -> 0.000;  2: 0.100 -> 0.005;  3: 0.200 -> 0.020;  4: 0.300 -> 0.045;  5: 0.400 -> 0.080;
  6: 0.500 -> 0.125;  7: 0.600 -> 0.180;  8: 0.700 -> 0.245;  9: 0.800 -> 0.320;  10: 0.900 -> 0.405.
- A function learned: f(x) = x^2 / 2.
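The fitness of a candidate program in this setup is typically its total error over the training cases; a minimal sketch (my own code, not the GP system itself) confirms that f(x) = x^2 / 2 fits the table essentially exactly, while a wrong candidate does not:

```python
def total_abs_error(program, cases):
    # Fitness for symbolic regression: sum of absolute errors over the cases.
    # Lower is better; a perfect fit scores (nearly) zero.
    return sum(abs(program(x) - y) for x, y in cases)

# The training data table from the example.
CASES = [(0.0, 0.0), (0.1, 0.005), (0.2, 0.020), (0.3, 0.045),
         (0.4, 0.080), (0.5, 0.125), (0.6, 0.180), (0.7, 0.245),
         (0.8, 0.320), (0.9, 0.405)]
```

Here a "program" is simply any callable of one variable, so evolved expression trees and hand-written candidates can be scored the same way.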
GA/GP Algorithms Visualized as Parallel Hill Climbing
Figure: (a) the beginning search space; (b) the search space after n generations. The dots on the curve (solution quality vs. search space) represent candidate solutions; after n generations, they tend to cluster around areas of higher solution quality.
- GA/GP works well for many optimization problems and supports implicit parallelism.
- It may not be desirable for problems that can be solved algorithmically.

Summary
- Concepts of evolutionary computation; the biology-computer analogy.
- Evolutionary approaches for problem solving: genetic algorithms and genetic programming.
- What knowledge and problem representations, and what problem-solving strategies, are used in EC?
- Can evolutionary approaches be used for the implementation of discovery or creativity?
- Some evolutionary approaches used in problem solving: redundant programs, self-reconfigurable systems, etc.

References
- George F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Chapter 12, Addison Wesley, 2009.
- D. Sadava, D. M. Hillis, H. C. Heller, and M. Berenbaum, Life: The Science of Biology, Vol. II, W. H. Freeman, 2009.
- J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, Complex Adaptive Systems, The MIT Press, 1992.
- D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
- Genetic Programming: Proceedings of the First Annual Conference, Complex Adaptive Systems, The MIT Press, 1996.

Artificial Intelligence CPSC 481
Machine Learning, Part A

What is Learning?
- Learning is acquiring new knowledge, skills, values, or understanding. It occurs as part of training or education, personal development, or experience.
- A learning curve shows the progress of learning (learning performance) over time.
- Think about how you learn.
- Some common learning methods: memorization; generalization or conceptualization; specialization; pattern recognition; induction and deduction; classification or categorization by patterns or concepts; synthesizing different types of information; reasoning; analogy.

What is Machine Learning?
- Machine learning is learning by computer:
  - "Any change in a system that allows it to perform better the second time on repetition of the same task, or on another task drawn from the same population." (Simon, 1983)
- Why machine learning?
  - It is one of the important AI problems.
  - There are many real-world applications, such as data mining, pattern recognition, forecasting, etc.
  - The knowledge-engineering bottleneck: it is costly to build expert systems and acquire knowledge.

Machine Learning Approaches
- Categories of machine learning:
  - Symbolic vs. non-symbolic learning; supervised vs. unsupervised learning; hybrid learning.
- Machine learning methods:
  - Rote learning (computers can do this much better than humans).
  - Generalization and concept learning.
  - Learning by examples: inductive learning, or classification.
  - Learning by discovery: similarity-based learning, or clustering.
  - Explanation-based learning: deductive learning.
  - Learning by analogy or prior knowledge.
  - Learning through adaptation (equilibration) with the world: connectionist (neural network) and genetic learning.
- Learning can also be seen as a search problem.

Symbolic Machine Learning Approaches
- Concept learning and generalization.
- Supervised learning of concepts:
  - Version-space learning using the candidate elimination algorithm.
  - Decision trees: ID3 and C4.5.

Generalization
- Definition of generalization:
  - Expression P is more general than Q iff P is a superset of Q.
  - Volleyball, baseball, and football generalize to "ball" or "sports".
- Use of generalization operators in a logical representation:
  - Replacing constants with variables: ball(round, red) generalizes to ball(round, X).
  - Dropping conditions from a conjunctive expression: shape(X, round) ^ size(X, small) ^ color(X, red) generalizes to shape(X, round) ^ color(X, red).
  - Adding a disjunct to an expression: shape(X, round) ^ size(X, small) ^ color(X, red) generalizes to shape(X, round) ^ size(X, small) ^ (color(X, red) v color(X, blue)).
  - Replacing a property with its parent in a class hierarchy: color(X, red) generalizes to color(X, primary_color), if primary_color is a superclass of red, following a class hierarchy.
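Two of these ideas, replacing constants with variables and the induced "more general than" (covers) relation, can be sketched for simple attribute vectors, using '?' as a variable (an illustration of my own; the same tuple representation reappears in the version-space examples later in the notes):

```python
def covers(h, example):
    # h is a hypothesis tuple; '?' acts as a variable matching any value.
    return all(a == '?' or a == b for a, b in zip(h, example))

def generalize(h, example):
    # Minimal generalization: turn each mismatching constant into '?'.
    return tuple(a if a == b else '?' for a, b in zip(h, example))
```

For instance, generalizing (small, red, ball) to cover (small, white, ball) replaces only the color with a variable, giving (small, ?, ball).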
Concept Learning through Generalization
- Definition of concept and concept space:
  - A concept is a cognitive unit of meaning: an abstract idea or a mental symbol, e.g., a "unit of knowledge" defined through generalization.
  - If concept p is more general than concept q, we say that p covers q. In predicate calculus, p covers q iff q(x) is a logical consequence of p(x).
  - The concept space is the set of concepts (candidate concepts) that can be created by learning operators, e.g., generalization, specialization, or others.
- From two positive instances of ball:
  1. size(ball1, small) ^ color(ball1, red) ^ shape(ball1, round)
  2. size(ball2, large) ^ color(ball2, blue) ^ shape(ball2, round)
  generalize to a ball concept: size(ball, Y) ^ color(ball, Z) ^ shape(ball, round).
  - A representation of the concept "ball" can thus be size(ball, Y) ^ color(ball, Z) ^ shape(ball, round); any sentence that unifies with this general definition represents a ball.

Generalization and Specialization in a Concept Space
Figure: A concept space ordered from general to specific. At the top, obj(X, Y, Z) may be general enough, but it is too general. Below it: obj(X, Y, ball), obj(X, red, Y), obj(small, X, Y); then obj(X, red, ball), obj(small, X, ball), obj(small, red, X). At the bottom, fully ground instances such as obj(small, red, ball), obj(large, red, ball), and obj(small, white, ball) are too specific to serve as concepts.

Arch Concept Represented in Semantic Networks in Winston's Learning Program
Figure: An example of an arch and its network description.

Generalization to Cover All Positive Examples in Winston's Learning Program
- Given background knowledge that bricks and pyramids are both types of polygons, we need a generalization hierarchy of these concepts as background knowledge.

Positive and Negative Examples in Learning the Concept of Arch in the Blocks World
Figure: Positive examples (arches) and negative examples ("near misses"). The negative examples are needed to exclude all non-arches.

Specialization from a Concept to Exclude a Near Miss
Figure: (a) a candidate description of an arch; (b) a near miss; (c) the arch description specialized to exclude the near miss. For specialization of a description to exclude a near miss, the program adds constraints to network (a) so that it cannot match with (b).
- A near miss (negative example) helps the learning algorithm determine exactly how to specialize the candidate concept.
- Winston's program performs a search without backtracking on the concept space, guided by the training data; its performance is therefore highly sensitive to the order and quality of the training examples.

The Role of Negative Examples in Preventing Overgeneralization
Figure: The concept induced from positive examples only, versus the concept induced from positive and negative examples.

Lessons Learned from Winston's Learning Program
- Generalization and specialization operations can be used to define a concept space.
- Positive examples are needed to generalize, and negative examples are needed to prevent overgeneralization on the training data.
  - A concept should be general enough to cover all positive examples, but specific enough to exclude all negative examples.
  - Avoid overgeneralization by generalizing as little as possible to cover the positive examples, or use negative examples to eliminate overly general concepts (specialization).
- Problem of Winston's learning program:
  - Sensitivity of the learning algorithm to the quality of the training data.
  - What does "the quality of training data" mean?

A Framework for Symbolic Learning
Figure: The data and goals of the learning task feed a heuristic search over a concept space, expressed in a representation language with its operations, producing acquired knowledge.
- Supervised inductive learning process:
  1. Training with both positive and negative examples (training data) to build a learning model or patterns.
  2. Testing, for verification of the concept learned.

Version Space Learning
- Approach used:
  - The version space is the set of all concept descriptions consistent with the training examples.
  - Inductive learning by generalization, using the candidate elimination algorithm (Mitchell, 1978).
- Candidate elimination algorithm:
  - Generalization and specialization operations are used to define a concept space, starting with maximally general and maximally specific concepts.
a concept space starting with maximally general and specific concepts 3 A concept c is maximally general if it covers none of the negative training examples and for any other concept c that covers no negative training example c 2 c A concept c is maximally specific if it covers all positive examples none of the negative examples and for any other concept c that covers the positive examples c g c A Version Space Concept Space obj Y Z objX Y ball objX red Y objsmall X Y objLX red ball objsmal X ball 0bjsmall red X a u ob small red ball objlflarge red ball ob small white ball u u Version Space Learning 2339 o Candidate elimination algorithm 3 Need positive examples to generalize and negative examples to specialize to prevent overgeneralization o A concept should be general enough to cover all positive examples but should be specific enough to exclude all negative examples I Maintains two sets of candidate concepts hypotheses G the set of maximally general candidate concepts and S the set of maximally specific candidates G covers all of the positive and none of the negative instances 0 Algorithm combines these approaches into a bidirectional search o It specializes G and generalizes 8 until they converge on the target concept o The G and 8 sets summarize the information in the negative an positive training instance eliminating the need to save these instances Specific to General Search for 3 Hypothesis set S Candidate Concept Er Begin Uses positive examples to Initialize S to the first positive training instance eliminate overly specialized concepts N is the set of all negative instances seen so far and Negative examples to prevent overgeneralization by specialization For each positive instance p 0f candidate concepts Begin For every 5 e S ifs does not match p replace 5 with its most specific generalization that matchs p Delete from S all hypotheses more general than some other hypothesis in S Delete from S all hypotheses that match a previously observed negative 
instance in N End For every negative instance n Begin Delete all members of S that match n Add n to N to check future hypotheses for overgeneralization End End Example of Specific to General Search 3 to Learn the Concept of Ball 5 l l l S objsmall red ball 5 objsmall X ball Positive objtsmall red ball Positive obnsmall white ball Positive objtlarge blue ball Concept learned from the learning 3 011ng x balm process training using only S set O O O O O O O O O 0 0 Training data Size Color Class Small Red Ball Small White Ball Large Blue Ball General to Specific Search for u Hypothesis set G Candidate Concept Begin Initialize G to contain the most general concept in the space P contains all positive examples seen so far For each negative instance n Begin For each 9 e G that matches n replace 9 with its most general specializations that do not match n Delete from G all hypotheses more specific than some other hypothesis in G Delete from G all hypotheses that fail to match some positive example in P End For each positive instance p Begin Delete from G all hypotheses that fail to match p Add p to P End End General to Specific Search of the Version Space Learning the Concept Ball 3 obHXX ZH Negative objismail red brick l G objiiarge Y Z ubjiX white 2 b39Xb Z b39XY bII b39XY 10 L y V O Mquot 9 o H 39 a J o H C e Posnwe obargewhne ball l G objarge Y Z Ob x Whne 23 ObKX39 Y ham Negative objilarge blue cube l G obj arge while Z obji X white 2 objXi Y ball a objX Y ball Positive objsmall blue ball Concept learned from the learning process training using only G set O 000 0000 00 0 0 Training data Size Color Class Small Red Brick Large White Ball Large Blue Cube Small Blue Ball 22 000 ooooo Bidirectional Search u mquot Converge II 0 Begin Initialize G to be the most general concept in the space Initialize S to the first positive training instance The algorithm specializes G and generalizes S until they For each new positive instance p converge on the target concept Begin Delete 
all members of G that fail to match p For every 5 e S if s does not match p replace 5 with its most speci c generalizations that match p Delete from 8 any hypothesis more general than some other hypothesis in 8 Delete from S any hypothesis more general than some hypothesis in G End For each new negative instance n Begin Delete all members of S that match n For each 9 e G that matches n replace 9 with its most general specializations that do not match n Delete from G any hypothesis more specific than some other hypothesis in G Delete from G any hypothesis more specific than some hypothesis in S End Learning the Concept Ball with G and S 3333 G obJX Y 2 3 l l l e ob xt Y 2 s obJsmall red ballJ G objtx redZ S objismalL red balm l l a obJX red 2 s objX red bell G obX red ball 3 obJX red balll Posltlve objlsmall red ball Negative objsmal blue ball Posltlve obj arge redr ball Negatlve oqu39large red cube process training using both G and 8 sets Training data Concept learned from the bidirectional learning Size Color Class Small Red Ball Small Blue Ball Large Red Ball Large Red Cube 24 Converging Boundaries of the G and S Sets I f 39 I 39 a f I n v I k 9 3 Boundary 01 S M 393 39 L 9 1 f K J A Version Space Learning for Solving Symbolic Integration LEX l G Jf1xf2xdx gt apply 0P2 I fpo ypz f2x dx JIransqx f2x dx apply 0P2 apply 0P2 J kx cosx dx 3x trigm dx 39 apply 0P2 39 apphy 0P2 l SI3x cosxdx apply 0P2 I 0 33 Evaluation of Candidate Elimination Algorithm C 0 Pros in Because the algorithm is incremental concept building it doesn t require all training examples present unlike other learning approaches 0 Cons o Searchbased learning must deal with the combinatory problem space like all search problems 3 Algorithm is not noise resistant 0 Single misclassi ed training instance can prevent the algorithm from converging on a consistent concept Possible solution is to maintain multiple G and S efficiency problem in this case 0 Main contribution by version space and related issues 
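The bidirectional candidate elimination algorithm above can be sketched in Python for the three-attribute obj(Size, Color, Shape) examples. This is a minimal illustration, not Mitchell's full algorithm: S is kept as a single hypothesis built from the positive instances, and the duplicate- and subsumption-pruning bookkeeping is reduced to the one check needed here. All function and variable names are my own.

```python
# Candidate elimination over conjunctive concepts with fixed attributes.
# A hypothesis is a tuple such as ("X", "red", "ball"); "X" matches any value.

def matches(h, x):
    """True if hypothesis h covers instance x."""
    return all(hv == "X" or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == "X" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Most specific generalization of s that also matches x."""
    return tuple(sv if sv == xv else "X" for sv, xv in zip(s, x))

def specialize(g, x, values):
    """Most general specializations of g that exclude negative instance x."""
    out = []
    for i, gv in enumerate(g):
        if gv == "X":                        # only wildcards can be specialized
            for v in values[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, values):
    G = [("X",) * len(values)]               # most general concept
    S = None                                 # set by the first positive instance
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = generalize(S, x) if S is not None else tuple(x)
        else:
            # Specialize members of G that cover the negative instance; keep
            # only specializations that are still more general than S.
            new_G = []
            for g in G:
                if matches(g, x):
                    new_G += [h for h in specialize(g, x, values)
                              if S is None or more_general(h, S)]
                else:
                    new_G.append(g)
            G = new_G
    return G, S

# The "learning the concept ball with G and S" training data from the notes.
values = [("small", "large"), ("red", "white", "blue"), ("ball", "brick", "cube")]
examples = [(("small", "red", "ball"), True),
            (("small", "blue", "ball"), False),
            (("large", "red", "ball"), True),
            (("large", "red", "cube"), False)]
G, S = candidate_elimination(examples, values)
print(G, S)   # both converge on ("X", "red", "ball"), i.e., obj(X, red, ball)
```

Running the sketch on the four training instances reproduces the trace in the slide: G and S meet at obj(X, red, ball).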
  - Knowledge representation, generalization, and search in inductive learning; it also raised general questions related to learning: complexity, expressiveness, and the use of knowledge and data to guide generalization.
  - The role of prior knowledge in learning.

Decision Tree Induction for Classification
- Decision tree induction:
  - Uses a decision tree as the knowledge (concept) representation.
  - ID3 (Quinlan, 1986) is one of the earliest learning algorithms to use a decision tree.
  - The decision tree is a popular graph model for decision analysis in business, operations research, etc.
  - It determines the classification of an object by testing its values for certain properties.
- Goal of learning:
  - Classification of objects.
  - A concept is represented in the form of a classification tree.
  - The description of a classification represents the patterns, or set of properties, that characterize each class.

Training Data Set for Learning (decision class column: RISK)

  NO.  RISK      CREDIT HISTORY  DEBT  COLLATERAL  INCOME
  1    high      bad             high  none        $0 to $15k
  2    high      unknown         high  none        $15k to $35k
  3    moderate  unknown         low   none        $15k to $35k
  4    high      unknown         low   none        $0 to $15k
  5    low       unknown         low   none        over $35k
  6    low       unknown         low   adequate    over $35k
  7    high      bad             low   none        $0 to $15k
  8    moderate  bad             low   adequate    over $35k
  9    low       good            low   none        over $35k
  10   low       good            high  adequate    over $35k
  11   high      good            high  none        $0 to $15k
  12   moderate  good            high  none        $15k to $35k
  13   low       good            high  none        over $35k
  14   high      bad             high  none        $15k to $35k

- Data collected from the credit history of loan applications.

A Decision Tree for Credit Risk Assessment
[Figure: a tree that tests credit history first (unknown, bad, good), then debt, collateral, and income at lower levels, with leaves labeled high risk, moderate risk, and low risk]

A Simplified Decision Tree for Credit Risk Assessment
[Figure: a smaller tree that tests income first ($0-15k leads directly to high risk; $15-35k and over $35k lead to tests on credit history), classifying the same data with fewer tests]
- If multiple trees can be created from the same data set, which tree would you use, and why?

Theory that Supports the Simpler Tree
- A pattern needs to be simpler than a listing of the data set it describes.
- Minimal Message Length (MML) theory:
  - Given a data set D and a set of hypotheses H1, H2, ..., Ht that describe D (these H can be classifiers, clusters, theories, etc.), the MML principle states that we should choose the Hi for which the quantity Mlength(Hi) + Mlength(D|Hi) is minimized, where Mlength(Hi) is the minimum number of bits needed to specify Hi, and Mlength(D|Hi) is the minimum number of bits needed to describe the data given that Hi is true.
- Example: represent the set {2, 4, 6, ..., 100}.
  - H1: list the integers; cost of H1 = cost of listing all 50 integers.
  - H2: "all even numbers >= 2 AND <= 100"; cost of H2 = cost of listing 2 integers + cost of the comparisons + cost of the "even" concept.
  - According to MML, if cost(H1) > cost(H2), then H2 is worth something, since as the size of the set increases the cost of H1 grows much more rapidly.

Top-down Decision Tree Induction Process
1. Select a property (attribute) to test, by the values of the property, at the current node of the tree.
2. Use this test to partition the set of examples into groups by those values.
3. Recursively construct a subtree for each partition, following steps 1 and 2.
- Continue until all members of a partition are in the same class; that class becomes a leaf node of the tree.

ID3 Classification Algorithm

  function induce_tree(example_set, Properties)
  begin
    if all entries in example_set are in the same class
      then return a leaf node labeled with that class
    else if Properties is empty
      then return a leaf node labeled with the disjunction of all classes in example_set
    else begin
      select a property P and make it the root of the current tree;
      delete P from Properties;
      for each value V of P
      begin
        create a branch of the tree labeled with V;
        let partition_V be the elements of example_set with value V for property P;
        call induce_tree(partition_V, Properties) and attach the result to branch V
      end
    end
  end

Information Theory
- Entropy is a measure of the uncertainty in a random variable. Shannon entropy quantifies the expected value of the information contained in a message, usually in units such as bits.
- For a random variable X with n outcomes x1, ..., xn, the Shannon entropy H(X), a measure of uncertainty, is defined as
  H(X) = -sum over i of p(xi) log2 p(xi)
  where p(xi) is the probability mass function of outcome xi.
- H(X) is the average unpredictability in a random variable, i.e., its expected information content, assuming that communication may be represented as a sequence of independent and identically distributed random variables.
- The more unpredictable the outcome (increased uncertainty), the larger the entropy, and the more bits are required to carry the information. The more predictable the outcome (reduced uncertainty, more information about the outcome), the smaller the entropy, and the fewer bits are needed.
- A fair coin toss has one bit of entropy, since there are two possible outcomes that occur with equal probability:
  H(coinToss) = -p(h) log2 p(h) - p(t) log2 p(t) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit,
  i.e., learning the actual outcome conveys one bit of information.
- If we have more information, say the coin was rigged to come up heads 75% of the time, then
  H(coinToss) = -(3/4) log2(3/4) - (1/4) log2(1/4) = 0.811 bits.
- What is the entropy of a coin toss with a coin that has two heads and no tails? (0 bits: the outcome is certain.)

Information Theoretic Test Selection
- For a data set T, the entropy H(T) = -sum over i of p(Ci) log2 p(Ci) is a measure of the amount of uncertainty in T about the occurrence of each class Ci.
  - H(T) is the average amount of information needed to identify the class Ci of an element of T.
  - p(Ci) is the proportion of the number of elements in class Ci to the number of elements in set T.
  - When H(T) = 0, the set T is perfectly classified: all instances of T are of the same class.
- Compute the entropy after T is divided into m possible subsets v1, v2, ..., vm using an attribute X. The expected information needed to complete the tree after selecting X is
  E(X) = sum over i of (|vi| / |T|) H(vi)
  - The amount of information needed to complete the tree is the weighted average of the information in all its subtrees; |vi| and |T| are the numbers of instances in subset vi and in T, respectively.
  - Note: each H(vi) should be calculated over the classes Ci.
- The information gain from attribute X: gain(X) = H(T) - E(X).
- ID3 chooses the attribute that provides the greatest information gain, i.e., the smallest E(X), by the simplicity rule: the more information (knowledge) we have, the fewer bits we need to describe the data.

Example: Calculating Information Gain (Loan Data)
- H(LoanData) = -(6/14) log2(6/14) - (3/14) log2(3/14) - (5/14) log2(5/14) = 1.531 bits.
  - This holds for any tree that covers the examples in the data set.
- Partitions by income: v1 = {1, 4, 7, 11}, v2 = {2, 3, 12, 14}, v3 = {5, 6, 8, 9, 10, 13}.
- From the partitions: |v1|/|T| = 4/14, |v2|/|T| = 4/14, |v3|/|T| = 6/14.
- H(v1) = -sum p(Ci) log2 p(Ci) = 0.0, since all of v1 is in the high-risk class; H(v2) = 1.0; H(v3) = 0.65.
- The expected information needed to complete the tree for income:
  E(income) = (4/14) H(v1) + (4/14) H(v2) + (6/14) H(v3) = (4/14)(0.0) + (4/14)(1.0) + (6/14)(0.65) = 0.564 bits.
- Information gain by choosing income: gain(income) = H(LoanData) - E(income) = 1.531 - 0.564 = 0.967 bits.
- Similarly, we calculate the information gain for all other attributes:
  gain(credit history) = 0.266, gain(debt) = 0.063, gain(collateral) = 0.206.
- So we choose the attribute income for the partition, which produces the subtrees. For each subtree, repeat the same procedure.
- Can we choose the same attribute as the parent node? (No: ID3 deletes a property from Properties once it has been used.)

Two Partially Constructed Decision Trees
[Figure, stage A: all 14 examples at the root, partitioned by income; $0-15k: examples {1, 4, 7, 11}; $15-35k: examples {2, 3, 12, 14}; over $35k: examples {5, 6, 8, 9, 10, 13} (these are record IDs)]
[Figure, stage B: the $0-15k branch becomes a high-risk leaf; the $15-35k branch is further partitioned by credit history into unknown: {2, 3}, bad: {14}, good: {12}; the over-$35k branch still holds {5, 6, 8, 9, 10, 13}]

The Final Decision Tree Constructed
[Figure: income is tested first; $0-15k leads to high risk; $15-35k leads to a credit-history test with branches unknown, bad, and good ending in high- and moderate-risk leaves; over $35k leads to a credit-history test ending in low- and moderate-risk leaves]

Decision Tree Induction as Search through a Concept Space
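The entropy and information-gain figures above can be verified with a short script. This is a sketch, not part of the original notes: the loan data is encoded as the 14-record table given earlier, and the function and key names are my own.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy H(T), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, label="risk"):
    """Information gain of attribute attr: gain(X) = H(T) - E(X)."""
    expected = 0.0
    for v in set(r[attr] for r in rows):
        part = [r[label] for r in rows if r[attr] == v]
        expected += len(part) / len(rows) * entropy(part)   # (|vi|/|T|) H(vi)
    return entropy([r[label] for r in rows]) - expected

# The 14-record loan data set from the training table.
keys = ("risk", "history", "debt", "collateral", "income")
data = [
    ("high",     "bad",     "high", "none",     "0-15k"),
    ("high",     "unknown", "high", "none",     "15-35k"),
    ("moderate", "unknown", "low",  "none",     "15-35k"),
    ("high",     "unknown", "low",  "none",     "0-15k"),
    ("low",      "unknown", "low",  "none",     "over35k"),
    ("low",      "unknown", "low",  "adequate", "over35k"),
    ("high",     "bad",     "low",  "none",     "0-15k"),
    ("moderate", "bad",     "low",  "adequate", "over35k"),
    ("low",      "good",    "low",  "none",     "over35k"),
    ("low",      "good",    "high", "adequate", "over35k"),
    ("high",     "good",    "high", "none",     "0-15k"),
    ("moderate", "good",    "high", "none",     "15-35k"),
    ("low",      "good",    "high", "none",     "over35k"),
    ("high",     "bad",     "high", "none",     "15-35k"),
]
rows = [dict(zip(keys, r)) for r in data]

print(round(entropy([r["risk"] for r in rows]), 3))   # 1.531 bits
for a in ("income", "history", "debt", "collateral"):
    # matches the slide values 0.967, 0.266, 0.063, 0.206 up to rounding
    print(a, gain(rows, a))
```

Income has the largest gain, so ID3 places it at the root, as in the partially constructed trees above.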
- Representation: tree.
  - Concepts are represented in a tree; a class concept can be described by attribute-value pairs.
  - Concept of high risk: income = $0-15k, OR (income = $15-35k AND credit history = unknown AND debt = high), OR (income = $15-35k AND credit history = bad).
  - High-risk rule: IF income = $0-15k OR (income = $15-35k AND credit history = unknown AND debt = high) OR (income = $15-35k AND credit history = bad) THEN the application is high risk.
- Concept space: the version space is the set of all possible decision trees that can be created from the training examples.
- Operations: moving through this space consists of adding tests to a tree.
- ID3 as a search method: ID3 implements a form of greedy search in the space of all possible trees, without backtracking. Very efficient.

Evaluation of Decision Trees
- Advantages:
  - Easy to convert a decision tree into a set of rules: the LHS is the sequence of decisions leading to the leaf node, and the RHS is the class outcome.
- Common issues:
  - Bad data, e.g., inconsistent or missing values.
  - Continuous attributes: nominal attributes are ideal; for continuous values, break the values into subsets (groups) of values.
  - A large data set may produce a large tree.
  - C4.5 and later versions addressed many of these issues.
- Bagging: produce replicate training sets, called bootstrap samples, by sampling with replacement from the training instances.
- Boosting: maintain a weight for each classifier; the weight reflects that classifier's importance.
- Multiple classifiers produced from bootstrap samples are combined, by averaging or voting, into a composite classifier. In bagging, each component classifier has the same vote, while boosting assigns different voting strengths to component classifiers on the basis of their accuracy.
- These techniques can be used with other learning or regression approaches.
- To handle a large data set, divide the data into subsets, build the decision tree on one subset, and then test its accuracy on the other subsets.
- Many variations exist, e.g., CART.

Inductive Bias
- Induction depends on prior knowledge and assumptions about the nature of the concepts being learned.
- Inductive bias refers to any criteria a learner uses to constrain the concept space or to select concepts within that space.
- Examples of inductive biases:
  - Use of domain-specific knowledge and assumptions, such as heuristics.
  - Syntactic constraints on the representation of learned concepts to limit the size of the space: bit representations, decision trees, feature vectors, rules, Horn clauses, etc.
  - Conjunctive forms, or a limited number of disjunctive forms along with conjunctive forms, to increase the expressiveness of the representation.
- Reasons for inductive bias:
  - The learning space is so large that search-based learning becomes impractical.
  - The training data are only a subset of all instances in the domain; since data alone are not enough, the learner must make additional assumptions about likely concepts, possibly in the form of heuristics.

References
- George F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Chapters 10 and 12, Addison Wesley, 2009.
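As a closing sketch, the induce_tree pseudocode from the ID3 slide can be written in Python. This is a minimal illustration under my own naming, not a full C4.5-style implementation: attributes are nominal, the leaf for an empty attribute list is a disjunction string, and ties in gain fall to the first attribute found.

```python
from math import log2
from collections import Counter

def entropy(rows, target):
    """H(T) over the class labels of rows, in bits."""
    n = len(rows)
    return -sum((c / n) * log2(c / n)
                for c in Counter(r[target] for r in rows).values())

def best_attribute(rows, attrs, target):
    """Attribute with the greatest information gain, i.e., the smallest E(X)."""
    def expected_info(a):
        return sum(len(p) / len(rows) * entropy(p, target)
                   for v in set(r[a] for r in rows)
                   for p in [[r for r in rows if r[a] == v]])
    return min(attrs, key=expected_info)

def induce_tree(rows, attrs, target="risk"):
    """ID3: return a class label (leaf) or a pair (attribute, {value: subtree})."""
    classes = set(r[target] for r in rows)
    if len(classes) == 1:
        return classes.pop()                    # all entries in the same class
    if not attrs:
        return " or ".join(sorted(classes))     # disjunction of remaining classes
    a = best_attribute(rows, attrs, target)     # root of the current (sub)tree
    rest = [x for x in attrs if x != a]         # delete a from the properties
    return (a, {v: induce_tree([r for r in rows if r[a] == v], rest, target)
                for v in set(r[a] for r in rows)})

# The 14-record loan data set from the training table.
keys = ("risk", "history", "debt", "collateral", "income")
data = [
    ("high",     "bad",     "high", "none",     "0-15k"),
    ("high",     "unknown", "high", "none",     "15-35k"),
    ("moderate", "unknown", "low",  "none",     "15-35k"),
    ("high",     "unknown", "low",  "none",     "0-15k"),
    ("low",      "unknown", "low",  "none",     "over35k"),
    ("low",      "unknown", "low",  "adequate", "over35k"),
    ("high",     "bad",     "low",  "none",     "0-15k"),
    ("moderate", "bad",     "low",  "adequate", "over35k"),
    ("low",      "good",    "low",  "none",     "over35k"),
    ("low",      "good",    "high", "adequate", "over35k"),
    ("high",     "good",    "high", "none",     "0-15k"),
    ("moderate", "good",    "high", "none",     "15-35k"),
    ("low",      "good",    "high", "none",     "over35k"),
    ("high",     "bad",     "high", "none",     "15-35k"),
]
rows = [dict(zip(keys, r)) for r in data]
tree = induce_tree(rows, ["history", "debt", "collateral", "income"])
print(tree[0])                 # income is chosen at the root, as in the notes
print(tree[1]["0-15k"])        # the $0-15k branch is a pure high-risk leaf
```

The induced tree matches the one constructed in the notes: income at the root, a high-risk leaf for $0-15k, and credit-history tests under the other two income branches.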