### Create a StudySoup account

#### Be part of our community, it's free to join!

Already have a StudySoup account? Login here

# Evolutionary Computation CSE 848

MSU

GPA 3.97

### View Full Document

## 42

## 0

## Popular in Course

## Popular in Computer Science and Engineering

This 36 page Class Notes was uploaded by Donnell Kertzmann on Saturday September 19, 2015. The Class Notes belongs to CSE 848 at Michigan State University taught by William Punch in Fall. Since its upload, it has received 42 views. For similar materials see /class/207407/cse-848-michigan-state-university in Computer Science and Engineering at Michigan State University.

## Popular in Computer Science and Engineering

## Reviews for Evolutionary Computation

### What is Karma?

#### Karma is the currency of StudySoup.

#### You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/19/15

Introduction to Genetic Algorithms For CSE 848 and ECE 802601 Introduction to Evolutionary Computation Prepared by Erik Goodman Professor Electrical and Computer Engineering Michigan State University and CoDirector Genetic Algorithms Research and Applications Group GARAGe Based on and Accompanying Darrell Whitley s Genetic Algorithms Tutorial Introduction to Genetic Algorithms 0 Stochastic not random 0 Evolutionary analogy survival of ttest 0 Not fast sometimes more robust scales relatively well 0 Can include GP LISPlike function trees learning classifier systems rules linear GP ordinary programs etc O O O Canonical GA Differences from Other Search Methods Maintains a set or population of solutions at each stage see blackboard Classical or canonical GA always uses a crossover or recombination operator domain is PAIRS of solutions sometimes more All we have learned to time t is represented by time t s POPULATION of solutions E E 660660 Contrast with Other Search Methods indirect setting derivatives to 0 direct hill climber already described enumerative already described random already described simulated annealing already described Tabu already described RSM fits approx surf to set of pts avoids full evaluations during local search GA When Might Be Any Good 9 Highly multimodal functions 9 Discrete or discontinuous functions 9 Highdimensionality functions including many combinatorial ones 9 Nonlinear dependencies on parameters interactions among parameters epistasis makes it hard for others 9 DON T USE if a hillclimber etc will work well Genetic Algorithm Meaning 9 classical or canonical GA Holland 60 s book in 75 binary chromosome population selection crossover recombination low rate mutation 9 More general GA population selection recombination mutation may be hybridized with LOTS of other stuff O O O 9 Representation Problem is represented as a string classically binary known as an individual or chromosome What s on the chromosome is GENOTYPE What it means in the problem context is the PHENOTYPE eg binary sequence may map to integers or reals or order of execution etc Each position called locus or gene a particular value for a locus is an allele So locus 3 might now contain allele 5 and the thickness gene which might be locus 8 might be set to allele 2 meaning its secondthinnest value Optimization Formulation 0 Not all GA s used for optimization also learning etc 0 Commonly formulated as given FX1Xn nd set of Xi s that extremize F often also with constraints on the Xi s equality or inequality 0 Encoding obviously depends on problem being solved Discretization etc 0 If realvalued domain discretize to binary typically powers of 2 need not be in GALOPPS with lower upper limits lineareXplog scaling etc 0 End result classically is a bit string De ning ObjectiveFitness Functions 0 Problemspecific of course 0 Many involve using a simulator 0 Don t need to possess derivatives 0 May be stochastic 0 Need to evaluate thousands of times so can t be TOO COSTLY The What Function 9 In problemdomain form absolute or raw fitness or evaluation or performance or objective function 9 Relative to population possibly inverted andor offset scaled tness usually called the fitness function Fitness should be MAXIMIZED Whereas the objective function might need to be MAXIMIZED OR MINIMIZED Selection 9 Based on fitness choose the set of individuals the intermediate population to survive untouched or be mutated or in pairs be crossed over and possibly mutated forming the next population 9 One individual may be appear several times in the intermediate population or the next population Types of Selection Using relative fitness examples 9 roulette Wheel classical Holland chunk of Wheel relative fitness 9 stochastic uniform sampling better sampling integer parts GUARANTEED Not requiring relative fitness 9 tournament selection 9 rankbased selection proportional or cutoff Scaling of Relative Fitnesses 9 Trouble as evolution progresses relative fitness differences get smaller as population gets more similar to each other Often helpful to SCALE relative fitnesses to keep about same ratio of best guyaverage guy for example Recombination or Crossover On parameter encoded representations 9 1pt example 9 2pt example 9 uniform example Linkage loci nearby on chromosome not usually disrupted by a given crossover operator cf 1pt 2pt uniform re linkage But use OTHER crossover operators for reordering problems later Mutation On parameter encoded representations 9 singlebit ne for true binary encodings 9 singlebit NOT good for binarymapped integersreals Hamming cliffs 9 Binary ints use Gray codes and bit ips or use random legal ints or use 0mean Gaussian changes etc What is a GA DOING Schemata and Hyperstuff chema adds to alphabet means don t care any value ii ne schema two schemata forgive occasional misuse in hitley trings or chromosomes or individuals or solutions are order schemata where L is length of chromosome in bits or loci Hypercubes Hyperplanes etc See pictures in Whitley tutorial or blackboard Vertices correspond to order L schemata strings Edges are order L1 schemata like 10 or 101 Faces are order L2 schemata Etc for hyperplanes of various orders A string is an instance of 2L1 schemata or a member of that many hyperplane partitions 1 because all s the whole space is not counted as a schema per Holland List them for L3 O O O O O O O 0 GA Sampling of Hyperplanes How many schemata in Whole search space how many choices each locus Since one string instances 2L1 schemata how much does a population of strings tell us about schemata of various orders More about some than others Implicit parallelism reflects idea that one string s fitness tells us something about relative fitnesses of more than one schema O O E E Fitness and Schemata Hyperplane Sampling Look at Figure 3 in Whitley tutorial re another View of hyperspaces See example in Tables 1 and 2 in Whitley tutorial and let s look at how fitnesses affect which schemata appear more or fewer times in intermediate generation sampling according to fitness How Do Schemata Propagate 9 Via instances only STRINGS appear in pop 9 Actual number of instances of schemata Table 2 in intermed gen tracked expected number pretty well in spite of small pop size 9 Let MHt be number of instances samples of schema H in pop at time t Then fitness proportional selection yields an expectation of MHZinlermedMHl Crossover Effect on Schemata 9 Onepoint Crossover Examples blackboard 11 and 11 9 Twopoint Crossover Examples blackboard rings 9 Closer together loci are less likely to be disrupted by crossover right A compact representation is one that tends to keep alleles together under a given form of crossover minimizes probability of disruption Linkage and Defining Length 0 Linkage coadapted alleles generalization of a compact representation with respect to schemata 0 Example convincing you that probability of disruption of schema H of length AH is AHL1 The Fundamental Theorem of Genetic Algorithms The Schema Theorem Holland published in 1975 had taught it much earlier by 1968 for example when I started PhD at UM It provides lower bound on change in sampling rate of a single schema from generation t to t1 We ll derive it in several steps starting from the change caused by selection alone MHtintermed MH t Schema Theorem Derivation cont Now we want to add effect of crossover A fraction pc of pop undergoes crossover so MHt1 1 pCMHt g pCMHt g 1 lasses gains Will make a conservative assumption that crossover within the defining length of H is always disruptive to H and will ignore gains we re after a LOWER bound won t be as tight but simpler Then Ht 1 2 1 pCMHt pCMHt1 dismptions Schema Theorem Derivation cont Whitley considers one nandisruption case that Holland didn t originally If cross H with an instance of itself anywhere get no disruption Chance of doing that drawing second parent at random is PHt MHtpopsize so prob of disruption by Xover is MH F 1 P H I Then can simplify the inequality dividing by popsize and rearranging re pc PltHr1gt 2 PltHrgt fltIquotgt1 p lt1 PltHrgt L71 This version ignores mutation and assumes second parent is chosen at random But it s usable already Schema Theorem Derivation cont Now let s recognize that we ll choose the second parent for crossover based on fitness too PHt1 2 Han if 1 pt L2 1 PHt f 16 Now let s add mutation s effects What is the probability that a mutation affects schema H Assuming mutation always ips bit or changes allele Each fixed bit of schema 0H of them changes with probability pm so they ALL stay UNCHANGED with probability lt H 1 pm Schema Theorem Derivation cont Now we have a more comprehensive schema theorem PHt1 2 PHtf1quot 1 pc 1 PHzlt1 pmaltm This is Where Whitley stops We can use this but Holland earlier generated a simpler but less accurate bound first approximating the mutation loss factor as 10Hpm assuming pmltlt1 Schema Theorem Derivation cont That yields PltH t 1 2 PltHru p 11 0ltHpm1 But since pmltlt1 we can ignore small crossproduct terms and get PltH t 1 2 PltHr1 p 0Hpm That is What many people recognize as the classical form of the schema theorem What does it tell us Schema Theorem Interpretation Even simple form can help us balance selection pressure crossover rate mutation rate etc PHz12 PHz1 pC 0Hpm L71 Say relative fitness of H is 12 pg 5 pm 05 and L 20 What happens to H Pitfalls slow progress random search premature convergence etc Problem with Schema Theorem OK in beginning of search but assumptions become worse later Lessons Not Always Followed Building block short loworder highfitness schema For newly discovered building blocks to be nurtured made available for combination with others but not allowed to take over population why 9 Mutation rate should be but contrast with SA ES 17t 9 Crossover rate should be 9 Selection should be able to 9 Population size should be oops what can we say about this so far A Traditional Way to Do GA Search 9 Population large 9 Mutation rate per locus 1L 9 Crossover rate moderate lt03 9 Selection scaled or ranktournament etc such that Schema Theorem allows new BB s to grow in number but not lead to premature convergence Schema Theorem and Crossover Types 9 If we use a different type of representation or different crossover operator Must formulate a different schema theorem using same ideas about disruption of schemata See Whitley Fig 4 for paths through search space under crossover Uniform Crossover amp Linkage 9 Uniform crossover is more disruptive than 1pt or 2pt crossover 9 BUT uniform is unbiased relative to linkage 9 If all you need is small populations and a rapid scramble to find good solutions uniform xover sometimes works better do you really want a GA for this 9 Otherwise try to lay out chromosome for good linkage use 1pt or usually better 2pt unchanging Classical Inversion Operator 0 Example reverses field pairs i through k on chromosome aaVa bVb 0ND ILVd en E Vf gVg 0 After inversion of positions 24 yields aaVa ILVd 0ND bVb en E Vf gVg 0 Now fields ad are more closely linked 1pt or 2pt crossover less likely to separate them 0 In practice seldom used must run problem for an enormous time to have such a secondlevel effect be useful Need to do on population level or tag each inversion pattern and force mates to have matching tags or do repairs to crossovers to keep chromosomes legal ie possess one pair of each type Inversion An Idea to Try to Improve Linkage 9 Tries to reorder loci on chromosome BUT NOT changing meaning of loci in the process 9 Means must treat each locus as index value pair Can then reshuffle pairs at random let crossover work with them in order APPEAR on chromosome but fitness function keep association of values with indices of fields Inversion NOT a Reordering Operator 0 In contrast if trying to solve for the best permutation of 0N use other reordering crossovers we ll discuss later That s NOT inversion Crossover Between Similar Individuals As search progresses more individuals tend to resemble each other When two similar individuals are crossed chances of yielding children different from parents are lower for 12 pt than uniform Can counter this with reduced surrogate crossover lpt 2pt Reduced Surrogates Given 0001111011010011 and 0001001010010010 drop matching Positions getting 1111 and 0000 reduced surrogates If pick crossover pts IGNORING DASHES 1pt 2 pt still search similarly to uniform The Case for Binary Alphabets Deals with efficiency of sampling schemata Minimal alphabet 9 maximum hyperplanes directly available in encoding for schema processing and higher rate of sampling loworder schemata than with larger alphabet See p 20 Whitley for tables Half of a random init pop samples each order 1 schema and 1 samples each order2 schema etc If use alphaisize 10 many schemata of order 2 will not be sampled in an initial population of 50 Of course each order1 schema sampled gave us info about a 3 bit allele Case Against Antonisse raises counterarguments on a theoretical basis and the question of effectiveness is really open But often don t want to treat chromosome as bit string but encode ints allow crossover only between int fields not at bit boundaries use problemspecific representations Losses in schema search efficiency may be outweighed by gains in naturalness of mapping keeping fields legal etc So we will most often use nonbinary strings GALOPPS lets you go either way The N3 Argument Implicit or Intrinsic Parallelism Assertion A GA with pop size N can usefully process on the order of N3 hyperplanes schemata in a generation WOW If N100 N3 1 million Derivation Assume 9 Random population of size N 9 Need q instances of a schema to claim we are processing it in a statistically significant way in one generation The N3 Argument cont Example to have 8 samples on average of 2nd order schemata in a pop there are 4 distinct CONFLICTING schemata in each 2position pair for example 00 01 10 11 we d need 4 bit patterns X 8 instances 32 popsize In general the highest ORDER of schema6 that is processed is log No in our case log328 log4 2 log means logz The N3 Argument cont But the number of distinct schemata of order 6 is 26 9 the number of ways to picke different positions and assign all possible binary values to each subset of the 6 positions L So we are trying to argue that 26 9 2 N 3 L which implies that 26 e 2 29 of since 9 logN The N3 Argument cont Rather than proving anything general Fitzpatrick amp O O O O Grefenstette 88 argued as follows Assume L 2 64 and 26 S N S 220 Pick 8 which implies 3 g e g 17 By inspection plug in N s get 9 s etc the number of schemata processed is greater than N3 So as long as our population size is REASONABLE 64 to a million and L is large enough problem hard enough the argument holds But this deals with the initial population and it does not necessarily hold for the latter stages of evolution Still it may help to explain why GA s can work so well E E Exponentially Increasing Sampling and the KArmed Bandit Problem Schema Theorem says MHt1 gt k MHt if we neglect certain changes That is H s instances in population grow exponentially as long as small relative to pop size and kgt1 H is a building block Is this a good way to allocate trials to schemata Argument that SHOULD devote exponentially increasing fraction of trials to schemata that have performed better in samples so far TwoArmed Bandit Problem from Goldberg 89 0 1armed bandit slot machine 0 2armed bandit slot machine with 2 handles NOT necessarily yielding same payoff odds 2 different slot machines 0 If can make a total of N pulls how should we proceed so as to maximize expected final total payoff Ideas TwoArmed Bandit cont 9 Assume LEFT pays with unknown to us expected value ul and variance 512 and RIGHT pays uz with variance 522 9 The DILEMMA Must EXPLORE While EXPLOITING Clearly a tradeoff must be made Given that one arm seems to be paying off better than the other SO FAR how many trials should be given to the BETTER so far arm and how many to the POORER so far arm TwoArmed Bandit cont Classical approach SEPARATE EXPLORATION from EXPLOITATION If will do N trials start by allocating 11 trials to each arm 2nltN to decide WHICH arm appears to be better and then allocate ALL remaining NZn trials to it DeJong calculated the expected loss compared to the OPTIMUM of using this strategy LNn lul nil Nn qn n1qnwhere qn is the probability that the WORST arm is the OBSERVED BEST arm after 11 trials on each machine TwoArmed Bandit cont This qn is well approximated by the tail of the normal distribution eixzz Where x M 0 4 03912 03922 00 q 5 X is signal difference to noise ratio times sqrtn Let s call signal difference to noise ratio c LNn lul Hzl Nn 111 n1Qn Due to quotAwrong arm later quotAwrong during exploration TwoArmed Bandit cont For any N solve for the optimal experiment size n by setting the derivative of the loss equation to 0 Graph below after Fig 22 in Goldberg 89 shows the optimal n as a function of total number of trials N and c the ratio of signal difference to noise 1E26 From graph see that total number of 1E23 expen39ments N grows at a greater 1E20 thanexponential function of the ideal 1E17 number of trials n in the explamtiun N 1E peliod that means according to classical decision theory that we 100000 should be allocating trials to the BETTER higher measured fitness 01 dun39ng the exploration period of the 01 1 1 10 two arms at a GREATER THAN Winquot EXPONENTIAL RATE TwoArmed Bandit cont The LARGER x becomes the LESS probable qn becomes ie smaller chance of error You can see that qn chance of error DECLINES as n is picked larger or as the differences in expected values INCREASES or as the sum of the variances DEC REASES The equation shows two sources of expected loss TwoArmed Bandit KArmed Bandit Now let our arms represent competing schemata Then the future sampling of the better one to date should increase at a largerthaneXponential rate A GA using selection crossover and mutation does that when set properly according to the schema theorem If there are K competing schemata over a set of positions then it s a Karmed bandit But at any time MANY different schemata are being processed with each competing set representing a K armed bandit scenario So maybe the GA s way of allocating trials to schemata is pretty good E E Early Theory for GA s Vose and Liepins 91 produced most wellknown GA theory model The main elements 9 vector of size 2L containing proportion of population with genotype 139 at time t before selection PSit Whole vector denoted pt 9 matrix rijk of probabilities that crossing strings i and j Wlll produce string k tI t t Then Epk ZsiserUc zuj Vose amp Liepins cont 0 r is used to construct M the mixing matrix that tells for each possible string the probability that it is created from each pair of parent strings Mutation can also be included to generate a further huge matrix that in theory could be used with an initial population to calculate each successive step in evolution E E Vose amp Liepins cont The problem is that not many theoretical results with practical implications can be obtained because for interesting problems the matrices are too huge to be usable and the effects of selection are dif cult to estimate More recent work in a statistical mechanics approach to GA theory seems to me to hold far more interest Different Forms of GA Generational vs SteadyState 9 Generation gap 10 means replace ALL by newly generated children at lower extreme generate 1 or 2 offspring per generation called steadystate 9 GALOPPS allows either by setting crossover rates Different Forms of GA Replacement Policy 9 Offspring replace parents 9 K offspring replace K worst ones 9 Offspring replace random individuals in intermediate population 9 Offspring are crowded in GALOPPS allows 134 easily 2 takes mods Crowding Crowding DeJong helps form niches and avoid premature takeover by fit individuals For each child 9 Pick K candidates for replacement at random from intermediate population 9 Calculate pseudoHamming distance from child to each 9 Replace individual most similar to child Effect Elitism Artificially protects fittest K members of population against replacement in next generation Often useful but beware if using multiple subpopulations K often 1 may be larger even large Example GA Packages GENITOR Whitley 0 Steadystate GA 0 Child replaces worstfit individual 0 Fitness is assigned according to rank so no scaling is needed 0 elitism is automatic Can do in GALOPPS except worst replacement user must rewrite that part Example GA Packages CHC Eshelman O Elitism 11 from ES generate 9t offspring from 11 parents keep best 11 of the 11 parents and children Uses incest prevention reduction pick mates on basis of their Hamming dissimilarity HUX form of uniform crossover highly disruptive O O O Rejuvenate with cataclysmic mutation when population starts converging which is often small populations used GALOPPS allows last three not first one I don t favor except for relatively easy problem spaces Hybridizing GAS a Good Idea IDEA combine a GA With local or problem specific search algorithms HOW typically for some or all individuals start from GA solution take one or more steps according to another algorithm use resulting fitness as fitness of chromosome If also change genotype Lamarckian if don t Baldwinian preserves schema processing Helpful in many constrained optimization problems to repair infeasible solutions to nearby feasible ones Other RepresentationsOperators PermutationOptimal Ordering 0 Chromosome has EXACTLY ONE copy of each int in 0N1 0 Must find optimal ordering of those ints 0 1pt 2pt uniform crossover ALL useless 0 Mutations swap 2 loci scramble K adjacent loci shuf e K arbitrary loci etc 0 See blackboard for example Crossover Operators for Permutation Problems What properties do we want 0 1 Want each child to combine building blocks from both parents in a way that preserves highorder schemata in as meaningful a way as possible and 0 2 Want all solutions generated to be feasible solutions Representations Using TSP Example PMX Partially Matched Crossover interchange between parents 0A 98456713210 0B 87123109546 A 98423101657 B 810156 7 I924 3 O O O ie swap 5 with 2 6 with 3 and 7 with 10 in both children Thus some ordering information from each parent is preserved and no infeasible solutions are generated 9 Example Operators for PermutationBased 0 2 sites picked intervening section specifies cities to 9 9 9 9 9 9 Example Operators for PermutationBased Representations Order Crossover A 98456 7 13210 segmentAandB B 8712310954 6 gtB8H123109H4H repl567withH s gt B 2 3 10 H H H 9 4 8 1 promote segment from B gather H s append rest with wraparound gtB 231056 7 I948 1 SimilarlyA 56723101984 Order crossover preserves more information about RELATIVE ORDER than does PMX but less about ABSOLUTE POSITION of each city for TSP example o o o o co Example Operators for PermutationBased Representations Cycle Crossover Cycle crossover forces the city in each position to come from that same position on one of the two parents C98217451063 D1234567 8 910 4 6 which completes lst cycle then depending on whose cycle crossover you choose i start from rst unassigned position in D and perform another cycle or ii just ll in the rest of the numbers from chromosome iyields gt 92 1 48610 gt923148610 gtC 92317458610 iiyieldsgtC 92315478610 D is done similarly D is done similarly Example Operators for PermutationBased Representations Uniform OrderBased Crossover lt Lawrence Davis Handbook ofGenetic Algorithms Analogous to uniform crossover for ordinary listbased chromosomes Uniform crossover allows many crossovers to be performed at once on a pair of chromosomes com ining parents genes on a locusbylocus basis so is quite disruptive of longer schemata I don t like it much as it jum es information and is too disruptive for effectiveness with many problems I believe But it works quite well for some others o o o Parallel GAS Independent of Hardware My Bias Three primary models coarsegrain island fine grain cellular and trivia micrograin Trivial not really a parallel GA pass out individuals to separate processors for evaluation or run lots of local tournaments no master still acts like one large population GALOPPS release of this not current Q Q A 1 2 3 4 5 6 7 8 B 8 6 4 2 7 5 3 1 Binary Template 0 1 1 0 1 1 0 0 random gt 2 3 5 6 then reordering rest ofA s nodes to the order THEY appear in in B o gtA 82345671 and similarlyfor B gt8 4 5 2 6 7 3 1 CoarseGrain Island Parallel GA N subpopulations acting as if running in parallel timeshared or actually on multiple processors Occasionally migrants go from one to another in prespeci ed patterns Strong capability for avoiding premature convergence while exploiting good individuals if migration ratespatterns well chosen MicroGrain Parallel GAs 0 Individuals distributed on cells in a tessellation one or few per cell O Mating typically among near neighbors in some defined neighborhood 0 Interesting when viewed as cellular automata

### BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.

### You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

## Why people love StudySoup

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "I signed up to be an Elite Notetaker with 2 of my sorority sisters this semester. We just posted our notes weekly and were each making over $600 per month. I LOVE StudySoup!"

#### "I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

#### "It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

### Refund Policy

#### STUDYSOUP CANCELLATION POLICY

All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email support@studysoup.com

#### STUDYSOUP REFUND POLICY

StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here: support@studysoup.com

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to support@studysoup.com

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.