Popular in Course
verified elite notetaker
Popular in Computer Information Technology
verified elite notetaker
This 80 page Class Notes was uploaded by Kathleen Cartwright on Monday September 28, 2015. The Class Notes belongs to CIS511 at University of Pennsylvania taught by Staff in Fall. Since its upload, it has received 8 views. For similar materials see /class/215379/cis511-university-of-pennsylvania in Computer Information Technology at University of Pennsylvania.
Reviews for THEORYOFCOMPUTATION
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/28/15
Chapter 3 ContextFree Grammars ContextFree Languages Parse Trees and Ogden s Lemma 31 ContextFree Grammars A context free grammar basically consists of a nite set of grammar rules In order to de ne grammar rules we assume that we have two kinds of symbols the terminals which are the symbols of the alphabet underlying the languages under consideration and the nonterminals which behave like variables ranging over strings of terminals A rule is of the form A a 04 where A is a single nonterminal and the right hand side or is a string of terminal andor nonterminal symbols As usual rst we need to de ne what the object is a context free grammar and then we need to explain how it is used Unlike automata grammars are used to generate strings rather than recognize strings De nition 311 A eontecot free grammar for short CFC is a quadruple G V E P S where 0 V is a nite set of symbols called the vocabulary or set of grammar symbols 0 E Q V is the set of terminal symbols for short terminals 0 S E V 7 E is a designated symbol called the start symbol 0 P Q V 7 E gtlt V is a nite set of productions or rewrite rules or rules The set N V7 E is called the set of nonterminal symbols for short nonterminals Thus P Q N gtlt V and every production ltAagt is also denoted as A a a A production of the form A a E is called an epsilon rule or null rule 33 34 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Remark Context free grammars are sometimes de ned as G VN VT P S The correspondence with our de nition is that E VT and N VN so that V VN U VT Thus in this other de nition it is necessary to assume that VT VN 0 Example 1 G1 Eab ab P E where P is the set of rules E a aEb E ab As we will see shortly this grammar generates the language L1 arm l n 2 1 which is not regular Example 2 G2 EgtkaaP E where P is the set of rules E a EE E a E gtk E E60 7 This grammar generates a set of arithmetic expressions 32 Derivations and ContextFree Languages The productions of a grammar are used to derive strings In this process the productions are used as rewrite rules Formally we de ne the derivation relation associated with a context free grammar First let us review the concepts of transitive closure and re exive and transitive closure of a binary relation Given a set A a binary relation R on A is any set of ordered pairs ie R Q A gtlt A For short instead of binary relation we often simply say relation Given any two relations R S on A their composition R o S is de ned as RoSxy AgtltAl32 Az z Randzy S The identity relation IA on A is the relation IA de ned such that IA x p l x E A For short we often denote IA as I Note that R o I I o R R for every relation R on A Given a relation R on A for any n 2 0 we de ne R as follows R0 I R 1 R o R 32 DERIVATIONS AND CONTEXT FREE LANGUAGES 35 It is obvious that R1 R It is also easily veri ed by induction that R o R R o R The transitive closure PU of the relation R is de ned as RUR n21 It is easily veri ed that Pi is the smallest transitive relation containing R7 and that my 6 PH iff there is some n 2 1 and some x07x177xn E A such that 0 7 xn y7 and xi7 n1 E R for all i7 0 S i S n 7 1 The transitive and reflective closure P of the relation R is de ned as FUW n20 Clearly7 Pi PU U I It is easily veri ed that P is the smallest transitive and re exive relation containing R De nition 321 Given a context free grammar G V7 27 P7 S7 the one step derivation relation gtG associated with G is the binary relation gtG Q V gtlt V de ned as follows for all 0476 E V we have 04 gt0 5 iff there exist A7p E V and some production A a y 6 P7 such that 04 AAp and B A yp The transitive closure of gtG is denoted as gtG and the re exive and transitive closure of gtG is denoted as 30 When the grammar G is clear from the context7 we usually omit the subscript G in gt07 gtG7 and gt0 A string 04 E V such that S 3 04 is called a sententialforrn7 and a string w E 2 such that S 3 w is called a sentence A derivation oz 3 6 involving n steps is denoted as V L 04 gt 6 Note that a derivation step 04 gt0 3 is rather nondeterministic lndeed7 one can choose among various occurrences of nontermi nals A in 047 and also among various productions A a y with left hand side A For example7 using the grammar G1 E7a7b7 a7b7 137E7 where P is the set of rules Egmm E mll7 36 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES every derivation from E is of the form E anEb anabb an bn or E anEb anaEbb an EW where n 2 0 Grammar G1 is very simple every string anon has a unique derivation This is usually not the case For example7 using the grammar G2 E777 7a7gtka137E7 where P is the set of rules E gtEE7 E gtEgtkE7 Egta7 7 the string a a gtk a has the following distinct derivations7 where the boldface indicates which occurrence of E is rewritten EgtEEgtEEE gtaEgtkEgtaagtkEgtaagtka7 and EgtEEgtaE gtaEgtkEgtaagtkEgtaagtka In the above derivations7 the leftmost occurrence of a nonterminal is chosen at each step Such derivations are called leftmost derivations We could systematically rewrite the right most occurrence of a nonterminal7 getting rightmost derivations The string a a gtk 1 also has the following two rightmost derivations7 where the boldface indicates which occurrence of E is rewritten EgtEEgtEEE gtEEgtkagtEagtkagtaagtka7 and EgtEEgtEgtka gtEEgtkagtEagtkagtaagtka The language generated by a context free grammar is de ned as follows 32 DERIVATIONS AND CONTEXT FREE LANGUAGES 37 De nition 322 Given a context free grammar G VEPS the language generated by G is the set LGw E Sgtw A language L Q 2 is a contecot free language for short CFL iff L LG for some context free grammar G It is technically very useful to consider derivations in which the leftmost nonterminal is always selected for rewriting and dually derivations in which the rightmost nonterminal is always selected for rewriting De nition 323 Given a context free grammar G V 2135 the one step leftmost derivation relation gt associated with G is the binary relation lgt Q V gtlt V de ned as follows for all 046 6 V we have 04 lgt 5 iff there exist u E 2 p 6 V and some production A a y E P such that 04 uAp and B U yp The transitive closure of lgt is denoted as gt and the re exive and transitive closure of m m lgt is denoted as The one step rightmost derivation relation gt associated with m m rm G is the binary relation gt Q V gtlt V de ned as follows for all 046 6 V we have rm 04 gt 6 iff there exist A 6 V o E 2 and some production A a y E P such that 04 AAU and B A71 The transitive closure of gt is denoted as gt and the re exive and transitive closure of rm rm gt is denoted as Tm Tm Remarks It is customary to use the symbols abcde for terminal symbols and the symbols A B GD E for nonterminal symbols The symbols uowzyz denote terminal strings and the symbols a yApo denote strings in V The symbols XYZ usually denote symbols in V Given a context free grammar G VEPS parsing a string it consists in nding out whether it E LG and if so in producing a derivation for w The following lemma is technically very important It shows that leftmost and rightmost derivations are universal This has some important practical implications for the complexity of parsing algorithms 38 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Lemma 324 Let G VEP S be a coritecst free grammar For every iv 6 2 for every derivatiori S gt iv there is a leftmost derivation S l gt iv arid there is a rightmost m derivation S gt iv Tm Proof Of course we have to somehow use induction on derivations but this is a little tricky and it is necessary to prove a stronger fact We treat leftmost derivations rightmost derivations being handled in a similar way Claim For every iv 6 2 for every oz 6 V for every n 2 1 if oz ngt iv then there is a leftmost derivation oz gt iv in The claim is proved by induction on n For n 1 there exist some Ap 6 V and some production A a 7 such that oz AAp and iv A yp Since iv is a terminal string Ap and y are terminal strings Thus A is the only nonterminal in oz and the derivation step oz 1gt iv is a leftmost step and a rightmost stepl If n gt 1 then the derivation oz ngt iv is of the form n71 oz gt a1 gt iv There are two subcases Case 1 If the derivation step oz gt oz1 is a leftmost step oz lgt oz1 by the induction in hypothesis there is a leftmost derivation oz1 7 iv and we get the leftmost derivation m n71 oz gt a1 gt iv lm lm Case 2 The derivation step oz gt oz1 is a not a leftmost step In this case there must be some it E 2 mp 6 Vi some nonterminals A and B and some production B a 6 such that oz uAiin and oz1 wimp where A is the leftmost nonterminal in a Since we have a derivation oz1 2 iv of length n 7 1 by the induction hypothesis there is a leftmost derivation 041 7 11 lm Since oz1 wimp where A is the leftmost terminal in oz1 the rst step in the leftmost derivation oz1 7 iv is of the form in MM f mum 32 DERIVATIONS AND CONTEXT FREE LANGUAGES 39 for some production A a 7 Thus we have a derivation of the form 04 uApo gt uAp p lgt U yp p 7 w We can commute the rst two steps involving the productions B a 6 and A a y and we get the derivation Oz uApo lgt 117po gt U yu p 7 w This may no longer be a leftmost derivation but the rst step is leftmost and we are back in case 1 Thus we conclude by applying the induction hypothesis to the derivation 117po 2 w as in case 1 El Lemma 324 implies that LGw E S gtww62 S gm We observed that if we consider the grammar G2 E a a P E where P is the set of rules E E E E a E gtk E E 6 E a a the string a a gtk a has the following two distinct leftmost derivations where the boldface indicates which occurrence of E is rewritten E gt E gtk E gt E E gtk E gtaEgtkEgtaagtkEgtaagtka and E gt E E gt a E gtaEgtkEgtaagtkEgtaagtka When this happens we say that we have an ambiguous grammars In some cases it is possible to modify a grammar to make it unambiguous For example the grammar G2 can be modi ed as follows Let G3 ET FgtkagtkaP E where P is the set of rules E E T E a T T a T gtk F T a F F 6 F a 40 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES We leave as an exercise to show that LG3 LG27 and that every string in LG3 has a unique leftmost derivation Unfortunately7 it is not always possible to modify a context free grammar to make it unambiguous There exist context free languages that have no unambiguous context free grammars For example7 the language L3 ambmcn l m7n 21Uamb c l m7n 21 is context free7 since it is generated by the following context free grammar S m 517 S m 527 81 a XG7 2 a AY7 X a aXb7 X a ab7 Y a bYc7 Y a be A a aA7 A a a7 G a CG7 G a c However7 it can be shown that L3 has no unambiguous grammars All this motivates the following de nition De nition 325 A context free grammar G V7 271375 is ambiguous if there is some string w E LG that has two distinct leftmost derivations or two distinct rightmost deriva tions Thus7 a grammar G is unambiguous if every string w E LG has a unique leftmost derivation or a unique rightmost derivation A context free language L is inherently am biguous if every CFC G for L is ambiguous Whether or not a grammar is ambiguous affects the complexity of parsing Parsing algo rithms for unambiguous grammars are more ef cient than parsing algorithms for ambiguous grammars We now consider various normal forms for context free grammars 33 Normal Forms for ContextFree Grammars Chom sky Normal Form One of the main goals of this section is to show that every CFC G can be converted to an equivalent grammar in Chomsky Normal Form for short GNF A context free grammar 33 NORMAL FORMS FOR CONTEXT FREE GRAMMARS 41 G V72137 S is in Chomsky Normal Form iff its productions are of the form AgtBC7 Aaa7 or Sgte7 where A7BO 6 N7 1 E E S a E is in P iff E E LG7 and S does not occur on the right hand side of any production Note that a grammar in Chomsky Normal Form does not have e rules7 ie7 rules of the form A a 67 except when E E LG7 in which case S a E is the only e rule It also does not have chain rules7 ie7 rules of the form A a B7 where AB 6 N Thus7 in order to convert a grammar to Chomsky Normal Form7 we need to show how to eliminate e rules and chain rules This is not the end of the story7 since we may still have rules of the form A a 04 where either lal 2 3 or lal 2 2 and 04 contains terminals However7 dealing with such rules is a simple recoding matter7 and we rst focus on the elimination of e rules and chain rules It turns out that e rules must be eliminated rst The rst step to eliminate e rules is to compute the set EG of erasable or nullable nontermz39nals EGA NlAgte The set EG is computed using a sequence of approximations El de ned as follows E0A NlA e P7 E141 l 6P7 BjEEi7 Clearly7 the El form an ascending chain EogElgnglgElngLN 7 and since N is nite7 there is a least 239 say to such that EU Ei01 We claim that EG Eio Actually7 we prove the following lemma Lemma 331 Given a contecct free grammar G V72137 S one can construct a conteat free grammar G V7 2 P7 S such that 1 MGquot MG 2 P contains no e rules other than S a 6 and S a E E P z e E LG 3 S does not occur on the right hand side of any production in P Proof We begin by proving that EG Eio For this7 we prove that EG Q ED and g EG 42 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES To prove that EU Q EG7 we proceed by induction on 239 Since E0 A E N l A a 6 E P7 we have A 1gt 67 and thus A E By the induction hypothesis7 E Q If A 6 E1117 either A E El and then A E EG7 or there is some production A a B1BjBk 6 P7 such that B E E for all j7 1 S j S k By the induction hypothesis7 Bj gt E for each j7 1 Sj 3 k7 and thus AgtB1BjBkgtB2BjBkgtBjBkgt67 which shows that A E To prove that EG Q Eio we also proceed by induction7 but on the length of a derivation AgtE11A67thenA 66P7andthuSA E0S1H08E0A NlA EEP If A E 67 then A gt 04 ngt 67 for some production A a 04 E P If 04 contains terminals of nonterminals not in EG7 it is impossible to derive E from 04 and thus7 we must have 04 B1 quotBi Bk with B E EG7 for all j7 1 S j S k However7 Bj E where 71 S 717 and by the induction hypothesis7 B 6 ED But then7 we get A E Ei01 Eio as desired D Having shown that EG Eio we construct the grammar G lts set of production P is de ned as follows First7 we create the production S a S where S V7 to make sure that S does not occur on the right hand side of any rule in P Let P1AgtOLEP l OLEVUSgtS7 and let P2 be the set of productions P2 A a 041042 akak1 l 3041 E V73ak1 6 Vi 3B1 E EG7 HBk E EG A a 04131042 oszkozk1 6 P7 k 217a1ak1 31 6 Note that E E LG iff S E If S EG7 then let P P1 UPZ7 and if S E EG7 then let P P1 U P2 U S a 6 We claim that LG LG7 which is proved by showing that every derivation using G can be simulated by a derivation using G 7 and vice versa All the conditions of the lemma are now met I From a practical point of view7 the construction or lemma 331 is very costly For example7 given a grammar containing the productions S a ABC39DEF7 A a 67 B a 67 C a 67 D a 67 E a 67 F a 67 33 NORMAL FORMS FOR CONTEXT FREE GRAMMARS 43 eliminating e rules will create 26 7 1 63 new rules corresponding to the 63 nonempty subsets of the set A B O D EF We now turn to the elimination of chain rules It turns out that matters are greatly simpli ed if we rst apply lemma 331 to the input grammar G and we explain the construction assuming that G VEPS satis es the conditions of lemma 331 For every nonterminal A E N we de ne the set IAB NlAgtB The sets IA are computed using approximations IA de ned as follows IA0B NlAgtBEP Al1 IA U C E N l 3B a O E P andB 6 1A Clearly for every A E N the IA form an ascending chain IA0 IA1Qquot39 IAiCIAi1 Qquot N7 and since N is nite there is a least i say i0 such that L4 Al01 We claim that IA 1A Actually we prove the following lemma Lemma 332 Given a contecct free grammar G V E P S one can construct a conteat free grammar G V E P S such that 1 MG MG 2 Every rule in P is of the form A a 04 where lal 2 2 or A a a where a E E or S a E i e E LG 3 S does not occur on the right hand side of any production in P Proof First we apply lemma 331 to the grammar G obtaining a grammar G1 l1ESlP1 The proof that IA IA is similar to the proof that EG Eio First we prove that L4 Q IA by induction on i This is staightforward Next we prove that IA Q IA by induction on derivations of the form A gt B In this part of the proof we use the fact that G1 has no e rules except perhaps 81 a 6 and that 51 does not occur on the right hand side of any rule This implies that a derivation A E O is necessarily of the form A ngt B gt O for some B E N Then in the induction step we have B 6 1A and thus 0 E IA7O1 IAJO We now de ne the following sets of rules Let P2P17AgtBlAgtB P1 andlet P3AgtOtlB OLEPhOL NhBEIAj 44 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES We claim that G V127P2 U P3751 satis es the conditions of the lemma For example7 51 does not appear on the right hand side of any production7 since the productions in P3 have right hand sides from Pl7 and 51 does not appear on the right hand side in P1 It is also easily shown that LG LG1 LG I Let us apply the method of lemma 332 to the grammar G3 T7 F777 77a77 7 77a7P7 E7 where P is the set of rules EgtET7 EgtT7 TgtTgtkF7 TgtF7 F w F a 7 We get IE T7F7 IT F7 and IF Q The new grammar Gg has the set of rules EgtET7 E TgtkF7 Egta7 T TgtkF7 TgtE Tgta7 F a 7 7 7 At this stage7 the grammar obtained in lemma 332 no longer has e rules except perhaps 5 a 6 iff E E LG or chain rules However7 it may contain rules A a 04 with lal 2 37 or with lal 2 2 and where 04 contains terminalss To obtain the Chomsky Normal Form we need to eliminate such rules This is not dif cult7 but notationally a bit messy Lemma 333 Given a contecct free grammar G V7 2 P7 S one can construct a contest free grammar G V 727P 7S such that LG LG and G is in Chomsky Normal Form that is a grammar whose productions are of the form A a B07 A a a7 or S g 33 NORMAL FORMS FOR CONTEXT FREE GRAMMARS 45 where A7B7O E N a E E S a E is m P z e E LG and S does not occur on the right hand side of any production in P Proof First7 we apply lemma 3327 obtaining G1 Let ET be the set of terminals occurring on the right hand side of rules A a 04 E Pl7 with lal 2 2 For every 1 E E let Xa be a new nonterminal not in V1 Let P2Xa ala627 Let PM be the set of productions A a1a1a2 akakak17 where 117 41k 6 ET and 041 E Nf For every production A 04101042 04kak04k1 in PM7 let A a Otha1042quot39OthakOtk1 be a new production7 and let P3 be the set of all such productions Let P4 P1 7 PM U P2 U P3 Now7 productions A a 04 in P4 with lal 2 2 do not contain terminals However7 we may still have productions A a 04 6 P4 with lal 2 3 We can perform some recoding using some new nonterminals For every production of the form A a Bl Bk7 where k 2 37 create the new nonterminals Br Bk71l7 Br Bk72l7 7 19132le7 191le7 and the new productions A a B1 B1 1Bk7 B1 Bk71lquotB1 Bk72lBk717 7 BlBngl a B1B2B37 B1B2 a B1B2 All the productions are now in Chomsky Normal Form7 and it is clear that the same language is generated I 46 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Applying the rst phase of the method of lemma 333 to the grammar G we get the rules E EXJFT7 E TXiF7 E 6 XEX7 E 6 17 T 6 TXiF7 T 6 XEX7 T 6 17 F 6 XEX7 F 6 a7 X 6 7 CE 6 7 X a X 7 After applying the second phase of the method7 we get the following grammar in Chomsky Normal Form E 6 EXJrlT7 EXl EX7 E 6 mm TX 6 TX E 6 XEX lXltEl XltE7 E 6 17 T 6 mm T 6 lXltElXgt7 T 6 17 F 6 lXltElXgt7 Fae 7 X 67 CE 43 Xlt gt X 7 For large grammars7 it is often convenient to use the abbreviation which consists in group ing productions haVing a common left hand side7 and listing the right hand sides separated 34 REGULAR LANGUAGES ARE CONTEXT FREE 47 by the symbol Thus a group of productions A7gta1 A7gta2 A7gtak may be abbreviated as A7a1lo 2l lak An interesting corollary of the CNF is the following decidability result There is an algorithm which given a context free grammar G given any string w E 2 decides whether w E LG Indeed we rst convert G to a grammar G in Chomsky Normal Form If w 6 we can test whether 6 E LG since this is the case iff S 7 E E P If w 31 6 letting n lwl note that since the rules are of the form A 7 BC or A 7 a where a E 2 any derivation for w has n 7 1 n 2n 7 1 steps Thus we enumerate all leftmost derivations of length 271 7 1 There are much better parsing algorithms than this naive algorithm We now show that every regular language is context free 34 Regular Languages are ContextFree The regular languages can be characterized in terms of very special kinds of context free grammars right linear and left linear context free grammars De nition 341 A context free grammar G V E P S is left linear iff its productions are of the form A7Ba A7gta A76 where AB E N and a E E A context free grammar G VEP S is right linear iff its productions are of the form A7M1B A7gta A76 where AB E N anda E E The following lemma shows the equivalence between NFA7s and right linear grammars 48 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Lemma 342 A language L is regular if and only if it is generated by some right linear grammar Proof Let L LD for some DEA D Q72767q07F We construct a right linear grammar G as follows Let V Q U 27 S qo7 and let P be de ned as follows Pp aqlq5p7a7p7q Q7 06Up 6lp F It is easily shown by induction on the length of w that 19 wq iff q 5119710 and thus7 LD LG Conversely7 let G V7 27 P7 S be a right linear grammar First7 let G V7 27 P7 S be the right linear grammar obtained from G by adding the new nonterminal E to N7 replacing every rule in P of the form A a a where a E E by the rule A a aE7 and adding the rule E a 6 It is immediately veri ed that LG LG Next7 we construct the NFA M Q72767q07F as follows Q N NU E7 qo 57 F A E N l A a 67 and 6A7a B E N l A aB E P 7 for all A E N and all a E E It is easily shown by induction on the length of w that A wB iff B e 6Aw and thus7 LM LG LG I A similar lemma holds for left linear grammars It is also easily shown that the regular languages are exactly the languages generated by context free grammars whose rules are of the form lgtBu7 Agtu7 where AB 6 N7 andu E 2 35 Useless Productions in Context ee Grammars Given a context free grammar G l7E7P7S7 it may contain rules that are useless for a number of reasons For example7 consider the grammar G3 E7A7a7b7a7b7P7E7 where P is the set of rules E a aEb7 E 6 ab7 E 6 A7 A 6 bAa 35 USELESS PRODUCTIONS IN CONTEXT FREE GRAMMARS 49 The problem is that the nonterminal A does not derive any terminal strings and thus it is useless as well as the last two productions Let us now consider the grammar G4 EAabcdabcd P E where P is the set of rules E a aEb E 6 ab A 6 cAd A 6 Cd This time the nonterminal A generates strings of the form and but there is no derivation E gt 04 from E where A occurs in a The nonterminal A is not connected to E and the last two rules are useless Fortunately it is possible to nd such useless rules and to eliminate them Let TG be the set of nonterminals that actually derive some terminal string ie TGAEV72 l 3w E Agtw The set TG can be de ned by stages We de ne the sets Tn n 2 1 as follows T1A V72 l 3A w 6P withw E and Tn TnU A E V 7 E l 3A 63 E P with B E TUE It is easy to prove that there is some least 71 such that Tn T and that for this n TG T If S TG then LG Q and G is equivalent to the trivial grammar G SE S If S E TG then let UG be the set of nonterminals that are actually useful ie UG A E TG l 3amp6 E TG U EV S gt ozA The set UG can also be computed by stages We de ne the sets Un n 2 1 as follows U1 A e TG 35 a aA e P with a e TG o W and U1 U o B e TG 3A a aB e P with A e U a e TG o Er It is easy to prove that there is some least 71 such that Un1 U and that for this n UG Un U Then we can use UG to transform G into an equivalent CFG in 50 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES which every nonterminal is useful ie for which V 7 E UG lndeed simply delete all rules containing symbols not in UG The details are left as an exercise We say that a context free grammar G is reduced if all its nonterminals are useful ie N UG It should be noted than although dull the above considerations are important in practice Certain algorithms for constructing parsers for example LR parsers may loop if useless rules are not eliminated We now consider another normal form for context free grammars the Greibach Normal Form 36 The Greibach Normal Form Every CFC G can also be converted to an equivalent grammar in Greibach Normal Form for short GNF A context free grammar G V E P S is in Greibach Normal Form iff its productions are of the form A a 1B0 A a 1B A a a or S a e where ABG E N a E E S a E is in P iff E E LG and S does not occur on the right hand side of any production Note that a grammar in Greibach Normal Form does not have e rules other than possibly S a 6 More importantly except for the special rule S a 6 every rule produces some terminal symbol An important consequence of the Greibach Normal Form is that every nonterminal is not left recursive A nonterminal A is left recursive iff A gt A04 for some 04 E V Left recursive nonterminals cause top down determinitic parsers to loop The Greibach Normal Form provides a way of avoiding this problem There are no easy proofs that every CFC can be converted to a Greibach Normal Form A particularly elegant method due to Rosenkrantz using least xed points and matrices will be given in section 39 Lemma 361 Given a contecct free grammar G V E P S one can construct a COTZtCCEt free grammar G V EP S such that LG LG and G is in Greibach Normal Form that is a grammar whose productions are of the form A a aBG A a aB A a a or S 3 7 LEAST FIXED POINTS 51 where ABC E N a E E S a E is in P z e E LG and S does not occur on the right hand side of any production in P 37 Least FixedPoints Context free languages can also be characterized as least xed points of certain functions induced by grammars This characterization yields a rather quick proof that every context free grammar can be converted to Greibach Normal Form This characterization also reveals very clearly the recursive nature of the context free languages We begin by reviewing what we need from the theory of partially ordered sets De nition 371 Given a partially ordered set ltA gt an w ehoin orange is a sequence such that an S on for all n 2 0 The least upper bound of an w chain an is an element a E A such that 1 on lt a for all n 2 0 2 For any b E A if on lt b for all n 2 0 then a S b A partially ordered set ltA gt is an w ehain complete poset iff it has a least element L and iff every w chain has a least upper bound denoted as Llan Remark The in in w chain means that we are considering countable chains w is the ordinal associated with the order type of the set of natural numbers This notation may seem arcane but is standard in denotational semantics For example given any set X the power set 2X ordered by inclusion is an w chain complete poset with least element 0 The Cartesian product 2X gtlt gtlt 2X ordered such n that A1A B1B iff A Q B where A B 6 2X is an w chain complete poset with least element Q 0 We are interested in functions between partially ordered sets De nition 372 Given any two partially ordered sets ltA1 1gt and ltA2 2gt a function fA1 a A2 is monotonic iff for all my 6 A1 i 31 1 implies that JCW 2 lf ltA1 31gt and ltA2 32gt are w chain complete posets a function f A1 a A2 is w eontz39nuous iff it is monotonic and for every w chain an fan fltan39 52 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Remark Note that we are not requiring that an w continuous function f A1 a A2 preserve least elements7 ie7 it is possible that fL1 Lg We now de ne the crucial concept of a least xed point De nition 373 Let ltA7 gt be a partially ordered set7 and let f A a A be a function A red point of f is an element a E A such that fa a The least red point of f is an element a E A such that fa a7 and for every b E A such that fb b7 then a S b The following lemma gives suf cient conditions for the existence of least xed points It is one of the key lemmas in denotational semantics Lemma 374 Let ltA73gt be an w chain complete poset with least element L Every u continuous function f A a A has a unique least red point x0 given by mU Furthermore for any h E A such that fb S b then x0 3 b Proof First7 we prove that the sequence LJQLHMVMWQW is an w chain This is shown by induction on n Since L is the least element of A7 we have l3 fL Assuming by induction that f L S fni lL7 since f is w continuous7 it is monotonic7 and thus we get f lL S f 2L7 as desired Since A is an w chain complete poset7 the w chain f L has a least upper bound U Since f is w continuous7 we have mo f f i H mm U f 1i ace and x0 is indeed a xed point of f Clearly7 if fb S b implies that x0 3 b7 then fb b implies that x0 3 b Thus7 assume that fb S b for some b E A We prove by induction of n that f L S b lndeed7 LS b7 since L is the least element of A Assuming by induction that f L 3 b7 by monotonicity of f7 we get ff iJ S fb and since fb 3 b7 this yields f 1L b 38 CONTEXT FREE LANGUAGES AS LEAST FIXED POIN TS 53 Since f l S b for all n 2 0 we have x0 Llf l b CI The second part of lemma 374 is very useful to prove that functions have the same least xed point For example under the conditions of lemma 374 if g A a A is another w chain continuous function letting x0 be the least xed point of f and yo be the least xed point of 9 if fy0 3 yo and gz0 3 0 we can deduce that 0 yo Indeed since fy0 3 yo and x0 is the least xed point of f we get 0 3 yo and since gx0 3 0 and yo is the least xed point of 9 we get yo 3 0 and therefore x0 yo Lemma 374 also shows that the least xed point 0 of f can be approximated as much as desired using the sequence We will now apply this fact to context free grammars For this we need to show how a context free grammar G V E P S with m nonterminals induces an w continuous map ltIgtG2E gtlt X22 a2 gtlt X22 x m m 38 ContextFree Languages as Least FixedPoints Given a context free grammar G V E P S with m nonterminals A1 Am grouping all the productions having the same left hand side the grammar G can be concisely written as A1 m 0411 041mm a Ai w 04131 39 39 39 04m a Am 047ml l 04mm Given any set A let meA be the set of nite subsets of A De nition 381 Let G V E P S be a context free grammar with m nonterminals A1 Am For any m tuple A L1 Lm of languages L Q 2 we de ne the function ltIgtA 73fv a 22 inductively as follows WW0 07 Aa a if a E E Ln if Ai 6 N7 q lAlHaXD q lAl04q lAlX7 if 04 E V7 X 6 V7 q lAKQ U LYN ltI lWQ U q lAla7 if Q E meV7 Q 7 07 04 6 Vi 04 6 Q 54 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Then7 writing the grammar G as A1 m 0411 041mm Ai m 04131 quot3904ini7 Am 047ml l Olm nm we de ne the map ltIgtG2E gtlt gtlt2E a2 gtlt X2 W m m such that 101 17 39 39 39 lAlQEO Lh 39 39 39 7aln17 39 39 39 O m17 39 39 39 7am m for allAL177Lm 622 gtlt x2 m One should verify that the map A is well de ned7 but this is easy The following lemma is easily shown Lemma 382 Given a contact free grammar G V7 27 P7 S with m nontermmals A1 Am the map Gi22gtltgtlt22gt22gtltgtlt22 m m is w eontmuous Now7 22 gtlt gtlt 22 is an w chain complete poset7 and the map ltIgtG is w continous Thus7 z m by lemma 3747 the map ltIgtG has a least xed point It turns out that the components of this least xed point are precisely the languages generated by the grammars V7 27137 Ai Before proving this fact7 let us give an example illustrating it Eat ample Consider the grammar G A7 B7 17 b7 a7 b7 P7 A de ned by the rules A a BB ab7 B a aBb ab The least xed point of ltIgtG is the least upper bound of the chain w ZA0707 Z30707 where 8Alt 7 21307 07 38 CON TEXT FREE LANGUAGES AS LEAST FIXED POIN TS 55 and ELLE 07 U ab7 07 21307 0 0 ml U ab ltV 7 altIgtZB 07 It is easy to verify that raw 0 abh ram 0 abh wow 0 7 an ababh raga 0 abi aabbh gA 7 VJ ab7 abab7 abaabb7 aabbab7 aabbaabb7 83 Q7 VJ ab7 aabb7 aaabbb By induction7 we can easily prove that the two components of the least xed point are the languages LA ambmanbn l m7n 21Uab and LB a b l n 21 Letting GA A7B7a7b7a7b7P7A and GB A7B7a7b7a7b7137B7 it is indeed true that LA LGA and LB LGB We have the following theorem due to Ginsburg and Rice Theorem 383 Given a eorztecct free grammar G V7 271375 with m rumtermmals A1 Am the least red point 0f the map ltIgtG is the m tuple of languages LGA17 39 39 39 7 where G142 V7 27 P7Ai Proof Writing G as Al a11 0 1n17 Ai ai1 aini7 Am arml i l am ln7 let M max ozml be the maximum length of right hand sides of rules in P Let ltIgtZ 770 54077 77ltIgtZm 770 56 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Then for any w E 2 observe that w E 39 iff there is some rule A a am with w 04M and that w E 39 for some n 2 2 iff there is some rule A a am with aim of the form CW U1A11u2 39ukAjiuk17 where u1 uk1 E 2 k 2 1 and some w1 wk 6 2 such that w e lt1gtig ltwwgt and w u1w1u2 ukwkuk We prove the following two claims Claim 1 For every w E 2 if A ngt it then it E g 0 for some p 21 Claim 2 For every w E 2 if it E lt1gtg with n 2 1 then A pgt w for some p S M 1 1 Proof of Claim 1 We proceed by induction on ii If A 1gt it then it aim for some rule A oz and by the remark just before the claim it E 5 0 If A E iv with n 2 1 then Al39 ngt Olij gt w for some rule A a 04 1f Olij ulAJ1u2 39 39 ukA7kuk11 where uluk1 E 2 k 2 1 then A 7 wh where m S n and w iniuz 39 quotukwkuk1 for some w1 wk 6 2 By the induction hypothesis w E 13th 01 for some ph 2 1 for every h 1 S h S k Letting p maxp1 pk since each sequence qa is an w chain we have iv 6 Z jh 0 for every h 1 S h S k and by the remark just before the claim it E g lw 0 D Proof of Claim 2 We proceed by induction on ii If it E 5 0 by the remark just before the claim then it aim for some rule A oz and A 1gt w 39 LEAST FIXED POINTS AND THE GREIBACH NORMAL FORM 57 If w E IDEJQ7 7 0 for some n 2 27 then there is some rule Al a am with am of the form Otij ulAJlu2 39 39 ukAjkuk17 where ul7uk1 E 2 k 2 17 and some wlwk E 2 such that 71 wk 6 12 th quot 7 7 and w u1w1u2 ukwkuk By the induction hypothesis7 Ajh gt wh with ph S M 1 27 and thus Al gt ulAJlu2 ukAjkuk1 gt gt w7 so that Al pgt w with pp1pk1MM1H1g M1 1 since k S M I Combining Claim 1 and Claim 27 we have 39 39 39 7 7 which proves that the least xed point of the map ltIgtG is the m tuple of languages LGA17 39 39 39 7 El We now show how theorem 383 can be used to give a short proof that every context free grammar can be converted to Greibach Normal Form 39 Least FixedPoints and the Greibach Normal Form The hard part in converting a grammar G V7 2 P7 S to Greibach Normal Form is to convert it to a grammar in so called weak Grez39bach Normal Form7 where the productions are of the form A a 104 or S a 67 where a E E 04 E V and if S a E is a rule7 then S does not occur on the right hand side of any rule lndeed7 if we rst convert G to Chomsky Normal Form7 it turns out that we will get rules of the form A a 1B07 A a 1B or A a a 58 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Using the algorithm for eliminating e rules and chain rules we can rst convert the original grammar to a grammar with no chain rules and no e rules except possibly S a 6 in which case S does not appear on the right hand side of rules Thus for the purpose of converting to weak Greibach Normal Form we can assume that we are dealing with grammars without chain rules and without e rules Let us also assume that we computed the set TG of nonterminals that actually derive some terminal string and that useless productions involving symbols not in TG have been deleted Let us explain the idea of the conversion using the following grammar AaAaBBBb BaBdBAaaAo The rst step is to group the right hand sides 04 into two categories those whose leftmost symbol is a terminal 04 6 EV and those whose leftmost symbol is a nonterminal 04 E NV It is also convenient to adopt a matrix notation and we can write the above grammar as ABF4A C wg wtm p Thus we are dealing with matrices and row vectors whose entries are nite subsets of V For notational simplicity braces around singleton sets are omitted The nite subsets of V form a semiring where addition is union and multiplication is concatenation Addition and multiplication of matrices are as usual except that the semiring operations are used We will also consider matrices whose entries are languages over 2 Again the languages over 2 form a semiring where addition is union and multiplication is concatenation The identity element for addition is Q and the identity element for multiplication is As above addition and multiplication of matrices are as usual except that the semiring operations are used For example given any languages AM and BM over 2 where 27 E 12 we have A1 A12 311 312 AmBm U A1232 A11B12 U ALZBZJ A2 A22 321 322 A21B11 U A2232 A21B12 U A2232 Letting X A B K b aAc and Hf w w the above grammar can be concisely written as XXHK 39 LEAST FIXED POINTS AND THE GREIBACH NORMAL FORM 59 More generally7 given any context free grammar G Vth7 S with m nonterminals A17 7 Am7 assuming that there are no chain rules7 no e rules7 and that every nonterminal belongs to TG7 letting X A177Am7 we can write G as X XHK7 for some appropriate m gtlt m matrix H in which every entry contains a set possibly empty of strings in VJ 7 and some row vector K in which every entry contains a set possibly empty of strings 04 each beginning with a terminal 04 6 EV Given an m gtlt m square matrix A AM of languages over 27 we can de ne the matrix A whose entry AZ is given by AL U A297 n20 where A0 Idm7 the identity matrix7 and A is the n th power of A Similarly7 we de ne A4r where i n Am 7 U AM n21 Given a matrix A where the entries are nite subset of V where N AL7 7Am7 for any m tuple A L1L7 7 Lm of languages over 27 we let ltIgtlAlA ltIgtlAlAij Given a system X XH K where H is an m gtlt m matrix and X7 K are row matrices7 if H and K do not contain any nonterminals7 we claim that the least xed point of the grammar G associated with X XH K is KH This is easily seen by computing the approximations X 13507 7 0 lndeed7 X0 K7 and X KH KH 1KHKKH H 1HIm Similarly7 if Y is an m gtlt m matrix of nonterminals7 the least xed point of the grammar associated with Y HY H is H4r provided that H does not contain any nonterminals Given any context free grammar G V7E7P7S with m nonterminals A17 7 Am7 writing G as X XH K as explained earlier7 we can form another grammar GH by creating m2 new nonterminals YM7 where the rules of this new grammar are de ned by the system of two matrix equations X KY K7 Y HY H7 where Y The following lemma is the key to the Greibach Normal Form 60 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Lemma 391 Given any contecot free grammar G V7 27 P75 with m nonterminals A1 Am writing G as X XH K as explained earlier if GH is the grammar de ned by the system of two matria equations XKYK7 YHYH7 as eaplained above then the components in X of the least red points of the maps ltIgtG and ltIgtGH are equal Proof Let U be the least xed point of PG7 and let V7 W be the least xed point of GH We shall prove that U V For notational sirnplicity7 let us denote ltIgtUH as HU and UK as Since U is the least xed point of X XH K7 we have U UHU Since HU and KU do not contain any nonterrninals7 by a previous rernark7 KUHU is the least xed point of X XHlU KU7 and thus7 KUHU U On the other handl7 by rnonotonicity7 KUHUHKUHUl KKUHU KUHUHU KW KUHU7 and since U is the least xed point of X XH K7 U KUHU Therefore7 U KUHU We can prove in a similar manner that W HV Let Z HlUW We have KUZ KU KUHUJr KU KUHU U7 and HUZ HU HUHU HU HU Z7 and since V7 W is the least xed point of X KY K and Y HY H7 we get V S U and W S HU We also have V KVW KW KVHV KW KmHnii 39 LEAST FIXED POINTS AND THE GREIBACH NORMAL FORM 61 and VHV KW KVHVHV KW KVHV V and since U is the least xed point of X XH K we get U S V Therefore U V as claimed El Note that the above lemma actually applies to any grammar Applying lemma 391 to our example grammar we get the following new grammar A B b aAc 3342 b aAc Y1 Y2 7 1B Q Y1 Y2 1B Q Y3 Y4 B Ma Y3 Y4 B diAa There are still some nonterminals appearing as leftmost symbols but using the equations de ning A and B we can replace A with thaAmYM and B with bY27 CY47 aA7 C7 obtaining a system in weak Greibach Normal Form This amounts to converting the matrix Hf w w to the matrix L 7 1B Q T bl2aAK1cK1aAC dbY1a aAY3a 0Y3a ba The weak Greibach Normal Form corresponds to the new system XKYK YLYL This method works in general for any input grammar with no e rules no chain rules and such that every nonterminal belongs to TG Under these conditions the row vector K contains some nonempty entry all strings in K are in EV and all strings in H are in VJ After obtaining the grammar GH de ned by the system XKYK YHYH 62 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES we use the system X KY K to express every nonterminal A in terms of expressions containing strings 04 involving a terminal as the leftmost symbol 04 6 EV and we replace all leftmost occurrences of nonterminals in H occurrences A in strings of the form AM where B E V using the above expressions In this fashion we obtain a matrix L and it is immediately shown that the system X KY K Y LY L generates the same tuple of languages Furthermore this last system corresponds to a weak Greibach Normal Form lt we start with a grammar in Chomsky Normal Form with no production S a 6 such that every nonterminal belongs to TG we actually get a Greibach Normal Form the entries in K are terminals and the entries in H are nonterminals Thus we have justi ed lemma 361 The method is also quite economical since it introduces only 7712 new nonterminals However the resulting grammar may contain some useless nonterminals 310 Tree Domains and Gorn Trees Derivation trees play a very important role in parsing theory and in the proof of a strong version of the pumping lemma for the context free languages known as Ogden7s lemma Thus it is important to de ne derivation trees rigorously We do so using Gorn trees Let N 123 De nition 3101 A tree domain D is a nonempty subset of strings in N satisfying the conditions 1 For all 111 6 Ni if m E D then u E D 2 For allu Nforevery239 N ifuz EDthenquDforeveryj1 j 2 The tree domain D 6 12 11 21 22221222 2211 is represented as follows 221 222 l 2211 310 TREE DOMAINS AND GORN TREES 63 A tree labeled with symbols from a set A is de ned as follows De nition 3102 Given a set A of labels7 a A tree for short7 a tree is a total function t D a A7 where D is a tree domain The domain of a tree t is denoted as domt Every string u E domt is called a tree address or a node Let A f7g7h7a7b The tree t D a A7 where D is the tree domain of the previous example and t is the function whose graph is 67 f7 17 h7 27g7 117a7 217a7 227 f7 2217 h7 2227b7 22117a is represented as follows f h 9 a a f h b l a The outdegree sometimes called rami cation ru of a node n is the cardinality of the set i l iii 6 domt Note that the outdegree of a node can be in nite Most of the trees that we shall consider will be nite branching7 that is7 for every node u7 ru will be an integer7 and hence nite If the outdegree of all nodes in a tree is bounded by n7 then we can view the domain of the tree as being de ned over 17 27 7n A node of outdegree 0 is called a leaf The node whose address is E is called the root of the tree A tree is nite if its domain domt is nite Given a node n in domt7 every node of the form iii in domt with i E N is called a son or immediate successor of u Tree addresses are totally ordered lemicographically u S i if either u is a pre x of i or7 there exist strings Lyn E N and i7j E N7 with i lt j7 such that u My and i djz In the rst case7 we say that u is an ancestor or predecessor of i or u dominates o and in the second case7 that u is to the left of o If y E and z 67 we say that mi is a left brother or left sibling of mg i lt j Two tree addresses u and i are independent if u is not a pre x of i and i is not a pre x of u 64 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES Given a nite tree t7 the yield of t is the string 1t110M112 WW7 where U17u27 7uk is the sequence of leaves of t in lexicographic order For exarnple7 the yield of the tree below is aaab f h 9 a a f h b l 1 Given a nite tree t7 the depth of t is the integer dt rnax ul l u E d0mt Given a tree It and a node u in d0mt7 the subtree rooted at u is the tree tu7 whose domain is the set 1 l m E d0mt and such that tuv tuv for all 1 in d0mtu Another important operation is the operation of tree replacement or tree substitution De nition 3103 Given two trees t1 and t2 and a tree address u in 1th the result of sub stituting t2 at u in t17 denoted by tllu t27 is the function whose graph is the set of pairs v7t1v l 1 E d0mt17 u is not a pre x of 1 U u 7t2i l 1 E d0mt2 Let t1 and t2 be the trees de ned by the following diagrarns Tree t1 f h 9 a a f h b l 311 DERIVATIONS TREES 65 We can now de ne derivation trees and relate derivations to derivation trees 311 Derivations Trees De nition 3111 Given a context free grammar G VEPS for any A E N an A dem39vatz39zm tree for G is a V U 6 tree t a tree with set of labels V U such that 1 We A 2 For every nonleaf node u E d0mt if ul uk are the successors of u then either there is a production B a X1Xk in P such that tu B and tuz39 X for all 239 1 S 239 S k or B a E E P tu B and tu1 6 A complete derivation or parse tree is an S tree whose yield belongs to 2 A derivation tree for the grammar G3 T7 F777 77a77 7 77a7P7 E7 where P is the set of rules E E l T E T TgtTF T F F w F a 7 7 66 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES is shown in Figure 31 The yield of the derivation tree is a a gtk a Figure 31 A complete derivation tree Derivations trees are associated to derivations inductively as follows De nition 3112 Given a context free grammar G Vth787 for any A 6 N7 if 7139 A ngt 04 is a derivation in G7 we construct an A derivatizm tree t7r with yield 04 as follows 1 If n 07 then t7r is the one node tree such that d0mt7r 6 and tw6 A 2 If A 2 ABp gt A yp 047 then if t1 is the A derivation tree with yield ABp associated with the derivation A g pr7 and if t2 is the tree associated with the production B a 7 that is7 if v X1Xi7 then d0mt2 6717 7k t2e B7 and t2i Xi for all i7 1 S i 3 k7 or if y 67 then d0mt2 E717 t2e B7 and t21 67 then It t1u lt tgL where u is the address of the leaf labeled B in t1 The tree t7r is the A derivatizm tree associated with the derivation A ngt a Given the grammar G2 E77777a77777a7P7E7 where P is the set of rules E S E E7 E a E gtk E7 E 6 Each 312 OGDEN S LEMMA 67 the parse trees associated with two derivations of the string a a gtk a are shown in Figure 32 Figure 32 Two derivation trees for a a gtk a The following lemma is easily shown Lemma 3113 Let G V7 E71375 be a contecct free grammar For any derivation A ngt 04 there is a unique A derivation tree associated with this derivation with yield 04 Con versely for any A derivation tree t with yield a there is a unique leftmost derivation A f 04 in G having t as its associated derivation tree We will now prove a strong version of the pumping lemma for context free languages due to Bill Ogden 1968 312 Ogden s Lemma Ogden7s lemma states some combinatorial properties of parse trees that are deep enough The yield w of such a parse tree can be split into 5 substrings u7v7 my 2 such that w uvxyz7 where u7v7 my 2 satisfy certain conditions It turns out that we get a more powerful version of the lemma if we allow ourselves to mark certain occurrences of symbols in w before invoking the lemma We can imagine that marked occurrences in a nonempty string w are occurrences of symbols in w in boldface7 or red7 or any given color but one color only For example7 given w aaababbbaa7 we can mark the symbols of even index as follows aaababbbaa More rigorously7 we can de ne a marking of a nonnull string w 17 7n a E as any function in 17 7n a 071 Then7 a letter wl in w is a marked occurrence iff 17 68 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES and an unmarked occurrence if 0 The number of marked occurrences in w is equal to n Ogden7s lemma only yields useful information for grammars G generating an in nite language We could make this hypothesis but it seems more elegant to use the precondition that the lemma only applies to strings w E LD such that it contains at least K marked occurrences for a constant K large enough If K is large enough LG will indeed be in nite Lemma 3121 For every contecct free grammar G there is some integer K gt 1 such that for every string w 6 2 for every marking of w ifw E LG and it contains at least K marked occurrences then there epists some decomposition ofw as w unpyz and some A E N such that the following properties hold I There are derivations S gt qu A gt uAy and A gt a so that uunxynz E LG for all n 2 0 the pumping property 2 p contains some marked occurrence 3 Either both u and 1 contain some marked occurrence or both y and 2 contain some marked occurrence 4 uzy contains less than K marked occurrences Proof Let t be any parse tree for it We call a leaf oft a marked leaf if its label is a marked occurrence in the marked string w The general idea is to make sure that K is large enough so that parse trees with yield it contain enough repeated nonterminals along some path from the root to some marked leaf Let r lNl and let p max2 max ozl l A oz 6 We claim that K p27 does the job The key concept in the proof is the notion of a B node Given a parse tree t a B node is a node with at least two immediate successors u1u2 such that for i 1 2 either u is a marked leaf or u has some marked leaf as a descendant We construct a path from the root to some marked leaf so that for every B node we pick the leftmost successor with the maximum number of marked leaves as descendants Formally de ne a path so 5 from the root to some marked leaf so that i Every node 5 has some marked leaf as a descendant and so is the root of t 312 OGDEN S LEMMA 69 ii If 57 is in the path7 57 is not a leaf7 and 57 has a single immediate descendant which is either a marked leaf or has marked leaves as its descendants7 let 57171 be that unique immediate descendant of 57 iii If 57 is a B node in the path7 then let 57171 be the leftmost immediate successors of s with the maximum number of marked leaves as descendants assuming that if 57171 is a marked leaf7 then it is its own descendant iv lf 5 is a leaf7 then it is a marked leaf and n j We will show that the path 507 7 577 contains at least 27quot 3 B nodes Claim For every 27 0 S 239 S 717 if the path 577 7577 contains 1 B nodes7 then 57 has at most pl7 marked leaves as descendants Proof We proceed by backward induction 7 ie7 by induction on n 7239 For 239 717 there are no B nodes7 so that b 07 and there is indeed p0 1 marked leaf 5 Assume that the claim holds for the path 571717 7 577 If 57 is not a B node7 then the number 1 of B nodes in the path 571717 7 577 is the same as the number of B nodes in the path 577 7 sn7 and 57171 is the only immediate successor of 57 having a marked leaf as descendant By the induction hypothesis7 57171 has at most pl7 marked leaves as descendants7 and this is also an upper bound on the number of marked leaves which are descendants of 57 If 57 is a B node7 then if there are 1 B nodes in the path 571717 7sn7 there are 1 1 B nodes in the path 577 7 577 By the induction hypothesis7 57171 has at most pl7 marked leaves as descendants Since 57 is a B node7 57171 was chosen to be the leftmost immediate successor of 57 having the maximum number of marked leaves as descendants Thus7 since the outdegree of 57 is at most p7 and each of its immediate successors has at most pl7 marked leaves as descendants7 the node 57 has at most ppd pd marked leaves as descendants7 as desired D Applying the claim to 507 since w has at least K 10273 marked occurrences7 we have pl7 2 p2737 and since p 2 27 we have b 2 27quot 37 and the path 507 7 577 contains at least 27quot 3 B nodes Note that this would not follow if we had p 1 Let us now select the lowest 27quot 3 B nodes in the path7 507 7 5n7 and denote them 117 71927173 Every B node b7 has at least two immediate successors ul lt 117 such that ul or Di is on the path 507 7577 If the path goes through u77 we say that b7 is a right B node and if the path goes through 1177 we say that b7 is a left B node Since 27quot 3 r 2 r 17 either there are r 2 left B nodes or there are r2 right B nodes in the path 117 71927173 Let us assume that there are r 2 left B nodes7 the other case being similar Let d17 7dr be the lowest 7 2 left B nodes in the path Since there are r 1 B nodes in the sequence d27 7d727 and there are only 7 distinct nonterminals7 there are 70 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES two nodes oh and dj7 with 2 S i ltj S r 27 such that tdZ tdj A7 for some A E N We can assume that oh is an ancestor of db and thus7 dj dioz7 for some 04 31 6 If we prune out the subtree tdl rooted at dl from t7 we get an S derivation tree having a yield of the form qu7 and we have a derivation of the form S gt qu7 since there are at least r 2 left B nodes on the path7 and we are looking at the lowest r 1 left B nodes Considering the subtree tdi7 pruning out the subtree tdj rooted at 04 in tdi7 we get an A derivation tree having a yield of the form vAy7 and we have a derivation of the form A gt vAy Finally7 the subtree tdj is an A derivation tree with yield p7 and we have a derivation A gt x This proves 1 of the lemma Since 5 is a marked leaf and a descendant of db z contains some marked occurrence7 proving Since oil is a left B node7 some left sibbling of the immediate successor of oil on the path has some distinguished leaf in u as a descendant Similarly7 since dl is a left B node7 some left sibbling of the immediate successor of dl on the path has some distinguished leaf in v as a descendant This proves dj 1927 has at most 2r1 B nodes7 and by the claim shown earlier7 dj has at most 102TH marked leaves as descendants Since 102TH lt 102H3 K7 this proves El Observe that condition 2 implies that z 31 67 and condition 3 implies that either u 31 6 and v 31 67 or y 31 6 and z 31 6 Thus7 the pumping condition 1 implies that the set uvnpynz l n 2 0 is an in nite subset of LG7 and LG is indeed infinite7 as we mentioned earlier Note that K 2 37 and in fact7 K 2 32 The standard pumping lemma77 due to Bar Hillel7 li erles7 and Shamir7 is obtained by letting all occurrences be marked in w E LG Lemma 3122 For every contecct free grammar G without e rules there is some integer K gt 1 such that for every string w 6 2 ifw E LG and lwl 2 K then there epists some decomposition ofw as w uvpyz and some A E N such that the following properties hold I There are derivations S gt qu A gt vAy and A gt a so that uvnxynz E LG for all n 2 0 the pumping property 2 z 31 6 3 Eitherv 31 6 ory 31 6 4 Will S K A stronger version could be stated7 and we are just following tradition in stating this standard version of the pumping lemma 312 OGDEN S LEMMA 71 Ogden7s lemma or the pumping lemma can be used to show that certain languages are not context free The method is to proceed by contradiction7 ie7 to assume contrary to what we wish to prove that a language L is indeed context free7 and derive a contradiction of Ogden7s lemma or of the pumping lemma Thus7 as in the case of the regular languages7 it would be helpful to see what the negation of Ogden7s lemma is7 and for this7 we rst state Ogden7s lemma as a logical formula For any nonnull string w 17 771 a 27 for any marking m 17 771 a 07 1 of w7 for any substring y of w7 where w 1127 with h and k lyl7 the number of marked occurrences in y7 denoted as is de ned as ihk lmyl Z 7W ih1 We will also use the following abbreviations not 071727 7 nat32 327337 7 A E w may B E 2 17 O Elmul21lmvl21Vlmyl21lm2l 21 D E lt K7 P E Vn not uvnxynz E LD Ogden7s lemma can then be stated as VG CFG 3K nat32 Vw 2 Vm marking 10 E LD gt K D 3u7v7x7y72 2 A B C D Recalling that ltABODPgtE ltAABAOADgtV PEABAOAD D P and PDQ EPA Q7 the negation of Ogden7s lemma can be stated as 3G CFG VK nat32 3w 2 3m marking w e LD A lmwl 2 K A Vu7v7x7y7z2 A A B A o A D 3 am Since P E 371 not uvnxynz LD7 72 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES in order to show that Ogden7s lemma is contradicted one needs to show that for some context free grammar G for every K 2 2 there is some string w E LD and some marking m of w with at least K marked occurrences in w such that for every possible decomposition w uvxyz satisfying the constraints A B C D there is some n 2 0 such that uvnzynz LD When proceeding by contradiction we have a language L that we are wrongly assuming to be context free and we can use any CFG grammar G generating L The creative part of the argument is to pick the right w E L and the right marking of w not making any assumption on As an illustration we show that the language L a b c l n 21 is not context free Since L is in nite we will be able to use the pumping lemma The proof proceeds by contradiction lf L was context free there would be some context free grammar G such that L LG and some constant K gt 1 as in Ogden7s lemma Let w aKchK and choose the Us as marked occurrences Then by Ogden7s lemma z contains some marked occurrence and either both um or both yz contain some marked occurrence Assume that both u and 1 contain some b We have the following situation aabbbbbbc539 WWW u 1 myz If we consider the string umwyyz the number of as is still K but the number of Us is strictly greater than K since 1 contains at least one b and thus umwyyz L a contradiction If both y and 2 contain some b we will also reach a contradiction because in the string umwyyz the number of 07s is still K but the number of Us is strictly greater than K Having reached a contradiction in all cases we conclude that L is not context free Let us now show that the language L ambncmdn l 77171 2 1 is not context free Again we proceed by contradiction This time let w aKchKdK where the Us and 07s are marked occurrences By Ogden7s lemma either both u 1 contain some marked occurrence or both y 2 contain some marked occurrence and z contains some marked occurrence Let us rst consider the case where both um contain some marked occurrence lf 1 contains some b since umwyyz E L 1 must contain only b7s since otherwise we would have a bad string in L and we have the following situation 312 OGDEN S LEMMA 73 aNab bbbbbccdd wvhv z u 1 myz Since umwyyz E L the only way to preserve an equal number of Us and d7s is to have y 6 d1 But then ozy contains cK which contradicts 4 of Ogden7s lemma lf 1 contains some c since z also contains some marked occurrence it must be some c and 1 contains only es and we have the following situation aabbccccccdd u 1 myz Since umwyyz E L and the number of as is still K whereas the number of c7s is strictly more than K this case is impossible Let us now consider the case where both y 2 contain some marked occurrence Reasoning as before the only possibility is that i E a4r and y 6 c1 aaaaaabbCCCCCCdd39 vvx FVW u 1 y I Z But then oxy contains bK which contradicts 4 of Ogden7s Lemma Since a contradiction was obtained in all cases L is not context free Ogden7s lemma can also be used to show that the context free language ambncn l 77171 2 1 U ambmcn l 77171 2 1 is inherently ambiguous The proof is quite involved Another corollary of the pumping lemma is that it is decidable whether a context free grammar generates an in nite language Lemma 3123 Given any contecct free grammar G ifK is the constant of Ogden s lemma then the following equivalence holds LG is in nite i there is some it E LG such that K S lwl lt 2K Proof Let K 102713 be the constant from the proof of Lemma 3121 If there is some it E LG such that lwl 2 K we already observed that the pumping lemma implies that LG contains an in nite subset of the form unnaynz l n 2 0 Conversely assume that LG is in nite lf lwl lt K for all it E LG then LG is nite Thus there is some it E LG such that lwl 2 K Let it E LG be a minimal string such that lwl 2 K By the pumping lemma we can write it as w unzyxz where z 31 6 11y 31 6 and liwyl S K By the pumping property uxz E LG lf lwl 2 2K then luzzllu1wyzl7loyl gt luoxyzl 7 liwyl 2 2K 7 K K and lt lurwyzl contradicting the minimality of it Thus we must have lwl lt 2K I In particular if G is in Chomsky Normal Form it can be shown that we just have to consider derivations of length at most 4K 7 3 74 CHAPTER 3 CONTEXT FREE GRAMMARS AND LANGUAGES 112 CHAPTER 2 REGULAR LANGUAGES 217 RightInvariant Equivalence Relations on 2 Let D Q 2640117 be a DEA The DEA D may be redundant for example if there are states that are not accessible from the start state The set QT of accessible or reachable states is the subset of Q de ned as Q p E Q l 3w 6 2 6q07w 29 The set QT can be easily computed by stages If Q 7A QT we can clean up D by deleting the states in Q QT and restricting the transition function 6 to QT This way we get an equivalent DEA Dr such that LD LDr where all the states of Dr are reachable Erom now on we assume that we are dealing with DEA s such that D Dr called reachable or trim 217 RIGHT INVARIANT EQUIVALENCE RELATIONS ON 2 113 Recall that an equivalence relation 2 on a set A is a relation Which is reflexive symmetric and transitive Given any a E A the set b E A l a 2 b is called the equivalence class of a and it is denoted as a7 or even as a Recall that for any two elements a b E A a H b 9 iff a 2 b and a 1 iff a 2 b The set of equivalence classes associated With the equivalence relation 2 is a partition H of A also denoted as A 2 This means that it is a family of nonempty pairwise disjoint sets Whose union is equal to A itself The equivalence classes are also called the blocks of the partition H The number of blocks in the partition H is called the index of 2 and H 114 CHAPTER 2 REGULAR LANGUAGES Given any two equivalence relations 21 and 22 With as sociated partitions H1 and H2 21222 iff every block of the partition H1 is contained in some block of the partition H2 Then every block of the parti tion H2 is the union of blocks of the partition H1 and we say that 21 is a re nement of 22 and similarly H1 is a re nement of H2 Note that H2 has at most as many blocks as H1 does We now de ne an equivalence relation on strings induced by a DEA This equivalence is a kind of observational equivalence in the sense that we decide that two strings u v are equivalent iff When feeding rst it and then 1 to the DEA u and 1 drive the DEA to the same state From the point of view of the observer u and U have the same effect reaching the same state 217 RIGHT INVARIANT EQUIVALENCE RELATIONS ON 2 115 De nition 2171 Given a DFA D QZ6q0F we de ne the relation 2D Myhz39llNemde equivalence on 2 as follows for any two strings u v E 2 2Lst iff 6q0u6q0v We can gure out what the equivalence Classes of 2D are for the following DFA Dl O owl t9 m kov with 0 both start state and unique nal state For ex ample abbabbb 2D aa ababab 2D 6 bba 2D or 116 CHAPTER 2 REGULAR LANGUAGES There are three equivalences classes Elm lalza Gals Observe that LD Hz Also the equivalence classes are in oneitoione correspondence With the states of D The relation 2D turns out to have some interesting prop erties In particular it is rightinvariant Which means that for all u v w E 2 if u 2 v then uw 2 vw Lemma 2172 Given any trirn accessible DFA D Q E 6 qo F the relation 2D is an equivalence relation which is rightinvariant and has nite index Furthermore if Q has n states then the index of 2D is n and every equivalence class of 2D is a regular language Finally LD is the union of some of the equivalence classes of 2D 217 RIGHT INVARIANT EQUIVALENCE RELATIONS ON 2 117 The remarkable fact due to Myhill and Nerode is that lemma 2172 has a converse Lemma 2173 Given any equivalence relation 2 on 2 if 2 is rightinvariant and has nite index n then every equivalence class block in the partition ll as sociated with 2 is a regular language Proof Let 01 On be the blocks of ll and assume that 01 e is the equivalence class of the empty string First we claim that for every block Cl and every w E 2 there is a unique block Cj such that CZ39UJ Q Cj Where CZ39UJ uw l u E Ci We also claim that for every w E 2 for every block Ci 118 CHAPTER 2 REGULAR LANGUAGES For every Class Gk let Dk 1 n 26 1 Where 6Za j iff Ola Q Gj Using induction we have 6z39w j iff Ciw Q ij and using claim 2 it is immediately veri ed that LDk Gk proving that every block 0 is a regular language a 217 RIGHT INVARIANT EQUIVALENCE RELATIONS ON 2 119 We can combine lemma 2172 and lemma 2173 to get the following characterization of a regular language due to Myhill and Nerode Theorem 2174 MghillNerode A language L over an alphabet Z is a regular language i it is the union of some of the equivalence classes of an equivalence relation 2 0n 2 which is rightinvariant and has nite index Given two DFA s D1 and D2 whether or not there is a morphism h D1 gt D2 depends on the relationship between 2191 and 2192 More speci cally we have the following lemma 120 CHAPTER 2 REGULAR LANGUAGES Lemma 2175 Given two DFA S D1 and D2 with D1 trim the following properties hold I There is a DFA morphism h D1 gt D2 i 2191 Q 2192 2 There is a DFA F map h D1 gt D2 i 2191 g 2192 and LD1 g LD2 3 There is a DFA Bmap h D1 gt D2 i 2191 g 2192 and LD2 g LD1 Furthermore h is surjeotive i D2 is trim 217 RIGHT INVARIANT EQUIVALENCE RELATIONS ON 2 121 Theorem 2174 can also be used to prove that certain languages are not regular For example we prove that L anbn l n 2 1 is not regular The general method is to nd three strings 51 y z E 2 such that and szL and yzg L 122 CHAPTER 2 REGULAR LANGUAGES Another useful tool for proving that languages are not regular is the so called pumping lemma Lemma 2176 Given any DFA D Q 26q0F there is some m 2 1 such that for every w E 2 ifw E LD and 2 m then there exists a decomposition ofw as w uxv where 151 67 2 uxiv E LD for alli Z 0 and Moreover m can be chosen to be the number of states of the DFA D Typically the pumping lemma is used to prove that a language is not regular The method is to proceed by contradiction ie to assume contrary to What we Wish to prove that a language L is indeed regular and derive a contradiction of the pumping lemma 217 RIGHT INVARIANT EQUIVALENCE RELATIONS ON 2 123 Thus it would be helpful to see What the negation of the pumping lemma is and for this we rst state the pumping lemma as a logical formula We Will use the following abbreviations not 0 1 2 p0512 AEwu5L v BExy e C E g m P E W not 2Lin 6 LD The pumping lemma can be stated as VDDFA Hmcpos VwZ w E LDlwl 2 m D HuxvZABCP 124 CHAPTER 2 REGULAR LANGUAGES Recalling that ABCP E ABCvaP E AABAC 3 AP and R D S E RA S the negation of the pumping lemma can be stated as 3D DFA Vmpos Elw 2 w E LDlwl Z mVuxv2ABG D P Since P E 313nm 2Lin e LD in order to show that the pumping lemma is contradicted one needs to show that for some DFA D for every m 2 1 there is some string w E LD of length at least m such that for every possible decomposition w mm satisfying the constraints 1 y e and g m there is some 2 Z 0 such that uxlv e LD We now consider an equivalence relation associated With a language L 218 MINIMAL DFA S 125 218 Minimal DFA S Given any language L not necessarily regular we can de ne an equivalence relation p L which is right invariant but not necessarily of nite index However when L is regular the relation p L has nite index In fact this index is the size of a srriallest DFA accepting L This will lead us to a construction of minimal DFA s De nition 2181 Given any language L over 2 we de ne the relation p L on 2 as follows for any two strings u v E 2 2Lva i Vw E 2uw E L i mu 6 L We leave as an easy exercise to prove that p L is an equiv alence relation which is right invariant It is also clear that L is the union of the equivalence classes of strings in L 126 CHAPTER 2 REGULAR LANGUAGES When L is also regular we have the following remarkable result Lemma 2182 Given any regular language L for any accessible DFA D QZ6q0F such that L LD pL is a rightinvariant equlualence rela tion and we have 2D g pL Furthermore if pL has m classes and Q has n states then m g n Lemma 2182 shows that when L is regular the index m of pL is nite and it is a lower bound on the size of all DFA s accepting L 218 MINIMAL DFA S 127 It remains to show that a DFA with m states accepting L exists However going back to the proof of lemma 2173 starting with the right invariant equivalence relation p L of nite index m if L is the union of the classes Oil CZk the DFA DPL 17quot397m7276717ilv39 7ik7 where 6Z a j i Ola Q Cj is such that L LDpL Thus DpL is a minimal DFA accepting L In the next section we give an algorithm which allows us to nd DpL given any DFA D accepting L This algorithms nds which states of D are equivalent 128 CHAPTER 2 REGULAR LANGUAGES 219 State Equivalence and Minimal DFA S The proof of lemma 2182 suggests the following defini tion of an equivalence between states De nition 2191 Given any DEA D Q E 6 go F the relation E on Q called state equivalence is de ned as follows for all p q E Q p E q iff Vw E Z6pw E F iff 6qw E F When p E q we say that p and q are indistinguishable It is trivial to verify that E is an equivalence relation and that it satis es the following property ifp E q then 6p a E 6q a for all a E Z 219 STATE EQUIVALENCE AND MINIMAL DFA S 129 In the DFA of Figure 218 states A and C are equivalent No other two states are equivalent Figure 218 A non minimal DFA for a7babb If L LD the following lemma shows the relationship between p L and E and more generally between the DFA DH and the DFA D 2 obtained as the quotient of the DFA D modulo the equivalence relation E on Q and de ned such that 130 CHAPTER 2 REGULAR LANGUAGES D E Evzv E7Q0EvF E7 Where 5 E lplaa W19 IN The minimal DFA D E is obtained by merging the states in each block of the partition H associated With E forming states corresponding to the blocks of H and drawing a transition on input a from a block CZ to a block Gj of H iff there is a transition q 6p a from any state 10 E CZ to any state q E Gj on input a The start state is the block containing go and the nal states are the blocks consisting of nal states 219 STATE EQUIVALENCE AND MINIMAL DFA S 131 Lemma 2192 For any accessible DFA D Q E 6 go F accepting the regular language L LD the function go 2 gt Q de ned such that 6ltQO7 u induces a szectwn SayPL gt Q E de ned such that MpL 6ltQO7UE Furthermore we have vihag Mm it WWW E 901 Conseguentlg g3 induces an isomorphism of DFA s 93 DH gt D E an invertible F map whose inuerse is also an F map from a homework problem such a map must be an invertible proper homomorphism whose inuerse is also a proper homomorphism The DFA D E is isomorphic to the minimal DFA DpL accepting L and thus it is a minimal DFA accepting L 132 CHAPTER 2 REGULAR LANGUAGES Note that if F Z then E has a single block Q and if F Q then E has a single block In the rst case the minimal DEA is the one state DFA rejecting all strings In the second case the minimal DFA is the one state DEA accepting all strings When F 7A Z and F y Q there are at least two states in Q and E also has at least two blocks as we shall see shortly 21 9 STATE EQUIVALENCE AND MINIMAL DFA S 133 It remains to compute E explicitly This is done using a sequence of approximations In View of the previous discussion we are assuming that F 7 Q and F y Q which means that n 2 2 where n is the number of states in Q De nition 2193 Given any DFA D Q E 6 go F for everyi Z 0 the relation 2 on Q called Zstate equiv alence is de ned as follows for all p q E Q 192qu VUJEZquot gt 619 w E F iff 6qw E F When p E q we say that p and q are iz ndz stz ngm shable 134 CHAPTER 2 REGULAR LANGUAGES It remains to compute E 1 from 2139 which can be done using the following lemma The lemma also shows that 722390 Lemma 2194 For any accessible DFA D Q72767QO7F7 for allpvq E Q7 p EH1 q i p 2139 q and 619 a 2139 6qa for every a E 2 Furthermore ifF 7A Q and F 7A Q there is a smallest integer i0 g n 2 such that Ei01Ei0 Note that if F Q or F ll then E 20 and the in ductive characterization of Lemma 2194 holds trivially Using lemma 2194 we can compute E inductively start ing from EU F Q F and computing E 1 from 2139 until the sequence of partitions associated with the EZ39 stabilizes 21 9 STATE EQUIVALENCE AND MINIMAL DFA S 135 There are a number of algorithms for computing E or to determine whether p E q for some given 19 q E Q A simple method to compute E is described in Hopcroft and Ullman It consists in forming a triangular array corresponding to all unordered pairs q with p y q the rows and the columns of this triangular array are indexed by the states in Q where the entries are below the descending diagonal Initially the entry p q is marked iff p and q are not 0 equiyalent which means that p and q are not both in F or not both in Q F Then we process eyery unmarked entry on every row as follows for any unmarked pair q we consider pairs 619 a6qa for all a E Z If any pair 609 a 6q a is already marked this means that 619 a and 6q a are inequiyalent and thus 10 and q are inequiyalent and we mark the pair p q 136 CHAPTER 2 REGULAR LANGUAGES We continue in this fashion until at the end of a round during which all the rows are processed nothing has changed When the algorithm stops all marked pairs are inequivalent and all unmarked pairs correspond to equivalent states Let us illustrates the above method Consider the follow ing DFA accepting a babb EUQUUEB UUUUUUUUUUQ QEQUQG The start state is A and the set of nal states is F 219 STATE EQUIVALENCE AND MINIMAL DFA S 137 The initial half array is as follows using X to indicate that the corresponding pair say E A consists of in equivalent states and D to indicate that nothing is known yet UQUU LX E E El wxgn QXD UX After the rst round we have Q03 LXXDD D E UUXXIZI Qgtltgtlt DX After the second round we have THUde LX X n X o3gtltgtltgtlt Qgtltgtlt UX 138 CHAPTER 2 REGULAR LANGUAGES Finally nothing changes during the third round and thus only A and G are equivalent and we get the four equiv alence classes A C B D E There are ways of improving the ef ciency of this algo rithm see Hopcroft and Ullman for such improvements Fast algorithms for testing Whether p E q for some given pq E Q also exist One of these algorithms is based on forward closures an idea due to Knuth Such an algorithm is related to a fast uni cation algorithm Eulerian and Hamiltonian Cycles Jean Gallier June 217 2005 Chapter 1 Directed Graphs Paths Recall that a directed graph G is a pair G VE WhereE g V X V A pair u v E E is called an edge of G note that u v is allowed Given any two nodes u v E V a path from u to v is any sequence of n l edges n 2 0 7101 111112 own If n 1 a path from u to v is simply a single edge u A graph G is strongly connected if for every pair u v E V X V there is a path from u to v A closed path or cycle is a path from some node n to itself 4 CHAPTER 1 DIRECTED GRAPHS PATHS We will restrict out attention to nite graphs ie graphs V E where V is a nite set De nition 101 Given a graph G an Eule rian cycle is a cycle in G that passes through all the nodes possibly more than once and every edge of G exactly once A Hamiltonian cycle is a cycle that passes through all the nodes exactly once note some edges may not be traversed at all Ealerian Gycle Problem Given a graph G is there an Eulerian cycle in G Hamiltonian Gycle Problem Given a graph G is there an Hamiltonian cycle in G Chapter 2 Eulerian Cycles The following graph is a directed graph ver sion of the Konigsberg bridge problem solved by Euler in 1736 The nodes A B C D corre spond to four areas of land in Konigsberg and the edges to the seven bridges joining these ar eas of land The problem is to nd a closed path that crosses every bridge exactly once and returns to the starting point 6 CHAPTER 2 EULERIAN CYCLES In fact the problem is unsolvable as shown by Euler because some nodes do not have the same number of incoming and outgoing edges In the undirected version of the problem some nodes do not have an even degree C B Figure 21 A directed graph modeling the Konigsberg bridge problem It may come as a surprise that the Eulerian Cy cle Problem does have a polynomial time algo rithm but that so far not such algorithm is known for the Hamiltonian Cycle Problem 7 The reason Why the Eulerian Cycle Problem is decidable in polynomial time is the following theorem due to Euler Theorem 202 A graph G V E has an Enleridn cycle ijj the following properties hold I The graph G is strongly connected 2 Every node has the same number of in coming and outgoing edges Proving that properties l and 2 hold if G has an Eulerian cycle is fairly easy The converse is harder but not that bad tryl 8 CHAPTER 2 EULERIAN CYCLES Theorem 202 shows that it is necessary to check Whether a graph is strongly connected This can be done by computing the transitive closure of E Which can be done in polynomial time in fact 0013 Checking property 2 can clearly be done in polynomial time T hus the Eulerian cycle prob lem is in 77 Unfortunately no theorem analogous to Theo rem 202 is know for Hamiltonian cycles Chapter 3 Hamiltonian Cycles A game invented by Sir William Hamilton in 1859 uses a regular solid dodecahedron whose twenty vertices are labeled with the names of fa mous cities The player is challenged to travel around the world by nding a closed cycle along the edges of the dodecahedron which passes through every city exactly once this is the undi rected version of the Hamiltonian cycle prob lem 10 CHAPTER 3 HAMILTONIAN CYCLES In graphical terms assuming an orientation of the edges between cities the graph D shown in Figure 31 is a plane projection of a regular dodecahedron and we want to know if there is a Hamiltonian cycle in this directed graph Figure 31 A tour around the world77 Finding a Hamiltonian cycle in this graph does not appear to be so easy A solution is shown in Figure 32 below A solution Figure 32 A Hamiltonian cycle in D
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'