Algebraic Automata Theory
Algebraic Automata Theory MAD 6616
Popular in Course
Popular in Mathematics Discrete
11883 - GEO 105 - 01
verified elite notetaker
verified elite notetaker
verified elite notetaker
verified elite notetaker
PSY - 0010
verified elite notetaker
This 24 page Class Notes was uploaded by Elizabeth Haley on Wednesday September 23, 2015. The Class Notes belongs to MAD 6616 at University of South Florida taught by Natasa Jonoska in Fall. Since its upload, it has received 33 views. For similar materials see /class/212665/mad-6616-university-of-south-florida in Mathematics Discrete at University of South Florida.
Reviews for Algebraic Automata Theory
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/23/15
April 7 2005 MAD 6616 ALGEBRAIC AUTOMATA THEORY NOTES 9 CELLULAR AUTOMATA AND DE BRUIJN GRAPHS INSTRUCTOR NATASA JONOS KA 1 DEFINITION OF CELLULAR AUTOMATON Up till now we were considering only models for sequential computing nite state automata Starting from this set of notes we concentrate to some models for parallel computing but mostly on cellular automata or otherwise called self reproducing automata They were introduced by von Neumann in the mid 50 s and they rose a big interest in the 60 s The interest somehow almost died in the 70 s and it was again awakened in the 80 s when people working on several different subjects physics symbolic dynamics biology formal language theory realized that they are studying the same objects We will be mostly interested in the formal language theoretic aspects of cellular automata and its applications in symbolic dynamics We start with the de nition Let A be a nite set which is called a set of states opposite to the case in the sequential computation when A was the alphabet of the input string We denote with d 0 lt d the dimension of the cellular automaton The nite set N Q Zd is used to specify the local neighborhood of the cellular automaton For 1 E Zd we will denote u l N thesetuNujlj EN Example 11 1 lfd 1 and N 7r7101r then for u E Z we have uN U7 r u 71uu l 1 u l r In this case N is called onedimensional neighborhood of radius r 2 If at 2 and N ijl0 ij r then for u E 22 we have that u l N is the set of all elements in 22 which are on distance at most r from u in this case the distance of ij and i j is the maxli 7 i l lj 7 j l And in general for a dimension d if N i1idl0 2 rj1dandu E Zdwe havethat uNistheset of elements in Zd which are on distance at most r from u The neighborhood N is called d dimensional neighborhood of radius r It is a square box woth side 2r l 1 centered at the origin In the case of r 1 and d 2 the 2 dim neighborhood of radius 1 is called the Moore neighborhood lfd 2 andN 710 071 00 10 0 1 then Nis called the uon Neumann neighborhood A DJ V A global con guration is an assignment of a state to each element of Zd ie it is a function from Zd to A and is denoted with one of the greek letters If M is a subset of Zd and 04 is a global con guration then 04M is the restriction of 04 on the index set M The set of global con gurations will be denoted with P ala Zd 7 A A local function also known as local dynamics function is a function l AN 7 A De nition 12 cellular automaton A cellular automaton CA is afour tuple A d Nl where A is a nite set of states at is the dimension N is a nite subset of 2 1 called the neighborhood andl AN 7 A is the function The global function of the cellular automaton is the function G P 7 P de ned with Gl04v l04vN 1 Often refer to the global function G as a CA determined by the four tuple A d Nl lntuitively we think of cellular automaton as an array of nite state automata each connected with its neighbors Instead of requiring a head and a successive scanning of a tape in determining the next state the automata determine the next state of their computation by looking at the states of their neighbors Since they don t require a tape and there is no terminal state the computation can potentially be performed inde netly This means that we can consider the dynamics of cellular automata ie F G1 can be viewed as a dynamical system with a discrete time step The computation and the dynamics of CA should not be considered separately This is why CA are also known as self reproducing systems Example 13 1 Let A 01 1 1 N 7101 andl AN a A such that la1a0a1 11 a0 a1 mod 2 If 04 is the initial con guration with 04 0 for i 7 0 and 040 1 then the global computation on 04 is Gla 11204 at 00000100000 Gm 00001110000 a a 00010101000 G a 00110101100 a a 01000100010 2 The Conway s Game of Life is a two dimensional CA which has the following de nition The neighborhood N is the Moore s neighborhood and the set of states is A 01 which represents dead and alive The local function is de ned with l AN a A with lw 1 if w contains three 1 s not in 00 position ie the cell becomes alive if there are enough living cells lw w0 0 if w contains two 1 s not in position 00 lw 0 in the other cases that means that the cell dies if there is not enough life environment only one or no cell in the neighborhoods is alive or if it is overcrowded there are more than three live cells surrounding For example initial con guration of 2 x 2 square of 1 s remains unchanged so does a hexagon But a square of size 3 x 3 dies in three steps The shift a on Z i 1 d The set of states is any nite set the neighborhood is N 0 0 10 0 where 1 appears at the i th position The local function is l AN a A which is essentially l A a A ie the identity on A since it is a singleton and the global function G a is ai04v avHOVWOJDVWO In case at 1 there is only one shift map 01 denoted a and the neighborhood is N In case at 2 there is ago and can Onedim CA is called one way if the neighborhood is N 01 An example of a oneway CA may be any nite semigroup S In that case A S and the local function CSXSHSiS zyHmy A DJ V A g V We may consider some times CA with a quiescent state q In that case the local rule has to have the property that lw q whenever w is a constant and has value q A global con guration 04 is called nite if only nite number of cells are in the non quiescent state If it is not stated otherwise we will assume that our con gurations may be in nite In all examples above state 0 is quiescent Example 14 The ring squad synchronization problem Consider a nite con guration of a 1 dim CA We can call the cells soldiers except the one at the end called the general The neighborhood of the local rule is of radius r 1 The problem is to specify the set of states A and the local rule l such that when the general undergoes the transition into the state re when ready he does not take any initiative afterwards and the soldiers re ie get into one same state at the same time In the beginning it is assumed that all soldiers are in one state say 0 except the general say 1 So the initial con guration is qq1000 Oqu q is the quiescent state and at the time to re the con guration is qq1fff fqu The tricky part is to construct the solution which will work with xed nite number of states A for arbitrarily length n of the ring squad Since the information can not travel faster then one cell at a unit time it is clear that for length of the squad n the shortest time to get into the ring con guration must be 2n H 2 It should be relatively easy to nd the solution in 3n to 8n units of time I challenge you to try to nd it The application behind the ring squad synchronization problem is the following if we have an array of processors connected in a line how to de ne their local links such that after an initial signal to the rst one all processors get to a state from which they can start the same computation at the same time 2 DE BRUIJN GRAPH AND 1 DIM CA There has been an extensive study of the 1 dim CA especially of the local rules with radius one but still many problems remain unanswered Here we will present connections between CA and the regular languages For that purpose we introduce the de Bruijn graph First consider the 1 dim CA A 1Nl where A 01 and N H10 1 There are 256 such CA since the local rule is de ned on 8 triples 000 111 and there are 28 functions l A3 H A If we write every local rule as a binary string of length 8 then to each local rule we can associate a decimal number from 0 to 255 according to the value of the binary string For example the CA de ned in example 1 has the value 150 000 H 0 001 H 1 010 H 1 011 H 0 The binary string representing this local function 100 H 1 is 100101102 15010 101 H 0 110 H 0 111 H 1 We will refer to the decimal number 0 255 in considering some speci c examples For example the rule 0 is the constant rule mapping every con guration into the con guration 04 Z H 0 and the rule 255 is the dual one of 0 mapping every con guration to the constant con guration having all cells equal to 1 Using the duality 0 H 1 and 1 H 0 out of 256 CA there are only 128 distinct dynamics of CA De nition 21 de Bruijn graph Let G be the global map of the CA A 1 Nl where the neighborhood N is of radius r The de Bruijn graph of G1 is the graph 91 V1El M where V A2 El A2 1 such that the edge 0LT a0 ar has source 0LT so a1 and target a71a0a7 The labeling function 1 El H A X A is such that for e 0LT ao ar we have 1ea0le In the case when A 0 1 the de Bruijn graph of the rule 150 is presented in Figure We will write 1 1 A2 understanding that for the edge e a ao a we have 1e a0 and A2e le We will refer to 1 as to input label and to 2 as to output 10 11 10 FIGURE 1 label A bi in nite path in 91 is a map 7139 Z a E denoted with e1e0e1 such that 8a 25613971 for all i E Z Proposition 22 a Let L 04 04 is input label of a bi in nite path in 91 Then L P AZ b If m and 71392 are two bi in nite paths in 91 such that 17r1 17r2 then m 7r2 c If7r is a bi in nite path in 91 then Gl17r 27r Proof a We have that 04 aim1 aHk is a CA con guration iff by the construction Of 917 aiir7ai1ir7 ai 2140LH147 0441 71441 04 aHk aHkH is a bi in nite path in 91 b follows from the fact that for every path with length 2p 1 the middle edge7 ie the r 1 st edge is uniquely determined by the label of the path c this follows directly from the de nition of 91 D The previous proposition states that the whole computation of the 1 dim CA is presented by the de Bruijn graph Let 04 be a con guration of a 1 dim CA We write Fa for the set of all factors of a ie Fa w E A 3i k E Z on xiM w For a set of con gurations S we write FS UDLESFW Corollary 23 IfP is the set of all con gurations of J dim CA then FP 14 and FGl is a factorial transitive and regular language Proof follows from the de nition of the de Bruijn graph7 the fact that it is irreducible and the Proposition above and the Corollary 24 notes 5 B Let 50 AZ F We say that the i th time step image of a 1 dim CA is Si GlSi 1 which is the set of con gurations that can be obtained after iterating the CA i times on the set of all con gurations How to get the i th time step image of CA One obvious way of doing this is by expanding the radius of G1 ie clearly7 G12 has radius 2r G has radius 3r etc Then we can construct the de Bruijn graphs 9 912 913 for the CA maps G1 C1127 Gil Then the output label of 91239 is a FSA of the set of factors of the i th time step image This graph has exactly lAlZiT vertices We know that every FTR language has a unique up to an isomorphism minimal deter ministic FSA This is an irreducible presentation We can consider 9 as a FSA accepting an FTR language LGL over the alphabet A x A Then LGL has a minimal DFA Let Mal be that minimal DFA for LOL The Mal of the rules 187 48 and 56 are depicted in Figure 00 0 0 00 10 C Q 10 01 01 10 01 11 1 0 10 00 10 rule 18 rule 48 rule 56 FIGURE 2 Now we use the product construction to study the properties of CA Product construction with de Bruijn graphs Let M 621 T8 be an irreducible FSA with LM L Let 9 be a de Bruijn graph for a CA with global function G1 The product M x 91 is an automaton with set of vertices states V Q x V and a set of edges E such that there is an edge from q 12 to q of iff there is an edge e in M from q to q with label equal to the input label of an edge in G from u to of The labeling in g is equal to M We 1 inM q o o 11 015 in 3 O O W1 W1 q V V O V inGl 1 FIGURE 3 take that C is trimmed such that every vertex lies on a bi in nite path ie for every vertex there is an edge coming in and an edge going out see Figure For each directed graph C with edges labeled from the alphabet A x A we consider two FSA Cquot and G The FSA Cquot is the graph 6 having only the input labels and C is G with opnly the output labels Let the graph NC be de ned as follows if 91 has fewer number of vertices than MC then NC 9 otherwise NC M0 Let Cl NOW and for t gt 1 G C ng x New where C ng is All1 with no input labels By Proposition 512 is a FSA recognizing and which is G is a FSA recognizing FSi 1 1n the case when NC M0 the presentation MC has fewer number fvertices then MW and thus we have reduced the base of the exponential growth of the vertices in the FSA accepting the set of factors of the i th time step image For example the graphs G for t 1234 for the rule 18 have 3 6 14 28 vertices respectively but the graphs 91239 have 4 16 64 256 vertices Proposition 24 The trimmed graph All1 239s irreducible Proof Exercise D The above exercise shows that for each i is an FTR language Proposition 25 For each i Squot1 Q Si Proof We proceed by induction on i for i 07 SO AZ It is clear that G1AZ Q AZ The set of bi in nite sequences obtained as output labels of the bi in nite paths in G is exactly Si7 and the set of bi in nite sequences obtained as input labels of bi in nite paths in Gi is Si l By the inductive hypothesis Si 1151 1 Q Sl l If 04 is a bi in nite sequence of Si 7 then there is a bi in nite path in Gi with input label a By the construction of Gi 1104 is the output label of that path7 and so Gla 6 Si D March 1 2005 MAD 6616 ALGEBRAIC AUTOMATA THEORY NOTES 7 CODES AND FINITE TRANSDUCERS INSTRUCTOR NATASA JONOS KA 1 CODES In this part we introduce some basic de nitions and properties of coding theory We will address some of these concepts later when we will consider 1 dim cellular automata An informal de nition of coding may be the following Let A and B be two alphabets and let 4p B H A be a function The function 4p is said to be coding if it is injective In that case the set 4pB Q A is called a code In some sense every element of B is uniquely determined by elements of A Example 11 1 Let Nk 0 k H 1 be an alphabet with k elements and let 4p N H N be de ned with i H where is the presentation ofi at the base k 2 Consider the function 4p A H 01 where A 11 04 and 4p is the morphism induce by ltpa 0 1 Note that in this case the submonoid 0i1li 1 k is free We will concentrate our discussion on the case when 4p is a monoid morphism In that case it is easy to see exercise that L Q A 4pB is a free monoid generated by the set 4pB We will call the set 4pB a code In other words De nition 12 code A subset X of A is a code iffor dllnm 2 1 and 1 Mum1 m E X the condition 135235 mlxZmn implies that n m and mi for dlli 1 n So X is a code if XJr has a unique factorization in words in X Since 1 z m a code can never contain the empty word Note that if for all w E X the length lwl const then X is a code Note If F is a free group and G is a subgroup of F then G is free too But in case of monoids a submonoid of a free monoid is not necessarily free For example let 01 ab ba be a generating set for a submonoid of A ab Then it is not free since a ba ab a If B 131 b2b3 and 4p B H A is a morphism induced by the function 1 H a 2 H ab and 3 H ba then 4p is not injective ie coding since 4pb1b3 4pb2b1 Example 13 3 For any alphabet A the set X A is a code More generaly ifp 2 1 is an integer than X AZ7 is a code 4 The set X aa baa ba over A ab is a code lndeed assume it is not a code Then there is a word in X1 of minimal length that has two distinct factorizations w1nimn Since w has a minimal length 1 7 Thus 1 is a proper left factor of 3 or Vice versa Assume that 1 is a proper left factor of This implies that 2 act and that z z am In general if m 7 z1z2 zpa then mp act and mg aa Thus m 1m1 ml mp1a which contradicts the existance of two factorizations 1 Exercise 14 Is X a ab abb a code A set X is pre x if no element of X is a proper pre x of another word of X Similarly we can de ne su x A set is bipre x or just bi x if it is both pre x and suf x Proposition 15 Every pre x su ix bipre x set of words is a code Proof If X is not a code then there is a word w x1xn x l aving a minimal h m length and two distinct factorizations But in that case x1 7 x l and either lxll lt or lt Contradiction with X being a pre x set D Note if X is a code then XJr is a transitive language Now we will establish a link between FTR languages and pre x codes Proposition 16 For every regular language X FX is an FTR language Converse for every FTR language L there is a regular pre x code X such that L FX Proof It is clear that if X is a regular subset of A then FX is regular factorial and transitive Let L be an FTR language By notes 5 Prop 23 and Cor 24 there is an irreducible deterministic FSA M Q I T 8 having T I Q such that LM L Let q E Q De ne X to be the rst return code of q ie X xl p 6 P1 sp tp q Vp p E Pp p p tp sp 7 q where P1 is the set of paths with positive length in M We will show that X is pre x Assume it is not then there are x y E X with x being a pre x of y But this is a contradiction with the determinism of M and with the de nition of X It is clear that L FX le if w E L then there is a path from a state q to a state q in M with label w Since M is irreducible there is a path from q to q with label u and from q to q with label u Then u wu is a label of a path from q to q and so u wu E X ie w E FX The other inclusion is clear D Exercise 17 Determine the pre x code X in each of the examples in Figure 1 such that the FTR language L recognized with the irreducible automaton is L FX a a b b a a a b a 1 2 FIGURE 1 We say that an FTR language L is generated by X if L FX The previous proposition shows that every FTR language is generated by a regular pre x code A code X is said to be an in x code if for all w E X the set of all factors of w is just w ie Fw O X w A code X is said to be comma free code if X2 AJVXAJr Q The idea behind the comma free codes is to have code words that even after concatenating the words one would identify the individual factors uniquely ie one does not need a comma to separate the words Example 18 0 a2 baba is bi x but not in x o 1 aba is not in x but a2aba is o A singleton w for lwl 2 1 is in x pre x suf x but may not be comma free Note that every subset of a comma free code is also comma free Hence one can come up with a large set of examples in the following way the set bab is comma free So foe every L1 Q baJr and L2 Q ab the language L1L2 Q bab is comma free Proposition 19 The family of in x codes is closed under concatenation Proof Let X1 and X2 be two in x codes and let w E X1X2 Consider u E Fw X1X2 Then there are zy such that w mug E X1X2 and since u E X1X2 we have that u u1u2 such that ul 6 X1 and u2 6 X2 But this implies that w1 zul 6 X1 and w2 u2y 6 X2 Note that the possibility that one of z or y is in X1X2 fails since both X1 and X2 are in x Now Fw1 X1 w1u1 Since X1 is in x u1 w1 ie z 1 Similarly y 1 and u w E 11 Algorithms for nite codes For a nite set of words in general it is not straight forward to determine whether it is a code or not It can be accomplished by using the Flower automaton Let X be a nite subset of A1 We de ne a special automaton for X with MX Q I T 8 containing a set of states Q the labeling transition 8 Q Q x A x Q a set of initial states I and a set of terminal states T such that I H and T 9X The set of states is de ned to be Q 7Uw6X qi lwa1quot39an7i27H7n and the transitions in MX are E 7a17q22Ulw aliiian EXU qr livamlfllw aliiian EX Uq aiq 1 lw a1an E X i 2n71 We call MX a semi Flower automaton for X If 9lt 99 we say that MX is the Flower automaton for X Consider the labeling function 8 a A such that q a q a The labeling is extended to paths in the usual way We say that an automaton is unambiguous if and only if for all p E Iq E T and w E A there is at most one path from p to q with label w Proposition 110 A nite set of words X is a code if and only if the Flower automaton MX is unambiguous Proof Exercise B Let M QI T 8 be an automaton over A The square of M is the automaton M x M 5M Q717T75 Where Q Q X Q I I X I T T x T and the transitions are 8 Q Q x E x Q such that 5 pqa iv60 l ROMP7 was 6 8 a 6 E The following proposition follows from the de nitions Proposition 111 An automaton M QI T 8 is unambiguous if and only if there is no path in 5M from a state p p to a state qq uisiting a state r s with r 7 s Proof Exercise D Using the square of a Flower automaton and Proposition 111 we can determine whether a nite set X is a code 2 FINITE TRANSDUCERS In this section we will introduce two algorithms concerning codes through a nite state sequential machines and transducers Let A and B be nite alphabets De nition 21 nite transducer A nite state automaton ouer the alphabet A x B is called a nite transducer Every nite transducer T de nes a relation RT Q A x B in the following way u7 u E RT iff there is a path e1ek with ei uhui such that se1 is an initial state7 tek is a terminal state and u u1uk and u U1Uk The pair 11 is in the relation RT if there is an initial state which is also a terminal The domain of the relation RT is DT u E A u E 13 u7 u E RT We say that T is single ualued if for each u E DT7 the set of 12 s for which u7 u E RT is a singleton In other words7 T is single valued iff RT de nes a function from DT into B We will use the word transducer for a nite transducer A non deterministic sequential machine NSM is a transducer M for which each edge is labeled by a7 b where a E A and b E B Proposition 22 It is decidible whether a NSM is single ualued Proof There is a simple procedure for deciding whether a NSM is single valued Let M Q717T78 be an NSM with a unique initial state go We will use the product construction Let M ba a FSA over the alphabet 07 1 with states Q x Q initial state qo7 go and terminal states T x T The set of edges is de ned as follows there is an edge from p7p to q7 q with label bit iff there is a E A such that pa7 bp 7 q7a7b7q E E in M If for every such a7 we have that b b 7 then bit O7 otherwise bit 1 Let M be obtained from M by trimming Then M is single valued iff 1 does not appear as a label of any edge of M Exercise D There are several algorithms for deciding whether a given regular language is a code Most of them require a DEA recognizing the language7 or at least a nonambiguous FSA ie a nite state automaton with the propery that for every word w7 there is at most one path from one state to another with label w The following algorithm uses the above construction and has no restrictions on the FSA recognizing L Proposition 23 It is decidible whether a regular language L Q A gerenrates a free sub monoid of A ie whether it is a code Proof Let M Q717T78 be a FSA recognizing a regular language L Assume that go is a unique initial state If go 6 T then 1 E L and so L is not a code If qo T then M be a NSM over the alphabet A x 01 constructed as follows the set of states is Q the initial and the terminal state is qo7 and the edges are de ned with pa07 ql p7 tag 6 8U pa1q0lpaq E 8 q E T We use 1 for marking the end of the word in L Then L is a code iff M is single valued Exercise D We will show one more algorithm using the single valudness of NSM Let S be a monoid Let E I be an indexed family of nontrivial submonoids of S We say that S is the coproduct of this indexed family of monoids if a this family generates S and b for each monoid N and each family of monoid homomorphisms hi S1 A N E I7 there is a unique homomorphism h S a N for which each of the composite homomorphisms S Q S a N E I coincide with h This is a standard de nition of a coproduct in category theory The concept of coproduct generalizes the concept of a code in the following sense For an indexed family E I of non nul strings in a submonoid S of A S is the coproduct of the indexed family of submonoids E I iff the subset E I is a code that generates S and has the property that s 57 only when i j In other words every element from S can be uniquely decomposed into factors fromquot sets For example consider S1 fl and S2 bb then the set S SfS is not a coproduct of S1 S2 since I 6 S1 But if we consider S 1 1 then S SiS is a coproduct of S 7 1 S2 Now let S1 Sk be a family of regular submonoids of A Let Mi Qi7 qu7 38 be a FSA recognizing Si We may assume that T qu and that Q Qj Q 7 j Let S Q A be a submonoid generated by S1 Sk Proposition 24 It is decidible whether S is a coproduct of S1 Sk Proof Let B 131 bk O A Q and Q qo U Lg31620 Let M be an NSM constructed as follows for each p aq 6 Mi place p a 1 q in M lfp qol39 then add q0a 1 q in M lf q qol39 then for each j distinct from i add p a bi qoj in M The initial state is go and the terminal states are qol39 1 Then S is coproduct of S E 1 iffM is single valued Exercise D 3 DNA CODE WORDS ln laboratory experiments the complementarity of the bases may potentially pose problems if some DNA strands can form non speci c hybridization and partially anneal to strands that are not their complete complements This type of hybridization can occur during a polymerase chain reaction self assembly step or in the extraction process Particular cases when mishy bridization can happen are presented in Figures We see how we can use formal languages methods and automata theory to develope good DNA code words We will denote by A the special case when the alphabet is A G C T representing the DNA nucleotides Since in this context A has a special meaning in the rest of the section we will use the notation E to denote the nite alphabet of symbols For words representing DNA sequences we will use the following convention A word u over A denotes a DNA strand in its 5 a 3 orientation The Watson Crick complement of the word u also in orientation 5 a 3 is denoted with For example if u AGGC then Z GCCT The intramolecular hybridization happens when two sequences one being a reverse complement of the other appear within the same DNA strand see Fig 2 In this case the DNA strand forms a hairpin u u H lll lllllll a b FIGURE 2 lntramolecular hybridization a the reverse complement is at the begin ning of the 5 end b the reverse complement is at the end of the 3quot The 3 end of the DNA strand is indicated with an arrowl Two particular intermolecular hybridizations are of interest see Fig 3 In Fig 3 a the strand labeled u is a reverse complement of a subsequence of the strand labeled 1 and in the same gure b represents the case when u is the reverse complement of a portion of a concatenation of v and w The Watson Crick complementarity of the nucleotides is modeled with an involution map ping on the alphabet that we call 0 The set of code words that we would like to obtain would have the property that the unwanted hybridizations as shown in Figures 2 and 3 are not u u V V W a b FIGURE 3 Two types of intermolecular hybridization a one code word is a reverse complement of a subword of another code word b a code word is a reverse complement of a subword of a concatenation of two other code words The 3 end is indicated with an arrowi permitted For this purpose consider the Watson Crick complementarity as an involution on the alphabet A that is antimorphism on A An involution t9 2 a 2 of a set 2 is a mapping such that 02 equals the identity mapping 00z z Vm E 2 The mapping 1 A a A de ned by 1A T 1T A 1C G 1G C is an involution on A and can be extended to a morphic involution of A Since the Watson Crick complementarity appears in a reverse orientation we consider another involution p A a A de ned inductively ps s for s E A and pus pspu spu for all s E A and u E A This involution is antimorphism such that puv p Upu The Watson Crick complementarity then is the antimorphic involution obtained with the composition up pll Hence for a DNA strand u we have that pzu 110u Z For the general case we concentrate on morphic and antimorphic involutions of 2 De nition 31 Let 0 2 gt 2 be a mmphz39c or antimomhz39c involution Let X Q 2 be a nite set 1 The set X is called 0km1m2 subword compliant if for all u E 2 such that gt k we have 2u2m0u2 O X 0 for m1 m m2 2 We say that X is called 0 c0mpliant or 0 in x if 20X2 X 0 and 20X2 O X Q 3 The set X is called 0 free or 0 comma free if X2 20X2 0 4 The set X is called strictly t9 ifX 0X We note that when 9 pH the 0k m1 m2 subword compliance of the code words X Q A do not allow intramolecular hybridization as in Fig2 for a predetermined k and m1 m mg The maximal length of a word that together with its reverse complement can appear as subwords of a code words is limited with k The length of the hairpin ie distance between the word and its reversed complement is bounded between m1 and mg The values of k and m1 m2 would depend on the laboratory conditions ex the melting temperature and the length of the code words In order to avoid intermolecular hybridizations as presented in Fig 3 X has to satisfy 0 compliance and 0 freedom Most applications would require X to be strictly 0 We now show an algorithm how to determine whether a given set of code words is 0 free Consider the variation of a semi Flower automaton M2 X for X2 consisting of the set of states Q UweX p qg w a1an i 2 n The transitions edges of M2X are as follows 5 7a17q22U77a171922 lw a1 an E X Uq7am7 19341an W a1 an E X Uqt vai7qt 17pgvaivp t rllw a1quot39an 6 X7 i 27 77739 7 1 Clearly M2 X recognizes X2 Since we are interested in the subwords of X2 we set that all states in M2X are initial and terminal Now consider the semi Flower automaton M0X for 0X We construct the direct prod uct of M2X and M0X in a similar way as we constructed the square automaton The initial states of the product automaton are ordered pairs of initial states and the terminal states of the product are ordered pairs of terminal states Denote this automaton with B The algorithm for 0 free then relies on the following proposition Proposition 32 The set of code words X is 0 free if and only ifB recognizes the empty set Proof The proof follows form the direct product construction and the de nition of M2X exercise ll in the details D It is easily seen from the de nitions exrecise that for any morphic or antimorphic involution 9 a set of code words that is 0 free is also 0 compliant Hence it is suf cient for our algorithm to check for 0 freedom Example 33 Let Lm n UOAmT6m 1TiCm This is Vp free and hence Vp compliant 31 DNA codes and splicing In this subsection we investigate properties of DNA codes that are preserved under splicing operation In other words we characterize the set of splicing rules so that the resulting language of the splicing system keeps the properties of the involution codes We consider the question of determining the properties of the code words such that under splicing they produce new good code words in other words during the computational process the good encoding is not lost A splicing system is an ordered triple 39y E A R where E is a nite alphabet A C 2 is a set of words called axioms and R Q 24 is a set of splicing rules where 24 is the direct product of 2 x 2 x 2 x 2 The splicing operation a is de ned such that if r u1u2v1 v2 then 1 1 U U 1 w 1 U U 1 1 2 2 ruler produces 7 1 1 1 232 24 yivivzyz wz 241111142902 and in this case we write my 7 w1w2 For a language L Q 2 and a set of rules 9 we say that 0L is obtained by single step splicing of L if 0L w l 3r 6 R Hzy E L 44 7 w w Then the language generated by the splicing system 39y with axiom set A is UnZOUnA De nition 34 Base for Splicing Rules Let X be a set and B C FX be a subset of words with k minlml m E B Z 0 We say that a set B is a base for splicing rules for X if it is strictly t9 and satis es the following two properties 1 F9090 FsB2 0 for all s 2 k 2 0B2 FX De nition 35 0rules Let X be a set of words and B a base for splicing rules for X Then a set RBX which is a subset of B4 is called a set of 0 rules for X Several observations about 0rules 1 The 0 rules can always be re exive This means for each rule al a2 121122 in RBX the rules a1a2u1a2 and 121 122121122 can also be in RBX This follows directly from the de nition of 0 rules 2 0 rules can be symmetric lf a1a2u3a4 E RBX then a3a4u1a2 E RBX This also follows from the de nition of 0 rules 3 One way to obtain a set of 0 rules for X is the following Let m min ml z E X and k Note that even if X is strictly 0 FkX may not necessarily be so We partition FFkX into three strictly t9 subsets in the following way Set U1 x l z and U2 FkX Ul Then by de nition U1 is strictly t9 and for every n 6 U2 we have 0a 6 U2 Now we partition U2 into U2 and U2 such that a E U and 0a E U Hence Fk X U1 U U U U2 is a partition of FkX into three strictly 9 sets We form the basis of the words used in the splicing rules from two of these sets Let W be either UZ or U and let B U1 U W Then B is a maximal set of subwords of X of length k that is strictly 0 Now we remove from B all words that Violate the properties 1 and 2 of De nition 34 This may leave an empty set of B In that case we consider the subwords of length k 7 1 and k 1 and repeat the procedure for obtaining B This procedure ends either with a base B for splicing rules or since X is nite with no base for splicing rules for X A set K is called a strong splicing base if for all z E 2 and for all a E K if an is a pre x of K then x E K and if am is a suf x of K then x E K The need for the set of 0 rules R to be re exive and symmetric comes naturally from the chemistry of the restriction enzymes If an enzyme can cut one molecule the ligase can recombine the same molecule back together hence a re exive operation If two molecules z and y take part in a splicing operation can be cut by an enzyme then the same molecules written in the opposite order y and z are part of the same operation Hence the symmetric operation In the following we assume that the splicing rules are re exive and symmetric gs Proposition 36 Let 39y E A RBA be a splicing system with RBA being the a set of 0 rules If the axiom set A is strictly 0 in z then the language generated by the system L39y is strictly 0 in z Proof We give a sketch of the proof Since L39y Un200 A we proceed by induction on n splicing rules derivation steps If n 0 then 00A A and A is 0 in x by assumption Assume that a A is 0 in x and suppose an1A is not 0 in x Then Hzy E an1A such that z s0yp for some words 8 and p Since RBA is re exive we can assume that both z and y have been obtained by splicing words from a A Let x 1U1U24 and y y1v1v4y4 for some a1a2u3a4 E RBA and v1v2v3v4 E RBA Then z m1a1u4m4 30y1v1v4y4p In what part of x can 0v1v4 appear as a subword There are several possibilities 0U1 U4 E F1 Ll1 or 0U1 U4 E F Ll44 9 111114 E F Ll1 Ll4 0U1 U4 E lt4 0lt gt e M 111114 z1u1a4 or 0v1v4 E Fa1a4z4 but not case 1 or 2 x but none of the previous cases The rst case involves longer analysis but it ends up violating the property 1 or property 2 from de nition 34 The second and the third case violate the property 1 from De nition 34 The fourth case means that mm 6 F0v1v4 which again violates condition 1 of De nition 34 Hence an1A has to be 0 in x In order to show that L39y is strictly 9 it is suf cient to take the words 8 and p to be empty and then follow the above argument D The following proposition provides conditions under which the language generated by a splicing system is 0 comma free Proposition 37 Let 39y E A RBA be a splicing system where RBA is a set of 0 Tules for A Assume that the base B for the splicing rules is such that 0032 o FA2 0 If the axiom set A is strictly 0 c0mma free then L39y is also 0 c0mma free Proof The proof is similar to the above proof and is by induction on the number of deriva tion steps If n 0 then 00A A and A is 0 comma free by assumption Assume that a A is 0 comma free and suppose an1A is not 0 comma free Then there are m yz E an1A such that my 3029 319 6 2 Let x z1u1a4z4 y y1v1v4y4 and z 2110110424 for some u1u2u3u4 U1U2 Ug U4 and w1w2w3w4 6 RB Let 0y be a subword of a4m4y1v1 mm and mm 6 SabA violates the property 2 from de nition 34 If a4z4 rltltlr2 and mm 73qu such that tlr2r3q1 E SabA tiq E B then either 1 0w1w4 is a subword of tit or 0w1w4 is a subword of qqul 2 0w1w4 is a subword of rgrg then T2T3 E SabA2 which is a contradiction 3 0w1w4 is a subword of tlrg or 0w1w4 is a subword of qul Then the rst case violates property 1 of de nition 34 the second case violates 0B2 FA2 Q and the third case violates property 2 of de nition 34 Hence an1A is 0 comma free D It can be proven that if a strong splicing base is strictly 0 comma free then so is the language generated by the splicing system provided that the set of axioms is a subset of the free monoid generated by the splicing base The Proposition 37 does not require that the axiom set is a subset of the free monoid generated by the base for the splicing rules and therefore the language generated by the splicing system is not necessarily a subset of this same monoid The following example shows a base for splicing rules that is not strong splicing base Example 38 Consider the alphabet E a b c with the morphic involution 9 de ned with a H c b H b and c H a Let the axiom set A abaaba We choose a base for splicing rules to be B a ab This is clearly a strictly 0 comma free code but it is not a strong splicing base Consider a ab a E B with suf x aba If we pick a a and z ba we have that am is a suf x ofa word in BJr and a E B but z B Now 0B2 ccccbcbc cbcb and clearly the conditions of the de nition 34 and Proposition 37 are satis ed We can take 9 B4 and the resulting generated splicing language is in nite contains ababa U aba and is strictly 0 comma free Example 39 X ab 1 g n g 10 is 0 comma free and 0 subword 2 code for E ab cd with the morphic involution 9 de ned with a H c and b H d and 0X cd 1 g n g 10 and U ab It is easy to verify that L y Q E colJr and hence L y is 0 subword 2 code 0 in x and 0 comma free Jan 13 2005 MAD 6616 ALGEBRAIC AUTOMATA THEORY NOTES 1 FINITE AUTOMATA AND REGULAR LANGUAGES INSTRUCTOR NATASA JONOS KA 1 REGULAR LANGUAGES Let A be a nite set which will be called alphabet It will remain xed troughout our discussion unless otherwise stated In this case A stands for a nite alphabet A The elements of A are called letters or symbols Let Nk 1 A map from Nk to A is called a word or string So w Nk a A is a word in A with length lwl k We will write the word w by w a1a2 ak where ai We identify a word with length 1 with the corresponding element in A If a E A then lwla denotes the number of occurrences of a in w The empty sequence is the empty word denoted 1 or 6 Clearly 11 0 and for every a E A 11a 0 If u Nk a A and u NW A A are words in A we de ne the concatenation of u and u to be the word denoted with u 1 or simply uu and de ned uu NWquot A A such that 4 fori1k Wm ui7kforik1km Note u11u u and luul lul and luula lula lulu The set of all words is denoted with A With the operation concatenation A is a monoid ie the operation s associative uuw uuw and 1 is the identity element In categorical sense it is the free monoid generated by A The free semigroup generated by A is denoted by A1 It is the set of all words with positive length ie AJr A Note that both A and A are non commutative Example if A a b and u aba u abb then uu abaabb and uu abbaba Notation u0 1 uk1 uku If u a1ak then uR ak a1 A subset of A is called a language Languages will be denoted with capital letters L K etc The word 1 is a factor also called as subword of the word u if there are w u E A such that u wuw The word 1 is a pre x su x of u if there is w E A such that u uw u wu Note that 1 is a factor pre x and suf x of every word De nition 11 FSA A nite state automaton FSA is a quadruple M QI T 8 where Q is a nite set I Q Q T Q Q and 8 Q Q X AU X We call Q the set of states ofM I is the set of initial states T is the set of terminal states and 8 is the set of transitions M is usualy presented as a nite directed graph with Q as a set of vertices and E as a set of edges arcs In that case we associate three functions to M st 8 a Q and E a A The function 3 is called the source function de ned with sq a q q the function t is called the target function de ned with tqaq q and is the labeling function q a q a Example 12 Consider the automaton M pq p p e1 p ap52 p bqe3 q bp The graph representing this nite state automaton is oresented in Fig1 1 FIGURE 1 A transition of the form q7 17 q is called an empty transition A transition sequence or a path in M is a sequence of edges P 6162 Ek q0a17q1q1a27q2qk717ak7qk satisfying sei1 tei In fact p is a function p Nk a 8 such that spi 1 for i 17k717 and it isaword in 8 ie p 8 We denote the set of paths in M with P For a path p e1ek a label of p is Mp Me1Mek Source of p is sp se1 and target ofp is tp tek A word w is accepted by M if there is a path p such that sp E I tp E T and Mp w The language recognized by M is LM w l w is accepted by In other words LM w E 14 l 3p 6 P sp E Itp 6 T7 Mp w De nition 13 regular language A language L Q A is regular if there exists a FSA M such that L LM exercise 14 Examples and exercises The following languages are regular 1 L w 6 ab la s are separated by even number of Us 2 Every nite language7 in particular7 1 and Q are regular 3 L set of all strings in a7b that terminate with 1 4 L w 6 ab la s are separated by even number of Us 5 L words that contain at least at most three or k as 6 L words that do not contain any of the words aa7 bb as factors 7 Dk Q 01 de ned with Dk w lw whose binary representation is a number divisible by 2 DETERMINISTIC AUTOMATA We say that two automta M and M are equivalent if LM LM Proposition 21 Euery FSA M is equiualent to a FSA M which does not have any empty tran sitions Proof Let M QITE and let P be the set of all paths in M We construct M from M recognizing the same language as M by setting M 627 I 7 T7 8 where 1 1o q 6 QB e sp 6 1 tp qu 1 T To q 6 Q 3p 6 Ttp 6 T 829 6M0 1 5 q7a7q l 3p 6 31819 617509 61 Mp W E A 61161 6 El 6176 6 Q We show now that LM LM Let p e1ek be a path in M from an initial state to a nal state with label Mp a1a9 E lf Mp 1 then tek E 1 O T and so 1 E LM Let ej17ejs be a subpath subsequence ofp e1ek such that Meg1 a17 MegS as Denote with seji the initial states of ejz with pji Then by construction pjl 6 1 7 tejs E T and e pji7ai7pji1 E 8 for all i 17 7 s 71 Also egg 6 8 Hence the path 19 e 17e s1ejS is a path in M from an initial state in M to a terminal state in M with label p p So LM Q LM Converse Assume that w E LM There is a path p e1 ek in M from an inital state to a terminal state with label w Every transition q a q appearing in p can be substituted by a path with label a from q to q in M by the de nition of 8 So there is a path with label w in M and by construction of M that path can be chosen from an initial to a terminal state by possibly concatenating a pre x path and a su x path with label 1 Proposition 22 Every FSA is equivalent to a FSA with no empty transitions and with only one inital state Proof By the previous proposition we only need to show the following if M Q I T 8 has no empty transitions then it is equivalent to an automaton with only one initial state and no empty transitions De ne M Q I0T8 in the following way Q Q U qo I0 go where qo Q and 8 E U q0aql3q1 E I qaq E 8 We de ne the terminal states with T T if 1 LM and T T U go if 1 E Clearly M and M are equivalent Exercise D From now on all our aoutomata unless otherwise stated will be without empty transitions ie the transition set 8 will be subset of Q x A x Q De nition 23 DFA An automaton M QITE is deterministic if for every q E Q and every a E A the set q lqaq E 8 is either singleton or empty In other words a deterministic nite state automaton is a quadruple M QITE where Q I and T are same as in the de nition of BSA and the transition 8 de nes a partial function 6 Q x A a Q with 6q a q iff q a q E 8 We call 6 the transition function of the automaton We will use both descriptions of DFA In both cases the next state of the automaton is uniquely determined by the present state and the symbol which is currently read In case of DFA we will write q a q or just qa q instead of 6q a q or q a q E 8 If there is no transition starting at q with label a we will write qa Q We extend this notation for words ie we write qw q if there is a path in M from q to q with label w So by the de nition each DFA is also an FSA Hence the class of languages recognized by DFA is included in the class of regular languages The following proposition says that with BSA we cannot do more7 than with the DFA Proposition 24 A language L is regular i it is recognized by a DFA Proof The subset construction The if part is clear We only need to show that every FSA is equivalent to a DFA Let M Q I T 8 be an FSA By propositions 11 and 12 we can assume that M has no empty transitions and that it has only one initial state ie I go We construct a DFA Md Qd qd Td Ed in the following way Qd 2Q the power set of Q The initial state is qd go where go is the unique initial state for M The terminal states are subsets of Q which contain a terminal state in M ie B Q Q is terminal for Md iff B O T 7 Q The set of transitions function is de ned in the following way BaB 6 Ed iff B q l 3q 6 B q aq E 8 te l se E B e ae E 8 Clearly Md is deterministic since B and a uniquely determine B Hence we can associate a transi tion function 6d lfw a1 ak is aword accepted by M then there is a path q0a1q1 qu1ak qk with qk E T and by the de nition of Md ql E 6q0a1 B1 qg E 6B1a B2 qk E 6Bk71a Bk and Bk 6 Td SO LM g LMd Converse Let w E LMd Let p e1ek be a path in Md from qd go to a state Bk in Td Denote with B the terminal states te of ei Then there is a state qk E Bk O T and a transition qu1ak qk E E with qk1 E Bk1 tek1 Similarly inductively for each i O hi1 there is a transition qiai1qi1 E 8 such that qi E B and qi1 6 Bi So w E LM D exercise 25 1 Construct a DFA for each of the following FSA The initial states are in dicated by an arrow pointing to them and the terminal states are circled a 0 b 9 FIGURE 2 2 Show that there is a regular language that is recognized by a non deterministic FSA with 5 states and is not recognized by any DFA with less than 23 7 1 7 states An FSA M Q717 T7 8 is called complete if for all a E A and for all q E Q there is a q E Q such that q7 a7q E 8 Note that each FSA can be extended to an equivalent complete FSA by adding one state 9 the garbage or the junk state and adding transitions for each q E Q and a E A if there is no transition with source q and label a in 8 then we add the transition qag to 8 And for all a E A we add gag to 8 This construction doesn t destroy the determinism of FSA7 if it was deterministic it will remain deterministic 3 CLOSURE PROPERTIES OF REGULAR LANGUAGES We assume for now that all regular languages are determined with a FSA7 ie7 with a transition graph of a FSA Proposition 31 IfL Q A is regular then so is L0 A L Proof If M Q7q07T78 is a complete DFA which recognizes L7 then M Q7q07Q T78 recognizes LC Exercise show that LM LC D Proposition 32 If L1 and L2 are regular then so is L1 L2 Proof We will use the very useful product construction to prove this Let Mi Qi7 Ii Ti 8139 i 17 2 be the automaton that recognizes Li We construct the product of M1 and M2 in the following way M M1 X M2 Q1 X 622T7 E where 8 q1q2a qiq 2lq1a 61 6 51 6124 62 6 82 The initial states are determined by q17q2 E I if both ql and q2 are initial in M1 and M2 respectively7 and the terminal states are determined by q17q2 E T if both ql and q2 are terminal in M1 and M2 respectively It is clear exercise that M recognizes L1 L2 D Since L1 U L2 Li L c and L1 L2 L1 Lg regular languages are closed under dil lerence7 union7 intersection exercise 33 giuen FSA that recognize L1 and L2 construct an FSA that recognizes L1 U L2 and L1 L2 Jan 20 2005 MAD 6616 ALGEBRAIC AUTOMATA THEORY NOTES 2 DECISION PROPERTIES AND KLEENE7S THEOREM INSTRUCTOR NATASA JONOS KA 1 DECISION PROPERTIES OF REGULAR LANGUAGES We will denote the class of regular languages with Reg Proposition 11 Pumping Lemma Let L be a regular language recognized by FSA M 621118 with n states Let m E L be such that Z n Then there are u7u7w E 14 such thatv 731 m uuw anduulw EL for alll 012 Proof Since z a1am E L and m 2 n then there is apathp q07a17q1 qm717am7qm in M from an initial state go to a terminal state gm with label m Since m 2 n7 there are lj E 07177m such that qi qj and l j Then we set u a1ai7 u ai1aj and w aj1am in this case u and w may be 1 D Pumping lemma provides an algorithm to check whether a regular language is nonempty or in nite A language L 6 Reg is not empty iff it contains a word with length less than n7 and L is in nite iff it contains a word x such that n g lt 2n lf L LM then the rst algorithm has to check at most lAl 1 MP 1 lAln l words7 and the second has to check 1A1 1 lAlzn l words But clearly one can nd much more ef cient procedures to do this Proposition 12 Let L 6 Reg It is decldlble whether L is a empty 7 in nite Proof Let M Q717T78 be an FSA recognizing L We may assume that M has no empty transitions We trim M by erasing every state q in M such that there is no path from an inital state to q or there is no path from g to a terminal state with this process we get an equivalent FSA to M in which every state lies on an accepting path Let M be trimmed M Then L is not empty if M has a path from an initial state to a terminal state7 and it is in nite if M contains a cycle or a loop One can check this by considering the adjacency matrix MM for M If the states are ordered Q q1 7qn then the lj th entry of Mjlf is 7 0 iff there is a pthe of length h from state qi to state qj The cycles7 loops can be identi ed by the non zero entries in the diagonal of Mg for k Proposition 13 It is decldlble whether two regular languages are equal Proof Let L17 L2 6 Reg Then L1 L2 iff the symmetric difference of L1 and L2 is empty ie L1 U L2 L1 7 L2 D The Pumping Lemma also provides a useful tool to determine whether L Q A is regular Here is one example Consider the language L anbnl n E N Q A Assume that L 6 Reg Let M 627 I T7 8 be an automaton recognizing L with k states By the Pumping Lemma the word z akbk can be pumped ie there is u 7 1 and uw E A such that z uuw and uuiw E L for all nonnegative integers i What can u be If u a9 then uaisw aki 19bk and so 1 uuzw L Similarly we can conclude that u 7 b9 So it must be that u asbt But then uuzw akbtasbk L Thus such 1 does not exist and by the Pumping Lemma L is not regular Exercise 14 39 Which of the following languages are regular In all cases the alphabet is A a 1 L ule w E 14 2 L the set of all words not containing aba as a factor lt3 L M w 6 Ai lwla Mb 4 L the set of all words w such that lwla lwlb and no pre x of w has two more as than b s nor two more b s than as 2 REGULAR EXPRESSIONS Let L1 L2 6 Reg We de ne the product of L1 and L2 to be L1L2 L1L2 wl Eu 6 L1 31 E L2w up Clearly 1 E L1L2 iff 1 6 L1 and 1 6 L2 Similarly we can de ne L0 1 n1 L Ln Lt man and L UfglLi Note that 1 E L always and that 1 E L iff 1 E L Also LL LJr and L U 1 L We will write Lw or wL instead Lw and wL for every word w E A in the alphabet A Note that L is a submonoid of A generated by L and LJr is a subsemigroup of AJr generated by L Proposition 21 If L1L2 6 Reg then L1L2 6 Reg In other words the class of regular languages is closed under taking product of languages Proof Let M Q 1 Ti 81 be an FSA recognizing Li 1 2 We will construct an FSA M 621 T8 recognizing L1L2 in the following way Q Q1 U Q2 I 1 T T2 and E 81 U 82 U q1q lq 6 T1 q E 2 It is clear that LM L1L2 exercise D Proposition 22 IfL 6 Reg then L 6 Reg Proof Let M 621 T 8 be an FSA recognizing L We construct 3W Q I T 8 recognizing L in the following way let go be a state that is not in Q then Q Q U go I go T TUq0 and 8 EUq01qlq E IUq1q0lq E T Clearly LM L exercise D De nition 23 regular expression Let A be an alphabet A regular expression ouer A is de ned recursively as follows 0 is a regular expression and o For each a E A U 1 a is a regular expression and La a o If n x2 are regular expressions then 71 x2 mm and are regular and Lr1 l x2 Lr1 U Lx2 Lr1r2 Lx1Lx2 and 0 Lx1 If r is a regular expression and L Lr then we say that L is represented by r We say that two regular expressions r1 and r2 are equal and write r1 r2 if Lr1 Lr2 To simplify the notation if r is a regular expression we may reffer to the language r instead of Lr Exercise 24 39 Show that distributes ouer ie if r1r2r3 are three regular expressions then r1 r2r3 rlrg TZTg and r3r1 r2 rgrl Tng We are now ready to state the Kleene s theorem Proposition 25 Kleene7s Theorem Let L Q 14 Then L 6 Reg i there is a regular expression r such that L 0 Proof The only if part follows from the closure properties of Reg ie from the propositions 14 15 24 25 and from the fact that every nite language is regular Note that Q is regular since every M Q I T 8 with T Q recognizes 0 So we have to show that if L LOVE for some FSA M then there is a regular expression r such that LOVE LU We will prove this by induction on the number of edges transitions in M If M has no edges ie E Q then either L 1 in the case when O T 7 0 or L Q in the case when I O T 0 In both cases L is represented by a regular expression 1 or Q We introduce some notation Let qq E Q and quz Q q q 8 Then we de ne qu Lqu Then it is clear that L UieLtETLlt So our proof would be nished if we show that for all q q E Q there is a regular expression rqq such that quz Lrqqz Case 1 Let e q1aq2 E 8 such that ql 7 qg and let 3W QIT8 e By the inductive hypothesis quM can be recognized by a regular expression r qh Then for each qq E Q we have qu quM U qu1M aLq2q1M WVLmM By the inductive hypothesis quM can be represented by the regular expression rl zq qu1arq2q1a Tingquot Case 2 If e a a a and NV QI T E e Then for each qq E Q we have qu M qu M U LIAMKW U LamWWW M In this case again by the inductive hypothesis we have that quz can be represented by a regular expression The main observation in case 2 is that if an automaton has only two transitions qaq and qbq and one state two simple cycles can be relabeled ie substituted by two edges then the language Lqu is ab ie a b a b a b excersie show this equality D L1 L1 b V b a o a M b b 39 I W a M1 M2 M3 FIGUREl Example 26 Let M Q717T78 be the automaton presented in Figure 1 and M17M27M3 be as depicted In M we have I and T t7 so LOVE LMM Following the proof of the Kleene s Theorem we have LMM Lit M1 U LiiM1a U Lhmlhihml Lit M1 LitOVEZ U LiqM2aLiqM2aLitM2 LitM2 LitM3 U LitM3aLttM3aLttJ3 Since LitM3 bb and LttM3 1 we get that LMMZ bb bbaa bb1 a1 bba So LitM1 bba bababba 1 babba babba since LiqM2 Now we nd that anl LMMZ U LiqM2aLiqM2WLMMZ But LiqM2 b and LilM2 17 so LidMl 1 baba ba Thus LM babba 1 ba a 1 ba babba 1 baa 1 ba babba 1 a bababba a babba Note baa 1 ba a 1 ba a 1 ba Corollary 27 The class Reg is the smallest class of languages closed under the operations U and Proof Follows from the Kleene s theorem D In the following we consider another de nition of regular expressions De nition 28 generalized regular expression Generalized regular expressions are de ned recursively a 0 and 1 are regular expressions and 0 and L1 1 b for each a E A a is a regular expression and La a c if rhrg are regular expresions then r1 1 r2 771 rlrg and rf are regular expressions and Lr1 1 r2 Lr1 U Lr2 Lr1 Lr1c Lr1r2 Lr1Lr2 and 0 Lr1 This second de nition difers from the rst one only by alowing taking complement of a language and so alowing the intersection Since Reg is closed under intersection and taking complement7 both de nitions determine the same class of languages The difference appears in the question of star height of a language The star height of a regular expression denoted with shr is de ned with a sh sh1 sha 0 for all a 6 A7 b if shr1 i and shr2 j then shr1 r2 shr1r2 maxij c if shr i then shr i 1 The generalized star height of a generalized regular expression r is denoted with gshr and is de ned in the same way as the shr with addition of gshr1 gshr1 De nition 29 star height The generalized star height of a regular language L denoted with gshL ie shL is shL minshrl r is a regular expression and 0 L gshL mingshrl r is a generalized regular expression and 0 L Star height of a regular language is connected with a loop complexity of the automata recognizing the language The question of determining the star height of a language was introducewd by Eggan 1963 In 1966 Dejean and Schiitzenberger showed that for every k there is a language Lk such that shL k Although arbitrary large star height can be achieved7 only in 1988 Hashiguchi proved that the star height is decidable and presented a way to determine shL for a given regular language L However7 the complexity of an algorithm for determining shL for a given L was for the rst time established in 2004 by Daniel Kirseten with the rst estimate being complexity 2220 n whether a given language is of star height one n denotes the number of states of non deterministic automaton recognizing L In all those papers non generalized regular expressions were treated In 1965 Schiitzenberger showed that a regular language has generalized star height 0 iff its syntactic monoid is group free The syntactic monoids will be de ned in a few classes The big7 still open question is Question Is there a regular language with generalized star height 2 To illustrate the differences between the regular expressions and the generalized regular expressions consider the following examples Example 210 1 A La1 ak This implies that gshA 2 ab 1 abA Aab AlbA AaaA This implies that gshab 0 3 a bba 1 Aa 1 a This implies that gsha bb a 0 Exercise 211 Determine the FSA that recognize the following languages aa bb bab b aba fa c a bba
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'