Intro Theory Computation
This 5 page Class Notes was uploaded by Ashleigh Dare on Tuesday September 8, 2015. The Class Notes belongs to ECS 120 at University of California - Davis taught by Staff in Fall. Since its upload, it has received 36 views. For similar materials see /class/187780/ecs-120-university-of-california-davis in Engineering Computer Science at University of California - Davis.
ECS 120 Lesson 7 - Regular Expressions, Pt. 1
Oliver Kreylos
Friday, April 13th, 2001

1 Outline

Thus far, we have been discussing one way to specify a regular language: giving a machine that reads a word and tells whether it is in the language or not. Though this is a valid and unambiguous specification, it is sometimes not a very helpful one. Specifying languages by automata has two major shortcomings. First, when given a language, it is often difficult to construct an automaton that accepts it; second, when given an automaton, it is often difficult to understand which language it accepts.

Regular expressions are an alternative specification method for regular languages. They are easier to construct, and it is easier to see which language they describe by just looking at the expression. Both benefits stem from the fact that regular expressions describe the structure of the words contained in a language, rather than giving a machine that must be run in order to decide a word.

Regular expressions are very common in computer applications, because they are a powerful way to describe patterns in texts. Text editors (in their search-and-replace functions), programming languages such as Perl, and UNIX utilities such as grep, awk and lex all use regular expressions to describe patterns. Programming language compilers typically use regular expressions to define the lowest-level constructs of program source code (tokens), and the stage of the compiler responsible for recognizing tokens (the lexical analyzer, or scanner) is automatically constructed from those regular expressions using the lex utility.

2 Regular Expressions

Regular expressions over an alphabet $\Sigma$ define languages over $\Sigma$ by describing the structure of words in a language. They are based on the three regular operations: union, concatenation and Kleene star. They are very similar to arithmetic expressions like $(3 + 4) \cdot 5$: they consist of constants and operators, and they construct complex expressions from simpler building blocks. As opposed to arithmetic expressions, their values are not numbers
but languages. Examples:

• hello specifies the language consisting of the single word hello.
• hello $\cup$ world specifies the language consisting of the two words hello and world.
• $(aa)^*$ specifies the language of all words consisting of an even number of a's.
• $a^* \circ bb$, often written just as $a^*bb$, specifies the set of all words consisting of any number of a's followed by two b's.

Since every regular expression defines one language, we will write $L(R)$ to denote the language defined by regular expression $R$.

3 Formal Definition of Regular Expressions

Regular expressions over an alphabet $\Sigma$ are defined in a recursive fashion, very similarly to arithmetic expressions. We start by defining the simplest regular expressions, and then define operations to create more complex ones from simpler building blocks.

1. $\emptyset$ is a regular expression defining the empty language, $L(\emptyset) = \emptyset \subset \Sigma^*$.
2. $\varepsilon$ is a regular expression defining the language consisting only of the empty word, $L(\varepsilon) = \{\varepsilon\} \subset \Sigma^*$.
3. If $a \in \Sigma$ is a character, then $a$ is a regular expression over $\Sigma$ defining the language consisting of the single one-character word $a$, $L(a) = \{a\} \subset \Sigma^*$.
4. If $R_1$ and $R_2$ are two regular expressions over the same alphabet $\Sigma$, then $(R_1 \cup R_2)$ is a regular expression over $\Sigma$ defining the union of the languages of $R_1$ and $R_2$, $L((R_1 \cup R_2)) = L(R_1) \cup L(R_2) \subset \Sigma^*$.
5. If $R_1$ and $R_2$ are two regular expressions over the same alphabet $\Sigma$, then $(R_1 \circ R_2)$, often abbreviated as $(R_1 R_2)$, is a regular expression over $\Sigma$ defining the concatenation of the languages of $R_1$ and $R_2$, $L((R_1 \circ R_2)) = L(R_1) \circ L(R_2) \subset \Sigma^*$.
6. If $R$ is a regular expression over $\Sigma$, then $(R^*)$ is a regular expression over $\Sigma$ defining the Kleene star of the language of $R$, $L((R^*)) = L(R)^* \subset \Sigma^*$.

The regular expressions generated by these definitions are fully parenthesized to avoid ambiguities. Here is what the earlier example expressions look like when following the recursive definition, first with the explicit concatenation operator and then without it:

• hello: $(h \circ (e \circ (l \circ (l \circ o)))) = (h(e(l(lo))))$
• hello $\cup$ world: $((h \circ (e \circ (l \circ (l \circ o)))) \cup (w \circ (o \circ (r \circ (l \circ d))))) = ((h(e(l(lo)))) \cup (w(o(r(ld)))))$
• $(aa)^*$: $((a \circ a)^*) = ((aa)^*)$
• $a^*bb$: $((a^*) \circ (b \circ b)) = ((a^*)(bb))$

Even when dropping the explicit concatenation operator $\circ$, the fully parenthesized expressions are almost
impossible to read. Therefore, we introduce rules for implicit parenthesization, similar to those used in arithmetic:

• $R_1 R_2 R_3 = ((R_1 R_2) R_3) = (R_1 (R_2 R_3))$. Since the concatenation operator is associative, we can drop parentheses in sequences of concatenation operations entirely.
• $R_1 \cup R_2 \cup R_3 = ((R_1 \cup R_2) \cup R_3) = (R_1 \cup (R_2 \cup R_3))$. Since the union operator is associative as well, we can drop parentheses in sequences of union operations entirely.
• $R_1 \cup R_2 R_3 = (R_1 \cup (R_2 R_3))$. Concatenation has precedence over union.
• $R_1 R_2^* = (R_1 (R_2^*))$. Kleene star has precedence over concatenation.
• $R_1 \cup R_2^* = (R_1 \cup (R_2^*))$. Kleene star has precedence over union.

We also define the following shorthand notation:

• If $A = \{a_1, a_2, \ldots, a_n\} \subset \Sigma \cup \{\varepsilon\}$ is a set of characters from $\Sigma$ or the symbol $\varepsilon$, then $A$ is a shorthand for $(a_1 \cup a_2 \cup \cdots \cup a_n)$, the regular expression denoting the language $L(A) = \{a_1, a_2, \ldots, a_n\} \subset \Sigma^*$.

Here are some more relevant examples for regular expressions over the ASCII alphabet. In the following, let $L = \{A, \ldots, Z, a, \ldots, z\}$ be the set of letters, and $D = \{0, \ldots, 9\}$ the set of decimal digits.

• $DD^*$ describes all words starting with a digit, followed by any number of digits. This is the set of all unsigned integers in decimal notation (leading zeros are allowed).
• $\{+, -, \varepsilon\}DD^*$ describes the language of all integer constants with an optional sign.
• $\{+, -, \varepsilon\}(DD^* \cup DD^*.D^* \cup D^*.DD^*)(\{E, e\}\{+, -, \varepsilon\}DD^* \cup \varepsilon)$ describes the language of all floating-point constants with an optional sign and exponential part, roughly as recognized by the C programming language. (As written, the expression also matches plain digit strings such as 42, which C treats as integer constants.)
• $(L \cup \{\_\})(L \cup D \cup \{\_\})^*$ describes the language of all valid identifiers in the C programming language, not taking reserved words into account.

4 Equivalence of Regular Expressions and Finite State Machines

Earlier we have claimed that the class of languages that can be described by regular expressions is exactly the class of regular languages. We are now going to prove this statement. First, we will show that the language $L(R)$ generated by any regular expression $R$ is accepted by some NFA $M$. Second, we will show that the language $L(M)$ accepted by any automaton $M$ is generated by some regular expression $R$.

5 Construction of Automata
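from Regular Expressions

We will prove the existence of an automaton that accepts the language generated by a regular expression by structural induction. This means we will follow the recursive definition of regular expressions and construct automata accepting the languages generated by simple regular expressions first, and will then show how to combine those automata to accept the languages generated by more complex ones. For all the following constructions we will assume that all regular expressions are over some alphabet $\Sigma$, and we write $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$.

5.1 Case 1: $R = \emptyset$

If $R = \emptyset$, then $L(R) = \emptyset$. The empty language is accepted by the NFA $M_\emptyset = (\{q_0\}, \Sigma, \delta, q_0, \emptyset)$, where $\delta(q_0, a) = \emptyset$ for all $a \in \Sigma_\varepsilon$. A transition diagram for this automaton is shown in Figure 1.

Figure 1: An NFA $M$ accepting the empty language, $L(M) = \emptyset$.

5.2 Case 2: $R = \varepsilon$

If $R = \varepsilon$, then $L(R) = \{\varepsilon\}$. The language consisting only of the empty word is accepted by the NFA $M_\varepsilon = (\{q_0\}, \Sigma, \delta, q_0, \{q_0\})$, where $\delta(q_0, a) = \emptyset$ for all $a \in \Sigma_\varepsilon$. A transition diagram for this automaton is shown in Figure 2.

Figure 2: An NFA $M$ accepting the language consisting only of the empty word, $L(M) = \{\varepsilon\}$.

5.3 Case 3: $R = a$

If $R = a$ for some character $a \in \Sigma$, then $L(R) = \{a\}$. The language consisting only of the word $a$ is accepted by the NFA $M_a = (\{q_0, q_1\}, \Sigma, \delta, q_0, \{q_1\})$, where

$$\delta(q, x) = \begin{cases} \{q_1\} & \text{if } q = q_0 \text{ and } x = a \\ \emptyset & \text{otherwise} \end{cases} \qquad \text{for all } q \in \{q_0, q_1\},\ x \in \Sigma_\varepsilon.$$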
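
The three base-case automata above can be tried out with a small Python sketch. It is an illustration only, not part of the original notes: the names NFA, EPS, nfa_empty, nfa_epsilon and nfa_char are hypothetical, and the transition function is stored as a dictionary in which every missing entry stands for the empty set.

```python
from dataclasses import dataclass

EPS = ""  # assumption: the empty string stands in for the symbol epsilon


@dataclass
class NFA:
    """A 5-tuple (Q, Sigma, delta, q0, F); missing delta entries mean the empty set."""
    states: set
    alphabet: set
    delta: dict      # maps (state, symbol or EPS) -> set of successor states
    start: str
    accepting: set

    def _closure(self, states):
        # Add every state reachable through epsilon transitions alone.
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, EPS), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts(self, word):
        # Track the set of all states the NFA could be in after each character.
        current = self._closure({self.start})
        for ch in word:
            step = set()
            for q in current:
                step |= set(self.delta.get((q, ch), ()))
            current = self._closure(step)
        return bool(current & self.accepting)


def nfa_empty(alphabet):
    # Case 1: one state, no transitions, no accepting states.
    return NFA({"q0"}, set(alphabet), {}, "q0", set())


def nfa_epsilon(alphabet):
    # Case 2: one state that is both the start state and accepting.
    return NFA({"q0"}, set(alphabet), {}, "q0", {"q0"})


def nfa_char(a, alphabet):
    # Case 3: a single transition from q0 to the accepting state q1 on a.
    return NFA({"q0", "q1"}, set(alphabet), {("q0", a): {"q1"}}, "q0", {"q1"})
```

For example, over the alphabet {a, b}, nfa_char("a", {"a", "b"}).accepts("a") evaluates to True, while the same automaton rejects the empty word, "b" and "aa". The epsilon-closure step is not needed for these three automata, since none of them has epsilon transitions; it is included only so the same class could hold automata that do.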