Programming Lang & Compilers
Programming Lang & Compilers CS 5363
Popular in Course
Popular in ComputerScienence
verified elite notetaker
This 24 page Class Notes was uploaded by Mireya Heidenreich on Thursday October 29, 2015. The Class Notes belongs to CS 5363 at University of Texas at San Antonio taught by Qing Yi in Fall. Since its upload, it has received 11 views. For similar materials see /class/231383/cs-5363-university-of-texas-at-san-antonio in ComputerScienence at University of Texas at San Antonio.
Reviews for Programming Lang & Compilers
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 10/29/15
Lexical Analysis Scanners Regular expressions and Automata c55363 1 Changes to Syllabus El Grade distribution I 30 projects I 25 homeworks I 15 midterm exam I 30 final exam El Office hours I MW 400530pm c55363 Phases of compilation Compilers Read input program 9 optimization 9 translate into machine code front end mid end back end A A F L xi l Semantic Code e ca parsmg Assembler analySIs analySIs generation Characters Sentences statements Wordsstrings Meaning translation c55363 3 Lexical analysis n The first phase of compilation I Also known as lexer scanner I Takes a stream of characters and returns tokens words I Each token has a type and an optional value I Called by the parser each time a new token is needed IF LPARAN if a b c a 22 ltID b gt RPARAN ltID C gt ASSIGN ltID a gt c55363 4 Lexical analysis a Typical tokens of programming languages I Reserved words class int char boo I Identifiers abc def mmm mine I Constant numbers 123 12345 12E3 I Operators and separators lt lt a Goal I recognize token classes report error if a string does not match any class Each token class could be A single reserved word CLASS INT CHAR A single operator LE LT ADD A single separator LPARAN RPARAN COMMA The group of all identifiers ltID a gt ltID b gt The group of all integer constant ltINTNUM 1gt The group of all floating point numbers ltFLOAT 10gt c55363 Simple recognizers c 6 NextChar if c I f then do something else c 6 NextChar if c I e then do something else c 6 NextChar if c I e then do something else report success c 6 NextChar if c 0 then report success else if c lt 1 or c gt 9 then do something else c 6 NextChar while c gt 0 and c lt 9 c 6 NextChar report success c55363 Multiple token recognizers c 6 NextChar if c 5 f then if c 5 w then do something else c 6 NextChar if c 5 h then do something else else c 6 NextChar if c 5 e then if c 5 i then do something else else c 6 NextChar if c 5 e then do something else report success c55363 What About Automation I Have you seen a pattern Each recognizer is a finite state machine or finite automata a Each state remembers what characters have been read and what characters to expect a Each state corresponds to a distinct program point in the scanning algorithm a No additional storage other than the input buffer and the current input pointer is required I Can we automatically generate the recognition scanning algorithm Needs an interface language that describes what tokens need to be recognized Needs to translate token descriptions to a finite automata a finite state machine Needs to implement compileinterpret the finite automata c55363 Describing tokens a Each token type is a set of strings CLASS class LE lt ADD ID strings that start with a letter INTNUM strings composed of only digits FLOAT I Use formal language theory to describe sets of strings An alphabet Z is a finite set of all characterssymbols e39g39 alblquot39ZI0I1Iquot399I I I II ltI gtI I A string over 2 is a sequence of characters drawn from Z eg abc begin end class if a then b Empty string a A formal language is a set of strings over 2 class lt abc def 3 210 1 The C programming language EngHsh c55363 Operations on strings and languages El Operations on strings I Concatenation abc def abcdef a Can also be written as 152 or 51 2 i Exponentiation s sssssssss I n Operations on languages Union L1 U L2 x XELl oerLZ Concatenation L1L2 xy x E L1 and y E L2 Exponentiation L39 x39 x E L Kleene closure L xi x E L and i gt 0 C55363 10 Regular expression El A subset of formal languages I La the formal language described by or a Regular expressions over 2 a recursive definition I The empty string 8 is a re Ls s I For each s E Z sis a re Ls s I If or and 5 are regular expressions then a 0c is a re and Lcx Lcx i 043 is a re and Lcx3 LcxL3 n cxlBisareLoc3LocUL3 El xi iS a re Lia Lix El 0 iS a re L0L L0L C55363 11 Regular Expression Examples n Examples a b 9 a b a b a b 9 aa ab ba bb a 9 a aa aaa aaaa aa 9 a aa aaa aaaa a b 9 all strings over ab a a b 9 all strings over ab that start with a a a b b 9 all strings start with and end with b El Character classes short hands abcdabcd a zabz Ia fO 3abf0123 I Aa f z a f C55363 12 What languages can be defined by regular expressions letterABCZ digit012345 ID letter letter digit NAT digit digit FLOAT digit NAT NAT digit EXP NATe E NAT INT NAT NAT abcz 6789 What languages can be defined by regular expressions alternatives and loops each definition can refer to only previous definitions no recursion C55363 13 Exercises write regular expressions El Given an alphabet ZO1 describe I The set of all strings of alternating pairs of Os and pairs of is I The set of all strings that contain an even number of Os or an even number of is El Write a regular expression to describe I Any sequence of tabs and blanks white space I Comments in C programming language C55363 14 Finite Automata finite state machines n Deterministic finite automata DFA I A set of states S n A start initial state so u A set F of final accepting states I Alphabet Z a set of input symbols I Transition function 6 S X Z S in Example 6 1 a 2 n Non deterministic finite automata NFA I Transition function 6 S X 2 n s 2 s in Where 8 represents the empty string in Example 6 1 a 23 6 2 a 4 El a Language accepted by FA I All strings that correspond to a path from the start state sO to a final state fE F C55363 15 DFA examples b sta rt Accepted language a b C55363 16 NFA examples b b a Accepted language albabb a a we start 8 b b a Accepted language a b sta rt 0 C55363 17 Implementing DFA Char NextChar state 6 so while char I eof and state I ERROR state 66 state char char 6 NextChar if state E F then report acceptance else report failure S s0s1s2 Z O123456789 6s00 1 6s019 2 6s209 2 F s1s2 c55363 Automatically building scanners a Regular Expressionslexical patterns 9 NFA II NFA 9 DFA n DFA 9 Lexical Analyzer DFA interpreter Char NextChar state 6 50 While char I eof and state I ERROR state 66 state char char 6 NextChar if state E F then report acceptance Else report failure Lexical patterns c55363 scanner Input puffer 39RR DFA interpreter I Scanner generator transition DFA39 table Converting RE to NFA n Thompson s construction I Takes a re r and returns NFA Nr that accepts Lr n Recursive rules I For each symbol c E Z s define NFA Nc as C I Alternation if r r1 r2 build Nr 8 7 O s I Concatenation if r r1r2 build Nr as 8 Nr1 8 Nr2 5 c55363 RE to NFA examples ab start 8 8 990 o 88 alb 8 start a u as 8 439 8 C55363 21 Simulating NFA n Movement through NFA on each input character I Similar to DFA simulation but must deal with multiple transitions from a set of states I Idea each DFA state correspond to a set of NFA states I s is a single state 8cosuret s s is reachable from t through Stransitions I T is a set of states S closureT s El t E T st s E 8cosuret S closures0 a nextchar while a eof S 8cosure moveSa a nextchar If S U F Q return yes else return no C55363 22 DFAbased lexical analyzers El Convert composite NFA to DFA before simulation l Match the longest string before terminiation Match the pattern specification with highest priority add scosuresO to Dstates unmarked while there is unmarked T in Dstates do mark T for each symbol c in 2 do begin U scosuremoveT c DtransT C U if U is not in Dstates then add U to Dstates unmarked C55363 23 Convert NFA to DFA example i NFA Start a o b b D5tate5 Scosure50 50 DtranssOa cosuremove50 a 5051 DtranssOb 8 cosuremove50 b 50 D5tate5 50 5051 DtranssOsla Scosuremove5051 a 5051 DtranssOslb 8 cosuremove5051 b 5052 D5tate5 50 5051 5052 Dtrans5052a 8 cosuremove5052 a 5051 Dtrans5052b 8 cosuremove5052 b 5053 D5tate5 50 5051 5052 5053 Dtrans5053a 8 cosuremove5053 a 5051 Dtrans5053b 8 cosuremove5053 b 50 c55363 24