Popular in Course
Popular in ComputerScienence
This 0 page Class Notes was uploaded by Amira Cormier on Monday November 2, 2015. The Class Notes belongs to CS4110 at California State University - East Bay taught by EdnaReiter in Fall. Since its upload, it has received 27 views. For similar materials see /class/234371/cs4110-california-state-university-east-bay in ComputerScienence at California State University - East Bay.
Reviews for CompilerDesign
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 11/02/15
CS 41106110 Fall 2013 Edna Eddie Reiter COMPILERS I A compiler is a program takes a program written in a source language and translates it into an equivalent program in a target language source program COMPILER target program Normally the equivalent program in 1 machine code relocatable object file Normally a program written in a highlevel programming language error messages Several experimental compilers were developed in the 1950s but the FORTRAN team led by John Backus at M is generally credited as having introduced the first complete compiler in 19 25 man years vs 10 person weeks i Compilers vs Interpreters Compilers translate Interpreter translate one source to executable line of source amp execute Fast Interactive Small size Platform independence Excellent debuggers Mixtures of the two are quite possible Other Applications In addition to the development of a compiler the techniques used in compiler design can be applicable to many problems in computer science 1 Techniques used in a lexical analyzer can be used in text editors information retrieval system and pattern recognition programs 1 Techniques used in a parser can be used in a query processing system such as SQL 1 Many software having a complex frontend may need techniques used in compiler design I A symbolic equation solver which takes an equation as input That program should parse the given input equation a Most of the techniques used in compiler design can be used in Natural Language Processing NLP systems Compiler De nitions Onepass compiling read source code once start to end vs multiple passes Front end up through int code gen all machine independent Back end optimization and machine code generation Lexeme The character sequence forming a token Examples rate 60 Syntax What programs look like Semantics What programs mean Phases of A Compiler LE E lH EleEI Earmaxmalwer KR 6 I SEITE tiE Ell EEI39 H fff 331111301 Inter G Unit Gmemtnr Ermr Hanle Table If f WEEK Coda 3 tjmizer r Earle Gmmatnr Phases of A Compiler Source Lexical Syntax Semantic Intermediate Code Code Target gt gt gt gt gt gt gt Program Analyzer Analyzer Analyzer Code Generator Optimizer Generator Program Each phase transforms the source program from one representation into another representation They communicate with error handlers They communicate with the symbol table Compiler Architecture In more detail Intermediate Source gt gt Target Language Language Separation of Concerns Retargeting Compiler Architecture Language structure Course Outline Introduction to Compiling Symbol tables Lexical Analysis Syntax Analysis 1 Context Free Grammars a TopDown Parsing LL Parsing a BottomUp Parsing LR Parsing SyntaxDirected Translation Semantic Analysis Type Checking Using Sym tables RunTime Organization Intermediate Code Generation Possible optimization machine code generation Lexical Analysis Scanning Source languag e Tokens described formally Breaks input into tokens Remove white space 7 It Lexical Analyzer Lexical Analyzer reads the source program character by character and returns the tokens of the source program A token describes a pattern of characters having same meaning in the source program such as identifiers operators keywords numbers delimeters and so on EX newval oldval 12 gt tokens newval identifier assignment operator oldval identifier add operator 12 anumber Identifers need information names and info stored in the Symbol Table Example newval type int Ioc Input result I a b C d Tokens ire ultl i1 a7 17 b7 1 7 CI 7 t J perators Static Analysis Parsing Syntac c 7 structure 7 i 39 tokens Source language wl language Syntax described formally Tokens organized into syntax 5 tree that describes structure Error checking syntax Syntax Analyzer A Syntax Analyzer creates the syntactic structure generally a parse tree of the given program A syntax analyzer is also called as a parser A parse tree describes a syntactic structure 93 identifier PW newval expression expressid rlcontext free grammar identifier number oldval 12 o In a parse tree all terminals are at leaves 0 All inner nodes are nonterminals in Parsing Techniques Depending on how the parse tree is created there are different parsing techniques TopDown Parsing a Construction of the parse tree starts at the root and proceeds towards the leaves a Efficient topdown parsers can be easily constructed by hand BottomUp Parsing a Construction of the parse tree starts at the leaves and proceeds towards the root u Normally efficient bottomup parsers are created with the help of some software tools Semantic Analysis Syntac c structure language language Meaning TypeError Checking Intermediate Code abstract machine Intermediate Code Generation A compiler may produce an explicit intermediate codes representing the source program These intermediate codes are generally machine architecture independent But the level of intermediate codes is close to the level pf machine codes Code Optimizer for Intermediate Code Generator The code optimizer optimizes the code produced by the intermediate code generator in the terms of time and space Ex MULT id2id3temp1 ADD temp11id1 Code Generator Produces the target language in a specific architecture The target program is normally is a relocatable object file containing the machine codes I EX assume that we have an architecture with instructions whose at least one of its operands is a machine register MOVE MULT ADD MOVE id2R1 id3R1 1R1 R1id1 Inputz result a b C 1 Exp Assign Exp Exp Exp Exp Exp Exp Exp Exp ID ID Exp YER 22 CONTD After intermediate code generation temp1 inttoreal60 temp2 id3 temp1 temp3 id2 temp2 id1 temp3 After code optimization temp1 id3 600 id1 id2 temp1 After final code generation MULF 600 R2 MOVF id3 R2 MOVF id2 R1 ADDF R2R1 MOVF R1 id1 CONTD After semantic analysis Id2 Id3 inttoreal 60 id CONTD After syntax analysis CONTD Example source statement position initial rate 60 After lexical analysis id1 id2 id3 60 and three symbols are should be in the symbol table posMon initial rate If an identifier is used without declaration then what we ll worry about that later Code Generation Syntacticsemart Target anguage Source language S acticsemanti quot C structure lC to real machine code Memory management register allocation instruction selection instruction scheduling 27 Optimization Source language language ntacticsemantic structure Improving efficiency machin ea 39 independent Finding optimal code is NP 28 Code Generation Target anguage Source language Syntacticsemantic quot structure lC to real machine code Memory management register allocation instruction selection instruction scheduling 29 Issues Driving Compiler Design I Correctness I Speed runtime and compile time 1 Degrees of optimization 1 Multiple passes I Space I Feedback to user I Debugging 30 Symbol Table dictionary for identi ers Entry for each identifier Store attributes exactly what to be determined later oldentifiers in different scopes have different entries From Prog Lang identifier has 1 name 2 type 3 location 4 scope 5 lifetime 6 aliases What will we need Name and more to be determined Hash Tables Notation u U Universe of all possible keys a K Set of keys actually stored in the dictionary El n I When U is very large a Arrays are not practical a K ltlt U Use a table of size proportional to K The hash tables a However we lose the directaddressing ability u Define functions that map keys to slots of the hash table Hashing Hash function h Mapping from U to the slots of a hash table TOm 1 h U gt O1 m 1 With arrays key k maps to slot Ak With hash tables key k maps or hashes to slot 7Thikii hk is the hash value of key k Hashing 0 U universe of keys hk1 Mk4 actual 39 39 h k h k keys 2 5 Mk3 m l Issues with Hashing Multiple keys can hash to the same slot collisions are possible a Design hash functions such that collisions are minimized 1 But avoiding collisions is impossible I Design collisionresolution techniques Search will cost 9n time in the worst case 1 However all operations can be made to have an expected complexity of 61 Methods of Resolution I Chaining a Store all elements that hash to the same slot in a linked list a Store a pointer to the head of the linked list in the hash table slot Open Addressing a All elements stored in hash table itself a When collisions occur use a systematic consistent procedure to store elements in free slots of the table Collision Resolution by Chaining U universe of keys hk1hk4 K 23131 k2 hltk2gthltk5gthltk6gt hltk3gthltk7gt mks m l Collision Resolution by Chaining 0 U universe of keys K actual keys k2 Hashing with Chairng Dictionary Operations ChainedHashlnsert T X a Insert X at the head of list Thkeyx u Worstcase complexity 01 Chained HashSearch T k a Search an element with key k in list Thk u Worstcase complexity proportional to length of list I ChainedHashPrint T a Print contents of entire symbol table ChainedHashDelete Tk u Likely not necessary for symbol tables Keys as Natural Numbers Hash functions assume that the keys are natural numbers When they are not have to interpret them as natural numbers Example 1 Interpret a character string as an integer expressed in some radix notation Suppose the string is CLRS u ASCII values C67 L76 R82 883 a There are 128 basic ASCII values 1 So CLRS 671283761282 821281 831280 141764947 Keys as Natural Numbers Examgle 2 Make an integer from character string by adding the digits 1 ASCII values 067 L76 R82 883 1 So CLRS 6776 82 83 Advantage easy a Disadvantage y2y and yy2 collide Division Method Map a key k into one of the m slots by taking the remainder of k divided by m That is hk k mod m Example m 31 and k 78 gt hk 16 Advantage Fast since requires just one division operation Disadvantage Have to avoid certain values of m a Don t pick certain values such as m2p in Or hash won t depend on all bits of k Good choice for m o Primes not too close to power of 2 or 10 are good One simple hash that works Add characters in the string mod by the size of the table m I Remember to choose good m u Primes not too close to power of 2 or 10 are good Alternative use hash tables built into the language that you are using Java net i Certainly acceptable solution I example Symbol table 1 at 3b be a b a b ab 3 b ab C b 0 a 65 b 66 067 abba131 Mod 5 a O b 1 02 abba 1 O 1 30 bO a baO a abO nil nil nil I example Symbol table 1 at 3b be a b 0 32 9 31 9 30 1 b3 9 ab2 9 b2 9 ab1 9 a b ab b1 9 bO 9 baO 9 abO 2 03 9c2 a b ab C b C 6 3 ml 4 nil a 65 b 66 067 abba131 Mod 5 a O b 1 02 abba 1 I example ab be a b a b ab 3 b ab C b 0 a 65 b 66 067 abba131 Mod 5 a O b 1 02 abba 1 Symbol table 2 at ST blk 3 actT o nil 1 b 2 c 3 nil 4 nil ST blk 2 act T o a 1 ab 9 b 2 c 34 nil ST blk 1 act F o a 1 ab 9 b 345 nil ST blk 0 act T o a 1 b 9 ba 9 ab 345 nil 0116 ST VS stack of STS One hash table Hash table for each I Need block number in SCOIOG each record Stack the hash tables I Add at beginning of I When leave a scope chain mark inactive OR delete Issues Driving Compiler Design I Correctness I Speed runtime and compile time 1 Degrees of optimization 1 Multiple passes I Space I Feedback to user I Debugging 48 Related to Compilers Interpreters direct execution Assemblers I Preprocessors Text formatters nonWYSIWYG I Analysis tools 49