Popular in Course
verified elite notetaker
Popular in Computer Information Technology
verified elite notetaker
This 123 page Class Notes was uploaded by Kathleen Cartwright on Monday September 28, 2015. The Class Notes belongs to CIS500 at University of Pennsylvania taught by Staff in Fall. Since its upload, it has received 5 views. For similar materials see /class/215377/cis500-university-of-pennsylvania in Computer Information Technology at University of Pennsylvania.
Reviews for SOFTWAREFOUNDATIONS
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 09/28/15
Introduction to the Objective Caml Programming Language Jason Hickey September 4 2002 Contents N W 5 U1 Introduction 7 11 Functional and imperative languages 8 12 Organization 9 13 Additional Sources of Information 9 Simple Expressions 1 1 21 Basic expressions 11 211 unit the singleton type 12 212 39int the integers 12 213 float the oatingpoint numbers 12 214 char the characters 13 215 str39i ng character strings 13 216 bool the Boolean values 14 22 Compiling your code 14 23 The OCaml type system 15 24 Comment convention 16 Variables and Functions 1 7 31 Functions 18 311 Scoping and nested functions 20 312 Recursive functions 20 313 Higher order functions 21 32 Variable names 23 Basic Pattern Matching 25 41 Incomplete matches 28 42 Patterns are everywhere 29 Tuples Iists and Polymorphism 31 5 Polymorphism 31 511 Value restriction 32 512 Comparison with other languages 33 52 Tuples 34 53 Lists 36 4 CONTENTS 6 Unions 39 61 Unbalanced binary trees 40 62 Unbalanced ordered binary trees 41 63 Balanced redblack trees 42 64 Some common builtin unions 44 7 Exceptions 47 8 Records Arrays and SideEffects 51 81 Records 51 811 Imperative record updates 52 812 Field label namespace 54 82 References 54 821 Value restriction 55 83 Arrays and strings 56 84 Sequential execution and looping 57 841 wh39ileloops 58 842 for loop 58 9 Input and Output 61 91 File opening and closing 61 92 Writing and reading values on a channel 63 93 Channel manipulation 63 94 Printf 64 95 String buffers 66 10 Files Compilation Units and Programs 67 101 Signatures 68 1011 Type declarations 68 1012 Method declarations 69 102 Implementations 69 1021 Type de nitions 70 1022 Method de nitions 70 103 Building a program 71 104 Compiling the program 72 1041 Where is the main function 73 1042 Some common errors 74 105 Using open to expose a namespace 76 1051 A note about open 76 106 Debugging a program 77 1 1 The OCaml Module System 81 111 Module signatures 82 112 Module structures 83 1121 Assigning a signature 84 113 Functors 85 1131 Using a functor 88 CONTENTS 5 1132 Sharing constraints 89 1133 An implementation that works 89 12 The OCaml Object System 91 121 The basic object system 91 1211 Class types 91 1212 Class expressions 92 1213 Objects 93 1214 Parameterized class expressions 94 122 Polymorphism 95 123 Inheritance 96 1231 Multiple inheritance 97 1232 Virtual methods 99 1233 Subclassing 100 1234 Superclassing or typecase 101 124 Functional objects 105 CONTENTS Chapter 1 Introduction This document is an introduction to ML programming speci cally for the Objective Caml OCaml programming language from INRIA 4 6 OCaml is a dialect of the ML MetaLanguage family of languages which derive from the Classic ML language designed by Robin Milner in 1975 for the LCF Logic of Computable Functions theorem prover 2 3 OCaml shares many features with other dialects of ML and it provides several new features of its own Throughout this document we use the term ML to stand for any of the dialects of ML and OCaml when a feature is speci c to OCaml ML is a functional language meaning that functions are treated as rstclass values Functions may be nested functions may be passed as arguments to other functions and functions can be stored in data structures Functions are treated like their mathematical counterparts as much as possible Assignment statements that permanently change the value of certain expressions are permitted but used much less frequently than in languages like C or Java ML is strongly typed meaning that the type of every variable and every expression in a program is determined at compiletime Programs that pass the type checker are safe they will never go wrongquot because of an illegal instruction or memory fault Related to strong typing ML uses type inference to infer types for the expressions in a program Even though the language is strongly typed it is rare that the programmer has to annotate a program with type constraints The ML type system is polymorphic meaning that it is possible to write programs that work for values of any type For example it is 7 CHAPTER 1 INTRODUCTION straightforward to de ne data structures like lists stacks and trees that can contain elements of any type In a language like C or Java the programmer would either have to write different implementations for each type say lists of integers vs lists of oatingpoint values or else use explicit coercions to bypass the type system ML implements a pattern matching mechanism that uni es case anal ysis and data destructors ML includes an expressive module system that allows data structures to be speci ed and de ned abstractly The module system includes functors which are are functions over modules that can be used to produce one data structure from another OCaml is also the only widelyavailable ML implementation to include an object system The module system and object system complement one another the module system provides data abstraction and the object system provides inheritance and reuse OCaml includes a compiler that supports separate compilation This makes the development process easier by reducing the amount of code that must be recompiled when a program is modi ed OCaml actually includes two compilers a bytecode compiler that produces code for the portable OCaml bytecode interpreter and a nativecode compiler that produces ef cient code for many machine architectures One other feature should be mentioned all the languages in the ML family have a formal semantics which means that programs have a mathematical interpretation making the programming language eas ier to understand and explain 11 Functional and imperative languages The ML languages are semifunctional which means that the normal pro gramming style is functional but the language includes assignment and sideeffects To compare ML with an imperative language here is how Euclid s algo rithm would normally be written in an imperative language like C 7quot Determine the greatest common divisor of two positive numbers a and b We assume agtb int gcdint a int b int r 12 ORGANIZATION 9 wh39iler a b a b r return b In a language like C the algorithm is normally implemented as a loop and progress is made by modifying the state Reasoning about this program requires that we reason about the program state give an invariant for the loop and show that the state makes progress on each step toward the goal In OCaml Euclid s algorithm is normally implemented using recursion The steps are the same but there are no sideeffects The 1 et keyword spec i es a de nition the rec keyword speci es that the de nition is recursive and the gcd a b de nes a function with two arguments a and b let rec gcdab ramodb39in 39ifr0then else gcd b r In ML programs rarely use assignment or sideeffects except for IO Functional programs have some nice properties one is that data structures are persistent by de nition which means that no data structure is ever destroyed There are problems with taking too strong a stance in favor of functional programming One is that every updatable data structure has to be passed as an argument to every function that uses it this is called threading the state This can make the code obscure if there are too many of these data structures We take a moderate approach We use imperative code when necessary but its use is discouraged 12 Organization This document is organized as a user guide to programming in OCaml It is not a reference manual there is already an online reference manual I assume that the reader already has some experience using an imperative programming language like C I ll point out the differences between ML and C in the cases that seem appropriate 13 Additional Sources of Information This document was originally used for a course in compiler construc tion at Caltech The course material including exercises is available at 10 CHAPTER 1 INTRODUCTION httpwwwcsca1techeducoursescsl34csl34b The OCaml reference manual 4 is available on the OCaml home page httpwwwocam1org The author can be reached at jyhcs ca1 tech edu Chapter 2 Simple Expressions Many functional programming implementations include a signi cant runtime that de nes a standard library and a garbage collector They also often include a toploop that can be used to interact with the system OCaml provides a compiler a runtime and a toploop By default the toploop is called ocaml The toploop prints a prompt reads an input expression evaluates it and prints the result Expressions in the toploop must be terminated by a doublesemicolon My machine name is kena39i ltkena39i 113gtocaml Objective Caml version 204 14 39int5 The toploop prints the type of the result in this case 39int and the value 5 To exit the toploop you may type the endof le character usually ControlD in Unix and ControlZ in Windows 21 Basic expressions OCaml is a strongly typed language every expression must have a type and expressions of one type may not be used as expressions in another type There are no implicit coercions Normally you do not have to input the types of expressions Type inference 1 is used to gure out the types for you The basic types are un39it 39int char float b001 and string 11 12 CHAPTER 2 SIMPLE EXPRESSIONS 211 unit the singleton type The simplest type in OCaml is the uni t type which contains one element 0 This seems to be a rather silly type In a functional language every func tion must return a value unit is commonly used as the value of a procedure that computes by sideeffect It corresponds to the vo i d type in C 212 int the integers The 39int type is the type of signed integers 72 710 1 2 The pre cision is nite On a 32bit machine architecture the precision is 31 bits one bit is reserved for use by the runtime and on a 64bit architecture the precision is 63 bits There are the usual expressions on integers addition subtraction multiplication division and mod remainder In addition there are the normal shifting and masking operators on the binary representations of the numbers i 151 j logical shift left i 21 i lsr j logical shift right i 2 i is treated as an unsigned twos complement number i asl j arithmetic shift left i 2 i asr j arithmetic shift right i 2 the sign of i is preserved 139 land j bitwiseand i lor j bitwiseor i lxor j bitwise exclusiveor 213 f1 oat the oatingpoint numbers The oatingpoint numbers provide fractional numbers The syntax of a oating point includes either a decimal point or an exponent base 10 de noted by an E A digit is required before the decimal point Here are a few examples 02 2e7 31415926 31415926e1 The integer arithmetic operators 710 not work with oating point values The corresponding operators include a addition is subtraction is multiplication is and division is There are also coercion functions 39intoff39 oat and fl oatofi nt 21 BASIC EXPRESSIONS 13 ltkena39i 114gt ocam Objective Caml version 204 31415926e 1 float 31415926 floatof39i nt 1 float 1 39intoffloat 12 1nt 1 31415926 172 float 5403539272 214 char the characters The character type is implemented as characters from the ASCII character set The syntax for a character constant uses single quotes a Z 120 ti V n The numerical speci cation is in decimal so 120 is the ASCII char acter x not P There are functions for converting between characters and integers The function Charcode returns the integer corresponding to a charac ter and Charchr returns the character with the given ASCII code The Char lowercase and Char uppercase functions give the equivalent lower or uppercase characters 215 st ring character strings Character strings are a builtin type Unlike strings in C character strings are not arrays of characters and they do not use 000 as a termination character The St r39i ng l ength function computes the length of the string The syntax for strings uses doublequotes Characters in the string may use the ddd syntax quotHelloquot worl d000 39is not a term39i natorn The A operator performs string concatenation quotHello A quotwor39IdOOONot a term39inatorn string quotHello worldOOONot a term39inatorn Pr39intfpr39intf s quotHello A quotwor39IdOOONot a term39inatorn Hello worldNot a terminator un39it 0 Strings also allow random access The s 39i operator gets character 139 from string S and the command 5 39i lt c replaces character 139 in string S by character 6 14 CHAPTER 2 SIMPLE EXPRESSIONS 216 bool the Boolean values There are only two Boolean values true and false Every relation returns a Boolean value Logical negation is performed by the not function The standard binary relations take two values of equal types and compare them in the normal way x y equality x ltgt 3 x is not equal toy x lt y x is less thany x lt y x is no more thany x gt y x is no less thany x gt y x is greater than 3 These relations operate on values of arbitrary type For the base types in this chapter the comparison is what you would expect For values of other types the value is implementationdependent The logical operators are also de ned ampamp is conjunction and is dis junction Both operators are the shortcircuit versions the second clause is not evaluated if the result can be determined from the rst clause For ex ample in the following expression the clause 3 gt 4 is not evaluated which makes no difference at this point but will make a difference when we add sideeffects 1lt23gt4 booltrue There is also a conditional operator 39if b then 21 else 22 if1lt 2 then 37 else 4 39int 10 22 Compiling your code If you wish to compile your code you should place it in a le with the m1 suf x There are two compilers ocamlc compiles to bytecode and ocaml opt compiles to native machine code The native code is roughly three times faster but compile time is longer The usage is similar to cc The doublesemicolon terminators are not necessary in the source les you may omit them if the source text is unambiguous 23 THE OCAML TYPE SYSTEM 15 To compile a single le use ocamlc g c lem39 This will produce a le lecmo The ocaml opt programs produces a le le cmx The g option includes debugging information in the out put le To link together several les into a single executable use ocamlc to link the cmo les Normally you would also specify the o program 79 option to specify the output le the default is a out for example if you have to program les x cmo and y cmo the com mand would be ltkena39i 165gtocamlc g 0 program xcmo ycmo ltkena39i 166gtprogram There is also a debugger ocaml debug that you can use to debug your programs The usage is a lot like gdb with one major exception execution can go backwards The back command will go back one instruction 23 The OCaml type system The ML languages are strictly typed In addition every expression has a exactly one type In contrast C is a weaklytyped language values of one type can be coerced to a value of any other type whether the coercion makes sense or not Lisp is not an explicitly typed language the compiler or interpreter will accept any program that is syntactically correct the types are checked at run time The type system is not necessarily related to safety both Lisp and ML are safe languages while C is not What is safety There is a formal de nition based on the operational semantics of the programming language but an approximate de nition is that a valid program will never fault because of an invalid machine oper ation All memory accesses will be valid ML guarantees safety by guar anteeing that every correctlytyped value is valid and Lisp guarantees it by checking for validity at run time One surprising some would say annoying consequence is that ML has no nil values again all correctly type values are va 1 As you learn OCaml you will initially spend a lot of time getting the OCaml type checker to accept your programs Be patient you will eventually nd that the type checker is one of your best friends It will help you gure out which programs are bogus If you make a change the type checker will help track down all the parts of your program that are affected In the meantime here are some rules about type checking 1 Every expression has exactly one type 2 When an expression is evaluated one of four things may happen 16 CHAPTER 2 SIMPLE EXPRESSIONS One of the important points here is that there are no pure commandsquot Even assignments produce a value although the value has the trivial un39i t type To begin to see how this works let s look at the conditional expression ltkena39i 229gtcat b xml 1 lt 2 then 2 1 3 else ltkena39i 230gtocamlc c xml F39ile quotxml line 4 characters 3 6 This expression has type float but 39is here used with type 39int This error message seems rather cryptic it says that there is a type error on line 4 characters 36 the expression 1 3 The conditional expression evaluates the test If the test is t rue it evaluates the rst branch Otherwise it evaluates the second branch In general the compiler doesn t try to gure out the value of the test during type checking Instead it requires that both branches of the conditional have the same type so that the value will have the same type no matter how the test turns out Since the expressions 1 and 1 3 have different types the type checker generates an error One other issue the else branch is not required in a conditional If it is omitted the conditional is treated as if the else case returns the Q value The following code has a type error ltkena39i 236gtcat b yml 39 1 lt 2 then 2 1 ltkena39i 237gtocamlc c yml F39ile quotymlquot line 2 characters 3 4 This expression has type 39int but 39is here used with type unit In this case the expression 1 is agged as a type error because it does not have the same type as the omitted else branch 24 Comment convention In OCaml comments are enclosed in matching and pairs Comments may be nested and the comment is treated as White space Chapter 3 Variables and Functions So far we have only considered simple expressions not involving vari ables In ML variables are names for values In a purely functional setting it is not possible to tell the difference between a variable and the value it stands for Variable bindings are introduced with the let keyword The syntax of a simple toplevel declaration is as follows let name expr For example the following code de nes two variables x and y and adds them together to get a value for 2 let x 1 valx 1nt1 let y 2 valy 1nt2 let Zxy valz 1nt3 De nitions using let can also be nested using the 39in form let name exprl 39in exprZ The expression exan is called the body of the let The variable name is de ned as the value of exprl within the body If there is only one de nition for name the variable name is de ned only in the body exan and not exprl or anywhere else Lets with a body are expressions the value of a let expression is the value of the body 17 18 CHAPTER 3 VARIABLES AND FUNCTIONS letx139in lety239in Xy 39int3 letz letx139in lety239in Xy valz39int3 Binding is static lexical scoping if there is more than one de nition for a variable the value of the variable is de ned by the most recent let de nition for the variable The variable is bound only in the body of the let or for toplevel de nitions the rest of the le letx139in letx239in letyxx in Xy 39int6 What is the value of z in the following de nition xx valz 39int8 x 1nt1 31 Functions Functions are de ned with the fun keyword The fun is followed by a se quence of variables that name the arguments the gt separator and then the body of the function By default functions are not named In ML functions are values like any other They may be constructed passed as arguments and applied to arguments Like any other value they may be named by using a let let 39incr fun 39i gt 39i 1 val 39incr 39int gt 39int ltfungt 31 FUNCTIONS 19 Note the type 39int gt 39int for the function The gt is for a function type The type before the arrow is the type of the function s argument and the type after the arrow is the type of the result The 39incr function takes an integer argument and returns an integer The syntax for function application function call is concatenation the function is followed by its arguments The precedence of function aplication is higher than most operators Parentheses are needed for arguments that are not simple expressions incr 2 int 3 incr 2 3 int 9 39incr 2 3 int 7 Functions may also be de ned with multiple arguments For example a function to compute the sum of two integers can be de ned as follows let sumfun39ij gt39i j val sum 39int gt 39int gt 39int ltfungt sum 3 4 39int 7 Note the type for sum 39int gt 39int gt 39int The arrow associates to the right so this could also be written 39int gt 39i nt gt 39int That is sum is a function that takes a single integer argument and returns a function that takes another integer argument and returns an integer Strictly speaking all functions in ML take a single argument multipleargument functions are implemented as nested functions this is called Currying after Haskell Cu r a famous logician who had a signi cant impact on the design and interpretation of programming languages The de nition of sum above is equivalent to the following explicitlycurried de nition let sum fun 39i gt fun j gt 39i j val sum 39int gt 39int gt 39int ltfungt The application of sum to only one argument is called a partial applica nquot let 39incr sum 1 val 39incr 39int gt 39int ltfungt 39incr 5 39int 6 Since named functions are so common OCaml provides an alternate syntax for functions using a let de nition The formal parameters of the function are listed after to the function name before the equality symbol 20 CHAPTER 3 VARIABLES AND FUNCTIONS let sum 39i j 139 J39 val sum 39int gt 39int gt 39int ltfungt 311 Scoping and nested functions Functions may be arbitrarily nested They may also be de ned and passed as arguments The rule for scoping uses static binding the value of a variable is determined by the code in which a function is de nedinot by the code in which a function is evaluated For example another way to de ne sum is as follows let sum 39i let sumZ j 39ij 39in sum2 val sum 39int gt 39int gt 39int ltfungt sum 3 4 39int 7 To illustrate the scoping rules let s consider the following de nition let1395 val 39i 39int 5 let add39ij 139jquot val add39i 39int gt 39int ltfungt let39i 7 val39i 1nt7 add39i 3 val 8 In the addi function the value of 39i is de ned by the previous de nition of i as S The second de nition of i has no effect on the de nition for add39i and the application of add39i to the argument 3 results in 3 S 8 312 Recursive functions Suppose we want to de ne a recursive function that is a function that is used in its own function body In functional languages recursion is used to express repetition and looping For example the power function that computes xi would be de ned as follows let rec power 39i x 39if39i 0 then 31 FUNCTIONS 21 10 else x power 39i 1 x val power 39int gt float gt float ltfungt power 5 20 float 32 Note the use of the rec modi er after the let keyword Normally the function is not de ned in its own body The following de nition is very different let powerbroken 39i x floatof39int 39i x val powerbroken 39int gt float gt float ltfungt let powerbroken 39i x 39if 39i 0 then 10 else x powerbroken 39i 1 x val powerbroken 39int gt float gt float ltfungt powerbroken 5 20 float 12 Mutually recursive de nitions functions that call one another can be de ned using the and keyword to connect several let de nitions let rec f 39i j 39 39i 0 then 1 else 9 J39 1 and g j 39ifj mod 3 0 then 1 else f j 1 J39 val f 39int gt 39int gt 39int ltfungt val g 39int gt 39int ltfungt g 5 39int 3 313 Higher order functions Let s consider a de nition where a function is passed as an argument and another function is returned Given an arbitrary function f on the real numbers a numerical derivative is de ned approximately as follows 22 CHAPTER 3 VARIABLES AND FUNCTIONS let dx 1e 10 val dx float 1e 10 let der39iv f fun x gt f x dx f x dx val der39iv fl oat gt fl oat gt float gt fl oat ltfungt Remember the arrow associates to the right so another way to write the type is float gt float gt float gt float That is the derivative is a function that takes a function as an argument and returns a function Let s apply this to the powe r function de ned above partially applied to the argument 3 let f power 3 val f fl oat gt fl oat ltfungt float 1000 let f der39iv f val f float gt float ltfungt f 100 float 300000237985 f 5039 float 750000594962 1 float 300000024822 As we would expect the derivative of x3 is approximately 3262 To get the second derivative we apply the der39iv function to f let f der39iv f val f float gt float ltfungt f 00 float 6e 10 f 10 float 0 f 100 float 0 The second derivative which we would expect to be 626 is way off Ok there are some numerical errors here Don t expect functional programming to solve all your problems letgx30 x x val g float gt float ltfungt let 9 der39iv g float gt float ltfungt g 10 float 600000049644 g 100 float 599999339101 32 VARIABLE NAMES 23 32 Variable names As you may have noticed in the previous section the character is a valid character in a variable name In general a variable name may contain letters lower and upper case digits and the and characters but it must begin with a lowercase letter or the underscore character and it may not the the all by itself In OCaml sequences of characters from the in x operators like are also valid names The normal pre x version is ob tained by enclosing them in parentheses For example the following code is a proper entry for the Obfuscated ML contest Don t use this code in class quotIet and and and val 39int gt 39int gt 39int ltfungt val 39int gt 39int gt 39int ltfungt val quot 39int gt 39int gt 39int ltfungt val 39int gt 39int gt 39int ltfungt Note that the operator requires space within the parenthesis This is because of comment conventions comments start with and end with The rede nition of in x operators may make sense in some contexts For example a program module that de nes arithmetic over complex numbers may wish to rede ne the arithmetic operators It is also sensible to add new in x operators For example we may wish to have an in x operator for the powe r construction quotIetx i power39ix va float gt 39int gt float ltfungt 100 M 5 float 100000 24 CHAPTER 3 VARIABLES AND FUNCTIONS Chapter 4 Basic Pattern Matching One of the more powerful features of ML is that it uses pattern matching to de ne functions by case analysis Pattern matching is performed by the match expression which has the following general form match apr with pattl gt exprl l pattz gt exprz l pattn gt exprn A pattern is an expression made of constants and variables When the pattern matches with an argument the variables are bound to the corre sponding values in the argument For example Fibonacci numbers can be de ned succinctly using pattern matching Fibonacci numbers are de ned inductively f39i b 0 0 fi b 1 1 and for all other natural numbers 139 f39ib i f39i bi 71 f39i bi 7 2 let rec f39ib39i match 39i with O gtO 1 gt1 j gtf139b j 2 f39ib j 1 val f39ib 39int gt 39int ltfungt f39ib 1 39int1 f39ib 2 39int1 f39ib 3 39int2 26 CHAPTER 4 BASIC PATTERN MATCHING fib6 int8 In this code the argument 139 is compared against the constants 0 and 1 If either of these cases match the return value is i The nal pattern is the vari able j which matches any argument When this pattern is reached j takes on the value of the argument and the body fi b j 2 fi b j 1 computes the returned value Note that the variables in a pattern are binding occurrences unrelated to any previous de nition of the variable For example the following code pro duces a result you might not expect The rst case matches all expressions returning the value matched The toploop issues warning for the second and third cases let zero 0 let one 1 let rec fib i match i with zero gt zero I one gt one j gt fib j 2 fib j 1 Warning this match case is unused Warning this match case is unused val fib int gt int ltfungt fib1 int 1 fib 10 int 10 fib 2002 int 2002 The general form of matching where the function body is a match ex pression applied to the function argument is quite common in ML pro grams OCaml de nes an equivalent syntactic form to handle this case using the function keyword instead of fun A function de nition is like a fun where a single argument is used in a pattern match The fi b de nition using function is as follows let rec fib function 1 gt1 i gtfib i 1fib i 2 val fib int gt int ltfungt fib1 int1 fib 6 int8 27 Patterns can also be used with values having the other basic types like characters strings and Boolean values In addition multiple patterns with out variables can be used for a single body For example one way to check for capital letters is with the following function de nition funct39i on let 39isuppercase IA I C ID I IE I IF I G I H I BI I II I I I K IL I III I IN I 0 I P I IQ I IR I IS IT I III I IV I w I IX I Y I Z gt true c gt false val 39isuppercase char gt bool ltfungt 39isuppercase M l true 39isuppercase m bool false It is rather tedious to specify all the letters one at a time OCaml also allows pattern ranges C1C2 where cl and C2 are character constants let 39isuppercase function A Z gt true c gt false val 39isuppercase char gt bool ltfungt 39isuppercase M bool true 39isuppercase m bool false Note that the pattern variable 6 in these functions acts as a wildcard pattern to handle all nonuppercase characters The variable itself is not used in the body false This is another commonly occurring structure and OCaml provides a special pattern for cases like these The pattern a single underscore character is a wildcard pattern that matches anything It is not a variable so it can t be used in an expression The 39isuppercase function would normally be written this way let 39isuppercase function A Z gt true gt false val 39isuppercase char gt bool ltfungt 39isuppercase M bool true 39isuppercase m bool false 28 CHAPTER 4 BASIC PATTERN MATCHING 41 Incomplete matches You might wonder about what happens if all the cases are not consid ered For example what happens if we leave off the default case in the i suppe rcase function let isuppercase function A Z gt true Characters 19 49 Warning this pattern matching is not exhaustive Here is an example of a value that is not matched val isuppercase char gt bool ltfungt The OCaml compiler and toploop are verbose about inexhaustive pat terns They warn when the pattern match is inexhaustive and even suggest a case that is not matched An inexhaustive set of patterns is usually an erroriwhat would happen if we applied the isuppercase function to a nonuppercase character isuppercase M true isuppercase m Uncaught exception Matchfailure 19 49 Again OCaml is fairly strict In the case where the pattern does not match it raises an exception we ll see more about exceptions in Chapter 7 In this case the exception means that an error occurred during evaluation a pattern matching failure A word to the wise heed the compiler warnings The compiler generates warnings for possible program errors As you build and modify a program these warnings will help you nd places in the program text that need work In some cases you may be tempted to ignore the compiler For example in the following function we know that a complete match is not needed if the isodd function is always applied to nonnegative numbers let isodd i match i mod 2 with 0 gt false I 1 gt true Characters 18 69 Warning this pattern matching is not exhaustive Here is an example of a value that is not matched 2 val isodd int gt bool ltfungt isodd 3 bool true 42 PATTERNS ARE EVERYWHERE 29 isodd 12 bool false However do not ignore the warning If you do you will nd that you begin to ignore all the compiler warningsiboth real and bogus Eventually you will overlook real problems and your program will become hard to maintain For now you should add the default case that raises an exception manually The Invalid1argument exception is designed for this purpose It takes a string argument that identi es the name of the place where the failure occurred You can generate an exception with the raise construction let isodd i match i mod 2 with 0 gt false I 1 gt true gt raise Invalidargument quotisodd val isodd int gt bool ltfun isodd 3 bool true isodd 1 Uncaught exception Invalidargument isodd 42 Patterns are everywhere It may not be obvious at this point but patterns are used in all the binding mechanisms including the let and fun constructions The general forms are as follows let patt expr let name patt patt expr fun patt gt expr These aren t much use with constants because the pattern match will always be inexhaustive except for the 0 pattern However they will be handy when we introduce tuples and records in the next chapter let isone fun 1 gt true Characters 13 26 Warning this pattern matching is not exhaustive Here is an example of a value that is not matched val isone int gt bool ltfungt let isone 1 true Characters 11 19 Warning this pattern matching is not exhaustive Here is an example of a value that is not matched 30 CHAPTER 4 BASIC PATTERN MATCHING O va1 39isone 39int gt b001 ltfungt 39isone 1 b001 true 39isone 2 Uncaught except39ion Matchfa39i39ure 11 19 1et 39isun39it O true va1 39isun39it un39it gt b001 ltfungt 39isun39it 0 b001 true Chapter 5 Tuples Lists and Polymorphism In the chapters leading up to this one we have seen simple expressions involving numbers characters strings functions and variables This lan guage is already Turing completeiwe can code arbitrary data types using numbers and functions Of course in practice this would not only be in efficient it would also make it very hard to understand our programs For efficient and readable data structure implementations we need aggregate types OCaml provides a rich set of aggregate types including tuples lists disjoint unions also called tagged unions or variant records records and arrays In this chapter we ll look at the simplest part tuples and lists We ll discuss unions in Chapter 6 and we ll leave the remaining types for Chapter 8 when we introduce sideeffects 51 Polymorphism At this point it is also appropriate to introduce polymorphism The ML languages provide parametric polymorphism That is types may be param eterized by type variables For example the identity function in ML can be expressed with a single function let 39ident39ity x x val 39ident39ity a gt identity 1 a ltfungt 1nt 1 39ident39ity quotHe39l39loquot string quotHelloquot 32 CHAPTER 5 TUPLES LISTS AND POLYMORPHISM Type variables are lowercase identi ers preceded by a single quote A type variable represents an arbitrary type The typing 39ident39i ty a gt 31 says that the 39ident39i ty function takes an argument of some arbitrary type a and returns a value of the same type a If the 39ident39i ty function is applied to an 39int then it returns an 39int if it is ap plied to a str39i ng then it returns a str39i ng The 39ident39i ty function can even be applied to function arguments let succ i 39i 1 val succ 39int gt 39int ltfungt 39ident39ity succ 39 39 t ltfungt 39ident39ity succ 2 39int 3 In this case the 39ident39i ty succ expression returns the succ function itself which can be applied to 2 to return 3 511 Value restriction What happens if we apply the 39ident39i ty to a polymorphic function type let 39ident39ity 39ident39ity 39ident39ity val 39ident39ity a gt a fungt identity 1 39 1 39ident39ity 39int gt 39int ltfungt 39ident39ity HHe39l39lo Characters 10 17 This expression has type String but 39is here used with type 39int This doesn t quite work as we expect Note the type assignment 39ident39i ty a gt a The type variables a are now preceded by an underscore These type variables specify that the 39ident39i ty function takes an argument of some type and returns a value of the same type This is a form of delayed polymorphism When we apply the 39ident39i ty function to a number the type a is assigned to be 39int the 39ident39i ty function can no longer be applied to a string This behavior is due to the value restriction for an expression to be truly polymorphic it must be a value Values are immutable expressions that evaluate to themselves For example numbers and characters are values Functions are also values Function applications like 39ident39i ty 39ident39i ty are not values because they can be simpli ed the 39ident39i ty 39ident39ity expression evaluates to 39ident39i ty 51 POLYMORPHISM 33 The normal way to get around the value restriction is to use eta axpansion which is the technical term for adding extra arguments to the function We know that 39ident39i ty is a function we can add its argument explicitly let 39ident39ity x 39ident39ity 39ident39ity x val 39ident39ity a gt a ltfungt identity 1 39int 1 39ident39ity quotHelloquot string quotHelloquot The new version of 39ident39i ty computes the same value but now it is properly polymorphic Why does OCaml have this restriction It probably seems silly but the value restriction is a simple way to maintain correct typing in the presence of sideeffects it would not be necessary in a purely functional language We ll revisit this in Chapter 8 512 Comparison with other languages Polymorphism can be a powerful tool In ML a single identity function can be de ned that works on all types In a nonpolymorphic language like C a separate identity function would have to be de ned for each type 39int 39int39i dent39i ty39i nt 39i i return 39i struct complex float real float 39imag struct complex complex39ident39itystruct complex x return X Another kind of polymorphism is overloading also called adhac poly morphism Overloading allows several functions to have the same name but different types When that function is applied the compiler selects the appropriate function by checking the type of the arguments For example in Java we could de ne a class that includes several de nitions of addition for different types note that the operator is already overloaded class Adder static 39int Add39int 39i 39int j return 39i j 34 CHAPTER 5 TUPLES LISTS AND POLYMORPHISM stat39ic float Addfloat x float y return x y static String AddStr39ing 51 String sZ return slconcatsZ The expression Adder Add5 7 would evaluate to 12 while the ex pression AdderAdd Hello quotworldquot would evaluate to the string quotHello world OCaml does not provide overloading There are probably two main rea sons One is technical it is hard to provide both type inference and over loading at the same time For example suppose the function were over loaded to work both on integers and oatingpoint values What would be the type of the following add function Would it be 39int gt 39int gt 39int or float gt float gt float let addxy XYH The best solution would probably to have the compiler produce two in stances of the add function one for integers and another for oating point values This complicates the compiler and with a sufficiently rich type sys tem type inference would become undecidable That would be a problem The second reason for the omission is that overloading can make it more difficult to understand programs It may not be obvious by looking at the program text which one of a function s instances is being called and there is no way for a compiler to check if all the function s instances do similar things1 52 Tuples Tuples are the simplest aggregate type They correspond to the ordered tuples you have seen in mathematics or set theory A tuple is a collection of values of arbitrary types The syntax for a tuple is a sequence of expressions separated by commas For example the following tuple is a pair containing a number and a string let p 1 quotHelloquot val p 39int quot string 1 quotHelloquot 1 The second reason is weaker Properly used overloading reduces namespace clutter by grouping similar functions under the same name True overloading is grounds for obfusca tion but OCaml is already ripe for obfuscation by allowing arithmetic functions like to be defined 52 TUPLES 35 The syntax for the type of a tuple is a delimited list of the types of the components In this case the type of the pair is 39int str39i n Tuples can be deconstructed using pattern matching with any of the pat tern matching constructs like let match fun or function For example to recover the parts of the pair in the variables x and y we might use a let form let x y p val x 39int 1 val y string quotHelloquot The builtin functions fst and snd return the components of a pair de ned as follows let fst x XH val fst a b gt a ltfungt let snd y y val snd a b gt b ltfungt fst p 39int 1 snd p string quotHelloquot Tuple patterns in a function argument must be enclosed in parentheses Note that these functions are polymorphic The fst and snd functions can be applied to a pair of any type a b fst returns a value of type a and snd returns a value of type b There are no similar builtin functions for tuples with more than two elements but they can be de ned let t 1 quotHelloquot 27 val t 39int str39ing float 1 quotHelloquot 27 let fst3 x x val fst3 a b c gt a ltfungt fst3 t 39int 1 Note also that the pattern assignment is simultaneous The following expression swaps the values of x and y let x val x 39i 1 let y ello val y string quotHelloquot let x yy x val x string quotHelloquot val y 39int 1 1 3 t uH 36 CHAPTER 5 TUPLES LISTS AND POLYMORPHISM Since the components of a tuple are unnamed tuples are most appropri ate if they have a small number of wellde ned components For example tuples would be an appropriate way of de ning Cartesian coordinates let makecoord x y X y val makecoord a gt b gt a b ltfungt let xofcoord fst val Xofcoord a b gt a lt mgt let yofcoord snd val yofcoord a b gt b ltfungt However it would be awkward to use tuples for de ning database en tries like the following For that purpose records would be more appropri ate Records are de ned in Chapter 8 Name Height Phone Salary let jason quotJasonquot 625 quot626 395 6568quot 500 val jason str39ing float str39ing float let nameofentry name name val nameofentry a b c d gt a ltfungt quotJasonquot 625 quot626 395 6568quot 50 nameofentry jason string quotJasonquot 53 Lists Lists are also used extensively in OCaml programs A list contains a se quence of values of the same type There are two constructors the expression is the empty list and the 9122 expression is the cons of expres sion 21 onto the list 22 let l quotHelloquot quotWorldquot val l str39ing l39ist quotHello quotWorldquot The bracket syntax 91 en is an alternate syntax for the list contain ing the values computed by 21 en The syntax for the type of a list with elements of type t is t list The list type is a of a parameterized type An 39int list is a list containing integers a str39i ng list is a list containing strings and an a l 39i st is a list containing elements of some type a but all the elements have to have the same type Lists can be deconstructed using pattern matching For example here is a function that adds up all the numbers in an 39int l39ist let rec sum function gt 0 53 LISTS 37 il gtisuml val sum int list gt int ltfungt sum 1 2 3 4 int10 These functions can also be polymorphic The function to check if a value x is in a list l could be de ned as follows let rec mem x l match l with gt false I y l gt x y mem x l val mem a gt a list gt bool ltfungt mem 5 1 7 3 bool false mem quotdoquot quotI m quotafraidquot quotIquot quotcan tquot quotdoquot quotthatquot quotDave bool true This function takes an argument of any type a and checks if the ele ment is in the a list The standard map function Li st map can be de ned as follows let rec map f function gt xl gtfxmapfl val map a gt b gt a list gt b list ltfungt map succ 1 2 3 4 int list 2 3 4 5 The map function takes a function of type a gt b this argument func tion takes a value of type a and returns a value of type b and a list containing elements of type a It returns a b list Equivalently mapf HunWu f v1fvn Lists are commonly used to represent sets of values or keyvalue rela tionships The Li st library contains many list functions The Li stassoc function returns the value for a key in a list let entry quotnamequot quotJason quotheight quot6 3 quotphone quot626 395 6568 quotsalaryquot H50 val entry string string list quotname quotJasonquot quotheightquot quot6 3 38 CHAPTER 5 TUPLES LISTS AND POLYMORPHISM quotphonequot quot626 345 9692quot quotsalaryquot quot50quot L39istassoc quotphonequot entry string quot626 395 6568quot Note that the comma separates the elements of the pairs in the list and the semicolon separates the items of the list Chapter 6 Unions Disjoint unions also called tagged unions or variant records are ubiq uitous in OCaml programs A disjoint union represents a set of cases of a particular type For example we may say that a binary tree contains nodes that are either interior nodes or leaves Suppose that the interior nodes have a label of type a In OCaml this would be expressed as a type de nition using the type keyword The type is recursively de ned and it contains a type parameter a for the type of elements type a btree Node of a Leaf type a btree a btree quot a btree Node of a a btree a btree Leaf The name of the type is btree and the type parameterization uses pre x notation a btree The cases are separated by a vertical dash the character Each case has a name and an optional set of values The name must begin with an uppercase letter In this case the type of the de nition is a btree and the interior node Node has three values a label of type a a left child of type a btree and a right child of type a bt ree The names like Node and Leaf are called constructors Con structors can be viewed as functions that inject values into the dis joint union Thus the Node constructor would be a function of type a a btree a btree gt a btree For technical reasons OCaml does not allow constructors with arguments to be used as values Leaf a btree Leaf Node 1 Leaf Leaf 39int btree Node 1 Leaf Leaf Node 39 40 CHAPTER 6 UNIONS The constructor Node expects 3 arguments but is here applied to 0 arguments A value with a union type is a value having one of the cases The value can be recovered through pattern matching For example a function that counts the number of interior nodes in a value of type a btree would be de ned as follows let rec cardinality function 0 Node left right gt cardinality left cardinality right 1 val cardinality a btree gt int lt ungt cardinality Node 1 Node 2 Leaf Leaf Leaf int 61 Unbalanced binary trees To see how this works lets build a simple data structure for unbalanced binary trees that represent sets of values of type a The empty set is just a Leaf To add an element to a set s we create a new Node with a Leaf as a leftchild and s as the right child let empty Leaf val empty a btree Leaf let insert x s Node x Leaf s val insert a gt a btree gt a btree ltfungt let rec setoflist function gt empty I x l gt insert x setoflist l val setoflist a list gt a btree ltfungt let s setoflist 3 5 7 11 13 val s int btree Node 3 Leaf Node 5 Leaf Node 7 Le f a 1 Node 11 Leaf Node 13 Leaf Leaf The membership function is de ned by induction on the tree an element x is a member of a tree iff the tree is a Node and x is the label or x is in the left or right subtrees let rec mem x function gt false I Node y left right gt 62 UNBALANCED ORDERED BINARY TREES 41 x y mem x left mem x right val mem a gt a btree gt bool ltfungt mem 11 s bool true mem 12 s bool false 62 Unbalanced ordered binary trees One problem with the unbalanced tree is that the complexity of the mem bership operation is O n where n is cardinality of the set We can improve this slightly by ordering the nodes in the tree The invariant we maintain is thefoHowungforanyinte ornodeNode x left rightachelabeh in the left child are smaller than x and all the labels in the right child are larger than x To maintain this invariant we need to modify the insertion function let rec insert x function Leaf gt Node x Leaf Leaf Node y left right gt if x lt y then Node y insert x left right else if x gt y then Node y left insert x right else Node y left right val 39insert a gt a btree gt a btree ltfungt let rec setofl39ist function gt empty Ix l gt 39insert x setofl39ist l val setofl39ist a list gt a btree ltfungt let s setoflist 7 5 9 11 3 val s int btree Node 3 Leaf Node 11 Node 9 Node 5 Leaf Node 7 Leaf Leaf Leaf Leaf Note that this insertion function does not build balanced trees If ele ments are inserted in order the tree will be maximally unbalanced with all the elements inserted along the right branch For the membership function we can take advantage of the set ordering during the search let rec mem x function 42 CHAPTER 6 UNIONS Leaf gt false I Node y left right gt if x lt y then mem x left else if x gt y then mem x right else x y true val mem a gt a btree gt bool ltfungt mem 5 5 bool true mem 9 s bool true mem 12 s bool false The complexity of this membership function is 01 where l is the max imal depth of the tree Since the insert function does not guarantee bal ancing the complexity is still 0n worst case 63 Balanced redblack trees For a more advanced example of pattern matching on unions consider the implementation of balanced trees as redblack trees This section may be skipped by the reader who is already familiar with advanced pattern match 1ng We ll use a functional implementation of redblack trees due to Chris Okasaki S Redblack trees add a label either Red or Black to each of the interior nodes Several new invariants are maintained 1 Every leaf is colored black 2 All children of every red node are black w Every path from the root to a leaf has the same number of black nodes as every other path 4 The root is always black These invariants guarantee the balancing Since all the children of a red node are black and each path from the root to a leaf has the same number of black nodes the longest path is at most twice as long as the shortest path The type de nitions are similar to the unbalanced binary tree we just need to add a redblack label 63 BALANCED REDBLACK TREES type color Red Black type a btree Node of color Leaf a btree a a btree The membership function also has to be rede ned for the new type let rec mem x function Leaf gt false I Node a y b gt if x lt y then mem x a else if x gt y then mem x b else true The i nse rt function must maintain the invariants during insertion This can be done in two parts First nd the location where the node is to be inserted If possible add the new node with a Red label because this would preserve invariant 3 This may however violate invariant 2 because the new Red node may have a Red parent If this happens the balance function migrates the Red label upward in the tree let balance function Black Node Red Node Red a x b y c 2 d gt Node Red Node Black a x b y Node Black c Black Node Red a x Node Red b y c 2 d gt Node Red Node Black a x b y Node Black c Black a x Node Red Node Red b y c 2 d gt Node Red Node Black a x b y Node Black c Black a x Node Red b y Node Red c 2 d gt Node Red Node Black a x b y Node Black c a b c gt Node a b c d let insert x s let rec ins function Leaf gt Node Red Leaf x Leaf Node color a y b as s gt if x lt y then balance color ins a y b else if x gt y then balance color a y ins b else s in match ins s with guaranteed to be non empty Node a y b gt Node Black a y b Leaf gt raise Invalidargument quotinsert 2 d z d z d z d 44 CHAPTER 6 UNIONS a a btree gt a btree ltfungt a btree ltfungt val balance color a btree val 39insert a gt a btree gt Note the use of nested patterns in the balance function The balance function takes a 4tuple with a color two btrees and an element and it splits the analysis into ve cases four of the cases are for Violations of invariant 2 nested Red nodes and the nal case is the case where the tree does not need rebalancing Since the longest path from the root is at most twice as long as the shortest path the depth of the tree is Olog n The balance function takes constant time This means that the 39inse rt and mem functions both take time Olog n let empty Leaf val empty a btree Leaf let rec setofl39ist function gt empty I x l gt 39insert x setofl39ist l val setofl39ist a list gt a btree ltfungt let s setoflist 3 9 5 7 11 val s int btree Node Black Node Black Node Red Leaf 3 Leaf 5 Leaf 7 Node Black Node Red Leaf 9 Leaf 11 Leaf mem 6 s bool false Iuu 64 Some common builtin unions A few of the types we have already seen are de ned as unions The built in Boolean type is de ned as a union the true and false keywords are treated as capitalized identi ers type bool true false type bool true false The list type is similar once again the compiler treats the and identi ers as capitalized identi ers1 1 At the time of writing using OCaInl 204 this de nition was accepted literally In OCaInl 304 this usage is deprecated and the produces a syntax error 64 SOME COMMON BUILTIN UNIONS 4S a quotlist a quotlist type a quotlist of a type a quotlist of 31 Although it is periodically suggested on the OCanil mailing list OCanil does not have a NIL value that can be assigned to a variable of any type Instead the builtin a opti on type is used in this case type a option None Some of a type a option None Some of a The None case is intended to represent a NIL value While the Some case handles nondefault values 46 CHAPTER 6 UNIONS Chapter 7 Exceptions Exceptions are used in OCaml as a control mechanism either to signal errors or to control the ow of execution When an exception is raised the current execution is aborted and control is thrown to the most recently entered active exception handler which may choose to handle the exception or pass it through to the next exception handler Exceptions were designed as a more elegant alternative to explicit error handling in more traditional languages In Unix C for example most sys tem calls return 1 on failure and 0 on success System code tends to be cluttered with explicit error handling code that obscures the intended oper ation of the code In the OCaml Unix module the system call stubs raise an exception on failure allowing the use of a single error handler for a block of code In some ways this is like the setjmpl ongj mp interface in C but OCaml exceptions are safe To see how this works consider the Li st assoc function which is de ned as follows let rec assoc key function k v l gt if k key then v else assoc key l gt raise Notfound val assoc a gt a b list gt b ltfungt let l 1 quotHelloquot 2 quotWorld val l int string list 1 quotHelloquot 2 quotWorld assoc 2 l string quotWorldquot 47 48 CHAPTER 7 EXCEPTIONS assoc 3 1 Uncaught exception Notfound quotHe11oquot A assoc 2 1 string quotHe11oWor1dquot quotHe11oquot A assoc 3 1 Uncaught exception Notfound In the rst case assoc 2 1 the key is found in the list and its value is returned In the second case assoc 3 1 the key 3 does not exist in the list and the exception Notfound is raised There is no explicit exception handler and the toploop default handler is invoked Exceptions are declared with the exception keyword and their syntax has the same form as a constructor declaration in a union type Exceptions are raised with the raise function exception Abort of int quot string exception Abort of int string raise Abort 1 quotGNU is not Unix Uncaught exception Abort1 quotGNU is not Unix Exception handlers have the same form as a match pattern match using the try keyword The syntax is as follows tryewith p1 gt e1 lp2 gt e2 lph gt en First e is evaluated If it does not raise an exception its value is re turned as the result of the try statement Otherwise if an exception is raised during evaluation of e the exception is matched against the patterns p1 pn If the rst pattern the exception matches is m the expression at is evaluated and returned as the result Otherwise if no pattern matches the exception is propagated to the next exception handler try quotHe11oquot A assoc 2 1 with Abort i s gt s Notfound gt quotNotfound string quotHe11oWor1dquot try quotHe11oquot A assoc 3 1 with Abort i s gt s Notfound gt quotNotfound string quotNotfound try quotHe11oquot A assoc 3 1 with Abort i s gt s Uncaught exception Notfound 49 Exceptions are also used to manage control ow For example consider the binary trees in the previous chapter type a btree Node of a btree a a btree Leaf type a btree Node of a btree a a btree Leaf Suppose we Wish to build a replace function that replaces a value in the set The expression repl ac x y s should replace value x with y in tree 5 or raise the exception Notfound if the value does not exist let rec replace x y function Leaf gt raise Notfound Node left 2 right gt let left modl eft try replace x y left true with Notfound gt left false in let right modright try replace x y right true with Notfound gt right false in if Z x then Node left y right else if modleft modright then Node left x right else raise Notfound val replace a a gt a btree gt a btree ltfungt In this function the left and right subtrees are recursively modi ed The modl eft and modri ght ags are set iff the corresponding branches were modi ed If neither branch is modi ed the Notfound exception is raised CHAPTER 7 EXCEPTIONS Chapter 8 Records Arrays and SideEffects In this chapter we discuss the remaining data types all of which allow sideeffects A record can be viewed as a tuple with labeled elds An array is a xedsize vector of items with constant time access to each element There are operations to modify some of the elds in a record and any of the elements in an array 81 Records A record is a labeled collection of values of arbitrary types The syntax for a record type is a set of eld type de nitions surrounded by braces and separated by semicolons Fields are declared as label type where the label is an identi er that must begin with a lowercase letter or an under score For example the following record rede nes the database entry from Chapter 52 type dbentry name string he39ight float phone string salary float type dbentry name String he39ight float phone string salary float The syntax for a value is similar to the type declaration but the elds are de ned as label expr Here is an example database entry 51 52 CHAPTER 8 RECORDS ARRAYS AND SIDEEFFECTS let jason name quotJasonquot height 625 phone quot626 395 6568quot salary 500 val jason dbentry name Jason height625 phone 626 395 6568 salary50 There are two ways to access the elds in a record The projection op eration r l returns the eld labeled l in record r jason he39ight float 625 jasonphone str39i ng quot626 395 6568quot Pattern matching can also be used to access the elds of a record The syntax for a pattern is like a record value but the elds contain a label and a pattern label patt Not all of the elds have to be included Note that the binding occurrences of the variables n and h occur to the right of the equality symbol in their elds let name n height h jason val n string quotJasonquot val h float 625 There is a functional update operation that produces a copy of a record with new values for the speci ed elds The syntax for functional update uses the with keyword in a record de nition let dave jason with name quotDavequot height 59 val dave dbentry name Dave height59 phone 626 395 6568 salary50 jason dbentry name Jason height625 phone 626 395 6568 salary50 811 Imperative record updates Record elds can also be modi ed by assignment but only if the record eld is declared as mutable The syntax for a mutable eld uses the mutabl e keyword before the eld label For example if we wanted to allow salaries to be modi ed we would redeclare the record entry as follows 81 RECORDS type dbentry name string he39ight float phone string mutable salary float type dbentry name string he39ight float phone string mutable salary float let jason name quotJasonquot height 625 phone quot626 395 6568quot salary 500 val jason dbentry name Jason height625 phone 626 395 6568 salary50 The syntax for a eld update is rlabel lt expr For example if we want to give jason a raise we would use the following statement jasonsalary lt 1500 un39it O jason dbentry name Jason height625 phone 626 395 6568 salary150 Note that the assignment statement itself returns the canonical unit value 0 That is it doesn t return a useful value unlike the functional update A functional update creates a completely new copy of a record assignments to the copies will be independent let dave jason with name quotDavequot val dave dbentry name Dave height625 phone 626 395 6568 salary150 davesalary lt 1800 un39it O dave dbentry name Dave height625 phone 626 395 6568 salary180 jason dbentry name Jason height625 phone 626 395 6568 salary150 54 CHAPTER 8 RECORDS ARRAYS AND SIDEEFFECTS 812 Field label namespace One important point the namespace for toplevel record eld labels is at This is important if you intend to declare records with the same eld names If you do the original labels will be lost For example consider the following sequence type recl name string height float type recl n me string height float let jason name quotJasonquot height 625 val jason recl name Jason height625 type recZ name string phone string type recZ n me string phone string let dave name quotDavequot phone quot626 395 6568quot val dave recZ name Dave phone 626 395 6568 jasonname Characters 0 5 This expression has type recl but is here used with type recZ davename string quotDavequot let bob name quotBobquot height 575 Characters 10 41 The label height belongs to the type recl but is here mixed with labels of type recZ In this case the name eld was rede ned At this point the original rec1name label is lost making it impossible to access the name eld in a value of type recl and impossible to construct new values of type recl It is however permissible to use the same label names in separate les as we will see in Chapter 11 82 References Variables are never mutable However reference cells are common enough in OCaml programs that a special form is de ned just for this case Refer ence cells are created with the ref function let i ref 1 val i int ref contents1 82 REFERENCES 5 S The builtin type 31 ref is de ned using a regular record de nition the normal operations can be used on this record type 31 ref mutable contents a Dereferencing uses the operator and assignment uses the in x operator 39i 39int 1 39i 17 un39it O 39i 39int 17 Don t get confused with the operator in C here The following code can be confusing let flag ref true val flag bool ref contentstrue 39if flag then 1 else 2 39int 1 You may be tempted to read 39if fl ag then as testing if the flag is false This is not the case the operator is more like the operator in C 821 Value restriction As we mentioned in Section 511 mutability and sideeffects interact with type inference For example consider a oneshot function that saves a value on its rst call and returns that value on all future calls This function is not properly polymorphic because it contains a mutable eld We can illustrate this using a single variable let x ref None val x a option ref contentsNone let oneshot y match x with None gt x Some y y Some Z gt Z val oneshot a ltfungt oneshot 1 oneshot a gt 56 CHAPTER 8 RECORDS ARRAYS AND SIDEEFFECTS val oneshot 39int gt 39int ltfungt oneshot 2 oneshot quotHe39l39loquot Characters 9 16 This expression has type String but 39is here used with type 39int The value restriction requires that polymorphism be restricted to values Values include functions constants constructors with elds that are values and immutable records with elds that are values A function application is not a value and a record with mutable elds is not a value By this de nition the x and oneshot variables cannot be polymorphic as the type constants a indicate 83 Arrays and strings Arrays are xedsize vectors of values All of the values must have the same type The elds in the array can be accessed and modi ed in constant time Arrays can be created with the 91 enl syntax which creates an array of length n initialized with the values computed from the expressions 61 en let a I1 3 5 7 val a 39int array I1 3 5 7 Fields can be accessed with the a 39i construction Array indices start from 0 Arrays are boundschecked Isl0 39int Isl1 39int 3 Isl5 Uncaught except39ion Inval39idargument Arrayget Fields are updated with the a 39i lt e assignment statement a2 lt 9 un39it O 61 39int array I1 3 9 7 The Ar ray library module de nes additional functions on arrays Arrays of arbitrary length can be created with the Array create function which requires a length and initializer argument The Ar raylength function returns the number of elements in the array 84 SEQUENTIAL EXECUTIONAND LOOPING S 7 let a Arraycreate 10 1 val a 39int array I1 1 1 1 1 1 1 1 1 1 Arraylength a 39int 10 The Ar raybl 39i t function can be used to copy elements destructively from one array to another The bl 39i t function requires ve arguments the source array a starting offset into the array the destination array a starting offset into the destination array and the number of elements to copy Arraybl39it I 3 4 5 6 I 1 a 3 2 un39it 61 39int array I1 1 1 4 5 1 1 1 1 1 In OCaml strings are a lot like packed arrays of characters The access and update operations use the syntax s 39i and s 39i lt c let s quotJasonquot val s string quotJasonquot s2 char s 5D lt y un39it O 5 string quotJasynquot The St r39i ng module de nes additional functions including the St r39i ng length and St r39i ng bl 39i t functions that parallel the correspond ing Array operations The St r39i ngcreate function does not require an initializer It creates a string with arbitrary contents Str39ingcreate 10 string quot000011000000200027000000 Str39ingcreate 10 string quot196181027001000000000000000 84 Sequential execution and looping Sequential execution is not useful in a functional languageiwhy compute a value and discard it In an imperative language including a language like OCaml sequential execution is used to compute by sideeffect Sequential execution is de ned using the semicolon operator The ex pression eyez evaluates 21 discards the result e1 probably has a side effect and evaluates 22 Note that the semicolon is a separator as in 58 CHAPTER 8 RECORDS ARRAYS AND SIDEEFFECTS Pascal not a terminator as in C The compiler produces a warning if ex pression 21 does not have type un39i t As usual heed these warnings The 39ignore a gt un39i t function can be used if you really want to discard a nonunit value There are two kinds of loops in OCaml a for loop and a wh39i l e loop The wh39i l e loop is simpler we ll start there 841 while loops The while loop has the syntax while 21 do 22 done The expression 21 must have type bool When a wh39i l e loop is evaluated the expression 21 is evaluated rst If it is false the wh39i l e loop terminates Otherwise 22 is evaluated and the loop is evaluated again Here is an example to check if a value x is in an array a let arraymem x a let len Arraylength a in let flag ref false in let i ref 0 in while flag false ampamp i lt len do if ai x then flag true 1 391 1 done flag val arraymem a gt a array gt bool ltfungt arraymem 1 I 3 5 1 6 bool true arraymem 7 I 3 5 1 6 bool false 842 for loop The for loop iterates over a nite range of integers There are two forms one to count up and one to count down The syntax of these two operations is as follows for 1 21 to 22 do 23 done for 1 21 downto 22 do 23 done The for loops rst evaluate 21 and 22 which must have type 39int The to form evaluates the body 23 for values of 1 counting up from 21 to 22 and the downto form evaluates the body for values counting down from 21 to 22 Note that the nal value 22 is included in the evaluation The following code is a simpler expression for computing membership in an array although it is somewhat less ef cient 84 SEQUENTIAL EXECUTIONAND LOOPING 59 1et arraymem x a 1et f1ag ref fa1se in for i 0 to Array1ength a 1 do if ai x then f1ag true done f1ag va1 arraymem a gt a array gt boo1 ltfungt arraymem 1 I 3 5 1 6 boo1 true arraymem 7 I 3 5 1 6 boo1 fa1se 60 CHAPTER 8 RECORDS ARRAYS AND SIDEEFFECTS Chapter 9 Input and Output The 10 library in OCaml is fairly expressive including a Unix library that implements most of the portable Unix system calls In this chapter we ll cover many of the standard builtin IO functions The 10 library uses two data types the 39inchannel is the type of IO channels from which characters can be read and the outchannel is an IO channel to which characters can be written IO channels may represent les communication channels or some other device the exact operation depends on the context At program startup there are three channels open corresponding to the standard le descriptors in Unix val std39in 39inchannel val stdout outchannel val stderr outchannel 91 File opening and closing There are two functions to open an output le the openout function opens a le for writing text data and the openoutb39i n opens a le for writing binary data These two functions are identical on a Unix system On a Macin tosh or Windows system the openout function performs line termination translation why do all these systems use different line terminators while the openoutb39i n function writes the data exactly as written These func tions raise the Syserror exception if the le can t be opened otherwise they return an outchannel A le can be opened for reading with the functions open i n and open i nb39i n val openout String gt outchannel 61 62 CHAPTER 9 INPUTAND OUTPUT val openoutbi n string gt outchannel val openi n string gt inchanne val openi nbin string gt inchannel The openoutgen and openi ngen functions can be used to perform more sophisticated le opening The function requires an argument of type openfl ag that describes exactly how to open the le type openfl ag 0penrdonl y Openwronly Openappend 0penc reat 0pentrunc Openexcl Openbi nary Opentext 0pennonbl ock These opening modes have the following interpretation Openrdonly open for reading Openwr0nly open for writing Openappend open for appending Opencreat create the le if it does not exist Openjrunc empty the le if it already exists Openexcl fail if the le already exists Openbinary open in binary mode no conversion Openjext open in text mode may perform conversions Opennonblock open in nonblocking mode The openi ngen and openoutgen functions have types val openingen openflag list gt int gt string gt inchannel val openoutgen openflag list gt int gt string gt outchannel The openfl ag list describe how to open the le the int argument de scribes the Unix mode to apply to the le if the le is created and the st ring argument is the name of the le The closing operations cl oseout and cl osei n close the channels If you forget to close a le the garbage collector will eventually close it for you However it is good practice to close the channel manually when you are done with it val closeout outchannel gt unit val closein inchannel gt unit 92 WRITING AND READING VALUES ON A CHANNEL 63 92 Writing and reading values on a channel There are several functions for writing values to an outchannel The outputchar writes a single character to the channel and the outputstr39i ng writes all the characters in a string to the channel The output function can be used to write part of a string to the channel the 39i nt arguments are the offset into the string and the length of the substring val outputchar outchannel gt char gt un39it val outputstr39ing outchannel gt string gt un39it val output outchannel gt string gt 39int gt 39int gt unit The input functions are slightly different The 39inputchar function reads a single character and the 39inputl 39i ne function reads an entire line discarding the line terminator The 39input functions raise the exception Endoff i l e if the end of the le is reached before the entire value could be read val 39inputchar 39inchannel gt char val 39inputl39ine 39inchannel gt string val 39input 39inchannel gt string gt 39int gt 39int gt 39int There are also several functions for passing arbitrary OCaml values on a channel opened in binary mode The format of these values is imple mentation speci c but it is portable across all standard implementations of OCaml The outputbyte and 39inputbyte functions writeread a sin gle byte value The outputb39i nary39i nt and 39inputb39i naryj nt functions writeread a single integer value The outputval ue and 39inputval ue functions writeread arbitrary OCaml values These functions are unsafe Note that the 39inputval ue function returns a value of arbitrary type a OCaml makes no effort to check the type of the value read with 39inputval ue against the type of the value that was written with outputval ue If these differ the compiler will not know and most likely your program will generate a segmentation fault val outputbyte outchannel gt int gt unit val outputbinaryint outchannel gt int gt unit val outputvalue outchannel gt a gt unit val inputbyte inchannel gt int val inputbinaryint inchannel gt int val inputvalue inchannel gt a 93 Channel manipulation If the channel is a normal le there are several functions that can modify the position in the le The seeLout and seeL i n function change the 64 CHAPTER 9 INPUTAND OUTPUT le position The posout and pos i n function return the current position in the le The outchannell ength and 39inchannellength return the total number of characters in the le val seekout outchannel gt int gt unit val posout outchannel gt int val outchannellength outchannel gt int val seekin inchannel gt int gt unit val posin inchannel gt int val inchannellength inchannel gt int If a le may contain both text and binary values or if the mode of the the le is not know when it is opened the setb39i narymodeout and setb39i narymode39i n functions can be used to change the le mode outchannel gt bool gt un39it 39inchannel gt bool gt un39it val setb39i narymodeout val setb39i narymode39in The channels perform buffered 10 By default the characters on an outchannel are not all written until the le is closed To force the writing on the buffer use the flush function val flush outchannel gt un39it 94 Printf The regular functions for IO can be somewhat awkward OCaml also imple ments a pri ntf function similar to the pri ntf in UnixC These functions are de ned in the library module Pr39i ntf The general form is given by fpr39i ntf val fpr39intf outchannel gt a outchannel un39it format gt Don t be worried if you don t understand this type de nition The format type is a builtin type intended to match a format string The nor mal usage uses a format string For example the following statement will print a line containing an integer i and a string S fpr39intf stdout quotNumber d String sn 39i s The strange typing of this function is because OCaml checks the type of the format string and the arguments For example Ocaml analyzes the for mat string to tell that the following fpr39i ntf function should take a fl oat 39int and str39i ng argument a let f fpr39intf stdout quotFloat g Int d String sn Float val f float gt 39int gt String gt un39it ltfungt 94 PRINTF 65 The format speci cation corresponds roughly to the C speci cation Each format argument takes a Width and length speci er that corresponds to the C speci cation d or i convert an integer argument to signed decimal u convert an integer argument to unsigned decimal X convert an integer argument to unsigned hexadecimal using lowercase letters X convert an integer argument to unsigned hexadecimal using uppercase letters 3 insert a string argument c insert a character argument f convert a oatingpoint argument to decimal notation in the style ddddddd e or E convert a oatingpoint argument to decimal notation in the style tidald edd mantissa and exponent g or G convert a oatingpoint argument to decimal notation in style f or e E Whichever is more compact b convert a Boolean argument to the string true or false a userde ned printer It takes two arguments it applies the rst one to the current output channel and to the second argument The rst ar gument must therefore have type outchannel gt b gt un39i t and the second one has type b The output produced by the function is therefore inserted into the output of fpr39i ntf at the current point t same as a but takes only one argument with type outchannel gt un39i t and applies it to the current outchannel takes no argument and output one character The Pr39i ntf module also provides several additional functions for print ing on the standard channels The pr39i ntf function prints in the channel stdout and epr39i ntf prints on stderr let pr39intf fpr39intf stdout let epr39intf fpr39intf stderr The spr39i ntf function has the same format speci cation as pr39i ntf but it prints the output to a string and returns the result val spr39intf a unit str39ing format gt a 66 CHAPTER 9 INPUTAND OUTPUT 95 String buffers The Buffer library module provides string buffers The string buffers can be signi cantly more ef cient that using the native string operations String buffers have type Buffer t The type is abstract meaning that the imple mentation of the buffer is not speci ed Buffers can be created with the Buffer create function type t Abstract type of string buffers val create un39it gt t There are several functions to examine the state of the buffer The contents function returns the current contents of the buffer as a string The length function returns the total number of characters stored in the buffer The clear and reset function remove the buffer contents the dif ference is that reset also deallocates the internal storage used to save the current contents val contents t gt String length t gt 39int clear t gt un39it val reset t gt un39it val val There are also several functions to add values to the buffer The addchar function appends a character to the buffer contents The addstr39i ng function appends a string to the contents there is also an addsubstr39i ng function to append part of a string The addbuffer func tion appends the contents of another buffer and the addchannel reads input off a channel and appends it to the buffer val addchar t gt char gt un39it val addstr39ing t gt String gt un39it val addsubstr39ing t gt String gt 39int gt 39int gt un39it val addbuffer t gt t gt un39it val addchannel t gt 39inchannel gt 39int gt unit The outputbuffer function can be used to write the contents of the buffer to an outchannel val outputbuffer outchannel gt t gt unit The Pr39i ntf module also provides formatted output to a string buffer The bpr39i ntf function takes a pri ntf style format string and formats out put to a buffer val bpr39intf Buffert gt a Buffert un39it format gt a Chapter 10 Files Compilation Units and Programs One of the principles of modern programming is data hiding using en capsulation An abstract data type ADT is a program unit that de nes a data type and functions also called methods that operate on that data type An ADT has two parts a signature or interface that declares the accessible data structures and methods and an implementation that de nes concrete implementations of the objects declared in the signature The implementa tion is hidden all access to the ADT must be through the methods de ned in the signature There are several ideas behind data hiding using ADTs First by sepa rating a program into distinct program units called modules the program may be easier to understand Ideally each module encapsulates a single concept needed to address the problem at hand Second by hiding the implementation of a program module dependen cies between program modules become tightly controlled Since all inter actions must be through a module s methods the implementation of the module can be changed Without affecting the correctness of the program as long as the behavior of the methods is preserved OCaml provides a module system that makes it easy to use the concepts of encapsulation and data hiding In fact in OCaml every program le acts as an abstract module called a compilation unit in the OCaml terminology A signature for the le can be de ned in a mli le with the same name If there is no ml39i le the default signature includes all type and functions de ned in the m1 le 67 68 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS 101 Signatures In OCaml a signature contains type de nitions and function declarations for the visible types and methods in the module To see how this works let s revisit the binary trees we de ned in Chapter 6 A binary tree de nes a simple distinct concept and it is an ideal candidate for encapsulation A module signature usually contains three parts 1 Data types used by the module 2 Exceptions used by the module 3 Method type declarations for all the externally visible methods de ned by the module For the binary tree the signature will need to include a type for binary trees and type declarations for the methods for operating on the tree First we need to choose a lename for the compilation unit The lename should re ect the function of the data structure de ned by the module For our purposes the binary tree is a data structure used for de ning a nite set of values and an appropriate lename for the signature would be fset ml i The data structure de nes a type for sets and three methods an empty set a mem membership function and an insert insertion function The complete signature is de ned below we ll discuss each of the parts in the following sections The abstract type of sets type a t Empty set val empty a t Membership function val mem gt a t gt bool Insertion is functional val insert a gt a t gt a t 1011 Type declarations Type declarations in a signature can be either transparent or abstract An abstract type declaration declares a type without giving the type de nition a transparent type declaration includes the type de nition For the binary tree the declaration type a t is abstract because the type de nition is left unspeci ed In this case the type de nition won t be visible to other program units they will be forced to use the methods if they 1 02 IMPLEMEN TA TYONS 69 want to operate on the data type Note that the abstract type de nition is polymorphic it is parameterized by the type variable 31 Alternatively we could have chosen a transparent de nition that would make the type visible to other program modules For example if we intend to use the unbalanced tree representation we might include the following type declaration in the signature type a t Node of a t a a t Leaf By doing this we would make the binary tree structure visible to other program components they can now use the type de nition to access the binary tree directly This would be undesirable for several reasons First we may want to change the representation later by using redblack trees for example If we did so we would have to nd and modify all the other modules that accessed the unbalanced structure directly Second we may be assuming that there are some invariants on values in the data structure For example we may be assuming that the nodes in the binary tree are or dered If the type de nition is visible it would be possible for other program modules to construct trees that violate the invariant leading to errors that may be dif cult to nd 1012 Method declarations The method declarations include all the functions and values that are visible to other program modules For the Fset module the visible methods are the empty mem and 39inse rt methods The signature gives only the type declarations for these methods It should be noted that only these methods will be visible to other pro gram modules If we de ne helper functions in the implementation these functions will be private to the implementation and inaccessible to other program modules 102 Implementations The module implementation is de ned in a ml le with the same base name as the signature le The implementation contains parts that correspond to each of the parts in the signature 1 Data types used by the module 2 Exceptions used by the module 70 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS 3 Method de nitions The de nitions do not have to occur in the same order as declarations in the signature but there must be a de nition for every item in the signature 1021 Type de nitions In the implementation de nitions must be given for each of the types in the signature The implementation may also include other types These types will be private to the implementation they will not be visible outside the implementation For the Fset module let s use the redblack implementation of balanced binary trees We need two type de nitions the de nition of the Red and Bl ack labels and the tree de nition itself type color Red Black type a t Node of color a t quot Leaf The col or type is a private type the a t type gives the type de nition for the abstract type declaration type a t in the signature 1022 Method de nitions In the implementation we need to implement each of the methods declared in the signature The empty method is easy the Leaf node is used to implement the empty set let empty Leaf The mem method performs a search over the binary tree The nodes in the tree are ordered and we can use a binary search let rec mem x function Leaf gt false I Node a y b gt if x lt y then mem x a else if x gt y then mem x b else true The implement the 39inse rt method we need two methods one is the actual 39inse rt function and another is the helper function balance that keeps the tree balanced We can include both functions in the implemen tation The balance function will be private since it is not declared in the signature 103 BUILDINGA PROGRAM 71 let balance function Black Node Red Node Red a x b y c Z gt Node Red Node Black a x b y Node Black c Z Black Node Red a x Node Red b y c Z d gt Node Red Node Black a x b y Node Black c Z Black a x Node Red Node Red b y c Z d gt Node Red Node Black a x b y Node Black c Z Black a x Node Red b y Node Red c Z d gt Node Red Node Black a x b y Node Black c Z a b c d gt Node a b c d let insert x s et rec ins function Leaf gt Node Red Leaf x Leaf Node color a y b as s gt if x lt y then balance color ins a y b else if x gt y then balance color a y ins b else s in match ins s with guaranteed to be non empty Node a y b gt Node Black a y b Leaf gt raise Invalidargument quotinsert 103 Building a program Once a compilation unit is de ned the types and methods can be used in other les by pre xing the names of the methods with the capitalized le name For example the empty set can be used in another le with the name Fsetempty Let s de ne another module to test the Fset implementation This will be a simple program with an input loop where we can type in a string If the string is not in the set it is added otherwise the loop will print out a message that the string is already added To implement this program we need to add another le we ll call it test ml The Test compilation unit has no externally Visible types or methods By default the testmli le should be empty The Test implementation should contain a function that recursively l prints a prompt 2 reads a line from stdi n 3 checks if the line is already in the set 72 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS 4 if it is then print a message 5 repeat We ll implement this as a quotloop method quotlet quotloop 0 quotlet set ref Fsetempty 39in ry while true do outputstring stdout quotsetgt flush stdout let line inputline stdin in if Fsetmem line set then Printfprintf s is already in the setn line else Printfprintf s added to the setn line set Fsetinsert line set done with Endoffile gt quotlet quotloop 0 There are a few things to note First we need to catch the Endoff39i quotl e exception that is raised when the end of the input le is reached In this case we exit without comment To run the loop we include the line quotlet quotloop 0 The quotlet may seem strange it tells the OCaml parser that this is a new top level expression Another way to accomplish this is by adding the terminator after the last 0 expression in the quotloop function 104 Compiling the program Once the les for the program are de ned the next step is to compile them using ocaml c The usage of ocamlc is much like cc Normally the les are compiled separately and linked into an executable Signatures must be compiled rst followed by the implementations For the fset module the signature can be compiled with the following command ocamlc c fsetmquotl39i 1 04 COMPILING THE PROGRAM 73 If there are no errors in the signature this step produces a le called fset cmi The implementations are compiled with the following command ocamlc c fsetml ocamlc c testml If this step is successful the compiler produces the les fset cmo and test cmo The modules can now be linked into a complete program using the ocaml c linker The command is as follows ocamlc 0 test fsetcmo testcmo The linker requires all of the cmo les to be included in the program The order of these les is important Each module in the link line can refer only to the modules listed before it If we reverse the order of the modules on the link line we will get an error ocamlc 0 test testcmo fsetcmo Error while linking testcmo Reference to undefined global Fset Exit 2 Once the program is linked we can run it test setgt hello hello added to the set setgt world world added to the set setgt hello hello is already in the set setgt x x added to the set setgt world world is already in the set 1041 Where is the main function Unlike C programs OCaml program do not have a main function When an OCaml program is evaluated all the statements in the les in the program are evaluated in the order speci ed on the link line Program les contain type and method de nitions They can also contain arbitrary expressions to be evaluated The let loop 0 statement in the testml le is an example it evaluates the loop function Informally this is the main loop it is the last expression to be executed in the program 74 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS 1042 Some common errors When a ml le is compiled the compiler compares the implementation with the signature in the cmi le If a de nition does not match the signature the compiler will print an error and refuse to compile the le Type errors For example suppose we had reversed the order of arguments in the Fset i nse rt function so that the set argument is rst let insert 5 x When we compile the le we get an error The compiler prints the types of the mismatched values and exits with an error code ocamlc c fsetml The implementation fsetml does not match the interface fset Values do not match val insert a t gt a gt a t is not included in val insert a gt a t gt a t Exit 2 Missing de nition errors Another common error occurs when a method declared in the signature is not de ned in the implementation For example suppose we had de ned an add method rather than an insert method In this case the compiler prints the name of the missing method and exits with an error code ocamlc c fsetml The implementation fsetml does not match the interface fset The field insert is required but not provided Exit 2 Type de nition mismatch errors Transparent type de nitions in the signature can also cause an error if the type de nition in the implementation does not match Suppose we were to export the de nition for type a t We need to include exactly the same de nition in the implementation A correct fset ml i le would contain the following de nition type col or cmi cmi 1 04 COMPILING THE PROGRAM 75 type a t Node of color Leaf Note that we must include a type de nition for color since it is used in the de nition of the set type a t The type de nition for col or may be transparent or abstract ow suppose we reorder the constructors in the interface de nition for a t by placing the Leaf constructor rst atquotaquotat type col or Node of color quot a t quot a quot When we compile the le the compiler will produce an error with the mismatched types ocamlc c fsetmli ocamlc c fsetml The implementation fsetml does not match the interface fsetcmi Type declarations do not match type a t Node of color a t a a t Leaf is not included in type a t Leaf Node of color a t quot a a t Exit 2 Compile dependency errors The compiler will also produce errors if the compile state is inconsistent Each time an interface is compile all the les that uses that interface must be recompiled For example suppose we update the fsetmli le and recompile it and the test ml le but we forget to recompile the fset ml le The compiler produces the following error ocamlc c fsetmli ocamlc c testml ocamlc 0 test fsetcmo testcmo Files testcmo and fsetcmo make inconsistent assumptions over interface Fset Exit 2 It takes a little work to detect the cause of the error The compiler says that the les make inconsistent assumptions for interface Fset The inter face is de ned in the le fset cmi and so this error message states that at least one of fsetml or test cmo needs to be recompiled In general we don t know which le is out of date and the best solution is to recompile them all 76 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS 105 Using open to expose a namespace Using the full name Modulenamemethodname to refer to the methods in a module can get tedious The open Modulemame statement can be used to open a module interface which will allow the use of unquali ed names for types exceptions and methods For example the test ml module can be somewhat simpli ed by using the open statements for the Pri ntf and Fset modules open Printf open Fset let loop et set ref empty in try while true do outputstring stdout quotsetgt flush stdout let line inputline stdin in if mem line set then printf s is already in the setn line else printf s added to the setn line set insert line set Endoffi l e gt 0 let loop 0 Sometimes multiple opened modules will de ne the same name In this case the last module with an open statement will determine the value of that symbol Fully quali ed names of the form Modulemame name may still be used even if the module has been opened Fully quali ed names can be used to access values that may have been hidden by an open statement 1051 A note about open Be careful with the use of open In general fully quali ed names provide more information specifying not only the name of the value but the name of the module where the value is de ned For example the Fset and Li st modules both de ne a mem function In the Test module we just de ned 1 06 DEBUGGING A PROGRAM 77 it may not be immediately obvious to a programmer that the mem symbol refers to Fsetmem not Li stmem In general you should use open statement sparingly Also as a matter of style it is better not to open most of the library modules like the Ar ray Li st and String modules all of which de ne methods like create with common names Also you should never open the Unix Obj and Marshal modules The functions in these modules are not completely portable and the fully quali ed names identify all the places Where portability may be a problem for instance the Unix grep command can be used to nd all the places Where Unix functions are used The behavior of the open statement is not like an i ncl ude statement in C An implementation le mod ml should not include an open Mod state ment One common source of errors is de ning a type in a mli interface then attempting to use open to include the de nition in the ml imple mentation This won t workithe implementation must include an identical type de nition True this is an annoying feature of OCaml But it preserves a simple semantics the implementation must provide a de nition for each declaration in the signature 106 Debugging a program The ocaml debug program can be used to debug a program compiled with ocaml c The ocamldebug program is a little like the GNU gdb program it allows breakpoints to be set When a breakpoint is reached control is returned to the debugger so that program variables can be examined To use ocaml debug the program must be compiled with the g ag ocamlc c g fsetmli ocamlc c g fsetml ocamlc c g testml ocamlc 0 test g fsetcmo testcmo The debugger is invoked using by specifying the program to be debugged on the ocaml debug command line ocamldebug test Objective Caml Debugger version 204 ocd help Li st of commands cd complete pwd directory kill help quit run reverse step backstep goto finish next start previous print display source break delete set show info frame backtrace bt up down last list loadprinter installprinter removeprinter 78 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS ocd There are several commands that can be used The basic commands are run step next break list print and goto run Start or continue execution of the program break module linenum Set a breakpoint on line linenum in module mod ule list display the lines around the current execution point print eXpr Print the value of an expression The expression must be a vari able goto time Execution of the program is measured in time steps starting from 0 Each time a breakpoint is reached the debugger will print the current time The goto command may be used to continue execu tion to a future time or to a previous timestep step Go forward one time step next If the current value to be executed is a function evaluate the func tion a return control to the debugger when the function completes Otherwise step forward one time step For debugging the test program we need to know the line numbers Let s set a breakpoint in the loop function which starts in line 27 in the Test module We ll want to stop at the rst line of the function ocd break Test 28 Loading program done Breakpoint 1 at 24476 file Test line 28 column 4 ocd run Time 7 pc 24476 module Test Breakpoint 1 28 ltbgtlet set ref Fsetempty in ocd n Time 8 pc 24488 module Test 29 ltbgttry ocd p set set string Fsett ref contentsFsetLeaf Next let s set a breakpoint after the next input line is read and continue execution to that point M MEMEWGAHMQM N ocd Tist 27 Tet Toop 28 Tet set ref Fsetempty in 29 ltbgttry 30 whiTe true do 31 outputstring stdout quotsetgt 32 fTush stdout 33 Tet Tine inputTine stdin in 34 if Fsetmem Tine set then 35 Printfprintf s is aTready in the setn Tine 36 eTse 37 Printfprintf s added to the setn Tine 38 set Fsetinsert Tine set 39 done ocd break 34 Breakpoint 2 at 24600 fiTe Test Tine 33 coTumn 40 ocd run setgt heTTo Time 22 pc 24604 moduTe Test Breakpoint 2 34 ltbgtif Fsetmem Tine set then ocd p Tine Tine string quotheTToquot When we run the program the evaluation prompts us for an input line andiNecanseethev ueofthehneintheTineva abkLefscon nueand View the set after the line is added ocd n Time 24 pc 24628 moduTe Test 34 if Fsetmem Tine setltagt then ocd n Time 25 pc 24672 moduTe Test 37 ltbgtPrintfprintf s added to the setn Tine ocd n Time 135 pc 24700 moduTe Test 37 Printfprintf s added to the setn Tineltagt ocd n Time 141 pc 24728 moduTe Test 38 set Fsetinsert Tine setltagt ocd n Time 142 pc 24508 moduTe Test 31 ltbgtoutputstring stdout quotsetgt ocd p set set string Fsett ref contentsFsetNode ltabstrgt FsetLeaf quotheTToquot FsetLeaf ocd 80 CHAPTER 10 FILES COMPILATION UNITS AND PROGRAMS This value seems to be correct Next suppose we want to go back a descend into the Fsetmem function We can go back to time 22 the time just before the Fset mem function is called and use the step command to descend into the membership function ocd goto 22 setgt hello Time 22 pc 24604 module Test Breakpoint 7 34 ltbgtif Fsetmem line set then ocd s Time 23 pc 22860 module Fset 39 Leaf gt ltbgtfalse ocd s Time 24 pc 24628 module Test 34 if Fsetmem Tine setltagt then ocd Note that when we go back in time the program prompts us again for an input line This is due to way time travel is implemented in ocaml debug Periodically the debugger takes a checkpoint of the program using the Unix forkO system call When reverse time travel is requested the de bugger restarts the program from the closest checkpoint before the time requested In this case the checkpoint was taken sometime before the call to i nputl i ne and the program resumption requires another input value When we step into the Fset mem function we see that the membership is false the set is the Leaf empty value We can continue from here exam ining the remaining functions and variables You may wish to explore the other features of the debugger Further documentation can be found in the OCaml reference manual Chapter 1 1 The OCaml Module System The compilation units discussed in the Chapter 10 are not the only way to create modules OCaml provides a general module system where modules can be created explicitly using the module keyword There are three key parts in the module system signatures structures and functors Module signatures correspond to the signatures de ned in a ml39i le and module structures correspond to the implementations de ned in a ml le There is one major difference Each compilation unit has at most one signature de ned by the ml39i le The module system is more general a single signature can be used to specify multiple structures and a structure can have multiple signatures This ability to share signatures and structures can have important effects on code re use For example in Chapter 6 we introduced three implemen tations of nite sets using unbalanced ordered and balanced binary trees All three of these implementations can be expressed as structures having the same signature Any of the three implementations can be used in a context that requires an implementation of nite sets The ability to assign multiple signatures to a structure becomes useful in larger programs composed of multiple components each spread over multiple les The structures within a program component may make their implementations visible to one another like a friend declaration in a C class or a protected declaration for a Java method Outside the program component a new signature may be assigned to hide the implementation details making them private OCaml module system also includes functors or parameterized structures A functor is a function that computes a structure given a struc ture argument Functors provide a simple way to generalize the implemen tation of a structure In the following sections we ll describe the three different parts of the module system by developing the nite set example in the context of the 81 82 CHAPTER 11 THE OCAML MODULE SYSTEM module system 1 11 Module signatures A module signature is declared with a module type declaration module type Name si 9 signature end The name of the signature must begin with an uppercase letter The signature can contain any of the items that can occur in an interface ml39i le including any of the following type declarations except39i on de nitions method type declarations using the val keyword open statements to open the namespace of another signature include statements that include the contents of another signature nested signature declarations Signatures can be de ned in an interface implementation or in the OCaml toploop A signature is like a type declarationiif a ml39i le de nes a signature the same signature must also be de ned in the m1 le For the nite set example the signature should include a type declaration for the set type and method declarations for the empty mem and insert methods For this example we ll return to the OCaml toploop which will display the types of the modules we de ne module type FsetSig 519 type a t val empty a t val mem a gt a t gt bool val insert a gt a t gt 1 a t module type FsetS39ig s ig type a t val empty a t val mem a gt a t gt bool val insert a gt a t gt a t end 112 MODULE STRUCTURES 83 The 39include statement can be used to create a new signature that ex tends an existing signature For example suppose we would like to de ne a signature for nite sets that includes a delete function to remove an el ement of a set One way to be to retype the entire signature for nite sets followed by the delete declaration The 39incl ude statement performs this inclusion automatically module type FsetDS39ig 9 39incl ude Fset val delete a gt a t gt a t module type FsetDSig sig type a t val empty a t val mem a gt val insert val delete end 1 12 Module structures Module structures are de ned with the module keyword module Name st ruct implementation end Once again the module name must begin with an uppercase letter The implementation is exactly the same as the contents of a ml le It can include any of the following type de nitions except39i on de nitions method de nitions using the let keyword open statements to open the namespace of another module 39incl ude statements that include the contents of another module signature declarations nested structure de nitions Let s try this with the balanced binary tree example the complete de nitions for the balance and insert functions are given in Section 1022 84 CHAPTER 11 THE OCAML MODULE SYSTEM moduquotle Fset struct type color Red Black type a t a t a a t Node of color quot Leaf quotlet empty Leaf let rec mem x function Leaf gt false I Node a y b gt if x lt y then mem x a else if x gt y then mem x b else true quotlet baquotlance quotlet 39insert x s end moduquotle Fset s ig type color Red Black and a t Node of color a t a a t Leaf vaquotl empty a t vaquotl mem a gt a t gt booquotl at at gt at vaquotl baquotlance coquotlor a vaquotl 39insert a gt a t gt a t end Fsetempty a Fsett FsetLeaf Fsetbaquotl ance a Fsetcoquotlor a Fsett a quot a Fsett gt a Fsett ltfungt 1 121 Assigning a signature Note that the default signature assigned to the structure exposes all of the types and functions in the structure including the type de nitions for the color and a t types as well as the bal ance function which would nor mally be hidden To assign a signature to a structure we include a type constraint using a modi er of the following form moduquotl e Name SigName struct implementation end 113 FUNCTORS 85 In the nite set example we want to assign the FsetS39i g signature to the module module Fset FsetS39ig struct type color Red Black type a t Node of color a t a a t Lea quotlet empty Leaf quotlet rec mem x quotlet bal ance quotlet 39insert x s end module Fset FsetS39ig Fsetempty 39 a Fsett ltabstrgt Fsetbaquotl ance Characters 0 12 Unbound vaquotl ue Fsetbaquotl ance When we assign this signature the type de nition for a t becomes abstract and the bal ance function is no longer Visible outside the module de nition 1 13 Functors One problem with the implementation of nite sets that we have been using is the use of the builtin lt comparison operation to compare values in the set The de nition of the lt operator is implementationspeci c and it may not always de ne the exact ordering that we want To x this problem we can de ne our own comparison function but we will need to de ne a separate nite set implementation for each different element type For this purpose we can use functors A functor is a function on modules the function requires a module argument and it produces a module Functors can be de ned with the functor keyword or with a more common alternate syntax moduquotl e Name functor ArgName ArgSig gt struct implementation en moduquotl e Name Arg ArgSig struct implementation end 86 CHAPTER 11 THE OCAML MODULE SYSTEM For the nite set example we ll need to de ne an argument structure that includes a type el t of elements and a comparison function compare We ll have the compare function return one of three kinds of values a negative number if the rst argument is smaller than the second zero if the two arguments are equal a positive number if the rst argument is larger than the second module type EltSig sig type elt val compare elt gt elt gt int en The nite set signature FsetSi 9 must also be modi ed to used a speci c element type el t Note that the set itself is no longer polymorphic it is de ned for a speci c type of elements module type FsetSig sig type elt type t val empty t val mem elt gt t gt bool val insert elt gt t gt t end Next we rede ne the set implementation as a functor The implementa tion must be modi ed to include a type de nition for the el t type and the mem and inse rt functions must be modi ed to make use of the comparison function from El t module Makeret Elt EltSig struct type elt Eltelt type color type t Node of color t elt quot t Leaf let empty Leaf let rec mem x function Leaf gt false I Node a y b gt 1L3FUNCTORS 87 let i Eltcompare x y in ifi ltOthen memxa else if i gt 0 then mem x b else true let balance let insert x s let rec ins function Leaf gt Node Red Leaf x Leaf Node color a y as s gt let i Eltcompare x y in if i lt 0 then balance color ins a y b else if i gt 0 then balance color a y ins b else s in match ins s with guaranteed to be non empty Node a y b gt Node Black a y b Leaf gt raise Invalidargument quotinsert end module Makeret functorElt EltSig gt sig type elt Eltelt and color Red Black and t Node of color t elt t Leaf val empty t val mem Eltelt gt t gt bool val balance color t elt t gt t val insert elt gt t gt t end Note the return type The argument type is right the functor takes an argument module El t with signature El tSi Q But the returned module makes the implementation fully Visible To x this problem we need to add a type constraint using the modi er module Makeret Elt EltSig FsetSig struct type elt Eltelt type color type t let empty let rec mem x let balance let insert x s 88 CHAPTER 11 THE OCAML MODULE SYSTEM end module Makeret functorElt EltS39ig gt FsetS39ig 1 131 Using a functor To use the module produced by the functor we need to apply it to a speci c module implementating the El tS39ig signature Let s de ne a comparison function for a nite set of integers The comparison function is straightfor ward module Int st ruct type el t 39int let compare 39i j 39if39i lt j then else 39if 39i gt j then 1 else 39i j 0 end module Int s ig type elt 39int val compare 39 Intcompare 3 5 39int int gt 39int gt 39int end We must not give the Int module the signature El tS39ig In the El tS39ig signature the elt type is abstract Since there is no way to create a value of the abstract type el t it would become impossible to use the compare function and the module would become useless module Int Int module Int EltS39ig Int compare 3 5 Characters 13 1 EltS39ig 4 This expression has type 39int but 39is here used with type Int elt A functor is applied to an argument with the syntax Functorname Argname To build a nite set of integers we apply the Makeret functor to the Int module module IntSet Makeret Int module IntSet s ig type elt MakeretIntelt and t MakeretIntt val empty t val mem elt gt t gt bool 113 FUNCTORS 89 val 39insert elt gt t gt t en IntSetempty IntSett ltabstrgt Note the type de nitions for el t and t both types are abstract 1132 Sharing constraints In its current state the IntSet module is actually useless Once again the problem is with type abstraction the el t type is de ned as an abstract type in the FsetS39i g signature The OCaml compiler remembers that the type of elements el t is produced by an application of the functor but it doesn t know that the argument type in the Int module was 39int IntSet insert 5 IntSetempty Characters 14 15 This expression has type 39int but 39is here used with type IntSetelt MakeretIntelt To x this problem we can t add a type de nition in the FsetS39i g module since we want to be able to construct nite sets with multiple different element types The only way to x this problem is to add a constraint on the functor type that speci es that the el t type produced by the functor is the same as the el t type in the argument 1 133 An implementation that works These kind of type constraints are called sharing constraints The argument and value of the Makeret functor share the same elt type Sharing constraints are de ned by adding a with type constraint to a signature The corrected de nition of the Makeret functor is as follows module Makeret Elt EltS39ig FsetS39ig with type elt Eltelt struct type elt Eltelt type color type t let empty let rec mem x let balance let 39insert x s module Makeret functorElt EltS39ig gt 90 CHAPTER 11 THE OCAML MODULE SYSTEM s ig type e1t E1te1t va1 empty t va1 mem e1t gt t gt boo1 va1 insert e1t gt t gt t end The toploop now displays the correct element speci cation When we rede ne the IntSet module we get a working version of nite sets of inte gers modu1e IntSet Makeret Int modu1e IntSet s ig type e1t Inte1t and t MakeretIntt t va1 mem e1t gt t gt boo1 va1 insert e1t gt t gt t end IntSetempty IntSett ltabstrgt open IntSet quotlet s insert 3 insert 5 insert 1 empty va1 s IntSett ltabstrgt mem 4 5 boo1 fa1se Chapter 12 The OCaml Object System OCaml includes a unique object system with classes parameterized classes and objects and the usual features of inheritance and subclassing Objects are perhaps not as frequently used in OCaml as in other languages like C or Java because the module system provides similar features for code reuse However classes and objects are often appropriate in programs where extensibility is desirable 121 The basic object system The OCaml object system differs in one major way from the classes de ned in many other languages the object system includes both class types as well as class expressions The two are separate just as module signatures are separate from module structures There are three construct in the OCaml object system class type are signatures for classes classes are initial spec i cations for objects and objects are instances of classes created with the new keyword 1211 Class types A class type is de ned using a cl ass type de nition The syntax of a class type declaration is as follows cl ass type name object declarations end The name of the class type should begin with a lowercase letter or an underscore The declarations can include any of the following Inheritance directives with the inherit keyword 91 92 CHAPTER 12 THE OCAML OBJECT SYSTEM Values declared with the val keyword Methods declared with the method keyword Type constraints declared with the constraint keyword To illustrate the object system let s use the canonical object example a onedimensional movable point The point should have methods to return and modify the position of the point The class type for the point includes two methods one to get the posi tion of the point and another to set the position of the point We will also include areset function to return the point to the origin class type pointtype object method get int method set int gt unit method reset unit end class type pointtype object method get int method set int gt unit method reset unit 1 212 Class expressions A class expression gives the de nition of a class The syntax for an class expression uses the cl ass keyword object implementation end The implementation can include any of the following Values de ned with the val keyword Methods de ned with the method keyword Type constraints de ned with the constraint keyword Initializers de ned with the initialize r keyword We can build a class of the po i nttype class type by implementing each of the elds in the class type To implement the point we will need to include a pos eld that speci es the position of the point The get method should return the pos value and the move method should add an offset to the position 121 THEBASIC OBJECTSYSTEM 93 class point object val mutable pos 0 method get pos method set pos pos lt pos method reset pos lt 0 end class point object val mutable pos int method get int method reset unit method set int gt unit end The pos lt pos off is a sideeffect the value of pos is updated by adding the offset argument Note that the pos eld is visible in the class type To get the correct class type we need to add a type constraint class point pointtype object val mutable pos 0 method get pos method set pos pos lt pos method reset pos lt 0 end class point pointtype Class expressions are templates like function bodies The expressions in a class expression are not evaluated when the class is de ned they are evaluated when the class is instantiated as an object 1213 Objects Objects are the values created from classes using the new keyword The methods in the object can be accessed by using the operator let p new po int val p po int ltobjgt pget int 0 pset 7 unit O pget 39int 7 let p2 new point 3 u 94 CHAPTER 12 THE OCAML OBJECT SYSTEM val p2 po int ltobjgt p2get 39int 1214 Parameterized class expressions Class expressions can be parameterized in OCaml using a fun expression For example suppose we want to specify the initial position of the point as an argument to the class expression class makepo intclass 39in39it39ialpos 39int object val mutable pos 39in39it39ialpos method get pos method set pos pos lt pos method reset pos lt 0 end class makepo intclass 39int gt object val mutable pos 39int method get 39int method reset un39it method set 39int gt un39it en We have to constrain the argument 39in39i t39i a1 pos to be an 39int other wise the object would be polymorphic Speci c classes can be de ned by application class po int7 makepo intclass 7 cl ass po i nt7 makepo i ntclass let p new po int7 val p po int7 ltobjgt pget 39int 7 preset un39it pget 39int 0 A parameterized class can also include let de nitions in the function body For example we can lift the pos eld out of the class and use a reference cell instead cl ass makepo i ntcl ass 39i n39i t39i a1 pos 39int l t pos ref 39in39it39ialpos 39in object 1 22 POL YMORPHISM 95 method get pos method set pos pos pos method reset pos end class makepointclass int gt object method get 39int method reset un39it method set 39int gt un39it end The body of the cl ass de nition is not evaluated initiallyiit is evaluated at object instantiation time All po i nt objects will have separate positions let p1 new po int7 val p1 po int7 ltobjgt let p2 new po int7 val p2 po int7 ltobjgt p1set 5 un39it O p2get 39int 122 Polymorphism Class types class expressions and methods can also be polymorphic For example consider the parameterized class makepo i ntcl ass we just de ned If we do not constrain the type of argument we get a type of refer ence cells The syntax of a polymorphic class includes the type parameters in square brackets after the cl ass keyword class a makerefcell x a object val mutable contents x method get contents method set x contents lt x end class a makerefcell a gt object val mutable contents a method get a method set a gt un39it n class 39intref 39int makerefcell 0 class 39intref 39int makerefcell 96 CHAPTER 12 THE OCAML OBJECT SYSTEM let p new 39intref val p 39intref ltobjgt pset 7 un39it O pget 39int 7 class str39ingref string makerefcell class str39ingref string makerefcell let s new str39ingref val s str39ingref ltobjgt sset quotHelloquot un39it O sget string quotHelloquot 123 Inheritance Inheritance allows classes to be de ned by extension For example suppose we wish to de ne a new po i nt class that includes a move method that moves the point by a relative offset The move method can be de ned using the get and set methods To be able to access these methods we need a self parameter like the th39i s object in C or Java The self parameter is de ned after the obj ect keyword We make a new class movabl epo i nt using the inherit keyword to inherit the po i nt class de nition cl ass movabl epo i nt object self inherit point method move off selfset selfget off end class movablepoint object method get int method move int gt unit method reset unit method set int gt unit end let p new movablepoint val p movablepoint ltobjgt 123 INHERITANCE 97 pmove 5 unit pget int 12 1231 Multiple inheritance Classes can also be de ned by inheriting from multiple classes For example let s de ne a point class that also has a color The col or class can be de ned in the normal way type color Black Red Green Blue type color Black Red Green Blue class color object val mutable color Black method getcolor color method setcolor color method reset color lt Black color lt color class color object val mutable color color method getcolor color method reset unit method setcolor color gt unit end let c new color val c color ltobjgt csetcolor Green unit cgetcolor color Green To de ne a colored point we inherit from both classes Objects in this class will have the methods and values de ned in both classes cl ass coloredpoi nt object inherit point inherit color end Characters 63 74 Warning the following methods are overriden by the inherited class reset class coloredpoint 98 CHAPTER 12 THE OCAML OBJECT SYSTEM object val mutable color color method get int method getcolor color method reset unit method set int gt unit method setcolor color gt unit end let cp new coloredpoint val cp coloredpoint ltobjgt cpget int 0 cpgetcolor Note that the compiler produced a warning message when the colored point is created The point and color bath de ne a method called reset Which de nition does the colored point use cpset 7 unit O cpsetcolor Red unit O cpreset unit cpget int 7 cpgetcolor color Black As usual the compiler chooses the last de nition of the method The correct version of the colored point should call both the point and color reset functions The coloredpoi nt method must override the de nition To do this we need to include a name for the object in each of the inherit declarations class coloredpoint 0 j ct inherit point as p inherit color as c method reset preset creset end Characters 64 69 Warning the following methods are overriden by the inherited class cl ass col oredpoi nt 123 INHERITANCE 99 object val mutable color color val mutable pos int method get int method getcolor color method reset unit method set int gt unit method setcolor color gt unit end let cp new coloredpoint val cp coloredpoint ltobjgt cpset 5 1 cpsetcolor Red un39it O cpreset un39it cpget 39int 0 cpgetcolor color Black The compiler still produces a warning message but this time the reset method works correctly 1232 Virtual methods Virtual methods can be used to postpone the implementation of methods for de nition in subclasses For example suppose we wish to make a point that includes a method move to move the object by a relative offset One way would be to inheritance to de ne a new class movabl epo i nt based on the po i nt class Another more general way is to de ne a separate movable class that can be combined by multiple inheritance with any class that implements the get and set methods This class must be declared as V39l rtual because it can t be instantiated the get and set methods are not implemented class virtual movable object self method virtual get int method virtual set int gt unit method move off selfset selfget off i class virtual movable 39 object 100 CHAPTER 12 THE OCAML OBJECT SYSTEM method vi rtual get int method move int gt unit method vi rtual set int gt unit end let m new movable Characters 8 19 One cannot create instances of the vi rtual class movable Now to create the class movabl epoi nt we combine the classes by mul tiple inheritance class movablepoint object inherit point inherit movable end class movablepoint 39 object val mutable pos int method get int method move int gt unit method reset unit method set int gt unit end let p new movablepoint val p movablepoint ltobjgt pset 7 unit O pmove 5 unit O pget int 12 Note that a vi rtual method in OCaml does not mean the same thing as a vi rtual declaration in C In C the vi rtual declaration means that a method can be overridden in subclasses In OCaml all methods are Virtual in this sense The OCaml vi rtual declaration just means that the method de nition is omitted 1233 Subclassing The inherit declarations in a class de nition de ne an inheritance hierar chy In OCaml an object can be coerced to a class type of any of its ancestors Coercions in OCaml must be made explicitly using the gt operator which requires two class types the type of the object and the type of the object after the coercion 123 INHERITANCE 101 let p cp coloredpo int gt po int val p po int ltobjgt pget 39int 0 pgetcolor Characters 0 1 This expression has type point It has no method getcolor If the class type can be inferred the rst type can be omitted let p cp gt po int val p po int ltobjgt In OCaml objects can also be coerced to any class type that has fewer methods For example suppose we want a read onlyquot colored point without the set and setco39or methods class type readonlypoint object method get int method getcolor color end class type readonlypoint object method get int method getcolor color end let rop cp coloredpoint gt readonlypoint val rop funnypoint ltobjgt ropget int ropgetcolor color Re ropset 5 Characters 0 4 This expression has type readonlypoint It has no method set 1234 Superclassing or typecase In OCaml there is no operator to coerce an object to a superclass there is no typecase operator or 39instanceof predicate like in Java So for instance once we coerce a coloredpo int to a po i nt there is no corresponding operator for recovering the col oredpo i nt This kind of problem can arise frequently in some contexts especially when binary functions are de ned over two objects For example suppose 102 CHAPTER 12 THE OCAML OBJECT SYSTEM we wish to implement an equality relation on points The pointequal function should take two objects If both objects have type poi nt and both have the same position the pointequal function should return true If both are coloredpoi nts and have the same position and color it should also return true Otherwise it should return false How can we de ne this function One thing is clear the pointequal function must have type point gt point gt bool because the type of point is not known beforehand If the argument is of type point how can we tell if it is actually a col oredpoi nt The easiest way to solve this problem is to use a trick For each class we add a new method that uses an exception to return the actual value We will call this method typecase and it will have type uni t since it returns the result by exception The point class implements the typecase method as follows class type pointtype object method get int method set int gt unit method reset unit method typecase unit en class type pointtype object method get int method reset unit method set int gt unit method typecase unit end exception Point of pointtype exception Point of pointtype class point object self val mutable pos 0 method get pos method set pos pos lt pos method reset pos lt 0 method typecase raise Point self gt pointtype end class point object val mutable pos int method get int method reset unit method set int gt unit method typecase unit 12ui HVHFRTEANCF 103 end The typecase method raises the Point exception Note that the self parameter must be coerced to poi nttype For the c01 oredpoi nt we perform a similar operation First we de ne the type of colored points and the exception c1ass type coioredpointtype object inherit point inherit coior end c1ass type coioredpointtype object va1 mutab1e coior coior va1 mutab1e pos int method get int method getc010r coior method reset unit method set int gt unit method setc010r coior gt unit method typecase unit end exception CoioredPoint of coioredpointtype exception CoioredPoint of coioredpointtype Next we de ne the class and override the typecase method c1ass coioredpoint object seif inherit point as p inherit coior as c method reset preset creset method typecase raise CoioredPoint seif gt coioredpointtype end Characters 77 82 Warning the foiiowing methods are overriden by the inherited c1ass reset c1ass coioredpoint object va1 mutab1e coior coior va1 mutab1e pos int method get int method getc010r coior 104 CHAPTER 12 THE OCAML OBJECT SYSTEM method reset unit method set int gt unit method setcoior coior gt unit method typecase unit en Now the typecase method can be used to determine the class type of a point 1et p1 new point va1 p1 point ltobjgt 1et p2 coioredpoint va1 p2 coioredpoint ltobjgt 1et p3 p2 gt point va1 p3 point ltobjgt p1typecase Uncaught exception Point p2typecase Uncaught exception CoioredPoint p3typecase Uncaught exception CoioredPoint 3 m 5 At this point we can de ne the poi ntpri nt printing function 1et pointprint p try ptypecase with Point p gt printf quotPoint position dn pget CoioredPoint p gt 1et coior match pgetcoior with Biack gt quotb1ackquot Red gt quotredquot Green gt quotgreenquot Biue gt quotb1uequot in printf quotCoioredPoint position d coior sn pget coior gt raise Invaiidargument quotpointprint va1 pointprint lt typecase unit gt gt unit ltfungt p1set 7 p2setcoior Green unit Listiter pointprint p1 p2 gt point p3 Point position CoioredPoint position 0 coior green 124 FUNCTIONAL OBJECTS 105 ColoredPo int position 0 color green un39it There are two things to note First the po i ntpr39i nt function takes any object with a typecase methodino just points Second we include a de fault exception case if the typecase method returns some other exception the argument is invalid 124 Functional objects In all of the examples we have given so far the methods work by sideeffect OCaml can also be used to implement functional objects where method updates produce new values by copying the self object The syntax for a functional update uses the lt gt notation to produce a copy of the current object with the same type as the current object with updated elds The use of the update operator is importantiit is the only way to preserve the current object s type Let s build a functional version of points We include the pos eld which the set method replaces class point o ject val pos 0 method get 5 method set pos lt pos pos gt end class point object a val pos int method get int method set 39int gt a en let p1 new po int val p1 po i nt ltobjgt p1get 39int let p2 p1set 5 val p2 po i nt ltobjgt p2get 39int 5 Note the type of the set method on an object of type a it takes an integer argument and returns a new object of type a The col or class can also be modi ed so that it is functional CTLAPTER 12 TYJElDCVUMI OBJECT SYSTIIJ 106 class color object val color Black method getcolor color lt color color gt method setcolor color method reset lt color Black gt end class color object a val color color method getcolor color method reset a color gt a method setcolor end What about the coloredpoi nt example For the reset function we need to invoke the reset method from both the point and color super classes There is no syntax to do this directly for this purpose we will need to make use of private methods so that we can name the reset functions class coloredpoint object self inherit point as p inherit color as c method private preset preset method private creset creset method reset selfpresetcreset end Characters 75 80 Warning the following methods are overriden by the inherited class reset class coloredpoint 39 object val color 39 color val pos int method creset a method get int method getcolor color method private preset a method reset a method set int gt a method setcolor color gt end The resulting object has the expected behavior a let p1 new coloredpoint 124FUNCTHMVALOBHIYS va1 p1 co1oredpoint ltobjgt 1et p2 p1set 7 va1 p2 co1oredpoint ltobjgt 1et p3 p2setco1or B1ue va1 p3 co1oredpoint ltobjgt p3get int 7 p3getco1or co1or B1ue p2getco1or co1or B1ack 1et p4 p3reset va1 p4 co1oredpoint ltobjgt p4get int 0 p4getco1or co1or B1ack 108 CHAPTER 12 THE OCAML OBJECT SYSTEM Bibliography 1 L Damas and R Milner Principal type schemes for functional programs 1982 2 M Gordon R Milner and C Wadsworth Edinburgh LCFA Mechanized Logic of Computation volume 78 Springer Verlag 1979 Lecture Notes in Computer Science E Michael Gordon Robin Milner and Christopher Wadsworth Edinburgh LCF a mechanized logic of computation Lecture Notes in Computer Sci ence Vol 78 SpringerVerlag NY 1979 4 Xavier Leroy The Objective Caml System Documentation and User s Manual 2002 With Damien Doligez Jacques Garrigue Didier Remy and Jerome Vouillon Availiable from w ocaml org E Chris Okasaki Redblack trees un a functional setting Journal of Func tional Programming 944 71 4 77 May 1999 E Didier Remy and Jerome Vouillon Objective ML A simple object oriented extension of ML In ACM Symposium on Principles of Program ming Languages pages 40 53 1997 109 31 Pairs and Lists Formal vs Informal Proofs The question of what exactly constitutes a proof of a mathematical claim has challenged philosophers throughout the ages A rough and ready definition though could be this a proof of a mathematical proposition P is a written or sometimes spoken text that instills in the reader or hearer the certainty that P is true That is a proof is an act of communication Now acts of communication may involve different sorts of readers On one hand the reader can be a program like Coq in which case the belief that is instilled is a simple mechanical check that P can be derived from a certain set of formal logical rules and the proof is a recipe that guides the program in performing this check Such recipies are calledformul proofs Alternatively the reader can be a human being in which case the proof will be written in English or some other natural language thus necessarily informal Here the criterial for success are less clearly specified A good proof is one that makes the reader believe P But the same proof may be read by many different readers some of whom may be convinced by a par ticular way of phrasing the argument while others may not be One reader may be particularly pedantic inexperienced or just plain thickheaded the only way to convince them will be to make the argument in painstaking de tail But another reader more familiar in the area may find all this detail so overwhelming that they lose the overall thread All they want is to be told the main ideas because it is easier to fill in the details for themselves Ulti mately there is no universal standard because is no single way of writing an informal proof that is guaranteed to convince every conceivable reader In practice however mathematicians have developed a rich set of conventions and idioms for writing about complex mathematical objects that within a certain community make communication fairly reliable The conventions of 3 Pairs and Lists this stylized form of communication give a fairly clear standard for judging proofs good or bad Because we will be using Coq in this course we will be working heavily with formal proofs But this doesn t mean we can ignore the informal ones Formal proofs are useful in many ways but they are not very efficient ways of communicating ideas between human beings For example consider this statement Theorem plusiassocI forall n m p nat plus n plus m p plus plus n m p Coq is perfectly happy with this as a proof Proof intros n m p induction n as l n reflexivity simpl rewrite 4 IHn reflexivity Qed For a human however it is difficult to make much sense of this If you re used to Coq you can probably step through the tactics one after the other in your mind and imagine the state of the context and goal stack at each point but if the proof were even a little bit more complicated this would be next to impossible Instead a mathematician would write it as in Figure 3 1 The overall form of the proof is basically similar This is no accident of course Coq has been designed so that its induction tactic generates the same subgoals in the same order as the bullet points that a mathematician would write But there are significant differences of detail the formal proof is much more explicit in some ways eg the use of re flexivity but much less explicit in others in particular the proof state at any given point in the Coq proof is completely implicit whereas the informal proof reminds the reader several times where things stand EXERCISE In Li sts v you will find a Coq proof of a theorem called plusicomm which was an exercise last week In a comment following the formal proof write the corresponding informal proof using the informal proof Ofplusiassoc asamodel D EXERCISE In Lists v you will find a careful informal proof of a theorem called beqinatire fl Write a corresponding Coq proof Pairs of Numbers Each constructor of an inductive type can take any number of parametersi none as with true and 0 one as with S or more than one 32 Pairs ofNumbers Proof By induction on n 0 First suppose n 0 We must show plus 0 plusmp plus plus Om p This follows directly from the definition of plus 0 Next suppose n S n with plus n plus mp plus plus nm p We must show plus S n plusmp plus plus S n m p By the definition of plus this follows from S plus n plus mpll S plus plus nm p which is immediate from the induction hypothesis Figure 31 A mathematician s inductive proof Inductive natprod pair Set nat 4 nat 4 natprod This declaration can be read There is just one way to construct a pair of numbers by applying the constructor pair to two arguments of type nat Here are some simple function definitions illustrating pattern matching on twoargument constructors Definition fst p natprod nat match p with l pair x y gt x end Definition snd p natprod nat match p with l pair x y gt y end 3 Pairs and Lists Since pairs are used quite a bit it is nice to be able to write them with the standard mathematical notation X y instead of pair X y We can instruct Coq to allow this with a Notat ion declaration Notation quot x y quot pair x y The new notation is supported both in expressions like fst 3 4 and in pattern matches Definition swapipair p natprod natprod match p with l x y gt y x end EXERCISE Prove sndif stii siswap in Lists v D EXERCISE VAn OPTIONAL Proof fstiswapii sisnd in Lists v D Lists of Numbers Generalizing the definition of pairs a little we can describe the type of lists of numbers like this A list can be either the empty list or else a pair of a number and another list Inductive natlist Set l nil natlist l cons nat 4 natlist 4 natlist For example here is a threeelement list Definition 1123 cons l cons 2 cons 3 nil As with pairs it is more convenient to write lists in familiar mathematical notation The following two declarations allow us to use as an infix cons operator and square brackets as an outfix notation for constructing lists Notation quotx lquot cons x h at level 60 right associativity Notation quot x y quot cons x cons y nil J It is not necessary to fully understand the second line of the first declara tion but in case you are interested here is roughly what s going on The right associat ivity annotation tells Coq how to parenthesize expres sions involving several uses of z so that for example the next three decla rations mean exactly the same thing 33 Lists ofNumbers 29 Definition 1123 1 2 3 nil Definition 1123 l 2 3 nil Definition llZ3H 123 The at level 60 part tells Coq how to parenthesize expressions that involve both z and some other infix operator For example if we define as infix notation for the plus function at level 50 Notation quotx yquot plus x y at level 50 left associativity then will bind tighter than z and l 2 z z 3 will be parsed correctly as 1 2 3 ratherthanl 2 z 3 A number of functions are useful for manipulating lists For example the repeat function takes a number n and a count and returns a list of length count where every element is n Fixpoint repeat n count nat struct count natlist match count with O gt nil S count gt n repeat n countU end The length function calculates the length of a list Fixpoint length l natlist struct l nat match 1 with nil gt O h t gt S length U end The app function concatenates two lists Fixpoint app ll 12 natlist struct ll natlist match ll with nil gt l2 h t gt h app t l end In fact app will be used so pervasively in some parts of what follows that it is convenient to have an infix operator for it Notation quotx yquot app x y right associativity at level 60 3 Pairs and Lists Two more small examples The hd function returns the first element the head of the list while t l tail returns everything but the first element Definition hd l natlist nat match 1 with l nil gt 0 arbitrarily l h t gt h end Definition ti 1 natlist natlist match 1 with l nil gt nil l h t gt t end EXERCISE Complete the definitions of nonzeros oddmembers and countoddmembers in Listsv D EXERCISE Complete the definition of alternate This exercise il lustrates the fact that it sometimes requires a little extra thought to satisfy Coq s requirement that all Eixpoint definitions be obviously terminat ing There is an easy way to write the alternate function using just a single match end but Coq will not accept it as obviously terminating Look for a slightly more verbose solution with two nested match end constructs Note that each match must be terminated by an end B EXERCISE VAn OPTIONAL A bug or multiset is a set where each element can appear any finite number of times One reasonable implementation of bags is to represent a bag of numbers as a list Definition bag natlist Stubs for several bagmanipulating functions count union etc can be found in Lists v Complete them D Reasoning Ab out Lists Just as with numbers simple facts about listprocessing functions can some times be proved entirely by simplification Eor example simplification is enough for this theorem Theorem niliapp i1 forall l natlist 34 Reasoning About Lists 31 because the is substituted into the match position in the definition of app allowing the match itself to be simplified Also like numbers it is some times helpful to perform case analysis on the possible shapes empty or non empty of an unknown list Theorem tlilengthipred forall lnatlist pred length l length tl l Proof intros l destruct l as l n l Case quotl nilquot reflexivity Case quotl cons n l quot reflexivity Qed Notice that the as annotation on the destruct tactic here introduces two names n and l corresponding to the fact that the cons constructor for lists takes two arguments the head and tail of the list it is constructing Usually though interesting theorems about lists require induction for their proofs Proofs by induction over nonnumeric data types are perhaps a little less familiar than natural number induction but the basic idea is equally simple Each lnduct ive declaration defines a set of data values that can be built up from the declared constructors a number can be either 0 or S applied to a number a boolean can be either true or false a list can be either nil or cons applied to a number and a list Moreover applications of the declared constructors to one another are the only possible shapes that elements of an inductively defined set can have and this fact directly gives rise to a way of reasoning about inductively defined sets a number is either 0 or else it is S applied to some smaller number a list is either nil or else it is cons applied to some number and some smaller list etc So if we have in mind some proposition P that mentions a list l and we want to argue that P holds for all lists we can reason as follows First show that P is true of l when l is nil Then show that P is true of l when l is cons n l for some number n and some smaller list l asssuming that P is true for l Since larger lists can only be built up from smaller ones stopping eventually with ni l these two things together establish the truth of P for all lists l For example the associativity of the operation Theorem assiapp forall ll l2 l3 natlist ll l2 l3 ll l2 l3 32 3 Pairs and Lists can be shown by induction on 1 1 Proof intros 11 12 13 induction 11 as l n 11 Case quot11 ni1quot reflexivity Case quot11 cons n 11 quot simp1 rewrite gt 1H11 reflexivity Qed Again this Coq proof is not especially illuminating as a static written documentiit is easy to see what s going on if you are reading the proof in an interactive Coq session and you can see the current goal and context at each point but this state is not visible in the writtendown parts of the Coq proof A humanreadable informal proof needs to include more explicitiin particular it helps the reader a lot to be reminded exactly what the induction hypothesis is in the second case 341 THEOREM For all 11 12 and 13 11 12 13 11 12 13 Proof By induction on 11 0 First suppose 1 We must show 12 13 12 13 which follows directly from the definition of 0 Next suppose 1 n 1 with 11 12 13 11 12 13 We must show r11 11 12 13 n 11 12 13 By the definition of this follows from n 11 1213n 11 1213 which is immediate from the induction hypothesis D For a slightly more involved example of an inductive proof over lists sup pose we define a cons on the right function snoc like this 344 Reasoning About Lists 33 Fixpoint snoc lnatlist vnat struct l natlist match 1 with nil gt v h t gt h snoc t v end and use it to define a listreversing function rev line this Fixpoint rev lnatlist struct l i natlist match 1 with nil gt nil h t gt snoc rev t h end We can now prove that reversing a list doesn t change it s length as follows First we prove a lemma relating length and snoc 342 LEMMA For all numbers n and lists l length snoc l n S length l Proof By induction on l 0 First suppose l We must show length snoc n S length ll which follows directly from the definitions of length and snoc 0 Next suppose l n l with length snoc l n S length 1 We must show length snoc n 1 l n 8 length n 1 l By the definitions of length and snoc this follows from s length snoclnj s s length N which is immediate from the induction hypothesis D Now we can use this lemma to prove the fact we wanted about length and rev 34 3 Pairs and Lists 343 THEOREM For all lists 1 length rev l length l Proof By induction on l 0 First suppose l We must show length rev H length H which follows directly from the definitions of length and rev 0 Next suppose l n l with length rev l length l We must show length rev n 1 l length n 1 l By the definition of rev this follows from length snoc rev l n S length l which by the previous lemma is the same as s length rev 1 s length 1 This is immediate from the induction hypothesis D Obviously the style of these proofs is rather longwinded and pedantic After we ve seen a few of them we might begin to find it easier to follow proofs that give a little less detail overall since we can easily work them out in our own minds or on scratch paper if necessary and just highlight the non obvious steps In this more compressed style the above proof might look more like this 344 THEOREM For all lists 1 length rev l length l Proof First observe that length snoc l n S length l for any 1 This follows by a straightforward induction on 1 The main property now follows by another straightforward induction on 1 using the observation together with the induction hypothesis in the case wherelnl D 35 35 Options 35 Which style is preferable in a given situation depends on the sophistication of the expected audience and on how similar the proof at hand is to ones that the audience will already be familiar with The more pedantic style is usually a safe fallbac EXERCISE Find the section marked A bunch of exercises in Lists v and complete all the proofs you find there D EXERCISE 1 Find a nontrivial equational property involving snoc and 2 Prove it Fill in your theorem and proof in the file Li st s v just after the string Design exercise D EXERCISE VAn OPTIONAL If you did Exercise 333 then prove the theorems countimemberinonzero and removeidecreasesicount in Listsv D Options Here is another type definition that is quite useful in daytoday program ming Inductive natoption Set I Some nat 4 natoption l None natoption We can use natoption as a way of returning error codes from func tions For example suppose we want to write a function that returns the nth element of some list If we give it type nat gtnat l i stanat then we ll have to return some number when the list is too short Fixpoint indexibad n nat l natlist struct l nat match 1 with l nil gt 42 arbitrary l a l gt match beqinat n O with true gt a l false gt indexibad pred n 1 end end On the other hand if we give it type natanatlistanatoption then we can return None when the list is too short and Some a when the list has enough members and a appears at position n Fixpoint index match 1 with n nat l natlist struct l natoption 351 352 353 3 Pairs and Lists l nil gt None l a l gt match beqinat n O with true gt Some a l false gt index pred n l end end Example testiindexl index 0 4567 Some 4 Example testiindexZ index 3 4567 Some 7 Example testiindex3 index 10 4567 None This example is also an opportunity to introduce one more small feature of Coq s programming language conditional expressions Eixpoint index n nat l natlist struct l natoption match l with l nil gt None l a l gt if beqinat n 0 then Some a else index pred n l end Coq s conditionals are exactly like those in every other language with one small generalization Since the boolean type is not built in Coq actually al lows conditional expressions over any inductively defined type with exactly two constructors The guard is considered true if it evaluates to the first constructor in the lnduct ive definition and false if it evaluates to the second EXERCISE Complete the definition of hdiopt in Lists v D EXERCISE ink Prove optionielimihd in Lists v u EXERCISE Define a function beqinatlist for comparing lists of numbers for equality Prove that beqinatlist l 1 yields true for every list 1 Stubs are provided in Lists v D The appl y Tactic This section still needs to be written Please have a look at the corresponding material in Li sts v where you ll nd plenty of descriptive text EXERCISE Prove sillyiex in Listsv without using the simpl tactic D EXERCISE W Prove reviexercisel and beqinatisym in Lists v D EXERCISE W Provide an informal proof of beqinatisym in Li sts v D 37 37 Varying the Induction Hypothesis 37 Varying the Induction Hypothesis One subtlety in these inductive proofs is worth noticing here For example look back at the proof of the appias s theorem The induction hypothesis in the second subgoal generated by the induct ion tactic is 11 12 13 11 12 13 That is it makes a statement about 11 together with the particular lists 12 and 13 The lists 12 and 13 which were introduced into the context by the int ros at the top of the proof are held constant in the induction hypoth esis If we set up the proof slightly differently by introducing just n into the context at the top then we get an induction hypothesis that makes a stronger claim fora11 12 1311 12 13 11 12 13 Use Coq to see the difference for yourself In the present case the differ ence between the two proofs is minor since the definition of the function just examines its first argument and doesn t do anything interesting with its second argument But we ll soon come to situations where setting up the in duction hypothesis one way or the other can make the difference between a proof working and failing EXERCISE Finish the proof of assiapp at the end of Lists v D