### Create a StudySoup account

#### Be part of our community, it's free to join!

Already have a StudySoup account? Login here

# Current Topics in Computer Science CSCI 7000

GPA 3.51

### View Full Document

## 29

## 0

## Popular in Course

## Popular in ComputerScienence

This 154 page Class Notes was uploaded by Allie West II on Thursday October 29, 2015. The Class Notes belongs to CSCI 7000 at University of Colorado at Boulder taught by Staff in Fall. Since its upload, it has received 29 views. For similar materials see /class/231989/csci-7000-university-of-colorado-at-boulder in ComputerScienence at University of Colorado at Boulder.

## Reviews for Current Topics in Computer Science

### What is Karma?

#### Karma is the currency of StudySoup.

#### You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 10/29/15

Current Topics in Computer Science Computational Genomics CSCI 7000005 Debra Goldberg debragoldbergcscoloradoedu An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Molecular Biology Primer grunge5quot 5 Eatgamma 1 V 39 39 39 9 V A H i 115 m r quot7 rmunitir tysh rIIa t r Ii Angela Brooks Raymond Brown Calvin Chen Mike Daly Hoa Dinh Erinn Hama Robert Hinman Julio Ng Michael Sneddon Hoa Troung Jerry Wang Che Fung Yung An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Life begins with Cell l Plasma membrane Golgi vesicles Mitochondrion Peroxisome Lysosome Rough y A I encdoplasmlc Secretory reticulum Ki vesicle Smallest structural unit of an organism that is capable of functioning independently All cells have some common features An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo All Life depends on 3 critical molecules DNA Holds information on how cell works RNA Transfers short pieces of information to different parts of cell Provides templates to synthesize into protein Protein Form enzymes that send signals to other cells and regulate gene activity Form body s major components eg hair skin etc An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IA 3 are specified linearly DNA and RNA are constructed from nucleic acids nucleotides Can be considered to be a string written in a four letter alphabet A C G TU Proteins are constructed from amino acids Strings in a twentyletter alphabet of amino acids An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo ICentral Dogma of Biology DNA RNA and the Flow of Information Replication Tr scription ll Information coded in the RNA sequence of base pairs in DNA is passed to molecules of RNA is DNA can replicate Translation l Information in RNA is passed to proteins It never passes from proteins to nucleic acids Protein An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo DNA An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Overview of organizations of life Genes books Chromosomes bookshelves Nucleus library Almost every cell in an organism contains the same sets of books Books represent all the information DNA that every cell in the body needs so it can grow and carry out its various functions An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Some Terminology Gene basic physical and functional units of heredity located on chromosomes specific sequences of DNA bases that encode instructions on how to make proteins Genome an organism s genetic material complete set of DNA a bacteria contains about 600000 DNA base pairs human and mouse genomes have some 3 billion An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Nucleic Acid Components Nitrogenous Base N is important for hydrogen bonding between bases A adenine with T thymine double Hbond C cytosine with G guanine triple Hbond Sugar Ribose 5 carbon Base covalently bonds with 1 carbon Phosphate covalently bonds with 5 carbon Normal ribose OH on 2 carbon RNA deoxyribose H on 2 carbon DNA dideoxyribose H on 2 amp 3 carbon used in DNA sequencing Phosphate negatively charged An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo The Pvrimidines The Purines 1 3 L c 1 L H 1 HH C CH3 I CH 0 II Ihgmine CH quotR H C NHdr nllne H T to 1 carbon of either pe ntose to 1 carbon of deoxynbose u quot2 II I 39 n it th H C K I Eylnsine I CH C I G 08 3quot H2H yr N uanlne N G to 1 carbon of either pentose to 1 carbon of either pe ntose An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IDNA Stores all information of life 4 letters base pairs AGTC adenine guanine thymine cytosine which pair AT and CG on complimentary strands http Wwwlblgov Educatlon HGP lmages dna rned1urng1t An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IDNA continued in mg r rquot39 39f I 5 0N 7 Sugar L quot Phosphate C G 3 9 Base ATCor G 0 5 0 httpWwwbiomiamiedudana104DNA2jpg An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Basic Structure WatsonCrick base pair structures JmAr CH H l quot 0 g I I CH 4 r39 rN c K 3 HKN C I K mm C t39 If M 9quot Adenine NCN c 0 Cl39 xtr39quotquot7 H r39quot P K IN C On H C CIH L N 3 humquot 1 Cytusine 7 1 C A quot C N r a Q Guanine H q fl E If cquot j q NICK f quot0 39x 39 N NH j quot Cl 1395 51 l Q H ltl 7 CK quot Phosphate Sugar An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IDNA continued DNA has a double helix structure However it is not symmetric It has a forward and backward direction The ends are labeled 5 and 3 after the Carbon atoms in the sugar component 5 AATCGCAAT 3 3 TTAGCGTTA 5 DNA always reads 5 to 3 for transcription replication An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IDNA the building blocks of genetic material DNA provides a code consisting of 4 letters for all cellular function Letters in DNA code T An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IMUtAsHONs o The DNA can be thought of as a sequence of the nucleotides CAG or T o What happens to genes when the DNA sequence is mutated irmai D S qu lCilCCi I 1 Mu An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo The Good the Bad and the Silent Mutations can serve the organism in three ways A mutation can cause a trait that enhances the organism s function The Good Mutation in the sickle cell gene provides resistance to malaria A mutation can cause a trait that is harmful sometimes fatal to the The Bad organism Huntington s disease a symptom of a gene mutation is a degenerative disease of the nervous system The A mutation can simply cause no difference in the function of the organism Campbell Biology 5th edition p 255 An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Genetic Variation Despite the wide range of physical variation genetic variation between individuals is quite small Out of 3 billion nucleotides only roughly 3 million base pairs 01 are different between individual genomes of humans An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo DNA replication DNA can replicate by splitting and rebuilding each strand Note that the rebuilding of each strand uses WWW slightly different mechanisms due to the 5 3 asymmetry but each daughter strand is an exact replica of the 339 539 original strand Template stra nds Replication fork DNA polymerase DNA ligase Okazaki fragme ntB Lagging strand Leading Strand 3 httpusersrcncornjkimballmau1tranetBiologyPages DDNARep1icationhtrnl An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo DNA The Code of Life The structure and the four genomic letters code for all living organisms Adenine Guanine Thymine and Cytosine which pair A T and CG on complimentary strands Human chromosomeS I gt JW 1 3 3 I a l I L 1m E E if 395 m 9 F quot J z 3 nd 15 16 1 Fquot I I t9 3 1 E An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Chromosomes Organism base pair Chromosomes Prokayotic Escherichia coli bacterium 4x106 1 Eukaryotic Saccharomyces cerevisiae yeast 135x107 17 Drosophila melanogaster fruitfly 165x108 4 Homo sapiens human 29x109 23 Zea mays corn 50x109 10 Potato 24 Chicken 39 King crab 104 Ophioglossum reticulatum fern 630 An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Definition of a Gene 50 kb llt gtl a b Cap Site C PolyA Site 139 1 439 Gene Exon 1 Exon 2 Exon 3 Control regions mRNA 539 ii 339 Regulatory regions up to 50 kb upstream of 1 site Exons protein coding and untranslated regions UTR 1 to 178 exons per gene mean 88 8 bp to 17 kb per exon mean 145 bp lntrons splice acceptor and donor sites junk DNA average 1 kb 50 kb per intron Gene size Largest 24 Mb Dystrophin Mean 27 kb An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo lcentral nogma Revisited Transcription SPIiCing DNA hnRNA mRNA Nucleus Spliceosome Translation protein 4 Ribosome in Cytoplasm Note Some mRNA stays as RNA ie tRNArRNA An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo DNA The Basis of Life Humans have about 3 billion base pairs How do you package it into a cell How does the cell know where in the highly packed DNA where to start transcription Special regulatory sequences DNA size does not mean more complex Complexity of DNA Eukaryotic genomes consist of variable amounts of DNA Single Copy or Unique DNA Highly Repetitive DNA Chromosome Packing it in Solenoid loops Nudeosome Base Paks NHGRI lPacking it in II hccmnscuedu Short region of DNA double hem lPacking it in I 39Beads on a string form 0 chmmatin 30mm chromatin fibre ol packed nucleosomes Section of chrcrmosoma in an extended form Condensed section of chromosome naturecom Entire mitotic chromosome An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Superstructu re Metaphase chromosome 1400 nm Condensed scaffold associated chromatin Qty Wit 0 9 3 y 0 7 Iquot a frquot 700 nm LO D k9 MR0 lnterphase black L1Cg 1 extended U L w Kli scaffold aSSOCiated Chromosome chromatin scaffod l 0 Lodish et al Molecular Biology of the Cell 5th ed WH Freeman amp Co 30nm chromatin fiber of packed nucleosomes Beads onastringquot form of chromatin Short region LEXNA dOUb39e w my mtgkm it 21quot 2003 An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Superstructure Implications DNA in a living cell is in a highly compacted and structured state Transcription factors and RNA polymerase need ACCESS to do their work Transcription is dependent on the structural state SEQUENCE alone does not tell the whole story Transcriptional Regulation Transcnipti n can signal JI Pmnmt Endimgmginm ragiun fgena ufgena RNA An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo lRNA ribonucleic acid Similar to DNA chemically Usually only a single strand Built from nucleotides AUG and C with ribose ribonucleotides Thyamine is replaced by Uracil tRNA linear and 3D View http WWWCg1ucsfedu home glasfeld tutorial trna trnagif An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Types of RNA mRNA carries a gene s message out of the nucleus The type RNA most often refers to o tRNA transfers genetic information from mRNA to an amino acid sequence rRNA ribosomal RNA Part of the ribosome involved in translation siRNA small interfering RNA lnterferes with transcription or translation Recent discovery An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Terminologv Promoter A special sequence of nucleotides indicating the starting point for RNA synthesis Terminator Signal in DNA that halts transcription RNA Polymerase ll Multisubunit enzyme that catalyzes the synthesis of an RNA molecule on a DNA template from nucleoside triphosphate precursors An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Transcription The process of making RNA from DNA Catalyzed by transcriptase enzyme Needs a promoter region to begin transcription 50 base pairssecond in bacteria but multiple transcriptions can occur simultaneously httpghsgreshamk12orussciencepssciibbiochernnucleicchpt15transcriptiongif An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo ion Q DNA a RNA Transcript DNA gets transcribed by a protein known as RNA polymerase This process builds a chain of bases that will become mRNA An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Transcription continued Transcription is highly regulated Most DNA is in a dense form where it cannot be transcribed To begin transcription requires a promoter a small specific sequence of DNA to which polymerase can bind 40 base pairs upstream of gene Finding these promoter regions is a partially solved computational problem related to motif finding There can also be repressors and inhibitors acting in various ways to stop transcription This makes regulation of gene transcription complex to understand An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Transcriltion mm 9 mm Transcription occurs in the nucleus RNA polymerase reads promoter sequence and opens a small portion of the double helix exposing DNA bases RNA polymerase reminding promoter region unwinding oil DNA 1 basame binding site RNA polymerase II unwinds helixjust ahead of active site During transcription DNA helix reforms as RNA forms When the terminator sequence is met polymerase halts and releases both the DNA template and the RNA An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Definition of a Gene 50 kb llt gtl a b Cap Site C PolyA Site 139 1 439 Gene Exon 1 Exon 2 Exon 3 Control regions mRNA 539 ii 339 Regulatory regions up to 50 kb upstream of 1 site Exons protein coding and untranslated regions UTR 1 to 178 exons per gene mean 88 8 bp to 17 kb per exon mean 145 bp lntrons splice acceptor and donor sites junk DNA average 1 kb 50 kb per intron Gene size Largest 24 Mb Dystrophin Mean 27 kb An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Terminologv Exon A portion of the gene that appears in both the primary and the mature mRNA transcripts Intron A portion of the gene that is transcribed but excised prior to translation An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo 5 Exonl GU Intron A AG E30112 3 Branch site U4U5LTIE complex An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Splicing Eukaryotes Unprocessed RNA is composed of Introns and gemefahrm msmal amt Exons Introns are exonlemnzemmE removed before the rest IS mmmmm expressed and converted 4 to protein lilmar m Sometimes alternate mquot1 quot splicings can create Will s splicing different valid proteins 1 39 39 mammalsgrif n A typical Eukaryotic gene Emmmzkm has 420 introns Locating them by analytical means is not easy An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo RNA secondary structures Some forms of RNA can form secondary structures by pairing up with itself This can change its properties dramatically A 3 DNA and RNA can bind with 0100p what each other Anticodon 00p Anueodon tRNA linear and 3D View httpWwwcglucsfeduhomeglasfeldtutorialtrnatrnagif An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo lGenomic Information Cells store all information to replicate itself Human genome is around 3 billions base pair long Almost every cell in human body contains same set of genes But not all genes are used or expressed by those cells Proteins An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Proteins Workhorses of the Cell 20 different amino acids different chemical properties cause the protein chains to fold up into specific threedimensional structures that define their particular functions in the cell Proteins do all essential work for the cell build cellular structures digest nutrients execute metabolic functions Mediate information flow within a cell and among cellular communities Proteins work together with other proteins or nucleic acids as quotmolecular machinesquot structures that fit together and function in highly specific lockandkey ways An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Proteins Complex organic molecules made up of amino acid subun s 20 different kinds of amino acids Each has a 1 and 3 letter abbreviation httpwwwindstateeduthcmemwkinqamino acidshtml for complete list of chemical structures and abbreviations Proteins are often enzymes that catalyze reactions Also called polypeptides Some other amino acids exist but not in humans An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo lUncovering the code Scientists conjectured that proteins came from DNA but how did DNA code for proteins If one nucleotide codes for one amino acid then there d be 41 amino acids However there are 20 amino acids so at least 3 bases codes for one amino acid since 42 16 and 43 64 This triplet of bases is called a codon 64 different codons and only 20 amino acids means that the coding is degenerate more than one codon sequence code for the same amino acid An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo MODEM mu Translation The process of going SECONDPDSITION from RNA to J c I 1 u polypeptide U f Three base pairs of g h h j RNA calledacodon E c m s M an correspond to one j amino acid based on a E A quotquotquot39quot quot39quotquot39 fixed table quot quotquot39 J Always starts with a E Methionine and ends with a stop codon An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo T erminologv Codon The sequence of 3 nucleotides in DNARNA that encodes for a speci c amino acid mRNA messenger RNA A ribonucleic acid whose sequence is complementary to that of a proteincoding gene in DNA Ribosome The organelle that synthesizes polypeptides under the direction of mRNA rRNA ribosomal RNAThe RNA molecules that constitute the bulk of the ribosome and provides structural scaffolding for the ribosome and catalyzes peptide bond formation tRNA transfer RNA The small Lshaped RNAs that deliver specific amino acids to ribosomes according to the sequence of a bound mRNA An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo IRNA 9 Protein Translation Ribosomes and transferRNAs tRNA run along the length of the newly synthesized mRNA decoding one codon at a time to build a growing chain of amino acids peptide The tRNAs have anticodons which complimentarily match the codons of mRNA to know what protein gets added next An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Terminologv Anticodon The sequence of 3 nucleotides in tRNA that recognizes an mRNA codon through complementary base pairing Cterminal The end of the protein with the free COOH Nterminal The end of the protein with the free NH3 An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo The ribosome continually binds tRNA joins the amino acids together and moves to the next location along the mRNA The proper tRNA is chosen by having the corresponding anticodon for the mRNA s codon The tRNA then transfers its aminoacyl group to the growing peptide chain For example the tRNA with the anticodon UAC corresponds with the codon AUG and attaches methionine amino acid onto the peptide chain An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo 3 Amino acid attachment Site Amino acid attachment site amp quot Iquot 339 A n 419 Anticadon t Anticod on a ll 6 CHIN Jolson Wadquot torgnarl w An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Terminologv Molecular chaperone Protein that binds to unfolded or misfolded proteins to refold the proteins in the quaternary structure An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Transation continued EDS ibom me 235455434 Proteins as 7 t MA arriving A lane at Am inagilyc aside Binding 539 EDS Elbosame 165 21 Proteins Movement of the ribosame 59 Codnn Cudun ans ans httpwongscrippseduPIXribosornejpg An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo The Central Dogma revisited EDNA WW Replication Infbrmatio n1 DNA dup cates DNA lnfnxrll39naliun i M Transcription RNA Sj39 bh sis nucleus cytoplasm nuclear mvclope Translation Protein synthesis Protein Protein An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Protein Synthesis Summary There are twenty amino acids each coded by three basesequences in DNA called codons This code is degenerate The central dogma describes how proteins derive from DNA M 9 mRNA 9 splicing 9 rm The protein adopts a 3D structure specific to it s amino acid arrangement and function KMA H amino acids 39l f IRMA l quotS C gl O 3 x39citeinsLMAMicmmIl 9quot I 39 Protein synthesis An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Polypeptide v Protein A protein is a polypeptide however to understand the function of a protein given only the polypeptide sequence is a very difficult problem Protein folding an open problem The 3D structure depends on many variables Current approaches often work by looking at the structure of homologous similar proteins Improper folding of a protein is believed to be the cause of mad cow disease htth wwwsangeracukz Users sgj thesis nodelhtml for more information on folding An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Protein Folding Proteins are not linear structures though they are built that way The amino acids have very different chemical properties they interact with each other after the protein is built This causes the protein to start fold and adopting it s functional structure Proteins may fold in reaction to some ions and several separate chains of peptides may join together through their hydrophobic and hydrophilic amino acids to form a polymer An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Proteins tend to fold into the lowest free energy conformation Proteins begin to fold while the peptide is still being translated Molecular chaperones hsp60 and hsp 70 work with other proteins to help fold newly synthesized proteins Much of the protein modifications and folding occurs in the endoplasmic reticulum and mitochondria An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Protein folding secondary structure Proteins bury most of its hydrophobic residues in an interior core to form an a helix An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Protein folding secondary structure Most proteins take the form of secondary structures or helices and 3 sheets Eli h An Introduction to Bioinformatics Algorithms wwwbioagorithmsinfo Protein Folding cont d The structure that a protein adopts is vital to it s chemistry Its structure determines which of its amino acids are exposed carry out the protein s function Its structure also determines what substrates it can react with Prlmary protein Slrumure gig is sequence or a chain cl amino acids Amino Acids Alpha helix Secondary prolniin strucle occurs when the sequence of ammo eons 515 llnk d by hydrogen macs PlEaIEd shes Tertiary proloin structure a s In emacucne ar pl a 1 Demsen alpha names and p 55 ea sneeze Quaternary protein autumn is a DI39OIEIn consisting or more man nne mmmmmmmmmm am gt Professor John R Black University of Colorado at Boulder Fall 2002 CSCI 6268 Notes on Groups and RSA This stuff is not in our textbooks These are some rough notes for the lectures we did recently concerning groups and related notions in number theory and RSA cryptography De nition 1 A group is a nonempty set C along with a binary operation a We must have that G is closed under the operation a and we must have the following three properties 0 Associative Property for all ab c E G a o 12 c a 01 0 0 Existence of an identity element there exists some e E G such that for all a E G a o e e O a a o Inverses for all elements for all a E G there exists some I E G such that ab ba e If a group G is also commutative we call C an Abelian groupi Examples of groups are Z under addition Q under addition 1R under additioni But Z under multiplication is NOT a group inverses exist only for 1 and 71 If you take Q 7 0 you get a group under multiplication check this FINITE GROUPSi The groups we just mentioned are nite But there can be nite groups tool The simplest example is the group G 0 under addition check this But that7s not very interesting It turns out that Z is a group under addition mod n for any n gt 1 We write this group as Zn and its elements are 01 n 7 1 The inverse of any element a 6 Zn is simply n 7 a and the identity element is 0 of course For multiplication in Z we have to be careful If we allow 0 to be in our group we won7t have an inverse for it so we must disallow it If we take the set 12 p 7 1 where p is prime and the operation is multiplication mod p we get a group We didn t prove this but its a fact We write this group as Z where the asterisk reminds us that the operation in the group is multiplicationi What happens if we try doing multiplication modulo some composite Say we try doing multi plication mod n for some n gt 1 As we saw in class we have to be careful once more any number between 1 and n 7 1 that is not relatively prime to n will not have an inverse Take for example the element 2 and try to nd its inverse mod 4 1t doesnlt have one So when we speak of the group we mean the set 1 S i S n 7 1 z 1 Then everything will have an inverse For example Z15 1 24 7 8 11 13 14 and the operation is of course multiplication mod 15 You can nd all the inverses in this set as a good exercise we did it in class In general how do you go about nding inverses in Z Well for any element a E Z you need to nd some element 12 such that ab 1 You know that a n 1 by de nition of So using Euclidls extended algorithm we nd two integers u and 1 such that au nv 1 and u E 1 n 7 1 with un 1 We know we will ALWAYS succeed in doing this though we didn t have time to prove it Now just set I u and you7re donei That is the inverse of ai And remember sometimes a number is its own inverse for example 14 is the inverse of 14 in the group Zia How many elements are there in Z The answer is of course the number of elements in the set 1 S i S n 7 1 i n 1 So what we re really asking is how many numbers are there between 1 and n 7 1 which are relatively prime to n This is precisely the de nition of This is called the Euler totient function A few handy facts about this function 0 If p is prime then p 7 1 0 If p and q are distinct primes then 45pq P 1 1 0 If p is prime then p5 7 ps l Notice that this rule implies the rst rule above just set 8 1 We didnlt present this rule in class because we7re not going to need it for RSA LAGRANGE S THEOREM We had a really famous result in lecture called Lagrange7s Theorem It says something like this let G be a nite group of n elements and with operation 0 Then for any a E a a o a o o a n appearances of a is equal to 6 Where 6 is the identity element of the group If we were in Z under addition then 6 would be 0 for example We did some examples of Lagrangels Theorem with various groups including Z15 You can do them yourself for additional practice if you li e PUBLIC KEY CRYPTOGRAPHY Suppose Alice and Bob want to send information privately between each other but there is some evil adversary trying to learn the content of the communications 1f Alice and Bob share a piece of secret information called the key then this is not too hard But without this its quite unclear how they can hope to succeed The solution is called PublicKey Cryptography And the most wellknown and widelyused method is called RSA after its inventors Rivest Shamir and Adleman Here s how RSA works Alice chooses two primes p an 4 an sets n pq Alice also selects some number 6 which is relatively prime to P1q 71 She then computes d as the inverse of e in the group Z2500 That should be confusing to you why in this weird group It will become clear later She then publishes to the world the value en She wants EVERYONE to know this value But she guards the value d with her life No one gets to know this number but Alice Now when Bob wishes to send some message encoded as a number M E Z he computes C Me mod n and sends this to Alice When Alice receives C she computes Cd mod n and gets M which is Bob s original message There are a couple of things to clear up here First how do we know that M8 mod n is going to always be M again And how do we know that some adversary cannot compute M from just having the public info 6 n along with C And how do we do all this efficiently in software Those modular exponentiations look computationally difficult We now address each of these issues First why is M8 equal to M in the group Z This uses some of the rules we have been working on this week Lets go through it We know that e and d are inverses in a dz erent group 7 That means that in Z we know ed 1 for some 16 So what happens when we take Mad M1k n in the group Z2 Med M1k n M MWW M M nk M 1k M Can you see why we said that Ma 1 above It s Lagrange7s theoreml Remember is exactly the size of Z so we must get 1 when we raise ANY group element to that power So the method works Now why is it secure Well that would take us a long time to answer but the short answer is that it is BELIEVED to be as hard to crack RSA as it is to factor 7L And factoring is believed to be hard when p and q are large primes Finally the last question how do we do these computations quickly on a computer Everything is easy except for one part raising a big number to a big power seems like it would take a long time The advantage we have here is that we are doing all arithmetic modulo n so we can always keep our intermediate results between 1 and n 7 1 if we just keep applying the modulus we may go as high as n2 just before we apply the modulus but no higher than this Computing Me mod n is called the modular exponentiation problem77 and we need a way to solve it quickly on a computer The trick to compute this stuff fast is called repeated squaring77 1 went over it in lecture but llll give you an example quickly to remind you how it works suppose we have to compute 513134 mod 2911 First we convert the exponent into binary 134 is 100001102 27 22 21 Then we begin squaring 513 as needed Each time we square we reduce modulo 2911 because we want to keep the numbers as small as possible for the sake of ef ciency 5131 mod 2911 513 5132 mod 2911 1179 5134 mod 2911 11792 mod 29111494 5138 mod 2911 14942 mod 2911 2210 51316 mod 2911 22102 mod 2911 2353 51332 mod 2911 23532 mod 2911 2798 51364 mod 2911 27982 mod 29111125 513123 mod 2911 11252 mod 2911 2251 Now we re set up to compute 513134 mod 2911 using the numbers above 513134 mod 2911 513242221 mod 2911 7 51312342 mod 2911 513123 5134 5132 mod 2911 2251 1494 1179 mod 2911 1622 This method is vastly more ef cient than trying to compute 513134 mod 2911 by direct means But we could do it quot0 be 11 513A134 n 14312876534254347708236005439153387158141857888266632413677053814466 78466106413785650388051707107085144135834447237365101215481207496429 59097246164667876539907161872489529764802711563611566730260838080909 092 46461 79702273716679964291490884194 82156853196197323518424150205818670259116078782137840072919174086583 981864660548090303679489 11 Z 2911 1622 But if you tried something like 171241i 712921275881999911 rnod 99128447125111 even he would fail You would HAVE to use modular exponentiationl In case youlre interested7 the answer is 57233387830710 Practical Theorem Proving with IsabelleIsar Lecture Notes Jeremy Siek May 2 2007 CONTENTS 1 Isabelle s functional language 4 11 Natural numbers integers and booleans 4 12 De nitions nonrecursive 5 13 Let expressions 5 14 Pairs 5 15 Lists 5 16 Records 6 17 Lambdas anonymous functions 6 18 Conditionals if and case 6 19 Datatypes and primitive recursion 7 191 Exercises 7 2 The Isar proof language 7 21 Overview of lsar s syntax simpli ed 7 22 Propositional reasoning 8 23 Isar shortcuts 10 24 Universal and existential quanti ers 11 241 Exercises 12 25 Case analysis of datatypes 12 26 Notes 12 3 Induction 13 31 Mathematical induction 13 311 Exercises 13 32 Structural induction 14 4 More logical reasoning 15 41 Negation contradiction and false 15 5 Generalizing for induction 16 6 Mutual recursion and induction 18 7 Case study compiling to a stack machine 20 71 The compiler is correct 21 72 Notes 22 8 Sets 22 9 Finite sets 24 10 Case study automata and the Pumping Lemma 25 101 Properties of the extended transition function 26 102 Some properties of the take and drop string functions 27 103 The Pumping Lemma 27 11 Inductively de ned sets and a graph example 30 111 Inductive de nition of a path through a graph 30 112 The strongly connected relation is an equivalence 31 12 Case study the simply typed lambda calculus 33 121 Syntax of the simply typed lambda calculus 34 122 Operational semantics with evaluation contexts 34 123 Creating fresh variables 36 124 Welltyped expressions 37 125 Properties of substitution 38 126 Properties of environments and rule induction 40 127 The substition lemma 41 128 Inversion rules and canonical forms 42 129 Subject reduction 43 121Decomposition 45 121Progress and preservation 48 1212Type safety 48 13 Total recursive functions 49 131 The Fibonacci function 49 132 Case study Euclid s Algorithm 50 133 Merge sort 50 134 Substitution and strong induction 52 135 DepthFirst Search 54 136 Notes 55 14 Metatheory of Propositional Logic 55 141 Formulas and their meaning 56 142 Axioms and proofs 56 143 Basic properties of the proof system 57 144 Completeness 60 Abstract This document is the lecture notes for the course Practical Theorem Proving with IsabelleIsar A lemma introduces a proposition followed by a proof Isabelle has several automatic procedures for generating proofs one of which is called simp short for simpli cation The simp procedure applies a set of rewrite rules that is initially seeded with a large number of rules concerning the builtin objects lemma mosttrivialsimp True by simp 1 ISABELLE S FUNCTIONAL LANGUAGE This section introduces the functional language that is embedded in Isabelle The func tional language is closely related to Standard ML 11 Natural numbers integers and booleans Isabelle provides Peanostyle natural numbers There are two constructors for natural numbers 0 and Suc n where n is a previously constructed natural number Numerals such as 1 are shorthand for the appropriate Peano numeral in this case Suc 0 lemma Suc 0 1 by simp Isabelle also provides the usual arithmetic operations on naturals such as and The doublecolon notation ascribes a type to a term lemma 1 2 3nat by simp lemma 2 gtk 3 6nat bysimp Isabelle provides a division function for naturals called div that takes the oor of the result this ensures that the result is a natural number and not a real number lemma 3 div 2 1nat by simp The mod function gives the remainder lemma 3 mod 2 1nat by simp Isabelle also provide integers lemma 72 71int by simp Confusingly the numerals such as 1 are overloaded and can be either naturals or inte gers depending on the context It is sometimes necessary to use type ascription to tell Isabelle which you want The following are examples of Boolean expressions lemma True A True True by simp lemma True A False False by simp lemma True V False True by simp lemma False V False False by simp lemma a True Falsebool by simp lemma False True by simp lemma V x x x by simp lemma 3 x x 1 bysimp 12 De nitions nonrecursive constdefs xor bool bool bool xorAB E AA B V AAB lemma xor True True False by simp add xordef Add the xor de nition to the default set of simpli cation rules declare xorde simp 13 Let expressions A let expression gives a name to value The name can be used anywhere after the in ie anywhere in the body of the let lemma let x 3 in x gtk x 9nat by simp 14 Pairs Pairs are created with parentheses and commas The fst function retrieves the rst ele ment of the pair and snd retrieves the second lemma let p 23nat x nat infst p 1 snd p by simp 15 Lists A list can be created using a comma separated sequence of items all of the same type enclosed in square brackets The empty list is written The operator adds an element to the front of a list aka cons lemma let I 123nat list in hd l 1 A tl l 23 bysimp lemma 123H 123 bysimp lemma length 123 3 by simp Section 38 of HOL The basis of HigherOrder Logic documents many useful functions on lists and lemmas concerning properties of these functions 16 Records A record is a collection of named values similar to structs in C and records in Pascal The following is an example declaration of a point record record point x coord int ycoord int The following shows the creation of a record and accessing a eld of the record The Isabelle notation is somewhat unusual because the typical dot notation for eld access is not used and instead the eld name is treated as a function Some care must be taken when choosing eld names because they become globally Visible and will con ict with any other uses of the names So for example it would be bad to use X and y for the eld names of the point record constdefs pt point pt E xcoord 3 ycoord 7 lemma xcoord pt 3 by simp add ptdef The record update notation shown below creates a copy of a record except for the indi cated value lemma xcoord thx coord4 4 by simp add ptdef 17 Lambdas anonymous functions lemma x x x 1 2nat by simp 18 Conditionals if and case lemma ifTrue then 1 else 2 1 bysimp lemma case 1 of 0 False Suc m a True by simp 19 Datatypes and primitive recursion datatype a List Nily Consy a a List consts app a List a a List a a List primrec app Nily ys ys app Consyxxs ys Corist app xs ys Note that one of the arguments in the recursive call must be a part of one of the parameters lemma app Consy 1 Consy 2 Nily Consy 3 Nily Consyl Consy 2 Consy 3 Nily by simp 191 Exercises De ne a function that sums the rst 11 natural numbers 2 THE ISAR PROOF LANGUAGE This section describes the basics of the Isar proof language 21 Overview of Isar s syntax simplified A lemma or theorem starts with a label followed by some premises and a conclusion The premises are introduced with the assumes keyword and separated by and Each premise may be labeled so that it can be referred to in the proof The conclusion is introduced with the shows keyword If there are no premises then the assumes and shows keywords can be left out The following is a simpli ed grammar for Isar proofs proof proof method statement qed by method statement fix variable I assume proposition I from fact have proposition proof I from fact show proposition proof proposition label string fact label method 7 7 this rule fact I simp blast I auto induct variable I The Show statement establishes the conclusion of the proof whereas the have statement is for establishing intermediate results 22 Propositional reasoning The rst example will demonstrate the use of the congI rule to prove a conjunction a logical and The congI rule is shown below The horizontal bar is used to separate the premises from the conclusion P Q conJI W The rule can equivalently be rendered in English as follows cond If P and Q then P A Q In the following example we use the cond rule twice Each time we supply the necessary premises using the from clause and make sure to specify the premises in the expected order lemma coan assumes p P and q Q shows P A Q A P proof 7 from q p have qp Q A P by rule cond from p qp showP A Q A P by rule cond qed The above proof is an example of forward reasoning We start with basic facts like P and Q and work up towards proving the conclusion Isabelle also supports backward reasoning where the focus is on decomposing the goal the conclusion into smaller subgoals The following is a proof of the same proposition as above but this time using backward reasoning We can apply the cond rule in reverse by using it as an argument to the proof form The proposition you are trying to prove should match the conclusion of the rule The resulting proof state will have a subgoal for each premise of the rule Each subgoal is proved with a Show statement and the subproofs are separated with next The goals window shows the list of subgoals thm co nd lemma assumes p P and q Q shows P A Q A P proof rule cond from p show P by this next show Q A P proof rule conjl from q show Q by this next from p show P by this qed qed The this method resolves the goal using the current facts in the from clause The next example demonstrates how to prove an implication and make use of conjunctions using the following rules P 7 AQ lt t2gtP Con UnC 7 J Q P A Q im I 7 con39unct P P Q J The following proof uses a mixture of forward and backward reasoning The choice be tween forward or backward reasoning depends on what you are trying to prove Use whichever style seems more natural for the situation lemma0natltaAaltbgtaaltbb proof rule impI assumex 0lt aAalt b from x have za 0 lt a by rule conjunctl from x have ab 1 lt b by rule conjunth from za ab have aa 11 lt 1b by simp from ab have bb Mb lt bb by simp from an bb show 11 lt bb by arith qed Modes ponens lemma assumes ab A a B and a A shows B by rule mp Disjunction introduction lemma assumes a A shows A V B by rule dist lemma assumes b B shows A V B by rule distZ Reasoning by cases lemma assumes ab A V B and ac A a C and be B a C shows C proof 7 note ab moreover assume a A from ac a have C by rule mp moreover assume b B from be b have C by rule mp ultimately show C by rule disjE qed See the manual Isabelle s Logics HOL section 22 for a complete list of the inference rules 23 Isar shortcuts Isar has lots of shortcuts this refers to the fact proved by the previous statement then from this hence then have thus then show with fact from fact and this by this by rule where Isabelle guesses the rule A sequence of facts that will be used as premises in a statement can be grouped using moreover and then fed into the statement using ultimately The order of the facts mat ters lemmaA AB gtB AA proof rule impI assume ab A A B hence B by rule conjunth moreover from ab have A ultimately show B A A by rule cond qed Equational reasoning is made more succinct with the combination of also and nally 10 lemma assumes ab 1 b and be b c and cd c d shows a d proof 7 have a b by rule ab also have c by rule be also have d by rule cd nally show a d qed 24 Universal and existential quantifiers lemma assumes a V x P Qx shows P a V x Q x proof rule impI assume p P show V x Q x proof rule alll fix x from a have pq P Qx by rule allE from pq p show Q x by rule mp qed qed Isabelle s elimination rule for eXistentials eXE is a little funky to understand but Isar provides a nice obtain form that makes it straightforward to use eXistentials lemma assumes e 3 x P A Qx shows P A 3 x Qx proof rule cond from 8 obtain x where p P and q Qx by blast from p show P next from 8 obtain x where p P and q Qx by blast from q show 3 y Qy by rule exI qed constdefs divisibleby mat a mat bool l 8080 80 xlyEH kxky declare divisiblebydef simp lemma divisiblebytrans assumes ab 1 bnat and be b cnat shows a l c proofsimp from ab obtain m where m a m gtk b by auto from be obtain 11 where n b n gtk c by auto frommnhaveamncbyauto thus 3 k a ks cby rule exI qed lemma divisiblebymodz a l b a mod b 0 by auto 241 Exercises Show that division by a positive natural commutes over addition for natural numbers when the numbers being added are evenly divisible by the denominator Hint you may need to use a lemma from Isabelle s Nat theory 25 Case analysis of datatypes If you have a value of a datatype it must have come from one of the constructors for the datatype Isabelle provides a cases rule that generates a subgoal replaces the value that you chose for case analysis with one of the constructors As an example we ll use case analysis to prove a simple property of the drop function from Isabelle s List theory The drop function is just the tail function tl applied 71 times For reference the following is the de nition of drop drop 71 H H drop 11 xxs case n 0f0 i xxs Suc m i drop mxs lemma drop 11 1 x5 drop 11 tl x5 proof cases x5 assume x5 thus drop 11 1 x5 drop 11 tl x5 by simp next fix a list assume x5 a list thus drop 11 1 x5 drop 11 tl x5 by simp qed 26 Notes The book How to Prove It 12 has lots of good examples and advice concerning logical reasoning and proofs Some of the examples from this section and later ones were adapted from that book 3 INDUCTION 31 Mathematical induction The principle of mathematical induction says that if you want to prove some property of the natural numbers prove the property for 0 and assuming the property holds for an arbitrary n prove that the property also holds for n 1 P11 P0 n PSucn Pn The following is a closed form equation for the sum of the rst 11 odd numbers 132n71 712 The lefthand side of the equation can be formalized as recursive function consts sumodds mat a nut primrec sumodds 0 0 sumodds Suc n 2 gtk Suc n 7 1 sumodds n We can then prove the closed form equation by mathematical induction lemma sumodds n n gtk n proof induct 11 show sumodds 0 0 gtk 0 by simp next fix 1 assume IH sumodds n n gtk n have sumodds Suc n 2 gtk Suc n 7 1 sumodds n by simp alsowithIHhave 2 Sucn 71 11 nbysimp alsohave nn2nlbysimp nally show sumodds Suc n Suc n gtk Suc n by simp qed 311 Exercises 1 Show that n n l is even More speci cally sow that n n l 7 2 13 2 Formulate a closed form equation for the summation of the rst n natural numbers Prove that te closed form is correct using mathematical induction 3 Formulate a closed form equation for summations of the form 12 22 7 1232 7 22 12 42 7 2 7 12 and prove by mathematical induction that the equation is true 32 Structural induction Mathematical induction is really just structural induction for natural numbers which are created from a datatype with constructors zero and successor In general we can perform structural induction on any datatype For example the induction rule for the list datatype is P list P l t 7 H Au ls Palist P list To prove some property about lists we prove that the property is true of the empty list and we prove that assuming the property is true for an arbitrary list we prove that the property is true of the list with an element added to the front thm appendNil thm appendCons lemma appendassoc x5 ys zs x5 y5 zs proof induct x5 show ys zs H ys zs roof 7 have ys zs ys zs bysimp also have H ys zs bysimp nally show thesi5 qed next fix x x5 assume IH x5 ys zs x5 y5 25 show xxs ys zs xxs ys zs proof 7 have xxs ys zs xxs ys 25 by simp also have xxs ys 25 using H by simp also have xxs ys 25 by simp also have xxs ys 25 by simp nally show thesi5 qed qed Homework Exercise 241 from the IsabelleHOL tutorial concerning binary trees and the relationship between the atten mirror and list reversal functions 4 MORE LOGICAL REASONING 41 Negation contradiction and false To prove a negation assume the unnegated proposition and then try to reach a contradic tion prove False If you ve proved both A and not A then you ve proved False lemma assumes 30c x gtk x y 13 andyy 7 4 shows x 7 3nat proof rule notl assume x 3 with 30c havey 4 by simp with y show False by rule notE qed You can prove anything from False lemma 2nat 3 4nat proof rule impI assume 1 2nat hence False by simp thus 3 4nat by rule FalseE qed To prove an if and only if written L Prove that the lefthandside implies the righthand side and vice versa lemma R a C A S a C R V S a C proof rule if assume a R a C A S a C from a show R V S a C by blast next assume a R V S a C thus R a C A S a C by blast qed If and only if elimination lemma assumes A B andA shows B by rule m1 lemma assumes A B and B shows A by rule m2 5 GENERALIZING FOR INDUCTION consts reverse a list a a list primrec reverse reverse xxs reverse xs x Here s a more ef cient version of reverse consts itrev a list a a list a a list primrec itrev ys ys itrev xxs ys itrev xs xys We try to prove that itrev produces the same output as reverse lemma itrev xs reverse xs proof induct xs show itrev l reverse by simp next fix a xs assume IH itrev xs reverse xs have itrev axs itrev xs 1 by simp 7 Problem the induction hypothesis does not apply show itrev axs reverse axs oops Often times generalizing strengthening what you want to prove will allow the induction to go through Why does generalizing help instead of make it harder In a proof by induction in the induction step you get to assume what you are trying to prove for the subproblem Now the stronger the thing you are proving the more you get to assume about the subproblem So often times when doing proofs by induction proving a stronger statement is easier than proving a weaker statement When using structural induction universally quantify all variables other than the induction variable lemma V ys itrev xs ys reverse xs ys proof induct xs show V ys itrev ys reverse H ys by simp next fix a xs assume IH V ys itrev xs ys reverse xs ys show V ys itrev axs ys reverse axs ys 16 proof rule alll fix ys have itrev axs ys itrev xs ays by simp also from H have reverse xs ays by rule allE also have reverse a xs ys by simp nally show itrev a xs ys reverse a xs ys by simp qed qed constdefs divides mat a mat bool l 8080 80 xlyEH kkxy declare dividesdef simp constdefs isGCD mat a mat a mat bool is gcd of and 404040 39 k is gcd ofm and n E km A kln A V q qlm A qln 7 qlk consts computegcd nat x mat a mat recdef computegcd measure mm 11 compategcdm7 n ifn 0 then m else compategcdn7 m mod 11 lemma dividesadd assumes km km and kn kln shows klmn proof 7 from km kn obtain q r where m kq and nkr apply auto by blast hence m n kq r by blast intro addmultdistrinsymmetric thus klmn by simp qed lemma dividesdiff assumes km km and kn kln shows klm 7 n proof 7 from km kn obtain q r where m kq and nkr apply auto by blast hence m 7 n kq 7 kr by simp also have kq 7 r by blast intro di mult distrinsymmetric nally have m 7 n kq 7 r by simp thus km7n by simp qed lemma gcdpreserved assumes M m qn r shows x is gcd ofm and n x is gcd ofn and r proof 7 fix k assume km and kln hence km 7 qn using dividesdiff by auto hence klr using M by simp moreover fix k assume kln and klr hence kqn r using dividesadd by auto hence klm using M by simp ultimately show thesis using M by simp add isGCDdef7 blast qed theorem computegcdcomputesgcd computegcdmn is gcd ofm and n proof induct rule computegcdinduct fix m n assume IH 11 7 0 7 computegcdn7 m mod n is gcd ofn and m mod 11 show computegcdmn is gcd ofm and n proof casetac n 0 assume 1 0 thus thesis using isGCDdef by simp next assume N n 7 0 have m m div nn m mod n by auto with NIH gcdpreserved have computegcdn7 m mod n is gcd ofm and n by blast with N show thesis by simp qed qed 6 MUTUAL RECURSION AND INDUCTION datatype a tree EmptyT NodeT a aforest and aforest NilF ConsF a tree aforest consts attentree a tree a a list attenforest aforest a list primrec attentree EmptyT attentree NodeT xf x attenforestf attenforest NilF attenforest ConsF tf attentree t attenforestf consts maptree a tree a a a b a b tree mapforest aforest a a b bforest primrec maptree EmptyT h EmptyT maptree NodeT xf h NodeT h x mapforestf h mapforest NilF h NilF mapforest ConsF tf h ConsF maptree t h mapforestf h The following is the induction rule for trees and forests P1 EmptyT P2 forest t P2 N F t t a fores P1 NodeT aforest l reefores P1 tree P2 forest P2 ConsF tree forest P1 tree P2 forest thIn treeforestinduct lemma attentree maptree t h map h attentree t A attenforest mapforestf h map h attenforestf proof inducttoo t andf show attentree maptree EmptyT h map h attentree EmptyT by simp next fix of assume IH attenforest mapforestf h map h attenforestf have attentree maptree NodeT of h attentree NodeT h a mapforestf h by simp also have h a attenforest mapforestfh by simp also have h amap h attenforestf using H by simp also have map h attentree NodeT af by simp finally show attentree maptree NodeT of h map h attentree NodeT af 19 next show attenforest mapforest NilF h map h flattenforest NilF by simp next fix tf assume 1H1 attentree maptree th map h flattentree t and IHZ attenforest mapforestf h map h flattenforestf from IH IHZ show attenforest mapforest ConsF tf h map h flattenforest ConsF tf by simp qed 7 CASE STUDY COMPILING TO A STACK MACHINE types vbinop v v v Isabelle does not have builtin support for LISPstyle symbols so the typically approach for representing variables is to use natural numbers datatype v expr Const v l Var nat lApp v binop v expr v expr The following eval function is an interpreter for this simple language consts eval v expr hat a v a v primrec eval Const b env b eval Var x env env x eval Appfel e2 env f eval e1 env eval e2 env We compile this language to instructions for a stack machine Here is the datatype for instructions The ILoad instruction looks up a variable and puts it on the stack and the IApply instruction applies the binary operation to the top two elements of the stack datatype v instr Const v l ILoad nat l App v binop The exec function implements the stack machine executing a list of instructions consts exec v instr list nati v v list a v list primrec exec env vs vs exec iis env vs case i of Const v exec is env vvs lILoad x exec is env env xvs lIAppf exec is env hd vs hd tl vstltl vs TODO explain arbitrary stuff from partially de ned functions like hd of an empty list The compiler translates an expression to a list of instructions 20 consts comp v expr v instr list primrec comp Const v IConst v comp Var x ILoad x comp Appfel e2 comp e2 comp e1 IAppf 71 The compiler is correct To check that the compiler is correct we prove that the result of compiling and then executing is the same as interpreting theorem exec comp e env eval e s oops We re going to prove this by induction on e but rst need to generalize the theorem a bit theorem V vs exec comp e env vs eval e envvs proof induct e fix v Show Vvs exec comp Const v env vs eval Const v envvs by simp next fix x Show Vvs exec comp Var x env vs eval Var x env vs by simp next fixf e1 e2 assume IH Vvs exec comp e1 env vs eval e1 env vs and 1H2 Vvs exec comp e2 env vs eval e2 env vs Show Vvs exec comp Appf e1 e2 env vs eval Appfel e2 env vs proof fix vs have A comp Appfel e2 comp e2 comp e1 IAppf by simp have eval Appf e1 e2 env f eval e1 env eval e2 env by simp have f eval e1 env eval e2 envvs exec IAppf env eval e1 env eval e2 env vs by simp also have exec IAppf env exec comp e1 env eval e2 env vs using 1H1 by simp also have exec IApp f env exec comp e1 env exec comp e2 env vs using 1H2 by simp 7 At this point we need a lemma about exec and append oops lemma execappendruleformat V vs exec xsys env vs exec ys env exec xs env vs apply induct xs apply simp apply auto apply casetac 1 apply auto done 21 theorem V vs exec comp e env vs eval e envvs proof induct e fix v show Vvs exec comp Const v env vs eval Const v envvs by simp next fix x show Vvs exec comp Var x env vs eval Var x env vs by simp next fixf e1 e2 assume IH Vvs exec comp e1 env vs eval e1 env vs and 1H2 Vvs exec comp e2 env vs eval e2 env vs show Vvs exec comp Appf e1 e2 env vs eval Appfel e2 env vs proof fix vs have exec comp Appfel e2 env vs exec comp e2 comp e1 IAppf env vs by simp also have exec comp e1 IAppf env exec comp e2 env vs using execappend by blast also have exec IApp f env exec comp e1 env exec comp e2 env vs using execappend by blast also have exec IAppf env exec comp e1 env eval e2 env vs using 1H2 by simp also have exec IAppf env eval e1 env eval e2 env vs using 1H1 by simp also have f eval e1 env eval e2 envvs by simp also have eval Appfel e2 env vs by simp finally show exec comp Appfel e2 env vs eval Appfel e2 env vs by blast qed qed 72 Notes This section is based on section 33 of the IsabelleHOL tutorial 8 SETS One of the nice aspects of Isabelle is that it provides good support for reasoning with sets For reference see section 23 in Isabelle s Logics HOL constdefs Evens nat set EvensEn3 mn2m 22 lemma 2 E Evens by simp add Evensdef lemma 34 E Evens by simp add Evensdef constdefs Odds not set OddsEn3 mn2m1 In the following proof we use the rules for intersection introduction and elimination and the memCollecteq rule that can be used to introduce and eliminate membership in a set that is formed by comprehension lemma x Evens Odds proof induct x show 0 Evens Odds by simp add Oddsdef next fixx assume xneo x Evens Odds show Suc x Evens Odds proof assume towardscontra Suc x E Evens Odds from towardscontra have sxo Suc x 6 Odds by rule IntDZ fromsxo havexm 3 m Sucx2 gtk m 1 by simp only Oddsdef memCollect eq from m obtain m where M Suc x 2m 1 frothave 3 mx2 gtk mbysimp hence xe x E Evens by simp only Evensdef memCollect eq from towardscontra have sxe Suc x E Evens by rule IntDJ from sxe obtain n where M Suc x 2n apply simp only Evensdef memCollect eq by blast frothavex2 gtk n 711byarith hence xo x 6 Odds by simp add Oddsdef from xe xo have x E Evens Odds by rule Intl with xneo show False by simp qed qed Note how the rules for intersection are similar to the rules for conjunction That is because the two notions are equivalent in the following sense lemma x EA Ax EB x EA OB bysimp Union is equivalent to disjunction and has similar introduction and elimination rules lemma x EA Vx EB x EA UB bysimp Subset is equivalent to implication 23 lemmaVxx Agtx BAgBbyauto Complement is equivalent to not lemma x E 7A x A by simp 9 FINITE SETS Finite sets can be formed using insert and also set notation lemma insert 0 01 by auto The size of a nite set its cardinality is given by the card function lemma card 0 by simp lemma card 4nat 1 bysimp lemma card 4nat1 2 by simp lemma x 7 y gt card xy 2 by simp You can de ne functions over nite sets using the fold function constdefs setsum nat set a nut setsum S Efold Axy x y x x 0 S declare setsumdef simp lemma setsum 123 6 by simp You can perform induction on nite sets This is also the rst example of proof by case analysis Perhaps we should introduce proof by cases earlier lemma setsumgefinite S gt V x E S x g setsum S proof induct rule niteinduct show Vx x g setsum by simp next fix x and Fnat set assumefFfinite F and XF x F and IH V XEF x g setsum F show Vyeinsert XF y g setsum insert x F proof fixy assume ny y 6 insert x F showy g setsum insert x F proof casesy x assume yx y x fromfF XF have me setsum inserth x setsum F by auto withyx showy g setsum insert x F by simp 24 next assume yx y 7 x fromyxny have yF y E F by auto with H have ysF y g setsum F by blast fromfF XF have me setsum inserth x setsum F by auto with ysF showy g setsum insert x F by auto qed qed qed 10 CASE STUDY AUTOMATA AND THE PUMPING LEMMA In this section we model deterministic nite automata DFA and prove the Pumping Lemma We de ne a DFA in Isabelle to be a record consisting of the set of states the starting state the set of nal states and the transition function 6 The transition function says which state the DFA goes to given an input character we re using natural numbers for characters here and the current state types state nat record DFA DFAstates state set Q DFAstart state qo DFA nals state set DFAdelta nat state a state 6 A DFA can be used to de ne a regular language if the DFA accepts a string then it is in the language otherwise the string is not in the language A DFA accepts a string if feeding the string into the DFA causes the DFA to transition to a nal ie accepting state types string nat list lang string set The set consisting of the natural numbers up to 11 called iota will be used in several places in the de nitions and proofs We collect some useful properties of iota here constdefs iota nat nat set iotanEii n lemma iotaz iota 0 0 by simp add iotadef lemma iotas iota Suc n insert Suc n iota n apply simp add iotadef by auto 25 lemma notiniota Suc n Z iota n apply induct n by auto simp add iotadef lemma iotafinitefinite iota n apply induct n by auto simp add iotaz iotas lemma cardiota card iota n n 1 apply induct n using notiniota iotafinite by auto simp add iotaz iotas We de ne the predicate good DFA to make explicit some assumptions about DFAs For example we assume that the range of the transition function is a subset of the states of the DFA Also we assume that the states are numbered 0 n 7 1 constdefs goodDFA DFA bool goodDFAM E nite Q M A QM iota card Q M 7 1 A qoM e QM A FM g QM AVaVq QM6Maq QM We use semicolons for function composition and read the composition from left to right instead of the usual right to left syntax compfwd a a b a b a c a a a c infixl 70 translations f g g o f The A function is the extension of the transition function 6 to strings This de ne what it means to feed a string into a DFA consts extdelta DFA string a state a state A primrec AMHM AMaw 6Ma AMw We can now formally de ne the language of a DFA as the set of strings that take the DFA to a nal state Via the extended transition function constdefs langof DFA lang langofMEw AMw qoM EFM consts strpow string nat string 8080 80 priam ec W W WSucnWWn 101 Properties of the extended transition function lemma extdeltaappend A M xy A MX A My by induct x auto lemma extdeltaidempotent 26 VMpgoodDFAMAp e QMApAMypgtpAMykp apply induct k using extdeltaappend by auto lemma extdeltagood VMqgoodDFAMAq QMgtAqu QM apply induct w by auto simp add goodDFAdef 102 Some properties of the take and drop string functions lemma takeeqtakeappdroptake assumes ilj i lt j shows takej w take iw drop i takej w proof 7 from ilj have B take i takej w take i w by simp add mindef have C take i takej w drop i takej w takej w by simp only appendtakedropid fromB C show takej w take i w drop i takej w bysimp qed lemma wequulsJQz assumes ij i lt j and jw j g length w shows w take iw drop i takgj w dropj w proof 7 have A takej w dropj w w by simp obtain t where T t takej w by simp fromA T haveX t dropj w w by simp from ij have D takej w take iw drop i takej w by rule takeeqtakeappdroptakg fromD T have D2 t take i w drop i takej w bysimp fromXDZ show thesis by simp qed 103 The Pumping Lemma The pumping lemma relies on the pigeonhole principle which we state without proof here lemma pigeonhole assumes card B lt cardA and V x E Afx E B shows xyx yAx AAy AAfxfsorry constdefs steps DFA string a nut state stepsMw n E AM take n w qo M The Pumping Lemma is best described by the diagram in Figure 1 Given a string it that is longer than the number of states in the DFA at some point the DFA must loop back on itself and revisit some state p this is by the pigeonhole principle Let x be the rst portion of it that gets the DFA to p y the next portion that gets it back to p and let 2 be the remainder 27 of w If to is in the language of the DFA takes it to a nal state then so is zykz because the DFA can take the y loop any number of times and then proceed Via 2 to a nal state F7924 Figure 1 The Pumping Lemma lemma pumpingregular assumes g goodDFA M shows 3 n V w w E lang ofM A n g length w 7gt 3 xyz w x y z Ay 7 A length xy g n A V k x ykz E lang ofM proof 7 let n card Q 7 Choosing n is an important decision fix w assume wl w E lang ofM and nw n g length w from wl have wd AMw qo M 6 FM by simp add lang ofdef 7 Setting up to use the Pigeonhole Principle let A iota card Q and B iota card Q M 7 1 have F V x E A steps Mw x E B using g extdeltagood stepsdef goodDFAdef by auto from g have C card B lt card A using goodDFAdef cardio ta by auto fromF C obtain ij where ij i 7ampj and iA i E A and jA j E A and sab steps Mw i steps M wj using pigeonhole by blast 7 without loss of generality assume i i j fixij assume iljiltjand iA i E A and jA j E A and sab steps Mw i steps M wj obtain xy 2 where x x take i w and y y drop i takej w and z z dropj w by simp from jA nw have jlwj g length w by simp add iotadef from ilj jlw have w w x y z using xy 2 wequalsJQz by blast fromjlw iljy have ly lengthy j 7 i by simp add mindef with ilj have ynil y 7 by auto 28 from iAjA nw x ilj ly have bryn length xy g n by simp add iotadefmindef have V k x ykz E lang ofM proof fix k let p stepsMw i fromg have pq p E QM using extdeltagood stepsdefgoodDFAdef by simp from ilj x have zyp p A MX 10 M by simp add stepsdef fromxyilj have takejw xy apply simp by rule takeeqtakeappdroptake with sub zyp have pyp p A My p by simp add stepsdef extdeltaappend fromgpq pyp have pykp p A M p using extdeltaidempotent by blast from w zyp pyp pykp have AMw qo M AM xykz qo M by simp add extdeltaappend oassoc with wl show x ykz E lungof M by simp add lang ofdef ed with wynil bryn have 3 xyz w xyz Ay 7 A length xy g n A V k x ykz E langofM by blast with ij iAjA sab have 3 xyz w xyz Ayy A length xy g n A V k x ykz E langofM apply casetac i lt j apply force apply casetacj lt i by auto thus thesis by blast qed 29 1 1 INDUCTIVELY DEFINED SETS AND A GRAPH EXAMPLE In this section we will explore the use of inductively de ned sets by modeling some basic graph theory in Isabelle We start by de ning a type for directed graphs types vertex digraph vertex set x vertex x vertex set To match the notation of the CLR 4 we provide the VG and E G syntax for the vertex and edge sets of a graph syntax vertices v digraph v set V 100 edges v digraph e set 100 translations VG fst G E G snd G 111 Inductive de nition of a path through a graph When recursion is required to de ne a set use the inductive command Here we de ne the set of all paths in a directed graph The introduction rules de ne what one must show to prove a path is in paths G The conclusion of each introduction must be of the form x 6 paths G consts paths v digraph v x v list x v set inductive paths G intros pathsbasis a E VG gt au 6 paths G pathsstep vpw 6 paths G av E EG a E VG gt a7 vali p7 w 6 paths G Next we de ne nice syntax for writing path expressions syntax paths v7 v list7 v v digraph bool t gt in 100 translations a t gtp v in G apv 6 paths G Isabelle automatically creates a rule for performing inductive proofs over inductively de ned sets The generated rule paths induct is xctaxbxainG uE VG v gtpwinG Pva uv EG u VG Adi Apuvw Pu la P u vp w P xc xb xa 30 Here is an example of performing induction on paths Each inductive intro gives rise to a subgoal that must be proved The parenthesis provide scoping for the x and assume commands lemma lastisinV a gtp v in G gt v E VG proof induct rule pathsindact fix a assume a E VG thus a E VG next fixp a v w assume w E VG thus w E VG qed If you already know that a set is in paths G then you know that the set must satisfy the conditions given by the intro rules Isabelle will generate this inverse rule for you automatically if you ask nicely using the inductivacases command inductivecases pathsinv apw 6 paths G lemma apv 6 paths G gt a E VG apply erale pathsinv apply simp apply simp done The inverse rule is awpwin G uEVGp wagtP pa v v gtpaw in G a7 v EG a E VGp vpa gtP gtP 112 The strongly connected relation is an equivalence We are going to show that the strongly connected pairs relation is an equivalence relation A pair of vertices u v are strongly connected if there is a path from u to v and from v to u constdefs stronglyconnectedpairs v digraph v x v set stronglyconnectedpairs G E av 3p q u t gtp v in G A v t gtq a in G The Isabelle Relation theory contains de nitions for re exive symmetric and transitive relations which we use here to de ne an equivalence relation constdefs equivalencerelation a set7 a x a set bool 31 equivalencerelation S R E re S R A sym R A trans R To show the transitivity property we will need to be able to join two paths The following lemma proves that the result of appending one path to another is a valid path The way in which this lemma is stated is a bit strange so as to t what the paths induct method is expecting First the only thing to the left of the gt is a paths expression Next all other premises appear to the right of the gt but to the left of the 7 The conclusion of the lemma appears to the right of the 7 Finally note the use of V The variables to the left of the gt are automatically universally quanti ed but we need to make sure the rest of the variables are also universally quanti ed The use of rule format tells Isabelle to transform the statement of the lemma after it has been proved into format that is easier to use lemma appendpath ruleformat a7pbinGgtV qcht gtqcinG7gtat gtpqcinG proof induct rule pathsinduct fix u assume u E VG thus Vq c u t gtq c in G 7gt u t gtHq c in Gbysimp next fix p u v w assume vw v 7p w in G and uvinE uv E EG and uinV u E VG andIH Vq cw 7gtq c in G 71 7pq c in G shoqu c w 7gtq c in G 7gt u 7Vpq c in G proof clarify 7 clarify removed forall changed single arrow to double fix q r c assume we w 7gtq c in G from we and H have v 7pq c in G by simp with uinV and uvinE have u 7vPq c in G by simp add pathsintros 12121118 u 7Vpq c m G by szmp e qed The resulting lemma append path is the following at gtpbinGbt gtqcinGgta7gtpqcinG lemma stronglyconnectedLsanequivalencerelation equivalencerelation VG stronglyconnectedpairs G 7 Going into the proof we apply the def of equivalence 7 relation and then the perform induction on the path proof simp add equivalencerelationdef7 auto show re VG stronglyconnectedpairs G proof simp add re def5tronglycormectedpairsdef7 auto7 erule pathsinduct 32 fix u assume u E VG thus u E VG next 7 next clears out any xed variables or assumptions fix u assume u E VG thus u E VG next fix up b assume 17 p7 b 6 paths G thus b E VG by rule lastisinV next fix x assume x E VG from prems have x gt x in G by simp add pathsbasis thus 3 p x t gtp x in Gby auto qed next show sym stronglyconnectedpairs G proof simp only symdef stronglycormectedpairsdef7 clarify fixxyp q assumext gtpyin Gandywqxin G thus 3p qyt gtpxin GAxt gtqyin Gbyauto qed next show trans stronglyconnectedpairs G proof simp only transdef5tronglycormectedpairsdef7 clarify7 renameme r 5 fix xy 2 p q r s assume xy x t gtpy in G andyxy t gtq x in G andyzyt gtrzin Gandzyzt gt3yin G from xy andyz have xz x gtpr z in G by rule appendpath from zy andyx have zx z gtSq x in G by rule appendpath fromxzandzxshow p qxt gtpz in G Az t gtqxin Gbyauto qed qed 12 CASE STUDY THE SIMPLY TYPED LAMBDA CALCULUS We formalize an operational semantics for the simply typed lambda calculus in the evalu ation context style 6 13 and prove type safety We use a relatively new approach for representing variables called locally nameless 3 7 11 In the locally nameless approach bound variables are represented with de Bruijn indices whereas free variables are represented with symbols This approach enjoys the ben e ts of the de Bruijn indices ozequivalent terms are syntactically identical while avoiding much of the complication normally caused by representing free variables with de Bruijn indices Separate functions are used to substitution for free and bound variables 33 121 Syntax of the simply typed lambda calculus datatype Ly IntT BoolT ArrowT Ly Ly in xr a 200 datatype const IntC int BoolC bool Succ IsZero datatype expr BVar nat FVar nat Const const Lam Ly expr 5252 51 App expr expr Free variables consts FV expr nat set primrec FV BVar i FV FVar x x FV Const c FV a e FVe FV App e1 e2 FVel U FVe2 lemma niteFV nite FV e apply induct e by auto Substitution for free variables constsfsubst mat expr expr expr 545454 53 primrec zHeBVar i BVar i zHe FVar x ifz x then e else FVar x l zHeConst c Const c Zaea e3 a zaee zHeApp e1 e2 App zaeel zaee2 Substitution for bound variables consts bsubst mat expr expr expr 54754754 53 primrec kae BVar i ifk i then e else BVar kae FVar x FVar x kaeApp e1 e2 App kaeel kaee2 122 Operational semantics with evaluation contexts A utility function for casting an arbitrary expression to an integer consts toint expr int option primrec 34 toint BVar x None toint FVar x None toint Const c case c of IntC n Some n lBoolC b None l Succ None l IsZero None toint Lam 7 e None toint App e1 e2 None The 6 function evaluates the primitive operators consts delta const expr expr option 6 primrec delta IntC n e None delta BoolC b e None delta Succ e case toint e of None None l Some n Some Const IntC n delta IsZero e case toint e of None None l Some n Some Const BoolC n Evaluation reduces expressions to values The following is the de nition of which expres sions are values consts Values expr bool primrec Values BVar i True FVar x True Values Const c True 0 e True Values App e1 e2 False The callbyvalue notion of reduction is de ned as follows consts reduces expr x expr set syntax reduces expr expr bool infixl 7 51 translations e 7 e ee 6 reduces inductive reduces intros Beta Values v gtApp 7 e v 7 0HV8 Delta 6 c v Some v Values v gt App Const c v 7H v constdefs redex expr bool redex r E 3 r r 7 r3 35 We use contexts to specify where reduction can take place within an expression datatype ctx Hole l AppL ctx expr l AppR expr ctx consts wfctx ctx set inductive wfctx intros WFHole Hole 6 wfctx WFAppL E E wfctx gt AppL E e E wfctx WFAppR Values v E E wfctxl gtAppR v E E wfctx constsfill ctx expr expr 8282 81 prlmrec Hole e e AppL E 82W App E 8 82 AppR 81 EM App 81 Elel consts evalstep expr x expr set syntax evalstep expr expr bool infixl gt 51 translations e gt e e7e3 E evalstep inductive evalstep intros Step E E wfctx r 7 r gt Er gt Er 123 Creating fresh variables constdefs max nat nat nat maxxy E ifx lty thenyelse x declare maxde simp interpretation AC max ACe max 0nat by auto intro ACfintro ACeaxiomsintro constdefs setmax nat set nat setmaxS Efold max x x 0 S lemma maxgefinite L gt V x E L x g setmaxL apply induct rule niteinduct apply simp apply clarify apply casetac xa x proof 7 fix x and Fnat set and xa assumefF nite F and xF x F and xax xa x omfoF have me setmax insert xF maxx setmax F apply simp only setmaxdef apply rule AC maxfoldinsert apply auto done 36 with xax show xa g setmax insert x F apply clarify by simp next x x and Fnat set and xa assumefFfinite F and XF x F and axF VXEF x g setmaxF and xsxF xa 6 insert x F and xax xa 7 x from xax xsxF have xaF xa E F by auto with axF have xasF xa g setmaxF by blast omfoF have me setmax insert XF maxx setmax F apply simp only setmaxdef apply rule AC maxfoldinsert apply auto done with xasF show xa g setmax insert x F by auto qed lemma maxisfreshsimp assumes F finite L shows Suc setmax L Z L proof assume ssl Suc setmax L E L with F maxge have Suc setmax L g setmax L by blast thus False by simp qed lemma greaterthanmaxisfresh simp assumes Ffinite L and I setmaxL lt i shows i L proof assume ssl i E L with F maxge have i g setmax L by blast with I show False by simp qed 124 Welltyped expressions types env nat ty option constdefs removebind env nat env bool 7 C 505050 49 FizCF EVXTX7 zAFXSome7gtF XSome7 constdefs niteenv env bool finiteenv F E nite dom F declare finiteenvdef simp 37 consts TypeOf const ty primrec TypeOf IntC n IntT TypeOf BoolC b BoolT TypeOfSucc IntT a IntT TypeOfIsZero IntT a BoolT consts wte env x expr x Ly set syntax wte em expnty bool F 525252 5 translations P F e 7 P 87 7 E wte inductive wte intros wtevar PX Some 7 gt P F FVarx 7 wteconst P F Const c TypeOfc wteabs nite L dom P g L V x x Z L POO gt0 F OHFVar xe 7 gtl FUeagt739 wteapp PFe CUHTgr FEZCO39 gtPFAppe 823T thm wteinduct 125 Properties of substitution lemma b5ubst crossruleformat V ij uv i 7ampj A iaujavt javt gt iaut t apply induct t apply force apply force apply force apply clarify apply eruletac XSuc i in allE apply eruletac XSuc j in allE apply eruletac xu in allE apply eruletac xv in allE apply simp apply clarify apply eruletac xi in allE apply eruletac xi in allE apply eruletac xj in allE apply eruletac xj in allE apply simp apply blast done lemma b5 ubst wt P F e Tg niteenvP gtV ke kae e e 38 apply induct rule wteinduct apply force apply force apply clarify apply simp apply eruletac XSuc setmax L in allE apply erule impE apply rule maxisfresh apply simp apply erule conjE apply eruletac XSuc k in allE apply eruletac xe in allE apply rule bsubstcross apply blast apply force done lemma subst permuteimplruleformat ijzPTe xy zAPFe TA niteenvP zHeMUHFVar xe fa FVar xzae e apply induct 8 apply force apply simp apply clarify apply frule bsubstwt apply simp apply eruletac xj in allE apply eruletac xFVar x in allE apply simp apply simp apply simp apply clarify apply blast apply simp apply clarify apply eruletac xj in allE apply eruletac xj in allE apply eruletac xx in allE apply eruletac xx in allE apply eruletac xz in allE apply eruletac xz in allE apply eruletac XP in allE apply eruletac XP in allE apply blast done lemma subst permute x 7amp2 P F e Tg niteenv P gt fa FVar xzae e zHeMUHFVar xe using substpermuteimplofx z P e Tj e by simp lemma decomposesubstruleformat 39 V ux i x We iaue XHMlgtFV1T X8 apply induct 8 apply force apply force apply force apply clarify apply eruletac xu in allE apply eruletac xx in allE apply eruletac XSuc i in allE apply simp apply force one 126 Properties of environments and rule induction constdefs subseteq env env bool in xl g 80 PQP EV XT1 XSome7gt1quotXSome7 lemma tamweakening PFe7gtV 1quot1 1quotA niteenv1quotgt1quotFe7 apply induct rule wteinduct using subseteqdef wtevar apply blast using wteconst apply blast prefer 2 using wteapp apply blast apply rule alll apply rule impI proof 7 xL P a 739 e 1quot assumefLfinite L and GL dom P g L andIH Vxx Z L POO gt0 F OHFVar xe 7 A V1quot POO gt0 g 1quot A niteenv 1quot a 1quot F OHFVar xe 7 and GGP P g 1quot A niteenv 1quot let L L U dom 1quot from GGP have nite dom 1W by auto withfL havefLZ nite L by auto fixx assume xL x L from GGP have XGXGP POO gt0 g PXl MT using subseteqdef by auto from GGP havefGP niteenv PXl MT by auto from foGP IH XGXGP have PXl MT F OHFVar xe 7 by blast hence X V x x L PXl W39l 0gtFV1rX CT by blast have dGL dom 1quot g L by auto fromeZ dGLX show 1quot F a e 039 a 7 by rule wteabs qed 4O 127 The substition lemma lemma substitution 1 F 812TPX Som8 ag nit88nv P gt V 1quotfinit88nv1quotA1 HxC1quotA1quotF82aH 1quotF XH8281 7 apply induct rul8 wt8induct apply cas8tac x xa apply simp apply clarify apply simp only r8moV8bindd8f apply 8rul8tac xxa in allE apply simp apply rul8 wt8var apply assumption using wt8const apply forC8 prefer 2 apply clarify apply simp apply rul8 wt8app apply blast apply blast proof clarify xLnat s8t and P8nv and a ty and 7 8 1quot assumefLfinit8 L and dom P g L andIH an xa Z L H Pxa H a F OHFVar xa8 7 A Pxa H 7 x Som8 039 H finit88nv Pxa H 7 H V1quot nit88nv 1quotA Pxa H 03 H x C 1quotA 1quotF 82 039 H 1quot F XH820HFVar xa8 and XG P x Som8 a andfG nit88nv P andfGP finit88nv 1quot and GXG P H x C 1quotandwt821quotF 82 a let L insert x L U dom P U dom 1W Show 1quot F XH82a 8 a H 7 proof simp Show 1quot F a XH828 a H 7 proof rul8 wt8absof L fromefoGP show nite L by auto next Show dom 1quot g L by auto next showaa xa L H 1quotxa H 03 F OHFVar xaxH828 7 proof rul8 allI7 rul8 impI x x assumexL x Z L let GP MXHad from foGP wt82 have wt82b GP F 82 0 using subs8t8qd8f 8nvW8ak8ning by forC8 from XG XL wt82b foGP GXG IH have wt8 GP F XH820HFVar x 8 7 using r8moV8bindd8f by auto 41 from XL wteZb fGP have OHFVarX XHe2e Xae20HFVarX e using subst permute by auto with wte XL show GP F OHFVar XXgt828 7 by auto qed qed qed qed 128 Inversion rules and canonical forms We use lsabelle s inductivecases form to generate inversion rules for expressions with certain types such as integers and functions These rules are called inversion rules because they let you use the inductive de nitions in reverse going from the conclusions to the premises inductivecases wteintinv empty F e IntT From the above Isabelle generates None Some IntT e FVar X empty F e IntT X P e Constc IntT TypeOfc empty F e1 0 4 IntT empty F d a e App e1 e2 Ac A0 e d P P inductivecases wtefuninv empty F e 039 a 7 and Isabelle generates NoneSome 0H7 eFVarX eConstc UHTT e0 0 emel ECUHT X Ac yp f P P AL m niteL dom empty Q L VX X g L 4gt X H 7 F UHFVHTX 1 739 e A17 ea P I emptyFe1c7 4lc74gtT emptyFe2cr eAppe1e2 a e a P P The following canonical forms lemmas describe what kinds of values have certain types For example the only value that has type MT is an integer constant The canonical forms lemmas are needed to prove subject reduction lemma canonicalformint assumes eint empty F e IntT and ve Values e shows 3 n e Const IntC 11 using eint apply rule wteint inv using ve apply auto apply casetac c by auto 42 lemma canonicalformfun assumes wtf empty F V 039 a 7 and V Values V shows 3 e V a e V 3 c V Const c using wtf apply rule wtefuninV using V by auto 129 Subject reduction lemma deltatypability assumes tc TypeOfc 7quot a 7 and Vt empty F V 7quot and VV Values V shows 3 V 6 CV Some V A empty F V 7 using tc Vt VV apply cases c apply simp apply simp proof 7 assume tc TypeOfc 7quot a 7 and Vt empty F V 7quot and VV Values V and c c Succ from c tc have st 7quot IntT A 7 IntT by simp from st Vt VV obtain 11 where V V Const IntC n apply simp using canonicalformint by blast let VP Const IntC n 1 have thp empty F VP IntT using wteconstof empty IntC n 1 by auto from c V have d 6 c V Some VP by simp from d thp st show thesis by simp next assume tc TypeOfc 7quot a 7 and Vt empty F V 7quot and VV Values V and c c IsZero from c tc have st 7quot IntT A 7 BoolT by simp from st Vt VV obtain 11 where V V Const IntC n apply simp using canonicalformint by blast let VP Const BoolC n 0 have thp empty F VP BoolT using wteconstof empty BoolC n 0 by auto from c V have d 6 c V Some VP by simp from d thp st show thesis by simp qed lemma subjectreduction assumes wte P F e 7 and g P empty and red e 7H e shows empty F e 7 using wte g red apply cases rule wtecases apply simpall apply force apply cases rule reducescases apply simp apply cases rule reducescases apply simp apply clarify 43 fix Penv and a 7quot e1 e2 assume wte empty F e 039 H 7 and wteZ empty F e2 a and red App e1 e2 HH e H Would be cleaner to use an inductive cases for the above red from red show empty F e 7 proof cases rule reducescases fix 7 b v assume a App e1 e27 e App 7 b v7 0Hvb and W Values v havefe nite by simp have XL 0nat bysimp from wte fe a xL obtain L wherefLfinite L and wtb Vx x Z L H x H a F OHFVar xb 7 apply cases rule wtecases by auto let X Suc max setmax L setmax FV have xgel setmaxL lt X by auto have xgeb setmax FV b lt X by auto H Set up for and apply the substitution lemma frome xgel have XL X Z L by rule greaterthanmax isfresh with wtb have wth X H a F OHFVar Xb 7 by blast have gxs X H a X Some 0 bysimp havefg niteenv X H a by simp havefgp niteenv empty by simp have gxgp X H a H X C empty by simp add removebinddef from wth gxs fgfgp gxgp wteZ have wtb empty F XHe20HFVar Xb 7 using substitution by blast H Use the substitution decomposition lemma have nb nite FV b by rule finiteFV from nb xgeb have xb X Z FVb by rule greaterthanmax isfresh from xb have 0He2b XHe20HFVar Xb by rule decomposesubst with wtb a show empty F e 7 by simp next H Delta fix c v v assume a App e1 e27 e App Const c v7 v and d 6 c v Some v and W Values v from wte a have tc TypeOfc 039 H 7 apply cases rule wtecases by auto from a tc wteZ W obtain v where dd 6 c v Some v and wtvp empty F v 7 using deltatypability by blast from wtvp a d dd show empty F e 7 by simp qed qed 1210 Decomposition consts welltypedctx env x ctx x ty x ty set syntax welltypedctx em ctx ty ty bool F 52752752752 51 translations P F E a a 7 P E7 0 7 E welltypedctx inductive welltypedctx intros WTHole P F Hole 7 7 WTAppL Pl ECO39gtggtTPl ZQ gtPFAppLEeaT WTAppR FFegHTFFEag gtPFAppReEaT lemma welltypeddecomposition P F e 7 gt PemptygtValueseV3 aEreEr APFEaTAE Wfctx APFraAredexr isPFeTgtPPe7 apply induct rule wteinduct apply simp apply simp apply simp apply rule impI proof 7 fix P a 7 e1 e2 assume wte1P F e 039 a 7 and H P P e cat and wteZ P F e2 a and 1H2 P P e2 0 and g P empty show Values App e1 e2 V HaErAppe1e2ErAPFE07AEewfctxAPFraAredexr proof cases Values e1 assume ve Values e1 show thesis proof cases Values e2 assume veZ Values e2 have h App e1 e2 HoleApp e1 e2 by simp have wth empty F Hole 77 by rule WTHole from wte wteZ g have wta empty F App e1 e2 7 apply simp by rule wteapp from wte ve1g have 3 e e a e V 3 c e Const c apply simp apply rule canonicalformfun by auto moreover assume x 3 e e a e 7 Beta 45 fromx obtain b where 81 81 a b by blast from 81 V82 have App 81 82 7 0H82b apply simp by rul8 B8ta hence r r8d8x App 81 82 using r8d8xd8f by blast have wfh Hol8 E wfctx by rul8 WFHol8 from h wth wfh wta r g have th8sis by blast moreover assume x 3 c 81 Const c 7 Delta fromx obtain c where 81 81 Const c by blast from wt81 81 have tc Typ80fc 039 a 7 apply cas8s rul8 wt8cas8s by auto from tc wt82 V82 g obtain v where dd 6 c 82 Som8 v using d8ltatypability by blast from dd V82 81 have App 81 82 7H v apply simp by rul8 Delta hence r r8d8x App 81 82 using r8d8xd8f by blast have wfh Hol8 E wfctx by rul8 WFHol8 with h wth wfh wta r g have th8sis by blast ultimately show th8sis by blast next assume V82 a Values 82 from V82 IH2 g obtain a E r where 82 82 E r and th P F E OJgt039 and ME E E wfctx and wtr P F r a and rr r8d8x r by blast from 82 have App 81 82 AppR 81 E r by simp moreover from wt81 thg have 8mpty F AppR 81 E a a 7 apply simp apply rul8 WTAppR apply auto done moreover from V81 wa have AppR 81 E E wfctx by rul8 WFAppR moreover note wtr rr g ultimately show th8sis by blast qed next assume V81 a Values 81 from V81 1H1 g obtain a E r where 81 81 Er and th P F E OJgt039gtT and ME E E wfctx and wtr P F r a and rr r8d8x r by blast from 81 have App 81 82 AppL E 82r by simp moreover from th wt82 g have 8mpty F AppL E 82 a a 7 apply simp apply rul8 WTAppL apply auto done moreover from wa have AppL E 82 E wfctx by rul8 WFAppL moreover note wtr rr g ultimately show th8sis by blast qed qed lemma W8lltyp8d8xprctXimpl 46 Pke7gtVEreEr gt3 aPFEaTAPFra apply induct rule wteinduct apply clarify apply ruletac x7 in end apply casetac E using wtevar WTHole apply force apply simp apply simp apply clarify apply ruletac xTypeOfc in end apply casetac E using wteconst WTHole apply force apply simp apply simp apply clarify apply casetac E apply ruletac X039gtT in end apply simp using wteabs WTHole apply force apply simp apply simp apply clarify apply casetac E apply ruletac x7 in end using wteapp WTHole apply force apply eruletac xctx in allE apply eruletac xctx in allE apply eruletac xr in allE apply eruletac xr in allE apply simp using WTAppL apply blast apply eruletac xctx in allE apply eruletac xctx in allE apply eruletac xr in allE apply eruletac xr in allE apply simp using WTAppR apply blast one lemma welltypedexprctx PEER Tgt3 039Pl E2039gtTAPl rCO39 using welltypedexprctx impl by simp lemma llctXwelltypedruleformat PFEUTgtVrPFragtPF llErT apply induct rule welltypedctxinduct apply simp using wteapp apply force using wteapp apply force done 47 1211 Progress and preservation lemma progress assumes wte empty F e 7 shows Values e V 3 e e gt e3 proof 7 show thesis proof cases Values e assume Values e thus thesis by simp next assume Values e withwte havex 3 aEr e Er AemptyFEa 7 AEewfctx AemptyFra Aredexr using welltypeddecompositionof empty e 7 by simp from x obtain a E r where eE e Er and wtc empty F E a a 7 and ME E E wfctx and wtr empty F r a and rr redex r by blast from rr obtain r where red r 7H r using redex def by blast from wa red have Er gt EM by rule Step with eE show thesis by blast qed qed lemma preservation assumes s e gt gt e and wte empty F e 7 shows empty F e 7 using 5 proof cases rule evalstepcases fix E r r assume a e7 e ErEr1 and ME E E wfctx and rr r 7H r from a wte obtain a where wtc empty F E a a 7 and wtr empty F r 0 using welltypedexprctx by blast from wtr rr have wtrp empty F r 0 using subjectreduction by blast from wtc wtrp have empty F ll E r 7 by rule fillctx welltyped with a show thesis by simp qed 1212 Type safety constdefs nished expr bool nished e E G e e gt e3 syntax evalsteprtrancl expr expr bool in xl gt 51 48 translations e gt e ee E evalstep theorem typesafety assumes et empty F e 7 and ee e gt e shows empty F e 7 A Values e V a finished e3 using ee et proof induct rule rtranclinduct fix a assume wta empty F a 7 from wta have Values a V 3 e a gt e by rule progress with wta show empty F a 7 A Values a V a finished a usingfinisheddef by auto next fix a b c assume IH empty F a 7 gt empty F b 7 A Values b V finished b and be b gt c and wta empty F a 7 from wta IH have wtb empty F b 7 by simp from be wtb have wtc empty F c 7 by rule preservation from wtc have Values c V 3 e c gt e by rule progress with wtc show empty F c 7 A Values c V a finished c usingfinisheddef by auto qed 13 TOTAL RECURSIVE FUNCTIONS Isabelle s recdef facility let you write functions without syntax restrictions on the recursion pattern as with primrec However you must provide the termination measure That is you must provide a function that maps the input of your recursive function to an element of a wellfounded set such as the natural numbers and show that these elements decrease for each recursive call 131 The Fibonacci function The following is a simple example of a recursive function the Fibonacci function constsfib mat a mat recdeffib measure 11 n fib 0 0 fib Suc 0 1 fib Suc Suc x fib x b Suc x 49 thmfibinduct lemmafib Suc Suc Suc Suc 3 bysimp 132 Case study Euclid s Algorithm consts computegcd nat x not a not recdef computegcd measuremn n computegcdm7 n ifn 0 then m else computegcdn7 m mod n thm computegcdinduct constdefs divisibleby not a not bool l 8080 79 divisibleby m n E 3 k m n gtk k declare divisiblebydef simp constdefs isGCD not a not a not bool isGCD km 11 E mlk A mlk A V k mlk A nlk a klk declare isGCDde simp theorem isGCD computegcdmn m n proof induct rule computegcdinduct fix m n assume IH 11 7 0 isGCD computegcd 117 m mod n n m mod 11 show isGCD computegcd m7 n m n proof casetac n 0 assume n 0 thus thesis by simp next assume N n 7 0 from NIH have isGCD computegcd 117 m mod n n m mod n by simp oops 133 Merge sort The goal of merge sort of course is to produce a sorted list consts sorted not list bool primrec sorted True sorted xxs Vy E setxs x g y A sorted 35 The merge sort function will use the following auxiliary function to merge already sorted sublists When using the recdef facility the recursive function must have a single param eter but that parameter may be a tuple 50 consts merge not list gtk not list a not list recdef merge measurexsys size xs size ys mergeXX57y5 ifx g y then x mergexsyys elsey mergexxs7 ys mergexs xs mergems ys Isabelle generates a special purpose induction rule for each recursive function Compare the following rule to the de nition of merge x g y a sz yys AXXS 5 x Sy gtP xxsys N 39 P ms ms P w z P ac ad P acad P u v lemma setmergesimp setmergexsys set xs U setys applyinduct xs ys rule mergeinduct apply auto done If the inputs to merge are sorted then so is the output and Viceversa lemma sortedmerge simp sorted mergexsys sorted xs A sorted ys applyinduct xs ys rule mergeinduct applysimpall add ballUri linordernotle orderlessle applyblast intro ordertrans done Here s the de nition of merge sort consts msort not list a not list recdef msort measure size msort msort x x msort xs mergemsorttake size xs div 2 xs7 msortdrop size xs div 2 xs The induction rule for msort is p H x p x P drop luzaal div 2 uzaa P take luzaal div 2 uzaa u z aa P uzaa P x theorem sortedmsort sorted msort xs by induct xs rule msortinduct simpall 51 134 Substitution and strong induction We de ne the explicitly arenaming version of substitution a la Curry 1 5 using the recdef facility The proof of termination relies on a proof by strong induction an extremely general and powerful induction principle datatype expr Var nat l Lam nat expr 5353 52 l App expr expr To be completely concrete and computable we choose fresh variables by computing the largest variable in the relevant terms and add 1 thereby guaranteeing that the new variable does not occur in these expressions consts maxv expr nat primrec maxv Var x x maxv x e max maxv e x maxv App e1 e2 max maxv e1 maxv e2 constdefsfresh mat expr expr nat fresh x e e E max max maxv e x maxv e 1 Here s the de nition of substitution We label each clause so that we can used them as simpli cation rules consts subst expr x nat x expr expr syntax subst mat expr expr expr 100100100 101 translations xe e substexe recdef permissive subst measure p size fstp svar xeVar y ify x then e else Vary slam xey e letz fresh x e e in z xeyVar ze supp IxelApp 81 82 App Ixelei Ixel82 The use of permissive tells Isabelle not to immediately abort but instead accept the subst function conditionally Isabelle accepts a modi ed form of the subst function that includes extra if statements to make sure that it terminates xeVary ify x then e else Vary xe y e letz fresh x e e in A z ifsize yVar ze lt Sac size 6 then xe yVar ze else arbitrary 4141717 61 e2 APP 90461 lXiele2 The response window tells us that Isabelle could not prove termination and where it got stuck We then create a lemma slightly generalizing from the stuck proof state The following lemma says that substituting a variable for a variable does not change the size of an expression The proof cannot be done by structural induction on the expression because the nested substitution changes the expression so the induction hypothesis is not applicable Instead we use strong induction aka course of values induction on the size 52 of expressions With this style of induction the induction hypothesis is applicable to any expression smaller than the current one The following is the rule for strong induction VmltnPm An P11 P11 lemma alphasubst sizesimp V x w e size e n a size xVar wle n proof induct rule natlessinduct fix 1 assume IH Vmltn wa e size e m a size xVar wle in show wa e size e n a size xVar wle n proof rule allI7 rule impI fix x and w and eexpr assume se size e 11 show size xVar wle n proof cases e fixy assume e Vary thus size xVar wle 11 using se by simp add svar next fix x 7 e assume E e x e let W max max maxv e3 x w 1 from E se have Suc size e n by simp witth have EP Suc size x Var We n by auto from se EP E have EPZ size x Var We lt Suc size e by auto from EP IH have Suc size xVar wx Var We n by auto with E EP2 show size xVar wle n by simp add slamfreshdef next fix e1 e2 assume AP e App e1 e2 fromAP se have size e lt n by auto witth have E1 size xVar wlel size e by auto fromAP se have size e2 lt n by auto witth have E2 size xVar wleZ size e2 by auto fromAP E1 E2 have size xVar wle size e by simp add sapp with se show size xVar wle n by simp qed qed qed With the above lemma established we can resolve the termination conditions and update the simpli cation rules for the subst function recdeftc subst 1 by simp lemmas substsimpssimp substsimps simplified lemma substlam z fresh x e e gt xey e3 z xeyVar zle by simp addfreshdef 53 135 DepthFirst Search typedecl node types graph node gtk node list consts adj graph7 node gt node list primrec adj l n l adj ees n iffst e n then snd e adj es 1 else adj es 1 constdefs adjs graph7 node list gt node set adjs gxs E setg setxs lemma adjset y 6 set adj gx xy 6 set g by induct g7 auto lemma adjsCons adjs g xxs set adj g x U adjs g xs byunfold adjsdef7auto simp addImagedef adjset constdefs reachable graph7 node list a node set reachable g xs E set g set xs constdefs nodesof graph node set nodesofg E set mapfst g map snd g lemma ruleformat7 simp x Z nodesofg adj gx H by induct g7 auto simp add nodesofdef constdefs dfsrel graph gtk node list gtk node list gtk graph gtk node list gtk node list set dfsrel E invimage finitepsubset ltlexgt lessthan Agxsys nodesofg 7 setys7 size xs lemma dfsrelwf wf dfsrel by auto simp add dfsreldefwf nitepsubset lemma simp nite nodesofg 7 setys proofrule nitesubset show nite nodesofg by auto simp add nodesofdef qed auto 54 consts dfs graph gtk node list gtk node list a node list recdef permissive dfs dfsrel dfsbasesimp dfs g7 l ys ys dfsinductive dfs g7 XXS ys ifx mem ys then dfs g7 xs7 ys else dfs g7 adj gx xs7 xys hints recdefsimp add dfsreldef nitepsubsetdef recdefwf add dfsrelwf o The second argument of dfs is a stack of nodes that will be visited 0 The third argument of dfs is a list of nodes that have been visited already recdeftc dfstc dfs proof intro alll fix g xys show e x mem ys 7 nodesofg 7 insertx setys C nodesofg 7 setys V nodesofg 7 insertx setys nodesofg 7 setys A adj gx l by cases x E nodesofg7 auto simp add memiff qed lemmas dfsinduct dfsinductOF dfstc lemmas dfsinductivesimp dfsinductivelOF dfstc To do proof of correctness 136 Notes The material on merge sort is from the HDLexMergeSort thy example from the Isabelle distribution The material on DepthFirst Search is from 9 14 METATHEORY OF PROPOSITIONAL LOGIC We formalize the meaning of propositional formulas and de ne a proof system We prove completeness of the proof system via Kalmar s variable elimination method The mate rial in this section is based on several teth on Mathematical Logic 2 8 and Paulson s completeness proof in lsabelleZF 10 55 141 Formulas and their meaning datatype formula Atom nat Neg formula Impliesformulaformula in xl a 101 consts eval not bool formula bool primrec eval v Atom 1 v a eval v Neg 4p a eval v 47 eval v 4p a 11 eval v 4p eval v 11 constdefs tautology formula bool tautology 4p E V v eval v 4p satis es not bool formula set bool sats 8080 80 vsatsE E V 4p 62 evalvap satis able formula set bool satis able E E 3 v v sats 2 implies formula set formula bool 8080 80 E LpE V vvsatsE gtevalvltp True 142 Axioms and proofs constdefs A1 formula A1 EAtom 0 a Atom 1 HAtom 0 A2 formula A2 E Atom 0 a Atom 1 HAtom 2 gt Atom 0 a Atom 1 a Atom 0 a Atom A3 formula A3 E Neg Atom 1 a Neg Atom gt Neg Atom 1 HAtom 0 HAtom1 Axioms formula set Axioms E A1A27A3 declare A1 defsimp AZde simp A3defsimp Axiomsde simp lemma tautology A1 by simp add tautologydef lemma tautologyAZ by simp add tautologydef lemma tautology A3 by simp add tautologydef consts subst nat formula formula formula primrec subst 8 Atom x S x subst S Negf Neg subst Sf subst 8 f1 HfZ subst Sf1 a substSfZ 56 consts deduction formula set xformula set syntax deduction formula set formula bool F 100100 100 translations E F 4p Sp 6 deduction inductive deduction intros hypu762gtEFua ax 4p EAxioms gt E F substS 4p mp lEFwn ElegtEFw constdefs emp nat formula emp E x Atom x 143 Basic properties of the proof system lemma aa P F 4p gt 4p roof 7 let M emp0 w1 Hw2iw let S emp04p14p have p P F subst 80 AZ apply rule ax by simp have p2 P F subst 80A1 apply rule ax by simp haVeP3 P F W7 A w n 7 n w n w using p1 p2 apply simp apply rule mp have p4 P F subst S A apply rule ax show P F 4p gt 4p using p3 p4 apply simp apply rule mp apply blast apply blast done qed VVVV V apply blast apply blast done by simp V lemma weakeningruleformat P F 4p gt V A P g A a A F 4p apply induct rule deductioninduct apply clarify using hyp apply blast using ax apply blast apply clarify apply eruletac XA in allE apply eruletac XA in allE apply clarify using mp apply blast done theorem soundness assumes d A F 4p shows A 4p using d apply induct rule deductioninduct apply auto simp add impliesdefsatisfiesdef done lemma ppp assumes A 4p 6 P shows P F Lp gt 4p proof 7 let S 8mp02lpliLp 57 have p P F subst SA1 apply rule ax by simp fromA have p2 P F 4p by rule hyp from p p2 show thesis apply simp apply rule mp apply blast apply simp done qed lemma deductionimpl P F gtV tpPP inserttpPgtPF paw apply induct rule deductioninduct apply clarify apply casetac 4p4p apply simp apply rule aa apply simp apply rule ppp apply simp apply clarify defer apply clarify apply eruletac XAp in allE apply eruletac XP in allE apply eruletac XAp in allE apply eruletac XP in allE apply simp proof 7 x w 11 W1 assume IH P F tp a 4 a 1 andIHZ P F tp gt 4p let s emp0 w 1 w2w have p P F subst 8 AZ apply rule ax by simp from 1H1 p1 have p2 P F Lp gt 4p gt tp gt 1 apply simp apply rule mp apply blast apply blast done from p2 1H2 show P F Lp a 1b by rule mp next x 8 2 4p LpP assume pa 4p 6 Axioms from pa have A P F subst 8 4p by rule ax let S emp0substS 4p14p have p P F subst SA1 apply rule ax by simp fromA p1 show P F tp a subst 8 4p apply simp apply rule mp apply blast apply simp done qed theorem deduction insert 4p P F 1b gt P F 4p a 1p using deductionimpl by simp lemma cutrule assumesA P F 4p and B insert 4p P F 1b shows P F 1b proof 7 from B have C P F 4p a 1 by rule deduction from CA show thesis by rule mp 58 qed lemma mphyp assumes ppG 4p 7 1b 6 P and pG 4p 6 P shows P F 1b proof 7 from pG have Gp P F 4p by rule hyp from ppG have pp P F 4p 7 1 by rule hyp from pp Gp show P F w by rule mp qed lemma corJJOa b7cC7d F b7d proof 7 let E b7cC7d have C insert b E F c apply rule mphyp apply blast apply blast done have CD insert b E F C7d apply rule hyp by blast from CD C have insert b E F d by rule mp thus E F b7d by rule deduction qed lemma corJJOb b7C7d7 c F b7d proof 7 let E b7c7gtd C have CD insert b E F C7d apply rule mphyp apply blast by blast have C insert b E F c apply rule hyp by simp from CD C have insert b E F d by rule mp thus thesis by rule deduction qed lemma lemJHa P F Neg Neg 4p 7 4p proof 7 let S emp0Neg Lpl4p have p P F subst 8 A3 apply rule ax by simp have p2 P F Neg 4p 7 Neg 4p by rule aa have subst 8 A37 Neg 4p 7 Neg Lp F Neg 4p 7 Neg Neg 4p 7 4p apply simp by rule corJJOb hence A insert subst 8 A3 insert Neg 4p 7 Neg 4p P F Neg 4p 7 Neg Neg 4p 7 4p using weakening apply blast done obtain x where X x insert Neg 4p 7 Neg 4p P by simp fromXA have B insert subst 8 A3 x F Neg 4p 7 Neg Neg 4p 7 4p by simp from p X have p3 x F subst 8 A3 using weakening apply blast done from p3 B have x F Neg 4p 7 Neg Neg 4p 7 4p by rule cutrule withX have C insert Neg 4p 7 Neg 4p P F Neg 4p 7 Neg Neg 4p 7 4p by simp let Y Neg 4p 7 Neg Neg 4p 7 4p 59 from p2 C have D P F Y by rule cutrule let S emp0Neg Neg Lp1 Neg 4p have E P F subst S A apply rule ax by simp let tz Neg Neg e e e have F subst 81 A17 Y F Z apply simp by rule corJJOa obtainy where Y y insert Y P by simp from F Y have G insert subst S A1y F Z using weakening apply blast done from E Y have H y F subst S A using weakening apply blast done from H G havey F Z by rule cutrule with Y have I insert Y P F Z by simp from D I show P F Z by rule cutrule qed lemma lemJHb P F 4p a Neg Neg proof 7 let ts emplt0egtgtltlltlveg Neg e have p P F subst 8 A3 apply rule ax by simp have p2 P F Neg Neg Neg a Neg 47 by rule lemJHa let P3 Neg Neg Neg e e e e Neg Neg e from p p2 have p3 P F P3 apply simp apply rule mp apply blast apply blast done let Sl emplt0egtgtltl ltNeg Neg Neg em let P4 tpHNegNegNeg Lpgt4p have P F subst S A apply rule ax by simp hence p4 insert P3 P F P4 apply simp using weakening by blast have P47 P3 F 4p a Neg Neg 47 by rule corJJ 0a hence insert P4 insert P3 P F 4p a Neg Neg 47 using weakening by blast with p4 have insert P3 P F 4p a Neg Neg 47 using cutrule by blast with p3 show P F 4p a Neg Neg 47 using cutrule by blast qed lemma lemJHc P F Neg 4p a 4 gt 1 sorry lemma lemJHd P F Negzl a Neg 4p a 4 gt 1 sorry lemma lemJHe P F Lp gt 1p gt Negzl a Neg 47 sorry lemma lemJHf P F a Negzl a Neg 4p a sorry lemma lemJHg P F Lp a 1 a Neg 4p a 1b a 1b sorry 144 Completeness consts hyps nat set formula formula set primrec hyps T Atom n ifn E T then Atom n else Neg Atom hypsTNeg hypsTw hypsT4szbhypsTlpUhypstb lemma hyps nite finite hyps T 4p apply induct 4p apply auto done 60 lemma hypsmember V TX x E hyps T 4p gt 3 a x Atom 1 A a E T V x Neg Atom 1 A 1 T apply induct 4p by auto lemma hypsdiff hyps T704 4p g insert Neg Atom 04 hyps T 4p 7 Atom 1 apply induct 4p by auto lemma hypscons hyps insert a T 4p g insert Atom 1 hyps T 4p7Neg Atom a by inducttac 4p auto constdefs ip nat set formula formula ip T 4p E ifeval x x E T 4p then 4p else Neg 4p lemma eval x x E T flip T 4p by simp add ipdef lemma kalmarruleformat V 4psize4pngthypsV4pF ipV4p apply induct rule natlessinduct apply clarify proof 7 fix n and 4pformula assume IH V mltsize 4p V 4p size 4p m hyps v 4p F ip v 4p show hyps v 4p F ip v 4p proof cases 4p fix a assume p 4p Atom 1 thus thesis apply simp add ipdef using hyp by blast next fix 1 assume p 4p Neg 1 show thesis proof cases eval x x E v 1p assume ev eval x x E v p from ev have evnp a eval x x E v Neg 1M by simp from ev havefp ip v p p by simp add ipdef from evnp p havefnp ip v 4p Neg 4p by simp add ipdef from p have size 1 lt size 4p by simp witth have hyps v p F ip v p by blast withfp have A hyps v p F p by simp have B hyps v p F 1 a Neg Neg 1M by rule lemJHb from B A have hyps v p F Neg Neg 1p by rule mp withfnp p show thesis by simp next assume ev a eval x x E v p from ev p have evp eval x x E v 4p by simp hencefp ip v 4p 4p by simp add ipdef 61 from ev havefps ip v 1amp Neg 1amp by simp add ipdef from p have size 1amp lt size 4p by simp witth have hyps v 1amp F ip v 1amp by blast withfps pfp show thesis by simp qed next x1amp11amp2 assume p 4p 1amp1 a 1amp2 from p have s1 size 1amp1 lt size 4p by simp from s11H have 1H1 hyps v 1amp1 F ip v 1amp1 by blast from p have s2 size 1amp2 lt size 4p by simp from s2 1H have 1H2 hyps v 1amp2 F ip v 1amp2 by blast show thesis proof cases eval x x E v 1amp1 assume ev1 eval x x E v1amp1 from ev1 havef1flip v 1amp1 1amp1 by simp add ipdef show thesis proof cases eval x x E v 1amp2 assume ev2 eval x x E v 1amp2 from ev2 havef2flip v 1amp2 1amp2 by simp add ipdef fromp ev2 havefpflip v 4p 4p by simp add ipdef fromf2 1H2 have ps2 hyps v 1amp2 F 1amp2 by simp let S emp01amp211amp1 have hyps v 1amp2 F subst 8 A1 apply rule ax by simp with ps2 p have X hyps v 1amp2 F 4p apply simp apply rule mp apply blast apply blast done from p have hyps v 1amp2 g hyps v 4p apply simp by blast withX have hyps v 4p F 4p by rule weakening withfp show thesis by simp next assume ev2 a eval x x E v 1amp2 hencefp2flip v 1amp2 Neg 1amp2 by simp add ipdef fromp ev2 have a eval x x E v 4p by simp hencefp ip v 4p Neg 4p by simp add ipdef fromp have p1p hyps v 1amp1 g hyps v 4p apply simp by blast from 1H1 f1 have p1p1 hyps v 1amp1 F 1amp1 by simp from p1p1 p1p have p1 hyps v 4p F 1amp1 by rule weakening fromp have p2p hyps v 1amp2 g hyps v 4p apply simp by blast from 1H2 fp2 have p2p2 hyps v 1amp2 F Neg 1amp2 by simp from p2p2 p2p have p2 hyps v 4p F Neg 1amp2 by rule weakening 62 have hyps v 4p F 1amp1 7 Neg 1amp2 7Neg 1amp1 7 1amp2 by rule lem111f with p1 have hyps v 4p F Neg 1amp2 7 Neg 1amp1 7 1amp2 using mp by blast with p2 have hyps v 4p F Neg 1amp1 7 1amp2 using mp by blast withfp p show thesis by simp qed next assume ev1 a eval x x E v 1amp1 withp have ep eval x x E v 4p by simp from ev1 havef1 ip v 1amp1 Neg 1amp1 by simp add ipdef from ep havefp ip v 4p 4p by simp add ipdef fromf1 1H1 have ps1 hyps v 1amp1 F Neg 1amp1 by simp have p12 hyps v 1amp1 F Neg 1amp1 7 1amp1 7 1amp2 by rule lem111c from p12 ps1 have hyps v 1amp1 F 1amp1 7 1amp2 by rule mp withp haveX hyps v 1amp1 F 4p by simp from p have Y hyps v 1amp1 g hyps v 4p apply simp by blast from YXfp show thesis apply simp apply rule weakening apply blast by blast qed qed qed lemma excludedmiddle assumes pp insert 4p P F 1amp and npp insert Neg 4p P F 1amp shows P F 1amp proof 7 from pp have a P F 1p 7 1amp by rule deduction from npp have b P F Neg 1p 7 1amp by rule deduction have c P F 1amp71amp 7 Neg 1p 7 1amp 7 1amp by rule lem111g from c a have d P F Neg 1p 7 1amp 7 1amp by rule mp from d b show P F 1amp by rule mp qed lemma variableelimination finite H gt V 4p tautology 4p AH g hyps T0 1p 7 V T IDPS T w i H F 7 apply induct rule finiteinduct apply clarify defer apply clarify defer proof 7 x a T assume taut tautology a have hyps T 4p F ip T 4p apply rule kalmar by simp with taut show hyps T 4p 7 F 4p by simp add ipdef tautologydef next 63 fix x H 4p T assume IH le tautology 4p A H g hyps T0 4p 7 V T hyps T 4p 7 H F 4p and taut tautology 4p and xj h insert x H g hyps T0 4p from xj h obtain 04 where X x Atom 04 A 04 6 T0 V x Neg Atom 04 A 04 T0 using hypsmember by blast moreover assume X x Atom 04 A 04 6 T0 have hyps T 4p 7 insert Atom 04 H F 4p proof rule excludedmiddle of Atom 04 from taut xj h IH have a hyps T 4p 7 H F 4p by blast have b hyps T 4p 7 H g insert Atom 04 hyps T 4p 7 insert Atom 04 H by blast from a b show insert Atom 04 hyps T 4p 7 insert Atom 04 H F 4p using weakening by blast next from taut xj h IH have a hyps T704 4p 7 H F 4p by blast have hyps T704 4p g insert Neg Atom 04 hyps T 4p 7 Atom 04 by rule hypsdiff withX have b hyps T704 4p 7 H g insert Neg Atom 04 hyps T 4p 7 insert Atom 04 H by blast from a b show insert Neg Atom 04 hyps T 4p 7 insert Atom 04 H F 4p using weakening by blast qed moreover assume X x Neg Atom 04 A 04 T0 have hyps T 4p 7 insert Neg Atom 04 H F 4p proof rule excludedmiddle of Atom 04 from taut xj h IH have a hyps insert 04 T 4p 7 H F 4p by blast have b hyps insert 04 T 4p g insert Atom 04 hyps T 4p 7 Neg Atom 04 by rule hypscons from b have c hyps insert 04 T 4p 7 H g insert Atom 04 hyps T 4p 7 insert Neg Atom 04 H by blast from a c show insert Atom 04 hyps T 4p 7 insert Neg Atom 04 H F 4p using weakening by blast next from taut xj h IH have a hyps T 4p 7 H F 4p by blast have b hyps T 4p 7 H g insert Neg Atom 04 hyps T 4p 7 insert Neg Atom 04 H by blast from a b show insert Neg Atom 04 hyps T 4p 7 insert Neg Atom 04 H F 4p using weakening by blast qed ultimately show hyps T 4p 7 insert x H F 4p by blast qed theorem completeness assumes taut tautology 4p shows F 4p proof 7 have finite hyps T 4p by rule hyps nite with taut have hyps T 4p 7 hyps T 4p F 4p using variableelimination by blast thus thesis by simp qed 64 REFERENCES 1 2 3 4 5 6 7 8 9 10 11 12 13 H Barendregt The Lambda Calculus volume 103 of Studies in Logic Elsevier 1984 S Bilaniuk A problem course in mathematical logic A Chargu raud B C Pierce and S Weirich Proof engineering Practical techniques for mechanized metatheory Sept 2006 Submitted for publication T H Cormen C Stein R L Rivest and C E Leiserson Introduction to Algorithms McGrawHill Higher Education 2001 H B Curry and R Feys Combinatory Logic NorthHolland Publishing Co Amster dam 1958 M Felleisen and R Hieb The revised report on the syntactic theories of sequential control and state Theoretical Computer Science 10322354271 1992 A D Gordon A mechanisation of namecarrying syntax up to alphaconversion In HUG 93 Proceedings of the 6th International Workshop on Higher Order Logic Theorem Proving and its Applications pages 413425 London UK 1994 Springer Verlag E Mendelson Introduction to Mathematical Logic Chapman and Hall 1997 T Nishihara and Y Minamide Depth rst search In G Klein T Nipkow and L Paulson editors The Archive of Formal Proofs httpafpsourceforgenetentries DepthFirstSearchshtml June 2004 Formal proof development L C Paulson Set theory for veri cation 11 Induction and recursion Journal of Automated Reasoning 1521674215 1995 R Pollack Closure under alphaconversion In TYPES 93 Proceedings of the inter national workshop on Types for proofs and programs pages 3134332 Secaucus NJ USA 1994 SpringerVerlag New York Inc D J Velleman How to Prove It Cambridge University Press 1994 A K Wright and M Felleisen A syntactic approach to type soundness Information and Computation 115138494 1994 65 Cryptology and Physical Security Rights Ampli cation in MasterKeyed Mechanical Locks Matt Blaze ATampT Labs 7 Research mabcrypto com mabresearch att com PREPRINT 7 15 Sept 2002 revised 6 Feb 2003 To appear in IEEE Security andPrivacy MarchApril 2003 This paper can be found at http www crypto compapersmk pdf Abstract This paper examines mechanical lock security from the perspective of computer science and cryptol ogy We focus on new and practical attacks for amplifying rights in mechanical pin tumbler locks Given access to a single masterkeyed lock and its associated key a procedure is given that allows discovery and creation of a working master key for the system No special skill or equipment beyond a small number of blank keys and a metal le is required and the attacker need engage in no suspicious behavior at the locks location Countermeasures are also described that may provide limited protection under certain circumstances We conclude with directions for research in this area and the suggestion that mechanical locks are worthy obj ects for study and scrutiny 1 Introduction In the United States and elsewhere mechanical locks are the most common mechanisms for access control on doors and security containers They are found in and guard the entrances to the vast majority of residences commercial businesses educational institutions and government facilities and often serve as the primary protection against intrusion and theft As important as locks are in their own right their design and function has also in uenced much of how we think about security generally Computer security and cryptology borrow much of their language and philosophy from metaphors that invoke mechanical locksmithing The concept of a key as a small secret that allows access or operation the notion that system security should be designed to depend only on the secrecy of keys and even the reference to attackers as intruders can all be traced back to analogies that long predate computers and modem cryptology Conversely the design of mechanical locks could well be informed by the philosophy and methodology of computer security and cryptology For example formal notions of the computational complexity and other resources required to attack a system could be applied to the analysis and design of many aspects of mechanical locks In general however these concepts have not enjoyed widespread adoption by lock smiths or lock designers Computer security specialists for their part are often surprisingly unskeptical in evaluating claims of physical security This paper examines the security of the common masterkeyed pintumbler cylinder lock against an insider threat model more commonly associated with computing systems unauthorized rights amplz cation As we shall see not only is this threat of practical concern in physical security there are simple attacks that render many realworld lock systems quite vulnerable to it 2 Background Mechanical Locks A complete review of lock technology is well beyond the scope of this paper For an excellent discussion of physical security design and evaluation the reader is referred to 3 For the purposes of consistent terminology a brief overview follows Broadly speaking mechanical locks fall into two general categories combination locks which operate upon demonstration of a secret procedure and keyed locks which operate with use of a secret token Com bination locks are most frequently used to control access to safes and vaults and on some padlocks most commercial and residential doors and entrances use keyed locks There are many different keyed lock designs that have been invented and used throughout the industrial age among currently manufactured schemes there are warded locks lever locks wafer locks and dimple locks More recently electronic locks and computerbased access control systems have found application in certain commercial environments By far the most common medium and highsecurity mechanical keyed lock mechanism in the US and many other countries however is the mechanical pin tumbler lock cylinder 21 Evaluating Lock Security Mechanical locks must resist a much wider range of threats than those associated with computing or com munications systems First of course locks function in the physical world and must therefore be suf ciently mechanically strong to withstand forceful attack Evaluation of this aspect of lock security focuses on such issues as the strength of materials the accessibility of weak points resistance to various tools and so forth There are industry and government standards that require speci c physical characteristics of locks for various appli cations which vary depending on the eXpected resources of the attacker and the likely ease of alternative methods of entry eg through a broken window A related issue is the ease with which the locking mechanism itself can be bypassed It may be possible to open a lock without interacting with the keyed mechanism at all door latches can often be wedged or pried open for example Here security depends not only on the lock but also the soundness and correctness of its installation It is also possible that a lock might be manipulated to operate without a key or that a key can be fab ricated without knowledge of its parameters The most common or at least famous manipulation method involves picking which eXploits small manufacturing imperfections and mechanical tolerances to set a lock to a keyed state without using a key A related method impressioning fabricates a working key directly Manipulation is generally nondestructive and may leave behind only minimal eXtemal evidence Both pick ing and impressioning require nesse and skill however and are much more dif cult to carry out against locks of better quality especially designs that employ security features intended speci cally to thwart ma nipulation Evaluating and protecting against most of the above threats focuses more on the details of a lock s mechanical and physical construction than on abstractly quanti able security metrics A computer science and cryptologic security analysis on the other hand might take a more abstract idealized view of locks and their operation In particular we might be especially concerned with the security of the key space against various threats The most basic design goal of all keyed locks is that a correct key is required for operation ideally it should not be possible to operate a lock without possession of the key This is rarely achieved in practice due to the factors discussed above but that is not critical for the purposes of this discussion Among the most quanti able security parameters for discussing locks therefore is the number of possible unique keys called the number of di ers or changes in the terminology of the trade which gives the probability that a randomly cut key will operate a given lock and an upper bound on the resources required to nd a Figure l A pin tumbler lock cylinder Left The cylinder face Note the keyway which is cut into the plug which in turn sits inside the shell Right Side view with part of the shell and plug cut away to expose the six pin stacks Note the border between the plug and shell which forms the shear line and the cuts in each pin stack resting within the plug working key by exhaustive search On typical commercial locks there are between several thousand and several million possible distinct keys While these numbers may seem very small by computational security standards mechanical locks perform on a more human scale Testing a key against a lock after all is an online operation requiring seconds not microseconds and carries with it at least some risk of discovery if the lock is not one to which the attacker has legitimate access If exhaustive search is not feasible it may still be possible to analyze and exploit a lock s key space in other ways 22 The PinTumbler Lock The modern pin tumbler lock is quite simple dating back to ancient Egypt but not commercially mass produced until the middle of the 19th century The basic design consists of a rotatable cylinder tube called a plug that operates the underlying locking mechanism Around the circumference of the plug is a shell which is xed to the door or container Rotation of the plug within the shell operates the locking mechanism In the locked state the plug is prevented from rotating by a set of movable pin stacks typically under spring pressure that protrude from holes in the top of the opening in the shell into corresponding holes drilled into the top of the plug Each pin stack is cut in one or more places perpendicular to its length See Figure 1 In practice the cuts are produced by stacking pin segments of particular sizes not by actually cutting the pins hence the term pin stack With no key in the lock all the pin stack cuts rest within the plug When a key is inserted into the keyway slot at the front of the plug the pin stacks are raised within the plug and shell The plug can rotate freely only if the key lifts every pin stack s cut to align at the border between the plug and shell The plugshell border is called the shear line See Figure 2 The plug will be blocked from rotating if any pin stack is lifted either not far enough with the cut still in the plug below the shear line or too far with the cut pushed above the shear line and into the shell to rotate all pin stacks must have a cut at the shear line See Figure 3 The height or cut depth of a key under each pin stack position is called its hitting the bitting of a key is the secret needed to open a lock A key that is bitted to the wrong depth in even one pin position will not allow the lock to operate Generally a lock manufacturer will choose from among only a small number of standard bitting depths at each pin position This allows keys to be described concisely typically the bitting depth number is written starting from the shoulder handle of the key to the tip giving the standard depth number at each L W A Elihu1177 ill l Figure 2 Pin tumbler lock with a correct key inserted Left The correct key lifts the pin stacks to align the cuts at the shear line Right With all of the cuts at the shear line the plug can rotate freely within the shell Here the plug has been turned slightly toward the camera so that the tops of the pins in the plug are Visible Figure 3 A lock with an incorrect key Observe that while three of the pin stacks cuts are at the shear line two stacks have the cut too high and one stack has the cut too low Figure 4 A master keyed pin tumbler lock Left Each of the six pin stacks has two cuts Right With the correct change key inserted one of the cuts on each pin stack is aligned at the shear line Observe that the other cut is sometimes above and sometimes below the shear line position So a key for a ve pin lock denoted 12143 would be cut to depth 1 nearest the shoulder and proceeding toward the tip cut at depths 4 and 3 The exact speci cations of the depths and positions for most commercial locks are widely published in the trade or could be discovered easily by disassembling a sample lock or measuring a small number of cut keys Typically the number of pins is in the range of four to seven and the number of possible depths ranges from four to ten depending on the lock model Better quality locks generally employ more pins and use more distinct cut depths on each Pin tumbler locks can often be defeated in various ways although a discussion of lock picking and other bypass techniques that require specialized skills or tools or that exploit mechanical imperfections is beyond the scope of this paper In practice however even very modest products are often su iciently secure or offer the perception of being suf ciently secure to discourage the more casual wouldbe intruder from attempting to operate a lock without a key Probably the most commonly used techniques for unauthorized entry aside from brute force involve procuring a working key 23 Master Keying Complicating the analysis of pin tumbler lock security is the fact that especially in largerscale installations there may be more than one key bitting that operates any given lock The most common reason for this phenomenon is the practice of master keying in which each lock in a group is intended to be operated not only by its own unique key the change key in trade parlance but also by master keys that can also operate some or all other locks in the system Master keying in pin tumbler locks can be accomplished in several ways with the earliest systems dating back over 100 years The conceptually simplest master key method entails two cylinders on each lock one keyed individually and the other keyed to the master bitting a mechanical linkage operates the lock when either cylinder is turned Other master keying schemes employ an independently keyed master ring around the lock core and still others depend on only a subset of pin positions being used in any given lock All of these approaches have wellknown advantages and disadvantages and are not considered in this paper Most importantly these schemes require the use of special locks designed speci cally for master keying The most common master keying scheme the subject of consideration of this paper can be used with virtually any pin tumbler lock Recall that in a pin tumbler lock each pin stack is cut in one place de ning exactly one depth to which the stack must be lifted by the key bitting to align with the shear line In the conventional split pm mastering scheme some or all pin stacks are cut in more than one place typically in two places allowing additional bittings that align such pins See Figure 4 Consider for example a lock A which has ve pin stacks with four possible cut positions in each Suppose pin stacks 1 through 5 are each cut in two places corresponding to bittings l and 4 Observe that this lock can be opened by at least two keys one with bitting 11111 and another with bitting 44444 We could create a second lock B this time with pin stacks 1 through 5 each cut at depth 2 and depth 4 This lock can be operated by keys cut 22222 and 44444 If these are the only two locks in the system keys 11111 and 22222 can be said to be the change keys for locks A and B respectively while key 44444 is a master key that operates both There are a number of different schemes for master keying the subject is surprisingly subtle and com pleX and the trade has developed standardized practices in recent years For indepth treatments the reader is referred to l and 2 For the purposes of our discussion it is suf cient to note that modern master systems fall into two broad categories Total Position Progression T PP and Rotating Constant RC In TPP schemes every pin stack has a single separate master cut which is never used in that position on any change keys In RC schemes change keys do share the master bitting for a xed number of pin stack positions although the positions will vary rotate from lock to lock Both these schemes can implement a directed graph with several levels of master keys submaster keys that open a subset of locks in the system and grand master keys that open morel The highestlevel master key which opens all locks in a multilevel system is sometimes called the Top Master Key TMK The astute reader will note that master keying reduces security in several important ways First of course the master key represents a very valuable target compromise of the master key compromises the entire system Even if the master keys are well protected security is still somewhat degraded Because each mastered pin stack aligns with the shear line in several positions mastered systems are more susceptible to unintentional cross keying in which keys from the same or other systems will operate more locks than intended For the same reason mastered locks tend to be more vulnerable to manipulation by picking and impressioning These weaknesses can be mitigated to some eXtent through careful planning improved mechanical construction and the use of additional pin stacks and possible cut depths In this paper however we consider methods for discovering the master key bitting in conventional pin tumbler systems given access to a single change key and its associated lock No special skills or tools are required on the part of the attacker nor is it necessary to disassemble any lock or engage in any inherently conspicuous or suspicious activity We also suggest countermeasures and alternative lock designs that can frustrate these attacks to at least some eXtent under certain circumstances 3 Rights Ampli cation ReverseEngineering Master Keys Clearly the most valuable sensitive secret in any lock system is the bitting of the toplevel master key TMK Insiders who possess legitimate change keys and have physical access to locks represent perhaps the most serious potential threat against master keyed systems The primary purpose of assigning locks unique change key bittings after all is to allow operating privileges to be granted to only speci c locks if a change key can be converted into a master key a major security objective of the system is compromised In the terminology of computer security master key systems should resist unauthorized rights ampli cation also called privilege escalation Unfortunately most deployed master key systems are quite vulnerable in this regard 1There are also Selective Key systems in which any lock can be keyed to operate with an arbitrary subset of keys using techniques similar to master keying and Maison Key schemes in which certain locks are keyed to all keys in a group We do not consider such systems here 31 Background Several timehonored methods convert change keys into master keys with different techniques applicable depending on the particular system and resources available to the attacker The simplest approach to master key discovery involves direct decoding of an original master key eg from visual inspection 1 39 l l l 39 U1 A trained observer may be able to recall the cut depths with surprising accuracy after being allowed to look only brie y at a key Another direct technique involves disassembly of a master keyed lock and measurement of the pins in each pin stack to determine the bittings that will operate each pin position Without access to the locks change key this does not yield complete information about the master bitting there will be exponentially many potential master key bittings only one of which will correspond to the true master key If every pin is mastered according with a standard TPP scheme disassembly of a single lock will reveal 21 potential master keys where P is the number of pin stacks This eXponent is still small enough to make eXhaustive search of these keys feasible in many cases Disassembly of additional locks from the same system can narrow this search space signi cantly If the change key to a disassembled lock is available the cuts corresponding to its bitting can be eliminated from each pin stack making the correct bitting of the true master unambiguously clear from a single sample More secure lock designs make it dif cult to nondestructiver remove a lock without the key eg by placing set screws in locations that are inaccessible when a door is closed and locked Padlocks are especially vulnerable to these sorts of attacks since they can be stolen easily when they are left unlocked A suf ciently large group of change key holders in TPPbased systems may be able to reverse engineer a master key without disassembling any locks Recall that in these systems change keys never have the same bitting at a given pin position as the master By measuring their change keys a conspiracy of key holders may discover a single depth not used at each pin position on the change keys this will correspond to the master bitting Several correspondents have noted that this technique is occasionally employed by enterprising university students especially at better engineering schools None of these approaches is completely satisfactory from the point of view of the attacker however Direct decoding from the true master key entails limited access to such a key and is not possible if no master key is available for Lock quot 39 39 for pin may eXpose the attacker to suspicion and could be dif cult to perform in secret and carries the risk that the lock may be damaged in reassembly Comparing a large number of different keys requires in the rst case a large number of different keys which may not be available and is ineffective against RCbased systems A more powerful attack requires only one change key and is effective against all standard TPP and RC based systems u u 32 An Adaptive Oracle Based Rights Ampli cation Attack It is useful now to consider a lock in more abstract terms From a cryptologic point of view we might observe that a lock is really an online oracle that accepts or rejects keys presented to it In this sense the oracle gives a single bit answer for each key presented to it the lock either turns or it does not A natural question to ask about any online oracle is whether it is feasible to issue a small number of queries that force the oracle to leak its secrets In particular can we eXploit the oracle to test ef ciently single bits of a possible key or must we exhaustively search the entire key space Recall that a pin tumbler lock will operate when each of its pin stacks is raised by a key to a position where one of its cuts is aligned at the shear line There is no communication among pins the lock will operate not only with all pin stacks aligned at the change key depth or all pin stacks at the master key depth but also by keys that align some stacks at the change depth and others at the master depth That is consider our ve pin lock A from the previous section with key bitting lllll representing A s change key and 44444 representing the system s master key This lock can be operated not only by the obvious keys cut 11111 and 44444 but by a total of 25 different keys including eg 11114 11141 etc It is straightforward to exploit this phenomenon to discover the master key bitting given access to a single change key and its associated lock plus a small number of blank keys milled for the system keyway In our new2 attack we use the operation or nonoperation of a lock as an oracle to determine pin by pin the complete bitting of the TIVIK 321 Notation Let P denote the number of pin stacks in a lock with stack 1 representing the rst stack eg the one closest to the shoulder of the key and stack P representing the last eg the stack at the tip of the key Let D denote the number of distinct key bitting depths in a pin stack where 1 is the highest bitting in which the pin stack is raised the most and D is the lowest in which the pin stack is raised the least Assuming that the physical properties of the system place no restrictions on the bitting depth of adjacent pin positions observe that the number of distinct keys is DP 322 The Attack For each pin position 1 from 1 to P prepare D 1 test keys cut with the change key bitting at every position except position 1 At position 1 cut each of the D 1 keys with each possible bitting depth excluding the bitting of the change key at that position Attempt to operate the lock query the oracle with each of these test keys and record which keys operate the lock In a TPPbased system with every pin mastered exactly one of the D 1 test keys for each pin position will operate the lock the depth of the test key at that position represents the master bitting at that position If none of the test keys for a particular position operates the lock then either that pin is not mastered or it is an RCbased system In either of these cases the master key bitting at that position is the same as that of the original change key Once the master bitting has been determined at each of the P positions a complete toplevel master key can be cut easily Observe that our attack consumes PD 1 key blanks and requires PD 1 probes of the lock in the worst case If it is possible for the attacker to cut keys between probes of the lock however a simple optimization reduces the number of blanks consumed to P in the worst case Rather than cutting D 1 separate blanks per position the attacker need use a single key initially cutting the position under test to the highest depth and recutting the same blank successively lower after probing the lock This reduces the total cost of carrying out the attack to less than about two US dollars in the worst case This optimized attack still requires P D 1 probes of the lock in the worse case of course 323 Practical Considerations In some lock designs not all of the DP possible keys are legal In particular with some lock models it is not possible on a standard key to have a very high cut immediately adjacent to a very low cut if the angle at which the bittings are cut reaches across to the next pin position A lock s Maximum Adjacent Cut Specz cation MACS might require for example in a system with 7 different cut depths that adjacent cuts 2It is always di icult to be sure that something is novel in the sense of not having previously been discovered independently the lack of a coherent and open body of literature on locks makes it especially so here Our attack surely is not new in this sense Several correspondents have suggested that similar approaches to master key reverse engineering have been discovered and used illicitly in the past and the method occasionally circulated informally eg on Internet message boards We subsequently found a message originally sent to a private mailing list in 1987 from Doug Gwyn that describes a similar method However there do not appear to be references to this particular attack in the published literature of either the locksmith or underground communities be no more than 4 steps apart disallowing a depth 1 cut next to a depth 7 cut Even if both the change key and the master key do not violate the MACS rule for a particular lock this attack employs test keys that mix change key cuts with potential master cuts If the original change key has very high or very low cuts it may therefore be necessary for the attacker to create some test keys that do violate MACS In practice on the locks we examined with MACS restrictions it is generally still possible to cut working test keys by using a steeper than usual angle and with cuts occupying slightly narrower than usual space on the key Although insertion and removal of such keys is more dif cult they are suf cient for this limited singleuse purpose Alternatively previously discovered master depths could be used in adjacent positions on subsequent test keys Also complicating our attack is the possibility that the master cuts lie somewhere between the standard depths ordinarily used by the lock manufacturer This is more likely in older systems or those keyed by private locksmiths who may not not follow manufacturerstandardized practices When this is suspected to be the case the attacker must probe the lock at more test cut depths removing only a small amount of key material 005 inches or so from the position under test between probes This is similar to the procedure used when creating a key by the impressioning technique and could be performed with a ne metal le Some systems especially in older installations use master cuts that are consistently higher or lower than the change key cuts This practice makes it especially easy to discover the master key with this attack Multilevel master systems may or may not present a special challenge In standard TPP and RC systems every pin stack has at most two cuts submasters are implemented by using xed change key bitting on certain pins for locks within each submaster group In such cases the attack proceeds as described and yields the TMK It is also possible however to implement hierarchical submastering by using more than two cuts on each pin stack In such cases the TIVIK bitting of a given pin may be ambiguous An attacker can distinguish the true TMK cuts in such systems by conducting the attack on locks from different submaster groups This may not always be necessary however It is common for such systems to employ the convention that all of the TMK cuts are either above or below the submaster cuts Some larger installations put different groups of locks on distinct keyways such that a change key for a lock in one group does not t into the keyway of locks from others The TMK is cut on a special master blank that ts all the keyways in the system This practice called Sectional Mastering expands the number of effective differs in the system and reduces cross keying between different lock groups Sectionally mastered systems are especially attractive targets for attack since the TMK works for a very large number of locks across groups that would otherwise have to be keyed on different master systems The attacker simply cuts the TMK bitting derived from a lock in any section onto a blank milled for the master section It is worth noting that even high security pin tumbler lock designs including those that use sidebar cuts and rotating pins are usually in principle vulnerable to this attack the only question is whether the attacker can obtain or fabricate the required blanks Furthermore our attack can be generalized to many other lock schemes including for example certain high security lever lock and disk wafer designs such as Abloy 33 Experimental Results It is easy to see that this attack is effective against the standard master keying schemes we described It is natural to ask then whether master key systems deployed in practice follow these schemes and are therefore vulnerable Unlike computing systems that can be tested relatively easily and safely in isolated testbed environments running standard software such a question can only be answered by attempting the attack against real installations The reader is cautioned that reproduction of these experiments should be carried out only with the cooperation of the owner of the lock systems on which the attack is attempted We tested our attack against avariety of medium and large scale institutional master keyed installations including both 39 39 39 and Systems tested were both relatively new and u relatively old had been both factorykeyed as well as privately rekeyed and included locks manufactured by Arrow SFIC Best SFIC CorbinRusswin Schlage and Yale For the Best SFIC Arrow SFIC and Schlage systems we used portable key punches and a supply of blank keys brought to the facilities tested For the CorbinRusswin and Yale systems we precut siX test keys on a general purpose code machine based on measurements previously taken from a change key and used a metal le at the test site to progressively cut the test keys and nally to cut the full master bitting onto a fresh blank key All required key blanks were procured from standard commercial sources which can be found easily on the Intemet with a search engine Cost per blank ranged from US014 to US035 depending on the particular lock type plus shipping We used for convenience in some of the attacks key cutting machines also available widely from commercial sources for a few hundred dollars In other cases we used a ne metal le and a dial caliper or micrometer to cut the keys to the correct bitting depth None of the equipment or supplies we used are restricted in any way Such restrictions even if they eXisted would not be espe cially effective at preventing potential attackers from obtaining blank keys given the vast number of small businesses that have legitimate need for them hardware stores etc In every case the attack yielded the top master key bitting as eXpected In general it required only a few minutes to carry out even when using a le to cut the keys All siX Arrow SFIC and Best SFIC systems we tested had all siX or seven pin stacks mastered with a TPP format The two CorbinRusswin system 70 systems each had three pin stacks out of siX mastered again with a TPP format The Schlage system used an RCbased scheme with every pin mastered and two master cuts used on each change key The Yale system was also RCbased with one master cut used on each change key Several of the systems had multilevel mastering hierarchies the attack yielded the TIVIK in all cases Notably although some of the complications discussed in the previous section such as more than one master cut per pin stack selective keying or nonstandard master depths are possible in principle we did not encounter them Every system we tested was keyed according to standard TPP or RC industry practice had at most one master cut per pin and employed standard depths making the attackers job especially straightforward Although our experiments hardly constitute an eXhaustive survey they were conducted across a wide variety of facilities that seem reasonably representative of a large segment of US institutional lock installations A check of several other lock vendors standard master keying practices further supports this conclusion 4 Countermeasures Our adaptive oracle attack is only effective against locks that have a single shear line used by both master and change keys Although this is the case with the majority of mastered locks there are commercially available designs that do not have this property Locks with a separate master ring for example require that all pin stacks be aligned to the same one of two distinct master or change shear lines and therefore do not provide feedback about the master bitting of a pin given the change bittings of the other pins3 Master ring locks however are actually more vulnerable to reverse engineering from lock disassembly by an attacker without access to the change key This attack assumes that the attacker has access to a modest supply of blank keys for the system Whether this is a practical assumption depends on the particular system of course and some restricted keyway lock products may make it more dif cult for the attacker to obtain blanks from commercial sources However 3A master ring lock has two concentric plugs with the keyway cut into the inner plug Two distinct shear lines are formed The pin stacks are correspondingly taller with one cut on each stack designed to be able to reach one shear line and another cut designed to reach the other A few master ring locks are still commercially manufactured but the design has largely fallen out of favor for most applications blanks for many socalled restricted systems are in fact readily available from aftermarket vendors Even when an exact blank is not commercially available often a different key can be milled down to t Unusual or patentprotected key designs such as those employing a sidebar cut may be more dif cult to procure directly or modify from commercial sources but blanks can still usually be fabricated in small quantities relatively easily by casting especially since the attacker already possesses a working change key cut on the correct blank In mediumscale master systems it may be possible to limit the information contained in any given lock at the expense of somewhat increased vulnerability to cross keying and picking In standard master schemes each pin stack is cut only at the master and change depths The attacker exploits the fact that any working depths not corresponding to the change key must be on the master A natural way to frustrate the attack therefore is to add false cuts to some pin stacks that do not correspond to the master and that do not appear in the majority of other locks in the system If one extra cut is added to each pin stack the attacker will learn 2 different possible master keys from one lock only one of which will correspond to the true TMK bitting These extra cuts must be selected very carefully however since each such cut reduces the number of unique differs available in the system Effectively the extra cuts create new subclasses of submaster keys among locks that share the same false cuts which the attacker must eliminate before learning the true highlevel master key 5 Conclusions and Lessons Learned In this paper we have shown a very simple rights ampli cation attack that is effective against virtually all conventional masterkeyed pin tumbler locks including many socalled highsecurity products This attack is an especially serious threat to the security of such systems because it is easy to carry out leaves no forensic evidence requires no special skills and uses only very limited resources a few blank keys and a le in the case of the most frugal attacker Compounding the threat are the facts that the attacker need engage only in apparently ordinary behavior 7 operating the lock to which he or she already has legitimate access 7 and that the attack can be carried out over a period of time in several interrupted sessions Any successful compromise of a master keyed installation can be very dif cult and costly to remedy assuming it is even discovered Every mastered lock must be rekeyed and depending how the keying is done new keys distributed to the key holders Not only is this very expensive but systemwide rekeying can also require a considerable period of time to complete during which all the old locks remain exposed In light of the inherent security vulnerabilities introduced by master keying owners of lock systems should consider carefully whether the security risks of mastering outweigh its convenience bene ts Unfortunately the computing world is not alone in often putting a premium on convenience over security If master keying must be used simple countermeasures especially the use of false cuts in mastered pin stacks can frustrate the adaptive oracle attack If it is not possible to employ lock designs such as master rings that resist such attacks these countermeasures should be considered seriously It is worth noting that these attacks become rather obvious when the basic analysis techniques of cryp tology and computer security are employed In fact as noted previously these attacks appear to have been discovered and rediscoverd independently several times occasionally passed on as underground engineering and locksmithing folklore but never documented in the literature One of the rst questions asked about any proposed cryptosystem for example is whether it is possible to test the value of one key bit indepen dently from the others If it is the system would be considered hopelessly insecure since an attack would take time only linear in the number of key bits instead of exponential The same question readily trans lates into the mechanical lock domain by substituting pin stack for key bit In fact our master key discovery scheme bears a striking resemblance to a famous characterbycharacter attack against the Tenex password mechanism Similarly the notion of an online service as an authentication oracle is familiar in 11 the analysis of cryptographic systems Mechanical locks can likewise be modeled as online oracles that accept or reject keys and security analysis conducted accordingly Finally the attack against TPP systems that compares many different change keys is reminiscent of related key attacks against cryptosystems with a threat model much like traitor tracing in broadcast encryption Perhaps other aspects of the anal ysis of mechanical and physical security would bene t from similar analogies to computing systems and cryptology On the other side of the coin the vulnerability to rights ampli cation in master keying of mechanical locks recalls similar weaknesses in cryptographic systems that attempt analogous capabilities Consider for example the vulnerabilities inherent in key escrow systems that attempt to facilitate emergency decryption by a central third party of data encrypted with many different users keys Even more direct analogies can be found in digital rights management schemes and smartcardbased digital cash systems that contain but aim to hide as master keyed locks do global secrets from their users 6 Acknowledgments The author is grateful to David Chaum Niels Ferguson AJ Hoffman Dave Korman Avi Rubin Mark Seiden Lloyd Seliber Adi Shamir Jonathan Smith Marc W Tobias and Barry Wels for comments on this paper and interesting conversations about locks generally John Ioannidis made the cutaway lock shown in the gures We are also indebted to managers at several masterkeyed installations who allowed us to conduct our experiments but who in the interests of protecting the security of their facilities cannot be thanked publicly References l J Andrews Fundamentals ofMaster Keying Associated Locksmiths of America 1990 2 BB Edwards Jr Master Keying by the Numbers 2e Security Resources Pensacola FL USA 1997 3 M W Tobias Locks Safes and Security 2e Charles Thomas Publisher Ltd Spring eld IL USA 2000 University of Maryland CMSC 456 7 Introduction to Cryptography Professor Jonathan Katz Notes on Algebra and Number Theory1 Part 1 The following are brief notes covering several topics in algebra and number theory which will be useful for the course and for applications in cryptography in general The presentation is informal and proofs are not given More thorough treatment can be found in 1 3 4 1 Groups De nition 1 A group G is a set C along with an operation on pairs of elements of G such that the following conditions hold 1 For allab Ga beG 2 The operation is associative ie for all a bc E G a b c a b c 3 There exists an identity element I such that for all a I a a I a 4 For all a E G there exists a unique inverse a 1 E G such that a a 1 a 1 a I Furthermore we say a group is commutative if for all ab E G a b b a I Some common groups include Z the set of integers under addition IRJV the set of positive real numbers under addition and IR the set of non zero real numbers under multiplication Note that Z is not a group under multiplication why not For brevity we denote a b by ab in a general group and the notation am denotes a multiplied by itself m times The identity element is typically denoted by e or by 1 We call G ie the number of elements in G the order of G Most of the groups we will deal with in this class will have nite order Lemma 1 Let m be the order of nite group G Then 9 1 for any nonzero g E C There is a nice simple proof for this see 1 for details This simple lemma can be used to demonstrate a very useful fact which we state in its own lemma because it is so important Lemma 2 Let G be a nite group of order m Let g E G and x be an integer Then gar gar mod m 1Adapted in part from 2 3 5 6 Proof Let x w mod m Then we can write x km w But now we have gm gkmz gkmgz gmgtkgz 1gtkgz 91 I One type of group which will come up again and again in cryptography and computer science generally is the group of integers modulo N for some integer N For any integer i de ne i mod N as the remainder when i is divided by N Note that by de nition this means that we have i E 0N 71 De ne the group ZN as the set ON 71 under addition where addition is done by adding elements of ZN normally ie over the integers and then reducing modulo N For example if we are working in Z5 then 4 3 2 mod 5 When the underlying group is clear from the context we will simply write 4 3 2 Note that the validity of such a statement depends tremendously on the underlying groupI Note that ZN is not necessarily a group under multiplication for example in Z15 the element 5 has no multiplicative inverse We can x this however by introducing a commutative group which will be of great importance in our study of cryptography Lemma 3 Let N be an integer and de ne Zj v m 1 S m S N and gcdmN 1 Then Z3 is a group under multiplication modulo De nition 2 We denote by MN Thus MN is the number of elements of Zj v I lfp is prime one can show that Mp p 71 this is because Z 1 p71 in this case Another important case is when N as the product of two distinct primes p q ie N pq In this case one can show by a simple counting argument that MN p71q71 2 Fields A eld is an extension of the concept of a group De nition 3 A eld F X is a set F with special elements 0 and 1 along with two operations X de ned on pairs of elements of F such that the following conditions hold H F is a commutative group with identity the element 0 10 The X operation is associative for all abc E F a X b X c a X b X c 9quot The X operation is commutative for all 1 E F a X b b X a r gt The element 1 is an identity for X for all a E F 1 X a a X 1 a U1 The distributive law is satis ed for all abc E F a X b c a X b a X c 9 All nonzero elements in F have an inverse under X for all a E F a 7 0 there exists an element 1 1 E F such that a X 1 1 1 For any eld F we denote by F the set of nonzero elements of F Thus every element of F has an inverse As an example Z for p prime see below is the set 1 p 7 1 Note that Z5 is a eld under addition and multiplication modulo 5 To see this note that we already know that Z5 is a group under addition from Section 1 Furthermore we can check that requirements 275 are satis ed since they hold over the set 1 5 considered over the integers this essentially implies that they hold The non trivial one to check is condition 6 but this can be veri ed on a caseby case basis ie the inverse of 2 is 3 4 is its own inverse On the other hand note that Z5 is not a eld For example 4 has no multiplicative inverse try to nd one There is one type of eld which we will be concerned with in this class this is the subject of the following lemma Before presenting the lemma we remind the reader that a prime number is an integer p 2 2 whose only divisors are 1 and p Lemma 4 Letp be a prime number Then Zp is a eld under addition and multiplication modulo p 3 Complexity of Arithmetic Operations ln cryptography we tend to deal with large numbers on the order of 2512 so it is worth stepping back and making sure that arithmetic operations on these integers can be done e iciently Recall that the e iciency of an algorithm operating on a number a is not mea sured in terms of a but rather in terms of the length number of bits in a written here as a llogz al Fortunately all simple arithmetic operations can be done in polynomial time in the length of the inputs We state the following facts without proof in the following ab are positive integers with a 2 b 1 Addition of a and b can be done in time Oa 2 Multiplication of a and b can be done in time Oa In fact this can be improved to Oa log a logloga 3 Division of a by b returning both the quotient and the remainder can be done in time Oa Note this means that we can compute a modb in time Oa This can be also improved What about exponentiation We will only be interested in exponentiation modulo some xed value N so that in particular our nal answer will always be bounded by N so the question becomes how fast can we compute a17 mod N A very naive way of doing this is to compute a2 a a a3 a a a a a1 1 and then nally reduce a17 modulo N what is the time required for this approach A somewhat less naive method is to reduce the value at each step ie compute a2 mod N a a mod N a3 mod N a a2 mod N Note however that this algorithm has b steps so the running time will be exponential in bI A more careful algorithm 7 repeated squaring 7 is needed Pseudocode for this algo rithm follows we assume b 2 O for simplicity lnput abN if b 0 return 1 ans atmp 1 we maintain the invariant that our solution is tmp ansl7 mod N while b gt 1 if b is odd tmp tmp ans mod N b b 7 1 ans ans ans mod N b 272 return tmp ans mod N Note that this algorithm performs at most 2b multiplications and each multiplication takes time ON2 we assume here that a lt N 7 if this is not the case we reduce a before starting So the total running time of the algorithm is Ob N2 We state without proof the fact that the greatest common divisor of a and b can be computed in time Oa this uses Euclid s algorithm In fact we can do even better using the extended Euclid s algorithm in the same time bound we can compute natural numbers mn with 7a2 3 mn S a2 such that ma nb gcdab This fact turns out to be surprisingly useful as an example consider the problem of computing the inverse of an element a E Zj v By de nition of Zj v we know that gcdaN 1 Thus using the extended Euclid s algorithm we can nd nm such that no mN 1 note that one of nm may be negative But now we claim that n is the inverse we desire Indeed we have no 1 imN 1 mod N Thus it a 1 mod N 4 Chinese Remainder Theorem A very useful algorithmic result is the following we say two integers N1N2 are co prime if gcdN1N21 Theorem 1 Consider a system of equations of the form a1 mod N1 a2 mod N2 6 1k mod Nk where w is the variable and a1 an and N1N1c are given Suppose N1 Nk are pairwise coprime Then there is always a unique solution x with 0 S w lt N1 Nk Furthermore such 6 is e lciently computable given a1 an and N1 Nk Finally the set of all solutions are the integers y such that y 6 mod N1 Nk Assume N pq with pq prime note that this implies that p and q are co prime The above theorem gives an alternate way to represent elements of ZN we may represent a E ZN by the pair ab with a E Zp and b E Zq where w a modp and w bmod q For example 7 E Z15 can be written as 12 since 7 1 mod 3 and 7 2 mod 5 An important property of these representations is given by the following fact Fact 1 Let N pq with p q prime Let x lt gt wpwq be the representation described above Then 0 far 3 2 mod N then wpwq ypyq 2pzq where 2p mp yp mod p and zq wq yq mod 1 o If my zmodN then wpwq ypyq 230231 where 2p wpyp modp 1nd zq acqu mod 1 We give an example of some very useful facts one can derive from this simple observation First note that this gives a quick proof that N p7 1q 7 1 when N pq and p q are prime lndeed represent elements as above Then 11 E Z iff 11 is not divisible by p and not divisible by 1 But note that 1 b is divisible by p exactly when 1 0 and similarly is divisible by q exactly when 1 0 Thus the number of elements 1 b which are in Z is given by all choices of 1 except 0 and all choices ofb except 0 giving 137 1q 7 1 possibilities Furthermore note that the inverse of 1 b in Z is given by 14174 where 1 1 is the inverse of 1 in Zp etc Let s now look at elements which are squares in Zp where p 2 3 is prime note an element 1 is a square if 1 2 modp for some We claim that every element in Zp has either no square roots ie is not a square or exactly two distinct square roots As an informal proof note that all calculations in this paragraph are implicitly done in the group Zp note that if an element 1 has a square root at then 7x is also a square root of 1 Furthermore w and 7w are distinct modulo 1 when p is odd why 7 note that is not true for the case when p 2 which is why we excluded it above So any element which is a square has at least two square roots Can there be more Well let x and y be square roots of 1 Then 2 32 and thus 2 7 32 0 Algebra gives w 7 y 0 But this has the two solutions at y and w 73 important note this makes use of the fact that the equation 102 0 has solutions only if u 0 or z 0 or both This is true for Zp which is a eld This is not in general true for all groups as we will see below As an aside note that this also gives us a count of how many squares there are in Zp Since every square maps to two distinct elements of the group exactly half of the nonzero elements of Zp must be squares ie p 7 12 We now consider ZN where N pq and p and q are prime How many square roots can elements have now We show that each element has either no square roots or exactly four distinct square roots We work with the representations of elements of ZN given by the Chinese remainder theorem above Consider an element 1 b A little thought shows that 1 b is a square iff 1 is a square and b is a square Then the four square roots of 1 b are given by h 71 h f 7h and 7f Furthermore this shows that the number of squares in Z is 14 of the the elements of Zj v As an aside note why the proof that there are only two square roots given in the case of Zp p prime fails here In particular it is not the case that if my 0 mod N then either at 0 or y 0 As an easy counterexample note that for any 1 we have using representations 10 1 02 00 0 It is the case that square roots modulo prime p can be computed in polynomial time Note that this allows e icient calculation of square roots modulo N if the factors of N are known by application of the Chinese remainder theorem We will show later in class that square roots cannot be computed in polynomial time modulo N without knowing the factorization of N unless factoring can be done in polynomial time 5 Legendre and Jacobi Symbols Quadratic Residuosity Notation has developed for dealing with quadratic residues in modular groups For elements in Z p prime we call an element which is a square a quadratic residue and de ne the Legendre symbol as follows 1 if y is a quadratic residue modulo p spa 0 if y 0 1 otherwise We can extend this de nition to the case N pq pq prime and de ne the Jacobi symbol as follows JIM EM any Note that if y is a square in ZN then we must have jNy 1 This is so because if cc is a square modulo N and ac lt gt a b then a must be a square modulo p and I must be a square modulo 1 On the other hand there are elements y with jNy 1 which are not quadratic residues these are precisely those elements y lt gt a b where neither a nor 1 are squares and hence pa qb An important result is that the Jacobi symbol ofw modulo N can be e iciently computed even if the factorization of N is not known Thus if we compute jNw 1 we know that cc cannot be a quadratic residue On the other hand given an element cc for which jNw 1 and without the factorization ofN no e icient algorithm is known to determine whether cc is a quadratic residue or not We will use this in building an encryption scheme later in the course 6 Generators Generators are special elements that exist in certain groups De nition 4 Let G be a group of order it let 9 E G and suppose that 9192 g are all distinct note that in this case we have 91 gn G Then G is said to be cyclic and g is said to be a generator of G I Examples of cyclic groups include Z N for arbitrary N under addition More interesting are groups which are cyclic under multiplication Theorem 2 For every prime Z is a cyclic group under multiplication Note that even in a cyclic group not every element is a generator For cryptographic purposes an important example of a cyclic group is the following Let p 2g 1 with pq prime such primes can be e iciently found We state Without proof the fact that there is a subgroup G of Z which is of order q furthermore this subgroup is cyclic Finally a generator 9 of this subgroup can be e iciently found and given 9 we can e iciently tell whether it is a generator of this subgroup see 3 Section B5 for more details In this group actually this holds for any group computation in the exponent can be done modulo the order of the group Thus gi mod p gi mOd 1 mod p etc Also in a cyclic group discrete logarithms are well de ned with respect to any generator For example if g is a generator of G then for any element h E G we know that gi h for some i Furthermore this i is unique modulo the order of the group 1 We denote this unique value i by loggh where the underlying group G is understood from the context 7 Conjectured Hard Problems le RSA factoring squaring and discrete logarithm 8 Primality Testing To come References 1 LN Childs A Concrete Introduction to Higher Algebra Second Edition Springer 1995 2 C Crepeau Lecture Notes for Computer Science 308 547A Cryptography and Data Security pp 33745 3 S Goldwasser and M Bellare Lecture Notes on Cryptography Appendix B June 1997 4 AJ Menezes PC van Oorschot and SA Vanstone Handbook on Applied Cryptogra phy Chapters 273 CBC Press 1996 5 L Trevisan Notes on Algebra January 1999 6 L Trevisan Notes on NumberTheoretic Algorithms December 1999

### BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.

### You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

## Why people love StudySoup

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "I signed up to be an Elite Notetaker with 2 of my sorority sisters this semester. We just posted our notes weekly and were each making over $600 per month. I LOVE StudySoup!"

#### "I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

#### "Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

### Refund Policy

#### STUDYSOUP CANCELLATION POLICY

All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email support@studysoup.com

#### STUDYSOUP REFUND POLICY

StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here: support@studysoup.com

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to support@studysoup.com

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.