Special Topics CS 4803
Popular in Course
Popular in ComputerScienence
This 0 page Class Notes was uploaded by Alayna Veum on Monday November 2, 2015. The Class Notes belongs to CS 4803 at Georgia Institute of Technology - Main Campus taught by Staff in Fall. Since its upload, it has received 38 views. For similar materials see /class/234146/cs-4803-georgia-institute-of-technology-main-campus in ComputerScienence at Georgia Institute of Technology - Main Campus.
Reviews for Special Topics
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/02/15
CS 4803 Computer and Network Security Alexandra Sasha Boldyreva Symmetric encryption Symmetric encryption schemes A scheme SE is specified by a key generation algorithm K an encryption algorithm E and a decryption algorithm D SEKE D MsgSpmessage space K 4 Sender S Receiver R 9 It is required that for every MeMsgSp and every K that can be output by K DKEKMM Often the key generation algorithm simply picks a random string from some key space KeySp eg 01 for some integer k In this case we will say that a scheme SE is defined by KeySp and two algorithms SEKeySpED The encryption algorithm can be either randomized take as input a random string or stateful take as input some state eg counter that it can update Block cipher modes of operation Modes of operation define how to use a block cipher to encrypt long messages We will often assume that the message space consists of messages whose length is multiple of a block length Electronic Code Book ECB mode Electronic Code Book ECB mode Let Eo1kxo1 eo1 be a block cipher ECB01kED algorithm 6K M if modn 0 or 0 then return L Break M into Wrbit blocks M1 for te 1 tom do Encryption algor39thm E return 0 algorithm DK 0 if modn Oor lCl 0 then return L Break 0 into nebit blocks C1 Cm for te 1 to m do D M 1 E EClilgt ecwptlon algorithm D quotquotquotquotquotquotquotquotquotquotquotquotquotquot quot return 5 Cipherblock chaining CBC mode with random IV Cipher39bIOCk Chaining CBC mOde With random IV Let Eo1kxo1 eo1 be a block cipher CBC01kED alg rithm MM If lMl mod n 0 or lMl 0 then return L Encwption algorithm E m Break M into mint blocks M1 M 1 cm 471V A mm for t e 1 to m do Oil EEKltOlzr 1 eMlzl c e on Om return mo 1v3o1 algorithm DKltltIVCgt f mod 72 D or 0 then return L Break 0 into nebit blocks C1mcm Decryption algorithm D Stateful Cipher block chaining CBC mode with counter IV Let Eo1kxo1 eo1 be a block cipher CBCC01kED Encwption algorithm E ctrFOquot The counter ctr is incremen e r message Decryption algorithm D Stateful Cipher block chaining CBC mode with counter IV algorithm SKM i ctr e 0 if lMl modn at 0 or lMl 0 then return L Break M into nebit block Mn Mm m do cit ltEKCL 711 Mp1 cecpiucimi T return mo algorithm DKIVC if ici mod 7L e 0 01 ici 0 then return L Bleak a into whit blocks C1Cm ifIVmgt Z C en returnl for re 1 to m do Mz e E Cz e Cz e 1 MenpruMW turn M Randomized counter mode CTR Let Eo1kxo1 eo1 be a block cipher CTR01kED Encwption algorithm E R3o1 E E Decryption algorithm D Stateful counter mode CTRC Let Eo1kxo1 eo1 be a block cipher CTRC01kED Encwption algorithm E ctr is initially On A current counter ctr is maintained as a s at Decryption algorithm D What is a secure encryption scheme Recall perfectly secure schemes are impractical We assume that adversaries are computationally bounded A scheme is secure when it is not insecure Insecure adversaries can do bad things Bad things an adversary who sees ciphertexts can compute the secret key can compute some plaintexts can compute the rst bit of a plaintext can compute the sum of the bits of a plaintext can see when equal messages are encrypted can compute So what is a secure encryption scheme Informally an encryption scheme is secure if no adversary with reasonable resources who sees several ciphertexts can compute any partial information about the plaintexts besides some apriori information Any information except the length of the plaintexts We assume the length of the plaintexts is public Note that the above implies that the bad things we mentioned do not happen And the other bad things While the above definition captures the right intuition it s too informal to be useful Security of encryption still informally An encryption scheme is semantically secure if no adversary with reasonable resources who is given a ciphertext can get any information about the underlying message better than anyone who is not given the ciphertext An encryption scheme is indistinguishable under chosenplaintext attacks if no adversary with reasonable resources who is given a ciphertext of the one of two messages of its choice can figure out which message was encrypted These two definitions can be formalized and proven equivalent Ind istinguishability under chosenplaintext attacks Fix an encryption scheme SEKeySpED Pick a key K at random from KeySp For an adversary A and a bit b consider two experiments Expindcpab SEA for b0 or b CEKMb EKLRb The difference between probabilities of outputting 0 in two experiments is called indcpaadvantage ofA in attacking SE A symmetric encryption scheme SE is indistinguishable under chosen plaintext attacks INDCPA secure if prfadvantage of any adversaw with reasonable resources is close to 0 Why IND CPA ensures that no partial information is leaked Fix SEKeySpED with MsgSpO1m Assume for example that there existsan efficient adversary that after seeing a few plaintextscipertexts pairs and a challenge ciphertext can always compute the challenge plaintext Then we can easily show that SE is not INDCPA secure We present an adversary A Adversary A given access to EKLRb pick any two distinct messages MOM1 from 01m query them to the oracle get back a ciphertext Cb Run the adversary B If B requests to see an encryption of a message M query MM to the oracle get back C give C back to B give Cb as the challenge to B until B outputs plaintext M lf M M0 then return 0 else return 1 Endlf A is correct whenever B is correct e in ExpindcpaO A returns 0 if B was correct always by assumption in Expindcpal A outputs 0 only if B is incorre Thus indcpa advantage of A is 101 A is efficient Thus SE is not IND ct never CPA ECB is not INDCPA We construct an adversary A A weakness is that equal plaintext blocks always yield equal ciphertext blocks The adversary just asks to see a ciphertext of two messages where only one has equal blocks Adversary A5KLRWBgtgt Mien Mgeo l 1 0102 e KLRMDM112 If C1 CD then return 1 else return 0 A is always correct thus it always outputs 0 in experiment ExpindcpaO and never in Expindcpal Thus ECB is not INDCPA Analysis of the ECB mode Let Eo1kxo1 ao1 be a block cipher ECB01kED Encryption algorithm E Conjecture ECB hides the plaintex But is it enough Is ECB a good encryption scheme Is ECB INDCPA secure Note We broke the ECB scheme without breaking the underlying block cipher it can be secure PRF The weaknesses were in the scheme not the tools Claim Any deterministic stateless scheme is not INDCPA Why Insecurity of CBCC Let Eo1kxo1 ao 1 be a block cipher CBCC01kED Stateful Encryption algorithm E incremented for each new message 0 Theorem CBCC is not INDCPA There exists an efficient adversary A with indcpa advantage 1 0 Proof idea Adversmy A KLR39i39ibgtgt Mm H Dquot M111 H0quot M012 5 on M12 boron ltWi 01gt Hammer Mi i2 ltIV2 02gt Mammy Mm 5 If C 02 then return 1 else Ietlun o Security of the CTRC Let Eo1kxo1 ao1 be a function family CTRC01kED Encwption algorithm E ctr is initially On A current counter ctr is maintained as a tat The scheme is used to encrypt at most 2 blocks so that the counter does not wrap around 0 How good is the scheme 0 The flaws seem hard to find 0 Q But may be they exist and wejust don39t see them AThe mode is as good as it can be and one can prove it Security of CTRC CTR CBC 0 Theorem 1 If E is a PRF then CTRC is INDCPA Ie For any adversary A there exists an adversary B such that indcpaadvantage ofA S 2 prfadvantage of B and B has comparable resources 0 Theorem 2 If E is a PRF then CTR is INDCPA a similar concrete security statement can be made 0 Theorem 3 If E is a PRF then CBC is INDCPA a similar concrete security statement can be ma e DiffieHellman key exchange 0 Secure against passive eavesdropping 0 but insecure against a maninthemiddle attack CS 4803 Computer and Network Security Alexandra Sasha Boldyreva Authenticated key exchange Adding key exchange Overview 0 Not sufficient to simply add on key establishment beforeafter 0 Protocol design is subtle authentication Small changes can make a protocol insecure 0 Need authenticated key exchange o Historically designed in an adhoc way by checking protocol for known weaknesses 0 Great example of where provable security helps Example 0 Reverse challengeresponse o Ie send a ciphertext and have user decrypt it 0 Mutual authentication if decrypts validly 0 Weaknesses 0 Uses encryption for authentication Example 0 User sends time MACKtime o What if she had used encryption or a hash o What about just sending MACKtime o Considerations 0 Requires loosely synchronized cocks 0 Must guard against replay o What if user has same key on multiple servers 0 Clock reset attacks No mutual authentication Adding mutual authentication 0 Double challengeresponse symmetric key in 4 rounds 0 Variant in which user sends nonce first 0 Insecure reflection attack 0 Also vulnerable to offline password guessing without eavesdropping 0 To improve security make protocol asymmetric Security principle let initiator prove its identity first Using timestamps 0 User sends time MACKtime server responds with MACKtime 1 o Vulnerabilities Establishing a session key 0 Double challengeresponse compute session key as FKR2 0 Secure against passive attacks if F is a pseudorandom permutation Active attacks And how to fix it Publickey based 0 Include Epksessionkey in protocol 0 Encrypt sessionkey and sign the result 0 Potentially vulnerable to replay attacks 0 User sends ER1 server sends ER2 session key is R1R2 Reasonable Oneway authentication 0 If only the server has a known public key eg SSL Server sends R o Client sends EpkR password sessionkey 0 Insecure in general 0 But secure if encryption scheme is chosen appropriately 0 Can extend to give mutual authentication Authenticated DiffieHellman 0 Add signaturesMACS and nonces to DiffieHellman protocol 0 Variation HMQV improved MQV Using session keys 0 Generally want to provide both secrecy and integrity for subsequent conversation Use encryptthenMAC 0 Use sequence numbers to prevent replay attacks 0 Periodically refresh the session key Mediated authentication 0 Eg using KDC 0 Simple protocol 0 Alice requests to talk to Bob KDC generates KAB and sends it to Alice and Bob encrypted with their respective keys 0 Note no authentication here but impostor can t determine KAB Improvement 0 Have KDC send to Alice the encryption of KAB under Bob s key 0 Reduces communication load on KDC Resilient to message delays in network NeedhamSchroeder o A KDC N1 Alice Bob 0 KDC A KAN1 Bob KAB ticket where ticket KBKAB Alice A98 ticket KABN2 89A KABN2l N3 A98 KABN3l Computer and Network Security via Program Analysis Malware Analysis Monirul Sharif Guest Lecture CS4803 CNS College of Computing Georgia Institute of Technology Date 09222006 Introduction Program analysis techniques are used for malware detection and analysis o It is evident that simple signature based schemes have become increasingly ineffective 0 Both static and dynamic analysis are used for advanced signature and behavior based detection 0 We will cover techniques of analysis and also highlight techniques that malware use to make analysis hard Malware 0 Software designed to infiltrate or damage computer systems Malicious Software Computer viruses worms trojan hOrses spyware adware etc Created by someone with malicious intent compromise security o Malware categories Computer Viruses spreads by infecting legitimate programs Computer WOrms spreads by the network from host to host Trojans malware disguised as legitimate software Botnets malware that provides command and control capability Spyware spies on user activities and compromises privacy Adware unsoliciated ads on infected computers And there are more iota number Malware Trend 12000 1086639 10992 I thai viruses anpi Wun s I Tm families 9000 7350 6000 i 4496 3000 1702 o a 7 7 7 Ju iyrDec 2005 Juiy Dec 2003 Janilune 2004 in y Dec Z004 Jandune 2005 Periud Fig New Win32 virus and worm variants Source Symantec Internet and Security and Threat Report Vol IX Traditional Detection Methods 0 Antivirus software has been around since the dawn of computer viruses 0 Traditional methods of detection signature based Usually the signature is sequence of bytes found in malware Antivirus software scans files using signature Can be considered simple Static Analysis In traditional methods signature is mostly syntactic o How to find signature Traditional method involves manual analysis of malware Binary is scanned structurally and reverse engineered Malware is analyzed to find an invariant to be used as a signature Traditional Evasion Methods Methods seen in viruses a long time ago Stealth virsuses Hides changes in the system and therefore the virus by showing original state Oligomorphic viruses Encrypts body with varying forms and has a constant decryptor routine Polymorphic viruses Contains varying encrypted body and has several copies of decryptOr polymOrphic decryptor Metamorhic viruses Uses various code morphic techniques to create new instances that are different in code but identical in nature 0 Current eneration of mavvare use Obfuscation techniques evolved rom the above prInCIpIes to evade sngnature based schemes 0 Code obfuscation is used to evade static analysis 0 A good resource of information Peter Szor s book The Art of Computer Virus Research and Defense Polymorphic Code 0 When creating a new instance the malware uses code encryption to encrypt its body Different keys are used in different instances so body is different sequence of bytes Simple signatures that are just sequences of bytes in the body are fooled o Polymorphic Decryptor The malware uses several versions of decryptors or uses obfuscation so that it is not easy to create signature Detection of Polymorphic Code 0 Commercial Antivirus Software are capable of detecting polymorphic malware viruses and trojans Signature is created to detect PD if PD is not obfuscated well enough Emulation is used to antivirus to emulate PD action and find a detectable sequence in the body Malware also employ antiemulation techniques to defeat them 0 Polymorphic worms see Polygraph by Newsome et al and also defeating Polygraph by Perdisci et al Metamorphic Code To further evade detection complete full body obfuscation is applied 0 Some obfuscation techniques are Code reordering Garbage insertion Equivalent code replacement Jump insertions Code reordering via inserting jumps Code and data encapsulation Packing code 0 Metamorphic malware creates new instances using these Some Disassemble own code and apply directly at machine code level Some convert in to intermediate form apply them and recompile 0 Reference Szor s book and Christodorescu et al Testing Malware Detectors Metamorphic Code Examples 0 Code reordering Instructions that are independent are reordered MOV EAX X MOV EBX Y MOV EBX Y MOV EAX X ADD EAX EBX ADD EAX EBX MOV X EAX MOV X EAX39 0 Garbage insertion Instructions are inserted that are semantic noops do not effect the code and registers and therefore execution MOV EAX X MOV EAX X MOV EBX Y MOV EBX Yi ADD EAX EBX ADD EAX EBX MOV EAX PUSH ES MOV X EAX POP ESI Metamorphic Code Examples 0 Equivalent Code Replacement Register renaming or semantically equivalent code MOV EAX X xon EAX EAX MOV EBX Y ADD EAX X ADD EAX EBX ADD EAX Y MOV X EAX MOV X EAX 0 Register Renaming Registers are renamed and replaced throughout code Therefore code is equivalent in terms of semantics MOV EAX X MOV ECX X ADD EAX EBX ADD ECX EAX MOV X EAX MOV X ECX Detecting Metamorphic Code 0 Identifying two pieces of code is equivalent in nature same execution behavior is undecidable 0 But we can still detect Approximations exist A real complete metamorphic malware is hard to build 0 W32Simile and others are limited in scope 0 Complex problems to solve itself 0 Techniques based on semantics rather than syntax is better in detecting metamorphic code Semantic Aware Malware Detection Method is to create signature from the semantic properties of code using static analysis 0 Signature is a semantic template T ITVT CT where IT is a sequence of instructions VT is a set of variables CT is a set of symbolic constants also contains functions An execution context for a template is assignment to constants A state is assignment to variables program counter and memory 0 A program instance is said to satisfy a template if There exists an execution context and initial state for which A sequence of the instructions in the template and malware affect memory in an equivalent manner This problem is undecidable can be shown by reducing the halting problem to it Semantic Aware Malware Detection cont 0 Approximate solution by making weaker assumptions not all memory updates need to be equivalent Differentiate between core and temporary memory loca ons Several classes of obfuscation can be supported 0 Instruction reordering 0 Register renaming 0 Garbage insertion Not supported 0 Instruction replacement limited support 0 Equivalent functionality o Reordered memory access a Reference Mihai Christodorescu et al Semantic Aware Malware Detection AntiStatic Analysis All syntax and semantic signature generation require some sort of static analysis 0 Antistatic analysis techniques are used in malware to make it hard to perform proper static analysis Static analysis of binary relies on Disassembly therefore obfusce code to make it hard to disassemble also called Antidisassembly Make steps used in static analysis hard 0 Hide function boundaries by not conforming to proper semantics o Rer on dynamically computed constructs such as opaque predicates o Encapsulate code pack 0 Antidisassemny In the X86 architecture disassemny is imprecise reasons 0 Variable length instructions 0 Code and data can be mixed together Disassemny is done using some algorithms linear sweep or iterative adap ve Antidisassemny can be done by c Mixing code and data 0 Using a lot of indirect jumps and calls so that start of code cannot be identified Antidisassemny aware static analysis that use heuristics can overcome most of these obfuscations but with limitations AntiStatic Analysis cont a Code and data encapsulation packing Packed code requires unpacking for static analysis Some commercial and free code packers are available pack code and add unpacker routine that use obfuscation o UPX o PECompact o Armadillo 0 And others The unpack routine is obfuscated hence hard to analyze Also emulation and interpretation may be used to unpack 0 Reference Paul Royal etal Polyunpack 0 Thus pure static analysis is hard for malware some dynamic analysis methods are required Dynamic Analysis on Malware 0 Since dynamic analysis require execution malware is usually analyzed in a sandboxed environment Virtual Machines are mostly used with network access restricted and watched upon Emulation may be applied a Debuggers or debugging techniques are used to trace execution a Dynamic analysis benefits Most antistatic analysis techniques can be defeated Can provide accurate information 0 But as stated in the last lecture dynamic analysis is incomplete though it is precise Behavioral Detection 0 Behavior based detection is mostly dynamic analysis based Static analysis may be combined Open area for a lot of research 0 Some malware exhibit identifiable runtime behavior for example Spyware that hijack browser Bot programs that access CampC server Adware that may show popups and ads 0 Principle Define templates of bad behavior Use runtime monitoring to find match 0 Templates can be manually or automatically created 0 Runtime monitoring is mostly system call API or an observable behavior 0 This is approximate solution can lead to false positives and negatives 0 Reference Engin Kurda et al Behavior based Spyware Detection Dynamic Analysis Evasion 0 Dynamic analysis techniques require to execute malware Current trend in malware is detecting dynamic analysis environments or analysis tools and exit or crash 0 VM detection is used to detect analysis environments Nopill deetect Redpill Scoopy Doo Jerry thools Malware detects the use of VMware or other Virtual Machine environments that are used for safe malware analysis 0 Debugger detection is used to detect dynamic analysis tools Most Dynamic Analysis tools work as debugger for runtime analysis Malware uses different techniques to detect that it is being debugged Also known debuggers have specific methods of detection Keyframing fine level of control quality of motion depends on skill of animator Keyframing iterate C adjust trajectory gt play back motion parameter representation interpolation path speed along path Keyframing quotEverythingquot can be interpolated in Maya parameters locations orientations joint angles shape exible objects material properties color texture camera motion for animation lighting Coordinate Systems Z X root llty X world Y f0 relative or world Representation orientation transformation matrices Xx YX ZX PX XY yy ZY py X2 Y2 ZZ 132 0 0 0 1 Euler angles several sequential joints rot 6 about X rot 0c about y quaternions axis angle notation xyz sin 92 cos 92 Keyframing Interpolation Linear 5 c U 0 pl c 5 time 5 c U 0 pl c 5 time Splines uomsod time llnear cardinal b spline bezier spline Splines unintended wiggles intended path Keyframing Velocity along path Linear a lt p o O 0 time Ease in Ease out 8 p o O h time Control of Velocity tlt xy Qu for 11 01 X equal arc lengths equal spacing in u Arc length reparametrization want 8 Au where s is arc length reparam Qu to QA1s need to find u A1s Au S m B Is 53 bisection search for a value of u where Au s with a numerical evaluation of Au details in Watt and Watt Keyframing Constralnts Inverse Kinematics Ioint Limits Position Limits Control for the Animator picking keyframe positions editing motion curves control over velocity timing specification of constraints Kinematics the study of motion without regard to the forces that cause it 0 A B Forward A focB Inverse ocBf1A draw graph specify fewer degrees of freedom more intuitive control of dof pull on hand glue feet to the ground Forward Kinematics x Llcos 91 chos 91 92 y Llsm 91 Lzsm 9 1 9 2 r IN lt M II r IOOO Inverse Kinematics 92 cos x2 yz L Li 2 LIL2 9 91 Lzsin92x L1 L2cos 82y Lzsin92y L1 L fosegx e f1x What makes IK hard redundancy singularities goal of quotnatural lookingquot motion Authentication o Verifying the identity of another entity Computer authenticating to another computer CS 4803 Computer and Network Security 0 Person authenticating to a local computer 0 Person authenticating to a remote computer 0 Two issues o How authentication information is stored at both ends Alexandra Sasha Boldyreva o Authentication protocol itself Authentication Overview Attack taxonomy o Authentication may be based on 0 Passive attacks What you know 0 Active attacks 0 What you have Impersonation o What you are Maninthemiddle Examples 0 Server compromise 0 Mutual authentication vs unidirectional authentication Diffgrent attaCks may be eagermore dif CUlt in different settings Addressbased authentication 0 Is sometimes used eg unix 0 This is generally not very secure 0 Relatively easy to forge source addresses of network packets Passwordbased protocols o Passwordbased authentication 0 Any system based on lowentropy shared secret note different from book definitions Password selection 0 User selection of passwords is typically very weak 0 Lower entropy password makes dictionary attacks easier 0 Typical passwords Derived from account names or userna mes 0 Dictionary words reversed dictionary words or small modifications of dictionary words 0 Etc Better password selection Nonalphanumeric characters Longer phrases Can try to enforce good password selection but these types of passwords are difficult for people to memorize and type From passwords to keys 0 Can potentially use passwords to derive symmetric or public keys 0 What is the entropy of the resulting key 0 Often allows offline dictionary attacks on the password Passwordbased protocols 0 Any passwordbased protocol is vulnerable to an on line dictionary attac o OnIine attacks can be detected and limited 0 How 0 Any passwordbased protocol is vulnerable to offline attack if server is compromise Passwordbased protocols 0 Best Use a passwordbased protocol which is secure against offline attacks when server is not compromised 0 Unfortunately this has not been the case in practice eg telnet cell phones etc o This is a difficult problem Password storage In the clear Hash of password done correctly 0 Doesn39t always achieve anything 0 Makes adversary39s job harder 0 Potentially protects users who choose good passwords Salt ed hash of password 0 Makes bulk dictionaw attacks harder but no harder to attack a particular password Centralized server stores password Threshold password storage Centralized password storage 0 Authentication storage node 0 Central server stores password servers request the password to authenticate user 0 Auth facilitator node 0 Central server stores password servers send information from user to be authenticated by the central server 0 Note that central server must be authenticated Basic authentication protocols 0 Server stores Hpw user sends pw 0 Secure against server compromise but not eavesdropping or replay attacks 0 Server stores pw sends R user sends HpwR 0 Secure against eavesdropping but not server compromise or dictionary attack 0 What if the user sends R also 0 Can we achieve security against both Other techniques for human auth o Tokens Magnetic stripe cards 0 Smartcards o Standalone tokens 17m 0 Still need a secure auth protocol 0 Various possibilities 0 Drawbacks Biometrics o Entropy o Are biometric data secret o Revocation Difficult to use securely o Nonuniform 0 Errors Still need a secure protocol Publickey protocols 0 Server stores pk user stores sk 0 Server sends R user signs R 0 Using a secure signature scheme 0 Is this secure 0 Potential weaknesses 0 What if we had used encryption instead 0 Can we achieve security against server compromise and eavesdropping without using publickey crypto Lamport s hashing protocol Server stores Hnpw user sends Hn391pw 0 Server updates user s entry 0 Can also add salt to hash 0 Can use same password on different sites 0 Protects against offline attacks Can use same password but different salt when password expires Some attacks 0 Secret expires 0 No mutual authentication 0 Small n attack Session key establishment 0 There are very few applications for which authentication alone is sufficient o What do you do once you are authenticated 0 Generally need to establish a session key 0 Efficiency advantages to using symmetrickey techniques if publickey auth is used Advantages even if a symmetric key is already shared Chapter 9 COMPUTATIONAL NUMBER THEORY 91 The basic groups We let Z 72710 12 denote the set of integers We let Z 12 denote the set of positive integers and N 012 the set of non negative integers 911 Integers mod N lf 1 are integers not both zero then their greatest common divisor denoted gcdab is the largest integer 01 such that d divides a and d divides b If gcdab 1 then we say that a and b are relatively prime lf aN are integers with N gt 0 then there are unique integers rq such that a Nq r and O S r lt N We call r the remainder upon division of a by N and denote it by a mod N We note that the operation a mod N is de ned for both negative and non negative values of a but only for positive values of N When a is negative the quotient 1 will also be negative but the remainder r must always be in the indicated range 0 S r lt N lf 1 are any integers and N is a positive integer we write a E 1 mod N if a mod N 1 mod N We associate to any positive integer N the following two sets ZN 01N71 Z3 iEZzlgigNilandgcdiN1 The rst set is called the set of integers mod N lts size is N and it contains exactly the integers that are possible values of a mod N as a ranges over Z We de ne the Euler Phi or totient function g7 Z a N by MN for all N E Z That is MN is the size of the set Zj v 912 Groups Let G be a non empty set and let be a binary operation on G This means that for every two points 1 E G a value a b is de ned De nition 91 Let G be a non empty set and let denote a binary operation on G We say that G is a group if it has the following properties 2 COMPUTATIONAL NUMBER THEORY CLOSURE For every 1 E C it is the case that a b is also in G ASSOCIATIVITY For every abc E C it is the case that a b c a b 0 9 59 IDENTITY There exists an element 1 E G such that a 1 1 a a for all a E G 4 INVERTIBILITY For every 1 E G there exists a unique I E G such that a b b a 1 The element In in the invertibility condition is referred to as the inverse of the element a and is denoted a l I We now return to the sets we de ned above and remark on their group structure Let N be a positive integer The operation of addition modulo N takes input any two integers 117 and returns 1 17 mod N The operation of multiplication modulo N takes input any two integers 117 and returns ab mod N Fact 92 Let N be a positive integer Then ZN is a group under addition modulo N and Z3 is a group under multiplication modulo N In ZN the identity element is O and the inverse of a is 7a mod N N 7 a In Z the identity element is 1 and the inverse of a is a b E Zj v such that ab E 1 mod N In may not be obvious why such a I even exists but it does We do not prove the above fact here In any group we can de ne an exponentiation operation which associates to any a E G and any integer 2 a group element we denote ai de ned as follows lfi 0 then ai is de ned to be 1 the identity element of the group lfi gt 0 then 2 lfi is negative then we de ne ai a lfi Put another way let 9 72 which is positive and set a2 aTl aTl aTl x V With these de nitions in place we can manipulate exponents in the way in which we are accustomed with ordinary numbers Namely identities such as the following hold for all a E G and all ij E Z 12 ai a W 0472 04071 0172 7 aily We will use this type of manipulation frequently without explicit explanation It is customary in group theory to call the size of a group G its order That is the order of a group G is G the number of elements in it We will often make use of the following basic fact It says that if any group element is raised to the power the order of the group the result is the identity element of the group Fact 93 Let G be a group and let m G be its order Then am 1 for all a E G I This means that computation in the group indices can be done modulo m Bellare and Rogaway 3 Proposition 94 Let G be a group and let m G be its order Then ai ai mOd m for all a E G and all 2 E Z I We leave it to the reader to prove that this follows from Fact 93 Example 95 Let us work in the group Z31 under the operation of multiplication modulo 21 The members of this group are 1245810111316171920 so the order of the group is m 12 Suppose we want to compute 586 in this group Applying the above we have 585 mod 21 586mod 12 mod 21 52 mod 21 25 mod 21 41 If G is a group a set S Q G is called a subgroup if it is a group in its own right under the same operation as that under which G is a group If we already know that G is a group there is a simple way to test whether S is a subgroup it is one if and only if x y 1 E S for all 5331 6 S Here y 1 is the inverse of y in G Fact 96 Let G be a group and let S be a subgroup of G Then the order of S divides the order of G I 92 Algorithms Fig 91 summarizes some basic algorithms involving numbers These algorithms are used to im plement public key cryptosystems and thus their running time is an important concern We begin with a discussion about the manner in which running time is measured and then go on to discuss the algorithms some very brie y some in more depth 921 Bit operations and binary length In a course or text on algorithms we learn to analyze the running time of an algorithm as a function of the size of its input The inputs are typically things like graphs or arrays and the measure of input size might be the number of nodes in the graph or the length of the array Within the algorithm we often need to perform arithmetic operations like addition or multiplication of array indices We typically assume these have 01 cost The reason this assumption is reasonable is that the numbers in question are small and the cost of manipulating them is negligible compared to costs proportional to the size of the array or graph on which we are working In contrast the numbers arising in cryptographic algorithms are large having magnitudes like 2512 or 21024 The arithmetic operations on these numbers are the main cost of the algorithm and the costs grow as the numbers get bigger The numbers are provided to the algorithm in binary and the size of the input number is thus the number of bits in its binary representation We call this the length or binary length of the number and we measure the running time of the algorithm as a function of the binary lengths of its input numbers In computing the running time we count the number of bit operations performed Let 17194 b1 b0 be the binary representation of a positive integer a meaning b0 17194 are bits such that 51H 1 and a 21642711 2 th 2 u 2151 2050 Then the binary length of a is k and is denoted a Notice that a k if and only if 219 1 S a lt 2 If a is negative we let a 7 1 and assume that an additional bit or two is used to indicate to the algorithm that the input is negative 4 COMPUTATIONAL NUMBER THEORY m 2 E E BAA Ag 3 22 axgng 39 gsssZZZCE 91000000 01 Z lQ V a s V Is A o a Z a1 quot8 a 3 5 E a 2 H H 52 quot s 15 4 a a a zggzgz agwlQSEEZEU g sa 8w O s g igsgs 3 iss AA zzrx iigwwNNU n UUU E sss 5 22 2 g22 2 D t squot squot squot squot squot squot squot squot E E QQ J in QDgtgtlt a Satz e oQE39QQQQQ Egoxoooox ltH2m2222m Figure 91 Some basic algorithms and their running time Unless otherwise indicated an input value is an integer and the running time is the number of bit operations C denotes a group 922 Integer division and mod algorithms We de ne the integer division function as taking input two integers a N with N gt 07 and returning the quotient and remainder obtained by dividing a by N That is7 the function returns qr such Bellare and Rogaway 5 that a qNr with O S r lt N We denote by INT DIV an algorithm implementing this function The algorithm uses the standard division method we learned way back in school which turns out to run in time proportional to the product of the binary lengths of a and N We also want an algorithm that implements the mod function taking integer inputs aN with N gt O and returning a mod N This algorithm denoted MOD can be implemented simply by calling lNT DlVa N to get 1 r and then returning just the remainder r 923 Extended GCD algorithm Suppose 11 are integers not both 0 A basic fact about the greatest common divisor of a and b is that it is the smallest positive element of the set aabZ 636 Z of all integer linear combinations of a and b In particular if d gcda b then there exist integers 65 such that d a6 b3 Note that either 6 or 3 could be negative Example 97 The gcd of 20 and 12 is d gcd20 12 4 We note that 4 202 1273 so in this case 6 2 and E 73 I Besides the gcd itself we will nd it useful to be able to compute these weights 63 This is what the extended gcd algorithm EXT GOD does given 1 as input it returns 0165 such that d gcda b a6 b3 The algorithm itself is an extension of Euclid s classic algorithm for computing the gcd and the simplest description is a recursive one We now provide it and then discuss the correctness and running time The algorithm takes input any integers ab not both zero Algorithm EXT GCDa b If I 0 then return a 10 Else qr lt lNT DlVa 17 61mg lt EXT GCDbr E lt 3 Return 0165 Endlf The base case is when I 0 lfb 0 then we know by assumption that a 7 0 so gcda b a and since a 11 170 the weights are 1 and O lfb 7 0 then we can divide by it and we divide a by it to get a quotient q and remainder T For the recursion we use the fact that gcda b gcdbr The recursive call thus yields 01 gcda b together with weights wy such that d bwry Noting that a bq r we have d bwry bwa7bqy aybqygt 164475 con rming that the values assigned to 65 are correct The running time of this algorithm is Oa b or put a little more simply the running time is quadratic in the length of the longer number This is not so obvious and proving it takes some work We do not provide this proof here 6 COMPUTATIONAL NUMBER THEORY We also want to know an upper bound on the lengths of the weights 6 3 output by EXT GCDa b The running time bound tells us that IEI IZI Oa lbl but this is not good enough for some of what follows I would expect that IEI IZI Oa Is this true If so can it be proved by induction based on the recursive algorithm above 924 Algorithms for modular addition and multiplication The next two algorithms in Fig 91 are the ones for modular addition and multiplication To compute a b mod N we rst compute c a b using the usual algorithm we learned way back in school which runs in time linear in the binary representations of the numbers We might imagine that it now takes quadratic time to do the mod operation but in fact if c gt N the mod operation can be simply executed by subtracting N from c which takes only linear time which is why the algorithm as a whole takes linear time For multiplication mod N the process is much the same First compute c ab using the usual algorithm which is quadratic time This time we do the mod by invoking MODcN The length of c is the sum of the lengths of a and b and so 0 is not small as in the addition case so a shortcut to the mod as we saw there does not seem possible 925 Algorithm for modular inverse The next algorithm in Fig 91 is for computation of the multiplicative inverse of a in the group Zj v Namely on input N gt O and a E Z algorithm MOD lNV returns In such that ab E 1 mod N The method is quite simple Algorithm MOD lNVaN dEN e EXT GCDaN b e 6 mod N Return 1 Correctness is easy to see Since a E Zj v we know that gcdaN 1 The EXT GOD algorithm thus guarantees that d 1 and 1 a6 NN Since N mod N 0 we have 1 E 16 mod N and thus I 6 mod N is the right value to return The cost of the rst step is Oa The cost of the second step is Oa If we assume that E Oa then the overall cost is Oa See discussion of the EXT GOD algorithm regarding this assumption on the length of E 9 2 6 Exp onentiation algorithm We will be using exponentiation in various different groups so it is useful to look at it at the group level Let G be a group and let a E G Given an integer n E Z we want to compute the group element a as de ned in Section 912 The naive method assuming for simplicity n 2 0 is to execute yHl Ford1ndoyey aEndFor Returny Bellare and Rogaway 7 This might at rst seem like a satisfactory algorithm but actually it is very slow The number of group operations required is n and the latter can be as large as the order of the group Since we are often looking at groups containing about 2512 elements exponentiation by this method is not feasible In the language of complexity theory the problem is that we are looking at an exponential time algorithm This is because the running time is exponential in the binary length of the input 71 So we seek a better algorithm We illustrate the idea of fast exponentiation with an example Example 98 Suppose the binary length of n is 5 meaning the binary representation of n has the form 5453525150 Then 71 2454 2353 2252 2151 2050 7 50 Our exponentiation algorithm will proceed to compute the values y5y4y3y2y1y0 in turn as follows 35 1 34 yg 1174 1174 33 32 abs 121744473 32 yg 1172 a4bq2b3b2 yl 31 abl a8b44b32172b1 yo y abo a15bq81734b22171bo Two group operations are required to compute y from 32 and the number of steps equals the binary length of n so the algorithm is fast I In general we let 5191 blbo be the binary representation of n meaning 50 bk1 are bits such that n 21945194 21945194 2151 2050 The algorithm proceeds as follows given any input a E G and n E Z Algorithm Engan If n lt 0 then a e 1 1 and n e in Endlf Let 5191 bl 50 be the binary representation of n y H 1 For 2 k 71 downto 0 do 3 H 32 11 End For Output 3 The algorithm uses two group operations per iteration of the loop one to multiply y by itself another to multiply the result by 117239 The computation of abi is without cost since this is just a if bi 1 and 1 if bi 0 So its total cost is 2k 2n group operations We are ignoring the cost of the one possible inversion in the case n lt O This is the worst case cost We observe that it actually takes group operations where WH is the number of ones in the binary representation of We will typically use this algorithm when the group G is Z and the group operation is multi plication modulo N for some positive integer N We have denoted this algorithm by MOD EXP in 8 COMPUTATIONAL NUMBER THEORY Fig 91 The input a is not required to be relatively prime to N even though it usually will be so is listed as coming from ZN In that case each group operation is implemented via MOD MULT and takes ON2 time so the running time of the algorithm is N2 Since 71 is usually in ZN this comes to ON3 The salient fact to remember is that modular exponentiation is a cubic time algorithm 93 Cyclic groups and generators Let G be a group let 1 denote its identity element and let m G be the order of G If g E G is any member of the group the order of g is de ned to be the least positive integer n such that g 1 We let lt9 92 I 2 6 Zn907917m79 1 denote the set of group elements generated by g A fact we do not prove but is easy to verify is that this set is a subgroup of G The order of this subgroup which by de nition is its size is just the order of 9 Fact 96 tells us that the order n of g divides the order m of the group An element 9 of the group is called a generator of G if g G or equivalently if its order is m If g is a generator of G then for every 1 E G there is a unique integer 2 E Zm such that 93 a This 2 is called the discrete logarithm of a to base 9 and we denote it by DLogG ga Thus DLogG gQ is a function that maps G to Zm and moreover this function is a bijection meaning one to one and onto The function of Zm to G de ned by 2 gt gt 93 is called the discrete exponentiation function and the discrete logarithm function is the inverse of the discrete exponentiation function Example 99 Letp 11 which is prime Then Zfl 12345678910 has order 1371 10 Let us nd the subgroups generated by group elements 2 and 5 We raise them to the powers i09 We get 2 0 1 2 3 4 5 6 7 8 9 23 mod 11 1 2 4 8 5 10 9 7 3 6 53 mod 11 1 5 3 4 9 1 5 3 4 9 Looking at which elements appear in the row corresponding to 2 and 5 respectively we can deter mine the subgroups these group elements generate lt2gt 17273747576777879710 lt5gt 13459r Since 2 equals Zfl the element 2 is a generator Since a generator exists Zfl is cyclic On the other hand 5 74 Zfl so 5 is not a generator The order of 2 is 10 while the order of 5 is 5 Note that these orders divide the order 10 of the group The table also enables us to determine the discrete logarithms to base 2 of the different group elements I a12345678910 DLogz12ltagt018249736 5 Later we will see a way of identifying all the generators given that we know one of them I The discrete exponentiation function is conjectured to be oneway meaning the discrete loga rithm function is hard to compute for some cyclic groups G Due to this fact we often seek cyclic Bellare and Rogaway 9 groups for cryptographic usage Here are three sources of such groups We will not prove any of the facts below their proofs can be found in books on algebra Fact 910 Let p be a prime Then the group Z is cyclic I The operation here is multiplication modulo 1 and the size of this group is Mp p 7 1 This is the most common choice of group in cryptography Fact 911 Let G be a group and let m G be its order If m is a prime number then G is cyclic I In other words any group having a prime number of elements is cyclic Note that it is not for this reason that Fact 910 is true since the order of Z where p is prime is 1371 which is even ifp 2 3 and 1 ifp 2 and is thus never a prime number The following is worth knowing if you have some acquaintance with nite elds Recall that a eld is a set F equipped with two operations an addition and a multiplication The identity element of the addition is denoted 0 When this is removed from the eld what remains is a group under multiplication This group is always cyclic Fact 912 Let F be a nite eld and let F F 7 Then F is a cyclic group under the multiplication operation of F I A nite eld of order m exists if and only if m p for some prime p and integer n 2 1 The nite eld of order p is exactly Zp so the case n 1 of Fact 912 implies Fact 910 Another interesting special case of Fact 912 is when the order of the eld is 2 meaning 1 2 yielding a cyclic group of order 2 7 1 When we want to use a cyclic group G in cryptography we will often want to nd a generator for it The process used is to pick group elements in some appropriate way and then test each chosen element to see whether it is a generator One thus has to solve two problems One is how to test whether a given group element is a generator and the other is what process to use to choose the candidate generators to be tested Let m G and let 1 be the identity element of G The obvious way to test whether a given 9 E G is a generator is to compute the values 919293 stopping at the rstj such that 97 1 lf 9 m then 9 is a generator This test however can require up to m group operations which is not e icient given that the groups of interest are large so we need better tests The obvious way to choose candidate generators is to cycle through the entire group in some way testing each element in turn Even with a fast test this can take a long time since the group is large So we would also like better ways of picking candidates We address these problems in turn Let us rst look at testing whether a given 9 E G is a generator One sees quickly that computing all powers of g as in 919293 is not necessary For example if we computed 98 and found that this is not 1 then we know that g4 7 1 and 92 7 1 and g 7 1 More generally if we know that 97 7 1 then we know that gi 7 1 for all 2 dividing 9 This tells us that it is better to rst compute high powers of g and use that to cut down the space of exponents that need further testing The following Proposition pinpoints the optimal way to do this It identi es a set of exponents m1 mn such that one need only test whether gmi 7 1 for 2 1 n As we will argue later this set is quite small 10 COMPUTATIONAL NUMBER THEORY Proposition 913 Let G be a cyclic group and let m G be the size of G Let pi 43quot be the prime factorization of m and let mi mpi for 2 1 n Let g E G Then 9 is a generator of G if and only if Forallz 1n gmiy l 91 where 1 is the identity element of G I Proof of Proposition 913 First suppose that g is a generator of G Then we know that the smallest positive integer 9 such that 97 1 is j m Since 0 lt mi lt m it must be that gmi 7 1 for allz 1m Conversely suppose 9 satis es the condition of Equation 91 We want to show that g is a generator Let 9 be the order of 9 meaning the smallest positive integer such that 97 1 Then we know that 9 must divide the order m of the group meaning m dj for some integer d 2 1 This implies that j 131 43quot for some integers 11 n satisfying 0 S 8239 3 04 for all 2 1 n lfj lt m then there must be some 2 such that 8 lt 04 and in that case 9 divides mi which in turn implies gmi because 97 1 So the assumption that Equation 91 is true implies that 9 cannot be strictly less than m so the only possibility is j m meaning 9 is a generator I The number n of terms in the prime factorization of m cannot be more than lgm the binary logarithm of m This is because pi 2 2 and 04 2 1 for all 2 1 n So for example if the group has size about 2512 then at most 512 tests are needed So testing is quite e icient One should note however that it requires knowing the prime factorization of m Let us now consider the second problem we discussed above namely how to choose candidate group elements for testing There seems little reason to think that trying all group elements in turn will yield a generator in a reasonable amount of time Instead we consider picking group elements at random and then testing them The probability of success in any trial is GenGG So the expected number of trials before we nd a generator is GGenG To estimate the e icacy of this method we thus need to know the number of generators in the group The following Proposition gives a characterization of the generator set which in turn tells us its size Proposition 914 Let G be a cyclic group of order m and let 9 be a generator of G Then GenG gi E G 2 6 Z1 and GenG I That is having xed one generator 9 a group element h is a generator if and only if its discrete logarithm to base 9 is relatively prime to the order m of the group As a consequence the number of generators is the number of integers in the range 1 m 7 1 that are relatively prime to m Proof of Proposition 914 Given that GenG gi E G 2 E Zj n the claim about its size follows easily lGenGgt M 92 e o z e 2 l Izrnl Mm We now prove that GenG gi E G 2 E First we show that ifz E Zj n then gi E GenG Second we show that ifz E Zm 7 Zj n then gi g GenG So rst suppose 2 E Zj n and let h 92 We want to show that h is a generator of C It su ices to show that the only possible value ofj E Zm such that h7 1 is j 0 so let us now show this Let 9 E Zm be such that h7 1 Since h gi we have 1 hj gij modm Bellare and Rogaway 11 Since 9 is a generator it must be that zj E 0 mod m meaning m divides 27 But 2 E Zj n so gcdz m 1 So it must be that m divides 9 But 9 E Zm and the only member of this set divisible by m is 0 so 9 O as desired Next suppose 2 E Zmi Z1 and let h 92 To show that h is not a generator it su ices to show that there is some non zero j E Zm such that h 1 Let d gcdz m Our assumption 2 E Zm 7 Zj n implies that d gt 1 Let 9 md which is a non zero integer in Zm because 01 gt 1 Then the following shows that h7 1 completing the proof h g aimd 9m WM 124d 139 We used here the fact that d divides 2 and that gm 1 I Example 915 Let us determine all the generators of the group Zfl Let us rst use Proposition 913 The size of Zfl is m 4711 10 and the prime factorization of 10 is 21 51 Thus the test for whether a given a E Zfl is a generator is that a2 if 1 mod 11 and a5 if 1 mod 11 Let us compute a2 mod 11 and a5 mod 11 for all group elements a We get a 1 2 3 4 5 6 7 8 9 10 a2 mod 11 1 4 9 5 3 3 5 9 4 1 a5 mod 11 1 10 1 1 1 10 10 10 1 10 The generators are those a for which the corresponding column has no entry equal to 1 meaning in both rows the entry for this column is different from 1 So Genus 2678 Now let us use Proposition 914 and doublecheck that we get the same thing We saw in Example 99 that 2 was a generator of Zfl As per Proposition 914 the set of generators is Genzfl 2239 mod 11 z e zfo This is because the size of the group is m 10 Now Zfo 1379 The values of 23 mod 11 as 2 ranges over this set can be obtained from the table in Example 99 where we computed all the powers of 2 So 2imodiizz ezfo 21mod1123mod1127mod1129mod11 2678 This is the same set we obtained above via Proposition 913 If we try to nd a generator by picking group elements at random and then testing using Proposition 913 each trial has probability of success w1010 410 so we would expect to nd a generator in 104 trials We can optimize slightly by noting that 1 and 71 can never be generators and thus we only need pick candidates randomly from Zf171 10 In that case each trial has probability of success w108 48 12 so we would expect to nd a generator in 2 trials I When we want to work in a cyclic group in cryptography the most common choice is to work over Z for a suitable prime p The algorithm for nding a generator would be to repeat the process of picking a random group element and testing it halting when a generator is found In order to make this possible we choose 13 in such a way that the prime factorization of the order p 7 1 of 12 COMPUTATIONAL NUMBER THEORY Z is known In order to make the testing fast we choose 13 so that p 7 1 has few prime factors Accordingly it is common to choose 13 to equal 2q 1 for some prime q In this case the prime factorization ofp7 1 is 21q1 so we need raise a candidate to only two powers to test whether or not it is a generator In choosing candidates we optimize slightly by noting that 1 and 71 are never generators and accordingly pick the candidates from Z 7 113 7 1 rather than from Z So the algorithm is as follows Algorithm FIND GENp q H p 7 1gt2 found 7 0 While found 7 1 do 9 i z 7 113 7 1 If 92 modp 7 1 and 9 1 modp 7 1 then found 71 EndWhile Return 9 Proposition 913 tells us that the group element 9 returned by this algorithm is always a generator of Z By Proposition 914 the probability that an iteration of the algorithm is successful in nding a generator is lGenZgt 7 M197 1 M21 171 1 Z72 2W2 m 5 Thus the expected number of iterations of the while loop is 2 Above we used that fact that u72q q 7 1 which is true because q is prime 94 Squares and nonsquares An element a of a group G is called a square or quadratic residue if it has a square root meaning there is some I E G such that b2 a in G We let QRG g E G g is quadratic residue in G denote the set of all squares in the group G We leave to the reader to check that this set is a subgroup of G We are mostly interested in the case where the group G is Z for some integer N An integer a is called a square mod N or quadratic residue mod N if a mod N is a member of QRZ V If b2 E a mod N then b is called a square root of a mod N An integer a is called a nonsquare mod N or quadratic nonresidue mod N if a mod N is a member of Z 7 QRZ V We will begin by looking at the case where N p is a prime In this case we de ne a function JP Z 7 711 by 1 if a is a square mod p Jpa O ifamodp0 71 otherwise for all a E Z We call Jpa the Legendre symbol of a Thus the Legendre symbol is simply a compact notation for telling us whether or not its argument is a square modulo 1 Before we move to developing the theory it may be useful to look at an example Bellare and Rogaway 13 Example 916 Letp 11 which is prime Then Zfl 1234 56789 10 has order 1371 10 A simple way to determine QRZf1 is to square all the group elements in turn a12345678910 a2mod11149533594 1 The squares are exactly those elements that appear in the second row so QRZf1 13459 The number of squares is 5 which we notice equals 13 7 12 This is not a coincidence as we will see Also notice that each square has exactly two different square roots The square roots of 1 are 1 and 10 the square roots of 3 are 5 and 6 the square roots of 4 are 2 and 9 the square roots of 5 are 4 and 7 the square roots of 9 are 3 and 8 Since 11 is prime we know that Zfl is cyclic and as we saw in Example 99 2 is a generator As a side remark we note that a generator must be a non square Indeed if a b2 is a square then a5 2110 1 modulo 11 because 10 is the order of the group So a7 1 modulo 11 for some positive 9 lt 10 which means a is not a generator However not all non squares need be generators Below we reproduce from that example the table of discrete logarithms of the group elements We also add below it a row providing the Legendre symbols which we know because above we identi ed the squares We get a 1 2 3 4 5 6 7 8 9 10 DLomem 0 1 8 2 4 9 7 3 6 5 J11a 1 71 1 1 1 71 71 71 1 71 We observe that the Legendre symbol of a is 1 if its discrete logarithm is even and 71 if the discrete logarithm is odd meaning the squares are exactly those group elements whose discrete logarithm is even It turns out that this fact is true regardless of the choice of generator I As we saw in the above example the fact that Z is cyclic is useful in understanding the structure of the subgroup of quadratic residues QRZ The following Proposition summarizes some important elements of this connection Proposition 917 Let p 2 3 be a prime and let 9 be a generator of Z Then QRZ 92 2 6 Zp1 and 2 is even 92 and the number of squares mod p is 1371 QRltZgt Furthermore every square mod 13 has exactly two different square roots mod p I Proof of Proposition 917 Let E 92 iEZPA andz iseven We will prove that E QRZ by showing rst that E Q QRZ and second that QRZ Q E To show that E Q QRZ let a E E We will show that a E QRZ Let 2 DLogzg ga Since 14 COMPUTATIONAL NUMBER THEORY a E E we know thatz is even Letj 12 and note thatj E Zp1 Clearly gjgtZE ijodpEli 2339 2 g 9 91 modp so 97 is a square root of a 92 So a is a square To show that QRZ Q E let I be any element of Z We will show that b2 E E Let 9 DLogzg g a Then 52 E W E 927 mdp l E 927 mod p 7 the last equivalence being true because the order of the group Z is p E 1 This shows that b2 E E The number of even integers in Zp1 is exactly 13 E 12 since 13 E 1 is even The claim about the size of QRZ thus follows from Equation 92 It remains to justify the claim that every square mod 13 has exactly two square roots mod p This can be seen by a counting argument as follows Suppose a is a square mod 13 Let 2 DLogzg ng We know from the above that 2 is even Let x 12 and let 3 w p E 12 mod p E 1 Then gm is a square root of 1 Furthermore W 2 921 2 WM 2 WW 2 a 1 2 a mod p so gy is also a square root of a Sincez is an even number in Zp1 and 13E 1 is even it must be that O S w lt p E 12 It follows that pE12 S y lt p E 1 Thus an 7 y This means that a has as least two square roots This is true for each of the p E 12 squares mod 13 So the only possibility is that each of these squares has exactly two square roots I Suppose we are interested in knowing whether or not a given a E Z is a square mod 1 meaning we want to know the value of the Legendre symbol Jpa Proposition 917 tells us that DL W E E1 WW where g is any generator of Z This however is not very useful in computing Jpa because it requires knowing the discrete logarithm of a which is hard to compute The following Proposition says that the Legendre symbols of a modulo an odd prime p can be obtained by raising a to the power 13 E 12 and helps us compute the Legendre symbol Proposition 918 Let p 2 3 be a prime Then ma 2 a mod p for any a E Z I Now one can determine whether or not a is a square mod p by running the algorithm MOD EXP on inputs 1 pE 12p If the algorithm returns 1 then a is a square mod p and if it returns 13E 1 which is the same as E1 mod p then a is a non square mod 13 Thus the Legendre symbol can be computed in time cubic in the length of 13 Towards the proof of Proposition 918 we begin with the following lemma which is often useful in its own right Lemma 919 Let p 2 3 be a prime Then 9 E E1 mod p for any generator 9 of Z Bellare and Rogaway 15 Proof of Lemma 919 We begin by observing that 1 and 71 are both square roots of 1 mod p and are distinct It is clear that squaring either of these yields 1 so they are square roots of 1 They are distinct because 71 equals p71 mod p and p71 7 1 because 13 2 3 By Proposition 917 these are the only square roots of 1 Now let I 9 mod p Then b2 E 1 mod 13 so I is a square root of 1 By the above I can only be 1 or 71 However since 9 is a generator I cannot be 1 The smallest positive value ofz such that gi is 1 mod p is 2 p 7 1 So the only choice is that b E 71 mod p as claimed I Proof of Proposition 918 By de nition of the Legendre symbol we need to show that p71 1 mod 13 if a is a square mod p 71 mod 1 otherwise Let g be a generator of Z and let 2 DLogzg g z We consider separately the cases of a being a square and a being a non square Suppose a is a square mod 13 Then Proposition 917 tells us that 2 is even In that case up 3 E 9H21 Eltgp71gti2 E 1 mod p as desired Now suppose a is a non square mod 13 Then Proposition 917 tells us that 2 is odd In that case p71 I p71 a 2 E 92gtT E gtpgi E 92 71p21p21 E gp1gti712gp21 E 9 mod p However Lemma 919 tells us that the last quantity is 71 modulo 1 as desired I The following Proposition says that ab mod p is a square if and only if either both a and b are squares or if both are non squares But if one is a square and the other is not then ab modp is a non square This can be proved by using either Proposition 917 or Proposition 918 We use the latter in the proof You might try as an exercise to reprove the result using Proposition 917 instead Proposition 920 Let p 2 3 be prime Then Jpab mod p Jpa Jpb for all 1176 Z I Proof of Proposition 920 Using Proposition 918 we get Jpab mod p E ab 2 11b 2 ma Jpb mod p The two quantities we are considering both being either 1 or 71 and equal modulo 1 must then be actually equal I A quantity of cryptographic interest is the Di ie Hellman DH key Having xed a cyclic group G and generator 9 for it the DH key associated to elements X gm and Y gy of the group is the group element 9 The following Proposition tells us that the DH key is a square if either X or Y is a square and otherwise is a non square 16 COMPUTATIONAL NUMBER THEORY Proposition 921 Let p 2 3 be a prime and let 9 be a generator of Z Then Jpgzy mod p 1 if and only if Jpgz mod p 1 or Jpgy mod p 1 for all my E Zp1 I Proof of Proposition 921 By Proposition 917 it su ices to show that my mod p 7 1 is even if and only if m is even or y is even But since 13 7 1 is even my mod p 7 1 is even exactly when my is even and clearly my is even exactly if either m or y is even I With a cyclic group G and generator 9 of G xed we will be interested in the distribution of the DH key 9 in C under random choices of my from Zm where m One might at rst think that in this case the DH key is a random group element The following proposition tells us that in the group Z of integers modulo a prime this is certainly not true The DH key is signi cantly more likely to be a square than a non square and in particular is thus not even almost uniformly distributed over the group Proposition 922 Let p 2 3 be a prime and let 9 be a generator of Z Then Pr m lt1 zp1ylti Zp1JpgW1l equals 34 I Proof of Proposition 922 By Proposition 922 we need only show that Pr m i Zp1 y lt1 Z1071 Jpgz 1 or Jpgy1 equals 34 The probability in question is 1 7 04 where a Pr m lt1 Zp1 y lt1 Zp1 Jpgz 71 and Jpgy 71 Pr wiz Jpgz 71l Prlinp1Jpgy 71 IQRZgt IQRZgt lz l lz l MM p71 p71 7 1 7 HgtIHMIH Thus 1704 34 as desired Here we used Proposition 917 which told us that QRZ p712 I The above Propositions combined with Proposition 918 which tells us that quadratic residu osity modulo a prime can be e iciently tested will later lead us to pinpoint weaknesses in certain cryptographic schemes in Z Bellare and Rogaway 17 95 Groups of prime order A group of prime order is a group G whose order m G is a prime number Such a group is always cyclic These groups turn out to be quite useful in cryptography so let us take a brief look at them and some of their properties An element h of a group G is called nontrivial if it is not equal to the identity element of the group Proposition 923 Suppose G is a group of order q where q is a prime and h is any non trivial member of G Then h is a generator of G I Proof of Proposition 923 It suf ces to show that the order of h is 1 We know that the order of any group element must divide the order of the group Since the group has prime order q the only possible values for the order of h are 1 and 1 But h does not have order 1 since it is non trivial so it must have order q I A common way to obtain a group of prime order for cryptographic schemes is as a subgroup of a group of integers modulo a prime We pick a prime 13 having the property that q p7 12 is also prime It turns out that the subgroup of quadratic residues modulo 1 then has order q and hence is a group of prime order The following proposition summarizes the facts for future reference Proposition 924 Let 1 2 3 be a prime such that p 2g 1 is also prime Then QRZ is a group of prime order 1 Furthermore if g is any generator of Z then 92 modp is a generator of QRZgt Note that the operation under which QRZ is a group is multiplication modulo 1 the same operation under which Z is a group Proof of Proposition 924 We know that QRZ is a subgroup hence a group in its own right Proposition 917 tells us that QRZ is p7 12 which equals 1 in this case Now let 9 be a generator of Z and let h 92 mod 13 We want to show that h is a generator of QRZ As per Proposition 923 we need only show that h is non trivial meaning h 7 1 Indeed we know that 92 if 1 mod 13 because 9 being a generator has order p and our assumptions imply p gt 2 I Example 925 Let 1 5 and p 2g 1 11 Both 13 and q are primes We know from Example 916 that QR f1 13459 This is a group of prime order 5 We know from Example 99 that 2 is a generator of Z Proposition 924 tells us that 4 22 is a generator of QR We can verify this by raising 4 to the powers 2 O4 In In We see that the elements of the last row are exactly those of the set QRZf1 I Let us now explain what we perceive to be the advantage conferred by working in a group of prime order Let G be a cyclic group and g a generator We know that the discrete logarithms to Chapter 11 ASYMMETRIC ENCRYPTION The setting of public key cryptography is also called the asymmetric setting due to the asymmetry in key information held by the parties Namely one party has a secret key while another has the public key that matches this secret key This is in contrast to the symmetry in the private key setting where both parties had the same key Asymmetric encryption is thus another name for public key encryption the mechanism for achieving data privacy in the public key or asymmetric setting Our study of asymmetric encryption following our study of other primitives will begin by searching for appropriate notions of security and models and formalizations via which they are captured We then consider constructions where we look at how to design and analyze various schemes With regard to notions of security we will be able to build considerably on our earlier study of symmetric encryption lndeed from this point of view there is very little difference between symmetric and asymmetric encryption not much more than the fact that in the latter the adversary gets the public key as input This is important and re assuring to remember All the intuition and examples we have studied before carry over so that we enter the study of asymmetric encryption already having a good idea of what encryption is how security is modeled and what it means for a scheme to be secure Accordingly we will deal with the security issues quite brie y just re formulating the de nitions we have seen before The second issue namely constructions is a different story Designs of asymmetric encryption schemes rely on tools and ideas different from those underlying the design of symmetric encryp tion schemes Namely in the asymmetric case the basis is typically computationally intractable problems in number theory while for the symmetric case we used block ciphers Thus the greater part of the effort in this chapter will be on schemes and their security properties 111 Asymmetric encryption schemes An asymmetric encryption scheme is just like a symmetric encryption scheme except for an asym metry in the key structure The key pk used to encrypt is different from the key sk used to decrypt Furthermore pk is public known to the sender and also to the adversary So while only a receiver in possession of the secret key can decrypt anyone in possession of the corresponding public key can encrypt data to send to this one receiver 2 ASYMMETRIC ENCRYPTION De nition 111 An asymmetric encryption scheme 45 IC D consists of three algorithms as follows 0 The randomized key generation algorithm 1C takes no inputs and returns a pair pksk of keys the public key and matching secret key respectively We write pksk i C for the operation of executing 1C and letting pksk be the pair of keys returned The encryption algorithm 5 takes the public key pk and a plaintext also called a message M to return a value called the ciphertext The algorithm may be randomized but not stateful We write C lt1 pkM or C lt1 pkM for the operation of running 5 on inputs pkM and letting C be the ciphertext returned The deterministic decryption algorithm D takes the secret key sk and a ciphertext C 7 1 to return a message M We write M e DskC or M e Dsk C The message space associated to a public key pk is the set Plaintextspk of all M for which pkM never returns 1 We require that the scheme provide correct decryption which means that for any key pair pk sk that might be output by C and any message M E Plaintextspk if C was returned by pkM then DskC M I Let R be an entity that wants to be able to receive encrypted communications The rst step is key generation R runs C to generate a pair of keys pksk for itself Note the key generation algorithm is run locally by R Anyone in possession of R s public key pk can then send a message M privately to R To do this they would encrypt M via 0 e pkM and send the ciphertext C to R The latter will be able to decrypt C using sk via M e DskC Note that an entity wishing to send data to R must be in possession of R s public key pk and must be assured that the public key is authentic meaning really is the R s public key and not someone else s public key We will look later into mechanisms for assuring this state of knowledge But the key management processes are not part of the asymmetric encryption scheme itself In constructing and analyzing the security of asymmetric encryption schemes we make the assumption that any prospective sender is in possession of an authentic copy of the public key of the receiver This assumption is made in what follows A viable scheme of course requires some security properties But these are not our concern now First we want to pin down what constitutes a speci cation of a scheme so that we know what are the kinds of objects whose security we want to assess The key usage is the mirror image of the key usage in a digital signature scheme In an asymmetric encryption scheme the holder of the secret key is a receiver using the secret key to decrypt ciphertexts sent to it by others In a digital signature scheme the holder of the secret key is a sender using the secret key to tag its own messages so that the tags can be veri ed by others The last part of the de nition says that ciphertexts that were correctly generated will decrypt correctly The encryption algorithm might be randomized and must for security But unlike in a sym metric encryption scheme we will not consider stateful asymmetric encryption algorithms This is because there is no unique sender to maintain state many different entities are sending data to the receiver using the same public key The decryption algorithm is deterministic and stateless We do not require that the message or ciphertext be strings Many asymmetric encryption schemes are algebraic or number theoretic and in the natural formulation of these schemes messages might be group elements and ciphertexts might consist of several group elements However it is understood that either messages or ciphertexts can be encoded as strings wherever necessary The Bellare and Rogaway 3 encodings will usually not be made explicit In particular we might talk of the length of a message of ciphertext with the understanding that we mean the length of some binary encoding of the quantity in question We do this for example in de ning security In cases where messages are not strings but say group elements using the scheme in practice will usually require encoding of actual messages as group elements We will discuss this as it arises 112 Notions of security Security of an encryption scheme whether symmetric or asymmetric is supposed to re ect the inability of an adversary given ciphertexts and any public information such as a public key to get non trivial information about the underlying plaintexts We allow an adversary having the goal of guring out some non trivial information about plaintexts from ciphertexts different attack capabilities re ecting different situations The most basic kind of attack is a chosen plaintext attack in which the adversary can obtain encryptions of messages of its choice We discussed this type of attack in depth in the context of symmetric encryption and argued that the de nition of security in the sense of left or right captured security against these types of attacks in a strong sense In the asymmetric case the same is true and we will use the same notion to capture security against chosen plaintext attack A difference that must be kept in mind is that the adversary in an asymmetric setting also has the public key and so can in any case encrypt on its own but this does not really affect the formalization of the notion We also discussed the stronger chosen ciphertext attack in which we desire that privacy of data be maintained even if the adversary has some limited access to a decryption oracle this being a box that contains the secret decryption key and implements decryption under this key The adversary does not get the key itself For the asymmetric setting chosen ciphertext attacks are both more relevant and more di cult to protect against than in the symmetric setting We begin by summarizing the notion of security against chosen plaintext attack extending the de nitions for the symmetric setting Then we go on to discuss chosen ciphertext attacks 1121 Security against chosenplaintext attack Let us x a speci c asymmetric encryption scheme 45 IC D We consider an adversary A that is an algorithm program that is given as input a public key pk The intuition behind the notion is as follows Imagine that the sender has two sequences of messages M6 Mg and M11 Mf lt encrypts the messages in one of the sequences to get a sequence of ciphertexts which it transmits That is if b 6 01 denotes the choice of sequence the sender computes Ci 9 P1le for 2 1 q and then transmits C1 C 1 to the receiver The adversary being able to eavesdrop obtains the ciphertexts Its goal is to gure out which of the two message sequences was encrypted namely to gure out the value of the bit b The scheme is said to be secure if it cannot compute the value of b correctly with probability signi cantly more than 12 The formalization allows the adversary to specify both message sequences and furthermore to mount an adaptive attack meaning to choose M8 Mg as a function of C1 Ci l The formalization is in terms of the left or right encryption oracle It depends on the public key and challenge bit b It takes input two messages and returns a ciphertext as follows Oracle pkLRM0M1b be 01 and M0 M1 e o1 lf M0 7 M1 then return 1 4 ASYMMETRIC ENCRYPTION C e KMb Return C Thus the oracle encrypts one of the messages the choice of which being made according to the bit b Now we consider two worlds World 0 The oracle provided to the adversary is pkLR 0 So whenever the adversary makes a query M0M1 to its oracle the oracle computes C i pkM0 and returns C as the answer World 1 The oracle provided to the adversary is pkLR So whenever the adversary makes a query M0M1 to its oracle the oracle computes C i pkM1 and returns C as the answer We call the rst world or oracle the left world or oracle and we call the second world or oracle the right world or oracle The problem for the adversary is after talking to its oracle for some time to tell which of the two oracles it was given The adversary queries makes some number of queries to its oracle and then outputs a bit This bit has some probability of equaling one The probability is over the choice of the keys pksk as made by the key generation algorithm any random choices made by the oracle and any other random choices made by the adversary in its computation We look at this probability in each of the two worlds as the basis for the de nition We suggest that the reader return to the chapter on symmetric encryption to refresh his or her mind about this model In particular remember that the encryption function is randomized and the oracle implementing it is thus randomized too Each time the oracle computes a ciphertext it does so by running the encryption algorithm with fresh coins De nition 112 Let A5 IC D be an asymmetric encryption scheme let I 6 01 and let A be an algorithm that has access to an oracle and returns a bit We consider the following experiment Experiment ExpijgcpabA pk sk lt1 K b H AspkLRbpkgt Return 27 The indcpaadvantaye of A is de ned as Advijf 39cpam Pr Expijg39cpa391A 1 7 Pr Expijg39cpa390 A 1 I As usual the time complexity mentioned above is the worst case total execution time of the entire experiment This means the adversary complexity de ned as the worst case execution time of A plus the size of the code of the adversary A in some xed RAM model of computation worst case means the maximum over A s coins or the answers returned in response to A s oracle queries plus the time for other operations in the experiment including the time for key generation and the computation of answers to oracle queries via execution of the encryption algorithm Another convention we make is that the length of a query M0M1 to a left or right encryption oracle is de ned as IMOI We can assume without loss of generality that this equals M1 since otherwise the oracle returns L and so the query would be useless The total message length which Bellare and Rogaway 5 is the sum of the lengths of all oracle queries is another parameter of interest We say that the total message length is at most 1 if it is so in the worst case meaning across all coin tosses and answers to oracle queries in the experiment We consider an encryption scheme to be secure against chosen plaintext attack if a rea sonable adversary cannot obtain signi cant advantage where reasonable re ects its resource usage The technical notion is called indistinguishability under chosen ciphertext attack denoted IND CPA 1122 Security against chosenciphertext attack Stories introducing chosen ciphertext attack can be somewhat whimsical One is about the so called lunchtime attack Entity R goes to lunch while leaving his console accessible For the short period of the lunch break an adversary gets access to this console when the lunch break is over the adversary has to leave before it is discovered at the console by the legitimate user returning from lunch The access is such that the adversary cannot actually read the secret decryption key sk imagine that sk is in protected hardware but does have the capability of executing the algorithm Dsk on input any ciphertext of its choice At that time if the adversary has in hand some ciphertext it wants to decrypt it can certainly do so there is nothing one can do to prevent that However it may be able to do even more For example perhaps there is some clever sequence of calls to Dsk via which the latter can be made to output sk itself These calls would not be made under normal execution of the algorithm on normal ciphertexts but the adversary concocts weird ciphertexts that make the decryption routine do strange things Having sk means the adversary could decrypt tra ic at any time in the future even after the lunch break Alternatively the adversary is able to call Dsk on some inputs that result in the adversary s gaining some information that would enable it to decrypt some fraction of ciphertexts it might see later after the lunch break when it no longer has access to Dsk These are the eventualities we want to prevent This scenario is arti cial enough that were it the only motivation it would be natural to wonder whether it is really worth the trouble to design schemes to withstand chosen ciphertext attack But this is not the main motivation The real motivation arises from gathering evidence that asymmetric encryption schemes secure against chosen ciphertext attack are the desired and appropriate tool for use in many higher level protocols for example protocols for authenticated session key exchange There a party decrypts a random challenge message to prove its identity This leaves it open to a chosen ciphertext attack on the part of an adversary who sends ciphertexts in the guise of challenges and obtains their decryption Were this attack to reveal the secret key the adversary could impersonate the legitimate entity at a later date since it would now itself possess the ability to decrypt the challenges sent by others Based on this and other such applications we would like to design asymmetric encryption schemes that are secure against very strong kinds of chosen ciphertext attack To illustrate let s consider the following game An adversary A is given a challenge ciphertext C and must output the corresponding plaintext to win the game The adversary is given the public key pk under which C was created and is also given access to the oracle Dsk allowing decryption under the secret key sk corresponding to pk A trivial way for the adversary to win the game is to invoke its oracle on C This triviality is the one thing disallowed We allow the adversary to invoke Dsk on any input C 7 C Of course it may invoke the oracle multiple times all the inputs provided to the oracle must however be different from C If from the information so gathered the adversary can 6 ASYMMETRIC ENCRYPTION compute DskC then it wins the game This is a very strong form of chosen ciphertext attack the adversary can invoke the decryption oracle on any point other than the challenge Again one s rst reaction might be that it is in fact ridiculously strong How in any setting where l have some sort of decryption oracle access is it possible that I could not ask the query of my choice yet be able to ask absolutely any other query Indeed it is hard to imagine such a setting Yet this is the right attack model to consider for several reasons One is that in proving the security of authenticated key exchange protocols that use asymmetric encryption as discussed above it is exactly security under such an attack that is required of the asymmetric encryption scheme The other reasons is perhaps more fundamental We have seen many times that it is di cult to anticipate the kinds of attacks that can arise It is better to have an attack model that is clear and well de ned even if perhaps stronger than needed than to not have a clear model or have one that may later be found to be too weak We have already seen that inability to decrypt a challenge ciphertext is not evidence of security of a scheme since one must also consider loss of partial information In nalizing a notion of security against chosen ciphertext attack one must take this into account too This however we already know how to do via left or right encryption oracles De nition 113 Let A5 IC D be an asymmetric encryption scheme let I 6 01 and let A be an algorithm that has access to two oracles and returns a bit We consider the following experiment Experiment Expi g39cm39bA pk sk lt1 K b A AgpkLR39i39ibgtgt Dsk39gtpk If A queried Dsk on a ciphertext previously returned by KLR 27 then return 0 else Return 27 The indcca advantaye of A is de ned as Advigg39mm Pr Expi g39cm391A 1 7 Pr Expigg39m m 1 The conventions with regard to resource measures are the same as those used in the case of chosen plaintext attacks We consider an encryption scheme to be secure against chosen ciphertext attack77 if a rea sonable77 adversary cannot obtain signi cant advantage where reasonable re ects its resource usage The technical notion is called indistinguishability under chosen ciphertext attack denoted lND CCA 113 One encryption query or many The adversary in our de nitions is allowed to make many queries to its lr encryption oracle We gave it this power because it might be possible to expose weaknesses in the encryption scheme via an attack involving observing the encryptions of many related messages chosen adaptively as a function of ciphertexts of previous messages Indeed it may be possible to achieve a higher advantage with more queries but we show here that the gain is limited Namely an adversary making 18 lr encryption oracle queries cannot achieve an advantage greater than 18 times that of Bellare and Rogaway 7 an adversary making just one lr encryption oracle query and having other resources comparable to that of the original adversary This is true both under chosen plaintext and chosen ciphertext attack as indicated in the following Theorem 114 Let A5 IC D be an asymmetric encryption scheme Let B be an ind cpa adversary who makes at most 18 queries to its leftor right encryption oracle Then there exists an ind cpa adversary A making at most one query to its left or right encryption oracle and such that MagWE g qe Mag8PM 111 Furthermore the running time ofA is that of B Similarly let B be an ind cca adversary who makes at most 18 queries to its left or right encryption oracle Then there exists an ind cca adversary A making at most one query to its left or right encryption oracle and such that Advig f 39mw g qe Advigf 39mm 112 Furthermore the number of decryption oracle queries made by A is the same as made by B and the running time of A is that of B I In a qualitative sense this theorem can be interpreted as saying that an asymmetric encryption scheme secure against adversaries making just one lr encryption query is also secure against adver saries making many lr encryption queries This will simplify later analyses by allowing us to focus on adversaries that make only one lr encryption query An important element making this result possible is that in an asymmetric encryption scheme an adversary can itself encrypt any message it wants because it has the public encryption key In the symmetric setting the adversary cannot directly encrypt a message but may only do so via an oracle that holds the key An analogue of the above is true in the symmetric setting but requires that the adversary be provided not only with an lr encryption oracle but also with an encryption oracle Proof of Theorem 114 The statement corresponding to Equation 111 follows from the state ment corresponding to Equation 112 by considering an ind cca adversary who makes no queries to its decryption oracle so we need only prove the statement corresponding to Equation 112 We will use what s called a hybrid argument We will associate to B a sequence of experiments Expfggw Exp345B W Expggw 113 such that if we let 131 Pr Exp345B 1 for 2 6 01 q then it will be the case that P0 Pr Expigg39m390B1l 114 Pq Pr Expigg39m391B1l 115 In other words the rst and last experiments in our sequence will correspond to the world 0 and world 1 experiments respectively in De nition 112 If so De nition 112 tells us that Advig g39mw Pq 7130 8 ASYMMETRIC ENCRYPTION Oracle HagkalO7 M1 Experiment Exp345B jej1 pkskltilC Ifj Si d lt BHgkquot39gt D5k39gtpk then C lt1 5pkM1 Return 01 else C lt1 SpkM0 Endlf Return C Adversary AgpkLRquotquotbgtgt Dsk39gt pk 9 90 1amp1q Subroutine 05M0M1 j H j 1 Ifj lt I then C lt1 Emu1 Endlf lfj I then C lt1 pkLRM0M1b Endlf Ifj gt I then C lt1 ammo Endlf Return C End Subroutine d amp Bogquot39gt Dsk39gtpkgt Return d Figure 111 Hybrid oracles and experiments related to the construction of ind cca adversary A in the proof of Theorem 114 Now comes a trick We consider the sum qil 21131 7 P2 l r 2 1 lts value7 of course7 is 0 Hence7 from the aloove7 Advi g39mw PqgtP0 We will now construct ind cca adversary A so that Pr Exp gcca1A 1 2131 116 Bellare and Rogaway 9 qil Pr Expi g39ma390A 1 5 2131 117 10 Then from the above we would have 1 Advg f 39mm g Adv jjg39m B Re arranging terms we get Equation 112 We now specify the hybrid experiments of Equation 113 in such a way that Equations 114 and 115 are true and we are able to construct adversary A such that Equations 116 and 117 are true We associate to any 2 E 0 q an oracle and an experiment as indicated in Fig 111 The oracle associated to 2 is stateful maintaining a counter j that is initialized to O by the overlying experiment and is incremented by the oracle each time the latter is invoked Now observe that oracles H ka and pkLR 0 are equivalent meaning that on any inputs their responses are identically distributed Similarly oracles H k and SpkLR 1 are equivalent Hence Equations 114 and 115 are true Adversary A is speci ed in Fig 111 It begins by initializing a counter j to 0 and picking I at random from 1 q It then de nes a subroutine 05 Finally A executes B replacing the B s lr encryption oracle with the subroutine OS and providing B a decryption oracle via A s own access to a decryption oracle We highlight that A s operation depends on the fact that it was provided the public encryption key as an input This enables it to compute encryptions under this key directly and it does so inside the subroutine Had we not given A the public key this construction would not be posible To complete the proof it su ices to justify Equations 116 and 117 Suppose A is in world 1 meaning the challenge bit 1 equals 1 Then subroutine OS encrypts the right message of its input pair the rst I times it is called and the left message after that One the other hand if A is in world 0 meaning I 0 subroutine OS encrypts the right message of its input pair the rst I 7 1 times it is called and the left message after that Regarding I as a random variable taking values in 1 q this means that for every 2 E 1 q we have Pr Expigg39m391A 1 I 131 Pr Expi g39ma390A 1 I 132 7 1 Since the random variable I is uniformly distributed in the range 1 q we have q Pr Expglg39m391A 1 ZPr Expg g39m391A 1 I PrI 1 21 q 2 131 1 1 2 1 1 This justi es Equation 116 Similarly I Pr Expijg39m39Wl 1 Zpr Expg g39m390A 1 I PrI 1 10 ASYM METRIC ENCRYPTION 20 1 which justi es Equation 117 This concludes the proof I 1 14 Hybrid encryption Before we present constructions of asymmetric encryption schemes it is useful to get some idea of the context in which they are used Given an asymmetric encryption scheme 45 IC D one rarely encrypts data directly with it Rather to encrypt M under a public key pk of this scheme we rst pick a random key K for a symmetric encryption scheme 55 1C5 5D5 encrypt K under pk via the asymmetric scheme to get a ciphertext C encrypt M under K via the symmetric scheme to get a ciphertext Cs and transmit 0 15 This is called hybrid encryption More precisely hybrid encryption is a transform that given any asymmetric encryption scheme and any symmetric encryption scheme associates to them a new asymmetric encryption scheme Scheme 115 Let A5 10 5 D be an asymmetric encryption scheme and let SE 1C5 5D5 be a stateless symmetric encryption scheme such that KeysS Q Plaintextspk for every pk that might be output by K The hybrid encryption scheme associated to A S is the asymmetric encryption scheme E 10135 whose key generation algorithm is the same as that of A5 and whose encryption and decryption algorithms are de ned as follows Algorithm EpkM Algorithm 55140 K lt1 K5 05 lt1 fM Parse C as 01105 If C5 L then return L K e DkC C lt1 531K C lt 0 05 lf K L then return L Return C M H Dido Return M Under this hybrid encryption scheme one can asymmetrically encrypt any message M that is in the plaintext space of the underlying symmetric encryption scheme Hybrid encryption is used for numerous reasons The principal one is cost The number theoretic operations underlying common asymmetric encryption schemes are computationally costly relative to the operations on block ciphers that underly common symmetric encryption schemes In practice one wants to minimize the amount of data to which these number theoretic operations are applied Accordingly rather than encrypt the possibly long message M directly under pk via the given asymmetric scheme one uses hybrid encryption The costly number theoretic operations are thus applied only to data whose length k is xed and not dependent on the length of M This context tells us that when we design asymmetric encryption schemes we can typically assume that the message space consists of short strings This will facilitate our constructions However before we adopt the hybrid encryption paradigm we need to know that it works meaning that it is secure In assessing the strength of hybrid encryption we use as usual the Bellare and Rogaway 11 provablesecurity philosophy and approach A hybrid encryption scheme is built from two com ponents a base asymmetric encryption scheme and a base symmetric encryption scheme The appropriate question to ask is whether the assumed security of the components su ices to guar antee security of the hybrid scheme based on them It turns out that it does and moreover for security under both chosen plaintext and chosen ciphertext atttacks Theorem 116 below addresses the rst case and Theorem 117 the second Although the latter implies the former we state and prove them separately because the proof has some delicate issues and ideas and is best understood via an incremental approach Theorem 116 Let A5 IC D be an asymmetric encryption scheme let SS 1055 D5 be a stateless symmetric encryption scheme such that Keys5 Q Plaintextspk for every pk that might be output by K and let E 10135 be the hybrid encryption scheme associated to 45 55 as per Scheme 115 Let k denote the length of keys output by 0 Let B be an ind cpa adversary attacking E Then there exist ind cpa adversaries A00 01A11 10 attacking A5 and an adversary A attacking 55 such that Advij S39CPWB g Advig f 39cpammm Advig f 39cpamnm AdvglgCWA 118 Furthermore suppose B had time complexity at most t made at most 1 queries to its left or right encryption oracle these totalling at most 1 bits in length Then A0001A1110 each have time complexity at most t and make at most 1 left or right encryption oracle queries each query being k bits long Also A has timecomplexity at most t and makes only one query to its left or right encryption oracle I The qualitative interpretation of Theorem 116 is that if 45 and 55 are each assumed to be secure against chosen plaintext attack then E is also secure against chosen plaintext attack On the quantitative front note that the advantage of E against an attack involving 1 lr encryption queries is upper bounded as a function of the advantage of 55 against an attack involving only a single lr encryption query This means that the symmetric encryption scheme used may be very weak and yet the hybrid asymmetric encryption scheme will be secure For example the encryption algorithm of the symmetric encryption scheme could apply a pseudorandom bit generator to the key to get an output of M bits and XOR this with the message to get the ciphertext In particular the symmetric encryption scheme could be deterministic Proof of Theorem 116 These constructions are not as straightforward as some we have seen in the past We will need to isolate the asymmetric and symmetric components of E in such a way that an attack on this scheme can be broken down into attacks on the component schemes To do this we will use a hybrid argument We will associate to B a sequence of experiments 00 01 11 10 Exp w Exp w Exp w Exp w 119 such that if we let Paa Pr Exp B 1 1110 12 ASYM METRIC ENCRYPTION for bits a8 6 01 then it will be the case that P10 PrExpCpa1B1 1111 P00 Pr Exp39cpa39 31l 1112 In other words the rst and last experiments in our sequence will correspond to the world 0 and world 1 experiments respectively in De nition 112 If so De nition 112 tells us that indspa 7 AdvE B 7 P10 7 P00 Now comes a trick We throw into the expression P1 0 7 PO 0 a bunch of extra terms that sum to zero and hence don t change the value of the expression and then we regroup like this P10 7 P00 7 Mo 7 Plt11gt Plt11gt 7 Plt01gt No1 7 Plt0ogt 13170 P171l 13171 P071l 13071 P070lgt We have now written the ind cpa advantage of B as a sum of the differences that adjacent experi ments in our experiment sequence return 1 We will then construct the adversaries A0100AA1011 such that P017P00 lt AdaWANG 1113 P117P01 g Advig g39cpam 1114 P107P11 g AdvglgprAmn 1115 Equation 118 follows The template above is pretty generic What we need to do now is to actually specify the hybrid experiments of Equation 119 in such a way that Equations 111171112 are true and we are able to construct adversaries A0190 AA1011 such that Equations 111371115 are true Recall that B has access to an oracle that takes input a pair of messages and returns a ciphertext In the experiments of De nition 112 that de ne the ind cpa advantage of B this oracle is either EpkLR 1 or EpkLR O depending on the world in which B is placed Our hybrid exper iments will involve executing B not only with these oracles but with others that we will de ne Speci cally we will de ne a sequence of oracles 715326 7153116 7155116 715526 r 1116 Each oracle will take input a pair M0 M1 of messages and return a ciphertext Now to each pair a8 of bits we associate the a hybrid experiment de ned as follows 0 Experiment ExpEB pksk lt1 K at d lt7 Bngk quot39gtp1lt Return 01 Bellare and Rogaway 13 Oracle H MOM1 Oracle H M0M1 KOingKliICS KOingKliICS Csi sK0 osi sK0 If C5 L then return L If C5 L then return L Calti pk Calti pk C lt 01105 0 lt 01105 Return C Return C Oracle H iM0M1 Oracle H M0M1 KOingKliICS KOingKliICS Csamp sK0 osamp sK0 If C5 L then return L If C5 L then return L Calti pk Calti pk C lt 01105 0 lt 01105 Return C Return C Figure 112 Hybrid lr encryption oracles used in the proof of Theorem 116 This de nes our experiments in terms of the oracles and nally the oracles themselves are speci ed in Fig 112 Each hybrid lrencryption oracle is paramterized by a pair a8 of bits and takes input a pair M0 M1 of messages Examining the oracles you will see that they are mostly identical different only in the quantities that have been boxed Each oracle picks not one but two keys KOK1 independently at random for symmetric encryption It then encrypts Ma under K0 via the symmetric encryption scheme to get a ciphertext C5 and it encrypts K under the public key via the asymmetric encryption scheme to get a ciphertext C It returns the pair C Cs Note oracles H a and H aa do not actually use K1 We have asked these oracles to pick K1 only to highlight the common template underlying all four oracles Observe that oracle H532 and oracle EpkLRO are equivalent in the sense that their responses to any particular query are identically distributed Similarly oracle Hfg g and oracle pkLR 1 are equivalent This means that Equations 1111 and 1112 are true which is the rst requirement of a successful hybrid argument The new hybrid lr encryption oracles we introduce may seem rather bizarre at rst since they do not necessarily return valid ciphertexts For example oracle 01 will return a ciphertext C Cs in which C5 is the encryption of M0 under a key K0 but C is not an encryption of K0 as it ought to be under the de nition of E but rather is the encryption of a random unrelated key K1 Thus in the corresponding hybrid experiment B is not getting the types of responses it expects But nonetheless being an algorithm with access to an oracle B will execute and eventually return a bit The meaning of this bit may be unclear but we will see that this does not matter Before constructing A0190 so that Equation 1113 is true let us try to explain the intuition Con sider hybrid lr encryption oracles 00 and 01 Note that in both experiments C5 is computed 14 ASYM METRIC ENCRYPTION Adversary Agfi immb pk Adversary Afgiflfmmb pk Subroutine 05M0 M1 Subroutine 05M0M1 KOingKliICS KOingKliICs Cs 55K0M0 Cs i 5KOM1 If C5 L then return L If C5 L then return L Ca i sgkLRKO K1 5 0a i sgkLRK1K0 5 Return 0105 Return C Cs End Subroutine End Subroutine d lt1 BOEquot39gtpk d lt1 BOEquot39gtP1ltgt Return 01 Return 01 Figure 113 Adversaries attacking A5 constructed for the proof of Theorem 116 in exactly the same way This means that the difference between P00 and PO 1 measures the ability of the adversary to tell whether C encrypts the key underling C5 or not This is something we can relate solely to the security of the base asymmetric encryption scheme Adversary A0190 attacking the base asymmetric encryption scheme 45 is speci ed in Fig 113 As per De nition 112 it has access to a lr encryption oracle kLR b lts strategy is to de ne a subroutine OS and then run B using 05 to reply to B s oracle queries Subroutine 05 takes input a pair M0M1 of messages picks a pair of keys for symmetric encryption and nally returns a ciphertext which is computed using a call to the given oracle SkLR Consider A0190 in world 1 meaning its oracle is SgkLR In that case the ciphertext C computed by subroutine 053 is an encryption of K1 and thus subroutine O is equivalent to oracle 39Hfg On the other hand when A0190 is in world 0 meaning its oracle is SSkLR 0 the ciphertext C computed by subroutine 953 is an encryption of K0 and thus subroutine 953 is equivalent to oracle H Hence Pr Expig g39cpa39lmmpm 1 Pr EXpB 1 Pr Expig g39cpa390Am001l Pr EXpB1 Subtracting and remembering the notation of Equation 1110 we get Advg g39cpammm p0 1 7 P00 which justi es Equation 1113 We leave to the reader the task of verifying Equation 1115 based on the construction of A1140 given in Fig 113 and now proceed to the construction of the ind cpa adversary A attacking the base symmetric encryption scheme The intuition here is that the O 1 and 1 1 hybrid experiments both compute C as an encryption of key K1 but differ in which message they symmetrically encrypt under K0 and thus the difference between PO 1 and P1 1 measures the ability of the adversary to tell which message C5 encrypts Bellare and Rogaway 15 under K0 This is something we can relate solely to the security of the base symmetric encryption scheme The construction however will require a little more work introducing another sequence of hybrid experiments This time there will be 1 1 of them 0 1 EXpEBgt EXpEBgt ExpgB Again we associate to any 2 E 0 q an oracle and an experiment as indicated in Fig 114 The oracle associated to 2 is stateful maintaining a counter j that is initially O and is incremented each time the oracle is invoked The oracle behaves differently depending on how its counter 9 compares to its de ning parameter 2 lfj S 2 it symmetrically encrypts under K0 the right message M1 and otherwise it symmetrically encrypts under K0 the left message M0 The asymmetric component C is always an encryption of K1 Fori0qwe let 131 Pr EXpZEB 1 Now suppose 2 0 In that case the value C5 computed by oracle Hfgk on input M0M1 is a symmetric encryption of M0 regardless of the value of the counter 9 This means that oracles 39Hfgk and 39Hfg are equivalent Similarly oracles 39Hfgk and 39Hfg are equivalent Hence P01 P0 and P11 Pq So P117P01 PqP0 Pqgt Pq 1 Pq 1 P1P1P0 291 7 1317 1 1117 2 71 Our ind cpa adversary A attacking the base symmetric encryption scheme 55 is depicted in Fig 114 It gets a lr encryption oracle fLR b based on a hidden key K and challenge bit b It picks a pair of public and secret keys by running the key generation algorithm of the asymmetric encryption scheme It then picks an index 2 at random and initializes a counter j to 0 Next it de nes a subroutine 053 that takes input a pair of messages and returns a ciphertext and runs B replying to the latter s oracle queries via the subroutine The subroutine increments the counter j at each call and computes the symmetric component C5 of the ciphertext differently depending on how the counter 9 compares to the parameter 2 In one case namely when 9 2 it computes C5 by calling the given fLR b oracle on inputs M0M1 Notice that A makes only one call to its oracle as required For the analysis regard I as a random variable whose value is uniformly distributed in 1 q Then notice that for any 2 E 1 q Pr ExpglgCpa1A 1 I 131 Pr mpgWWW 1 I 1317 1 Thus AdvglgCWA Oracle H kM0M1 9 R9 1 K0 lt1 K5 K1 lt1 K5 ngi then Cs lt1 55K0 else Cs lt1 55K0 EndIf If C5 L then return L Ca lt1 pk C e 01105 Return C ASYM METRIC ENCRYPTION Experiment Expj4 5B pksk i K d e BHgkquot39gtpk Return 01 Adversary A5 ltLRquotquot5 pkskltizcajeo1amp1q Subroutine 05M0M1 JH j 1 K0 lt1 16 K1 lt1 K5 Ifj lt I then Cs lt1 55K0 EndIf If 9 I then 05 1 fLRM0M1b EndIf Ifj gt I then Cs lt1 55K0 EndIf If C5 L then return L Ca lt1 pk Return C CS End Subroutine d lt1 B05quot39gtpk Return 01 Figure 114 Hybrid oracles and experiments related to the construction of ind cpa adversary A in the proof of Theorem 116 Pr Expg8pa1A 1 7 Pr Expglg8pa0A 1 q ZPr Expg g39cpa391A 1 I Pr 1 2 2391 q i 7 ZPr Expg g39cpa390A 1 I Pr 1 2 2391 Bellare and Rogaway 17 21 2 1 zpaymin 21 1 gp11 7P01 In the last step we used Equation 1117 Re arranging terms we get Equation 1114 This completes the proof I We now proceed to the chosen ciphertext attack case The scheme itself is unchanged but we now claim that if the base components are secure against chosen ciphertext attack then so is the hybrid encryption scheme Theorem 117 Let A5 IC D be an asymmetric encryption scheme let SS 1055 D5 be a stateless symmetric encryption scheme such that Keys5 Q Plaintextspk for every pk that might be output by K and let E 10135 be the hybrid encryption scheme associated to A S as per Scheme 115 Let k denote the length of keys output by K5 and let 0 denote the length of a ciphertext created by 5 on input a k bit message Let B be an ind cpa adversary attacking E Then there exist ind cpa adversaries A0100A1011 attacking A5 and an adversary A attacking 55 such that Advj E39CWB g Advg g39mmmm Advigg39mmml Advlglg39mm 1118 Furthermore suppose B had time complexity at most t made at most 18 queries to its leftor right encryption oracle these totalling at most 115 bits in length and at most qd queries to its decryption oracle these totalling at most ud bits in length Then A0001A1110 each have timecomplexity at most t and make at most 18 leftor right encryption oracle queries each query being k bits long and at most qd queries decryption oracle queries each at most 0 bits long Also A has timecomplexity at most t makes only one query to its left or right encryption oracle and at most qd queries to its decryption oracle I Proof of Theorem 117 We use a hybrid experiment template similar to the one in the proof of Theorem 116 but there are some tricky issues regarding decryption oracles Let us try to highlight these before proceeding to the constructions Since B now has access to a decryption oracle the a8 hybrid experiment of the proof of Theorem 116 will have to be enhanced to provide B with an oracle that plays the role of the decryption oracle that B expects A natural rst thought is to set this to the actual decryption or acle 5514 Now let us look ahead to the construction of A0190 Besides subroutine OS to replace B s lr encryption oracle A0190 will have to provide a subroutine OD to replace B s decryption oracle This seems easy at rst glance because A0190 itself being a ind cca adversary has access to a decryption oracle Dsk for the base asymmetric encryption scheme Thus it can simulate 18 ASYM METRIC ENCRYPTION Oracle H5 M07M1 Oracle H5311ltM07M1gt j H j 1 j H 1 KEN116 Kmilcs KO 110 K1 16 Ciggsmm 0issKogt If C L then return L If 0 L then return L Cg wpk og wpk 0790339lvcf39 Cjwwy lvof39 Return Cy Return C Oracle H iM0M1 Oracle 5519M07M1 j e j 1 j Hj 1 K0 lt1 K5 K1 1 C5 KEN 1 K5 K1 amp Cs Oji 5KOJ Cigfs m39y If C L then return L If C i then return i Cglti pk Cfefauik C a egg 07 a 0530 Return C Retum Ci Oracle HDSkC Parse C as C CS llt chmoa Cf Cg lfl 73 0 then M lt D5K0 ZCs Endlf lfl 0 then M lt 5sk C Endlf Return M Figure 115 Hybrid lr encryption oracles and hybrid decryption oracle used in the proof of Theorem 117 sk The catch is in the rules of the game Recall that A0190 is not allowed to call its decryption oracle on a ciphertext C that was previously returned by its own lr encryption oracle However in attempting to simulate 5514 using Dsk it might be forced to do so because B might call 5514 on a ciphertext C CS where a ciphertext of the form 0 X was previously returned by B s lr encryption oracle but X 7 05 In that case A0091 is stuck how can it decrypt C CS without breaking the rules of its game To get around this problem we will enhance the hybrid experiments to provide B not with an actual decryption oracle but with a fake one that we will de ne Let us now proceed to the actual proof To each pair a8 of bits we associate the 048 hybrid experiment de ned as follows Bellare and Rogaway 19 Experiment ExpB pksk 10 9 lt 0 d u BH EltgtHvskltgtpk Return 01 The experiment initializes a counter j to 0 and then runs B replacing B s lr encryption oracle with a hybrid lr encryption oracle and B s decryption oracle with a hybrid decryption oracle Note that the hybrid lr encryption depends on 16 but the hybrid decryption oracle does not However the two oracles share state in the form of counter j as well as quantities that are created and stored by the hybrid lr encryption oracle and then accessed by the hybrid decryption oracle The hybrid lr encryption oracles shown in Fig 115 are equivalent to the corresponding ones of Fig 112 in the sense that on any inputs M0M1 the output of the 16 hybrid lr encryption oracle of Fig 115 is distributed identically to the output of the a8 hybrid lr encryption oracle of Fig 112 However the code of the hybrid lr encryption oracles has been enhanced to do some extra internal book keeping The hybrid decryption oracle invokes a subroutine Find that given a value T and a list T1 Tj returns the smallest 9 such that T Tj if such a 9 exists and 0 if T g T1 Tj It uses this to see whether the asymmetric component of the ciphertext it is given to decrypt was previously returned by a partner a 8 hybrid lr encryption oracle If not it decrypts the given ciphertext via the decryption algorithm of scheme E using the secret key sk which it is given Else it decrypts the symmetric component of the ciphertext under the key K0 chosen at the time the asymmetric component was rst created As in the proof of Theorem 116 for any 046 6 01 we let Pa8 Pr EXpZ B 1 Observe that the hybrid decryption oracle is equivalent to 5514 in the cases a 8 E 0 0 10 because in these cases the asymmetric component of the ciphertext produced by the hybrid lr encryption oracle is an encryption of K03 Thus we have P10 Pr Exp13 gm1w 1 P00 Pr Expij g39m39031l Following the proof of Theorem 116 our proof of Equation 1118 is complete if we can construct adversaries A0190 A A1041 such that P017P00 g Advigg39mmmpo 1119 P117P01 g Advig g39mm 1120 P107P11 g Advigg39mmml 1121 The constructions of A0190 and A1041 are shown in Fig 116 Each adversary runs B replacing B s lr encryption oracle with a subroutine OS and B s decryption oracle with a subroutine OD Note the OD subroutine calls is It does not actually have sk but it can implement this by running the code of 5514 shown in Scheme 115 and using its oracle 20 ASYM METRIC ENCRYPTION Adversary AS TS Rmbgtgt ikgtpk Adversary AfglijIRb73gkpk j H 0 j A 0 SUbrOunne 05M07M1 Subroutine 05M0M1 j H j 1 9 e j 1 K0 lt1 16 K1 lt1 K5 0 550 M0 If C L then return L 0amp EgkLRKOJ K1 5 K0 1C5 K1 lt1 K5 0 lt1 5500 M1 If C L then return L 051 531LRK177K0775 Return C5303 End Subroutine Subroutine ODC Parse C as C Cs e FindC Cf 0 HI 7g 0 then M e D5K01705 lfl 0 then M e 5skC Return M End Subroutine d i BOS3939OD39pk Return 01 Return C Cs End Subroutine Subroutine ODC Parse C as C Cs e FindC 0 0 If 1 7g 0 then M e D5K01705 If 1 0 then M 95sk0 Return M End Subroutine d i BOS3939OD39pk Return 01 Figure 116 Adversaries attacking A5 constructed for the proof of Theorem 117 We note that A0190 and A1041 are legal in the sense that they never query their decryption oracle on a ciphertext previously returned by their lr encryption oracle kLR b This is ensured by the de nition of OD which if given a ciphertext C C C5 whose asymmetric component C was previously returned by kLR b does not call ng ut instead directly computes the symmetric decryption of C5 under a key KW satisfying C C We leave to the reader to extend the arguments of the proof of Theorem 116 to verify that Equa tions 1119 and 1121 are true and proceed to the construction of the ind cca adversary A attacking the base symmetric encryption scheme We associate to any 2 E O q the oracle and experiment de ned in Fig 117 The experiment shown in that gure initializes counter j and then runs B with the shown oracle and also with the hybrid decryption oracle HDSkQ that we de ned previously The two oracles share state in the form of a counter j and as well as quantities that are created and stored by Hfgk and then accessed by HDSkQ Our ind cca adversary A attacking the base symmetric encryption scheme 55 is depicted in Fig 117 The novelty here as compared to Fig 114 is the subroutine OD de ned by A Note that it considers three cases for the value of Z In the second case it computes a response by invoking A s given decryption oracle In the third case it computes a response using the fact that it knows sk We must check that A is legal meaning that it never queries its decryption oracle with a ciphertext Bellare and Rogaway 21 C5 previously returned by its lr encryption oracle Suppose B makes decryption oracle query C 0 C5 Our concern is that C5 Of The latter is the only ciphertext returned by As lr encryption oracle since A makes only one query to this oracle and thus this is the only concern Let Z FindC Cf Cf lfl 7 I then A does not query its decryption oracle at all and thus certainly does not make an illegal query So suppose I I This means C C However B is assumed to be legal which implies C CS 7 Off3915 so it must be that C5 7 Cf as desired The analysis of A is then analogous to the one in the proof of Theorem 116 and we omit the details I 115 El Gama scheme and its variants Let G be a cyclic group with generator 9 meaning G 90gl gn l where n G is the order of G Recall that the discrete exponentiation function is DEXpG g Z a G x gt gt gm The inverse of this function is the discrete logarithm function DLogG g G a Zn X gt gt w where w 6 Z is the unique integer such that gm X in G The discrete exponentiation function is conjectured to be oneway meaning the discrete loga rithm function is hard to compute for some groups C An example is the group G Z under multiplication modulo 1 where p is a large prime such that p7 1 has a large prime factor The size order of Z is p 7 1 so in this case n p 7 1 In other words exponents of g are in the range 01p72 Let us now assume G is some cyclic group in which the discrete logarithm problem is hard We would like to use this assumption as the basis of an encryption scheme in the sense that somehow an adversary wanting to decrypt should be faced with solving a discrete logarithm problem The basic idea is the following Let the receiver s secret key by w 6 Zn and let its public key be X gm 6 G Note that computing the secret key given the public key involves computing a discrete logarithm and by assumption is hard Now suppose a sender in possession of X picks y 6 Zn lets Y gy E G and sends Y to the receiver At this point the receiver holds 3 Y and the sender holds yX Consider the quantity K 9W 6 G and notice that W W J WXy 1122 9 g 9 The sender can compute K as Xy since it knows yX while the receiver can compute K via Y9 since it knows wY The quantity K is thus a shared key Having such a key encryption of a message M is easy Assuming that M E G is a group element the sender computes W KM in G and transmits W to the receiver The latter having K recovers M as WK l The above description might make it look like there are two steps namely the sender rst transmits Y and then W but when we implement this as an encryption scheme we simply merge these steps The sender computes Y and W and the ciphertext it transmits is the pair Y 22 ASYM METRIC ENCRYPTION Now what about security The adversary is in possession of the public key X and via eavesdropping will obtain the ciphertext Y The most direct attack is to attempt to compute K 9 An adversary attempting to do this is faced with solving what we call the computationl Di ie Hellman CDH problem given XY compute 9W where X gm and Y 93 One approach to solving this is for the adversary to try either to nd either an and then let K Ya or to nd 3 and let K Xy But nding an or 3 given X Y involves computing discrete logarithms and is hard by assumption However even if the discrete logarithm problem is hard we are not necessarily assured that computing K given X Y is hard because there may be methods to do this that do not involve computing discrete logarithms We do not know whether such methods exist or not but we do know empirically that the CDH problem seems to be hard in numerous groups Accordingly the security of the scheme relies on this assumption rather than merely the assumption that the discrete logarithm problem is hard But this is a very rough view of the security considerations When we examine security more closely we will see that for the schemes to achieve the kinds of strong well de ned notions of security we have discussed above the CDH assumption is not su icient But it is certainly the rst cut at understanding the issues 1151 The El Gamal scheme What we described informally above is the El Gamal encryption scheme Let us now detail it and then look more closely at its security Scheme 118 Let G be a cyclic group of order n and let 9 be a generator of G The El Gamal encryption scheme ASEG 1C 5 D associated to G g is the asymmetric encryption scheme whose constituent algorithms are depicted below Algorithm 1C Algorithm XM Algorithm DzY w 1 Z If M g G then return 1 K 9 Y9 Xltgac y znglvegy MHWKil Return Xw KeXy WHKM ReturnM Return Y W The plaintext space associated to a public key X E G is G itself and if M is not in this set then the encryption algorithm returns 1 I The quantities G g are assumed to be chosen a priori and known to all parties A typical example is G Z where p 2 3 is a prime We have discussed in Section 93 how to nd primes and generators and thus set up G g in this case The rst thing that should be veri ed about the El Gamal scheme is that decryption works correctly meaning Dz XM M for all M E G This is true because of Equation 1122 which says that the value K in both algorithms is indeed the same In common with several other algebraic schemes in the natural formulation of the El Gamal scheme given above the message is a group element In practice we might prefer to think of our starting message as a string In that case we would encode the string as a group element before using the El Gamal scheme For example if G Z where p is a prime of length k ie 219 1 S p lt 2 the scheme could be viewed as enabling us to encrypt any binary string message m of length k 7 1 To do this compute the integer whose binary representation is m and then adding one to it to get an integer M in the range 1 2k 1 This M beign in Z can be thought of as the message Bellare and Rogaway 23 for the El Gamal scheme From the group element returned by the decryption algorithm one can recover the corresponding string message in the obvious way More generally the message space can be viewed as any set of strings of size at most G mapping these to group elements via some injective encoding function for the sake of encryption Now we turn to security concentrating rst on security against chosen plaintext attack The rst thing to consider is whether the adversary could recover the secret key w from the public key X This however requires solving the discrete logarithm problem which we are assuming is computationally intractable Next we could consider the possibility of recovery of the plaintext M from a ciphertext Y The most obvious attack is for the adversary given the public key X and a ciphertext Y to try to compute the key K from XY and then recover M via M WK 1 mod pll l But trying to nd K amounts to solving the CDH problem which as we discussed is believed to be hard However by now we know that it is naive to restrict security concerns to key recovery or even to recovery of plaintext from ciphertext We must also address the possibility of loss of partial information about the plaintext In other words we should be asking whether the scheme meets the notion of IND CPA we discussed above Whether it does or not turns out to depend on the choice of group Before assessing IND CPA we need to clarify something Recall that encryption is not by our de nition required to hide the length of a message captured by the fact that the leftor right encryption oracle simply returns 1 if fed a pair of messages of equal length This leads us to ask what is the length of a message when the latter is a group element As we said earlier some encoding of group elements as strings is presumed However we insist that the strings corresponding to all elements of the group be of the same length meaning encryption should not enable an adversary to distinguish the ciphertexts corresponding to any two group elements 1152 El Gamal in the group Z We rst look at the case where G Z for a prime p 2 3 and show that in this case the scheme fails to be IND CPA The attacks rely on a little number theory from Chapter 9 Recall that the Legendre also Jacobi symbol JpA of A E Z is 1 if A is a quadratic residue and 71 otherwise We claim that given a ciphertext Y W of the scheme above we can compute JpM This is loss of information about M since a priori there is no reason that the Jacobi symbol of M should be known to an adversary We now explain how to compute JpM given an encryption Y W of M under public key X gm The scheme tells us that W KM where K 9W and gy Y We rst note that by Proposition 920 JpW JpKM JpK JpM This implies JpM JpK JpW Now Proposition 921 tells us JpK Jpg can be computed given JpX and JpY Finally Proposition 918 tells us that JpX JpY JpW can all be computed in time cubic in the length of 13 Putting it all together JpM can be computed given X Y W in time cubic in the length of 13 We now detail the attack Proposition 119 Let p 2 3 be a prime and let G Z Let g be a generator of G Let ASEG be the El Gamal encryption scheme associated to 09 as per Scheme 118 Then there is an adversary A such that Advijgj am 1 24 ASYM METRIC ENCRYPTION Furthermore A makes only one query to its left or right encryption oracle and having running time Op3 plus the time to perform some encoding related operations I Proof of Proposition 119 Adversary A has input a public key X E Z and access to a left or right encryption oracle XLR b where 5 is the encryption algorithm of the scheme We regard p g as xed and known to all parties including the adversary Now here is the adversary Adversary AgX LRquotquotbgtgt X M0 lt 1 M1 lt 9 Y7 W lt1 5XLRM07 M17 5 If X1P 1W E 71 mod p and WWW E 71 mod 13 then s e 71 else 5 91 Endlf If WCD U2 E mod p then return 0 else return 1 Endlf Results in previous chapters Propositions 921 and 918 tell us that s JPK where K 9 Y gy and X gm By Proposition 920 and some basic algebra we have JpWgt AKA1771 JPKgt JPMI1gt JPKgt JpMb 5 JpMb where b is the challenge bit Proposition 917 tells us that M0 is a square it equals 90 and O is even and M1 is a non square it equals 91 and 1 is odd Now suppose we are in world 0 meaning I 0 Then JpMb 1 so JpW s and thus A returns 0 On the other hand if we are in world 1 meaning I 1 then JpMb 71 so JpW 7 s and A returns 1 This means that Pr Expig gg a391A1l 1 Pr Expig gg a390A 1 0 Subtracting we get Advg ggam 1 as desired I 1153 Chosenciphertext attacks on the El Gamal scheme The El Gamal scheme is vulnerable to a chosen ciphertext attack regardless of the choice of group C An adversary can obtain the decryption of a given ciphertext by calling the decryption oracle on a different but related ciphertext This leads to the following Proposition 1110 Let G be a cyclic group and g a generator of G Let AEEG be the El Gamal encryption scheme associated to G g as per Scheme 118 Then there is an adversary A such that Advggggm 1 Furthermore A makes one query to its left or right encryption oracle one query to its decryption oracle and has running time the cost of a few exponentiations in G I Bellare and Rogaway 25 Proof of Proposition 1110 Adversary A that has input a public key X E G and access to two oracles a left or right encryption oracle XLRb and a decryption oracle where gm X Group G and generator 9 are xed and known to all parties including the adversary and D are as in Scheme 118 It works as follows Adversary AgX LRquotquotbgtgt D 39gtX Let M0 M1 be any two distinct elements of G Y7 W lt1 5XLRM07 M17 5 W lt Wg M e DzYW If M M09 then return 0 else return 1 The ciphertext Y W is different from the ciphertext Y W and thus the adversary is allowed to call its decryption oracle on Y W Let 1 denote the challenge bit and let K 9 where Y 93 Then M may W K lW K lwg Mbg Thus the value returned by A is the bit b meaning it has advantage 1 I 1154 Security of El Gamal under the DDH assumption ln suitable groups the El Gamal encryption scheme is secure against chosen plaintext attack The groups in question are those for which the DDH Decision Di ie Hellman problem is hard The problem was described in Section 1014 Recall the problem is that the adversary is given gzgygz and must return a bit to indicate whether or not 91 9W where g is a generator of the underlying cyclic group G If one can solve the CDH problem then one can solve the DDH problem by computing 9W and testing whether it equals 91 but it might be possible to solve the DDH problem without solving the CDH problem Indeed this is true in some groups such as integers modulo a prime as indicated by Proposition 105 meaning the DDH problem is easy in these groups But there are choices of group for which the DDH problem is hard in particular some groups of prime order such as the subgroup of quadratic residues of the group of integers modulo a prime cf Section 95 CS 4803 Computer and Network Security Alexandra Sasha Boldyr eva Very basic number Theory Let Z 2 1 0 1 2 denote the set of integers Let 2 1 2 denote the set of positive integers and N 0 1 2 the set of nonnegative integers Ifa N are integers with N gt 0 then there are unique integers r q suchthata Nq randO S rlt We associate to any positive integer N the following two sets ZN01N 1 ZR iEZ lsiSN l and gcdiN1 relatively prime to N Groups 0 Dief Let G be a nonempty set and let denote a binary operation on G We say that G is a group if it has the following properties 1 Closure For every a b e G it is the case that a b is also in G 2 Associativity For every a b c e G it is the case thatabcabc w Identity There exists an element 1 e G such that a aaforaaeG 4s Invertibility For every a e G there exists a unique beGsuchthatabba1 inverse denoted a1 Fact Let N be a positive integer Then ZN is a group under addition modulo N and z is a group under multiplication modulo N o In any group we can define an exponentiation operation ifi 0 then a39 is defined to be 1 ifigt0thena39aaaitimes ifi lt 0 then a39 a 1 a 1 a 1 ji times 0 For all a e G and all ij e Z aij ai aj aij aij ai ai1 a1i o The order of a group is its size Fact Let G be a group and let m G be its order Then am 1 for all a e G 0 Fact Let G be a group and let m G be its order Then a39 a39 med m for all a e G and all i e Z 0 Examp j Let us work in the group 22 1 2 4 S 8 10 11 13 16 17 19 20 under the operation of multiplication 1 modulo 21 m 2 586 mod 21 586 m d 12 mod 21 52 m d 12 mod 21 25 mod 21 4 o If G is a group a set S g G is called a subgroup if it is a group in its own right under the same operation as that under which G is a group 0 Fact Let G be a group and let S be a subgroup of G Then the order of S divides the order of G Algorithms and their running times Since in cryptography we will be working with BIG numbers the complexity of algorithms taking numbers as inputs is measured as a function of the bitlength of the numbers 39 Eg PrintinBinary A where A2k operations takes k Some basic algorithms Algorithm Input Output Running Time INTVDIV aN NgtD 117 withaNq7 andDgw ltN OmariNi MOD aN NgtD anxch OmariNi EXTVGCD ab m ie 00 m5 with d gcdab mb 0lal lbl MODVADD abN tbEzN ab Inch 0lNl MODVMULT abN a b E ZN ab mod N 0iNi2 MODVINV aN a E 2 b E Z with ab 1 mod N 0iNi2 MODVEXP anN a E ZN a mod N 0lnl iNiZ EXPG am a E s a E s anl separations Cyclic groups and generators 0 If g E G is any member of the group the order of g is defined to be the least positive integer n such that Q 1 We let ltggt g39 i E Zn gogl gn39l denote the set of group elements generated by g This is a subgroup of order n 0 Let An element 9 of the group is called a generator of G if ltggtG or equivalently if its order is mG 0 Let A group is cyclic if it contains a generator 0 If g is a generator of G then for every a E G there is a unique integer i E Zm such that g39 a This i is called the discrete logarithm of a to base 9 and we denote it by DLogGlga DLogG a is a function that maps G to Zm and moreover this I 9 function is a bijection The function of Zm to G defined by i 0 Qi is called the discrete function o Examp Let p 11 Then 21 l2345678910 has order p 1 10 We find the subgroups generated by group elements 2 and S We raise them to the powers 09 239 0 1 2 3 4 5 6 7 8 9 o 21 mod 11 1 2 4 8 5 10 9 7 3 6 o 51 mod 11 1 5 3 4 9 1 5 3 4 9 lt2gt 1234S678 9 10z1 1 lt5gt 13459 2 is a generator and thus Zfl is cyclic Choosnng cyclic group and generators The discrete log function is conjectured to be oneway hard to compute for some cyclic groups G Due to this fact we often seek cyclic groups 0 Examples of cyclic groups Z for a prime p 0 a group of prime order 0 We will also need generators There are efficient algorithms that allow to choose generators Squares and non squares Lef An element a of a group G is called a square or quadratic residue if it has a square root meaning there is some b E G such that b2 a in G 0 We let QRG g E G g is quadratic residue in G We are mostly interested in the case where the group G is 211 for some integer N 0 Defs An integer a is called a square mod N or quadratic residue mod N if a mod N is a member of QRZI Q If b2 a mod N then b is called a squareroot of a mod N An integer a is called a non square mod N or quadratic nonresidue mod N if a mod N is a member of 21 QRZI Q 0 Let Let p be a prime Define the Legendre symbol of a 1 if a is a square mod p Jpa 0 ifamodp0 7 1 otherwise OU39mFm aWM PorcePAM ovamew gt a I it 7odij a RefractW14 I r m M Nglefk leT 12 Heavy 3441b 7 X40149 Aanl g mu 3 m39vv 3 haw 4 Wm a 4322091397 Wm cs wo E39l hior i plun o Manly Sw Ll M of W 9 JabV 4w M ka a Camaon M0454 L 39 all 64 1 4 16 m T 9 a meamaf We 70 quot7 611 LIIS 116 m NC HDK mid 49 m fr706423 7quot amp 4pm MUMi View 77 a l A 7 m 01W y 600 I I A may I A n 39 393 M W Mman YeaLIarw on w Ac 4 1u1 I39I h lm 393 I I 1110 m T 25 new m m ocm Woiwsggacn gwih rq 6819 amp lt am 22m J in 870 C 95 anu40 3 IF U 3 quot A I Lvma 470 k 3 Co x 1 one wm it w A TbX 3 Q Fclt 3 f L 1 J 1 D PM I J v L N Ow HQ A 3 7C EVSSQZS N gt C gt01 8 E kt L mi Egg 1 J 5 L c 74 w WKhhk Application level security Ie programminglanguage security Previous focus was on protocols and algorithms C5 4803 to prevent attacks Computer and Network Security Are they implemented correctly Here focus is on programming errors and how to deal with them Alexandra Sasha Boldyreva Rd39 Iquot t39 f39d39 Applicationlevel security Buffer overflows e ucmge lmma mg m mg errors Malicious code worm and viruses Containing damage resulting from errors Classifying flaws Buffer overflows Intentional flaws 50 of reported vulnerabilities Eg backdoory Overflowing a buffer results in data written elsewhere Unintentional flaws User s data spaceprogram area Eg programmer errors System dataprogram code Including the stack or memory heap Stack Buffers 0 Suppose Web server contains this function void funcchar str Aime m We char bums strcpy buf r str capmumemm latnl my 0 When this function is invoked a new frame with local variables is pushed onto the stack Suck grows m my buf sf Nt str 39mmemne To of wwww Lomlyariabls WW Blteltute Arguments odeat previous m in S What If Buffer is Overstuffed Memow pointed to by str is copied onto stack void funcchar str 0 If a string longer than 126 bytes is copied into buffer it will overwrite adjacent stack locations tameame Topo ullmg mumn smk buf This Wm be interpreted 25 return addressl Executing Attack Code 0 Suppose buffer contains attackercreated string Ammpimeimemm I m W n a Hammett 5W immune m ms We mg e g m m h mm m mm wan m exam minsh the team We the mm was en mm mm mm 0 When function exits code in the buffer will be executed Problem No Range Checking strcpy does Lot check input size I strcpybuf str simply copies memory contents into buf starting from str until 0quot is encountered ignoring the size of area allocated to u 0 Many C libraw functions are unsafe Off By One Overflow Finding buffer overflows Homebrewed rangechecking string copy Hackers find buffer overflows as follows V t 1a nntSnSafeCDPYUEhal 1npu m imam m i lt iawnll lr Run web server on local machine m H KW mow hufferi inputi laminaquot W cm m m Issue requests with long tags nins lenpymwrn All long tags end With If web server crashes lbyte overflow can t change RET but can change pointer to previous stack frame search core dump for quot t0 overflow location 9 10 Addressing buffer overflows Run time checking StackGuard Sigiccufatgf pr39ggtdfrgigi gpgigf z a lomarking Stacksegmentas quot quot39 Embed canaries in stack frames and verify their 0 Code patches exist for Linux and Solaris lntegrlty prlor to fundJon return 0 Problems Frame 2 Frame 1 to 0 Some apps need executable stack e g LISP interpreters 52 0 Does not block more general overflow exploits 0 Patch not shipped by default for Linux and Solaris Run time checking Libsafe Intercepts calls to strcpy dest src I Validates sufficient space in current stack frame I If enough space does strcpy Otherwise terminates application More methods Address obfuscation I Encrypt return address on stack by XORing with random string Decrypt just before returning from function I Attacker needs decryption key to set return address to desired value Preventing Buffer Overflow 0 Use safe programming languages eg Java I What about legacy C code 0 Mark stack as nonexecutable 0 Make buffers slightly longer than necessary to avoid offbyonequot errors 0 Randomize stack location or encrypt return address on stack by XORing with random string I Attacker won39t know what address to use in his string 0 Static analysis of source code to find overflows 0 Runtime checking of array and buffer bounds I StackGuard libsafe many other tools I Blackbox testing with long strings Virusesmalicious code Virusesmalicious code Virus passes malicious code to other non malicious programs Or documents with executable components Trojan horse software with unintended side effects Worm propagates via network Typically standalone software in contrast to viruses which are attached to other programs Viruses Can insert themselves before program can surround program or can be interspersed throughout program In the last case virus writer needs to know about the specifics of the other program Two ways to insert virus Insert virus in memory at old location of original program Change pointer structure Viruses Boot sector viruses If a virus is loaded early in the boot process can be very difficult impossible to detect Memoryresident viruses 0 Note that virus might complicate its own detection Eg removing virus name from list of active programs or list of files on disk Some examples BRAIN virus Locates itself in upper memory resets the upper memory bound below itself Traps disk reads so that it can handle any requests to read from the boot sector Not inherently malicious although some variants were Morris worm 1988 Chernobyl virus 1998 I Resource exhaustion unintended I When infected program run virus becomes I Was supposed to have only one copy running but did not work correctly reSldent in memory 0f quot13chth I Rebooting does not help I Spread in three ways I Virus writes random garbage to hard drive I Exploited buffer overflow flaw in fingerd I Attempts to trash FLASH BIOS I Exploited flaw In sendmail debug mode I Guessing user passwords on current network 39 deStroyS the hardware I Bootstrap loader would be used to obtain the rest of the worm Melissa virusworm 1999 Code red 2001 I Word macro I Propagated itself on web server running When le opened would create and send Microsoft 5 Internet Information Server infected document to names in user s Outlook I Infection using buffer overflow Express mailbox I Propagation by checking IP addresses on port I Recipient would be asked whether to disable 80 of the PC to see if they are vulnerable macros I If macros enabled virus would launch Code Red I 0 July 13 2001 First worm of the modern era 1539 through 2039h of each month spread I Find new targets by random scan of IP address space I Spawn 99 threads to generate addresses and look for IIS scanned the same set of addresses 21539 through the end of each month attack Deface websites with HELLO Welcome to httpwwwwormcoml Ha ked by hinesequot Exploited buffer overflow in Microsoft39s Internet Information Sener IIS I Creator forgot to seed the random number generator and evenI copy Usurped Exception Handling In IIS 0 Overflow in a rarely used URL decoding routine I A malformed URL is supplied to vulnerable routine exception Exception handler is invoked I the pointer to exception handler is located on stack It has been overwritten to point to a certain instruction inside the routine that noticed the overflow I that instruction is CALL EBX At that moment EBX is pointing into the overwritten buffer I the buffer contains heap and executes it nother routine notices that stack has been smashed and raises an the code that finds the worm39s main body on the Code Red I v2 July 19 2001 Same codebase as Code Red I but fixed the bug in random IP address generation I Compromised all vulnerable IIS sewers on the Internet I Large vulnerable population meant fast worm spread I Scanned address space grew exponentially I 350000 hosts infected in 14 hours Payload distributed packet flooding denial of senice attack on wwwwhitehousegov 39 Coding bug causes it to die on the 2039h of each month but if victim39s clock is wrong resurrects on the 15 0 Still alive in the wild Code Red II 0 August 4 2001 Same IIS vulnerability completely different code kills I Code Red I Known as Code Red IIquot because of comment in code I Worked only on Windows 2000 crashed NT 0 Scanning algorithm preferred nearby addresses I Chose addresses from same class A with probability 2 same class B with probability 38 and randomly from the entire Internet with probability 18 0 Payload installed root backdoor in IIS servers for unrestricted remote access I Died by design on October 1 2001 Slammer Sapphire Worm 0 January 2425 2003 UDP worm exploiting buffer overflow in Microsoft39s SQL Server I Overflow was already known and patched by Microsoft but not evewbody installed the patc I Entire code fits into a single 404byte UDP packet I Worm binaw followed by overflow pointer back to itself 0 Classic buffer overflow combined with random scanning once control is passed to worm code it randomly generates IP addresses and attempts to send a copy of itself to port 1434 I MSSQL listens at port 1434 Slammer Propagation 0 Scan rate of 55000000 addresses per second I Scan rate rate at which worm generates IP addresses of potential targets I Up to 30000 singlepacket worm copies per second 0 Initial infection was doubling in 85 seconds H I Doubling time of Code Red was 37 minutes I Wormgenerated packets saturated carwing capacity of the Internet in 10 minutes I 75000 SQL sewers compromised I And that39s in spite of broken pseudorandom number generator used for IP address generation Slammer Impact 0 125 Billion ofdamage Temporarily knocked out many elements of critical infrastructure I Bank of America ATM network I Entire cellphone network in South Korea I Five root DNS servers I Continental Airlines39 ticket processing software I The worm did not even have malicious payload simply bandwidth exhaustion on the network and resource exhaustion on infected machines I Secret of Slammer 5 Speed I Oldstyle worms Code Red spawn a new thread which tries to establish a TCP connection and if successful send a copy of itself over T P I Limited by latency of the network I Slammer was a connectionless UDP worm I No connection establishment simply send 404byte UDP packet to randomly generated IP addresses I Limited only by bandwidth of the network I ATCP worm can scan even faster I Dump zillions of 40byte TCPSYN packets into link layer send worm copy only if SYNACK comes back CS 4803 Computer and Network Security Alexandra Sasha Boldyreva Kerberos Ma nytoMany Authentication Servers How do users prove their identities when requesting services from machines on the network Naive solution every server knows every user39s password insecure compromise or one server is enougn to compromise all users inef cient to change nis password user must Contact every server Requirements Security 0 Against attacks by passive eavesdroppers and actively malicious users Reliability Transparency 0 Users shouldn t be aware of authentication taking place 0 Entering password is OK if done rarely Scalability 0 Large number of users and servers Recall the threats 0 User impersonation o Malicious user with access to a workstation pretends to be another user from the same workstation 0 Network address impersonation o Malicious user changes network address of his workstation to impersonate another workstation 0 Eavesdropping tampering and replay o Malicious user eavesdrops on tampers with or replays other users conversations to gain unauthorized access Solution Trusted Third Party I Knows all users and servers passwords User requests ticket for some 39 service proves his identity 1 I Ticket is used to access desired network service Servers User Trusted authentication service on the network It Knows all passwords can grant access to any server Convenient but also the single point of failure Requires high level of physical security What is a ticket for 1amp3 Ticket gives holder access to a network service Server User What Should a Ticket Include User En c rypted ticket n n 0 User name 0 Server name 0 Address of user s workstation Otherwise a user on another workstation can steal the ticket and use it to gain access to the server 0 Ticket lifetime 0 A few other things eg session key How Is Authentication Done User 39 A if J Ck a Authentication server Insecure passwords are sent in plaintext Eavesdropper can steal the password and later impersonate the user to the authentication server Inconvenient need to send the password each time to obtain the ticket for any network service Separate authentication for email printing etc Solution TwoStep Authentication Prove identity once to obtain specialTGS ticket Instead of password use key derived from password UseTGS to get tickets for many network services EncrypdeGSIicket V NH V Authemeauon39 serverAS Key x mm a l TKkeE nun QHEQNKDC 3 g Encrypted Servitetitket quot Joe die User serViee cs File serven prineen other network semees Still Not Good Enough 0 Ticket hijacking o Malicious user may steal the service ticket of another user on the same workstation and use i IP address verification does not help 0 Servers must be able to verify that the user who is presenting the ticket is the same user to whom the ticket was issue 0 No server authentication 0 Attacker may misconfigure the network so that he receives messages addressed to a legitimate server Capture private information from users andor deny service 0 Servers must prove their identity to users Summary of Kerberos Symmetric Keys in Kerberos 0 KC is gterm key of client C Derived from user s password 0 Known to client and key distribution center KDC o KTGS is longterm key of ticket granting service TGS 0 Known to KDC and TGS o KV is longterm key of network service V 0 Known to V and TGS separate key for each service o KCITGS is shortterm key between C and TGS 0 Created by KDC known to C and TGS o KCIV is shortterm key betwen C and V Created by TGS known to C and V Single Logonquot Authentication Obtaining A SerVice TiCket Ticket Granting Service TGS usually live inside KDC kinit program client EncrypthFGSGDC Addrc Key Distribution lmec Center KDC 1gt39 7hr IDv tjeketTGs auth I EncryptKCVTGchvy IDv timerl tieke new 7 Wequot clam Me Know key KV for each service Decrypt with Kc and obtains MayanKW IDc Addrc IDV lmeTGS iife39time ltch5 i ID Adar Encrypths i IDTGS itlmeKDC iifetime 0 Client uses TGS ticket to obtain a service ticket and a shortterm key for each network service 0 One encrypted unforgeable ticket per service printer email etc Obtaining SerVice Kerberos in Large Networks Encrypth VIDC Addrc lmec One KDC isn t enough for large networks why ServerV EncrprKCV imec Authenticates server to eiient Network is divided into realms System commend eg lpr 4mmquot User KDCs in different realms have different key databases To access a service in another realm users must Get ticket for homerealm TGS from homerealm KDC Get ticket for remoterealm TGS from homerealm TGS 0 For each service request client uses the shortterm key for that service and the GS ticket he received from T As if remoterealm TGS were just another network service Get ticket for remote service from that realm s TGS Use remoterealm ticket to access service NN12 key exchanges for full Nrealm interoperation Recall symmetric setting CS 4803 WA Computer and Network Security K ir1 X K Alexandra Sasha Boldyreva 5 R Publickey encryption A scheme AE is specified a key generation algorithm K an encryption algorithm E and a decryption algorithm D k AEKED 39p R A L 9 pk sk MsgSppk message space CZZZLLZZZZZZUX kR pk s 5 R 39 go o a M M o J m gt orJ Senders Receiver R It is required that for every pksk that can be output by K and every MeMsgSppk if CEpkM then DskCM o A sender must know the receiver s public key and must be assured that this public key is authentic really belongs to the receiver This is ensured be the PKI processes which are not part of encryption 0 Unlike in a symmetric encryption the asymmetric encryption algorithm is never stateful 0 Messages will often be numbers or group elements encoded as bitstrings whenever necessary Indistinguishability under chosenplaintext attacks Fix an encryption schemeAEKED Pick keys pksk by running K For an adversaryA and a bit b consider two experiments Expindcpab AEA for b0 or b b l CEDWbgt Ewanm The difference between probabilities of outputting 0 in two experiments is called indcpaadvantage ofA in attacking AE An asymmetric encryption scheme AE is indistinguishable under chosen um FDA secure A A a o adversary with reasonable resources is close to 0 INDCPA is not always enough Bleichenbacher s attack on a previous version of SSL invalid cipher text Cu invalid cipher text Cm OK Alice39s session key Cullquot invalid ciphertext Indistinguishability under chosenciphertext attacks Fix an encryption schemeAEKED Pick keys pksk by running K For an adversaryA and a bit b consider two experiments Expindccab AEA for b0 or b b l A is not allowed to query its decryption oracle on cip ertexts returned y its LR encryption oracle CEpkMb Epkmc b The difference between probabilities of outputting 0 in two experiments is called indccaadvantage ofA in attacking SE A symmetric encryption scheme SE is indistinguishable under chosen ciphertext attacks INDCCA secure if indccaadvantage of any adversary with reasonable resources is close to 0 ND CCAgtND CPA INDCCA secure schemes guarantee security against more powerful adversaries Any INDCCA scheme is also INDCPA But an INDCPA scheme is not necessarily INDCCA The ElGamal scheme 0 Let G be a cyclic group of order n and let 9 be a generator of G The ElGamal encryption scheme EGK E D associated to Gg is as follows Algorithm IC Algorithm X Algorithm z 3 Zn If M Q G then return L K lt Y Xegw inmiegy MeWIrI ReturnX7z KHX9WltKM ReturnM Return W 0 Security depends on the choice of G The ElGamal scheme in Z for a prime p 0 In this group the ElGamal is INDCPA insecure namely there exists an adversary A with indcpa advantage The idea given a ciphertextA can compute JpM Adversary A5leRl v vbgtgtX MD 47 1 M1 47 g KW i xltLRltMnMnbgtgt If XWW E 71 mod p and WHW E 71 mod 10 0 thenseilelsesel Endlf H WlP DQ E 9 mod p then return 0 else return 1 Endlf MW MK MM S MM Note tha M0 is a square and M1 is no Why If b0 then JpM01 JpWs If b1 then JpM11 JpWs Hence PrExpEm quotWquotA1 1 and Pr Exp WDA 1 0 0 Theorem The EIGa mal is INDCPA secure in groups where the Decisional DiffieHellman DDH problem is hard p where p2ql and pq are primes It s a cyclic group of prime order ie in QRZI the subgroup of quadratic residues of Z 0 Proof 9Q a CS 4803 Computer and Network EU CI 9 Security x U a Dr Wenke Lee l wenkeccgatechedu r a The lecture notes have incorporated course materials developed by Dr S Felix Wu of UC Davis a Dr F engmin Gong of I ntruVert Q Dr Matt Bishop of UC Davis and Dr H enning Schulzrinne of Columbia University a W E Course Objectives Ed D l Understanding of basic issues concepts principles and mechanisms in information security 0 Security goals and threats to computer and networking infrastructure and applications 0 Introduction to cryptography 0 System security 0 Network security I Exposure to security research gu QQEQ Course Styles El 3 U l Descriptive what is out there 739 l Critical what is wrong with e l Skill oriented homework with programming and 5 lab projects 7 9 Explore z I Interactive discussion and questions encouraged Qquot and considered in grade 9 Students are encouraged to present their ndings l Information sharing home page and message boardemail list a a a rn Course Outline E D l Fundamentals 9 Overview of computer security 9 Fundamental results 9 Security policy and models 9 Design principles and implementation issues 9 Vulnerability analysis and auditing I Program security operating system security and database security I Cryptography 9 Secret key cryptography O Hashes and message digests 9 Public key cryptography 9 Information hiding Course Outline Cont d l Network and system security 0 Authentication and security handshakes pitfalls 0 IP security 0 Web and Ecommerce 0 Virusworm detection firewalls intrusion detection 0 Hacking and forensics l Writing secure code Prerequisites l Operating systems networking discrete mathematics and programming C or C J av a GQQEQEQ C J 1 391 v 1quot Textbooks and References l Required textbooks 0 Introduction to Computer Security by Matt Bishop 0 Network security PRIVATE communication in a PUBLIC world by Kaufman Perlman and Speciner l Reference texts and papers 0 Security in Computing by Charles P eeger and Shari P eeger 0 Additional materials see course Web page a El Course Mechanics cu m l WWW page 0 For course materials e g lecture slides homework files papers tools etc I Grading 40 homework 15 exam I 20 exam II and 25 nal project 0 Course participation 5 extra credits U E D U C 1 E2 Motivations F 139 L E3 E5 Q Q Q U E El E 3 Why Is Security Important g l Computers and networks are the nerves of f the basic services and critical infrastructures in our society 0 Financial services and commerce 0 Transportation F 0 Power grids g 0 Etc l Computers and networks are targets of U attacks by our adversaries at A Motivating Example CI cl D l Requlrements of an eCommerce s1te 0 Performance of concurrent transactions 9 Usability Easy to follow GUIs convenience eg cookies 0 Security 3 Secure transmission and storage of costumer u D finanmalpersonal data Protect the Web servers and the enterprise network it from illegitimate access Provide continuousuninterrupted services D U a quotU E El 39 3 Why Is Securlty Hard and Harder 1 E f l The complexity of computers and networks 7 l User expectation r s1 l User Ignorance 0 Social engineering l Defense is inherently more expensive 0 Offense only needs the weakest link Cl a C0 Trends by Application Demands l Hunger for bandwidth 9 Hardware Physics breakthroughs seem to come easier than software I Wider spectrum of application sophistication O Besteffort to guaranteed 9 Builtin security I Drive for ubiquitous access I Economicsprofitability Quest for Better Services I Realtime audioVideo requires guaranteed endtoend delay and jitter bounds l Adaptive multimedia application requires minimum bandwidth and loss assurance l Intelligent application demands reliable feedback from the network I Security GQQEQEQ J 1 391 v 1quot Quest for Ubiquitous Access l Information age is a reality I Everything depends on reliable and ef cient information processing 0 Quality of our everyday life 0 Development of nationalworld economy 0 Security of national defenseworld peace l Networking is one critical part of this underlying information infrastructure a U a W 3 Economic Pressure C l Service providers want the most bang on their buck the most profitable technology 0 Cautious adoption of new technologies Even for security 0 Emphasis on leveraging deployed technologies 0 Increased utilization of existing facilities QQQEEIEHIIIIIII J Networking Technologies l Switching modes 9 Circuit switching 9 Packet switching Ethernet HIPPI ber channel IP routing frame relay ATM IP switchingtag switching l Highspeed transmission media o SONETSDH WDM l Ubiquitous access media 9 XDSLcable modem IEEE802ll LEOSs I We will study the common security issues I I The Internet ll Ill an III e 2E g a Layered Storeand forward e1 a User A User B 3 Application 3 1 II Wmquot II Li Z 39l B 31 3 Network a DIID gt IID 1 a all Link m IDIII lt E A A m m g Securlty Impllcatrons E l g l Vulnerabilities from weak design to feature rich implementation to compromised entity l Heterogeneous networking technologies adds to E security complexrty 9 But improves survivability l Higherspeed communication puts more information at risk in given time period O Easier to attack than to defend l Ubiquitous access increases exposure to risks an u Ill Grounded Theory Overview 0 What is it How do I do it Concepts Theoretical Sensitivity Open Coding Axial Coding Selective Coding Illustrated with an example From my own research Georgia tmg u ce Techt 1 H gy r v Grounded Theory What is it 0 An approach to guide analysis of data The Discovery of Grounded Theory Strategies for Qualitative Research Barney G Glaser and Anselm L Strauss Set of techniques Identify categories and concepts that emerge from text Data reduction in our terminology Link concepts into substantive and formal theories Data organisation and explanation in our terminology Georgia tmg aa ce Techm H cgjy Grounded Theory Overview of the Mechanics 0 Read through eld notes Produce analytic categories potential themes that arise As categories emerge Pull data from categories together and compare them Think about how categories Fit together into an explanation or model Take models developed Check them against the data particularly negative cases Present results using examples from the data Georgia tmg u ce Techt 1 H gy r v Grounded Theory Getting Started Before you commence analysis Theoretical sensitivity is about developing insight into data Being able to give the data meaning understand it and separate pertinent from not 0 Theoretical sensitivity Use literature readings on theory research and supporting evidence Ground yourself in the multiple contexts that surround the data your analysing Professional experience Use background knowledge of practioners in the eld including yourself Georgia tmg aa ce Techm H cgjy I Grounded Theory ExampleTheoretical Sensitivity in Software 0 Prior to analysis of startup data I did the following things to enhance my theoretical sensitivity Literature software engineering Followed ICSE proceedings SCM workshop etcto understand CM 0 Professional experience get others insights Followed CM usenet group to learn about practice of CM manager Georgia tmg u ce Techt 1 H gy r v Open coding Grounded Theory First Step Open Coding Process of breaking down examining comparing conceptualising and categorising data Data reduction although it doesn t feel like reduction when you re doing it Inductive Open coding consists of Labelling phenomena Discovering categories Developing categories properties and dimensions Georgia tmg aa ce Techm H cgjy Open Coding Labelling Phenomena 0 Break down raw full descriptive field notes By asking questions about notes What is this What does it represent For each phenomenon incident idea or event Give each discrete phenomenon a name Give it a name that captures its essence in a more general way concept Compare it to others already discovered Could it be labelled in a way that others are labelled preserving integrity Georgia rmg u ce Techt 1 H gy t v Open Coding Discovering Categories of Concepts 0 At the end of even a small piece of field notes Lots of concepts It is quite normal to feel overwhelmed and concerned at this point Next group concepts into categories Category classi cation of concepts discovered when concepts compared What s similar may want to note why you think it s similar This started when you labelled Now you re asking questions about the concepts and the category What are the phenomena in this category about what are they instances of Use answer to label category Georgia tme aa ce Techm H cgjy I Open Coding Developing Properties and Dimensions 0 Properties Characteristics or attributes of a category Eg colour has the properties of intensity and hue 0 Dimensions Locations of a property along a continuum Eg intensity varies from high to low hue from darker to lighter 0 These will help us develop broader relations later Georgia rmg c ce Techr H gy Example Open Coding Context Interview with a software developer about her work Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy Georgia me ca ce Techm H ij Example Open Coding Phenomena Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy Phenomena events being described Parallel development Merge Picked these because they seemed to be main acts Georgia tmg ui ce Techr H gy Example Open Coding Category Discovery and Development Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy Category Initially code coordination but too broad so Individuals coordinating code Properties and Dimensions Workrthat varies from independent to dependent What s being talked about here Work talked about see italics Module Change varies from separate to the same Changes to the code modules speci cally described in detail Techm H gy Open Coding Practical Moment Computers probably useful for open coding Sadly I ve never done that way I m still manual 0 Initially I used postit notes for each concept And a large flat surface My slide door wardrobe which I then could not get into for several days Now I tend to use absurd numbers of 3x5 cards And a floor or conference room table One colour for concepts one concept per card Another colour for categories one category per card Georgia tmg u ce Techt 1 H gy t v Grounded Theory Second StepAxial Coding Axial Coding Data assembled in new ways after open coding by making connections between categories Moving from inductive to deductive analysis 0 Axial Coding consists of Taking categories and identifying The conditions that give rise to it Context into which it is embedded Actioninteraction strategies in which it is handled managed carried out Consequences of those strategies Expanding out our knowledge of the categories Categories represent our phenomena ow Georgia tme aa ce Techm H cgjy I Pedal Coding Casual Conditions and Context Causal Conditions Events incidents that lead to the occurrence of a category Search the data for the conditions that give rise to the category phenomena Context The set of properties that pertain to a category Locate the events in the category on the dimensions Georgia tmg u ce Techt 1 H gy t v Pedal Coding lntervening Conditions and lnterActional Strategies lntervening Conditions Broader structural context pertaining to category Space time culture economic status technological status career history biography 0 Actionlnteractional Strategies Particular emphasis of Grounded Theory it focuses on action amp interaction What actions do individuals take with respect to the category How do groups or collectives interact and act with respect to the category Georgia me aa ce Techm H cgjy Pedal Coding Consequences Consequences There are always consequences Here it means the outcomes of the interactional strategies Georgia tmg u ce Techt 1 H gy Examplez odal Coding Casual Conditions Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy 0 Individuals coordinating code 0 What causes individuals coordinating code bad management bad code being in the same part of the code Georgia me ca ce Techm H ij Examplez oaal Coding Context Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy 0 Individuals coordinating code Workthat varies from independent to dependent Dependent work Module Change varies from separate to the same Same changes Georgia tmg u ce Techr H gy Examplez oaal Coding Intervening Conditions Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy 0 Individuals coordinating code 0 What broader contexts might apply here Time delivery schedules Space colocated development or geographically separated 0 At this point with this data These are questions we might explore in the eld Georgiagmg w cg Techm H cgjy Examplez oaal Coding Interactiona Strategies Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the le then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy 0 Individuals coordinating code Strategies focus on avoiding interaction Example waiting for someone else to check in before she checks out Here strategies emerge in negative form What not to do Georgia tmg u ce Techt 1 H gy Examplez oaal Coding Consequences Well I try to avoid parallel development I grumble about it to me it39s out there it happens in our company and in others but it seems to me that if there39s better management and better decomposition of problems then should be avoided Number 1 solve it by keeping things separate as far the units of work the resolutions of work which in our case is source files and number 2 when you go about assigning this work you could try and assign common problems to the same person so they are not doing parallel development What has to happen is the last guy who checks something in has to merge these two together and merging to be honest is generally pretty easy as long as the people aren39t working on the same checks in the code If I39m working at the top of the le and somebody else is working on something and the bottom of the file then it39s fairly easy to merge unless those changes change the overall algorithm then it gets messy 0 Individuals coordinating code 0 Not keeping things separate Leads to a mess another negative example Separation leads to ease Of interaction positive example Georgia me ca ce Techm H cgjy Example Open and Axial Coding First Observational Example page 96 Lo and and Lo and Open Coding Concept Toy Car Properties and Dimensions Place Public Private Control Moving Letting Go not quite right name Axial Coding Casual Conditions sitting down being still Context public setting letting go lntervening Conditions Behaviour in public places lnteractional Strategies Observation of boy by mother reaction to prevent Consequences punishment 0 Not enough data for selective coding Georgia mg n Techt 1 H gy r v Grounded Theory Third Step Selective Coding Selective coding The process of selecting the core category The central category around which your nal analysis will be based Then relating it to other categories 0 Three processes Explicating the story line analytic description of the core category Relating other categories to the core Validating the story line Georgia me aa ce Techm H cgjy Selective Coding The Story Line Commit commit commit After months of analysis EVERYTHING seems important But a story line needs to priortise one category over all others 0 Finding the story Ask yourself what seems the most strikinginteresting Write it in no more than a paragraphjust a FEW sentences Does one category seem more central Can it explain the others If you have two drop one Georgia tmg u ce Techt 1 H gy t v Selective Coding Relating Categories to the Story Line First outline core s properties and dimensions Relate other categories to the core Using properties and dimensions Using conditions context strategies consequences Georgia tme ca ce Techm H cgjy Selective Coding Validating the Story Line 0 Final step Validate the theory against the data 0 Write a series of memos that step through the story Do they organise all the data Do they cope with all the negative as well as positive examples Go back to the eld To ll in missing pieces Georgia tmg u ce Techt 1 H gy t v Example Previously we had one category Individuals coordinating code Parallel development history of the code base many others Others emerged Groups coordinating code Life cycle architecture Organisations coordinating code Multiple organisations coordinating code 0 And these are just the relevant ones For the gory details there s my dissertation Georgia tme aa ce Techm H cgjy Example Selective Coding The Story Line 0 Eventually the story line became Technical dependencies in software create social dependencies among the developers responsible for the pieces Named the core category recomposition Recomposition is all the work it takes to coordinate the assembling software from its parts Name came from decomposition the process of breaking software apart It took me a long time to abandon Individual group organisational distinction Present in categories but not actually important to the central story Important in explaining the details of the theory but not in summarising it Georgia tmg u ce Techt 1 H gy t v Example Selective Coding Relating Categories to the Story Line Recomposition Seemed to encompass individual group organisational coordination Same roots 0 Properties of interrelationship of work and code All the problems happen when work and code can not be separated 0 Differences emerged Causes and consequences depending on scale individual v organisation But they could be parameterised and organised into the story line Georgia tme aa ce Techm H cgjy Graphics in Games level of detail in models terrain maps lighting texture maps 00101 S shadows ray casting take 4390 4391 4451 How have graphics in uenced the development of games What would you do with substantially more graphics computing power How have graphics in uenced the development of games vertical walls horizontal oors texture mapping vs modeling detail fog games set indoors What would you do with substantially more graphics computing power lighting effects re ections shadows more texture memory more animation pipeline latency more simulation or Al from CPU Level of Detail using simplified versions where detail is not needed used throughout the system polygon meshes textures procedural and not animation procedural and not Where does LOD info come from howwhen to swap models less important if details are truly minor but pops may attract attention Level of Detail Modeling by hand or automatic why artists will do better knowledge about the model facial features silhouette why use automatic meshing dynamically changing objects cpu vs artist time if Art of Low Polygon Modeling know your limitations target face count Quake II 600 faces character engine depends on vertices or faces know what matters how will model be seen back or front near or far alone or in groups properties of model closed model one model or articulated organic vs non organic Techniques for Low Poly Models vertex merge 4 edge division edge turn Automatic Meshing Techniques off line Siggraph Community on line heuristics for vertex deletion on line parametric surfaces Progressive Meshing selecting which edge to collapse next small details go first nearly co planar surfaces need fewer polygons than areas of high curvature costuV u V max min 1 fn39nn 2 f eTu n aTuv where T11 is set of triangles that contain u TuV is set of triangles that contain uV Progressive Meshing 39Ei a 4 FIGURE 5 Bunny model at left to right 453 200 and 100 vertices Lighting video games are different than stills or even animations viewpoint object motion Goals direct viewer s attention emphasize depth and separation reveal texture form create mood provide exposure and balance Properties of Lights quality hard or soft intensity want objects to differ in brightness for separation and depth color and pattern glow from sunset grid from bars direction frontal edge side back top 12H side 12V frontal key 9H 3H 9v 3v 39 l strongest shadows 6v 12H top edge side 9H8 3H contours 6H texture top 12H side 12quot 9 back separate H 3H 9v 3v 39 from background i W Key light casts shadows chief light source diagonally from front high Fill light soften shadows lower intensity positioned lower Back light separate figure from background 3d mid level intensity above figure Color Theory use color palette to set mood color temperature warm red red orange yellow yellow green I I l I cold Violet blue green green yellow blue green I I I I I weight darker gtheavier depth grey gt more distant Visibility blackyellow greenWhite l redwhite blueWhite white blue blackwhite Color Schemes that Work monochromatic just primary colors all warm or all cool contrast of warm and cool color and its complement Raycasting Wolfenstein 3D in 1992 subclass of ray tracing Raytracmg hidden surface removal transparency re ections refractions ambient lighting shadows n eeeee ne Raycasting grid world plane number of rays gt horizontal resolution subsampling tables of slopes for ef ciency World walls at 900 wrt to oor walls made of uniform cubes oor at each cube consists of 64x64 smaller units Viewer player s height field of View Xy position of player facing direction yaw A FOV 60 Finding Walls screen size 320 ray Viewing angle 30 for col O col ray 60320 collt320 follow ray until hit wall record distance to wall Finding Intersections find intersection points with the grid fixed number of ray angles 360 60320 J use a table for the slope Horizontal Intersections find first intersection check grid wall or wall if wall compute distance if wall 1 J I find next intersection x xxa y yya xa table lookup ya grid height A ya 64 xa 64 tanoc 2 I X J Vertical intersections are similar look up ya xa is grid Width Finding Distance 1 sqrtpx dX2 py dy2 d abspx dX cos0c d abspy dy sin0c d JAG pxpy table look up for cos sin nite number of values for 0c Improvements doors and windows 450 walls platforms and ramps Ladder Queue An 01 Priority Queue Structure for LargeScale Discrete Event Simulation WAI TENG TANG RICK SIOW MONG GOH and IAN LIJIN THNG National University of Singapore This article describes a new priority queue implementation for managing the pending event set in 4 t i u t 1 I l l w 39 u p ii u other current popular candidates This new implementation called Ladder Queue is also theoreti cally justi ed to have 01 amortized access time complexity as long as the meanjump parame er of the priority increment distribution is nite and greater than zero regardless of its variance Many practical priority increment distributions satisfy this condition including unbounded vari ance distributions like the Pareto distribution This renders the LadderQ the ideal discrete event queue structure for stable 01 performance even under practical queue distributions with in nite variance Numerical simulations ranging from 100 to 10 million events af rm the 01 property of LadderQ and that it is a superior structure for largescale discrete event simulation Categories and Subject Descriptors E1 Data Data structures E2 Theory of Computation Analysis of Algorithms and Problem Complexity 1138 Simulation and Modeling Types of Simulationdiscrete evem General Terms Algorithms Performance Additional Key Words and Phrases Pending event set implementations priority queue calendar queue 1 INTRODUCTION The Sorteddiscipline Calendar Queue SCQ as proposed by Brown 1988 is an important implementation of the pending event set PES in discrete event simulators such as GTW Das et al 1994 CSIM18 Schwetman 1996 and Network Simulator v2 Fall and Varadhan 2002 By far SCQ is more popular than the unsorted bucket discipline calendar queue or UCQ This is because the number of operations required for a basic enqueue and dequeue operation is 0nsublist and 01 respectively in a SCQ bucket Where nsubh g represents Authors address Department of Electrical and Computer Engineering National University of Singapore 3 Engineering Drive 3 BlkE4A 0506 Singapore 117576 email waitengtangsmgoh Permission to make digital or hard copies of part or all ofthis work for personal or classroom use is granted without fee provided that copies are not made or distributed for pro t or direct commercial advantage and that copies show this notice on the rst page or initial screen of a display along with the full citation Copyrights for components of this work owned by others than ACM must be honored Abstracting with credit is permitted To copy otherwise to republish to post on servers to redistribute to lists or to use any component of this work in other works requires prior speci c permission andor a fee Permissions may be requested from Publications Dept ACM Inc 1515 Broadway New York NY 10036 USA fax 1 212 8690481 or permissionsacmorg 2005 ACM 104933010507000175 500 ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Pages 1757204 176 W T Tang et al the number of events in an SCQ bucket For the case ofthe UCQ the number of operations required is higher that is 01 for enqueue and guaranteed nsublist operations for dequeue Ronngren and Ayani 1997 provided empirical evidence that the SCQ achieves 01 performance in benchmark scenarios where the number of events N in the queue and the mean of the jump random variable associated with the priority increment distribution denoted as M remain constant However the SCQ sometimes falters to 0n when u varies for example the Camel and Change distributions even though N is constant Furthermore when N varies the performance ofthe SCQ becomes erratic with many peaks which occur when the queue size uctuates by a factor of 2 revealing that the resize operations can be costly In addition when N varies the SCQ does not always achieve 01 performance even though 1 remains constant for example Triangular distribution These observations translate to the following a The SCQ s sizebased resize trigger is an ineffective mechanism for han dling skewed distributions where u varies It results in many events being enqueued into a few buckets with long sublists and many empty buckets ie skewed distribution phenomena Long sublists make enqueue oper ations expensive since each enqueue entails a sequential search whereas excessive traversal of empty buckets can increase the process of dequeue operations This problem of the SCQ has been attributed to the sizebased resize triggers which are too rigid to react according to the events distribu tion encountered since a resize trigger occurs only if the queue size halves or doubles Brown 1988 Sizebased resize trigger is not suitable for handling simulation scenarios where N varies by a factor of two frequently This form of trigger is consid ered in exible because even though the SCQ can be performing well with its existing operating parameters but due to this trigger it still has to resize w en N varies by a factor of two Sampling heuristic is inadequate to obtain good operating parameters namely the number of buckets and the bucketwidth when skewed distribu tions are encountered During each resize operation the SCQ samples at most the rst twenty ve events This is clearly too simplistic because for skewed distributions in which many events fall into some few buckets the interevent timegap of the rst twenty ve events which can span several buckets and those in the few populated buckets may vary a lot To simply increase the events sampled is not a prudent approach unless the distri bution is uniform For most distributions particularly skewed distributions such as the Camel distribution if we simply sample more events and then take the mean or median it is unlikely that it will be accurate since events are spread unevenly Even if the most populated bucket is sampled the SCQ will also perform poorly because events in that bucket will have a small av erage timegap whereas other events have widely diverse timegaps If the bucketwidth is updated to this small timegap there will likely be numerous empty buckets And skipping these empty buckets will lead to inferior per formance Furthermore sampling more events inevitably leads to higher A U v A o v ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 177 overheads for each resize operation affecting the SCQ performance on a d Events are immediately sorted in all the bucket sublists as they are en queued This enqueue process is inef cient and costly if the event distri bution is heavily skewed so that bucket sublists are very long resulting in costly linear searches during enqueue operations Furthermore if resize op erations are triggered the effort of sorting these events previously are all wasted since these events will have to be reenqueued and resorted again Such resize operations are very costly and time consuming in unstable large scale scenarios where N varies resulting in resize triggers being activated many times In this article we introduce a novel multilistbased priority queue structure which overcomes the above problems to achieve 01 performance in which N andor 1 vary These include skewed distributions as well as heavytailed distributions such as the Pareto with nite mean but unbounded variance We coin this structure the Ladder Queue LadderQ The reason for using a multilist as the basic structure for LadderQ rather than a treebased structure is that treebased structures are known to be bounded by 0logn whereas multilistbased structure such as the SCQ have an expected 01 performance except for scenarios in which the 1 andor N vary with time However unlike the SCQ the LadderQ is 39 quot39 to its quot l 6 as well as insensitive to N In fact we justify theoretically that the Ladder is 01 in both its enqueue and dequeue operations given that the mean of u is nite and greater than zero regardless of its variance Unlike the theoretical UCQ studied in Erickson et al 2000 we do not impose restrictive or sterile conditions like 6 must vary according to 01N in order to show that the LadderQ is 01 theoretically In addition we present empirical evidence obtained from rigorous experimental studies to illustrate that the LadderQ exhibits 01 amortized Tarjan 1985 or average complexity for widelyvarying priority increment distributions as well as for queue size ranging from 100 to 10 million These veri cations establish the necessary evidence that the LadderQ structure is superior compared to all current PES implementations The rest of this article is organized as follows Section 2 describes in detail the LadderQ algorithm Section 3 provides the theoretical proof of LadderQ s 01 amortized complexity Section 4 illustrates the measurement methods and benchmarking used in the experiments and Section 5 presents the empirical results 2 LADDER QUEUE The LadderQ avoids the problems encountered by the SCQ by having four es sential principles Firstly LadderQ defers the sorting of events until absolutely necessary that is when some high priority events are close to being dequeued The majority of arriving events are simply appended into buckets without sorting and hence only incur 01 cost ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 178 W T Tang et al Secondly during enqueue operations a resize operation encompasses only resizing the Bottom structure when it becomes too populated The Bottom struc ture made up of a sorted linked list is the center of activity of the LadderQ and essentially determines the performance of the queue However all of LadderQ s other buckets do not get resized Since the Bottom structure is limited to at most 50 events this not only cuts down the costliness of the resize it limits the com plexity of the enqueue operations in Bottom which relies on linear search and also ensure that complexity requirements on Bottom is at most 50 operations ie 01 irrespective of queue size and u Thirdly during dequeue operations the LadderQ re nes the bucketwidth on a bucketbybucket basis reducing the cost of resize operations Furthermore this bucketwidthre ning methodology which results in LadderQ having mul tiple bucketwidths well handles event distributions which have widely varying interevent timegaps In contrast the SCQ adopts a one size ts all approach where only a single bucketwidth is used for the entire SCQ structure resulting inevitably in costly resize operations Lastly instead of unreliable and somewhat costly sampling heuristics the LadderQ obtains good operating parameters dynamically in accordance to the events in its structure without sampling This reduces further its management overheads on the whole 21 Basic Structure of Ladder Queue Figure 1 shows an example of the basic structure of a LadderQ The name LadderQ arises from the semblance of the structure to a ladder with rungs Basically the structure consists of three tiers a sorted linked list called Bottom the middle layer called Ladder consisting of several rungs of buckets where each bucket may contain an unsorted linked list and a simple unsorted linked list called Top In the example of Figure 1 a LadderQ with three rungs of buckets Rung1 Bung 2 and Rung3 is shown The actual number of rungs ma vary from distribution to distribution Later in Section 3 it will be demonstrated that the average number of rungs is bounded by a constant irrespective of Nand as long as the mean jump parameter that is u of the distribution is nite and greater than zero regardless of its variance In the literature there is an interesting priority queue known as the Lazy Queue LazyQ Ronngren et al 1993 which bears a striking similarity in terms of structure as compared to the LadderQ which we describe in this article The LazyQ also has three tiers However the similarity ends there The mechanisms employed to trigger the transfer of events between the various tiers are more convoluted It also requires the user to know a priori the value of numerous parameters for it to perform well It also relies on the resize operation similar to the SCQ but on top of that it comes with various resize criteria We only provide a brief mention of the LazyQ because it has been already shown that the LazyQ provides similar performance compared to the SCQ for most scenarios and worse for some scenarios Ronngren and Ayani 1997 As such we have not included the LazyQ in our performance comparision We have on the other hand empirically shown that the LadderQ signi cantly outperforms the SCQ in almost every scenario see Section 5 ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 179 TopStart RStart1 RCur1 lr x o o o o o Rung1 o 0 o o O Q RStart 2 Ladder tRcurp x o o o o o o o Rung2 39 I C Q Bottom Legend BE lnvalidated empty bucket where all events have been dequeued before B Valid bucket may contain an unsorted linked list Events inserted into these buckets will simply be u appended and not sorted B Bucket used to contain an unsorted linked list but has spawned a new rung because the number of x 3p 39 events in the linked list exceeded 50 Subsequently events in this bucket will be redistributed to the newly spawned rung C B Current dequeue bucket where all events have been sorted and transferred to the Bottom linked list C If Bottom becomes too large bucket BC can also become a BSp bucket by spawning a new rung Fig 1 Basic structure of Ladder Queue 22 The Algorithm Before we present a detailed description of the LadderQ algorithm Table I describes the operating variables associated with this threetier structure 221 Dequeae Operation Initially Top Ladder and Bottom are empty and all incoming events are inserted into Top without any sorting MaxTS M inTS and N Top are updated accordingly When the rst dequeue event is red events in Top are transferred to the rst rung of Ladder that is Rang 1 The buck etwidth of Rang 1 is obtained using M TS M TS Backetwidth 1 ax m 1 N Top where Masz is not equal to M irLTs1 Upon obtaining the Backetwidth 1 the following variables are updated RStart1 and RCar1 are set MinTS and TopStart is set MaxTS Backetwidth Thereafter events can be 1Note that if Masz is equal to MinTs it means all the events in Top have the same timestamp In this article we consider the mean jump to be nite and positive Thus the likelihood of this occurring is extremely low See Section 24 for the practical aspect of this occurrence ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 180 W T Tang et al Table ll Some Important Operating Variables Maintained in LadderQ as events are events are which subsequent dequeue operations will start timestamp threshold of events which can be Used for bucketindex when inserting an event action would be initiated transferred to Bung 1 using 2 Bucket k TS 7RStart 1 Bucketwidth i where i 1 Equation 2 determines the index of bucket BuckeLk which an event is to be appended to the sublist of Bucketh unsorted TS refers to the timestamp of an event being transferred The rationale of using 1 to determine the bucketwidth is to assume that the interevent timegap in 717p is uniformly distributed between M11sz and MinTS so that there will be on average one event per bucket in Bung 1 If this assumption of uniform distribution is not true and that there are some buckets which contain too many events then the spawning process of LadderQ will be activated to x the problem The spawning process of LadderQ is what creates the multiple rung structure of LadderQ as illustrated in Figure 1 This process is activated only when certain conditions are satis ed More detailed description of the spawning process will be provided later Once the events have been transferred from Top to Bung 1 the buckets in Rung1 will be traversed by the dequeue process to nd the next nonempty bucket and this bucket will be designated as the current dequeue bucket BC The traversal of the buckets involves adding Bucketwidth 1 to RCur1 for every bucket skipped This ensures that buckets with timestamps less than RCur 1 are invalidated that is they should be empty of events Once the bucket BC is found the number of events in that bucket denoted by NBC is compared with a predetermined threshold called THRES The purpose of this threshold is to decide whether a spawning action should be initiated If NBC does not exceed THRES they will be sorted and placed into the linked list ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 181 Bottom Thereafter the rst event from Bottom which is the event with the highest priority will be returned However ifNBC does exceed THRES a spawning action will be initiated BC is converted into a BSp see Figure 1 and this process involves adding a new rung that is Bung 2 where the events from BC are then transferred into This spawning process is repeated until the sorted linked list Bottom is created and the highest priority event is dequeued For this spawning process different variables are updated as compared to the transfer of events from Top to Bung 1 In the example given in Figure 1 the variables RStart 2 and RCur 2 are set RCur 1 Subsequently RCur 1 4 o n l 15 by 1 r m 1 and n 39 39 m 2 is set using Bucketwidth i Bucketwtdth l 1 7 3 Events held in the parent bucket in Rung1 are then redistributed into Rung2 using the bucketindex procedure given in Eq 2 with i 2 The rationale of using Eq 3 to obtain the bucketwidth ofRung 2 is based on the assumption that the events in the BSp of Rung1 are uniformly distributed within its bucketwidth If not the spawning process continues on with more rungs being created The spawning process terminates when the number of events in the lowest rung does not exceed THRES It is shown later that if the jump distribution has a nite M then the average number of rungs created bounded by a constant irrespective of N The above describes all the possible operations that may result when a de queue occurs However the majority of dequeue operations are much simpler once a previous dequeue operation has already created the sorted linked list Bottom In such cases since Bottom is a sorted linked list there is a pointer that is always referencing to the rst event in Bottom that event is simply removed with negligible access times When all the events in Bottom have been dequeued the buckets of the lowest rung are traversed to nd the next dequeue bucket BC and the whole process is repeated again When a child rung is de queued of all events the child structure is deleted and the dequeue operation shifts back to the parent rung Figure 2 illustrates an example of the dequeue operation just described Ini tially a series ofenqueue operations insert events with timestamps 06 05 31 305 33 34 30 and 45 in that order into Top On the rst dequeue operation events from Top are transferred to Bung 1 of Ladder as shown in the diagram and the event with timestamp 05 is then dequeued Subsequently during the third dequeue operation empty buckets are simply skipped and the sixth bucket is reached For illustration purposes it is assumed that having ve events in the sixth bucket breaches the threshold THRES The consequence of breaching THRES is that Bung 2 is spawned with a bucketwidth of 01 Next the ve events in the sixth bucket of Bung 1 are transferred to Bung 2 Finally the event with timestamp 30 is extracted at the third dequeue operation 222 Successive Dequeue Operations Creates LadderQ Epochs Finally it should be noted that the LadderQ operates in epochs when successive dequeue ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 182 o W T Tang et al NT p3 anTS05 TapStart 0 MaxTS 45 l Top A 06 05 quotl31 quot 305 quot 33 quot 34 quot 30 quot 45 1st Dequeue Bucketwfdth operat39 quot RStartH RCur1 45058 05 05 10 15 20 25 30 35 40 45 50 Rung1 c o o o o o o o o l l 31 45 Bottom 05 39 l 06 305 To be L dequeued 33 2 34 2 30 During 3rd Dequeue Operation RStartm RCm 05 10 15 20 25 30 35 40 45 50 Rung1 x o o o s 1 quot 45 RStart RCurt2 O 39 quot 39 39 39 s Bucke vfdth 3 quot31 32 33 34 35 gEE Rung2 c o o o o quot t i t Le end See Fi 1 31 33 34 g g BE Bottom 30 305 I o B H x B To be dequeued c B Fig 2 An example illustrating the dequeue algorithm operations take place The start of a LadderQ epoch is marked by the creation of the Ladder and Bottom structure This occurs when all the events are held in Top ie there is no Ladder structure and no Bottom structure currently and the rst dequeue operation is initiated which creates the Ladder structure and the Bottom structure The operating parameters of the epoch are deter mined using Eqs 1 2 3 After the epoch is started subsequent enqueue ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 183 operations can result in events being enqueued in the Top Ladder or Bottom structures However in a stable queue system where the number of events en queued in the system is roughly equal to the number of events being dequeued then each dequeue operation will advance the current time stamp on average by a This means that as the timestamp advances more and more buckets in the Ladder and Bottom structures will be dequeued of events and become empty while the Top structure will grow in size Since the Ladder and Bottom structures have a nite number of buckets there must exist a time where the Ladder and Bottom structures are exhausted of events and all future events are now found in Top When the Ladder and Bottom structure are empty then this marks the end of the epoch The next epoch begins when equations 1 2 and 3 are employed once again to derive the new operating parameters of the Ladder and Bottom structures Notice that for the new epoch Eqs 1 2 and 3 will be operating on parameters associated the current future events stored in Top This means that LadderQ will always operate with a new set of operating parameters tailored for events in the new epoch This is clearly desirable It is interesting to know that in a practical setting some simulators such as Network Simulator v2 Fall and Varadhan 2002 enqueues at the onset an event with a huge timestamp which indicates the end of a simulation task This result in LadderQ having only one epoch and thus may slightly affect its performance This is an isolated issue and can be easily solved by implementing two Top Topl and Topg where Topg contains one or more huge timestamps Topl then replaces the function of Top to allow multiple epochs 223 Enqaeae Operation The enqueue operation in LadderQ is compara tively more straightforward than its dequeue operation In each enqueue oper ation the event timestamp is compared with the threshold of each level to nd out where it is to be inserted For each enqueue the event timestamp TS is rst compared with TopStart If TS gt TopStart the event is appended at Top Oth erwise TS is compared with thresholds RCar 1 RCar2 RCar NRang to determine which rung the event should be in Once TS gt RCar i the event is inserted into Rang i The event is inserted into Bottom only if it does not belong to either Top or Ladder Ifthe event is to be inserted into a Rang i of the Ladder the bucket index for insertion is determined using Eq 2 As every bucket will contain a linked list that is not sorted in any order the event is simply appended at the tail of the sublist An insertion at Bottom will on the other hand require a sequential search through the list to determine the place of insertion as Bottom is always kept sorted An enqueue into Bottom causing the NBot to reach THRES will activate a rung spawning process somewhat similar to that found in the de queue operation To describe this process brie y assume that the number of rungs currently in use NRang is i A new rung Rang i 1 is created with Backetwidth i 1 set using Eq 3 Events that belonged to the Bottom list will then be redistributed into Rang i 1 The previous BC in Rang i then changes to BSP A new BC is identi ed in Rang i 1 and events in BC will then be sorted to form a new Bottom list ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 184 W T Tang et al Continuing from the dequeue example described in Figure 2 suppose two events with timestamps 56 and 32 are to be inserted For the event with timestamp 56 it is simply appended into Top since 56 gt TopStart 50 For the other event since its timestamp is less than RCur 1 which is 35 it cannot be inserted in Bung 1 Since event s timestamp is 32 gt RCur 2 31 this event is to be enqueued in Bung 2 Using Eq 2 with i 2 this event is then appended to the third bucket ofRung 2 23 Spawning vs Ftesize and Value of THRES The rung spawning process in LadderQ can occur in both enqueue and dequeue operations This is functionally different from the resize operation of the SCQ which involves the transfer of all events from an old queue to a new one with a different bucketwidth which is an expensive procedure However the spawning operation in LadderQ only involves copying events from the current dequeue bucket BC in Ladder or a single linked list in Bottom to a new rung In short spawning only affects a bucket or a linked list and not the entire PES structure In comparison to the SCQ this methodology is much more economical and ef cient Recall that during an enqueue operation it is possible for an event to be inserted in the Bottom linked list If events are frequently enqueued in Bottom it is possible for LadderQ to degenerate into a single linked list structure thus degrading the performance of LadderQ severely In such cases it is therefore imperative for a new child rung to be spawned in Ladder and for Bottom to be recopied to it BC changes to B517 Hence further enqueue operations in the lowestmost region will be further distributed into smaller buckets resulting in enqueues having 01 complexity instead of 0n complexity due to traversing a long sorted linked list The other spawning operation is triggered during dequeue operations when the number of events in the current dequeue bucket BC exceeds THRES When this happens a new child rung is created BC changes to B517 and events will be redistributed in the child run Now the value of THRES is independent of the event distribution and can be determined by comparing the average time required for sorting an event in Bottom and the average time required for spawning and copying Bottom events into a new child rung A short piece ofcode that compares the time required per event under Linear Sort and Copy was written and run on an Intel Pentium 4 workstation The simulation results in Figure 3 show that the threshold where Linear Sort intersects Copy is around 50 This means that during a dequeue operation if the number of events is greater than 50 the time is better invested by copying the events to a new child rung The value of 50 concurs with previous studies that linear sort is only ef cient provided events in the list number less than 50 Marin 1997 24 Practical Aspects of LadderQ Infinite Fiung Spawning and Reusing Ladder Structure The following presents some practical aspects of LadderQ for use in practical queue scenarios where some modi cation of the original LadderQ structure is require ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 185 Mean operation time us we Number of Elements I l 1 I I I Linear Sort Co 08 py 06 04 02 0 0 Fig 3 Comparison of average time required per element for each operation First the practical LadderQ incorporates a maximum limit of eight rungs that can be spawned at any one time This is to prevent in nite spawning of rungs in Ladder In the extensive assortment of simulation scenarios con sidered in Section 5 the maximum number of rungs spawned did not exceed three If there should be an occasion where there are already eight rungs then events in the BC current dequeue bucket associated with the eighth rung are sorted to create Bottom even though the number of events in Bottom may exceed THRES In nite spawning of rungs can occur if the number of events exceeds THRES and the time stamps associated with all these events are all identical This causes all the events to be always enqueued in one bucket irre spective of the number of rungs spawned Even though scenarios with as much as 50 events having the same time stamp is rarely encountered this safeguard eliminates the possibility of in nite spawning that may jeopardize the per formance of LadderQ For other more common priority increment distribution where the mean a of the jump random variable is nite and greater than zero then relevant theoretical justi cations see Section 3 will demonstrate that the average number of rungs spawned by the LadderQ is bounded by a constant Second at the start of the simulation the practical LadderQ is preinitialized with ve rungs irrespective whether the distribution require less than ve rungs which is greater than the maximum number of rungs under most dis tributions see Section 5 After each epoch the rungs are not deleted but rather they are reused for the subsequent epochs The only difference is that the buck etwidth parameter of the rungs is different from epoch to epoch By reusing the same rung structures memory fragmentation is avoided and superior perfor mance is obtained since the rungs are created only once With the exception of Rang 1 the number of buckets in Rang i i gt 1 is 50 that is the THRES value Hence preinitializing Rang 2 to Rang 5 should be straightforward enough In contrast the number of buckets in Rang 1 is a variable which is only determined by equations 1 and 2 when a new epoch is started To be able to reuse Rang 1 repeatedly for each epoch we create more than enough buckets in Rang 1 when the rst epoch starts For example if the actual num ber of buckets required in Rang 1 is M in the rst epoch then we create 2M ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 186 W T Tang et al buckets instead If 2M buckets are insuf cient for a later epoch then at that epoch a batch creation process is initiated again to double the amount of buck ets in Rang 1 Eventually in a stable queue situation the number of buckets in Rang 1 will progressively satisfy all the later epochs and thus do not re quire a batch creation of buckets Hence from epochtoepoch if the number of buckets in Rang 1 is suf cient the previous set of already created buckets can be reused With this costsaving feature in the practical LadderQ it is expected that the practical LadderQ will operate with more or exactly enough buckets than required in some epochs Therefore the practical LadderQ must also keep track of the total number of elements enqueued in each rung When the number of elements parameter associated with a particular rung is zero then the bucket scanning process for that rung should cease even though the current bucket is not the last bucket of that rung With the above modi cations the pseudocode for the practical LadderQ can be better understood and is found in the Appendix 3 THEORETICAL ANALYSIS OF LADDER QUEUE S 01 AVERAGE TIME COMPLEXITY 31 Scenario and Conditions for Theoretical Analysis and the 1Epoch LadderQ We now consider the theoretical performance of LadderQ As mentioned earlier in Section 222 the algorithm of the LadderQ structure proceeds in epochs whereby for each epoch a new 61 is obtained according to Eq 1 based on cur rent events enqueued in Top The LadderQ performs more favorably if there are more epochs in a simulation job since rstly each epoch redistributes the events in Top to a newlycreated bucketwidth with parameters tailored for cur rent events Hence parameters of past events do not affect the current operat ing parameters Secondly having more epochs suggests that as the simulation time progresses more events will be enqueued in Top which is an 01 cost per event since Top is an unsorted linked list rather than Bottom and Ladder This means that the epoch populationde ne to be the number of events in Lad derQ excluding those events in Top that is the number of events in Bottom and Ladder must decrease as timestamp increases As the major complexity in LadderQ is not with its Top as Top is always 01 structure one can easily conclude that the major complexity of the LadderQ must therefore lie with its Bottom andLadder structure Hence we begin with the following Lemma LEMMA 31 The complexity of the LadderQ cannot increase when the epoch population decreases PROOF The complexity of LadderQ lie primarily with its Ladder and Bottom structure as these structures contain sorting processes When the epoch population decreases this mean that the Ladder and Bottom structure man ages less events compared to the time when the epoch was rst created with maximum size With smaller events to manage less sorting is required and hence complexity cannot increase El ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 187 It should be noted that the majority of practical simulation scenarios in cluding those with in nite variance in the jump variable t into the category where after the creation of an epoch the epoch population becomes progres sively smaller Hence the La Q scenario 139 quot a multi epoch LadderQ where within the duration of the simulation epochs arise and then die away eventually to give rise to another epoch However there are two pathological scenarios where the epoch population grows instead of decreases The rst scenario is an unstable queue situation where the number of events enqueued is always higher than the number of events dequeued Hence it is possible for the epoch population to increase rather than decrease Complexity analysis on queues implicitly require that there is some xed N representing the target number of events in the queue for which a complexity measure is to be derived A growing queue scenario is one where no targetN can be de ned and hence bear no signi cance for further analysis The other scenario where the epoch population increases is where the mean jump parameter 1 is zero so that every event has the same timestamp In this scenario uncontrolled rung spawn ing may occur and is the reason behind the imposition ofa maximum eightrung limit to the practical LadderQ structure see Section 24 and the relaxation of usual 50element Bottom structure ie sorted linked list so that it can hold more than 50 elements Even if such mystical scenarios are encountered the LadderQ can also be easily adapted to yield 01 performance as follows If there are already eight rungs and it is detected that all events are arriving with the same timestamp then a special tail pointer for Bottom is initialized so that an enqueue process does not require event scanning beginning from the head of the queue This makes LadderQ clearly an 01 structure that is all dequeue will be 01 using Bottom s usual head pointer and all enqueue will also be 01 using Bottom s special tail pointer We now return to the more conventional LadderQ scenarios where multiple epochs are encountered Such scenarios have the property that the priority in crement distribution has a nite mean jump parameter 1 and the queue size grows to some wellde ned value N and then maintains at that level ie the number of dequeues and enqueues are roughly the same for some time Conse quently the LadderQ structure will proceed in epochs which implicitly require that the current epoch population will decrease to zero at some stage in time ifthe epoch population does not decrease then the LadderQ cannot proceed in epochs Also noted is that during each epoch the bucketwidth parameter 61 stays constant In the following theoretical analysis we demonstrate that the LadderQ is 01 for conventional queue scenarios where the epoch population starts de creasing from the time it was created However in view of Lemma 31 it is also clear that the worstcase LadderQ complexity in a practical queue sce nario is where the epoch population is at maximum and equal to N In other words it is reasonable to consider a 1epoch LadderQ where all event activi ties ie enqueue and dequeue activities are assumed to only occur in Bottom or Ladder and the epoch population is always a constant equal to N In ad dition the 1epoch LadderQ will now have a bucketwidth parameter 61 which remains constant throughout The 1epoch LadderQ also has some practical ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 188 W T Tang et al signi cance in that it represents the initial state of the multiepoch LadderQ during the time when the epoch was rst created with maximum events It is also during this initial time following the epoch creation that the LadderQ experiences maximum time complexity If this 1epoch LadderQ is 01 then clearly the multiepoch LadderQ is also 01 by virtue of Lemma 31 Hence we state the following corollary which is a result of Lemma 31 COROLLARY 31 The average time complexity of the multiepoch LadderQ that is conventional LadderQ is no larger than the average time complexity of the Iepoch LadderQ It should also be noted that the theoretical analysis do not consider the cost of rung creation since rung creation is only done once as explained in Section 24 As for the number of buckets in Rung1 if N is the target size of the queue then based on 1 where we set NTop N the total number of buckets required in Rung 1 isjust N 1 where the additional one bucket is to accom modate the maximum timestamp event and this does not change any further during the epoch It is noted that rung creation is just a xed initial cost for each epoch irrespective whether we are considering a multiepoch LadderQ or 1epoch LadderQ This xed cost is 01 per event for each epoch since the cost of creating N buckets is a onetime cost when transferring N events from Top to Ladder Finally an important issue for theoretical analysis is that we do not impose a maximum limit of eight rungs and the usual 50element limit also applies to Bottom irrespective of the number rungs spawned The case of practical LadderQ being limited to eight rungs as stated in Section 24 is only for a speci c and obviously mystical queue situation where all the event timestamps are equal However as far as the mean jump parameter u is nite and not zero and the event size does not grow in nitely then a maximum rung limit has no theoretical basis for its existence This will be clear in the following sections 32 Complexity of 1epoch LadderQ The complexity of treebased priority queues eg Splay Tree Sleator and Tarjan 1985 are often gauged by considering the average height ofthe grow ing tree structure Similarly the complexity of the 1epoch LadderQ is closely related to the average number of rungs that are spawned as will be seen later We begin with the following proposition PROPOSITION 31 The 1epoch LadderQ is 01 if the average number of rungs for a priority event distribution is bounded by a constant Justi cation The proposition can be justi ed by considering the cost of a dequeue operation and the cost of an enqueue operation in a 1epoch LadderQ as the number ofevents N increases lfthe costs ofthese two bread and butter operations in the 1epoch LadderQ are 01 then the structure is also 01 We consider these two basic costs and analyze them in terms their xed cost and variable cost component Dequeue Cost A xed cost of dequeue is incurred when there is already a Bottom list In this situation since there is always a pointer that points to the ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 189 rst event in the sorted Bottom list the xed cost of dequeue is to reference this pointer extract the rst element and then increment the pointer to the next element This is 01 irrespective of N However a dequeue will incur a variable cost when there is no Bottom list In that case several rungs may be spawned In each spawned rung there will be THRES number of buckets If we consider the worstcase the most number of buckets in Rung1 that requires spawning is N1THRES buckets since there are at most N1 buckets in Bung 1 where the additional one bucket is to accommodate the maximum timestamp event Each bucket spawned generates Bung 2 Since each spawned Rung2 contains only THRES number of buckets there will be at most N1THRES THRES N 1 total number of buckets to be traversed for Bung 2 And the analysis is the same for subsequent rungs The total cost worstcase for transferring N events during dequeue operations from Top to Bung 1 is N from Bung 1 to Bung c is c 7 1N 1 and from Bung c to Bottom is N 1 This gives a total worstcase transfer cost from Top to Bottom during all dequeues as N cN 1 0Nc 1 where c is the number of rungs spawned Therefore if the average number of rungs spawned is bounded by a constant irrespective of N then a dequeue operation involving rung spawning is 01 amortized see Corollary 34 Enqueue Cost An 01 xed cost of enqueue is incurred by calculating the bucket index for inserting an event as well as appending the event into an unsorted bucket Ifthe event is to be inserted into Bottom using Linear Sort the search cost is also bounded as the number of events in Bottom is always limited to 50 However an enqueue can incur a variable cost if rungs are required to be traversed in order to nd the appropriate insertion level Therefore if the average number of rungs is bounded by a constant irrespective of N this variable enqueue cost is also bounded 33 Similarity of 1Epoch LadderQ to the U00 To prove that the average number of rungs in 1epoch LadderQ is bounded we adopt a static Hold scenario as used by Erickson et al 2000 to show that a UCQ has 01 amortized complexity It is under the static Hold scenario where the number of events N is kept a constant will it be possible that a Markov chain model be able to provide an analysis of the UCQ structure The UCQ model considered in that article assumed that the N events are all stored within the same year of the UCQ s bucket structure These N events are also assumed to be created by a priority increment distribution where the mean of the jump random variable is some constant M which is nite and positive regardless of its variance This UCQ scenario has an uncanny resemblance to the rst rung of the Ladder portion of Figure 1 We can apply the same scenario ofinterest to the 1epoch LadderQ as was applied to the UCQ Initially N events are generated and placed into the 1epoch LadderQ s Top list Once N events have been queued in Top the rst dequeue is generated and this causes Bung 1 in the Ladder portion to be created to store the N events The creation of Rung1 would place all Nevents within one calendar year of Bung 1 If child rungs are spawned then we consider all events stored in child rungs ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 190 W T Tang et al to belong to the spawning parent bucket Hence the distribution of events in Rung 1 of the Ladder portion is identical to the UCQ scenario considered in Erickson et al 2000 34 Useful Lemmas Applicable to the 1Epoch LadderQ Several lemmas which are to be used later for showing the average number of rungs in a 1epoch LadderQ is bounded by a constant are now presented For convenience we de ne the following variables B represents the random variable that the bucket BC or ESP in Rung i contains a certain number of events ranging from 0 to N 6 represents the bucketwidth ofRung i Based on 3 and for THRES 50 we note that 6H1 Si50 6150i Note that the analysis is for the 1epoch LadderQ where 61 is a constant throughout u is the nite mean ofthejump random variable that de nes the priority incre ment distribution of the N events The jump random variable has a cumulative density function denoted by F x qid is used to denote the limiting probability that a bucket in Rung i has exactly j events enqueued inside LEMMA 32 For N 3 2 and all 61 gt 0 the Nevents are distributed amongst the one year UCQlike rst rung structure of the 1epoch LadderQ according to M PB 0 4 1 110 MN61 andforj12NwehaUe B39 quNSHampN 00 m lt m 1aFmr where Bjis the tail probability ofa binomial distribution for N trials with success parameter p61 N N BowXXkwuapW m M 31 P031 17Fxdx 7 M 0 Remark Lemma 32 provides a mathematical description to the UCQ dis tribution under a Hold scenario The relevant proof is presented in Erickson et al 2000 using a Markov chain model The next Lemma presents results on the probability distribution of events in the child rungs For this purpose it can be assumed without loss of generality that each bucket in Rung 1 will spawn m number of buckets as shown in Figure 4 For the 1epoch LadderQ m 50 However the proofis also applicable for all m gt 1 An event enqueued in Rung 1 can be seen to be virtually enqueued inRung 2 This virtual enqueue process can also be applied to higher ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 191 An event that is enqueued in Rung1 can be seen as virtually enqueued in Rung2 and so forth Rung1 U n I n QI m buckets m buckets m buckets Fig 4 Events in a parent bucket are virtually enqueued amongst m child buckets order rungs where virtual events in a parent bucket are seen to be virtually enqueued amongst m number of buckets in the child rung With the virtual enqueue process in place it is clear that each child Rung i can be considered to be a one year UCQ with bucketwidth mail Hence Lemma 31 is also applicable on the successive child rungs so that the following probability expression is valid u PBi 0 i 8 1 0 81 mi l We now present Lemma 33 as follows LEMMA 33 As more levels of child rungs are spawned the probability that a bucket in the lowest child rung denoted to be the Lth rung has no element will asymptotically approach unity that is 110 lt 120 lt K lt qL10 lt q lt K gt 1 PROOF The proof is seen in 8 where we let i increase to a large value D Remark Lemma 33 is also rather intuitive since as more child rungs are spawned the more the events are spread out amongst the child rungs and thus the greater the likelihood that there will be no element in any particular bucket belonging to the lowest rung This lemma also suggests that the number of child rungs spawned ought to be bounded LEMMA 34 Let L represent the random variable that counts the number of rungs spawned in a 1epoch LadderQ Then for n gt 1 PL n lt 1 qn10 PROOF We provide proofs for PL 2 and PL 3 and then apply in duction to obtain the desired result for PL n It is noted that for 1epoch LadderQ the new child rung is spawned when the number of events in a parent bucket has reached 50 Hence we note that PL 1 PBl lt 50 PL 2 PBg lt 50 Bl Z 50 E lt 1 110 ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 192 W T Tang et al Similarly PL 3 PBg lt 50 Bg 3 5031 3 50 PBg 3 50 1 7 PBg lt 50 lt 1 7 120 Similarly bounds for PL 4 and higher can be derived using similar tech niques and hence the proof is complete El 35 Theorems for the 1Epoch LadderQ s 01 Amortized Complexity We present in this section several theorems and corollaries for the 1epoch LadderQ We begin With Theorem 31 as follows THEOREM 31 The average number of rungs in a 1epoch LadderQ is bounded by a constant provided the bucketwidth 6 is ofO1N PROOF The average number of rungs for a 1epoch LadderQ containing N events is given y ENL ZjP L j j1 1PL 1 2PL 2 3PL 3 4PL 4 L lt 1 21 7 110 31 7 120 41 7 130 L see Lemma 33 N51 N51 N51 1 2 3 4 L ltMNt31gt lt77luN5lgt 7712MZ 51gt N61 2 3 N61 1 1 1 1 2 1 2 u m m u m m 1 N61 1 1 1 N61 2m2 7 m 9 00 u 1771712 17m 1 m272m1 lt t The last expression in 9 is bounded by a constant as long as the bucketWidth 61 ofRung 1 is held at ON El The next theorem that is Theorem 32 provides an alternative proof to demonstrate that the average number of rungs in the 1epoch LadderQ is bounded by a constant While Theorem 31 has already provided an explicit bound see Eq 9 for the average number of rungs a constraint is required on the 1epoch LadderQ s 61 parameter that is 61W ON Theorem 32 on the other hand does not require any constraint on the 61 parameter but Will not provide an explicit bound value as Theorem 31 di THEOREM 32 The average number of rungs in a 1epoch LadderQ is bounded by a constant regardless of the Rung 1 bucketwidth 61 ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 193 PROOF We begin with the usual expression for obtaining the average num ber of rungs for a total ofN events in the 1epoch LadderQ ENL Z jPL j j1 1PL 1 2PL 2 3PL 3 4PL 4 L lt 1 1 21 7 110 1 31 7 120 1 41 7 130 1 L see Lemma 33 TjTj1 i where TJ and Tj1 represents the higher order terms ofthe series The above sum of series is bounded as long as the ratio test Tj11TJ is less than 1 Thus Tj1 j 1 1411411 mj 1MN51 1 11m llm lt1 J liqm jaw mJMN51 m Since the ratio test evaluates to a value that is less than unity the average number of rungs must converge to some constant less than in nity El lim j7gtoltgt TJ j7gtoltgt COROLLARY 32 The 1epoch LadderQ has 0 1 average time complexity PROOF The proof is obtained by combining Proposition 31 with either Theorem 31 or Theorem 32 El COROLLARY 33 The 1epoch LadderQ has 0N total memory usage PROOF As shown in Eq 1 Ladder s rst rung requires N 1 N N buckets on a transfer of events from Top and thus the rst rung has an 0N memory usage Each subsequent child rung requires 0THRES memory space Since the average number of rungs is bounded See Theorem 31 or Theorem 32 the 1epoch LadderQ s memory consumption is therefore bounded by 0N El COROLLARY 34 The average amortized dequeue cost incurred when events are transferred from Top to the multirung Ladder structure and then to the Bottom structure is 01 PROOF Assume that the average number of rungs spawned is C Therefore the worstcase total cost note total cost is not the amortized cost incurred is given as ONC 1 where all the events in Top traversed C number of rungs before reaching Bottom see Proposition 31 Hence the cost incurred per event is 0C 1 01 since C is some constant independent of N which is given in Theorems 31 and 32 El 36 Theorems for the Conventional LadderQ s 01 Amortized Complexity For formality we now have COROLLARY 35 The conventional multiepoch LadderQ is theoretically an 01 priority event queue structure PROOF Combine the results of Corollary 31 and Corollary 32 El ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 194 W T Tang et al LadderQ with its theoretical 01 amortized complexity is clearly and sig ni cantly more robust than the current 01 priority queue structure proposed It should be noted that the UCQ considered by Erickson et al 2000 is 01 only provided the bucketwidth can be kept at 01N Therefore for the UCQ to maintain 01 in a dynamic queue situation where N varies costly buck etwidth resizes must be initiated to adjust the bucketwidth and hence the UCQ has never been widely used as the priority queue structure for practical simu lators In comparison with the more widely implemented SCQ the LadderQ is also more superior First in the area of amortized complexity the SCQ can at best be described as having expected 01 complexity meaning that there is no known theoretical proof to show 01 except through a number of simulation examples In fact simulations studies conducted on the SCQ has shown that for certain scenarios where the priority increment is highly unstable the SCQ exhibit 0N characteristics either due to under or over triggering of resize l quot To the quot and superiority of the LadderQ for practical implementation Sections 4 and 5 provide simulation studies on the performance of the LadderQ in comparison with previously proposed priority queue structures It should be noted that most of the scenarios chosen for the simulation studies are those that have been previously proposed by wellknown researchers on priority queues Other scenarios presented have been suggested by the reviewers to incorporate more stringent tests In fact the LadderQ does not require any special scenarios to show its superiority since it already has the distinction being 01 quot quot ir r quot ofthe l and hence N We note that in the numerical studies presented in Section 5 the maximum number of rungs spawned never exceeded three levels 4 PERFORMANCE MEASUREMENT TECHNIQUES The performance of priority queues are often measured by the average access time to enqueue or dequeue an event under different load conditions The pa rameters to be varied for each queue are the access pattern the priority distri bution and the queue size The access pattern models that have been proposed either emulate the steadystate or the transient phase of a typical simulation Classic Hold model Jones 1986 and UpDown model Ronngren et al 1993 respectively The priority increment distributions used for the benchmarking of priority queue structures are found in Table II where rand returns a random num ber Park and Miller 1988 in the interval 01 The Camelx y distribution Ronngren et al 1993 represents a 2hump heavily skewed distribution with x of its mass concentrated in the two humps and the duration of the two humps is y ofthe total interval The ChangeA B x distribution Ronngren et al 1993 was also used to test the sensitivity of the SCQ when exposed to drastic changes in priority increment distribution The compound distribution ChangeAB x interleaves two different priority increment distributions A and B together Initially x priority increments are drawn from A followed by an other x priority increments drawn from B and so on Change distributions can be used to model simulations where the priority increment distributions vary ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 195 Table III Priority Increment Distributions See et a1 if Counterlt2000 then Exponential 1 else 90000 signi cantly over different time periods for example battle eld simulations The ParetoCD is a heavytailed distribution and is an excellent model for event timestamps with high variance C is the shape parameter and D is the scale parameter D affects the range where the random number generated is 3 D In this article we set D 1 C affects the mean and variance Type 1 In nite mean amp in nite variance 0 lt C 3 1 Type 2 Finite mean amp in nite variance 1 lt C 3 2 and Type 3 Finite mean amp nite variance C gt 2 We have included the rst two types Pareto1 and Pareto15 since their mean andor variance are different from the other distributions included in this article The hardware platform used for the experiments was a 24GHZ Intel Pentium 4 It is equipped with 1GB of shared RAM The operating system on this hardware is Linux Mandrake 90 with kernel 2419 Background processes that might affect the benchmark results were kept at a minimum A microsec ond resolution timer was used Two empirical tests were conducted to verify that no items in the priority queues were gained or lost and that successive dequeues removed events in stable timeorder In addition the experiments were performed with the required memory for each priority queue being preallocated This was to eliminate the underlying memory management system which might affect the results This is a good practice in actual DES as it prevents memory fragmentation when creating new events and deleting the serviced events This method of preallocating memory would also enhance the performance of the DES The method of pre allocation could be made dynamic by an initial preallocation and subsequently an allocation of memory on demand methodology could be employed All code was written in the C programming language with all recursive procedure calls and the like being eliminated Loop overhead time and the time taken for ran dom numbers generated were removed by factoring out the time required for running a dummy loop 5 EMPIRICAL RESULTS Figures 5 and 6 demonstrate the performance of the priority queues for Clas sic Hold and UpDown experiments respectively and for increasing queue sizes with various priority increment distributions The results shown are the aver age values from ve identical runs The obvious knee phenomenon at queue size 10000 is due to declining cache performance which has also been reported by Ronngren and Ayani 1997 ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 196 W T Tang et al Classic Hold 87Exponential mean access time ps 2 J J J I Splay ree see 15 Ladd rQ1 Rung EI Ladcl rQ 100 1000 10000 100000 1000000 10000000 Queue Size Classic Hold 8 Camel mean access time us 2 J J J I u Splay Tree LEE sce o 100 1000 10000 100000 1000000 10000000 Queue Slze Classic Hold aJPareto l mean acc ess time uls 2 15 i Splayr ee SCQ Ladde Q1 Rung l39 Ladd rQ 45quot 39j39 gen 2 9 9 U quotquot I 39 39 quotquotquotI 39 39 quotquotquotI 39 39 quotquot I 39 39 quotquot 100 1000 10000 100000 1000000 10000000 Queue Slze Classic Hold 8 Bimodal mean access time us 2 J J J I 15 0 39 39 quotquotquotI 39 39 quotquotquotI 39 39 quotquotquotI 39 39 quotquot I 39 39 quotquot 100 1000 10000 100000 1000000 10000000 Queue Slze Classic Hold aJChange m ean accessl time us I 2 5 E I E splayaTreeg 15 I ELadderangu BquotI 100000 1000000 10000000 Queue Slze 100 1000 10000 Classic Hold amp Pareto151 mean access time pie 2 IIIIII I IIIII IIIII nunn 15 0 39 39 quotquotquotI quotquotquotI 39 39 quotquotquotI 39 39 quotquot I 39 39 quotquot 100 1000 10000 100000 1000000 10000000 Queue Slze Fig 5 Mean access time for Classic Hold experiments under different distributions The LadderQ performs very stably under all distributions with excellent per formance for all queue sizes tested that is 100 to 10 million To demonstrate the importance of the Ladder structure within LadderQ we have included in the experiments a single rung LadderQ denoted as LadderQ1 Rung Figure 5 shows that LadderQ1 Rung is 0n for Pareto a heavytailed distribution and Figure 6 clearly reveals the weakness of LadderQ1 Rung under Camel Change and Pareto distributions where many events ended up in some few buckets leading to a long Bottom linked list Because the LadderQ1 Rung ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 197 UPIDOW 3 Exponential quot1930 access time 115 Uprown a Bimodal mean access time us 2 ul J I I 2 I J J l Splay Tree SCQ 15 v39 LadderQ1 EI LadderQ Splay Tree SCQ U091 15 3939 39 LadderQ1 EI LadderQ t 100 1000 10000 100000 1000000 10000000 100 1000 10000 100000 1000000 10000000 Queue Size Queue Size Uprown 8 Camel mean access time us Uprown amp Change mean access time us 2 J J l 2 J l J I Spliay Tree isralayr Tree 30 LETSCQ 15 LadderQU ung 15 u LadderQ1 R g EI LadderQ EI LadderQ 1 J H H39quot am 05 aEEEaE B g r 139 I Jquot 1 22 N r11 x awse o 100 1000 10000 100000 1000000 10000000 100 1000 10000 100000 1000000 10000000 Queue Size Queue Size UpIDown 8 Pareto10 mean access time us UpIDown 8 Pareto15 mean access time ps 2 J J l I F 2 l J I Splay Tree 5 Splay Tree 5 sce Fe SCQ g 4 15 Lao derQU may 15 uu LadderQ1 ung if a LadgderQ F EI LadderQ a 100 1000 10000 100000 1000000 10000000 100 1000 10000 100000 1000000 10000000 Queue Size Queue Size Fig 6 Mean access time for UpDown experiments under different distributions does not resize or spawn the bucketwidth stays xed and the number of events in Bottom cannot be reduced to manageable quantity This clearly is not a high quality methodology and results in a poor overall performance for skewed and heavytailed distributions These observations therefore speak for the impor tance of Ladder structure for the LadderQ to achieve its 01 characteristic The SCQ a popular PES structure for DES performs well for distribu tions where the mean of the jump remains constant under the Classic Hold experiments shown in Figure 5 However SCQ is extremely sensitive to the skewed distributions Camel and Change where the mean of the jump varies ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 198 W T Tang et al Table IIL Maximum Number of Rungs Utilized in Classic Hold and UpDown Experiments For Figure 6 where the queue size uctuates during each experiment the SCQ is evidently not a suitable PES candidate for simulation jobs of this nature For exponential and bimodal distributions the SCQ is closed to 100 slower than the LadderQ For skewed and heavytailed distributions the SCQ is very unstable at 0n The reasons for the behavior of the SCQ plots have already been elucidated in points a to d of Section 1 ie introduction section The Splay Tree is an ef cient treebased priority queue Though it is only at best 0logn it is included for two important reasons To demonstrate that cache effects do exist The Splay Tree has been theoreti cally proven to perform at 0logn worstcase amortized complexity However it also exhibits the knee effects at 10000queue size To provide a comparison between an 0logn and 01 priority queues Because of cache effect the performance of 01 priority queues may not be visually obvious The splay tree plots provide the necessary comparison which undoubtedly af rms the superior performance of the 01 LadderQ Table III shows a summary of the maximum number of rungs spawned in the experiments The number of rungs does not exceed three under all the distributions and queue sizes 51 Experiments on an Intel Pentium 4 24 GHZ without Cache Effect of Bucketwiclth on the Performance of LadderQ and the Actual Algorithmic Complexity All the prior simulation results for the LadderQ demonstrate knee effects due to declining cache performance Hence it is certainly not visible from Figure 5 and Figure 6 that the LadderQ is truly 01 In this section we present numer ical simulations of various priority queues under the same Pentium 4 24GHz platform but with cache disabled Having the cache disabled ensure that the hardware only accesses a standard memory type with the same access speed irrespective of the size of N Hence a true 01 priority queue like LadderQ if simulated without cache would demonstrate an average amortized time com plexity that is a constant straight line across all queue sizes Our rst experi ment without cache is to demonstrate that the bucketwidth 61 does not affect the 01 characteristic ofthe LadderQ as elucidated in Theorem 32 We note in Figure 7 that when 61 is varied from 01100N to 0100N the mean access time remains relatively constant across all queue sizes The LadderQ with a 01100N bucketwidth results in slightly poorer performance but still retains the 01 constant mean access time characteristic This is due to the additional cost of skipping more empty buckets since for 61 to be 01100N there are 100 ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 199 Classic Hold amp Exponential mean access time 15 100 39 39 39 quot l 01 100M MaxRungs 1 01 I 10M MaxRungs 1 a 011N MaxRungs 2 p 0101N MaxRungs 2 an e 0100 I N MaxRungs 2 39 BU 100 1000 10000 100000 1000000 Queue Size Fig 7 Mean access time of LadderQ with widelyvarying bucketwidth for Classic Hold exper iments and exponential distribution Maximum number of rungs MaxRungs used during the experiments are also shown times more buckets created in Rang 1 In distinct contrast when the buck etwidth of the UCQ varies by a few magnitudes the mean access time varies by several factors see Figure 1 in Erickson et al 2000 Figure 8 demonstrates the reliable 01 characteristics of LadderQ under a wide variety of priority increment distributions found in the landmark survey paper by Ronngren and Ayani 1997 In addition we have included the Pareto 1 and Pareto 15 distributions which allow us to test the jump parameter with in nite mean and in nite variance and nite mean and in nite variance re spectively It is stressed that the constant line 01 characteristic is only visible if the cache is disabled If cache is enabled then the plots in Figure 8 would also demonstrate knee effects similar to Figure 5 and Figure 6 52 Performance Evaluation via SWAN To determine the performance of LadderQ in actual discrete event simulations the LadderQ Splay Tree and the SCQ have been implemented as the PES struc tures available in the SWAN Simulator Without A Name simulator A simple M M 1 queuing system was created in SWAN and simulated for 10000 simu lation seconds In this network topology each source node generates a network packet where the interpacket generation rate is exponentially distributed The source nodes which vary from 8 to 399998 generate packets which are multi plexed into a single server that services the packets to a single destination node Including the server and the destination nodes the total number of nodes in the network topology varies from 10 to 400000 The service times for the packets are also exponentially distributed The results obtained are shown in Figure 9 Each discrete event simulator operates uniquely For SWAN an event in the PES corresponds to a node in the network topology At the onset if the number of nodes is set to be N SWAN initializes the PES structure with N ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 200 39 140 W T Tang et al Classic Hold mean access time us vs Queue Size ul I I 120 100 B0 60 40 Parete15 mean access time us vs Queue Size I IIIIIII I IIIIIII I I IIIIII Exp Uniform02 Uniform09 1 1 Bimodal Triangular015 NegativeTriangularm 1000 ExponentialMix Camel0100000010999 39 Changeexp1triangular90k10k2k Changetriangular90k10kexp110k Paretc10 1000 10000 100000 1000000 I I I IJIII 20 0 100 UplDown 120 39 100 80 80 I EXDU Uniform02 Uniform0911 Bimodal Triangular015 NegativeTriangu Iar0 1000 ExponentiaIMix Camel0100000010999 Changeexp1trianguar90k10k2k Changetriangular90k10kexp110k Paretc10 Paretc15 0 100 Fig 8 Mean access time of LadderQ for Classic Hold and UpDown under different distributions events with each holding a zero timestamp Thereafter the process interactions between the nodes determine the dequeue and enqueue of subsequent events with the number of events in the PES structure being held constant throughout the simulation at N Thus the mechanism of SWAN essentially corresponds to the Classic Hold model where the number of events in the PES structure is a constant and is equal to the number of nodes in the network topology The 1000 10000 100000 1000000 XaXis in Figure 9 can thus be relabeled as Number of nodes The benchmarks in this section were carried out on the Intel Pentium 4 workstation The metric measured was the overall runtime which include the ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 201 Runtime sec mo quot39 quotquotJ Smamie e 39 39 39 39 39 quot 39 39 39 39 39 quot 5 IauI LadderQ it 200 5 5 1 1m 3 1m eT 55quot 50 5quot 10 100 1000 10000 No of events In the FEL structure Runtime hours 12 39 39 39 39 Splay Tree 10 39 39 39 39 39 quot39I39 39 39 39 39 39 quot IE39I LadderQ 1000000 1000 10000 100000 Ne of events In the FEL structure Fig 9 Runtime performance measurements in SWAN for 10 to 400000 network nodes SWAN simulation engine management and control time in addition to the PES structure management time So while the PES structure may be 01 the overall runtime is not expected to be 01 Nevertheless the results illustrate a realistic view on the degree to which the performance of a PES structure can affect the runtime The runtime for each simulation with different PES structure corresponds to taking the average value from ve identical runs Figure 9 shows that for relatively small number of events the cache memory of the hardware brings about good performance for both the simulations with the LadderQ and Splay Tree structure As the number of events or nodes in creases beyond 4000 the LadderQ outperforms the Splay Tree and at 400000 the speed increase is close to 100 The SCQ on the other hand performs rather ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 202 W T Tang et al poorly The reason for this poor performance is due to the initial onslaught of events with zero timestamps which sets off the SCQ sizebased trigger fre quently resulting in a long sublist in the SCQ s rst bucket However by in creasing the simulation time it can be veri ed that the SCQ s performance gradually improves over that of the Splay Tree To this end LadderQ has again shown to be a conspicuously superior PES structure particularly for largescale simulation scenarios 6 CONCLUSION The choice of an ef cient and stable pending event set implementation is of paramount importance in many largescale simulations Treebased priority queues provide a stable performance regardless of event distribution but they are of 0logn complexity On the other hand the Sorteddiscipline Calendar Queue SCQ is a multilistbased structure that provide 01 performance un der some situations However the SCQ has been empirically shown to perform poorly under in scenarios where mean of thejump u varies andor when the number of events N uctuates frequently by a factor of 2 This article proposes a new priority queue structure called Ladder Queue or LadderQ which not only exhibits the stability of treebased structure but also the 01 character istic that is often associated with a multilistbased structure To achieve these properties LadderQ does away with resize operations and uses a unique bucket bybucket copy operation via the rung spawning mechanism Furthermore in contrast to the SCQ sorting is deferred as long as it is possible to reduce unnec essary operations LadderQ s 01 average time complexity is also theoretically justi ed to be true in this article on the assumption that the mean of is nite and greater than zero regardless of its variance An extensive empirical study on the performance of LadderQ in comparison with the Splay Tree and SCQ is also presented These empirical studies con rm the theoretical predictions that LadderQ is indeed 01 Furthermore we have shown empirically that LadderQ exhibits 01 behavior even when the mean and variance of u are in nite LadderQ s stable and good 01 performance under all priority incre ment distributions for the Classic Hold and UpDown experiments for queue sizes ranging from 100 to 10 million events demonstrates that it is the superior structure for small to largescale discrete event simulation APPENDIX The pseudocode for LadderQ is given below void enqueue if TS gt TopStart insert into tail of Top NTop return l while TS lt RCur x ampamp x lt NRung x ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Ladder Queue 203 if x lt NRung found bucketk TS RStart X Bucketwidthx insert into tail of rung X bucketk NBucket x bucketk H return else if NBot gt THRES creat enewrung NBot transfer Bottom to it insert event in new rung bucketk TS RStart NRung BucketwidthNRung insert into tail of rung NRung bucketk NBucket NRung bucketk else insert into Bottom using sequential search NBot EVENT dequeue i if Bottom not empty return next event from Bottom if NRung gt 0 bucketk recurserung if last bucket then NRung sort from bucketk to Bottom return rst event from Bottom else Bucketwidth 1 MaxTS MinTS NTop TopStart MaxTS RStaIt 1 RCur 1 MinTS transfer Top to rung 1 of Ladder bucketk recurserung sort events from bucketk and copy to Bottom return rst event from Bottom BUCKET recurserung f indbucket find next nonempty bucket from lowest rung while NBucket NRung k 0 k RCur NRung Bucketwidth NRung if NBucketNRung k gt THRES ACM Transactions on Modeling and Computer Simulation Vol 15 No 3 July 2005 Cryptography is very old and very new 0 Crypto is an ancient discipline 0 Recall Julius Caesar Enigma CS 0 Crypto as a science modern cryptography has short but excitin histor Computer and Network Security 9 y 0 Most of it happened in the last 30 years 0 In this course we will study the basics of modern Alexandra Sasha Boldyreva cryptography 0 Modern cryptography means formal security models and definitions proofs etc Cryptography Introduction 0 We won t always be formal and often just discuss the intuition 0 Those who want to learn more and are comfortable with theory may take CS 6260 Applied Cryptography Main goals Of cryptography are Crypto is used by most people when 0 data privacy confidentiality 0 Doing online shopping and banking iv i 0 data authenticity it came from where it claims Talking on a cell phone 0 data integrity it has not been modified on the way 0 Watching satellite TV and payperview movies in the digital world Who used some cryptography recently Players and settings Players and settings pk A Pk A r t 0 t r 0 K K skr S R S R l Symmetrickey setting 2 Asymmetric publiclltey setting 5 Goals and primitives Symmetric vs publickey crypto o Symmetric schemes are easier to construct and implement less math is required goal setting symmetrickey asymmetrickey 0 Symmetric schemes are faster by 34 orders of magnitude 0 But how do parties agree on the shared key at the first place symmetric secretkey asymmetric public data PFIVBCY pton key encryption data authenticity digital signature message authentication code In egrlty MAC How good is a scheme 0 Trialanderrorquot approach 1 Try to find an attack 2 If an attack found then the scheme is insecure fix the scheme repeat ste 3 If no attack found then 0 Provable securityquot approach I show that ifan attack found a scheme is insecure then one can break some trusted assumption eg factoring I requires a definition of what secure means Symmetric encryption schemes 0 A scheme SE is specified by 3 algorithms REED Bil5 MsgSpm essage space KeySpkey space SEKED or SEKeySp CD I S R It is required that for every MeMsgSp and every KeKeySp CDK K MM One Time Pad 0 OneTimePad9C D MsgSp0ln Kreturn a random nbit string K KeySp0ln 0 HKM ChmBK return C 0 f6KC MkCEBK return M 0 Example M011111111011101 K110010011010100 C101101100001001 0 As the name suggests the scheme is to be used only once a new key must be used to encrypt a new message Perfect Shannon security M nformal An encryption scheme SEKED is perfectly secure if everything what can be learned about the message from a ciphertext can be learned without the ciphertext 0 Th1 OneTimePad is a Shannonsecure encryption scheme 0 Th2 Shannon s theorem optimality of OneTimePad If a scheme is perfectly secure then the key space cannot be smaller than the message space if KeySp0lk and MsgSp0lm then kZm and a key must be as long as the message we want to encrypt Higher Level Behaviors AI in videogames 5 10 of CPU for realtime 25 50 of CPU for turn based chase escape behaviors group behaviors nite state machines adaptation learning Questions to think about has AI in games lived up to the hype how good should the AI be why are people more fun than NPC s will networked games reduce AI new directions for AI in games Chase Evade algorithm for the predator apredator NPC 6 Prey user control Enhancements to Chase Speed Control velocity acceleration max min limited turning radius Randomness moves patterns pattern selector l g Er C9 Enhancements to Chase Anticipation building a model of user s behavior Jredator Group Behaviors lots of background characters to create a feeling of motions make area appear interesting Pro programmed Formations too tidy randomness g g gettlng 1nto formatlon y C O O O O turning corners O O O O O O O O O obstacles I 0 39I Flocking HalfLife Unreal gt centroid o l CZpiln compute trajectory to head towards centroid What might go wrong Group Behaviors 1r E Z EISSS7 Reaction to Neighbors Avoidance Desired Separation Distance Attraction Desired Position Desired Velocity current velocity kpel l 01 ill DOSitiOIl 1current velocity nominal velocity Craig Reynolds pursue evade wander obstacle avoidance wall path following queuing Combine behaviors with weights What could go wrong What might go wrong exactly aligned forces balance out in dead end does not handle changes in strategy Perceptual Models Range of Visibility Obstacle f React to n Nearest 3 within Range n 5 Circle of Obstacle In uence Production Rules if enemy in sight fire if big enemy in sight run away if if selecting among multiple rules priority weighting for rules or sensor events random selection no state in pure form Finite State Machines States action to take chase random walk eating Transitions time events completion of action J Problems with Finite State Machines predictable fuzzy probablistic simplistic hierarchies of fsm s Hal ife Probabilistic State Machines Personalities change probability that character will perform a given action under certain conditions aggressive passive Sight Attack 50 5 memory Evade 5 60 curiosity Random 10 20 fear Flock 20 10 anger Pattern 15 5 19958 soc1ab111ty Modify probabilities on the y LearningAdaptation For example increment aggressiveness if player is doing well Levels are a pre programmed version of adaptation Tuning Stability How might adaptation make play better or worse How might adaptation make play better or worse Do you want the monsters in Quake to get smarter as you get better Force user to live with the consequences of his her actions Can surprise the designer Creatures Pit AI creatures against each other to find bugs or tune actions Genetic Algorithms Creatures Cloak Dagger and DNA DNA for rules governing strategy record of performance rules for mutation cross over Use either for on line tuning or as part of development cycle Get players that are adapted to user s style What is good AI perceived as challenging by the user but in a fair way user surprised by the game but later understands why feeling that reality will provide answers able to make progress solving problem What games have used AI effectively CS 4803 Computer and Network Security Alexandra Sasha Boldyr eva RSA function and encryption The RSA system The basics 0 Def Let Nf 2 1 be integers The RSA function associated to Nf is the function RSAN f ZI QI Z fV defined RSANIf w wf mod N for all w E 21 Claim Let N 2 2 and ed e ZN be integers such that ed a 1 mod N Then the RSA functions RSAN e and RSAN d are both permutations on 211 and 1 inverses of each other ie RSA39QIe RSANId and 1 RSANId RSANIe 0 Proof For any erfQ modulo N RSANIdRSANIex a Xed a xed a xed mOd MN 5 x1 a x Similarly RSAN eRSAN dy E y 0 The RSA function associated to Nf can be efficiently computed using MODEXPfN algorithm Hence RSAN e is efficiently computable given Ne RSANlleQ RSANId is efficiently computable given Nd 1 But RSAN e RSAN C1 is believed hard without d for a proper choice of para meters good for crypto 0 Let s build algorithms that generate RSA parameters Claim There is an Ok2 time algorithm that on inputs N e where e e ZqiN and N lt 2k returns d E Z3301 satisfying ed a 1 mod N 0 The RSA modulus generator Algorithm mica k 1 lt 2 lt Repeat pi 21112Z1 7 1 qi 2121212 7 1 Until the following conditions are all true 7 TESTrPRlMELv 1 and TESTrPRIMEKq 1 7 0 q 7 2 1 g M N H m Return Npq o The randomexponent RSA generator Algorithm 6 k Napaqgt 1 Kama MHP1gtq1gt e lt1 Z d e MODVINVeM Return N e Npq 1 Often for efficiency we want e to be small eg 3 Then Algorithm Cfsak Repeat N404 ilCimdUC C ltp71andeltqil mew 1 gcd5q 1 1 M H 00 a 1q 1 dHMODrleaM Return ltltNgt5gtltNgtPgtd Hard RSArelated problem Def owkeaFor an adversary A consider an experiment Experiment Exp fcakeaA We1fiinqidlti 95 0 ziZnyezemod z 3 Away If I I then return 1 else return 0 The owkea advantage of A is defined as the probability of the above experiment outputting RSA assumption There is no efficient adversary whose owkea advantage is not negligible easy e x mod N easy with d hard without d 0 Let s study several known attacks that break RSA ie compute an inverse of the RSA function on random inputs without knowing the trapdoor X Some known attacks on RSA function Factoring the RSA modulus o If one can factor N ie compute pq st Npq then one can compute de1 mod plq1 o The best known algorithm to factor is GNFS Theorem RSA with low secret exponent Let Npq where qltplt2q and pq are prime Let dlt13N14 Then given Ne one can efficiently compute d Hastad s broadcast attack for RSA with low public exponent C1 C2 C3 Pkir szr sz d N g m3 mo X Pk N 3 If N1N2N3 are relatively 1 139 prime then by Chinese N2 Remainder Theorem can combine 004 C1 in3 mud N1 3 C2 m3 mod N2 Pk2N2 3 C m3de N Pk N 3 tofindC m3mod N1N2N3 3 3 I Since m3 lt N1 N2 N3 then m e V It seems natural to try to fix the problem eg by padding the messages before applying the RSA function However there are attacks known such as broadcast attack on padded RSA with low public exponents relatedmessage attack on RSA with low public exponent short pad attack goo that show that patching this problem is not easy and not such a d idea Theorem Let Npq be a kbit RSA modulus Then given k4 least or most significant bits of p one can efficiently factor N Theorem Let N be a kbit RSA modulus and let d be an RSA secret exponent Then given the k4 least significant bits of d one can efficiently recover all bits of d Reference httpcryptostanfordeduNdaboabstracts RSAattacksurveyhtm Algorithm Kiwi 1 lt 2 lt Repeat Until the following conditions are all true 124 7 2mg M New Return Npq plti2r1211 71 qiwrl 7 TESTrPRlME p 1 and TESTPRJMEQ 1 Plain RSA encryption scheme Algorithm Kia NJ 4 i Kim I 1 H171t11gt a v 1 ng dHMODelNVe Return M6 Npqd Algorithm Dw d C Algoritth Algorithm NIVI we Maw e3 Ki C e mod N M e Cd mod N Remm N 3 ll1 Remquot C Return M Plain RSA is not secure 0 Under the RSA assumption it is hard to recover a message given the public key and a ciphertext easy M cMe mod N easy with d hard without d 0 Nevertheless the plain RSA is not a good encryption scheme 0 Eg it is not INDCPA secure Why 0 One might try to add a random padding to a message before applying the RSA function but as we saw it does not necessarily helps RSAOAEP Output M iff z00 GH are hash functions RSAOAEP Hash functions G 01 En a 01quot H 01quot En a 01quot Algoritth Algon39thm imam we Iimm 3 xi r 3 0 1 Return Ne1vd s a Milo 69Gr t r69Hs Cltlt sllt gt2 mod N um c Algorithm w C W 5 mod N Fame Was sl lt r H Hs 69 M a xGr Parse M as Ml lz If 1 0 then netum Melse return L Security of RSA OAEP RSAOAEP has not been proven INDCCA secure But it is proven INDCCA secure assuming the RSA assumption and when GH are modeled as random oracles Assuming the RSA problem is hard RSAOAEP is INDCCA secure in the Random Oracle RC model CS 4803 Computer and Network Security Alexandra Sasha Boldyreva Hard problems for publickey crypto Discrete log As no encryption scheme besides the OneTimePad is unconditionally secure we need to find some building blocks hard problems assumptions about hardness of some problems to base security of our new encryption schemes on Block ciphers and their PRF security is not an option since now we don t have shared keys in the publickey asymmetrickey setting Let s consider the discrete log related problems and the RSA problem Discrete log related problems 0 Let G be a cyclic group and let m G The discrete logarithm function DLogG ga G o Zm takes a E G and returns i E Zm such that gi a 0 There are several computational problems related to this function 0 Discretelogarithm DL problem Computational DiffieHellman CDH problem 0 Decisional DiffieHellman DDH problem Problem Given Discrete Decisional Dif eHellman DL problem Lei Let G be a cyclic group and let m G Let g be a generator Consider the following experiment associated with an adversary A Experiment Exp g g A zm Xegw EHAX If g X then return 1 else return 0 The dladvantage of A is defined as the probability of the above experiment outputting l The discrete logarithm problem is said to be hard in G if the dladvantage of any adversary with reasonable resources is small Lei Let G be a cyclic group of order m Let g be a generator CDH Consider the following experiment associated with an adversary A Experiment Expg g l 0 The cdhadvantage of A is defined as the probability of the o The computational DiffieHellman CDH problem is said to be we Zm ye Zm X H g1 Y H gy Z lt AXY If Z gzy then return 1 else return 0 above experiment outputting 1 hard in G if the cdhadvantage of any adversary with reasonable resources is small DDH o Dief Let G be a cyclic group of order m Let g be a generator Consider the following experiments associated with an adversary A Experiment Exp 39RA Experiment Exp 39Wl I lt Zm I lt Zm y A zm y i 2m 2 lt 1y mod m z i Zm o XltL7 YltgyZltg 7 XltL7 YltgyZltg 7 deAXYZ deAXYZ 0 Return d Return d o The ddhadvantage ofA is defined as the difference between probabilities of outputting 0 in two experiments 0 The decisional DiffieHellman DDH problem is said to be hard in G if the ddhadvantage of any adversary with reasonable resources is small Relations between problems Fix a group and a generator Hardness of the problems depends on the choice of a group o For most groups there is an algorithm that solves the DL problem in OG12 Let s consider GZP for a prime p Claim DDH is easy Let p 2 3 be a prime let GZ and let g be a generator of G Then there is an adversary A with running time Op3 and ddhadvantage 12 Chapter 5 SYMMETRIC ENCRYPTION The symmetric setting considers two parties who share a key and will use this key to imbue commu nicated data with various security attributes The main security goals are privacy and authenticity of the communicated data The present chapter looks at privacy A later chapter looks at authen ticity Chapters 3 and 4 describe tools we shall use here 51 Symmetric encryption schemes The primitive we will consider is called an encryption scheme Such a scheme speci es an encryption algorithm which tells the sender how to process the plaintext using the key thereby producing the ciphertext that is actually transmitted An encryption scheme also speci es a decryption algorithm which tells the receiver how to retrieve the original plaintext from the transmission while possibly performing some veri cation too Finally there is a heygeneration algorithm which produces a key that the parties need to share The formal description follows De nition 51 A symmetric encryption scheme 55 IC D consists of three algorithms as follows 0 The randomized key generation algorithm 1C returns a string K We let KeysS denote the set of all strings that have non zero probability of being output by K The members of this set are called keys We write K lt1 C for the operation of executing 1C and letting K denote the key returned The encryption algorithm 5 which might be randomized or stateful takes a key K E KeysS and a plaintext M 6 01 to return a ciphertext C 6 01 U We write 0 i KM for the operation of executing 5 on K and M and letting C denote the ciphertext returned The deterministic decryption algorithm D takes a key K E KeysS and a ciphertext C 6 01 to return some M 6 01 U We write M e DKC for the operation of executing D on K and C and letting M denote the message returned The scheme is said to provide correct decryption if for any key K E KeysS and any message M 6 01 PrCTORDKCMClti KM 1 2 SYMMETRIC ENCRYPTION The key generation algorithm as the de nition indicates is randomized It takes no inputs When it is run it ips coins internally and uses these to select a key K Typically the key is just a random string of some length in which case this length is called the key length of the scheme When two parties want to use the scheme it is assumed they are in possession of K generated via 1C How they came into joint possession of this key K in such a way that the adversary did not get to know K is not our concern here and will be addressed later For now we assume the key has been shared Once in possession of a shared key the sender can run the encryption algorithm with key K and input message M to get back a string we call the ciphertext The latter can then be transmitted to the receiver The encryption algorithm may be either randomized or stateful If randomized it ips coins and uses those to compute its output on a given input K M Each time the algorithm is invoked it ips coins anew In particular invoking the encryption algorithm twice on the same inputs may not yield the same response both times We say the encryption algorithm is statefal if its operation depends on a quantity called the state that is initialized in some pre speci ed way When the encryption algorithm is invoked on inputs KM it computes a ciphertext based on KM and the current state It then updates the state and the new state value is stored The receiver does not maintain matching state and in particular decryption does not require access to any global variable or call for any synchronization between parties Usually when there is state to be maintained the state is just a counter If there is no state maintained by the encryption algorithm the encryption scheme is said to be stateless The encryption algorithm might be both randomized and stateful but in practice this is rare it is usually one or the other but not both When we talk of a randomized symmetric encryption scheme we mean that the encryption algorithm is randomized When we talk of a statefal symmetric encryption scheme we mean that the encryption algorithm is stateful The receiver upon receiving a ciphertext C will run the decryption algorithm with the same key used to create the ciphertext namely compute DKC The decryption algorithm is neither randomized nor stateful Many encryption schemes restrict the set of strings that they are willing to encrypt For example perhaps the algorithm can only encrypt plaintexts of length a positive multiple of some block length n and can only encrypt plaintexts of length up to some maximum length These kinds of restrictions are captured by having the encryption algorithm return the special symbol i when fed a message not meeting the required restriction In a stateless scheme there is typically a set of strings called the plaintext space such that Pr07 i Ci KMl 1 for all K and all M in the plaintext space In a stateful scheme whether or not KM returns L depends not only on M but also possibly on the value of the state variable For example when a counter is being used it is typical that there is a limit to the number of encryptions performed and when the counter reaches a certain value the encryption algorithm returns L no matter what message is fed to it The correct decryption requirement simply says that decryption works if a message M is encrypted under a key K to yield a ciphertext C then one can recover M by decrypting C under K This holds however only if C 7 L The condition thus says that for each key K E KeysS and message M 6 01 with probability one over the coins of the encryption algorithm either Bellare and Rogaway 3 the latter outputs i or it outputs a ciphertext C which upon decryption yields M If the scheme is stateful this condition is required to hold for every value of the state Correct decryption is naturally a requirement before one can use a symmetric encryption scheme in practice for if this condition is not met the scheme fails to communicate information accurately In analyzing the security of symmetric encryption schemes however we will see that it is sometimes useful to be able to consider ones that do not meet this condition 52 Some symmetric encryption schemes We now provide a few examples of encryption schemes We stress that not all of the schemes that follow are secure encryption schemes Some are secure and some are not as we will see later All the schemes here satisfy the correct decryption requirement 521 The onetime pad encryption scheme We begin with the classical one time pad Scheme 52 Onetime pad encryption The one timepad encryption scheme 55 IC D is stateful and deterministic The key generation algorithm simply returns a random k bit string K where the key length k is a parameter of the scheme so that the key space is KeysS 01k The encryptor maintains a counter ctr which is initially zero The encryption and decryption algorithms operate as follows algorithm KM algorithm DKltctrCgt Let static ctr e 0 Let m e Let m lt if ctr m gt k then return T if ctrmgtkthen returni MeCEBKctr1ctrm CeMEBKctr1ctrm returnM ctr lt ctr m return ctr 7 mCgt Here X2 9 denotes the i th through j th bit of the binary string X By ctr C we mean a string that encodes the number ctr and the string C The most natural encoding is to encode ctr using some xed number of bits at least lg k and to prepend this to C Conventions are established so that every string Y is regarded as encoding some ctrC for some ctr C The encryption algorithm XORs the message bits with key bits starting with the key bit indicated by one plus the current counter value The counter is then incremented by the length of the message Key bits are not reused and thus if not enough key bits are available to encrypt a message the encryption algorithm returns L Note that the ciphertext returned includes the value of the counter This is to enable decryption Recall that the decryption algorithm as per De nition 51 must be stateless and deterministic so we do not want it to have to maintain a counter as well I 522 Some modes of operation The following schemes rely either on a family of permutations ie a block cipher or a family of functions Effectively the mechanisms spell out how to use the block cipher to encrypt We call such a mechanism a mode of operation of the block cipher For some of the schemes it is convenient 4 SYMMETRIC ENCRYPTION algorithm KM if mod 71 7E 0 or 0 then return 1 Break M into n bit blocks M1 for 2 e 1 to m do 0m e EKM2 1gt C lt C1 Cm return C algorithm DKC if mod n 7 0 or C 0 then return 1 Break C into n bit blocks C1 Cm for 2 e 1 to m do Mm e Ewen return M Figure 51 ECB mode to assume that the length of the message to be encrypted is a positive multiple of a block length associated to the family Accordingly we will let the encryption algorithm returns 1 if this is not the case In practice one could pad the message appropriately so that the padded message always had length a positive multiple of the block length and apply the encryption algorithm to the padded message The padding function should be injective and easily invertible In this way you would create a new encryption scheme The rst scheme we consider is ECB Electronic Codebook Mode whose security is considered in Section 551 Scheme 53 ECB mode Let E C X 01 a 01 be a block cipher Operating it in ECB Electronic Code Book mode yields a stateless symmetric encryption scheme 55 IC D The key generation algorithm simply returns a random key for the block cipher meaning it picks a random string K iIC and returns it The encryption and decryption algorithms are depicted in Fig 51 Break M into n bit blocks M1 Mm means to set m and for 2 E 1 m set to the z th n bit block in M that is 7 1n 1 through in of M Similarly for breaking C into C1 Notice that this time the encryption algorithm did not make any random choices That does not mean it is not technically a randomized algorithm it is simply a randomized algorithm that happened not to make any random choices I The next scheme cipher block chaining CBC with random initial vector is the most popular block cipher mode of operation used pervasively in practice Scheme 54 CBCEB mode Let E C gtlt01 a 01 be a block cipher Operating it in CBC mode with random IV yields a stateless symmetric encryption scheme 55 IC D The key generation algorithm simply returns a random key for the block cipher K lt1 K The encryption and decryption algorithms are depicted in Fig 52 The lV initialization vector is C0 which Bellare and Rogaway 5 algorithm KM if mod 71 7E O or 0 then return 1 Break M into n bit blocks M1 C0 lt 1v lt1 01 for 2 e 1 to m do e EKC2 7 1 e9 C e C1 return IVCgt algorithm DKltIVCgt if mod n 7 O or 0 then return 1 Break C into n bit blocks C1 C0 lt IV for 2 lt 1 to m do MM 9 E1C2 e9 C2 7 1 return M Figure 52 CBC mode is chosen at random by the encryption algorithm This choice is made independently each time the algorithm is invoked I For the following schemes it is useful to introduce some notation If n 2 1 and 2 2 O are integers then we let denote the 71 bit string that is the binary representation of integer 2 mod 2 If we use a number 2 2 O in a context for which a string I 6 01 is required it is understood that we mean to replace 2 by I The following is a counter based version of CBC mode whose security is considered in Section 553 Scheme 55 CBCC mode Let E C X 01 a 01 be a block cipher Operating it in CBC mode with counter IV yields a stateful symmetric encryption scheme 55 IC D The key generation algorithm simply returns a random key for the block cipher K lt1 K The encryptor maintains a counter ctr which is initially zero The encryption and decryption algorithms are depicted in Fig 53 The IV initialization vector is CO which is set to the current value of the counter The counter is then incremented each time a message is encrypted The counter is a static variable meaning that its value is preserved across invocations of the encryption algorithm The CTR counter modes that follow are not much used to the best of our knowledge but perhaps wrongly so We will see later that they have good privacy properties In contrast to CBC the encryption procedure is parallelizable which can be exploited to speed up the process in the presence of hardware support It is also the case that the methods work for strings of arbitrary bit lengths without doing anything special to achieve this end There are two variants of CTR mode one random and the other stateful and as we will see later their security properties are different For security analyses see Section 57 and Section 5101 6 SYMMETRIC ENCRYPTION algorithm KM static ctr e 0 if mod n 7 O or 0 then return L Break M into n bit blocks M1 if ctr 2 2 then return i CO 9 IV e ctrln for 2 e 1 to m do H EKC2 7 1 e C lt C1 Cm ctr e ctr 1 return IVCgt algorithm DKltIVCgt if mod 71 7S 0 or C 0 then return i Break C into n bit blocks C1 Cm if IV m gt 2 then return L C0 lt IV for ie 1 to m do MM 9 EI1C2 e9 C2 7 1 return M Figure 53 CBCC mode Scheme 56 CTREB mode Let F C X 012 A O1L be a family of functions Possibly a block cipher but not necessarily Then CTR mode over F with a random starting point is a probabilistic stateless symmetric encryption scheme 55 IC D The key generation algo rithm simply returns a random key for E The encryption and decryption algorithms are depicted in Fig 54 The starting point R is used to de ne a sequence of values on which FK is applied to produce a pseudo one time pad77 to which the plaintext is XORed The starting point R chosen by the encryption algorithm is a random 6 bit string To add an 6 bit string R to an integer iiwhen we write FKR2 7convert the 6 bit string R into an integer in the range 0 2e 7 1 in the usual way add this number to 2 take the result modulo 2e and then convert this back into an 6 bit string Note that the starting point R is included in the ciphertext to enable decryption On encryption the pad Pad is understood to be the empty string when m 0 We now give the counter based version of CTR mode Scheme 57 CTRC mode Let F C X 012 A O1L be a family of functions Possibly a block cipher but not necessarily Operating it in CTR mode with a counter starting point is a stateful symmetric encryption scheme 55 IC D which we call CTRC The key generation algorithm simply returns a random key for F The encryptor maintains a counter ctr which is initially zero The encryption and decryption algorithms are depicted in Fig 55 Position index ctr is not allowed to wrap around the encryption algorithm returns L if this would happen The position index is included in the ciphertext in order to enable decryption The encryption algorithm Bellare and Rogaway 7 algorithm KM m a HMlLl Rio1l Pad HFKR 1 FKR2 FKRm Pad e the rst bits of Pad 0 lt M 69 Pad 0 HR H 0 return C algorithm DKC if C lt 6 then return 1 Parse C into R C where R 6 m H flClLl Pad HFKR 1 FKR2 FKRm Pad e the rst IC I bits of Pad M lt C 69 Pad return M Figure 54 CTR mode using a family of functions F C X 012 A O1L This version of counter mode is randomized and stateless algorithm KM static ctr e O m e HMILi lf ctr m 2 22 then return 1 Pad e FKctr 1 FKctr 2 Pad 9 the rst bits of Pad C lt M 69 Pad ctr e ctr m return ctr 7 mC l FKctr m algorithm DKlt2 C m e HalLi Pad e FK2 1 FKi2 FKG m Pad lt the rst C bits of Pad M lt Pad 69 C return M Figure 55 CTRC mode using a family of functions F C X 01 a O1L This version of counter mode uses stateful but deterministic encryption updates the position index upon each invocation and begins with this updated value the next time 8 SYMMETRIC ENCRYPTION it is invoked I We will return to the security of these schemes after we have developed the appropriate notions 53 Issues in privacy Let us x a symmetric encryption scheme 55 IC D Two parties share a key K for this scheme this key having being generated as K i IC The adversary does not a priori know K We now want to explore the issue of what the privacy of the scheme might mean For this chapter security is privacy and we are trying to get to the heart of what security is about The adversary is assumed able to capture any ciphertext that ows on the channel between the two parties It can thus collect ciphertexts and try to glean something from them Our rst question is what exactly does glean mean What tasks were the adversary to accomplish them would make us declare the scheme insecure And correspondingly what tasks were the adversary unable to accomplish them would make us declare the scheme secure It is easier to think about insecurity than security because we can certainly identify adversary actions that indubitably imply the scheme is insecure So let us begin here For example if the adversary can from a few ciphertexts derive the underlying key K it can later decrypt anything it sees so if the scheme allowed easy key recovery from a few ciphertexts it is de nitely insecure Now the mistake that is often made is to go on to reverse this saying that if key recovery is hard then the scheme is secure This is certainly not true for there are other possible weaknesses For example what if given the ciphertext the adversary could easily recover the plaintext M without nding the key Certainly the scheme is insecure then too So should we now declare a scheme secure if it is hard to recover a plaintext from the ciphertext Many people would say yes Yet this would be wrong too One reason is that the adversary might be able to gure out partial information about M For example even though it might not be able to recover M the adversary might given C be able to recover the rst bit of M or the sum of all the bits of M This is not good because these bits might carry valuable information For a concrete example say I am communicating to my broker a message which is a sequence of buy or sell decisions for a pre speci ed sequence of stocks That is we have certain stocks numbered 1 through in and bit i of the message is 1 ifl want to buy stock i and 0 otherwise The message is sent encrypted But if the rst bit leaks the adversary knows whether I want to buy or sell stock 1 which may be something I don t want to reveal If the sum of the bits leaks the adversary knows how many stocks 1 am buying Granted this might not be a problem at all if the data were in a different format However making assumptions or requirements on how users format data or how they use it is a bad and dangerous approach to secure protocol design An important principle of good cryptographic design is that the encryption scheme should provide security regardless of the format of the plaintext Users should not have to worry about the how they format their data they format it as they like and encryption should provide privacy nonetheless Put another way as designers of security protocols we should not make assumptions about data content or formats Our protocols must protect any data no matter how formatted We view it as the job of the protocol designer to ensure this is true Bellare and Rogaway 9 At this point it should start becoming obvious that there is an in nite list of insecurity proper ties and we can hardly attempt to characterize security as their absence We need to think about security in a different and more direct way and arrive at some de nition of it This important task is surprisingly neglected in many treatments of cryptography which will provide you with many schemes and attacks but never actually de ne the goal by saying what an encryption scheme is actually trying to achieve and when it should be considered secure rather than merely not known to be insecure This is the task that we want to address One might want to say something like the encryption scheme is secure if given C the adversary has no idea what M is This however cannot be true because of what is called a priori information Often something about the message is known For example it might be a packet with known headers Or it might be an English word So the adversary and everyone else has some information about the message even before it is encrypted We want schemes that are secure in the strongest possible natural sense What is the best we could hope for It is useful to make a thought experiment What would an ideal encryption be like Well it would be as though some angel took the message M from the sender and delivered it to the receiver in some magic way The adversary would see nothing at all lntuitively our goal is to approximate this as best as possible We would like encryption to have the properties of ideal encryption In particular no partial information would leak We do deviate from the ideal in one way though Encryption is not asked to hide the length of the plaintext string This information not only can leak but is usually supposed to be known to the adversary a priori As an example consider the ECB encryption scheme of Scheme 53 Given the ciphertext can an eavesdropping adversary gure out the message It is hard to see how since it does not know K and if F is a good block cipher then it ought to have a hard time inverting FK without knowledge of the underlying key Nonetheless this is not a good scheme Consider just the case n 1 of a single block message Suppose a missile command center has just two messages 1 for re and O for don t re It keeps sending data but always one of these two What happens When the rst ciphertext 01 goes by the adversary may not know what is the plaintext But then let us say it sees a missile taking off Now it knows the message M1 underlying C1 was 1 But then it can easily decrypt all subsequent messages for if it sees a ciphertext C the message is 1 ifC01 and Oniny Cl In a secure encryption scheme it should not be possible to relate ciphertexts of different messages of the same length in such a way that information is leaked Not allowing messageequalities to be leaked has a dramatic implication Namely encryption must be probabilistic or depend on state information If not you can always tell if the same message was sent twice Each encryption must use fresh coin tosses or say a counter and an encryption of a particular message may be different each time In terms of our setup it means 5 is a probabilistic or statefal algorithm That s why we de ned symmetric encryption schemes above to allow these types of algorithms The reason this is dramatic is that it goes in many ways against the historical or popular notion of encryption Encryption was once thought of as a code a xed mapping of plaintexts to ciphertexts But this is not the contemporary viewpoint A single plaintext should have many possible ciphertexts depending on the random choices or the state of the encryption algorithm Yet it must be possible to decrypt How is this possible We have seen several examples above One formalization of privacy is what is called perfect security an information theoretic notion introduced by Shannon and showed by him to be met by the onetime pad scheme and covered in 10 SYMMETRIC ENCRYPTION Chapter 2 Perfect security asks that regardless of the computing power available to the adversary the ciphertext provides it no information about the plaintext beyond the a priori information it had prior to seeing the ciphertext Perfect security is a very strong attribute but achieving it requires a key as long as the total amount of data encrypted and this is not usually practical So here we look at a notion of computational security The security will only hold with respect to adversaries of limited computing power If the adversary works harder she can gure out more but a feasible amount of effort yields no noticeable information This is the important notion for us and will be used to analyze the security of schemes such as those presented above 54 Indistinguishability under chosenplaintext attack Having discussed the issues in Section 53 above we will now distill a formal de nition of security 541 De nition The basic idea behind indistinguishability or more fully leftor riyht indistinguishability under a chosenplaintext attack is to consider an adversary not in possession of the secret key who chooses two messages of the same length Then one of the two messages is encrypted and the ciphertext is given to the adversary The scheme is considered secure if the adversary has a hard time telling which of the two messages was the one encrypted We will actually give the adversary a little more power letting her choose a whole sequence of pairs of equal length messages Let us now detail the game The adversary chooses a sequence of pairs of messages M01M11 M0qM1q where in each pair the two messages have the same length We give to the adversary a sequence of ciphertexts 01 Cq where either 1 C is an encryption of M0 for all 1 g i S q or 2 C is an encryption of MM for all 1 g i 3 1 In doing the encryptions the encryption algorithm uses the same key but fresh coins or an updated state each time The adversary gets the sequence of ciphertexts and now it must guess whether Mm Mo q were encrypted or M1 1 Ml q were encrypted To further empower the adversary we let it choose the sequence of message pairs via a chosen plaintext attack This means that the adversary chooses the rst pair then receives 01 then chooses the second pair receives 02 and so on Sometimes this is called an adaptive chosen plaintext attack because the adversary can adaptively choose each query in a way responsive to the earlier answers Let us now formalize this We x some encryption scheme 55 IC D It could be either stateless or stateful We consider an adversary A It is a program which has access to an oracle to which it can provide as input any pair of equal length messages The oracle will return a ciphertext We will consider two possible ways in which this ciphertext is computed by the oracle corresponding to two possible worlds in which the adversary lives To do this rst de ne the leftor riyht encryption oracle abbreviated lr encryption oracle KLR b as shown in Fig 56 The oracle encrypts one of the messages the choice of which being made according to the bit b Now the two worlds are as follows World 0 The oracle provided to the adversary is KLRO So whenever the adversary makes a query M0M1 with M0 M1 the oracle computes C lti KM0 and returns C as the answer Bellare and Rogaway 11 Oracle KLRM0M1b b e 01 and M0M1 e 0 l if M0 y M1 then return L C 1 KMb return C Figure 56 Left or right lor encryption oracle used to de ne IND CPA security of encryption scheme 55 IC D World 1 The oracle provided to the adversary is KLR So whenever the adversary makes a query M0M1 with M0 M1 to its oracle the oracle computes C i5KM1 and returns C as the answer We also call the rst world or oracle the left world or oracle and the second world or oracle the right world or oracle The problem for the adversary is after talking to its oracle for some time to tell which of the two oracles it was given Before we pin this down let us further clarify exactly how the oracle operates Think of the oracle as a subroutine to which A has access Adversary A can make an oracle query M0M1 by calling the subroutine with arguments M0M1 In one step the answer is then returned Adversary A has no control on how the answer is computed nor can A see the inner workings of the subroutine which will typically depend on secret information that A is not provided Adversary A has only an interface to the subroutineithe ability to call it as a black box and get back an answer First assume the given symmetric encryption scheme 55 is stateless The oracle in either world is probabilistic because it calls the encryption algorithm Recall that this algorithm is probabilistic Above when we say C lt1 KMb it is implicit that the oracle picks its own random coins and uses them to compute ciphertext C The random choices of the encryption function are somewhat under the rug77 here not being explicitly represented in the notation But these random bits should not be forgotten They are central to the meaningfulness of the notion and the security of the schemes If the given symmetric encryption scheme 55 is stateful the oracles in either world become stateful too Think of a subroutine that maintains a static variable across successive calls An oracle begins with a state value initialized to a value speci ed by the encryption scheme For example in CTRC mode the state is an integer ctr that is initialized to 0 Now each time the oracle is invoked it computes KMb according to the speci cation of algorithm 5 The algorithm may as a side effect update the state and upon the next invocation of the oracle the new state value will be used The following de nition associates to a symmetric encryption scheme 55 and an adversary A a pair of experiments one capturing each of the worlds described above The adversary s advantage which measures its success in breaking the scheme is the difference in probabilities of the two experiments returning the bit one De nition 58 Let SS IC D be a symmetric encryption scheme and let A be an algorithm that has access to an oracle We consider the following experiments 12 SYMMETRIC ENCRYPTION Experiment EXP gCpa1A Experiment Expglg Cpa0A KiK K K diA g LRbJ d i A5KLR0 Return 01 Return d The oracle used above is speci ed in Fig 56 The INDCPA advantage of A is de ned as Advglg8paA Pr Expglg Cpa1A 1 7 Pr Expglg Cpa0A 1 I As the above indicates the choice of which world we are in is made just once at the beginning before the adversary starts to interact with the oracle ln world 0 all message pairs sent to the oracle are answered by the oracle encrypting the left message in the pair while in world 1 all message pairs are answered by the oracle encrypting the right message in the pair The choice of which does not ip op from oracle query to oracle query If AdvglgcpaA is small meaning close to zero it means that A is outputting 1 about as often in world 0 as in world 1 meaning it is not doing a good job of telling which world it is in If this quantity is large meaning close to oneior at least far from zero then the adversary A is doing well meaning our scheme 55 is not secure at least to the extent that we regard A as reasonable lnformally for symmetric encryption scheme 55 to be secure against chosen plaintext attack the IND CPA advantage of an adversary must be small no matter what strategy the adversary tries However we have to be realistic in our expectations understanding that the advantage may grow as the adversary invests more effort in its attack Security is a measure of how large the advantage of the adversary might when compared against the adversary s resources We consider an encryption scheme to be secure against chosen plaintext attack if an adversary restricted to using practical amount of resources computing time number of queries cannot obtain signi cant advantage The technical notion is called leftor right indistinguishability under chosen plaintext attack denoted IND CPA We discuss some important conventions regarding the resources of adversary A The running time of an adversary A is the worst case execution time of A over all possible coins of A and all conceivable oracle return values including return values that could never arise in the experiments used to de ne the advantage Oracle queries are understood to return a value in unit time but it takes the adversary one unit of time to read any bit that it chooses to read By convention the running time of A also includes the size of the code of the adversary A in some xed RAM model of computation This convention for measuring time complexity is the same as used in other parts of these notes for all kinds of adversaries Other resource conventions are speci c to the IND CPA notion When the adversary asks its left or right encryption oracle a query M0 M1 we say that length of this query is maxM0 This will equal M0 for any reasonable adversary since an oracle query with messages of different lengths results in the adversary being returned 1 so we can assume no reasonable adversary makes such a query The total length of queries is the sum of the length of each query We can measure query lengths in bits or in blocks with block having some understood number of bits 71 The resources of the adversary we will typically care about are three First its timecomplexity measured according to the convention above Second the number of oracle queries meaning the number of message pairs the adversary asks of its oracle These messages may have different lengths and our third resource measure is the sum of all these lengths denoted Ia again measured according to the convention above Bellare and Rogaway 13 542 Alternative interpretation Let us move on to describe a somewhat different interpretation of left or right indistinguishability Why is Advglg8paA called the advantage of the adversary We can view the task of the adversary as trying to guess which world it is in A trivial guess is for the adversary to return a random bit In that case it has probability 12 of being right Clearly it has not done anything damaging in this case The advantage of the adversary measures how much better than this it does at guessing which world it is in namely the excess over 12 of the adversary s probability of guessing correctly In this subsection we will see how the above de nition corresponds to this alternative view a view that lends some extra intuition to the de nition and is also useful in later usages of the de nition Proposition 59 Let SS 1C 5 D be a symmetric encryption scheme and let A be an algorithm that has access to an oracle that takes input a pair of strings and returns a string We consider the following experiment Experiment ExpglgcpacgA b lt1 0 1 K lt1 K b ltiAzKLRb if b b then return 1 else return 0 Then Advg g39cpam 2 Pr Expglg39cpa39cgw 1 7 1 In the above experiment adversary A is run with an oracle for world b where the bit I is chosen at random A eventually outputs a bit b its guess as to the value of b The experiment returns 1 if A s guess is correct Thus Pr Expglg8pacgA 1 is the probability that A correctly guesses which world it is in The cg in the superscript naming the experiment stands for correct guess The probability is over the initial choice of world as given by the bit b the choice of K the random choices of K if any and the coins of A if any This value is 12 when the adversary deserves no advantage since one can guess I correctly by a strategy as simple as always answer zero or answer with a random bit The advantage of A can thus be viewed as the excess of this probability over 12 which re scaled is 2 Pr ExpglgCpaWA 1 71 The Proposition says that this rescaled advantage is exactly the same measure as before Proof of Proposition 59 We let Pr be the probability of event a in the experiment Expglg8pacgA and refer below to quantities in this experiment The claim of the Proposition follows by a straightforward calculation Pr Expg g39cpa39cm ll Pr b 27 Prbbb1 Prb1Prbb 270 Prb0 14 SYMMETRIC ENCRYPTION prbbb1 prbbbo 1 1 rgtrza1za154rrgtrza0b0a 1 1 Prb1za154r11rgtrza1b05 1 5Prb 1b17Prb 1b0 1 2 Pr Expg g39cpa m 1 7 Pr mpgmom 1 mlelehd 1 A 5 Advg g WA We began by expanding the quantity of interest via standard conditioning The term of 12 in the third line emerged because the choice of b is made at random In the fourth line we noted that if we are asking whether I b given that we know I 1 it is the same as asking whether 27 1 given I 1 and analogously for b 0 In the fth line and sixth lines we just manipulated the probabilities and simpli ed The next line is important here we observed that the conditional probabilities in question are exactly the probabilities that A returns 1 in the experiments of De nition 58 I 543 Why is this a good de nition Our thesis is that we should consider an encryption scheme to be secure if and only if it is IND CPA secure meaning that the above formalization captures our intuitive sense of privacy and the security requirements that one might put on an encryption scheme can be boiled down to this one But why Why does IND CPA capture privacy This is an important question to address and answer In particular here is one concern In Section 53 we noted a number of security properties that are necessary but not su icient for security For example it should be computationally infeasible for an adversary to recover the key from a few plaintext ciphertext pairs or to recover a plaintext from a ciphertext A test of our de nition is that it implies the necessary properties that we have discussed and others For example a scheme that is secure in the IND CPA sense of our de nition should also be automatically secure against key recovery or plaintextrecovery Later we will prove such things and even stronger things For now let us continue to get a better sense of how to work with the de nition by using it to show that certain schemes are insecure 55 Example chosenplaintext attacks We illustrate the use of our IND CPA de nition in nding attacks by providing an attack on ECB mode and also a general attack on deterministic stateless schemes 551 Attack on ECB Let us x a block cipher E C X 01 a 01 The ECB symmetric encryption scheme 55 IC D was described as Scheme 53 Suppose an adversary sees a ciphertext C KM Bellare and Rogaway 15 corresponding to some random plaintext M encrypted under the key K also unknown to the ad versary Can the adversary recover M 7 Not easily if E is a good block cipher For example if E is AES it seems quite infeasible Yet we have already discussed how infeasibility of recovering plaintext from ciphertext is not an indication of security ECB has other weaknesses Notice that if two plaintexts M and M agree in the rst block then so do the corresponding ciphertexts So an adversary given the ciphertexts can tell whether or not the rst blocks of the corresponding plain texts are the same This is loss of partial information about the plaintexts and is not permissible in a secure encryption scheme It is a test of our de nition to see that it captures these weaknesses and also nds the scheme insecure It does To show this we want to show that there is an adversary that has a high 1ND CPA advantage while using a small amount of resources We now construct such an adversary A Remember that A is given a lr encryption oracle KLR 317 that takes as input a pair of messages and that returns an encryption of either the left or the right message in the pair depending on the value of the bit b The goal of A is to determine the value of b Our adversary works like this Adversary A 91 ltI Rquotquotl gtgt M1 902 M0 9071111 C1C2 lt KLRM0M1b lf C1 C2 then return 1 else return 0 Above denotes the i th block of a string X a block being a sequence of n bits The adversary s single oracle query is the pair of messages M0 M1 Since each of them is two blocks long so is the ciphertext computed according to the ECB scheme Now we claim that Pr ExpglgcpalA1 land Pr Expg g39cpa390A 1 0 Why You have to return to the de nitions of the quantities in question and trace through the ex periments de ned there In world 1 meaning I 1 the oracle returns C1C2 EK0 EKO so C1 C2 and A returns 1 In world 0 meaning I 0 the oracle returns C1C2 EKO EK1 Since EK is a permutation C1 7 C2 So A returns 0 in this case Subtracting we get AdvglgcpaA 17 O 1 And A achieved this advantage by making just one oracle query whose length which as per our conventions is just the length of M0 is 2n bits This means that the ECB encryption schems is insecure As an exercise try to analyze the same adversary as an adversary against CBC or CTR modes and convince yourself that the adversary will not get a high advantage There is an important feature of this attack that must be emphasized Namely ECB is an insecure encryption scheme even if the underlying block cipher E is highly secure The weakness is not in the tool being used here the block cipher but in the manner we are using it It is the ECB mechanism that is at fault Even the best of tools are useless if you don t know how to properly use them This is the kind of design aw that we want to be able to spot and eradicate Our goal is to nd symmetric encryption schemes that are secure as long as the underlying block cipher is secure In other words the scheme has no inherent aw as long as you use good ingredients the recipe will produce a good meal If you don t use good ingredients Well that is your problem All bets are off 16 SYMMETRIC ENCRYPTION 552 Any deterministic stateless schemes is insecure ECB mode is deterministic and stateless so that if the same message is encrypted twice the same ciphertext is returned It turns out that this property in general results in an insecure scheme and provides perhaps a better understanding of why ECB fails Let us state the general fact more precisely Proposition 510 Let SS 1C 5 D be a deterministic stateless symmetric encryption scheme Assume there is an integer m such that the plaintext space of the scheme contains two distinct strings of length m Then there is an adversary A such that Advg g39cpam 1 Adversary A runs in time Om and asks just two queries each of length m I The requirement being made on the message space is minimal typical schemes have messages spaces containing all strings of lengths between some minimum and maximum length possibly restricted to strings of some given multiples Note that this Proposition applies to ECB and is enough to show the latter is insecure Proof of Proposition 510 We must describe the adversary A Remember that A is given an lr encryption oracle f KLRb that takes input a pair of messages and returns an encryption of either the left or the right message in the pair depending on the value of b The goal of A is to determine the value of b Our adversary works like this Adversary Af Let X Y be distinct m bit strings in the plaintext space 01 e KLRXY b 02 e KLRY Y b If C1 C2 then return 1 else return 0 Now we claim that Pr ExpglgcpalA 1 1 and Pr mpgmom 1 0 Why ln world 1 meaning I 1 the oracle returns 01 KY and C2 EKG and since the encryption function is deterministic and stateless C1 02 so A returns 1 ln world 0 meaning I 0 the oracle returns C1 KX and C2 EKG and since it is required that decryption be able to recover the message it must be that Cl 7E 02 So A returns 0 Subtracting we get Advglg CpaA 1 7 O 1 And A achieved this advantage by making two oracle queries each of whose length which as per our conventions is just the length of the rst message is m bits I Bellare and Rogaway 17 553 Attack on CBC encryption with counter IV Let us x a block cipher E C X 01 a 01 Let SS IC D be the corresponding counter based version of the CBC encryption mode described in Scheme 55 We show that this scheme is insecure The reason is that the adversary can predict the counter value To justify our claim of insecurity we present an adversary A As usual it is given an lr encryption oracle KLR 5 and wants to determine 5 Our adversary works like this Adversary A 91 ltI Rquotquotbgt Mo l lt 0n Ml l lt 0 M02 H 0 M12 H Onill IV1Cl lt1 5KLRMD1M115 ltlV2 02 i KLRM02M12 5 If C1 C2 then return 1 else return 0 We claim that Pr Exp g8palA I 1 and Pr mpgmom 1 0 Why First consider the case b 0 meaning we are in world 0 In that case lV1 O and 1V2 1 and C1 EKO and C2 EKG and so C1 7E C2 and the de ned experiment returns 0 On the other hand if b 1 meaning we are in world 1 then 1V1 O and lV21 1 and C1 EKO and C2 EKO so the de ned experiment returns 1 Subtracting we get Advglg CpaA 1 7 O 1 showing that A has a very high advantage Moreover A is practical using very few resources So the scheme is insecure 56 INDCPA implies PRCPA In Section 53 we noted a number of security properties that are necessary but not su icient for security For example it should be computationally infeasible for an adversary to recover the key from a few plaintext ciphertext pairs or to recover a plaintext from a ciphertext A test of our de nition is that it implies these properties in the sense that a scheme that is secure in the sense of our de nition is also secure against key recovery or plaintext recovery The situation is analogous to what we saw in the case of PRFs There we showed that a secure PRF is secure against key recovery In order to have some variation this time we choose a di erent property namely plaintext recovery We formalize this and then show if there was an adversary B capable of recovering the plaintext from a given ciphertext then this would enable us to construct an adversary A that broke the scheme in the IND CPA sense meaning the adversary can identify which of the two worlds it is in If the scheme is secure in the IND CPA sense that latter adversary could not exist and hence neither could the former The idea of this argument illustrates one way to evidence that a de nition is goodisay the de nition of left or right indistinguishability Take some property that you feel a secure scheme should have like infeasibility of key recovery from a few plaintext ciphertext pairs or infeasibility of predicting the XOR of the plaintext bits Imagine there were an adversary B that was successful at this task We should show that this would enable us to construct an adversary A that broke 18 SYMMETRIC ENCRYPTION the scheme in the original sense left or right indistinguishability Thus the adversary B does not exist if the scheme is secure in the left or right sense More precisely we use the advantage function of the scheme to bound the probability that adversary B succeeds Let us now go through the plaintext recovery example in detail The task facing the adver sary will be to decrypt a ciphertext which was formed by encrypting a randomly chosen challenge message of some length m In the process we want to give the adversary the ability to see plaintext ciphertext pairs which we capture by giving the adversary access to an encryption oracle This encryption oracle is not the lr encryption oracle we saw above instead it simply takes input a single message M and returns a ciphertext C i5KM computed by encrypting M To capture providing the adversary with a challenge ciphertext we choose a random m bit plaintext M com pute C e KM and give C to the adversary The adversary wins if it can output the plaintext M corresponding to the ciphertext C For simplicity we assume the encryption scheme is stateless and that 0 1m is a subset of the plaintext space associated to the scheme As usual when either the encryption or the challenge oracle invoke the encryption function it is implicit that they respect the randomized nature of the encryption function meaning the latter tosses coins anew upon each invocation of the oracle De nition 511 Let SS 1C 5 D be a stateless symmetric encryption scheme whose plaintext space includes 01m and let B be an algorithm that has access to an oracle We consider the following experiment pr spa Experiment Expsg B K lt1 K M lt1 0 1m C lt1 KM M lt1 BM 0 lfM M then return 1 else return 0 The PRCPA advantage of B is de ned as AdvglgcpaB Pr ExngCpaB 1 I In the experiment above B is executed with its oracle and challenge ciphertext C The adversary B wins if it can correctly decrypt C and in that case the experiment returns 1 In the process the adversary can make encryption oracle queries as it pleases The following Proposition says that the probability that an adversary successfully recovers a plaintext from a challenge ciphertext cannot exceed the IND CPA advantage of the scheme with resource parameters those of the plaintext recovery adversary plus the chance of simply guessing the plaintext In other words security in the IND CPA sense implies security in the PR CPA sense Proposition 512 INDCPA PRCPA Let SS IC D be a stateless symmetric en cryption scheme whose plaintext space includes 01m Suppose that B is a plaintext recovery adversary that runs in time t and asks at most 1 queries these queries totaling at most a bits Then there exists an adversary A such that 1 2 m Advggcpw s magEmu Bellare and Rogaway 19 Furthermore the running time of A is that of B plus Oumc where c bounds the length of the encryption of an m bit string A makes 1 1 oracle queries and these queries total at most 1 m bits I Proof of Proposition 512 As per De nition 58 adversary A will be provided an lr encryption oracle and will try to determine in which world it resides To do so it will run adversary B as a subroutine We provide the description followed by an explanation and analysis Adversary AgK LRquotquotbgtgt MO lti01m M1 i01m C e KLRM0M1b Run adversary B on input C replying to its oracle queries as follows When B makes an oracle query X to 9 do Y e KLRX X 27 return Y to B as the answer When B halts and outputs a plaintext M If M M1 then return 1 else return 0 Here A is running B and itself providing answers to B s oracle queries To make the challenge ciphertext C for B adversary A chooses random messages M0 and M1 and uses its lr oracle to get the encryption C of one of them When B makes an encryption oracle query X adversary A needs to return KX It does this by invoking its lr encryption oracle setting both messages in the pair to X so that regardless of the value of the bit b the ciphertext returned is an encryption of X just as B wants When B outputs a plaintext M adversary A tests whether M M1 and if so bets that it is in world 1 Otherwise it bets that it is in world 0 Now we claim that PrlExpg g39Cpa391A1l 3 AdvglgcpaB 51 Pr ExpglgCpa0A1 5 2 52 We will justify these claims shortly but rst let us use them to conclude Subtracting as per De nition 58 we get AdwglgCWA Pr Expg g39cpa391A 1 7 Pr Expg g39cpa390A 1 2 AdvglgcpaB 7 2 7 It remains to justify Equations 51 and 52 Adversary B will return M DKC with probability AdngngaB ln world 1 ciphertext C is an encryption of M1 so this means that M M1 with probability at least AdvglgcpaB and thus Equation 51 is true Now assume A is in world 0 In that case adversary A will return 1 only if B returns M M1 But B is given no information about M1 since C is an encryption of M0 and M1 is chosen randomly and independently of M0 It is simply impossible for B to output M1 with probability greater than 2 Thus Equation 52 is true I Similar arguments can be made to show that other desired security properties of a symmetric encryption scheme follow from this de nition For example is it possible that some adversary B 20 SYMMETRIC ENCRYPTION given some plaintext ciphertext pairs and then a challenge ciphertext C can compute the XOR of the bits of M DKC Or the sum of these bits Or the last bit of M lts probability of doing any of these cannot be more than marginally above 12 because were it so we could design an adversary A that won the left or right game using resources comparable to those used by B We leave as an exercise the formulation and working out of other such examples along the lines of Proposition 512 Of course one cannot exhaustively enumerate all desirable security properties But you should be moving towards being convinced that our notion of leftor right security covers all the natural desirable properties of security under chosen plaintext attack Indeed we err if anything on the conservative side There are some attacks that might in real life be viewed as hardly damaging yet our de nition declares the scheme insecure if it succumbs to one of these That is all right there is no harm in making our de nition a little demanding What is more important is that if there is any attack that in real life would be viewed as damaging then the scheme will fail the left or right test so that our formal notion too declares it insecure 57 Security of CTR modes Recall that the CTR counter mode of operation of a family of functions comes in two variants the randomized stateless version CTRC of Scheme 56 and the counter based stateful mechanism CTR of Scheme 57 Both modes achieve indistinguishability under a chosen plaintext attack but interestingly the quantitative security is a little different The difference springs from the fact that CTRC achieves perfect indistinguishability if one uses the random function family Funcn in the role of the underlying family of functions Fibut CTR would not achieve perfect indistin guishability even then because of the possibility that collisions would produce overlaps in the pseudo onetime pad We will state the main theorems about the schemes discuss them and then prove them For the counter version we have Theorem 513 Security of CTRC mode Let F ICgtltO1Z a O1L be a family of functions and let SS IC D be the corresponding CTRC symmetric encryption scheme as described in Scheme 57 Let A be an adversary for attacking the IND CPA security of 55 that runs in time at most t and asks at most 1 queries these totaling at most a L bit blocks Then there exists an adversary B attacking the PRF security of F such that Advg g39cpam g AdvfB Furthermore B runs in time at most 75 tOq La and asks at most 1 a oracle queries I Theorem 514 Security of CTREB mode Let F C X O1l a O1L be a block cipher and let SS IC D be the corresponding CTR symmetric encryption scheme as described in Scheme 56 Let A be an adversary for attacking the IND CPA security of 55 that runs in time at most t and asks at most 1 queries these totaling at most a L bit blocks Then there exists an adversary B attacking the PRF security of F such that 05 72 Advi E39CWA Adv ltBgt 22 Furthermore B runs in time at most t tOq La and asks at most 1 a oracle queries I Bellare and Rogaway 21 The above theorems exemplify the kinds of results that the provable security approach is about Namely we are able to provide provable guarantees of security of some higher level cryptographic construct in this case a symmetric encryption scheme based on the assumption that some building block in this case an underlying block is secure The above results are the rst example of the punch line we have been building towards So it is worth pausing at this point and trying to make sure we really understand what these theorems are saying and what are their implications If we want to entrust our data to some encryption mechanism we want to know that this encryption mechanism really provides privacy If it is ill designed it may not We saw this happen with ECB Even if we used a secure block cipher the aws of ECB mode make it an insecure encryption scheme Flaws are not apparent in CTR at rst glance But maybe they exist It is very hard to see how one can be convinced they do not exist when one cannot possible exhaust the space of all possible attacks that could be tried Yet this is exactly the di iculty that the above theorems circumvent They are saying that CTR mode does not have design flaws They are saying that as long as you use a good block cipher you are assured that nobody will break your encryption scheme One cannot ask for more since if one does not use a good block cipher there is no reason to expect security of your encryption scheme anyway We are thus getting a conviction that all attacks fail even though we do not even know exactly how these attacks might operate That is the power of the approach Now one might appreciate that the ability to make such a powerful statement takes work It is for this that we have put so much work and time into developing the de nitions the formal notions of security that make such results meaningful For readers who have less experience with de nitions it is worth knowing at least that the e ort is worth it It takes time and work to understand the notions but the payoffs are big you get signi cant guarantees of security How exactly are the theorems saying this The above discussion has pushed under the rug the quantitative aspect that is an important part of the results It may help to look at a concrete example Example 515 Let us suppose that F is the block cipher AES so that 6 L 128 Suppose I want to encrypt q 230 messages each being one kilobyte 213 bits long I am thus encrypting a total of 243 bits which is to say a 236 blocks This is about one terabyte Can I do this securely using CTR7 Let A be an adversary attacking the privacy of my encryption Theorem 514 says that there exists a B satisfying the stated conditions How large can Advgr sB be It makes 1 236 queries and it is consistent with our state of knowledge of the security of ABS to assume that such an adversary cannot do better than mount a birthday attack meaning its advantage is no more than 122128 Then the theorem tells us that 72 05 72 15 272 1 rndCpa AdVSE A lt 2128 2128 3 2E39 7 2128 This is a very small number indeed saying that our encryption is secure at least under the as sumption that the best attack on the PRF security of AES is a birthday attack Note however that if we encrypt 264 blocks of data all provable security has been lost I The example illustrates how to use the theorems to gure out how much security you will get from the CTR encryption scheme in a given application Note that as per the above theorems encrypting more than a 222 blocks of data with CTR is not secure regardless of the quality of F as a PRF On the other hand with CTRC it might be 22 SYMMETRIC ENCRYPTION algorithm 59M static ctr e O m e HMILi lf ctr m 2 22 then return L Pad 9 gctr 1 gctr 2 gctr m Pad 9 the rst bits of Pad C lt M 69 Pad ctr e ctr m return ctr 7 mC algorithm Dglt2 C m a TICILl PadHWJrl H9i2 H H9lt m Pad 9 the rst C bits of Pad M lt Pad 69 C return M Figure 57 Version S G 1C 5 D of the CTRC scheme parameterized by a family of functions G secure as long as F can withstand 7 queries This is an interesting and possibly useful distinction Yet in the setting in which such modes are usually employed the distinction all but vanishes For usually F is a block cipher and 6 L is its block length In that case we know from the birthday attack that the prf advantage of B may itself be as large as 6022 and thus again encrypting more than a 222 blocks of data is not secure However we might be able to nd or build function families F that are not families of permutations and preserve PRF security against adversaries making more than 222 queries 571 Proof of Theorem 513 The paradigm used is quite general in many of its aspects and we will use it again not only for encryption schemes but for other kinds of schemes that are based on pseudorandom functions An important observation regarding the CTR scheme is that the encryption and decryption operations do not need direct access to the key K but only access to a subroutine or oracle that implements the function FK This is important because one can consider what happens when FK is replaced by some other function To consider such replacements we reformulate the scheme We introduce a scheme that takes as a parameter any given family of functions G having domain 0 12 and range 0 1L As we will see later the cases of interest are C F and G Func L Let us rst however describe this parameterized scheme In the rest of this proof S G 1C 5 D denotes the symmetric encryption scheme de ned as follows The key generation algorithm simply returns a random instance of G meaning that it picks a function g lt10 from family G at random and views 9 as the key The encryption and decryption algorithms are shown in Fig 57 The scheme is stateful with the encryptor maintaining a counter that is initially zero As the description indicates the scheme is exactly CTRC except that function g is used in place Bellare and Rogaway 23 of FK This seemingly cosmetic change of viewpoint is quite useful as we will see We observe that the scheme in which we are interested and which the theorem is about is simply S F where F is our given family of functions as per the theorem Now the proof breaks into two parts The rst step removes F from the picture and looks instead at an idealized version of the scheme Namely we consider the scheme S Func L Here a random function g of bits to L bits is being used where the original scheme would use FK We then assess an adversary s chance of breaking this idealized scheme We argue that this chance is actually zero This is the main lemma in the analysis This step is de nitely a thought experiment No real implementation can use a random function in place of FK because even storing such a function takes an exorbitant amount of memory But this analysis of the idealized scheme enables us to focus on any possible weaknesses of the CTR mode itself as opposed to weaknesses arising from properties of the underlying block cipher We can show that this idealized scheme is secure and that means that the mode itself is good It then remains to see how this lifts to a real world in which we have no ideal random functions but rather want to assess the security of the scheme 55 that uses the given family F Here we exploit the notion of pseudorandomness to say that the chance of an adversary breaking the 55 can differ from its chance of breaking the ideal world scheme S Func L by an amount not exceeding the probability of breaking the pseudorandomness of F using comparable resources Lemma 516 Security of CTRC using a random function Let A be any IND CPA adver sary attacking S Func L where the scheme is depicted in Fig 57 Then indc a AdV5 anclLA 0quot The lemma considers an arbitrary adversary Let us say this adversary has time complexity 75 makes 1 queries to its lr encryption oracle these totaling 7 L bit blocks The lemma does not care about the values oft q or 7 Recall however that after encrypting a total of 22 blocks the encryption mechanism will shut up77 and be of no use It says the adversary has zero advantage meaning no chance at all of breaking the scheme The fact that no restriction is made on t indicates that the result is information theoretic it holds regardless of how much computing time the adversary invests Of course this lemma refers to the idealized scheme namely the one where the function 9 being used by the encryption algorithm is random But remember that ECB was insecure even in this setting The attacks we provided for ECB work even if the underlying cipher E is Permn the family of all permutations on 71 bit strings So the statement is not content free it is saying something quite meaningful and important about the CTR mode It is not true of all modes We postpone the proof of the lemma Instead we will rst see how to use it to conclude the proof of the theorem The argument here is quite simple and generic The lemma tells us that the CTRC encryption scheme is very secure when 9 is a random function But we are interested in the case where g is is an instance of our given family F So our worry is that the actual scheme S F is insecure even though the idealized scheme S Func L is secure In other words we worry that there might be an adversary having large IND CPA advantage in attacking S F even though we know that its advantage in attacking S Func L is zero But we claim that this is not possible if F is a secure PRF lntuitively the existence of such an adversary indicates that F is not approximating Func L since there is some detectable event namely the success probability of some adversary in a certain experiment that happens with high probability when F is used and with low probability when Func L is used To concretize this 24 SYMMETRIC ENCRYPTION intuition let A be a IND CPA adversary attacking 55 We associate to A an adversary B that is given oracle access to a function 9 012 a O1L and is trying to determine which world it is in where in world 0 g is a random instance of Func L and in world 1 g is a random instance of F We suggest the following strategy to the adversary It runs A and replies to A s oracle queries in such a way that A is attacking S Func L in B s world 0 and A is attacking S F in B s world 1 The reason it is possible for B to do this is that it can execute the encryption algorithm 59 of Fig 57 which simply requires access to the function g If the adversary A wins meaning it correctly identi es the encryption oracle B bets that g is an instance of F otherwise B bets that g is an instance of Func L We stress the key point that makes this argument work It is that the encryption function of the CTRC scheme invokes the function FK purely as an oracle If it had instead made some direct use of the key K the paradigm above would not work The full proof follows Proof of Theorem 513 Let A be any IND CPA adversary attacking SE IC D Assume A makes 1 oracle queries totaling 1 bits and has time complexity t There there is an adversary B such that AdvglgCpaA g 2 Advgrfw 53 Furthermore B will make a oracle queries and have timecomplexity that of A plus Oq La Now the statement of Theorem 513 follows Remember that B takes an oracle g 012 A 01L This oracle is either drawn at random from F or from Func L and B does not know which To nd out B will use A running it as a subroutine But remember that A too gets an oracle namely an lr encryption oracle From A s point of view this oracle is simply a subroutine A can write at some location a pair of messages and is returned a response by some entity it calls its oracle When B runs A as a subroutine it is B that will simulate the lr encryption oracle for A meaning B will provide the responses to any oracle queries that A makes Here is the description of B Adversary B9 b lt1 0 1 Run adversary A replying to its oracle queries as follows When A makes an oracle query M0M1 do 0 lt1 59Mb Return C to A as the answer Until A stops and outputs a bit b lf 5 b then return 1 else return 0 Here 59 denotes the encryption function of the generalized CTRC scheme that we de ned in Fig 57 The crucial fact we are exploiting here is that this function can be implemented given an oracle for g Adversary B itself picks the challenge bit 1 representing the choice of worlds for A and then sees whether or not A succeeds in guessing the value of this bit If it does it bets that g is an instance of F and otherwise it bets that g is an instance of Func L For the analysis we claim that 1 1 A Pr Expgrf 1B 1 i i Advg gfaA 54 r 1 1 in c a Pr Expgf 031l i i Adv5 anCM A 55 Bellare and Rogaway 25 We will justify these claims shortly but rst let us use them to conclude Subtracting we get Advgrfw Pr ExpflB 1 7 Pr Expgrf39 B 1 indc a 1 indc a Advsswf A g AdvsqanqLLnM 56gt MIHMIH Advgggm The last inequality was obtained by applying Lemma 516 which told us that the term Advglg ffia m A was simply zero Re arranging terms gives us Equation 53 Now let us check the resource usage Each computation 59Mb requires lelL applications of g and hence the total number of queries made by B to its oracle g is a The time complexity of B equals that of A plus the overhead for answering the oracle queries It remains to justify Equations 54 and 55 Adversary B returns 1 when I b meaning that IND CPA adversary A correctly identi ed the world I in which it was placed or in the language of Section 542 made the correct guess77 The role played by B s world is simply to alter the encryption scheme for which this is true When B is in world 1 the encryption scheme from the point of view of A is S F and when B is in world 0 the encryption scheme from the point of view of A is S Func L Thus using the notation from Section 542 we have Pr Expgrf391B 1 Pr Exp 39fa39cgA 1 Pr Expgrf39 B1l Pr Exp 39fampLA1l To obtain Equations 54 and 55 we can now apply Proposition 59 I For someone unused to PRF based proofs of security the above may seem complex but the under lying idea is actually very simple and will be seen over and over again It is simply that one can view the experiment of the IND CPA adversary attacking the encryption scheme as information about the underlying function 9 being used and if the adversary has more success in the case that g is an instance of F than that g is an instance of Func L then we have a distinguishing test between F and Func L Let us now prove the lemma about the security of the idealized CTRC scheme Proof of Lemma 516 The intuition is simple When 9 is a random function its value on successive counter values yields a onetime pad a truly random and unpredictable sequence of bits As long as the number of data bits encrypted does not exceed L22 we invoke 9 only on distinct values in the entire encryption process And if an encryption would result in more queries than this the algorithm simply shuts up so we can ignore this The outputs of g are thus random Since the data is XORed to this sequence the adversary gets no information whatsoever about it Now we must make sure that this intuition carries through in our setting Our lemma statement makes reference to our notions of security so we must use the setup in Section 54 The adversary A has access to an lr encryption oracle Since the scheme we are considering is S Func L the oracle is 59LR b where the function 59 was de ned in Fig 57 and g is a random instance of Func L meaning a random function 26 SYMMETRIC ENCRYPTION The adversary makes some number q of oracle queries Let MM Mm be the i th query and let m be the number of blocks in Mm We can assume this is the same as the number of blocks in Mm since otherwise the lr encryption oracle returns 1 Let Mmb be the value of the j th 6 bit block of M b for b 6 01 Let C be the response returned by the oracle to query MLOJVIZIJ It consists of a value that encodes the counter value together with m blocks of 6 bits each C 1 Ci Pictorially M1b M1b1lM1b1l r r r M1bm1l Cl lt0Cl1 01m1gt Mu M2b1lM2b2l r r gtM2blm2l 02 m1702l1l 02lm2lgt Mq b Mq b1Mq b2 Mq bmq Ct ltm1 mq7170ql1l qumqlgt What kind of distribution do the outputs received by A have We claim that the m1 mq values CED 1 q and j 1 m are randomly and independently distributed not only of each other but of the queried messages and the bit 5 and moreover this is true in both worlds Why Here is where we use a crucial property of the CTR mode namely that it XORs data with the value of g on a counter We observe that according to the scheme M 1U if we are in world 1 C 7 69 zlJl 9lm1 m2 1 all Mi om if we are in world 0 Now we can nally see that the idea we started with is really the heart of it The values on which 9 is being applied above are all distinct So the outputs of g are all random and independent It matters not then what we XOR these outputs with what comes back is just random This tells us that any given output sequence from the oracle is equally likely in both worlds Since the adversary determines its output bit based on this output sequence its probability of returning 1 must be the same in both worlds indc a l indc a O Pr EXP55anceLA ll Pr EXP55anceLA ll r Hence A s IND CPA advantage is zero I 572 Proof of Theorem 514 The proof of Theorem 514 reuses a lot of what we did for the proof of Theorem 513 above We rst look at the scheme when 9 is a random function and then use the pseudorandomness of the given family F to deduce the theorem As before we associate to a family of functions G having domain 0 12 and range 0 1L a parameterized version of the CTR scheme S G 1C 5 D The key generation algorithm simply returns a random instance of 0 meaning picks a function g 9 HO from family G at random and Views 9 as the key and the encryption and decryption algorithms are shown in Fig 58 Here is the main lemma Bellare and Rogaway 27 algorithm 59M m a TIMILl Ri01l PadegltR1gt HgltR2gt H HgltRmgt Pad e the rst bits of Pad 0 lt M 69 Pad 0 HR H 0 return C algorithm DgC if C lt 6 then return L Parse C into R C where R 6 m H HUILl Pad H91 31 H 903 H H 9Rm Pad e the rst IC I bits of Pad M lt C 69 Pad return M Figure 58 Version S G 1C D of the CTR scheme parameterized by a family of functions G Lemma 517 Security of CTR using a random function Let A be any IND CPA adversary attacking S Func L where the scheme is depicted in Fig 58 Then 05 72 S 22 7 d Advglqgflam A assuming A asks a number of queries whose total length is at most a L bit blocks I The proof of Theorem 514 given this lemma is easy at this point because it is almost identical to the above proof of Theorem 5137 and it is the subject of Problem 52 We go on to prove Lemma 517 Before we prove Lemma 5177 we will analyze a certain probabilistic game The problem we isolate here is purely probabilistic it has nothing to do with encryption or even cryptography Lemma 518 Let q be positive integers and let m1mq lt 22 also be positive integers Suppose we pick 1 integers r1 rq from 022 7 1 uniformly and independently at random We consider the following m1 mq numbers 17 27 7 T1m1 T217 T227 7 T2m2 Tq17 12 7 Tqmq7 28 SYMMETRIC ENCRYPTION where the addition is performed modulo 22 We say that a collision occurs if some two or more numbers in the above table are equal Then q1gtmlwmq Pr Col 3 2e 57gt where Col denotes the event that a collision occurs Proof of Lemma 518 As with many of the probabilistic settings that arise in this area this is a question about some kind of balls thrown in bins77 setting related to the birthday problem studied in the appendix on the birthday problem Indeed a reader may nd it helpful to study that appendix rst Think of having 22 bins numbered 01 22 7 1 We have 1 balls numbered 1 q For each ball we choose a random bin which we call n We choose the bins one by one so that we rst choose n then m and so on When we have thrown in the rst ball we have de ned the rst row of the above table namely the values n 1 n m1 Then we pick the assignment r2 of the bin for the second ball This de nes the second row of the table namely the values r2 1 7 2 m2 A collision occurs if any value in the second row equals some value in the rst row We continue up to the q th ball each time de ning a row of the table and are nally interested in the probability that a collision occurred somewhere in the process To upper bound this we want to write this probability in such a way that we can do the analysis step by step meaning view it in terms of having thrown and xed some number of balls and seeing whether there is a collision when we throw in one more ball To this end let Col denote the event that there is a collision somewhere in the rst 2 rows of the table for 2 1 q Let NoCoIZ denote the event that there is no collision in the rst 2 rows of the table for 2 1 q Then by conditioning we have Pr Col Pr Colq 7 Pr Colq1 Pr Colql NoCoIq1 Pr NoCoIq1 S Pr Colq71 Pr Colql NoCoIq71 l l q Pr Coll ZPr ColZ39I NoCoI 1 22 q ZPrCoI NoCoI 1 2392 Thus we need to upper bound the chance of a collision upon throwing the i th ball given that there was no collision created by the rst 2 7 1 balls Then we can sum up the quantities obtained and obtain our bound We claim that for any 2 2 q we have prCOZNOCOZ71 S Bellare and Rogaway 29 Let us rst see why this proves the lemma and then return to justify it From the above and Equation 58 we have q Pr Col 3 ZPr ColZ39I NoCoIZnll 22 q 2 71m m 71wm1 22 l s to q71gtltm1wmqgt 2e 39 A How did we do the last sum The term mi occurs with weight 2 7 1 in the i th term of the sum and then with weight 1 in the j th term of the sum for j 1 1 q So its total weight is F1qiq71 It remains to prove Equation 58 To get some intuition about it begin with the cases 2 12 When we throw in the rst ball the chance of a collision is zero since there is no previous row with which to collide so that is simple When we throw in the second what is the chance of a collision The question is what is the probability that one of the numbers r2 1 7 2 m2 de ned by the second ball is equal to one of the numbers n 1r1 m1 already in the table View n as xed Observe that a collision occurs if and only if n 7 mg 1 3 r2 3 r1 m1 7 1 So there are n m1 7 1 7 r1 7 mg 1 1 m1 m2 7 1 choices of r2 that could yield a collision This means that Pr CO2 NoCoI1 3 m2 m1 7 12 We need to extend this argument as we throw in more balls So now suppose 2 7 1 balls have been thrown in where 2 S 2 3 q and suppose there is no collision in the rst 2 71 rows of the table We throw in the i th ball and want to know what is the probability that a collision occurs We are viewing the rst 2 7 1 rows of the table as xed so the question is just what is the probability that one of the numbers de ned by n equals one of the numbers in the rst 2 7 1 rows of the table A little thought shows that the worst case meaning the case where the probability is the largest is when the existing 2 7 1 rows are well spread out We can upper bound the collision probability by reasoning just as above except that there are 2 7 1 different intervals to worry about rather than just one The i th row can intersect with the rst row or the second row or the third and so on up to the 7 1 th row So we get m rmii1m m21wm m 7171 22 11m m 71wm1i1 22 Pr Co NoCoI 1 S and Equation 58 follows by just dropping the negative term in the above I Let us now extend the proof of Lemma 516 to prove Lemma 517 Proof of Lemma 517 Recall that the idea of the proof of Lemma 516 was that when 9 is a random function its value on successive counter values yields a onetime pad This holds whenever g is applied on some set of distinct values In the counter case the inputs to g are always distinct In the randomized case they may not be distinct The approach is to consider the event that they are distinct and say that in that case the adversary has no advantage and on the other hand 30 SYMMETRIC ENCRYPTION while it may have a large advantage in the other case that case does not happen often We now ush all this out in more detail The adversary makes some number 1 of oracle queries Let MM llIM be the i th query and let m be the number of blocks in M 0We can assume this is the same as the number of blocks in NI since otherwise the lr encryption oracle returns T Let lli bb be the value of the j th L bit block of NI for b 6 01 Let C be the response returned by the oracle to query MM It consists of the encoding of a number n 6 022 7 1 and a mi block message 0 CAI Ci Pictorially M117 Ml meLb M1bm1l C1 ltT1Cl1 Cllmllgt Mu M2bl1lM2bl2l M2bm2l 02 rzyozlllmozlmzl Mqb Mqb1qub2l Mqbmql Ct ltTqvoql1l r r 39 qumqlgt Let NoCoI be the event that the following m1 mq values are all distinct n1 27 7 T1m1 T21 T227 7 T2m2 Tq17 12 7 Tqmq Let Col be the complement of the event NoCoI meaning the event that the above table contains at least two values that are the same It is useful for the analysis to introduce the following shorthand Pro The probability of event 7 in world 0 Pro The probability of event 7 in world 1 We will use the following three claims which are proved later The rst claim says that the probability of a collision in the above table does not depend on which world we are in Claim 1 Pr1Co Pro Col D The second claim says that A has zero advantage in winning the left or right game in the case that no collisions occur in the table Namely its probability of outputting one is identical in these two worlds under the assumption that no collisions have occurred in the values in the table Claim 2 Pro A 1 NoCoI Prl A 1 NoCoI D Bellare and Rogaway 31 We can say nothing about the advantage of A if a collision does occur in the table It might be big However it will suf ce to know that the probability of a collision is small Since we already know that this probability is the same in both worlds Claim 1 we bound it just in world 0 2 Claim 3 Pro Col 5 B Let us see how these put together complete the proof of the lemma and then go back and prove them Proof of Lemma given Claims It is a simple conditioning argument Advgg lam A Pr1A17Pr0A1 7 Prl A 1 Col Prl Col Pr1 A 1 NoCoI Pr1NoCo 7 Pro A 1 Col Pr0Col 7 Pro A 1 NoCoI Pro NoCoI Prl A 1 Col 7 Pro A 1 Co Pro Col 3 Pro Col The second last step used Claims 1 and 2 In the last step we simply upper bounded the parenthe sized expression by 1 Now apply Claim 3 and we are done D It remains to prove the three claims Proof of Claim 1 The event NoCoI depends only on the random values r1 rq chosen by the encryption algorithm 59 These choices however are made in exactly the same way in both worlds The difference in the two worlds is what message is encrypted not how the random values are chosen D Proof of Claim 2 Given the event NoCoI we have that in either game the function g is evaluated at a new point each time it is invoked Thus the output is randomly and uniformly distributed over 0 1L independently of anything else That means the reasoning from the counter based scheme as given in Lemma 516 applies Namely we observe that according to the scheme MI 19 if we are in world 1 0M 7 9mm 69 2 l llIwb if we are in world 0 Thus each cipher block is a message block XORed with a random value A consequence of this is that each cipher block has a distribution that is independent of any previous cipher blocks and of the messages D Proof of Claim 3 This follows from Lemma 518 We simply note that m1 mq 039 D This concludes the proof I 58 Security of CBC with a random IV In this section we show that CBC encryption using a random IV is IND CPA secure as long as E is a block cipher that is a secure PRF or PRP Namely we show 32 SYMMETRIC ENCRYPTION algorithm 59 if mod 71 7E O or 0 then return T Break M into n bit blocks M1 C0 lt IV lt1 01 for 2 e 1 to m do e gC2 7 1 e C e C1 return IVCgt algorithm DgltlV 1 return T Figure 59 Version S G IC D of the CBC scheme parameterized by a family of functions G Theorem 519 Security of CBCEB mode Let E C X 01 a 01 be a block cipher and let SS IC D be the corresponding CBC symmetric encryption scheme as described in Scheme 56 Let A be an adversary for attacking the IND CPA security of 55 that runs in time at most t and asks at most 1 queries these totaling at most a 71 bit blocks Then there exists an adversary B attacking the PRF security of E such that indspa prf 72 Advsg A S AdvE BW Furthermore B runs in time at most t t Oq m7 and asks at most 1 a oracle queries I To prove this theorem we proceed as before to introduce a scheme that takes as a parameter any given family of functions G having domain and range 01 The cases of interest are C E and G Funcnn The algorithms of the scheme are depicted in Fig 59 Note that the decryption algorithm simply returns T so that this scheme does not have the correct decryption property But one can still discuss its security and it is important for us to do so Now the main result is the information theoretic one in which the underlying function family is Funcnn Lemma 520 Security of CBC using a random function Let A be any IND CPA adversary attacking S Funcnn where the scheme is depicted in Fig 59 Then 2 d 0 AdvlCIleCCSl iuncmm S W 7 assuming A asks a number of queries whose total length is at most a 71 bit blocks I Given this lemma the proof of Theorem 519 follows in the usual way so our main task is to prove the lemma This is postponed for now Bellare and Rogaway 33 59 Indistinguishability under chosenciphertext attack So far we have considered privacy under chosen plaintext attack Sometimes we want to consider privacy when the adversary is capable of mounting a stronger type of attack namely a chosen ciphertext attack In this type of attack an adversary has access to a decryption oracle It can feed this oracle a ciphertext and get back the corresponding plaintext How might such a situation arise One situation one could imagine is that an adversary at some point gains temporary access to the equipment performing decryption It can feed the equipment ciphertexts and see what plaintexts emerge We assume it cannot directly extract the key from the equipment however If an adversary has access to a decryption oracle security at rst seems moot since after all it can decrypt anything it wants To create a meaningful notion of security we put a restriction on the use of the decryption oracle To see what this is let us look closer at the formalization As in the case of chosen plaintext attacks we consider two worlds World 0 The adversary is provided the oracle KLRO as well as the oracle DKQ World 1 The adversary is provided the oracle KLR1 as well as the oracle DKQ The adversary s goal is the same as in the case of chosen plaintext attacks it wants to gure out which world it is in There is one easy way to do this Namely query the lr encryption oracle on two distinct equal length messages M0 M1 to get back a ciphertext C and now call the decryption oracle on C If the message returned by the decryption oracle is M0 then the adversary is in world 0 and if the message returned by the decryption oracle is M1 then the adversary is in world 1 The restriction we impose is simply that this call to the decryption oracle is not allowed More generally call a query C to the decryption oracle illegitimate if C was previously returned by the lr encryption oracle otherwise a query is legitimate We insist that only legitimate queries are allowed In the formalization below the experiment simply returns 0 if the adversary makes an illegitimate query We clarify that a query C is legitimate if C is returned by the lr encryption oracle after C was queried to the decryption oracle This restriction still leaves the adversary with a lot of power Typically a successful chosen ciphertext attack proceeds by taking a ciphertext C returned by the lr encryption oracle modifying it into a related ciphertext C and querying the decryption oracle with C The attacker seeks to create 0 in such a way that its decryption tells the attacker what the underlying message M was We will see this illustrated in Section 510 below The model we are considering here might seem quite arti cial If an adversary has access to a decryption oracle how can we prevent it from calling the decryption oracle on certain messages The restriction might arise due to the adversary s having access to the decryption equipment for a limited period of time We imagine that after it has lost access to the decryption equipment it sees some ciphertexts and we are capturing the security of these ciphertexts in the face of previous access to the decryption oracle Further motivation for the model will emerge when we see how encryption schemes are used in protocols We will see that when an encryption scheme is used in many authenticated key exchange protocols the adversary effectively has the ability to mount chosen ciphertext attacks of the type we are discussing For now let us just provide the de nition and exercise it De nition 521 Let SS IC D be a symmetric encryption scheme let A be an algorithm that has access to two oracles and let I be a bit We consider the following experiment 34 SYMMETRIC ENCRYPTION Experiment Expg g39cm39bA K amp 1c 13 A5KLR39 J7 DK39 If A queried DKQ on a ciphertext previously returned by KLRZJ then return 0 else return I The IND00A advantage of A is de ned as Advig g39cmm Pr Expg g39cm391A 1 7 Pr Expg g39cm390A 1 The conventions with regard to resource measures are the same as those used in the case of chosen plaintext attacks In particular the length of a query M0M1 to the lr encryption oracle is de ned as the length of MD We consider an encryption scheme to be secure against chosen ciphertext attack if a reason able adversary cannot obtain signi cant advantage in distinguishing the cases I O and b 1 given access to the oracles where reasonable re ects its resource usage The technical notion is called indistinguishability under chosen ciphertext attack denoted lND CCA 510 Example chosenciphertext attacks Chosen ciphertext attacks are powerful enough to break all the standard modes of operation even those like CTR and CBC that are secure against chosen plaintext attack The onetime pad scheme is also vulnerable to a chosen ciphertext attack our notion of perfect security only took into account chosen plaintext attacks Let us now illustrate a few chosen ciphertext attacks 5101 Attacks on the CTR schemes Let F C X 01 a 012 be a family of functions and let SE IC D be the associated CTR symmetric encryption scheme as described in Scheme 56 The weakness of the scheme that makes it susceptible to a chosen ciphertext attack is the following Say ltrCgt is a ciphertext of some 6 bit message M and we ip bit 2 of C resulting in a new ciphertext r C Let M be the message obtained by decrypting the new ciphertext Then M equals M with the i th bit ipped You should check that you understand why Thus by making a decryption oracle query of r C one can learn M and thus M In the following we show how this idea can be applied to break the scheme in our model by guring out in which world an adversary has been placed Proposition 522 Let F C X 01 a 012 be a family of functions and let SS IC D be the corresponding CTR symmetric encryption scheme as described in Scheme 56 Then Advg g39mt1 1n e 1 for t On 6 plus the time for one application of F The advantage of this adversary is 1 even though it uses hardly any resources just one query to each oracle That is clearly an indication that the scheme is insecure Bellare and Rogaway 35 Proof of Proposition 522 We will present an adversary algorithm A having timecomplexity t making 1 query to its lr encryption oracle this query being of length 6 making 1 query to its decryption oracle this query being of length n 6 and having Advg g39mm 1 The Proposition follows Remember that the lr encryption oracle KLR 27 takes input a pair of messages and returns an encryption of either the left or the right message in the pair depending on the value of b The goal of A is to determine the value of b Our adversary works like this Adversary A5KLR39 J7 DK39gt M0 lt 02 M1 lt 12 T70 H 5KLRM0 M1 5 C lt C 69 12 M e DKltltnogtgt If M M0 then return 1 else return 0 The adversary s single lr encryption oracle query is the pair of distinct messages M0M1 each one block long It is returned a ciphertext rC It ips the bits of C to get C and then feeds the ciphertext AC to the decryption oracle lt bets on world 1 if it gets back M0 and otherwise on world 0 Notice that r C y ltrC so the decryption query is legitimate Now we claim that 1 Pr ExpE g CC 1A 1 Pr Expg g39m390A 1 0 Hence Advglg8paA 17 O 1 And A achieved this advantage by making just one lr encryption oracle query whose length which as per our conventions is just the length of M0 is 6 bits and just one decryption oracle query whose length is n 6 bits assuming an encoding of ltrX as n X bits So Advglgcpat1 171 6 1 Why are the two equations claimed above true You have to return to the de nitions of the quantities in question as well as the description of the scheme itself and walk it through In world 1 meaning I 1 let ltrC denote the ciphertext returned by the lr encryption oracle Then 0 FKltT1M1 FKT11 Now notice that M DKltTC gtgt FKr1C FKT1C1 FKr 1 69 FKr 1 a 1 1 69 1 ol M0 36 SYMMETRIC ENCRYPTION Thus the decryption oracle will return M0 and A will return 1 ln world 0 meaning I 0 let ltrC denote the ciphertext returned by the lr encryption oracle Then 0 FKT1M0 FKr1o Now notice that M DKltltT70gtgt FKU 1 0 FKT1 06912 FKU 1 e9 FKT 1 69 0 e9 1 1 M1 Thus the decryption oracle will return M1 and A will return 0 meaning will return 1 with probability zero I An attack on CTRC cf Scheme 57 is similar and is left to the reader 5102 Attack on CBC Let E C X 01 a 01 be a block cipher and let SS IC D be the associated CBC symmetric encryption scheme as described in Scheme 54 The weakness of the scheme that makes it susceptible to a chosen ciphertext attack is the following Say IVC1 is a ciphertext of some n bit message M and we ip bit 2 of the IV resulting in a new ciphertext IVCm Let M be the message obtained by decrypting the new ciphertext Then M equals M with the i th bit ipped You should check that you understand why by looking at Scheme 54 Thus by making a decryption oracle query of IVCm one can learn M and thus M In the following we show how this idea can be applied to break the scheme in our model by guring out in which world an adversary has been placed Proposition 523 Let E C X 01 a 01 be a block cipher and let SE 1C 5 D be the corresponding CBC encryption scheme as described in Scheme 54 Then Advg g39cmt1n12n 1 for t On plus the time for one application of F The advantage of this adversary is 1 even though it uses hardly any resources just one query to each oracle That is clearly an indication that the scheme is insecure Proof of Proposition 523 We will present an adversary A having timecomplexity t making 1 query to its lr encryption oracle this query being of length 71 making 1 query to its decryption oracle this query being of length 2n and having Advg g39mm 1 The proposition follows Remember that the lr encryption oracle KLR 27 takes input a pair of messages and returns an encryption of either the left or the right message in the pair depending on the value of b The goal of A is to determine the value of b Our adversary works like this Bellare and Rogaway 37 Adversary ASK LR39mb DK39gt M0 H on M1 lt 1 Well KltLRltMOM15 IV lt 1v 1n M e DKltIVC1gt If M M0 then return 1 else return 0 The adversary s single lr encryption oracle query is the pair of distinct messages M0M1 each one block long It is returned a ciphertext ltIVC1gt It ips the bits of the IV to get a new IV IV and then feeds the ciphertext ltIV C1gt to the decryption oracle It bets on world 1 if it gets back M0 and otherwise on world 0 It is important that ltIV C1gt y ltIVC1gt so the decryption oracle query is legitimate Now we claim that l l H Pr ExpE g CC 1A 1 Pr Expg g39m390A 1 0 Hence Advg gccaA 17 O 1 And A achieved this advantage by making just one lr encryption oracle query whose length which as per our conventions is just the length of M0 is 71 bits and just one decryption oracle query whose length is 2n bits So Advg g39cmt 171 1271 1 Why are the two equations claimed above true You have to return to the de nitions of the quantities in question as well as the description of the scheme itself and walk it through In world 1 meaning I 1 the lr encryption oracle returns IV C1gt with on EKIVM1 EKIV1 Now notice that M DKlt1V 7C11gtgt EI1C1 69 IV Eg1EKIV1 IV IV1 IV O IV1 ivein on MO gt gt Thus the decryption oracle will return M0 and A will return 1 In world 0 meaning I 0 the lr encryption oracle returns ltIVC1gt with C1 EKIV a MD EKIV a 01 Now notice that M DK lt1V C1gt Eg1011v Eg1EKIV a on 69 iv 38 SYMMETRIC ENCRYPTION IV a 0 e9 IV O IV a 0 69 IV 691 1 M1 Thus the decryption oracle will return M1 and A will return 0 meaning will return 1 with probability zero I 51 1 Historical notes The pioneering work on the theory of encryption is that of Goldwasser and Micali 3 with re ne ments by 4 2 This body of work is however in the asymmetric ie public key setting and uses the asymptotic framework of polynomial time adversaries and negligible success probabilities The treatment of symmetric encryption we are using is from In particular De nition 51 and the concrete security framework are from The analysis of the CTR and CBC mode encryption schemes as given in Theorems 513 514 and 519 is also from The approach taken to the analysis of CBC mode is however new 512 Problems Problem 51 Formalize a notion of security against key recovery for symmetric encryption schemes and prove an analog of Proposition 512 I Problem 52 Using the proof of Theorem 513 as a template prove Theorem 514 assuming Lemma 517 I Problem 53 Devise a secure extension to CBC mode that allows messages of any bit length to be encrypted Clearly state your encryption and decryption algorithm Your algorithm should be simple should look like77 CBC mode as much as possible and it should coincide with CBC mode when the message being encrypted is a multiple of the blocklength How would you prove your algorithm secure I CS 4803 Computer and Network Security Alexandra Sasha Boldyreva Hash functions Hash functions A hash function is a function whose output is shorter than its input SHAl o1lt25 ao1160 Standardized by NIST Design principles are similar to that of other hash functions MD4 and MDS proposed by Rivest The inputs are first padded and divided by blocks Then an iterated chaining compression function is applied known as MerkIeDamgaord transform M1 W W 5 LE Security of hash functions 0 What security properties a good hash function H should have 0 Collisionresistance very informally nobody should be able to efficiently find M1M2 st HM1HM2 0 Onewayness very informally nobody given h should be able to find M st HMh Looking for collisions Let s recall learn the birthday paradox Applying the birthdayattack strategy and some additional analysis one can see that after making 1 s 2N hash computations one can find collisions with probability close to 1 Here N is the size of the range So for SHAl approximately 280 trials will suffice
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'