New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here


by: Sterling Schmeler


Marketplace > University of Pittsburgh > ComputerScienence > CS1541 > INTROTOCOMPUTERARCHITECTURE
Sterling Schmeler
GPA 3.95


Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Class Notes
25 ?




Popular in Course

Popular in ComputerScienence

This 77 page Class Notes was uploaded by Sterling Schmeler on Monday October 26, 2015. The Class Notes belongs to CS1541 at University of Pittsburgh taught by SangyeunCho in Fall. Since its upload, it has received 28 views. For similar materials see /class/229379/cs1541-university-of-pittsburgh in ComputerScienence at University of Pittsburgh.

Similar to CS1541 at Pitt

Popular in ComputerScienence




Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 10/26/15
Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Lecture Notes on the Status of IEEE Standard 754 for Binary FloatingPoint Arithmetic Prof W Kahan Elect Eng amp Computer Science University of California Berkeley CA 947201776 Introduction Twenty years ago anarchy threatened floatingpoint arithmetic Over a dozen commercially significant arithmetics boasted diverse wordsizes precisions rounding procedures and overunderflow behaviors and more were in the works Portable software intended to reconcile that numerical diversity had become unbearably costly to develop Eleven years ago when IEEE 754 became official major microprocessor manufacturers had already adopted it despite the challenge it posed to implementors With unprecedented altruism hardware designers rose to its challenge in the belief that they would ease and encourage a vast burgeoning of numerical software They did succeed to a considerable extent Anyway rounding anomalies that preoccupied all of us in the 1970s afflict only CRAYs now Now atrophy threatens features of IEEE 754 caught in a vicious circle Those features lack support in programming languages and compilers so those features are mishandled andor practically unusable so those features are little known and less in demand and so those features lack support in programming languages and compilers To help break that circle those features are discussed in these notes under the following headings Representable Numbers Normal and Subnormal Infinite and NaN 2 Encodings Span and Precision 34 MultiplyAccumulate a Mixed Blessing 5 Exceptions in General Retrospective Diagnostics 6 Exception Invalid Operation NaNs 7 Exception Divide by Zero Infinities 10 Digression on Division by Zero Two Examples 10 Exception Over ow 14 Exception Underflow 15 Digression on Gradual Underflow an Example 16 Exception Inexact 18 Directions of Rounding 18 Precisions of Rounding 19 The Baleful Influence of Benchmarks a Proposed Benchmark 20 Exceptions in General Reconsidered a Suggested Scheme 23 Ruminations on Programming Languages 29 Annotated Bibliography 30 Insofar as this is a status report it is subject to change and supersedes versions with earlier dates This version supersedes one distributed at a panel discussion of FloatingPoint Past Present and Future in a series of San Francisco Bay Area Computer History Perspectives sponsored by Sun Microsystems Inc in May 1995 A Post Script version is accessible 39 as http llhttp r berkelem quot 39 39 4113 Page 1 This Annnmant mo nrantarl nn39Ha UrnmaKTnlzar A H A Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Representable Numbers IEEE 754 specifies three types or Formats of floatingpoint numbers Single Fortran39s REAL4 C39s float Obligatory Double Fortran s REAL8 C39s double Ubiquitous and DoubleExtended Fortran REAL10 C39s long double Optional A fourth QuadruplePrecision format is not specified by IEEE 754 but has become a defacto standard among several computer makers none of whom support it fully in hardware yet so it runs slowly at best Each format has representations for NaNs N otaNumber ice Infinity and its own set of finite real numbers all of the simple form 2klN n with two integers n signed Signi cand and k unbiased signed Exponent that run throughout two intervals determined from the format thus K1 Exponent bits 1 2K lt k lt 2K N Significantbits 2N lt n lt 2N Table of Formats Parameters Format Bytes K1 N Single 4 8 24 Double 8 11 53 DoubleExtended 2 10 2 15 2 64 Quadruple 16 15 113 This concise representation 2k139N n unique to IEEE 754 is deceptively simple At first sight it appears potentially ambiguous because if n is even dividing n by 2 a rightshift and then adding 1 to k makes no difference Whenever such an ambiguity could arise it is resolved by minimizing the exponent k and thereby maximizing the magnitude of significand n this is Normalization which if it succeeds permits a Normal nonzero number to be expressed in the form 2k139N n iZk 1 f with a nonnegative fraction f lt 1 Besides these Normal numbers IEEE 754 has Subnormal Denormalized numbers lacking or suppressed in earlier computer arithmetics Subnormals which permit Underflow to be Gradual are nonzero numbers with an unnormalized significand n and the same minimal exponent k as is used for 0 Subnormal 2k139Nn 2k 0 f has k 2 2K and 0 lt n lt 2N391 so 0ltflt 1 K Thus where earlier arithmetics had conspicuous gaps between 0 and the tiniest Normal numbers i2 2 2 IEEE 754 fills the gaps with Subnormals spaced the same distance apart as the smallest Normal numbers Subnormals Normalized Numbers gt O l l K K K Powers of 2 22 2 23 2 24 2 Consecutive Positive Floating Point Numbers Page 2 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm IEEE 754 encodes floatingpoint numbers in memory not in registers in ways first proposed by LB Goldberg in Comm ACM 1967 1056 it packs three fields with integers derived from the sign exponent and significand of a number as follows The leading bit is the sign bit1 0 for and l for The next Kl bits hold a biased exponent The last N or Nl bits hold the significand s magnitude To simplify the following table the significand n is dissociated from its sign bit so that 11 may be treated as nonnegative Encodings of i2k139Nn into Binary Fields Number Type Sign Bit K1 bit Exponent Nth bit N1 bits of Signi cand NaNs 7 binary 111111 1 binary 1xxxxxx SNaNs 7 binary 111111 1 nonzero binary Oxxxxxx lnfinities r binary 111111 1 O Normals r k1 2K 1 nonnegative n ZN 1 lt 2N 1 Subnonnals r O 0 positive n lt ZN 1 Zeros r O O O Note that 0 and 0 are distinguishable and follow obvious rules specified by IEEE 754 even though floating point arithmetical comparison says they are equal there are good reasons to do this some of them discussed in my 1987 paper Branch Cuts The two zeros are distinguishable arithmetically only by either divisionby zero producing appropriately signed infinities or else by the CopySign function recommended by IEEE 754 854 lnfinities SNaNs NaNs and Subnormal numbers necessitate four more special cases IEEE Single and Double have no Nth bit in their significant digit fields it is implicit 680x0 ix87 Extendeds have an explicit Nth bit for historical reasons it allowed the Intel 8087 to suppress the normalization of subnormals advantageously for certain scalar products in matrix computations but this and other features of the 8087 were later deemed too arcane to include in IEEE 754 and have atrophied NonExtended encodings are all Lexicographically Ordered which means that if two floatingpoint numbers in the same format are ordered say x lt y then they are ordered the same way when their bits are reinterpreted as SignMagnitude integers Consequently processors need no floatingpoint hardware to search sort and window floatingpoint arrays quickly However some processors reverse byteorder Lexicographic order may also ease the implementation of a surprisingly useful function NextAfterx y which delivers the neighbor of x in its floatingpoint format on the side towards y Algebraic operations covered by IEEE 754 namely and Binary ltgt Decimal Conversion with rare exceptions must be Correctly Rounded to the precision of the operation s destination unless the programmer has specified a rounding other than the default If it does not Overflow a correctly rounded operation s error cannot exceed half the gap between adjacent floatingpoint numbers astride the operation s ideal unrounded result Halfway cases are rounded to Nearest Even which means that the neighbor with last digit 0 is chosen Besides its lack of statistical bias this choice has a subtle advantage it prevents prolonged drift during slowly convergent iterations containing steps like these While do y xz x yz A consequence of correct rounding and Gradual Underflow is that the calculation of an expression X Y for any algebraic operation produces if finite a result X Y l B u where Iul cannot exceed half the smallest gap between numbers in the destination s format and IBI lt Z39N and Bu 2 0 u 0 only when Underflow occurs This characterization constitutes a weak model of roundoff used widely to predict error bounds for software The model characterizes roundoff weakly because for instance it cannot confirm that in the absence of OverUnder ow or division by zero 1 S xx2 yz S 1 despite five rounding errors though this is true and easy to prove for IEEE 754 harder to prove for most other arithmetics and can fail on a CRAY YMP Page 3 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm The following table exhibits the span of each floatingpoint formaL and its precision both as an upper bound 239N upon relative error B and in Significant Decimals and Precision of IEEE 754 Formats 49 E324 22 E308 18 E308 111 E16 17 S 36 S 34 E4932 212 E4932 S 42 E20 218 21 Entries in this table come from the following formulas Min Positive Subnormal 23 39 2K 39 N Min Positive Normal 22 39 2K Max Finite 1 12N 22K Sig Dec at least floorN1Log102 sig dec at most ceil 1 N Log102 sig dec The precision is bracketed within a range in order to characterize how accurately conversion between binary and decimal has to be implemented to conform to IEEE 754 For instance 6 9 Sig Dec for Single means that1 in the absence of OVERUNDERFLOW If a decimal string with at most 6 sig dec is converted to Single and then converted back to the same number of sig dec then the final string should match the original Also If a Single Precision floatingpoint number is converted to a decimal string with at least 9 sig dec and then converted back to Single then the final number must match the original Most microprocessors that support floatingpoint onchip and all that serve in prestigious workstations support just the two REAL4 and REAL8 floatingpoint formats In some cases the registers are all 8 bytes wide and REAL4 operands are converted on the fly to their REAL8 equivalenm when they are loaded into a register in such cases immediately rounding to REAL4 every REAL8 result of an operation upon such converted operands produces the same result as if the operation had been performed in the REAL4 format all the way But Motorola 680x0 based Macintoshes and Intel ix86based PCs with ix87based not Weitek s 1167 or 3167 floatingpoint behave quite differently they perform all arithmetic operations in the Extended format regardless of the operands widths in memory and round to whatever precision is called for by the setting of a control word Only the Extended format appears in a 680x0 s eight oatingpoint flat registers or an ixS7 s eight floating point stackregisters so all numbers loaded from memory in any other formaL floatingpoint or integer or BCD are converted on the fly into Extended with no change in value All arithmetic operations enjoy the Extended range and precision Values stored from a register into a narrower memory format get rounded on the fly and may also incur OVERUNDERFLOW Since the register s value remains unchanged unless popped off the ixS7 s stack misconstrued ambiguities in manuals or illconsidered optimizations cause some compilers sometimes wrongly to reuse that register s value in place of what was stored from it this subtle bug will be reexamined later under Precisions of Rounding below Since the Extended format is optional in implementations of IEEE 754 most chips do not offer it it is available only on Intel s x86x87 Pentium P6 and their clones by AMD and Cyrix on Intel s 80960 KB on Motorola s6804060 or earlier 680x0 with 688812 coprocessor and on Motorola s 88110 all with 64 sig Page 4 Work in Progress Lecture Notes on the Status of lEEE 754 May 31 1996 244 pm bis and 15 bits of exponent but in words that may be 80 or 96 or 128 bits wide when stored in memory This format is intended mainly to help programmers enhance the integrity of their Single and Double software and to attenuate degradation by roundoff in Double matrix computations of larger dimensions and can easily be used in such a way that substituting Quadruple for Extended need never invalidate its use However language support for Extended is hard to find MultiplyAccumulate a Mixed Blessing The IBM Power PC and Apple Power Macintosh both derived from the IBM RS6000 architecture purport to conform to lEEE 754 but too often use a Fused MultiplyAdd instruction in a nonconforming way The idea behind a MultiplyAdd or MAC for MultiplyAccumulate instruction is that an expression like iab i c be evaluated in one instruction so implemented that scalar products like 211b1 212b2 213b3 aLbL can be evaluated in about L3 machine cycles Many machines have a MAC Beyond that1 a Fused MAC evaluates iab i c with just one rounding error at the end This is done not so much to roughly halve the rounding errors in a scalar product as to facilitate fast and correctly rounded division without much hardware dedicated to it To compute q xy correctly rounded it suffices to have hardware approximate the reciprocal ly to several sig bis by a value t looked up in a table and then improve t by iteration thus t l tyt Each such iteration doubles the number of correct bis in t at the cost of two MACs until t is accurate enough to produce q tx To round q correctly its remainder r x qy must be obtained exactly this is what the Fused in the Fused MAC is for It also speeds up correctly rounded square root decimal ltgt binary conversion and some transcendental functions These and other uses make a Fused MAC worth putting into a computer39s instruction set If only division and square root were at stake we might do better merely to widen the multiplier hardware slightly in a way accessible solely to microcode as Tl does in its SPARC chips A Fused MAC also speeds up a grubby DoubledDouble approximation to QuadruplePrecision arithmetic by unevaluated sums of pairs of Doubles lts advantage comes about from a Fused MAC39s ability to evaluate any product ab exactly first let p ab rounded off then compute c ab p exactly in another Fused MAC so that ab p c exactly without roundoff Fast but grubby DoubleDouble undermines the incentive to provide QuadruplePrecision correctly rounded in lEEE 75439s style Fused MACs generate anomalies when used to evaluate ab i cd in two instructions instead of three Which of ab and cd is evaluated and therefore rounded first Either way important expectations can be thwarted For example multiplying a complex number by its complex conjugate should produce a real number but it might not with a Fused MAC lf W q2 pr is real in the absence of roundoff then the same is expected for SQRT qq Pr despite roundoff but perhaps not with a Fused MAC Therefore Fused MACs cannot be used indiscriminately there are a few programs that contain a few assignment statements from which Fused MACs must be banned By design a Fused MAC always runs faster than separate multiplication and add so compiler writers with one eye on benchmarks based solely upon speed leave programmers no way to inhibit Fused MACs selectively within expressions nor to ban them from a selected assignment statement ldeally some locution like redundant parentheses should be understood to control the use of Fused MACs on machines that have them For instance in Fortran AB CD and CD AB should always round AB first AB CD should inhibit the use of a Fused MAC here Something else is needed for C whose Macro Preprocessor often insinuates hordes of redundant parentheses Whatever expedient is chosen must have no effect upon compilations to machines that lack a Fused MAC a separate compiler directive at the beginning of a program should say whether the program is intended solely for machines with or solely for machines without a Fused MAC Page 5 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Exceptions in General Designers of operating systems tend to incorporate all traphandling into their handiwork thereby burdening floatingpoint exceptionhandling with unnecessary and intolerable overheads Better designs should incorporate all floatingpoint traphandling into a runtime math library along with logarithms and cosines which the operating system merely loads To this end the operating system has only to provide default handlers in case the loaded library neglects traphandling and secure trap revectoring functions for libraries that take up that duty and later at the end of a task relinquish it To Disable an exception s trap is to let the numeric coprocessor respond to every instance of that exception by raising its Flag and delivering the result specified as its Default in IEEE 754 For example the default result for 3000 is co with the same sign as that 00 The raised flag stays raised until later set down by the program perhaps after it has been sensed IEEE 754 allows for the possibility that raised flags be nonnull pointers but most microprocessors keep one or two bis per flag in a Status register whose sensing and clearing fall outside the scope of these notes The same goes for bis in a Control register that EnableDisable traps see manuals for your chip and for the programming environment e g compiler that concerns you CAUTION Do not change enable or disable exception traps in a way contrary to what is expected by your programming environment or application program lest unpredictable consequences ensue for lack of a handler or its action Disputes over exceptions are unavoidable In fact one definition for Exception is Event for which any policy chosen in advance will subsequently give some reasonable person cause to take exception A common mistake is to treat exceptions as errors and punish the presumed perpetrators usually punishment falls not upon deserving perpetrators but upon whatever interested parties happen to be in attendance later when exceptions arise from what was perpetrated error or not Exceptions that reveal errors are merely messengers What turns an exception into an error is bad exception handling Attempts to cope decently with all exceptions inevitably run into unresolved dilemmas sooner or later unless the computing environment provides what I call Retrospective Diagnostics These exist in a rudimentary form in Sun Microsystems39 operating system on SPARCs The idea is to log in the sense of a ship39s log every suspicious event that is noticed during a computation These events are logged not by time of occurrence which could fill a disk very soon but by site in a program A hashing scheme ensures that events repeated at the same site will perhaps update but certainly not add entries to the log Neither need an exception that occurs while its flag is still raised by a previous exception of the same kind add a new entry to the log After a program finishes if it ever does its user may be notified discreetly if a dangerous flag like INVALID is still raised then that flag can serve as a pointer to its entry in the log The log cannot grow intolerably long so unusual entries stand out and point to whatever software module put them there The user of that software can then identify the module in question and ask its supplier whether an entry is something to worry about Page 6 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Exception INVALID operation Signaled by the raising of the INVALID flag whenever an operations operands lie outside its domain this exception s default delivered only because any other real or infinite value would most likely cause worse confusion is NaN which means Not a Number IEEE 754 specifies that seven invalid arithmetic operations shall deliver a NaN unless they are trapped real Negative 0eo 0000 eoeo REMAINDERAnything 00 REMAINDER co Anything ltgtltgt ltgtltgt when signs agree but ltgtltgt co 2 co when signs agree Conversion from floatingpoint to other formats can be INVALID too if their limits are violated even if no NaN can be delivered NaN also means Not any Number NaN does not represent the set of all real numbers which is an interval for which the appropriate representation is provided by a scheme called Interval Arithmetic NaN must not be confused with Undefined On the contrary IEEE 754 defines NaN perfectly well even though most language standards ignore and many compilers deviate from that definition The deviations usually afflict relational expressions discussed below Arithmetic operations upon NaNs other than SNaNs see below never signal INVALID and always produce NaN unless replacing every NaN operand by any finite or infinite real values would produce the same finite or infinite floatingpoint result independent of the replacements For example 0NaN must be NaN because 0ltgtltgt is an INVALID operation NaN On the other hand for hypotx y Vxx yy we find that hypotltgtltgt y ltgtltgt for all real y finite or not and deduce that hypotltgtltgt NaN 2 ltgtltgt too naive implementations of hypot may do differently NaNs were not invented out of whole cloth Konrad Zuse tried similar ideas in the late 1930s Seymour Cray built Indefinites into the CDC 6600 in 1963 then DEC put Reserved Operands into their PDPll and VAX But nobody used them because they trap when touched NaNs do not trap unless they are Signaling SNaNs which exist mainly for political reasons and are rarely used NaNs propagate through most computations Consequently they do get used Perhaps NaNs are widely misunderstood because they are not needed for mathematical analysis whose sequencing is entirely logical they are needed only for computation with temporal sequencing that can be hard to revise harder to reverse NaNs must conform to mathematically consistent rules that were deduced not invented arbitrarily in 1977 during the design of the Intel 8087 that preceded IEEE 754 What had been missing from computation but is now supplied by NaNs is an opportunity not obligation for software especially when searching to follow an unexceptional path no need for exotic control structures to a point where an exceptional event can be appraised after the event when additional evidence may have accrued Deferred judgments are usually better judgments but not always alas Whenever a NaN is created from nonNaN operands IEEE 754 demands that the INVALID OPERATION flag be raised but does not say whether a flag is a word in memory or a bit in a hardware Status Word That flag stays raised until the program lowers it The Motorola 680x0 also raises or lowers a transient flag that pertains solely to the last floatingpoint operation executed The Sticky flag mandated by IEEE 754 allows programmers to test it later at a convenient place to detect previous INVALID operations and compensate for them rather than be forced to prevent them However Microsoft s C and C compilers defeat that purpose of the INVALID flag by using it exclusively to detect floating point stack overflows so programmers cannot use it via library functions clear87 and status87 for their own purposes This agrant violation of IEEE 754 appears not to weigh on Microsoft s corporate conscience So far as lknow Borland39s C C and Pascal compilers do not abuse the INVALID flag that way Page 7 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm While on the subject of miscreant compilers we should remark their increasingly common tendency to reorder operations that can be executed concurrently by pipelined computers C programmers may declare a variable volati l e to inhibit certain reorderings A programmer39s intention is thwarted when an alleged optimization moves a floatingpoint instruction past a procedurecall intended to deal with a flag in the floatingpoint status word or to write into the control word to alter trapping or rounding Bad moves like these have been made even by compilers that come supplied with such procedures in their libraries See icontr0187 clear87 and istatus87 in compilers for Intel processors Operations movements would be easier to debug if they were highlighted by the compiler in its annotated relisting of the sourcecode Meanwhile so long as compilers mishandle attempts to cope with floatingpoint exceptions flags and modes in the ways intended by IEEE Standard 754 frustrated rogrammers will abandon such attempts and compiler writers will infer wrongly that unexercised capabilities are unexercised for lack of demand IEEE 75439s specification for NaN endows it with a field of bits into which software can record say how andor where the NaN came into existence That information would be extremely helpful for subsequent Retrospective Diagnosis of malfunctioning computations but no software exists now to employ it Customarily that field has been copied from an operand NaN to the result NaN of every arithmetic operation or filled with binar 1000000 when a new NaN was created by an untrapped INVALID operation For lack of software to exploit it1 that custom has been atrophying 680x0 and ix87 treat a NaN with any nonzero binary 0xxxxxx in that field as an SNaN Signaling NaN to fulfill a requirement of IEEE 754 An SNaN may be moved copied without incident but any other arithmetic operation upon an SNaN is an INVALID operation and so is loading one onto the ix87 s stack that must trap or else produce a new nonsignaling NaN Another way to turn an SNaN into a NaN is to tuni 0xxxxxx into lxxxxxx with a logical OR Intended for among other things data missing from statistical collections and for uninitialized variables SNaNs seem preferable for such purposes to zeros or haphazard traces left in memory by a previous program However no more will be said about SNaNs here Were there no way to get rid of NaNs they would be as useless as Indefinites on CRAYs as soon as one were encountered computation would be best stopped rather than continued for an indefinite time to an Indefinite conclusion That is why some operations upon NaNs must deliver nonNaN results Which operations Disagreements about some of them are inevitable but that grants no license to resolve the disagreements by making arbitrary choices Every real not logical function that produces the same floatingpoint result for all finite and infinite numerical values of an argument should yield the same result when that argument is NaN Recall hypot above The exceptions are C predicates 2 2 and x x which are respectively 1 and 0 for eve infinite or finite number x but reverse if x is Not a Number NaN these provide the only simple unexceptional distinction between NaNs and numbers in languages that lack a word for NaN and a predicate IsNaNx Over optimizing compilers that substitute 1 for 2 2 violate IEEE 754 IEEE 754 assigns values to all relational expressions involving NaN In the syntax of C the predicate x1 y is True butallothers x lt y 2 lt y 2 y 2 gt y and x gt y are False whenever x or y or both are NaN and then all but 2 y and 2 y are INVALID operations too and must so signal Ideally expressions 2 lt y 2 lt y 2 gt y x gt y and x gtlt y shouldbe valid and quietly True if x or y or both are NaN but arbiters of taste and fashion for ANSI StandardC have refused to recognize such expressions Inany event 1 x lt y differs from 2 gt y when NaN is involved though rude compilers optimize the difference away Worse some compilers mishandle NaNs in all relational expressions Some language standards conflict with IEEE 754 For example APL specifies 10 for 0000 this specification is one that APL39s designers soon regretted Sometimes naive compiletime optimizations replace expressions xx by 1 wrong if x is zero ltgtltgt or NaN and x x by 0 wrong if x is co or NaN and 0x and Ox by 0 wrong if alas Page 8 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Ideally certain other Real expressions unmentioned by IEEE 754 should signal INVALID and deliver NaNs some examples in Fortran syntax are Negativelloninteger LOGNegative ASINBigger than 1 SIIIoo ACOSHLess than 1 all of them INVALID These expressions do behave that way if implemented well in software that exploits the transcendental functions built into the 680x0 and ix87 here i387 and successors work better than 8087 and 80287 A number of real expressions are sometimes implemented as INVALID by mistake or declared Undefined by ill considered language standards a few examples are 0000 oo00 NaN00 10 not NaN COS20120 09258790228548378673038617641 not NaN More examples like these will be offered under DIVIDE by ZERO below Some familiar functions have yet to be defined for NaN For instance maxx y should deliver the same result as maxy x but almost no implementations do that when x is NaN There are good reasons to define maxNaN 5 max5 NaN 5 though many would disagree Differences of opinion persist about whether certain functions should be INVALID or defined by convention at intenral discontinuities a few examples are 10oltgt 10ltgtltgt 10 7 NaN is better ATAN200 00 00 or 7 or 439 7 NaN is worse ATAN2oltgt ltgtltgt 114 7 NaN is worse SIGNUM00 00 or 10 or 10 or NaN7 00 is best SGN00 0 Standard BASIC SIGN00 SIGN00 10 Fortran Standard CopySign10 i00 i10 respectively IEEE 754854 As time passes so do disputes over the value that should be assigned to a function at a discontinuity For example a consensus is growing that x0 1 for every x including 0 co and NaN If some day we agree that 1x 1 for every x then 1NaN 1 will follow but for the time being NaN is the preferred altenrative for 1NaN and for 1ltgtltgt too provided it signals It seems unlikely that 0 will ever be preferred to NaN for signNaN And yet unwise choices continue to be inflicted upon us with the best of intentions so the struggle to correct them is unending Between 1964 and 1970 the US National Bureau of Standards changed its definition of arccotx from 112 arctanx to arctan1x thereby introducing a jump at x 0 This change appears to be a bad idea but it is hard to argue with an arm of the US govenrment Some programmers think invoking language locutions that enable the trap to abort upon INVALID operations is the safe way to avoid all such disputes they are mistaken Doing so may abort searches prematurely For example try to find a positive root x of an equation like TANx ASINx x4 00 by using Newton39s iteration or the Secant iteration starting from various first guesses between 01 and 09 In general a rootfinder that does not know the boundary of an equation39s domain must be doomed to abort if it probes a wild guess thrown oumide that domain unless it can respond to NaN by retracting the wild guess back toward a previous guess inside the domain Such a rootfinder is built into current HewlettPackard calculators that solve equations like the one above far more easily than do rootfinders available on most computers Page 9 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Exception DIVIDE by ZERO This is a misnomer perpetrated for historical reasons A better name for this exception is Infinite result computed Exactly from Finite operands An example is 3000 for which IEEE 754 specifies an Infinity as the default result The sign bit of that result is as usual for quotients the exclusive OR of the operands39 sign bits Since 00 can have either sign so can co in fact division by zero is the only algebraic operation that reveals the sign of zero IEEE 754 recommends a nonalgebraic function CopySign to reveal a sign without ever signaling an exception but few compilers offer it1 alas Ideally certain other real expressions should be treated just the way IEEE 754 treats divisions by zero rather than all be misclassified as errors or Undefined some examples in Fortran syntax are i00NegativeNonInteger ltgtltgt i00NegativeEvenInteger ltgtltgt i00NegativeOddInteger 2 ice resp ATANHil0 2 ice resp LOGi00 ltgtltgt The sign of co may be accidental in some cases for instance if TANdegx delivers the TAN of an angle x measured in degrees then TANdeg900 180Integer is infinite with a sign that depends upon details of the implementation Perhaps that sign might best match the sign of the argument but no such convention exists yet For x in radians accurately implemented TAN x need never be infinite Compilers can cause accidenm by evaluating expressions carelessly For example when y resides in a register evaluating xy as yx reverses the sign of zero if y x evaluate it as y x instead Simplifying x0 to x misbehaves when x is 0 Doing that or printing 0 without its sign can obscure the source of a ltgtltgt Operations that produce an infinite result from an infinite operand or two must not signal DIVIDE by ZERO Examples include ltgtltgt 3 oooltgt EXPltgtltgt LOGltgtltgt Neither can 30ltgtltgt EXPoltgt 00 ATANioltgt iTEZ and similar examples be regarded as exceptional Unfortunately naive implementations of complex arithmetic can render ltgtltgt dangerous for instance when 0 3tltgtltgt is turned naively into 0 3tltgtltgt l0 ltgt2 02 it generates a NaN instead of the expected 0 MATLAB suffers from this af iction If all goes well infinite intermediate results will turn quietly into correct finite final results that way If all does not go well Infinity will turn into NaN and signal INVALID Unlike integer division by zero for which no integer infinity nor NaN has been provided floatingpoint division by zero poses no danger provided subsequent INVALID signals if any are heeded in that case disabling the trap for DIVIDE by ZERO is quite safe Digression on Division by Zero Schools teach us to abhor DivisionbyZero and to stand in awe of the Infinite Actually adjoining Infinity to the real numbers adds nothing worse than another exception to the familiar cancellation laws lxx xx l xx 0 among which the first is already violated by x 0 That is a small inconvenience compared with the circumlocutions we would resort to if Infinity were outlawed Two examples to show why are offered below Page 10 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm The first example shows how Infinity eases the numerical solution of a differential equation that appears to have no divisions in it The problem is to compute ylO where yt satisfies the Ricatti equation dydt ty2 forall tZO y00 Let us pretend not to know that yt may be expressed in terms of Bessel functions I whence ylO 753121 10731 35425 34544 97349 58 Instead a numerical method will be used to solve the differential equation approximately and as accurately as desired if enough time is spent on it Qe t Y will stand for an Updating Formula that advances from any estimate Y m yt to a later estimate Qe t Y m yt9 Vastly many updating formulas exist the simplest that might be applied to solve the given Ricatti equation would be Euler39s formula Qe t Y Y on Y2 This FirstOrder formula converges far too slowly as stepsize 9 shrinks a faster SecondOrder formula of RungeKutta type is Heun39s f tY2 q Yef Qe t Y Y f t9 q2 92 Formulas like these are used widely to solve practically all ordinary differential equations Every updating formula is intended to be iterated with a sequence of stepsizes 9 that add up to the distance to be covered for instance Q may be iterated N times with constant stepsize 9 lON to produce Yn9 m yn9 thus YO 3 370 for n l to N do Yn9 Q 9 nl9 Ynl9 Here the number N of timesteps is chosen with a view to the desired accuracy since the error YlO ylO normally approaches 0 as N increases to Infinity Were Euler39s formula used the error in its final estimate YlO would normally decline as fast as UN were Heun s lNz But the Ricatti differential equation is not normal no matter how big the number N of steps those formulas estimates YlO turn out to be huge positive numbers or overflows instead of 753 Conventional updating formulas do not work here The simplest unconventional updating formula Q available turns out to be this rational formula Qet Y Yte Y2e19Y if 9Ylt 19 t G9 l 9Y 19 otherwise The two algebraically equivalent forms are distinguished to curb rounding errors Like Heun39s this Q is a second order formula It can be compounded into a formula of arbitrarily high order by means that lie beyond the scope of these notes Iterating it N times with stepsize 9 lON yields a final estimate YlO in error by roughly lO5N2 even if DivisionbyZero insinuates an Infinity among the iterates Yn9 Disallowing Infinity and DivisionbyZero would at least somewhat complicate the estimation of ylO because yt has to pass through Infinity seven times as t increases from 0 to 10 See the graph on the next page What becomes complicated is not the program so much as the process of developing and verifying a program that can dispense with Infinity First find a very tiny number 8 barely small enough that l 10 8 rounds off to l Next modify the foregoing rational formula for Q by replacing the divisor l 9Y in the otherwise case by l 9Y 8 Do not omit any of these parentheses they prevent divisions by zero Then perform an erroranalysis to confirm that iterating this formula produces the same values Yn9 as would be produced without 8 except for replacing infinite values Y by huge finite values Survival without Infinity is always possible since Infinity is just a short word for a lengthy explanation The price paid for survival without Infinity is lengthy cogitation to find a nottoolengthy substitute if it exists Page ll Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Maple V r3 plots the solution yt of a Ricatti equation 10 End of first example The second example that illustrates the utility of Infinity is part of the fastest program known for computing a few eigenvalues of a real symmetric matrix This part can be programmed to run well on every commercially significant computer that conforms to IEEE 754 but not in any single higherlevel language that all such computers recognize a1b2 N0 bj0 b2 a2 b3 39 Let T b3 a3 b4 LBt 4113 0 and b4 am 2 Mn qJ bJ gt0 bn an for 1ltj Sn Every real symmetric matrix reduces quickly to a tridiagonal form like T with the same eigenvalues Tl lt T2 lt lt Tn The task is to find some of them specified either by an interval in which they lie or by their indices Typically a few dozen eigenvalues may be sought when the dimension n is in the thousands For this task the fastest and most accurate algorithm known is based upon the properties of two functions f f o and k ko defined for real variable 0 thus Page 12 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm oo0 Ifo0thisensuresitis 0 k n f 1 I FORj 123 INTURN I Note This loop has no explicit I tests nor branches D0fiGajlqjlf kkSignBitf kltogtk fltogtf The function SignBitf 0 if f gt0 or if fis 0 orO 1 ifflt00riffis 0 its value at f 0 may not matter but its value at f gt0 does It can be computed by using logical rightshifts or by using f lt00 in C or 05 SlGN05f in Fortran or 05 CopySign05f from IEEE 754 However the use of shifts or CopySign is mandatory on computers that depart from IEEE 754 by ushing UNDERFLOWed subtractions to 00 instead of UNDERFLOWing Gradually q v below Through an unfortunate accident the arguments of CopySign are reversed on Apple computers which otherwise conform conscientiously to IEEE 754 they require SignBit 05 CopySignf 05 The function f o is a continued fraction 2 f6 6 an quot1 2 O39 an l 1 2 O39 an 2 b 21 2 b2 O39 an 3 m The eigenvalues TU of T are the zeros of f o separated by the poles of f G at which it interrupts its monotonic increasing behavior to jump from ltgtltgt to ltgtltgt The integer function ko counts the eigenvalues on either side of 0 thus Tl lt T2 lt lt rko S o lt rko1 lt lt Tn and TkG o justwhen fo0 Evidently the eigenvalues TU of T are the n values of G at which ko jumps and may be located by Binary Search accelerated by interpolative schemes that take values of f G into account too Although rounding errors can obscure f o severely its monotonicity and the jumps of ko are practically unaffected so the eigenvalues are obtained more accurately this way than any other And quickly too If Infinity were outlawed the loop would have to be encumbered by a test and branch to prevent Divisionby Zero That test cannot overlap the division a slow operation because of sequential dependencies so the test would definitely slow down the loop even though zero would be detected extremely rarely The tests impact would be tolerable if the loop contained only one division but that is not what happens Because division is so slow fast computers pipeline it in such a way that a few divisions can be processed concurrently To exploit this pipelining we search for several eigenvalues simultaneously The variable 0 ecomes a small array of values as do f and k and every goaround the loop issues an array of divisions In this context the tests though vectorized too degrade speed by 25 or more much more on machines with multiple pipelines that can subtract and shift concurrently with division regardless of what else a branch would entail whenever a zero divisor were detected By dispensing with those tests this program gains speed and simplicity from Infinity even if DivisionbyZero never happens End of second example Page 13 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm How many such examples are there Nobody knows how many programs would benefit from Infinity because it remains unsupported by programming language standards and hence by most compilers though support in hardware has been abundant for over a decade To get some idea of the impediment posed by lack of adequate support1 try to program each of the foregoing two examples in a way that will compile correctly on every machine whose hardware conforms to IEEE 754 That ordeal will explain why few programmers experiment with Infinity whence few programs use it In my experience with a few compilers that support IEEE 754 on PCs and Macintoshes Infinity and NaNs confer their greatest benefits by simplifying test programs which already tend to grossly worse complexity than the software they are designed to validate Consequently my programs enjoy enhanced reliability because of IEEE 754 regardless of whether it is in force where they run End of Digression Exception OVERFLOW This happens after an attempt to compute a finite result that would lie beyond the finite range of the floatingpoint format for which it is destined The default specified in IEEE 754 is to approximate that result by an appropriately signed Infinity Since it is approximate OVERFLOW is also INEXACT Often that approximation is worthless it is almost always worthless in matrix computations and soon turns into NaN or worse misleading numbers Consequently OVERFLOW is often trapped to abort seemingly futile computation sooner rather than later Actually OVERFLOW more often implies that a different computational path should be chosen than that no path leads to the desired goal For example if the Fortran expression x SQRTxx yy encounters OVERFLOW before the SQRT can be computed it should be replaced by something like 8X SQRT SXSX Sysy with a suitably chosen tiny positive ScaleFactor s A power of 2 avoids roundoff The cost of computing and applying s beforehand could be deemed the price paid for insurance against OVERFLOW Is that price too high The biggest finite IEEE 754 Double is almost 18 e308 which is so huge that OVERFLOW occurs extremely rarely if not yet rarely enough to ignore The cost of defensive tests branches and scaling to avert OVERFLOW seems too high a price to pay for insurance against an event that hardly ever happens A lessened average cost will be incurred in most situations by first running without defensive scaling but with a judiciously placed test for OVERFLOW and for severe UNDERFLOW in the example above the test should just precede the SQRT Only when necessary need scaling be instituted Thus our treatment of OVERFLOW has come down to this question how best can OVERFLOW be detected The ideal test for OVERFLOW tests its flag but that flag cannot be mentioned in most programming languages for lack of a name Next best are tests for Infinities and NaNs consequent upon OVERFLOW but prevailing programming languages lack names for them suitable tests have to be contrived For example the C predicate z z is True only when z is NaN and the compiler has not optimized overzealously 10e37 l fabsz 0 is True onlywhenz isinfinite and 2 2 O is True only when z is NaN or infinite the INVALID trap has been disabled and optimization is not overzealous A third way to detect OVERFLOW is to enable its trap and attach a handler to it Even if a programming language in use provides control structures for this purpose this approach is beset by hazards The worst is the possibility that the handler may be entered inadvertently from unanticipated places Another hazard arises from the concurrent execution of integer and floatingpoint operations by the time an OVERFLOW has been detected data associated with it may have become inaccessible because of changes in pointers and indices Therefore this approach works only when a copy of the data has been saved to be reprocessed by a different method than the one thwarted by OVERFLOW and when the scope of the handler has been properly localized note that the handler must be detached before and reattached after functions that handle their own OVERFLOWs are executed Page 14 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm The two cosm of saving and scoping must be paid all the time even though OVERFLOW rarely occurs For these reasons and more other approaches to the OVERFLOW problem are to be preferred but a more extensive discussion of them lies beyond the intended scope of this document When OVERFLOW39s trap is enabled the IEEE 754 default Infinity is not generated instead the results exponent is wrapped which means in this case that the delivered result has an exponent too small by an amount 2K3913 that depends upon its format DoubleExtended too small by 24576 224576 13 E 7398 Double too small by 1536 21536 24 E 462 Single too small by 192 2192 63 E 57 Though required by IEEE 754 the latter two are not performed by ix87 nor 680x0 nor some other machines without help from suitable traphandling software In effecL the delivered result has been divided by a power of 2 so huge as to tuni what would have overflowed into a relatively small but predictable quantity that a traphandler can reinterpret If there is no trap handler computation will proceed with that smaller quantity or in the case of formatconverting FSTore instructions without storing anything The reason for exponent wrapping is explained after UNDERFLOW Exception UNDERFLOW This happens after an attempt to approximate a nonzero result that lies closer to zero than the intended floating point destination39s Normal positive number nearest zero 22 e308 is that number for IEEE 754 Double A nonzero Double result tinier than that must by default be rounded to a nearest Subnormal number whose magnitude can run from 22 e308 down to 49 e324 but with diminishing precision or else by 00 when no Subnormal is nearer IEEE 754 Extended and Single formats have different UNDERFLOW thresholds for which see the table Span and Precision of IEEE 754 FloatingPoint Formats above If that rounding incurs no error no UNDERFLOW is signaled Subnormal numbers also called Denormalized allow UNDERFLOW to occur Gradually this means that gaps between adjacent floatingpoint numbers do not widen suddenly as zero is passed That is why Gradual UNDERFLOW incurs errors no worse in absolute magnitude than rounding errors among Normal numbers No such property is enjoyed by older schemes that lacking Subnormals flush UNDERFLOW to zero abruptly and suffer consequent anomalies more fundamental than afflict Gradual UNDERFLOW For example the C predicates 2 y and x y 0 are identical inthe absence of OVERFLOW only if UNDERFLOW is Gradual That is so because x y cannot UNDERFLOW Gradually if x y is Subnormal then it is Exact Without Subnormal numbers xy might be 095 and yet xy could UNDERFLOW abruptly to 00 as could happen for x and y tinier than 20 times the tiniest nonzero Normal number Consequently Gradual Underflow simplifies a theorem very important for the attenuation of roundoff in numerical computation If p and q are floatingpoint numbers in the same format and if 12 S pq S 2 then p q is computable exactly without a rounding error in that format But if UNDERFLOW is not Gradual and if p q UNDERFLOWs it is not exact More generally floatingpoint erroranalysis is simplified by the knowledge first that IEEE 754 rounds every finite floatingpoint result to its best approximation by floatingpoint numbers of the chosen destination39s format and secondly that the approximations absolute uncertainty error bound cannot increase as the result diminishes in magnitude Erroranalysis and therefore program validation is more complicated sometimes appallingly more so on those currently existing machines that do not UNDERFLOW Gradually Page 15 Work in Progress Lecture Notes on the Status of lEEE 754 May 31 1996 244 pm Though afflicted by fewer anomalies Gradual UNDERFLOW is not free of them For instance it is possible to have xy 0 95 coexist with xz yz 0 5 because xz and probably also yz UNDERFLOWed to Subnormal numbers without Subnormals the last quotient turns into either an ephemeral 00 or a persistent NaN INVALID 00 Thus UNDERFLOW cannot be ignored entirely whether Gradual or not UNDERFLOWs are uncommon Even if flushed to zero they rarely matter if handled Gradually they cause harm extremely rarely That harmful remnant has to be treated much as OVERFLOWs are with testing and scaling or trapping etc However the most common treatment is to ignore UNDERFLOW and then to blame whatever harm is caused by doing so upon poor taste in someone else39s choice of initial data UNDERFLOWs resemble ants where there is one there are quite likely many more and they tend to come one after another That tendency has no direct effect upon the i387486Pentium nor M688812 but it can severely retard computation on other implementations of lEEE 754 that have to trap to software to UNDERFLOW Gradually for lack of hardware to do it They take extra time to Denormalize after UNDERFLOW andor worse to prenormalize Subnormals before multiplication or division Gradual UNDERFLOW requires no prenormalization before addition or subtraction of numbers with the same format but computers usually do it anyway if they have to trap Subnormals Worse still is the threat of traps whether they occur or not to machines like the DEC Alpha that cannot enable traps without hampering pipelining andor concurrency such machines are slowed also by Gradual UNDERFLOWs that do not occur Why should we care about such benighted machines They pose dilemmas for developers of applications software designed to be portable after recompilation to those machines as well as ours To avoid sometimes severe performance degradation by Gradual UNDERFLOW developers will sometimes resort to simpleminded alternatives The simplest violates lEEE 754 by flushing every UNDERFLOW to 00 and computers are being sold that flush by default DEC Alpha is a recent example it is advertised as conforming to lEEE 754 without mention of how slowly it runs with traps enabled for full conformity Applications designed with flushing in mind may when run on ix87s and Macs have to enable the UNDERFLOW trap and provide a handler that flushes to zero thereby running slower to get generally worse results This is what MathCAD does on PCs and on Macintoshes Few applications are designed with flushing in mind nowadays since some of these might malfunction if UNDERFLOW were made Gradual instead disabling the ix87 UNDERFLOW trap to speed them up is not always a good idea A format s usable exponent range is widened by almost its precision N to fully iZK as a byproduct of Gradual Underflow were this its sole benefit its value to formats wider than Single could not justify its price Compared with FlushtoZero Gradual Underflow taxes performance unless designers expend time and ingenuity or else hardware Designers incapable of one of those expenditures but willing to cut a corner off lEEE 754 exacerbate market fragmentation which cosm the rest of us cumulatively far more than whatever they saved Digression on Gradual Under ow To put things in perspective here is an example of a kind that when it appears in benchmarks scares many people into choosing FlushtoZero rather than Gradual UNDERFLOW To simulate the diffusion of heat through a conducting plate with edges held at fixed temperatures a rectangular mesh is drawn on the plate and temperatures are computed only at intersections The finer the mesh the more accurate is the simulation Time is discretized too at each timestep temperature at every interior point is replaced by a positively weighted average of that points temperature and those of nearest neighbors Simulation is more accurate for smaller timesteps which entail larger numbers of timesteps and tinier weights on neighbors typically these weights are smaller than 18 and timesteps number in the thousands Page 16 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm When edge temperatures are mostly fixed at 0 and when interior temperatures are mostly initialized to 0 then at every timestep those nonzero temperatures next to zeros get multiplied by tiny weights as they diffuse to their neighbors With fine meshes large numbers of timesteps can pass before nonzero temperatures have diffused almost everywhere and then tiny weights can get raised to large powers so UNDERFLOWs are numerous lf UNDERFLOW is Gradual denormalization will produce numerous subnormal numbers they slow computation badly on a computer designed to handle subnormals slowly because the designer thought they would be rare Flushing UNDERFLOW to zero does not slow computation on such a machine zeros created that way may speed it up When this simulation figures in benchmarks that test computers39 speeds the temptation to tuni slow Gradual UNDERFLOW Off and fast FlushtoZero On is more than a marketing manager can resist Compiler vendors succumb to the same temptation they make FlushtoZero their default Such practices bring to mind calamitous explosions that afflicted highpressure steam boilers a century or so ago because attendants tied down noisy over pressure relief valves the better to sleep undisturbed Vast numbers of UNDERFLOWs usually signify that something about a program or its data is strange if not wrong this signal should not be ignored much less squelched by flushing UNDERFLOW to O What is strange about the foregoing simulation is that exactly zero temperatures occur rarely in Nature mainly at the boundary between colder ice and warmer water Initially increasing all temperatures by some negligible amount say 103930 would not alter their physical significance but it would eliminate practically all UNDERFLOWs and so render their treatment Gradual or FlushtoZero irrelevant To use such atypical zero data in a benchmark is justified only if it is intended to expose how long some hardware takes to handle UNDERFLOW and subnormal numbers Unlike many other floatingpoint engines the i387 and its successors are slowed very little by subnormal numbers we should thank Intel39s engineers for that and enjoy it rather than resort to an inferior scheme which also runs slower on the i387etc End of Digression When UNDERFLOW39s trap is enabled the IEEE 754 default Gradual Underflow does not occur the results exponent is wrapped instead which means in this case that the delivered result has an exponent too big by an amount 2K3913 that depends upon its format DoubleExtended too big by 24576 224576 13 E 7398 Double too big by 1536 21536 24 E 462 Single too big by 192 2192 63 E 57 Though required by IEEE 754 the latter two wraps are not performed by ix87 nor 680x0 nor some other machines without help from suitable traphandling software In effect the delivered result has been multiplied by a power of 2 so huge as to tuni what would have underflowed into a relatively big but predictable quantity that a traphandler can reinterpret If there is no trap handler computation will proceed with that bigger quantity or in the case of formatconverting FSTore instructions without storing anything Exponent wrapping provides the fastest and most accurate way to compute extended products and quotients like a1 b1 4 512 72 4 a3 173 r r aN bN 01 d1 4 c2 d2 4 c3 d3 4 r cM dM Page 17 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm when N and M are huge and when the numerator andor denominator are likely to encounter premature OVER UNDERFLOW even though the final value of Q would be unexceptional if it could be computed This situation arises in certain otherwise attractive algorithms for calculating eigensystems or Hypergeometric series for example What Q requires is an OVERUNDERFLOW traphandler that counts OVERFLOWs and UNDERFLOWs but leaves wrapped exponents unchanged during each otherwise unaltered loop that computes separatel the numerator39s and denominator39s product of sums The final quotient of products will have the correct significant bits but an exponent which if wrong can be corrected by taking counts into account This is by far the most satisfactory way to compute Q but for lack of suitable traphandlers it is hardly ever exploited though it was implemented on machines as diverse as the IBM 7094 and 360 by me in Toronto in the 1960s see Sterbenz 1974 a Burroughs B5500 by Michael Green at Stanford in 1966 and a DEC VAX in 1981 by David Baniett then an undergraduate at Berkeley Every compiler seems to require its own implementation Exception INEXACT This is signaled whenever the ideal result of an arithmetic operation would not fit into its intended destination so the result had to be altered by rounding it off to fit The INEXACT trap is disabled and its flag ignored by almost all floatingpoint software Its flag can be used to improve the accuracy of extremely delicate approximate computations by Exact Preconditioning a scheme too arcane to be explained here for an example see pp 107 110 of HewlettPackard s HP15C Advanced Functions Handbook 1982 0001590011 Another subtle use for the INEXACT flag is to indicate whether an equation fx 0 is satisfied exactly without roundoff in which case x is exactly right or despite roundoff in which notisoirare case x may be arbitrarily erroneous A few programs use REAL variables for integer arithmetic M680x0s and ixS7s can handle integers up to 65 bis wide including sign and convert all narrower integers to this format on the fly In consequence arithmetic with wide integers may go faster in floatingpoint than in integer registers at most 32 bits wide But then when an integer result gets too wide to fit exactly in floatingpoint it will be rounded off If this rounding went unnoticed it could lead to final results that were all unaccountably multiples of say 16 for lack of their last few bits Instead the INEXACT exception serves in lieu of an INTEGER OVERFLOW signal it can be trapped or flagged Well implemented BinaryDecimal conversion software signals INEXACT just when it is deserved just as rational operations and square root do However an undeserved INEXACT signal from certain transcendental functions like XY when an exact result is delivered accidentally can be very difficult to prevent Directions of Rounding The defaulL reset by tuniing power offon rounds every arithmetic operation to the nearest value allowed by the assigned precision of rounding When that nearest value is ambiguous because the exact result would be one bit wider than the precision calls for the rounded result is the even one with its last bit zero Note that rounding to the nearest 16 32 or 64bit integer floattoint store in this way takes both 15 and 25 to 2 so the various INT IFIX conversions to integer supported by diverse languages may require something else One of my Fortran compilers makes the following distinctions among roundings to nearest integers IRINT RINT DRINT round to nearest even as ix87 FIST does NINT ANINT DNINT round halfintegers away from 0 INT AINT DINT truncate to integers towards 0 Rounding towards 0 causes subsequent arithmetic operations to be truncated rather than rounded to the nearest value in the direction of 00 In this mode storetoint provides INT etc This mode also resembles the way many old machines now long gone used to round Page 18 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm The Directed roundings can be used to implement Interval Arithmetic which is a scheme that approximate every variable not by one value of unknown reliability but by two that are guaranteed to straddle the ideal value This scheme is not so popular in the USA as it is in pars of Europe where some people distrust computers ControlWord control of rounding modes allows software modules to be rerun in different rounding modes without recompilation This cannot be done with some other computers notably DEC Alpha that can set two bits in every instruction to control rounding direction at compiletime that is a mistake It is worsened by the designers39 decision to take rounding direction from a ControlWord when the two bis are set to what would otherwise have been one of the directed roundings had Alpha obtained only the roundtonearest mode from the ControlWord their mistake could have been transformed into an advantageous feature All these rounding modes round to a value drawn from the set of values representable either with the precision of the destination or selected by rounding precision control to be described below The sew of representable values were spelled out in tables above The direction of rounding can also affect OVERUNDERFLOW a positive quantity that would OVERFLOW to ltgtltgt in the default mode will tum into the formats biggest finite floating point number when rounded towards ltgtltgt And the expression X X delivers 00 for every finite X in all rounding modes except for rounding directed towards ltgtltgt for which 00 is delivered instead These details are designed to make Interval Arithmetic work better Ideally software that performs BinaryDecimal conversion both ways should respect the requested direction of rounding David Gay of ATampT Bell Labs has released algorithms that do so into the public domain Netlib to use less accurate methods now is a blunder Precisions of Rounding IEEE 754 obliges only machines that compute in the Extended long double or REALlO format to let rogrammers control the precision of rounding from a ControlWord This lets ix87 or M680x0 emulate the roundoff characteristics of other machines that conform to IEEE 754 but support only Single C39s float or REAL4 and Double C39s double or REAL8 not Extended Software developed and checked out on one of those machines can be recompiled for a 680x0 or ix87 and if anomalies excite concenis about differences in roundoff the software can be run very nearly as if on its original host without sacrificing speed on the 680x0 or ix87 Conversely software developed on these machines but without explicit mention of Extended can be rerun in a way that portends what it will do on machines that lack Extended Precision Control rounds to 24 sig bis to emulate Single to 53 sig bits to emulate Double leaving zeros in the rest of the 64 sig bits of the Extended format The emulation is imperfect Transcendental functions are unlikely to match Although Decimal gt Binary conversion must round to whatever precision is set by the ControlWord Binary gt Decimal should ideally be unaffected since its precision is determined solely by the destination s formaL but ideals are not always attained Some OVERUNDERFLOWs that would occur on those other machines need not occur on the ix87 IEEE 754 allows this perhaps unwisely to relieve hardware implementors of details formerly thought unimportant Few compilers expose the ControlWord to programmers Worse some compilers have revived a nasty bug that emerged when DoublePrecision first appeared among Fortran compilers it goes like this Consider X T SY ina Fortran program where S is SINGLE PRECISION and X and Y are DOUBLE or EXTENDED PRECISION variables or expressions computed in registers Compilers that supplant S by X in the second statement save the time required to reload S from memory but spoil T Though S and X differ by merely a rounding error the difference matters Page 19 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm The Baleful In uence of Benchmarks Hardware and compilers are increasingly being rated entirely according to their performance in benchmarks that measure only speed That is a mistake committed because speed is so much easier to measure than other qualities like reliability and convenience Sacrificing them in order to run faster will compel us to run longer B disregarding worthwhile qualities other than speed current benchmarks penalize conscientious adherence to standards like IEEE 754 worse attempts to take those qualities into account are thwarted by political constraints imposed upon programs that might otherwise qualify as benchmarks For example a benchmark should compile and run on every commercially significant computer system This rules out our programs for solving the differential equation and the eigenvalue problem described above under the Digression on DivisionbyZero To qualify as benchmarks programs must prevent exceptional events that might stop or badly slow some computers even if such prevention retards performance on computers that1 by conforming conscientiously to IEEE 754 would not stop The Digression on Gradual Underflow offered an example of a benchmark that lent credibility to a misguided preference for FlushtoZero in so far as it runs faster than Gradual Underflow on some computers b disregarding accuracy lf Gradual Underflow39s superior accuracy has no physical significance there neither has the benchmark39s data Accuracy poses tricky questions for benchmarks One hazard is the Stopped Clock Paradox Why is a mechanical clock more accurate stopped than running A running clock is almost never exactly right1 whereas a stopped clock is exactly right twice a day But WHEN is it right Alas that was not the question The computational version of this paradox is a benchmark that penalizes superior computers that produce merely excellent approximate answers by making them seem less accurate than an inferior computer that gets exactly the right answer for the benchmark39s problem accidentally Other hazards exist too some will be illustrated by the next example Quadrch equations like p x2 2 q x r 0 arise often enough to justify tendering a program that solves it to serve as a benchmark When the equations roots xl and x2 are known in advance both to be real the simplest such program is the procedure ertc exhibited on the next page In the absence of premature OverUnderflow ertc computes xl and x2 at least about as accurately as they are determined by data p q r uncorrelatedly uncertain in their last digits stored It should be tested first on trivial data to confirm that it has not been corrupted by a misprint nor by an ostensible correction like xl qsp x2 q sp copied naively from some elementary programming text Here are some trivial data pAnynonzeroqr0 xlx20 p20 q50 r2120 xl20 x230 p20E37 ql0 r20 xlml0 x2ml0E37 Swapping p with q swaps xlx2 with lx2 lxl up uq ur yields x1 x2 independently of nonzero u Page 20 Work in Progress Lecture Notes on the Status of IEEE 754 May31 1996 244 pm A Proposed Accuracy Benchmark Procedure Qtest ertc Parameter n Intime 11 may grow Real Array rlzn Chooseprecisionhere rl 12 20 for 24sigbits r2 2 12 225 and6hex r3 16 3 1 1016 2 6 hexIBM r4 2 24 20 48bimCRAY r5 2 24 225 rounded r6 2 24 30 48bitsch0pped r7 949062670 53 sigbits r8 94906267 025 53 sigbits r9 2 28 55 PowerPC1860 rlO 2 28 45 PowerPC1860 rll 2 28 20 56 sigbits rl2 2 28 225 and14hex rl3 16 7 1 1016 6 14hexIBM rl4 2 32 20 64 sigbits r15 2 32 225 64 sigbits e Infinity for 39 to n do t Qtrial ertc rj Couldbe NaN If t lt e or nott t then e Display quot Worst accuracy is quot e quot sig bitsquot Return End Qtest Real Function Log2X LogAbsXLog20 Real Function Qtrial ertc r 2 p r 2 r l Qtrial O Display Nameofertc quot for r quot r If p S 0 then Displayquot Qtrial r expects r gt 2 quot elseif notr ql amp q pl then Displayquot r is too big for Qtrial rquot else Call ertc p q r X1 X2 el Log2 Xl l Couldbe NaN e2 Log2 X2 l 2p Heedparentheses Qtrial Min el e2 MinNaNNaNisNaN Displayquot gets quot el quot and quot e2 quot sig bitsquot If not X1 2 l O then Displayquot and root quot Xl quot isn t at least lquot Display Return Qtrial End Qtrial Procedure ertc p q r X1 X2 2 Real p q r s X1 X2 Chooseprecisionhere s w qq pr NaN if Vlt0 S q Copysigns q i Fortran s SIGN OK If S 0 then X1 2 X2 rp else X1 rS X2 Sp NaNs ifnotreal Return End ertc orelseitmay abort Page 21 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm The proposed benchmark program Qtest runs ertc on a battery of data sets each chosen to expose the worst rounding error of which a computing system is capable The system39s precision appears next to its data r as an annotation in Qtest Each datum r is expressed in a way that avoids damaging roundoff at the precision under test and since r is a large positive number but not too large the other two coefficients p r2 and q rl are also computed uncontaminated by roundoff in function Qtrial Therefore Qtrial knows the correct roots xl l and x2 l 2p exactly and can compare them with the roots xl and x2 computed by ertc to determine its accuracy More important than accuracy are mathematical relationships implied by correlations among data In this problem inequalities q2 2 p r and 0 lt p lt q lt r and pq 2 q r can all be confirmed directly by tests and imply that both roots must be real and no less than 10 When ertc fails to honor those implications Qtrial notices What should we expect wouldbe benchmark Qtest to find when it runs in 8byte floatingpoint on some current computer systems Tabulated under Precision is how many significant bits are stored in the named system39s 8byte format different systems trade off precision and range differently and this should be taken into account before one system is condemned for getting less accuracy than another Next comes the worst Accuracy Qtest encountered evidently as many as half the sig bis stored in computed roots can be inaccurate Worse the smaller computed root can fall short of 10 in the sig bit whose position is tabulated last These findings cry out for explanation how can some computer systems get worse accuracy than others that store the same number of sig bits Expected Results from Qtest ertc on 8 byte Floating Point Computer Software Precision Accuracy How far lt 1 Hardware System sig bits sig bits sig bit ix8687 amp Fortran C Pentium based TurboBasic PCs TurboPascal 53 32 333 680x0based Sun III Fortran C Macintosh DEC VAX D Fortran C 56 28 293 ix8687 amp MATLAB 35 Macintosh MathCAD 25 SGI MIPS Fortran 53 265 278 SPARC C DEC VAX G MATLAB 4x IBM 370 Fortran C 56 264 264 CRAY YMP Fortran C 48 24 253 Intel 860 NaN from NaN from PowerPC Fortran C 53 IBM RS6000 w lt 0 w lt 0 The explanation is easy for the IBM 370 its hexadecimal floatingpoint loses two or three sig bits compared with binary floatingpoint of the same width No matter these formerly ubiquitous machines are disappearing The best accuracy 32 sig bits is achieved on inexpensive ix8687based PCs and 680x0based Macintoshes whose hardware permits every algebraic subexpression though no named variable wider than 8 bytes appears in it1 to be evaluated in Extended registers 10 bytes wide and by software systems compilers that neither disable nor eschew that capability regardless of whether they support named lObyte variables These computer systems also accept without premature overunderflows a wider range of input data up uq ur than do the others though this robustness cannot be explored by Qtest without crashing some systems Page 22 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm MATLAB and MathCAD on ix8687 and 680x0 platforms store almost every subexpression into internal 8 byte scratch variables thereby wasting time as well as the lObyte registers39 superior accuracy and range that is why their accuracy is no better on machines with lObyte registers than on machines without The final mystery is the NaN Not a Number obtained from the i860 IBM RS6000 and PowerPC instead of roots The NaN arises from the square root of a negative number qq pr although tesm performed upon input data would find that QQ qq and PR pr do satisfy QQ 2 PR This paradox arises out of the Fused MultiplyAccumulate instruction possessed by those machines The i860 s MAC is only partially fused The paradox can be suppressed by inhibiting that instruction at compile time but doing so generally would slow those machines therefore their compiler was designed to render that inhibition inconvenient and unusual If Qtest were run on these machines in their unusual mode would that constitute a fair test Fairness raises troublesome issues for a benchmark What if custodians of a computer family allege unfairness Letting them tweak a benchmark slightly to render it fair lets them overcompensate in devious ways very difficult to expose For example replace ertc by an ostensibly algebraically equivalent procedure Procedure PPCertc p q r X1 X2 2 Real 0 p q r 5 x1 x2 Chooseprecisionhere S pr 0 pr S Suim PowerPC well Vqq s 0 NaN if Vlt0 q i Fortran s SIGN OK 2 X2 rp else xlrS 2 Return End PPCertc 819 NaNs ifnotreal or else may abort Aside from running slightly slower QtestPPCertc differs from Qtest ertc only by getting 53 sig bis instead of NaN on the PowerPC and RS6000 which then win the prize for accuracy Which of Qtest ertc and Qtest PPCertc assesses accuracy more fairly In general insisting that a benchmark exist in only one version and that it run successfully no NaNs on every machine may cripple speed or accuracy or robustness on computers with advantageous features others lack Permitting variety may invalidate comparison As it is now Qtest ertc tells us something Ithink worth knowing regardless of whether it is admitted to the ranks of industryapproved benchmarks Exceptions in General Reconsidered The prevailing attitude towards exceptions is changing Previously they were declared to be errors that would abort an offending program Abortion could be prevented only by defensive programming that tested for every error condition in advance Punitive policies and paranoid practices persist but now in competition with other options afforded programmers by IEEE 754 though handicapped by their near invisibility in programming languages How might exceptionhandling be practiced if other options were supported properly The Standard Apple Numerical Environment SANE documented in the Apple Numerics Manual 1988 is one approach What follows is another I have implemented partially First exceptionclasses must have names preferably the same names in all languages Venerable languages that limit names lengths still live so the names have to be short here are suggestions for fiveletter names for floating point exceptions plus a few others Page 23 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Name Description of Exception INXCT INeXaCT due to floatingpoint roundoff or overunderflow UNFLO floatingpoint UNderFLOw Gradual or not DIVBZ Infinity exactly from finite operands eg 10 OVFLO floatingpoint OVerFLOw INTXR INTeger arithmetic eXception or eRror like over ow or l0 INVLD INVaLiD operation most likely one from the list that follows ZOVRZ 00 00 IOVRI Infinity Infinity These four are the rational lMINI Infinity Infinity Removable Singularities ZTMSI 00 Infinity FODOM Function computed Outside its DOMain e g V03 DTSTR Attempted access outside a DaTa STRucture or array NLPTR De referencing a NiL PoinTeR UNDTA UNinitialized DaTum or vAriable or SNaN These names are intended to identify such flags as may exist to signal exceptions to a program and such modes as a programmer may choose to predetermine the program39s response to exceptions More important than the spellings are the length and structure of the list It must be parsimonious if allowed to grow indefinitely it can accrete names unknown to most of us or with overlapping meanings and then our programs would mishandle their exceptions The list should be comprehensive enough to leave no kind of exception uncovered by a name the list above may be incomplete It does include names for exceptions practically undetectable on some systems examples are UNFLO on a CRAY YMP IOVRI on machines that lack Infinity DTSTR on systems that do not check arraybounds and INXCT on nonconformers to IEEE 754 The list is structured less to reflect how or where the exception is detected as C39s nearly useless ERRNO does and more to re ect what may be done to remedy it For example expressions 00 oooo oooo and oo0 are distinguished because to produce anything of more use than a NaN they require different versions of l39Hospital39s rule for the removal of removable singularities Though different named exceptions require different remedies the list of remedies worth considering for any particular named exception class fits into a short menu for a preprogrammed exceptionhandling library Selection from an adequate menu will serve applications programmers far better than coding up each his own handler Here are fiveletter names for the few exceptionhandling modes thought worth putting into a menu PAUSE ABORT PREMT IEEED PSUBS KOUNT ABSNT To select mode PAUSE for handling say OVFLO is to request that each floatingpoint overflow suspend computation and invoke a debugger to display the values of variables defined at that moment after which the onlooker may either abort computation or else resume it as if the overflow had been handled in the previously prevailing mode PAUSE is a debugging mode applicable to every other exceptionhandling mode ABORT is a mode now common for severe exceptions it empties buffers closes files and retunis to the operating system PREMT preempts the handling of a designated exception for whatever language ambience had been over ridden by a previously invoked mode For example to PREMT ZOVRZ in APL is to reestablish the definition 00 l to PREMT ZOVRZ in ADA is to put 0000 back among Arithmetic Errors that drop execution into a program module39s error handler if one has been provided or else ABORTs PREMT is indispensable when a language sports control structures like ON ERROR do something else or go somewhere else and ABSENT ERROR try something ELSE do something else but lacks locutions to distinguish among different kinds of exceptions Page 24 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Control structures like those palliate exceptions rather than handle them all well The main de ciency is a lack of recognizable names for modes de nedimplicitly by do something else clauses a name known to an independently compiled subprocedure of the try something clause could tell it which exceptions of its own to hide and which to expose Other de ciencies exacerbate the cost of scoping Which variables in the try something clause are to be saved and in what state for the do something else clause to use after the ERROR 7 A satisfactory discussion lies beyond the scope of these notes IEEED names the Default Mode in which every exception mentioned by IEEE 754854 must be handled by default unless a programmer asks explicitly for another mode This requirement is violated by a few computer systems that run much faster in some other mode and by some compilers whose authors fear the consequences of unleashing Infinity and NaN upon a programmer who has not said he wants them To every named floating point exception except INXCT and UNFLO IEEED has assigned what I call a presubstitution that is a precomputed number whose magnitude and possibly its sign will be substituted for the result of an exceptional oatingpoint operation For DIVBZ and OVFLO IEEED presubstitutes ice with an appropriate sign For the lNVLDs ZOVRZ l VRL lMlNL ZTMSI and FODOM IEEED presubstitutes NaN For INXCT IEEED does not presubstitute but yields an approximation in accordance with current modes of rounding direction and precision IEEED for UNFLO is Gradual For example with DIVBZ in IEEED mode LOG00 oltgt is not trapped as an ERROR though it does raise the DIVBZ flag PS UBS is a generalization of IEEED that presubstitutes any number computed by the program in advance for any oatingpoint exception For example PSUBS 00 i for UNFLO replaces Gradual Underflow by Flush toZero with preservation of sign invoke PSUBS 00 to flush UNFLO to 00 Similarly PSUBS y for ZOVRZ replaces a subsequent SINxyx by y whenever x 00 Thus programmers can remove some removable singularities with PSUBS without explicit tests nor branches It is no panacea Those tests and branches may have to be introduced implicitly by the compiler 7 for vectorized machines like CRAYs Neither can PSUBS COS x for ZOVRZ shield SINx SINy xy from damage caused by roundoff when x is too near y use PSUBSl0 and COSxy2 SINxy2 xy2 respectively instead Here is a nontrivial application of presubstitution It simpli es an economical way to compute the continued fraction fx introduced above during the Digression on Division by Zero and simultaneously its derivative f x They might be computed for many different values x if they serve in Newton s iteration x 7gt x fxf x for solving fx 0 While x is being computed from the recurrence fn x an qnfn1 starting with f1 xal and ending with fx fn another recurrence computes its derivative Because multiplication can go rather faster than division two divisions have been replaced by one division and two multiplications in the recurrences but at the cost of introducing auxiliary variables rj lfj hj qjrj1 so fj xaj hj and f39j1 391hj39hj Recall that every qj gt 0 this implies that every f39j 2 l and absent overunder ow that no hj 0 unless fj1 and f39j1 are both in nite Over ow or 0oltgt could interfere with the computation of f 39j if nothing else were done about them What we shall do about them is precompute an array of quotienm Pj qj1lqj and presubstitute Sovflo PSUBS 1013150 1 for OVFLO Savepriorpresubstitution for Over ow Sztmsi PSUBS 00 for ZTMSI Savepriorpresubstitution for 0w Presubstitution replaces Ooo here ZTMSI enddo f39x f39 fx f PSUBS Sovflo for OVFLO PSUBS Sztmsi for ZTMSI Restore prior presubstitutions This program has no oatingpoint testandbranch Instead the rst PS UBS replaces an over owed f39 by a huge but nite il0El50 and then the PSUBS inside the doloop accomplishes the same effect as if it and the previous assignment were replaced by if f39 is finite then f39 hrf39 1 else f39 Pj1SSf39 1 endif SSf39 Sf39 Page 25 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Mode KOUNTk exploim exponentwrapping to count OVERUNDERFLOWs in an integer variable k as if it were a leftward extension of the floatingpoint exponent field We have seen one application above it was the fast and accurate evaluation of expressions like Q described under UNDERFLOW lf implemented fast enough this mode also speeds up the comparison of complex magnitudes Ix 1y x2 yz via the relationship Ix 1y lt u 1v if and only if xuxu lt vyvy To attenuate roundoff rst swap so that x 2 y and u 2 IvI OVFLO and UNFLO ags do not get raised in KOUNT mode For nearly three decades no other floatingpoint exceptionhandling modes than PAUSE AB ORT PREMT IEEED PSUBS and KOUNT have been found both worthwhile and compatible with concurrent execution of floatingpoint and integer operations on very fast processors If not due to a lack of imagination this state of affairs justifies efforts to promulgate a modest library of exceptionhandling modes rather than leave every programmer to his own devices A few more floatingpoint modes require support on systems that conform to IEEE 754 Directed Roundings DlRND ToNEAR ToZERO ToPOSV ToNEGV Rounding Precisions RNDPR ToSNGL ToDBLE ToEXTD Rounding Precision modes are pertinent only to hardware that evaluates every floatingpoint expression in the DoubleExtended REALlO format However they do raise a question of general interest What if a program attempts to invoke a nonexistent or unsupported mode An errormessage protesting the use of an undefined name or else the response to CS if def command for conditional compilation would answer this question at compiletime At runtime the answer to an environmental inquiry conceniing an unsupported mode39s status might best be AB SNT defined herewith to be the name of no mode ABSNT is the mode of INXCT UNFLO and DlRND on a CRAY YMP for example Flags and modes are variables of type Flag and Mode that may be sensed saved and set by library programs I prefer programs that are syntactically functions but actually swap values For example my function Fflag OVFLO NewFlag returns the current value of the OVFLO flag and resets that flag to NewFlag FflagOVFLO merely retunis the value without changing it A flag39s value resembles a pointer in that it may be 39 er Null or some nonNull value retunied by Fflag and also resembles a Boolean value insofar as Null behaves like False and every other flag like True Consequently a typical patteni of use for Fflag goes like this SavOV Fflag OVFLO Null savesampclears OVFLO flag X 2 ex ression that may Overflow prematurely If Fflag OVFLO SavOV then havingrestored OVFLO flag X 2 alternative expression At the end premature OVFLOs have been hidden and the OVFLO flag is raised just if it was raised before or if the alteniative expression X overflowed Similarly Fmode DlRND NewDir swaps the DlRND mode for saving sensing and restoring For example if function g z is contrived either to retuni a monotonic increasing function of its argument 2 and of its rounding errors this can be tricky or else to take proper account of Fmode DlRND then a few statements like SavDir Fmode DIRND ToNEGV Rounding towards oltgt xlo g 210 yields upperbound Dummy Fmode DIRND ToPOSV Rounding towards ltgtltgt xhi g zhi yields lower bound Dummy Fmode DIRND SavDir Restores prior rounding Page 26 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm guarantee that xlo S gzlo S exact gz S gzhi S xhi despite roundoff This is admittedly a cumbersome way to obtain what Interval Arithmetic would deliver easily if it received the support it deserves from popular programming languages Here follows a simple example of flags and modes working together The Euclidean Length Norm of a DoublePrecision vector x is me xlL w xi2 xz2 x32 xL2 This simple formula arises in so many matrix computations that every matrix package like LAPACK and MATLAB contains a subprogram devoted to it It poses two technical challenges how may we l avoid an incorrect result caused by premature Over ow of some ij2 or Underflow of too many of them though the true value of Vnrm is unexceptional 2 avoid excessive accumulation of roundoff when L is huge For example consider the case when every ij for j gt 1 is barely small enough that x 2 xl2 rounds to xl2 then Vnrm can come out too small by about L4 units in its last place if the additions are performed lefttoright L usually stays below a few hundred but often runs into several thousands These challenges are worth overcoming only if doing so does not slow computation of Vnrm too much compared with the obvious subprogram Double Vnrm Double xlzL 3200 For j l to L do 5 s Xjxj Return Vnrm 2 Vs On a 680x0 based Macintosh or PC with ix87 our problem has an easy solution provided the compiler supports IEEE 754 DoubleExtended REAL10 begin the obvious subprogram with the declaration Double Extended s O Then the sumofsquares s will accumulate in a register with 3 more bits of exponent range and ll more bits of precision then Vnrm and x Thus with no loss of speed OverUnderflow is precluded unless Vnrm must lie out of range and roundoff is kept below 1 L8192 units in its last place These are typical benefits of an Extended format Moreover this subprogram honors Directed Roundings and the KOUNT mode of Over Underflow In the absence of Extended a craftier subprogram is needed to fiddle with flags and modes during the computation of Vnrm and particularly to ensure that the last expression computed and Returned also raises and merges only those flags that deserve alteration or else KOUNTs If the program to do so presented on the next page appears too baroque compare it with slower less accurate and more elaborate subprograms that now infest portable libraries like LAPACK The SAN E library provides two procedures ProcEntry and ProcExit to save and restore all flags and modes and merge old flags with new simultaneously However SAN E makes no provision for exceptions other than those mentioned by IEEE 754 nor for modes other than ABORT and IEEED nor for computer systems that do not conform to IEEE 754 My scheme lets a programmer utter for example If FmodeINXCT IEEED ABSNT then Dummy FflagINXCT True else Dummy FflagINXCT Null after which his program repeatedly preconditions data until a rounding error renders further repetition unnecessary on machines that detect INXCT but preconditions just once otherwise Page 27 PSUBS 10E128 i Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Double Vnrm Double XllzL OVm FmodeOFLO lEEED UNm FmodeUNFLO lEEED OVf FflagOFLO Null UNf FflagUNFLO Null swaps b l d z 1 these willbescale factors ifneeded s z 0 c z 0 c will compensate for additive roundoff For j 1 to L Do r 7 xljlxljl t 7 s s rc t Compensateforthisroundoff c tes r Heedparentheses OVf Ff lag OVFLO OVf If FflagUNFLO Null amp s lt 05969 Constants Then b 20996 1 z 059 suitonly Else If OVf IEEE754 Then b 05754 1 z 20754 Double If b i l Then Redo accumulation with scaled x s s 0 c 0 ForjltoLDo t bxjl r tt t z s z s rc t c tes 16 c UNf FflagUNFLO UNf OVm FmodeOFLO OVm UNm FmodeUNFLO UNm Return Vnrm laws lneed three more library programs Two of them are swapping functions Fpsubs ExceptionName NewValue supports presubstitution Second Kountf k lnitk designates k to be the integer variable into which OVER UNDERFLOWs will be counted reads out the current value of k and then resets it to the value of lnitk Those two may be embedded in Fmode The third program inserts or updates entries in the log of Retrospective Diagnostics or reads it out but that is a story for another day The foregoing schemes to handle floatingpoint exceptions can be called elaborate complicated cumbersome add your own pejoratives I shall rejoice if somebody shows me a simpler way to accomplish all of what my proposal tries to do Meanwhile onlookers who need not know about all these complications can stay in their blissful state of ignorance because IEEE 754 was designed with their state of mind in mind IEEE 754 establishes partially nested computational domains What this means is best illustrated by examples Rare individuals who intend to perform floatingpoint computations exactly without roundoff must pay close attention to the INXCT exception the rest of us ignore it because we are willing to tolerate roundoff Someone whose concept of Real Number excludes Infinity must watch out for DIVBZ those of us who ignore it accept ice as the uttermost among the Reals Quantities so small as 10E307 lie beneath our notice most the time so we ignore UNFLO and IEEE 754 specifies that Underflow be Gradual to reduce the risk of harm from what we disdain A few people can ignore OVFLO in a situation where any sufficiently big number will do this belief could be tested by recomputing with OVFLO in a few modes like PSUBS 10E32 i PSUBS 10E64 i instead of IEEED PSUBS Infinity i No one can ignore INVLD unless someone else has used Fflag lNVLD or Fmode lNVLD PSUBS or lsNaN to cope with that exception In short most of us can ignore most exceptions most the time provided someone else has thought about them Tha someone else needs our support lest we all be obliged to face some ugly problems unaided Page 28 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Ruminations on Programming Languages Mediaeval thinkers held to a superstition that Thought was impossible without Language that is how dumb came to change its meaning from speechless to stupid With the advent of computers Thought and Language have changed their meanings and now there is some truth to the old superstition In so far as programming languages constrain utterance they also constrain what a programmer may contemplate productively unless disrupted by bitter experience or liberated by vigorous imagination Considering how relatively few programmers grapple daily with floatingpoint arithmetic and how few of those have time to contemplate unsupported features of IEEE 754 it comes as no surprise that computer linguists receive hardly any requests to support those features Most computer linguists find floatingpoint arithmetic too disruptive Their predilection for referential transparency which means that a wellformed expression39s meaning should not change from one context to another runs counter to an imperative of approximate calculation The precisions with which expressions are evaluated must depend upon context because the accuracy required of an approximation depends more upon the uses to which it will be put and upon the resources available to compute it than upon alleged precisions of constituent subexpressions Consequently rules promulgated in 1963 inherited from Fortran IV for evaluating mixedprecision expressions are not optimal and never were those rules turn pernicious when applied to more than two precisions especially when precisions can vary at runtime See C Farnum39s 1988 paper for better ways to handle mixed precisions These pernicious rules are deeply imbedded in C insofar as its operator overloading explicitly disallows the expected type of a result to influence the choice of an operator now selected by consulting only the types of its operands This restriction precludes certain troublesome ambiguities but it also precludes fully effective C implementations of intensely desirable but contextdependent programming ambiances like Mixed Precisions arbitrarily Variable at runtime Interval Arithmetic arbitrarily mixable with NonInterval Arithmetic Ostensibly CoordinateFree expressions concerning Objects in Linear Spaces Mixedprecision expressions should ideally be evaluated at the widest of the precision of operands in the expression not segregated by explicit coercions Noninterval expressions must be evaluated as if they were intervals when mixed with Interval Arithmetic expressions In ostensibly coordinatefree Linear Algebra expressions must be evaluated in some coordinate system determinable from context if it is determined at all These contextdependent languages become burdensomely awkward to use when the programmer is obliged to utter explicitly those coercions and conversions that a compiler could almost always determine quickly from context Computer linguists also dislike functions with sideeffects and functions affected by implicit variables not explicit in argument lists But floatingpoint operations can raise IEEE 754 exception flags as sideeffects and operations are affected implicitly by exceptionhandling and rounding modes eligible at runtime according to IEEE 754 Alas that standard omitted to bind flags and modes to locutions in standard programming languages and this omission grants computer linguists a licence for inaction The sideeffecm and implicit variables in IEEE 754 admit tractable disciplines they are not whimsical Moreover other computational domains exist where contextdependence sideeffecm and implicit variables are rampant Examples of sideeffects and implicit variables abound in operating systems inpu output and filehandling real time control systems and synchronization of parallel computing In short the features of IEEE 754 that computer linguists disdain raise issues that cannot be evaded by avoiding floatingpoint they have to be addressed elsewhere anyway and in forms more obnoxious than in IEEE 754 Programmers ambitious enough to try to apply those features but left to their own devices cannot transport their meager successes to different computer systems their situation could worsen only if palliatives were incorporated into language standards and there is some risk of that Thoughtful action is needed now to avert an intensification of market fragmentation that retards development of robust numerical software and diminishes the market and its rewards for all of us Page 29 Work in Progress Lecture Notes on the Status of IEEE 754 May 31 1996 244 pm Annotated Bibliography IEEE standards 754 and 854 for FloatingPoint Arithmetic for a readable account see the article by W J Cody et al in the IEEE Magazine MICRO Aug 1984 pp 84 100 Whatevery computer scientist should know about oatingpoint arithmetic D Goldberg pp 548 in ACM Computing Surveys vol 23 1 1991 Also his Computer Arithmetic appendix A in ComputerArchitecture A Quantitative Approach J L Hennessey and DA Patterson 1990 Morgan Kaufmann San Mateo CA Surveys the basics Compiler Support for FloatingPoint Computation Charles Farnum pp 701 9 in Software Practices and Experience vol 18 no 7 1988 Describes among other things better ways than are now customary in Fortran and C to evaluate mixed precision expressions Intel Pentium Family User s Manual Volume 3 Architecture and Programming Manual 1994 Order no 241430 Explains instruction set control word ags gives examples Its aws are listed in Pentium Processor Speci cations Update Order No 242480001 Feb 1995 Programming the 80386 featuring 80386387 John H Crawford amp Patrick P Gelsinger 1987 Sybex Alameda CA Explains instruction set control word ags gives examples The 8087 Primer John F Palmer amp Stephen P Morse 1984 Wiley Press New York NY Mainly ofhistorical interest now User39s Manuals instruction sets control words ags for MC 68881 and 68882 FloatingPoint Coprocessors MC68881UMAD 1989 MC 68040 Microprocessor MC68040UMAD 1993 Motorola PowerPC 601 Microprocessor MPC601UMAD 1993 Apple Numerics Manual Second Edition 1988 AddisonWesley Reading Mass Covers Apple II and 680x0based Macintosh oatingpoint what a pity that nothing like it is promulgated for the ix87 For PowerPCbased Macs see Apple Tech Library Inside Macintosh PowerPC Numerics 1994 for PowerPC generally see a forthcoming document on Foundation Services for the CommonPoint Application System from Taligent which will support FloatingPoint C Extensions proposed for a new C standard now being debated by ANSI X3J11 Copies of draft proposals concerning oating point generally and complex oatingpoint arithmetic in particular can still be obtained through JimThomasTaligent com Sun Microsystems Numerical Computation Guide Part no 800527710 Rev A 22 Feb 1991 Information analogous to these notes39 but aimed at Sun39s customers describes variances among compilers and hardware offers advice explains crude retrospective diagnostics Branch Cuts for Complex Elementary Functions or Much Ado About Nothing39s Sign Bit W Kahan ch 7 in The State of the Art in Numerical Analysis ed by M Powell and A Iserles 1987 Oxford Explains how proper respect for 0 eases implementation of conformal maps of slitted domains arising in studies of ows around obstacles The Effects of Under ow on Numerical Computation JW Demmel pp 887919 in SIAM Jl Scienti c A Statistical Computing vol 5 4 Dec 1984 Explains gradual under ow s advantages Faster Numerical Algorithms via Exception Handling JW Demmel and X Li pp 983992 in IEEE Trans on Computers vol 43 8 Aug 1994 Some computations can go rather faster if OVERFLOW is agged than if it will be trapped Database Relations with Null Values C Zaniolo pp 142166 in Jl Computer and System Sciences vol 28 1984 Tells how best to treat a NaN he calls it ni for no information when it turns up in a database FloatingPoint Computation PH Sterbenz 1974 PrenticeHall NJ Ch 2 includes abrief description of my early 1960s work with Gradual Under ow OverUnder ow Counting and Retrospective Diagnostics on the IBM 7094 quotHandling Floatingpoint Exceptions in Numeric Programs J R Hauser ACM Trans on Prog lang and Syst vol 8 2 Mar 1996 Surveys many languagerelated issues and contains a substantial bibliography Page 30 CSCOE1541 Introduction to Com puter Architecm re Datapath and Control Review Sangyeun cm as a mmpmersuenze umw a vmsburgh Register file ntaface read ponwme port dock ontroi Signai Datapath elements Arithmeti iogit um ALU umbmalmnaiiugizfundmn m s D ALU Spamsquot am i madam I am mimmmmm H Adder FaPCinaememmqmranzhtarg taizuiatmn Mux Weneedaid Mthese Registers 39Regi a iePC arzhitedurawvisibieregmers Yempuraryregi ersmkeep intermediatevaiues Register file Processor building blocks Analyzing instruction execution lw load word Fetch mammary Hm made available inmea awnosteps Aztess data memory m the address computed in m above Step Storet evalue omthememorymthemrgetregimrspezi edmthe mmmun Abstract implementation Analyzing instruction execution add add Fetdwmstmmun Read mm m Source registers Addthewu numbers made available in m above step Storetheresulttometargmreginerspezi edmthemsvumon Analyzing instruction execution 39 JUUWP rmrnsauman Extend the 2am immediate mid snmienbyzbnslt2gt3msnwwgt 2mm emu SqnmczmMansimmmemrrmiPtzndmmaenmem mm zizbiwdue AssignthisvaiuetoPC Fetching an instruction Common steps in inst execution Fetching the instruction word from the instruction memory Decodingthemstrucuon and rmdmgfromtheregisler ie 0rprepareavamacramenarmmadaeavameondpo Performing an ALU opeauon Accessrng the data memory rr needed Making a jump assrgnrng a computed vaiueto PQ rr n Writingtotheregisla ie Designing a controi iogic is based on our moreformai anaiysis or instruction execution Consid er aii instructions mm M an mm mm Fetching operands Handling memory access Revisiting MIPS inst format JWne i a mu numbr Datapath so far More elaborate datapath Lyi u First look at control mm m quotumwmy mm Example lw r8 32r18 3 n up n n mm mm Lev aiiume r18 Hai 1000 Lev aiiume MHDEZ Ha 0x11223344 Control signals overview RegDsl wmh mstr eldtousefordst regmerspecmerv 5 ll ALUSrc Whlth oneto useforALU Sr 27 n 2 m am 33 I x ALUop whattype ofALU operauonv Example lw r8 32r18 Pm 39 n ma 39 Erartnz 35 pm mmzzm 1012 mm 7 32 39 Mem md quot muzzny Control signals in a table WW ALU control A op 00 lwsw01 been arithmengll rump ALU control Depen operal E ding on instruction we perform different ALU ion rm sua w SEHHESSJHANsimilarmSUB mm Supporting instruction Resource usage we mum Singlecycle execution problem The ydetlme depends on the most tlmerconsumlng lnslrucllon Whathappenslfwelmpiementamorezompiexlnsvumoneg e oatlngrpolntmuinpiltatlon All resourees are simuitaneousiy salve rthere is no sbenno of well adopt a muitlrcyde Soiutlon which allows us to usee faster iozk Adopt it quotnumberofdozkzyziesperlnsvumonand Reduce obysieel resour Singlecycle execution timing new in nemweeee new Multicycle implementation Reusingfundional units Break up lnstrumorl exeeution mm smaiier steps seen funnlonai unltls used for e speeineouroose in any eyele ALU lsused for addloonal functions zalzulaoon and PClnzrement Memory used for lnnrumons end date At the end of a cycle keep results in registers Additionai regliteri Now onlon signals are NOT solely determined by the instruction bits Controls will be generated bya FSM Five instruction execution steps Instruction fetch Instruction decode and register read Execution memoly address calculation or branch completion Memoly access or Rtype instruction completion Writeback Instruction execution takes 35 cycles cscaa 5M inna racananeiaicnmne Umsuyvfhmmx a Step 2 decode 8t operand fetch Read registers rs and rt We read both ofthem regardless of necessity Compute the branch address using ALU in case the instruction is a branch We can do this because ALU is not busy ALUOut wiII keepthe target address We have not set any control signals based on the instruction I type ye Instruction is being decoded now in the control logic c Kiflsm inna racananeaicnmne thvuslxyvfhmhuqu Step 1 instruction fetch Access memoly W PC to fetch instruction and store it in Instruction Register IR Increment PC by 4 using ALU and put the result back in the PC We can do this because ALU is not busyin this cycle Actual PC Update is done at the next cloclc rising edge cscaa 5M inna racananeiaicnmne Umsuyvfhmmx a Step 3 actions actions actions ALU performs one of three functions based on instruction type Memony reference ALLOut lt A signrextendIR150 Rctype ALLOut lt A op 3 Branch if 943 PC lt ALLOut ump PC lt PC3128IR150239b00 vetilog notation c Kiflsm inna racananeaicnmne thvuslxyvfhmhuqu Step 4 memory access f the mslrucuon 5 memory reference Mm lt MemorylALLOut mm a had MemorylALLOut lte Hw hsasmre Store 5 mmer fthemslrucuonwskrtype ReelRH 5 ml lt Aum Newthemstrumon Smmp me Multicycle datapath amp control Step 5 register writeback On yme ory oad mstrucuon reaches msstep RegiRl2016lltMDR Multicycle control design Example lw 1 cycle Example lw 3rd cycle Example lw 2quotd cycle Example lw 4 cycle Example lw 5 cycle Example j 2quotd cycle Example j 1 cycle Example j 3rd cycle To wrap up From a number of building blocks we constructed a datapath for a subset of the MIPS instruction set First we analyzed instructions for functional requirements Second we connected buildings blocks in a way that accommodates instructions Third we kept refining the datapath yam m Wm mzmvmevmznnmme Umsuyvfhmmqu To wrap up We looked at the mullicycle control scheme in some detail Mullicycle control can be implemented using FSM z Kmsm Wm mzmvmsmznnmme thvusuyvfhmmujq To wrap up We looked at how an instruction is executed on the datapath in a pictorial way Control signals were connected to functional blocks in the datapath How execution sequence of an instruction change the control sIgnals was analyzed We looked at the multicycle control scheme in some detail Multircycle control can be implemented using FSM yam m Wm mzmvmevmznnmme Umsuyvfhmmqu CSCOE1541 Introduction to Computer Architecture Datapath and Control Review Sangyeun Cho Dept of computer Science university of Pittsburgh Datapath elements Aw rvi Ml vi Arithmetic logic unit ALU Combinationai logic funclion d input ahALU operation anyin l5 hidden Output iesulL zero over ow a on Adder For PC incrementing branch target calculation cmrvuvi ux 39 We need a lot of these Registers Register file PC architecturally visible registers Temporaw registers to keep intermediate values WCQEWSM rm toamvuterAmmta we Mimimisith Register file R5quot veg m quotunvle Md am mm dgu mm 2 R1 mm We 5 m vegmy mu 2 Hula an Interface read port write port clock control signal n Wm WWW Register fIIe 7V D wig H mm Moms who mampmerAmMadwe Processor building blocks Moms intro mmpmwmmre Umvzmtynfhmhuxgh Abstract implementation m u m mmsz lmlmnlizm mmmy Moms intro mampmerAKmtadwe Umvzmlynfhmhuxgh Analyzing instruction execution Iw load word Fetch instruction Read a base register 0 Signextend the immediate offset Add the two numbers made available in the above two steps Access data memory with the address computed in the above step Store the value from the memory to the target register specified in the instruction CSCOHSM intro to Computer Arthitenure Universicyufmsburgh Analyzing instruction execution add add 0 Fetch instruction 0 Read from two source registers 0 Add the two numbers made available in the above step 0 Store the result to the target register specified in the instruction SCOHSM intro to Computer Arthitenure unmmyufmsburgq Analyzing instruction execution 39 J lump 0 Fetch instruction 0 Extend the 26bit immediate field a Shift left by 2 bits 28 bits now Extract the most significant 4 bits from the current PC and concatenate to form a 32bit value 0 Assign this value to PC CSCOHSM intro to Computer Arthitenure Universicyufmsburgh Common steps in inst execution Fetching the instruction word from the instruction memory Decoding the instruction and reading from the register file 0 Or prepare a value from the immediate value and PC Performing an ALU operation Accessing the data memory if needed Making a jump assigning a computed value to PC if nee e Writing to the register file Designing a control logic is based on our more formal analysis of instruction execution 0 Consider all instructions SCOHSM intro to Computer Arthitenure unmmyufmsburgq Fetching an instruction Read address Inshucmm Instrucxion memory wcumsm intro mmpmmmmre Umvzmtynfhmhuxgh Fetching operands 5mm mu mtmmmmmm gt41 an Sum A L J 1 mam mum We Moms intro mampmerAKmtadwe Lymmmmmmh Handling memory access a dam from memory Mann Mro mmwmwmmre Umvzmlynfhmhuxgh Data path so far F Wm m y MW 39 Mm w MNquot My mum W mm mcmw Moms who mampmerAmMadwe Umvzmlynfhmhuxgh Revisiting MIPS inst format Rzype 39 op rs 39 n iishamt39i 2521 ZD JB 106 3126 Hype 0 rs 1 1 16bit number i 3126 2521 150 J type 91 i1 gsbitpuq per 31 25 25 o wcumsm m mmwmmmg quotmmwmm More elaborate datapath m mm W M A me M L M l w r mwwm r w ri mm I x m mummy ALL Con ml 1 m Moms who mampmerAKmiadwe Umvzmlynfhmhuxgh First look at control Moms who loamvulerAmmle we Umvzmtynfhmhuxgh Control signals overview RegDst which instr field to use for dst register specifier I nst2016 vs nst1511 ALUSrc which one to use for ALU src 2 I Immediate vs register read port 2 MemtoReg is it memory load RegWrite update register MemRead read memory MemWrite write to memory Branch is it a branch ALUop what type of ALU operation Moms who loampulerAmmle we Umvzmlynfhmhuxgh Example lw r8 32r18 35 18 B 32 up rs r 16bit number Let39s assume r18 has 1000 Let39s assume M1032 has 0x11223344 Moms who mampmerAmMadwe Umvzmtynfhmhuxgh Example lw r8 32r18 PC4 7 u 2 Memmiimg I 1 MW MemRead39 0x11223344 Moms who ocampmermumre Umvzmlynfhmhuxgh Control signals in a table u Dquot Q s n Memto m um um mm m u u a nn 1 ALus m w and Win a n ALunpl Aluopn Hype 1 o a 1 n a u 1 a W o 1 1 l 1 u o o 0 SW x 1 x u n 1 n n u an x a x u o a I n 1 Moms 1m mmwimmmwe quotmmmmmmh ALU control Depending on instruction we perform different ALU operation rn X m 3 E m gt C 9n 0 E U o 5 U C H a g F L L 39 110 SUB 39 111 SETVIFVLESSVTHAN similar to SUB Moms 1m immpmermmire Umvzmlynfhttxhuxgh ALU control ALUop 00 iWsw 01 beq 10 arithmetic 11jump WWW W L39Si39r ifi 173239 At riiii u 5 LW no Lead Wu xxxxxx Add um SW 05 Store word XXXXXX Add mu SEQ U1 Branch if equai xxxxxx Subuact nu RUD 10 ADD IUUDDU Add 010 R Kwe w sus mama Subtract no Hype 10 AND mama AND can my m on 10mm OR 001 Hype 0 Set if i255 than imam Set if 1255 than 111 Moms intro mmpmmmmre Umvzmtynfhmhuxgh Supporting instruction Moms intro loamvmerAKmladwe Umvzmtynfhmhuxgh Resource usage Instruction Funztinnnl units used by m lnstruniun class xtype Moms who mmpmwmmre Umvzmtynfhmhuxgh Singlecycle execution timing Moms who mampmerAKmiadwe Umvzmlynfhmhuxgh cscwsm intro to Computer Architecture Singlecycle execution problem The cycle time depends on the most time consuming instruction 0 What happens if we implement a more complex instruction eg a floatingpoint multiplication o All resources are simultaneously activeithere is no sharing of resources We39ll adopt a multi cycle solution which allows us to 0 Use a faster clock 0 Adopt a different number of clock cycles per instruction and 0 Reduce physical resources University uf Pittsburgh cscwsm intro to Computer Architecture Multicycle implementation Reusing functional units 0 Break up instruction execution into smaller steps 0 Each functional unit is used for a specific purpose in any cycle 0 ALU is used for additional functions calculation and PC increment 0 Memory used for instructions and data At the end of a cycle keep results in registers 0 Additional registers Now control signals are NOT solely determined by the instruction bits Controls will be generated by a FSM Unweme uf nasburgh Five instruction execution steps Instruction fetch Instruction decode and register read Execution memory address calculation or branch completion Memory access or R type instruction completion Write back Instruction execution takes 35 cycles CSCoEISM Intro to Computer Architecture Universicyufmsburgh Step 1 instruction fetch Access memory w PC to fetch instruction and store it in Instruction Register IR Increment PC by 4 using ALU and put the result back in the 0 We can do this because ALU is not busy in this cycle 0 Actual PC Update is done at the next clock rising edge CSCOEISM Intro to Computer Architecture Umve nyuf nshurgq Step 2 decode amp operand fetch Read registers rs and rt 0 We read both of them regardless of necessity Compute the branch address using ALU in case the instruction is a branch 0 We can do this because ALU is not busy 0 ALUOut will keep the target address We have not set any control signals based on the instruction type yet 0 Instruction is being decoded now in the control logic CSCoElSM intro to Computer Architecture Universxcyufmsburgh Step 3 actions actions actions ALU performs one of three functions based on instruction type Memory reference 0 ALUOut lt A signextendR1 501 Rtype o ALUOut lt A op B Branch if AB PC lt ALUOut Jump 0 PC lt PC3128R2502 b00 veriog notation CSCoElSM intro to Computer Architecture Umve nyuf nsburgq Step 4 memory access If the instruction is memory reference 0 MDR lt MemoryALUOut if it is a load 0 MemoryALUOut lt B if it is a store 0 Store is complete If the instruction is R type RegR1511 lt ALUOut 0 Now the instruction is complete CSCoElS4l lntro to Computer Arthitenure Universxcyufmsburgh Step 5 register writeback Only memory load instruction reaches this step RegR2016 lt MDR CSCoElS4l lntro to Computer Architecture Umve nyuf nshurgq Multicycle datapath amp control 1 m m campuermm ammmmmmh Umvemlyofh sbnxgh Example lw 1st cycle Moms who ocampmermumre ammmmmmh Example lw 2nd cycle Mann Mro mmwmwmmre amwmcyommmh Example lw 3rd cycle Moms who ocampmermumre ammmmmmh Example lw 4th cycle Mann Mro mmwmwmmre amwmcyommmh Example lw 5th cycle Moms who ocampmermumre ammmmmmh Example j 1st cycle Mann Mro mmwmwmmre amwmcyommmh Example j 2nd cycle Moms who ocampmermumre ammmmmmh Example j 3rd cycle Mann Mro mmwmwmmre amwmcyommmh To wrap up From a number of building blocks we constructed a datapath for a subset of the MIPS instruction set First we analyzed instructions for functional requirements Second we connected buildings blocks in a way that accommodates instructions Third we kept refining the datapath csCOEisM intro to Computer Architecture unwmnyufmsburgq To wrap up We looked at how an instruction is executed on the datapath In a pictorial way Control signals were connected to functional blocks in the datapath How execution sequence of an instruction change the control Signals was analyzed We looked at the multicycle control scheme in some detail 0 Multicycle control can be implemented using FSM CSCoElSM intro to Computer Architecture Umve nyuf nsburgq 23 To wrap up We looked at the multi cycle control scheme in some detail Multi cycle control can be implemented using FSM CSCOHSM lntro to Computer Arthltenure Un ve nyuf nsburgq 24 CSCOE1541 Intro to Computer Architecture Inputoutput subsystem Sa ngyeu n Cho Dept of Computer Science University of Pittsburgh O devices Tofrom users 0 Display keyboard mouse Tofrom non volatile storage 0 Hard disk tape flash drives Tofrom other computers 0 Network interface card NIC Questions 0 What are the erformance re uirements of these devices 0 Are there changing needs CSCoElSM intro to Computer Architecture Univers yuf nshmgh Inputoutput lO lO connects User human and CPU or a program running on it Environment physical world and CPU or a program running on it Bottom line IO devices and CPU IO devices have vastly different performance requirements Eg Hard disk vs mouse Bandwidth 59 blockorientedvs latency 89 characteroriented lO system architecture Control Polling IO device is quotpassivequot vs interrupt IO device is quotactivequot S nchronous or as nchronous buses Parallel or serial buses DMA direct memory access 39 IU Interfaces are Cleren Dy standards csCOEisM intro to ComputerArchitecture Unixgrsnyumeshurgh Performance aspects IO devices typically have fixed bandwidth requirements What is the necessary bandwidth tofrom video buffer for a 1024x768 screen at 60Hz refresh rate Two tiers Between user and device o Display screen lt gt video card VGA cable amp signal interface Between device and CPU Video card lt gt CPU 99 AGP Another bandwidth example How many transactions per second can be handled Latency csCOEisM lntro to ComputerArchitecture Umveysnyumesburgh O bus Bus Connects different components in a computer system CPUmemory MemoryIO Chiptochip There are quotserialquot buses Processormemory bus lO bus Processormemory bus vs lO bus Main Memory Synchronous vs asynchronous Processormemory bus Master amp Slave Masters processor IO bridge esp DMA ddWhen wilhave multiple masters we need quotarbitrationquot Slave main memory A less 8 ta Separate Signals or multiplexed lO bus Split transaction bus Decouples address transfer and data transfer and allows multiple address 39 AccommOdates SlOWeT lO deVICGS and allOWS foICIGM data tranSfeT U5ng transfers before the first data transfer takes place gt increased bus dedicated buffers and DMA Decouples memory and IO bus transactions munmm wmww w quotwwww m 7quot w CSCoElS4l lritro to Computer Architecture Unwers yummbmgh CSCoElSM lritro to ComputerArthitetture Unwers yufpmbmgh Master amp slave Bus signals Master Address 0 Bus entity that can initiate a bus transaction 0 Examples CPU DMA Data I Slave Control 0 Bus entity that does not initiate a transaction by itself 0 Rather it only responds to a request from a bus master 0 Example Memory 0 Signals that show or govern bus transaction type arbitration data transfer timing exception etc As to timing address and control signals precede data for read why or they can come close together for write CSCoElSAl lritro to Computer Architecture Unwas yufpmbmgh 5on intro to ComputerArchitecture Universityumesburgh AMBA example Transfer with quotwaitquot states AMBA Advanced Microconlroller Bus Architecture from ARM gt1 an o NAMth m i mwmm Nwmmm m quotmay mam mamm n mmmm ytumsm m wmpmerArmnenure Ummmmm ymmsm lmru mmwwmmm Umxsuyumemugh Multiple transfers Burst sequential increasing mmsu m mman m HEWLSVW m mmm m mmm n1 ymmsm m mmwwmmm ammmmw WWW m wmvvmrmnenwe amswmm Burst wrapping DMA direct memory access Taps into the main memory bus and O device buffers From lO device to main memory eg copying a new network packet from Nic to main memory From main memory to v0 device eg writing a block of data from in em ory to hard disk From IO device to IO device in a main memory location to another Key is rogrammabilit quot Source and destination Totai transfer size how many hytes in total Transfer unit how many hytes at a tim e7 ransfer triggering condition amp transfer termination condition Various event notification methods eg termination error uaaauru u DMA vs CPU priority cgcrfism imiummmpmeimmnenme mammarme cscuasu inraracmuerarchmure Wmng Magnetic disk drive Magnetic disk drive performance madam Auniimly Queueing time H manyjobs are queued atthe time of request Seek time 39 Time needed to move the arm to the target track 39 Mechanical Rotation delay A murmurv 39 Time for the target sector to be under the head quotquot 5 V2 rotation t39me on average 39 M chani Transfer time Stack of platters Heads move together single arm 39 Time to vead out buffer and llansfevdata I I Two surfaces per planer Mk access me srgsation speed recording density buffering method Tracks amp sectors Queueing seek Rotation transfer cgcrfism imiummmpmeimmnenme mammarme cscuasu inraracmuerarchmure Wmng Improving magnetic d k drive Use snian scheduling to reduce queueing lime Lotzhtyrbzied Stheduhng I 0 arm Faster seeking em Methznitzi Faster iotaiion speed currently 7200mm 0 higher quot9 Eg vemtzi magnetization Get perpmditui Rag mww hitzthigit eminddireaeardineeomi rig nadigrgergendimiar Larger hufferunemiy16MB for PCs Smart athing Hash awe a k2 hybrid hard drive Higherrhandwidth bus stanrards Senai ATA a SATA 150MB 2 BooMEs 2 anoMEs Cf zmHeiATA orPATA maxiaaMEa mew inmmmnnmw mmvfmwn n Hard disk product fam es PCworkslalion 357m Perionnanee prite and apatity 2007SOGB Notebookasrin Power apatity and form factor 40WEOGB Embedded products Cost sensitivity and reiiabiiity 8016065PVR Micrordrivelt1rin Form factor power and 051 2SGB mew inmmmnnmw mmvfmwn n Magnetic disk drive some met cs Capacity GE Performance Latenty ms transfer rate MES owe onsum pl on Wan rneeaie differentmode egSpindieom Form factor inthxinthxinth Reliability war man imebetween faiiurei POH pudenon hours hration thermal EMI Cost mew nu mammary Msw humgn Example spec cation mew nu mammary Msw humgn Price per GB trend Dnllavs per gigabyte 1 53545556575359909152939495959795990001 CSC0E1541 Intro 0 Com 2 uler Arch ileclure Unwermy of pmgburgh Hybrid hard drive All clisks have a DRAM cache In e e A HHDD also has a nonvolatile cache CSC0E1541 Intro to Computer Architecture Unwemw ofpmburgh Cost vs access time mumu 11mm DRAM mm 1000 Cast 15mm 1000 mama 10mm ammo 1000ou 1 Du Access rm insr CSC0E1541 Intro 0 Com 2 uler Arch ileclure Umv army of pmgburgh Hybrid hard drive Disk comes ready in less than 1 Up to 90 Power Saving when po PM instantly while spindle stopped Read instantly even while spindle spinning for nlgner IU rate CSC0E1541 Intro 0 Com 2 uler Arch ileclure Umv army of pmgburgh CSCoElSM lntro to Computer Architecture HHD bootresume Windows During shutdown or hibernate all the disk sectors needed to boot or resume are pinned into the NV cache On next power on the BIOS POST runs and the disk is powered on but the spindle won39t be ready for 24 seconds BIOS can read data from the NV cache and all boot process O can be read from the NV cache Once the rotating media is ready O can be satisfied by both NV cache and rotating media for optimized read performance CSCoElSM lntro to Computer Architecture univermyufpmsburgh Hybrid hard drive summary Basic idea 0 Implement a 64128MB NV cache using NAND flash memory The NV cache size will increase as NAND flash memory price goes down Potential benefits 0 Reduce boot time and resume time o Extend spindle down time lower power 0 Higher reliability Problem 0 Cost Full flash based hard drives are already on the market University uf Pittsburgh CSCoElSM lntro to ComputerArthitetture Power saving mode Windows SuperFetch buffers disk data in system DRAM to fulfill reads Write Os buffered in NV cache while disk is spun down Disk spins up only when 0 Read cache miss 0 NV cache full The disk spin down and continues to use the NV cache CSCoElSM lntro to ComputerArthitetture univermyufpinsbuigh Optical discs Optical disc drives have become a standard archivebackup device for personal computin Based on opticslaser technology For writing quotphasechangequot materials used ie two different reflection rate dl EfE 47GB BIurag Disc 27GB University ufpinsbuigh Storage devices trend Hard disk 0 We will see 1TB hard disks become mainstream in PC soon 2gtlt capacity increase per year 0 Perpendicular recording is fully adopted 0 Hybrid disks will become more widespread from notebook market 0 Solidstate disks are gaining recognition Optical disk 0 CD speed of 52 gtlt saturated o Writeable DVD drives prevalent now 16gtlt speed saturated 0 Standard battle ended BIuray group Sony Samsung won over HDDVD previously AOD group Toshiba NEC Flash memory cards 0 16GB cards and 86B USB drives are available 0 Device NAND flash capacit has doubles every ear will slow down CSCoElSM lntro to Computer Arthitenure Umveys yuf nsbmgh DMA and virtual memory Different virtual memor39 a39es have different h39sical a39e numbers 0 DMA operation over multiple pages can pose a problem 7 quotwhat is the next pagequot Solutions 0 DMA using VM addresses With translation hardware OS does not remap those pages until DMA is done 0 Partition DMA transfer into pages OS chains the pages for the requester CSCoElSM lntro to Computer Arthitenure Univers yuf nsbmgh CPUlO interfacing Memory mapped O 0 IO control bits eg turn on turn off change color are located in memory addresses Use regular load and store to read from and write to those bits 0 These addresses are quotvolatilequot Dedicated O address space and instructions In any case O device control is done via well defined quotinterfacequot register specification CSCoElSM lntro to ComputerArthitetture Umvemtyumesburgh DMA and cache coherence Two copies 0 One in cache 0 One in main memory When transferring data from memory to IO When transferring data from IO to memory Solutions 0 Do not cache IO data 0 Flush cache whenever needed eg write dirty data to main memory before IO write Similarly invalidate cache before IO read Primitives special instructions or registers to control the actions are provided by hardware CSCoElSM lntro to ComputerArthitetture Umveygtyumesburgh cytomsm intro to Computer Architecture cytomsm intro to Computer Architecture Reliability issues in storage We don39t want to lose valuable data Fault is the cause of an error 0 When a fault occurs it creates a latent error which can later manifest itself System failure occurs because of a manifested error Example 0 All ha I article hits DRAM cell fault 0 Bit inversed error 7 latent until it is read 0 Error is propagated and affects the delivered senice failure University uf Pittsburgh Reliability improvement Fault avoidance o How to prevent by construction fault occurrences Fault tolerance o How to provide by redundancy continued senice in spite of faults Error removal 0 How to minimize by verification the presence of latent errors Error forecasting o How to estimate by evaluation the presence creation and consequences of errors University uf Pittsburgh MTTF MTTR MTI39F mean time to failure 0 1M39I39I39F rate of failures MTI39R mean time to repair MTBF mean time between failures MTTF M39I39I39R Module availability MTI39FMTBF CSCoElSM intro to ComputerArthitetture RAID Redundant arra39 of inex ensive dirks University umesburgh Disk array with striped data can provide high bandwidth 0 But not necessarily smaller latency What about reliability 0 Error rate with 100 hard disks is 100 times higher than that of 1 disk CSCoElSM intro to ComputerArthitetture University efpittsburgn RAID 0 and 1 RAID 0 0 Data striped o No provision for reliability 0 This is not really a RAID 7 it39s AID RAID 1 0 Data mirrored Every bit is written in two different disks 0 Expensive CSCoElSM Intro to Computer Arthitenure Optimizing small writes I Reno 2 Read a Real i El y EEHEE 3 Write CSCoElSM Intro to Computer Arthitenure t WHIR University at Pittsburgh 3R2W 2R2W University at Pmslzlurgh RAID 2 and 3 RAID 2 0 Bit interleaved parity 0 Memory ster ECC error correction code 0 Still expensive RAID 3 o Byteinterleaved parity 0 Add 1 check diskfor N disks overhead 1N CSCoElSM Intro to ComputerArthitetture University umesburgh RAID 4 and 5 RAID 4 I BlockInterleaved parity I 1 check disk for N data disks I Chick is done on a block with the original data block possibly residing in a single is I Allows parallel reads RAID 5 I Same as RAID 4 but check blocks are also interleaved El E mm qus CSCoElSM lntro to ComputerArthitetture University nfpittsburgn RAID 6 PQ redundancy Added more parity for improved error recovery RAID s we Logrcar Disk Array mmr Buysrca Physrrar Physrcal Physm urgk mm Dusk mm Drskh mm m a mpmmmnmu Mswmm


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Amaris Trozzo George Washington University

"I made $350 in just two days after posting my first study guide."

Jim McGreen Ohio University

"Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.