Popular in Course
Popular in ComputerScienence
This 93 page Class Notes was uploaded by Betty Kertzmann on Tuesday September 22, 2015. The Class Notes belongs to CS 530 at Colorado State University taught by Yashwant Malaiya in Fall. Since its upload, it has received 41 views. For similar materials see /class/210183/cs-530-colorado-state-university in ComputerScienence at Colorado State University.
Reviews for Fault
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 09/22/15
Design for Testability in Digital Integrated circuits Bob Strunz Colin Flanagan Tim Hall University of Limerick Ireland This course was developed with part funding from the EU under the COMETT program The authors wish to express their thanks to COMETT Document rescued from the depths afintemet Introduction and Objectives Testability In Digital Systems 0 Faults 0 Test Vector Generation 0 Combinational Logic Test Fault Models 0 StuckAt Faults 0 Example 0 Fault Models For Basic Gates AND Gate OR Gate Inverter Special Circuits The Sensitized Path Method 0 Problems with the Sensitized Path Method 0 Making Choices 0 Reconvergent Fanout Paths Redundancy and Undetectability 0 Example The Dalgorithm o DNotation o Singular Cover 0 Primitive DCubes of Failure PDCFs Example Example 0 Propagation DCubes o DIntersection Example Examples The Full DAlgorithm Example DDrive Consistency Operation DDrive 0 Consistency Operation Other Test Generation Methods 0 L ASAR o PODEM o Boolean Differences Design for Testability AdHoc Technigues Structured Technigues Scan Paths 0 Scan Path Implementation Stanford Scan Path Design Latch Based Designs Self Test 0 Signature Analysis Generating Self Test Patterns 0 BILBO Test Procedure with BILBO39s 0000 Introduction and Objectives This course provides an introductory text on testability of Digital ASIC devices The aim of the course is to introduce the student to various techniques which are designed to reduce the amount of input test patterns required to ensure that an acceptable level of fault coverage has been obtained Testability In Digital Systems Being able to design a workable system solution for a given problem is only half the battle unfortunately We must also be able to test the system to a degree which ensures that we can have a high confidence level that it is fully functional This is generally not a straightforward task in very small scale digital systems we can test exhaustively that is to say we can exercise the system over its full range of operating conditions In a larger scale system it is no longer possible to do this and therefore we must look at other strategies to ensure that the system will be properly tested When testing a digital logic device we apply a stimulus to the inputs of the device and check its response to establish that it is performing correctly The input stimulus is referred to as a test pattern In general we observe the response of the device at its normal output pins however it may be that the device is specially configured during the test to permit us to observe some internal nodes which generally would not be accessible to the user The response of the device is evaluated by comparing it to an expected response which may be generated by measuring the response of a known good device or by simulation on the computer If the device under test DUT passes the test we cannot say categorically that it is a good device The only conclusion that we can draw from the device passing a test is that the device does not contain any of the faults for which it was tested It is important to grasp this point a device may contain a huge number of potential faults some of which may even mask each other under specified operating conditions The designer can only be sure that the device is 100good if it has been 100tested this is rarely possible in real life systems Faults What type of faults are we trying to detect 7 We are starting with the assumption that logically the system performs its desired function and that any faults occuring will be due to electrical problems associated with one or more of its component parts Two key concepts are of interest here these are Controllability Observability During the design ofa system the designer must ensure that the test engineers have the z u v Sy em 239 i Equally as 39 39 39 Will be observable that is that we will be able to see clearly the effects of the test patterns applied 1 Controllability Being able to set up knon internal states 2 Combinatorial Testability Being able to generate all states to Jlly exercise all combinations of circuit states 3 Observability Being able to observe the effects ofa state change as it occurs preferably at the system primary outputs Test Vector Generation In VLSI circuits we have a high ratio of logic gates to pins on the device there is generally no way of accessing most ofthe logic so we cannot directly probe the internals of the device Because of this problem we need away ofgenerating tests which when applied to the inputs ofa circuit give a set of signals which indicate whether or n tthe device is good or fau ty The set of stimulus input and expected output pattern is called a Test Vectorquot Tl t t quot39 quot 39 betwe n L 4 L39 machine Figure 1 shows a digital device as we can see there is only access to the pliluajy 39 39 39 onlythese ports Combinational Logic Test Ni If 39 quot lnicblock 39 ln m then the device may be tested by applying all possible ZAN possible inpu patterns here N is the number of inputs This is termed Exhaustive Testing and is satisfactory for small circuits but rapidly becomes unweildy as the number of inputs grows Assuming a tester capable of applying a test pattern every 100ns Then we can calculate the test time as shon in table 1 Inputs Num Tests Test Time 20 220 106 01 Sec 40 240 1012 305 Hours 60 260 118 X 1019 58500 Years Table 1 Exhaustive Testing Times Looking at table 1 it is apparant that the exhaustive test strategy gem completely out of hand quite quickly and therefore it is only of use Where there are a Very small number of inputs This is also a Very inef cient test strategy most of the test patterns are actually redundant We need a method of determining which test patterns are signi cant in order to obtain a minimum set of patterns Various methods are listed Sensitised Path Method gorithm Critical Path LASAR PODEM Boolean Differences 5quot ny Currently the DAlgorithm and is descendenw P ODEM and LASAR are the most Widely used methods Fault Models A Fault Model is a description at the digital logic level of the effects of some fault or combination of faults in the underlying circuitry The use of fault models has some advantages and also some disadvantages 0 Advantages 0 Technology independant 0 Works quite well in practice 0 Disadvantages 0 May fail to identify certain process specific faults for instance CMOS oating gates o Detects static faults only Stuck eat faults Example Single Fault Consider an N Input AND gate For the STUCKAT fault model there are 3ANl 1 different cases of single and multiple faults with the single fault assumption there are only 2Nl stuck at faults AND Gate Fault Test at Inputs Output SAO 1111 1 ml SA1 0111 1 In2 SA1 1011 1 InN SA1 1111 0 Table 2 Nelnput AND Fault Model N For the AND gate any input SAG has the same effect as the output SAO Output SAO is said to CO VFR all of the input SAO faults Similarly any input SAl COVERS the fault output SAl This means that 1 No Input SAO faults need be included in the fault model 2 The Output SAl fault need not be covered either So the AND gate fault model has Nl faults these are listed in table 2 Therefore with Nl faulw Nl test Vectors can completely test an N input AND gate The NH faults COVER all the 2Nl single SA faults It can be shown that they cover all 3ANl 1 multiple SA faults as well OR Gate With an OR gate the Output SAl covers all Input SAl faulw Any Input covers Output SAO This is shown in table 3 Fault Test at Inputs Output SAI 0000 0 Inl SAO 1000 0 In2 SAO 0100 0 InN SAO 0000 1 Table 3 Nilnput OR Fault Model Inverter For an inverter the Output SAl covers Input SAO and the Input SAO covers Output SAl This is shown in table 4 Fault Test at Inputs Output SAl 1 Output SAO 0 Table 4 Inverter Fault Model Special Circuits especial 39 39 39 n L L L LL n both single and multiple faults Most such circuits are generally trivial however the 2 inputANDOR 39 39 39 39 39 39 39 Lnn iLi in PLA structures Such a circuit is shown in figure 4 X1 X3 X2 x4 The circuit of gure 4 realises the Boolean function Z X 1X 3 X 3X4X1X2X4 X2XaX 4 i I I 2 level ANDn I I I I I I realised by the circuit These prime implicants may be expressed using the CUBE 39 39 39 39 an an LLi L 39 39 39 L represent the vertices of a hypercube in a Boolean statespace The Cube associated with is 0mm The Cube associated with X3X4 is 101 The Cube associated with X1X2X4 is 111 1 The Cube associated with X2X3X4 1s 9110 So the function may also be realised as z 0xoxxxol11x1x110 Consider now the set offaults 1121110 This completely covers all other faults in Gate G1 If 1 10 occurs the cube 0XOX disappears completely from Z leaving Z XOX11X11X110 A test vector for this fault is any input pattern which causes the two functions Z and Z to differ Since the 0cubes minterms of Z must be a subset of those in Z a test vector for 110 wil be any 0cube in Z and not in Z39 T110 Z Z 0000000100100011 X0X11X11X110 00000010 In general for any ANDgate s output SAO the test set may be found by taking the difference between the dropped minterms and the other minterms This is most easily done by expanding the dropped prime implicants to minterms and comparing them with the other prome implicants Testing AND gate output SAO also tests for the appropriate OR gate output SAO and any inverter output SAO as well The test sets for the other AND gate outputs SAO are T120 1001 T130 1111 T140 01101110 Testing for AND gate inputs SAl Consider 1 1 This modifies Z to Zquot X0XXX0X1X110 As before the set of test vectors for this fault will be all those minterms in Z and not in Zquot and vice versa ie the set difference T11 Z Zquot 0 00XXX0X11X11X110X0XXX0X11X11X110 10XX X0X11X11X110 Expand 1XOX to minterms and take the set difference 10001010 Similarly the test sets for some of the other AND gate inputs SAl are T21011001010111 T31010101111101 Notice that the vectors 0101 and 0111 serve as tests for both 21 and 31 Testing an AND gate input SA1 also tests for the OR gate output SA1 and any inverter output SA1 which lies in the path to the AND gate input Testing the AND gate output SA1 and each input SAO covers the AND gate However it also covers both the OR gate and the inverters Thus by testing only the AND gate we perform a complete test for all the faults in the circuit The Sensitized Path Method This is a heuristic approach to generating tests for general combinational logic networks The circuit is assumed to have only a single fault in it The sensitized path method consists oftwo parTs l The creation ofa SENSITIZED PATH from the fault to the primary output 2 The JUSTIFICATION or CONSISTENCY operation where the assignments made to gate inputs on the sensitized path are traced back to the primary inputs X1 X2 X3 X4 X5 X6 Figure 5 is an example network with an assumed fault 70 The sensitized path method will be used to generate a test for this fault PART 1 Create the sensitized path This is done by forcing the complement of the fault on line 7 and propagating this Value to the output step 123456789101112131415 1 11 1 2 0 1 3 10 Step 1 Create conditions for fault to be detected on line 7 this can be achieved by applying the test Vector for an AND gate output SAO net 1 l and net 2 l The Value shown on net 7 in the table is that which it would carry in the absence ofa fault Step 2 We need to propagate the fault through G5 this may be done by setting net 10 0 Step 3 We now need to propagate through G7 this can be achieved by setting net 14 1 In general the fault signal is propagated through each gate by picking a combination of the other inputs which cause the output to be solely dependent on the faulty input The sensitization stage is now complete because the fault has been propagated to a primary input PART 2 Justify the assignment of Values to the outputs of internal gates made in part 1 by working backwards towards the primary inputs Interior nets 10 and 14 have had Values assigned to them corresponding to the outputs of gates G6 and G2 First we notice that net 10 having a Value applied to it implies the Values of nets 8 11 and 12 so these are updated in step 4 step 123456789101112131415 11 1 0 110 4 0 01 Step 5 Now we trytojustify the assignment ofalogic 1 to net 14 output ofG6 Notice that net 12 1 will force net 14 1 so this condition is automatically justi ed Therefore assign an X don t care to net 9 step 123456789 101112131415 1 1 1 0 0 0 1 1 1 0 5 X In justifying nets 8 and 9 we have created two new gates to justify ie G2 and G3 Step 6 G3 is easy tojustify since it is X nets 5 and 6 can also be X step 123456789101112131415 11 10X001110 6 XX Step 7 G2 may be justi ed by either net 3 0 and net 4 X or net 3 X and net 4 0 so nally We have step123456789101112131415 11 XX10X001110 ox 7b X0 The test generation is noW nished because all the primary inputs have been assigned Values T70 110XXX11XOXX Problems with the Sensitized Path Method 1 Making Choices 2 ReconVergent Fanout Paths Making Choices The sensitized path method attempts to drive a test to a single output When the propagation routine reaches a net With fanout it arbitrarily selects one path Sometimes this blind choice of a path ignores easy solutions X1 X2 X3 x4 x5 The NAND gate 34 of gure 515 easy to eontro1 rts rnput from 32 ean be setto 1 by setung PIXS to 0 Th1s makes an easy path to propagate faults on Gland other to rts 1e t to a pnmary output on the otherhand gate 63 W111 be more ddf cult to eontro1beeausertsrnputltnet1 r n test propagaung through 63 rnust also propagate through other logn recogmzmg thrs mdxsjust as hkely to ehoose apath through 63 estabhshed Reconvergent Fanout Paths The sensruve path rnethod 15 NOT guaranteed to nd a test for a fault even where sueh a test does exst The pnnerp1e eause ofthxs prob1ern 15 the presenee ofreconvergent fanr outpaths m a erreurt Consrder the erreurt of gure 6 wrth fault 60 The path sensrtrzatron proeedure 15 Step 1 7 create eondtrons to deteet fault on net 6 Step 2 7 Try propagatrng through gate GS by assrgnrng 0 to net 1 Step 3 Nets 1 and 3 are both 0 from steps 1 and 2 gt net 5 1 step 123456789 1 X0 D 2 n D 5 0X0 D E 3 15D XD D1 D step 123456789 an D1 D 4 n X 1 unuxnl D 5 no 1 1 nnnn1D1 D Step 4 Nets 5 1 from step 4 gt net 8 0 Step 5 Now propagate net 9 to the output by setting nets 81011 0 The path sensitization is now complete as the fault has been propagated to a primary output Assignments to internal gate outputs must now be justifie G6 0 and G7 0 need to be justified Start with G6 0 step 123456789101112 000 11 00001 1 1 111 00001 1 0 1110000 0 1 0 00 7 0 0 O 0 Step 6 Net 6 is SAO so net4 1willjustify net 10 0 Step 7 This implies G3 0 from net 2 0 and now net 4 1 However this assignment is INCONSISTENT because net 3 0 and net 7 0 but also net 11 0 according to the table is is not correct e values specified on nets 3 and f 39l d as 7 should result in net 11 adopting a 1 Thejustification procedure has a1 e On examination we can see that the problem arose because we assigned a 1 to net 4 However this is an inevitable assignment given the path we are trying to sensitize We could backtrack and try to sensitize a path through G6 but the same problem would arise It appears that there is no test for 60 However such a test does exist it is T60 0000 Why does the sensitized path method fail to find this test 7 If 39 we see that mi emu 39 39 I through both G5 and G6 simultaneously The problem with the sensitized path method is that it only ever attempts to sensitize a SINGLE path For reconvergent fanout networks this will usually cause problems during the justification stage The problem arises because having completed the sensitization we must during justi cation trace back along the alternative paths to the position with the fault This will require us to justify a gate one of whose inputs is faulty which severely restricts our choice of inputs Usually this leads to the type of difficulties encountered in this example Redundancy and Undetectability If no test vector exlsts for a fault than that fault ls UNDETECTABLE Undetectable hazard condltlons Flgure fault Fault 31 ls urroleteetable beeause m order to propagate rtto rret s we requlre X11 and x2lbutxlANDx2o Are urroleteetable faults aproblem 7 Yes beeause they ear MASK otherfaults Example Fault Masklng 110 ls atest for 10 However lf31 ls srrrrultaueously preserrtrrr tlre erreurt tlus test wlll fall shown above 1 X1 AND X2 Satlsfy yourselfthatthls ls so The Dalgorithm e Ms guaranteed to nd a test vector for a fault if one exism The Dalgon39tnrn has hem speci ed vey formally and is suitable for computequot I I I I u L L LLUI ucl nun mammary u n i i outputs DNotation Singular Covequot Primitive DCubes ofFailure EDCFs 0 am Is U E 3 S 5 l D Notation This is a compact way of specifying how faulm propagate through a circuit D is a holdingD 39 39 39 39 39 Nm 39 is analogously de ned Faulted Machine 0 1 Good 0 0 D Machine 1 D 1 D stands for discrepancy signalquot Singular Cover mputAND gate Truth table Singular Cover a b F 0 0 X 0 0 1 1 1 LhePl set FotheAND gate P0 UXUX00 P1 111 a Autumn F lmphcants ofF and those ofNoLF Primitive DCubes of Failure PDCFs A m n hung the fault to the circuit output To generate the P D c F for a fault cover and intersect the P1 cubes with the F0 cubes wPl mi following table Faulted Machine 0 1 D D U Note e D and NotD are only allawed on UUTPUT pinxfor PDCF inmmn39m Example Form the P D CF 5 for a 21nputAND gate where mput 1 x5 SAD We already have the singular coverfor the faulefeee AND gate n is P0 UXUJ C00 P1 111 For the faulted AND gate 10 the eovens 1 2 3 X X 0 x e F1 is the empty set and F0 contams all mput combinations P0 0 f1 UXUL X00 0 P1 0 fn 111 0 000 1113 Now performmg the intersections he n h lt0X01ltXoonlt1xm 103 pmfo cunnmxon 1 So the oh1y P D c F for101511Dgt Example Form the P D CF 5 for a zerhputAND gate where rhput 2 15 5A1 For the faulted AND gate 21 the eovens 1 2 3 0 X 0 1 X 1 a Interseeahg he n n lt0X01ltXoomltlxn 103 pmfo 111 0X0 The oh1y P D CF for 2115 10 NotDgt Propagation D Cubes The propagatroh Deorbes ofagate are those whreh eause the output of agate to depend propagated through the gate The propagatroh Drcubes for a Zrlnput AND gate are y D D D D D D mun tntr tr tn of P1 accordlng to tlne followmg table P1 1 D 1 1 0 0 5 0 In general rtrs posslble to have up to 2mm propagatron Drcubes for an ernputgate so normally only those eubes wrtln a smgle D m the rnputs are stored lees wrtln m tlne one D m the rnputs ean be formed by rnterseetrng tlne eovers DIntersectiun o show nowD slgnals at tlne outputs ofgates rnterseetwrtln tne propagataon Drcubes of otlnergates allowmg a sensmzed path to be eonstrueteol Example Generate atestfor 20 1 the elreult offlgure 8 20 has P D c F 01NotDgt To txansmxtthe NotD on net 4 ihmngh 32 we must try to match1e1ntersectthe spemfxcauon wnh one ofthe propagauon Drcubes for 32 Sneh amatch is possime 1fthe propagauon Drcube 0D NotD1s used PCDF Prop Decnbe Intersection The n11 sei of inies for use mineiseeiimi is as canons lei A Dash 11 B minuh be Dcnbes n1ieie has 5 01XDE 1 g i g n The De tersecliou denoied b AFIB is as canons l X 1397 41 as 2 ifa x and 6 x i1ien a M 7 a has a oineinise 3 A B a ie i1ie nn11 intersection or enipiy cube if for any i a nb 1 Examples leDO 1 X51130 X51130 D1 01511 OOXDI 00D For purposes ofmtersecuon blank enmes in atable conespondto X s The Full DAlgo rithm 1 choose aPDCF for the fault under consideration quot quot 39 e ault r r We do this by successively intersecting the PDCF ofthe fault with the propagation Dcubes of successor gates The process is called the DDrive Justify 39 39 A 39 Ddriveby39 39 L 39 39 covers ofgates 39n the justi cation path with the expanding Dcube This is called onquot i the Consistency Operati cu w Example Use the Dalgorithrn to generate atest for 60 in the network shown in gure 9 D Drive Step 1 Select PDCF for 60 Step 2 Intersect with propagation Dcube for NOR gate G4 Step 3 Intersect with propagation Dcube for NAND gate Gs u i i iinmenu rl step 123456789 1 XI D 2 n D D nxn D D 3 1DD nxn D1DD step 123456789 nxn D1DD 4 n X 1 nnnXD1DD 5 an 1 1 nnnn1D1DD Consistency Operation Step 4 e tnat0x1wr11 serve Step 5 rJusufy tne choose 001 arbnx rgnrnent 31 x Any combmauon ofmputs wru serve so we ass anly and we have atestfor 60 ans rs an easy example because no rneonsrsteneres were enmuntered wnen tnev are rt rs necessary to backtrackto tne last pornt atwhxch an arbrtrarv deersron was made and make anotner enoree attnatpornt ans can leadto alot ofBACKTRACKJNG and can make tne algorithm very slowxfalot of alternative patnsrnust be enamrned Figure 10 is an example circuit where inconsistencies force backtracking D Drive Step 1 SelectPDCF for 51 Step 2 Carry out implications ofthis choice net 4 becomes 1 Step 3 Propagate through G4 by using one of the propagation D cubes of the NAND gate Step 4 Propagate through G5 by using a propagation D cube of the NOR gate at 9 1 2 1X 1X 1X 1X 1X 1X H HHHI I UlU D U Ui a H D OI D DD sad aulci 123456789 1 1139 Step 5 We now have D on net 7 and NotD on net 8 No propagation D cube exism for this combination of inputs on an AND gate G6 so Nulls are entered instea ehoose anothervalue for etthehhet 7 ohhet 8 Thxs is a Justification step for Go Step 6 7W2 arbxtxanly ehoose to modAfy net 8 and se1eet to set it to 0 Thts aehoh a t n a In this mstahee the value on net 6 beeomes mvahd ahdts dropped Urcube try another theme ofmput Step 7 1 A propagation ofmputs on Go so the intersection is successful Thts eompletes the Dome Consistency Operation beJusn ed Gates GS and 63 needjusu canon Step 8 rJusufy GS Thts gate has a fault on one mput whteh eah be matehed usmg aux Step 9 rJusufy 63 Thts eompletes the consistency operation Other Test Generation Methods In this section we will brie y examine some other test generation methods LASAR L ASAR Logic Automated Stimulus And Response Otherwise known as the Critical Path method Unusual in that all circuits to be analyzed using this technique must first be converted to NAND equivalent form Very similar to the justification part of the D algorithm It works back from assumed values on primary outputs towards the inputs Used in the HILO simulator PODEM PathOriented DEcision Making This algorithm was designed to generate tests for error detection correction circuits These circuits typically contain large numbers of XOR gates which slow down the Dalgorithm Has been found to work as well as or better than the Dalgorithm for most general circuits Works from primary inputs towards fault site and primary outputs Boolean Differences This approach was popular amongst researchers in the 1960 s but has not survived Des1gn for Testablllty There are various techniques in common usage which help the designer of digital systems to ensure that his system will be testable In the following sections we shall consider some ofthese AdHoc Techniques In this section we shall list a set of what might be termed design for testability rules These are not architecture design techniques perse but rather a set of guidelines 1 A global reset signal is important this brings all of the internal memory elements to a known state 2 Long counter chains should be broken A 10 bit counter needs 1024 cycles to test it fully if divided into 2 5 bit counters only 32 cycles are required Plus a few approx 18 cycles for testing the decode logic This approach may be difficult with fast synchronous counters with lookahead carry 3 Bring difficult to test internal nodes out to device pins This may also be difficult as pads are usually at a premium 4 On board clock generators should be replacable by an external test clock signal this will allow external control of the clock during test 5 Never Ever use asynchronous design this can lead to RACE conditions and lots of other nasty problems Only ever use the CLEAR input on a ip op for the global reset signal Structured Techniques Primary Inputs Primary Outputs Combinational Logic Memory Element Memo Element Figure 11 illustrates a canonical model ofa sequential circuit What the structured design i t ta 39li 39 39 39 memory elements from external pins Scan Paths Since IO for test purposes This is called a SCANPA TH design umiu Primary Inputs Combinational Logic Memory Element scan 1n Element Primary Outputs Scan Out Figure 12 illustrates the canonical system of gure 11 with a Scan Path added to it 39 39 39 39 iIninput nIm39n D ted L L39 c block u 39 39 p and the memory element inputs as primary outputs rne snin reg re smith a 1 o 1 o 1 Once the shin re i re 39 39 Mode and 1 cycle ofnormal operation performed to check that p Scan Path Implementation Scan path designs may be implemented in various ways Stanford Scan Path Design Tn thi 39Dn added as shown in gure 13 den the combinati anal block U DTest Test Clk The ip ops are then connected as shown in gure 14 From Comb Logic From Comb Logic Scan Out Scan In To test the design 1 Load test pattern into ip ops by clocking Set test 0 PPNquot w Apply 1 rlnrlz ml 9 1 test pattern 5 Set test 1 and clock out the result Latch Based Designs Latch based designs attempt to eliminate circuit race hazards A completely hazard free circuit is easier to test and more reliable The most important technique was developed by IBM and is called Level Sensitive Sean Design or LSSD In LSSD each memory element consists of 2 latches the L1 latch and the L2 latch The L1 latch is a 2 port latch with 2 clock inputs as shown in gure 15 Dl D D2 C1 C2 C Input D1 is controlled by clock signal Cl when Cl is high then D1 is connected to Q Normally D1 is connected to the outpuw of the system combinational logic and D2 to the scan path Latch L2 is a standard D type latch The system is connected together as shown in gure 16 TO LOGIC FROM LOGIC TO LOGIC SCAN OUT CLK TEST7CLK CLK2 This is one possible LSSD structure it is designed by directly replacing ip ops in an IC Each latch pair is used exactly like a MasterSlave ip op in normal operation A 2 phase nonoVerlapping clock is used on CLKl and CLK2 as shown in gure 15 Jlll The Lestprocedure IS as follows 1 rr r A 2 Apply 1 CLKl pulse followed by l CLKZ pulse to run the test through the elrcull 3 clock the result outuslng TSTCLK and CLKZ Thls ls only 1 leehmque Whlch may be used to deslgd LSSD Circuits Self Test Varlous forms of selflesl leehhlques are used m modem Dlgltal 1c deslgi In the followlng seeuohs we shall look at some othese Signature Analysis o l A one wlll devlate from ths ha l all lh lfmmanlrl a showh m gure 18 GF2 on the mcommg bxt saeann andthe taps coming back from me LFSR XOR Gates mmally an putm the global reset signal A erN clock pulses the regxsterwxll contam the Lquot WVHPkrdmw Mr for NH num 1be the same signature 15 given by 2nm 1 Pew 271 1 F rw V wdw V infinny this tends to 1 Perrz2 m W n Packard use m 1sgmng Pen 1 SE and have not found this error probability to pose a problem m pnaeaee Generating Self Test Patterns A As n wn mam m e length will be 2m 71 645mm patterns all zeros is not allowed 39 o1234 Is I6 I7 l8 To perform insitu testing of a logic network we could place one of these registers at its input and some signature analysis circuitry at the output The LFSR generates random inary sequences which are fed through the network under test and analysed by the signature analysers BILBO BILBO is a rather unfortunate acronym forBuilt In Logic Block Obrewajion it implements the signature analysis idea in practice The memory elements in the system are connected in a Scan Path as shown in figure 20 Logic Block 1 I I Logic Block 2 BILBO Scan In Scan Out Each BILBO can act as o A Scan Path shift register 0 An LFSR generating random patterns 0 A multiinput signature analyser Provided that the start state is known and that a known number of clock cycles are injected the finish state will be a known pattern Test Procedure with BILBO39s Testing using BILBO39s is carried out as follows For logic block 1 l BILBO l is initialised to a nonzero initial state by shifting in a pattern through Scaniln 2 BILBO l is con gured as a PRBS generator and BILBO 2 as a multiinput signature analyser N clock pulses are applied BILBO 2 is con gured as a ScanPath and the result is shifted out through Scan70ut bUJ To test logic block 2 BILBO 2 becomes the sequence generator and BILBO l the signature analyser The quality of the tests generated fault coverage must be determined by prior fault simulation The final signature may be determined by checking a known good part Dangerous or by logic simulation The translation was initiated by strunzbitderlulie on Mon Jan 23 164416 WET 1995 Random Testing YKM Random Testing Outline RT advantages and tradeoffs RT vs pseudorandom testing PR Coverage and detectability pro le Hardware and software DPS CL for random and pseudorandom tests High and low testability faults during early amp late testing Implications of a late asymmetric profile October 21 2008 Random Testing YKM Random Testing Extensively used for both hardware and software Ideally each input is selected randomly PR Pseudorandom schemes approximate random Generally quite effective for moderate coverage Coverage hard to determine a priori Ineffective for random pattem resistant faults Coverage tools Random functional followed by Structural testing 7 it b 10212008 a we 1 WHEN FTC YKM 3 Random Testing Advantage N 0 test generation using structural information needed Test setup using comparison Random pattern generator comparator Random pattern generator comparator Stored esponse Alternative Is response reasonablesoftware testing 10212008 FTC YKM 4 October 21 2008 Random Testing YKM Pseudorandom PR Testing Unlike true random reproducible Will not repeat until all combinations applied Generation usually justintime not stored Autonomous linear feedback shift register ALFSR Cellular automata etc possible Some randomness properties satisfied but not all 4 10212008 FTC YKM 5 a Coverage Achieved Coverage grows fast in the beginning saturates near end 1975 Is it described by CL1e39aL No doesn t fit It is controlled by distribution of expexted coverage 9 C 0 LA detectability of faults M74 0 0 IS 1I0 1I5 20 Detectability profile Malaiya ampYang LL L L161 84 vectors hz u N total possible vectors Tom fauus Mth hk number of faults detected by exactly 39 hlz number 0f leaSt teStable k vectors faults 10212008 FTC YKM 6 October 21 2008 Random Testing October 21 2008 Detectability Profiles Ex CECL Full adder 3 Inputs4 N16 M90 h Hh19h29h39h4 9h59h69h8 1 I k SChnelder S 25 X Hardest to test counterexample f hk Inputs 4 N16 M44 1 IIIIIIIMJI Hh1h2h3h14231911 0 1 2 3 4 5 6 7 8 910111213141516 k d 10212008 FTC YKM 7 quot nunsva Coverage With L random vectors hk out of M defects detectable by exactly k vectors detection probability kN k Pa defect with dp kN not detected by a vector 1 W Pa defect with dp kN not detected by L vectors l L k Of hk faults expected number not covered is l N L hk Expected test coverage with L vectors Lhk N k 1 1 CL 1 N M FTC YKM 8 11 10212008 3 Elf1 7quot ll39 39 ln YKM Random Testing YKM Ex CL and components for CECL Full Adder CECL full adder After20 vectors covered 072 1024 197 4286 2099 400 800 remaining 028 076 0 03 0 14 001 0 00 000 10212008 FTC YKM 9 Coverage of partltlons d c E d gt o O C 2 N a 0 10 15 20 25 test length L 10212008 FTC YKM October 21 2008 Random Testing YKM Shift in profile with progress in testing El Initial l After 20 vectors Undetected Faults d 10212008 FTC YKM 11 a w r niva Coverage Obtained by L Vectors 1 p999 For PR tests McClusky 87 N L NiL CrL C h C L 21 k k 05 lt gt NCk M 3229 x 1 i 1 if h quot for Random 10274 0 39 39 39 k1 N M 0 5 10 15 20 0 For large L terms with only low k ie faults that are hard to test have an impact Thus only lower elements of H need to be estimated 0 For CECL Full Adder C15 1 42 164 09 63 084 003 0 10393 10212008 61110 FTC YKM Eh 5 5 October 21 2008 Random Testing October 21 2008 Detectability Profile software Regardless of initial pro le after some initial testing the profile will become asymmetric Dunham s data based on NASA experiments for 16 faults faults m co A 01 m 1 0 001 002 003 004 005 006 007 008 009 01 error rate d 10212008 FTC YKM quot nunsva Detectability Profile software Adam s Data Adam39s data Product 1 0017 0053 0167 0526 1667 5263 1667 5263 Detection rate 10212008 FTC YKM 5h 75 Defects with this detection rate A YKM 7 Random Testing October 21 2008 Detectability Profile Software 39 Software detectability profile is exponential Adam s data IBM W Justification Early testing 08 will find amp remove easyto 39 M test faults M 39 Testing methods need to focus on hardtofind faults 0T s 2 Hard to test Low hanging fruit V W 10212008 FTC YKM 15 1 139 le 39 Implications Fault Seeding A program has X defects We want to estimate X 39 Seed j new faults Do some testing Let faults found be j1 seeded faults and X1 original faults Assuming j1j XlX we get x x1 J 1 However in reality the X faults include harder faults to test xlj 39 x J gt 1 hencexgt J x 11 10212008 FTC YKM YKM 8 Random Testing October 21 2008 Implications Estimation by Inspection Sampling 39 Software with X bugs is inspected by two separate teams that finds Xland X2 bugs respectively of which X3 are shared Assuming XlX X3X2 we get x1 x2 x x3 However actually since X includes more harder to test faults x x x x 3 gt 1 hence x gt x2 x x3 W 10212008 FTC YKM W39lh a E Implications fault exposure ratio Let Nt be the number of bugs at timet during testing then if a is a parameter w aNt dt If a is constant then N t N 0 expo SRGM However in random testing a should decline as faults get harder to find If testing is intelligent then a can rise which can give rise to Logarithmic SRGM 10212008 FTC YKM YKM 9 Random Testing YKM References YK Malaiya A von Mayrhauser and P Srimani An Examination of Fault Exposure Ratio IEEE Trans Software Engineering Nov 1993 pp 1087 1094 S C Seth V D Agrawal H Farhat quotA Statistical Theory of Digital Circuit Testabilityquot IEEE Trans Computers 1990 pp 582 5 86 K Wagnor C Chin and E McCluskey Pseudorandom testing IEEE Trans Computer Mar 1987 pp 3327343 J R Dunham quotExperiments in software reliability Life critical applicationsquot IEEE Tran SE January 1986 pp 110 123 Y K Malaiya S Yang The Coverage Problem for Random Testing IEEE International Test Conference 1984 pp 237 245 a 3 b 10212008 FTC YKM October 21 2008 10 Fault Tolerant Computing CS 530 Lecture Notes 1 Introduction Yashwant K Malaiya Colorado State University What We Will Study Essential terms How defects arise Fault taxonomies Fault handling Reliability attributes and measures Redundancy types Deterministic vs probabilistic approaches August 25 2008 Faulttolerant Computing Objective to achieve very high reliability in computing How Design for high reliability Test to find and remove potential faults Use redundancy to tolerate faults In hardware software amp media Some approaches are common Some not Approaches Systems L um August 252003 3 wummv About this course Texts and courses generally focus on Reliability or Testing or Redundancy Hardware or software This course attempts to address different aspects of highly reliable computing Relationships among Reliability Testing and Redundancy Similarities and differences between hardware amp software issues No single book used Study based on Course notes Articles including some by instructor Various sources August 252003 I l 4 e August 25 2008 llurpljp s 31310 0 anthing that tam g0 mung mill Actually not by Murphy but by Finagle U30 1213211318th there 1395 011 BXEBptiDII C3530 laws 0 gnytljmg that tam g0 mung it thentually mill but It may not go wrong for a while It may not go wrong the next time Only one thing may go wrong at a time m August 252003 5 307 Hulxmiv Reliability increasing concern Historical High reliability in computers was needed in critical applications space missions telephone switching process control etc Contemporary Extraordinary dependence on computers online banking commerce cars planes communications etc Hardware is increasingly more faultprone Software is increasingly more complex Things simply will not work without special reliability measures August 252003 6 August 25 2008 Correct Operation in Computing 4 Smr Good software Good hardware operation These are the system components All are needed for proper operation August 252003 g 7 Memo Reliability approaches Fault Avoidance Vs Tolerance Fault avoidance eliminate problem sources Remove defects Testing and debugging Robust design reduce probability of defects Minimize environmental stress Radiation shielding etc Impossible to avoid faults completely Fault tolerance add redundancy to mask effect Additional resources needed more later Examples Error correction coding Backup storage Sparetire etc y August 252003 C 8 H arkMN August 25 2008 Terml n o ogy Internal to system external efecrs Errors Failures physical informatim application Latent fault which has not yet produced error Faulty component will produce error only when used by a process Latent error which has not yet produced failure An infected person may not show symptoms of a disease Unfortunately terminology is not standard You need to ensure you have understood author s intent m August 252003 9 gaggle luuxmrv Origin of Defects in Objects in hardware or software Good object wearing out with age Hardware software can age too Incorrect maintenanceoperation Good object unforeseen hostile environment I l ll rI lug llLllll ll swung ility Environmental fault Marginal object occasionally fails in target environment Tight designbad inputs Implementation mistakes Specification mistakes Object refers to a piece of hardware or software August 252003 August 25 2008 Fault Taxonomies Cause a previous slide Nature Software Hardware Digital causing a change in binary logic behavior Analog Ex high supply current Duration of the fault Permanent You have to throw away the unit Temporary Intermittent marginal system Ex a loose connection Transient environmental Ex charged particles causing soft errors Permanent with repair repair makes the fault go away Au st 25 2003 g 11 uliixemv Why We Need High Reliability High availability systems Telephone Transaction processing banksairlines Long life missions Unscheduled maintenance too costly Long outages manual reconfiguration OK Critical applications Critical applications Realtime industrial control Flight control Ordinary but widespread applications CDs encoding Internet packet retransmission August 25 2003 12 g 2 amp August 25 2008 What to do about faults Finding amp identifying faults Fault detection is a fault there Fault location where Fault diagnosis which fault it is Automatic handling of faults Fault containment blocking error flow Fault masking fault has no effect Fault recovery back to correct operation W Augustmoog Remember the terms In blue 13 ammo Reliability Measures formal defs Failure rate fraction of units failingunit time 1000 units 3 failed in 2 hours Failure rate31000x215x103 per hour Mean time to failure MTTF expected time before unit fails Corresponds to inverse of failure rate Reliability probability system will survive to time t Availability probability that system is operational at time t Corresponds to fraction of time system is operational W August 252003 14 39rv mixmmquot August 25 2008 Common Reliability Attributes 1 Dependability combination of several measures Safety attribute of a system which either operates correctly or fails in a safe manner Failsafe ex traffic light blinks red upon failure Performability combination of reliability amp performance Graceful degradation loss of performance due to minor failures Some of the terms are not defined in a way to be quantifiable A t 25 2008 ugus 15 lllllxe ie quot Common Reliability Attributes 2 Security authentication confidentiality integrity etc Survivability combination of dependability and secur y Testability ease of detecting presence of a fault Controllability and observability Maintainability ease of repairing a system after failure Quantitative measures for testability have been proposed but not widely accepted August 25 2008 16 n aux11m 39 August 25 2008 System Response to Faults Error on output may be acceptable in noncritical systems if happens only rarely Fault masking output correct even when fault from a specific class occurs Critical applications airspacemanufacturing Faultsecure output correct or error indication Retryable banking telephony payroll Fail safe output correct or in safe state Flashing red traffic light disabled ATM W August 25 2008 17 S I mka Need for fault tolerance Universal amp Basic Natural objects Fat deposits in body survival in famines Clotting of blood self repair Duplication of eyes graceful degradation upon failure Manmade objects Redundancy in ordinary text Asking for password twice during initial setup Duplicate tires in trucks Coin op machines check for bad coins August 25 2003 18 August 25 2008 Redundancy Spatial hardware Redundancy Replication higher level Encoding low level Temporal time Redundancy Rollback and retry Encoding Retransmission in networks ARQ Procedural Redundancy Checking small overhead Software redundancy nversion Design verification Au 212008 6 s g 19 RedundancyCont Analog Redundancy Use of slack or margin Ex allow for extra delays in chips due to temp rise Information or Data Redundancy already included in Spatial Ex bus with 8 bits 1 bit parity or Temporal Ex packet transmitted serially with party bit at the end Exact classification is sometimes hard Disadvantages Overhead Difficulty of testing Unmanagedexcessive redundancy increase unreliability Au st 25 2003 Q g 20 August 25 2008 10 Faulttolerant Computing Deterministic approaches Based on simplifying assumptions fault model Obtain methods using the models test generation Evaluation of effectiveness Used for Testing amp combinatorial faulttolerance Probabilistic approaches We can t predict exactly when a person will die but we can get life expectancy 772 years if we have data Used for evaluating achieving and optimizing reliability Random testing Chgme August 25 2003 21 Course Topics Testing Faultmodeling test generation Testability and blackbox testing Reliability Permanent and temporary faults Replication and retry Pursuit of ultrareliability Software reliability Defects factors reliability growth Reliability strategies Other topics quot August 252003 a g 22 quotquot lllt 39 August 25 2008 11 August 25 2008 References A Conceptual Framework for System Fault Tolerance A detailed introduction to Fault Tolerance Fault Handling and Fault Tolerance Introduction to how fault tolerance is achieved Dependability And Its Threats A Taxonomyquot by Algirdas Avizienis Jean Claude Laprie B Randell Advanced intro by distinguished researchers Au st 25 2003 W gu 23 391 New quot 12 9162008 Objectives The number of potential defects in a unit under test is extremely large A faultmodel presumes that most of the defects can be described by a well defined faults as given later in this Lecture Notes Here we primarily focus on hardware however there is something analogous in software test coverage Septemba 12008 Fault Tolerant Computing 2 alaiya 9162008 Fault Modeling Why fault modeling Stuckat 01 fault model The single fault assumption Bridging and delay faults MOS transistors and CMOS Switchlevel fault model Stuckonopen Shorts and IDDQ W Sep emb 12 03 Fault Tolerant Computing 3 lill lut v quot YK Malaiya Fault Modeling Fault Model a set of assumed faults in a system such that testing for them will test for most faults of a specific class Used for test generation fault simulation and quality evaluation A fault model hides complexities of actual defects Infinitely many defects possible Fault Models are based on past knowledge of defect modes and modeling experience X Sep emb 12 03 Fault Tolerant Computing 4 u ultwwj 39 YK Malaiya Common Fault Models No model test exhaustively ExhiaiUStiVstestingApplying all p o s Sible combinations Hardware fault models Gate level stuckat 01 most common bridgingfaults delayfaults Transistor level stuck onopen faults bridging faults Functional fault models Software fault models No formal fault model but the software test coverage concept is closely related Sep emb 163003 Fault Tolerant Computing 5 Hulda YK Malaiya Failure mechanisms in hardware Temporary sensitivity to charged particles etc Permanent Opens broken connection also nearopens Shorts unwanted connection also near shorts Can be seen in magnified chip photos Others Imperfect devices Analog impairments like excessive delays Septemba 16 2008 Fault Tolerant Computing 6 YK Malaiya 9162008 Stuckat 01 Model Classical model well developed resultsmethods Many opens and shorts result in a node getting stuckat a 0 or 1 May not describe some defects in today s VLSI still a nice way of structural probing Covering all stuckat on will result in covering a large fraction of all faults Model any one or more of these may be stuck at 0 or 1 a gate input a gate output a primary input Justification many lower level defects can be shown to have an equivalent 60mmon abbreviiations Septemba 162008 Fault Tolerant C 53701 S a391 7 a gt Y Malaiya imam What is a test A test for a specific fault is an input combination which results in different outputs for the normal and faulty circuits Application of a test will reveal the presence or absence of that fault by observation of the output For a combinational circuit a test is called a test vector or a test pattern A set of tests is called a test set Septemba 12008 Fault Tolerant Computing 8 alaiya 9162008 9162008 Stuckat 01 Example Normal function 2 x1 x2x2x3 Faulty Function when a fault is present X1 x2 sa1 2 x1 x3 a sa1 gtz x1x2x3 b sa1 2 x1x2x3 c sa1 2 1 Note39th at a sa o39an39d X27s a0 have different39impa ct m z sa1 gtz 1 Example Find a test vector for x2 sa1 Input x1 x2x3 001 Output 0 normally 1 if faultv k i Septemba 162008 Fault Tolerant Computing 9 I YK Malaiya Single Fault Assumption Assumption only one fault is present at a time Significantly reduces complexity Good for fault detection complete single stuck test setwill detect almost all multiple faults Not good for fault location A Multiple fault is a simultaneous presence of several single faults How many multiple faults in a unit Assume k lines 3 states per line normal saO sa1 Total 3k1 faulty situations For k1000 total 13x10477 0 neg am oing 393 K sit u at i o39n s 360mb 12003 Fault Tolerant Comput normal YK 11 2595 e Delay Faults Some defects can cause a gate to respond after an excessive propagation delay As a result some chips will not work at the intended clock frequency but may work at a lower frequency ie slower speed Delay faults are quite common and thus all chips must undergo testing for potential delay faults 193wa Sep emb 12 03 Fault Tolerant Computing YK Malaiya Delay Fault Model Excessive path delay or gate delay Signals may get sampled before stabilization Example OR gate delay increases from 02 to 08 ns 02ns Normal Fault c The fault causes thewlon g est path totake12 ns causing sampling before signal stabilizes Septemba 162008 Fault Tolerant Computing alaiya Clock C Ma 10 ns slack A b 6ns x prop delay 12 ns 9162008 Bridging Short Fault Model Model Two lines can get shorted bridged Common assumption only nearbylines can be bridged Impact of a short can depend on the technology and transistor dimensions Sometimes a 0 dominates over a 1 causing both bridged lines to become 0 Sometimes a 1 may dominate Sep emb 163003 Fault Tolerant Computing 13 ml YK Malaiya Impact of Bridging Faults Faulty function with AND 21 x1x2 bridging 0 21 x1x2 x1x2 x1x2 22 x1x2 x3 22 x2X3 OR Feedback bridging can cause Oscillations odd inversions Sufficient delay Setting at intermediate voltage level Septemba 12008 Fault Tolerant Computing 14 YK Malaiya 9162008 MOS Transistors Nchannel m 1 sq Pchannel 1 closed Open closed W sep emb 12 03 Fault Tolerant Computing 15 mlmw YK M 39 alalya CMOS NOR Gate VDDH A1 q off 1 0 3390 on 0 D Output0 verify off 0 n G nd L Septemba 12008 Fault Tolerant Computing 16 YK Malaiya 9162008 Switchlevel Fault Model 1 Model A transistor may be PB stuckopen stuckopen Stuckon PA stuckopen output effectively saO NA stuckopen To sensitize output AB10 Normal output 0 Faulty output high imp previous value sequential behavior Needed testpair T1 00 output 1 initialize T2 10 0 normal test 1 If faulty Septemba 162008 Fault Toleran flggllggygng Switchlevel Fault Model 2 StuckON VDD Assume PA is stuckON A A B1 0 normal out0 1 pA s a faulty out B pB Depends on relative resistances dimensions etc Low resistance between VDD NB and Gnd very high supply NA current IDDQ quot Gnd Septemba 12008 Fault Tolerant Computing 18 YKMalaiya 9162008 Shorts and IDDQ H Logical value can not be predicted in 0quot eneral 0 Undefined logic value 9 Very high supply current IDDQ Generally lDDQ based testing is very L effective for detecting some defects short H 1 off Undefined logic value on L Septemba 162008 Fault Toleran ggrgggyting 19 Transistor Level Faults A digital circuit has two power supply terminals High often called VDD and Low often called ground A transistor is a switch that either on conducts current or open Output of a gate is High 1 when the output is connected to High terminal through the transistor assembly Similarly the output is Low 0 when it gets connected to Low terminals Septemba 12008 Fault Tolerant Computing 20 YK Malaiya 9162008 10 9162008 Transistorlevel faults Impact Shorts or opens in the transistor assembly can cause these behaviors Output cannot become 1 Output cannot become 0 Output behaves as if one of the inputs was always 1 or 0 regardless of actual value of the input If an output gets connected to both High and Low supply terminals at the same time it causes shorting between them causing a very high current to flow Currentbased testing is often called IDDQ testing 793 Mamba 12003 Fault Toleran ggr ggyting 21 Refe re n ces Design for Testability in Digital Integrated circuits Bob Strunz Colin Flanagan Tim Hall Tutorial Delay Fault Models and Coverage Proc 11th Int Conf VLSI Design Page 364 1998 Ananta K Majhi Vishwani D Agrawal R Rajsuman APJayasumana YKMalaiya On Accuracy of SwitchLevel Modeling of Bridging Faults in Complex Gates 24th Conference on Design Automation June 1987 pp 244 250 WK AlAssadi YK Malaiya AP Jayasumana Faulty behavior of storage elements and its effects on sequential circuits IEEE Trans VLSI Dec 1993 pp 446 452 YK Malaiya AP Jayasumana Qiao Tong SM Menon Enhancement of resolution in supply current based testing for large le VLSI Test Symp April 1991 pp291 296 YK Malaiya and R NarayanaswamyquotModeling and Testing for Timing Faulls in Synchronous Sequential Circuitsquot IEEE Design amp Test pp62 741984 In library Septemba 12008 Fault Tolerant Computing 22 YK Malaiya 11 Coverage amp Reliability 7 IYa39shwant Test Coverage Measures Statement or Block coverage Branch or decision coverage P use coverage puse pair variable definedmodified use as predicate C use coverage similar use for computation Subsumption hierarchy Covering all branches cover all statements Covering all puses cover all branches YKM December 5 2006 Coverage amp Reliability Modeling Defects Time amp Coverage approximately linear here may not be observed in some programs 05 095 l quotI Malaiya Li Bieman Karcich Skibbe 1994 Li Malaiya Denton 1998 Coverage Based Defect Estimation 0 Coverage is an objective measure of testing Directly related to test effectiveness Independent of processor speed and testing efficiency 0 Lower defect density requires higher coverage to find more faults 0 Once we start finding faults expect coverage vs defect growth to be linear YKM December 5 2006 Coverage amp Reliability December 5 2006 LogarithmicExponential Coverage Model Hypothesis 1 defect coverage growth follows logarithmic model C t lnl 610t C0t 31 Hypothesis 2 test coverage growth follows logarithmic model CO 1n1 sz CO 1 v4 1252006 FTC YKM 5 m 213 mm LogExpo Coverage Model 2 0 Eliminating t and rearranging C0 as lnl afexpaCi l C0 1 where C 0 defect coverage C i test coverage as a a parameters 139 branch cov p use cov etc 0 For large Ci we can approximate C0 Aquot B Cquot YKM 3 Coverage amp Reliability Coverage Model Estimated Defects uCi Linear Approximation after the knee Defects 0 10 20 30 40 50 60 70 80 90 100 Cove ra 92 mCi Ai BiCi Ci gt Cgm Only applicable after the knee Assumptions Stable Software 1252006 FTC YKM 7 YKM Location of the knee Based on interpretation through logarithmic model Location of knee based on initial defect density Lower defect densities cause knee to occur at higher coverage Parameter estimation Malaiya and Denton HASE 98 W 1252006 FTC YKM l mu mni 7 December 5 2006 Coverage amp Reliability Data Sets Used Vouk and Pasquini Vouk data from N version programming project to create a ight controller Three data sets 6 to 9 errors each Pasquini data Data from European Space Agency C Program with 100000 source lines 29 of 33 known faults uncovered w 1252006 FTC YKM 9 1 03 mm Defects vs Branch Coverage Data Set Pasquini Defects Expected Fitted Model Model 0 Data t t t t t t t t t t t t t t t t t t t 20 24 28 32 35 40 44 48 52 55 50 54 58 72 75 80 84 88 92 95 100 Branch Coverage V db 1252006 FTC YKM 10 to YKM December 5 2006 Coverage amp Reliability Defects vs P Use Coverage Data Set Pasquini Defects Expected Fitted Model l 0 Nbde o Data 20 24 28 32 35 40 44 48 52 55 50 54 58 72 75 80 84 88 92 95 100 PUse Coverage w 1252006 FTC YKM 1 1 5 m3 mumv Estimation of Defect Density 0 Estimated defects at 95 coverage for Pasquini data assume 5 dead code 0 28 faults found and 33 known to exist Measure Coverage Expected Achieved Defects Block 82 36 Branch 70 44 P uses 67 48 YKM December 5 2006 Coverage amp Reliability YKM Defects Defects VS P Use Coverage Data Set Vouk 3 39 Defects Expected Fltted Model 2 ModeI o Data o i i i i i i 35 40 44 48 52 55 so 64 68 72 75 80 84 88 92 96 100 w 1252006 P39Use cwerage FTC YKM 13 a Mei mm C B d E t39 t39 Data Set Pasquini et al 50 507 40 Estlmates are stable w 30 7 m D 201 I I 10 o i i i i i i i i i i N g 8 3 395 8 g g 3 Q 3 8 3 i g R 3 B 93 v Nmmlttlttm o olxlmmmeezvm TestCases 1252006 FTC YKM December 5 2006 Coverage amp Reliability Current Methods 0 Development process based models allow for a priori estimates Not as accurate as methods based on test data 0 Sampling methods often assume faults found as easy to find as faults not found Underestimates faults Exponential model Assume applicability of exponential model We present results of a comparison w 1252006 FTC YKM 15 5 n3 quotMVIWSJquot The Exponential Model Data Set Pasquini et al Estimate rises as new defects found Estimates very close to actual faults 1252006 FTC YKM YKM December 5 2006 Coverage amp Reliability Related articles Frankl amp Iakouneno Proc SIGSOFT 98 8 versons of European Space Agency program 10K LOC Single fault reinsertion Tom Williams manuscript 1999 analysis from first principles 0 Peter G Bishop SAFECOMP 2002 A related model unreachable code v4 1252006 FTC YKM a 213 mm er Observations and Conclusions 0 Estimates with new method are very stable Visual confirmation of earlier projections Which coverage measure to use Stricter measure will yield closer estimate 0 Some code may be dead or unreachable Found with compile or link time tools May need to be taken into account 7 db 1252006 FTC YKM 18 AU ItuilF r YKM December 5 2006 Random Testing YKM Random Testing Outline RT advantages and tradeoffs RT vs pseudorandom testing PR Coverage and detectability pro le Hardware and software DPs CL for random and pseudorandom tests High and low testability faults during early amp late testing Implications of a late asymmetric pro le November 30 2006 Random Testing YKM Random Testing 0 Extensively used for both hardware and software Ideally each input is selected randomly PR Pseudorandom schemes approximate random Generally quite effective for moderate coverage Coverage hard to determine a priori Ineffective for random pattern resistant faults Coverage tools Random functional followed by Structural testing v4 11302006 FTC YKM Random Testing Advantage No test generation using structural information needed Test setup using comparison Random pattern generator comparator Random pattern generator comparator Stored esponse Alternative Is response reasonablesoftware testing 7 db 11302006 ig ut1 FTC YKM 4 November 30 2006 Random Testing YKM Pseudorandom PR Testing Unlike true random reproducible Will not repeat until all combinations applied Generation usually justintime not stored Autonomous linear feedback shift register ALFSR Cellular automata etc possible Some randomness properties satisfied but not all 11302006 FTC YKM 5 Coverage Achieved Coverage grows fast in the beginning saturates near end 975 Is it described by CL1e39aL 7 No doesn t t It is controlled by distribution of detectability of faults M74 0 39 39 39 O 5 10 15 20 CrL 05 exp exted coverage Detectability profile Malaiya ampYang L1 L 16 8 vectors Hhl h quot hN Total faults Mth N total poss1ble vectors hk number of faults detected by exactly hlz number 0f leaSt teStable k vectors faults 11302006 FTC YKM 6 November 30 2006 Random Testing November 30 2006 Detectability Profiles Ex 0 CECL Full adder j Inputs4 N16 M90 h Hh1h2h3h4 h59h69h8 1 11 2 43 21 48 o 1 2 3 4 5 s 7 Bi i g 39 SChHEider S 25 X Hardest to test counterexample T Inputs 4 N16 M44 k1 IIIIIIIMJI Hh1h2h3h14231911 0 1 2 3 4 5 e 7 8 910111213141516 k 11302006 FTC YKM 7 m ma wumwm Coverage with L random vectors hk out of M defects detectable by exactly k vectors detection probability kN k Pa defect with dp kN not detected by a vector 1 N Pa defect with dp kN not detected by L vectors l gr k Of hk faults expected number not covered is l N L hk Expected test coverage with L vectors CL1 l H 11302006 FTC YKM 8 YKM Random Testing YKM Ex CL and components for CECL Full Adder CECqulladder N 16 M 90 After 20 vectors covered 072 1024 1797 4280 20199 400 800 remaining 028 070 003 014 0101 000 0100 M 11302006 FTC YKM 9 1 ma mmwu Coverage of partitions partition coverage 0 5 10 15 20 25 test length L V db 11302006 FTC YKM 5 1 N 1U3 November 30 2006 Random Testing YKM Shift in profile with progress in testing El Initial l After 20 vectors Undetected Faults 11302006 FTC YKM n ImWRtll 1 Coverage Obtained by L Vectors 1 3999 For PR tests McClusky 87 N L NiL CrL C h C L 21 k k r 05 39 lt gt NCk M 929 z 1 ZN1 if k for Random 0274a 0 39 39 39 kil N M 0 5 10 15 20 L1 L 16 0 For large L terms with only low k ie faults that are hard to test have an impact Thus only lower elements of H need to be estimated 0 For CECL Full Adder C15 1 42 164 09 63 084 003 0 10393 7 11302006 FTC YKM C sum 12 Uuumm 7 November 30 2006 Random Testing November 30 2006 Detectability Profile software Regardless of initial profile after some initial testing the profile will become asymmetric Dunham s data based on NASA experiments for 16 faults faults mwbmmxi 0 001 002 003 004 005 006 007 008 009 01 error rate 11302006 FTC YKM Detectability Profile software Adam s Data Adam s data Product 1 Defects with this detection rate 0017 0053 0167 0526 1667 5263 1667 5263 Detection rate 11302006 FTC YKM YKM 7 Random Testing YKM Detectability Profile Software Software detectability profile is exponential Adam s data IBM 39 Justification Early testing will find amp remove easyto test faults Testing methods need to focus on hardtofind faults T k T Hard to test Low hanging fruit mmwm 11302006 FTC YKM Implications Fault Seeding A program has X defects We want to estimate X Seed j new faults Do some testing Let faults found be j1 seeded faults and X1 original faults Assuming jllj XlX we get szli Ji However in reality the X faults include harder faults to test 39 x 1 J xlj hence x gt J1 11302006 FTC YKM November 30 2006 Random Testing YKM Implications Estimation by Inspection Sampling Software with X bugs is inspected by two separate teams that finds Xland X2 bugs respectively of which K3 are shared Assuming X1X X3X2 we get x1352 x x3 However actually since X includes more harder to test faults x x x x 3 gt 1 hence x gt x2 x x3 11302006 FTC YKM 0 n6 mmwm Implications fault exposure ratio Let Nt be the number of bugs at time t during testing then if a is a parameter LN aNt If a is constant then N t N 0 expo SRGM However in random testing a should decline as faults get harder to find If testing is intelligent then a can rise which can give rise to Logarithmic SRGM 11302006 FTC YKM November 30 2006 Random Testing November 30 2006 References YK Malaiya A von Mayrhauser and P Srimani An Examination of Fault Exposure Ratio IEEE Trans Software Engineering Nov 1993 pp 1087 1094 S C Seth V D Agrawal H Farhat quotA Statistical Theory of Digital Circuit Testability IEEE Trans Computers 1990 pp 582 586 K Wagnor C Chin and E McCluskey Pseudorandom testing IEEE Trans Computer Mar 1987 pp 3327343 J R Dunham quotExperiments in software reliability Life critical applications IEEE Tran SE January 1986 pp 110 123 Y K Malaiya S Yang The Coverage Problem for Random Testing IEEE Intemational Test Conference 1984 pp 237 245 v4 11302006 FTC YKM I 3123 ll iw n 39 YKM 10 Fault Tolerant Computing cs 530 Introduction Yashwant K Malaiya Colorado State University August 24 2006 mum Faulttolerant Computing Objective to achieve very high reliability in com ut39n How Design for high reliability Test to find and remove potential faults Use redundancy to tolerate faults w Wu m hrphp s 31am gnything that tan an inning mill Actually not by Murphy but by Finagle 0L0 enery lain there is an exteptinn C5530 laws gnything that tan an mung it enentually mill but It may not go wrong for a while It may not go wrong the next time Only one thing may go wrong at a time mum Reliability increasing concern Historical High reliability in computers was needed in critical applications space missions telephone switching process control etc Contemporary Extraordinary dependence on computers online banking ommerce cars planes communications etc Hardware is increasingly more faultprone Software is increasingly more complex Things simply will not work without special reliability measures mum August 24 2006 Reliabilit a roaches Correct Operation F V pp ault Avoidance Vs Tolerance Fault avoidance eliminate problem sources 7 Remove detects Testing and debu ging e Robust design reduce probability oi detects e Minimize environmental stress Radiation shielding etc 7 lmpossibleto avoid laults completely Good hardware Fault tolerance add redundancy to mask effect Additional resources needed more later 7 Exampes r Enarcanectlancaalng 39 Backupstarage r Sparetlre etc operatl on a mm M ma martmax M ma W m 5 m s Terminology Origin of Defects in Objects hwsw Internaltosystem Meme 3 Good obiect wearing out with age failures 7 Hardware sottware can agetoo e incorrect maintenancyoperation Good obiect unforeseen hostile environment responsibility 7 Environmental tault Marginal object occasionally fails in target environment 39D etects physloal 39 3 application Increaslrig Latent fault which has not yet produced error Latent error which has not yet produced failure Unfortunately terminology is not standar 7 Tight designbad inputs Implementation mistakes Specification mistakes Whining mm 7 ah hwzym mm s