Structure Comp Org
Structure Comp Org CDA 4101
Popular in Course
Popular in Computer Design Architecture
This 22 page Class Notes was uploaded by Daron Wunsch on Monday October 12, 2015. The Class Notes belongs to CDA 4101 at Florida International University taught by Nagarajan Prabakar in Fall. Since its upload, it has received 159 views. For similar materials see /class/221765/cda-4101-florida-international-university in Computer Design Architecture at Florida International University.
Reviews for Structure Comp Org
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 10/12/15
Structured Computer Organization Final Exam Summary Chapters 1 2 3 Instruction Execution Cycle Fetch the next instruction from memory into IR Change the program counter to point to the next instruction Determine the type of instruction just fetched If the instruction uses a word in memory determine where it is Fetch the word if needed into the CPU register Execute the instruction Go to step 1 to begin executing the following instructions N99 P9 N Chapter 4 The Micro Architecture Level U sing a subset of a Java Virtual Machine IVM to describe the ISA De nitions State of a computer set of variables of a microprogram 0pcode operation code that identifies an instruction Data Path the part of the CPU containing the ALU its inputs and outputs Data Path timing Ragislols Ioadad Shlllev lnstamanoously lmm on put 0 pus and memory an Cycle 1 slahla nst edge a clock SIBHS f new I l 39 l The data ath cle New MP0 used In f oquotquot ee 1 The control signals are set up A w i 2 The registers are loaded onto the B bus MPC 39 S glntg it available 3 The ALU and shifter operate A y 353 5Wquot quotm 4 The result propagate along the C bus Drive H Fmpaoamn back to the registers A z and lrorll 5mm 3 bus m rogislevs Memoly Operation MAR MDR combination is used to read and write lSA level data words and PC MBR combination is used to read the executable ISAlevel program which consists of a byte stream All other registers that contain addressees use word addresses like MAR Structured Computer Organization Final Exam Summary Microinstructions To control the data path on the figure we need 29 signals divided into lt2 MM MW 5 functional groups 13d lt2 lmm ms 1m 9 signals to control writing data from the C bus to the registers msmuvy lt 9 signals to control enabling registers onto the B bus for ALU input 8 signals to control the ALU and shifter functions a hummus 2 signals not shown to indicate memory readwrite via MARMDR e Enabltamnabus f WMCMWWW 1 signal not shown to indicate memory fetch via PCMBR YOS onus The value of these 29 control signals specify the operations for one cycle ofthe data path ALU mnlml Q How can we reduce the number ofbits needed to select among the possible sources for drivingthe B bus where the signed and unsigned versions of MBR count separately A Encode the B bus info in 4 bits and use a decoderto generate the 16 control signals Possible format divided into 6 groups with 36 control signals Bils 9 3 a 9 3 4 JJJSSFUFIEEIIHOTCLSPMMWH MAALR NNNN POPVPCDA E B NEXLADDHESS PMMLA ABVC cs RRiA b CNZSI A DH 5 f Y f Addr JAM ALU C Mem 3 Addr 1AM 3 D in Contains the address of a potential next microinstruction Determines hoe the next microinstruction is selected ALU and shifter functions Selects which registers are written from the C bus emory functions Selects the B bus source it is encoded as shown Structured Computer Organization Final Exam Summary Microinstruction Control The MIC1 De nitions Sequencer responsible for stepping through the sequence of operations necessary for the execution of a single ISA instruction It produces 2 kind of information cycle the state of every control signal in the system the address of the microinstruction that is to be executed next The largest and most important item in the control portion memory that holds the complete microprogram read only memory Its memory address register is MPC MicropProgram Counter Its memory data register is MIR MicroInstruction Register 52x368d WSW um um miaupmgam Conucl Signals 1 Enabln mm a bus f Wm 1 bus to mgxslol The operation cycle is as follow it goes on and on until the machine is turned off Subcycle 1 At the start of each clockcycle MIR is Ioadedfrom the word in the control store pointed by MPC 2 The various signalsfrom MIR propagate out into the data path The B bus is loaded from the selected register 3 ALU and shifter operate and produce a stable result 4 The C bus memory buses and AL U values become stable Load the registers and the N and Z ip ops Determining the next microinstruction to be executed this happens in parallel with the data path cycle 1 After MIR is loaded and in an stable state the 9 bit NEXTADDRESS field is copied to MPC 2 Parallel to this the AM field is inspected If value is 000 nothing is done else is done a IflAMN is set N is ORed into the high order bit of MPC b IflAMZ is set Z is ORed into the high order bit of MPC Structured Computer Organization Final Exam Summary C IfIMPC is set he8 MRR the Current microinslruclion into MPC 3 When 1 completes MPC will point to the next microinstruction driven L The IIVM Memory Model following areas of memory are defined 1 The constant pool a this area cannot be written by an IIVM b Consists of constants strings and pointers C d he i te r r 2 The local variable frame a Area to allocate temporal variables LV register contains the address ofthe rst location in the frame 3 Operand Stack a Restricted size V into the slack or popped out of it 4 The method area Region ofmemory containing the program 39 5 SP Ins 0 Lv 100 1 13 1h The M10 1 has 2 ways ofaccessing memory ea 0 I L uiutu microinslruclions Both memory operation can proceed simultaneously Structured Computer Organization Final Exam Summary IIVM instruction set byte const and varnum are 1 byte others are 2 byte All permitted operations Any ofthe above can be extended by adding ltlt8quot to shift the result left by 1 byte EX H MBR ltlt 8 TOS top ofsnack contains the value ofthe memory location pointed to by SP OPC temporary register no specific use Commonly saves the address of an opcode for a branch instruction while PC is incremented to access the parameters Microinstructions are NOT consecutive in memory and Main is not at store address 0 because nop1 is Label Operation Commenls Malnl PC PC o l fetch golo MBR MBR holds opcode gel next bylo dispatch QW 90 M399 DQWJNquot laddl MAR SP SP 1 rd Read in maxiMop word on slack raddz H a TOS H a lop ol slack 11663 MDR TOS Mon o H M golo Mainl Add lop lwo words wnle lo lap 01 slack Isubl MAR SP SP 1 rd Read in nexllolop word on slack lsubZ H TOS H lop of slack isupa Mp QSF M98731 wr golo Malm Do sublrac llon wmg lqlgp ol slack rand MAR SP SP 1 rd Head in nexIlarlop word on stack landz H T08 H lop of slack lands MDR TOS MDR AND H M golo Mainl Do AND wrile lo new lop ol slack ion MAR SP SP 1 rd Head in nexIlo lop word on slack i032 HTOS Hlopofstack 303 MDR T05 a MDR OH H M golo Mainl Do OR wnle lo new lop of slack dupl MAR SP SP I Increment SP and copy to MAR dup2 MDR TOSVwr gale Maml iwrile new slack wort W 7 V popl MAR SP SP l rd Read in nexIerop word on slack p002 Wall lor new 105 lo be read lrom memory p093 TOS MDR 9010 Mainl Copy new word lo 105 swapl MAR SP 1 rd Sal MAR lo SP I read 2nd wold lrom slack swap2 MAR SP Sel MAR lo lop word swapa H MDR wr Save TOS In H wme 2nd word lo lop ol slack swa MDR TOS Copy old TOS lo MDR swaps MARSP 1wr SalMAFlloSP 1wmea52ndwordonslack swapG TOS H gala Mainl Update 108 smiciunzd Compuuz r Organization Final Exam Summary nwsm PMAnsiui Maniiiebyieiuwsnuiiosiadi niwsnz PC incmmem PC me next We h sria Man in M ni si nnximd wisimii M push on stack iioaai LV MEH iains exooayLVlaH xioauz MAR MERU ii a MAH andvass cl iecai vaviabie io gush ieana 5P i SP peinis In now ion a sin pmpale wine um I i mar W In F max nexi m mic mg 0 sxaox iiaaus 705 Mon 9010 Miami Updnlo TO Islmel H s LV MER comains KMBX Copy LV Io H isxorez MAR Mnnu H MAR a nddrass u imi variable 0 mm mm isms MM a 105 w Copy ms X0 MDR wnla mm mini 5 a MAR SP I m Hen1quot mimoiop word on am i ms lurrems l PC mm quotan opmas l 3quotquot g gaie mg 7 wiiisi PC 5 1 mm Feicii uperann by oi nexi came M6512 go MGR on moo Mammal mm mm mm m 52 aninani PC a ism MER minim isi mun hykz ieicn 2m wmjioaaz e si Index hylu sm ed Vaquot a nu mama H mm index 9 local vananiu w iieaaa MAR LV H 396 gm iloaes MAR address a iocai variable io push immi P c a I leldl MER wuiains isi Index um mm mm Menu ltlt 9 H Isl mm uyxe shined Mi 5 ms MBRU on H H a IMIH Index all local va ahle as is i wi c s i lalch MBH Domains isi mu um ieicn 2nd in we MBRU ltlt e si in cyia 8 inc w MENU on H iiizii ex mm mnslim pod inc W4 Mm a N cw m on new MAR s mamas a mm in pool mi a iv MER commas max cm LV K H am An Miami H 395 Copy LV mm In ma Nam vamma mic c PG v i ioiai mm s miss a a no I lvld quotMG 9 Vanuhh gala Save aaaiess ai We gem MER a si laer oi nlisei lawn 2m hyla 90103 Sam and save signed llisl Ier m H gum sRu on n H isnn hiancn Mlsel 90105 HI 90106 9010 Maui iiiii ma 5 7 SP iiii2 CFC 05 m3 m iii maxiMoo m a suck Suva ms In croimpnmniy Fm m w ul slack m Yes 1wa Hi mm oi a u Sal MAR Ia man in new lnDulAslacx a In N Structured Computer Organization Final Exam Summary 7 0P0 PC 1 9m 9092 Same as gem neeoea in legal address PC C I Invokovi u HQ MAR 5P SF I MDR m c I 10de mvokovmunlzz LV r05 9010 Mam 5 S a n a u 1 n Label Opural 5 Comments remml MAFI 5 LV rd Rssai SP MAR D gel link Delmar Vammz Wan or read lremma LV MAR MDH m Sal LV in link pin gel nlu PC lreiurM MAR LV I Sal MAR to read uld Lv lrelums PC an m ielm Heslove Pct elm nexi opmde returnG MAR SF 89 MAR in WINE T05 lrelumv LV MDE Re ave lrelumu MDR 105m gala Mam Save return value an nrig lnal mg m sack See examples ofmstructlons in book page 254 pdeSO The Design ofthe Microardiitecmre Level WW 1 39 39 r 5 a h Mw r RllrFrr r n rrl n efflclentslnce ltdepends on the data comingfmm me memory 2 Simplifythe organizationso thatthe clock cyde canbe shorter a cyc e Ex a decodingclrcult adds delay yet its efficient for low cost 3 Overlap the aiecution ofinstmctions a Structured Computer Organization Final Exam Summary of modern computer design but the system designer hast to decide whether the greater speed is worth the real state Ex the binary adder r the faster the quicker The MIC1 is moderately simple and moderately fast It has Minimum amount ofhardware r 10 registers Simple ALU replicated 32 times A shifter decoder control source and glue 5000 transistors Control store ROM and main memory RAM Wa s to reduce the number of reduce the number of microinstructlons ath len 11 for the MIC 1 Have 3 full inputbuses to the ALU in order to add 2 registers in 1 cycle Ex ILOAD LamI7 Overallsquot Cu um Labn Oparalion Cummsnls long maul MERU H a m murmur Claw ull loyrl 39 a loam AR menu LV rd MAR a ndamss ol local vanahle lo push 0 5 lasso mm cODusr ll da MAR SP SF I SP palms D now IOD olsmck Dmpule wnla quotmm M 39 5P 39SP 5quot Wm 39 quotW 39 quot mck mm mm mm DC 5 L mm W In PC gmm mm We up 0 sum lllma PC 7 PC l lalch wl Inc PO gal ncxl opcods wMe lop ol slack llaadS T05 MDR clu Mum Up BIS T05 lIO KM T05 MDR Upda a T Malnl PC 5 v l1elhgoalMBRl MEIR was made galnexl byle dlspnlch llaaus PC 5 PC l lnlch 9010 Man MBR nlveuuy holds encode lalch max by Merge the interpreter loop into the end of each microcode sequence Ex Pop Label n Gemmanls Label Operations pop P SP 711m Cf ln nexllroOlgl when olLs39lack popl MAR SP SP 7 1 rd Read ln nstolap Wald on stack DOD nl nr new 0 a ma rolll mamory m3 ms MD We Mmquot Copy MW Wm m ms Maln1pop PC PC l lelcll R h l s encode lelcn nexl me Malnl Pc no l lelcll you man 9 93 1 5 Remove load from the ALU by creating an independent unit to fetch and process the instructions Instruction Fetch Unit IFU A nite mte machine for implementing the IFU wala lemma a nu lell imll Mam Occurs when mam ls lea MEN Occurs when MERE ls 12m Wold lalcllud Gums men a memulywam ls lead 64 I1les am um um Inn 5th reams a The MIC2 Characteristics of the IFU39 Structured Computer Organization Final Exam Summary Eliminates the main loop entirely the end of an instruction branches to the next instruction directly Avoids the ALU incrementing PC Reduces the path length avoiding the need for assembly inll mommy Comm WW 1 Enzwnumo hns Mum comm mamas Opallam 91quot M35 MAR SP P comments Branch lo nmrnsrrnaron g Fiend rn noxunlop won m and MDI39j a ms MDRVH wr 90loMDHI MAR PsP 1m a T0 OSMDR on H wr gum M59 P 1 r u rn Hnos n 0905mm MDR MAR s 5p I m and m quotMllmnp word on sum H 105 a mu 0 mar H ummq ms 1mm Palmer 1054 n and n nextmop w on sxacx on or sand AN Fercn Sd writT05 Read quot enmop weEESn radr H Iw o a on Famed TosI wnn ms MAR em SP copy in f wm c nus In mgrnnr aupz a rgsw stack ward pom mm In quotundolop ward on slack D002 ml In tea pop 7 any nnw mm ID ms swan um 2nd word worn sum 59 MAR 10 SP Swap mph lo wnlv mm 206 Nd swap nvo now 105 wnla 2nd were In slack swap4 any old 105 10 man swaps P wv run on 105 m 2 plan on slack swaps gm Mum paaxg r95 W M u nrpnsnr P o 1 Sel up MAR lo wnung lo new lop nl slack quotquot I3 pusnz Mam wr gore Man i uwaie slam m ms and rnerncry 7 imam MBmu m Move LV Index m MAR rend wet1nd vloadz I lnclemnnl SP Mum new 5P m MAR quotwas MDR wr gore mam Updam sum In ms and ma cg Ismrul U Snl MAR In LV 0 H1601 rsromz MDR ms wl Copy TOS lo sienna mm on News MAR SP 7 s v r n1 uncranrnnt SP man new ms M quot7 59 a Qua rsrom anl at m nun c a To Save ms m cm lnmpomnly quot 5 T 5 9quot 9 myquot NM ms r M 105 M5 Pu new my 0 mm m ms w x gun mm an amino ex pddless rs Duoa Cred wrm opcode rlaql 2 opcr u 21 90m 7 arse 90m F mm m z in 141510 loam M u v Ln mmmauz mammal 1 ng usmg znyga rung llJcmnsqV SP 1 rd Road In nexxwIw mm 0 slim mare Isloml MAR LV 0 MEMU 901 lsleZ mammal ID slum but usmg 2 17le Index Io P SP 1 Sel MAR m read In newtmlsa m m 53 au 0 uoauz Same as was mm urn Indean on CPP quot W ng 1533quot 33936 mw lgnw Imc Sal MR 0 LV s max In read runnpeqs n Ful newton o4 slndlt In 105 quotquot 2 an 59 H lquot quot5 ange z 4 hygienmgomr umzwms I v T M3 39 E quoti an 9002 Same as gMoI gt 90m H PC 1 any PC I0 H i H 3 fuuch byias In has in dmm 9 PC H M332 5 quotquotW Aquot 96 PC F2 90 mam 90103 ave Io wall to IFU la Iercn new cpcode xnvohevmuall MAR PP Mamu Id Pu address 0 mama pornnn In MAR 9 Swim N H593 390 quot9 39quotHMW39 pc 53 am Fe m 0pc nm MAR SP s ad In I N om on sick Sal PC In Isl byleol rneum code rmz opc a ms mm 105 m opc empamnly m 105 nd rasso owner 7 rm T05 Mon um um pal xncu n 105 M n ms TOS mime owner mu N ops II N pom I also 90m F Brunch on N n MDH F OVEMIIIE ONHEF Wllll llllk pointer n v u M P D Em SP MAR loloc lm 0 hold Old PC Mole mm M I Program In saw Old PC m okev n inc SP Io norm Io Imuon is hold Mu W I o v w Saw om LV 93 Sal LV lo Ewn o zennn paramerer MAR LV m Resel SP MAR lo lead Llnk w Wan 10v nk prr LV on MDR Toswrgarowam Sal MAR m Dmnl ssion PC Sa Lv MAR lo link pxr reau old PC In old LV mad ma Lv Restore LV Savn mum value on angian lop ol slack The Data Path andMicmpmgramfor the MIC 2 Structured Computer Organization Final Exam Summary The MIC3 Improvement 3 bus architecture including the IFU with 3 additional latches registers one inserted in the middle of each bus The latches are written on every cycle The registers partition the data path into distinct parts that can operate independently known as a Pipelined Model The MIC3 program takes more cycles than the MC 2but it runs faster A SWAP operation takes 11 At nsec for the MIC3 and A18 for the MIC2 d from quotmm mummy Pipelining is a key technique in all modern CPUs Another way to look at Fig 4734 is to follow each instruction horizontally across the page For instruction 1 in cycle 1 the IFU is working on it In cycle 2 its registers are being put onto the Aand B buses In cycle 3 the ALU and shifter are working for it Finally in cycle 4 its WWW results are being stored back into the registers The thing to note here is that there are four sections of the hardware available and during each cycle a given instruction uses only one of l WWW 3quot them freeing up the other sections for different instructions l wms 2 INS m mum ALU comm The Data Path ofthe MIC3 Graphical iIIustraa39on ofpipelining Fig434 Inumcmm memory Tan m or Structured Computer Organization Final Exam Summary The MIC4 Improvements A highly pipelined design allows the individual steps to be very short thus the clock frequently high with 7 stages and more complex hardware Automatically prefetches a stream of bytes from memory decodes them into IIVM instructions converts them to sequences of microinstructions using a ROM and queues them for use as needed On each clock cycle the MIRs are shifted forward and the microoperation at the bottom of the queue is copied into MIR1 to start its execution The controls signals of the MIRs spread out through the data path causing actions to occur Each MIR controls a different microstep IJVM lenmll 39 Queueing unn lt9 chmopelamn ROM IADD lSUB LOAD l Decoder ammo 1m of penmng munops Insvunmn lam mm MIC4 Pipelined Main components 0fthe MIC4 Cache Memory A cache holds the most recently used memory words in a small fast memory speeding up its access The use of multiple caches is one of the most effective techniques for improving bandwidth and latency A basic technique is to introduce a separate cache for instructions and data split cache it allows memory operations to be initiated independently Level 2 cache an additional cache that resides between the instruction and data caches and main memory It often comes in the CPU package with an approximate size of 512 KB to 1MB Level 3 cache located on the processor board has a few megabytes of SRAM Structured Computer Organization Final Exam Summary CPU package Pmcessor 0quot Glapmcs Disk conlvollol Conlloilel connorlov l Splll LI Inslvucixon and data caches Boardlevel came SHAM System with 3 levels of cache Caches depend on 2 kinds of address locality to achieve their goal Spatial Locality memory location with addresses numerically similar to a recently accessed memory location is likely to be accessed in the near future Then bring more data than the requested with expectation of request Temporal locality recently accessed memory location are accessed again helps decide what to discard on a cache miss All cache use the following model main memory is divided into xed size blocks called cache lines Direct Mapped Caches Simplest cache A given memory word can be store in exactly one place within the cache Cache hit if the TAG field in the memory and in the cache agree Cache miss if the TAGS do not coincide Access is fast It possible to determine it a word is the correct word at the same time it is being read Collision when the next word needed is not available and an overwrite of the cache is neededare rarely because each memory operation takes longer than reading an entire cache line instead of just one wor SetAssociative Caches Solution to allow to or more lines in each cache ent Least Recently Used L RU algorithm is used to discard present items Structured Computer Organization Final Exam Summary Addvesses quotvar use this emry Entry T39 03 8 Valld Valid Valid 2047 131040 131071 DAmwbmmN El A El B E C El D Bus 6 ll 3 2 l ry n vy my n ry TAG LINE yWOFlD 39BYTEI 4 Way SetAssociative cache Direct mapped cache Definin ons Write through immediately updating an entry in the main memory Write deferred or write back if an error occurs recover the state of the memory Write allocation bring data into the cache on a write miss Branch Prediction Assume that all backward conditional branches will be taken and that allforward branches will not The backward branches are frequently located at the end of the loop most loops are executed multiple times so guessing that a branch back to the top of the loop will be taken is generally a good set When the branch is predicted wrongly it is difficult to undo instrucn ons that have already been executed and shouldn t have Solutions 1 Allow instruction fetched after a conditional branch to execute until they try to change the machine39s state and save the values in a scratch register 2 To record the value of any register about to be overwritten Dynamic Branch Prediction one approach The CPU maintains a history table in special hardware similar to the cache organizan on The history contains one entry for each conditional branch instruction along with the information ofwhether it was taken the last time it was executed This predicts if the branch will go the same way it went last time If the prediction is wrong the table is changed Emnchi Valm no branch 1 bit branch history 2 bit branch history A map between branch instructions and target addresses Structured Computer Organization Final Exam Summary Static Branch Prediction one approach have a compiler tell the hardware if at the end of the loop the branch will be taken When one of these is encountered the fetch unit does what it is being told Advantage no need for history table Out of Order Execution and Register Renaming Q What does it mean that a CPU is pipelined and superscalar A There is a fetch unit that pulls instruction words out of memory before they are needed in order to feed the decode unit The decode unit issues the decoded instructions into the proper functional units for execution Definitions RAW dependence read after write WAR dependence write after read WAW dependence write after write Rules 1 If any operand is being written do not issue RAW dependence 2 If the result register is being read do not issue WAR dependence 3 If the result register is being written do not issue WAW dependence A upen Lalal CPU with in d r i u and in rd mp1 tion Structured Computer Organization Final Exam Summary Chapter 5 The instruction set 39 r level Registers special purpose and general purpose Data types numeric and nonnumeric Instruction Formats a Zeroaddress instruction I OPCODE opcooel ADDRESS b Oneaddress instruction 3 m c Two address instruction 1 Three address instruction oecooe lADDRESS lADDRESSZI IOPCODEIADDRI IADDRZIADDRSI Cl 0 Expanding 0pcodes Consider n k bit instruction with k bit 0pcode and a single nbit address The instruction allows 2k different operations and 2H addressable memory cells 1514131211109876543210 llHIlHlllllllll Opcooe Address 1 Address 2 Address 3 An instruction with 4bit Upcode and3 4bit address eld Opcode 15 means that the opcode is contained in bits 8 to 15 instead of 12 to 15 Bits o to 3 and 4 to 7 form two addresses as usual The 14 two address instructions all have 1111 in the leftmost 4 bits and numbers from 0000 to 1101 in bits 8 to 11 Instructions with 1111 in the leftmost 4 bits and either 1110 or 1111 in bits 8 to 11 will be treated specially They will be treated as though their opcodes were in bits 4 to 15 The result is 32 new opcodes Because only 31 are needed upcode 111111111111 is interpreted to mean that the real opcode is in bits 0 to 15 giving 16 instructions with no address we proceeded through this discussion the opoode got longer and longer the three address instructions have a 4 bit opcode the two address insiIuctions have an 8 bit opcode the one address instructions have a 12 bit opcode and the zeroeaddress instructions have a 16bit opcode Formal Bytes as 172 or aquot 074 D4 PREFIX OPCODE MODE 1 SIB 1 DtSPLACEMENTI IMMEDIATE 1 Bus 5 1 Eixs 2 3 a n 2 m A Which operand ts source E EM 3 3 4 HBn address m Oncode 15Bit address Figure 514 The Pentium 4 instruction formats a Optode I Operand 1 Operand 2 The 8015 inschtion format Structured Computer Organization Final Exam Summary Addressing Immediate the address part of the instruction contains the operand itself Mov R1 4 Direct specify an operand in memory by giving its full address Register same as direct addressing but speci es a register instead of a memory location Register Indirect the operand being specified comes frommemory or goes to memory but the address is not hardwired into the instruction it is contained in a register the address is then a pointer Indexed addressing a memory by giving a register implicit or explicit plus a constant offset MOV R1O accumulale the OR in H1 initially o MOV R2390 R2 lndex i of current product Ali AND Blil MOV R34096 H3 first index value not to use LOOP MOV R4AR2 H4 All AND meme H4 Ai AND em OH R1R4 OH all the Boolean products into H1 ADD R2 i i41step in units ol 1 word 4 bytes CMP R2Fl3 are we done yet BLT LOOP If R2 lt R3 we are not done 50 continue Figure 519 A generic assembly program for computing the 0R ofAz AN D Bl for two MOV l m E 124300 1024element arrays Operation of this program is sh aightfonvard We need four registers here 1 R1 7 Holds the accumulated 0R ofthe Boolean product terms 2 R2 The index i that is used to step through the arrays 3 R3 The constant 4096 which is the lowest value of i not to use 4 R4 7 A scratch register for holding each product as it is formed Base Indexed the memory address is computed by adding up two registers plus an optional offset Stack having no address at all Use the following stack to solve 8 2 X 5 1 3 X 2 4 using the Reverse Polish notation formula gust V1511 Structured Computer Organization Final Exam Summary Figure 529 A comparison of addressing modes Instruction Types Dyadic Operations combine 2 operands to produce aresult Usually ADD SUBSTRACT MULT AND OR NOT Monadic Operations one operand produces one result Usually ADD in the INC operation to add 1 NEG it produces the additive inverse of a number Input Output 3 Types ofIO in PCs 1 Programmed I O with busy waiting public static void outpuLbufferlchar but 1 int countH ll Output a block at data to the devtca rnt status I ready lor i o i lt count l4 r do status indisplay statusJeg rea y status gtgt 7 8 0x01 whlle ready l 1 outdisplaybu erreg hufiil l l busy waiting Interrupteddriven I 0 Character available Ready or next character Keyboard status Display status Ill HI Interrupt enabled interrupt enabled get status isolate ready bit Keyboard bulter I Characterrecelvad I Display butler l climacteric display I Figure 531 Device registers for a simple terminal INTERRUPT ENABLE Direct Memory Access DMA IO It relieves the CPU from the burden of the IO but it takes bus cycles away from it cycle stealing But the gain in not having to handle one interrupt per byte transfer outweighs the cycles lost Structured Computer Organization Final Exam Summary Terminal Addvess Bus Figure 533 A system with a DMA con oller Coroutines A common use of routines is to simulate parallel processing on a single CPU Each coroutine runs in pseudo parallel as if it had its own CPU This style is helpful for testing software that will later actually run on a multiprocessor Traps A trap is an automatic procedure call initiated by some condition caused by the program usually an exceptional condition The trap handler branch to a procedure at a xed location Performs an action such as printing an error message Conditions that cause a trap oatingpoint over ow under ow integer over ow protection violation unde ned 0pcode attempt to fetch a word from an oddnumbered address division by zero etc Interrupts changes in the ow of control caused not by a running program but something else usually related to I O Traps are synchronous with the program and interrupts are asynchronous Transparency an interrupt routine with transparency property is one that when an interrupt happens some action is taken and some code run but when everything is finished the computer should be returned to exactly the same state it had before the interrupt Chapter 6 The Operating System Machine Language Definition Overlays when a programmer divides a program into a number of pieces in order to decide where in the secondary memory each overlay is kept Virtual memory performing the overlay process automatically Structured Computer Organization Final Exam Summary Paging The idea is to separate the concepts of address space and memory location Address space Address Ma m 4K Main em xiii mow 39 e 4095 A mapping in which virtual addresses 4096 to 8191 are mapped onto main memory addresses 0 to 4095 Definiu39ons What happens if a program branches to an address between 8192 and 12287quot On a machine that lacks virtual memory the program would cause an error trap that would print a suitably rude message for example Nonexistent memory referencedquot and terminate the program On a machine with virtual memory the following sequence of steps would occur 1 The contents of main memory would be saved on disk 2 Words 8192 to 12287 would be located on disk 3 Words 8192 to 12287 would be loaded into main memory 4 The address map would be changed to map addresses 8192 to 12287 onto memory locations 0 to 4095 5 Execution would continue as though nothing unusual had happened This technique for automatic overlaying is called paging and the chunks of program read in from disk are called pages The paging mechanism is said to be transparent Implementation One essential requirement for a virtual memory is a disk on which to keep the whole program and all the data It is conceptually simpler if one thinks of the copy of the program on the disk as the original one and the pieces brought into main memory every now and then as copies rather than the other way around Naturally it is important to keep the original up to date When changes are made to the copy in main memory they should also be re ected in the original eventually The virtual address space is broken up into a number of equalsized pages Page sizes ranging from 512 to 64 KB per page are common at present although sizes as large as 4 MB are used occasionally The page size is always a power of 2 The physical address space is broken up into pieces in a similar way each piece being the same size as a page so that each piece of main memory is capable of holding exactly one page These pieces of main memory into which the pages go are called page frames In Fig 62 the main memory contains only one page frame In practical designs it will usually contain thousands of them The rst64 KB ufvirtual address space divided in to 16 pages each uf4K a Virtual addresses Present39ahse m 75915 r 720 vinualH12hn Structured Computer Organization Final Exam Summary Virtual Page referencing the main memory Now consider howa 32ebit virtual address can be mapped onto a physical main memo address After all the only thing the memory understands are main memory addresses not Virtual addresses so that is what it must be given Every computer with virtual memory has a device for doing the virtualtophysical mapping This device is called the NIMU Memory Management Unit It may be on the CPU chip or it may be on a separate chip that works closely with the CPU chip Since our sample MMU maps from a 32bit virtual address to a 15bit physical address it needs a 32bit input register and a 15bit output register To see how the MMU works consider the example of Fig 6 4 When the MMU is presented with a 32bit virtual address it separates the address into a 20bit Virtual page number and a 12bit offset within the page because the pages in our example are 4K The virtual page number is used as an index into the page table to nd the entry for the page referenced In Fig 674 the virtual page number is 3 so entry 3 of the page table is selected as shown The rst thing the MU does with the page table entry is check to see if the page referenced is currently in main memory After all with 220 virtual pages and only eight page frames not virtual pages can be in memory at once The M39MU m 39 c y examining the presentabsent bit in the page table entry In our example the bit is 1 meaning the page is currently in memory The next step isto take the page frame value from the selected entry 6 in this case and copy it into the upper 3 bits of the 15bit output register Three bits are needed because there are eight page frames in physical memory In parallel with this operation the loworder 12 bits of the virtual address the page offset eld are copied into the loworder 12 bits of the output register as shown This 15bit address is now sent to the cache or memory for looku Figure 65 shows a possible mapping between virtual pages and physical page frames Virtual page 0 is in page frame 1 Virtual page 1 is in page frame 0 V39utual page 2 is not in main memory Virtual page 3 is in page frame 2 Virtual page 4 is not in main memory Virtual page 5 is in page frame 6 and so on Wb hll Memory armless quotquot339 Illi l l W mgr renew Page rablc Page as r when I Present In main memory a Maseru Item mam memory 7 3243i m uz address 7 r t wean Pa name ormmnwmsr 65 A possible mapping of the rst 16 virtual pages onto a main memory with eight page frames Structured Computer Organization Final Exam Summary Page Fault when reference is made to an address on a page not present in main memory After page fault occurs it is necessary for the OS to read in the required page from the disk enter its new physical memory location into the page table and the repeat the instruction that caused the fault Demand Paging Virtual l 39 out idethe main memorv It is possible to start a program running on a machine with Virtual memory even when none of the program is in main memory The page table merely has to be set to indicate that each and every virtual page is in the secondary memory an not in main memory linen CPU tries to fetch the rst instruction it immediately gets a page fault which causes the page containing the rst instruction to be loaded into memon and entered in the page table Then the rst instruction can begin If the rst instruction has two addresses with the two addresses on different pages both different from the instruction page two more page faults will occur and two more pages will be brought in before the instruction can nally execute The next instruction may cause some more page faults and so on This method of operating a virtual memory is called demand paging in analog to the wellknown demand feeding algorithm for babies when the baby cries you feed it as opposed to eeding it on a precise sc e ule Ln eman paging pages are brought into memory only L 39rPnn ifnrapage mm P Av Locality Principle most programs do not reference their address space uniformly but that the reference tends to cluster on a small number of pages Page size and Fragmentation Internal fragmentation when remaining bytes are wasted and occupy the main memory with no useful function w The trend is toward larger page size Segmentation Q How can we manage contracting and expanding tables in the same way that virtual memory eliminates the worry of organizing the problem into overlays A Providing many completely independent address spaces called segments Each segment is a linear sequence of address that varies in size They can grow and shrink independently without affecting each other To provide segments the program must supply the segment number and an address within the segment Advantages of segment memo Simplifying and handling data structures that are growing or shrinking lfeach procedure occupies a separate fragment the linking up of procedures compiled separately is greatly simplified lfa procedure is modified and recompiled no other procedures need to be changed Facilitates sharing procedure or data between several programs By marking a procedure separate segment the main memory does not need more than one physical copy among different libraries Structured Computer Organization Final Exam Summary Helpful for catching programming errors a procedure segment can be specified as execute only prohibiting attempts to read or store on it Implementation of Segmentation 1 Swapping pages are xed size External fragmentation takes place The whole segments are shuttled back and forth between memory and disk on demand external fragmentation or checkerboarding when the system has been running for a while memory is divided into a number of chunks some containing segments and some holes space is wasted here 2 Paging Divide each segment into xed size pages and demand paging them To page a segment a separate page table is needed for each segment To speed up performance hold the segment page in a hardware associative memory