by: Furman Breitenberg
This 99 page Class Notes was uploaded by Furman Breitenberg on Wednesday September 9, 2015. The Class Notes belongs to EEC 272 at University of California - Davis taught by Staff in Fall. Since its upload, it has received 55 views. For similar materials see /class/191951/eec-272-university-of-california-davis in Engineering Electrical & Compu at University of California - Davis.

Date Created: 09/09/15
Modern Microprocessor Development: Perspective
Prof. Vojin G. Oklobdzija, Fellow IEEE, IEEE CAS and SSC Distinguished Lecturer
University of California, Davis, USA
This presentation is available at http://www.ece.ucdavis.edu/acsel under Presentations.

Outline of the Talk
- Historic Perspective
- Challenges
- Definitions
- Going beyond one instruction per cycle
- Issues in superscalar machines
- New directions
- Future

Technology in the Internet Era
[Figure: Lithography. Wavelength and feature size (microns) versus year, 1980 to 2002, with 0.13 µm marked.]
From Dennis Buss, Texas Instruments, ICECS Malta 2001 presentation.

Process Technology Trends (Intel)
[Figure: "To the Terahertz Transistor: Transistor Leadership Continues". Transistor cross-section with raised source/drain and high-k gate dielectric; other labels are not recoverable.]
Robert Yung, 2002, Intel Corp.

Integrated Circuit, 1958
US Patent 3,138,743, filed Feb. 6, 1959.
[Figure: Transistors per IC versus year, log scale from 1,000 to 1,000,000,000.]
The number of transistors per IC doubles every two years. In less than 30 years:
- 1,000X decrease in size
- 10,000X increase in performance
- 10,000,000X reduction in cost
Heading toward 1 billion transistors before end [rest of line not recoverable]

Processor Design Challenges
- Will technology be able to keep up?
- Will the bandwidth keep up?
- Will the power be manageable?
- Can we deliver the power?
- What will we do with all those transistors?

Frequency continues to double every two years
[Figure: Nominal clock frequency (MHz, scale up to 3000) versus year, annotated "3X / generation". Labeled processors include Pentium 4, Athlon, Itanium, Xeon, Alpha 21264, Alpha 21164, Alpha 21064, Exponential, UltraSparc, Cray X-MP and CDC Cyber.]
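The doubling claims on these slides can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this note, and the transistor counts used in the second call (2,300 for the 4004 in 1971, 220 million for the Itanium 2 in 2002) are the ones quoted on the Microprocessor Evolution slide later in the talk.

```python
from math import log2


def growth_factor(years, doubling_period_years=2.0):
    """Growth multiple after `years` for a quantity that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def implied_doubling_period(start_value, end_value, years):
    """Doubling period (in years) implied by growing from start_value to end_value over `years`."""
    return years / log2(end_value / start_value)


if __name__ == "__main__":
    # Doubling every two years for 30 years is 2**15, a 32,768-fold increase.
    print(growth_factor(30))
    # Intel 4004 (2,300 transistors, 1971) to Itanium 2 (220 million, 2002).
    print(round(implied_doubling_period(2_300, 220e6, 31), 2))
```

Under these numbers the implied doubling period comes out a little under two years, which is consistent with the "doubles every two years" rule of thumb quoted on the slide.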
(The clock-frequency chart also plots MIPS-X; its year axis runs from 1975 to 2005.)

SPEC95 vs. Year
[Figure: Trends from 1992 to 2002. Performance: 3X per generation. Total transistors: 3X per generation. Logic transistors: 2X. Source: ISSCC, µP Report, Hot Chips.]

Processor Design Challenges
- Performance seems to be tracking frequency increase
- Where are the transistors being used?
- 3X per generation growth in transistors seems to be uncompensated as far as performance is concerned

Well, it will make up in power
[Figure: Power (W) versus year, log scale from 0.01 to 100, with trend annotation "x4 / 3 years".]
Courtesy of Sakurai Sensei.

Gloom and Doom predictions
[Figure: "Closer look at the power". Projected processor power versus year from 2002; the plotted values are not recoverable.]
Source: Shekhar Borkar, Intel.

5/3/2005, Prof. V.G. Oklobdzija, University of California

[Figure: Power density (W/cm2) versus year, 1970 to 2010, log scale, with Pentium processors plotted against reference levels for a nuclear reactor and a rocket nozzle.]
Power density too high to keep junctions at low temperature.
Courtesy of Intel Corp.

Power Density: The Future
[Figure: die power map and temperature map.]
- With high power density, cannot assume uniformity
- As die temperature increases, CMOS logic slows down
- At high die temperature, long-term reliability can be compromised
Robert Yung, 2002, Intel Corp.

Processor Thermal Comparison
[Figure: thermal images of a Pentium processor and a Crusoe processor, each playing a DVD. Readings: 105.5 °C (221.9 °F) and 42.3 °C; the second Fahrenheit value is not recoverable.]

Power and Current Trend
[Figure: Voltage, power and current versus year, 1998 to 2014.]
International Technology Roadmap for Semiconductors, 1999 update, sponsored by the Semiconductor Industry Association in cooperation with European Electronic Component Association (EECA), Electronic Industries Association of Japan (EIAJ), Korea Semiconductor Industry Association (KSIA), and Taiwan
Semiconductor Industry Association (TSIA).

[Figure: Supply current versus year through 2010, with the current drawn by a car starter marked for comparison ("Your car starter"). Source: Intel.]

[Figure: "Saving Grace". Scatter plot with a fitted curve; axis labels and values are not recoverable.]

Power versus Year
[Figure: Processor power versus year, 1996.5 to 2000.5. Series: RISC, Consumer, DEC Alpha and x86, each with an exponential fit.]
- Consumer (x86): efficiency improving dramatically, 4X per generation
- Server: average improving, 3X per generation
- High-end processors: efficiency not improving

Trend in L di/dt
- $di/dt$ is roughly proportional to $I \cdot f$, where $I$ is the chip's current and $f$ is the clock frequency,
- or $I \cdot V_{dd} \cdot f / V_{dd} = P \cdot f / V_{dd}$, where $P$ is the chip's power.
- The trend is: $P$ increasing, $f$ increasing, $V_{dd}$ decreasing.
- On-chip L: direction not legible in the source. Package L slightly decreases.
- Therefore, L di/dt fluctuation increases significantly.
Source: Shen Lin, Hewlett Packard Labs.

On-chip Interconnect Trend
[Figure: Relative delay versus feature size (250, 180, 130, 90, 65, 45, 32 nm). Series: gate delay (fan-out 4), local interconnect, global interconnect with repeaters, global interconnect without repeaters.]
- Local interconnects scale with gate delay
- Intermediate interconnects benefit from low-k material
- [A third note concerning RC is not recoverable.]

Microprocessor Evolution

                 4004 (1971)    2002 processor (name not legible)   Itanium 2 processor (2002)
Elapsed time                    31 yrs                              31 yrs
Transistors      2,300          55M (24K X)                         220M
Process          10 µm          0.13 µm (77X)                       0.18 µm (55X)
Wafer            2" (50 mm)     12" (300 mm) (6X)                   12" (300 mm) (6X)
Die area         12 mm2         142 mm2 (12X)                       421 mm2 (35X)
Frequency        108 kHz        2.8 GHz (26K X)                     1 GHz (9K X)

What to do with all those transistors?
- We have reached 220 million
- We will reach 1 billion in the next 5
years
- Memory transistors will save us from power crisis
- What should the architecture look like?

Synchronous / Asynchronous Design on the Chip
- 1 billion transistors on the chip by 2005-6
- 64b, 4-way issue logic core requires 2 million

Feature                      Digital 21164  MIPS 10000  PowerPC 620  HP 8000     Sun UltraSparc
Frequency                    500 MHz        200 MHz     200 MHz      180 MHz     250 MHz
Pipeline stages              7              5-7         5            -           6-9
Issue rate                   4              4           4            -           4
Out-of-order exec.           6 loads        32          16           -           -
Register renaming (int/FP)   none/8         32/32       -            -           -
Transistors                  9.3M           5.9M        -            3.9M        -
Logic transistors            1.8M           2.3M        -            -           -
SPEC95 (Intg/FlPt)           12.6/18.3      8.9/17.2    -            10.8/18.3   -
Power                        25 W           30 W        -            40 W        -
SpecInt/Watt                 0.5            0.3         -            0.27        -
1/(Energy x Delay)           6.4            2.6         -            2.9         -
(A dash marks a cell that is not legible on this slide.)

Synchronous / Asynchronous Design on the Chip
[Figure: a 1-billion-transistor chip, with a block of 10 million transistors marked.]

What Drives the Architecture?
- Processor to memory speed gap continues to widen
- Transistor densities continue to increase
- Application fine-grain parallelism is limited
- Time and resources required for more complex designs are increasing
- Time-to-market is as critical as ever

Multiprocessing on the Chip: ccNUMA Design
Source: Pete Bannon, DEC
- Metrics
- Topologies
- Cache Coherence

A bit of history
[Figure: Historical machines, circa 1964 onward: IBM Stretch (7030), 7090, etc.; PDP-8; CDC 6600; PDP-11; Cyber; VAX-11; Cray-1; branching into RISC and CISC.]

Important Features Introduced
- Separate fixed and floating-point registers (IBM S/360)
- Separate registers for address calculation (CDC 6600)
- Load/Store architecture (Cray-1)
- Branch and Execute (IBM 801)
Consequences:
- Hardware resolution of data dependencies (Scoreboarding, CDC 6600; Tomasulo's Algorithm, IBM 360/91)
- Multiple functional units (CDC 6600, IBM 360/91)
- Multiple operations within the unit (IBM 360/91)

RISC History
[Figure: lineage chart.] CDC 6600 (1963); Cyber; Cray-1 (1976); RISC-1, Berkeley (1981); MIPS, Stanford (1982); MIPS-1 (1986); HP-PA (1986); SPARC v8 (1987); MIPS-2 (1989); MIPS-3 (1992); DEC Alpha (1992); SPARC v9 (1994); MIPS-4 (1994).

Reaching beyond the CPI of one: The next challenge
- With perfect caches and no lost cycles in the pipeline, the CPI approaches 1.00
- The next step is to break the 1.0 CPI barrier and go beyond
- How
to efficiently achieve more than one instruction per cycle?
- Again, the key is exploitation of parallelism:
  - on the level of independent functional units
  - on the pipeline level

What does a superscalar pipeline look like?
[Figure: superscalar pipeline with stages IF, DEC, EXE, WB. Blocks: Instruction Cache, Instruction Decode/Dispatch Unit, execution units EU1 to EU5, Data Cache.]
- IF: block of instructions being fetched from the I-Cache; instructions screened for branches; possible target path being fetched
- DEC: instructions decoded and sent to corresponding EUs; they could be sent out of order (out-of-order issue)
- EXE: instructions completed, data available (possibly out of order)
- WB: data written to cache (in order)

Superscalar Pipeline
- One pipeline stage in a superscalar implementation may require more than one clock. Some operations may take several clock cycles.
- A superscalar pipeline is much more complex, therefore it will generally run at a lower frequency than a single-issue machine.
- The trade-off is between the ability to execute several instructions in a single cycle and a lower clock frequency (as compared to a scalar machine).

"Everything you always wanted to know about computer architecture can be found in IBM 360/91."
Greg Grohosky, Chief Architect of IBM RS/6000

Techniques to Alleviate Branch Problem: How can the Architecture help?
- Conditional or Predicated Instructions
  Useful to eliminate BR from the code. If the condition is true, the instruction is executed normally; if false, the instruction is treated as a NOP.

      if (A == 0) S = T;        (R1 = A, R2 = S, R3 = T)

          BNEZ  R1, L
          MOV   R2, R3
      L:
      replaced with:
          CMOVZ R2, R3, R1

- Loop Closing instructions: BCT (Branch and Count), IBM RS/6000
  The loop-count register is held in the Branch Execution Unit, therefore it is always known in advance whether BCT will be taken or not (the loop-count register becomes a part of the machine status).

Superscalar Issues: Contention for Data
Data Dependencies:
- Read-After-Write (RAW), also known as Data Dependency or True Data Dependency
- Write-After-Read (WAR), known as Anti Dependency
- Write-After-Write (WAW), known as Output Dependency
WAR and WAW are also known as Name
Dependencies.

Superscalar Issues: Contention for Data
True Data Dependencies: Read-After-Write (RAW)
An instruction j is data dependent on instruction i if:
- Instruction i produces a result that is used by j, or
- Instruction j is data dependent on instruction k, which is data dependent on instruction i
Examples:
      SUBI  R1, R1, 8      ; decrement pointer
      BNEZ  R1, Loop       ; branch if R1 != zero

      LD    F0, 0(R1)      ; F0 = array element
      ADDD  F4, F0, F2     ; add scalar in F2
      SD    0(R1), F4      ; store result F4
(Patterson-Hennessy)

Superscalar Issues: Contention for Data
True Data Dependencies
Data dependencies are a property of the program. The presence of a dependence indicates the potential for a hazard, which is a property of the pipeline (including the length of the stall).
A dependence:
- indicates the possibility of a hazard
- determines the order in which results must be calculated
- sets the upper bound on how much parallelism can possibly be exploited
That is, we cannot do much about True Data Dependencies in hardware. We have to live with them.

Superscalar Issues: Contention for Data
Name Dependencies are:
- Anti-Dependencies, Write-After-Read (WAR)
  Occurs when instruction j writes to a location that instruction i reads, and i occurs first.
- Output Dependencies, Write-After-Write (WAW)
  Occurs when instruction i and instruction j write into the same location. The ordering of the instructions' writes must be preserved (j writes last).
In this case there is no value that must be passed between the instructions. If the name of the register (memory) used in the instructions is changed, the instructions can execute simultaneously or be reordered.
The hardware CAN do something about Name Dependencies.

Superscalar Issues: Contention for Data
Name Dependencies:
- Anti-Dependencies, Write-After-Read (WAR)
      ADDD  F4, F0, F2     ; F0 used by ADDD
      LD    F0, 0(R1)      ; F0 not to be changed before read by ADDD
- Output Dependencies, Write-After-Write (WAW)
      LD    F0, 0(R1)      ; LD writes into F0
      ADDD  F0, F4, F2     ; ADD should be the last to write into F0
  This case does not make much sense since F0 will be overwritten;
however, this combination is possible.
Instructions with name dependencies can execute simultaneously if reordered, or if the name is changed. This can be done statically (by the compiler) or dynamically (by the hardware).

Superscalar Issues: Dynamic Scheduling
- Thornton Algorithm (Scoreboarding), CDC 6600 (1964)
  - One common unit, the Scoreboard, which allows instructions to execute out of order when resources are available and dependencies are resolved
- Tomasulo's Algorithm, IBM 360/91 (1967)
  - Reservation Stations used to buffer the operands of instructions waiting to issue and to store the results waiting for the register. Common Data Bus (CDB) used to distribute the results directly to the functional units
- Register Renaming, IBM RS/6000 (1990)
  - Implements more physical registers than logical (architected) ones. They are used to hold the data until the instruction commits

Superscalar Issues: Dynamic Scheduling
Thornton Algorithm (Scoreboarding), CDC 6600
[Figure: Scoreboard. Instructions in the queue feed the Scoreboard Unit, which tracks unit status (Div, Mult, Add), registers used (Fi, Fj, Fk, Qj, Qk, Rj, Rk), pending writes and OK-to-read flags, and sends signals to the execution units and to the registers.]

Superscalar Issues: Dynamic Scheduling
Thornton Algorithm (Scoreboarding), CDC 6600 (1964)
Performance: the CDC 6600 was 1.7 times faster than the CDC 6400 (no scoreboard, one functional unit) for FORTRAN and 2.5 times faster for hand-coded assembly.
Complexity: to implement the scoreboard, as much logic was used as to implement one of the ten functional units.

Superscalar Issues: Dynamic Scheduling
Tomasulo's Algorithm, IBM 360/91 (1967)
[Figure: floating-point unit with FLP Operation Stack, FLP Buffer, Store Queue, FLP Registers (Busy, TAG, DATA), reservation stations (TAG, source, data) in front of Function Unit 1 and Function Unit 2, and the Common Data Bus.]

Superscalar Issues: Dynamic Scheduling
Tomasulo's Algorithm, IBM 360/91 (1967)
The keys to Tomasulo's algorithm are:
- Common Data Bus (CDB)
  The CDB carries the data and the TAG identifying the source of the data.
- Reservation Station
  The Reservation Station buffers the operation and the data (if available), awaiting the unit to be free to execute. If data is not available, it holds the TAG identifying the unit which is to produce the data. The moment this TAG is matched with the one on the CDB, the data is taken and the execution will commence.
Replacing register names with TAGs, name dependencies are resolved (a sort of register renaming).

Superscalar Issues: Dynamic Scheduling
The register-renaming structure consists of:
- Remap Table (RT): providing mapping from logical to physical register
- Free List (FL): providing names of the registers that are unassigned, so they can go back to the RT
- Pending Target Return Queue (PTRQ): containing physical registers that are used and will be placed on the FL as soon as the instruction using them passes decode
- Outstanding Load Queue (OLQ): containing registers of the next FLP load whose data will return from the cache. It stops an instruction from decoding if data has not returned

Superscalar Issues: Dynamic Scheduling
Register-Renaming Structure, IBM RS/6000 (1990)
[Figure: instruction fields (T, S1, S2, S3) passing through the Remap Table (32 entries of 6 bits), Free List, PTRQ, Instruction Decode Buffer, Bypass and Outstanding Load Queue.]
There are 32 logical registers and 40 implemented physical registers.

Power of Superscalar Implementation
Coordinate Rotation, IBM RS/6000 (1990)

      x1 = x cos(theta) - y sin(theta)
      y1 = y cos(theta) + x sin(theta)

            FL    FR0, sin theta         ; load rotation matrix
            FL    FR1, -sin theta        ; constants
            FL    FR2, cos theta
            FL    FR3, xdis              ; load x and y displacements
            FL    FR4, ydis
            MTCTR I                      ; load Count register with loop count
      LOOP: UFL   FR8, x(i)              ; load x(i)
            FMA   FR10, FR8, FR2, FR3    ; form x(i)*cos + xdis
            UFL   FR9, y(i)              ; load y(i)
            FMA   FR11, FR9, FR2, FR4    ; form y(i)*cos + ydis
            FMA   FR12, FR9, FR1, FR10   ; form -y(i)*sin + FR10
            FST   FR12, x1(i)            ; store x1(i)
            FMA   FR13, FR8, FR0, FR11   ; form x(i)*sin + FR11
            FST   FR13, y1(i)            ; store y1(i)
            BC    LOOP                   ; continue for all points

This code, 18 instructions' worth, executes in 4 cycles in a loop.

Superscalar Issues: Instruction Issue and Machine Parallelism
- In-Order Issue with In-Order Completion
  The simplest instruction-issue policy.
  Instructions are issued in exact program order. Not an efficient use of superscalar resources. Even in scalar processors, in-order completion is not used.
- In-Order Issue with Out-of-Order Completion
  Used in scalar RISC processors (Load, Floating Point). It improves the performance of superscalar processors. Stalled when there is a conflict for resources, or a true dependency.
- Out-of-Order Issue with Out-of-Order Completion
  The decoder stage is isolated from the execute stage by the "instruction window" (an additional pipeline stage).

Superscalar Examples: Instruction Issue and Machine Parallelism
- DEC Alpha 21264: four-way (six instructions peak), out-of-order execution
- MIPS R10000: four instructions, out-of-order execution
- HP 8000: four-way, aggressive out-of-order execution, large reorder window; issue in-order, execute out-of-order, instruction retire in-order
- Intel P6: three instructions, out-of-order execution
- Exponential: three instructions, in-order execution

Superscalar Issues: The Cost vs. Gain of Multiple Instruction Execution
PowerPC Example

Feature             601                604                Difference
Frequency           100 MHz            100 MHz            same
CMOS process        0.5 µm, 5-metal    0.5 µm, 4-metal    same
Cache               Total 32 KB        16K + 16K          same
Load/Store unit     No                 Yes
Dual integer unit   No                 Yes
Register renaming   No                 Yes
Peak issue          2 + branch         4 instructions     double
Transistors         2.8 million        3.6 million        +30%
SPECint92           105                160                +50%
SPECfp92            125                165                +30%

Superscalar Issues: Comparison of leading RISC microprocessors

Feature                      Digital 21164  MIPS 10000  PowerPC 620  HP 8000     Sun UltraSparc
Frequency                    500 MHz        200 MHz     200 MHz      180 MHz     250 MHz
Pipeline stages              7              5-7         5            7-9         6-9
Issue rate                   4              4           4            4           4
Out-of-order exec.           6 loads        32          16           56          none
Register renaming (int/FP)   none/8         32/32       8/8          56          none
Transistors                  9.3M           5.9M        6.9M         3.9M        3.8M
Logic transistors            1.8M           2.3M        2.2M         3.9M        2.0M
SPEC95 (Intg/FlPt)           12.6/18.3      8.9/17.2    9/9          10.8/18.3   8.5/15
Perform./Log-trn (Intg/FP)   7.0/10.2       3.9/7.5     4.1/4.1      2.77/4.69   4.25/7.5

Superscalar Issues: Value of Out-of-Order Execution

Feature                      MIPS 5000  MIPS 10000  HP-PA 7300LC  HP 8000     Digital 21164  Digital 21264
Frequency                    180 MHz    200 MHz     160 MHz       180 MHz     500 MHz        600 MHz
Pipeline stages              5          5-7         5             7-9         7              7-9
Issue rate                   2          4           2             4           4              4+2
Out-of-order exec.           none       32          none          56          6 loads        20 int + 15 fp
Register renaming (int/FP)   none       32/32       none          56          none/8         80/72
Transistors                  3.6M       5.9M        9.2M          3.9M        9.3M           15.2M
Logic transistors            1.1M       2.3M        1.7M          3.9M        1.8M           6M
Cache                        32/32K     32/32K      64/64K        none        8/8/96K        64/64K
SPEC95 (Intg/FlPt)           4.0/3.7    8.9/17.2    5.5/7.3       10.8/18.3   12.6/18.3      36/60
Perform./Log-trn (Intg/FP)   3.6/3.4    3.9/7.5     3.2/4.3       2.77/4.69   7.0/10.2       6.0/10.0

The ways to exploit instruction parallelism
- Superscalar: takes advantage of instruction parallelism to reduce the average number of cycles per instruction
- Superpipelined: takes advantage of instruction parallelism to reduce the cycle time
- VLIW: takes advantage of instruction parallelism to reduce the number of instructions

The ways to exploit instruction parallelism
[Figure: pipeline timing diagrams (IF, ID, EXE, WB over cycles 0 to 5) for a scalar pipeline and a superscalar pipeline.]

The ways to exploit instruction parallelism
[Figure: pipeline timing diagram for a superpipelined machine.]
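The three approaches listed above each attack a different factor of the usual processor performance equation, execution time = instruction count x CPI x cycle time. That equation is standard but does not appear on the slides, and every number in the sketch below is hypothetical, chosen only to show which factor each approach changes. The `Machine` class and its field names are likewise invented for this illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Machine:
    instruction_count: float  # dynamic instructions executed
    cpi: float                # average cycles per instruction
    cycle_time_ns: float      # clock period in nanoseconds

    def exec_time_ns(self) -> float:
        """Execution time = instruction count x CPI x cycle time."""
        return self.instruction_count * self.cpi * self.cycle_time_ns


# Hypothetical scalar baseline: 1M instructions, CPI of 1, 10 ns clock.
scalar = Machine(instruction_count=1e6, cpi=1.0, cycle_time_ns=10.0)

# Each variant changes exactly one factor (idealized: no hazards, no overhead).
superscalar = replace(scalar, cpi=0.5)               # two instructions per cycle
superpipelined = replace(scalar, cycle_time_ns=5.0)  # degree-2 superpipelining
vliw = replace(scalar, instruction_count=0.5e6)      # two operations per instruction

if __name__ == "__main__":
    for name, m in [("scalar", scalar), ("superscalar", superscalar),
                    ("superpipelined", superpipelined), ("VLIW", vliw)]:
        print(f"{name:15s} {m.exec_time_ns() / 1e6:.1f} ms")
```

In this idealized setting all three variants halve the execution time; the slides that follow discuss why real machines fall short of that (dependencies, clock skew, wasted VLIW slots).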
VLIW: Very-Long-Instruction-Word Processors
- A single instruction specifies more than one concurrent operation:
  - This reduces the number of instructions in comparison to scalar.
  - The operations specified by the VLIW instruction must be independent of one another.
- The instruction is quite large:
  - Takes many bits to encode multiple operations.
  - VLIW processor relies on software to pack the operations into an instruction.
  - Software uses a technique called "compaction". It uses no-ops for instruction operations that cannot be used.
- VLIW processor is not software compatible with any general-purpose processor.

Very-Long-Instruction-Word Processors
- It is difficult to make different implementations of the same VLIW architecture binary-code compatible with one another, because instruction parallelism, compaction and the code depend on the processor's operation latencies.
- Compaction depends on the instruction parallelism: in sections of code having limited instruction parallelism, most of the instruction is wasted.
- VLIW leads to simple hardware implementation.

Itanium 2 Processor
- Transistors: 221M. Caches: 170M (75%). Core: 51M (25%).
- Die size: 19.5 x 21.6 mm = 421 mm2. Caches: 50%. Others: 16%. Core: 34%.
- Caches are becoming an increasing portion of the die because of their performance impact and low power density.

Superpipelined Processors
- In a superpipelined processor, the major stages are divided into sub-stages.
  - The degree of superpipelining is a measure of the number of sub-stages in a major pipeline stage.
  - It is clocked at a higher frequency as compared to the pipelined processor (the frequency is a multiple of the degree of superpipelining).
  - This adds latches and overhead (due to clock skews) to the overall cycle time.
  - A superpipelined processor relies on instruction parallelism, and true dependencies can degrade its performance.

Superpipelined Processors
- As compared to superscalar processors:
  - A superpipelined processor takes longer to generate the result.
  - Some simple
    operations in the superscalar processor take a full cycle, while the superpipelined processor can complete them sooner.
  - At a constant hardware cost, the superscalar processor is more susceptible to resource conflicts than the superpipelined one. A resource must be duplicated in the superscalar processor, while the superpipelined one avoids them through pipelining.
- Superpipelining is appropriate when:
  - The cost of duplicating resources is prohibitive.
  - The ability to control clock skew is good.
  - This is appropriate for very high speed technologies: GaAs, BiCMOS, ECL (low logic density and low gate delays).

Hyper-Pipelined Technology
[Figure: pipeline depth and frequency at introduction across Intel processor generations; the plotted values are not recoverable.]
Courtesy of Doug Carmean, Intel Corp., Hot Chips presentation.

Pipeline Depth
[Figure: processor frequency (MHz, log scale from 100 to 10,000) and gate delays per clock versus year for Intel, IBM PowerPC and DEC processors. Annotation: processor frequency scales 2X per technology generation.]
- Frequency doubles each generation
- Number of gates/clock reduces by 25%

Multi-GHz Clocking Problems
- Fewer logic levels in between pipeline stages: out of 7-10 FO4 allocated delays, the FF can take 2-4 FO4
- Clock uncertainty can take another FO4
- The total could be 1/2 of the time allowed for computation

Consequences of multi-GHz Clocks
- Pipeline boundaries start to blur
- Clocked Storage Elements must include logic
- Wave pipelining, domino style, signals used to clock
- Synchronous design only in a limited domain
- Asynchronous communication between synchronous domains

Future Perspective
Internet Era: DSP plus Analog
[Figure: application examples including digital motor control, video server, pro audio, digital still camera, DAB digital radio, central office, DSL modem and networking.]
From Dennis Buss, Texas Instruments, ICECS
Malta 2001 presentation.

Wearable Computer
[Figure: photographs of wearable computers.]
http://www.cs.cmu.edu/wearable
http://lcs.www.media.mit.edu/projects/wearables
http://www.microopticalcorp.com
(T. Kuroda)

Digital Ink
Digital Ink is a sophisticated pen that recognizes and stores the handwriting and drawing of its user. After writing, the user simply jots the word "send" or "e-mail" followed by a fax number or e-mail address. The documents are wirelessly sent via cellular network to fax machines, desktop computers, or even other digital pens. A small digital "ink well" connected to the user's desktop computer serves as home to Digital Ink and allows the pen's information to be downloaded for future use. Digital Ink reinvents the computer desktop by turning any writing surface, from napkins to paper, into low-tech and socially comfortable computer interfaces.
(CMU; T. Kuroda)

Implantable Computer
[Figure: implantable device; labels are not recoverable.]

Technology in the Internet Era
Future Scaling Beyond Bulk CMOS
[Figure: timeline from today to 2020 to 2040: bulk and SOI CMOS, vertical gate structure, RTD, single electronics, molecular switch and nanotubes.]
From Dennis Buss, Texas Instruments, ICECS Malta 2001 presentation.

From Hiroshi Iwai, Toshiba, ISSCC 2000 presentation:
Year 2010
- Extrapolation of the trend with some saturation
- Many important, interesting applications: home, entertainment, office, translation, health care
Year 2020
- More assembly technique (3D)
Year 2100
- Combination of bio and semiconductor
- Ultra small volume
- Small number of neuron cells
- Extremely low power
- Long lifetime
[Figure: bio-computer concept with labels Brain, Sensor (infrared, humidity, CO2), Artificial Intelligence, 3D flight control, and real-time image processing by DNA manipulation.]
More than 100 billion stars are
involved.
From Hiroshi Iwai, Toshiba, ISSCC 2000 presentation.

A general introductory description of the logical structure of SYSTEM/360 is given in preparation for the more detailed analyses occurring in the other parts of the paper.
The functional units, the principal registers and formats, and the basic addressing and sequencing principles of the system are indicated.

The structure of SYSTEM/360
Part I: Outline of the logical structure
by G. A. Blaauw and F. P. Brooks, Jr.

SYSTEM/360 is distinguished by a design orientation toward very large memories and a hierarchy of memory speeds, a broad spectrum of manipulative functions, and a uniform treatment of input/output functions that facilitates communication with a diversity of input/output devices. The overall structure lends itself to program-compatible embodiments over a wide range of performance levels.

The system, designed for operation with a supervisory program, has comprehensive facilities for storage protection, program relocation, nonstop operation, and program interruption. Privileged instructions associated with a supervisory operating state are included. The supervisory program schedules and governs the execution of multiple programs, handles exceptional conditions, and coordinates and issues input/output (I/O) instructions. Reliability is heightened by supplementing solid-state components with built-in checking and diagnostic aids. Interconnection facilities permit a wide variety of possibilities for multisystem operation.

The purpose of this discussion is to introduce the functional units of the system, as well as formats, codes, and conventions essential to characterization of the system.

IBM SYSTEMS JOURNAL, VOL. 3, NO. 2, 1964

Functional structure
The SYSTEM/360 structure, schematically outlined in Figure 1, has seven announced embodiments. Six of these, namely MODELS 30, 40, 50, 60, 62, and 70, will be treated here.¹ Where requisite I/O devices, optional features, and storage capacity are present, these six models are logically identical for
valid programs that contain explicit time dependencies only. Hence, even though the allowable channels or storage capacity may vary from model to model (as discussed in Part II), the logical structure can be discussed without reference to specific models.

input/output
Direct communication with a large number of low-speed terminals and other I/O devices is provided through a special multiplexer channel unit. Communication with high-speed I/O devices is accommodated by the selector channel units. Conceptually, the input/output system acts as a set of subchannels that operate concurrently with one another and the processing unit. Each subchannel, instructed by its main control-word sequence, can govern a data transfer operation between storage and a selected I/O device. A multiplexer channel can function either as one or as many subchannels; a selector channel always functions as a single subchannel. The control unit of each I/O device attaches to the channels via a standard mechanical-electrical-programming interface.

Figure 1. Functional schematic of System/360
[Diagram: storage (main storage and large-capacity storage); arithmetic and logic processing unit; input/output channels (a multiplexer channel with multiple low-speed subchannels, and selector channels that each provide a single high-speed subchannel); control units; devices.]

Figure 2. Schematic of basic registers and data paths
[Diagram: storage address, main storage, variable-field-length operations, floating-point operations, indexed address, 16 general registers, 4 floating-point registers; the remaining labels are not recoverable.]

processing
The processing unit has sixteen general-purpose 32-bit registers used for addressing, indexing, and accumulating. Four 64-bit floating-point accumulators are optionally available. The inclusion of multiple registers permits effective use to be made of small high-speed memories. Four distinct types of processing are provided: logical manipulation of individual bits, character strings and fixed words; decimal arithmetic
on digit strings; fixed-point binary arithmetic; and floating-point arithmetic. The processing unit, together with the central control function, will be referred to as the central processing unit (CPU). The basic registers and data paths of the CPU are shown in Figure 2.

The CPU's of the various models yield a substantial range in performance. Relative to the smallest model (MODEL 30), the internal performance of the largest (MODEL 70) is approximately 50:1 for scientific computation and 15:1 for commercial data processing.

control
Because of the extensive instruction set, SYSTEM/360 control is more elaborate than in conventional computers. Control functions include internal sequencing of each operation; sequencing from instruction to instruction (with branching and interruption); governing of many I/O transfers; and the monitoring, signaling, timing, and storage protection essential to total system operation. The control equipment is combined with a programmed supervisor, which coordinates and issues all I/O instructions, handles exceptional conditions, loads and relocates programs and data, manages storage, and supervises scheduling and execution of multiple programs. To a problem programmer, the supervisory program and the control equipment are indistinguishable.

The functional structure of SYSTEM/360, like that of most computers, is most concisely described by considering the data formats, the types of manipulations performed on them, and the instruction formats by which these manipulations are specified.

information formats
The several SYSTEM/360 data formats are shown in Figure 3. An 8-bit unit of information is fundamental to most of the formats. A consecutive group of n such units constitutes a field of length n. Fixed-length fields of length one, two, four, and eight are termed bytes, halfwords, words, and double words, respectively. In many instructions, the operation code implies one of these four fields as the length of the operands. On the other hand, the length is explicit
in an instruction that refers to operands of variable length.

The location of a stored field is specified by the address of the leftmost byte of the field. Variable-length fields may start on any byte location, but a fixed-length field of two, four, or eight bytes must have an address that is a multiple of 2, 4, or 8, respectively. Some of the various alignment possibilities are apparent from Figure 3.

Storage addresses are represented by binary integers in the system. Storage capacities are always expressed as numbers of bytes.

[Figure 3. The data formats: halfword and byte fields, packed decimal, zoned decimal, and variable-length logical data.]

Processing operations

The SYSTEM/360 operations fall into four classes: fixed-point arithmetic, floating-point arithmetic, logical operations, and decimal arithmetic. These classes differ in the data formats used, the registers involved, the operations provided, and the way the field length is stated.

The basic arithmetic operand is the 32-bit fixed-point binary word. Halfword operands may be specified in most operations for the sake of improved speed or storage utilization. Some products and all dividends are 64 bits long, using an even-odd register pair. Because the 32-bit words accommodate the 24-bit address, the entire fixed-point instruction set, including multiplication, division, shifting, and several logical operations, can be used in address computation. A two's-complement notation is used for fixed-point operands.

Additions, subtractions, multiplications, divisions, and comparisons take one operand from a register and another from either a register or storage. Multiple-precision arithmetic is made convenient by the two's-complement notation and by recognition of the carry from one word to another. A pair of conversion instructions, CONVERT TO BINARY and CONVERT TO DECIMAL, provide transition between decimal and binary radices without the use of tables. Multiple-register loading and storing instructions
facilitate subroutine switching.

Floating-point numbers may occur in either of two fixed-length formats, short or long. These formats differ only in the length of the fractions, as indicated in Figure 3. The fraction of a floating-point number is expressed in 4-bit hexadecimal (base 16) digits. In the short format, the fraction has six hexadecimal digits; in the long format, the fraction has 14 hexadecimal digits. The short length is equivalent to seven decimal places of precision. The long length gives up to 17 decimal places of precision, thus eliminating most requirements for double-precision arithmetic.

The radix point of the fraction is assumed to be immediately to the left of the high-order fraction digit. To provide the proper magnitude for the floating-point number, the fraction is considered to be multiplied by a power of 16. The characteristic portion, bits 1 through 7 of both formats, is used to indicate this power. The characteristic is treated as an excess-64 number with a range from -64 through +63, and permits representation of decimal numbers with magnitudes in the range of 10^-78 to 10^75. Bit position 0 in either format is the fraction sign, S. The fraction of negative numbers is carried in true form.

Floating-point operations are performed with one operand from a register and another from either a register or storage. The result, placed in a register, is generally of the same length as the operands.

Operations for comparison, translation, editing, bit testing, and bit setting are provided for processing logical fields of fixed and variable lengths. Fixed-length logical operands, which consist of one, four, or eight bytes, are processed from the general registers. Logical operations can also be performed on fields of up to 256 bytes, in which case the fields are processed from left to right, one byte at a time. Moreover, two powerful scanning instructions permit byte-by-byte translation and testing via tables. An
important special case of variable-length logical operations is the one-byte field, whose individual bits can be tested, set, reset, and inverted as specified by an 8-bit mask in the instruction.

Any 8-bit character set can be processed, although certain restrictions are assumed in the decimal arithmetic and editing operations. However, all character-set-sensitive I/O equipment assumes either the Extended Binary-Coded-Decimal Interchange Code (EBCDIC) of Figure 4 or the code of Figure 5, which is an eight-bit extension of a seven-bit code proposed by the International Standards Organization.

Decimal arithmetic can improve performance for processes requiring few computational steps per datum between the source input and the output. In these cases, where radix conversion from decimal to binary and back to decimal is not justified, the use of registers for intermediate results usually yields no advantage over storage-to-storage processing. Hence, decimal arithmetic is provided in SYSTEM/360 with operands as well as results located in storage, as in the IBM 1400 series. Decimal arithmetic includes addition, subtraction, multiplication, division, and comparison.

[Figure 4. Extended Binary-Coded-Decimal Interchange Code.]

[Figure 5. Eight-bit representation for proposed international code. Source cited in the figure: third ISO draft proposal for 6- and 7-bit coded character sets for information processing interchange, International Standards Organization, June 1964.]

The decimal digits 0 through 9 are represented in the 4-bit binary-coded-decimal form by 0000 through 1001, respectively. The patterns 1010 through 1111 are not valid as digits and are interpreted as sign codes: 1011 and 1101 represent a minus, the other four a plus. The sign patterns generated in decimal arithmetic depend upon the character set preferred. For EBCDIC, the patterns are 1100 and 1101; for the code of Figure 5, they are 1010 and 1011. The choice between the two codes is determined by a mode bit.

Decimal digits, packed two to a byte, appear in fields of variable length (from 1 to 16 bytes) and are accompanied by a sign in the rightmost four bits of the low-order byte. Operand fields can be located on any byte boundary and can have lengths up to 31 digits and sign. Operands participating in an operation have independent lengths. Negative numbers are carried in true form. Instructions are provided for packing and unpacking decimal numbers. Packing of digits leads to efficient use of storage, increased arithmetic performance, and improved rates of data transmission. For purely decimal fields, for example, a 90,000-byte/second tape drive reads and writes 180,000 digits/second.

[Figure 6. Five basic instruction formats (RR, RX, RS, SI, SS), occupying one, two, or three halfwords.]

Instruction formats contain one, two, or three
halfwords, depending upon the number of storage addresses necessary for the operation. If no storage address is required of an instruction, one halfword suffices. A two-halfword instruction specifies one address; a three-halfword instruction specifies two addresses. All instructions must be aligned on halfword boundaries.

The five basic instruction formats, denoted by the format mnemonics RR, RX, RS, SI, and SS, are shown in Figure 6. RR denotes a register-to-register operation, RX a register and indexed-storage operation, RS a register and storage operation, SI a storage and immediate-operand operation, and SS a storage-to-storage operation.

In each format, the first instruction halfword consists of two parts. The first byte contains the operation code. The length and format of an instruction are indicated by the first two bits of the operation code. The second byte is used either as two 4-bit fields or as a single 8-bit field. This byte is specified from among the following:

Four-bit operand register designator (R)
Four-bit index register designator (X)
Four-bit mask (M)
Four-bit field length specification (L)
Eight-bit field length specification
Eight-bit byte of immediate data (I)

The second and third halfwords each specify a 4-bit base register designator (B), followed by a 12-bit displacement (D).

An effective storage address E is a 24-bit binary integer given, in the typical case, by

E = B + X + D

where B and X are 24-bit integers from general registers identified by fields B and X, respectively, and the displacement D is a 12-bit integer contained in every instruction that references storage.

The base B can be used for static relocation of programs and data. In record processing, the base can identify a record; in array calculations, it can specify the location of an array. The index X can provide the relative address of an element within an array. Together, B and X permit double indexing in array processing.

The displacement provides for relative addressing of up to 4095 bytes beyond the element or base address. In array calculations, the displacement can identify one of many items associated with an element. Thus, multiple arrays whose indices move together are best stored in an interleaved manner. In the processing of records, the displacement can identify items within a record.

In forming an effective address, the base and index are treated as unsigned 24-bit positive binary integers and the displacement as a 12-bit positive binary integer. The three are added as 24-bit binary numbers, ignoring overflow. Since every address is formed with the aid of a base, programs can be readily and generally relocated by changing the contents of base registers.

A zero base or index designator implies that a zero quantity must be used in forming the address, regardless of the contents of general register 0. A displacement of zero has no special significance. Initialization, modification, and testing of bases and indices can be carried out by fixed-point instructions, or by BRANCH AND LINK, BRANCH ON COUNT, or BRANCH ON INDEX instructions. LOAD EFFECTIVE ADDRESS provides not only a convenient housekeeping operation but also, when the same register is specified for result and operand, an immediate register-incrementing operation.

Sequencing

Normally, the CPU takes instructions in sequence. After an instruction is fetched from a location specified by the instruction counter, the instruction counter is increased by the number of bytes in the instruction.

Conceptually, all halfwords of an instruction are fetched from storage after the preceding operation is completed and before execution of the current operation, even though physical storage word size and overlap of instruction execution with storage access may cause the actual instruction fetching to be different. Thus, an instruction can be modified by the instruction that immediately precedes it in the instruction stream, and cannot effectively modify itself during execution.

[Figure 7. Program status word format. Fields, with widths in bits: system mask (8), key (4), AMWP (4), interruption code (16), instruction length code (2), condition code (2), program mask (4), instruction address (24).]

Most branching is accomplished by a single BRANCH ON CONDITION operation that inspects a 2-bit condition register. Many of the arithmetic, logical, and I/O operations indicate an outcome by setting the condition register to one of its four possible states. Subsequently, a conditional branch can select one of the states as a criterion for branching. For example, the condition code reflects such conditions as nonzero result, first operand high, operands equal, overflow, channel busy, zero, etc. Once set, the condition register remains unchanged until modified by an instruction execution that reflects a different condition code.

The outcome of address arithmetic and counting operations can be tested by a conditional branch to effect loop control. Two instructions, BRANCH ON COUNT and BRANCH ON INDEX, provide for one-instruction execution of the most common arithmetic-test combinations.

A program status word (PSW), a double word having the format shown in Figure 7, contains information required for proper execution of a given program. A PSW includes an instruction address, condition code, and several mask and mode fields. The active or controlling PSW is called the current PSW. By storing the current PSW during an interruption, the status of the interrupted program is preserved.

Five classes of interruption conditions are distinguished: input/output, program, supervisor call, external, and machine check. For each class, two PSWs, called old and new, are maintained in the main-storage locations shown in Table 1. An interruption in a given class stores the current PSW as an old PSW and then takes the corresponding new PSW as the current PSW. If, at the conclusion of the
interruption routine, old and current PSWs are interchanged, the system can be restored to its prior state and the interrupted routine can be continued.

The system mask, program mask, and machine-check mask bits in the PSW may be used to control certain interruptions. When masked off, some interruptions remain pending while others are merely ignored. The system mask can keep I/O and external interruptions pending, the program mask can cause four of the 15 program interruptions to be ignored, and the machine-check mask can cause machine-check interruptions to be ignored. Other interruptions cannot be masked off.

Appropriate CPU response to a special condition in the channels and I/O units is facilitated by an I/O interruption. The addresses of the channel and I/O unit involved are recorded in the old PSW. Related information is preserved in a channel status word that is stored as a result of the interruption.

Unusual conditions encountered in a program create program interruptions. Eight of the fifteen possible conditions involve overflows, improper divides, lost significance, and exponent underflow. The remaining seven deal with improper addresses, attempted execution of privileged instructions, and similar conditions.

A supervisor-call interruption results from execution of the instruction SUPERVISOR CALL. Eight bits from the instruction format are placed in the interruption code of the old PSW, permitting a message to be associated with the interruption. SUPERVISOR CALL permits a problem program to switch CPU control back to the supervisor.

Through an external interruption, a CPU can respond to signals from the interruption key on the system control panel, the timer, other CPUs, or special devices. The source of the interruption is identified by an interruption code in bits 24 through 31 of the PSW.

The occurrence of a machine check (if not masked off) terminates the current instruction, initiates a diagnostic procedure, and subsequently effects a machine-check interruption. A
machine check is occasioned only by a hardware malfunction; it cannot be caused by invalid data or instructions.

Table 1. Permanent storage assignments

Address  Byte length  Purpose
0        8            Initial program loading PSW
8        8            Initial program loading CCW 1
16       8            Initial program loading CCW 2
24       8            External old PSW
32       8            Supervisor call old PSW
40       8            Program old PSW
48       8            Machine check old PSW
56       8            Input/output old PSW
64       8            Channel status word
72       4            Channel address word
76       4            Unused
80       4            Timer
84       4            Unused
88       8            External new PSW
96       8            Supervisor call new PSW
104      8            Program new PSW
112      8            Machine check new PSW
120      8            Input/output new PSW
128      *            Diagnostic scan-out area

* The size of the diagnostic scan-out area is configuration dependent.

Interruption requests are honored between instruction executions. When several requests occur during execution of an instruction, they are honored in the following order: (1) machine check, (2) program or supervisor call, (3) external, and (4) input/output. Because the program and supervisor-call interruptions are mutually exclusive, they cannot occur at the same time.

If a machine-check interruption occurs, no other interruptions can be taken until this interruption is fully processed. Otherwise, the execution of the CPU program is delayed while PSWs are appropriately stored and fetched for each interruption. When the last interruption request has been honored, instruction execution is resumed with the PSW last fetched. An interruption subroutine is then serviced for each interruption in the order (1) input/output, (2) external, and (3) program or supervisor call.

Overall CPU status is determined by four alternatives: (1) stopped versus operating state, (2) running versus waiting state, (3) masked versus interruptable state, and (4) supervisor versus problem state.

In the stopped state, which is entered and left by manual procedure, instructions are not executed, interruptions are not accepted, and the timer is not updated. In the operating
state, the CPU is capable of executing instructions and of being interrupted.

In the running state, instruction fetching and execution proceeds in the normal manner. The wait state is typically entered by the program to await an interruption, for example, an I/O interruption or operator intervention from the console. In the wait state, no instructions are processed, the timer is updated, and I/O and external interruptions are accepted unless masked. Running versus waiting is determined by the setting of a bit in the current PSW.

The CPU may be interruptable or masked for the system, program, and machine interruptions. When the CPU is interruptable for a class of interruptions, these interruptions are accepted. When the CPU is masked, the system interruptions remain pending, but the program and machine-check interruptions are ignored. The interruptable states of the CPU are changed by altering mask bits in the current PSW.

In the problem state, processing instructions are valid, but all I/O instructions and a group of control instructions are invalid. In the supervisor state, all instructions are valid. The choice of problem or supervisor state is determined by a bit in the PSW.

Supervisory Facilities

A timer word in main storage location 80 is counted down at a rate of 50 or 60 cycles per second, depending on power-line frequency. The word is treated as a signed integer according to the rules of fixed-point arithmetic. An external interrupt occurs when the value of the timer word goes from positive to negative. The full cycle time of the timer is 15.5 hours.

[Figure 8. Channel status word format. Fields: key, 0000, command address, status, count. Bits 0 through 3 contain the storage-protection key used in the operation. Bits 4 through 7 contain zeros. Bits 8 through 31 specify the location of the last CCW used. Bits 32 through 47 contain an I/O device-status byte and a channel-status byte.]
electrical, logical, and buffering capabilities necessary for I/O device operation. From the programming point of view, most control-unit and I/O device functions are indistinguishable. Sometimes the control unit is housed with an I/O device, as in the case of the printer.

A control unit functions only with those I/O devices for which it is designed, but all control units respond to a standard set of signals from the channel. This control-unit-to-channel connection, called the I/O interface, enables the CPU to handle all I/O operations with only four instructions.

Input/output instructions can be executed only while the CPU is in the supervisor state. The four I/O instructions are START I/O, HALT I/O, TEST CHANNEL, and TEST I/O.

START I/O initiates an I/O operation; its address field specifies a channel and an I/O device. If the channel facilities are free, the instruction is accepted and the CPU continues its program. The channel independently selects the specified I/O device. HALT I/O terminates a channel operation. TEST CHANNEL sets the condition code in the PSW to indicate the state of the channel addressed by the instruction. The code then indicates one of the following conditions: channel available, interruption condition in channel, channel working, or channel not operational. TEST I/O sets the PSW condition code to indicate the state of the addressed channel, subchannel, and I/O device.

Channels provide the data path and control for I/O devices as they communicate with main storage. In the multiplexor channel, the single data path can be time-shared by several low-speed devices (card readers, punches, printers, terminals, etc.), and the channel has the functional character of many subchannels, each of which services one I/O device at a time. On the other hand, the selector channel, which is designed for high-speed devices, has the functional character of a single subchannel. All subchannels respond to the same I/O instructions. Each can fetch its own control word sequence, govern the transfer of data and control signals,
count record lengths, and interrupt the CPU on exceptions.

Two modes of operation, burst and multiplex, are provided for multiplexor channels. In burst mode, the channel facilities are monopolized for the duration of data transfer to or from a particular I/O device. The selector channel functions only in the burst mode. In multiplex mode, the multiplexor channel sustains several simultaneous I/O operations: bytes of data are interleaved and then routed between selected I/O devices and desired locations in main storage.

[Figure 9. Channel command word format. Fields: command code (bits 0-7), data address (bits 8-31), flags (bits 32-36), count (bits 48-63). Bits 0 through 7 specify the command code. Bits 8 through 31 specify the location of a byte in main storage. Bits 32 through 36 are flag bits: bit 32 causes the address portion of the next CCW to be used; bit 33 causes the command code and data address in the next CCW to be used; bit 34 causes a possible incorrect-length indication to be suppressed; bit 35 suppresses the transfer of information to main storage; bit 36 causes an interruption. Bits 37 through 39 must contain zeros. Bits 40 through 47 are ignored.]

At the conclusion of an operation launched by START I/O or TEST I/O, an I/O interruption occurs. At this time, a channel status word (CSW) is stored in location 64. Figure 8 shows the CSW format. The CSW provides information about the termination of the I/O operation.

Successful execution of START I/O causes the channel to fetch a channel address word from main-storage location 72. This word specifies the storage-protection key that governs the I/O operation, as well as the location of the first eight bytes of information that the channel fetches from main storage. These 64 bits comprise a channel command word (CCW). Figure 9 shows the CCW format.

One or more CCWs make up the channel program that directs channel operations. Each CCW points to the next one to be fetched, except for the last in the chain, which so identifies itself.

Six channel commands are provided: read,
write, read backward, sense, transfer in channel, and control. The read command defines an area in main storage and causes a read operation from the selected I/O device. The write command causes data to be written by the selected device. The read-backward command is akin to the read command, but the external medium is moved in the opposite direction and bytes read backward are placed in descending main-storage locations.

The control command contains information, called an order, that is used to control the selected I/O device. Orders, peculiar to the particular I/O device in use, can specify such functions as rewinding a tape unit, searching for a particular track in disk storage, or line skipping on a printer. In a functional sense, the CPU executes I/O instructions, the channels execute commands, and the control units and devices execute orders.

The sense command specifies a main-storage location and transfers one or more bytes of status information from the selected control unit. It provides details concerning the selected I/O device, such as a stacker-full condition of a card reader or a file-protected condition of a magnetic-tape reel.

A channel program normally obtains CCWs from a consecutive string of storage locations. The string can be broken by a transfer-in-channel command that specifies the location of the next CCW to be used by the channel. External documents, such as punched cards or magnetic tape, may carry CCWs that can be used by the channel to govern the reading of the documents.

Table 2. SYSTEM/360 instructions

Format  Operation code  Class
RR      0000xxxx        Branching and status switching
RR      0001xxxx        Fixed-point fullword and logical
RR      0010xxxx        Floating-point long
RR      0011xxxx        Floating-point short
RX      0100xxxx        Fixed-point halfword and branching
RX      0101xxxx        Fixed-point fullword and logical
RX      0110xxxx        Floating-point long
RX      0111xxxx        Floating-point short
RS, SI  1000xxxx        Branching, status switching, and shifting
RS, SI  1001xxxx        Fixed-point logical and input/output
SS      1101xxxx        Logical
SS      1111xxxx        Decimal

[The individual instruction names and mnemonics listed under each class are not legible in this copy.]

The input/output interruptions caused by termination of an I/O operation or by operator intervention at the I/O device enable the CPU to provide appropriate programmed response to conditions as they occur in I/O devices or channels. Conditions responsible for I/O interruption requests are preserved in the I/O devices or channels until recognized by the CPU.

During execution of START I/O, a command can be rejected by a busy condition, program check, etc. Rejection is indicated in the condition code of the PSW, and additional detail on the conditions that precluded initiation of the I/O operation is provided in a CSW.

The need for manual control is minimal because of the design of the system and supervisory program. A control panel provides the ability to reset the system; store and display information in main storage, in registers, and in the PSW; and load initial program information. After an input device is selected with the load unit switches, depressing a load key causes a read from the selected input device. The six words of information that are read into main storage provide the PSW and the CCWs required for subsequent operation.

The SYSTEM/360 instructions, classified by format and function, are displayed in Table 2. Operation codes and mnemonic abbreviations are also shown. With the previously described formats in mind, much of the generality provided by the system is apparent in this listing.

Summary

In the SYSTEM/360 logical structure, processing efficiency and versatility are served by multiple accumulators, binary addressing, bit-manipulation operations, automatic indexing, fixed and variable field lengths, decimal and hexadecimal radices, and floating-point as well as fixed-point arithmetic. The provisions for program interruption, storage protection, and flexible CPU states contribute to effective operation. Base-register addressing, the standard interface between channels and input/output control units, and the machine-language compatibility among models contribute to flexible configurations and to orderly system expansion.

FOOTNOTE

1. A seventh embodiment, the Model 92, is not discussed in this paper. This model does not provide decimal data handling and has a few minor differences arising from its highly concurrent, speed-oriented organization. A paper on Model 92 is planned for future publication in the IBM Systems Journal.
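
As an illustration of the address-formation rule given in the discussion of instruction formats (E = B + X + D, added as 24-bit numbers with overflow ignored, and a zero designator meaning a zero quantity), the following Python sketch may help. It is not part of the original paper: the function name effective_address and the list-based model of the general registers are assumptions made only for this example.

```python
MASK_24 = (1 << 24) - 1  # effective addresses are 24-bit binary integers


def effective_address(regs, b, x, d):
    """Form E = B + X + D as 24-bit binary numbers, ignoring overflow.

    regs: 16 general-register values (hypothetical model of the register file)
    b, x: 4-bit base and index designators; 0 means "use a zero quantity",
          regardless of the contents of general register 0
    d:    12-bit displacement (0..4095)
    """
    if not (0 <= b <= 15 and 0 <= x <= 15):
        raise ValueError("designators are 4-bit fields")
    if not 0 <= d <= 0xFFF:
        raise ValueError("displacement is a 12-bit positive integer")
    base = regs[b] & MASK_24 if b else 0
    index = regs[x] & MASK_24 if x else 0
    return (base + index + d) & MASK_24


# Register 0 holds an arbitrary value to show that designator 0 ignores it.
regs = [0xDEADBEEF] + [0] * 15
regs[1] = 0x001000   # base: start of an array
regs[2] = 0x000010   # index: element offset
regs[3] = 0xFFFFFF   # used to show that overflow is ignored

print(hex(effective_address(regs, 1, 2, 0x004)))  # 0x1014
print(hex(effective_address(regs, 0, 0, 5)))      # 0x5
print(hex(effective_address(regs, 3, 0, 1)))      # 0x0 (wraps around)
```

For formats without an index field, the same function can be called with x set to 0.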

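
The short floating-point format described earlier (sign in bit 0, excess-64 characteristic in bits 1 through 7, six hexadecimal fraction digits with the radix point to the left of the high-order digit) can be sketched in Python as follows. This is only an illustrative decoder under those stated rules; the name decode_short and the use of a Python integer to hold the 32-bit word are assumptions of the example, not something the paper defines.

```python
def decode_short(word):
    """Decode a 32-bit short-format floating-point word into a Python float.

    Bit 0 (the leftmost bit) is the fraction sign, bits 1-7 hold the
    characteristic as an excess-64 number, and bits 8-31 hold six
    hexadecimal fraction digits. The fraction is carried in true form.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit word")
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F
    fraction = (word & 0xFFFFFF) / 16 ** 6   # radix point left of first digit
    return sign * fraction * 16.0 ** (characteristic - 64)


print(decode_short(0x41100000))  # 1.0    (fraction 1/16, times 16**1)
print(decode_short(0xC1100000))  # -1.0   (same magnitude, sign bit set)
print(decode_short(0x42640000))  # 100.0  (fraction 100/256, times 16**2)
```

The long format differs only in carrying 14 fraction digits, so the same approach applies with a 64-bit word and a divisor of 16 ** 14.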

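
The packed decimal representation described earlier (two digits per byte, sign in the rightmost four bits of the low-order byte, 1011 and 1101 meaning minus and the other four sign codes meaning plus) can be sketched as below. The function names pack and unpack are hypothetical, and the choice to generate 1100 and 1101 follows the EBCDIC preference mentioned in the text; the sketch does not model the mode bit.

```python
PLUS, MINUS = 0xC, 0xD          # sign patterns generated for EBCDIC
MINUS_CODES = {0xB, 0xD}        # 1011 and 1101 are read as minus


def pack(value):
    """Pack an integer into decimal digits, two per byte, with trailing sign."""
    digits = [int(c) for c in str(abs(value))]
    if len(digits) > 31:
        raise ValueError("fields hold at most 31 digits and sign")
    nibbles = digits + [MINUS if value < 0 else PLUS]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)    # pad with a leading zero digit to fill a byte
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack(field):
    """Recover the integer from a packed field; magnitude is in true form."""
    nibbles = []
    for byte in field:
        nibbles += [byte >> 4, byte & 0xF]
    *digits, sign = nibbles
    if sign < 0xA or any(d > 9 for d in digits):
        raise ValueError("invalid digit or sign code")
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in MINUS_CODES else magnitude


print(pack(1234).hex())              # 01234c
print(pack(-123).hex())              # 123d
print(unpack(bytes([0x12, 0x3B])))   # -123 (1011 is also a minus sign)
```

Because every byte except the last carries two digits, a purely decimal field moves roughly twice as many digits per byte as an unpacked one, which is the effect behind the tape-drive figure quoted in the text.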