This Study Guide was uploaded by Usman Qureshi on Wednesday, November 18, 2015. The Study Guide belongs to a course at a university taught by a professor in Fall. Since its upload, it has received 18 views.
Date Created: 11/18/15
Computer Organization & Architecture
Chapter 2: Computer Evolution & Performance
29 Zil-Hijj 1436, Wednesday 14 October 2015

The Early Days
- Counting with pebbles, 5000 years ago
- Several inventions of counting machines
- 2000 BC: Chinese Abacus
- 1620: Napier's Bones (John Napier)
- 1653: Pascaline (Blaise Pascal)
- 1673: Leibniz's calculating machine (von Leibniz)
- 1823: Mechanical Calculator Machine (Charles Babbage)
- 1941: Mark 1 (Harvard University)

First Generation: Vacuum Tubes
- ENIAC (Electronic Numerical Integrator And Computer), University of Pennsylvania
- Weighed 30 tons, occupied 1500 square feet of floor space, and contained more than 18,000 vacuum tubes
- When operating, it consumed 140 kilowatts of power
- It was also substantially faster than any electromechanical computer, capable of 5000 additions per second
- The ENIAC was a decimal rather than a binary machine

Instruction Set of von Neumann Machine
- Total of 21 instructions of the following types:
  - Data transfer
  - Unconditional branching (jump)
  - Conditional branching (compare)
  - Arithmetic
  - Address modify
- Instruction format: 20 bits
  - Opcode: 8 bits
  - Address: 12 bits
- Opcode: which operation to perform
- Address: memory address of data

IAS Computer Machine Language
- 40-bit word: two machine instructions per word

  Left instruction                      Right instruction
  bits 0-7       bits 8-19              bits 20-27     bits 28-39
  8-bit opcode   12-bit memory address  8-bit opcode   12-bit memory address
                 (operand)                             (operand)

IAS Architecture
[Figure: Structure of IAS Computer. Central Processing Unit (arithmetic-logic unit and program control unit), main memory, and I/O equipment]

Registers
- Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
- Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR
- Instruction register (IR): Contains the 8-bit opcode of the instruction being executed
- Instruction buffer register (IBR): Employed to hold temporarily the right-hand instruction from a word in memory
- Program counter (PC): Contains the address of the next instruction pair to be fetched from memory
- Accumulator (AC) and multiplier quotient (MQ): Hold operands and results of ALU operations. For example, the result of multiplying two 40-bit numbers is an 80-bit number; the most significant 40 bits are stored in the AC and the least significant in the MQ

Commercial Computers
- 1947: Eckert-Mauchly Computer Corporation
- UNIVAC I (Universal Automatic Computer)
- US Bureau of Census 1950 calculations
- Became part of Sperry-Rand Corporation
- First successful commercial computer
- Late 1950s: UNIVAC II
  - Faster
  - More memory

IBM (International Business Machines)
- Punched-card processing equipment
- 1953: the 701
  - IBM's first stored-program computer
  - Scientific calculations
- 1955: the 702
  - Business applications
- Led to the 700/7000 series

First Ever Computer Hard Disk
- The first ever computer hard disk was designed in the early 1950s
- It weighed about a ton
- It had just 5 MB of storage capacity

2nd Generation Computers: Transistors
- Replaced vacuum tubes
- Smaller
- Cheaper
- Less heat dissipation
- Solid-state device
- Made from silicon
- Invented in 1947 at Bell Labs
- William Shockley et al.

Transistor Based Computers
- Second generation machines
- NCR & RCA produced small transistor machines
- IBM 7000
- DEC (1957): produced PDP-1

[Figure: IBM 7094 (1959; 2,900,000)]

Third Generation: IC
- Microelectronics
- Literally "small electronics"
- Computer is made up of gates, memory cells, and interconnections
- Manufactured with semiconductor material
- e.g. silicon wafer

Digital Comp. Gates and Memory Cells
- Gates: implement Boolean logic functions
- Memory cells: devices that can store one bit of data

[Figure: a gate (Boolean logic function) and a memory cell (binary storage cell), each with input and output]

Wafer, Chip and Gate
[Figure: Relationship among wafer, chips, and gates]

Generations of Computer
- Vacuum tube: 1946-1957
- Transistor: 1958-1964
- Small scale integration (SSI): 1965 on; up to 100 devices on a chip
- Medium scale
  integration (MSI): to 1971; 100-3,000 devices on a chip
- Large scale integration (LSI): 1972-1977; 3,000-100,000 devices on a chip
- Very large scale integration (VLSI): 1978-1991; 100,000-100,000,000 devices on a chip
- Ultra large scale integration: 1991 on; over 100,000,000 devices on a chip

Computer Generations

  Generation  Approximate Dates  Technology                          Typical Speed (operations per second)
  1           1946-1957          Vacuum tube                         40,000
  2           1958-1964          Transistor                          200,000
  3           1965-1971          Small and medium scale integration  1,000,000
  4           1972-1977          Large scale integration             10,000,000
  5           1978-1991          Very large scale integration        100,000,000
  6           1991-              Ultra large scale integration       1,000,000,000

Moore's Law
- Density of components on chip increasing
- Gordon Moore, cofounder of Intel
- Number of transistors on a chip will double every year
- Since the 1970s development has slowed a little
  - Number of transistors doubles every 18 months
- Cost of a chip has remained almost unchanged
- Higher packing density means shorter electrical paths, giving higher performance
- Smaller size gives increased flexibility
- Reduced power and cooling requirements
- Fewer interconnections increase reliability

[Figure: Growth in CPU Transistor Count (transistors per chip, by year)]

IBM 360 series
- 1964
- Replaced (& not compatible with) the 7000 series
- First planned family of computers
  - Similar or identical instruction sets
  - Similar or identical OS
  - Increasing speed
  - Increasing number of I/O ports (i.e. more terminals)
  - Increased memory size
  - Increased cost
- Multiplexed switch structure

[Figures: IBM System/360 Model 30 console; IBM 2311 disk drive; IBM System/360 tape drives]

DEC PDP-8
- 1964
- First minicomputer
- Did not need an air-conditioned room
- Small enough to sit on a lab bench
- Price $16,000
  - Compare with the IBM 360 at $100,000
- Embedded applications
- BUS STRUCTURE

DEC PDP-8 Bus Structure
[Figure: console controller, CPU, main memory, and I/O modules attached to the Omnibus]

Intel
- 1971: 4004
  - First microprocessor
  - All CPU components on a single chip
  - 4 bit
- Followed in 1972 by 8008
  - 8 bit
  - Both designed for specific applications
- 1974: 8080
  - Intel's first general purpose microprocessor

Speeding it up
- Pipelining
- On-board cache
- On-board L1 & L2 cache
- Branch prediction: increases the amount of work available for the CPU to execute
- Data flow analysis: prevents unnecessary delays
- Speculative execution: keeps execution engines as busy as possible

Pipelining
- Pipelining is one form of embedding parallelism or concurrency in a computer system. It refers to a segmentation of a computational process (say, an instruction) into several sub-processes, which are executed by dedicated autonomous units (facilities, pipelining segments).

Pipelining
- A technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed
- A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages
- With pipelining, the computer architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction operation can be performed

Performance Gap
- Processor speed increased
- Memory capacity increased
- Memory speed lags behind processor speed

[Figure: Logic and Memory Performance Gap]

Solutions
- Increase number of bits retrieved at one time
  - Make DRAM "wider" rather than "deeper"
- Change DRAM interface
  - Cache
- Reduce frequency of memory access
  - More complex cache and cache on chip
- Increase interconnection bandwidth
  - High-speed buses
  - Hierarchy of buses

I/O Devices
- Peripherals with intensive I/O demands
- Large data throughput demands
- Processors can handle this demand
- Problem is how to move the data
- Solutions:
  - Caching
  - Buffering
  - Higher-speed interconnection buses
  - More elaborate bus structures
  - Multiple-processor configurations

[Figure: Typical I/O Device Data Rates]
Data rate (bps) for the devices shown in the figure: Gigabit Ethernet, graphics display, hard disk, Ethernet, optical disk, scanner, laser printer, floppy disk, mouse, keyboard

Chip Organization and Architecture
- Increased hardware speed of processor
  - Fundamentally due to shrinking logic gate size
    - More gates, packed more tightly, increasing clock rate
    - Propagation time for signals reduced
- Increased size and speed of caches
  - Dedicated part of processor chip
    - Cache access time dropped significantly
- Change processor organization and architecture
  - Increased effective speed of execution
  - Parallelism

Clock Speed and Logic Density
- Power
  - Power density increases with density of logic and clock speed
  - Dissipating heat
- RC delay
  - Speed at which electrons flow is limited by the resistance (R) and capacitance (C) of the metal wires connecting them
  - Delay increases as the RC product increases
  - Wire interconnects thinner, increasing resistance
  - Wires closer together, increasing capacitance
- Memory latency
  - Memory speeds lag processor speeds
- Solution:
  - More emphasis on organizational and architectural approaches

[Figure: Intel Microprocessor Performance. Theoretical maximum performance (million operations per second) by year, compared with improvements in chip architecture and increases in clock speed]

Increased Cache Capacity
- Typically two or three levels of cache between processor and main memory
- Chip density increased
  - More cache memory on chip
    - Faster cache access
- Pentium chip devoted about 10% of chip area to cache
- Pentium 4 devotes about 50%

More Complex Execution Logic
- Enable parallel execution of instructions
- Pipeline works like an assembly line
  - Different stages of execution of
    different instructions proceed at the same time along the pipeline
- Superscalar allows multiple pipelines within a single processor
  - Instructions that do not depend on one another can be executed in parallel

Diminishing Returns
- Tendency to decline in effectiveness of a continuing application of input after a certain level of result has been achieved
- Internal organization of processors is complex
  - Can get a great deal of parallelism
  - Further significant increases likely to be relatively modest
- Benefits from cache are reaching a limit
- Increasing clock rate runs into the power dissipation problem
  - Some fundamental physical limits are being reached

New Approach: Multiple Cores
- Multiple processors on a single chip
  - Large shared cache
- Within a processor, the increase in performance is proportional to the square root of the increase in complexity
- If software can use multiple processors, doubling the number of processors almost doubles performance
- So, use two simpler processors on the chip rather than one more complex processor
- With two processors, larger caches are justified
  - Power consumption of memory logic is less than that of processing logic
- Example: IBM POWER4
  - Two cores based on PowerPC

[Figure: POWER4 Chip Organization (two processor cores, core interface unit switch, L2 cache, fabric controller, L3 controller)]

Pentium Evolution (1)
- 8080
  - First general purpose microprocessor
  - 2 MHz
  - 8-bit data path
  - Used in the first personal computer, the Altair
- 8086
  - Much more powerful
  - 16 bit
  - Instruction cache, prefetches a few instructions
  - 8088 (8-bit external bus) used in the first IBM PC
- 80286
  - 16 Mbyte memory addressable
  - Up from 1 Mbyte
- 80386
  - 12 MHz to 40 MHz
  - 32 bit
  - Support for multitasking

Pentium Evolution (2)
- 80486 (1989)
  - 16 to 100 MHz
  - Powerful cache and
    instruction pipelining
  - Built-in maths coprocessor
- Pentium
  - Superscalar
  - Multiple instructions executed in parallel
- Pentium Pro
  - Increased superscalar organization
  - Aggressive register renaming
  - Branch prediction
  - Data flow analysis
  - Speculative execution

Pentium Evolution (3)
- Pentium II
  - MMX technology
  - Graphics, video & audio processing separately
- Pentium III
  - Additional floating point instructions for 3D graphics
- Pentium 4 (2000-2008)
  - 1.3 GHz to 3.8 GHz
  - Further floating point and multimedia enhancements
- Itanium (2001)
  - 64 bit
  - 733-800 MHz
- Itanium 2 (2002)
  - 900 MHz to 1.6 GHz
  - Up to 9 MB cache

Intel Xeon Processors
- Pentium II Xeon
  - Instruction set: MMX (Multimedia Extension)
  - 400-450 MHz, 512 KB-2 MB L2 cache, 2.0 V
  - 1998-99
- Pentium III Xeon
  - MMX, SSE (Streaming SIMD Extension; SIMD: Single Instruction Multiple Data)
  - 500-1000 MHz, 256 KB-2 MB L2 cache, 212 V
  - 1999-00
- Xeon (NetBurst Microarchitecture)
  - MMX, SSE, SSE2
  - 1.4-3.0 GHz, 256-512 KB L2 cache, 1.5-1.75 V
  - 2001-06
- Xeon (Enhanced Pentium-M Microarchitecture)
  - SSE2, SSE3, XD bit, Intel VT-x
  - 2006
- Xeon (Core Microarchitecture)
  - SSE2, SSE3, IA64, XD bit, Intel VT-x
  - 2006-07

Intel Core Processors
[Figure: two CPU cores with L1 caches, a bus interface with L2 caches, and backside and frontside connections]
- Intel Core 2 (2006-07)
  - 2 cores on 1 die
  - 1.0-3.3 GHz
  - Core 2 Duo, Core 2 Extreme, Core 2 Quad, Pentium Dual Core
- Core i3 (2010-11)
  - 2 physical cores, 4 threads, GPU
  - 2.5-3.4 GHz
  - 32+32 KB/core L1, 256 KB/core L2, 3 MB L3 cache
- Core i5 (2010-11)
  - 4 physical cores, 4 threads, GPU
  - 2.3-3.4 GHz
  - 32+32 KB/core L1, 256 KB/core L2, 6 MB L3 cache
- Core i7 (2010-11)
  - Up to 8 physical cores, 16 threads
  - 2.8-3.5 GHz
  - 32+32 KB/core L1, 256 KB/core L2, up to 20 MB L3

PowerPC
- 1975: 801 minicomputer project (IBM), RISC
- Berkeley RISC I processor
- 1986: IBM commercial RISC workstation product, RT PC
  - Not a commercial success
  - Many rivals with comparable or better performance
- 1990: IBM RISC System/6000
  - RISC-like superscalar machine
  - POWER architecture
- IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh)
- Result is the PowerPC architecture
  - Derived from the POWER architecture
  - Superscalar RISC
  - Apple Macintosh
  - Embedded chip applications

PowerPC Family (1)
- 601
  - Quickly to market; 32-bit machine
- 603
  - Low-end desktop and portable
  - 32-bit
  - Comparable performance with 601
  - Lower cost and more efficient implementation
- 604
  - Desktop and low-end servers
  - 32-bit machine
  - Much more advanced superscalar design
  - Greater performance
- 620
  - High-end servers
  - 64-bit architecture

PowerPC Family (2)
- 740/750
  - Also known as G3
  - Two levels of cache on chip
- G4
  - Increases parallelism and internal speed
- Power 6 (2007)
  - 3.5-5 GHz, 2 cores, 65 nm
- Power 7 (2010)
  - 2.4-4.25 GHz, 4/6/8 cores, 45 nm

IBM Blue Gene/P Super Computer
- Over 250,000 processors grouped in 72 racks, connected with optical fiber
[Figure: IBM Blue Gene/P Super Computer]

Computation Power
- FLOPS: FLoating point Operations Per Second
[Figure: computation power (FLOPS) by year]

Top 500 Super Computers
[Figure: Share of Top 500 supercomputers by country]

SuperComputer in Pakistan
- National University of Sciences and Technology (NUST), Islamabad
- Research Center for Modeling and Simulations (RCMS)
- 132 TFlops
- Inaugurated in March 2012

OS Used on Top 500 Super Computers
[Figure: operating systems used on Top 500 supercomputers, by year]

Questions
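
Worked example: IAS word format

The IAS machine language section above packs two 20-bit instructions (8-bit opcode, 12-bit address) into one 40-bit word. The sketch below shows one way to split such a word into its fields. It assumes bit 0 is the leftmost (most significant) bit, so the left instruction occupies the upper 20 bits. The function names and the opcode and address values in the usage note are hypothetical, chosen only for illustration.

```python
# Field widths are taken from the notes; bit 0 is ASSUMED to be the leftmost bit.
WORD_BITS = 40
OPCODE_BITS = 8
ADDRESS_BITS = 12
INSTR_BITS = OPCODE_BITS + ADDRESS_BITS  # 20 bits per instruction


def split_instruction(instr):
    """Split one 20-bit instruction into (opcode, address)."""
    opcode = instr >> ADDRESS_BITS                # upper 8 bits
    address = instr & ((1 << ADDRESS_BITS) - 1)   # lower 12 bits
    return opcode, address


def decode_word(word):
    """Return ((left_opcode, left_address), (right_opcode, right_address))."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")
    left = word >> INSTR_BITS                     # bits 0-19
    right = word & ((1 << INSTR_BITS) - 1)        # bits 20-39
    return split_instruction(left), split_instruction(right)


def encode_word(left, right):
    """Pack two (opcode, address) pairs into one 40-bit word."""
    (left_op, left_addr), (right_op, right_addr) = left, right
    left_instr = (left_op << ADDRESS_BITS) | left_addr
    right_instr = (right_op << ADDRESS_BITS) | right_addr
    return (left_instr << INSTR_BITS) | right_instr
```

For instance, `encode_word((0x01, 0x0FA), (0x05, 0x0FB))` yields the word `0x010FA050FB`, and `decode_word` on that word recovers the two (opcode, address) pairs.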
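
Worked example: pipelining cycle counts

The pipelining notes above describe starting a second instruction before the first has finished. A rough way to see the benefit is to count clock cycles under idealized conditions. This sketch assumes every stage takes exactly one cycle and that there are no stalls, hazards, or branches; these assumptions are not stated in the notes, and the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Idealized pipeline: the first instruction needs n_stages cycles to fill
    the pipeline, then one further instruction completes on every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts (ideal case only)."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

With 10 instructions and 5 stages, the unpipelined count is 50 cycles and the idealized pipelined count is 14. Real processors fall short of this because of dependencies and branches, which is what branch prediction and data flow analysis try to mitigate.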
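
Worked example: one complex core versus two simpler cores

The "New Approach: Multiple Cores" notes state that performance within a processor grows with the square root of the increase in complexity, while doubling the number of processors almost doubles performance if software can use them. The sketch below simply puts numbers on those two statements. The `parallel_efficiency` parameter and its value of 0.95 are assumptions standing in for "almost", not figures from the notes, and the function names are hypothetical.

```python
import math


def single_core_speedup(complexity_factor):
    """Square-root rule from the notes: performance scales with the square
    root of the increase in processor complexity."""
    return math.sqrt(complexity_factor)


def multi_core_speedup(core_count, parallel_efficiency=1.0):
    """Speedup from using several simple cores. parallel_efficiency (ASSUMED)
    models software that cannot keep every core perfectly busy."""
    return core_count * parallel_efficiency


# Spend twice the complexity budget in two different ways.
one_complex_core = single_core_speedup(2)    # about 1.41x
two_simple_cores = multi_core_speedup(2, 0.95)  # about 1.9x with the assumed efficiency
```

Under these assumptions the two simpler cores come out ahead, which matches the argument in the notes for using two simpler processors rather than one more complex processor.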