This 58 page Study Guide was uploaded by Usman Qureshi on Wednesday November 18, 2015. The Study Guide belongs to a course at a university taught by a professor in Fall. Since its upload, it has received 19 views.
Date Created: 11/18/15
Computer Organization & Architecture
Chapter 2: Computer Evolution & Performance
29 ZilHijj 1436, Wednesday, 14 October 2015

The Early Days…
• Counting with pebbles (5000 years ago)
• Several inventions of counting machines
• 2000 BC Chinese Abacus
• 1617 Napier’s Bones (John Napier)
• 1642 Pascaline (Blaise Pascal)
• 1673 Leibniz’s calculating machine (Gottfried von Leibniz)
• 1823 Mechanical Calculator Machine (Charles Babbage)
• 1944 Mark I (Harvard University)

First Generation (Vacuum Tubes)
• ENIAC (Electronic Numerical Integrator And Computer), University of Pennsylvania
—Weighed 30 tons, occupied 1500 square feet of floor space, and contained more than 18,000 vacuum tubes.
—When operating, it consumed 140 kilowatts of power.
—Substantially faster than any electro-mechanical computer: capable of 5000 additions per second.
• The ENIAC was a decimal rather than a binary machine.

Instruction Set of von Neumann Machine
• Total of 21 instructions of the following types
—Data transfer
—Unconditional branching (jump)
—Conditional branching (compare)
—Arithmetic
—Address modify
• Instruction format (20 bits)
—Opcode (8 bits) + Address (12 bits)
—Opcode = which operation to perform
—Address = memory address of data

IAS Computer Machine Language
• 40-bit word, two machine instructions per word.
• Left instruction: bits 0-7 hold the 8-bit opcode, bits 8-19 hold the 12-bit memory address (operand).
• Right instruction: bits 20-27 hold the 8-bit opcode, bits 28-39 hold the 12-bit memory address (operand).

IAS Architecture
[Figure: Structure of IAS Computer]

Registers
• Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.
• Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.
• Instruction register (IR): Contains the 8-bit opcode of the instruction being executed.
• Instruction buffer register (IBR): Temporarily holds the right-hand instruction from a word in memory.
• Program counter (PC): Contains the address of the next instruction pair to be fetched from memory.
• Accumulator (AC) and multiplier quotient (MQ): Hold operands and results of ALU operations. For example, the result of multiplying two 40-bit numbers is an 80-bit number; the most significant 40 bits are stored in the AC and the least significant 40 bits in the MQ.

Commercial Computers
• 1947 - Eckert-Mauchly Computer Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• First successful commercial computer
• Late 1950s - UNIVAC II
—Faster
—More memory

IBM (International Business Machines)
• Punched-card processing equipment
• 1953 - the 701
—IBM’s first stored program computer
—Scientific calculations
• 1955 - the 702
—Business applications
• Led to 700/7000 series

First Ever Computer Hard Disk
• The first computer hard disk
—Introduced in the mid-1950s
—Weighed about a ton
—Stored just 5 MB

2nd Generation Computers (Transistors)
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid-state device
• Made from silicon
• Invented in 1947 at Bell Labs
• William Shockley et al.

Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor machines
• IBM 7000
• DEC - 1957
—Produced PDP-1
[Figure: IBM 7094, 1959, $2,900,000]

Third Generation (IC)
• Microelectronics
• Literally “small electronics”
• Computer is made up of gates, memory cells and interconnections
• Manufactured with semiconductor material
• e.g. silicon wafer
• Digital computer
= Gates and Memory Cells
—Gates: implement Boolean / logic functions
—Memory cells: devices that can store one bit of data

Wafer, Chip and Gate
[Figure: Relationship among wafer, chips and gates]

Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration (SSI) - 1965 on
—Up to 100 devices on a chip
• Medium scale integration (MSI) - to 1971
—100-3,000 devices on a chip
• Large scale integration (LSI) - 1972-1977
—3,000-100,000 devices on a chip
• Very large scale integration (VLSI) - 1978-1991
—100,000-100,000,000 devices on a chip
• Ultra large scale integration (ULSI) - 1991 on
—Over 100,000,000 devices on a chip
[Figure: Computer Generations]

Moore’s Law
• Density of components on chip increasing
• Gordon Moore, co-founder of Intel
• Number of transistors on a chip will double every year
—Since the 1970s development has slowed a little
—Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
[Figure: Growth in CPU Transistor Count]

IBM 360 series
• 1964
• Replaced (& not compatible with) 7000 series
• First planned “family” of computers
—Similar or identical instruction sets
—Similar or identical O/S
—Increasing speed
—Increasing number of I/O ports (i.e.
more terminals)
—Increased memory size
—Increased cost
• Multiplexed switch structure
[Figures: IBM System/360 Model 30 console; IBM 2311 disk drive; IBM System/360 tape drives]

DEC PDP-8
• 1964
• First minicomputer
• Did not need an air-conditioned room
• Small enough to sit on a lab bench
• Price: $16,000
—Compare with IBM 360 at $100,000+
• Embedded applications
• Bus structure
[Figure: DEC PDP-8 Bus Structure]

Intel
• 1971 - 4004
—First microprocessor
—All CPU components on a single chip
—4 bit
• Followed in 1972 by 8008
—8 bit
—Both designed for specific applications
• 1974 - 8080
—Intel’s first general purpose microprocessor

Speeding it up
• Pipelining
• On board cache
—On board L1 & L2 cache
• Branch prediction (increases amount of work available for CPU to execute)
• Data flow analysis (prevents unnecessary delays)
• Speculative execution (keeps execution engines as busy as possible)

Pipelining
• Pipelining is one form of embedding parallelism or concurrency in a computer system. It refers to the segmentation of a computational process (say, an instruction) into several subprocesses, which are executed by dedicated autonomous units (facilities, pipelining segments).
• A technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed.
• A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.
• With pipelining, the computer architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction's operation can be performed.
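The overlap described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming an idealized pipeline in which every stage takes exactly one clock cycle and there are no stalls or branches; these assumptions, and the function names, are illustrative and not part of the slides. Under them, instruction i occupies stage s during cycle i + s, so n instructions on a k-stage pipeline finish in k + (n - 1) cycles instead of n × k.

```python
from typing import List, Optional


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction passes through all stages alone."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed on an ideal pipeline (one cycle per stage, no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to drain through the
    # pipeline; each later instruction completes one cycle after the previous.
    return n_stages + (n_instructions - 1)


def schedule(n_instructions: int, n_stages: int) -> List[List[Optional[int]]]:
    """For each cycle, list which instruction (or None) occupies each stage."""
    table = []
    for cycle in range(pipelined_cycles(n_instructions, n_stages)):
        row = []
        for stage in range(n_stages):
            instr = cycle - stage  # instruction i is in stage s at cycle i + s
            row.append(instr if 0 <= instr < n_instructions else None)
        table.append(row)
    return table


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    for cycle, row in enumerate(schedule(4, 3)):
        print(cycle, row)
    print("speedup:", speedup(4, 3))
```

With 4 instructions and 3 stages the ideal pipeline needs 6 cycles rather than 12. As the number of instructions grows, the speedup approaches the number of stages; real pipelines fall short of this because of hazards and unequal stage times.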
Performance Gap
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed
[Figure: Logic and Memory Performance Gap]

Solutions
• Increase number of bits retrieved at one time
—Make DRAM “wider” rather than “deeper”
• Change DRAM interface
—Cache
• Reduce frequency of memory access
—More complex cache and cache on chip
• Increase interconnection bandwidth
—High speed buses
—Hierarchy of buses

I/O Devices
• Peripherals with intensive I/O demands
—Large data throughput demands
—Processors can handle this demand
—The problem is how to move the data
• Solutions:
—Caching
—Buffering
—Higher-speed interconnection buses
—More elaborate bus structures
—Multiple-processor configurations
[Figure: Typical I/O Device Data Rates]

Chip Organization and Architecture
• Increased hardware speed of processor
—Fundamentally due to shrinking logic gate size
– More gates, packed more tightly
– Increasing clock rate
– Propagation time for signals reduced
• Increased size and speed of caches
—Dedicated part of processor chip
– Cache access time dropped significantly
• Change processor organization and architecture
—Increased effective speed of execution
—Parallelism

Clock Speed and Logic Density
• Power
—Power density increases with density of logic and clock speed
—Dissipating heat
• RC delay
—Speed at which electrons flow is limited by resistance (R) and capacitance (C) of the metal wires connecting them
—Delay increases as RC product increases
—Wire interconnects thinner, increasing resistance
—Wires closer together, increasing capacitance
• Memory latency
—Memory speeds lag processor speeds
• Solution: More emphasis on organizational and architectural approaches
[Figure: Intel Microprocessor Performance]

Increased Cache Capacity
• Typically two or three levels of cache between processor and main memory
• Chip density increased
—More cache memory on chip
– Faster cache access
• Pentium chip devoted about 10% of chip area to cache
• Pentium 4 devotes about 50%

More Complex
Execution Logic
• Enable parallel execution of instructions
• Pipeline works like an assembly line
—Different stages of execution of different instructions at the same time along the pipeline
• Superscalar allows multiple pipelines within a single processor
—Instructions that do not depend on one another can be executed in parallel

Diminishing Returns
• Tendency to decline in effectiveness of a continuing application of input after a certain level of result has been achieved
• Internal organization of processors is complex
—Can get a great deal of parallelism
—Further significant increases likely to be relatively modest
• Benefits from cache are reaching a limit
• Increasing clock rate runs into power dissipation problem
—Some fundamental physical limits are being reached

New Approach – Multiple Cores
• Multiple processors on a single chip
—Large shared cache
• Within a processor, increase in performance is proportional to the square root of the increase in complexity
• If software can use multiple processors, doubling the number of processors almost doubles performance
—Use two simpler processors on the chip rather than one more complex processor
• With two processors, larger caches are justified
—Power consumption of memory logic is less than that of processing logic
• Example: IBM POWER4
—Two cores based on PowerPC
[Figure: POWER4 Chip Organization]

Pentium Evolution (1)
• 8080
—First general purpose microprocessor
—2 MHz, 8 bit data path
—Used in first personal computer – Altair
• 8086
—Much more powerful
—16 bit
—Instruction cache prefetches a few instructions
—8088 (8 bit external bus) used in first IBM PC
• 80286
—16 MB memory addressable
—Up from 1 MB
• 80386
—12 MHz to 40 MHz, 32 bit
—Support for multitasking

Pentium Evolution (2)
• 80486 (1989)
—16 to 100 MHz
—Powerful cache and instruction pipelining
—Built-in maths co-processor
• Pentium
—Superscalar
—Multiple instructions executed in parallel
• Pentium Pro
—Increased superscalar organization
—Aggressive register renaming
—Branch prediction
—data flow
analysis
—Speculative execution

Pentium Evolution (3)
• Pentium II
—MMX technology
—Graphics, video & audio processing separately
• Pentium III
—Additional floating point instructions for 3D graphics
• Pentium 4 (2000-2008)
—1.3 GHz to 3.8 GHz
—Further floating point and multimedia enhancements
• Itanium (2001)
—64 bit, 733-800 MHz
• Itanium 2 (2002)
—900 MHz-1.6 GHz, up to 9 MB cache

Intel Xeon Processors
• Pentium II Xeon
—Instruction set MMX (Multimedia Extension)
—400-450 MHz, 512 KB-2 MB L2 cache, 2.0 V, 1998-99
• Pentium III Xeon
—MMX, SSE (Streaming SIMD Extension)
—SIMD = Single Instruction Multiple Data
—500-1000 MHz, 256 KB-2 MB L2 cache, 2-12 V, 1999-00
• Xeon (NetBurst Microarchitecture)
—MMX, SSE, SSE2
—1.4-3.0 GHz, 256-512 KB L2 cache, 1.5-1.75 V, 2001-06
• Xeon (Enhanced Pentium-M Microarchitecture)
—SSE2, SSE3, XD bit, Intel VT-x, 2006
• Xeon (Core Microarchitecture)
—SSE2, SSE3, Intel 64, XD bit, Intel VT-x, 2006-07

Intel Core Processors
• Intel Core 2 (2006-07)
—2 cores on 1 die, 1.0-3.3 GHz
—Core 2 Duo, Core 2 Extreme, Core 2 Quad
—Pentium Dual Core
• Core i3 (2010-11)
—2 physical cores, 4 threads, GPU, 2.5-3.4 GHz
—32+32 KB/core L1, 256 KB/core L2, 3 MB L3 cache
• Core i5 (2010-11)
—4 physical cores, 4 threads, GPU, 2.3-3.4 GHz
—32+32 KB/core L1, 256 KB/core L2, 6 MB L3 cache
• Core i7 (2010-11)
—Up to 8 physical cores, 16 threads, 2.8-3.5 GHz
—32+32 KB/core L1, 256 KB/core L2, up to 20 MB L3 cache

PowerPC
• 1975, 801 minicomputer project (IBM), RISC
• Berkeley RISC I processor
• 1986, IBM commercial RISC workstation product, RT PC
—Not a commercial success
—Many rivals with comparable or better performance
• 1990, IBM RISC System/6000
—RISC-like superscalar machine
—POWER architecture
• IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh)
• Result is PowerPC architecture
—Derived from the POWER architecture
—Superscalar RISC
—Apple Macintosh
—Embedded chip applications

PowerPC Family (1)
• 601:
—Quickly to market.
—32-bit machine
• 603:
—Low-end desktop and portable
—32-bit
—Comparable performance with 601
—Lower cost and more efficient implementation
• 604:
—Desktop and low-end servers
—32-bit machine
—Much more advanced superscalar design
—Greater performance
• 620:
—High-end servers
—64-bit architecture

PowerPC Family (2)
• 740/750:
—Also known as G3
—Two levels of cache on chip
• G4: Increases parallelism and internal speed
• POWER6:
—2007, 3.5-5 GHz, 2 cores, 65 nm
• POWER7:
—2010, 2.4-4.25 GHz, 4, 6 or 8 cores, 45 nm

IBM Blue Gene/P Supercomputer
• Over 250,000 processors, grouped in 72 racks, connected with optical fiber

Computation Power
• FLOPS = FLoating point Operations Per Second
[Figure: Top 500 Supercomputers]

Supercomputer in Pakistan
• National University of Sciences and Technology (NUST), Islamabad
• Research Center for Modeling and Simulations (RCMS)
• 132 TFlops
• Inaugurated in March 2012
[Figure: OS Used on Top 500 Supercomputers]
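To give the FLOPS figures above some scale, here is a minimal sketch that converts a rating into an estimated run time. The workload size and the helper names are hypothetical, and the calculation assumes the machine sustains its quoted rate for the whole run, which real workloads rarely achieve.

```python
# SI prefixes commonly used with FLOPS ratings.
PREFIX = {
    "M": 10**6,   # megaFLOPS
    "G": 10**9,   # gigaFLOPS
    "T": 10**12,  # teraFLOPS
    "P": 10**15,  # petaFLOPS
}


def to_flops(value: float, prefix: str) -> float:
    """Convert a rating such as (132, "T") into plain FLOPS."""
    return value * PREFIX[prefix]


def seconds_to_run(total_operations: float, flops: float) -> float:
    """Idealized run time: operations divided by operations per second."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return total_operations / flops


if __name__ == "__main__":
    rate = to_flops(132, "T")  # the 132 TFlops figure quoted in the slides
    workload = 1e18            # hypothetical job: 10^18 floating point operations
    print(f"{seconds_to_run(workload, rate):.0f} seconds at the quoted rate")
```

Under these assumptions the hypothetical job takes roughly 7,600 seconds, a little over two hours.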