### Create a StudySoup account

#### Be part of our community, it's free to join!

Already have a StudySoup account? Login here

# week 5 EECE 6086C

UC

GPA 3.5

### View Full Document

## 17

## 0

## Popular in VLSI Design automation

## Popular in Electrical Engineering

This 19 page Class Notes was uploaded by Abhinav Notetaker on Monday March 21, 2016. The Class Notes belongs to EECE 6086C at University of Cincinnati taught by Ranga Vemuri in Spring 2016. Since its upload, it has received 17 views. For similar materials see VLSI Design automation in Electrical Engineering at University of Cincinnati.

## Similar to EECE 6086C at UC

## Reviews for week 5

### What is Karma?

#### Karma is the currency of StudySoup.

#### You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 03/21/16

Optimization by Simulated Annealing S. Kirkpatrick; C. D. Gelatt; M. P. Vecchi Science, New Series, Vol. 220, No. 4598. (May 13, 1983), pp. 671-680. Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819830513%293%3A220%3A4598%3C671%3AOBSA%3E2.0.CO%3B2-8 Science is currently published by American Association for the Advancement of Science. Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/journals/aaas.html. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals and scholarly literature from around the world. The Archive is supported by libraries, scholarly societies, publishers, and foundations. It is an initiative of JSTOR, a not-for-profit organization with a mission to help the scholarly community take advantage of advances in technology. For more information regarding JSTOR, please contact support@jstor.org. http://www.jstor.org Mon Nov 5 11:27:46 2007 13 May 1983, Volume 220, Number 4598 S C I E N C E with N, so that in practice exact solu- tions can be attempted only on problems involving a few hundred cities or less. The traveling salesman belongs to the large class of NP-complete (nondeter- Optimization by ministic polynomial time complete) problems, which has received extensive Simulated Annealing study in the past0years (3).No method for exact solution with a computing ef- S. Kirkpatrick, C. D. Gelatt, Jr., M. P. Vecchi fort bounded by a power of N has been found for any of these problems, but if such a solution were found, it could be mapped into a procedure for solving all members of the class. It is not known what features of the individual problems In this article we briefly review the sure of the "goodness" of some complex central constructs in combinatorial opti- system. The cost function depends on in the NP-complete class are the cause of mization and in statistical mechanics and the detailed configuration of the many their difficulty. then develop the similarities between the parts of that system. We are most famil- Since the NP-complete class of prob- two fields. We show how the Metropolis iar with optimization problems occurring lems contains many situations of practi- algorithm for approximate numerical in the physical design of computers, so cal interest, heuristic methods have been simulation of the behavior of a many- examples used below are drawn from developed with computational require- body system at a finite temperature pro- vides a natural tool for bringing the tech- niques of statistical mechanics to bear on Summary. There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a optirn~ization. We have applied this point of view to a finite temperature) and multivariate or combinatorial optimization (finding the mini- number of problems arising in optimal mum of a given function depending on many parameters). A detailed analogy with design of computers. Applications to annealing in solids provides a framework for optimization of the properties of very partitioning, component placement, and large and complex systems. This connection to statistical mechanics exposes new wiring of electronic systems are de- information and provides an unfamiliar perspective on traditional optimization prob- scribed in this article. In each context, lems and methods. we introduce the problem and discuss the improvements available from optimi- zation. that context. The number of variables ments proportional to small powers ofN. Of classic optimization problems, the involved may range up into the tens of Heuristics are rather problem-specific: travel~~ngalesman problem has received thousands. there is no guarantee that a heuristic the most intensive study. To test the The classic example, because it is so procedure for finding near-optimal solu- power of simulated annealing, we used simply stated, of a combinatorial optimi- tions for one NP-complete problem will the algorithm on traveling salesman zation problem is the traveling salesman be effective for another. problems with as many as several thou- problem. Given a list of N cities and a There are two basic strategies for sand cities. This work is described in a means of calculating the cost of traveling heuristics: "divide-and-conquer" and it- final section, followed by our conclu- between any two cities, one must plan erative improvement. In the first, one sions. the salesman's route, which will pass divides the problem into subproblems of through each city once and return finally manageable size, then solves the sub- to the startingpoint, minimizing the total problems. The solutions to the subprob- CombinatorialOptimization cost. Problems with this flavor arise in lems must then be patched back togeth- all areas of scheduling and design. Two er. For this method to produce very good The subject of combinatorial optimiza- subsidiary problems are of general inter- solutions, the subproblems must be natu- tion(1) consists of a set of problems that est: predicting the expected cost of the rally disjoint, and the division made must are central to the disciplines of computer salesman's optimal route, averaged over be an appropriate one, so that errors some class of typical arrangements of science and engineering. Research in this made in patching do not offset the gains area aims at developing efficient tech- cities, and estimating or obtaining niques for finding minimum or maximum bounds for the computing effort neces- S. Kirkpatrick and C. D. Gelatt, Jr., are research values of a function of very many inde- sary to determine that route. staff members and M. P. Vecchi was a visiting pendent variables (2).This function, usu- All exact methods known for deter- ter, Yorktown Heights, New York 10598.P.en- ally called the cost function or objective mining an optimal route require a com- Vecchi's present address is Instituto Venezolano de lnvestigaciones Cientificas, Caracas 1010A. Vene- function, represents a quantitative mea- puting effort that increases exponentially znela. 13 MAY I983 obtained in applying more powerful system is weighted by its Boltzmann the systems being optimized. We will methods to the subproblems (4). probability factor, exp(-E({r,))lkBT), introduce an effective temperature for In iterative improvement (5, 6), one where E({ri})is the energy of the config- optimization, and show how one can starts with the system in a known config- uration, kBis Boltzmann's constant, and carry out a simulated annealing process uration. A standard rearrangement oper- T is temperature. in order to obtain better heuristic solu- ation is applied to all parts of the system A fundamental question in statistical tions to combinatorial optimization prob- in turn, until a rearranged configuration mechanics concerns what happens to the lems. that improves the cost function is discov- system in the limit of low temperature- Iterative improvement, commonly ap- ered. The rearranged configuration then for example, whether the atoms remain plied to such problems, is much like the becomes the new configuration of the fluid or solidify, and if they solidify, microscopic rearrangement processes system, and the process is continued whether they form a crystalline solid or a modeled by statistical mechanics, with until no further improvements can be glass. Ground states and configurations the cost function playing the role of found. Iterative improvement consists of close to them in energy are extremely energy. However, accepting only rear- a search in this coordinate space for rare among all the configurations of a rangements that lower the cost function rearrangement steps which lead down- macroscopic body, yet they dominate its of the system is like extremely rapid hill. Since this search usually gets stuck properties at low temperatures because quenching from high temperatures to in a local but not a global optimum, it is as T is lowered the Boltzmann distribu- T = 0, so it should not be surprising that customary to carry out the process sev- tion collapses into the lowest energy resulting solutions are usually metasta- eral times, starting from different ran- state or states. ble. The Metropolis procedure from sta- domly generated configurations, and As a simplified example, consider the tistical mechanics provides a generaliza- save the best result. magnetic properties of a chain of atoms tion of iterative improvement in which There is a body of literature analyzing whose magnetic moments, yi, are al- controlled uphill steps can also be incor- the results to be expected and the com- lowed to point only "up" or "down," porated in the search for a better solu- puting requirements of common heuris- states denoted by pi = + 1. The interac- tion. tic methods when applied to the most tion energy between two such adjacent Metropolis et al. (8), in the earliest popular problems (1-3). This analysis spins can be written Jyiyi+,.Interaction days of scientific computing, introduced usually focuses on the worst-case situa- between each adjacent pair of spins con- a simple algorithm that can be used to tion-for instance, attempts to bound tributes ? J to the total energy of the provide an efficient simulation of a col- from above the ratio between the cost chain. For an N-spin chain, if all configu- lection of atoms in equilibrium at a given obtained by a heuristic method and the rations are equally likely the interaction temperature. In each step of this algo- exact minimum cost for any member of a energy has a binomial distribution, with rithm, an atom is given a small random family of similarly structured problems. the maximum and minimum energiesgiv- displacement and the resulting change, There are relatively few discussions of en by +NJ and the most probable state AE, in the energy of the system is com- the average performance of heuristic al- having zero energy. In this view, the puted. If AE 5 0, the displacement is gorithms, because the analysis is usually ground state configurations have statisti- accepted, and the configuration with the more difficult and the nature of the ap- cal weight exp(-Nl2) smaller than the displaced atom is used as the starting propriate average to study is not always zero-energy configurations. A Boltz- point of the next step. The case AE > 0 clear. We will argue that as the size of mann factor, exp(-El/cBT), can offset is treated probabilistically: the probabili- optimization problems increases, the this if kBTis smaller than J. If we focus ty that the configuration is accepted is worst-case analysis of a problem will on the problem of finding empirically the P(AE) = exp(-AElkBT). Random num- become increasingly irrelevant, and the system's ground state, this factor is seen bers uniformly distributed in the interval average performance of algorithms will to drastically increase the efficiency of (0,l) are a convenient means of imple- dominate the analysis of practical appli- such a search. menting the random part of the algo- cations. This large number limit is the In practical contexts, low temperature rithm. One such number is selected and domain of statistical mechanics. is not a sufficient condition for finding compared with P(AE). If it is less than ground states of matter. Experiments P(AEI),the new configuration is retained; that determine the low-temperature state if not, the original configuration is used StatisticalMechanics of a material-for example, by growing a to start the next step. By repeating the single crystal from a melt-are done by basic step many times, one simulates the Statistical mechanics is the central dis-careful annealing, first melting the sub- thermal motion of atoms in thermal con- cipline of condensed matter physics, a stance, then lowering the temperature tact with a heat bath at temperature T. slowly, and spending a long time at tem- This choice of P(Ab7 has the conse- body of methods for analyzing aggregate properties of the large numbers of atoms peratures in the vicinity of the freezing quence that the system evolves into a to be found in samples of liquid or solid point. If this is not done, and the sub- Boltzmann distribution. matter (7).Because the number of atoms stance is allowed to get out of equilibri- Using the cost function in place of the is of order 1O2>er cubic centimeter, um, the resulting crystal will have many energy and defining configurations by a defects, or the substance may form a only the most probable behavior of the set of parameters {x,},it is straightfor- system in thermal equilibrium at a given glass, with no crystalline order and only ward with the Metropolis procedure to temperature is observed in experiments. metastable, locally optimal structures. generate a population of configurations This can be characterized by the average Finding the low-temperature state of a of a given optimization problem at some and small fluctuations about the average system when a prescription for calculat- effective temperature. This temperature behavior of the system, when the aver- ing its energy is given is an optimization is simplya control parameter in the same age is taken over the ensemble of identi- problem not unlike those encountered in units as the cost function. The simulated cal systems introduced by Gibbs. In this combinatorial optimization. However, annealing process consists of first "melt- ensemble, each configuration, defined the concept of the temperature of a phys- ing" the system being optimized at a by the set of atomic positions, {r,),of the ical system has no obvious equivalent in high effective temperature, then lower- SCIENCE, VOI.. 220 ing the temperature by slow stages untilgun and hence that very slow cooling is This process is usually divided into required. It can also be used to deter- several stages. First, the design must be the system "freezes" and no further changes occur. At each temperature, the mine the entropy by the thermodynamic partitioned into groups small enough to simulation must proceed long enough for relation fit the available packages, for example, the system to reach a steady state. The into groups of circuits small enough to fit sequence of temperatures and the num- into a single chip, or intogroups of chips ber of rearrangements of the,)attempt- and associated discrete components that ed to reach equilibrium at each tempera-Integrating Eq. 5 gives can fit onto a card or other higher level ture can be considered an annealing package. Second, the circuits are as- signed specific locations on the chip. schedule. Annealing, as implemented by the Me- This stage is usually called placement. tropolis procedure, differs from iterativeere TIis a temperature at which S is Finally, the circuits are connected by improvement in that the procedure need known, usually by an approximation val- wires formed photolithographically out not get stuck since transitions out of aid at high temperatures. of a thin metal film, often in several local optimum are always possible at The analogy between cooling a fluici layers. Assigning paths, or routes, to the nonzero temperature. A second and and optimization may fail in one impor- wires is usually done in two stages. In more important feature is that a sort oftant respect. In ideal fluids all the atomsgh or global wiring, the wires are assigned to regions that represent sche- adaptive divide-and-conquer occurs. arealike and the ground state is a regular Gross features of the eventual state of crystal. A typical optimization problem matically the capacity of the intended the system appear at higher tempera- will contain many distinct, noninter- package. In detailed wiring (also called tures; fine details develop at lower tem-hangeable elements, so a regular solu- exact embedding), each wire is given a peratures. This will be discussed with tion is unlikely. However, much re- unique complete path. From the detailed specific examples. search in condensed matter physics is wiring results, masks can be generated Statistical mechanics contains many directed at systems with quenched-in and chips made. useful tricks for extracting properties ofndomness, in which the atoms are not At each stage of design one wants to a macroscopic system from microscopic all alike. An important feature of such optimize the eventual performance of the averages. Ensemble averages can be ob- systems, termed "frustration," is that system without compromising the feasi- tained from a single generating function,nteractions favoring different and in- bility of the subsequent design stages. the partition function, compatible kinds of ordering may be Thus partitioning must be done in such a simultaneously present(Y). The magnet- way that the number of circuits in each Z = Tr exp - ic alloys known as "spin glasses," whichpartition is small enough to fit easily into I;( exhibit competition between ferromag- the available package, yet the number of netic and antiferromagnetic spin order- signals that must cross partition bound- in which the trace symbol, Tr, denotes a sum over all possible configurations of ing, are the best understood example of aries (each requiring slow, power-con- the atoms in the sample system. The frustration (10). It is now believed thatuming driver circuitry) is minimized. logarithm of Z, called the free energy, highly frustrated systems like spin glass-e major focus in placement is on mini- F(T),contains information about the av- es have many nearly degenerate random mizing the length of connections, since erage energy, <E(T)>, and also the en- ground states rather than a single groundhis translates into the time required for tropy, S(T), which is the logarithm of tstate with a high degree of symmetry. propagation of signals, and thus into the number of configurations contributing toThese systems stand in the same relationspeed of the finished system. However, the ensemble at T: to conventional magnets as glasses do tothe placements with the shortest implied crystals, hence the name. wire lengths may not be wirable, because The physical properties of spin glassesf the presence of regions in which the at low temperatures provide a possible wiring is too congested for the packag- Boltzn~ann-weightedensemble averages guide for understanding the possibilitiesng technology. Congestion, therefore, are easily expressed in terms of deriva-of optimizing complex systems subject should alsobe anticipated and minimized tives ofF. Thus the average energy is to conflicting (frustrating) constraints.uring the placement process. In wiring, given by it is desirable to maintain the minimum possible wire lengths while minimizing Physical Design of Computers sources of noise, such as cross talk be- tween adjacent wires. We show in this and the:rate of change of the energy with The physical design of electronic sys-and the next two sections how these respect to the control parameter,T, is tems and the methods and simplifica- conflicting goals can be combined and related to the size of typical variatiotions employed to automate this process made the basis of an automatic optimiza- the energy by have been reviewed (11, 12). We first tion procedure. provide some background and defini- The tight schedules involved present tions related to applications of the simmajor obstacles to automation and opti- lated annealing framework to specific mization of large system design, even problems that arise in optimal design ofwhen computers are employed to speed computer systems and subsystems. up the mechanical tasks and reduce the Physical design follows logical design. chance of error. Possibilities of feed- In statistical mechanics C(T) is called theer the detailed specification of the back, in which early stages of a design specfic heat. A large value of C signals agic of a systemis complete, it is neces-e redone to solve problems that be- change in the state of order of a system, came apparent only at later stages, are sary to specify the precise physical real- and car1 be used in the optimization ization of the system in a particular tech-atly reduced as the scale of the over- context to indicate that freezing has be-ology. all system being designed increases.- 13MAY 1983 The objective functionf has precisely the form of a Hamiltonian, or energy function, studied in the theory of random magnets, when the common simplifying assumption is made that the spins, pi, have only two allowed orientations (up or down), as in the linear chain example of the previous section. It combines lo-5 : cal, random, attractive ("ferromagnet- ic") interactions, resulting from the, 5 'i with a long-range repulsive ("antiferro-I 2 Boundary magnetic") interaction due A. No con- Fig. 2. Construction of a horizontal net-cross- figuration of the {pl}can simultaneouslying histogram. 0 1 0 0 0 ~203000 4000 5000 6000 satisfy all the interactions, so the system Sum of pins on the two chips is "frustrated," in the sense formalized by Toulouse (9). circuits in a typical system could be Fig. 1. Distribution of total number of pinIf theaiare completely uncorrelated, connected with short-range interactions required in two-way partition of a micro- processor at various temperatures. Arrow in-can be shown (13) that this Hamilto- if they were embedded in a space with dicates best solution obtained by rapid nian has a spin glass phase at low tem- dimension between two and three. Un- quenching as opposed to annealing. peratures. This implies for the associatcorrelated connections, by contrast, can magnetic problem that there are many be thought of as infinite-dimensional, degenerate "ground states" of nearly since they are never short-range. timization procedures that can incorpo- equal energy and no obvious symmetry. The identification of Eq. 7 as a spin rate, even approximately, information The magnetic state of a spin glass is veryass Hamiltonian is not agected by the about the chance of success of later stable at low temperatures(14), so the reduction to a two- or three-dimensional stages of such complex designs will be ground states have energies well below problem, as long as AN = 212. The de- increasingly valuable in the limit of verthe energies of the random high-tempera-gree of ground state degeneracy in- large scale. ture states, and transforming one groundcreases with decreasing dimensionality. System performance is almost always state into another will usually require For the uncorrelated model, there are achieved at the expense of design conve- considerable rearrangement. Thus this typically of order 'Inearly degenerate nience. The partitioning problem pro- analogy has several implications for opti-ound states (14),while in two and three vides a clean example of this. Consider mization of partition: dimensions, 2"N, for some small value, N circuits that are to be partitioned be- 1) Even in the presence of frustrationa , are expected (16). This implies that tween two chips. Propagating a signal significant improvements over a random finding a near-optimum solution should across a chip boundary is always slow, starting partition are possible. become easier, the lower the effective so the number of signals required to 2) There will be many goodnear-opti- dimensionality of the problem. The en- cross between the two must be mini- ma1 solutions, so a stochastic search tropy, measurable as shown in Eq. 6, mized. Putting all the circuits on one procedure such as simulated annealing provides a measure of the degeneracy of chip eliminates signal crossings, but usu-hould find some. solutions.S(T) is the logarithm of the ally there is no room. Instead, for later 3) No one of the ground states is sig-number of solutions equal to or better convenience, it is desirable to divide theificantly better than the others, so itthan the average result encountered at circuits about equally. not very fruitful to search for the absotemperature T. If we have connectivity information in lute optimum. As an example of the partitioning a matrix whose elements {aii}are the In developing Eq. 7 we made several problem, we have taken the logic design number of signals passing between cir- severe simplifications, considering onlyfor a single-chip IBM"370 microproces- cuitsi andj,and we indicate which chip two-way partitioning and ignoring the sor" (17) and considered partitioning it fact that most signals connect more than circuiti is placed on by a two-valued into two chips. The original design has variablep,;= t 1, then N,, the number two circuits. Objective functions analo-approximately 5000 primitive logic gates of signals that must cross a chip bound- goustof that include both complications and 200 external signals (the chip has 200 ary is given by Ci:.,(a;,i4)(piJ2.Cal- are easily constructed. They no longer logic pins). The results of this study are culating &pi gives the difference be- have the simple quadratic form of Eq. 7,plotted in Fig.. If one randomly assigns tween the numbers of circuits on the two but the qualitative feature, frustrationgates to the two chips, one finds the remains dominant. The form of the Ham- chips. Squaring this imbalance and intro- distribution marked T = M-for the num- ducing a coefficient,A, to express the iltonian makes no difference in the Me- ber of pins required. Each of the two relative costs of imbalance and boundary tropolis Monte Carlo algorithm. Evalua- chips (with about 2500 circuits) would crossings, we obtain an objective func- tion of the change in function when a need 3000 pins. The other distributions tion,j; for the partition problem: circuit is shifted to a new chip remainsin Fig. 1 show the results of simulated rapid as the definition of f becomes annealing. more complicated. Monte Carlo annealing is simple to It is likely that the aii are somewhatimplement in this case. Each proposed Reasonable values of A should satisfy correlated, since any design has consid-configuration change simply flips a ran- A s 212, where z is the average number erable logical structure. Efforts to unddomly chosen circuit from one chip to of circuits connected to a typical circuittand the nature of this structure by the other. The new number of exkrnal (fan-in plus fan-out). ChoosingA = 212 analyzing the surface-to-volumeratio of connections, C, to the two chips is calcu- implies giving equal weight to changes incomponents of electronic systems [as in lated (an external connection is a net the balance and crossing scores. "Rent's rule" (15)] conclude that the with circuits on both chips, or a circuit SCIENCE, VOI..220 connected to one of the pins of the tures, the system is considered "frozen" 100 circuits, we found partitions result- and annealing stops. original single-chipdesign), as is the new ing in chips with 271 and 183 pins. balance score, H, calculated as in deriv- The finite temperature curves in Fig.1 If, instead of slowly cooling, one were ing Eq. 7. The objective function analo- show the distribution of pins per chip for to start from a random partition and gous to Eq. 7 is the configurations sampled at 7'= 2.5, accept only flips that reduce the objec- 1.O, and 0.1. As one would expect from tive function (equivalent to setting= 0 the statistical mechanical analog, the dis-in the Metropolis rule), the result is chips where C is the sum of the number of tribution shifts to fewer pins and shar- with approximately 700 pins (several external connections on the two chips pens as the temperature is decreased. such runs led to results with 677 to and B is the balance score. For this The sharpening is one consequence of 730 pins). Kapid cooling results in a example, h - 0.01. the decrease in the number of configura- system frozen into a metastable state For the annealing schedule we chose tions that contribute to the equilibriom far from the optimal configuration. The to start at a high "temperature," ensemble at the lower temperature. In best result obtained after several rapid TO = 10, where essentially all proposed the language of statistical mechanics, the quenches is indicated by the arrow in circuit flips are accepted, then cool ex- entropy of the sysiem decreases. For Fig. I. ponentially, T,,= (T11T0)"7;,, ith the ra- this sample run in the low-temperature tio Tl/TI,= 0.9. At each temperature limit, the two chips required 353 and 321 enough flips are attempted that either pins, respectively. There are 237 nets Placement there are ten accepted flips per circuit onconnecting the two chips (requiring a pin the average (for this case, 50,000 accept- on each chip) in addition to the 200 Placement is a further refinement of ed flips at each temperature), or the inputs and outputs of the original chip. the logic partitioning process, in which tiumber of attempts exceeds 100 times 'The final pal-tition in this example has the circuits are given physical positions the number of circuits before ten flips the circuits exactly evenly distributed (11, 12, 18, 19). In principle, the two per circuit have been accepted. If the between the two partitions. Using a stages could be combined, although this desired number of acceptances is not more complicated balance score, which is not often possible in practice. The achieved at three successive tempera- did not penalize imbalance of less than objectives in placement are to minimize Initial position T: 10000 . . . . . .. . . . . .. . 99 9 8 9 2 8 3 7 2 :)>: . . . . . 7 1 9 2 9 5 9 4 0 7 :53 6.'. .'s.?: :?? 5 6 8 ... . ' : ' 4 0 5 . . . . . . . . 9 3 9 5 5 5 5 4 5 2 2 5 :??, $, . . . .a,?: 8 3 8(+ 9 7 7 3 7 2 $$ ?:f: :2:2; 7 6 0 9 7 6 9 7 1 @$ ,, q , 6.2. .6:1: 'G.3: 5 6 39 5 99 9 8 @ 5 2 :&:I::?%. 12 2 3 8 0 9 . - .. 559 90 5 3 8 7 @ .& @ 9 4 7 3 'gl 5 1 5 9 0 $ z a @ a 7 00 5 6 0 8 9 27 67 7 8 2 3 1 2 10 11 16 25 7 8 5 5 6 9 10 a 11 1 7 1 8 29 4 1 1 5 16 1 91 @ 3 3 5 9 17 9 8 9 2 7 67 5 3 6 4 # I#/ @ a 7 1 7 5 9 5 .-- - . 5 a I 30 49 @ fa ! !mHl 8 19 15 2 9 49 - illltii @ ill#l 7 8 5 5 6 2 28 1 8 7 9 5 rn 36 +@@ @ 6 51 @ ll[ll a 6 9 5 5 5 5 l 9 -n m W I/\\I Iiil rn 3 9 5 7 9 '+l gg fg z* /// 4 9 9 - - =- - a =-g I#] &-- 30 2 8 7 1 36 \\i/i 3 3 c 4 7 7 7 0 08 9 2 7 8 28 2 0 9 3 2 9 0 97 3 6 6 8 9 d 3 7 34 7 1 569 6 1 3 580 6 2 56 3 2 590 334 Fig. 3. Ninety-eight chips on a ceramic module from the IBM 3081. Chips are identified by number (I to 100,with 20 and 100absent)and function. Thedark squares comprise an adder. the three typesquares wit11ruled lines are chips that contr-oland supply data to the adder, the lightly dot- ted chips perform logical arithmetic (bitwise OK, and so on), and the open squares denote genera-purpose registers, which serve both arithmetic units. The numbers at the left and lower edges the module image are the vertrcal and horizontal net-crossing histograms, respectively.a) Original chip placement; (b) a configurat7=- 10,000; (c)=T 1250;(d) a zero-temperature result. 13MAY 1983 075 signal propagation times or distances time as wire length, we use a convenient channel somewhere on the boundary. To while satisfying prescribed electrical intermediate analysis of the layout, a combine this information into a single constraints, without creating regions so net-crossing histogram. Its construction objective function, we introduce a congested that there will not be room is summarized in Fig. 2. We divide the threshold level for each histogram-an later to connect the circuits with actual package surface by a set of natural amount of wire that will nearly exhaust wire. boundaries. In this example, we use the the available wire capacity-and then Physical design of computers includes boundaries between adjacent rows or sum for all histogram elements that ex- several distinct categories of placement columns of chip sites. The histogram ceed the threshold the square of the problems, depending on the packages then contains the number of nets cross- excess over threshold. Adding this quan- involved (20). The larger objects to be ing each boundary. Since at least one tity to the estimated length gives the placed include chips that must reside in a wire must be routed across each bound- objective function that was used. higher level package, such as a printed ary crossed, the sum of the entries in the Figure 3 shows the stages of a simulat- circuit card or fired ceramic "module" histogram of Fig. 2 is the sum of the ed annealing run on the 98-chip module. (21). These chip carriers must in turn be horizontal extents of the rectangles Figure 3a shows the chip locations from placed on a backplane or "board," bounding each net, and is a lower bound the original design, with vertical and which is simply a very large printed to the horizontal wire length required. horizontal net-crossing histograms indi- circuit card. The chips seen today con- Constructing a vertical net-crossing his- cated. The differentshading patterns dis- tain from tens to tens of thousands of togram and summing its entries gives a tinguish the groups of chips that carry logic circuits, and each chip carrier or similar estimate of the vertical wire out different functions. Each such group board will provide from one to ten thou- length. was designed and placed together, usual- sand interconnections. The partition and The peak of the histogram provides a ly by a single designer. The net-crossing placement problems decouple poorly in lower bound to the amount of wire that histograms show that the center of the this situation, since the choice of which must be provided in the worst case, since layout is much more congested than the chip should carry a given piece of logic each net requires at least one wiring edges, most likely because the chips will be influenced by the position of that known to have the most critical timing chip. constraints were placed in the center of The simplest placement problems the module to allow the greatest number arise in designing chips with structured of other chips to be close to them. layout rules. These are called "gate ar- Heating the original design until the ray" or "master slice" chips. In these chips diffuse about freely quickly pro- chips, standard logic circuits, such as duces a random-looking arrangement, three- or four-input NOR'S, are pre- Fig. 3b. Cooling very slowly until the placed in a regular grid arrangement, and chips move sluggishly and the objective the designer specifies only the signal function ceases to decrease rapidly with wiring, which occupies the final, highest, change of temperature produced the re- layers of the chip. The circuits may all be sult in Fig. 3c. The net-crossing histo- identical, or they may be described in grams have peaks comparable to the terms of a few standard groupings of two peak heights in the original placement, or more adjacent cells. o i _ _ _ ~ .but ~re muc. fl-tter.-At tiis "freezing to1 lo2 lo3 Io4 As an example of a placement problem Temperature point," we find that the functionally re- with realistic complexity without too lated groups of chips have reorganized many complications arising from pack- Fig. 4. Specific heat as a function of ternfrom the melt, but now are spatially age idiosyncrasies, we consider 98 chips ture for the design of Fig. 3, a to d. separated in an overall arrangement packaged on one multilayer ceramic quite dilferent from the original place- module of the IRM 3081 processor (21). ment. In the final result, Fig. 3d, the Each chip can be placed on any of 100 histogram peaks are about 30 percent sites, in a I0x I0 grid on the top surface less than in the original placement. Inte- of the module. Information about the grating them, we find that total wire connections to be made through the sig- length, estimated in this way, is de- nal-carrying planes of the module is con- creased by about 10 percent. The com- tained in a "netlist," which groups sets puting requirements for this example of pins that see the same signal. were modest: 250,000 interchanges were The state of the system can be briefly attempted, requiring 12 minutes of com- represented by a list of the 98 chips with putation on an IBM 3033. theirx and y coordinates, or a list of the Between the temperature at which contents of each of the 100 legal loca- clusters form and freezing starts (Fig.c) tions. A sufficient set of moves to use for and the final result (Fig. 3d) there are many further local rearrangements. The annealing is interchanges of the contents of two locations. This results in either functional groups have remained in the the interchange of two chips or the inter- same regions, but their shapes and rela- change of a chip and a vacancy. For tive alignments continue to change more efficient search at low tempera- throughout the low-temperature part of tures, it is helpful to allow restrictions on the annealing process. This illustrates the distance across which an interchange that the introduction of temperature to may occur. Pig. 5. Examples of (a) L-shaped and (b%- the optimization process permits a con- To measure congestion at the same shaped wire rearrangements. trolled, adaptive division of the problem S('IENCE, VOL. 220 through the evolution of natural clusters Random M.C. 2-paths at the freezing temperature. Early pre- Grid size 10 Grid size 10 scription of natural clusters is also a central feature of several sophisticated placement programs used in master slice chip placement (22, 23). A quantity corresponding to the ther- nlodynamic specific heat is defined for this problem by taking the derivative with respect to temperature of the aver- age value of the objective function ob- served at a given temperature. 'I'his is plotted in Fig. 4. Just as a maximum in the specific heat of a fluid indicates the onset of freezing or the formation of clusters, we find specific heat maxima at two temperatures, each indicating a dif- ferent type of ordering in the problem. Fig. 6 (left). Wire density in the OX-chip module with the connections randomly assigned to The higher temperature peak corre- perimeter routes. Chips are in the originalacement. Fig.7 (right). Wire density after sponds to the aggregation of clusters of simulated annealing of the wire routing. using %-shapedmoves. functionally related objects, driven apart by the congestion term in the scoring. The lower temperature peak indicates We can model the rough routing prob- tailed wiring. In that case, all the links the further decrease in wire length ob- lem (and even simple cases of detailed have either one or no wires, and links tained by local rearrangements. l'his sort embedding) by lumping all actual pin with two or more wires are illegal. For of measurement can be useful in practice positions into a regular grid of points, this limit the best possible value of will as a means of determining the tempera- which are treated as the sources and be NwINL.. ture ranges in which the important rear- sinks of all connections. The wires are For the L,-shaped moves, F has a rangements in the design are occurring, then to be routed along the links that relatively simple form. L,et t,,= +I wherc slower cooling with be helpful. connect ad.jacent grid points. along the links that connection ihas for 'I'he objectives in global routing are to one orientation. - 1for the other orienta- minimize wire length and, often, the tion, and 0 otherwise. Let (I;be 1 if the Wiring number of bends in wires, while spread- ith connection can run through the vth ing the wire as evenly as possible to 0 link in either of its two positions, and After placement, specific legal rout- simplify exact embedding and later revi- otherwise. Note that (I;isjust E;,;Then i n gmust be found for the wires needed sion. Wires are to be routed around ifp, = + 1 indicates which route the ith to connect the circuits. The techniques regions in which wire demand exceeds connection has taken, we obtain for the typic;rlly applied to generate such rout- capacity if possible, so that they will not number of wires along the vth link, i n gare sequential in nature, treating one "overflow," requiring drastic rearrange- (Il"(e,L,+l 1) wire at a time with incomplete informa- ments of the other wires during exact n,,= + nu(()) (9) tion about the positions and effects of the embedding. Wire bends are costly in , 2 other wires (11,24). Annealing is inher- packages that confine the north-south where n,,(O)is the contribution from ently free of this sequence dependence. and east-west wires to different layers, straight wires, which cannot move with- In this section we describe a simulated since each bend requires a connection out increasing their length, or blockages. annealing approach to wiring, using the between two layers. 'I'wo classes of Summing the n,,gives ceramic module of the last section as an moves that maintain the minimum wire example. length are shown in Fig. 5. In the L- Nets with many pins must first be shaped move of Fig. 5a, only the essen- broken into connections-pairs of pins tial bends are permitted, while the Z- which has the form of the Hamiltonian joinetd by a single continuous wire. 'l'his shaped move of Fig. 5b introduces one for a random magnetic alloy or spin "ordering" of each net is highly depen- extra bend. We will explore the optimi- glass, like that discussed earlier. The dent on the nature of the circuits being zation possible with these two moves. "random field," hi, felt by each movable connected and the package technology. For a simple objective function that connection reflects the ditference, on the Orderings permitting more than two pins will reward the most balanced arrange- average, between the congestion associ- to be connected are sometimes allowed, ment of wire, we calculate the square of ated with the two possible paths: but will not be discussed here. the number of wires on each link of the The usual procedure, given an order- network, sum the squares for all links, ing, is first to construct a coarse-scale and term the result F. If there are NI, routing for each connection from which links and Nw wires, a global routing The interaction between two wires is the ultimate detailed wiring can be com- program that deals with a high density of proportional to the number of links on pleted. Package technologies and struc- wires will attempt to route precisely the which the two nets can overlap, its sign turecl image chips have prearranged ar- average number of wires, NwlNI,,along depending on their orientation conven- eas of fixed capacity for the wires. For each link. In this limit I.' is bounded tions: the rough routing to be successful, it below by N ~ * /N ,.ne can use the same must not call for wire densities that ex- ob.jective function for a low-density (or ceed this capacity. high-resolution) limit appropriate for de- Both Jii and hi vanish, on average, so it is of wire that results from assigning each ple, the largest numbers of wires on a the fluctuations in the terms that make wire to an L-shaped path, choosing ori- single link are 105 (x) and 96 (y). up F which will control the nature of the entations at random. The thickness of We compare the various methods of low-energy states. This is also true in the links is proportional to the number of improving the wire arrangement by plot- spin glasses. We have not tried to exhibit wires on each link. The congested area ting (Fig. 8) the highest wire density a functional form for the objective func- that gave rise to the peaks in the histo- found in each column ofx-links for each tion with Z-moves allowed, but simply grams discussed above is seen in the of the methods. The unevenness of the calculate it by first constructing the actu- wiring just below and to the right of the density profiles was already seen when al amounts of wire found along each link. center of the module. The maximum we considered net-crossing histograms To assess the value of annealing in numbers of wires along a single link in as input information to direct placement. wiring this model, we studied an ensem- Fig. 6 are 173 (x direction) and 143 0, The lines shown represent random as- ble of randomly situated connections, direction), so the design is also aniso- signment of wires with 1,-moves; align- under various statistical assumptions. tropic. Various ways of rearranging the ing wires in the direction of least average Here we consider routing wires for the wiring paths were studied. Monte Carlo congestion-that is, along h,-followed 9.3 chips on a module considered earlier. annealing with Z-moves gave the best by cooling for one pass at zero T;simu- First, we show in Fig. 6 the arrangement solution, shown in Fig. 7. In this exam- lated annealing with L-moves only; and annealing with Z-moves. Finally, the light dashed line shows the optimum result, in which the wires are distributed with all links carrying as close to the average weight as possible. The opti- mum cannot be attained in this example Pig. 8. Histogram of the rnaxi- without stretching wires beyond their mum wire densities within a minimum length, because the connec- given column of x-links, for ........ tions are too unevenly arranged. Any the various methods of rout- method of optimization gives a signifi- ing. cant improvement over the estimate ob- tained by assigning wire routings at ran- I : : : : : . : J , dom. All reduce the peak wire density on 2 4 6 8 10 a link by more than 45 percent. Simulat- 1 3 5 7 9 ed annealing with %-movesimproved the Channel position random routing by 57 percent, averaging results for both x and y links. Traveling Salesmen Quantitative analysis of the simulated annealing algorithm or comparison be- tween it and other heuristics requires problems simpler than physical design of computers. There is an extensive litera- ture on algorithms for the traveling sales- man problem (3, 4), so it provides a natural context for this discussion. If the cost of travel between two cities is proportional to the distance between them, then each instance of a traveling salesman problem is simply a list of the positions of N cities. For example, an arrangement of N points positioned at random in a square generates one in- stance. The distance can be calculated in either the Euclidean metric or a "Man- hattan" metric, in which the distance between two points is the sum of their separations along the two coordinate axes. The latter is appropriate for physi- cal design applications, and easier to compute, so we will adopt it. We let the side of the square have length N " ~ ,O that the average distance between each city and its nearest neigh- bor is independent of N. It can be shown that this choice of length units leaves the Fig. 9. Results at four temperatures for a clustered 400-city traveling salesman problem. The points are uniformly distributed in nine regions. (a) T = 1.2, a = 2.0567; (b) T = 0.8, optimal tour length per step independent n = 1.515;(c) T= 0.4,a = 1.055; (d) T= 0.0,n = 0.7839. of N, when one averages over many instances, keeping N fixed (25). Call this but the detailed arrangement of the long expect similarfreezing effects to limit the steps continued to change down to effectiveness of the common device of average optimal step length a. To bound a firom above, a numerical experiment T = 0.4 (Fig. 9c). Below T = 0.4, no employing iterative improvement repeat- was performed with the following further changes in the arrangement of the edly from different random starting con- "greedy" heuristic algorithm. From long steps were seen, but the small-scale figurations. Simulated annealing extends two of each city, go to the nearest city not structure within each region continued to already on the tour. From the Nth city, evolve, with the result shown in Fig. 9d. the most widely used heuristic tech- return directly to the first. In the worst niques. The temperature distinguishes case, the ratio of the length of such a classes of rearrangements, so that rear- greedy tour to the optimal tour is propor- Summary and Conclusions rangements causing large changes in the tional to In(N) (26), but on average, we objective function occur at high tempera- find that its step length is about12. The Implementing the appropriate Metrop- tures, while the small changes are de- varian

### BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.

### You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

## Why people love StudySoup

#### "I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

#### "I bought an awesome study guide, which helped me get an A in my Math 34B class this quarter!"

#### "I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

#### "Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

### Refund Policy

#### STUDYSOUP CANCELLATION POLICY

All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email support@studysoup.com

#### STUDYSOUP REFUND POLICY

StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here: support@studysoup.com

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to support@studysoup.com

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.