New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here

Special Topics

by: Alayna Veum
Alayna Veum

GPA 3.81


Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Class Notes
25 ?




Popular in Course

Popular in ComputerScienence

This 0 page Class Notes was uploaded by Alayna Veum on Monday November 2, 2015. The Class Notes belongs to CS 8803 at Georgia Institute of Technology - Main Campus taught by Staff in Fall. Since its upload, it has received 71 views. For similar materials see /class/234094/cs-8803-georgia-institute-of-technology-main-campus in ComputerScienence at Georgia Institute of Technology - Main Campus.

Similar to CS 8803 at

Popular in ComputerScienence


Reviews for Special Topics


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 11/02/15
Memory Management In the beginning there was FORTRAN o no dynamically allocated memory o everything in static arrays with fixed limits 0 a significant style as recently as 1983 TEX Dynamic allocation explicit memory management o Pascal newdispose o Cmallocfree o Cnewdelete Still an area for some research o efficiency o locality important for memory hierarchy o avoiding fragmentation many schemes Knuth vol 1 also Grunwald amp Zorn The terror of free When do you call free o must call sometime malloc without free leads to memory leak o premature call to free gt dangling reference hello core dump The things we do o forget about free just pray o crufty interfaces who must call free two ways to call every procedure 0 return pointers to static memory Ouch 0 make copies of things I own my copy you own yours We can do better f N Automatic Memory Management Why not have malloc without free let memory be freed automatically why not call it cons Can be required in the language definition No Scheme object is ever destroyed The reason that implementations of Scheme do not usually run out of storange is that they are permitted to reclaim the storage occupied by an object if they can prove that the object cannot possibly matter to any future computation Plan of study o see how garbage is created o see how we identify it o discuss ways of reclaiming it What is garbage A cell is garbage if o its contents can t affect any future legal computation As an approximation we call a cell garbage if o it is not reachable from a root set Any cell that is not garbage is live data 0 live data might affect a future computation What values can affect a computation We divide these values into two parts Roots o local variables of procedures 0 parameters of procedures o global variables Other values reachable from the roots 0 what if a formal parameter is a cons cell K Tracing pointers Confusing pointer amp integer could cause leaks o suppose integer n is address of an object Two major approaches o Type tagging each type of heap object gets unique tag tag leads to descriptive info compiler puts which fields are pointers in desc Example Modula3 o Pointer tagging use special bits to distinguish pointers integers LISP machines had special tag bits on stock hardware loses a bit from integer space eg all integers have low bit set heap object descriptors only show size some objects are nonpointercontaining eg strings Example Standard ML of New Jersey J Two families of garbage collectors Using start at roots follow pointers approach 0 markandsweep collection 1 unmark all objects 2 trace pointers marking all live data 3 sweep away unmarked objects unmarked objects moved to free list reallocated copying collection works with two semispaces all objects some free space in fromspace tospace is empty fromspace full copy live data to tospace tospace is not full not all objects copied now objects free space are in tospace flip The evil third family Using heavy compiler support reference counting if no pointer to an object it must be garbage keep in each object a reference count number of references ptrs to object if reference count becomes zero put on free list code must adjust ref count at every assignment also can t reclaim cycles BUT No pause times N OK Garbage collection and heap growth Allocate Empty 1 go 11 live cells k free cells Allocate OK Empty Grow Heap l Allocate OK Empty FATAL f N Mark and Sweep Collection Return to association list nr Course 152 Email nreecs Office MaxwellDworcin23 Office 231 Maxwell Dworkin Sweep Sweep phase looks at every cell in the heap puts unmarked cells on free list nr 152 Course Email nreecs free list Office Maxwell DworkiI Store marks X in objects or in separate bitmap K 231 J N Tracking pointers in the mark phase While marking must track marked nodes whose children have not yet been marked will go back and mark those children later essentially a depthfirst traversal depthfirst walk requires a stack but we re out of memory Early collectors allotted a special stack about 5 of memory size dump core if special stack exhausted Along came Peter Deutsch Herbert Schorr and William M Waite Marking algorithm uses only three extra pointers plus one bit per node Idea if you have pointer to current node you don t need copy in a predecessor node So temporarily save stack in car and cdr fields For details see Knuth vol 1 235 This problem vanishes in a copying collector Shifting work from collection to allocation Work done at collection unmark phase proportional to total heap size mark phase proportional to live data sweep phase proportional to total heap size large heap gt long pause times Work done at allocation constant work take one cell off free list Clever ideas 0 unmark amp sweep in allocator cuts pause time o use freespace pointer l MIMIIIIWII Allocate step free pointer to next unmarked cell Free list is eliminated saves pointer fiddling During one GC cycle collector work is proportional to live data total allocator work is constant per allocation PLUS amount of live data at last collection Most cells garbage expected work per allocation is small Mark and sweep with objects of varying size What if not everything in life is a cons cell Can t have a single free list to satisfy every request May have 0 multiple free lists of different sizes 0 free memory broken into arbitrary blocks first fit best fit 0 buddy system allocator Just like traditional dynamic memory allocators And we have the same problem fragmentation hurts locality of reference When we ask for an object we may have plenty of memory but it may be in tiny unusable pieces between allocated bits f N Beating fragmentation Good strategy called BIBOP Big Bag of Pages divide memory into fixedsize pages each page contains objects of a single type and size objects larger than 1 page typically get special tratment Advantages can identify an object s type just by its address no tag bits or descriptor word needed cons cells 33 off can put mark bits in a separate bitmap less overhead better locality can bound losses to fragmentation Stop and copy trading space for time Divide memory into two contiguous semispaces One semispace unused except for collection Allocate from the other Allocation by incrementing a pointer freep f N Stop and copy collection step When semispace is exhaused freep Only some of the shaded area is live data the rest is garbage If we copy only the live data to the other semispace we ll have plenty of room to allocate more freep Copying goes from fromspace to tospace Change of roles from one semispace to the other is a flip J Stop and copy details Collector must preserve sharing cycles Must copy each live object exactly once What if there is more than one path to an object At first visit replace with forwarding pointer don t need extra space ovenvrite object Marks object as already copied Shows location in tospace All references to object must be updated to new locati To mark forwarded could use 1 bitobject but can tell just by the value of the pointer only forwarding pointers point into tospace Must distinguish pointers updated from nonpointers not updated Stop and copy graph traversal As we walk the graph of live objects we must track objects that have been copied but the things they point to haven t been copied Compare markandsweep we had to track objects that were marked but things they pointed to hadn t been marked For markandsweep we used a stack depthfirst traversal For copying we use a queue breadthfirst traversal Use tospace as the queue at no cost Need only two pointers for copying collector freep points to next free location in tospace scanp points to next object whose referents not copied Invariant Objects before scanp point to tospace Objects between scanp and freep point to fromspace N Stop and copy algorithm Basic operation is to forward a pointer forward p if p is forwarding pointer then return p else copy the object p to location freep p freep freep freep 1engthp return p Copies a single object or its forwarding pointer To copy everything start with roots then consume queue from scanp to freep scanp freep beghn ngoftospace for each root r do r forwardr while scanp lt freep do thbebnghofscan for i 0 to Ll do if scanpi is a pointer then scanpi forwardscanpi KLscanp scanp L Stop and copy example lalw i 1 l l3l V i i b Properties of stop and copy Efficiency very good in limit of large memory but so is mark and sweep Crucial advantage comes from lightningfast allocator allocate by check and increment exactly the same cost as stack allocation garbage collection can be cheaper than stack allocation Can reduce cost of check two ways VM hardware cute but hard to port check bounds register check only at loops one check for many allocations Copying step compacts all data there is never any fragmentation of free memory can do compacting markandsweep too Reference counting Widely used technique in early days each object tracks number of references to it when count goes to zero can put object on free list Example value to be returned from mkassoc Email nreecs 1 Office MaxwellDworkin23l Office MaxwellDworkin Old refcounting collectors gave GC a bad name Reference counting costs Must change ref counts at every assignment the killer CPU overhead can reach 2030 various tricks used to cut down costs here Expensive to follow pointers when costs go to 0 can defer to allocation like marksweep So can get good bounds on costs good for realtime properties But ref counting can t collect cyclic garbage 56 Serious implementations need extra collector there go the realtime properties Refcounting can be tricky to get right discovered by many implementors of C classe J Generational collection Two observations about functional programs 0 newer cells tend to point to older cells always true except where there is mutation 0 young cells tend to be shortlived old cells longlived if cell kept for even 1 collection it s probably important will live for a while new cells often throwaway intermediate results eg in simple version of rev Divide heap into generations collect younger generations more frequently When collecting younger generations roots include pointers from old objects to young ones such pointers can be created only by assignment think about cons common solution track all assignments Works well in LISP systems even better in ML systems Generational example 2 generations So for example 2 generations 7 older 7 reserve tospace 7 When free space exhausted copy newer to reserve socalled minor collection older plus survivors x are new older generation survivors are promoted older 7 free 7 4 reserve tospace When older generation approaches half of memory copycollect older generation major collection copy result to beginning of heap start over Massive savings try to estimate fine exam question Copying generations easiest to understand but surprisingly can do with mark and sweep Many generations can save more keep youngest generation entirely in cache Conservative collection copying What if you don t know where roots are but you do know object layouts Mostlycopying conservative collection Bartlett Scheme compiler so treat it as a pointer Entire stack data bss segments are pointers Can t forward change pointers on the stack so objects are pinned in memory But if you know your own objects you can forward anything pointed to only by your own objects BIBOP lets you identify object addresses Conservative guess at roots may mistakenly keep garbage o surprisingly little in practice o get many advantages of regular copying collector including little fragmentation Anything on stack or in globals could be a pointer J Conservative collection mark amp sweep What if you don t know object layouts You can still garbagecollect but you can t move anything because you can t tell pointers from integers anwvhere Markandsweep to the rescue BoethVeiser Superconservative assumptions Anything on stack or in globals could be a pointer and anything on the heap could be a pointer Treat them all as pointers Works astonishingly well in practice simple replacement for mallocfree yields approximately the same CPU cost 30 150 memory overhead a few tricks can make it even better plus can write faster code by relying on GC State of the art collector Boehm et al widely used Java Cedar libscheme dovetails nicely with C C code J Dueling memorymanagement schemes Simple versions have different pros and cons Markandsweep o fragments o hard to do generational collection o poor locality o easy to make conservative Copying o needs big memories o copying large objects is expensive o identifying amp forwarding pointers can be hard o easy to make generational Stateoftheart collectors resemble one another ever more closely Asymptotic behaviors are all the same Important practical aspects of performance are constant factors locality of reference K 40 Plus Ultra There s more Memory management a hot area in research incremental collection concurrent collection realtime collection collection of persistent store many strategies for better performance 41 CSSSUEB r Amman Imelhgence Lecture 4 082602 Neural Networks Inductive L earning Emblem 1221mm x x2 X n Threshnlds as weight n R n 2 3 m m m m M eamme wengS andmveshmds eamme we gms Percepcrnhlearhlug algnrithm V has no local rhrrurha wlll converge on a set ofwelghts that correctly classlfles the example provlded the classes are llnearly separable f Jth feature value oftramlng examplel class class oftramlng examp e asslgn each vverght vv arauolorh value repeat 0 gerrl W2fr2 for each vverght olo epoch vv vv c class e o alpha or ms the learmng ratesrhall cohstaut all reached t at mtsclasstttceuons t ot Spa2h Steepestgradlent olesceht Conslder a perceptroh vvrthout the nal thresholdlng E rnr ruhctlvu Error 22 class 7 O V2 2 class 7 wf e wzfz r 2 gradlent olesceht oh the vverghts vv vv 7 u Erroraw olErrorolvv 2 class 7 0 olclass e o olvv 2 class 7 0 11 Systemlevel Interconnects PCI PCIX PCIExpress HyperTranspcrt RapidIO Advanced Switching Systemlevel interconnect PCI 7 dominant standard standard includes physical interface signalling and readwrite data semantics 7 address data PCI around since early 90s originally Intel backed by PCISIG PCI 32 or 64bit 33 or 66MHZ parallel shared bus synchronized clock signal scalability in terms of of deVices limit on physical distance clock skew electrical and speed limits due to bus loads reliability one misbehaving guy on shared bus limitation drive domain targeted solutions eg graphics ports chiptochip busses loadstore readwrite transactions address data phase with address decode happening in first 7ns for 33MHZ 3ns for 66MHZ PCIX PCIX still a shared parallel bus 64bit 1332665331066 MHz fully compatible with PCI physically signals systems software weakest device on the bus Will set performance limits signal in registers to extend decoding time budget but faster clocks so latency increase not necessarily clock seed throughput additional QoS error correction codes linklayer device ID messages split transactions reordering of transactions attribute eld with byte count gt better buffer management PCIEXpress PCIe HP serial IO interconnect packetbased serial switched PCI Express Links not shared but pointtopoint Upstream port 0 Software compatible with PCI but not otherwise Pc39sfv i t r ss Logically strict hierarchy 39 Sugggg39ggt of PCI bridges just like before Dovglfr tgfm target domain system level 105 1005 few thousands nodes not very dynamic topology 0 one host amp root switch deVicetodeVice comm PCI Express Links PCIe layers PCIe Standard speci es layers Physical lane RXTX pairs 25Gbps with 8b 10b encoding with code words for delimiters and control 1 2 4 8 12 16 32 lanes reaches up to 16Gbytess in each direction Con gurable Widths 32bit aligned data fewer pins gt less power space Data Link managed by 64b DLLP synchronize link acks credit info linklayer CRC and checking and retry if necessary Transport TLP sequence header CRC data of maximum 4kB header speci es operation options split transaction messages Virtual channels and traf c classes mapping not xed creditbased ow control on per VC will not starve for data Originally 3GIO Arapahoe Group Advanced Switching PCIe single host not for peertopeer and multiprocessing PCIEXpress AS gt into networking domain arbitrary topologies increased reliability AS routing model protocol agnostic shared physical and link layer with PCIe but can encapsulate other traf c header routespeci cpart protocolspeci cpart AS uses rst part only unicast sourcebased routing provide encoding of path can backtrace route to source multicast destinationbased routing support number of mcast groups 011100111 001 01110011 encodes path turnpointer keeps track of status bits depend on ports Turnpool 011100111 Turnpointer Forward Routing Congestion management switches detect buffer thresholds and can act upon ForwardBackwardLocal Explicit Congestion Noti cation LECN SBFC 5 011 2 full Standard Bus Width Clock Transfer PC123 32 Bit 33 133 MBs MHz 266 MBs 66 MHz PCI 64 64 Bit 33 266 MBs MHz 533 MBs 66 MHz PCI X 10 64 Bit 66 533 MBs MHz 800 MBs 100 1066 MBs MHz 133 MHz PCI X 20 64 Bit 133 2132 MBs DDR MHz PCI X 20 64 Bit 133 4264 MBs QDR MHz PCI Express 1 Lines 8 25 512 MBs Bit GHz PCI Express 2 Lines 8 25 1 GBs Bit GHz Duplex PCI Express 4 Lines 8 25 2 GBs Bit GHz Duplex PCI Express 8 Lines 8 25 4 GBs Bit GHz Duplex PCI Express 16 Lines 8 25 8 GBs Bit GHz Duplex PCI Express 32 Lanes 8 25 16 GBs Bit GHz Duplex RapidIO and HyperTrasnport RapidIO Motorola HyperTransport AMD mainly embedded and communications market HyperTransport also chiptochip in AMD server systems Xbox switched packetbased originally parallel now serial tradeoff is pins and range vs serializationpacketization logic amp latency link error and ow control in hardware shared distributed memory coherency protocols requesttransaction queue for QoS management RapidIOHTBridgetoPCIquotlt Hypertransport 3 0 Up to 26GHz clock rates lGHz Up to 416GBs aggregate rate 8GBs RapidIO Parallel and serial up to lOGbps Design geared towards communicationsstreaming apps Network Distance Estimation Sridhar Srinivasan Motivation Network Distance Estimation p 2 Network Distance What is network distance 3 Latency 9 Why do we need network distance Identification of nearest servers for games downloads Locating peers in an overlay network Why not use ping or traceroute Need low overhead 3 Cannot estimate distance between arbitrary hosts Network Distance Estimation p 3 Techniques Infrastructure based Require some infrastructure to be setup Estimate distance based on delay measurements 9 Examples lDMaps MCoop 0 Coordinate based 0 Require only a small set of beacons or landmarks Hosts are assigned coordinates Distance is estimated as a function of coordinates Examples Global Network Positioning Internet Coordinate System Virtual Landmarks Using existing infrastructure 0 Uses existing Internetwide infrastructure 0 Example King uses DNS Network Distance Estimation p 4 Direct Measurement Based Triangulation S Hotz Routing information organization to support scalable interdomain routing with heterogeneous path requirements PhD Thesis Select N nodes in the network to be Beacons Coordinate of node H assigned as tuple of distances to each of the Beacons eg dHBNdHBQ dHBN Distance D between any two hosts H1 and H2 is bounded by and D S minie12NdH1B dH2Bz39 Network Distance Estimation p 6 IDMaps P Francis 8 Jamin C Jin D Raz Y Shavitt L Zhang lDMaps A Global Internet Host Distance Estimation Service lEEEACM Trans on Networking Oct 2001 0 Hosts are aggregated into Address Prefixes APs consecutive range of IP addresses within which all hosts are equidistant to the rest of the Internet Systems called Tracers are placed such that each AP is close to one or more Tracers 0 Tracers measure distance between each other and to their closest APs 1 Distance computed as sum of distance of each AP to closest Tracer and distance between these Tracers Network Distance Estimation p 7 IDMaps Operation WV n W W U I a Mm w W MMm 0 gt9 m Address Pre x IDMaps Clients gather distance information from Tracers and build a virtual distance map of the Internet Network Distance Estimation p 8 To get distance infomation between any pair of hosts the IDMaps Client is queried E uery AB IDMaps Operation 0 e Client runs shortest path algorithm on its distance map to compute required distance g 0 0 IDMaps Operation 0 e Results are returned to querying host o o 0A x IDMaps Operation King I K P Gummadi S Saroiu S D Gribble King Estimating latency between arbitrary Internet end hosts IMW 2002 Designed as a tool to estimate latencies between arbitrary Internet hosts Uses DNS architecture for performing queries No offline extrapolation from measured paths Recursive DNS queries are used to measure latency between pairs of DNS servers closest to the hosts Network Distance Estimation p 9 Operation N mnc Scrvcr B Name Sm u39 A mm 1 Reply Q P nddrofxyzfoobnr w p Requcsi Q Forwm39dcdp I WI Om licnl C my Fig 2 The sequence of DNS messages used by King to esti mate latency Nmam Dsunce 5mm m Potential Issues Assumes that most end hosts are close to their DNS name servers 0 Depends on Name Servers performing recursive queries for arbitrary hosts Closest authoritative server is picked using heuristics such as matching hostnames or IP addresses Network Distance Estimation p 11 Evaluation u m as an 1 25 5 m z 225 25 275 3 Hana Esmnaled LatencyMeasured Lalewny Comparison of King and IDMaps from traceroute sewers to set of web servers mm Dsunce Emma 12 Evaluation 2 ng rasr mp remuved 075 r25 r5 r75 2 25 275 3 Hana summed LatencyMeasured Lareucy Comparison of King and IDMaps from traceroute servers to wamrmzsumm 13 set of Napster clients Coordinate System Based Global Network Positioning I T S E Ng H Zhang Predicting Internet Network Distance with CoordinatesBased Approaches INFOCOM 02 Models the Internet as a geometric space Assigns coordinates to each host Distance between pair of hosts is computed as a function of the coordinates of the hosts Network Distance Estimation p 15 Landmark Operations Internet 2 D Coordinate space The N Landmarks measure the latencies to each other using ping 9 Landmarks compute coordinates by minimizing the overall discrepancy between the measured distances and the computed distances 0 This is a multidimensional global minimization problem Network Distance Estimation p 16 Host Operations 9 A host measures its latencies to the N Landmarks using ping Host computes its own coordinates relative to the Landmarks coordinates Minimizes the discrepancy between its measured distances to the Landmarks and the computed distances Network Distance Estimation p 17 Relative Error Comparison Cummnwe pman cw 15 Landmm m GNP 5 unamafx TvungmnmdlU 5 am Trmnzm mm m 5 En man Humps n lt mee Enur mm aim Swanquot 7v m Issues in GNP How many dimensions does the Internet have Placement of the Landmark nodes Network Distance Estimation p 19 Internet Coordinate System H Lim J Hou CH Choi Constructing Internet Coordinate System Based on Delay Measurement IMC 2003 9 Coordinate based approach Uses a set of beacon nodes Host measures its latency to the beacon nodes and this creates a distance vector 0 Distance vector is projected into a smaller dimension space to create the coordinates of the host Network Distance Estimation p 20 Principal Component Analysis Let D be a distance matrix of four hosts with 0 1 3 3 1 0 3 3 D 3 3 0 1 3 3 1 0 o Singular value decomposition SVD of D is given by DUVVVT 0 7 0 0 0 1 1 1 withU 5 5 W 0 W 0 5 0 0 0 0 0 1 0 1 1 0 1 0 0 0 1 2 2 Network Distance Estimation p 21 Principal Component Analysis 2 Each element of D can be expressed as Since the wis are decreasing it is possible to b q H Dij ZznzlwkUik Gk 1 S i S 77171ng m approximate D using only a few of them eg 2 instead of 4 NIH NH NH NH NH NH NIH NIH I 2W D 00 00 NIH NgtH NIH NIH 03 03 NIH NIH NIH NIH NIH NH give NIH NIH Network Distance Estimation p 22 Principal Component Analysis 3 The columns of U are the principal components and are the orthogonal basis of the new subspace o Using the first n columns denoted by Un the mdimensional space can be projected into an ndimensional one o From the example the distance vector of the first host is dz 0 1 3 3IT This is converted into the twodimensional vector NH NIH NIH NH NIH NIH I l 0 NIH NIH 0 1 3 d L Network Distance Estimation p 23 Working 9 Each beacon node measures its latencies to all other beacons 9 An administrative node aggregates the delay information to produce D 9 Using PCA the node de termines the dimension of the coordinate system and calculates Un Network Distance Estimation p 24 Working 2 I A host measures to a set of beacon nodes and obtains the transformation matrix Calculates its coordinate by multiplying the measured distance vector with the transformation matrix Reduction in dimensions may cause the calculated distance in the coordinate system to be different from measured distance This is fixed by scaling the calculated distance by a The optimal scaling factor an is found by minimizing the error Ja zrzg mgmq acj dij2 Final transformation matrix Un dnUn Network Distance Estimation p 25 Effect of Dimensions u a 08 n2 n2 n 3 r i n 4 n 4 n e r n 5 n r B 9 06 s 06 a a 9 9 a a E E s g 04 g 04 M o 2 02 o 5 25 an o 5 1 25 so number OI beacon nodes number a beacon nodes a 1 S In GNP Performance of ICS unchanged for n 2 6 Newark Dwstance Esumannni p 26 Network Tomography Christos Gkantsidis Introduction nternet Network Tomography I What is the Bandwidth Loss Rate Connectivity of the links of the network Using only endtoend measurements What is the Traffic demands between users of the network Using only limited link measurements Network Tomography Network Tomography Use a limited number of measurements to infer network link performance parameters using Maximum Likelihood Estimator Bayesian Inference and assuming a prior model Categories of problems Link level parameter estimation Topology Inference SenderReceiver traffic intensity Network Tomography SenderReceiver Traffic Intensity A D Links Network Tomography Routing Matrix 1 2 3 4 5 6 7 8 9 10 1 1 12 ab ac ad ba bc bd ca cb cd da db do 1 a gtb 1 0 0 0 0 0 0 0 0 0 0 0 2 a gtc 0 1 1 0 0 1 0 0 0 0 0 0 3 b gta 0 0 0 1 0 1 1 0 0 1 0 0 4 b gtc 0 0 0 0 1 0 0 0 0 0 0 0 5 c gtb 0 0 0 0 0 0 1 1 0 1 1 0 6 c gtd 0 0 1 0 0 1 0 0 1 0 0 0 7 oi gt0 0 0 0 0 0 0 0 0 0 1 1 1 SourceDestination Pairs SenderReceiver Traffic Intensity Routing Matrix A D 1 2 3 4 5 6 7 s 9 1o 11 12 ab ac ad ba bc bd ca cb 01 da db dc 1asb 1 0 0 0 0 0 0 0 0 0 0 0 2asc 0 1 1 0 0 1 0 0 0 0 0 0 3bsa 0 0 0 1 0 1 1 0 0 1 0 0 Links 4bgtc 0 0 0 0 1 0 0 0 0 0 0 0 5cgtb 0 0 0 0 0 0 1 1 0 1 1 0 C 6csd 0 0 1 0 0 1 0 0 1 0 0 0 7dgtc 0 0 0 0 0 0 0 0 0 1 1 1 B SourceDestination Pairs 39 A Routing matrix X Source Destination transmission vector Unknown Y Link traffic Network Tomography OriginDestination Literature SourceDestination Traffic Estimation Vardi J of the Amer Statist Assoc 1996 Bayesian Inference on Network Traffic Using Link Count Data Tabaldi and West J of the Amer Statist Assoc 1998 TimeVarying Network Tomography Cao et al J of the Amer Statist Assoc 2000 Traffic Matrix Estimation Existing Techniques and New Directions Medina et al ACM SigComm 2002 An InformationTheoretic Approach to Traffic Matrix Estimation Zhang et al ACM SigComm 2003 Network Tomography Link Perf Inference Literature Multicastpased Inference of Networkinternal Characteristics MINC Prolect Caceres Duffield LoPresti Horowitz Kurose Towsley Paxson Network Loss Inference using Unicast EndtoEnd Measurement Coates and Nowak ITC Seminar on IP Traffic Measurement and Modelling 2000 Unicast inference of network link delay distributions from edge measurements Shih and Hero IEEE Int Conf on Acoust Speech and Sig Proc 2001 Nonparametric Internet Tomography Tsang Coates and Nowak IEEE Intl Conf on Acc Speech and Signal Proc 2002 Simple Network Performance Tomography Nick Duffield ACM IMC 2003 Tomographybased Overlay Network Monitoring Chen Bindel Katz ACM IMC 2003 Network Tomography Topology Inference Literature Multicast Topology Inference from Measured EndtoEnd Loss Duffield et al IEEE Trans on Info Theory 2002 Maximum Likelihood Network Topology Identification from Edgebased Unicast Measurements Coates et al ACM Sigmetrics 2002 Merging Logical Topologies Using Endtoend Measurements Coates et al ACM IMC 2003 Network Tomography 9 Outline OriginDestination SourceDestination Traffic Estimation Vardi J of the American Statistical Association 1996 LinkLevel Network Inference MulticastBased Inference of NetworkInternal Loss Characteristics Caceres Duffield Horowitz Towsley IEEE Trans In Information Theory 1999 Topology Inference Maximum Likelihood Network Topology Identification from Edgebased Unicast Measurements Coates et al ACM Sigmetrics 2002 Network Tomography 10 Outline OriginDestination SourceDestination Traffic Estimation Vardi J of the American Statistical Association 1996 Network Tomography 11 SourceDestination Traffic Estimation Goal Given Ym estimate pr1 st er1 Arxp pr1 with r ltlt p Idea Use a model for pr1 eg Poisson with rates A Estimate the parameters of the model A that maximize the probability of observing Ym Other Approaches Direct measurements with NetFIow passive monitoring etc Network Tomography 12 A Simple Example X17 X27 X3T Y17Y2T 12T 110 Y10139X Model Xi Poissoni ltgtlt A 1 Find possible X Very expensive x 101 or x 012 2 Find likelihood of Y L A mg A2A2 exp Al A2 A3 3 Find A maxALQ Maybe corner 13 solution Network Tomography ExpectationMaximization EM Likelihood maximization when AEMMYAM Algorithm for finding A 1 Pick initial N0 2 Mn EX Y AW Problems 1 Difficult to evaluate EX Y A 2 May converge to nonMLE point Network Tomography 14 Normal Approximation Measure Y in K periods 2 Approximate Y NrAA K391 AAA with A diag 3 Compute sample average V and sample covariance S 4 Equate sample moments to theoretical moments Tr AA S K1 AAA Network Tomography 15 Outline LinkLevel Network Inference MulticastBased Inference of NetworkInternal Loss Characteristics C ceres Duffield Horowitz Towsley IEEE Trans In Information Theory 1999 Network Tomography 16 LinkLevel Network Inference Goal lnfer network link characteristics like loss rate delay distribution etc Idea Collect endtoend measurements Assume a known topology b model for network behavior Identify network parameters that maximize the probability of the observed measurements Other Approaches pathchar traceroute clink NetwgrtGomography 17 Model Logical Multicast Tree 0 Source Bernoulli losses with Link 39 probabilities d SUCCGSS 1 Temporal dependence PFObablllty gt Slow convergence Spatial dependence gt Error proportional to dependence 4 El 7 Receivers Network Tomography 18 Model Logical Multicast Tree 0 Source Q set of outcomes Link 39 ie subsets of Success 1 receivers received a Probability probe packet n number of probes nx of probes with outcome er or 4 Find a to maXImize 4 a i pim1mnaprr Receivers x69 Network Tomography 19 Solution Methodology Compute or to maximize 1 n 132 m oz H paoz 1169 Using Maximum Likelihood Estimators Properties Strong consistency Asymptotic normality Asymptotic unbiasedness Asymptotic efficiency Network Tomography 20 Convergence of Inferred Loss Probabilities link loss probability xii5395 35 a r 12 31 Error 50001 after 2000 observations 0 200 400 600 800 1000 1200 1400 no observations Network Tomography 21 Effect of Topology Branching Ratio I Tree Depth I I I I I I I Iquot I FiIIII I 39 i l 22I g 39 is 39 39 iiII Iii 39 quot5 39 I 39 ii391quot 8 I 8 3 12quot C 39 C IUi IIIII39 quot 2 i E i cgtU II gt IIII 39 x gt iii 6 2 E II39a IIL39 I I I I I I I I I III I I I I I I I I I I IL quot2 Li I39I39 Its I39Ii It M 1 1 391 IL quot2 II na ilj Itii I39LT n ILquot I LOSS Ratio LOSS Ratio Variance decreases Variance increases with with branching ratio tree depth 22 Outline Topology Inference Maximum Likelihood Network Topology Identification from Edgebased Unicast Measurements Coates et al ACM Sigmetrics 2002 Network Tomography 23 Topology Inference Goal Identify the tree topology connecting a single server to multiple receivers Idea Use a monotonic increasing function of the number of shared links to two receivers eg delay loss etc MLE for topology identification 7 argmaxTEJrcmaxvgg P KW TD Other Approaches traceroute AS map mtrace etc Topology inference more expensive But works without router support Can identify Layer2 devices Network Tomography 24 Sandwich Probe p2 Ci 0391 Idea 91 39 Packet q introduces delay Size qu gtgt Ad between p1and p2 Size of I01 I02 Ad oc shared path 0 gt1 Node 3 measures Ad Advantages Ci I 92 Every measurement is important gt Fast No clock synchronization Network Tomography 25 Measurement Collection For each ordered pair of nodes ij K measurements of delay difference Adijfor k 1K Compute sample mean and sample vanance 1 K Ak 2 1 K Are 2 33237 E Z m 023939 g 2 zyj k1 k1 Asymptotically xij is Normal Network Tomography 26 Maximum Likelihood Topology lden oa on Find 2 2 7 argmaXQ Ef p X Z Assurne delay at IinK l of 239 is Ml Maximize L 13 2 D E Iogp QUIT Prior rnodel 2 Sci j N N Z Ml Oijj ZESi j Penalize trees with rnany links with A a user defined parameter Network Tomography 27 Finding the best tree Optimize Birth Move LA ET E logpa 239 2 239 n 2 Number of trees T is HUGE Instead of exhaustive search for 2 use a re versible jump Markov Chain Monte Carlo 1 Start at state QBMO 2 Propose a rnove to a new state 22Mi1 D th M ea ove 3 Accept the proposal with probability pro portional to the ratio of the likelihoods o1 Delete the two states 2 I 4 Continue and store for each state visited its likelihood O O O I Network Tomography 28 Topology Inference in ns 2 Link CaPaCity cess capacity low Leaf nodes Receivers 1 39 V4 Network Tomograph 1 29 Topology Inference in ns2 Network Tom ography Percent correct Effect of Number of Probes Light Utilization Utjtization 3000 5000 7000 9000 Number of probes Topology Inference in ns2 Effect of Penalty A 9 80 o lgmko 0 56 40 63 u a 20 O 0 2 4 6 8 IO Penalty Network Tomography 31 Topology Inference in the Internet Traceroute Topology Inferred Topology 1mm Mm Missing Layer 2 device Berkqu Emma U W1Illlln l5 MEL Rite Earlaeley ST Llslma mm mm Hg Fm my my Network Tomography 32 Summary Areas of interest OriginDestination Traffic Matrix LinkLevel Network Inference Topology Inference Pick solution with highest likelihood according to a predefined model Numerically difficult problems Standard tools MLE Bayesian Inference EM MCMC Network Tomography 33 Classes Notes for AI 91802 Production Systems As AI Technology Associative knowledgeiBeard 9 must be a man Associative knowledge is ofthe form iflt al a2 angt then lt cl c2 cngt antecedents consequents Not using cause and effect rather ant consq 4 State Actions What if we make a robot to pack groceries It might have knowledge of the form Ifyou have a big bottle NO bag With associative knowledge you can use a state to determine which action You can also capture info about cause effect effect cause Working Memory Read Write Rule Base Rule Selector Rule apply Q1 n A Ql If X has hair then X is a mammal Q2 is a mammal and X eats meat then X is a carnivore Q3 g g g g cud g g ungulate Q4 carnivore has tawny color and X has dark spots then X is a tiger Q5 ungulate X has white color and X has black strips then X is a zebra Eat Tawny Dark Carnivore Tiger half meat Color spots Mammal A A A A Go through the r les and test each Q Activate Rule Ql Now nd subset that matches 9 Question Why go through all the Rules In a reaction system it s important to select only one rule Rules can have con ict when more than one rule is applicable Need Con ict resolution Control of action selection is important You need control knowledgehelps agent decide what to do next Several control strategies Invoke rules in parallel Invoke one rule Control Strategies 1 Apply the rst rule 2 Apply all rules that apply 3 Select one rule from all that are applicable Reactive systems For a class of problems one strategy may apply Divide problems by the application strategy Choose the better strategy But what is better Maybe ef ciency Quality of solution Mycin Diagnosis of bacterial organism Rules If X type is primary bacterium and the susp Entry point then X is Tau r IMurruqIu mm 1 Pom Thquot 1 Mwwm bhza a C Flumnm Rama 8 B him4 yum c Ari a AA A 7 W 7 WM WWW W 5W L 7 quot Ww v mum Alma quPFyl a 1ngan wAr ammuag Tu Tsumwmtbhwuv swwlt Mg AlcII qu 7m 4 IMO unsunsm 14 mm 13 um murmur MA7 Rum MW van 7 thqu luluAmos 7 7 kv y 3 7 unU 197W PM BALMMLIM T umwucw39ugs runway szamLI4Lnyf to n was i leilcnayl Vt 4 1lu5 22 W W39ZW T W WLul39rrn X U 7 7 r I Jam47 HIV5391 I 544 71 MM cum an My mm mquot quotquot55 M g gw ut Pug mm zygulbw CSSSUEB r Am mal Intelligence Lecture 4 082602 Neural Networks Inductive L earning Emblem 1221mm x x2 X n Threshnlds as weight n R n 2 3 m m m m M eamme wengS andmveshmds eamme we gms Pereeptrnn learning algnrathnr V has no 1oea1 rnrnrrna wru eonverge on a set ofwexghts that eorreetly elassr es the example provrded the classes are hnearly separable 1th feature value oftrammg errarnpler classr class oftrammg example r assrgn eaeh werght w arandorn value repeat on gwrfr1 szrz for eaeh werght 1 do w w u e1assrr o epoeh reaehed t ot neetassttteaons t ot Spa2h Steepestgradrent deseent Consrder a pereeptron wrthout the nal Lhresholdmg Errnr functinn Error V2 class or2 2 ElaSS W11 W7f2 r gradrent deseent on the werghts ww d dw dErrordw elassrr or dlte1assr og dw 2 classr oo in 5A OLTP Monitors CS8803l ware Concepts and a TP Monitor Message Manager Request Control Transaction Server QR Display Transaction Server Page 1 Mor ritor Definition 0 System supporting transa between 7 client programs initiate user s req services 7 application servers process client requ interact with resource managers 7 resource managers RMs manage recoverable application data miter Features 0 Traditionally monitors ha ad the following features 7 Performance Handling many simultaneous users 7 Availability Providing online data access Running for long periods of time Handling high demand for particular data 7 Data Integrity 7 Security Authenticating users identity Authorizing requests for services Page 2 Client FrontEnds Display Load balancing over replicated application server Replicated Application Sewers Display Display Display Display Nelwistration Dynamic reconfiguration Centralized logging of information exceptions perforrna audit trail etc Dynamic Reconfiguration Display E monitor server Display Display Page 3 MQQOS Problem g Client 1 Client 2 g Process 1 Process 2 o Client n Process In SERVER Problem Too Many Pr 0 Adverser affects OS 0 Too much processor context s 0 Consumes too much memory 0 If processes aren t pinned in memo transaction invocation many need pagi IO 0 Distribution adds mor remote login 0 Hard to control load except by de activating clients 0V6 e processes Via Page 4 we itor Solution Client 1 RFC Process 1 Multithreaded Control Process Process 2 o Client n Process In SERVER 8 Thread 0 A thread is a savearea to retain o A multithreaded process manages thr for many clients 0 This greatly reduces the number of proce o m is proportional to of activetransactions activeclients so In ltlt n 0 Threads can be implemented by the TP monitor fast or the OS 0 If implemented by the TP monitor then the control process must not do synchronous IO Page 5 Wes of TP Monitor 0 Multithreading gt Few p cesses 7 low OS overhead 7 less processor context switching 7 less memory overhead 0 Multithreading gt 7 easy to manage load by controlling m 0 RFC gt 7 easy to program distributed applications 0 Transactional RPC gt 7 easy to program distributed transactions TPMonitor API CICS STDL 0 There are some API stan o Richer feature set 0 Usually compiled 0 Can put applications and DBMS in s processes for better protection 0 System management scales better for a large number of applications Page 6 DistT39rbu ed Transactions 0 DB server only support mageneous distributed transactions ones that access product 0 Still need a TP monitor s transaction transaction can access 7 two or more DBMS or TP monitor products 7 recordoriented files 7 queues The potential advantages of proprietary dist d transactions are performance and avoiding a T monitor for simple applications The est Abstraction 0 A request is a message as 39 the system to run a transaction 0 TP monitors have it DB servers 0 Supports queued requests gt priori load balancing and recoverable queue 0 Device address in request message supp geographic entitlement callbacks for conversational transaction and pseudo conversations Page 7 m nable Workflow 0 Request messages correspon steps of work ows work ows or 0 Requests carry persistent parameters work ow steps 0 Requests can support automated compens transactions 7 Each request type has a program for the forward and one for compensation 7 If a multi step work ow dies the system can automatically run compensations fs er Architecture 0 Middle tier does 7 dynamic routing 7 parameterbased routing like partitio DBs in a DB server 0 Reduces the number of clientserver ses multiserver configurations 0 Supports queued requests Control Process Client Page 8 Applic ion Management 0 Partition applications indepe ent of the DBs they access o Prioritizing applications 0 Applicationbased load control and secu 0 Dynamic installation startup and shutdow applications 0 Some TP monitors offer a lock manager and l manager for developing homegrown resource managers eg Transarc s Encina Fzru Tolerance 0 Some DB servers don t do au recovery unless there s a hot bac 0 Some DB servers don t have automat failover if a server fails 0 These automated recovery features are in monitors Page 9 Prod ct Status 0 Modern TP monitors 7 Gray and Reuter s Architecture 7 An example Encina 7 Tuxedo and TopEnd 0 Other TP systems 7 ClCS 7 Others Overview 0 Portability and ef ciency 7 Minimal use of OS facilities 7 Specialized scheduling program 10 memory management 0 Several implementations 7 Original ClCS on MVS and other IBM 0 7 ClCS6000 based on Encina 7 Other variants OS2 Page 10 W544 eroperability 0 Resource manager interfa 7 Started from within the address 7 Support for protected RMs 7 XOpen DTP interface 0 Distributed TP 7 MultiRegion Operation w shared memory 7 LU 62 for networking SICS DTP 0 Transaction routing 7 An input message may be relay 0 Function shipping 7 Operations converted to TRPC and exe remotely 0 Peertopeer communications 7 Based on LU 62 TM to TM conversations Page 11 o Transactional RPC RPC failure transaction abort 0 Log manager recovery data 0 Recovery manager log player 0 Lock manager concurrency control 0 Structured File System RM Page 12 Encin 0 Positive points Evaluation 7 Logical component modularit 7 Transactional RPC 7 Nested transactions questionable val 7 DCE portability functionality 0 security portability multithreading RPC 7 Excellent callback mechanisms 0 Negatives 7 Performance problems on DCE Pro ms with DCE 0 Mixed bag of tools not W 0 Poor documentation 0 Numerous tradeoffs variables ha estimate beforehand 0 No multithreading debugging tools C 0 No single management interface 0 DCERPC does not support msg queuing Page 13 USDNoveII Tuxedo 0 Transaction monitor Sys 7 Admin application develop to 7 TM communications 0 Database management SystemD 7 Less successful 0 Queue manager SystemQ 7 Reliable queues o Positives 7 Widely ported to many Unix pla 7 Relatively good performance 7 Many development and operations tool 0 Negatives 7 Primarily Unix socketbased communication 62 mainframes Transport Layer Interface 7 lacks multithreading support 7 weaker security amp in exible load balancing Page 14 aijti39i wtg v ampGeprgiaurn Pretenuri ngi For Java Appeared in OOPSLA 2001 Blackburn SM Singhai S Hertz M McKinley KS Moss JEB Architecture and Language Implementation Laboratory Department of Computer Science University of Massachusetts Rodric M Rabbah Georgiatims m cre ill Outline 0 Overview and Introduction Background Novelty and Contributions Summary of results 0 Gathering And Generating Pretenuring Advice 0 nDepth Performance Analysis 0 Summary Rodric M Rabbah Georgiatlmsz lmucs ill Overview 0 Garbage Collection GC is an automatic technique that reclaims storage reserved for unreachable data Implicit storage management reduces memory leaks and data corruption 0 Efficient and effective collectors are key to overall performance Growing use and popularity of Java where GC is required 0 Need to improve collector performance by reducing associated costs Focus on Generational Copying Collectors Proposed technique also generalized to Older First Collectors to demonstrate wider applicability Rodric M Rabbah ill Geprgiallmsrg lmucs GC And Generational Copying a GC in general consists of three phases Identify reachable live program data objects Copy live objects into new space Incurs the greatest cost Reclaim vacated space 0 Generational Copying GC incur lower copying costs Does not copy all live objects in the heap Partitions heap into agebased generations of objects Newly allocated objects are placed in youngest generation Often referred to as the nursery Only the nursery is collected Live objects are copied to next older generation Older generations are collected when necessary Rodric M Rabbah Georgiallmsz mucs CEGI39IWQDQQM Ill Object Pretenuring o Pretenuring allocates some objects directly into older generations lf objects are longlived then strategy avoids copying data from the nursery 0 Ideal pretenuring would inform an allocator of the exact lifespan of a new object Allocator would select the ideal generation for the object Collector only considers unreachable objects Avoids unnecessary copying Not feasible Rodric M Rabbah Ill Georgiallms lmucs Common Pretenuring Strategies o Tractable pretenuring advice is gleaned from application profiles and program analysis 0 Two possible strategies for dispensing advice Per allocationsite Results in accurate predictions for ML and Java programs Predictions are robust over different input workloads Per callchain C programs require context information a Previous work pretenures an object if it can be determined that It Will surVIve the next nursery collections Pro le based classification of objects as longshortlived Advice is GCspecific Rodric M Rabbah Novelty And Impact o Threelevel hierarchy of object classification Based on two object lifetime statistics lifetime a measure of how long an object lives in bytes of allocation time of death point in allocation history when object is no longer reachable Immortal objects have a time of death close to the end of the program Allocated in a separate permanent space that is never collected Shortlived objects have a lifetime below a chosen threshold Allocated in the nursery All other objects are considered longlived Allocated in a 2nd generation heap partition 0 Pretenuring advice is neutral with respect to GO algorithms The same advice is used to improve two distinct collectors Rodric M Rabbah Georgialimsz imucs J Other Contributions o Buildtime pretenuring Advice from different applications that share allocation sites is combined Benefits libraries and internal JVM classes 0 Combination of buildtime and applicationspeci c pretenuring to deliver the best performance Rodric M Rabbah Ill Georgiallms lmuce Results Summary o Buildtime pretenuring improves the Jalapeno JVM performance by an average 8 Benchmarks are profiled and their advice combined Performance of each benchmark is measured Information from the application to be measured is omitted o Applicationspecific pretenuring alone improves performance an average 35 0 Combination of strategies reduces GC time 20 to 30 Overall execution time reduced 7 on average 0 Larger heaps reduce the impact of GC costs on execution time Pretenuring still improves GC performance Rodric M Rabbah Georglleat mm a amp And Generating Rodric M Rabbah ill G A iallrnsfg lmtlca C1bchrmetl egjy Object Lifetime Profiling o A program executionprofile records all allocations and deaths To obtain accurate death information a nongenerational full heap collection is triggered once every 64Kb of allocation object o For each object its age and time of death are determined Max live size and total allocation size are also recorded and used for normalization and classification purposes The max live size is a theoretical minimum memory requirement ie the maximum size of live objects in bytes Rodric M Rabbah Georgiallms lmucs J Object Classification o For each object allocated at a given site it is categorized as shortlived longlived or immortal if an object dies more than halfway between its time of birth and the end ofthe program then place object in the immortal bin else if an object s age is less than 0 x max live size then other objects J l l l L O Dd l l 2 3 4 5 E T abject age relative to max live size place object in the short bin 1 else 7 abject age relative lo total all t ati n place object inthe longbin 5 1D 0 04 0 g 391 E I 39 39 7 o Immortal objects will EM 397 5 never be copied and g 5 result in lower space D j 1 immartai 4 requirements E M j x E a a3 In contrast space is f Ln3 reserved for copying EM l 0 l 2 E i 1 g A an Rodric M Rabbah ijacl time at death regime In max live 5129 Allocation Site Classification a An allocation site that allocates a fraction 8 of short lived objects L longlived objects and immortal objects is categorized using a threshold H ThesiteisshortifSHgtLl ThesiteislongifSLHgtl All other sites are immortal a To combine data from different program executions respective callsite bins from different traces are normalized and combined Rodric M Rabbah 53an gt 0 com 202838 qunEoo ucm 8236 m6 can and n I ucm Nd u a I E59 95 mozm 58 was 3 0 I 3 1 33 ma m d m win 1x mnmm Hm m 33 3 229 Hc nzxe wamw 3 m THH Indquot 23 Ream xv 3 SF xmuxm Imam n a hm mu HS 512quot HT 721quot 35 13 E 3 if Edwin menu 2 Es mm m mimm 3 gas 3 m Hm 51 arm quot5x 73 E a 3 m m Ewan an 2 3 Ema mm 1 ENE xwzmm imf hxcn m x w 321243 mam a mmrn a 2 3 52mm w 3 3 2 35 35 Eh H E 3 1 mm E ii in 55 15 nanxcmm 6335 can u Tim 2 m mm m UH 13 m mg m 332m niz hz Emmi 3 3quot 9 16m new 95 7133 r mm HEM 5 mi 33 enmi mnm 9553 n HE Magma E 22 it 25 5 LaEHE an 2 ME 3553 Eh 32 F m 3 3 53 2 2 mm 2 3 2 a w anew 53 35 E 29 h w 72 En ESE 13 SE w in 3 I k F32 3i a 3 a n 3 a 5 31 cad 23 3cm 2 f in 3 52 92quot nd 5 me 2 3 33A 2 2 3 2 ii 3 52 quot4 3 2 22 quot3 E13 31 mi 2 5 EH 23 235 3 2 m m a 22 3 3 2 12 m1 3 4 25 5 2 532 3 2 2 in r 2 23 n r 3 45 E mmm m E E n 25 a E 53 3 E E 3 n mumm 35H 25 HER Ext 5E E Eu ELEEE MEG EH 2 7 2514 F 5 e 9 35 quot52 we mcixmcoEm uwuzntu u 95 En 830 mwomb gtgt1 BEL 8 9 wmownmw mwzm wEom Loquot 53 2min m 9m wm m an 830 ho 5935 bEEmoEoI gm mo Georgiallrr mug Jhlg bchw Classification Accuracy 0 Advice may be categorized as neutral bad or good with respect to nonpretenured status quo Neutral advice allocates objects in the nursery Bad advice wastes space as objects are wrongly allocated into a longer lived region than appropriate Good pretenuring reduces copying without wasting space quot6 good quot6 neutral 01 quot6 berd it benchmark lt ii gt lt2 015 gt lt 1515 gt lt 575395 gt lt 05 gt lt imst gt lt 5013 gt lt2 mix gt 5 501 gt jess 416 01 00 473 39 58 00 04 07 javac 106 370 155 237 81 31 16 03 02 jack 296 76 08 462 78 56 13 02 09 health 255 35 17 420 195 58 14 03 03 pBOB 377 33 68 339 49 40 25 60 09 average 290 103 50 386 89 48 14 14 06 Rodric M Rabbah Georgllngat mmga Ag IJ Rodric M Rabbah Pretenuring Reduces Copying o Ratios of bytes copied mark to bytes allocated cons Rodric M Rabbah Mario39mns ratio relative to nonPT 100 it 95 lit 90 it 85 IE 80 Wu 3395 quotit 3 0 tit 65 60 quotIE 55 its Geometric mean for all henchmaiks up to 33 reductions in copying Application PT Build time FT e 5 build FT 125 15 2 25 3 Heap size relative to minimum heap size lag ill Heap Usage Over Time 218javact 30MB heap no pretenuring 213javac 30MB heap build amp application pretenuring 15MB Gen1GenD 15MB 1Gen Gen 1 Immortal Gen 1 14 MB 14 MB D 1 Ill 03 E 12 MB g 12 MB 2 2 8 10 MB 8 10 MB 0 CL 3 U 3 8 MB 3 3 MB 3 5 m 6 MB in 6 MB E E 2 2 g 4 MB g 4 MB 2 MB 2 MB 0 MB 0 MB 0 MB 125 MB 250 MB 375 MB 500 MB 625 MB T50 MB 0 MB 125 MB 250 MB 375 MB 5430 MB 625 MB 750 MB Time allocationl Time allocation 0 The top line shows the total heap consumption immediately before GC 0 The second line shows the heap consumption of the older generation before GC 0 The bottom line shows the immortal space consumption Always zero without pretenuring o The lowest points in the second lines are similar This shows that pretenuring does not allocate immortal objects inappropriately 0 Note the shape ofthe four troughs Without pretenuring the bottoms are flat indicating that there is no direct allocation to the older generation Rodric M Rabbah Time Spent Collecting Garbage o Pretenuring reduces copying costs and consistently reduces GC time Rodric M Rabbah Nerm alized GC time 110 100 1 90 80 3392 7U 60 50 Geometric mean far all benchmarks Applica tien F39T Build lime PT 9 amp buiid 1 125 15 2 25 3 Heep size relative m minimum heap size leg PT Summary New mechanism for collecting and combining pretenuring advice 0 Novel and generalized classification scheme finds many opportunities to pretenure objects and reduce copying o Profilebased applicationspecific pretenuring and buildtime pretenuring offer the best performance Rodric M Rabbah 20 Wednesday March 13 2002 Note for reading papers 1 draw the picture that they should have of what they did 2 2 Work from there Hamlyn 2E ll work Q63 entries kg slots pinned areas of Virtual memory metadata 128Bmsg Usa e 1 Short normal msg 2 Long normal msg Message is composed and sent slots Receiver metadata notify Q ring buffer Work Q entry Hdr Metadata data Header slot offset Metadata index Data goes to slot offset Entry is added to notify Q Hdr Metadata data 3 a one word msg to notiny Claim Hamlyn can do scatter gather See case 2 Gather from send area Scatter multiple messages to receive areas get help from notify mechanism but only one slot Zerocopy Remote write lets you write to right place if your data is always where it should be Application layer is responsible for ACKs READ buffer management Paper doesn t actually say how to USE this Application Q send gt recV J send tail pointer back windowing protocol More probably a remote DMA read Thekkath NFS Really Looks like client Q gtserver Rewrite write Send areas contiguous in physical mem Yes then this is all easy No Either DMA has translation amp knows about page bounds OR HP phys addr has VA on 10 bus Note PCI does not do this Web Service Transaction Qinyi Wu qxwccgatechedu 100504 c54803Encc58803 I Outline Web Service Concept Web Service Protocol Web Service Transaction I Web Service Concept From W3C A software application identified by a URL whose interface and bindings are capable of being defined described and discovered as XML artifacts A Web service supports direct interactions with other software agents using XMLbased message exchanged via Internet based protocols I Web Service Concept Identified by a URI Interface and binding are capable of being defined described and discovered as XML artifacts Direct interactions with other software agents using XMLbased message exchanged via Internetbased protocols I Web Service Protocol de nedSOAP describedWSDL discoveredUDD SOAP Structure SOAP envelope SOAP header I header block SOAP body I body block Copyright Springer Verlag Berlin Heidelberg 2004 HTTP POST Transactional comm Nam of Pmadm Inpu39 parentquot 1 sazvms PROVIDE SOAP by HTTP 8 quoto 3 E g E i lt a 59va REQUESTS 1an paraan 2 HTTP Ackmwkdgtmln39 Transactional contlx39 8 3 lt a Imm pmmm I A Simple Example Get rize rltemX gt Inventory Server Prize fc Item In XML a request POST StockQu0te HTTP11 Host wwwstockqu0teservercom ContentType textme charsetquotutf8quot ContentLength nnnn SOAPAction quotSomeURIquot ltSOAPENVEnve10pe Xmlns SOAPENV httpschemasxmlsoap orgsoapenvelope SOAPEN quot ltSOAPENVB0dygt ltmGetLastTradePrice xmlnsmquotSomeURIquotgt ltsymbolgtDISltsymbolgt ltm GetLastTradePricegt ltSOAPENVB0dygt ltSOAPENVEnvelopegt mm39 chem In XML the response HTTP11 200 OK ContentType textme charset utf8quot ContentLength nnnn ltSOAP ENVEnve10pe xmlns SOAPENV httpschemasxmlsoap orgsoapenvelope SOAPEN quot L ltSOAPENVB0dygt mm39 chem m nan ltmGetLastTradePriceResponse xmlnsmquotSomeURIquotgt ltPricegt3 4 5ltPricegt ltm GetLastTradePriceResponsegt lt S OAPENVBodygt lt S OAPENV Envelopegt service requesfor l A simple implementation of SOAP service provider i 1 le enfafion invokes The service as a local call client sTub invoke SOAP engine To prepare SOAP message SOAP engine packages SOAP inTo HTTP and passes iT To an HTTP clienT ThaT sends iT To The provider HTTP engine service i e enfafion invokes The local procedure of The service implemenTaTion The rouTer parses The message idenTifies The appropriaTe sTub and delivers The parsed message SOAP roufer passes The conTenT of The HTTP message To The rouTer l WSDL WSDL specificafion absfracf parf Types operations porf Types co 1crefe parf bindings porfs services WSDL Use Case w SCIvie p owdu 39 D39 cum sink smu sue service requester client 5 I I I l serVIce provider I I I I application object service providul SOAP basal middkwur SOAP basal middkwur 50 messages Copyright Springer Verlag Berlin Heidelberg 2004 UDDI Registry Information White Pages I Includes company s name address contact details Yellow Pages I Provides a categorization based on business and service types Green Pages I Describe technical information like how a web service is invoked UDDI Registry Data Structure businessEnfify CUTIE coniucis cuiegories businessService service key name descripiion caiegories bindingTemplafe binding key descripiion ress detailed info references To iModeIs caiegories Copyright Springer Verlng Berlin Heidelberg 2004 Sfored in The UDDI regisfry service provider UDDI Scenario service requesfor SOAPHTTP Swain Inquiry API alishers API I Web service interface UDDI regisfry MAIL businessService bindingTemplafe Copyright Springer Verlag Berlin Heidelberg 2004 I Take a Break Web Service Protocol Web Service Transaction Transaction Traditional Transactions 1 Shortlived a Tightlycoupled a Correctness based on ACID properties Web Service Transactions 1 longlived a looselycoupled Compensating Mechanism 1 ACID properties are weakened BTP vsWS CT vs WS TXM a 39 4 WS Transaction WS Coordlnatlon WSCoordinatIon I WS Coordination Architecture b dimibmd commune Copyright Springer Verlag Berlin Heidelberg 2004 l WS Coordination Functions Z I Q ActivationlkqustorPMTypc OutaCowdinu39ionCon39u lsponsc 39ce39ordim i mi coordination rugis39ru39ion scivie Copyright Springer Verlag Berlin Heidelberg 2004 39ion conkx39 ia I WS Coordination Registration Ragism39ionkqustorPMTypc regime 14 th39ifiu penicipm promo service coordinator pmon service Rugis39ru39ionCoonlinuOorPortTypc coordinator Copyright Springer Verlag Berlin Heidelberg 2004 WS C Creation of a Coordination Context pmocolspacific messages XP39quot39 39P39quot39P quotTYP from penicipenv o coordinator pmoeolspacific messages from coordinator in penicipenv xennlimowmtype coordinator Copyright Springer Verlag Berlin Heidelberg 2004 WS C Coordination Chaining for Distributed Coordination XPaticipamPortType Web service in 1 i I I i xPanieipaumType XParticipamPonType I coordinator iir n 1 XCoordinatorPonType xcoordinmmmype Copyright Springer Verlag Berlin Heidelberg 2004 WS C Message exchanged in a Centralized Conversation Web service A coordinator C Web service B activation protocol coordinator coordinator registration coordinator Copyright Springer Verlag Berlin Heidelberg 2004 WS C Message exchanged in a Distributed Conversation Web service coordinator CI Web service coordinator C CC Copyright Springer Verlag Berlin Heidelberg 2004 WS Tmnsaction Relation with WS Coordination By initiating Web service to create a new coordination context By a participating Web service to pass the coordination context to another Web Service By a participating Web Service to register for a transaction protocol with a central transaction coordinator By a proxy coordinator to chain with a primary coordinator I WS Transaction Atomic Transaction Supported Coordination Types 0 Completion 2PC CompleteWithAck PhaseZero OutcomeNotification WS T Port Types Used in Atomic Transaction WS Coordination interfaces l WS Transaction interfaces Compk39ionpmicipummrype Q CompktionCoordinatorPthypu Compk39ionnmhukpmicipumwom39ype CompktionWithMCoordinatorPthyp atomic transaction Phasezmpmicipunwmrype coordimmr PMuZuoCowdinatorPortTypc ZPGarticipa ntPortTypl ZPCCoordinatorPortTypc OutcoulchificationParticipantPortTypt a mNP39fic39incrdinmponryp WS Transaction interfaces 0 needed for chaining Registrationpmicipumpmrype WS Coordination interfaces needed for chaining Copyright Springer Veriag Berlin Heidelberg 2004 WS T for Atomic Transaction Web service coordinator Web service coordinator 9 a a N an s a E a E a m 1 s a a an 5 s a gt s a an 395 a m E 3 5 a a U WS T Port Types Used in Business Transaction WS Coordination interfaces l E E WS Transacfion interfaces BusinessAgmmenoPunicipumPonT e BusinessAgmmemCowimomeype 639 o BusquotasAgreemenmmhampmeawinuvowmrype 0 business activity coordinator WS Transaction interfaces needed for chaining nqis39m39bnpmicipumpomype WS Coordination interfaces needed for chaining Copyright Springer Verlag Berlin Heidelberg 2004 mcmm mm up my um m 5H nu r mm a man 1 umcmum I 1 my wJars snglzrmug Pmcessm Mam mm mm swungmm mwgmgzmwum mmmmm a Traf c Management I traf c management con gurable coprocessor 7 h dware support for er ow queueingschedulingadmission control for up to 128k ows 7 fourlevels of control 1 8k flows rgt4k pipes 7gt 512 subports rgt port 7 hardware provides per ow structures user uses policy engines to set up tra ic gt ow relationship Exception handling I AMCC NPs built in support for exception handling by same hardwaresoftware as fast path ow through I exception process ow gt for storeand forward operations 7 exception channel buffer With extra memory 7 eg TCP ow termination assembling jumbo frames Classi cation I Onchip integrated Policy Engine with 512x68bit TCAM with weightarray and mask arrays I large TCAMs expensive gt I Algorithmic Search Engine 7 hardware based mechanism to hash to a memory block in off chip SRAM Other engines I Special Purpose Unit for data coherncy 7 ensures atomicity of shared memory accesses 7 readmodifywrite I Packet Transform Engine 7 all inout traf c passes through PTE 7 does header eld prependinsertdelete checksums 7 does not need to wait for entire frame before other processing kicks in Standard interfaces 7 GMIL SPI3 SPI4 GPIO memory controllers Systemlevel Interconnects 7 PCI PClX PClExpress 7 HyperTransport Rapile 7 Advanced Switching Systemlevel interconnect PCI 7 dominant stan standard includes physical interface signalling and readWrite data semantics 7 address data PCI and PCIX around since early 90s originally Intel backed by PClSIG 0 PCI 7 32 or 64bit 33 or 66lle 0 parallel shared bus 7 synchronized clock signal 7 scalability in terms of of devices 7 limit on physical distance clock skew 7 electrical and speed limits due to bus loads 7 reliability 7 one misbehaving guy on shared bus limitation drive domain targeted solutions e g graphics ports chiptochip busses PCIX PClX 7 still a shared parallel bus 64bit 1332665331066 MHZ 7 Jle compatible With PCI physically signals systems so Ware weakest device on the bus will set performance limits 7 signal in registers to extend decoding time budget 7 clock seed throughput additional QoS enor conecn39on codes linklayer device ID messages split transactions reordering of transactions attribute eld with byte count gt better buffer management PcLExpress FCIe manBMW mm mmle a mummy 5m mumth P013 7 layers rmrm xym SWswmnm mammwmm mm man 126812m321ms WWW mewm mummmo magma 4mm gymwk m umm u mm mm 7 m u uxKhmcu mam mm mum mum 7 mm mmmk uhm l dvanced Svmchmg Mm m M m mVrhmxl wquot High Perf ICs Myrinet Quadries Ethernet vs Ethernet Myrinet Origins based on MPP architecture multicomputer parallel network packetbased regular topology ie can gure out address mapping cutthrough routing ow control on every link asynchronous signals low error rate assumption Caltech Mosaic multicomputer Myrinet Design pointtopoint host NIs and switches ow control W stop and go signals stop and go at threshold values not at minmax to allow for some slack it s own frame format header payload arbitrary lengthl CRC at the end on entire packet interpacket gap blockingcutthrough routing Myrinet NICs LaNai chips SRAM bus originally proprietary EBus LaNaiX with PCIX DMA engine packet engine DMA engine can compute checksum on data transferred on bus programmable processor Myrinet Control Program in memory write protected by LaNai processor Host interface command and acknowledgment queues scatter gather DMA 1 vs zerocopy data transfers user space to NIC memory checksum by DMA engine packet in multiple dma transfers interrupt enabledisable on rcv automatically select address mappers in network Self configuringself healing IP multicast support Applications I sockets MP1 other I userlevel API GM of MX MCR third generation 2X2Gbps fourth 2X10Gbps 10GbE compliant LaNaiX multiple interconnects supported in rmware Myrinet GbE In niband PCIX plus propriatery SAN interface Current top 11 4800 processors 96TB Mm Quadrics Overview Quadrics Interconnection Network QsNet QsNet II Integrated global shared memory Programmable NIC Hardware support for collectives integrated fault detection and fault tolerance in network hw Currently 5 8704 processors 261TB Mm Based on Elan NICs and Elite switches Elan NICs 20mtz Thread Processor Runs Commluucation Protocols 32bil SPARfibased pcode Proc threads inputter DMA thread processorscheduling thread commandprocessor thread mo Vl39lz m mama LB kept consistent with the host no p39 39 gregistering Handshake in HW Elite switches 8 bidirectional links with 2 Virtual channels in each direction An internal l6X8 full crossbar switch 2 input ports per input link to deal with Virtual channels 400 MBs on each link direction 2 priority levels plus an aging mechanism Adaptive routing Hardware support for broadcast Path remains reserved for acknoledgement 0 Adaptive routing choose lightest path through tree a zx39 Network topology Fattree 64 up64 down level 3 fattree 16 upl6 down level 2 fattree Single Elite backplane Building blocks lent A 64L 54Domlt SWITCHES 1394 OF FL U EISECTJONAJ BANDWlDTJI 15 640 764170 39 SWITCHES mam m iUPrMDOWN sm rcHEs FULL ElsgcnoNAL u nu nu nu nu nu nu nu mx nu uu nu nu nu contiguous 0 E g multicast provided that addresses are Hardware support for collectives War of the Interconnects Interconnects status HotInterconnects 03 War of the Interconnects Supercomputing 03 Battle of the Network Stars HotInterconnects 04 OSU benchmarks everyone quotes them just not all of them Today issue is Ethernet vs Ethemot technologies Myrinet Quadrics have Ethernet compatible options Long haul solution for WideWider area connectivity Ethernet technology with protocol of oad in hardware ICs on TopSOO list top500org 1103 0604 1104 0506 0606 GigE 109 163 0 176 0 212 0 256 0 Myrinet 73 2 186 2 193 2 69 1 56 0 Quadrics 26 4 23 3 20 2 13 1 141 In niband 31 10 0 101 171 37 3 clusters 208 7 290 6 294 5 304 3 3644 HPLinpack benchmark gt T ops application miX evaluates compute power at nodes also bandwidth aggregate and pointtopoint and pingpong and collective comm latency 2 years old status 106igE Infinibqnd Myrinef SCI Quadrics NeTwor k Any LAN SAN SAN SAN SAN SAN Environmen r MAN WAN Scalability IPerouTed 9 Sourceer oufed 9 SourcevrouTed 9 quotInfinitequot of f 7 nodes Cost Per Port 35 2004 35 2005 35 Performance 916 MBs 843 MBS 250 MBs 230 MEs 908 MBs MPLTueMPI 100 us 60 us 63 us 37 us 24 us of socke r level Profocols NGTlve TCPIP RDMA RDMA RDMA or RSM RDMA TCP of oad Remote Shared RDMA over TCPIP emery Toml Cos r of 35 Ownership SCI scalable coherent interface Both 10GigE and IB cost down IVINIB latency 2us range 10GigE 6us range Understanding performance factors Microbenchmarks unibidirectional latencybandwidth collective operations barrier allreduce hotstop multicast Higher layers sockets and MP1 of main concern Actual applications Priceperformance


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Jennifer McGill UCSF Med School

"Selling my MCAT study guides and notes has been a great source of side revenue while I'm in school. Some months I'm making over $500! Plus, it makes me happy knowing that I'm helping future med students with their MCAT."

Jim McGreen Ohio University

"Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.