New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here

Intermediate Computer Systems

by: Lacey Collier

Intermediate Computer Systems CS 5410

Marketplace > Cornell University > ComputerScienence > CS 5410 > Intermediate Computer Systems
Lacey Collier
GPA 3.84


Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Class Notes
25 ?




Popular in Course

Popular in ComputerScienence

This 227 page Class Notes was uploaded by Lacey Collier on Saturday September 26, 2015. The Class Notes belongs to CS 5410 at Cornell University taught by Staff in Fall. Since its upload, it has received 60 views. For similar materials see /class/214336/cs-5410-cornell-university in ComputerScienence at Cornell University.


Reviews for Intermediate Computer Systems


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/26/15
Web Services and SOA Standards Birman C85410 Fall 2008 A story of standards 0 What s a standard 0 Historically the industry has advanced in surges 0 First a major advance occurs like rst web browser 0 Big players jump on board agree to cooperate to ensure interoperability of their products which will innovate in terms of the user eXperience but standardize internals 0 Today we re awash in standards 0 But creating a standard isn t any formula for success 0 There are far more ignored standards than adopted ones f A short history of standards 0 Some standards that mattered 0 CORBA general object oriented interoperability 0 J2EE Java runtime environment 0 NET Microsoft s distributed computing infrastructure 0 Web Services the web but not limited to browsers interacting to web servers Web services use the same standards But the focus on programs that interact by exchanging documents web pages that encode information Today Web Services are hot 0 This is the basic standard employed in cloud computing systems 0 Internet is at the bottom of the stack 0 Then layer on standards used when browsers talk to web servers HTTP and to encode those pages HTML 0 Web services run over HTTP and HTML but the web pages have their own mandatory encoding called SOAP It describes requests and responses on services 0 The associated architecture is referred to as a service oriented architecture SOA and the systems built this way are service oriented systems SOS Turtles all the way down A wellknown scientist some say it was Bertrand Russell once gave a public lecture on astronomy He described how the earth orbits around the sun and how the sun in turn orbits around the center ofa vast collection ofstars called our U alaxJ At the end ofthe lecture a little old lady at the back ofthe room got up and said quotWhat you have told us is rubbish The world is really a flat plate supported on the bacK ofa giant tort01se quot The sc1entist gave a superior smile before replying quotWhat is the tortoise standing onquot quotYou39re very clever J ounu man very clever quot said the old lady quotBut it39s turtles all the way downquot Standards all the way down 0 We re starting to see a second generation of standards layered on the basic web services ones 0 XML on the bottom web page stuff 0 Then web services on top of the web page stuff 0 Then for example the military global information grid GIG layered over web services 0 Other emerging standards nancial data centers medical computing systems etc 0 These generally adopt the underlying standard then add additional rules for using it for specific purposes Elements of the standard 0 A collection of documents that spell out the rules 0 There are a great many of these documents 0 And like many standards not all have been widely adopted 0 Vendors like Microsoft BEA IBM even Google have their own platforms implementing parts of these documents in theory the systems interoperate 0 But they also compete by innovating around the edges 397 SOAP Router Backend Processes Web Service Basic Web Services model Web Services are software components described Via WSDL which are capable of being accessed Via standard network protocols such as SOAP over HTTP 739 SOAP Router Backend Processes Web Service 11 capable of being ssed Via standard network protocols such as SOAP SOAP Router L y r 7 Y rquot Backend Processes Web Service 3 SOAP Router L l g I I r 77739 v quot w Backend Processes Web Service quot39 quot w w w described via WSDL M 5quot V39 L 3 m Web Service Web Service WSDL invoker described Web Service Web Server eg IBM SOAP WebSphere messaging BEA WebLogic A a r n BVUHSSEDU BEEwian QUEEN un wn mm macaw WM r Tumubm irmus WWW I u U ma im a im Eng iam ng JEJJQEUUUEU WM Umsmm imu Eaam immm V j a 0 39quot01 mm Emmum M MUUUJ HER7UP E UUUVBE DUEEWUDUEK UZDEJUEEWEUU YUD UZ US FEEWSUUUDUUZ How Web Services work 0 First the client discovers the service 0 Typically client then binds to the server 0 Next build the SOAP request and send it 0 SOAP router routes the request to the appropriate serverassuming more than one available server 0 Can do load balancing here O Server unpacks the request handles it computes result Result sent back in the reverse direction from the server to the SOAP router back to the client Marshalling Issues 0 Data exchanged between Client and server needs to be in a platform independent format 0 Endian ness differ between machines 0 Data alignment issue 16 32 64 bits 0 Multiple oating point representations 0 Pointers gt Have to support legacy systems too quotW Marshalling 0 In Web Services the format used is XML 0 In UNICODE so very verbose 0 There are other less general but more ef cient formats Comparing with CORBA 0 CORBA is an older and very widely adopted standard 0 J2EE mimics it in most ways 0 NET Windows is very similar in style 0 lVlOClelS applications as big quotObjectsquot that export interfaces methods you can call with typed args 0 Then standardizes various tools for managing them 0 Also provides for ways of connecting data centers over a WAN protocol of their design which runs on TCP Comparing with CORBA CORBA 0 Object centric 0 RFC remote method invocation with typed interfaces 0 Much emphasis on semantics of active objects 0 Standardizes most 00 intrastructure Web Services 39 Document centric 0 Services treated as document processors 0 But can still do RPC 0 Document defines its own needs and services try to carry them out 0 Standardizes things documents can express Remote method invocation 0 Also called Remote Procedure Call Invoke a procedure on a remote machine just as you would on the local machine 0 Introduced by Birrell and Nelson in 1985 0 Idea mask distributed computing system using a transparent abstraction 0 Looks like normal procedure call 0 Hides all aspects of distributed interaction 0 Supports an easy programming model 0 Today RPC is the core of many distributed systems 0 Can View the WS client server interaction as an RPC RPC Optimization 0 Delay sending acks so that imminent reply itself acts as an ack 0 Don t send acks after each packet 0 Send ack only at the end of transmission of entire RPC Clint Server nun I IIl IhlIulIIIhnun Illnun In request 1r 0 NACK sent when m1ss1ng Ii igum 44 RI C using a hum prt imcol sequence number deteCted how the reply is sent mun cm mgh so ha11 111 acknowledgement 10 the hum is not needed RPC what can go wrong 0 Network failure client failure server failure 0 Assuming only network idiosyncrasies for now 0 RPCs use acks to make packet transmission more reliable 0 If timeout with no ack resend packet 0 Leads to the issue of replayed requests 0 Each packet has a sequence number and timestamp embedded to enable detection of duplicates could fail too 0 What does a failed request mean 0 Network failure and or machine failure 0 Client that issued request would not know if the server processed the request or not How about layering RPC on TCP 0 Web services often not always run over TCP 0 TCP gives reliable in order delivery flow control and congestion control 0 Reliable Acknowledgments and retransmissions 0 In order Sequence numbers embedded in each message 0 Flow Control Max allowed window size TCP 0 Congestion Control the saw tooth curve 0 Ramp up as long as no timeouts o Slowstart phase exponential increase until the slowstart threshold is hit Congestion Avoidance phase additive increase 0 Multiplicative Decrease on timeout W TCP optimizations 0 Random Early Detection 39 Selective Acknowledgments 0 Fast Retransmit Recovery Back to RPC on TCP 0 TCP gives reliable communication when both ends and the network connecting them are up 0 So the RPC protocol itself does not need to employ timeouts and retransmission 0 Simpler RPC implementation 0 But the failure semantics remain the same weak W RPC Semantics 0 Exactly Once 0 Each request handled exactly once 0 Impossible to satisfy in the face of failures gt Can t tell whether timeout was because of node failure or communication failure RPC Semantics 0 At most Once 0 Each request handled at most once 0 Can be satis ed assuming synchronized clocks and using timestamps O At least Once 0 If client is active inde nitely the request is eventually processed maybe more than once NAT box 0 Overcomes limited size of IPv4 address space 0 Role is to translate a large number of internal host addresses Amazon or Google might have tens of thousands of machines at each data center into a small number of externally Visible ones 0 Can also play a load balancing function Discovery 0 This is the problem of finding the right service 0 In our example we saw one way to do it with a URL 0 Web Services community favors what they call a URN Uniform Resource Name 0 But the more general approach is to use an intermediary a discovery service Example of a repository Name Type Publisher Toolkit Language OS Web Services Performance and Application LisaWu NA Cross Platform Load Tester Temperature Service Client Application vinuk Glue Java CrossPlatform Weather Buddy Application rdmgh724890 MS NET C Windows DreamFactory Client Application billappleton DreamFactory Javascript CrossPlatform Temperature Perl Client Example Source gfinkel3 Perl Cross Platform Apache SOAP sample source Example Source xmethodsnet Apache SOAP Java CrossPlatform ASS 4 Example Source TVG SOAPLite NA CrossPlatform PocketSOAP demo Example Source simonfell PocketSOAP CH Windows easysoap temperature Example Source aOO EasySoapl F CH Windows Weather Service Client with Example Source oglimmer MS SOAP Visual Basic Windows MS Visual Basic TemperatureClient Example Source jgalyan MS NET C Windows Roles 0 UDDI is used to write down the information that became a row in the repository I have a temperature service 0 WSDL documents the interfaces and data types used by the service 0 But this isn t the whole story Discovery and naming 0 The topic raises some tough questions 0 Many settings like the big data centers run by large corporations have rather standard structure Can we automate discovery 0 How to debug if applications might sometimes bind to the wrong service 0 Delegation and migration are very tricky 0 Should a system automatically launch services on demand w Client talks to eStuffcom 0 One big issue we re oversimplifying 0 We think of remote method invocation and Web Services as a simple chain Soap RPC 39 A glimpse inside eStuffcom 39 39 Pubsub combined with pointtopoint communication technologies like TCP In fact things are even more complex 0 Major providers often have multiple centers in different locations 0 So You access Amazoncom but 0 Which data center should see your request 0 When it arrives which front end host should handle it 0 That host will parallelize page construction using multiple services 0 Those are replicated which servers will be used To illustrate look at CDNs 0 Content distribution networks serve up Videos and other web content 0 A simpler case than full scale web services but enough to see some of the major mechanisms in action 0 Used whenever you access a page with lots of images on it like the home page at Yahoo or livemsncom Basic event sequence 0 Client queries directory to find the service 0 Server has several options 0 Web pages with dynamically created URLs Server can point to different places by changing host names Content hosting companies remap URLs on the fly E g http Z wwwakamaicomwwwcscornelledu reroutes requests for wwwcscornelledu to Akamai 0 Server can control mapping from host to IP addr Must use short lived DNS records overheads are very high Can also intercept incoming requests and redirect on the y Content Origin here at Origin Server Content Servers distributed throughout the Internet Content is served from content servers nearer to the client and pushed Cached CDN Cached CDN Client requests content CS checks cache if miss gets content from origin server Cached CDN Client requests content CS checks cache if miss gets content from origin server CS caches content delivers to client Cached CDN Client requests content CS checks cache if miss gets content from origin server CS caches content delivers to client Delivers content out of cache on subsequent requests Pushed CDN Pushed CDN 1 Origin Server pushes content out to all CSs Request served from 083 CDN benefits 0 Content served closer to client 0 Less latency better performance 0 Load spread over multiple distributed CSs 0 More robust to ISP failure as well as other failures 0 Handle flashes better load spread over lSPs 0 But wellconnected replicated Hosting Centers can do this too How well do CDNs work How well do CDNs work Recall that the bottleneck links are at the edges ven if 08s are pushed towards the edge they are still behind the bottleneck link TCP performance DNS round trip TCP handshake 2 round trips 0 Slow start 0 8 round trips to fill DSL pipe 0 total 128K bytes Compare to 56 Kbytes for cnncom home page Download finished before slow start completes 0 Total 11 round trips 0 Coast to coast propagation delay is about 15 ms 0 Measured RTT last night was 5oms o No difference between west coast and Cornell 0 30 ms improvement in RTT means 330 ms total improvement 0 Certainly noticeable Lets look at a study 0 Zhang Krishnamurthy and Wills 0 ATampT Labs 0 Traces taken in Sept 2000 and Jan 2001 0 Compared CDNs with each other 0 Compared CDNs against non CDN Methodology 0 Selected a bunch of CDNs Akamai Speedera Digital Island Note most of these gone now 0 Selected a number of non CDN sites for which good performance could be expected 0 UO and international origin 0 US Amazon Bloomberg CNN ESPN MTV NASA Playboy Sony Yahoo 0 Selected a set of images of comparable size for each CDN and non CDN site 0 Compare apples to apples 0 Downloaded images from 24 NIMI machines Including DNS Lookup Time Cumulative Probability 9900 mission L 0 Client Location US H39I39I39P Option Parallel10 US Origin lntl Origin 2 3 4 5 6 7 Completion Time DNSDownloading seconds 8 Including DNS Lookup Time AbOUt one Client Location us H39I39I39P Option Parallel 10 second O on 0 0 9 p 9 m US Origin lntl Origin O Cumulative Probability 39 l a 4 5 6 7 8 Author conclusion CDNs generally provide much shorter download time CDNs outperformed nonCDNs 0 Why is this O Lets consider ability to pick good content servers 0 They compared time to download with a fixed IP address versus the IP address dynamically selected by the CDN for each download 0 Recall short DNS TTLs Effectiveness of DNS load balancing 1 September 2000 Janu an 2DU1 Lemerdowrieedtime Lomertotal me DNS rd nod nmnl smrtertatal 1irne 03 39 05 3 EB n g e b i 15 El 339 h g E e i 3 3 E g 9 a quot39 E 1 a 3 12 3 D EL 5 I I a g I 39 I Effectiveness of DNS load balancing quot quotquot W Jzmuzlrllr 2DU1 Black longer download Wrmmm Lomertotalllme tl me 3mm Blue shorter download time but total time longer because of DNS Q i a lookup g 3 5 Green samelP address E g chosen Equot Red shortertotal time I I l IIIIIIlllll39 effective September 2000 Janu an 2DU1 13 05 04 02 DJEFIH YEMEN 39039 ma a mapaedg Lemerdowrieedtime Lomertotal me DNS rammed anmn IP smrtertatal 1irne BPHJE EA emEBID mew ELDEW anemia EJepaadQ Other findings of study 0 Each CDN performed best for at least one NIMI Client 0 Why Because of proximity 0 The best origin sites were better than the worst CDNs 0 CDNs with more servers don t necessarily perform better 0 Note that they don t know load on servers 0 HTTP 11 improvements parallel download pipelined download help a lot 0 Even more so for origin non CDN cases 0 Note not all origin sites implement pipelining Ultimately a frustrating study 0 Never actually says why CDNs perform better only that they do 0 For all we know maybe it is because CDNs threw more money at the problem 0 More server capacity and bandwidth relative to load Back to web services 0 We ve seen that 0 They embody a lot of standards for good reasons 0 Talking to Amazoncom is far more complex than just connecting one computer to another many levels of choices and many services are ultimately involved 0 Even serving relatively static content entails remarkably complex and diverse infrastructure True services do much more than just hand out copies of les Relating to C5514O themes 0 We ll look more closely at some of the major components of today s most successful data centers 0 But rather than limiting ourselves to superficial structure we ll ask how things work on the inside 0 For example how does Google s Map Reduce work 0 It resides on a cluster management platform How does that work 0 At the core locking and synchronization mechanisms How do these work Trustworthy web services 0 A preoccupation of many today and a Cornell specialty 0 Not only do we want this complex infrastructure to work but we ALSO want it to 0 be secure and protect private data 0 give correct answers and maintain availability 0 be hard to disrupt or attack 0 defend itself against s oo nv A shinv etc 0 be ef cient to manage and cost effective 0 Existing platforms don t satisfy these goals Next week 0 Services found in cloud computing systems and other SOA environments 0 There are lots of ways to build them some more effective than others 0 Today we looked at standards but standards don t extend to telling us how to build the services we need 0 We ll spend a full lecture on Map Reduce 0 Recommend that you read the OSDI paper about this platform 0 1v1ap Keauce will be a focus of aSSIgnment one Designing a New Multicast Infrastructure for Linux Birman Mission Impossible 0 Today multicast is persona nongrata in most cloud settings 0 Amazon s stories of their experience with Violent load oscillations has frightened most people in the industry 0 They weren the only ones 0 Today 0 Design a better multicast infrastructure for using the Linux Red Hat operating system in enterprise settings 0 Target trading floor in a big bank if any are left on vvau Street croua computing in data centers What do they need 0 Quick scalable pretty reliable message delivery 0 Argues for IPMC or a protocol like Ricochet 0 Virtual synchrony Paxos transactions all would be examples of higher level solutions running over the basic layer we want to design 0 But we don t want our base layer to misbehave Reminder What goes wrong 0 Earlier in the semester we touched on the issues with IPMC in existing cloud platforms 0 Applications unstable exhibit Violent load swings 0 Usually totally lossless but sometimes drops zillions of packets all over the place 0 Various forms of resource exhaustion 0 Start by tr inv to understand the big picture why is this happening Misbehavior pattern 0 Noticed when an application layer solution like a Virtual synchrony protocol begins to exhibit wild load swings for no obvious reason QSM oscillated in this 200 node experiment when its damping and prioritization mechanisms were disabled 0 For example we saw this in QSM 12000 Quicksilver V 10000 E 8000 Scalable Multlcast cu go 6000 0 F1X1ng the problem g 4000 at the end to end E 2000 layer was really hard 0 250 400 550 700 850 time s Tracking down the culprit 0 Why was QSM acting this way 0 When we started work this wasn t easy to x 0 issue occurred only with 200 nodes and high data rates 0 But we tracked down a pattern 0 Under heavy load the network was dellverlng packets to our receivers faster than they could handle them 0 Caused k rn l level queues to overfl w 7 h n wide loss 0 Retransmission requests and resends made things worse 0 So goodput drops to zero overhead to in nity Finally problem repaired and we restart only to do it again Aside QSM works well now 0 We did all sorts of things to stabilize it 0 Novel minimal memory footprint design 0 Incredibly low CPU loads minimize delays 0 Prioritization mechanisms ensure that lost data is repaired rst before new good data piles up behind gap 0 But most systems lack these sorts of unusual solutions 0 Hence most systems simply destabilize like QSM did before we studied and xed these issues 0 Linux goal a system wide solution Assumption 0 Assume that if we enable IP multicast 0 Some applications will use it heavily 0 Testing will be mostly on smaller con gurations 0 Thus as they scale up and encounter loss many will be at risk of oscillatory meltdowns 0 Fixing the protocol is obviously the best solution 0 but we want the data center the cloud to also protect itself against disruptive impact of such events So why did receivers get so lossy 0 To understand the issue need to understand history of network speeds and a little about the hardware 4 NIC sends 5 NIC receives VI 1v run 3 Enqueued for I 6 Copied into a handy send mbuf fragments it 2 UDP adds heal 7 UDP queues on socket 1 App sends pari 8 App receives Network speeds 0 When Linux was developed Ethernet ran at ioMbits and NIC was able to keep up 0 Then network sped up looMbits common 1Gbit more and more often seen 10 or 40 soon 0 But typical PCs didn t speed up remotely that much 0 Why did PC speed lag 0 Ethernets transitioned to optical hardware 0 PCs are limited by concerns about heat expense Trend favors multicore solutions that run slower so why invest to create a NIC that can run faster than the bus NIC as a quotrate matcher 0 Modern NIC has two sides running at different rates 0 Ethernet side is blazingly fast uses ECL memory 0 Main memory side is slower 0 So how can this work 0 Key insight NIC usually receives one packet but then doesn t need to accept h n X packet 0 Gives it time to unload the incoming data 0 But why does it get away with this NIC as a quotrate matcher 0 When would a machine get several back to back packets 0 Server with many clients 0 Pair of machines with a stream between them but here limited because the sending N lC will run at the speed of its interface to the machine s main memory in today s systems usually 100MBits 0 In a busy setting only servers are likely to see backto back traffic and even the server is unlikely to see a long run packets that it needs to accept So normally 0 N IC sees big gaps between messages it needs to accept 0 This gives us time 0 for OS to replenish the supply of memory buffers 0 to hand messages off to the application 0 In effect the whole system is well balanced 0 But notice the hidden assumption 0 All ofthis requires that most communication be pointto point with high rates ofmulticast it breaks down Multicast wrench in the works 0 What happens when we use multicast heavily 0 A NIC that on average received 1 out of k packets suddenly m1gnt receive many in a row just thinking in terms of the odds 0 Hence will see far more back to back packets 0 But this stresses our speed limits 0 NIC kept up with fast network traffic partly because it rarely needed to accept a packet letting it match the fast and the slow sides 0 With high rates of incoming traffic we overload it Intuition like a highway offramp 0 With a real highway cars just end up in a jam 0 With a high speed optical net coupled to a slower NIC packets are dropped by receiver More NIC worries 0 Next issue relates to implementation of multicast 0 Ethernet NIC actually is a pattern match machine 0 Kernel loads it with a list of maskvalue pairs 0 Incoming packet has a destination address 0 Computes destampmaslltvalue and if so accepts 0 Usually has 8 or 16 such pairs available More NIC worries 0 If the set of patterns is full kernel puts NIC into what we call promiscuous mode 0 It starts to accept all incoming traffic 0 Then OS protocol stack makes sense of it o If not forme ignore 0 But this requires an interrupt and work by the kernel 0 All of which adds up to sharl ll higher 0 CPU costs and slowdown due to cacheTLB effects 0 Loss rate because the more packets the NIC needs to receive the more it will drop due to overrunning queues More NIC worries 0 We can see this effect in an experiment done by YoaV Tock at IBM Research in Haifa Packet loss rate What about the switchrouter 0 Modern data centers used a switched network architecture v Q L c F L11 39 F 7731 amp F 0 Question to ask how does a switch handle multicast Concept of a Bloom filter 0 Goal of router 0 Packet p arrives on port a Quickly decide which ports to Iorwarcl it on 0 Bit vector filter approach 0 Take IPMC address of p hash it to a value in some range like 01023 0 Each output port has an associated bit vector Forward p on each port with that bit set 0 Bitvector gt Bloom filter 0 Just do the hash multiple times test against multiple vectors Must match in all of them reduces collisions Concept of a Bloom filter 0 So take our Class D multicast address 2330oo 8 0 331731129 hash it 3 times to a bit number 0 Now look at outgoing link A o Check bit 19 in 0101010010000001010000010101000000100000 Check bit 33 in 10100000101010oo0001V101001ooooomoooooJ Check bit 8 in OO00001010100000011010100lOOOOOOlOlOOOOO all matched so we relay a copy 0 Next look at outgoing link B o match failed 0 ETC What about the switchrouter 0 Modern data centers used a switched network architecture P 5 F t g 0 Question to ask how does a switch handle multicast Aggressive use of multicast 0 Bloom filters fill up all bits set 0 Not for a good reason but because of hash conflicts 0 Hence switch becomes promiscuous 0 Forwards every multicast on every network link 0 Amplifies problems confronting NIC especially if NIC itself is in promiscuous mode Worse and worse 0 Most of these mechanisms have long memories 0 Once an IPMC address is used by a node the NIC tends to reta1n memory of it and the switch does for a long time 0 This is an artifact of a stateless architecture Nobody remembers why the IPMC address was in use Application can leave but no delete will occur for a while 0 Underlying mechanisms are lease based periodically replaced with fresh data but not instantly puing the story into focus 0 We ve seen that multicast loss phenomena can ultimately be traced to two major factors 0 Modern systems have a serious rate mismatch Vis a Vis the network 0 Multicast delivery pattern and routing mechanisms scale poorly 0 A better Linux architecture needs to 0 Allow us to cap the rate of multicasts 0 Allow us to control which apps can use multicast 0 Control allocation of a limited set of multicast groups Dr Multicast the MCMD 0 RX for your multicast woes 0 Intercepts use of IPMC 0 Does this by library interposition exploiting a feature of DLL linkage 0 Then maps the logical IPMC address used by the application to either A set of pointtopoint UDP sends A physical IPMC address for lucky applications 0 Multiple groups share same IPMC address for ef ciency Criteria used 0 Dr Multicast has an acceptable use policy 0 Currently expressed as low level rewall type rules but could easily mtegrate with higher level tools 0 Examples 0 Application such and such can cannot use IPMC 0 Limit h system as awh l to 50 IPMC addresses 0 Can revoke IPMC permission rai idli in case of trouble How it works 0 Application uses IPMC source Receiver one of many IPMC event 9 u P n a multlcast interface SOCket Interface H ow it we rks 0 Application uses IPMC Receiver one of many source IPMC event multicast interface SOCket Interface V W UDP multicast interface 0 Very similar With UDP 0 Socket creates a socket 0 Bind connects that socket to the UDP multicast distribution network 0 Sendmsgrecvmsg send data W UDP multicast interface 0 Very similar With UDP 0 Socket creates a socket 0 Bind connects that socket to the UDP multicast distribution network 0 Sendmsgrecvmsg send data w M I m IC ry 0 Many options could mimic IPMC 0 Point to point UDP or TCP or even HTTP 0 Overlay multicast 0 Ricochet adds reliability 0 MCMD can potentially swap any of these in under user control Optimization 0 Problem of finding an optimal group to IPMC mapping is surprisingly hard 0 Goal is to have an exact mapping apps receive exactly the traf c they should receive Identical groups get the same IPMC address 0 But can also fragment some groups 0 Should we give an IPMC address to A to B to A03 0 Turns out to be NP complete Greedy heuristic 0 Dr Multicast currently uses a greedy heuristic 0 Looks for big busy groups and allocates IPMC addresses to them rst 0 Limited use of group fragmentation 0 We v e explored more aggressive options for fragmenting big groups into smaller ones but quality of result is very sensitive to properties of the pattern of group use O Solution is fast not optimal but works well Flow control 0 How can we address rate concerns 0 A good way to avoid broadcast storms is to somehow suppose an AUP of the type quotat most XX lPMCsec O Two sides of the coin 0 Most applications are greedy and try to send as fast as they can but would work on a slower or more congested network For these we can safely slow down their rate 0 But some need guaranteed real time delivery 0 Currently can t even specify this in Linux Flow control 0 Approach taken in Dr Multicast 0 Again starts with an AUP o Puts limits on the aggregate IPMC rate in the data center And can exempt specific applications from rate limiting 0 Next senders in a group monitor traffic in it 0 Conceptually happens in the network driver 0 Use this to apportion limited bandwidth 0 Sliding scale heavy users give up more Flow control 0 To make this work the kernel send layer can delay sending packets 0 and to prevent application from overrunning the kernel delay the application 0 For sender using non blocking mode can drop packets if sender side becomes overloaded 0 Highlights a weakness of the standard Linux interface 0 N 0 easy way to send upcalls notifying application wnen conditions change congestion arises etc The quotAJILquot protocol in action 0 Protocol adds a rate quot limiting module to the Biggiizii iii Dr Multicast stack 39 Uses a gossiplike MILJiummll lim M i h f39 WW iquot quotW w i H w mHWI W mec anlsm to lgure W i i H i H i o E 153 u 1 39 I I out the rate 11m1ts I y 0 Work by Hussam Abu SB I 1 A i n m In I mum H Lu u 1 HI uh I In 1 4 01 H H WTYYI I39V39HW39II39W Iquot 39r39lI 1quot r l I Y x39 f7 I quotI quot VTI quotVI39VW my S e a 2m 43932 e 693 s aa 1888 Fast joinleave patterns 0 Currently Dr Multicast doesn t do very much if applications thrash by joining and leaving groups rapidly 0 We have ideas on how to rate limit them and it seems like it won t be hard to support 0 Real question is how should this behave i End to End philosophy debate 0 In the dark ages E2E idea was proposed as a way to standardize rules for what should be done in the network and what should happen at the endpoints 0 In the network 0 Minimal mechanism no reliability just routing 0 Idea is that anything more costs overhead yet end points would need the same mechanisms anyhow since best guarantees will still be too weak 0 End points do security reliability flow control A religion but inconsistent 0 E2E took hold and became a kind of battle cry of the Internet community 0 But they don t always stick with their own story 0 Routers drop packets when overloaded 0 TCP assumes this is the main reason for loss and backs down 0 When these assumptions break down as in wireless or WAN settings TCP out of the box performs poorly EZE and Dr Multicast 0 How would the E2E philosophy View Dr Multicast 0 On the positive side the mechanisms being interposed operate mostly on the edges and under AUP control 0 On the negative side they are network wide mechanisms imposed on all users 0 Original E2E paper had exceptions perhaps this falls into that class of things 0 E2E except when doing something something in the networK layer Drlngs big win costs little and can t be done on the edges in any case Summary 0 Dr Multicast brings a Vision of a new world of controlled IPMC 0 Operator decides who can use it when and how much 0 Data center no longer at risk of instability from malfunctioning applications 0 Hence operator allows IPMC in trust but verify and if problems emerge intervene 0 Could reopen door for use of IPMC in many settings Birman C85410 Fall 2008 State Machines History 0 Idea was first proposed by Leslie Lamport in 1970 s 0 Builds on notion of a finite state automaton 0 We model the program of interest as a black box with inputs such as timer events and messages 0 Assume that the program is completely deterministic 0 Our goal is to replicate the program for fault tolerance 0 So make multiple copies of the state machine 0 Then design a protocol that for each event replicates the event and delivers it in the same order to each copy 0 The copies advance through time in synchrony y gt State Machine Event 6 State Machine A simple fault tolerance concept 0 We replace a single entity P with a set O Now our set can tolerate faults that would have caused P to stop providing service 0 Generally thinking of hardware faults 0 Software faults might impact all replicas in lock step 0 Side discussion 0 Why do applicationsfail Hardware Software Sidebar Why do systems fail 0 A topic studied by many researchers 0 They basically concluded that bugs are the big issue 0 Even the best software coded with cleanroom techniques will exhibit signi cant bug rates 0 Hardware an issue too of course 0 Sources of bugs 0 Poor coding inadequate testing 0 Vague speci cations including confusing documentation that was misunderstood when someone had to extend a pre existing system 0 Bohrbugs and Heisenbugs Sidebar Why do systems fail 0 Bohrbug 0 Term reminds us of Bohr s model of the nucleus A solid little nugget 0 If you persist you ll manage to track it down Like a binary search Sidebar Why do systems fail 0 Heisenbug 0 Term reminds us of Heisenberg s model of the nucleus o A wave function can t know both location and momentum 0 Every time you try to test the program the test seems to change its behavior 0 Often occurs when the bug is really a symptom of some much earlier problem Most studies 0 Early systems dominated by Bohrbugs 0 Mature systems show a mix 0 Many problems introduced by attempts to x other bugs 0 Persistent bugs usually of Heisenbug variety 0 Over long perlocls upgrading env1ronment can often destabilize a legacy system that worked perfectly well 0 Cloud scenario 0 Rare hardware and environmental events are actually very common in huge data centers Determinism assumption 0 State machine replication is 0 Easy to understand 0 Relatively easy to implement 0 Used in a CORBA fault tolerance standard 0 but there are a number of awkward assumptions 0 Determinism is the first of these 0 Question How deterministic is a modern application coded in a language such as Java Sources of nondeterminism 0 Threads and thread scheduling parallelism 0 Precise time when an interrupt is delivered or when user input will be processed 0 Values read from system clock or other kinds of operating system managed resources like process status data CPU load etc 0 If multiple messages arrive on multiple input sockets the order in which they will be seen by the process 0 When the garbage collector happens to run 0 Constants like my IP address or port numbers assigned to my sockets by the operating system Heisenbug problems 0 Many Heisenbugs are just vanilla bugs but 0 They occur early in the execution 0 And they damage some data structure 0 The application won t touch that structure until much later when some non deterministic thing happens 0 But then it will crash 0 So the crash symptoms vary from run to run 0 People on the sustaining support team tend to try and x the symptoms and often won t understand code well enough to understand the true cause Sidebar Life of a program 0 Coded by a wizard who really understood the logic 0 But she moved to other projects before nishing 0 Handed off to QA 0 QA did a reasonable job but worked with inadequate test suite so coverage was spotty 0 For example never tested clocks that move backwards in time or TCP connections that break when both ends are actually still healthy 0 In field such events DO occur but attempts to fix them just added complexity and more bugs Overcoming nondeterminism 0 One option disallow non determinism 0 This is what Lamport did and what CORBA does too 0 But how realistic is it 0 Worry what if something you use encapsulates a non deterministic behavior unbeknownst to you 0 Modern development styles big applications created from black box components with agreed interfaces 0 We lack a test for determinism Overcoming nondeterminism 0 Another option each time something non deterministic is about to happen turn it into an event 0 For example suppose that we want to read the system clock 0 If we simply read it every replica gets different result 0 But if we read one clock and replicate the value they see the same result 0 Trickier how about thread scheduling 0 With multicore hardware the machine itself isn t deterministic More issues 0 For input from the network or devices we need some kind of relay mechanism 0 Something that reads the network or the device 0 Then passes the events to the group of replicas 0 The relay mechanism itself won t be fault tolerant should this worry us 0 For example if we want to relay something typed by a user it starts at a single place his keyboard Implementing event replication 0 One option is to use a protocol like the Oracle protocol used in our GMS 0 This would be tolerant of crash failures and network faults 0 The Oracle is basically an example of a State Machine 0 Performance should be ok but will limited by RTT between the replicas Byzantine Agreement 0 Lamport s focus applications that are compromised by an attacker 0 Like a virus the attacker somehow takes over one of the copies 0 His goal ensure that the group of replicas can make progress even if some limited number of replicas fail in arbitrary ways they can lie cheat steal 0 This entails building what is called a Byzantine Broadcast Primitive and then using it to deliver events Questions to ask 0 When would Byzantine State Replication be desired 0 How costly does it need to be 0 Lamport s protocol was pretty costly 0 Modern protocols are much faster but remain quite expensive when compared with the cheapest alternatives 0 Are we solving the right problem 0 Gets back to issues of determinism and relaying events 0 Both seem like very difficult restrictions to accept without question later we ll see that we don t even need to do so Another question 0 Suppose that we take n replicas and they give us an extremely reliable state machine 0 It won t be faster than 1 copy because the replicas behave identically in fact it will be slower 0 But perhaps we can have 1 replica back up n 1 others 0 Or we might even have everyone do 1 n th of the work and also back up someone else so that we get n times the performance 0 In modern cloud computing systems performance and scalability are usually more important than tolerating insider attacks expressed with a state machine 0 Core role of the state machine put events into some order 0 Events come in concurrently 0 The replicas apply the events in an agreed order 0 So the natural match is with order based functions 0 Locking lock requests lock grants 0 Parameter values and system con guration 0 Membership information as in the Oracle 0 Generalizes to a notion of role delegation Core functionality 0 Anything that can be expressed in terms of an event that gets applied to the state and causes a new state 0 Locking events are lock requests release 0 Parameter changes events are new values 0 Membership changes events are join failure 0 Security actions events change permissions create new actors or withdraw existing roles 0 DNS events change ltnamegtltipgt mappings 0 In fact the list is very long Reminds us of active directory or dynamic DNS aka Network Info Svc Fancier uses 0 Castro and Liskov use a state machine to manage files actually stored in an of ine store 0 They call this Practical Byzantine Replication 0 The state machine tracks which copies are current and who has them a small amount of meta data 0 And they use Byzantine Agreement for this 0 The actual file contents are not passed through the state machine so it isn t on the critical path Role Delegation 0 New concept for a very sophisticated way of thinking about state machine replication 0 Starts with our GMS perspective of state machine as an append oriented log 0 Then like we did treats this as a set of logs and then as a set of logs spread over a hierarchy of state machines Role Delegation 0 Now think about this scenario 0 Initially the lock for the printer resided at the root 0 Then we moved it to cscornelledu 0 Later we added a sub lock for the printer cartridge 0 Notice similarity to human concept of handing a role to a person 0 John you ll be in charge of the printer 0 John OK then Sally I want you to handle the color ink levels in the cartridge Role Delegation 0 We can formalize this concept of role delegation 0 Won t do so in cs5410 0 Basic outline 0 Think of the log as a variable 0 Work with pairs one has values and one tracks the owner of the log Appending to the ownership log lets us transfer ownership to someone else 0 Think of decisions as functions that are computed over these variables Role Delegation 0 In this way of thinking we can understand our GMS as a big role delegation and decision making tool 0 It can handle any decision that occurs in a state machine where all the needed variables are local 0 But it can t handle decisions that require one shot access to variables split over multiple GMS services Example 0 Suppose the FBI handles all issues relatin ents Mulder and Scully work at the FBI 39 39 After reading a Sun article LlUlllUlCD Near Bell Tower Mulder and Scully leap on a plane I 7 1 a 7 M 1 WHAT IS HVZ CURRENT HVZ GAMES M Our State Machine Challenge 0 Should Cornell give Mulder access to student records 0 Think of this as a computer science question Grant Access 0 Issue is a multi part decision 0 Are Mulder and Scully legitimate FBI agents 0 Is this a real investigation 0 What are Cornell policies for FBI access to student records 0 Are those policies superceded by the Zombie outbreak 0 Very likely decision requires multiple sub decisions some by FBIgov and some by Cornelledu in their respective GMS serV1ces1 Options 0 Break decision into parts 0 Issue what if outcome leaves some form of changed state Denlncl a side effect 0 Until we know the set of outcomes we don t know if we should update the state O Collect data at one place 0 But where FBI won t transfer all its data to Cornell nor will Cornell transfer data to FBI Can t always solve such problems 0 If a decision splits nicely into separate ones sure O but many don t O If a decision requires one shot access to everything in one place we need a kind of database transaction 0 Allows atomicity for multi operation actions 0 Would need to add these functions to our GMS and doing so isn t trivial Performance worries 0 Last in our series of yes but warnings 0 Recall that with a GMS we send certain kinds of decisions to the GMS and it reports results back 0 This means that decision making is remote 0 May sound minor but has surprisingly big costs 0 Especially big issue if load becomes high Summary 0 State machine concept is very powerful 0 But it has limits too 0 Requires determinism which many applications lack 0 Can split application GMS up using role delegation but functions need to be disjoint 0 Scalability 0 If one action sometimes requires sub actions by multiple GMS role holders we would need transactions 0 But due to indirection and nature of protocol state machines are also fairly slow Iiun Song Fault Tolerant Systems 0 By now probably obvious that systems reliability availability is a key concern 0 Downtime is expensive 0 Replication is a general technique for providing fault tolerance W Replication unreplicated service Client server Replication unreplicated service replicated service Client server server replicas Replication 0 Applications as deterministic state machines O Reduce the problem of replication to that of agreement 0 Ensure that replicas process requests in the same order 0 batety clients never observe 1ncons1stent DenaV1or 0 Liveness system is always able to make progress Traditional Assumptions 0 Synchrony 0 Bounded difference in CPU speeds 0 Bounded time for message delivery 0 Beniun Crash faults 0 When machines fail they stop producing output immediately and forever What if these assumptions don t hold Asynchrony 0 In the real world systems are never quite as synchronous as we would like 0 Asynchrony is a pessimistic assumption to capture real world phenomenon 0 Messages will eventually be delivered processors will eventually complete computation But no bound on time 0 In general 0 OK to assume synchrony when providing liveness 0 Dangerous NOT OK to assume synchrony for safety Byzantine Faults 0 Crash faults are a strong assumption 0 In practice many kinds of problems can manifest 0 Bit ip in memory 0 Intermittent network errors 0 Ma11c1ous attacks 0 Byzantine faults strongest failure model 0 Completely arbitrary behavior of faulty nodes Byzantine Agreement 0 Can we build systems that tolerate Byzantine failures and asynchrony YES 0 Use replication Byzantine agreement protocol to order requests 0 Cost 0 At least 3t1 replicas 5t1 for some protocols 0 Communication overhead 0 Safety in the face of Byzantine faults and asynchrony 0 Liveness in periods of SI nchronJ PBFT 0 Castro and Liskov Practical Byzantine Fault Tolerance OSD199 0 The first replication algorithm that integrates Byzantine agreement 0 Demonstrates that Byzantine Fault Tolerance is not prohibitively expensive 0 Sparked off a thread of research that led to the development of many Byzantine fault tolerant algorithms and systems V PBFT Overview 0 Servers are replicated on 3t1 nodes O One particular server is called the primary Also called the leader or the coordinator 0 A continuous period of time during which a server stays as the primary is called a View or a configuration 0 Fixed primary within a View 0 Client submits request to primary 0 Primary orders requests and sends them to all nodes 0 Client waits for identical replies from at least t1 nodes primary 3 6 O view Client 0 Waits for t1 identical replies 0 Why is this sufficient 0 At most t failures So at least one of the t1 replies must be from a correct node 0 PBFT ensures that non faulty nodes never go into a bad state so their responses are always valid 0 Dif cult How to ensure this is the case 0 If client times out before receiving sufficient replies broadcast request to all replicas request m a 0 primary replica O I replica 1 replica 2 Primary assigns the request with a sequencequotnumber n Replicas accept preprepare if In View v never accepted preprepare for vn with different request w repllca O K 13s Riga R R k k k 1 repllca 1 x replica 2 K I I collect pre re are and 2f matchth re ares J Y Pcertlflcatemvn Phase 2 Prepare 0 Each replica collects 2f prepare msgs 2f msgs means that 2f1 replicas saw the same preprepare msg At least fi of these must be honest Since there are only 3f1 replicas this means that there cannot eXist more than 2f replicas that received a conflicting pre prepare msg or claim to have received one o All correct replicas that receive 2f prepare msgs for a ltv n mgt tuple received consistent msgs COMMITvnDm2 2 replies I I 5 5 2 t comml l 4 l 39 N l replica O I x X I I 39 39 V V 39 I K V 39 I replica 1 4 I V g 39 39 139 39 I 39 I I l l I l I I l l l I I I I I l I l I I I I I 1 I I I I i I I a Ccertifigartemvn Request m executed after having Ccen i catemvn executing requests with sequence number less than n Phase 3 Commit 0 If a correct replica p receives 2f1 matching commit msgs 0 At least f1 correct replicas sent matching msgs 0 N 0 correct replica can receive 2f1 matching commit msgs that contradict with the ones that p saw 0 In addition phase 2 ensures that correct replicas send the same commit msgs so together with the View change protocol correct replicas will eventually commit Why does this work 0 When a replica has collected sufficient prepared msgs it knows that sufficient msgs cannot be collected for any other request with that sequence number in that view 0 When a replica collects sufficient commit msgs it knows that eventually at least f1 non faulty replicas will also do the same 0 Formal proof of correctness is somewhat involved Refer to paper Drop by my office 320 Upson if you need help View Change 0 What if the primary fails View change 0 Provides liveness when the primary fails 0 New primary View number mod N 0 Triggered by timeouts Recall that the client broadcasts the request to all replicas if it doesn t receive sufficient consistent requests after some amount of time This triggers a timer in the replicas View Change 0 A node starts a timer if it receives a request that it has not executed If the timer expires it starts a View change protocol 0 Each node that hits the timeout broadcasts a VIEW CHANGE msg containing certificates for the current state 0 New rimar collects 2f1 VIEWCHANGE msgs computes the current state of the system and sends a N EWVI EW msg 0 Replicas check the NEWVIEW msg and move into the new View PBFT Guarantees 0 Safety all non faulty replicas agree on sequence numbers of requests as long as there are lt t Byzantine failures 0 Liveness PBFT is dependent on View changes to provide liveness However in the presence of asynchrony the system may be in a state of perpetual View change In order to make progress the system must be synchronous enough that some requests are executed before a View change Performance Penalty 0 Relative to an unreplicated system PBFT incurs 3 rounds of communication preprepare prepare commit 0 Relative to a system that tolerates only crash faults PBFT requires 3t1 rather than 2t1 replicas 0 Whether these costs are tolerable are highly application specific Beyond PBFT 0 Fast Byzantine Paxos Martin and Alvisi 0 Reduce 3 phase commit down to 2 phases 0 Remove use of digital signatures in the common case O Quorum based algorithms Eg Q U Abu El Malek et al 0 Require 5t1 replicas 0 Does not use agreement protocols Weaker guarantees Better performance when contention is low Zyzzyva Kotla et al 0 Use speculation to reduce cost of Byzantine fault tolerance 0 Idea leverage clients to avoid explicit agreement 0 Suf cient Client knows that the system is consistent 0 Not required Replicas know that they are consistent 39 How clients commits output only if they know that the system is consistent Zyzzyva 0 3t1 replicas 0 As in PBFT execution is organized as a sequence of Views 0 In each View one replica is designated as the primary 0 Client sends request to the primary the primary forwards the request to replicas and the replicas execute the request and send responses back to clients Zyzzyva 0 If client receives 3t1 consistent replies it s done 0 If client receives between 2t1 and 3t consistent replies the client gathers 2t1 responses and distributes a commit certificate to the replicas When 2t1 replicas acknowledge receipt of the certificate the client is done Zyzzyva Caveats 0 Correct replicas can have divergent state Must have a way to reconcile differences 0 View change protocol significantly more complicated since replicas may not be aware of a committed request only a client knew by receiving 3t1 identical replies 0 Performance is timeout sensitive How long do clients wait to see if they ll receive 3t1 identical replies Beyond Zyzzyva 0 In the good case Zyzzyva takes 3 network latencies to complete Client Primary Replicas Client Is is possible to eliminate yet another round of communication to make Byzantine Fault Tolerance perform as well as an unreplicated system 0 Yes If clients broadcast requests directly to all replicas leaderless protocols are available that can allow requests to complete in 2 network latencies Client Replicas Client Bosco Byzantine One Step Consensus 0 In the absence of contention Byzantine agreement is possible in one communication step 0 Strong one step Byzantine agreement 0 One step performance even in the presence of failures 0 7t1 replicas 0 vveaK one step byzantme agreement 0 One step performance only in the absence of failures and contention 0 5t1 replicas 0 State machine replication is a popular approach to provide fault tolerance in real systems 0 Chubby Google and Zookeeper Yahoo are toolkits that are essentially Dunt on top of agreement protocms 0 But Byzantine fault tolerant systems are not as common why 0 Application speci c checks can be used to maskdetech non crash faults 0 Performance overhead signi cant More machines More network overhead 0 As machines bandwidth become cheaper and downtime become more intolerable will this change 0 Can BFT help make applications easier to write 0 Can a combination of BFT code obfuscation and other techniques make systems more secure References 1 Miguel Castro and Barbara Liskov Practical Byzantine Fault Tolerance OSDI 1999 2 Michael AbdEl Malek Gregory R Granger Garth R Goodson Michael K Reiter Jay I Wylie Fault Scalable Byzantine Fault Tolerant Services SOSP 2005 3 KamaKr1snna 1ot1a Lorenzo Alvisi Mike uannn Allen Clement Edmund Wong Zyzzyva Speculative Byzantine Fault Tolerance SOSP 2007 4 Jean Philippe Martin and Lorenzo Alvisi Fast Byzantine Consensus IEEE TODSC 2006 5 Yee Jiun Song and Robbert van Renesse Bosco One Step Byzantine Asynchronous Consensus DISC 2008 n Happy Thanksgiving Cloud Computing and Edge Computing Birman C55410 Fall 2008 Welcome to C55140 0 A course on cloud computing edge computing and related systems technologies 0 We re using a textbook written by Professor Birman a bit out of date Copies on reserve 0 Grading mostly based on three assignments aimed at hands on experience with the things we re learning in class 0 Background Java or C or C familiar with threads comfortable writing programs had an architecture course and an operating systems course Two sidebyside revolutions 0 Cloud computing trend is to move more and more computing functions into large shared data centers 0 Amazon EC2 hosts data centers for customers 0 Google runs all sorts of office applications email etc on their systems 0 Yahoo wants to be a one source computing solution 0 IBM has aVision of computing like electric power 0 Edge computing direct interactions among computers peers out in the Internet 0 For example multi user games VR immersion Cloud Computing Concept Email quotfile storage Databases SpreadSheetS IM search of ce apps Client systems use web technologies GoogeIBMAmazonYahoo host the services Supporting technologies 0 Infrastructure 0 Cloud enablers 0 Core management and 0 Map Reduce scheduling functions BigTable 0 Event noti cation services Astrolabe 39 Storage Systems GFS Amazon s shopping cart 0 Monitoring debugging tun1ng assmtance Even higher level 0 Tools for building and 0 Increasingly Virtuallzatlon analyzing massive graphs Sadly we can t do everything 0 In C8514o we don t have time to cover all of these topics so we ll focus on infrastructure tools 0 You can t build things like Map Reduce without them 0 But you won t learn to use Hadoop a popular open source Map Reduce implementation in this class 0 Even within the infrastructure space we ll pick and choose our topics to get at some of the key ideas 0 Secondary issue we also want to look at the edge a updau aquot re lnra quot39Ms 53 on the edge of he ntwrk VR imersion stributed programming by drag and drop Live objects are 0 An integration tool a thin layer that lets us glue components together into event driven applications 0 A kind of drag and drop programming tool 0 Common framework unifies replication technologies Example Applications gt Photo sharing that works gt Games and virtual worlds gt Collaboration tools gt Emergency response gt Office automation Mobile services gt New Internet Services gt Coordinated planning gt Interactive television gt Social networking center FESOUI CES 0 Data centers host maps databases rendering software 0 Think of the static content as coming from a data center and streams of events re ecting realtime content coming directly from sensors and synthetic content sources combined on your end user node 0 All of this needs to scale to massive deployments Our goals today 0 In C8514o we ll peel back the covers 0 Try and understand major technologies used to implement cloud computing platforms How did IBM Amazon Google etc build their cloud computing infrastructure What tools do all of these systems employ How are they implemented and What are the cost performance tradeoffs How robust are they 0 And also how to build your own cloud applications Key issue to scale well they need to replicate functionality 0 The underlying standards Web Services and CORBA edge technologies 0 The edge is a world of peer to peer solutions 0 BitTorrent NapsterGnutella PPLive Skype and even Live Objects 0 How are these built What issues need to be addressed when systems live out in the wild in the Internet 0 But those edge solutions are invariably supported by some kind of cloud service and in the future the integration is going to become more significant 0 What happens when we graft edge solutions to cloud platforms Connecting the cloud to the edge 0 The cloud is a good place to 0 Store massive amounts of content 0 Keep precomputed information account information 0 Run scalable services 0 The edge is a good place to 0 Capture data from the real world sensors cameras 0 Share high rate video voice event streams updates 0 Support direct collaboration interaction Topics we ll cover 93 08 Web Services and SOA standards CO RBA and 00 standards 9808 Key components of cloud computing platforms 91008 Cloud computing applications and Map Reduce 91508 Thinking about distributed systems Models of time and event ordering 91708 Clock synchronization and the limits of real time 92208 Consensus on event ordering The GMS Service1 92408 The GMS Service2 92908 State machine concept Possible functionality that our GMS can support 10 1 08 Replication basic goals Ricochet 10 6 08 Replication with stronger semantics Virtual synchrony 10 8 08 Replication with consensus semantics Paxos 10 15 08 Transactional subsystems and Web Services support for the transactional model 10 20 08 How transactional servers are implemented 10 22 08 Gossip based replication and system monitoring Astrolabe 10 27 08 DHTs Chord Pastry Kelips 10 29 08 T Man 11 03 08 Trusted computing issues seen in cloud settings Practical Byzantine Agreement 11 05 08 Interconnecting cloud platforms with Maelstrom Mirrored file systems 1110 08 Life on the Edge Browsers BitTorrent 11 12 08 Sending complex functions to edge systems Javascript and AJAX 11 17 08 In flight web page and data modi cations and implications Web tripwires 11 19 08 Pure edge computing Gnutella 11 24 08 Resilient Overlay Networks PPLive Stylistic comments 0 One way to approach CS5l4o would focus on a how to way of using standards and packages 0 For example Microsoft favors Indigo as their web services solution for Windows platforms 0 We could spend 12 weeks learning everything we can about Indigo do projects using Indigo etc 0 You would emerge as an Indigo expert Stylistic comments 0 A second extreme would be to completely ignore the web services standards and focus entirely on the theory 0 We would discuss ways of thinking about distributed systems 0 Models for describing protocols 0 Ways of proving things about them 0 You would be left to apply these ideas as an exercise CSSl40 Pursuing the middle 0 The class will try and live somewhere in the middle 0 About half of our lectures are on very concrete real systems like BitTorrent or Chubby and how they work 0 And about half our lectures are concerned with the platform standards and structures and how they look 0 A few lectures focus on the underlying theory 0 Homework assignments will involve building real but simple distributed systems using these ideas 0 For this we ll work with Windows platforms 81 technology Let s look at an example 0 To illustrate the way the class will operate let s look at a typical example of a problem that cuts across these three elements 0 It arises in a standard web services context 0 But it raises harder questions 0 Ultimately theoretical tools help us gain needed clarity ATC Architecture ATC status is a kind of temporal database for each ATC sector it tells us what flights might be in that sector and when they will be there Server replication O Let s think about the service that tracks the status of ATC sectors 0 Client systems are like web browsers 0 Server is like a web service ATC is a cloud but one with special needs it speaks with one voice 0 Now an ATC needs highly available servers 0 Else a crash could leave controller unable to make decisions 0 So how can we make a service highly available Server replication 0 Key issue we need to maintain that one voice property 0 Behavior of our highly available service needs to be indistinguishable from that of a traditional service running on some single node that just doesn t fail 0 Most obvious option primary backup 0 We run two servers on separate platforms 0 The primary sends a log to the backup 0 If primary crashes the backup soon catches up and can take over Clients initially connected to primary which keeps backup up to date Backup collects the log Transient problem causes some links to break but not all Backup thinks it is now primary primary thinks backup is down Split brain Syndrome Some clients still connected to primary but one has switched to backup and one is completely disconnected from both Oh no But could this happen 0 How do web service systems detect failures 0 The speci cations don t really answer this question 0 A web client senses a failure if it can t connect to a server or if the connection breaks 0 And the connections are usually TCP 0 So how does TCP detect failures 0 Under the surface TCP sends data in IP packets and the receiver acknowledges receipt 0 TCP channels break if a timeout occurs Provoking a transient fault 0 Build a fairly complex network with some routers multiple network segments etc 0 Run TCP over it in the standard way 0 Now disrupt some core component 0 TCP connections will break over a 90 second period 0 So restore service after perhaps 30 seconds Some break but some don t Implication 0 We end up with multiple servers that might each think they are in charge of our ATC system 0 An ATC System with a split brain could malfunction disastrously 0 For example suppose the service is used to answer the question is anyone ying in such and such a sector of the sky 0 With the split brain version each half might say nope in response to different queries Can we fix this problem 0 Sure Just have the backup unplug the primary 0 But less draconian solutions are also possible 0 We ll look at this issue later in the class 0 Need agreement on which machines are up and which have crashed 0 Can t implement agreement on a purely 1 to 1 also called end to end basis Separate decisions can always lead to inconsistency So we need a membership service and this is fundamentally not an end to end concept EndtoEnd argument 0 Commonly cited as a justification for not tackling reliability in low levels of a platform 0 Originally posed in the Internet 0 Suppose an IP packet will take n hops to its destination and can be lost with probability p on each hop 0 Now say that we want to transfer a file of k records that each fit in one IP or UDP packet 0 Should we use a retransmission protocol running end to end or n TCP protocols in a chain sv Endto End argument Loss rate p O 0 0 Probability of successful transit 1p Expected packets 10st k k1p11 Saltzer et al analysis 0 If p is very small then even with many hops most packets will get through 0 The overhead of using TCP protocols in the links will slow things down and won t often benefit us 0 And we ll need an end to end recovery mechanism no matter what since routers can fail too 0 Conclusion let the end to end mechanism worry about reliability Generalized EndtoEnd view 0 Low level mechanisms should focus on speed not reliability C The application should worry about properties it needs 0 OK to Violate the E2E philosophy if E2E mechanism would be much slower E2E is visible in J2EE and NET 0 If something fails these technologies report timeouts 0 But they also report timeouts when nothing has failed 0 And when they report timeouts they don t tell you what failed 0 And they don t offer much help to x things up after the failure either 0 Timeouts and transient faults can t be distinguished 0 Thus we can always detect failures 0 But we ll sometimes make mistakes endtoend failure detection 0 ATC example illustrated a core issue 0 Existing platforms 0 Lack automated management features 0 Inherit features of the Internet even ones that embody inappropriate behavior 0 In this example we saw that TCP handles errors in ad hoc inconsistent ways 0 Developers often forced to step outside of the box and may succeed but might stumble 0 In C5514o we ll try and tackle some of these questions in a more principled way Even this case illustrates choice 0 We have many options if we are willing to Change the failure semantics of our platform 0 Just use a single server and wait for it to restart This common today but too slow for ATC Cloud computing systems usually need at least a few seconds 0 Give backup a way to physically kill the primary e g unplug it o If backup takes over primary shuts down 0 Or require some form of majority vote and implement this in the cloud computing platform itself System maintains agreement on its own structure 0 Later we ll see what the last of these options entails Elements of cloud computing 0 One perspective the laundry list of tools and technologies we saw earlier 0 A second perspective a collection of abstractions and assumptions that the cloud needs to implement and that the developer can then trust 0 For example if the Cloud were to implement a failure detection mechanism the developer could trust it and split brain problems would be avoided 0 We ll generalize this way of thinking The Cloud is a provider of abstractions Specific tools implement those abstractions ATC example illustrates 0 A form of replication one form among many 0 An example of a consistency need one kind of consistency but not the only kind 0 A type of management implication associated with that consistency need 0 A deeper question of what it means for a system to agree on the state of its own members membership 0 We ve discussed the idea that a cloud might offer users some form of VMM abstraction 0 E g Amazoncom might tell Targetcom we ll host your data center but rather than dedicate one machine for each server Target thinks it needs Amazon could virtualize the servers and schedule them ef ciently 0 So let s virtualize the concept of failure handling Typical clientserver scenario TCP used for connections 0 Each channel has its own timers 0 If a timeout occurs clients fail over to the backup Split brain scenario Potential for inconsistency 0 Each client makes its own decisions 0 Outcome can be inconsistent 0 Concern split brain behavior Role of an 0 An all seeing eye 0 Clients must obey it Track membership 0 If the oracle makes a mistake we do as it S S I anYhOW 77 0 This eliminates our fear of inconsistency 0 Now we just hope mistakes are rare Oracle 0 A kind of all seeing eye that dictates the official policy for the whole data center 0 If the Oracle says that node X is up we keep trying to talk to node X 0 If the Oracle says node X is down we give up 0 An Oracle imposes a form of consistency Oracle 0 Later we ll see that an Oracle can be implemented in a decentralized way 0 Some perhaps all nodes run a service 0 The service elects a leader and the leader makes decisions if you think a node is faulty you tell it 0 If the leader fails a majority election is used to elect a new leader 0 By continously requiring majority consent we guarantee that split brain cannot occur Stepping back 0 So one goal of CS5i4o is to look at these issues on the boundary of 0 What technologies can and can t do 0 What we can do to overcome or evade the limits 0 The associated theory 0 A second goal is to think about how to structure systems into more standard pieces Back to the edge 0 Today s focus was on issues see in cloud settings 0 But similar questions arise in peer to peer systems used for file sharing telephony games and even live objects 0 Not the identical notion of consistency and the style of solutions is different 0 We ll look at P2P replication event noti cation building structured overlays well known applications Wrapping up 0 CS5i4o lectures will look at technologies and issues such as the ones just reviewed 39 CS5i4o projects will 0 Give you hands on experience using Windows to build web services and clients that talk to them 0 And some limited experience using the solutions we identify in class in the context of those services 0 Grading Mostly based on the three assignments 0 First two will be done individually third can be done in small teams Meng credit 0 The CS5i4o project can be used for Meng project credit 0 You must sign up for CS7900 credit with Ken Birman 0 We recommend 3 credits letter grade 0 Your grade will be identical in CS5l4o CS7900 0 You are expected to tackle a slightly more ambitious problem to get the extra credit 0 Typically CS7900 entails doing a more detailed experimental evaluation of assignment three and reporting your findings


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Anthony Lee UC Santa Barbara

"I bought an awesome study guide, which helped me get an A in my Math 34B class this quarter!"

Jim McGreen Ohio University

"Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.