Distributed Software Development (CS 682)
This 60-page set of class notes was uploaded by Michele Herzog on Thursday, October 29, 2015. The notes belong to CS 682 at the University of San Francisco, taught by Sami Rollins in Fall. Since its upload, it has received 6 views. For similar materials see /class/231233/cs-682-university-of-san-francisco in Computer Science at the University of San Francisco.
An Overview of Peer-to-Peer
Sami Rollins

Outline
- P2P overview: What is a peer? Example applications
- Benefits of P2P: Is this just distributed computing?
- P2P challenges
- Distributed Hash Tables (DHTs)

What is Peer-to-Peer (P2P)?
- Napster, Gnutella: most people think of P2P as music sharing

What is a peer?
- Contrasted with the client/server model, where servers are centrally maintained and administered, and a client has fewer resources than a server
- A peer's resources are similar to the resources of the other participants
- P2P: peers communicating directly with other peers and sharing resources
- Often administered by different entities (compare with DNS)

P2P Application Taxonomy
- Distributed computing (e.g., SETI@home)
- File sharing (e.g., Gnutella)
- Collaboration (e.g., Jabber)
- Platforms (e.g., JXTA)

Collaboration
- Peers exchange messages directly with each other: sendMessage / receiveMessage

Platforms
- Provide primitives such as "find peers" and "send messages", on which applications like Gnutella and instant messaging can be built

P2P Goals/Benefits
- Cost sharing
- Resource aggregation
- Improved scalability/reliability
- Increased autonomy
- Anonymity/privacy
- Dynamism
- Ad-hoc communication

P2P File Sharing
- Centralized: Napster
- Decentralized: Gnutella
- Hierarchical: Kazaa
- Incentivized: BitTorrent
- Distributed hash tables: Chord, CAN, Tapestry, Pastry

Challenges
- Peer discovery
- Group management
- Search
- Download
- Incentives

Metrics
- Per-node state
- Bandwidth usage
- Search time
- Fault tolerance/resiliency

Centralized (Napster model)
- Server contacted during search; peers directly exchange content
- Benefits: efficient search, limited bandwidth usage, no per-node state
- Drawbacks: central point of failure, limited scale

Decentralized Flooding (Gnutella model)
- Search is flooded to neighbors; neighbors are determined randomly
- Benefits: no central point of failure, limited per-node state
- Drawbacks: slow searches, bandwidth intensive

Hierarchical (Kazaa / newer Gnutella model)
- Nodes with high
bandwidth and long uptime become supernodes (ultrapeers)
- Search requests are sent to a supernode; the supernode caches info about its attached leaf nodes
- Supernodes connect to each other (32 connections in LimeWire)
- Benefits: search is faster than flooding
- Drawbacks: many of the same problems as decentralized; reconfiguration needed when a supernode fails
[slide figure: supernode topology with leaf peers attached to SuperBob and SuperFred]

BitTorrent
1. Download a .torrent file from a torrent server
2. Get a list of peers and seeds (the swarm) from the tracker
3. Exchange a vector of the content downloaded with peers
4. Exchange content with peers
5. Update the tracker with progress

BitTorrent: Key Ideas
- Break large files into small blocks and download blocks individually
- Provide incentives for uploading content: allow download from the peers that provide the best upload rate
- Benefits: incentives, centralized search, no neighbor state except the peers in your swarm
- Drawbacks: centralized search, no central repository

Distributed Hash Tables (DHTs)
- Chord, CAN, Tapestry, Pastry model; a.k.a. structured P2P networks
- Provide performance guarantees: if content exists, it will be found
- Benefits: more efficient searching, limited per-node state
- Drawbacks: limited fault-tolerance vs. redundancy

DHTs Overview
- Goal: map key to value
- Decentralized, with a bounded number of neighbors
- Provide guaranteed performance for search: if content is in the network, it will be found, and the number of messages required for a search is bounded
- Provide guaranteed performance for join/leave: a minimal number of nodes is affected
- Suitable for applications, like file systems, that require guaranteed performance

Comparing DHTs
- Neighbor state
- Search performance
- Join algorithm
- Failure recovery

CAN
- Associate to each node and each item a unique id in a d-dimensional space
- Goals: scale to hundreds of thousands of nodes; handle rapid arrival and failure of nodes
- Properties: routing table size O(d); guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes

CAN Example: Two-Dimensional Space
- The space is divided between nodes; together, the nodes cover the entire space
- Each
node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
- Example: node n1 (1,2) is the first node to join, and covers the entire space

CAN Example: Two-Dimensional Space
- Node n2 (4,2) joins: n2 contacts n1; n1 splits its area and assigns half to n2

CAN Example: Two-Dimensional Space
- Nodes n3 (3,5), n4 (5,5), and n5 (6,6) join
- Each new node sends a JOIN request to an existing node chosen randomly; the new node gets its neighbor table from the existing node; new and existing nodes update their neighbor tables accordingly
- Before n5 joins, n4 has neighbors n2 and n3; n5 adds n4 and n2 to its neighbor list; n2 is updated to include n5 in its neighbor list
- Only O(2d) nodes are affected

CAN: Bootstrapping
- Assume CAN has an associated DNS domain, and the domain resolves to the IP of one or more bootstrap nodes
- Optimization, landmark routing: ping a set of landmark servers and choose an existing node based on distance to the landmarks

CAN Example: Two-Dimensional Space
- Nodes: n1 (1,2), n2 (4,2), n3 (3,5), n4 (5,5), n5 (6,6)
- Items: f1 (2,3), f2 (5,1), f3 (2,1), f4 (7,5)
- Each item is stored by the node that owns its mapping in the space

CAN Query Example
- Forward the query to the neighbor that is closest to the query id (Euclidean distance)
- Example: assume n1 queries f4
- Content is guaranteed to be found in d*n^(1/d) hops: each dimension has n^(1/d) nodes
- Increasing the number of dimensions reduces path length but increases the number of neighbors

CAN: Node Failure Recovery
- Detection: nodes periodically send refresh messages to their neighbors
- Simple failures: a neighbor's neighbors are cached; when a node fails, one of its neighbors takes over its zone
- When a node fails to receive a refresh from a neighbor, it sets a timer (many neighbors may simultaneously set their timers)
- When a node's
timer goes off, it sends a TAKEOVER message to the failed node's neighbors
- When a node receives a TAKEOVER, it either (a) cancels its timer, if the zone volume of the sender is smaller than its own, or (b) replies with its own TAKEOVER

Chord
- Each node has an m-bit id that is a SHA-1 hash of its IP address
- Nodes are arranged in a circle modulo 2^m
- Data is hashed to an id in the same id space
- Node n stores data with ids between n's predecessor and n
- Example (m = 3; nodes 0, 1, 3): node 0 stores ids 4-0, node 1 stores id 1, node 3 stores ids 2-3

Chord: Simple Query Algorithm
- Each node maintains its successor
- To find data with id i, query successors until a successor >= i is found
- Running time: O(N)

Chord: Finger Tables
- In reality, nodes maintain a finger table with more routing information
- For a node n, the i-th entry in its finger table is the first node that succeeds n by at least 2^(i-1)
- Size of the finger table: O(log N)

Chord Query
    hash the key to get id
    if id == node id: data found
    else if id in finger table: data found
    else:
        p = find_predecessor(id)
        n = find_successor(p)

    find_predecessor(id):
        choose n' in the finger table closest to id
        if n' < id < find_successor(n'): return n'
        else: ask n' for its finger entry closest to id, and recurse

Chord: Running Time of the Query Algorithm
- The problem size is halved at each iteration: O(log N)

Chord: Join
- Initialize the predecessor and fingers of the new node
- Update the fingers and predecessors of existing nodes
- Transfer data

Chord: Initialize Predecessor and Fingers of New Node n
- n contacts an existing node n' in the network
- n' does a lookup of the predecessor of n
- For each entry in the finger table, look up the successor
- Running time: O(m log N)
- Optimization: initialize n with the finger table of its successor; with high probability, this reduces the running time to O(log N)

Chord: Update Existing Nodes
- n becomes the i-th finger of a node p if p precedes n by at least 2^(i-1) and the current i-th finger of p succeeds n
- Start at the predecessor of n and walk backwards: for i = 1 to m, find the predecessor of
n - 2^(i-1), update its finger table, and recurse
- Running time: O(log^2 N)

Chord: Stabilization
- Goal: handle concurrent joins
- Periodically ask your successor for its predecessor; if your successor's predecessor isn't you, update
- Periodically refresh finger tables
- Failures: keep a list of r successors; if your successor fails, replace it with the next in the list; finger tables will be corrected by the stabilization algorithm

DHTs: Tapestry/Pastry
- Global mesh
- Suffix-based routing
- Use underlying network distance in constructing the mesh

Comparing Guarantees

    Model     Routing model      Search       State
    Chord     Uni-dimensional    log N        log N
    CAN       Multi-dimensional  d*n^(1/d)    2d
    Tapestry  Global mesh        log_b N      b*log_b N
    Pastry    Neighbor map       log_b N      b*log_b N

Remaining Problems
- Hard to handle highly dynamic environments
- Usable services
- Methods don't consider peer characteristics

Measurement Studies: Free Riding on Gnutella
- Most studies focus on Gnutella
- Want to determine how users behave
- Recommendations for the best way to design systems

Free Riding Results: Who is sharing what? (August 2000)

    The top...          Files shared   As percent of whole
    333 hosts (1%)      1,142,645      37%
    1,667 hosts (5%)    2,182,087      70%
    3,334 hosts (10%)   2,692,082      87%
    5,000 hosts (15%)   2,928,905      94%
    6,667 hosts (20%)   3,037,232      98%
    8,333 hosts (25%)   3,082,572      99%

Saroiu et al. Study
- How many peers are server-like? Client-like?
- Bandwidth, latency
- Connectivity
- Who is sharing what?

Saroiu et al. Study (May 2001)
- Napster crawl: query the index server and keep track of results, then query about the returned peers; doesn't capture users sharing unpopular content
- Gnutella crawl: send out ping messages with a large TTL

Results Overview
- Lots of heterogeneity between peers: systems should consider peer capabilities
- Peers lie: systems must be able to verify reported peer capabilities or measure true capabilities

Measured Bandwidth
Figure 3: (Left) CDFs of upstream and downstream bottleneck
bandwidths for Gnutella peers. (Right) CDFs of downstream bottleneck bandwidths for Napster and Gnutella peers.

Reported Bandwidth
Figure 4: (Left) Reported bandwidths for Napster peers. (Right) Reported bandwidths for Napster peers, excluding peers that reported "unknown".

Measured Latency
Figure 5: (Left) Measured latencies to Gnutella peers. (Right) Correlation between Gnutella peers' downstream bottleneck bandwidth and latency.

Measured Uptime
Figure 6: IP-level uptime of peers ("Internet Host Uptime") and application-level uptime of peers ("Gnutella/Napster Host Uptime") in both Napster and Gnutella, as measured by the percentage of time the peers are reachable.

Figure 7: The distribution of Napster/Gnutella session durations.

Number of Shared Files
Figure 8: (Left) The number of shared files for Gnutella peers. (Right) The number of shared files for Napster and Gnutella peers; peers with no files to share are excluded.

Connectivity
Figure 15: (Left) Topology of the Gnutella network as of February 16, 2001 (1,771 peers). (Middle) Topology of the Gnutella network after a random 30% of the nodes are
removed. (Right) Topology of the Gnutella network after the highest-degree 4% of the nodes are removed.

Points of Discussion
- Is it all hype? Should P2P be a research area?
- Do P2P applications/systems have common research questions?
- What are the killer apps for P2P systems?
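The Gnutella-style flooded search described earlier can be sketched in a few lines. This is a minimal illustration, not Gnutella's actual wire protocol; the `Peer` class, `search` method, and the tiny three-node overlay are all invented for the example.

```python
class Peer:
    """A toy unstructured-overlay peer; names and structure are illustrative only."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []

    def search(self, keyword, ttl, seen=None):
        """Flood a query to all neighbors, decrementing the TTL at each hop."""
        seen = seen if seen is not None else set()
        if self.name in seen or ttl < 0:
            return []
        seen.add(self.name)
        hits = [self.name] if keyword in self.files else []
        for n in self.neighbors:  # forward to every neighbor: bandwidth-intensive
            hits.extend(n.search(keyword, ttl - 1, seen))
        return hits


# Tiny overlay a - b - c, with the file stored only at c
a, b, c = Peer("a", []), Peer("b", []), Peer("c", ["song.mp3"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
print(a.search("song.mp3", ttl=2))  # reaches c in two hops
print(a.search("song.mp3", ttl=1))  # TTL too small: the query dies before c
```

The `seen` set prevents a query from looping forever, and the TTL bounds how far each flood spreads, which is exactly the trade-off the slides note: a small TTL limits bandwidth but can miss content.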
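CAN's greedy forwarding rule from the query example above (route to the neighbor closest, in Euclidean distance, to the query point) can be sketched using the slides' coordinates. The neighbor lists here are simplified stand-ins (a real CAN derives neighbors from zone adjacency, and routes by zone ownership rather than point-to-point distance), so this shows only the routing rule.

```python
import math

# Node and item coordinates from the slides (2-d space)
nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}

# Hypothetical neighbor lists for illustration only
neighbors = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4", "n5"],
    "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "n5"],
    "n5": ["n2", "n4"],
}


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def route(start, target, max_hops=10):
    """Greedily forward toward the target point; stop when no neighbor is closer."""
    path = [start]
    current = start
    for _ in range(max_hops):
        best = min(neighbors[current], key=lambda n: dist(nodes[n], target))
        if dist(nodes[best], target) >= dist(nodes[current], target):
            break  # current node is closest: it handles the query
        current = best
        path.append(current)
    return path


# n1 queries item f4 at (7, 5), as in the slides
print(route("n1", (7, 5)))
```

Each hop strictly reduces the Euclidean distance to the target, which is what bounds the path length at roughly d*n^(1/d) hops in a balanced d-dimensional CAN.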
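The Chord finger-table construction from the notes (entry i is the first node that succeeds n by at least 2^(i-1)) can be checked with a small simulation of the m = 3 example ring with nodes 0, 1, 3. This sketch computes fingers from global knowledge rather than via the join protocol, and indexes entries from 0 (so entry i uses 2^i); the names are invented for the example.

```python
M = 3                      # m-bit identifier space: ids 0 .. 2^M - 1
RING = 2 ** M
NODES = sorted([0, 1, 3])  # the example ring from the notes


def successor(ident):
    """First node clockwise from ident: the node that stores key `ident`."""
    for n in NODES:
        if n >= ident % RING:
            return n
    return NODES[0]  # wrap around the ring


def finger_table(n):
    """Entry i (0-indexed) is the first node succeeding n by at least 2^i."""
    return [successor(n + 2 ** i) for i in range(M)]


for n in NODES:
    print(n, finger_table(n))

# Key placement matches the example: node 0 stores ids 4..7 and 0,
# node 1 stores id 1, node 3 stores ids 2 and 3.
print([successor(k) for k in range(RING)])
```

Because each finger roughly doubles the distance covered around the ring, a lookup can always jump at least halfway to the target, which is the "problem size is halved at each iteration" argument behind the O(log N) query bound.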
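The Chord stabilization rule described above (periodically ask your successor for its predecessor, and adopt it as your successor if it sits between you) can be sketched as follows. The `Node` class, `between` helper, and `notify` call are simplified stand-ins for Chord's actual pseudocode, on a small 3-bit ring matching the notes' example.

```python
RING = 2 ** 3  # small 3-bit identifier ring, as in the notes' example


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None


def between(x, a, b):
    """True if x lies strictly inside the clockwise arc (a, b) on the ring."""
    if a == b:  # degenerate one-node ring: everything else qualifies
        return x != a
    return 0 < (x - a) % RING < (b - a) % RING


def notify(n, candidate):
    """n learns of a possible new predecessor."""
    if n.predecessor is None or between(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate


def stabilize(n):
    """One round: if successor's predecessor is between us, adopt it; then notify."""
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)


# Node 1 joins a stable ring of {0, 3}: a joining node sets only its successor
n0, n1, n3 = Node(0), Node(1), Node(3)
n0.successor, n3.successor = n3, n0
n0.predecessor, n3.predecessor = n3, n0
n1.successor = n3

for _ in range(3):  # a few periodic rounds ripple the pointers into place
    for n in (n0, n1, n3):
        stabilize(n)

print(n0.successor.id, n1.successor.id, n3.predecessor.id)
```

After a few rounds, n0's successor becomes n1 and n3's predecessor becomes n1: concurrent joins only require each node to keep running this periodic rule, which is why the notes say the finger tables can lag behind and be corrected later.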