Internetworking CS 63600
Popular in Course
Popular in ComputerScienence
This 116 page Class Notes was uploaded by Nick Rowe on Saturday September 19, 2015. The Class Notes belongs to CS 63600 at Purdue University taught by Ramana Kompella in Fall. Since its upload, it has received 86 views. For similar materials see /class/208053/cs-63600-purdue-university in ComputerScienence at Purdue University.
Reviews for Internetworking
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/19/15
CS 636 Internetworking ROUTER ALGORITHMICS Final Review CS 636 Internetworking CS 635 Interraetvvorking Measurement IP Network Management gt What is happening in my network gt How much traffic flows towards a given destination gt What are the popular applications in my network gt How much should I charge my customer Flow Measurement Flow is a combination of various header elds Measuring characteristics of a flow such as number of packets number of bytes protocol speci c ags such as SYNFINRST etc Aggregates computed from individual flow records collected from various routers Popular applications obtained via aggregation on port number Customer traf c metered based on aggregation on destination andor source IP addresspre x r r quotx with m viru Qcnx lt il a ri itwcu ml kg NetFIow Widely used and supported in routers Flow is a vetuple consisting of SIP DIP SP DP Protocol Flow records contain number of packets bytes timestamps TCP ags For every packet if flow exists update counters otherwise create a flow record Flow records expired based on TCP ags FIN or RST inactive timeout active timeout 30 mins At high line rates a the three resources memory CPU and reporting bandwidth become bottlenecks Sampled NetFlow Sample packets with 1 in N probability Con gurable N to reduce memory processing and bandwidth requirements Flow records updated only for sampled packets Multiply by N for each ow record when it expires Problems with Sampled NetFlow One parameter to control all resources limits exibility Strong dependence on traf c mix small ows exhaust memory and affect accuracy Adaptive NetFlow estan04 Divides time into equal measurement bins Adapt sampling probability in each bin based on traf c mix Begin with high sampling rate reduce sampling rate as memory lls up Problems with ANF High memory utilization as it needs two sets for seamless operation Large ows get reported many different times Every renormalization step requires extra CPU cycles Unbiased estimators Produces unbiased estimators for packet byte and SYN counters Proofs in the paper 0 When a new ow is created after it passes the ow slicing stage Set the packet counter to 1p Set the byte counter to bp b is size of the rst packet Set the quotSYN counter to 1p if packet has a SYN ag 0 otherwise Usagebased Sampling Objective Heavy Hitters Why are heavy hitters important 0 Network monitoring Current tools report top applications top sendersreceivers of traf c Security Malicious activities such as worms and ooding DoS attacks generate much traf c Capacity planning Largest elements oftraf c matrix determine network growth trends Accounting Usage based billing most important for most active customers Problem de nition Identify and measure all streams whose traf c exceeds threshold 01 of link capacity over certain time interval 1 minute Streams de ned by elds eg destination IP Single pass over packets Small worst case per packet processing Small memory usage Few false positives false negatives Measuring the heavy hitters Unscalable solution keep hash table with a counter for each stream and report largest entries Inaccurate solution count only sampled packets and compensate in analysis Ideal solution count all packets but only for the heavy hitters How do multistage lters work Array of counters HashPink How do multistage lters work Collisions H U are OK E How do multistage lters work Reached threshold Stream memory L y L y Insert stream2 1 9 How do multistage lters work Stagel I E Stream memory Stage 2 H I Conservative update Gray all prior packets Conservative update D Redundant i E Redundant u i 5 36 m has me Worm m Conservative update Multistage lter analysis Question Find probability that a small stream 01 of traf c passes lter with d 4 stages b 1000 counters threshold T 1 Analysis any stream distribution amp packet order can pass a stage if other streams in its bucket 2 09 of traf c at most 111 such buckets in a stage gt probability of passing one stage S 111 probability of passing all 4 stages S 01114 0015 result tight Comparing to current solution Trace 26 Gbps link 43000 streams in 5 seconds Multistage lters 1 Mbit of SRAM 4096 entries Sampling p116 unlimited DRAM Average absolute error average stream size Stream size Multistage lters Sampling sgt01 001 572 01 2 s gt 001 095 208 001 2 s gt 0001 399 466 Statistics Counters Motivation Typical Line Ca rd Architecture CS 636 lnt hetworking What is a Statistics Counter 0 When a packet arrives it is rst classi ed to determine the type of the packet Depending on the types one or more counters corresponding to the criteria are updated incremented Examples of Statistics Counters Packet switches maintain counters for Route pre xes ACLs TCP connections SNMP MIBs Counters are required for Traf c Engineering Policing amp Shaping Intrusion detection Performance Monitoring RMON Network Tracing r ifs9 L391i1 2M go13 MAJ tat211u bmi mng Counter Requirements Astandard P Router Number of Counters 39 39 n per pre x counters 8K 39 policing counters Size of Counters 32 64 bit counters Why is Existing Memory Technology not Ideal Use SRAM fast enough random access time but too expensive and too low density to store 64Mb of data Use DRAM high density means we can store data but too slow typically 50ns random access time Read modify write penalty H 275 l a quotHig lLair iju f LEJUL ll3 Memory Hierarchy Large Slow Speed DRAM Memory wiTh N 5 counTers of size M64 biTs Read M bits DRAM Bandwidth Speeddown b write M bi 3 Large Coun rer G 6 7 7 7 7 7 7 r 7 r r 7 Corresponding j i 9 39 39739 397quot Small Coun rer Size m Small High Speed SRAMC8MlhnEl 5 counTers of size m 8 lt M biTs Questions To deterministically guarantee that none of the counters in the SRAM over ow irrespective of the arriving traf c pattern Largest Counter First LCFCMA CounTer39s quotH n In SRAM N CounTer39s of size m Com uTe Find ouT The counTer39 wiTh The ar39gesT counT ever39y b TimesloTs UpdaTe Schedule updaTe To counTer39 in DRAM Clear SeT value of The counTer39 in SRAM To zer39o CS lla aweiwm hhmg Optimality of LCFCMA Theorem LCF CMA is optimal in the sense that it minimizes the size ofthe counter maintained in SRAM Minimum Size of the SRAM Counter Theorem speeddown factor bgt1 LCF CMA minimizes the size of counter in bits in the SRAM to fb N 39nbN 39 092nbb1 H I09 3909 N ii ii aivfi ietworking Implementation Numbers w OC192 Line Card R lOGbsec No of counters per packet C 10 Re RC 1OOGbs Cell Size 64bytes Teff 5122 256ns DRAM Random Access Time 512ns Speeddown Factor b gt TTeff 20 Total Number of Counters 1000 1 million Size of Counters 64 bits 8 bits Implementation Examples Values N 1000 8 M8 N1000 M64 N 1000000 9 M 64 OC192 Line Card fbN Br u re For39ce SRAM8Kb DRAM0 SRAM64Kb DRAM0 SRAM64Mb DRAM0 CS ln i networkmg LCFCMA SRAM8Kb DRAM8Kb SRAM8Kb DRAM64Kb SRAM9Mb DRAM64Mb Problem with LCFCMA Minimum counter needs to be found for every refresh requiring sorting Some potential possibilities Maintain maxheap Efficient statistics counter architecture using LRT Algorithm Ramabhadran et al LRT CMA Largest Recent with Threshold T Removes sorting bottleneck approximate bin sorting Keeps a bitmap that tracks counters that are larger than threshold T Practically realizable with 2 bits extra per counter Uses same optimal amount of SRAM as LCF A simple pipelined data structure LRT algorithm A updates made to counters in SRAM After b updates CMA picks one counter that is written to DRAM Updated counter is reset to O quotbquot depends on relative access times of DRAM and SRAM LRT Algorithm Letj be the counter with the largest value after the last cycle of b updates If valuej gt T Update counterj to DRAM and set it to O in the SRAM If valuej lt T nd another counter with value atleast T and update to DRAM If no counter found then update counterj to DRAM Implementation Aggregated bitmap A bitmap is used to indicate if a counter is above or below the threshold The following operations are required to be implemented on the bitmap to support LRT Addi To update bit for a counter to indicate its value is above threshold Deletei After updating a counter s value it this operation is performed to indicate that its value is now below T Testi to check if a counter s value is above T Find to nd any counter with value above T Tree structure Leaves of binary tree are formed by NW nodes where N is total number of counters W is the word size For a tree of height h1 2h should be equal to WW 0 For a node with children as leaf nodes lcount and rcount are number of bits set in the lchild and rchild respectively Tree structure 0 For a node whose children are not leaf nodes the Icount is the sum ofthe Icount and rcount elds of its left child and rcount Functions on the bitmap can be performed on a top down traversal of the tree Each ofthe internal nodes does not contain pointers to lchild and rchild only Icount and rcount values Memory Total number of node 2h1 1 Total memory 2h1 1 W 2NW 1W2N Wlt2N So 2 most 2 bits per element Measurement Primitives CS 636 Internetworking CS 1336 I hie rnetwo rkin g 52 Existing router infrastructure Routers come with SNMP Not very ne grained only perinterface Not useful for debugging delay or loss problems Routers come with NetFlow Only useful for per ow statistics Not useful for debugging performance problems Routers come with syslogs Some information in ad hoc fashion Very little information regarding delays and losses Applying existing techniques Standard approach is active probes and tomography Join results from many paths to infer perlink properties Can be applied to measuring all the metrics of interest Limitations Overheads for sending probes may limit granularity Cannot be used to measure things like ubursts10s of usecs Lack of tight endtoend timesynchronization Typical NTP error is on the order of milliseconds Tomography inaccurate due to underconstrained No guarantee that metrics measured by probes are representative of those experienced by any particular traffic flow l n iei me two 1 lcl ng Motivation lfwe could make changes to routers what new primitives should we add to aid in debugging performance problems 38 636 lnternetworking 55 Outline Model Why simple data structures do not work LDA for average delay and variance Exploiting LDAs to detect microbursts Assumptions gt O Assumption 1 FIFO link between sender and receiver 0 Assumption 2 Finegrained persegment time synchronization Using IEEE 1588 protocol for example 0 Assumption 3 Link susceptible to loss as well as variable delay 0 Assumption 4 A little bit of hardware can be put in the routers C You may have objections we will address common ones later CS lntemelworkino Constraints gt Receiver R Constraint 1 Very little highspeed memory Constraint 2 Limited communication budget Constraint 3 Constrained processing capacity Consider a runof themill OC192 10Gbps link 250byte packets implies 5 million packets per second CS lnternelworklng Analysis of LDA Packet loss k packet losses can corrupt up to k buckets If k ltlt B then only a small subset of buckets corrupted 0 Problem High loss implies many bad buckets Solution Sampling Control sampling rate such that no more than some fraction of buckets corrupted Can also be formulated to maximize expected number of good samples Computingjitter Jitter is a measure of variance in delay Can we adapt LDA to measure variance Solution idea inspired by sketching AMS96 Consider random variable Xi that takes values 1 and 1 with probability 12 At S and R packet pi has timestamps ai and bi S transmits ZaiXi to R R computes ZbiXi ZaiXi2 n u2 to obtain vanance Come again E2bl x X Eai gtlt X02 Eggs xXi2 512512 x X2 22615 x Xin 253 x EXl2 226151 X E XIlt1 251 1 0 String Searches 38 636 Internetworking CS 1336 I site rnetwo rkin g Intrusion Detection Systems Intrusion detection systems IDSes must be capable of distinguishing between normal and abnormal user activities Broad Categories String or Pattern Matching Behavioral Anomaly Detection Signature matching Used in intrusion preventiondetection application classi cation load balancing Guard a byte string or a regular expression Action drop packet log alert set priority direct to speci c server Input byte string from the payload of packets Hence the name quotdeep packet inspection Output the positions at which various signatures match or the identi er of the quothighest priority signature that matches Size of problem hundreds of signatures per protocol String matching Most widely used early form of deep packet inspection but the more expressive regular expressions have superceded strings by now Still used as pre lter to more expensive matching operations by popular open source IDSIPS Snort Matching multiple strings a wellstudied problem A Aho M Corasick quotEf cient string matching An aid to bibliographic search Communications of the ACM June 1975 Generalization of Knuth Morris Pratt algorithm for nding the occurrence of one word within a given text string Snort uses optimized AhoCorasick implementation Matching time independent of number of strings memory requirements proportional to sum of their sizes Single string search Knuth Morris Pratt algorithm Boyer Moore algorithm Searches for the occurrence of a word w within string S Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III accep r state CS 636 Internetvvorking Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III accep r state CS 636 Internetvvorking Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III accep r state CS 636 Internetvvorking Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III III accep r state 38 636 Internetvvorking Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III III accep r state 38 636 Internetvvorking 80 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III III accep r state 38 636 Internetvvorking 81 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII III III accep r state 38 636 Internetvvorking 82 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII accep r state 38 636 Internetvvorking 83 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII accep r state 38 636 Internetvvorking 84 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII accep r state 38 636 Internetvvorking 85 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIIII accep r state 38 636 Internetvvorking 86 Kn uthMorrisPratt KMP algorithm over binary alphabet Build DFA from pattern IIIH accept state 38 636 Internetvvorking 87 String Search Problem De nition Input P a set of 2 patterns P1 Pz total length n text T length m within which to look for patterns Task Output location of all occurrences of each pattern Pi in T Bounds Onzm bound using exact string matching algorithms Goal Onmk bound where k is the number of occurrences of some pattern Pi in T r r Zquot39 on iv13 at i quotnew 493 a u l flvnl ml kg Keyword Tree p poet pope pOpO tOO Observations 0 Keyword tree K construction Can be done in On time remember n is total length of all patterns Naive search algorithm with keyword tree K Align tree to each position in T and see if there is a match Onm time Can we do better Failure functions Temporary assumption no pattern in P is a proper substring of another pattern in P De nitions For each node v of K Lv denotes the concatenation of the characters from the root to node v For any node v of K de ne pv to be the length of the longest proper suf x of Lv that is a pre x of some pattern in P For a node v of K let fv denote the unique node in K with the suf x of Lv of length pv Note fv the root of K if lpv O Directed edge v fv is a failure link r r quotx thou m lav17 MAJ lt il a ri ikwcu ml kg Keyword Tree and failure links p poet pope pOpO tOO A Scalable and Highspeed Algorithm Dharmapurikar03 Basic Idea Jumpahead AhoCorasicK NFA JACKNFA 1 Reorganize the AhoCorasick automaton to consider k characters at a time ex k4 quottechquot quottelequot quotphonquot 2 When looking for a matching tail we start from the longest pre x and move towards a shortest one for the longest pre x match tail state to match fewer than k characters to detect the string completely ex l lly nding the match of the string then report it Example at this paper Breaking into k characters 31 technical sl tech nica 2 technically 52 A 3 tel 39 53 tele phon s4 telephone 54 phon 5 phone 55 elephant 6 elephant e s 7 lly e 5 GD 4r regulai39ti39ansition 7 gt failure transition 7 7 7 r gt associated tails matching strings C 5 Alignment To match the string correctly our machine must capture it at the correct k character boundary Why we need capture the boundary If a string were to appear in the middle of the k character group it will not be detected II II ex xyte chni cal ytec hnic al tech nica l Solution deploy k machines each of which scans the text with one byte offset Data Structures Maintain hash table for transitions Stored in a offchip memory as a hash table Only for successful transitions and the false positives we need to look up the input key in the offchip table chain SJ 8 Sr This table can be im as a hash table the tail state Table 1 Transition table of plemented in the offchip Bloom lters Implementation of a physical machine For the gure k 4 The pair ltstate pre xgt is looked up in the associated Bloom lter before of f chip table accesses Regular Expression Searches 38 636 Internetworking CS 133639 I like metwo rkin g 99 Regular expression matching 0 Deterministic and nondeterministic nite automata DFAs and NFAs can match regular expressions NFAs more compact but require backtracking or keeping track of sets of states during matching Both representations used in hardware and software solutions but only DFA based solutions can guarantee throughput in software DFAs have a state space explosion problem From DFAs recognizing individual signatures we can build a DFA that recognizes entire signature set in a single pass Size of combined DFA much larger than sum of sizes for DFAs recognizing individual signatures Multiple combined DFAs are used to match signature set Representing DFAs I How to represent DFAs more compactly gtgt Can t reduce number of states gtgt How about reducing number of transitions 256 transitions per state 50 distinct transitions per state real world datasets Need at least 50 words per state Three rules C a bc cd Look at state pairs there are many common transitions How to remove them 4 transitions per state f 1 v39wr hm an iiiuc i u wv 1r Introduction to Our Approach I How to represent DFAs more compactly gtgt Can t reduce number of states gtgt How about reducing number of transitions 256 transitions per state 50 distinct transitions per state real world datasets Need at least 50 words per state Alternative Representation Three rules a bc cd 4 transitions per state f 1 v39wr hm an llluc i liz umu inlm Fewer transitions less memory DZFA Operation Heavy edges are called default transitions Take default transitions whenever a labeled transition is missing DZFA Operation Above two set of default transitions trees are also correct However we may traverse 2 default transitions to consume a character Thus we need to do more work gt lower performance W So how to construct space efficient DZFAs while keeping default paths bounded inter networking 11115 DZFA Construction I Present systematic approach to construct DZFA I Begin with a state minimized DFA I Construct space reduction graph gtgt Undirected graph vertices are states of DFA gtgt Edges exist between vertices with common transitions gtgt Weight of an edge of common transitions 1 DZFA Construction I Convert certain edges into default transitions gtgt A default transition reduces w transitions w wt of edge gtgt If we pick high weight edges gt more space reduction gtgt Find maximum weight spanning forest gtgt Tree edges becomes the default transitions I Problem spanning tree may have very large diameter gtgt Longer default paths gt lower performance of transitions 233311 DZFA Construction I We need to construct bounded diameter trees gtgt NP hard gtgt Small diameter bound leads to low trees weight Less space efficient DZFA gtgt Time space trade off I We propose heuristic algorithm based upon Kruskal s algorithm to create compact bounded diameter D2FAs End Node Algorithmics Copying CS 636 Internetworking 108 Web server sending a le 0 Web server reads from disk and writes to the socket Two system calls Read and Write Read From disk to le cache From le cache to application bu ers 0 Write From application buffer to socket buffer From socket buffer to NIC r r quotx with m lav1739 Ccix 0331iu l bwlTMng What copies can we remove Copyl Unavoidable since we need to read from the disk anyway Copy2 seems unnecessary Why can t we copy data directly from le cache to the network Copy3 also seems unnecessary Copy4 Maybe unavoidable So we can remove atleast two copies Local restructuring solutions 0 Exploiting adaptor memory Use P13 degree of freedom Memory mapped IO Use Copy onwrite Exploit virtual memory Explicit COW bits to raise interrupts Use Page Remapping Recon gure virtual page mappings Fbufs Limitations of the approach Cost problems Network adaptor needs lots of memory Checksum problems Doing Checksum while copying into appl buffer is not wise TCP problems Delayed acknowledgements Implementing copyonwrite Packet d ata Process 1 Process 2 0 Make different virtual page table entries point to the same physical page 0 What if Process 1 writes data COW bits cause hardware interrupt New page created for Process 1 CS lntemetworking 118 Solution3 FBUFS Invented by Druschel and Peterson Main idea Buffers will probably be reused multiple times Precompute P2a all page mapping information ahead of time Or evaluate lazily P2b when data transfer first starts amortized over length of transfer 0 Eliminate page remapping overheads in the common case Implementing FBUFs J 7 Domain 0 I Path l buffers Path 2 buffers Ethernet Naive way Map pages into VM tables of kernel and all applications But protection and security issues Secure way Reserve shared pages for each application Or for each path series of security domains CS 636 Internetworking 120 Implementing FBUFs continued 9 Domain 2 u DomainO D Path l buffers a Path2buffers 0 Early demultiplexing necessary Lowest level driver identi es path Picks buffers for that path 0 Free buffers transmitted back to lowest level 33 636 Interneivvorking 121 End of the FBUFs story Change in the API for networking Standard copy semantics not preserved To protect against buggy code Kernel toggles write enable after fbuf transferred from application Bit is set again when fbuf is given back Just change the protocol Web requests using GET to a webserver What if a client could say I want a le X I want you to put that le between addresses A and B in my memory The server could say I will schedule a DMA into your memory Voila Transfer done Remote DMA RDMA is just an extension of DMA over network Originally proposed by a group of architects over VAX clusters 1986 Goal No per packet mediation by the CPUs Two issues How receiver knows where to place data How security and integrity are handled Remote DMA Buffer B Page 11 Source A Page 16 Destination Used within SANS and clusters Can transfer megabytes without CPU involvement Incorporated in modern protocols Fiber channel In niband iSCSl Buffer id s are random strings hard to guess CS 8313 iniel39networiring What are storage area networks a Wig 39 F TuWANVar w Remm Sunny Fm 91 Fm sham SmrzveAre mka CE 636 Hl ei MMHg Storage technologies Fiberchannel Transfer data between two ports At speeds of 10625 Gbit For Storage via the SCSI 3 protocol Using optical and copper media 0 In niband Merge network disk and PCI bus into one iSCSl Gb Ethernet cheaper than Fiberchannel r r quotx with m lav1739 Ccix linen i l z itww will g Web server sending a le 0 Web server reads from disk and writes to the socket Two system calls Read and Write So far optimized only the Write 0 How about merging reads and writes to eliminate copy 2 Memorymapped IO Removes copy 2 Server buffer mapped to same pages as le cache buffer mmap system call Allows application to map a le directly to an application address space Like a copy of the le in application cache P4 leverage system components Example Flash 0 Flash Web Server Avoids Copyl and Copy 2 by using mmap Caches frequently used les into the application buffers Limited storage so use cache replacement Duplication of caching File system also caches les Copy 3 not avoided here oh man How to avoid copy 2 and 3 Mmap COW copy on write Does nothing for dynamic content and web server CGI processes transmit data via IPC What about TCP checksums IOIite API 0 IO Lite and Applications To take full advantage of IO Lite application programs can use an extended IO API that is based on buffer aggregates IOLite IO API Sizet IOL read int fd OLAgg aggr sizet size Sizet OLwrite int fd OLAgg aggr IO splicing Extend API with the send le system call 0 A full le is sent without ever passing through user space 0 The kernel uses the le cache buffer as socket buffer Broadening beyond copies VJ s idea RISC processors have delay slot between loads and stores Empty cycle used for other computation Eg copy loop 9 checksum 0 Clark amp Tenenhouse Generalize VJ s idea to other data intensive functions Integrated Data Processing Integrated layer processing When data is read in perform processing for multiple layers Perform CRC computations when data is copied Other examples didn t work out encryption format conversions challenges Other info needed keys in different modules Different layers use different chunk sizes Some manipulations conditional Instruction cache cannot hold all code Broadening beyond data Memory A F1 s code iCache size F1 s and F25 Frequently used code V A F28 code Infrequently used code V Two frequently used instructions can map to the same line of a direct mapped l cache Relocating frequently used code can eliminate con ict might increase code size 33 636 lnternetvvorking 138 Broadening beyond data Arrange code such that caching is effective 0 Normal compiler generated code not cache effective Direct mapping Multiple instructions per block 0 Networking code Have lots of if then else Often many checks are not performed
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'