Internet Arch& Protocols
Internet Arch& Protocols CS 7260
Popular in Course
Popular in ComputerScienence
This 0 page Class Notes was uploaded by Alayna Veum on Monday November 2, 2015. The Class Notes belongs to CS 7260 at Georgia Institute of Technology - Main Campus taught by Staff in Fall. Since its upload, it has received 13 views. For similar materials see /class/234100/cs-7260-georgia-institute-of-technology-main-campus in ComputerScienence at Georgia Institute of Technology - Main Campus.
Reviews for Internet Arch& Protocols
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/02/15
Content Overlays Nick Feamster CS 7260 March 8 2006 Administrivia Quizzes back at end of class Statistics Average 45 Std Dev 5 Max 55 Min 33 If not don t panicjust do a good project Today s Lecture Propagation Dagon et al Modeling Botnez Propagation using Timezones What can be done with a botnet Spam Phishing Click fraud Identity theft DDoS Content Overlays Resilient Overlay Networks Motivation Basic Operation Problems scaling syncrhonization etc Other applications security Modeling Botnet Propagation Heterogeneous mix of vulnerabilities Diurnal patterns Fianna Polrid Gelmany llaly Netherlands Unled kingdom 10000 Diurnal patterns can have an effect on the rate of propagation 8000 GOOD S r N CarnationsMinna Can model spread of the botne t based on shortterm propagation vi 1 2000 1 I x l 39 quot l 31112104 31l 12f04 DUDHOS OIIOHDS 020105 OZIOII OS OSNJHO OSIUIOS 04011 05 lJMOHO 05101 KB no 00 12 00 00300 1200 EIUIDO 00 CD 1200 0000 23900 0000 Modeling Propagation Single TZ Pairwise infection rate scanning ratesize of IP space Removal rate some dm fraction of online Um e39iil ff 5quott infected machines it df Ingggtzd Online Online vulnerable infected hosts hosts SW 2 Mt t Rm Useful for modeling the spread of regional worms Question How common is this Extension to multiple timezones is reasonably straightforward Spread across multiple timezones diff ri39l i l39jif39a4r 1quot finquot H57 Hr hOnltine vtlilnerable jquot quot ossm Imezonei T 39 E 1 1 33 3911quot l 15 I 239 I 1 NGWIY infeCtEd 7n39 n y JEN Infection from zone j to i hosts in timezone i Question What assumption is being made regarding scanning rates and timezones Experimental Validation How to capture various parameters Derive diurnal shaping function by country Monitor scanning activity per hour per day 24 bins Normalize each day to 1 and curvefit How to estimate Nt per timezone Fitting the model to the data x110 3 I Botnet data A 39 Diurnai model 39 l SIR modeli 25 A 2 I l llIA a g A 15 in i i i 1 39 39 A 39 quotA VI fl In 15 AI A A A A V A 0 2000 4000 6000 8000 Time I minute Diurnal shaping function yields more accurate model Applications of the model Forecasting the spread of botnets Improved monitoring and response capabilities A faster spreading worm may be stealth depending on the time of day that the worm was released x10 4 35 3 396 a w 12 14 16 Time airler release hoursl New Trend Social Engineering Bots frequently spread through AOL IM A botinfected computer is told to spread through AOL IM lt contacts all of the logged in buddies and sends them a link to a malicious web site People get a link from a friend click on it and say sure open it when asked nSUInfector DSUInfectee i le Edit Insert Eeople OSUInfector39s Warning LevelD 08Unfectonhtl 83h39t35cnm icturesmom aim 1Alwamn z ulmeem ml aim NeverMiss 7 3937 Q 0 ea g5 111991 Earn Block Evpressions Games Ia lk fair lllll Ieris mm 10 Early Botnets AgoBot 2003 Drops a copy of itself as svchostexe or syschkexe Propagates via Grokster Kazaa etc Also via Windows file shares 11 Botnet Operation General Assign a new random nickname to the bot Cause the bot to display its status Cause the bot to display system information Cause the bot to quit IRC and terminate itself Change the nickname ofthe bot Completely remove the bot from the system Display the bot version or ID Display the information about the bot Make the bot execute a EXE le C Commands Cause the bot to display network information Disconnect the bot from IRC Make the bot change IRC modes Make the bot change the server Cvars Make the bot join an IRC channel Make the bot part an IRC channel Make the bot quit from IRC Make the bot reconnect to IRC Redirection Redirect a TCP port to another host Redirect GRE traf c that results to proxy PPTP VPN connections DDoS Attacks Redirect a TCP port to another host Redirect GRE traf c that results to proxy PPTP VPN connections Information theft Steal CD keys of popular games Program termination 12 PhatBot 2004 Direct descendent of AgoBot More features Harvesting of email addresses via Web and local machine Steal AOL loginspasswords Sniff network traffic for passwords Control vector is peertopeer not IRC 13 PeertoPeer Control Good distributed CampC possible better anonymity Bad more information about network structure directly available to good guys IDS overhead typical p2p problems like partitioning joinleave etc 14 Botnet Application Spam Example Bobax Approximate size 100k bots 20000 Bobax 1P Baha i spam 1001 5mm Proportionally less mam spam from bots 40W J I 7 1300 r V i 39z I 4 quotquotquotquotquot quotIf I l I I I I D G 4 Q m C5 x g as c m 1 0quot g 1 LI rJ g 4 1 I 39 r3quot 5 1 7 r I 6 s 2 I i d r 1 a a g quot24 pre x Most ot IP addresses do not return E 06 g 39 r 65 of bots only send mail to a g 04 domain once over 18 months 02 quotn39 quotin39 39i x 39c39m39 39la m39 366006 lo 39m39 391ng Lifetime in seconds E mmmmns to lhsmja mg awn IE3 53 16 Blacklisting Seems to Work Pretty Well chtiau of all spam rcccivcd LE y La L4 I I I Spam from bobu drones A Spiln l H S me fmm trans icnt EGP immune m m Minimum numb OEDNSBLS listing this spanm39icr Premium for spamming bots that are not blacklisted 17 Botnet Application Phishing Phishing attacks use both social engineering and technical subterfuge to steal consumers39 personal identity data and financial account credentials Antispam working group Socialengineering schemes Spoofed emails direct users to counterfeit web sites Trick recipients into divulging financial personal data AntiPhishing Working Group Report Oct 2005 15820 phishing email messages 4367 unique phishing sites identified 96 brand names were hijacked Average time a site stayed onIine was 55 days Question What does phishing have to do with botnets 18 Which web sites are being phished Retail 25 SP 5 t39e iscellaneous 33 Financial Services 893 Source Antiphishing working group report Dec 2005 Financial services by far the most targeted sites New trend Keystroke logging 19 Phishing Detection and Research Idea Phishing generates sudden uptick of password reuse at a brandnew IP address Hpwd etradecom Hde Distribution of password harvesting across bots can help 20 Botnet Application Click Fraud Payper click advertising Publishers display links from advertisers Advertising networks act as middlemen Sometimes the same as publishers eg Google Click fraud botnets used to click on payper click ads Motivation Competition between advertisers Revenue generation by bogus content provider 21 Open Research Questions Botnet membership detection Existing techniques Require special privileges Disable the botnet operation Under various datasets packet traces various numbers of vantage points etc Click fraud detection Phishing detection 22 Today s Lecture Propagation Dagon et al Modeling Botnet Propagation using Timezones What can be done with a botnet Spam Phishing Click fraud Identity theft DDoS Content Overlays Resilient Overlay Networks Motivation Basic Operation Problems scaling syncrhonization etc Other applications security The Internet Ideal Dynamic routing routes around failures Enduser is none the wiser 24 Lesson from Routing Overlays Endhosts are often better informed about performance reachability problems than routers Endhosts can measure path performance metrics on the small number of paths that matter Internet routing scales well but at the cost of performance 25 Reality Routing pathologies Paxson s paper from a few lectures ago 33 of routes had serious problems Slow convergence BGP can take a long time to converge Up to 30 minutes 10 of routes available lt 95 of the time Labovitz Invisible failures about 50 of prolonged outages not visible in BGP Feamster 26 Intuition for Delayed BGP Convergence There exists a message ordering for which BGP will explore all possible AS paths Convergence is ON where N number of default free BGP speakers in a complete graph In practice exploration can take 1530 minutes Question What typically prevents this exploration from happening in practice Question Why can t BGP simply eliminate all paths containing a subpath when the subpath is withdrawn 27 Routing Convergence in Practice Time Pl39Pfix Type AS Path Lac alpl39efItEED 39olmmumty 33312 19578380123 A 17421100166311000 ggggggm 1957833023 A 33333533373 2 03J S f123 5400246 ggigm 19578 380123 w Route withdrawn but stub cycles through backup path 28 Resilient Overlay Networks Goal Increase reliability of communication for a small ie lt 50 nodes set of connected hosts Main idea End hosts discover networklevel path failure and cooperate to reroute 29 The RON Architecture Outage detection Active UDPbased probing Uniform random in 014 0072 3way probe Both sides get RTT information Store latency and lossrate information in DB Routing protocol Linkstate between overlay nodes Policy restrict some paths from hosts Eg don t use lnternet2 hosts to improve nonlnternet2 paths 30 Main results RON can route around failures in 10 seconds Often improves latency loss and throughput Singlehop indirection works well enough Motivation for second paper SOSR Also begs the question about the benefits of overlays 31 When and why does RON work Location Where do failures appear A few paths experience many failures but many paths experience at least a few failures 80 of failures on 20 of links Duration How long do failures last 70 of failures last less than 5 minutes Correlation Do failures correlate with BGP instability BGP updates often coincide with failures Failures near end hosts less likely to coincide with BGP Sometimes BGP updates precede failures why Feamster et al Measuring the Effects oflnternet Path Faults on Reactive Routing SIGMETRICS 2003 32 Location of Failures Why it matters failures closer to the edge are more difficult to route around particularly last hop failures RON testbed study 2003 About 60 of failures within two hops of the edge SOSR study 2004 About half of failures potentially recoverable with onehop source routing Harder to route around broadband failures why 33 SOSR gt 3 potential intermediaries diminishing returns Random selection works pretty well 03 E ran do mk El 86 P paths k El histo ryk 0 7 08 05 04 03 O ZJ 0 mailman of Tallures recovered I l I l I I l l I K number 0139 concurrently attempted Intermediaries Questions Results probably depend on topology ie placement of intermediaries What is the cost of random in terms of latency s4 Benefits of Overlays Access to multiple paths Provided by BGP multihoming Fast outage detection Butrequires aggressive probing doesn t scale Question What benefits does overlay routing provide over traditional multihoming intelligent routing eg RouteSciencey 35 Multihoming vs Overlays Intelligent Routing to lSPs 1 If all multihomed ends do this I18 netblock Routing table expansion Overlay Routing Cost F Overlay provider ATT Connectivity fees Connectivity fees overlay fee Operational Announce issues I20 subblocks Overlay node forces inter mediate ISP to provide transit 1 Bad interactions with policies Multihoming vs Overlays Route control similar to overlay routing for most practical purposes Overlays very useful for deploying functionality Multicast VPNs QoS security But overlays may be overrated for endtoend performance and resilience Don t abandon BGP there s still hope 37 Open Questions Efficiency Requires redundant traffic on access links Scaling Can a RON be made to scale to gt 50 nodes How to achieve probing efficiency Interaction of overlays and IP network Interaction of multiple overlays 38 Efficiency Problem traffic must traverse bottleneck link both inbound and outbound Solution innetwork support for overlays Endhosts establish reflection points in routers Reduces strain on bottleneck links Reduces packet duplication in applicationlayer multicast next lecture 39 ScaHng Problem On2 probing required to detect path failures Does not scale to large numbers of hosts Solution Probe some subset of paths which ones Is this any different than a routing protocol one layer higher A L Performance convergence speed etc 40 Scalability Interaction of Overlays and IP Network Supposed outcry from ISPs Overlays will interfere with our traffic engineering goals Likely would only become a problem if overlays became a significant fraction of all traffic Control theory feedback loop between ISPs and ove ays Philosophyreligion Who should have the final say in how traffic flows through the network Traffic matrix gt Changes In paths 41 Interaction of multiple overlays Endhosts observe qualities of endtoend paths Might multiple overlays see a common good path Could these multiple overlays interact to create increase congestion oscillations etc 42 The Price of Anarchy cost of worst Nash equilibrium socially optimum cost A directed graph G E source sink pairs siti for i1k rate ri 2 O of traffic between si and ti for each i1k For each edge e a latency function e 43 Flows and Their Cost Traffic and Flows A flow vector f specifies a traffic pattern fP amount routed on siti path P S l o Pf501 The Cost of a Flow Pf sum of latencies of edges along P wrt flow f Cf cost or total latency of a flow f 2P fP Pf 44 Example Fbw5 X me5 Traffic on lower edge is envious An envy free flow X at ow I 1xgtquotU 39eouv 2 me0 45 Flows and Game Theory Flow routes of many noncooperative agents each agent controlling infinitesimally small amount cars in a highway system packets in a network The toal latency of a flow represents social welfare Agents are selfish and want to minimize their own latency 46 Flows at Nash Equilibrium A flow is at Nash equilibrium or is a Nash flow if no agent can improve its latency by changing its path Assumption edge latency functions are continuous and non decreasing Lemma a flow f is at Nash equilibrium if and only if all flow travels along minimumlatency paths between its source and destination wrt f Theorem The Nash equilibrium exists and is unique 47 Braess s Paradox Traffic rate r 1 Cost of Nash flow 2 All the flows have increased delay 48 Existing Results and Open Questions Theoretical results on bounds of the price of anarchy 43 Open question study of the dynamics of this routing game Will the protocoloverlays actually converge to an equilibrium or will the oscillate Current directions exploring the use of taxation to reduce the cost of selfish routing 49 DoS Attacks Revisited Primary mitigation technique filtering Problem often too late by the time traffic reaches the bottleneck link One solution use communities to push null routes back into the network Problem configuring lots of packet filters may not scale introduce maintenance overhead 50 Using overlays to improve security Idea Combine overlays with lightweight filtering with overlays to implement distributed filtering Lightweight E Authenticator U Fllter Ring I Overlay E Nodes E quotquot Overlay Routing Clients Client a Authentlcator Clients communicate with overlay nodes authentication depends on service Overlay nodes provide lightweight authenticators to server Defense against DoS attacks 51 OverlayBased Filtering Filter ring set of routers that provide internet connectivity to a server Key design question placement of filter ring Lightweight capabilities must be easy to implement on resourcelimited routers Egress source address only allow packets from overlay nodes Server destination address null route packets that are not to server s IP Server changes destination IP address from time to time 52 BRICK A Novel Exact Active Statistics Counter Architecture Nan Hua1 Bill LinZ Jun Jim Xu1 Haiquan Chuck Zhao1 Georgiaamp Georgia Institute of Technology UCSD Tech 2University of California San Diego Supported in part by CNS0519745 CNS0626979 CNS0716423 CAREER Award ANI0238315 and a gift from CISCO Outline Motivation and current approaches Our approach Performance evaluation Conclusion Motivation 0 Routers need to maintain very large arrays of per flow statistics counters at wirespeed Needed for various network measurement router management traffic engineering and data streaming applications Millions of counters are needed for perflow measurements Large counters are needed eg 64 bits for worstcase counts during a measurement epoch At 40 Gbs just 8 ns update time Passive vs Active Counters Passive counters For collection of traffic statistics that are analyzed quotofflinequot counters just need to be updated in wirespeed but full counter values generally do not need to be read frequently say not until the end of a measurement epoch Active counters However a number of applications require the maintenance of active counters in which values may need to be read as frequently as they are incremented typically on a per packet basis eg in many data streaming applications on each packet arrival values need to be read from some counters to decide on actions that need to be taken Nai39ve Approach 0 Store full counters in SRAM which supports both passive and active counter applications 0 Problem Prohibitively Expensive eg 1 million flows x 64 bits 64 Mbits 8 MB of SRAM Number of flows increasing with line rates Hybrid SRAMDRAM Architectures Shah OZ Ramabhadran 03 Roeder 04 Zhao 06 0 Basic idea Store full counters in DRAM 64bits Keep say a 5bit SRAM counter one per flow Wirespeed increments on 5bit SRAM counters quotFlushquot SRAM counters to DRAM before they quotoverflowquot Once quotflushedquot SRAM counter won t overflow again for at least say another 25 32 or 2b in general cycles 0 Problem Passive Only Can only read counter values at DRAM speed eg 50 ns ltlt wirespeed Interleaved DRAM Architectures Lin and Xu HotMetrics OS 0 Basic idea Exploit the fact that modern DRAMs have many internal memory banks eg Rambus XDR has 16 internal banks per memory chip New memory transaction can be initiated say every 4ns if to a different internal memory bank even though memory latency is much higher Therefore wirespeed counter updates can be achieved 0 Problem Still Passive Only Worstcase counter read time too high Counter Braids Lu et al Sigmetrics 08 0 Inspired by the construction of LDPC codes Counter updates performed on an encoded structure called a quotcounter braid Counter values can be viewed as a linear transformation of flow counts However counter braids are quotmore passive than SRAM DRAM or DRAM architectures to find out the size of a single flow one needs to decode all flow counts in a lengthy decoding process 0 Problem Also Passive Only Approximate Counters 0 Generally based on the approximate counting idea by Morris 1978 Idea is to quotprobabilisticallyquot increment a counter based on the current counter value Small number of bits can be used eg 5 bits per counter and hence can be stored in SRAM for active retrieval However approximate counting in general has a very large error margin when the number of bits used is small eg well over 100 error not acceptable in many applications 0 Problem Large Errors Possible Summary 0 Nai39ve quotbrute force SRAM approach Too expensive 0 SRAMDRAM DRAM and counter braid approaches Passive counting applications only 0 Approximate methods Not sufficiently accurate Our Approach 0 Main observations The total number of increments during a measurement epoch is bounded by M cycles eg M 16 million cycles Therefore the sum of all N counters is also bounded by M eg N 1 million counters N 20 lt M i1 Although worstcase count can be M the average count is much smaller eg MN 16 then average counter size should be just log 16 4 bits Our Approach cont d 0 To exploit the fact that most counters will be small we propose a novel quotVariableLength Counter representation called BRICK which stands for Bucketized Rank Indexed Counters 0 Only dynamically increase counter size as necessary The result is an exact counter data structure that is small enough for SRAM storage enabling both active and passive applications Basic Idea 0 Randomly bundle counters into buckets counter index y 1 B k counters 1 39 b k k per uc et random k 1 permutation 32 3 2k 7r1N a 1N l permuted index i N k1 Bh N 0 Statistically the sum of counter sizes per bucket should be similar BRICK Wall Analogy 0 Each row corresponds to a bucket 0 Buckets should be statically sized to ensure a very low probability of overflow 0 Then provide a small amount of extra storage to handle overflow cases A Key Challenge and Our Approach 0 The idea of variable length data structures is not new but expensive pointers are typically used to quotchainquot together different segments of a data structure 0 In the case of counters these pointers are as or even more expensive than the counters themselves 0 Our key idea is a novel indexing method called Rank Indexing Rank Indexing o How rank indexing works The location ofthe linked element is calculated by the quotrankquot operation rankA b which returns the number of bits set in bitmapA at or before position b No need for explicit pointer stor age Bitmaps A I A I A A 3 A2 A 1 daga indzex datza injex datla 10 00000 C164 Ii i 10 mnmplquot 00000 C164 11111 C231 11 11111 C231 rank z 2 llquot 11110 C330 5 11110 7330 10 C42 10 C42 11011 C5379 11011 C5379 100 C64 r1111 1152 100 C6 11 C73 it SZnUl bitsetinIl 11 C73 10 C82 10 C82 Rank Indexing Key observation The rank operator can be efficiently implemented in modern 64bit X86 processors 0 Specifically both Intel and AMD x86 processors provide apopcount instruction that returns the number of 1 s in a 64 bit word 0 The rank operator can be implemented in just 2 instructions using a bitwise AND instruction and the popcount instruction Dynamic Sizing 0 Suppose we increment C2 which requires dynamic expansion into A2 0 The update is performed by performing a variable shift operation in A2 which is also efficiently implemented with x86 hardware instructions data index data index data data index data index data 10 10 rank h 1 quot 00000 C1 64 10 10 mm 1 N 00000 C1 64 mm 2 11111 C2 31 H 1 mnk 1 2 00000 C2 32 11110 C330 11 11110 C330 rank z 3 3939 10 C42 n 10 C42 11011 C5 379 11011 C5 379 rankIl 5 2 100 C6 2 rankIl 5 3 100 C6 2 4 nd rd It was 2 blt set 1n 1 11 C7 2 3 It s now 3 blt set 1n 1 u 11 C7 2 3 10 C82 10 C82 Finding a Good Configuration 0 We need to decide on the following for a good configuration k the number of counters in each bucket p the number of subarrays in each bucketA1 AP k1 kp the number of entries in each subarray k1 k w1 wp the bitwidth subarray 0 Given these configurations we can decide on the probability of bucket overflow Pf using a binomial distribution tail bound analysis Tail Bound I N 0 Due to the total count constraint 2C 3 M M i1 defined as md counters would be expanded into the dm Array 0 Translated into the language of balls and bins Throwing md balls into N bins The capacity of each bin is only kd Bound the probability that more than Jd bins have more than kd balls Tail Bound ll 0 Random Variable XW denotes the number of balls threw into ith bin when there comes m balls in total h o The fall probabIIIty IS Pr1X mgtc gt J j J is the number of fullsize buckets preallocated now we forgot quotdquot since the calculation is the same for each level For convenience we use c to denote kd 0 We could quotestimatequot fail probability by this way The overflow probability from one bin is roughly Binotail is tail probability 6 Z BmOtaZZkWN C of Binomial distribution Then the total fail probability would be roughly 5 Binotaz lhge J This calculation is not strict since Random Variable XWS are correlated under the constraint although weakly Tail Bound Ill How to quotdecorrelate the weakly correlated XW Construct Random Variables YW i1h which is iid random variables with Binomial distribution kmN it could be proved that EfXf 39X mS 2EfY1 Yzfm where f is an nonnegative and increasing function Then we could use the following increasing indicator function to get the bound fx19 9xh 1 h ZlxigtcgtJ i1 Numerical Results 0 Subcounter array sizing and percounter storage for k 64 and Pf 103910 a Sizing of subcounter arrays 0 162 kg 164 k5 w1 w2 ws w4 ws 3 15 3 1g M 3 4 13 4 25 10 2 1g 4 2 2 4 12 5 25 10 3 1 1g 2 2 3 4 9 b Size of each subcounter array kj gtlt wj in bits A2 A3 A4 A5 15x460 3X1339 25x250 10x440 2X1224 25x250 10x330 3X412 1X99 Unbw 0 Storage per counter 103 104 105 ilg6i05 1g5i66 1g5i50 za Effects of Larger Buckets o Bucket size k 64 works well amenable to 64 bit processor instructions 65 EI llt64 k128 E 6 k256 k512 E 8 E 55 CL 3 395 E g 5 D 45 39 39 39 3 4 5 number of levels p Simulation of Real Traces 0 USC 189 million packets 11 million flows and UNC traces 326 million packets 124 million flows Percentage of fullsize buckets lTraceH h lJl el USC 173K 111 060 UNC 195K 104 057 Concluding Remarks 0 Proposed an efficient variable length counter data structure called BRICK that can implement exact statistics counters 0 Avoids explicit pointer storage by means of a novel rank indexing method 0 Bucketization enables statistical multiplexing Denial of Service Attacks Nick Feamster CS 7260 February 22 2006 Today s Lecture What is Denial of Service Attacks and Defenses Packetflooding attacks Attack SYN Floods Defenses Ingress Filtering SYN Cookies Client puzzles Lowrate attacks Detection Singlepacket lP Traceback Networklevel defenses sinkholes and blackholes Inferring Denial of Service Activity Yang et al A DOSLimiting Network Architecture Distributed Denial of Service Intro Denial of Service What is it Attacker l Attempt to exhaust resources Network Bandwidth Transport TCP connections Application Server resources Typically highrate attacks but not always Victim PreZOOO Denial of Service DoS Tools Singlesource single target tools IP source address spoofing Packet amplification eg smurf Deployment Widespread scanning and exploitation via scripted tools Handinstalled tools and toolkits on compromised hosts unix Use Hand executed on source host TCP 3Way Handshake C S SYN SYNS ACKC ACIS Store data Wait Connected TCP handshake Each arriving SYN stores state at the server TCP Control Block TCB 280 bytes FIowID timer info Sequence number flow control status outof band data MSS other options agreed to Halfopen TCB entries exist until timeout Fixed bound on halfopen connections Resources exhausted gt requests rejected TCP SYN flooding Problem No client authentication of packets before resources allocated Attacker sends many connection requests Spoofed source addresses RSTs quickly generated if source address exists No reply for nonexistent sources Attacker exhausts TCP buffer to w halfopen connections SYN Flooding C S SYN SYNC3 Idea 1 lngress Filtering Drop all packets with source address other than 20469207024 20469207024 RFC 2827 Routers install filters to drop packets from networks that are not downstream Feasible at edges Difficult to configure closer to network core Idea 2 uRPF Checks Accept packet from interface only if forwarding table entry for source IP address matches ingress interface 5 Ode 100183 from wrong interface uRPF Enabled a a a o 39 v 1001024 Int 1 l00lBO24 Int 2 A Routing Table Destination Next Hop Unicast Reverse Path Forwarding Cisco ip verify unicast reversepath Requires symmetric routing 10 Problems with uRPF Umcas RPF apphid xo 50 would MI E black traf c In 9539 mute 7 m A sue Armm sn sue A Unmasx RPF apnea m sv would BLOCK name fram see A Besx mule m39 SP me sue A Asymmetric routing Idea 3 TCP SYN cookies General idea Client sends SYN w ACK number Server responds to Client with SYNACK cookie sqn fsrc addr src port dest addr dest port rand Server does not save state Honest client responds with ACKsqn Server checks response If matches SYNACK establishes connection 12 TCP SYN cookie TCP SYNACK seqno encodes a cookie 32 bit sequence number t mod 32 counter to ensure sequence numbers increase every 64 seconds MSS encoding of server MSS can only have 8 settings Cookie easy to create and validate hard to forge Includes timestamp nonce 4tuple 32 O l l 5 bits 3 bits 13 SYN Cookies client sends SYN packet and ACK number to server waits for SYNACK from server w matching ACK number server responds w SYNACK packet w initial SYNcookie sequence number Sequence number is cryptographically generated value based on client address port and time client sends ACK to server w matching sequence number server If ACK is to an unopened socket server validates returned sequence number as SYNcookie If value is reasonable a buffer is allocated and socket is opened SYN W9 SYNACK seqnumber as SYNcookie ack number NO BUFFER ALLOCATED K seqnumber ack numberdata SYNACK seqnumber ack number TCP BUFFER ALLOCATED 14 Client Puzzles Common technique Cryptographic hash inversion Challenge given k find X Response Message X Veri cation hMessage X H JH J kbits mk bits 1 s k s 64 m 128 02quot guesses to produce answer 15 Lowrate attacks Not all attacks are large flooding DOS attacks Wellplaced single packet attacks Packets may have spoofed IP addresses How to track these attacks and find their origin 16 IP Traceback 17 Logging Challenges Attack path reconstruction is difficult Packet may be transformed as it moves through the network Full packet storage is problematic Memory requirements are prohibitive at high line speeds QC192 is 10Mpktsec Extensive packet logs are a privacy risk Traffic repositories may aid eavesdroppers 18 SinglePacket Traceback Goals Trace a single IP packet back to source Asymmetric attacks eg Fraggle Teardrop pingof death Minimal cost resource usage One solution Source Path Isolation Engine SPIE 19 SPIE Architecture DGA Data Generation Agent computes and stores digests of each packet on forwarding path Deploy 1 DGA per router SCAR SPIE Collection and Reduction agent Long term storage for needed packet digests Assembles attack graph for local topology STM SPIE Traceback Manager Interfaces with IDS Verifies integrity and authenticity of Traceback call Sends requests to SCAR for local graphs Assembles attack graph from SCAR input 20 Data Generation Agents Compute packet digest Store in Bloom filter Flush filter periodically 21 Packet Digests Compute hashp Invariant fields of p only 28 bytes hash input 000092 WAN collision rate Fixed sized hash output nbits Compute k independent digests Increased robustness Reduced collisions reduced false positive rate 22 Hash input Invariant Content Total Length Fragment Offset Identification ii 28 Source Address bytes Checksum Destination Address First 8 bytes of Payload Remainder of Payload 23 Hashing Properties Each hash function Uniform distribution of input gt output H1 x H1 y for some xy gt unlikely Use k independent hash functions Collisions among k functions independent H1x H2y for some xy gt unlikely Cycle k functions every time interval t 24 Digest Storage Bloom Filters l Fixed structure size Uses 2 bit array n bits InItIallzed to zeros Insertion Use nbit digest as indices into bit array Set to 1 2n bits Membership Compute k digests d1 d2 etc If filterdi1 for all i router forwarded packet 25 Inferring DoS Activity IP address spoofing creates random backscatter SYM BJ Escatter m kst5 vmctim Attacker gt Attack 39 391 Backscat ter 26 Backscatter Analysis Monitor block of 17 IP addresses Expected of backscatter packets given an attack of m packets EX nm 232 Hence m X 232n Attack Rate R gt mT XT 232 n 27 Assumptions Uniform distribution Ingress filtering reflectors etc cause us to underestimate of attacks Reliable delivery Packet losses server overload amp rate limiting cause us to underestimate attack ratesdurations 28 Inferred DoS Activity 160 1 0 l a i I 83 8328 83 8239 323 32 3 Over 4000 DoSDDoS attacks 2 per week 0 m I E Short duration 80 last less 1 than 30 minutes 0 l I l I T l 2 5 10 30 1 2 6 12 1 2 7 min hour day Attack Duration Moore et al Inferring Internet Denial of Service Activity 29 Networklevel Defense Blackhole Figure 3 C us iomerJnfiiared Biack Hoie To Defend Against a DOS Amok 4Tra ic with destination 1921681 1 gets discarded at NULL imerfac 1Atluerlises1921EB11f32 with community 53566 Customer 1911 Did 3 All edge routers have static 39 route dial route mints Internet 10255155355132 to NULL interface Service Provider As 53 ltAa 192163 132 2 Sets community 5311 on commumw 53m 39 i 1921581324 39 z Matches community 53656 on 19216311I32 and changes HEKLHUP to 10255255255 Distributed Denial of Service DDoS Daemon Daemon Real Attacker IP addresses no longer need to be spoofed 31 Internet Measurement CS 7260 Nick Feamster February 13 2006 Administrivia Project meetings today and tomorrow next Monday as needed New problem set out Wednesday 34 shorter problems 9 days Today s Lecture Why measure Troubleshooting and debugging Incorporation into existing protocols Performance monitoring Science What to measure Routing traffic volumes applicationlevel statistics paths etc Pitfalls Measurement wishlist Traffic Why Measure Billing Measure usage on links tofrom customers Traffic engineering and capacity planning Measure the traffic matrix ie offered load Tune routing protocol or add new capacity Troubleshooting Anomaly detection Lecture 12 next week Traffic EngineeringCapacity Planning Need to estimate offered traffic loads BGP policy Topology configuration BGP routing Offered model traffic 1 Flow of traffic through the network eBGP gt routes Billing for Internet Usage 95th Percentile billing Customer network pays for committed information rate CIR Throughput measured every 5 minutes typically with SNMP flow statistics also can be used for billing Customer billed based on 95th percentile 00 M 800 M 500 M 400 M bitsx39sec 300 M 200 M 0200 0400 0800 0800 1000 1200 1400 1800 1800 2000 2200 0000 I Inbound Traffic current 48182 M nuerage 487430 M Max 745262 M Ioutbound Traffic Current 231085 M average 30754 M Max 635045 M Passive vs Active Measurement Passive Measurement Collection of packets flow statistics of traffic that is already flowing on the network Packet traces Flow statistics Applicationlevel logs Active Measurement Inject probing traffic to measure various characteristics Traceroute Ping Applicationlevel probes eg Web downloads Passive Traffic Data Measurement SNMP bytepacket counts everywhere Packet monitoring selected locations Flow monitoring typically at edges if possible Direct computation of the traffic matrix Input to denialof service attack detection Deep Packet Inspection also at edge where possible Simple Network Management Protocol Management Information Base MIB SNMP Information store Manager Agent Unique variables named by Mafiaged OIDS Objects Accessed with SNMP Specific MIBs for bytepacket counts per link SNMP Passive Advantage ubiquitous Supported on all networking equipment Multiple products for polling and analyzing data Disadvantages see Lecture 6 Coarse granularity Cannot express complex queries on the data Unreliable delivery of the data using UDP Utility Link utilization billing Traffic matrix inference 10 Packetlevel Monitoring Passive monitoring to collect full packet contents or at least headers Advantages lots of detailed information Precise tming information Information in packet headers Disadvantages overhead Hard to keep up with highspeed links Often requires a separate monitoring device 11 Full Packet Capture Passive Example Georgia Tech OCy 3Mon Pod installed in He rackmounting kit DAGfPod necting ca Rackmounted PC c o n n e c a rs Optical splitter Data Acquisition and Generation DAG card DAGSJTinstalled in internal PCI slot Source endacecom Ir Equipment R rack 12 1 L httpoptimusprimenocgatechedulcurrent Tr aFFic in Mbps For Feb 11 2006 00 01 02 03 04 05 05 0 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Finegrained timing and applicationIevelm information port numbers TCP flags URLs etc 13 Portlevel information Example Traffic to and from 000 Col I Unassigned Port 23848088679 I 80 32432134245 I Ether well Known 252340901 53 I121231666851 22 9084626995 Note lots of port 22 traffic different from other GT subnets 14 Detailed Traffic Accounting Traffic monitoring by host Daily summaries Planetlab quot 1997712953 quot 19977128194 1997712973 quot19977128193 Spamhaus In MB 2631949 227877 517644 132103 Out MB 2706446 1885823 1579716 1179969 Total MB 5338395 2113700 2097360 1312071 15 Traffic Flow Statistics Flow monitoring eg Cisco Netflow Statistics about groups of related packets eg same lPTCP headers and close in time Recording header information counts and time More detail than SNMP less overhead than packet capture Typically implemented directly on line card 16 What is a flow Source IP address Destination IP address Source port Destination port Layer 3 protocol type TOS byte DSCP Input logical interface iflndex 17 Cisco Netflow Basic output Flow record Most common version is v5 Current version 9 is being standardized in the IETF templatebased More flexible record format Much easier to add new flow record types 4L CoIlection39lgnd Approximately 1500 bytes 7 gogfcwr Aggregation 2050 flow records Sent more frequently if traffic increases 18 Flow Record Contents Basic information about the flow Source and Destination IP address and port Packet and byte counts Start and end times ToS TCP flags pus information related to routing Nexthop IP address Source and destination AS Source and destination prefix 19 Aggregating Packets into Flows NN V flow 1 flow 2 flow 3 ow 4 Criteria 1 Set of packets that belong together Sourcedestination IP addresses and port numbers Same protocol ToS bits Same inputoutput interfaces at a router if known Criteria 2 Packets that are close together in time Maximum interpacket spacing eg 15 sec 30 sec Example flows 2 and 4 are different flows due to time 20 Netflow Processing 1 Create and update flows in NetFlow Cache Srclf SrclPadd Dstlf Dsthadd Protocol TOS Flgs Pkts SrcPort SrcMskSrcAS DstPort DstMsk DstAS NextHop BytesPkt Active Idle Fa10 173100212 Fa00 10022712 11 80 10 11000 00A2 I24 5 00A2 I24 15 100232 1528 391745 Fa10 17310032 Fa00 10022712 6 40 0 2491 15 I26 196 15 I24 15 100232 740 415 1 Fa10 173100202 Fa00 10022712 11 80 10 10000 00A1 I24 180 00A1 I24 15 100232 1428 11455 3 Fa10 17310062 Fa00 10022712 6 40 0 2210 19 I30 180 19 I24 15 100232 1040 245 14 2 Srclf SrclPadd Dstlf Dsthadd Protocol TOS Flgs Pkts SrcPortSrcMsk39SrcAS DstPort DstMsk DstAS NextHop BytesPktActive Idle Fa10 173100212 Fa00 10022712 11 80 10 11000 00A2 I24 5 00A2 I24 15 100232 1528 1800 3 eg ProtocolPort Aggregation Scheme becomes Protocol Pkts SrcPort DstPort BytesPkt 11 11000 00A2 00A2 1528 4 Export Version NonAggregated Flows export Version 5 or 9 Aggregated FIOWS export VerSion 3 0r 9 5 Transport Protocol Export Payload Packet ows Header 21 Reducing Measurement Overhead Filtering on interface destination prefix for a customer port number for an application eg 80 for Web Sampling before insertion into flow cache Random deterministic or hashbased sampling 1outof n or stratified based on packetflow size Two types packetlevel and flowlevel Aggregation after cache eviction packetsflows with same nexthop AS packetsflows destined to a particular service 22 Packet Sampling Packet sampling before flow creation Sampled Netflow 1out of m sampling of individual packets eg m100 Create of flow records over the sampled packets Reducing overhead Avoid perpacket overhead on m1m packets Avoid creating records for a large number of small flows Increasing overhead in some cases May split some long transfers into multiple flow records due to larger time gaps between successive packets E I 39 time not sampled timeout Sampling FlowLevel Sampling Sampling of flow records evicted from flow cache When evicting flows from table or when analyzing flows Stratified sampling to put weight on heavy flows Select all long flows and sample the short flows Reduces the number of flow records Still measures the vast majority of the traffic Flow 1 40 bytes lt pliwith 01 probability I Flow 2 15580 bytes Flow 3 8196 bytes Flow 4 5350789 bytes Flow 5 532 bytes Flow 6 7432 bytes mth 10 probability sample with 100 probability Traffic EngineeringCapacity Planning Need to estimate offered traffic loads BGP policy How to get this Topology configuration BGP routing Offered model traffic 1 Flow of traffic through the network eBGP routes 25 Traffic Matrix Estimation From link counts to the traffic matrix Sources 5Mbps 3Mbps 4Mbps 4Mbps DesUnanns Formalization Sourcedestination pairs p is a sourcedestination pair of nodes Xp is the unknown traffic volume for this pair Links in the network is a unidirectional edge y is the observed traffic volume on this link Routing R 1 if link I is on the path for srcdest pair p Or R J is the proportion of p s traffic that traverses I o y Rx now work back to get X 27 Problem Underconstrained Linear system is underdetermined Number of nodes n Number of links e is around 0n Number of srcdest pairs c is 0n2 Dimension of solution subspace at least c e Multiple observations can help k independent observations over time Stochastic model with srcdest counts Poisson amp iid Maximum likelihood estimation to infer traffic matrix Use NetFlow to augment byte counts can help constrain the problem Lots of recent work on traffic matrix estimation 28 Trend Deep Packet Inspection Intrusion detection capabilities in the router Analysis of packet payloads Recognition of application type Analyzes and identifies application traffic in real time Implemented with programmable hardware Different types of analysis techniques Signaturebased Rulebased Statistical 29 NetFlow vs NBAR Main Objective Side Benefits NetFlow Flow Characterization DDOS amp Worm Detection Network Usage Capacity Planning TE Billing Permanent Record of network activity Capacity Traffic Eng Peering Optimized Edge Routing OER NBAR Identify amp classify traffic based on Detection amp droppinglimiting of payload attributes amp protocol undesired traffic peertopeer file characteristics sharing worms Application statistics for bandwidth provisioning 30 30 Passive Traffic Data Measurement SNMP bytepacket counts everywhere Packet monitoring selected locations Flow monitoring Tracking the application mix Direct computation of the traffic matrix Input to denialof service attack detection 3 Active Measurement Tools Send probes and measure a response Afew common tools Ping RTT and loss Traceroute path and RTT iPerf generation of CBR flows for throughput estimation Pathchar perhop bandwidth latency loss measurement Pchar clink opensource reimplementation of pathchar Nettimer Lai bottleneck bandwidth using packet pair method 32 Using Traceroute to Measure Paths Send packets with increasing TTL values TTL1 TTL2 TTL3 T A ICMP time exceeded 33 Problems with Traceroute Can t unambiguously identify oneway outages Failure to reach host failure of reverse path ICMP messages may be filtered or ratelimited IP address of time exceeded packet may be the outgoing interface of the return packet TTL1 TTL2 TTL3 34 Routing Why Measure Detect anomalies eg route hijacks routing loops oscillations etc More likely to help explain a problem observed with traffic 35 Common Mode iBGP Monitor iBGP session to software box Sees only the best route Two types of logs Table dumps BGP updates 36 CPR Network Measurement at Georgia Tech Detection of noncatastrophic failures Discovered only when users call to complain The Internet is slow this morning This probably doesn t coincide with a warning in OpenView Little quantitative data to verify problems or determine the possible location of the issue 37 Today s Paper Paxson Strategies for Sound Internet Measurement Lessons Metadata Selfconsistency checks Examining the semantics of the data eg TCP acknowledgements without data Analysis of small subsets of data especially for large datasets Automated analysis of longrunning measurements 38 Traffic Anomaly Detection Nick Feamster CS 7260 February 20 2006 Administrivia Problem Set 2 Due Friday Quiz next Monday Open everything notes Web etc Handful of reading questions 12 Design Questions based on things we ve learned MPLS Overview Main idea Virtual circuit Packets forwarded based only on circuit identifier Router can forward traffic to the same destination on different interfacespaths Circuit Abstraction Label Swapping I54 Labelswitched paths LSPs Paths are named by the label at the paths entry point At each hop label determines Outgoing interface New label to attach Label distribution protocol responsible for disseminating signalling information Layer 3 Virtual Private Networks Private communications over a public network A set of sites that are allowed to communicate with each other Defined by a set of administrative policies determine both connectivity and 008 among sites established by VPN customers One way to implement BGPMPLS VPN mechanisms RFC 2547 Building Private Networks Separate physical network Good security properties Expensive Secure VPNs Encryption of entire network stack between endpoints Layer 2 Tunneling Protocol L2TP PPP over IPquot Privacy and No encryption interconnectivity not con dentiality Layer 3 VPNS integrity etc Layer 2 vs Layer 3 VPNs Layer 2 VPNs can carry traffic for many different protocols whereas Layer 3 is IP only More complicated to provision a Layer 2 VPN Layer 3 VPNs potentially more flexibility fewer configuration headaches VPN AlSite 2 Layer 3 BGPIMPLS VPNs VPN Site 1 CEBZ VPN BISite 2 BGP to exchange routes MPLS to forward traf c CE 3 103 6 10111 VPN AlSite 3 VPN AlSite 1 04 16 VPN BISite 3 Isolation Multiple logical networks over a single shared physical infrastructure Tunneling Keeping routes out of the core HighLevel Overview of Operation IP packets arrive at PE Destination IP address is looked up in forwarding table Datagram sent to customer s network using tunneling ie an MPLS labelswitched path BGPIMPLS VPN key components Forwarding in the core MPLS Distributing routes between PEs BGP Isolation Keeping different VPNs from routing traffic over one another Constrained distribution of routing information Multiple virtual forwarding tables Unique addresses VPNIP4 Address extension 10 Layer 3 VPNs Vanilla Layer 3 VPNs All customer routes in the core E LDP LDP 1 1 P 7 quotp quotMPLs CORE 11 Problems Introduced by Layer 3 VPNs Overlapping address space in forwarding table Solution Virtual routing and forwarding table VRF Overlapping address space in BGP routes Solution Route distinguisher 8 byte VPNspecific identifier prepended to each IP address Typically one route distinguisher per VPN New VPNIP address family Routes carried with multiprotocol BGP Filtering routes from routes not at that site Route target basically a special BGP community value 12 Virtual Routing and Forwarding Separate tables per customer at each router Customer 1 39 10010124 7 1001024 39 D Green Customer 2 1001 024 Customer 2 1001024 RD Blue 13 Routing Constraining Distribution Performed by Service Provider using route filtering based on BGP Extended Community attribute BGP Community is attached by ingress PE route filtering based on BGP Community is performed by egress PE Static route RIP etc RD1001024 Route target Green Nexthop A o1024 14 BGPIMPLS VPN Routing in Cisco IOS ip vrf CustomerA rd 100110 routetarget import 100 1 OOO ip vrf CustomerB rd 100120 routetarget export 1002000 routetarget import 1002000 15 Forwarding PE and P routers have BGP nexthop reachability through the backbone IGP Labels are distributed through LDP hopbyhop corresponding to BGP NextHops TwoLabel Stack is used for packet forwarding Top label indicates NextHop interior label Second level label indicates outgoing interface or VRF exterior label Corresponds to LSP of Corresponds to BGP nexthop PE VRFinterface at exit Layer 2 Label Label lp Datagram Header 1 2 16 Forwarding in BGPIMPLS VPNs Step 1 Packet arrives at incoming interface Site VRF determines BGP nexthop and Label 2 Lage39 IP Datagram Step 2 BGP nexthop lookup add corresponding LSP also at site VRF Label 1 IP Datagram Label 2 17 Scalability Problems Lots of customers leads to explosion of routing tables How to ensure that no single router needs to carry state for all customers 18 Other Uses for MPLSITunneling Reducing state in network core Internal routers no longer need paths for every des na on Traffic engineering Can shift traffic based on virtual circuits notjust destination prefixes 19 Open Research Questions Static configuration analysis for enforcing isolation and other security policies Easier in some sense since security reachability policies are likely easier to encode 20 Traffic Anomaly Detection Motivation Many actionable changes to traffic patterns DDoS attacks Routing anomalies Link failures Flash crowds 21 Gap between Capabilities and Goals Traditional Network Traffic Analysis Focus on Short stationary timescales Traffic on a single link in isolation Principal results Scaling properties Packet delays and losses What lSPs Care About Focus on Long nonstationary timescales Traffic on all links simultaneously Principal goals Anomaly detection Traffic engineering Capacity planning 22 NetworkWide Traffic Analysis Anomaly Detection Which links show unusual traffic Traffic Engineering How does traffic move throughout the network Capacity planning How much and Where in network to upgrade 23 This is Complicated Measuring and modeling traffic on all links simultaneousy is challenging Even single link modeling is difficult 1003 of links in large IP networks HighDimensional timeseries Significant correlation in link traffic 24 OriginDestination Flows total traffic on the link m 4 u NLW traffic time Link traffic arises from the superposition of OriginDestination OD flows A fundamental primitive for wholenetwork analysis 25 How to Analyze OD Flows Even more OD flows than links Still a high dimensional multivariate timeseries How do we extract meaning from this high dimensional structure in a systematic manner 26 Dimensionality Reduction Look for good lowdimensional representations A highdimensional structure can be explained by a small number of independent variables A commonly used technique Principal Component Analysis PCA aka KLTransform SVD 27 Summary Measure complete sets of OD flow timeseries from two backbone networks Use PCA to understand their structure Decompose OD flows into simpler features Characterize individual features Reconstruct OD flows as sum of features Call this structural analysis 28 29 Some have visible structure some less so 1 1 v 1 1 5 5 1 mm cm 4 11am 111 on F1w157 11 TEME 111 02 no 425 nu am 1m am am 1 u Tramu 111 on no a Tramu mAh on F1w27 11am 111 on no 1 24 Tramu mAh on Home u X 15 1 1 v 1 r 5 5 1 1 v 1 r 5 5 11am 111 on rmwas u 1 1 v 1 1 5 5 11am 111 on r1nw1a u 1 1 v 111 11am moDFmMW 1 1 v 1 1 5 u 1 1 v 1 r 5 5 5 1 1 v 1 r 5 11am 111 on F1w131 11am 111 on no 1 11 mu 1 1 v 1 r 5 5 11am mAh on Home 25 u 1 1 v 1 r 5 5 X 1 1m 2m 3m mu an em mu mu gm 1 Example OD Flows Structural Analysis Are there low dimensional representations for a set of OD flows Do OD flows share common features What do the features look like Can we get a highlevel understanding of a set of OD flows in terms of these features 30 Principal Component Analysis Coordinate transformation method Original Data P02 Transformed Data PC1 31 Properties of Principle Components Each PC in the direction of maximum remaining energy in the set of OD flows Ordered by amount of energy they capture Eigenflow set of OD flows mapped onto a PC a common trend Ordered by most common to least common 32 PCA on CD flows A r 39 OD 39 OD pairs Pairs OD pairs 9 cu T3 0 o s E Q 0 41 h OD ow Eigenfow PC X OD flow u Eigenflow Principal matrix matrix matrix V 33 PCA on CD flows 2 u Xvi Z 1 p Each eigenflow is a weighted z a quotquot sum of all OD flows E E E E i Eigenflows are orthonormal Singular values indicate the Xv A2 0i V 239 energy attributable to a principal component r 1082 2 1 map Each OD flow IS weighted 02 sum of all eigenflows 34 An Example Eigenflow and PC 0 o on Eigenflow 6 OD FIOW 167 005 OD Flow 94 20 40 60 80 100 120 140 160 OD Flow 5m 5m Low Dimensionality of CD Flows Small number of PCs capture most of signal s structure 1 39 Sprint1 Abilene of 06 energy captured by each dimension i Energy captured Principal Components PrXltx Structure of OD Flows 1 Sprint1 Abilene Most OD flows have less than 20 significant eigenflows Can think of each OD flow as having only a small set of features 160 20 40 60 80 100 120 140 Number of Eigenflows in an OD flow 37 Reasons for Low Dimensionality Generally traffic on different links is dependent Link traffic is the superposition of origin destination flows OD flows The same OD flow passes over multiple links inducing correlation among links All OD flows tend to vary according to common daily and weekly cycles and so are themselves correlated 38 Approximating With Top 5 Eigenflows x107 35 I I I Original 5 a 5P0 ff r t39 39 uulllillli39quotrlrmm 39quot39 39 N 01 I I lllllg39t k 39 IAl luvnunquot av NM i m m wni WWWl 39 Umquot I l um un m bll lug139 lil lll r39l 39wllflilp frlirll Traf c in 0D FIOW 88 l hhh quotMPP I I I I Mon Tue Wed Thu Fri Sat sun Eigen ow 2 Kinds of Eigenflows Eigen ow 20 Eigen ow 29 Deterministic deigenflows Periodic trends Spike seigenflows Sudden isolated spikes and drops Noise neigenflows Roughly stationary and Gaussian 40 Hundreds of Eigenflows But Only Three Basic Typ es V v m In W 3 ICE SIDE at m at and 131 um um r s a 9 quot I 3 a 0 a E 2 t m 3 3 322m 72 an an 1 ans an m at 331 am m 406 332 3112 N 1 ea 39rm m 2 5 we u m m in s 311 w m m a 3 3E 53 a JG 2 325 32 a a a a n m 7quot r 1m E zns F 39 o rm quot39 a U c a 39 gt3 m m 1 am 1 a 3 1 as its in m m n1 Wet n m m m a San w m wlt1 5m m m m m am am 305 lt at m m 1 r in am am 3 aquot a g 3 2 a a s a quot475quot w m m an at w 1a ms 3 m 4 ans lt1 m I w m m m m 3 m m um m m n H gr 9 w m m m 3 Sprint b Abilene 41 An OD Flow Reconstructed OD flow Dcomponents Scomponents Ncomponents 42 Application Anomaly Detection Is my network experiencing unusual conditions Then adopt the following framework Detection Is there an unusual event Identification Which of the possible explanations fits best Quantification How serious is the problem 43 Statistical Approach The advantage of such a framework is that it lends itself to a statistical approach Detection Outlier detection Anomaly Identification Hypothesis testing Diagnosis Quantification Estimation 44 State of the Art Much previous work in anomaly detection attack detection and traffic characterization Previous work has Relied on rules and heuristics Not taken a general approach Almost exclusively concentrated on measurements from individual links 45 WholeNetwork Diagnosis Effective diagnosis of network anomalies requires a wholenetwork approach For example diagnosing traffic anomalies requires analyzing traffic from all links 46 Complicated ANNAUWW JNQAUW X 1 7 DNVR SNVA X 1 7 HsTN Kscv X 1 3 LOSA SNVA X 10 3 cHIN NYcM X 1 5 LOSA LOSA X 1 7 HSTN HSTN X 105 STTL STTL 2 5 5 5 4 5 3 1 4 2 5 1 3 X 1 7 ATLA ATLA X 107 DNVR DNVR X 1 5 Kch Kscv X 107 SNVA SNVA 4 5 15 2 5 4 2 10 3 5 5 3 1 5 2 5 o 5 2 X 1 DMAsH WASH ANNAUWW How to extract meaning from such a highdimensional data in a systematic manner 47 Low Intrinsic Dimensionality of Link Traffic Mmmaii 11 main appmx mai tad by a W mmaima Variance Captured dik Min quot ail in Sprint 1 n y 4L i Elln 5 10 20 25 30 35 4O 45 Principal Component Anomaly Detection Subspace Method An approach to separate normal from anomalous traffic Define 8 as the space spanned by the first k principal components Define 8 as the space spanned by the remaining principal components Then decompose traffic on all links by projecting onto 8 and S to obtain yryx links at a particular point in time Residual traffic Normal traffic vector vector 49 The Subspace Method Geometrically In general anomalous traffic results in a large value of 37 Traffic on Link 2 9Cy Traffic on Link 1 50 Diagnosing Volume Anomalies A volume anomaly is a sudden change in an OD fIOW S traffic ie point to point traffic Problem Given link traffic measurements diagnose the volume anomalies 5 An Illustration OD ow ib SprintEurope Backbone Network The Diagnosis Probem requires analyzing traffic on all links to 1 Detect the time of the anomaly 2 Identify the source amp destination 3 Quantify the size of the anomaly 52 Subspace Method Detection Error Bounds on Squared Prediction Error SPE E iif li2 IICYH2 Assuming multivariate Gaussian data traffic is normal when SPE 3 6i Traffic on Link 2 Result due to Jackson and Mudholkar 1979 Traffic on Link 1 53 SPE vs All Traffic Value of HyH2 over time Value of H37 2 overtime I l quotml3n Tue Wed Thu Fr SPE IISIHZ at anomaly time points clearly stand out 54 Subspace Method Identification An anomaly causes a displacement of the state vector away from 5 The direction of the displacement gives information about the nature of the anomaly Intuition find the hypothesis that best describes a detected anomaly x10 Select the OD flow that 3 accounts for 81 maximum 391 residual traffic Identification Hypothesis Testing Denote the set of all anomalies fui 1I Each anomaly 739 adds traffic in some way 02 So in the presence of anomaly y 37 92 And the best estimate of yis found by minimizing the distance to 5 in the direction of the anomaly 56 A Geometric Illustration y N Normal Subspace Cm argrginuy zfzH 57 Selecting the Best Hypothesis 1 For each hypothesized anomaly 723239 17quot397I compute yZ as IV gift 2 Select anomaly Fj as j arg mini In this manner select the anomaly that accounts for maximum residual traffic 58 DenialofService and Resource Exhaustion Nick Feamster CS 7260 April 2 2007 Today s Lecture What is Denial of Service Attacks and Defenses Packetflooding attacks Attack SYN Floods Defenses Ingress Filtering SYN Cookies Client puzzles Lowrate attacks Detection Singlepacket IP Traceback Networklevel defenses sinkholes and blackholes Inferring Denial of Service Activity Distributed Denial of Service Worms Other resource exhaustion spam Denial of Service What is it Attacker l Attempt to exhaust resources Network Bandwidth Transport TCP connections Application Server resources Typically highrate attacks but not always Victim PreZOOO Denial of Service DoS Tools Singlesource single target tools IP source address spoo ng Packet amplification eg smurf Deployment Widespread scanning and exploitation via scripted tools Handinstalled tools and toolkits on compromised hosts unix Use Hand executed on source host TCP 3Way Handshake C S SYN SYNS ACKC ACKS Store data Wait Connected TCP handshake Each arriving SYN stores state at the server TCP Control Block TCB 280 bytes FIowID timer info Sequence number ow control status outofband data MSS other options agreed to Halfopen TCB entries exist until timeout Fixed bound on halfopen connections Resources exhausted gt requests rejected TCP SYN flooding Problem No client authentication of packets before resources allocated Attacker sends many connection requests Spoofed source addresses RSTs quickly generated if source address exists No reply for nonexistent sources Attacker exhausts TCP buffer to w halfopen connec ons SYN Flooding SYNc1 Listening Store data Idea 1 Ingress Filtering Drop all packets with source address other than 20469207024 RFC 2827 Routers install filters to drop packets from networks that are not downstream Feasible at edges Difficult to configure closer to network core Idea 2 uRPF Checks Accept packet from interface only if forwarding table entry for source IP address matches ingress interface Strict Mode uRPF 100183 from wrong a u a 39v 100183 I 101203 quotAquot Routing Table Destination Next Hop 1001024 Int 1 10018024 Int 2 Unicast Reverse Path Forwarding Cisco ip verify unicast reversepath Requires symmetric routing 10 Problems with uRPF UmcastRPF apphed m so mum nut B t t t macmmmcm as an e a steA sneAfmm sp unmast RPF apphed m 51mm BLOCK traf cfmm me A Best mte m SP mm me A Asymmetric routing Idea 3 TCP SYN cookies General idea Client sends SYN w ACK number Server responds to Client with SYNACK cookie sqn fsrc addr src port dest addr dest port rand Server does not save state Honest client responds with ACKsqn Server checks response If matches SYNACK establishes connection TCP SYN cookie TCP SYNACK seqno encodes a cookie 32 bit sequence number t mod 32 counter to ensure sequence numbers increase every 64 seconds MSS encoding of server MSS can only have 8 settings Cookie easy to create and validate hard to forge Includes timestamp nonce 4tuple 32 O l l 5 bits 3 bits 13 SYN Cookies client sends SYN packet and ACK number to server waits for SYNACK from server w matching ACK number server responds w SYNACK packet w initial SYNcookie sequence number Sequence number is cryptographically generated value based on client address port and time client sends ACK to server w matching sequence number server If ACK is to an unopened socket server validates returned sequence number as SYNcookie If value is reasonable a buffer is allocated and socket is opened SYN m SYNACK seqnumber as SYNcookie ack number NO BUFFER ALLOCATED K seqnumber ack numberdata SYNACK seqnumber ack number TCP BUFFER ALLOCATED IP Traceback Logging Challenges Attack path reconstruction is difficult Packet may be transformed as it moves through the network Full packet storage is problematic Memory requirements are prohibitive at high line speeds QC192 is 10Mpktsec Extensive packet logs are a privacy risk Traffic repositories may aid eavesdroppers SinglePacket Traceback Goals Trace a single IP packet back to source Asymmetric attacks eg Fraggle Teardrop pingof death Minimal cost resource usage One solution Source Path Isolation Engine SPIE Packet Digests Compute hashp Invariant elds of p only 28 bytes hash input 000092 WAN collision rate Fixed sized hash output nbits Compute k independent digests Increased robustness Reduced collisions reduced false positive rate Hash input Invariant Content Total Length Identification Fragment Offset TTL Checksum 28 Source Address bytes Destination Address First 8 bytes of Payload Remainder of Payload Hashing Properties Each hash function Uniform distribution of input gt output H1 x H1 y for some xy gt unlikely Use k independent hash functions Collisions among k functions independent H1 x H2y for some xy gt unlikely Cycle k functions every time interval t 20 Digest Storage Bloom Filters Fixed structure size Uses 2quot bit array Initialized to zeros Insertion Use nbit digest as indices into bit array Setto 1 2n bits Membership Compute k digests d1 d2 etc If filterdi1 for all i router forwarded packet 21 Other nNetwork Defenses Automatic injection of blackhole routes Rerouting through traffic scrubbers 22 Inferring DoS Activity IP address spoofing creates random backscaner SYNACK backseatter Backscatter Analysis Monitor block of 17 IP addresses Expected of backscatter packets given an attack of m packets EX nm 232 Hence m X 232n Attack Rate R gt mT XT 232n 24 Inferred DoS Activity Uanue Violii39i iPshaui Attacks 1m 0 S 3 mean 119 on 00 0203 on on 021 1 no on 9205 a 00 on 0202 7 4 iv a Tracer no on 9223 or on m w 02 i 7 3220 Over 4000 DoSIDDoS attacks per week 39 Short duration 80 last less than 30 minutes O l 10 I I I 30 1 2 6 hour Attack Duration I I 12 1 day Moore et al Inferring Internet Denial of Service Activity 25 DDoS Setting up the Infrastructure Zombies Slowspreading installations can be difficult to detect Can be spread quickly with worms lndirection makes attacker harder to locate No need to spoof IP addresses 26 What is a Worm Code that replicates and propagates across the network Often carries a payload Usually spread via exploiting flaws in open services Viruses require user action to spread First worm Robert Morris November 1988 6510 of all Internet hosts infected l Many more since but none on that scale until July 2001 27 Example Worm Code Red Initial version July 13 2001 Exploited known ISAPI vulnerabiity in Microsoft IIS Web seners 1St through 20th of each month spread 20th through end of each month attack Payload Web site defacement Scanning Random IP addresses Bug failure to seed random number generator 28 Code Red Revisions Released July 19 2001 Payload flooding attack on wwwwhitehousegov Attack was mounted at the IP address of the Web site Bug died after 20th of each month Random number generator for IP scanning fixed 29 Code Red Host Infection Rate Measured using backscatter technique Modeling the Spread of Code Red Random Constant Spread model K initial compromise rate N number of vulnerable hosts a fraction of vulnerable machines already compromised Nda NaK1 i adt LYJ Newly infected Machine Rate at which uninfected machines in dt already infected machines are compromised Bristling Pace of Innovation Various improvements to increase the infection rate Code Red 2 August 2001 Localized scanning Same exploit different codebase Payload root backdoor Nimda September 2001 Spread via multiple exploits llS vulnerability email CR2 root backdoor copying itself over network shares etc Firewalls were not sufficient protection 32 Designing FastSpreading Worms Hitlist scanning Time to infect first 10k hosts dominates infection time Solution Reconnaissance stealthy scans etc Permutation scanning Observation Most scanning is redundant Idea Shared permutation of address space Start scanning from own IP address Rerandomize when another infected machine is found Internetscale hit lists Flash worm complete infection within 30 seconds 33 Recent Advances Slammer February 2003 Exploited vulnerability in MS SQL server Exploit fit into a single UDP packet Send and forget Lots of damage BofA Wash Mutual ATMs unavailable Continental Airlines ticketing of ine Seattle E911 offline 34 Scary recent advances Witty March 19 2004 Single UDP packet exploits flaw in the passive analysis of Internet Security Systems products Bandwidthlimited UDP worm ala Slammer Initial spread seeded via a hitlist All 12000 vulnerable hosts infected within 45 mins Payload slowly corrupt random disk blocks 35 Why does DDoS work Simplicity On by default design Readily available zombie machines Attacks look like normal traffic lnternet s federated operation obstructs cooperation for diagnosismitigation 36 Resource Exhaustion Spam Unsolicited commercial email As of about February 2005 estimates indicate that about 90 of all email is spam Common spam filtering techniques Contentbased filters DNS Blacklist DNSBL lookups Significant fraction of today s DNS traffic Can IP addresses from which spam is received be spoofed 37 BGP Spectrum Agility Announcements Wilhdmmis and Spam fmm 610008 Log IP addresses of SMTP relays Join with BGP route advertisements seen at network where spam trap is colocated 10 minutes Announcement 0 5 am Withdrawal 132000 20050930 l3l DD I53000 20050930 10050930 Tune 133500 20050930 A small club of persistent players appears to be using this technique 61 0008 4678 660008 21562 820008 8717 Somewhere between 110 of all spam some clearly intentional others might be flapping A Slightly Different Pattern cl 1 Announcemeln 5 pam Wi1hd1a al 22 00 20041228 050000 12 00 1 D 02 00 0 16 00 20041229 20041229 20041229 20041y30 20041 20 20041230 Tune Why Such Big Prefixes Flexibility Client lPs can be scattered throughout dark space within a large 8 Same sender usually returns with different IP addresses Visibility Route typically won t be filtered nice and short 40 Characteristics of lPAgile Senders IP addresses are widely distributed across the 8 space IP addresses typically appear only once at our sinkhole Depending on which 8 6080 of these IP addresses were not reachable by traceroute when we spotchecked Some IP addresses were in allocated albeing unannounced space Some AS paths associated with the routes contained reserved AS numbers 41 Privacy and Anonymity Nick Feamster CS 7260 April 3 2006 Today s Lecture Anonymity and privacy Who should care Anonymous communication primitives Attacks on anonymous communication Traffic analysis lnfranet Anonymity without encryption Recent trends Privacy and Anonymity Privacy Protecting sensitive information from others Anonymity Being indistinguishable from others in a group Who should care Individuals Web surfing Email VOIP Purchasing habits Work patterns and locations Corporations Collaborations partners Political dissidents Why should you care Surveillance qumwmmm gg f h ga 3111me my o n S 0 rs h p A Fng ng icg mmg Emwmcg g Utm fmc rfia Tamm cgr Privacy in the News Policing Porn Is Not Part of Job Description Montgomery Homeland Security Officers Reassigned After Library Incident By Cameron W San Google desktop slammed for privacy problems Washington Post Sta 39Writer Friday Februaryf 1 2008 Pa Tom Sanders in CaliFornia ununetcom 13 Fat 2003 LL L gtriuli i a a Imu39 lur lieuer Two uniformed men stcol 1 o 1 library mBethesda one d prim gm the Hume Republican Speaks Upr Leadlng Others to EFruntier Foundation EFF and l l 7quot 1 of all patrons using the CsecurittlI vendor Kaspersky Lab 11 Illenge ll etaps t Th haue lashed out against Faogle39s BJV SHERYL GAY STOLEERG f Sgceme 39 e V1ew119t3939l rti rts desmp Published Februar39r112DDB or 1 en 39 The EFF has advised users to VIA39SHENGTON Feb 10 7 Nhen Representative Heather A Wilson broke avoid the application because a The men100ked Stem eature known as Search Across 39 words quotHomeland Secumtccwmputnr quotgreatly increases the domestic eavesdropping She gave some to what some tellow Republtcans r risk consumer privacyquot leavmg some reeldents cc The EFF Is a not for profit expllaln how employees aEorganisation that aims to protect against terrorists came teciuil liberties in the networked Id of pornography W ranks with President Bush on Tuesday to declare her quotserious concerns39 about were ifnot saying Now they are speaking up and growing louder Google launched Eoogle Desktop version 3 on Thursday One of the application39s new features is the a ilitil to search for documents including text Files PDFs and spreadsheets stored on anyI of the user39s computers L1 interviews ever several days Congressional RE expressed growing doubts about the National Sen program to intercept international communication United States unthout court warrants A growing Republicans say the program appears to violate ll lntelhgence Surveillance Act the 1978 law that c1 oversee such surveillance and are calling for revs To enable this teature the application temporarilyI stores copie law Anonymity Requires Company Traditional security requirements confidentiality integrity etc can be achieved between a pair of participants Anonymity requires each participant to carry traffic for others Requires some notion of distributed trust Types of Anonymity Single proxy Encryption Safeweb anonymizer etc Covert channels Infranet Network of proxies Mixnets Encryption onion routing nymaliasnet Tor Dining Cryptographer Networks Herbivore SingleProxy Anonymization Advantages Typically fast lowlatency communication Disadvantages Initiator must trust the proxy Single point of failure and attack anonpenetfi anonymizercom safewebcom Mixnets First proposed in 1981 by Chaum Each proxy handles messages in batches Key property unlinkability Eil message 1 7 message 2 g message 3 g message 4 Challenge Designing lowlatency mix networks 10 Onion Routing Utilizes advantages provided by both proxies and mixnets Initiator 11 Dining Cryptographers Proxies organized in a ring topology Proxy flips a coin and shares with left neighbor Each proxy publishes the parity XOR of two bits To send a 1bit message initiator lies when publicizing the coin flip When everyone tells the truth statements consistent When one party is lying statements are inconsistent but can t identify source of inconsistency nformationtheoretically anonymous but rather impractical 12 NonPayer s View Same Coins L 39f i cliff rent L 39 3quotdifferent f 13 NonPayer s View Different Coins Some Attacks on Anonymous Communication Systems Timing attacks traffic into and out of the network may reveal communication patterns Communication patterns eg exploit timezone information Packet counting attacks onion routing sends streams along the same path User interaction attacks sender of the user behaves differently than someone who is relaying 15 Resisting Surveillance and Censorship Many governmentscompanies trying to limit their citizens access to information Censorship prevent access Punishment deter access Surveillance spy on or monitor access China Saudi Arabia many companies 16 Web Censorship Government manages national web firewall Not optional catches ALL web traffic Block certain requests Possibly based on content More commonly on DNS name lP addresspublisher China Western news sites Falun Gong etc Log requests to detect troublemakers Even without blocking mayjust watch traffic But they don t block all Web traffic Creates a crack in their barrier 17 Renewed Relevance 3E NEWS l nul H Last Llpcated ii ifednestiayJ 25 January 2006 0845 GMT E Email this to a Friend E F rintable version Google censors itself for China 0 IIK uersirin I lnt ernatirinal Leading internet company Geegle has said it will censor its search services in China in urderte gain greater access to China39s fastgrowing market Geegle has offered a Chinese language version efits search engine for years but users have been frustrated by gnuernment blanks en the site See Boogie China The company is setting up a new site GQDngCH which it will censor itself to satisfy the authenties in Beijing L7 utto n39ii rst i TECHNOLOGY Tech giantslawmakers debate censorship ll39h39n nns a39r F obluary 1610E 5P nzod 53961 pm EST 122251 GMT WESHlNGTON AP Four U5 hightech campanias on Wednesday found themselves branded cullahorators with the Chinese government in suppressing dissent in return For access to a booming Internet market House members contended that Micros on L Inn ram3 In it CISE n Syste me In i an d Guugle Inc soughllo explain their DU Slit 35 memes li39l Chin Unl r l39l l Al I Geo li exec Elliol Stl lIUG riulvl answers lawman an queauons eaii i item Lemme 18 An Old Problem Many governmentscompanies trying to limit their citizens access to information Censorship prevent access Punishment deter access Surveillance spy on or monitor access China Saudi Arabia many companies How can we defeat such attempts Circumvent censorship Undetectably 19 Goal Circumvent censor via innocent web activity Normal web server and client cooperate to create covert channel Without consequence for client Censor shouldn t be able to detect use And without consequence for server Broad participation increases system robustness Offering service shouldn t lead to trouble Eg loss of business through being blocked Or legal implications 20 Requirements Client deniability Detection could be embarrassing or worse Client statistical deniability Even suspicion could be a problem Server covertnessstatistical deniability If server detected can be blocked Behave identically for clients and others Communication robustness Even without detecting censor could scramble covert channel Performance bandwidth latency 21 Some Options SSL Encrypted connectioncan t tell content Suspicious Doesn t help reach blocked servers Govt can require revealing SSL keys Anonymizing Proxies Prevent servers from knowing identity of client But proxy inside censor can t reach content And proxy outside censor can be blocked And use of proxy is suspicious 22 Conventional Mechanism Safeweb Operation Client contacts triangleboy reflector Reflector forwards requests to blocked server Server returns content to client IP spoof But very easily detected Local monitoring of the user only reveals an encrypted conversation Safeweb manual Engenders suspicion eg China blocks most SSLu 23
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'