# Class Note for CMPSCI 711 at UMass(11)

These 61-page class notes were uploaded by an elite notetaker on Friday, February 6, 2015. They belong to a course at the University of Massachusetts taught in the Fall, and have received 13 views since upload.



**CMPSCI 711 Really Advanced Algorithms. Lecture 19: Summary of Research Papers.** Andrew McGregor. Last compiled May 12, 2009.

### Basic Data Stream Model

**Definition.** The input is a length-$m$ list $\langle a_1, \ldots, a_m \rangle$ where each $a_i \in [n]$. You can only access the elements sequentially and have memory that is sub-linear in $m$ and $n$.

**Problem.** Let $f_i = |\{j : a_j = i\}|$.

1. Compute approximations to all $f_i$.
2. Estimate $F_k = \sum_{i \in [n]} f_i^k$.
3. Estimate $F_0 = \sum_{i \in [n]} f_i^0$, i.e., the number of distinct items.

### Estimating Point Queries

**Problem.** Estimate all $f_i$ up to $\pm \epsilon m$ error.

**Algorithm.**

1. Pick $d$ 2-wise independent hash functions $h_1, \ldots, h_d : [n] \to [w]$.
2. Compute $T_{i,j} = \sum_{l : h_i(l) = j} f_l$.
3. Let $\tilde{f}_i = \min_j T_{j,\, h_j(i)}$.

**Theorem.** There exists an $O(\epsilon^{-1} \log(n/\delta))$-space algorithm that returns $\{\tilde{f}_i\}$ where $f_i \le \tilde{f}_i \le f_i + \epsilon m$, with probability $1 - \delta$.

**Proof.**

1. Set $w = 2/\epsilon$: then $\mathbb{E}[T_{j,\, h_j(i)} - f_i] \le \epsilon m / 2$.
2. Set $d = \log(n/\delta)$: then $\mathbb{P}\big(\min_j T_{j,\, h_j(i)} > f_i + \epsilon m\big) \le \delta / n$. $\square$

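The point-query sketch above is the Count-Min sketch, and it is short enough to spell out. The following Python sketch is illustrative rather than the lecture's own code: the class name and parameters are mine, and the 2-wise independent hashes are implemented as random linear maps over a prime field.

```python
import math
import random

class CountMinSketch:
    """Point-query sketch: f_i <= estimate(i) <= f_i + eps*m with high probability."""

    def __init__(self, eps, delta, n, seed=0):
        rng = random.Random(seed)
        self.w = int(math.ceil(2 / eps))                  # width w = 2/eps
        self.d = int(math.ceil(math.log(n / delta)))      # depth d = log(n/delta)
        self.p = 2**31 - 1                                # prime for the hash family
        # d independent 2-wise hashes h_j(i) = ((a*i + b) mod p) mod w
        self.coeffs = [(rng.randrange(1, self.p), rng.randrange(self.p))
                       for _ in range(self.d)]
        self.T = [[0] * self.w for _ in range(self.d)]

    def _h(self, j, i):
        a, b = self.coeffs[j]
        return ((a * i + b) % self.p) % self.w

    def update(self, i, count=1):
        for j in range(self.d):
            self.T[j][self._h(j, i)] += count

    def estimate(self, i):
        # Each row only over-counts (colliding items add mass), so take the min.
        return min(self.T[j][self._h(j, i)] for j in range(self.d))
```

Each row overestimates $f_i$ by at most the mass that collides with $i$, so the estimate never undershoots; taking the minimum over $d$ rows drives the failure probability down to $\delta/n$ per item.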
### Estimating $F_k$

**Problem.** Estimate $F_k = \sum_i f_i^k$.

**Algorithm.**

1. Pick $j$ uniformly from $[m]$.
2. Compute $r = |\{j' \ge j : a_{j'} = a_j\}|$.
3. Output $X = m\big(r^k - (r-1)^k\big)$.

**Theorem.** There exists an $O(k\, n^{1-1/k} \epsilon^{-2} \log(1/\delta))$-space algorithm that returns a $(1 \pm \epsilon)$-factor approximation to $F_k$, with probability $1 - \delta$.

**Proof.** $\mathbb{E}[X] = F_k$ and $\mathbb{V}[X] \le k\, n^{1-1/k} F_k^2$. Run copies of the algorithm, average results appropriately, and apply Chernoff/Chebyshev. $\square$

### Estimating $F_2$

**Problem.** Estimate $F_2 = \sum_i f_i^2$.

**Algorithm.**

1. Pick $x_1, \ldots, x_n \in \{-1, +1\}$ with the $x_i$ 4-wise independent.
2. Compute $Z = \sum_{i \in [n]} x_i f_i$.
3. Return $X = Z^2$.

**Theorem.** There exists an $O\big(\epsilon^{-2} \log(1/\delta)(\log m + \log n)\big)$-space algorithm that returns a $(1 \pm \epsilon)$-factor approximation to $F_2$, with probability $1 - \delta$.

**Proof.** $\mathbb{E}[X] = F_2$ and $\mathbb{V}[X] \le 2 F_2^2$. Run $O(\epsilon^{-2} \log(1/\delta))$ copies, average results appropriately, and apply Chernoff/Chebyshev. $\square$

### Relationship to Johnson-Lindenstrauss

**Algorithm.**

1. Let $Y = (X_1, \ldots, X_n)$ where the $X_i$ are independent $N(0,1)$ variables.
2. Compute $h(x) = \sum_{i \in [n]} X_i f_i$.
3. Repeat $d$ times to get $h(x)_1, \ldots, h(x)_d$, rescaled by $1/\sqrt{d}$.

**Theorem.** For any $p$ vectors $v_1, \ldots, v_p \in \mathbb{R}^n$ in Euclidean space, there exists a map $h : \mathbb{R}^n \to \mathbb{R}^d$ such that
$$(1 - \epsilon)\,\|v_i - v_j\|^2 \;\le\; \|h(v_i) - h(v_j)\|^2 \;\le\; (1 + \epsilon)\,\|v_i - v_j\|^2,$$
where $d = O(\epsilon^{-2} \log p)$.

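The $F_2$ estimator is the AMS "tug-of-war" sketch and can be simulated directly. This is a hedged illustration of my own: for brevity it draws fully random $\pm 1$ signs rather than storing a compact 4-wise independent family (which is the whole point of the small-space version), and the function name and parameters are mine.

```python
import random

def f2_estimate(stream, n, copies=64, seed=0):
    """Average of independent tug-of-war estimators X = (sum_i x_i f_i)^2.
    E[X] = F2 and V[X] <= 2*F2^2, so averaging copies controls the error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(copies):
        signs = [rng.choice((-1, 1)) for _ in range(n)]  # x_1..x_n in {-1,+1}
        z = sum(signs[a] for a in stream)                # Z = sum_i x_i f_i
        total += z * z                                   # one estimate of F2
    return total / copies
```

Note the streaming view: each copy maintains only the single counter $Z$, updated by one signed addition per element; a 4-wise independent family lets the signs be regenerated from an $O(\log n)$-bit seed instead of being stored.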
### Estimating $F_0$

**Problem.** Estimate $F_0 = \sum_i f_i^0$, i.e., the number of distinct values in the stream.

**First attempt.**

1. Pick a random hash function $h : [n] \to [0,1]$.
2. As the stream arrives, compute $h(a_i)$.
3. Keep track of the minimum value $v$ seen so far.
4. Output $1/v - 1$ as the estimate for $F_0$.

**Issues.**

1. If the hash function is totally random, it may be too big to store. Fix: assume $h$ is pairwise independent.
2. Precision issues. Fix: hash into $[n^3]$ rather than $[0,1]$.
3. The above estimator varies too much. Fix: track the $t$-th smallest value.

**The algorithm.**

1. Pick a random 2-wise independent hash function $h : [n] \to [n^3]$.
2. Compute the $t$ smallest hash values $v_1 \le \cdots \le v_t$.
3. If $F_0 \le t$, output $\tilde{F}_0 = F_0$.
4. If $F_0 > t$, output $\tilde{F}_0 = t n^3 / v_t$.

**Theorem.** If $t = 20/\epsilon^2$, then $\mathbb{P}\big(|\tilde{F}_0 - F_0| \ge \epsilon F_0\big) \le 2/5$.

**Corollary.** We can $(1 \pm \epsilon)$-approximate $F_0$ with probability $1 - \delta$ in only $O(\epsilon^{-2} \log(1/\delta) \log n)$ space.

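The $t$-smallest-hash-values estimator is easy to simulate. A minimal sketch of my own, assuming a 2-wise hash built from a random linear map over a prime field; the function name and defaults are illustrative.

```python
import random

def f0_estimate(stream, n, t=100, seed=1):
    """Estimate the number of distinct items: hash into [n^3] with a
    2-wise-independent hash and keep the t smallest distinct hash values."""
    rng = random.Random(seed)
    p = 2**61 - 1                                   # prime modulus, larger than n^3
    a, b = rng.randrange(1, p), rng.randrange(p)
    M = n ** 3
    h = lambda x: ((a * x + b) % p) % M
    smallest = set()                                # the t smallest distinct hashes
    for item in stream:
        v = h(item)
        if v in smallest:
            continue
        smallest.add(v)
        if len(smallest) > t:
            smallest.discard(max(smallest))         # keep only the t smallest
    if len(smallest) < t:
        return float(len(smallest))                 # fewer than t seen: exact count
    return t * M / max(smallest)                    # otherwise t * n^3 / v_t
```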
**Proof of Theorem.**

1. Assuming $F_0 > t$, we get a good estimate if
$$(1 - \epsilon) F_0 \le t n^3 / v_t \le (1 + \epsilon) F_0,$$
or equivalently $\frac{t n^3}{(1+\epsilon)F_0} \le v_t \le \frac{t n^3}{(1-\epsilon)F_0}$.
2. Consider $\mathbb{P}\left(v_t < \frac{t n^3}{(1+\epsilon)F_0}\right) = \mathbb{P}(Y \ge t)$, where $Y$ is the number of items hashing below $\frac{t n^3}{(1+\epsilon)F_0}$. The other case is similar.
3. For each distinct item $a$: $\mathbb{P}\left(h(a) < \frac{t n^3}{(1+\epsilon)F_0}\right) \approx \frac{t}{(1+\epsilon)F_0}$.
4. By linearity of expectation, $\mathbb{E}[Y] = \frac{t}{1+\epsilon}$.
5. Since $h$ is 2-wise independent, $\mathbb{V}[Y] \le \frac{t}{1+\epsilon}$.
6. By Chebyshev, $\mathbb{P}(Y \ge t) \le 1/5$ if $t \ge 20\epsilon^{-2}$. $\square$

### $F_0$ Lower Bound via Communication Complexity

**Theorem.** Any data stream algorithm that approximates $F_0$ up to a $(1 \pm \epsilon)$ factor with probability $9/10$ needs $\Omega(\epsilon^{-2})$ space.

**Lemma.** Alice has $x \in \{0,1\}^n$ and Bob has $y \in \{0,1\}^n$. Alice needs to send Bob $\Omega(n)$ bits if he is to estimate the Hamming distance $\Delta(x,y)$ up to $\pm\sqrt{n}$ with probability $9/10$.

- Define a multi-set $S = \{i : x_i = 1\} \cup \{i : y_i = 1\}$. Then
$$F_0(S) = |x|/2 + |y|/2 + \Delta(x,y)/2.$$
- A $\pm\sqrt{n}$ estimate for $F_0$ gives a $\pm O(\sqrt{n})$ estimate for $\Delta(x,y)$.
- If $n = \Theta(\epsilon^{-2})$, a $(1 \pm \epsilon)$-factor approximation of $F_0$ is a $\pm O(\sqrt{n})$ approximation.

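The reduction hinges on the identity $F_0(S) = (|x| + |y| + \Delta(x,y))/2$. A quick exhaustive sanity check of this identity (my own illustration, not part of the lecture) over all pairs of 4-bit vectors:

```python
from itertools import product

def f0_of_union(x, y):
    """Distinct elements of the multi-set {i : x_i = 1} union {i : y_i = 1}."""
    return len({i for i, v in enumerate(x) if v} | {i for i, v in enumerate(y) if v})

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def identity_holds_everywhere(bits=4):
    """Check 2*F0(S) = |x| + |y| + Delta(x,y) for every 0/1 pair of length `bits`."""
    for x in product((0, 1), repeat=bits):
        for y in product((0, 1), repeat=bits):
            if 2 * f0_of_union(x, y) != sum(x) + sum(y) + hamming(x, y):
                return False
    return True
```

The identity is just inclusion-exclusion: $F_0(S) = |x| + |y| - |x \cap y|$ while $\Delta(x,y) = |x| + |y| - 2|x \cap y|$.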
### One-Pass Lower Bound (1/2)

- Reduction from the INDEX problem: Alice knows $z \in \{0,1\}^t$ and Bob knows $j \in [t]$. Let's assume $|z| = t/2$ and that this is odd.
- Alice and Bob pick $r \in_R \{-1,+1\}^t$ using public random bits.
- Alice computes $\operatorname{sign}(r \cdot z)$ and Bob computes $\operatorname{sign}(r_j)$.
- **Claim.** For some constant $c > 0$,
$$\mathbb{P}\big(\operatorname{sign}(r \cdot z) = \operatorname{sign}(r_j)\big) = \begin{cases} 1/2 & \text{if } z_j = 0, \\ 1/2 + c/\sqrt{t} & \text{if } z_j = 1. \end{cases}$$
- Repeat $n = O(t)$ times to construct $x = \big(\operatorname{sign}(r^{(1)} \cdot z), \ldots\big)$ and $y = \big(\operatorname{sign}(r^{(1)}_j), \ldots\big)$.
- With probability $9/10$, for some constants $c_1 < c_2$:
$$z_j = 0 \implies \Delta(x,y) \ge n/2 - c_1\sqrt{n}, \qquad z_j = 1 \implies \Delta(x,y) \le n/2 - c_2\sqrt{n}.$$

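The claim can be verified exactly for small $t$ by enumerating all of $\{-1,+1\}^t$. A toy check of my own with $t = 6$ and $|z| = 3$ (odd, so $r \cdot z$ is never zero):

```python
from fractions import Fraction
from itertools import product

def sign_agreement(z, j):
    """Exact P(sign(r.z) = sign(r_j)) over all r in {-1,+1}^t."""
    t = len(z)
    agree = 0
    for r in product((-1, 1), repeat=t):
        dot = sum(ri * zi for ri, zi in zip(r, z))
        # |z| odd guarantees dot != 0, so the sign is always well defined.
        if (dot > 0) == (r[j] > 0):
            agree += 1
    return Fraction(agree, 2 ** t)
```

For $z = (1,1,1,0,0,0)$ the agreement probability is exactly $1/2$ at a coordinate with $z_j = 0$ and $3/4$ at a coordinate with $z_j = 1$, exhibiting the promised $1/2 + c/\sqrt{t}$ advantage.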
### One-Pass Lower Bound (2/2)

**Claim.** For some constant $c > 0$,
$$\mathbb{P}\big(\operatorname{sign}(r \cdot z) = \operatorname{sign}(r_j)\big) = \begin{cases} 1/2 & \text{if } z_j = 0, \\ 1/2 + c/\sqrt{t} & \text{if } z_j = 1. \end{cases}$$

- If $z_j = 0$, then $\operatorname{sign}(r \cdot z)$ and $\operatorname{sign}(r_j)$ are independent.
- If $z_j = 1$, let $s = r \cdot z - r_j$ and let $A$ be the event $\operatorname{sign}(r \cdot z) = \operatorname{sign}(r_j)$. Then
$$\mathbb{P}(A) = \mathbb{P}(A \mid s = 0)\,\mathbb{P}(s = 0) + \mathbb{P}(A \mid s \ne 0)\,\mathbb{P}(s \ne 0).$$
- $\mathbb{P}(A \mid s = 0) = 1$ and $\mathbb{P}(A \mid s \ne 0) = 1/2$.
- $\mathbb{P}(s = 0) \ge 2c/\sqrt{t}$ for some constant $c > 0$, so $\mathbb{P}(A) = 1/2 + \mathbb{P}(s = 0)/2 \ge 1/2 + c/\sqrt{t}$.

### $\ell_p$ and Stable Distributions

**Problem definition.** The input is a length-$m$ list $S = \langle a_1, \ldots, a_m \rangle$ where each $a_i = (j_i, \Delta_i) \in [n] \times \{-l, \ldots, l\}$. You can only access the elements sequentially and have memory that is sub-linear in $m$ and $n$.

**Problem.** Let $v_j = \sum_{i : j_i = j} \Delta_i$. Note that $v_j$ can be negative. Estimate
$$L_p(v) = \|v\|_p = \Big(\sum_i |v_i|^p\Big)^{1/p}.$$

**Theorem.** Using only $O(\epsilon^{-2} \log \delta^{-1})$ space, we can find a $(1 \pm \epsilon)$-error approximation with probability $1 - \delta$.

**Definition** ($p$-stable distribution). A distribution $D$ over $\mathbb{R}$ is called $p$-stable if for any $v \in \mathbb{R}^n$ and i.i.d. variables $X_1, \ldots, X_n$ with distribution $D$, the random variable $\sum_i v_i X_i$ has the same distribution as $\|v\|_p X$, where $X$ is a random variable with distribution $D$.

**Examples.** The Normal(0,1) distribution, with density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, is 2-stable. The Cauchy distribution, with density $\frac{1}{\pi} \cdot \frac{1}{1 + x^2}$, is 1-stable.

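The 1-stability of the Cauchy distribution can be watched empirically: since $\tan(\pi/4) = 1$, the median of $|X|$ for Cauchy $X$ is 1, so the sample median of $|\sum_i v_i X_i|$ should approach $\|v\|_1$. A small demo of my own (sample counts and names are illustrative):

```python
import math
import random

def cauchy(rng):
    """Standard Cauchy sample via the inverse CDF: tan(pi*(U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def stability_median(v, trials=20001, seed=0):
    """Sample median of |sum_i v_i X_i| over i.i.d. Cauchy X_i.
    By 1-stability, sum_i v_i X_i ~ ||v||_1 * X, and median(|X|) = 1,
    so the returned value should approach ||v||_1."""
    rng = random.Random(seed)
    samples = sorted(abs(sum(vi * cauchy(rng) for vi in v))
                     for _ in range(trials))
    return samples[trials // 2]
```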
### Ideal Algorithm for $p = 1$

**Algorithm.**

- Let $A$ be a $k \times n$ matrix where each $A_{i,j}$ is an independent Cauchy random variable. Let $A_j$ be the $j$-th column of $A$.
- Let $c$ be a length-$k$ zero vector. For each stream element $(j, \Delta)$: $c \leftarrow c + \Delta A_j$.
- Return $\operatorname{median}(|c_1|, \ldots, |c_k|)$.

**Lemma.**

- $c = Av$, but the algorithm only needs to maintain a length-$k$ vector.
- Each $|c_i|$ is independently distributed as $\|v\|_1 |X|$, where $X$ is a Cauchy random variable.

### Details

**Lemma.** If $X$ is Cauchy distributed, $\operatorname{median}(|X|) = 1$.

**Proof.** Follows since $\tan(\pi/4) = 1$ and
$$F_{|X|}(z) = \mathbb{P}(|X| \le z) = \int_0^z \frac{2}{\pi(1 + x^2)}\, dx = \frac{2}{\pi} \arctan(z). \qquad \square$$

**Lemma.** Let $Y_1, \ldots, Y_k$ be i.i.d. and $X = \operatorname{median}(Y_1, \ldots, Y_k)$. Then
$$\mathbb{P}\big(F_Y(X) \in [1/2 - \epsilon,\, 1/2 + \epsilon]\big) \ge 1 - \delta \quad \text{if } k = \Omega(\epsilon^{-2} \log \delta^{-1}).$$

**Lemma.** For sufficiently small $\epsilon$: let $X$ be Cauchy distributed and let $z$ satisfy $F_{|X|}(z) \in [1/2 - \epsilon,\, 1/2 + \epsilon]$. Then $z \in [1 - 4\epsilon,\, 1 + 4\epsilon]$.

Thanks!
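The ideal algorithm translates almost line-for-line into code if we allow storing $A$ explicitly; a real implementation would generate the Cauchy entries pseudo-randomly on the fly rather than keep a $k \times n$ matrix. A hedged sketch of my own, with illustrative parameter choices:

```python
import math
import random

def l1_sketch(stream, n, k=500, seed=0):
    """Estimate ||v||_1 where v_j is the sum of updates Delta to coordinate j.
    Maintains c = A v for a k x n matrix A of independent Cauchy entries and
    returns median(|c_1|, ..., |c_k|), which concentrates near ||v||_1."""
    rng = random.Random(seed)
    cauchy = lambda: math.tan(math.pi * (rng.random() - 0.5))
    A = [[cauchy() for _ in range(n)] for _ in range(k)]
    c = [0.0] * k
    for j, delta in stream:                 # turnstile update: v_j += delta
        for i in range(k):
            c[i] += delta * A[i][j]         # c = A v, maintained incrementally
    return sorted(abs(ci) for ci in c)[k // 2]
```

Negative updates are handled for free, which is exactly the turnstile setting where the sampling-based $F_k$ estimator earlier in the lecture breaks down.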
