New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here


by: Orrin Rutherford


Marketplace > Portland State University > ComputerScienence > CS 410 > TOP INTRO TO MULTIMEDIA NTWRK
Orrin Rutherford
GPA 3.91

David Maier

Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

David Maier
Class Notes
25 ?




Popular in Course

Popular in ComputerScienence

This 9 page Class Notes was uploaded by Orrin Rutherford on Tuesday September 1, 2015. The Class Notes belongs to CS 410 at Portland State University taught by David Maier in Fall. Since its upload, it has received 7 views. For similar materials see /class/168259/cs-410-portland-state-university in ComputerScienence at Portland State University.

Similar to CS 410 at PSU

Popular in ComputerScienence




Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/01/15
Data Streams 07 Lecture 14 CS 410510 uata Streams Lecture 14 Synopses and Data Mining David Maier Kristin Tufte 11122007 Data Streams Lecture 14 1 l Other References Cormode Muthukrishnan An Improved Data Stream Summary The CountMin Sketch and its Application Journal of Agorfvms April 2005 l Greenwald Khanna SpaceEfficient Online Computation of Quantile Summaries SIGMOD 200 Cormode Johnson et al Holistic UDAFs at Streaming Speeds SIGMOD 2004 11122007 Data Streams Lecture 14 2 Data Streams 07 Lecture 14 l Keeping Counts with Hash Functions 43 41 43 21 41 43 32 43 44 12 21 43 32 41 Use first digit as hash h1 11122007 Data Streams Lecture 14 Save Space by Combining Entries 43 41 43 21 41 43 32 43 44 12 21 43 32 41 Use slo i i for all values with h1v i 11122007 Data Streams Lecture 14 Data Streams 07 Lecture 14 l Estimating Countv 43 41 43 21 41 43 32 43 44 12 21 43 32 41 Use value in slot h1v for coun tv 1 2 3 4 value 12 21 32 41 43 44 est 1 2 2 9 9 9 real 1 2 2 3 5 1 11122007 Data Streams Lecture 14 l Improve Estimate with Multiple Hash Tables 43 41 43 21 41 43 32 43 44 12 21 43 32 41 Use second digit as hash h2 11122007 Data Streams Lecture 14 Data Streams 07 Lecture 14 l Estimate with Min of Hash Entries 43 41 43 21 41 43 32 43 44 12 21 43 32 41 Use minh1v h2v to estimate countv value 12 21 32 41 43 44 est 1 2 real 1 2 11122007 Data Streams Lecture 14 7 lGeneralize to Multiple Hash Tables Use d hash tables each of length w Need d independent hash functions h1 h2 h Store in one table a count1d 1w call the skefch a on u date for v 1d count hv v Estimate countv as minJcountLj hJv 11122007 Data Streams Lecture 14 8 Data Streams 07 Lecture 14 l Probabilistic Guarantee Assumes fna epeno enfhash functions Er39r39or39 of estimate proportional to Nw N is number39 of elements seen Er39r39or39 bound exceeded lt 12d of the time For39 example with N 1000 w 50 d 5 Error bound is 100050 20 Could miss that bound 132 of the time 11122007 Data Streams Lecture 14 9 Works with Sums too Have tuples v amt Update is countJ hJv amt Er39r39or39 bound is totalsumw 11122007 Data Streams Lecture 14 10 Data Streams 07 Lecture 14 l Heavy Hitters Easy given v to get estimate of countv What if for39 O lt s lt 1 want to know all v with estimated countv gt sN Could generate all possible values of v and look up 11122007 Data Streams Lecture 14 11 l Alternative 1 If we know 3 in advance Whenever we do an update for39 element v see if estimated countv gt sN If so add v to the HeavyHitters list Details 0 v might cease being a heavy hitter39 after more input a keep HeavyHitters as a heap remove min element u if countu lt sN 11122007 Data Streams Lecture 14 12 Data Streams 07 Lecture 14 Alternative 1 Space Amount of extra space depends on 3 Can39t have more than 13 heavy hitters For example no more than 20 heavy hitters for s 05 Error bound might bump that up a bit 11122007 Data Streams Lecture 14 13 Alternative 2 We don39t know 3 in advance Keep several sketches using prefixes of value For example if we have 32bit values a sketch on all 32 bits a sketch on first 24 bits a sketch on first 16 bits a sketch or exact counts on first 8 bits 11122007 Data Streams Lecture 14 14 Data Streams 07 Lecture 14 lAIternative 2 Finding Heavy Hitters Enumerate all heavy hitters in the 8bit sketch 00 FF Check against 16bit sketch If 3C was a heavy hitter in 8bit check 3COO 7 3CFF in 16bit Similarly with 24bit and 32bit Note that the number of candidates at each stage is bounded by US 11122007 Data Streams Lecture 14 15 l Finding Quantiles Give me the median 75 rh percentile and 90 rh percentile values for temperaturequot Reduces to the problem of finding the element of a particular rank Suppose there are N 2000 values 1 Median rank 1000 a 7539rh percentile rank 1500 a 9039rh percentile rank 1800 11122007 Data Streams Lecture 14 16


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

Jennifer McGill UCSF Med School

"Selling my MCAT study guides and notes has been a great source of side revenue while I'm in school. Some months I'm making over $500! Plus, it makes me happy knowing that I'm helping future med students with their MCAT."

Jim McGreen Ohio University

"Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.