New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here

Dependable Distribut Sys

by: Cassidy Effertz

Dependable Distribut Sys ECE 6102

Cassidy Effertz

GPA 3.64

Douglas Blough

Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Douglas Blough
Class Notes
25 ?




Popular in Course


This 0 page Class Notes was uploaded by Cassidy Effertz on Monday November 2, 2015. The Class Notes belongs to ECE 6102 at Georgia Institute of Technology - Main Campus taught by Douglas Blough in Fall. Since its upload, it has received 16 views. For similar materials see /class/233886/ece-6102-georgia-institute-of-technology-main-campus in ELECTRICAL AND COMPUTER ENGINEERING at Georgia Institute of Technology - Main Campus.



Reviews for Dependable Distribut Sys


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 11/02/15
The Google File System Georgia Institute of Technology ECE6102 4202009 David Colvin Jimmy Vuong I Introduction Relatively recent still applicable today GFS Google s storage platform for the generation and processing of data used by services that require large data sets Goals performance scalability reliability and availability Design driven by observed application workloads Design Assumptions Component failures are not uncommon gt commodity parts File size is far larger than traditional FS standards gt multiGB files Files are mutated by appending data gt few large sequential writes many producers High bandwidth is required gt bulk data processing I Architecture Overview Consists of a Single master a Multiple chunkservers Chunkservers store chunks on local disks Files divided into fixedsize chunks of 64MB Each chunk is replicated on multiple chunkservers a Multiple clients File data not cached by client l Architecture Single Master Makes chunk placement and replication decisions using global knowledge Minimal involvement in reads and writes Maintains all file system metadata Periodically sends HeartBeatmessages to collect state l Architecture Continued Apylicminn chunk lmndlt chunk locations cm dumth or Linux le system 3 13 Figure 1 GFS Architecture Master Operation Chunk Creation Re replication Rebalancing Creation in Place replicas on chunkservers with belowaverage disk space utilization in Limit number of recent creations on each chunkserver u Spread replicas across racks I Rereplication u Occurs when replicas fall below a threshold in Takes priority into account a Operates with throttled bandwidth for clone operations I Rebalancing I Master Operation Garbage Collection Marks deletion rather than reclaim resources immediately deletion operation occurs three days after being marked Marks files by turning them into hidden files Able to identify orphaned and stale chunks Advantages a Simple and reliable a Occurs when master is relatively free a Three day safety net for deletion Architecture Operation Log Historical record of critical metadata changes logged using logical time Defines order of concurrent operations Updates master s state simply reliably and without risking inconsistencies upon master crashing I Consistency File mutations are atomic Consistent all replicas have the same data Defined all replicas have current mutation in its entirety also consistent Successful mutations guarantee a All mutations in order a All replicas are defined and consistent Leases and Mutation Order Minimize master involvement Mutation change to data or metadata Leases maintain consistency Primary replica that receives lease a Receives commands from client a Decides order for all mutations I Data Flow Control and data are separate Pipelining to minimize latency and bottlenecks Chunkservers repeat data to the next closest chunkserver while receiving Write Control and Data Flow Cnmml mu Figure 2 Write Control and Data Flaw Atomic Record App ends GFS serializes concurrent record appends and writes each record at least once atomically Order of writes does not matter Simplifies concurrent writes for client app Failures if a write fails the client retries Client handles inconsistent regions Atomic Record Appends Replica 1 Replica 2 i Snapshot I Used to create branch copies Master receives snapshot request and revokestimes out leases Copies metadata Next client request triggers branch chunk creation I New chunk created on same chunkserver l Fault Tolerance Hardware failures are inevitable Failures can make data unavailable or corrupt I GFS Goals 1 High Availability 1 High Data Integrity High Availability Fast Recovery master and chunkserver can boot very quickly Replication simple and easy to monitor with automatic rereplication Master logs and checkpoints are replicated To recover master we load last checkpoint and replay operation log from when it was checkpointed Alias used by clients to access master Shadow masters used for read operations Data Integrity Chunkservers use checksumming Chunks have 64KB blocks with 32bit checksums Checksums are verified on reads Checksums can be incremented when partial blocks are added to by append Inactive chunks are periodically scanned and verified Benchmarks Micro bulwark limit so 3mm llle E m g X W w r vk gmm Wampum 7 lo H l 5 la 15 b 5 m l5 Namml Cllcnu n umhcraicllcnb N Nam mlan N 5 Heads b Writes a Recal39d appends gure 3 Aug Lu a aw 39 39 39 39 p g Baum cums med mmughpms They hsve arm bars mu Shaw 95 con dence intervals which are illegible m some uses sh lumen of law muan lu mullsumulmm 1 master 16 clients 16 chunkservers Benchmarks Real World Storage many terabytes of data from hundreds of thousands of files Metadata 380MB per chunkserver Read Rates 580MBs Master Load 200 to 500 OPs Recovery Time 15000 chunks 600 GB Took 23 minutes to replicate chunks I Conclusions System optimized for a unique set of application workloads a Large Files a Hardware Failures a Many Concurrent Appends a Large Sequential Reads Fault Tolerance Automatic Replication and checksums Master involvement minimized with large chunk sizes metadata caching and chunk leases


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Allison Fischer University of Alabama

"I signed up to be an Elite Notetaker with 2 of my sorority sisters this semester. We just posted our notes weekly and were each making over $600 per month. I LOVE StudySoup!"

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

Parker Thompson 500 Startups

"It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.