1. Characteristics of large data centers (such as Google's): - Component failures are the norm
rather than the exception. Constant monitoring, error detection, fault tolerance, and automatic
recovery must be integral to the system
- Files are huge by traditional standards
- Most files are mutated by appending new data rather than overwriting existing data. Random
writes within a file are practically non-existent. The system must efficiently implement
well-defined semantics for multiple clients that concurrently append to the same file.
- Co-designing the applications and the file system API benefits the overall system by
increasing flexibility
- High sustained bandwidth is more important than low latency.
2. Google File System interface: - Files are organized hierarchically in directories and identified
by pathnames
- operations to create, delete, open, close, read, and write files
- snapshot and record append operations (record append lets multiple clients append data to the
same file concurrently)
3. Google File System Architecture: - a single master and multiple chunk servers, accessed by
multiple clients
- each machine is typically a commodity Linux machine running a user-level server process
- Files are divided into fixed-size chunks of 64 MB, each identified by an immutable 64-bit
chunk handle. Each chunk is replicated on multiple (by default 3) chunk servers
- The master maintains all file system metadata
4. Examples of GFS system metadata: the namespace, access control information, the mapping
from files to chunks, and the current locations of chunks; the master also uses this metadata
to drive chunk lease management, garbage collection of orphaned chunks, and chunk migration
between chunk servers
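To make items 3-4 concrete, here is a minimal Python sketch of the master's metadata tables. All class and field names are hypothetical illustrations, not the real GFS implementation:

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks

@dataclass
class ChunkInfo:
    handle: int                                     # immutable 64-bit chunk handle
    version: int = 1                                # bumped to detect stale replicas
    locations: list = field(default_factory=list)   # chunk servers holding a replica

@dataclass
class MasterMetadata:
    namespace: dict = field(default_factory=dict)       # pathname -> access control info
    file_to_chunks: dict = field(default_factory=dict)  # pathname -> ordered chunk handles
    chunk_table: dict = field(default_factory=dict)     # handle -> ChunkInfo (locations are
                                                        # not persisted; rebuilt at startup)

meta = MasterMetadata()
meta.namespace["/logs/web.0"] = {"owner": "crawler"}
meta.file_to_chunks["/logs/web.0"] = [0xA1]
meta.chunk_table[0xA1] = ChunkInfo(handle=0xA1, locations=["cs1", "cs2", "cs3"])
```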
5. The master periodically communicates with each chunk server in ______ to
give it instructions and collect its state.: The master periodically communicates with each chunk
server in HeartBeat messages to give it instructions and collect its state.
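A hedged sketch of one HeartBeat round, assuming invented message fields: the chunk server reports its state, and the master piggybacks instructions (here, a hypothetical garbage-collection order) on the reply:

```python
def heartbeat(server_id, held_chunks, send_to_master):
    """One HeartBeat round: report this server's state, apply instructions.

    `send_to_master` stands in for the RPC to the master; every message
    field here is hypothetical, chosen only to illustrate the exchange.
    """
    report = {"server": server_id, "chunks": sorted(held_chunks)}
    reply = send_to_master(report)             # master collects the state...
    for handle in reply.get("delete", []):     # ...and piggybacks instructions,
        held_chunks.discard(handle)            # e.g. drop garbage-collected chunks
    return reply

# Example round against a stand-in master that orders chunk 0xB2 deleted:
held = {0xA1, 0xB2}
heartbeat("cs1", held, lambda report: {"delete": [0xB2]})
assert held == {0xA1}
```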
6. Clients interact with the ______ for metadata operations, but all data-bearing
communication goes directly to ______.: Clients interact with the master for metadata
operations, but all data-bearing communication goes directly to the chunk servers.
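The control/data split in item 6 looks roughly like this from the client side: one small metadata RPC to the master, then a bulk read straight from a replica. The stub classes and `client_read` helper are assumptions for illustration:

```python
CHUNK_SIZE = 64 * 1024 * 1024

class StubMaster:
    """Stand-in for the master: answers metadata lookups only."""
    def lookup(self, path, chunk_index):
        return 0xA1, ["cs1", "cs2", "cs3"]   # chunk handle and replica locations

class StubChunkServer:
    """Stand-in for a chunk server: serves the actual bytes."""
    def read_chunk(self, handle, offset_in_chunk, length):
        return b"\x00" * length

def client_read(path, offset, length, master, chunkservers):
    chunk_index = offset // CHUNK_SIZE                    # which chunk holds the offset
    handle, locations = master.lookup(path, chunk_index)  # small metadata-only RPC
    replica = chunkservers[locations[0]]                  # then go straight to a replica
    return replica.read_chunk(handle, offset % CHUNK_SIZE, length)

data = client_read("/logs/web.0", 70 * 1024**2, 4096,
                   StubMaster(), {"cs1": StubChunkServer()})
```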
7. In GFS, why does neither the client nor the chunk server cache file data?: - most applications
stream through huge files or have working sets too large to be cached
- eliminating caches avoids cache coherence issues
- chunks are stored as local files at chunk servers, so Linux's buffer cache already keeps
frequently accessed data in memory
8. The Master's operations include?: - all namespace operations
- managing chunk replicas throughout the system
- making chunk placement decisions
- creating new chunks and hence replicas
- coordinating various system-wide activities to keep chunks fully replicated
- balancing load across all the chunk servers
- reclaiming unused storage
9. GFS chunk size and its advantages: The chunk size is 64 MB, which is much larger than typical
file system block sizes. A large chunk size has the following advantages:
- it reduces clients' need to interact with the master, because reads and writes on the same
chunk require only one initial request to the master for chunk location information (see the
back-of-the-envelope sketch after this list)
- since a client is more likely to perform many operations on a given chunk, it can reduce
network overhead by keeping a persistent TCP connection to the chunk server over an extended
period of time
- it reduces the size of the metadata stored on the master
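The sketch promised in the first bullet above, assuming a 1 GB sequential read: 64 MB chunks need 16 location lookups where 4 KB blocks would need 262,144:

```python
FILE_SIZE = 1 * 1024**3                 # assume a 1 GB sequential read

def location_lookups(block_size):
    # one chunk-location request per block/chunk touched (ceiling division)
    return -(-FILE_SIZE // block_size)

print(location_lookups(64 * 1024**2))   # 64 MB chunks -> 16 lookups
print(location_lookups(4 * 1024))       # 4 KB blocks  -> 262144 lookups
```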
10. GFS chunk size and its disadvantages: - If a popular file is small (a small number of chunks),
the chunk servers storing those chunks can become hot spots (many clients accessing them)
11. How to solve hot spot issues in GFS?: - increase the replication factor for such files
- stagger application start times
12. How does GFS handle metadata?: - 3 types: the file and chunk namespaces, the mapping from
files to chunks, and the locations of each chunk's replicas
- All metadata is kept in the master's memory. The namespaces and file-to-chunk mapping are
also kept persistent via an operation log stored on the master's local disk and replicated on
remote machines
- Chunk locations are not persisted: the master asks each chunk server about its chunks at
master startup and whenever a chunk server joins the cluster
13. Why is putting metadata in the Master server's memory not a risk in practice?:
- the master maintains less than 64 bytes of metadata for each 64 MB chunk (see the worked
example below)
- the file namespace data typically requires less than 64 bytes per file because it stores file
names compactly using prefix compression
- it's not expensive or complicated to add more memory to the Master
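The worked example promised in item 13, under an assumed cluster size: a full petabyte of file data costs the master only about 1 GB of chunk metadata:

```python
CHUNK_SIZE = 64 * 1024**2     # 64 MB chunks
META_PER_CHUNK = 64           # < 64 bytes of metadata per chunk (upper bound)

data_stored = 1 * 1024**5     # assume 1 PB of file data in the cluster
num_chunks = data_stored // CHUNK_SIZE
print(num_chunks * META_PER_CHUNK / 1024**3, "GB of chunk metadata")  # -> 1.0
```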
14. Regarding the operation log, we must do this or we will effectively lose the whole file system
or recent client operations even if the chunks themselves survive: We must store the operation
log reliably and not make changes visible to clients until the metadata changes are made
persistent. Specifically, we store the log on multiple machines and respond to a client
operation only after flushing the corresponding log record to disk both locally and remotely.
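Item 14 is essentially write-ahead logging. A minimal sketch of the ordering, assuming a toy text record format and file-like log replicas:

```python
import os

def mutate_metadata(record: str, log_files, state: dict) -> str:
    """Flush the log record to every replica, then apply, then acknowledge.

    `log_files` are open text files standing in for the local disk log and
    its remote replicas; the "path chunks" record format is invented.
    """
    for f in log_files:
        f.write(record + "\n")   # append the log record...
        f.flush()
        os.fsync(f.fileno())     # ...and force it to stable storage
    path, chunks = record.split(" ", 1)
    state[path] = chunks         # only now make the change visible
    return "OK"                  # only now respond to the client
```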
15. The master recovers its file system state by?: replaying the operation log
16. The master checkpoints its state when?: the log grows beyond a certain size
17. The Master's checkpoint is in a ______ form that can
be directly mapped into memory and used for namespace lookup without ______.: The Master's
checkpoint is in a compact B-tree-like form that can be directly mapped into memory and used
for namespace lookup without extra parsing.
18. Master recovery needs?: - the latest complete checkpoint
- subsequent log files
19. A failure during checkpointing does not affect correctness because?: the recovery code
detects and skips incomplete checkpoints
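Items 15-19 combine into one recovery path. A condensed sketch, where `checkpoints`, the `complete` flag, and `replay_log_after` are hypothetical stand-ins for however GFS detects and reads these on disk:

```python
def recover(checkpoints, replay_log_after):
    """Rebuild master state: newest complete checkpoint plus log replay.

    `checkpoints` is newest-first; `replay_log_after(cp)` yields the log
    records written after checkpoint `cp` (both are hypothetical stand-ins).
    """
    latest = next((cp for cp in checkpoints if cp["complete"]), None)
    state = dict(latest["state"]) if latest else {}   # skip incomplete checkpoints:
                                                      # a crash mid-checkpoint is harmless
    for path, chunks in replay_log_after(latest):     # replay subsequent log records
        state[path] = chunks
    return state

# A crash left the newest checkpoint incomplete; recovery falls back safely:
cps = [{"complete": False, "state": {}},
       {"complete": True, "state": {"/logs/web.0": [0xA1]}}]
state = recover(cps, lambda cp: [("/logs/web.1", [0xB2])])
assert state == {"/logs/web.0": [0xA1], "/logs/web.1": [0xB2]}
```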
20. GFS data mutations: writes or record appends
21. GFS record append: A record append causes data (the "record") to be appended atomically
at least once even in the presence of concurrent mutations, but at an offset of GFS's choosing
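A toy model of item 21's contract: the offset is chosen by the system, not the client, and serialized appends give concurrent clients distinct offsets. Client retries after a failure can duplicate records on some replicas, which is why the guarantee is at-least-once rather than exactly-once. The class below is a single-replica simplification:

```python
class ToyChunk:
    """Illustrative single chunk; real GFS replicates this across servers."""
    def __init__(self):
        self.data = bytearray()

    def record_append(self, record: bytes) -> int:
        offset = len(self.data)      # the system, not the client, picks the offset
        self.data += record
        return offset                # returned so the client can find its record

chunk = ToyChunk()
off_a = chunk.record_append(b"client-A-record")   # concurrent appenders get
off_b = chunk.record_append(b"client-B-record")   # distinct, serialized offsets
assert off_a != off_b
```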
22. The state of a GFS file region after a data mutation depends on?: - the type of mutation
- whether it succeeds or fails
- whether there are concurrent mutations
23. A GFS file region is consistent if?: all clients will always see the same data, regardless of
which replicas they read from
24. A GFS region is defined after a file data mutation if?: it is consistent and clients will see
what the mutation writes in its entirety
25. Concurrent successful mutations leave the GFS region undefined but consistent when?: all
clients see the same data, but it may not reflect what any one mutation has written.
Typically, it consists of mingled fragments from multiple mutations
26. A failed mutation makes the region inconsistent (hence also undefined) when?: different
clients may see different data at different times
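Items 23-26 can be collapsed into a small decision function over the two axes from item 22. This sketch covers ordinary writes only (record appends have their own "defined interspersed with inconsistent" outcome); the state names follow the paper:

```python
def region_state(success: bool, concurrent: bool) -> str:
    """Classify a file region after a write mutation (GFS consistency model).

    consistent: all clients see the same data on every replica
    defined:    consistent AND clients see the mutation's writes in full
    """
    if not success:
        return "inconsistent (hence undefined)"   # clients may see different data
    if concurrent:
        return "consistent but undefined"         # same bytes, mingled fragments
    return "defined"                              # serial success: writes intact

print(region_state(success=True, concurrent=False))   # -> defined
print(region_state(success=True, concurrent=True))    # -> consistent but undefined
print(region_state(success=False, concurrent=False))  # -> inconsistent (hence undefined)
```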
27. After a sequence of successful mutations, the mutated file region is guaranteed to be defined
and contain the data written by the last mutation. GFS achieves this by?: - applying mutations to a
chunk in the same order on all its replicas
- using chunk version numbers to detect any replica that has become stale because it has
missed mutations while its chunk server was down. Stale replicas will never be involved in a
mutation or given to clients
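A minimal sketch of the version-number check from item 27, with hypothetical names: any replica whose version lags the master's is stale and is excluded from mutations and client reads:

```python
def fresh_replicas(master_version: int, replica_versions: dict) -> list:
    """Return servers whose replica matches the master's chunk version.

    A replica that missed mutations while its server was down keeps the
    old version number, is flagged stale, and is never handed to clients.
    """
    return [server for server, v in replica_versions.items()
            if v == master_version]

# cs2 was down during a mutation, so its version lags and it is excluded:
print(fresh_replicas(7, {"cs1": 7, "cs2": 6, "cs3": 7}))  # -> ['cs1', 'cs3']
```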
28. Since clients cache chunk locations, they may read from a stale replica before that information
is refreshed. How does GFS solve this problem?: - the cache window is limited by the cache
entry's timeout and by the next open of the file, which purges all chunk information for that
file from the cache
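Item 28's mitigation can be modeled as a client-side cache whose entries carry a deadline and are purged on the next open of the file. The TTL value and structure below are assumptions:

```python
import time

class ChunkLocationCache:
    """Client-side cache of chunk locations, bounded by a timeout and
    purged on the next open of the file (the TTL here is illustrative)."""
    TTL_SECONDS = 60.0

    def __init__(self):
        self._entries = {}   # (path, chunk_index) -> (locations, expiry)

    def put(self, path, chunk_index, locations):
        self._entries[(path, chunk_index)] = (locations,
                                              time.time() + self.TTL_SECONDS)

    def get(self, path, chunk_index):
        entry = self._entries.get((path, chunk_index))
        if entry is None or time.time() >= entry[1]:
            return None          # missing or expired: re-ask the master
        return entry[0]

    def purge_file(self, path):
        """Called on open(): drop all chunk info cached for the file."""
        self._entries = {k: v for k, v in self._entries.items()
                         if k[0] != path}
```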