Exam (elaborations)

Cloud Computing - Technical Foundations

Pages
21
Grade
A+
Uploaded on
09-12-2024
Written in
2024/2025

31. What happens in case of errors at any of the replicas?: The write may have succeeded at the primary and an arbitrary subset of the secondary replicas. (If it had failed at the primary, it would not have been assigned a serial number and forwarded.) The client request is considered to have failed, and the modified region is left in an inconsistent state. Client code handles such errors by retrying the failed mutation.
32. Why may a large write cause consistent but undefined GFS regions?: GFS breaks a large write into multiple write operations. While each follows the proper write protocol, they may be interleaved with and overwritten by concurrent operations from other clients. Therefore, the region may end up containing fragments from different clients, although the replicas are identical because the individual operations complete in the same order on all replicas. This is why the region is consistent but undefined.
33. Control flow vs. data flow: While control flows from the client to the primary and then to all secondaries, data is pushed linearly along a carefully picked chain of chunk servers in a pipelined fashion (to fully utilize each machine's network bandwidth, especially outbound).
34. How does GFS data flow avoid network bottlenecks and high-latency links (e.g., inter-switch links are often both) as much as possible?: each machine



1. Characteristics of large data centers (like Google data centers): - Component failures are the norm
rather than the exception. Constant monitoring, error detection, fault tolerance, and automatic
recovery must be integral to the system
- Files are huge by traditional standards
- Most files are mutated by appending new data rather than overwriting existing data. Random
writes within a file are practically non-existent. The system must efficiently implement well-
defined semantics for multiple clients that concurrently append to the same file.
- Co-designing the applications and the file system API benefits the overall system by
increasing flexibility
- High sustained bandwidth is more important than low latency.
2. Google File System interface: - Files are organized hierarchically in directories and identified
by pathnames
- operations to create, delete, open, close, read, and write files
- snapshot and record append operations (record append allows multiple clients to append data to
the same file concurrently)
3. Google File System Architecture: - a single master and multiple chunk servers, accessed by
multiple clients
- each of these is typically a commodity Linux machine running a user-level server process
- Files are divided into fixed-size chunks (i.e., 64 MB). Each chunk is replicated on multiple (3 by
default) chunk servers
- The master maintains all file system metadata
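The fixed chunk size means a byte offset in a file maps to a chunk by simple division. The sketch below illustrates that arithmetic only; the function names `locate` and `chunks_spanned` are hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated in these notes


def locate(byte_offset: int) -> tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_spanned(byte_offset: int, length: int) -> range:
    """Chunk indexes touched by a read or write of `length` bytes."""
    if length <= 0:
        return range(0)
    first, _ = locate(byte_offset)
    last, _ = locate(byte_offset + length - 1)
    return range(first, last + 1)
```

A two-byte access that starts on the last byte of chunk 0 touches chunks 0 and 1, which is why a large write can be split into several operations.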
4. Examples of GFS system metadata: the namespace, access control information, the mapping
from files to chunks, and the current locations of chunks. The master also controls system-wide
activities such as chunk lease management, garbage collection of orphaned chunks, and chunk
migration between chunk servers
5. The master periodically communicates with each chunk server in ______ to
give it instructions and collect its state.: The master periodically communicates with each chunk
server in HeartBeat messages to give it instructions and collect its state.
6. Clients interact with the ______ for metadata operations, but all data-bearing
communication goes directly to ______.: Clients interact with the master for metadata
operations, but all data-bearing communication goes directly to the chunk servers.
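The split between metadata and data traffic can be sketched with two toy classes. This is a minimal in-memory illustration under stated assumptions, not GFS code: `ToyMaster`, `ToyChunkServer`, and `client_read` are hypothetical names, and real replicas would be chosen by network distance rather than by taking the first one.

```python
class ToyChunkServer:
    """Holds chunk data only; it knows nothing about file names."""

    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


class ToyMaster:
    """Holds metadata only; file bytes never pass through it."""

    def __init__(self):
        self.file_to_chunks = {}   # path -> list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of ToyChunkServer
        self.metadata_requests = 0

    def lookup(self, path, chunk_index):
        self.metadata_requests += 1
        handle = self.file_to_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]


def client_read(master, path, chunk_index, offset, length):
    handle, replicas = master.lookup(path, chunk_index)  # metadata from the master
    return replicas[0].read(handle, offset, length)      # data from a chunk server
```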
7. In GFS, why does neither the client nor the chunk server cache file data?: - Clients: most
applications stream through huge files or have working sets too large to be cached
- Clients: not caching file data eliminates cache coherence issues
- Chunk servers: chunks are stored as local files, so Linux's buffer cache already keeps
frequently accessed data in memory








8. What do the master's operations include?: - all namespace operations
- managing chunk replicas throughout the system
- making chunk placement decisions
- creating new chunks and hence replicas
- coordinating various system-wide activities to keep chunks fully replicated
- balancing load across all the chunk servers
- reclaiming unused storage
9. GFS chunk size and its advantages: The chunk size is 64 MB, which is much larger than typical
file system block sizes. A large chunk size has the following advantages:
- reduces clients' need to interact with the master because reads and writes on the same
chunk require only one initial request to the master for chunk location information
- because a client is more likely to perform many operations on a given chunk, it can reduce
network overhead by keeping a persistent TCP connection to the chunk server over an extended
period of time
- reduces the size of the metadata stored on the master
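The first advantage can be checked with rough arithmetic: if a client needs one location lookup per chunk, a sequential scan of a file costs one lookup per chunk-sized piece. The sketch below compares 64 MB chunks with a 4 KB block size; the 4 KB figure is an assumption chosen for illustration, and `master_lookups` is a hypothetical helper.

```python
MB = 1024 * 1024
KB = 1024


def master_lookups(file_size: int, chunk_size: int) -> int:
    """Location lookups needed to scan a file, one per chunk (ceiling division)."""
    return -(-file_size // chunk_size)


one_gib = 1024 * MB
lookups_64mb = master_lookups(one_gib, 64 * MB)  # chunk size from the notes
lookups_4kb = master_lookups(one_gib, 4 * KB)    # assumed small block size
```

For a 1 GiB file this gives 16 lookups with 64 MB chunks versus 262144 with 4 KB blocks, and the master stores correspondingly fewer metadata entries.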
10. GFS chunk size and its disadvantages: - If a popular file is small (only a few chunks), the
chunk servers storing those chunks can become hot spots when many clients access the file
11. How to solve hot spot issues in GFS?: - increase the replication factor of the hot file
- stagger client requests (for example, application start times)
12. How does GFS handle metadata?: - 3 types: the file and chunk namespaces, the mapping from
files to chunks, and the locations of each chunk's replicas
- All metadata is kept in the master's memory. Namespaces and the file-to-chunk mapping are
also kept persistent by logging mutations to an operation log stored on the master's local disk
and replicated on remote machines
- Chunk locations are not stored persistently; the master asks each chunk server about its
chunks at master startup and whenever a chunk server joins the cluster
13. Why is keeping metadata in the master's memory not a risk in practice?: - the master
maintains less than 64 bytes of metadata for each 64 MB chunk
- the file namespace data typically requires less than 64 bytes per file because it stores file
names compactly using prefix compression
- it is not expensive or complicated to add more memory to the master
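A back-of-the-envelope calculation shows why the in-memory design scales. The sketch uses the 64-byte upper bound per chunk from the answer above and counts logical chunks only; `chunk_metadata_bytes` is a hypothetical helper, and per-file namespace data is left out.

```python
BYTES_PER_CHUNK_METADATA = 64  # upper bound quoted in the notes
CHUNK_SIZE = 64 * 2**20        # 64 MB


def chunk_metadata_bytes(stored_bytes: int) -> int:
    """Upper bound on master memory used for chunk metadata."""
    chunks = -(-stored_bytes // CHUNK_SIZE)  # ceiling division
    return chunks * BYTES_PER_CHUNK_METADATA


one_pib = 2**50
memory_gib = chunk_metadata_bytes(one_pib) / 2**30
```

Under these assumptions, 1 PiB of file data needs at most 1 GiB of master memory for chunk metadata.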
14. Regarding the operation log, what must we do to avoid effectively losing the whole file system
or recent client operations, even if the chunks themselves survive?: We must store the operation
log reliably and not make changes visible to clients until metadata changes are made persistent.
Specifically, we store the log on multiple machines and respond to a client operation only
after flushing the corresponding log record to disk both locally and remotely.
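The ordering rule (flush everywhere first, then expose the change) can be sketched as follows. This is a toy model, not GFS code: `LogReplica` stands in for a local or remote disk, and `LoggingMaster` and its `create` method are hypothetical names.

```python
class LogReplica:
    """Stands in for a disk (local or remote) that stores log records."""

    def __init__(self, healthy=True):
        self.records = []
        self.healthy = healthy

    def flush(self, record):
        if not self.healthy:
            raise IOError("flush failed")
        self.records.append(record)


class LoggingMaster:
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self.namespace = set()  # in-memory metadata visible to clients

    def create(self, path):
        record = ("create", path)
        # Make the record durable locally and remotely before changing visible state.
        for replica in [self.local, *self.remotes]:
            replica.flush(record)
        self.namespace.add(path)  # only reached if every flush succeeded
        return "ok"
```

If any flush raises, the namespace is left unchanged, so a client never observes a change that could be lost in a crash.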








15. The master recovers its file system state by?: replaying the operation log
16. The master checkpoints its state when?: the log grows beyond a certain size
17. The master's checkpoint is in a ______ form that can be directly mapped into memory and
used for namespace lookup without ______.: compact B-tree-like; extra parsing
18. Master recovery needs?: - latest complete checkpoint
- subsequent log files
19. A failure during checkpointing does not affect correctness because?: the recovery code
detects and skips incomplete checkpoints
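Answers 15 to 19 fit together as one recovery procedure: pick the latest complete checkpoint, then replay only the log records that follow it. The sketch below is a simplified model; the dictionary layout of a checkpoint, the sequence numbers, and the `recover` function are all assumptions made for illustration.

```python
def recover(checkpoints, log):
    """Rebuild the namespace from checkpoints and an operation log.

    checkpoints: list of dicts {"seq": int, "complete": bool, "state": set}
    log: list of (seq, op, path) tuples in increasing seq order
    """
    # Incomplete checkpoints (e.g. from a crash mid-checkpoint) are skipped.
    complete = [c for c in checkpoints if c["complete"]]
    latest = max(complete, key=lambda c: c["seq"], default=None)
    state = set(latest["state"]) if latest else set()
    start = latest["seq"] if latest else 0
    for seq, op, path in log:
        if seq <= start:
            continue  # already reflected in the checkpoint
        if op == "create":
            state.add(path)
        elif op == "delete":
            state.discard(path)
    return state
```

Checkpointing keeps the replayed suffix of the log short, which is what keeps startup time low.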
20. GFS data mutations: writes or record appends
21. GFS record append: A record append causes data (the "record") to be appended atomically
at least once even in the presence of concurrent mutations, but at an offset of GFS's choosing
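"At least once" means a retried append can leave more than one copy of the record. The toy model below collapses all replicas into a single `ToyChunk` and lets a failed attempt still leave its data behind; both choices, and the names `record_append` and `append_with_retry`, are simplifying assumptions rather than GFS behavior in detail.

```python
class ToyChunk:
    def __init__(self):
        self.records = []  # (offset, payload); the chunk picks the offset, not the caller

    def record_append(self, payload, fail=False):
        offset = sum(len(p) for _, p in self.records)
        self.records.append((offset, payload))
        if fail:
            raise IOError("a replica reported an error")
        return offset


def append_with_retry(chunk, payload, failures=0):
    """Retry until the append succeeds; return the offset of the successful copy."""
    attempt = 0
    while True:
        try:
            return chunk.record_append(payload, fail=attempt < failures)
        except IOError:
            attempt += 1
```

With one injected failure, the record ends up in the chunk twice and the caller is told the offset of the second copy, so readers that cannot tolerate duplicates have to filter them out themselves.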
22. The state of a GFS file region after a data mutation depends on?: - the type of mutation
- whether it succeeds or fails
- whether there are concurrent mutations
23. A GFS file region is consistent if?: all clients will always see the same data, regardless of
which replicas they read from
24. A GFS region is defined after a file data mutation if?: it is consistent and clients will see
what the mutation writes in its entirety
25. Concurrent successful mutations leave the GFS region undefined but consistent when?: all
clients see the same data, but it may not reflect what any one mutation has written.
Typically, it consists of mingled fragments from multiple mutations
26. A failed mutation makes the region inconsistent (hence also undefined) when?: different
clients may see different data at different times
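The three region states in answers 23 to 26 can be expressed as a small classifier. This is a simplified sketch: it treats a region as defined when its bytes equal exactly what one mutation wrote, assumes a non-empty list of replicas, and uses the hypothetical name `region_state`.

```python
def region_state(replicas, mutations):
    """Classify a region from each replica's bytes and the data each mutation wrote."""
    if len(set(replicas)) > 1:
        return "inconsistent"          # replicas differ, so clients may see different data
    if replicas[0] in mutations:
        return "defined"               # one mutation is visible in its entirety
    return "consistent but undefined"  # same bytes everywhere, but mingled fragments
```

For example, three replicas holding `b"AABB"` after concurrent writes of `b"AAAA"` and `b"BBBB"` are consistent but undefined, while a replica that missed part of a write makes the region inconsistent.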
27. After a sequence of successful mutations, the mutated file region is guaranteed to be defined
and contain the data written by the last mutation. GFS achieves this by?: - applying mutations to a
chunk in the same order on all its replicas
- using chunk version numbers to detect any replica that has become stale because it has
missed mutations while its chunk server was down. Stale replicas will never be involved in a
mutation or given to clients
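Stale-replica detection with version numbers can be sketched as below. The sketch assumes the master bumps the version when it grants a new lease and informs only replicas that are both reachable and up to date; `VersionedMaster` and its methods are hypothetical names for illustration.

```python
class VersionedMaster:
    def __init__(self, replicas):
        self.version = 1
        self.replica_versions = {name: 1 for name in replicas}

    def grant_lease(self, live_replicas):
        """Bump the chunk version on the master and on each live, up-to-date replica."""
        previous = self.version
        self.version += 1
        for name in live_replicas:
            if self.replica_versions[name] == previous:
                self.replica_versions[name] = self.version

    def current_replicas(self):
        """Replicas whose version matches the master's; stale ones are excluded."""
        return sorted(n for n, v in self.replica_versions.items() if v == self.version)
```

A chunk server that was down during a lease grant keeps the old version, so it is filtered out of `current_replicas` even after it comes back.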
28. Since clients cache chunk locations, they may read from a stale replica before that information is
refreshed. How does GFS solve this problem?: - the window is limited by the cache entry's timeout
and by the next open of the file, which purges all chunk information for that file from the cache
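Both limits on the staleness window (entry timeout and purge on open) fit in a small cache class. This is a minimal sketch: `LocationCache`, its methods, and the explicit `now` parameter (used instead of a real clock so the behavior is easy to follow) are assumptions for illustration.

```python
class LocationCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # (path, chunk_index) -> (locations, expiry time)

    def put(self, path, chunk_index, locations, now):
        self.entries[(path, chunk_index)] = (locations, now + self.ttl)

    def get(self, path, chunk_index, now):
        entry = self.entries.get((path, chunk_index))
        if entry is None or now >= entry[1]:
            self.entries.pop((path, chunk_index), None)
            return None  # caller must ask the master again
        return entry[0]

    def on_open(self, path):
        """Opening a file purges all cached chunk information for it."""
        for key in [k for k in self.entries if k[0] == path]:
            del self.entries[key]
```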
