Exam (elaborations)

Big Data Engineer

Uploaded on: 07-02-2024

Written in: 2023/2024

This document is intended for anyone seeking work in Big Data. It contains the most frequently asked interview questions that I encountered between November 2023 and January 2024, covering topics from Hadoop, Spark, and Hive.

How to crack the Big Data interview
Topics covered:
1. Hadoop
2. Spark
3. Hive

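1. HADOOP
Q1. What is Hadoop, and why is it used?
Hadoop is an open-source framework that manages the storage and processing of large amounts of data for applications. It is used because it spreads both storage and computation across a cluster of machines, so data sets too large for a single machine can still be stored and processed.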

Q2. What are the main components of Hadoop?
Storage – HDFS
Batch processing – MapReduce
Resource management – YARN
Q3. What is HDFS? What are the functions of the NameNode and the DataNode?
HDFS stands for Hadoop Distributed File System. Instead of keeping all data on a single node (machine), HDFS splits files into blocks and distributes them across multiple nodes, storing each block with a default replication factor of 3.
It follows a master/slave topology.
The NameNode works as the master in a Hadoop cluster. Main functions performed by the NameNode:
1. Stores the metadata of the actual data, such as which blocks make up a file and which DataNodes hold them.
2. Manages the file system namespace and executes operations such as opening and closing files and renaming files and directories.
3. Regulates client access requests for the actual file data.
4. Instructs the slaves (DataNodes) to create, replicate, or delete blocks.
The DataNode works as a slave in a Hadoop cluster. Main functions performed by the DataNode:
1. Stores the actual business data.
2. Acts as the worker node where data is read and written; processing tasks are usually scheduled on the same machines.
3. Upon instruction from the master, performs creation, replication, and deletion of data blocks.
4. Because all the business data is stored on DataNodes, they require a large amount of storage.
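
To make block distribution and replication concrete, here is a minimal Python sketch. It assumes the 128 MB default block size of Hadoop 2.x and later and the default replication factor of 3. The function name place_blocks, the DataNode names, and the round-robin placement are illustrative assumptions; real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128      # default HDFS block size in Hadoop 2.x and later
REPLICATION_FACTOR = 3   # default replication factor


def place_blocks(file_size_mb, datanodes,
                 block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION_FACTOR):
    """Return {block_index: [datanode, ...]} using round-robin placement.

    Hypothetical helper: it only illustrates that every block is stored
    on several distinct DataNodes, not how HDFS actually picks them.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Pick `replication` consecutive nodes, wrapping around the list.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)]
            for r in range(replication)
        ]
    return placement


# A 300 MB file needs ceil(300 / 128) = 3 blocks.
block_map = place_blocks(300, ["dn1", "dn2", "dn3", "dn4"])
for block, nodes in block_map.items():
    print(f"block {block} -> {nodes}")
```

The dictionary returned here plays the role of the block-location metadata that the NameNode keeps, while the listed nodes stand in for the DataNodes that hold the bytes.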
Q4. What happens to a NameNode that has no data?
A NameNode without data does not exist. A running NameNode always holds at least the file system metadata.

Q5. What happens if the NameNode fails?
• Since Hadoop 2.x, an HDFS cluster configured for high availability has two NameNodes: active and passive. The active NameNode is the one that serves requests in the Hadoop cluster.
• The passive NameNode is also known as the standby NameNode. It takes over only when the active NameNode fails.
• Whenever the active NameNode fails, the standby NameNode takes over its responsibilities and keeps HDFS up and running. The standby NameNode also takes the edit logs (the record of metadata changes) from the active NameNode and merges them with the FsImage (file system image) to produce an updated FsImage and to prevent the edit logs from growing too large.
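
The merge of edit logs into the FsImage can be pictured with a small Python sketch. This is a toy model under stated assumptions: the namespace is a plain dictionary from path to block IDs, and the operation names (create, add_block, rename, delete) and the function apply_edits are hypothetical, not real HDFS opcodes or APIs.

```python
def apply_edits(fsimage, edit_log):
    """Return a new namespace snapshot with every logged operation applied in order."""
    namespace = dict(fsimage)  # leave the old snapshot untouched
    for op, *args in edit_log:
        if op == "create":
            (path,) = args
            namespace[path] = []
        elif op == "add_block":
            path, block_id = args
            namespace[path] = namespace[path] + [block_id]
        elif op == "rename":
            src, dst = args
            namespace[dst] = namespace.pop(src)
        elif op == "delete":
            (path,) = args
            del namespace[path]
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


# Last saved snapshot of the namespace (path -> block IDs).
fsimage = {"/logs/a.txt": ["blk_1"]}

# Changes recorded since that snapshot was taken.
edit_log = [
    ("create", "/logs/b.txt"),
    ("add_block", "/logs/b.txt", "blk_2"),
    ("rename", "/logs/a.txt", "/logs/old.txt"),
]

# Checkpoint: fold the log into a fresh snapshot, then start a new, empty log.
new_fsimage = apply_edits(fsimage, edit_log)
edit_log = []
print(new_fsimage)
```

After the checkpoint the log is empty again, which is why merging keeps the edit logs from growing without bound.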

Q6. What are the phases of MapReduce?
Map Phase:
• The input data is divided into smaller chunks called "splits."
• A "Mapper" function is applied to each split independently. The Mapper takes the input data and produces a set of key-value pairs.
Shuffle and Sort Phase:
• The output key-value pairs from all Mappers are shuffled and sorted by key so that all values with the same key are grouped together. This is essential for the subsequent Reduce phase.
Reduce Phase:
• The sorted key-value pairs are passed to a set of "Reducer" functions. Each Reducer receives a group of key-value pairs with the same key.
• The Reducer processes this data and produces an output, typically aggregating or summarizing the values associated with each key.
• The output of the Reduce phase is typically written to an external storage system, such as HDFS (Hadoop Distributed File System).
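
The three phases above can be sketched as a word count in plain Python. This is a single-process illustration, not the Hadoop API; the function names mapper, shuffle_and_sort, reducer, and run_job are chosen for this example only.

```python
from collections import defaultdict


def mapper(split):
    """Map phase: emit a (word, 1) pair for every word in one split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort phase: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: aggregate all values that share one key."""
    return key, sum(values)


def run_job(splits):
    # Each split is mapped independently, as it would be on separate nodes.
    mapped = [pair for split in splits for pair in mapper(split)]
    grouped = shuffle_and_sort(mapped)
    return [reducer(key, values) for key, values in grouped]


splits = ["big data big ideas", "data moves fast"]
word_counts = run_job(splits)
print(word_counts)
# [('big', 2), ('data', 2), ('fast', 1), ('ideas', 1), ('moves', 1)]
```

In a real cluster the mapped pairs would be partitioned across the network to several Reducers, and the final list would be written to HDFS rather than printed.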

2. SPARK
Q7. What are the features of Apache Spark?
• High processing speed
• In-memory computation
• Reusability
• Fault tolerance
• Stream processing
• Lazy evaluation
• Support for multiple languages
• Hadoop integration

Q8. What does DAG refer to in Apache Spark?
DAG stands for Directed Acyclic Graph: a graph with a finite number of vertices and edges and no directed cycles. Each edge is directed from one vertex to a later one in the sequence. The vertices represent the RDDs of Spark, and the edges represent the operations to be performed on those RDDs.
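
The idea can be illustrated with a toy model in plain Python. This is not the Spark API: the class LazyDataset and its methods are hypothetical stand-ins for an RDD, and the chain built here is the simplest possible DAG (real Spark graphs can branch and merge, for example at a join). Each object is a vertex, each transformation records an edge back to its parent, and nothing is computed until collect is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: a vertex that remembers how it was derived."""

    def __init__(self, source=None, parent=None, op_name="source", func=None):
        self.source = source    # raw data, set only on the root vertex
        self.parent = parent    # edge back to the dataset this one derives from
        self.op_name = op_name  # label of the operation on that edge
        self.func = func        # how to compute this vertex from its parent

    def map(self, f):
        return LazyDataset(parent=self, op_name="map",
                           func=lambda rows: [f(row) for row in rows])

    def filter(self, predicate):
        return LazyDataset(parent=self, op_name="filter",
                           func=lambda rows: [row for row in rows if predicate(row)])

    def lineage(self):
        """Operation labels from the root vertex down to this one."""
        if self.parent is None:
            return [self.op_name]
        return self.parent.lineage() + [self.op_name]

    def collect(self):
        """Action: walk the graph from the root and actually compute."""
        if self.parent is None:
            return list(self.source)
        return self.func(self.parent.collect())


calls = []


def double(x):
    calls.append(x)  # record every real invocation
    return x * 2


numbers = LazyDataset(source=[1, 2, 3, 4])
pipeline = numbers.map(double).filter(lambda x: x > 4)

calls_before_action = len(calls)  # transformations alone run nothing
lineage = pipeline.lineage()
result = pipeline.collect()

print(calls_before_action)  # 0
print(lineage)              # ['source', 'map', 'filter']
print(result)               # [6, 8]
```

Because the graph has no cycles, any vertex can always be recomputed by replaying the edges from the root.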
Q9. How is Apache Spark different from MapReduce?

MapReduce spark

Written for

Course: Big Data

Document information

Uploaded on: February 7, 2024
Number of pages: 9
Written in: 2023/2024
Type: Exam (elaborations)
Contains: Questions & answers

Subjects

big data
hadoop
spark
hive
data engineer
hdfs
sparksql

Seller: rbabyshri
