Class notes

Introduction to Big Data Notes

Pages
23
Uploaded on
15-09-2023
Written in
2023/2024

This document contains introductory notes about Big Data, along with FAQs.

All about [What is Big Data]?
Data Engineering

Understanding Big Data Volume and Use Cases

During an interview with a banking company, the interviewer was not satisfied with the volume
of data the interviewee was processing, which was 25GB per day for one country. However, the
interviewee explained that the problem was not with the volume, but with the processing speed
of the existing technology. This scenario highlights the misconception that big data is only
about volume.


Whether big data is needed depends on the limitations of the technology already in place. For
example, if an Oracle system cannot process more than 10,000 TB of data, moving to big data
becomes necessary when the volume exceeds this limit.
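
The idea that the limit of the existing system matters more than raw volume can be shown with a small calculation. This is only a sketch: the 25 GB daily volume comes from the interview example above, while the throughput figures, the processing window, and the function names are hypothetical values chosen for illustration.

```python
def hours_to_process(volume_gb: float, throughput_gb_per_hour: float) -> float:
    """Return the hours needed to process volume_gb at the given throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return volume_gb / throughput_gb_per_hour


def fits_in_window(volume_gb: float, throughput_gb_per_hour: float,
                   window_hours: float) -> bool:
    """Return True if the data can be processed within the allowed window."""
    return hours_to_process(volume_gb, throughput_gb_per_hour) <= window_hours


daily_volume_gb = 25.0        # from the notes: 25 GB per day for one country
legacy_throughput = 2.0       # hypothetical: GB per hour on the existing system
faster_throughput = 10.0      # hypothetical: GB per hour on a distributed system
window_hours = 6.0            # hypothetical: time available for the daily job

print(hours_to_process(daily_volume_gb, legacy_throughput))              # 12.5
print(fits_in_window(daily_volume_gb, legacy_throughput, window_hours))  # False
print(fits_in_window(daily_volume_gb, faster_throughput, window_hours))  # True
```

With these assumed numbers, a modest 25 GB per day already misses its window on the slower system, which is why processing speed, and not volume alone, drives the move to big data.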


Confidence in big data comes from understanding that the use case is not just about volume,
but also about velocity and processing speed. Big data technology encompasses a range of
solutions, including Hadoop, Spark, Kafka, Storm, Flume, and many more, developed to
solve different data problems across various layers.


Big data is not just a marketing term. It is the name of a problem, and the technology built to
solve that problem took on the same name for lack of a better one. It is a name in the same way
that C, C++, and Java are names of programming languages.


Introduction to Big Data

Big data is a complex field that involves various layers of technology, including storage,
processing, testing, visualization, analytics, machine learning, and artificial intelligence. These
layers are supported by different technologies such as databases, file systems, and processing
frameworks.

Data Layers


● Storage Layer: This layer involves technology for storing data, including databases
and file systems.
● Processing Layer: This layer involves technology for processing data, including
processing frameworks and ETL tools such as Informatica.
● Testing Layer: This layer involves technology for testing data.
● Visualization Layer: This layer involves technology for visualizing data.
● Analytics: This layer involves technology for data science, machine learning, and
artificial intelligence.
● Automation: This layer involves technology for scheduling and automation.

Big data also involves various sub-projects that are supported by different groups of people and
companies. While some of these sub-projects are included in the initial releases of big data
technologies like Hadoop, others are added later.


History of Hadoop

Hadoop was created by Doug Cutting, together with Mike Cafarella, in the mid-2000s. It is an
open-source technology that includes two projects, HDFS and MapReduce. The inspiration for
Hadoop came from two papers released by Google in the early 2000s, one on the Google File
System (GFS) and one on Google MapReduce. Hadoop was developed to store and process data
in a parallel and distributed manner.
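
The MapReduce programming model mentioned above can be illustrated with a word count. This is a minimal single-process sketch of the model, not Hadoop's actual API: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and in a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return (key, sum(values))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


sample = ["big data is big", "data is everywhere"]
print(word_count(sample))
# {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across machines, which is what makes the model suitable for distributed processing.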


After inventing Hadoop, Doug Cutting announced it as an open-source technology. Open-source
technologies are those in which the source code is freely available for use and modification.
Companies can use open-source technologies for free and may provide funding for the
developers of the technology to provide support and create new projects.


The Apache Software Foundation is a nonprofit organization that hosts open-source projects,
including Hadoop, and publishes the Apache License under which they are released. Many large
IT companies trust the Apache License and monitor the Apache website for new source code. If
they find a project they like, they may fund its developers to create new projects and provide
support.


Big Data System Configuration
Data Engineering

System Requirements for Learning Big Data

If you want to start learning about big data on your personal laptop, it is important to choose the
right system requirements. Here are some recommendations:


● Avoid using enterprise distributions like Cloudera or Hortonworks, as they require
at least 10 GB of RAM and may not work well on your laptop.
● Instead, opt for Apache's vanilla flavor of Hadoop and Spark, which you can
download and install directly from the internet.
● You will need a Linux operating system running on top of Windows. You can run Linux
in a virtual machine using software like VMware and then install Hadoop and Spark in it.
● For the Apache flavor, 4 to 6 GB of RAM is sufficient, and the hard disk space can
be anywhere from about 13 GB to 100 GB.
● There is no need to purchase a new laptop or more RAM unless you are using Cloudera
or Hortonworks.
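
The recommendations above can be turned into a quick check. This is a rough sketch only: the RAM and disk thresholds are the figures from these notes, not official vendor requirements, the names `can_run` and `free_disk_gb` are hypothetical, and applying the 13 GB disk figure to every distribution is an assumption made for simplicity.

```python
import shutil

# Minimum RAM in GB per distribution, as stated in the notes above.
MIN_RAM_GB = {"apache": 4, "cloudera": 10, "hortonworks": 10}
# Lower end of the disk range given for the Apache flavor (assumed for all).
MIN_DISK_GB = 13


def can_run(distribution: str, ram_gb: float, disk_gb: float) -> bool:
    """Return True if the machine meets the thresholds from the notes."""
    name = distribution.lower()
    if name not in MIN_RAM_GB:
        raise ValueError(f"unknown distribution: {distribution}")
    return ram_gb >= MIN_RAM_GB[name] and disk_gb >= MIN_DISK_GB


def free_disk_gb(path: str = ".") -> float:
    """Return the free disk space, in GB, on the drive containing path."""
    return shutil.disk_usage(path).free / 1024 ** 3


# Example: a laptop with 8 GB of RAM and 50 GB of free disk space.
print(can_run("apache", ram_gb=8, disk_gb=50))    # True
print(can_run("cloudera", ram_gb=8, disk_gb=50))  # False
```

To check the current machine, `free_disk_gb()` can supply the disk value; the amount of RAM has to be read from the operating system's own settings.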

Note that real-world projects more commonly run on preconfigured platforms like Cloudera or
Hortonworks than on plain Apache, so in interviews it is better to mention those platforms
rather than saying that you use Apache. For self-learning or course-based learning, however,
Apache is recommended.


Although the installation process may be a bit cumbersome, it is a one-time process and you
will learn valuable skills from it. Plus, once installed, you can freely explore and learn without any
performance issues.


Unboxing [Hadoop Framework]
Data Engineering


What is Hadoop?

Hadoop is one of the solutions in big data with multiple components.


What is HDFS?

HDFS (Hadoop Distributed File System) is the component in Hadoop that stores data distributed
across machines. It is operated with commands similar to Linux file commands.
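
The similarity between HDFS and Linux commands can be seen by placing them side by side. The sketch below only builds the command lines for the `hdfs dfs` shell and does not execute them; the helper names `hdfs_command` and `upload_command` and the paths are hypothetical, and running the resulting commands assumes a machine with Hadoop installed.

```python
from typing import List

# Linux file command -> matching option of the `hdfs dfs` shell.
LINUX_TO_HDFS = {
    "ls": "-ls",
    "mkdir": "-mkdir",
    "cat": "-cat",
    "rm": "-rm",
    "cp": "-cp",
    "mv": "-mv",
}


def hdfs_command(linux_cmd: str, *args: str) -> List[str]:
    """Build the `hdfs dfs` command line matching a Linux file command."""
    if linux_cmd not in LINUX_TO_HDFS:
        raise ValueError(f"no HDFS equivalent listed for: {linux_cmd}")
    return ["hdfs", "dfs", LINUX_TO_HDFS[linux_cmd], *args]


def upload_command(local_path: str, hdfs_path: str) -> List[str]:
    """Build the command that copies a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


# Hypothetical paths, for illustration only.
print(" ".join(hdfs_command("ls", "/user/demo")))
# hdfs dfs -ls /user/demo
print(" ".join(upload_command("sales.csv", "/user/demo/")))
# hdfs dfs -put sales.csv /user/demo/
```

The main difference from Linux is that files must first be copied into HDFS (for example with `-put`) before the other commands can see them.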

Document information

Professor(s)
Gautam gujjar
Contains
All classes
