Big Data
Course: Big Data
Module Name: Hive Querying and Analysis
Session Name: Getting Started with Apache Spark & Programming with RDDs
Instructor:
● Apache Spark
○ Spark Overview
○ Spark vs MapReduce
○ Spark Ecosystem
○ Spark Architecture
○ Spark APIs
○ Spark Installation
○ Introduction to Spark RDDs
○ Creating RDDs
○ Operations on RDDs
○ Transformation Operations
○ Action Operations
○ Lazy Evaluation in Spark
Spark Overview
Apache Spark is an open-source, distributed, general-purpose cluster-computing
framework. It provides high-level APIs in Java, Scala, Python, and R, and
an optimized engine that supports general execution graphs. It also supports a
rich set of higher-level tools, including Spark SQL for SQL and structured data
processing, MLlib for machine learning, GraphX for graph processing, and
Spark Streaming for stream processing.
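Spark is designed to be fast and scalable: it can keep intermediate data in
memory instead of writing it to disk between steps, and the same program can
run on a single machine or on a cluster of machines. Spark is used by a wide
variety of organizations, including Netflix, Airbnb, and Yahoo.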
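
To make the "single machine or cluster" point concrete, here is a minimal sketch using the Python API (PySpark). It assumes PySpark and a Java runtime are installed; the function name squares_of_evens_local and the application name are hypothetical, chosen only for this illustration.

```python
def squares_of_evens_local(numbers):
    """Run a tiny Spark job on this machine and return the results as a list."""
    # Imported inside the function so the sketch only needs PySpark when called.
    from pyspark.sql import SparkSession

    # "local[*]" asks Spark to run in-process using all available cores.
    # Pointing master at a cluster manager URL instead would target a cluster.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("spark-overview-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        # Distribute the input as an RDD across the local worker threads.
        rdd = spark.sparkContext.parallelize(list(numbers))
        # filter and map only describe the work; collect triggers execution
        # and brings the results back to the driver program.
        return rdd.filter(lambda n: n % 2 == 0).map(lambda n: n * n).collect()
    finally:
        # Always release the local Spark resources.
        spark.stop()
```

Calling squares_of_evens_local([1, 2, 3, 4, 5, 6]) should return [4, 16, 36]. Only the master setting ties the sketch to one machine; the rest of the program would stay the same on a cluster.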