UNIT – III
Introduction to PIG
Prepared by: Aditya Sharma
PIG
Pigs are omnivorous animals, which means they
can consume both plants and animals.
Likewise, PIG consumes any type of data, whether
structured, unstructured, or other machine
data, and helps process it.
What is PIG?
Pig is a high-level programming language useful for analysing large data sets. Pig was the
result of a development effort at Yahoo!. It is an open-source data flow language.
In a MapReduce framework, programs need to be translated into a series of Map and Reduce
stages. However, this is not a programming model which data analysts are familiar with. So,
in order to bridge this gap, an abstraction called PIG was built on top of Hadoop.
Apache PIG enables people to focus more on analyzing bulk data sets and to spend less
time writing MapReduce programs. Like pigs, which eat anything, the PIG
programming language is designed to work on any kind of data. Hence the name, PIG!
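To make the gap concrete, here is a minimal sketch of a word count in Pig Latin. Written directly against MapReduce, the same job would typically need a mapper, a reducer, and a driver program. The file name 'input.txt' and the output directory 'wordcount_out' are assumptions made for illustration only.

```pig
-- Load each line of a (hypothetical) text file as a single chararray
lines   = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Split every line into words and flatten the resulting bag into rows
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words together, then count the rows in each group
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS occurrences;

-- Write the result to a (hypothetical) output directory
STORE counts INTO 'wordcount_out';
```

Pig translates these statements into the underlying Map and Reduce stages, so the analyst only describes the data flow.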
Apache Pig is an abstraction over MapReduce.
• It is a tool/platform used to analyze large data sets by representing them as data
flows.
• Pig is generally used with Hadoop; we can perform all the data manipulation operations in
Hadoop using Apache Pig.
• To write data analysis programs, Pig provides a high-level language known as Pig Latin.
• This language provides various operators, and programmers can also develop their own
functions for reading, writing, and processing data.
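The read, process, and write steps mentioned above can be sketched as a short Pig Latin data flow. The file 'students.csv', its schema, and the pass mark of 40 are assumptions chosen purely for illustration.

```pig
-- Read: load a hypothetical comma-separated file with an assumed schema
students = LOAD 'students.csv' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray, marks:int);

-- Process: keep students at or above the assumed pass mark, project two fields
passed   = FILTER students BY marks >= 40;
result   = FOREACH passed GENERATE name, marks;

-- Write: store the result in a hypothetical output directory
STORE result INTO 'passed_students' USING PigStorage(',');
```

While experimenting in the Grunt shell, `DUMP result;` can be used instead of `STORE` to print the relation to the console.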
Features of Pig:
• Rich set of operators: It provides many operators to perform operations like join, sort,
filter, etc.
• Ease of programming: Pig Latin is similar to SQL and it is easy to write a Pig script if
you are good at SQL.
• UDFs: Pig provides the facility to create user-defined functions in other programming
languages such as Java and invoke or embed them in Pig scripts.
• Handles all kinds of data: Apache Pig analyzes all kinds of data, both structured as well
as unstructured. It stores the results in HDFS.
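The following sketch shows how a few of these features might look together in one script: the filter, join, and sort operators, followed by a call to a user-defined function. The input files and their schemas are assumptions, and 'myudfs.jar' with its `myudfs.UPPER` class is a hypothetical Java UDF that would have to be written and packaged separately.

```pig
-- Hypothetical inputs with assumed schemas
orders    = LOAD 'orders.csv' USING PigStorage(',')
            AS (order_id:int, cust_id:int, amount:double);
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (cust_id:int, name:chararray);

-- Filter: keep only orders above an arbitrary threshold
big_orders = FILTER orders BY amount > 100.0;

-- Join: attach the customer record to each remaining order
joined = JOIN big_orders BY cust_id, customers BY cust_id;
named  = FOREACH joined GENERATE customers::name AS name,
                                 big_orders::amount AS amount;

-- Sort: largest orders first
sorted = ORDER named BY amount DESC;

-- UDF: register a hypothetical jar and call its function like a built-in
REGISTER myudfs.jar;
shouted = FOREACH sorted GENERATE myudfs.UPPER(name) AS name, amount;

-- Results are written to HDFS when the script runs on a Hadoop cluster
STORE shouted INTO 'big_orders_out' USING PigStorage(',');
```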