Concept and Importance
Open Source vs. Closed Source:
Differences and Best Practices
Open source software is code that is designed to
be publicly accessible and can be modified and
shared
Closed source software is proprietary, and the
source code is not available to the public
Best practices for open source:
Ensure the license is compatible with your
project
Give back to the community if you make
modifications
Be aware of security implications
Best practices for closed source:
Make sure the software meets your needs
before purchasing
Understand the terms of the license
Ensure the vendor provides adequate
support
Big Data Commercialization: Apache vs.
Enterprise Editions
Apache distributions are open source and free
Enterprise distributions often include additional
features, such as:
Technical support
Enterprise-level security
Additional tools for data management
, Job Perspective: Experience and Skill Set
Demand
Experience with Big Data technologies is in high
demand
Essential skills for Big Data jobs:
Programming languages (Python, Java,
Scala)
SQL and NoSQL databases
Linux command line
Experience with Big Data tools (Hadoop,
Spark, Hive)
Hadoop Distributions: Cloudera,
Hortonworks, Amazon EMR, and Others
Hadoop distributions are pre-packaged sets of
Hadoop-related software
Popular distributions include:
Cloudera
Hortonworks
Amazon EMR
MapR
Issues and Problems with Traditional Data
Management
Traditional data management systems struggle
with:
Large volumes of data
Variety of data types
High velocity of data creation