Introduction to Hortonworks Data Platform 2
Storm :
Apache Storm is an open-source, distributed, real-time computation system.
Fast
Scalable
Fault-tolerant
Used to process very large volumes of data at high speed
Benchmarked at over a million tuples processed per second per node, which makes it
a good fit when millisecond-level latency matters and Spark is not fast enough.
Integrates with storage systems such as Cassandra, HBase, S3, and HDFS
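Storm models a computation as a stream of tuples. A spout emits the tuples, and bolts transform them one at a time instead of waiting for a full batch. The pure-Python sketch below illustrates that per-tuple flow with a streaming word count. The names `sentence_spout`, `split_bolt`, and `count_bolt` are hypothetical and are not Storm's API. Real Storm topologies are written as spout and bolt classes, wired together with `TopologyBuilder`, and run in parallel across a cluster. This sketch does not model distribution, parallelism, or fault tolerance.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator

# Hypothetical, single-process model of a Storm-style pipeline.
# These functions are illustrative only; they are NOT Storm's API.


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Stream source: emits one tuple (here, a sentence) at a time."""
    for line in lines:
        yield line


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Turns each incoming sentence tuple into zero or more word tuples."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word


def count_bolt(words: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Keeps a running count and emits an updated (word, count) per tuple."""
    counts: Counter[str] = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


if __name__ == "__main__":
    stream = ["the cow jumped over the moon", "the man went to the store"]
    # Generators are lazy, so each sentence flows through every stage
    # before the next one is read: per-tuple processing, not batching.
    for word, count in count_bolt(split_bolt(sentence_spout(stream))):
        print(word, count)
```

Chaining generators keeps the per-tuple character of the model. An updated count is available as soon as each word arrives, which is the property that makes Storm suitable when latency matters.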
Data Lifecycle and Governance :
Atlas :
Apache Atlas is a scalable and extensible set of core governance services.
Enables enterprises to meet their compliance requirements effectively and efficiently
within Hadoop
Cassandra, HBase, S3, HDFS
Exchanges metadata with tools and processes both inside and outside of Hadoop
Allows integration with the whole enterprise data ecosystem
• Atlas Features:
▪ Data classification
▪ Centralized auditing
▪ Centralized lineage
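Classification and lineage support each other. Lineage records which datasets are derived from which, so a label such as "PII" on a source can be traced to everything built from it. The Python sketch below is a simplified model of that idea under stated assumptions: lineage is a plain dict of source to derived entities, and a tag applied to one entity is copied to all of its downstream entities. The entity names, the `PII` tag, and the functions `downstream` and `propagate` are hypothetical. This is not the Atlas API, and whether and how Atlas itself propagates classifications depends on its version and configuration.

```python
from __future__ import annotations

from collections import deque

# Hypothetical, simplified model of lineage plus classification.
# Entity names and the "PII" tag are illustrative; this is NOT the Atlas API.


def downstream(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk of lineage edges (source -> derived) from start."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against revisiting shared children
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def propagate(
    lineage: dict[str, list[str]],
    classifications: dict[str, set[str]],
    start: str,
    tag: str,
) -> dict[str, set[str]]:
    """Return a copy of classifications with tag added to start and its descendants."""
    result = {entity: set(tags) for entity, tags in classifications.items()}
    for entity in [start, *downstream(lineage, start)]:
        result.setdefault(entity, set()).add(tag)
    return result


if __name__ == "__main__":
    lineage = {
        "raw.customers": ["etl.clean_customers"],
        "etl.clean_customers": ["reports.churn", "reports.revenue"],
        "raw.orders": ["reports.revenue"],
    }
    tags = propagate(lineage, {}, "raw.customers", "PII")
    for entity in sorted(tags):
        print(entity, sorted(tags[entity]))
```

`raw.orders` is not derived from `raw.customers`, so it stays untagged. `reports.revenue` is derived from `raw.customers` and is tagged. Keeping lineage in one central place is what makes this kind of impact analysis possible.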