Hadoop Terms – One-Line Descriptions
MapReduce
- A YARN-based system for low-level parallel processing and analysis of large data sets.
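To make the map, shuffle, and reduce phases concrete, here is a minimal single-machine Python sketch of the classic word-count job. It only simulates the programming model. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and are not Hadoop's Java MapReduce API. A real job would split the input across the cluster and run many map and reduce tasks in parallel, with YARN scheduling them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every emitted value by its key, which the framework
    # does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver: runs the three phases sequentially on one machine.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data big clusters", "code moves to data"]
    print(word_count(docs))
    # → {'big': 2, 'clusters': 1, 'code': 1, 'data': 2, 'moves': 1, 'to': 1}
```

Keeping map and reduce as pure functions of their inputs is what lets the real framework run them independently on different nodes.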
Pig
- A procedural data-flow language (Pig Latin) whose scripts are executed using MapReduce.
Hive
- SQL-based queries (HiveQL) executed using MapReduce.
Impala
- High-performance SQL queries run on Impala's own distributed execution engine instead of MapReduce.
HBase
- A scalable, distributed database that provides structured data storage for large tables and supports batch processing, random reads, and limited queries.
Hadoop Distributed File System (HDFS™)
- A distributed file system that provides high-throughput access to application data.
YARN
- A framework for job scheduling and cluster resource management.
Ambari
- A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters.
Avro
- A data serialization system.
Cassandra
- A scalable multi-master database with no single point of failure.
Chukwa
- A data collection system for managing large distributed systems.
Mahout
- A scalable machine learning and data mining library.
ZooKeeper
- A high-performance coordination service for distributed applications.
Oozie
- A workflow scheduler and manager for Hadoop jobs.
Quest
- A Sqoop plugin that enables high-performance data transfer between Oracle Database and Hadoop.
Talend Open Studio for Big Data
- An open-source data integration tool.