Wednesday, 13 November 2013

Hadoop Terms - Single line descriptions



MapReduce
- A YARN-based system for low-level parallel processing and analysis of large data sets.
Pig
- A procedural data flow language executed using MapReduce.
Hive
- SQL-based queries executed using MapReduce.

Impala
- High-performance SQL-based queries using a custom execution engine rather than MapReduce.
HBase
- A scalable, distributed database that provides structured data storage for large tables and supports batch processing, random reads, and limited queries.

Hadoop Distributed File System (HDFS™)
- A distributed file system that provides high-throughput access to application data.

YARN
- A framework for job scheduling and cluster resource management.
Ambari
- A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters.
Avro
- A data serialization system.
Cassandra
- A scalable multi-master database with no single point of failure.
Chukwa
- A data collection system for managing large distributed systems.

Mahout
- A scalable machine learning and data mining library.
ZooKeeper
- A high-performance coordination service for distributed applications.
Oozie
- A workflow scheduler and manager for Hadoop jobs.

Quest
- A Sqoop plugin that enables high-performance data transfer between Oracle Database and Hadoop.

Talend Open Studio for Big Data
- A versatile, open source data integration tool.
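
The MapReduce entry above sums the model up in one line, and Pig and Hive both compile down to it. The classic word-count job shows the three steps involved: mappers emit key/value pairs, the framework groups the pairs by key (the shuffle), and reducers combine each group into a result. The following is a minimal sketch in plain Python, not the Hadoop Java API. The function names are hypothetical, and everything runs sequentially in one process instead of as parallel tasks spread across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value under its key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): combine all values seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each input split is mapped by a separate task in parallel;
    # this sketch simply runs the mapper over the lines one after another.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Pig and Hive run on MapReduce", "Hive runs SQL on MapReduce"]
    print(word_count(docs))
    # {'and': 1, 'hive': 2, 'mapreduce': 2, 'on': 2, 'pig': 1, 'run': 1, 'runs': 1, 'sql': 1}
```

Because the mapper and the reducer see only one line or one key at a time, the framework is free to run many copies of each side by side. This independence is what lets MapReduce scale across a cluster.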
               
