JUST A COLLECTION: Hadoop Terms - Single line descriptions

Hadoop Terms – One line Description

MapReduce

- A YARN-based system for low-level parallel processing and analysis of large data sets.
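To make the MapReduce idea concrete, here is a minimal single-machine sketch of the classic word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real job would spread these steps across many cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (illustrative only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Each map call looks at one line in isolation and each reduce call looks at one key in isolation, which is what allows a framework to run them in parallel.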

Pig

-Procedural data flow language executed using MapReduce

Hive

-SQL-based queries executed using MapReduce.

Impala

-High-performance SQL-based queries using a common execution engine.

HBase

-A scalable, distributed database that supports structured data storage for large tables, with batch processing, random reads, and limited queries.

Hadoop Distributed File System (HDFS™)

-A distributed file system that provides high-throughput access to application data

YARN

-A framework for job scheduling and cluster resource management

Ambari

-A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters

Avro

-A data serialization system.

Cassandra

-A scalable multi-master database with no single points of failure.

Chukwa

-A data collection system for managing large distributed systems.

Mahout

-A scalable machine learning and data mining library.

ZooKeeper

-A high-performance coordination service for distributed applications.

Oozie

-A Hadoop workflow scheduler and manager.

Quest

-Sqoop plugin that enables high-performance data transfer between Oracle Database and Hadoop.

Talend Open Studio for Big Data

-A versatile open source data integration tool.
