Tuesday, 12 November 2013

Introduction - Apache Hadoop



Apache Hadoop

  •  100% open source.
  •  A framework for reliable, scalable, distributed computing.
  •  Fault-tolerant: data is replicated across nodes, and failed tasks are rerun on other nodes.
  •  Scales out (adds more commodity machines) rather than up (buys a bigger machine).
  •  Operates on unstructured, semi-structured and structured data.
  •  Combination of two technologies:
- the Hadoop Distributed File System (HDFS), which provides storage.
- the MapReduce programming model, which provides processing.
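To make the MapReduce model more concrete, here is a minimal single-process Python sketch of the classic word-count job. It imitates the three stages (map, shuffle/sort, reduce) in plain Python; it is not Hadoop's Java API, and the function names are hypothetical. In a real cluster the map and reduce calls run in parallel on many nodes, and the framework itself performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop the map calls would run in parallel on the nodes holding the
    # input data; here they simply run one after another.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "MapReduce processes data stored in HDFS"]
    print(word_count(docs))
```

Running it prints each distinct word with its count. The keys come out in sorted order because the shuffle step sorts them, much as Hadoop sorts map output by key before the reduce phase.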

Major Participants (commercial Hadoop distribution vendors)
  •  Cloudera
  •  Hortonworks
  •  MapR Distribution

Hadoop History
  • Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library.
  • Hadoop has its origins in Apache Nutch, an open source web search engine.
  • Nutch was then a sub-project of the Lucene project.
  • HDFS was inspired by the Google File System (GFS), which Google described in a paper published in 2003.
  • HDFS was originally called the Nutch Distributed File System (NDFS).
  • Hadoop split from Nutch in 2006.
  • In 2008, Yahoo! announced that its web search engine index was being generated by a 10,000-core Hadoop cluster.

Success Stories
  •  The New York Times used Hadoop to convert about 4 million entities to PDF in just under 36 hours.
  •  220 million Facebook profiles were analyzed in just under 11 hours, for a total cost of $100.
  •  Skybox Imaging, which processes satellite imagery with Hadoop, recently raised $70 million.
  •  Facebook and eBay make major use of HBase, the Hadoop database.

Advantages of Hadoop:
  •  Lets companies and organizations do research and analysis on big data.
  •  Live data streaming for analysis.
  •  Saves money on energy bills.
  •  Scales out rather than up.
  •  Lower cost, since it runs on clusters of commodity hardware.
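A back-of-the-envelope calculation shows why scaling out pays off. Assume, purely for illustration, a 1 TB dataset and disks that each read at about 100 MB/s, with the data spread evenly over 100 disks that are read in parallel (ignoring the cost of coordinating the work and combining the results):

```latex
\begin{aligned}
t_{1} &= \frac{1\,\text{TB}}{100\,\text{MB/s}} = \frac{10^{6}\,\text{MB}}{100\,\text{MB/s}} = 10^{4}\,\text{s} \approx 2.8\,\text{h} \\
t_{100} &= \frac{t_{1}}{100} = 10^{2}\,\text{s} \approx 1.7\,\text{min}
\end{aligned}
```

A single bigger machine is still limited by how fast it can read its disks, while adding more ordinary machines divides the read time roughly by the number of disks.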
Notes:

[1] HDFS and MapReduce were heavily influenced by two papers that Google published: Google File System (GFS) in 2003 and MapReduce in 2004.
[2] Hadoop can run with other file systems, but most deployments use HDFS.
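To illustrate the storage half of Hadoop, here is a minimal Python sketch of the idea behind HDFS: a file is cut into fixed-size blocks, and each block is copied to several machines, so losing one machine does not lose data. The tiny block size, the node names and the round-robin placement are assumptions made for readability; real HDFS uses much larger blocks and a rack-aware placement policy, and none of these functions are HDFS APIs.

```python
from typing import Dict, List, Set

# Illustrative parameters (assumptions chosen for readability, not values read
# from a real cluster). HDFS replicates each block 3 times by default, and its
# default block size is 64 MB or 128 MB depending on the Hadoop version.
BLOCK_SIZE = 4   # bytes per block, tiny so the output stays readable
REPLICATION = 3  # number of copies kept of every block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct nodes using round-robin.

    Hypothetical, simplified stand-in for HDFS's rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def lost_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> List[int]:
    """Return the blocks whose replicas are all on failed nodes."""
    return [block for block, replicas in placement.items()
            if all(node in failed for node in replicas)]


if __name__ == "__main__":
    blocks = split_into_blocks(b"hello hadoop!")
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes)
    for block, replicas in placement.items():
        print(block, blocks[block], replicas)
    # One failed node loses nothing, because every block has copies elsewhere.
    print(lost_blocks(placement, {"node2"}))
```

With three replicas per block, any one or two nodes can fail without losing a block; a block is lost only when all three of its holders fail, for example node1, node2 and node3 for block 0 here.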









           
