Apache Hadoop
- 100% open-source
- A framework for reliable, scalable, distributed computing.
- Fault-tolerant.
- Scales out rather than up.
- Operates on unstructured, semi-structured, and structured data.
- Combination of two technologies:
- Hadoop Distributed File System (HDFS), which provides storage.
- MapReduce programming model, which provides processing.
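To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the framework runs the map and reduce steps in parallel across many machines and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The point of the split is that each map call sees only one line and each reduce call sees only one key, so both steps can be spread over many nodes without sharing state.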
Hadoop History
- Apache Lucene, the widely used text search library.
- Doug Cutting, the creator of Apache Lucene
- Hadoop was created by Doug Cutting
- Hadoop has its origins in Apache Nutch, an open source web search engine
- Nutch began as a sub-project of the Lucene project.
- Hadoop was inspired by the Google File System (GFS), which Google described in a paper released in 2003.
- Hadoop's file system was originally called the Nutch Distributed File System (NDFS).
- Hadoop split from Nutch in 2006
- In 2008 Yahoo! announced that their web search engine index was being generated by a 10,000 core Hadoop cluster.
Links for more history
- https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-1/a-brief-history-of-hadoop
- http://blog.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/
- http://binarynerd.com/java-tutorials/distributed-computing/intro-apache-hadoop.html
- http://en.wikipedia.org/wiki/Apache_Hadoop
Success Stories
- The New York Times used Hadoop to convert about 4 million entities to PDF in just under 36 hours.
- An analysis of 220 million Facebook profiles ran in just under 11 hours for a total cost of $100.
- Skybox recently raised $70 million for its efforts.
- Facebook and eBay make major use of HBase.
Advantages of Hadoop:
- Lets companies and organizations do research and analysis on big data.
- Live data streaming for analysis.
- Save money on energy bills.
- Scales out rather than up.
- Cheaper.
Preferred Books:
- Hadoop, the Definitive Guide, 3rd edition, by Tom White.
- Hadoop in Action by Chuck Lam
- Programming Pig by Alan Gates
- Programming Hive by Capriolo, Wampler, Rutherglen
Note:
[1] HDFS and MapReduce were heavily influenced by two papers that Google published: Google File System (GFS) in 2003 and MapReduce in 2004.
[2] Hadoop can run with another file system, but most deployments use HDFS.