Wednesday, 13 November 2013

Why Hadoop?



Big data is one of the defining problems of this century. The massive amount of data emerging from connected, digital systems is fundamentally changing everything. The most important trend to note in fast-evolving system specifications is this:
“Storage capacity has increased, but read/write speed has not kept pace.”
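To see why this matters, consider some rough back-of-the-envelope numbers (the drive size and transfer speed below are illustrative assumptions, not measurements): reading a full 1 TB drive sequentially at about 100 MB/s takes close to three hours, while the same data split across 100 disks read in parallel takes under two minutes. This is the gap Hadoop is built to exploit.

```python
# Illustrative assumptions: a ~1 TB drive, read sequentially at ~100 MB/s.
TB = 1024 * 1024          # one terabyte expressed in megabytes
speed_mb_s = 100          # assumed sequential read speed, MB/s

single_disk_s = TB / speed_mb_s          # one disk reading the whole dataset
hundred_disks_s = single_disk_s / 100    # the same data spread over 100 disks

print(round(single_disk_s / 3600, 1))    # hours on one disk   -> 2.9
print(round(hundred_disks_s / 60, 1))    # minutes on 100 disks -> 1.7
```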

Every Day
  •  More than 1.5 billion shares are traded on the New York Stock Exchange.
  •  Facebook stores 2.7 billion comments and ‘Likes’.
  •  Google processes about 24 petabytes of data.

Every Minute
  •  Foursquare handles more than 2,000 check-ins.
  •  TransUnion makes nearly 70,000 updates to credit files.

Every Second
  •  Banks process more than 10,000 credit card transactions.

We are generating data faster than ever. Systems are increasingly interconnected, and more people are online than ever before, producing enormous volumes of data. A few well-known kinds of data are listed below.
Example:
Large videos, images, social network connections, comments, tweets, posts, audio, log files, product ratings on shopping sites, and so on.

So far the answer has been to buy bigger hard disks, but scaling up like this gets harder and more expensive. Hadoop’s main feature is that it scales out instead, and it is designed to stream large files and large amounts of data.




Why Hadoop?

·         Storing large files

  •  Terabytes, Petabytes, etc...
  •  Millions rather than billions of files
  •  100MB or more per file


·         Scale-Out

  •  Add more nodes/machines to an existing distributed application
  •  The software layer is designed for node addition and removal
  •  Hadoop takes this approach: a set of nodes is bonded together as a single distributed system
  •  Very easy to scale down as well
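The scale-out idea above can be sketched with a toy word count in plain Python. This is a simplified illustration of Hadoop’s MapReduce model, not Hadoop’s actual API; the `blocks` list and function names are made up for the example. Each “node” processes only its own block of data, and the partial results are merged into one answer.

```python
from collections import Counter

# Hypothetical input: each string stands for the block of data one node holds.
blocks = [
    "hadoop scales out",
    "hadoop streams large files",
    "nodes act as one system",
]

def map_phase(block):
    # Each node counts words in its own block (in parallel on a real cluster).
    return Counter(block.split())

def reduce_phase(partials):
    # Merge the partial counts from every node into a single result.
    total = Counter()
    for partial in partials:
        total += partial
    return total

counts = reduce_phase(map_phase(b) for b in blocks)
print(counts["hadoop"])  # -> 2
```

Adding capacity then means adding more entries to `blocks` (more nodes), not making any single node bigger.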


·         Good at streaming data

  •  Write-once, read-many-times access patterns

·         “Cheap” Commodity Hardware

  •  No need for super-computers; use less reliable commodity hardware

