Big data is one of the defining problems of this century. The massive amount of data coming from connected, digital systems is fundamentally changing everything. The most important point to note about modern system specifications is this:

“Storage capacity has increased, but not the read/write speed.”
Every Day
- More than 1.5 billion shares are traded on the New York Stock Exchange.
- Facebook stores 2.7 billion comments and ‘Likes’.
- Google processes about 24 petabytes of data.

Every Minute
- Foursquare handles more than 2,000 check-ins.
- TransUnion makes nearly 70,000 updates to credit files.

Every Second
- Banks process more than 10,000 credit card transactions.
We are generating data faster than ever. Systems are increasingly interconnected, and more people are online than ever before, producing huge volumes of data. A few well-known kinds of this data are listed below.

Examples: large videos, images, social network connections, comments, tweets, new posts, audio, log files, product ratings on shopping sites, and so on.
So far, the usual answer has been to keep increasing hard disk size, but it is getting harder and more expensive to scale up a single machine. Hadoop's main feature is to scale out instead: it is designed to stream large files and large amounts of data across many machines, as the rough comparison below illustrates.
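To see why, here is a minimal back-of-the-envelope sketch in Java. All figures in it (a 1 TB data set, a single disk with ~100 MB/s sustained read speed, a 100-disk cluster) are illustrative assumptions, not measurements from this post.

// Rough comparison: scanning 1 TB from one disk vs. 100 disks in parallel.
public class ScanTime {
    public static void main(String[] args) {
        double totalMB = 1_000_000.0;   // 1 TB expressed in MB (assumption)
        double rateMBps = 100.0;        // sustained read speed of one disk (assumption)

        double oneDiskMinutes = totalMB / rateMBps / 60;
        double hundredDisksMinutes = (totalMB / 100) / rateMBps / 60;

        System.out.printf("One big disk: %.1f minutes%n", oneDiskMinutes);       // ~166.7
        System.out.printf("100 disks   : %.1f minutes%n", hundredDisksMinutes);  // ~1.7
    }
}

The point is not the exact numbers but the ratio: splitting the same data across many ordinary disks and reading them in parallel turns hours into minutes, which is exactly the scale-out bet Hadoop makes.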
Why Hadoop?
· Storing large files
- Terabytes, petabytes, etc.
- Millions rather than billions of files
- 100 MB or more per file
· Scale-out
- Add more nodes/machines to an existing distributed application
- The software layer is designed to handle node additions and removals
- Hadoop takes this approach: a set of nodes is bonded together as a single distributed system
- Very easy to scale down as well
· Good for streaming data
- Write-once, read-many-times access pattern (see the sketch after this list)
· “Cheap” commodity hardware
- No need for supercomputers; use less reliable commodity hardware
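The write-once, read-many-times pattern mentioned above is what HDFS is built around. Below is a minimal sketch in Java (not from the original post), assuming a Hadoop client with core-site.xml/hdfs-site.xml on the classpath and a reachable cluster; the path /user/demo/events.log and the class name are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up the cluster settings
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/events.log");  // hypothetical path

        // Write once: create the file and stream records into it.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("event-1");
            out.writeUTF("event-2");
        }

        // Read many times: reopen and scan the same file repeatedly.
        for (int pass = 0; pass < 2; pass++) {
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
                System.out.println(in.readUTF());
            }
        }
    }
}

HDFS deliberately optimizes for this access pattern (large sequential writes followed by repeated sequential reads) rather than for random in-place updates.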