Big data is one of the defining problems of this century. The massive amount of data coming from connected, digital systems is fundamentally changing everything. The most important point to note about modern system specifications is this:

“Storage capacity has increased, but not the read/write speed.”
Every Day
- More than 1.5 billion shares are traded on the New York Stock Exchange.
- Facebook stores 2.7 billion comments and ‘Likes’.
- Google processes about 24 petabytes of data.

Every Minute
- Foursquare handles more than 2,000 check-ins.
- TransUnion makes nearly 70,000 updates to credit files.

Every Second
- Banks process more than 10,000 credit card transactions.
We are generating data faster than ever. Systems are increasingly interconnected, and more people are online than ever before, producing huge volumes of data. A few well-known kinds of this data are listed below.

Examples: large videos, images, social network connections, comments, tweets, new posts, audio, log files, product ratings on shopping sites, and so on.
So far, the usual answer has been to keep increasing hard disk size, but it is getting harder and more expensive to scale up a single machine. Hadoop's main feature is to scale out instead: it is designed to stream large files and large amounts of data across many machines, as the rough comparison below illustrates.
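To see why, here is a minimal back-of-the-envelope sketch in Java. All figures in it (a 1 TB data set, a single disk with ~100 MB/s sustained read speed, a 100-disk cluster) are illustrative assumptions, not measurements from this post.

// Rough comparison: scanning 1 TB from one disk vs. 100 disks in parallel.
public class ScanTime {
    public static void main(String[] args) {
        double totalMB = 1_000_000.0;   // 1 TB expressed in MB (assumption)
        double rateMBps = 100.0;        // sustained read speed of one disk (assumption)

        double oneDiskMinutes = totalMB / rateMBps / 60;
        double hundredDisksMinutes = (totalMB / 100) / rateMBps / 60;

        System.out.printf("One big disk: %.1f minutes%n", oneDiskMinutes);       // ~166.7
        System.out.printf("100 disks   : %.1f minutes%n", hundredDisksMinutes);  // ~1.7
    }
}

The point is not the exact numbers but the ratio: splitting the same data across many ordinary disks and reading them in parallel turns hours into minutes, which is exactly the scale-out bet Hadoop makes.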
Why Hadoop?
· Storing large files
- Terabytes, petabytes, etc.
- Millions rather than billions of files
- 100 MB or more per file
· Scale-out
- Add more nodes/machines to an existing distributed application
- The software layer is designed to handle node additions and removals
- Hadoop takes this approach: a set of nodes is bonded together as a single distributed system
- Very easy to scale down as well
· Good for streaming data
- Write-once, read-many-times access pattern (see the sketch after this list)
· “Cheap” commodity hardware
- No need for supercomputers; use less reliable commodity hardware
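The write-once, read-many-times pattern mentioned above is what HDFS is built around. Below is a minimal sketch in Java (not from the original post), assuming a Hadoop client with core-site.xml/hdfs-site.xml on the classpath and a reachable cluster; the path /user/demo/events.log and the class name are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up the cluster settings
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/events.log");  // hypothetical path

        // Write once: create the file and stream records into it.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("event-1");
            out.writeUTF("event-2");
        }

        // Read many times: reopen and scan the same file repeatedly.
        for (int pass = 0; pass < 2; pass++) {
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
                System.out.println(in.readUTF());
            }
        }
    }
}

HDFS deliberately optimizes for this access pattern (large sequential writes followed by repeated sequential reads) rather than for random in-place updates.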