There is a lot of hype around Big Data. Whichever training institute you go to, there is a course running on Big Data. It is being called the next “Big thing”.
This leaves the developer community confused. Should we learn Big Data? Is it really going to be that big? How different is it from the Microsoft or open technologies that I am already working on? And, most importantly, is this for me?
I will try to explain Big Data from a developer’s perspective here, which will hopefully make things clearer for you than they are right now (assuming you don’t know a lot about Big Data). You will then have enough data to make informed decisions, pun intended.
What is Big Data?
- Multitude of tools – It is a combination of tools & technologies and not just one thing. Big Data in itself simply means big and complex datasets which are difficult to process through traditional tools & applications.
- Data Characteristics – The data to be processed should typically be voluminous and complex. Structured data running into terabytes, or unstructured data such as documents, audio, video, etc., are good contenders.
- Technologies – A host of technologies exist through which we can process big data sets. Distribution platforms such as Cloudera and Hortonworks are popular.
- Storage, Retrieval, Processing, Analysis – It involves solutions to store, retrieve, process and analyze big data sets:
  - For storage, NoSQL databases like MongoDB or Cassandra can be used.
  - Large datasets within HDFS are processed by writing Pig Latin scripts in ‘Pig’ or SQL-like queries in ‘Hive’.
  - Programs to process large and complex data can be written in Java or Python.
  - Analytical tools like BIRT or Tableau can be used to showcase the data in more meaningful ways.
- Analytics – The ultimate goal of analyzing big data sets is to make sense of the data: find patterns and correlations, discover customer habits and preferences, do predictive analysis, and more. Data in itself doesn’t make sense unless it is analyzed and the results are plotted in an understandable manner such as charts, graphs, etc. There are different tools and programming languages through which data analysis can be done. If you do not have an analytical mind (a programmer’s mind is not always analytical) then Big Data would be drab for you.
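To make the programming point above a little more concrete, here is a minimal sketch in plain Python of the map, shuffle and reduce steps that Hadoop-style batch jobs are commonly built around. The post does not name a programming model, so picking map-reduce is an assumption, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. Nothing here talks to a real cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in-process over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files stored in HDFS.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

On a real cluster the framework would run the map and reduce functions on many machines and take care of the shuffle itself; the sketch only shows the shape of the code a developer would write.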
What is Big Data not?
- Not a programming language – It’s not a programming language or just a database server that you can simply learn and start working on.
- Not for small data sets – If the data is in megabytes or even a few gigabytes, it doesn’t make a lot of sense to use Big Data tools. The real performance benefit is seen only when the data is really huge. For smaller data sets, conventional solutions work best. Hence big companies with data-intensive operations are mostly the best clients for it.
- Not simply a database – People often mistake Big Data for yet another database, but it is much more than that, and you won’t become ‘just a DBA’. There is a lot of programming involved, and there are a lot of tools that one needs to learn to provide Big Data solutions.
- Not for real-time analysis – Usually the data being analyzed is archived data, or data which is at least a day old. If you want real-time analysis then the traditional solutions might work best for you.
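As a rough illustration of the small data point, here is what a conventional solution can look like when the data easily fits on one machine: a few lines of Python using the standard library’s `sqlite3` module. The `orders` table, its columns and the `total_by_customer` helper are hypothetical and exist only for this sketch.

```python
import sqlite3


def total_by_customer(orders):
    """Sum order amounts per customer using an in-memory SQLite database.

    `orders` is a small list of (customer, amount) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
        rows = conn.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample rows; real data would come from a file or an existing database.
    sample_orders = [("alice", 20.0), ("bob", 5.5), ("alice", 4.5)]
    print(total_by_customer(sample_orders))
```

For data of this size there is no cluster to set up and nothing to distribute, which is the kind of case where the conventional route is the simpler choice.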