Is Java required to learn Hadoop?
One of the most common questions from prospective Hadoopers is whether Java is a prerequisite for learning Hadoop.
Most learners show some disappointment when they ask this question: they see not knowing Java as a limitation and fear they might miss out on a great career opportunity. However, based on our experience across many projects and training sessions, we can say that it is one of the biggest myths that a person who does not know Java cannot learn Hadoop.
Apache Hadoop is an open source software framework, itself written in Java, for distributed storage and distributed processing of very large data sets.
So why is Java not required to learn Hadoop if Hadoop itself was built in Java?
Well, do you need to know C++ to work with MS Word or MS Excel? Yet these tools were largely built in C++.
Now do you get the idea? It is true that Hadoop is built in Java, and in some cases you might need to know Java as well (we discuss these cases later in this article). But for the majority of projects, Java is optional, and you certainly do not need to be a Java expert.
In general, non-Big Data scenarios, most data processing tasks (data warehousing, data integration, master data management, etc.) are handled by advanced software tools such as Informatica and DataStage.
Hadoop is used for storing and processing huge volumes of structured and unstructured data. Most of the time, Hadoop is used for exactly the same processes as these tools, except that Hadoop does it on Big Data. That's it!
So what do we use if not Java? Is another dreaded language at play here?
No, relax!
The two languages most commonly used for data processing and querying are PIG Latin and HIVE Query Language, respectively.
PIG Latin is a data-flow scripting language whose statements read almost like plain English, with operations familiar from SQL such as filtering, joining and grouping. HIVE Query Language closely resembles SQL itself. Both languages are easy to learn and run, and both offer good connectors to the outside world.
You can use HIVE to expose files and folders on Hadoop as tables and run your reports on them from a BI tool. You can also connect a spreadsheet tool such as Excel to HIVE, download the data and do your analysis there.
End users never have to worry about whether the data comes from Hadoop or from a database table, and they never have to worry about writing a Java MapReduce program.
It would be highly counterproductive to write a complex Java MapReduce program, compile it, debug it and run it when you just want to fetch a few rows from a simple Hadoop file. This is what inspired developers to build the HIVE and PIG tools.
Today many companies prefer PIG for data processing tasks and HIVE for query tasks.
So what happens to these PIG and HIVE scripts when they run?
It works like magic! Behind the scenes, these scripts are converted into MapReduce jobs and run on the Hadoop cluster, so you never have to worry about compilation or about the Java part.
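To give a feel for what that conversion saves you, here is a minimal sketch in plain Java of the map and reduce steps behind a simple grouped count. It is an illustration only: it uses no Hadoop classes, the class name GroupCountSketch and its methods are hypothetical, and a real MapReduce job would also need a driver, job configuration and packaging. In HIVE, assuming a hypothetical table words with a single column word, the same result would be roughly one line: SELECT word, COUNT(*) FROM words GROUP BY word;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used, and all names
// here are hypothetical. It mimics the map and reduce steps of a word count.
class GroupCountSketch {

    // "Map" step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" steps: group the pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Run the map step on every line, then reduce all emitted pairs together.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("pig hive hadoop", "hive hadoop", "hadoop");
        System.out.println(run(lines)); // prints {hadoop=3, hive=2, pig=1}
    }
}
```

Even this toy version, which runs on a single machine, is far longer than the one-line query, and that gap is the reason most day-to-day work happens in PIG and HIVE rather than in hand-written Java.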
So, in which scenarios do we need Java knowledge?
There are three scenarios where you would be better off with some Java knowledge.
- Product Development on top of Hadoop
If you want to build a product on top of the Hadoop framework, then you need to code in Java, and you need expert knowledge of it. By our estimate, projects of this type make up only about 5% of the available Hadoop projects.
- Extending the functionality of PIG/HIVE or other Hadoop tools
If you want to extend the functionality of Hadoop tools or develop custom Input and Output Formats, then you need Java. For example, a user defined function in PIG is most commonly written in Java (PIG also accepts functions written in a few other languages, such as Python via Jython). Again, very few projects involve this kind of work.
- Debugging
You might need to do some debugging if a Hadoop program crashes. Basic Java is enough for this. Even if you have only debugged programs in some other programming language, that experience will give you a fair idea of how to approach Java debugging.
Do not waste your time anymore. Learn the hottest tool of today. Learn Big Data and Hadoop
Now that you know you do not need Java knowledge to start learning Hadoop, we suggest that you do not delay any longer. You can pick up Java skills along the way, or start on them later.
There is a big rush, and everyone seems to want to ride this big elephant (of opportunities). Delay will only push you further back in the queue. So get up and start.