Facebook Open Sources Corona Hadoop Big Data Framework
Google (NASDAQ: GOOG), which laid the groundwork for the immensely popular, open source, distributed computing framework known as MapReduce, has already made its mark in the Big Data channel. But Facebook (NASDAQ: FB), which a few days ago open sourced its own alternative to MapReduce, is vying to secure its place in the evolution of Big Data technology as well. Now, we wonder: Which solution will prevail?
MapReduce and similar programs are a vital component of the Hadoop framework, which makes it possible to interact with vast quantities of data stored across many different individual computers. Specifically, MapReduce is the piece of Hadoop that distributes queries from users or applications across the cluster for processing and delivers the cluster’s response. The MapReduce concept and code were pioneered by Google, but the most popular modern implementation is maintained by the Apache Software Foundation as part of its open source Hadoop project.
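To make the idea concrete, here is a minimal sketch of the map, shuffle and reduce pattern in plain Python, counting words across a few chunks of text. It runs in a single process and does not use Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample data are hypothetical and chosen only for illustration. In a real cluster, each chunk would be handled on a different machine and the framework would perform the grouping step.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(chunks):
    """Run map over every chunk, group the output, then reduce each group."""
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk would go to a separate node
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for data spread across machines.
chunks = ["big data big clusters", "data moves across clusters"]
counts = run_job(chunks)
print(counts)
```

The map and reduce steps are independent per chunk and per key, which is what lets a scheduler spread them across many machines.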
While MapReduce enjoys widespread use across the world of Big Data today, it has some technical limitations. As a result, Apache has been working on a “next generation” scheduling and resource management framework, called YARN (which, to keep things complicated, is also known as “MapReduce 2.0”), to replace MapReduce. YARN promises more flexible and efficient use of system resources to make Big Data operations even faster.
Facebook, however, thinks its own Hadoop scheduler, named Corona, can do a better job than YARN. A few days ago, the company made the Corona code open source. It is now freely available on GitHub, along with some notes on why Facebook believes Corona is the best solution in terms of scalability, latency and job fairness.
There is one major caveat, at least for now. Corona currently works only with Facebook’s particular implementation of Hadoop, which is not the version that most third parties are likely to be running. But that situation may change if Corona gains in popularity and developers tailor it to work with more generic Hadoop infrastructures.
Facebook hasn’t said explicitly why it decided to make the Corona code public. But it’s likely the company is hoping to increase the pace of development, and keep its code compatible with other components of Hadoop, by inviting third-party collaboration.
For now, it remains to be seen whether the open source community will opt to throw the bulk of its support behind Apache’s YARN or Facebook’s Corona. Both are designed to achieve the same technical goals, and both enjoy backing from major organizations.
Either way, the addition of the Corona code to the lineup of open source Big Data technologies bodes well for the open source channel as a whole, and underlines the momentum that it continues to enjoy in the development of solutions for handling Big Data more efficiently.