Open Source Solutions for Big Data Expand with Apache Drill
Big organizations such as Google (NASDAQ: GOOG), which has been developing proprietary tools for handling huge amounts of information for years, may have an early advantage in the world of Big Data. But the open source ecosystem is busy creating its own solutions, with Drill, one of the newest Apache Incubator projects, as one of the clearest examples. Here’s a look at this fledgling initiative, and the opportunities it opens up within the open source channel.
Inspired by Google’s Dremel system, which is not open source, Drill aims to provide fast queries over large sets of data. And by large, the project means petabytes of information spread across 10,000 servers or more, according to the Drill proposal.
The Bigger Picture: Drill, Hadoop and Open Source
But Drill is about much more than simply producing an open source clone of Dremel. The Drill developers aim to go further than Google by supporting more query languages and types of data.
By extension, Drill also promises to help promote the development of new, open APIs for the world of Big Data. Such technologies would do much to solidify the open source world’s presence in this space and ensure that proprietary standards don’t stunt continued growth.
At the same time, Drill stands poised to integrate with existing open source tools for Big Data, such as Hadoop — which was itself created to emulate another technology born at Google, MapReduce — to provide a more complete set of open source solutions for handling massive amounts of information. And that means Big Data could become a major sub-ecosystem within the open source channel, with its own sets of upstream and downstream players.
Indeed, what’s happening now is perhaps not all that different from the melding together in the 1990s of an array of independent projects (the Linux kernel, the X Window System, and the GNOME and KDE desktop environments among them) to make Linux distributions viable desktop operating systems. Before that, attempts to create open source platforms for running PCs were mostly piecemeal and incomplete, just as Hadoop would be without sister projects such as Drill to fill in the gaps and provide an all-encompassing open source solution for Big Data problems.
And as in Linux’s early days, commercial organizations within the open source channel are already seizing on the opportunities currently opening up in the world of Big Data. MapR, which distributes a value-added version of Apache Hadoop and was one of the leading forces behind Drill’s establishment as an Apache Incubator project, is a prime example. Expect others to follow as open source Big Data reaches full throttle.