Google (GOOG) has taken its Cloud Dataflow SDK to the open source world. Announced in June, Google Cloud Dataflow is a managed data-processing service currently in alpha release. Now the search giant has decided to open-source the software development kit (SDK) connected to Cloud Dataflow. What does this news mean for developers? Here are the details.
According to Sam McVeety, software engineer at Google, "This will make it easier for developers to integrate with our managed service while also forming the basis for porting Cloud Dataflow to other languages and execution environments."
Google Cloud Dataflow was designed "as a platform to democratize large scale data processing by enabling easier and more scalable access to data for data scientists, data analysts and data-centric developers." With it, Google aims to help its customers and developers discover meaningful results from their data using what it claims are simple and intuitive programming concepts, "without the extra noise from managing distributed systems."
McVeety detailed three main reasons for open-sourcing the SDK:
- It will spur future innovation in combining stream- and batch-based processing models. "Reusable programming patterns are a key enabler of developer efficiency. The Cloud Dataflow SDK introduces a unified model for batch and stream data processing," he wrote.
- It will help in adapting the Dataflow programming model to other languages. As McVeety wrote, "As the proliferation of data grows, so do programming languages and patterns. We are currently building a Python 3 version of the SDK, to give developers even more choice and to make dataflow accessible to more applications."
- It will enable the execution of Dataflow on other service environments. "Although we are building a massively scalable, highly reliable, strongly consistent managed service for Dataflow execution, we also embrace portability. As Storm, Spark, and the greater Hadoop family continue to mature — developers are challenged with bifurcated programming models," McVeety wrote.
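The "unified model" in the first point above can be illustrated with a toy sketch. To be clear, this is not the Cloud Dataflow SDK's actual API (which is Java-based); it is a minimal, hypothetical Python illustration of the underlying idea: one transform definition that runs unchanged over both a bounded (batch) collection and an unbounded, incrementally consumed (stream-like) source.

```python
# Toy illustration of a unified batch/stream transform.
# NOT the Cloud Dataflow SDK API -- just the concept.
from typing import Iterable, Iterator


def word_count(lines: Iterable[str]) -> dict:
    """A single transform definition; the caller decides whether
    the input is a bounded list (batch) or a generator (stream)."""
    counts: dict = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Batch: a bounded, in-memory collection.
batch_source = ["a b a", "b c"]


# "Stream": a generator consumed one element at a time,
# standing in for an unbounded source.
def stream_source() -> Iterator[str]:
    yield "a b a"
    yield "b c"


# The same transform works on both sources.
assert word_count(batch_source) == word_count(stream_source())
```

In the real SDK, this reuse is achieved at the level of pipeline transforms rather than plain functions, but the developer-efficiency argument McVeety makes is the same: write the processing logic once, run it in either mode.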
In a blog post, McVeety wrote that Google's team has learned plenty about turning data into intelligence as the original FlumeJava programming models (the basis for Cloud Dataflow) have evolved within the company.