Databricks has added a job scheduling capability to make it simpler to automate workflows involving its Databricks Cloud service, which runs on Amazon Web Services (AWS). Solution providers across the channel should take note of the rise of Apache Spark.

Mike Vizard, Contributing Editor

April 6, 2015

Ali Ghodsi, co-founder and head of engineering at Databricks.

For all the excitement over Hadoop, initial interest in the big data management framework was largely limited to batch-oriented jobs. But the emergence of Apache Spark as a framework for running big data applications in-memory on top of Hadoop has the potential to dramatically extend the scope and reach of where and how Hadoop gets employed as a platform.
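To make that concrete, here is a minimal PySpark sketch of what running in-memory on data already stored in Hadoop looks like; the HDFS path and application name are illustrative assumptions, not anything described in the article.

```python
from pyspark import SparkContext

# Hypothetical HDFS path; Spark reads data already stored in Hadoop.
sc = SparkContext(appName="InMemoryExample")
logs = sc.textFile("hdfs:///data/weblogs/*.log")

# cache() pins the dataset in cluster memory, so repeated passes
# (filtering, counting, aggregating) avoid rereading from disk.
logs.cache()

errors = logs.filter(lambda line: "ERROR" in line).count()
total = logs.count()
print("error rate: %.4f" % (float(errors) / total))
```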

With the price of memory falling sharply, many organizations now want to deploy big data applications that run in memory. Of course, it takes a fair amount of infrastructure resources to run Hadoop, which is a big reason so many Hadoop deployments are moving into the cloud.

Naturally, one of the first vendors to apply that same concept to Apache Spark is Databricks, which is led by the team that developed the Apache Spark framework in the first place. Most recently, Databricks added a job scheduling capability to make it simpler to automate workflows involving the Databricks Cloud service.

Databricks Cloud is deployed on Amazon Web Services (AWS), and solution providers across the channel should take note of the rise of Apache Spark. There are now more than 500 Apache Spark deployments in production. Databricks itself has only about two dozen customers at the moment, but as one of the few places where organizations can access Apache Spark in the cloud, that number should rapidly increase in the months ahead.

Ali Ghodsi, co-founder and head of engineering at Databricks, said the reason for this is that Databricks exposes Apache Spark via a simple RESTful application programming interface (API) that enables organizations to provision Apache Spark clusters in a matter of minutes.
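As a rough illustration of that kind of workflow, the sketch below drives a hypothetical REST API with Python's requests library to spin up a cluster and then register a scheduled job against it. The endpoints, field names and token are assumptions made for illustration; the article does not document the actual Databricks Cloud API.

```python
import requests

# All endpoints, field names and the token below are illustrative assumptions;
# the article does not document the Databricks Cloud API itself.
BASE_URL = "https://example.cloud.databricks.com/api"
HEADERS = {"Authorization": "Bearer <api-token>"}

# Provision a Spark cluster with a single REST call.
cluster = requests.post(
    BASE_URL + "/clusters/create", headers=HEADERS,
    json={"cluster_name": "adhoc-analytics", "num_workers": 8})
cluster.raise_for_status()
cluster_id = cluster.json().get("cluster_id")

# Register a scheduled job (the new capability described above) against that cluster.
job = requests.post(
    BASE_URL + "/jobs/create", headers=HEADERS,
    json={"name": "nightly-etl",
          "cluster_id": cluster_id,
          "notebook_path": "/Users/analyst/nightly_etl",
          "schedule": "0 2 * * *"})  # assumed cron-style trigger, 2 a.m. daily
job.raise_for_status()
print("created job:", job.json())
```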

The implications of the rise of Apache Spark are manifold. Most notably, Apache Spark expands the types of applications that can be deployed on Hadoop into the realm of real time. With multiple forms of memory dropping in price, real-time applications are now within the reach of a much broader range of organizations.
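For a sense of what a real-time workload on Spark can look like, here is a minimal Spark Streaming sketch that counts words arriving over a socket in five-second micro-batches; the localhost source is purely illustrative.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Count words arriving over a socket, processed in 5-second micro-batches.
sc = SparkContext(appName="RealTimeWordCount")
ssc = StreamingContext(sc, batchDuration=5)

lines = ssc.socketTextStream("localhost", 9999)  # illustrative source
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```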


What many of those organizations can’t afford is a proprietary big data database to run applications in real time. As such, Apache Spark represents an open source alternative to in-memory databases from SAP, Oracle and Microsoft, one that sits directly on top of the Hadoop framework where most big data is going to be stored. Because data does not have to be moved in and out of Hadoop, the Apache Spark approach creates far fewer data management headaches for all concerned.
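A short Spark SQL sketch illustrates the point about querying data where it already lives in Hadoop; the HDFS path and column names are hypothetical.

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="QueryInPlace")
sqlContext = SQLContext(sc)

# The HDFS path and column names are hypothetical. The point is that the
# query runs against data where it already lives in Hadoop, cached in
# cluster memory, rather than being exported to a separate in-memory database.
events = sqlContext.read.json("hdfs:///data/events")
events.cache()
events.registerTempTable("events")

top_users = sqlContext.sql(
    "SELECT user_id, COUNT(*) AS n FROM events "
    "GROUP BY user_id ORDER BY n DESC LIMIT 10")
top_users.show()
```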

It’s still relatively early days when it comes to Apache Spark adoption, but for a lot of organizations it’s already clear that the path of least resistance to building and deploying big data applications is going to be deploying Apache Spark in the cloud.

About the Author(s)

Mike Vizard

Contributing Editor, Penton Technology Group, Channel

Michael Vizard is a seasoned IT journalist, with nearly 30 years of experience writing and editing about enterprise IT issues. He is a contributor to publications including Programmableweb, IT Business Edge, CIOinsight and UBM Tech. He formerly was editorial director for Ziff-Davis Enterprise, where he launched the company’s custom content division, and has also served as editor in chief for CRN and InfoWorld. He also has held editorial positions at PC Week, Computerworld and Digital Review.
