Christopher Tozzi, Contributing Editor

April 10, 2014

MapR Adds Open Source Apache Spark for High Performance Big Data

MapR, in partnership with Databricks, has added a key new feature to its Apache Hadoop distribution for Big Data. Now, the MapR platform integrates the open source Apache Spark software stack, which dramatically increases the efficiency of large-scale data processing, especially in conjunction with in-memory computing.

Apache Spark, which originated at the University of California, Berkeley, and is now an Apache Software Foundation project backed commercially by Databricks, can run on top of the Hadoop Distributed File System (HDFS), the file system tailored for Hadoop Big Data deployments. But Spark takes a different approach to processing data than other Hadoop software. Instead of following the two-stage MapReduce model, in which intermediate results are written back to storage between jobs, Spark can keep a data set in memory and run repeated queries against it, which can make data analysis much more efficient.
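For readers who want to see the difference in miniature, here is a rough sketch in plain Python. It is not the Spark API and not how MapR or Databricks implement anything; the `CountingSource` class and both helper functions are hypothetical names invented for this illustration. It only shows why loading data once and reusing it in memory saves trips to storage when several queries hit the same data.

```python
# Toy illustration only: plain Python, not the Spark API.
# CountingSource and both run_queries_* helpers are hypothetical names.
from typing import Callable, List


class CountingSource:
    """Stand-in for slow storage that counts how often it is read."""

    def __init__(self, records: List[int]) -> None:
        self._records = records
        self.reads = 0

    def load(self) -> List[int]:
        self.reads += 1
        return list(self._records)


def run_queries_reloading(
    source: CountingSource, queries: List[Callable[[List[int]], int]]
) -> List[int]:
    # Every query goes back to storage, like a chain of independent batch jobs.
    return [query(source.load()) for query in queries]


def run_queries_cached(
    source: CountingSource, queries: List[Callable[[List[int]], int]]
) -> List[int]:
    # Load once, keep the data in memory, and reuse it for every query.
    cached = source.load()
    return [query(cached) for query in queries]


queries = [sum, max, len]

disk_style = CountingSource([3, 1, 4, 1, 5])
reload_results = run_queries_reloading(disk_style, queries)

memory_style = CountingSource([3, 1, 4, 1, 5])
cached_results = run_queries_cached(memory_style, queries)

print(reload_results == cached_results)      # same answers either way
print(disk_style.reads, memory_style.reads)  # storage reads for each approach
```

Both approaches return the same answers; the cached version simply reads from the source once rather than once per query. In a real cluster the savings would depend on data size, available memory and the workload.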

The result, according to Databricks, is application performance that is up to 100 times faster than Hadoop MapReduce when data is held in memory, and up to 10 times faster when it is read from disk.

MapR is hoping to capitalize on those performance improvements to help the MapR Hadoop distribution appeal to enterprises that demand high-performance computing. "With this release, MapR extends its lead in the Hadoop market for high performance by enabling Spark applications to run on the world record-holding distribution for Hadoop, which uniquely allows streaming writes directly to the data platform," according to the company.

MapR is also pitching Spark integration as a way to improve data quality and derive better information from data, since Spark-powered applications "are operating on more real-time data, which ultimately enables faster fraud detection, better personalization of media, higher quality from manufacturing processes and other operational analytic use cases."

That Spark is open source means it's likely to continue evolving rapidly, expanding the applicability of Hadoop and making it more useful in high-performance environments.
