Christopher Tozzi, Contributing Editor

February 24, 2016

AtScale Benchmarks SQL-on-Hadoop Data Analytics Platforms

What’s the best SQL-on-Hadoop solution? For companies choosing between Impala, Spark and Hive, which platform delivers the best speed and performance? Those are questions that a recent benchmark report from AtScale helps answer.

The report, published this week, analyzed how the three leading SQL-on-Hadoop platforms perform on business intelligence workloads. Those platforms (Impala, Spark and Hive) are increasingly important in the data analytics market for companies that want to work with big data and take advantage of Hadoop's scalability while retaining SQL compatibility.

AtScale’s main findings included:

  • Hive, despite its widespread use in Hadoop environments, did not come out on top in any of the benchmark tests.

  • Impala's results varied significantly depending on query type, data size and other factors. This suggests that Impala can be the winning data analytics solution in some situations, but not in all of them.

  • Spark performed markedly better on small data sets with Spark 1.6 than with Spark 1.5. Enterprises that want the most from their Spark deployments should upgrade.

The complete results show that there is no one-size-fits-all solution for Hadoop-based business intelligence. Getting the most from a data analytics platform requires evaluating the needs of each particular workload.

Of course, it is not surprising that no single SQL-on-Hadoop platform outperforms all the others. In any field, it is rare for one vendor's software to beat every competitor across the board.

The bigger takeaway from the AtScale study is that the leading data analytics platforms appear to be developing distinct strengths. Going forward, those distinctions could prove important in determining how each platform solidifies its position in the market.

About the Author

Christopher Tozzi

Contributing Editor

Christopher Tozzi started covering the channel for The VAR Guy on a freelance basis in 2008, with an emphasis on open source, Linux, virtualization, SDN, containers, data storage and related topics. He also teaches history at a major university in Washington, D.C. He occasionally combines these interests by writing about the history of software. His book on this topic, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” is forthcoming with MIT Press.
