Cloudera Moves Beyond Hadoop With Object Store in New Data Platform
(Pictured above: Cloudera CMO Mick Hollison on stage at the O’Reilly Strata Data Conference in New York City, Sept. 25.)
Cloudera has begun rolling out the much-needed revamp of its big-data platform with a new database management and machine learning engine for multiple functions including large-scale, self-service analytics and AI capabilities using telemetry consumed from edge-based endpoints.
The company officially launched the Cloudera Data Platform (CDP), which has been under development since Cloudera completed its merger with onetime rival Hortonworks earlier this year, during this week’s O’Reilly Strata Data Conference in New York. Faced with declining demand for Hadoop, the engine for early cloud-based streaming analytics architectures, the companies came together with CDP as Cloudera’s next act.
Most notably, Cloudera replaced the core Hadoop Distributed File System (HDFS) with a cloud object store that can run in Kubernetes clusters. CDP is available in the three major public clouds (AWS, Microsoft Azure and Google Cloud) and on premises, with a bare-metal server version of Cloudera’s object store based on Apache Ozone, developed by Hortonworks as a more scalable extension of HDFS. The on-premises version also runs on Red Hat OpenShift and customer-provided Kubernetes management tools.
In AWS, CDP runs in Amazon S3 and the Elastic Kubernetes Service (EKS); in Microsoft Azure, it runs in Azure Data Lake Store (ADLS) or the Azure Kubernetes Service (AKS); and in Google Cloud, it operates in either Google Cloud Storage (GCS) or the Google Kubernetes Engine (GKE).
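For readers who want the cloud-by-cloud pairing at a glance, the short Python sketch below restates the paragraph above as a lookup table. It is illustrative only: the names `CloudTarget`, `CDP_TARGETS` and `describe` are hypothetical and are not part of any Cloudera API or configuration format.

```python
from typing import NamedTuple


class CloudTarget(NamedTuple):
    """Storage and Kubernetes services named in the article for one cloud."""
    object_store: str
    kubernetes_service: str


# Hypothetical lookup table built only from the services listed in the article.
CDP_TARGETS = {
    "aws": CloudTarget("Amazon S3", "Elastic Kubernetes Service (EKS)"),
    "azure": CloudTarget("Azure Data Lake Store (ADLS)", "Azure Kubernetes Service (AKS)"),
    "gcp": CloudTarget("Google Cloud Storage (GCS)", "Google Kubernetes Engine (GKE)"),
}


def describe(cloud: str) -> str:
    """Return a one-line summary of the services listed for the given cloud key."""
    target = CDP_TARGETS[cloud.lower()]
    return f"{cloud.lower()}: storage={target.object_store}, kubernetes={target.kubernetes_service}"


if __name__ == "__main__":
    for name in CDP_TARGETS:
        print(describe(name))
```

The table records only which services the article names for each cloud; it says nothing about how CDP is actually provisioned or configured.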
Cloudera describes CDP as a multifunction, integrated platform that manages data flows and streaming, data engineering, machine learning, operational data stores and data warehouse environments. CDP is designed to integrate existing data silos with a feature Cloudera calls “intelligent migration.” CDP offers adaptive scaling and is designed to manage and analyze data distributed among hybrid and multicloud environments.
A key component of CDP is its Shared Data Experience (SDX), a data fabric through which consistent data and metadata security and governance policies are set and maintained. SDX also ensures that those policies remain intact when data is moved across supported infrastructures. Also available is Cloudera Machine Learning, designed to let data science teams collaborate on the building of machine learning workspaces in a wizard-type environment. The tool allows movement of data without creating data silos.
Cloudera and its key partner IBM believe CDP will address the next wave of requirements among enterprises looking to provide their data platforms and enable real-time analytics from distributed stores. While IBM had distinct partnerships in the past with Cloudera and Hortonworks, the newly combined company formed a go-to-market pact back in June. Under the new partnership, IBM will resell Cloudera Data Hub, which can move existing workloads to the public cloud, and DataFlow, the former Hortonworks real-time streaming analytics engine. For its part, Cloudera will resell IBM’s Watson Studio and BigSQL.
“No one that we work with does more to force the issue around hybrid and multicloud, and to really drive that message home, than our most strategic partner in the world, IBM,” said Mick Hollison, Cloudera’s chief marketing officer, speaking during the keynote session at the Strata Data conference. Cloudera also has partnerships with the major cloud and platform providers and a variety of ISVs, systems integrators, hardware vendors and specialty data integration, data science and security providers.
Analysts have been skeptical about Cloudera’s prospects since its merger was announced last year, and they remained so after the company reported a disappointing outlook in March that sent its shares down sharply. Earlier this month, the company reported a narrower-than-expected loss for its 2Q 2020.
James Kobielus, lead analyst for AI, data, data science, deep learning and application development at Wikibon, attended Cloudera’s analyst summit, where he reported that the company has advanced its solution portfolio and customer adoption beyond its core open-source platform roots with a more diverse platform, as expected.
However, Cloudera will face a challenge in a highly competitive market.
“Its background in the now-declining Hadoop market does not distinguish it in any way in competing for these opportunities,” Kobielus noted. “Its current offerings and enhancement road map for automation of data science pipelines are not appreciably different from those of many competitors. Though it may prove a formidable competitor going forward, Cloudera will find itself challenged to match the public-cloud providers in serving the new generation of AI developers.”