Christopher Tozzi, Contributing Editor

October 23, 2012

Talend Introduces Big Data Profiling Tools

These days, storing large amounts of data is easy. The hard part is ensuring the integrity and reliability of that data, a challenge that grows as Big Data clusters do. That problem has created new opportunities in the Big Data channel, and companies such as Talend, which has introduced new Hadoop data profiling technology, are working to capitalize on them. Read on for the details.

Talend’s Big Data products are already closely integrated with Apache Hadoop, the open source distributed computing platform. In its stated mission to “democratize” Big Data, the company has focused extensively on solutions that make deploying and managing Hadoop and related technologies simple, without requiring specific expertise in these new and rapidly evolving technologies.

Talend Data Profiling

On Tuesday, Talend took its Big Data strategy a step further by releasing new technology that not only simplifies Hadoop configuration, but also protects the information stored in Hadoop clusters. That means addressing a number of issues, such as data redundancy, incompleteness and inconsistency, that can undermine otherwise well-run Big Data deployments.

Talend is promoting the new data profiling tools, which are integrated into the company’s Platform for Big Data, as a solution for an issue that most organizations have traditionally either ignored at their peril or tried to solve by building in-house software. Talend promises the new product will bring several welcome changes to the Big Data world:

  • Profiling allows users to analyze their data in their Hive database on Hadoop. Profiling is performed “in place,” meaning data does not need to go through the time-consuming process of being extracted from Hadoop before being profiled.

  • Talend data profiling leverages the power of the Hadoop cluster, allowing users to scale up with additional servers to boost performance and deal with increases in volume.

  • Analysis provides a custom graphical report on the level of quality of organizations’ data. Data quality analysis includes standard tests that apply to all types of data including empty/missing values, number of duplicates, length of data and shapes of data. It includes further tests for specific data domains such as e-mail validation and phone number validation. These tests can be customized and extended as needed.
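To make the checks in that last point concrete, here is a minimal Python sketch of what column-level profiling for missing values, duplicates, lengths, shapes and e-mail validity might look like. It is an illustration only, not Talend's implementation: the names `profile_column`, `shape` and `EMAIL_RE` are hypothetical, the e-mail pattern is deliberately simplistic, and the code runs on an in-memory list rather than in place on a Hive table.

```python
import re
from collections import Counter

# Hypothetical, deliberately loose e-mail pattern (assumption, not Talend's rule).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def shape(value):
    """Reduce a value to its pattern: digits -> '9', letters -> 'a' or 'A'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_column(values):
    """Compute basic quality indicators for one column of string values."""
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    lengths = [len(v) for v in present]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        # Every occurrence beyond the first counts as a duplicate.
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "shapes": Counter(shape(v) for v in present),
        "invalid_emails": sum(1 for v in present if not EMAIL_RE.match(v)),
    }


if __name__ == "__main__":
    sample = ["a@b.com", "a@b.com", "", None, "bad", "X1@y.org"]
    for name, result in profile_column(sample).items():
        print(name, result)
```

In a real Hadoop deployment, the equivalent counts would be computed by queries running on the cluster itself, which is the point of the “in place” approach described above.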

Further Developments

In a conversation about this newest addition to Talend’s Big Data technologies, Yves de Montcheuil, the company’s vice president of marketing, hinted at important news to come later this week concerning further technological enhancements for Talend’s Big Data products. We’ll stay tuned as developments are made public.

About the Author(s)

Christopher Tozzi

Contributing Editor

Christopher Tozzi started covering the channel for The VAR Guy on a freelance basis in 2008, with an emphasis on open source, Linux, virtualization, SDN, containers, data storage and related topics. He also teaches history at a major university in Washington, D.C. He occasionally combines these interests by writing about the history of software. His book on this topic, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” is forthcoming with MIT Press.
