Microsoft AI Researchers Expose 38TB of Sensitive Data via SAS Tokens

Microsoft says no customer data was exposed.

Edward Gately, Senior News Editor

September 18, 2023


Microsoft artificial intelligence (AI) researchers accidentally exposed 38 terabytes of sensitive data, including private keys and passwords, according to cloud security firm Wiz.

The researchers did so while publishing a bucket of open-source training data on GitHub. The exposed data included a disk backup of two employees’ workstations, which contained secrets, private keys, passwords and more than 30,000 internal Microsoft Teams messages.

According to Wiz, the researchers shared their files using an Azure feature called Shared Access Signature (SAS) tokens, which allows users to share data from Azure Storage accounts. The access level can be limited to specific files only; however, in this case, the link was configured to share the entire storage account, including the private files.
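For illustration only, below is a minimal sketch of the narrower alternative, assuming the azure-storage-blob Python SDK (v12): a SAS token scoped to a single blob, read-only, with a one-hour expiry, rather than one granting access to the whole storage account. All account, container and file names here are placeholders, not details from the incident.

# Hypothetical sketch: scope a SAS token to one blob, read-only, short-lived.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

sas_token = generate_blob_sas(
    account_name="exampleaccount",     # placeholder storage account
    container_name="training-data",    # placeholder container
    blob_name="dataset.zip",           # share this one file only
    account_key="<account-key>",       # key-signed SAS for brevity
    permission=BlobSasPermissions(read=True),                # read-only
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),  # short-lived
)

url = (
    "https://exampleaccount.blob.core.windows.net/"
    f"training-data/dataset.zip?{sas_token}"
)
print(url)

Per Azure’s SAS guidance, a user-delegation SAS signed with Microsoft Entra credentials would be narrower still, since it can be revoked without rotating the storage account key.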

New Risks Associated with AI

“This case is an example of the new risks organizations face when starting to leverage the power of AI more broadly, as more of their engineers now work with massive amounts of training data,” Wiz wrote in a blog. “As data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.”

In its own blog, Microsoft said it investigated and remediated an incident involving a Microsoft employee who shared a URL for a “blob store” (cloud storage for unstructured data) in a public GitHub repository while contributing to open-source AI learning models. This URL included an overly permissive SAS token for an internal storage account.

Security researchers at Wiz were then able to use this token to access information in the storage account. Data exposed in this storage account included backups of two former employees’ workstation profiles and internal Microsoft Teams messages of these two employees with their colleagues.

This issue was responsibly reported under a coordinated vulnerability disclosure and has already been addressed, Microsoft said.

No Customer Data Exposed, No Action Required

No customer data was exposed, and no other internal services were put at risk because of this issue, Microsoft said. No customer action is required in response to this issue.

Mohit Tiwari, Symmetry Systems’ co-founder and CEO, said data is meant to be shared, but sharing data securely on the cloud today is like “driving a car with a tiller, fast and close to the edge of a cliff.”


“The key takeaway is that organizations have to understand what data you have, who can access it and how it is being accessed,” he said. “What Wiz has identified is not a cloud posture problem — this is a data inventory and access problem. Detecting SAS tokens, finding public [storage] buckets, etc., are useful point solutions, but a data security foundation is critical if we are to innovate and be secure.”
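As a rough illustration of the kind of point solution Tiwari mentions, the sketch below (hypothetical, not a tool from Symmetry, Wiz or Microsoft) scans files for Azure Blob Storage URLs whose query strings carry a SAS signature parameter, the telltale sig= that marks a shared-access link.

# Hypothetical sketch: flag Azure Blob Storage URLs carrying SAS query
# parameters in source files, e.g. as a pre-commit check. A point solution
# only; not a substitute for a data inventory and access foundation.
import re
import sys
from pathlib import Path

# Matches blob-storage URLs whose query string includes a SAS signature.
SAS_URL = re.compile(
    r"https://[a-z0-9]+\.blob\.core\.windows\.net/\S*[?&]sig=[^\s\"']+",
    re.IGNORECASE,
)

def scan(path: Path) -> list[str]:
    """Return any SAS-bearing URLs found in one file."""
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return []
    return SAS_URL.findall(text)

if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for file in root.rglob("*"):
        if file.is_file():
            for url in scan(file):
                print(f"{file}: possible SAS URL: {url}")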

Patrick Tiquet, vice president of security and architecture at Keeper Security, said that while AI can be a useful tool, organizations need to be aware of the potential risks of using tools built on this relatively nascent technology. The exposure of AI training data through SAS tokens, he said, highlights the need to treat AI tools with the same caution and security diligence as any other system that stores and processes sensitive data, particularly since the amount of sensitive data AI uses and stores can be extremely large.


“In some cases, organizations need assurances from AI providers that sensitive information will be kept isolated to their organization and not be used to cross-train AI products, potentially divulging sensitive information outside of the organization through AI collective knowledge,” he said. “The implementation of AI-powered cybersecurity tools requires a comprehensive strategy that also includes supplementary technologies to boost limitations as well as human expertise to provide a layered defense against the evolving threat landscape.”



About the Author

Edward Gately

Senior News Editor, Channel Futures

As news editor, Edward Gately covers cybersecurity, new channel programs and program changes, M&A and other IT channel trends. Prior to Informa, he spent 26 years as a newspaper journalist in Texas, Louisiana and Arizona.
