Researchers train model to create images without ‘seeing’ copyrighted work


May 22, 2024 - 01:00


Researchers at The University of Texas at Austin have developed an innovative framework for training AI models on heavily corrupted images.

Known as Ambient Diffusion, this method enables AI models to ‘draw inspiration’ from images without directly copying them.

Conventional AI image generators such as DALL-E, Midjourney, and Stable Diffusion risk copyright infringement because they are trained on large datasets that include copyrighted images, which can lead them to inadvertently reproduce those images.

Ambient Diffusion flips that on its head by training models with deliberately corrupted data.

In the study, the research team, including Alex Dimakis from the Electrical and Computer Engineering department at UT Austin and Constantinos Daskalakis from MIT, trained a Stable Diffusion XL model on a dataset of 3,000 celebrity images.

Initially, models trained on the clean data were observed to blatantly copy the training examples.

However, when the training data was corrupted by randomly masking up to 90% of the pixels, the model still produced high-quality, unique images.

Because the AI is never exposed to recognizable versions of the original images, it has far less opportunity to copy them, yet the framework still lets it generate high-quality images that are distinct from the training data.

“Our framework allows for controlling the trade-off between memorization and performance,” explained Giannis Daras, a computer science graduate student who led the work.

“As the level of corruption encountered during training increases, the memorization of the training set decreases.”
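To make the corruption step more concrete, here is a minimal Python sketch of random pixel masking at a chosen corruption level, such as the 90% figure mentioned above. It is an illustration under stated assumptions, not the researchers' code: the function name `mask_pixels`, the use of `None` for hidden pixels, and the list-of-rows image format are all hypothetical choices made for readability, and the sketch covers only how a training image might be degraded, not how Ambient Diffusion trains on the result.

```python
import random


def mask_pixels(image, corruption, seed=None):
    """Hypothetical helper: hide a random fraction of pixels in an image.

    `image` is a list of rows of pixel values. `corruption` is the fraction
    of pixels to hide (0.9 hides 90%). Returns the corrupted copy, with
    hidden pixels set to None, and a binary mask (1 = kept, 0 = hidden).
    """
    if not 0.0 <= corruption <= 1.0:
        raise ValueError("corruption must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator for reproducible masks
    height = len(image)
    width = len(image[0]) if height else 0
    positions = [(r, c) for r in range(height) for c in range(width)]
    # Pick exactly round(corruption * total) distinct pixel positions to hide.
    n_hidden = round(corruption * len(positions))
    hidden = set(rng.sample(positions, n_hidden))
    mask = [[0 if (r, c) in hidden else 1 for c in range(width)]
            for r in range(height)]
    corrupted = [[image[r][c] if mask[r][c] else None for c in range(width)]
                 for r in range(height)]
    return corrupted, mask


if __name__ == "__main__":
    # A toy 10x10 "image" whose pixel values are just 0..99.
    img = [[r * 10 + c for c in range(10)] for r in range(10)]
    for level in (0.5, 0.9):
        _, m = mask_pixels(img, level, seed=0)
        print(f"corruption={level}: {sum(map(sum, m))} of 100 pixels kept")
```

In this sketch, raising `corruption` leaves fewer original pixels visible, which mirrors the trade-off Daras describes: the less of each image the model sees, the less it can memorize. A real pipeline would operate on image arrays rather than nested lists.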

Scientific and medical applications

The uses of Ambient Diffusion extend beyond resolving copyright issues.

According to Professor Adam Klivans, a collaborator on the project, “The framework could prove useful for scientific and medical applications too. That would be true for basically any research where it is expensive or impossible to have a full set of uncorrupted data, from black hole imaging to certain types of MRI scans.”

This is particularly beneficial in fields with limited access to uncorrupted data, such as astronomy and particle physics.

If the approach were refined further, AI companies could balance the demand for creative, high-quality AI outputs against the rights of original content creators, reducing the risk of legal disputes.

While that wouldn’t solve concerns that AI image tools reduce the pool of work for real artists, it would at least protect their works from being accidentally replicated in outputs.

The post Researchers train model to create images without ‘seeing’ copyrighted work appeared first on DailyAI.