
Two years after DALL-E’s debut, the inventor is “surprised” by the impact




Before DALL-E 2, Stable Diffusion and Midjourney, there was only a research paper called “Zero-Shot Text-to-Image Generation.”

With that paper and a website demo, on January 5, 2021 — two years ago today — OpenAI introduced DALL-E, a neural network that “creates images from text captions for a wide variety of concepts that can be expressed in natural language.”

A 12-billion-parameter version of the Transformer language model GPT-3 was trained to generate images from text descriptions, using a dataset of text–image pairs. VentureBeat reporter Khari Johnson described the name as “intended to evoke the artist Salvador Dali and the robot WALL-E” and added a DALL-E-generated illustration of a “baby daikon radish in a tutu walking a dog.”

Image by DALL-E

Things have moved quickly since then, says OpenAI researcher, DALL-E inventor and DALL-E 2 co-inventor Aditya Ramesh. If anything, that is an understatement, given the staggering pace of development in the generative AI space over the past year. Then there was the meteoric rise of diffusion models, which were a game-changer for DALL-E 2, released last April, as well as for its open-source counterpart Stable Diffusion and for Midjourney.


“It doesn’t feel like that long ago when we first tried this line of research to see what could be done,” Ramesh told VentureBeat. “I knew the technology would reach a point where it would impact consumers and be useful for many different applications, but I was still surprised by how quickly it moved.”

Now generative modeling is approaching the point where “there’s going to be a kind of iPhone-like moment for image generation and other modalities,” he said. “I’m excited to be able to build something that will be used for all of these applications that will emerge.”

Original research developed in collaboration with CLIP

The DALL-E 1 research was developed and announced together with CLIP (Contrastive Language-Image Pre-training), a separate model based on zero-shot learning that was essentially DALL-E’s secret sauce. Trained on 400 million pairs of images with text captions scraped from the internet, CLIP could be instructed in natural language to perform classification benchmarks and to rank DALL-E’s results.
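CLIP’s reranking role can be illustrated with a toy sketch: embed the caption and each candidate image into a shared vector space, then sort the candidates by cosine similarity to the caption. The embeddings below are hand-made stand-ins, not real CLIP outputs — this only shows the ranking mechanic, not the model itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rerank(caption_emb, image_embs):
    """Indices of candidate images, sorted best-first by caption similarity."""
    scores = [cosine(caption_emb, e) for e in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)

# Hand-made stand-ins for CLIP's text and image embeddings.
caption = [1.0, 0.0, 0.5]
candidates = [
    [0.9, 0.1, 0.4],    # close match
    [-1.0, 0.2, 0.0],   # poor match
    [0.5, 0.5, 0.5],    # middling
]
print(rerank(caption, candidates))  # → [0, 2, 1]
```

In the real pipeline, DALL-E would generate many candidate images for one caption and CLIP would pick the top few to show — the same sort-by-similarity step sketched here.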

Of course, there were plenty of early signs that text-to-image progress was on the way.

“It’s been clear for years that this future is coming soon,” said Jeff Clune, an associate professor of computer science at the University of British Columbia. In 2016, when his team produced what he says were the first synthetic images that were difficult to distinguish from real images, Clune recalled speaking to a journalist.

“I said in a few years you can describe any image you want and AI will produce it, like ‘Donald Trump takes a bribe from Putin with a grin on his face,'” he said.

Generative AI has been a core tenet of AI research since its inception, said Nathan Benaich, general partner at Air Street Capital. “It’s worth pointing out that research, such as the development of Generative Adversarial Networks (GANs) in 2014 and DeepMind’s WaveNet in 2016, was already beginning to show how AI models could generate new images and audio, respectively,” he told VentureBeat in a message.

Still, the original DALL-E paper was “quite impressive at the time,” added futurist, author and AI researcher Matt White. “While not the first work in text-to-image synthesis, OpenAI’s approach of promoting its work to the general public, and not just in AI research circles, earned it significant attention, and rightly so.”

Pushing DALL-E research as far as possible

From the start, Ramesh said, his main interest was to push the research as far as possible.

“We found text-to-image generation interesting because, as humans, we can construct a sentence to describe any situation we might encounter in real life, as well as fantastical situations or crazy scenarios that are impossible,” he said. “So we wanted to see whether, if we trained a model to just generate good enough images from text, it could do the same things humans can, as far as extrapolation is concerned.”

One of the main research influences on the original DALL-E, he added, was VQ-VAE, a technique developed by Aaron van den Oord, a DeepMind researcher, to split images into tokens similar to those used to train language models.

“So we can take a Transformer like GPT, which is just trained to predict one word after the next, and extend its language tokens with these additional image tokens,” he explained. “As a result, we can also apply the same technology to generate images.”
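Ramesh’s description can be sketched in a few lines: caption tokens and dVAE-style image codes are merged into one token stream (with the image codes offset past the text vocabulary so the two token types never collide), and the model simply learns to predict each next token in that stream. The vocabulary sizes and token ids below are illustrative, not DALL-E’s actual values.

```python
# Toy sketch of DALL-E 1's single-stream training setup: caption tokens and
# discrete image codes share one sequence, and the model's only job is
# next-token prediction over that combined stream.
TEXT_VOCAB = 256     # stand-in for the text (BPE) vocabulary size
IMAGE_VOCAB = 8192   # codebook size of a dVAE-style image tokenizer

def build_sequence(text_tokens, image_tokens):
    """Concatenate caption tokens with image codes offset past the text
    vocabulary, so both token types fit in one shared embedding table."""
    return list(text_tokens) + [TEXT_VOCAB + t for t in image_tokens]

def next_token_pairs(seq):
    """(context, target) pairs for autoregressive training:
    predict each token from everything before it."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

caption = [12, 7, 201]      # pretend BPE ids for a short caption
image = [4096, 55, 8191]    # pretend dVAE codes for image patches
assert max(image) < IMAGE_VOCAB  # codes must fit the codebook

seq = build_sequence(caption, image)
print(seq)  # → [12, 7, 201, 4352, 311, 8447]
```

At sampling time the same trick runs in reverse: feed in only the caption tokens, let the model generate image tokens one by one, then decode those codes back into pixels with the dVAE.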

People were surprised by DALL-E, he said, because “it’s one thing to see an example of generalization in language models, but when you see it in image generation, it’s just much more visceral and impactful.”

The move from DALL-E 2 to diffusion models

But by the time the original DALL-E paper was published, Ramesh’s co-authors on DALL-E 2, Alex Nichol and Prafulla Dhariwal, were already working on using diffusion models in a modified version of GLIDE, a new OpenAI diffusion model.

This led to the creation of DALL-E 2, a very different architecture from the first iteration of DALL-E. As Vaclav Kosar explained: “DALL-E 1 uses a discrete variational autoencoder (dVAE), next-token prediction, and CLIP model reranking, while DALL-E 2 uses the CLIP embedding directly and decodes images via diffusion, similar to GLIDE.”

“It seemed very natural [to combine diffusion models with DALL-E] because there are many advantages to diffusion models – inpainting is the most obvious feature which is quite clean and elegant to implement using diffusion,” said Ramesh.

Incorporating classifier-free guidance, a technique used in developing GLIDE, into DALL-E 2 led to a drastic improvement in caption matching and realism, he explained.

“When Alex first tried it out, none of us expected such a drastic improvement in results,” he said. “My initial expectation for DALL-E 2 was that it would just be an update to DALL-E, but it was surprising to me that we got to the point where it’s already starting to be useful to people,” he said.

When the AI community and the general public first saw the image output of DALL-E 2 on April 6, 2022, the difference in image quality was, for many, overwhelming.

Image by DALL-E 2

“Competitive, exciting and fraught”

The January 2021 release of DALL-E was the first in a wave of text-to-image research that builds on fundamental advancements in language and image processing, including variational autoencoders and autoregressive transformers, Margaret Mitchell, chief ethics officer at Hugging Face, told VentureBeat by email. When DALL-E 2 was released, “diffusion was a breakthrough that most of us working in the area didn’t see coming, and it really elevated the game,” she said.

The past two years since the original DALL-E research paper have been “competitive, exciting and fraught,” she added.

“The focus on modeling language and images has come at the expense of how best to get data for the model,” she said, noting that individual rights and consent have been “all but abandoned” in modern text-to-image advances. Current systems “essentially steal artists’ concepts without providing any recourse for the artists,” she concluded.

OpenAI’s decision not to make DALL-E’s source code available also led others to develop open-source text-to-image options that made splashes of their own by the summer of 2022.

The original DALL-E was “interesting but not accessible,” said Emad Mostaque, founder of Stability AI, which released the first iteration of the open-source text-to-image generator Stable Diffusion in August, adding that “only the models my team trained were [open source].” “We started aggressively funding and supporting this space in the summer of 2021,” Mostaque said.

Going forward, DALL-E still has plenty of work to do, said White, even though OpenAI is teasing a new iteration soon.

“DALL-E 2 suffers from consistency, quality and ethical issues,” he said. It has problems with attribute binding and composability, he stressed, so a prompt like “a brown dog with a red shirt” can produce results where the attributes are transposed (e.g., a red dog with a brown shirt, a red dog with a red shirt, or entirely different colors). In addition, he added, DALL-E 2 still struggles with face and body composition, as well as with consistently generating text in images, “especially longer words.”

The Future of DALL-E and Generative AI

Ramesh hopes that more people learn how the DALL-E 2 technology works, which he says will lead to fewer misunderstandings.

“People think the way the model works is it has a database of images somewhere, and the way it generates images is to cut and paste pieces of these images to create something new,” he said. “But actually the way it works is much closer to a human, where when the model is trained on the images, it learns an abstract representation of what all these concepts are.”

The training data “is no longer used when we generate a completely new image,” he explains. “Diffusion models start with a vague approximation of what they’re trying to generate, then gradually add details to it in many steps, like how an artist would start with a rough sketch and slowly flesh it out over time.”
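Ramesh’s sketch-then-flesh-out analogy can be caricatured in code: start with pure random noise and repeatedly nudge the sample a little toward the final image, one small step at a time. Real diffusion samplers use a learned denoising network and a noise schedule rather than the fixed linear nudge below; this toy only shows the iterative-refinement idea.

```python
import random

def toy_reverse_diffusion(target, steps=50, seed=0):
    """Start from noise and move a small step toward the target each
    iteration, a caricature of a diffusion sampler refining an image."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]   # the "vague approximation"
    for _ in range(steps):
        # Each step adds a bit more "detail" (closes 20% of the gap).
        x = [xi + 0.2 * (ti - xi) for xi, ti in zip(x, target)]
    return x

sharp = [0.8, -0.3, 0.1]   # stand-in for an image's final pixel values
result = toy_reverse_diffusion(sharp)
```

After 50 steps the sample is essentially indistinguishable from the target values — the early iterations make the big moves (the rough sketch), and the later ones contribute ever-finer corrections.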

And helping artists, he said, has always been a goal for DALL-E.

“We had ambitiously hoped that these models would be a kind of creative copilot for artists, similar to how Codex is a copilot for programmers — another tool that you can use to make many everyday tasks a lot easier and faster,” he said. “We found that some artists find it very useful for prototyping ideas — while they would normally spend several hours or even several days exploring a concept before deciding to go ahead with it, DALL-E could help them get to the same place in just a few hours or a few minutes.”

Ramesh said he hopes that more and more people will learn about and experiment with what these tools can do, both with DALL-E and with other generative AI tools.

“With [OpenAI’s] ChatGPT, I think we’ve dramatically expanded the scope of what these AI tools can do and exposed a lot of people to using them,” he said. “I hope that over time, people who want to do things with our technology will be able to easily access it through our website and find ways to use it to build things they want to see.”


