Text-to-image AI systems are currently huge in both skill and popularity, and what better proof than their appearance in the world’s most popular app: TikTok.
The video platform recently added a new effect it calls “AI greenscreen,” which allows users to type a text prompt that the software will then generate as an image. This image can then be used as a background for a video – potentially a very useful tool for creators.
The output of the TikTok system is quite simple compared to that of state-of-the-art text-to-image models such as Google’s Imagen, OpenAI’s DALL-E 2 or Midjourney’s software of the same name. It just creates rather abstract and swirling images; a power reflected in the dreamy nature of TikTok’s suggested prompts such as “astronaut in the ocean” and “flower galaxy.” Other models, on the other hand, can produce photorealistic images as well as complex and coherent illustrations that look as if they were drawn or painted by humans.
However, the limitations of the TikTok model may be intentional. First, more advanced models require more computing power, which would be expensive and labor-intensive for the company to implement. Second, TikTok has over a billion users, and if all these individuals were given the power to create photo-realistic images of anything they can imagine, it would almost certainly produce disturbing results.
For example, we tested the ability of models to create nudity and gore – two types of output that text-to-image generators often try to limit. Photos based on violent cues such as “murder of Boris Johnson” and “murder of Joe Biden” tend to produce abstract swirls, with an almost recognizable face for the British prime minister (although the man’s well-known blond mop makes caricature particularly easy).
Likewise, a request with nudity — “nude model on the beach” — yields thematically appropriate colors, including skin tones, sand oranges, and ocean blue, but nothing that would make a pastor blush.
What is striking about the appearance of TikTok’s “AI-greescreen” is that it shows how quickly this technology is becoming mainstream. The latest development cycle for text-to-image AI arguably started in 2021 with the original release of DALL-E by OpenAI. Less than two years later, the technology is already in the hands of millions via an app like TikTok.
Given the potential of these systems for both evil and good, things will only get weirder from now on.