In the world of artificial intelligence, there are so-called text-to-image generators. The name is self-explanatory: based on a phrase the user types, the system returns an image matching the description.
Until now, the pioneer in this type of software has been DALL-E, a program created by the OpenAI lab. Now Google has decided to get into the game with Imagen, announced last Tuesday (the 24th).
Imagen works in the same way as other generators: given a text, it generates an image. On the page dedicated to the project, it is described as having an “unprecedented degree of photorealism and a deep understanding of language”. Indeed, you only need to look at the images released by the company to understand the capabilities of the new tool:


According to Google, Imagen produces better images than DALL-E. To reach this conclusion, the company created a benchmark called DrawBench. The method is not complicated: the same text prompts were used to create images in several generators, and the results were shown to human judges, who chose their favourites. Imagen’s results were selected more often than those of its competitors.
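The tallying behind a comparison of this kind can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Google's evaluation code: the prompts, the votes, and the placeholder names `model_a` and `model_b` are all invented for the example.

```python
from collections import Counter

# Hypothetical data: for each prompt, the model each judge preferred.
# "model_a" and "model_b" are placeholders, not real benchmark results.
votes = {
    "a corgi riding a bicycle": ["model_a", "model_a", "model_b"],
    "a brown football on grass": ["model_a", "model_b", "model_a"],
    "a teapot shaped like a cactus": ["model_b", "model_a", "model_a"],
}


def preference_rates(votes_by_prompt):
    """Return the share of all judge votes that each model received."""
    tally = Counter()
    for judge_votes in votes_by_prompt.values():
        tally.update(judge_votes)
    total = sum(tally.values())
    return {model: count / total for model, count in tally.items()}


rates = preference_rates(votes)
for model, rate in sorted(rates.items()):
    print(f"{model}: preferred in {rate:.0%} of votes")
```

A generator "wins" such a comparison simply by collecting the larger share of votes, which is why the choice of prompts and judges matters so much.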
Problems with the picture
Despite Imagen’s impressive results, caution is in order. After all, the released images were hand-picked to show the software at its best, and may not represent the typical result.
Another problem with Imagen: for all its artistic and creative potential, the program could be used to generate fake news and misinformation, as has already happened with deepfakes.
The Google team also draws attention to problems stemming from the project’s training data. Let’s take this step by step: systems like this work through machine learning. The program is exposed to a huge amount of data (in the case of text-to-image generators, images paired with descriptive text). It then studies this data to find patterns (e.g. associating the word “ball” with pictures of different types of balls).
The goal is that, through this learning, the software can reproduce these patterns in response to a user’s request. If I type the word “football”, the program needs to understand not only that I want an image of a ball, but also that the ball is brown and oval, with visible seams.
To create complex images like the ones above, Imagen naturally needs a huge amount of data. The larger the dataset, the harder it is to filter. Herein lies the problem: by absorbing this information from internet datasets, machines learn to carry the same prejudices and stereotypes that circulate on the web.
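A toy example makes the mechanism concrete. The sketch below is a deliberately simplified stand-in, not how Imagen is built: it just counts which image attribute appears most often alongside each caption word. The training pairs and the function names `learn_associations` and `most_likely_attribute` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training pairs: (caption word, attribute seen in the paired image).
pairs = [
    ("ball", "round"), ("ball", "round"), ("ball", "oval"),
    ("football", "oval"), ("football", "oval"),
    ("football", "oval"), ("football", "round"),
]


def learn_associations(training_pairs):
    """Count how often each attribute appears alongside each caption word."""
    counts = defaultdict(Counter)
    for word, attribute in training_pairs:
        counts[word][attribute] += 1
    return counts


def most_likely_attribute(counts, word):
    """Return the attribute seen most often with the given word."""
    return counts[word].most_common(1)[0][0]


model = learn_associations(pairs)
print(most_likely_attribute(model, "football"))  # oval
print(most_likely_attribute(model, "ball"))      # round
```

The learner has no notion of what is correct or fair; it simply echoes whatever dominates its data. If the collected examples are skewed, the output is skewed in the same direction, which is the concern with data gathered from the web.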
“There is a risk that Imagen has encoded harmful stereotypes and representations, which justifies our decision not to release Imagen for public use,” the project team said on its official page. After an initial assessment, the company identified “various biases and social stereotypes” that Imagen had absorbed, “including the tendency to create images of light-skinned people and a tendency to portray different occupations in line with Western gender stereotypes.”

For these and other reasons, Imagen still has no public release date. Google says it is committed to addressing “these challenges and limitations in future work.” The hope is that, with further updates, the program will become a safe tool for creating stunning images from simple text.