Why AI Image Generators Like Midjourney Struggle with Text

Updated: May 24

Photo Source: Bing

AI image generators are amazing tools that can create realistic and artistic images from text prompts. You can use them to generate logos, posters, illustrations, memes, and more. However, if you have ever tried to use them, you may have noticed that they often get the text on the image wrong.

They may change the letters, misspell the words, use weird fonts, or ignore the text altogether. Why does this happen?

One of the reasons is that AI image generators are not very good at understanding language. They are trained on large datasets of images paired with text captions, but they do not learn the meaning or the structure of language.

They just learn to associate certain patterns of pixels with certain patterns of characters. This means that they may not know how to spell correctly, how to use punctuation, how to align text, or how to follow grammar rules.

Another reason is that AI image generators may not have enough data or examples of how to generate text in different styles, fonts, colors, etc. They may have seen more images without text than with text, or more images with certain types of text than others.

This means that they may not know how to generate text that matches the style or the context of the image. They may also have a hard time generating text that is original or creative.

According to one answer on Artificial Intelligence Stack Exchange, this is a specific kind of coherence failure in the model. It is not really that different from the generator mangling hands or positioning limbs incorrectly. In this regard, there is nothing special about text for this kind of AI.

Some models may be better than others at generating text, depending on their size, architecture, and training data. For example, the team behind Google's research project Parti published an article demonstrating the effect of adding more parameters to the model. Their best model generates very good-looking text.

However, these models may not be available to the public yet.

A possible workaround is to generate the background image with the AI image generator and then use an image editor to add the text manually. This way, you can have more control over the text and make sure it looks good and accurate.
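If you would rather script that step than open an image editor, the same workaround can be sketched in Python with the Pillow library. This is only a rough illustration under some assumptions: the function names bottom_center and add_caption are made up for this example, the caption is placed centered near the bottom edge, and you need to supply your own background file and a TrueType font file.

```python
def bottom_center(image_size, text_size, margin=40):
    """Return the (x, y) top-left corner that centers the text horizontally
    and places it `margin` pixels above the bottom edge of the image."""
    img_w, img_h = image_size
    text_w, text_h = text_size
    x = (img_w - text_w) // 2
    y = img_h - text_h - margin
    return x, y


def add_caption(background_path, output_path, text, font_path, font_size=64):
    """Draw `text` onto an AI-generated background and save the result.

    All paths are placeholders supplied by the caller (hypothetical names).
    """
    # Pillow is imported here so bottom_center() stays usable without it.
    from PIL import Image, ImageDraw, ImageFont

    image = Image.open(background_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    font = ImageFont.truetype(font_path, font_size)

    # textbbox returns (left, top, right, bottom) for the rendered text.
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    x, y = bottom_center(image.size, (right - left, bottom - top))

    # Subtract the bbox offset so the visible glyphs start at (x, y).
    # A dark outline keeps white text readable on busy backgrounds.
    draw.text((x - left, y - top), text, font=font, fill="white",
              stroke_width=2, stroke_fill="black")
    image.save(output_path)
```

A call such as add_caption("background.png", "poster.png", "Grand Opening", "YourFont.ttf") would then write the captioned image to poster.png, where the file names are placeholders. Because the text is drawn by a font renderer rather than by the image model, the spelling is exactly what you typed.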

AI image generators are still evolving and improving. Maybe in the future, they will be able to generate text on images flawlessly. Until then, we can enjoy their creativity and quirks.

